decoupler.benchmark

decoupler.benchmark(mat, obs, net, perturb, sign, metrics=['auroc', 'auprc', 'mcauroc', 'mcauprc', 'rank', 'nrank'], groupby=None, by='experiment', f_expr=True, f_srcs=False, min_exp=5, pi0=0.5, n_iter=1000, seed=42, verbose=True, use_raw=True, decouple_kws={})

Benchmark methods or networks on a given set of perturbation experiments using activity inference with decoupler.

Parameters:
mat : list, DataFrame or AnnData

A list of [features, matrix], a DataFrame (samples x features), or an AnnData instance.

obs : DataFrame or None

Metadata containing the perturbed sources and the sign of each perturbation. If mat is AnnData, its mat.obs attribute is used instead.

net : DataFrame or dict

Network in long format. Can also be a dictionary of networks, where each key is a network name and each value is a long-format DataFrame.
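A minimal sketch of the dictionary form of net, assuming the usual decoupler column names source, target and weight. The network names net_a and net_b and all regulators and genes are hypothetical.

```python
import pandas as pd

# Two small long-format networks: one row per source -> target interaction.
net_a = pd.DataFrame({
    "source": ["TF1", "TF1", "TF2", "TF2"],
    "target": ["G1", "G2", "G2", "G3"],
    "weight": [1.0, -1.0, 1.0, 1.0],
})
net_b = pd.DataFrame({
    "source": ["TF1", "TF3"],
    "target": ["G1", "G4"],
    "weight": [1.0, 1.0],
})

# Dictionary form: key = network name, value = long-format DataFrame.
nets = {"net_a": net_a, "net_b": net_b}

for name, df in nets.items():
    print(name, sorted(df["source"].unique()))
```

Each network in the dictionary is benchmarked separately, so the names chosen as keys identify the networks in the output.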

perturb : str

Column name in obs with perturbed sources.

sign : str or int

Column name in obs containing the sign of the perturbation. Alternatively, set to 1 or -1 if all experiments are overexpressions or knockouts, respectively.
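A minimal sketch of matching mat and obs inputs when mat is a DataFrame, assuming obs is indexed by the same experiment names as the rows of mat and that each experiment perturbs a single source. The column names perturbed and direction are hypothetical; they would be passed as perturb='perturbed' and sign='direction'.

```python
import pandas as pd

# Hypothetical experiment metadata: one row per perturbation experiment.
obs = pd.DataFrame(
    {
        "perturbed": ["TF1", "TF2", "TF3"],  # would be passed as perturb="perturbed"
        "direction": [-1, 1, -1],  # would be passed as sign="direction"; -1 = knockout
    },
    index=["exp1", "exp2", "exp3"],
)

# Expression changes per experiment (samples x features); values are made up.
mat = pd.DataFrame(
    [[0.5, -1.2, 0.3], [1.1, 0.4, -0.7], [-0.2, 0.9, 1.5]],
    index=obs.index,
    columns=["G1", "G2", "G3"],
)

# Rows of mat and obs describe the same experiments, in the same order.
assert mat.index.equals(obs.index)
```

If every experiment were a knockout, the direction column could be dropped and sign=-1 passed instead.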

metrics : list or str

Performance metric(s) to compute. See the description of get_performance for more details.

groupby : list, str or None

Column name(s) in obs used to group experiments. Performance metric(s) are computed per group if enough experiments are available.

by : str

Whether to evaluate performances at the “experiment” or at the “source” level.

f_expr : bool

Whether to filter out experiments whose perturbed sources are not in the given net. Defaults to True.

f_srcs : bool

Whether to filter out sources in net for which there is no perturbation data. Defaults to False.
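The two filters can be illustrated with plain pandas. This is a sketch of the behaviour described for f_expr and f_srcs, not decoupler's own implementation, and it assumes each experiment perturbs exactly one source; the column name perturbed is hypothetical.

```python
import pandas as pd

net = pd.DataFrame({
    "source": ["TF1", "TF1", "TF2", "TF4"],
    "target": ["G1", "G2", "G2", "G3"],
    "weight": [1.0, -1.0, 1.0, 1.0],
})
obs = pd.DataFrame(
    {"perturbed": ["TF1", "TF2", "TF3"], "direction": [-1, 1, -1]},
    index=["exp1", "exp2", "exp3"],
)

# f_expr=True: drop experiments whose perturbed source is absent from net (TF3).
obs_f = obs[obs["perturbed"].isin(net["source"])]

# f_srcs=True: drop sources in net that were never perturbed (TF4).
net_f = net[net["source"].isin(obs["perturbed"])]

print(list(obs_f.index))  # ['exp1', 'exp2']
print(sorted(net_f["source"].unique()))  # ['TF1', 'TF2']
```

With f_expr=True the experiment exp3 cannot be evaluated because no activity is inferred for TF3; with f_srcs=True TF4 is removed because it can never be a true positive.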

min_exp : int

Minimum number of perturbation experiments per group.
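A sketch of how a min_exp threshold selects groups, assuming groupby names a column of obs. The column platform and its values are hypothetical, and the sketch only shows the counting; how benchmark reports groups that fall below the threshold is not covered here.

```python
import pandas as pd

obs = pd.DataFrame(
    {
        "perturbed": ["TF1", "TF2", "TF3", "TF1", "TF2", "TF4", "TF5"],
        "platform": ["rna", "rna", "rna", "rna", "rna", "array", "array"],
    },
    index=[f"exp{i}" for i in range(1, 8)],
)
min_exp = 5

# Count experiments per group and keep only groups meeting the threshold.
counts = obs.groupby("platform").size()
kept_groups = counts[counts >= min_exp].index.tolist()
obs_kept = obs[obs["platform"].isin(kept_groups)]

print(counts.to_dict())  # {'array': 2, 'rna': 5}
print(kept_groups)  # ['rna']
```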

pi0 : float

Reference ratio for calibrated metrics. Corresponds to the baseline (reference) class imbalance to which the metric is calibrated.
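To make the role of pi0 concrete, the sketch below uses a common precision-calibration scheme: false positives are reweighted as if the observed positive ratio pi were pi0. This is an assumption about how calibration works in general, not a statement of decoupler's exact formula; the function name calibrated_precision is hypothetical.

```python
def calibrated_precision(tp, fp, n_pos, n_neg, pi0=0.5):
    """Precision rescaled as if the positive-class ratio were pi0.

    False positives are reweighted by pi * (1 - pi0) / (pi0 * (1 - pi)),
    where pi = n_pos / (n_pos + n_neg) is the observed positive ratio.
    """
    pi = n_pos / (n_pos + n_neg)
    scale = pi * (1 - pi0) / (pi0 * (1 - pi))
    return tp / (tp + scale * fp)


# 10 positives among 100 labels: raw precision is dragged down by the many negatives.
raw = 5 / (5 + 20)
cal = calibrated_precision(tp=5, fp=20, n_pos=10, n_neg=90, pi0=0.5)
print(round(raw, 4), round(cal, 4))  # 0.2 0.6923
```

When the observed ratio already equals pi0 the scale factor is 1 and the calibrated value equals the raw precision, so scores from benchmarks with different class imbalances become comparable.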

n_iter : int

Number of downsampling iterations used for the ‘mcauroc’ and ‘mcauprc’ metrics.
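A sketch of a Monte Carlo downsampled AUROC, assuming the mc metrics balance the classes by randomly drawing as many negatives as there are positives in each of n_iter iterations and averaging the results. The helpers auroc and mc_auroc are hypothetical, and numpy's default_rng stands in for however seed is consumed internally.

```python
import numpy as np


def auroc(pos, neg):
    """Mann-Whitney AUROC: P(score_pos > score_neg), ties counted as 0.5."""
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size


def mc_auroc(scores, labels, n_iter=1000, seed=42):
    """Average AUROC over n_iter draws that subsample negatives to the number of positives.

    Requires at least as many negatives as positives.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    values = [
        auroc(pos, rng.choice(neg, size=pos.size, replace=False))
        for _ in range(n_iter)
    ]
    return float(np.mean(values))


scores = [0.9, 0.8, 0.1, 0.2, 0.3, 0.05, 0.15, 0.25]
labels = [1, 1, 0, 0, 0, 0, 0, 0]
print(mc_auroc(scores, labels, n_iter=100))  # 1.0
```

Fixing the seed makes the random subsamples, and therefore the averaged metric, reproducible across runs.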

seed : int

Random seed to use.

verbose : bool

Whether to show progress.

use_raw : bool

Whether to use the raw attribute of mat if present.

decouple_kws : dict

Parameters for the decoupler.decouple function. If more than one net is given, use a nested dictionary where each key is a network name and each value is a dictionary with the required arguments.
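A sketch of both shapes of decouple_kws. The argument names methods and consensus and the method names ulm, mlm and wsum are assumed from decoupler.decouple and may differ between versions; the network names net_a and net_b are hypothetical and must match the keys of the net dictionary.

```python
# Single network: keyword arguments forwarded to decoupler.decouple.
decouple_kws_single = {"methods": ["ulm", "mlm"], "consensus": True}

# Several networks: the outer key is the network name, the value its own arguments.
decouple_kws_multi = {
    "net_a": {"methods": ["ulm"], "consensus": False},
    "net_b": {"methods": ["ulm", "wsum"], "consensus": True},
}

net_names = {"net_a", "net_b"}
# Every network in the benchmark should have its own argument dictionary.
assert set(decouple_kws_multi) == net_names
```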

Returns:
df : DataFrame

DataFrame containing the metrics’ scores.