decoupler.run_gsva

decoupler.run_gsva(mat, net, source='source', target='target', kcdf=False, mx_diff=True, abs_rnk=False, min_n=5, seed=42, verbose=False, use_raw=True)

Gene Set Variation Analysis (GSVA).

GSVA (Hänzelmann et al., 2013) starts by transforming the input molecular readouts in mat to a readout-level statistic using a non-parametric estimate of the cumulative distribution function (a Gaussian kernel when kcdf=True). Then, readout-level statistics are ranked per sample and normalized to up-weight the two tails of the rank distribution. Afterwards, an enrichment score gsva_estimate is calculated using a running sum statistic, which by default is normalized by taking the magnitude difference between the largest positive and the largest negative deviation of the running sum.
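The rank normalization step can be illustrated with a minimal sketch. It assumes the symmetric transform |p/2 - rank| described by Hänzelmann et al. (2013), where p is the number of features; the function name symmetric_ranks is hypothetical and this is not necessarily how decoupler implements the step internally.

```python
def symmetric_ranks(stats):
    """Rank one sample's statistics and centre the ranks around zero."""
    p = len(stats)
    # Rank 1 goes to the smallest statistic, rank p to the largest
    # (ties are broken by position in this sketch).
    order = sorted(range(p), key=lambda i: stats[i])
    ranks = [0] * p
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    # |p/2 - rank| is large at both ends of the ranking and small in the middle,
    # which is what up-weights the two tails.
    return [abs(p / 2 - r) for r in ranks]


sample = [0.10, 0.95, 0.50, 0.02, 0.70]  # made-up statistics for five features
print(symmetric_ranks(sample))  # [0.5, 2.5, 0.5, 1.5, 1.5]
```

The features with the highest and lowest statistics receive the largest weights, while the feature in the middle of the ranking receives the smallest.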

Hänzelmann S. et al. (2013) GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics, 14, 7.

Parameters
mat : list, DataFrame or AnnData

List of [features, matrix], dataframe (samples x features) or an AnnData instance.

net : DataFrame

Network in long format.

source : str

Column name in net with source nodes.

target : str

Column name in net with target nodes.

kcdf : bool

Whether to use a Gaussian kernel during the non-parametric estimation of the cumulative distribution function. By default no kernel is used (faster); set to True to reproduce the original GSVA behaviour in R.
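As a rough illustration of the two estimation modes, the sketch below compares a plain empirical CDF with a Gaussian-kernel CDF for one feature measured across several samples. The function names are hypothetical, the bandwidth of one quarter of the sample standard deviation follows the GSVA paper, and reading "no kernel" as an empirical CDF is an assumption rather than a statement about decoupler's internals.

```python
import math
import statistics


def empirical_cdf(values):
    """Fraction of samples with a value <= each sample's value (no kernel)."""
    n = len(values)
    return [sum(v <= x for v in values) / n for x in values]


def gaussian_kernel_cdf(values):
    """Kernel-smoothed CDF: average of Gaussian CDFs centred on every sample."""
    n = len(values)
    # Assumed bandwidth: sample standard deviation divided by 4.
    h = statistics.stdev(values) / 4

    def phi(z):
        # Standard normal cumulative distribution function.
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    return [sum(phi((x - v) / h) for v in values) / n for x in values]


expression = [2.0, 3.5, 1.0, 5.0]  # made-up values of one feature in four samples
print(empirical_cdf(expression))  # [0.5, 0.75, 0.25, 1.0]
print(gaussian_kernel_cdf(expression))
```

Both estimates preserve the ordering of the samples; the kernel version is smoother but needs one Gaussian evaluation per pair of samples, which is consistent with the note that skipping the kernel is faster.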

mx_diff : bool

Changes how the enrichment statistic (ES) is calculated. If True (default), ES is calculated as the difference between the maximum positive and negative random walk deviations. If False, ES is calculated as the maximum deviation of the random walk from 0.

abs_rnk : bool

Only used when mx_diff=True. If False (default), the enrichment statistic (ES) is calculated taking the magnitude difference between the largest positive and negative random walk deviations. If True, the two magnitudes are added instead, so feature sets with features enriched on either extreme (high or low) will be regarded as ‘highly’ activated.
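The interaction between mx_diff and abs_rnk can be sketched on a toy random walk. This is an illustrative reading of the parameter descriptions, not decoupler's actual code: the function enrichment_score, its arguments, and the example data are hypothetical, and the mx_diff=False branch assumes the behaviour of the original R package (the single deviation furthest from 0, with its sign).

```python
def enrichment_score(ranked_features, weights, feature_set,
                     mx_diff=True, abs_rnk=False):
    """Random-walk enrichment score for one sample and one feature set.

    ranked_features: feature names sorted from highest to lowest statistic.
    weights: non-negative weight per ranked feature, in the same order.
    Assumes the set hits at least one feature and misses at least one.
    """
    in_set = [f in feature_set for f in ranked_features]
    total_hit = sum(w for w, hit in zip(weights, in_set) if hit)
    n_miss = len(ranked_features) - sum(in_set)

    walk, running = [], 0.0
    for w, hit in zip(weights, in_set):
        # Step up (weighted) on a set member, step down (uniform) otherwise.
        running += w / total_hit if hit else -1.0 / n_miss
        walk.append(running)

    max_pos = max(0.0, max(walk))   # largest positive deviation
    max_neg = max(0.0, -min(walk))  # magnitude of the largest negative deviation

    if not mx_diff:
        # Single deviation furthest from 0, keeping its sign.
        return max_pos if max_pos >= max_neg else -max_neg
    if abs_rnk:
        # Both extremes count towards activation.
        return max_pos + max_neg
    # Default: magnitude difference between the two deviations.
    return max_pos - max_neg


features = ["g1", "g2", "g3", "g4", "g5", "g6"]
weights = [2.5, 1.5, 0.5, 0.5, 1.5, 2.5]

print(enrichment_score(features, weights, {"g1", "g2"}))                 # 1.0
print(enrichment_score(features, weights, {"g1", "g6"}))                 # 0.0
print(enrichment_score(features, weights, {"g1", "g6"}, abs_rnk=True))   # 1.0
print(enrichment_score(features, weights, {"g1", "g6"}, mx_diff=False))  # 0.5
```

The set {"g1", "g6"} sits at both ends of the ranking: with the default settings its positive and negative deviations cancel out, whereas abs_rnk=True scores it as strongly enriched.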

min_n : int

Minimum number of targets per source. Sources with fewer targets are removed.
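The effect of min_n can be pictured with a small stand-alone filter over a long-format network, written here as plain (source, target) pairs rather than a DataFrame. The helper drop_small_sources and the example network are hypothetical; the source does not state whether targets are counted before or after matching them against the features in mat, so this sketch simply counts the rows as given.

```python
from collections import Counter


def drop_small_sources(edges, min_n=5):
    """Keep only edges whose source has at least min_n targets."""
    # Assumes each (source, target) pair appears once.
    n_targets = Counter(source for source, _ in edges)
    return [(s, t) for s, t in edges if n_targets[s] >= min_n]


# Long-format network: one (source, target) pair per row.
edges = [("TF_A", "g1"), ("TF_A", "g2"), ("TF_A", "g3"), ("TF_B", "g1")]
print(drop_small_sources(edges, min_n=2))
# [('TF_A', 'g1'), ('TF_A', 'g2'), ('TF_A', 'g3')]
```

With min_n=2, TF_B is dropped because it has a single target, while TF_A keeps all three of its edges.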

seed : int

Random seed to use.

verbose : bool

Whether to show progress.

use_raw : bool

Use the raw attribute of mat if present.

Returns
estimate : DataFrame

GSVA scores. Stored in .obsm['gsva_estimate'] if mat is AnnData.