decoupler.run_gsva

decoupler.run_gsva(mat, net, source='source', target='target', kcdf=False, mx_diff=True, abs_rnk=False, min_n=5, seed=42, verbose=False, use_raw=True)

Gene Set Variation Analysis (GSVA).

GSVA (Hänzelmann et al., 2013) starts by transforming the input molecular readouts in mat to a readout-level statistic using a non-parametric estimate of the cumulative distribution function, optionally with a Gaussian kernel (see kcdf). Then, readout-level statistics are ranked per sample and normalized to up-weight the two tails of the rank distribution. Afterwards, an enrichment score gsva_estimate is calculated using a running sum statistic that is normalized by subtracting the largest negative estimate from the largest positive one.
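The first two steps can be sketched in plain Python. This is a simplified illustration, not decoupler's implementation: the helper names ecdf_statistic and symmetric_rank_weights are hypothetical, no kernel is used (the kcdf=False case), and centring the ranks on (p + 1) / 2 is an illustrative choice for up-weighting both tails.

```python
def ecdf_statistic(values):
    """Empirical CDF of one feature across samples (no kernel).

    For each sample, returns the fraction of samples whose value
    is less than or equal to that sample's value.
    """
    n = len(values)
    return [sum(v <= x for v in values) / n for x in values]


def symmetric_rank_weights(sample_stats):
    """Rank one sample's statistics and centre the ranks.

    Ranks run from 1 (lowest statistic) to p (highest). The weight
    |rank - (p + 1) / 2| is large at both ends of the ranking and
    small in the middle (assumption of this sketch).
    """
    p = len(sample_stats)
    order = sorted(range(p), key=lambda i: sample_stats[i])
    ranks = [0] * p
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    mid = (p + 1) / 2
    weights = [abs(r - mid) for r in ranks]
    return ranks, weights


# One feature measured in four samples.
feature_stats = ecdf_statistic([1.0, 3.0, 2.0, 4.0])

# Statistics of five features within a single sample.
ranks, weights = symmetric_rank_weights([0.25, 1.0, 0.5, 0.75, 0.1])
```

Here the features with the lowest and highest statistic both receive the largest weight, while the middle-ranked feature receives a weight of zero.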

Hänzelmann S. et al. (2013) GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics, 14, 7.

Parameters:

mat (list, DataFrame or AnnData): List of [features, matrix], dataframe (samples x features) or an AnnData instance.
net (DataFrame): Network in long format.
source (str): Column name in net with source nodes.
target (str): Column name in net with target nodes.
kcdf (bool): Whether to use a Gaussian kernel during the non-parametric estimation of the cumulative distribution function. By default no kernel is used, which is faster. Set to True to reproduce the original behaviour of GSVA in R.
mx_diff (bool): Changes how the enrichment statistic (ES) is calculated. If True (default), ES is calculated as the difference between the maximum positive and negative random walk deviations. If False, ES is calculated as the maximum deviation of the random walk from 0.
abs_rnk (bool): Only used when mx_diff = True. If False (default), the enrichment statistic (ES) is calculated as the difference in magnitude between the largest positive and the largest negative random walk deviations. If True, the two magnitudes are summed instead, so feature sets with features enriched on either extreme (high or low) will be regarded as 'highly' activated.
min_n (int): Minimum number of targets per source. Sources with fewer targets are removed.
seed (int): Random seed to use.
verbose (bool): Whether to show progress.
use_raw (bool): Use raw attribute of mat if present.
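How mx_diff and abs_rnk change the enrichment statistic can be illustrated with a small random walk. The sketch below is a simplified stand-in and not decoupler's code: random_walk_deviations and enrichment_score are hypothetical helpers, the weights are made-up tail-heavy values, and the abs_rnk=True branch assumes the two deviation magnitudes are summed, as in the R implementation of GSVA.

```python
def random_walk_deviations(ordered_features, weights, feature_set):
    """Running sum over one sample's ranked features.

    ordered_features: feature names, highest ranked first.
    weights: non-negative weight per feature, in the same order.
    feature_set: targets of one source.
    Returns (max_pos, max_neg), the largest positive and the largest
    negative deviation of the walk from zero (max_neg <= 0).
    """
    hits = [f in feature_set for f in ordered_features]
    hit_total = sum(w for w, h in zip(weights, hits) if h)
    n_miss = len(hits) - sum(hits)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("need weighted hits and at least one miss")
    running, max_pos, max_neg = 0.0, 0.0, 0.0
    for w, h in zip(weights, hits):
        # Step up on a hit (weighted), step down on a miss (uniform).
        running += w / hit_total if h else -1.0 / n_miss
        max_pos = max(max_pos, running)
        max_neg = min(max_neg, running)
    return max_pos, max_neg


def enrichment_score(max_pos, max_neg, mx_diff=True, abs_rnk=False):
    """Combine the two deviations according to mx_diff and abs_rnk."""
    if not mx_diff:
        # Maximum deviation from zero, keeping its sign.
        return max_pos if max_pos > abs(max_neg) else max_neg
    if abs_rnk:
        # Sum of magnitudes: enrichment at either tail counts.
        return max_pos - max_neg
    # Difference of magnitudes: opposite tails cancel out.
    return max_pos + max_neg


features = ["g1", "g2", "g3", "g4", "g5", "g6"]
weights = [3.0, 2.0, 1.0, 1.0, 2.0, 3.0]

# A set with one target at each extreme of the ranking.
max_pos, max_neg = random_walk_deviations(features, weights, {"g1", "g6"})
es_default = enrichment_score(max_pos, max_neg)               # 0.0
es_abs = enrichment_score(max_pos, max_neg, abs_rnk=True)     # 1.0
```

With the default settings the two tails cancel and the set scores 0, whereas abs_rnk=True treats the same set as strongly activated.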

Returns:

estimate (DataFrame): GSVA scores. Stored in .obsm['gsva_estimate'] if mat is AnnData.