decoupler.run_gsea

decoupler.run_gsea(mat, net, source='source', target='target', times=1000, batch_size=10000, min_n=5, seed=42, verbose=False, use_raw=True)

Gene Set Enrichment Analysis (GSEA).

GSEA (Subramanian et al., 2005) starts by transforming the input molecular readouts in mat to ranks for each sample. Then, an enrichment score gsea_estimate is calculated by walking down the ranked list of features, increasing a running-sum statistic when a feature in the target feature set is encountered and decreasing it when it is not. The final score is the maximum deviation from zero reached during this random walk. Finally, a normalized score gsea_norm can be obtained by computing the z-score of the estimate relative to a null distribution built from times random permutations.
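The running-sum walk described above can be sketched in pure Python for a single sample and a single feature set. This is a minimal illustration, not decoupler's internal code. It assumes the weighted step sizes of the original GSEA paper: each hit steps up in proportion to |readout|**weight, and each miss steps down by 1 / (number of misses). The function name enrichment_score and its weight argument are hypothetical.

```python
def enrichment_score(values, gene_set, weight=1.0):
    """Running-sum enrichment score for one sample (illustrative sketch).

    values: dict mapping feature -> readout for a single sample.
    gene_set: set of features targeted by one source.
    weight: exponent applied to |readout| for hits (0 gives an unweighted walk).
    """
    # Rank features from highest to lowest readout.
    ranked = sorted(values, key=values.get, reverse=True)
    hits = [f in gene_set for f in ranked]
    n_hits = sum(hits)
    n_miss = len(ranked) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene_set must contain some, but not all, features")
    # Hits step up in proportion to |readout|**weight; misses step down equally.
    hit_total = sum(abs(values[f]) ** weight for f, h in zip(ranked, hits) if h)
    if hit_total == 0:
        raise ValueError("all hit readouts are zero; use weight=0")
    running = 0.0
    best = 0.0
    for f, is_hit in zip(ranked, hits):
        if is_hit:
            running += abs(values[f]) ** weight / hit_total
        else:
            running -= 1.0 / n_miss
        # Keep the largest deviation from zero, together with its sign.
        if abs(running) > abs(best):
            best = running
    return best
```

A set concentrated at the top of the ranking yields a score near +1, and one concentrated at the bottom yields a score near -1.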

Subramanian A. et al. (2005) Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. PNAS. 102(43), 15545-15550.
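The normalization step in the description (a z-score against a permutation null) can be sketched generically. This is a minimal illustration under two assumptions: the null is built by randomly reassigning readouts to features (feature-label permutation), and score_fn is any function of (values, gene_set), such as a running-sum enrichment score. The names permutation_zscore and score_fn are hypothetical. The times and seed arguments mirror the run_gsea parameters in spirit only; this is not decoupler's implementation.

```python
import random
import statistics


def permutation_zscore(score_fn, values, gene_set, times=1000, seed=42):
    """Return (observed score, z-score against a permutation null). Sketch only."""
    observed = score_fn(values, gene_set)
    rng = random.Random(seed)  # seeded so repeated runs produce the same null
    features = list(values)
    readouts = list(values.values())
    null = []
    for _ in range(times):
        rng.shuffle(readouts)  # randomly reassign readouts to features
        null.append(score_fn(dict(zip(features, readouts)), gene_set))
    mu = statistics.fmean(null)
    sd = statistics.stdev(null)  # requires times >= 2
    if sd == 0:
        return observed, float("nan")  # degenerate null: z-score undefined
    return observed, (observed - mu) / sd
```

Larger values of times give a more stable estimate of the null mean and standard deviation at proportionally higher cost, which is the trade-off the times parameter controls.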

Parameters
mat : list, DataFrame or AnnData

A list of [features, matrix], a DataFrame (samples x features) or an AnnData instance.

net : DataFrame

Network in long format, with one row per source-target interaction.

source : str

Column name in net with source nodes.

target : str

Column name in net with target nodes.

times : int

Number of random permutations used to build the null distribution.

batch_size : int

Number of samples processed in each batch. Increasing it consumes more memory but runs faster.

min_n : int

Minimum number of targets per source. Sources with fewer targets are removed.

seed : int

Random seed to use.

verbose : bool

Whether to show progress.

use_raw : bool

Whether to use the raw attribute of mat, if present.
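The net, source, target and min_n parameters can be illustrated by grouping a long-format network into feature sets. This sketch represents the network as a list of dicts standing in for DataFrame rows. It also assumes that only targets actually measured in mat count toward min_n; treat that as an assumption rather than documented behavior. The function build_gene_sets and its features argument are hypothetical.

```python
from collections import defaultdict


def build_gene_sets(net_rows, source="source", target="target", min_n=5, features=None):
    """Group long-format rows into source -> set of targets (illustrative sketch).

    net_rows: iterable of dicts, one per source-target interaction.
    features: optional set of measured features; if given, only these
              targets are kept (an assumption, not documented behavior).
    """
    sets = defaultdict(set)
    for row in net_rows:
        tgt = row[target]
        if features is None or tgt in features:
            sets[row[source]].add(tgt)
    # Drop sources whose target set is smaller than min_n.
    return {src: tgts for src, tgts in sets.items() if len(tgts) >= min_n}
```

For example, a source with six targets passes the default min_n=5, while one with three targets is removed.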

Returns
estimate : DataFrame

GSEA scores. Stored in .obsm['gsea_estimate'] if mat is AnnData.

norm : DataFrame

Normalized GSEA scores. Stored in .obsm['gsea_norm'] if mat is AnnData.

pvals : DataFrame

Obtained p-values. Stored in .obsm['gsea_pvals'] if mat is AnnData.