- decoupler.run_gsea(mat, net, source='source', target='target', times=1000, batch_size=10000, min_n=5, seed=42, verbose=False, use_raw=True)
Gene Set Enrichment Analysis (GSEA).
GSEA (Subramanian et al., 2005) starts by transforming the input molecular readouts in mat into ranks for each sample. Then an enrichment score, gsea_estimate, is calculated by walking down the ranked list of features, increasing a running-sum statistic when a feature in the target feature set is encountered and decreasing it when it is not. The final score is the maximum deviation from zero reached during this random walk. Finally, a normalized score, gsea_norm, is obtained by computing the z-score of the estimate relative to a null distribution built from N random permutations (N is set by times).
Subramanian A. et al. (2005) Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. PNAS, 102(43).
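The running-sum walk and its permutation-based normalization can be sketched in plain NumPy. This is a minimal illustration, not decoupler's implementation: it assumes an unweighted (Kolmogorov-Smirnov-style) walk and a null built by shuffling which features belong to the set. The original GSEA paper weights each hit by the magnitude of its score, and decoupler's exact weighting is not stated here. The names running_sum_es and normalized_es are hypothetical.

```python
import numpy as np


def running_sum_es(scores, in_set):
    """Unweighted running-sum enrichment score for a single sample.

    scores: 1-D array of feature values for one sample.
    in_set: boolean mask (same length) marking features in the target set.
    """
    scores = np.asarray(scores, dtype=float)
    in_set = np.asarray(in_set, dtype=bool)
    n_hit = int(in_set.sum())
    n_miss = in_set.size - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("the set must contain some, but not all, features")
    # Rank features from highest to lowest value.
    order = np.argsort(-scores, kind="stable")
    hits = in_set[order]
    # Step up on a hit, down on a miss; each side totals 1, so the walk ends near 0.
    steps = np.where(hits, 1.0 / n_hit, -1.0 / n_miss)
    walk = np.cumsum(steps)
    # Enrichment score: the signed maximum deviation from zero.
    return float(walk[np.argmax(np.abs(walk))])


def normalized_es(scores, in_set, times=1000, seed=42):
    """Z-score the observed ES against ES values from shuffled set memberships."""
    rng = np.random.default_rng(seed)
    in_set = np.asarray(in_set, dtype=bool)
    es = running_sum_es(scores, in_set)
    # Assumed null: permute which features are in the set, `times` times.
    null = np.array([running_sum_es(scores, rng.permutation(in_set))
                     for _ in range(times)])
    return es, float((es - null.mean()) / null.std())


scores = np.arange(10, 0, -1)   # feature values 10, 9, ..., 1
top = np.zeros(10, dtype=bool)
top[:3] = True                  # set = the three highest-valued features
es, z = normalized_es(scores, top, times=200)
print(es, z > 0)
```

A set concentrated at the top of the ranking drives the walk to its maximum early and yields a positive estimate, while a set at the bottom yields a negative one. The z-score expresses how far that estimate sits from what random sets of the same size produce.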
Parameters:
- mat : list, DataFrame or AnnData
List of [features, matrix], DataFrame (samples x features) or an AnnData instance.
- net : DataFrame
Network in long format.
- source : str
Column name in net with source nodes.
- target : str
Column name in net with target nodes.
- times : int
Number of random permutations to perform.
- batch_size : int
Number of samples to process per batch. Increasing it consumes more memory but runs faster.
- min_n : int
Minimum number of targets per source. Sources with fewer targets are removed.
- seed : int
Random seed to use.
- verbose : bool
Whether to show progress.
- use_raw : bool
Use the raw attribute of mat if present.
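How the net, source, target, min_n and batch_size parameters interact can be sketched without any dependencies. decoupler itself takes net as a pandas DataFrame; here a list of dicts stands in for its rows. The helpers build_gene_sets and iter_batches are hypothetical, and dropping targets that are absent from mat before applying min_n is an assumption about the filtering order.

```python
from collections import defaultdict


def build_gene_sets(net, features, source="source", target="target", min_n=5):
    """Group a long-format network into {source: set of targets}.

    net: list of dicts, one edge per row (standing in for a long-format DataFrame).
    Targets not among `features` are ignored (assumed behavior), and sources
    left with fewer than `min_n` targets are removed.
    """
    measured = set(features)
    sets = defaultdict(set)
    for row in net:
        if row[target] in measured:
            sets[row[source]].add(row[target])
    return {src: tgts for src, tgts in sets.items() if len(tgts) >= min_n}


def iter_batches(n_samples, batch_size=10000):
    """Yield consecutive ranges of sample indices, each at most batch_size long."""
    for start in range(0, n_samples, batch_size):
        yield range(start, min(start + batch_size, n_samples))


features = [f"g{i}" for i in range(10)]
net = [{"source": "TF1", "target": f"g{i}"} for i in range(6)]
net += [{"source": "TF2", "target": "g0"}, {"source": "TF2", "target": "g1"}]
gene_sets = build_gene_sets(net, features, min_n=5)
print(sorted(gene_sets))                                  # ['TF1'] (TF2 has only 2 targets)
print([len(b) for b in iter_batches(25, batch_size=10)])  # [10, 10, 5]
```

Larger batches mean fewer, larger array operations per pass, which is the memory-for-speed trade-off described for batch_size.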
Returns:
- estimate : DataFrame
GSEA scores. Stored in .obsm['gsea_estimate'] if mat is AnnData.
- norm : DataFrame
Normalized GSEA scores. Stored in .obsm['gsea_norm'] if mat is AnnData.
- pvals : DataFrame
Obtained p-values. Stored in .obsm['gsea_pvals'] if mat is AnnData.