- decoupler.run_gsea(mat, net, source='source', target='target', times=1000, batch_size=10000, min_n=5, seed=42, verbose=False, use_raw=True)
Gene Set Enrichment Analysis (GSEA).
GSEA (Subramanian et al., 2005) first transforms the input molecular readouts in mat into ranks for each sample. An enrichment score, gsea_estimate, is then calculated by walking down the ranked list of features, increasing a running-sum statistic when a feature belongs to the target feature set and decreasing it when it does not. The final score is the maximum deviation from zero encountered in this random walk. Finally, a normalized score, gsea_norm, is obtained by computing the z-score of the estimate against a null distribution built from random permutations, whose number is set by times.
Subramanian A. et al. (2005) Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. PNAS, 102(43), 15545-15550.
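The procedure described above can be illustrated with a minimal NumPy sketch for a single sample and a single feature set. This is not decoupler's implementation: the helper names enrichment_score and gsea_sketch are hypothetical, weighting each hit by the absolute value of the readout is an assumption borrowed from the standard weighted GSEA statistic, and the null distribution is built here by shuffling set membership.

```python
import numpy as np

def enrichment_score(values, in_set):
    """Running-sum statistic for one sample and one feature set (illustrative)."""
    values = np.asarray(values, dtype=float)
    in_set = np.asarray(in_set, dtype=bool)
    order = np.argsort(-values)                # rank features, highest readout first
    hits = in_set[order]
    weights = np.abs(values[order]) * hits     # assumption: hits weighted by |value|
    step_up = weights / weights.sum()          # increments when a feature is in the set
    step_down = (~hits) / (~hits).sum()        # decrements when it is not
    running = np.cumsum(step_up - step_down)   # the random walk
    return running[np.argmax(np.abs(running))] # signed maximum deviation from zero

def gsea_sketch(values, in_set, times=1000, seed=42):
    """Return (estimate, z-score against a permutation null)."""
    rng = np.random.default_rng(seed)
    in_set = np.asarray(in_set, dtype=bool)
    es = enrichment_score(values, in_set)
    # Null distribution: recompute the score with shuffled set membership.
    null = np.array([enrichment_score(values, rng.permutation(in_set))
                     for _ in range(times)])
    norm = (es - null.mean()) / null.std()
    return es, norm

# Toy usage: the two highest-ranked features form the set.
values = np.array([10.0, 8.0, 6.0, 4.0, 2.0, 1.0])
in_set = np.array([True, True, False, False, False, False])
estimate, norm = gsea_sketch(values, in_set, times=200, seed=0)
```

A set concentrated at the top of the ranking yields a positive estimate, and one concentrated at the bottom yields a negative estimate. The sketch assumes the set contains at least one feature with a non-zero readout and does not cover every feature.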
- Parameters:
- mat : list, DataFrame or AnnData
List of [features, matrix], dataframe (samples x features) or an AnnData instance.
- net : DataFrame
Network in long format.
- source : str
Column name in net with source nodes.
- target : str
Column name in net with target nodes.
- times : int
Number of random permutations used to build the null distribution.
- batch_size : int
Deprecated argument.
- min_n : int
Minimum number of targets per source. Sources with fewer targets are removed.
- seed : int
Random seed to use.
- verbose : bool
Whether to show progress.
- use_raw : bool
Use raw attribute of mat if present.
- Returns:
- estimate : DataFrame
GSEA scores. Stored in .obsm['gsea_estimate'] if mat is AnnData.
- norm : DataFrame
Normalized GSEA scores. Stored in .obsm['gsea_norm'] if mat is AnnData.
- pvals : DataFrame
P-values of the scores. Stored in .obsm['gsea_pvals'] if mat is AnnData.
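As a rough illustration of the expected inputs, the sketch below builds a samples x features DataFrame for mat and a long-format net, then mimics the min_n filter with pandas alone. All sample, gene and set names are made up, and counting only targets that are present in mat is an assumption about how the filter behaves. The call to run_gsea is left as a comment so the snippet runs without decoupler installed.

```python
import pandas as pd

# mat: samples x features (toy values, hypothetical gene names).
mat = pd.DataFrame(
    [[3.1, 0.2, 5.0, 1.4, 2.2, 0.7],
     [0.5, 4.8, 1.1, 3.9, 0.3, 2.6]],
    index=["S1", "S2"],
    columns=["G1", "G2", "G3", "G4", "G5", "G6"],
)

# net: one row per (source, target) pair, using the default column names.
net = pd.DataFrame({
    "source": ["SetA"] * 5 + ["SetB"] * 2,
    "target": ["G1", "G2", "G3", "G4", "G5", "G1", "G6"],
})

# Mimic the min_n filter: drop sources with fewer than min_n targets.
# Assumption: only targets that appear as features in mat are counted.
min_n = 5
present = net[net["target"].isin(mat.columns)]
n_targets = present.groupby("source")["target"].nunique()
kept = n_targets[n_targets >= min_n].index.tolist()

# With decoupler installed, the documented call would then look like:
# estimate, norm, pvals = decoupler.run_gsea(mat, net, times=1000, min_n=5, seed=42)
```

Here only SetA has at least five targets, so SetB would be dropped before scoring.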