decoupler.get_pseudobulk

decoupler.get_pseudobulk(adata, sample_col, groups_col, obs=None, layer=None, use_raw=False, mode='sum', min_prop=0.2, min_cells=10, min_counts=1000, min_smpls=2, dtype=numpy.float32, skip_checks=False)

Summarizes expression profiles across cells per sample and group.

Generates summarized expression profiles across cells per sample (e.g. sample id) and group (e.g. cell type) based on the metadata found in .obs. As basic quality control, this function removes genes that are expressed in too few cells (min_prop) or too few samples (min_smpls), and removes samples that have too few cells (min_cells) or too few total counts (min_counts).

By default this function expects raw integer counts as input and sums them per sample and group (mode='sum'), but other modes are available.
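The aggregation step can be illustrated without any dependencies. The following is a minimal sketch of the idea only, not decoupler's implementation: the toy data and the helper name toy_pseudobulk are hypothetical, and the real function operates on an AnnData object rather than on Python lists.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical toy data: one entry per cell as (sample, group, counts per gene).
cells = [
    ("s1", "T", [1, 0, 3]),
    ("s1", "T", [2, 0, 1]),
    ("s1", "B", [0, 5, 0]),
    ("s2", "T", [4, 1, 0]),
]


def toy_pseudobulk(cells, mode="sum"):
    """Aggregate per-cell counts into one profile per (sample, group) pair."""
    reducers = {"sum": sum, "mean": mean, "median": median}
    reduce_fn = reducers[mode]

    # Collect the count vectors of all cells sharing a sample and group.
    grouped = defaultdict(list)
    for sample, group, counts in cells:
        grouped[(sample, group)].append(counts)

    # zip(*rows) transposes cells-by-genes into genes-by-cells,
    # so each gene is reduced across the cells of its pair.
    return {
        key: [reduce_fn(gene_values) for gene_values in zip(*rows)]
        for key, rows in grouped.items()
    }


profiles = toy_pseudobulk(cells, mode="sum")
```

Here the two T cells of sample s1 collapse into a single profile, while each remaining (sample, group) pair keeps its single cell's counts. With decoupler installed, the corresponding call would follow the signature above, e.g. decoupler.get_pseudobulk(adata, sample_col='sample', groups_col='cell_type', mode='sum'), where the column names 'sample' and 'cell_type' are placeholders for columns in your own .obs.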

Parameters:

adata (AnnData): Input AnnData object.
sample_col (str): Column of obs from which to extract the sample names.
groups_col (str): Column of obs from which to extract the group names.
obs (DataFrame, None): If provided, metadata dataframe.
layer (str): If provided, which element of layers to use.
use_raw (bool): Use raw attribute of adata if present.
mode (str): How to aggregate cells into pseudobulk profiles. Available options are sum, mean or median.
min_prop (float): Filter to remove genes by a minimum proportion of cells with non-zero values.
min_cells (int): Filter to remove samples by a minimum number of cells.
min_counts (int): Filter to remove samples by a minimum number of summed counts.
min_smpls (int): Filter to remove genes by a minimum number of samples with non-zero values.
dtype (type): Type of float used.
skip_checks (bool): Whether to skip input checks. Set to True when working with positive and negative data, or when counts are not integers.
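
To make the sample-level thresholds concrete, the sketch below applies min_cells and min_counts to a few hypothetical pseudobulk profiles. It is an illustration under assumptions, not the library's code: the helper keep_profile and the numbers are made up, and treating the thresholds as inclusive (a profile with exactly min_cells cells is kept) is an assumption based on the word "minimum" in the parameter descriptions.

```python
# Hypothetical summary per (sample, group) profile: (number of cells, summed counts).
profile_stats = {
    ("s1", "T"): (25, 4000),
    ("s1", "B"): (4, 1500),   # too few cells
    ("s2", "T"): (40, 600),   # too few summed counts
    ("s2", "B"): (10, 1000),  # exactly at both thresholds
}


def keep_profile(n_cells, total_counts, min_cells=10, min_counts=1000):
    """Return True if a profile passes both sample-level filters.

    Assumption: thresholds are inclusive minimums.
    """
    return n_cells >= min_cells and total_counts >= min_counts


kept = [
    key
    for key, (n_cells, total_counts) in profile_stats.items()
    if keep_profile(n_cells, total_counts)
]
```

With the default thresholds, only the profiles for (s1, T) and (s2, B) survive; the gene-level filters (min_prop, min_smpls) are applied separately and are not modeled here.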

Returns:

psbulk (AnnData): New AnnData object with unnormalized pseudobulk profiles per sample and group.