decoupler.get_pseudobulk
decoupler.get_pseudobulk(adata, sample_col, groups_col, obs=None, layer=None, use_raw=False, mode='sum', min_prop=0.2, min_cells=10, min_counts=1000, min_smpls=2, dtype=numpy.float32, skip_checks=False)
Summarizes expression profiles across cells per sample and group.
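As a rough illustration of what summarizing per sample and group means, the NumPy sketch below collapses a cells x genes matrix into one row per (sample, group) pair, using sum, mean or median. The helper pseudobulk_sketch and the "sample_group" naming of the output rows are hypothetical choices for this example. This is not decoupler's implementation and it omits the quality-control filters described below.

```python
import numpy as np

# Hypothetical helper for illustration; not part of decoupler's API.
def pseudobulk_sketch(X, samples, groups, mode="sum"):
    """Aggregate a cells x genes matrix into one row per (sample, group) pair."""
    reducers = {"sum": np.sum, "mean": np.mean, "median": np.median}
    if mode not in reducers:
        raise ValueError(f"mode must be one of {sorted(reducers)}")
    X = np.asarray(X, dtype=np.float32)
    samples, groups = np.asarray(samples), np.asarray(groups)
    profiles, names = [], []
    # Sorting the unique (sample, group) pairs gives a deterministic row order.
    for smp, grp in sorted(set(zip(samples.tolist(), groups.tolist()))):
        mask = (samples == smp) & (groups == grp)
        # Reduce over cells (axis 0), leaving one value per gene.
        profiles.append(reducers[mode](X[mask], axis=0))
        names.append(f"{smp}_{grp}")  # assumed naming scheme
    return np.vstack(profiles), names
```

For example, four cells from samples s1 and s2 annotated as T or B cells yield three profiles (s1_T, s2_B, s2_T), each holding the per-gene sum of its cells.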
Generates summarized expression profiles across cells per sample (e.g. sample ID) and group (e.g. cell type) based on the metadata found in .obs. To ensure a minimum level of quality control, this function removes genes that are not expressed enough across cells (min_prop) or samples (min_smpls), as well as samples with too few cells (min_cells) or too few summed counts (min_counts).
By default, this function expects raw integer counts as input and sums them per sample and group (mode='sum'), but other modes are available.
- Parameters:
- adata (AnnData)
Input AnnData object.
- sample_col (str)
Column of obs from which to extract the sample names.
- groups_col (str)
Column of obs from which to extract the group names.
- obs (DataFrame, None)
If provided, a metadata DataFrame.
- layer (str)
If provided, which element of layers to use.
- use_raw (bool)
Use the raw attribute of adata, if present.
- mode (str)
How to perform the pseudobulk. Available options are 'sum', 'mean' or 'median'.
- min_prop (float)
Minimum proportion of cells with non-zero values required to keep a gene.
- min_cells (int)
Minimum number of cells required to keep a sample.
- min_counts (int)
Minimum number of summed counts required to keep a sample.
- min_smpls (int)
Minimum number of samples with non-zero values required to keep a gene.
- dtype (type)
Type of float used.
- skip_checks (bool)
Whether to skip input checks. Set to True when the data contain negative values or when counts are not integers.
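The quality-control parameters above can be approximated with the following hypothetical NumPy sketch (qc_pseudobulk_sketch is not a decoupler function). It assumes one plausible reading of the documented parameters: a (sample, group) profile is dropped if it has fewer than min_cells cells or fewer than min_counts summed counts, and a gene is kept only if at least min_smpls of the remaining profiles have at least min_prop of their cells expressing it. The input check assumes that what skip_checks bypasses is a rejection of negative or non-integer values. decoupler's exact rules and order of operations may differ.

```python
import numpy as np

# Hypothetical sketch of the QC filters; not decoupler's implementation.
def qc_pseudobulk_sketch(X, samples, groups, min_prop=0.2, min_cells=10,
                         min_counts=1000, min_smpls=2, skip_checks=False):
    X = np.asarray(X)
    if not skip_checks:
        # Assumed check: raw counts must be non-negative and integer-valued.
        if np.any(X < 0) or not np.all(X % 1 == 0):
            raise ValueError("expected non-negative integer counts; "
                             "set skip_checks=True to bypass")
    samples, groups = np.asarray(samples), np.asarray(groups)
    sums, props, names = [], [], []
    for smp, grp in sorted(set(zip(samples.tolist(), groups.tolist()))):
        cells = X[(samples == smp) & (groups == grp)]
        # Sample-level filters: enough cells and enough summed counts.
        if cells.shape[0] < min_cells or cells.sum() < min_counts:
            continue
        sums.append(cells.sum(axis=0))
        # Fraction of this profile's cells in which each gene is non-zero.
        props.append((cells > 0).mean(axis=0))
        names.append(f"{smp}_{grp}")
    if not names:
        return np.empty((0, 0)), [], np.zeros(X.shape[1], dtype=bool)
    sums, props = np.vstack(sums), np.vstack(props)
    # Gene-level filter: expressed in >= min_prop of cells in >= min_smpls profiles.
    keep_genes = (props >= min_prop).sum(axis=0) >= min_smpls
    return sums[:, keep_genes], names, keep_genes
```

Under this reading, a profile built from a single cell is discarded whenever min_cells is greater than 1, regardless of how many counts that cell has.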
- Returns:
- psbulk (AnnData)
A new AnnData object with unnormalized pseudobulk profiles per sample and group.
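Because the returned profiles are unnormalized (raw sums under the default mode='sum'), profiles built from many cells have larger totals than those built from few. The sketch below shows one common way to put profiles on a comparable scale before downstream analysis, log counts per million. The log_cpm helper is hypothetical and this step is not performed by get_pseudobulk; whether it suits a given analysis, as opposed to the normalization built into a dedicated differential expression tool, is left to the user.

```python
import numpy as np

# Hypothetical downstream step; get_pseudobulk itself returns raw profiles.
def log_cpm(counts):
    """Scale each pseudobulk profile (row) to counts per million, then log1p."""
    counts = np.asarray(counts, dtype=np.float64)
    # Library size per profile; assumes every row has a non-zero total,
    # which a min_counts filter would normally guarantee.
    lib_size = counts.sum(axis=1, keepdims=True)
    return np.log1p(counts / lib_size * 1e6)
```

Two profiles with the same composition but a tenfold difference in depth, such as [1, 3] and [10, 30], map to the same values after this scaling.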