decoupler.get_pseudobulk
- decoupler.get_pseudobulk(adata, sample_col, groups_col, obs=None, layer=None, use_raw=False, mode='sum', min_cells=10, min_counts=1000, dtype=<class 'numpy.float32'>, skip_checks=False, min_prop=None, min_smpls=None, remove_empty=True)
Summarizes expression profiles across cells per sample and group.
Generates summarized expression profiles across cells per sample (e.g. sample id) and group (e.g. cell type) based on the metadata found in .obs. To ensure a minimum level of quality control, this function removes genes that are not expressed enough across cells (min_prop) or samples (min_smpls), and samples with too few cells (min_cells) or gene counts (min_counts).
By default this function expects raw integer counts as input and sums them per sample and group (mode='sum'), but other modes are available.
This function also produces quality control metrics to help assess whether some samples need to be filtered. The number of cells that belong to each sample is stored in .obs['psbulk_n_cells'], the total sum of counts per sample in .obs['psbulk_counts'], and the proportion of cells that express a given gene in .layers['psbulk_props'].
- Parameters:
- adata : AnnData
Input AnnData object.
- sample_col : str
Column of obs from which to extract the sample names.
- groups_col : str
Column of obs from which to extract the group names. Can be set to None to ignore groups.
- obs : DataFrame, None
If provided, metadata dataframe.
- layer : str
If provided, which element of layers to use.
- use_raw : bool
Use raw attribute of adata if present.
- mode : str
How to perform the pseudobulk. Available options are sum, mean or median. It also accepts callback functions, such as a lambda, to perform custom aggregations. It is also possible to provide a dictionary of callback functions, in which case each result is stored in a different layer of the output and the result of the first callback function in the dictionary is stored in .X by default. To switch between layers, check decoupler.swap_layer.
- min_cells : int
Filter to remove samples by a minimum number of cells in a sample-group pair.
- min_counts : int
Filter to remove samples by a minimum number of summed counts in a sample-group pair.
- dtype : type
Type of float used.
- skip_checks : bool
Whether to skip input checks. Set to True when working with positive and negative data, or when counts are not integers.
- min_prop : float
Filter to remove features by a minimum proportion of cells with non-zero values. Deprecated parameter, check decoupler.filter_by_prop.
- min_smpls : int
Filter to remove genes by a minimum number of samples with non-zero values. Deprecated parameter, check decoupler.filter_by_prop.
- remove_empty : bool
Whether to remove empty observations (rows) or features (columns).
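The mode options can be pictured as different per-gene reductions over the cells of a single sample-group pair. The following is a minimal plain-Python sketch of that idea, not decoupler's implementation: the helper aggregate, the layer name "custom" and the toy matrix are hypothetical, and the handling of a dictionary (one named result per callback, with the first entry acting as the default) is an assumption that only mirrors the description of mode above.

```python
from statistics import mean, median

# Built-in reductions keyed by the names that the mode parameter accepts.
BUILTIN_MODES = {"sum": sum, "mean": mean, "median": median}


def aggregate(cells, mode="sum"):
    """Reduce per-cell count vectors to one profile per aggregation.

    cells is a list of equal-length lists (cells x genes). mode may be a
    name, a callable that receives the values of one gene, or a dict of
    named callables. Returns a dict mapping layer name -> profile.
    """
    if isinstance(mode, dict):
        funcs = dict(mode)
    elif callable(mode):
        funcs = {"custom": mode}  # hypothetical layer name
    else:
        funcs = {mode: BUILTIN_MODES[mode]}
    genes = list(zip(*cells))  # transpose: one tuple of values per gene
    return {name: [func(gene) for gene in genes] for name, func in funcs.items()}


# Three cells from one sample-group pair, measured on two genes.
cells = [[1, 0], [3, 0], [8, 6]]

print(aggregate(cells, "sum"))
print(aggregate(cells, "median"))

# Dictionary of callbacks: every entry yields its own layer, and the first
# one plays the role that .X has in the description above.
layers = aggregate(cells, {"sum": sum, "max": max})
default_layer = next(iter(layers))
print(default_layer, layers[default_layer])
```

Because Python dictionaries preserve insertion order, the "first callback" is simply the first key that was written in the dictionary literal.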
- Returns:
- psbulk : AnnData
Returns a new AnnData object with unnormalized pseudobulk profiles per sample and group. It also returns quality control metrics that start with the prefix psbulk_.
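To make the returned quantities concrete, here is a small plain-Python sketch of sum-mode pseudobulking with the two sample filters and the three quality control metrics. It is an illustration under stated assumptions rather than decoupler's code: the function pseudobulk_sum and the toy data are hypothetical, the result is a plain dict instead of an AnnData object, and treating min_cells and min_counts as inclusive minimums is an assumption.

```python
from collections import defaultdict


def pseudobulk_sum(counts, samples, groups, min_cells=10, min_counts=1000):
    """Sum raw counts per (sample, group) pair and filter weak pairs.

    counts  : list of per-cell count vectors (cells x genes)
    samples : sample label of each cell
    groups  : group label (e.g. cell type) of each cell
    Returns a dict keyed by (sample, group) holding the summed profile and
    the psbulk_ quality control metrics.
    """
    # Collect the cells that belong to each sample-group pair.
    members = defaultdict(list)
    for row, sample, group in zip(counts, samples, groups):
        members[(sample, group)].append(row)

    result = {}
    for key, rows in members.items():
        n_cells = len(rows)
        per_gene = list(zip(*rows))  # one tuple of counts per gene
        profile = [sum(gene) for gene in per_gene]
        total = sum(profile)
        # Assumed semantics: keep pairs that reach both minimums.
        if n_cells < min_cells or total < min_counts:
            continue
        # Proportion of cells with a non-zero count for each gene.
        props = [sum(v > 0 for v in gene) / n_cells for gene in per_gene]
        result[key] = {
            "X": profile,
            "psbulk_n_cells": n_cells,
            "psbulk_counts": total,
            "psbulk_props": props,
        }
    return result


# Four cells, three genes; only the (s1, T) pair has at least two cells.
counts = [[5, 0, 1], [3, 0, 0], [2, 1, 0], [0, 0, 1]]
samples = ["s1", "s1", "s1", "s2"]
groups = ["T", "T", "B", "T"]

out = pseudobulk_sum(counts, samples, groups, min_cells=2, min_counts=5)
for key, entry in out.items():
    print(key, entry)
```

With decoupler itself, the corresponding call on an AnnData object would follow the signature at the top of this page, for example decoupler.get_pseudobulk(adata, sample_col='sample', groups_col='cell_type', mode='sum', min_cells=10, min_counts=1000), where the column names sample and cell_type are placeholders for whatever .obs actually contains.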