decoupler.get_pseudobulk

decoupler.get_pseudobulk(adata, sample_col, groups_col, obs=None, layer=None, use_raw=False, mode='sum', min_cells=10, min_counts=1000, dtype=numpy.float32, skip_checks=False, min_prop=None, min_smpls=None, remove_empty=True)

Summarizes expression profiles across cells per sample and group.

Generates summarized expression profiles across cells per sample (e.g. sample id) and group (e.g. cell type) based on the metadata found in .obs. As a basic quality control step, this function removes genes that are not expressed in enough cells (min_prop) or samples (min_smpls), and samples with too few cells (min_cells) or too few total counts (min_counts).

By default this function expects raw integer counts as input and sums them per sample and group (mode='sum'), but other modes are available.
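To make the aggregation step concrete, here is a minimal pure-Python sketch of summing counts per sample and group. It is not the library's implementation: the `pseudobulk` helper, the `MODES` table and the toy `cells` data are all hypothetical, and a plain list of tuples stands in for an AnnData object.

```python
from collections import defaultdict
from statistics import mean, median

# Toy data (hypothetical): one entry per cell as (sample, group, counts per gene).
cells = [
    ("s1", "T", [1, 0, 3]),
    ("s1", "T", [2, 1, 0]),
    ("s1", "B", [0, 4, 1]),
    ("s2", "T", [5, 0, 2]),
]

# Named modes mirror the options listed for the mode parameter.
MODES = {"sum": sum, "mean": mean, "median": median}


def pseudobulk(cells, mode="sum"):
    """Aggregate per-cell counts into one profile per (sample, group) pair."""
    # A string selects a named aggregation; any other callable is used as given.
    agg = MODES[mode] if isinstance(mode, str) else mode
    by_pair = defaultdict(list)
    for sample, group, counts in cells:
        by_pair[(sample, group)].append(counts)
    # zip(*rows) transposes cells x genes into genes x cells.
    return {
        pair: [agg(gene) for gene in zip(*rows)]
        for pair, rows in by_pair.items()
    }


profiles = pseudobulk(cells, mode="sum")
```

Passing a callable such as `max` instead of a mode name follows the same path, which is roughly the idea behind accepting callback functions for custom aggregations.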

This function produces quality control metrics that help assess whether some samples need to be filtered out. The number of cells that belong to each sample is stored in .obs['psbulk_n_cells'], the total sum of counts per sample in .obs['psbulk_counts'], and the proportion of cells that express a given gene in .layers['psbulk_props'].
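The three metrics can be pictured with a small pure-Python sketch for a single sample-group profile. The `qc_metrics` helper and the toy matrix are hypothetical; only the key names are borrowed from the description above, and the real function stores these values on the returned AnnData object rather than in a dictionary.

```python
def qc_metrics(rows):
    """Compute QC metrics for one sample-group pair.

    rows is a list of cells, each a list of per-gene counts.
    """
    n_cells = len(rows)
    total_counts = sum(sum(row) for row in rows)
    # Proportion of cells with a non-zero value, computed gene by gene.
    props = [
        sum(1 for value in gene if value != 0) / n_cells
        for gene in zip(*rows)
    ]
    return {
        "psbulk_n_cells": n_cells,
        "psbulk_counts": total_counts,
        "psbulk_props": props,
    }


# Four cells by three genes (toy values).
rows = [
    [1, 0, 3],
    [2, 1, 0],
    [0, 0, 4],
    [3, 0, 1],
]
metrics = qc_metrics(rows)
```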

Parameters:
adata : AnnData

Input AnnData object.

sample_col : str

Column of .obs from which to extract the sample names.

groups_col : str

Column of .obs from which to extract the group names. Can be set to None to ignore groups.

obs : DataFrame, None

If provided, metadata dataframe.

layer : str

If provided, which element of layers to use.

use_raw : bool

Use the .raw attribute of adata if present.

mode : str

How to perform the pseudobulk. Available options are sum, mean or median. It also accepts callback functions, such as lambdas, to perform custom aggregations. It is also possible to provide a dictionary of callback functions, in which case the result of each one is stored in a separate layer of the output and the result of the first callback function in the dictionary is stored in .X by default. To switch between layers, see decoupler.swap_layer.

min_cells : int

Minimum number of cells a sample-group pair must contain; pairs with fewer cells are removed.

min_counts : int

Minimum number of summed counts a sample-group pair must reach; pairs with fewer counts are removed.

dtype : type

Type of float used.

skip_checks : bool

Whether to skip input checks. Set to True when working with positive and negative data, or when counts are not integers.

min_prop : float

Filter to remove features by a minimum proportion of cells with non-zero values. Deprecated parameter; see decoupler.filter_by_prop.

min_smpls : int

Filter to remove genes by a minimum number of samples with non-zero values. Deprecated parameter; see decoupler.filter_by_prop.

remove_empty : bool

Whether to remove empty observations (rows) or features (columns).

Returns:
psbulk : AnnData

New AnnData object with unnormalized pseudobulk profiles per sample and group. It also contains quality control metrics whose names start with the prefix psbulk_.
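The effect of the min_cells and min_counts filters can be sketched as a simple threshold check on per-profile metrics. This is an illustration only: the `passing_pairs` helper and the `qc` table are hypothetical, and treating both thresholds as inclusive (`>=`) is an assumption rather than something this page states.

```python
# Hypothetical QC table: (sample, group) -> number of cells and summed counts.
qc = {
    ("s1", "T"): {"n_cells": 120, "counts": 54000},
    ("s1", "B"): {"n_cells": 4, "counts": 2100},
    ("s2", "T"): {"n_cells": 35, "counts": 800},
}


def passing_pairs(qc, min_cells=10, min_counts=1000):
    """Return the sample-group pairs that satisfy both thresholds."""
    return [
        pair
        for pair, m in qc.items()
        # Assumption: a pair sitting exactly on a threshold is kept.
        if m["n_cells"] >= min_cells and m["counts"] >= min_counts
    ]


kept = passing_pairs(qc)
```

With the default thresholds taken from the signature, only the first pair survives: the second has too few cells and the third too few counts. Lowering both thresholds keeps every profile.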