decoupler.get_pseudobulk

decoupler.get_pseudobulk(adata, sample_col, groups_col, obs=None, layer=None, use_raw=False, mode='sum', min_cells=10, min_counts=1000, dtype=<class 'numpy.float32'>, skip_checks=False, min_prop=None, min_smpls=None, remove_empty=True)

Summarizes expression profiles across cells per sample and group.

Generates summarized expression profiles across cells per sample (e.g. sample id) and group (e.g. cell type) based on the metadata found in .obs. As basic quality control, this function removes genes that are not expressed in enough cells (min_prop) or samples (min_smpls), and sample-group profiles with too few cells (min_cells) or too few total counts (min_counts).

By default this function expects raw integer counts as input and sums them per sample and group (mode='sum'), but other modes are available.
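As a rough illustration of what mode='sum' means conceptually, the sketch below sums raw integer counts over all cells that share a sample and group. It uses plain Python containers rather than AnnData, and the names `cells` and `sum_pseudobulk` are hypothetical; it is not the library's implementation.

```python
from collections import defaultdict

# Hypothetical toy data: one row of raw integer counts per cell (3 genes).
cells = [
    {"sample": "S1", "group": "T cell", "counts": [3, 0, 1]},
    {"sample": "S1", "group": "T cell", "counts": [2, 1, 0]},
    {"sample": "S1", "group": "B cell", "counts": [0, 4, 0]},
    {"sample": "S2", "group": "T cell", "counts": [1, 1, 1]},
]


def sum_pseudobulk(cells):
    """Sum per-gene counts over all cells sharing a (sample, group) pair."""
    n_genes = len(cells[0]["counts"])
    profiles = defaultdict(lambda: [0] * n_genes)
    for cell in cells:
        key = (cell["sample"], cell["group"])
        for j, value in enumerate(cell["counts"]):
            profiles[key][j] += value
    return dict(profiles)


profiles = sum_pseudobulk(cells)
print(profiles[("S1", "T cell")])  # -> [5, 1, 1]
```

Each key of `profiles` corresponds to one row (observation) of the returned pseudobulk object, and each list position to one gene.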

This function produces quality control metrics that help assess whether some samples need to be filtered. The number of cells that belong to each sample is stored in .obs['psbulk_n_cells'], the total sum of counts per sample in .obs['psbulk_counts'], and the proportion of cells that express a given gene in .layers['psbulk_props'].
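The three metrics can be sketched in plain Python as follows. This is a conceptual illustration only: `qc_metrics` and `keep_profiles` are hypothetical helpers, and treating the min_cells and min_counts thresholds as inclusive is an assumption rather than something stated above.

```python
from collections import defaultdict

# Hypothetical toy data: one row of raw integer counts per cell (3 genes).
cells = [
    {"sample": "S1", "group": "T cell", "counts": [3, 0, 1]},
    {"sample": "S1", "group": "T cell", "counts": [2, 1, 0]},
    {"sample": "S1", "group": "B cell", "counts": [0, 4, 0]},
    {"sample": "S2", "group": "T cell", "counts": [1, 1, 1]},
]


def qc_metrics(cells):
    """Return (n_cells, total_counts, props) keyed by (sample, group)."""
    n_genes = len(cells[0]["counts"])
    n_cells = defaultdict(int)
    total_counts = defaultdict(int)
    expressing = defaultdict(lambda: [0] * n_genes)
    for cell in cells:
        key = (cell["sample"], cell["group"])
        n_cells[key] += 1
        total_counts[key] += sum(cell["counts"])
        for j, value in enumerate(cell["counts"]):
            if value != 0:
                expressing[key][j] += 1
    # Proportion of cells in each profile with a non-zero value per gene.
    props = {
        key: [count / n_cells[key] for count in expressing[key]]
        for key in n_cells
    }
    return dict(n_cells), dict(total_counts), props


def keep_profiles(n_cells, total_counts, min_cells, min_counts):
    """List the profiles that pass both thresholds (assumed inclusive)."""
    return [
        key for key in n_cells
        if n_cells[key] >= min_cells and total_counts[key] >= min_counts
    ]


n_cells, total_counts, props = qc_metrics(cells)
kept = keep_profiles(n_cells, total_counts, min_cells=2, min_counts=5)
print(kept)  # -> [('S1', 'T cell')]
```

Inspecting the metrics before choosing thresholds is the intended workflow: profiles built from very few cells or very few counts tend to be noisy.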

Parameters:

adata (AnnData): Input AnnData object.
sample_col (str): Column of obs from which to extract the sample names.
groups_col (str): Column of obs from which to extract the group names. Can be set to None to ignore groups.
obs (DataFrame, None): If provided, metadata dataframe.
layer (str): If provided, which element of layers to use.
use_raw (bool): Use the raw attribute of adata if present.
mode (str): How to perform the pseudobulk. Available options are sum, mean or median. It also accepts callback functions, such as a lambda, to perform custom aggregations. It is also possible to provide a dictionary of callback functions, in which case each result is stored in a separate layer of the output and the result of the first callback function in the dictionary is stored in .X by default. To switch between layers, see decoupler.swap_layer.
min_cells (int): Minimum number of cells that a sample-group pair must contain. Pairs with fewer cells are removed.
min_counts (int): Minimum number of summed counts that a sample-group pair must contain. Pairs with fewer counts are removed.
dtype (type): Type of float used.
skip_checks (bool): Whether to skip input checks. Set to True when working with data that contains both positive and negative values, or when counts are not integers.
min_prop (float): Filter to remove features by a minimum proportion of cells with non-zero values. Deprecated parameter, see decoupler.filter_by_prop.
min_smpls (int): Filter to remove genes by a minimum number of samples with non-zero values. Deprecated parameter, see decoupler.filter_by_prop.
remove_empty (bool): Whether to remove empty observations (rows) or features (columns).
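
The different kinds of mode values (a named mode, a callback, or a dictionary of callbacks) can be pictured with the sketch below. It is a simplified stand-in, not the library's code: `aggregate` and `NAMED_MODES` are hypothetical, and the assumption that a callback receives the values of one gene at a time is made only for this illustration, since the exact callback signature is not described above.

```python
from statistics import mean, median

# Named modes mapped to per-gene reducers (hypothetical table).
NAMED_MODES = {"sum": sum, "mean": mean, "median": median}


def aggregate(rows, mode="sum"):
    """Collapse per-cell count rows into one profile, or one per layer."""
    if isinstance(mode, dict):
        # One resulting profile per named callback, like one layer each.
        return {name: aggregate(rows, func) for name, func in mode.items()}
    func = NAMED_MODES[mode] if isinstance(mode, str) else mode
    columns = zip(*rows)  # iterate gene by gene instead of cell by cell
    return [func(column) for column in columns]


# Three cells from one sample-group pair, three genes.
rows = [[3, 0, 1], [2, 1, 0], [1, 2, 5]]

print(aggregate(rows, "sum"))     # -> [6, 3, 6]
print(aggregate(rows, "median"))  # -> [2, 1, 1]
print(aggregate(rows, max))       # -> [3, 2, 5]

layers = aggregate(rows, {"sum": sum, "max": max})
first_layer = next(iter(layers))  # the first callback's result, 'sum' here
```

In this sketch `first_layer` plays the role of the result that ends up in .X when a dictionary is given, while the remaining entries correspond to additional layers.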

Returns:

psbulk (AnnData): New AnnData object with unnormalized pseudobulk profiles per sample and group. It also contains quality control metrics, whose names start with the prefix psbulk_.