Conversion to other organisms

Most of the prior knowledge stored in OmniPath is derived from human data, so its resources use human gene names. Using homology, however, we can convert these gene names to those of other organisms.

To show how to do this in decoupler, we will load the MSigDB resource and convert its gene symbols to their mouse and fly equivalents.

[1]:
import decoupler as dc

msigdb = dc.get_resource('MSigDB')
msigdb
[1]:
genesymbol collection geneset
0 MSC oncogenic_signatures PKCA_DN.V1_DN
1 MSC mirna_targets MIR12123
2 MSC chemical_and_genetic_perturbations NIKOLSKY_BREAST_CANCER_8Q12_Q22_AMPLICON
3 MSC immunologic_signatures GSE32986_UNSTIM_VS_GMCSF_AND_CURDLAN_LOWDOSE_S...
4 MSC chemical_and_genetic_perturbations BENPORATH_PRC2_TARGETS
... ... ... ...
2407729 OR2W5P immunologic_signatures GSE22601_DOUBLE_NEGATIVE_VS_CD8_SINGLE_POSITIV...
2407730 OR2W5P immunologic_signatures KANNAN_BLOOD_2012_2013_TIV_AGE_65PLS_REVACCINA...
2407731 OR52L2P immunologic_signatures GSE22342_CD11C_HIGH_VS_LOW_DECIDUAL_MACROPHAGE...
2407732 CSNK2A3 immunologic_signatures OCONNOR_PBMC_MENVEO_ACWYVAX_AGE_30_70YO_7DY_AF...
2407733 AQP12B immunologic_signatures MATSUMIYA_PBMC_MODIFIED_VACCINIA_ANKARA_VACCIN...

2407734 rows × 3 columns

For this example we will filter by the hallmark gene sets collection:

[2]:
# Filter by hallmark
msigdb = msigdb[msigdb['collection']=='hallmark']

# Remove duplicated entries
msigdb = msigdb[~msigdb.duplicated(['geneset', 'genesymbol'])]
msigdb
[2]:
genesymbol collection geneset
11 MSC hallmark HALLMARK_TNFA_SIGNALING_VIA_NFKB
149 ICOSLG hallmark HALLMARK_TNFA_SIGNALING_VIA_NFKB
223 ICOSLG hallmark HALLMARK_INFLAMMATORY_RESPONSE
270 ICOSLG hallmark HALLMARK_ALLOGRAFT_REJECTION
398 FOSL2 hallmark HALLMARK_HYPOXIA
... ... ... ...
878342 FOXO1 hallmark HALLMARK_PANCREAS_BETA_CELLS
878418 GCG hallmark HALLMARK_PANCREAS_BETA_CELLS
878512 PDX1 hallmark HALLMARK_PANCREAS_BETA_CELLS
878605 INS hallmark HALLMARK_PANCREAS_BETA_CELLS
878785 SRP9 hallmark HALLMARK_PANCREAS_BETA_CELLS

7318 rows × 3 columns
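The two filtering steps above can be checked on a small table. The following is a minimal sketch that uses a hand-made toy data frame (its rows are illustrative, not real MSigDB content) with the same three columns, so the effect of each step is easy to follow.

```python
import pandas as pd

# Toy stand-in for the MSigDB table; the rows are illustrative only
toy = pd.DataFrame({
    'genesymbol': ['MSC', 'MSC', 'ICOSLG', 'ICOSLG', 'FOSL2'],
    'collection': ['hallmark', 'hallmark', 'hallmark', 'mirna_targets', 'hallmark'],
    'geneset': [
        'HALLMARK_TNFA_SIGNALING_VIA_NFKB',
        'HALLMARK_TNFA_SIGNALING_VIA_NFKB',  # exact repeat of the first row
        'HALLMARK_INFLAMMATORY_RESPONSE',
        'MIR12123',
        'HALLMARK_HYPOXIA',
    ],
})

# Step 1: keep only the hallmark collection
hallmark = toy[toy['collection'] == 'hallmark']

# Step 2: keep the first occurrence of each (geneset, genesymbol) pair
dedup = hallmark[~hallmark.duplicated(['geneset', 'genesymbol'])]

print(len(toy), len(hallmark), len(dedup))
```

Removing duplicates matters because each gene should appear at most once per gene set.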

Then, we can transform the obtained resource into mouse genes. The conversion uses NCBI taxonomy identifiers: here we go from human (9606) to mouse (10090). If you want to convert genes into another model organism, you can look up its identifier in the NCBI Taxonomy database.

Note

The first call to this function might take a while (~15 minutes). Because the data is stored in a cache, later calls run faster. If you need to reset the cache, run rm -r .pypath/cache/.
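If you prefer to reset the cache from Python rather than from the shell, something like the sketch below should work. The helper name `reset_cache` is hypothetical, and the default path is simply the one quoted in the note above, resolved relative to the current working directory. Your cache may live elsewhere, so check the path before calling it.

```python
import shutil
from pathlib import Path


def reset_cache(cache_dir='.pypath/cache'):
    """Delete cache_dir and its contents; return True if it existed.

    Hypothetical helper: the default path is taken from the note above
    and is an assumption about where the cache lives on your machine.
    """
    path = Path(cache_dir)
    if path.is_dir():
        shutil.rmtree(path)
        return True
    return False
```

Calling `reset_cache()` is irreversible, and the next translation will have to download the data again.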

[3]:
# Translate targets
mouse_msigdb = dc.translate_net(msigdb, source='geneset', target='genesymbol', source_tax_id=9606, target_tax_id=10090)
mouse_msigdb
[2022-11-25 13:46:39] [curl] Module `pysftp` not available. Only downloading of a small number of resources relies on this module. Please install by PIP if it is necessary for you.
[3]:
genesymbol collection geneset
0 Msc hallmark HALLMARK_TNFA_SIGNALING_VIA_NFKB
1 Fosl2 hallmark HALLMARK_HYPOXIA
2 Fosl2 hallmark HALLMARK_TNFA_SIGNALING_VIA_NFKB
3 Relb hallmark HALLMARK_TNFA_SIGNALING_VIA_NFKB
4 Plau hallmark HALLMARK_TNFA_SIGNALING_VIA_NFKB
... ... ... ...
7684 Gcg hallmark HALLMARK_PANCREAS_BETA_CELLS
7685 Pdx1 hallmark HALLMARK_PANCREAS_BETA_CELLS
7686 Ins1 hallmark HALLMARK_PANCREAS_BETA_CELLS
7687 Ins2 hallmark HALLMARK_PANCREAS_BETA_CELLS
7688 Srp9 hallmark HALLMARK_PANCREAS_BETA_CELLS

7551 rows × 3 columns

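Note that when converting by homology, we might gain or lose some genes going from one organism to another. For example, human INS maps to both Ins1 and Ins2 in mouse, and the table grows from 7318 to 7551 rows.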
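To see why the number of rows changes, consider a toy homology table. This sketch is not how decoupler performs the translation internally. It only illustrates the idea with a plain pandas join. The column names `human` and `mouse` and the placeholder gene `GENE_WITHOUT_ORTHOLOG` are made up for the example.

```python
import pandas as pd

# Human gene set with three members
human_net = pd.DataFrame({
    'genesymbol': ['INS', 'PDX1', 'GENE_WITHOUT_ORTHOLOG'],  # last one is a placeholder
    'geneset': ['HALLMARK_PANCREAS_BETA_CELLS'] * 3,
})

# Hypothetical homology table: INS has two mouse homologs,
# the placeholder gene has none
homology = pd.DataFrame({
    'human': ['INS', 'INS', 'PDX1'],
    'mouse': ['Ins1', 'Ins2', 'Pdx1'],
})

# An inner join drops genes without a homolog and
# duplicates rows for genes with several homologs
translated = (
    human_net
    .merge(homology, left_on='genesymbol', right_on='human', how='inner')
    [['mouse', 'geneset']]
    .rename(columns={'mouse': 'genesymbol'})
    .drop_duplicates()
    .reset_index(drop=True)
)

print(translated)
```

The gene set still has three members after translation, but they are not the same three: one gene was lost and one was split in two.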

Let us try the fruit fly (7227) now:

[4]:
# Translate targets
fly_msigdb = dc.translate_net(msigdb, source='geneset', target='genesymbol', source_tax_id=9606, target_tax_id=7227)
fly_msigdb
[4]:
genesymbol collection geneset
0 Hand hallmark HALLMARK_TNFA_SIGNALING_VIA_NFKB
1 CG12648 hallmark HALLMARK_TNFA_SIGNALING_VIA_NFKB
2 twi hallmark HALLMARK_TNFA_SIGNALING_VIA_NFKB
3 HLH54F hallmark HALLMARK_TNFA_SIGNALING_VIA_NFKB
4 dl hallmark HALLMARK_TNFA_SIGNALING_VIA_NFKB
... ... ... ...
6594 Dmel\CG6484 hallmark HALLMARK_PANCREAS_BETA_CELLS
6598 G6P hallmark HALLMARK_PANCREAS_BETA_CELLS
6599 amon hallmark HALLMARK_PANCREAS_BETA_CELLS
6600 Cbp53E hallmark HALLMARK_PANCREAS_BETA_CELLS
6601 Srp9 hallmark HALLMARK_PANCREAS_BETA_CELLS

5868 rows × 3 columns
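A quick way to judge how well each gene set survived the translation is to compare its size across organisms. The helper below, `geneset_sizes`, is a hypothetical convenience function and not part of decoupler. It is demonstrated on toy tables so that it runs on its own.

```python
import pandas as pd


def geneset_sizes(**nets):
    """Count unique genes per gene set for each named network.

    Hypothetical helper: each network needs 'geneset' and
    'genesymbol' columns, like the tables shown above.
    """
    sizes = {
        name: net.groupby('geneset')['genesymbol'].nunique()
        for name, net in nets.items()
    }
    # Gene sets missing from a network get a size of 0
    return pd.DataFrame(sizes).fillna(0).astype(int)


# Toy tables standing in for the human and mouse networks
toy_human = pd.DataFrame({
    'genesymbol': ['INS', 'PDX1', 'GCG'],
    'geneset': ['HALLMARK_PANCREAS_BETA_CELLS'] * 3,
})
toy_mouse = pd.DataFrame({
    'genesymbol': ['Ins1', 'Ins2', 'Pdx1', 'Gcg'],
    'geneset': ['HALLMARK_PANCREAS_BETA_CELLS'] * 4,
})

sizes = geneset_sizes(human=toy_human, mouse=toy_mouse)
print(sizes)
```

With the real objects from this tutorial, the call would be `geneset_sizes(human=msigdb, mouse=mouse_msigdb, fly=fly_msigdb)`. Gene sets that shrink a lot in the target organism may deserve extra caution in downstream analyses.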