Conversion to other organisms
Most of the prior knowledge stored in Omnipath is derived from human data and therefore uses human gene symbols. Nevertheless, using homology we can convert these gene names to those of other organisms.
To showcase how to do this inside decoupler, we will load the MSigDB database and convert its gene symbols to mouse and fly.
[1]:
import decoupler as dc
msigdb = dc.get_resource('MSigDB')
msigdb
[1]:
 | genesymbol | collection | geneset |
---|---|---|---|
0 | MSC | oncogenic_signatures | PKCA_DN.V1_DN |
1 | MSC | mirna_targets | MIR12123 |
2 | MSC | chemical_and_genetic_perturbations | NIKOLSKY_BREAST_CANCER_8Q12_Q22_AMPLICON |
3 | MSC | immunologic_signatures | GSE32986_UNSTIM_VS_GMCSF_AND_CURDLAN_LOWDOSE_S... |
4 | MSC | chemical_and_genetic_perturbations | BENPORATH_PRC2_TARGETS |
... | ... | ... | ... |
2407729 | OR2W5P | immunologic_signatures | GSE22601_DOUBLE_NEGATIVE_VS_CD8_SINGLE_POSITIV... |
2407730 | OR2W5P | immunologic_signatures | KANNAN_BLOOD_2012_2013_TIV_AGE_65PLS_REVACCINA... |
2407731 | OR52L2P | immunologic_signatures | GSE22342_CD11C_HIGH_VS_LOW_DECIDUAL_MACROPHAGE... |
2407732 | CSNK2A3 | immunologic_signatures | OCONNOR_PBMC_MENVEO_ACWYVAX_AGE_30_70YO_7DY_AF... |
2407733 | AQP12B | immunologic_signatures | MATSUMIYA_PBMC_MODIFIED_VACCINIA_ANKARA_VACCIN... |
2407734 rows × 3 columns
For this example we will keep only the hallmark gene set collection:
[2]:
# Filter by hallmark
msigdb = msigdb[msigdb['collection']=='hallmark']
# Remove duplicated entries
msigdb = msigdb[~msigdb.duplicated(['geneset', 'genesymbol'])]
msigdb
[2]:
 | genesymbol | collection | geneset |
---|---|---|---|
11 | MSC | hallmark | HALLMARK_TNFA_SIGNALING_VIA_NFKB |
149 | ICOSLG | hallmark | HALLMARK_TNFA_SIGNALING_VIA_NFKB |
223 | ICOSLG | hallmark | HALLMARK_INFLAMMATORY_RESPONSE |
270 | ICOSLG | hallmark | HALLMARK_ALLOGRAFT_REJECTION |
398 | FOSL2 | hallmark | HALLMARK_HYPOXIA |
... | ... | ... | ... |
878342 | FOXO1 | hallmark | HALLMARK_PANCREAS_BETA_CELLS |
878418 | GCG | hallmark | HALLMARK_PANCREAS_BETA_CELLS |
878512 | PDX1 | hallmark | HALLMARK_PANCREAS_BETA_CELLS |
878605 | INS | hallmark | HALLMARK_PANCREAS_BETA_CELLS |
878785 | SRP9 | hallmark | HALLMARK_PANCREAS_BETA_CELLS |
7318 rows × 3 columns
Then we can transform the resulting resource into mouse genes. Organisms are specified by their NCBI taxonomy identifiers, so we go from human (9606) to mouse (10090). To convert genes to another model organism, look up its NCBI taxonomy identifier and pass it as target_tax_id.
Note
The first call to this function may take a while (~15 minutes). Because the data is stored in a cache, subsequent calls run faster. If you need to reset the cache, run rm -r .pypath/cache/.
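If you would rather reset the cache from Python than from the shell, something along these lines should work. This is only a sketch: `clear_cache` is a hypothetical helper, not part of decoupler, and the default path is simply the one quoted in the note above, so adjust it if your cache lives elsewhere. The demonstration runs on a temporary directory so that nothing real is deleted.

```python
import shutil
import tempfile
from pathlib import Path


def clear_cache(cache_dir=".pypath/cache"):
    """Delete cache_dir if it exists; return True if something was removed.

    Hypothetical helper: the default path is taken from the note above.
    """
    path = Path(cache_dir)
    if path.is_dir():
        shutil.rmtree(path)
        return True
    return False


# Demonstrate on a throwaway directory instead of the real cache.
demo = Path(tempfile.mkdtemp()) / "cache"
demo.mkdir()
(demo / "dummy.txt").write_text("cached")

removed = clear_cache(demo)        # directory existed, so it is removed
removed_again = clear_cache(demo)  # nothing left to remove
print(removed, removed_again)
```

Calling `clear_cache()` with no argument would target the path from the note, relative to the current working directory.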
[3]:
# Translate targets
mouse_msigdb = dc.translate_net(msigdb, source='geneset', target='genesymbol', source_tax_id=9606, target_tax_id=10090)
mouse_msigdb
[2022-11-25 13:46:39] [curl] Module `pysftp` not available. Only downloading of a small number of resources relies on this module. Please install by PIP if it is necessary for you.
[3]:
 | genesymbol | collection | geneset |
---|---|---|---|
0 | Msc | hallmark | HALLMARK_TNFA_SIGNALING_VIA_NFKB |
1 | Fosl2 | hallmark | HALLMARK_HYPOXIA |
2 | Fosl2 | hallmark | HALLMARK_TNFA_SIGNALING_VIA_NFKB |
3 | Relb | hallmark | HALLMARK_TNFA_SIGNALING_VIA_NFKB |
4 | Plau | hallmark | HALLMARK_TNFA_SIGNALING_VIA_NFKB |
... | ... | ... | ... |
7684 | Gcg | hallmark | HALLMARK_PANCREAS_BETA_CELLS |
7685 | Pdx1 | hallmark | HALLMARK_PANCREAS_BETA_CELLS |
7686 | Ins1 | hallmark | HALLMARK_PANCREAS_BETA_CELLS |
7687 | Ins2 | hallmark | HALLMARK_PANCREAS_BETA_CELLS |
7688 | Srp9 | hallmark | HALLMARK_PANCREAS_BETA_CELLS |
7551 rows × 3 columns
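Note that homology conversion can gain or lose genes when going from one organism to another. Here the network grows from 7318 to 7551 rows, in part because some human genes map to more than one mouse gene (human INS becomes Ins1 and Ins2).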
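The gain and loss of genes can be illustrated with a toy one-to-many mapping. The sketch below is not how `dc.translate_net` works internally; it only mimics the effect on a handful of rows. The `homology` dictionary and `translate_toy` function are made up for illustration, the four real mappings mirror the mouse table above, and `HUMAN_ONLY` is a hypothetical gene with no mouse ortholog.

```python
# Toy human -> mouse homology table (illustration only).
# The four mappings mirror the mouse output above; "HUMAN_ONLY" is a
# hypothetical gene that is deliberately absent from the table.
homology = {
    "INS": ["Ins1", "Ins2"],  # one-to-many: gains a row
    "GCG": ["Gcg"],
    "PDX1": ["Pdx1"],
    "SRP9": ["Srp9"],
}

geneset = "HALLMARK_PANCREAS_BETA_CELLS"
human_net = [
    (geneset, "INS"),
    (geneset, "GCG"),
    (geneset, "PDX1"),
    (geneset, "SRP9"),
    (geneset, "HUMAN_ONLY"),  # no ortholog: loses a row
]


def translate_toy(net, mapping):
    """Expand each (geneset, gene) pair to all orthologs; drop unmapped genes."""
    translated = []
    for source, gene in net:
        for ortholog in mapping.get(gene, []):
            pair = (source, ortholog)
            if pair not in translated:  # avoid duplicated edges
                translated.append(pair)
    return translated


mouse_net = translate_toy(human_net, homology)
lost = [gene for _, gene in human_net if gene not in homology]
gained = len(mouse_net) - (len(human_net) - len(lost))

print("rows before:", len(human_net), "rows after:", len(mouse_net))
print("lost genes:", lost, "extra rows from one-to-many mappings:", gained)
```

In this toy case one row is lost and one is gained, so the total stays the same even though the content changed. Comparing row counts alone can therefore hide differences between the original and translated networks.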
Let us try the fruit fly (7227) now:
[4]:
# Translate targets
fly_msigdb = dc.translate_net(msigdb, source='geneset', target='genesymbol', source_tax_id=9606, target_tax_id=7227)
fly_msigdb
[4]:
 | genesymbol | collection | geneset |
---|---|---|---|
0 | Hand | hallmark | HALLMARK_TNFA_SIGNALING_VIA_NFKB |
1 | CG12648 | hallmark | HALLMARK_TNFA_SIGNALING_VIA_NFKB |
2 | twi | hallmark | HALLMARK_TNFA_SIGNALING_VIA_NFKB |
3 | HLH54F | hallmark | HALLMARK_TNFA_SIGNALING_VIA_NFKB |
4 | dl | hallmark | HALLMARK_TNFA_SIGNALING_VIA_NFKB |
... | ... | ... | ... |
6594 | Dmel\CG6484 | hallmark | HALLMARK_PANCREAS_BETA_CELLS |
6598 | G6P | hallmark | HALLMARK_PANCREAS_BETA_CELLS |
6599 | amon | hallmark | HALLMARK_PANCREAS_BETA_CELLS |
6600 | Cbp53E | hallmark | HALLMARK_PANCREAS_BETA_CELLS |
6601 | Srp9 | hallmark | HALLMARK_PANCREAS_BETA_CELLS |
5868 rows × 3 columns