Computational Challenges in Metabolomics (Part 1)


1 Computational Challenges in Metabolomics (Part 1)
David Wishart, University of Alberta
Dagstuhl Seminar on Computational Mass Spectrometry
Schloss Dagstuhl, Germany
Aug , 2015

2 The Pyramid of Life
[Figure: pyramid relating Metabolome (metabolomics), Proteome (proteomics) and Genome (genomics), with arrows labeled Physiological Influence and Environmental Influence]

3 Why Small Molecules Count
100% of all agricultural products (herbicides, pesticides, fertilizers) are small molecules
>99% of all compounds that give food or drinks their aroma, color and taste are small molecules
91% of all known drugs are small molecules
>85% of all common clinical assays test for small molecules
60% of all drugs are derived from pre-existing metabolites
10-15% of identified genetic disorders involve diseases of small molecule metabolism

4 Proteomics vs. Metabolomics

5 Proteomics vs. Metabolomics
Proteomics:
Very MS or MS/MS oriented
Good separation is critical
Generates lots of raw data
Peptide and protein ID
Isotopic labeling (ICAT) helps
Possible to derive 3D structure
Permits protein imaging
Very dependent on databases
Spectral processing and deconvolution is challenging
Quantitation is challenging
Data analysis requires MV stats
Data integration is challenging
Better software is needed

Metabolomics:
Very MS or MS/MS oriented
Good separation is critical
Generates lots of raw data
Chemical ID
Isotopic labeling (SIL) helps
Possible to derive 3D structure
Permits metabolite imaging
Very dependent on databases
Spectral processing and deconvolution is challenging
Quantitation is challenging
Data analysis requires MV stats
Data integration is challenging
Better software is needed

6 Proteomics vs. Metabolomics

7 Proteomics Workflow
Biofluid/Extracts --> HPLC or PAGE --> Tryptic Digest --> MALDI plate --> MS analysis --> Mass Fingerprint --> Protein ID

8 Protein ID by PMF-MS

9 Metabolomics Workflow
Biological or Tissue Samples --> Extraction --> Biofluids or Extracts --> LC-MS or GC-MS --> LC/GC-MS Spectra --> Compound ID

10 Compound ID by GC/LC-MS
[Figure: LC/GC-MS total ion chromatogram]

11 Proteomics vs. Metabolomics
Proteomics:
Polymers of 20 amino acids (chemically similar)
185 million sequences (from DNA sequencing)
Sequence defines MS & MS/MS spectra
Trypsin gives definable cleavages
MS alone can ID proteins (PMF)
MS/MS fragmentation at 1 fixed energy
MS/MS fragmentation is easily predictable and very distinct
30 common PTMs
PTMs are somewhat predictable

Metabolomics:
1000s of distinct chemical classes (chemically diverse)
No information from DNA sequencing
Structure defines MS & MS/MS spectra (adducts, fragments)
No trypsin for small molecules (CID only)
MS alone cannot ID metabolites
Different energies for different molecules
MS/MS & EI-MS fragments not easily predictable, often similar
>400 PTMs via metabolism
PTMs are hard to predict
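The proteomics column says that sequence defines the spectrum and that trypsin gives definable cleavages. A minimal sketch of what that means in practice follows: an in silico tryptic digest that cleaves after K or R (except before P) and computes monoisotopic peptide masses, which is the basis of a peptide mass fingerprint. The function names and the example sequence are illustrative assumptions, not taken from the slides; the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def tryptic_peptides(sequence):
    """Cleave C-terminal to K or R, unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


# Hypothetical toy sequence, used only to show the cleavage rule.
for pep in tryptic_peptides("AKRPGK"):
    print(pep, round(peptide_mass(pep), 4))
```

No comparable rule exists for small molecules, which is the contrast the metabolomics column draws: there is no sequence to digest and no fixed cleavage chemistry to enumerate.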

12 Challenges for Metabolomics
Most MS-based metabolomics studies ID <100 cmpds (<1% of the known metabolome)
Metabolite ID requires accurate, referential MS/MS or EI-MS spectra and/or RT information
Limited experimental MS/MS, EI-MS & RT data
The chemical space of most metabolomes is not fully known (perhaps >5 million compounds total)
<1% of the chemicals in PubChem are relevant to metabolomics
Metabolomics needs specialized compound and spectral (MS/MS, EI-MS, NMR) databases
Metabolomics needs computational tools to predict biologically viable metabolites and their spectra

13 LC-MS Spectral DBs
MoNA – 236,604 spectra, 69,946 cmpds** (12,000)
METLIN – 68,124 spectra, 13,048 cmpds
mzCloud – 422,349 spectra, 2975 cmpds
NIST14 MS/MS – 234,284 spectra, 9344 cmpds
MassBank – 28,185 spectra, 11,500 cmpds
Wiley LC-MSn – >10,000 spectra, 4500 poisons
ReSpect – 9107 spectra, 3595 cmpds
GNPS – 9000 spectra, 4200 natural products
Total #compounds with exp. MS/MS spectra ~20,000
Less than 60% are biologically relevant

14 How to Get Missing Spectra?
Obtain or synthesize all biologically relevant molecules (metabolites, HPVs, drugs, pollutants, foods, etc.), prepare or synthesize their metabolites and collect their NMR, LC-MS and GC-MS spectra
COST - 5,000,000 cmpds X $1000/cmpd = $5 billion
OR
Do this entire exercise computationally
COST - 5,000,000 cmpds X $0.10/cmpd = $500,000
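The two cost estimates on this slide can be checked with a few lines of arithmetic. The sketch below uses only the slide's own figures (5,000,000 compounds, $1000 versus $0.10 per compound); the variable names are illustrative, and costs are held in integer cents to avoid floating-point rounding.

```python
N_COMPOUNDS = 5_000_000

# Per-compound costs from the slide, expressed in cents.
EXPERIMENTAL_CENTS = 1000 * 100   # $1000 per compound
COMPUTATIONAL_CENTS = 10          # $0.10 per compound


def total_dollars(n_compounds, cents_per_compound):
    """Total cost in whole dollars."""
    return n_compounds * cents_per_compound // 100


experimental = total_dollars(N_COMPOUNDS, EXPERIMENTAL_CENTS)
computational = total_dollars(N_COMPOUNDS, COMPUTATIONAL_CENTS)

print(f"Experimental:  ${experimental:,}")    # $5,000,000,000
print(f"Computational: ${computational:,}")   # $500,000
print(f"Ratio: {experimental // computational:,}x")  # 10,000x
```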

15 Computational Metabolomics
Known biomolecules (50,000)
Predicted biotransformations (50,000 --> 5,000,000)
Predicted MS/MS, NMR, GC-MS spectra of knowns + biotransformed
Match observed spectra to predicted spectra to ID compounds
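The last step, matching observed spectra to predicted spectra, is commonly done with a similarity score over peak lists. The slides do not say which score is used, so the following is only a sketch under that assumption: a plain cosine similarity with greedy peak pairing inside an m/z tolerance. The function names, the tolerance and the toy peak lists are all hypothetical.

```python
import math


def cosine_score(observed, predicted, tol=0.01):
    """Cosine similarity of two centroided spectra.

    Each spectrum is a list of (m/z, intensity) pairs. An observed peak is
    paired with the closest unused predicted peak within `tol` Da.
    """
    used = set()
    dot = 0.0
    for mz_o, int_o in observed:
        best_j, best_delta = None, None
        for j, (mz_p, _) in enumerate(predicted):
            delta = abs(mz_o - mz_p)
            if j not in used and delta <= tol:
                if best_delta is None or delta < best_delta:
                    best_j, best_delta = j, delta
        if best_j is not None:
            used.add(best_j)
            dot += int_o * predicted[best_j][1]
    norm_o = math.sqrt(sum(i * i for _, i in observed))
    norm_p = math.sqrt(sum(i * i for _, i in predicted))
    if norm_o == 0.0 or norm_p == 0.0:
        return 0.0
    return dot / (norm_o * norm_p)


def rank_candidates(observed, library, tol=0.01):
    """Return (name, score) pairs, best match first."""
    scores = [(name, cosine_score(observed, spec, tol))
              for name, spec in library.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy data, invented for illustration only.
observed = [(100.0, 10.0), (150.0, 5.0)]
library = {
    "candidate_A": [(100.0, 10.0), (150.0, 5.0)],
    "candidate_B": [(200.0, 1.0)],
}
print(rank_candidates(observed, library))
```

Real pipelines usually add intensity scaling, precursor-mass filtering and noise removal before scoring; those steps are omitted here to keep the matching idea visible.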

16 The Human Metabolome Database (HMDB)
A web-accessible resource containing detailed information on 41,993 “quantified”, “detected” and “expected” metabolites
Data mined from the literature and other eDBs
100’s of drug metabolites
1000’s of xenobiotics
>10,000 reference spectra
Supports sequence, spectral, structure and text searches as well as compound browsing
Full data downloads

17 The Drug Database (DrugBank v. 4.3)
1602 small molecule drugs
>5000 experimental drugs
Data mined from the literature and other eDBs
>1000 drugs with metabolizing enzyme data
>1200 drug metabolites
>600 MS+NMR spectra
>4200 unique drug targets
208 data fields/drug
Supports sequence, spectral, structure and text searches as well as compound browsing
Full data downloads

18 The Toxic Exposome Database (T3DB)
Comprehensive data on toxic compounds (drugs, pesticides, herbicides, endocrine disruptors, solvents, carcinogens, etc.)
Data mined from the literature and other eDBs
>3600 toxic compounds
>1900 reference spectra
~2100 toxic targets
Supports sequence, spectral, structure and text searches as well as compound browsing
Full data downloads

19 Computational Metabolomics
Known biomolecules (50,000)
Predicted biotransformations (50,000 --> 5,000,000)
Predicted MS/MS, NMR, GC-MS spectra of knowns + biotransformed
Match observed spectra to predicted spectra to ID compounds

20 Secondary Metabolism
[Figure: metabolic scheme linking Diazepam, Temazepam, Oxazepam, Nordazepam and N-(2-Benzoyl-4-chlorophenyl)-2-acetamidoacetamide]
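Each step in a scheme like this corresponds to a fixed change in elemental composition, and therefore to a fixed monoisotopic mass shift that can be looked for in MS data. The sketch below illustrates this for the four benzodiazepines named on the slide. The molecular formulas and atomic masses are standard reference values rather than slide content, and the helper names are illustrative.

```python
# Monoisotopic atomic masses (Da).
MONOISOTOPIC = {
    "C": 12.0, "H": 1.00782503, "N": 14.0030740,
    "O": 15.9949146, "Cl": 34.9688527,
}

# Elemental compositions (standard formulas, not given on the slide).
FORMULAS = {
    "diazepam":   {"C": 16, "H": 13, "Cl": 1, "N": 2, "O": 1},
    "nordazepam": {"C": 15, "H": 11, "Cl": 1, "N": 2, "O": 1},
    "temazepam":  {"C": 16, "H": 13, "Cl": 1, "N": 2, "O": 2},
    "oxazepam":   {"C": 15, "H": 11, "Cl": 1, "N": 2, "O": 2},
}


def monoisotopic_mass(name):
    """Neutral monoisotopic mass from the elemental composition."""
    return sum(MONOISOTOPIC[el] * n for el, n in FORMULAS[name].items())


def mass_shift(parent, product):
    """Mass difference (product minus parent) for one biotransformation."""
    return monoisotopic_mass(product) - monoisotopic_mass(parent)


# N-demethylation removes CH2; hydroxylation adds O.
print(round(mass_shift("diazepam", "nordazepam"), 4))   # -14.0157
print(round(mass_shift("diazepam", "temazepam"), 4))    # 15.9949
print(round(mass_shift("nordazepam", "oxazepam"), 4))   # 15.9949
print(round(mass_shift("temazepam", "oxazepam"), 4))    # -14.0157
```

Because the same shifts recur across many parent compounds, a mass difference alone suggests a reaction type but does not identify where on the molecule it happened.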

21 BioTransformer

22 BioTransformer - Flowchart
[Flowchart: Query Molecule; Reaction-specific structural constraints (Phase I, Other Reactions); Enzyme metabolite? (Machine Learning); SOM Predictor (Machine Learning); SOMs; Metabolite Generator. Decision points carry YES/NO branches; the outcomes are Metabolites or No metabolites.]
All structures are generated as SMILES, SDF or MOL files

23 BioTransformer – SOM Prediction
Preference learning based on 100 atomic (e.g. atom type) and 10 molecular features (e.g. mass)
SOM predictor was trained for 9 CYP450s
Average prediction accuracy of 84.54%
Structures generated based on 92 Phase I reactions

24 BioTransformer Results
5,000 compounds
--> 6,230 Phase I metabolites
--> 9,510 Phase II metabolites
--> 6,110 Microbial metabolites
--> 12,340 Other metabolites
34,000 metabolites
~220,000
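The four category counts on this slide can be summed to check the rounded total of 34,000 metabolites from 5,000 input compounds. The sketch below uses only those slide figures; the dictionary keys and variable names are illustrative, and the separate "~220,000" figure is left out because the transcript does not say what it counts.

```python
INPUT_COMPOUNDS = 5_000

# Predicted metabolite counts per category, as listed on the slide.
PREDICTED = {
    "Phase I": 6_230,
    "Phase II": 9_510,
    "Microbial": 6_110,
    "Other": 12_340,
}

total = sum(PREDICTED.values())
per_compound = total / INPUT_COMPOUNDS

print(f"Total predicted metabolites: {total:,}")          # 34,190
print(f"Average per input compound:  {per_compound:.2f}")  # 6.84
for category, count in PREDICTED.items():
    print(f"{category:<10} {count:>7,}  ({count / total:.1%})")
```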

25 Computational Metabolomics
Known biomolecules (50,000)
Predicted biotransformations (50,000 --> 5,000,000)
Predicted MS/MS, NMR, GC-MS spectra of knowns + biotransformed
Match observed spectra to predicted spectra to ID compounds

26 Computational Challenges in Metabolomics (Part 2)
Sebastian Böcker, Friedrich Schiller University
Dagstuhl Seminar on Computational Mass Spectrometry
Schloss Dagstuhl, Germany
Aug , 2015

