1  Data Processing and Compound Identification in Untargeted Metabolomics and Exposome Research
Ivana Blaženović Postdoctoral Researcher West Coast Metabolomics Center Pittcon 2017

2 Metabolomics: analysis of the metabolome (mass spectrometry)
Metabolome = the complete set of small molecules found in a biological sample. D. Grapov (WCMC, 2015)

3 Analysis of metabolomic data: separation, detection, bioinformatics
(Workflow figure: extraction, pre-analysis, data processing, statistical analysis, multivariate analysis, significant compounds, structure elucidation, validation, biomarker, bioknowledge.) T. Kind (WCMC, 2015)

4 The central dilemma in metabolomics
T. Kind (WCMC, 2015)

5 Omics data complexity
Chemical complexity: 2 × 10^6 chemicals = metabolomics; 20 amino acids = proteomics; 4 bases = genomics. Data complexity increases with the number of structures present in the analyzed sample. Wishart, D. S. Bioanalysis 2011, 3 (adapted)

6 Why are there so many unknown compounds?
Endogenous pathways: ~1,000 metabolites, ~5,000 lipids
The epimetabolome: modified metabolites with specific biological functions, ~1,000 metabolites, e.g. diacetyl-spermine (cancer), methyl-glycine (cancer), dimethyl-arginine (asthma), oxylipins (inflammation), methyl-nicotinamide (pluripotency)
Exposure: we are exposed to many compounds (chemicals, food metabolites, ...); food alone contributes ~200,000 metabolites, some or many of which are in circulation
O. Fiehn (WCMC, 2015)

7 Why is it (still) so hard to identify compounds?
In silico fragmentation tools retrieve candidate structures and fragment them (using different algorithms and approaches), then compare those fragments to the product ions in a measured spectrum and assign each candidate a score that reflects how well it explains the measured compound. Only 0.088% of known chemicals have MS/MS spectra; mass spectral libraries are very small and lack diversity. T. Kind (WCMC, 2015)
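Below is a minimal sketch of the generic candidate-scoring idea described above, assuming predicted fragment masses are already available for each candidate; the matching tolerance, scoring scheme, and all names are illustrative and do not reproduce any specific tool.

```python
# Toy candidate scoring: rank candidate structures by the fraction of
# measured product ions their predicted fragments can explain.

def explained_fraction(measured_mz, predicted_mz, tol_ppm=10.0):
    """Fraction of measured product ions matched by predicted fragment m/z."""
    matched = 0
    for mz in measured_mz:
        tol = mz * tol_ppm * 1e-6
        if any(abs(mz - p) <= tol for p in predicted_mz):
            matched += 1
    return matched / len(measured_mz) if measured_mz else 0.0

def rank_candidates(measured_mz, candidates):
    """candidates: dict {candidate_id: list of predicted fragment m/z}."""
    scores = {cid: explained_fraction(measured_mz, frags)
              for cid, frags in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy usage with made-up values:
spectrum = [91.0542, 119.0491, 147.0441]
candidates = {"cand_A": [91.0547, 147.0446], "cand_B": [65.0386]}
print(rank_candidates(spectrum, candidates))  # cand_A explains more peaks
```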

8 Critical Assessment of Small Molecule Analysis (CASMI)
The organizers of CASMI change every year, and so does the focus of the contest; each edition addresses current bottlenecks in metabolomics research.

9 Critical Assessment of Small Molecule Analysis (CASMI) 2016
Objective: structure elucidation of unknown natural products
Provided data sets: training (312 MS/MS spectra) and challenge (208 MS/MS spectra)
MS/MS spectra: ESI Q Exactive Plus Orbitrap, <5 ppm mass accuracy, MS/MS resolution of 35,000, HCD nominal collision energies of 20/35/50
Category 1: Best Structure Identification on Natural Products
Category 2: Best Automatic Structural Identification – In Silico Fragmentation Only
Category 3: Best Automatic Structural Identification – Full Information
Categories 2 and 3 used the same data.
Automated methods mimic the approach of an experienced chemist when determining the correct structure from MS/MS data; this matters because many analysts using metabolomics platforms do not necessarily have a chemistry background.
Spectral metadata included the ChemSpider ID, compound name, monoisotopic mass, molecular formula, SMILES, InChI and InChIKey.

10 In silico fragmentation tools
Participants in the CASMI 2016 contest – all open source and user friendly:
MetFrag: retrieves candidate structures and fragments them using a bond dissociation approach
CFM-ID: employs a method for learning a generative model of collision-induced dissociation fragmentation
MAGMa+: a parameter-optimized version of the original MAGMa software; a Python wrapper script that analyzes substructures and utilizes different bond dissociations
MS-FINDER: a rule-based tool that accounts for mass accuracy, isotopic ratio, neutral loss assignment, and the existence of the compound in its built-in comprehensive database
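As a rough illustration of the bond dissociation idea that tools such as MetFrag and MAGMa+ build on, the sketch below enumerates single, acyclic bond cleavages of a candidate structure with RDKit and collects the fragment masses. Real tools recurse to greater depth, handle hydrogen rearrangements and ring bonds, and score the fragments; this toy version only shows the enumeration step, and the example molecule is arbitrary.

```python
# Enumerate one-bond cleavages of a candidate structure and collect the
# exact masses of the resulting fragments (requires RDKit).
from rdkit import Chem
from rdkit.Chem.Descriptors import ExactMolWt

def single_bond_fragments(smiles):
    mol = Chem.MolFromSmiles(smiles)
    masses = set()
    for bond in mol.GetBonds():
        # this toy version only breaks single, non-ring bonds
        if bond.GetBondType() == Chem.BondType.SINGLE and not bond.IsInRing():
            pieces = Chem.FragmentOnBonds(mol, [bond.GetIdx()], addDummies=False)
            for frag in Chem.GetMolFrags(pieces, asMols=True):
                masses.add(round(ExactMolWt(frag), 4))
    return sorted(masses)

print(single_bond_fragments("CCOC(=O)C"))  # fragments of ethyl acetate
```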

11 Objectives – in silico only (Category 2)
Performance evaluation; improve the existing results

12 Objectives – in silico + metadata (Category 3)
Performance evaluation; improve the existing results

13 Training set results (Category 2: Best Automatic Structural Identification – In Silico Fragmentation Only)

Software    Top hit   Top 10
MetFrag     17%       57%
MAGMa+      16%       48%
CFM-ID      15%       55%
MS-FINDER   10%       38%

14 Voting / consensus model
Criteria A) In silico only (Category 2)
# 1 Presence of each candidate in each software's results
# 2 ω = correctly assigned structures / (correctly assigned + falsely assigned structures)
# 3 S = Σ over software A of rank(software A) × ω(top 10, software A)
# 4 Criteria 1–3 combined = input for the voting/consensus model
We developed a voting/consensus model that combines the results of all tested tools and creates a new ranking for every candidate structure based on two criteria: if a tool placed a candidate structure high (in the top 20), we ranked that structure high as well, and if all four tested tools placed the candidate structure high, it received an additional boost. We therefore assigned primary scores ranging from 1 to 4 to account for this. Because different tools are expected to differ in quality, we also estimated how accurately each tool ranks the correct structure using the training data set. Blaženović et al. (under review, 2017)
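A hedged sketch of such a voting/consensus ranking is shown below: each tool contributes a primary score for the candidates it places in its top 20, weighted by that tool's training-set accuracy ω = correct / (correct + false). The cutoff, score values, and function names are illustrative assumptions rather than the exact published parameters.

```python
# Toy voting / consensus ranking across several in silico tools.

def consensus_rank(rankings, omega, top_n=20):
    """
    rankings: {tool_name: ordered list of candidate IDs, best first}
    omega:    {tool_name: training-set accuracy weight in [0, 1]}
    Returns candidates sorted by their combined, accuracy-weighted score.
    """
    scores = {}
    for tool, ranked in rankings.items():
        for position, cand in enumerate(ranked[:top_n], start=1):
            # higher placement within the top_n window -> larger primary score
            primary = (top_n - position + 1) / top_n
            scores[cand] = scores.get(cand, 0.0) + omega[tool] * primary
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy usage: two tools, three candidate structures
rankings = {"MetFrag": ["A", "B", "C"], "CFM-ID": ["B", "A"]}
omega = {"MetFrag": 0.17, "CFM-ID": 0.15}
print(consensus_rank(rankings, omega))  # candidates ranked by consensus score
```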

15 Voting / consensus model – Training set (Category 2: in silico only)
Rank   Model                                                               Top hit   Top 10
# 1    MetFrag + CFM-ID in silico voting/consensus                         22%       62%
# 2    MetFrag + CFM-ID + MAGMa+ in silico voting/consensus                20%       60%
# 3    MetFrag + MS-FINDER + CFM-ID + MAGMa+ in silico voting/consensus    19%       58%
The voting/consensus model improved correct annotations by only 5%. Blaženović et al. (under review, 2017)

16 Voting / consensus model
Criteria B) Metadata allowed (Category 3)
# 1 In silico consensus rank
# 2 DB presence (derived from the MS-FINDER DB)
# 3 2 × DB STOFF-IDENT
# 4 4 × DB MS/MS
Final score = in silico consensus rank + DB presence + 2 × DB_STOFF-IDENT + 4 × DB_MS/MS
Blaženović et al. (under review, 2017)
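A minimal sketch of this metadata-boosted scoring follows, treating each database term as a 0/1 indicator; how the in silico consensus rank is converted into a numeric score, and the function and argument names, are assumptions for illustration.

```python
# Toy version of: Final score = in silico consensus rank score + DB presence
#                 + 2 x DB_STOFF-IDENT + 4 x DB_MS/MS

def final_score(consensus_score, in_msfinder_db, in_stoffident_db, in_msms_library):
    """Boost a candidate's consensus score by simple database-presence terms."""
    return (consensus_score
            + (1 if in_msfinder_db else 0)
            + 2 * (1 if in_stoffident_db else 0)
            + 4 * (1 if in_msms_library else 0))

# A candidate with a matching MS/MS library spectrum gets the largest boost:
print(final_score(0.8, True, False, True))   # 5.8
print(final_score(0.9, True, True, False))   # 3.9
```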

17 Voting / consensus model – Training set (Category 3: database boosting)
Rank   Model                                                  Top hit   Top 10
# 1    MetFrag + CFM-ID + DB voting/consensus                 77.9%     94.9%
# 2    MetFrag + MS-FINDER + CFM-ID + DB voting/consensus     77.6%     95.5%
# 3    MetFrag + CFM-ID + MAGMa+ + DB voting/consensus        76.9%     95.2%
# 4    MS-FINDER + DB                                         76.6%     94.2%
The voting/consensus model with database boosting improved correct annotations by 56%. Blaženović et al. (under review, 2017)

18 Voting / consensus model – Training set (Category 3: power of metadata)
Rank   Model                                                     Top hit   Top 10
# 1    MetFrag + CFM-ID + DB + MS/MS voting/consensus             92.9%     98.1%
# 2    CFM-ID + MAGMa+ ID_sorted + DB + MS/MS voting/consensus    92.6%
# 3    MAGMa+ ID_sorted + DB + MS/MS voting/consensus             92.3%     98.4%
The voting/consensus model with database and MS/MS boosting improved correct annotations by an additional 15%. Blaženović et al. (under review, 2017)

19 What about using only mass spectral similarity search?
Software: NIST MS PepSearch. The NIST and MassBank MS/MS libraries were searched with a 5 ppm precursor window.

Data set                         Number of hits   Dot product score
Training (312 MS/MS spectra)     88.4%
Validation (208 MS/MS spectra)   60%

Most analysts will rely on a dot product score of 700 and above.
(Example spectrum shown on the slide: 4,7-Phenanthroline, m/z vs. intensity.)
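For reference, the sketch below shows the kind of scaled dot product similarity such library searches report (scores of 0–999); real implementations weight intensities and match peaks more carefully, so this binning-to-integer-m/z version is only an assumption-laden toy.

```python
# Toy dot product spectral similarity on the 0-999 scale.
import math

def dot_product_score(spec_a, spec_b):
    """spec_*: list of (mz, intensity) pairs; peaks binned to integer m/z."""
    def to_vector(spec):
        vec = {}
        for mz, inten in spec:
            vec[round(mz)] = vec.get(round(mz), 0.0) + inten
        return vec
    a, b = to_vector(spec_a), to_vector(spec_b)
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 999 * dot / norm if norm else 0.0

query = [(91.05, 100.0), (119.05, 40.0), (147.04, 80.0)]
library = [(91.05, 95.0), (119.05, 35.0), (147.04, 85.0)]
print(round(dot_product_score(query, library)))  # high score for a close match
```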

20 Validation set performance
CASMI 2016 Category 3 winner: Tobias Kind, with 70% correct top hits

Correct hits             Training set   Validation set
In silico only           22%            25%
In silico + DB           78%            73%
In silico + DB + MS/MS   93%            87%

Blaženović et al. (under review, 2017)

21 Training vs. Validation data set
A comparison of molecular descriptors showed that the training set did not fully represent the challenge / validation set. Blaženović et al. (under review, 2017)
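A hedged sketch of this kind of descriptor comparison is given below: compute a few simple descriptors for training- and validation-set structures and project both into a shared PCA space to check how well the training set covers the validation set's chemical space. The choice of descriptors and the example SMILES are assumptions, not the descriptors actually used in the study; the sketch requires RDKit and scikit-learn.

```python
# Compare chemical space coverage of two structure sets via descriptors + PCA.
from rdkit import Chem
from rdkit.Chem import Descriptors
from sklearn.decomposition import PCA

def descriptor_matrix(smiles_list):
    """Simple descriptor vector (MW, logP, TPSA, rotatable bonds) per structure."""
    rows = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        rows.append([Descriptors.MolWt(mol),
                     Descriptors.MolLogP(mol),
                     Descriptors.TPSA(mol),
                     Descriptors.NumRotatableBonds(mol)])
    return rows

training = descriptor_matrix(["CCO", "CC(=O)O", "c1ccccc1O"])
validation = descriptor_matrix(["CCN(CC)CC", "O=C(O)c1ccccc1"])

pca = PCA(n_components=2).fit(training + validation)
print(pca.transform(training))    # training-set coordinates in descriptor space
print(pca.transform(validation))  # validation-set coordinates for comparison
```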

22 Summary
Pure in silico algorithms identified only 17% of the compounds correctly.
Establishing and implementing a voting/consensus model for CASMI Categories 2 and 3 resulted in >93% correct annotations.
The true challenge remains the identification of "unknown unknown" compounds that are not present in any database.
Sharing MS/MS spectra is needed to improve in silico software.

23 Acknowledgement
Dr. Oliver Fiehn, Dr. Tobias Kind, Hrvoje Torbašinović, Slobodan Obrenović, Sajjan S. Mehta, Dr. Hiroshi Tsugawa, Jian Ji, Dr. Shen Tong

24 Thank you! Fiehn Lab UC Davis

