PepArML: A model-free, result-combining peptide identification arbiter via machine learning

1 PepArML: A model-free, result-combining peptide identification arbiter via machine learning
Xue Wu, Chau-Wen Tseng, Nathan Edwards
University of Maryland, College Park, and Georgetown University Medical Center

2 Comparison of Search Engines
- No single score is comprehensive
- Search engines disagree
- Many spectra lack a confident peptide assignment
- Many spectra lack any peptide assignment
[Figure: Venn diagram of identifications by X! Tandem, SEQUEST, and Mascot, with region labels 38%, 14%, 28%, 14%, 3%, 2%, 1%. Source: Searle et al., JPR 7(1), 2008]

3 Black-box Techniques
- Significance re-estimation
  - Target-decoy search
  - Bimodal distribution fit
- Supervised machine learning
  - Train predictors on synthetic datasets
  - Select and/or create (many) good features
- Result combiners
  - Incorrect peptide IDs unlikely to match
  - Significance re-estimation
  - Independence and/or supervised model
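
The target-decoy idea named on this slide can be illustrated with a minimal sketch. It assumes a separate decoy search of the same size as the target search, so that the number of decoy hits above a score threshold estimates the number of false target hits. The function names and the `(score, is_decoy)` input layout are illustrative assumptions, not part of PepArML.

```python
def decoy_fdr(psms, threshold):
    """Estimate the FDR among target hits scoring at or above threshold.

    psms: list of (score, is_decoy) pairs; a higher score means a better match.
    Assumption: separate target and decoy searches of equal size, so the
    estimate is (# decoy hits) / (# target hits) above the threshold.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0
    return min(1.0, decoys / targets)


def threshold_at_fdr(psms, max_fdr):
    """Return the lowest observed score whose estimated FDR is <= max_fdr.

    Returns None if no observed score satisfies the bound.
    """
    best = None
    for t in sorted({score for score, _ in psms}, reverse=True):
        if decoy_fdr(psms, t) <= max_fdr:
            best = t
    return best
```

Given a list of scored hits, `threshold_at_fdr(psms, 0.05)` would return the most permissive score cutoff whose decoy-based estimate stays at or below 5%.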

4 PepArML
- Unified machine learning result combiner
  - Significance re-estimation too!
- Model-free feature use and result combination
  - Use agreement and features if useful
- Unsupervised training procedure
  - No loss of classification performance

5 PepArML Overview
[Diagram: X!Tandem, Mascot, OMSSA, and other search engines; PepArML]

6 PepArML Overview
[Diagram: X!Tandem, Mascot, OMSSA, and other search engines; feature extraction; PepArML]

7 Dataset Construction
[Figure: example rows labeled T or F, with columns for X!Tandem, Mascot, and OMSSA, ……]

8 Dataset Construction
- Calibrant 8 Protein Mix (C8)
  - 4594 MS/MS spectra (LTQ)
  - 618 (11.2%) true positives
- Sashimi 17mix_test2 (S17)
  - 1389 MS/MS spectra (Q-TOF)
  - 354 (25.4%) true positives
- AURUM 1.0 (364 proteins)
  - 7508 MS/MS spectra (MALDI-TOF-TOF)
  - 3775 (50.3%) true positives

9 PepArML Machine Learning
- Machine learning (generally) helps single search engines
- The PepArML result combiner (C-TMO) improves on single search engines
- Sometimes combining two search engines works as well as, or better than, combining three

10 PepArML vs Search Engines (C8)

11 True vs. Est. FDR (C-TMO, C8)

12 PepArML vs Search Engines (C8)

13 PepArML Pairs vs PepArML (C8)

14 Sensitivity Comparison

15 Feature Evaluation
Engine groups, as labeled on the slide: Tandem, OMSSA, Mascot
 1. Peptide length
 2. hyperscore
 3. precursor mass delta
 4. # of matched y-ions
 5. # of matched b-ions
 6. # of missed cleavages
 7. sum matched intensity
 8. E-value
 9. sentinel
10. score
11. precursor mass delta
12. # of matched ions
13. # of matched peaks
14. # of missed cleavages
15. E-value
16. sentinel
17. p-value
18. # of matched ions
19. E-value
20. sentinel
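
One way to picture how per-engine features might be merged into a single row per spectrum-peptide pair is sketched below. It assumes that each "sentinel" feature is a flag recording whether that engine reported the identification at all, which the slide does not state explicitly. The feature subset, the key names, and the `MISSING` placeholder value are illustrative assumptions.

```python
# Hypothetical subset of the per-engine features listed on the slide.
ENGINE_FEATURES = {
    "tandem": ["hyperscore", "evalue"],
    "mascot": ["evalue"],
    "omssa": ["evalue"],
}

# Assumed placeholder for features of an engine that reported nothing.
MISSING = 0.0


def combine(results):
    """Merge per-engine results for one (spectrum, peptide) pair into one row.

    results: dict mapping engine name -> dict of feature values. Engines that
    did not report this peptide are simply absent from the dict.
    """
    row = {}
    for engine, names in ENGINE_FEATURES.items():
        hit = results.get(engine)
        # Sentinel: 1 if the engine reported this identification, else 0.
        row[f"{engine}_sentinel"] = 1 if hit is not None else 0
        for name in names:
            row[f"{engine}_{name}"] = hit[name] if hit is not None else MISSING
    return row
```

With a layout like this, a learner can use agreement between engines (the sentinels) alongside the individual scores, without a hand-built model of how the engines relate.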

16 Application to Real Data
- How well do these models generalize?
- Different instruments
  - Spectral characteristics change scores
- Search parameters
  - Different parameters change score values
- Supervised learning requires
  - (Synthetic) experimental data from every instrument
  - Search results from available search engines
  - Training/models for all parameters x search engine sets x instruments

17 Model Generalization
[Figure panels: Train C8 / Score S17; Train S17 / Score S17]

18 Rescuing Machine Learning
- Train a new machine learning model for every dataset!
  - Generalization not required
  - No predetermined search engines, parameters, instruments, features
- Perhaps we can “guess” the true proteins
  - Most proteins not in doubt
  - Machine learning can tolerate imperfect labels
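
The idea on this slide can be read as an iterative loop: guess the proteins, label each peptide ID by whether its protein is in the guess, train on those imperfect labels, rescore, and guess again until the guess stops changing. The sketch below is one such reading and not necessarily the authors' procedure. The function `self_train`, its arguments, and the toy stand-ins for the protein guess and the classifier are all hypothetical.

```python
def self_train(psms, initial_scores, guess_proteins, train_and_score, max_iter=10):
    """Iteratively relabel PSMs from guessed proteins until the guess is stable.

    psms            -- list of dicts, each with at least a "protein" key
    initial_scores  -- one starting confidence per PSM (e.g. a raw engine score)
    guess_proteins  -- callable(psms, scores) -> set of proteins assumed present
    train_and_score -- callable(psms, labels) -> new confidence per PSM
    """
    scores = list(initial_scores)
    proteins = guess_proteins(psms, scores)
    for _ in range(max_iter):
        # A PSM is labeled true when its protein is in the current guess.
        labels = [psm["protein"] in proteins for psm in psms]
        scores = train_and_score(psms, labels)
        new_proteins = guess_proteins(psms, scores)
        if new_proteins == proteins:  # converged: the guess no longer changes
            break
        proteins = new_proteins
    return proteins, scores


# --- Toy stand-ins, for illustration only ---
def toy_guess(psms, scores, min_hits=2, cutoff=0.5):
    """Guess proteins that have at least min_hits PSMs scoring >= cutoff."""
    counts = {}
    for psm, score in zip(psms, scores):
        if score >= cutoff:
            counts[psm["protein"]] = counts.get(psm["protein"], 0) + 1
    return {protein for protein, n in counts.items() if n >= min_hits}


def toy_scorer(psms, labels):
    """Stand-in 'classifier': threshold one feature at the class-mean midpoint."""
    pos = [p["x"] for p, lab in zip(psms, labels) if lab]
    neg = [p["x"] for p, lab in zip(psms, labels) if not lab]
    if not pos or not neg:
        return [1.0 if lab else 0.0 for lab in labels]
    mid = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return [1.0 if p["x"] >= mid else 0.0 for p in psms]


psms = [
    {"protein": "A", "x": 0.9}, {"protein": "A", "x": 0.8}, {"protein": "A", "x": 0.7},
    {"protein": "B", "x": 0.2}, {"protein": "B", "x": 0.1},
    {"protein": "C", "x": 0.85}, {"protein": "C", "x": 0.75},
]
initial = [0.9, 0.8, 0.3, 0.6, 0.1, 0.4, 0.45]
proteins, scores = self_train(psms, initial, toy_guess, toy_scorer)
```

In this toy run the initial guess contains only protein A; after one round of training, the PSMs for protein C score well on the feature and C joins the guess, while B does not. A real setup would replace `toy_scorer` with an actual classifier trained on the full feature rows.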

19 Unsupervised Learning

20 Unsupervised Learning (S17)

21 Unsupervised Learning (S17)

22 Protein Selection Heuristic
- Modeled on typical protein identification criteria
  - High-confidence peptide IDs
  - At least 2 non-overlapping peptides
  - At least 10% sequence coverage
- Robust, fast convergence
- Easily enforce additional constraints
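
The three criteria above can be sketched as a small filter. This is a minimal illustration, assuming peptide hits arrive as `(protein_id, start, end, confidence)` tuples with 0-based half-open residue intervals. The 0.9 confidence cutoff is an assumed value, since the slide says only "high confidence", and the function name is hypothetical.

```python
def select_proteins(peptide_hits, protein_lengths,
                    min_conf=0.9, min_peptides=2, min_coverage=0.10):
    """Select proteins supported by confident, non-overlapping peptides.

    peptide_hits    -- iterable of (protein_id, start, end, confidence), where
                       [start, end) is the peptide's residue interval
    protein_lengths -- dict mapping protein_id -> sequence length in residues
    min_conf        -- assumed confidence cutoff (not given on the slide)
    """
    by_protein = {}
    for protein, start, end, conf in peptide_hits:
        if conf >= min_conf:
            by_protein.setdefault(protein, set()).add((start, end))

    selected = set()
    for protein, intervals in by_protein.items():
        # Greedy interval scheduling (earliest end first) gives the largest
        # number of mutually non-overlapping peptides.
        count, last_end = 0, 0
        for start, end in sorted(intervals, key=lambda iv: iv[1]):
            if start >= last_end:
                count += 1
                last_end = end
        # Sequence coverage: fraction of residues covered by any peptide.
        covered = set()
        for start, end in intervals:
            covered.update(range(start, end))
        coverage = len(covered) / protein_lengths[protein]
        if count >= min_peptides and coverage >= min_coverage:
            selected.add(protein)
    return selected
```

Additional constraints, such as a minimum peptide length, could be added as further conditions in the final check.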

23 What about real data?
- Dr. Rado Goldman (LCCC, GUMC)
  - Proteolytic serum peptides from clinical hepatocellular carcinoma samples
  - ~200 MALDI MS/MS spectra (TOF-TOF)
- PepArML for non-specific search of IPI-Human
  - Increase in confidence & sensitivity
  - Observation of “ragged” proteolytic trimming

24 Protein Identification Example
[Figure: protein identification example, with labels M, T, O, and *]

25 Future Directions
- Apply to more experimental datasets
- Integrate novel features
  - New search engines, spectral matching
  - Multiple searches with varied parameters, sequence databases
- Construct meta-search engine
- FDR by bimodal fit instead of decoys
- Release as open source
  - http://peparml.sourceforge.net

26 http://PepArML.SourceForge.Net

27 Acknowledgements
- Xue Wu* & Dr. Chau-Wen Tseng
  - Computer Science, University of Maryland, College Park
- Dr. Brian Balgley, Dr. Paul Rudnick
  - Calibrant Biosystems & NIST
- Dr. Rado Goldman, Dr. Yanming An
  - Department of Oncology, Georgetown University Medical Center
- Kam Ho To
  - Biochemistry Masters student, Georgetown University
- Funding: NIH/NCI CPTAC

29 PepArML vs Search Engines (S17)

30 PepArML vs Search Engines (S17)

31 PepArML Pairs vs PepArML (C8)

32 PepArML Pairs vs PepArML (S17)

33 PepArML Pairs vs PepArML (S17)

34 Unsupervised Learning (C8)

35 Unsupervised Learning (C8)

