

Boosting Peptide Identification Performance by Combining Many Search Engines, Spectral Matching, and Proteotypic and Physicochemical Peptide Properties
Prabhakar Gubbala and Nathan J. Edwards, Georgetown University Medical Center
US HUPO 2010

Application of meta-search, grid-computing, and machine-learning can significantly improve the sensitivity of peptide identification. The PepArML meta-search engine is publicly available, free of charge, on-line from:

Unified MS/MS Search Interface
Automatic search engine configuration and execution, parameterized by:
- Instrument and proteolytic agent
- Fixed and variable modifications
- Protein sequence database and MS/MS spectra file
- Peptide candidate selection
MS/MS spectra reformatting:
- Charge and precursor enumeration for peptide candidate selection (for charge and 13C peak correction)
- Search engine formatting constraints (MGF/mzXML)
- Consistent MS/MS spectrum identifier tracking
- Spectrum file "chunking"

Peptide Identification Meta-Search via Grid-Computing
[Figure: NSF TeraGrid CPUs; Edwards Lab scheduler and 80+ CPUs]
- Meta-search with seven search engines: Mascot, X!Tandem, OMSSA, KScore, SScore, MyriMatch, InsPecT
- Automatic target and decoy searches
- Secure communication
- Heterogeneous compute resources
- Scales to 250+ simultaneous searches
- Free, instant registration
- Simple search description, job management, and result combining
Search jobs across Mascot, X!Tandem, OMSSA, KScore, SScore, MyriMatch, and InsPecT: 3969 jobs, representing weeks of CPU time. Total elapsed time was under 28 hours, with Mascot as the bottleneck. All non-Mascot jobs finished in under 19 hours.

[Figure: Feature Rankings by Info. Gain]

PepArML – Evaluation of non-Search Engine Features
Dataset: OMICS 17 Protein Mix, LCQ MS/MS. Semi-tryptic search of SwissProt. Spectra were searched about 36 times: target + 2 decoys, 7 engines, 1+ vs. 2+/3+ charge.
Combiners compared:
- Heuristic: voting and FDR-based heuristic.
- PepArML: the publicly available PepArML combiner.
- PepArML-PSF: PepArML combiner without spectrum- or peptide-based proteotypic and physicochemical features.
- PepArML-Dig,PSF: PepArML combiner without trypsin digest features and without spectrum- or peptide-based proteotypic and physicochemical features.
- PepArML+KM: PepArML combiner plus spectral similarity to the peptide's synthetic spectrum (Zhang's Kinetic Model [2,3]).

Conclusions
The PepArML meta-search engine provides:
- A unified MS/MS search interface for Mascot, X!Tandem, OMSSA, KScore, SScore, MyriMatch, and InsPecT.
- Search job scheduling on independent, large-scale, heterogeneous computational grids.
- Additional features:
  - tryptic digest, peptide physicochemical, and proteotypic [1] properties;
  - spectrum and precursor isotope-cluster properties;
  - retention-time modeling.
- Spectral matching to synthetic spectra using Zhang's Kinetic Model [2,3].
- Unsupervised, model-free result combining using machine learning (PepArML [4]).
The PepArML meta-search engine improves peptide identification sensitivity, significantly increasing the number of peptide identifications at 10% FDR.

References
1. P. Mallick, M. Schirle, S. S. Chen, M. R. Flory, H. Lee, D. Martin, J. Ranish, B. Raught, R. Schmitt, T. Werner, B. Kuster, R. Aebersold. "Computational prediction of proteotypic peptides for quantitative proteomics." Nature Biotechnology (2006), 25(1).
2. Z. Zhang. "Prediction of low-energy collision-induced dissociation spectra of peptides." Anal. Chem. (2004), 76(14).
3. Z. Zhang. "Prediction of low-energy collision-induced dissociation spectra of peptides with three or more charges." Anal. Chem. (2005), 77(19).
4. N. Edwards, X. Wu, and C.-W. Tseng. "An unsupervised, model-free, machine-learning combiner for peptide identifications from tandem mass spectra." Clinical Proteomics (2009), 5(1).
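The poster reports peptide identifications "at 10% FDR" from automatic target and decoy searches (target + 2 decoys). It does not describe PepArML's own FDR estimator. The sketch below uses the common target-decoy estimate as a stand-in. It divides the decoy count by the number of decoy databases and then by the target count. The function name `ids_at_fdr` and the `(score, is_decoy)` input format are hypothetical, and higher scores are assumed to be better.

```python
from typing import List, Tuple


def ids_at_fdr(psms: List[Tuple[float, bool]], n_decoy_dbs: int,
               max_fdr: float = 0.10) -> int:
    """Count target hits accepted at the most permissive score threshold
    whose estimated FDR stays at or below max_fdr.

    Assumptions (not from the poster): each PSM is (score, is_decoy),
    higher score = better, and the false-target count is estimated as
    (decoy hits / number of decoy databases).
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best = 0
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Only evaluate at score boundaries so tied PSMs are kept or cut together.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        if targets > 0 and (decoys / n_decoy_dbs) / targets <= max_fdr:
            best = targets  # target count only grows, so the last pass is the largest
    return best
```

With two decoy databases, one decoy hit counts as half an expected false target. This normalization is why the decoy count is divided by `n_decoy_dbs` rather than used directly.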
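The spectrum reformatting step lists "charge and precursor enumeration ... for charge and 13C peak correction." The sketch below shows one way to do this enumeration. For each assumed charge z and each assumed count n of 13C peaks by which the instrument may have missed the monoisotopic peak, it computes a candidate neutral mass z·(m/z − m_proton) − n·Δ(13C−12C). The charge range (1+ to 3+), the one-peak offset limit, and the function name are hypothetical defaults, not PepArML's settings.

```python
from typing import List, Sequence, Tuple

PROTON_MASS = 1.007276    # Da
C13_C12_DELTA = 1.003355  # Da, mass difference between 13C and 12C


def candidate_neutral_masses(precursor_mz: float,
                             charges: Sequence[int] = (1, 2, 3),
                             max_c13_offset: int = 1) -> List[Tuple[int, int, float]]:
    """Return (charge, 13C offset, neutral monoisotopic mass) candidates.

    Hypothetical defaults: charges 1+..3+ and at most one 13C peak of
    monoisotopic-peak misassignment.
    """
    candidates = []
    for z in charges:
        neutral = z * (precursor_mz - PROTON_MASS)  # remove z protons
        for n in range(max_c13_offset + 1):
            # If the picked peak was the n-th 13C isotope, shift back by n deltas.
            candidates.append((z, n, neutral - n * C13_C12_DELTA))
    return candidates
```

Each candidate mass can then drive its own peptide candidate selection. This multiplies the number of searches per spectrum, which is one reason to run them on a grid.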
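The poster ranks combiner features "by Info. Gain," but the transcript doesn't say how features were discretized or which labels were used. The sketch below computes information gain as H(Y) − Σ_v p(v)·H(Y | X = v) for categorical feature values and then sorts features by it. The labels could be, for example, target versus decoy hits as a proxy for correct versus incorrect; that choice is an assumption. The names `information_gain` and `rank_features` are illustrative.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """H(Y) minus the expected entropy of Y after splitting on the feature values."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(features: Dict[str, Sequence[Hashable]],
                  labels: Sequence[Hashable]) -> List[Tuple[str, float]]:
    """Return (feature name, information gain) pairs, highest gain first."""
    scored = [(name, information_gain(vals, labels)) for name, vals in features.items()]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)
```

Continuous features such as retention-time error would need to be binned first, and the binning choice affects the ranking.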
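Spectrum file "chunking" and consistent spectrum identifier tracking are what let one meta-search become many independent grid jobs. The sketch below makes an assumption about how the job list is formed: every chunk of spectra is paired with every engine and every database, and each job carries its original spectrum identifiers so results can be merged later. The database labels and the `plan_jobs` helper are hypothetical.

```python
from itertools import product
from typing import Iterator, List, Sequence, Tuple

ENGINES = ("Mascot", "X!Tandem", "OMSSA", "KScore", "SScore", "MyriMatch", "InsPecT")


def chunk(items: Sequence[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive chunks of at most `size` items."""
    if size <= 0:
        raise ValueError("chunk size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def plan_jobs(n_spectra: int, chunk_size: int, engines: Sequence[str],
              databases: Sequence[str]) -> List[Tuple[str, str, Tuple[int, ...]]]:
    """One (engine, database, spectrum ids) job per chunk/engine/database.

    Spectrum ids here are stand-in integers; keeping them with each job is
    what allows per-spectrum results to be recombined after the searches.
    """
    chunks = list(chunk(list(range(n_spectra)), chunk_size))
    return [(engine, db, tuple(ids))
            for ids, engine, db in product(chunks, engines, databases)]
```

For example, `plan_jobs(10, 4, ENGINES, ("target", "decoy1", "decoy2"))` yields 3 chunks × 7 engines × 3 databases = 63 jobs. Smaller chunks give the scheduler more, shorter jobs to spread across heterogeneous resources.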