Background Spectral library searching Spectral library searching is an alternative approach to traditional sequence database searching for peptide inference.

Presentation transcript:

Development of a Spectral Library Building Tool and Re-Analysis of Human Plasma PeptideAtlas Datasets using Spectral Searching

Henry Lam (1), Eric Deutsch (1), James S. Eddes (1), Jimmy K. Eng (1,2), Nichole King (1), Steve Stein (3), Ruedi Aebersold (1,4)

(1) Institute for Systems Biology, Seattle, WA
(2) Fred Hutchinson Cancer Research Center, Seattle, WA
(3) National Institute of Standards and Technology, Gaithersburg, MD
(4) Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland

Background

Spectral library searching

Spectral library searching is an alternative approach to traditional sequence database searching for peptide inference from MS/MS spectra. It is particularly suited for targeted proteomics [2] applications, in which one seeks not to discover novel peptides, but to find and study expected peptides in the sample.

- Previously observed and confidently identified peptide MS/MS spectra are collected and catalogued in data repositories, such as PeptideAtlas.
- Repeated observations of the same peptide ion are combined to create a "consensus spectrum" of that particular peptide ion.
- Searchable spectral libraries of consensus spectra are built for fast indexed searching.
- During searching, each unknown query spectrum is compared to candidate library spectra one by one; high spectral similarity indicates positive identification.

Quality filters

Three types of questionable spectra in the resulting consensus spectral library are subject to quality filters:

- Spectra that have look-alikes in the library with non-homologous peptide IDs
    Often sequence-search false positives.
    Could also lead to false negatives when look-alikes end up as second hits, depressing the delta score artificially.
- Spectra with many unexplained large peaks
    Often sequence-search false positives.
    Even if true, often not representative of the peptide ion due to contamination, leading to false positives.
- Spectra with only one replicate (singletons)
    Often sequence-search false positives, especially if a large enough pool of spectra is compiled.
    No consensus is made; raw spectra are of poorer quality.

[Figure: An example of a confidently-matched query spectrum in spectral searching. Panels: Library spectra; Query spectra.]

References

1. Spectral searching and SpectraST: Lam H, Deutsch EW, Eddes JS, Eng JK, King NL, Stein SE, Aebersold R. Development and validation of a spectral library searching method for peptide identification from MS/MS. Proteomics 7(5), (2007).
2. Targeted proteomics: Kuster B, Schirle M, Mallick P, Aebersold R. Scoring proteomes with proteotypic peptide probes. Nature Reviews Molecular Cell Biology 6(7), (2005).
3. SEQUEST: Eng JK, McCormack AL, Yates JR III. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. Journal of the American Society for Mass Spectrometry 5(11), (1994).
4. Trans-Proteomic Pipeline: Keller A, Eng J, Zhang N, Li X-J, Aebersold R. A uniform proteomics MS/MS analysis platform utilizing open XML file formats. Molecular Systems Biology 1, 17 (2005).
5. Mascot: Perkins DN, Pappin DJ, Creasy DM, Cottrell JS. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20(18), (1999).
6. X!Tandem: Craig R, Beavis RC. TANDEM: matching proteins with mass spectra. Bioinformatics 20, (2004).
7. Human Plasma PeptideAtlas: Deutsch EW, Eng JK, Zhang H, King NL, Nesvizhskii AI, Lin B, Lee H, Yi EC, Ossola R, Aebersold R. Human Plasma PeptideAtlas. Proteomics 5(13), (2005).
8. PeptideProphet: Keller A, Nesvizhskii AI, Kolker E, Aebersold R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Analytical Chemistry 74(20), (2002).
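The three quality filters described in this poster can be read as cumulative stringency levels, matching the Q0, Q1 and Q2 levels reported in the results. The sketch below is only an illustration of that idea, not SpectraST's implementation: the `LibraryEntry` fields, the `unexplained_fraction` impurity measure, the 0.4 threshold and the peptide labels are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class LibraryEntry:
    peptide_ion: str                 # hypothetical label, e.g. "PEPTIDEA/2"
    num_replicates: int              # number of spectra merged into this entry
    unexplained_fraction: float      # assumed impurity measure in [0, 1]
    has_conflicting_lookalike: bool  # a similar spectrum with a non-homologous ID exists


def passes_filter(entry, level, max_unexplained=0.4):
    """Return True if the entry survives quality level 0, 1 or 2."""
    if level >= 1:
        # Level 1: drop look-alikes with conflicting IDs and impure spectra.
        if entry.has_conflicting_lookalike:
            return False
        if entry.unexplained_fraction > max_unexplained:
            return False
    if level >= 2 and entry.num_replicates < 2:
        # Level 2: additionally drop singletons.
        return False
    return True


def apply_quality_filter(library, level):
    """Keep only the entries that pass the requested quality level."""
    return [entry for entry in library if passes_filter(entry, level)]


# Made-up library entries, for illustration only.
library = [
    LibraryEntry("PEPTIDEA/2", 6, 0.10, False),
    LibraryEntry("PEPTIDEB/3", 1, 0.15, True),
    LibraryEntry("PEPTIDEC/2", 2, 0.70, False),
    LibraryEntry("PEPTIDED/3", 1, 0.05, False),
]
counts = {level: len(apply_quality_filter(library, level)) for level in (0, 1, 2)}
```

Because each level includes the checks of the level below it, the number of surviving entries can only stay the same or decrease as stringency increases.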
This work is supported by the National Heart, Lung and Blood Institute, National Institutes of Health, under contract No. N01-HV

Motivation

There has been a recent surge of interest in using spectral library searching as an alternative to sequence searching for identification of peptide MS/MS spectra. In spectral library searching, unknown MS/MS spectra are searched against a carefully compiled library of previously observed and identified peptide MS/MS spectra; a good spectral match indicates a correct identification. This method was shown to offer significant speed gain and superior sensitivity and accuracy compared to traditional sequence searching. [1] However, it relies on the availability of a high-quality spectral library covering the proteome or subproteome of interest.

Although public spectral libraries, such as those under development at the National Institute of Standards and Technology, are rapidly maturing, a ready-to-deploy tool for creating custom spectral libraries is highly desirable for the various applications for which suitable public spectral libraries are not yet available. Such a tool would enable individual researchers to build spectral libraries of their own proteomes or subproteomes of interest, as a means of organizing and condensing previous data and facilitating future research.
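The matching step described above (compare a query spectrum to candidate library spectra and rank by similarity) can be sketched as follows. This is a minimal illustration under stated assumptions, not SpectraST's scoring: it uses a normalized dot product on peaks binned at a fixed width, selects candidates by a simple precursor m/z window, and all function names, tolerances, peptide labels and peak values are invented for the example.

```python
import math


def bin_peaks(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def dot_product(query_peaks, library_peaks, bin_width=1.0):
    """Normalized dot product (cosine similarity) of two peak lists."""
    q = bin_peaks(query_peaks, bin_width)
    l = bin_peaks(library_peaks, bin_width)
    numerator = sum(v * l.get(k, 0.0) for k, v in q.items())
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in l.values())
    )
    return numerator / norm if norm > 0 else 0.0


def search(query_precursor_mz, query_peaks, library, precursor_tol=3.0):
    """Score every library entry inside the precursor window, best hit first."""
    hits = []
    for entry in library:
        if abs(entry["precursor_mz"] - query_precursor_mz) <= precursor_tol:
            score = dot_product(query_peaks, entry["peaks"])
            hits.append((score, entry["peptide"]))
    hits.sort(reverse=True)
    return hits


# Made-up library and query, for illustration only.
library = [
    {"peptide": "PEPA/2", "precursor_mz": 454.3,
     "peaks": [(175.1, 100.0), (288.2, 50.0), (401.3, 80.0)]},
    {"peptide": "PEPB/2", "precursor_mz": 455.0,
     "peaks": [(147.1, 90.0), (218.2, 40.0), (333.2, 70.0)]},
    {"peptide": "PEPC/2", "precursor_mz": 600.0,
     "peaks": [(175.1, 100.0), (288.2, 50.0), (401.3, 80.0)]},
]
query_peaks = [(175.2, 95.0), (288.1, 55.0), (401.4, 70.0)]
hits = search(454.5, query_peaks, library)
```

Restricting candidates by precursor m/z before scoring is what keeps the search space small: only entries that could plausibly be the same peptide ion are ever compared to the query.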
Methods

Software development: SpectraST (v.3.0)

- Written in C++, with Linux and Windows versions available
- Performs both spectral library searching [1] and spectral library creation
- Minimal resource requirement: can be run on typical personal computers
- Open-source and readily customizable
- Integrated with the Trans-Proteomic Pipeline (TPP) software suite [4] for full workflow support and easy adaptation
- Requires no relational database backend

Consensus spectrum creation

- Pool replicate spectra (identified to the same peptide ion) of high confidence
    A PeptideProphet probability cutoff of 0.9 is used.
- Remove dissimilar replicates
    Only the largest cluster of similar spectra is kept.
- Align peaks
    Slightly m/z-shifted peaks from different replicates are aligned.
- Peak voting
    Only peaks present in a majority of replicates are included.
- Intensity averaging
    Intensities of eligible peaks are averaged, weighted by each replicate's signal-to-noise ratio.

Results

Building a spectral library from the Human Plasma PeptideAtlas [7]

- 22 datasets (including 7 from the HUPO Plasma Proteome Project)
- 1.4 million spectra positively identified with P > 0.9 (SEQUEST/PeptideProphet [8]) from a total of 14 million spectra
- Over 30,000 distinct peptide ions among positive identifications
- Library building by SpectraST takes about 3 days of CPU time
- The number of peaks is reduced by more than a factor of 3 during consensus creation, indicating effective noise removal
- Different levels of quality filter stringency were investigated

[Figure: SpectraST library creation operations. Labels: SEQUEST/Mascot [5]/X!Tandem [6]; PeptideProphet; search results (.pepXML); spectra (.mzXML); SpectraST raw spectra import; library (.splib, .spidx, .pepidx); SpectraST library manipulation: union/intersection, filter based on criteria, consensus creation, quality filter.]

[Figure: Building searchable spectral libraries. Labels: data repository (Dataset 1, Dataset 2, Dataset 3); example peptide identifications (CVDAGQAK, DGGGENSR, QPWHIVK, TTSGLADK, TTSGGANK, IPGSGQGAR, ...); library of consensus spectra; precursor m/z index.]

[Figure: Example library spectra. Labels: GVM[147]NAVNNVNNVIAAAFK/2 (6 replicates); Dot = 0.87; similar spectra with conflicting IDs; FFTAICDMVAWLGYTPYKVTY/3 (1 replicate); ALVLIAFAQYLQQC[160]PFEDHVK/3 (100 replicates); AVDLLFFTDESGDSR/2 (2 replicates); possibly correctly identified, but impure spectrum; DFFTPNLFLK/3 (1 replicate); false-positive impure spectrum.]

[Figure: An example of consensus spectrum creation.]

Re-analysis of the Human Plasma PeptideAtlas datasets by spectral searching

- All datasets used to build the spectral library were re-searched by SpectraST against the library
- Dramatic increase (over 60%) in positively identified spectra with P > 0.9 (SpectraST/PeptideProphet)
- Extra identifications are mostly lower-quality spectra previously missed by sequence searching
- Library searching by SpectraST takes about 3 days of CPU time

Quality level                                                                                      Number of spectra remaining
Q0 (No filter)                                                                                     37,428
Q1 (Removed impure spectra; spectra with look-alikes having conflicting IDs)                       30,517
Q2 (Removed impure spectra; spectra with look-alikes having conflicting IDs; singleton spectra)    20,315

R = (Ave. # peaks in replicates) / (# peaks in consensus)

Conclusions

- The library-searching tool SpectraST is extended to allow users to build custom spectral libraries.
- The consensus spectrum creation algorithm enables the reduction of noise and spurious peaks, resulting in high-quality, representative spectra.
- A spectral library has been built from the entire Human Plasma PeptideAtlas, and is now available.
- The Human Plasma PeptideAtlas datasets were re-analyzed by searching against this library. Compared to sequence searching by SEQUEST, SpectraST identified over 60% more spectra at the same probability threshold, with much improved sensitivities and false discovery rates.
- Quality filters are shown to have a significant impact on the performance of the search.

Advantages of spectral searching over traditional sequence searching [1]

- Smaller search space. Only peptide ions known to be observed and identified are included in the library.
- More precise scoring. Peak intensities are naturally accounted for and all spectral features, including uncommon fragments, are used for similarity scoring.
- Vast improvement in speed. A reduced search space and a simpler scoring algorithm yield a typical speed gain of X (compared to SEQUEST [3]).
- Higher confidence in identifications. The more precise scoring allows better separation of good and bad hits, leading to much improved sensitivity and false discovery rates.
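The peak alignment, peak voting and intensity averaging steps of consensus spectrum creation, together with the peak reduction ratio R, can be illustrated with a short sketch. It is a simplified stand-in rather than SpectraST's algorithm: alignment is approximated by fixed-width m/z binning (so peaks that straddle a bin boundary would not be merged), the weighted average is taken over the replicates that contain the peak, and the function names, bin width, weights and peak values are assumptions made for the example.

```python
from collections import defaultdict


def build_consensus(replicates, weights, bin_width=1.0):
    """Merge replicate peak lists [(m/z, intensity), ...] into one consensus.

    weights holds one signal-to-noise value per replicate.
    """
    bins = defaultdict(list)  # bin index -> [(replicate index, m/z, intensity)]
    for index, peaks in enumerate(replicates):
        strongest = {}
        for mz, intensity in peaks:
            key = int(mz // bin_width)
            # Alignment stand-in: keep the strongest peak per bin per replicate.
            if key not in strongest or intensity > strongest[key][1]:
                strongest[key] = (mz, intensity)
        for key, (mz, intensity) in strongest.items():
            bins[key].append((index, mz, intensity))

    consensus = []
    for key in sorted(bins):
        members = bins[key]
        # Peak voting: keep peaks seen in a strict majority of replicates.
        if 2 * len(members) > len(replicates):
            total = sum(weights[i] for i, _, _ in members)
            mz = sum(weights[i] * m for i, m, _ in members) / total
            intensity = sum(weights[i] * v for i, _, v in members) / total
            consensus.append((mz, intensity))
    return consensus


def peak_reduction_ratio(replicates, consensus):
    """R = (average number of peaks in replicates) / (peaks in consensus)."""
    if not consensus:
        raise ValueError("consensus spectrum has no peaks")
    average = sum(len(peaks) for peaks in replicates) / len(replicates)
    return average / len(consensus)


# Made-up replicates: two shared fragment peaks plus one noise peak each.
replicates = [
    [(175.1, 100.0), (288.2, 50.0), (350.5, 5.0)],
    [(175.2, 80.0), (288.3, 60.0), (410.7, 4.0)],
    [(175.1, 90.0), (288.2, 40.0), (500.4, 6.0)],
]
weights = [10.0, 5.0, 5.0]
consensus = build_consensus(replicates, weights)
ratio = peak_reduction_ratio(replicates, consensus)
```

In this toy input each noise peak appears in only one of three replicates, so voting removes it and the consensus keeps just the two recurring peaks, which is the effect the ratio R is meant to quantify.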