Eat Raw & Fresh: Introducing isotopic Mass-to-charge Ratio and Envelope Fingerprinting (iMEF) and ProteinGoggle for Protein Database Search. Zhixin (Michael) Tian

Presentation transcript:

Eat Raw & Fresh: Introducing isotopic Mass-to-charge Ratio and Envelope Fingerprinting (iMEF) and ProteinGoggle for Protein Database Search. Zhixin (Michael) Tian, CNCP, 11/15/2012

What is mass? Monoisotopic mass (m/z, z = +1). Source: L. C. Dias, et al. J. Org. Chem. 2012, 77, 4046.

Missing monoisotopic mass in protein
Monoisotopic mass (12C, 1H, 14N, 16O, 32S): most significant & accurate.
Mass of the most abundant isotope: error of ±1 Da or more (mis-assignment of the number of contributing heavy isotopes).
Average mass (average of isotopic peak masses weighted by abundance): error of ±1 u at 16,000 u (13C/12C ratio’s variability).
As the mass of a molecule increases, the probability that it contains multiple heavy isotopes increases, which decreases the relative abundance of the monoisotopic peak. The monoisotopic peak is unlikely to be observed for molecules larger than 15 kDa.
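The difference between monoisotopic and average mass can be made concrete with a short calculation. The sketch below is illustrative only: it uses the elemental formula C378H630N105O118S1 that appears on the protein-database slide later in this deck, and the isotope masses and average atomic weights are commonly tabulated values typed in by hand (an assumption, not taken from the slides).

```python
import re

# Masses of the lightest stable isotopes (u); hand-entered tabulated values (assumption).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
                "O": 15.99491461956, "S": 31.97207100}
# Conventional average atomic weights (u); natural 13C/12C variation shifts these slightly.
AVERAGE = {"C": 12.0107, "H": 1.00794, "N": 14.0067, "O": 15.9994, "S": 32.065}


def parse_formula(formula):
    """Turn 'C378H630N105O118S1' into {'C': 378, 'H': 630, ...}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def formula_mass(formula, table):
    """Sum the per-element masses from the chosen table."""
    return sum(table[el] * n for el, n in parse_formula(formula).items())


FORMULA = "C378H630N105O118S1"  # formula quoted on the protein-database slide
mono = formula_mass(FORMULA, MONOISOTOPIC)
avg = formula_mass(FORMULA, AVERAGE)
print(f"monoisotopic: {mono:.4f} u, average: {avg:.4f} u, gap: {avg - mono:.2f} u")
```

For a molecule of this size the two masses differ by several u, which is why a mis-assigned monoisotopic peak matters for database search.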

Deisotoping (Deconvolution)
Algorithms: AID-MS, ESI-ISOCONV, LASSO, MapQuant, MasSPIKE, MATCHING, msInspect, Peplist, quadratic deisotoping, RAPID, THRASH, Wang’s method, Zhang’s program, and ZSCORE.
Steps: (1) calculate the background noise level; (2) determine the charge state using the FT/Patterson technique; (3) calculate the theoretical profile; (4) fit it to the observed isotopic profile; (5) obtain the monoisotopic mass.
Search engines: ProSightPC [2], SEQUEST [3], Mascot [4], X!Tandem, InsPecT [5], OMSSA [6], Andromeda [7], pFind [8].
References:
2. C. D. Wenger, M. T. Boyne, J. T. Ferguson, D. E. Robinson, N. L. Kelleher, Versatile Online-Offline Engine for Automated Acquisition of High-Resolution Tandem Mass Spectra. Anal Chem 80, 8055 (Nov 1, 2008).
3. J. K. Eng, A. L. McCormack, J. R. Yates, An Approach to Correlate Tandem Mass-Spectral Data of Peptides with Amino-Acid-Sequences in a Protein Database. J Am Soc Mass Spectr 5, 976 (Nov, 1994).
4. D. N. Perkins, D. J. C. Pappin, D. M. Creasy, J. S. Cottrell, Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20, 3551 (Dec, 1999).
5. S. Tanner et al., InsPecT: Identification of posttranslationally modified peptides from tandem mass spectra. Anal Chem 77, 4626 (Jul 15, 2005).
6. L. Y. Geer et al., Open mass spectrometry search algorithm. J Proteome Res 3, 958 (Sep-Oct, 2004).
7. J. Cox et al., Andromeda: A Peptide Search Engine Integrated into the MaxQuant Environment. J Proteome Res 10, 1794 (Apr, 2011).
8. D. Q. Li et al., pFind: a novel database-searching software system for automated peptide and protein identification via tandem mass spectrometry. Bioinformatics 21, 3049 (Jul 1, 2005).

Peptide Mass Fingerprinting (PMF)
[Workflow diagram. Input: protein database; RAW file with MS spectrum (iE) and MS/MS spectra (iE). Search engine: parent (theo. mass) vs. parent (exp. mass); fragments (theo. mass) vs. fragments (exp. mass); stages labeled A1/P1, A1/P2, A2/P3, A2/P4. Output: candidates, initial IDs, final IDs.]

Ubiquitin - MS spectrum (profile)

Ubiquitin – MS/MS (ETD) Spectrum (Profile)

Database search with PMF using ProSightPC: NMFs = 92, NUMFs = 219, P score = 4.86E-98

Definition of P_Score
f: the total number of observed fragments (NMFs + NUMFs).
n: the number of matching fragments (NMFs).
x: the mean probability that the mass of an observed fragment ion will randomly match one from a generic protein.
111.1: the mass of the average amino acid, weighted for its occurrence in proteins.
2: the number of fragment ions generated from each bond cleavage, which is assumed to be 2 (b- and y-type ions or c- and z•-type ions).
Ma: the mass accuracy (a Ma of ±1 Da translates to a 2 Da window).
Source: Neil L. Kelleher, et al. Nat. Biotechnol. 2001, 19, 952.
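The transcript lists the variables of the P score but not the formula itself. As a rough sketch only, the code below assumes a Poisson model in which the number of chance matches has mean x·f, and takes the score to be the probability of at least n chance matches. Both that model and the way x is assembled from 111.1, 2, and the mass window are assumptions made for illustration; they are not confirmed by the slide, and the function names are hypothetical.

```python
import math


def random_match_probability(window_da, avg_residue_mass=111.1, ions_per_cleavage=2):
    # ASSUMPTION: x = (ions per cleavage) * (mass window in Da) / (average residue mass).
    return ions_per_cleavage * window_da / avg_residue_mass


def poisson_tail(n, lam):
    """P(X >= n) for X ~ Poisson(lam), summed in log space so tiny values survive."""
    if n <= 0:
        return 1.0
    log_terms = []
    best = -math.inf
    i = n
    while True:
        log_t = -lam + i * math.log(lam) - math.lgamma(i + 1)
        log_terms.append(log_t)
        best = max(best, log_t)
        # Past the mode the terms only shrink; stop once they are negligible.
        if i > lam and log_t < best - 60:
            break
        i += 1
    return math.exp(best) * sum(math.exp(t - best) for t in log_terms)


def p_score(f, n, x):
    # ASSUMPTION: chance matches follow a Poisson distribution with mean x * f.
    return poisson_tail(n, x * f)


# f and n from the ProSightPC slide (92 + 219 observed, 92 matched);
# the 0.02 Da window is a made-up illustrative value.
x = random_match_probability(window_da=0.02)
print(p_score(f=311, n=92, x=x))
```

Because the mass window used in the actual search is not stated in the transcript, this sketch is not expected to reproduce the P score quoted on the earlier slide.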

Is “MFs” really good?

Is “NUMFs” really good?
RAPID (28 + 49 = 77); THRASH (92 + 219 = 311)
PeakPicking: SNRThreshold = 3.0; BackgroundRatio = 5.0; FitType = Lorentzian
DeconvPep: MaxCharge = 25; ThScore = 0.0
AdvDeconv: MaxAbundancePeak = 3; ScanNoModifier = 0; MaxMissPeak = 3; MassErr = 1.0E-05; ThClustExt = 0.0; IntsRangeErr = 0.5
Better “deisotoping”? NO “deisotoping”?

What is a mass spectrum? MS of Ubiquitin

The nature of the iE of an ion: x, y coordinates (profile and centroid)
Exp. m/z    Abundance
856.9821    6061
857.0825    21811
857.1826    52841
857.2809    82342
857.3782    93523
857.4746    96019
857.5714    75857
857.6682    60680
857.7663    42420
857.8669    27294
857.9680    14752
858.0681    5685
858.1685    1120
858.2717    919
858.3671    316
858.4594    147
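The spacing between neighbouring peaks of an isotopic envelope reflects the charge state, since adjacent isotopic peaks differ by roughly one 13C-12C mass difference divided by z. The sketch below applies that simple spacing heuristic to the m/z list on this slide. It is not the FT/Patterson technique named on the deisotoping slide, and the function name and constant are illustrative assumptions.

```python
from statistics import median

C13_C12_DELTA = 1.0033548  # mass difference between 13C and 12C in u (tabulated value)


def charge_from_spacing(mz_values):
    """Estimate the charge state from the median gap between adjacent isotopic peaks."""
    mz = sorted(mz_values)
    gaps = [b - a for a, b in zip(mz, mz[1:])]
    return round(C13_C12_DELTA / median(gaps))


# Experimental m/z values of the ubiquitin envelope listed on this slide.
envelope_mz = [856.9821, 857.0825, 857.1826, 857.2809, 857.3782, 857.4746,
               857.5714, 857.6682, 857.7663, 857.8669, 857.9680, 858.0681,
               858.1685, 858.2717, 858.3671, 858.4594]
print(charge_from_spacing(envelope_mz))
```

Using the median gap rather than the mean keeps the estimate stable when a few low-abundance peaks at the edge of the envelope are poorly centroided.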

What are in a protein database?
Sequence: MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG
Formula: C378H630N105O118S1
x, y coordinates (centroid):
Exp. m/z    Abundance
856.9690    3.95
857.0692    18.83
857.1695    45.88
857.2698    76.13
857.3701    96.65
857.4703    100.00
857.5706    87.76
857.6709    67.12
857.7711    45.63
857.8714    27.99
857.9716    15.67
858.0719    8.09
858.1721    3.87
858.2724    1.73
858.3726    0.73
858.4729    0.29

iMEF (isotopic m/z & Envelope Fingerprinting)
[Workflow diagram. Input: protein database; RAW file with MS spectrum (iE) and MS/MS spectra (iE). Search: parent and fragments as theo. mass and theo. iE vs. parent and fragments as exp. mass; stages labeled A/P1, A/P2, A1/P1, A1/P2, A2/P3, A2/P4. Output: candidates, initial IDs, final IDs.]

Top-down Screening – MS/MS2 (Targeted Screening – MS2)
iMEF = iMF (A1) + iEF (A2)
[Flowchart. Elements shown: DB; parent ion exp. iE and parent ion theo. iE (A1/F1); preliminary protein candidates and protein candidates; fragment ion exp. iEs and fragment ion theo. iEs (A2/F3); preliminary protein IDs; initial protein ID with NMFs and PTM_Scores; passes for the 1st, 2nd, and 3rd isotopic peak with an isotopic peak exclusion list and removal of normalized isotopic peaks; combined initial protein IDs; remove duplicates; final IDs. Decision branches are labeled Y/N.]

Pre-Step 1: Customized database (MS: precursor ions; MS/MS: fragment ions)

Pre-Step 2: Noise level determination

Ubiquitin - MS spectrum (profile)

Ubiquitin – MS/MS (HCD) spectrum (profile)

Step 1: Profile to centroid (MS & MS2)

Step 2: iMF of precursor ion candidates
Top-down Screening: IPMD: 15 ppm; isolation window (±3 m/z units)
857.47461 (4 ppm)

Step 3: iEF of precursor ion candidates
IPACO (isotopic peak abundance cutoff): 5%; IPMD (isotopic peak m/z deviation): 15 ppm; IPAD (isotopic peak abundance deviation): 30%

Step 4: iMF of fragment ion candidates
Targeted Screening: IPMD: 10 ppm
277.13278 (5 ppm)
C1;MAX_MZ=149.07431&C2;MAX_MZ=277.132888&C3;MAX_MZ=390.216952&C4;MAX_MZ=537.285366&C5;MAX_MZ=636.353779&C6;MAX_MZ=764.448743&C7;…

Step 5: iEF of fragment ion candidates
IPACO: 5%; IPMD: 10 ppm; IPAD: 50%
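The envelope fingerprinting steps can be pictured as a three-threshold check on each theoretical isotopic peak. The sketch below is one possible reading, under stated assumptions: peaks below the abundance cutoff are ignored, m/z deviation is measured in ppm relative to the theoretical m/z, and abundance deviation is the relative difference between base-peak-normalized experimental and theoretical abundances. These definitions and the function name are assumptions, not ProteinGoggle's documented behaviour. The example peaks are three of the ubiquitin envelope values listed earlier in the deck, with the database-slide list treated as the theoretical envelope.

```python
def envelope_fingerprint(theo, exp, ipaco=5.0, ipmd_ppm=15.0, ipad_pct=30.0):
    """Return True if every theoretical peak above the cutoff has an acceptable match.

    theo: list of (m/z, relative abundance in % of the base peak)
    exp:  list of (m/z, raw abundance) centroids
    """
    exp_max = max(abundance for _, abundance in exp)
    for theo_mz, theo_rel in theo:
        if theo_rel < ipaco:
            continue  # ASSUMPTION: peaks below the cutoff are not required
        # Nearest experimental centroid to this theoretical peak.
        exp_mz, exp_ab = min(exp, key=lambda peak: abs(peak[0] - theo_mz))
        ppm = abs(exp_mz - theo_mz) / theo_mz * 1e6
        exp_rel = 100.0 * exp_ab / exp_max  # ASSUMPTION: normalize to the base peak
        deviation = abs(exp_rel - theo_rel) / theo_rel * 100.0
        if ppm > ipmd_ppm or deviation > ipad_pct:
            return False
    return True


# Three peaks around the envelope maximum, taken from the slides above.
theo_peaks = [(857.3701, 96.65), (857.4703, 100.00), (857.5706, 87.76)]
exp_peaks = [(857.3782, 93523), (857.4746, 96019), (857.5714, 75857)]
print(envelope_fingerprint(theo_peaks, exp_peaks))
```

Tightening any one of the three thresholds makes the same envelope fail, which is the sense in which the confidence level is chosen by the user.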

Exemplary PTM_Score assignment Human histone H4_S1acK16acK20me2

ID of ubiquitin from ETD: NMFs = 91
Parameters: IPACO=10, IPMD=15, IPAD=100, IPMDO=20, IPMDOM=30, IPADO=20, IPADOM=200
[Plots: NMFs vs. IPACO; NMFs vs. IPMD; NMFs vs. IPAD]

Pros and Cons
Pros: as-strict-as-you-choose confidence; strict quality control (QC); fine discrimination of close iEs; in-situ unwrapping of overlapped iEs.
Cons: more complex and bigger database; more data points for fingerprinting.

Pros: As-strict-as-you-choose confidence (comparison with ProSightPC)

Layman’s choice of parameters: default values with statistical significance!

Pros: Fine discrimination of close iEs
Candidate ions: (A) b38-53, 3+; (B) b18-33, 3+ or b19-34, 3+; (C) (b6-22-H2O), 3+
Exp. m/z    Theo. m/z (A)  IPMD    Theo. m/z (B)  IPMD    Theo. m/z (C)  IPMD
599.6575    599.6478       16      599.6511       11      599.6595       -3
599.9919    599.9821               599.9855               599.9939
600.3242    600.3164       13      600.3197       8       600.3281       -6
600.6616    600.6506       18      600.6539               600.6623       -1
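The IPMD values on this slide are consistent with a signed deviation in ppm, computed as (experimental − theoretical) / theoretical × 10^6. A minimal sketch that reproduces the first row follows; the function name is illustrative.

```python
def ipmd_ppm(exp_mz, theo_mz):
    """Signed m/z deviation of an experimental peak from a theoretical one, in ppm."""
    return (exp_mz - theo_mz) / theo_mz * 1e6


exp_mz = 599.6575
theo_candidates = (599.6478, 599.6511, 599.6595)  # first row of the slide's table
for theo_mz in theo_candidates:
    print(theo_mz, round(ipmd_ppm(exp_mz, theo_mz)))
```

With a 15 ppm limit, the first candidate falls outside the window while the other two remain, which illustrates how closely spaced envelopes are told apart.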

Pros: In-situ unwrapping of overlapped iEs
Proportional partition: the abundance of an overlapped isotopic peak is divided among the individual overlapped isotopic envelopes according to the proportional abundance calculated from the experimental abundance and the theoretical relative abundance ratios.
k: number of overlapped isotopic peaks; m: number of isotopic peaks in each iE; n: number of overlapped iEs.
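The partition formula itself is not preserved in the transcript, so the following is only a sketch of one way a proportional split could work. It assumes that each envelope's overall intensity can be estimated from its non-overlapped peaks, and that an overlapped peak is then shared in proportion to each envelope's expected contribution. The function names and the scaling step are hypothetical.

```python
def envelope_scale(clean_exp_abundances, clean_theo_rel):
    """Estimate an envelope's intensity scale from its non-overlapped peaks.

    ASSUMPTION: scale = sum of experimental abundances / sum of theoretical
    relative abundances over the same (non-overlapped) peaks.
    """
    return sum(clean_exp_abundances) / sum(clean_theo_rel)


def partition_overlapped_peak(observed, theo_rel_at_peak, envelope_scales):
    """Split one overlapped peak's abundance among n envelopes.

    theo_rel_at_peak[j]: theoretical relative abundance of envelope j at this peak
    envelope_scales[j]:  intensity scale of envelope j
    """
    expected = [s * r for s, r in zip(envelope_scales, theo_rel_at_peak)]
    total = sum(expected)
    if total == 0:
        raise ValueError("no envelope is expected to contribute to this peak")
    return [observed * e / total for e in expected]


# Made-up illustrative numbers: two envelopes share a peak of abundance 1000.
scales = [envelope_scale([500.0, 300.0], [50.0, 30.0]),     # scale 10
          envelope_scale([1500.0, 900.0], [50.0, 30.0])]    # scale 30
print(partition_overlapped_peak(1000.0, [50.0, 50.0], scales))
```

The shares always sum to the observed abundance, so no signal is created or lost by the split.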

Other improvements and utilities
Bi-section method for fast indexing of candidates.
LASSO-like approach to untangle overlapped iEs.
Additional utilities: a comprehensive confidence score; false discovery rate (FDR); customized ion types to look for new dissociation channels; customized MODs for the search of new modifications or labeled proteins; MS/MS spectrum annotation with matching fragments.
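Bisection on a sorted list is a standard way to pull out all candidates within a tolerance window without scanning the whole database. The sketch below shows the idea with Python's `bisect` module; it is a generic illustration, not ProteinGoggle's indexing code, and it assumes the tolerance window is computed relative to the experimental m/z.

```python
from bisect import bisect_left, bisect_right


def candidates_within_ppm(sorted_theo_mz, exp_mz, tol_ppm):
    """Return the theoretical m/z values within ±tol_ppm of exp_mz.

    sorted_theo_mz must be sorted ascending; lookup is O(log N) plus the hits.
    ASSUMPTION: the window half-width is taken relative to the experimental m/z.
    """
    half_width = exp_mz * tol_ppm * 1e-6
    lo = bisect_left(sorted_theo_mz, exp_mz - half_width)
    hi = bisect_right(sorted_theo_mz, exp_mz + half_width)
    return sorted_theo_mz[lo:hi]


# Theoretical m/z values taken from the close-iE comparison slide above.
theo_index = [599.6478, 599.6511, 599.6595, 600.3164]
print(candidates_within_ppm(theo_index, 599.6575, tol_ppm=15))
```

Sorting the database once up front is what makes each later lookup cheap, at the cost of the larger index noted among the cons.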

Conclusions
An as-confident-as-you-choose protein database search algorithm, iMEF, has been created and implemented in the search engine ProteinGoggle.
The principle of iMEF with ProteinGoggle is demonstrated with the identification of ubiquitin from its tandem mass spectrum using ETD.
iMEF as implemented in ProteinGoggle has been able to unwrap complex overlapping isotopic envelopes and confidently provide embedded fragment ions.
iMEF could be adapted for peptide and glycan database search with customized databases.

Acknowledgements
DNL2003: Li Li, Bo Wang, Jing Li, Xu Zhao
The KENES. Co. Ltd.: Miao Zhou, Shijin Liu, Bin Yang
Funding: DICP “Research Start”; China “Youth 1000-talents Theme”

Thank you very much!