Eat Raw & Fresh: Introducing isotopic Mass-to-charge Ratio and Envelope Fingerprinting (iMEF) and ProteinGoggle for Protein Database Search Zhixin(Michael)

1 Eat Raw & Fresh: Introducing isotopic Mass-to-charge Ratio and Envelope Fingerprinting (iMEF) and ProteinGoggle for Protein Database Search Zhixin(Michael) Tian CNCP 11/15/2012

2 What is mass? Monoisotopic mass (m/z, z=+1)
L. C. Dias, et al. J. Org. Chem. 2012, 77, 4046.

3 Missing monoisotopic mass in protein
Monoisotopic mass (12C, 1H, 14N, 16O, 32S): most significant and accurate.
Mass of the most abundant isotopic peak: error of ±1 Da or more (mis-assignment of the number of contributing heavy isotopes).
Average mass (average of isotopic peak masses weighted by abundance): error of ±1 u at 16,000 u (variability of the 13C/12C ratio).
As the mass of a molecule increases, the probability that it contains multiple heavy isotopes increases, so the relative abundance of the monoisotopic peak decreases. The monoisotopic peak is unlikely to be observed for molecules larger than 15 kDa.
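The trend described on this slide can be checked numerically. The sketch below is illustrative only: it takes the elemental composition C378H630N105O118S1 shown on slide 14, uses rounded natural-abundance values and standard atomic weights (assumptions, not values from the slides), and computes the monoisotopic mass, the average mass, and the fraction of molecules made only of the lightest isotopes.

```python
# Per element: (monoisotopic mass, abundance of the lightest isotope, average mass).
# Values are rounded reference figures and are assumptions of this sketch.
ELEMENTS = {
    "C": (12.0, 0.9893, 12.011),
    "H": (1.00782503, 0.999885, 1.008),
    "N": (14.0030740, 0.99636, 14.007),
    "O": (15.9949146, 0.99757, 15.999),
    "S": (31.9720707, 0.9499, 32.06),
}

def monoisotopic_mass(formula):
    """Mass with every atom as its lightest (most abundant) isotope."""
    return sum(count * ELEMENTS[el][0] for el, count in formula.items())

def average_mass(formula):
    """Abundance-weighted average mass."""
    return sum(count * ELEMENTS[el][2] for el, count in formula.items())

def monoisotopic_fraction(formula):
    """Probability that a molecule contains no heavy isotope at all."""
    fraction = 1.0
    for el, count in formula.items():
        fraction *= ELEMENTS[el][1] ** count
    return fraction

ubiquitin = {"C": 378, "H": 630, "N": 105, "O": 118, "S": 1}
doubled = {el: 2 * count for el, count in ubiquitin.items()}  # hypothetical larger molecule

for name, formula in (("ubiquitin", ubiquitin), ("2x ubiquitin", doubled)):
    print(name,
          round(monoisotopic_mass(formula), 3),
          round(average_mass(formula), 3),
          f"{monoisotopic_fraction(formula):.2e}")
```

Doubling the composition squares the monoisotopic fraction, which is why the monoisotopic peak disappears quickly as proteins get larger.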

4 Deisotoping (Deconvolution)
Algorithms: AID-MS, ESI-ISOCONV, LASSO, MapQuant, MasSPIKE, MATCHING, msInspect, Peplist, quadratic deisotoping, RAPID, THRASH, Wang’s method, Zhang’s program, and ZSCORE
Steps: calculate background noise level; determine charge state using FT/Patterson technique; calculate theoretical profile; fit with observed isotopic profile; obtain monoisotopic mass
Search engines: ProSightPC, SEQUEST, Mascot, X!Tandem, InsPecT, OMSSA, Andromeda, pFind
2. C. D. Wenger, M. T. Boyne, J. T. Ferguson, D. E. Robinson, N. L. Kelleher, Versatile Online-Offline Engine for Automated Acquisition of High-Resolution Tandem Mass Spectra. Anal Chem 80, 8055 (Nov 1, 2008).
3. J. K. Eng, A. L. McCormack, J. R. Yates, An Approach to Correlate Tandem Mass-Spectral Data of Peptides with Amino-Acid-Sequences in a Protein Database. J Am Soc Mass Spectr 5, 976 (Nov, 1994).
4. D. N. Perkins, D. J. C. Pappin, D. M. Creasy, J. S. Cottrell, Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20, 3551 (Dec, 1999).
5. S. Tanner et al., InsPecT: Identification of posttranslationally modified peptides from tandem mass spectra. Anal Chem 77, 4626 (Jul 15, 2005).
6. L. Y. Geer et al., Open mass spectrometry search algorithm. J Proteome Res 3, 958 (Sep-Oct, 2004).
7. J. Cox et al., Andromeda: A Peptide Search Engine Integrated into the MaxQuant Environment. J Proteome Res 10, 1794 (Apr, 2011).
8. D. Q. Li et al., pFind: a novel database-searching software system for automated peptide and protein identification via tandem mass spectrometry. Bioinformatics 21, 3049 (Jul 1, 2005).
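To make the charge-state step concrete, here is a minimal sketch. It does not implement the FT/Patterson technique named on the slide; it substitutes a simpler rule, namely that adjacent isotopic peaks of an ion with charge z are spaced by about 1.00335/z on the m/z axis. The function names, the default maximum charge, and the example peak list are hypothetical.

```python
NEUTRON_SPACING = 1.00335  # approximate 13C - 12C mass difference, in Da
PROTON_MASS = 1.007276     # mass of the charge-carrying proton, in Da

def charge_from_spacing(centroid_mz, max_charge=25):
    """Estimate z from the mean spacing between adjacent isotopic peaks."""
    mz = sorted(centroid_mz)
    gaps = [b - a for a, b in zip(mz, mz[1:])]
    if not gaps:
        raise ValueError("need at least two isotopic peaks")
    mean_gap = sum(gaps) / len(gaps)
    z = round(NEUTRON_SPACING / mean_gap)
    return max(1, min(max_charge, z))

def neutral_mass(mz, z):
    """Neutral mass of a protonated ion observed at m/z with charge z."""
    return z * (mz - PROTON_MASS)

# Synthetic envelope: six peaks spaced as expected for z = 10.
peaks = [857.0 + i * NEUTRON_SPACING / 10 for i in range(6)]
z = charge_from_spacing(peaks)
print(z, round(neutral_mass(peaks[0], z), 3))
```

The later deisotoping steps (theoretical profile, fit) would then decide which peak of the envelope is the monoisotopic one; that decision is where the ±1 Da mis-assignment mentioned earlier can arise.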

5 Peptide Mass Fingerprinting (PMF)
[Workflow diagram. Input: protein database and RAW file (MS spectrum and MS/MS spectra, as iEs). Search engine: parent ion (theoretical mass vs. experimental mass) and fragments (theoretical mass vs. experimental mass), with steps labeled A1/P1, A1/P2, A2/P3, A2/P4. Output: candidates, initial IDs, final IDs.]

6 Ubiquitin - MS spectrum (profile)

7 Ubiquitin – MS/MS (ETD) Spectrum (Profile)

8 Database search with PMF using ProSightPC
NMFs = 92 NUMFs = 219 P score = 4.86E-98

9 Definition of P_Score
f - the total number of observed fragments (NMFs + NUMFs);
n - the number of matching fragments (NMFs);
x - the mean probability that the mass of an observed fragment ion will randomly match one from a generic protein;
111.1 - the mass of the average amino acid, weighted for its occurrence in proteins;
2 - the number of fragment ions generated from each bond cleavage, which is assumed to be 2 (b- and y-type ions or c- and z•-type ions);
Ma - the mass accuracy (a Ma of ±1 Da translates to a 2 Da window).
Neil L. Kelleher, et al. Nat. Biotechnol. 2001, 19, 952
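The P_Score formula itself is not preserved in this transcript. One form consistent with the symbols defined on this slide treats the number of random matches as Poisson-distributed with mean x·f and scores the chance of seeing at least n matches. The sketch below assumes that form; it takes x as an input instead of deriving it, and the function name is hypothetical.

```python
import math

def p_score(f, n, x):
    """Probability of at least n random matches among f observed fragments,
    assuming a Poisson model with mean x * f (an assumption of this sketch)."""
    lam = x * f
    if n <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    total = 0.0
    i = n
    while True:
        # Poisson term computed in log space to avoid overflow in lam**i / i!
        term = math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
        total += term
        # Past the mode the terms only shrink; stop once they no longer matter.
        if i > lam and term <= total * 1e-17:
            break
        i += 1
    return min(total, 1.0)

# Hypothetical inputs: 311 observed fragments, random-match probability 0.01.
for n in (1, 10, 50, 92):
    print(n, p_score(311, n, 0.01))
```

Summing the upper tail directly, instead of computing one minus the lower tail, keeps very small scores from being lost to floating-point rounding.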

10 Is “MFs” really good?

11 Is “NUMFs” really good?
RAPID (28+49=77); THRASH (92+219=311)
PeakPicking: SNRThreshold = 3.0, BackgroundRatio = 5.0, FitType = Lorentzian
DeconvPep: MaxCharge = 25, ThScore = 0.0
AdvDeconv: MaxAbundancePeak = 3, ScanNoModifier = 0, MaxMissPeak = 3, MassErr = 1.0E-05, ThClustExt = 0.0, IntsRangeErr = 0.5
Better “deisotoping”? NO “deisotoping”?

12 What is a mass spectrum? MS of Ubiquitin

13 The nature of the iE of an ion
[Figure: profile and centroid views of an iE; each isotopic peak has x, y coordinates (Exp. m/z, Abundance).]
Abundance: 6061, 21811, 52841, 82342, 93523, 96019, 75857, 60680, 42420, 27294, 14752, 5685, 1120, 919, 316, 147

14 What is in a protein database?
Sequence: MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG
Elemental composition: C378H630N105O118S1
Centroid iE with x, y coordinates (Exp. m/z, Abundance).
Abundance: 3.95, 18.83, 45.88, 76.13, 96.65, 100.00, 87.76, 67.12, 45.63, 27.99, 15.67, 8.09, 3.87, 1.73, 0.73, 0.29
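A theoretical iE like the one listed on this slide can be generated from the elemental composition alone. The sketch below is one common way to do it, not necessarily how ProteinGoggle builds its database: it convolves per-element isotope distributions indexed by nominal mass offset. The abundance table holds rounded reference values and is an assumption, so the resulting numbers need not match the slide's values exactly.

```python
# Isotope abundances indexed by nominal mass offset from the lightest isotope.
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
}

def convolve(a, b, max_len):
    """Polynomial product of two distributions, truncated to max_len terms."""
    out = [0.0] * min(len(a) + len(b) - 1, max_len)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < len(out):
                out[i + j] += x * y
    return out

def power(dist, n, max_len):
    """Distribution for n atoms of one element, by repeated squaring."""
    result, base = [1.0], dist
    while n:
        if n & 1:
            result = convolve(result, base, max_len)
        n >>= 1
        if n:
            base = convolve(base, base, max_len)
    return result

def isotopic_envelope(formula, max_peaks=16):
    """Relative abundances (most abundant peak = 100) of the first max_peaks peaks."""
    env = [1.0]
    for element, count in formula.items():
        env = convolve(env, power(ISOTOPES[element], count, max_peaks), max_peaks)
    top = max(env)
    return [100.0 * p / top for p in env]

ubiquitin = {"C": 378, "H": 630, "N": 105, "O": 118, "S": 1}
envelope = isotopic_envelope(ubiquitin)
print([round(v, 2) for v in envelope])
```

Truncating each convolution to 16 terms is safe because higher mass offsets never contribute to lower ones.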

15 iMEF (isotopic m/z & Envelope Fingerprinting)
[Workflow diagram. Input: protein database and RAW file (MS spectrum and MS/MS spectra, as iEs). Search: parent and fragment ions, with theoretical mass and theoretical iE on the database side and experimental mass on the spectrum side; steps labeled A/P1, A/P2, A1/P1, A1/P2, A2/P3, A2/P4. Output: candidates, initial IDs, final IDs.]

16 Top-down Screening – MS/MS2 (Targeted Screening - MS2)
iMEF = iMF (A1) + iEF (A2)
[Flowchart with Y/N decision branches. Box labels: DB; parent ion exp. iE; parent ion theo. iE; A1/F1; A2/F2; preliminary protein candidates; protein candidates; fragment ion exp. iEs; fragment ion theo. iEs; A2/F3; preliminary protein IDs; NMFs; PTM_Scores; initial protein ID; 1st, 2nd, and 3rd isotopic peak; isotopic peak exclusion list; norm. isotopic peaks removed; initial protein IDs; combined initial protein IDs; remove duplicates; final IDs.]

17 Pre-Step 1: Customized database
MS: precursor ions; MS/MS: fragment ions

18 Pre-Step 2: Noise level determination

19 Ubiquitin - MS spectrum (profile)

20 Ubiquitin – MS/MS (HCD) spectrum (profile)

21 Step 1: Profile to centroid (MS & MS2)

22 Step 2: iMF of precursor ion candidates (4 ppm)
Top-down Screening: IPMD ≤ 15 ppm; isolation window (±3 m/z units)

23 Step 3: iEF of precursor ion candidates
IPACO = 5%; IPMD ≤ 15 ppm; IPAD ≤ 30%
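The three thresholds on this slide can be read as an envelope-level acceptance test. The sketch below is an interpretation, not ProteinGoggle's code: it assumes IPACO is a relative-abundance cutoff selecting which theoretical peaks must be found, IPMD is the allowed m/z deviation in ppm, and IPAD is the allowed relative abundance deviation in percent after scaling the experimental peaks to the most abundant theoretical peak. All function names and the example data are hypothetical.

```python
def ppm_deviation(exp_mz, theo_mz):
    """Signed m/z deviation in parts per million."""
    return (exp_mz - theo_mz) / theo_mz * 1e6

def envelope_matches(theo, exp, ipaco=5.0, ipmd=15.0, ipad=30.0):
    """theo: [(m/z, relative abundance in %)], exp: [(m/z, abundance)]."""
    required = [(mz, ra) for mz, ra in theo if ra >= ipaco]
    if not required or not exp:
        return False
    matched = []
    for theo_mz, theo_ra in required:
        # Closest experimental centroid to this theoretical peak.
        dev, abundance = min((abs(ppm_deviation(mz, theo_mz)), ab) for mz, ab in exp)
        if dev > ipmd:
            return False
        matched.append((theo_ra, abundance))
    # Scale experimental abundances using the most abundant theoretical peak.
    ref_ra, ref_abundance = max(matched)
    if ref_abundance <= 0:
        return False
    for theo_ra, abundance in matched:
        exp_ra = abundance / ref_abundance * ref_ra
        if abs(exp_ra - theo_ra) / theo_ra * 100.0 > ipad:
            return False
    return True

theo = [(857.0, 4.0), (857.1, 50.0), (857.2, 100.0), (857.3, 60.0)]
good = [(857.1005, 5200.0), (857.2003, 10000.0), (857.2998, 5700.0)]
bad = [(857.1005, 1000.0), (857.2003, 10000.0), (857.2998, 5700.0)]
print(envelope_matches(theo, good), envelope_matches(theo, bad))
```

In this example the 4% theoretical peak falls below the cutoff and is not required, while the second list fails because one peak is far too weak relative to its theoretical abundance.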

24 Step 4: iMF of fragment ion candidates
Targeted Screening: IPMD ≤ 10 ppm (5 ppm)
C1;MAX_MZ= &C2;MAX_MZ= &C3;MAX_MZ= &C4;MAX_MZ= &C5;MAX_MZ= &C6;MAX_MZ= &C7;…

25 Step 5: iEF of fragment ion candidates
IPACO = 5%; IPMD ≤ 10 ppm; IPAD ≤ 50%

26 Exemplary PTM_Score assignment
Human histone H4_S1acK16acK20me2

27 ID of ubiquitin from ETD
NMFs = 91
IPACO=10, IPMD=15, IPAD=100, IPMDO=20, IPMDOM=30, IPADO=20, IPADOM=200
[Plots: NMFs vs. IPACO; NMFs vs. IPMD; NMFs vs. IPAD]

28 Pros and Cons
Pros: as-strict-as-you-choose confidence; strict quality control (QC); fine discrimination of close iEs; in-situ unwrapping of overlapped iEs
Cons: more complex and bigger database; more data points for fingerprinting

29 Pros: As-strict-as-you-choose confidence
Comparison with ProSightPC

30 Layman’s choice of parameters
Default values with statistical significance!

31 Pros: Fine discrimination of close iEs
[Table comparing candidate b-ion assignments, including (b6-22-H2O)3+; columns: Exp. m/z, Theo. m/z, IPMD. Surviving values: 16, 11, -3, 13, 8, -6, 18, -1]

32 Pros: In-situ unwrapping of overlapped iEs
Proportional partition: the abundance of an overlapped isotopic peak is divided among the individual overlapped isotopic envelopes in proportion to abundances calculated from the experimental abundance and the theoretical relative abundance ratios.
k: number of overlapped isotopic peaks
m: number of isotopic peaks in each iE
n: number of overlapped iEs
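The partition formula is not preserved in this transcript, so the sketch below shows only one plausible reading of the description. It assumes each overlapping iE has a non-overlapped reference peak that scales its theoretical relative abundance to the experimental intensity, and that the overlapped peak is then split in proportion to those scaled expectations. The function names and numbers are hypothetical.

```python
def expected_contribution(theo_rel_at_peak, theo_rel_ref, exp_abundance_ref):
    """Expected abundance of one iE at the overlapped peak, scaled from a
    non-overlapped reference peak of the same iE (an assumption of this sketch)."""
    return theo_rel_at_peak * exp_abundance_ref / theo_rel_ref

def partition_overlapped_peak(observed_abundance, expected):
    """Split one observed abundance among n overlapped iEs proportionally."""
    total = sum(expected)
    if total <= 0:
        raise ValueError("expected contributions must sum to a positive value")
    return [observed_abundance * e / total for e in expected]

# Two overlapped iEs sharing one isotopic peak observed at abundance 9000.
expected = [
    expected_contribution(50.0, 100.0, 8000.0),  # iE A
    expected_contribution(40.0, 80.0, 4000.0),   # iE B
]
shares = partition_overlapped_peak(9000.0, expected)
print(expected, shares)
```

With k overlapped peaks, the same split would be applied to each of them in turn.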

33 Other improvements and utilities
Bi-section method for fast indexing of candidates
LASSO-like approach to untangle overlapped iEs
Additional utilities: a comprehensive confidence score; false discovery rate (FDR); customized ion types to look for new dissociation channels; customized MODs for the search of new modifications or labeled proteins; MS/MS spectrum annotation with matching fragments
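The bisection idea can be illustrated with Python's standard `bisect` module, which stands in here for whatever indexing ProteinGoggle actually uses. Given theoretical m/z values sorted in ascending order, two binary searches bound the candidates that fall inside a ppm window around an experimental m/z. The function name and the example values are hypothetical, and the window is computed relative to the experimental m/z as a simplification.

```python
import bisect

def candidates_within_ppm(sorted_theo_mz, exp_mz, tol_ppm):
    """Indices of theoretical m/z values within tol_ppm of exp_mz.

    sorted_theo_mz must be sorted in ascending order.
    """
    delta = exp_mz * tol_ppm * 1e-6
    lo = bisect.bisect_left(sorted_theo_mz, exp_mz - delta)
    hi = bisect.bisect_right(sorted_theo_mz, exp_mz + delta)
    return list(range(lo, hi))

theo_mz = [500.0, 856.9, 857.0, 857.01, 900.0]
hits = candidates_within_ppm(theo_mz, 857.004, 15.0)
print(hits)
```

Each lookup costs O(log N) comparisons instead of a scan over the whole candidate list, which matters once the database holds iEs for every precursor and fragment.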

34 Conclusions
An as-confident-as-you-choose protein database search algorithm, iMEF, has been created and implemented in the search engine ProteinGoggle.
The principle of iMEF with ProteinGoggle is demonstrated with the identification of ubiquitin from its tandem mass spectrum using ETD.
iMEF as implemented in ProteinGoggle has been able to unwrap complex overlapping isotopic envelopes and confidently provide embedded fragment ions.
iMEF could be adapted for peptide and glycan database search with customized databases.

35 Acknowledgements
DNL2003: Li Li, Bo Wang, Jing Li, Xu Zhao
The KENES. Co. Ltd.: Miao Zhou, Shijin Liu, Bin Yang
Funding: DICP “Research Start”; China “Youth 1000-talents Theme”

36 Thank you very much!

