
“Shotgun sequencing”

Presentation transcript:

1 “Shotgun sequencing”: precursors are selected for MS2 in order of intensity, 1st through 10th. (Figure: relative intensity, fill times, scan times.)

2 Spectral matching: each MS/MS spectrum is compared against a protein database.

3 “Shotgun sequencing” (figure: scans along the time axis)

4 “Shotgun sequencing” (figure: MS1 and MS2 scans over time)

5 Distributed spectral matching. LTQ Orbitrap base peak chromatogram from a 37 min LC-MS/MS run: 6186 MS/MS spectra, 2308 peptide IDs (1% false-positive rate), 287 protein IDs. At about 10 s per spectrum, 6000 spectra take 60,000 CPU seconds, roughly 16.7 CPU hours: about 16 hours of search time on a single-CPU server, but about 0.8 hours on a 20-node server searching in parallel.
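The slide's back-of-the-envelope estimate can be written out as below. Like the slide, it assumes a fixed 10 s per spectrum and perfect linear scaling across nodes, with no scheduling or I/O overhead; `search_hours` is a hypothetical helper.

```python
def search_hours(n_spectra: int, sec_per_spectrum: float, n_nodes: int = 1) -> float:
    """Wall-clock hours to search n_spectra, assuming perfect linear scaling across nodes."""
    cpu_seconds = n_spectra * sec_per_spectrum
    return cpu_seconds / 3600 / n_nodes


single_cpu = search_hours(6000, 10)            # about 16.7 hours
cluster = search_hours(6000, 10, n_nodes=20)   # about 0.83 hours
print(f"single CPU: {single_cpu:.1f} h, 20 nodes: {cluster:.2f} h")
```

Real clusters fall somewhat short of the ideal 20-fold speed-up, so the 20-node figure is a lower bound.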

6 SEQUEST scores. XCorr: goodness of fit between the observed spectrum and the theoretical b and y ions of peptides in the database. dCn: fractional difference between the highest XCorr and the next-highest XCorr, (XCorr1 - XCorr2) / XCorr1. (Yates J.R. 3rd et al., J. Am. Soc. Mass Spectrom. 5 (1994))
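A minimal sketch of the dCn definition above, given the XCorr values of all candidates scored for one spectrum. The helper name `delta_cn` and the choice to return 0 when there are fewer than two candidates are assumptions, not documented SEQUEST behavior.

```python
def delta_cn(xcorrs: list[float]) -> float:
    """Fractional XCorr difference between the best and second-best candidate."""
    ranked = sorted(xcorrs, reverse=True)
    # Assumption: with fewer than two candidates, or a non-positive best score, report 0.
    if len(ranked) < 2 or ranked[0] <= 0:
        return 0.0
    return (ranked[0] - ranked[1]) / ranked[0]


print(delta_cn([2.4, 3.2, 1.1]))  # (3.2 - 2.4) / 3.2, about 0.25
```

A large dCn means the top candidate clearly stands out from the runner-up; a small one means two sequences explain the spectrum almost equally well.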

7 SEQUEST searches all MS2 spectra in the LC run (figure: MS1 and MS2 scans over time).

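8 From the LC run to SEQUEST input: a .raw file stores all MS2 spectra of the run in one file, whereas .dta files store one MS2 spectrum per file (~10,000 files per run). Each .dta file carries the precursor (m/z), a charge state and the MS2 peak array, e.g., dta 1: precursor m/z, charge +2, MS2 array; dta 2: precursor m/z, charge +3, MS2 array.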
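Writing .dta files can be sketched as follows. This assumes the common .dta layout as we understand it: a header line holding the singly protonated precursor mass [M+H]+ (derived from the precursor m/z and charge) and the charge state, followed by one "m/z intensity" line per peak. Writing one file per assumed charge (+2 and +3) is a common workaround when the charge state is unknown; the file names and helper functions are hypothetical.

```python
PROTON = 1.007276  # proton mass in Da


def mh_plus(precursor_mz: float, charge: int) -> float:
    """Singly protonated precursor mass [M+H]+ from an observed m/z at the given charge."""
    return precursor_mz * charge - (charge - 1) * PROTON


def dta_text(precursor_mz: float, charge: int, peaks: list[tuple[float, float]]) -> str:
    """Header line '[M+H]+ charge', then one 'm/z intensity' line per MS2 peak."""
    lines = [f"{mh_plus(precursor_mz, charge):.4f} {charge}"]
    lines += [f"{mz:.4f} {intensity:.1f}" for mz, intensity in peaks]
    return "\n".join(lines) + "\n"


def dta_files(scan: int, precursor_mz: float, peaks, charges=(2, 3)) -> dict[str, str]:
    """One text file per assumed charge state (hypothetical file naming)."""
    return {f"scan{scan}.{z}.dta": dta_text(precursor_mz, z, peaks) for z in charges}


files = dta_files(1234, 500.0, [(175.119, 1200.0), (262.151, 800.0)])
for name, text in files.items():
    print(name, text.splitlines()[0])
```

Each extra assumed charge state doubles or triples the search work for that spectrum, which is why the number of .dta files matters on the next slide.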

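9 The SEQUEST search loop, run for every MS2 spectrum in the LC run. For each .dta file, step through the human IPI protein database: digest to the next peptide (e.g., MSQVQVQVQNPSAALSGSQILNK), calculate its mass and compare it with the precursor mass (±1 Da). A peptide outside the window is not a candidate; for a candidate, calculate the theoretical spectrum, then correlate, score and return. Each .dta file costs about 3,250,000 such comparisons, so two files cost 2 × 3,250,000, three files 3 × 3,250,000, and so on.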
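The candidate-selection part of this loop can be sketched as below. It is a simplified, non-indexed search under stated assumptions: trypsin cuts after K or R unless the next residue is P, there are no missed cleavages or modifications, the precursor is given as a neutral monoisotopic mass, and the toy proteins are hypothetical. The scoring step is only marked by a comment.

```python
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def tryptic_peptides(protein: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic peptide mass: residue masses plus one water."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER


def non_indexed_candidates(proteins: dict[str, str], precursor_mass: float, tol: float = 1.0):
    """Walk every protein, digest on the fly and keep peptides within +/- tol Da."""
    candidates, comparisons = [], 0
    for protein_id, sequence in proteins.items():
        for peptide in tryptic_peptides(sequence):
            comparisons += 1
            if abs(peptide_mass(peptide) - precursor_mass) <= tol:
                # SEQUEST would build the theoretical spectrum and score it here.
                candidates.append((protein_id, peptide))
    return candidates, comparisons


proteins = {"prot1": "MAKPRGK", "prot2": "WEFGGHTVLR"}
print(non_indexed_candidates(proteins, 1200.6))
```

Every spectrum repeats the full digest-and-compare pass, which is the cost the slide multiplies by the number of .dta files.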

10 Correlation: the theoretical “candidate” spectrum and the experimental peptide spectrum are combined into a correlation spectrum. (Yates J.R. 3rd et al., J. Am. Soc. Mass Spectrom. 5 (1994))

11 Correlation spectrum (Yates J.R. 3rd et al., J. Am. Soc. Mass Spectrom. 5 (1994))

12 Correlation spectrum (Yates J.R. 3rd et al., J. Am. Soc. Mass Spectrom. 5 (1994))

13 Similarity scoring: from the correlation spectrum to the XCorr score (Yates J.R. 3rd et al., J. Am. Soc. Mass Spectrom. 5 (1994))

14 Similarity scoring: XCorr (cross-correlation) vs. dot product
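The difference between the two similarity scores can be sketched on binned spectra. A common description of XCorr is the correlation at zero offset, R_0, minus the mean correlation over offsets of -75 to +75 bins; a plain dot product is R_0 alone. This sketch assumes already-binned, preprocessed intensity vectors and omits SEQUEST's actual preprocessing (binning, windowed normalization); the toy spectra are hypothetical.

```python
def correlation(theo: list[float], expt: list[float], tau: int) -> float:
    """R_tau: sum over bins i of theo[i] * expt[i + tau], skipping bins outside expt."""
    return sum(t * expt[i + tau] for i, t in enumerate(theo) if 0 <= i + tau < len(expt))


def dot_product(theo: list[float], expt: list[float]) -> float:
    """Plain similarity: correlation at zero offset only."""
    return correlation(theo, expt, 0)


def xcorr(theo: list[float], expt: list[float], max_offset: int = 75) -> float:
    """R_0 minus the mean R_tau over offsets -max_offset..+max_offset (inclusive)."""
    offsets = range(-max_offset, max_offset + 1)
    background = sum(correlation(theo, expt, tau) for tau in offsets) / len(offsets)
    return correlation(theo, expt, 0) - background


theo = [0.0] * 200
for bin_index in (50, 100, 150):  # theoretical fragment ions in three bins
    theo[bin_index] = 1.0
matching = list(theo)             # idealized spectrum with exactly those peaks
flat = [1.0] * 200                # uninformative spectrum with signal in every bin

print(dot_product(theo, matching), xcorr(theo, matching))
print(dot_product(theo, flat), xcorr(theo, flat))
```

Both spectra score 3.0 by dot product, but the flat spectrum scores far lower by XCorr because it correlates almost as well at every offset as at zero offset.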

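15 Non-indexed searching: the human IPI protein database is read entry by entry, from the 1st entry (>ipi MSQVQVQVQNPSAALSGSQILNKNQSLLSQPLMSIPSTTSSLPSENAGRPIQNSALPSASITSTSAAAESITPTVELNAL…) through the …th entry (>ipi …AKPNINLITGHLEEPMPNPIDEMTEEQKEYEAMKLVNMLDKLSREELLKPMGLKPDGTIT), and every peptide is checked against the precursor mass (±1 Da).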

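16 Indexed searching: the human IPI database is digested in advance and its peptides are indexed by mass, starting from G (75 Da), so a search only retrieves the peptides within ±1 Da of the precursor mass (e.g., WEFGGHTVLR). (Figure: digested database entries, e.g., >ipi AKPNINLITGHLEEPMPNPIDEMTEEQEYEAMLVNMLDLSEELLKPMGLKPDGTIT.)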
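A sketch of the indexed approach, under the same simplifying assumptions as a plain tryptic digest (cut after K/R unless followed by P, no missed cleavages, no modifications, neutral monoisotopic masses). The index is built once; each query is then two binary searches with Python's `bisect` module instead of a pass over every protein. The function names and toy proteins are hypothetical.

```python
import bisect
import re

RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def tryptic_peptides(protein: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic peptide mass: residue masses plus one water."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER


def build_index(proteins: dict[str, str]):
    """Digest every protein once and sort (mass, peptide, protein_id) tuples by mass."""
    entries = sorted(
        (peptide_mass(pep), pep, pid)
        for pid, seq in proteins.items()
        for pep in tryptic_peptides(seq)
    )
    masses = [entry[0] for entry in entries]
    return masses, entries


def indexed_candidates(index, precursor_mass: float, tol: float = 1.0):
    """Binary-search the sorted masses for the +/- tol Da window."""
    masses, entries = index
    lo = bisect.bisect_left(masses, precursor_mass - tol)
    hi = bisect.bisect_right(masses, precursor_mass + tol)
    return [(pid, pep) for _, pep, pid in entries[lo:hi]]


index = build_index({"prot1": "MAKPRGK", "prot2": "WEFGGHTVLR"})
print(indexed_candidates(index, 1200.6))
```

The digestion cost is paid once for the whole run rather than once per spectrum, at the price of storing the index.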

17 Scoring and analysis. Figure: frequency vs. score/criterion, with the true negative (TN) and true positive (TP) distributions separated by a cutoff/threshold; false negatives (FN) and false positives (FP) fall on the wrong side of it. Table: Score/Metric 1, 2 and 3 for peptides A to F.
sensitivity = TP / (TP + FN)
precision = TP / (TP + FP)
specificity = TN / (TN + FP)
accuracy = (TP + TN) / (TP + TN + FN + FP)
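The four formulas above, applied to a score cutoff, can be sketched as follows. The correctness labels here are hypothetical; in real data they are unknown, which is why the following slides estimate error rates with decoys instead. The functions assume every denominator is non-zero.

```python
def confusion_counts(psms: list[tuple[float, bool]], cutoff: float):
    """psms: (score, is_correct) pairs; a PSM is accepted when score >= cutoff."""
    tp = sum(1 for s, ok in psms if s >= cutoff and ok)
    fp = sum(1 for s, ok in psms if s >= cutoff and not ok)
    fn = sum(1 for s, ok in psms if s < cutoff and ok)
    tn = sum(1 for s, ok in psms if s < cutoff and not ok)
    return tp, tn, fp, fn


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    return {
        "sensitivity": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fn + fp),
    }


# Hypothetical scored PSMs with known correctness.
psms = [(3.1, True), (2.8, True), (2.5, False), (1.9, True), (1.2, False), (0.8, False)]
counts = confusion_counts(psms, cutoff=2.0)
print(counts, classification_metrics(*counts))
```

Raising the cutoff trades sensitivity for precision; the threshold choice is exactly the balance the figure illustrates.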

18 The results: distinguishing right from wrong. In large proteomics data sets, where manual inspection of the data is impossible, how can we distinguish correct from incorrect peptide assignments? Use “decoy” sequences to divert non-peptidic, non-uniquely matchable or otherwise unmatchable spectra into a search space that is known a priori to be incorrect. Use the frequency of decoy sequences among all matched sequences to estimate the overall frequency of wrong answers (the false-positive rate, FPR). Adjust the filtering criteria to achieve an FPR of about 1%.

19 Decoy sequences? A “reversed” database! We generate decoy sequences by reversing each protein sequence in a given database, so that the resulting in silico digest contains nonsense peptides, and then append the reversed database to the end of the forward database. Decoy references are labeled with #. SEQUEST searches the database from top to bottom. A spectrum that matches a decoy could just as likely have matched a wrong forward (non-decoy) sequence, so each decoy hit stands for two false positives, and the FPR is (number of decoy matches × 2) / total matches.
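A minimal sketch of building the composite database and applying the slide's FPR formula. Putting the # label as a prefix on the protein reference is an assumption; the slide only says decoys are labeled with #. The function names and toy entries are hypothetical, and `estimated_fpr` expects a non-empty list.

```python
def reversed_decoys(proteins: dict[str, str], tag: str = "#") -> dict[str, str]:
    """Reverse each protein sequence and mark its reference with the decoy tag."""
    return {tag + pid: seq[::-1] for pid, seq in proteins.items()}


def composite_database(proteins: dict[str, str], tag: str = "#") -> dict[str, str]:
    """Forward entries first, decoys appended at the end (dicts keep insertion order)."""
    return {**proteins, **reversed_decoys(proteins, tag)}


def estimated_fpr(matched_ids: list[str], tag: str = "#") -> float:
    """(number of decoy matches x 2) / total matches."""
    decoys = sum(1 for pid in matched_ids if pid.startswith(tag))
    return 2 * decoys / len(matched_ids)


db = composite_database({"prot1": "MAGFASHTRP"})
print(db)
print(estimated_fpr(["prot1", "prot7", "#prot3", "prot9"]))
```

The reversed sequence keeps the forward protein's length and amino-acid composition, so its tryptic peptides span a similar mass range while being almost never real.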

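20 Target/decoy database searching. Forward database: 1. MAGFA→→→SHTRP; reversed database: 1. PRTHS→→→AFGAM. SEQUEST searches the composite database. Right matches fall in the forward (F) half, while wrong (random) matches spread evenly over the composite (F + R), 50% in each half. Filter (scoring, mass accuracy, etc.) and generate the final list, then estimate the FP rate as 2 × Rev (e.g., 4%), which covers both the known FPs (decoy hits) and the unknown FPs (wrong forward hits).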
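"Adjust filtering criteria to achieve ~1% FPR" can be sketched with a single score: rank PSMs from best to worst and keep the lowest cutoff whose accepted set still has an estimated FPR (2 × decoys / accepted) at or below the target. Real pipelines filter on several criteria at once (XCorr, dCn, mass accuracy); the one-dimensional cutoff, the function name and the toy PSMs are simplifying assumptions.

```python
def fpr_cutoff(psms: list[tuple[float, bool]], max_fpr: float = 0.01):
    """Lowest score cutoff whose accepted PSMs (score >= cutoff) have an estimated
    FPR of 2 x decoys / accepted <= max_fpr. psms: (score, is_decoy) pairs."""
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    cutoff, decoys = None, 0
    for i, (score, is_decoy) in enumerate(ranked):
        decoys += int(is_decoy)
        accepted = i + 1
        # Tied scores are accepted together, so only evaluate at the end of a tie run.
        end_of_ties = accepted == len(ranked) or ranked[accepted][0] != score
        if end_of_ties and 2 * decoys / accepted <= max_fpr:
            cutoff = score
    return cutoff


psms = [(5.0, False), (4.0, False), (3.0, False), (2.0, False), (1.5, True), (1.0, False)]
print(fpr_cutoff(psms))                # strict 1% target
print(fpr_cutoff(psms, max_fpr=0.5))   # looser target accepts everything
```

The function returns `None` when no cutoff reaches the target, e.g. when the best-scoring PSM is itself a decoy.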

21  Cn XCorr Forward Sequences  Cn XCorr Forward + Reverse TPFP PSM number sequest scores: finding true positives XCorr

22 High mass accuracy: precision of mass errors between observed and actual m/z. LTQ Orbitrap and LTQ FT: 0.1 ± 0.4 ppm. LTQ FT (SIM), AGC target 50,000 to avoid space-charge effects (Olsen et al. (2004) Mol. Cell. Proteomics 3): ± 1.0 ppm. High mass accuracy: Haas et al. (2006) Mol. Cell. Proteomics 5, 1326. Mass “accuracy” in proteomics: performance is related to the width of the error distribution, not the average error.
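The mass error and the "width, not average" point can be sketched with Python's `statistics` module. The m/z pairs are hypothetical, not data from the slide; the standard deviation stands in for the width of the error distribution.

```python
from statistics import mean, stdev


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Mass measurement error in parts per million (ppm)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


# Hypothetical (observed, theoretical) m/z pairs.
pairs = [(1000.0004, 1000.0), (785.8424, 785.8426), (523.2848, 523.2845), (1200.6041, 1200.6040)]
errors = [ppm_error(obs, theo) for obs, theo in pairs]

# A constant offset (e.g., a calibration bias) moves the mean but leaves the width unchanged.
shifted = [e + 5.0 for e in errors]
print(f"mean {mean(errors):.2f} ppm, SD {stdev(errors):.2f} ppm")
print(f"mean {mean(shifted):.2f} ppm, SD {stdev(shifted):.2f} ppm")
```

A systematic offset can be recalibrated away, whereas the spread sets how narrow a mass-accuracy window can be without losing true positives.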

23 MMA (mass measurement accuracy): true positives and false positives. Figure: PSM number vs. MMA for TPs and FPs. True positives cluster around an MMA of 0, whereas false positives are distributed evenly across MMA space.

24 MS/MS vs. MMA: precision vs. sensitivity. MS/MS criteria are strong precision filters, but they need good TP/FP separation to deliver sensitivity. MMA criteria are weak precision filters that help the MS/MS criteria improve sensitivity.

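25 Distracting wrong from right: MMA. Searching an extended precursor-mass window draws false positives into mass-error regions where true positives do not occur; filtering back to the narrow search space then removes them. Figure: true and false positives vs. MMA for the extended search space and for the filtered search space.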

26 Mass accuracy: another dimension of selectivity. Figure: ΔCn vs. XCorr for a tryptic search at ±2 Da (forward sequences; forward + reverse sequences), and for the same search after a 5 ppm filter.
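Slides 23 to 26 can be sketched as a filter on top of the target/decoy estimate: because false positives spread evenly over the wide (±2 Da) search window while true positives sit near 0 ppm, a tight window removes many false positives and few true ones. This assumes each PSM already carries its precursor mass error in ppm and a decoy flag; the 5 ppm window follows slide 26, and the PSM records are hypothetical.

```python
def estimated_fpr(psms: list[dict]) -> float:
    """2 x decoy matches / total matches (target/decoy estimate)."""
    if not psms:
        return 0.0
    return 2 * sum(1 for p in psms if p["decoy"]) / len(psms)


def mma_filter(psms: list[dict], window_ppm: float = 5.0) -> list[dict]:
    """Keep PSMs whose precursor mass error lies within +/- window_ppm."""
    return [p for p in psms if abs(p["ppm"]) <= window_ppm]


# Hypothetical PSMs from a +/-2 Da search: true hits near 0 ppm, wrong hits scattered widely.
psms = [
    {"peptide": "WEFGGHTVLR", "ppm": 0.6, "decoy": False},
    {"peptide": "GFDSNQTWR", "ppm": -1.4, "decoy": False},
    {"peptide": "MSFEILR", "ppm": 2.2, "decoy": False},
    {"peptide": "RLIEFSM", "ppm": 812.0, "decoy": True},
    {"peptide": "LSQLLK", "ppm": -1430.0, "decoy": False},
]
kept = mma_filter(psms)
print(estimated_fpr(psms), estimated_fpr(kept))
```

The filter also drops the wrong forward hit at -1430 ppm, which the decoy count could only estimate, not identify.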

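27 Distracting wrong from right: trypticity. A partial-enzyme search lets one terminus follow any residue (A, G, C, S, T, I, L, F, P, M, V, H, D, E, Y, W, Q, N) instead of K/R; filtering the results back to tryptic peptides (K/R-Peptide-K/R) removes false positives that landed on non-tryptic termini. Figure: true and false positives for the partial-enzyme search and for the filtered tryptic search.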
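Classifying PSMs by trypticity can be sketched as below, assuming SEQUEST-style "X.PEPTIDE.Y" strings that include the flanking residues, with "-" marking a protein terminus. The proline rule is ignored, and the helper names are hypothetical.

```python
def tryptic_termini(prev_aa: str, peptide: str, next_aa: str) -> int:
    """Count the peptide's termini consistent with trypsin (cleavage after K/R).

    '-' marks a protein terminus, which counts as a valid end. The proline rule is ignored.
    """
    n_term_ok = prev_aa in ("K", "R", "-")
    c_term_ok = peptide[-1] in ("K", "R") or next_aa == "-"
    return int(n_term_ok) + int(c_term_ok)


def trypticity(psm_sequence: str) -> str:
    """Classify a SEQUEST-style 'X.PEPTIDE.Y' string by its number of tryptic termini."""
    prev_aa, peptide, next_aa = psm_sequence.split(".")
    labels = {2: "fully tryptic", 1: "partially tryptic", 0: "non-tryptic"}
    return labels[tryptic_termini(prev_aa, peptide, next_aa)]


for psm in ("K.GFDSNQTWR.A", "S.GFDSNQTWR.A", "S.GFDSNQTW.R", "-.MSFEILR.-"):
    print(psm, trypticity(psm))
```

After a partial-enzyme search, keeping only the "fully tryptic" class corresponds to the filtered tryptic search on the slide.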

28 Phosphorylated vs. unphosphorylated peptides: XCorr and dCn (n = 286). What do we have here, hm? Reversed hits.

29 Phosphopeptides: chemically disadvantaged… Dataset of phosphorylated and unphosphorylated peptide MS/MS pairs, n = 286 (207 singly phosphorylated, 79 doubly phosphorylated), e.g., MSFEILR with and without a phosphate (P). Figure: XCorr (phosphorylated) vs. XCorr (unphosphorylated), and dCn (phosphorylated) vs. dCn (unphosphorylated).

30 Phosphopeptides: less power in XCorr and dCn. Figure: XCorr (Ph/UnPh) and dCn (Ph/UnPh) for singly and doubly phosphorylated peptides relative to their unphosphorylated forms; labeled values: XCorr 86%, dCn 93%.

31 Mass accuracy: can it help for phosphorylation? Workflow: yeast whole-cell lysate → reduction, alkylation → SDS-PAGE (kDa) → trypsin digestion → IMAC purification.

32 Mass accuracy: rescuing phosphopeptides. SEQUEST partial-enzyme search, fully tryptic peptide-spectrum matches. LTQ TOP10 (n = 1390): XCorr cutoffs by charge state (+2 … 3.5). Orbitrap TOP10 (n = 1311): XCorr vs. MMA (ppm).

33 Mission: phosphopeptide rescue, accomplished! Figure: number of phosphopeptides and % FP for LTQ and LTQ Orbitrap data, without MMA and with MMA filtering; labeled result: a 74% increase.

34 Search algorithms and phosphorylation: SEQUEST and OMSSA (Bakalarski et al., Anal. Bioanal. Chem., 2007)

35 Phosphorylation site localization: GFDSNQpTWR or GFDpSNQTWR? (Beausoleil et al., Nat. Biotechnol., 2006)
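Telling the two isoforms apart depends on fragment ions whose mass differs between them (the "site-determining ions" used by scores such as the one in Beausoleil et al.). The sketch below only enumerates which b and y ions distinguish two candidate sites for a single phosphate; it does not match peaks or compute a localization score, and the function name is hypothetical.

```python
def site_determining_ions(peptide: str, site_a: int, site_b: int):
    """b and y ions whose mass differs between two single-phosphosite isoforms.

    site_a, site_b: 0-based residue positions of the two candidate sites.
    b_i covers residues 0..i-1; y_j covers residues n-j..n-1.
    """
    lo, hi = sorted((site_a, site_b))
    n = len(peptide)
    # b_i contains the earlier site but not the later one when lo < i <= hi.
    b_ions = [f"b{i}" for i in range(1, n) if lo < i <= hi]
    # y_j contains the later site but not the earlier one when lo < n - j <= hi.
    y_ions = [f"y{j}" for j in range(1, n) if lo < n - j <= hi]
    return b_ions, y_ions


peptide = "GFDSNQTWR"
print(site_determining_ions(peptide, peptide.index("S"), peptide.index("T")))
```

For GFDSNQTWR, only b4 to b6 and y3 to y5 carry the phosphate in one isoform but not the other; every other fragment has the same mass either way and says nothing about the site.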

36 Phosphorylation site localization (Beausoleil et al., Nat. Biotechnol., 2006)

37 Phosphorylation site localization (Taus et al., J. Proteome Res., 2011)

38 Phosphorylation false localization rate (FLR) (Chalkley & Clauser, MCP, 2012; Baker et al., MCP, 2011). Use non-native phosphoacceptors as “decoys”: Ser + Thr make up 14.1% of the human proteome, Pro + Glu 14.5%. Allow the search engine or localization assessment tool to consider pP and pE as true-negative “decoys”, then calculate the dataset FLR from the frequency of pP + pE decoy localizations.
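One way to turn decoy localizations into an FLR estimate is sketched below. It rests on an assumption that is not stated on the slide: wrong localizations land on residues in proportion to their frequency, so the number of wrong Ser/Thr calls is roughly the number of pP + pE calls scaled by 14.1/14.5. Tyr is ignored, and the counts and function name are hypothetical; published FLR methods may differ in detail.

```python
# Residue frequencies in the human proteome, as quoted on the slide.
FREQ_SER_THR = 0.141
FREQ_PRO_GLU = 0.145


def estimated_flr(n_st_sites: int, n_decoy_sites: int,
                  freq_st: float = FREQ_SER_THR, freq_pe: float = FREQ_PRO_GLU) -> float:
    """Estimate the false localization rate among pS/pT calls from pP + pE 'decoy' calls.

    Assumption: wrong localizations land on residues in proportion to their frequency,
    so wrong S/T calls are approximately decoy calls * freq_st / freq_pe.
    """
    if n_st_sites == 0:
        return 0.0
    expected_wrong_st = n_decoy_sites * freq_st / freq_pe
    return expected_wrong_st / n_st_sites


# Hypothetical counts: 1000 sites localized to Ser/Thr, 29 localized to Pro or Glu.
print(f"estimated FLR: {estimated_flr(1000, 29):.2%}")
```

Because the two residue pairs are almost equally common, the scaling factor is close to 1 and the estimate is nearly the raw decoy count divided by the number of Ser/Thr sites.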

