Reporting Protein Identifications from MS/MS Results Brian C. Searle Proteome Software Inc. Portland, Oregon USA Creative.


1 Reporting Protein Identifications from MS/MS Results Brian C. Searle Proteome Software Inc. Portland, Oregon USA Creative Commons Attribution

2 Outline: Assigning Proteins from Peptide IDs; Correcting for One-Hit-Wonders; Protein False Discovery Rates?; Correcting for Shared Peptides; Publication Standards

4 Just to Review: [Figure labels: F, R; “clearly wrong”, “possibly correct”] Elias JE, Gygi SP. Nat Methods Mar;4(3):

5 Just to Review:

 #   Spectrum    Accession    Peptide             Score
 1   scan 3632   P35908       GFSSGSAVVSGGSR       4.6
 2   scan 3609   P0AFY8       FSAASQPAAPVTK        3.7
 3   scan 3629   P0A940       GFQSNTIGPK           3.0
 4   scan 3635   P0A6F9       STRGEVLAVGNGR        2.2
 5   scan 3636   P0A870       ELAESEGAIER          2.1
 6   scan 3607   P0A799       ADLNVPVKDGK          1.9
 7   scan 3626   P0ABC7       EAEAYTNEVQPR         1.6
 8   scan 3602   P0A853       IRVIEPVKR            1.4
 9   scan 3623   P38489       KLTPEQAEQIK          0.9
 10  scan 3616   P00448       GTTLQGDLK            0.8
 11  scan 3621   P09546       LLPGPTGER            0.4
 12  scan 3615   P0AFG8       AFLEGR               0.2
 13  scan 3624   P14565       SAADVAIMK            0.0
 14  scan 3613   rev_P06864   EGSLAVNVQGDAAIR
 15  scan 3604   P36562       DPEEVVGIGANLPTDK
 16  scan 3606   P0A9C5       IPVVSSPK
 17  scan 3611   P0ABB0       ASTISNVVR
 18  scan 3614   rev_Q2EEU2   KFVALTCDTLLLGER
 19  scan 3620   rev_P0ACL5   NNESAALMKEYCR
 20  scan 3633   rev_P37309   SDGSCNQRALNR
 21  scan 3627   P32132       VEETEDADAFRVSGR
 22  scan 3618   P37342       ILTQDEIDVR
 23  scan 3610   rev_P0ADK0   IANVSDVVPR
 24  scan 3601   P0AG93       LGMKREHMLQQK        -1.3

7 Just to Review: [peptide match table, as on slide 5] ?

8 …Well, Maybe

9 Peptides: AEPTIR, IDVCIVLLQHK, NTGDR → Protein

10 Peptide probabilities: AEPTIR 85%, IDVCIVLLQHK 65%, NTGDR 25% → Protein ??%

11 FDRs for Whole Datasets vs Individual Peptides: Cumulative FDRs only estimate the validity of a data set. Probabilities (or instantaneous FDRs) estimate the validity of a peptide of interest.
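A minimal sketch of a cumulative FDR estimate, under two assumptions that are not spelled out on the slide: the search was run against a concatenated target plus reversed database whose decoy accessions carry a `rev_` prefix (as in the table on slide 5), and every decoy hit implies one equally likely wrong target hit (the "2x Decoy" convention used on the later histograms). The function name and the decoy scores in the example are hypothetical.

```python
def cumulative_fdr(matches, threshold):
    """Estimate the FDR of all matches scoring at or above `threshold`.

    `matches` is a list of (accession, score) pairs from a concatenated
    target+decoy search; decoys are assumed to carry a "rev_" prefix.
    """
    kept = [acc for acc, score in matches if score >= threshold]
    if not kept:
        return 0.0
    decoys = sum(acc.startswith("rev_") for acc in kept)
    # Assumption: each decoy hit stands for one wrong target hit as well,
    # so the estimated number of wrong matches is twice the decoy count.
    return min(1.0, 2 * decoys / len(kept))


# The three target scores come from slide 5; the remaining scores are
# made up for illustration because they did not survive in the transcript.
matches = [
    ("P35908", 4.6), ("P0AFY8", 3.7), ("P0A940", 3.0),
    ("rev_P06864", -0.1), ("P36562", -0.2), ("rev_Q2EEU2", -0.5),
]
print(cumulative_fdr(matches, 0.0))   # no decoys above the threshold
print(cumulative_fdr(matches, -1.0))  # two decoys among six matches
```

The number describes the whole accepted list; it says nothing about whether any single match of interest is right, which is the gap the instantaneous FDR is meant to fill.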

12 One Possible Approach: Instantaneous False Discovery Rate. Many Others: PeptideProphet (TPP, Scaffold), Percolator, Spectral Energies, RAId De Novo

14 Just to Review: [peptide match table, as on slide 5, with matches grouped into score bins: 4 to 5, 3 to 4, 2 to 3, 1 to 2, 0 to 1, -1 to 0, -2 to -1]

15 Histogram of Decoy Matches [Figure: # of Matches vs. Ion Score – Identity Score; series “Correct” and “2x Decoy”]

17 Curve Fit Distributions [Figure: # of Matches vs. Ion Score – Identity Score; series “Correct” and “2x Decoy”] Choi H, Ghosh D, Nesvizhskii AI. J Proteome Res Jan;7(1):

18 Instantaneous FDR Method [Figure: # of Matches vs. Ion Score – Identity Score; series “Correct” and “2x Decoy”] Choi H, Ghosh D, Nesvizhskii AI. J Proteome Res Jan;7(1):
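The instantaneous FDR idea can be sketched with plain score bins, as in the binned table on slide 14. This is a simplification: the slides fit curves to the "Correct" and "2x Decoy" distributions, whereas the sketch below only counts matches per bin. The `rev_` prefix for decoys, the "2x decoy" estimate, and all names and example data are assumptions for illustration.

```python
import math
from collections import defaultdict


def instantaneous_fdr(matches, bin_width=1.0):
    """Return {bin lower edge: estimated local FDR} for each score bin."""
    totals = defaultdict(int)
    decoys = defaultdict(int)
    for accession, score in matches:
        edge = math.floor(score / bin_width) * bin_width
        totals[edge] += 1
        if accession.startswith("rev_"):
            decoys[edge] += 1
    # Assumption: wrong matches in a bin are estimated as twice its decoys.
    return {
        edge: min(1.0, 2 * decoys[edge] / totals[edge])
        for edge in sorted(totals)
    }


# Hypothetical matches: (accession, score)
example = [
    ("T1", 4.6), ("T2", 0.4), ("T3", 0.2), ("rev_D1", 0.1),
    ("T4", 0.9), ("rev_D2", -0.5), ("T5", -0.3),
]
for edge, fdr in instantaneous_fdr(example).items():
    print(f"{edge:+.0f} to {edge + 1:+.0f}: {fdr:.2f}")
```

With few matches per bin the estimate jumps around, which is one reason to fit smooth distributions instead of using raw counts.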

19 Peptide probabilities: AEPTIR 85%, IDVCIVLLQHK 65%, NTGDR 25% → Protein ??%

20 Probability that each peptide is wrong: AEPTIR (15%), IDVCIVLLQHK (35%), NTGDR (75%) → Protein (??%) Feng J, Naiman DQ, Cooper B. Anal Chem May 15;79(10):

21 Probability that all three peptides are wrong: 0.15 * 0.35 * 0.75 = 0.04 → Protein (4%) Feng J, Naiman DQ, Cooper B. Anal Chem May 15;79(10):

22 Protein probability: 1 - (0.15 * 0.35 * 0.75) = 1 - 0.04 = 96%, from peptides AEPTIR 85%, IDVCIVLLQHK 65%, NTGDR 25% Feng J, Naiman DQ, Cooper B. Anal Chem May 15;79(10):
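The arithmetic on the slide can be written as a small function. It assumes the peptide identifications are independent of one another, which is the simplification the next slide questions; the function name is only illustrative.

```python
from math import prod


def protein_probability(peptide_probs):
    """Probability that at least one of the protein's peptides is correct."""
    # Multiply the chances that each peptide is wrong, then take the complement.
    return 1.0 - prod(1.0 - p for p in peptide_probs)


p = protein_probability([0.85, 0.65, 0.25])
print(f"{p:.2f}")  # 1 - 0.15 * 0.35 * 0.75
```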

23 If only it were so easy!

24 Peptide 1 to Peptide 10: 80% of peptides are correct

25 Peptide 1 to Peptide 10; Correct Protein A, Correct Protein B: 80% of peptides are correct

26 Peptide 1 to Peptide 10; Correct Protein A, Correct Protein B, Incorrect Protein C, Incorrect Protein D: 80% of peptides are correct, but only 50% of proteins
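A small sketch of why a good peptide-level accuracy can turn into a poor protein-level one. The exact peptide-to-protein mapping is not recoverable from the transcript, so the mapping below is an assumption chosen to match the slide's totals: the eight correct peptides cluster on Proteins A and B, while each of the two wrong peptides points to its own protein.

```python
# Hypothetical mapping consistent with "80% Peptides, 50% Proteins".
peptide_to_protein = {
    1: "A", 2: "A", 3: "A", 4: "A",
    5: "B", 6: "B", 7: "B", 8: "B",
    9: "C", 10: "D",
}
correct_peptides = {1, 2, 3, 4, 5, 6, 7, 8}

peptide_accuracy = len(correct_peptides) / len(peptide_to_protein)

# A protein counts as correct if at least one correct peptide supports it.
all_proteins = set(peptide_to_protein.values())
correct_proteins = {
    prot for pep, prot in peptide_to_protein.items() if pep in correct_peptides
}
protein_accuracy = len(correct_proteins) / len(all_proteins)

print(f"peptides: {peptide_accuracy:.0%}, proteins: {protein_accuracy:.0%}")
```

Correct peptides pile up on the same few proteins, while wrong peptides scatter, so errors are enriched at the protein level.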

27 One hit wonders are dubious at best

28 Outline: Assigning Proteins from Peptide IDs; Correcting for One-Hit-Wonders; Protein False Discovery Rates?; Correcting for Shared Peptides; Publication Standards

29 [Figure: Computed Probability vs. Actual Probability] Nesvizhskii, A. I.; Keller, A. et al Anal. Chem. 75,

30 [Figure: Computed Probability vs. Actual Probability, with regions of UNDER estimation and OVER estimation] Nesvizhskii, A. I.; Keller, A. et al Anal. Chem. 75,

32 What if we could score one-hit-wonderness? Nesvizhskii, A. I.; Keller, A. et al Anal. Chem. 75,

33 Combining different peptides. Quantify as a score: If different peptides agree: Good! If peptides are one-hit-wonders: Bad! Nesvizhskii, A. I.; Keller, A. et al Anal. Chem. 75,

34 Combining different peptides. Quantify as a score: If different peptides agree: Good! If peptides are one-hit-wonders: Bad! Peptide agreement score: Nesvizhskii, A. I.; Keller, A. et al Anal. Chem. 75,

35 Combining different peptides. Quantify as a score: If different peptides agree: Good! If peptides are one-hit-wonders: Bad! Peptide agreement score: NSP score for peptide (k) is the sum of other agreeing peptides (not k). Nesvizhskii, A. I.; Keller, A. et al Anal. Chem. 75,
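A minimal sketch of the peptide agreement score described above. It assumes that "the sum of other agreeing peptides" means the sum of the probabilities of the other peptides assigned to the same protein; the formula itself did not survive in the transcript, so treat this reading, and the function name, as assumptions.

```python
def nsp_scores(peptide_probs):
    """Map each peptide to the summed probability of its sibling peptides.

    `peptide_probs` holds {peptide sequence: probability} for one protein.
    """
    total = sum(peptide_probs.values())
    # Sum over all peptides of the protein except peptide k itself.
    return {pep: total - p for pep, p in peptide_probs.items()}


multi_hit = nsp_scores({"AEPTIR": 0.85, "IDVCIVLLQHK": 0.65, "NTGDR": 0.25})
one_hit = nsp_scores({"AEPTIR": 0.85})
print(multi_hit)
print(one_hit)  # a one-hit-wonder has no siblings, so its score is zero
```

A high score means other peptides agree on the same protein; a score of zero marks a one-hit-wonder.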

36 ProteinProphet Distributions [Figure: Multi-hit Proteins vs. One-hit Wonders]

37 ProteinProphet Distributions

39 One hit wonders (decrease prob); in between (keep same); multi-hit proteins (increase prob)

40 [Figure: Computed Probability vs. Actual Probability, with regions of UNDER estimation and OVER estimation] Nesvizhskii, A. I.; Keller, A. et al Anal. Chem. 75,

41 [Figure: Computed Probability vs. Actual Probability, with NSP and without NSP] Nesvizhskii, A. I.; Keller, A. et al Anal. Chem. 75,

42 Brian, I hate math. What do I do?

43 Option 1: Throw Out One-Hit-Wonders Advantages: Easy, works! Disadvantages: Loss of sensitivity!

44 Option 2: Use Multiple Filters. Filter 1 - Protein Mode: ≥2 peptides/protein, moderate spectrum threshold. Filter 2 - Peptide Mode: 1 peptide/protein, high spectrum threshold.

45 Option 2: Use Multiple Filters Advantages: More sensitive! Disadvantages: Pretty arbitrary!
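The two-filter rule could be sketched as follows. The slides give no numeric thresholds, so the `moderate` and `high` cutoffs below are hypothetical placeholders, as is the function name.

```python
def passes_filters(peptide_probs, moderate=0.80, high=0.95):
    """Accept a protein if it passes either of the two filters.

    Filter 1 (protein mode): at least 2 peptides above a moderate threshold.
    Filter 2 (peptide mode): at least 1 peptide above a high threshold.
    """
    n_moderate = sum(p >= moderate for p in peptide_probs)
    n_high = sum(p >= high for p in peptide_probs)
    return n_moderate >= 2 or n_high >= 1


print(passes_filters([0.85, 0.82]))  # two moderate peptides
print(passes_filters([0.97]))        # one very confident peptide
print(passes_filters([0.85]))        # a moderate one-hit-wonder
```

The cutoffs have to be picked by hand, which is the "pretty arbitrary" drawback the slide mentions.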

46 Option 3: Assigning Proteins from Peptide IDs Correcting for One-Hit-Wonders Protein False Discovery Rates? Correcting for Shared Peptides Publication Standards

47 [Table: ranked protein identifications with columns #, Accession, Protein Score, including reversed (rev_) decoy entries; the accessions are truncated and the scores are missing in the transcript]

49 Protein FDRs only accurate with >100 Proteins [Figure: Uncertainty in Protein FDR vs. Number of Confidently IDed Proteins; marker: 1% Error In FDR Estimation]

50 Histogram of Decoy PROTEIN Matches [Figure: # Protein Identifications vs. Protein Score; series “Correct” and “2x Decoy”]

51 Instantaneous Protein FDRs… Estimate the likelihood that a single protein of interest is present. Are trouble at best due to stochastic sampling. Shouldn’t be used with <500 likely proteins. –Better off calculating protein probabilities using a model like ProteinProphet
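One way to see why small protein lists give shaky FDRs is a simple counting argument, which is not necessarily how the slide's uncertainty curve was computed. It assumes the protein FDR is estimated as decoy proteins divided by accepted proteins, so a single chance decoy hit moves the estimate by one step of 1/N.

```python
def fdr_step(n_proteins):
    """Change in the estimated FDR caused by one extra decoy protein."""
    return 1.0 / n_proteins


for n in (20, 100, 500):
    print(n, f"{fdr_step(n):.1%}")
```

Under this assumption a list of 100 proteins cannot resolve the FDR more finely than one percentage point, and shorter lists are coarser still.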

52 Proteins don’t exist in isolation

53 Outline: Assigning Proteins from Peptide IDs; Correcting for One-Hit-Wonders; Protein False Discovery Rates?; Correcting for Shared Peptides; Publication Standards

54 Nesvizhskii, A. I.; Aebersold, R. Mol. Cell. Proteom. 4.10, , 2005

57 Tubulin alpha 6, Tubulin alpha 3, Tubulin alpha 4; peptide YMACCLLYR; 85%, ??%

58 Tubulin alpha 6, Tubulin alpha 3, Tubulin alpha 4; peptide YMACCLLYR; 85% Nesvizhskii, A. I.; Keller, A. et al Anal. Chem. 75,

59 Tubulin alpha 6, Tubulin alpha 3, Tubulin alpha 4; peptides YMACCLLYR, SIQFVDWCPTGFK; ??%

60 Tubulin alpha 6, Tubulin alpha 3, Tubulin alpha 4; peptides YMACCLLYR, SIQFVDWCPTGFK

61 Distinct Proteins: peptide sets {Peptide 1, Peptide 2} and {Peptide 3, Peptide 4}; Protein A, Protein B; 100%

62 Indistinguishable Proteins: peptide sets {Peptide 1, Peptide 2, Peptide 3, Peptide 4} and {Peptide 1, Peptide 2, Peptide 3, Peptide 4}; Protein A, Protein B; 50%

63 Differentiable Proteins: peptide sets {Peptide 1, Peptide 2, Peptide 3} and {Peptide 2, Peptide 3, Peptide 4}; Protein A, Protein B; 100%, 50%, 100%

64 Subset Proteins: peptide sets {Peptide 1, Peptide 2, Peptide 3, Peptide 4} and {Peptide 2, Peptide 3, Peptide 4}; Protein A, Protein B; 100%, 0%
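The four cases above can be told apart with plain set operations on the peptides observed for each protein. This is a sketch of the classification only, not of how any particular tool assigns the percentages; the function name is illustrative.

```python
def classify_pair(peptides_a, peptides_b):
    """Classify two proteins by how their observed peptide sets relate."""
    a, b = set(peptides_a), set(peptides_b)
    if a == b:
        return "indistinguishable"  # identical evidence for both proteins
    if a.isdisjoint(b):
        return "distinct"           # no shared peptides at all
    if a < b or b < a:
        return "subset"             # one protein's peptides all fit the other
    return "differentiable"         # shared peptides plus unique ones on each


print(classify_pair({1, 2}, {3, 4}))
print(classify_pair({1, 2, 3, 4}, {1, 2, 3, 4}))
print(classify_pair({1, 2, 3}, {2, 3, 4}))
print(classify_pair({1, 2, 3, 4}, {2, 3, 4}))
```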

66 Indistinguishable

67 Differentiable

68 Subset

69 The Quantitative Subset Complication: peptide sets {Peptide 1, Peptide 2, Peptide 3, Peptide 4} and {Peptide 2, Peptide 3, Peptide 4}; Protein A, Protein B

71 The Quantitative Subset Complication: peptide sets {Peptide 1, Peptide 2, Peptide 3, Peptide 4} and {Peptide 2, Peptide 3, Peptide 4}; Protein A, Protein B ?

73 [Figure: peptides EAFIDHGEEFSGR, GSFPMAE K, NLGMGK; labels: Specific to 2c29, Specific to 2c40, Common to both; proteins P450 2c40 and P450 2c29; Ratio ≈ 1.1, Ratio ≈ 1.6, Ratio ≈ 2.2]

75 The Hidden Subset Complication Peptide 1 Protein B Protein A Peptide 2 Peptide 3 Peptide 2 Peptide 3Peptide 4 Protein C

76 The Hidden Subset Complication Peptide 1 Protein B Protein A Peptide 2 Peptide 3 Peptide 2 Peptide 3Peptide 4 Protein C 100%

77 The Hidden Subset Complication Peptide 1 Protein B Protein A Peptide 2 Peptide 3 Peptide 2 Peptide 3Peptide 4 Protein C 100% 0% 100%
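A hidden subset can be found with a greedy "fewest proteins that explain all peptides" pass, in the spirit of Occam's razor. The peptide grouping below is an assumed reading of the slide (Protein A: Peptides 1 and 2; Protein B: Peptides 2 and 3; Protein C: Peptides 3 and 4), and greedy set cover is only one simple heuristic, not necessarily what any given software does.

```python
def minimal_protein_list(protein_peptides):
    """Greedily pick proteins until every observed peptide is explained."""
    uncovered = set().union(*protein_peptides.values())
    chosen = []
    while uncovered:
        # Take the protein explaining the most still-unexplained peptides;
        # sorting the names makes ties resolve the same way on every run.
        best = max(
            sorted(protein_peptides),
            key=lambda prot: len(protein_peptides[prot] & uncovered),
        )
        chosen.append(best)
        uncovered -= protein_peptides[best]
    return chosen


# Assumed grouping: B shares Peptide 2 with A and Peptide 3 with C.
proteins = {"A": {1, 2}, "B": {2, 3}, "C": {3, 4}}
print(minimal_protein_list(proteins))
```

Protein B is not a subset of A or of C alone, yet A and C together explain everything B could, so B drops out of the minimal list.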

78 The Bold Red Complication Peptide 1 Protein B Protein A Peptide 2Peptide 3Peptide 4 Peptide 3Peptide 4Peptide 5

79 The Bold Red Complication Peptide 1 Protein B Protein A 100% Peptide 2Peptide 3Peptide 4 Peptide 3Peptide 4Peptide 5 100% 0% 100%

80 The Bold Red Complication Peptide 1 Protein B Protein A 100% Peptide 2Peptide 3Peptide 4 Peptide 3Peptide 4Peptide 5 100% 0% 100% ?

81 The Bold Red Complication: Protein A (Peptide 1, Peptide 2, Peptide 3, Peptide 4), Protein B (Peptide 3, Peptide 4, Peptide 5)

 Protein Identification        Unique Peptides      Trust
 Family of A and B             5 Unique, 5 Total    High
 Definitive ID of Protein A    2 Unique, 4 Total    Med
 Definitive ID of Protein B    1 Unique, 3 Total    Low
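The unique and total peptide counts in the table can be reproduced with set arithmetic, taking Protein A as Peptides 1 to 4 and Protein B as Peptides 3 to 5, which is the grouping the counts imply. The function name is illustrative.

```python
def peptide_counts(protein_peptides):
    """Return {protein: (unique peptide count, total peptide count)}."""
    counts = {}
    for prot, peps in protein_peptides.items():
        # Peptides seen in any other protein are not unique to this one.
        others = set().union(
            *(p for name, p in protein_peptides.items() if name != prot)
        )
        counts[prot] = (len(peps - others), len(peps))
    return counts


proteins = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}}
family_total = len(set().union(*proteins.values()))
print(peptide_counts(proteins), family_total)
```

The family of A and B is supported by all five peptides, while a definitive claim for B alone rests on a single unique peptide, hence the lower trust.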

82 The Similar Peptide Complication AVGNLR Scan Number: 2435 GLGNLR

83 The Similar Peptide Complication AVGNLR Scan Number: 2435 TLR9_HUMAN GLGNLR TRFE_HUMAN LRFN1_HUMAN

84 The Similar Peptide Complication AVGNLR Scan Number: 2435 TLR9_HUMAN TRFE_HUMAN LRFN1_HUMAN
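One reason two such peptides are hard to tell apart is mass: AVGNLR and GLGNLR share the suffix GNLR, and the prefixes AV and GL have the same elemental composition. The slide does not state this explicitly, so treat the check below as an illustration; it uses standard monoisotopic residue masses, and the function name is hypothetical.

```python
# Monoisotopic residue masses (Da) for the residues used here.
MONO = {
    "A": 71.03711, "G": 57.02146, "L": 113.08406,
    "N": 114.04293, "R": 156.10111, "V": 99.06841,
}
WATER = 18.01056  # added once per peptide for the free termini


def peptide_mass(seq):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in seq) + WATER


m1 = peptide_mass("AVGNLR")
m2 = peptide_mass("GLGNLR")
print(f"{m1:.4f} {m2:.4f} difference {abs(m1 - m2):.4f}")
```

With identical precursor masses and a shared four-residue suffix, a single spectrum can plausibly match either sequence, and each sequence can point to different proteins.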

85 No software deals with all of these issues

86 Outline: Assigning Proteins from Peptide IDs; Correcting for One-Hit-Wonders; Protein False Discovery Rates?; Correcting for Shared Peptides; Publication Standards

87 In 2006 MCP published guidelines for reporting peptide and protein identifications. Other proteomics journals have adopted similar standards. Revised “Paris 2” guidelines are forthcoming. Expected to be enforced 1/1/2010!

88 Guidelines remind you: To present a complete methods/results section (I. Search Parameters and Acceptance Criteria; VI. Raw Data Submission).

89 Guidelines remind you: To present a complete methods/results section (I. Search Parameters and Acceptance Criteria; VI. Raw Data Submission). Follow smart criteria for choosing results to publish (II. Protein and Peptide Identification; IV. Protein Inference from Peptide Assignments; V. Quantification).

90 Guidelines remind you: To present a complete methods/results section (I. Search Parameters and Acceptance Criteria; VI. Raw Data Submission). Follow smart criteria for choosing results to publish (II. Protein and Peptide Identification; IV. Protein Inference from Peptide Assignments; V. Quantification). To not over-report your results (III. Post-Translational Modifications).

91 Software Can Make Guideline Fulfillment Easier: Peak picking software, version, altered parameters. Database Selection: database name and version; species restriction; number of proteins searched. Database search parameters: search engine name and version; enzyme specificity; # missed cleavages; fixed/variable modifications; mass tolerances. Peptide selection criteria.
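The checklist on this slide could be tracked with a small record type that flags whatever is still unreported. The class, its field names, and the example values are hypothetical and only mirror the items listed above; they are not taken from any guideline text or software.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SearchReport:
    """Items to report for a database search; None means not yet filled in."""
    peak_picking_software: Optional[str] = None
    database_name_version: Optional[str] = None
    species_restriction: Optional[str] = None
    proteins_searched: Optional[int] = None
    search_engine_version: Optional[str] = None
    enzyme_specificity: Optional[str] = None
    missed_cleavages: Optional[int] = None
    modifications: Optional[str] = None
    mass_tolerances: Optional[str] = None
    peptide_selection_criteria: Optional[str] = None


def missing_items(report):
    """List the checklist fields that have not been filled in."""
    return [f.name for f in fields(report) if getattr(report, f.name) is None]


# Example values are placeholders for illustration.
report = SearchReport(enzyme_specificity="trypsin", missed_cleavages=2)
print(missing_items(report))
```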

92 XML Standards Can Make Guideline Fulfillment Easier: I. Search Parameters and Acceptance Criteria; II. Protein and Peptide Identification; III. Post-Translational Modifications; IV. Protein Inference from Peptide Assignments; V. Quantification; VI. Raw Data Submission. Standards shown: mzIdentML, mzML

94 Where are they?
Molecular & Cellular Proteomics: Bradshaw, R. A., Burlingame, A. L., Carr, S., Aebersold, R., Reporting Protein Identification Data: The next Generation of Guidelines. Mol. Cell. Proteomics, 5:
Journal of Proteome Research: Beavis, R., Editorial: The Paris Consensus. J. Proteome Res., 2005, 4 (5), p 1475
Proteomics: Wilkins, M. R., Appel, R. D., Van Eyk, J. E., Maxey, C. M., et al., Guidelines for the next 10 years of proteomics. Proteomics. 2006, 6, 1, 4-8.

95 Conclusions We identify Proteins (not Peptides)! –Can’t stop at Peptide FDRs and Probabilities

96 Conclusions We identify Proteins (not Peptides)! –Can’t stop at Peptide FDRs and Probabilities One-Hit-Wonders are often wrong and need to be seriously investigated (manually or mathematically)

97 Conclusions We identify Proteins (not Peptides)! –Can’t stop at Peptide FDRs and Probabilities One-Hit-Wonders are often wrong and need to be seriously investigated (manually or mathematically) You can compute Protein level FDRs –But take them with a grain of salt!

98 Conclusions We identify Proteins (not Peptides)! –Can’t stop at Peptide FDRs and Probabilities One-Hit-Wonders are often wrong and need to be seriously investigated (manually or mathematically) You can compute Protein level FDRs –But take them with a grain of salt! Occam’s Razor can simplify Shared Peptides

99 Conclusions We identify Proteins (not Peptides)! –Can’t stop at Peptide FDRs and Probabilities One-Hit-Wonders are often wrong and need to be seriously investigated (manually or mathematically) You can compute Protein level FDRs –But take them with a grain of salt! Occam’s Razor can simplify Shared Peptides Publication Standards exist to help you

