3 Study Goals
- Primary: Evaluate the ability of participants to identify modified peptides present in a complex mixture
- Secondary: Find out why result sets might differ between participants
- Tertiary: Produce a benchmark dataset, along with an analysis resource
4 Study Design
- Use a common, rich dataset
- Use a common sequence database
- Allow participants to use the bioinformatic tools and methods of their choosing
- Use a common reporting template
- Report results at an estimated 1% FDR (at the spectrum level)
- Ignore protein inference
5 Sample
- Tryptic digest of yeast (RM8323, NIST), spiked with 69 synthetic modified peptides (tryptic peptides from 6 different proteins, sPRG)
- Modifications on the synthetic peptides:
  - Phospho (STY)
  - Sulfo (Y)
  - Mono-, di-, trimethyl (K)
  - Mono-, dimethyl (R)
  - Acetyl (K)
  - Nitro (Y)
6 Supplied Study Materials
- 5600 TripleTOF dataset (i.e. WIFF file)
- WIFF, mzML, dta, MGF (de-isotoped); conversions by MS Data Converter 1.1.0
- MGF (not de-isotoped); conversion by Mascot Distiller 2.4
- 1 FASTA file (UniProtKB/SwissProt S. cerevisiae, human, + 1 bovine protein + trypsin, from Dec. 2011)
- 1 template (Excel)
- 1 on-line survey (Survey Monkey)
7 Instructions to Participants
- Retrieve and analyze the data file in the format of your choosing, with the method(s) of your choosing
- Report the peptide-to-spectrum matches in the provided template
- Report measures of reliability for PTM site assignments (optional)
- Fill out the survey
- Attach a 1-2 page description of the methodology employed
9 Soliciting Participants and Logistics
- Study advertised on the ABRF website and listserv, and by direct invitation from iPRG members
[Diagram: communication between Participant and iPRG members]
1. Participant sends participation request
2. iPRG members send official study letter with instructions
3. All further communication (e.g., questions/answers, submission) goes through the “Anonymizer”
10 Participants (i): overall numbers
- 24 submissions
- One participant submitted two result sets
- 9 iPRG member submissions (identifiable by appended ‘i’)
- 2 vendor submissions (identifiable by appended ‘v’)
15 Site Localization Software
- 4 participants did not list any software for site localization.
16 Summary of Submitted Results
[Figure annotation: Only reported modified peptides]
17 Summary of IDs and Localizations
[Figure panels: Peptide Identification in All Spectra; Site Localization in Spectra With Interesting Modifications]
There is a very wide range in the total number of spectra with identified peptides. Once one focuses only on the spectra containing modifications for which localization to a particular residue is of interest, the range is much narrower. The 5 rightmost participants reported only spectra of modified peptides.
18 Overlap of spectrum identifications
- 7840 spectrum identifications were agreed on by 3 or more participants
We selected agreement among 3 participants as the threshold for consensus. Consensus requires agreement on sequence only, so it still allows for disagreement on modification localization.
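The consensus rule can be sketched in Python as follows. This is a minimal illustration, not the iPRG's actual analysis code: the `(participant, spectrum, sequence)` tuple layout, the function name `consensus_sequences`, and the convention that lowercase letters mark modified residues are all assumptions made for the example.

```python
from collections import defaultdict


def consensus_sequences(psms, min_agreeing=3):
    """Return {spectrum_id: sequence} for spectra with a consensus sequence.

    psms: iterable of (participant_id, spectrum_id, sequence) tuples.
    Assumption: lowercase letters mark modified residues, so upper-casing
    the sequence ignores where each participant placed the modification.
    """
    supporters = defaultdict(set)
    for participant, spectrum, sequence in psms:
        # A set counts each participant once per (spectrum, sequence).
        supporters[(spectrum, sequence.upper())].add(participant)

    best = {}
    for (spectrum, sequence), who in supporters.items():
        if len(who) < min_agreeing:
            continue
        current = best.get(spectrum)
        # Keep the best-supported sequence; ties are broken alphabetically
        # so the result does not depend on input order.
        if current is None or (len(who), sequence) > (current[1], current[0]):
            best[spectrum] = (sequence, len(who))
    return {spectrum: seq for spectrum, (seq, _) in best.items()}


# Hypothetical reports: three participants agree on the sequence for s1
# but disagree on the modified site; only two participants report s2.
reports = [
    ("A", "s1", "DISLsDYK"),
    ("B", "s1", "DISLSDyK"),
    ("C", "s1", "DISLSDYK"),
    ("A", "s2", "LVNEVTEFAK"),
    ("B", "s2", "LVNEVTEFAK"),
]
consensus = consensus_sequences(reports)
```

Here `s1` reaches consensus despite the localization disagreement, while `s2` does not, because only two participants reported it.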
19 Room for improvement in thresholding?
[Figure: stacked bars per participant (blue = YS, green = NS, gray = see below) above a grid summarizing each participant's workflow. Grid rows: Peaklist (mgf, mzML, mgf_nd, WIFF); Spectral Pre-Processing; Peptide Identification; Discovery of Unexpected Mods; Site Localization; Results Filtering; NTT (2, 1, ?); Experience (<1 year, 1-2 years, 3-4 years, 5-10 years, >10 years).]
The green (NS) bars represent the room for improvement in confidence threshold setting. If a participant could improve decisions about confidence in a peptide-spectrum match without increasing FDR, the combined height of the blue (YS) and green (NS) bars appears to be within reach, which for many participants would substantially improve their overall identification totals.
The gray bars correspond to spectra that the participant reported with Peptide Identification Certainty N and with a sequence that differs from the consensus sequence. These spectra are part of the consensus set, for which 3 or more participants reported Peptide Identification Certainty Y and agreed on the sequence, though they may have differed on site localization if modified; i.e., they are part of the blue bar of at least 3 other participants.
Participants who allowed for semi-tryptic peptides were generally toward the left of the plot, although only about 2% of the consensus results were semi-tryptic.
Key to abbreviations: An = Andromeda/MaxQuant; MG = MS-GFDB; pF = pFind; Sc = Scaffold; AS = A-Score; MM = MyriMatch; Pk = PEAKS; SM = Spectrum Mill; By = Byonics; MO = MODa; PkDB = PEAKSDB; Sq = Sequest; IH = In-house software; O = OMSSA; PPi = Protein Pilot; ST = SpectraST; IP = IDPicker; Ot = Other; PPr = Protein Prospector; TPP = TransProteomic Pipeline; M = Mascot; P/PP = Pep/Prot Prophet; PR = PhosphoRS; XL = Excel; MDe = Mascot Delta Score; PD = ProteomeDiscoverer; PW = ProteoWizard; XT = X!Tandem; MDi = Mascot Distiller
20 ESR and FDR: Extraordinary Skill Rate or High False Discovery Rate?
ESR + FDR = 100 * (Y<3P + YD) / (total ids Y)
- Y<3P: certainty-Y identifications reported by fewer than 3 participants
- YD: certainty-Y identifications that disagree with the consensus
- 24 participants; 3 required for consensus
When a particular identification is reported by fewer than 3 participants, it is difficult to tell whether it represents extraordinary skill or just another false positive. On the other hand, disagreement with the consensus (YD) is more likely to indicate a wrong answer. Consequently, the YD rate serves as a surrogate for the minimum FDR level. Note that many participants, especially those to the far right, have a YD rate much greater than 1%, which suggests they have underestimated their FDR level. Although the study asked for results at 1% FDR, these far-right participants reported much lower total numbers of ids. The 5 rightmost participants reported only modified peptides.
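As a worked illustration of the ESR + FDR formula, the sketch below computes both the combined rate and the YD rate for a single participant. The function names and the example counts are hypothetical and are not taken from the study data.

```python
def esr_plus_fdr(n_y_lt3p: int, n_yd: int, n_y_total: int) -> float:
    """ESR + FDR in percent: 100 * (Y<3P + YD) / total ids Y."""
    if n_y_total <= 0:
        raise ValueError("no certainty-Y identifications reported")
    return 100.0 * (n_y_lt3p + n_yd) / n_y_total


def yd_rate(n_yd: int, n_y_total: int) -> float:
    """Disagreement-with-consensus rate in percent (surrogate minimum FDR)."""
    if n_y_total <= 0:
        raise ValueError("no certainty-Y identifications reported")
    return 100.0 * n_yd / n_y_total


# Hypothetical participant: 5000 certainty-Y ids, of which 150 were reported
# by fewer than 3 participants and 100 disagree with the consensus sequence.
combined = esr_plus_fdr(n_y_lt3p=150, n_yd=100, n_y_total=5000)
minimum_fdr = yd_rate(n_yd=100, n_y_total=5000)
print(f"ESR + FDR = {combined:.1f}%, YD rate = {minimum_fdr:.1f}%")
```

With these made-up counts the YD rate alone is already above the requested 1% FDR, which is the pattern the slide describes for participants toward the right of the plot.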
21 Characteristics of consensus spectra
- 7840 spectra with >= 3 participants agreeing on sequence
- The total of the numbers on the bars is > 7840 because some of the consensus spectra contain more than 1 modification.
- Consensus requires agreement on sequence, but not on modification localization
22 Peak lists
- Two types of peak lists were supplied: deisotoped and non-deisotoped
- Can only tell fragment charge state from non-deisotoped
- Requires search engine to be able to de-isotope spectrum
23 Peaklists
- Number of spectra with undefined precursor charge state:
  - Deisotoped (304 in consensus results)
  - Non-deisotoped (1140 in consensus results)
- For 1013 out of 7840 consensus spectra, the precursor m/z differs by more than 0.02 Da between the deisotoped and non-deisotoped peak lists.
- For 238 consensus spectra, the peak lists specified different charge states:
  - 193 consensus results only possible with deisotoped peak list
  - 45 consensus results only possible with non-deisotoped peak list
- For 19 consensus results, multiple people who searched the non-deisotoped (nd) peak list agreed on a confident different answer
- For 4 consensus results, multiple people who searched the deisotoped peak list agreed on a confident different answer
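A comparison of this kind between two peak lists can be sketched as below. The dictionary layout (spectrum id mapped to precursor m/z and charge, with `None` for an undefined charge) and the function name are assumptions for illustration, as is the choice to count a charge difference only when both peak lists specify a charge; the slide does not say how undefined charges were treated.

```python
def compare_peak_lists(deisotoped, non_deisotoped, mz_tol=0.02):
    """Compare precursor information for spectra present in both peak lists.

    Each argument maps spectrum_id -> (precursor_mz, charge), where charge
    is None when the peak list leaves it undefined.
    Returns (ids whose precursor m/z differs by more than mz_tol,
             ids whose specified charge states differ).
    """
    mz_differs, charge_differs = [], []
    for spectrum_id in sorted(deisotoped.keys() & non_deisotoped.keys()):
        mz_a, z_a = deisotoped[spectrum_id]
        mz_b, z_b = non_deisotoped[spectrum_id]
        if abs(mz_a - mz_b) > mz_tol:
            mz_differs.append(spectrum_id)
        # Assumption: only count a charge difference when both are defined.
        if z_a is not None and z_b is not None and z_a != z_b:
            charge_differs.append(spectrum_id)
    return mz_differs, charge_differs


# Hypothetical precursor entries, not taken from the study files.
deiso = {"s1": (500.25, 2), "s2": (600.30, 3), "s3": (700.40, None)}
non_deiso = {"s1": (500.26, 2), "s2": (600.80, 2), "s3": (700.40, 2)}
mz_diff_ids, charge_diff_ids = compare_peak_lists(deiso, non_deiso)
```

In this toy input only `s2` is flagged, for both m/z and charge; `s3` has an undefined charge in one list and is therefore not counted as a charge difference.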
25 Synthetic Peptide ID by Peptide
[Figure panels: Trimethyl; Sulfo; Methyl (K); Methyl (R); Phospho; Dimethyl (K); Dimethyl (R); Acetyl; Nitro. Axis: # participants]
Counts correspond to the number of participants reporting at least 1 PSM for the spiked synthetic peptide with the correct localization (case-sensitive string compare) and the correct modification name. The localization certainty may have been reported as either Y or N. PSMs containing modification of residues other than s,t,y,k,r were excluded.
26 Synthetic Peptide ID by Participant
[Figure: grid of modification type (rows) by participant (columns).
Columns: 71755v, 58288v, 33564, 93128i, 11211, 58409, 87133i, 94158i, 97053i, 42424i, 77777i, 23068, 40104i, 87048i, 92653, 34284i, 23117, 74564, 14151, 52781, 47603, 14152, 45511, 11821.
Rows: Acetyl (K); Dimethyl (K); Dimethyl (R); Methyl (K); Methyl (R); Nitro (Y); Phospho (STY); Sulfo (Y); Trimethyl (K)]
Red corresponds to the presence of at least 1 PSM for a spiked synthetic peptide with the correct localization (case-sensitive string compare) and the correct modification name. The localization certainty may have been reported as either Y or N. PSMs containing modification of residues other than s,t,y,k,r were excluded.
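The scoring rule described for the synthetic peptides (case-sensitive string compare plus matching modification name, with PSMs modified on other residues excluded) might look like the sketch below. It assumes that modified residues are written in lowercase within the sequence string; the function name and return labels are hypothetical.

```python
LOCALIZABLE = frozenset("stykr")  # residues whose modifications are scored


def classify_psm(reported_seq, reported_mod, spiked_seq, spiked_mod):
    """Classify one reported PSM against one spiked synthetic peptide.

    Assumption: lowercase letters mark the modified residues.
    Returns "other", "excluded", "correct" or "incorrect".
    """
    if reported_seq.upper() != spiked_seq.upper():
        return "other"  # PSM is for a different peptide
    modified = {res for res in reported_seq if res.islower()}
    if not modified:
        return "other"  # unmodified form, nothing to localize
    if not modified <= LOCALIZABLE:
        return "excluded"  # carries a modification on another residue
    # Case-sensitive compare: same sites and same modification name.
    if reported_seq == spiked_seq and reported_mod == spiked_mod:
        return "correct"
    return "incorrect"


# Hypothetical reports scored against a sulfo-Tyr spike written as DISLSDyK.
spike = ("DISLSDyK", "Sulfo")
labels = [
    classify_psm("DISLSDyK", "Sulfo", *spike),    # right site, right name
    classify_psm("DISLsDYK", "Sulfo", *spike),    # wrong site
    classify_psm("DISLSDyK", "Phospho", *spike),  # right site, wrong name
    classify_psm("dISLSDyK", "Sulfo", *spike),    # extra mod on d
]
```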
27 Correct Localization of Modified Synthetic Peptides
- 70 synthetic modified peptides were spiked into the sample.
- 7 of these were confidently found by no participant.
[Figure annotation: Correct localization & name of modification reported]
28 FLR of Modified Synthetic Peptides
FLR = 100% * (# PSMs with wrong localization of s,t,y,k,r) / (# PSMs with wrong + right localization of s,t,y,k,r)
- PSMs containing modifications of residues other than s,t,y,k,r were ignored, e.g. sample handling modifications (n, q, d, e, etc.).
- Values below the second chart are each participant's own estimate of the reliability of their site localizations.
[Chart annotations, participant estimates: 5%, 1%, 1-2%, <1, 0.5, 10%, <30%, 0.01, <5%, <1%]
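The FLR definition translates directly into a small helper. The sketch below is illustrative only: the function name and the example counts are hypothetical, and returning `None` when a participant has no scorable PSMs is a choice made for the example rather than something the slides specify.

```python
def false_localization_rate(n_wrong: int, n_right: int):
    """FLR in percent: 100 * wrong / (wrong + right).

    Counts cover only PSMs of spiked peptides modified on s,t,y,k,r.
    Returns None when there are no scorable PSMs.
    """
    total = n_wrong + n_right
    if total == 0:
        return None
    return 100.0 * n_wrong / total


# Hypothetical participant: 3 wrong and 97 right localizations,
# compared with a self-reported estimate of 1%.
observed = false_localization_rate(n_wrong=3, n_right=97)
self_estimate = 1.0
underestimated = observed is not None and observed > self_estimate
```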
29 Incorrect Localization by Peptide
Number of PSMs with Incorrect Site Localization (Mod Loc Confidence Y)
[Table: rows are spiked peptide sequences, columns are participants; cells give the number of PSMs with incorrect site localization]
Annotations on the affected peptides:
- Present as sulfo-Tyr
- Present as phospho S-10, often mislocalized as S-12 or Y-14
- Present as mono-, di-, trimethyl K, often mislocalized at R
Incorrect localizations were limited to spectra from a few particular peptides.
30 Phospho vs Sulfo
- DISLSDY(Phospho)K: Observe modified fragment ions.
- DISLSDY(Sulfo)K: Observe ‘unmodified’ fragment ions. Spectrum looks essentially identical to unmodified peptide spectrum.
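Fragmentation behaviour aside, the two modifications are also close in precursor mass, which helps explain how one can be reported as the other. The sketch below is a rough illustration that assumes the standard monoisotopic mass shifts for phospho (HPO3) and sulfo (SO3); the constants, the function name, and the 1000 Da example mass are not taken from the slides.

```python
PHOSPHO_DELTA = 79.966331  # monoisotopic mass shift of HPO3, in Da
SULFO_DELTA = 79.956815    # monoisotopic mass shift of SO3, in Da


def phospho_sulfo_gap_ppm(peptide_mass: float) -> float:
    """Mass difference between the phospho and sulfo forms of a peptide,
    expressed in ppm of the (approximate) modified peptide mass."""
    if peptide_mass <= 0:
        raise ValueError("peptide mass must be positive")
    return 1e6 * (PHOSPHO_DELTA - SULFO_DELTA) / peptide_mass


# For a hypothetical 1000 Da peptide the two forms differ by under 10 ppm,
# so a wide precursor tolerance cannot tell them apart.
gap_da = PHOSPHO_DELTA - SULFO_DELTA
gap_ppm = phospho_sulfo_gap_ppm(1000.0)
```

The gap shrinks in ppm terms as the peptide gets heavier, so distinguishing the two by precursor mass alone becomes harder for larger peptides.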
31 Conclusions
- Reasonable number of participants from around the globe, mainly experienced users but a few first-timers
- Large spread in number of spectra identified
- False negatives (NS) are generally much higher than false positives, so there is generally room for improvement
- Choice of peak list was a significant factor in performance
- Varied performance in detecting PTMs
  - Most participants struggled with sulfation
  - Multiply phosphorylated peptides were harder to find than singly phosphorylated ones
- Most common errors in site assignment were:
  - Reporting sulfo(Y) as phospho(ST)
  - Mis-assignment of site(s) in multiply phosphorylated peptides
32 What did the participants think?
- 22 out of 24 participants found the study useful
- “The spiked proteins made it possible to game the study - look for the uncommon modifications only on the spikes. Of course we didn't do this. Overall I'd say this was a flawed but very interesting ABRF study.”
- “Too many modifications at the same time. Manual validation is necessary and the right time necessary for this study is too demanding for this challenge.”
33 Participants’ Confidence in Analyzing PTM Data
[Figure panels: Before; After]
34 How difficult do you think this study was? What was your total analysis time for the entire project?
35 Based on this study, would you consider participating in future ABRF studies?
36 THANK YOU TO ALL STUDY PARTICIPANTS!
Thank you! Questions?
iPRG:
- Nuno Bandeira
- Robert Chalkley (chair)
- Matt Chambers
- Karl Clauser
- John Cottrell
- Eric Deutsch
- Eugene Kapp
- Henry Lam
- Hayes McDonald
- Tom Neubert (EB liaison)
- Ruixiang Sun
Dataset Creation: Chris Colangelo
Anonymizer: Jeremy Carver, UCSD