iPRG 2012: A Study on Detecting Modified Peptides in a Complex Mixture

Slides:



Advertisements
Similar presentations
You have been given a mission and a code. Use the code to complete the mission and you will save the world from obliteration…
Advertisements

Using Matrices in Real Life
Advanced Piloting Cruise Plot.
Copyright © 2002 Pearson Education, Inc. Slide 1.
Chapter 1 The Study of Body Function Image PowerPoint
1 Copyright © 2013 Elsevier Inc. All rights reserved. Appendix 01.
Properties Use, share, or modify this drill on mathematic properties. There is too much material for a single class, so you’ll have to select for your.
Document #07-12G 1 RXQ Customer Enrollment Using a Registration Agent Process Flow Diagram (Switch) Customer Supplier Customer authorizes Enrollment.
Document #07-12G 1 RXQ Customer Enrollment Using a Registration Agent Process Flow Diagram (Switch) Customer Supplier Customer authorizes Enrollment.
Business Transaction Management Software for Application Coordination 1 Business Processes and Coordination.
1 Community Right to Know Electronic Reporting Bruce Boyd Tina Gutierrez & Latoshia Parker Office of Pollution Prevention and Right to Know.
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Title Subtitle.
Determine Eligibility Chapter 4. Determine Eligibility 4-2 Objectives Search for Customer on database Enter application signed date and eligibility determination.
Create an Application Title 1A - Adult Chapter 3.
My Alphabet Book abcdefghijklm nopqrstuvwxyz.
Multiplying binomials You will have 20 seconds to answer each of the following multiplication problems. If you get hung up, go to the next problem when.
DIVIDING INTEGERS 1. IF THE SIGNS ARE THE SAME THE ANSWER IS POSITIVE 2. IF THE SIGNS ARE DIFFERENT THE ANSWER IS NEGATIVE.
FACTORING ax2 + bx + c Think “unfoil” Work down, Show all steps.
Addition Facts
Year 6 mental test 5 second questions
Around the World AdditionSubtraction MultiplicationDivision AdditionSubtraction MultiplicationDivision.
ZMQS ZMQS
Integrify 5.0 Tutorial : Creating a New Process
Modification Site Localization Why is this a problem? Calculating localization reliability Ways of representing reliability Modification ambiguity.
Webinar for Inexperienced Registrants Creation of the Substance Dataset and of the Registration Dossier Ricardo Simoes ECHA – Registration &
Richmond House, Liverpool (1) 26 th January 2004.
REVIEW: Arthropod ID. 1. Name the subphylum. 2. Name the subphylum. 3. Name the order.
1 of Audience Survey Results Larry D. Gustke, Ph.D. – October 5, 2013.
ABC Technology Project
1 Undirected Breadth First Search F A BCG DE H 2 F A BCG DE H Queue: A get Undiscovered Fringe Finished Active 0 distance from A visit(A)
VOORBLAD.
15. Oktober Oktober Oktober 2012.
1 Breadth First Search s s Undiscovered Discovered Finished Queue: s Top of queue 2 1 Shortest path from s.
CREATING A PAYMENT REQUEST FOR A NEW VENDOR
© 2010 Cisco and/or its affiliates. All rights reserved.Presentation_IDCisco Confidential CISCO LEARNING CREDITS MANAGEMENT TOOL CLP ADMINISTRATOR – USER.
Sample Service Screenshots Enterprise Cloud Service 11.3.
Factor P 16 8(8-5ab) 4(d² + 4) 3rs(2r – s) 15cd(1 + 2cd) 8(4a² + 3b²)
Chapter 5 Microsoft Excel 2007 Window
Squares and Square Root WALK. Solve each problem REVIEW:
Do you have the Maths Factor?. Maths Can you beat this term’s Maths Challenge?
© 2012 National Heart Foundation of Australia. Slide 2.
Lets play bingo!!. Calculate: MEAN Calculate: MEDIAN
Understanding Generalist Practice, 5e, Kirst-Ashman/Hull
Chapter 5 Test Review Sections 5-1 through 5-4.
SIMOCODE-DP Software.
GG Consulting, LLC I-SUITE. Source: TEA SHARS Frequently asked questions 2.
Addition 1’s to 20.
25 seconds left…...
Subtraction: Adding UP
U1A L1 Examples FACTORING REVIEW EXAMPLES.
H to shape fully developed personality to shape fully developed personality for successful application in life for successful.
Slide 1 of 29 Community news Slide 2 of 29 Nouvelles de la communauté…
Januar MDMDFSSMDMDFSSS
Week 1.
Sedex: Registration and Account Set Up Instructions
We will resume in: 25 Minutes.
©Brooks/Cole, 2001 Chapter 12 Derived Types-- Enumerated, Structure and Union.
A SMALL TRUTH TO MAKE LIFE 100%
1 Unit 1 Kinematics Chapter 1 Day
PSSA Preparation.
CpSc 3220 Designing a Database

In-depth Analysis of Protein Amino Acid Sequence and PTMs with High-resolution Mass Spectrometry Lian Yang 2 ; Baozhen Shan 1 ; Bin Ma 2 1 Bioinformatics.
Proteomics Informatics –
Presentation transcript:

iPRG 2012: A Study on Detecting Modified Peptides in a Complex Mixture ABRF 2012, Orlando, FL 3/17-20/2012

iPRG2012 Study: DESIGN

Study Goals Primary: Evaluate the ability of participants to identify modified peptides present in a complex mixture Secondary: Find out why result sets might differ between participants Tertiary: Produce a benchmark dataset, along with an analysis resource

Study Design Use a common, rich dataset Use a common sequence database Allow participants to use the bioinformatic tools and methods of their choosing Use a common reporting template Report results at an estimated 1% FDR (at the spectrum level) Ignore protein inference

Sample Tryptic digest of yeast (RM8323 – NIST), spiked with 69 synthetic modified peptides (tryptic peptides from 6 different proteins – sPRG) Phospho (STY) Sulfo (Y) Mono-, di-, trimethyl (K) Mono-, dimethyl (R) Acetyl (K) Nitro (Y)

Supplied Study Materials 5600 TripleTOF dataset (i.e. WIFF file) WIFF, mzML, dta, MGF (de-isotoped);– conversions by MS Data Converter 1.1.0 MGF (not de-isotoped – conversion by Mascot Distiller 2.4) 1 fasta file (UniProtKB/SwissProt S. cerevisiae, human, + 1 bovine protein + trypsin from Dec. 2011) 1 template (Excel) 1 on-line survey (Survey Monkey)

Instructions to Participants Retrieve and analyze the data file in the format of your choosing, with the method(s) of your choosing Report the peptide to spectrum matches in the provided template Report measures of reliability for PTM site assignments (optional) Fill out the survey Attach a 1-2 page description of the methodology employed

iPRG 2012 STUDY: PARTICIPATION

Soliciting Participants and Logistics Study advertised on the ABRF website and listserv and by direct invitation from iPRG members 1. Email participation request to ‘iPRGxxxx@gmail.com’ Participant 2. Send official study letter with instructions iPRG members Questions / Answers 3. All further communication (e.g., questions, submission) through ‘iPRGxxx.anonymous@gmail.com’ “Anonymizer” 9

Participants (i) – overall numbers 24 submissions One participant submitted two result sets 9 initialed iPRG member submissions (with appended ‘i’) 2 vendor submissions (identifiable by appended ‘v’)

About the Participant 11

About the Participant’s Lab 12

Participation in sPRG Study Only one participant indicated he used sPRG information to aid his analysis. This person was one of the least successful in identifying the spiked-in peptides!

Search Engine Used 14

Site Localization Software 4 participants did not list using software for site localization.

Summary of Submitted Results Only reported modified peptides

Summary of IDs and Localizations Peptide Identification in all Spectra Site Localization in Spectra With Interesting Modifications There is a very wide range in the total number of spectra with identified peptides. Once one focuses only on the spectra containing modifcations for which the ability to localize the modification to a particular residue, the range is much narrower. The 5 rightmost participants went so far as to reported only spectra of modified peptides.

Overlap of spectrum identifications 7840 agreed on by 3 or more participants We selected 3 participants agreeing as the threshold for denoting consensus agreement. Consensus requires agreement on sequence, so do note that this still allows for disagreement on modification localization.

Room for improvement in thresholding? 33564 93128i 11211 58409 87133i 94158i 97053i 42424i 77777i 23068 40104i 87048i 92653 34284i 23117 74564 14151 52781 47603 14152 45511 11821 Peaklist mgf mzML mgf_nd   WIFF Spectral Pre-Processing Pk PPi MDi Ot SM pF PD PkDB PW Sq Peptide Identification By M O PPr MG P/PP ST MM TPP XT IH Discovery of Unexpected Mods MO Site Localization MDe AS An Sc Results Filtering IP XL PR NTT 2 1 ? Experience 5-10 years >10 years < 1 year 3-4 years 1-2 years The green (NS) bars represent the room for improving confidence threshold setting. If one could improve the decision making about confidence in a peptide spectral match, without increasing FDR then the sum of the heights of the blue (YS) and green (NS) bars appears to be within reach for many participants to substantially improve their overall identification totals. The gray bar corresponds to spectra, which for that participant were reported as Peptide Identification Certainty N and their reported sequence differs from the consensus sequence. These spectra are part of the consensus set for which 3 or more participants reported Peptide Identification Certainty Y and agreed on the sequence, but may have differed on site localization if modified. I.e. they are part of the blue bar of at least 3 other participants. Participants who allowed for semi-tryptic peptides generally were toward the left of the plot, although only about 2 % of the consensus results were semi-tryptic. An Andromeda/MaxQuant MG MS-GFDB pF pFind Sc Scaffold AS A-Score MM MyriMatch Pk PEAKS SM Spectrum Mill By Byonics MO MODa PkDB PEAKSDB Sq Sequest IH In-house software O OMSSA PPi Protein Pilot ST SpectraST IP IDPicker Ot Other PPr Protein Prospector TPP TransProteomic Pipeline M Mascot P/PP Pep/Prot Prophet PR PhosphoRS XL Excel MDe Mascot Delta Score PD ProteomeDiscoverer PW ProteoWizard XT X!Tandem MDi Mascot Distiller

ESR and FDR Extraordinary Skill Rate or High False Discovery Rate? ESR + FDR = 100* (Y<3P+YD)/total ids Y 24 participants 3 for consensus When particular identifications are reported by less than 3 participants it is difficult to tell whether that represents extraordinary skill or just another false positive. On the other hand, disagreement with the consensus (YD) is more likely to indicate a wrong answer. Consequently, the YD rate serves as a surrogate for the minimum FDR level. Note that many participants, especially those to the far right tend to have a YD rate much greater than 1%, which suggests they have underestimated their FDR level. Although the study was requested to be performed at 1% FDR, these far right participants reported much lower total numbers of ids. The 5 rightmost participants reported only modified peptides.

Characteristics of consensus spectra 7840 spectra >=3 participants agreeing on sequence The total numbers on the bars is > 7840 because some of the consensus spectra contain more than 1 modification. Consensus requires agreement on Sequence, but not modification localization

Peak lists Two types of peak lists were supplied Deisotoped and non deisotoped Can only tell fragment charge state from non-deisotoped Requires search engine to be able to de-isotope spectrum

Peaklists Number of spectra with undefined precursor charge state Deisotoped 1031 (304 in consensus results) Non-deisotoped 6094 (1140 in consensus results) For 1013 out of 7840 consensus spectra the precursor m/z differ by greater than 0.02 Da between deisotoped and non-deisotoped peak list. For 238 consensus spectra the peak lists had different specified charge state 193 consensus results only possible with deisotoped peak list 45 consensus results only possible with non-deisotoped peak list For 19 consensus results multiple people who searched the nd peak list agreed on a confident different answer For 4 consensus results multiple people who searched the deisotoped peak list agreed on a confident different answer

Mixed Spectra 465.19 2+ 464.59 3+ 465.19 2+ Deisotoped peaklist Non-deisotoped peaklist

Synthetic Peptide ID by Peptide Trimethyl Sulfo Methyl (K) Methyl (R) Phospho Dimethyl (K) Counts correspond to the number of participants reporting at least 1 PSM for the spiked synthetic peptide modified with the correct localization reported (case sensitive string compare) and the correct modification name. The localization certainty may have been reported as either Y or N. PSM’s containing modifcation of residues other than s,t,y,k,r were excluded. Dimethyl (R) Acetyl Nitro # participants # participants

Synthetic Peptide ID by Participant 71755v 58288v 33564 93128i 11211 58409 87133i 94158i 97053i 42424i 77777i 23068 40104i 87048i 92653 34284i 23117 74564 14151 52781 47603 14152 45511 11821 Acetyl (K) 1 Dimethyl (K) Dimethyl (R) Methyl (K) Methyl (R) Nitro (Y) Phospho (STY) Sulfo (Y) Trimethyl (K) Red corresponds to the presence of at least 1 PSM for a spiked synthetic peptide modified with the correct localization reported (case sensitive string compare) and the correct modification name. The localization certainty may have been reported as either Y or N. PSM’s containing modifcation of residues other than s,t,y,k,r were excluded.

Correct Localization of Modified Synthetic Peptides 70 synthetic modified peptides were spiked into sample. 7 of these were confidently found by no participant Correct localization & name of modification reported

FLR of Modified Synthetic Peptides FLR = 100% * # PSMs wrong localization of s,t,y,k,r # PSMs wrong + right localization of s,t,y,k,r Ignored PSMs contain mods of residues other than s,t,y,k,r . Sample handling mods (n,q,d,e, etc). Values below second chart are participant’s estimate of the reliability of their site localizations. 5% 1% 1-2% <1 0.5 10% <30% 0.01 <5% <1%

Incorrect Localization by Peptide Number of PSM’s with Incorrect Site Localization – Mod Loc Confidence Y Present as sulfo-Tyr Present as phospho S-10 often mislocalized as S-12 or Y-14 Present as mono, di, tri methyl K often mislocalized at R 71755v 58288v 33564 93128i 11211 58409 87133i 94158i 97053i 42424i 77777i 23068 40104i 87048i 92653 34284i 23117 74564 14151 52781 47603 14152 45511 11821 EKLLDFIK AEGSEIRLAK 1 VDATEESDLAQQYGVR TITLEVEPSDTIENVK ESTLHLVLR AEFAEVSK LKLVSELWDAGIK DQGGELLSLR TYETTLEK NGDTASPKEYTAGR 4 LKAEGSEIR TVIDYNGER 3 ADEGISFR YKPESDELTAEK GTRDYSPR VPQVSTPTLVEVSR ADEGISFRGLFIIDDK ALAPEYAK TIAQDYGVLK 2 THILLFLPKSVSDYEGK 5 6 8 WVTFISLLFLFSSAYSR IFSIVEQR TLSDYNIQK GILRQITVNDLPVGR NVAVDELSR LDELRDEGK ESTLHLVLRLR DEGKASSAK SVSDYEGK LVQAFQFTDK LVNEVTEFAK GLFIIDDKGILR FPKAEFAEVSK LKAQLGPDESK DISLSDYK FKDLGEENFK Incorrect localizations were limited to spectra from a few particular peptides.

Phospho vs Sulfo DISLSDY(Phospho)K Observe modified fragment ions. DISLSDY(Sulfo)K Observe modified fragment ions. Observe ‘unmodified’ fragment ions. Spectrum looks essentially identical to unmodified peptide spectrum

Conclusions Reasonable number of participants from around the globe, mainly experienced users but a few first-timers Large spread in number of spectra identified False negatives (NS) are generally much higher than false positives, so there is generally room for improvement Peak list was a significant factor on performance Varied performance in detecting PTMs Most participants struggled with sulfation Multiply phosphorylated harder to find than singly Most common errors in site assignment were: Reporting sulfo(Y) as phospho(ST) Mis-assignment of site/s in multiply phosphorylated peptides

What did the participants think? “The spiked proteins made it possible to game the study - look for the uncommon modifications only on the spikes. Of course we didn't do this. Overall I'd say this was a flawed but very interesting ABRF study.” 22 out of 24 participants found the study useful “Too many modifications at the same time. Manual validation is necessary and the right time necessary for this study is too demanding for this challenge.”

Participant’s Confidence in Analyzing PTM Data Before After 33

How difficult do you think this study was? What was your total analysis time for the entire project?

Based on this study, would you consider participating in future ABRF studies?

THANK YOU TO ALL STUDY PARTICIPANTS! Thank you! Questions? THANK YOU TO ALL STUDY PARTICIPANTS! iPRG Nuno Bandeira Robert Chalkley(chair) Matt Chambers Karl Clauser John Cottrell Eric Deutsch Eugene Kapp Henry Lam Hayes McDonald Tom Neubert (EB liaison) Ruixiang Sun Dataset Creation Chris Colangelo Anonymizer: Jeremy Carver, UCSD