Modification Site Localization
Why is this a problem?
Calculating localization reliability
Ways of representing reliability
Modification ambiguity
PTM Analysis: An Exploding Field
Large-scale PTM characterization studies are now common:
  Phosphorylation
  O-GlcNAcylation
  Acetylation
  …
Database search engines can identify modified peptides and report a measure of reliability for peptide IDs:
  Peptide level: p-value; e-value
  Dataset level: FDR
Most search engines do not assess the reliability of modification site assignments.
There is no standard method for calculating the false localization rate (FLR).
Search Engine Performance for Site Assignment
Database search engines are optimized for peptide identification.
  The optimal parameters for discriminating between correct and random answers are not the same as those for site identification.
  More peaks may be needed for site assignment.
The reliability of modified peptide identifications is higher than that of PTM site assignments.
What most search engines do:
  Report a site consistent with the data.
  There may be more than one site equally consistent with the data.
  Give no information about how reliable the site assignment is.
Bradshaw et al. J Mass Spectrom (2010)
There Are Mistakes in the Literature
There are several large-scale PTM datasets in which site assignment was by manual verification. Did the authors look carefully at the spectra?
Results from publications are used to populate other databases:
  SwissProt
  Phosphosite
Evidence for Serine 486 Phosphorylation
Spectrum from a publication reporting unambiguous assignment of serine 4 (serine 487) phosphorylation.
Annotated spectra associated with publications are useful!
Why I highlighted this example
I found this modification site in my own data in 2006.
SwissProt entry of this protein in 2006
Site Assignment Scoring Methods (1)
Probability of randomly observing a given peak
  A-Score (Gygi)
  PTM Score (Mann)
The probability calculation is based on unit mass measurement and assumes that all masses are equally likely at random: e.g. if considering 4 peaks per 100 Da, then the probability of a random match to a given peak is 4%.
A-score is a number; PTM score reports a probability.
How valid are these assumptions?
  Nominal mass may be appropriate for ion trap data with poor mass accuracy, but not for high mass accuracy data.
  The probability calculation could be adjusted to use more mass bins.
  Not all masses are equally probable; e.g. for b ions:
    201: EA, LP, IP, TV
    202: NS
    203: MA, CV, TT
    204: Not possible
    205: FG, CT
    206: Not possible
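The "4 peaks per 100 Da = 4%" reasoning can be sketched as a binomial calculation. This is a minimal illustration of the general idea, not the published A-Score or PTM Score implementation; the function names and the example of 4 matches out of 6 site-determining fragments are assumptions made for the example.

```python
from math import comb, log10

def random_match_probability(peaks_per_window, window_da=100.0):
    """Chance that one predicted fragment hits a peak at random,
    assuming unit-mass bins and all masses equally likely."""
    return peaks_per_window / window_da

def binomial_tail(n_trials, n_matched, p):
    """Probability that at least n_matched of n_trials predicted
    fragments are matched purely by chance."""
    return sum(
        comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
        for k in range(n_matched, n_trials + 1)
    )

def minus_10_log10(probability):
    """Convert a probability into a -10*log10 style score."""
    return -10 * log10(probability)

p = random_match_probability(4)  # 4 peaks per 100 Da
tail = binomial_tail(n_trials=6, n_matched=4, p=p)  # hypothetical counts
print(p, tail, minus_10_log10(tail))
```

With more mass bins (higher mass accuracy), `p` becomes smaller and the same number of matches becomes more significant, which is the adjustment suggested on the slide.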
Site Assignment Scoring Methods (2)
Score/probability difference
Compare search engine probabilities for peptide IDs with different site assignments:
  Mascot Delta Score
  SLIP Score
e.g.
  Top scoring assignment: E-value 1E-5
  Next best site assignment: E-value 1E-4; SLIP score = 10
  Next best site assignment: E-value 1E-3; SLIP score = 20
Advantages:
  Can be calculated as part of the database search.
  Accounts for variation in the probability of observing different masses.
  If the search engine makes use of mass accuracy, the score will adjust to data of different mass accuracy.
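The E-value example is consistent with a score equal to the difference between the -10*log10 E-values of the best and next best site assignments. A minimal sketch under that assumption (the function name is illustrative, not Protein Prospector's API):

```python
from math import log10

def slip_score(best_evalue, alternative_evalue):
    """Difference in -10*log10(E-value) between the top scoring site
    assignment and the next best alternative site assignment."""
    return 10 * log10(alternative_evalue / best_evalue)

print(round(slip_score(1e-5, 1e-4)))  # 10
print(round(slip_score(1e-5, 1e-3)))  # 20
```

Each factor of 10 between the two E-values adds 10 to the score, so a larger score means the best site is better separated from its nearest alternative.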
Assessing Reliability of Site Localization Scoring
Data from 180 synthetic phosphopeptides
Tested with a wide range of fragmentation data (CID, HCD, ETD, MSA…)
  Comparison of Mascot Delta Score to A-score
  SLIP Score in Protein Prospector
  PhosphoRS used a different set of synthetic phosphopeptides
Savitski et al. Mol Cell Proteomics (2011) M
SLIP Score vs A-Score vs MD-Score
Dataset: QTOF Micro CID data of 180 synthetic phosphopeptides (1); modification sites known
Data searched by Mascot: 2174 correct spectra matches
Data searched by PP: 2334 correct spectra matches
Baker et al. Mol Cell Proteomics (2011) M
[Table: columns SLIP Score, A-Score, MD-Score; rows Site IDs, Incorrect Sites, FLR, 1 Site Possible, Ambiguous. Values recovered: FLR 6.3%, 8.7%, 10.9%; Ambiguous 220.]
SLIP Score: Decoy Sites for Estimating PEP (Local FLR)
Test dataset: synaptic phosphopeptides acquired on an LTQ-Orbitrap Velos (IT-CID); 70,000 phosphopeptide spectra identified.
Altered Batch-Tag to allow for phosphorylation of Pro and Glu.
Filtered results to only phosphopeptide IDs containing one S, T or Y, so the modification site is known.
Local FLR: SLIP score of 6 = 95% correct.
Global FLR (matches to phosphoP and phosphoE) is similar to that for the QTOF Micro data.
A similar score threshold is appropriate for ion trap CID and quadrupole CID data.
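The decoy-site idea can be sketched as follows. The data are hypothetical, and the sketch assumes every peptide contains exactly one S, T or Y, so any site reported on P or E is known to be wrong. It deliberately omits any correction for the relative numbers of decoy and genuine candidate residues that a real analysis may need.

```python
# Hypothetical (SLIP score, modified residue) pairs, one per spectrum.
DECOY_RESIDUES = {"P", "E"}

def global_flr(assignments, threshold):
    """Fraction of all assignments at or above threshold that sit on a decoy residue."""
    accepted = [residue for score, residue in assignments if score >= threshold]
    if not accepted:
        return None
    return sum(r in DECOY_RESIDUES for r in accepted) / len(accepted)

def local_flr(assignments, low, high):
    """Fraction of decoy-residue assignments among scores in [low, high)."""
    in_bin = [residue for score, residue in assignments if low <= score < high]
    if not in_bin:
        return None
    return sum(r in DECOY_RESIDUES for r in in_bin) / len(in_bin)

example = [(2, "P"), (3, "S"), (4, "S"), (5, "E"),
           (6, "S"), (7, "T"), (9, "S"), (12, "Y")]
print(global_flr(example, threshold=0))   # all assignments
print(global_flr(example, threshold=6))   # only assignments scoring >= 6
print(local_flr(example, low=0, high=6))  # error rate within the low-score bin
```

The local value describes reliability at a given score, while the global value describes the whole set of results above a threshold.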
Representing Ambiguity
VATVSVLATR: singly phosphorylated
Best site assignment with associated score. No information as to which is the second best site.
  Example software: A-Score; Mascot Delta Score; SLIP Score
Indicating inability to differentiate between two sites, either due to no information or to confidence below a defined threshold.
  Example software: SLIP Score; VML Score
VAT(0.1)VS(0.89)VLAT(0.01)R
Probabilities for all potential site assignments within the peptide are reported.
  Example software: PTM Score / MaxQuant; PhosphoRS
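A string such as VAT(0.1)VS(0.89)VLAT(0.01)R can be read programmatically. The parser below is a minimal sketch that assumes one-letter residue codes with an optional probability in parentheses after each residue; it is not the export format specification of any particular tool.

```python
import re

# One residue letter, optionally followed by "(probability)".
TOKEN = re.compile(r"([A-Z])(?:\(([0-9.]+)\))?")

def parse_site_probabilities(annotated):
    """Return {(residue, 1-based position): probability} for annotated residues."""
    sites = {}
    position = 0
    for residue, probability in TOKEN.findall(annotated):
        position += 1
        if probability:
            sites[(residue, position)] = float(probability)
    return sites

sites = parse_site_probabilities("VAT(0.1)VS(0.89)VLAT(0.01)R")
best_site = max(sites, key=sites.get)
print(sites)
print(best_site)
```

For a singly modified peptide the probabilities should sum to about 1, which is a quick sanity check on parsed results.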
Representing Ambiguity
VATVSVLATR: doubly phosphorylated
Best site assignments with associated scores. A separate score is calculated for each site assignment.
Score is in comparison to the best assignment not containing a particular modification site; is relative to when residues 5 and 9 are modified.
One site has a confidence measure; the other site does not.
VAT(0.95)VS(0.9)VLAT(0.15)R
Probabilities are combination probabilities for one of the two modifications.
Site-Level or Peptide-Level Assessment for Localization Reliability
All current software reports reliability for individual site localizations, but software could in theory calculate a reliability for the combination of modifications reported:
e.g. VAT(0.95)VS(0.9)VLAT(0.15)R
could be reported as VAT(phospho)VS(phospho)VLATR with probability 0.95 × 0.9 ≈ 0.86
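The peptide-level value in the example is the product of the highest site-level probabilities. A minimal sketch, assuming (as the slide's arithmetic does) that the site probabilities can be multiplied as if independent; the dictionary keys are illustrative labels.

```python
from math import prod

def combination_probability(site_probabilities, n_modifications):
    """Multiply the n highest site probabilities to score the reported combination."""
    top = sorted(site_probabilities.values(), reverse=True)[:n_modifications]
    return prod(top)

site_probabilities = {"T3": 0.95, "S5": 0.9, "T9": 0.15}
print(round(combination_probability(site_probabilities, 2), 3))  # 0.855
```

The slide rounds this value to 0.86. Because the three site probabilities sum to 2 for a doubly modified peptide, they are not strictly independent, so the product is best read as an approximation.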
Modification Ambiguity
Some modifications are isobaric:
  Acetyl vs Trimethyl
  Phospho vs Sulfo
  Ser->Thr vs Methyl
Some combinations of modifications are isobaric/isomeric with a single modification:
  Methyl + Methyl vs Dimethyl
  Carbamidomethyl + Carbamidomethyl vs GlyGly (ubiquitin)
  Carbamidomethyl + Methyl vs Propionamide (acrylamide)
  Acetyl + K+/Ca2+ adduct vs Phospho
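Whether two alternatives can be distinguished by mass depends on whether they share an elemental composition (isomeric, identical mass) or only a nominal mass (isobaric, small mass difference). The sketch below computes this from standard monoisotopic element masses and the usual elemental compositions of these modifications; the 1500 Da peptide mass used for the ppm figure is an arbitrary assumption, and the cation adduct case is left out.

```python
from collections import Counter

ELEMENT_MASS = {"C": 12.0, "H": 1.007825, "N": 14.003074,
                "O": 15.994915, "P": 30.973762, "S": 31.972071}

COMPOSITION = {
    "Acetyl": Counter(C=2, H=2, O=1),
    "Trimethyl": Counter(C=3, H=6),
    "Phospho": Counter(H=1, O=3, P=1),
    "Sulfo": Counter(O=3, S=1),
    "Methyl": Counter(C=1, H=2),
    "Ser->Thr": Counter(C=1, H=2),
    "Dimethyl": Counter(C=2, H=4),
    "Carbamidomethyl": Counter(C=2, H=3, N=1, O=1),
    "GlyGly": Counter(C=4, H=6, N=2, O=2),
    "Propionamide": Counter(C=3, H=5, N=1, O=1),
}

def combined(*names):
    """Sum the elemental compositions of one or more modifications."""
    total = Counter()
    for name in names:
        total.update(COMPOSITION[name])
    return total

def mass(composition):
    return sum(ELEMENT_MASS[element] * count for element, count in composition.items())

def compare(left, right, peptide_mass=1500.0):
    """Compare two sets of modifications; peptide_mass is a hypothetical value."""
    a, b = combined(*left), combined(*right)
    delta = abs(mass(a) - mass(b))
    return {"isomeric": a == b, "delta_da": delta,
            "delta_ppm": delta / peptide_mass * 1e6}

print(compare(("Acetyl",), ("Trimethyl",)))
print(compare(("Phospho",), ("Sulfo",)))
print(compare(("Carbamidomethyl", "Carbamidomethyl"), ("GlyGly",)))
print(compare(("Carbamidomethyl", "Methyl"), ("Propionamide",)))
```

Isomeric pairs cannot be separated by precursor mass at any accuracy, so only fragment evidence for the site or sites can distinguish them.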
Modification Ambiguity
Many published site localization tools were written specifically for phosphorylation, so they will not work for other PTMs.
Site localization scoring based on search engine results should work for all modifications:
  SLIP score; Mascot Delta score; VML score
However, these scores are only meaningful if the competing modification alternatives were considered in the initial database search.
  If carbamidomethyl modification of lysines or N-termini, in addition to cysteines, was not considered, then two carbamidomethyl modifications will not have been considered as an alternative to ubiquitination.
Knowing which modifications were considered is relevant to evaluating site localization reliability.
PTMs in Crosslinked Peptides
For crosslinked peptides, ambiguity may be between peptides:
  CAMKER
  TMAKER
Oxidation could be on the methionine in either peptide.
What is an Acceptable FLR?
5%; 1%; 1-2%; 1%; <1; 0.5; 10%; <30%; 0.01; <5%; <1%
The 2012 iPRG study involved identification of modified peptides.
  Participants were asked to return results with 1% FDR at the PSM level.
  They were asked to indicate for which peptides they thought PTM site assignments were reliable.
  Modified peptides were spiked in, so the correct site localizations were known.
What was the reliability of the results reported?