Interpreting MS/MS Proteomics Results Brian C. Searle Proteome Software Inc. Portland, Oregon USA Brian.Searle@ProteomeSoftware.com NPC Progress Meeting (February 2nd, 2006) The first thing I should say is that none of the material presented is original research done at Proteome Software but we do strive to make the tools presented here available in our software product Scaffold. With that caveat aside … Illustrated by Toni Boudreault
Organization: Identify, SEQUEST, X! Tandem/Mascot, Differ, Combine. This is first and foremost an introduction, so we’re first going to talk about how you go about identifying proteins with tandem mass spectrometry in the first place. Then we’re going to talk about the motivations behind the development of the first really useful bioinformatics technique in our field, SEQUEST. This technique has been extended by two other tools called X! Tandem and Mascot. We’re also going to talk about how these programs differ and how we can use that to our advantage by considering them simultaneously using probabilities.
Start with a protein: A A I K G K I D V C I V L L Q H K A E P T I R N T D G R T A. So, this is proteomics, so we’re going to use tandem mass spectrometry to identify proteins, hopefully many of them, and hopefully very quickly.
Cut with an enzyme: A A I K G K I D V C I V L L Q H K A E P T I R N T D G R T A. And to use this technique you generally have to digest the protein into peptides about 8 to 20 amino acids in length and…
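As a rough sketch of that cutting step, here is an in-silico digest of the protein on the slide. The talk does not name the enzyme at this point, so the trypsin rule used here (cut after K or R, but not before P) is an assumption, and `digest_trypsin` is a hypothetical helper name.

```python
import re

def digest_trypsin(protein):
    """Cleave after K or R unless the next residue is P (assumed trypsin rule)."""
    # Zero-width split: a position preceded by K/R and not followed by P.
    # Splitting on empty matches requires Python 3.7 or newer.
    return [piece for piece in re.split(r"(?<=[KR])(?!P)", protein) if piece]

protein = "AAIKGKIDVCIVLLQHKAEPTIRNTDGRTA"  # sequence shown on the slide
peptides = digest_trypsin(protein)
print(peptides)
```

Under that assumption the peptide AEPTIR used on the next slides falls out of the digest directly.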
Select a peptide: A A I K G K I D V C I V L L Q H K A E P T I R N T D G R T A. Look at each peptide individually. We select the peptide by mass using the first half of the tandem mass spectrometer.
AEPTIR H2OH2O Impart energy in collision cell The mass spectrometer imparts energy into the peptide causing it to fragment at the peptide bonds between amino acids.
Measure mass of daughter ions (figure: Intensity vs. M/z, with peaks A 72.0, AE 201.1, AEP 298.1, AEPT 399.2). The masses of these fragment ions are recorded using the second mass spectrometer.
AEPTIR B-type Ions (figure: Intensity vs. M/z, with peak spacings 72.0, 129.0, 97.0, 101.0, 113.1, 174.1). These ions are commonly called B ions, based on nomenclature you don’t really want to know about… But the mass difference between the peaks corresponds directly to the amino acid sequence.
AEPTIR B-type Ions (peak spacings: A - 0 = 72.0, AE - A = 129.0, AEP - AE = 97.0, AEPT - AEP = 101.0, AEPTI - AEPT = 113.1, AEPTIR - AEPTI = 174.1). For example, the AE peak minus the A peak should produce the mass of E. You can build these mass differences up and derive a sequence for the original peptide. This is pretty neat and it makes tandem mass spectrometry one of the best tools out there for sequencing novel peptides.
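To make the mass-difference idea concrete, here is a minimal sketch that builds the B-ion ladder for AEPTIR and then reads the sequence back from the gaps between peaks. The monoisotopic residue masses are standard values; the function names and the 0.01 Da tolerance are illustrative assumptions, and only the six residues in this peptide are tabulated.

```python
# Monoisotopic residue masses (Da) for the residues in AEPTIR only.
RESIDUE = {"A": 71.03711, "E": 129.04259, "P": 97.05276,
           "T": 101.04768, "I": 113.08406, "R": 156.10111}
PROTON = 1.00728
WATER = 18.01056

def b_ions(peptide):
    """Singly charged B-ion m/z values: running residue mass plus one proton."""
    ladder, total = [], 0.0
    for amino_acid in peptide:
        total += RESIDUE[amino_acid]
        ladder.append(total + PROTON)
    return ladder

def read_sequence(peaks, tol=0.01):
    """Recover a sequence from a B-ion ladder whose last peak is the intact peptide."""
    sequence, previous = [], PROTON  # an "empty" B ion would sit at the proton mass
    for index, mz in enumerate(peaks):
        gap = mz - previous
        if index == len(peaks) - 1:
            gap -= WATER  # the intact peptide also carries one water
        hits = [aa for aa, mass in RESIDUE.items() if abs(mass - gap) <= tol]
        sequence.append(hits[0] if hits else "?")
        previous = mz
    return "".join(sequence)

# B ions for the first five residues, then the intact protonated peptide.
ladder = b_ions("AEPTI") + [b_ions("AEPTIR")[-1] + WATER]
print([round(mz, 1) for mz in ladder])
print(read_sequence(ladder))
```

With a full residue table, leucine and isoleucine would be indistinguishable by mass, which is one more reason this only works cleanly on idealized data.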
So, it seems pretty easy, doesn’t it? But there are a couple confounding factors. For example…
M/z Intensity AEPTIR B-type Ions H2OH2O CO B ions have a tendency to degrade and lose carbon monoxide producing…
M/z AEPTIR A-type Ions H2OH2O CO A ions. Furthermore …
M/z Intensity RITPEA Y-type Ions H2OH2O … The second half are represented as Y ions that sequence backwards. And, unfortunately, this is the real world, so…
M/z Intensity RITPEA Y-type Ions H2OH2O … All the peaks have different measured heights and many peaks can often be missing.
M/z Intensity RITPEA H2OH2O B-type, A-type, Y-type Ions All these peaks are seen together simultaneously and we don’t even know …
M/z Intensity What type of ion they are, making the mass differences approach even more difficult. Finally, as with all analytical techniques,
M/z Intensity There’s noise, producing a final spectrum that looks like …
M/z Intensity ….This, on a good day. And so it’s actually fairly difficult to …
AEPTIR (figure: Intensity vs. M/z, with peak spacings 72.0, 129.0, 97.0, 101.0, 113.1, 174.1). … compute the mass differences to sequence the peptide, certainly in a computer-automated way.
So the community needed a new technique. Now, it wasn’t all without hope…
Known Ion Types B-type ions A-type ions Y-type ions We knew a couple of things about peptide fragmentation. Not only do we know to expect B, A, and Y ions, but…
Known Ion Types: B-type ions, A-type ions, Y-type ions, B- or Y-type +2H ions, B- or Y-type -NH3 ions, B- or Y-type -H2O ions. … We also know a couple of other variations on those ions that come up. We even know something about the …
Known Ion Types: B-type ions, A-type ions, Y-type ions, B- or Y-type +2H ions, B- or Y-type -NH3 ions, B- or Y-type -H2O ions (100% 20% 100% 50% 20%). … likelihood of seeing each type of ion, where generally B and Y ions are most prominent.
If we know the amino acid sequence of a peptide, we can guess what the spectra should look like! So it’s actually pretty easy to guess what a spectrum should look like if we know what the peptide sequence is.
ELVISLIVESK Model Spectrum *Courtesy of Dr. Richard Johnson http://www.hairyfatguy.com/ So as an example, consider the peptide ELVIS LIVES K that was synthesized by Rich Johnson in Seattle
Model Spectrum We can create a hypothetical spectrum based on our rules
B/Y type ions (100%); A type ions and B/Y -NH3/-H2O (20%); B/Y +2H type ions (50%). Where B and Y ions are estimated at 100%, plus 2 ions are estimated at 50%, and other stragglers are at 20%.
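Those rules of thumb are enough to sketch a hypothetical spectrum in code. This is a minimal illustration, not the model any particular search engine uses: the 100/50/20 intensities come from the slide, "+2H" is assumed to mean the doubly protonated fragment, peaks are binned to 0.1 m/z, and `model_spectrum` is a hypothetical name.

```python
# Monoisotopic residue masses (Da) for the residues in ELVISLIVESK only.
RESIDUE = {"E": 129.04259, "L": 113.08406, "V": 99.06841,
           "I": 113.08406, "S": 87.03203, "K": 128.09496}
PROTON, WATER, AMMONIA, CO = 1.00728, 18.01056, 17.02655, 27.99491

def model_spectrum(peptide):
    """Return {m/z rounded to 0.1: intensity} using the 100/50/20 rule of thumb."""
    peaks = {}

    def add(mz, intensity):
        key = round(mz, 1)
        peaks[key] = max(peaks.get(key, 0), intensity)  # keep the tallest peak per bin

    masses = [RESIDUE[amino_acid] for amino_acid in peptide]
    for cut in range(1, len(peptide)):
        b = sum(masses[:cut]) + PROTON           # N-terminal fragment
        y = sum(masses[cut:]) + WATER + PROTON   # C-terminal fragment
        add(b - CO, 20)                          # A-type ion: B ion minus CO
        for ion in (b, y):
            add(ion, 100)                        # B/Y type ions
            add(ion - AMMONIA, 20)               # -NH3 variant
            add(ion - WATER, 20)                 # -H2O variant
            add((ion + PROTON) / 2, 50)          # assumed "+2H": doubly protonated ion
    return peaks

spectrum = model_spectrum("ELVISLIVESK")
for mz in sorted(spectrum)[:5]:
    print(mz, spectrum[mz])
```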
Model Spectrum So if we consider the spectrum that was derived from the ELVIS LIVES K peptide…
Model Spectrum We can find where the overlap is between the hypothetical and the actual spectra…
Model Spectrum And say conclusively based on the evidence that the spectrum does belong to the ELVIS LIVES K peptide.
But who cares? The more important question is “what about situations where we don’t know the sequence?”
PepSeq: AAAAAAAAAA, AAAAAAAAAC, AAAAAAAACC, AAAAAAACCC, …, ELVISLIVESK, …, WYYYYYYYYY, YYYYYYYYYY (J. Rozenski et al., Org. Mass Spectrom., 29 (1994) 654-658). And so this was an approach followed by a program called PepSeq, which would guess every combination of amino acids possible, build a hypothetical spectrum, and find the best matching hypothetical.
PepSeq: Impossibly hard after 7 or 8 amino acids! High false positive rate because you consider so many options. This was a start, but it’s clearly impossibly hard with larger peptides and there’s a lot of room to overfit the data.
PepSeq: Another strategy is needed! So obviously this isn’t going to work in the long run.
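A quick sketch shows why exhaustive guessing runs out of steam. It assumes the 20 standard amino acids and simply enumerates candidates in alphabetical order; the helper names are illustrative and this is not PepSeq’s actual code.

```python
from itertools import islice, product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def candidates(length):
    """Lazily yield every possible peptide of the given length."""
    return ("".join(letters) for letters in product(AMINO_ACIDS, repeat=length))

def search_space(length):
    """Number of candidate peptides of the given length."""
    return len(AMINO_ACIDS) ** length

first_three = list(islice(candidates(10), 3))
print(first_three)
for length in (4, 8, 11):
    print(length, search_space(length))
```

Each extra residue multiplies the number of hypothetical spectra to build and score by 20, so the 8-mer space is already in the tens of billions.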
Sequencing Explosion: 1977 Shotgun sequencing invented, bacteriophage φX174 sequenced; 1989 Yeast Genome project announced; 1990 Human Genome project announced; 1992 First chromosome (Yeast) sequenced; 1995 H. influenzae sequenced; 1996 Yeast Genome sequenced; 2000 Human Genome draft. We needed a new invention to come around, and that was shotgun Sanger sequencing. In ’89 and ’90 the Yeast and Human Genome projects were announced, followed by the first chromosome in ’92, … et cetera, et cetera.
Sequencing Explosion … Eng, J. K.; McCormack, A. L.; Yates, J. R. III J. Am. Soc. Mass Spectrom. 1994, 5, 976-989. In 1994 Jimmy Eng and John Yates published a technique to exploit genome sequencing for use in tandem mass spectrometry. And the idea was …
SEQUEST: … instead of searching all possible peptide sequences, search only those in genome databases. Now, in the post-genomic world this seems like a pretty trivial idea, but back then there was a lot of assumption placed on the idea that we’d actually have a complete Human genome in a reasonable amount of time.
SEQUEST search space: 2*10^14 -- All possible 11mers (ELVISLIVESK); 2*10^10 -- All possible peptides in NR; 1*10^8 -- All tryptic peptides in NR; 4*10^6 -- All Human tryptic peptides in NR. So, in terms of 11 amino acid peptides we’re talking about a 10 thousand fold difference between searching every possible 11mer and those in the current non-redundant protein database from the NCBI, and a roughly 50 million fold difference for searching Human tryptic peptides. So that was huge; it made hypothetical spectrum matching feasible.
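The fold reductions follow directly from the counts on the slide. A small sketch of the arithmetic, using only those four numbers (the dictionary keys are shorthand labels, not database names):

```python
# Search-space sizes quoted on the slide.
SEARCH_SPACE = {
    "all possible 11mers": 2e14,
    "all peptides in NR": 2e10,
    "tryptic peptides in NR": 1e8,
    "human tryptic peptides in NR": 4e6,
}

baseline = SEARCH_SPACE["all possible 11mers"]
fold_reduction = {name: baseline / size for name, size in SEARCH_SPACE.items()}

for name, fold in fold_reduction.items():
    print(f"{name}: {fold:,.0f}-fold smaller than all 11mers")
```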
SEQUEST Model Spectrum. SEQUEST made a couple of other interesting improvements as well. Jimmy and John noted that there was a discontinuity between the intensities of the hypothetical spectrum and the actual spectrum. Instead of trying to make a better model, they decided just to make the actual spectrum look like the model with normalization …
SEQUEST Model Spectrum. Like so. For a scoring function they decided to use Cross-Correlation, which basically sums the peaks that overlap between the hypothetical and the actual spectra.
SEQUEST Model Spectrum And then they shifted the spectra back and ….
SEQUEST Model Spectrum. … forth so that the peaks shouldn’t align. They used this number, also called the Auto-Correlation, as their background.
SEQUEST XCorr (figure: Correlation Score vs. Offset (AMU), showing Cross Correlation (direct comparison) and Auto Correlation (background); Gentzel M. et al Proteomics 3 (2003) 1597-1610). This is another representation of the Cross Correlation and the Auto Correlation.
SEQUEST XCorr: XCorr = Cross Correlation (direct comparison) / average Auto Correlation (background) over a 150 AMU range (Gentzel M. et al Proteomics 3 (2003) 1597-1610). The XCorr score is the Cross Correlation divided by the average of the auto correlation over a 150 AMU range. The XCorr is high if the direct comparison is significantly greater than the background, which is obviously good for peptide identification.
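Here is a minimal sketch of that calculation as described in this talk, on spectra binned to 1 AMU. It is not SEQUEST’s actual code: the bin width, the toy spectra, and the function names are assumptions, and some descriptions of SEQUEST subtract the average background rather than dividing by it. The sketch follows the ratio stated above.

```python
def make_spectrum(peaks, size=500):
    """Build a 1-AMU-binned spectrum from {bin index: intensity}."""
    bins = [0.0] * size
    for index, intensity in peaks.items():
        bins[index] = intensity
    return bins

def correlation(observed, model, offset):
    """Sum of overlapping peak products with the model shifted by `offset` bins."""
    n = len(observed)
    return sum(observed[i] * model[i - offset]
               for i in range(max(0, offset), min(n, n + offset)))

def xcorr(observed, model, window=75):
    """Direct comparison divided by the mean shifted comparison over +/- window."""
    direct = correlation(observed, model, 0)
    offsets = [t for t in range(-window, window + 1) if t != 0]
    background = sum(correlation(observed, model, t) for t in offsets) / len(offsets)
    return direct / background if background else float("inf")

# Toy data: four real fragment peaks plus two weaker noise peaks.
observed = make_spectrum({72: 1.0, 201: 1.0, 298: 1.0, 399: 1.0, 150: 0.3, 250: 0.3})
good_model = make_spectrum({72: 1.0, 201: 1.0, 298: 1.0, 399: 1.0})
bad_model = make_spectrum({100: 1.0, 180: 1.0, 320: 1.0, 410: 1.0})

print(xcorr(observed, good_model), xcorr(observed, bad_model))
```

The matching model scores far above the mismatched one because its peaks line up at zero offset and almost nowhere else.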
SEQUEST DeltaCn. And this XCorr is actually a pretty robust method for estimating how accurate the match is, and so far, there really haven’t been any significant improvements on it. The DeltaCn is another score that scientists often use. It measures how good the XCorr is relative to the next best match. As you can see, this is actually a pretty crude calculation.
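The transcript does not preserve the formula from the slide, so this sketch assumes the commonly cited definition: the gap between the best and second-best XCorr, divided by the best XCorr. The function name and the example scores are made up for illustration.

```python
def delta_cn(xcorr_scores):
    """Relative gap between the best and second-best XCorr (assumed definition)."""
    ranked = sorted(xcorr_scores, reverse=True)
    best, second = ranked[0], ranked[1]
    return (best - second) / best

# Hypothetical XCorr values for the candidate peptides of one spectrum.
example = delta_cn([3.2, 2.4, 1.1])
print(round(example, 3))
```

Only two candidates ever enter the calculation, which is what makes it a weak relative measure.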
SEQUEST: Accuracy Score: Strong (XCorr); Relative Score: Weak (DeltaCn). Here’s another representation of that sentiment. The XCorr is a strong measure of accuracy, whereas the DeltaCn is a weak measure of relative goodness.
SEQUEST: Accuracy Score: Strong (XCorr); Relative Score: Weak (DeltaCn). Alternate Method: Relative Score: Strong. Obviously, there could be an alternative method that focuses more on the success of the relative score. Mascot and X! Tandem fit that bill.
X! Tandem Scoring (Fenyo, D.; Beavis, R. C. Anal. Chem., 75 (2003) 768-774): by-Score = Sum of intensities of peaks matching B-type or Y-type ions; HyperScore = by-Score × (number of matched B ions)! × (number of matched Y ions)!. Now the X! Tandem accuracy score is rather crude. It only considers B and Y ions and attaches these factorial terms with an admittedly hand waving argument.
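A small sketch of that score, assuming the form published by Fenyo and Beavis: the by-Score multiplied by the factorials of the numbers of matched B and Y ions. The inputs are hypothetical lists of matched peak intensities, and the function name is illustrative.

```python
from math import factorial

def hyperscore(matched_b, matched_y):
    """by-Score times N_b! times N_y! (assumed published form)."""
    by_score = sum(matched_b) + sum(matched_y)  # summed intensity of matched B/Y peaks
    return by_score * factorial(len(matched_b)) * factorial(len(matched_y))

# Two matched B ions and three matched Y ions with made-up intensities.
example = hyperscore([10, 20], [5, 5, 5])
print(example)
```

The factorial terms reward matching many ions far more steeply than matching a few intense ones.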
Distribution of “Incorrect” Hits (figure: # of Matches vs. Hyper Score, with Best Hit and Second Best marked). But instead of just considering the best match to the second best, it looks at the distribution of lower scoring hits, assuming that they are all wrong. This is somewhat based on ideas pioneered with the BLAST algorithm. Here, every bar represents the number of matches at a given score. The X! Tandem creators found that the distribution decays (or slopes down) exponentially …
Estimate Likelihood (E-Value) (figure: Log(# of Matches) vs. Hyper Score, with Best Hit marked). … and the log of the distribution is relatively linear because of the exponential decay.
Estimate Likelihood (E-Value) (figure: Log(# of Matches) vs. Hyper Score, with the fitted line labeled Expected Number Of Random Matches and Best Hit marked). If the distribution represents the number of random matches at any given score, the linear fit should correspond to the expected number of random matches.
Estimate Likelihood (E-Value): Score of 60 has 1/10 chance of occurring at random. This is called an E-Value, or Expected-Value. And from this, you can calculate the likelihood that the best match is random. In this case, a score of 60 corresponds with a log number of matches being -1, which means the estimated number of random matches for that score is 0.1.
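The fit-and-extrapolate step can be sketched in a few lines. This is a simplified illustration rather than X! Tandem’s implementation: it fits a least-squares line to log10 of the raw histogram counts, and the histogram is invented so that the line passes through -1 at a score of 60, matching the example above.

```python
from math import log10

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def e_value(histogram, best_score):
    """Extrapolate the log-linear decay of incorrect hits out to the best score."""
    scores = sorted(score for score, count in histogram.items() if count > 0)
    logs = [log10(histogram[score]) for score in scores]
    slope, intercept = fit_line(scores, logs)
    return 10 ** (slope * best_score + intercept)

# Hypothetical counts of lower-scoring (assumed incorrect) hits per score.
histogram = {0: 1000, 15: 100, 30: 10, 45: 1}
expected_random = e_value(histogram, best_score=60)
print(expected_random)
```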
X! Tandem and Mascot. E-Value = Likelihood that match is incorrect relative to N guesses; Empirical (X! Tandem). P-Value = Likelihood that match is incorrect (E~P·N); Theoretical (Mascot). Now, X! Tandem calculates this E-Value empirically. Another search engine, Mascot, tries to get at the same kind of number using theoretical calculations, most likely based on the number of identified peaks and the likelihood of finding certain amino acids in the genome database. They’ve never explicitly published their algorithm, so we’ll never really know, but I suspect it’s something smart. I just want to bring up a point that we’ll touch on a little later …
X! Tandem and Mascot. E-Value = Likelihood that match is incorrect relative to N guesses; Empirical (X! Tandem). P-Value = Likelihood that match is incorrect (E~P·N); Theoretical (Mascot). Probability = Likelihood that match is correct. Note (Probability≠1-P)! … the E-Value that X! Tandem calculates and the P-Value that Mascot calculates are probabilistically based, but they can only estimate the likelihood that the match is wrong. This is realistically not nearly as useful as knowing the probability that a peptide identification is right, which is NOT 1 minus the P-Value.
SEQUEST: Accuracy Score = XCorr, Relative Score = DeltaCn. X! Tandem: Accuracy Score = HyperScore, Relative Score = E-Value. Now, let’s go back and fill in the X! Tandem part of our accuracy/relativity scoring grid.
To reiterate, the XCorr is an excellent measure of accuracy …
… whereas the E-Value is an excellent measure of how good the best score is relative to the rest. If we assume that accuracy and relativity scores are independent measures of goodness, could we use both SEQUEST’s XCorr and X! Tandem’s E-Value together?
10 Protein Control Sample (figure: SEQUEST: Discriminant Score vs. X! Tandem: -log(E-Value)). And the answer is a resounding yes. Each point on this graph is a spectrum, where correct identifications are marked in red, while incorrect identifications are marked in blue. We know what’s correct and incorrect because this is a control sample. Although in general the spectra SEQUEST scores well are spectra X! Tandem also scores well, there is considerable scatter between the search engines.
10 Protein Control Sample (figure: Mascot: Ion-Identity Score vs. X! Tandem: -log(E-Value)). One might wonder, if X! Tandem and Mascot use similar scoring approaches, would they benefit as much? But the answer is surprisingly still yes! Now, why are the scores so different?
Why So Different? SEQUEST: considers relative intensities. X! Tandem: considers semi-tryptic peptides; considers only B/Y-type ions. Mascot: considers theoretical P-Value relative to search space. Well, here are a couple of possible reasons. SEQUEST is the only method to consider relative intensities.
X! Tandem is the only method to consider peptides outside the standard search space by default, such as semi-tryptic peptides. However, it’s the only score that considers only B and Y ions, as opposed to a complete model.
And Mascot is the only search engine to compute a completely theoretical P-Value.
Mascot: Ion-Identity Score Consider Multiple Algorithms? X! Tandem: -log(E-Value) So we clearly want to consider multiple search engines simultaneously, but how?
How To Compare Search Engines? SEQUEST: XCorr > 2.5, DeltaCn > 0.1. Mascot: Ion Score - Identity Score > 0. X! Tandem: E-Value < 0.01. You can’t use a thresholding system because it’s impossible to find corresponding thresholds. For example, a SEQUEST match with an XCorr of 2.5 doesn’t mean the same thing as an X! Tandem match with an E-Value of 0.01.
How To Compare Search Engines? Need to convert scores to probabilities! The simplest way would be to convert the scores into probabilities and compare those. We advocate for Andrew Keller and Alexey Nesvizhskii’s Peptide Prophet approach because it actually calculates a true probability, not just a p-value.
10 Protein Control Sample (Q-ToF), X! Tandem approach (figure: # of Matches vs. Mascot: Ion-Identity Score, with “Other Incorrect IDs for Spectrum” and “Possibly Correct?” marked). So if you remember, X! Tandem considers the best peptide match for a spectrum against a distribution of incorrect matches.
10 Protein Control Sample (Q-ToF), Peptide Prophet approach (figure: # of Matches vs. Mascot: Ion-Identity Score, with “ALL Other ‘Best’ Matches” and “Possibly Correct?” marked; Keller, A. et al Anal. Chem. 74, 5383-5392). Well, Peptide Prophet looks across the entire sample, and not at just one spectrum at a time. It compares the best match against all of the other best matches in the sample, which is clearly bimodal.
The low mode represents matches that are most likely wrong while the high mode represents matches that are probably right.
Peptide Prophet curve-fits two distributions to the modes (labeled “Correct” and “Incorrect” on the figure), following the assumption that the low scoring distribution is “incorrect” and that the higher scoring distribution is “correct”.
These two distributions can be analyzed using Bayesian statistics with this formula: p(correct | score) = p(score | correct)·p(correct) / [p(score | correct)·p(correct) + p(score | incorrect)·p(incorrect)]. Now that formula looks pretty complex, but …
It just calculates the height of the correct distribution at a particular score, divided by the combined height of both distributions.
This is essentially the probability of having that score and being correct divided by the probability of just having that score.
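The height-over-total-height idea is short enough to sketch directly. This is an illustration of the Bayesian step only, not Peptide Prophet itself: both distributions are assumed to be Gaussians with made-up means and widths, and the prior (the fraction of best matches that are correct) is supplied by hand rather than fitted.

```python
from math import exp, pi, sqrt

def gaussian(x, mean, sd):
    """Height of a normal distribution at x."""
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))

def probability_correct(score, prior_correct, correct=(20.0, 10.0), incorrect=(-20.0, 10.0)):
    """Height of the correct curve divided by the combined height of both curves.

    `correct` and `incorrect` are hypothetical (mean, sd) pairs; each curve is
    scaled by how many matches belong to it before taking the ratio.
    """
    height_correct = prior_correct * gaussian(score, *correct)
    height_incorrect = (1.0 - prior_correct) * gaussian(score, *incorrect)
    return height_correct / (height_correct + height_incorrect)

# Same score, two samples: one where half the matches are correct, one where a tenth are.
print(probability_correct(5.0, prior_correct=0.5))
print(probability_correct(5.0, prior_correct=0.1))
```

Shrinking the correct distribution lowers the probability at a fixed score even though the incorrect distribution has not moved.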
This is a neat method because it actually considers the likelihood of being correct, unlike X! Tandem and Mascot, which only calculate the probability of being incorrect. It’s because of this that Peptide Prophet can produce a true probability, which is important when the sample characteristics change.
Q-ToF (figure: # of Matches vs. Mascot: Ion-Identity Score, with “Correct” and “Incorrect” distributions): For example, the control sample we’ve been looking at was derived from Q-ToF data, which produces pretty high quality results.
Q-ToF vs. Ion Trap (figure: # of Matches vs. Mascot: Ion-Identity Score for each instrument, with “Correct” and “Incorrect” distributions): If you compare that to the same sample run on an Ion Trap, the probability of being correct is greatly diminished. If you’ll note, the Incorrect distribution doesn’t change very much between the two analyses; however, the likelihood that the identification is right changes dramatically!
Ion Trap: As Peptide Prophet considers the correct distribution, it is immune to fluctuations between samples. P-Values and E-Values don’t consider this information, so they can’t be compared across multiple samples, or across different examinations of the same sample. That is why we need to use Peptide Prophet for comparing two different search engines.
Consider Multiple Algorithms? (figure: Mascot: Ion-Identity Score vs. X! Tandem: -log(E-Value)). So going back to the scatter plot between X! Tandem and Mascot, we can use Peptide Prophet to compute the score threshold that represents a 95% cut-off …
Consider Multiple Algorithms? (figure: Mascot: Ion-Identity Score vs. X! Tandem: -log(E-Value), with thresholds X! Tandem: 2.6 = 95% and Mascot: -2.5 = 95%). Like so. This allows you to fairly consider the answers from both search engines simultaneously. The important thing to note is that if you looked at a different sample, these thresholds should change depending on the height of the correct distributions.
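To see how a 95% threshold moves with the sample, here is a minimal sketch that bisects for the score at which a two-Gaussian mixture reaches a target probability. The distribution parameters, the priors, and the function names are all invented for illustration; the bisection also relies on the probability rising steadily with score, which holds here because both curves are given the same width.

```python
from math import exp, pi, sqrt

def gaussian(x, mean, sd):
    """Height of a normal distribution at x."""
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))

def probability_correct(score, prior_correct, correct=(20.0, 10.0), incorrect=(-20.0, 10.0)):
    """Bayes ratio for a hypothetical two-Gaussian mixture of (mean, sd) pairs."""
    good = prior_correct * gaussian(score, *correct)
    bad = (1.0 - prior_correct) * gaussian(score, *incorrect)
    return good / (good + bad)

def score_threshold(target, prior_correct, low=-50.0, high=100.0):
    """Bisect for the lowest score whose probability reaches `target`."""
    for _ in range(60):
        middle = (low + high) / 2.0
        if probability_correct(middle, prior_correct) < target:
            low = middle
        else:
            high = middle
    return high

rich_sample = score_threshold(0.95, prior_correct=0.5)  # many correct matches
poor_sample = score_threshold(0.95, prior_correct=0.1)  # few correct matches
print(round(rich_sample, 2), round(poor_sample, 2))
```

The sample with the smaller correct distribution needs a noticeably higher score to reach the same 95% probability.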
Conclusion: All search engines use different criteria, producing different scores. Using multiple search engines simultaneously yields better results. Peptide Prophet can normalize search engine results. So in conclusion, all of the search engines look at different criteria.
And we can leverage this to identify more peptides.
And that Peptide Prophet is a great mechanism for doing that because it calculates true probabilities, instead of p-values.