Published by Bridget Pendlebury. Modified over 10 years ago.
1
Protein Sequencing and Identification by Mass Spectrometry
2
Masses of Amino Acid Residues
3
Protein Backbone N-terminus H...-HN-CH(R_{i-1})-CO-NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-…OH C-terminus. The side chains R_{i-1}, R_i and R_{i+1} belong to amino acid residues i-1, i and i+1.
4
Peptide Fragmentation Peptides tend to fragment along the backbone. Fragments can also lose neutral chemical groups such as NH3 and H2O. Under collision-induced dissociation, the protonated (H+) backbone H...-HN-CH(R_{i-1})-CO ... NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-…OH breaks into a prefix fragment and a suffix fragment.
5
Breaking Protein into Peptides and Peptides into Fragment Ions Proteases, e.g. trypsin, break a protein into peptides. A tandem mass spectrometer further breaks the peptides down into fragment ions and measures the mass of each piece. The mass spectrometer accelerates the fragment ions; heavier ions accelerate more slowly than lighter ones. The mass spectrometer measures the mass/charge ratio of an ion.
6
N- and C-terminal Peptides N-terminal peptides C-terminal peptides
7
Terminal peptides and ion types Peptide mass (Da): 57 + 97 + 147 + 114 = 415. Peptide mass (Da) without H2O: 57 + 97 + 147 + 114 - 18 = 397.
8
N- and C-terminal Peptides N-terminal peptide masses: 57, 154, 301, 415, 486. C-terminal peptide masses: 71, 185, 332, 429, 486.
10
N- and C-terminal Peptides Fragment masses: 57, 71, 154, 185, 301, 332, 415, 429, 486.
11
N- and C-terminal Peptides Fragment masses: 57, 71, 154, 185, 301, 332, 415, 429, 486. Reconstruct the peptide from the set of masses of its fragment ions (the mass spectrum).
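The fragment masses listed on these slides can be reproduced from the residue masses shown on the "Terminal peptides and ion types" slide (57, 97, 147, 114). The fifth residue mass of 71 Da is an inference from 486 - 415, not something the slides state, and the function names are illustrative. A minimal Python sketch:

```python
from itertools import accumulate

# Residue masses in Da. The first four appear on the "Terminal peptides and
# ion types" slide; the fifth (71) is inferred from 486 - 415.
residues = [57, 97, 147, 114, 71]

def n_terminal_masses(residue_masses):
    """Masses of all prefixes (N-terminal peptides)."""
    return list(accumulate(residue_masses))

def c_terminal_masses(residue_masses):
    """Masses of all suffixes (C-terminal peptides)."""
    return list(accumulate(reversed(residue_masses)))

prefixes = n_terminal_masses(residues)
suffixes = c_terminal_masses(residues)
print(prefixes)  # [57, 154, 301, 415, 486]
print(suffixes)  # [71, 185, 332, 429, 486]
```

The sequencing task is the inverse: given only the unlabeled union of these two lists, recover the order of the residues.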
12
Peptide Fragmentation [Figure: backbone of a four-residue peptide (side chains R1 to R4, N-terminal NH3+, C-terminal COOH) with cleavage sites marked for the ions a2, a3, b2, b3, y1, y2, y3 and the neutral-loss ions b2-H2O, b3-NH3, y2-NH3, y3-H2O]
13
Mass Spectra [Figure: mass spectrum of the peptide GVDLK on a mass axis starting at 0; gaps between peaks correspond to residues, e.g. 57 Da = 'G', 99 Da = 'V'] The peaks in the mass spectrum: prefix and suffix fragments; fragments with neutral losses (-H2O, -NH3); noise and missing peaks.
14
Protein Identification with MS/MS MS/MS peptide identification: [Figure: the peptide GVDLK and its spectrum; axes: mass (starting at 0) vs. intensity]
15
Tandem Mass-Spectrometry
16
Breaking Proteins into Peptides [Figure: the protein MPSERGTDIMRPAKID...... is broken into peptides (MPSER ……, GTDIMR, PAKID ……), which pass through HPLC and on to MS/MS]
17
Mass Spectrometry Matrix-Assisted Laser Desorption/Ionization (MALDI) From lectures by Vineet Bafna (UCSD)
18
Tandem Mass Spectrometry [Figure: instrument pipeline LC, ion source, MS-1, collision cell, MS-2; example scans 1707 and 1708 labeled MS and MS/MS]
19
Protein Identification by Tandem Mass Spectrometry [Figure: sequence, MS/MS instrument, then two interpretation routes: database search (Sequest) and de novo interpretation (Sherenga)]
20
Tandem Mass Spectrum Tandem mass spectrometry (MS/MS) mainly generates partial N- and C-terminal peptides. The spectrum consists of different ion types because peptides can be broken in several places. Chemical noise often complicates the spectrum. The spectrum is represented in 2-D: mass/charge axis vs. intensity axis.
21
De Novo vs. Database Search [Figure: a spectrum annotated with the letters W R A C V G E K D W L P T L T, shown once for each approach; identified peptide: AVGELTK; candidates are compared by mass and score] De novo: database of all peptides = 20^n: AAAAAAAA, AAAAAAAC, AAAAAAAD, AAAAAAAE, AAAAAAAG, AAAAAAAF, AAAAAAAH, AAAAAAAI, ..., AVGELTI, AVGELTK, AVGELTL, AVGELTM, ..., YYYYYYYS, YYYYYYYT, YYYYYYYV, YYYYYYYY. Database search: database of known peptides: MDERHILNM, KLQWVCSDL, PTYWASDL, ENQIKRSACVM, TLACHGGEM, NGALPQWRT, HLLERTKMNVV, GGPASSDA, GGLITGMQSD, MQPLMNWE, ALKIIMNVRT, AVGELTK, HEWAILF, GHNLWAMNAC, GVFGSVLRA, EKLNKAATYIN..
22
De Novo vs. Database Search: A Paradox The database of all peptides of length n is huge, ≈ O(20^n). The database of all known peptides is much smaller, ≈ O(10^8). However, de novo algorithms can be much faster, even though their search space is much larger! A database search scans every peptide in the database of known peptides to find the best one. De novo sequencing eliminates the need to scan the database of all peptides by modeling the problem as a graph search.
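To get a feel for the two orders of magnitude quoted on this slide, the following sketch finds the shortest peptide length at which the space of all peptides outgrows a database of about 10^8 known peptides. The constant and function name are illustrative.

```python
KNOWN_PEPTIDES = 10**8  # order of magnitude quoted on the slide

def all_peptides(n, alphabet_size=20):
    """Number of distinct peptides of length n over 20 amino acids."""
    return alphabet_size ** n

# Smallest length at which the space of all peptides exceeds the known database.
n = 1
while all_peptides(n) <= KNOWN_PEPTIDES:
    n += 1
print(n, all_peptides(n))  # 7 1280000000
```

Already at length 7 an exhaustive scan of all peptides would be more than ten times larger than the database scan, which is why de novo methods search a graph rather than enumerate.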
23
De novo Peptide Sequencing Sequence
24
Theoretical Spectrum
25
Theoretical Spectrum (cont’d)
27
Building Spectrum Graph How to create vertices (from masses). How to create edges (from mass differences). How to score paths. How to find the best path.
28
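[Figure: b ions of the peptide SEQUENCE along the mass/charge (m/z) axis]
29
[Figure: a ions of the peptide SEQUENCE]
30
[Figure: m/z axis] The a ion type is a shift of the b ion type.
31
[Figure: y ions along the m/z axis, spelling the peptide in reverse: E C N E U Q E S]
32
[Figure: fragment peaks; axes: mass/charge (m/z) vs. intensity]
33
[Figure: fragment peaks with intensities]
34
[Figure: noise peaks added along the m/z axis]
35
MS/MS Spectrum [Figure: axes: mass/charge (m/z) vs. intensity]
36
Some Mass Differences between Peaks Correspond to Amino Acids [Figure: pairs of peaks joined by arcs labeled with the letters s, e, q, u, e, n, c, e]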
37
Ion Types Some masses correspond to fragment ions; others are just random noise. Knowing the ion types $\Delta = \{\delta_1, \delta_2, \ldots, \delta_k\}$ lets us distinguish fragment ions from noise. We can learn the ion types $\delta_i$ and their probabilities $q_i$ by analyzing a large test sample of annotated spectra.
38
Example of Ion Type $\Delta = \{\delta_1, \delta_2, \ldots, \delta_k\}$ The ion types {b, b-NH3, b-H2O} correspond to $\Delta = \{0, 17, 18\}$. *Note: in reality the δ value of ion type b is -1, but we will "hide" it for the sake of simplicity.
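As a small illustration of this example, the sketch below lists the peaks that one N-terminal peptide would produce under the ion types {b, b-NH3, b-H2O}. It reads each δ as a mass subtracted from the peptide mass (NH3 weighs 17 Da, H2O weighs 18 Da) and keeps the b offset hidden as 0, as the slide does. The function name and the sample mass are illustrative.

```python
# Offsets for the ion types {b, b-NH3, b-H2O}, with the b offset "hidden" as 0
# as on the slide.
DELTA = {"b": 0, "b-NH3": 17, "b-H2O": 18}

def ions_of_partial_peptide(mass, delta=DELTA):
    """Map each ion type to the peak mass it would produce for one N-terminal peptide."""
    return {ion: mass - shift for ion, shift in delta.items()}

# 154 = 57 + 97 is a prefix mass from the earlier N-terminal peptide slides.
ions = ions_of_partial_peptide(154)
print(ions)  # {'b': 154, 'b-NH3': 137, 'b-H2O': 136}
```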
39
Match between Spectra and the Shared Peak Count The match between two spectra is the number of masses (peaks) they share (Shared Peak Count, or SPC). In practice, mass spectrometrists use a weighted SPC that reflects the intensities of the peaks. The match between experimental and theoretical spectra is defined similarly.
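A minimal sketch of the unweighted shared peak count follows. The `tolerance` parameter is an assumption added for real-valued masses (the slide only speaks of shared masses), and the experimental spectrum in the example is made up for illustration.

```python
def shared_peak_count(spectrum_a, spectrum_b, tolerance=0.0):
    """Count distinct peaks of spectrum_a that have a partner in spectrum_b.

    Two peaks are considered shared when their masses differ by at most
    `tolerance` (0.0 means exact equality, which suits integer masses).
    """
    return sum(
        any(abs(a - b) <= tolerance for b in spectrum_b)
        for a in set(spectrum_a)
    )

theoretical = [57, 154, 301, 415, 486]   # prefix masses from the earlier slides
experimental = [57, 154, 200, 415, 430]  # hypothetical measured peaks
print(shared_peak_count(theoretical, experimental))  # 3
```

With a nonzero tolerance this count is not guaranteed to be symmetric in its two arguments; a weighted variant would additionally sum peak intensities, which the slide mentions but does not specify.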
40
Peptide Sequencing Problem Goal: find a peptide with maximal match between an experimental and a theoretical spectrum. Input: S, the experimental spectrum; Δ, the set of possible ion types; m, the parent mass. Output: P, a peptide of mass m whose theoretical spectrum best matches the experimental spectrum S.
41
Vertices of Spectrum Graph Vertices are the masses of potential N-terminal peptides. They are generated by reverse shifts corresponding to the ion types $\Delta = \{\delta_1, \delta_2, \ldots, \delta_k\}$. Every N-terminal peptide of mass $m$ can generate up to $k$ ions: $m - \delta_1, m - \delta_2, \ldots, m - \delta_k$. Every mass $s$ in an MS/MS spectrum generates $k$ vertices $V(s) = \{s + \delta_1, s + \delta_2, \ldots, s + \delta_k\}$ corresponding to potential N-terminal peptides. Vertices of the spectrum graph: $\{\text{initial vertex}\} \cup V(s_1) \cup V(s_2) \cup \ldots \cup V(s_m) \cup \{\text{terminal vertex}\}$.
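The vertex construction can be sketched in a few lines of Python. The sketch assumes the ion types Δ = {0, 17, 18} from the ion-type example, and it places the initial vertex at mass 0 and the terminal vertex at the parent mass; the slide names these two vertices without giving their masses, so that placement is an assumption. The function name is illustrative.

```python
DELTA = [0, 17, 18]  # offsets for {b, b-NH3, b-H2O}, as in the ion-type example

def spectrum_graph_vertices(spectrum, parent_mass, delta=DELTA):
    """Sorted vertex masses: 0, every reverse shift s + delta, and the parent mass."""
    vertices = {0, parent_mass}  # assumed initial and terminal vertices
    for s in spectrum:
        for d in delta:
            v = s + d
            if 0 < v < parent_mass:  # keep only masses a partial peptide could have
                vertices.add(v)
    return sorted(vertices)

# Two peaks that would come from the same prefix of mass 154: b-H2O (136) and b (154).
vertices = spectrum_graph_vertices([136, 154], parent_mass=486)
print(vertices)  # [0, 136, 153, 154, 171, 172, 486]
```

The mass 154 is produced twice (136 + 18 and 154 + 0); such coincidences are what make a vertex a credible N-terminal peptide mass.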
42
Reverse Shifts Two peaks, b-H2O and b, are given by the mass spectrum. After a +H2O shift, two peaks that coincide indicate a possible vertex. [Figure: intensity vs. mass/charge (m/z); red: mass spectrum; blue: spectrum shifted by +H2O; the shifted b-H2O peak coincides with the b peak]
43
Reverse Shifts [Figure: the spectrum shifted by H2O and by H2O+NH3]
44
Edges of Spectrum Graph Two vertices whose mass difference corresponds to an amino acid A are connected by an edge labeled A. Gap edges are added for di- and tripeptides.
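A minimal sketch of the edge rule is shown below. It uses nominal integer residue masses for only six amino acids to keep the example short (a full table has 20 entries), and it does not add the gap edges for di- and tripeptides. The function name is illustrative.

```python
# Nominal integer residue masses (Da) for a few amino acids; a full table has 20.
RESIDUE_MASS = {"G": 57, "A": 71, "P": 97, "V": 99, "N": 114, "F": 147}

def spectrum_graph_edges(vertices, residue_mass=RESIDUE_MASS):
    """Return (u, v, amino_acid) for each vertex pair whose mass difference is a residue mass."""
    by_mass = {m: aa for aa, m in residue_mass.items()}
    ordered = sorted(vertices)
    edges = []
    for i, u in enumerate(ordered):
        for v in ordered[i + 1:]:
            aa = by_mass.get(v - u)
            if aa is not None:
                edges.append((u, v, aa))
    return edges

edges = spectrum_graph_edges([0, 57, 154, 301, 415, 486])
print(edges)
# [(0, 57, 'G'), (57, 154, 'P'), (154, 301, 'F'), (301, 415, 'N'), (415, 486, 'A')]
```

Because every edge points from a smaller to a larger mass, the resulting graph is acyclic, which is what makes an efficient best-path search possible.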
45
Paths Paths in the labeled graph spell out amino acid sequences. There are many paths; how do we find the correct one? We need scoring to evaluate paths.
46
Path Score $p(P,S)$ = the probability that peptide $P$ produces spectrum $S = \{s_1, s_2, \ldots, s_q\}$. $p(P,s)$ = the probability that peptide $P$ generates a peak $s$. Scoring = computing probabilities: $p(P,S) = \prod_{s \in S} p(P,s)$.
47
Peak Score For a position $t$ that represents ion type $\delta_j$: $p(P, s_t) = \begin{cases} q_j, & \text{if a peak is generated at } t \\ 1 - q_j, & \text{otherwise} \end{cases}$
48
Peak Score (cont'd) For a position $t$ that is not associated with an ion type: $p_R(P, s_t) = \begin{cases} q_R, & \text{if a peak is generated at } t \\ 1 - q_R, & \text{otherwise} \end{cases}$ where $q_R$ is the probability of a noisy peak that does not correspond to any ion type.
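The two peak-score rules and the product from the Path Score slide can be combined in a short sketch. Each position is described by the probability that applies to it (some q_j for an ion-type position, q_R for a noise position) and by whether a peak was observed there. All numeric values below are hypothetical, and the function names are illustrative.

```python
import math

def peak_probability(q, peak_present):
    """q if a peak is generated at the position, 1 - q otherwise."""
    return q if peak_present else 1.0 - q

def spectrum_probability(positions):
    """Multiply per-position probabilities; positions is a list of (q, peak_present)."""
    return math.prod(peak_probability(q, present) for q, present in positions)

positions = [
    (0.7, True),    # ion-type position with q_j = 0.7, peak observed
    (0.2, False),   # ion-type position with q_j = 0.2, peak missing
    (0.05, True),   # noise position with q_R = 0.05, peak observed
]
p = spectrum_probability(positions)  # 0.7 * 0.8 * 0.05
```

In practice the product is usually replaced by a sum of logarithms to avoid underflow on long spectra; that substitution is a common numerical convention rather than something the slides prescribe.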
49
Finding Optimal Paths in the Spectrum Graph For a given MS/MS spectrum $S$, find a peptide $P'$ maximizing $p(P,S)$ over all possible peptides $P$: $p(P', S) = \max_P p(P, S)$. Peptides = paths in the spectrum graph. $P'$ = the optimal path in the spectrum graph.
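One standard way to find a best path is dynamic programming over the vertices in increasing mass order, which works because every edge runs from a smaller to a larger mass. The sketch below is a generic best-path routine under that assumption, with additive vertex scores (for example log-probabilities); it is not claimed to be the exact algorithm of the lecture, and the scores in the example are hypothetical.

```python
def best_path(vertices, edges, score):
    """Highest-scoring path from the smallest to the largest vertex of a DAG.

    vertices: iterable of masses; edges: (u, v, label) tuples with u < v;
    score: dict mapping vertex -> additive score (missing vertices score 0).
    Returns (total_score, spelled_sequence), or None if the sink is unreachable.
    """
    ordered = sorted(vertices)  # increasing mass is a topological order
    source, sink = ordered[0], ordered[-1]
    best = {source: (score.get(source, 0.0), "")}
    incoming = {}
    for u, v, label in edges:
        incoming.setdefault(v, []).append((u, label))
    for v in ordered[1:]:
        candidates = [
            (best[u][0] + score.get(v, 0.0), best[u][1] + label)
            for u, label in incoming.get(v, [])
            if u in best  # skip predecessors that are themselves unreachable
        ]
        if candidates:
            best[v] = max(candidates)
    return best.get(sink)

vertices = [0, 57, 71, 128]
edges = [(0, 57, "G"), (0, 71, "A"), (57, 128, "A"), (71, 128, "G")]
score = {57: 2.0, 71: 0.5, 128: 1.0}  # hypothetical vertex scores
print(best_path(vertices, edges, score))  # (3.0, 'GA')
```

The running time is linear in the number of vertices plus edges after sorting, in contrast to enumerating all 20^n peptides.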
50
Ions and Probabilities Tandem mass spectrometry is characterized by a set of ion types $\{\delta_1, \delta_2, \ldots, \delta_k\}$ and their probabilities $\{q_1, \ldots, q_k\}$. The $\delta_i$-ions of a partial peptide are produced independently with probabilities $q_i$.
51
Ions and Probabilities A peptide has all $k$ peaks with probability $\prod_{i=1}^{k} q_i$ and no peaks with probability $\prod_{i=1}^{k} (1 - q_i)$. A peptide also produces "random noise" with uniform probability $q_R$ in any position.
52
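Ratio Test Scoring for Partial Peptides The ratio test incorporates premiums for observed ions and penalties for missing ions. Example: for $k = 4$, assume that for a partial peptide $P'$ we only see ions $\delta_1, \delta_2, \delta_4$. The score is calculated as: $\frac{q_1}{q_R} \cdot \frac{q_2}{q_R} \cdot \frac{1 - q_3}{1 - q_R} \cdot \frac{q_4}{q_R}$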
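The k = 4 example can be worked numerically. The sketch assumes the usual ratio-test form: a factor q_j / q_R (a premium) for each observed ion type and a factor (1 - q_j) / (1 - q_R) (a penalty) for each missing one. The probabilities below are hypothetical, and the function name is illustrative.

```python
def ratio_score(q, q_noise, observed):
    """Ratio-test score of one partial peptide.

    q: list of ion-type probabilities q_1..q_k (index 0 holds q_1);
    q_noise: noise probability q_R;
    observed: set of 0-based indices of the ion types that were seen.
    """
    score = 1.0
    for j, q_j in enumerate(q):
        if j in observed:
            score *= q_j / q_noise                  # premium for an observed ion
        else:
            score *= (1.0 - q_j) / (1.0 - q_noise)  # penalty for a missing ion
    return score

q = [0.8, 0.6, 0.5, 0.4]  # hypothetical q_1..q_4
q_noise = 0.1             # hypothetical q_R
# Ions delta_1, delta_2 and delta_4 observed, delta_3 missing.
score = ratio_score(q, q_noise, observed={0, 1, 3})
```

With these numbers the score is 8 × 6 × (0.5 / 0.9) × 4, roughly 106.7; a value above 1 means the peaks are better explained by the partial peptide than by noise.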
53
Scoring Peptides $T$: the set of all positions. $T_i = \{t_{\delta_1}, t_{\delta_2}, \ldots, t_{\delta_k}\}$: the set of positions that represent ions of the partial peptide $P_i$. A peak at position $t_{\delta_j}$ is generated with probability $q_j$. $R = T - \bigcup_i T_i$: the set of positions that are not associated with any partial peptide (noise).
54
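Probabilistic Model For a position $t_{\delta_j} \in T_i$, the probability $p(t, P, S)$ that peptide $P$ produces a peak at position $t$ is: $p(t, P, S) = \begin{cases} q_j, & \text{if a peak is generated at } t \\ 1 - q_j, & \text{otherwise} \end{cases}$ Similarly, for $t \in R$, the probability that $P$ produces a random noise peak at $t$ is: $p_R(t) = \begin{cases} q_R, & \text{if a peak is generated at } t \\ 1 - q_R, & \text{otherwise} \end{cases}$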
55
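Probabilistic Score For a peptide $P$ with $n$ amino acids, the score for the whole peptide is expressed by the following ratio test: $\frac{p(P,S)}{p_R(S)} = \prod_{i=1}^{n} \prod_{j=1}^{k} \frac{p(t_{i\delta_j}, P, S)}{p_R(t_{i\delta_j})}$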
56
De Novo vs. Database Search [Figure: a spectrum annotated with the letters W R A C V G E K D W L P T L T, shown once for each approach; identified peptide: AVGELTK] Database search: database of known peptides: MDERHILNM, KLQWVCSDL, PTYWASDL, ENQIKRSACVM, TLACHGGEM, NGALPQWRT, HLLERTKMNVV, GGPASSDA, GGLITGMQSD, MQPLMNWE, ALKIIMNVRT, AVGELTK, HEWAILF, GHNLWAMNAC, GVFGSVLRA, EKLNKAATYIN..