Protein Sequencing and Identification by Mass Spectrometry.

Masses of Amino Acid Residues
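The residue-mass table on this slide did not survive extraction. Below is a minimal sketch of it, assuming integer (nominal) residue masses in daltons. These match the values the later slides use (57 Da = G, 99 Da = V); real instruments work with monoisotopic masses to several decimal places. With integer masses, I/L (113) and K/Q (128) cannot be told apart. The helper `residue_sum` is hypothetical.

```python
# Nominal (integer) residue masses in daltons for the 20 standard amino acids.
# Assumption: integer masses as in the slide examples; monoisotopic masses differ slightly.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "I": 113, "L": 113, "N": 114, "D": 115, "K": 128, "Q": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def residue_sum(peptide: str) -> int:
    """Sum of the residue masses of a peptide (terminal groups not included)."""
    return sum(RESIDUE_MASS[aa] for aa in peptide)


print(residue_sum("GVDLK"))  # 512
```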

Protein Backbone
H-...-NH-CH(R_{i-1})-CO-NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-...-OH
Amino acid residues i-1, i, and i+1 carry side chains R_{i-1}, R_i, and R_{i+1}. The N-terminus is on the left and the C-terminus is on the right.

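Peptide Fragmentation
Peptides tend to fragment along the backbone. Fragments can also lose neutral chemical groups such as NH3 and H2O.
Figure: collision-induced dissociation (CID) of a protonated (H+) peptide breaks the backbone between the CO of residue i-1 and the NH of residue i. This gives a prefix fragment (H-...-NH-CH(R_{i-1})-CO) and a suffix fragment (NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-...-OH).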

Breaking Protein into Peptides and Peptides into Fragment Ions
Proteases, e.g. trypsin, break a protein into peptides. A tandem mass spectrometer further breaks the peptides down into fragment ions and measures the mass of each piece. The mass spectrometer accelerates the fragment ions; heavier ions accelerate more slowly than lighter ones. The mass spectrometer measures the mass/charge ratio of each ion.
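The first step, protease digestion, can be illustrated with a minimal in-silico digest. This sketch assumes the simplified textbook trypsin rule: cleave after K or R unless the next residue is P. Real trypsin specificity has exceptions and missed cleavages, which the sketch ignores. The function name is hypothetical.

```python
def tryptic_digest(protein: str) -> list[str]:
    """Split a protein after K/R unless the next residue is P (simplified trypsin rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder without K/R
    return peptides


print(tryptic_digest("MAKRPGRSTK"))  # ['MAK', 'RPGR', 'STK']
```

The R in "RPGR" is followed by P, so it is not cleaved.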

N- and C-terminal Peptides
Figure: the N-terminal peptides (prefixes) and C-terminal peptides (suffixes) of a peptide.

Terminal peptides and ion types
Peptide mass = 415 Da; peptide mass - 18 Da = 397 Da (without H2O).
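The N-terminal peptides of a peptide are its prefixes and the C-terminal peptides are its suffixes. This small sketch lists their masses, assuming integer residue masses summed without terminal groups. It uses GVDLK from the later Mass Spectra slide, and the helper name is hypothetical.

```python
from itertools import accumulate

# Nominal residue masses (assumption: integer masses, as in the slides).
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "I": 113, "L": 113, "N": 114, "D": 115, "K": 128, "Q": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def terminal_masses(peptide: str) -> tuple[list[int], list[int]]:
    """Masses of the proper N-terminal prefixes and proper C-terminal suffixes."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    prefixes = list(accumulate(masses))[:-1]            # G, GV, GVD, GVDL
    suffixes = list(accumulate(reversed(masses)))[:-1]  # K, LK, DLK, VDLK
    return prefixes, suffixes


n_term, c_term = terminal_masses("GVDLK")
print(n_term)  # [57, 156, 271, 384]
print(c_term)  # [128, 241, 356, 455]
```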

N- and C-terminal Peptides
Goal: reconstruct the peptide from the set of masses of its fragment ions (the mass spectrum).
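Consider the idealized case where the spectrum contains exactly the prefix masses, with no noise, no missing peaks, and no other ion types. Reconstruction then reduces to reading consecutive mass differences. This sketch works under that assumption, using integer residue masses. The parent mass is taken as the sum of residue masses. A difference that matches two residues (113 for I/L, 128 for K/Q) is reported as a bracketed set, because integer masses cannot separate them. Names are hypothetical.

```python
# Nominal residue masses (assumption: integer masses, as in the slides).
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "I": 113, "L": 113, "N": 114, "D": 115, "K": 128, "Q": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}

# Invert the table: mass -> all residues with that mass (e.g. 113 -> "IL").
MASS_TO_AA: dict[int, str] = {}
for aa, m in sorted(RESIDUE_MASS.items()):
    MASS_TO_AA[m] = MASS_TO_AA.get(m, "") + aa


def reconstruct(prefix_masses: list[int], parent_mass: int) -> str:
    """Spell a peptide from an ideal, noise-free set of prefix masses."""
    ladder = [0] + sorted(prefix_masses) + [parent_mass]
    out = []
    for lo, hi in zip(ladder, ladder[1:]):
        aas = MASS_TO_AA.get(hi - lo)
        if aas is None:
            raise ValueError(f"no residue of mass {hi - lo}")
        out.append(aas if len(aas) == 1 else f"[{aas}]")
    return "".join(out)


print(reconstruct([57, 156, 271, 384], 512))  # GVD[IL][KQ]
```

Real spectra break every assumption made here. That is why the slides that follow build a spectrum graph instead.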

Peptide Fragmentation
Figure: the peptide H2N-CH(R1)-CO-NH-CH(R2)-CO-NH-CH(R3)-CO-NH-CH(R4)-COOH with the backbone cleavages that produce the N-terminal ions a2, a3, b2, b3 and the C-terminal ions y1, y2, y3. The neutral-loss ions b2-H2O, b3-NH3, y3-H2O, and y2-NH3 are also shown.
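The ion types in the figure can be computed from residue masses. This sketch handles singly charged ions and assumes integer masses and the usual approximations:
b = prefix residues + 1 (proton)
a = b - 28 (loss of CO)
y = suffix residues + 19 (water plus proton)
Neutral losses are 18 (H2O) and 17 (NH3). The function name is hypothetical.

```python
# Nominal residue masses (assumption: integer masses, as in the slides).
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "I": 113, "L": 113, "N": 114, "D": 115, "K": 128, "Q": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}
PROTON, WATER, AMMONIA, CO = 1, 18, 17, 28  # nominal masses (assumption)


def fragment_ions(peptide: str) -> dict[str, int]:
    """Singly charged a/b/y ions and b/y neutral losses, as nominal m/z values."""
    n = len(peptide)
    ions = {}
    for i in range(1, n):  # break after residue i (proper prefixes/suffixes only)
        prefix = sum(RESIDUE_MASS[aa] for aa in peptide[:i])
        suffix = sum(RESIDUE_MASS[aa] for aa in peptide[n - i:])
        b = prefix + PROTON
        y = suffix + WATER + PROTON
        ions[f"a{i}"] = b - CO
        ions[f"b{i}"] = b
        ions[f"b{i}-H2O"] = b - WATER
        ions[f"b{i}-NH3"] = b - AMMONIA
        ions[f"y{i}"] = y
        ions[f"y{i}-H2O"] = y - WATER
        ions[f"y{i}-NH3"] = y - AMMONIA
    return ions


ions = fragment_ions("GVDLK")
print(ions["b2"], ions["a2"], ions["y1"])  # 157 129 147
```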

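Mass Spectra
For the peptide GVDLK, mass differences between peaks identify residues: 57 Da = 'G', 99 Da = 'V', and so on through D, L, and K. Suffix fragments read the sequence in reverse (K, L, D, V, G). The peaks in the mass spectrum include:
prefix fragments and suffix fragments;
fragments with neutral losses (-H2O, -NH3);
noise;
missing peaks.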

Protein Identification with MS/MS
Figure: the peptide GVDLK and its MS/MS spectrum (intensity vs. mass), used for peptide identification.

Tandem Mass-Spectrometry

Breaking Proteins into Peptides
Figure: the protein MPSERGTDIMRPAKID is broken into peptides (MPSER, ……, GTDIMR, PAKID, ……), which are separated by HPLC and sent to MS/MS.

Mass Spectrometry
Matrix-Assisted Laser Desorption/Ionization (MALDI). From lectures by Vineet Bafna (UCSD).

Tandem Mass Spectrometry
Figure: LC → ion source → MS-1 → collision cell → MS-2, with example LC scans 1707 and 1708 showing MS and MS/MS spectra.

Protein Identification by Tandem Mass Spectrometry
Figure: spectra from the MS/MS instrument are interpreted by one of two routes to obtain the sequence:
database search (e.g., SEQUEST);
de novo interpretation (e.g., Sherenga).

Tandem Mass Spectrum
Tandem mass spectrometry (MS/MS) mainly generates partial N- and C-terminal peptides. The spectrum consists of different ion types because peptides can be broken in several places. Chemical noise often complicates the spectrum. The spectrum is represented in 2-D: a mass/charge axis vs. an intensity axis.

De Novo vs. Database Search
De novo interpretation reads the sequence AVGELTK directly from the spectrum. Database search instead compares candidate peptides by mass and score, using one of two databases.
The database of all peptides (20^n sequences of length n): AAAAAAAA, AAAAAAAC, AAAAAAAD, AAAAAAAE, AAAAAAAG, AAAAAAAF, AAAAAAAH, AAAAAAAI, ..., AVGELTI, AVGELTK, AVGELTL, AVGELTM, ..., YYYYYYYS, YYYYYYYT, YYYYYYYV, YYYYYYYY.
The database of known peptides: MDERHILNM, KLQWVCSDL, PTYWASDL, ENQIKRSACVM, TLACHGGEM, NGALPQWRT, HLLERTKMNVV, GGPASSDA, GGLITGMQSD, MQPLMNWE, ALKIIMNVRT, AVGELTK, HEWAILF, GHNLWAMNAC, GVFGSVLRA, EKLNKAATYIN, ...

De Novo vs. Database Search: A Paradox
The database of all peptides is huge: ≈ O(20^n) for peptides of length n. The database of all known peptides is much smaller: ≈ O(10^8). However, de novo algorithms can be much faster, even though their search space is much larger! A database search scans every peptide in the database of known peptides to find the best one. De novo sequencing avoids scanning the database of all peptides by modeling the problem as a graph search.
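A quick calculation shows how fast the database of all peptides grows with length n, compared with the roughly 10^8 known peptides quoted above:

$$20^{7} = 1.28\times10^{9}, \qquad 20^{10} \approx 1.02\times10^{13}$$

So at length 7 the set of all peptides already exceeds 10^8 by more than an order of magnitude, and each additional residue multiplies it by 20.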

De novo Peptide Sequencing

Theoretical Spectrum

Theoretical Spectrum (cont’d)

Building Spectrum Graph How to create vertices (from masses) How to create edges (from mass differences) How to score paths How to find best path

Figure: building the theoretical spectrum of the example word SEQUENCE on a mass/charge axis, in stages:
b ions for the prefixes S, SE, SEQ, ...;
a ions (a is an ion type shifted from b);
y ions read from the C-terminus (E, C, N, E, U, Q, E, S);
intensities and noise peaks added, giving an MS/MS spectrum (intensity vs. mass/charge).

Some Mass Differences between Peaks Correspond to Amino Acids
Figure: peak pairs in the SEQUENCE spectrum whose mass differences correspond to the letters S, E, Q, U, N, and C.
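The idea on this slide can be sketched directly: scan all pairs of peaks and keep those whose difference equals a residue mass. The sketch assumes integer masses and a default tolerance of 0 Da; real data needs a tolerance window. The function name is hypothetical.

```python
from itertools import combinations

# Nominal residue masses (assumption: integer masses, as in the slides).
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "I": 113, "L": 113, "N": 114, "D": 115, "K": 128, "Q": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}
MASS_TO_AA = {}
for aa, m in sorted(RESIDUE_MASS.items()):
    MASS_TO_AA[m] = MASS_TO_AA.get(m, "") + aa  # e.g. 113 -> "IL"


def amino_acid_gaps(peaks, tol=0):
    """(low, high, residues) for every peak pair whose difference matches a residue mass."""
    hits = []
    for lo, hi in combinations(sorted(peaks), 2):
        for mass, aas in MASS_TO_AA.items():
            if abs((hi - lo) - mass) <= tol:
                hits.append((lo, hi, aas))
    return hits


print(amino_acid_gaps([57, 156, 200, 271]))
# [(57, 156, 'V'), (156, 271, 'D'), (200, 271, 'A')]
```

In this example the peak at 200 is noise. It still produces a spurious "A" gap, which is why mass differences alone are not enough.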

Ion Types
Some masses correspond to fragment ions; others are just random noise. Knowing the ion types $\Delta = \{\delta_1, \delta_2, \dots, \delta_k\}$ lets us distinguish fragment ions from noise. We can learn the ion types $\delta_i$ and their probabilities $q_i$ by analyzing a large test sample of annotated spectra.
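One way to learn ion types is to use annotated spectra where the true prefix masses are known. For each offset δ, count how often a peak appears at that offset from a prefix mass. Offsets that occur far more often than chance are ion types, and their observed frequencies estimate $q_i$. This minimal sketch of the counting step assumes integer masses. It follows the next slide's convention that an ion of type δ appears at m - δ. The training data is invented for illustration and the function name is hypothetical.

```python
from collections import Counter


def offset_frequencies(annotated, max_offset=40):
    """Estimate q(delta) = fraction of known prefix masses m with a peak at m - delta.

    annotated: list of (peaks, prefix_masses) pairs, where prefix_masses are the
    true N-terminal prefix masses taken from the annotation.
    """
    hits, total = Counter(), 0
    for peaks, prefix_masses in annotated:
        peak_set = set(peaks)
        for m in prefix_masses:
            total += 1
            for delta in range(max_offset + 1):
                if m - delta in peak_set:
                    hits[delta] += 1
    return {delta: hits[delta] / total for delta in range(max_offset + 1)}


# Toy annotated spectra (invented): (observed peaks, true prefix masses).
training = [
    ([57, 139, 156, 254, 271], [57, 156, 271]),
    ([40, 57, 138, 253, 271], [57, 156, 271]),
]
q = offset_frequencies(training)
print(q[0], q[17], q[18])
```

On this toy data, offsets 0, 17, and 18 stand out while all other offsets stay at 0.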

Example of Ion Type $\Delta = \{\delta_1, \delta_2, \dots, \delta_k\}$
Ion types {b, b-NH3, b-H2O} correspond to $\Delta = \{0, 17, 18\}$.
Note: in reality the δ value of ion type b is -1, but we "hide" it for the sake of simplicity.
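Under this simplified convention, each N-terminal prefix of mass m contributes peaks at m - δ for every δ in Δ. This sketch builds the resulting theoretical spectrum. It assumes integer masses and that only proper prefixes (not the full peptide) are fragmented. The helper name is hypothetical.

```python
from itertools import accumulate

# Nominal residue masses (assumption: integer masses, as in the slides).
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "I": 113, "L": 113, "N": 114, "D": 115, "K": 128, "Q": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def theoretical_spectrum(peptide, deltas=(0, 17, 18)):
    """Sorted peaks m - delta for every proper prefix mass m and ion type delta."""
    prefixes = list(accumulate(RESIDUE_MASS[aa] for aa in peptide))[:-1]
    return sorted({m - d for m in prefixes for d in deltas})


print(theoretical_spectrum("GVDLK"))
# [39, 40, 57, 138, 139, 156, 253, 254, 271, 366, 367, 384]
```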

Match between Spectra and the Shared Peak Count
The match between two spectra is the number of masses (peaks) they share: the Shared Peak Count, or SPC. In practice, mass spectrometrists use a weighted SPC that reflects the intensities of the peaks. The match between an experimental and a theoretical spectrum is defined similarly.
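This is a sketch of both measures. It assumes a peak "matches" when it lies within a tolerance `tol` of a peak in the other spectrum. For the weighted version it assumes a simple weighting: the summed intensity of the matched experimental peaks. Instruments and tools use various other weightings. Function names are hypothetical.

```python
def shared_peak_count(spec_a, spec_b, tol=0.0):
    """Number of peaks in spec_a that have a partner in spec_b within tol."""
    return sum(any(abs(a - b) <= tol for b in spec_b) for a in spec_a)


def weighted_spc(experimental, theoretical, tol=0.0):
    """Summed intensity of experimental (mass, intensity) peaks that match a theoretical mass."""
    return sum(
        intensity
        for mass, intensity in experimental
        if any(abs(mass - t) <= tol for t in theoretical)
    )


theoretical = [39, 40, 57, 138, 139, 156, 253, 254, 271, 366, 367, 384]  # GVDLK, Δ = {0, 17, 18}
observed = [(57, 10.0), (139, 2.5), (156, 8.0), (200, 1.0)]              # invented peaks
print(shared_peak_count([m for m, _ in observed], theoretical))  # 3
print(weighted_spc(observed, theoretical))                       # 20.5
```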

Peptide Sequencing Problem
Goal: find a peptide with maximal match between an experimental and a theoretical spectrum.
Input: S, the experimental spectrum; Δ, the set of possible ion types; m, the parent mass.
Output: P, a peptide with mass m whose theoretical spectrum best matches the experimental spectrum S.
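When the candidates are restricted to a finite list, such as a database of known peptides, the problem can be solved by brute force. Keep the peptides with parent mass m, then pick the one whose theoretical spectrum shares the most peaks with S. This sketch uses the same assumptions as the earlier examples: integer masses, Δ = {0, 17, 18}, parent mass = sum of residue masses, and exact matching. It is the database-search flavor of the problem; de novo methods avoid this enumeration. Names and candidates are hypothetical.

```python
from itertools import accumulate

# Nominal residue masses (assumption: integer masses, as in the slides).
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "I": 113, "L": 113, "N": 114, "D": 115, "K": 128, "Q": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def theoretical_spectrum(peptide, deltas):
    """Set of peaks m - delta over proper prefix masses m."""
    prefixes = list(accumulate(RESIDUE_MASS[aa] for aa in peptide))[:-1]
    return {m - d for m in prefixes for d in deltas}


def best_peptide(spectrum, deltas, parent_mass, candidates):
    """Candidate with mass parent_mass and the largest shared peak count, or None."""
    observed = set(spectrum)
    best, best_score = None, -1
    for pep in candidates:
        if sum(RESIDUE_MASS[aa] for aa in pep) != parent_mass:
            continue  # wrong parent mass: not a valid answer
        score = len(theoretical_spectrum(pep, deltas) & observed)
        if score > best_score:
            best, best_score = pep, score
    return best


spectrum = [57, 139, 156, 254, 271, 384, 500]
print(best_peptide(spectrum, (0, 17, 18), 512, ["VGDLK", "GVDLK", "AVGELTK", "GVDKL"]))  # GVDLK
```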

Vertices of Spectrum Graph
Vertices are the masses of potential N-terminal peptides. They are generated by reverse shifts corresponding to the ion types $\Delta = \{\delta_1, \delta_2, \dots, \delta_k\}$.
Every N-terminal peptide of mass m can generate up to k ions, at masses $m - \delta_1, m - \delta_2, \dots, m - \delta_k$.
Every mass s in an MS/MS spectrum generates k vertices $V(s) = \{s + \delta_1, s + \delta_2, \dots, s + \delta_k\}$ corresponding to potential N-terminal peptides.
The vertices of the spectrum graph are $\{\text{initial vertex}\} \cup V(s_1) \cup V(s_2) \cup \dots \cup V(s_m) \cup \{\text{terminal vertex}\}$.

Reverse Shifts
Two peaks, b-H2O and b, are given by the mass spectrum. After a +H2O shift, the b-H2O peak coincides with the b peak; such a coincidence marks a possible vertex.
Figure (intensity vs. mass/charge; red: mass spectrum, blue: spectrum shifted by +H2O): peaks labeled b-H2O, b / b-H2O+H2O, and b+H2O.

Reverse Shifts
Figure panels: shift by H2O+NH3; shift by H2O.
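Vertex generation can be sketched as follows. Every peak s is shifted back by each δ in Δ to a candidate prefix mass s + δ. Candidates produced by several peaks are better supported; these are the coincidences shown on the reverse-shift slides. The sketch assumes integer masses and Δ = {0, 17, 18}. It also assumes an initial vertex 0 and a terminal vertex at the parent mass, taken here as the sum of residue masses. The function name is hypothetical.

```python
from collections import Counter


def spectrum_graph_vertices(peaks, deltas, parent_mass):
    """Candidate prefix masses (vertices) mapped to how many peaks generate each one.

    Each peak s yields V(s) = {s + d for d in deltas}. The initial vertex 0 and the
    terminal vertex parent_mass are always present (support 0 if no peak hits them).
    """
    support = Counter(s + d for s in peaks for d in deltas)
    support.setdefault(0, 0)
    support.setdefault(parent_mass, 0)
    return support


# Slide example with integer masses: a b peak at 156 and its b-H2O partner at 138.
vertices = spectrum_graph_vertices([138, 156], deltas=(0, 17, 18), parent_mass=512)
print(vertices.most_common(1))  # [(156, 2)]
```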

Edges of Spectrum Graph
Two vertices whose mass difference corresponds to an amino acid A are connected by an edge labeled A. Gap edges are added for di- and tri-peptides.
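This sketch of edge construction connects vertices u < v whenever v - u equals a single residue mass, labeling the edge with that residue. Gap edges for di- and tri-peptides would add sums of two or three residue masses in the same way; they are omitted here. Integer masses and exact matching are assumptions, and the function name is hypothetical.

```python
# Nominal residue masses (assumption: integer masses, as in the slides).
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "I": 113, "L": 113, "N": 114, "D": 115, "K": 128, "Q": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}
# Invert the table: mass -> residues with that mass ("IL" for 113, "KQ" for 128).
MASS_TO_AA = {}
for aa, m in sorted(RESIDUE_MASS.items()):
    MASS_TO_AA[m] = MASS_TO_AA.get(m, "") + aa


def spectrum_graph_edges(vertices):
    """Edges (u, v, label) for vertex pairs u < v whose difference is one residue mass."""
    vs = sorted(vertices)
    edges = []
    for i, u in enumerate(vs):
        for v in vs[i + 1:]:
            label = MASS_TO_AA.get(v - u)
            if label:
                edges.append((u, v, label))
    return edges


print(spectrum_graph_edges([0, 57, 156, 271, 384, 512]))
# [(0, 57, 'G'), (0, 156, 'R'), (57, 156, 'V'), (156, 271, 'D'), (271, 384, 'IL'), (384, 512, 'KQ')]
```

The edge (0, 156, 'R') runs parallel to the G then V path because G + V = 57 + 99 = 156 = R in nominal mass. This is one reason the paths need a score.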

Paths
A path in the labeled graph spells out an amino acid sequence. There are many paths; how do we find the correct one? We need a scoring function to evaluate paths.

Path Score
$p(P,S)$ = probability that peptide P produces spectrum $S = \{s_1, s_2, \dots, s_q\}$.
$p(P,s)$ = probability that peptide P generates a peak s.
Scoring = computing probabilities: $p(P,S) = \prod_{s \in S} p(P,s)$.

Peak Score
For a position t that represents ion type $\delta_j$:
$p(P,s_t) = q_j$ if a peak is generated at t, and $1 - q_j$ otherwise.

Peak Score (cont'd)
For a position t that is not associated with an ion type:
$p_R(P,s_t) = q_R$ if a peak is generated at t, and $1 - q_R$ otherwise,
where $q_R$ is the probability of a noisy peak that does not correspond to any ion type.
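The product $p(P,S)$ is easier to compute as a sum of logarithms. This sketch assumes, as the slides do, that positions are independent. Each position is either an ion-type position with probability $q_j$ or a noise position with probability $q_R$. The input format is a hypothetical choice for illustration.

```python
import math


def log_path_probability(positions):
    """log p(P,S): sum over positions of log q (peak present) or log(1 - q) (peak absent).

    positions: iterable of (q, observed), where q is q_j for an ion-type position
    or q_R for a noise position, and observed says whether a peak is present at t.
    """
    return sum(math.log(q if observed else 1.0 - q) for q, observed in positions)


# Invented example: a b-ion seen (q_j = 0.8), a b-H2O ion missing (q_j = 0.5), a noise peak (q_R = 0.1).
lp = log_path_probability([(0.8, True), (0.5, False), (0.1, True)])
print(math.exp(lp))  # ≈ 0.04 (= 0.8 * 0.5 * 0.1)
```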

Finding Optimal Paths in the Spectrum Graph
For a given MS/MS spectrum S, find a peptide P' maximizing $p(P,S)$ over all possible peptides P. Peptides correspond to paths in the spectrum graph, so P' is the optimal path in the spectrum graph.
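Every edge goes from a smaller to a larger mass, so the spectrum graph is a directed acyclic graph. The best path can therefore be found by dynamic programming over vertices in increasing mass. This sketch assumes the score is additive over vertices, for example a per-vertex log-probability contribution; that additivity is what makes the dynamic program valid. The vertex scores in the example are invented and the function name is hypothetical.

```python
def best_path(vertex_score, edges, source, sink):
    """Highest-scoring source -> sink path in a DAG whose edges go from lower to higher mass.

    vertex_score: dict vertex -> additive score; edges: list of (u, v, label) with u < v.
    Returns (score, labels), or None if the sink is unreachable.
    """
    best = {source: (vertex_score.get(source, 0.0), [])}
    # Sorting by u guarantees every edge into u is relaxed before any edge out of u.
    for u, v, label in sorted(edges):
        if u not in best:
            continue
        candidate = best[u][0] + vertex_score.get(v, 0.0)
        if v not in best or candidate > best[v][0]:
            best[v] = (candidate, best[u][1] + [label])
    return best.get(sink)


edges = [(0, 57, "G"), (0, 156, "R"), (57, 156, "V"), (156, 271, "D"), (271, 384, "IL"), (384, 512, "KQ")]
scores = {57: 2.0, 156: 1.5, 271: 1.0, 384: 1.0, 512: 0.0}
print(best_path(scores, edges, 0, 512))  # (5.5, ['G', 'V', 'D', 'IL', 'KQ'])
```

Here the path through vertex 57 wins over the single R edge, because it collects the extra support for prefix mass 57.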

Ions and Probabilities
Tandem mass spectrometry is characterized by a set of ion types $\{\delta_1, \delta_2, \dots, \delta_k\}$ and their probabilities $\{q_1, \dots, q_k\}$. The $\delta_i$-ions of a partial peptide are produced independently with probabilities $q_i$.

Ions and Probabilities
A peptide has all k peaks with probability $\prod_{i=1}^{k} q_i$ and no peaks with probability $\prod_{i=1}^{k} (1 - q_i)$. A peptide also produces "random noise" with uniform probability $q_R$ at any position.

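Ratio Test Scoring for Partial Peptides
The score incorporates premiums for observed ions and penalties for missing ions. Example: for k = 4, assume that for a partial peptide P' we only see ions $\delta_1, \delta_2, \delta_4$. The score is calculated as:
$$\frac{q_1}{q_R} \cdot \frac{q_2}{q_R} \cdot \frac{1 - q_3}{1 - q_R} \cdot \frac{q_4}{q_R}$$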
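The example above generalizes to any k. Each observed ion contributes the premium $q_i / q_R$ and each missing ion the penalty $(1 - q_i)/(1 - q_R)$. This sketch uses invented probabilities, and the function name is hypothetical.

```python
def partial_ratio_score(q, q_r, observed):
    """Ratio-test score of one partial peptide.

    q: ion-type probabilities q_1..q_k; q_r: noise probability q_R;
    observed: k booleans, True if the peak for ion type i is present.
    """
    score = 1.0
    for qi, seen in zip(q, observed):
        score *= qi / q_r if seen else (1 - qi) / (1 - q_r)  # premium or penalty
    return score


# k = 4, ions δ1, δ2, δ4 seen and δ3 missing (invented probabilities).
s = partial_ratio_score([0.8, 0.6, 0.5, 0.4], 0.1, [True, True, False, True])
print(s)
```

With these numbers the score is 8 · 6 · (0.5/0.9) · 4 ≈ 106.7. Values above 1 favor the partial peptide over the noise model.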

Scoring Peptides
$T$ is the set of all positions.
$T_i = \{t_{\delta_1}, t_{\delta_2}, \dots, t_{\delta_k}\}$ is the set of positions that represent ions of the partial peptide $P_i$. A peak at position $t_{\delta_j}$ is generated with probability $q_j$.
$R = T - \bigcup_i T_i$ is the set of positions that are not associated with any partial peptide (noise).

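Probabilistic Model
For a position $t = t_{\delta_j} \in T_i$, the probability $p(t, P, S)$ that peptide P produces a peak at position t is:
$p(t, P, S) = q_j$ if a peak is generated at t, and $1 - q_j$ otherwise.
Similarly, for $t \in R$, the probability that P produces a random noise peak at t is:
$p_R(t, S) = q_R$ if a peak is generated at t, and $1 - q_R$ otherwise.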

Probabilistic Score
For a peptide P with n amino acids, the score for the whole peptide is expressed by the following ratio test:
$$\prod_{i=1}^{n} \prod_{t \in T_i} \frac{p(t, P, S)}{p_R(t, S)}$$
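In log form, the ratio test becomes a sum over the n partial peptides and their k ion positions. This additivity is what lets it serve as the per-vertex score in a dynamic-programming path search. This sketch assumes the same independence model as the preceding slides. The input format is a hypothetical choice: one list of k booleans per partial peptide.

```python
import math


def log_ratio_score(q, q_r, observations):
    """Sum over partial peptides i and ion types j of log p(t, P, S) / p_R(t, S).

    q: ion-type probabilities q_1..q_k; q_r: noise probability q_R;
    observations: one list of k booleans per partial peptide (peak present or not).
    """
    total = 0.0
    for observed in observations:
        for qj, seen in zip(q, observed):
            total += math.log(qj / q_r) if seen else math.log((1 - qj) / (1 - q_r))
    return total


q = [0.8, 0.6, 0.5, 0.4]  # invented ion-type probabilities
score = log_ratio_score(q, 0.1, [[True, True, False, True], [False, False, False, False]])
print(score)
```

A partial peptide with no observed ions contributes a negative term, so peptides that explain few peaks are penalized.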
