Hanyang Univ. Introduction to Data Analyses for Mass Spectrometry-based Proteomics 1.

Presentation transcript:

Hanyang Univ. Introduction to Data Analyses for Mass Spectrometry-based Proteomics 1

Peptide Assignment: DEAR vs. READ. Can the two peptides be differentiated?

Data Analysis - Peptide Assignment
[Figure: the protein DEARREAD is digested into the peptides DEAR and READ. Both peptides have the same mass (471), so they cannot be told apart in the mass spectrum (MS). After peptide fragmentation, DEAR breaks into D|EAR, DE|AR, and DEA|R, while READ breaks into R|EAD, RE|AD, and REA|D, so the two tandem mass spectra (MS/MS) differ.]

The mass-spectrometry/proteomic experiment
[Figure: an unknown protein is digested into peptides, the peptides are measured by mass spectrometry (MS, peak at m/z 471), and a selected peptide is fragmented and measured again (MS/MS).]
Options at each step:
- Digestion: Trypsin, Pepsin, Lys-C
- Mass spectrometry: Quadrupole, Time of flight, FTICR
- Peptide fragmentation: CID, ECD, ETD

Peptide Fragmentation
- b-ion: labeled from the N-terminus (amino terminus) to the C-terminus (carboxy terminus)
- y-ion: labeled from the C-terminus to the N-terminus

Peptide Fragmentation

Calculating b-/y-ion mass

Average vs. Monoisotopic mass

MS/MS Peptide Assignment: DEAR
Amino acid masses: A = 71, D = 115, E = 129, R = 156
[Figure: the peptide DEAR (H+ at the N-terminus, OH at the C-terminus) is fragmented at each of its three peptide bonds, and the fragments are plotted as intensity vs. m/z.]
- b-ions: b1 = 116, b2 = 245, b3 = 316
- y-ions: y1 = 175, y2 = 246, y3 = 375

MS/MS Peptide Assignment: READ
Amino acid masses: A = 71, D = 115, E = 129, R = 156
[Figure: the peptide READ (H+ at the N-terminus, OH at the C-terminus) is fragmented at each of its three peptide bonds, and the fragments are plotted as intensity vs. m/z.]
- b-ions: b1 = 157, b2 = 286, b3 = 357
- y-ions: y1 = 134, y2 = 205, y3 = 334

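DEAR vs. READ
[Figure: MS/MS spectra of DEAR and READ, intensity vs. m/z, each with peaks b1, b2, b3, y3, y2, y1. The READ peaks lie at b1 = 157, b2 = 286, b3 = 357, y1 = 134, y2 = 205, and y3 = 334, so the two peptides give different MS/MS spectra even though their masses are equal.]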
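The peak positions on the DEAR and READ slides can be reproduced with a short calculation. The sketch below assumes the convention the slide numbers imply: a b-ion is the sum of the prefix residue masses plus 1 (a proton), and a y-ion is the sum of the suffix residue masses plus 19 (water plus a proton). The function names are illustrative only.

```python
# Nominal (integer) residue masses taken from the slides.
RESIDUE_MASS = {"A": 71, "D": 115, "E": 129, "R": 156}

PROTON = 1     # assumed offset for b-ions: prefix residues + H+
Y_OFFSET = 19  # assumed offset for y-ions: suffix residues + H2O + H+


def b_ions(peptide):
    """Return [b1, b2, ...]: N-terminal prefix masses plus one proton."""
    masses, total = [], 0
    for residue in peptide[:-1]:
        total += RESIDUE_MASS[residue]
        masses.append(total + PROTON)
    return masses


def y_ions(peptide):
    """Return [y1, y2, ...]: C-terminal suffix masses plus water and a proton."""
    masses, total = [], 0
    for residue in reversed(peptide[1:]):
        total += RESIDUE_MASS[residue]
        masses.append(total + Y_OFFSET)
    return masses


for pep in ("DEAR", "READ"):
    print(pep, "b:", b_ions(pep), "y:", y_ions(pep))
```

The two peptides share no fragment mass, which is why MS/MS can separate them although MS cannot.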

MS/MS simulation 1, 2, 3, 4, 5, ... 100
[Figure sequence: the MS/MS spectrum of DEAR is built up one fragmentation event at a time. In simulation 1 the peptide breaks as D|EAR, adding peaks b1 and y3. After simulations 2 and 3 (DEA|R and D|EAR) the spectrum holds b1, b3, y3, and y1. After simulations 4 and 5 (DE|AR) it holds all six peaks b1, b2, b3, y3, y2, and y1. By simulation 100 the three cleavages D|EAR, DE|AR, and DEA|R have all contributed to the intensity of the peaks.]

Peptide assignment
[Figure: an MS/MS spectrum (intensity vs. m/z) with peaks b1, b2, b3, y3, y2, y1 that has to be assigned to a peptide.]
Difficulties:
- It is not known whether an ion is a b-ion or a y-ion
- Some ions may be missing
- Various ion types occur (e.g. neutral loss)
- Amino acids may be modified

Database search - Peptide assignment using MS/MS
[Figure: an MS/MS spectrum with parent mass = 471 is searched against a sequence database.]
>Protein A
MEMEKEFEQIDKSGSWAAIYQDIDVGAEDFPCRVAKLPKNKNRNR
YRDVSPFDHSRKREADDNDYINASLIKMEEAQRSYILTQQIDKSG
SWAAIYQDIRHEASDFHEASDFPCRVAKLPKNKDEARYMEKEFEQ
IDKGAGVDADIRHEMEKEFEQIDKSGSWAAIYQDIRHE
>Protein B
MKVLILACLVALALAEGDRLNVPGEIVESLSSSEESITRINKKIE
KFQSEEQQQTEDELQDKIHPFAQTQSLVYPFPGPEGDVAPQNIPP
LTQTPVVVPPFLQPEVMGVSKVKEAMAPKHKEMPFPKYPVEPF
>Protein C...
Sequence database sources:
- Raw genomic
- Transcript or EST
- Protein sequence


Peptide assignment using MS/MS
Candidate peptides (parent mass = 471): READ, DVGAE, DEAR, GAGVDA, EGDVA, …
[Figure: the experimental MS/MS spectrum is compared with the candidate peptides.]

Peptide assignment using MS/MS
[Figure: a theoretical MS/MS spectrum is generated for each candidate peptide (READ, DVGAE, DEAR, GAGVDA, EGDVA, …) and placed beside the experimental MS/MS spectrum.]

Peptide assignment using MS/MS
[Figure: the experimental spectrum is compared with each theoretical MS/MS spectrum, every comparison yields a match score, and the top-scoring candidate peptide is selected.]
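The database search steps above (collect candidates by parent mass, build theoretical spectra, score, select the top one) can be sketched as follows. This is a minimal illustration, not the method of any particular search engine: it uses nominal integer residue masses, takes the parent mass as the plain sum of residue masses (as the slides do with 471), and scores by shared peak count, a deliberately simple stand-in for a real match score. All function names are hypothetical.

```python
# Nominal (integer) residue masses for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def candidate_peptides(protein, parent_mass, tolerance=0):
    """Return distinct substrings whose residue masses sum to parent_mass."""
    found = []
    for start in range(len(protein)):
        total = 0
        for end in range(start, len(protein)):
            total += RESIDUE_MASS[protein[end]]
            if total > parent_mass + tolerance:
                break  # masses are positive, so longer substrings only grow
            if abs(total - parent_mass) <= tolerance:
                pep = protein[start:end + 1]
                if pep not in found:
                    found.append(pep)
    return found


def theoretical_peaks(peptide):
    """b- and y-ion masses (b = prefix + 1, y = suffix + 19)."""
    peaks = set()
    for i in range(1, len(peptide)):
        peaks.add(sum(RESIDUE_MASS[r] for r in peptide[:i]) + 1)
        peaks.add(sum(RESIDUE_MASS[r] for r in peptide[i:]) + 19)
    return peaks


def match_score(experimental, peptide):
    """Shared peak count between experimental and theoretical spectra."""
    return len(set(experimental) & theoretical_peaks(peptide))


def best_match(protein, parent_mass, experimental):
    """Score every candidate and return (score, peptide) for the top one."""
    scored = [(match_score(experimental, pep), pep)
              for pep in candidate_peptides(protein, parent_mass)]
    return max(scored) if scored else None


protein = "DEARREAD"
spectrum = [116, 175, 245, 246, 316, 375]  # the DEAR peaks from the slides
print(candidate_peptides(protein, 471))
print(best_match(protein, 471, spectrum))
```

For the toy protein DEARREAD the candidates are DEAR and READ, and the spectrum selects DEAR with all six peaks matched.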

Post-translational modification (PTM)
- Addition of chemical groups to a protein
- Structural changes
- Various cellular functions
[Figure: the dynamic proteome. The protein PROTEIN appears in several modified forms, for example with PO4 or CH2 groups attached, or with a residue added or removed (PROTEINS, ROTEIN).]

MS/MS of modified peptides
[Figure: a modified protein is digested, and its peptides are analyzed by MS/MS.]

MS/MS spectrum of modified peptides: TVTAMDVVY vs. TVTAMΔDVVY
[Figure: the MS/MS spectrum of the peptide 'TVTAMDVVY' compared with that of 'TVTAMΔDVVY', the same peptide with a modification of +Δ mass on M. The prefix fragments T, TV, TVT, and TVTA keep their m/z, while TVTAMΔ, TVTAMΔD, TVTAMΔDV, and TVTAMΔDVV are shifted by Δ. Among the suffix fragments, Y, VY, VVY, and DVVY keep their m/z, while MΔDVVY, AMΔDVVY, TAMΔDVVY, and VTAMΔDVVY are shifted by Δ.]
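The Δ shift can be checked numerically. The sketch below assumes nominal integer residue masses and, purely for illustration, sets Δ = +16 on the methionine; the slide itself leaves Δ unspecified. It lists the fragments whose mass differs between the unmodified and the modified peptide.

```python
# Nominal residue masses for the residues of TVTAMDVVY.
RESIDUE_MASS = {"T": 101, "V": 99, "A": 71, "M": 131, "D": 115, "Y": 163}


def fragment_ladder(peptide, mod_index=None, delta=0):
    """Return (b_ions, y_ions) as lists of (fragment, mass) pairs.

    mod_index is the 0-based position of the modified residue, if any.
    """
    masses = [RESIDUE_MASS[r] for r in peptide]
    if mod_index is not None:
        masses[mod_index] += delta
    n = len(peptide)
    b = [(peptide[:i], sum(masses[:i]) + 1) for i in range(1, n)]
    y = [(peptide[i:], sum(masses[i:]) + 19) for i in range(n - 1, 0, -1)]
    return b, y


def shifted_fragments(peptide, mod_index, delta):
    """Fragments whose mass changes when the modification is present."""
    plain_b, plain_y = fragment_ladder(peptide)
    mod_b, mod_y = fragment_ladder(peptide, mod_index, delta)
    pairs = zip(plain_b + plain_y, mod_b + mod_y)
    return [frag for (frag, m0), (_, m1) in pairs if m0 != m1]


# Assumed example: +16 on the M at position 4 (0-based).
print(shifted_fragments("TVTAMDVVY", 4, 16))
```

Only fragments that contain the modified residue move, which is what lets the position of the modification be read from the spectrum.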

Database search - modification analysis
[Figure: the MS/MS spectrum (parent mass = 471) is searched against the sequence database (Protein A, Protein B, …). Without modifications, only substrings matching the parent mass are candidate peptides: DVGAE, READ, DEAR, GAGVDA, … In modification analysis, every substring is a candidate peptide, which causes an explosion in the number of candidate peptides.]

Complexity for analyzing modified peptides
- Considering one modification per peptide: O(N)
[Figure: the MS/MS spectrum has parent mass = 769, while the unmodified peptide PTMPEPT has mass 753. The difference of 16 is tried on each of the N residues of PTMPEPT in turn.]

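Complexity for analyzing modified peptides
- Considering two modifications per peptide: O(dN^2)
[Figure: a mass difference of 100 on PTMPEPT is split into two modification masses. The first mass ranges over d values (-200 to +200), and the two modifications can sit on N(N-1) pairs of positions.]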
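The growth from O(N) to O(dN^2) can be made concrete by enumerating the hypotheses. This is only a counting sketch under the assumptions suggested by the slides: integer modification masses, a first mass d1 anywhere in -200 to +200, the second mass fixed by the total difference, and ordered pairs of distinct positions. No chemical plausibility is checked.

```python
def single_mod_candidates(peptide, delta):
    """One modification: the whole mass difference sits on one residue."""
    return [(pos, delta) for pos in range(len(peptide))]


def double_mod_candidates(peptide, delta, low=-200, high=200):
    """Two modifications: d1 is free in [low, high], d2 = delta - d1.

    Positions are ordered pairs (i, j) with i != j, so there are
    N * (N - 1) placements for every value of d1.
    """
    n = len(peptide)
    return [((i, d1), (j, delta - d1))
            for d1 in range(low, high + 1)
            for i in range(n)
            for j in range(n)
            if i != j]


peptide = "PTMPEPT"
print(len(single_mod_candidates(peptide, 16)))   # N
print(len(double_mod_candidates(peptide, 100)))  # d * N * (N - 1) with d = 401
```

With N = 7 this is 7 hypotheses for one modification against 401 × 42 for two, and the count has to be repeated for every candidate peptide in the database.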

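Standard method for modification analysis
Restrictive search: only the modifications given as input are considered.
Input modifications:
- +1 on N
- +3 on M
- +97 on E
- +102 on T
[Figure: the mass difference of 100 on PTMPEPT is explained using only the input modifications.]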
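A restrictive search can be sketched as a small enumeration over the permitted sites. The example assumes the four input modifications listed on the slide, a total mass difference of 100 on PTMPEPT, and at most two modifications per peptide; `explain_delta` is a hypothetical helper name, not a function of any search engine.

```python
from itertools import combinations


def explain_delta(peptide, delta, allowed_mods, max_mods=2):
    """Return every combination of up to max_mods modified sites whose
    shifts sum to delta, using only the user-supplied modifications.

    Each site is a (position, residue, shift) tuple; positions are 0-based.
    """
    sites = [(pos, res, allowed_mods[res])
             for pos, res in enumerate(peptide) if res in allowed_mods]
    solutions = []
    for k in range(1, max_mods + 1):
        for combo in combinations(sites, k):
            if sum(shift for _, _, shift in combo) == delta:
                solutions.append(combo)
    return solutions


allowed = {"N": 1, "M": 3, "E": 97, "T": 102}
print(explain_delta("PTMPEPT", 100, allowed))
```

Restricting the search to the input list leaves a single explanation here (+3 on M together with +97 on E), instead of the thousands of hypotheses an unrestricted search has to score.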

Spectral Library - Peptide assignment using MS/MS

Spectral Library - Peptide assignment using MS/MS: Consensus Spectrum

Peptide Validation
Peptide assignment
- Each MS/MS spectrum is interpreted independently
- Results are hard to reconcile when different software is used
Manual validation
- Filtering by search scores and NTT (number of tryptic termini)
- Subjective judgment can enter
- The error rate is unknown
- What if the dataset grows large?
Statistical validation
- PeptideProphet: gives the probability that each peptide assignment is correct, based on a probability model of the search scores
- The false discovery rate is estimated from matches to decoy peptides

(Un)reliability of Manual Validation
[Figure: search results judged by manual authenticators, classified as correct validation, incorrect validation, or validation withheld.]

Peptide Validation: PeptideProphet
[Figure: a collection of MS/MS spectra (intensity vs. m/z), each labeled with its assigned peptide.]

Peptide Validation: Target/Decoy
[Figure: one MS/MS spectrum matches YILT in the target sequence, and another matches DAER in the decoy sequence.]
>Protein A (Target Sequence)
MEMEKEFEQIDKSGSWAAIYQDIDVGAEDFPCRVAKLPK
NKNRNRYRDVSPFDHSRKREADDNDYINASLIKMEEAQR
SYILTQQIDKSGSWAAIYQDIRHEASDFHEASDFPCRVA
KLPKNKDEARYMEKEFEQIDKGAGVDADIRHEMEKEFEQ
IDKSGSWAAIYQDIRHE
>Reversed Protein A (Decoy Sequence)
EHRIDQYIAAWSGSKDIQEFEKEMEHRIDADVGAGKDIQ
EFEKEMYRAEDKNKPLKAVRCPFDSAEHFDSAEHRIDQY
IAAWGSGKDIQQTLIYSRQAEEMKILSANIYDNDDAERK
RSHDFPSVDRYRNRNKNKPLKAVRCPFDEAGVDIDQYIA
AWSGSKDIQEFEKEMEM
T = 1000: number of matches to the target sequence (above score threshold)
D = 20: number of matches to the decoy sequence (above score threshold)
False Discovery Rate = ?
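The slide leaves the false discovery rate as a question. A minimal sketch of two commonly used estimators follows; which one applies depends on how the search was run, and neither is stated on the slide, so both are assumptions here. The simple estimate D / T treats the decoy hits as a count of the false hits hiding among the target hits. The estimate 2D / (T + D) is often used when target and decoy sequences are searched as one concatenated database and both kinds of hits are reported together.

```python
def make_decoy(sequence):
    """Build a decoy by reversing the target sequence."""
    return sequence[::-1]


def fdr_simple(target_hits, decoy_hits):
    """Estimated FDR among target hits: D / T."""
    return decoy_hits / target_hits


def fdr_concatenated(target_hits, decoy_hits):
    """Estimated FDR among all reported hits: 2D / (T + D)."""
    return 2 * decoy_hits / (target_hits + decoy_hits)


T, D = 1000, 20  # counts from the slide, above the score threshold
print(make_decoy("KLPKNKDEARYMEK"))
print(f"D / T        = {fdr_simple(T, D):.4f}")
print(f"2D / (T + D) = {fdr_concatenated(T, D):.4f}")
```

With T = 1000 and D = 20 the simple estimate is 2%, and the concatenated-search estimate is a little under 4%.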

Peptide Validation: Target/Decoy
Target/Decoy is meaningful only for large datasets.
What makes a suitable decoy database? (amino acid composition, peptide sequence redundancy, precursor mass distribution)
- Reversed sequences
- Random sequences
- Pseudo-reversed sequences
Separated or concatenated?
- Threshold score: 30
- Match to the target: score 50
- Match to the decoy: score 40
- Is this counted as a false positive?
Calculating FDR
- Concatenated: is this counted as a false positive?

Semi-parametric PeptideProphet
- Decoy search results give the distribution for incorrect assignments
- The EM algorithm is used to estimate the distributions of correct assignments
- NTT (number of tryptic termini)

Protein Assignment

Protein Assignment
>Protein A
MEMEKEFEQIDKSGSWAAIYQDIDVGAEDFPCRVAKLPKNKNRNRYRDVSPFD
HSRKREADDNDYINASLIKMEEAQRSYILTQQIDKSGSWAAIYQDIRHEASDF
HEASDFPCRVAKLPKNKDEARYMEKEFEQIDKGAGVDADIRHEMEKEFEQIDK
SGSWAAIYQDIRHE
Identified peptides:
- VAKLPKNKNR: p = 0.96
- YMEKEFEQIDK: p = 0.65
- EADDNDYINASLIK: p = 0.83
P = 1 - (1 - 0.83)(1 - 0.65)(1 - 0.96)

Protein Assignment
>Protein A
MEMEKEFEQIDKSGSWAAIYQDIDVGAEDFPCRVAKLPKNKNRNRYRDVSPFD
HSRKREADDNDYINASLIKMEEAQRSYILTQQIDKSGSWAAIYQDIRHEASDF
HEASDFPCRVAKLPKNKDEARYMEKEFEQIDKGAGVDADIRHEMEKEFEQIDK
SGSWAAIYQDIRHE
Identified peptides (the same peptide, three times):
- EADDNDYINASLIK: p = 0.83
- EADDNDYINASLIK: p = 0.62
- EADDNDYINASLIK: p = 0.95
Probability(Protein A) = ?
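The formula P = 1 - (1 - 0.83)(1 - 0.65)(1 - 0.96) treats the three peptide identifications as independent evidence: the protein is absent only if every peptide assignment is wrong. The sketch below evaluates it and then addresses the repeated-peptide question under one stated assumption, namely that repeated identifications of the same peptide are not independent and only the highest probability per distinct peptide is kept. That choice is an assumption of this sketch, not something the slide states.

```python
from math import prod


def protein_probability(peptide_probs):
    """P = 1 - prod(1 - p_i), assuming independent peptide identifications."""
    return 1 - prod(1 - p for p in peptide_probs)


def collapse_repeats(identifications):
    """Keep the highest probability for each distinct peptide sequence."""
    best = {}
    for peptide, p in identifications:
        best[peptide] = max(p, best.get(peptide, 0.0))
    return best


# Three different peptides of Protein A.
print(round(protein_probability([0.83, 0.65, 0.96]), 5))

# The same peptide identified three times.
repeats = [("EADDNDYINASLIK", 0.83),
           ("EADDNDYINASLIK", 0.62),
           ("EADDNDYINASLIK", 0.95)]
print(round(protein_probability(collapse_repeats(repeats).values()), 5))
```

Under these assumptions the three distinct peptides give about 0.998, while the thrice-identified peptide gives only 0.95, because repeated spectra of one peptide add little independent evidence.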

Protein Assignment - ProteinProphet

Protein Assignment - ProteinProphet
Probabilistic model with NSP (number of sibling peptides) as a random variable

Protein Assignment - ProteinProphet
Degenerate peptides (alternative splicing, paralogs, database redundancies)

Protein Assignment - IDPicker
Bipartite graph of proteins and peptides:
A. Initialize
B. Collapse
C. Separate
D. Reduce
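The collapse and reduce steps can be illustrated on a toy protein-to-peptide mapping. The sketch below is a simplified illustration of the parsimony idea, not IDPicker's implementation: it merges proteins with identical peptide sets, skips the separate step (splitting the graph into connected components), and uses a greedy set cover for the reduction, which is not guaranteed to find the smallest possible protein set. The function names and the toy data are hypothetical.

```python
def collapse(protein_to_peptides):
    """Group proteins whose peptide sets are identical (indistinguishable)."""
    groups = {}
    for protein, peptides in protein_to_peptides.items():
        groups.setdefault(frozenset(peptides), []).append(protein)
    return {tuple(sorted(names)): set(peps) for peps, names in groups.items()}


def reduce_greedy(groups):
    """Greedy set cover: repeatedly pick the protein group that explains
    the most peptides not yet covered."""
    uncovered = set().union(*groups.values()) if groups else set()
    chosen = []
    while uncovered:
        best = max(sorted(groups), key=lambda g: len(groups[g] & uncovered))
        chosen.append(best)
        uncovered -= groups[best]
    return chosen


# Toy example: A and B share all peptides, C is explained by A/B.
proteins = {
    "A": {"p1", "p2", "p3"},
    "B": {"p1", "p2", "p3"},
    "C": {"p3"},
    "D": {"p4"},
}
groups = collapse(proteins)
print(reduce_greedy(groups))
```

In the toy data, A and B collapse into one group, C adds no peptide of its own and is dropped, and D is kept because only it explains p4.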

Peptide Quantitation
Labeled quantitation
- Use of stable-isotope-containing compounds
- Peptide assignment from MS/MS
- Peptide quantitation from MS: single ion chromatogram

Peptide Quantitation
Label-free quantitation
- Matching peptide features
- AMT (Accurate Mass & Time) approach: normalized elution time
- Spectral counting: number of spectra identified for a given peptide

Pipeline: integrated tools for MS/MS proteomics
- Input: spectrum data (protein database)
- Peptide assignment: SEQUEST, PEAKS, MODi
- Peptide validation: manual validation, PeptideProphet, Target/Decoy
- Protein assignment & validation: ProteinProphet, IDPicker
- Output: interpretation
- Quantitation: ASAPRatio, MaxQuant
