Glycoprotein Microheterogeneity via N-Glycopeptide Identification Kevin Brown Chandler, Petr Pompach, Radoslav Goldman, Nathan Edwards Georgetown University.

Presentation transcript:

Glycoprotein Microheterogeneity via N-Glycopeptide Identification
Kevin Brown Chandler, Petr Pompach, Radoslav Goldman, Nathan Edwards
Georgetown University Medical Center

The challenge
- Identify glycopeptides in large-scale tandem mass-spectrometry datasets
  - Many glycopeptide-enriched fractions
  - Many tandem mass spectra per fraction
- Good, but not great, instrumentation
  - QStar Elite: CID, good MS1/MS2 resolution
- Strive for hypothesis-generating analysis
  - Site-specific glycopeptide characterization
  - Glycoform occupancy in differentiated samples

Observations
- Oxonium ions (m/z 204, 366) help distinguish glycopeptides from peptides…
  - …but do little to identify the glycopeptide
- Few peptide b/y-ions to identify peptides…
  - …but intact peptide fragments are common
- If the peptide can be guessed, then…
  - …the glycan's mass can be determined
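The two observations above (oxonium ions flag glycopeptide spectra; a guessed peptide yields the glycan mass by subtraction) can be sketched in a few lines of Python. This is an illustrative sketch only, not the GlycoPeptideSearch implementation: the function names, the 0.05 tolerance, and the exact oxonium m/z values (204.0867 for HexNAc, 366.1395 for Hex-HexNAc) are assumptions added here.

```python
PROTON = 1.007276  # mass of a proton in Da

# Assumed exact m/z values for the "204" and "366" oxonium ions
# (HexNAc+ and Hex-HexNAc+); the slide gives only nominal masses.
OXONIUM_MZ = (204.0867, 366.1395)


def has_oxonium(peaks, tol=0.05):
    """Return True if any (mz, intensity) peak lies within tol of an oxonium ion."""
    return any(abs(mz - ox) <= tol for mz, _ in peaks for ox in OXONIUM_MZ)


def neutral_mass(mz, charge):
    """Convert a protonated precursor m/z and charge state to a neutral mass."""
    return (mz - PROTON) * charge


def glycan_delta(precursor_mz, charge, peptide_mass):
    """Mass left over after subtracting a guessed peptide from the precursor.

    Under the single-glycan-per-peptide assumption this is the summed
    residue mass of the attached glycan.
    """
    return neutral_mass(precursor_mz, charge) - peptide_mass
```

A spectrum that passes `has_oxonium` and contains an intact-peptide fragment would then give a candidate glycan mass via `glycan_delta`, which can be looked up in a glycan database.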

Glycopeptide Search Strategy
Glycan-Peptide to Spectrum Matches:
- Multi-Peptide
- Multi-Glycan Mass (Single Peptide)
- Single Glycan Mass
- Single Glycan (Topology)

Compromises
- Single protein / simple protein mixture
  - Few peptides to distinguish
- Single N-glycan per peptide
  - Subtraction from precursor
- Digest may not resolve site
  - Need peptide/glycan fragments to distinguish
- Isobaric peptide-glycan pairs are not resolved
  - Need peptide/glycan fragments to distinguish

Glycan Databases
- Link putative glycan masses to N-linked glycan structures (and organism, etc.):
  - Human N-linked GlycomeDB
  - Cartoonist structure enumeration
  - CFG Mammalian Array (v5.0)
  - In-house database (Oxford notation)
- Database(s) provide "biased" search space:
  - Coverage vs. "Reasonableness"
  - Trade-off: Time, Specificity, Biology
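Linking a putative glycan mass to database structures amounts to a tolerance lookup over a mass-sorted index. The sketch below shows one way to do that with the standard-library `bisect` module. The class name, the tolerance, and the example entries are hypothetical and are not taken from GlycomeDB, Cartoonist, or the CFG array.

```python
import bisect


class GlycanMassIndex:
    """Hypothetical mass-sorted index of (mass, structure_name) entries."""

    def __init__(self, entries):
        # Sort once so every lookup is a pair of binary searches.
        self._entries = sorted(entries)
        self._masses = [mass for mass, _ in self._entries]

    def lookup(self, mass, tol=0.05):
        """Return names of all structures whose mass is within tol of mass."""
        lo = bisect.bisect_left(self._masses, mass - tol)
        hi = bisect.bisect_right(self._masses, mass + tol)
        return [name for _, name in self._entries[lo:hi]]


# Illustrative entries only: residue-mass sums with made-up labels.
index = GlycanMassIndex([
    (1216.4228, "HexNAc2Hex5 (topology A)"),
    (1216.4228, "HexNAc2Hex5 (topology B)"),
    (1378.4756, "HexNAc2Hex6"),
])
```

A single mass can return several topologies, which is why a mass match alone does not resolve structure; a larger database increases coverage but also the number of candidates per mass.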

Haptoglobin standard
Haptoglobin (HPT_HUMAN):
- NLFLNHSE*NATAK
- MVSHHNLTTGATLINE
- VVLHPNYSQVDIGLIK
Legend: N-glycosylation motif (NX/ST); * marks the site of GluC cleavage.
Pompach et al. Journal of Proteome Research 11.3 (2012): 1728–1740.
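The N-glycosylation motif can be located in a peptide with a short regular expression. The sketch below is a minimal illustration: the function name is hypothetical, and excluding proline at the X position follows the usual sequon convention rather than anything stated on the slide.

```python
import re

# N, then any residue except proline, then S or T. The lookahead lets
# overlapping motifs be reported; positions are 0-based indices of the N.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_sites(peptide):
    """Return 0-based positions of N in every N-X-S/T motif (X != P)."""
    return [m.start() for m in SEQUON.finditer(peptide)]


# The three haptoglobin peptides from the slide, with the GluC marker removed.
peptides = ["NLFLNHSENATAK", "MVSHHNLTTGATLINE", "VVLHPNYSQVDIGLIK"]
sites = {p: sequon_sites(p) for p in peptides}
```

On these three peptides the search finds four motif positions in total, consistent with the four haptoglobin sites mentioned in the results.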

Haptoglobin standard
- 11 HILIC fractions enriched for glycopeptides
- 11 x LC-MS/MS acquisitions (≥ 15k spectra)
- 2887/3288 MS/MS spectra have oxonium ion(s)
- 317 have "intact-peptide" fragment ions
- 263 spectra matched to peptide-glycan pairs
  - 52% matched single-glycan
  - 8% matched multi-peptide
- 27 distinct (mass) glycans on 11 peptides
- Glycans identified on all 4 haptoglobin sites

Algorithms & Infrastructure
- Glycan databases indexed by composition, mass, N-linked, and motif/type
  - Formats: IUPAC, Linear Code, GlycoCT_condensed
  - Implemented: GlycomeDB, Cartoonist, CFG Array
- Monosaccharide decomposition of glycan mass
  - Böcker et al. Efficient mass decomposition (2005)
- χ² goodness-of-fit test for precursor cluster
  - Theoretical isotope cluster from composition
  - ICScore based on χ²-test p-value
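To make the monosaccharide decomposition step concrete, here is a small sketch that enumerates compositions whose summed residue masses match a target glycan mass. It uses plain recursive enumeration, not the efficient algorithm of Böcker et al. cited on the slide. The four-residue alphabet, the monoisotopic residue masses, and the tolerance are assumptions chosen for illustration.

```python
# Assumed monoisotopic residue masses in Da (monosaccharide minus water).
RESIDUE_MASS = {
    "Hex": 162.0528,
    "HexNAc": 203.0794,
    "Fuc": 146.0579,
    "NeuAc": 291.0954,
}


def decompose(target, tol=0.02, names=tuple(RESIDUE_MASS)):
    """Enumerate residue counts whose total mass is within tol of target.

    Brute-force recursion over one residue type at a time; adequate for
    N-glycan-sized masses, but not the cited efficient method.
    """
    results = []

    def recurse(i, remaining, counts):
        if i == len(names):
            if abs(remaining) <= tol:
                results.append(dict(zip(names, counts)))
            return
        mass = RESIDUE_MASS[names[i]]
        n = 0
        # Try every count of this residue that does not overshoot the target.
        while n * mass <= remaining + tol:
            recurse(i + 1, remaining - n * mass, counts + [n])
            n += 1

    recurse(0, target, [])
    return results


# Example: residue-mass sum for HexNAc2Hex5 (2 * 203.0794 + 5 * 162.0528).
compositions = decompose(1216.4228)
```

Each returned composition could then be used to build a theoretical isotope cluster for comparison with the observed precursor cluster.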

False Discovery Rate (FDR)
- How confident can we be in these mass matches?
- FDR: 3.9% [~10 / 263 spectra]
- Estimate the number of errors by searching with non-N-linked motif (decoy) peptides too.
- Count spectra matched to decoy peptide-glycan pairs.
- Rescale decoy counts to balance the number of motif and non-motif peptides.
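The decoy-based estimate described above can be written as a one-line formula: scale the decoy match count by the ratio of motif to non-motif peptides, then divide by the number of target matches. The sketch below is an assumption-laden illustration; the function and parameter names are hypothetical, and the numbers in the example are invented rather than taken from the haptoglobin analysis.

```python
def estimate_fdr(target_hits, decoy_hits, n_motif_peptides, n_decoy_peptides):
    """Estimate FDR from decoy (non-motif) peptide matches.

    Decoy counts are rescaled by the ratio of motif to non-motif peptides
    so that both search spaces are weighted equally before comparison.
    """
    if target_hits == 0:
        return 0.0
    if n_decoy_peptides == 0:
        raise ValueError("at least one decoy peptide is required")
    scale = n_motif_peptides / n_decoy_peptides
    expected_errors = decoy_hits * scale
    return expected_errors / target_hits


# Invented numbers: 200 target matches, 50 decoy matches,
# 10 motif peptides searched against 100 non-motif peptides.
fdr = estimate_fdr(200, 50, 10, 100)
```

With these invented inputs the 50 decoy matches rescale to 5 expected errors among 200 target matches, an estimated FDR of 2.5%.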

Tuning the filters…
- Adjusting thresholds and parameters to
  - increase specificity (lower FDR, fewer spectra), or
  - increase sensitivity (more spectra, higher FDR)

Tuning the filters…
- Oxonium ions:
  - Number & intensity
  - Match tolerance
- "Intact-peptide" fragments:
  - Number & intensity
  - Match tolerance
- Glycan composition:
  - ICScore
  - Constrain search space
  - Match tolerance
- Glycan database:
  - Constrain search space
  - Match tolerance
- Precursor ion:
  - Non-monoisotopic selection
  - Sodium adducts
  - Charge state
- Peptide search space:
  - Semi-specific peptides
  - Non-specific peptides
  - Peptide MW range
  - Variable modifications

GlycoPeptideSearch (GPS) 1.3
- Freely available implementation
  - Windows, Linux
- Reads open-format spectra (mzXML, MGF)
- Pre-indexed glycan databases
  - Human & Mammalian GlycomeDB
  - Mammalian CFG Array (v5.0)
  - User-Named (Oxford notation)
- In silico digest and N-linked motif identification
- Automatic target/decoy analysis for FDR

Where to from here?
- Demonstrate utility on new instrument platforms, proteins, samples
- Develop a scoring model for fragments
  - Re-implement Cartoonist demerits
- Exploit relationships between MS² spectra, MSⁿ spectra
- Explore application to O-glycopeptides, N-glycans, O-glycans

Acknowledgements
- Edwards Lab (Georgetown)
  - Kevin Brown Chandler [NSF] (Poster 32)
- Goldman Lab (Georgetown)
  - Radoslav Goldman (Poster 6)
  - Petr Pompach
  - Miloslav Sanda (Poster 23)
- Marshall Bern (Xerox PARC)
  - Cartoonist, Peptoonist
- Rene Ranzinger (CCRC)
  - GlycomeDB