Automatic annotation of N-glycans in MALDI-TOF spectra for rapid glycan profiling and comparison Chuan-Yih, Yu 2010.05.14 Capstone Presentation Advisor:

Presentation transcript:

Automatic annotation of N-glycans in MALDI-TOF spectra for rapid glycan profiling and comparison Chuan-Yih, Yu Capstone Presentation Advisor: Prof. Haixu Tang Indiana University Bloomington School of Informatics and Computing

Outline
Background
Problem definition and goals
Implementation of Multi N-Glycan
Results
Future work

Background
Post-translational modification (PTM)
  – Enzyme-catalyzed modification of a protein after it has been synthesized
  – Acetylation, glycosylation, methylation, phosphorylation, prenylation, etc.
>50% of all eukaryotic proteins are glycosylated 1 [Apweiler, et al.]
1. Apweiler, R., H. Hermjakob, and N. Sharon, On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database. Biochim Biophys Acta, (1): p

Glycosylation
Attachment of a glycan (sugar) to the peptide chain
N-linked glycosylation
  – Linked through the nitrogen of an Asn side chain
  – Asn-X-Ser (NXS) or Asn-X-Thr (NXT), where X can be any amino acid except Pro (glycosylation sequon)
  – Core structure: 2 GlcNAc + 3 Man
  – Glycosylation occurs while the protein is folding
O-linked glycosylation
  – Many different core structures
  – Linked to serine or threonine
  – Glycosylation occurs after folding
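The N-linked sequon rule on this slide (Asn-X-Ser/Thr, with X not Pro) can be sketched as a simple pattern scan. This is a minimal illustration and not part of Multi N-Glycan; the function name `find_sequons` and the example peptide are hypothetical.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr.
# The zero-width lookahead lets overlapping sequons (e.g. "NNTS") all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_sequons(sequence):
    """Return 0-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() for m in SEQUON.finditer(sequence.upper())]

# Hypothetical peptide, for illustration only.
peptide = "MKNGTAANPSLLNNTS"
print(find_sequons(peptide))
```

A sequon only marks a potential site; whether the site is actually glycosylated has to be established experimentally.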

N-linked glycosylation
Tree structure
Monosaccharides: building blocks of the polysaccharide chain
Diverse linkage: at most four branches
Three types of N-linked glycan tree
  – High mannose
  – Complex
  – Hybrid
Graphs: Varki, A., Essentials of glycobiology. 2nd ed. 2009, Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press. xxix, 784 p

Name              Molecular formula
Mannose (Man)     C6H12O6
Galactose (Gal)   C6H12O6
Fucose (Fuc)      C6H12O5
GlcNAc            C8H15NO6
NeuNAc            C11H19NO9
NeuNGc            C11H19NO10
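The monosaccharide formulas in the table are enough to estimate the monoisotopic mass of a glycan composition. The sketch below assumes a native (underivatized), neutral glycan with a free reducing end, so each glycosidic bond costs one water molecule; derivatization and adduct ions, which the slides do not specify, are ignored. The names `glycan_mass` and `MONOSACCHARIDES` are hypothetical.

```python
# Monoisotopic masses (Da) of the most abundant isotope of each element.
ATOM_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Elemental composition of each free monosaccharide, taken from the table.
MONOSACCHARIDES = {
    "Man":    {"C": 6,  "H": 12, "O": 6},
    "Gal":    {"C": 6,  "H": 12, "O": 6},
    "Fuc":    {"C": 6,  "H": 12, "O": 5},
    "GlcNAc": {"C": 8,  "H": 15, "N": 1, "O": 6},
    "NeuNAc": {"C": 11, "H": 19, "N": 1, "O": 9},
    "NeuNGc": {"C": 11, "H": 19, "N": 1, "O": 10},
}

WATER = 2 * ATOM_MASS["H"] + ATOM_MASS["O"]

def formula_mass(formula):
    """Monoisotopic mass of an elemental formula given as {element: count}."""
    return sum(ATOM_MASS[element] * count for element, count in formula.items())

def glycan_mass(composition):
    """Mass of a glycan given as {monosaccharide: count}.

    Assumption: n residues are joined by n - 1 glycosidic bonds,
    and each bond releases one water molecule.
    """
    n_residues = sum(composition.values())
    if n_residues == 0:
        return 0.0
    total = sum(count * formula_mass(MONOSACCHARIDES[name])
                for name, count in composition.items())
    return total - (n_residues - 1) * WATER

# High-mannose example: 2 GlcNAc + 5 Man.
print(round(glycan_mass({"GlcNAc": 2, "Man": 5}), 4))
```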

Analytical strategies for analyzing glycans

Mass Spectrometry
A weighing scale for molecules
High throughput, high accuracy, high sensitivity
Ion source
  – Electrospray ionization (ESI)
  – Matrix-assisted laser desorption/ionization (MALDI)
Mass analyzer
  – Time of flight (TOF)
  – Quadrupole
  – Fourier transform mass spectrometry (FTMS)
Detector
  – Records the charge induced or the current produced

Mass Spectrometry Spectrum
Figure label: Isotopic envelope

N-Glycan Profiling
Given an MS spectrum, determine which glycans are present in the spectrum (annotation) and how abundant each one is (quantification)

Problem Definition
Glycan isotope envelope
  – Isotopes occur naturally: atoms of the same element with different numbers of neutrons
Graphs: Isotope Pattern Calculator v
Figure labels (values not preserved): GlcNAc + 9 Man; GlcNAc + 3 Man; axes Mass and %
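To make the idea of an isotope envelope concrete, the sketch below computes one by repeatedly convolving per-element isotope distributions. This is plain polynomial convolution on nominal mass shifts, not the Mercury algorithm cited later in the slides. The abundance values are approximate natural abundances, and the formula C70H118N2O56 is an assumption: it is what 2 GlcNAc + 9 Man works out to for a native glycan with a free reducing end.

```python
# Approximate natural isotope abundances, indexed by extra neutrons (+0, +1, +2).
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
}

def convolve(a, b, max_len):
    """Convolve two distributions, keeping only the first max_len peaks."""
    out = [0.0] * min(len(a) + len(b) - 1, max_len)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < len(out):
                out[i + j] += x * y
    return out

def isotope_envelope(formula, max_peaks=8):
    """Relative abundance of the M, M+1, M+2, ... peaks for {element: count}."""
    envelope = [1.0]
    for element, count in formula.items():
        base = ISOTOPES[element]
        power = [1.0]
        n = count
        # Binary exponentiation: convolve the element distribution with itself n times.
        while n:
            if n & 1:
                power = convolve(power, base, max_peaks)
            base = convolve(base, base, max_peaks)
            n >>= 1
        envelope = convolve(envelope, power, max_peaks)
    return envelope

# Assumed formula for native 2 GlcNAc + 9 Man.
man9 = {"C": 70, "H": 118, "N": 2, "O": 56}
for shift, abundance in enumerate(isotope_envelope(man9)):
    print(f"M+{shift}: {abundance:.4f}")
```

Truncating to `max_peaks` does not change the retained peaks, because a peak at shift k only depends on input peaks at shifts no larger than k.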

Problem Definition
Figure labels (mass values not preserved): 7 GlcNAc + 3 Man; GlcNAc + 9 Man; ? Unknown; 2 GlcNAc + 9 Man

Goals
Annotation of N-glycans
  – Decompose observed isotopic envelopes into non-overlapping and overlapping isotopic envelopes of glycans
  – Quantify the relative abundance of each glycan
Glycan profile comparison
  – Report glycans that show significantly different abundance between groups of samples
  – Discover glycan biomarkers

Glycans Annotation
For each glycan (i.e. monosaccharide composition)
  – 412 different glycans [Krambeck, et al.] 1
  – Generate a theoretical isotope envelope
  – Calculate the correlation between the theoretical and observed isotope envelopes for each of the following scenarios
      1. Glycan
      2. Glycan + Glycan, linear fitting applied
      3. Glycan + Unknown, linear fitting applied
  – Mercury algorithm 2: generates the unknown isotope envelopes
1. Krambeck, F.J. and M.J. Betenbaugh, A mathematical model of N-linked glycosylation. Biotechnol Bioeng, (6): p
2. Rockwood, A., S. Van Orden, and R. Smith, Rapid Calculation of Isotope Distributions. Analytical Chemistry, : p
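The correlation step for the single-glycan scenario can be sketched as follows. The slides say only "correlation"; Pearson correlation is assumed here, and the function name and the envelope values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length intensity vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat vector carries no shape information
    return cov / math.sqrt(var_x * var_y)

# Hypothetical theoretical envelope and an observed envelope on the same peak grid.
theoretical = [0.40, 0.32, 0.17, 0.07, 0.03]
observed = [610.0, 470.0, 270.0, 120.0, 40.0]
print(round(pearson(theoretical, observed), 3))
```

Pearson correlation is insensitive to overall intensity scale, so the observed envelope does not need to be normalized before scoring.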

Three scenarios
Figure labels: Experimental isotope envelope; Theoretical isotope envelope; Glycan; α, β; Unknown; Correlation Score

Glycan Profiles
Decompose the abundances of two glycans with overlapping isotopic envelopes
Figure labels: α, β; Glycans; Experimental isotope envelope
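One way to decompose two overlapping envelopes is an ordinary least-squares fit of observed ≈ α·A + β·B, where A and B are the two theoretical envelopes placed on a common peak grid. The slides mention "linear fitting" without further detail, so the closed-form solver below is an assumption about how that could be done; the function name and envelope values are hypothetical.

```python
def fit_two_components(observed, env_a, env_b):
    """Least-squares alpha, beta such that observed ~ alpha*env_a + beta*env_b.

    Solves the 2x2 normal equations directly. All three vectors must be
    aligned on the same peak grid (same length).
    """
    saa = sum(a * a for a in env_a)
    sbb = sum(b * b for b in env_b)
    sab = sum(a * b for a, b in zip(env_a, env_b))
    sao = sum(a * o for a, o in zip(env_a, observed))
    sbo = sum(b * o for b, o in zip(env_b, observed))
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("envelopes are collinear; cannot separate them")
    alpha = (sao * sbb - sbo * sab) / det
    beta = (sbo * saa - sao * sab) / det
    return alpha, beta

# Hypothetical envelopes: glycan B starts two peaks after glycan A.
env_a = [0.40, 0.32, 0.17, 0.07, 0.03, 0.01, 0.00]
env_b = [0.00, 0.00, 0.40, 0.32, 0.17, 0.07, 0.03]
observed = [3.0 * a + 2.0 * b for a, b in zip(env_a, env_b)]
print(fit_two_components(observed, env_a, env_b))
```

A production version would probably constrain α and β to be non-negative; this sketch does not.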

Glycan Profile Comparison
Comparison of glycan abundances in multiple samples
Biomarker discovery
  – Given glycan spectra from multiple samples under different conditions (e.g. disease vs. healthy)
  – Goal: to find glycans with distinct abundances between samples
Z. Kyselova, Y. Mechref, M. M. Al Bataineh, L. E. Dobrolecki, R. J. Hickey, J. Vinson, C. J. Sweeney, and M. V. Novotny. Alterations in the serum glycome due to metastatic prostate cancer. Journal of Proteome Research, 6: ,
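As a simple stand-in for "distinct abundances between samples", the sketch below computes Welch's t statistic for one glycan across two groups. This is a generic two-sample statistic chosen for illustration, not the comparison method used by Multi N-Glycan; the abundance values are made up.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Welch's t statistic for two independent samples with unequal variances."""
    mean_a = statistics.mean(group_a)
    mean_b = statistics.mean(group_b)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    return (mean_a - mean_b) / se

# Made-up relative abundances of one glycan in healthy and disease samples.
healthy = [1.0, 1.1, 0.9, 1.0]
disease = [2.0, 2.1, 1.9, 2.0]
print(round(welch_t(healthy, disease), 2))
```

A large absolute value suggests the glycan's abundance differs between the groups; a p-value would additionally require the Welch-Satterthwaite degrees of freedom.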

Approach
Health spectra (H1, H2, H3 … Hk)
Disease spectra (D1, D2, D3 … Dk)
Remove the least significant component.
Repeat until all the scores are above the threshold.
1. Hastie, T., et al., 'Gene shaving' as a method for identifying distinct sets of genes with similar expression patterns. Genome Biol, (2): p. RESEARCH
% identical with a cutoff at
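The "remove the least significant component, repeat" loop can be sketched generically. The slides do not define the score, so the scoring function here is purely hypothetical: the absolute Pearson correlation of each glycan's abundance profile with the mean profile of the glycans still in the set. Gene shaving as described by Hastie et al. uses principal components instead; this toy score only illustrates the loop structure.

```python
import math

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0

def shave(items, scores_for, threshold):
    """Drop the lowest-scoring item until every remaining score >= threshold."""
    remaining = list(items)
    while remaining:
        scores = scores_for(remaining)  # recomputed each round on the current set
        worst = min(remaining, key=lambda item: scores[item])
        if scores[worst] >= threshold:
            break
        remaining.remove(worst)
    return remaining

# Made-up abundance profiles (one value per sample) for three hypothetical glycans.
profiles = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.0, 6.0, 8.5],
    "G3": [4.0, 1.0, 3.0, 2.0],
}

def toy_scores(names):
    """Hypothetical score: |correlation| with the mean profile of the current set."""
    n_samples = len(next(iter(profiles.values())))
    mean_profile = [sum(profiles[n][i] for n in names) / len(names)
                    for i in range(n_samples)]
    return {n: abs(pearson(profiles[n], mean_profile)) for n in names}

print(shave(profiles, toy_scores, threshold=0.8))
```

Because the scores are recomputed after every removal, an item that looked weak in the full set can become acceptable once an outlier has been shaved off.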

Implementation of Multi N-Glycan
Software requirements
  – .NET Framework 2.0 using C#
  – C++ runtime
  – R for PCA analysis
  – Thermo Scientific Xcalibur
Input
  – Spectrum file formats: plain text (peak list), mzXML 1, RAW file (Thermo Scientific raw file)
  – N-glycan list: CSV file (user-defined); the default list is defined by [Krambeck, et al.]
Output
  – List of glycans with scores
1. Pedrioli, P., et al., A Common Open Representation of Mass Spectrometry Data and its Application in a Proteomics Research Environment. Nature Biotechnology, (11): p

Software Interface

Software features
Signal preprocessing provided
  – Background subtraction
  – Smoothing and peak picking
  – Mass accuracy tolerance
Flexible parameters to match the actual experiment
Isotope envelope generator
Content-rich output, supporting multiple formats
  – csv, text, html
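Mass accuracy tolerance is commonly expressed in parts per million (ppm). The slides do not say which unit Multi N-Glycan uses, so the ppm convention, the function name, and the m/z values below are assumptions for illustration.

```python
def within_tolerance(observed_mz, theoretical_mz, ppm):
    """True if the observed m/z lies within +/- ppm of the theoretical m/z."""
    error_ppm = abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6
    return error_ppm <= ppm

# Hypothetical peak about 8 ppm away from a theoretical value.
print(within_tolerance(1882.6600, 1882.6447, ppm=10))
print(within_tolerance(1882.6600, 1882.6447, ppm=5))
```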

Software screenshot: HTML result export

Software screenshot

Result
Data set [Zhiqun T., et al.] 1
  – Liver cancer: 73 individuals
  – Healthy: 78 individuals
412 N-glycans are used
Parameters
  – Glycans with a correlation score < 0.5 are discarded
  – A glycan must be present in >30% of all samples
1. Zhiqun T., et al., Identification of N-Glycan Serum Markers Associated with Hepatocellular Carcinoma from Mass Spectrometry Data. J Proteome Res, 2009
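The two filtering parameters on this slide can be sketched as a small function. The thresholds (0.5 and 30%) come from the slide; the data layout (one dictionary of glycan scores per sample), the function name, and the glycan names are hypothetical.

```python
def keep_glycans(scores_by_sample, min_score=0.5, min_presence=0.30):
    """Return glycans whose score passes min_score in more than min_presence of samples.

    scores_by_sample: list of {glycan_name: correlation_score}, one dict per sample.
    """
    n_samples = len(scores_by_sample)
    counts = {}
    for sample in scores_by_sample:
        for glycan, score in sample.items():
            if score >= min_score:  # scores below the cutoff are discarded
                counts[glycan] = counts.get(glycan, 0) + 1
    return sorted(g for g, c in counts.items() if c / n_samples > min_presence)

# Made-up scores for three hypothetical glycans in four samples.
samples = [
    {"glycan_A": 0.90, "glycan_B": 0.60, "glycan_C": 0.40},
    {"glycan_A": 0.80, "glycan_B": 0.30, "glycan_C": 0.45},
    {"glycan_A": 0.70, "glycan_C": 0.20},
    {"glycan_A": 0.95, "glycan_B": 0.20},
]
print(keep_glycans(samples))
```

Here glycan_B passes the score cutoff in only one of four samples (25%), so it is dropped along with glycan_C.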

Result Derived from the Paper
Figure labels: Filtered out; Low correlation score; Overlap with 2192; Identified
Zhiqun T., et al., Identification of N-Glycan Serum Markers Associated with Hepatocellular Carcinoma from Mass Spectrometry Data. J Proteome Res, 2009

Result Derived from Multi N-Glycan
Figure labels: Confirmed result; Distinct glycan

Future Work
Test on more clinical samples
Extend to O-glycan profiling
Apply de novo glycan sequencing to the reported glycans (ongoing)
Connect reported glycans to the glycan research literature

Acknowledgements
Advisor: Prof. Haixu Tang
Co-worker: Anoop Mayampurath
Collaborator: Yehia Mechref, Department of Chemistry
COL Lab members
This work will be presented on May 26th, 2010, at the 58th ASMS Conference, Salt Lake City, Utah, and will be submitted to Bioinformatics.
This work is funded by NCI/NIH grant number 1 U01 A

Thank You