A Multi-PCA Approach to Glycan Biomarker Discovery using Mass Spectrometry Profile Data
Anoop Mayampurath, Chuan-Yih Yu
Info-690 (Glycoinformatics) Final Project Presentation

Background
[1] Kyselova et al., "Alterations in the Serum Glycome Due to Metastatic Prostate Cancer," Journal of Proteome Research, 2007, 6:
[2] Tang et al., "Identification of N-Glycan Serum Markers Associated with Hepatocellular Carcinoma from Mass Spectrometry Data," Journal of Proteome Research, 2009, Article ASAP
[3] Ressom et al., "Analysis of MALDI-TOF Mass Spectrometry Data for Discovery of Peptide and Glycan Biomarkers of Hepatocellular Carcinoma," Journal of Proteome Research, 2008, 7:603

Objective
- Given a set of N mass spectra (disease and healthy), develop an algorithm that identifies "significant" spectra and glycan peaks
  - From the significant glycan peaks:
    - Nature of regulation between disease and healthy
    - Study of effects such as fucosylation and linkage
  - From the significant spectra:
    - A smaller set of m << N spectra that helps in analysis
    - Glycan annotation
    - Check for overlapping glycans
- What is meant by "significant"?
  - Elements that exhibit coherent patterns and large variation between disease and healthy
- Datasets
  - 151 MALDI-TOF mass spectra: 73 cancer, 78 normal

Data Processing - MultiNGlycan

Details
- Background subtraction
- Peak picking
- Identification of common glycans across all 151 spectra
- Filtering with a fit-coefficient cutoff of 0.5: a glycan is retained if at least 30% of the spectra have a fit coefficient greater than 0.5 for it
An N × p matrix X is obtained (N: number of glycans, p: number of spectra)
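The filtering rule can be sketched in a few lines. This is a minimal NumPy illustration, not MultiNGlycan's implementation; the function name `filter_glycans`, the array layout (glycans as rows and spectra as columns, matching X), and the reading of the rule as "at least 30% of the spectra" are assumptions.

```python
import numpy as np


def filter_glycans(intensity, fit_coef, coef_cutoff=0.5, min_fraction=0.30):
    """Keep glycans whose fit coefficient exceeds coef_cutoff in at least
    min_fraction of the spectra (hypothetical helper, for illustration).

    intensity, fit_coef: arrays of shape (n_glycans, n_spectra).
    Returns the filtered N x p matrix X and the indices of the kept glycans.
    """
    intensity = np.asarray(intensity, dtype=float)
    fit_coef = np.asarray(fit_coef, dtype=float)
    if intensity.shape != fit_coef.shape:
        raise ValueError("intensity and fit_coef must have the same shape")
    # Fraction of spectra in which each glycan passes the cutoff
    frac_pass = (fit_coef > coef_cutoff).mean(axis=1)
    keep = np.flatnonzero(frac_pass >= min_fraction)
    return intensity[keep], keep
```

Treating the cutoff as a per-glycan fraction of spectra, rather than requiring a good fit everywhere, keeps glycans that are only detected reliably in a subset of samples, which is where disease-specific glycans would appear.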

Multi-PCA Algorithm
1. Perform PCA
2. Compute the inner product of each glycan with the leading principal component
3. Sort glycans by inner product (which measures correlation)
4. Shave off the 10% of glycans with the lowest inner-product score
5. Repeat
[4] Hastie et al., "'Gene shaving' as a method for identifying distinct sets of genes with similar expression patterns," Genome Biology, 2000, 1(2):1-21
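A minimal sketch of the shaving loop, assuming row-centered data and taking the leading principal component as the first right singular vector from NumPy's SVD, in the spirit of gene shaving [4]. The name `shave_rows`, the rounding of the 10% drop count, and stopping at a fixed target size are illustrative assumptions, not the project's code.

```python
import numpy as np


def shave_rows(X, target_size, shave_frac=0.10):
    """Gene-shaving-style loop over the rows of X (sketch).

    Repeatedly computes the leading principal component of the current rows,
    scores each row by the absolute inner product with it, and drops the
    lowest-scoring fraction until target_size rows remain.
    Returns the indices (into X) of the surviving rows.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=1, keepdims=True)  # center each row (glycan)
    idx = np.arange(Xc.shape[0])
    while len(idx) > target_size:
        sub = Xc[idx]
        # First right singular vector of the row-centered submatrix,
        # used here as the leading principal component
        _, _, vt = np.linalg.svd(sub, full_matrices=False)
        score = np.abs(sub @ vt[0])  # |inner product| with the leading PC
        n_drop = max(1, int(round(shave_frac * len(idx))))
        n_drop = min(n_drop, len(idx) - target_size)  # never shave past the target
        keep = np.argsort(score)[n_drop:]  # discard the lowest scores
        idx = np.sort(idx[keep])
    return idx
```

The absolute value is used because a glycan that is strongly anti-correlated with the leading component is as coherent as a positively correlated one. On a matrix whose first 10 rows share a strong common profile and whose remaining rows are noise, `shave_rows(X, target_size=10)` is expected to recover those 10 rows.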

Multi-PCA Algorithm
[Diagram: matrix X → sort by inner product → shave off 10% of glycans]
- The algorithm was iterated until 10 glycans remained. These glycans are expected to show coherent intensity changes while having high variance between cancer and non-cancer spectra.
- We also switched dimensions to shave off spectra; the algorithm was iterated until 6 spectra remained.
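Switching dimensions amounts to running the same loop on the transpose of X, so that spectra become the rows being shaved. The hypothetical sketch below (`shave` and `multi_pca` are illustrative names) repeats a compact version of the loop so it stands alone, then shaves glycans down to 10 rows of X and spectra down to 6 rows of X^T, mirroring the stopping sizes reported above.

```python
import numpy as np


def shave(M, target, frac=0.10):
    """Compact gene-shaving-style loop on the rows of M; returns kept row indices."""
    M = np.asarray(M, dtype=float)
    M = M - M.mean(axis=1, keepdims=True)  # center each row
    idx = np.arange(M.shape[0])
    while len(idx) > target:
        _, _, vt = np.linalg.svd(M[idx], full_matrices=False)
        score = np.abs(M[idx] @ vt[0])  # |inner product| with the leading PC
        n_drop = min(max(1, int(round(frac * len(idx)))), len(idx) - target)
        idx = np.sort(idx[np.argsort(score)[n_drop:]])
    return idx


def multi_pca(X, n_glycans=10, n_spectra=6):
    """X is N x p (glycans x spectra): shave glycans on X, spectra on X.T."""
    X = np.asarray(X, dtype=float)
    glycans = shave(X, n_glycans)
    spectra = shave(X.T, n_spectra)
    return glycans, spectra
```

Because each direction is shaved independently, the selected glycans and the selected spectra need not define a single coherent block of X; they are two separate summaries of the data.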

Results
[Figure: total intensity vs. mass value; annotations: "Filtered out", "Not present in original composition file"]
[Figure: total intensity vs. mass value]

Significant Spectra
No overlapping glycans were found.

Future Directions
- Fragmentation of glycans to study the effect of linkage among glycans
- Glycan microarray
- More detail on overlapping glycans (replace the single score with a combined score)
- Orthogonalize the data to reveal other patterns
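One way to orthogonalize the data, following the approach described in the gene-shaving paper [4], is to remove from every row of X its projection onto the average profile of the cluster already found, so that the next round of shaving is pushed toward a different pattern. This is a hedged NumPy sketch of a future step, not part of the project's results; `orthogonalize_rows` is a hypothetical name.

```python
import numpy as np


def orthogonalize_rows(X, cluster_idx):
    """Project out the average profile of the rows in cluster_idx from every
    row of the row-centered X (sketch). Returns the orthogonalized matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=1, keepdims=True)  # center each row
    xbar = Xc[cluster_idx].mean(axis=0)      # average profile of the cluster
    denom = xbar @ xbar
    if denom == 0.0:
        return Xc  # nothing to project out
    coef = (Xc @ xbar) / denom               # projection coefficient per row
    return Xc - np.outer(coef, xbar)
```

Running the shaving loop again on the returned matrix would then look for a second group of glycans whose pattern is uncorrelated with the first cluster's average.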

Acknowledgements
Prof. Haixu Tang, School of Informatics & Computing
Prof. Yehia Mechref, Dept. of Chemistry