Institute on Research and Statistics, Sacramento 04/08/04


Discrimination Models and Variance Stabilizing Transformations of Metabolomic NMR Data Institute on Research and Statistics, Sacramento 04/08/04 Parul Vora Purohit

Biodata and ‘omics Genome Project. Genomics - study of genes. Proteomics - study of proteins. Metabolomics - study of metabolites. * Also cellomics, CHOmics, chromonomics, etc. Analytical techniques: Microarray, Spectroscopy, Mass Spectroscopy, NMR Spectroscopy *

NMR Spectroscopy Intense, homogeneous magnetic field. High-powered RF transmitter capable of delivering short pulses; pulses at ~500 MHz stimulate 1H nuclear spin transitions. Probe that holds the coils used to excite and detect the signal. Plot of signal vs. shift in frequency from the original pulse, measured in ppm (ratio from the original signal). Courtesy ~ Joseph Medendorp / Public Information / University of Kentucky

NMR Data Allows detection of compounds with H content. Shift characterizes the chemicals (metabolites). Examples: 2.14 ppm – glutamine – γ CH2 group; 2.27 ppm – valine – β CH group; 6.91 ppm – tyrosine – C3,5H ring. ~65,000 points (variables) per sample

Questions Classification ~ Can we distinguish sick organisms from the healthy ones? Identification ~ Which metabolites play a role in the disease (biomarker)? DIFFERENCES IN THE DETAILS!

Abalone Data A set of 18 abalone: 8 healthy, 5 stunted, 5 sick. Tissue from muscle. Questions: Can we classify the abalone accurately? Can we detect any metabolites that are markers?

Problems / Solutions Multivariate techniques. Matrix of 65,000 (variables) x 18 (samples): too many variables compared to the number of samples. Dimension reduction by binning. Classification and metabolite marker identification using PCA and cluster analysis. Methods assume that the data are normally distributed with a constant variance. Generalized Log Transformation improves results!

NMR Data Pre-Processing Background subtraction. TMSP peak (standard at 0 ppm) removed. Water peak removal (4.72-4.96 ppm removed). Normalization: integrated intensity normalized to 1.0 to remove the effects of systematic intensity changes between abalone. Binning / bin size
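
The water-peak removal and normalization steps on this slide can be sketched roughly as follows. This is a minimal illustration, not the author's code: the function name `preprocess` and the toy spectrum are hypothetical, "integrated intensity" is approximated by a plain sum (which assumes evenly spaced points), and background subtraction and TMSP removal are not shown.

```python
def preprocess(ppm, intensity, water=(4.72, 4.96)):
    """Drop points inside the water region, then scale the rest to sum to 1.0.

    Assumption: the integrated intensity is approximated by a plain sum,
    which is only reasonable for evenly spaced points.
    """
    kept = [(p, v) for p, v in zip(ppm, intensity)
            if not (water[0] <= p <= water[1])]
    total = sum(v for _, v in kept)
    if total == 0:
        raise ValueError("zero total intensity after region removal")
    return [p for p, _ in kept], [v / total for _, v in kept]


# Toy spectrum (made-up numbers): the point at 4.8 ppm lies in the water region.
ppm = [1.0, 3.2, 4.8, 5.4]
spectrum = [2.0, 6.0, 50.0, 2.0]
kept_ppm, normalized = preprocess(ppm, spectrum)
```

Normalizing after the water region is removed keeps the large, variable water signal from dominating the scale factor; whether the original analysis used that order is an assumption.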

Binned Spectrum Bin Size Range = 0.00125 ppm – 0.7 ppm. Bin Size = 0.04 ppm, 239 Bins. Intensity of Bin = integrated intensity of all points in the bin. Region of interest restricted to 0.2 ppm – 10.0 ppm
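
A rough sketch of fixed-width binning over the stated region of interest is given below. The function name `bin_spectrum` and the sample points are hypothetical, and summing the points in a bin stands in for the integrated intensity. With a 0.04 ppm width, the 0.2–10.0 ppm region gives 245 bins before any excluded regions are dropped.

```python
def bin_spectrum(ppm, intensity, lo=0.2, hi=10.0, width=0.04):
    """Sum the intensities that fall into consecutive bins of fixed width.

    Points outside [lo, hi) are ignored. Points lying exactly on a bin edge
    may land on either side because of floating-point rounding.
    """
    n_bins = round((hi - lo) / width)
    bins = [0.0] * n_bins
    for p, v in zip(ppm, intensity):
        if lo <= p < hi:
            idx = min(int((p - lo) / width), n_bins - 1)
            bins[idx] += v
    return bins


# Made-up points: the last one lies outside the region of interest.
binned = bin_spectrum([0.21, 0.23, 0.25, 9.99, 10.5], [1.0, 2.0, 3.0, 4.0, 5.0])
```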

Principal Component Analysis (PCA) Technique that explains the variance-covariance structure of the variables in terms of linear combinations of them: $X = t_1 p_1^T + t_2 p_2^T + \dots + t_k p_k^T + E$, where the $p_i$ are eigenvectors. Projections of the original data matrix onto these components give the relations between the samples – Scores Plot. A plot of the eigenvectors of the covariance matrix gives the relationships between the variables – Loadings Plot. Reduces the dimension of the problem; a few components suffice to explain the variance. * Courtesy Wise, B. M. and Gallagher, N. B., PLS_Toolbox 2.1
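
The decomposition on this slide can be illustrated with a singular value decomposition in NumPy. This is a sketch under stated assumptions rather than the PLS_Toolbox routine: the function name `pca` is hypothetical, samples are placed in rows (the slides quote the matrix as variables x samples), the columns are mean-centered, and the data are random numbers used only to exercise the code.

```python
import numpy as np


def pca(X, k):
    """Return scores T (n x k), loadings P (m x k) and residual E
    such that the centered data equal T @ P.T + E."""
    Xc = X - X.mean(axis=0)                      # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :k] * s[:k]                         # scores t_1..t_k as columns
    P = Vt[:k].T                                 # loadings p_1..p_k as columns
    E = Xc - T @ P.T                             # variation left unexplained
    return T, P, E


# Synthetic stand-in for 18 samples x 239 bins (not the abalone data).
rng = np.random.default_rng(0)
X = rng.normal(size=(18, 239))
T, P, E = pca(X, k=3)
```

Plotting the columns of `T` against each other gives a scores plot, and plotting the columns of `P` against bin position gives a loadings plot.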

PCA Results (panels: Scores Plot, Loadings Plot)

Cluster Analysis - Hierarchical (panels: Transformed Data – Groups Clearly Identified; Untransformed Data)

Generalized Log Transformation It has been shown* that a transformation of the form $f(y) = \ln\left(y + \sqrt{y^2 + c}\right)$ can have a variance-stabilizing effect on the data. The parameter c can be obtained by Maximum Likelihood or ANOVA methods and is approximately $c \approx \sigma^2 / S^2$, where $\sigma^2$ is the variance of the noise and $S^2$ the variance of the high peaks. * Durbin, B., Hardin, J., Rocke, D. M., Bioinformatics, 2002, 18, s105-s110. * Sue Geller, Jeff Gregg, Paul Hagerman, David Rocke, Transformation and Normalization of Oligonucleotide Microarray Data, 2003
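
A small sketch of the generalized log, written with the square root that the cited transformation uses. The function name `glog` is hypothetical, and c = 2.7e-7 is the value quoted later in the slides for the binned spectrum.

```python
import math


def glog(y, c):
    """Generalized log: ln(y + sqrt(y^2 + c)), defined for any real y when c > 0."""
    return math.log(y + math.sqrt(y * y + c))


c = 2.7e-7
at_zero = glog(0.0, c)       # equals 0.5 * ln(c): finite, unlike ln(0)
at_large = glog(1.0e3, c)    # close to ln(2 * 1000): behaves like a plain log
at_negative = glog(-1.0e-4, c)  # small negative intensities are still defined
```

For intensities far above sqrt(c) the transform behaves like an ordinary logarithm, while near zero it is roughly linear, which is why it can handle baseline noise that a plain log cannot.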

Maximum Likelihood* Need replicates to determine SSE(c) accurately. Find the c that minimizes the SSE. Find c in steps using Newton’s method or educated intervals. (plot: Error Sum of Squares vs. c) * Box, G. and Cox, D. R. (1964) An analysis of transformations. J. Roy. Stat. Soc., Series B (Methodological), 26, 211.
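
One way to read the procedure on this slide is a search over candidate c values for the smallest within-replicate error sum of squares. The sketch below is an assumption-laden illustration, not the author's implementation: the names `normalized_sse` and `estimate_c`, the log-spaced grid, and the replicate values are all hypothetical. Following the Box-Cox approach, the transformed values are rescaled by the geometric mean of sqrt(y^2 + c) (the inverse of the transformation's derivative) so that SSE values for different c are comparable; without that rescaling the SSE shrinks toward zero as c grows.

```python
import math


def normalized_sse(groups, c):
    """Within-group SSE of Jacobian-normalized glog values.

    groups: list of lists, each holding replicate measurements of one quantity.
    """
    values = [y for g in groups for y in g]
    # log of the geometric mean of sqrt(y^2 + c)
    log_scale = sum(0.5 * math.log(y * y + c) for y in values) / len(values)
    scale = math.exp(log_scale)
    sse = 0.0
    for g in groups:
        z = [scale * math.log(y + math.sqrt(y * y + c)) for y in g]
        mean = sum(z) / len(z)
        sse += sum((v - mean) ** 2 for v in z)
    return sse


def estimate_c(groups, exponents=range(-12, 1)):
    """Grid search over c = 10**e; a coarse stand-in for Newton's method."""
    candidates = [10.0 ** e for e in exponents]
    return min(candidates, key=lambda c: normalized_sse(groups, c))


# Made-up replicate intensities for three bins (low, high, near baseline).
replicates = [[0.0010, 0.0012, 0.0009], [0.50, 0.55, 0.46], [0.0001, 0.0003, 0.0002]]
c_hat = estimate_c(replicates)
```

A finer grid around the best candidate, or a Newton step on the same objective, would refine the estimate.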

Transformed Spectrum Calculate ‘c’ from the replicate data by maximum likelihood methods. Transform the data to stabilize the variance using $f(y) = \ln\left(y + \sqrt{y^2 + c}\right)$. Bin Size = 0.04 ppm, 239 Bins, c = 2.7e-7

Stabilized Variance Bin Size = 0.04 ppm, c = 2.7e-7

Scores Plot – Transformation Effects (panels: Untransformed Data, Transformed Data)

Loadings Plot – Transformation Effects (panels: Untransformed Data, Transformed Data)

Cluster Analysis - Hierarchical (panels: Transformed Data – Groups Clearly Identified; Untransformed Data)
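
The slides do not say which distance or linkage was used for the hierarchical clustering. As an illustration only, the sketch below assumes Euclidean distance and average linkage, and merges clusters until a chosen number remain. The function names and the toy points are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(samples, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest average pairwise distance until n_clusters remain.
    Returns clusters as sorted lists of sample indices."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[i] for i in range(len(samples))]

    def cluster_distance(ca, cb):
        total = sum(euclidean(samples[i], samples[j]) for i in ca for j in cb)
        return total / (len(ca) * len(cb))

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return sorted(clusters)


# Toy points forming three well-separated pairs (not the abalone data).
points = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0], [10.0, 0.0], [10.1, 0.1]]
groups = average_linkage(points, n_clusters=3)
```

In practice the rows passed in would be the binned spectra, or their leading PCA scores, before and after the transformation.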

Raw Spectra – Significant Bins (panels: Healthy, Stunt., Sick, for each of two regions) Glycogen, Sucrose, Fructose? Bin 124 – 5.38 ppm, Bin 125 – 5.42 ppm, Bin 126 – 5.46 ppm; Bin 76 – 3.22 ppm, Bin 77 – 3.26 ppm, Bin 78 – 3.30 ppm
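
The bin numbers and ppm values on this slide can be reproduced by one plausible reading of the earlier slides: bins are numbered from 1, start at 0.2 ppm, are 0.04 ppm wide, are labeled by their centers, and the bins covering the removed water region (4.72-4.96 ppm) are skipped in the numbering. That reading is an inference, not something the slides state, and the function name `bin_center_ppm` is hypothetical.

```python
def bin_center_ppm(bin_number, lo=0.2, width=0.04, water=(4.72, 4.96)):
    """Map a 1-based bin number (counted after the water bins are dropped)
    to the center of that bin in ppm."""
    first_removed = round((water[0] - lo) / width) + 1   # first water bin
    n_removed = round((water[1] - water[0]) / width)     # number of water bins
    if bin_number < first_removed:
        original = bin_number
    else:
        original = bin_number + n_removed
    return lo + (original - 0.5) * width


low_region = [bin_center_ppm(b) for b in (76, 77, 78)]
high_region = [bin_center_ppm(b) for b in (124, 125, 126)]
```

Under this reading, the 245 bins between 0.2 and 10.0 ppm lose 6 to the water region, leaving the 239 bins quoted on the binning slide.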

Conclusions Demonstrated the use of data reduction and multivariate techniques for studying NMR and mass spectrometer data. Demonstrated the use of these techniques to identify metabolite and protein biomarkers. Showed the value of transformations in making the data more useful for analysis

Acknowledgements David M. Rocke, CIPIC; David L. Woodruff, CIPIC; Mark R. Viant, U. of Birmingham, U.K.