Strategies for Metabolomic Data Analysis Dmitry Grapov, PhD.

Slides:



Advertisements
Similar presentations
StatisticalDesign&ModelsValidation. Introduction.
Advertisements

Regression Bivariate Multivariate. Simple Regression line – line of best fit Ŷ = a + bX The mean point is always on the line Y is the criterion variable.
The General Linear Model Or, What the Hell’s Going on During Estimation?
Learning Objectives Copyright © 2002 South-Western/Thomson Learning Data Analysis: Bivariate Correlation and Regression CHAPTER sixteen.
Learning Objectives Copyright © 2004 John Wiley & Sons, Inc. Bivariate Correlation and Regression CHAPTER Thirteen.
Learning Objectives 1 Copyright © 2002 South-Western/Thomson Learning Data Analysis: Bivariate Correlation and Regression CHAPTER sixteen.
Psychology 202b Advanced Psychological Statistics, II February 1, 2011.
© 2005 The McGraw-Hill Companies, Inc., All Rights Reserved. Chapter 14 Using Multivariate Design and Analysis.
1-1 Regression Models  Population Deterministic Regression Model Y i =  0 +  1 X i u Y i only depends on the value of X i and no other factor can affect.
Lecture 19: Tues., Nov. 11th R-squared (8.6.1) Review
Topics: Regression Simple Linear Regression: one dependent variable and one independent variable Multiple Regression: one dependent variable and two or.
Educational Research by John W. Creswell. Copyright © 2002 by Pearson Education. All rights reserved. Slide 1 Chapter 8 Analyzing and Interpreting Quantitative.
Today Concepts underlying inferential statistics
Measures of Association Deepak Khazanchi Chapter 18.
Summary of Quantitative Analysis Neuman and Robson Ch. 11
Lorelei Howard and Nick Wright MfD 2008
Chapter 14 Inferential Data Analysis
Slide 1 Testing Multivariate Assumptions The multivariate statistical techniques which we will cover in this class require one or more the following assumptions.
T-tests and ANOVA Statistical analysis of group differences.
Inferential statistics Hypothesis testing. Questions statistics can help us answer Is the mean score (or variance) for a given population different from.
The Practice of Social Research
1 Statistical Tools for Multivariate Six Sigma Dr. Neil W. Polhemus CTO & Director of Development StatPoint, Inc.
Correlation and Regression
© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Statistical Methods For Health Research. History Blaise Pascl: tossing ……probability William Gossett: standard error of mean “ how large the sample should.
Simple Linear Regression
Multivariate Statistical Data Analysis with Its Applications
Class Meeting #11 Data Analysis. Types of Statistics Descriptive Statistics used to describe things, frequently groups of people.  Central Tendency 
Learning Objective Chapter 14 Correlation and Regression Analysis CHAPTER fourteen Correlation and Regression Analysis Copyright © 2000 by John Wiley &
Choosing and using statistics to test ecological hypotheses
Statistics Definition Methods of organizing and analyzing quantitative data Types Descriptive statistics –Central tendency, variability, etc. Inferential.
Chapter 12 Examining Relationships in Quantitative Research Copyright © 2013 by The McGraw-Hill Companies, Inc. All rights reserved. McGraw-Hill/Irwin.
Why Is It There? Getting Started with Geographic Information Systems Chapter 6.
Statistical Analysis. Statistics u Description –Describes the data –Mean –Median –Mode u Inferential –Allows prediction from the sample to the population.
Educational Research: Competencies for Analysis and Application, 9 th edition. Gay, Mills, & Airasian © 2009 Pearson Education, Inc. All rights reserved.
Examining Relationships in Quantitative Research
ANOVA and Linear Regression ScWk 242 – Week 13 Slides.
Linear correlation and linear regression + summary of tests
Regression Analysis Week 8 DIAGNOSTIC AND REMEDIAL MEASURES Residuals The main purpose examining residuals Diagnostic for Residuals Test involving residuals.
 Muhamad Jantan & T. Ramayah School of Management, Universiti Sains Malaysia Data Analysis Using SPSS.
Chapter 16 Data Analysis: Testing for Associations.
Academic Research Academic Research Dr Kishor Bhanushali M
Topic: Quadratics and Complex Numbers Grade: 10 Key Learning(s): Analyzes the graphs of and solves quadratic equations and inequalities by factoring, taking.
September 18-19, 2006 – Denver, Colorado Sponsored by the U.S. Department of Housing and Urban Development Conducting and interpreting multivariate analyses.
1 Regression Analysis The contents in this chapter are from Chapters of the textbook. The cntry15.sav data will be used. The data collected 15 countries’
ANALYSIS PLAN: STATISTICAL PROCEDURES
Inferential Statistics. The Logic of Inferential Statistics Makes inferences about a population from a sample Makes inferences about a population from.
Examining Relationships in Quantitative Research
Chapter Thirteen Copyright © 2006 John Wiley & Sons, Inc. Bivariate Correlation and Regression.
Statistical Selection Chart. For 2 samples ASK You say you want to compare! How many samples? Are my samples related? OR Are they independent?
Introduction to Biostatistics and Bioinformatics Regression and Correlation.
Multivariate Data Analysis Chapter 2 – Examining Your Data
Review Lecture 51 Tue, Dec 13, Chapter 1 Sections 1.1 – 1.4. Sections 1.1 – 1.4. Be familiar with the language and principles of hypothesis testing.
Introducing Communication Research 2e © 2014 SAGE Publications Chapter Seven Generalizing From Research Results: Inferential Statistics.
Regression Analysis. 1. To comprehend the nature of correlation analysis. 2. To understand bivariate regression analysis. 3. To become aware of the coefficient.
Copyright © 2011, 2005, 1998, 1993 by Mosby, Inc., an affiliate of Elsevier Inc. Chapter 19: Statistical Analysis for Experimental-Type Research.
Introduction to Machine Learning Multivariate Methods 姓名 : 李政軒.
Lesson 14 - R Chapter 14 Review. Objectives Summarize the chapter Define the vocabulary used Complete all objectives Successfully answer any of the review.
Power Point Slides by Ronald J. Shope in collaboration with John W. Creswell Chapter 7 Analyzing and Interpreting Quantitative Data.
Biostatistics Regression and Correlation Methods Class #10 April 4, 2000.
Lecturer: Ing. Martina Hanová, PhD.. Regression analysis Regression analysis is a tool for analyzing relationships between financial variables:  Identify.
Advanced Strategies for Metabolomic Data Analysis Dmitry Grapov, PhD.
Appendix I A Refresher on some Statistical Terms and Tests.
Thursday, May 12, 2016 Report at 11:30 to Prairieview
Statistical Data Analysis
Correlation, Bivariate Regression, and Multiple Regression
CH 5: Multivariate Methods
Part Three. Data Analysis
CH2. Cleaning and Transforming Data
Nazmus Saquib, PhD Head of Research Sulaiman AlRajhi Colleges
Presentation transcript:

Strategies for Metabolomic Data Analysis Dmitry Grapov, PhD

Goals?

Metabolomics

Analytical Dimensions Samples variables

Analyzing Metabolomic Data Pre-analysis Data properties Statistical approaches Multivariate approaches Systems approaches

Pre-analysis Data quality metrics precision accuracy Remedies normalization outliers detection missing values imputation

Normalization sample-wise sum, adjusted measurement-wise transformation (normality) encoding (trigonometric, etc.) mean standard deviation

Outliers single measurements (univariate) two compounds (bivariate)

Outliers univariate/bivariate vs. \ multivariate mixed up samples outliers?

X -0.5 X Transformation logarithm (shifted) power (BOX-COX) inverse Quantile-quantile (Q-Q) plots are useful for visual overview of variable normality

Missing Values Imputation Why is it missing? random systematic analytical biological Imputation methods single value (mean, min, etc.) multiple multivariate mean PCA

Goals for Data Analysis Are there any trends in my data? –analytical sources –meta data/covariates Useful Methods –matrix decomposition (PCA, ICA, NMF) –cluster analysis Differences/similarities between groups? –discrimination, classification, significant changes Useful Methods –analysis of variance (ANOVA) –partial least squares discriminant analysis (PLS-DA) –Others: random forest, CART, SVM, ANN What is related or predictive of my variable(s) of interest? –regression Useful Methods –correlation –partial least squares (PLS) Exploration Classification Prediction

Data Structure univariate: a single variable (1-D) bivariate: two variables (2-D) multivariate: 2 > variables (m-D) Data Types continuous discreet binary

Data Complexity n m 1-D2-D m-D Data samples variables complexity Meta Data Experimental Design = Variable # = dimensionality

Univariate Analyses univariate properties length center (mean, median, geometric mean) dispersion (variance, standard deviation) Range (min / max) mean standard deviation

Univariate Analyses sensitive to distribution shape parametric = assumes normality error in Y, not in X (Y = mX + error) optimal for long data assumed independence false discovery rate long wide n-of-one

False Discovery Rate (FDR) univariate approaches do not scale well Type I Error: False Positives Type II Error: False Negatives Type I risk = 1-(1-p.value) m m = number of variables tested

FDR correction Example: Design: 30 sample, 300 variables Test: t-test FDR method: Benjamini and Hochberg (fdr) correction at q=0.05 Bioinformatics (2008) 24 (12): Results FDR adjusted p-values (fdr) or estimate of FDR (Fdr, q-value)

Achieving “significance” is a function of: significance level (α) and power (1-β ) effect size (standardized difference in means) sample size (n)

Bivariate Data relationship between two variables correlation (strength) regression (predictive) correlation regression

Correlation Parametric (Pearson) or rank-order (Spearman, Kendall) correlation is covariance scaled between -1 and 1

Correlation vs. Regression Regression describes the least squares or best-fit- line for the relationship (Y = m*X + b)

Bivariate Example Goal: Don’t miss eruption! Data time between eruptions – 70 ± 14 min duration of eruption – 3.5 ± 1 min Azzalini, A. and Bowman, A. W. (1990). A look at some data on the Old Faithful geyser. Applied Statistics 39, 357–365 Old Faithful, Yellowstone, WY

Bivariate Example Two cluster pattern for both duration and frequency Azzalini, A. and Bowman, A. W. (1990). A look at some data on the Old Faithful geyser. Applied Statistics 39, 357–365

Bivariate Example Noted deviations from two cluster pattern –Outliers? –Covariates?

Covariates Trends in data which mask primary goals can be accounted for using covariate adjustment and appropriate modeling strategies

Bivariate Example Noted deviations from two cluster pattern can be explained by covariate: Hydrofraking  Covariate adjustment is an integral aspect of statistical analyses (e.g. ANCOVA)

Summary Data exploration and pre-analysis: increase robustness of results guards against spurious findings Can greatly improve primary analyses Univariate Statistics: are useful for identification of statically significant changes or relationships sub-optimal for wide data best when combined with advanced multivariate techniques

Resources Web-based data analysis platforms MetaboAnalyst( ) MeltDB( ) Programming tools The R Project for Statistical Computing( ) Bioconductor( ) GUI tools imDEV( )