Differential Methylation Analysis

Presentation transcript:

Differential Methylation Analysis Simon Andrews simon.andrews@babraham.ac.uk @simon_andrews v2017-01

A basic question…

Two Strategies
Raw Data -> Normalised %methylation -> Continuous value statistics
Raw Data -> Counts of Meth/Unmeth -> Count based statistics

Factors to consider
- Formulating a sensible question
- Applying corrections if needed
- Assessing statistical power
- Relating hits to biology

Global Differences

Question: Which areas show a significant change in methylation level between the two conditions?

Question: Which areas show a change in methylation which is larger or smaller than the global change in the samples overall?

Question: Which areas show a change in methylation after correcting for the small global differences?

Count based statistics

Count Data
Is the difference in ratios significant given the observation levels of the samples?

The problem of power…
- Ideally want to cover every Cytosine (CpG)
- Should correct for the number of tests
- It’s unlikely you’ll collect enough data to analyse each C and have p-values which survive multiple testing correction
- How do you work around this?

Maximising power
Options:
- Don’t do any multiple testing correction
- Increase power
  - Analyse in windows (more data and fewer tests)
  - Pre-filter (fewer tests)
  - Hierarchical or adaptive filtering
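The trade-off between the number of tests and the per-test threshold can be sketched as follows. This is a minimal illustration, not the method used by any particular package: the figure of roughly 28 million CpGs in a human genome is an approximate assumption, the 100-CpG window size is hypothetical, and Bonferroni and Benjamini-Hochberg are shown only as two common examples of multiple testing correction.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Assumption: ~28 million CpGs in a human genome (approximate figure).
N_CPG = 28_000_000
# Hypothetical window size, in CpGs per window.
WINDOW_SIZE = 100

per_cpg = bonferroni_threshold(0.05, N_CPG)
per_window = bonferroni_threshold(0.05, N_CPG // WINDOW_SIZE)
print(f"per-CpG threshold:    {per_cpg:.2e}")
print(f"per-window threshold: {per_window:.2e}")
```

Windowing relaxes the threshold by the window size while also pooling more observations into each test, which is why it helps on both sides of the power problem.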

Window sizes
Small windows:
- Good resolution
- Specific biological effects
- High MTC (multiple testing correction) burden
- Small observations
- High p-values
Large windows:
- Lots of data
- High statistical power
- Low MTC burden
- Low p-values
- Effect averaging
(The slide also carries the label "Effect size".)

Power Analysis (assuming a human genome with p<0.05 and power of detection of 0.8)
Table columns: window size (# CpG cytosines); absolute methylation change (from 80%); required fold genome coverage; without multiple testing correction
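A power calculation of this kind can be sketched with the standard normal-approximation sample size formula for comparing two proportions. This is an assumption about the general approach, not a reconstruction of how the slide's figures were produced; the function name `observations_per_group` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def observations_per_group(p1, p2, alpha=0.05, power=0.8):
    """Approximate number of methylation calls needed in each condition to
    detect a change from proportion p1 to p2 (two-sided test, normal
    approximation for two independent proportions)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Changes away from a starting level of 80% methylation.
for end_level in (0.75, 0.70, 0.60):
    n = observations_per_group(0.80, end_level)
    print(f"80% -> {end_level:.0%}: about {n} observations per condition")
```

To mimic a multiple testing correction, pass a smaller `alpha` (for example 0.05 divided by the number of windows). As a rough simplification, the observations in a window are the fold coverage multiplied by the number of CpGs in it, so larger windows reach a given total at lower coverage.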

Pre-filtering
Uncontentious:
- Low coverage regions
- Regions of biological interest (externally decided)
Contentious:
- Minimum absolute difference

Contingency tests
- Chi-square / G-test / Fisher’s exact test
- Differ only at low observations
- Significant changes require enough observations that any of these will likely give a similar answer
- Operates on single replicates
- Technical measure of difference
- Independent for each measurement window
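The point that these tests differ mainly at low observation counts can be illustrated with a small standard-library sketch. It implements an uncorrected chi-square test and a two-sided Fisher's exact test for a 2x2 table of methylated and unmethylated calls in two samples; the counts are made-up examples, and the chi-square version assumes no row or column total is zero.

```python
from math import comb, erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """P-value of the chi-square test (no continuity correction) for the
    table [[a, b], [c, d]]; assumes all marginal totals are non-zero."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return erfc(sqrt(chi2 / 2))


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test: sum the hypergeometric probabilities of
    all tables that are no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    return sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= observed * (1 + 1e-9)
    )


# Same methylation ratios (80% vs 40%) at low and high coverage.
for scale in (1, 10):
    table = (8 * scale, 2 * scale, 4 * scale, 6 * scale)
    print(table, chi_square_2x2(*table), fisher_exact_2x2(*table))
```

At 10 calls per sample the two tests disagree noticeably and neither result would survive correction; at 100 calls per sample both give very small p-values.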

Chi-Square results

Logistic Regression
- Extension of contingency tables to replicated data
- Accounts for variability within each sample group
- Selects points with consistently different methylation given their coverage

Globally changing samples
- Changes the default expectation
- Find average difference for each starting point
- Select points which exhibit unusual change

Globally changing example
- Starting level = 30%
- Observations = 14 meth, 6 unmeth
- Expected end level = 85%
- Binomial test, p=0.85, trials=20, successes=14
- Raw p=0.106
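The figure in this example can be reproduced with a two-sided exact binomial test. The sketch below assumes the common definition in which the p-value sums the probabilities of every outcome no more likely than the observed one; the function names are illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_test_two_sided(successes, trials, p):
    """Exact two-sided binomial test: sum the probabilities of all outcomes
    that are no more likely than the observed count."""
    observed = binomial_pmf(successes, trials, p)
    probabilities = [binomial_pmf(k, trials, p) for k in range(trials + 1)]
    return sum(q for q in probabilities if q <= observed * (1 + 1e-7))


# 14 methylated calls out of 20, against an expected end level of 85%.
p_value = binomial_test_two_sided(14, 20, 0.85)
print(round(p_value, 3))  # 0.106
```

Observing 70% methylation at a point expected to reach 85% is therefore not significant at 20 observations, even before any multiple testing correction.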

Beta Binomial Models
- What is the probability distribution for the true methylation level?
- Simple model: binomial stats to estimate confidence
- Can we do better?
- Genome-wide methylation profile: all levels are not equally likely
- Can inform the construction of a custom beta binomial distribution

Beta-binomial model
- Measure 3/20 observations as methylated
- The binomial distribution would be defined by the mean and observations
- Using the whole genome prior, a beta-binomial model would upweight the lower methylation levels, since these are more common
- Provides increased power in comparisons between major groups
- Often computationally intensive
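The effect of a prior on the 3/20 example can be sketched with the conjugate Beta update. The prior parameters below are hypothetical: Beta(1, 1) stands for "all levels equally likely" and Beta(1, 3) stands in for a prior weighted towards low methylation. A real analysis would fit the prior to the genome-wide methylation profile, which may not be well described by a single Beta distribution.

```python
from math import comb, exp, lgamma


def posterior_mean(meth, total, prior_a, prior_b):
    """Mean of the Beta posterior for the true methylation level after
    observing `meth` methylated calls out of `total`."""
    return (prior_a + meth) / (prior_a + prior_b + total)


def log_beta(a, b):
    """Natural log of the Beta function."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, n, a, b):
    """Probability of k methylated calls out of n when the true level is
    itself drawn from a Beta(a, b) distribution."""
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))


# Hypothetical priors: flat, and weighted towards low methylation.
flat = posterior_mean(3, 20, 1, 1)
low_weighted = posterior_mean(3, 20, 1, 3)
print(f"flat prior:         {flat:.3f}")
print(f"low-weighted prior: {low_weighted:.3f}")
print(f"P(3 of 20 | low-weighted prior) = {beta_binomial_pmf(3, 20, 1, 3):.3f}")
```

With the low-weighted prior the estimate is pulled below the flat-prior estimate, which is the upweighting of lower methylation levels described on the slide.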

Limitations of count based stats
- No subdivision of calls: all calls are equal even when coverage isn’t
- Supplement with differences based on better quantitation
- Potentially biased by power
- Can alleviate with CpG window based analysis
- Easy to bias data otherwise
- Problem of interpretation, not statistics

Methylation Level Statistics

BSmooth algorithm for methylation correction
Figure legend: black: 25x (Lister); pink: 4x (Lister)

Normalisation for methylation levels
Figure panels: Original Levels; Single Correction; Quantile Normalisation

Statistics
- Standard continuous statistics: T-Test, ANOVA
- Much reduced power: one value per replicate
- Unlikely to justify multiple testing correction

Reverse counting
- Some packages offer a conversion from normalised methylation back to counts
- Allows count based statistics: regains the lost power from normalisation
- Retains information about noise from true observation level
- True observations: Meth=20, Unmeth=30 (40% meth)
- Corrected % methylation = 50%
- Reversed counts: Meth=25, Unmeth=25
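The arithmetic of the example can be sketched as below. This assumes the conversion keeps the total number of observations fixed and redistributes them according to the corrected methylation level, which is consistent with the figures on the slide; individual packages may handle rounding or other details differently, and the function name is hypothetical.

```python
def reverse_counts(meth, unmeth, corrected_fraction):
    """Convert a corrected methylation level back to counts, keeping the
    original number of observations so the noise level is preserved."""
    total = meth + unmeth
    new_meth = round(total * corrected_fraction)
    return new_meth, total - new_meth


# True observations: 20 meth, 30 unmeth (40%); corrected level: 50%.
print(reverse_counts(20, 30, 0.50))  # (25, 25)
```

Because the total stays at 50 observations, a count based test on the reversed counts carries the same weight as one on the original calls.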

Biological considerations
- Minimum relevant effect size?
- Balance power vs change
- What makes biological sense (what would you follow up?)
- Position relative to features
- Consistent change over adjacent regions

Methylation statistics packages
SeqMonk (Graphical Analysis Package)
- Flexible measurement based on fixed windows, fixed calls or features.
- Complex corrected methylation calculation and several optional post-calculation normalization options.
- Chi-Square with optional resampling for unreplicated data, logistic regression with optional resampling for replicated data.
methylKit (R-package by A. Akalin et al.)
- Sliding window, Fisher’s exact test or logistic regression.
- Adjusts p-values to q-values using SLIM method.
bsseq (R/Bioconductor by K.D. Hansen)
- Implements the BSmooth smoothing algorithm.
- Numerous CpG-wise t-tests and p-value cutoff to define DMRs.
- Outperforms Fisher’s exact test.
- Requires biological replicates for DMR detection.
BiSeq (R/Bioconductor by K. Hebestreit et al.)
- Beta regression model, impractical for very large data other than RRBS or targeted BS-Seq.
MOABS (C++ command line tool by D. Sun et al.)
- Beta binomial hierarchical model to capture sampling and biological variation.
- Credible Methylation Difference (CDIF): single metric that combines biological and statistical significance.