1
**Differential Methylation Analysis**

Simon Andrews @simon_andrews

2
A basic question…

3
**Factors to consider**

- Number of observations
- Magnitude of effect
- Technical considerations
- Biological variability
- Biological common sense

4
**The problem of power…**

- Ideally we want to cover every cytosine (CpG).
- We have to correct for the number of tests.
- There’s no way you’ll collect enough data to analyse each C and still have p-values which survive multiple testing correction.
- The statistics have to find a way to work round this.
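To make the multiple testing burden concrete, here is a minimal sketch in plain Python. It shows a Bonferroni-style penalty growing with the number of tests, plus a small Benjamini-Hochberg adjustment. The function name `bh_adjust`, the raw p-value and the test counts are illustrative assumptions, not values from the slides.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so the result stays monotonic.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical example: the same raw p-value under a growing number of tests.
raw_p = 1e-6
for n_tests in (1_000, 1_000_000, 28_000_000):  # last value: rough order of CpGs in a human genome
    # Bonferroni: multiply by the number of tests and cap at 1.
    print(n_tests, min(1.0, raw_p * n_tests))
```

A raw p-value that looks convincing for a thousand tests no longer survives once every CpG is tested individually, which is the motivation for the strategies on the following slides.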

5
**Maximising power**

Options:

- Analyse in windows
- Pre-filter
- Hierarchical or adaptive filtering

6
**Window sizes**

Small windows:

- Good resolution
- Specific biological effects
- High multiple testing correction (MTC) burden
- Few observations
- High p-values

Large windows:

- Lots of data
- High statistical power
- Low MTC burden
- Low p-values
- Effect averaging
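The trade-off can be sketched by pooling per-CpG calls into fixed-size windows. This is a minimal illustration, assuming calls arrive as `(position, meth_count, unmeth_count)` tuples for one chromosome; the function name `window_counts` and the example data are hypothetical.

```python
from collections import defaultdict


def window_counts(calls, window_size):
    """Sum methylated and unmethylated calls per fixed-size window.

    calls: iterable of (position, meth_count, unmeth_count) tuples.
    Returns {window_start: (meth_total, unmeth_total)}.
    """
    totals = defaultdict(lambda: [0, 0])
    for pos, meth, unmeth in calls:
        # Round the position down to the start of its window.
        start = (pos // window_size) * window_size
        totals[start][0] += meth
        totals[start][1] += unmeth
    return {start: tuple(t) for start, t in sorted(totals.items())}


# Hypothetical calls on one chromosome.
calls = [(105, 3, 1), (180, 2, 2), (950, 0, 4), (1020, 1, 5)]
small = window_counts(calls, 100)    # more windows, fewer calls in each
large = window_counts(calls, 1000)   # fewer windows, more calls in each
print(small)
print(large)
```

Larger windows mean fewer tests and more observations per test, at the cost of averaging away localised changes.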

7
**Simple Statistical Approach**

Is the proportion of methylated calls different between two samples, given the number of observations?

Table columns: Meth count A, Unmeth count A, Meth count B, Unmeth count B, % change, Significant?

Table values: 2 100 No 200 198 5 1.5 50 75 60 11 Probably

8
**Contingency tests**

Chi-square / G-test / Fisher’s exact test

- The tests differ only at low numbers of observations.
- Significant changes require enough observations that any of these tests should give the same answer.
- They operate on single replicates.
- They give a technical measure of difference.

Input is a 2×2 table of counts: Meth A, Unmeth A, Meth B, Unmeth B.
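As a rough illustration of two of these tests on a 2×2 table, the sketch below implements a Pearson chi-square test (without continuity correction) and a two-sided Fisher's exact test using only the Python standard library. The function names and the example counts are assumptions made for this example; in practice a statistics library would normally be used.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test on [[a, b], [c, d]]; returns (statistic, p_value)."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0, 1.0  # an empty margin leaves nothing to test
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the chi-square tail equals erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value on [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x methylated calls in sample A.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))


# Hypothetical counts: (meth A, unmeth A, meth B, unmeth B).
for table in [(2, 0, 0, 2), (200, 0, 0, 200)]:
    print(table, chi_square_2x2(*table)[1], fisher_exact_2x2(*table))
```

With only four calls in total the two tests give visibly different p-values; with hundreds of calls per sample both become vanishingly small, in line with the point that the tests differ only at low numbers of observations.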

9
Chi-Square results

10
**Biological considerations**

- Minimum relevant effect size?
  - Balance power vs change
  - What makes biological sense (what would you follow up?)
- Minimum coverage worth testing
  - No point testing poorly covered regions
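A pre-filter along these lines could look like the sketch below. The thresholds (`min_coverage=10`, `min_diff=0.1`), the dictionary keys and the function name `prefilter` are all hypothetical choices for illustration, not recommendations from the slides.

```python
def prefilter(regions, min_coverage=10, min_diff=0.1):
    """Keep only regions with enough coverage and a large enough change.

    regions: list of dicts with keys meth_a, unmeth_a, meth_b, unmeth_b.
    min_coverage: minimum total calls required in each sample.
    min_diff: minimum absolute difference in methylated fraction (0 to 1).
    """
    kept = []
    for r in regions:
        cov_a = r["meth_a"] + r["unmeth_a"]
        cov_b = r["meth_b"] + r["unmeth_b"]
        # Skip poorly covered regions (and avoid dividing by zero).
        if min(cov_a, cov_b) < max(min_coverage, 1):
            continue
        diff = abs(r["meth_a"] / cov_a - r["meth_b"] / cov_b)
        if diff >= min_diff:
            kept.append(r)
    return kept


# Hypothetical regions.
regions = [
    {"meth_a": 2, "unmeth_a": 1, "meth_b": 0, "unmeth_b": 3},      # too little coverage
    {"meth_a": 50, "unmeth_a": 50, "meth_b": 52, "unmeth_b": 48},  # change too small
    {"meth_a": 80, "unmeth_a": 20, "meth_b": 30, "unmeth_b": 70},  # kept
]
print(len(prefilter(regions)))
```

Every region removed here is one fewer test to correct for.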

11
**Effect of pre-filtering**

12
**Distribution of methylation**

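The chi-square test relies on a large-sample normal approximation to the counts, and methylation data isn’t normally distributed: levels are bounded between 0% and 100% and are often concentrated near the extremes.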

13
**Beta binomial distribution**

More relevant statistics than chi-square. Need to fit custom model to actual data.
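To show why a beta-binomial model behaves differently from a plain binomial one, here is a minimal sketch of both probability mass functions using only the standard library. The parameters `alpha = beta = 0.5` are a hypothetical U-shaped choice with mean 0.5; a real analysis would estimate them from the data, which this sketch does not attempt.

```python
import math


def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k, n, alpha, beta):
    """P(k methylated calls out of n) when the true level follows Beta(alpha, beta)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))


def binomial_pmf(k, n, p):
    """P(k methylated calls out of n) when the true level is exactly p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


n = 20
# Same mean methylation (0.5) under both models; alpha and beta are assumed values.
for k in (0, 10, 20):
    print(k, binomial_pmf(k, n, 0.5), beta_binomial_pmf(k, n, 0.5, 0.5))
```

Under the beta-binomial model, fully methylated or fully unmethylated outcomes are far more probable than the binomial model allows, so a test that ignores this extra variability will tend to overstate significance.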

14
**Implications of a beta distribution**

Many summaries assume normality:

- Mean
- Standard deviation
- Boxplots

None of these is strictly appropriate when looking at methylation data.

15
**Dealing with replicates**

Simple approach:

- Merge data from replicates together
- Single test, high power
- Post-hoc test for consistency

Explicitly account for batch effects:

- Logistic regression
- Measures batch effects and excludes them from the final significance calculation

Work with methylation values:

- Normalise percentage methylation values
- Use conventional statistics (t-tests etc.) for comparing groups
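The simple approach can be sketched as follows: pool the counts for a single high-powered test, then check afterwards that the replicates agree. The consistency rule used here (every replicate of one condition lies on the same side of every replicate of the other) is just one hypothetical choice, and both function names are assumptions.

```python
def merge_replicates(replicates):
    """Sum (meth, unmeth) counts across the replicates of one condition."""
    meth = sum(m for m, _ in replicates)
    unmeth = sum(u for _, u in replicates)
    return meth, unmeth


def consistent_direction(reps_a, reps_b):
    """Post-hoc check: do all replicates of A differ from all of B in the same direction?"""
    levels_a = [m / (m + u) for m, u in reps_a if m + u > 0]
    levels_b = [m / (m + u) for m, u in reps_b if m + u > 0]
    if not levels_a or not levels_b:
        return False  # no usable data in one condition
    return max(levels_a) < min(levels_b) or min(levels_a) > max(levels_b)


# Hypothetical (meth, unmeth) counts for one region, two replicates per condition.
reps_a = [(8, 2), (7, 3)]
reps_b = [(2, 8), (3, 7)]
print(merge_replicates(reps_a), merge_replicates(reps_b))
print(consistent_direction(reps_a, reps_b))
```

The merged counts would then go into a single contingency test, and regions failing the consistency check would be discarded or flagged.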

16
**Hierarchical testing**

- Test larger regions
  - Windows / features etc.
- Take significant hits and subdivide
  - Smaller windows
  - Individual CpGs
  - Correct only for these tests
- Assemble hits together to make up DMRs
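A two-level version of this procedure might be sketched as below, assuming a chi-square test at each level and a Bonferroni correction applied separately per level. The data layout, the function names and the `alpha` threshold are hypothetical, and this is an illustration of the idea rather than any particular package's method.

```python
import math


def chi2_p(a, b, c, d):
    """P-value of a Pearson chi-square test on the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        return 1.0
    stat = n * (a * d - b * c) ** 2 / margins
    return math.erfc(math.sqrt(stat / 2))  # chi-square tail, 1 degree of freedom


def hierarchical_test(windows, alpha=0.05):
    """windows: {name: [(meth_a, unmeth_a, meth_b, unmeth_b), ...]}, one tuple per CpG.

    Returns {window_name: [indices of significant CpGs]} for significant windows.
    """
    # Level 1: one test per window on summed counts, corrected for the window count.
    significant = []
    for name, cpgs in windows.items():
        totals = [sum(column) for column in zip(*cpgs)]
        if chi2_p(*totals) * len(windows) <= alpha:
            significant.append(name)
    # Level 2: test CpGs only inside significant windows,
    # correcting only for the tests actually run at this level.
    n_level2 = sum(len(windows[name]) for name in significant)
    return {
        name: [i for i, cpg in enumerate(windows[name])
               if chi2_p(*cpg) * n_level2 <= alpha]
        for name in significant
    }


# Hypothetical data: two windows with two CpGs each.
windows = {
    "w1": [(40, 10, 10, 40), (45, 5, 5, 45)],
    "w2": [(25, 25, 24, 26), (30, 20, 29, 21)],
}
print(hierarchical_test(windows))
```

Adjacent significant CpGs from the second level would then be assembled into DMRs.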

17
**Hierarchical testing**

[Figure: Genome and CGI regions, some marked X]

A statistically ‘creative’ solution to not having enough data.

18
**Methylation statistics packages**

- swDMR (Perl/R package): Sliding-window DMR finding (choose between t_test, Kolmogorov, Fisher, ChiSquare, Wilcoxon for n = 2; ANOVA, Kruskal for n > 2).
- methylKit* (R package by A. Akalin et al.): Sliding window, Fisher’s exact test or logistic regression. Adjusts p-values to q-values using the SLIM method.
- bsseq* (R/Bioconductor by K.D. Hansen): Implements the BSmooth smoothing algorithm. Numerous CpG-wise t-tests and a p-value cutoff to define DMRs. Outperforms Fisher’s exact test. Requires biological replicates for DMR detection.
- BiSeq* (R/Bioconductor by K. Hebestreit et al.): Beta regression model; impractical for very large data other than RRBS or targeted BS-Seq.
- RnBeads* (R package by F. Mueller et al.): Works for 450K arrays, BS-Seq, MeDIP or MBD-Seq data.
- DMAP* (C command line tool by P. Stockwell et al.): RRBS fragment or fixed window approach; Fisher’s exact test, chi-squared or ANOVA.
- RADMeth (C++ command line tool by E. Dolzhenko and A.D. Smith): Beta-binomial regression analysis to find DMCs or DMRs; local likelihood; adjusts for neighbouring CpGs.
- MOABS* (C++ command line tool by D. Sun et al.): Beta-binomial hierarchical model to capture sampling and biological variation. Credible Methylation Difference (CDIF) is a single metric that combines biological and statistical significance.
- ComMet (Y. Saito et al., 2014): Part of the Bisulfighter suite. DMR detection based on hidden Markov models (HMMs) that enable automated adjustment of DMC chaining criteria. Does not require biological replicates.
- DSS (R/Bioconductor by Feng et al., 2014): Constructs a genome-wide prior distribution for the beta-binomial dispersion. Bayesian hierarchical model to detect differentially methylated loci.
- more appearing every other week…

* interface well with

19
| Tool | Statistical test | Suitable for | Implementation | Notes |
|---|---|---|---|---|
| bsseq | Sample-wise smoothing, then group differences via CpG-wise t-tests (p-value cutoff to define adjacent CpG sites as DMRs) | WGBS; not designed for targeted BS-Seq or RRBS | R package/Bioconductor | Outperforms Fisher’s exact test; intended to compare 2 groups; replicates required |
| BiSeq | Define CpG clusters, smooth methylation data, model and test group effect (fitting a beta regression model to smoothed methylation levels and testing for group effect using the Wald test), hierarchical testing procedure on CpG clusters, then define DMR boundaries | RRBS; targeted BS-Seq | R package/Bioconductor | Very computationally intensive for WGBS; not limited to 2 groups |
| MethylKit | Models CpG methylation within a logistic regression. Sliding linear model (SLIM) to correct for multiple testing | (e)RRBS | R package | |

* WGBS = whole genome BS-Seq; (e)RRBS = (enhanced) reduced representation BS-Seq

20
**bsseq – for whole genome BS-Seq**

- Smooths low-coverage BS-Seq data first to get reliable semi-local methylation estimates
- Not suitable for captured or restricted data
- After smoothing, it uses biological replicates to estimate biological variation and identify differentially methylated regions (DMRs)
- Smoothing is suitable for even a single sample
- Works for CpG context in humans; will probably not scale to 2x585M Cs in non-CG context
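The general idea of borrowing information from neighbouring CpGs can be illustrated with a simple kernel-weighted average. This is not the BSmooth algorithm or the bsseq API; it is a deliberately simplified stand-in, and the function name, the tricube kernel, the bandwidth and the example data are all assumptions.

```python
def smooth_methylation(sites, bandwidth=1000):
    """Kernel-weighted methylation estimate at each site.

    sites: list of (position, meth_count, total_count) tuples.
    bandwidth: distance in bp beyond which neighbours get zero weight.
    """
    smoothed = []
    for pos, _, _ in sites:
        num = den = 0.0
        for other_pos, meth, total in sites:
            dist = abs(other_pos - pos)
            if dist >= bandwidth:
                continue
            # Tricube kernel: weight 1 at the site itself, falling to 0 at the bandwidth.
            weight = (1 - (dist / bandwidth) ** 3) ** 3
            num += weight * meth
            den += weight * total
        # Sites with no nearby coverage get NaN rather than a made-up estimate.
        smoothed.append(num / den if den > 0 else float("nan"))
    return smoothed


# Hypothetical low-coverage sites: (position, methylated calls, total calls).
sites = [(100, 1, 1), (250, 3, 4), (400, 0, 1), (5000, 3, 4)]
print(smooth_methylation(sites))
```

A site covered by a single read no longer reports 0% or 100% on its own; its estimate is pulled towards nearby sites, weighted by their coverage and distance.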

21
BSmooth algorithm. Black: 25x (Lister); pink: 4x (Lister).

22
BSmooth t-values
