Differential Methylation Analysis

1 Differential Methylation Analysis
Simon Andrews @simon_andrews

2 A basic question…

3 Factors to consider
- Number of observations
- Magnitude of effect
- Technical considerations
- Biological variability
- Biological common sense

4 The problem of power…
- Ideally we want to test every cytosine (CpG)
- But we have to correct for the number of tests
- There's no way you'll collect enough data to analyse each C individually and still have p-values that survive multiple testing correction
- The statistics have to find a way to work round this
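To put numbers on the power problem, here is a minimal Python sketch (not from the talk). It asks: if one CpG is covered n times in each of two samples, what is the smallest p-value Fisher's exact test could ever return, and how much coverage is needed before even a perfect 0% vs 100% difference survives a Bonferroni correction? The figure of 28 million tests is an assumed round number for individually tested CpGs in a human genome, and `min_fisher_p` / `min_coverage_needed` are hypothetical helper names.

```python
from math import comb

def min_fisher_p(n):
    """Smallest two-sided Fisher's exact p-value for one CpG with n calls per sample.

    The most extreme tables (all methylated in A and all unmethylated in B, or the
    reverse) each have hypergeometric probability 1 / C(2n, n).
    """
    return 2 / comb(2 * n, n)

def min_coverage_needed(n_tests, alpha=0.05):
    """Lowest per-sample coverage at which a perfect 100% change can pass
    a Bonferroni threshold of alpha / n_tests."""
    threshold = alpha / n_tests
    n = 1
    while min_fisher_p(n) >= threshold:
        n += 1
    return n

# Assumption: roughly 28 million CpGs tested one at a time.
N_TESTS = 28_000_000
print(f"Bonferroni threshold: {0.05 / N_TESTS:.2e}")
print(f"Best possible p at 10x: {min_fisher_p(10):.2e}")
print(f"Coverage needed per sample: {min_coverage_needed(N_TESTS)}")
```

Under these assumptions the best achievable p-value at 10x is about 1e-5, far above the Bonferroni threshold of about 1.8e-9, and even a complete switch in methylation needs 17 calls per sample to reach significance. Real changes are smaller than 100%, so the required coverage is higher still.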

5 Maximising power
Options:
- Analyse in windows
- Pre-filter
- Hierarchical or adaptive filtering

6 Window sizes
Small windows:
- Good resolution
- Specific biological effects
- High multiple testing correction (MTC) burden
- Small numbers of observations
- High p-values
Large windows:
- Lots of data
- High statistical power
- Low MTC burden
- Low p-values
- Effect averaging
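Analysing in windows amounts to pooling per-CpG counts before testing. The Python sketch below assumes a simple, hypothetical input of (position, methylated, unmethylated) tuples for one chromosome and sums them into fixed, non-overlapping windows; real tools may also offer sliding windows or feature-based regions.

```python
from collections import defaultdict

def aggregate_windows(calls, window_size):
    """Sum methylated/unmethylated counts into fixed, non-overlapping windows.

    calls: iterable of (position, meth, unmeth) for a single chromosome.
    Returns {window_start: (meth_total, unmeth_total)} ordered by window start.
    """
    totals = defaultdict(lambda: [0, 0])
    for pos, meth, unmeth in calls:
        start = (pos // window_size) * window_size
        totals[start][0] += meth
        totals[start][1] += unmeth
    return {start: tuple(counts) for start, counts in sorted(totals.items())}

# Hypothetical per-CpG calls: (position, methylated, unmethylated)
calls = [(105, 3, 1), (140, 2, 2), (180, 4, 0), (1210, 0, 5), (1900, 1, 3)]
for start, (m, u) in aggregate_windows(calls, 1000).items():
    print(start, m, u, f"{100 * m / (m + u):.1f}%")
```

Here the first window turns three CpGs with 4 calls each into one observation of 9 methylated and 3 unmethylated calls (75%), at the cost of averaging over 1 kb. Fewer, larger windows also mean fewer tests to correct for.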

7 Simple Statistical Approach
Is the proportion of methylated calls different between two samples, given the number of observations?
Columns: Meth count A | Unmeth count A | Meth count B | Unmeth count B | % change | Significant?
2 100 No 200 198 5 1.5 50 75 60 11 Probably
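The % change column on its own ignores coverage, which is why the question is asked "given the number of observations". A minimal Python sketch (the helper names are hypothetical) showing that tables with very different amounts of evidence can produce exactly the same % change:

```python
def percent_methylated(meth, unmeth):
    """Percentage of calls that are methylated."""
    return 100 * meth / (meth + unmeth)

def percent_change(meth_a, unmeth_a, meth_b, unmeth_b):
    """Absolute difference in percentage methylation between samples A and B."""
    return abs(percent_methylated(meth_a, unmeth_a) - percent_methylated(meth_b, unmeth_b))

# The same 100% change, backed by 4 calls in one case and 400 in the other.
print(percent_change(2, 0, 0, 2), "from 4 calls")
print(percent_change(200, 0, 0, 200), "from 400 calls")
```

Only a test that accounts for the counts behind each percentage can tell these two cases apart.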

8 Contingency tests: chi-square / G-test / Fisher's exact test
- These differ only at low numbers of observations
- A significant change needs enough observations that any of them should give the same answer
- They operate on single replicates
- They give a technical measure of difference
Input is a 2x2 table of counts: Meth A, Unmeth A (sample A) and Meth B, Unmeth B (sample B)
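The three tests can be compared directly on the same 2x2 tables. This is a self-contained Python sketch written for illustration, not code from any methylation package: the chi-square test here has no continuity correction, both chi-square and G-test p-values use the 1-degree-of-freedom chi-square tail (computed as erfc(sqrt(x/2))), and all row and column totals are assumed to be non-zero.

```python
from math import comb, erfc, log, sqrt

# Table layout: a = Meth A, b = Unmeth A, c = Meth B, d = Unmeth B.
# All row and column totals are assumed to be non-zero.

def fisher_exact(a, b, c, d):
    """Two-sided Fisher's exact test: sums all tables (with the same margins)
    that are no more likely than the observed one."""
    n_a, n_meth, n = a + b, a + c, a + b + c + d
    denom = comb(n, n_meth)

    def prob(x):  # hypergeometric probability of x methylated calls in sample A
        return comb(n_a, x) * comb(n - n_a, n_meth - x) / denom

    p_obs = prob(a)
    support = range(max(0, n_meth - (n - n_a)), min(n_a, n_meth) + 1)
    return min(1.0, sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-7)))

def _expected(a, b, c, d):
    """Expected counts if A and B had the same methylated proportion."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    return [rows[i] * cols[j] / n for i in range(2) for j in range(2)]

def chi_square(a, b, c, d):
    """Pearson chi-square test without continuity correction (1 d.f.)."""
    stat = sum((o - e) ** 2 / e for o, e in zip((a, b, c, d), _expected(a, b, c, d)))
    return erfc(sqrt(stat / 2))  # survival function of chi-square with 1 d.f.

def g_test(a, b, c, d):
    """Log-likelihood ratio (G) test (1 d.f.); empty cells contribute nothing."""
    stat = 2 * sum(o * log(o / e) for o, e in zip((a, b, c, d), _expected(a, b, c, d)) if o > 0)
    return erfc(sqrt(max(stat, 0.0) / 2))  # guard against tiny negative rounding

for table in [(2, 0, 0, 2), (20, 10, 10, 20), (200, 100, 100, 200)]:
    print(table, *(f"{fn(*table):.2e}" for fn in (fisher_exact, chi_square, g_test)))
```

For the 4-call table (2, 0, 0, 2) the uncorrected chi-square test gives p of about 0.046 while Fisher's exact test gives 1/3, so the tests disagree badly at low counts. With hundreds of calls all three return p-values far below any conventional threshold.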

9 Chi-Square results

10 Biological considerations
- Minimum relevant effect size?
  - Balance power against size of change
  - What makes biological sense (what would you follow up?)
- Minimum coverage worth testing
  - No point testing poorly covered regions
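Because the multiple testing burden scales with the number of tests, dropping sites that could never reach significance raises the threshold for everything that remains. A minimal Python sketch of a coverage pre-filter, using a hypothetical tuple layout and cutoff. It filters on coverage only: selecting sites by their observed difference before testing would not be independent of the test itself, so an effect-size cutoff is better applied to the results.

```python
def prefilter(sites, min_coverage):
    """Keep sites whose coverage reaches min_coverage in both samples.

    sites: list of (meth_a, unmeth_a, meth_b, unmeth_b) tuples.
    Only coverage is inspected, never the methylation difference.
    """
    return [s for s in sites
            if s[0] + s[1] >= min_coverage and s[2] + s[3] >= min_coverage]

# Hypothetical counts for five CpGs
sites = [(1, 0, 0, 1), (8, 2, 3, 7), (0, 3, 2, 2), (15, 5, 5, 15), (4, 1, 1, 4)]
kept = prefilter(sites, min_coverage=10)
alpha = 0.05
print(f"{len(kept)} of {len(sites)} sites kept")
print(f"Bonferroni threshold before: {alpha / len(sites):.4f}, after: {alpha / len(kept):.4f}")
```

In this toy example only 2 of 5 sites pass, and the per-test Bonferroni threshold rises from 0.01 to 0.025.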

11 Effect of pre-filtering

12 Distribution of methylation
The chi-square test relies on a normal approximation, but methylation data isn't normally distributed: most CpGs are either almost fully methylated or almost fully unmethylated.

13 Beta-binomial distribution
A beta-binomial model gives more relevant statistics than chi-square, but a custom model has to be fitted to the actual data.
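A beta-binomial model treats each sample's true methylation level as drawn from a Beta(alpha, beta) distribution, so the counts vary more (are overdispersed) than a binomial with the same mean would allow. The Python sketch below evaluates the probability mass function and compares variances; the parameter values are arbitrary illustrations, not fitted to any data, and the fitting step the slide mentions is not shown.

```python
from math import comb, exp, lgamma

def log_beta_fn(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def betabinom_pmf(k, n, alpha, beta):
    """P(K = k) for K ~ BetaBinomial(n, alpha, beta)."""
    return comb(n, k) * exp(log_beta_fn(k + alpha, n - k + beta) - log_beta_fn(alpha, beta))

def betabinom_var(n, alpha, beta):
    """Var(K) = n p (1 - p) [1 + (n - 1) rho],
    with p = alpha / (alpha + beta) and rho = 1 / (alpha + beta + 1)."""
    p = alpha / (alpha + beta)
    rho = 1 / (alpha + beta + 1)
    return n * p * (1 - p) * (1 + (n - 1) * rho)

n, alpha, beta = 20, 2.0, 2.0  # mean methylation 50%; illustrative values only
print(f"binomial variance:      {n * 0.5 * 0.5:.2f}")
print(f"beta-binomial variance: {betabinom_var(n, alpha, beta):.2f}")
```

With these parameters the beta-binomial variance (24) is nearly five times the binomial variance (5) at the same 50% mean, which is the extra spread that a plain binomial or chi-square treatment ignores.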

14 Implications of a beta distribution
Many summaries assume normality:
- Mean
- Standard deviation
- Boxplots
None of these is strictly appropriate when looking at methylation data.
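A small Python illustration with made-up values: for a bimodal set of CpG methylation levels, the mean lands where no CpG actually is, and mean ± SD runs outside the possible 0-100% range.

```python
from statistics import mean, stdev

# Hypothetical per-CpG methylation percentages with the usual bimodal shape.
values = [2, 3, 5, 95, 97, 98]

m, sd = mean(values), stdev(values)
print(f"mean = {m:.1f}%, SD = {sd:.1f}")
print(f"mean +/- SD spans {m - sd:.1f}% to {m + sd:.1f}%")  # outside the possible 0-100% range
print("distance from the mean to the nearest CpG:", min(abs(v - m) for v in values))
```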

15 Dealing with replicates
Simple approach: merge data from replicates together
- Single test, high power
- Post-hoc test for consistency
Explicitly account for batch effects: logistic regression
- Measures batch effects and excludes them from the final significance calculation
Work with methylation values
- Normalise percentage methylation values
- Use conventional statistics (t-tests etc.) for comparing groups
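Two of these options can be sketched in a few lines of Python. Merging is just summing the count tables before a single test; for the "work with methylation values" route, the sketch assumes an M-value style log-ratio as the normalising transform (one common choice, not necessarily the one the talk had in mind) followed by Welch's t statistic. All names and counts are hypothetical, turning t into a p-value would need a t-distribution from a statistics library, and the logistic regression option is not shown.

```python
from math import log2, sqrt
from statistics import mean, variance

def merge_replicates(replicates):
    """Simple approach: pool (meth, unmeth) counts across replicates."""
    return (sum(m for m, _ in replicates), sum(u for _, u in replicates))

def m_value(meth, unmeth, offset=1.0):
    """Assumed normalising transform: log2 ratio of methylated to unmethylated
    calls, with an offset so that zero counts do not give log(0)."""
    return log2((meth + offset) / (unmeth + offset))

def welch_t(x, y):
    """Welch's t statistic for two groups of per-replicate values."""
    return (mean(x) - mean(y)) / sqrt(variance(x) / len(x) + variance(y) / len(y))

group_a = [(40, 10), (35, 15), (45, 5)]   # hypothetical (meth, unmeth) per replicate
group_b = [(12, 38), (20, 30), (15, 35)]

print("merged A:", merge_replicates(group_a), "merged B:", merge_replicates(group_b))
t = welch_t([m_value(*r) for r in group_a], [m_value(*r) for r in group_b])
print(f"Welch t on M-values: {t:.2f}")
```

Merging gives one high-power table per group but hides any disagreement between replicates; the per-replicate route keeps that variability in the denominator of the t statistic.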

16 Hierarchical testing
- Test larger regions (windows, features etc.)
- Take significant hits and subdivide them (smaller windows, individual CpGs)
- Correct only for these tests
- Assemble hits together to make up DMRs
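A compact Python sketch of these steps. The specific choices are illustrative assumptions, not the talk's: fixed windows as the larger regions, Fisher's exact test at both levels, Bonferroni correction within each level, and a hypothetical `max_gap` for joining significant CpGs into DMRs. Because the same counts are reused at both levels, the final p-values are better read as a ranking than as exact error rates.

```python
from math import comb

def fisher_p(a, b, c, d):
    """Two-sided Fisher's exact test on [[meth_a, unmeth_a], [meth_b, unmeth_b]]."""
    n_a, n_meth, n = a + b, a + c, a + b + c + d
    denom = comb(n, n_meth)
    probs = [comb(n_a, x) * comb(n - n_a, n_meth - x) / denom
             for x in range(max(0, n_meth - (n - n_a)), min(n_a, n_meth) + 1)]
    p_obs = comb(n_a, a) * comb(n - n_a, n_meth - a) / denom
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))

def hierarchical_dmrs(sites, window_size, alpha=0.05, max_gap=500):
    """sites: list of (position, meth_a, unmeth_a, meth_b, unmeth_b).
    Returns a list of (start, end) DMRs."""
    # Stage 1: pool counts per window and test every window.
    windows = {}
    for pos, *counts in sites:
        windows.setdefault(pos // window_size, []).append((pos, counts))
    hits = []
    for members in windows.values():
        pooled = [sum(col) for col in zip(*(c for _, c in members))]
        if fisher_p(*pooled) < alpha / len(windows):
            hits.extend(members)
    # Stage 2: subdivide significant windows into single CpGs,
    # correcting only for the CpG tests actually performed.
    sig = sorted(pos for pos, c in hits if fisher_p(*c) < alpha / len(hits)) if hits else []
    # Stage 3: assemble nearby significant CpGs into DMRs.
    dmrs = []
    for pos in sig:
        if dmrs and pos - dmrs[-1][1] <= max_gap:
            dmrs[-1][1] = pos
        else:
            dmrs.append([pos, pos])
    return [tuple(d) for d in dmrs]

# Hypothetical data: one strongly changed region, one unchanged region.
sites = ([(p, 20, 0, 0, 20) for p in range(100, 600, 100)]
         + [(p, 10, 10, 10, 10) for p in range(1100, 1600, 100)])
print(hierarchical_dmrs(sites, window_size=1000))
```

Only the 5 CpGs in the significant window are tested individually, so the second-level correction is for 5 tests rather than 10.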

17 Hierarchical testing
[Figure: Genome → CGI (CpG island) test hierarchy, shown in three stages with branches marked X]
A statistically 'creative' solution to not having enough data

18 Methylation statistics packages
- swDMR (Perl/R package): sliding-window DMR finding; choice of t-test, Kolmogorov-Smirnov, Fisher, chi-square or Wilcoxon tests for n = 2 samples, ANOVA or Kruskal-Wallis for more than 2
- methylKit* (R package by A. Akalin et al.): sliding windows, Fisher's exact test or logistic regression; adjusts p-values to q-values using the SLIM method
- bsseq* (R/Bioconductor by K.D. Hansen): implements the BSmooth smoothing algorithm; CpG-wise t-tests and a p-value cutoff to define DMRs; outperforms Fisher's exact test; requires biological replicates for DMR detection
- BiSeq* (R/Bioconductor by K. Hebestreit et al.): beta regression model; impractical for very large data sets other than RRBS or targeted BS-Seq
- RnBeads* (R package by F. Mueller et al.): works with 450K array, BS-Seq, MeDIP or MBD-Seq data
- DMAP* (C command-line tool by P. Stockwell et al.): RRBS fragment or fixed-window approach; Fisher's exact test, chi-square or ANOVA
- RADMeth (C++ command-line tool by E. Dolzhenko and A.D. Smith): beta-binomial regression to find differentially methylated cytosines (DMCs) or DMRs; local likelihood; adjusts for neighbouring CpGs
- MOABS* (C++ command-line tool by D. Sun et al.): beta-binomial hierarchical model capturing sampling and biological variation; Credible Methylation Difference (CDIF), a single metric combining biological and statistical significance
- ComMet (Y. Saito et al., 2014): part of the Bisulfighter suite; DMR detection based on hidden Markov models (HMMs) that automatically adjust the criteria for chaining DMCs; does not require biological replicates
- DSS (R/Bioconductor by Feng et al., 2014): builds a genome-wide prior distribution for the beta-binomial dispersion; Bayesian hierarchical model to detect differentially methylated loci
More appearing every other week…
* interface well with

19 Tool comparison
bsseq
  Statistical test: sample-wise smoothing, then group differences via CpG-wise t-tests (a p-value cutoff defines adjacent CpG sites as DMRs)
  Suitable for: WGBS; not designed for targeted BS-Seq or RRBS
  Implementation: R package / Bioconductor
  Notes: outperforms Fisher's exact test; intended to compare 2 groups; replicates required
BiSeq
  Statistical test: define CpG clusters, smooth methylation data, then model and test the group effect (fit a beta regression model to smoothed methylation levels and test for a group effect with the Wald test); hierarchical testing procedure on CpG clusters, then define DMR boundaries
  Suitable for: RRBS; targeted BS-Seq
  Notes: very computationally intensive for WGBS; not limited to 2 groups
MethylKit
  Statistical test: models CpG methylation with logistic regression; sliding linear model (SLIM) to correct for multiple testing
  Suitable for: (e)RRBS
  Implementation: R package
* WGBS = whole genome BS-Seq; (e)RRBS = (enhanced) reduced representation BS-Seq

20 bsseq – for whole genome BS-Seq
- Smooths low-coverage BS-Seq data first, to get reliable semi-local methylation estimates
- Not suitable for captured or restricted data
- After smoothing, uses biological replicates to estimate biological variation and identify differentially methylated regions (DMRs)
- Smoothing works even for a single sample
- Works for CpG context in humans; will probably not scale to the 2x585M Cs in non-CG context
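BSmooth itself fits a local likelihood smoother; the Python sketch below is only a simpler stand-in for the same idea, a coverage-weighted Gaussian kernel average, to show how borrowing information from neighbouring CpGs stabilises estimates from low-coverage data. The function name, bandwidth and data are hypothetical.

```python
from math import exp

def smooth_methylation(positions, meth, total, bandwidth=500.0):
    """Coverage-weighted Gaussian kernel estimate of methylation at each CpG.

    Neighbour j contributes w_ij * meth_j to the numerator and w_ij * total_j
    to the denominator, so well-covered CpGs count for more.
    Not BSmooth's actual local-likelihood fit.
    """
    smoothed = []
    for x in positions:
        num = den = 0.0
        for xj, mj, nj in zip(positions, meth, total):
            w = exp(-0.5 * ((x - xj) / bandwidth) ** 2)
            num += w * mj
            den += w * nj
        smoothed.append(num / den)
    return smoothed

# Hypothetical low-coverage data (1-3 reads per CpG): raw values jump between 0 and 1.
positions = [100, 180, 260, 350, 420, 510]
meth = [1, 0, 2, 1, 3, 1]
total = [1, 1, 2, 2, 3, 1]
print([round(m / n, 2) for m, n in zip(meth, total)])
print([round(s, 2) for s in smooth_methylation(positions, meth, total)])
```

The bandwidth controls the trade-off seen with window sizes: a wider kernel pools more reads but blurs boundaries between regions.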

21 BSmooth algorithm
[Figure: smoothed methylation estimates; black: 25x coverage (Lister), pink: 4x coverage (Lister)]

22 BSmooth t-values
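The bsseq approach turns CpG-wise t-statistics into DMRs by applying a cutoff and joining adjacent CpGs. A minimal Python sketch of that grouping step; the function name and the `cutoff`, `max_gap` and `min_cpgs` values are illustrative assumptions, not bsseq's API or defaults.

```python
def runs_to_dmrs(positions, tstats, cutoff, max_gap=300, min_cpgs=3):
    """Group consecutive CpGs whose t-statistics exceed +cutoff (or fall below
    -cutoff) into candidate DMRs, returned as (start, end, direction).

    A run breaks when the sign changes, a CpG falls inside the cutoff, or the
    gap to the previous CpG in the run exceeds max_gap.
    """
    dmrs, run = [], []
    for pos, t in zip(positions, tstats):
        sign = 1 if t > cutoff else -1 if t < -cutoff else 0
        if run and (sign != run[-1][1] or pos - run[-1][0] > max_gap):
            if len(run) >= min_cpgs:
                dmrs.append((run[0][0], run[-1][0], run[0][1]))
            run = []
        if sign:
            run.append((pos, sign))
    if len(run) >= min_cpgs:
        dmrs.append((run[0][0], run[-1][0], run[0][1]))
    return dmrs

# Hypothetical t-statistics along a stretch of genome
positions = [100, 150, 200, 250, 300, 800, 850, 900, 950]
tstats = [5.1, 6.0, 4.9, 5.5, 0.3, -5.2, -6.1, -4.8, -5.0]
print(runs_to_dmrs(positions, tstats, cutoff=4.6))
```

Requiring a consistent sign keeps hyper- and hypomethylated stretches in separate DMRs, and `min_cpgs` discards isolated CpGs that happen to cross the cutoff.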
