Differential Methylation Analysis Simon 1.


1 Differential Methylation Analysis Simon

2 A basic question…

3 Factors to consider
– Number of observations
– Magnitude of effect
– Technical considerations
– Biological variability
– Biological common sense

4 The problem of power…
– Ideally want to cover every Cytosine (CpG)
– Have to correct for the number of tests
– There’s no way you’ll collect enough data to analyse each C and have p-values which survive multiple testing correction
– Stats have to find a way to work round this.
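To put a rough number on the power problem, the sketch below asks how deep the coverage at a single CpG would have to be before even a perfect 0% vs 100% difference could survive correction. Everything specific here is an assumption for illustration and not from the slides: Bonferroni as the correction, Fisher's exact test as the per-site test, roughly 28 million CpGs as the number of tests, and the function names.

```python
from math import comb

def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff under Bonferroni correction."""
    return alpha / n_tests

def min_fisher_p(depth):
    """Smallest two-sided Fisher's exact p-value possible when both samples
    have `depth` reads at a site: all reads methylated in A, none in B."""
    return 2 / comb(2 * depth, depth)

def min_depth_to_pass(alpha, n_tests):
    """Lowest per-sample read depth at which a 0% vs 100% site could pass."""
    threshold = bonferroni_threshold(alpha, n_tests)
    depth = 1
    while min_fisher_p(depth) > threshold:
        depth += 1
    return depth

# Assumed figure: about 28 million CpG sites tested individually.
print(min_depth_to_pass(0.05, 28_000_000))
```

Under these assumptions a site needs 17 reads in each sample before the most extreme possible result passes, and any realistic, smaller difference needs far more.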

5 Maximising power
Options
– Analyse in windows
– Pre-filter
– Hierarchical or Adaptive filtering

6 Window sizes
Small windows
– Good resolution
– Specific biological effects
– High multiple testing correction (MTC) burden
– Few observations
– High p-values
Large windows
– Lots of data
– High statistical power
– Low MTC burden
– Low p-values
– Effect averaging
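The trade-off can be seen by pooling per-CpG calls into fixed-size windows. This is a minimal sketch: the `(position, meth, unmeth)` tuple layout, the function name and the example numbers are hypothetical.

```python
from collections import defaultdict

def window_counts(calls, window_size):
    """Sum methylated/unmethylated calls into fixed, non-overlapping windows.

    calls: iterable of (position, meth, unmeth) for one sample.
    Returns {window_start: (meth, unmeth)} ordered by window start.
    """
    totals = defaultdict(lambda: [0, 0])
    for pos, meth, unmeth in calls:
        start = (pos // window_size) * window_size
        totals[start][0] += meth
        totals[start][1] += unmeth
    return {start: tuple(counts) for start, counts in sorted(totals.items())}

calls = [(101, 3, 1), (150, 2, 2), (420, 0, 5), (980, 4, 0), (1010, 1, 3)]
print(window_counts(calls, 100))  # more windows, fewer reads in each
print(window_counts(calls, 500))  # fewer windows, more reads in each
```

With the larger window the first three CpGs collapse into one test with 13 reads, but the fully unmethylated CpG at position 420 is averaged in with its neighbours.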

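7 Simple Statistical Approach
Is the proportion of methylated calls different between two samples, given the number of observations?
Table columns: Meth count A | Unmeth count A | Meth count B | Unmeth count B | % change | Significant?
Significant? column for the three example rows: No, No, Probably (the count values are not preserved in the transcript)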

8 Contingency tests
Chi-square / G-test / Fisher’s exact test
– Differ only at low observations
– Significant changes require enough observations that any of these should give the same answer
Operates on single replicates
Technical measure of difference
Contingency table layout: Meth A | Unmeth A (first row), Meth B | Unmeth B (second row)
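A small sketch of two of these tests on the 2x2 table of calls, written with the standard library only so the arithmetic is visible. The function names are hypothetical, the chi-square version uses no continuity correction, and the two-sided Fisher p-value sums all tables no more probable than the observed one, which is one common convention.

```python
from math import comb, erfc, sqrt

def chi_square_p(meth_a, unmeth_a, meth_b, unmeth_b):
    """Pearson chi-square p-value for a 2x2 table (1 degree of freedom)."""
    n = meth_a + unmeth_a + meth_b + unmeth_b
    row_a, row_b = meth_a + unmeth_a, meth_b + unmeth_b
    col_m, col_u = meth_a + meth_b, unmeth_a + unmeth_b
    if 0 in (row_a, row_b, col_m, col_u):
        return 1.0  # an empty margin means there is nothing to compare
    chi2 = n * (meth_a * unmeth_b - unmeth_a * meth_b) ** 2 / (
        row_a * row_b * col_m * col_u)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return erfc(sqrt(chi2 / 2))

def fisher_exact_p(meth_a, unmeth_a, meth_b, unmeth_b):
    """Two-sided Fisher's exact p-value for a 2x2 table."""
    row_a, row_b = meth_a + unmeth_a, meth_b + unmeth_b
    col_m = meth_a + meth_b
    denom = comb(row_a + row_b, col_m)

    def prob(x):
        # Hypergeometric probability of x methylated calls in sample A
        return comb(row_a, x) * comb(row_b, col_m - x) / denom

    p_obs = prob(meth_a)
    lo, hi = max(0, col_m - row_b), min(row_a, col_m)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

for table in [(3, 1, 1, 3), (300, 100, 100, 300)]:
    print(table, chi_square_p(*table), fisher_exact_p(*table))
```

Both tables show 75% vs 25% methylation. With 4 reads per sample the two tests give visibly different p-values and neither is significant; with 400 reads per sample both are so small that the choice of test no longer matters.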

9 Chi-Square results

10 Biological considerations
Minimum relevant effect size?
– Balance power vs change
– What makes biological sense
– (what would you follow up?)
Minimum coverage worth testing
– No point testing poorly covered regions
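A pre-filter along these lines might look like the sketch below. The thresholds (10 calls per sample, 20 percentage points difference), the tuple layout and the function name are placeholders, not values from the slides.

```python
def prefilter(regions, min_coverage=10, min_diff=0.2):
    """Keep only regions worth testing.

    regions: iterable of (name, meth_a, unmeth_a, meth_b, unmeth_b).
    A region is kept when both samples reach min_coverage calls and the
    absolute difference in methylation level is at least min_diff.
    """
    kept = []
    for name, meth_a, unmeth_a, meth_b, unmeth_b in regions:
        cov_a = meth_a + unmeth_a
        cov_b = meth_b + unmeth_b
        if cov_a < min_coverage or cov_b < min_coverage:
            continue  # too poorly covered to be worth a test
        diff = abs(meth_a / cov_a - meth_b / cov_b)
        if diff >= min_diff:
            kept.append(name)
    return kept

regions = [
    ("r1", 2, 1, 0, 3),       # only 3 calls per sample
    ("r2", 40, 10, 38, 12),   # 80% vs 76%: well covered, tiny change
    ("r3", 45, 5, 20, 30),    # 90% vs 40%
]
print(prefilter(regions))  # ['r3']
```

Only regions that pass the filter are then tested, so the multiple testing correction is applied to one test here instead of three.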

11 Effect of pre-filtering

12 Distribution of methylation
The chi-square test relies on a normal approximation, and methylation data isn’t normally distributed

13 Beta-binomial distribution
A more relevant statistical model than chi-square.
A custom model needs to be fitted to the actual data.
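To show why the beta-binomial is a better match for this kind of data, the sketch below compares its spread with a plain binomial at the same mean. It only evaluates the distribution; it does not fit a model to data, and the shape parameters `a = b = 0.5` are arbitrary illustrative values.

```python
from math import comb, exp, lgamma

def log_beta(x, y):
    """Natural log of the Beta function B(x, y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)

def beta_binomial_pmf(k, n, a, b):
    """P(k methylated calls out of n) when the underlying methylation
    level is itself drawn from a Beta(a, b) distribution."""
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))

def binomial_pmf(k, n, p):
    """P(k methylated calls out of n) at a fixed methylation level p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def variance(pmf_values):
    """Variance of a distribution given as [P(0), P(1), ..., P(n)]."""
    mean = sum(k * p for k, p in enumerate(pmf_values))
    return sum((k - mean) ** 2 * p for k, p in enumerate(pmf_values))

n = 20
bb = [beta_binomial_pmf(k, n, 0.5, 0.5) for k in range(n + 1)]
bi = [binomial_pmf(k, n, 0.5) for k in range(n + 1)]
print(variance(bi), variance(bb))
```

Both distributions have a mean of 10 methylated calls out of 20, but the beta-binomial variance is about ten times larger (52.5 against 5) because most of its mass sits near 0 and 20 calls.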

14 Implications of a beta distribution
Many summaries assume normality
– Mean
– Standard Deviation
– Boxplots
None of these is strictly appropriate when looking at methylation data

15 Dealing with replicates
Simple approach
– Merge data from replicates together
– Single test, High power
– Post-hoc test for consistency
Explicitly account for batch effects
– Logistic regression
– Measures batch effects and excludes them from final significance calculation
Work with methylation values
– Normalise percentage methylation values
– Use conventional statistics (t-tests etc) for comparing groups
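The simple approach can be sketched as follows. The slides do not say which post-hoc consistency test is meant, so the check used here (every replicate of one group lies on the same side of every replicate of the other) is an assumption, as are the function names and the `(meth, unmeth)` layout. Each replicate is assumed to have at least one call.

```python
def merge_replicates(replicates):
    """Pool (meth, unmeth) counts from all replicates of one condition."""
    meth = sum(m for m, _ in replicates)
    unmeth = sum(u for _, u in replicates)
    return meth, unmeth

def consistent_direction(group_a, group_b):
    """Assumed post-hoc check: True when the per-replicate methylation
    levels of the two groups do not overlap at all."""
    levels_a = [m / (m + u) for m, u in group_a]
    levels_b = [m / (m + u) for m, u in group_b]
    return min(levels_a) > max(levels_b) or max(levels_a) < min(levels_b)

group_a = [(8, 2), (7, 3), (9, 1)]
group_b = [(2, 8), (3, 7), (1, 9)]
print(merge_replicates(group_a), merge_replicates(group_b))
print(consistent_direction(group_a, group_b))
```

The merged counts go into a single contingency test, and a hit is only kept if the consistency check agrees, so one outlying replicate cannot drive the result on its own.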

16 Hierarchical testing
Test larger regions
– Windows / Features etc.
Take significant hits and subdivide
– Smaller windows
– Individual CpGs
– Correct only for these tests
Assemble hits together to make up DMRs
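The two levels can be sketched as below. The choices here are assumptions made for illustration: a chi-square test at both levels, Bonferroni correction, one subdivision step straight from windows to individual CpGs, and the data layout. Assembling adjacent hits into DMRs is left out.

```python
from math import erfc, sqrt

def chi_square_p(meth_a, unmeth_a, meth_b, unmeth_b):
    """Pearson chi-square p-value for a 2x2 table (1 degree of freedom)."""
    n = meth_a + unmeth_a + meth_b + unmeth_b
    margins = (meth_a + unmeth_a, meth_b + unmeth_b,
               meth_a + meth_b, unmeth_a + unmeth_b)
    if 0 in margins:
        return 1.0
    chi2 = n * (meth_a * unmeth_b - unmeth_a * meth_b) ** 2 / (
        margins[0] * margins[1] * margins[2] * margins[3])
    return erfc(sqrt(chi2 / 2))

def hierarchical_test(windows, alpha=0.05):
    """windows: {name: [(meth_a, unmeth_a, meth_b, unmeth_b), ...]},
    one tuple per CpG. Returns (significant windows, significant CpGs)."""
    # Level 1: one test per window on pooled counts, corrected for windows.
    window_cutoff = alpha / len(windows)
    significant = []
    for name, cpgs in windows.items():
        pooled = [sum(column) for column in zip(*cpgs)]
        if chi_square_p(*pooled) < window_cutoff:
            significant.append(name)
    # Level 2: test CpGs inside significant windows only, and correct
    # only for the number of CpGs tested at this level.
    n_sub = sum(len(windows[name]) for name in significant)
    hits = []
    for name in significant:
        for index, counts in enumerate(windows[name]):
            if chi_square_p(*counts) < alpha / n_sub:
                hits.append((name, index))
    return significant, hits

windows = {
    "w1": [(18, 2, 3, 17), (19, 1, 2, 18), (10, 10, 11, 9)],
    "w2": [(10, 10, 9, 11), (12, 8, 11, 9)],
}
print(hierarchical_test(windows))
```

Only three CpG-level tests are corrected for here, not five, which is where the extra power comes from. The second-level p-values are conditional on the first level having passed, so they are not equivalent to a single genome-wide correction.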

17 Hierarchical testing
[Figure: three genome tracks with CpG islands (CGI) marked; successive tracks add X marks at tested positions]
Statistically ‘creative’ solution to not having enough data

18 Methylation statistics packages
– swDMR (Perl/R-package): Sliding window DMR finding (choose between t_test, Kolmogorov, Fisher, ChiSquare, Wilcoxon for n = 2; ANOVA, Kruskal for n > 3)
– methylKit* (R-package by A. Akalin et al.): Sliding window, Fisher’s exact test or logistic regression. Adjusts p-values to q-values using SLIM method.
– bsseq* (R/Bioconductor by K.D. Hansen): Implements the BSmooth smoothing algorithm. Numerous CpG-wise t-tests and p-value cutoff to define DMRs. Outperforms Fisher’s exact test. Requires biological replicates for DMR detection
– BiSeq* (R/Bioconductor by K. Hebestreit et al.): Beta regression model, impractical for very large data other than RRBS or targeted BS-Seq
– RnBeads* (R package by F. Mueller et al.): works for 450K arrays, BS-Seq, MeDIP or MBD-Seq data
– DMAP* (C command line tool by P. Stockwell et al.): RRBS fragment or fixed window approach, Fisher’s exact test, Chi-squared or ANOVA
– RADMeth (C++ command line tool by E. Dolzhenko and A.D. Smith): Beta-binomial regression analysis to find DMCs or DMRs, local likelihood, adjust for neighbouring CpGs
– MOABS* (C++ command line tool by D. Sun et al.): Beta binomial hierarchical model to capture sampling and biological variation, Credible Methylation Difference (CDIF) single metric that combines biological and statistical significance
– ComMet (Y. Saito et al., 2014): Bisulfighter suite; DMR detection based on hidden Markov models (HMMs) that enable automated adjustment of DMC chaining criteria. Does not require biological replicates
– DSS (R/Bioconductor by Feng et al., 2014): Constructs genome-wide prior distribution for beta-binomial dispersion. Bayesian hierarchical model to detect differentially methylated loci
more appearing every other week…
* interface well with

19 Tool comparison (Tool | Statistical test | Suitable for | Implementation | Notes)
bsseq
– Statistical test: Sample-wise smoothing, then group differences via CpG-wise t-tests (p-value cutoff to define adjacent CpG sites as DMRs)
– Suitable for: WGBS; not designed for targeted BS-Seq or RRBS
– Implementation: R package/Bioconductor
– Notes: Outperforms Fisher’s exact test; intended to compare 2 groups; replicates required
BiSeq
– Statistical test: Define CpG clusters, smooth methylation data, model and test group effect (fitting beta regression model to smoothed methylation levels and testing for group effect using the Wald test), hierarchical testing procedure on CpG clusters, then define DMR boundaries
– Suitable for: RRBS; targeted BS-Seq; for WGBS
– Implementation: R package/Bioconductor
– Notes: Very computationally intensive; Not limited to 2 groups
methylKit
– Statistical test: Models CpG methylation within a logistic regression. Sliding linear model (SLIM) to correct for multiple testing
– Suitable for: (e)RRBS
– Implementation: R package
* WGBS = whole genome BS-Seq; (e)RRBS = (enhanced) reduced representation BS-Seq

20 bsseq – for whole genome BS-Seq
– Smoothing of low coverage BS-Seq first to get reliable semi-local methylation estimates
– Not suitable for captured or restricted data
– After smoothing it uses biological replicates to estimate biological variation and identify differentially methylated regions (DMRs)
– Smoothing suitable for even a single sample
– Works for CpG context in humans, will probably not scale to 2x585M Cs in non-CG context
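The idea of borrowing information from neighbouring CpGs can be illustrated with a much simpler smoother than the one bsseq implements. The sketch below is not the BSmooth algorithm: it is a plain kernel-weighted average with a tricube weight, and the function name, tuple layout and bandwidth are assumptions.

```python
def smooth_methylation(sites, bandwidth):
    """Kernel-weighted methylation estimate at each CpG.

    sites: list of (position, meth, coverage) for one sample.
    Neighbours closer than `bandwidth` bases contribute their methylated
    and total read counts, down-weighted by distance (tricube kernel).
    Returns one smoothed level per site, in input order.
    """
    smoothed = []
    for pos, _, _ in sites:
        weighted_meth = 0.0
        weighted_cov = 0.0
        for other_pos, meth, cov in sites:
            dist = abs(other_pos - pos) / bandwidth
            if dist < 1:
                weight = (1 - dist ** 3) ** 3
                weighted_meth += weight * meth
                weighted_cov += weight * cov
        # No reads in range means no estimate can be made for this site.
        smoothed.append(weighted_meth / weighted_cov if weighted_cov else float("nan"))
    return smoothed

sites = [(100, 1, 1), (120, 3, 4), (140, 4, 4), (5000, 0, 2)]
print(smooth_methylation(sites, bandwidth=100))
```

The CpG at position 100 has a single read, so its raw level of 100% is unreliable; the smoothed value is pulled towards its better-covered neighbours. The isolated CpG at position 5000 has no neighbours in range and keeps its raw level.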

21 BSmooth algorithm
[Figure legend: black: 25x coverage (Lister); pink: 4x coverage (Lister)]

22 BSmooth t-values

