ARIES Methylation Pre-processing and Clean up


1 ARIES Methylation Pre-processing and Clean up
Geoff Woodward

2 Overview
Initial QC; Normalisation; Batch Correction; Data; MWAS (Methylome-Wide Association Study); Results

3 Initial QC
Probe p-value: confidence in detection; overall QC indicator.
Background (-ve controls): overall QC indicator.
Problems to look for: high background; low signal; poor stringency.

4 Initial QC: Control Probes
Mixture of sample-dependent and sample-independent controls.
Sample independent: staining (biotin/DNP); hybridisation (synthetic target); extension (hairpin).
Sample dependent: bisulfite conversion (HindIII site); G/T mismatch (non-specific); specificity and non-polymorphic; negative.

5 Initial QC: LIMS

6 LIMS Control DashBoard
Real time (Jscript/JSON). Zoom and scroll. All Illumina control probes, +ve and -ve. Area: max, median, min.

7 Initial QC: MDS
Start pre-processing. What’s affecting the data? Failures and controls.

8 Initial QC: MDS
Remove controls/failures. Remove sex chromosomes.

9 Sample Confirmation
Genotyping: 65 SNP probes; k-means clustering; call genotype; cross-reference with SNP data; calculate % match.
Fully automated in the pipeline. Stored in LIMS.
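
The genotype-calling steps on this slide can be illustrated with a minimal sketch. It assumes (this is not stated in the slides) that each SNP probe's β values are clustered on their own across samples, that the three clusters map to AA/AB/BB from low to high β, and that starting centres of 0.1, 0.5 and 0.9 are reasonable. The function names and the data are hypothetical, and only one probe is shown.

```python
# Hypothetical sketch: call genotypes from SNP-probe beta values with 1-D k-means.

def kmeans_1d(values, centres=(0.1, 0.5, 0.9), iterations=20):
    """Cluster values around three centres; returns (labels, centres)."""
    centres = list(centres)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to its nearest centre.
        labels = [min(range(len(centres)), key=lambda k: abs(v - centres[k]))
                  for v in values]
        # Move each centre to the mean of its members (unchanged if empty).
        for k in range(len(centres)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centres[k] = sum(members) / len(members)
    return labels, centres


def percent_match(called, reference):
    """Percentage of calls that agree with the reference genotypes."""
    agree = sum(1 for c, r in zip(called, reference) if c == r)
    return 100.0 * agree / len(called)


GENOTYPES = ("AA", "AB", "BB")  # assumed mapping: low, middle, high beta

# Made-up betas for one SNP probe across six samples.
betas = [0.05, 0.08, 0.48, 0.52, 0.91, 0.95]
labels, centres = kmeans_1d(betas)
called = [GENOTYPES[lab] for lab in labels]

# Made-up reference genotypes from an independent SNP data set.
reference = ["AA", "AA", "AB", "AB", "BB", "AB"]
match = percent_match(called, reference)
```

In a full pipeline the same comparison would presumably be made per sample across all SNP probes, so that a low % match flags a possible sample mix-up.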

10 Normalisation
Why? Cancer vs. control: not required. More sensitive differences: required.
Quantile? Rank and scale according to a reference distribution (the average).
Not appropriate, because Type I and II assays differ:
Medians sit at opposite ends of the β scale.
SD (across replicates) is smaller in Type I probes.
They interrogate different subsets of the genome: Type II has a greater proportion in open sea; Type I has a greater proportion in gene promoters.
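
As an illustration of plain quantile normalisation ("rank and scale according to a reference distribution"), here is a minimal sketch. It assumes the reference is the mean of the sorted values at each rank and breaks ties by position; the function name and the numbers are hypothetical, and this is not the pipeline's code.

```python
def quantile_normalise(samples):
    """samples: list of equal-length lists, one per array.

    Returns a new list of lists in which every array has the same
    distribution: the rank-wise mean of the sorted inputs.
    """
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: average of the values sharing each rank.
    reference = [sum(s[i] for s in sorted_samples) / len(samples)
                 for i in range(n)]
    normalised = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # indices, smallest first
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # replace value by reference at its rank
        normalised.append(out)
    return normalised


# Two made-up arrays of three probes each.
result = quantile_normalise([[5, 2, 3], [4, 1, 6]])
```

After this step every array shares one distribution, which is why applying it across Type I and Type II probes together would erase the genuine differences listed on this slide.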

11 Normalisation: Method 1
Subset Within Array Normalisation (SWAN, minfi).
To address differences in distribution: the number of CpGs in the probe body indicates density/location; distributions are more similar within these groups.
Approach: reference quantiles from N random Type I and II probes selected for each group; split methylated/unmethylated channels; linear interpolation fits probes to the reference.
Doesn’t treat Type I and II separately, BUT does decrease the difference.

12 Normalisation: Method 2
Touleimat & Tost.
To address differences: CpG region (shore / shelf / island / open sea); treat Type I and II separately.
Approach: reference quantiles; Type I probes used as “anchors” for each region (more reliable, lower SD); estimate target quantiles; fit Type II to target.

13 Normalisation: Method 3
Dasen (wateRmelon). Under review.
Separate QN of: methylated Type I; unmethylated Type I; methylated Type II; unmethylated Type II intensities.
Both directions.
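
The "separate QN" idea can be sketched as quantile normalisation applied within each probe type, run once on the methylated and once on the unmethylated intensities (four groups in total). This is a simplified illustration only: the function names and data are hypothetical, and the real dasen method in wateRmelon may include further steps that are not shown here.

```python
def quantile_normalise(samples):
    """Give every sample the rank-wise mean distribution (see slide 10)."""
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    reference = [sum(s[i] for s in sorted_samples) / len(samples)
                 for i in range(n)]
    normalised = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalised.append(out)
    return normalised


def normalise_by_probe_type(intensities, probe_types):
    """Quantile-normalise Type I and Type II probes independently.

    intensities: list of per-sample intensity lists (one channel).
    probe_types: 'I' or 'II' for each probe position.
    """
    result = [list(s) for s in intensities]
    for ptype in ("I", "II"):
        idx = [i for i, t in enumerate(probe_types) if t == ptype]
        if not idx:
            continue
        subset = [[s[i] for i in idx] for s in intensities]
        for out_row, norm_row in zip(result, quantile_normalise(subset)):
            for i, value in zip(idx, norm_row):
                out_row[i] = value  # write back into the original positions
    return result


# Made-up methylated-channel intensities: two samples, four probes.
probe_types = ["I", "I", "II", "II"]
methylated = [[100, 300, 1000, 3000], [200, 400, 2000, 4000]]
methylated_norm = normalise_by_probe_type(methylated, probe_types)
# The unmethylated channel would be passed through the same function.
```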

14 Normalisation: Comparison
wateRmelon metrics: imprinted DMRs (iDMRs).
237 probes within iDMRs. iDMRs are expected to be ~50% methylated.
SE = SD / √N, where SD is the SD of all 237 probes and N is the number of samples.
iDMRs compared across: Raw, Dasen, Tost, Swan.
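
The SE formula above can be written out as a short sketch. It assumes the SD is taken over the pooled β values of all iDMR probes in all samples; the slide does not spell this out, so wateRmelon's own implementation may differ in detail. The function name and data are hypothetical.

```python
import math
import statistics


def idmr_standard_error(betas):
    """betas: list of per-probe lists, each holding one beta per sample.

    Returns SD / sqrt(N): SD of the pooled iDMR betas, N = number of samples.
    """
    n_samples = len(betas[0])
    pooled = [b for probe in betas for b in probe]
    return statistics.stdev(pooled) / math.sqrt(n_samples)


# Made-up betas for two iDMR probes measured in two samples.
se = idmr_standard_error([[0.4, 0.6], [0.5, 0.5]])
```

A smaller value means the iDMR probes sit more tightly around a common level after normalisation, which is the sense in which the methods are ranked.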

15 Normalisation: Comparison
SNP probes: 63 highly polymorphic SNP probes; k-means clustering into 3 genotypes; SE-like measure for each group.

         AA          AB          BB
Raw      9.025e-05   1.910e-04   5.145e-05
Dasen    1.669e-04   2.047e-04   2.321e-05
Tost     8.253e-05   5.242e-04   1.541e-04
Swan     NA          NA

16 Normalisation: Comparison
wateRmelon metrics: X-chromosome inactivation.
11,232 probes. T-test all probes for sex differences. ROC analysis using the p-value for sex difference.
Metric is 1 - AUC, with 0 being the perfect predictor and best sex separation.

         X-Inact.
Raw      0.0947
Dasen    0.0889
Tost     0.0892
Swan     0.4952
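
A minimal sketch of this metric follows. Two assumptions are made: the absolute pooled-variance t statistic is used as the ranking score in place of the p-value (with the same samples for every probe, the two give the same ordering), and the ROC analysis asks how well that score separates X-chromosome probes from autosomal ones. All names and numbers are hypothetical.

```python
import math
import statistics


def t_statistic(group_a, group_b):
    """Pooled-variance two-sample t statistic."""
    na, nb = len(group_a), len(group_b)
    va, vb = statistics.variance(group_a), statistics.variance(group_b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    return diff / math.sqrt(pooled * (1 / na + 1 / nb))


def auc(scores, is_positive):
    """Chance that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up betas per probe: (female samples, male samples).
probes = [
    ([0.50, 0.52, 0.48], [0.10, 0.12, 0.08]),  # X-linked
    ([0.45, 0.55, 0.50], [0.20, 0.15, 0.25]),  # X-linked
    ([0.30, 0.32, 0.31], [0.31, 0.30, 0.32]),  # autosomal
    ([0.70, 0.72, 0.68], [0.69, 0.71, 0.70]),  # autosomal
]
on_x = [True, True, False, False]

scores = [abs(t_statistic(f, m)) for f, m in probes]
metric = 1.0 - auc(scores, on_x)  # 0 would be perfect separation
```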

17 Comparison: Density Plots
Metrics are great, but how do they really affect the data? Panels: all, Type I, Type II.

18 Comparison: Density Plots
Normalised distributions All typeI typeII

19 Comparison: Scatter Plot
Pepsi Plot – you’ll see why! Raw (x) vs. Normalised (y) typeI typeII SWAN Tost dasen

20 Comparison: Scatter Plot

21 Batch Correction: Exp. Design
Bisulphite conversion: excess of samples > 48; redundant controls; QC and PCR.
MSA4 plate: well dictates chip position (robot); randomised; min. 4 of each time point; max. 1 control; mix of gender.
Infinium 450k chips: 12 arrays per chip; throughput doubled.

22 Batch Correction: Metadata
LIMS tracking: every process; all consumables (~20, formamide to hyb. buffers; > 1000 used so far!); all equipment (fridge/centrifuge/PCR block).

23 Batch Correction
What are we seeing? Bisulphite batch.
Correction: many algorithms available (SVD/SVA/DWD), from gene expression.
ComBat: Chen C, Grennan K, Badner J, Zhang D, Gershon E, et al. (2011) Removing Batch Effects in Analysis of Expression Microarray Data: An Evaluation of Six Batch Adjustment Methods. PLoS ONE 6(2): e doi: /journal.pone
Empirical Bayesian framework: create a model matrix; supply the batch variable; standardise gene-wise (least squares approach); fit the L/S model and find priors; adjust to empirical parametric priors.

24 Batch Correction
Example data: batch correct the Tost-normalised data.
Use M values, then convert back to β. Values can escape the 0-1 limit, so scale them (0.02% of probes). Distribution unaffected.
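
For reference, one common definition of the β/M relationship is the base-2 logit, sketched below. The slides do not say which conversion the pipeline used, so treat these formulas and function names as assumptions.

```python
import math


def beta_to_m(beta):
    """M value as the base-2 logit of beta (beta strictly between 0 and 1)."""
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m):
    """Inverse of beta_to_m."""
    return 2.0 ** m / (2.0 ** m + 1.0)


m_half = beta_to_m(0.5)               # unmethylated and methylated balanced
round_trip = m_to_beta(beta_to_m(0.8))
```

With this particular inverse the result always lies strictly between 0 and 1, so the out-of-range values mentioned on the slide presumably come from a different conversion or an offset that is not described here.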

25 Batch Correction: BEFORE

26 Batch Correction: AFTER

27 Datasets
ARIES pre-release: filtered probes; SNP probes.

Age group    n
Cord         584
F7           598
TF3 (15)     64
F17          280
Antenatal    394
FOM          329

28 MWAS Choice of servers: Epi-garrod BlueCrystal

29 Epi-garrod
Request an account via IT-services for: epi-garrod.bris.ac.uk
Relatively quiet server in the dept. No queuing system, so check htop before running jobs. Cord data requires ~15% RAM.

30 Epi-garrod
Data: on the SAN, accessible from multiple servers: /mnt/sscm3/ARIES_DATA/…
Permissions for this folder: you must be a member of the aries group.

31 Blue Crystal
Request an account via: https://www.acrc.bris.ac.uk/login-area/apply.cgi
Queuing handled.
Data: /gpfs/cluster/smed/alspac-shared/aries/…
Again, permissions required: member of the aries group.

32 Files
ALN_dasen_<<time_code>>_betas.Rdata
ALN_tost_<<time_code>>_betas.Rdata
<<time_code>>_manifest.Rdata
fdata.Rdata
MWAS.r

33 ALN_dasen_<<time_code>>_betas.Rdata

34 <<time_code>>_manifest.Rdata

35 Fdata_new.RData

36 CpGassoc
Available from CRAN. Tests for association between an independent variable and methylation. Option to include additional covariates. Assesses significance with Holm (step-down Bonferroni) and FDR methods.
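
To make the two multiple-testing corrections concrete, here is a minimal sketch of Holm's step-down adjustment and of the Benjamini-Hochberg procedure. Benjamini-Hochberg is chosen here as one representative FDR method, which is an assumption; this is not CpGassoc's code, and the function names and p-values are hypothetical.

```python
def holm_adjust(pvalues):
    """Holm (step-down Bonferroni) adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # smallest first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        value = min(1.0, (m - rank) * pvalues[idx])
        running_max = max(running_max, value)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


def bh_adjust(pvalues):
    """Benjamini-Hochberg FDR adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # 1-based rank among ascending p-values
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values for four CpG sites.
pvals = [0.01, 0.04, 0.03, 0.005]
holm = holm_adjust(pvals)
fdr = bh_adjust(pvals)
```

Holm controls the family-wise error rate and is therefore the stricter of the two; the FDR adjustment tolerates a controlled fraction of false positives, which matters when hundreds of thousands of CpG sites are tested.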

37 MWAS.r

38 MWAS.r continued...

39 MWAS.r continued...

40 Manhattan / QQ
Replicated the results of the following study: 450K Epigenome-Wide Scan Identifies Differential DNA Methylation in Newborns Related to Maternal Smoking during Pregnancy. Bonnie R. Joubert, et al.
Gene hits: GFI1, AHRR, MYO1G, CYP1A1.
"CYP1A1 plays a key role in the aryl hydrocarbon receptor signaling pathway, which mediates the detoxification of the components of tobacco smoke." - Joubert, et al.

41 Results file

42 BlueCrystal .bashrc

43 Any Questions?

