1
ARIES Methylation Pre-processing and Clean up
Geoff Woodward
2
Overview: Initial QC; Normalisation; Batch Correction; Data; MWAS (Methylome-Wide Association Study); Results
3
Initial QC
Probe detection p-value: confidence in detection; overall QC indicator
Background (negative controls): overall QC indicator; flags high background, low signal and poor stringency
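Detection p-values are usually obtained by comparing each probe's total signal (methylated + unmethylated) against a background distribution estimated from the negative controls. Below is a minimal sketch in the style of minfi's `detectionP()`, assuming a single pool of negative-control intensities (the real function uses the colour channel matching each probe) and a failure threshold of 0.01 chosen here purely for illustration:

```python
from statistics import NormalDist, median


def scaled_mad(values):
    """Median absolute deviation scaled by 1.4826 to estimate an SD (as R's mad())."""
    centre = median(values)
    return 1.4826 * median([abs(v - centre) for v in values])


def detection_pvalues(meth, unmeth, negative_controls):
    """P(background >= observed M + U) for each probe.

    Background for the summed signal is modelled as Normal with mean
    2 * median(neg) and SD 2 * MAD(neg). Using one pool of negative
    controls for every probe is a simplifying assumption.
    """
    background = NormalDist(mu=2 * median(negative_controls),
                            sigma=2 * scaled_mad(negative_controls))
    return [1.0 - background.cdf(m + u) for m, u in zip(meth, unmeth)]


def failed_probes(pvalues, threshold=0.01):
    """Indices of probes whose detection p-value exceeds the (assumed) threshold."""
    return [i for i, p in enumerate(pvalues) if p > threshold]


negatives = [180, 210, 195, 205, 190, 220, 200]
pvals = detection_pvalues(meth=[5000, 150, 2500], unmeth=[800, 120, 3000],
                          negative_controls=negatives)
print(failed_probes(pvals))  # → [1]: the low-signal middle probe fails
```

The same p-values can be summarised per sample (e.g. the fraction of failed probes) to flag whole arrays.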
4
Initial QC: Control Probes
A mixture of sample-dependent and sample-independent controls:
Sample-independent: Staining (biotin/DNP); Hybridisation (synthetic target); Extension (hairpin)
Sample-dependent: Bisulfite conversion (HindIII site); G/T mismatch (non-specific); Specificity; Non-polymorphic; Negative
5
Initial QC: LIMS
6
LIMS Control Dashboard
Real time (JScript/JSON), with zoom and scroll
All Illumina control probes, positive and negative
Summaries: area, max, median, min
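A dashboard like this needs a compact per-control summary it can fetch as JSON. The sketch below builds one with max, median and min across samples; the field names, control names and layout are hypothetical (the dashboard's real schema is not shown), and the slide's "area" statistic is left out because its definition is not given.

```python
import json
from statistics import median


def control_summary(intensities):
    """Per-control max/median/min across samples, ready to serialise as JSON.

    `intensities` maps a control-probe name to its per-sample intensities.
    """
    return {
        name: {"max": max(values), "median": median(values), "min": min(values)}
        for name, values in intensities.items()
    }


example = {
    "STAINING_biotin": [21000, 19500, 22500],  # hypothetical sample-independent control
    "NEGATIVE_1": [210, 190, 250],             # hypothetical negative control
}
print(json.dumps(control_summary(example), indent=2))
```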
7
Initial QC: MDS
Start of pre-processing: what's affecting the data?
Highlighted in the plot: failures and controls
8
Initial QC: MDS
Remove controls/failures
Remove sex chromosomes
9
Sample Confirmation: Genotyping
65 SNP probes; k-means clustering to call genotypes
Cross-reference with SNP data and calculate % match
Fully automated in the pipeline; results stored in LIMS
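The genotype-confirmation steps can be sketched as a one-dimensional k-means (k = 3) per SNP probe, followed by a per-sample concordance check against genotype data. Everything here is illustrative: the probe IDs are hypothetical, and coding the lowest-beta cluster as genotype 0 is an assumption (the mapping to alleles depends on probe design and on how the reference genotypes are coded).

```python
def kmeans_1d(values, k=3, iterations=100):
    """Lloyd's k-means on one SNP probe's betas; labels are ordered by cluster centre."""
    ordered = sorted(values)
    n = len(ordered)
    centres = [ordered[int((c + 0.5) * n / k)] for c in range(k)]  # spread-out start
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centres[c])
        if updated == centres:
            break
        centres = updated
    # relabel so 0 = lowest-beta cluster and k-1 = highest (assumed genotype coding)
    rank = {c: r for r, c in enumerate(sorted(range(k), key=lambda c: centres[c]))}
    return [rank[lab] for lab in labels]


def percent_match(calls, reference, sample):
    """% of shared SNPs where the array call for `sample` matches the genotype data."""
    shared = [snp for snp in calls if snp in reference]
    hits = sum(calls[snp][sample] == reference[snp][sample] for snp in shared)
    return 100.0 * hits / len(shared)


betas = {"rs_a": [0.05, 0.5, 0.95, 0.1, 0.48, 0.9],   # hypothetical probes, 6 samples
         "rs_b": [0.9, 0.92, 0.5, 0.1, 0.08, 0.55]}
calls = {snp: kmeans_1d(v) for snp, v in betas.items()}
genotypes = {"rs_a": [0, 1, 2, 0, 1, 2], "rs_b": [2, 2, 1, 0, 0, 0]}
print([percent_match(calls, genotypes, s) for s in range(6)])
# → [100.0, 100.0, 100.0, 100.0, 100.0, 50.0]: sample 6 is flagged for review
```

A probe where only two genotypes occur in the cohort will be split into three clusters anyway, so a real pipeline would need a guard for that case.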
10
Normalisation
Why? Cancer vs. control: not required. Needed to detect more subtle differences...
Quantile? Ranks each sample and scales it to a reference distribution (the average).
Not appropriate here, because the Type I and Type II assays differ:
Medians at opposite ends of the β scale
SD (across replicates) smaller for Type I probes
They interrogate different subsets of the genome: a greater proportion of Type II probes lie in the open sea, and a greater proportion of Type I probes in gene promoters
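For reference, standard quantile normalisation (the approach this slide argues against applying naively to 450k data) replaces each value by the average of the values that hold the same rank across samples. A minimal sketch, with ties broken by position (an assumption; production implementations handle ties more carefully):

```python
def quantile_normalise(samples):
    """Map every sample onto the mean of the sorted samples (the reference distribution)."""
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    reference = [sum(s[i] for s in sorted_samples) / len(samples) for i in range(n)]
    result = []
    for s in samples:
        out = [0.0] * n
        # the value with rank r in this sample receives the reference value of rank r
        for rank, idx in enumerate(sorted(range(n), key=lambda i: s[i])):
            out[idx] = reference[rank]
        result.append(out)
    return result


print(quantile_normalise([[5, 2, 3], [4, 1, 6]]))
# → [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

Because every probe is forced onto one shared distribution, the systematic Type I vs. Type II differences listed above would be blurred rather than corrected.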
11
Normalisation: Method 1
Subset-quantile Within Array Normalisation (SWAN, minfi)
To address differences in distribution: the number of CpGs in the probe body indicates CpG density/location, and distributions are more similar within these groups.
Approach: build reference quantiles from N random Type I and N random Type II probes selected for each group; treat the methylated and unmethylated channels separately; fit probes to the reference by linear interpolation.
Doesn't treat Type I and II separately, BUT does decrease the difference between them.
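The SWAN approach described above can be sketched for one channel of one CpG-count group. This follows the slide's steps only; it is not minfi's `preprocessSWAN()` code, and clamping values outside the sampled subset's range to the end points is an assumption made to keep the sketch short.

```python
import random
from bisect import bisect_left


def interpolate(x, xs, ys):
    """Piecewise-linear map from sorted xs to ys, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect_left(xs, x)  # guarantees xs[j-1] < x <= xs[j], so no division by zero
    x0, x1, y0, y1 = xs[j - 1], xs[j], ys[j - 1], ys[j]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def swan_like_group(type1, type2, n, seed=0):
    """Normalise one channel of one CpG-count group, in the spirit of SWAN.

    n random Type I and n random Type II intensities are sorted and averaged
    to give reference quantiles; every probe of each type is then mapped onto
    that reference by linear interpolation along its own sorted subset.
    """
    rng = random.Random(seed)
    sub1 = sorted(rng.sample(type1, n))
    sub2 = sorted(rng.sample(type2, n))
    reference = [(a + b) / 2 for a, b in zip(sub1, sub2)]
    return ([interpolate(v, sub1, reference) for v in type1],
            [interpolate(v, sub2, reference) for v in type2])


norm1, norm2 = swan_like_group([1, 2, 3, 4], [10, 20, 30, 40], n=4)
print(norm1, norm2)  # both types land on the same reference: [5.5, 11.0, 16.5, 22.0]
```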
12
Normalisation: Method 2
Touleimat & Tost
To address differences: group probes by CpG region (island / shore / shelf / open sea) and treat Type I and II separately.
Approach: reference quantiles. Type I probes are used as "anchors" for each region (more reliable, lower SD) to estimate the target quantiles; Type II probes are then fitted to the target.
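The anchor idea can be illustrated with a simplified rank mapping: within each CpG region, Type I betas define the target quantiles and each Type II beta is replaced by the Type I quantile at its own rank. This is a stand-in to show the principle, not the published algorithm; the dict-of-regions layout is hypothetical.

```python
def empirical_quantile(sorted_vals, p):
    """Quantile p (0..1) of sorted values, linearly interpolated between order statistics."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def fit_type2_to_type1(type1, type2):
    """Replace each Type II beta by the Type I (anchor) quantile at the same relative rank."""
    anchors = sorted(type1)
    m = len(type2)
    out = [0.0] * m
    for rank, idx in enumerate(sorted(range(m), key=lambda i: type2[i])):
        p = rank / (m - 1) if m > 1 else 0.5
        out[idx] = empirical_quantile(anchors, p)
    return out


def normalise_by_region(type1_by_region, type2_by_region):
    """Apply the fit separately within each CpG region; Type I values are left unchanged."""
    return {region: fit_type2_to_type1(type1_by_region[region], values)
            for region, values in type2_by_region.items()}


print(normalise_by_region({"island": [0.1, 0.5, 0.9]}, {"island": [0.6, 0.2, 0.4]}))
# → {'island': [0.9, 0.1, 0.5]}
```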
13
Normalisation: Method 3
Dasen (wateRmelon), under review
Separate quantile normalisation (QN) of four intensity sets: methylated Type I, unmethylated Type I, methylated Type II and unmethylated Type II.
Both directions
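The core of the scheme above (four separate quantile normalisations, then betas) can be sketched as follows. Any background-adjustment step that wateRmelon's `dasen()` performs is omitted, and the offset of 100 in β = M / (M + U + 100) follows the usual Illumina convention but is an assumption here; each argument is a list of samples, each a list of probe intensities.

```python
def quantile_normalise(samples):
    """Quantile-normalise samples (equal-length lists) to their mean distribution."""
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    reference = [sum(s[i] for s in sorted_samples) / len(samples) for i in range(n)]
    result = []
    for s in samples:
        out = [0.0] * n
        for rank, idx in enumerate(sorted(range(n), key=lambda i: s[i])):
            out[idx] = reference[rank]
        result.append(out)
    return result


def dasen_like(meth_i, unmeth_i, meth_ii, unmeth_ii, offset=100):
    """QN each of the four intensity sets on its own, then compute betas per probe type."""
    mi, ui, mii, uii = (quantile_normalise(x) for x in (meth_i, unmeth_i, meth_ii, unmeth_ii))

    def betas(meth, unmeth):
        return [[m / (m + u + offset) for m, u in zip(ms, us)] for ms, us in zip(meth, unmeth)]

    return betas(mi, ui), betas(mii, uii)


# two samples, one probe per set, purely illustrative intensities
type1_betas, type2_betas = dasen_like([[100], [300]], [[50], [150]], [[0], [0]], [[400], [0]])
print(type1_betas, type2_betas)  # → [[0.5], [0.5]] [[0.0], [0.0]]
```

Keeping the four sets apart means Type I and Type II intensities never share a reference distribution, which is the point of separating them.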
14
Normalisation: Comparison
wateRmelon metrics: imprinted DMRs (iDMRs)
237 probes within iDMRs; expected methylation at iDMRs ≈ 50%
SE = SD / √N, where SD is taken over all 237 probes and N = number of samples
(Plot: iDMR metric for Raw, Dasen, Tost and SWAN)
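The iDMR metric can be computed directly from the formula on the slide. This sketch takes a literal reading (SD pooled over all beta values of the iDMR probes, divided by √N samples); it is not wateRmelon's own code, and the probe IDs are hypothetical. Lower values indicate less spread around the expected ~50% methylation.

```python
from math import sqrt
from statistics import stdev


def idmr_se(betas, idmr_probes):
    """SE = SD / sqrt(N): SD over all iDMR beta values, N = number of samples.

    betas maps probe ID -> list of betas across samples.
    """
    values = [b for probe in idmr_probes for b in betas[probe]]
    n_samples = len(betas[idmr_probes[0]])
    return stdev(values) / sqrt(n_samples)


betas = {"cg_a": [0.4, 0.6], "cg_b": [0.5, 0.5], "cg_c": [0.9, 0.1]}  # hypothetical
print(round(idmr_se(betas, ["cg_a", "cg_b"]), 6))  # → 0.057735
```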
15
Normalisation: Comparison
SNP probes: 63 highly polymorphic SNP probes; k-means clustering into 3 genotypes; SE-like measure for each group
Method   AA          AB          BB
Raw      9.025e-05   1.910e-04   5.145e-05
Dasen    1.669e-04   2.047e-04   2.321e-05
Tost     8.253e-05   5.242e-04   1.541e-04
SWAN     NA          NA          NA
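An SE-like measure per genotype group could be computed as below, given cluster calls from the k-means step. Both the SD / √n form and averaging per-probe values across the 63 probes are assumptions, since the slide does not define how the table's numbers were combined.

```python
from math import sqrt
from statistics import stdev

GENOTYPES = ("AA", "AB", "BB")


def group_se(betas, genotypes):
    """SD / sqrt(n) of one SNP probe's betas within each genotype cluster."""
    result = {}
    for g in GENOTYPES:
        members = [b for b, call in zip(betas, genotypes) if call == g]
        result[g] = stdev(members) / sqrt(len(members)) if len(members) > 1 else float("nan")
    return result


def mean_group_se(per_probe):
    """Average the per-probe values of each genotype group over all SNP probes."""
    return {g: sum(d[g] for d in per_probe) / len(per_probe) for g in GENOTYPES}


probe = group_se([0.1, 0.2, 0.5, 0.5, 0.9, 1.0], ["AA", "AA", "AB", "AB", "BB", "BB"])
print(probe)  # AA and BB ≈ 0.05, AB = 0.0
```

Tighter genotype clusters (smaller values) suggest a normalisation that preserves real biological signal without adding noise.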
16
Normalisation: Comparison
wateRmelon metrics: X-chromosome inactivation
11,232 probes
t-test all probes for sex differences
ROC analysis using the p-values for sex difference
Metric: 1 - AUC, 0 being the perfect predictor and the best sex separation
Method   1 - AUC
Raw      0.0947
Dasen    0.0889
Tost     0.0892
SWAN     0.4952
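The ROC step can be sketched with the Mann-Whitney form of the AUC: the probability that an X-chromosome probe has a smaller sex-difference p-value than an autosomal probe. Treating X-linked probes as the positives is our reading of the slide, and the per-probe t-tests are assumed to have been run already; this is not wateRmelon's code.

```python
def one_minus_auc(pvalues, is_x):
    """1 - AUC for ranking X-chromosome probes (positives) above autosomal ones by p-value.

    0 means every X probe has a smaller sex-difference p-value than every
    autosomal probe (perfect sex separation); ties count as half.
    """
    pos = [p for p, x in zip(pvalues, is_x) if x]
    neg = [p for p, x in zip(pvalues, is_x) if not x]
    wins = 0.0
    for a in pos:
        for b in neg:
            if a < b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return 1.0 - wins / (len(pos) * len(neg))


print(one_minus_auc([1e-10, 1e-8, 0.5, 0.9], [True, True, False, False]))  # → 0.0
```

The pairwise loop is O(n_X × n_autosomal); for real data a rank-based formula gives the same AUC much faster.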
17
Comparison: Density Plots
Metrics are great, but how do they really affect the data?
(Density plots: all probes, Type I, Type II)
18
Comparison: Density Plots
Normalised distributions (density plots: all probes, Type I, Type II)
19
Comparison: Scatter Plot
Pepsi plot (you'll see why!): raw (x) vs. normalised (y), Type I and Type II, for SWAN, Tost and Dasen
20
Comparison: Scatter Plot
21
Batch Correction: Exp. Design
Bisulphite conversion: excess of samples (> 48); redundant controls; QC and PCR
MSA4 plate: well dictates chip position (robot); randomised with a minimum of 4 of each time point, a maximum of 1 control and a mix of genders
Infinium 450k chips: 12 arrays per chip; throughput doubled
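Because the plate well dictates chip position, the layout rules can be enforced by randomising sample order and rejecting layouts that break them. The sketch below is one reading of the rules ("min. 4 of each time point" taken as: any time point present on a chip has at least 4 samples there); the dict fields are hypothetical.

```python
import random
from collections import Counter

ARRAYS_PER_CHIP = 12  # Infinium 450k: 12 arrays per chip


def chip_ok(chip, min_per_timepoint=4, max_controls=1):
    """True if one chip satisfies the layout rules (one reading of the slide)."""
    study = [s for s in chip if not s["control"]]
    counts = Counter(s["timepoint"] for s in study)
    if any(n < min_per_timepoint for n in counts.values()):
        return False  # a time point is present but under-represented
    if len(chip) - len(study) > max_controls:
        return False  # too many controls on this chip
    return len({s["sex"] for s in study}) > 1  # require a mix of genders


def randomise_to_chips(samples, seed=0, max_tries=10_000):
    """Shuffle samples into chips of 12 until every chip passes chip_ok, else return None."""
    rng = random.Random(seed)
    pool = list(samples)
    for _ in range(max_tries):
        rng.shuffle(pool)
        chips = [pool[i:i + ARRAYS_PER_CHIP] for i in range(0, len(pool), ARRAYS_PER_CHIP)]
        if all(chip_ok(chip) for chip in chips):
            return chips
    return None
```

Rejection sampling keeps the layout genuinely random among valid layouts; `max_tries` guards against rule sets that cannot be satisfied.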
22
Batch Correction: Metadata
LIMS tracking:
Every process
All consumables (~20, from formamide to hybridisation buffers; > 1000 used so far!)
All equipment (fridge/centrifuge/PCR block)
23
Batch Correction: ComBat
What are we seeing? Bisulphite batch effects
Correction: many algorithms are available (SVD/SVA/DWD); from gene expression: ComBat
Chen C, Grennan K, Badner J, Zhang D, Gershon E, et al. (2011) Removing Batch Effects in Analysis of Expression Microarray Data: An Evaluation of Six Batch Adjustment Methods. PLoS ONE 6(2): e… doi: …/journal.pone…
Empirical Bayes framework:
Create a model matrix and supply the batch variable
Standardise gene-wise (least squares approach)
Fit the location/scale (L/S) model to find the priors
Adjust to the empirical parametric priors
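To show what the location/scale (L/S) part of the model does, here is a deliberately simplified per-probe batch adjustment. It is not ComBat: it has no covariates from a model matrix and no empirical Bayes shrinkage of the batch estimates, which is what ComBat adds (in R it is available as `ComBat()` in the Bioconductor sva package).

```python
from math import sqrt
from statistics import mean


def ls_batch_adjust(values, batches):
    """Location/scale batch adjustment of one probe (no covariates, no EB shrinkage).

    Each batch is standardised by its own mean and SD, then mapped back onto
    the overall mean and the pooled within-batch SD.
    """
    grand = mean(values)
    groups = {}
    for i, b in enumerate(batches):
        groups.setdefault(b, []).append(i)
    stats, ss = {}, 0.0
    for b, idx in groups.items():
        mu = mean(values[i] for i in idx)
        ss_b = sum((values[i] - mu) ** 2 for i in idx)
        stats[b] = (mu, sqrt(ss_b / len(idx)))
        ss += ss_b
    pooled = sqrt(ss / len(values))
    out = []
    for v, b in zip(values, batches):
        mu, sd = stats[b]
        z = (v - mu) / sd if sd > 0 else 0.0
        out.append(grand + z * pooled)
    return out


print(ls_batch_adjust([1, 2, 3, 11, 12, 13], ["A"] * 3 + ["B"] * 3))
# ≈ [6, 7, 8, 6, 7, 8]: the 10-unit batch shift is removed
```

Without shrinkage, small batches give noisy mean and SD estimates; borrowing strength across probes through the priors is the main advantage of the empirical Bayes approach.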
24
Batch Correction: Example data
Batch-correct Tost-normalised data using M values, then convert back to β
Values can escape the 0-1 limit: rescale (0.02% of probes); distribution unaffected
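The β/M conversion used around batch correction is M = log2(β / (1 - β)), with inverse β = 2^M / (1 + 2^M). A small sketch, plus a helper to measure how many values fall outside [0, 1] (e.g. when betas are corrected directly). Clipping β by a small `eps` before the log is an assumption; some pipelines instead add an offset to the M and U intensities.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """M = log2(beta / (1 - beta)); eps keeps beta away from exactly 0 or 1."""
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def m_to_beta(m):
    """Inverse transform, written to avoid overflow for large |m|."""
    if m >= 0:
        return 1.0 / (1.0 + 2.0 ** -m)
    t = 2.0 ** m
    return t / (1.0 + t)


def fraction_outside_unit(betas):
    """Share of values outside [0, 1]."""
    return sum(not 0.0 <= b <= 1.0 for b in betas) / len(betas)


print(beta_to_m(0.5), m_to_beta(beta_to_m(0.8)))  # → 0.0 and ~0.8
print(fraction_outside_unit([0.5, 1.02, -0.01, 0.3]))  # → 0.5
```

Any finite M maps back into (0, 1), which is why correcting on the M scale keeps the back-transformed β values in range.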
25
Batch Correction: BEFORE
26
Batch Correction: AFTER
27
Datasets
ARIES pre-release: filtered probes (SNP probes)
Age group    n
Cord         584
F7           598
TF3 (15)     64
F17          280
Antenatal    394
FOM          329
28
MWAS
Choice of servers: Epi-garrod or BlueCrystal
29
Epi-garrod
Request an account via IT Services for: epi-garrod.bris.ac.uk
Relatively quiet server in the department
No queuing system: check htop before running jobs
Cord data requires ~15% of RAM
30
Epi-garrod Data: SAN
Accessible from multiple servers: /mnt/sscm3/ARIES_DATA/…
Permissions for this folder: you must be a member of the aries group
31
BlueCrystal
Request an account via:
Queuing handled
Data: /gpfs/cluster/smed/alspac-shared/aries/…
Again, permissions required: member of the aries group
32
Files
ALN_dasen_<<time_code>>_betas.Rdata
ALN_tost_<<time_code>>_betas.Rdata
<<time_code>>_manifest.Rdata
fdata.Rdata
MWAS.r
33
ALN_dasen_<<time_code>>_betas.Rdata
34
<<time_code>>_manifest.Rdata
35
Fdata_new.RData
36
CpGassoc (CRAN)
Tests for association between an independent variable and methylation
Option to include additional covariates
Assesses significance with Holm (step-down Bonferroni) and FDR methods
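The two multiple-testing corrections named above can be sketched as follows. Benjamini-Hochberg is shown as the FDR method because it is the most common choice; whether CpGassoc uses exactly this procedure is an assumption. Both functions return adjusted p-values in the original probe order.

```python
def holm(pvalues):
    """Holm step-down adjusted p-values (controls the family-wise error rate)."""
    m = len(pvalues)
    adjusted = [0.0] * m
    running = 0.0
    for rank, i in enumerate(sorted(range(m), key=lambda i: pvalues[i])):
        # multiply the k-th smallest p by (m - k + 1), keeping the sequence monotone
        running = max(running, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running
    return adjusted


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(pvalues)
    adjusted = [0.0] * m
    running = 1.0
    for k, i in enumerate(sorted(range(m), key=lambda i: pvalues[i], reverse=True)):
        rank = m - k  # 1-based rank in ascending order
        running = min(running, pvalues[i] * m / rank)
        adjusted[i] = running
    return adjusted


p = [0.01, 0.04, 0.03, 0.005]
print(holm(p))                # ≈ [0.03, 0.06, 0.06, 0.02]
print(benjamini_hochberg(p))  # ≈ [0.02, 0.04, 0.04, 0.02]
```

Holm is the stricter of the two; with ~485,000 CpGs tested, FDR control usually leaves more hits to follow up.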
37
MWAS.r
38
MWAS.r continued...
39
MWAS.r continued...
40
Manhattan / QQ
Replicated the results of the following study:
450K Epigenome-Wide Scan Identifies Differential DNA Methylation in Newborns Related to Maternal Smoking during Pregnancy. Bonnie R. Joubert et al.
Gene hits: GFI1, AHRR, MYO1G, CYP1A1
"CYP1A1 plays a key role in the aryl hydrocarbon receptor signaling pathway, which mediates the detoxification of the components of tobacco smoke." (Joubert et al.)
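The points behind these two plots can be computed as below: a QQ plot compares observed -log10 p against the values expected under a uniform null, and a Manhattan plot shows -log10 p in genomic order. Encoding chromosomes as integers is an assumption made so that sorting gives 1, 2, ..., 22 rather than string order.

```python
import math


def qq_points(pvalues):
    """(expected, observed) -log10 p pairs for a QQ plot, largest first.

    Expected values use the uniform quantiles (i + 0.5) / n for i = 0..n-1.
    """
    n = len(pvalues)
    observed = sorted((-math.log10(p) for p in pvalues), reverse=True)
    expected = [-math.log10((i + 0.5) / n) for i in range(n)]
    return list(zip(expected, observed))


def manhattan_points(results):
    """Sort (chrom, pos, p) tuples by chromosome then position and attach -log10 p."""
    return [(c, pos, -math.log10(p)) for c, pos, p in sorted(results, key=lambda r: (r[0], r[1]))]


print(qq_points([0.5, 0.01]))
print(manhattan_points([(5, 400, 0.5), (1, 900, 1e-8), (1, 100, 0.01)]))
```

Points lifting away from the diagonal only at the top right of the QQ plot are consistent with a few true signals, such as the smoking-related hits above, rather than genome-wide inflation.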
41
Results file
42
BlueCrystal .bashrc
43
Any Questions?