ARIES Methylation Pre-processing and Clean up


ARIES Methylation Pre-processing and Clean up Geoff Woodward

Overview: Initial QC; Normalisation; Batch Correction; Data; MWAS (Methylome-Wide Association Study); Results

Initial QC. Probe p-value: confidence in detection; overall QC indicator. Background (-ve controls): overall QC indicator. High background; low signal; poor stringency.

Initial QC: Control Probes. Mixture of sample-dependent and sample-independent controls. Sample independent: staining (Biotin/DNP); hybridisation (synthetic target); extension (hairpin). Sample dependent: bisulfite conversion (HindIII site); G/T mismatch (non-specific); specificity and non-polymorphic; negative.

Initial QC: LIMS

LIMS Control Dashboard: real time (JScript/JSON); zoom and scroll; all Illumina control probes (+ve and -ve); area: max, median, min.

Initial QC: MDS. Start pre-processing: what’s affecting the data? Failures; controls.

Initial QC: MDS. Remove controls/failures; remove sex chromosomes.

Sample Confirmation: genotyping with 65 SNP probes; k-means clustering; call genotype; cross-reference with SNP data; calculate % match. Fully automated in pipeline; stored in LIMS.
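
The sample-confirmation steps can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the function names, the starting centres (0.1, 0.5, 0.9) and the coding of genotypes as 0/1/2 (low, middle, high β cluster) are all assumptions made for the example.

```python
def kmeans_1d(values, centres=(0.1, 0.5, 0.9), n_iter=50):
    """Cluster 1-D beta values; returns one label (0, 1 or 2) per value."""
    centres = list(centres)  # assumed starting centres, ordered low to high
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assignment step: each value joins its nearest centre.
        labels = [min(range(len(centres)), key=lambda k: abs(v - centres[k]))
                  for v in values]
        # Update step: each centre moves to the mean of its members
        # (an empty cluster keeps its previous centre).
        new_centres = []
        for k in range(len(centres)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_centres.append(sum(members) / len(members) if members else centres[k])
        if new_centres == centres:
            break
        centres = new_centres
    return labels


def percent_match(called, reference):
    """Percentage of calls that agree with the reference genotypes."""
    pairs = [(c, r) for c, r in zip(called, reference) if r is not None]
    return 100.0 * sum(c == r for c, r in pairs) / len(pairs)


# Hypothetical beta values for one SNP probe across eight samples.
betas = [0.05, 0.08, 0.48, 0.52, 0.91, 0.95, 0.50, 0.10]
calls = kmeans_1d(betas)
# Hypothetical reference genotypes from SNP data, coded the same way.
reference = [0, 0, 1, 1, 2, 2, 2, 0]
match = percent_match(calls, reference)
```

In practice the orientation of the clusters (which end of the β scale corresponds to which allele) would have to be checked against the SNP data before comparing calls.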

Normalisation: why? Cancer vs. control: not required. More sensitive differences... Quantile? Rank and scale according to a reference distribution (average). Not appropriate: Type I and II assays differ. Medians at opposite ends of the β scale; SD (across replicates) smaller in Type I probes; they interrogate different subsets of the genome (Type II: greater proportion in open sea; Type I: greater proportion in gene promoters).
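
For reference, plain quantile normalisation ("rank and scale according to a reference distribution (average)") can be sketched as below. This is a generic illustration under simple assumptions (equal-length samples, ties broken by sort order), not the code of any package named in these slides.

```python
def quantile_normalise(samples):
    """samples: list of equal-length lists, one list of values per sample."""
    n = len(samples[0])
    # Reference distribution: the average across samples at each rank.
    sorted_cols = [sorted(s) for s in samples]
    reference = [sum(col[i] for col in sorted_cols) / len(samples)
                 for i in range(n)]
    normalised = []
    for s in samples:
        # Rank every value within its own sample, then replace it with
        # the reference value at that rank.
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalised.append(out)
    return normalised


# Two hypothetical samples measured on three probes.
raw = [[0.2, 0.8, 0.5], [0.4, 0.6, 0.9]]
qn = quantile_normalise(raw)
```

After this step every sample shares the same set of values, which is why applying it across Type I and Type II probes together is questioned on the slide.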

Normalisation: Method 1. Subset Within Array Normalisation (SWAN, minfi). To address differences in distribution: the number of CpGs in the probe body indicates density/location; distributions are more similar within these groups. Approach: reference quantiles from N random Type I and II probes selected for each group; split methylated/unmethylated channels; linear interpolation to fit probes to the reference. Doesn’t treat Type I and II separately BUT does decrease the difference.

Normalisation: Method 2. Touleimat & Tost. To address differences: CpG region (shore / shelf / island / open sea); treat Type I and II separately. Approach: reference quantiles; Type I probes used as “anchors” for each region (more reliable / lower SD); estimate target quantiles; fit Type II to target.

Normalisation: Method 3. Dasen (wateRmelon); under review. Separate QN of methylated Type I, unmethylated Type I, methylated Type II and unmethylated Type II intensities. Both directions.

Normalisation: Comparison. wateRmelon metrics: imprinted DMRs (iDMRs). 237 probes within iDMRs; iDMRs expected to be 50% methylated. SE = SD / √N, where SD is taken over all 237 probes and N = number of samples.
Method   iDMR SE
Raw      0.00431
Dasen    0.00241
Tost     0.00214
SWAN     0.00428
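
A small sketch of the SE = SD / √N calculation. It assumes the SD is pooled over every probe-by-sample β value in the iDMR set; that reading of "SD of all 237 probes" is an assumption, and the function name is made up for the example.

```python
import math
import statistics


def idmr_se(betas):
    """betas: one row per iDMR probe, each row holding per-sample beta values."""
    n_samples = len(betas[0])
    # Assumption: pool every probe-by-sample value into a single SD.
    pooled = [b for row in betas for b in row]
    return statistics.stdev(pooled) / math.sqrt(n_samples)


# Hypothetical data: two probes, two samples.
noisy = [[0.4, 0.6], [0.5, 0.5]]
tight = [[0.49, 0.51], [0.5, 0.5]]
se_noisy = idmr_se(noisy)
se_tight = idmr_se(tight)
```

A lower value means the iDMR probes sit more tightly together, which is the sense in which the table ranks the methods.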

Normalisation: Comparison. SNP probes: 63 highly polymorphic SNP probes; k-means clustering into 3 genotypes; SE-like measure for each group.
Method   AA          AB          BB
Raw      9.025e-05   1.910e-04   5.145e-05
Dasen    1.669e-04   2.047e-04   2.321e-05
Tost     8.253e-05   5.242e-04   1.541e-04
SWAN     NA          NA

Normalisation: Comparison. wateRmelon metrics: X-chromosome inactivation. 11,232 probes; t-test all probes for sex differences; ROC analysis using the p-value for sex difference. Metric is 1 - AUC, with 0 being the perfect predictor and best sex separation.
Method   X-inact. (1 - AUC)
Raw      0.0947
Dasen    0.0889
Tost     0.0892
SWAN     0.4952
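
The 1 - AUC idea can be sketched with a rank-based AUC. The example assumes the sex-difference p-value is used to predict whether a probe lies on the X chromosome (smaller p-value, more likely X); the function name and inputs are illustrative only.

```python
def one_minus_auc(p_values, on_x):
    """p_values: sex-difference p-value per probe; on_x: True if the probe is on X."""
    x_probes = [p for p, x in zip(p_values, on_x) if x]
    other = [p for p, x in zip(p_values, on_x) if not x]
    # AUC = probability that a random X probe has a smaller p-value
    # than a random non-X probe (ties count as half).
    wins = 0.0
    for px in x_probes:
        for po in other:
            if px < po:
                wins += 1.0
            elif px == po:
                wins += 0.5
    auc = wins / (len(x_probes) * len(other))
    return 1.0 - auc


# Hypothetical p-values: perfect separation, then one misordered probe.
perfect = one_minus_auc([0.001, 0.01, 0.2, 0.5, 0.9],
                        [True, True, False, False, False])
imperfect = one_minus_auc([0.001, 0.3, 0.2, 0.5],
                          [True, True, False, False])
```

The pairwise loop is quadratic, so a rank-sum formulation would be preferable for a full array; the loop is kept here for readability.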

Comparison: Density Plots. Metrics are great, but how do they really affect the data? Panels: all probes, Type I, Type II.

Comparison: Density Plots. Normalised distributions. Panels: all probes, Type I, Type II.

Comparison: Scatter Plot. Pepsi Plot: you’ll see why! Raw (x) vs. normalised (y), Type I and Type II, for SWAN, Tost and dasen.

Comparison: Scatter Plot

Batch Correction: Exp. Design. Bisulphite conversion: excess of samples (> 48); redundant controls; QC and PCR. MSA4 plate: well dictates chip position (robot); randomised; min. 4 of each time point; max. 1 control; mix of gender. Infinium 450k chips: 12 arrays per chip; throughput doubled.

Batch Correction: Metadata. LIMS tracking of every process; all consumables (~20, formamide to hyb. buffers; > 1000 used so far!); all equipment (fridge/centrifuge/PCR block).

Batch Correction: ComBat. What are we seeing? Bisulphite batch. Correction: many algorithms available (SVD/SVA/DWD), from gene expression; ComBat. Chen C, Grennan K, Badner J, Zhang D, Gershon E, et al. (2011) Removing Batch Effects in Analysis of Expression Microarray Data: An Evaluation of Six Batch Adjustment Methods. PLoS ONE 6(2): e17238. doi:10.1371/journal.pone.0017238. Empirical Bayesian framework: create a model matrix; supply batch variable; standardise gene-wise; least squares approach; fits L/S model to find priors; adjust to empirical parametric priors.

Batch Correction: Example data. Batch-correct Tost-normalised data; use M values; convert back to β. Values can escape the 0-1 limit: scale (0.02% of probes); distribution unaffected.
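
The β to M conversion and back can be sketched with the standard logit relationship, M = log2(β / (1 - β)) and β = 2^M / (2^M + 1). The function names are made up for this example, and the batch-correction step itself is not shown.

```python
import math


def beta_to_m(beta):
    """Convert a beta value (strictly between 0 and 1) to an M value."""
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m):
    """Convert an M value back to the beta scale."""
    return 2.0 ** m / (2.0 ** m + 1.0)


# Round trip for a few hypothetical beta values.
betas = [0.05, 0.2, 0.5, 0.9]
m_values = [beta_to_m(b) for b in betas]
recovered = [m_to_beta(m) for m in m_values]
```

β values of exactly 0 or 1 have no finite M value, so they would need to be offset slightly before conversion.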

Batch Correction: BEFORE

Batch Correction: AFTER

Datasets. ARIES pre-release: filtered probes; SNP probes.
Age group   n
Cord        584
F7          598
TF3 (15)    64
F17         280
Antenatal   394
FOM         329

MWAS. Choice of servers: Epi-garrod; BlueCrystal.

Epi-garrod. Request an account via IT Services for: epi-garrod.bris.ac.uk. Relatively quiet server in the dept.; no queuing system; check htop before running jobs. Cord data requires ~15% RAM.

Epi-garrod. Data: SAN, accessible from multiple servers: /mnt/sscm3/ARIES_DATA/… Permissions for this folder: you must be a member of the aries group.

BlueCrystal. Request an account via: https://www.acrc.bris.ac.uk/login-area/apply.cgi Queuing handled. Data: /gpfs/cluster/smed/alspac-shared/aries/… Again, permissions required: member of the aries group.

Files: ALN_dasen_<<time_code>>_betas.Rdata; ALN_tost_<<time_code>>_betas.Rdata; <<time_code>>_manifest.Rdata; fdata.Rdata; MWAS.r

ALN_dasen_<<time_code>>_betas.Rdata

<<time_code>>_manifest.Rdata

Fdata_new.RData

CpGassoc (CRAN): http://cran.r-project.org/web/packages/CpGassoc/index.html Tests for association between an independent variable and methylation; option to include additional covariates. Assesses significance with: Holm (step-down Bonferroni); FDR methods.
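
To make the two kinds of multiple-testing correction concrete, here is a generic sketch of Holm adjustment and of one common FDR method (Benjamini-Hochberg). It is not CpGassoc's code; choosing Benjamini-Hochberg as the FDR method is an assumption, since the slide only says "FDR methods".

```python
def holm(p_values):
    """Holm (step-down Bonferroni) adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiply by the number of hypotheses still in play, then
        # enforce monotonicity with a running maximum.
        running_max = max(running_max, (m - rank) * p_values[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg FDR-adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, idx in enumerate(order):
        rank = m - k  # 1-based rank among ascending p-values
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for four CpG sites.
p = [0.01, 0.04, 0.03, 0.005]
p_holm = holm(p)
p_bh = benjamini_hochberg(p)
```

Holm controls the family-wise error rate and is the stricter of the two; the FDR adjustment is typically less conservative, which matters when several hundred thousand CpG sites are tested at once.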

MWAS.r

MWAS.r continued...

MWAS.r continued...

Manhattan / QQ. Replicated the results of the following study: 450K Epigenome-Wide Scan Identifies Differential DNA Methylation in Newborns Related to Maternal Smoking during Pregnancy. Bonnie R. Joubert, et al. Gene hits: GFI1, AHRR, MYO1G, CYP1A1. "CYP1A1 plays a key role in the aryl hydrocarbon receptor signaling pathway, which mediates the detoxification of the components of tobacco smoke." - Joubert, et al.

Results file

BlueCrystal .bashrc

Any Questions?