Randa Stringer
Supervisor: Dr. Guillaume Paré
A review of quality control and pre-processing measures for the Illumina 450K BeadChip


1 Randa Stringer, Supervisor: Dr. Guillaume Paré. A review of quality control and pre-processing measures for the Illumina 450K BeadChip

2 Steps for Review
• Sample quality
• Probe quality
• Background correction
• Normalization
• Cellular composition
• Batch effects

3 Array Design
• > 485,000 CpG sites
• Covers 99% of RefSeq genes
• Average of 17 sites per gene
• Distributed across promoter, 5’ UTR, first exon, gene body, and 3’ UTR
• Covers 96% of known CpG islands

4 Sample Quality
• Reported vs. predicted sex
• Use DNA methylation to predict sex
• minfi: getSex function
• If yMed - xMed is less than the cutoff, the sample is predicted female; otherwise male
• Sample detection cut-offs
• Threshold on the fraction of failed probes in a sample (usually < 0.05 or 0.01)
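The two sample-level checks on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not minfi's implementation: the function names are hypothetical, the inputs are assumed to be per-sample median log2 intensities on the X and Y chromosomes, and the default cutoff of -2 and the detection thresholds are assumptions chosen for the example.

```python
def predict_sex(x_med, y_med, cutoff=-2.0):
    """Predict sex per sample from median chrX / chrY log2 intensities.

    Rule from the slide: if yMed - xMed is below the cutoff the sample is
    called female ("F"), otherwise male ("M").
    The default cutoff of -2 is an assumption, not taken from the slide.
    """
    return ["F" if (y - x) < cutoff else "M" for x, y in zip(x_med, y_med)]


def sex_mismatches(reported, predicted):
    """Return the indices of samples whose reported and predicted sex differ."""
    return [i for i, (r, p) in enumerate(zip(reported, predicted)) if r != p]


def sample_fails_detection(detection_p, p_cutoff=0.01, max_failed_fraction=0.05):
    """Flag one sample when too many of its probes fail detection.

    detection_p: detection p-values for every probe in a single sample.
    Both thresholds are illustrative assumptions.
    """
    failed = sum(1 for p in detection_p if p > p_cutoff)
    return failed / len(detection_p) > max_failed_fraction
```

Samples returned by `sex_mismatches` are candidates for a sample swap or annotation error and would normally be checked by hand before removal.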

5 Probe Quality
• Probe detection cut-offs
• Bead count (> 3)
• Remove probes on sex chromosomes
• Probes containing SNPs (MAF > 1%)
• Cross-reactive probes
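The probe-level filters listed on this slide combine naturally into a single keep-or-drop decision per probe. The sketch below is an assumption-laden illustration: `keep_probe` is a hypothetical helper, the chromosome labels, the detection p-value cutoff, the allowed fraction of bad measurements, and the reading of the bead-count criterion (a measurement is low quality when fewer than `min_beads` beads contributed) are all choices made for the example.

```python
def keep_probe(detection_p, bead_counts, chromosome, has_snp, cross_reactive,
               p_cutoff=0.01, min_beads=3, max_bad_fraction=0.05):
    """Decide whether one probe survives quality filtering.

    detection_p, bead_counts: one value per sample for this probe.
    has_snp: True if the probe overlaps a SNP above the chosen MAF threshold.
    cross_reactive: True if the probe is on a cross-reactive probe list.
    All thresholds are illustrative assumptions.
    """
    # A measurement is bad if it fails detection or has too few beads.
    bad = sum(1 for p, n in zip(detection_p, bead_counts)
              if p > p_cutoff or n < min_beads)
    if bad / len(detection_p) > max_bad_fraction:
        return False
    # Drop probes on the sex chromosomes.
    if chromosome in {"chrX", "chrY"}:
        return False
    # Drop SNP-containing and cross-reactive probes.
    return not (has_snp or cross_reactive)
```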

6 Background Correction
• Background subtraction method
• Available in GenomeStudio
• Background calculated from negative control probes is subtracted from all probes (separately for each channel [red vs. green])
(GenomeStudio Methylation Module v1.8 User Guide)
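Background subtraction as described on this slide amounts to estimating one background value per channel from the negative controls and subtracting it from every probe in that channel. The sketch below is a simplified illustration: the function name is hypothetical, and using the median of the negative controls and flooring the result at zero are assumptions, since the slide does not state which summary statistic GenomeStudio uses.

```python
import statistics


def background_subtract(intensities, negative_controls):
    """Subtract a negative-control background from one channel of one sample.

    intensities: probe intensities for a single channel (red or green).
    negative_controls: negative control probe intensities, same channel.
    The median summary and the floor at zero are illustrative assumptions.
    """
    background = statistics.median(negative_controls)
    return [max(value - background, 0.0) for value in intensities]
```

The function would be called twice per sample, once with the red-channel values and once with the green-channel values, so that each channel gets its own background estimate.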

7 Normalization
• Goal: reduce non-biological variation
• Equalizes probe intensity and signal distributions across arrays and between colour channels
• New challenges with DNA methylation vs. gene expression techniques
• Systematic/technical variation
• Novel probe design

8 Normalization for Illumina 450K
• Problem: 2-type probe design
• Infinium I probe: 2 different probes per CpG
• Infinium II probe: single base extension at CpG
Maksimovic et al. Genome Biology 2012

9 CpG Content
• Infinium II: ≤ 3; Infinium I: ≥ 3
• Compressed β value distribution in InfII
• Solution: scale Infinium II probes to InfI probes
Maksimovic et al. Genome Biology 2012

10 Normalization to Internal Controls
• Illumina GenomeStudio
• Probe intensity multiplied by constant normalization factor (NF)
• NF calculated as average of controls in a reference sample (GenomeStudio Methylation Module v1.8 User Guide)
• Doesn’t account for the InfI vs InfII probe issues
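One way to read the control-based scheme on this slide is that each sample is rescaled so that its control-probe average matches that of a chosen reference sample. The sketch below follows that reading. It is an assumption, not GenomeStudio's documented formula: the function name and the definition of NF as a ratio of control means are hypothetical.

```python
from statistics import mean


def control_normalize(sample_intensities, sample_controls, reference_controls):
    """Scale one sample so its control average matches the reference sample.

    Assumed definition: NF = mean(reference controls) / mean(sample controls).
    Returns the scaled intensities and the normalization factor.
    """
    nf = mean(reference_controls) / mean(sample_controls)
    return [value * nf for value in sample_intensities], nf
```

Because every probe in a sample is multiplied by the same constant, this kind of scaling cannot change the relative shape of the Infinium I and Infinium II distributions, which is the limitation noted in the last bullet.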

11 Peak-Based Correction (PBC)
• Uses peak summits to correct β values
• Convert β to M values
• Determine peaks for I and II probes with kernel density estimation
• Rescale M values by peak summits
• Rescale these corrected M values to the I range and convert back to β values
[Figure panels: Raw; PBC]
Dedeurwaerder et al. Epigenomics 2011
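The β to M conversion in the second step, and the conversion back in the last step, use the standard logit relationship M = log2(β / (1 − β)) and β = 2^M / (2^M + 1). The sketch below shows only these two conversions, not the peak detection or rescaling; the small `eps` offset that keeps β away from exactly 0 or 1 is an assumption added to avoid infinite M values.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Convert a β value in [0, 1] to an M value (log2 ratio).

    eps is an illustrative guard against β = 0 or β = 1.
    """
    clipped = min(max(beta, eps), 1.0 - eps)
    return math.log2(clipped / (1.0 - clipped))


def m_to_beta(m):
    """Convert an M value back to a β value."""
    return 2.0 ** m / (2.0 ** m + 1.0)
```

A β of 0.5 maps to M = 0, and β values near 0 or 1 map to large negative or positive M values, which is why the rescaling is done on the M scale.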

12 Subset Quantile Normalization (SQN)
• Modeled after SQN methods in expression
• Probes separated and poor detection removed
• ‘Anchors’ (RQs) chosen from InfI probes
• Target quantiles are estimated for InfI and II
• InfI and II normalized to their RQs
• Dataset is rebuilt
Touleimat and Tost, Epigenomics, 2012
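The core operation in the normalization step above is mapping a set of values onto reference quantiles while preserving their rank order. The sketch below shows that generic operation only. It is not the Touleimat and Tost pipeline: the function name is hypothetical, the choice of anchors and probe categories is left out, and the mid-rank probability with linear interpolation is an assumption.

```python
def map_to_reference_quantiles(values, reference):
    """Replace each value by the reference quantile at the same rank.

    values: intensities or β values to normalize.
    reference: values defining the target (reference) distribution.
    Uses mid-rank probabilities and linear interpolation (assumed choices).
    """
    ref = sorted(reference)
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, idx in enumerate(order):
        # Fractional position of this rank inside the sorted reference.
        pos = (rank + 0.5) / n * (len(ref) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(ref) - 1)
        out[idx] = ref[lo] + (pos - lo) * (ref[hi] - ref[lo])
    return out
```

After the mapping, the normalized values follow the reference distribution while each probe keeps its original rank within the sample.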

13 SQN Cont’d
[Figure panels: No normalization; Unique RQs; RQs by ‘relation to CpG’; RQs by ‘relation to gene sequence’]
Maksimovic et al. Genome Biology 2012

14 Subset Within-Array Normalization (SWAN)
• Allows InfI and InfII probes to be normalized together
• Subset of N InfI and InfII probes chosen based on underlying CpG content
• Separate methylated and unmethylated channels
• Mean intensity for each of 3N calculated
• InfI and II probes adjusted separately by linear interpolation
Maksimovic et al. Genome Biology 2012

15 Beta-MIxture Quantile normalization (BMIQ)
• Novel normalization method
• Fit a 3-state (U/H/M) beta mixture model to InfI and InfII probes separately
• Transform InfII U and M probes using the inverse of the cumulative beta distribution estimated from the respective InfI probes
• For H probes, perform a dilation transformation to fit the data into the gap
Teschendorff et al. Bioinformatics 2012

16 START Data
[Figure panels: Raw Data; SWAN Normalized]

17 Cellular Composition
Adapted from Correa-Rocha et al. Pediatric Research 2012

18 Estimations by Houseman
Houseman et al. BMC Bioinformatics 2012

19 Batch Effects
• Can be assessed using principal component analysis or variations on singular value decomposition (e.g., sva)
• ComBat method uses a parametric or non-parametric empirical Bayes framework to adjust for a known source of batch effects
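Assessing batch effects with principal component analysis can be sketched with a singular value decomposition of the centered methylation matrix. This is a minimal illustration using NumPy, not the sva or ComBat procedures: the function name is hypothetical, and the input is assumed to be a probes × samples matrix of β or M values.

```python
import numpy as np


def principal_components(matrix, n_components=2):
    """Return per-sample PC scores and the fraction of variance explained.

    matrix: probes x samples array of methylation values (assumed layout).
    """
    x = np.asarray(matrix, dtype=float)
    # Center each probe across samples so PCs describe between-sample variation.
    centered = x - x.mean(axis=1, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    # Rows of vt are sample-space directions; scale by singular values.
    scores = (vt[:n_components] * s[:n_components, None]).T
    return scores, explained[:n_components]
```

Plotting the first few columns of `scores` coloured by plate, chip, or processing date is a common way to see whether samples cluster by a technical variable rather than by biology.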

20 Singular Value Decomposition (START)

21 Questions & Discussion

