Randa Stringer. Supervisor: Dr. Guillaume Paré. A review of quality control and pre-processing measures for the Illumina 450K BeadChip.


Steps for Review
• Sample quality
• Probe quality
• Background correction
• Normalization
• Cellular composition
• Batch effects

Array Design
• > 485,000 CpG sites
• Covers 99% of RefSeq genes
• Average of 17 sites per gene
• Distributed across promoter, 5’ UTR, first exon, gene body, and 3’ UTR
• Covers 96% of known CpG islands

Sample Quality
• Reported vs. predicted sex
  • Use DNA methylation to predict sex
  • minfi: getSex function
  • If yMed - xMed is less than the cutoff, the sample is predicted female; otherwise male
• Sample detection cut-offs
  • Threshold of failed probes in a sample (usually < 0.05 or 0.01)
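The two sample-level checks above can be sketched in a few lines. This is a minimal illustration, not minfi's implementation: the function names, the input layout (lists of total methylated + unmethylated intensities for chrX and chrY probes), the -2 default cutoff on the log2 scale, and the detection p-value cutoff of 0.01 are all assumptions made for the example.

```python
import math
from statistics import median


def predict_sex(x_intensities, y_intensities, cutoff=-2.0):
    """Predict sex for one sample from chrX / chrY probe intensities.

    Inputs are positive total intensities (methylated + unmethylated).
    The cutoff applies to the difference of log2 medians (assumed default).
    """
    x_med = median(math.log2(v) for v in x_intensities)
    y_med = median(math.log2(v) for v in y_intensities)
    # Low Y signal relative to X suggests a female sample.
    return "F" if (y_med - x_med) < cutoff else "M"


def sample_passes(detection_pvalues, p_cutoff=0.01, max_failed_fraction=0.05):
    """Keep a sample only if few of its probes fail the detection p-value."""
    failed = sum(1 for p in detection_pvalues if p > p_cutoff)
    return failed / len(detection_pvalues) <= max_failed_fraction
```

A sample whose predicted sex disagrees with the reported sex would be flagged for review rather than dropped automatically.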

Probe Quality
• Probe detection cut-offs
• Bead count (> 3)
• Remove probes on sex chromosomes
• Probes containing SNPs
• Cross-reactive probes
• MAF > 1%
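The probe filters listed above could be combined roughly as follows. This is a sketch under stated assumptions: the dictionary keys (`id`, `chrom`, `detection_p`, `bead_counts`, `has_snp`, `cross_reactive`) are a hypothetical layout, and the thresholds, including treating a bead count below 3 as a failed measurement, are illustrative defaults rather than values taken from any package.

```python
def filter_probes(probes, p_cutoff=0.01, min_beads=3, max_failed_fraction=0.05):
    """Return the ids of probes that survive all quality filters.

    Each probe is a dict with per-sample lists 'detection_p' and
    'bead_counts' plus annotation flags (hypothetical layout).
    """
    kept = []
    for probe in probes:
        n_samples = len(probe["detection_p"])
        # A measurement fails on a poor detection p-value or too few beads.
        bad = sum(
            1
            for p, beads in zip(probe["detection_p"], probe["bead_counts"])
            if p > p_cutoff or beads < min_beads
        )
        if bad / n_samples > max_failed_fraction:
            continue
        if probe["chrom"] in ("chrX", "chrY"):
            continue
        # SNP flag assumed to be precomputed (e.g. SNPs with MAF > 1%).
        if probe["has_snp"] or probe["cross_reactive"]:
            continue
        kept.append(probe["id"])
    return kept
```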

Background Correction
• Background subtraction method
• Available in GenomeStudio
• Background calculated from negative control probes is subtracted from all probes (separately for each channel [red vs. green]) (GenomeStudio Methylation Module v1.8 User Guide)
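A minimal sketch of per-channel background subtraction is shown here. How GenomeStudio summarizes the negative controls is not stated on the slide, so the median used below is an assumption, as are the function names, the input layout, and the choice to floor corrected intensities at zero.

```python
from statistics import median


def background_subtract(intensities, negative_controls, summary=median):
    """Subtract a background level estimated from negative control probes.

    'summary' is the statistic applied to the controls (median is an
    assumption); corrected values are floored at zero (also an assumption).
    """
    background = summary(negative_controls)
    return [max(v - background, 0.0) for v in intensities]


def correct_sample(sample):
    """Apply the correction to each colour channel independently.

    'sample' maps 'red' / 'green' to dicts with 'probes' and
    'negative_controls' lists (hypothetical layout).
    """
    return {
        channel: background_subtract(
            sample[channel]["probes"], sample[channel]["negative_controls"]
        )
        for channel in ("red", "green")
    }
```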

Normalization
• Goal: reduce non-biological variation
• Equalizes probe intensity and signal distributions across arrays and between colour channels
• New challenges with DNA methylation vs. gene expression techniques
  • Systematic/technical variation
  • Novel probe design

Normalization for Illumina 450K
• Problem: 2-type probe design
  • Infinium I probe: 2 different probes per CpG
  • Infinium II probe: single base extension at CpG
Maksimovic et al. Genome Biology 2012

CpG Content
• Infinium II ≤ 3; Infinium I ≥ 3
• Compressed β value distribution in InfII
• Solution: scale Infinium II probes to InfI probes
Maksimovic et al. Genome Biology 2012

Normalization to Internal Controls
• Illumina GenomeStudio
• Probe intensity multiplied by constant normalization factor (NF)
• NF calculated as average of controls in a reference sample (GenomeStudio Methylation Module v1.8 User Guide)
• Doesn’t account for the InfI vs. InfII probe issues
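The idea of a constant normalization factor can be illustrated as below. The slide only says that the NF is derived from the average of the controls in a reference sample; expressing it as the ratio of the reference sample's control average to the current sample's control average is an assumption made here so that every sample is scaled toward the reference. Function and argument names are hypothetical.

```python
from statistics import mean


def normalize_to_controls(intensities, sample_controls, reference_controls):
    """Scale one sample's probe intensities by a constant factor.

    The factor is the reference sample's mean control intensity divided by
    this sample's mean control intensity (assumed form of the NF).
    """
    nf = mean(reference_controls) / mean(sample_controls)
    return [v * nf for v in intensities]
```

Because every probe in the sample is multiplied by the same factor, the relative shapes of the InfI and InfII distributions are unchanged, which is why this step alone does not address the probe-type issue.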

Peak-Based Correction (PBC)
• Uses peak summits to correct β values
• Convert β to M values
• Determine peaks for I and II probes with kernel density estimation
• Rescale M values by peak summits
• Rescale these corrected M values to the I range and convert back to β values
[Figure panels: Raw; PBC]
Dedeurwaerder et al. Epigenomics 2011
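The β-to-M conversion used in the first and last PBC steps is the logit on a base-2 scale, M = log2(β / (1 - β)), with inverse β = 2^M / (2^M + 1). A small sketch follows; the clipping constant `eps`, which keeps β away from exactly 0 or 1, is an assumption added for numerical safety and is not part of the slide.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Convert a beta value in [0, 1] to an M value (log2 odds)."""
    b = min(max(beta, eps), 1.0 - eps)  # avoid log of zero or division by zero
    return math.log2(b / (1.0 - b))


def m_to_beta(m):
    """Convert an M value back to a beta value."""
    return 2.0 ** m / (2.0 ** m + 1.0)
```

The peak detection and rescaling steps themselves are not shown here.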

Subset Quantile Normalization (SQN)
• Modeled after SQN methods in expression
• Probes separated and those with poor detection removed
• ‘Anchors’ (reference quantiles, RQs) chosen from InfI probes
• Target quantiles are estimated for InfI and II
• InfI and II normalized to their RQs
• Dataset is rebuilt
Touleimat and Tost, Epigenomics, 2012
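The core operation behind normalizing probes to a set of reference quantiles is a rank-based mapping. The sketch below shows only that generic step: each value is replaced by the reference value at the same relative rank, with linear interpolation when the two sets differ in size. It is not the full SQN procedure (no probe categories, no anchor selection), and the function name is hypothetical.

```python
def quantile_map(values, reference):
    """Map 'values' onto the distribution of 'reference' by rank."""
    ref = sorted(reference)
    n, m = len(values), len(ref)
    # Indices of 'values' in ascending order of value.
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, idx in enumerate(order):
        # Fractional position of this rank within the sorted reference.
        pos = rank * (m - 1) / (n - 1) if n > 1 else 0.0
        lo = int(pos)
        hi = min(lo + 1, m - 1)
        frac = pos - lo
        out[idx] = ref[lo] * (1.0 - frac) + ref[hi] * frac
    return out
```

Ties in `values` are broken by original position, which is a simplification.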

SQN Cont’d
[Figure panels: No normalization; Unique RQs; RQs by ‘relation to CpG’; RQs by ‘relation to gene sequence’]
Maksimovic et al. Genome Biology 2012

Subset Within-Array Normalization (SWAN)
• Allows InfI and InfII probes to be normalized together
• Subset of N InfI and InfII probes chosen based on underlying CpG content
• Separate methylated and unmethylated channels
• Mean intensity for each of 3N calculated
• InfI and II probes adjusted separately by linear interpolation
Maksimovic et al. Genome Biology 2012

Beta-MIxture Quantile normalization (BMIQ)
• Novel normalization method
• Fit a 3-state (U/H/M) beta mixture model to InfI and InfII probes separately
• Transform InfII U and M probes using the inverse of the cumulative beta distribution estimated from the respective InfI probes
• For H probes, perform a dilation transformation to fit the data into the gap
Teschendorff et al. Bioinformatics 2012

START Data
[Figure panels: Raw Data; SWAN Normalized]

Cellular Composition Adapted from Correa-Rocha et al. Pediatric Research 2012

Estimations by Houseman Houseman et al. BMC Bioinformatics 2012

Batch Effects
• Can be assessed using principal component analysis or variations on singular value decomposition (e.g. sva)
• ComBat method uses a parametric or non-parametric empirical Bayes framework to adjust for a known source of batch effects
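To make "adjusting for a known batch" concrete, here is a deliberately simplified location-only adjustment for a single probe: each batch is shifted so that its mean matches the overall mean. This is not ComBat, which also adjusts scale and shrinks the per-batch estimates with empirical Bayes across probes; the function name and inputs are hypothetical, and the values would typically be M values.

```python
from statistics import mean


def center_batches(values, batches):
    """Shift each batch so its mean equals the grand mean (one probe).

    'values' and 'batches' are parallel lists: one measurement and one
    batch label per sample. Location-only; no scale or shrinkage step.
    """
    grand_mean = mean(values)
    by_batch = {}
    for v, b in zip(values, batches):
        by_batch.setdefault(b, []).append(v)
    batch_means = {b: mean(vs) for b, vs in by_batch.items()}
    return [v - batch_means[b] + grand_mean for v, b in zip(values, batches)]
```

A adjustment like this assumes batch is not confounded with the biological variable of interest; if it is, centering removes real signal along with the batch effect.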

Singular Value Decomposition (START)

Questions & Discussion