Microarray Quality Assessment Issues in High-Throughput Data Analysis BIOS 691-803 Spring 2010 Dr Mark Reimers.



Quality Assessment
Are there any factors that would lead you to doubt or distrust a particular datum (array)?
– Quality of inputs, e.g. RNA quality
– Statistical QA: evidence of systematic variation different from others

BioAnalyzer
Ideal: two sharp peaks for 18S & 28S RNA

Spot QA for cDNA Spotted Arrays
Spot Measures
– Signal/Noise: foreground / background, or foreground / SD
– Uniformity
– Spot Area
Global Measures
– Qualitative assessments
– Averages of spot measures
Inspect images for artifacts
– Streaks of dye, scratches etc.
Are there biases in regions?
With commercial arrays we assume these issues are under control
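The two signal-to-noise measures listed for spots can be sketched in a few lines. This is a minimal illustration, not the calculation of any particular image-analysis program: the function name `spot_quality` is hypothetical, and the slide does not say which SD is meant, so the sketch assumes the standard deviation of the local background pixels.

```python
from statistics import mean, stdev

def spot_quality(foreground, background):
    """Return two signal-to-noise measures for a single spot.

    foreground: pixel intensities inside the spot
    background: pixel intensities in the local off-spot region
    Assumption: "SD" is taken to be the SD of the background pixels.
    """
    fg_mean = mean(foreground)
    bg_mean = mean(background)
    bg_sd = stdev(background)  # sample standard deviation
    return {
        "fg_over_bg": fg_mean / bg_mean,
        "fg_over_sd": fg_mean / bg_sd,
    }

# Toy spot: bright foreground over a low, fairly even background
example = spot_quality([200, 220, 180], [50, 60, 40])
```

Averaging either measure over all spots on a slide gives one of the global measures mentioned above.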

Statistical Approaches
Question: Are any samples different from others on technical grounds?
Exploratory Data Analysis (EDA): boxplots, clustering, PCA
– Are there any outliers?
– Are there associations with technical factors? Technician; date of sample prep; etc.

EDA - Boxplots Boxplot of 16 chips from Cheung et al Nature 2005
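The numbers behind a boxplot comparison can be computed directly. The sketch below summarises each array and flags arrays whose median sits far from the others. The function names, the `tol` threshold of one log2 unit, and the toy data are all assumptions made for illustration.

```python
from statistics import median, quantiles

def boxplot_stats(values):
    """Five-number summary used to draw one box."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": q2,
            "q3": q3, "max": max(values)}

def flag_shifted_arrays(arrays, tol=1.0):
    """arrays: dict mapping array name -> list of log2 intensities.

    Flags arrays whose median differs from the median of all array
    medians by more than tol (an arbitrary, illustrative threshold).
    """
    medians = {name: median(vals) for name, vals in arrays.items()}
    centre = median(medians.values())
    return sorted(name for name, m in medians.items()
                  if abs(m - centre) > tol)

# Toy data: array "c" is shifted upward by about two log2 units
arrays = {
    "a": [6, 7, 8, 9, 10],
    "b": [6.1, 7.1, 8.1, 9.1, 10.1],
    "c": [8, 9, 10, 11, 12],
}
flagged = flag_shifted_arrays(arrays)
```

A flagged array is only a candidate for closer inspection; a shift may also reflect real biology.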

Another Portrait - Densities

Probe Intensities in 23 Replicates

Some Causes of Technical Variation
– Temperature of hybridization differs
– Amount of RNA differs
– RNA degraded in some samples
– Yield of conversion to cDNA or cRNA differs
– Strength of ionic buffers differs
– Stringency of wash differs
– Scratches on some chips
– Ozone (affects Cy5) at some times

Borrow an Idea from Model Testing
Question: Is the model adequate? Or do hidden factors cause systematic errors?
Examine residuals after fitting the model
– Should be IID Normal
– Is there structure in the residuals?
– Plot against known technical covariates, such as order of sample
How to adapt residual examination for high-throughput assays?

Statistical QA for Arrays
Model for the signal of probe i on chip j: $y_{ij} \sim \mu_i + \varepsilon_{ij}$
– Each gene has the same mean in all arrays (mostly true)
– Look at residuals after fitting the model
New twist for high-throughput assays:
– Examine residuals within each chip (fix j; vary i)
– Plot against known technical factors of probes
– Is there any factor that seems to be predicting systematic errors?
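A minimal sketch of the residual calculation, assuming log-scale intensities stored as one row per probe and one column per chip. The slide does not say how the per-probe level is estimated; the median across chips is used here as a robust stand-in, and both function names are hypothetical.

```python
from statistics import median

def probe_residuals(y):
    """y[i][j] is the log intensity of probe i on chip j.

    Returns r[i][j] = y[i][j] - centre_i, where centre_i is the median
    of probe i across chips (an assumed estimator of the probe mean).
    """
    residuals = []
    for row in y:
        centre = median(row)
        residuals.append([value - centre for value in row])
    return residuals

def chip_residuals(residuals, j):
    """Fix chip j and vary the probe index i."""
    return [row[j] for row in residuals]

# Toy data: the third chip reads two log units high for every probe
y = [[8, 8, 10],
     [5, 5, 7],
     [11, 11, 13]]
r = probe_residuals(y)
third_chip = chip_residuals(r, 2)
```

The per-chip residual vectors are what would then be plotted against technical factors of the probes.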

Statistical QA of Arrays
Significant artifacts may not be obvious from visual inspection or bulk statistics
General approach: plot deviations from average, or residuals from fit, against any technical variable:
– Average intensity across chips
– GC content or Tm
– Probe position relative to 3’ end of gene (for poly-T primed RNA)
– Physical location on chip
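One way to look for a trend against a technical variable without drawing a plot is to bin the probes by that variable and summarise the residuals in each bin. The sketch below assumes an integer covariate (for example, the number of G or C bases in each probe); `binned_median` and the bin width are illustrative choices, not part of any package.

```python
from collections import defaultdict
from statistics import median

def binned_median(covariate, residuals, bin_width):
    """Median residual within bins of a technical covariate.

    Returns a dict mapping the lower edge of each bin to the median
    residual of the probes falling in that bin.
    """
    bins = defaultdict(list)
    for x, r in zip(covariate, residuals):
        bins[int(x // bin_width)].append(r)
    return {b * bin_width: median(rs) for b, rs in sorted(bins.items())}

# Toy data: residuals on one chip rise with the GC count of the probe
gc_count = [8, 9, 12, 13, 17]
resid = [-0.5, -0.25, 0.0, 0.25, 1.0]
trend = binned_median(gc_count, resid, bin_width=5)
```

Bin medians that drift steadily away from zero suggest that the covariate is predicting systematic error on that chip.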

Ratio vs Intensity Plots: Saturation & Quenching
Saturation
– Decreasing rate of binding of RNA at higher occupancies on probe
Quenching
– Light emitted by one dye molecule may be re-absorbed by a nearby dye molecule
– Then lost as heat
– Effect proportional to square of density
Figure: plot of log ratio against average log intensity across chips; GSM25377 from the CEPH expression data GSE2552
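The coordinates of such a plot can be computed as follows. This sketch assumes raw (unlogged) intensities with one row per probe and one column per chip, and takes the reference to be the average log intensity across chips, as in the figure description; the function name is hypothetical.

```python
from math import log2
from statistics import mean

def ratio_intensity(raw, j):
    """Return (A, M) for chip j.

    raw[i][k] is the raw intensity of probe i on chip k.
    A[i] is the average log2 intensity of probe i across chips.
    M[i] is the log2 ratio of chip j to that average.
    """
    A, M = [], []
    for row in raw:
        logs = [log2(v) for v in row]
        a = mean(logs)
        A.append(a)
        M.append(logs[j] - a)
    return A, M

# Toy data: the second probe is much brighter on the second chip
raw = [[256, 256, 256],
       [1024, 4096, 1024]]
A, M = ratio_intensity(raw, 1)
```

Saturation or quenching would appear as a bend in M at the high end of A rather than a flat band around zero.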

How Much Variability on R-I? Ratio-Intensity plots for six arrays at random from Cheung et al Nature (2005)

Covariation with Probe Tm
MAQC project, Agilent 44K
– Array 1C3
– Performed by Agilent
Plot of log ratios to average against Tm
Bimodal distribution because the two samples are very different

Covariation with Probe Position
RNA degrades from the 5’ end
Intensity should decrease from the 3’ end uniformly across chips
AffyRNAdeg and plotAffyRNAdeg in the affy package
Plot of average intensity for each probe position, across all genes, against probe position
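The quantity being plotted can be sketched without the affy package. The code below is not the package's implementation; it assumes each probe set is a list of log intensities ordered from the 5’ end to the 3’ end, averages across probe sets at each position, and fits an ordinary least-squares slope. Both function names are hypothetical.

```python
from statistics import mean

def position_profile(probesets):
    """Average intensity at each probe position across all probe sets.

    probesets: list of lists, each ordered from 5' to 3' (assumed).
    Only positions present in every probe set are used.
    """
    n = min(len(p) for p in probesets)
    return [mean(p[k] for p in probesets) for k in range(n)]

def slope(profile):
    """Least-squares slope of the profile against position index."""
    xs = list(range(len(profile)))
    x_bar, y_bar = mean(xs), mean(profile)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, profile))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

# Toy chip: intensity rises by one log unit per position toward 3'
chip = [[6, 7, 8], [8, 9, 10]]
profile = position_profile(chip)
chip_slope = slope(profile)
```

Comparing the slope from chip to chip is the point: a chip whose slope stands apart from the rest may carry more degraded RNA.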

Effect of Runs of Guanines
A run of 4 G’s allows a quadruplex structure to form
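Checking for this effect amounts to splitting probes by whether their sequence contains a G run and comparing residuals between the two groups. A minimal sketch, with hypothetical function names and made-up sequences:

```python
from statistics import mean

def has_g_run(sequence, run_length=4):
    """True if the probe sequence contains run_length consecutive G's."""
    return "G" * run_length in sequence.upper()

def mean_residual_by_g_run(sequences, residuals, run_length=4):
    """Mean residual for probes with and without a G run.

    Returns (mean_with_run, mean_without_run); an entry is None when
    its group is empty.
    """
    with_run, without = [], []
    for seq, r in zip(sequences, residuals):
        (with_run if has_g_run(seq, run_length) else without).append(r)
    return (mean(with_run) if with_run else None,
            mean(without) if without else None)

# Toy probes (illustrative sequences, not real array probes)
seqs = ["ACGGGGTA", "ACGTACGT", "GGGGCCCC"]
resid = [1.0, 0.0, 0.5]
groups = mean_residual_by_g_run(seqs, resid)
```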

Spatial Variation Across Chips
Red/Green ratios show variation, probably concentrated
Ratios of ratios on slide to ratios on standard show consistent biases

In-House Spotted Arrays
Ratio of ratios shows a much clearer concentration of red spots on some slides
Note the non-random but highly irregular concentration of red

Bioconductor arrayQuality Package

Background Subtraction (1)
We think that local background contributes to bias
Does subtracting background remove bias?
Local off-spot background may not be the best estimate of spot background (non-specific hyb)
Panels: Spots; BG subtracted

Background Subtraction (2)
Raw spot ratios show a mild bias relative to average
After subtracting a high green bg in the center, a red bias results
Panels: Raw Ratios; Background; BG-subtracted
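A small numerical sketch shows how subtracting an uneven background can create a bias that was not in the raw ratios. The function name, the `floor` guard against non-positive values, and the intensities are assumptions chosen only to illustrate the arithmetic.

```python
from math import log2

def log_ratio(red_fg, green_fg, red_bg=0.0, green_bg=0.0, floor=1.0):
    """log2(red / green) after optional background subtraction.

    floor is an illustrative guard that keeps both channels positive
    when the background estimate exceeds the foreground.
    """
    red = max(red_fg - red_bg, floor)
    green = max(green_fg - green_bg, floor)
    return log2(red / green)

# A spot with equal raw signal in both channels
raw = log_ratio(600, 600)
# The same spot where the local green background estimate is high
subtracted = log_ratio(600, 600, red_bg=100, green_bg=350)
```

Here the raw log ratio is 0, while the background-subtracted value is 1 (a twofold red excess), which is the direction of the bias described above.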

Other Bias Patterns
This spotted oligo array shows strong biases at the beginning and end of each print-tip group
The background shows a milder version of this effect
Subtracting background compensates for about half of this effect
Panels: Processed; Raw Spot; Background

Local Bias on Affymetrix Chips
Image of raw data on a log2 scale shows striations but no obvious artifacts
Image of ratios of probes to standard shows a smudge
Figure label: Non-coding probes
Images show high values as red, low values as yellow
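A coarse numerical version of the ratio image can be built by tiling the chip into square blocks and taking the median log ratio to the standard in each block. This is a sketch under stated assumptions: the grid layout, the block size, and the function name are illustrative.

```python
from statistics import median

def block_medians(grid, block):
    """Median of grid values within square blocks of side `block`.

    grid[r][c] is the log ratio of the probe at row r, column c to the
    standard (for example, the average over chips).
    """
    n_rows, n_cols = len(grid), len(grid[0])
    out = []
    for r0 in range(0, n_rows, block):
        out_row = []
        for c0 in range(0, n_cols, block):
            cells = [grid[r][c]
                     for r in range(r0, min(r0 + block, n_rows))
                     for c in range(c0, min(c0 + block, n_cols))]
            out_row.append(median(cells))
        out.append(out_row)
    return out

# Toy 4x4 chip with a "smudge" in the top-left corner
grid = [[1.0, 1.0, 0.0, 0.0],
        [1.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0]]
coarse = block_medians(grid, block=2)
```

Because most probes in any region should sit near zero, a block whose median is clearly off zero points to a local artifact rather than to biology.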

Spatial Artifacts on Affy Chips
– Bubbles (yellow) in hybridization chamber
– Touching cover slip and wiping incompletely
– Scratches on cover slip

QC in Bioconductor
Robust Multi-array Average (RMA)
– Fits a linear model to each probe set
– High residuals show regional patterns
Figure: high residuals in green
Available in the affyQCReport package

Affy QC Metrics in Bioconductor
affyPLM package fits a probe-level model to Affymetrix raw data
NUSE: Normalized Unscaled Standard Errors
– Normalized relative to each gene
How many big errors?
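The normalization step can be sketched independently of affyPLM. The code below is not the package's implementation: it assumes a table of standard errors from a probe-level fit, with one row per gene and one column per array, and divides each gene's standard errors by that gene's median across arrays. The function names are hypothetical.

```python
from statistics import median

def nuse(se):
    """se[g][j] is the standard error of gene g's estimate on array j.

    Each row is divided by its median across arrays, so a typical
    array scores about 1 for every gene.
    """
    out = []
    for row in se:
        centre = median(row)
        out.append([value / centre for value in row])
    return out

def array_medians(nuse_values):
    """Median NUSE per array, a single-number summary of each array."""
    n_arrays = len(nuse_values[0])
    return [median(row[j] for row in nuse_values)
            for j in range(n_arrays)]

# Toy data: the third array has doubled standard errors for every gene
se = [[0.1, 0.1, 0.2],
      [0.2, 0.2, 0.4],
      [0.05, 0.05, 0.1]]
summary = array_medians(nuse(se))
```

An array whose median sits well above 1 has consistently larger errors than its peers, which is one way to answer the question of how many big errors an array has.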

Spatial Artifacts in Agilent
Usually not so strong as on other array types
More diffuse artifacts, probably reflecting washing irregularities

Spatial Artifacts in Nimblegen
More common than on Agilent
Usually more diffuse, probably reflecting washing
Some sharp artifacts of unclear origin

Spatial Artifacts in Illumina Arrays
Often bigger artifacts than on Affy
Less consequential because there are more beads, and all have the same sequence