Statistical Analyses of High Density Oligonucleotide Arrays. Rafael A. Irizarry, Department of Biostatistics, JHU (joint work with Bridget Hobbs and Terry Speed, Walter & Eliza Hall Institute of Medical Research, and Francois Collin, Gene Logic)

Summary: review of technology; data exploration; probe level summaries (expression measures); normalization; evaluate and compare 4 expression measures through bias, variance and model fit; use Gene Logic spike-in and dilution studies; conclusion/future work.

Probe Arrays. Figure labels: GeneChip Probe Array (1.28 cm); Hybridized Probe Cell (24 µm); millions of copies of a specific oligonucleotide probe; >200,000 different complementary probes; single stranded, labeled RNA target; Image of Hybridized Probe Array. Compliments of D. Gerhold.

PM MM

Data and Notation: PM_ijn, MM_ijn = intensity for perfect-match/mismatch probe cell j, in chip i, in gene n; i = 1, …, I (ranging from 1 to hundreds); j = 1, …, J (usually 16 or 20); n = 1, …, N (between 8,000 and 12,000).

The Big Picture: summarize 20 PM, MM pairs (probe level data) into one number for each gene. We call this number an expression measure. Affymetrix GeneChip software uses AvDiff as its expression measure. Does it work? Can it be improved?

What is the evidence? Lockhart et al., Nature Biotechnology 14 (1996)

Competing Measures of Expression: GeneChip® software uses Avg.diff, with A a set of “suitable” pairs chosen by the software. A log ratio version is also used. For differential expression, Avg.diffs are compared between chips.
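The formula on this slide did not survive in the transcript. As a rough sketch, assuming Avg.diff is the plain average of the PM − MM differences over the probe pairs in A, it could be computed as below. The `keep` argument is a hypothetical stand-in for the set A; how the software actually chooses the “suitable” pairs is not stated here, and the intensities are made up for illustration.

```python
from statistics import mean

def av_diff(pm, mm, keep=None):
    """Average of PM - MM over a chosen set A of probe pairs.

    pm, mm: intensities for the probe pairs of one probe set on one chip.
    keep:   indices of the pairs in A (hypothetical; None means all pairs).
    """
    diffs = [p - m for p, m in zip(pm, mm)]
    if keep is not None:
        diffs = [diffs[j] for j in keep]
    return mean(diffs)

# Made-up intensities for a probe set with four probe pairs.
pm = [1200.0, 950.0, 400.0, 2100.0]
mm = [300.0, 500.0, 450.0, 600.0]

avdiff_all = av_diff(pm, mm)                     # all four pairs
avdiff_subset = av_diff(pm, mm, keep=[0, 1, 3])  # drop the pair with MM > PM
```

Note that a pair with MM larger than PM contributes a negative difference, so the result depends noticeably on which pairs are kept.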

Competing Measures of Expression: the new version of the GeneChip® software uses something else, with MM* a version of MM that is never bigger than PM.

Competing Measures of Expression: Li and Wong fit a model; consider expression in chip i. Efron et al. consider log PM – 0.5 log MM. Another is the second largest PM.

Competing Measures of Expression Why not stick to what has worked for cDNA? with A a set of “suitable” pairs.

Features of Probe Level Data

SD vs. Avg of Defective Probes

ANOVA: strong probe effect, 5 times bigger than gene effect

Histograms of log2(PM/MM), stratified by log2(PM×MM)/2, for mouse chip, for defective and normal probes

Normalization at Probe Level

Spike-In Experiments. Set A: 11 control cRNAs were spiked in, all at the same concentration, which varied across chips. Set B: 11 control cRNAs were spiked in, all at different concentrations, which varied across chips. The concentrations were arranged in a 12×12 cyclic Latin square (with 3 replicates).
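To make the design concrete, here is a minimal sketch of a cyclic Latin square: each row is the previous row shifted by one position, so every level appears exactly once in each row and each column. The level labels `c0` … `c11` are placeholders, not the concentrations used in the study, and reading rows as chips and columns as spiked-in transcripts is an assumption.

```python
def cyclic_latin_square(levels):
    """Return an n x n cyclic Latin square built from the given levels.

    Entry (row, col) is levels[(row + col) % n], so each row is the
    previous row rotated left by one position.
    """
    n = len(levels)
    return [[levels[(row + col) % n] for col in range(n)] for row in range(n)]

# Placeholder labels for 12 concentration levels (hypothetical).
levels = [f"c{k}" for k in range(12)]
design = cyclic_latin_square(levels)

for row in design[:3]:
    print(" ".join(row))
```

With such a layout, every transcript is seen at every concentration level across the set of chips, and every chip carries the full range of levels.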

Set A: Probe Level Data (12 chips)

What Did We Learn? Don’t subtract or divide by MM. Probe effect is additive on log scale. Take logs.

Why Remove Background?

Background Distribution

Average Log2(PM-BG): normalize probe level data; compute BG = background mean by estimating the mode of the MM distribution; subtract BG from each PM; if PM-BG < 0, use the minimum of the positives divided by 2; take the average of log2(PM-BG) over the probes.
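The steps on this slide can be sketched as follows. This is only an illustration under stated assumptions: the slide does not say how the mode of the MM distribution is estimated, so a simple histogram with a hypothetical `bin_width` is used here; the probe level data are assumed to be normalized already; a difference of exactly zero is treated like a negative one so that the logarithm is defined. All intensities are made up.

```python
import math
from collections import Counter

def estimate_mode(values, bin_width=10.0):
    """Crude mode estimate: centre of the most populated histogram bin.

    The binning rule is an assumption, not the method from the talk.
    Ties are broken in favour of the lowest bin.
    """
    counts = Counter(math.floor(v / bin_width) for v in values)
    best_bin = max(counts, key=lambda b: (counts[b], -b))
    return (best_bin + 0.5) * bin_width

def avlog_pm_bg(pm, bg):
    """Average of log2(PM - BG) for one probe set on one chip."""
    adjusted = [p - bg for p in pm]
    positives = [a for a in adjusted if a > 0]
    if not positives:
        raise ValueError("no PM value exceeds the background")
    # Non-positive differences are replaced by half the smallest positive one.
    floor_value = min(positives) / 2
    adjusted = [a if a > 0 else floor_value for a in adjusted]
    return sum(math.log2(a) for a in adjusted) / len(adjusted)

# Made-up MM intensities for a whole chip and PM intensities for one probe set.
mm_chip = [92, 95, 97, 98, 101, 103, 104, 150, 300, 800]
bg = estimate_mode(mm_chip)
pm_probe_set = [159, 223, 351, 90]
expr = avlog_pm_bg(pm_probe_set, bg)
print(bg, expr)
```

A kernel density estimate would be a smoother alternative to the histogram, at the cost of a bandwidth choice.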

Expression after Normalization

Expression Level Comparison

Spike-In B. Table columns: Probe Set, Conc 1, Conc 2, Rank. Probe sets: BioB, BioB, BioC, BioB-M, BioDn, DapX, CreX, CreX, BioC, DapX, DapX-M. Later we consider 23 different combinations of concentrations.

Differential Expression

Observed Ranks. Table columns: Gene, AvDiff, MAS 5.0, Li & Wong, AvLog(PM-BG). Genes: BioB, BioB, BioC, BioB-M30373, BioDn, DapX, CreX, CreX, BioC, DapX, DapX-M, Top.

Observed vs True Ratio

Dilution Experiment: cRNA hybridized to human chip (HGU95) in a range of proportions and dilutions. The dilution series begins at 1.25 µg cRNA per GeneChip array, and rises through 2.5, 5.0, 7.5, 10.0, to 20.0 µg per array. 5 replicate chips were used at each dilution. Normalize just within each set of 5 replicates. For each probe set, compute expression, average and SD over replicates, and fit a line to log expression vs. log concentration. The regression line should have slope 1 and high R².
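The line fit described above can be sketched with ordinary least squares. The concentrations are the ones listed on the slide; the expression values are hypothetical and chosen to be exactly proportional to concentration, which is the ideal case where the slope is 1 and R² is 1. The sketch assumes raw-scale expression values; a measure that is already on the log scale would be used directly as `y`.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; returns slope, intercept, R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Dilution series from the slide, in micrograms of cRNA per array.
conc = [1.25, 2.5, 5.0, 7.5, 10.0, 20.0]
# Hypothetical replicate-averaged expression, proportional to concentration.
expr_values = [400.0 * c for c in conc]

slope, intercept, r_squared = fit_line(
    [math.log2(c) for c in conc],
    [math.log2(e) for e in expr_values],
)
print(slope, r_squared)
```

For a real probe set, a slope well below 1 would suggest that the expression measure compresses differences in concentration.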

Dilution Experiment Data

Expression and SD

Slope Estimates and R²

Model check: compute the observed SD of the 5 replicate expression estimates; compute the RMS of the 5 nominal SDs; compare by taking the log ratio. Closeness of observed and nominal SD is taken as a measure of goodness of fit of the model.
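A minimal sketch of this comparison for a single probe set is given below. Several details are assumptions, since the slide does not specify them: the observed SD is the sample SD (dividing by n − 1), the log is base 2, and the ratio is observed over nominal. The replicate estimates and nominal SDs are made up.

```python
import math
from statistics import stdev

def sd_log_ratio(estimates, nominal_sds):
    """log2 of (observed SD of replicate estimates) / (RMS of nominal SDs)."""
    observed = stdev(estimates)  # sample SD over the replicates
    rms = math.sqrt(sum(s * s for s in nominal_sds) / len(nominal_sds))
    return math.log2(observed / rms)

# Made-up expression estimates from 5 replicate chips, and the SDs the
# model reports for each of those estimates.
estimates = [8.0, 8.2, 7.9, 8.1, 7.8]
nominal_sds = [0.1, 0.1, 0.1, 0.1, 0.1]

ratio = sd_log_ratio(estimates, nominal_sds)
print(ratio)
```

A value near 0 means the model's own SDs agree with the replicate-to-replicate variation; a positive value means the model understates the variability.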

Observed vs. Model SE

Conclusion: take logs. PMs need to be normalized. Using a global background improves on the use of probe-specific MM. The Gene Logic spike-in and dilution studies show that all four expression measures performed very well. AvLog(PM-BG) is arguably the best in terms of bias, variance and model fit. Future: better BG; robust/resistant summaries.

Acknowledgements: Gene Brown’s group at Wyeth/Genetics Institute, and Uwe Scherf’s Genomics Research & Development Group at Gene Logic, for generating the spike-in and dilution data; Gene Logic for permission to use these data; Ben Bolstad (UC Berkeley); Magnus Åstrand (Astra Zeneca Mölndal); Skip Garcia, Tom Cappola, and Joshua Hare (JHU).