Zhongxue Chen, Monnie McGee, Qingzhong Liu and Richard Scheuermann

A Distribution-Free Summarization Method for Affymetrix GeneChip Arrays
Zhongxue Chen, Monnie McGee, Qingzhong Liu and Richard Scheuermann
Dallas Area Bioinformatics Workshop, August 29, 2006

A new summarization method: Distribution Free Weighted (DFW) summarization
- Use information on the variability of probe intensities to summarize Affymetrix data
- Translate that variability into weights, which allow poorly performing probes to be downweighted
DAB Workshop 2006

Need for summarization
- A result of the unique structure of Affymetrix arrays
- Summarization is necessary to obtain one number for each gene: all 11 to 20 probes interrogating a gene must be summarized into one expression value

Structure of Affymetrix arrays
- Probe = a sequence of 25 bases
- Probe pair = a perfect match (PM) probe and its corresponding mismatch (MM) probe
- Probe set = 11 to 20 probe pairs interrogating one gene or EST
- Chips contain 6K to 54K probe sets
[Figure: array structure; image courtesy of Affymetrix]

PM and MM
- PM = 25-base probe perfectly complementary to a specific region of a gene
- MM = 25-base probe identical to the PM except at the middle (13th) base, which is replaced by its Watson-Crick complement (A↔T, G↔C)
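The probe structure on these two slides can be sketched as a small data structure. This is a minimal illustration, not an Affymetrix file format: the names `ProbePair`, `ProbeSet` and `mismatch_of` and the example sequence are hypothetical, and the MM sequence is derived by complementing the 13th base as the slide describes.

```python
from dataclasses import dataclass, field

# Watson-Crick complements used for the MM probe's middle base
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
PROBE_LENGTH = 25


def mismatch_of(pm: str) -> str:
    """Return the MM sequence: the PM sequence with its middle (13th) base complemented."""
    if len(pm) != PROBE_LENGTH:
        raise ValueError(f"expected a {PROBE_LENGTH}-base probe, got {len(pm)}")
    mid = PROBE_LENGTH // 2  # 0-based index 12 is the 13th base
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]


@dataclass
class ProbePair:
    pm: str  # PM probe sequence

    @property
    def mm(self) -> str:
        return mismatch_of(self.pm)


@dataclass
class ProbeSet:
    probe_set_id: str  # hypothetical identifier for one gene or EST
    pairs: list = field(default_factory=list)  # 11 to 20 ProbePair objects on real chips


pair = ProbePair("ACGTACGTACGTAGCATGCATGCAT")  # hypothetical 25-base PM sequence
print(pair.mm)
```

Only position 13 differs between the two sequences, which is why the MM intensity was intended as a measure of nonspecific hybridization for its PM partner.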

DFW
1. Transform probe-level intensities to the log2 scale for all arrays in the experiment. This stabilizes the variance (on the raw scale, larger intensity → increased variability).
2. Arrange the data in an N × R matrix, where N = total number of PM probes and R = total number of arrays in the entire experiment.
3. For each probe set, calculate a weight for each PM probe using the Tukey biweight function.
4. Multiply each probe intensity by its weight and summarize.
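Steps 1 and 2 can be sketched with NumPy. The sketch assumes the raw PM intensities are already loaded into an array with one row per PM probe and one column per array, together with a parallel list of probe-set IDs; both inputs and the helper names are hypothetical.

```python
import numpy as np


def log2_matrix(raw_pm) -> np.ndarray:
    """Step 1: log2-transform an N x R matrix of raw PM intensities."""
    raw_pm = np.asarray(raw_pm, dtype=float)
    if np.any(raw_pm <= 0):
        raise ValueError("log2 needs strictly positive intensities")
    return np.log2(raw_pm)


def rows_by_probe_set(probe_set_ids):
    """Step 2 helper: map each probe-set ID to the row indices of its PM probes."""
    groups = {}
    for row, ps_id in enumerate(probe_set_ids):
        groups.setdefault(ps_id, []).append(row)
    return groups


raw = np.array([[64.0, 128.0], [256.0, 512.0], [1024.0, 2048.0]])  # N = 3 probes, R = 2 arrays
ids = ["ps1", "ps1", "ps2"]
log_pm = log2_matrix(raw)
for ps_id, rows in rows_by_probe_set(ids).items():
    print(ps_id, log_pm[rows])  # the J x R block that step 3 weights
```

Each J × R block (J probes in one probe set, R arrays) is the unit that the weighting in steps 3 and 4 operates on.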

Calculating weights
1. Calculate the range of the log intensities (across arrays) for each PM probe.
2. Find the median of these ranges (M).
3. Calculate the distance of each PM's range from M; call this distance x.
4. Weighting function (Tukey biweight): $w(x) = \left(1 - (x/\max x)^2\right)^2$
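The four steps can be sketched as follows. The weighting function assumes a Tukey biweight whose tuning constant is max(x) within the probe set; that choice reproduces the w(x) column of the worked example in this talk, but it is a reconstruction, not code from the authors. The function name is hypothetical.

```python
import numpy as np


def biweight_from_ranges(log_pm: np.ndarray):
    """log_pm: J x R matrix of log2 PM intensities for one probe set.

    Returns the per-probe ranges, their median M, the distances x = |range - M|,
    and the biweight values w(x) = (1 - (x / max x)^2)^2 (assumed form).
    """
    ranges = log_pm.max(axis=1) - log_pm.min(axis=1)  # step 1: range across arrays
    M = np.median(ranges)                              # step 2: median of the ranges
    x = np.abs(ranges - M)                             # step 3: distance of each range to M
    c = x.max()
    if c == 0:                                         # all ranges equal: nothing to downweight
        return ranges, M, x, np.ones_like(x)
    w = (1 - (x / c) ** 2) ** 2                        # step 4: assumed biweight, c = max(x)
    return ranges, M, x, w


# The four PM probes of the worked example (rows) on six arrays (columns)
example = np.array([
    [5.8, 6.2, 5.9, 9.5, 10.1, 9.2],
    [8.2, 7.9, 7.8, 11.7, 12.0, 10.7],
    [7.3, 7.4, 8.1, 8.8, 7.9, 9.5],
    [7.7, 6.9, 7.4, 10.4, 9.3, 8.5],
])
ranges, M, x, w = biweight_from_ranges(example)
print(np.round(ranges, 2), round(M, 2), np.round(x, 2), np.round(w, 2))
```

With c = max(x), the probe whose range lies farthest from M always gets weight 0, as PM3 does in the example.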

Probe weights
The weight for probe i is $w_i = \frac{w(x_i)}{\sum_{j=1}^{J} w(x_j)}$, where J = number of probes in the probe set.
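Normalizing the biweight values so that they sum to one gives the probe weights w_i. A minimal sketch with a hypothetical function name; it takes the rounded w(x) values from the worked example as input.

```python
import numpy as np


def probe_weights(w: np.ndarray) -> np.ndarray:
    """w_i = w(x_i) / sum_j w(x_j) for the J probes of one probe set."""
    total = w.sum()
    if total == 0:
        raise ValueError("every probe in the set received zero weight")
    return w / total


w_x = np.array([0.86, 0.91, 0.0, 0.91])  # w(x) column of the worked example
wi = probe_weights(w_x)
print(np.round(wi, 2))
```

The zero-total guard matters only for degenerate sets: for example, with just two probes whose ranges differ, both distances equal max(x) and both biweight values are 0.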

More calculations
- Weighted range (WR): the range of the weighted intensities
- Weighted standard deviation (WSD)
- Transformed intensity values (TIV): standardize measures between DEGs (differentially expressed genes) and non-DEGs
- m and n should be positive integers

Example: one probe set with four PM probes on six arrays (A1 to A6)

Probe  A1    A2    A3    A4    A5    A6    range  x     w(x)  wi    SD    wi(SD)
PM1    5.8   6.2   5.9   9.5   10.1  9.2   4.3    0.45  0.86  0.32  2.02  0.30
PM2    8.2   7.9   7.8   11.7  12.0  10.7  4.2    0.35  0.91  0.34  1.97  0.35
PM3    7.3   7.4   8.1   8.8   7.9   9.5   2.2    1.65  0     0     0.85  0
PM4    7.7   6.9   7.4   10.4  9.3   8.5   3.5    0.35  0.91  0.34  1.31  0.35

M = 3.85; max(x) = 1.65
Weighted intensities (WI): 7.26, 7.02, 7.06, 10.55, 10.47, 9.47
Transformed intensities (TI): 0.07, 0, 0.01, 1, 0.98, 0.69
Weighted range (WR): 10.55 - 7.02 = 3.53
Weighted SD (WSD): 1.75
Expression values (m = 3, n = 1): 7.28, 7.02, 7.06, 10.87, 10.78, 9.69
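The sketch below recomputes the example from its raw 4 × 6 table. Two parts are inferred from the example's numbers rather than stated on the slides: the wi(SD) column is assumed to come from applying the same biweight procedure to the per-probe sample SDs instead of the ranges, and WSD is assumed to be the wi(SD)-weighted sum of those SDs. The final step that combines WI, TI, WR and WSD into expression values using m and n is not recoverable from the slides, so the sketch stops before it. Function names are hypothetical.

```python
import numpy as np


def biweight_weights(values: np.ndarray) -> np.ndarray:
    """Normalized weights from per-probe values (ranges or SDs); assumed biweight with c = max(x)."""
    x = np.abs(values - np.median(values))
    c = x.max()
    w = np.ones_like(x) if c == 0 else (1 - (x / c) ** 2) ** 2
    if w.sum() == 0:
        raise ValueError("every probe received zero weight")
    return w / w.sum()


def dfw_summary_parts(log_pm: np.ndarray):
    """log_pm: J x R matrix for one probe set. Returns WI, TI, WR and WSD."""
    ranges = log_pm.max(axis=1) - log_pm.min(axis=1)
    sds = log_pm.std(axis=1, ddof=1)   # sample SD of each probe across arrays
    wi = biweight_weights(ranges)      # probe weights from the ranges
    wi_sd = biweight_weights(sds)      # assumed: same procedure applied to the SDs
    wi_int = wi @ log_pm               # weighted intensity (WI) for each array
    wr = wi_int.max() - wi_int.min()   # weighted range (WR)
    ti = (wi_int - wi_int.min()) / wr if wr > 0 else np.zeros_like(wi_int)  # TI in [0, 1]
    wsd = float(wi_sd @ sds)           # assumed form of WSD
    return wi_int, ti, wr, wsd


example = np.array([
    [5.8, 6.2, 5.9, 9.5, 10.1, 9.2],
    [8.2, 7.9, 7.8, 11.7, 12.0, 10.7],
    [7.3, 7.4, 8.1, 8.8, 7.9, 9.5],
    [7.7, 6.9, 7.4, 10.4, 9.3, 8.5],
])
wi_int, ti, wr, wsd = dfw_summary_parts(example)
print(np.round(wi_int, 2), np.round(ti, 2), round(wr, 2), round(wsd, 2))
```

Computed without intermediate rounding, WR comes out as about 3.54 rather than the 3.53 on the slide, which subtracts the already-rounded WI values; the other quantities match the table to two decimals.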

Why weight?
- Some PMs may behave poorly: give small or zero weight to “poor” PMs
- Use information across arrays: assess the quality of each PM from its overall behavior
- The SD of the range provides information for detecting differentially expressed genes

Probe performance
[Figure: poorly performing probes]

Comparison data sets
- Affymetrix Latin Square spike-in experiments: two experiments, on the HGU-95Av2 and HGU-133A platforms
  - The HGU-95 experiment has 14 transcripts spiked in at concentrations from 0 to 1024 pM (59 arrays)
  - The HGU-133 experiment has 42 transcripts spiked in, in triplicate, at concentrations from 0 to 512 pM (42 arrays); McGee and Chen (2006) report 22 more spike-ins
- “GoldenSpike” experiment (Choe et al., 2005)
  - Six arrays (3 experiment, 3 control) on the DrosGenome1 chip
  - 1309 transcripts spiked in at known fold differences (from 1.2 to 4)
  - 2551 transcripts included at the same concentration in both conditions

Comparison methods
- Criteria: ROC curves, AUC values and CPU time
- Competitors:
  - Robust Multi-array Average (RMA): Bolstad, 2004; Irizarry et al., 2003
  - GC-content-adjusted RMA (GCRMA): Wu et al., 2004
  - MAS 5.0 and PLIER: Affymetrix 2001, 2004
  - Model-Based Expression Index (MBEI): Li & Wong, 2001a,b
  - Factor Analysis for Robust Microarray Summarization (FARMS): Hochreiter et al., 2006
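To make the AUC criterion concrete, here is a minimal sketch of a full ROC AUC computed from a per-probe-set score (for example, an absolute log fold change between two conditions) and the known spike-in labels, using the rank-based (Mann-Whitney) form. The Affycomp benchmark computes its own restricted variant (as we understand it, standardized over a limited number of false positives), so values from this sketch are not directly comparable to the AUC table. Names and data are hypothetical.

```python
import numpy as np


def roc_auc(scores, is_spike) -> float:
    """Probability that a random spike-in scores above a random non-spike-in (ties count 1/2)."""
    scores = np.asarray(scores, dtype=float)
    is_spike = np.asarray(is_spike, dtype=bool)
    pos, neg = scores[is_spike], scores[~is_spike]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one spike-in and one non-spike-in probe set")
    wins = (pos[:, None] > neg[None, :]).sum()   # spike-in ranked above non-spike-in
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (pos.size * neg.size))


# hypothetical |log2 fold change| scores for five probe sets; True marks a spike-in
print(roc_auc([3.0, 2.5, 0.4, 1.2, 0.1], [True, True, False, True, False]))
```

A value of 1.0 means every spike-in outranks every non-spike-in; 0.5 is what a random ordering gives on average.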

[Figure: HGU-95 dataset]

[Figure: HGU-133 dataset (64 spike-ins)]

“Preferred” method
- Choe et al. tested dozens of combinations of background correction, normalization, and summarization methods
- Preferred = the “best performing” combination, judged by the DEGs obtained with CyberT (Baldi & Long, 2001):
  MAS 5.0 background correction → quantile normalization → median polish summarization → second, expression-level normalization using a LOESS procedure
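The summarization step of this pipeline, median polish, fits an additive probe + array model to the log-scale PM values of one probe set and reports the overall effect plus each array's effect. The sketch below is modeled on the sweep order and defaults of R's `stats::medpolish` (at most 10 iterations, relative tolerance 0.01); it is an illustration, not the code used by Choe et al., and the example data are hypothetical.

```python
import numpy as np


def median_polish(y, max_iter: int = 10, eps: float = 0.01) -> np.ndarray:
    """Fit y[i, j] ~ overall + row[i] + col[j] by median polish.

    y: J x R matrix (probes x arrays) of log2 intensities for one probe set.
    Returns overall + col, i.e. one summarized value per array.
    """
    z = np.array(y, dtype=float)  # residuals, swept in place
    overall = 0.0
    row = np.zeros(z.shape[0])    # probe effects
    col = np.zeros(z.shape[1])    # array effects
    old_sum = 0.0
    for _ in range(max_iter):
        r_delta = np.median(z, axis=1)  # remove row (probe) medians
        z -= r_delta[:, None]
        row += r_delta
        delta = np.median(col)          # keep array effects centered
        col -= delta
        overall += delta
        c_delta = np.median(z, axis=0)  # remove column (array) medians
        z -= c_delta[None, :]
        col += c_delta
        delta = np.median(row)          # keep probe effects centered
        row -= delta
        overall += delta
        new_sum = np.abs(z).sum()
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return overall + col


# hypothetical probe set: 3 probes x 3 arrays, exactly additive
y = np.array([[5.0, 6.0, 8.0], [6.0, 7.0, 9.0], [8.0, 9.0, 11.0]])
print(median_polish(y))
```

For exactly additive data the residuals vanish after one pass. Because only medians are used, replacing a single cell (say y[0, 2]) with an outlier such as 50 leaves the three summaries unchanged in this example.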

[Figure: GoldenSpike data (FC = 1.2)]

Overall area under the curve (AUC)

Method     HGU-95(a)  HGU-133(a)  Choe(b)
DFW        1.00 and 0.85 (c)
FARMS      0.91       0.95        0.83
GCRMA      0.69       0.57        0.88
RMA        0.60       0.63        0.77
RMA-noBG   0.65 and 0.82 (c)
MAS 5      0.05       0.06        0.39
MBEI       0.26       0.40        0.76
PLIER      0.03       0.20        0.50

(a) From the Affycomp II competition: 16 spike-ins for HGU-95, 42 spike-ins for HGU-133
(b) All spike-ins
(c) Only two values are given for this method; the missing column is not indicated

[Figure: computation speed (in seconds)]

Computational speed (in seconds)

Method     HGU-95  HGU-133  Choe
DFW        112     150      68
FARMS      132     198      280
GCRMA      214     210      78
RMA        342 and 388 (a)
RMA-noBG   299     353      147
MAS 5      953     1064     130
MBEI       869     833      269
PLIER      321     239      17

(a) Only two values are given for this method; the missing column is not indicated

Further comparisons
- Affycomp II competition (Cope et al., 2004; Irizarry et al., 2006)
  - For the HGU-95 spike-in data, uses 16 spike-ins
  - For the HGU-133 spike-in data, uses 42 spike-ins
  - http://affycomp.biostat.jhsph.edu/AFFY2/TABLES.hgu/0.html
- SMU Technical Report: http://www.smu.edu/statistics/TechReports/TR344.pdf
- Monnie McGee’s website: http://faculty.smu.edu/mmcgee

References
Affymetrix, Inc. (2002) Statistical algorithms description document.
Affymetrix, Inc. (2005) Technical note: guide to probe logarithmic intensity error (PLIER) estimation.
Baldi, P. and Long, A.D. (2001) A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics, 17, 509-519.
Bolstad, B.M. (2004) Low level analysis of high-density oligonucleotide array data: background, normalization and summarization [dissertation]. Department of Statistics, University of California at Berkeley.
Choe, S.E. et al. (2005) Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset. Genome Biol., 6, R16.1-R16.6.
Cope, L.M. et al. (2004) A benchmark for Affymetrix GeneChip expression measures. Bioinformatics, 20, 323-331.
Hochreiter, S. et al. (2006) A new summarization method for Affymetrix probe level data. Bioinformatics, 22, 943-949.
Irizarry, R.A. et al. (2003) Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics, 4, 249-264.
Irizarry, R.A. et al. (2006) Comparison of Affymetrix GeneChip expression measures. Bioinformatics, 22, 789-794.
Li, C. and Wong, W.H. (2001a) Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc. Natl. Acad. Sci., 98, 31-36.
Li, C. and Wong, W.H. (2001b) Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biol., 2, research0032.1-0032.11.
McGee, M. and Chen, Z. (2006) New spiked-in probe sets for the Affymetrix HG-U133A Latin Square experiment. COBRA Preprint Series, Article 5.
Wu, Z. et al. (2004) A model-based background adjustment for oligonucleotide expression arrays. J. Am. Stat. Assoc., 99, 909-917.