Presentation is loading. Please wait.

Presentation is loading. Please wait.

Zhongxue Chen, Monnie McGee, Qingzhong Liu and Richard Scheuermann

Similar presentations


Presentation on theme: "Zhongxue Chen, Monnie McGee, Qingzhong Liu and Richard Scheuermann"— Presentation transcript:

1 A Distribution-Free Summarization Method for Affymetrix GeneChip Arrays
Zhongxue Chen, Monnie McGee, Qingzhong Liu and Richard Scheuermann Dallas Area Bioinformatics Workshop August 29, 2006

2 A new summarization method
Distribution Free Weighted (DFW) Summarization Use information on variability of probe intensities to summarize Affymetrix data Translate variability into weights which allow downweighting of poorly performing probes DAB Workshop 2006

3 Need for Summarization
Result of unique Affymetrix array structure Summarization is necessary to obtain one number for each gene All probes interrogating each gene must be summarized into one expression value DAB Workshop 2006

4 Structure of Affymetrix Arrays
Probe = sequence of 25 bases Probe pair = perfect match (PM) probe and its corresponding mismatch (MM) Probe set = 11 to 20 probe pairs interrogating one gene or EST Chips contain 6K to 54K probe sets Image courtesy of Affymetrix DAB Workshop 2006

5 PM and MM PM = 25 base probe perfectly complementary to a specific region of a gene MM = 25 base probe agreeing with PM apart from middle base Middle base is a transition to Watson-Crick complement (AT, G C) DAB Workshop 2006

6 DFW Transform probe-level intensities to log2 scale for all arrays in experiment Stabilizes the variance (larger intensity  increased variability Arrange arrays in N by R matrix N = total number of PM probes R = total number of arrays for entire experiment For each probe set, calculate a weight for each PM probe using Tukey biweight function Multiply weights by each probe intensity and summarize DAB Workshop 2006

7 Calculating Weights Calculate range of log intensities for each PM
Find median of each range (M) Calculate distance of range to M for each PM (call this distance x) Weighting function: DAB Workshop 2006

8 Probe Weights Weight for probe i is given by
J = number of probes in the probe set DAB Workshop 2006

9 More Calculations Weighted Range (WR)
Range of weighted intensities Weighted Standard Deviation (WSD) Transformed Intensity Values (TIV) Standardizes measures between DEGs and non-DEGs m and n should be positive integers DAB Workshop 2006

10 Example array-1, 2, 3, 4, 5, 6 range x w(x) wi SD wi(SD)
PM PM PM PM M = 3.85 max(x) = 1.65 Weighted Intensities: Transformed Intensities (TI): Weighted Range (WR): = 3.53 Weighted SD (WSD): 1.75 Expression values (m=3, n=1): DAB Workshop 2006

11 Why Weight? Some PMs may have poor behavior
Give small or 0 weight to “poor” PM Use information across arrays Assess quality of PM based on overall behavior SD of range provides information for detecting differentially expressed genes DAB Workshop 2006

12 Probe Performance Poorly performing probes DAB Workshop 2006

13 Comparison Data Sets Affymetrix Latin Square Spike-In Experiments
Two experiments: on HGU-95Av2 platform and HGU-133A platform HGU-95 experiments has 14 transcripts spiked-in at concentrations from 0 to 1024 pM (59 arrays) HGU-133 experiment has 42 transcripts spiked-in in triplicate at concentrations from 0 to 512 (42 arrays) McGee and Chen (2006) report 22 more spike-ins “GoldenSpike” Experiment (Choe et al., 2005) Six arrays (3 experiment, 3 control) on DrosGenome1 Chip 1309 transcripts recognizing known fold differences (from 1.2 to 4) 2551 recognizing transcripts included at the same concentration DAB Workshop 2006

14 Comparison Methods ROC curves, AUC values and CPU time Competitors:
Robust Multichip Average (RMA) Bolstad, 2004; Irizarry et al., 2003 Gene Chip RMA (GCRMA) Wu et al., 2004 MAS 5.0, PLIER Affymetrix 2001, 2004 Model-Based Expression Index (MBEI) Li & Wong, 2001a,b Factor Analysis for Robust Array Summarization (FARMS) Hochreiter et al., 2006 DAB Workshop 2006

15 HGU-95 dataset : DAB Workshop 2006

16 HGU-133 dataset (64 spike-ins)
DAB Workshop 2006

17 “Preferred” Method Choe et al. tested dozens of combinations of background correction, normalization, and summarization methods Preferred = the “best performing” method (according to DEGs obtained by CyberT - Baldi & Long, 2001) MAS 5.0 background correction  Quantile normalization  median polish summarization  second expression level normalization using LOESS procedure DAB Workshop 2006

18 GoldenSpike Data (FC = 1.2)
DAB Workshop 2006

19 Overall Area Under the Curve
HGU-95a HGU-133a Choeb DFW 1.00 0.85 FARMS 0.91 0.95 0.83 GCRMA 0.69 0.57 0.88 RMA 0.60 0.63 0.77 RMA-noBG 0.65 0.82 MAS 5 0.05 0.06 0.39 MBEI 0.26 0.40 0.76 PLIER 0.03 0.20 0.50 a From Affycomp II competition: 16 spike-ins for HGU95, 42 spike-ins for HGU133, bAll spike-ins DAB Workshop 2006

20 Computation Speed (in seconds)
DAB Workshop 2006

21 Computational Speed (in seconds)
HGU-95 HGU-133 Choe DFW 112 150 68 FARMS 132 198 280 GCRMA 214 210 78 RMA 342 388 RMA-noBG 299 353 147 MAS 5 953 1064 130 MBEI 869 833 269 PLIER 321 239 17 DAB Workshop 2006

22 Further Comparisons Affycomp II Competition SMU Technical Report
Cope, et al., 2004; Irizarry et al., 2006 For Hgu95 spikein data, uses 16 spike-ins For Hgu133 spikein data, uses 42 spike-ins SMU Technical Report Monnie McGee’s website DAB Workshop 2006

23 References DAB Workshop 2006
Affymetrix, Inc.. (2002) Statistical algorithms description document. Affymetrix, Inc. (2005) Technical note: guide to probe logarithmic intensity error (PLIER) estimation. Baldi,P. and Long, A.D. (2001) A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics, 17, Bolstad, BM. (2004) Low Level Analysis of High-density oligonucleotide array data: Background, normalization and summarization [dissertation]. Department of Statistics, University of California at Berkeley. Choe, S.E. et al. (2005) Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control datasets. Genome Biol., 6, R16.1-R16.6. Cope, L.M. et.al. (2004) A benchmark for Affymetrix GeneChip expression measures. Bioinformatics, 20, Hochreiter, S. et al. (2006) A new summarization method for Affymetrix probe level data. Bioinformatics, 22, Irizarry, R.A. et al. (2003) Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics, 4, Irizarry, R.A. et al. (2006) Comparison of Affymetrix GeneChip expression measures. Bioinformatics, 22, Li, C. and Wong, H.W. (2001a) Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc. Nat. Acad. Sci., 98, Li, C and Wong, H.W. (2001b) Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biol., 2, research McGee, M. and Chen, Z. (2006) New spiked-in probe sets for the Affymetrix HG-U133A Latin Square experiment. COBRA Preprint Series, Article 5 Wu, Z. et.al. (2004) A model-based background adjustment for oligonucleotide expression arrays. J. Am. Stat. Assoc., 99, DAB Workshop 2006


Download ppt "Zhongxue Chen, Monnie McGee, Qingzhong Liu and Richard Scheuermann"

Similar presentations


Ads by Google