

1
P. J. Munson, National Institutes of Health, Nov. 2001
A "Consistency" Test for Determining the Significance of Gene Expression Changes on Replicate Samples and Two Convenient Variance-stabilizing Transformations
Peter J. Munson, Ph.D.
Mathematical and Statistical Computing Laboratory, DCB, CIT, NIH
munson@helix.nih.gov

2
Introduction
Math. Stat. Comp. Lab. at NIH runs the Affymetrix LIMS database
– Started Dec. 2000; stores >700 chips
– Serves 3 core facilities at NIH
Study 1
– 2 treatments, 5 time points, 6 subjects, 60 U95A chips, PBMC cells
Study 2
– 3 treatments, 5 time points, 5 subjects, 75 Hu6800 chips, human cells in culture
Study 3
– 4 doses, 2 time points, 20 subjects, 20 RG U34A chips, blood cells

3
Outline
Development of Consistency Test
Variance-stabilizing transforms
– Generalized logarithm, GLog
– Adaptive transform for Average Difference (AD), TAD
Normalization
– Normal quantile + adaptive transform
Application
Probe-pair data visualization
– Parallel Axis Coordinate Display

4
Comparing Two Cell Lines
Data from Carlisle, et al., Mol. Carcinogen., 2000
– Don't subtract background
– Ignore background-level points
– Calibrate on median intensity of each cell type
Over 3-fold change = outside dashed lines
Are these expression-level changes significant? Are they real?
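A minimal sketch of the screening rules on this slide, assuming raw (not background-subtracted), positive intensities for the two cell lines held in NumPy arrays. The background threshold, the reading of "background-level points" as spots at background in both cell types, and the function name `three_fold_changes` are illustrative assumptions, not details of the Carlisle et al. analysis.

```python
import numpy as np

def three_fold_changes(cell_a, cell_b, background, fold=3.0):
    """Median-calibrated ratio A/B per spot, plus flags for > fold-change up or down.

    Hypothetical helper: assumes positive raw intensities and treats a spot as
    background-level when it is at or below `background` in BOTH cell types.
    """
    a = np.asarray(cell_a, dtype=float)
    b = np.asarray(cell_b, dtype=float)
    # No background subtraction; just ignore spots at background level in both.
    keep = (a > background) | (b > background)
    # Calibrate each cell type on its median intensity over the retained spots.
    a_cal = a[keep] / np.median(a[keep])
    b_cal = b[keep] / np.median(b[keep])
    ratio = np.full(a.shape, np.nan)      # NaN marks ignored spots
    ratio[keep] = a_cal / b_cal
    up = np.zeros(a.shape, dtype=bool)
    down = np.zeros(a.shape, dtype=bool)
    up[keep] = ratio[keep] > fold          # beyond the upper dashed line
    down[keep] = ratio[keep] < 1.0 / fold  # beyond the lower dashed line
    return ratio, up, down

ratio, up, down = three_fold_changes(
    [100, 200, 300, 900, 5], [100, 200, 300, 100, 5], background=10
)
print(ratio, np.flatnonzero(up), np.flatnonzero(down))
```

Flagging a spot this way says nothing yet about whether the change is real; that is the question the replicate-based consistency test addresses.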

5
Duplicate Experiments and "Consistency" Plot Identify Real Changes in Expression
[Figure: labeled genes Vimentin, Keratin 5]

6
Replication Permits Calculation of Significance (P-values)
4 false positives out of 5760 spots: P ≈ 4/5760 ≈ 0.0007

7
Consistency Plot
– Compare duplicate experiments on the log-ratio scale
– Set cutoffs for over- and under-expression
– Calculate the number detected, D
– Assuming independence, calculate the expected number, E, above both or below both cutoffs
– Estimate the false-positive rate, E/D
Observed (expected) spot counts by cutoff region:
                  Rep. 1 below    Rep. 1 between    Rep. 1 above    Total
Rep. 2 above      0 (0.3)         22 (45.2)         24 (0.6)        46
Rep. 2 between    11 (26.1)       4074 (4036.6)     28 (50.4)       4113
Rep. 2 below      16 (0.6)        74 (88.4)         0 (1.1)         90
Total             27              4170              52              4249
Consistently over-expressed: D = 24, E = 0.6, E/D ≈ 3%
Consistently under-expressed: D = 16, E = 0.6
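Under independence of the two replicates, the expected count in each cutoff region is (row total × column total) / N. A small sketch using the counts read from this slide (N = 4249 spots; 24 spots above both upper cutoffs, 16 below both lower cutoffs); the orientation of the 3×3 table (which replicate is rows) is an assumption and does not affect the result.

```python
def expected_counts(table):
    """Expected cell counts for independent rows/columns: E_ij = row_i * col_j / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Counts from the consistency plot (assumed orientation):
# rows = replicate 2 (above, between, below cutoffs),
# columns = replicate 1 (below, between, above cutoffs).
observed = [
    [0, 22, 24],
    [11, 4074, 28],
    [16, 74, 0],
]
expected = expected_counts(observed)

D_up, E_up = observed[0][2], expected[0][2]   # above both upper cutoffs
D_dn, E_dn = observed[2][0], expected[2][0]   # below both lower cutoffs
print(f"over:  D={D_up}, E={E_up:.2f}, E/D={E_up / D_up:.1%}")
print(f"under: D={D_dn}, E={E_dn:.2f}, E/D={E_dn / D_dn:.1%}")
```

Computed from the margins, E is 0.563 for the over-expressed corner and 0.572 for the under-expressed one, both of which round to the 0.6 shown on the slide.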

8
p53 +/+ cells 6 hrs, replicate reciprocal experiment

9
Consistency Test on Relative Expression
DEFINE:
– $x(g, i)$ = relative expression value for gene $g$ ($= 1, \dots, n$) in experiment $i$ ($= 1, \dots, m$)
– $F_i(X)$ = empirical cdf of $x_i$ across genes (spots)
– $c = \min_j x(g, j)$, across experiments
THEN, assuming that $\{x(g, i),\ g = 1, \dots, n\}$ is an independent sample from distribution $F_i$, the probability that $x(g, i)$ is consistently large is:
$$p_{up}(g) = \Pr(X_i \ge c \text{ for all } i) = \prod_i \bigl(1 - F_i(c)\bigr)$$
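As a purely hypothetical two-experiment illustration (the percentiles are invented, not taken from the study): suppose gene g's smaller value c lies at the 99th percentile of experiment 1 and at the 98th percentile of experiment 2, so $F_1(c) = 0.99$ and $F_2(c) = 0.98$. Then

```latex
p_{up}(g) = \prod_{i=1}^{2} \bigl(1 - F_i(c)\bigr)
          = (1 - 0.99)(1 - 0.98)
          = 0.01 \times 0.02
          = 2 \times 10^{-4}
```

Multiplying the per-experiment tail probabilities is exactly where the assumption of independence between experiments enters.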

10
Consistency Test on Relative Expression - 2
DEFINE: $x(g, i)$ = relative expression value for gene $g$ ($= 1, \dots, n$) in experiment $i$ ($= 1, \dots, m$)
$$p_{up}(g) = \prod_i \Bigl(1 - F_i\bigl(\min_j x(g, j)\bigr)\Bigr)$$
$$p_{dn}(g) = \prod_i F_i\bigl(\max_j x(g, j)\bigr)$$
THEN the expected number of false positives is $E(g) = n \cdot p(g)$, where $p(g)$ is $p_{up}(g)$ or $p_{dn}(g)$.
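A vectorized sketch of these formulas, assuming the relative expression values sit in an (n genes × m experiments) NumPy array. For $p_{up}$ it estimates $\Pr(X_i \ge c)$ as the fraction of genes at or above c (counting gene g itself), which equals $1 - F_i(c)$ for a continuous distribution and avoids a zero p-value for the single most extreme gene; that counting convention, the function name, and the simulated data are assumptions.

```python
import numpy as np

def consistency_pvalues(x):
    """x: (n genes, m experiments) relative expression, e.g. log ratios.

    Returns p_up, p_dn and the expected false-positive counts n*p_up, n*p_dn.
    """
    n, m = x.shape
    sorted_cols = np.sort(x, axis=0)   # each column sorted -> empirical cdf F_i
    c_up = x.min(axis=1)               # min_j x(g, j)
    c_dn = x.max(axis=1)               # max_j x(g, j)
    p_up = np.ones(n)
    p_dn = np.ones(n)
    for i in range(m):
        col = sorted_cols[:, i]
        # Pr(X_i >= c): genes in experiment i at or above c
        n_below = np.searchsorted(col, c_up, side="left")
        p_up *= (n - n_below) / n
        # F_i(c) = Pr(X_i <= c): genes in experiment i at or below c
        p_dn *= np.searchsorted(col, c_dn, side="right") / n
    return p_up, p_dn, n * p_up, n * p_dn

# Hypothetical data: 5760 spots, 2 replicate experiments, 20 genes truly up in both.
rng = np.random.default_rng(1)
x = rng.normal(size=(5760, 2))
x[:20] += 4.0
p_up, p_dn, e_up, e_dn = consistency_pvalues(x)
print("genes with E_up < 1:", np.flatnonzero(e_up < 1.0))
```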

11
Assumptions of Consistency Test
– Independence between experiments
– "Exchangeability" of genes
– Homogeneity of variance across genes (i.e., across expression intensity)
Does NOT require:
– Identical distribution in separate experiments
But variance homogeneity is violated for Affymetrix Average Difference (AD) data

12
Variance-Stabilizing Transformations
– Logarithm
– Box-Cox (power)
– Generalized logarithm, GLog
– Adaptive, TAD

13
Model Variance as Function of Mean AD

14
Model Variance as Function of Mean AD
$\mathrm{Var}(y) = a_0$
$\mathrm{Var}(y) = a_0 + a_1 y$
$\mathrm{Var}(y) = a_0 + a_1 y + a_2 y^2$
$\mathrm{Var}(y) = a_2 y^2$ ⇒ use logarithms
What about $\mathrm{Var}(y) = a_0 + a_2 y^2$?
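One way to estimate $a_0$ and $a_2$, sketched under assumptions: per-gene means and variances come from replicate chips, and since the model is linear in $y^2$ it can be fit by least squares. Sample variances are much noisier at high intensity, so this sketch reweights by 1/(fitted variance)² for a few iterations. The weighting scheme, the function name, and the simulated data (a0 = 100, a2 = 0.01, i.e. s.d. 10 at y = 0 and CV 0.1) are illustrative choices, not the author's procedure.

```python
import numpy as np

def fit_quadratic_variance(means, variances, n_iter=3):
    """Fit Var(y) = a0 + a2*y**2 by iteratively reweighted least squares."""
    X = np.column_stack([np.ones_like(means), means**2])
    w = np.ones_like(means)                 # first pass = ordinary least squares
    for _ in range(n_iter):
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(X * sw[:, None], variances * sw, rcond=None)
        # Sample variances have s.d. roughly proportional to their mean,
        # so weight each gene by 1 / (fitted variance)**2; floor keeps it finite.
        fitted = np.maximum(X @ coef, 1e-8 * variances.max())
        w = 1.0 / fitted**2
    a0, a2 = coef
    return a0, a2

# Hypothetical replicate data: 20000 genes x 10 replicates.
rng = np.random.default_rng(0)
n_genes, n_reps = 20000, 10
mu = rng.uniform(0.0, 500.0, n_genes)
true_sd = np.sqrt(100.0 + 0.01 * mu**2)
y = mu[:, None] + true_sd[:, None] * rng.standard_normal((n_genes, n_reps))

a0, a2 = fit_quadratic_variance(y.mean(axis=1), y.var(axis=1, ddof=1))
print(f"a0 = {a0:.1f}, a2 = {a2:.4f}, c = sqrt(a0/a2) = {np.sqrt(a0 / a2):.0f}")
```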

15
Generalized Log Transform (G-Log)
$\mathrm{Var}(y) = a_0 + a_2 y^2 = a_0\bigl(1 + (y/c)^2\bigr)$, where $c = \sqrt{a_0/a_2}$
$\mathrm{GLog}(y; c) = \operatorname{sign}(y)\,\ln\bigl(|y/c| + \sqrt{1 + y^2/c^2}\bigr)$
$c$ = s.d. at $y = 0$ / CV, e.g., $c = 10 / 0.1 = 100$
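The GLog formula above is the inverse hyperbolic sine, $\operatorname{asinh}(y/c)$. A short sketch of it, plus a delta-method check: with $\mathrm{Var}(y) = a_0 + a_2 y^2$, the s.d. of $\mathrm{GLog}(y)$ is approximately $\mathrm{sd}(y)\cdot \frac{d}{dy}\mathrm{GLog}(y) = \sqrt{a_2}$, the CV, at every intensity. The parameter values reuse the slide's example (s.d. 10 at y = 0, CV 0.1); the finite-difference step is an arbitrary choice.

```python
import math

def glog(y, c):
    """Generalized log: sign(y) * ln(|y/c| + sqrt(1 + y**2 / c**2)), i.e. asinh(y/c)."""
    z = abs(y) / c
    return math.copysign(math.log(z + math.sqrt(1.0 + z * z)), y)

# Delta-method check of variance stabilization under Var(y) = a0 + a2*y**2.
a0, a2 = 10.0**2, 0.1**2      # s.d. 10 at y = 0, CV = 0.1
c = math.sqrt(a0 / a2)        # = 100
h = 1e-3                      # step for a central finite difference
for y in (0.0, 100.0, 1000.0, 10000.0):
    slope = (glog(y + h, c) - glog(y - h, c)) / (2 * h)
    sd_after = math.sqrt(a0 + a2 * y * y) * slope
    print(f"y = {y:>7}: approx. s.d. of GLog(y) = {sd_after:.4f}")
```

Every line prints about 0.1000: after the transform, the noise level no longer depends on intensity, which is what the consistency test's homogeneity assumption requires.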

16
Quantile Normalization for AD (before)

17
Quantile Normalization for AD (after)

18
Normal Quantile Transform after GLog(AD) (it’s almost linear)
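The slides do not spell out how the normal quantile (NQ) transform is computed. A common rank-based version, used here as an assumption, replaces each value on a chip by the standard-normal quantile of its rank, $\Phi^{-1}((\text{rank} - 0.5)/n)$; applying it to every chip puts all chips on the same reference scale. Ties are broken by input order in this sketch, and the function name is hypothetical.

```python
from statistics import NormalDist

def normal_quantile_transform(values):
    """Replace each value by Phi^-1((rank - 0.5) / n), rank 1 = smallest."""
    n = len(values)
    order = sorted(range(n), key=lambda k: values[k])   # ties keep input order
    inv_cdf = NormalDist().inv_cdf
    scores = [0.0] * n
    for rank, k in enumerate(order, start=1):
        scores[k] = inv_cdf((rank - 0.5) / n)
    return scores

# Hypothetical GLog(AD) values for one chip.
print(normal_quantile_transform([1.2, -0.4, 3.5, 0.8, 2.1]))
```

Because only ranks are used, any monotone transform applied first (such as GLog) gives the same NQ scores; the slide's near-linear plot indicates that GLog(AD) is already close to normally distributed.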

19
Adaptive Transform of AD (TAD) - 1
– Model variance (over many replicates) vs. mean AD
– Plot log(SD), or the Wilson-Hilferty transform SD^(2/3), vs. mean of NQ(AD)
– Fit a smooth function, g, which predicts SD
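A sketch of step 1 under stated assumptions: replicate data are already on the NQ scale, and a cubic polynomial fitted on the Wilson-Hilferty scale SD^(2/3) stands in for the unspecified smoother. The polynomial degree, the positivity floor, and the simulated data are illustrative choices.

```python
import numpy as np

def fit_sd_function(means, sds, degree=3):
    """Fit SD as a smooth function g(mean), working on the SD**(2/3) scale."""
    coefs = np.polyfit(means, sds ** (2.0 / 3.0), degree)

    def g(x):
        # Back-transform to the SD scale; the floor keeps g > 0 so 1/g stays finite.
        return np.maximum(np.polyval(coefs, x), 1e-6) ** 1.5

    return g

# Hypothetical replicate data on the NQ scale: 5000 genes x 10 replicates.
rng = np.random.default_rng(2)
m_true = rng.uniform(-2.0, 3.0, 5000)
sd_true = (0.3 + 0.05 * (m_true - 1.0) ** 2) ** 1.5
reps = m_true[:, None] + sd_true[:, None] * rng.standard_normal((5000, 10))
g = fit_sd_function(reps.mean(axis=1), reps.std(axis=1, ddof=1))
print([round(float(g(m)), 3) for m in (-1.0, 0.0, 1.0, 2.0)])
```

With 10 replicates the average of SD^(2/3) runs about 2.5% below the true value, so g slightly underestimates SD. A constant-factor bias like this only rescales the resulting transform.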

20
Adaptive Transform of AD (TAD) - 2
$$T(X) = \int_{-\infty}^{X} \frac{du}{g(u)}$$
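A numerical sketch of step 2, assuming g is any positive callable such as the fitted SD function. The integral is tabulated with a cumulative trapezoid rule from a finite lower limit. Changing the lower limit changes T only by an additive constant, which cancels in differences such as Delta TAD. The grid size and limits are arbitrary choices. As a check, when $g(u) = k\sqrt{c^2 + u^2}$ (the s.d. under $\mathrm{Var}(y) = a_0 + a_2 y^2$, with $k = \sqrt{a_2}$), $T$ reduces to $\operatorname{asinh}(u/c)/k$, i.e. GLog divided by the CV.

```python
import numpy as np

def adaptive_transform(g, lower, upper, n_grid=10001):
    """Tabulate T(x) = integral from `lower` to x of du / g(u) on a uniform grid."""
    u = np.linspace(lower, upper, n_grid)
    inv_g = 1.0 / g(u)
    # Cumulative trapezoid rule, with T(lower) = 0.
    steps = 0.5 * (inv_g[1:] + inv_g[:-1]) * np.diff(u)
    t = np.concatenate(([0.0], np.cumsum(steps)))
    return lambda x: np.interp(x, u, t)   # linear interpolation between grid points

# Check against the G-Log case: g(u) = k * sqrt(c**2 + u**2)
# gives T(x2) - T(x1) = (asinh(x2/c) - asinh(x1/c)) / k.
c, k = 100.0, 0.1
T = adaptive_transform(lambda u: k * np.sqrt(c**2 + u**2), lower=-500.0, upper=5000.0)
print(T(1000.0) - T(0.0), np.arcsinh(10.0) / k)
```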

21
Adaptive Transform of AD (TAD)

22
Consistency Test p-values
[Figure panels: Time 1 vs. Time 0 and Time 2 vs. Time 0, for Treatment and Sham]

23
Results of Study 1 (5 time points, 2 treatments, 6 subjects)

24
Probe Pair Data, Delta TAD = 2
Parallel Axis Coordinate Display

25
Probe Pair Data, Delta TAD = 0.5

26
Probe Pair Data, Delta TAD = -1.5

27
Probe Pair Data, Delta TAD = -0.5

28
Acknowledgements
Lynn Young, MSCL; Vinay Prabhu, MSCL; Jennifer Barb, MSCL; Howard Shindel, MSCL
Andrew Schwartz, CIT; Steve Bailey, CIT
Robert Danner, CC; Anthony Suffredini, CC; Peter Eichacker, CC; James Shelhamer, CC; Eric Gerstenberger, CC
Sayed Daoud, NCI; Yves Pommier, NCI; John Weinstein, NCI; David Krizman, NCI; Alex Carlisle, NCI
David Rocke, UC Davis
