# 1 Parametric Empirical Bayes Methods for Microarrays 3/7/2011 Copyright © 2011 Dan Nettleton.

1 Parametric Empirical Bayes Methods for Microarrays 3/7/2011 Copyright © 2011 Dan Nettleton

2 Parametric Empirical Bayes Methods for Microarrays Kendziorski, C. M., Newton, M. A., Lan, H., Gould, M. N. (2003). On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Statistics in Medicine. 22, 3899-3914. Newton, M. A. and Kendziorski, C. M. (2003). Parametric empirical Bayes methods for microarrays. Chapter 11 of The Analysis of Gene Expression Data. Springer. New York.

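3 The Gamma Distribution: $X \sim \mathrm{Gamma}(\alpha, \beta)$ has density $f(x) = \dfrac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\beta x}$ for $x > 0$, with $E(X) = \alpha/\beta$ and $\mathrm{Var}(X) = \alpha/\beta^{2}$. [Figure: density plotted against $x$ for four parameter settings: α=2, β=0.00008; α=1, β=0.0001; α=33, β=0.0008; α=340, β=0.006.]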
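The slide uses the rate parameterization (mean α/β), whereas many software routines take a scale parameter instead. The following is a minimal sketch, assuming Python's standard `random.gammavariate` (whose second argument is a scale, so 1/β is passed); the helper name `sample_gamma_rate`, the seed, and the sample size are illustrative choices, not part of the slides.

```python
import random
import statistics

def sample_gamma_rate(alpha, beta, n, rng):
    """Draw n values from Gamma(shape=alpha, rate=beta).

    random.gammavariate expects a SCALE parameter, so we pass 1/beta.
    """
    return [rng.gammavariate(alpha, 1.0 / beta) for _ in range(n)]

rng = random.Random(0)          # arbitrary seed, for reproducibility only
alpha, beta = 2.0, 0.00008      # one of the parameter settings plotted on the slide
draws = sample_gamma_rate(alpha, beta, 50_000, rng)

# Compare Monte Carlo moments with the formulas E(X) = alpha/beta, Var(X) = alpha/beta^2
print("sample mean:", statistics.fmean(draws), "theory:", alpha / beta)
print("sample var: ", statistics.variance(draws), "theory:", alpha / beta ** 2)
```

The sample moments should land close to the theoretical values, which is a quick way to confirm that the rate/scale convention has been handled correctly.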

4 A Model for the Data from a Two-Treatment Experiment: Assume there are $J$ genes indexed by $j = 1, 2, \ldots, J$. The data for gene $j$ are $x_j = (x_{j1}, x_{j2}, \ldots, x_{jI})$, where $x_{ji}$ is the normalized measure of expression on the original scale for the $j$th gene and $i$th experimental unit. Let $s_1$ denote the subset of the indices $\{1, \ldots, I\}$ corresponding to treatment 1, and let $s_2$ denote the subset corresponding to treatment 2.

5 The Model (continued): Assume that each gene is differentially expressed (DE) with an unknown probability $p$ and equivalently expressed (EE) with probability $1-p$. If gene $j$ is equivalently expressed, then $x_{j1}, x_{j2}, \ldots, x_{jI} \mid \lambda_j \overset{\text{i.i.d.}}{\sim} \mathrm{Gamma}(\alpha, \lambda_j)$ with mean $\alpha/\lambda_j$, where $\lambda_j \sim \mathrm{Gamma}(\alpha_0, \nu)$.

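6 The Model (continued): If gene $j$ is differentially expressed, then $\{x_{ji} : i \in s_1\} \mid \lambda_{j1} \overset{\text{i.i.d.}}{\sim} \mathrm{Gamma}(\alpha, \lambda_{j1})$ with mean $\alpha/\lambda_{j1}$, where $\lambda_{j1} \sim \mathrm{Gamma}(\alpha_0, \nu)$, and $\{x_{ji} : i \in s_2\} \mid \lambda_{j2} \overset{\text{i.i.d.}}{\sim} \mathrm{Gamma}(\alpha, \lambda_{j2})$ with mean $\alpha/\lambda_{j2}$, where $\lambda_{j2} \sim \mathrm{Gamma}(\alpha_0, \nu)$. All random variables are assumed to be independent. $p$, $\alpha$, $\alpha_0$, and $\nu$ are unknown parameters to be estimated from the data.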

7 An example of how the model is imagined to generate the data for the $j$th gene: Suppose $p = 0.05$, $\alpha = 12$, $\alpha_0 = 0.9$, and $\nu = 36$. Generate a Bernoulli random variable with success probability 0.05. If the result is a success, the gene is DE; otherwise the gene is EE. If EE, generate $\lambda_j$ from Gamma($\alpha_0 = 0.9$, $\nu = 36$). Then generate i.i.d. expression values from Gamma($\alpha = 12$, $\lambda_j$).

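8 If gene is EE... [Figure: two density plots. The first shows the Gamma($\alpha_0 = 0.9$, $\nu = 36$) density of $\lambda_j$. The second shows the Gamma($\alpha = 12$, $\lambda_j = 0.05$) density of $x$, together with the expression values for the $j$th gene from Trt 1 and Trt 2.]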

9 Example Continued: If the gene is DE, generate $\lambda_{j1}$ and $\lambda_{j2}$ independently from Gamma($\alpha_0 = 0.9$, $\nu = 36$). Then generate treatment 1 expression values i.i.d. from Gamma($\alpha = 12$, $\lambda_{j1}$), and generate treatment 2 expression values i.i.d. from Gamma($\alpha = 12$, $\lambda_{j2}$).
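The generating process described in this example (Bernoulli draw, then one or two latent rates, then expression values) can be sketched in a few lines. This is an illustration only: the function name `simulate_gene`, the three-units-per-treatment layout, the number of genes, and the seed are assumptions made here; the parameter values are the ones used in the example.

```python
import random

def simulate_gene(s1, s2, p, alpha, alpha0, nu, rng):
    """Simulate one gene; return (is_de, values), values keyed by unit index."""
    def draw_rate():
        # lambda ~ Gamma(shape=alpha0, rate=nu); gammavariate wants scale = 1/rate
        return rng.gammavariate(alpha0, 1.0 / nu)

    def draw_expr(lam, units):
        # x | lambda ~ Gamma(shape=alpha, rate=lambda), i.i.d. across units
        return {i: rng.gammavariate(alpha, 1.0 / lam) for i in units}

    is_de = rng.random() < p              # Bernoulli(p): success means DE
    if is_de:
        # DE: independent rates for the two treatments
        values = draw_expr(draw_rate(), s1)
        values.update(draw_expr(draw_rate(), s2))
    else:
        # EE: a single rate shared by every unit
        values = draw_expr(draw_rate(), list(s1) + list(s2))
    return is_de, values

rng = random.Random(1)                    # arbitrary seed
s1, s2 = [1, 2, 3], [4, 5, 6]             # hypothetical layout: 3 units per treatment
genes = [simulate_gene(s1, s2, 0.05, 12, 0.9, 36, rng) for _ in range(2000)]
print("fraction DE:", sum(is_de for is_de, _ in genes) / len(genes))
```

With many simulated genes, the fraction flagged DE should sit near p = 0.05.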

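10 If gene is DE... [Figure: two density plots. The first shows the Gamma($\alpha_0 = 0.9$, $\nu = 36$) density of $\lambda$. The second shows the Gamma($\alpha = 12$, $\lambda_{j1} = 0.02$) and Gamma($\alpha = 12$, $\lambda_{j2} = 0.07$) densities of $x$.]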

11 If gene is DE... [Figure: the same two plots as on the previous slide, with the Trt 1 data and Trt 2 data added to the plot of the Gamma($\alpha = 12$, $\lambda_{j1} = 0.02$) and Gamma($\alpha = 12$, $\lambda_{j2} = 0.07$) densities.]

12 Joint Density of $x_j$ for an EE Gene

13 Joint Density of $x_j$ for an EE Gene (continued): $f_{EE}(x_j)$
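The expression for this density is not preserved in the transcript, but it follows from the EE model stated earlier by integrating the latent rate out of the product of Gamma densities. A sketch of that calculation, assuming the rate parameterization of the Gamma distribution used throughout (the constant $K$ is shorthand introduced here):

```latex
\begin{aligned}
f_{EE}(x_j)
&= \int_0^\infty \left[\prod_{i=1}^{I} \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, x_{ji}^{\alpha-1} e^{-\lambda x_{ji}}\right]
   \frac{\nu^{\alpha_0}}{\Gamma(\alpha_0)}\, \lambda^{\alpha_0-1} e^{-\nu\lambda}\, d\lambda \\
&= \frac{\nu^{\alpha_0}}{\Gamma(\alpha)^{I}\,\Gamma(\alpha_0)}
   \left(\prod_{i=1}^{I} x_{ji}\right)^{\alpha-1}
   \int_0^\infty \lambda^{I\alpha+\alpha_0-1}\,
   e^{-\lambda\left(\nu+\sum_{i=1}^{I} x_{ji}\right)}\, d\lambda \\
&= K\, \frac{\left(\prod_{i=1}^{I} x_{ji}\right)^{\alpha-1}}
            {\left(\nu+\sum_{i=1}^{I} x_{ji}\right)^{I\alpha+\alpha_0}},
\qquad
K = \frac{\nu^{\alpha_0}\,\Gamma(I\alpha+\alpha_0)}{\Gamma(\alpha)^{I}\,\Gamma(\alpha_0)}.
\end{aligned}
```

The last step uses the fact that the remaining integrand is an unnormalized Gamma($I\alpha+\alpha_0$, $\nu+\sum_i x_{ji}$) density in $\lambda$.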

14 Joint Density for a DE Gene: $f_{DE}(x_j) =$ [equation not preserved in the transcript], where [symbol not preserved] = the number of treatment $k$ observations.
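Because the two treatment groups have independent latent rates under the DE model, the joint density factors into one term per group, each obtained by integrating that group's rate out of its Gamma densities. A sketch, writing $n_k$ for the number of treatment $k$ observations (the symbol is an assumption, since the slide's own notation is not preserved) and $K_k$ for shorthand constants introduced here:

```latex
f_{DE}(x_j)
= \prod_{k=1}^{2} K_k\,
  \frac{\left(\prod_{i \in s_k} x_{ji}\right)^{\alpha-1}}
       {\left(\nu+\sum_{i \in s_k} x_{ji}\right)^{n_k\alpha+\alpha_0}},
\qquad
K_k = \frac{\nu^{\alpha_0}\,\Gamma(n_k\alpha+\alpha_0)}{\Gamma(\alpha)^{n_k}\,\Gamma(\alpha_0)}.
```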

15 Marginal Density for Gene $j$: $f(x_j) = p\, f_{DE}(x_j) + (1-p)\, f_{EE}(x_j)$. Marginal Likelihood for the Observed Data: $f(x_1)\, f(x_2) \cdots f(x_J)$. Use the EM algorithm to find values of $p$, $\alpha$, $\alpha_0$, and $\nu$ that make the log likelihood as large as possible.
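For a two-component mixture of this form, one EM iteration can be sketched as follows. The symbols $\ell$, $z_j$, and the superscript $(t)$ are notation introduced here. The slides do not say how the update for $\alpha$, $\alpha_0$, and $\nu$ is carried out, so the numerical maximization in the last line is an assumption about a typical approach rather than a description of the EBArrays implementation.

```latex
\begin{aligned}
\ell(p,\alpha,\alpha_0,\nu)
  &= \sum_{j=1}^{J} \log\left\{ p\, f_{DE}(x_j) + (1-p)\, f_{EE}(x_j) \right\} \\[4pt]
\text{E-step:}\quad
z_j^{(t)}
  &= \frac{p^{(t)} f_{DE}(x_j)}{p^{(t)} f_{DE}(x_j) + \left(1-p^{(t)}\right) f_{EE}(x_j)}
  \quad\text{with } f_{DE}, f_{EE} \text{ evaluated at } \left(\alpha^{(t)}, \alpha_0^{(t)}, \nu^{(t)}\right) \\[4pt]
\text{M-step:}\quad
p^{(t+1)}
  &= \frac{1}{J} \sum_{j=1}^{J} z_j^{(t)} \\
\left(\alpha, \alpha_0, \nu\right)^{(t+1)}
  &= \arg\max_{\alpha,\alpha_0,\nu} \sum_{j=1}^{J}
     \left[ z_j^{(t)} \log f_{DE}(x_j) + \left(1-z_j^{(t)}\right) \log f_{EE}(x_j) \right]
\end{aligned}
```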

16 The posterior probability of differential expression for gene $j$ is obtained by replacing $p$, $\alpha$, $\alpha_0$, and $\nu$ in $\dfrac{p\, f_{DE}(x_j)}{p\, f_{DE}(x_j) + (1-p)\, f_{EE}(x_j)}$ with their maximum likelihood estimates. Software for EBArrays is available at http://www.biostat.wisc.edu/~kendzior.
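As a rough illustration of this calculation, the sketch below evaluates the posterior probability on the log scale. It assumes the closed-form densities obtained by integrating the latent Gamma rates out of the model (the slides' own expressions are not preserved here). The function names are hypothetical, the parameter values are the illustrative ones from the earlier example rather than maximum likelihood estimates, and the expression values are made up.

```python
import math

def log_shared_rate_density(xs, alpha, alpha0, nu):
    """Log marginal density of xs when all observations share one latent rate.

    Observations are Gamma(shape=alpha, rate=lambda); lambda ~ Gamma(alpha0, nu)
    is integrated out analytically.
    """
    n = len(xs)
    log_k = (alpha0 * math.log(nu) + math.lgamma(n * alpha + alpha0)
             - n * math.lgamma(alpha) - math.lgamma(alpha0))
    return (log_k
            + (alpha - 1.0) * sum(math.log(x) for x in xs)
            - (n * alpha + alpha0) * math.log(nu + sum(xs)))

def posterior_de(x1, x2, p, alpha, alpha0, nu):
    """P(DE | data) = p f_DE / (p f_DE + (1 - p) f_EE), via the log-odds."""
    log_ee = log_shared_rate_density(list(x1) + list(x2), alpha, alpha0, nu)
    log_de = (log_shared_rate_density(x1, alpha, alpha0, nu)
              + log_shared_rate_density(x2, alpha, alpha0, nu))
    log_odds = math.log(p) - math.log1p(-p) + log_de - log_ee
    # Numerically stable logistic transform of the log-odds
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)

params = dict(p=0.05, alpha=12, alpha0=0.9, nu=36)   # illustrative, not estimates
print(posterior_de([600, 650, 580], [150, 170, 160], **params))  # groups far apart
print(posterior_de([300, 320, 310], [305, 315, 295], **params))  # groups similar
```

Working with log densities avoids overflow in the Gamma functions, which matters once the number of replicates or the value of α grows.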

17 Extension to Multiple Treatment Groups: If there are 3 treatment groups, each gene can be classified into one of 5 categories rather than just the two categories EE and DE: a) 1=2=3, b) 1=2≠3, c) 1≠2=3, d) 1=3≠2, e) 1≠2, 2≠3, 1≠3. Extensions to more than 3 groups can be handled similarly.
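The five categories are exactly the ways of partitioning the group labels {1, 2, 3} into blocks of equal means, so the patterns for more groups can be enumerated mechanically. A minimal sketch (the function name `set_partitions` is an illustrative choice, not part of EBArrays):

```python
def set_partitions(items):
    """Yield every partition of `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn...
        for k in range(len(partition)):
            yield partition[:k] + [[first] + partition[k]] + partition[k + 1:]
        # ...or give it a block of its own.
        yield [[first]] + partition

patterns = list(set_partitions([1, 2, 3]))
for pattern in patterns:
    # Groups joined by "=" share a mean; "|" separates distinct means.
    print(" | ".join("=".join(map(str, block)) for block in pattern))
print(len(patterns), "patterns for 3 groups")
print(len(list(set_partitions([1, 2, 3, 4]))), "patterns for 4 groups")
```

The number of patterns grows quickly with the number of groups, which is worth keeping in mind when extending the mixture beyond three treatments.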

18 Potential Drawbacks: (1) The coefficient of variation is assumed constant across gene-treatment combinations. This is analogous to assuming constant error variance across all gene-treatment combinations in the analysis of log-scale expression data. (2) Between-gene differences are assumed to have the same distribution as within-gene between-treatment differences for differentially expressed genes.

19 Coefficient of Variation is Constant across Gene-Treatment Combinations: Coefficient of variation = CV = sd / mean. Conditional on the mean for a gene-treatment combination, say $\alpha/\lambda_{jk}$, the CV for the expression data is the CV of Gamma($\alpha$, $\lambda_{jk}$), which is $(\alpha^{1/2}/\lambda_{jk})/(\alpha/\lambda_{jk}) = 1/\alpha^{1/2}$. Note that $\alpha$ is assumed to be the same for all gene-treatment combinations.
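A quick simulation can make the point concrete: the empirical CV stays near $1/\alpha^{1/2}$ whatever the rate. This is a sketch only; the helper name `gamma_cv`, the seed, and the sample size are assumptions, while α = 12 and the three rates are the values used in the earlier example and figures.

```python
import random
import statistics

def gamma_cv(alpha, lam, n, rng):
    """Empirical CV (sd / mean) of n draws from Gamma(shape=alpha, rate=lam)."""
    # random.gammavariate takes a scale parameter, so pass 1/lam
    draws = [rng.gammavariate(alpha, 1.0 / lam) for _ in range(n)]
    return statistics.stdev(draws) / statistics.fmean(draws)

rng = random.Random(0)            # arbitrary seed
alpha = 12.0
for lam in (0.02, 0.05, 0.07):
    # The mean alpha/lam changes with lam, but the CV should not.
    print(lam, gamma_cv(alpha, lam, 20_000, rng), "theory:", alpha ** -0.5)
```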

20 Between-gene diffs = within-gene between-trt diffs: In real data, differences between genes tend to be larger than treatment differences within a DE gene. Thus, assuming the same distribution for both types of differences leads to conservative inferences.
