Differential Expressions Bayesian Techniques Lecture Topic 8.

1 Differential Expressions Bayesian Techniques Lecture Topic 8

2 Why Bayes? A friend of mine who is Bayesian said the following when asked this question: Some problems are very hard to solve by classical techniques, e.g. the Behrens-Fisher problem. Every new problem requires a new solution. Bayes provides a coherent path.

3 The Frequentist Paradigm Probability refers to a limiting relative frequency. Probabilities are OBJECTIVE properties of the real world. Parameters are fixed unknown constants, so NO probability statement is possible about a parameter. Statistical procedures should be designed to have well-defined LONG-RUN frequency properties. For example, a 95% confidence interval should trap the true value of the parameter with a limiting frequency of 95%.

4 Bayesian Philosophy Probability describes a DEGREE OF BELIEF, not a relative frequency. As such, you can make probability statements about anything, not just data. We CAN make probability statements about parameters even if they are fixed constants. We make inferences about a parameter by producing its probability distribution. Inferences such as point or interval estimates may be extracted from the probability distribution of the parameter.

5 The Contrasts According to Larry Wasserman: “Bayesian inference is a controversial approach as it embraces a subjective notion of probability”. In general Bayesian methods have NO guarantees for long run performance.

6 Advantages of Bayesian Methods Provide the ability to formally incorporate prior information. Inference is conditional on the actual data (not on what might have been). Results are more easily interpretable by non-specialists (e.g. in place of confidence intervals). All analyses follow directly from the posterior distribution. The stopping rule does not affect inference. Any question can be answered directly, e.g. bioequivalence: H0: θ0 ≠ θ1 versus H1: θ0 = θ1. ■ The roles of null and alternative are reversed ■ Hard with traditional testing methods, easy in Bayes

7 Disadvantages The initial Bayesians were subjectivists. Results are not “objective” and could be manipulated to yield any desired result. How should the prior be set in general? Computationally difficult: complex integrals need to be evaluated even for simple problems, which requires inexpensive high-speed computing.

8 How the Bayesian Method Works Choose a probability density f(θ), called the PRIOR distribution, that expresses our beliefs about a parameter θ BEFORE we see any data. Choose a statistical model f(x | θ) that reflects our beliefs about x given θ. Here we write it as f(x | θ), NOT f(x; θ) as in the frequentist world. After OBSERVING the data X_1, …, X_n, we update our belief about the parameter and calculate the posterior distribution f(θ | x). This essentially uses Bayes' theorem to calculate the posterior distribution.

9 Bayes Theorem: Discrete Version A Simple Probability Result Let B_1, B_2, ..., B_n be disjoint sets with P(B_k) > 0 for all k and P(B_1 ∪ B_2 ∪ ... ∪ B_n) = 1 (mutually exclusive and exhaustive). For any event A, P(B_j | A) = P(B_j) P(A | B_j) / Σ_k P(B_k) P(A | B_k)

10 EXAMPLE: Disease incidence in the population: P(D) = 0.001. Diagnostic test: false positive rate 0.05, P(+ | not D) = 0.05; false negative rate 0.01, P(- | D) = 0.01. If a person drawn at random tests +, what is the probability that he has the disease, D?
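
The question on this slide can be worked through with the discrete Bayes theorem. The following is a minimal sketch; the function name `posterior_disease` and its argument names are illustrative and do not come from the lecture.

```python
def posterior_disease(prevalence, false_pos, false_neg):
    """P(D | +) by Bayes' theorem with two events: D and not D."""
    sensitivity = 1.0 - false_neg                 # P(+ | D)
    numerator = prevalence * sensitivity          # P(D) P(+ | D)
    denominator = numerator + (1.0 - prevalence) * false_pos  # P(+)
    return numerator / denominator


# Values from the slide: P(D) = 0.001, P(+ | not D) = 0.05, P(- | D) = 0.01
p = posterior_disease(0.001, 0.05, 0.01)
print(round(p, 4))  # about 0.0194, i.e. roughly a 2% chance of disease
```

Most positives come from the large healthy group, which is why the posterior probability stays small even though the test itself is accurate.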

11 Comment Hence, the probability that you HAVE the disease given that you have TESTED positive is still pretty LOW, even with very small FALSE POSITIVE and FALSE NEGATIVE rates. This rule is very useful in numerous other situations.

12 Bayes Theorem: The Continuous Version Let f(θ) be our prior distribution (density) for our parameter θ. Suppose we have the data X_1, …, X_n, with density f(X_1, …, X_n | θ), also written as L_n(X, θ).
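
The formula on this slide is not preserved in the transcript. In the notation defined above, the standard continuous form of Bayes' theorem would read as follows (a reconstruction from the surrounding definitions, not a copy of the original slide):

```latex
f(\theta \mid X_1,\dots,X_n)
  = \frac{f(X_1,\dots,X_n \mid \theta)\, f(\theta)}
         {\int f(X_1,\dots,X_n \mid \theta)\, f(\theta)\, d\theta}
  = \frac{L_n(X,\theta)\, f(\theta)}
         {\int L_n(X,\theta)\, f(\theta)\, d\theta}
```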

13 Some Simplifications The denominator is sometimes very hard to deal with, since the integration over the parameters is not trivial. We call it the normalizing constant, and in most cases we do not evaluate it explicitly. Instead we use the idea that:
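
The relation this slide leads up to is missing from the transcript. Since the denominator does not depend on θ, the usual statement is that the posterior is proportional to likelihood times prior (again a reconstruction in the lecture's notation):

```latex
f(\theta \mid X_1,\dots,X_n) \;\propto\; L_n(X,\theta)\, f(\theta)
```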

14 Bayes’ Idea Think of a model for data y_1, ..., y_n: f(y_1, ..., y_n | θ), e.g. Normal, Binomial, etc. θ is random with prior density g(.). Bayes' rule says that p(θ | y_1, ..., y_n) ∝ g(θ) f(y_1, ..., y_n | θ). Hence, the posterior is proportional to the prior density multiplied by the probability of the data given the parameter (the likelihood).

15 Hypothesis Testing: Classical vs. Bayesian Classical: set up null and alternative hypotheses, perform a test, calculate a p-value, reject or fail to reject the null. Bayesian: inference is based on the posterior distribution p(θ | y_1, ..., y_n); consider the evidence in favor of certain parameter values; data as well as prior beliefs influence inference.

16 Major Challenge I: Setting Priors Approaches: Subjective, based on the beliefs of an individual, expert, etc. Issues: – how to do this in practice? – people are inconsistent – elicitation can help. Non-informative, based on “prior ignorance” about the parameter. Issues: – often hard to define – may lead to improper posteriors – sensitive to parameterization

17 Setting Priors: Conjugate Priors Conjugate priors are priors that, combined with the model, give a posterior with a KNOWN distribution. Issues: – a choice of convenience – avoids computational problems – exists only for limited families. Example: y ~ Bin(n, θ), θ ~ Beta(α, β), then θ | y ~ Beta(α + y, β + n - y). The Normal is conjugate for the Normal location. The Gamma is conjugate for the Poisson. The Inverse Gamma is often used as a prior for the Normal variance σ². Generally, all members of the exponential family have conjugate priors.
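
The Beta-Binomial example can be checked numerically. The sketch below compares the closed-form conjugate posterior with a brute-force normalization of prior times likelihood on a grid; the function names and the numbers (α = β = 2, y = 7, n = 10) are illustrative assumptions, not values from the lecture.

```python
import math


def beta_binomial_update(alpha, beta, y, n):
    """Conjugate update: Beta(alpha, beta) prior, y successes in n trials."""
    return alpha + y, beta + n - y


def beta_pdf(theta, a, b):
    """Density of Beta(a, b) at theta, computed on the log scale."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(theta)
                    + (b - 1) * math.log(1 - theta))


def grid_posterior(alpha, beta, y, n, theta, steps=20000):
    """Posterior density at theta via numerical normalization (midpoint rule)."""
    def unnormalized(t):
        # prior density times the binomial kernel; constants cancel
        return beta_pdf(t, alpha, beta) * t ** y * (1 - t) ** (n - y)

    h = 1.0 / steps
    const = sum(unnormalized((i + 0.5) * h) for i in range(steps)) * h
    return unnormalized(theta) / const


a_post, b_post = beta_binomial_update(2, 2, 7, 10)
print(a_post, b_post)                      # 9 5
print(beta_pdf(0.6, a_post, b_post))       # closed-form posterior density
print(grid_posterior(2, 2, 7, 10, 0.6))    # numerical check, nearly identical
```

With a conjugate prior the integral in the denominator never has to be computed; the grid version is only there to show that both routes agree.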

18 Setting Priors: Non-informative Assuming we have no REAL information about the parameter, we can model it with a “non-informative” prior. For example, if θ is discrete with possible values θ_i, we can take – P(θ_i) = 1/n for i = 1, …, n. If we know an interval (a, b) in which θ lies, we can define the prior as – P(θ) = 1/(b - a), a < θ < b. We can also define – P(θ) = c, c > 0 (an improper prior, since it is not a pdf).

19 Setting Priors: Jeffreys’ Prior Uniform non-informative priors are criticized because they are not invariant under transformation of the parameter. Jeffreys’ prior, which IS invariant under transformation, is often used instead: P(θ) ∝ [det I(θ)]^{1/2}, where I(θ) is the Fisher information matrix.

20 Major Challenge II: Computation Need to evaluate complicated high-dimensional integrals. Lots of technology has been developed in the last 20-25 years. Approaches: Earliest solutions: approximations and numerical integration. Non-iterative Monte Carlo: direct sampling, indirect sampling (importance, rejection). Markov Chain Monte Carlo (MCMC): Gibbs sampling, Metropolis-Hastings algorithm, hybrid methods... MCMC is the most popular and can be implemented in high-dimensional situations.

21 Simple Example

22 Simple Example contd… Posterior mean is weighted average of prior mean and data mean ■ Sample average is shrunk toward prior mean ■ Weight depends on relative variability of prior and data Posterior precision is sum of prior precision and data precision Samples from posterior are easy to get given data, σ², μ, τ²
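
The formulas for this example are not preserved in the transcript. The sketch below assumes the usual Normal-Normal setup that matches the description above: y_1, ..., y_n ~ N(θ, σ²) with σ² known, and prior θ ~ N(μ, τ²). The function name `normal_posterior` is illustrative.

```python
def normal_posterior(y, sigma2, mu, tau2):
    """Posterior mean and variance of theta for y_i ~ N(theta, sigma2),
    theta ~ N(mu, tau2), with sigma2, mu and tau2 treated as known."""
    n = len(y)
    ybar = sum(y) / n
    prior_precision = 1.0 / tau2
    data_precision = n / sigma2
    post_precision = prior_precision + data_precision  # precisions add
    weight = prior_precision / post_precision          # weight on the prior mean
    post_mean = weight * mu + (1.0 - weight) * ybar    # shrinks ybar toward mu
    return post_mean, 1.0 / post_precision


mean, var = normal_posterior([5.0, 6.0, 7.0], sigma2=3.0, mu=0.0, tau2=1.0)
print(mean, var)  # 3.0 0.5
```

Here prior and data precision are both 1, so the sample average 6 is pulled exactly halfway toward the prior mean 0. Given these quantities, posterior samples are simply draws from N(post_mean, post_variance).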

23 Lessons from Example General principle: posterior is compromise between prior and data μ and τ² not known ■ Empirical Bayes: estimate μ and τ² ■ Hierarchical Bayes: put prior on μ and τ² as well

24 Bayesian Hypothesis Testing The idea is due to Jeffreys (1961). Idea: based on the data that each hypothesis is supposed to predict, one applies Bayes’ theorem and computes the posterior probability that the first hypothesis is correct. UNLIKE classical methods, the hypotheses DO NOT have to be nested within each other.

25 Mechanics of Bayesian Hypothesis Testing Consider two hypotheses H_0 and H_1 (Bayesians prefer the word “models” to “hypotheses”, but we will keep “hypotheses” to be consistent with the classical ideas). Let H_0 and H_1 be two hypotheses concerning the data Y, and let θ_0 and θ_1 be the associated parameters. We define π_i(θ_i) as the corresponding priors. Let f_i(y | θ_i) be the corresponding marginal distributions. We can use Bayes’ theorem to calculate the posteriors P(θ_i | y). Bayesian hypothesis testing consists of finding the following and using pre-specified cut-offs for decisions: – B = [P(θ_0 | y) / P(θ_1 | y)] / [P(θ_0) / P(θ_1)] (Bayes factor) – P(θ_0 | Y = y), P(θ_0 | Y >= y) (Bayesian p-values)
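
As a small illustration of the Bayes factor defined above, the sketch below takes two simple (point) hypotheses about a binomial proportion and computes posterior odds divided by prior odds. The binomial setting, the values θ_0 = 0.5 and θ_1 = 0.7, and the function names are assumptions made for the example.

```python
import math


def binomial_likelihood(y, n, theta):
    """P(Y = y | theta) for Y ~ Bin(n, theta)."""
    return math.comb(n, y) * theta ** y * (1 - theta) ** (n - y)


def bayes_factor(y, n, theta0, theta1, prior0=0.5):
    """B = posterior odds of H0 over H1, divided by the prior odds."""
    prior1 = 1.0 - prior0
    joint0 = prior0 * binomial_likelihood(y, n, theta0)
    joint1 = prior1 * binomial_likelihood(y, n, theta1)
    evidence = joint0 + joint1
    post0, post1 = joint0 / evidence, joint1 / evidence
    return (post0 / post1) / (prior0 / prior1)


# 7 successes in 10 trials; H0: theta = 0.5 versus H1: theta = 0.7
print(bayes_factor(7, 10, 0.5, 0.7))              # below 1: data favor H1
print(bayes_factor(7, 10, 0.5, 0.7, prior0=0.9))  # same value: priors cancel
```

For point hypotheses the Bayes factor reduces to the likelihood ratio, so it does not depend on the prior probabilities assigned to H_0 and H_1.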

26 Bayesian Hypothesis Tests in Microarrays Let H_g1: gene g is differentially expressed; H_g0: gene g is not differentially expressed. Traditional Bayesians would write this as

27 Method 1: Differential Expression Score Use a t-statistic or Wilcoxon rank-sum statistic, z_g. Then calculate P(H_0 | z_g = z) or P(H_0 | z_g ≥ z), or P(v_g = 0 | z_g = z) or P(v_g = 0 | z_g ≥ z). McClure and Wit (2004) show that the second term is identical to using the FDR method for controlling error.

28 Fully Bayesian Analysis In general we are interested in the term given below, where p_0 is the fraction of inactive genes in the array, F_0 is the distribution under the null hypothesis (v = 0), and F is the distribution of the test statistic.

29 Bayesian t test The t statistic is given by: Assume: z_g | {v_g = 0} ~ N(0, σ_0²), z_g | {v_g = 1} ~ N(0, σ_1²). Hence, z_g ~ (1 - p_1) N(0, σ_0²) + p_1 N(0, σ_1²)
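
Under the two-component mixture above, Bayes' theorem gives the posterior probability that a gene is inactive, P(v_g = 0 | z_g = z). The sketch below treats p_1, σ_0² and σ_1² as known; the numerical values and function names are illustrative assumptions (in the lecture these quantities are estimated rather than fixed).

```python
import math


def normal_pdf(z, var):
    """Density of N(0, var) at z."""
    return math.exp(-0.5 * z * z / var) / math.sqrt(2.0 * math.pi * var)


def prob_not_de(z, p1, var0, var1):
    """P(v_g = 0 | z_g = z) for z_g ~ (1 - p1) N(0, var0) + p1 N(0, var1)."""
    null_part = (1.0 - p1) * normal_pdf(z, var0)
    alt_part = p1 * normal_pdf(z, var1)
    return null_part / (null_part + alt_part)


# Assumed values: 10% of genes active, null variance 1, alternative variance 16
for z in (0.0, 2.0, 4.0):
    print(z, prob_not_de(z, p1=0.1, var0=1.0, var1=16.0))
```

A statistic near zero leaves the null probability high, while a large statistic is far better explained by the wider alternative component, so the probability of no differential expression drops sharply.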

30 Bayesian t test: Priors p_1 ~ Uniform(0, 1); v_g ~ Bernoulli(p_1); σ_0² ~ Gamma(·,·) and σ_1² ~ Gamma(·,·), with Gamma priors in turn on their hyperparameters; θ = (v, p_1, σ_0², σ_1², and the hyperparameters). ■ These are all conjugate priors, chosen to make the calculations easier. ■ One uses the Gibbs sampler to simulate from P(θ | z) and to estimate p_1, σ_0², σ_1² in order to calculate the required probability.

31 Gibbs Sampler It is used to calculate the posterior mean. It does not calculate P(θ | y) explicitly; it simulates draws from this distribution. Using sample summaries we get a good idea of the joint posterior as well as the marginal distribution of interest, P(v | y). It samples in turn from the conditional distributions P(θ_i | θ_{-i}, y) until the chain converges to a stationary distribution; this initial period is called “burn-in”. After burn-in, each draw of θ is a draw from the posterior distribution. Bayes' theorem states that the conditional distribution P(θ_i | θ_{-i}, y) is proportional to the likelihood times the prior, P(y | θ) P(θ), viewed as a function of θ_i. If the conditional distribution of each component given the others has a known form (generally achieved by using conjugate priors), this procedure can be applied easily.
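
The mechanics can be sketched on a model much smaller than the microarray one. The example below assumes y_i ~ N(mu, sigma2) with priors mu ~ N(m0, t0_sq) and sigma2 ~ Inverse-Gamma(a, b), for which both full conditionals have a known form. The model, the hyperparameter defaults and the function name `gibbs_normal` are assumptions chosen for illustration, not the lecture's sampler.

```python
import random
import statistics


def gibbs_normal(y, m0=0.0, t0_sq=100.0, a=2.0, b=2.0,
                 n_iter=4000, burn_in=1000, seed=0):
    """Gibbs sampler for (mu, sigma2); returns the draws kept after burn-in."""
    rng = random.Random(seed)
    n = len(y)
    ybar = statistics.fmean(y)
    mu, sigma2 = ybar, statistics.variance(y)  # starting values
    draws = []
    for it in range(n_iter):
        # mu | sigma2, y is Normal (Normal prior is conjugate for the mean)
        precision = 1.0 / t0_sq + n / sigma2
        mean = (m0 / t0_sq + n * ybar / sigma2) / precision
        mu = rng.gauss(mean, (1.0 / precision) ** 0.5)
        # sigma2 | mu, y is Inverse-Gamma(shape, rate); draw it as 1 / Gamma.
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        shape = a + n / 2.0
        rate = b + 0.5 * sum((yi - mu) ** 2 for yi in y)
        sigma2 = 1.0 / rng.gammavariate(shape, 1.0 / rate)
        if it >= burn_in:  # discard the burn-in draws
            draws.append((mu, sigma2))
    return draws


data = [4.8, 5.1, 5.3, 4.9, 5.4, 5.0, 4.7, 5.2]
kept = gibbs_normal(data)
print(statistics.fmean(d[0] for d in kept))  # posterior mean of mu, near 5.05
```

The posterior is never written down in closed form here; summaries such as the posterior mean come from averaging the retained draws.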

32 Empirical Bayes Idea The prior distributions depend upon unknown parameters, which in turn may need a second or higher stage prior in some hierarchical setting. But at some point we HAVE to specify all remaining parameters of the hyper-prior. In other words, we HAVE to use our knowledge to specify our prior. The Empirical Bayes method instead uses the sample data to estimate the parameters of the final-stage prior. The idea is: if we are interested in θ | y, let θ ~ P(θ | θ_1), θ_1 ~ P(θ_1 | θ_2), …, θ_{L-1} ~ P(θ_{L-1} | θ_L). In the empirical Bayes approach we use the data to estimate the parameter θ_L, taking as the estimate the value that maximizes the marginal likelihood P(Y | θ_L). We substitute the estimate of θ_L into the priors, and the posterior distribution is now P(θ | y, est-θ_L).
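
A two-stage version of this idea can be sketched as follows. The example assumes y_i | θ_i ~ N(θ_i, σ²) with σ² known and θ_i ~ N(μ, τ²), so that marginally y_i ~ N(μ, σ² + τ²); maximizing that marginal likelihood gives μ̂ equal to the sample mean and τ̂² equal to the (divide-by-n) sample variance minus σ², truncated at zero. The model and the function name are illustrative assumptions.

```python
def empirical_bayes_normal(y, sigma2):
    """Estimate (mu, tau2) by maximum marginal likelihood, then return the
    plug-in posterior means of each theta_i along with the estimates."""
    n = len(y)
    mu_hat = sum(y) / n
    total_var = sum((yi - mu_hat) ** 2 for yi in y) / n  # estimates sigma2 + tau2
    tau2_hat = max(0.0, total_var - sigma2)
    weight = sigma2 / (sigma2 + tau2_hat)                # weight on the prior mean
    shrunk = [weight * mu_hat + (1.0 - weight) * yi for yi in y]
    return shrunk, mu_hat, tau2_hat


shrunk, mu_hat, tau2_hat = empirical_bayes_normal([0.0, 2.0, 4.0, 6.0, 8.0], 2.0)
print(mu_hat, tau2_hat)  # 4.0 6.0
print(shrunk)            # [1.0, 2.5, 4.0, 5.5, 7.0]
```

The hyper-parameters are never given a prior of their own; the data supply them, and each observation is then shrunk toward the estimated prior mean.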

33 Empirical Bayes’ Idea in Differential Expression Average log fold change. Problem: non-DE genes with large variances have too much chance of being selected. t-statistics. Problem: apparently DE genes with very small sample variances are suspect. Moderated t-statistics. A happy compromise between the two above: an empirical Bayes estimate, using the data to estimate the new standard error, s_g. Generally

34 The moderated t statistic Smoothed standard deviations: shrink the gene-wise s_g towards a common value s_0. This eliminates large t-statistics due merely to very small s values, and reduces the impact of very large s values.
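
The slide's formula is not preserved in the transcript. The sketch below uses the moderated t statistic in the form implemented by limma, where the gene-wise variance s_g² (on d_g degrees of freedom) is combined with a prior variance s_0² (on d_0 degrees of freedom). The argument names and the numbers in the usage lines are illustrative assumptions, and `v` stands for the unscaled variance factor of the estimated effect (for example 1/n_1 + 1/n_2 in a two-group comparison).

```python
import math


def moderated_t(effect, s2, d, s0_sq, d0, v):
    """Moderated t: the gene-wise variance s2 (d df) is shrunk toward the
    prior variance s0_sq (d0 df) before forming the t ratio."""
    s2_tilde = (d0 * s0_sq + d * s2) / (d0 + d)  # smoothed variance
    return effect / math.sqrt(s2_tilde * v)


def ordinary_t(effect, s2, v):
    """Ordinary t statistic using the gene-wise variance only."""
    return effect / math.sqrt(s2 * v)


# A gene with a small effect but a tiny sample variance
print(ordinary_t(0.1, 0.0001, 0.5))                       # about 14
print(moderated_t(0.1, 0.0001, 4, s0_sq=0.04, d0=4, v=0.5))  # about 1
```

The ordinary statistic is inflated purely by the tiny variance estimate; after shrinkage toward s_0² the gene no longer stands out. Setting d0 to 0 recovers the ordinary t statistic.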

35 EB Idea Posterior odds (for DE): the posterior probability of differential expression for any gene is a monotonic function of t̃² for constant d.

36 Estimating hyper-parameters Closed-form estimators with good properties are available: for s_0 and d_0 in terms of the first two moments of log s²; for c_0 in terms of quantiles of the |t̃_g|. Nowadays the EB estimate is used most often for differential expression, and the genes are ranked by the EB estimates. Instead of strict error control, the top g genes are examined, using the EB estimates for ranking purposes. Sometimes |t̃_g| > 4 is used as an empirical cut-off. The limma package in R uses empirical Bayes estimates to determine which genes are differentially expressed.

