Presentation on theme: "Estimating parameters in a statistical model" — Presentation transcript:

1 Estimating parameters in a statistical model
Likelihood and maximum likelihood estimation; Bayesian point estimates; maximum a posteriori point estimation; empirical Bayes estimation.

2 Random Sample
A random sample is a set of observations generated independently by the statistical model. For example, n replicated measurements of the difference in expression levels for a gene under two different treatments: x_1, x_2, ..., x_n ~ iid N(μ, σ²). Given the parameters, the statistical model defines the probability of observing any particular combination of values in this sample. Since the observations are independent, the probability density function describing the probability of observing a particular combination is the product of the individual probability density functions.
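A minimal Python sketch of this product rule (not part of the original slides), using made-up parameter values μ = 0.5 and σ = 1 and a simulated sample:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu, sigma = 0.5, 1.0                      # hypothetical parameter values
x = rng.normal(mu, sigma, size=10)        # a random sample of n = 10 observations

# Density of this particular combination of values:
# product of the individual N(mu, sigma^2) densities
joint_pdf = np.prod(norm.pdf(x, loc=mu, scale=sigma))

# Equivalent, numerically safer computation via log-densities
log_joint = np.sum(norm.logpdf(x, loc=mu, scale=sigma))
print(joint_pdf, np.exp(log_joint))       # the two agree up to floating point
```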

3 Probability density function vs probability
For a discrete random variable, which has a countable number of potential values (assume finitely many for now), the probability density function is equal to the probability of each value (outcome). For a continuous random variable, which can yield any number in a particular interval, the probability density function is different from a probability: the probability of any particular number for a continuous random variable is equal to zero. Instead, the probability density function defines the probability of the number falling into any particular sub-interval as the area under the curve defined by the density.
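A short sketch of this distinction (the interval and parameter values are arbitrary, chosen only for illustration): probabilities for a continuous variable come from areas under the density, and a density value at a single point is not itself a probability.

```python
from scipy.stats import norm
from scipy.integrate import quad

a, b = -1.0, 1.0                          # an arbitrary sub-interval
dist = norm(loc=0.0, scale=1.0)

# P(a < X < b) is the area under the pdf between a and b
area, _ = quad(dist.pdf, a, b)
print(area, dist.cdf(b) - dist.cdf(a))    # both are ~0.6827

# A pdf value at a single point is not a probability: it can exceed 1,
# and the probability of any single point is zero
print(norm(loc=0.0, scale=0.1).pdf(0.0))  # ~3.99
```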

4 Probability density function vs probability
Example: the assumption of our Normal model is that the outcome can be essentially any real number. This is obviously a wrong assumption, but it turns out that the model is a good approximation of reality. We could "discretize" this random variable: define the r.v. y = 1 if |x| > c and y = 0 otherwise, for some constant c. This random variable can assume 2 different values, and its probability distribution is defined by p(y = 1). Although the probability density function of a continuous random variable does not give probabilities, it satisfies the key properties of a probability.
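A small sketch of this discretization for a standard normal x and an arbitrary cutoff c = 1.96 (the cutoff is not from the slides, just illustrative):

```python
from scipy.stats import norm

mu, sigma, c = 0.0, 1.0, 1.96             # cutoff c chosen only for illustration
dist = norm(loc=mu, scale=sigma)

# y = 1 if |x| > c and 0 otherwise; its distribution is given by p(y = 1)
p_y1 = dist.cdf(-c) + dist.sf(c)          # P(x < -c) + P(x > c)
print(p_y1)                               # ~0.05 for c = 1.96 under N(0, 1)
```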

5 Back to basics – Probability, Conditional Probability and Independence
For a discrete pdf p(y), the probability of an event is the sum of p(y) over the outcomes in the event; for a continuous pdf f(x), the probability of falling between a and b is the area under f(x) between a and b. For y_1, ..., y_n iid from p(y), and likewise for x_1, ..., x_n iid from f(x), independence means the joint pdf is the product of the individual pdfs. From now on, we will talk in terms of just a pdf, and everything will hold for both discrete and continuous random variables.

6 Expectation, Expected Value and Variance
The expectation of any function g of a random variable is the average value of the function over a large number of experiments: for a discrete pdf p(y) it is the sum of g(y) p(y) over all values y, and for a continuous pdf f(x) it is the integral of g(x) f(x) dx. The expected value is the average value of the random variable itself after a very large number of experiments. The variance is the expected value of (y − E(y))² in the discrete case and of (x − E(x))² in the continuous case.
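A sketch of these definitions with hypothetical parameters μ = 1, σ = 2, checking E[g(x)] both by numerical integration against the pdf and by averaging over many simulated "experiments":

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mu, sigma = 1.0, 2.0                      # hypothetical parameters
dist = norm(loc=mu, scale=sigma)
g = lambda x: x ** 2                      # any function of the random variable

# E[g(x)]: integrate g(x) weighted by the pdf f(x)
e_g, _ = quad(lambda x: g(x) * dist.pdf(x), -np.inf, np.inf)
print(e_g)                                # mu^2 + sigma^2 = 5 for these values

# The same expectation as an average over many simulated experiments
x = np.random.default_rng(1).normal(mu, sigma, size=1_000_000)
print(g(x).mean())                        # close to 5

# Variance: expected value of (x - E(x))^2
print(((x - x.mean()) ** 2).mean(), dist.var())
```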

7 Expected Value and Variance of a Normal Random Variable
For the normal pdf N(μ, σ²), the expected value (the average x after a very large number of experiments) is μ, and the variance, the expected value of (x − E(x))², is σ².

8 Maximum Likelihood
For x_1, x_2, ..., x_n ~ iid N(μ, σ²), the joint pdf for the whole random sample is the product of the individual normal densities. The maximum likelihood estimates of the model parameters μ and σ² are the numbers that maximize this joint pdf for the fixed, observed sample; viewed as a function of the parameters with the sample held fixed, the joint pdf is called the likelihood function. In other words, the likelihood function is simply the pdf evaluated at the fixed sample.
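A sketch of maximum likelihood for the normal model on simulated data (the data, "true" values, and starting point are all invented): the log-likelihood is maximized numerically and compared with the closed-form answers, the sample mean and the divide-by-n standard deviation.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(2)
x = rng.normal(3.0, 1.5, size=50)         # simulated data; "true" mu=3, sigma=1.5

def neg_log_likelihood(params):
    mu, log_sigma = params                # optimize log(sigma) so sigma stays > 0
    return -np.sum(norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

fit = minimize(neg_log_likelihood, x0=[0.0, 0.0])
mu_hat, sigma_hat = fit.x[0], np.exp(fit.x[1])

# For the normal model the maximizers also have closed forms:
# the sample mean and the divide-by-n sample standard deviation
print(mu_hat, x.mean())
print(sigma_hat, x.std())                 # np.std uses ddof=0 by default
```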

9 Bayesian Inference
The key difference: parameters are assumed to be random variables, and inference is based on the posterior distribution of the parameters given the data. The prior distribution defines prior knowledge (or ignorance) about the parameter; the posterior distribution is the prior belief modified by the data.

10 Bayesian Inference
Prior distribution of μ; data model given μ; posterior distribution of μ given the data, obtained from Bayes theorem: p(μ | data) = p(data | μ) p(μ) / p(data). From the posterior we can compute quantities such as P(μ > 0 | data).
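A concrete sketch under one convenient assumption the slide does not spell out: a conjugate N(m0, τ0²) prior for μ with σ known. All numbers are invented for illustration; with this prior the posterior is again normal, so P(μ > 0 | data) is a simple normal tail probability.

```python
import numpy as np
from scipy.stats import norm

x = np.array([0.8, 1.4, -0.2, 0.9, 1.1])   # hypothetical data
sigma = 1.0                                 # assumed known
m0, tau0 = 0.0, 2.0                         # prior: mu ~ N(m0, tau0^2)

n, xbar = len(x), x.mean()

# Bayes theorem for this conjugate model: posterior precision is the sum of
# the prior precision and the data precision
post_var = 1.0 / (1.0 / tau0**2 + n / sigma**2)
post_mean = post_var * (m0 / tau0**2 + n * xbar / sigma**2)

# P(mu > 0 | data) under the normal posterior
p_positive = norm.sf(0.0, loc=post_mean, scale=np.sqrt(post_var))
print(post_mean, np.sqrt(post_var), p_positive)
```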

11 Bayesian Estimation
The Bayesian point estimate is the expected value of the parameter under its posterior distribution given the data. In some cases the expectation of the posterior distribution is difficult to compute, and it is easier to find the value of the parameter that maximizes the posterior distribution given the data: the Maximum a Posteriori (MAP) estimate. Since the denominator of the posterior distribution in Bayes theorem is constant in the parameter, this is equivalent to maximizing the product of the likelihood and the prior pdf.
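A sketch of the MAP idea for the same hypothetical normal setup as above: maximize the log-likelihood plus the log-prior, dropping the normalizing constant.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

x = np.array([0.8, 1.4, -0.2, 0.9, 1.1])   # same hypothetical data as above
sigma = 1.0                                 # assumed known
m0, tau0 = 0.0, 2.0                         # N(m0, tau0^2) prior on mu

def neg_log_posterior(mu):
    log_lik = np.sum(norm.logpdf(x, loc=mu, scale=sigma))
    log_prior = norm.logpdf(mu, loc=m0, scale=tau0)
    return -(log_lik + log_prior)           # normalizing constant omitted

map_mu = minimize_scalar(neg_log_posterior).x
print(map_mu)   # for a normal prior and likelihood this also equals the posterior mean
```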

12 Alternative prior for the normal model
A degenerate (flat) uniform prior for μ assumes that any parameter value is equally likely a priori; this is clearly unrealistic, since we usually know more than that. Under this prior, the MAP estimate for μ is identical to the maximum likelihood estimate, so Bayesian point estimation and maximum likelihood are very closely related.
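A tiny numerical check of this equivalence, again on made-up data with known σ: with a flat prior the log-prior term is constant, so maximizing the posterior reduces to maximizing the likelihood.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

x = np.array([0.8, 1.4, -0.2, 0.9, 1.1])   # hypothetical data, known sigma
sigma = 1.0

# With a flat prior the log-prior is constant, so the MAP estimate maximizes
# the likelihood alone
neg_log_lik = lambda mu: -np.sum(norm.logpdf(x, loc=mu, scale=sigma))
map_flat = minimize_scalar(neg_log_lik).x
print(map_flat, x.mean())                  # both equal the ML estimate, the sample mean
```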

13 Hierarchical Bayesian Models and Empirical Bayes Inference
If we are not happy with pre-specifying the prior parameters μ and τ², we can estimate them from the "marginal" distribution of the data given μ and τ² and plug them back into the formula for the Bayesian estimate; the result is the Empirical Bayes estimate. Model: x_i ~ ind N(μ_i, σ²), i = 1, ..., n, with the variance σ² assumed known. We need to estimate the means μ_i, i = 1, ..., n; the simplest estimate is to use x_i itself. The hierarchical assumption is that μ_i ~ iid N(μ, τ²), i = 1, ..., n.

14 Hierarchical Bayesian Models and Empirical Bayes Inference
If x_i ~ ind N(μ_i, σ²) and μ_i ~ iid N(μ, τ²), i = 1, ..., n, then the "marginal" distribution of each x_i, with the μ_i's "factored out", is N(μ, σ² + τ²), i = 1, ..., n. Now we can estimate μ and τ² from this marginal distribution, using for example maximum likelihood, and plug them back into the formula for the Bayesian estimates of the μ_i's.
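A sketch of the whole empirical Bayes recipe on simulated data (the "true" values of μ, τ, and σ² are invented; σ² is treated as known, as in the slide): estimate μ and σ² + τ² from the marginal distribution, recover an estimate of τ², and plug both into the posterior mean of each μ_i.

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma2 = 200, 1.0                       # n means; sigma^2 treated as known
mu_true, tau_true = 2.0, 0.5               # invented hyperparameter values
mu_i = rng.normal(mu_true, tau_true, size=n)   # true (unobserved) means
x = rng.normal(mu_i, np.sqrt(sigma2))          # one observation per mean

# Maximum likelihood estimates from the marginal N(mu, sigma^2 + tau^2)
mu_hat = x.mean()
marginal_var = ((x - mu_hat) ** 2).mean()      # estimates sigma^2 + tau^2
tau2_hat = max(marginal_var - sigma2, 0.0)

# Plug-in Bayesian (posterior mean) estimate of each mu_i:
# shrink each x_i toward the overall mean mu_hat
shrink = sigma2 / (sigma2 + tau2_hat)
mu_i_eb = shrink * mu_hat + (1 - shrink) * x

# The shrunken estimates have smaller total error than the raw x_i
# (the "Stein effect" discussed on the next slide)
print(np.mean((x - mu_i) ** 2), np.mean((mu_i_eb - mu_i) ** 2))
```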

15 Hierarchical Bayesian Models and Empirical Bayes Inference
The estimates for the individual means are "shrunk" towards the mean of all means. It turns out that such estimates are better overall than estimates based only on the individual observations (the "Stein effect"). Individual observations in our model can be replaced with groups of observations, x_{1i}, x_{2i}, ..., x_{ki} ~ ind N(μ_i, σ²). Limma does a similar thing, only with variances: data for each gene i are assumed to be distributed as x_{1i}, x_{2i}, ..., x_{ki} ~ iid N(μ_i, σ_i²), the means are estimated in the usual way, and an additional hierarchy is placed on the variances, describing how the variances are expected to vary across genes.

16 Hierarchical Bayesian Models and Empirical Bayes Inference
The hypothesis μ_i = 0 is tested by calculating the modified (moderated) t-statistic, which uses the shrunken variance estimate in place of the gene-wise sample variance.
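A miniature sketch in the spirit of limma, not a reimplementation of it: gene-wise variances are shrunk toward a common value and the modified t-statistic uses the moderated variance with d0 + d degrees of freedom. The prior values d0 and s0², which limma itself estimates from all genes, are fixed made-up numbers here, and the data are simulated.

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(5)
genes, k = 1000, 3                           # 1000 genes, k replicate observations each
x = rng.normal(0.0, 1.0, size=(genes, k))    # simulated measurements

xbar = x.mean(axis=1)                        # means estimated the usual way
s2 = x.var(axis=1, ddof=1)                   # gene-wise sample variances
d, d0, s0_2 = k - 1, 4.0, 1.0                # residual df; hypothetical prior df and variance

s2_tilde = (d0 * s0_2 + d * s2) / (d0 + d)   # moderated (shrunken) variances

mod_t = xbar / np.sqrt(s2_tilde / k)         # modified t-statistic for each gene
p_values = 2 * t.sf(np.abs(mod_t), df=d0 + d)
print(mod_t[:5], p_values[:5])
```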

