Example of Bayes’ Theorem Suppose a woman has had a single unplanned, unprotected sexual encounter. She takes a pregnancy test, and it is positive. What does she want to know? “What is the probability that I am pregnant?”
Example of Bayes’ Theorem (cont.) Let ‘B’ denote ‘pregnant’ and ‘A’ denote ‘positive pregnancy test.’ Suppose P(A|B) is .90, P(A|~B) is .50, and P(B) is .15. The marginal P(A) can be expressed as P(A|B)P(B) + P(A|~B)P(~B) = (.90)(.15) + (.50)(.85) = .56. Then P(B|A) = P(A|B)P(B) / P(A) = (.90)(.15) / .56 = .24107.
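The arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch; the function name `bayes_posterior` and its parameter names are illustrative and do not come from the original slides.

```python
def bayes_posterior(prior, p_pos_given_b, p_pos_given_not_b):
    """Return P(B|A) from P(B), P(A|B), and P(A|~B) using Bayes' theorem."""
    # Marginal probability of a positive test, P(A), by total probability.
    marginal = p_pos_given_b * prior + p_pos_given_not_b * (1 - prior)
    return p_pos_given_b * prior / marginal


# Values from the slide: P(B) = .15, P(A|B) = .90, P(A|~B) = .50.
p = bayes_posterior(prior=0.15, p_pos_given_b=0.90, p_pos_given_not_b=0.50)
print(round(p, 5))  # 0.24107
```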
Example of Bayes’ Theorem (cont.) So, on the basis of this one positive result, there is about a 1 in 4 chance that she is pregnant. Note that the probabilities used in this example are not accurate, and the problem is oversimplified.
Example of Bayes’ Theorem (cont.) Not a very satisfying answer. Solution: retest. Now, our prior probability of pregnancy is P(B) = .24107. We repeat and get another positive. P(A) = (.90)(.24107) + (.50)(.75893) = .59643. P(B|A) = (.90)(.24107) / .59643 = .36377.
Example of Bayes’ Theorem (cont.) If she repeats this and continues to get positive results, her probabilities of pregnancy are: test 3 = .507, test 4 = .649, test 5 = .769, test 6 = .857, test 7 = .915, test 8 = .951, test 9 = .972, and test 10 = .984. Each time she adds a new test (= new data), her posterior probability of being pregnant changes.
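The repeated updating can be sketched as a loop in which each posterior becomes the prior for the next test. The helper `update` and the list `history` are illustrative names, and the sketch assumes, as the slides do, that every test comes back positive with the same P(A|B) = .90 and P(A|~B) = .50.

```python
def update(prior, p_pos_given_b=0.90, p_pos_given_not_b=0.50):
    """One Bayesian update after a positive test result."""
    marginal = p_pos_given_b * prior + p_pos_given_not_b * (1 - prior)
    return p_pos_given_b * prior / marginal


p = 0.15          # initial prior P(B) from the slides
history = []      # posterior after test 1, test 2, ...
for test in range(1, 11):
    p = update(p)  # yesterday's posterior is today's prior
    history.append(p)
    print(f"test {test}: {p:.3f}")
```

The printed values for tests 3 through 10 should agree with the figures listed on the slide to three decimals.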
Bayesian inference The basic idea of Bayesian inference is to apply Bayes’ theorem to the relationship between data and our prior beliefs about parameters. In the example, the parameter of interest was P(pregnant). We updated our prior belief on the basis of each subsequent test result (data).
Bayesian inference (cont.) P(A|B) is the density of the data (proportional to the likelihood). P(B) is our prior belief about the parameters. P(B|A) is our updated belief about the parameters, given the observed data. The updated belief is called the posterior distribution.
The likelihood function Joint densities (data, given parameters). Viewing the joint density as a function of the parameters, given the data, yields the likelihood function. Traditional use of the likelihood function: maximum likelihood estimation.
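Properties of maximum likelihood estimates (review) Maximum likelihood estimators are often biased in finite samples. Under regularity conditions, they are asymptotically minimum variance estimators. They also provide the basis for likelihood ratio testing.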
Prior and posterior distributions We have already defined the prior as a belief about the distribution of the parameter(s). Non-informative (vague) priors are used when we don’t have strong beliefs about the parameters. The posterior distribution is a statement of our belief about the parameters, updated to account for the evidence of the data.
Prior and posterior distributions (cont.) A conjugate prior is one chosen so that, when combined with the likelihood function, it produces a posterior in the same distributional family as the prior. Examples: normal-normal, beta-binomial.
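The beta-binomial case gives a compact illustration of conjugacy: a Beta prior combined with binomial data yields a Beta posterior, so updating only requires adding counts to the prior parameters. The prior Beta(2, 2) and the data (7 successes, 3 failures) below are hypothetical numbers chosen for illustration, and `beta_binomial_update` is an illustrative name.

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Posterior Beta parameters under a Beta(alpha, beta) prior and binomial data."""
    # Conjugacy: the posterior is Beta(alpha + successes, beta + failures).
    return alpha + successes, beta + failures


# Hypothetical prior and data, not taken from the slides.
alpha_post, beta_post = beta_binomial_update(2, 2, successes=7, failures=3)
post_mean = alpha_post / (alpha_post + beta_post)  # mean of a Beta distribution
print(alpha_post, beta_post, round(post_mean, 4))  # 9 5 0.6429
```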
Bayesian estimation Bayes estimates are based on the posterior distribution. Often, the mean of a parameter’s posterior distribution is used as an estimate of the parameter. The math for that can become very difficult. Sometimes, the mode of the posterior is used instead (Bayes modal estimation).
Bayesian estimation (cont.) The maximum likelihood estimator may be thought of as a Bayes modal estimator with an uninformative prior. Modern computing power can remove the need for the nasty math traditionally needed for Bayesian estimation and inference. This makes the Bayesian approach more accessible than it once was.
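The link between maximum likelihood and Bayes modal estimation can be sketched numerically on a grid. The example below uses a hypothetical binomial data set (7 successes in 10 trials) and illustrative names; with a flat prior the posterior mode lands on the maximum likelihood estimate, while an informative prior pulls the mode toward the prior.

```python
# Hypothetical data, not from the slides: 7 successes in 10 trials.
successes, trials = 7, 10
grid = [i / 1000 for i in range(1, 1000)]  # candidate values of the proportion


def likelihood(theta):
    """Binomial likelihood, up to a constant that does not depend on theta."""
    return theta ** successes * (1 - theta) ** (trials - successes)


def posterior_mode(prior):
    """Grid value that maximizes prior(theta) * likelihood(theta)."""
    return max(grid, key=lambda theta: prior(theta) * likelihood(theta))


mle = successes / trials
mode_flat = posterior_mode(lambda theta: 1.0)                   # uninformative prior
mode_informative = posterior_mode(lambda theta: theta * (1 - theta))  # Beta(2, 2) shape

print(mle, mode_flat, mode_informative)
```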
Bayesian inference Bayesian inference involves probabilistic statements about parameters, based on the posterior distribution. Probabilistic statements are allowed because Bayesians view the parameters as random variables. For example, a Bayesian credibility interval allows us to make the kind of statement we wish we could make when we use confidence intervals.
Bayesian inference (cont.) In the Bayesian approach, one can discuss the probability that a parameter is in a particular range by calculating the area under the posterior curve for that range. For example, I might be able to make the statement that the probability mu exceeds 110 is .75. That sort of statement is never possible in frequentist statistics.
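As a sketch of the "area under the posterior curve" idea, suppose the posterior for mu happened to be normal with mean 112 and standard deviation 3. These numbers are hypothetical, chosen only so that the result is close to the .75 quoted above. The standard library's `statistics.NormalDist` (Python 3.8+) supplies the cumulative distribution function.

```python
from statistics import NormalDist

# Hypothetical posterior for mu: normal with mean 112 and standard deviation 3.
posterior = NormalDist(mu=112, sigma=3)

# P(mu > 110): area under the posterior curve to the right of 110.
prob_exceeds_110 = 1 - posterior.cdf(110)

# P(108 < mu < 116): area under the posterior curve between two values.
prob_in_range = posterior.cdf(116) - posterior.cdf(108)

print(round(prob_exceeds_110, 2))  # about 0.75
print(round(prob_in_range, 2))
```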
Bayesian inference (cont.) Bayesian inference does not involve null hypotheses. (Formally, the null hypothesis is known to be false if we take the Bayesian perspective. Why?) Rather, we make probabilistic statements about parameters. We can also compare models probabilistically.
An example using the Peabody data Suppose we are interested in estimating the mean Peabody score, $\mu$, for a population of 10-year-old children. We have strong prior reasons to believe that the mean is 85. We operationalize that prior belief by stating that $\mu \sim N(85, 4)$, that is, a normal prior with mean 85 and variance 4.
Peabody example (cont.) Next, we assume that Peabody itself is normally distributed: $X \sim N(\mu, \sigma^2)$.
Peabody example (cont.) Recall that we want a posterior distribution for $\mu$. Bayes’ theorem says
$$p(\mu \mid X) = \frac{p(X \mid \mu)\, p(\mu)}{p(X)}.$$
Note that we can ignore the denominator here, as it is just a scaling constant.
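Peabody example (cont.) Our posterior, then, is proportional to
$$\exp\left[-\frac{(\mu - 85)^2}{2 \cdot 4}\right] \times \exp\left[-\frac{\sum_{i=1}^{n}(x_i - \mu)^2}{2\sigma^2}\right],$$
where $\sigma^2$ is the variance of the Peabody scores. Some unpleasant algebra that involves completing the square shows that this is the same as a normal distribution with mean $(85\sigma^2 + 4nM) / (4n + \sigma^2)$, where $M$ is the sample mean.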
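The completing-the-square step can be sketched as follows, assuming the prior $\mu \sim N(85, 4)$, observations $x_1, \dots, x_n$ that are independent $N(\mu, \sigma^2)$ with $\sigma^2$ treated as known, and $M$ the sample mean. This is one standard route to the result, not necessarily the derivation the author had in mind.

```latex
\begin{align*}
\sum_{i=1}^{n}(x_i-\mu)^2 &= \sum_{i=1}^{n}(x_i-M)^2 + n(M-\mu)^2
  && \text{(first term is free of } \mu\text{)} \\
\log p(\mu \mid x) &= -\frac{1}{2}\left[\frac{(\mu-85)^2}{4} + \frac{n(\mu-M)^2}{\sigma^2}\right] + \text{const} \\
  &= -\frac{1}{2}\left[\mu^2\,\frac{4n+\sigma^2}{4\sigma^2}
     - 2\mu\,\frac{85\sigma^2+4nM}{4\sigma^2}\right] + \text{const} \\
  &= -\frac{1}{2}\cdot\frac{4n+\sigma^2}{4\sigma^2}
     \left(\mu - \frac{85\sigma^2+4nM}{4n+\sigma^2}\right)^{2} + \text{const}
\end{align*}
```

The last line is the kernel of a normal density in $\mu$ with mean $(85\sigma^2 + 4nM)/(4n + \sigma^2)$ and variance $4\sigma^2/(4n + \sigma^2)$.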
Peabody example (cont.) The variance of the posterior is $4\sigma^2 / (4n + \sigma^2)$. In our example, $M = 81.675$, an estimate of the variance $\sigma^2$ is 119.2506, and $n = 40$. The posterior mean, then, is $(85 \times 119.2506 + 4 \times 40 \times 81.675) / (4 \times 40 + 119.2506) = 83.095$.
Peabody example (cont.) The variance is $4 \times 119.2506 / (4 \times 40 + 119.2506) = 1.708$. A 95% credibility interval is given by $83.095 \pm 1.96\sqrt{1.708} = (80.53, 85.66)$. As Bayesians, we may say that the probability that $\mu$ lies between those values is .95.
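The posterior mean, variance, and interval can be reproduced with a short script. This is a minimal sketch that plugs in the values quoted on the slides (prior N(85, 4), M = 81.675, variance estimate 119.2506, n = 40); the function name `normal_posterior` is illustrative, and the variance estimate is treated as if it were the known data variance.

```python
from math import sqrt


def normal_posterior(prior_mean, prior_var, sample_mean, data_var, n):
    """Posterior mean and variance for a normal mean with a normal prior."""
    denom = prior_var * n + data_var
    post_mean = (prior_mean * data_var + prior_var * n * sample_mean) / denom
    post_var = prior_var * data_var / denom
    return post_mean, post_var


mean, var = normal_posterior(85, 4, 81.675, 119.2506, 40)
half_width = 1.96 * sqrt(var)  # 95% interval from the normal posterior
lower, upper = mean - half_width, mean + half_width

print(round(mean, 3), round(var, 3))     # 83.095 1.708
print(round(lower, 2), round(upper, 2))  # 80.53 85.66
```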
Peabody example (cont.) Now let’s suppose that we want to repeat the analysis, but with an uninformative prior for the mean. Instead of $\mu \sim N(85, 4)$, we’ll use $\mu \sim N(85, 10000000)$. The posterior distribution of the mean, then, is centered at $(85 \times 119.2506 + 10000000 \times 40 \times 81.675) / (10000000 \times 40 + 119.2506) = 81.675$.
Peabody example (cont.) The variance is $10000000 \times 119.2506 / (10000000 \times 40 + 119.2506) = 2.98126$. A Bayesian credibility interval for the mean, then, would be $81.675 \pm 1.96\sqrt{2.98126} = (78.29, 85.06)$. Although this is identical to the confidence interval we would calculate using frequentist maximum likelihood, we are justified in giving it a Bayesian interpretation.
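The effect of widening the prior can be sketched by recomputing the posterior for several prior variances and comparing the result with the frequentist quantities M and (variance estimate)/n. The values of M, the variance estimate, and n are those quoted on the slides; the intermediate prior variance of 100 and the names used here are illustrative.

```python
from math import sqrt

M, s2, n = 81.675, 119.2506, 40  # sample mean, variance estimate, sample size


def posterior_for_prior(prior_mean, prior_var):
    """Posterior mean and variance of the mean under a N(prior_mean, prior_var) prior."""
    denom = prior_var * n + s2
    return (prior_mean * s2 + prior_var * n * M) / denom, prior_var * s2 / denom


for prior_var in (4, 100, 10_000_000):
    mean, var = posterior_for_prior(85, prior_var)
    print(prior_var, round(mean, 3), round(var, 5))

# With the very wide prior, the posterior matches the frequentist summary.
vague_mean, vague_var = posterior_for_prior(85, 10_000_000)
freq_var = s2 / n  # squared standard error of the sample mean
interval = (vague_mean - 1.96 * sqrt(vague_var), vague_mean + 1.96 * sqrt(vague_var))
print(round(interval[0], 2), round(interval[1], 2))  # 78.29 85.06
```

As the prior variance grows, the prior carries less weight and the posterior is driven almost entirely by the data.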