MCMC Output & Metropolis-Hastings Algorithm Part I


1 MCMC Output & Metropolis-Hastings Algorithm Part I
P548: Bayesian Stats with Psych Applications. Instructor: John Miyamoto. 01/18/2017: Lecture 03-2. Note: This PowerPoint presentation may contain macros that I wrote to help me create the slides. The macros aren't needed to view the slides; you can disable or delete them without any change to the presentation.

2 Outline The Metropolis-Hastings (M-H) algorithm is one of the main tools for approximating a posterior distribution by means of Markov chain Monte Carlo (MCMC). M-H draws samples from the posterior distribution; with enough samples, you have a good approximation to the posterior distribution. This is the central step in computing a Bayesian analysis. Today's lecture is just a quick overview; we will look at the details in a later lecture.

3 Outline Assignment 3 focuses on R code for computing the posterior of a binomial parameter by means of the M-H algorithm. The code in Assignment 3 is almost entirely due to Kruschke, but JM has made a few modifications and added annotations. In actual research, you will almost never execute the M-H algorithm within R; instead, you will send the problem of sampling from the posterior over to JAGS or Stan. Nevertheless, it is useful to see the details of M-H on a simple example, and this is what you do in Assignment 3.
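To make the idea concrete, here is a minimal sketch of this kind of sampler. It is not Kruschke's Assignment 3 code; it assumes hypothetical data of z = 14 successes in N = 20 trials, a Beta(1, 1) prior, and an arbitrary proposal standard deviation of 0.1.

```r
# Minimal Metropolis sketch for a binomial parameter theta (NOT Kruschke's Assignment 3 code).
# Hypothetical data: z = 14 successes in N = 20 trials; hypothetical prior: Beta(1, 1).
z <- 14; N <- 20

# Unnormalized posterior: likelihood * prior, evaluated pointwise.
post <- function(theta) {
  if (theta <= 0 || theta >= 1) return(0)           # impossible values get density 0
  dbinom(z, size = N, prob = theta) * dbeta(theta, 1, 1)
}

K <- 50000                        # number of MCMC samples (the "big number")
theta <- numeric(K)
theta[1] <- 0.5                   # arbitrary starting value

for (k in 1:(K - 1)) {
  theta_c <- rnorm(1, mean = theta[k], sd = 0.1)    # symmetric normal proposal
  R <- post(theta_c) / post(theta[k])               # posterior odds; P(D) cancels
  theta[k + 1] <- if (R >= runif(1)) theta_c else theta[k]
}

hist(theta, breaks = 50, freq = FALSE)              # MCMC approximation to the posterior
curve(dbeta(x, z + 1, N - z + 1), add = TRUE)       # exact Beta(15, 7) posterior, for comparison
```

With K this large, the histogram of theta should sit close to the exact Beta(15, 7) posterior that the conjugate analysis gives for these hypothetical data.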

4 Three Strategies of Bayesian Statistical Inference
Define the class of statistical models (reality is assumed to lie within this class of models): define likelihoods conditional on parameters, and define prior distributions. Then, given the data, compute the posterior by one of three strategies: compute the posterior from conjugate priors (if possible); compute the posterior with grid approximation (if practically possible); compute an approximate posterior by an MCMC algorithm (if possible).

5 General Strategy of Bayesian Statistical Inference
(Same diagram as slide 4; the MCMC strategy is marked as the topic of this lecture.)
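As a point of comparison before turning to MCMC, here is a rough sketch of the first two strategies for a single binomial parameter. The data (z = 14 successes in N = 20 trials) and the Beta(1, 1) prior are hypothetical choices for illustration, not values from the course materials.

```r
# Sketch of the first two strategies for a single binomial parameter.
# The data (z = 14 successes in N = 20 trials) and the Beta(1, 1) prior are hypothetical.
z <- 14; N <- 20
a <- 1;  b <- 1

# Strategy 1: conjugate prior. A Beta(a, b) prior with a binomial likelihood gives a
# Beta(a + z, b + N - z) posterior in closed form.
post_mean_conjugate <- (a + z) / (a + b + N)

# Strategy 2: grid approximation. Evaluate likelihood * prior on a grid and normalize.
theta_grid <- seq(0.001, 0.999, length.out = 1000)
unnorm     <- dbinom(z, size = N, prob = theta_grid) * dbeta(theta_grid, a, b)
post_grid  <- unnorm / sum(unnorm)
post_mean_grid <- sum(theta_grid * post_grid)

# Strategy 3 (MCMC) is the topic of this lecture; see the Metropolis-Hastings sketches below.
```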

6 MCMC Algorithm Samples from the Posterior Distribution
[Slide figure: a very large number ("BIG NUMBER") of samples is drawn from the posterior distribution; the sample approximates the posterior.]

7 Validity of the MCMC Approximation
Theorem: Under very general mathematical conditions, as the sample size K gets very large, the distribution of the samples converges to the true posterior probability distribution.
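A quick way to see the role of "large K" is with direct draws from a known posterior rather than an MCMC chain; this sketch assumes the Beta(15, 7) posterior from the hypothetical z = 14, N = 20 example above.

```r
# Direct-sampling illustration of the "large K" idea, using the conjugate Beta(15, 7)
# posterior from the hypothetical z = 14, N = 20 example (not an MCMC chain):
# the sample-based estimate of the posterior mean approaches the true value as K grows.
true_mean <- 15 / (15 + 7)
for (K in c(100, 10000, 1000000)) {
  draws <- rbeta(K, 15, 7)
  cat("K =", K, " estimated mean =", mean(draws), " true mean =", true_mean, "\n")
}
```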

8 Reminder About Bayes Rule
Before computing a Bayesian analysis, the researcher knows: θ = (θ1, θ2, ..., θn) is a vector of parameters for a statistical model. E.g., in a one-way ANOVA with 3 groups, θ1 = mean 1, θ2 = mean 2, θ3 = mean 3, and θ4 = the common variance of each of the 3 populations. P(θ) = the prior probability distribution over the vector θ. P(D | θ) = the likelihood of the data D given any particular vector θ of parameters. Bayes Rule: P(θ | D) = P(D | θ) P(θ) / P(D). Annotation: P(D | θ) and P(θ), and hence the numerator, are known for each specific θ; the normalizing constant P(D) is unknown, so the posterior P(θ | D) is unknown for the entire distribution.

9 Bayes Rule
(Repeats slide 8's definitions and Bayes Rule without the annotations.)

10 Why Is Bayes Rule Hard to Apply in Practice?
Fact #1: P(D | θ) is easy to compute for individual cases. Fact #2: P(θ) is easy to compute for individual cases. What is hard is the normalizing constant P(D), which requires integrating P(D | θ)P(θ) over all θ. The Metropolis-Hastings algorithm uses Facts #1 and #2 to compute an approximation to P(θ | D) = P(D | θ) P(θ) / P(D): it only ever needs ratios of posterior values, so the unknown P(D) cancels.
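In code, Facts #1 and #2 look like this; the data and prior are again the hypothetical z = 14, N = 20, Beta(1, 1) example rather than anything from the lecture:

```r
# Facts #1 and #2 in code (hypothetical z = 14, N = 20, Beta(1, 1) prior): the likelihood
# and the prior are easy to evaluate for any individual theta, even though P(D) is not.
z <- 14; N <- 20
lik_times_prior <- function(theta) dbinom(z, size = N, prob = theta) * dbeta(theta, 1, 1)

# The ratio of posterior densities at two specific values needs only these pointwise
# numbers, because the unknown P(D) cancels:
theta_a <- 0.70; theta_b <- 0.50
R <- lik_times_prior(theta_a) / lik_times_prior(theta_b)   # = P(theta_a | D) / P(theta_b | D)
```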

11 Reminder: Each Sample from the Posterior Depends Only on the Immediately Preceding Step
This is the Markov property of the chain: the distribution of the next sample depends only on the current sample, P(θk+1 | θk, θk−1, ..., θ1) = P(θk+1 | θk).

12 BIG PICTURE: Metropolis-Hastings Algorithm
At the k-th step, you have a current vector of parameter values, θk; this is your current sample. A "proposal function" F proposes a random new vector based only on the values in iteration k. A "rejection rule" decides whether the proposal is acceptable or not. If it is acceptable, iteration k + 1 = proposal k; if it is rejected, iteration k + 1 = iteration k. Repeat the process at the next step.
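A bare-bones skeleton of that loop might look as follows; run_chain, proposal_fn, and accept_fn are placeholder names standing for the slide's proposal function F and rejection rule, not functions from the course code.

```r
# Skeleton of the iteration described above; the function names are placeholders.
run_chain <- function(theta_1, K, proposal_fn, accept_fn) {
  samples <- vector("list", K)
  samples[[1]] <- theta_1                       # the starting parameter vector
  for (k in 1:(K - 1)) {
    proposal_k <- proposal_fn(samples[[k]])     # propose a new vector based only on iteration k
    # acceptable: iteration k+1 = proposal k; rejected: iteration k+1 = iteration k
    samples[[k + 1]] <- if (accept_fn(proposal_k, samples[[k]])) proposal_k else samples[[k]]
  }
  samples
}
```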

13 Metropolis-Hastings (M-H) Algorithm: The "Proposal" Density
Notation: Let θk = (θk1, θk2, ..., θkn) be the vector of specific values for θ1, θ2, ..., θn that makes up the k-th sample. Choose a "proposal" density F(θ | θk) such that for each θk, F(θ | θk) is a probability distribution over θ ∈ Ω, where Ω = the set of all parameter vectors. Example: F(θ | θk) might be defined by: θ1 ~ N(θk1, σ = 2), θ2 ~ N(θk2, σ = 2), ..., θn ~ N(θkn, σ = 2).
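The example proposal density might look like this in R; the current values in theta_k are arbitrary illustrative numbers.

```r
# The slide's example proposal: each component of the candidate is drawn from a
# normal distribution centered at the current value, with sd = 2.
F_proposal <- function(theta_k) rnorm(length(theta_k), mean = theta_k, sd = 2)

theta_k   <- c(1.0, -0.5, 3.2)       # an arbitrary current sample (hypothetical values)
theta_new <- F_proposal(theta_k)     # candidate: theta_new[j] ~ N(theta_k[j], sd = 2)
```

Because the normal density is symmetric in its mean and its argument, this proposal satisfies the symmetry condition used on the next slide.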

14 MH Algorithm for Case Where Proposal Function is Symmetric
Step 1: Assume that θk is the current value of θ. Step 2: Draw a candidate θc from F(θ | θk), i.e. θc ~ F(θ | θk). Step 3: Compute the posterior odds R = P(θc | D) / P(θk | D) = [P(D | θc) P(θc)] / [P(D | θk) P(θk)]. Step 4: If R ≥ 1, set θk+1 = θc. If R < 1, draw u ~ Uniform(0, 1): if R ≥ u, set θk+1 = θc; if R < u, set θk+1 = θk. Now we have finished choosing θk+1. Step 5: Set k = k + 1, and return to Step 2. Continue this process until you have a very large sample of θ.
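One iteration of these steps could be written as a small R function like the one below; mh_step, post_fn, and prop_sd are illustrative names, with post_fn standing for the unnormalized posterior P(D | θ)P(θ).

```r
# One M-H iteration, Steps 2-4, for a user-supplied unnormalized posterior post_fn and a
# symmetric normal proposal; mh_step, post_fn, and prop_sd are illustrative names.
mh_step <- function(theta_k, post_fn, prop_sd = 0.1) {
  # Step 2: draw a candidate from the symmetric proposal F(theta | theta_k).
  theta_c <- rnorm(length(theta_k), mean = theta_k, sd = prop_sd)
  # Step 3: posterior odds; the unknown P(D) cancels in the ratio.
  R <- post_fn(theta_c) / post_fn(theta_k)
  # Step 4: if R >= 1 accept; otherwise accept with probability R.
  if (R >= 1 || R >= runif(1)) theta_c else theta_k
  # Step 5 (set k = k + 1 and repeat) happens in the caller's loop.
}

# Example use with the hypothetical binomial target from the earlier sketch:
post <- function(th) if (th <= 0 || th >= 1) 0 else dbinom(14, size = 20, prob = th) * dbeta(th, 1, 1)
theta_next <- mh_step(0.5, post)
```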

15 Closer Look at Steps 3 & 4
Step 3: Compute the posterior odds R = P(θc | D) / P(θk | D) = [P(D | θc) P(θc)] / [P(D | θk) P(θk)], where θc = the "candidate" sample and θk = the previously accepted k-th sample. Step 4: If R ≥ 1.0, set θk+1 = θc. If R < 1.0, draw a random u ~ Uniform(0, 1): if R ≥ u, set θk+1 = θc; if R < u, set θk+1 = θk. If P(θc | D) ≥ P(θk | D), then R ≥ 1.0, so it is certain that θk+1 = θc. If P(θc | D) < P(θk | D), then R is the probability that θk+1 = θc. Conclusion: The MCMC chain tends to jump towards high-probability regions of the posterior, but it can also jump to low-probability regions.
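In other words, the two cases of Step 4 collapse to "accept the candidate with probability min(1, R)", which in R might be written as a one-liner (the function name is illustrative):

```r
# Steps 3-4 in one line: accept the candidate with probability min(1, R).
# If R >= 1 this is always TRUE; if R < 1 it is TRUE with probability R.
accept_candidate <- function(R) runif(1) <= min(1, R)
```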

16 MH Algorithm for Case Where Proposal Function is Symmetric
(Repeats the M-H steps from slide 14, shown again before turning to the handout of R code for Metropolis-Hastings.)

17 Wednesday, January 18, 2017: The Lecture Ended Here

