
Markov-Chain-Monte-Carlo (MCMC) & The Metropolis-Hastings Algorithm
P548: Intro Bayesian Stats with Psych Applications
Instructor: John Miyamoto, 01/19/2016: Lecture 03-1




1 Markov-Chain-Monte-Carlo (MCMC) & The Metropolis-Hastings Algorithm
P548: Intro Bayesian Stats with Psych Applications
Instructor: John Miyamoto, 01/19/2016: Lecture 03-1
Note: This PowerPoint presentation may contain macros that I wrote to help me create the slides. The macros aren't needed to view the slides. You can disable or delete the macros without any change to the presentation.

2 Outline
Overview of the JAGS approach to Bayesian computation
Metropolis-Hastings Algorithm - a basic tool for approximating a posterior distribution.
♦ This is the central step in computing a Bayesian analysis.

3 Outline
Metropolis-Hastings Algorithm - a basic tool for approximating a posterior distribution.
♦ This is the central step in computing a Bayesian analysis.
coda.samples and mcmc.list objects:
♦ Understanding the structure of the output that is returned by the coda.samples function (see the sketch below).
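As a concrete reference for this output structure, here is a minimal R sketch of the JAGS workflow using the rjags package; the model string, the data values, and the parameter name theta are hypothetical placeholders, not an example from the lecture:

library(rjags)   # interface to JAGS; also loads the coda package

# Hypothetical one-parameter model: y[i] ~ Bernoulli(theta), Beta(1, 1) prior
modelString <- "
model {
  for (i in 1:N) { y[i] ~ dbern(theta) }
  theta ~ dbeta(1, 1)
}"
dataList <- list(y = c(1, 0, 1, 1, 0, 1, 1, 1), N = 8)

jagsModel <- jags.model(textConnection(modelString), data = dataList, n.chains = 3)
update(jagsModel, n.iter = 500)    # burn-in

# coda.samples returns an mcmc.list: one mcmc object (a matrix of samples) per chain
codaSamples <- coda.samples(jagsModel, variable.names = "theta", n.iter = 5000)

class(codaSamples)      # "mcmc.list"
summary(codaSamples)    # posterior summaries pooled across the 3 chains
theta <- as.matrix(codaSamples)[, "theta"]   # flatten all chains into one vector

Each element of the mcmc.list holds one chain's samples; as.matrix() pools the chains, which is usually what you want for plotting an approximate posterior.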

4 Metropolis-Hastings Algorithm

5 Outline
Metropolis-Hastings (MH) algorithm - one of the main tools for approximating a posterior distribution by means of Markov-Chain-Monte-Carlo (MCMC)
General idea behind the MH algorithm
Kruschke's example for computing the posterior of a binomial parameter by means of the MH algorithm

6 Three Strategies of Bayesian Statistical Inference
Define the class of statistical models (reality is assumed to lie within this class of models).
Define prior distributions; define likelihoods conditional on parameters; collect data.
Then compute the posterior by one of three strategies:
♦ Compute the posterior from conjugate priors (if possible)
♦ Compute an approximate posterior by an MCMC algorithm (if possible)
♦ Compute the posterior with a grid approximation (if practically possible)

7 General Strategy of Bayesian Statistical Inference
(This slide repeats the diagram from slide 6 as a summary representation.)

8 MCMC Algorithm Samples from the Posterior Distribution
[Figure: an MCMC sample from the posterior distribution; the annotation emphasizes that the sample size is a BIG NUMBER.]

9 Validity of the MCMC Approximation
Theorem: Under very general mathematical conditions, as the sample size K gets very large, the distribution of the MCMC sample converges to the true posterior probability distribution. (An empirical check on convergence is sketched below.)
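The theorem describes the limit as K gets large; in practice, convergence is checked empirically. Continuing the hypothetical rjags sketch above (codaSamples is the assumed output of coda.samples with 3 chains), the coda package supplies standard diagnostics:

library(coda)

# Chains started from different points should agree with each other once converged:
gelman.diag(codaSamples)     # potential scale reduction factor; near 1.0 at convergence

# Autocorrelation means K dependent MCMC draws are worth fewer independent draws:
effectiveSize(codaSamples)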

10 Before computing a Bayesian analysis, the researcher knows:
The true stat model belongs to an infinite class of models.
The true stat model is characterized by a vector of parameters, θ = (θ_1, θ_2, ..., θ_n).
o E.g., in a two-group normal model: θ = (θ_1, θ_2, θ_3, θ_4), where θ_1 = mean of pop 1, θ_2 = mean of pop 2, θ_3 = variance of pop 1, θ_4 = variance of pop 2.

11 Before computing a Bayesian analysis, the researcher knows:
The true stat model belongs to an infinite class of models.
The true stat model is characterized by a vector of parameters, θ = (θ_1, θ_2, ..., θ_n).
P(θ) = the prior probability distribution over the vector θ.
P(D | θ) = the likelihood of the data D given a specific θ.
Bayes Rule: P(θ | D) = P(D | θ) P(θ) / P(D), where P(D) = ∫ P(D | θ) P(θ) dθ.
Here P(D | θ) and P(θ) are known, P(D) has no simple math formula, and P(θ | D) is unknown. This is why Bayes Rule is hard to compute in practice.

12 Before computing a Bayesian analysis, the researcher knows:
θ = (θ_1, θ_2, ..., θ_n) is a vector of parameters for a statistical model.
o E.g., in a one-way ANOVA with 3 groups, θ_1 = mean 1, θ_2 = mean 2, θ_3 = mean 3, and θ_4 = the common variance of each of the 3 populations.
P(θ) = the prior probability distribution over the vector θ.
P(D | θ) = the likelihood of the data D given any particular vector θ of parameters.
Bayes Rule: P(θ | D) = P(D | θ) P(θ) / P(D)

13 Why Is Bayes Rule Hard to Apply in Practice?
Fact #1: The numerator P(D | θ) P(θ) is easy to compute for individual cases of θ, but P(θ | D) is hard to compute for an entire distribution, because the normalizing constant P(D) has no simple formula.
The Metropolis-Hastings Algorithm uses Fact #1 to compute an approximation to P(θ | D), where P(θ | D) ∝ P(D | θ) P(θ).
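A small R illustration of Fact #1 for a single binomial parameter (the data counts and the Beta(1, 1) prior are assumptions chosen for the example): the unnormalized posterior P(D | θ) P(θ) is one line of arithmetic at any individual θ, while the constant P(D) requires an integral over all θ.

# Hypothetical data: 6 successes in 9 Bernoulli trials, Beta(1, 1) prior
loglik   <- function(theta) dbinom(6, size = 9, prob = theta, log = TRUE)
logprior <- function(theta) dbeta(theta, 1, 1, log = TRUE)

# Easy: evaluate P(D | theta) * P(theta) at one individual theta
logUnnormPost <- function(theta) loglik(theta) + logprior(theta)
logUnnormPost(0.5)

# Hard in general: P(D) is an integral over the whole parameter space
# (feasible here only because theta is one-dimensional)
PD <- integrate(function(th) exp(logUnnormPost(th)), lower = 0, upper = 1)$value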

14 Reminder: Each Sample from the Posterior Depends Only on the Immediately Preceding Step

15 BIG PICTURE: Metropolis-Hastings Algorithm
At the k-th step, you have a current vector of parameter values. This is your current sample.
A "proposal function" F proposes a random new vector based only on the values in Iteration k.
A "rejection rule" decides whether the proposal is acceptable or not.
♦ If it is accepted: Iteration k+1 = Proposal k
♦ If it is rejected: Iteration k+1 = Iteration k
Repeat the process at the next step.

16 Metropolis-Hastings (M-H) Algorithm: The "Proposal" Density
Notation: Let θ_k = (θ_{k,1}, θ_{k,2}, ..., θ_{k,n}) be a vector of specific values for θ_1, θ_2, ..., θ_n that make up the k-th sample.
Choose a "proposal" density F(θ | θ_k) where, for any specific θ_k, F(θ | θ_k) is a probability distribution over the θ ∈ Ω = the set of all parameter vectors.
Example: F(θ | θ_k) might be defined by: θ_1 ~ N(θ_{k,1}, σ = 2), θ_2 ~ N(θ_{k,2}, σ = 2), ..., θ_n ~ N(θ_{k,n}, σ = 2).
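A short R sketch of this example proposal (the current values in thetaK are made up; the SD of 2 comes from the slide):

thetaK <- c(0.3, -1.2, 2.5)   # hypothetical current sample, n = 3 parameters

# Symmetric proposal: each component jitters around its current value with sd = 2
proposal <- rnorm(length(thetaK), mean = thetaK, sd = 2)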

17 MH Algorithm for the Case Where the Proposal Function Is Symmetric
Step 1: Choose starting values for θ: θ_0 = (θ_{0,1}, ..., θ_{0,n}).
Step 2: Draw a candidate θ_c from F(θ | θ_k), i.e., θ_c ~ F(θ | θ_k). (Remember that F(θ | θ_k) is the proposal distribution, which depends only on θ_k.)
Step 3: Compute the posterior odds: R = P(θ_c | D) / P(θ_k | D) = [P(D | θ_c) P(θ_c)] / [P(D | θ_k) P(θ_k)]. (The normalizing constant P(D) cancels, so R requires only the easy-to-compute numerator of Bayes Rule.)
Step 4: Draw u ~ Uniform(0, 1). If R > u, set θ_{k+1} = θ_c. If R ≤ u, set θ_{k+1} = θ_k.
Step 5: Set k = k+1, and return to Step 2. Continue this process until you have a very large sample of θ.
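Here is a minimal R sketch of these five steps for a single binomial parameter, in the spirit of Kruschke's example mentioned on slide 5 (the data counts, prior, starting value, and proposal SD are all assumptions for illustration):

set.seed(548)

# Hypothetical data: z = 14 successes in N = 20 trials, Beta(1, 1) prior
z <- 14; N <- 20
logUnnormPost <- function(theta) {
  if (theta <= 0 || theta >= 1) return(-Inf)   # prior is zero outside (0, 1)
  dbinom(z, N, theta, log = TRUE) + dbeta(theta, 1, 1, log = TRUE)
}

K <- 20000
theta <- numeric(K)
theta[1] <- 0.5                                  # Step 1: starting value

for (k in 1:(K - 1)) {
  thetaC <- rnorm(1, mean = theta[k], sd = 0.2)  # Step 2: symmetric proposal
  R <- exp(logUnnormPost(thetaC) - logUnnormPost(theta[k]))  # Step 3: posterior odds
  u <- runif(1)                                  # Step 4: accept or reject
  theta[k + 1] <- if (R > u) thetaC else theta[k]
}

mean(theta[-(1:1000)])   # after burn-in, close to the exact Beta(15, 7) mean, 15/22 ≈ 0.68

Working with log densities, as here, avoids numerical underflow; R is recovered by exponentiating the difference of the logs.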

18 Closer Look at Steps 3 & 4 (Assuming a Symmetric Proposal Function)
Step 3: Compute the posterior odds: R = P(θ_c | D) / P(θ_k | D), where θ_c = the "candidate" sample and θ_k = the previously accepted k-th sample.
Step 4: Draw a random u ~ Uniform(0, 1). If R > u, set θ_{k+1} = θ_c. If R ≤ u, set θ_{k+1} = θ_k.
♦ If P(θ_c | D) > P(θ_k | D), then R > 1.0, so it is certain that θ_{k+1} = θ_c.
♦ If P(θ_c | D) < P(θ_k | D), then R is the probability that θ_{k+1} = θ_c.
♦ Conclusion: The MCMC chain tends to jump towards high-probability regions of the posterior, but it can also jump to low-probability regions.

19 Remainder of These Slides
The remainder of these slides were added after class on 1/20/2016. They are purely of interest for students who want to delve deeper into the Metropolis-Hastings algorithm.
Scott Lynch has a nice description of the Metropolis-Hastings algorithm, including the role of asymmetric proposal functions:
Lynch, S. M. (2007). Introduction to applied Bayesian statistics and estimation for social scientists. New York: Springer.

20 MH Algorithm for the Case Where the Proposal Function Is Asymmetric
When the proposal function is asymmetric, Step 3 must be modified, but all other steps remain exactly the same. Kruschke only discusses the cases with a symmetric proposal function. (A sketch of the modified Step 3 appears below.)
Step 1: Choose starting values for θ: θ_0 = (θ_{0,1}, ..., θ_{0,n}). (Same as before.)
Step 2: Draw a candidate θ_c from F(θ | θ_k), i.e., θ_c ~ F(θ | θ_k). (Same as before.)
Step 3 (different): Compute the criterion R = [P(θ_c | D) / P(θ_k | D)] × [F(θ_k | θ_c) / F(θ_c | θ_k)].
Step 4: Draw u ~ Uniform(0, 1). If R > u, set θ_{k+1} = θ_c. If R ≤ u, set θ_{k+1} = θ_k. (Same as before.)
Step 5: Set k = k+1, and return to Step 2. Continue this process until you have a very large sample of θ. (Same as before.)
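A sketch of the modified Step 3 in R, assuming a hypothetical asymmetric proposal (a lognormal centered at the current value, which is handy when θ must stay positive); the correction factor F(θ_k | θ_c) / F(θ_c | θ_k) is the only new ingredient relative to the symmetric case:

# Asymmetric proposal density: F(a | b) = lognormal density of a, centered at log(b).
# dlnorm(a, meanlog = log(b)) is generally not equal to dlnorm(b, meanlog = log(a)).
proposalDensity <- function(a, b, sd = 0.5) dlnorm(a, meanlog = log(b), sdlog = sd)

mhRatio <- function(thetaC, thetaK, logUnnormPost) {
  posteriorOdds <- exp(logUnnormPost(thetaC) - logUnnormPost(thetaK))
  hastingsCorrection <- proposalDensity(thetaK, thetaC) / proposalDensity(thetaC, thetaK)
  posteriorOdds * hastingsCorrection   # the criterion R from Step 3
}

# Example use, with a Gamma(2, 1) density standing in for the unnormalized posterior:
mhRatio(1.7, 1.2, function(t) dgamma(t, shape = 2, rate = 1, log = TRUE))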

21 MH Algorithm for the Case Where the Proposal Function Is Asymmetric
(Same steps as slide 20. A callout points at the correction ratio F(θ_k | θ_c) / F(θ_c | θ_k) in Step 3 and asks: what is this ratio? See the next slide.)

22 F(θ | θ_k) for Symmetric & Asymmetric Proposal Functions
[Figure: two sets of proposal density curves. Symmetric, equal-variance proposal functions: note that the heights are equal. Asymmetric proposal functions: note that the heights are unequal.]

23 MH Algorithm for the Case Where the Proposal Function Is Asymmetric
(Same steps as slide 20. The callout notes that the ratio F(θ_k | θ_c) / F(θ_c | θ_k) in Step 3 is the correction for the asymmetry of the proposal function.)

24 Summary re Proposal Functions
Kruschke's examples use a symmetric proposal function. Therefore the criterion R is computed as R = P(θ_c | D) / P(θ_k | D).
For Psych 548, you don't have to worry about asymmetric proposal functions; just be aware that they are possible and that they influence how the M-H algorithm works.
END

