MCMC Output & Metropolis-Hastings Algorithm Part I

Presentation transcript:

MCMC Output & Metropolis-Hastings Algorithm Part I
P548: Bayesian Stats with Psych Applications
Instructor: John Miyamoto
01/18/2017: Lecture 03-2
Note: This PowerPoint presentation may contain macros that I wrote to help me create the slides. The macros aren't needed to view the slides. You can disable or delete the macros without any change to the presentation.

Outline
The Metropolis-Hastings (M-H) algorithm is one of the main tools for approximating a posterior distribution by means of Markov chain Monte Carlo (MCMC).
M-H draws samples from the posterior distribution. With enough samples, you have a good approximation to the posterior distribution. This is the central step in computing a Bayesian analysis.
Today's lecture is just a quick overview; we will look at the details in a later lecture.

Outline (continued)
Assignment 3 focuses on R code for computing the posterior of a binomial parameter by means of the M-H algorithm. The code in Assignment 3 is almost entirely due to Kruschke, but JM has made a few modifications and added annotations.
In actual research, you will almost never execute the M-H algorithm within R. Instead, you will send the problem of sampling from the posterior over to JAGS or Stan. Nevertheless, it is useful to see the details of M-H on a simple example, and this is what you do in Assignment 3.

Three Strategies of Bayesian Statistical Inference
Define the class of statistical models (reality is assumed to lie within this class of models):
   ▪ Define the likelihoods conditional on the parameters.
   ▪ Define the prior distributions.
Combine these with the data, then compute the posterior by one of three strategies:
   ▪ Compute the posterior from conjugate priors (if possible).
   ▪ Compute the posterior with a grid approximation (if practically possible).
   ▪ Compute an approximate posterior by an MCMC algorithm (if possible).

General Strategy of Bayesian Statistical Inference
(Same diagram as the previous slide, with the third strategy highlighted: this lecture is about computing an approximate posterior by an MCMC algorithm.)

The MCMC Algorithm Samples from the Posterior Distribution
(Figure: an MCMC algorithm draws a very large number of samples from the posterior distribution.)

Validity of the MCMC Approximation
Theorem: Under very general mathematical conditions, as the sample size K gets very large, the distribution of the samples converges to the true posterior probability distribution.
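To make the convergence claim concrete, here is a minimal R illustration (not MCMC itself) of how a large sample approximates a distribution. The Beta(5, 9) "posterior" and the sample sizes are arbitrary choices for this sketch.

```r
## Not MCMC itself -- just the convergence idea: as the number of samples K
## grows, summaries of the sample approach the corresponding summaries of the
## true distribution. The Beta(5, 9) "posterior" is an arbitrary example.
set.seed(548)
true.mean <- 5 / (5 + 9)                      # exact mean of Beta(5, 9)
for (K in c(100, 10000, 1000000)) {
  theta <- rbeta(K, shape1 = 5, shape2 = 9)   # K independent draws
  cat(sprintf("K = %7d   sample mean = %.4f   (true mean = %.4f)\n",
              K, mean(theta), true.mean))
}
```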

Reminder About Bayes Rule
Before computing a Bayesian analysis, the researcher knows:
   ▪ θ = (θ1, θ2, ..., θn) is a vector of parameters for a statistical model. E.g., in a one-way ANOVA with 3 groups, θ1 = mean 1, θ2 = mean 2, θ3 = mean 3, and θ4 = the common variance of the 3 populations.
   ▪ P(θ) = the prior probability distribution over the vector θ.
   ▪ P(D | θ) = the likelihood of the data D given any particular vector θ of parameters.
Bayes Rule:
   P(θ | D) = P(D | θ) P(θ) / P(D),   where   P(D) = ∫ P(D | θ) P(θ) dθ
The likelihood P(D | θ), the prior P(θ), and hence the numerator P(D | θ) P(θ) are known for each specific θ. The normalizing constant P(D) is unknown, and the posterior P(θ | D) is unknown for the entire distribution.
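As a concrete sketch of which pieces are computable, the following R snippet uses a made-up binomial example (7 successes in 20 trials with a uniform Beta(1, 1) prior; the function names are mine, for illustration only): the numerator P(D | θ) P(θ) is a one-line calculation for any specific θ, while the denominator P(D) requires an integral over the whole parameter space.

```r
## Made-up example: D = 7 successes in N = 20 trials, uniform Beta(1, 1) prior.
z <- 7; N <- 20
likelihood <- function(theta) dbinom(z, size = N, prob = theta)   # P(D | theta)
prior      <- function(theta) dbeta(theta, 1, 1)                  # P(theta)

## Easy: likelihood x prior at any single theta is one cheap evaluation.
likelihood(0.4) * prior(0.4)

## Hard in general: P(D) integrates likelihood x prior over ALL theta.
## With one parameter, numerical integration still works; with many parameters
## this integral is the bottleneck that MCMC is designed to avoid.
p.D <- integrate(function(t) likelihood(t) * prior(t), lower = 0, upper = 1)$value
p.D
```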


Why Is Bayes Rule Hard to Apply in Practice?
Fact #1: P(D | θ) is easy to compute for individual values of θ.
Fact #2: P(θ) is easy to compute for individual values of θ.
The hard part is the normalizing constant P(D) = ∫ P(D | θ) P(θ) dθ, which requires integrating over the entire parameter space.
The Metropolis-Hastings algorithm uses Facts #1 and #2 to compute an approximation to P(θ | D) = P(D | θ) P(θ) / P(D), without ever computing P(D).
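A brief sketch of why this is possible, continuing the same made-up binomial example: the ratio of posterior densities at two values of θ can be computed from likelihood × prior alone, because P(D) cancels.

```r
## Same made-up example: z = 7 successes in N = 20 trials, uniform Beta(1, 1) prior.
z <- 7; N <- 20
post.unnorm <- function(theta) dbinom(z, N, theta) * dbeta(theta, 1, 1)

theta.k <- 0.40   # current sample
theta.c <- 0.35   # candidate sample
R <- post.unnorm(theta.c) / post.unnorm(theta.k)
R   # equals P(theta.c | D) / P(theta.k | D); P(D) cancels and is never computed
```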

Reminder: Each Sample from the Posterior Depends Only on the Immediately Preceding Step
The sequence of samples has the Markov property: the (k + 1)-st sample θk+1 depends only on the k-th sample θk, not on any earlier samples. This is the "Markov chain" part of Markov chain Monte Carlo.

BIG PICTURE: Metropolis-Hastings Algorithm
At the k-th step, you have a current vector of parameter values, θk. This is your current sample.
A "proposal function" F proposes a random new vector based only on the values in Iteration k.
A "rejection rule" decides whether the proposal is acceptable or not.
   ▪ If it is accepted: Iteration k + 1 = Proposal k.
   ▪ If it is rejected: Iteration k + 1 = Iteration k.
Repeat the process at the next step.

Metropolis-Hastings (M-H) Algorithm: The "Proposal" Density
Notation: Let θk = (θk1, θk2, ..., θkn) be the vector of specific values for θ1, θ2, ..., θn that make up the k-th sample.
Choose a "proposal" density F(θ | θk), where for each θk, F(θ | θk) is a probability distribution over θ ∈ Ω = the set of all parameter vectors.
Example: F(θ | θk) might be defined by: θ1 ~ N(θk1, σ = 2), θ2 ~ N(θk2, σ = 2), ..., θn ~ N(θkn, σ = 2).
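A minimal R sketch of a symmetric proposal function of this kind; the Normal form matches the example above, but the proposal SD of 0.2 (and the function name propose) is an arbitrary choice for illustration.

```r
## Symmetric Normal proposal: candidate values are centered at the current
## sample theta.k. The proposal SD (0.2) is a tuning choice, not part of the model.
propose <- function(theta.k, proposal.sd = 0.2) {
  rnorm(n = length(theta.k), mean = theta.k, sd = proposal.sd)
}

propose(0.40)              # one candidate near the scalar value 0.40
propose(c(0.1, 0.5, 0.9))  # works componentwise for a parameter vector
```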

M-H Algorithm for the Case Where the Proposal Function Is Symmetric
Step 1: Assume that θk is the current value of θ.
Step 2: Draw a candidate θc from F(θ | θk), i.e., θc ~ F(θ | θk).
Step 3: Compute the posterior odds: R = P(θc | D) / P(θk | D) = [P(D | θc) P(θc)] / [P(D | θk) P(θk)].
Step 4: If R ≥ 1, set θk+1 = θc. If R < 1, draw u ~ Uniform(0, 1).
   ▪ If R ≥ u, set θk+1 = θc.
   ▪ If R < u, set θk+1 = θk.
Step 5: Set k = k + 1 and return to Step 2. Continue this process until you have a very large sample of θ values.
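Below is a compact R sketch of Steps 1-5 for the single-parameter binomial example used earlier (7 successes in 20 trials, uniform prior). This is not the Kruschke / Assignment 3 code; it is just a minimal illustration under those assumptions, with an arbitrary starting value, proposal SD, chain length, and burn-in.

```r
## Minimal M-H sampler for one binomial parameter (z = 7 successes in N = 20
## trials, uniform Beta(1, 1) prior). Not the Assignment 3 / Kruschke code.
set.seed(548)
z <- 7; N <- 20
post.unnorm <- function(theta) {
  if (theta <= 0 || theta >= 1) return(0)            # outside the parameter space
  dbinom(z, N, theta) * dbeta(theta, 1, 1)           # likelihood x prior
}

K <- 20000
chain <- numeric(K)
chain[1] <- 0.5                                      # Step 1: arbitrary starting value
for (k in 1:(K - 1)) {
  theta.c <- rnorm(1, mean = chain[k], sd = 0.2)     # Step 2: symmetric proposal
  R <- post.unnorm(theta.c) / post.unnorm(chain[k])  # Step 3: posterior odds
  if (R >= 1 || runif(1) <= R) {                     # Step 4: accept ...
    chain[k + 1] <- theta.c
  } else {                                           #         ... or reject
    chain[k + 1] <- chain[k]
  }
}                                                    # Step 5: repeat until K samples

mean(chain[-(1:1000)])   # posterior mean, after dropping 1000 burn-in samples
                         # (true posterior is Beta(8, 14); true mean = 8/22 = 0.364)
```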

Closer Look at Steps 3 & 4
Step 3: Compute the posterior odds R = P(θc | D) / P(θk | D), where θc = the "candidate" sample and θk = the previously accepted k-th sample.
Step 4: If R ≥ 1.0, set θk+1 = θc. If R < 1.0, draw a random u ~ Uniform(0, 1).
   ▪ If R ≥ u, set θk+1 = θc.
   ▪ If R < u, set θk+1 = θk.
If P(θc | D) ≥ P(θk | D), then R ≥ 1.0, so it is certain that θk+1 = θc.
If P(θc | D) < P(θk | D), then R is the probability that θk+1 = θc.
Conclusion: The MCMC chain tends to jump towards high-probability regions of the posterior, but it can also jump to low-probability regions.
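A quick numerical check of the Step 4 logic in R: when R < 1, the candidate should be accepted with probability exactly R. The value R = 0.3 here is arbitrary.

```r
## Step 4 in isolation: with posterior odds R = 0.3 (an arbitrary value < 1),
## the candidate should be accepted on about 30% of attempts.
set.seed(1)
R <- 0.3
accepted <- replicate(100000, { u <- runif(1); R >= 1 || R >= u })
mean(accepted)   # approximately 0.30
```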

(The M-H algorithm slide above was repeated here; the lecture then turned to the handout on R code for the Metropolis-Hastings algorithm.)

Wednesday, January 18, 2017: The Lecture Ended Here