Introduction to Markov chain Monte Carlo (MCMC) methods


Lecture 3: Introduction to Markov chain Monte Carlo (MCMC) methods

Lecture Contents: how WinBUGS fits a regression; Gibbs sampling; convergence and burn-in; how many iterations; a logistic regression example.

Linear regression Let us revisit our simple linear regression model. To this model we added priors in WinBUGS for the intercept, the slope and the error precision. Ideally we would sample from the joint posterior distribution of all three parameters.
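
The model and prior equations shown on the original slide are not reproduced in this transcript. A typical formulation is given below; the specific prior variances and Gamma hyperparameters are assumptions for illustration, not necessarily the values used in the lecture.

```latex
\begin{align*}
y_i &= \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i \sim \mathrm{N}(0, \sigma^2), \quad i = 1, \dots, n,\\
\beta_0 &\sim \mathrm{N}(0, 10^6), \qquad \beta_1 \sim \mathrm{N}(0, 10^6), \qquad 1/\sigma^2 \sim \mathrm{Gamma}(0.001, 0.001).
\end{align*}
```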

Linear Regression ctd. In this case we can sample from the joint posterior as described in the last lecture. However, this is not the case for all models, so we will now describe other simulation-based methods that can be used. These methods come from the family of Markov chain Monte Carlo (MCMC) methods, and here we will focus on a method called Gibbs sampling.

MCMC Methods Goal: to sample from the joint posterior distribution. Problem: for complex models this involves multidimensional integration. Solution: it may be possible to sample from the conditional posterior distributions instead. It can be shown that, after convergence, such a sampling approach generates dependent samples from the joint posterior distribution.

Gibbs Sampling When we can sample directly from the conditional posterior distributions, the algorithm is known as Gibbs sampling. For the linear regression example it proceeds as follows. Firstly, give all unknown parameters starting values. Next, loop through the following steps:

Gibbs Sampling ctd. Sample from the full conditional distribution of each parameter in turn (the three steps below). These steps are then repeated, with the generated values from each loop replacing the starting values. The chain of values produced by this procedure is known as a Markov chain, and it is hoped that this chain converges to its equilibrium distribution, which is the joint posterior distribution.

Calculating the conditional distributions In order for the algorithm to work we need to sample from the conditional posterior distributions. If these distributions have standard forms then it is easy to draw random samples from them. Mathematically, we write down the full posterior and treat all parameters as constants apart from the parameter of interest. We then try to match the resulting formula to a standard distribution.

Matching distributional forms If a parameter θ follows a Normal(μ, σ²) distribution then we can write its density up to a constant of proportionality, and similarly if θ follows a Gamma(α, β) distribution (see the kernels below).
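
The slide's equations are not reproduced here; the standard kernels we try to match against are:

```latex
\theta \sim \mathrm{N}(\mu, \sigma^2):\quad p(\theta) \propto \exp\!\left(-\frac{(\theta - \mu)^2}{2\sigma^2}\right),
\qquad
\theta \sim \mathrm{Gamma}(\alpha, \beta):\quad p(\theta) \propto \theta^{\alpha - 1} \exp(-\beta\theta).
```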

Step 1: β0

Step 2: β1

Step 3: 1/σ²
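
The conditional distributions shown on these three step slides are not reproduced in the transcript. Writing the (assumed) prior variances of β0 and β1 as V0 and V1, and the (assumed) Gamma prior on τ = 1/σ² as Gamma(a, b), the standard full conditionals for the simple linear regression are:

```latex
\begin{align*}
\beta_0 \mid \beta_1, \tau, y &\sim \mathrm{N}\!\left(\frac{\tau \sum_i (y_i - \beta_1 x_i)}{n\tau + 1/V_0},\; \frac{1}{n\tau + 1/V_0}\right),\\[4pt]
\beta_1 \mid \beta_0, \tau, y &\sim \mathrm{N}\!\left(\frac{\tau \sum_i x_i (y_i - \beta_0)}{\tau \sum_i x_i^2 + 1/V_1},\; \frac{1}{\tau \sum_i x_i^2 + 1/V_1}\right),\\[4pt]
\tau \mid \beta_0, \beta_1, y &\sim \mathrm{Gamma}\!\left(a + \frac{n}{2},\; b + \frac{1}{2}\sum_i (y_i - \beta_0 - \beta_1 x_i)^2\right).
\end{align*}
```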

Algorithm Summary Repeat the following three steps: 1. Generate β0 from its Normal conditional distribution. 2. Generate β1 from its Normal conditional distribution. 3. Generate 1/σ² from its Gamma conditional distribution.
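
As an illustration only (the lecture uses WinBUGS; the data, default prior values and function names below are assumptions), a minimal Gibbs sampler for this model in Python might look like:

```python
import numpy as np

def gibbs_linear_regression(x, y, n_iter=5000, V0=1e6, V1=1e6, a=0.001, b=0.001, seed=0):
    """Gibbs sampler for y_i = b0 + b1*x_i + e_i, e_i ~ N(0, 1/tau),
    with priors b0 ~ N(0, V0), b1 ~ N(0, V1), tau ~ Gamma(a, b)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    b0, b1, tau = 0.0, 0.0, 1.0          # arbitrary starting values
    samples = np.empty((n_iter, 3))
    for it in range(n_iter):
        # Step 1: b0 from its Normal full conditional
        prec0 = n * tau + 1.0 / V0
        mean0 = tau * np.sum(y - b1 * x) / prec0
        b0 = rng.normal(mean0, np.sqrt(1.0 / prec0))
        # Step 2: b1 from its Normal full conditional
        prec1 = tau * np.sum(x**2) + 1.0 / V1
        mean1 = tau * np.sum(x * (y - b0)) / prec1
        b1 = rng.normal(mean1, np.sqrt(1.0 / prec1))
        # Step 3: tau = 1/sigma^2 from its Gamma full conditional
        resid = y - b0 - b1 * x
        tau = rng.gamma(a + n / 2.0, 1.0 / (b + 0.5 * np.sum(resid**2)))
        samples[it] = (b0, b1, tau)
    return samples
```

Discarding an initial burn-in portion of `samples` and summarising the remainder mirrors what WinBUGS does when it fits the regression.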

Convergence and burn-in Two questions that immediately spring to mind are: (1) we start from arbitrary starting values, so when can we safely say that our samples are from the correct distribution? (2) After this point, how long should we run the chain for, and how many values should we store?

Checking Convergence This is the researcher's responsibility! Convergence is to a target distribution (the required posterior), not to a single value as in ML methods. Once convergence has been reached, samples should look like a random scatter about a stable mean value.

Convergence In the trace plot shown on the slide, convergence occurs at around 100 iterations.

Checking convergence 2 One approach (in WinBUGS) is to run several long chains with widely differing starting values. WinBUGS also has the Brooks-Gelman-Rubin (BGR) diagnostic, which is based on the ratio of between-chain to within-chain variability (as in ANOVA); this diagnostic should approach 1.0 as the chains converge. MLwiN has other diagnostics that we will cover on Wednesday.

Demo of multiple chains in WinBUGS Here we move to the computer for a demonstration of multiple chains using the regression example (also mentioning node info).

Demo of multiple chains in WinBUGS In the BGR plot, the average width of the 80% intervals within chains (blue) and the width of the pooled 80% interval across chains (green) should converge to stable values; the ratio of pooled to average interval width (red) should converge to 1.
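
A rough sketch of these interval-based quantities, assuming the chains for one parameter are stored as a (number of chains × iterations) array, might be as follows; the actual BGR diagnostic in WinBUGS plots these quantities over increasing run lengths rather than once.

```python
import numpy as np

def bgr_interval_ratio(chains):
    """chains: array of shape (n_chains, n_iter) for one parameter.
    Returns (average within-chain 80% interval width,
             pooled 80% interval width, their ratio)."""
    within = np.mean([np.diff(np.percentile(c, [10, 90])) for c in chains])
    pooled = np.diff(np.percentile(chains.ravel(), [10, 90]))[0]
    return within, pooled, pooled / within
```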

Convergence in more complex models Convergence in linear regression is (almost) instantaneous. The slide shows an example of slower convergence.

How many iterations after convergence? After convergence, further iterations are needed to obtain samples for posterior inference. More iterations give more accurate posterior estimates. MCMC chains are dependent samples, so the autocorrelation in the chain influences how many iterations we need. The accuracy of the posterior estimates can be assessed by the Monte Carlo standard error (MCSE) for each parameter; methods for calculating the MCSE are given in later lectures.
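
One common way to estimate the MCSE of a posterior mean is the batch-means method; a minimal sketch is given below (the method covered in the later lectures may differ).

```python
import numpy as np

def mcse_batch_means(chain, n_batches=30):
    """Monte Carlo standard error of the posterior mean via batch means."""
    chain = np.asarray(chain)
    batch_size = len(chain) // n_batches
    batch_means = [chain[i * batch_size:(i + 1) * batch_size].mean()
                   for i in range(n_batches)]
    return np.std(batch_means, ddof=1) / np.sqrt(n_batches)
```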

Inference using posterior samples from MCMC runs A powerful feature of MCMC and the Bayesian approach is that all inference is based on the joint posterior distribution. We can therefore address a wide range of substantive questions through appropriate summaries of the posterior. Typically we report the mean or median of the posterior samples for each parameter of interest as a point estimate; the 2.5% and 97.5% percentiles of the posterior sample for each parameter give a 95% posterior credible interval (the interval within which the parameter lies with probability 0.95).
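
For example, given a stored chain for one parameter (after discarding the burn-in), these summaries are just sample statistics:

```python
import numpy as np

def posterior_summary(chain):
    """Point estimates and 95% credible interval from posterior samples."""
    chain = np.asarray(chain)
    return {"mean": chain.mean(),
            "median": np.median(chain),
            "95% CrI": tuple(np.percentile(chain, [2.5, 97.5]))}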

Derived Quantities Once we have a sample from the posterior we can answer lots of questions simply by investigating this sample. Examples: What is the probability that θ>0? What is the probability that θ1> θ2? What is a 95% interval for θ1/(θ1+ θ2)? See later for examples of these sorts of derived quantities.
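
Each of these questions is answered by transforming the stored samples; the variable names below are illustrative only.

```python
import numpy as np

def derived_quantities(theta1, theta2):
    """theta1, theta2: arrays of posterior samples for two parameters."""
    prob_positive = np.mean(theta1 > 0)              # P(theta1 > 0)
    prob_greater = np.mean(theta1 > theta2)          # P(theta1 > theta2)
    ratio = theta1 / (theta1 + theta2)
    interval = np.percentile(ratio, [2.5, 97.5])     # 95% interval for the ratio
    return prob_positive, prob_greater, interval
```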

Logistic regression example In the practical that follows we will look at a dataset of rat tumours and fit a logistic regression model to it. For each dose level the data record the number of rats tested and the number with tumours (the table of values is shown on the slide).

Logistic regression model A standard Bayesian logistic regression model for these data can be written as follows. WinBUGS can fit this model, but can we write out the conditional posterior distributions and use Gibbs sampling?
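
The model equations are not reproduced in the transcript; a standard formulation for such dose-response data is given below, where the vague Normal priors are an assumption rather than the lecture's exact choice.

```latex
\begin{align*}
y_j &\sim \mathrm{Binomial}(n_j, p_j), \qquad \mathrm{logit}(p_j) = \beta_0 + \beta_1 \,\mathrm{dose}_j,\\
\beta_0 &\sim \mathrm{N}(0, 10^6), \qquad \beta_1 \sim \mathrm{N}(0, 10^6).
\end{align*}
```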

Conditional distribution for β0 This distribution does not have a standard form, so we cannot simply simulate from it with a standard random number generator. However, both WinBUGS and MLwiN can fit this model using MCMC; we will not see how until day 5.
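
To see why, write the conditional posterior of β0 up to proportionality under the model sketched above; the Binomial likelihood combined with the Normal prior does not match any standard distributional family:

```latex
p(\beta_0 \mid \beta_1, y) \;\propto\; \exp\!\left(-\frac{\beta_0^2}{2 \cdot 10^6}\right)\prod_j p_j^{\,y_j} (1 - p_j)^{\,n_j - y_j},
\qquad p_j = \frac{\exp(\beta_0 + \beta_1\,\mathrm{dose}_j)}{1 + \exp(\beta_0 + \beta_1\,\mathrm{dose}_j)}.
```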

Hints for the next practical In the next practical you will be creating WinBUGS code for a logistic regression model. You are given less help in this practical, so I would suggest looking at the Seeds example among the WinBUGS examples. The Seeds example is more complicated than what you require, but it shows the necessary WinBUGS statements.