Priors, Normal Models, Computing Posteriors


Priors, Normal Models, Computing Posteriors
st5219: Bayesian hierarchical modelling, lecture 2.4

An all-purpose sampling tool
Monte Carlo: requires knowing the distribution---often you don't
Importance sampling: requires being able to sample from something vaguely like the posterior---often you can't
Markov chain Monte Carlo: can almost always be used

Markov chains
The big idea: if you could set up a Markov chain that had a stationary distribution equal to the posterior, you could sample just by simulating the Markov chain and then using its trajectory as your sample from the posterior
A Markov chain is a discrete time stochastic process obeying the Markov property
Denote the distribution of θ(t+1) given θ(t) as π(θ(t+1) | θ(t))
We consider θ on some part of ℝ^d
θ(t) may have a stationary distribution, meaning that if θ(s) comes from the stationary distribution then so does θ(t) for s < t
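A minimal sketch (not from the slides) of the stationary-distribution idea: for a two-state Markov chain with transition matrix P, repeatedly applying P to any starting distribution converges to the stationary distribution, here (2/3, 1/3).

```r
# Two-state Markov chain with transition matrix P (rows sum to 1)
P <- matrix(c(0.9, 0.1,
              0.2, 0.8), nrow = 2, byrow = TRUE)
dist <- c(1, 0)                 # start with all mass on state 1
for (i in 1:100) dist <- dist %*% P
# The stationary distribution solves pi = pi %*% P; here pi = (2/3, 1/3)
round(dist, 4)
```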

Markov chain Monte Carlo
Obviously, if π(θ(t+1) | θ(t)) = f(θ | data) then you would just be sampling the posterior
When Monte Carlo is not possible, it's really easy to set up a Markov chain with this property using the Metropolis-Hastings algorithm
Metropolis et al (1953) J Chem Phys 21:1087--92
Hastings (1970) Biometrika 57:97--109

Metropolis-Hastings algorithm
Board work
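Since the algorithm itself is board work, here is a hedged sketch of random-walk Metropolis (the names `metropolis`, `logf`, and `bw` are mine, not the lecture's), targeting a standard normal known only up to a constant.

```r
# Unnormalised log-target: standard normal up to a constant
logf <- function(x) -x^2 / 2

# Random-walk Metropolis: propose from N(current, bw^2), accept with
# probability min(1, f(proposal)/f(current)) on the log scale
metropolis <- function(logf, x0, bw, ndraws) {
  x <- numeric(ndraws)
  cur <- x0
  for (i in 1:ndraws) {
    prop <- rnorm(1, cur, bw)                              # symmetric proposal
    if (log(runif(1)) < logf(prop) - logf(cur)) cur <- prop # accept
    x[i] <- cur                                             # else keep current
  }
  x
}

set.seed(1)
draws <- metropolis(logf, x0 = 0, bw = 1, ndraws = 50000)
c(mean(draws), sd(draws))   # should be close to 0 and 1
```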

1: constants
Suppose you can calculate f(θ | data) only up to a constant of proportionality, eg f(data | θ) f(θ)
No problem: the constant of proportionality cancels in the acceptance ratio
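An illustrative numerical check, using the 98/727 binomial data from the example later in this lecture: dividing the unnormalised posterior by its normalising constant leaves every acceptance ratio unchanged.

```r
# Unnormalised posterior f(data|theta) f(theta) for a binomial likelihood
# with a flat Beta(1,1) prior (illustrative, single-parameter version)
unnorm <- function(th) dbinom(98, 727, th) * dbeta(th, 1, 1)
const <- integrate(unnorm, 0, 1)$value   # normalising constant

th1 <- 0.10; th2 <- 0.15
ratio_unnorm <- unnorm(th2) / unnorm(th1)
ratio_norm   <- (unnorm(th2) / const) / (unnorm(th1) / const)
all.equal(ratio_unnorm, ratio_norm)   # the constant cancels
</imports>
```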

2: choice of q
You can choose almost whatever you want for the proposal q(θ → θ*). A common choice is N(θ, Σ), with Σ an arbitrary (co)variance
Note that for 1-D, you have q(θ → θ*) = q(θ* → θ), and so the qs cancel in the acceptance ratio
This cancelling of qs is common to all proposal distributions that are symmetric around the current value
Board work
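A quick numerical check of the symmetry claim (values 0.3, 0.7 and the bandwidth are arbitrary): a normal proposal density centred on the current value satisfies q(θ → θ*) = q(θ* → θ).

```r
# Symmetric normal proposal: the density of theta* given theta equals
# the density of theta given theta*, so the qs cancel
bw <- 0.1
q_forward  <- dnorm(0.7, mean = 0.3, sd = bw)   # q(0.3 -> 0.7)
q_backward <- dnorm(0.3, mean = 0.7, sd = bw)   # q(0.7 -> 0.3)
q_forward == q_backward
```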

3: special case Monte Carlo
If you can propose from the posterior itself, so that q(θ → θ*) = f(θ* | data), then α = 1 and you always accept proposals
So Monte Carlo is a special case of MCMC

4: choice of bandwidth
If using normal proposals, how should you choose the standard deviation, or bandwidth?
Aim for 20% to 40% of proposals accepted and you're close to optimal
Too small: very slow movement (tiny steps)
Too big: very slow movement (most proposals rejected)
Goldilocks: fast movement
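An illustrative experiment (not from the slides): acceptance rates of a random-walk Metropolis sampler on a standard normal target for a tiny, a moderate, and a huge bandwidth. The tiny bandwidth accepts nearly everything, the huge one almost nothing; only the moderate one gives a useful acceptance rate.

```r
# Acceptance rate of random-walk Metropolis on a N(0,1) target,
# as a function of the proposal bandwidth bw
accept_rate <- function(bw, ndraws = 20000) {
  cur <- 0; acc <- 0
  for (i in 1:ndraws) {
    prop <- rnorm(1, cur, bw)
    # log acceptance ratio for target exp(-x^2/2)
    if (log(runif(1)) < (cur^2 - prop^2) / 2) { cur <- prop; acc <- acc + 1 }
  }
  acc / ndraws
}

set.seed(2)
rates <- sapply(c(0.01, 2.5, 100), accept_rate)
rates   # near 1, moderate, near 0
```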

5: special case Gibbs sampler
If you know the conditional posterior of a parameter (or block of parameters) given the other parameters, you can just propose from that conditional
This gives α = 1
This is called Gibbs sampling---nothing special, just a good MCMC algorithm
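A hedged illustration (not the lecture's example): Gibbs sampling a bivariate normal with correlation rho, where each full conditional is a known univariate normal, so every draw is accepted.

```r
# Gibbs sampler for a standard bivariate normal with correlation rho:
#   x | y ~ N(rho * y, 1 - rho^2),   y | x ~ N(rho * x, 1 - rho^2)
rho <- 0.8; ndraws <- 20000
x <- y <- numeric(ndraws)
cx <- cy <- 0
set.seed(3)
for (i in 1:ndraws) {
  cx <- rnorm(1, rho * cy, sqrt(1 - rho^2))  # draw x from its conditional
  cy <- rnorm(1, rho * cx, sqrt(1 - rho^2))  # draw y from its conditional
  x[i] <- cx; y[i] <- cy
}
cor(x, y)   # should be close to rho
```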

6: burn in
It is common to start with an arbitrary θ(0)
To stop this biasing your estimates, usually discard samples from a "burn in" period
This lets the chain forget where you started it
If you start near the posterior, a short burn in is ok
If you start far from the posterior, a longer burn in is needed
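A sketch of discarding a burn-in period (`draws` and `BURNIN` are hypothetical names, and the stand-in draws are just random numbers, not real MCMC output).

```r
BURNIN <- 1000
draws <- rnorm(10000)           # stand-in for a chain's output
kept <- draws[-(1:BURNIN)]      # drop the first BURNIN iterations
length(kept)                    # 9000 samples remain for inference
```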

7: multiple chains
Running multiple chains in parallel lets you:
check convergence to the same distribution, even from initial values far from each other
utilise X multiple processors (eg on a server) to get a sample X times as big in the same amount of time
Make sure you start each chain with a different seed, though (eg not set.seed(666) all the time)
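A sketch of per-chain seeding (`run_chain` is a hypothetical stand-in for a real MCMC run): give each parallel chain its own seed, otherwise the chains are identical copies and the extra samples are worthless.

```r
# Stand-in for an MCMC run: the seed determines the whole trajectory
run_chain <- function(seed, n = 5) { set.seed(seed); rnorm(n) }

chains <- lapply(1:4, run_chain)   # four chains, seeds 1..4
# Different seeds give different trajectories; reusing one seed would not
identical(chains[[1]], chains[[2]])   # FALSE
identical(run_chain(666), run_chain(666))   # TRUE: same seed, same chain
```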

8: correlation of parameters
If 2+ parameters are tightly correlated, then sampling one at a time will not work efficiently
Several options:
reparametrise the model so that the posteriors are more orthogonal
use a multivariate proposal distribution that accounts for the correlation
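A hedged sketch of the second option (the proposal covariance `Sigma` is an assumed, illustrative value): build a joint normal proposal from the Cholesky factor of Sigma, so both parameters move together along the correlation direction.

```r
# Assumed proposal covariance with strong positive correlation
Sigma <- matrix(c(0.010, 0.008,
                  0.008, 0.010), 2, 2)
L <- chol(Sigma)                 # upper-triangular factor: t(L) %*% L == Sigma

current <- c(0.5, 0.5)
set.seed(5)
# One joint proposal: current + z %*% L has covariance Sigma around current
proposal <- current + as.vector(rnorm(2) %*% L)
proposal
```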

9: assessing convergence
The cowboy approach is to look only at a trace plot of the chain (Butcher's test)
More formal methods exist (see tutorial 2)
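Beyond eyeballing trace plots, a crude numeric check (illustrative only, not one of the tutorial's formal diagnostics): compare the means of the first and second halves of a chain; a large standardised discrepancy suggests the chain has not converged.

```r
set.seed(4)
draws <- rnorm(10000)            # stand-in for a (converged) chain's output
half <- length(draws) / 2
first <- draws[1:half]
second <- draws[-(1:half)]
# Two-sample z-like statistic comparing the half-chain means
z <- (mean(first) - mean(second)) /
  sqrt(var(first) / half + var(second) / half)
abs(z)   # small values are consistent with convergence
```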

An example: H1N1 again
As before, but now with a log-posterior function and a helper that rejects proposals outside (0,1):

logposterior=function(cu){
  cu$logp=dbinom(98,727,cu$p*cu$sigma,log=TRUE)+
    dbeta(cu$p,1,1,log=TRUE)+
    dbeta(cu$sigma,630,136,log=TRUE)
  cu
}

rejecter=function(cu){
  reject=FALSE
  if(cu$p<0)reject=TRUE
  if(cu$p>1)reject=TRUE
  if(cu$sigma<0)reject=TRUE
  if(cu$sigma>1)reject=TRUE
  reject
}

An example: H1N1 again

current=list(p=0.5,sigma=0.5)
current=logposterior(current)
NDRAWS=10000
dump=list(p=rep(0,NDRAWS),sigma=rep(0,NDRAWS))
for(iteration in 1:NDRAWS)
{
  old=current
  current$p=rnorm(1,current$p,0.1)
  current$sigma=rnorm(1,current$sigma,0.1)
  REJECT=rejecter(current)
  if(!REJECT)
  {
    current=logposterior(current)
    accept_prob=current$logp-old$logp
    lu=log(runif(1))
    if(lu>accept_prob)REJECT=TRUE
  }
  if(REJECT)current=old
  dump$p[iteration]=current$p
  dump$sigma[iteration]=current$sigma
}

Using that routine


Bandwidths
The choice of bandwidth is arbitrary
Asymptotically, it doesn't matter
But in practice, you need to choose it well...

Using bandwidths = 1


Using bandwidths = 0.01


Using bandwidths = 0.001


Another example
Same dataset, but now with non-informative priors:
p ~ U(0,1)
σ ~ U(0,1)

Using bandwidths = 0.001

Using bandwidths = 0.01

Using bandwidths = 0.1

Using bandwidths = 1

Using bandwidths = 0.1

Example 2
Why does this not work?
Tightly correlated posterior, plus a weird shape
It is very hard to design a local movement rule that encourages swift mixing through the joint posterior distribution

Summary
Monte Carlo: use whenever you can, but you rarely are able to
Importance sampling: works if you can find a distribution quite close to the posterior
MCMC: a good general purpose tool, though sometimes an art to get working effectively

Next week: everything you already know how to do, differently
Versions of: t-tests, regression, etc
After that: hierarchical modelling, BUGS