Priors, Normal Models, Computing Posteriors


1 Priors, Normal Models, Computing Posteriors
st5219: Bayesian hierarchical modelling lecture 2.4

2 An all-purpose sampling tool
Monte Carlo: requires knowing the distribution---often you don't
Importance sampling: requires being able to sample from something vaguely like the posterior---often you can't
Markov chain Monte Carlo (MCMC): can almost always be used

3 Markov chains
The big idea: if you could set up a Markov chain whose stationary distribution equals the posterior, you could sample just by simulating the chain and then using its trajectory as your sample from the posterior
A Markov chain is a discrete-time stochastic process obeying the Markov property
Denote the distribution of θ_{t+1} given θ_t as π(θ_{t+1} | θ_t)
We consider θ on some part of ℝ^d
The chain may have a stationary distribution, meaning that if θ_s comes from the stationary distribution then so does θ_t for s < t

4 Markov chain Monte Carlo
Obviously if π(t+1 | t) = f( |data) then you would just be sampling the posterior When Monte Carlo is not possible, it’s really easy to set up a Markov chain with this property using the Metropolis-Hastings algorithm Metropolis et al (1953) J Chem Phys 21: Hastings (1970) Biometrika 57:

5 Metropolis-Hastings algorithm
Board work
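The board work itself is not transcribed, so here is a minimal sketch of a random-walk Metropolis update for a generic one-dimensional target, written in R to match the later slides; the function name metropolis, the argument names and the example target are illustrative assumptions, not the lecturer's code.

# Minimal random-walk Metropolis sketch (illustrative)
# log_post: any function returning the log posterior up to an additive constant
# theta0: starting value; bw: proposal standard deviation; ndraws: chain length
metropolis = function(log_post, theta0, bw, ndraws) {
  draws = numeric(ndraws)
  theta = theta0
  lp = log_post(theta)
  for (i in 1:ndraws) {
    proposal = rnorm(1, theta, bw)         # symmetric normal proposal around the current value
    lp_prop = log_post(proposal)
    if (log(runif(1)) < lp_prop - lp) {    # accept with probability min(1, posterior ratio)
      theta = proposal
      lp = lp_prop
    }
    draws[i] = theta                       # a rejection repeats the current value
  }
  draws
}
# Example: sample a standard normal target
# draws = metropolis(function(x) dnorm(x, log = TRUE), theta0 = 0, bw = 1, ndraws = 10000)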

6 1: constants
Suppose you can calculate f(θ | data) only up to a constant of proportionality, e.g. as f(data | θ) f(θ)
No problem: the constant of proportionality cancels in the acceptance ratio
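For reference (the algorithm itself was board work), the standard Metropolis-Hastings acceptance probability, written in the notation of these slides, is
α = min{ 1, [ f(data | θ*) f(θ*) q(θ* → θ) ] / [ f(data | θ) f(θ) q(θ → θ*) ] }
Because f(θ | data) = f(data | θ) f(θ) / f(data), the unknown constant f(data) would appear in both the numerator and the denominator, so it cancels and never needs to be computed.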

7 2: choice of q
You can choose almost whatever you want for the proposal q(θ → θ*). A common choice is N(θ, Σ), with Σ an arbitrary (co)variance
Note that for 1-D you have q(θ → θ*) = (2πσ²)^(-1/2) exp(-(θ* - θ)² / (2σ²)) = q(θ* → θ), and so the qs cancel
This cancelling of the qs is common to all proposal distributions that are symmetric around the current value
Board work
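A quick numerical illustration of that symmetry (the particular values of θ, θ* and the bandwidth here are arbitrary):

theta = 0.3; theta_star = 0.7; bw = 0.1
dnorm(theta_star, mean = theta, sd = bw)   # q(theta -> theta*)
dnorm(theta, mean = theta_star, sd = bw)   # q(theta* -> theta): the same value, so the qs cancel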

8 3: special case Monte Carlo
If you can propose from the posterior itself, so that q(θ → θ*) = f(θ* | data), then α = 1 and you always accept proposals
So Monte Carlo is a special case of MCMC

9 4: choice of bandwidth
If using normal proposals, how should you choose the standard deviation, or bandwidth?
Aim for 20% to 40% of proposals accepted and you are close to optimal
Too small: very slow movement (almost every proposal is accepted, but each step is tiny)
Too big: very slow movement (almost every proposal is rejected, so the chain rarely moves)
Goldilocks: fast movement
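As a rough check on the bandwidth, the empirical acceptance rate can be read off a stored chain; the snippet below assumes a vector of draws like the one returned by the hypothetical metropolis() sketch earlier, where a rejected proposal simply repeats the previous value:

acc_rate = mean(diff(draws) != 0)   # fraction of iterations on which the chain moved
acc_rate                            # aim for roughly 0.2 to 0.4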

10 5: special case Gibbs sampler
If you know the conditional posterior of a parameter (or block of parameters) given the other parameters, you can just propose from that full conditional
This gives α = 1
This is called Gibbs sampling---nothing special, just a good MCMC algorithm
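A toy illustration (not the lecture's example): Gibbs sampling a bivariate normal with correlation rho, where both full conditionals are known normal distributions, so every draw is accepted:

rho = 0.8; ndraws = 10000
x = numeric(ndraws); y = numeric(ndraws)              # both components start at 0
for (i in 2:ndraws) {
  x[i] = rnorm(1, rho * y[i - 1], sqrt(1 - rho^2))    # x | y ~ N(rho * y, 1 - rho^2)
  y[i] = rnorm(1, rho * x[i],     sqrt(1 - rho^2))    # y | x ~ N(rho * x, 1 - rho^2)
}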

11 6: burn-in
It is common to start with an arbitrary θ_0
To stop this biasing your estimates, you usually discard samples from a "burn-in" period
This lets the chain forget where you started it
If you start near the posterior, a short burn-in is fine; if you start far from the posterior, a longer burn-in is needed
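In code, discarding burn-in is just dropping the first block of iterations; here draws is the hypothetical chain from the earlier sketch and the burn-in length of 1000 is an arbitrary choice:

burnin = 1000
kept = draws[-(1:burnin)]   # drop the burn-in iterations, keep the rest for inference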

12

13 7: multiple chains
Running multiple chains in parallel allows:
you to check convergence to the same distribution even from initial values far from each other
you to utilise X multiple processors (e.g. on a server) to get a sample X times as big in the same amount of time
Make sure you start with different seeds though (e.g. not set.seed(666) every time); see the sketch below
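A minimal sketch of that, reusing the hypothetical metropolis() function from the earlier insert with an illustrative standard-normal target; the dispersed starting values and seeds are arbitrary:

log_post = function(x) dnorm(x, log = TRUE)   # illustrative target
starts = c(-10, -1, 1, 10)                    # initial values far from each other
chains = lapply(1:4, function(k) {
  set.seed(1000 + k)                          # a different seed for each chain
  metropolis(log_post, theta0 = starts[k], bw = 1, ndraws = 10000)
})

For genuinely parallel execution you would hand each chain to a separate worker, e.g. with parallel::mclapply in place of lapply.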

14 8: correlation of parameters
If two or more parameters are tightly correlated, then sampling one at a time will not work efficiently
Several options:
reparametrise the model so that the posteriors are more orthogonal
use a multivariate proposal distribution that accounts for the correlation (see the sketch below)
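A hedged sketch of the second option: propose both parameters jointly from a multivariate normal whose covariance mimics the posterior correlation; the covariance values here are made up, and MASS::mvrnorm is just one convenient way to draw the proposal:

library(MASS)                                         # for mvrnorm
theta = c(0.5, 0.5)                                   # current values of the two correlated parameters
Sigma = matrix(c(0.01, 0.008, 0.008, 0.01), 2, 2)     # hypothetical proposal covariance
theta_star = mvrnorm(1, mu = theta, Sigma = Sigma)    # joint proposal for both parameters
# A multivariate normal centred at the current value is still symmetric,
# so the acceptance step is unchanged; the parameters just move together.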

15 9: assessing convergence
The cowboy approach is to look only at a trace plot of the chain (the "Butcher's test")
More formal methods exist (see tutorial 2)
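For the H1N1 sampler on the next slides, a trace plot is just the stored draws plotted against iteration number; more formal diagnostics (e.g. the Gelman-Rubin statistic in the coda package) compare several chains (see tutorial 2):

plot(dump$p, type = "l", xlab = "iteration", ylab = "p")           # trace plot of p
plot(dump$sigma, type = "l", xlab = "iteration", ylab = "sigma")   # trace plot of sigma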

16 An example: H1N1 again
As before:
logposterior = function(cu) {
  cu$logp = dbinom(98, 727, cu$p * cu$sigma, log = TRUE) +   # binomial likelihood
            dbeta(cu$p, 1, 1, log = TRUE) +                  # prior on p
            dbeta(cu$sigma, 630, 136, log = TRUE)            # prior on sigma
  cu
}
New:
rejecter = function(cu) {
  reject = FALSE
  if (cu$p < 0) reject = TRUE
  if (cu$p > 1) reject = TRUE
  if (cu$sigma < 0) reject = TRUE
  if (cu$sigma > 1) reject = TRUE
  reject                                                     # TRUE if either parameter is outside [0, 1]
}

17 An example: H1N1 again
current = list(p = 0.5, sigma = 0.5)                        # starting values
current = logposterior(current)                             # log posterior at the starting values
NDRAWS = 10000
dump = list(p = rep(0, NDRAWS), sigma = rep(0, NDRAWS))     # storage for the draws
for (iteration in 1:NDRAWS) {
  old = current                                             # keep the current state in case of rejection
  current$p = rnorm(1, current$p, 0.1)                      # normal random-walk proposals, bandwidth 0.1
  current$sigma = rnorm(1, current$sigma, 0.1)
  REJECT = rejecter(current)                                # reject immediately if out of range
  if (!REJECT)                                              # (continued on the next slide)

18 An example: H1N1 again
  if (!REJECT) {
    current = logposterior(current)                         # log posterior at the proposal
    accept_prob = current$logp - old$logp                   # log acceptance probability
    lu = log(runif(1))
    if (lu > accept_prob) REJECT = TRUE                     # Metropolis accept/reject step
  }
  if (REJECT) current = old                                 # on rejection, stay at the old state
  dump$p[iteration] = current$p                             # store this iteration's values
  dump$sigma[iteration] = current$sigma
}                                                           # end of the loop over iterations

19 Using that routine

20 Using that routine

21 Using that routine

22 Bandwidths
The choice of bandwidth is arbitrary
Asymptotically, it doesn't matter
But in practice, you need to choose it well...

23 Using bandwidths = 1

24 Using bandwidths = 1

25 Using bandwidths = 1

26 Using bandwidths = 0.01

27 Using bandwidths = 0.01

28 Using bandwidths = 0.01

29 Using bandwidths = 0.001

30 Using bandwidths = 0.001

31 Using bandwidths = 0.001

32 Another example
Same dataset, but now with non-informative priors:
p ~ U(0,1)
σ ~ U(0,1)
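Presumably only the prior terms of the log posterior change; since U(0,1) is the same as Beta(1,1), a sketch of the modified function, assuming the same structure as the code on slide 16, would be:

logposterior = function(cu) {
  cu$logp = dbinom(98, 727, cu$p * cu$sigma, log = TRUE) +
            dbeta(cu$p, 1, 1, log = TRUE) +        # p ~ U(0,1), as before
            dbeta(cu$sigma, 1, 1, log = TRUE)      # sigma ~ U(0,1), replacing the Beta(630, 136) prior
  cu
}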

33 Using bandwidths = 0.001

34 Using bandwidths = 0.01

35 Using bandwidths = 0.1

36 Using bandwidths = 1

37 Using bandwidths = 0.1

38 Example 2
Why does this not work? The posterior is tightly correlated
Plus it has a weird shape
It is very hard to design a local movement rule that encourages swift mixing through the joint posterior distribution

39 Summary
Importance sampling: use if you can find a distribution quite close to the posterior
Monte Carlo: use whenever you can, but you rarely are able to
MCMC: a good general-purpose tool, though sometimes an art to get working effectively

40 Next week: everything you already know how to do, differently
Versions of: t-tests, regression, etc.
After that: hierarchical modelling, BUGS

