Advanced Statistical Computing Fall 2016 Steve Qin

Markov Chain Monte Carlo The goal is to generate a sequence of random samples from an arbitrary probability density function (usually high dimensional). The sequence of random samples forms a Markov chain. The purpose is simulation (Monte Carlo).

Motivation Generating iid random variables from arbitrary high-dimensional distributions is extremely difficult. So drop the "independent" requirement. How about also dropping the "identically distributed" requirement?

Markov chain Assume a finite-state, discrete-time Markov chain with N different states. Random process Xn, n = 0, 1, 2, … Markov property: P(Xn+1 = j | X0 = x0, …, Xn = i) = P(Xn+1 = j | Xn = i). Time-homogeneous: P(Xn+1 = j | Xn = i) does not depend on n. Order: in a chain of order m, the future state depends on the past m states; the Markov property above corresponds to order 1.

Key parameters Transition matrix P, with entries P(i, j) = P(Xn+1 = j | Xn = i). Initial probability distribution π(0). Stationary distribution π (invariant/equilibrium), satisfying π = πP.
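
For a small chain, the stationary distribution can be computed numerically by iterating π(n+1) = π(n)P until it stops changing. A minimal sketch in Python; the transition matrix here is an arbitrary example, not one from the slides:

    import numpy as np

    # Arbitrary 3-state transition matrix (each row sums to 1).
    P = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.6, 0.3],
                  [0.2, 0.3, 0.5]])

    pi = np.array([1.0, 0.0, 0.0])   # initial distribution pi(0)
    for _ in range(1000):            # power iteration: pi(n+1) = pi(n) P
        pi = pi @ P

    print(pi)                        # approximately satisfies pi = pi P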

Reducibility A state j is accessible from state i (written i → j) if P(Xn = j | X0 = i) > 0 for some n ≥ 0. A Markov chain is irreducible if it is possible to get to any state from any state.

Recurrence A state i is transient if, given that we start in state i, there is a non-zero probability that we will never return to i. State i is recurrent if it is not transient.

Ergodicity A state i is ergodic if it is aperiodic and positive recurrent. If all states in an irreducible Markov chain are ergodic, the chain is ergodic.

Reversible Markov chains Consider an ergodic Markov chain that converges to an invariant distribution π. A Markov chain is reversible if for all x, y ∈ S, π(x)P(x, y) = π(y)P(y, x), which is known as the detailed balance equation. An ergodic chain in equilibrium and satisfying the detailed balance condition has π as its unique stationary distribution.
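
Detailed balance implies stationarity: summing both sides of π(x)P(x, y) = π(y)P(y, x) over x gives

    \sum_x \pi(x) P(x, y) = \pi(y) \sum_x P(y, x) = \pi(y),

which is exactly the stationarity condition π = πP.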

Markov Chain Monte Carlo The goal is to generate a sequence of random samples from an arbitrary probability density function (usually high dimensional). The sequence of random samples forms a Markov chain. In Markov chain theory, we start from a transition kernel P and derive its stationary distribution π (P → π); in MCMC, we start from the target distribution π and construct a transition kernel P whose stationary distribution is π (π → P).

Examples

Bayesian Inference [figure: example linking genotype data, haplotype frequencies, and a prior]

History Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. and Teller, E. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21, 1087–1092.

Metropolis algorithm Direct sampling from the target distribution is difficult, so generate candidate draws from a proposal distribution. These draws are then "corrected" so that, asymptotically, they can be viewed as random samples from the desired target distribution.

Pseudo code Initialize X0. Repeat for i = 1, 2, …: sample Y ~ q(Xi−1, ·); sample U ~ Uniform(0, 1); if U ≤ α(Xi−1, Y), set Xi = Y; otherwise set Xi = Xi−1.
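
A minimal Python sketch of this loop. The target here is a standard normal known only up to its normalizing constant, and the proposal is a symmetric Gaussian random walk; both choices are illustrative, not from the slides:

    import numpy as np

    rng = np.random.default_rng(0)

    def target_pdf(x):
        # Unnormalized target density pi(x); standard normal for illustration.
        return np.exp(-0.5 * x**2)

    def metropolis(n_samples, x0=0.0, step=1.0):
        x = x0
        samples = np.empty(n_samples)
        for i in range(n_samples):
            y = x + step * rng.standard_normal()       # Y ~ q(x, .)
            alpha = min(1.0, target_pdf(y) / target_pdf(x))
            if rng.uniform() <= alpha:                 # U <= alpha(x, Y): accept
                x = y
            samples[i] = x                             # otherwise keep Xi = Xi-1
        return samples

    samples = metropolis(10_000)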

An informal derivation Find α(x, y). The joint density of the current Markov chain state and the proposal is g(x, y) = q(x, y)π(x). If q itself satisfied detailed balance, q(x, y)π(x) = q(y, x)π(y), no correction would be needed. Suppose instead that q(x, y)π(x) > q(y, x)π(y): moves from x to y are proposed too often, and moves from y to x too rarely. To compensate, introduce an acceptance probability α(x, y) < 1 for the x-to-y move, set α(y, x) = 1, and require q(x, y)π(x)α(x, y) = q(y, x)π(y), hence α(x, y) = q(y, x)π(y) / (q(x, y)π(x)). Covering both cases, the probability of acceptance is α(x, y) = min{1, π(y)q(y, x) / (π(x)q(x, y))}.

Metropolis-Hastings Algorithm Same as the Metropolis algorithm, but the proposal q need not be symmetric: a draw Y ~ q(x, ·) is accepted with probability α(x, y) = min{1, π(y)q(y, x) / (π(x)q(x, y))}.

Remarks The algorithm relies only on evaluation of the target pdf up to a normalizing constant.

Remarks How to choose a good proposal function is crucial, and some tuning is usually needed. Rule of thumb: aim for roughly a 30% acceptance rate. Convergence is slow in high-dimensional cases.

Illustration of Metropolis-Hastings Suppose we try to sample from a bivariate normal distribution. Start from (0, 0). The proposed move at each step is a two-dimensional random walk.

Illustration of Metropolis-Hastings At each step, calculate α(x, y) = min{1, π(y)/π(x)}, since the random-walk proposal is symmetric, q(x, y) = q(y, x), so the proposal densities cancel.
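
A sketch of this example in Python. The slides do not state the covariance, so a unit-variance target with correlation ρ = 0.8 is assumed here; s is the random-walk step size referred to on the convergence slides below:

    import numpy as np

    rng = np.random.default_rng(1)
    rho = 0.8                                        # assumed correlation
    cov_inv = np.linalg.inv(np.array([[1.0, rho],
                                      [rho, 1.0]]))

    def log_target(x):
        # Log of the unnormalized bivariate normal density.
        return -0.5 * x @ cov_inv @ x

    def rw_metropolis(n_samples, s=1.0):
        x = np.zeros(2)                              # start from (0, 0)
        chain = np.empty((n_samples, 2))
        for i in range(n_samples):
            y = x + s * rng.standard_normal(2)       # symmetric random-walk move
            # alpha = min(1, pi(y)/pi(x)), compared on the log scale for stability
            if np.log(rng.uniform()) <= log_target(y) - log_target(x):
                x = y
            chain[i] = x
        return chain

    chain = rw_metropolis(5000, s=1.0)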

Convergence Trace plot [figure: example of a good (well-mixing) trace]

Convergence Trace plot [figure: example of a bad trace]

Convergence Trace plot [figure]

Convergence Autocorrelation plot [figure: good]

Convergence Autocorrelation plot, proposal scale s = 0.5 [figure: bad]

Convergence Autocorrelation plot, proposal scale s = 3.0 [figure: okay]
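
Both diagnostics are straightforward to compute from a chain. A self-contained sketch, using a 1-D random-walk Metropolis chain with a standard normal target and a small step size so the autocorrelation is visible:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)

    def metropolis_1d(n, s):
        # Random-walk Metropolis for a standard normal target.
        x, out = 0.0, np.empty(n)
        for i in range(n):
            y = x + s * rng.standard_normal()
            if np.log(rng.uniform()) <= 0.5 * (x**2 - y**2):
                x = y
            out[i] = x
        return out

    def autocorr(x, max_lag=50):
        # Sample autocorrelation of a 1-D chain up to max_lag.
        x = x - x.mean()
        acf = np.correlate(x, x, mode="full")[len(x) - 1:]
        return acf[:max_lag + 1] / acf[0]

    chain = metropolis_1d(5000, s=0.5)       # small steps: slow mixing expected
    plt.plot(chain)                          # trace plot
    plt.show()
    plt.bar(range(51), autocorr(chain))      # autocorrelation plot
    plt.show()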

References Metropolis et al. (1953); Hastings (1970); tutorial paper: Chib and Greenberg (1995). Understanding the Metropolis-Hastings algorithm. The American Statistician, 49, 327-335.

Gibbs Sampler Purpose: draw random samples from a joint distribution (high dimensional). Method: iterative conditional sampling.

Illustration of Gibbs Sampler
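
The illustration on this slide is a figure; the idea itself is easy to write down. A sketch of a Gibbs sampler for the same bivariate normal target as above (unit variances, correlation ρ = 0.8 assumed), alternating draws from the two full conditionals, which for this target are X1 | X2 = x2 ~ N(ρx2, 1 - ρ²) and symmetrically for X2:

    import numpy as np

    rng = np.random.default_rng(2)
    rho = 0.8                                  # assumed correlation, as above
    sd = np.sqrt(1.0 - rho**2)                 # conditional standard deviation

    def gibbs(n_samples):
        x1, x2 = 0.0, 0.0
        chain = np.empty((n_samples, 2))
        for i in range(n_samples):
            x1 = rng.normal(rho * x2, sd)      # draw X1 | X2 = x2
            x2 = rng.normal(rho * x1, sd)      # draw X2 | X1 = x1
            chain[i] = (x1, x2)
        return chain

    chain = gibbs(5000)

Note that every conditional draw is accepted, consistent with the remark below that the Gibbs sampler is a special case of Metropolis-Hastings (with acceptance probability 1).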

References Geman and Geman (1984); Gelfand and Smith (1990); tutorial paper: Casella and George (1992). Explaining the Gibbs sampler. The American Statistician, 46, 167-174.

Remarks The Gibbs sampler is a special case of Metropolis-Hastings. Compared to the EM algorithm, the Gibbs sampler and Metropolis-Hastings are stochastic procedures. In practice: verify convergence of the sequence, discard a burn-in period, and use multiple chains.
