Suggested readings

Historical notes
- Chap. 1, Andrieu, C., et al. (2003). An Introduction to MCMC for Machine Learning. Machine Learning, 50, 5–43.
- Section 11.11 by Gelman.

Markov chains
- Chap. 11, David J.C. MacKay (2003). Information Theory, Inference, and Learning Algorithms. Cambridge University Press. http://www.cambridge.org/0521642981
- Chap. 11, Grinstead & Snell, Introduction to Probability (available on the internet).
- Section 7.1, Tan & Fox (2007). Lecture Notes on Inverse Problems.

MCMC details
- Andrieu, C., et al. (2003). An Introduction to MCMC for Machine Learning. Machine Learning, 50, 5–43.
- Sections 11.2–11.6 by Gelman.
- Section 7.2, Tan & Fox (2007). Lecture Notes on Inverse Problems.

Markov Chain: Weather in the Land of Oz

The Land of Oz has a peculiar weather system with three states: rain (R), nice (N), and snow (S). The next day's state depends only on today's state, not on any earlier states:
- If today is N, tomorrow is equally likely to be R or S, and is never N again.
- If today is R or S, there is a 50% chance of the same weather tomorrow; if the weather changes, only half of that change goes to N.

Case study (see the Matlab sketch below):
- It is R today. What is the probability of S tomorrow? What is the probability of S the day after tomorrow?
- Starting from the i-th state, what is the probability of the j-th state the day after tomorrow? After many days, say 100?
- Let the initial state always be R. Calculate the state distribution after 3, 10, and 100 days.
- Let the initial state have an equal chance of each weather. Calculate the state distribution after 3, 10, and 100 days.
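A minimal Matlab sketch of this case study; the transition matrix follows directly from the rules above, and the state ordering (R, N, S) is a labeling choice made here for illustration.

```matlab
% Land of Oz weather chain; rows are "today", columns are "tomorrow".
P = [1/2 1/4 1/4;   % from R: 1/2 R, 1/4 N, 1/4 S
     1/2 0   1/2;   % from N: never N again, R and S equally likely
     1/4 1/4 1/2];  % from S: 1/2 S, 1/4 N, 1/4 R
u = [1 0 0];        % initial state: always R
for n = [3 10 100]
    fprintf('after %3d days: %s\n', n, mat2str(u * P^n, 3));
end
u = [1 1 1] / 3;    % initial state: equal chance of R, N, S
disp(u * P^100)     % same limit, approximately [0.4 0.2 0.4]
```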

Markov Chain: Remarks
- For a set of states S = {s1, s2, …, sr}, a Markov chain is a process in which a new state sj is reached from the current state si with probability pij. Note that the new state is not affected by the previous states.
- The pij are called transition probabilities, denoted P(j|i) = pij, and they form the transition matrix P. Each row of P is a probability vector (or distribution): its entries sum to 1.
- The entry pij(n) of the matrix P^n gives the probability of reaching state sj, starting from state si, after n steps.
- As n → ∞, P^n approaches a limiting matrix W whose rows are all the same vector w.
- w is obtained from wP = w, or w(P − I) = 0; in fact, w is a left eigenvector of the matrix P (see the sketch below).
- Let u be an arbitrary starting distribution. Then the distribution after n steps, uP^n, tends to w as n → ∞: regardless of the initial u, one always ends up with the same distribution w after many steps.
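A short Matlab sketch of the eigenvector computation, reusing the Oz transition matrix from above:

```matlab
% Stationary distribution as the left eigenvector of P with eigenvalue 1.
P = [1/2 1/4 1/4; 1/2 0 1/2; 1/4 1/4 1/2];
[V, D] = eig(P');               % left eigenvectors of P = eigenvectors of P'
[~, k] = min(abs(diag(D) - 1)); % locate the eigenvalue 1
w = real(V(:, k))';
w = w / sum(w)                  % normalized so that w*P = w; here [0.4 0.2 0.4]
```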

Markov Chain Monte Carlo: Introduction
- MCMC is a strategy for generating samples x(i) by exploring the state space with a Markov chain mechanism, such that the samples x(i) mimic samples drawn from the target distribution p(x).
- MCMC is used when it is not possible (or not computationally efficient) to sample θ directly from p(θ|y). Instead, we sample iteratively, so that at each step of the process we expect to draw from a distribution that becomes closer and closer to p(θ|y).
- MCMC does not require that p(θ|y) be normalized. For a wide class of problems, this appears to be the easiest way to get reliable results.

Markov Chain Monte Carlo: Forward and inverse processes
- Forward process (Markov process): a stochastic process x(i) is called a Markov chain if p(x(i) | x(i−1), …, x(1)) = T(x(i) | x(i−1)). Given a transition matrix T, find the equilibrium distribution p.
- Inverse process (MCMC): given a distribution p, construct a transition matrix T of a Markov chain whose equilibrium distribution is p. Running the Markov process then yields samples from the target distribution p (see the sketch below).
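A toy Matlab sketch of the inverse process on three states; the target p chosen here is an assumption for illustration, and the construction used is the Metropolis recipe discussed in the following slides.

```matlab
% Given a target p, build a Metropolis transition matrix T and verify p*T = p.
p = [0.2 0.5 0.3];                 % assumed target distribution
q = ones(3) / 3;                   % symmetric uniform proposal
T = zeros(3);
for i = 1:3
    for j = [1:i-1, i+1:3]
        T(i,j) = q(i,j) * min(1, p(j) / p(i)); % move accepted w.p. min(1, p(j)/p(i))
    end
    T(i,i) = 1 - sum(T(i,:));      % rejected proposals stay at state i
end
disp(p * T)                        % equals p: the chain leaves p invariant
```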

Definition
- In Markov chain simulation, sequences of simulation draws are created; each sequence is produced by starting at an arbitrary point and drawing each new point from a transition distribution that depends on the last point drawn. The transition distribution is constructed so that the Markov chain converges to a stationary distribution, which is the target distribution of our concern.
- A recent survey places the Metropolis algorithm among the ten algorithms that have had the greatest influence on the development of science and engineering in the 20th century. The algorithm plays a key role in the sampling method known as MCMC. These algorithms have played a significant role in statistics, econometrics, physics, and computing science over the last two decades.

Metropolis-Hastings algorithm: Introduction
- The most popular MCMC method; other MCMC algorithms can be interpreted as special cases or extensions of this algorithm.

MH algorithm
- A proposal distribution q(x*|x) is introduced. The MH step samples a new candidate x* given the current x from q(x*|x). The candidate is then accepted with probability A(x, x*) = min{1, [p(x*) q(x|x*)] / [p(x) q(x*|x)]}; otherwise the chain remains at x.
- By repeating this step, one obtains in the end samples of p(x). (A generic accept/reject step is sketched below.)
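A single MH step written as a Matlab function; a minimal sketch, where the handles p (target), q (proposal density), and qrnd (proposal sampler) are placeholders to be supplied by the caller.

```matlab
% One generic Metropolis-Hastings accept/reject step (save as mh_step.m).
function xnew = mh_step(x, p, q, qrnd)
    xs = qrnd(x);                                      % candidate x* ~ q(.|x)
    A  = min(1, p(xs) * q(x, xs) / (p(x) * q(xs, x))); % acceptance probability
    if rand < A
        xnew = xs;    % accept: move to the candidate
    else
        xnew = x;     % reject: remain at the current point
    end
end
```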

Metropolis-Hastings algorithm: Metropolis algorithm
- When the proposal distribution q(x*|x) is symmetric with respect to x* and x, the acceptance ratio simplifies to min{1, p(x*)/p(x)}. Example: a normal pdf evaluated at x* with mean x equals a normal pdf evaluated at x with mean x*.

Practice with Matlab
- Generate samples of a target distribution using a symmetric proposal pdf (a random-walk sketch follows below). As the random walk progresses and the number of samples increases, the distribution of the samples converges to the target distribution.
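A random-walk Metropolis sketch in Matlab; the bimodal target and the proposal width are assumptions made here for illustration, since the slide's target pdf is not reproduced in the transcript.

```matlab
% Random-walk Metropolis with a symmetric normal proposal.
p = @(x) 0.3 * normpdf(x, -2, 1) + 0.7 * normpdf(x, 3, 0.8); % assumed target
N = 1e4; sig = 1.0;
x = zeros(N, 1);
for i = 2:N
    xs = normrnd(x(i-1), sig);            % symmetric proposal: q(x*|x) = q(x|x*)
    if rand < min(1, p(xs) / p(x(i-1)))   % simplified Metropolis ratio
        x(i) = xs;
    else
        x(i) = x(i-1);
    end
end
hist(x, 60)                               % approaches the target as N grows
```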

Metropolis-Hastings algorithm: Notes
- MH requires careful design of the proposal distribution. Allowing asymmetric jumping rules can increase the speed of the random walk.
- There is a Matlab function for the MH algorithm (an example call follows below): mhsample(1,N,'pdf',p,'proprnd',qrnd,'proppdf',q);
- Variants of MCMC include hybrid Monte Carlo, slice sampling, and reversible jump MCMC.
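A hypothetical complete call to mhsample from the Statistics Toolbox, filling in the handles p, qrnd, and q with an assumed standard normal target and a uniform random-walk proposal:

```matlab
% Sampling a standard normal target with mhsample.
p    = @(x) exp(-x.^2 / 2);                % unnormalized target pdf
qrnd = @(x) x + unifrnd(-0.5, 0.5);        % draw a candidate from q(.|x)
q    = @(x, y) unifpdf(x - y, -0.5, 0.5);  % proposal density q(x|y)
N    = 1e4;
smpl = mhsample(1, N, 'pdf', p, 'proprnd', qrnd, 'proppdf', q);
hist(smpl, 50)
```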

Simulated annealing for global optimization: Algorithm
- The target distribution is the objective function of the optimization problem. After sampling via the MH algorithm, the optimum solution is obtained as the mode of the simulated distribution, i.e., the sample x(i) with the largest p(x(i)).
- Using just the original function is inefficient. Simulated annealing was developed to enhance the speed of finding the mode: the pdf value at iteration i is replaced by p_i(x) ∝ p(x)^(1/T_i), where T_i is a decreasing cooling schedule. Raising the target to a growing power concentrates the samples near the global mode.
- Thus SA is just a minor modification of standard MCMC algorithms (a sketch follows below).
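A Matlab sketch of simulated annealing via tempered Metropolis; the objective function, proposal width, and logarithmic cooling schedule are all assumptions chosen for illustration.

```matlab
% Simulated annealing: Metropolis sampling of f(x)^(1/Ti) with Ti -> 0.
f = @(x) exp(-(x.^2 - 4).^2 / 2);     % assumed objective, maxima at x = +/-2
N = 5000; x = zeros(N, 1); x(1) = 5;  % deliberately poor starting point
for i = 2:N
    Ti = 1 / log(i + 1);              % decreasing cooling schedule
    xs = normrnd(x(i-1), 0.5);        % random-walk proposal
    A  = min(1, (f(xs) / f(x(i-1)))^(1 / Ti)); % tempered acceptance ratio
    if rand < A, x(i) = xs; else, x(i) = x(i-1); end
end
x(end)                                % close to a global maximizer
```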

Gibbs sampler: Algorithm
- A particular Markov chain useful for multidimensional problems; also called alternating conditional sampling.
- Let x = (x1, x2, …, xn). Each iteration of the Gibbs sampler cycles through the components, drawing each one conditional on the values of all the others. There are thus n steps in each iteration.
- Each component xj is updated by a draw from p(xj | x1(i), …, xj−1(i), xj+1(i−1), …, xn(i−1)), i.e., conditional on the latest values of the other components: the iteration-i values for the components already updated and the iteration-(i−1) values for those not yet updated.

Gibbs sampler: Practice with Matlab
- Generate samples of a bivariate normal distribution with correlation r. To apply Gibbs sampling, we need the conditional distributions, which follow from the properties of the multivariate normal distribution ((A.1) or (A.2) on page 579); for zero means and unit variances, x1 | x2 ~ N(r x2, 1 − r²) and x2 | x1 ~ N(r x1, 1 − r²).
- Gibbs sampling where each conditional draw is made by a Matlab random-number function (a sketch follows below).
- Gibbs sampling where each conditional draw is made by the MH algorithm.
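A Gibbs sketch for the first variant, using the exact normal conditionals stated above; the value of r and the chain length are assumptions.

```matlab
% Gibbs sampler for a zero-mean, unit-variance bivariate normal.
r = 0.8; N = 5000;
x = zeros(N, 2);                       % start at the origin
s = sqrt(1 - r^2);                     % conditional standard deviation
for i = 2:N
    x(i,1) = normrnd(r * x(i-1,2), s); % draw x1 | x2 using the latest x2
    x(i,2) = normrnd(r * x(i,1),   s); % draw x2 | x1 using the updated x1
end
plot(x(:,1), x(:,2), '.')              % scatter shows the correlated target
```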

MH algorithm for two parameters: Remark
- The Gibbs sampler samples each component of the parameter vector alternately, while the MH algorithm samples the whole parameter vector at once (see the sketch below). The algorithm is outlined via an example.
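A sketch of the contrast: a single MH step that proposes both components jointly. The bivariate normal target (written through its log-density) and the proposal scale are assumptions carried over from the Gibbs example.

```matlab
% Joint random-walk MH for a two-parameter target.
r = 0.8; N = 5000; sig = 0.5;
logp = @(x) -(x(1)^2 - 2*r*x(1)*x(2) + x(2)^2) / (2 * (1 - r^2));
X = zeros(N, 2);
for i = 2:N
    xs = X(i-1,:) + sig * randn(1, 2);        % propose BOTH components at once
    if log(rand) < logp(xs) - logp(X(i-1,:))  % accept/reject the whole vector
        X(i,:) = xs;
    else
        X(i,:) = X(i-1,:);
    end
end
```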

Inference and convergence: Remarks
- If the iterations are not run long enough, the results can be erroneous. The recommended practice is to employ multiple sequences with starting points dispersed over the domain, monitor the variation between and within the sequences, and proceed until the within-sequence variation approximately equals the between-sequence variation (a sketch of this comparison follows below). More detail is given in the textbook, p. 296.
- Early iterations should be discarded, usually the first half of each sequence. This discarding is referred to as 'burn-in'.
- One can apply several graphical and statistical tests to assess, roughly, whether the chain has stabilized. In general, none of these tests provides an entirely satisfactory diagnostic. See also the comments on convergence in Chap. 29, Monte Carlo Methods, of David J.C. MacKay (2003). Information Theory, Inference, and Learning Algorithms. Cambridge University Press.
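A Matlab sketch of the between/within comparison (essentially the Gelman-Rubin style diagnostic); the matrix of chains here is a stand-in generated from independent normals, purely to make the snippet runnable.

```matlab
% Between/within variance comparison for m parallel chains (columns of X).
n = 1000; m = 4;
X = randn(n, m);             % stand-in for m chains of one scalar quantity
W = mean(var(X));            % average within-chain variance
B = n * var(mean(X));        % between-chain variance
varPlus = (n-1)/n * W + B/n; % pooled variance estimate
Rhat = sqrt(varPlus / W)     % near 1 when within matches between variation
```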

Homework Problem
For Example 3.7, the bioassay experiment:
1. Perform the Metropolis algorithm on the two parameters to obtain samples of the posterior distribution. Draw a scatter plot of the samples.
2. Draw traces of each parameter and the sequence of points in the 2-D domain.
3. Compute the means, 95% confidence intervals, variances, and correlation of the two parameters.
4. Try a different initial point and a different proposal density, and compare the results.