Lecture #9: Introduction to Markov Chain Monte Carlo, part 3


Lecture #9: Introduction to Markov Chain Monte Carlo, part 3
CS 679: Text Mining. This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License. Slide credit: Teg Grenager of Stanford University, with adaptations and improvements by Eric Ringger.

Announcements
- Assignment #4
- Assignment #5: prepare and give a lecture on your 2 favorite papers, starting in approx. 1.5 weeks
- Assignment #5 pre-proposal: answer the 5 questions!

Where are we?
- Joint distributions are useful for answering questions (joint or conditional)
- Directed graphical models represent joint distributions
- Hierarchical Bayesian models make parameters explicit
- We want to ask conditional questions of our models, e.g., posterior distributions over latent variables given large collections of data
- Simultaneously inferring values of model parameters and answering the conditional questions is called "posterior inference"
- Posterior inference is a challenging computational problem
- Sampling is an efficient mechanism for performing that inference
- Variety of MCMC methods: a random walk on a carefully constructed Markov chain that converges to the desired stationary distribution

Agenda
- Motivation
- The Monte Carlo Principle
- Markov Chain Monte Carlo
- Metropolis-Hastings
- Gibbs Sampling
- Advanced Topics

Metropolis-Hastings
The symmetry requirement on the Metropolis proposal distribution can be hard to satisfy. Metropolis-Hastings is the natural generalization of the Metropolis algorithm, and the most popular MCMC algorithm. Choose a proposal distribution q, which need not be symmetric, and define a Markov chain with the following process:
1. Sample a candidate point x* from the proposal distribution q(x* | x(t))
2. Compute the importance ratio: r = p(x*) q(x(t) | x*) / [ p(x(t)) q(x* | x(t)) ]
3. With probability min(r, 1), transition to x*; otherwise stay in the same state x(t)
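The three steps above can be sketched as a generic sampler. This is a minimal illustration, not the lecture's own code; the target here is an unnormalized standard normal, and the asymmetric drifted-Gaussian proposal is an arbitrary choice to show why the q terms in the ratio matter.

```python
import math
import random

def metropolis_hastings(log_p, propose, log_q, x0, n_samples):
    """Generic Metropolis-Hastings sampler.

    log_p   : unnormalized log target density log p(x)
    propose : x -> candidate x*
    log_q   : (x_to, x_from) -> log proposal density log q(x_to | x_from)
    """
    x = x0
    samples = []
    for _ in range(n_samples):
        x_star = propose(x)
        # log r = log[ p(x*) q(x|x*) ] - log[ p(x) q(x*|x) ]
        log_r = (log_p(x_star) + log_q(x, x_star)) - (log_p(x) + log_q(x_star, x))
        # Accept with probability min(r, 1)
        if random.random() < math.exp(min(0.0, log_r)):
            x = x_star
        samples.append(x)
    return samples

# Example: target N(0,1); asymmetric proposal q(y|x) = N(0.9*x, 1).
random.seed(0)
log_p = lambda x: -0.5 * x * x
propose = lambda x: random.gauss(0.9 * x, 1.0)
log_q = lambda y, x: -0.5 * (y - 0.9 * x) ** 2
samples = metropolis_hastings(log_p, propose, log_q, 0.0, 20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

Note that because the proposal is not symmetric, dropping the q terms from log_r would bias the chain toward the proposal's own stationary distribution rather than p.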

MH convergence
Theorem: The Metropolis-Hastings algorithm converges to the target distribution p(x).
Proof: For all x, y ∈ X, assume WLOG that p(x)q(y|x) ≤ p(y)q(x|y). Then the candidate y proposed from x is always accepted (the ratio is ≥ 1), so T(x, y) = q(y|x), and
  p(x)T(x, y) = p(x)q(y|x).
For the reverse move, the acceptance probability is p(x)q(y|x) / (p(y)q(x|y)) ≤ 1, so the transition probability is
  T(y, x) = q(x|y) · p(x)q(y|x) / (p(y)q(x|y)).
Multiplying by p(y) and cancelling gives
  p(y)T(y, x) = p(x)q(y|x) = p(x)T(x, y).
Thus the chain satisfies detailed balance, and p(x) is its stationary distribution.

Gibbs sampling
A special case of Metropolis-Hastings, applicable when we have a factored state space and access to the full ("complete") conditionals P(x_j | x_{-j}). Perfect for Bayesian networks!
Idea: to transition from one state (variable assignment) to another:
1. Pick a variable x_j
2. Sample its value from the conditional distribution P(x_j | x_{-j})
That's it! We'll show in a minute why this is an instance of MH, and thus must be sampling from the joint or conditional distribution we wanted.

Markov blanket
Recall that Bayesian networks encode a factored representation of the joint distribution. Variables are independent of their non-descendants given their parents, and independent of everything else in the network given their Markov blanket (parents, children, and children's other parents):
  P(B | values of R, values of G) = P(B | values of G)
So, to sample each node, we only need to condition on its Markov blanket:
  P(x_j | x) = P(x_j | x_{-j}) = P(x_j | MB(x_j))
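The blanket identity above is what makes each Gibbs step cheap. As an illustration (not the lecture's code), the sketch below computes P(x_j | MB(x_j)) for a binary node by multiplying its own CPT entry with the CPT entries of its children; the dictionary-of-CPTs network representation is a hypothetical toy encoding.

```python
def blanket_conditional(net, j, assignment):
    """Return P(x_j = 1 | everything else) for a binary Bayes net.

    net[v] = (parents, cpt), where cpt maps a tuple of parent values
    to P(v = 1 | parents). Only x_j's own CPT and its children's CPTs
    contribute; everything else cancels.
    """
    def prob(v, value, asg):
        parents, cpt = net[v]
        p1 = cpt[tuple(asg[p] for p in parents)]
        return p1 if value == 1 else 1.0 - p1

    children = [v for v, (parents, _) in net.items() if j in parents]
    weights = []
    for val in (0, 1):
        asg = dict(assignment, **{j: val})
        w = prob(j, val, asg)                  # P(x_j | parents)
        for c in children:
            w *= prob(c, asg[c], asg)          # P(child | its parents)
        weights.append(w)
    return weights[1] / (weights[0] + weights[1])

# Chain A -> B -> C: conditioning B on both neighbors uses only its blanket.
net = {
    "A": ((), {(): 0.5}),
    "B": (("A",), {(0,): 0.2, (1,): 0.8}),
    "C": (("B",), {(0,): 0.1, (1,): 0.9}),
}
p_b = blanket_conditional(net, "B", {"A": 1, "C": 1})
# P(B=1 | A=1, C=1) = 0.8*0.9 / (0.8*0.9 + 0.2*0.1) = 0.72 / 0.74
```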

Gibbs sampling
More formally, the proposal distribution is
  q(x* | x(t)) = p(x_j* | x_{-j}(t))   if x_{-j}* = x_{-j}(t); 0 otherwise.
The importance ratio is
  r = p(x*) q(x(t) | x*) / [ p(x(t)) q(x* | x(t)) ]
    = p(x*) p(x_j(t) | x_{-j}(t)) / [ p(x(t)) p(x_j* | x_{-j}(t)) ]   (dfn of proposal distribution, and x_{-j}* = x_{-j}(t))
    = [ p(x_j* | x_{-j}(t)) p(x_{-j}(t)) p(x_j(t) | x_{-j}(t)) ] / [ p(x_j(t) | x_{-j}(t)) p(x_{-j}(t)) p(x_j* | x_{-j}(t)) ]   (dfn of conditional probability, twice; definition of x* and x(t))
    = 1   (cancel common terms)
So we always accept!

Gibbs sampling example
Consider a simple, 2-variable Bayes net A → B. Initialize randomly, then sample the variables alternately.
  P(A=1) = 0.5
  P(B=1 | A=1) = 0.8,   P(B=1 | A=0) = 0.2
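The alternating updates for this two-node net can be written out directly. This is a minimal sketch assuming the CPT values above; resampling A given B uses Bayes' rule, while resampling B given A reads straight off the CPT.

```python
import random

def gibbs_two_node(n_iters, seed=0):
    """Gibbs sampling on the net A -> B with
    P(A=1)=0.5, P(B=1|A=1)=0.8, P(B=1|A=0)=0.2."""
    rng = random.Random(seed)
    p_a = 0.5
    p_b_given_a = {1: 0.8, 0: 0.2}
    a, b = rng.randrange(2), rng.randrange(2)   # random initialization
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for _ in range(n_iters):
        # Resample A given B:  P(A=1 | B) ∝ P(A=1) P(B | A=1)
        w1 = p_a * (p_b_given_a[1] if b else 1 - p_b_given_a[1])
        w0 = (1 - p_a) * (p_b_given_a[0] if b else 1 - p_b_given_a[0])
        a = int(rng.random() < w1 / (w1 + w0))
        # Resample B given A: sample directly from the CPT row
        b = int(rng.random() < p_b_given_a[a])
        counts[(a, b)] += 1
    return counts

counts = gibbs_two_node(50000)
p_b1 = (counts[(0, 1)] + counts[(1, 1)]) / 50000   # should approach P(B=1)=0.5
p_11 = counts[(1, 1)] / 50000                      # should approach P(A=1,B=1)=0.4
```

The empirical frequencies converge to the exact joint, which is the point of the MH argument above: every Gibbs move is an always-accepted MH move.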

Gibbs Sampling (in 2-D) (MacKay, 2002)

Practical issues
- How many iterations? How do we know when to stop?
- M-H: what makes a good proposal distribution?
- Gibbs: how do we derive the complete conditionals?

Advanced Topics
- Simulated annealing, for global optimization, is a form of MCMC
- Mixtures of MCMC transition functions
- Monte Carlo EM: stochastic E-step, i.e., sample instead of computing the full posterior
- Reversible-jump MCMC for model selection
- Adaptive proposal distributions
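To make the first bullet concrete: simulated annealing runs Metropolis dynamics on p_T(x) ∝ exp(-energy(x)/T) while lowering T, so the chain concentrates on global minima. The sketch below is illustrative only; the energy function, linear cooling schedule, and proposal width are arbitrary choices, not anything prescribed by the lecture.

```python
import math
import random

def simulated_annealing(energy, propose, x0, n_steps, t0=1.0, t_min=1e-3, seed=0):
    """Metropolis moves at a decreasing temperature T: uphill moves of
    size d are accepted with probability exp(-d / T)."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for i in range(n_steps):
        t = max(t_min, t0 * (1 - i / n_steps))   # linear cooling schedule
        x_star = propose(x, rng)
        e_star = energy(x_star)
        if e_star < e or rng.random() < math.exp((e - e_star) / t):
            x, e = x_star, e_star                # Metropolis accept
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

# Minimize a bumpy 1-D function with many local minima near x = 2.
energy = lambda x: (x - 2.0) ** 2 + math.sin(8 * x)
propose = lambda x, rng: x + rng.gauss(0, 0.5)
best_x, best_e = simulated_annealing(energy, propose, -5.0, 20000)
```

At high T the chain behaves like an ordinary MCMC sampler and can escape local minima; as T → 0 it becomes a greedy descent, which is why the cooling schedule matters in practice.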

Next: Document Clustering by Gibbs Sampling on a Mixture of Multinomials, from Dan Walker's dissertation!