Markov Chain Monte Carlo for LDA

References:
 C. Andrieu, N. de Freitas, and A. Doucet, "An Introduction to MCMC for Machine Learning," 2003.
 R. M. Neal, "Probabilistic Inference Using Markov Chain Monte Carlo Methods," 1993.
 A. A. Markov, "Extension of the limit theorems of probability theory to a sum of variables connected in a chain," John Wiley and Sons, 1971.

Mar 24, 2015
Hee-Gook Jun

Outline
 Markov Chain
 Monte Carlo Method
 Markov Chain Monte Carlo
 Gibbs Sampling

Markov Chain
 Markov chain
– A stochastic model describing a sequence of states in which the process moves randomly from one state to the next
 Random walk
– A path that consists of a succession of random steps
– A one-dimensional random walk is a Markov chain (see the sketch below)
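
A minimal R sketch of a one-dimensional random walk (the step count and plot are illustrative, not from the slides); each step moves +1 or -1 with equal probability, so the next position depends only on the current one:

steps <- 100
pos <- numeric(steps)
pos[1] <- 0
for (i in 2:steps) {
  pos[i] <- pos[i-1] + sample(c(-1, 1), 1)  # next state depends only on the current state
}
plot(pos, type = "l", xlab = "step", ylab = "position")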

Markov Chain (Cont.)
 E.g., a deterministic system
 A Markov chain is a non-deterministic system
– Memorylessness and a random state-transition model
– Assumption: the next state depends only on the current state
[Figure: state-transition diagram over the states spring, summer, fall, winter]
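
A small R sketch of a random state-transition model over the four seasonal states from the figure; the transition probabilities below are made up for illustration, not taken from the slide:

states <- c("spring", "summer", "fall", "winter")
# rows: current state, columns: next state; each row sums to 1
P <- matrix(c(0.1, 0.7, 0.1, 0.1,
              0.1, 0.1, 0.7, 0.1,
              0.1, 0.1, 0.1, 0.7,
              0.7, 0.1, 0.1, 0.1),
            nrow = 4, byrow = TRUE, dimnames = list(states, states))
n <- 20
chain <- character(n)
chain[1] <- "spring"
for (i in 2:n) {
  # the next state is drawn using only the current state (memorylessness)
  chain[i] <- sample(states, 1, prob = P[chain[i-1], ])
}
chain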

Monte Carlo Method
 Simulation method
– Based on random variables
– Relies on repeated random sampling to obtain numerical results
 Process (see the sketch below)
– Define a domain of possible inputs
– Generate inputs randomly from a probability distribution over the domain
– Perform a deterministic computation on the inputs
– Aggregate the results
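
As a concrete illustration of the four steps, a minimal R sketch; estimating pi this way is a standard textbook example, not one taken from the slides:

n <- 100000
# 1. Domain: the unit square [0,1] x [0,1]
# 2. Generate inputs randomly from the uniform distribution over the domain
x <- runif(n); y <- runif(n)
# 3. Deterministic computation: does each point fall inside the quarter circle?
inside <- (x^2 + y^2) <= 1
# 4. Aggregate: the fraction of points inside approximates pi / 4
4 * mean(inside)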

Markov Chain Monte Carlo
 Sampling from a probability distribution
– Based on constructing a Markov chain whose long-run (stationary) distribution is the desired distribution
 The state of the chain after a number of steps
– Used as a sample of the desired distribution
 The number of steps
– More steps improve the quality of the sample (see the sketch below)
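
A tiny self-contained R sketch (two states and a made-up transition matrix) showing that the state visited after many steps behaves like a draw from the chain's long-run distribution, while the state after only one step does not:

P <- matrix(c(0.9, 0.1,
              0.3, 0.7), nrow = 2, byrow = TRUE)  # made-up transition matrix
run_chain <- function(steps) {
  s <- 1
  for (i in 1:steps) s <- sample(1:2, 1, prob = P[s, ])
  s
}
# Take the state after 1 step vs. after 200 steps, many times;
# with more steps the frequencies approach the stationary distribution (0.75, 0.25)
table(replicate(5000, run_chain(1))) / 5000
table(replicate(5000, run_chain(200))) / 5000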

MCMC Example: 1 Chain

MCMC Example: 2 Chains

Markov Chain Monte Carlo (Cont.)
 Used for approximating a multi-dimensional integral
– Look for a place with a reasonably high contribution to the integral to move into next
 Random walk Monte Carlo methods
– Metropolis-Hastings algorithm (see the sketch below)
   Generates a random walk using a proposal density
   Rejects some of the proposed moves
– Gibbs sampling
   Requires all the conditional distributions of the target distribution to be sampled exactly
   Does not require any tuning
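
A minimal R sketch of random-walk Metropolis-Hastings; the target (a standard normal) and the proposal width are chosen purely for illustration:

target <- function(x) dnorm(x)         # (possibly unnormalized) target density
n <- 10000
x <- numeric(n)
x[1] <- 0
for (i in 2:n) {
  prop <- x[i-1] + rnorm(1, 0, 1)      # random walk generated by the proposal density
  accept_prob <- min(1, target(prop) / target(x[i-1]))
  if (runif(1) < accept_prob) {
    x[i] <- prop                       # accept the proposed move
  } else {
    x[i] <- x[i-1]                     # reject: stay at the current state
  }
}
hist(x, breaks = 50)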

Gibbs Sampling
 MCMC algorithm for obtaining a sequence of observations
– Approximated from a specified multivariate probability distribution
 Commonly used
– As a means of statistical inference (especially Bayesian inference)
 Generates a Markov chain of samples
 Samples from the beginning of the chain (the burn-in)
– May not accurately represent the desired distribution

Gibbs Sampling Example: Normal Dist. Estimation [1/2]
 X (input data): 10, 13, 15, 11, 9, 18, 20, 17, 23, 21
[Figure: results of the 1st and 2nd test runs]

Gibbs Sampling Example: Normal Dist. Estimation [2/2]
 X (input data): 10, 13, 15, 11, 9, 18, 20, 17, 23, 21 (a sampler sketch for this data follows below)
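
The equations from the original slides are not in the transcript; the following R sketch is one way to Gibbs-sample the mean and variance of a normal model for this data, assuming standard conjugate priors (a normal prior on the mean, an inverse-gamma prior on the variance). The prior hyperparameters are illustrative, not taken from the slides:

X <- c(10, 13, 15, 11, 9, 18, 20, 17, 23, 21)
n <- length(X)
mu0 <- 0; tau2 <- 100        # normal prior on the mean: N(mu0, tau2)
a0 <- 1; b0 <- 1             # inverse-gamma prior on the variance: IG(a0, b0)
iters <- 5000
mu <- numeric(iters); sigma2 <- numeric(iters)
mu[1] <- mean(X); sigma2[1] <- var(X)
for (t in 2:iters) {
  # 1) Sample the mean given the current variance (full conditional is normal)
  post_var  <- 1 / (n / sigma2[t-1] + 1 / tau2)
  post_mean <- post_var * (sum(X) / sigma2[t-1] + mu0 / tau2)
  mu[t] <- rnorm(1, post_mean, sqrt(post_var))
  # 2) Sample the variance given the current mean (full conditional is inverse-gamma)
  a_n <- a0 + n / 2
  b_n <- b0 + sum((X - mu[t])^2) / 2
  sigma2[t] <- 1 / rgamma(1, a_n, rate = b_n)
}
mean(mu[1001:iters]); mean(sigma2[1001:iters])   # posterior means after burn-in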

Bayesian inference
 Posterior probability
– Consequence of two antecedents
   Prior probability
   Likelihood function
 posterior ∝ likelihood × prior

Bayesian inference using Gibbs Sampling

M <- 11000       # length of chain (exact value not given on the slide)
burn <- 1000     # burn-in length
n <- 16
a <- 2
b <- 4
k <- 10          # defined on the slide but not used below
X <- matrix(nrow = M)
th <- rbeta(1, 1, 1)
X[1] <- rbinom(1, n, th)
for (i in 2:M) {
  thTmp <- rbeta(1, X[i-1] + a, n - X[i-1] + b)  # sample theta given the previous X
  X[i] <- rbinom(1, n, thTmp)                    # sample X given theta
}
X
# Discard the first 1,000 observations of the chain (burn-in)
x <- X[burn:M, ]
Gibbs <- table(factor(x, levels = c(0:16)))
barplot(Gibbs)

Gibbs Sampling: Bayesian inference in LDA
 The original LDA paper (by David Blei)
– Used a variational Bayes approximation of the posterior distribution
 Alternative inference techniques
– Gibbs sampling and expectation propagation
 The EM algorithm is used in PLSA
– Good for computing point estimates of the parameters
 MCMC is used in LDA
– Preferred over EM when a problem has too many parameters to compute directly
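
For reference, a compact R sketch of collapsed Gibbs sampling for LDA; this follows the widely used collapsed sampler rather than anything shown on the slides, and the toy corpus, K, alpha, and beta below are made up:

set.seed(1)
docs <- list(c(1, 2, 2, 3), c(3, 4, 4, 5), c(1, 2, 5, 5))  # word ids per document (toy corpus)
V <- 5; K <- 2; alpha <- 0.1; beta <- 0.01
D <- length(docs)
ndk <- matrix(0, D, K)    # topic counts per document
nkw <- matrix(0, K, V)    # word counts per topic
nk  <- numeric(K)         # total words per topic
z <- lapply(docs, function(d) sample(1:K, length(d), replace = TRUE))  # random init
for (d in 1:D) for (i in seq_along(docs[[d]])) {
  w <- docs[[d]][i]; t <- z[[d]][i]
  ndk[d, t] <- ndk[d, t] + 1; nkw[t, w] <- nkw[t, w] + 1; nk[t] <- nk[t] + 1
}
for (iter in 1:200) {
  for (d in 1:D) for (i in seq_along(docs[[d]])) {
    w <- docs[[d]][i]; t <- z[[d]][i]
    # Remove the current assignment from the counts
    ndk[d, t] <- ndk[d, t] - 1; nkw[t, w] <- nkw[t, w] - 1; nk[t] <- nk[t] - 1
    # Full conditional: p(z = k | rest) proportional to (ndk + alpha) * (nkw + beta) / (nk + V * beta)
    p <- (ndk[d, ] + alpha) * (nkw[, w] + beta) / (nk + V * beta)
    t <- sample(1:K, 1, prob = p)
    # Add the new assignment back
    z[[d]][i] <- t
    ndk[d, t] <- ndk[d, t] + 1; nkw[t, w] <- nkw[t, w] + 1; nk[t] <- nk[t] + 1
  }
}
# Estimated topic-word distributions after sampling
(nkw + beta) / (nk + V * beta)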