PGM Tirgul 8: Markov Chains

Stochastic Sampling

In the previous class, we examined methods that use independent samples to estimate P(X = x | e).

Problem: it is difficult to sample from P(X_1, …, X_n | e).
- We had to use likelihood weighting to reweight our samples.
- This introduced bias into the estimates.
- In some cases, such as when the evidence is on the leaves, these methods are inefficient.

MCMC Methods

We are going to discuss sampling methods based on Markov chains: Markov Chain Monte Carlo (MCMC) methods.

Key ideas:
- The sampling process is itself a Markov chain: each sample depends on the previous one.
- Such chains can be designed to approximate any posterior distribution.

We start by reviewing key ideas from the theory of Markov chains.

Markov Chains

Suppose X_1, X_2, … take values in some set; w.l.o.g. these values are 1, 2, ….

A Markov chain is a process that corresponds to the network X_1 → X_2 → X_3 → … → X_n.

To quantify the chain, we need to specify:
- Initial probability: P(X_1)
- Transition probability: P(X_{t+1} | X_t)

A Markov chain has stationary transition probabilities: P(X_{t+1} | X_t) is the same for all times t.
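To make the definition concrete, here is a minimal Python sketch; the 3-state chain and all its numbers are invented for illustration. It shows how the initial probability and the time-invariant transition matrix determine the marginal distribution at every step:

```python
import numpy as np

# Invented 3-state chain (states 0, 1, 2); all numbers are for illustration.
init = np.array([0.5, 0.3, 0.2])        # initial probability P(X_1)
T = np.array([[0.9, 0.1, 0.0],          # T[i, j] = P(X_{t+1} = j | X_t = i)
              [0.2, 0.6, 0.2],
              [0.0, 0.3, 0.7]])

# Because the transition probability is the same for all t, the marginal of
# each X_t is obtained by repeated vector-matrix products:
p2 = init @ T       # P(X_2)
p3 = p2 @ T         # P(X_3)
print(p2, p3)
```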

Irreducible Chains

A state j is accessible from state i if there is an n such that P(X_n = j | X_1 = i) > 0, i.e., there is positive probability of reaching j from i after some number of steps.

A chain is irreducible if every state is accessible from every state. A sketch of this check for finite chains follows.
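For a finite chain, accessibility is just reachability in the directed graph that has an edge i → j whenever the transition probability T[i, j] is positive. A minimal sketch; the function name and the example matrix are ours, not from the slides:

```python
import numpy as np

def is_irreducible(T):
    """Check that every state is accessible from every state.

    State j is accessible from i iff there is a directed path i -> j in the
    graph with an edge wherever T[i, j] > 0.
    """
    n = len(T)
    reach = (np.asarray(T) > 0)
    # Transitive closure, Floyd-Warshall style (k must be the outer loop).
    for k in range(n):
        for i in range(n):
            for j in range(n):
                reach[i, j] = reach[i, j] or (reach[i, k] and reach[k, j])
    return bool(reach.all())

# Example: a reducible chain -- state 0 cannot be reached from state 1.
T = [[0.5, 0.5],
     [0.0, 1.0]]
print(is_irreducible(T))  # False
```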

Ergodic Chains

A state i is positively recurrent if the expected time to return to state i, after being in state i, is finite. If X has a finite number of states, it suffices that i is accessible from itself.

A chain is ergodic if it is irreducible and every state is positively recurrent.

(A)periodic Chains

A state i is periodic if there is an integer d > 1 such that P(X_n = i | X_1 = i) = 0 when n is not divisible by d.

A chain is aperiodic if it contains no periodic state.
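Equivalently, the period of state i is the gcd of all n ≥ 1 with (T^n)[i, i] > 0, and i is periodic when this gcd exceeds 1. A small sketch for finite chains; the scan bound max_n is an illustrative choice:

```python
from math import gcd
import numpy as np

def period(T, i, max_n=64):
    """Period of state i: gcd of all n >= 1 with (T^n)[i, i] > 0.

    Scans return times up to max_n, which is enough for small chains.
    """
    T = np.asarray(T, dtype=float)
    P = np.eye(len(T))
    d = 0
    for n in range(1, max_n + 1):
        P = P @ T                      # P now holds the n-step probabilities
        if P[i, i] > 0:
            d = gcd(d, n)              # gcd(0, n) == n, so the first hit sets d
    return d

# A deterministic 2-cycle: the chain returns to state 0 only at even times.
T = [[0.0, 1.0],
     [1.0, 0.0]]
print(period(T, 0))  # 2 -> state 0 is periodic
```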

Stationary Probabilities

Thm: If a chain is ergodic and aperiodic, then the limit

  P*(X = j) = lim_{n→∞} P(X_n = j | X_1 = i)

exists, and does not depend on i. Moreover, P*(X) is the unique probability satisfying

  P*(X = j) = Σ_i P*(X = i) · P(X_{t+1} = j | X_t = i)

Stationary Probabilities

The probability P*(X) is the stationary probability of the process. Regardless of the starting point, the process converges to this probability. The rate of convergence depends on properties of the transition probability.
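A quick numerical illustration of both claims, using the same invented chain as in the earlier sketch: two very different starting distributions are pushed through the transition matrix and converge to the same P*, which is a fixed point of the transition:

```python
import numpy as np

# Same invented chain as before (all numbers are for illustration).
T = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.3, 0.7]])

# Start from two very different initial distributions ...
p = np.array([1.0, 0.0, 0.0])
q = np.array([0.0, 0.0, 1.0])
for _ in range(200):          # ... and push both through the chain
    p = p @ T
    q = q @ T

# Both converge to the same stationary distribution P*, which indeed
# satisfies P* = P* T (up to numerical precision).
print(p)
print(q)
print(np.allclose(p, p @ T))  # True
```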

Sampling from the stationary probability

This theory suggests how to sample from the stationary probability:
- Set X_1 = i, for some random/arbitrary i.
- For t = 1, 2, …, n − 1: sample a value x_{t+1} for X_{t+1} from P(X_{t+1} | X_t = x_t).
- Return x_n.

If n is large enough, then this is (approximately) a sample from P*(X).
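A direct translation of this procedure into Python; the chain, its numbers, and the function name are illustrative:

```python
import numpy as np

def sample_from_chain(init, T, n, rng=np.random.default_rng()):
    """Run the chain for n steps and return the final state.

    For large n this is (approximately) a draw from P*(X).
    """
    x = rng.choice(len(init), p=init)      # X_1 from the initial distribution
    for _ in range(n - 1):
        x = rng.choice(len(T), p=T[x])     # X_{t+1} ~ P(X_{t+1} | X_t = x_t)
    return x

T = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.3, 0.7]])
init = np.array([1/3, 1/3, 1/3])

# The empirical frequency over many independent runs approximates P*.
draws = [sample_from_chain(init, T, 200) for _ in range(2000)]
print(np.bincount(draws, minlength=3) / 2000)
```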

Designing Markov Chains

How do we construct the right chain to sample from?
- Ensuring aperiodicity and irreducibility is usually easy.
- The problem is ensuring the desired stationary probability.

Designing Markov Chains

Key tool (detailed balance): if the transition probability satisfies

  Q(x) · P(X_{t+1} = x' | X_t = x) = Q(x') · P(X_{t+1} = x | X_t = x')  for all x, x'

then P*(X) = Q(X). This gives a local criterion for checking that the chain will have the right stationary distribution.
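This criterion is easy to check mechanically. A sketch, with an invented random-walk-style chain whose stationary distribution Q is known:

```python
import numpy as np

def satisfies_detailed_balance(Q, T):
    """Check Q(x) * P(x -> x') == Q(x') * P(x' -> x) for all pairs x, x'."""
    Q, T = np.asarray(Q), np.asarray(T)
    lhs = Q[:, None] * T          # entry [x, x'] holds Q(x) * P(x -> x')
    return np.allclose(lhs, lhs.T)

# Invented random-walk-style chain; Q below is its stationary distribution.
T = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
Q = np.array([0.25, 0.50, 0.25])

print(satisfies_detailed_balance(Q, T))   # True
print(np.allclose(Q @ T, Q))              # so Q is indeed stationary: P* = Q
```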

MCMC Methods

We can use these results to sample from P(X_1, …, X_n | e).

Idea:
- Construct an ergodic and aperiodic Markov chain such that P*(X_1, …, X_n) = P(X_1, …, X_n | e).
- Simulate the chain for many steps to get a sample.

MCMC Methods

Notes:
- The Markov chain variable Y takes as its values assignments to all the variables that are consistent with the evidence.
- For simplicity, we will denote such a state using the vector of variables.

Gibbs Sampler

One of the simplest MCMC methods: at each transition, change the state of just one X_i.

We can describe the transition probability as a stochastic procedure:
- Input: a state x_1, …, x_n.
- Choose i at random (uniformly).
- Sample x'_i from P(X_i | x_1, …, x_{i−1}, x_{i+1}, …, x_n, e).
- Let x'_j = x_j for all j ≠ i.
- Return x'_1, …, x'_n.
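A minimal runnable sketch of this procedure for a tiny discrete distribution; the 2×2 table P stands in for P(X_1, X_2 | e) and is invented for illustration. The full conditionals are obtained by slicing and renormalizing the joint:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented 2x2 joint standing in for P(X_1, X_2 | e).
P = np.array([[0.30, 0.10],
              [0.15, 0.45]])

def gibbs_step(x):
    """One transition: pick a coordinate uniformly and resample it."""
    x = list(x)
    i = rng.integers(2)                            # choose i at random
    # P(X_i | the other variable): a slice of the joint, renormalized.
    cond = P[:, x[1]] if i == 0 else P[x[0], :]
    x[i] = rng.choice(2, p=cond / cond.sum())
    return tuple(x)

# Long run: empirical state frequencies approach the target joint P.
counts = np.zeros((2, 2))
x = (0, 0)
for _ in range(50_000):
    x = gibbs_step(x)
    counts[x] += 1
print(counts / counts.sum())
```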

Correctness of Gibbs Sampler

By the chain rule,

  P(x_1, …, x_{i−1}, x_i, x_{i+1}, …, x_n | e) = P(x_1, …, x_{i−1}, x_{i+1}, …, x_n | e) · P(x_i | x_1, …, x_{i−1}, x_{i+1}, …, x_n, e)

Thus, applying the same decomposition to the state with x'_i in place of x_i, we get

  P(x_1, …, x_i, …, x_n | e) · P(x'_i | x_1, …, x_{i−1}, x_{i+1}, …, x_n, e) = P(x_1, …, x'_i, …, x_n | e) · P(x_i | x_1, …, x_{i−1}, x_{i+1}, …, x_n, e)

Since we choose i from the same distribution at each stage, this procedure satisfies the ratio criterion, so the stationary distribution of the chain is P(X_1, …, X_n | e).
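The claim can also be verified numerically. For the small invented joint used in the previous sketch, the full Gibbs kernel satisfies the ratio criterion at every pair of states:

```python
import numpy as np

# Same invented 2x2 joint as in the previous sketch.
P = np.array([[0.30, 0.10],
              [0.15, 0.45]])

def gibbs_kernel(x, y):
    """Transition probability T(x -> y) of the Gibbs chain above."""
    t = 0.0
    for i in range(2):                 # the coordinate chosen for resampling
        if x[1 - i] == y[1 - i]:       # move possible only if the other
            cond = P[:, x[1]] if i == 0 else P[x[0], :]   # coord is unchanged
            t += 0.5 * cond[y[i]] / cond.sum()
    return t

states = [(a, b) for a in range(2) for b in range(2)]
print(all(np.isclose(P[x] * gibbs_kernel(x, y),
                     P[y] * gibbs_kernel(y, x))
          for x in states for y in states))   # True: ratio criterion holds
```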

Gibbs Sampling for Bayesian Networks

Why is the Gibbs sampler "easy" in BNs? Recall that the Markov blanket of a variable separates it from the other variables in the network:

  P(X_i | X_1, …, X_{i−1}, X_{i+1}, …, X_n) = P(X_i | Mb_i)

This property allows us to use local computations to perform the sampling in each transition.

Gibbs Sampling in Bayesian Networks

How do we evaluate P(X_i | x_1, …, x_{i−1}, x_{i+1}, …, x_n)?

Let Y_1, …, Y_k be the children of X_i. By the definition of Mb_i, the parents of each Y_j are in Mb_i ∪ {X_i}. It is easy to show that

  P(x_i | x_1, …, x_{i−1}, x_{i+1}, …, x_n) ∝ P(x_i | pa_i) · Π_{j=1..k} P(y_j | pa(Y_j))

where pa_i is the assignment to X_i's parents and each pa(Y_j) includes the value x_i.
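A sketch of this computation for an invented three-node network X → Y ← Z with made-up CPTs; only X's own CPT and its child's CPT enter the conditional:

```python
import numpy as np

# Invented network X -> Y <- Z, all binary; CPTs are made up for illustration.
P_X = np.array([0.6, 0.4])                    # P(X)
P_Z = np.array([0.7, 0.3])                    # P(Z); cancels out below
P_Y = np.array([[[0.9, 0.1], [0.4, 0.6]],     # P(Y | X, Z), indexed [x, z, y]
                [[0.2, 0.8], [0.1, 0.9]]])

def conditional_X(y, z):
    """P(X | y, z) via the Markov blanket of X.

    P(X = x | y, z) is proportional to P(X = x) * P(y | x, z);
    the factor P(Z = z) is the same for every x, so it cancels.
    """
    unnorm = P_X * P_Y[:, z, y]
    return unnorm / unnorm.sum()

print(conditional_X(y=1, z=0))   # e.g. approximately [0.158, 0.842]
```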

Sampling Strategy

How do we collect the samples?

Strategy I:
- Run the chain M times, each run for N steps.
- Each run starts from a different starting point.
- Return the last state of each run.

[Figure: M independent chains, with one sample taken from the end of each.]

Sampling Strategy

Strategy II:
- Run one chain for a long time.
- After some "burn-in" period, sample a point every fixed number of steps.

[Figure: one long chain; after the "burn-in" prefix, M samples are taken from the single chain.]

Comparing Strategies

Strategy I:
- Better chance of "covering" the space of points, especially if the chain is slow to reach stationarity.
- Has to perform the "burn-in" steps for each chain.

Strategy II:
- Performs the "burn-in" only once.
- Samples might be correlated (although only weakly).

Hybrid strategy (sketched below):
- Run several chains, and take a few samples from each.
- Combines the benefits of both strategies.
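A sketch of the hybrid strategy; all names, defaults, and the example chain are illustrative, not from the original slides:

```python
import numpy as np

def collect_samples(step, x0, n_chains=5, burn_in=1_000, gap=10,
                    per_chain=100, rng=np.random.default_rng()):
    """Hybrid strategy: several chains, burn-in per chain, thinning by `gap`.

    `step(x, rng)` performs one transition of the chain (e.g. one Gibbs step).
    """
    samples = []
    for _ in range(n_chains):
        x = x0
        for _ in range(burn_in):          # discard the "burn-in" prefix
            x = step(x, rng)
        for _ in range(per_chain):        # then keep every `gap`-th state
            for _ in range(gap):
                x = step(x, rng)
            samples.append(x)
    return samples

# Example with an invented 2-state chain.
T = np.array([[0.9, 0.1],
              [0.3, 0.7]])
step = lambda x, rng: rng.choice(2, p=T[x])
s = collect_samples(step, x0=0)
print(np.bincount(s) / len(s))   # close to the stationary (0.75, 0.25)
```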