PGM Tirgul 8: Markov Chains
Stochastic Sampling
- In the previous class we examined methods that use independent samples to estimate P(X = x | e).
- Problem: it is difficult to sample from P(X_1, …, X_n | e) directly.
  - We had to use likelihood weighting to reweight our samples.
  - This introduced bias into the estimate.
  - In some cases, such as when the evidence is on the leaves, these methods are inefficient.
MCMC Methods
- We are going to discuss sampling methods that are based on Markov chains: Markov Chain Monte Carlo (MCMC) methods.
- Key ideas:
  - View the sampling process as a Markov chain: the next sample depends on the previous one.
  - Such chains can approximate any posterior distribution.
- We start by reviewing key ideas from the theory of Markov chains.
Markov Chains
- Suppose X_1, X_2, … take values in some set; w.l.o.g. these values are 1, 2, …
- A Markov chain is a process that corresponds to the network
    X_1 → X_2 → X_3 → … → X_n
- To quantify the chain, we need to specify:
  - Initial probability: P(X_1)
  - Transition probability: P(X_{t+1} | X_t)
- A Markov chain has stationary transition probabilities: P(X_{t+1} | X_t) is the same for all times t.
Irreducible Chains
- A state j is accessible from state i if there is an n such that P(X_n = j | X_1 = i) > 0, i.e., there is positive probability of reaching j from i after some number of steps.
- A chain is irreducible if every state is accessible from every state.
Ergodic Chains
- A state i is positively recurrent if the expected time to return to state i, after being in state i, is finite.
- If the chain has a finite number of states, it suffices that i is accessible from itself.
- A chain is ergodic if it is irreducible and every state is positively recurrent.
(A)periodic Chains
- A state i is periodic if there is an integer d > 1 such that P(X_n = i | X_1 = i) = 0 whenever n is not divisible by d.
- A chain is aperiodic if it contains no periodic state.
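The structural definitions above (accessibility, irreducibility, periodicity) can be checked mechanically on a small transition matrix. The sketch below is a minimal illustration in pure Python, using made-up two-state chains: irreducibility is plain graph reachability over positive-probability edges, and a state's period is estimated as the gcd of the lengths of its positive-probability return paths (up to a cutoff).

```python
import math

def is_irreducible(P):
    """Check that every state is accessible from every state.

    P is a row-stochastic matrix: P[i][j] = P(X_{t+1} = j | X_t = i).
    Accessibility is graph reachability along positive-probability edges."""
    n = len(P)
    for start in range(n):
        seen = {start}
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if P[i][j] > 0 and j not in seen:
                    seen.add(j)
                    stack.append(j)
        if len(seen) < n:
            return False
    return True

def period(P, i, max_len=50):
    """gcd of the lengths of positive-probability return paths i -> i
    (up to max_len steps); period 1 means state i is aperiodic."""
    n = len(P)
    reach = [0.0] * n
    reach[i] = 1.0           # reach[j] > 0 iff a k-step path i -> j exists
    g = 0
    for k in range(1, max_len + 1):
        reach = [sum(reach[a] * P[a][b] for a in range(n)) for b in range(n)]
        if reach[i] > 0:
            g = math.gcd(g, k)
    return g

# A chain that flips state deterministically is irreducible but periodic
# (period 2); adding self-loops makes it aperiodic.
flip = [[0.0, 1.0], [1.0, 0.0]]
lazy = [[0.5, 0.5], [0.5, 0.5]]
```

With `flip`, returns to a state are only possible on even step counts, so its period is 2; `lazy` can return in one step, so its period is 1.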
Stationary Probabilities
Thm: If a chain is ergodic and aperiodic, then the limit
    P*(X = j) = lim_{n→∞} P(X_n = j | X_1 = i)
exists and does not depend on the starting state i. Moreover, P*(X) is the unique probability distribution satisfying the balance equation
    P*(X = j) = Σ_i P*(X = i) P(X_{t+1} = j | X_t = i)
Stationary Probabilities
- The probability P*(X) is the stationary probability of the process.
- Regardless of the starting point, the process converges to this probability.
- The rate of convergence depends on properties of the transition probability.
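The convergence claim can be illustrated numerically. Below is a minimal sketch with a made-up two-state transition matrix: repeatedly applying the transition matrix to two different initial distributions drives both to the same P*, and P* is verified to be a fixed point of the transition.

```python
def step(dist, P):
    """One application of the transition matrix: dist_{t+1}[j] =
    sum_i dist_t[i] * P(X_{t+1} = j | X_t = i)."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# Toy two-state chain (an assumption for illustration);
# its stationary distribution works out to (0.8, 0.2).
P = [[0.9, 0.1],
     [0.4, 0.6]]

# Start from the two different deterministic starting states.
a = [1.0, 0.0]
b = [0.0, 1.0]
for _ in range(200):
    a = step(a, P)
    b = step(b, P)

# The stationary distribution is unchanged by a transition step.
stat = step([0.8, 0.2], P)
```

Both `a` and `b` end up at (0.8, 0.2) regardless of where they started, matching the theorem's claim that the limit does not depend on i.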
Sampling from the Stationary Probability
- This theory suggests how to sample from the stationary probability:
    Set X_1 = i for some random/arbitrary i
    For t = 1, 2, …, n − 1:
        Sample a value x_{t+1} for X_{t+1} from P(X_{t+1} | X_t = x_t)
    Return x_n
- If n is large enough, then x_n is (approximately) a sample from P*(X).
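The procedure above can be sketched directly; this is a hedged illustration using the same made-up two-state matrix as before (stationary distribution (0.8, 0.2)), with a seeded RNG. Running many independent chains and keeping each final state gives an empirical distribution close to P*.

```python
import random

def run_chain(P, x0, n, rng):
    """Simulate n transitions of the chain defined by the row-stochastic
    matrix P, starting at state x0, and return the final state X_n."""
    x = x0
    states = range(len(P))
    for _ in range(n):
        x = rng.choices(states, weights=P[x])[0]
    return x

# Toy chain (assumption for illustration); stationary distribution (0.8, 0.2).
P = [[0.9, 0.1],
     [0.4, 0.6]]
rng = random.Random(0)

# Many independent runs: the empirical distribution of X_n approximates P*,
# even though every run starts in state 1.
samples = [run_chain(P, x0=1, n=50, rng=rng) for _ in range(5000)]
freq0 = samples.count(0) / len(samples)
```

`freq0` lands near 0.8, the stationary probability of state 0, despite the deliberately "wrong" starting state.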
Designing Markov Chains
- How do we construct the right chain to sample from?
- Ensuring aperiodicity and irreducibility is usually easy.
- The problem is ensuring the desired stationary probability.
Designing Markov Chains
Key tool (detailed balance): if the transition probability satisfies
    Q(x) P(X_{t+1} = y | X_t = x) = Q(y) P(X_{t+1} = x | X_t = y)   for all x, y
then P*(X) = Q(X).
- This gives a local criterion for checking that the chain will have the right stationary distribution.
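The local criterion is easy to check pairwise on a finite chain. The sketch below (toy matrix, an assumption for illustration) verifies the condition Q(x)·P(y|x) = Q(y)·P(x|y) for every pair of states; note that detailed balance is a sufficient condition for Q being stationary, not a necessary one.

```python
def satisfies_detailed_balance(Q, P, tol=1e-12):
    """Check the local criterion Q(x) * P(y|x) == Q(y) * P(x|y) for all
    pairs of states x, y.  If it holds (and the chain is ergodic and
    aperiodic), Q is the stationary distribution P*."""
    n = len(Q)
    return all(abs(Q[x] * P[x][y] - Q[y] * P[y][x]) <= tol
               for x in range(n) for y in range(n))

# Toy chain (assumption for illustration); stationary distribution (0.8, 0.2).
P = [[0.9, 0.1],
     [0.4, 0.6]]
```

Here `(0.8, 0.2)` passes the check (0.8·0.1 = 0.2·0.4 = 0.08), while the uniform distribution does not, so the criterion correctly singles out the stationary distribution.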
MCMC Methods
- We can use these results to sample from P(X_1, …, X_n | e).
- Idea: construct an ergodic and aperiodic Markov chain such that P*(X_1, …, X_n) = P(X_1, …, X_n | e), then simulate the chain for sufficiently many steps to get a sample.
MCMC Methods
Notes:
- The Markov chain variable Y takes as values assignments to all variables that are consistent with the evidence.
- For simplicity, we will denote such a state using the vector of variables.
Gibbs Sampler
- One of the simplest MCMC methods: at each transition, change the state of just one X_i.
- We can describe the transition probability as a stochastic procedure:
    Input: a state x_1, …, x_n
    Choose i at random (uniformly)
    Sample x'_i from P(X_i | x_1, …, x_{i−1}, x_{i+1}, …, x_n, e)
    Let x'_j = x_j for all j ≠ i
    Return x'_1, …, x'_n
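The transition procedure above can be sketched end to end on a toy problem. The joint table below is made up purely for illustration; the full conditionals are read off the table by renormalizing over the resampled coordinate, exactly as the procedure requires.

```python
import random

# A made-up joint distribution over two binary variables (for illustration
# only): P(x1, x2) given as a table.  P(X1 = 1) = 0.1 + 0.4 = 0.5.
JOINT = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def conditional(i, state):
    """Full conditional P(X_i | all other variables), obtained from the
    joint table by renormalizing over the two values of coordinate i."""
    probs = []
    for v in (0, 1):
        s = list(state)
        s[i] = v
        probs.append(JOINT[tuple(s)])
    z = sum(probs)
    return [p / z for p in probs]

def gibbs(n_steps, rng):
    """One Gibbs chain: at each step pick a coordinate i uniformly at
    random and resample it from its full conditional; record the state."""
    state = [0, 0]
    samples = []
    for _ in range(n_steps):
        i = rng.randrange(2)
        state[i] = rng.choices((0, 1), weights=conditional(i, state))[0]
        samples.append(tuple(state))
    return samples

rng = random.Random(1)
samples = gibbs(20000, rng)
# Discard a burn-in prefix, then check the marginal of X1 against the table.
kept = samples[2000:]
freq_x1 = sum(s[0] for s in kept) / len(kept)
```

After burn-in, the empirical marginal of X_1 comes out near 0.5, matching the joint table, which is the behavior the stationarity argument on the next slides justifies.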
Correctness of Gibbs Sampler
- By the chain rule,
    P(x_1, …, x_{i−1}, x_i, x_{i+1}, …, x_n | e) = P(x_1, …, x_{i−1}, x_{i+1}, …, x_n | e) P(x_i | x_1, …, x_{i−1}, x_{i+1}, …, x_n, e)
- Thus, for two states x and x' that differ only in coordinate i, we get
    P(x | e) P(x'_i | x_1, …, x_{i−1}, x_{i+1}, …, x_n, e) = P(x' | e) P(x_i | x_1, …, x_{i−1}, x_{i+1}, …, x_n, e)
- Since we choose i from the same (uniform) distribution at each step, the transition probability satisfies the detailed-balance criterion with Q = P(· | e).
Gibbs Sampling for Bayesian Networks
- Why is the Gibbs sampler "easy" in Bayesian networks?
- Recall that the Markov blanket of a variable separates it from the other variables in the network:
    P(X_i | X_1, …, X_{i−1}, X_{i+1}, …, X_n) = P(X_i | Mb_i)
- This property allows us to use local computations to perform the sampling in each transition.
Gibbs Sampling in Bayesian Networks
- How do we evaluate P(X_i | x_1, …, x_{i−1}, x_{i+1}, …, x_n)?
- Let Y_1, …, Y_k be the children of X_i. By the definition of Mb_i, the parents of each Y_j are contained in Mb_i ∪ {X_i}.
- It is easy to show that
    P(x_i | x_1, …, x_{i−1}, x_{i+1}, …, x_n) ∝ P(x_i | pa_i) ∏_{j=1}^{k} P(y_j | pa(Y_j))
  where pa_i denotes the assignment to the parents of X_i.
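The local formula can be shown concretely on the smallest interesting network. The sketch below assumes a made-up three-node chain A → B → C with invented CPTs: the Markov blanket of B is {A, C}, and its full conditional needs only B's own CPT and its child's CPT, followed by normalization.

```python
# Made-up CPTs for a chain network A -> B -> C (for illustration only).
# P_B_given_A[a][b] = P(B = b | A = a);  P_C_given_B[b][c] = P(C = c | B = b).
P_B_given_A = {0: [0.7, 0.3], 1: [0.2, 0.8]}
P_C_given_B = {0: [0.6, 0.4], 1: [0.1, 0.9]}

def gibbs_conditional_B(a, c):
    """P(B | A = a, C = c) via the Markov-blanket formula:
    proportional to P(B | a) * P(c | B), then normalized.  Only the CPTs of
    B and of B's child C are consulted -- a purely local computation."""
    unnorm = [P_B_given_A[a][b] * P_C_given_B[b][c] for b in (0, 1)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

dist = gibbs_conditional_B(0, 1)
```

For a = 0, c = 1 the unnormalized weights are 0.7·0.4 = 0.28 and 0.3·0.9 = 0.27, so the conditional is (0.28, 0.27)/0.55. No summation over the rest of the network is ever needed, which is what makes each Gibbs transition cheap.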
Sampling Strategy
- How do we collect the samples?
- Strategy I: run the chain M times, each run for N steps; each run starts from a different starting point.
- Return the last state in each run, giving M samples from M independent chains.
Sampling Strategy
Strategy II:
- Run one chain for a long time.
- After some "burn-in" period, record a sample every fixed number of steps, giving M samples from one chain.
Comparing Strategies
Strategy I:
- Better chance of "covering" the space of points, especially if the chain is slow to reach stationarity.
- Has to perform the "burn-in" steps for each chain.
Strategy II:
- Performs "burn-in" only once.
- Samples might be correlated (although only weakly).
Hybrid strategy:
- Run several chains, and take a few samples from each.
- Combines the benefits of both strategies.
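Both collection strategies can be sketched side by side on the same made-up two-state chain used earlier (stationary distribution (0.8, 0.2)); the thinning interval and burn-in lengths below are arbitrary illustrative choices, not recommendations.

```python
import random

# Toy chain (assumption for illustration); stationary distribution (0.8, 0.2).
P = [[0.9, 0.1],
     [0.4, 0.6]]

def transition(x, P, rng):
    """One step of the chain from state x."""
    return rng.choices(range(len(P)), weights=P[x])[0]

def strategy_one(P, m_chains, n_steps, rng):
    """Strategy I: run M independent chains for N steps each and keep
    only the last state of every chain (independent but costly samples)."""
    out = []
    for _ in range(m_chains):
        x = 0
        for _ in range(n_steps):
            x = transition(x, P, rng)
        out.append(x)
    return out

def strategy_two(P, burn_in, thin, m_samples, rng):
    """Strategy II: one long chain; discard a burn-in prefix, then record
    every `thin`-th state (cheap but weakly correlated samples)."""
    x = 0
    for _ in range(burn_in):
        x = transition(x, P, rng)
    out = []
    for _ in range(m_samples):
        for _ in range(thin):
            x = transition(x, P, rng)
        out.append(x)
    return out

rng = random.Random(2)
s1 = strategy_one(P, m_chains=2000, n_steps=50, rng=rng)
s2 = strategy_two(P, burn_in=500, thin=5, m_samples=2000, rng=rng)
f1 = s1.count(0) / len(s1)
f2 = s2.count(0) / len(s2)
```

On this fast-mixing toy chain both strategies recover the stationary frequency of state 0 (≈ 0.8); the trade-offs listed above only start to matter when the chain mixes slowly.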