
1 Beam Sampling for the Infinite Hidden Markov Model by Jurgen Van Gael, Yunus Saatci, Yee Whye Teh and Zoubin Ghahramani (ICML 2008). Presented by Lihan He, ECE, Duke University, Nov 14, 2008

2 Outline
• Introduction
• Infinite HMM
• Beam sampler
• Experimental results
• Conclusion

3 Introduction: HMM
HMM: hidden Markov model
[Graphical model: hidden state chain s_0 → s_1 → s_2 → … → s_T with observations y_1, y_2, …, y_T; initial distribution π_0 and transition matrix π]
• Model parameters: initial distribution π_0, transition matrix π, emission parameters φ, number of states K
• Hidden state sequence s = {s_1, s_2, …, s_T}; observation sequence y = {y_1, y_2, …, y_T}
• π_0i = p(s_1 = i), π_ij = p(s_t = j | s_{t-1} = i)
• Complete likelihood: p(s, y | π_0, π, φ) = π_{0 s_1} p(y_1 | φ_{s_1}) ∏_{t=2}^{T} π_{s_{t-1} s_t} p(y_t | φ_{s_t})
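To make the complete-likelihood factorization concrete, here is a minimal Python sketch for a discrete-emission HMM (the function name, array layout, and discrete emissions are illustrative assumptions, not taken from the slides):

```python
import numpy as np

def complete_log_likelihood(s, y, pi0, pi, phi):
    """Log of p(s, y | pi0, pi, phi) for a discrete-emission HMM.

    s   : hidden states, ints in {0, ..., K-1}, length T
    y   : observed symbols, ints in {0, ..., V-1}, length T
    pi0 : initial distribution, shape (K,)
    pi  : transition matrix, pi[i, j] = p(s_t = j | s_{t-1} = i), shape (K, K)
    phi : emission probabilities, phi[k, v] = p(y_t = v | s_t = k), shape (K, V)
    """
    ll = np.log(pi0[s[0]]) + np.log(phi[s[0], y[0]])
    for t in range(1, len(s)):
        ll += np.log(pi[s[t - 1], s[t]]) + np.log(phi[s[t], y[t]])
    return ll
```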

4 Introduction: HMM Inference
Inference of HMM: the forward-backward algorithm
• Maximum likelihood: prone to overfitting
• Bayesian learning: variational Bayes (VB) or MCMC
If we don't know K a priori:
• Model selection: run inference for every candidate K; computationally expensive
• Nonparametric Bayesian model: the iHMM (an HMM with an infinite number of states)
Within the iHMM framework:
• The forward-backward algorithm cannot be applied, since the number of states K is infinite
• Gibbs sampling can be used, but convergence is very slow because of the strong dependencies between consecutive time steps
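For a known, finite K, the forward pass of the forward-backward algorithm mentioned above can be sketched as follows; this is a generic textbook filtering recursion, not code from the paper (names are illustrative):

```python
import numpy as np

def forward_filter(y, pi0, pi, phi):
    """Compute the filtering distributions p(s_t | y_1:t) for a finite discrete HMM."""
    T, K = len(y), len(pi0)
    alpha = np.zeros((T, K))
    alpha[0] = pi0 * phi[:, y[0]]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ pi) * phi[:, y[t]]   # propagate, then weight by emission
        alpha[t] /= alpha[t].sum()                      # normalize each step for stability
    return alpha
```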

5 Introduction: Beam Sampling
Beam sampling = slice sampling + dynamic programming
• Slice sampling: limits the number of states considered at each time step to a finite number
• Dynamic programming: samples the whole state trajectory efficiently
Advantages:
• Converges in far fewer iterations than Gibbs sampling
• Actual complexity per iteration is only marginally higher than that of Gibbs sampling
• Mixes well regardless of strong correlations in the data
• More robust to varying initialization and prior distribution

6 Infinite HMM
Implemented via the hierarchical Dirichlet process (HDP). In the stick-breaking representation, the infinite hidden Markov model is
• β ~ GEM(γ)
• π_k | β ~ DP(α, β)  (transition probabilities out of state k)
• φ_k ~ H  (emission distribution parameter for state k)
• s_t | s_{t-1} ~ Multinomial(π_{s_{t-1}}),  y_t | s_t ~ F(φ_{s_t})
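A rough sketch of this stick-breaking construction, truncated to a finite number of sticks purely for illustration (the truncation level and variable names are assumptions; the actual iHMM places no such limit):

```python
import numpy as np

def sample_ihmm_params(gamma, alpha, K_trunc, rng=None):
    """Truncated draw of beta ~ GEM(gamma) and transition rows pi_k ~ DP(alpha, beta)."""
    if rng is None:
        rng = np.random.default_rng()
    v = rng.beta(1.0, gamma, size=K_trunc)                        # stick-breaking proportions
    beta = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))  # beta_k = v_k * prod_{l<k}(1 - v_l)
    beta /= beta.sum()                                            # renormalize the truncated weights
    pi = rng.dirichlet(alpha * beta + 1e-12, size=K_trunc)        # one Dirichlet row per state
    return beta, pi
```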

7 Beam Sampler
Intuitive idea: only consider the states with large transition probabilities, so that the number of possible states at each time step is finite.
• This is an approximation
• How do we define a "large transition probability"?
• It might change the distributions of the other variables
Idea: introduce auxiliary variables u such that, conditioned on u, the number of trajectories with positive probability is finite.
• The auxiliary variables do not change the marginal distribution over the other variables, so MCMC sampling will still converge to the true posterior

8 Beam Sampler
Sampling u: for each t we introduce an auxiliary variable u_t with conditional distribution (conditional on π, s_{t-1} and s_t)
u_t ~ Uniform(0, π_{s_{t-1} s_t})
Only trajectories s with π_{s_{t-1} s_t} ≥ u_t for all t have non-zero probability given u.
Sampling s: we sample the whole trajectory s given u and the other variables using a form of forward filtering-backward sampling.
• Forward filtering: compute p(s_t | y_1:t, u_1:t) sequentially for t = 1, 2, …, T
• Backward sampling: sample s_t sequentially for t = T, T-1, …, 2, 1
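A minimal sketch of the auxiliary-variable step, assuming exactly this uniform conditional (the array names, and the use of π_0 for the first time step, are illustrative assumptions):

```python
import numpy as np

def sample_slice_variables(s, pi0, pi, rng):
    """Draw u_t ~ Uniform(0, pi[s_{t-1}, s_t]); the first step uses the initial distribution."""
    T = len(s)
    u = np.empty(T)
    u[0] = rng.uniform(0.0, pi0[s[0]])
    for t in range(1, T):
        u[t] = rng.uniform(0.0, pi[s[t - 1], s[t]])
    return u
```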

9 Beam Sampler
Forward filtering: p(s_t | y_1:t, u_1:t) ∝ p(y_t | s_t) Σ_{s_{t-1}: u_t < π_{s_{t-1} s_t}} p(s_{t-1} | y_1:t-1, u_1:t-1)
• Computing p(s_t | -) only needs to sum over a finite part of p(s_{t-1} | -)
• We only need to compute p(s_t | y_1:t, u_1:t) for the finitely many s_t values belonging to some trajectory with positive probability
Backward sampling
• Sample s_T from p(s_T | y_1:T, u_1:T)
• Sample s_t given the sample for s_{t+1}: p(s_t | s_{t+1}, y_1:T, u_1:T) ∝ p(s_t | y_1:t, u_1:t) I(u_{t+1} < π_{s_t s_{t+1}})
Sampling φ, π, β: directly from the theory of HDPs
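A compact sketch of the beam-constrained forward filtering-backward sampling over a fixed set of K instantiated states (the fixed truncation, the handling of the first time step, and all names are assumptions; the real sampler instantiates new states on the fly):

```python
import numpy as np

def beam_ffbs(y, u, pi0, pi, phi, rng):
    """Forward filter p(s_t | y_1:t, u_1:t), then backward-sample one trajectory.
    Only transitions with pi[i, j] > u_t are allowed, so every sum is over a finite set."""
    T, K = len(y), len(pi0)
    alpha = np.zeros((T, K))
    alpha[0] = (pi0 > u[0]) * phi[:, y[0]]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        allowed = pi > u[t]                               # mask of transitions surviving the slice
        alpha[t] = (alpha[t - 1] @ allowed) * phi[:, y[t]]
        alpha[t] /= alpha[t].sum()
    s = np.empty(T, dtype=int)
    s[T - 1] = rng.choice(K, p=alpha[T - 1])
    for t in range(T - 2, -1, -1):                        # backward sampling pass
        w = alpha[t] * (pi[:, s[t + 1]] > u[t + 1])
        s[t] = rng.choice(K, p=w / w.sum())
    return s
```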

10 Experiments
Toy example 1: examining convergence speed and sensitivity to the prior setting
• Transition: 1-2-3-4-1-2-3-…, with self-transition probability p = 0.01
• Observation: discrete
• HMM with strong / vague / fixed prior settings for α and γ
[Figure: number of states summed up, per iteration]

11 Experiments
Toy example 2: examining performance on data with positive correlation
• Self-transition probability = (value shown in figure)

12 Experiments
Real example 1: changepoint detection (Well data)
[Figure panels: state partition from one beam sampling iteration; probability that two datapoints are in the same segment]
• Gibbs sampling: slow convergence, harder segmentation decisions
• Beam sampling: fast convergence, softer segmentation decisions

13 Experiments
Real example 2: text prediction (Alice's Adventures in Wonderland)
• iHMM by Gibbs sampling and by beam sampling: similar results; both converge to around K = 16 states
• VB HMM with model selection: also around K = 16 states, but worse performance than the iHMM

14 Conclusion
• The beam sampler is introduced for iHMM inference
• The beam sampler combines slice sampling and dynamic programming
  - Slice sampling limits the number of states considered at each time step to a finite number
  - Dynamic programming samples whole hidden state trajectories efficiently
• Advantages of the beam sampler:
  - converges faster than the Gibbs sampler
  - mixes well regardless of strong correlations in the data
  - more robust to varying initialization and prior distribution

