
1
Nonparametric Hidden Markov Models
Jurgen Van Gael and Zoubin Ghahramani

2
Introduction
- HM models: time series with discrete hidden states
- Infinite HM models (iHMM): a nonparametric Bayesian approach
- Equivalence between the Polya urn and HDP interpretations of the iHMM
- Inference algorithms: collapsed Gibbs sampler, beam sampler
- Use of the iHMM: a simple sequence-labeling task

3
Introduction

4
From HMMs to Bayesian HMMs
- An example of an HMM: speech recognition
  - Hidden state sequence: phones
  - Observations: acoustic signals
  - The parameters can come from a physical model of speech, or can be learned from recordings of speech
- Computational questions
  1. Parameters and K given: apply Bayes' rule to find the posterior over the hidden variables; the computation can be done by a dynamic program, the forward-backward algorithm
  2. K given, parameters not given: apply EM
  3. Neither the parameters nor K given: penalization, etc.
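The forward-backward recursion mentioned above can be sketched as follows. This is a minimal illustration for a finite discrete HMM; the matrices `pi`, `A`, `B` and their shapes are my own illustrative setup, not the chapter's notation:

```python
import numpy as np

def forward_backward(pi, A, B, obs):
    """Posterior marginals p(s_t | y_{1:T}) for a discrete HMM.

    pi:  (K,) initial state distribution
    A:   (K, K) transitions, A[i, j] = p(s_t = j | s_{t-1} = i)
    B:   (K, V) emissions, B[k, v] = p(y_t = v | s_t = k)
    obs: length-T list of observation indices
    """
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K))            # forward messages
    beta = np.zeros((T, K))             # backward messages

    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):               # forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):      # backward pass
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

    gamma = alpha * beta                # unnormalized posteriors
    return gamma / gamma.sum(axis=1, keepdims=True)
```

For long sequences the messages should be rescaled at every step to avoid numerical underflow; this sketch omits that for clarity.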

5
From HMMs to Bayesian HMMs
- Fully Bayesian approach
  - Add priors on the parameters and extend the full joint pdf accordingly
  - Compute the marginal likelihood (evidence) for comparing, choosing, or averaging over different values of K
  - Analytic computation of the marginal likelihood is intractable

6
From HMMs to Bayesian HMMs
- Methods for dealing with the intractability
  - MCMC 1: estimate the marginal likelihood explicitly (annealed importance sampling, bridge sampling); computationally expensive
  - MCMC 2: switch between different values of K (reversible-jump MCMC)
  - Approximation using a good state sequence: given the hidden states, the independence of the parameters and the conjugacy between prior and likelihood let the marginal likelihood be computed analytically
  - Variational Bayesian inference: compute a lower bound on the marginal likelihood and apply VB inference

7
Infinite HMM – hierarchical Polya urn
- iHMM: instead of defining K different HMMs, implicitly define a distribution over the number of visited states
- Polya urn (concentration α, with n_i balls of color i):
  - add a ball of a new color with probability α / (α + Σ_i n_i)
  - add a ball of color i with probability n_i / (α + Σ_i n_i)
  - a nonparametric clustering scheme
- Hierarchical Polya urn:
  - assume a separate urn for each state k
  - at each time step t, draw a ball from the urn of the previous state, s_{t-1}
  - the transition probability from state i to state j is interpreted through the number of balls of color j in urn i
  - with some probability the draw is deferred to a shared oracle urn, which couples the urns and can introduce new states
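The two-level urn scheme above can be simulated directly. The sketch below is my own minimal rendering of it (the function name, the `alpha`/`gamma` concentration names, and the simplification that every draw is recorded in the current urn are assumptions, not the chapter's notation):

```python
import random
from collections import defaultdict

def sample_ihmm_states(T, alpha, gamma, seed=0):
    """Generate a length-T state sequence from a hierarchical Polya urn.

    Each state i has its own urn; with probability alpha / (alpha + n)
    the draw is deferred to a shared oracle urn, which itself creates
    a brand-new state with probability gamma / (gamma + m).
    """
    rng = random.Random(seed)
    urns = defaultdict(lambda: defaultdict(int))  # urns[i][j] = balls of color j
    oracle = defaultdict(int)                     # oracle counts per state
    n_states = 1
    states = [0]
    for _ in range(1, T):
        i = states[-1]
        total = sum(urns[i].values())
        if rng.random() < alpha / (alpha + total):
            # defer to the shared oracle urn
            o_total = sum(oracle.values())
            if rng.random() < gamma / (gamma + o_total):
                j = n_states                      # brand-new state
                n_states += 1
            else:
                r = rng.random() * o_total        # existing oracle color
                for j, c in oracle.items():
                    r -= c
                    if r < 0:
                        break
            oracle[j] += 1
        else:
            r = rng.random() * total              # draw from urn i itself
            for j, c in urns[i].items():
                r -= c
                if r < 0:
                    break
        urns[i][j] += 1
        states.append(j)
    return states, n_states
```

Larger `alpha` defers to the shared oracle more often, and larger `gamma` makes the oracle create new states more often, so together they control how many states are visited.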

9
Infinite HMM – HDP

10
HDP and hierarchical Polya Urn

11
Inference
- Gibbs sampler: O(KT²)
- Approximate Gibbs sampler: O(KT)
- The state-sequence variables are strongly correlated, which leads to slow mixing
- The beam sampler is an auxiliary-variable MCMC algorithm
  - it resamples the whole Markov chain at once
  - hence it suffers less from slow mixing

12
Inference – collapsed Gibbs sampler

13
- Sampling s_t from its conditional distribution involves two factors:
  - the conditional likelihood of y_t
  - a second factor, the transition term, which is a draw from a Polya urn

14
Inference – collapsed Gibbs sampler

15
Inference – Beam sampler

16
- The quantities need to be computed only for finitely many (s_t, s_{t-1}) values.

17
Inference – beam sampler
- Complexity: O(TK²) when K states are represented
- Remarks: the auxiliary variables need not be sampled uniformly; a Beta distribution could also be used to bias the auxiliary variables toward the boundaries of their interval
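One sweep of the beam sampler can be sketched on a finite truncation of the model. Everything below is a simplified illustration of mine: in particular, drawing the slice variables `u` below the smallest transition probability is a shortcut, whereas the actual algorithm draws each u_t uniformly on (0, π_{s_{t-1} s_t}) under the current state path:

```python
import numpy as np

def beam_resample_states(pi0, A, B, obs, rng):
    """One beam-sampling sweep for a K-state truncation of an iHMM.

    Slice variables u_t leave only the finitely many transitions with
    A[i, j] > u_t, then a forward filter / backward sample redraws the
    whole state sequence at once.
    """
    T, K = len(obs), A.shape[0]
    # Auxiliary slice variables (sketch: drawn below the smallest entry
    # of A so that, here, every transition survives the slice).
    u = rng.uniform(0, A.min() + 1e-12, size=T)

    # Forward filtering with slice-truncated transitions.
    alpha = np.zeros((T, K))
    alpha[0] = pi0 * B[:, obs[0]]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        trans = np.where(A > u[t], A, 0.0)   # keep only beams above the slice
        alpha[t] = (alpha[t - 1] @ trans) * B[:, obs[t]]
        alpha[t] /= alpha[t].sum()

    # Backward sampling of the whole chain in one pass.
    states = np.empty(T, dtype=int)
    states[-1] = rng.choice(K, p=alpha[-1])
    for t in range(T - 2, -1, -1):
        col = A[:, states[t + 1]]
        w = alpha[t] * np.where(col > u[t + 1], col, 0.0)
        w /= w.sum()
        states[t] = rng.choice(K, p=w)
    return states
```

Because the slice variables zero out all but finitely many transitions, each forward step touches only the surviving "beams", which is what keeps the sweep tractable even for the infinite model.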

18
Example: unsupervised part-of-speech (PoS) tagging
- PoS tagging: annotating the words in a sentence with their appropriate part-of-speech tags
  - "The man sat": 'The' is a determiner, 'man' a noun, 'sat' a verb
  - An HM model is commonly used:
    - observations: words
    - hidden states: the unknown PoS tags
    - usually learned from a corpus of annotated sentences; building such a corpus is expensive
  - In the iHMM:
    - a multinomial likelihood is assumed
    - the base distribution H is a symmetric Dirichlet, so it is conjugate to the multinomial likelihood
  - Trained on section 0 of the WSJ portion of the Penn Treebank: 1917 sentences with a total of 50282 word tokens (observations) and 7904 word types (dictionary size)
  - The sampler is initialized with 50 states and run for 50000 iterations
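The Dirichlet-multinomial conjugacy noted above is what lets the per-state emission distributions be integrated out, leaving only smoothed counts. A minimal sketch (the function name and the counts layout are mine, not the chapter's):

```python
import numpy as np

def emission_predictive(word, state, counts, beta, V):
    """Predictive probability p(y_t = word | s_t = state, rest of data)
    under a symmetric Dirichlet(beta) prior over a V-word dictionary.

    Conjugacy with the multinomial likelihood reduces the predictive
    to smoothed counts: (n_kw + beta) / (n_k + V * beta), where
    counts[state, word] tallies emissions elsewhere in the corpus.
    """
    n_kw = counts[state, word]
    n_k = counts[state].sum()
    return (n_kw + beta) / (n_k + V * beta)
```

Summing the predictive over all V words gives exactly 1, which the conjugate form makes easy to verify.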

19
Example: unsupervised part-of-speech (PoS) tagging
- Top 5 words for the five most common states
  - top line: state ID and frequency
  - rows: the top 5 words with their frequencies in the sample
  - state 9: a class of prepositions
  - state 12: determiners + possessive pronouns
  - state 8: punctuation + some coordinating conjunctions
  - state 18: nouns
  - state 17: personal pronouns

20
Beyond the iHMM: input-output (IO) iHMM
- The Markov chain can be affected by external factors
  - e.g., a robot driving around a room while taking pictures (room index → picture)
  - if the robot follows a particular policy, the robot's actions can be integrated as an input to the iHMM (IO-iHMM)
  - this yields a three-dimensional transition matrix
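The three-dimensional transition structure can be pictured as one transition matrix per input value. A small sketch on a finite truncation (the tensor layout and names are my own illustration):

```python
import numpy as np

# In an IO-iHMM the transitions gain an input dimension. With A_actions
# possible inputs and (a truncation to) K states, transitions become a
# tensor P with P[a, i, j] = p(s_t = j | s_{t-1} = i, input_t = a),
# so each input selects its own row-stochastic transition matrix.
rng = np.random.default_rng(1)
A_actions, K = 3, 4
P = rng.dirichlet(np.ones(K), size=(A_actions, K))   # shape (A_actions, K, K)

def step(state, action, rng):
    """Advance the chain one step given the external input `action`."""
    return rng.choice(K, p=P[action, state])
```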

21
Beyond the iHMM: sticky and block-diagonal iHMM
- The weight on the diagonal of the transition matrix controls the frequency of state transitions
- The probability of staying in state i for g steps follows from the self-transition probability (a geometric distribution)
- Sticky iHMM: add prior probability mass to the diagonal of the transition matrix and apply dynamic-programming-based inference
  - appropriate for segmentation problems where the number of segments is not known a priori
  - a parameter carries extra weight to the diagonal entries and thereby controls the switching rate
- Block-diagonal iHMM: for grouping of states
  - the sticky iHMM is the special case of blocks of size 1
  - larger blocks allow unsupervised clustering of states
  - used for unsupervised learning of view-based object models from video data, where each block corresponds to an object
  - intuition: temporally contiguous video frames are more likely to correspond to different views of the same object than to different objects
- Hidden semi-Markov model: assumes an explicit duration model for the time spent in a particular state
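The effect of extra diagonal mass can be shown on a finite truncation. This is a sketch of the idea only: the function name and the simple additive `kappa` parameterization are mine, whereas the sticky iHMM places the extra mass in the prior over transitions rather than on a fixed matrix:

```python
import numpy as np

def sticky_transition_matrix(base, kappa):
    """Add self-transition mass kappa to a (K, K) row-stochastic matrix
    and renormalize; kappa = 0 recovers the original matrix."""
    K = base.shape[0]
    M = base + kappa * np.eye(K)
    return M / M.sum(axis=1, keepdims=True)
```

With `kappa > 0` every self-transition probability grows, so the expected run length in state i, 1 / (1 - A[i, i]) under the geometric duration noted above, grows too.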

22
Beyond the iHMM: iHMM with a Pitman-Yor base distribution
- Consider the frequency vs. rank of colors (on a log-log scale)
  - under the DP, the distribution implied by the Polya urn is quite specific: the number of colors that appear only once or twice is very small
  - the Pitman-Yor process allows more control over the tails
  - Pitman-Yor fits a power-law distribution (a linear fit in the plot)
  - the DP can be replaced by Pitman-Yor in most cases
  - helpful comments on the beam sampler
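The two-parameter Pitman-Yor urn differs from the Polya urn only in its predictive weights: a new color appears with probability (α + dK) / (α + n) and color i with probability (n_i − d) / (α + n), where d is the discount and K the number of colors so far. A small simulation sketch (names are mine; d = 0 recovers the Polya urn):

```python
import random

def pitman_yor_draws(n, alpha, d, seed=0):
    """Draw n balls from a Pitman-Yor urn (concentration alpha,
    discount 0 <= d < 1). Returns per-color counts and the draw list."""
    rng = random.Random(seed)
    counts = []                       # counts[i] = number of balls of color i
    draws = []
    for _ in range(n):
        total = sum(counts)
        k = len(counts)
        if rng.random() < (alpha + d * k) / (alpha + total):
            counts.append(1)          # brand-new color
            draws.append(k)
        else:
            # existing color i with probability proportional to counts[i] - d
            weights = [c - d for c in counts]
            r = rng.random() * sum(weights)
            for i, w in enumerate(weights):
                r -= w
                if r < 0:
                    break
            counts[i] += 1
            draws.append(i)
    return counts, draws
```

For d > 0 the number of distinct colors grows like a power of n rather than logarithmically, which matches the power-law tail mentioned above.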

23
Beyond the iHMM: autoregressive iHMM, SLD-iHMM
- AR-iHMM: the observations follow autoregressive dynamics
- SLD-iHMM: part of the continuous variables are observed, and the unobserved variables follow linear dynamics
- (Figures: the SLD model and the FA-HMM model)
