HMM - Part 2: The EM Algorithm and Continuous Density HMM

1 HMM - Part 2 The EM algorithm Continuous density HMM

2 The EM Algorithm
EM: Expectation Maximization
Why EM?
Simple optimization algorithms for likelihood functions rely on intermediate variables, called latent data. For HMM, the state sequence is the latent data.
Direct access to the data necessary to estimate the parameters is impossible or difficult. For HMM, it is almost impossible to estimate (A, B, π) without considering the state sequence.
Two major steps (see the sketch below):
E-step: compute the expectation of the likelihood by treating the latent variables as if they were observed.
M-step: compute the maximum-likelihood estimates of the parameters by maximizing the expected likelihood found in the E-step.
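As an illustration only (not part of the original slides), here is a minimal sketch of the generic EM loop in Python, assuming hypothetical e_step, m_step, and log_likelihood callbacks supplied by the caller:

import numpy as np

def em(observations, params, e_step, m_step, log_likelihood,
       max_iter=100, tol=1e-6):
    """Alternate E- and M-steps until the incomplete-data log-likelihood converges."""
    prev_ll = -np.inf
    for _ in range(max_iter):
        stats = e_step(observations, params)     # E-step: expected statistics of the latent data
        params = m_step(observations, stats)     # M-step: maximize the expected complete-data log-likelihood
        ll = log_likelihood(observations, params)
        if ll - prev_ll < tol:                   # EM never decreases the likelihood
            break
        prev_ll = ll
    return params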

3 Three Steps for EM
Step 1. Draw a lower bound: use Jensen's inequality.
Step 2. Find the best lower bound (the auxiliary function): let the lower bound touch the objective function at the current guess.
Step 3. Maximize the auxiliary function to obtain the new guess; go back to Step 2 until convergence. [Minka 1998]

4 Form an Initial Guess of λ = (A, B, π)
Given the current guess λ, the goal is to find a new guess λ' such that the objective function log P(O|λ') ≥ log P(O|λ). (Figure: the objective function with the current guess marked.)

5 Step 1. Draw a Lower Bound
(Figure: a lower bound function drawn beneath the objective function.)

6 Step 2. Find the Best Lower Bound
(Figure: among the lower bound functions, the auxiliary function is the one that touches the objective function at the current guess.)

7 Step 3. Maximize the Auxiliary Function
(Figure: the maximum of the auxiliary function gives the new guess on the objective function.)

8 Update the Model
(Figure: the new guess becomes the current guess on the objective function.)

9 Step 2. Find the Best Lower Bound
(Figure: a new auxiliary function is drawn, touching the objective function at the updated guess.)

10 Step 3. Maximize the Auxiliary Function
(Figure: maximizing the new auxiliary function gives the next guess; the process repeats until convergence.)

11 Step 1. Draw a Lower Bound (cont’d)
The objective function is log P(O|λ). Jensen's inequality: if f is a concave function and X is a random variable, then E[f(X)] ≤ f(E[X]). Applying it with f = log, log P(O|λ) = log Σ_Q P(O,Q|λ) = log Σ_Q q(Q)·[P(O,Q|λ)/q(Q)] ≥ Σ_Q q(Q)·log[P(O,Q|λ)/q(Q)] for any distribution q(Q) over state sequences; the right-hand side is the lower bound function of log P(O|λ).
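A quick numeric illustration of the inequality (not from the slides), using the concave function f(x) = log(x) and arbitrary positive samples:

import numpy as np

rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=3.0, size=100_000)  # an arbitrary positive random variable
print(np.mean(np.log(x)))   # E[f(X)] -- the smaller value
print(np.log(np.mean(x)))   # f(E[X]) -- the larger value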

12 Step 2. Find the Best Lower Bound (cont’d)
Find the distribution q(Q) that makes the lower bound function touch the objective function at the current guess λ.

13 Step 2. Find the Best Lower Bound (cont’d)
Take the derivative of the lower bound with respect to q(Q), adding a Lagrange multiplier for the constraint Σ_Q q(Q) = 1, and set it to zero; the solution is q(Q) = P(Q|O,λ), i.e., the best lower bound is obtained when q is the posterior of the state sequence under the current guess.

14 Step 2. Find the Best Lower Bound (cont’d)
Define Q(λ, λ') = Σ_Q P(Q|O,λ)·log P(O,Q|λ'), the Q function (auxiliary function). We can check that with q(Q) = P(Q|O,λ) the lower bound equals the objective function at λ' = λ, so choosing λ' to increase Q(λ, λ') guarantees log P(O|λ') ≥ log P(O|λ).

15 EM for HMM Training
Basic idea:
Assume we have λ and the probability that each state sequence Q occurred in the generation of O, i.e., we have in fact observed a complete data pair (O, Q) with frequency proportional to the probability P(O,Q|λ).
We then find a new λ' that maximizes the expectation of the complete-data log-likelihood, Σ_Q P(O,Q|λ)·log P(O,Q|λ').
It can be guaranteed that EM discovers parameters that increase the log-likelihood of the incomplete data, log P(O|λ), by iteratively maximizing the expectation of the log-likelihood of the complete data, log P(O,Q|λ').

16 Solution to Problem 3 - The EM Algorithm
The auxiliary function is Q(λ, λ') = Σ_Q P(O,Q|λ)·log P(O,Q|λ') (it differs from the Q function on slide 14 only by the constant factor 1/P(O|λ), so it has the same maximizer over λ'), where P(O,Q|λ') = π'(q1)·b'q1(o1)·Π_{t=2..T} a'(q(t-1),qt)·b'qt(ot), and therefore log P(O,Q|λ') can be expressed as log π'(q1) + Σ_{t=2..T} log a'(q(t-1),qt) + Σ_{t=1..T} log b'qt(ot).
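For concreteness (toy numbers assumed, not from the slides), this definition can be evaluated by brute-force enumeration of all state sequences of a small discrete HMM:

import itertools
import numpy as np

def joint(params, O, Q):
    """P(O, Q | params) for a discrete HMM with params = (pi, A, B)."""
    pi, A, B = params
    p = pi[Q[0]] * B[Q[0], O[0]]
    for t in range(1, len(O)):
        p *= A[Q[t - 1], Q[t]] * B[Q[t], O[t]]
    return p

def q_function(lam, lam_new, O, N):
    """Q(lam, lam_new) = sum over Q of P(O,Q|lam) * log P(O,Q|lam_new)."""
    return sum(joint(lam, O, Q) * np.log(joint(lam_new, O, Q))
               for Q in itertools.product(range(N), repeat=len(O)))

pi = np.array([0.5, 0.5])                    # toy parameters
A  = np.array([[0.6, 0.4], [0.3, 0.7]])
B  = np.array([[0.8, 0.2], [0.1, 0.9]])
O  = [0, 1, 1]                               # toy observation sequence (symbol indices)
print(q_function((pi, A, B), (pi, A, B), O, N=2))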

17 Solution to Problem 3 - The EM Algorithm (cont’d)
The auxiliary function can be rewritten as
Q(λ, λ') = Σ_i P(O, q1=i|λ)·log π'_i + Σ_i Σ_j Σ_{t=1..T-1} P(O, qt=i, q(t+1)=j|λ)·log a'_ij + Σ_j Σ_{t=1..T} P(O, qt=j|λ)·log b'_j(ot).
Each term has the same shape, a weighted sum of logarithms: for example, weights wi with variables yi for the π' term, wj with yj for the a' term, and wk with yk for the b' term.

18 Solution to Problem 3 - The EM Algorithm (cont’d)
The auxiliary function is separated into three independent terms, which respectively correspond to πi, aij, and bj(k).
The maximization of Q(λ, λ') can therefore be done by maximizing the individual terms separately, subject to the probability constraints Σ_i π'_i = 1, Σ_j a'_ij = 1, and Σ_k b'_j(k) = 1.
All these terms have the following form: maximize Σ_i wi·log yi subject to Σ_i yi = 1, with wi ≥ 0.

19 Solution to Problem 3 - The EM Algorithm (cont’d)
Proof: apply a Lagrange multiplier ε with the constraint Σ_i yi = 1, i.e., maximize F(y) = Σ_i wi·log yi + ε(Σ_i yi − 1). Setting ∂F/∂yi = wi/yi + ε = 0 gives yi = −wi/ε; the constraint then gives ε = −Σ_j wj, so the maximizer is yi = wi / Σ_j wj.
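A small numeric sanity check of this closed-form result (the weights below are arbitrary toy values): random points on the probability simplex never beat the Lagrangian solution.

import numpy as np

w = np.array([3.0, 1.0, 6.0])
y_star = w / w.sum()                          # closed-form maximizer from the Lagrangian

def objective(y):
    return np.sum(w * np.log(y))              # sum_i w_i * log(y_i)

rng = np.random.default_rng(1)
for _ in range(1000):
    y = rng.dirichlet(np.ones(3))             # a random point with sum(y) = 1
    assert objective(y) <= objective(y_star) + 1e-12
print(y_star, objective(y_star))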

20 Solution to Problem 3 - The EM Algorithm (cont’d)
For the π' term, wi = P(O, q1=i|λ) and yi = π'_i, so π'_i = P(O, q1=i|λ) / Σ_i P(O, q1=i|λ) = P(O, q1=i|λ) / P(O|λ).

21 Solution to Problem 3 - The EM Algorithm (cont’d)
For the a' term, wij = Σ_{t=1..T-1} P(O, qt=i, q(t+1)=j|λ) and yij = a'_ij (normalized over j), so a'_ij = Σ_{t=1..T-1} P(O, qt=i, q(t+1)=j|λ) / Σ_{t=1..T-1} P(O, qt=i|λ).

22 Solution to Problem 3 - The EM Algorithm (cont’d)
For the b' term, wjk = Σ_{t: ot=vk} P(O, qt=j|λ) and yjk = b'_j(k) (normalized over k), so b'_j(k) = Σ_{t: ot=vk} P(O, qt=j|λ) / Σ_{t=1..T} P(O, qt=j|λ).

23 Solution to Problem 3 - The EM Algorithm (cont’d)
The new model parameter set λ' = (A', B', π') can be expressed as:
π'_i = P(O, q1=i|λ) / P(O|λ)
a'_ij = Σ_{t=1..T-1} P(O, qt=i, q(t+1)=j|λ) / Σ_{t=1..T-1} P(O, qt=i|λ)
b'_j(k) = Σ_{t: ot=vk} P(O, qt=j|λ) / Σ_{t=1..T} P(O, qt=j|λ)
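A hedged sketch of one such re-estimation step for a toy discrete HMM, computed here by brute-force enumeration of state sequences (all parameter values are assumed toy numbers; this is only feasible at toy sizes, and real implementations obtain the same quantities with the forward-backward recursions):

import itertools
import numpy as np

pi = np.array([0.6, 0.4])                  # toy initial state probabilities
A  = np.array([[0.7, 0.3], [0.4, 0.6]])    # toy transition probabilities a_ij
B  = np.array([[0.9, 0.1], [0.2, 0.8]])    # toy observation probabilities b_j(k)
O  = [0, 1, 0]                             # toy observation sequence o_1..o_T
N, T = len(pi), len(O)

def joint(Q):                              # P(O, Q | lambda)
    p = pi[Q[0]] * B[Q[0], O[0]]
    for t in range(1, T):
        p *= A[Q[t - 1], Q[t]] * B[Q[t], O[t]]
    return p

post = {Q: joint(Q) for Q in itertools.product(range(N), repeat=T)}
Z = sum(post.values())                     # P(O | lambda)
post = {Q: p / Z for Q, p in post.items()} # P(Q | O, lambda)

new_pi = np.zeros(N)
new_A  = np.zeros((N, N))
new_B  = np.zeros_like(B)
for Q, p in post.items():
    new_pi[Q[0]] += p                      # P(O, q1=i | lambda) / P(O | lambda)
    for t in range(T - 1):
        new_A[Q[t], Q[t + 1]] += p         # numerator of a'_ij (before normalization)
    for t in range(T):
        new_B[Q[t], O[t]] += p             # numerator of b'_j(k) (before normalization)
new_A /= new_A.sum(axis=1, keepdims=True)  # normalize each row over j
new_B /= new_B.sum(axis=1, keepdims=True)  # normalize each b'_j(.) over k
print(new_pi, new_A, new_B, sep="\n")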

24 Discrete vs. Continuous Density HMMs
There are two major types of HMMs according to the observations:
Discrete and finite observations: the observations that all distinct states generate are finite in number, i.e., V = {v1, v2, v3, ..., vM}, vk ∈ R^L. In this case, the observation probability distribution in state j, B = {bj(k)}, is defined as bj(k) = P(ot=vk | qt=j), 1 ≤ k ≤ M, 1 ≤ j ≤ N (ot: observation at time t, qt: state at time t), so bj(k) consists of only M probability values.
Continuous and infinite observations: the observations that all distinct states generate are infinite and continuous, i.e., V = {v | v ∈ R^L}. In this case, the observation probability distribution in state j, B = {bj(v)}, is defined as bj(v) = f(ot=v | qt=j), 1 ≤ j ≤ N (ot: observation at time t, qt: state at time t), so bj(v) is a continuous probability density function (pdf), often a mixture of multivariate Gaussian (normal) distributions. A minimal contrast between the two representations is sketched below.
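The sketch below uses toy numbers (not from the slides): the discrete case stores bj(k) in a finite table, while the continuous case evaluates a density bj(v) at any real-valued v.

import numpy as np
from scipy.stats import norm

# Discrete case: B is an N x M table of probabilities b_j(k).
B_discrete = np.array([[0.5, 0.3, 0.2],    # state 1: P(v1), P(v2), P(v3)
                       [0.1, 0.1, 0.8]])   # state 2
print(B_discrete[0, 1])                    # b_1(v2): a stored probability value

# Continuous case: b_j(v) is a pdf, here a 1-D two-component Gaussian mixture.
def b_continuous(v, weights=(0.6, 0.4), means=(0.0, 3.0), sigmas=(1.0, 0.5)):
    return sum(c * norm.pdf(v, m, s) for c, m, s in zip(weights, means, sigmas))

print(b_continuous(1.7))                   # b_j(v): evaluated on demand, not stored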

25 Gaussian Distribution
A continuous random variable X is said to have a Gaussian distribution with mean μ and variance σ^2 (σ > 0) if X has the continuous pdf
f(x) = (1 / (sqrt(2π)·σ)) · exp(−(x−μ)^2 / (2σ^2)), −∞ < x < ∞.
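A direct transcription of this pdf, with an illustrative check against SciPy (the check itself is not part of the slides):

import numpy as np
from scipy.stats import norm

def gaussian_pdf(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)

print(gaussian_pdf(1.0, 0.0, 2.0))         # hand-written pdf
print(norm.pdf(1.0, loc=0.0, scale=2.0))   # same value from SciPy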

26 Multivariate Gaussian Distribution
If X = (X1, X2, X3, ..., XL) is an L-dimensional random vector with a multivariate Gaussian distribution with mean vector μ and covariance matrix Σ, then the pdf can be expressed as
f(x) = (2π)^(−L/2) · |Σ|^(−1/2) · exp(−(1/2)·(x−μ)^T Σ^(−1) (x−μ)).
If X1, X2, X3, ..., XL are independent random variables, the covariance matrix reduces to a diagonal matrix, i.e., Σ = diag(σ1^2, σ2^2, ..., σL^2), and the pdf factorizes into a product of univariate Gaussian pdfs.
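A transcription of the multivariate pdf with a diagonal-covariance example, checked against SciPy (toy numbers assumed):

import numpy as np
from scipy.stats import multivariate_normal

def mvn_pdf(x, mu, cov):
    L = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(cov, diff)            # (x-mu)^T Sigma^{-1} (x-mu)
    return np.exp(-0.5 * quad) / np.sqrt((2.0 * np.pi) ** L * np.linalg.det(cov))

x   = np.array([0.5, -1.0])
mu  = np.array([0.0,  0.0])
cov = np.diag([1.0, 4.0])                               # diagonal: independent components
print(mvn_pdf(x, mu, cov))
print(multivariate_normal.pdf(x, mean=mu, cov=cov))     # same value from SciPy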

27 Multivariate Mixture Gaussian Distribution
An L-dimensional random vector X = (X1, X2, X3, ..., XL) has a multivariate mixture Gaussian distribution if its pdf is a weighted sum of multivariate Gaussian pdfs. In a CDHMM, bj(v) is such a continuous pdf:
bj(v) = Σ_{k=1..M} cjk · N(v; μjk, Σjk), with cjk ≥ 0 and Σ_{k=1..M} cjk = 1,
where v is the observation vector, μjk is the mean vector and Σjk the covariance matrix of the k-th mixture of the j-th state.
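A sketch of bj(v) as such a mixture for a single state j, with assumed toy parameters:

import numpy as np
from scipy.stats import multivariate_normal

c_j   = np.array([0.3, 0.7])                             # mixture weights c_jk, summing to 1
mu_j  = [np.array([0.0, 0.0]), np.array([2.0, -1.0])]    # mean vectors mu_jk
cov_j = [np.eye(2), np.diag([0.5, 2.0])]                 # covariance matrices Sigma_jk

def b_j(v):
    """b_j(v) = sum_k c_jk * N(v; mu_jk, Sigma_jk)."""
    return sum(c * multivariate_normal.pdf(v, mean=m, cov=S)
               for c, m, S in zip(c_j, mu_j, cov_j))

print(b_j(np.array([1.0, 0.0])))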

28 Solution to Problem 3 – The Segmental K-means Algorithm
Assume that we have a training set of observation sequences and an initial estimate of the model parameters.
Step 1. Segment the training data: the set of training observation sequences is segmented into states, based on the current model, by the Viterbi algorithm.
Step 2. Re-estimate the model parameters.
Step 3. Evaluate the model: if the difference between the new and current model scores exceeds a threshold, go back to Step 1; otherwise, return.
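A hedged sketch of this loop, with hypothetical viterbi, reestimate, and model_score helpers passed in by the caller (none of these are defined in the slides):

def segmental_kmeans(train_sequences, model, viterbi, reestimate, model_score,
                     threshold=1e-3, max_iter=50):
    """Hypothetical helpers viterbi/reestimate/model_score are supplied by the caller."""
    score = model_score(train_sequences, model)
    for _ in range(max_iter):
        # Step 1: segment each training sequence into states with the Viterbi algorithm.
        segmentations = [viterbi(model, O) for O in train_sequences]
        # Step 2: re-estimate pi, A, and each state's Gaussian mixtures (e.g., K-means
        # over the frames assigned to each state).
        model = reestimate(train_sequences, segmentations)
        # Step 3: keep iterating only while the score improves by more than the threshold.
        new_score = model_score(train_sequences, model)
        if new_score - score <= threshold:
            break
        score = new_score
    return model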

29 Solution to Problem 3 – The Segmental K-means Algorithm (cont’d)
(Figure: an example with 3 states and 4 Gaussian mixtures per state. Each observation O1...ON is assigned to a state s1, s2, or s3 by Viterbi segmentation; the frames assigned to a state are then clustered by K-means, splitting from the global mean into cluster means, to initialize {μj1, Σj1, cj1} through {μj4, Σj4, cj4} for that state.)

30 Solution to Problem 3 – The Intuitive View (CDHMM)
Define a new variable γt(j,k): the probability of being in state j at time t with the k-th mixture component accounting for ot, i.e., γt(j,k) = γt(j) · [cjk·N(ot; μjk, Σjk) / Σ_{m=1..M} cjm·N(ot; μjm, Σjm)] (using the observation-independent assumption on the mixture component choice).
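A sketch (with assumed toy inputs) of how the state posterior γt(j) is split across mixture components in proportion to cjk·N(ot; μjk, Σjk):

import numpy as np
from scipy.stats import multivariate_normal

def gamma_mixture(o_t, gamma_t_j, c_j, mu_j, cov_j):
    """Return gamma_t(j, k) for every mixture k of one state j, given gamma_t(j)."""
    likes = np.array([c * multivariate_normal.pdf(o_t, mean=m, cov=S)
                      for c, m, S in zip(c_j, mu_j, cov_j)])
    return gamma_t_j * likes / likes.sum()   # the shares sum back to gamma_t(j)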

31 Solution to Problem 3 – The Intuitive View (CDHMM) (cont’d)
Re-estimation formulae for cjk, μjk, and Σjk are:
c'jk = Σ_{t=1..T} γt(j,k) / Σ_{t=1..T} Σ_{k=1..M} γt(j,k)
μ'jk = Σ_{t=1..T} γt(j,k)·ot / Σ_{t=1..T} γt(j,k)
Σ'jk = Σ_{t=1..T} γt(j,k)·(ot−μ'jk)(ot−μ'jk)^T / Σ_{t=1..T} γt(j,k)
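A sketch of these formulae for one state j, assuming gamma_jk is a T x K array of γt(j,k) values and obs is a T x L matrix of observation vectors (both hypothetical inputs):

import numpy as np

def reestimate_state(gamma_jk, obs):
    """Re-estimate (c_jk, mu_jk, Sigma_jk) for one state from gamma_t(j, k)."""
    denom = gamma_jk.sum(axis=0)                      # sum_t gamma_t(j,k), one value per k
    c_new  = denom / gamma_jk.sum()                   # new mixture weights c'_jk
    mu_new = (gamma_jk.T @ obs) / denom[:, None]      # new mean vectors mu'_jk
    cov_new = []
    for k in range(gamma_jk.shape[1]):
        diff = obs - mu_new[k]                        # (o_t - mu'_jk)
        cov_new.append((gamma_jk[:, k, None] * diff).T @ diff / denom[k])
    return c_new, mu_new, np.stack(cov_new)           # new covariance matrices Sigma'_jk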

32 A Simple Example: The Forward/Backward Procedure
(Figure: a trellis with states S1 and S2 at times t = 1, 2, 3, with observations o1, o2, o3.)

33 A Simple Example (cont’d)
With 2 states and 3 observations there are 2^3 = 8 possible state sequences, e.g., q = (1,1,1), q = (1,1,2), and so on; P(O|λ) = Σ_Q P(O,Q|λ) sums the contributions of all 8 paths, which the forward procedure computes without explicit enumeration.
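A toy check in the spirit of this example (the numbers are assumed) that the forward procedure gives the same P(O|λ) as summing P(O,Q|λ) over all 8 paths:

import itertools
import numpy as np

pi = np.array([0.5, 0.5])                  # toy parameters for 2 states
A  = np.array([[0.6, 0.4], [0.3, 0.7]])
B  = np.array([[0.8, 0.2], [0.1, 0.9]])
O  = [0, 1, 1]                             # o1, o2, o3 as symbol indices
N, T = 2, 3

# Brute force: sum P(O, Q | lambda) over all N^T = 8 state sequences.
brute = 0.0
for Q in itertools.product(range(N), repeat=T):
    p = pi[Q[0]] * B[Q[0], O[0]]
    for t in range(1, T):
        p *= A[Q[t - 1], Q[t]] * B[Q[t], O[t]]
    brute += p

# Forward procedure: alpha_t(j) = P(o1..ot, qt = j | lambda).
alpha = pi * B[:, O[0]]
for t in range(1, T):
    alpha = (alpha @ A) * B[:, O[t]]
print(brute, alpha.sum())                  # identical values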

34 A Simple Example (cont’d)

