Learning Hidden Markov Models. Tutorial #7. © Ilan Gronau. Based on original slides of Ydo Wexler & Dan Geiger.


1 Learning Hidden Markov Models. Tutorial #7. © Ilan Gronau. Based on original slides of Ydo Wexler & Dan Geiger

2 Estimating model parameters. Reminder: given a training data set, infer the model parameters Θ = θ_1, θ_2, θ_3, … by MLE (maximum-likelihood estimation).

3 Estimating model parameters for HMMs. Estimate the parameters of the model (transition, emission, and initial probabilities) given a training data set. Two settings: supervised, where the state path is given along with the sequence, and unsupervised, where the state path is unknown. [Diagram: the fair/loaded coin HMM. Start: 1/2 to each state; transitions: stay 0.9, switch 0.1; emissions: fair H 1/2, T 1/2; loaded H 3/4, T 1/4.]

4 Supervised learning of HMMs. The state path S_1 … S_L is given along with the sequence X_1 … X_L. The likelihood of a given set of parameters Θ is Pr[X_1 … X_L, S_1 … S_L | Θ].

5 Supervised learning of HMMs. The state path is given along with the sequence. We wish to find the Θ which maximizes Pr[X_1 … X_L, S_1 … S_L | Θ]. This lets us maximize independently for each state s: P_trans(s, -) and P_emit(s, -), each by the MLE for a multinomial distribution, plus pseudo-counts.
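As a minimal sketch of this supervised MLE, the following counts transitions and emissions along the given state paths and normalizes each state's multinomial with pseudo-counts (the function name `supervised_mle` and its argument layout are illustrative, not from the slides):

```python
def supervised_mle(sequences, paths, states, symbols, pseudo=1.0):
    """MLE of HMM transition/emission probabilities from labeled data.

    Each (x, s) pair is an observed sequence and its known state path.
    Pseudo-counts keep unseen transitions/emissions at nonzero probability.
    """
    trans = {a: {b: pseudo for b in states} for a in states}
    emit = {a: {x: pseudo for x in symbols} for a in states}
    for x, s in zip(sequences, paths):
        for i in range(len(s)):
            emit[s[i]][x[i]] += 1           # count emission s_i -> x_i
            if i > 0:
                trans[s[i - 1]][s[i]] += 1  # count transition s_{i-1} -> s_i
    # normalize each state's multinomial independently, as the slide notes
    for dist in list(trans.values()) + list(emit.values()):
        total = sum(dist.values())
        for k in dist:
            dist[k] /= total
    return trans, emit
```

Each row of `trans` and `emit` is a separate multinomial, which is why the maximization decomposes per state.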

6 Unsupervised learning of HMMs. The sequence is not labeled by states. We wish to find the Θ which maximizes Pr[X_1 … X_L | Θ] = Σ_Š Pr[X_1 … X_L, Š | Θ], summing over all possible state paths Š. There is no efficient general-purpose method to find this maximum. Heuristic solution (the EM algorithm): 1. Guess an initial set of parameters. 2. Iteratively improve the assessment.
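To make the sum over paths concrete, here is a sketch that evaluates Pr[X_1 … X_L | Θ] for the fair/loaded coin model (parameters taken from the coin-tossing slides below) by brute force over all 2^L state paths; this is feasible only for tiny L, which is why the forward algorithm and EM are needed:

```python
from itertools import product

# Fair/loaded coin HMM from the coin-tossing example
START = {'F': 0.5, 'L': 0.5}
TRANS = {'F': {'F': 0.9, 'L': 0.1}, 'L': {'F': 0.1, 'L': 0.9}}
EMIT  = {'F': {'H': 0.5, 'T': 0.5}, 'L': {'H': 0.75, 'T': 0.25}}

def brute_force_likelihood(x):
    """Pr[X | Theta] as an explicit sum over all |S|^L state paths."""
    total = 0.0
    for path in product('FL', repeat=len(x)):
        p = START[path[0]] * EMIT[path[0]][x[0]]
        for i in range(1, len(x)):
            p *= TRANS[path[i - 1]][path[i]] * EMIT[path[i]][x[i]]
        total += p
    return total
```

For X = HHT this sums 8 path probabilities and returns about 0.1371, matching the likelihood later read off the forward table.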

7 Baum-Welch: EM for HMMs. EM (Expectation-Maximization) is an algorithm for learning the parameters from unlabeled sequences. Start with some set of parameters (many possible choices) and iterate until convergence: E-step: compute Pr[S_i, X_1,…,X_L] and Pr[S_{i-1}, S_i, X_1,…,X_L] using the current set of parameters; there are L*|S| + (L-1)*|S|^2 such expressions to compute. M-step: use the expected counts of transitions/emissions to update the parameter set.

8 Example: 2-state / 2-signal HMM, with 2 parameters λ, φ. Start with some set of parameters (e.g. λ = φ = ½) and iterate until convergence: E-step: compute Pr[S_{i-1}=0/1, S_i=0/1, X_1,…,X_L | λ,φ] using the forward/backward algorithms (we will show how). M-step: update λ and φ simultaneously (plus pseudo-counts):
φ ← Σ_i Pr[S_{i-1}=0, S_i=1, X_1,…,X_L | λ,φ] / Σ_i Pr[S_{i-1}=0, X_1,…,X_L | λ,φ]
λ ← Σ_i Pr[S_{i-1}=1, S_i=0, X_1,…,X_L | λ,φ] / Σ_i Pr[S_{i-1}=1, X_1,…,X_L | λ,φ]

9 Reminder from last week: decomposing the computation.
Pr[X_1,…,X_L, S_i=S] = Pr[X_1,…,X_i, S_i=S] * Pr[X_{i+1},…,X_L | X_1,…,X_i, S_i=S]
= Pr[X_1,…,X_i, S_i=S] * Pr[X_{i+1},…,X_L | S_i=S]   (Markov property)
= f_i(S) * b_i(S)

10 The E-step.
Pr[S_i=S, X_1,…,X_L] = f_i(S) * b_i(S)   (from last week)
Pr[S_{i-1}=S, S_i=S', X_1,…,X_L | λ,φ] = f_{i-1}(S) * P_trans[S→S'] * P_emit[S'→X_i] * b_i(S')   (prove in HW #4)
Special case i=L: Pr[S_{L-1}=S, S_L=S', X_1,…,X_L | λ,φ] = f_{L-1}(S) * P_trans[S→S'] * P_emit[S'→X_L], so we define b_L(S') = 1 (for all S').

11 Coin-tossing example (reminder). Hidden states: fair/loaded; signals: head/tail. [Diagram: the fair/loaded coin HMM. Start: 1/2 to each state; transitions: stay 0.9, switch 0.1; emissions: fair H 1/2, T 1/2; loaded H 3/4, T 1/4.]

12 Example: 2-state / 2-signal HMM with a single parameter θ (the probability of staying in the same state). Start with some assignment (e.g. θ = 0.9) and iterate until convergence: E-step: compute Pr[S_{i-1}=L/F, S_i=L/F, X_1,…,X_L | θ] using the forward/backward algorithms (as previously explained). M-step: update θ:
θ ← Σ_i Pr[S_{i-1} = S_i (=L/F), X_1,…,X_L | θ] / Σ_i Pr[S_{i-1} = L/F, X_1,…,X_L | θ]
where the denominator equals (L-1) * Pr[X_1,…,X_L | θ] (the likelihood).

13 Coin-tossing example. Outcome of 3 tosses: Head, Head, Tail. Recall:
f_i(S) = Pr[X_1,…,X_i, S_i=S] = Σ_S' ( f_{i-1}(S') * P_trans[S'→S] * P_emit[S→X_i] )
b_i(S) = Pr[X_{i+1},…,X_L | S_i=S] = Σ_S' ( P_trans[S→S'] * P_emit[S'→X_{i+1}] * b_{i+1}(S') )
Last time we calculated:
forward  | S_1   | S_2      | S_3
Loaded   | 0.375 | 0.271875 | 0.06445
Fair     | 0.25  | 0.13125  | 0.07265
backward | S_1   | S_2      | S_3
Loaded   | 0.209 | 0.275    | (1)
Fair     | 0.234 | 0.475    | (1)
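The two recursions above can be sketched directly (the dict-based tables and function names are illustrative); with the coin parameters and X = HHT this reproduces the forward and backward tables:

```python
# Fair/loaded coin HMM parameters from the slides
START = {'F': 0.5, 'L': 0.5}
TRANS = {'F': {'F': 0.9, 'L': 0.1}, 'L': {'F': 0.1, 'L': 0.9}}
EMIT  = {'F': {'H': 0.5, 'T': 0.5}, 'L': {'H': 0.75, 'T': 0.25}}
STATES = 'FL'

def forward(x):
    """f[i][s] = Pr[X_1..X_{i+1}, S_{i+1}=s]  (0-based list index)."""
    f = [{s: START[s] * EMIT[s][x[0]] for s in STATES}]
    for i in range(1, len(x)):
        f.append({s: sum(f[-1][r] * TRANS[r][s] for r in STATES) * EMIT[s][x[i]]
                  for s in STATES})
    return f

def backward(x):
    """b[i][s] = Pr[X_{i+2}..X_L | S_{i+1}=s], with b_L(s) = 1."""
    b = [{s: 1.0 for s in STATES}]
    for i in range(len(x) - 1, 0, -1):
        # b[0] still holds b_{i+1} while the new dict is being built
        b.insert(0, {s: sum(TRANS[s][r] * EMIT[r][x[i]] * b[0][r] for r in STATES)
                     for s in STATES})
    return b
```

A useful consistency check: Σ_s f_i(s) * b_i(s) gives the same value, Pr[X_1,…,X_L], for every position i.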

14 Coin-tossing example: the E-step. Outcomes: Head, Head, Tail.
Pr[S_1=S, S_2=S', HHT | θ] = f_1(S) * P_trans[S→S'] * P_emit[S'→H] * b_2(S')
Pr[S_1=Loaded, S_2=Loaded, HHT | θ] = 0.375 * 0.9 * 0.75 * 0.275 = 0.0696
Pr[S_1=Loaded, S_2=Fair, HHT | θ] = 0.375 * 0.1 * 0.5 * 0.475 = 0.0089
Pr[S_1=Fair, S_2=Loaded, HHT | θ] = 0.25 * 0.1 * 0.75 * 0.275 = 0.0052
Pr[S_1=Fair, S_2=Fair, HHT | θ] = 0.25 * 0.9 * 0.5 * 0.475 = 0.0534
(forward/backward values as on the previous slide)

15 Coin-tossing example: the E-step (cont.). Outcomes: Head, Head, Tail.
Pr[S_2=S, S_3=S', HHT | θ] = f_2(S) * P_trans[S→S'] * P_emit[S'→T] * b_3(S')
Pr[S_2=Loaded, S_3=Loaded, HHT | θ] = 0.271875 * 0.9 * 0.25 * 1 = 0.0612
Pr[S_2=Loaded, S_3=Fair, HHT | θ] = 0.271875 * 0.1 * 0.5 * 1 = 0.0136
Pr[S_2=Fair, S_3=Loaded, HHT | θ] = 0.13125 * 0.1 * 0.25 * 1 = 0.0033
Pr[S_2=Fair, S_3=Fair, HHT | θ] = 0.13125 * 0.9 * 0.5 * 1 = 0.0591
(b_3 = 1 for both states; the emission factor here is P_emit[Loaded→T] = 0.25 or P_emit[Fair→T] = 0.5, since the third toss is Tail)
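Putting the formula and the tables together, a small sketch (values hard-coded from the slides' forward/backward tables; `pair_prob` is an illustrative name) reproduces the eight E-step terms:

```python
# Forward/backward tables for X = HHT, as computed on slide 13
f = [{'L': 0.375, 'F': 0.25},
     {'L': 0.271875, 'F': 0.13125},
     {'L': 0.064453125, 'F': 0.07265625}]
b = [{'L': 0.209375, 'F': 0.234375},
     {'L': 0.275, 'F': 0.475},
     {'L': 1.0, 'F': 1.0}]
TRANS = {'F': {'F': 0.9, 'L': 0.1}, 'L': {'F': 0.1, 'L': 0.9}}
EMIT  = {'F': {'H': 0.5, 'T': 0.5}, 'L': {'H': 0.75, 'T': 0.25}}

def pair_prob(i, s, t, x):
    """Pr[S_{i+1}=s, S_{i+2}=t, X | theta], with 0-based list index i."""
    return f[i][s] * TRANS[s][t] * EMIT[t][x[i + 1]] * b[i + 1][t]
```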

16 Coin-tossing example: the M-step. Update θ ← Σ_i Pr[S_{i-1} = S_i (=L/F), X_1,…,X_L | θ] / Σ_i Pr[S_{i-1} = L/F, X_1,…,X_L | θ], where the denominator equals (L-1) * Pr[X_1,…,X_L | θ] (the likelihood). For HHT:
θ ← (Pr[S_1=S_2, HHT | θ] + Pr[S_2=S_3, HHT | θ]) / (2 * Pr[HHT | θ])
We saw last week: Pr[X_1,…,X_L | θ] = Σ_S f_L(S), so Pr[HHT | θ] = 0.06445 + 0.07265 = 0.1371.
θ ← ((0.0696 + 0.0534) + (0.0612 + 0.0591)) / (2 * 0.1371) = 0.8873
(each parenthesized pair sums the L→L and F→F terms from the E-step)
Continue iterating… converges to ?
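The update is just this arithmetic, with the rounded slide values plugged in:

```python
# Same-state E-step terms from slides 14-15 (rounded)
step12 = 0.0696 + 0.0534          # Pr[S_1 = S_2, HHT | theta]: L->L plus F->F
step23 = 0.0612 + 0.0591          # Pr[S_2 = S_3, HHT | theta]: L->L plus F->F
likelihood = 0.06445 + 0.07265    # Pr[HHT | theta] = sum of forward values at i = L

theta_new = (step12 + step23) / (2 * likelihood)
print(round(theta_new, 4))        # -> 0.8873
```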

17 Coin-tossing example: learning simulation. [Plot: θ over EM iterations, for one run starting at θ = 0.001 and one starting at θ = 0.999.]
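The simulation can be reproduced with a short Baum-Welch loop specialized to the single parameter θ (a sketch; the function name and iteration count are illustrative). A single iteration from θ = 0.9 on HHT gives the 0.8873 of the previous slide, and long runs from both starting points settle on the same interior fixed point:

```python
def em_theta(x, theta, iters=1000):
    """Baum-Welch on the coin HMM where the only free parameter is
    theta = Pr[stay in same state]; emissions and start are held fixed."""
    S = 'FL'
    EMIT = {'F': {'H': 0.5, 'T': 0.5}, 'L': {'H': 0.75, 'T': 0.25}}
    for _ in range(iters):
        TRANS = {'F': {'F': theta, 'L': 1 - theta},
                 'L': {'F': 1 - theta, 'L': theta}}
        # forward recursion (start probability 1/2 for each state)
        f = [{s: 0.5 * EMIT[s][x[0]] for s in S}]
        for i in range(1, len(x)):
            f.append({s: sum(f[-1][r] * TRANS[r][s] for r in S) * EMIT[s][x[i]]
                      for s in S})
        # backward recursion, b_L(s) = 1
        b = [{s: 1.0 for s in S}]
        for i in range(len(x) - 1, 0, -1):
            b.insert(0, {s: sum(TRANS[s][r] * EMIT[r][x[i]] * b[0][r] for r in S)
                         for s in S})
        # E-step: expected same-state transitions; M-step: renormalize
        same = sum(f[i][s] * TRANS[s][s] * EMIT[s][x[i + 1]] * b[i + 1][s]
                   for i in range(len(x) - 1) for s in S)
        likelihood = sum(f[-1][s] for s in S)
        theta = same / ((len(x) - 1) * likelihood)
    return theta
```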

