Learning Parameters of Hidden Markov Models. Prepared by Dan Geiger.


1 Learning Parameters of Hidden Markov Models. Prepared by Dan Geiger.

2 Nothing is hidden

[Figure: the chain H_1 → H_2 → ... → H_{L-1} → H_L, and below it the full HMM in which each H_i emits an observation X_i.]

Maximum likelihood: P(H_1 = t) = N_t / (N_t + N_f)
Maximum likelihood: P(H_2 = t | H_1 = t) = N_{t,t} / (N_{t,t} + N_{f,t})
And so on for every edge, independently.
Equal-prior MAP: P(H_1 = t) = (a + N_t) / ((a + N_t) + (a + N_f))
How to extend to hidden variables?
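
A minimal Python sketch (not from the slides) of these counting estimates, assuming states are encoded as 1 = t and 0 = f; the function name and the list-of-sequences interface are illustrative. With a = 0 it reproduces the maximum-likelihood formulas above; with a > 0 it gives the equal-prior MAP estimate.

import numpy as np

def estimate_fully_observed(chains, a=0.0):
    """chains: list of 0/1 state sequences (1 = t, 0 = f); a: pseudo-count (a = 0 gives plain ML)."""
    chains = [np.asarray(c) for c in chains]
    # P(H_1 = t): fraction of sequences whose first state is t, with optional pseudo-count a.
    n_t = sum(int(c[0] == 1) for c in chains)
    p_h1_t = (a + n_t) / (2 * a + len(chains))
    # P(H_i = t | H_{i-1} = s): every edge contributes one count, independently of the others.
    counts = np.full((2, 2), a)              # rows: previous state, columns: next state
    for c in chains:
        for prev, nxt in zip(c[:-1], c[1:]):
            counts[prev, nxt] += 1
    p_next_given_prev = counts / counts.sum(axis=1, keepdims=True)
    return p_h1_t, p_next_given_prev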

3 Learning the parameters (EM algorithm)

A common algorithm for learning the parameters from unlabeled sequences is Expectation-Maximization (EM). In the HMM context it reads as follows:
Start with some initial probability tables (many choices).
Iterate until convergence:
E-step: Compute p(h_i, h_{i-1} | x_1,...,x_L) using the current probability tables ("current parameters").
M-step: Use these expected counts to update the local probability tables via the maximum-likelihood formula.
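
As a concrete reference for the E-step, here is a minimal sketch of the forward-backward computation of the pairwise posteriors p(h_{i-1}, h_i | x_1,...,x_L) for a discrete-emission HMM. The parameterization (initial vector pi, transition matrix A, emission table B) and the integer encoding of symbols are assumptions for illustration, not the slides' notation.

import numpy as np

def forward_backward_edges(x, pi, A, B):
    """Posterior p(h_{i-1}=s, h_i=t | x_1..x_L), returned as an array of shape (L-1, K, K).
    x: observed symbols (ints); pi: initial state probs (K,); A: transitions (K, K); B: emissions (K, n_symbols)."""
    L, K = len(x), len(pi)
    f = np.zeros((L, K))                       # scaled forward messages
    b = np.zeros((L, K))                       # scaled backward messages
    f[0] = pi * B[:, x[0]]
    f[0] /= f[0].sum()
    for i in range(1, L):
        f[i] = (f[i - 1] @ A) * B[:, x[i]]
        f[i] /= f[i].sum()
    b[L - 1] = 1.0
    for i in range(L - 2, -1, -1):
        b[i] = A @ (B[:, x[i + 1]] * b[i + 1])
        b[i] /= b[i].sum()
    edges = np.zeros((L - 1, K, K))
    for i in range(1, L):
        m = f[i - 1][:, None] * A * (B[:, x[i]] * b[i])[None, :]
        edges[i - 1] = m / m.sum()             # the scaling constants cancel after normalization
    return edges

The following examples condition and average these edge posteriors in their M-steps.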

4 Example I: Homogeneous HMM, one sample

Start with some probability tables (say λ = μ = ½).
Iterate until convergence:
E-step: Compute p_{λ,μ}(h_i | h_{i-1}, x_1,...,x_L) from p_{λ,μ}(h_i, h_{i-1} | x_1,...,x_L), which is computed using the forward-backward algorithm as explained earlier.
M-step: Update the parameters simultaneously:
λ ← Σ_i p_{λ,μ}(h_i = 1 | h_{i-1} = 0, x_1,...,x_L) / (L-1)
μ ← Σ_i p_{λ,μ}(h_i = 0 | h_{i-1} = 1, x_1,...,x_L) / (L-1)
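
A minimal sketch of one iteration for this example, assuming the two transition parameters are the λ (0 → 1) and μ (1 → 0) of the reconstruction above, that the emission table B and initial distribution pi are known and fixed here, and that forward_backward_edges is the routine sketched after the EM slide.

import numpy as np

def em_step_one_sequence(x, lam, mu, pi, B):
    """One EM iteration for the two transition parameters, following the slide's update."""
    A = np.array([[1 - lam, lam],              # row 0: transitions out of state 0
                  [mu, 1 - mu]])               # row 1: transitions out of state 1
    edges = forward_backward_edges(x, pi, A, B)        # E-step: p(h_{i-1}, h_i | x_1..x_L)
    cond = edges / edges.sum(axis=2, keepdims=True)    # p(h_i | h_{i-1}, x_1..x_L)
    L = len(x)
    new_lam = cond[:, 0, 1].sum() / (L - 1)            # M-step: average of p(h_i=1 | h_{i-1}=0, x)
    new_mu = cond[:, 1, 0].sum() / (L - 1)             # M-step: average of p(h_i=0 | h_{i-1}=1, x)
    return new_lam, new_mu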

5 Example II: Homogeneous HMM, N samples

Start with some probability tables (say λ = μ = ½).
Iterate until convergence:
E-step: Compute p_{λ,μ}(h_i | h_{i-1}, [x_1,...,x_L]_j) for j = 1,...,N, from p_{λ,μ}(h_i, h_{i-1} | [x_1,...,x_L]_j), which is computed using the forward-backward algorithm as explained earlier.
M-step: Update the parameters simultaneously:
λ ← Σ_j Σ_i p_{λ,μ}(h_i = 1 | h_{i-1} = 0, [x_1,...,x_L]_j) / N(L-1)
μ ← Σ_j Σ_i p_{λ,μ}(h_i = 0 | h_{i-1} = 1, [x_1,...,x_L]_j) / N(L-1)
The changes due to N > 1 are the additional sum over j and the factor N in the denominator.
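
A minimal sketch of the same update with N sequences: the only changes are the outer loop over j and the factor N in the denominator. The same illustrative names and assumptions as in the previous sketch apply, and all sequences are assumed to have the same length L.

import numpy as np

def em_step_n_sequences(xs, lam, mu, pi, B):
    """Same update with N sequences: sum over i and j, divide by N(L-1)."""
    A = np.array([[1 - lam, lam], [mu, 1 - mu]])
    num_lam = num_mu = 0.0
    for x in xs:                                       # j = 1, ..., N
        edges = forward_backward_edges(x, pi, A, B)
        cond = edges / edges.sum(axis=2, keepdims=True)
        num_lam += cond[:, 0, 1].sum()                 # inner sum over i
        num_mu += cond[:, 1, 0].sum()
    N, L = len(xs), len(xs[0])                         # sequences of equal length L assumed
    return num_lam / (N * (L - 1)), num_mu / (N * (L - 1))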

6 Example III: Non-homogeneous HMM, N samples

Start with some probability tables (say λ_i = μ_i = ½).
Iterate until convergence:
E-step: Compute p_{λ_i,μ_i}(h_i | h_{i-1}, [x_1,...,x_L]_j) for j = 1,...,N, from p_{λ_i,μ_i}(h_i, h_{i-1} | [x_1,...,x_L]_j), which is computed using the forward-backward algorithm as explained earlier.
M-step: Update the parameters simultaneously:
λ_i ← Σ_j p_{λ_i,μ_i}(h_i = 1 | h_{i-1} = 0, [x_1,...,x_L]_j) / N
μ_i ← Σ_j p_{λ_i,μ_i}(h_i = 0 | h_{i-1} = 1, [x_1,...,x_L]_j) / N
The summation over i and the factor L-1 are now dropped.
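
A minimal sketch of the non-homogeneous M-step, assuming the per-edge conditionals p(h_i | h_{i-1}, [x_1,...,x_L]_j) have already been computed for every sequence (a straightforward per-edge variant of the forward-backward sketch above); the array layout is an assumption for illustration.

import numpy as np

def m_step_non_homogeneous(conds):
    """conds[j, i-1, s, t] = p(h_i = t | h_{i-1} = s, [x_1..x_L]_j), shape (N, L-1, 2, 2)."""
    conds = np.asarray(conds)
    new_lams = conds[:, :, 0, 1].mean(axis=0)          # one lambda_i per edge: average over j only
    new_mus = conds[:, :, 1, 0].mean(axis=0)           # one mu_i per edge
    return new_lams, new_mus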

7 Example IV: Missing emission probabilities

Exercise: Write equations for the remaining parameters (the emission probabilities). Hint: compute P(x_i, h_i | Data).
Often the learned parameters are collectively denoted by θ. E.g., in the context of homogeneous HMMs, if all parameters are learned from data, then θ consists of the initial, transition, and emission probabilities.
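
One standard way to carry out the hint (a sketch, not necessarily the notation the exercise intends): since x_i is observed, P(x_i, h_i | Data) reduces to the node posterior p(h_i | Data) restricted to the observed symbol, so the update normalizes expected (state, symbol) counts by expected state visits. The names and the array layout are assumptions.

import numpy as np

def m_step_emissions(x, node_post, n_symbols):
    """node_post[i, s] = p(h_i = s | x_1..x_L); returns the re-estimated table P(x = v | h = s)."""
    n_states = node_post.shape[1]
    counts = np.zeros((n_states, n_symbols))
    for i, v in enumerate(x):
        counts[:, v] += node_post[i]                   # expected number of times state s emits symbol v
    return counts / counts.sum(axis=1, keepdims=True)  # normalize by expected visits to each state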

8 Viterbi Training

Start with some probability tables (many possible choices).
Iterate until convergence:
E-step (new): Compute the most probable assignment (h_1*,...,h_L*) = argmax P(h_1,...,h_L | x_1,...,x_L) using the current parameters.
M-step: Use the resulting counts to update the local probability tables via maximum likelihood (= N_{s1,s2} / N_{s1}).
Comments:
Useful when the posterior probability centers around the MAP value.
Avoids the inconsistency of adding up each link separately. E.g., one cannot have H_1 = 0, H_2 = 1 and H_2 = 0, H_3 = 1 simultaneously, as we did earlier.
Summing over all joint options is exponential.
A common variant of the EM algorithm for HMMs.
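
A minimal sketch of the decoding used in the new E-step, assuming the same discrete parameterization as before: Viterbi returns the MAP path under the current parameters, and the M-step then counts transitions in the decoded paths exactly as in the fully observed case (the first sketch above).

import numpy as np

def viterbi(x, pi, A, B):
    """Most probable hidden path (the MAP assignment) under the current parameters."""
    L, K = len(x), len(pi)
    logd = np.log(pi) + np.log(B[:, x[0]])             # best log-score of a path ending in each state
    back = np.zeros((L, K), dtype=int)                 # backpointers
    for i in range(1, L):
        scores = logd[:, None] + np.log(A) + np.log(B[:, x[i]])[None, :]
        back[i] = scores.argmax(axis=0)
        logd = scores.max(axis=0)
    path = [int(logd.argmax())]
    for i in range(L - 1, 0, -1):
        path.append(int(back[i, path[-1]]))
    return path[::-1]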

9 Summary of HMM

1. Belief update (posterior decoding): the forward-backward algorithm
2. Maximum a posteriori assignment: the Viterbi algorithm
3. Learning parameters: the EM algorithm; Viterbi training

10 Some applications of HMMs

1. Haplotyping
2. Gene mapping
3. Speech recognition, finance, ...
4. ... you name it, everywhere

11 Haplotyping

[Figure: two hidden chains H_1,...,H_L, each contributing one letter of the observed genotype G_i at every locus i.]

Every G_i is an unordered pair of letters {aa, ab, bb}. The source of one letter is the first chain and the source of the other letter is the second chain. Which letter comes from which chain? (Is it paternal or maternal DNA?)

12 Model of Inheritance

Example with two parents and one child.

[Figure: the model at one locus, its extension to more loci (i, i+1, ...), and to three children.]

