Parameter estimation for HMMs, Baum-Welch algorithm, Model topology, Numerical stability. Chapter 3.3-3.7. (S. Maarschalkerweerd & A. Tjhang)

Presentation transcript:

1 Parameter estimation for HMMs, Baum-Welch algorithm, Model topology, Numerical stability (Chapter 3.3-3.7)

2 Overview of last lecture
Hidden Markov Models
Different algorithms:
– Viterbi
– Forward
– Backward

3 Overview of today
Parameter estimation for HMMs
– Baum-Welch algorithm
HMM model structure
More complex Markov chains
Numerical stability of HMM algorithms

4 Specifying an HMM
The most difficult problem in using HMMs is specifying the model:
– Design of the structure
– Assignment of parameter values

6 Parameter estimation for HMMs
Estimate the transition and emission probabilities a_kl and e_k(b).
Two ways of learning:
– Estimation when the state sequence is known
– Estimation when the paths are unknown
Assume that we have a set of example sequences (training sequences x_1, …, x_n).

7 Parameter estimation for HMMs
Assume that x_1, …, x_n are independent, so P(x_1, …, x_n | θ) = ∏_{j=1}^{n} P(x_j | θ).
Since log(ab) = log a + log b, the log likelihood becomes a sum: log P(x_1, …, x_n | θ) = Σ_{j=1}^{n} log P(x_j | θ).

8 Estimation when the state sequence is known
Easier than estimation when the paths are unknown.
A_kl = number of transitions from k to l in the training data + r_kl
E_k(b) = number of emissions of b from k in the training data + r_k(b)
Here r_kl and r_k(b) are pseudocounts; the ML estimators are then a_kl = A_kl / Σ_l' A_kl' and e_k(b) = E_k(b) / Σ_b' E_k(b').
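A minimal sketch of this counting scheme in Python (the function name, the toy data and the pseudocount value are illustrative assumptions, not from the slides):

def estimate_parameters(sequences, paths, states, alphabet, pseudo=1.0):
    """Count transitions A_kl and emissions E_k(b), then normalize."""
    # Initialize counts with pseudocounts r_kl and r_k(b):
    A = {k: {l: pseudo for l in states} for k in states}
    E = {k: {b: pseudo for b in alphabet} for k in states}
    for x, pi in zip(sequences, paths):
        for i in range(len(pi) - 1):
            A[pi[i]][pi[i + 1]] += 1          # transition pi_i -> pi_{i+1}
        for sym, state in zip(x, pi):
            E[state][sym] += 1                # emission of sym from state
    # ML estimators: a_kl = A_kl / sum_l' A_kl', e_k(b) = E_k(b) / sum_b' E_k(b')
    a = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
    e = {k: {b: E[k][b] / sum(E[k].values()) for b in alphabet} for k in states}
    return a, e

# Toy usage: a two-state model ('+' and '-') over DNA symbols.
a, e = estimate_parameters(["GGCA"], ["++--"], states="+-", alphabet="ACGT")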

9 Estimation when the paths are unknown
More complex than when the paths are known.
We cannot use the maximum likelihood estimators directly.
Instead, an iterative algorithm is used:
– Baum-Welch

10 The Baum-Welch algorithm
We do not know the real values of A_kl and E_k(b).
1. Estimate A_kl and E_k(b) from the current model
2. Update a_kl and e_k(b)
3. Repeat with the new model parameters a_kl and e_k(b)

11 Baum-Welch algorithm
The expected counts are computed from the forward values f_k(i) and the backward values b_k(i):
A_kl = Σ_j (1 / P(x^j)) Σ_i f_k^j(i) a_kl e_l(x^j_{i+1}) b_l^j(i+1)
E_k(b) = Σ_j (1 / P(x^j)) Σ_{i: x^j_i = b} f_k^j(i) b_k^j(i)

12 Baum-Welch algorithm
Now that we have estimated A_kl and E_k(b), use the maximum likelihood estimators to compute a_kl and e_k(b).
We use these values to estimate A_kl and E_k(b) in the next iteration.
Continue iterating until the change is very small or a maximum number of iterations is exceeded.

13 Baum-Welch algorithm [figure: the algorithm]
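The algorithm slide itself is a figure, so here is a compact sketch of one way to implement Baum-Welch for a discrete HMM in Python. The variable names and integer symbol encoding are assumptions; probabilities are left unscaled for clarity, which is only safe for short sequences (see the numerical-stability slides later).

import numpy as np

def forward(x, a, e, pi0):
    n, K = len(x), a.shape[0]
    f = np.zeros((n, K))
    f[0] = pi0 * e[:, x[0]]
    for i in range(1, n):
        f[i] = e[:, x[i]] * (f[i - 1] @ a)   # f_l(i) = e_l(x_i) sum_k f_k(i-1) a_kl
    return f                                 # f[i, k] = P(x_1..x_i, state_i = k)

def backward(x, a, e):
    n, K = len(x), a.shape[0]
    b = np.zeros((n, K))
    b[-1] = 1.0
    for i in range(n - 2, -1, -1):
        b[i] = a @ (e[:, x[i + 1]] * b[i + 1])
    return b                                 # b[i, k] = P(x_{i+1}..x_n | state_i = k)

def baum_welch(seqs, a, e, pi0, n_iter=50):
    """seqs: sequences of integer symbol indices, e.g. A=0, C=1, G=2, T=3."""
    K, M = e.shape
    for _ in range(n_iter):
        A = np.zeros((K, K))
        E = np.zeros((K, M))
        for x in seqs:
            f, b = forward(x, a, e, pi0), backward(x, a, e)
            px = f[-1].sum()                 # P(x | current model)
            # Expected transitions: A_kl += f_k(i) a_kl e_l(x_{i+1}) b_l(i+1) / P(x)
            for i in range(len(x) - 1):
                A += np.outer(f[i], e[:, x[i + 1]] * b[i + 1]) * a / px
            # Expected emissions: E_k(b) += f_k(i) b_k(i) / P(x) for positions with x_i = b
            for i, sym in enumerate(x):
                E[:, sym] += f[i] * b[i] / px
        a = A / A.sum(axis=1, keepdims=True)  # re-estimated a_kl
        e = E / E.sum(axis=1, keepdims=True)  # re-estimated e_k(b)
    return a, e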

14 Example [figure]

15 [figure: example, continued]

16 Drawbacks
ML estimators:
– Vulnerable to overfitting if there is not enough data
– Estimates can be undefined for transitions or emissions never seen in the training set (use pseudocounts)
Baum-Welch:
– A local maximum instead of the global maximum can be found, depending on the starting values of the parameters
– This problem gets worse for large HMMs

17 Modelling of labelled sequences
Only the -- and ++ transitions are calculated.
Better than using plain ML estimators when many different classes are present.

18 Specifying an HMM
The most difficult problem in using HMMs is specifying the model:
– Design of the structure
– Assignment of parameter values

19 Design of the structure
Design: how to connect states by transitions.
A good HMM is based on knowledge about the problem under investigation.
Local maxima are the biggest disadvantage of fully connected models.
Deleting a transition from the model amounts to setting its transition probability to zero; Baum-Welch will still work.

20 Example 1
Geometric length distribution: a single state with self-transition probability p and exit probability 1-p gives P(length = L) = p^(L-1) (1-p).
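A quick empirical check of this claim (my own toy simulation, not from the slide):

import random

def sample_length(p):
    L = 1
    while random.random() < p:  # stay in the state with probability p
        L += 1
    return L

p = 0.8
lengths = [sample_length(p) for _ in range(100_000)]
mean = sum(lengths) / len(lengths)
print(mean, 1 / (1 - p))  # empirical mean vs. theoretical mean 1/(1-p)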

21 Example 2: model a length distribution between 2 and 10 [figure]

22 Example 3 [figure]

23 Silent states
States that do not emit symbols.
They also occur in other places in an HMM, e.g. the begin state B.

24 Example: silent states [figure]

25 Silent states
Advantage:
– Fewer transition probabilities need to be estimated
Drawback:
– Limits the possibilities for defining a model

26 More complex Markov chains
So far, we assumed that the probability of a symbol in a sequence depends only on the previous symbol.
More complex:
– Higher-order Markov chains
– Inhomogeneous Markov chains

27 Higher-order Markov chains
In an nth-order Markov process, the probability of a symbol in a sequence depends on the previous n symbols.
An nth-order Markov chain over some alphabet A is equivalent to a first-order Markov chain over the alphabet A^n of n-tuples, because P(AB|B) = P(A|B).

28 Example
A second-order Markov chain with two different symbols {A, B}.
This can be translated into a first-order Markov chain over the 2-tuples {AA, AB, BA, BB}.
Sometimes the framework of the higher-order model is more convenient.
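A small sketch of this translation (the second-order probabilities are arbitrary assumed values): tuple (x, y) can only move to tuples of the form (y, z), with probability P(z | x, y).

from itertools import product

symbols = "AB"
# Assumed second-order probabilities P(z | x, y), chosen arbitrarily here:
p2 = {(x, y): {"A": 0.7, "B": 0.3} for x, y in product(symbols, repeat=2)}

# Equivalent first-order transition matrix over pairs:
pairs = ["".join(t) for t in product(symbols, repeat=2)]
a = {s: {t: 0.0 for t in pairs} for s in pairs}
for x, y in product(symbols, repeat=2):
    for z in symbols:
        a[x + y][y + z] = p2[(x, y)][z]  # (x,y) -> (y,z) with P(z | x, y)

print(a["AB"])  # AB can reach only BA and BB; each row still sums to 1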

29 Finding prokaryotic genes
Gene candidates in DNA are sequences of nucleotide triplets: a start codon, a number of non-stop codons, and a stop codon.
Such a stretch is an open reading frame (ORF).
An ORF can be either a gene or a non-coding ORF (NORF).
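As an illustration of the ORF definition, a minimal single-strand scanner (the toy sequence is an assumption, and a real gene finder would also scan the reverse strand):

STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(dna):
    orfs = []
    for frame in range(3):                      # the three reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                       # remember the start codon
            elif start is not None and codon in STOPS:
                orfs.append(dna[start:i + 3])   # start..stop inclusive
                start = None
    return orfs

print(find_orfs("CCATGAAATGCTAACC"))  # -> ['ATGAAATGCTAA']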

30 Finding prokaryotic genes
Experiment:
– DNA from the bacterium E. coli
– Dataset contains 1100 genes (900 used for training, 200 for testing)
Two models:
– Normal model with first-order Markov chains over nucleotides
– Also first-order Markov chains, but with codons instead of nucleotides as symbols

31 Finding prokaryotic genes
Outcomes: [figure]

32 Inhomogeneous Markov chains
Use the position information within the codon: three models, one for each codon position 1, 2 and 3.
For the sequence CATGCA:
– Homogeneous: P(C) a_CA a_AT a_TG a_GC a_CA
– Inhomogeneous: P(C) a^2_CA a^3_AT a^1_TG a^2_GC a^3_CA
(the superscript gives the codon position of the symbol being generated)
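A sketch of scoring a sequence with position-dependent transition matrices, matching the CATGCA factorization above (the matrices and the uniform start distribution are assumed toy values):

import numpy as np

IDX = {"A": 0, "C": 1, "G": 2, "T": 3}
rng = np.random.default_rng(0)

def random_matrix():
    m = rng.random((4, 4))
    return m / m.sum(axis=1, keepdims=True)  # rows sum to 1

a = {1: random_matrix(), 2: random_matrix(), 3: random_matrix()}
p0 = np.full(4, 0.25)  # assumed uniform start distribution

def log_prob(seq):
    lp = np.log(p0[IDX[seq[0]]])
    for i in range(1, len(seq)):
        pos = i % 3 + 1  # codon position (1, 2 or 3) of the symbol generated
        lp += np.log(a[pos][IDX[seq[i - 1]], IDX[seq[i]]])
    return lp

print(log_prob("CATGCA"))  # uses a^2, a^3, a^1, a^2, a^3 as on the slide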

33 Numerical stability of HMM algorithms
Multiplying many probabilities can cause numerical problems:
– Underflow errors
– Wrong numbers are calculated
Solutions:
– Log transformation
– Scaling of probabilities

34 The log transformation
Compute log probabilities:
– log10(10^-100000) = -100000, so a probability that would underflow becomes a perfectly representable number; the underflow problem is essentially solved
A sum operation is often faster than a product operation.
In the Viterbi algorithm the recursion becomes:
v_l(i+1) = log e_l(x_{i+1}) + max_k (v_k(i) + log a_kl)
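A sketch of this log-space Viterbi recursion (the model encoding is an assumption; note how sums and max replace the products of the plain recursion):

import numpy as np

def viterbi_log(x, log_a, log_e, log_pi0):
    n, K = len(x), log_a.shape[0]
    v = np.full((n, K), -np.inf)
    ptr = np.zeros((n, K), dtype=int)
    v[0] = log_pi0 + log_e[:, x[0]]
    for i in range(1, n):
        scores = v[i - 1][:, None] + log_a        # v_k(i-1) + log a_kl
        ptr[i] = scores.argmax(axis=0)            # best predecessor for each l
        v[i] = log_e[:, x[i]] + scores.max(axis=0)
    # Trace back the most probable path:
    path = [int(v[-1].argmax())]
    for i in range(n - 1, 0, -1):
        path.append(int(ptr[i][path[-1]]))
    return path[::-1], v[-1].max()                # path and its log probability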

35 Scaling of probabilities
Scale the f and b variables.
Forward variable:
– For each i a scaling variable s_i is defined
– New f variables are defined: f~_l(i) = f_l(i) / ∏_{j=1}^{i} s_j, with s_i chosen so that Σ_l f~_l(i) = 1
– New forward recursion: f~_l(i+1) = (1/s_{i+1}) e_l(x_{i+1}) Σ_k f~_k(i) a_kl
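A sketch of this scaled forward recursion (assuming the common choice of s_i as the sum that makes the scaled values sum to one; P(x) is then recovered safely from the sum of log s_i):

import numpy as np

def forward_scaled(x, a, e, pi0):
    n, K = len(x), a.shape[0]
    f = np.zeros((n, K))
    s = np.zeros(n)
    f[0] = pi0 * e[:, x[0]]
    s[0] = f[0].sum()
    f[0] /= s[0]
    for i in range(1, n):
        f[i] = e[:, x[i]] * (f[i - 1] @ a)  # one unscaled step from scaled f~
        s[i] = f[i].sum()                   # scaling variable s_i
        f[i] /= s[i]                        # now sum_l f~_l(i) = 1
    log_px = np.log(s).sum()                # log P(x) = sum_i log s_i
    return f, s, log_px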

36 Scaling of probabilities
Backward variable:
– The scaling has to use the same numbers s_i as the forward variable
– New backward recursion: b~_k(i) = (1/s_{i+1}) Σ_l a_kl e_l(x_{i+1}) b~_l(i+1)
This normally works well; however, underflow errors can still occur in models with many silent states (chapter 5).

37 Summary
Hidden Markov Models
Parameter estimation:
– State sequence known
– State sequence unknown
Model structure:
– Silent states
More complex Markov chains
Numerical stability

