
Parameter estimation for HMMs, Baum-Welch algorithm, model topology, numerical stability (Elze de Groot; Chapter 3.3-3.7)


1 Parameter estimation for HMMs, Baum-Welch algorithm, model topology, numerical stability (Chapter 3.3-3.7)

2 Overview
– Parameter estimation for HMMs
– Baum-Welch algorithm
– HMM model structure
– More complex Markov chains
– Numerical stability of HMM algorithms

3 Specifying an HMM model
The most difficult problem in using HMMs is specifying the model:
– Design of the structure
– Assignment of parameter values

4 Specifying an HMM model
The most difficult problem in using HMMs is specifying the model:
– Design of the structure
– Assignment of parameter values

5 Parameter estimation for HMMs
Estimate the transition and emission probabilities a_kl and e_k(b).
Two ways of learning:
– Estimation when the state sequence is known
– Estimation when the paths are unknown
Assume that we have a set of example sequences (training sequences x^1, ..., x^n).

6 Parameter estimation for HMMs
Assume that x^1, ..., x^n are independent, so the joint probability is a product:
P(x^1, ..., x^n | θ) = ∏_j P(x^j | θ)
In log space, since log(ab) = log a + log b:
log P(x^1, ..., x^n | θ) = Σ_j log P(x^j | θ)

7 Estimation when the state sequence is known
Easier than estimation when the paths are unknown. Count occurrences in the training data, adding pseudocounts r to avoid zero probabilities:
A_kl = number of transitions k to l in the training data + r_kl
E_k(b) = number of emissions of b from state k in the training data + r_k(b)
The maximum likelihood estimators are then obtained by normalising these counts:
a_kl = A_kl / Σ_l' A_kl'   and   e_k(b) = E_k(b) / Σ_b' E_k(b')
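
To make the counting concrete, here is a minimal Python sketch of these estimators. The nested-dictionary layout, the uniform pseudocount defaults, and the fair/loaded dice counts in the example are illustrative assumptions, not values from the slides:

```python
# ML estimation of a_kl and e_k(b) from counts taken off training data
# with known state sequences, with pseudocounts added to every count.

def ml_estimates(A, E, r_trans=1.0, r_emit=1.0):
    """A[k][l] = transition counts k->l; E[k][b] = emission counts of b from k."""
    a, e = {}, {}
    for k, row in A.items():
        total = sum(row.values()) + r_trans * len(row)
        a[k] = {l: (n + r_trans) / total for l, n in row.items()}
    for k, row in E.items():
        total = sum(row.values()) + r_emit * len(row)
        e[k] = {b: (n + r_emit) / total for b, n in row.items()}
    return a, e

# Example: counts from rolls labelled fair (F) / loaded (L).
A = {"F": {"F": 95, "L": 5}, "L": {"F": 10, "L": 90}}
E = {"F": {s: 17 for s in "123456"},
     "L": {"1": 10, "2": 10, "3": 10, "4": 10, "5": 10, "6": 52}}
a, e = ml_estimates(A, E)
print(a["F"]["L"], e["L"]["6"])
```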

8 Estimation when paths are unknown
More complex than when the paths are known: we cannot use the maximum likelihood estimators directly.
Instead, an iterative algorithm is used: Baum-Welch.

9 The Baum-Welch algorithm
We don't know the real values of A_kl and E_k(b), so:
1. Estimate A_kl and E_k(b) as expected counts under the current model
2. Update a_kl and e_k(b)
3. Repeat with the new model parameters a_kl and e_k(b)

10 Baum-Welch algorithm
The expected counts are computed from the forward values f_k(i) and backward values b_l(i+1):
A_kl = Σ_j (1 / P(x^j)) Σ_i f_k^j(i) a_kl e_l(x^j_{i+1}) b_l^j(i+1)
E_k(b) = Σ_j (1 / P(x^j)) Σ_{i : x^j_i = b} f_k^j(i) b_k^j(i)

11 Baum-Welch algorithm
Now that we have estimated A_kl and E_k(b), use the maximum likelihood estimators to compute new a_kl and e_k(b).
We use these values to re-estimate A_kl and E_k(b) in the next iteration.
Continue iterating until the change is very small or a maximum number of iterations is exceeded.

12 Baum-Welch algorithm (summary figure of the full algorithm)
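
The original slide shows the chapter's boxed algorithm; below is a minimal single-sequence sketch of one Baum-Welch step. The interface (a[k, l] transition matrix, e[k, b] emission matrix, initial distribution pi, integer-coded sequence x) is an assumption, and pseudocounts and scaling are omitted for brevity (see the numerical-stability slides):

```python
import numpy as np

def forward(pi, a, e, x):
    f = np.zeros((len(x), a.shape[0]))
    f[0] = pi * e[:, x[0]]
    for i in range(1, len(x)):
        f[i] = e[:, x[i]] * (f[i-1] @ a)     # f_l(i) = e_l(x_i) sum_k f_k(i-1) a_kl
    return f

def backward(a, e, x):
    b = np.zeros((len(x), a.shape[0]))
    b[-1] = 1.0
    for i in range(len(x) - 2, -1, -1):
        b[i] = a @ (e[:, x[i+1]] * b[i+1])   # b_k(i) = sum_l a_kl e_l(x_{i+1}) b_l(i+1)
    return b

def baum_welch_step(pi, a, e, x):
    f, b = forward(pi, a, e, x), backward(a, e, x)
    px = f[-1].sum()                          # P(x | current model)
    A = np.zeros_like(a)                      # expected transition counts
    for i in range(len(x) - 1):
        A += np.outer(f[i], e[:, x[i+1]] * b[i+1]) * a / px
    E = np.zeros_like(e)                      # expected emission counts
    for i in range(len(x)):
        E[:, x[i]] += f[i] * b[i] / px
    # Re-estimate parameters by normalising the expected counts.
    return A / A.sum(1, keepdims=True), E / E.sum(1, keepdims=True), np.log(px)
```

Repeating baum_welch_step with the updated a and e until the returned log likelihood stops improving (or a maximum number of iterations is reached) is exactly the loop described on slide 11.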

13 Example
The estimated model after training on 300 rolls and on 30,000 rolls.

14 Drawbacks
ML estimators:
– Vulnerable to overfitting if there is not enough data
– Estimates can be undefined if a state or transition is never used in the training set (hence the use of pseudocounts)
Baum-Welch:
– Can converge to one of many local maxima instead of the global maximum, depending on the starting values of the parameters
– This problem gets worse for large HMMs

15 Viterbi training
The most probable path is derived using the Viterbi algorithm; re-estimate from these paths as if they were known, and continue until none of the paths change.
Finds the value of θ that maximises the contribution of the most probable paths to the likelihood, rather than the likelihood itself.
Generally performs less well than Baum-Welch.
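
A sketch of this procedure, reusing the counting idea from the known-paths case. The pseudocount of 1, the fixed initial distribution pi, and the convergence test on unchanged paths are assumptions; viterbi() works in log space:

```python
import numpy as np

def viterbi(pi, a, e, x):
    with np.errstate(divide="ignore"):          # log(0) = -inf is fine here
        lpi, la, le = np.log(pi), np.log(a), np.log(e)
    V = np.zeros((len(x), a.shape[0]))
    ptr = np.zeros((len(x), a.shape[0]), dtype=int)
    V[0] = lpi + le[:, x[0]]
    for i in range(1, len(x)):
        scores = V[i-1][:, None] + la           # scores[k, l]
        ptr[i] = scores.argmax(axis=0)
        V[i] = le[:, x[i]] + scores.max(axis=0)
    path = [int(V[-1].argmax())]                # backtrack the best path
    for i in range(len(x) - 1, 0, -1):
        path.append(int(ptr[i, path[-1]]))
    return path[::-1]

def viterbi_training(pi, a, e, seqs, max_iters=100):
    old = None
    for _ in range(max_iters):
        paths = [viterbi(pi, a, e, x) for x in seqs]
        if paths == old:                        # none of the paths changed
            break
        A, E = np.ones_like(a), np.ones_like(e) # pseudocount 1
        for x, p in zip(seqs, paths):
            for i in range(len(x) - 1):
                A[p[i], p[i+1]] += 1
            for i, sym in enumerate(x):
                E[p[i], sym] += 1
        a = A / A.sum(1, keepdims=True)
        e = E / E.sum(1, keepdims=True)
        old = paths
    return a, e
```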

16 Modelling of labelled sequences
Only paths consistent with the labels are calculated (e.g., only the -- and ++ transitions within labelled regions).
Better than using ML estimators when many different classes are present.

17 Specifying an HMM model
The most difficult problem in using HMMs is specifying the model:
– Design of the structure
– Assignment of parameter values

18 Design of the structure
Design: how to connect the states by transitions.
A good HMM is based on knowledge about the problem under investigation.
Local maxima are the biggest disadvantage of fully connected models.
Baum-Welch still works after deleting a transition from the model: simply set its transition probability to zero (it will then stay zero).

19 Example 1: geometric distribution
A state with self-transition probability p and exit probability 1 - p models a geometric length distribution: P(L) = (1 - p) p^(L-1).
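
A quick simulation of this (the value p = 0.9 is an arbitrary choice): staying in one state with self-transition probability p gives geometrically distributed run lengths with mean 1/(1 - p):

```python
import random

p = 0.9

def run_length():
    L = 1
    while random.random() < p:    # stay in the state with probability p
        L += 1
    return L

samples = [run_length() for _ in range(100_000)]
print(sum(samples) / len(samples))    # close to 1 / (1 - p) = 10
```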

20 Example 2: modelling a distribution of lengths between 2 and 10

21 Example 3: negative binomial distribution
An array of n states, each with self-transition probability p, models a negative binomial length distribution; shown for p = 0.99 and n ≤ 5.

22 Silent states
States that do not emit symbols.
The begin state B is an example; silent states can also appear in other places in an HMM.

23 Example: silent states

24 Silent states
Advantage:
– Fewer transition probabilities need to be estimated
Drawback:
– Limits the possibilities for defining a model

25 Silent states
Change in the forward algorithm (sketched below):
– For 'real' (emitting) states the recursion is the same: f_l(i+1) = e_l(x_{i+1}) Σ_k f_k(i) a_kl
– For silent states, set f_l(i) = Σ_{k emitting} f_k(i) a_kl
– Then, starting from the lowest-numbered silent state l, add for all silent states k < l: f_l(i) = f_l(i) + Σ_{k silent, k < l} f_k(i) a_kl
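
A sketch of one column of this modified forward recursion. The index lists `emitting` and `silent` are assumed bookkeeping (not from the slide); `silent` must be sorted in increasing order so that lower-numbered silent states are already filled in when higher-numbered ones are computed:

```python
import numpy as np

def forward_column(f_prev, a, e, sym, emitting, silent):
    f = np.zeros(a.shape[0])
    # 'Real' (emitting) states: the standard recursion over column i-1.
    for l in emitting:
        f[l] = e[l, sym] * np.dot(f_prev, a[:, l])
    # Silent states: contributions from emitting states of *this* column...
    for l in silent:
        f[l] = sum(f[k] * a[k, l] for k in emitting)
        # ...plus contributions from already-computed silent states k < l.
        f[l] += sum(f[k] * a[k, l] for k in silent if k < l)
    return f
```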

26 More complex Markov chains
So far we assumed that the probability of a symbol in a sequence depends only on the previous symbol.
More complex variants:
– Higher-order Markov chains
– Inhomogeneous Markov chains

27 Higher-order Markov chains
In an nth-order Markov process, the probability of a symbol in a sequence depends on the previous n symbols.
An nth-order Markov chain over some alphabet A is equivalent to a first-order Markov chain over the alphabet A^n of n-tuples, because consecutive n-tuples overlap in n - 1 symbols: P(AB | B) = P(A | B).

28 Example
A second-order Markov chain over two symbols {A, B} can be translated into a first-order Markov chain over the 2-tuples {AA, AB, BA, BB} (see the sketch below).
Sometimes the framework of the higher-order model is more convenient.
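
A small sketch of the translation. The second-order probabilities p2 below are made-up numbers; the point is that each 2-tuple (x, y) can only move to a tuple (y, z) that overlaps it, with probability P(z | x, y):

```python
from itertools import product

alphabet = ["A", "B"]
# p2[(x, y)][z] = P(z | x, y), an assumed second-order parameterisation.
p2 = {("A","A"): {"A": .7, "B": .3}, ("A","B"): {"A": .4, "B": .6},
      ("B","A"): {"A": .5, "B": .5}, ("B","B"): {"A": .2, "B": .8}}

tuples = list(product(alphabet, repeat=2))      # AA, AB, BA, BB
a = {}
for (x, y) in tuples:
    for (u, z) in tuples:
        # Zero unless the tuples overlap (u == y); otherwise P(z | x, y).
        # This is the P(AB|B) = P(A|B) identity from the previous slide.
        a[(x, y), (u, z)] = p2[(x, y)][z] if u == y else 0.0

# Each row of the derived first-order chain is a proper distribution.
for s in tuples:
    assert abs(sum(a[s, t] for t in tuples) - 1.0) < 1e-12
```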

29 Finding prokaryotic genes
Gene candidates in DNA are sequences of nucleotide triplets (a start codon, a number of non-stop codons, then a stop codon): an open reading frame (ORF).
An ORF can be either a gene or a non-coding ORF (NORF).

30 Finding prokaryotic genes
Experiment:
– DNA from the bacterium E. coli
– The dataset contains 1100 genes (900 used for training, 200 for testing)
Two models:
– A normal model with first-order Markov chains over nucleotides
– First-order Markov chains again, but with codons instead of nucleotides as symbols

31 Finding prokaryotic genes: outcomes (results figure)

32 Inhomogeneous Markov chains
Use the position information within the codon: three models, for codon positions 1, 2 and 3. For the sequence CATGCA:
– Homogeneous: P(C) a_CA a_AT a_TG a_GC a_CA
– Inhomogeneous: P(C) a2_CA a3_AT a1_TG a2_GC a3_CA
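
A sketch of scoring under such a chain. The random matrices and uniform initial distribution are placeholders, not trained values; the matrix used for the transition into each position cycles with the codon position, matching the a2, a3, a1, ... pattern above:

```python
import numpy as np

idx = {"A": 0, "C": 1, "G": 2, "T": 3}
rng = np.random.default_rng(0)

def random_stochastic():
    m = rng.random((4, 4))
    return m / m.sum(axis=1, keepdims=True)

a1, a2, a3 = (random_stochastic() for _ in range(3))
p0 = np.full(4, 0.25)                      # initial distribution

def log_prob_inhomogeneous(seq, mats=(a1, a2, a3)):
    s = [idx[c] for c in seq]
    logp = np.log(p0[s[0]])
    for i in range(1, len(s)):
        # Position i (0-based) has codon position (i % 3) + 1, so the
        # transition into it uses that position's matrix: a2, a3, a1, ...
        logp += np.log(mats[i % 3][s[i-1], s[i]])
    return logp

print(log_prob_inhomogeneous("CATGCA"))
```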

33 Numerical stability of HMM algorithms
Multiplying many probabilities can cause numerical problems:
– Underflow errors
– Wrong numbers are calculated
Solutions:
– Log transformation
– Scaling of probabilities

34 The log transformation
Compute log probabilities instead:
– log_10(10^-100000) = -100000, so the underflow problem is essentially solved
– The sum operation is often faster than the product operation
In the Viterbi algorithm the recursion becomes:
V_l(i+1) = log e_l(x_{i+1}) + max_k (V_k(i) + log a_kl)
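
A short illustration of the underflow problem and the log fix (the probability value 1e-5 is arbitrary): multiplying many small probabilities underflows double precision to 0.0, while the sum of their logs stays representable:

```python
import math

p, prob, logprob = 1e-5, 1.0, 0.0
for _ in range(100):
    prob *= p                  # repeated product of probabilities
    logprob += math.log10(p)   # same computation in log space
print(prob)       # 0.0 -- underflowed; the true value is 1e-500
print(logprob)    # -500.0 (up to rounding), no underflow
```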

35 Scaling of probabilities
Scale the f and b variables. For the forward variable:
– For each position i a scaling variable s_i is defined
– New f variables are defined: f~_l(i) = f_l(i) / (s_1 s_2 ... s_i)
– New forward recursion: f~_l(i+1) = (1 / s_{i+1}) e_l(x_{i+1}) Σ_k f~_k(i) a_kl
A natural choice is to pick s_{i+1} so that Σ_l f~_l(i+1) = 1, which gives P(x) = ∏_i s_i.

36 Scaling of probabilities
For the backward variable, the scaling has to use the same numbers s_i as the forward variable:
– New backward recursion: b~_k(i) = (1 / s_{i+1}) Σ_l a_kl e_l(x_{i+1}) b~_l(i+1)
This normally works well; however, underflow errors can still occur in models with many silent states (chapter 5).
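
A sketch of both scaled recursions together (NumPy, with integer-coded symbols and an initial distribution pi as assumed interface). With s_i chosen so that each scaled forward column sums to 1, log P(x) = Σ_i log s_i, and the elementwise product f[i] * b[i] of the scaled variables is directly the posterior probability of each state at position i:

```python
import numpy as np

def scaled_forward(pi, a, e, x):
    f = np.zeros((len(x), a.shape[0]))
    s = np.zeros(len(x))
    f[0] = pi * e[:, x[0]]
    s[0] = f[0].sum()
    f[0] /= s[0]
    for i in range(1, len(x)):
        f[i] = e[:, x[i]] * (f[i-1] @ a)
        s[i] = f[i].sum()                # scaling variable s_i
        f[i] /= s[i]                     # each column now sums to 1
    return f, s                          # log P(x) = np.log(s).sum()

def scaled_backward(a, e, x, s):
    b = np.zeros((len(x), a.shape[0]))
    b[-1] = 1.0
    for i in range(len(x) - 2, -1, -1):
        # Divide by the *same* scaling values used on the forward pass.
        b[i] = (a @ (e[:, x[i+1]] * b[i+1])) / s[i+1]
    return b
```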

37 Summary
Hidden Markov models:
– Parameter estimation: state sequence known / state sequence unknown
– Model structure, including silent states
– More complex Markov chains
– Numerical stability

