 # PatReco: Hidden Markov Models Alexandros Potamianos Dept of ECE, Tech. Univ. of Crete Fall 2004-2005.


## Markov Models: Definition

- Markov chains are Bayesian networks that model sequences of events (states)
- Sequential events are dependent
- Two non-sequential events are conditionally independent given the intermediate events (MM-1)

## Markov chains

[Figure: state-chain diagrams for MM-0 through MM-3, showing each state q_n conditioned on zero, one, two, or three preceding states]

## Markov Chains

- MM-0: P(q_1, q_2, ..., q_N) = ∏_{n=1..N} P(q_n)
- MM-1: P(q_1, q_2, ..., q_N) = ∏_{n=1..N} P(q_n | q_{n-1})
- MM-2: P(q_1, q_2, ..., q_N) = ∏_{n=1..N} P(q_n | q_{n-1}, q_{n-2})
- MM-3: P(q_1, q_2, ..., q_N) = ∏_{n=1..N} P(q_n | q_{n-1}, q_{n-2}, q_{n-3})
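As a minimal sketch of the MM-1 factorization above, the probability of a state sequence is the prior of the first state times a product of transition probabilities. The states "A", "B", "C" and all numbers here are hypothetical, chosen only for illustration:

```python
# Hypothetical 3-state first-order Markov chain (MM-1); all numbers are
# illustrative, not from the lecture.
prior = {"A": 0.5, "B": 0.3, "C": 0.2}           # P(q_1)
trans = {                                         # P(q_n | q_{n-1})
    "A": {"A": 0.7, "B": 0.2, "C": 0.1},
    "B": {"A": 0.3, "B": 0.4, "C": 0.3},
    "C": {"A": 0.2, "B": 0.3, "C": 0.5},
}

def mm1_prob(states):
    """P(q_1..q_N) = P(q_1) * prod_{n=2..N} P(q_n | q_{n-1})."""
    p = prior[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= trans[prev][cur]
    return p

print(mm1_prob(["A", "A", "B"]))  # 0.5 * 0.7 * 0.2 = 0.07
```

Higher-order chains (MM-2, MM-3) would simply condition each factor on more preceding states.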

## Hidden Markov Models

- Hidden Markov models describe sequences of events and corresponding sequences of observations
- Events form a Markov chain (MM-1)
- Observations are conditionally independent given the sequence of events
- Each observation is directly connected with a single event (and is conditionally independent of the rest of the events in the network)

## Hidden Markov Models

[Figure: HMM-1 graph, states q_0 ... q_N forming a chain, each state q_n emitting observation o_n]

P(o_0, o_1, ..., o_N, q_0, q_1, ..., q_N) = ∏_{n=0..N} P(q_n | q_{n-1}) P(o_n | q_n)
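The HMM-1 joint probability factors into one transition term and one emission term per time step. A minimal sketch with discrete observations, assuming hypothetical states "R"/"S" and observation symbols "u"/"n" (all numbers invented for illustration):

```python
# Toy HMM-1 with discrete observations; states, symbols, and numbers
# are hypothetical.
prior = {"R": 0.6, "S": 0.4}                        # P(q_0)
trans = {"R": {"R": 0.7, "S": 0.3},                 # P(q_n | q_{n-1})
         "S": {"R": 0.4, "S": 0.6}}
emit  = {"R": {"u": 0.9, "n": 0.1},                 # P(o_n | q_n)
         "S": {"u": 0.2, "n": 0.8}}

def hmm_joint(states, obs):
    """P(o_0..o_N, q_0..q_N) = P(q_0)P(o_0|q_0) * prod P(q_n|q_{n-1})P(o_n|q_n)."""
    p = prior[states[0]] * emit[states[0]][obs[0]]
    for n in range(1, len(states)):
        p *= trans[states[n - 1]][states[n]] * emit[states[n]][obs[n]]
    return p

print(hmm_joint(["R", "S"], ["u", "n"]))  # 0.6 * 0.9 * 0.3 * 0.8 = 0.1296
```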

## Parameter Estimation

The parameters that have to be estimated are:
- the a-priori probabilities P(q_0)
- the transition probabilities P(q_n | q_{n-1})
- the observation probabilities P(o_n | q_n)

For example, if there are 3 types of events and continuous 1-D observations that follow a Gaussian distribution, there are 18 parameters to estimate:
- 3 a-priori probabilities
- a 3x3 transition probability matrix
- 3 means and 3 variances (observation probabilities)

## Parameter Estimation

- If both the sequence of events and the sequence of observations are fully observable, then ML estimation is used
- Usually the sequence of events q_0, q_1, ..., q_N is not observable, in which case EM is used
- The EM algorithm for HMMs is the Baum-Welch or forward-backward algorithm
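The building block of the forward-backward algorithm is the forward pass, which computes the observation likelihood needed in the EM updates. A minimal sketch, reusing the same hypothetical toy HMM tables (all values invented for illustration):

```python
# Toy HMM tables; states, symbols, and numbers are hypothetical.
prior = {"R": 0.6, "S": 0.4}
trans = {"R": {"R": 0.7, "S": 0.3},
         "S": {"R": 0.4, "S": 0.6}}
emit  = {"R": {"u": 0.9, "n": 0.1},
         "S": {"u": 0.2, "n": 0.8}}

def forward(prior, trans, emit, obs):
    """Forward pass: alpha[n][q] = P(o_0..o_n, q_n = q).
    Summing the last column gives the likelihood P(O)."""
    states = list(prior)
    alpha = [{q: prior[q] * emit[q][obs[0]] for q in states}]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append({q: emit[q][o] * sum(prev[r] * trans[r][q] for r in states)
                      for q in states})
    return alpha

alpha = forward(prior, trans, emit, ["u", "n"])
print(sum(alpha[-1].values()))  # P(O), summed over all state sequences
```

A backward pass computed analogously gives the state posteriors used in the Baum-Welch re-estimation formulas.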

## Inference/Decoding

- The main inference problem for HMMs is known as the decoding problem: given a sequence of observations, find the best sequence of states: q* = argmax_q P(q | O) = argmax_q P(q, O)
- An efficient decoding algorithm is the Viterbi algorithm

## Viterbi algorithm

max_q P(q, O) = max_q P(o_0, o_1, ..., o_N, q_0, q_1, ..., q_N)
= max_q ∏_{n=0..N} P(q_n | q_{n-1}) P(o_n | q_n)
= max_{q_N} { P(o_N | q_N) max_{q_{N-1}} { P(q_N | q_{N-1}) P(o_{N-1} | q_{N-1}) ... max_{q_2} { P(q_3 | q_2) P(o_2 | q_2) max_{q_1} { P(q_2 | q_1) P(o_1 | q_1) max_{q_0} { P(q_1 | q_0) P(o_0 | q_0) P(q_0) } } } ... } }

## Viterbi algorithm

[Figure: trellis of states 1..K unrolled over time]

At each node, keep only the best (most probable) path among all paths passing through that node.
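The keep-only-the-best-path rule above can be sketched directly: at each time step, each state stores the probability of the best path reaching it plus a back-pointer to its predecessor. The toy tables are hypothetical, as before:

```python
# Toy HMM tables; states, symbols, and numbers are hypothetical.
prior = {"R": 0.6, "S": 0.4}
trans = {"R": {"R": 0.7, "S": 0.3},
         "S": {"R": 0.4, "S": 0.6}}
emit  = {"R": {"u": 0.9, "n": 0.1},
         "S": {"u": 0.2, "n": 0.8}}

def viterbi(prior, trans, emit, obs):
    """Best state sequence argmax_q P(q, O): keep one best path per node."""
    states = list(prior)
    # delta[q]: probability of the best path ending in state q at this step.
    delta = {q: prior[q] * emit[q][obs[0]] for q in states}
    back = []  # back[n][q]: best predecessor of q at step n+1
    for o in obs[1:]:
        ptr, new = {}, {}
        for q in states:
            ptr[q], p = max(((r, delta[r] * trans[r][q]) for r in states),
                            key=lambda x: x[1])
            new[q] = p * emit[q][o]
        back.append(ptr)
        delta = new
    # Trace back from the most probable final state.
    q = max(delta, key=delta.get)
    path = [q]
    for ptr in reversed(back):
        q = ptr[q]
        path.append(q)
    return list(reversed(path))

print(viterbi(prior, trans, emit, ["u", "n"]))  # ['R', 'S']
```

Because only one path survives per node, the cost is O(N K^2) for N observations and K states, instead of the K^N cost of enumerating all state sequences.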

## Deep Thoughts

- HMM-0 (an HMM with an MM-0 event chain) is the Bayes classifier!
- MMs and HMMs are poor models, but they are simple and computationally efficient. How do you fix this? (dependent observations?)

## Some Applications

- Speech Recognition
- Optical Character Recognition
- Part-of-Speech Tagging
- ...

## Conclusions

- HMMs and MMs are useful modeling tools for dependent sequences of events (states or classes)
- Efficient algorithms exist for training HMM parameters (Baum-Welch) and for decoding the most probable sequence of states given an observation sequence (Viterbi)
- HMMs have many applications
