1
1 Hidden Markov Model Instructor : Saeed Shiry CHAPTER 13 ETHEM ALPAYDIN © The MIT Press, 2004
2
2 Introduction In a word, successive letters are dependent; in English, 'h' is very likely to follow 't' but not 'x'. Processes that produce a sequence of observations (for example, letters in a word or base pairs in a DNA sequence) cannot be modeled as simple probability distributions. A similar example is speech recognition, where speech utterances are composed of speech primitives called phonemes; only certain sequences of phonemes are allowed, which are the words of the language. At a higher level, words can be written or spoken in certain sequences to form a sentence, as defined by the syntactic and semantic rules of the language. A sequence can be characterized as being generated by a parametric random process. In this chapter, we discuss how this modeling is done and how the parameters of such a model can be learned from a training sample of example sequences.
3
3 Objectives Modeling dependencies in input sequences. Temporal: in speech, phonemes in a word (dictionary) and words in a sentence (syntax, semantics of the language); in handwriting, pen movements. Spatial: in a DNA sequence, base pairs.
4
4 History Markov chain theory developed around 1900. Hidden Markov models developed in the late 1960s. Used extensively in speech recognition in 1960-70. Introduced to computer science in 1989. Applications: bioinformatics, signal processing, data analysis and pattern recognition.
5
5 HMMs and their Usage HMMs are very common in computational linguistics: speech recognition (observed: acoustic signal, hidden: words); handwriting recognition (observed: image, hidden: words); part-of-speech tagging (observed: words, hidden: part-of-speech tags); machine translation (observed: foreign words, hidden: words in target language).
6
6 Discrete Markov Processes Consider a system that at any time is in one of a set of N distinct states: $S = \{S_1, S_2, \ldots, S_N\}$. The state at time t is denoted as $q_t$, $t = 1, 2, \ldots$, so for example $q_t = S_i$ means that at time t the system is in state $S_i$.
7
7 Discrete Markov Processes At regularly spaced discrete times the system moves to a state with a given probability, depending on the values of the previous states: $P(q_{t+1} = S_j \mid q_t = S_i, q_{t-1} = S_k, \ldots)$. For the special case of a first-order Markov model, the state at time t + 1 depends only on the state at time t, regardless of the states at previous times: $P(q_{t+1} = S_j \mid q_t = S_i, q_{t-1} = S_k, \ldots) = P(q_{t+1} = S_j \mid q_t = S_i)$. "Today is the first day of the rest of your life."
8
8 Discrete Markov Processes Transition probabilities: $a_{ij} \equiv P(q_{t+1} = S_j \mid q_t = S_i)$, with $a_{ij} \ge 0$ and $\sum_{j=1}^{N} a_{ij} = 1$. Initial probabilities: $\pi_i \equiv P(q_1 = S_i)$, with $\sum_{i=1}^{N} \pi_i = 1$. Going from $S_i$ to $S_j$ has the same probability no matter when it happens, or where it happens in the observation sequence. $A = [a_{ij}]$ is an N × N matrix whose rows sum to 1.
9
9 Stochastic Automaton This can be seen as a stochastic automaton
10
10 Observable Markov model In an observable Markov model, the states are observable. At any time t, we know $q_t$, and as the system moves from one state to another, we get an observation sequence that is a sequence of states. The output of the process is the set of states at each instant of time, where each state corresponds to a physical observable event. We have an observation sequence O that is the state sequence $O = Q = \{q_1 q_2 \cdots q_T\}$, whose probability is given as: $P(O = Q \mid A, \Pi) = P(q_1) \prod_{t=2}^{T} P(q_t \mid q_{t-1}) = \pi_{q_1} a_{q_1 q_2} \cdots a_{q_{T-1} q_T}$
11
11 Example: Balls and Urns Three urns, each full of balls of one color: $S_1$: red, $S_2$: blue, $S_3$: green.
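As a minimal sketch of how the probability of a sequence is computed in this observable urn model: the probability of a state sequence is the initial probability of its first state times the transition probabilities along the path. The values of `pi` and `A` below are illustrative assumptions, not values given on the slides, and `sequence_probability` is a hypothetical helper name. States are indexed 0, 1, 2 for S1 (red), S2 (blue), S3 (green).

```python
def sequence_probability(states, pi, A):
    """Return pi[q1] * A[q1][q2] * ... * A[q_{T-1}][q_T] for a first-order chain."""
    if not states:
        raise ValueError("state sequence must be non-empty")
    prob = pi[states[0]]
    # Multiply in one transition probability per consecutive pair of states.
    for prev, cur in zip(states, states[1:]):
        prob *= A[prev][cur]
    return prob


# Assumed (illustrative) parameters for the three urns.
pi = [0.5, 0.2, 0.3]
A = [
    [0.4, 0.3, 0.3],
    [0.2, 0.6, 0.2],
    [0.1, 0.1, 0.8],
]

# Observed colors red, red, green, green correspond to states S1, S1, S3, S3.
p = sequence_probability([0, 0, 2, 2], pi, A)
print(p)  # the product 0.5 * 0.4 * 0.3 * 0.8
```

Because the states are observed directly, no summation over alternative paths is needed; that changes once the states become hidden.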
12
12 Balls and Urns: Learning Given K example sequences of length T
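When the states are observed, a natural way to learn the parameters is by counting: estimate $\pi_i$ as the fraction of the K sequences that start in $S_i$, and $a_{ij}$ as the number of transitions from $S_i$ to $S_j$ divided by the number of transitions out of $S_i$. The sketch below assumes this standard maximum-likelihood estimator. The function name, the toy sequences, and the uniform fallback for states that are never left are assumptions made for illustration.

```python
def estimate_markov_chain(sequences, n_states):
    """Estimate (pi, A) from fully observed, non-empty state sequences by counting."""
    starts = [0] * n_states
    trans = [[0] * n_states for _ in range(n_states)]
    for seq in sequences:
        starts[seq[0]] += 1
        for prev, cur in zip(seq, seq[1:]):
            trans[prev][cur] += 1

    pi_hat = [count / len(sequences) for count in starts]
    A_hat = []
    for row in trans:
        total = sum(row)
        if total:
            A_hat.append([count / total for count in row])
        else:
            # Assumption: a state with no outgoing transitions in the data
            # gets a uniform row, since the counts give no estimate for it.
            A_hat.append([1.0 / n_states] * n_states)
    return pi_hat, A_hat


# Toy data: K = 4 sequences of length T = 4 over states 0, 1, 2.
sequences = [
    [0, 0, 2, 2],
    [0, 1, 1, 2],
    [2, 2, 2, 1],
    [1, 0, 0, 2],
]
pi_hat, A_hat = estimate_markov_chain(sequences, 3)
print(pi_hat)
print(A_hat)
```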
13
13 Hidden Markov Models States are not observable, but when we visit a state, an observation is recorded that is a probabilistic function of the state. We assume a discrete observation in each state from the set $\{v_1, v_2, \ldots, v_M\}$. Emission probabilities: $b_j(m) \equiv P(O_t = v_m \mid q_t = S_j)$. $b_j(m)$ is the observation, or emission, probability that we observe $v_m$, $m = 1, \ldots, M$, in state $S_j$. The state sequence Q is not observed, which is what makes the model "hidden," but it should be inferred from the observation sequence O.
14
14 For each observation sequence, there are multiple possible state sequences. In the case of a hidden Markov model, there are two sources of randomness: in addition to randomly moving from one state to another, the observation in a state is also random.
15
15 Example: Balls and Urns The hidden case: each urn contains balls of different colors. Let $b_j(m)$ be the probability of drawing a ball of color m from urn j. We observe a sequence of ball colors without knowing the sequence of urns from which the balls were drawn. The number of ball colors may be different from the number of urns. For example, let us say we have three urns and the observation sequence is O = {red, red, green, blue, yellow}. In the case of a hidden model, a ball could have been picked from any urn, so for the same observation sequence O there may be many possible state sequences Q that could have generated O.
16
16 HMM Unfolded in Time
17
17 Elements of an HMM N: number of states. M: number of observation symbols. $A = [a_{ij}]$: N by N state transition probability matrix. $B = [b_j(m)]$: N by M observation probability matrix. $\Pi = [\pi_i]$: N by 1 initial state probability vector. $\lambda = (A, B, \Pi)$: parameter set of the HMM.
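To make the roles of A, B and Π concrete, here is a minimal sketch that stores them as nested lists and samples a hidden state sequence together with its observations. All numeric values, the color list, and the helper names `check_stochastic` and `sample_hmm` are assumptions for illustration (three urns, four colors); they are not taken from the slides.

```python
import random


def check_stochastic(rows, tol=1e-9):
    """Raise ValueError if any row is not a probability distribution."""
    for row in rows:
        if any(p < 0 for p in row) or abs(sum(row) - 1.0) > tol:
            raise ValueError(f"row {row} is not a probability distribution")


def sample_hmm(pi, A, B, T, rng):
    """Sample T hidden states and T observation indices from lambda = (A, B, pi)."""
    def draw(probs):
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]

    states, observations = [], []
    state = draw(pi)                         # first state comes from pi
    for t in range(T):
        if t > 0:
            state = draw(A[state])           # later states come from row A[state]
        states.append(state)
        observations.append(draw(B[state]))  # emission depends only on the state
    return states, observations


colors = ["red", "green", "blue", "yellow"]  # M = 4 observation symbols
pi = [0.5, 0.2, 0.3]                         # N = 3 urns
A = [[0.4, 0.3, 0.3], [0.2, 0.6, 0.2], [0.1, 0.1, 0.8]]
B = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.6, 0.2, 0.1], [0.1, 0.1, 0.3, 0.5]]
check_stochastic([pi] + A + B)

states, observations = sample_hmm(pi, A, B, T=5, rng=random.Random(0))
print(states)
print([colors[o] for o in observations])
```

In a real application only the printed colors would be available; the `states` list is exactly the part that is hidden.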
18
18 Three Basic Problems of HMMs (Rabiner, 1989) Given a number of sequences of observations, we are interested in three problems. Evaluation: given a model λ, evaluate the probability of any given observation sequence $O = \{O_1 O_2 \cdots O_T\}$, namely $P(O \mid \lambda)$. State sequence: given λ and O, find the state sequence $Q = \{q_1 q_2 \cdots q_T\}$ that has the highest probability of generating O, that is, find $Q^*$ that maximizes $P(Q \mid O, \lambda)$. Learning: given a training set of observation sequences $X = \{O^k\}_k$, find $\lambda^*$ such that $P(X \mid \lambda^*) = \max_\lambda P(X \mid \lambda)$.
19
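19 Evaluation Given an observation sequence $O = \{O_1 O_2 \cdots O_T\}$ and a state sequence $Q = \{q_1 q_2 \cdots q_T\}$, the probability of observing O given the state sequence Q is simply $P(O \mid Q, \lambda) = \prod_{t=1}^{T} P(O_t \mid q_t, \lambda) = b_{q_1}(O_1) b_{q_2}(O_2) \cdots b_{q_T}(O_T)$. The probability of the state sequence Q is $P(Q \mid \lambda) = P(q_1) \prod_{t=2}^{T} P(q_t \mid q_{t-1}) = \pi_{q_1} a_{q_1 q_2} \cdots a_{q_{T-1} q_T}$.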
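The two quantities on this slide can be combined into a direct, if impractical, evaluation: multiply $P(O \mid Q, \lambda)$ by $P(Q \mid \lambda)$ and sum over every possible state sequence Q. The sketch below does exactly that to show why a better method is needed, since the sum has $N^T$ terms. The two-state model, the observation sequence, and all function names are assumptions chosen so the numbers can be checked by hand.

```python
from itertools import product


def prob_obs_given_states(O, Q, B):
    """P(O | Q, lambda): product of emission probabilities b_{q_t}(O_t)."""
    p = 1.0
    for o, q in zip(O, Q):
        p *= B[q][o]
    return p


def prob_states(Q, pi, A):
    """P(Q | lambda): initial probability times transition probabilities."""
    p = pi[Q[0]]
    for prev, cur in zip(Q, Q[1:]):
        p *= A[prev][cur]
    return p


def brute_force_likelihood(O, pi, A, B):
    """P(O | lambda) by summing the joint over all N**T state sequences."""
    n = len(pi)
    return sum(
        prob_obs_given_states(O, Q, B) * prob_states(Q, pi, A)
        for Q in product(range(n), repeat=len(O))
    )


# Assumed toy model: N = 2 states, M = 2 symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
O = [0, 1, 0]

likelihood = brute_force_likelihood(O, pi, A, B)
print(likelihood)  # sums 2**3 = 8 path probabilities
```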
20
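20 Forward variable: We define the forward variable $\alpha_t(i)$ as the probability of observing the partial sequence $\{O_1 \cdots O_t\}$ until time t and being in $S_i$ at time t, given the model λ: $\alpha_t(i) \equiv P(O_1 \cdots O_t, q_t = S_i \mid \lambda)$. Initialization: $\alpha_1(i) = \pi_i b_i(O_1)$. Recursion: $\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i) a_{ij} \right] b_j(O_{t+1})$.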
21
21 Forward variable: Once we have calculated the forward variables, it is easy to calculate the probability of the observation sequence: $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$. $\alpha_T(i)$ is the probability of generating the full observation sequence and ending up in state $S_i$. We need to sum over all such possible final states.
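A minimal sketch of the forward procedure, assuming the usual initialization $\alpha_1(i) = \pi_i b_i(O_1)$ and recursion $\alpha_{t+1}(j) = [\sum_i \alpha_t(i) a_{ij}] b_j(O_{t+1})$. The toy two-state model and the function name `forward` are assumptions for illustration. No scaling is applied, so for long sequences the values would underflow.

```python
def forward(O, pi, A, B):
    """Return the list of forward vectors, one per time step (0-based t)."""
    n = len(pi)
    # Initialization: start in state i and emit the first observation.
    alpha = [[pi[i] * B[i][O[0]] for i in range(n)]]
    # Recursion: reach state j from any state i, then emit O[t] in j.
    for t in range(1, len(O)):
        prev = alpha[-1]
        alpha.append([
            sum(prev[i] * A[i][j] for i in range(n)) * B[j][O[t]]
            for j in range(n)
        ])
    return alpha


# Assumed toy model: N = 2 states, M = 2 symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
O = [0, 1, 0]

alpha = forward(O, pi, A, B)
likelihood = sum(alpha[-1])  # sum over all possible final states
print(likelihood)
```

The cost is on the order of $N^2 T$ multiplications, compared with the $N^T$ paths a direct enumeration would visit.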
22
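22 Backward variable: We define the backward variable $\beta_t(i)$ as the probability of observing the partial sequence $\{O_{t+1} \cdots O_T\}$ after time t, given that the system is in $S_i$ at time t and given the model λ: $\beta_t(i) \equiv P(O_{t+1} \cdots O_T \mid q_t = S_i, \lambda)$. Initialization: $\beta_T(i) = 1$. Recursion: $\beta_t(i) = \sum_{j=1}^{N} a_{ij} b_j(O_{t+1}) \beta_{t+1}(j)$.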
23
23 caution
24
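24 Finding the State Sequence Let us define $\gamma_t(i)$ as the probability of being in state $S_i$ at time t, given O and λ, which can be computed from the forward and backward variables as $\gamma_t(i) \equiv P(q_t = S_i \mid O, \lambda) = \frac{\alpha_t(i) \beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j) \beta_t(j)}$. Choose the state that has the highest probability, for each time step: $q_t^* = \arg\max_i \gamma_t(i)$? No! Choosing the most likely state independently at each step can produce a sequence that is not a valid path, for example one containing a transition with $a_{ij} = 0$.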
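A minimal sketch of computing the state posteriors, assuming the standard backward variable $\beta_t(i) = P(O_{t+1} \cdots O_T \mid q_t = S_i, \lambda)$ and the identity $\gamma_t(i) = \alpha_t(i) \beta_t(i) / \sum_j \alpha_t(j) \beta_t(j)$. The toy model and the function names are assumptions for illustration. The per-step argmax is shown only to illustrate the idea on this slide; it is not guaranteed to give a valid or jointly most probable path.

```python
def forward(O, pi, A, B):
    """Forward vectors alpha[t][i] = P(O_1..O_t, q_t = S_i | lambda)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][O[0]] for i in range(n)]]
    for t in range(1, len(O)):
        prev = alpha[-1]
        alpha.append([
            sum(prev[i] * A[i][j] for i in range(n)) * B[j][O[t]]
            for j in range(n)
        ])
    return alpha


def backward(O, A, B):
    """Backward vectors beta[t][i] = P(O_{t+1}..O_T | q_t = S_i, lambda)."""
    n, T = len(A), len(O)
    beta = [[1.0] * n for _ in range(T)]  # beta at the last step is 1
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(
                A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] for j in range(n)
            )
    return beta


def state_posteriors(O, pi, A, B):
    """gamma[t][i] = P(q_t = S_i | O, lambda), normalized per time step."""
    alpha, beta = forward(O, pi, A, B), backward(O, A, B)
    gamma = []
    for a_row, b_row in zip(alpha, beta):
        joint = [a * b for a, b in zip(a_row, b_row)]
        total = sum(joint)
        gamma.append([x / total for x in joint])
    return gamma


pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
O = [0, 1, 0]

gamma = state_posteriors(O, pi, A, B)
per_step_best = [max(range(len(row)), key=row.__getitem__) for row in gamma]
print(gamma)
print(per_step_best)
```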
25
25 Viterbi’s Algorithm $\delta_t(i) \equiv \max_{q_1 q_2 \cdots q_{t-1}} p(q_1 q_2 \cdots q_{t-1}, q_t = S_i, O_1 \cdots O_t \mid \lambda)$. Initialization: $\delta_1(i) = \pi_i b_i(O_1)$, $\psi_1(i) = 0$. Recursion: $\delta_t(j) = \max_i \delta_{t-1}(i) a_{ij} b_j(O_t)$, $\psi_t(j) = \arg\max_i \delta_{t-1}(i) a_{ij}$. Termination: $p^* = \max_i \delta_T(i)$, $q_T^* = \arg\max_i \delta_T(i)$. Path backtracking: $q_t^* = \psi_{t+1}(q_{t+1}^*)$, $t = T-1, T-2, \ldots, 1$.
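The four steps on this slide (initialization, recursion, termination, backtracking) can be sketched as follows. The toy two-state model and the function name `viterbi` are assumptions for illustration, and time indices are 0-based in the code. Probabilities are multiplied directly; a practical version would typically work with log probabilities to avoid underflow.

```python
def viterbi(O, pi, A, B):
    """Return (p_star, path): the best path probability and the state sequence."""
    n = len(pi)
    # Initialization: delta_1(i) = pi_i * b_i(O_1), psi_1(i) = 0.
    delta = [[pi[i] * B[i][O[0]] for i in range(n)]]
    psi = [[0] * n]
    # Recursion: keep, for each state j, the best predecessor and its score.
    for t in range(1, len(O)):
        prev = delta[-1]
        d_row, p_row = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: prev[i] * A[i][j])
            p_row.append(best_i)
            d_row.append(prev[best_i] * A[best_i][j] * B[j][O[t]])
        delta.append(d_row)
        psi.append(p_row)
    # Termination: pick the best final state.
    last = max(range(n), key=lambda i: delta[-1][i])
    # Backtracking: follow the stored predecessors from the end to the start.
    path = [last]
    for t in range(len(O) - 1, 0, -1):
        path.append(psi[t][path[-1]])
    path.reverse()
    return delta[-1][last], path


pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
O = [0, 1, 0]

p_star, path = viterbi(O, pi, A, B)
print(p_star, path)
```

Unlike picking the most probable state at each step separately, the returned path is a single consistent sequence, because every step is linked to its predecessor through `psi`.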
26
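26 Learning We define $\xi_t(i, j)$ as the probability of being in $S_i$ at time t and in $S_j$ at time t + 1, given the whole observation O and λ: $\xi_t(i, j) \equiv P(q_t = S_i, q_{t+1} = S_j \mid O, \lambda) = \frac{\alpha_t(i) a_{ij} b_j(O_{t+1}) \beta_{t+1}(j)}{\sum_k \sum_l \alpha_t(k) a_{kl} b_l(O_{t+1}) \beta_{t+1}(l)}$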
27
27 Baum-Welch (EM)
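A minimal sketch of one Baum-Welch iteration on a single observation sequence, assuming the standard re-estimation formulas: the E-step computes $\gamma_t(i)$ and $\xi_t(i, j)$ from the forward and backward variables, and the M-step replaces the counts of the observable case with these expected counts. The toy model, the observation sequence, and the function names are assumptions for illustration. No scaling or smoothing is applied, so this is unsuitable for long sequences as written.

```python
def forward(O, pi, A, B):
    n = len(pi)
    alpha = [[pi[i] * B[i][O[0]] for i in range(n)]]
    for t in range(1, len(O)):
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][O[t]]
                      for j in range(n)])
    return alpha


def backward(O, A, B):
    n, T = len(A), len(O)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    return beta


def baum_welch_step(O, pi, A, B):
    """One EM iteration; returns (new_pi, new_A, new_B, likelihood before update)."""
    n, m, T = len(pi), len(B[0]), len(O)
    alpha, beta = forward(O, pi, A, B), backward(O, A, B)
    likelihood = sum(alpha[-1])
    # E-step: posterior state and transition probabilities.
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
             for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] / likelihood
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    # M-step: expected counts take the place of observed counts.
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][j] for t in range(T) if O[t] == k) /
              sum(gamma[t][j] for t in range(T))
              for k in range(m)] for j in range(n)]
    return new_pi, new_A, new_B, likelihood


# Assumed starting guess and toy observation sequence.
params = ([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.5, 0.5], [0.1, 0.9]])
O = [0, 1, 0, 0, 1, 1, 0, 1]

likelihoods = []
for _ in range(5):
    new_pi, new_A, new_B, lik = baum_welch_step(O, *params)
    likelihoods.append(lik)
    params = (new_pi, new_A, new_B)
print(likelihoods)  # EM should not decrease the likelihood between iterations
```

With several training sequences, the expected counts would be accumulated over all sequences before the M-step divides them.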
28
28 References Concept: Rabiner, L. R. (1989). A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 77(2), 257-285. Application: Geng, J. and Yang, J. (2004). Automatic Extraction of Bibliographic Information on the Web. Proceedings of the 8th International Database Engineering and Applications Symposium (IDEAS’04), 193-204.