
1 Hidden Markov Model
Presented by Qinmin Hu

2 Outline
Introduction
Generating patterns
Markov process
Hidden Markov model
Forward algorithm
Viterbi algorithm
Forward-backward algorithm
Summary

3 Introduction
Motivation: we are interested in finding patterns that appear over time. Such patterns occur in many areas: the sequence of commands someone uses when instructing a computer, the sequence of words in a sentence, the sequence of phonemes in a spoken word. Any area where a sequence of events occurs can produce useful patterns.
[Diagram: seaweed states (dry, wet, soggy) are the observable states; weather states (e.g. sun) are the hidden states.]

4 Generating Patterns (1)
Deterministic patterns. Example: the sequence of traffic lights is red, red/amber, green, amber, red. The sequence can be pictured as a state machine, where the different states of the lights follow each other. Notice that each state depends solely on the previous state, so if the light is green, an amber light will always follow; the system is deterministic.

5 Generating Patterns (2)
Non-deterministic patterns. Weather example: unlike the traffic-light example, we cannot expect the three weather states (sunny, cloudy, rainy) to follow each other deterministically, so the weather states are non-deterministic.
Markov assumption (simplifies problems greatly): the state of the model depends only upon the previous states of the model.
[Table: previous states D1 Dry/sunny, D2 Dry/rainy, D3 Soggy/cloudy, D4 Soggy/rainy, D5 Dry/sunny; predicted state: ?]

6 Markov Process
Consists of:
– States: three weather states - sunny, cloudy, rainy.
– π vector: the probability of the system being in each of the states at time 1.
– State transition matrix: the probability of the weather given the previous day's weather.

7 Hidden Markov Model (1)
Definitions: a hidden Markov model (HMM) is a triple (π, A, B).
– π = (π_i): the vector of the initial state probabilities.
– A = (a_ij): the state transition matrix; a_ij = Pr(x_i now | x_j previously).
– B = (b_ij): the confusion matrix; b_ij = Pr(y_i | x_j).
NOTE: Each probability in the state transition matrix and in the confusion matrix is time independent; the matrices do not change as the system evolves. In practice, this is one of the most unrealistic assumptions that Markov models make about real processes.

8 Hidden Markov Model (2)
An HMM contains two sets of states and three sets of probabilities:
– Hidden states: e.g., the weather states.
– Observable states: e.g., the seaweed states.
– π vector: the probability of each hidden state at time t = 1.
– State transition matrix: the probability of a hidden state given the previous hidden state.
– Confusion matrix: the probability of each observable state given a hidden state.

9 Hidden Markov Model (3)
A simple first-order Markov process (diagram).

10 Hidden Markov Model (4)
[Diagram: the π vector over the hidden states at t = 1; the state transition matrix from previous hidden states to hidden states; the confusion matrix from hidden states to observable states.]

11 Hidden Markov Model (5)
Once a system can be described as an HMM, three problems can be solved.
– Evaluation: finding the probability of an observed sequence given an HMM. For example, we may have a 'Summer' model and a 'Winter' model for the seaweed; we may then hope to determine the season on the basis of a sequence of dampness observations. Algorithm: forward algorithm.
– Decoding: finding the sequence of hidden states that most probably generated an observed sequence. For example, given the observed sequence of seaweed states, find the hidden weather states. Algorithm: Viterbi algorithm.
– Learning (hardest): estimating an HMM from a sequence of observations. For example, we may determine the triple (π, A, B) of the weather HMM. Algorithm: forward-backward algorithm.

12 Forward Algorithm (1) (Evaluation)
Input: π vector, A (state transition matrix), B (confusion matrix).
Output: the probability of an observed sequence.

Initial state probabilities (π vector): sunny 0.63, cloudy 0.17, rainy 0.20

Confusion matrix (B):
          dry    dryish  damp   soggy
  sunny   0.60   0.20    0.15   0.05
  cloudy  0.25   0.25    0.25   0.25
  rainy   0.05   0.10    0.35   0.50

State transition matrix (A), rows = weather yesterday, columns = weather today:
          sunny  cloudy  rainy
  sunny   0.500  0.250   0.250
  cloudy  0.375  0.125   0.375
  rainy   0.125  0.675   0.375

Probability of the observed sequence (to be computed):
          dry    dryish  damp   soggy
  sunny   ?      ?       ?      ?
  cloudy  ?      ?       ?      ?
  rainy   ?      ?       ?      ?
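To make these inputs concrete, here is a minimal Python sketch (not part of the original slides) that encodes the π vector, transition matrix A, and confusion matrix B above as plain lists, with states and observations indexed in the order shown; the sketches after the later slides assume the same layout.

```python
# Hidden states and observations, indexed in the order used on the slides.
STATES = ["sunny", "cloudy", "rainy"]
OBSERVATIONS = ["dry", "dryish", "damp", "soggy"]

# pi[i] = Pr(hidden state i at time 1)
pi = [0.63, 0.17, 0.20]

# A[i][j] = Pr(state j today | state i yesterday)
A = [
    [0.500, 0.250, 0.250],  # from sunny
    [0.375, 0.125, 0.375],  # from cloudy
    [0.125, 0.675, 0.375],  # from rainy
]

# B[i][k] = Pr(observing k | hidden state i)
B = [
    [0.60, 0.20, 0.15, 0.05],  # sunny
    [0.25, 0.25, 0.25, 0.25],  # cloudy
    [0.05, 0.10, 0.35, 0.50],  # rainy
]
```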

13 Forward Algorithm (2) (Evaluation)
Partial probability. From the example, there are 3^3 = 27 possible weather sequences for three observations, so the probability is:
Pr(dry, damp, soggy | HMM) = Pr(dry, damp, soggy and sunny, sunny, sunny) + Pr(dry, damp, soggy and sunny, sunny, cloudy) + Pr(dry, damp, soggy and sunny, sunny, rainy) + ... + Pr(dry, damp, soggy and rainy, rainy, rainy),
where each term is the joint probability of the observations and one hidden weather sequence.
Expensive! We reduce the complexity by computing the probability recursively, as shown in the sketch below.
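To illustrate why the direct sum is expensive, here is a small brute-force sketch (an illustration only, reusing the Python encoding of π, A, B from the previous sketch) that enumerates all 27 hidden weather sequences for the observations dry, damp, soggy and sums the joint probabilities; the number of terms grows exponentially with the sequence length, which is exactly what the recursion avoids.

```python
from itertools import product

pi = [0.63, 0.17, 0.20]                 # sunny, cloudy, rainy
A = [[0.500, 0.250, 0.250],             # A[i][j] = Pr(j today | i yesterday)
     [0.375, 0.125, 0.375],
     [0.125, 0.675, 0.375]]
B = [[0.60, 0.20, 0.15, 0.05],          # B[i][k] = Pr(obs k | state i)
     [0.25, 0.25, 0.25, 0.25],
     [0.05, 0.10, 0.35, 0.50]]

obs = [0, 2, 3]                         # dry, damp, soggy

total = 0.0
for path in product(range(3), repeat=len(obs)):   # 3^3 = 27 hidden sequences
    p = pi[path[0]] * B[path[0]][obs[0]]          # Pr(first state) * Pr(first obs | state)
    for t in range(1, len(obs)):
        p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
    total += p                                    # joint Pr(observations, hidden path)

print("Pr(dry, damp, soggy | HMM) =", total)
```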

14 Forward Algorithm (3) (Evaluation)
Partial probability: α_t(j) = Pr(observation at time t | hidden state is j) * Pr(all paths to state j at time t).
Steps:
Step 1 (t = 1): α_1(j) = π(j) * b_j(o_1)
Step 2 (t > 1): α_t(j) = b_j(o_t) * Σ_i α_{t-1}(i) * a_ij

15 (Forward algorithm, worked example)
(π vector, confusion matrix B, and state transition matrix A as on slide 12.)
α_1(1) = π(1) * b11 = 0.63 * 0.60 = 0.378
α_1(2) = π(2) * b21 = 0.17 * 0.25 = 0.043
α_1(3) = π(3) * b31 = 0.20 * 0.05 = 0.010
α_2(1) = [α_1(1) * a11 + α_1(2) * a21 + α_1(3) * a31] * b12 = 0.041
α_2(2) = [α_1(1) * 0.250 + α_1(2) * 0.125 + α_1(3) * 0.675] * 0.25 = 0.027
α_2(3) = [α_1(1) * 0.250 + α_1(2) * 0.375 + α_1(3) * 0.375] * 0.10 = 0.011
...

Partial probabilities of the observed sequence:
          dry      dryish   damp     soggy
  sunny   α_1(1)   α_2(1)   α_3(1)   α_4(1)
  cloudy  α_1(2)   α_2(2)   α_3(2)   α_4(2)
  rainy   α_1(3)   α_2(3)   α_3(3)   α_4(3)

Values:
          dry      dryish   damp     soggy
  sunny   0.378    0.041    4.81E-3  2.74E-4
  cloudy  0.043    0.027    5.34E-3  1.92E-3
  rainy   0.010    0.011    8.60E-3  3.21E-3
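The same computation as a short forward-algorithm sketch in Python (an illustration only, using the slide's π, A, B); it reproduces the α table above for the observation sequence dry, dryish, damp, soggy.

```python
pi = [0.63, 0.17, 0.20]                 # sunny, cloudy, rainy
A = [[0.500, 0.250, 0.250],             # A[i][j] = Pr(j today | i yesterday)
     [0.375, 0.125, 0.375],
     [0.125, 0.675, 0.375]]
B = [[0.60, 0.20, 0.15, 0.05],          # B[i][k] = Pr(obs k | state i)
     [0.25, 0.25, 0.25, 0.25],
     [0.05, 0.10, 0.35, 0.50]]

obs = [0, 1, 2, 3]                      # dry, dryish, damp, soggy

# Step 1 (t = 1): alpha_1(j) = pi(j) * b_j(o_1)
alpha = [[pi[j] * B[j][obs[0]] for j in range(3)]]

# Step 2 (t > 1): alpha_t(j) = b_j(o_t) * sum_i alpha_{t-1}(i) * a_ij
for t in range(1, len(obs)):
    prev = alpha[-1]
    alpha.append([B[j][obs[t]] * sum(prev[i] * A[i][j] for i in range(3))
                  for j in range(3)])

for t, row in enumerate(alpha, start=1):
    print("t =", t, ["%.3g" % a for a in row])

# The probability of the whole observed sequence is the sum of the final alphas.
print("Pr(dry, dryish, damp, soggy | HMM) =", sum(alpha[-1]))
```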

16 Viterbi Algorithm (1) (Decoding)
Input: π vector, A (state transition matrix), B (confusion matrix), and an observed sequence (here: dry, dryish, damp, soggy).
Output: the most probable sequence of hidden states.
(π, A, B and the forward probabilities of the observed sequence are as on slides 12 and 15.)

17 Viterbi Algorithm (2) (Decoding)
Description:
Goal: to recapture the most likely underlying state sequence, i.e. the hidden state combination that maximises Pr(observed sequence, hidden state combination).
Algorithm:
– Work through an execution trellis, calculating a partial probability for each cell.
– Keep a back pointer for each cell, indicating how that cell is most probably reached.
– On completion, take the most likely final state as correct, and trace the path back to t = 1 via the back pointers.

18 Viterbi Algorithm (3) (Decoding)
Partial probability. From the example, the most probable sequence of hidden states is the one that maximises the joint probability over: Pr(dry, damp, soggy and sunny, sunny, sunny), Pr(dry, damp, soggy and sunny, sunny, cloudy), Pr(dry, damp, soggy and sunny, sunny, rainy), ..., Pr(dry, damp, soggy and rainy, rainy, rainy).
Expensive! As with the forward algorithm, we use the time invariance of the probabilities to reduce the complexity of the calculation.

19 Viterbi Algorithm (4) (Decoding)
Each of the three states at t = 3 has a most probable path to it, like the paths displayed in the trellis picture. These paths are called partial best paths. Each of them has an associated probability, the partial probability δ.
t = 1: δ_1(j) = π(j) * b_j(o_1)
t > 1: δ_t(j) = max_i [ δ_{t-1}(i) * a_ij * b_j(o_t) ]

20 Viterbi Algorithm (5) (Decoding)
Back pointers φ. We now know the partial probability δ_t(i) at each intermediate and final state. However, the aim is to find the most probable sequence of states through the trellis given an observation sequence, so we need some way of remembering the partial best paths through the trellis. We want the back pointer φ to answer the question: if I am here, by what route is it most likely I arrived?
φ_t(j) = argmax_i [ δ_{t-1}(i) * a_ij ]

21 (Viterbi algorithm, worked example)
(π vector, confusion matrix B, and state transition matrix A as on slide 12.)
δ_1(1) = 0.63 * 0.60 = 0.378
δ_1(2) = 0.17 * 0.25 = 0.043
δ_1(3) = 0.20 * 0.05 = 0.010
Max(δ_1(1), δ_1(2), δ_1(3)) = δ_1(1) = 0.378
δ_2(1) = max(0.378 * 0.50 * 0.20, 0.043 * 0.375 * 0.20, 0.010 * 0.125 * 0.20) = 0.0378
δ_2(2) = max(0.378 * 0.25 * 0.25, 0.043 * 0.125 * 0.25, 0.010 * 0.675 * 0.25) = 0.0236
δ_2(3) = max(0.378 * 0.25 * 0.10, 0.043 * 0.375 * 0.10, 0.010 * 0.375 * 0.10) = 0.00945
...
[Trellis diagram: hidden states sunny, cloudy, rainy across the observations dry, dryish, damp, soggy.]
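A minimal Viterbi sketch in Python (an illustration only, using the slide's π, A, B) that computes the δ values and back pointers φ for the observations dry, dryish, damp, soggy and then traces back the most probable weather sequence.

```python
pi = [0.63, 0.17, 0.20]                 # sunny, cloudy, rainy
A = [[0.500, 0.250, 0.250],             # A[i][j] = Pr(j today | i yesterday)
     [0.375, 0.125, 0.375],
     [0.125, 0.675, 0.375]]
B = [[0.60, 0.20, 0.15, 0.05],          # B[i][k] = Pr(obs k | state i)
     [0.25, 0.25, 0.25, 0.25],
     [0.05, 0.10, 0.35, 0.50]]
STATES = ["sunny", "cloudy", "rainy"]

obs = [0, 1, 2, 3]                      # dry, dryish, damp, soggy

# t = 1: delta_1(j) = pi(j) * b_j(o_1); no back pointer yet.
delta = [[pi[j] * B[j][obs[0]] for j in range(3)]]
phi = []

# t > 1: delta_t(j) = max_i delta_{t-1}(i) * a_ij * b_j(o_t)
#        phi_t(j)   = argmax_i delta_{t-1}(i) * a_ij
for t in range(1, len(obs)):
    prev = delta[-1]
    d_row, phi_row = [], []
    for j in range(3):
        scores = [prev[i] * A[i][j] for i in range(3)]
        best = scores.index(max(scores))
        phi_row.append(best)
        d_row.append(max(scores) * B[j][obs[t]])
    delta.append(d_row)
    phi.append(phi_row)

# Take the most likely final state and trace back to t = 1 via the back pointers.
state = delta[-1].index(max(delta[-1]))
path = [state]
for phi_row in reversed(phi):
    state = phi_row[state]
    path.append(state)
path.reverse()

print("Most probable hidden sequence:", [STATES[s] for s in path])
```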

22 Forward-backward Algorithm (Learning)
Evaluation (forward algorithm) and decoding (Viterbi algorithm) are useful, but both depend upon foreknowledge of the HMM parameters: the state transition matrix, the confusion matrix, and the π vector.
Learning problem: forward-backward algorithm.
– In many practical problems these parameters are not directly measurable and have to be estimated.
– The algorithm makes this estimate from a sequence of observations known to come from a given observable set, produced by a hidden set of states following a Markov model.
Though the forward-backward algorithm is not unduly hard to comprehend, it is more complex in nature than the previous two, so it is not detailed in this presentation.

23 Summary
Generating patterns
– Patterns do not appear in isolation but as part of a series in time.
– The Markov assumption is that the process's state depends only on the preceding N states.
Markov model
– The process states (patterns) are not directly observable; they are observed indirectly, and probabilistically, as another set of patterns.
Three problems are solved:
– Evaluation: forward algorithm.
– Decoding: Viterbi algorithm.
– Learning: forward-backward algorithm.

24 Thanks! Questions?

