
1 PGM 2003/04 Tirgul 2 Hidden Markov Models

2 Introduction Hidden Markov Models (HMMs) are one of the most common forms of probabilistic graphical models, although they were developed long before the notion of general graphical models existed (1913). They are used to model time-invariant, limited-horizon processes that have both an underlying mechanism (hidden states) and an observable consequence. They have been extremely successful in language modeling and speech recognition systems and are still the most widely used technique in these domains.

3 Markov Models A Markov process or model assumes that we can predict the future based just on the present (or on a limited horizon into the past). Let {X_1,…,X_T} be a sequence of random variables taking values in {1,…,N}; then the Markov properties are: Limited Horizon: P(X_{t+1}=k | X_1,…,X_t) = P(X_{t+1}=k | X_t); Time invariant (stationary): P(X_{t+1}=k | X_t=i) is the same for every t.

4 Describing a Markov Chain A Markov chain can be described by the transition matrix A and the initial state probabilities Q, where A_ij = P(X_{t+1}=j | X_t=i) and Q_i = P(X_1=i), or alternatively by folding Q into A as the transitions out of a dummy start state X_0. We then calculate: P(X_1,…,X_T) = Q_{X_1} * A_{X_1 X_2} * … * A_{X_{T-1} X_T}. [Figure: a four-state chain (states 1-4) whose edges carry the probabilities 0.6, 0.4, 1.0, 0.3, 0.7 and 1.0.]
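As a concrete illustration of the calculation above, here is a minimal Python/NumPy sketch (my own code, not from the slides); the matrix values below are arbitrary placeholders, not the ones in the lost figure.

import numpy as np

# Illustrative 3-state chain: A[i, j] = P(X_{t+1}=j | X_t=i), Q[i] = P(X_1=i).
A = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.0, 0.5, 0.5]])
Q = np.array([0.6, 0.3, 0.1])

def chain_probability(states):
    # P(X_1..X_T) = Q[x_1] * prod_t A[x_{t-1}, x_t]
    p = Q[states[0]]
    for prev, curr in zip(states, states[1:]):
        p *= A[prev, curr]
    return p

print(chain_probability([0, 1, 2, 2]))   # 0.6 * 0.2 * 0.3 * 0.5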

5 Hidden Markov Models In a Hidden Markov Model (HMM) we do not observe the sequence of states that the model passed through, but only some probabilistic function of it. Thus, it is a Markov model with the addition of emission probabilities: b_ik = P(y_t=k | x_t=i). For example: Observed: the house is on fire; States: det noun verb prep noun.

6 Why use HMMs? A lot of real-life processes are composed of underlying events generating surface phenomena; tagging parts of speech is a common example. We can usually think of processes as having a limited horizon (and we can easily extend to the case of a constant horizon larger than 1). We have an efficient training algorithm using EM. Once the model is set, we can easily run it (see the sketch below): t=1, start in state i with probability q_i; forever: move from state i to j with probability a_ij, emit y_t=k with probability b_ik, t=t+1.
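The generative loop above translates directly into code. This is a minimal sketch (Python/NumPy, my own code rather than the slides'), assuming q, A and B are the initial, transition and emission probabilities in the notation used here.

import numpy as np

def run_hmm(q, A, B, T, rng=None):
    # Sample T steps: start in a state drawn from q, then repeatedly
    # emit a symbol from B[state] and move to the next state using A[state].
    rng = np.random.default_rng() if rng is None else rng
    state = rng.choice(len(q), p=q)
    states, emissions = [], []
    for _ in range(T):
        states.append(int(state))
        emissions.append(int(rng.choice(B.shape[1], p=B[state])))
        state = rng.choice(len(q), p=A[state])
    return states, emissions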

7 The fundamental questions Likelihood: Given a model λ=(A,B,Q), how do we efficiently compute the likelihood of an observation, P(Y|λ)? Decoding: Given the observation sequence Y and a model λ, what state sequence explains it best (MPE)? This is, for example, the tagging process of an observed sentence. Learning: Given an observation sequence Y and a generic model, how do we estimate the parameters that define the best model to describe the data?

8 Computing the Likelihood For any state sequence (X_1,…,X_T): P(Y|X,λ) = b_{X_1 Y_1} * b_{X_2 Y_2} * … * b_{X_T Y_T}; using P(Y,X)=P(Y|X)P(X) we get: P(Y|λ) = Σ_X P(Y|X)P(X) = Σ_{X_1,…,X_T} Q_{X_1} b_{X_1 Y_1} Π_{t=2..T} a_{X_{t-1} X_t} b_{X_t Y_t}. But summing over all N^T state sequences costs O(T·N^T) multiplications!
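To make the blow-up concrete, here is a brute-force sketch (Python/NumPy, my own code, not from the slides) that literally enumerates all N^T state sequences; it is only usable for tiny T and serves to motivate the trellis algorithm that follows.

import itertools
import numpy as np

def likelihood_naive(q, A, B, y):
    # P(Y) = sum over every state sequence X of P(X) * P(Y | X).
    N, T = len(q), len(y)
    total = 0.0
    for X in itertools.product(range(N), repeat=T):   # N**T sequences
        p = q[X[0]] * B[X[0], y[0]]
        for t in range(1, T):
            p *= A[X[t-1], X[t]] * B[X[t], y[t]]
        total += p
    return total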

9 The trellis (lattice) algorithm To compute the likelihood we need to enumerate over all paths in the lattice (all possible instantiations of X_1…X_T). But some starting sub-path (blue in the original figure) is common to many continuing paths (blue+red). Idea: using dynamic programming, calculate the probability of a path in terms of shorter sub-paths.

10 The trellis (lattice) algorithm We build a matrix of the probability of being at state i at time t, α_t(i)=P(x_t=i, y_1 y_2 … y_t), as a function of the previous column (forward procedure): α_{t+1}(i) = [Σ_{j=1..N} α_t(j) a_{ji}] · b_{i y_{t+1}}. [Figure: column t of the trellis (states 1, 2, …, N) feeding state i in column t+1 through the transitions a_{1i}, a_{2i}, …, a_{Ni}.]
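A sketch of the forward procedure in this notation (Python/NumPy, my own code, not from the slides); each column of the trellis is filled from the previous one, so the cost is O(T·N^2) instead of O(T·N^T).

import numpy as np

def forward(q, A, B, y):
    # alpha[t, i] = P(x_t = i, y_1 .. y_t); filled column by column.
    N, T = len(q), len(y)
    alpha = np.zeros((T, N))
    alpha[0] = q * B[:, y[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, y[t]]
    return alpha          # P(Y) = alpha[-1].sum()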

11 The trellis (lattice) algorithm (cont.) We can similarly define a backward procedure for filling the matrix, β_t(i) = P(y_{t+1} … y_T | x_t=i), computed from the next column: β_t(i) = Σ_{j=1..N} a_{ij} b_{j y_{t+1}} β_{t+1}(j). [Figure: state i in column t connected to states 1, …, N in column t+1 through a_{i1} b_{1 y_{t+1}}, …, a_{iN} b_{N y_{t+1}}.]

12 The trellis (lattice) algorithm (cont.) And we can easily combine: P(Y, x_t=i) = α_t(i) β_t(i). And then: P(Y|λ) = Σ_{i=1..N} α_t(i) β_t(i) for any t; in particular, P(Y|λ) = Σ_i α_T(i).
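A matching sketch of the backward procedure and of the combination above (Python/NumPy, my own code, not from the slides); the forward sketch from slide 10 is assumed to be in scope for the sanity check noted in the comment.

import numpy as np

def backward(A, B, y):
    # beta[t, i] = P(y_{t+1} .. y_T | x_t = i); beta[T-1, i] = 1.
    N, T = A.shape[0], len(y)
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, y[t+1]] * beta[t+1])
    return beta

# Sanity check of the combination: for every t,
#   (forward(q, A, B, y)[t] * backward(A, B, y)[t]).sum() equals P(Y).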

13 Finding the best state sequence We would like to find the most likely path (and not just the most likely state at each time slice). The Viterbi algorithm is an efficient trellis method for finding the MPE: δ_{t+1}(i) = max_j [δ_t(j) a_{ji}] · b_{i y_{t+1}}, and we keep back-pointers ψ_{t+1}(i) = argmax_j δ_t(j) a_{ji} to reconstruct the path.
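A Viterbi sketch under the same conventions (Python/NumPy, my own code, not from the slides): delta holds the best score of any path ending in each state and psi the back-pointers used to reconstruct that path.

import numpy as np

def viterbi(q, A, B, y):
    # Most probable state sequence for the observations y.
    N, T = len(q), len(y)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = q * B[:, y[0]]
    for t in range(1, T):
        scores = delta[t-1][:, None] * A       # scores[j, i] = delta[t-1, j] * a_ji
        psi[t] = scores.argmax(axis=0)         # best predecessor of each state i
        delta[t] = scores.max(axis=0) * B[:, y[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):              # follow the back-pointers
        path.append(int(psi[t, path[-1]]))
    return path[::-1]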

14 The Casino HMM A casino switches from a fair die (state F) to a loaded one (state U) with probability 0.05 and the other way around with probability 0.1. The loaded die has 0.5 probability of showing a 6. The casino, honestly, reads off the number that was rolled.
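Written out as matrices (a Python/NumPy sketch, not part of the slides), with state 1 = fair and state 2 = loaded; the 0.5/0.5 initial split and the 0.1 probability for each non-6 face of the loaded die are assumptions consistent with the calculation on the next slide.

import numpy as np

# Row/column 0 = fair die (F), 1 = loaded die (U); faces 1..6 are columns 0..5.
q = np.array([0.5, 0.5])                        # assumed 50/50 start
A = np.array([[0.95, 0.05],                     # fair   -> fair / loaded
              [0.10, 0.90]])                    # loaded -> fair / loaded
B = np.array([[1/6, 1/6, 1/6, 1/6, 1/6, 1/6],   # fair die: uniform
              [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]])  # loaded die: a 6 with probability 0.5

# Rolls 3151166661, zero-indexed, for use with the forward/Viterbi sketches above:
y = [2, 0, 4, 0, 0, 5, 5, 5, 5, 0]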

15 The Casino HMM (cont.) What is the likelihood of 3151166661? Y = 3 1 5 1 1 6 6 6 6 1. α_1(1)=0.5*1/6=1/12, α_1(2)=0.5*0.1=0.05; α_2(1)=1/6*(0.95*1/12+0.1*0.05) ≈ 0.014, α_2(2)=0.1*(0.05*1/12+0.9*0.05) ≈ 0.0049; α_3(1) ≈ 0.0023, α_3(2) ≈ 0.0005; α_4(1) ≈ 0.0004, α_4(2) ≈ 0.0001; α_5(1) ≈ 0.0001, α_5(2) < 0.0001; … all smaller than 0.0001!

16 The Casino HMM (cont.) What explains 3151166661 best? Y = 3 1 5 1 1 6 6 6 6 1. δ_1(1)=0.5*1/6=1/12, δ_1(2)=0.5*0.1=0.05; δ_2(1)=1/6*max(0.95*1/12, 0.1*0.05) ≈ 0.0132, δ_2(2)=1/10*max(0.05*1/12, 0.9*0.05) ≈ 0.0045; δ_3(1) ≈ 0.0021, δ_3(2) ≈ 0.0004; δ_4(1) ≈ 0.0003, δ_4(2) < 0.0001; δ_5(1) ≈ 0.0001, δ_5(2) < 0.001; …

17 The Casino HMM (cont.) An example of reconstruction using Viterbi (Durbin); 0 marks the fair die and 1 the loaded die:
Rolls:   3151162464466442453113216311641521336
Die:     0000000000000000000000000000000000000
Viterbi: 0000000000000000000000000000000000000
Rolls:   2514454363165662656666665116645313265
Die:     0000000011111111111111111111100000000
Viterbi: 0000000000011111111111111111100000000
Rolls:   1245636664631636663162326455236266666
Die:     0000111111111111111100011111111111111
Viterbi: 0000111111111111111111111111111111111

18 Learning If we were given both X and Y, we could choose the parameters directly: using the maximum likelihood principle, we simply set each parameter to the corresponding relative frequency (e.g., a_ij = #(transitions i→j) / #(transitions out of i), b_ik = #(times state i emits k) / #(visits to i)). What do we do when we have only Y? ML here does not have a closed-form formula!
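A minimal sketch of the fully observed (complete data) case (Python/NumPy, my own code, not from the slides); it assumes every state appears at least once so the normalizations are well defined.

import numpy as np

def ml_estimate(X, Y, N, K):
    # Relative-frequency estimates of A and B from a fully observed run
    # over N states and K symbols.
    A = np.zeros((N, N))
    B = np.zeros((N, K))
    for prev, curr in zip(X, X[1:]):
        A[prev, curr] += 1
    for state, symbol in zip(X, Y):
        B[state, symbol] += 1
    A /= A.sum(axis=1, keepdims=True)
    B /= B.sum(axis=1, keepdims=True)
    return A, B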

19 EM (Baum-Welch) Idea: use the current guess to complete the data and re-estimate. Thm: the likelihood of the observables never decreases! (to be proved later in the course). Problem: gets stuck at sub-optimal solutions. E-Step: "guess" X using Y and the current parameters. M-Step: re-estimate the parameters using the current completion of the data.

20 Parameter Estimation We define the expected number of transitions from state i to j at time t: ξ_t(i,j) = P(x_t=i, x_{t+1}=j | Y, λ) = α_t(i) a_ij b_{j y_{t+1}} β_{t+1}(j) / P(Y|λ). The expected number of transitions from i to j in Y is then Σ_{t=1..T-1} ξ_t(i,j).

21 Parameter Estimation (cont.) We use the EM re-estimation formulas with the expected counts we already have: writing γ_t(i) = α_t(i)β_t(i)/P(Y|λ) for the expected count of being in state i at time t, the updates are a_ij ← Σ_{t=1..T-1} ξ_t(i,j) / Σ_{t=1..T-1} γ_t(i), b_ik ← Σ_{t: y_t=k} γ_t(i) / Σ_t γ_t(i), and q_i ← γ_1(i).
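Putting the expected counts together, one EM iteration can be sketched as follows (Python/NumPy, my own code, not from the slides); the forward and backward sketches from the trellis slides are assumed to be in scope.

import numpy as np

def baum_welch_step(q, A, B, y):
    # One E+M step: compute xi and gamma from alpha/beta, then re-estimate.
    y = np.asarray(y)
    T, N = len(y), len(q)
    alpha, beta = forward(q, A, B, y), backward(A, B, y)
    PY = alpha[-1].sum()
    gamma = alpha * beta / PY                       # gamma[t, i] = P(x_t=i | Y)
    xi = (alpha[:-1, :, None] * A[None, :, :] *     # xi[t, i, j] = P(x_t=i, x_{t+1}=j | Y)
          B[:, y[1:]].T[:, None, :] * beta[1:, None, :]) / PY
    q_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):
        B_new[:, k] = gamma[y == k].sum(axis=0)
    B_new /= gamma.sum(axis=0)[:, None]
    return q_new, A_new, B_new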

22 Application: Sequence Pair Alignment DNA sequences are "strings" over a four-letter alphabet = {C,G,A,T}. Real-life sequences often differ by way of mutation of a letter or even a deletion. A fundamental problem in computational biology is to align such sequences:
Input:              Output:
CTTTACGTTACTTACG    CTTTACGTTAC-TTACG
CTTAGTTACGTTAG      C-TTA-GTAACGTTA-G
How can we use an HMM for such a task?

23 Sequence Pair Alignment (cont.) We construct an HMM with 3 states that emits pairs of letters: (M)atch: emission of aligned pairs (high probability for matching letters, low for a mismatch); (D)elete1: emission of a letter in the first sequence and a gap in the second sequence; (D)elete2: the converse of D1.
Transition matrix A (rows = from, columns = to):
      M     D1    D2
M     0.9   0.05  0.05
D1    0.95  0.05  0
D2    0.95  0     0.05
Initial probabilities Q: M 0.9, D1 0.05, D2 0.05.
Emission matrix B: if in M, emit each identical pair of letters with probability 0.24; if in D1 or D2, emit all letters uniformly.
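The same model written out as arrays (a Python/NumPy sketch, not from the slides); the 0.05 completing the M row of A and the spreading of the remaining 0.04 of M's emission mass uniformly over the 12 mismatching pairs are my assumptions.

import numpy as np

# States: 0 = M (match), 1 = D1, 2 = D2; alphabet C, G, A, T.
A = np.array([[0.90, 0.05, 0.05],
              [0.95, 0.05, 0.00],
              [0.95, 0.00, 0.05]])
Q = np.array([0.9, 0.05, 0.05])

# M emits a pair of letters (a joint distribution over the 16 pairs):
# 0.24 for each of the 4 identical pairs, with the remaining 0.04 assumed
# to be spread uniformly over the 12 mismatching pairs.
B_match = np.full((4, 4), 0.04 / 12)
np.fill_diagonal(B_match, 0.24)

# D1 / D2 emit a single letter (aligned with a gap) uniformly.
B_gap = np.full(4, 0.25)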

24 Sequence Pair Alignment (cont.) How do we align 2 new sequences? We also need (B)egin and (E)nd states to signify the start and end of the sequence. From B we have the same transition probabilities as from M, and we never return to it. We have a probability of 0.05 to reach E from any state, and we never leave it; this probability determines the average length of an alignment. We now just run Viterbi, with some extra technical details (Programming ex. 1).

25 Extensions HMMs have been used so extensively that it is impossible to even begin to cover the many forms they take. Several extensions, however, are worth mentioning: using different kinds of transition matrices; using continuous observations; using a larger horizon; assigning probabilities to wait times; using several training sets; …

