
Slide 1: Data Mining - CSE5230
Hidden Markov Models (HMMs)
CSE5230/DMS/2002/10

Slide 2: Lecture Outline
- Time- and space-varying processes
- First-order Markov models
- Hidden Markov models
- Examples: coin toss experiments
- Formal definition
- Use of HMMs for classification
- References

Slide 3: Time- and Space-varying Processes (1)
- The data mining techniques we have discussed so far have focused on the classification, prediction or characterization of single data points, e.g.:
  - Assign a record to one of a set of classes
    - Decision trees, back-propagation neural networks, Bayesian classifiers, etc.
  - Predicting the value of a field in a record given the values of the other fields
    - Regression, back-propagation neural networks, etc.
  - Finding regions of feature space where data points are densely grouped
    - Clustering, self-organizing maps

Slide 4: Time- and Space-varying Processes (2)
- In the methods we have considered so far, we have assumed that each observed data point is statistically independent of the observation that preceded it, e.g.:
  - Classification: the class of data point x(t) is not influenced by the class of x(t-1) (or indeed any other data point)
  - Prediction: the value of a field for a record depends only on the values of the other fields of that record, not on values in any other records
- Several important real-world data mining problems cannot be modeled in this way.

Slide 5: Time- and Space-varying Processes (3)
- We often encounter sequences of observations, where each observation may depend on the observations which preceded it
- Examples:
  - Sequences of phonemes (fundamental sounds) in speech (speech recognition)
  - Sequences of letters or words in text (text categorization, information retrieval, text mining)
  - Sequences of web page accesses (web usage mining)
  - Sequences of bases (C, G, A, T) in DNA (genome projects [human, fruit fly, etc.])
  - Sequences of pen-strokes (handwriting recognition)
- In all these cases, the probability of observing a particular value in the sequence can depend on the values which came before it

Slide 6: Example: web log
- Consider the following extract from a web log:

  xxx - - [16/Sep/2002:14:50:34 +1000] "GET /courseware/cse5230/ HTTP/1.1" 200 13539
  xxx - - [16/Sep/2002:14:50:42 +1000] "GET /courseware/cse5230/html/research_paper.html HTTP/1.1" 200 11118
  xxx - - [16/Sep/2002:14:51:28 +1000] "GET /courseware/cse5230/html/tutorials.html HTTP/1.1" 200 7750
  xxx - - [16/Sep/2002:14:51:30 +1000] "GET /courseware/cse5230/assets/images/citation.pdf HTTP/1.1" 200 32768
  xxx - - [16/Sep/2002:14:51:31 +1000] "GET /courseware/cse5230/assets/images/citation.pdf HTTP/1.1" 206 146390
  xxx - - [16/Sep/2002:14:51:40 +1000] "GET /courseware/cse5230/assets/images/clustering.pdf HTTP/1.1" 200 17100
  xxx - - [16/Sep/2002:14:51:40 +1000] "GET /courseware/cse5230/assets/images/clustering.pdf HTTP/1.1" 206 14520
  xxx - - [16/Sep/2002:14:51:56 +1000] "GET /courseware/cse5230/assets/images/NeuralNetworksTute.pdf HTTP/1.1" 200 17137
  xxx - - [16/Sep/2002:14:51:56 +1000] "GET /courseware/cse5230/assets/images/NeuralNetworksTute.pdf HTTP/1.1" 206 16017
  xxx - - [16/Sep/2002:14:52:03 +1000] "GET /courseware/cse5230/html/lectures.html HTTP/1.1" 200 9608
  xxx - - [16/Sep/2002:14:52:05 +1000] "GET /courseware/cse5230/assets/images/week03.ppt HTTP/1.1" 200 121856
  xxx - - [16/Sep/2002:14:52:24 +1000] "GET /courseware/cse5230/assets/images/week06.ppt HTTP/1.1" 200 527872

- Clearly the URL which is requested depends on the URL which was requested before
  - If the user uses the “Back” button in his/her browser, the requested URL may depend on earlier URLs in the sequence too
- Then, given a particular observed URL, we can calculate the probabilities of observing each of the possible URLs next
  - Note that we may even observe the same URL next
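
As an aside (not part of the original slides), transition probabilities of this kind can be estimated directly from log lines like those above. The sketch below assumes Apache-style log lines and illustrative function names; it simply counts URL-to-next-URL transitions and normalises each row.

```python
# Sketch (assumed, not from the lecture): estimate first-order transition
# probabilities between URLs from Apache-style web log lines.
import re
from collections import defaultdict

LOG_LINE = re.compile(r'"GET (?P<url>\S+) HTTP/1\.[01]"')

def transition_probabilities(log_lines):
    """Count URL -> next-URL transitions and normalise them to probabilities."""
    counts = defaultdict(lambda: defaultdict(int))
    previous = None
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue
        url = match.group("url")
        if previous is not None:
            counts[previous][url] += 1
        previous = url
    # Normalise each row so the outgoing probabilities sum to 1.
    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: n / total for dst, n in row.items()}
    return probs

# Example with two of the log lines above:
lines = [
    'xxx - - [16/Sep/2002:14:51:28 +1000] "GET /courseware/cse5230/html/tutorials.html HTTP/1.1" 200 7750',
    'xxx - - [16/Sep/2002:14:51:30 +1000] "GET /courseware/cse5230/assets/images/citation.pdf HTTP/1.1" 200 32768',
]
print(transition_probabilities(lines))
```

Applied to a full log, each row of the result gives the estimated probability of each possible next URL given the current one.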

Slide 7: First-Order Markov Models (1)
- In order to model processes such as these, we make use of the idea of states. At any time t, we consider the system to be in state w(t).
- We can consider a sequence of successive states of length T: w^T = {w(1), w(2), ..., w(T)}
- We will model the production of such a sequence using transition probabilities: a_ij = P(w(t+1) = w_j | w(t) = w_i)
- This is the probability that the system will be in state w_j at time t+1 given that it was in state w_i at time t

Slide 8: First-Order Markov Models (2)
- A model of states and transition probabilities, such as the one we have just described, is called a Markov model.
- Since we have assumed that the transition probabilities depend only on the previous state, this is a first-order Markov model
  - Higher-order Markov models are possible, but we will not consider them here
- For example, Markov models for human speech could have states corresponding to phonemes
  - A Markov model for the word “cat” would have states for /k/, /a/, /t/ and a final silent state

Slide 9: Example: Markov model for “cat”
[State diagram: /k/ -> /a/ -> /t/ -> /silent/]
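
As a concrete illustration (not in the original slides), such a chain can be written as a transition matrix and sampled. The particular probabilities below are assumptions for illustration only; the slide shows just the states.

```python
# Sketch of the /k/ -> /a/ -> /t/ -> /silent/ Markov chain. The transition
# probabilities are illustrative assumptions; self-transitions model a
# phoneme being held for more than one time step.
import random

states = ["/k/", "/a/", "/t/", "/silent/"]
A = [  # A[i][j] = P(next state = j | current state = i)
    [0.3, 0.7, 0.0, 0.0],   # /k/
    [0.0, 0.4, 0.6, 0.0],   # /a/
    [0.0, 0.0, 0.3, 0.7],   # /t/
    [0.0, 0.0, 0.0, 1.0],   # /silent/ (absorbing final state)
]

def sample_sequence(start=0, max_len=20):
    """Generate a state sequence w(1), w(2), ... from the chain."""
    seq, current = [states[start]], start
    for _ in range(max_len - 1):
        current = random.choices(range(len(states)), weights=A[current])[0]
        seq.append(states[current])
        if states[current] == "/silent/":
            break
    return seq

print(sample_sequence())
```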

Slide 10: Hidden Markov Models
- In the preceding example, we have said that the states correspond to phonemes
- In a speech recognition system, however, we don’t have access to phonemes – we can only measure properties of the sound produced by a speaker
- In general, our observed data does not correspond directly to a state of the model: the data corresponds to the visible states of the system
  - The visible states are directly accessible for measurement
- The system can also have internal “hidden” states, which cannot be observed directly
  - For each hidden state, there is a probability of observing each visible state
- This sort of model is called a Hidden Markov Model (HMM)

Slide 11: Example: coin toss experiments
- Let us imagine a scenario where we are in a room which is divided in two by a curtain.
- We are on one side of the curtain, and on the other is a person who will carry out a procedure using coins resulting in a head (H) or a tail (T).
- When the person has carried out the procedure, they call out the result, H or T, which we record.
- This system will allow us to generate a sequence of Hs and Ts, e.g.:
  HHTHTHTTHTTTTTHHTHHHHTHHHTTHHHHHHTTT
  TTTTTHTHHTHTTTTTHHTHTHHHTHTHHTTTTHHT
  TTHHTHHTTTHTHTHTHTHHHTHHTTHT ...

Slide 12: Example: single fair coin
- Imagine that the person behind the curtain has a single fair coin (i.e. it has equal probabilities of coming up heads or tails)
- We could model the process producing the sequence of Hs and Ts as a Markov model with two states, and equal transition probabilities:
  [State diagram: two states, H and T, with all transition probabilities equal to 0.5]
- Note that here the visible states correspond exactly to the internal states – the model is not hidden
- Note also that states can transition to themselves

Slide 13: Example: a fair and a biased coin
- Now let us imagine a more complicated scenario. The person behind the curtain has two coins, one fair and one biased (for example, P(T) = 0.9):
  1. The person starts by picking a coin at random
  2. The person tosses the coin, and calls out the result (H or T)
  3. If the result was H, the person switches coins
  4. Go back to step 2, and repeat
- This process generates sequences like (a simulation is sketched below):
  TTTTTTTTTTTTTTTTTTTTTTTTHHTTTTTTTHHTTTTTTT
  TTTTTTTTTTTTTTTHHTTTTTTTTTHTTHTTHHTTTTTHHT
  TTTTTTTTTHHTTTTTTTTHTHHHTTTTTTTTTTTTTTHHTT
  TTTTTHTHTTTTTTTHHTTTTT...
- Note that this looks quite different from the sequence for the fair coin example.
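
A minimal simulation of this procedure (not from the original slides; the function name and toss count are illustrative) follows directly from the four steps:

```python
# Simulate the procedure above: toss the current coin, call out the result,
# and switch coins after every head. The biased coin uses P(T) = 0.9.
import random

def coin_sequence(n_tosses, p_tail_biased=0.9):
    coin = random.choice(["fair", "biased"])    # step 1: pick a coin at random
    output = []
    for _ in range(n_tosses):
        p_tail = 0.5 if coin == "fair" else p_tail_biased
        result = "T" if random.random() < p_tail else "H"
        output.append(result)                   # step 2: call out the result
        if result == "H":                       # step 3: switch coins on a head
            coin = "biased" if coin == "fair" else "fair"
    return "".join(output)                      # step 4: repeat for n_tosses

print(coin_sequence(120))
```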

Slide 14: Example: a fair and a biased coin
- In this scenario, the visible state no longer corresponds exactly to the hidden state of the system:
  - Visible state: output of H or T
  - Hidden state: which coin was tossed
- We can model this process using an HMM:
  [Diagram: hidden states Fair and Biased. Transition probabilities: Fair -> Fair 0.5, Fair -> Biased 0.5, Biased -> Biased 0.9, Biased -> Fair 0.1. Emission probabilities: Fair emits H with 0.5 and T with 0.5; Biased emits H with 0.1 and T with 0.9.]

Slide 15: Example: a fair and a biased coin
- We see from the diagram on the preceding slide that we have extended our model
  - The visible states are shown in blue, and the emission probabilities are shown too
- As well as internal states w(t) and state transition probabilities a_ij, we now have visible states v(t) and emission probabilities b_jk
  - Note that the b_jk do not need to be related to the a_ij as they are in the example above
- A full model such as this is called a Hidden Markov Model

Slide 16: HMM: formal definition
- We can now give a more formal definition of a first-order Hidden Markov Model (adapted from [RaJ1986]):
  - There is a finite number of (internal) states, N
  - At each time t, a new state is entered, based upon a transition probability distribution which depends on the state at time t-1. Self-transitions are allowed
  - After each transition is made, a symbol is output, according to a probability distribution which depends only on the current state. There are thus N such probability distributions
- Estimating the number of states N and the transition and emission probabilities is a complex problem, but solutions do exist. A model with this structure is sketched below.
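
To make the definition concrete, here is a sketch (not from the original slides) of a first-order HMM with the structure just described: N internal states, a transition distribution A, and a per-state emission distribution B. The numbers are those of the fair/biased coin example; the uniform initial distribution pi is an assumption, since the slides do not specify one.

```python
# Sketch of a first-order HMM: N hidden states, transition probabilities A,
# and per-state emission probabilities B (fair/biased coin example).
import random

states  = ["Fair", "Biased"]
symbols = ["H", "T"]
pi = [0.5, 0.5]              # initial state distribution (assumed uniform)
A  = [[0.5, 0.5],            # A[i][j] = P(state j at time t+1 | state i at time t)
      [0.1, 0.9]]
B  = [[0.5, 0.5],            # B[j][k] = P(symbol k emitted | state j)
      [0.1, 0.9]]

def generate(T):
    """Return (hidden state sequence, visible symbol sequence) of length T."""
    hidden, visible = [], []
    state = random.choices(range(len(states)), weights=pi)[0]
    for _ in range(T):
        hidden.append(states[state])
        k = random.choices(range(len(symbols)), weights=B[state])[0]
        visible.append(symbols[k])
        state = random.choices(range(len(states)), weights=A[state])[0]
    return hidden, visible

hidden, visible = generate(60)
print("".join(visible))      # only this is observable; `hidden` stays behind the curtain
```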

Slide 17: Use of HMMs
- We have now seen what sorts of processes can be modeled using HMMs, and how an HMM is specified mathematically.
- We now consider how HMMs are actually used.
- Consider the two H and T sequences we saw in the previous examples:
  - How could we decide which coin-toss system was most likely to have produced each sequence?
- To which system would you assign these sequences?
  1: TTTHHTTTTTTTTTTTTTHHTTTTTTHHTTTHH
  2: THHTTTHHHTTHTHTTHTHHTTHHHTTHTHTHT
  3: THHTHTHTHTHHHTTHTTTHHTTHTTTTTHHHT
  4: HTTTHTTHTTTTHTTTHHTTHTHTTTTTTTTHT
- We can answer this question using a Bayesian formulation (see last week’s lecture)

Slide 18: Use of HMMs for classification
- HMMs are often used to classify sequences
- To do this, a separate HMM is built and trained (i.e. the parameters are estimated) for each class of sequence in which we are interested
  - e.g. we might have an HMM for each word in a speech recognition system. The hidden states would correspond to phonemes, and the visible states to measured sound features
- For a given observed sequence v^T, we estimate the probability that each HMM M_l generated it: P(M_l | v^T) = P(v^T | M_l) P(M_l) / P(v^T)
- We assign the sequence to the model with the highest posterior probability.
- The algorithms for calculating these probabilities are beyond the scope of this unit, but can be found in the references. A sketch of the idea is given below.
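
Purely as an illustrative sketch (assuming equal priors P(M_l), so comparing posteriors reduces to comparing likelihoods P(v^T | M_l)), the forward algorithm from the references can be used to score each candidate model; the model names and parameters below are the single-fair-coin and fair/biased-coin examples from the earlier slides.

```python
# Illustrative sketch: the forward algorithm computes P(v^T | M) for an HMM
# M = (pi, A, B); with equal priors P(M_l), the model with the highest
# likelihood also has the highest posterior P(M_l | v^T).

def forward_likelihood(obs, pi, A, B, symbol_index):
    """P(observation sequence | model) via the forward recursion.
    (For long sequences, scaling or log-probabilities would be used to
    avoid numerical underflow; omitted here for clarity.)"""
    N = len(pi)
    k = symbol_index[obs[0]]
    alpha = [pi[i] * B[i][k] for i in range(N)]              # initialisation
    for symbol in obs[1:]:
        k = symbol_index[symbol]
        alpha = [B[j][k] * sum(alpha[i] * A[i][j] for i in range(N))
                 for j in range(N)]                          # induction
    return sum(alpha)                                        # termination

def classify(obs, models):
    """Assign obs to the model with the highest likelihood (equal priors assumed)."""
    return max(models, key=lambda name: forward_likelihood(obs, *models[name]))

symbol_index = {"H": 0, "T": 1}
single_fair = ([1.0], [[1.0]], [[0.5, 0.5]], symbol_index)
fair_biased = ([0.5, 0.5],
               [[0.5, 0.5], [0.1, 0.9]],
               [[0.5, 0.5], [0.1, 0.9]],
               symbol_index)
models = {"single fair coin": single_fair, "fair + biased coin": fair_biased}

print(classify("TTTHHTTTTTTTTTTTTTHHTTTTTTHHTTTHH", models))  # sequence 1 above
```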

Slide 19: References
- [DHS2000] Richard O. Duda, Peter E. Hart and David G. Stork, Pattern Classification (2nd Edn), Wiley, New York, NY, 2000, pp. 128-138.
- [RaJ1986] L. R. Rabiner and B. H. Juang, “An Introduction to Hidden Markov Models”, IEEE Magazine on Acoustics, Speech and Signal Processing, vol. 3, no. 1, pp. 4-16, January 1986.

