Markov Models
A model of sequences of events in which the probability of an event occurring depends on the events that preceded it.
Observable states: 1, 2, …, N
Observed sequence: $O_1, O_2, \dots, O_l, \dots, O_T$
$P(O_l = j \mid O_1 = a, \dots, O_{l-1} = b, O_{l+1} = c, \dots) = P(O_l = j \mid O_1 = a, \dots, O_{l-1} = b)$
Order-n model: a Markov process moves from state to state depending only on the previous n states.
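The order-n idea can be made concrete with a small sketch that is not part of the original slides. It estimates P(next observation | previous n observations) from a single observed sequence by counting. The function name fit_order_n, the relative-frequency estimate, and the toy sun/rain sequence are all illustrative assumptions.

```python
from collections import Counter, defaultdict


def fit_order_n(sequence, n):
    """Estimate P(next symbol | previous n symbols) by relative frequency.

    Returns {context_tuple: {next_symbol: probability}} for every length-n
    context that occurs in `sequence` (illustrative sketch, no smoothing).
    """
    counts = defaultdict(Counter)
    for i in range(n, len(sequence)):
        context = tuple(sequence[i - n:i])  # the n preceding observations
        counts[context][sequence[i]] += 1
    model = {}
    for context, nexts in counts.items():
        total = sum(nexts.values())
        model[context] = {symbol: c / total for symbol, c in nexts.items()}
    return model


weather = ["sun", "sun", "rain", "sun", "sun", "rain", "rain"]
print(fit_order_n(weather, 1)[("sun",)])       # {'sun': 0.5, 'rain': 0.5}
print(fit_order_n(weather, 2)[("sun", "sun")])  # {'rain': 1.0}
```

With n = 1, the context "sun" is followed equally often by sun and rain. The order-2 context ("sun", "sun") is always followed by rain in this toy sequence, which shows the kind of extra history an order-n model can exploit.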
Markov Models (cont.)
First-order model (n = 1):
$P(O_l = j \mid O_{l-1} = a, O_{l-2} = b, \dots) = P(O_l = j \mid O_{l-1} = a)$
The state of the model depends only on its previous state.
Components: states, initial probabilities, and state transition probabilities.
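The sketch below shows how the three components fit together. It assumes states are strings and probabilities are stored in nested dicts, and the numbers are hypothetical. Under a first-order model, the probability of a whole state sequence is the initial probability of the first state times one transition probability per step: $P(O_1, \dots, O_T) = \pi(O_1) \prod_{l=2}^{T} P(O_l \mid O_{l-1})$, where π denotes the initial probabilities.

```python
import math

# Hypothetical first-order weather chain; the numbers are invented for illustration.
START = {"sunny": 0.6, "cloudy": 0.2, "rainy": 0.2}
TRANS = {s: {r: 0.5 if r == s else 0.25 for r in START} for s in START}


def sequence_probability(seq, start, trans):
    """P(O_1, ..., O_T) = start[O_1] * product of trans[O_{l-1}][O_l]."""
    p = start[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= trans[prev][cur]  # first-order: only the previous state matters
    return p


def is_stochastic(start, trans):
    """The initial probabilities and every transition row must each sum to 1."""
    rows = [start] + list(trans.values())
    return all(math.isclose(sum(row.values()), 1.0) for row in rows)


print(sequence_probability(["sunny", "sunny", "rainy"], START, TRANS))
```

Here 0.6 × 0.5 × 0.25 = 0.075 (up to floating-point rounding).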
Hidden Markov Models
A Markov model is used to predict what will come next based on previous observations. Sometimes, however, what we want to predict is not what we observe.
Example: someone trying to deduce the weather from a piece of seaweed.
For some reason, he cannot access weather information (sun, cloud, rain) directly,
but he can observe the dampness of a piece of seaweed (soggy, damp, dryish, dry),
and the state of the seaweed is probabilistically related to the state of the weather.
Hidden Markov Models (cont.)
Hidden Markov Models are used to solve this kind of problem.
A Hidden Markov Model is an extension of a first-order Markov model:
The "true" states are not directly observable (hidden).
Observable states are probabilistic functions of the hidden states.
The hidden system is a first-order Markov process.
Hidden Markov Models (cont.)
A Hidden Markov Model consists of two sets of states and three sets of probabilities:
Hidden states: the (true) states of a system that may be described by a Markov process (e.g. the weather states in our example).
Observable states: the states of the process that are 'visible' (e.g. the dampness of the seaweed).
Initial probabilities for the hidden states.
Transition probabilities between hidden states.
Confusion (emission) probabilities from hidden states to observable states.
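The two state sets and three probability tables can be written down directly for the weather/seaweed example. In this sketch every number is hypothetical and invented for illustration. It checks that each table is a proper distribution and then generates data the way an HMM does: the hidden weather follows the first-order chain, and each seaweed reading is drawn from the confusion row of the current weather.

```python
import math
import random

# Hypothetical weather/seaweed HMM; every number below is invented for illustration.
HIDDEN = ["sunny", "cloudy", "rainy"]
OBSERVABLE = ["dry", "dryish", "damp", "soggy"]
START = {"sunny": 0.6, "cloudy": 0.2, "rainy": 0.2}
TRANS = {s: {r: 0.5 if r == s else 0.25 for r in HIDDEN} for s in HIDDEN}
CONFUSION = {
    "sunny": {"dry": 0.6, "dryish": 0.2, "damp": 0.15, "soggy": 0.05},
    "cloudy": {"dry": 0.25, "dryish": 0.25, "damp": 0.25, "soggy": 0.25},
    "rainy": {"dry": 0.05, "dryish": 0.1, "damp": 0.35, "soggy": 0.5},
}


def is_distribution(dist, keys):
    """True if `dist` covers exactly `keys` and its probabilities sum to 1."""
    return set(dist) == set(keys) and math.isclose(sum(dist.values()), 1.0)


def is_valid_hmm():
    return (
        is_distribution(START, HIDDEN)
        and all(is_distribution(TRANS[s], HIDDEN) for s in HIDDEN)
        and all(is_distribution(CONFUSION[s], OBSERVABLE) for s in HIDDEN)
    )


def sample(length, rng):
    """Generate (hidden, observed) sequences of the given length."""
    hidden, observed = [], []
    state = rng.choices(HIDDEN, weights=[START[s] for s in HIDDEN])[0]
    for _ in range(length):
        hidden.append(state)
        weights = [CONFUSION[state][o] for o in OBSERVABLE]
        observed.append(rng.choices(OBSERVABLE, weights=weights)[0])
        state = rng.choices(HIDDEN, weights=[TRANS[state][s] for s in HIDDEN])[0]
    return hidden, observed


hidden, observed = sample(5, random.Random(0))
print(observed)  # what the observer sees (seaweed)
print(hidden)    # what he would like to infer (weather)
```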
HMM problems
HMMs are used to solve three kinds of problems:
1. Evaluation: finding the probability of an observed sequence given an HMM.
2. Decoding: finding the sequence of hidden states that most probably generated an observed sequence.
3. Learning: generating an HMM from a sequence of observations, i.e. learning the probabilities from training data.
HMM problems (cont.)
1. Evaluation problem: we have a number of HMMs and a sequence of observations, and we want to know which HMM most probably generated the sequence.
Solution: compute the probability of the observed sequence under each HMM and choose the HMM that gives the highest probability.
The Forward algorithm reduces the cost of this computation from exponential in the sequence length (summing over all N^T hidden-state sequences) to O(N^2 T).
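Below is a minimal sketch of the evaluation problem. It assumes hypothetical weather/seaweed parameters that are invented here, not taken from the slides. forward() implements the Forward algorithm. brute_force() computes the same probability by enumerating all N^T hidden paths, which is the cost the Forward algorithm avoids. most_likely_model() picks the HMM with the highest probability. The second "uniform" HMM is a made-up comparison model whose seaweed readings carry no information about the weather.

```python
from itertools import product

# Hypothetical weather/seaweed HMM (numbers invented for illustration).
STATES = ["sunny", "cloudy", "rainy"]
SYMBOLS = ["dry", "dryish", "damp", "soggy"]
START = {"sunny": 0.6, "cloudy": 0.2, "rainy": 0.2}
TRANS = {s: {r: 0.5 if r == s else 0.25 for r in STATES} for s in STATES}
EMIT = {
    "sunny": {"dry": 0.6, "dryish": 0.2, "damp": 0.15, "soggy": 0.05},
    "cloudy": {"dry": 0.25, "dryish": 0.25, "damp": 0.25, "soggy": 0.25},
    "rainy": {"dry": 0.05, "dryish": 0.1, "damp": 0.35, "soggy": 0.5},
}
# A second hypothetical HMM in which the seaweed says nothing about the weather.
UNIFORM_EMIT = {s: {o: 0.25 for o in SYMBOLS} for s in STATES}
MODELS = {
    "weather": (START, TRANS, EMIT),
    "uniform": (START, TRANS, UNIFORM_EMIT),
}


def forward(obs, start, trans, emit):
    """P(obs | HMM) via the Forward algorithm, O(T * N^2) work.

    alpha[s] holds P(o_1..o_t, hidden state at time t = s).
    """
    states = list(start)
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {
            s: sum(alpha[r] * trans[r][s] for r in states) * emit[s][o]
            for s in states
        }
    return sum(alpha.values())


def brute_force(obs, start, trans, emit):
    """Same quantity, summing P(obs, path) over all N^T paths (for checking only)."""
    total = 0.0
    for path in product(start, repeat=len(obs)):
        p = start[path[0]] * emit[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
        total += p
    return total


def most_likely_model(obs, models):
    """Evaluation problem: pick the HMM under which `obs` is most probable."""
    return max(models, key=lambda name: forward(obs, *models[name]))


obs = ["dry", "damp", "soggy"]
print(forward(obs, *MODELS["weather"]), brute_force(obs, *MODELS["weather"]))
print(most_likely_model(["dry"], MODELS))  # weather
```

With these made-up numbers, a single dry reading is more probable under the weather model (0.42) than under the uniform model (0.25). A single soggy reading favors the uniform model (0.18 vs 0.25).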
HMM problems (cont.)
2. Decoding problem: given a particular HMM and an observation sequence, we want to know the most likely sequence of hidden states that could have generated the observation sequence.
Solution: compute the joint probability of the observation sequence and each possible sequence of hidden states, and choose the hidden-state sequence with the highest probability.
The Viterbi algorithm reduces the complexity of this search from N^T candidate sequences to O(N^2 T) operations.
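The Viterbi recursion can be sketched as follows, using hypothetical weather/seaweed parameters that are invented for illustration. It has the same shape as the Forward algorithm, but at each step it takes a max instead of a sum. It also keeps back-pointers so the best path can be read off at the end.

```python
# Hypothetical weather/seaweed HMM (numbers invented for illustration).
STATES = ["sunny", "cloudy", "rainy"]
START = {"sunny": 0.6, "cloudy": 0.2, "rainy": 0.2}
TRANS = {s: {r: 0.5 if r == s else 0.25 for r in STATES} for s in STATES}
EMIT = {
    "sunny": {"dry": 0.6, "dryish": 0.2, "damp": 0.15, "soggy": 0.05},
    "cloudy": {"dry": 0.25, "dryish": 0.25, "damp": 0.25, "soggy": 0.25},
    "rainy": {"dry": 0.05, "dryish": 0.1, "damp": 0.35, "soggy": 0.5},
}


def viterbi(obs, states, start, trans, emit):
    """Return (most probable hidden path, its joint probability P(obs, path))."""
    # delta[s]: probability of the best path that ends in state s at time t
    delta = {s: start[s] * emit[s][obs[0]] for s in states}
    backpointers = []
    for o in obs[1:]:
        prev, delta, pointer = delta, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] * trans[r][s])
            pointer[s] = best
            delta[s] = prev[best] * trans[best][s] * emit[s][o]
        backpointers.append(pointer)
    # Trace the best final state back through the stored pointers.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return path, delta[last]


path, prob = viterbi(["dry", "damp", "soggy"], STATES, START, TRANS, EMIT)
print(path)  # ['sunny', 'rainy', 'rainy']
```

With these made-up numbers, (dry, damp, soggy) decodes to sunny, rainy, rainy. Its joint probability is 0.6 × 0.6 × 0.25 × 0.35 × 0.5 × 0.5 = 0.007875.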
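HMM problems (cont.)
For the observations (dry, damp, soggy), the most probable sequence of hidden states is the candidate $X = (X_1, X_2, X_3)$ that maximizes
$\Pr(\text{dry}, \text{damp}, \text{soggy}, X) = \Pr(X) \cdot \Pr(\text{dry}, \text{damp}, \text{soggy} \mid X)$
over all 3^3 = 27 candidates:
(sunny, sunny, sunny), (sunny, sunny, cloudy), (sunny, sunny, rainy), …, (rainy, rainy, rainy)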
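For three observations, enumerating the candidates directly is still feasible. This brute-force sketch uses hypothetical parameters invented for illustration and scores all 3^3 = 27 hidden sequences by their joint probability. Its cost grows exponentially with the sequence length, which is why the Viterbi algorithm is used instead.

```python
from itertools import product

# Hypothetical weather/seaweed HMM (numbers invented for illustration).
STATES = ["sunny", "cloudy", "rainy"]
START = {"sunny": 0.6, "cloudy": 0.2, "rainy": 0.2}
TRANS = {s: {r: 0.5 if r == s else 0.25 for r in STATES} for s in STATES}
EMIT = {
    "sunny": {"dry": 0.6, "dryish": 0.2, "damp": 0.15, "soggy": 0.05},
    "cloudy": {"dry": 0.25, "dryish": 0.25, "damp": 0.25, "soggy": 0.25},
    "rainy": {"dry": 0.05, "dryish": 0.1, "damp": 0.35, "soggy": 0.5},
}


def joint_probability(obs, path):
    """Pr(obs, path) = Pr(path) * Pr(obs | path) for a first-order HMM."""
    p = START[path[0]] * EMIT[path[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= TRANS[path[t - 1]][path[t]] * EMIT[path[t]][obs[t]]
    return p


def rank_all_paths(obs):
    """Score all N^T candidate hidden sequences, most probable first."""
    scored = [(joint_probability(obs, path), path)
              for path in product(STATES, repeat=len(obs))]
    scored.sort(reverse=True)
    return scored


ranking = rank_all_paths(("dry", "damp", "soggy"))
print(len(ranking))   # 27
print(ranking[0][1])  # ('sunny', 'rainy', 'rainy')
```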
HMM problems (cont.)
3. Learning problem: estimate the probabilities of an HMM from training data.
Solution:
Training with labeled data:
Transition probability from a to b: P(b | a) = (number of transitions from a to b) / (total number of transitions out of a)
Confusion probability of symbol o in state a: P(o | a) = (number of occurrences of symbol o in state a) / (number of all symbol occurrences in state a)
Training with unlabeled data: the Baum-Welch algorithm. The basic idea:
Randomly generate an HMM at the beginning.
Re-estimate the probabilities from the previous HMM, repeating until P(current HMM) - P(previous HMM) < ε (a small number), where P(·) is the probability of the training data under that HMM.
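The labeled-data formulas amount to counting and normalizing. This sketch covers only that case, not Baum-Welch. It assumes each training sequence is a list of (hidden_state, symbol) pairs, and it applies no smoothing, so unseen events get probability 0. Estimating the initial probabilities from first-state frequencies is an added assumption, since the slide gives formulas only for transitions and confusions. The tiny training set of (field, generalized token) pairs is hypothetical.

```python
from collections import Counter, defaultdict


def normalize(counter):
    """Turn raw counts into relative frequencies."""
    total = sum(counter.values())
    return {key: count / total for key, count in counter.items()}


def estimate_from_labeled(sequences):
    """Estimate (start, trans, confusion) from labeled sequences by counting.

    trans[a][b]     = #(a -> b transitions) / #(transitions out of a)
    confusion[a][o] = #(symbol o seen in state a) / #(symbols seen in state a)
    """
    start_counts = Counter()
    trans_counts = defaultdict(Counter)
    emit_counts = defaultdict(Counter)
    for seq in sequences:
        start_counts[seq[0][0]] += 1  # initial probabilities (added assumption)
        for (a, _), (b, _) in zip(seq, seq[1:]):
            trans_counts[a][b] += 1
        for state, symbol in seq:
            emit_counts[state][symbol] += 1
    start = normalize(start_counts)
    trans = {a: normalize(c) for a, c in trans_counts.items()}
    confusion = {a: normalize(c) for a, c in emit_counts.items()}
    return start, trans, confusion


# Hypothetical labeled references: (field, generalized token) pairs.
training = [
    [("author", ":init:"), ("author", ":CapNonWord:"), ("journal", ":abbrev:"),
     ("year", ":Digs4:")],
    [("author", ":init:"), ("author", ":CapNonWord:"), ("journal", ":abbrev:"),
     ("journal", ":LowDictWord:"), ("year", ":Digs4:")],
]
start, trans, confusion = estimate_from_labeled(training)
print(trans["journal"])  # year: 2/3, journal: 1/3
```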
Experiments
Problem: parsing a reference string into fields (author, journal, volume, page, year, etc.).
Model as an HMM:
Hidden states: the fields (author, journal, volume, etc.) and some special tokens (",", "and", etc.).
Observable states: words.
Probability matrices: learned from training data.
Reference parsing: use the Viterbi algorithm to find the most probable sequence of hidden states for an observation sequence.
Experiment (cont.)
We selected 1000 reference strings (references to articles) from APS, using the first 750 for training and the remaining 250 for testing.
We applied the same feature generalization as in the SVM experiment last time:
"M." → :init:
"Phys." → :abbrev:
"1994" → :Digs4:
"Liu" → :CapNonWord:
"physics" → :LowDictWord:
…
Special processing of tags should have been done, but was not.
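The slides do not give the exact generalization rules. The regular expressions below are guesses that reproduce only the five listed examples, and the small DICTIONARY set is a hypothetical stand-in for whatever word list the experiment used. Tokens matched by no rule are kept unchanged, which is also an assumption. That choice lets special tokens such as "," and "and" pass through as themselves.

```python
import re

# Hypothetical stand-in for the word dictionary used in the original experiment.
DICTIONARY = {"physics", "review", "letters", "journal", "of", "the"}


def generalize(token):
    """Map a raw token to a generalized feature tag (rules are illustrative guesses)."""
    if re.fullmatch(r"[A-Z]\.", token):
        return ":init:"          # single initial, e.g. "M."
    if re.fullmatch(r"[A-Z][a-z]+\.", token):
        return ":abbrev:"        # capitalized abbreviation, e.g. "Phys."
    if re.fullmatch(r"[0-9]{4}", token):
        return ":Digs4:"         # four digits, e.g. "1994"
    if re.fullmatch(r"[A-Z][a-z]+", token) and token.lower() not in DICTIONARY:
        return ":CapNonWord:"    # capitalized non-dictionary word, e.g. "Liu"
    if re.fullmatch(r"[a-z]+", token) and token in DICTIONARY:
        return ":LowDictWord:"   # lowercase dictionary word, e.g. "physics"
    return token                 # anything else is kept as-is (assumption)


print([generalize(t) for t in "M. Liu , Phys. Rev. 1994".split()])
# [':init:', ':CapNonWord:', ',', ':abbrev:', ':abbrev:', ':Digs4:']
```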
Experiment (cont.)
Measurement:
Let C be the total number of words (tokens) that are predicted correctly.
Let N be the total number of words.
Correct rate: $R = C / N \times 100\%$
Our result: N = 4284; C = 4219; R = 98.48%
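The measurement is a token-level accuracy. This minimal sketch assumes the predicted and gold field labels are available as aligned per-token lists; the function name and toy labels are hypothetical. The last line re-checks the reported arithmetic.

```python
def correct_rate(predicted, gold):
    """Token-level correct rate R = C / N * 100%.

    C = number of tokens whose predicted label equals the gold label,
    N = total number of tokens.
    """
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty label sequences of equal length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold) * 100


print(correct_rate(["author", "author", "journal", "year"],
                   ["author", "author", "journal", "volume"]))  # 75.0
print(round(4219 / 4284 * 100, 2))  # 98.48
```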
Conclusion
HMMs are used to model situations where:
what we want to predict is not what we observe, and
the underlying system can be modeled as a first-order Markov process.
HMM assumptions:
The next state is independent of all states except the previous one.
The probability matrices learned from the samples are the true probability matrices.
After learning, the probability matrices remain unchanged.