Natural Language Processing Spring 2007 V. “Juggy” Jagannathan

Course Book: Foundations of Statistical Natural Language Processing, by Christopher Manning & Hinrich Schütze

Chapter 9 Markov Models March 5, 2007

Markov models
Markov assumption
– Suppose X = (X_1, …, X_T) is a sequence of random variables taking values in some finite set S = {s_1, …, s_N}. The Markov properties are:
Limited Horizon
– P(X_{t+1} = s_k | X_1, …, X_t) = P(X_{t+1} = s_k | X_t)
– i.e. the value at time t+1 depends only on the value at time t
Time invariant (stationary)
– the transition probabilities do not depend on t
Stochastic Transition matrix A:
– a_ij = P(X_{t+1} = s_j | X_t = s_i), where a_ij ≥ 0 for all i, j and Σ_j a_ij = 1 for all i
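A minimal sketch of these properties in code. The two-state chain, its state names, and its numbers are hypothetical and chosen only for illustration; the helper names is_stochastic and sequence_probability are likewise assumptions, not part of the course material.

```python
# Hypothetical two-state chain; states and numbers are illustrative only.
STATES = ["rain", "sun"]
A = [
    [0.6, 0.4],  # P(next state | current = rain)
    [0.2, 0.8],  # P(next state | current = sun)
]


def is_stochastic(matrix, tol=1e-9):
    """True if every entry is non-negative and every row sums to 1."""
    return all(
        all(p >= 0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )


def sequence_probability(sequence, start, matrix, states):
    """P(X_2, ..., X_T = sequence | X_1 = start).

    Limited horizon: each factor depends only on the previous state.
    Time invariance: the same matrix is used at every step.
    """
    index = {s: i for i, s in enumerate(states)}
    prob = 1.0
    prev = index[start]
    for state in sequence:
        cur = index[state]
        prob *= matrix[prev][cur]
        prev = cur
    return prob


stochastic_ok = is_stochastic(A)
# rain -> rain -> sun -> sun: 0.6 * 0.4 * 0.8
p_seq = sequence_probability(["rain", "sun", "sun"], "rain", A, STATES)
print(stochastic_ok, p_seq)
```

The product form is exactly what the limited-horizon property buys: the joint probability of a sequence factors into one table lookup per step.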

Markov model example

Hidden Markov Model Example
Probability of observing {lem, ice-t} given that the machine starts in CP?
0.3 × 0.7 × 0.1 + 0.3 × 0.3 × 0.7 = 0.021 + 0.063 = 0.084
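The same number can be checked by summing over every hidden state path. This is a sketch under assumptions: the factors 0.3, 0.7, 0.1 and 0.3, 0.3, 0.7 are read here as (CP emits lem, CP to CP, CP emits ice-t) and (CP emits lem, CP to IP, IP emits ice-t); all other table entries are assumed to be those of the textbook's soft drink machine figure and cannot be recovered from the arithmetic above.

```python
from itertools import product

STATES = ["CP", "IP"]

# CP row follows the arithmetic above; the IP row is an assumed textbook value.
TRANS = {
    "CP": {"CP": 0.7, "IP": 0.3},
    "IP": {"CP": 0.5, "IP": 0.5},
}

# Only CP->lem, CP->ice-t and IP->ice-t are used in the arithmetic above;
# the remaining entries are assumed textbook values.
EMIT = {
    "CP": {"cola": 0.6, "ice-t": 0.1, "lem": 0.3},
    "IP": {"cola": 0.1, "ice-t": 0.7, "lem": 0.2},
}


def observation_probability(observations, start):
    """Sum P(observations, path) over all state paths that begin in `start`."""
    total = 0.0
    for rest in product(STATES, repeat=len(observations) - 1):
        path = (start,) + rest
        # The first symbol is emitted from the start state.
        p = EMIT[path[0]][observations[0]]
        # Each later step: move to the next state, then emit from it.
        for prev, cur, obs in zip(path, path[1:], observations[1:]):
            p *= TRANS[prev][cur] * EMIT[cur][obs]
        total += p
    return total


p = observation_probability(["lem", "ice-t"], "CP")
print(round(p, 6))  # 0.084
```

Enumeration needs N^(T-1) paths for T observations, which is why the forward procedure is preferred for anything longer than a toy sequence.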

Why use HMMs?
– Underlying (hidden) events generate the observable surface events
– E.g. predicting the weather from the dampness of seaweed: http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/main.html
– Linear interpolation in n-gram models:

See the notes by David Meir Blei [UC Berkeley]: http://www-nlp.stanford.edu/fsnlp/hmm-chap/blei-hmm-ch9.ppt (slides 1-13)

(Observed states)

Forward Procedure

Forward Procedure
– Initialization:
– Induction:
– Total computation:
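A minimal sketch of the three steps, assuming the standard state-emission form of the recurrences: alpha_1(i) = pi_i b_i(o_1), alpha_{t+1}(j) = [sum_i alpha_t(i) a_ij] b_j(o_{t+1}), and P(O) = sum_i alpha_T(i). The notation and indexing on the slide may differ. The parameter values are assumed to be those of the textbook's soft drink machine.

```python
# States: 0 = CP, 1 = IP.  Symbols: 0 = cola, 1 = ice-t, 2 = lem.
# All numbers are assumed from the textbook's soft drink machine example.
PI = [1.0, 0.0]                           # machine starts in CP
A = [[0.7, 0.3], [0.5, 0.5]]              # A[i][j] = P(next = j | current = i)
B = [[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]]    # B[i][k] = P(symbol k | state i)


def forward(obs, pi, a, b):
    """Return the trellis alpha[t][i] and the total probability P(obs)."""
    n = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [[pi[i] * b[i][obs[0]] for i in range(n)]]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([
            sum(prev[i] * a[i][j] for i in range(n)) * b[j][o]
            for j in range(n)
        ])
    # Total computation: P(O) = sum_i alpha_T(i)
    return alpha, sum(alpha[-1])


alpha, likelihood = forward([2, 1], PI, A, B)   # observations: lem, ice-t
print(round(likelihood, 6))  # 0.084
```

Each induction step costs N^2 multiplications, so the whole pass is O(N^2 T) instead of the exponential cost of enumerating paths.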

Backward Procedure
– Initialization:
– Induction:
– Total computation:
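A minimal sketch of the backward pass, assuming the standard state-emission recurrences: beta_T(i) = 1, beta_t(i) = sum_j a_ij b_j(o_{t+1}) beta_{t+1}(j), and P(O) = sum_i pi_i b_i(o_1) beta_1(i). The slide's own indexing may differ, and the parameter values are assumed to be those of the textbook's soft drink machine.

```python
# States: 0 = CP, 1 = IP.  Symbols: 0 = cola, 1 = ice-t, 2 = lem.
# All numbers are assumed from the textbook's soft drink machine example.
PI = [1.0, 0.0]
A = [[0.7, 0.3], [0.5, 0.5]]
B = [[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]]


def backward(obs, pi, a, b):
    """Return the trellis beta[t][i] and the total probability P(obs)."""
    n = len(pi)
    # Initialization: beta_T(i) = 1
    beta = [[1.0] * n]
    # Induction, moving from the end of the sequence towards the start:
    # beta_t(i) = sum_j a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [
            sum(a[i][j] * b[j][o] * nxt[j] for j in range(n))
            for i in range(n)
        ])
    # Total computation: P(O) = sum_i pi_i * b_i(o_1) * beta_1(i)
    likelihood = sum(pi[i] * b[i][obs[0]] * beta[0][i] for i in range(n))
    return beta, likelihood


beta, likelihood = backward([2, 1], PI, A, B)   # observations: lem, ice-t
print(round(likelihood, 6))  # 0.084
```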

Combining both – forward and backward
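A sketch of the combination, assuming the state-emission definitions alpha_t(i) = P(o_1..o_t, X_t = i) and beta_t(i) = P(o_{t+1}..o_T | X_t = i). Under those definitions P(O) = sum_i alpha_t(i) beta_t(i) at every time step t, which the code checks numerically. Parameter values are assumed from the textbook's soft drink machine.

```python
# States: 0 = CP, 1 = IP.  Symbols: 0 = cola, 1 = ice-t, 2 = lem.
# All numbers are assumed from the textbook's soft drink machine example.
PI = [1.0, 0.0]
A = [[0.7, 0.3], [0.5, 0.5]]
B = [[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]]


def forward(obs, pi, a, b):
    n = len(pi)
    alpha = [[pi[i] * b[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * a[i][j] for i in range(n)) * b[j][o]
                      for j in range(n)])
    return alpha


def backward(obs, a, b, n):
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(a[i][j] * b[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


obs = [2, 1, 0]                      # lem, ice-t, cola
alpha = forward(obs, PI, A, B)
beta = backward(obs, A, B, len(PI))

# P(O) computed at each time step from the two trellises.
per_step = [
    sum(alpha[t][i] * beta[t][i] for i in range(len(PI)))
    for t in range(len(obs))
]
print(per_step)  # the same value (up to rounding) at every t
```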

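Finding the best state sequence
– Goal: determine the state sequence that best explains the observations
– Let:
– Individually the most likely state is:
– This approach, however, does not correctly estimate the most likely state sequence: it picks each state separately, so the chosen states need not form a probable (or even possible) path.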
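A sketch of the individually most likely states, assuming the usual posterior gamma_t(i) = alpha_t(i) beta_t(i) / P(O) and a state-emission HMM; the slide's own definition may be written differently. Parameter values are assumed from the textbook's soft drink machine.

```python
# States: 0 = CP, 1 = IP.  Symbols: 0 = cola, 1 = ice-t, 2 = lem.
# All numbers are assumed from the textbook's soft drink machine example.
NAMES = ["CP", "IP"]
PI = [1.0, 0.0]
A = [[0.7, 0.3], [0.5, 0.5]]
B = [[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]]


def forward(obs, pi, a, b):
    n = len(pi)
    alpha = [[pi[i] * b[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * a[i][j] for i in range(n)) * b[j][o]
                      for j in range(n)])
    return alpha


def backward(obs, a, b, n):
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(a[i][j] * b[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def posteriors(obs, pi, a, b):
    """gamma[t][i] = P(X_t = i | obs)."""
    n = len(pi)
    alpha, beta = forward(obs, pi, a, b), backward(obs, a, b, n)
    p_obs = sum(alpha[-1])
    return [[alpha[t][i] * beta[t][i] / p_obs for i in range(n)]
            for t in range(len(obs))]


gamma = posteriors([2, 1], PI, A, B)            # observations: lem, ice-t
# Pick the highest-posterior state independently at each time step.
best_each = [NAMES[max(range(len(row)), key=row.__getitem__)] for row in gamma]
print(best_each)  # ['CP', 'IP']
```

Because each argmax is taken in isolation, nothing stops two consecutive choices from being joined by a zero-probability transition; the Viterbi algorithm avoids this by maximizing over whole paths.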

Finding the best state sequence: Viterbi algorithm
– Store the most probable path that leads to a given node
– Initialization
– Induction
– Store backtrace
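A minimal sketch of these steps, assuming the standard state-emission recurrences: delta_1(i) = pi_i b_i(o_1), delta_{t+1}(j) = max_i [delta_t(i) a_ij] b_j(o_{t+1}), with psi_{t+1}(j) storing the maximizing i for the backtrace. The slide's indexing may differ, and the parameter values are assumed from the textbook's soft drink machine.

```python
# States: 0 = CP, 1 = IP.  Symbols: 0 = cola, 1 = ice-t, 2 = lem.
# All numbers are assumed from the textbook's soft drink machine example.
PI = [1.0, 0.0]
A = [[0.7, 0.3], [0.5, 0.5]]
B = [[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]]


def viterbi(obs, pi, a, b):
    """Return the most probable state path and its joint probability."""
    n = len(pi)
    # Initialization: delta_1(i) = pi_i * b_i(o_1)
    delta = [pi[i] * b[i][obs[0]] for i in range(n)]
    backptr = []  # backptr[t][j] = best predecessor of state j at step t+1
    # Induction: keep only the best path into each node.
    for o in obs[1:]:
        step_ptr, new_delta = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] * a[i][j])
            step_ptr.append(best_i)
            new_delta.append(delta[best_i] * a[best_i][j] * b[j][o])
        backptr.append(step_ptr)
        delta = new_delta
    # Backtrace from the best final state.
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for step_ptr in reversed(backptr):
        path.append(step_ptr[path[-1]])
    path.reverse()
    return path, delta[last]


path, prob = viterbi([2, 1, 0], PI, A, B)       # lem, ice-t, cola
print(path, round(prob, 6))  # [0, 1, 0] 0.0189, i.e. CP, IP, CP
```

For long sequences the products underflow; a common variant (not shown on the slide) sums log probabilities instead.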

Parameter Estimation

Probability of traversing an arc at time t given observation sequence O:
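A sketch of this quantity, assuming a state-emission HMM and the usual form xi_t(i, j) = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / P(O); the slide's formula may use a different emission convention. Parameter values are assumed from the textbook's soft drink machine.

```python
# States: 0 = CP, 1 = IP.  Symbols: 0 = cola, 1 = ice-t, 2 = lem.
# All numbers are assumed from the textbook's soft drink machine example.
PI = [1.0, 0.0]
A = [[0.7, 0.3], [0.5, 0.5]]
B = [[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]]


def forward(obs, pi, a, b):
    n = len(pi)
    alpha = [[pi[i] * b[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * a[i][j] for i in range(n)) * b[j][o]
                      for j in range(n)])
    return alpha


def backward(obs, a, b, n):
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(a[i][j] * b[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def arc_posteriors(obs, pi, a, b):
    """xi[t][i][j] = P(X_t = i, X_{t+1} = j | obs), for t = 0 .. T-2."""
    n = len(pi)
    alpha, beta = forward(obs, pi, a, b), backward(obs, a, b, n)
    p_obs = sum(alpha[-1])
    return [
        [[alpha[t][i] * a[i][j] * b[j][obs[t + 1]] * beta[t + 1][j] / p_obs
          for j in range(n)]
         for i in range(n)]
        for t in range(len(obs) - 1)
    ]


xi = arc_posteriors([2, 1, 0], PI, A, B)        # lem, ice-t, cola
for t, table in enumerate(xi):
    print(t, table)  # each table sums to 1 over all arcs (i, j)
```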

Parameter Estimation
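A sketch of one re-estimation step, assuming the Baum-Welch updates for a state-emission HMM: new pi_i = gamma_1(i), new a_ij = sum_t xi_t(i, j) / sum_t gamma_t(i) over t = 1..T-1, and new b_i(k) = sum of gamma_t(i) over the steps where o_t = k, divided by sum_t gamma_t(i). The slide's update formulas may be stated for a different emission convention. The starting parameters are assumed from the textbook's soft drink machine, and the observation sequence is made up for illustration.

```python
# States: 0 = CP, 1 = IP.  Symbols: 0 = cola, 1 = ice-t, 2 = lem.
# Starting values assumed from the textbook's soft drink machine example.
PI = [1.0, 0.0]
A = [[0.7, 0.3], [0.5, 0.5]]
B = [[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]]


def forward(obs, pi, a, b):
    n = len(pi)
    alpha = [[pi[i] * b[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * a[i][j] for i in range(n)) * b[j][o]
                      for j in range(n)])
    return alpha


def backward(obs, a, b, n):
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(a[i][j] * b[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def reestimate(obs, pi, a, b):
    """One Baum-Welch step; returns (new_pi, new_a, new_b, old P(obs))."""
    n, m, T = len(pi), len(b[0]), len(obs)
    alpha, beta = forward(obs, pi, a, b), backward(obs, a, b, n)
    p_obs = sum(alpha[-1])
    # Expected state occupancy and expected arc traversal.
    gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(n)]
             for t in range(T)]
    xi = [[[alpha[t][i] * a[i][j] * b[j][obs[t + 1]] * beta[t + 1][j] / p_obs
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    # Expected transitions i -> j over expected transitions out of i.
    new_a = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    # Expected emissions of k from i over expected visits to i.
    new_b = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_pi, new_a, new_b, p_obs


obs = [2, 1, 0, 1, 1, 2, 0]          # made-up training sequence
new_pi, new_a, new_b, old_likelihood = reestimate(obs, PI, A, B)
new_likelihood = sum(forward(obs, new_pi, new_a, new_b)[-1])
print(old_likelihood, new_likelihood)  # the likelihood should not decrease
```

This sketch divides by expected counts directly, so it assumes every state has nonzero expected occupancy on the training sequence; a practical version would also rescale or use logs to avoid underflow.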

