Foundations of Statistical NLP Chapter 9. Markov Models 한 기 덕

2 Contents
• Introduction
• Markov Models
• Hidden Markov Models
  – Why use HMMs
  – General form of an HMM
  – The Three Fundamental Questions for HMMs
• Fundamental Questions for HMMs
• Implementation, Properties, and Variants

3 Introduction
• Markov Model
  – Markov processes/chains/models were first developed by Andrei A. Markov.
  – First use, for a linguistic purpose: modeling the letter sequences in Russian literature (1913).
  – Current use: a general statistical tool.
• VMM (Visible Markov Model)
  – Words in sentences depend on their syntax.
• HMM (Hidden Markov Model)
  – Operates at a higher level of abstraction by postulating additional “hidden” structures.

4 Markov Models
• Markov assumption
  – Future elements of the sequence are independent of past elements, given the present element.
• Limited Horizon
  – X = (X_1, ..., X_T): a sequence of random variables
  – S = {s_1, ..., s_N}: the state space
• Time invariant (stationary)
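The two properties named on this slide can be written out as follows. This is the standard textbook formulation, reconstructed here because the slide's equations did not survive in the transcript; the symbols X_t and s_k follow the slide.

```latex
% Limited horizon: the next state depends only on the current state.
P(X_{t+1} = s_k \mid X_1, \ldots, X_t) = P(X_{t+1} = s_k \mid X_t)

% Time invariant (stationary): the dependency does not change with t.
P(X_{t+1} = s_k \mid X_t) = P(X_2 = s_k \mid X_1)
```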

5 Markov Models (Cont’d)
• Notation
  – stochastic transition matrix A, with a_ij = P(X_{t+1} = s_j | X_t = s_i)
  – probabilities of the different initial states, π_i = P(X_1 = s_i)
• Application: linear sequences of events
  – modeling valid phone sequences in speech recognition
  – sequences of speech acts in dialog systems

6 Markov Chain
• circle: a state, labeled with the state name
• arrows connecting states: possible transitions
• arc label: probability of each transition
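A minimal sketch of how such a chain assigns a probability to a state sequence: the initial probability of the first state times the arc label of every transition taken. The two states and all numbers below are hypothetical, chosen only for illustration; they are not from the slides.

```python
# Hypothetical two-state chain; states and probabilities are illustrative only.
initial = {"rain": 0.4, "sun": 0.6}
transition = {
    "rain": {"rain": 0.7, "sun": 0.3},
    "sun": {"rain": 0.2, "sun": 0.8},
}


def sequence_probability(states, initial, transition):
    """P(X_1..X_T) = pi[X_1] * product over t of a[X_t][X_{t+1}]."""
    if not states:
        return 1.0
    prob = initial[states[0]]
    # Multiply in the arc label of each transition along the path.
    for prev, nxt in zip(states, states[1:]):
        prob *= transition[prev][nxt]
    return prob


p = sequence_probability(["sun", "sun", "rain"], initial, transition)
print(p)
```

Here the path sun, sun, rain has probability 0.6 × 0.8 × 0.2.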

7 Visible Markov Model
• We know what states the machine is passing through.
• m-th order Markov model
  – For n ≥ 3, an n-gram model violates the Limited Horizon condition.
  – Any n-gram model can be reformulated as a visible Markov model by simply encoding the (n-1)-gram history in the state.
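The reformulation can be sketched for trigrams: each state is a word pair, so the next state depends only on the current one. The function name `trigram_chain` and the toy sentence are assumptions made for this example, and the estimates are plain relative frequencies with no smoothing.

```python
from collections import defaultdict


def trigram_chain(tokens):
    """Estimate a first-order chain whose states are (n-1)-grams, here word pairs."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
        # One trigram (a, b, c) is one transition from state (a, b) to state (b, c).
        counts[(a, b)][(b, c)] += 1
    chain = {}
    for state, nexts in counts.items():
        total = sum(nexts.values())
        chain[state] = {nxt: n / total for nxt, n in nexts.items()}
    return chain


tokens = "the cat sat on the cat mat".split()
chain = trigram_chain(tokens)
print(chain[("the", "cat")])
```

After the pair (the, cat), the toy data continues with "sat" once and "mat" once, so each successor state gets probability 0.5.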

8 Hidden Markov Model
• We don’t know the state sequence that the model passes through, only some probabilistic function of it.
• Example 1: The crazy soft drink machine
  – two states: cola preferring (CP), iced tea preferring (IP)
  – VMM: the machine always puts out a cola in CP
  – HMM: emission probabilities
  – Output probability given the “from” state

9 Crazy soft drink machine
• Problem
  – What is the probability of seeing the output sequence {lem, ice_t} if the machine always starts off in the cola preferring state?

10 Crazy soft drink machine (Cont’d)
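A brute-force sketch of the problem: sum the joint probability over every state path that starts in CP. The transition and emission values below are assumed (the slide's figure and table are not reproduced in this transcript), so treat them as illustrative; the symbol is emitted from the "from" state, as the example specifies.

```python
from itertools import product

STATES = ("CP", "IP")
# Assumed parameters for the crazy soft drink machine (not in the transcript).
A = {"CP": {"CP": 0.7, "IP": 0.3}, "IP": {"CP": 0.5, "IP": 0.5}}
B = {"CP": {"cola": 0.6, "ice_t": 0.1, "lem": 0.3},
     "IP": {"cola": 0.1, "ice_t": 0.7, "lem": 0.2}}


def brute_force_probability(obs, start="CP"):
    """Sum P(O, X) over every state path X that begins in `start`."""
    total = 0.0
    for rest in product(STATES, repeat=len(obs)):
        path = (start,) + rest
        p = 1.0
        for t, symbol in enumerate(obs):
            # Emit from the current ("from") state, then move to the next state.
            p *= B[path[t]][symbol] * A[path[t]][path[t + 1]]
        total += p
    return total


p_lem_ice_t = brute_force_probability(("lem", "ice_t"))
print(round(p_lem_ice_t, 6))
```

With these assumed values the four paths contribute 0.0147 + 0.0063 + 0.0315 + 0.0315 = 0.084. Enumeration grows exponentially with the length of the observation sequence, which is why a cheaper procedure is needed.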

11 Why use HMMs?
• Underlying events probabilistically generate surface events
  – e.g. parts of speech (underlying) generating the words in a text (surface)
• Linear interpolation of n-gram models
• Hidden state
  – the choice of whether to use the unigram, bigram, or trigram probabilities
• Two key points
  – The conversion works by adding epsilon transitions.
  – The separate parameters λ_iab are not adjusted separately.
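A small sketch of linear interpolation, to make the hidden-state reading concrete: the weights act as the probabilities of (invisibly) choosing the unigram, bigram, or trigram model before a word is generated. The function name, the probabilities, and the weights are all hypothetical.

```python
import math


def interpolate(p_unigram, p_bigram, p_trigram, weights):
    """Mix three n-gram estimates; `weights` is (lambda1, lambda2, lambda3)."""
    l1, l2, l3 = weights
    if not math.isclose(l1 + l2 + l3, 1.0):
        raise ValueError("interpolation weights must sum to 1")
    # Each weight is the probability of the hidden choice of model.
    return l1 * p_unigram + l2 * p_bigram + l3 * p_trigram


# Hypothetical estimates for one word in one two-word context.
p = interpolate(0.1, 0.2, 0.4, weights=(0.2, 0.3, 0.5))
print(p)
```

Since the weights sum to 1 and each component is a probability, the mixture is again a probability.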

12

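13 Notation (figure not reproduced; its labels are S for states, K for output symbols, A for transition probabilities, and B for emission probabilities)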

14 General form of an HMM
• Arc-emission HMM
  – The symbol emitted at time t depends on both the state at time t and the state at time t+1.
• State-emission HMM (e.g. the crazy soft drink machine)
  – The symbol emitted at time t depends only on the state at time t.
Figure 9.4 A program for a Markov process.
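The figure itself is missing from the transcript, so the following is only a sketch of what a program for a state-emission process might look like: pick a start state, then repeatedly emit a symbol from the current state and move on. The function name `run_hmm` and the machine parameters are assumptions for illustration.

```python
import random


def run_hmm(pi, A, B, steps, rng):
    """Sample `steps` symbols from a state-emission HMM; returns (states, symbols)."""
    states = list(pi)
    # Start in state s_i with probability pi_i.
    state = rng.choices(states, weights=[pi[s] for s in states])[0]
    path, output = [state], []
    for _ in range(steps):
        # Emit a symbol with probability b_i(k) from the current state.
        symbols = list(B[state])
        output.append(rng.choices(symbols, weights=[B[state][k] for k in symbols])[0])
        # Move from state s_i to s_j with probability a_ij.
        nexts = list(A[state])
        state = rng.choices(nexts, weights=[A[state][s] for s in nexts])[0]
        path.append(state)
    return path, output


# Assumed parameters for the crazy soft drink machine (not in the transcript).
pi = {"CP": 1.0, "IP": 0.0}
A = {"CP": {"CP": 0.7, "IP": 0.3}, "IP": {"CP": 0.5, "IP": 0.5}}
B = {"CP": {"cola": 0.6, "ice_t": 0.1, "lem": 0.3},
     "IP": {"cola": 0.1, "ice_t": 0.7, "lem": 0.2}}

path, output = run_hmm(pi, A, B, steps=5, rng=random.Random(0))
print(path, output)
```

An observer of the hidden model sees only `output`; `path` is returned here purely for inspection.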

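15 The Three Fundamental Questions for HMMs
• How do we find the probability of an observation sequence?
• How do we find the best state sequence?
• How do we estimate the model parameters?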

16 Finding the probability of an observation

17 The forward procedure
• A cheap algorithm that requires only 2N²T multiplications
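A sketch of the forward procedure under stated assumptions: the emission comes from the "from" state, as in the soft drink example, and the parameter values are assumed because the slide's tables are not in the transcript. Each of the T steps does N² updates with two multiplications each, which is where the 2N²T count comes from.

```python
def forward(obs, pi, A, B):
    """Return (alpha, P(O)); alpha[t][i] is the probability of the first t
    symbols together with being in state i afterwards."""
    states = list(pi)
    alpha = [dict(pi)]  # Initialization: alpha_i(1) = pi_i
    for symbol in obs:
        prev = alpha[-1]
        # Induction: sum over all predecessor states i of state j.
        alpha.append({
            j: sum(prev[i] * B[i][symbol] * A[i][j] for i in states)
            for j in states
        })
    # Total: sum over the states the model can end in.
    return alpha, sum(alpha[-1].values())


# Assumed parameters for the crazy soft drink machine (not in the transcript).
pi = {"CP": 1.0, "IP": 0.0}
A = {"CP": {"CP": 0.7, "IP": 0.3}, "IP": {"CP": 0.5, "IP": 0.5}}
B = {"CP": {"cola": 0.6, "ice_t": 0.1, "lem": 0.3},
     "IP": {"cola": 0.1, "ice_t": 0.7, "lem": 0.2}}

alpha, likelihood = forward(("lem", "ice_t", "cola"), pi, A, B)
print(round(likelihood, 6))
```

With these assumed values the likelihood of (lem, ice_t, cola) is 0.0315, and the partial sum after two symbols is 0.084.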

18 The backward procedure
• The total probability of seeing the rest of the observation sequence.
• Using a combination of forward and backward probabilities is vital for solving the third problem, parameter reestimation.
• Backward variables
• Combining forward & backward
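A sketch of the backward variables and of the combination, assuming emission from the "from" state and assumed soft drink machine parameters (the slide's equations are missing from the transcript). The check at the end illustrates that the sum over i of α_i(t)·β_i(t) gives P(O) at every time t.

```python
def forward(obs, pi, A, B):
    """alpha[t][i]: probability of the first t symbols and state i afterwards."""
    states = list(pi)
    alpha = [dict(pi)]
    for symbol in obs:
        prev = alpha[-1]
        alpha.append({j: sum(prev[i] * B[i][symbol] * A[i][j] for i in states)
                      for j in states})
    return alpha


def backward(obs, pi, A, B):
    """beta[t][i]: probability of the remaining symbols given state i; also P(O)."""
    states = list(pi)
    beta = [{i: 1.0 for i in states}]  # Initialization: beta_i(T+1) = 1
    for symbol in reversed(obs):
        nxt = beta[0]
        # Induction, moving from the end of the sequence towards the start.
        beta.insert(0, {i: B[i][symbol] * sum(A[i][j] * nxt[j] for j in states)
                        for i in states})
    return beta, sum(pi[i] * beta[0][i] for i in states)


# Assumed parameters for the crazy soft drink machine (not in the transcript).
pi = {"CP": 1.0, "IP": 0.0}
A = {"CP": {"CP": 0.7, "IP": 0.3}, "IP": {"CP": 0.5, "IP": 0.5}}
B = {"CP": {"cola": 0.6, "ice_t": 0.1, "lem": 0.3},
     "IP": {"cola": 0.1, "ice_t": 0.7, "lem": 0.2}}
obs = ("lem", "ice_t", "cola")

alpha = forward(obs, pi, A, B)
beta, likelihood = backward(obs, pi, A, B)
# Combining forward and backward: the same total at every time step.
combined = [sum(a[i] * b[i] for i in pi) for a, b in zip(alpha, beta)]
print(round(likelihood, 6), [round(c, 6) for c in combined])
```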

19 Finding the best state sequence
• There is more than one way to choose a state sequence that explains the observations.
• One option: for each t, find the X_t that maximizes P(X_t | O, μ).
• This may yield a quite unlikely state sequence.
• The Viterbi algorithm is more efficient.

20 Viterbi algorithm
• Finds the most likely complete path, argmax_X P(X | O, μ).
• For a fixed O, it is sufficient to maximize P(X, O | μ).
• Definition
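A sketch of the Viterbi algorithm for the sequence O = (lem, ice_t, cola), assuming emission from the "from" state and assumed soft drink machine parameters (the slide's definition and tables are missing from the transcript). `delta` holds the probability of the best path into each state and `psi` the backpointers; both names follow common textbook usage rather than anything recoverable from the slide.

```python
def viterbi(obs, pi, A, B):
    """Return (best state path, its joint probability with the observations)."""
    states = list(pi)
    delta = [dict(pi)]  # Initialization: delta_i(1) = pi_i
    psi = []            # psi[t][j]: best predecessor of state j at step t
    for symbol in obs:
        prev = delta[-1]
        step_delta, step_psi = {}, {}
        for j in states:
            # Same recursion as the forward procedure, with max instead of sum.
            best = max(states, key=lambda i: prev[i] * B[i][symbol] * A[i][j])
            step_psi[j] = best
            step_delta[j] = prev[best] * B[best][symbol] * A[best][j]
        delta.append(step_delta)
        psi.append(step_psi)
    # Termination, then read the path out backwards through the backpointers.
    last = max(states, key=lambda s: delta[-1][s])
    path = [last]
    for back in reversed(psi):
        path.insert(0, back[path[0]])
    return path, delta[-1][last]


# Assumed parameters for the crazy soft drink machine (not in the transcript).
pi = {"CP": 1.0, "IP": 0.0}
A = {"CP": {"CP": 0.7, "IP": 0.3}, "IP": {"CP": 0.5, "IP": 0.5}}
B = {"CP": {"cola": 0.6, "ice_t": 0.1, "lem": 0.3},
     "IP": {"cola": 0.1, "ice_t": 0.7, "lem": 0.2}}

best_path, best_prob = viterbi(("lem", "ice_t", "cola"), pi, A, B)
print(best_path, round(best_prob, 6))
```

With these assumed values the best path is CP, IP, CP, CP with joint probability 0.01323.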

21 Variable calculations for O = (lem, ice_t, cola)

22 Parameter estimation
• Given a certain observation sequence O,
• find the values of the model parameters μ = (A, B, Π)
• using Maximum Likelihood Estimation.
• Locally maximize by an iterative hill-climbing algorithm,
• which is usually effective for HMMs.

23 Parameter estimation (Cont’d)

24 Parameter estimation (Cont’d)
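The reestimation formulas on these slides did not survive in the transcript. As a sketch of the iterative hill-climbing idea, the following performs one forward-backward (Baum-Welch) reestimation step, assuming emission from the "from" state and assumed soft drink machine parameters as the starting point. It also assumes every state has a nonzero expected visit count; a fuller implementation would guard that division.

```python
def forward(obs, pi, A, B):
    alpha = [dict(pi)]
    for o in obs:
        prev = alpha[-1]
        alpha.append({j: sum(prev[i] * B[i][o] * A[i][j] for i in pi) for j in pi})
    return alpha, sum(alpha[-1].values())


def backward(obs, pi, A, B):
    beta = [{i: 1.0 for i in pi}]
    for o in reversed(obs):
        nxt = beta[0]
        beta.insert(0, {i: B[i][o] * sum(A[i][j] * nxt[j] for j in pi) for i in pi})
    return beta


def baum_welch_step(obs, pi, A, B):
    """One reestimation step; returns (new_pi, new_A, new_B)."""
    states = list(pi)
    symbols = list(next(iter(B.values())))
    alpha, likelihood = forward(obs, pi, A, B)
    beta = backward(obs, pi, A, B)
    # xi[t][i][j]: posterior probability of taking arc i -> j at step t.
    xi = [{i: {j: alpha[t][i] * B[i][obs[t]] * A[i][j] * beta[t + 1][j] / likelihood
               for j in states} for i in states} for t in range(len(obs))]
    # gamma[t][i]: posterior probability of being in state i at step t.
    gamma = [{i: sum(x[i].values()) for i in states} for x in xi]
    new_pi = {i: gamma[0][i] for i in states}
    new_A, new_B = {}, {}
    for i in states:
        visits = sum(g[i] for g in gamma)  # assumed nonzero in this sketch
        new_A[i] = {j: sum(x[i][j] for x in xi) / visits for j in states}
        new_B[i] = {k: sum(g[i] for g, o in zip(gamma, obs) if o == k) / visits
                    for k in symbols}
    return new_pi, new_A, new_B


# Assumed starting parameters (crazy soft drink machine; not in the transcript).
pi = {"CP": 1.0, "IP": 0.0}
A = {"CP": {"CP": 0.7, "IP": 0.3}, "IP": {"CP": 0.5, "IP": 0.5}}
B = {"CP": {"cola": 0.6, "ice_t": 0.1, "lem": 0.3},
     "IP": {"cola": 0.1, "ice_t": 0.7, "lem": 0.2}}
obs = ("lem", "ice_t", "cola")

old_likelihood = forward(obs, pi, A, B)[1]
new_pi, new_A, new_B = baum_welch_step(obs, pi, A, B)
new_likelihood = forward(obs, new_pi, new_A, new_B)[1]
print(old_likelihood, new_likelihood)
```

Each step is guaranteed not to decrease P(O | μ), but it only climbs to a local maximum, which is why the starting parameters matter.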

25 Implementation, Properties, Variants
• Implementation
  – Obvious issue: repeatedly multiplying very small numbers (floating-point underflow)
  – Use logarithms
• Variants
  – Motivated by the difficulty of estimating a large number of parameters
• Multiple input observations
• Initialization of parameter values
  – Try to start near the global maximum
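A short sketch of the underflow issue and the logarithm fix. The probabilities are made up for illustration; `log_sum_exp` is a helper written for this example (not a library call) that handles the additions needed when the forward procedure is run in log space.

```python
import math

# One hundred hypothetical step probabilities of 1e-5 each.
probs = [1e-5] * 100

naive = 1.0
for p in probs:
    naive *= p  # the true value, 1e-500, is below the smallest positive float

# Adding logarithms instead of multiplying probabilities stays in range.
log_product = sum(math.log(p) for p in probs)


def log_sum_exp(log_values):
    """Stable log(sum(exp(v))) for adding probabilities stored as logs."""
    m = max(log_values)
    if m == -math.inf:
        return -math.inf  # every term is log(0)
    return m + math.log(sum(math.exp(v - m) for v in log_values))


log_total = log_sum_exp([math.log(0.2), math.log(0.3)])
print(naive, log_product, log_total)
```

The naive product collapses to 0.0, while the log-space value remains a usable finite number. The Viterbi algorithm needs only the log sums, since max is unaffected by taking logarithms.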
