
Pattern Recognition and Machine Learning-Chapter 13: Sequential Data


1 Pattern Recognition and Machine Learning-Chapter 13: Sequential Data
Affiliation: Kyoto University
Name: Kevin Chien, Dr. Oba Shigeyuki, Dr. Ishii Shin
Date: Dec. 9, 2011

2 Idea: Origin of Markov Models

3 Why Markov Models: IID data is not always a reasonable assumption. Markov models capture how future data (prediction) depend on some recent data, using DAGs in which inference is done by the sum-product algorithm. State space (Markov) models introduce latent variables: a discrete latent variable gives the Hidden Markov Model, and a Gaussian latent variable gives Linear Dynamical Systems. The order of a Markov chain specifies the data dependence; in a 1st-order chain the current observation depends only on the previous observation (see the factorization below).
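For reference, the first-order Markov chain factorization of the joint distribution (Bishop, Ch. 13), in which each observation depends only on the one before it:

```latex
p(x_1, \ldots, x_N) = p(x_1) \prod_{n=2}^{N} p(x_n \mid x_{n-1})
```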

4 State Space Model: The latent variables z_n form a Markov chain, and each z_n contributes to its observation x_n. As the order of a chain over the observations grows, the number of parameters grows; to organize this we use the state space model. z_{n-1} and z_{n+1} are now independent given z_n (d-separated); see the joint factorization below.
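Written out, the joint distribution of the state space model (Bishop, Ch. 13) factorizes as:

```latex
p(x_1, \ldots, x_N, z_1, \ldots, z_N)
  = p(z_1) \left[ \prod_{n=2}^{N} p(z_n \mid z_{n-1}) \right] \prod_{n=1}^{N} p(x_n \mid z_n)
```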

5 Terminologies: For understanding Markov Models

6 Terminologies: The Markovian property means that in a stochastic process the probability of a transition depends only on the present state, not on the manner in which the current state was reached. A transition diagram shows transitions of the same variable between its different states.

7 Terminologies (cont.)
Θ notation: f = Θ(g) means f is bounded above and below by g asymptotically [Big_O_notation, Wikipedia, Dec. 2011].
(Review) z_{n+1} and z_{n-1} are d-separated given z_n: once we block z_n's outgoing edges there is no path between z_{n+1} and z_{n-1}, so they are independent.

8 Markov Models: Formula and motivation

9 Hidden Markov Models (HMM)
z_n is a discrete multinomial variable. Transitions are governed by a transition probability matrix in which each row sums to 1 and the probability of staying in the present state is non-zero; counting the non-diagonal entries gives K(K-1) free parameters.
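As a small illustration (my own, not from the slides), a hypothetical K = 3 transition matrix in Python/NumPy, checking the properties listed above:

```python
import numpy as np

# Hypothetical K=3 transition matrix A; A[j, k] = p(z_n = k | z_{n-1} = j).
A = np.array([
    [0.7, 0.2, 0.1],   # transitions out of state 0
    [0.1, 0.8, 0.1],   # transitions out of state 1
    [0.2, 0.3, 0.5],   # transitions out of state 2
])

assert np.allclose(A.sum(axis=1), 1.0)   # each row sums to 1
assert np.all(np.diag(A) > 0)            # p(staying in present state) is non-zero
K = A.shape[0]
print("free parameters:", K * (K - 1))   # counting non-diagonals: K(K-1) = 6
```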

10 Hidden Markov Models (cont.)
Emission probabilities p(x_n | z_n, φ), with parameters φ governing the distribution. In a homogeneous model the latent variables share the same transition parameters A (and the emissions share φ). Sampling data from the model is simply noting the emitted values while following transitions and drawing from the emission probability at each step.
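A minimal ancestral-sampling sketch (my own illustration, not from the slides), assuming a homogeneous HMM with a hypothetical initial distribution pi, shared transition matrix A, and a Gaussian emission per state:

```python
import numpy as np

rng = np.random.default_rng(0)

pi = np.array([0.6, 0.4])              # hypothetical initial distribution p(z_1)
A = np.array([[0.9, 0.1],              # shared (homogeneous) transition matrix
              [0.2, 0.8]])
mu = np.array([0.0, 5.0])              # hypothetical Gaussian emission means
sigma = np.array([1.0, 0.5])           # and standard deviations

def sample_hmm(n):
    """Follow transitions, emitting one observation per step."""
    z = rng.choice(len(pi), p=pi)
    states, obs = [], []
    for _ in range(n):
        states.append(z)
        obs.append(rng.normal(mu[z], sigma[z]))   # draw from emission distribution
        z = rng.choice(len(pi), p=A[z])           # follow a transition
    return states, obs

states, obs = sample_hmm(10)
```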

11 HMM: Expectation Maximization for maximum likelihood
The likelihood function is obtained by marginalizing over the latent variables. EM starts with initial model parameters, evaluates the posterior over the latent variables given the data, and uses it to define and maximize the expected complete-data log likelihood, which results in updated parameters.
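Written out (following Bishop, Section 13.2.1), the marginal likelihood and the quantity maximized in the M step are:

```latex
p(X \mid \theta) = \sum_{Z} p(X, Z \mid \theta),
\qquad
Q(\theta, \theta^{\text{old}}) = \sum_{Z} p(Z \mid X, \theta^{\text{old}}) \, \ln p(X, Z \mid \theta)
```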

12 HMM: forward-backward algorithm
Two-stage message passing in a tree, applied to the HMM to find the marginals p(node) efficiently; here the marginals are p(z_k | x). Assume p(x_k|z_k), p(z_k|z_{k-1}), and p(z_1) are known. Notation: x = (x_1,..,x_n), x_{i:j} = (x_i, x_{i+1},..,x_j). Goal: compute p(z_k|x). Forward part: compute p(z_k, x_{1:k}) for every k = 1,..,n. Backward part: compute p(x_{k+1:n}|z_k) for every k = 1,..,n.

13 HMM: forward-backward algorithm (cont.)
p(z_k|x) ∝ p(z_k, x) = p(x_{k+1:n}|z_k, x_{1:k}) p(z_k, x_{1:k}). Since x_{k+1:n} and x_{1:k} are d-separated given z_k, this simplifies to p(z_k|x) ∝ p(z_k, x) = p(x_{k+1:n}|z_k) p(z_k, x_{1:k}). With these quantities we can run the EM (Baum-Welch) algorithm to estimate parameter values, sample from the posterior over z given x, and find the most likely z with the Viterbi algorithm.
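In Bishop's notation, with α(z_k) = p(z_k, x_{1:k}) and β(z_k) = p(x_{k+1:n} | z_k), the smoothed posterior is obtained by normalizing the product of the two messages:

```latex
\gamma(z_k) \equiv p(z_k \mid x) = \frac{p(z_k, x)}{p(x)} = \frac{\alpha(z_k)\,\beta(z_k)}{p(x)}
```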

14 HMM forward-backward algorithm: Forward part
Compute p(z_k, x_{1:k}):
p(z_k, x_{1:k}) = Σ_{z_{k-1}} p(z_k, z_{k-1}, x_{1:k}) = Σ_{z_{k-1}} p(x_k | z_k, z_{k-1}, x_{1:k-1}) p(z_k | z_{k-1}, x_{1:k-1}) p(z_{k-1}, x_{1:k-1}).
This looks like a recursive function if we label p(z_k, x_{1:k}) as α_k(z_k). Since (z_{k-1}, x_{1:k-1}) and x_k are d-separated given z_k, and z_k and x_{1:k-1} are d-separated given z_{k-1},
α_k(z_k) = Σ_{z_{k-1}} p(x_k | z_k) p(z_k | z_{k-1}) α_{k-1}(z_{k-1}),  for k = 2,..,n
(emission prob. × transition prob. × recursive part).

15 HMM forward-backward algorithm: Forward part (cont.)
α_1(z_1) = p(z_1, x_1) = p(z_1) p(x_1|z_1). If each z has m states, the computational complexity is Θ(m) for each z_k for one k, Θ(m²) for each k, and Θ(nm²) in total.
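A minimal sketch of the forward recursion in Python/NumPy (my own illustration; the names pi, A, and B are assumptions, with B[i, k] = p(x_k | z_k = i) precomputed for the observed sequence):

```python
import numpy as np

def forward(pi, A, B):
    """alpha[k, i] = p(z_k = i, x_{1:k}), using 0-based indexing over k."""
    m, n = B.shape                                # m states, n observations
    alpha = np.zeros((n, m))
    alpha[0] = pi * B[:, 0]                       # alpha_1(z_1) = p(z_1) p(x_1|z_1)
    for k in range(1, n):                         # Theta(n m^2) overall
        alpha[k] = B[:, k] * (alpha[k - 1] @ A)   # sum over all values of z_{k-1}
    return alpha
```

In practice the α values underflow for long sequences, so implementations rescale them at each step (Bishop, Section 13.2.4) or work in log space.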

16 HMM forward-backward algorithm: Backward part
Compute p(x_{k+1:n}|z_k) for all z_k and all k = 1,..,n-1:
p(x_{k+1:n}|z_k) = Σ_{z_{k+1}} p(x_{k+1:n}, z_{k+1}|z_k) = Σ_{z_{k+1}} p(x_{k+2:n} | z_{k+1}, z_k, x_{k+1}) p(x_{k+1} | z_{k+1}, z_k) p(z_{k+1} | z_k).
This again looks like a recursive function if we label p(x_{k+1:n}|z_k) as β_k(z_k). Since (z_k, x_{k+1}) and x_{k+2:n} are d-separated given z_{k+1}, and z_k and x_{k+1} are d-separated given z_{k+1},
β_k(z_k) = Σ_{z_{k+1}} β_{k+1}(z_{k+1}) p(x_{k+1} | z_{k+1}) p(z_{k+1} | z_k),  for k = 1,..,n-1
(recursive part × emission prob. × transition prob.).

17 HMM forward-backward algorithm: Backward part (cont.)
β_n(z_n) = 1 for all z_n. If each z has m states, the computational complexity is the same as for the forward part: Θ(nm²) in total.
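A matching sketch of the backward recursion under the same assumed names, plus the combination from slide 13:

```python
import numpy as np

def backward(A, B):
    """beta[k, i] = p(x_{k+1:n} | z_k = i); beta_n(z_n) = 1 for all z_n."""
    m, n = B.shape
    beta = np.ones((n, m))
    for k in range(n - 2, -1, -1):                   # also Theta(n m^2) in total
        beta[k] = A @ (B[:, k + 1] * beta[k + 1])    # sum over all values of z_{k+1}
    return beta

# Combining the messages gives the smoothed posterior p(z_k | x):
# gamma = forward(pi, A, B) * backward(A, B)
# gamma /= gamma.sum(axis=1, keepdims=True)
```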

18 HMM: Viterbi algorithm
The max-sum algorithm for the HMM, to find the most probable sequence of hidden states for a given observation sequence x_{1:n}. Example: transforming handwriting images into text. Assume p(x_k|z_k), p(z_k|z_{k-1}), and p(z_1) are known. Goal: compute z* = argmax_z p(z|x), with x = x_{1:n} and z = z_{1:n}. Lemma: if f(a) ≥ 0 for all a and g(a,b) ≥ 0 for all a,b, then max_{a,b} f(a) g(a,b) = max_a [f(a) max_b g(a,b)]. Also note max_z p(z|x) ∝ max_z p(z,x).

19 HMM: Viterbi algorithm (cont.)
μ_k(z_k) = max_{z_{1:k-1}} p(z_{1:k}, x_{1:k}) = max_{z_{1:k-1}} p(x_k|z_k) p(z_k|z_{k-1}) [the f(a) part] · p(z_{1:k-1}, x_{1:k-1}) [the g(a,b) part].
This looks like a recursive function if we can make a max appear in front of p(z_{1:k-1}, x_{1:k-1}). Using the lemma with a = z_{k-1} and b = z_{1:k-2}:
= max_{z_{k-1}} [p(x_k|z_k) p(z_k|z_{k-1}) max_{z_{1:k-2}} p(z_{1:k-1}, x_{1:k-1})] = max_{z_{k-1}} [p(x_k|z_k) p(z_k|z_{k-1}) μ_{k-1}(z_{k-1})],  for k = 2,..,n.

20 HMM: Viterbi algorithm (finish up)
μ_k(z_k) = max_{z_{k-1}} p(x_k|z_k) p(z_k|z_{k-1}) μ_{k-1}(z_{k-1}) for k = 2,..,n, with μ_1(z_1) = p(x_1, z_1) = p(z_1) p(x_1|z_1). By the same method, max_{z_n} μ_n(z_n) = max_z p(x, z). This gives the max value; to recover the max sequence, compute the recursive equation bottom-up while remembering which choices were made (μ_k(z_k) looks at all paths through μ_{k-1}(z_{k-1})) and then backtrack.
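A minimal Viterbi sketch under the same assumed pi/A/B naming, keeping back-pointers so the max sequence can be recovered by backtracking (in practice one would work in log space to avoid underflow):

```python
import numpy as np

def viterbi(pi, A, B):
    """Return z* = argmax_z p(z, x), with B[i, k] = p(x_k | z_k = i)."""
    m, n = B.shape
    mu = np.zeros((n, m))
    back = np.zeros((n, m), dtype=int)
    mu[0] = pi * B[:, 0]                     # mu_1(z_1) = p(z_1) p(x_1|z_1)
    for k in range(1, n):
        scores = mu[k - 1][:, None] * A      # mu_{k-1}(z_{k-1}) * p(z_k|z_{k-1})
        back[k] = scores.argmax(axis=0)      # remember the best previous state
        mu[k] = B[:, k] * scores.max(axis=0)
    z = np.zeros(n, dtype=int)               # backtrack from the best final state
    z[-1] = mu[-1].argmax()
    for k in range(n - 1, 0, -1):
        z[k - 1] = back[k, z[k]]
    return z
```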

21 Additional Information
Equations and diagrams excerpted from [Pattern Recognition and Machine Learning, Bishop C.M.]. Equations excerpted from mathematicalmonk (ML 14.6 and ML 14.7, various titles), YouTube LLC, Google Inc., July 2011.

