
Markov Models

Markov Chain

A sequence of states X_1, X_2, X_3, …, usually indexed by time.
The transition from X_{t-1} to X_t depends only on X_{t-1} (the Markov property).
A Bayesian network that forms a chain: X_1 → X_2 → X_3 → X_4 → …
The transition probabilities are the same for every t (a stationary process).

Example: Gambler's Ruin

Specification: the gambler starts with 3 dollars. He wins a dollar with probability 1/3 and loses a dollar with probability 2/3. He fails when he has no dollars and succeeds when he has 5 dollars.
States: the amount of money, 0, 1, 2, 3, 4, 5.
Transition probabilities: each interior state i moves to i+1 with probability 1/3 and to i-1 with probability 2/3.
Courtesy of Michael Littman

Example: Bi-gram Language Modeling

States: words.
Transition probabilities: P(w_t | w_{t-1}), the probability of the next word given the current word.

Transition Probabilities

Suppose a state has N possible values: X_t = s_1, X_t = s_2, …, X_t = s_N.
There are N² transition probabilities P(X_t = s_i | X_{t-1} = s_j), 1 ≤ i, j ≤ N.
The transition probabilities can be represented as an N×N matrix or as a directed graph.
Example: Gambler's Ruin.

What Can Markov Chains Do?

Example: Gambler's Ruin. A Markov chain lets us compute:
- the probability of a particular sequence, e.g. 3, 4, 3, 2, 3, 2, 1, 0
- the probability of success for the gambler
- the average number of bets the gambler will make
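The first two of these questions can be checked by simulation. A minimal Monte Carlo sketch (function and variable names are my own, assuming only the specification: start with 3 dollars, win with probability 1/3, stop at 0 or 5):

```python
import random

def play_once(rng, start=3, goal=5, p_win=1/3):
    """Simulate one gambler's-ruin run; return (succeeded, number_of_bets)."""
    money, bets = start, 0
    while 0 < money < goal:
        money += 1 if rng.random() < p_win else -1
        bets += 1
    return money == goal, bets

rng = random.Random(0)
runs = 100_000
results = [play_once(rng) for _ in range(runs)]
p_success = sum(won for won, _ in results) / runs   # estimated success probability
avg_bets = sum(b for _, b in results) / runs        # estimated average number of bets
```

With loss/win ratio q/p = 2, the classic gambler's-ruin formula gives an exact success probability from 3 dollars of (1 - 2³)/(1 - 2⁵) = 7/31 ≈ 0.226, so the estimate should land close to that.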

Example: Academic Life

States and per-step incomes:
A. Assistant Prof.: 20
B. Associate Prof.: 60
T. Tenured Prof.: 90
S. Out on the Street: 10
D. Dead: 0

What is the expected lifetime income of an academic?
Courtesy of Michael Littman

Solving for Total Reward

L(i) is the expected total reward received starting in state i. How could we compute L(A)? Would it help to compute L(B), L(T), L(S), and L(D) as well?

Solving the Academic Life

The expected income at state D is 0. From T, the professor stays tenured with probability 0.7, so:
L(T) = 90 + 0.7×90 + 0.7²×90 + …
L(T) = 90 + 0.7×L(T)
L(T) = 300
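The same reward equations can be solved as a linear system over the transient states: L = r + Q L, i.e. L = (I − Q)⁻¹ r. In the sketch below, only T's self-loop probability (0.7) comes from the text; the other transition rows are illustrative guesses, since the slide's diagram is not reproduced in this transcript:

```python
import numpy as np

states = ["A", "B", "T", "S"]            # transient states; D (Dead) is absorbing, reward 0
r = np.array([20.0, 60.0, 90.0, 10.0])   # per-step income in each state
Q = np.array([
    [0.6, 0.2, 0.0, 0.2],   # A: stay, promote to B, or hit the street (guessed)
    [0.0, 0.6, 0.2, 0.2],   # B: stay, get tenure, or hit the street (guessed)
    [0.0, 0.0, 0.7, 0.0],   # T: stay tenured w.p. 0.7, else D (from the text)
    [0.0, 0.0, 0.0, 0.7],   # S: stay w.p. 0.7, else D (guessed)
])
# Solve L = r + Q L  =>  (I - Q) L = r
L = np.linalg.solve(np.eye(len(states)) - Q, r)
# L[2] reproduces the hand calculation above: L(T) = 90 / (1 - 0.7) = 300
```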

Working Backwards

With L(T) and L(D) known, we can work backwards through the chain to compute L(S), L(B), and finally L(A).
Another question: What is the life expectancy of professors?

Ruin Chain

(Diagram: states 0 through 5, each interior state moving up with probability 1/3 and down with probability 2/3; 0 and 5 are absorbing.)

Gambling Time Chain

(Diagram: the same states 0 through 5 with probabilities 1/3 and 2/3, used to count the number of bets until absorption.)

Google's Search Engine

Assumption: a link from page A to page B is a recommendation of page B by the author of A (we say B is a successor of A), so the quality of a page is related to its in-degree.
Recursion: the quality of a page is related to its in-degree and to the quality of the pages linking to it.
PageRank [Brin and Page '98]

Definition of PageRank

Consider the following infinite random walk (surf): initially the surfer is at a random page; at each step, the surfer proceeds
- to a randomly chosen web page with probability d, or
- to a randomly chosen successor of the current page with probability 1-d.
The PageRank of a page p is the fraction of steps the surfer spends at p in the limit.

Random Web Surfer What’s the probability of a page being visited?

Stationary Distributions

Let S be the set of states in a Markov chain and P its transition probability matrix. The initial state is chosen according to some probability distribution q^(0) over S.
q^(t) = row vector whose i-th component is the probability that the chain is in state i at time t.
q^(t+1) = q^(t) P, so q^(t) = q^(0) P^t.
A stationary distribution is a probability distribution q such that q = qP (steady-state behavior).
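A stationary distribution can be found by simply iterating q^(t+1) = q^(t) P until the vector stops changing. A small sketch (the 3-state matrix is made up for illustration):

```python
import numpy as np

# A made-up 3-state chain; each row of P sums to 1.
P = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.3, 0.7],
])
q = np.array([1.0, 0.0, 0.0])   # q^(0): start in state 0 with certainty
for _ in range(2000):
    q = q @ P                   # q^(t+1) = q^(t) P
# q now approximates the stationary distribution: q = q P
```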

Markov Chains

Theorem: under certain conditions,
- there exists a unique stationary distribution q with q_i > 0 for all i;
- if N(i,t) is the number of times the Markov chain visits state i in t steps, then lim_{t→∞} N(i,t)/t = q_i.

PageRank

PageRank = the stationary probability of a page for this Markov chain, where n is the total number of nodes in the graph and d is the probability of making a random jump.
PageRank is query-independent and summarizes the "web opinion" of the page's importance.

PageRank

Suppose page P has in-links from page A (which has 4 out-links) and page B (which has 3 out-links). Then
PageRank of P = (1-d) × (1/4 of the PageRank of A + 1/3 of the PageRank of B) + d/n
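This per-page update, applied to every page until convergence, is the whole algorithm. A sketch (graph and function names are my own; d is the random-jump probability, matching the convention above):

```python
import numpy as np

def pagerank(links, d=0.15, iters=200):
    """links[i] = list of successors of page i.
    With prob. d jump to a random page; with prob. 1-d follow a random out-link."""
    n = len(links)
    pr = np.full(n, 1.0 / n)                 # start uniform
    for _ in range(iters):
        new = np.full(n, d / n)              # random-jump mass: d/n to every page
        for i, succs in enumerate(links):
            if succs:
                for j in succs:
                    new[j] += (1 - d) * pr[i] / len(succs)
            else:
                new += (1 - d) * pr[i] / n   # dangling page: treat as linking everywhere
        pr = new
    return pr

# Toy graph: pages 0 and 1 both link to page 2; page 2 links back to page 0.
pr = pagerank([[2], [2], [0]])
```

Page 2, with two in-links, ends up ranked highest; page 1, with none, gets only the random-jump mass d/n.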

Kth-Order Markov Chain

What we have discussed so far is the first-order Markov chain. More generally, in a kth-order Markov chain, each state transition depends on the previous k states.
What is the size of the transition probability matrix? (Each of the N^k possible k-state histories needs a distribution over the N next states.)

Hidden Markov Model In some Markov processes, we may not be able to observe the states directly.

Hidden Markov Model

An HMM is a quintuple (S, E, π, A, B):
S: {s_1 … s_N} are the values for the hidden states
E: {e_1 … e_T} are the values for the observations
π: probability distribution of the initial state
A: transition probability matrix
B: emission probability matrix

Alternative Specification

If we define a special initial state that does not emit anything, the probability distribution π becomes part of the transition probability matrix.

Notations

X_t: a random variable denoting the state at time t.
x_t: a particular value of X_t, e.g. X_t = s_i.
e_{1:t}: an observation sequence from time 1 to t.
x_{1:t}: a state sequence from time 1 to t.

Forward Probability

Forward probability: P(X_t = s_i, e_{1:t}).
Why compute forward probabilities?
- Probability of the observations: P(e_{1:t}).
- Prediction: P(X_{t+1} = s_i | e_{1:t}).

Compute Forward Probability

P(X_t = s_i, e_{1:t})
= P(X_t = s_i, e_{1:t-1}, e_t)
= Σ_j P(X_{t-1} = s_j, X_t = s_i, e_{1:t-1}, e_t)
= Σ_j P(e_t | X_t = s_i, X_{t-1} = s_j, e_{1:t-1}) P(X_t = s_i, X_{t-1} = s_j, e_{1:t-1})
= Σ_j P(e_t | X_t = s_i) P(X_t = s_i | X_{t-1} = s_j, e_{1:t-1}) P(X_{t-1} = s_j, e_{1:t-1})
= Σ_j P(e_t | X_t = s_i) P(X_t = s_i | X_{t-1} = s_j) P(X_{t-1} = s_j, e_{1:t-1})
The last factor has the same form as the left-hand side, so we can compute it by recursion.

Compute Forward Probability (continued)

α_i(t) = P(X_t = s_i, e_{1:t})
= Σ_j P(X_t = s_i | X_{t-1} = s_j) P(e_t | X_t = s_i) α_j(t-1)
= Σ_j a_ji b_{i,e_t} α_j(t-1)
where a_ji = P(X_t = s_i | X_{t-1} = s_j) is an entry in the transition matrix and b_{i,e_t} = P(e_t | X_t = s_i) is an entry in the emission matrix.
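The recursion translates directly into code. A minimal sketch with made-up numbers, storing the transition matrix so that row j holds P(X_t = · | X_{t-1} = s_j) and the emission matrix so that row i holds P(e = · | X = s_i):

```python
import numpy as np

def forward(pi, A, B, obs):
    """Return alpha with alpha[t, i] = P(X_t = s_i, e_{1:t})."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # alpha_i(1) = pi_i * b_{i,e_1}
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]  # sum_j a_ji b_{i,e_t} alpha_j(t-1)
    return alpha

# Toy 2-state, 2-symbol HMM (all numbers made up).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],      # row j: P(next state | current state s_j)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],      # row i: P(observation | state s_i)
              [0.2, 0.8]])
obs = [0, 1, 0]
alpha = forward(pi, A, B, obs)
likelihood = alpha[-1].sum()   # P(e_{1:T}): sum out the final state
```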

Inferences with HMM

Decoding: argmax_{x_{1:t}} P(x_{1:t} | e_{1:t}). Given an observation sequence, compute the most likely hidden state sequence.
Learning: argmax_θ P_θ(e_{1:t}), where θ = (π, A, B) are the parameters of the HMM. Given an observation sequence, find the transition and emission probability tables that assign the highest probability to the observations (unsupervised learning).

Viterbi Algorithm

Compute argmax_{x_{1:t}} P(x_{1:t} | e_{1:t}).
Since P(x_{1:t} | e_{1:t}) = P(x_{1:t}, e_{1:t}) / P(e_{1:t}), and P(e_{1:t}) remains constant as we vary x_{1:t},
argmax_{x_{1:t}} P(x_{1:t} | e_{1:t}) = argmax_{x_{1:t}} P(x_{1:t}, e_{1:t}).
Since the Markov chain is a Bayes net,
P(x_{1:t}, e_{1:t}) = P(x_0) Π_{i=1..t} P(x_i | x_{i-1}) P(e_i | x_i).
Maximizing this is equivalent to minimizing
-log P(x_{1:t}, e_{1:t}) = -log P(x_0) + Σ_{i=1..t} ( -log P(x_i | x_{i-1}) - log P(e_i | x_i) ).

Viterbi Algorithm

Given an HMM (S, E, π, A, B) and observations e_{1:t}, construct a graph that consists of 1 + tN nodes:
- one initial node, and
- N nodes at each time i, where the jth node at time i represents X_i = s_j.
The link between the nodes X_{i-1} = s_j and X_i = s_k has length -log( P(X_i = s_k | X_{i-1} = s_j) P(e_i | X_i = s_k) ).

The total length of a path is -log P(x_{1:t}, e_{1:t}), so the problem of finding argmax_{x_{1:t}} P(x_{1:t} | e_{1:t}) becomes that of finding the shortest path from the initial node to one of the nodes at time t.
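A sketch of this shortest-path formulation (toy model, my own names; each edge weight is the -log length defined above, and the path is recovered by backtracking):

```python
import math

def viterbi(pi, A, B, obs):
    """Most likely state sequence via min-sum of -log edge lengths."""
    N, T = len(pi), len(obs)
    cost = [[0.0] * N for _ in range(T)]   # cost[t][i]: shortest -log path ending in state i
    back = [[0] * N for _ in range(T)]     # back[t][i]: predecessor state on that path
    for i in range(N):
        cost[0][i] = -math.log(pi[i] * B[i][obs[0]])
    for t in range(1, T):
        for i in range(N):
            best_j = min(range(N), key=lambda j: cost[t - 1][j] - math.log(A[j][i]))
            cost[t][i] = cost[t - 1][best_j] - math.log(A[best_j][i] * B[i][obs[t]])
            back[t][i] = best_j
    last = min(range(N), key=lambda i: cost[T - 1][i])   # cheapest final node
    path = [last]
    for t in range(T - 1, 0, -1):                        # backtrack along the shortest path
        path.append(back[t][path[-1]])
    return path[::-1]

# Toy 2-state HMM (made-up numbers).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
best_path = viterbi(pi, A, B, [0, 0, 1])
```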


Baum-Welch Algorithm

The previous two kinds of computation need the parameters θ = (π, A, B). Where do these probabilities come from? Relative frequency? But the states are not observable!
Solution: the Baum-Welch algorithm, which learns from the observations without supervision:
Find argmax_θ P_θ(e_{1:t}).

Baum-Welch Algorithm

- Start with an initial set of parameters θ_0 (possibly arbitrary).
- Compute pseudo counts: how many times did the transition from X_{i-1} = s_j to X_i = s_k occur?
- Use the pseudo counts to obtain another (better) set of parameters θ_1.
- Iterate until the likelihood P_θ(e_{1:t}) no longer increases.
Baum-Welch is a special case of EM (Expectation-Maximization).

Pseudo Counts

Given the observation sequence e_{1:T}:
- the pseudo count of the state s_i at time t is the probability P(X_t = s_i | e_{1:T});
- the pseudo count of the link from X_t = s_i to X_{t+1} = s_j is the probability P(X_t = s_i, X_{t+1} = s_j | e_{1:T}).

Update HMM Parameters

count(i): the total pseudo count of state s_i.
count(i,j): the total pseudo count of transitions from s_i to s_j.
For each t:
- add P(X_t = s_i, X_{t+1} = s_j | e_{1:T}) to count(i,j);
- add P(X_t = s_i | e_{1:T}) to count(i);
- add P(X_t = s_i | e_{1:T}) to count(i, e_t).
Updated a_ij = count(i,j) / count(i); updated b_{j,e} = count(j,e) / count(j).

P(X t =s i,X t+1 =s j |e 1:T ) =P(X t =s i,X t+1 =s j, e 1:t, e t+1, e t+2:T )/ P(e 1:T ) =P(X t =s i, e 1:t )P(X t+1 =s j |X t =s i )P(e t+1 |X t+1 =s j ) P(e t+2:T |X t+1 =s j )/P(e 1:T ) =P(X t =s i, e 1:t ) a ij b je t+1 P(e t+2:T |X t+1 =s j )/P(e 1:T ) =  i (t) a ij b je t β j (t+1)/P(e 1:T )

Forward Probability

α_i(t) = P(X_t = s_i, e_{1:t})
α_i(1) = π_i b_{i,e_1}
α_i(t) = Σ_j a_ji b_{i,e_t} α_j(t-1)

Backward Probability

β_i(t) = P(e_{t+1:T} | X_t = s_i)
β_i(T) = 1
β_i(t) = Σ_j a_ij b_{j,e_{t+1}} β_j(t+1)

(Diagram: α_i(t) at node X_t = s_i; the transition a_ij and emission b_{j,e_{t+1}} lead to X_{t+1} = s_j, with β_j(t+1) covering the observations beyond it.)

P(X_t = s_i | e_{1:T})
= P(X_t = s_i, e_{1:t}, e_{t+1:T}) / P(e_{1:T})
= P(e_{t+1:T} | X_t = s_i, e_{1:t}) P(X_t = s_i, e_{1:t}) / P(e_{1:T})
= P(e_{t+1:T} | X_t = s_i) P(X_t = s_i, e_{1:t}) / P(e_{1:T})
= α_i(t) β_i(t) / P(e_{1:T})
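Combining the forward and backward probabilities gives these posteriors directly. A sketch reusing the toy model from the forward example (all numbers made up; A[i, j] = a_ij, so rows index the source state):

```python
import numpy as np

def forward(pi, A, B, obs):
    """alpha[t, i] = P(X_t = s_i, e_{1:t})."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

def backward(A, B, obs):
    """beta[t, i] = P(e_{t+1:T} | X_t = s_i)."""
    T, N = len(obs), A.shape[0]
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                       # beta_i(T) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])   # sum_j a_ij b_{j,e_{t+1}} beta_j(t+1)
    return beta

# Toy 2-state, 2-symbol HMM (made-up numbers).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
obs = [0, 1, 0]
alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
evidence = alpha[-1].sum()        # P(e_{1:T})
gamma = alpha * beta / evidence   # gamma[t, i] = P(X_t = s_i | e_{1:T}): the pseudo counts
```

A useful sanity check on the implementation: Σ_i α_i(t) β_i(t) equals P(e_{1:T}) at every t, not just the last.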