CPS 170: Artificial Intelligence Markov processes and Hidden Markov Models (HMMs) Instructor: Vincent Conitzer.

Motivation

The Bayes nets we considered so far were static: they referred to a single point in time.
– E.g., medical diagnosis
An agent also needs to model how the world evolves:
– Speech recognition software needs to process speech over time
– An artificially intelligent software assistant needs to keep track of the user's intentions over time
– …

Markov processes

We have time periods t = 0, 1, 2, … In each period t, the world is in a certain state S_t.
The Markov assumption: given S_t, S_{t+1} is independent of all S_i with i < t:
– P(S_{t+1} | S_1, S_2, …, S_t) = P(S_{t+1} | S_t)
– Given the current state, history tells us nothing more about the future.
[Chain diagram: S_1 → S_2 → S_3 → … → S_t → …]
Typically, all the CPTs are the same: for all t, P(S_{t+1} = j | S_t = i) = a_ij (the stationarity assumption).

Weather example

S_t is one of {s, c, r} (sunny, cloudy, rainy).
Transition probabilities (the diagram on the slide is lost; the table below is reconstructed from the calculations on the following slides):

          to s   to c   to r
  from s   .6     .3     .1
  from c   .4     .3     .3
  from r   .2     .5     .3

(Note: the state-transition diagram is not a Bayes net!)
Also need to specify an initial distribution P(S_0). Throughout, assume P(S_0 = s) = 1.

Weather example, continued

What is the probability that it rains two days from now, P(S_2 = r)?
P(S_2 = r) = P(S_2 = r, S_1 = r) + P(S_2 = r, S_1 = s) + P(S_2 = r, S_1 = c)
= .1*.3 + .6*.1 + .3*.3 = .18

Weather example, continued

What is the probability that it rains three days from now?
Computationally inefficient way:
P(S_3 = r) = P(S_3 = r, S_2 = r, S_1 = r) + P(S_3 = r, S_2 = r, S_1 = s) + …
For n periods into the future, we would need to sum over 3^(n-1) paths.

Weather example, continued

More efficient:
P(S_3 = r) = P(S_3 = r, S_2 = r) + P(S_3 = r, S_2 = s) + P(S_3 = r, S_2 = c)
= P(S_3 = r | S_2 = r)P(S_2 = r) + P(S_3 = r | S_2 = s)P(S_2 = s) + P(S_3 = r | S_2 = c)P(S_2 = c)
The only hard part is figuring out P(S_2).
Main idea: compute the distribution P(S_1), then P(S_2), then P(S_3). Linear in the number of periods! (example on board)
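This forward computation can be sketched in a few lines of Python. The transition probabilities below are the ones reconstructed from the slides' weather calculations:

```python
# Propagate the marginal distribution P(S_t) one period at a time:
# P(S_{t+1} = j) = sum_i P(S_t = i) * a_ij.
T = {'s': {'s': 0.6, 'c': 0.3, 'r': 0.1},   # P(S_{t+1} | S_t = s)
     'c': {'s': 0.4, 'c': 0.3, 'r': 0.3},   # P(S_{t+1} | S_t = c)
     'r': {'s': 0.2, 'c': 0.5, 'r': 0.3}}   # P(S_{t+1} | S_t = r)

def step(dist):
    """One period of the chain: push the current distribution through T."""
    return {j: sum(dist[i] * T[i][j] for i in dist) for j in 'scr'}

dist = {'s': 1.0, 'c': 0.0, 'r': 0.0}       # P(S_0 = s) = 1
dists = []
for t in range(3):                           # compute P(S_1), P(S_2), P(S_3)
    dist = step(dist)
    dists.append(dist)
print(round(dists[1]['r'], 4))               # P(S_2 = r) = 0.18, matching the slide
print(round(dists[2]['r'], 4))               # P(S_3 = r) = 0.2
```

Note the cost per period is constant (9 multiplications here), so n periods out is linear in n rather than exponential.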

Stationary distributions

As t goes to infinity, "generally," the distribution P(S_t) will converge to a stationary distribution.
A distribution given by probabilities π_i (where i is a state) is stationary if P(S_t = i) = π_i implies P(S_{t+1} = i) = π_i.
Of course, P(S_{t+1} = i) = Σ_j P(S_{t+1} = i, S_t = j) = Σ_j P(S_t = j) a_ji.
So the stationary distribution is defined by π_i = Σ_j π_j a_ji.

Computing the stationary distribution

π_s = .6π_s + .4π_c + .2π_r
π_c = .3π_s + .3π_c + .5π_r
π_r = .1π_s + .3π_c + .3π_r
(together with the normalization constraint π_s + π_c + π_r = 1)
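One way to solve this numerically, rather than by hand, is power iteration: start from any distribution and keep applying the update π_i ← Σ_j π_j a_ji until it stops changing. A minimal sketch, using the slides' transition numbers:

```python
# Power iteration to the stationary distribution of the weather chain.
T = {'s': {'s': 0.6, 'c': 0.3, 'r': 0.1},
     'c': {'s': 0.4, 'c': 0.3, 'r': 0.3},
     'r': {'s': 0.2, 'c': 0.5, 'r': 0.3}}

pi = {'s': 1.0, 'c': 0.0, 'r': 0.0}          # any starting distribution works
for _ in range(200):
    pi = {j: sum(pi[i] * T[i][j] for i in pi) for j in 'scr'}

print({k: round(v, 4) for k, v in pi.items()})
# -> {'s': 0.4474, 'c': 0.3421, 'r': 0.2105}   (exactly 17/38, 13/38, 8/38)
```

Plugging these values back into the three equations above confirms they are a fixed point.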

Restrictiveness of Markov models

Are past and future really independent given the current state? E.g., suppose that when it rains, it rains for at most 2 days. Then yesterday's state matters too: a second-order Markov process.
Workaround: change the meaning of "state" to cover the events of the last 2 days, so the chain's states become (S_1, S_2), (S_2, S_3), (S_3, S_4), (S_4, S_5), …
Another approach: add more information to the state. E.g., the full state of the world would include whether the sky is full of water.
– Additional information may not be observable.
– Blowup of the number of states…

Hidden Markov models (HMMs)

Same as a Markov model, except we cannot see the state. Instead, we only see an observation each period, which depends on the current state.
[Chain diagram: S_1 → S_2 → S_3 → … → S_t → …, with each S_t emitting an observation O_t]
Still need a transition model: P(S_{t+1} = j | S_t = i) = a_ij
Also need an observation model: P(O_t = k | S_t = i) = b_ik

Weather example extended to an HMM

Transition probabilities as before.
Observation: whether your labmate comes in wet or dry.
b_sw = .1, b_cw = .3, b_rw = .8 (the probability of observing "wet" in states s, c, r, respectively)

HMM weather example: a question

You have been stuck in the lab for three days (!). On those days, your labmate was dry, wet, wet, respectively. What is the probability that it is now raining outside?
P(S_2 = r | O_0 = d, O_1 = w, O_2 = w)
By Bayes' rule, we really want to know P(S_2, O_0 = d, O_1 = w, O_2 = w).

Solving the question

Computationally efficient approach: first compute P(S_1 = i, O_0 = d, O_1 = w) for all states i.
General case: solve for P(S_t, O_0 = o_0, O_1 = o_1, …, O_t = o_t) for t = 1, then t = 2, … This is called monitoring (or filtering).
P(S_t, O_0 = o_0, …, O_t = o_t) = Σ_{s_{t-1}} P(S_{t-1} = s_{t-1}, O_0 = o_0, …, O_{t-1} = o_{t-1}) P(S_t | S_{t-1} = s_{t-1}) P(O_t = o_t | S_t)
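The monitoring recursion above is exactly the forward algorithm. A sketch answering the lab question, using the slides' emission numbers and the transition numbers reconstructed from the earlier weather calculations:

```python
# Forward (monitoring) recursion: alpha_t(i) = P(S_t = i, o_0, ..., o_t).
T = {'s': {'s': 0.6, 'c': 0.3, 'r': 0.1},
     'c': {'s': 0.4, 'c': 0.3, 'r': 0.3},
     'r': {'s': 0.2, 'c': 0.5, 'r': 0.3}}
b_wet = {'s': 0.1, 'c': 0.3, 'r': 0.8}       # b_iw = P(O_t = wet | S_t = i)

def emit(i, o):
    """P(O_t = o | S_t = i) for o in {'w', 'd'}."""
    return b_wet[i] if o == 'w' else 1.0 - b_wet[i]

obs = ['d', 'w', 'w']                         # dry, wet, wet
alpha = {i: (1.0 if i == 's' else 0.0) * emit(i, obs[0]) for i in 'scr'}
for o in obs[1:]:
    alpha = {j: sum(alpha[i] * T[i][j] for i in alpha) * emit(j, o)
             for j in 'scr'}
z = sum(alpha.values())                       # P(o_0, o_1, o_2)
print(round(alpha['r'] / z, 4))               # P(S_2 = r | d, w, w) ≈ 0.5707
```

Dividing by z at the end is the Bayes'-rule normalization the slide mentions: the unnormalized alpha values are the joint probabilities P(S_2, d, w, w).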

Predicting further out

You have been stuck in the lab for three days. On those days, your labmate was dry, wet, wet, respectively. What is the probability that two days from now it will be raining outside?
P(S_4 = r | O_0 = d, O_1 = w, O_2 = w)

Predicting further out, continued…

Want to know: P(S_4 = r | O_0 = d, O_1 = w, O_2 = w)
Already know how to get: P(S_2 | O_0 = d, O_1 = w, O_2 = w)
P(S_3 = r | O_0 = d, O_1 = w, O_2 = w)
= Σ_{s_2} P(S_3 = r, S_2 = s_2 | O_0 = d, O_1 = w, O_2 = w)
= Σ_{s_2} P(S_3 = r | S_2 = s_2) P(S_2 = s_2 | O_0 = d, O_1 = w, O_2 = w)
Etc. for S_4. So: monitoring first, then straightforward Markov process updates.
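A sketch of this two-stage computation (filter up to the last observation, then run plain Markov updates), again assuming the reconstructed transition numbers:

```python
# Prediction: monitoring first, then evidence-free Markov steps.
T = {'s': {'s': 0.6, 'c': 0.3, 'r': 0.1},
     'c': {'s': 0.4, 'c': 0.3, 'r': 0.3},
     'r': {'s': 0.2, 'c': 0.5, 'r': 0.3}}
b_wet = {'s': 0.1, 'c': 0.3, 'r': 0.8}

def emit(i, o):
    return b_wet[i] if o == 'w' else 1.0 - b_wet[i]

obs = ['d', 'w', 'w']
# Stage 1: forward pass to get P(S_2 | d, w, w)
alpha = {i: (1.0 if i == 's' else 0.0) * emit(i, obs[0]) for i in 'scr'}
for o in obs[1:]:
    alpha = {j: sum(alpha[i] * T[i][j] for i in alpha) * emit(j, o)
             for j in 'scr'}
z = sum(alpha.values())
dist = {i: alpha[i] / z for i in 'scr'}      # P(S_2 | d, w, w)

# Stage 2: two evidence-free periods take us to S_4
for _ in range(2):
    dist = {j: sum(dist[i] * T[i][j] for i in dist) for j in 'scr'}
print(round(dist['r'], 4))                   # P(S_4 = r | d, w, w) ≈ 0.2384
```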

Integrating newer information

You have been stuck in the lab for four days (!). On those days, your labmate was dry, wet, wet, dry, respectively. What is the probability that two days ago it was raining outside?
P(S_1 = r | O_0 = d, O_1 = w, O_2 = w, O_3 = d)
– this is the smoothing or hindsight problem.

Hindsight problem continued…

Want: P(S_1 = r | O_0 = d, O_1 = w, O_2 = w, O_3 = d)
"Partial" application of Bayes' rule:
P(S_1 = r | O_0 = d, O_1 = w, O_2 = w, O_3 = d)
= P(S_1 = r, O_2 = w, O_3 = d | O_0 = d, O_1 = w) / P(O_2 = w, O_3 = d | O_0 = d, O_1 = w)
So we really want to know P(S_1, O_2 = w, O_3 = d | O_0 = d, O_1 = w).

Hindsight problem continued…

Want to know P(S_1 = r, O_2 = w, O_3 = d | O_0 = d, O_1 = w)
P(S_1 = r, O_2 = w, O_3 = d | O_0 = d, O_1 = w)
= P(S_1 = r | O_0 = d, O_1 = w) P(O_2 = w, O_3 = d | S_1 = r)
(using the fact that future observations are independent of past observations given S_1)
Already know how to compute P(S_1 = r | O_0 = d, O_1 = w); just need to compute P(O_2 = w, O_3 = d | S_1 = r).

Hindsight problem continued…

Just need to compute P(O_2 = w, O_3 = d | S_1 = r):
P(O_2 = w, O_3 = d | S_1 = r) = Σ_{s_2} P(S_2 = s_2, O_2 = w, O_3 = d | S_1 = r)
= Σ_{s_2} P(S_2 = s_2 | S_1 = r) P(O_2 = w | S_2 = s_2) P(O_3 = d | S_2 = s_2)
The first two factors are directly in the model; the last factor is a "smaller" problem of the same kind.
Use dynamic programming, working backwards from the future – similar to the forwards approach from the past.

Backwards reasoning in general

Want to know P(O_{k+1} = o_{k+1}, …, O_t = o_t | S_k)
First compute P(O_t = o_t | S_{t-1}) = Σ_{s_t} P(S_t = s_t | S_{t-1}) P(O_t = o_t | S_t = s_t)
Then compute P(O_t = o_t, O_{t-1} = o_{t-1} | S_{t-2}) = Σ_{s_{t-1}} P(S_{t-1} = s_{t-1} | S_{t-2}) P(O_{t-1} = o_{t-1} | S_{t-1} = s_{t-1}) P(O_t = o_t | S_{t-1} = s_{t-1})
Etc.
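Combining this backward recursion with the forward pass answers the hindsight question from the earlier slide. A sketch, assuming the reconstructed transition numbers:

```python
# Smoothing: P(S_1 = r | d, w, w, d) via forward (alpha) and backward (beta).
T = {'s': {'s': 0.6, 'c': 0.3, 'r': 0.1},
     'c': {'s': 0.4, 'c': 0.3, 'r': 0.3},
     'r': {'s': 0.2, 'c': 0.5, 'r': 0.3}}
b_wet = {'s': 0.1, 'c': 0.3, 'r': 0.8}

def emit(i, o):
    return b_wet[i] if o == 'w' else 1.0 - b_wet[i]

obs = ['d', 'w', 'w', 'd']

# Forward up to the query period: alpha_1(i) = P(S_1 = i, o_0, o_1)
alpha = {i: (1.0 if i == 's' else 0.0) * emit(i, obs[0]) for i in 'scr'}
alpha = {j: sum(alpha[i] * T[i][j] for i in alpha) * emit(j, obs[1])
         for j in 'scr'}

# Backward from the future: beta_k(i) = P(o_{k+1}, ..., o_t | S_k = i)
beta = {i: 1.0 for i in 'scr'}               # beta at the final period
for o in reversed(obs[2:]):                  # fold in o_3 = d, then o_2 = w
    beta = {i: sum(T[i][j] * emit(j, o) * beta[j] for j in 'scr')
            for i in 'scr'}                  # now beta = beta_1

post = {i: alpha[i] * beta[i] for i in 'scr'}
z = sum(post.values())
print(round(post['r'] / z, 4))               # P(S_1 = r | d, w, w, d) ≈ 0.4045
```

The product alpha_1(i) * beta_1(i) is P(S_1 = i, o_0, …, o_3); normalizing gives the smoothed posterior.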

Variable elimination

Because all of this is inference in a Bayes net, we can also just do variable elimination.
E.g., P(S_3 = r, O_1 = d, O_2 = w, O_3 = w)
= Σ_{s_2} Σ_{s_1} P(S_1 = s_1) P(O_1 = d | S_1 = s_1) P(S_2 = s_2 | S_1 = s_1) P(O_2 = w | S_2 = s_2) P(S_3 = r | S_2 = s_2) P(O_3 = w | S_3 = r)
The network is a tree, so variable elimination works well.

Dynamic Bayes nets (DBNs)

So far we have assumed that each period has one variable for the state and one variable for the observation. It is often better to divide state and observation up into multiple variables, e.g.:
– period 1: NC wind, weather in Durham, weather in Beaufort
– period 2: NC wind, weather in Durham, weather in Beaufort
– …
with edges both within a period and from one period to the next.

Some interesting things we skipped

Finding the most likely sequence of states, given observations:
– Not necessarily equal to the sequence of most likely states! (example?)
– The Viterbi algorithm. Key idea: for each period t and for every state, keep track of the most likely sequence ending in that state at that period, given the evidence up to that period.
Continuous variables.
Approximate inference methods:
– Particle filtering.
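The Viterbi key idea above can be sketched on the weather HMM: replace the sums in the forward recursion with maxes and keep back-pointers. A minimal version, assuming the same reconstructed transition and emission numbers:

```python
# Viterbi: for each period and state, keep the probability of the single
# best path ending there, plus a back-pointer to its predecessor.
T = {'s': {'s': 0.6, 'c': 0.3, 'r': 0.1},
     'c': {'s': 0.4, 'c': 0.3, 'r': 0.3},
     'r': {'s': 0.2, 'c': 0.5, 'r': 0.3}}
b_wet = {'s': 0.1, 'c': 0.3, 'r': 0.8}

def emit(i, o):
    return b_wet[i] if o == 'w' else 1.0 - b_wet[i]

obs = ['d', 'w', 'w']
delta = {i: (1.0 if i == 's' else 0.0) * emit(i, obs[0]) for i in 'scr'}
back = []                                     # back[t][j]: best predecessor of j
for o in obs[1:]:
    bp, new = {}, {}
    for j in 'scr':
        i_best = max('scr', key=lambda i: delta[i] * T[i][j])
        bp[j] = i_best
        new[j] = delta[i_best] * T[i_best][j] * emit(j, o)
    back.append(bp)
    delta = new

# Trace the best path backwards from the most likely final state
state = max('scr', key=lambda i: delta[i])
path = [state]
for bp in reversed(back):
    state = bp[state]
    path.append(state)
path.reverse()
print(path)                                   # -> ['s', 'c', 'r']
```

Note this returns the single most likely *sequence*, which in general need not match the sequence of per-period most likely states computed by smoothing.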