CS 416 Artificial Intelligence Lecture 17: Reasoning over Time (Chapter 15)



Sampling your way to a solution

As time proceeds, you collect information:
–X_t: the variables you cannot observe (at time t)
–E_t: the variables you can observe (at time t); a particular observation is e_t
–X_a:b: the set of variables from X_a to X_b

Dealing with time

Consider P(x_t | e_0:t). To construct a Bayes network directly:
–x_t depends on e_t
–e_t depends on e_t-1
–e_t-1 depends on e_t-2
–… a potentially infinite number of parents

Avoid this by making an assumption!

Markov assumption

The current state depends only on a finite history of previous states.
–First-order Markov process: the current state depends only on the previous state: P(X_t | X_0:t-1) = P(X_t | X_t-1)

Stationarity assumption

Changes in the real world are caused by a stationary process.
–The laws that cause a state variable to change at time t are exactly the same at all other times.
–The variable values may change over time, but the nature of the system doesn't change.

Models of state transitions

State transition model: P(X_t | X_t-1)
Sensor model: P(E_t | X_t)
–Evidence variables depend only on the current state.
–The actual state of the world causes the evidence values.
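As a concrete illustration, the two models can be written out as explicit conditional tables. A minimal sketch in Python; the numeric values are assumed from the standard Russell & Norvig umbrella example that this lecture uses later, not stated on this slide:

```python
# Transition model P(Rain_{t+1} = true | Rain_t): the next state depends
# only on the current state (first-order Markov assumption).
transition = {True: 0.7, False: 0.3}

# Sensor model P(Umbrella_t = true | Rain_t): the evidence depends only
# on the current hidden state.
sensor = {True: 0.9, False: 0.2}

# Each entry implies the full conditional distribution, e.g.
# P(no umbrella | rain) = 1 - sensor[True].
print(round(1 - sensor[True], 2))   # 0.1
```

Because the state and evidence are binary here, a single number per condition suffices; larger state spaces would use a full row per conditioning value.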

Initial Conditions

Specify a prior probability over the states at time 0: P(X_0).

A complete joint distribution

We know:
–Initial conditions of state variables: P(X_0)
–Observations (evidence variables): P(E_i | X_i)
–Transition probabilities: P(X_i | X_i-1)

Therefore we have a complete model:
P(X_0:t, E_1:t) = P(X_0) ∏_{i=1..t} P(X_i | X_i-1) P(E_i | X_i)

What might we do with our model?

Filtering: given all evidence to date, compute the belief state of the unobserved variables: P(X_t | e_1:t)
Prediction: predict the posterior distribution of a future state: P(X_t+k | e_1:t), k > 0
Smoothing: use later evidence as hindsight to estimate earlier values of the unobserved variables: P(X_k | e_1:t), 0 <= k < t
Most likely explanation: what sequence of states most likely generated the sequence of observations? argmax_x1:t P(x_1:t | e_1:t)

Filtering / Prediction

Given filtering up to t, can we incorporate new evidence at t+1? Two steps:
–Project the state at t forward to t+1 using the transition model: P(X_t+1 | X_t)
–Update that projection using e_t+1 and the sensor model: P(e_t+1 | X_t+1)

Filtering/Projection

Combining the two steps gives the recursive filtering update:
P(X_t+1 | e_1:t+1) = α P(e_t+1 | X_t+1) Σ_x_t P(X_t+1 | x_t) P(x_t | e_1:t)
–P(e_t+1 | X_t+1) is the sensor model; P(X_t+1 | x_t) is the transition model.
–We must sum over x_t because we don't know X_t.

Filtering/Projection

–X_t+1 is really a function of e_1:t and x_t.
–Because we don't know x_t, we sum across all its possible values.
–The sum is the prediction of X_t+1; once P(x_t | e_1:t) is known, earlier values are not needed.

Filtering example

Is it raining at time t? Based on observing an umbrella at time t (the standard umbrella example from Russell & Norvig):
–Initial probability: P(r_0) = 0.5
–Transition model: P(r_t+1 | r_t) = 0.7, P(r_t+1 | ~r_t) = 0.3
–Sensor model: P(u_t | r_t) = 0.9, P(u_t | ~r_t) = 0.2

Given U_1 = true, what is P(R_1)?
–First, predict the transition from time 0 to time 1, then update with the evidence.

Given U_1 = true, what is P(R_1)?

Predict the transition from time 0 to time 1. Because we don't know R_0, we have to consider both cases (it was raining, it wasn't raining):
P(R_1) = Σ_r_0 P(R_1 | r_0) P(r_0) = ⟨0.7, 0.3⟩ × 0.5 + ⟨0.3, 0.7⟩ × 0.5 = ⟨0.5, 0.5⟩

Given U_1 = true, what is P(R_1)?

Update with the evidence, using the sensor model (the probability of seeing the umbrella given that it was / wasn't raining):
P(R_1 | u_1) = α P(u_1 | R_1) P(R_1) = α ⟨0.9 × 0.5, 0.2 × 0.5⟩ = α ⟨0.45, 0.1⟩ ≈ ⟨0.818, 0.182⟩

Given U_1 and U_2 = true, what is P(R_2)?

We computed P(R_1 | u_1) in the previous steps. First, predict R_2 from R_1:
P(R_2 | u_1) = Σ_r_1 P(R_2 | r_1) P(r_1 | u_1) = ⟨0.7, 0.3⟩ × 0.818 + ⟨0.3, 0.7⟩ × 0.182 ≈ ⟨0.627, 0.373⟩

Given U_1 and U_2 = true, what is P(R_2)?

Second, update R_2 with the evidence:
P(R_2 | u_1, u_2) = α P(u_2 | R_2) P(R_2 | u_1) = α ⟨0.9 × 0.627, 0.2 × 0.373⟩ ≈ ⟨0.883, 0.117⟩

When queried to solve for R_n, use a forward algorithm that recursively solves for R_i for i < n.
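The two umbrella updates above can be packaged as a small forward-filtering routine. A sketch, assuming the standard Russell & Norvig values (0.5 prior, 0.7/0.3 transition, 0.9/0.2 sensor):

```python
# Forward (filtering) algorithm for the umbrella world.

T = [[0.7, 0.3],   # T[i][j] = P(X_{t+1}=j | X_t=i); index 0 = rain
     [0.3, 0.7]]
P_U = [0.9, 0.2]   # P(umbrella | rain), P(umbrella | no rain)

def normalize(v):
    s = sum(v)
    return [x / s for x in v]

def forward(prior, evidence):
    """Return P(X_t | e_{1:t}) after each observation (True = umbrella)."""
    belief, history = prior[:], []
    for e in evidence:
        # predict: push the belief through the transition model
        pred = [sum(belief[i] * T[i][j] for i in range(2)) for j in range(2)]
        # update: weight by the sensor likelihood, then normalize
        like = [P_U[j] if e else 1 - P_U[j] for j in range(2)]
        belief = normalize([like[j] * pred[j] for j in range(2)])
        history.append(belief)
    return history

beliefs = forward([0.5, 0.5], [True, True])
print([round(b[0], 3) for b in beliefs])  # [0.818, 0.883]
```

The printed values match the hand computation on the slides: P(r_1 | u_1) ≈ 0.818 and P(r_2 | u_1, u_2) ≈ 0.883.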

Prediction

Use evidence 1..t to predict the state at t+k+1:
P(X_t+k+1 | e_1:t) = Σ_x_t+k P(X_t+k+1 | x_t+k) P(x_t+k | e_1:t)
–For every possible state x_t+k, apply the transition model to x_t+k+1.
–Weight each state x_t+k by its likelihood given e_1:t.

Prediction

Limits of prediction:
–As k increases, the prediction converges to a fixed distribution: the stationary distribution.
–The time taken to reach the stationary distribution is the mixing time.
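The convergence to the stationary distribution is easy to see numerically. A small sketch, assuming the symmetric 0.7/0.3 umbrella transition model (not stated on this slide):

```python
# Pushing a prediction further and further into the future: repeated
# application of the transition model converges to the stationary
# distribution, regardless of the starting belief.

T = [[0.7, 0.3],
     [0.3, 0.7]]

p = [0.9, 0.1]        # any starting belief over {rain, no rain}
for _ in range(50):   # predict 50 steps ahead with no new evidence
    p = [sum(p[i] * T[i][j] for i in range(2)) for j in range(2)]

print([round(x, 3) for x in p])  # [0.5, 0.5]: the stationary distribution
```

For this matrix, the distance from ⟨0.5, 0.5⟩ shrinks by a constant factor each step, so the mixing time is short; less diffuse transition models mix more slowly.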

Smoothing

P(X_k | e_1:t), 0 <= k < t. Attack this in two parts by splitting the evidence:
P(X_k | e_1:t) = P(X_k | e_1:k, e_k+1:t) = α P(X_k | e_1:k) P(e_k+1:t | X_k)   (by Bayes' rule)
where b_k+1:t = P(e_k+1:t | X_k).

Smoothing

Forward part: what is the probability of X_k given evidence 1..k?
Backward part: what is the probability of observing evidence k+1..t given X_k?
How do we compute the backward part?

Smoothing

Computing the backward part, by conditioning on the next state:
b_k+1:t = P(e_k+1:t | X_k) = Σ_x_k+1 P(e_k+1 | x_k+1) P(e_k+2:t | x_k+1) P(x_k+1 | X_k)
This is a recursion that runs backward from t.
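Combining the forward messages with this backward recursion gives the forward-backward algorithm. A sketch on the umbrella model, with the standard Russell & Norvig values assumed (0.5 prior, 0.7/0.3 transition, 0.9/0.2 sensor):

```python
# Forward-backward smoothing: P(X_k | e_{1:t}) for every k.

T = [[0.7, 0.3], [0.3, 0.7]]   # T[i][j] = P(X_{t+1}=j | X_t=i); 0 = rain
P_U = [0.9, 0.2]               # P(umbrella | state)

def normalize(v):
    s = sum(v)
    return [x / s for x in v]

def smooth(prior, evidence):
    n = len(evidence)
    # forward pass: f[k] = P(X_k | e_{1:k})
    f, belief = [], prior[:]
    for e in evidence:
        pred = [sum(belief[i] * T[i][j] for i in range(2)) for j in range(2)]
        like = [P_U[j] if e else 1 - P_U[j] for j in range(2)]
        belief = normalize([like[j] * pred[j] for j in range(2)])
        f.append(belief)
    # backward pass: b = P(e_{k+1:t} | X_k), starting from an all-ones message
    smoothed, b = [None] * n, [1.0, 1.0]
    for k in range(n - 1, -1, -1):
        smoothed[k] = normalize([f[k][i] * b[i] for i in range(2)])
        like = [P_U[j] if evidence[k] else 1 - P_U[j] for j in range(2)]
        b = [sum(T[i][j] * like[j] * b[j] for j in range(2)) for i in range(2)]
    return smoothed

result = smooth([0.5, 0.5], [True, True])
print([round(r[0], 3) for r in result])  # [0.883, 0.883]
```

Note that the smoothed estimate of r_1 (0.883) is higher than the filtered estimate (0.818): the second umbrella sighting raises our belief that it was already raining on day 1.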

Whiteboard

Example

Probability of r_1 given u_1 and u_2:
P(r_1 | u_1, u_2) = α P(r_1 | u_1) P(u_2:2 | r_1) = α f_1:1 × b_2:2
We solved for the forward message f_1:1 = ⟨0.818, 0.182⟩ in step one of the forward solution; the backward message is b_2:2 = ⟨0.69, 0.41⟩, giving P(r_1 | u_1, u_2) ≈ 0.883.

Viterbi

Consider finding the most likely path through a sequence of states given the observations.
–Could enumerate all 2^5 permutations of five-step rain/~rain sequences and evaluate P(x_1:5 | e_1:5), but this is exponential in the sequence length.

Viterbi

Could use smoothing to find the posterior distribution for the weather at each time step and build a path through the most probable states, but that treats each time step as a single, independent choice, not as a sequence!

Viterbi

Specify a final state and find the previous states that form the most likely path:
–Let R_5 = true.
–Find the R_4 that is on the optimal path to R_5. For each value of R_4, evaluate how likely it is to lead to R_5 = true and how easily it is reached.
–Find the R_3 on the optimal path to R_4, considering each value, and so on.

Viterbi – Recursive algorithm

m_1:t+1 = max over x_1..x_t of P(x_1:t, X_t+1 | e_1:t+1)
        = P(e_t+1 | X_t+1) max_x_t [ P(X_t+1 | x_t) m_1:t(x_t) ]

At each step, record which x_t achieved the max so the optimal path can be recovered by backtracking.

The Viterbi algorithm is just like the filtering algorithm except for two changes:
–Replace the forward message f_1:t = P(X_t | e_1:t) with m_1:t = max over x_1..x_t-1 of P(x_1:t-1, X_t | e_1:t).
–The summation over x_t is replaced with a max over x_t.
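The two changes above turn the forward filter into the Viterbi recursion. A sketch on the umbrella model, assuming the standard Russell & Norvig values; the five-observation sequence is the book's worked example:

```python
# Viterbi: most likely rain sequence given umbrella observations.
# States are booleans: True = rain.

T = {True: {True: 0.7, False: 0.3},    # T[prev][next] = P(rain_{t+1} | rain_t)
     False: {True: 0.3, False: 0.7}}

def sensor(e, x):
    """P(Umbrella = e | Rain = x)."""
    p = 0.9 if x else 0.2
    return p if e else 1 - p

def viterbi(evidence, prior=0.5):
    states = [True, False]
    # m[x] = max over x_{1:t-1} of P(x_{1:t-1}, X_t = x, e_{1:t})
    m = {x: prior * sensor(evidence[0], x) for x in states}
    back = []                          # back-pointers for path recovery
    for e in evidence[1:]:
        new_m, ptr = {}, {}
        for x in states:
            prev = max(states, key=lambda s: m[s] * T[s][x])
            ptr[x] = prev
            new_m[x] = m[prev] * T[prev][x] * sensor(e, x)
        m = new_m
        back.append(ptr)
    # backtrack from the most probable final state
    x = max(states, key=lambda s: m[s])
    path = [x]
    for ptr in reversed(back):
        x = ptr[x]
        path.append(x)
    return list(reversed(path))

print(viterbi([True, True, False, True, True]))
# [True, True, False, True, True]: rain, rain, no rain, rain, rain
```

The messages here are left unnormalized, as in the recursion above; only their relative magnitudes matter for the argmax.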

Review

Forward: f_1:t+1 = α P(e_t+1 | X_t+1) Σ_x_t P(X_t+1 | x_t) f_1:t(x_t)
Forward/Backward: b_k+1:t = Σ_x_k+1 P(e_k+1 | x_k+1) b_k+2:t(x_k+1) P(x_k+1 | X_k)
Max (Viterbi): m_1:t+1 = P(e_t+1 | X_t+1) max_x_t [ P(X_t+1 | x_t) m_1:t(x_t) ]

Hidden Markov Models (HMMs)

Represent the state of the world with a single discrete variable.
–If your state has multiple variables, form one variable whose values are all possible tuples of the multiple variables.
–Let the number of states be S.
 –The transition model is an S×S matrix T: the probability of transitioning from any state to any other.
 –The evidence at each time step is an S×S diagonal matrix O_t whose diagonal holds the likelihood of the observation at time t in each state.
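In this matrix form the forward update becomes a pair of matrix-vector products, f_1:t+1 = α O_t+1 Tᵀ f_1:t. A minimal sketch with plain lists, umbrella values assumed from the standard Russell & Norvig example:

```python
# Matrix form of the forward update: f_{1:t+1} = alpha * O_{t+1} * T^T * f_{1:t}.

T = [[0.7, 0.3], [0.3, 0.7]]        # S x S transition matrix
O_u    = [[0.9, 0.0], [0.0, 0.2]]   # diagonal observation matrix: umbrella seen
O_no_u = [[0.1, 0.0], [0.0, 0.8]]   # diagonal observation matrix: no umbrella

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def step(f, O):
    Tt = [[T[j][i] for j in range(len(T))] for i in range(len(T))]  # transpose
    v = matvec(O, matvec(Tt, f))
    s = sum(v)                       # alpha = 1/s normalizes the result
    return [x / s for x in v]

f = [0.5, 0.5]
f = step(f, O_u)                     # after u1
f = step(f, O_u)                     # after u2
print([round(x, 3) for x in f])      # [0.883, 0.117]
```

Writing the update this way makes each filtering step a constant number of S×S matrix operations, which is what makes HMM inference fast in practice.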

Kalman Filters

Gauss invented least-squares estimation and important parts of statistics in 1795:
–when he was 18 and trying to understand the motion of heavenly bodies (by analyzing data from telescopes).

The Kalman filter was invented by Kalman in 1960:
–a means to update predictions of continuous variables given observations (fast and discrete, well suited to computer programs)
–critical for getting the Apollo spacecraft to insert into orbit around the Moon.
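To make the idea concrete, here is a minimal one-dimensional Kalman update; the random-walk model and all numeric values are illustrative assumptions, not from the lecture:

```python
# Minimal 1-D Kalman filter sketch. Assumed model: random walk
# x_{t+1} = x_t + w with w ~ N(0, q), observation z_t = x_t + v
# with v ~ N(0, r). The belief over x is a Gaussian N(mu, var).

def kalman_step(mu, var, z, q=1.0, r=2.0):
    # predict: the motion noise q inflates the variance
    mu_pred, var_pred = mu, var + q
    # update: the Kalman gain K trades off prediction vs. observation
    K = var_pred / (var_pred + r)
    return mu_pred + K * (z - mu_pred), (1 - K) * var_pred

mu, var = 0.0, 1.0                    # prior belief N(0, 1)
for z in [1.2, 0.9, 1.1]:             # a stream of noisy observations
    mu, var = kalman_step(mu, var, z)

print(round(mu, 3), round(var, 3))    # 0.925 1.0
```

This is the continuous analogue of the discrete predict-then-update filtering loop earlier in the lecture: the transition model inflates uncertainty, and each observation shrinks it by an amount set by the gain.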