Advanced Artificial Intelligence

Slides:



Advertisements
Similar presentations
Lecture 16 Hidden Markov Models. HMM Until now we only considered IID data. Some data are of sequential nature, i.e. have correlations have time. Example:
Advertisements

State Estimation and Kalman Filtering CS B659 Spring 2013 Kris Hauser.
Lirong Xia Probabilistic reasoning over time Tue, March 25, 2014.
Dynamic Bayesian Networks (DBNs)
Lirong Xia Hidden Markov Models Tue, March 28, 2014.
Lirong Xia Approximate inference: Particle filter Tue, April 1, 2014.
Hidden Markov Models Reading: Russell and Norvig, Chapter 15, Sections
Bayes Filters Pieter Abbeel UC Berkeley EECS Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics TexPoint fonts used in EMF. Read the.
1 Reasoning Under Uncertainty Over Time CS 486/686: Introduction to Artificial Intelligence Fall 2013.
QUIZ!!  T/F: Rejection Sampling without weighting is not consistent. FALSE  T/F: Rejection Sampling (often) converges faster than Forward Sampling. FALSE.
HMMs Tamara Berg CS Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew.
… Hidden Markov Models Markov assumption: Transition model:
CS 188: Artificial Intelligence Fall 2009 Lecture 20: Particle Filtering 11/5/2009 Dan Klein – UC Berkeley TexPoint fonts used in EMF. Read the TexPoint.
Hidden Markov Models. Hidden Markov Model In some Markov processes, we may not be able to observe the states directly.
Markov Models. Markov Chain A sequence of states: X 1, X 2, X 3, … Usually over time The transition from X t-1 to X t depends only on X t-1 (Markov Property).
Hidden Markov Models. Hidden Markov Model In some Markov processes, we may not be able to observe the states directly.
Announcements Homework 8 is out Final Contest (Optional)
CPSC 422, Lecture 14Slide 1 Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 14 Feb, 4, 2015 Slide credit: some slides adapted from Stuart.
CS 188: Artificial Intelligence Fall 2009 Lecture 19: Hidden Markov Models 11/3/2009 Dan Klein – UC Berkeley.
CS 188: Artificial Intelligence Fall 2009 Lecture 18: Decision Diagrams 10/29/2009 Dan Klein – UC Berkeley.
CHAPTER 15 SECTION 3 – 4 Hidden Markov Models. Terminology.
CS 188: Artificial Intelligence Fall 2008 Lecture 18: Decision Diagrams 10/30/2008 Dan Klein – UC Berkeley 1.
QUIZ!!  T/F: The forward algorithm is really variable elimination, over time. TRUE  T/F: Particle Filtering is really sampling, over time. TRUE  T/F:
CSE 573: Artificial Intelligence Autumn 2012
Recap: Reasoning Over Time  Stationary Markov models  Hidden Markov models X2X2 X1X1 X3X3 X4X4 rainsun X5X5 X2X2 E1E1 X1X1 X3X3 X4X4 E2E2 E3E3.
CS 188: Artificial Intelligence Fall 2008 Lecture 19: HMMs 11/4/2008 Dan Klein – UC Berkeley 1.
CHAPTER 15 SECTION 1 – 2 Markov Models. Outline Probabilistic Inference Bayes Rule Markov Chains.
CS 188: Artificial Intelligence Fall 2006 Lecture 18: Decision Diagrams 10/31/2006 Dan Klein – UC Berkeley.
UIUC CS 498: Section EA Lecture #21 Reasoning in Artificial Intelligence Professor: Eyal Amir Fall Semester 2011 (Some slides from Kevin Murphy (UBC))
Announcements  Project 3 will be posted ASAP: hunting ghosts with sonar.
Bayes’ Nets: Sampling [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available.
CS 416 Artificial Intelligence Lecture 17 Reasoning over Time Chapter 15 Lecture 17 Reasoning over Time Chapter 15.
CS Statistical Machine learning Lecture 24
CHAPTER 8 DISCRIMINATIVE CLASSIFIERS HIDDEN MARKOV MODELS.
CSE 473: Artificial Intelligence Spring 2012 Reasoning about Uncertainty & Hidden Markov Models Daniel Weld Many slides adapted from Dan Klein, Stuart.
The famous “sprinkler” example (J. Pearl, Probabilistic Reasoning in Intelligent Systems, 1988)
CPSC 422, Lecture 11Slide 1 Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 11 Oct, 2, 2015.
QUIZ!!  In HMMs...  T/F:... the emissions are hidden. FALSE  T/F:... observations are independent given no evidence. FALSE  T/F:... each variable X.
Probabilistic reasoning over time Ch. 15, 17. Probabilistic reasoning over time So far, we’ve mostly dealt with episodic environments –Exceptions: games.
CPS 170: Artificial Intelligence Markov processes and Hidden Markov Models (HMMs) Instructor: Vincent Conitzer.
Bayes Nets & HMMs Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell,
CS Statistical Machine learning Lecture 25 Yuan (Alan) Qi Purdue CS Nov
Reasoning over Time  Often, we want to reason about a sequence of observations  Speech recognition  Robot localization  User attention  Medical monitoring.
CSE 473: Artificial Intelligence Spring 2012 Reasoning about Uncertainty & Hidden Markov Models Daniel Weld Many slides adapted from Dan Klein, Stuart.
CPSC 7373: Artificial Intelligence Lecture 12: Hidden Markov Models and Filters Jiang Bian, Fall 2012 University of Arkansas at Little Rock.
CSE 573: Artificial Intelligence Autumn 2012 Particle Filters for Hidden Markov Models Daniel Weld Many slides adapted from Dan Klein, Stuart Russell,
Instructor: Eyal Amir Grad TAs: Wen Pu, Yonatan Bisk Undergrad TAs: Sam Johnson, Nikhil Johri CS 440 / ECE 448 Introduction to Artificial Intelligence.
CSE 473: Artificial Intelligence Spring 2012 Reasoning about Uncertainty & Hidden Markov Models Daniel Weld Many slides adapted from Dan Klein, Stuart.
CS 541: Artificial Intelligence Lecture VIII: Temporal Probability Models.
HMM: Particle filters Lirong Xia. HMM: Particle filters Lirong Xia.
Probabilistic reasoning over time
Probabilistic Reasoning
Artificial Intelligence
Probabilistic Reasoning Over Time
CAP 5636 – Advanced Artificial Intelligence
Probabilistic Reasoning
CS 188: Artificial Intelligence Spring 2007
CS 188: Artificial Intelligence Fall 2008
Instructors: Fei Fang (This Lecture) and Dave Touretzky
CS 188: Artificial Intelligence
Probabilistic Reasoning
Probabilistic Reasoning
Hidden Markov Models Markov chains not so useful for most agents
Hidden Markov Models (cont.) Markov Decision Processes
Hidden Markov Models Lirong Xia.
Speech recognition, machine learning
Announcements Assignments HW10 Due Wed 4/17 HW11
HMM: Particle filters Lirong Xia. HMM: Particle filters Lirong Xia.
Probabilistic reasoning over time
Speech recognition, machine learning
Presentation transcript:

Advanced Artificial Intelligence Lecture 6: Hidden Markov Models and Temporal Filtering

Class-On-A-Slide X1 X2 X3 X4 X5 E1 E2 E3 E4 E5

Example: Minerva

Example: Robot Localization

Example: Groundhog

Example: Groundhog

Example: Groundhog

Overview Markov Chains Hidden Markov Models Particle Filters More on HMMs

Reasoning over Time Often, we want to reason about a sequence of observations Speech recognition Robot localization User attention Medical monitoring Financial modeling

Markov Models A Markov model is a chain-structured BN Each node is identically distributed (stationarity) Value of X at a given time is called the state As a BN: Parameters: called transition probabilities or dynamics, specify how the state evolves over time (also, initial probs) X1 X2 X3 X4

Conditional Independence X1 X2 X3 X4 Basic conditional independence: Past and future independent of the present Each time step only depends on the previous This is called the Markov property Note that the chain is just a (growing) BN We can always use generic BN reasoning on it if we truncate the chain at a fixed length

Example: Markov Chain Weather: States: X = {rain, sun} Transitions: 0.1 Weather: States: X = {rain, sun} Transitions: Initial distribution: 1.0 sun What’s the probability distribution after one step? 0.9 rain sun This is a CPT, not a BN! 0.9 0.1

Mini-Forward Algorithm Question: What’s P(X) on some day t? An instance of variable elimination! sun sun sun sun rain rain rain rain Forward simulation

Example From initial observation of sun From initial observation of rain P(X1) P(X2) P(X3) P(X) P(X1) P(X2) P(X3) P(X)

Stationary Distributions If we simulate the chain long enough: What happens? Uncertainty accumulates Eventually, we have no idea what the state is! Stationary distributions: For most chains, the distribution we end up in is independent of the initial distribution Called the stationary distribution of the chain Usually, can only predict a short time out

Example: Web Link Analysis PageRank over a web graph Each web page is a state Initial distribution: uniform over pages Transitions: With prob. c, uniform jump to a random page (dotted lines, not all shown) With prob. 1-c, follow a random outlink (solid lines) Stationary distribution Will spend more time on highly reachable pages Google 1.0 returned the set of pages containing all your keywords in decreasing rank, now all search engines use link analysis along with many other factors (rank actually getting less important over time)

Overview Markov Chains Hidden Markov Models Particle Filters More on HMMs

Hidden Markov Models Markov chains not so useful for most agents Eventually you don’t know anything anymore Need observations to update your beliefs Hidden Markov models (HMMs) Underlying Markov chain over states S You observe outputs (effects) at each time step As a Bayes net: X1 X2 X3 X4 X5 E1 E2 E3 E4 E5

Example: Robot Localization Example from Michael Pfeiffer Prob 1 t=0 Sensor model: never more than 1 mistake Motion model: may not execute action with small prob.

Example: Robot Localization Prob 1 t=1

Example: Robot Localization Prob 1 t=2

Example: Robot Localization Prob 1 t=3

Example: Robot Localization Prob 1 t=4

Example: Robot Localization Prob 1 t=5

Hidden Markov Model HMMs have two important independence properties: Markov hidden process, future depends on past via the present Current observation independent of all else given current state Quiz: does this mean that observations are mutually independent? [No, correlated by the hidden state] X1 X2 X3 X4 X5 E1 E2 E3 E4 E5

Inference in HMMs (Filtering) X1 X1 X2 E1

Example An HMM is defined by: Initial distribution: Transitions: Emissions:

Example HMM

Example: HMMs in Robotics 3030

Overview Markov Chains Hidden Markov Models Particle Filters More on HMMs

Particle Filtering Sometimes |X| is too big to use exact inference |X| may be too big to even store B(X) E.g. X is continuous |X|2 may be too big to do updates Solution: approximate inference Track samples of X, not all values Samples are called particles Time per step is linear in the number of samples But: number needed may be large In memory: list of particles, not states This is how robot localization works in practice 0.0 0.1 0.2 0.5

Representation: Particles Our representation of P(X) is now a list of N particles (samples) Generally, N << |X| Storing map from X to counts would defeat the point P(x) approximated by number of particles with value x So, many x will have P(x) = 0! More particles, more accuracy For now, all particles have a weight of 1 Particles: (3,3) (1,2) (3,2) (2,3)

Particle Filtering: Elapse Time Each particle is moved by sampling its next position from the transition model This is like prior sampling – samples’ frequencies reflect the transition probs Here, most samples move clockwise, but some move in another direction or stay in place This captures the passage of time If we have enough samples, close to the exact values before and after (consistent)

Particle Filtering: Observe Slightly trickier: Don’t do rejection sampling (why not?) We don’t sample the observation, we fix it This is similar to likelihood weighting, so we downweight our samples based on the evidence Note that, as before, the probabilities don’t sum to one, since most have been downweighted (in fact they sum to an approximation of P(e))

Particle Filtering: Resample Old Particles: (1,3) w=0.1 (3,2) w=0.9 (3,3) w=0.4 (2,3) w=0.3 (2,2) w=0.4 (3,1) w=0.4 (2,1) w=0.9 Rather than tracking weighted samples, we resample N times, we choose from our weighted sample distribution (i.e. draw with replacement) This is analogous to renormalizing the distribution Now the update is complete for this time step, continue with the next one New Particles: (2,3) w=1 (3,1) w=1 (3,2) w=1 (2,2) w=1 (3,3) w=1

Particle Filters

Sensor Information: Importance Sampling

Robot Motion

Sensor Information: Importance Sampling

Robot Motion

Particle Filter Algorithm Sample the next generation for particles using the proposal distribution Compute the importance weights : weight = target distribution / proposal distribution Resampling: “Replace unlikely samples by more likely ones”

Particle Filter Algorithm Algorithm particle_filter( St-1, ut-1 zt): For Generate new samples Sample index j(i) from the discrete distribution given by wt-1 Sample from using and Compute importance weight Update normalization factor Insert For Normalize weights

Overview Markov Chains Hidden Markov Models Particle Filters More on HMMs

Other uses of HMM Find most likely sequence of states Viterbi algorithm Learn HMM parameters from data Baum-Welch (EM) algorithm Other types of HMMs Continuous, Gaussian-linear: Kalman filter Structured transition/emission probabilities: Dynamic Bayes network (DBN)

Real HMM Examples Speech recognition HMMs: Machine translation HMMs: Observations are acoustic signals (continuous valued) States are specific positions in specific words (so, tens of thousands) Machine translation HMMs: Observations are words (tens of thousands) States are translation options (dozens per word) Robot tracking: Observations are range readings (continuous) States are positions on a map (continuous)

HMM Application Domain: Speech Speech input is an acoustic wave form s p ee ch l a b “l” to “a” transition: Graphs from Simon Arnfield’s web tutorial on speech, Sheffield: http://www.psyc.leeds.ac.uk/research/cogn/speech/tutorial/

Learning Problem Given example observation trajectories umbrella, umbrella, no-umbrella, umbrella no-umbrella, no-umbrella, no-umbrella … Given: structure of HMM Problem: Learn probabilities P(x), P(x’|x), P(z|x)

Learning: Basic Idea Initialize P(x), P(x’|x), P(z|x) randomly Calculate for each sequence z1..zK P(x1 | z1..zK), P(x2 | z1..zK), …, P(xN | z1..zK) Those are known as “expectations” Now, compute P(x), P(x’|x), P(z|x) to best match those internal expectations Iterate

Let’s first learn a Markov Chain 3 Episodes (R=rain, S=sun) S, R, R, S, S, S, S, S, S, S R, S, S, S, S, R, R, R, R, R S, S, S, R, R, S, S, S, S, S Initial probability P(S) = 2/3 State transition probability P(S’|S) = 5/6 P(R’|R) = 2/3