CS498-EA Reasoning in AI Lecture #23 Instructor: Eyal Amir Fall Semester 2011

Last Time
–Time and uncertainty
–Inference: filtering, prediction, smoothing
–Hidden Markov Models (HMMs): model, exact reasoning
–Dynamic Bayesian Networks (DBNs): model, exact reasoning

Inference Tasks
–Filtering: compute the belief state, the probability of the current state given the evidence so far
–Prediction: like filtering, but for a future state, with no evidence beyond the present
–Smoothing: a better estimate of a past state, using evidence up to the present
–Most likely explanation: the state sequence that best explains the evidence
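
Stated compactly in standard textbook notation (hidden state $X_t$, evidence $e_{1:t}$), the four tasks are:

```latex
\begin{aligned}
\text{Filtering:}\quad & P(X_t \mid e_{1:t}) \\
\text{Prediction:}\quad & P(X_{t+k} \mid e_{1:t}), \quad k > 0 \\
\text{Smoothing:}\quad & P(X_k \mid e_{1:t}), \quad 1 \le k < t \\
\text{Most likely explanation:}\quad & \arg\max_{x_{1:t}} P(x_{1:t} \mid e_{1:t})
\end{aligned}
```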

Filtering (forward algorithm)
–Predict: $P(X_{t+1} \mid e_{1:t}) = \sum_{x_t} P(X_{t+1} \mid x_t)\, P(x_t \mid e_{1:t})$
–Update: $P(X_{t+1} \mid e_{1:t+1}) \propto P(e_{t+1} \mid X_{t+1})\, P(X_{t+1} \mid e_{1:t})$
–Recursive step: each new belief state is computed from the previous belief state and the new evidence
[Figure: HMM with hidden states X_{t-1}, X_t, X_{t+1} and evidence E_{t-1}, E_t, E_{t+1}]
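
A minimal numeric sketch of this recursion for a discrete HMM; the transition, emission, and prior arrays below are made-up illustrative values, not the lecture's model:

```python
import numpy as np

# Toy HMM with 2 hidden states and 2 observation symbols (illustrative values only).
T = np.array([[0.7, 0.3],   # T[i, j] = P(X_{t+1}=j | X_t=i)
              [0.2, 0.8]])
E = np.array([[0.9, 0.1],   # E[i, k] = P(e=k | X=i)
              [0.3, 0.7]])
prior = np.array([0.5, 0.5])

def forward_step(belief, obs):
    """One filtering step: predict with the transition model, then update on the observation."""
    predicted = T.T @ belief          # Predict: sum_x P(X'|x) P(x|e_{1:t})
    updated = E[:, obs] * predicted   # Update: weight by the observation likelihood
    return updated / updated.sum()    # Normalize to get P(X'|e_{1:t+1})

belief = prior
for obs in [0, 0, 1]:                 # a short, made-up observation sequence
    belief = forward_step(belief, obs)
    print(belief)
```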

Smoothing
–Combine a forward (filtering) pass with a backward pass over later evidence: the forward-backward algorithm

Most Likely Explanation
–Finding the most likely state path given the evidence
–Computed by the Viterbi algorithm: like filtering, but with a max over previous states instead of a sum
[Figure: HMM with hidden states X_{t-1}, X_t, X_{t+1} and evidence E_{t-1}, E_t, E_{t+1}]
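
A matching Viterbi sketch on the same toy HMM as above, replacing the sum in the predict step with a max and keeping back-pointers to recover the path:

```python
import numpy as np

T = np.array([[0.7, 0.3], [0.2, 0.8]])    # transition P(X'|X), illustrative values
E = np.array([[0.9, 0.1], [0.3, 0.7]])    # emission P(e|X)
prior = np.array([0.5, 0.5])

def viterbi(obs_seq):
    """Most likely hidden state path for a discrete observation sequence."""
    delta = prior * E[:, obs_seq[0]]                 # best path score ending in each state
    backptr = []
    for obs in obs_seq[1:]:
        scores = delta[:, None] * T                  # scores[i, j]: best path to i, then step i -> j
        backptr.append(scores.argmax(axis=0))        # best predecessor of each state j
        delta = scores.max(axis=0) * E[:, obs]       # max instead of sum, then weight by emission
    path = [int(delta.argmax())]
    for bp in reversed(backptr):                     # follow back-pointers to recover the path
        path.append(int(bp[path[-1]]))
    return list(reversed(path))

print(viterbi([0, 0, 1]))
```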

Today
Dynamic Bayesian Networks:
–Exact Inference
–Approximate Inference

Dynamic Bayesian Network
–A DBN is specified by a two-time-slice BN (2-TBN)
–Uses the first-order Markov assumption: the state at time t+1 depends only on the state at time t
[Figure: a standard BN replicated over two slices, Time 0 and Time 1]

Dynamic Bayesian Network
Basic idea:
–Copy the state and evidence variables for each time step
–X_t: set of unobservable (hidden) variables (e.g. Pos, Vel)
–E_t: set of observable (evidence) variables (e.g. Sens.A, Sens.B)
Notice: time is discrete

Example

Inference in DBNs
–Unroll the 2-TBN over the whole sequence and run inference in the resulting (static) BN
–Not efficient: the cost grows with the length of the sequence
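
A deliberately naive sketch of what inference over the fully unrolled network can cost: it enumerates every hidden path, so the work grows exponentially with the sequence length (same toy HMM values as in the filtering sketch above; even efficient exact inference on the unrolled network still scales with the length):

```python
import numpy as np
from itertools import product

T = np.array([[0.7, 0.3], [0.2, 0.8]])   # illustrative toy HMM, as above
E = np.array([[0.9, 0.1], [0.3, 0.7]])
prior = np.array([0.5, 0.5])

def filter_by_unrolling(obs):
    """Filtering by summing over all hidden paths in the unrolled network: O(2^len(obs))."""
    belief = np.zeros(2)
    for path in product(range(2), repeat=len(obs)):
        p = prior[path[0]] * E[path[0], obs[0]]
        for prev, cur, o in zip(path, path[1:], obs[1:]):
            p *= T[prev, cur] * E[cur, o]
        belief[path[-1]] += p
    return belief / belief.sum()

print(filter_by_unrolling([0, 0, 1]))    # same answer as the forward recursion, at far higher cost
```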

Exact Inference in DBNs
Variable elimination:
–Add slice t+1, sum out slice t using variable elimination
[Figure: a DBN with variables X_1, ..., X_4 unrolled over slices 0-3]
–Problem: after a few steps there is no conditional independence left among the slice-t variables; the belief state becomes fully coupled

Exact Inference in DBNs
Variable elimination: add slice t+1, sum out slice t using variable elimination
[Figure: four unrolled slices, each with variables s1-s5]

Variable Elimination
[Figure: after eliminating s1, each slice retains s2, s3, s4, s5]

Variable Elimination
[Figure: after also eliminating s2, each slice retains s3, s4, s5]

Variable Elimination
[Figure: after also eliminating s3, each slice retains s4, s5]
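
The operation being pictured, written out as a hedged sketch over explicit discrete factors; the factor representation and toy tables below are assumptions for illustration, not the slides' s1-s5 network:

```python
import numpy as np
from functools import reduce

# A factor is (variables_tuple, ndarray) with one axis per variable.

def multiply(f1, f2):
    """Pointwise product of two factors, aligning axes by variable name."""
    vars1, t1 = f1
    vars2, t2 = f2
    all_vars = list(vars1) + [v for v in vars2 if v not in vars1]
    def expand(vs, t):
        # reorder t's axes to match all_vars, then insert size-1 axes for missing variables
        shape = [t.shape[vs.index(v)] if v in vs else 1 for v in all_vars]
        order = [vs.index(v) for v in all_vars if v in vs]
        return np.transpose(t, order).reshape(shape)
    return tuple(all_vars), expand(list(vars1), t1) * expand(list(vars2), t2)

def eliminate(factors, var):
    """Multiply all factors that mention `var`, then sum it out."""
    touching = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    joined_vars, joined = reduce(multiply, touching)
    new_vars = tuple(v for v in joined_vars if v != var)
    return rest + [(new_vars, joined.sum(axis=joined_vars.index(var)))]

# Example: P(s1) and P(s2|s1); eliminating s1 leaves the marginal P(s2).
factors = [(("s1",), np.array([0.6, 0.4])),
           (("s1", "s2"), np.array([[0.9, 0.1], [0.2, 0.8]]))]
print(eliminate(factors, "s1"))   # [(('s2',), array([0.62, 0.38]))]
```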

DBN Representation: DelC
–A two-slice network for the DelC robot example, with state variables T, L, CR, RHC, RHM, and M
–Each slice-(t+1) variable has its own local factor over a few slice-t parents, e.g. f_CR(L_t, CR_t, RHC_t, CR_{t+1}), f_T(T_t, T_{t+1}), f_RHM(RHM_t, RHM_{t+1})
[Figure: the 2-TBN and the CPTs for f_CR, f_T, and f_RHM; the table entries are not preserved]

Benefits of DBN Representation
$\Pr(Rm_{t+1}, M_{t+1}, T_{t+1}, L_{t+1}, Cr_{t+1}, Rc_{t+1} \mid Rm_t, M_t, T_t, L_t, Cr_t, Rc_t)$
$= f_{Rm}(Rm_t, Rm_{t+1}) \cdot f_M(M_t, M_{t+1}) \cdot f_T(T_t, T_{t+1}) \cdot f_L(L_t, L_{t+1}) \cdot f_{Cr}(L_t, Cr_t, Rc_t, Cr_{t+1}) \cdot f_{Rc}(Rc_t, Rc_{t+1})$
–Only a few parameters, versus a full transition matrix over all joint states
–Removes the global exponential dependence among the state variables
[Figure: the factored 2-TBN next to the equivalent flat transition matrix over joint states s_1, s_2, ...]
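
A quick way to see the saving, assuming for illustration that all six variables are binary (the actual domain sizes and table entries from the slide are not preserved):

```python
# Free parameters needed for the transition model under the binary assumption.
n_vars = 6
flat_matrix = (2 ** n_vars) * (2 ** n_vars - 1)   # 64 x 64 matrix: 63 free parameters per row

# Factored 2-TBN: one CPT row per parent assignment, one free parameter per row (binary child).
parents = {"Rm": 1, "M": 1, "T": 1, "L": 1, "Cr": 3, "Rc": 1}   # parent counts read off the factors above
factored = sum(2 ** k for k in parents.values())

print(flat_matrix, factored)   # 4032 vs. 18
```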

DBN Myth
–A Bayesian network is a decomposed structure for representing the full joint distribution
–Does it imply an easy decomposition for the belief state? No!

Tractable, approximate representation
–Exact inference in DBNs is intractable
–Need approximation: maintain an approximate belief state (e.g. assume a Gaussian process)
–Boyen-Koller approximation: a factored belief state

Idea
Use a decomposable (factored) representation for the belief state, i.e. assume some independence among the state variables in advance
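
A minimal sketch of this idea for two binary subprocesses, with made-up transition and observation tables: propagate the factored belief exactly through one time slice, condition on the evidence, then project back onto a product of marginals (the Boyen-Koller approximation step):

```python
import numpy as np

# Illustrative, made-up model: two binary subprocesses A and B whose next values depend on
# both current values, plus one observation of A. Not the lecture's example.
T = np.random.default_rng(0).dirichlet(np.ones(4), size=4)   # T[(a,b), (a',b')], rows sum to 1
obs_model = np.array([[0.9, 0.1], [0.2, 0.8]])               # P(e | A')

def bk_step(marg_a, marg_b, evidence):
    """One exact propagate-and-condition step, followed by projection onto marginals."""
    joint = np.outer(marg_a, marg_b).reshape(4)      # factored belief -> joint over (A, B)
    joint = (T.T @ joint).reshape(2, 2)              # exact transition over the joint
    joint *= obs_model[:, evidence][:, None]         # condition on the observation of A'
    joint /= joint.sum()
    # Projection (approximation) step: keep only the marginals of A' and B'.
    return joint.sum(axis=1), joint.sum(axis=0)

a, b = np.array([0.5, 0.5]), np.array([0.5, 0.5])
for e in [0, 1, 1]:
    a, b = bk_step(a, b, e)
print(a, b)
```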

Problem
What about the approximation errors?
–They might accumulate and grow without bound…

Contraction property
Main properties of the B-K approximation:
–Under reasonable assumptions about the stochasticity of the process, every state transition contracts the distance between the two distributions (exact and approximate) by a constant factor
–Since approximation errors from previous steps decrease exponentially, the overall error remains bounded indefinitely

Basic framework
Definition 1:
–Prior belief state: the distribution over the current state given the evidence up to the previous step, $P(X_t \mid e_{1:t-1})$
–Posterior belief state: the distribution after also conditioning on the current observation, $P(X_t \mid e_{1:t})$
Monitoring task: maintain (an approximation of) the posterior belief state as each new observation arrives

Simple contraction
Distance measure:
–Relative entropy (KL divergence) between the actual and the approximate belief state, $D(\varphi \,\|\, \psi) = \sum_x \varphi(x) \log \frac{\varphi(x)}{\psi(x)}$
–Contraction due to O: conditioning on the observation does not increase the expected relative entropy
–Contraction due to T: passing both distributions through the transition model cannot increase the relative entropy, $D(\varphi T \,\|\, \psi T) \le D(\varphi \,\|\, \psi)$ (can we do better?)
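
A small numeric check of the transition-step property, on a made-up 4-state transition matrix:

```python
import numpy as np

def kl(p, q):
    """Relative entropy D(p || q) for discrete distributions with full support."""
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(1)
T = rng.dirichlet(np.ones(4), size=4)        # illustrative 4-state transition matrix, rows sum to 1
phi = np.array([0.7, 0.1, 0.1, 0.1])         # "exact" belief state
psi = np.array([0.25, 0.25, 0.25, 0.25])     # approximate belief state

print(kl(phi, psi), kl(T.T @ phi, T.T @ psi))   # the second value is never larger than the first
```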

Simple contraction (cont)
Definition:
–Minimal mixing rate of a stochastic process Q: $\gamma_Q = \min_{x_1, x_2} \sum_{y} \min\big(Q(y \mid x_1),\, Q(y \mid x_2)\big)$
Theorem 3 (the single process contraction theorem):
–For process Q, anterior distributions φ and ψ, and ulterior distributions φ' and ψ': $D(\varphi' \,\|\, \psi') \le (1 - \gamma_Q)\, D(\varphi \,\|\, \psi)$
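
A direct computation of the mixing rate for a small, made-up transition matrix, together with the per-step contraction factor it implies:

```python
import numpy as np

def mixing_rate(T):
    """Minimal mixing rate: min over state pairs of the overlap of their next-state distributions."""
    n = T.shape[0]
    return min(np.minimum(T[i], T[j]).sum() for i in range(n) for j in range(n))

T = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])      # illustrative 3-state transition matrix
gamma = mixing_rate(T)
print(gamma, 1 - gamma)              # contraction factor (1 - gamma) per transition
```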

Simple contraction (cont) Proof Intuition:

Compound processes
–The mixing rate can be very small for large processes
–The trick is to assume some independence among subprocesses and factor the DBN along these subprocesses
Fully independent subprocesses:
–Theorem 5 of [BK98]: For L independent subprocesses T_1, …, T_L, let γ_l be the mixing rate of T_l and let γ = min_l γ_l. Let φ and ψ be distributions over S_1^{(t)}, …, S_L^{(t)}, and assume that ψ renders the S_l^{(t)} marginally independent. Then the relative entropy contracts by a factor of (1 − γ) per step: $D(\varphi' \,\|\, \psi') \le (1 - \gamma)\, D(\varphi \,\|\, \psi)$

Compound processes (cont)
Conditionally independent subprocesses:
–Theorem 6 of [BK98]: For L conditionally independent subprocesses T_1, …, T_L, assume each subprocess depends on at most r others and influences at most q others. Let γ_l be the mixing rate of T_l and let γ = min_l γ_l. Let φ and ψ be distributions over S_1^{(t)}, …, S_L^{(t)}, and assume that ψ renders the S_l^{(t)} marginally independent. Then the relative entropy still contracts at every step, with a contraction factor that degrades with r and q.

Efficient, approximate monitoring
–If each approximation step incurs an error bounded by ε, and each transition contracts the error by a factor (1 − γ), then the total accumulated error is bounded => the error remains bounded
–Conditioning on observations might introduce momentary errors, but the expected error still contracts
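
Written out, the accounting behind the bound (assuming a per-step projection error of at most ε and the (1 − γ) contraction from the theorems above):

```latex
\text{err}_t \;\le\; \varepsilon + (1-\gamma)\,\text{err}_{t-1}
             \;\le\; \varepsilon \sum_{k=0}^{t-1} (1-\gamma)^{k}
             \;\le\; \frac{\varepsilon}{\gamma}
```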

Approximate DBN monitoring
Algorithm (based on standard clique tree inference):
1. Construct a clique tree from the 2-TBN
2. Initialize the clique tree with the conditional probabilities from the CPTs of the DBN
3. For each time step:
   a. Create a working copy Y of the tree; create σ^{(t+1)}
   b. For each subprocess l, incorporate the marginal σ^{(t)}[X_l^{(t)}] into the appropriate factor in Y
   c. Incorporate the evidence r^{(t+1)} in Y
   d. Calibrate the potentials in Y
   e. For each l, query Y for the marginal over X_l^{(t+1)} and store it in σ^{(t+1)}

Solution: BK algorithm
–With mixing and a bounded projection error, the total error is bounded
–Alternate an exact propagation step with an approximation/marginalization step that breaks the belief state into smaller clusters

Boyen-Koller Approximation
–An example of variational inference with DBNs
–Compute the posterior for time t from the (factored) state estimate at time t-1
–Assume the posterior has a factored form
–The error is bounded