Tractable Inference for Complex Stochastic Processes, by X. Boyen and D. Koller. Presented by Shiau Hong Lim, partially based on slides by Boyen and Koller at UAI '98.

Dynamic Systems
Filtering in stochastic, dynamic systems:
- Monitoring freeway traffic (from an autonomous driver or for traffic analysis)
- Monitoring a patient's symptoms
Models to deal with uncertainty and/or partial observability in dynamic systems: Hidden Markov Models (HMMs), Kalman filters, etc. All are special cases of Dynamic Bayesian Networks (DBNs).

Dynamic Bayesian Networks
- Markov assumption
- Partial observability: the true state is rarely actually known
- Maintain a belief state: a probability distribution over all system states
- Propagate the belief state at time t to time t+1 using a state-evolution model and an observation model
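The propagate step above (predict through the state-evolution model, then update on the observation) can be sketched for a simple two-state HMM. The transition matrix T, observation matrix O, and evidence sequence below are illustrative numbers, not taken from the paper:

```python
# Belief-state propagation (filtering) in a two-state HMM.
# T[s][s2] = P(next state = s2 | current state = s)  -- state-evolution model
# O[s][e]  = P(observation = e | state = s)          -- observation model
T = [[0.7, 0.3],
     [0.4, 0.6]]
O = [[0.9, 0.1],
     [0.2, 0.8]]

def propagate(belief, evidence):
    # predict: push the belief through the state-evolution model
    predicted = [sum(belief[s] * T[s][s2] for s in range(2)) for s2 in range(2)]
    # update: weight by the observation model and renormalize
    weighted = [predicted[s2] * O[s2][evidence] for s2 in range(2)]
    z = sum(weighted)
    return [w / z for w in weighted]

belief = [0.5, 0.5]          # uniform prior over the two states
for e in [0, 0, 1]:          # illustrative evidence sequence
    belief = propagate(belief, e)
```

Each call costs O(|S|^2) for |S| states; the point of the paper is that for a DBN, |S| is exponential in the number of variables, so even this simple recursion becomes intractable without approximation.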

Monitoring in practice
Sometimes belief states admit compact representations and manipulations, e.g. Kalman filters (assuming Gaussian processes). What about general Dynamic Bayesian Networks?

DBN Myth
A Bayesian network is a decomposed structure for representing the full joint distribution. Does this imply an easy decomposition for the belief state? No!

Tractable, approximate representation
Exact inference in DBNs is intractable, so we need approximation: maintain an approximate belief state (e.g. assume Gaussian processes). This paper: a factored belief state.

Idea
Use a decomposable representation for the belief state (i.e. pre-assume some independence structure among the state variables).

Problem
What about the approximation errors? They might accumulate and grow without bound…

Contraction property
Main result of the paper:
- Under reasonable assumptions about the stochasticity of the process, every state transition contracts the distance between the two distributions by a constant factor.
- Since approximation errors from previous steps decrease exponentially, the overall error remains bounded indefinitely.

Basic framework
Definition 1:
- Prior belief state
- Posterior belief state
- Monitoring task
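The formulas on this slide were images and did not survive transcription. A reconstruction consistent with the paper's setup (the notation is approximate: S^(t) is the state and r^(t) the response/observation at time t):

```latex
\begin{aligned}
\text{prior belief state:}\quad & \sigma^{(t)}(s) \;=\; P\bigl(S^{(t)} = s \mid r^{(1)}, \ldots, r^{(t-1)}\bigr)\\
\text{posterior belief state:}\quad & \hat\sigma^{(t)}(s) \;=\; P\bigl(S^{(t)} = s \mid r^{(1)}, \ldots, r^{(t)}\bigr)\\
\text{monitoring task:}\quad & \sigma^{(t+1)}(s') \;=\; \sum_{s} P\bigl(S^{(t+1)} = s' \mid S^{(t)} = s\bigr)\,\hat\sigma^{(t)}(s)
\end{aligned}
```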

Simple contraction
- Distance measure: relative entropy (KL divergence) between the actual and the approximate belief state
- Contraction due to the observation step O
- Contraction due to the transition step T (can we do better?)
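The distance measure named above is the standard relative entropy; for an exact belief state φ and an approximate belief state ψ over states s:

```latex
D(\varphi \,\|\, \psi) \;=\; \sum_{s} \varphi(s)\,\log \frac{\varphi(s)}{\psi(s)}
```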

Simple contraction (cont.)
Definition: minimal mixing rate.
Theorem 3 (the single-process contraction theorem): for process Q, anterior distributions φ and ψ, and ulterior distributions φ' and ψ'.
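The definition and bound on this slide were images; reconstructed from the paper's setup (notation may differ slightly from the original slides), the minimal mixing rate of a process Q and the contraction bound of Theorem 3 are:

```latex
\gamma_Q \;=\; \min_{i_1,\, i_2} \sum_{j} \min\bigl( Q(i_1 \to j),\; Q(i_2 \to j) \bigr),
\qquad
D(\varphi' \,\|\, \psi') \;\le\; (1 - \gamma_Q)\, D(\varphi \,\|\, \psi).
```

Intuitively, γ_Q measures how much probability mass any two starting states are forced to share after one transition; the more stochastic the process, the larger γ_Q and the faster old errors are forgotten.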

Simple contraction (cont.)
Proof intuition (diagram in the original slides).

Compound processes
The mixing rate can be very small for large processes. The trick is to assume some independence among subprocesses and factor the DBN along these subprocesses.
Fully independent subprocesses, Theorem 5: for L independent subprocesses T_1, …, T_L, let γ_l be the mixing rate for T_l and let γ = min_l γ_l. Let φ and ψ be distributions over S_1^(t), …, S_L^(t), and assume that ψ renders the S_l^(t) marginally independent.

Compound processes (cont.)
Conditionally independent subprocesses, Theorem 6 (the main theorem): for L subprocesses T_1, …, T_L, assume each process depends on at most r others and each influences at most q others. Let γ_l be the mixing rate for T_l and let γ = min_l γ_l. Let φ and ψ be distributions over S_1^(t), …, S_L^(t), and assume that ψ renders the S_l^(t) marginally independent.

Efficient, approximate monitoring
If each approximation incurs an error bounded by ε, the total error remains bounded: each old error term is contracted at every subsequent step, so the accumulated error forms a convergent geometric series. Conditioning on observations might introduce momentary errors, but the expected error still contracts.
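The boundedness claim above follows from summing a geometric series: if each projection adds at most ε in KL divergence and each transition contracts the existing error by a factor of (1 − γ), then the accumulated error never exceeds

```latex
\sum_{k=0}^{\infty} \varepsilon\,(1-\gamma)^{k} \;=\; \frac{\varepsilon}{\gamma}.
```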

Approximate DBN monitoring
Algorithm (based on standard clique-tree inference):
1. Construct a clique tree from the 2-TBN.
2. Initialize the clique tree with conditional probabilities from the CPTs of the DBN.
3. For each time step:
   a. Create a working copy Y of the tree. Create σ^(t+1).
   b. For each subprocess l, incorporate the marginal σ^(t)[X_l^(t)] into the appropriate factor in Y.
   c. Incorporate the evidence r^(t+1) in Y.
   d. Calibrate the potentials in Y.
   e. For each l, query Y for the marginal over X_l^(t+1) and store it in σ^(t+1).
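The factored-belief-state idea behind this algorithm can be sketched on a toy two-subprocess DBN. The model and all numbers below are hypothetical, and joint enumeration stands in for the clique-tree machinery: exact joint filtering is compared against a filter that, like step 3 above, projects the joint belief onto a product of per-subprocess marginals after every update:

```python
# Toy Boyen-Koller-style factored filtering on two weakly coupled binary
# subprocesses A and B. States are pairs (a, b); only A is observed.

def normalize(d):
    z = sum(d.values())
    return {k: v / z for k, v in d.items()}

def trans(a, b, a2, b2):
    # A persists with prob 0.9; B copies A's previous value with prob 0.3,
    # otherwise persists (a valid conditional distribution over (a2, b2)).
    pa = 0.9 if a2 == a else 0.1
    pb = (0.3 if b2 == a else 0.0) + (0.7 if b2 == b else 0.0)
    return pa * pb

def obs(a, e):
    # Noisy sensor on A: correct with prob 0.8.
    return 0.8 if e == a else 0.2

def step_exact(belief, e):
    # Predict through the joint transition, then condition on the evidence.
    predicted = {(a2, b2): sum(belief[(a, b)] * trans(a, b, a2, b2)
                               for a in (0, 1) for b in (0, 1))
                 for a2 in (0, 1) for b2 in (0, 1)}
    return normalize({s: p * obs(s[0], e) for s, p in predicted.items()})

def project(belief):
    # BK projection: replace the joint belief by the product of its marginals.
    ma = {a: belief[(a, 0)] + belief[(a, 1)] for a in (0, 1)}
    mb = {b: belief[(0, b)] + belief[(1, b)] for b in (0, 1)}
    return {(a, b): ma[a] * mb[b] for a in (0, 1) for b in (0, 1)}

belief_exact = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
belief_bk = dict(belief_exact)
for e in [1, 1, 0, 1, 0]:                       # illustrative evidence
    belief_exact = step_exact(belief_exact, e)
    belief_bk = project(step_exact(belief_bk, e))

l1_error = sum(abs(belief_exact[s] - belief_bk[s]) for s in belief_exact)
```

Because the coupling between A and B is weak, the projection error introduced at each step is contracted by subsequent transitions and `l1_error` stays small, which is the accuracy-efficiency tradeoff the paper formalizes.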

Conclusion
Accuracy-efficiency tradeoff. A smaller partition gives:
- Faster inference
- Better contraction
- Worse approximation
Key to a good approximation: discover weak/sparse interactions among subprocesses and factor the DBN along these lines. Domain knowledge helps.