Dynamic Bayesian Network Fuzzy Systems Lifelog Management
Outline: Introduction, Definition, Representation, Inference, Learning, Comparison, Summary

Brief Review of Bayesian Networks
Graphical representations of joint distributions over a set of random variables. Static world: each random variable has a single fixed value. Built on Bayes' rule, the mathematical formula for calculating conditional probabilities, developed by the mathematician and theologian Thomas Bayes (published in 1763).
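As a minimal illustration of Bayes' rule (the numbers below are hypothetical, chosen for the example, not from the slides), the posterior over a binary hypothesis can be computed directly:

```python
# Bayes' rule: P(H | e) = P(e | H) * P(H) / P(e)
# Hypothetical numbers for illustration only.
p_h = 0.01           # prior P(H)
p_e_given_h = 0.9    # likelihood P(e | H)
p_e_given_not_h = 0.05

# Total probability: P(e) = P(e|H)P(H) + P(e|~H)P(~H)
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
posterior = p_e_given_h * p_h / p_e
print(round(posterior, 4))  # 0.1538
```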

Introduction
Dynamic systems:
- Sequential data modeling (e.g., part-of-speech tagging)
- Time-series modeling (e.g., activity recognition)
Classic approaches:
- Linear models: ARIMA (autoregressive integrated moving average), ARMAX (autoregressive moving average with exogenous variables)
- Nonlinear models: neural networks, decision trees
- Problems: prediction of the future is based on only a finite window; difficult to incorporate prior knowledge; difficult to deal with multi-dimensional inputs and/or outputs
Recent approaches:
- Hidden Markov models (HMMs): discrete state variables
- Kalman filter models (KFMs): continuous state variables
- Dynamic Bayesian networks (DBNs)

Motivation
At each time step t we have the transportation mode M_t (walking, running, car, bus), the true velocity and location X_t, and the observed location O_t; the same variables repeat at time t+1 (M_t+1, X_t+1, O_t+1).
We need conditional probability distributions, e.g., a distribution on (velocity, location) given the transportation mode, taken from prior knowledge or learned from data.
Given a sequence of observations O_t, find the most likely sequence of modes M_t that explains it, or provide a probability distribution over the possible M_t.

Outline: Introduction, Definition, Representation, Inference, Learning, Comparison, Summary

Dynamic Bayesian Networks
BNs consisting of a structure that repeats an indefinite (or dynamic) number of times.
- Time-invariant: the term 'dynamic' means that we are modeling a dynamic system, not that the network changes over time.
A general form of HMMs and KFMs, representing the hidden and observed state in terms of state variables with complex interdependencies. (Figure: the same subnetwork over nodes A, B, C, D repeated across frames i-1, i, i+1.)

Formal Definition
A DBN is defined by:
- a directed acyclic graph of starting nodes (the initial probability distribution)
- a directed acyclic graph of transition nodes (the transition probabilities between time slices)
- starting vectors for the observable and hidden random variables
- transition matrices for the observable and hidden random variables

Outline: Introduction, Definition, Representation, Inference, Learning, Comparison, Summary

Representation (1): Problem
Target: Is it raining today?
Naively, we would need to specify an unbounded number of conditional probability tables (one for each variable in each slice), and each one might involve an unbounded number of parents.
Next step: specify the dependencies among the variables.

Representation (2): Solution
Assume that changes in the world state are caused by a stationary process (a process whose laws do not themselves change over time).
Use the Markov assumption: the current state depends only on a finite history of previous states. For a first-order Markov process:
Transition model: P(X_t | X_0:t-1) = P(X_t | X_t-1)
In addition to restricting the parents of the state variable X_t, we restrict the parents of the evidence variable E_t, the same for all t:
Sensor model: P(E_t | X_0:t, E_0:t-1) = P(E_t | X_t)
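The transition and sensor models can be sketched as a forward sampler. The sketch below uses the umbrella example's standard parameters (transition 0.7/0.3, sensor 0.9/0.2, prior 0.5, as in Russell and Norvig, Ch. 15), which the later slides also rely on:

```python
import random

# First-order Markov process with the umbrella example's parameters.
P_RAIN_GIVEN_RAIN = 0.7      # transition model P(X_t = rain | X_t-1 = rain)
P_RAIN_GIVEN_DRY = 0.3
P_UMBRELLA_GIVEN_RAIN = 0.9  # sensor model P(E_t = umbrella | X_t = rain)
P_UMBRELLA_GIVEN_DRY = 0.2

def sample_sequence(steps, seed=0):
    """Forward-sample hidden states and observations from the DBN."""
    rng = random.Random(seed)
    rain = rng.random() < 0.5             # prior P(X_0 = rain) = 0.5
    states, obs = [], []
    for _ in range(steps):
        p_rain = P_RAIN_GIVEN_RAIN if rain else P_RAIN_GIVEN_DRY
        rain = rng.random() < p_rain      # sample X_t given X_t-1
        p_umb = P_UMBRELLA_GIVEN_RAIN if rain else P_UMBRELLA_GIVEN_DRY
        obs.append(rng.random() < p_umb)  # sample E_t given X_t
        states.append(rain)
    return states, obs

states, observations = sample_sequence(10)
```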

Representation: Extension
There are two possible fixes if the approximation is too inaccurate:
- Increase the order of the Markov process model. For example, adding X_t-2 as a parent of X_t (a second-order model) might give slightly more accurate predictions.
- Increase the set of state variables. For example, add Season_t to incorporate historical records of rainy seasons, or add Temperature_t, Humidity_t, and Pressure_t to use a physical model of rainy conditions.
(Figure: bigram vs. trigram dependency structures over words W_i.)

Outline: Introduction, Definition, Representation, Inference, Learning, Comparison, Summary

Inference: Overview
Goal: infer the hidden states X given the observations y_1:t.
- Extend HMM and KFM techniques / call BN inference algorithms as subroutines.
- In general an NP-hard problem.
Inference tasks:
- Filtering (monitoring): recursively estimate the belief state using Bayes' rule.
  - Prediction step: compute P(X_t | y_1:t-1)
  - Update step: compute P(X_t | y_1:t)
  - Throw away the old belief state once the prediction has been computed ("rollup").
- Smoothing: estimate a state of the past, given all the evidence up to the current time.
  - Fixed-lag smoothing (hindsight): compute P(X_t-l | y_1:t), where l > 0 is the lag.
- Prediction: predict the future.
  - Lookahead: compute P(X_t+h | y_1:t), where h > 0 is how far we want to look ahead.
- Viterbi decoding: compute the most likely sequence of hidden states given the data.
  - MPE (abduction): x*_1:t = argmax P(x_1:t | y_1:t)

Inference: Comparison
Estimating P(X_t | y_1:r):
- Filtering: r = t
- Smoothing: r > t
- Prediction: r < t
- Viterbi: most probable explanation (MPE)

Inference: Filtering
Compute the belief state: the posterior distribution over the current state, given all evidence to date. Filtering is what a rational agent needs to do in order to keep track of the current state so that rational decisions can be made.
Given the result of filtering up to time t, one can easily compute the result for t+1 from the new evidence e_t+1:
P(X_t+1 | e_1:t+1) = f(e_t+1, P(X_t | e_1:t)) (for some function f)
= P(X_t+1 | e_1:t, e_t+1) (dividing up the evidence)
= α P(e_t+1 | X_t+1, e_1:t) P(X_t+1 | e_1:t) (using Bayes' theorem)
= α P(e_t+1 | X_t+1) P(X_t+1 | e_1:t) (by the Markov property of evidence)
= α P(e_t+1 | X_t+1) Σ_x_t P(X_t+1 | x_t) P(x_t | e_1:t)
where α is a normalizing constant used to make the probabilities sum to 1.

Inference: Filtering
Illustration for two steps in the umbrella example:
On day 1 the umbrella appears, so U_1 = true.
- The prediction from t=0 to t=1 is P(R_1) = <0.5, 0.5>, and updating it with the evidence for t=1 gives P(R_1 | u_1) = <0.818, 0.182>.
On day 2 the umbrella appears again, so U_2 = true.
- The prediction from t=1 to t=2 is P(R_2 | u_1) = <0.627, 0.373>, and updating it with the evidence for t=2 gives P(R_2 | u_1, u_2) = <0.883, 0.117>.
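The two umbrella steps can be reproduced numerically. A minimal sketch, using the example's standard parameters (transition 0.7/0.3, sensor 0.9/0.2, prior 0.5):

```python
# Exact recursive filtering on the umbrella model.
def filter_step(belief_rain, umbrella):
    # Predict: P(rain_t | e_1:t-1) via the transition model.
    pred = 0.7 * belief_rain + 0.3 * (1 - belief_rain)
    # Update with evidence e_t via the sensor model, then normalize.
    like_rain = 0.9 if umbrella else 0.1
    like_dry = 0.2 if umbrella else 0.8
    num_rain = like_rain * pred
    num_dry = like_dry * (1 - pred)
    return num_rain / (num_rain + num_dry)

belief = 0.5                      # prior P(rain_0)
for e in [True, True]:            # umbrella observed on days 1 and 2
    belief = filter_step(belief, e)
print(round(belief, 3))  # 0.883, matching the slide's day-2 value
```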

Inference: Smoothing
Compute the posterior distribution over a past state, given all evidence up to the present: P(X_k | e_1:t) for some k such that 0 ≤ k < t.
Hindsight provides a better estimate of the state than was available at the time, because it incorporates more evidence.

Inference: Prediction
Compute the posterior distribution over a future state, given all evidence to date: P(X_t+k | e_1:t) for some k > 0.
The task of prediction can be seen simply as filtering without the addition of new evidence.

Inference: Most Likely Explanation (MLE)
Compute the sequence of states that is most likely to have generated a given sequence of observations. Algorithms for this task are useful in many applications, including speech recognition.
There is a recursive relationship between the most likely paths to each state x_t+1 and the most likely paths to each state x_t. This relationship can be written as an equation connecting the probabilities of the paths:
max over x_1..x_t of P(x_1, ..., x_t, X_t+1 | e_1:t+1)
= α P(e_t+1 | X_t+1) max over x_t of [ P(X_t+1 | x_t) max over x_1..x_t-1 of P(x_1, ..., x_t-1, x_t | e_1:t) ]
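A minimal Viterbi sketch of this recursion, using the umbrella model's standard parameters and the textbook's five-day evidence sequence (umbrella on days 1, 2, 4, 5):

```python
# Viterbi decoding for the umbrella HMM ('R' = rain, 'D' = dry).
TRANS = {('R', 'R'): 0.7, ('R', 'D'): 0.3, ('D', 'R'): 0.3, ('D', 'D'): 0.7}
EMIT = {('R', True): 0.9, ('R', False): 0.1, ('D', True): 0.2, ('D', False): 0.8}

def viterbi(evidence, prior=0.5):
    # m[s] = probability of the best path ending in state s.
    m = {'R': prior * EMIT[('R', evidence[0])],
         'D': (1 - prior) * EMIT[('D', evidence[0])]}
    back = []
    for e in evidence[1:]:
        new_m, ptr = {}, {}
        for s in ('R', 'D'):
            prev = max(('R', 'D'), key=lambda p: m[p] * TRANS[(p, s)])
            ptr[s] = prev
            new_m[s] = m[prev] * TRANS[(prev, s)] * EMIT[(s, e)]
        back.append(ptr)
        m = new_m
    # Backtrack from the best final state.
    state = max(m, key=m.get)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

print(viterbi([True, True, False, True, True]))  # ['R', 'R', 'D', 'R', 'R']
```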

Inference: Algorithms
Exact inference algorithms:
- Forwards-backwards smoothing algorithm (on any discrete-state DBN)
- The frontier algorithm (sweep a Markov blanket, the frontier set F, across the DBN, first forwards and then backwards)
- The interface algorithm (use only the set of nodes with outgoing arcs to the next time slice to d-separate the past from the future)
- Kalman filtering and smoothing
Approximate algorithms:
- The Boyen-Koller (BK) algorithm (approximate the joint distribution over the interface as a product of marginals)
- Factored frontier (FF) algorithm / loopy belief propagation (LBP)
- Approximate Kalman filtering and smoothing
- Stochastic sampling algorithms: importance sampling or MCMC (offline inference); particle filtering (PF) (online)
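Of the sampling algorithms listed, particle filtering is the easiest to sketch. Below is a bootstrap particle filter on the umbrella model (parameters as in the earlier filtering slides; the particle count `n` and the seed are arbitrary choices for the sketch):

```python
import random

# Bootstrap particle filter: propagate particles through the transition
# model, weight them by the sensor model, then resample.
def particle_filter(evidence, n=5000, seed=1):
    rng = random.Random(seed)
    particles = [rng.random() < 0.5 for _ in range(n)]  # samples from the prior
    for umbrella in evidence:
        # Propagate each particle through P(X_t | X_t-1).
        particles = [rng.random() < (0.7 if p else 0.3) for p in particles]
        # Weight by the sensor model P(e_t | X_t).
        weights = [(0.9 if p else 0.2) if umbrella else (0.1 if p else 0.8)
                   for p in particles]
        # Resample in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n)
    return sum(particles) / n   # estimate of P(rain_t | e_1:t)

est = particle_filter([True, True])
print(round(est, 2))  # close to the exact filtering answer, 0.883
```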

Outline: Introduction, Definition, Representation, Inference, Learning, Comparison, Summary

Learning (1)
The techniques for learning DBNs are mostly straightforward extensions of the techniques for learning BNs.
Parameter learning:
- The transition model P(X_t | X_t-1) and the observation model P(Y_t | X_t).
- Offline learning: parameters must be tied across time slices; the initial state of the dynamic system can be learned independently of the transition matrix.
- Online learning: add the parameters to the state space and then do online inference (filtering).
- The usual criterion is maximum likelihood (ML). The goal of parameter learning is to compute
  θ*_ML = argmax_θ P(Y | θ) = argmax_θ log P(Y | θ)
  or, with a prior, θ*_MAP = argmax_θ log P(Y | θ) + log P(θ)
- Two standard approaches: gradient ascent and EM (Expectation-Maximization).
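In the fully observed case, the ML estimate of the transition model reduces to counting transitions and normalizing; the state sequence below is made up purely for illustration. When the states are hidden, EM would replace these raw counts with expected counts computed by inference:

```python
# ML estimate of P(X_t | X_t-1) from a fully observed state sequence.
def ml_transition_estimate(states):
    counts = {}
    for prev, cur in zip(states, states[1:]):
        counts[(prev, cur)] = counts.get((prev, cur), 0) + 1
    model = {}
    for (prev, cur), c in counts.items():
        # Normalize by the total number of transitions out of `prev`.
        total = sum(v for (p, _), v in counts.items() if p == prev)
        model[(prev, cur)] = c / total
    return model

seq = ['R', 'R', 'D', 'R', 'R', 'R', 'D', 'D']   # illustrative data
model = ml_transition_estimate(seq)
print(model[('R', 'R')])  # 0.6  (3 of the 5 transitions out of 'R')
```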

Learning (2)
Structure learning:
- The intra-slice connectivity must be a DAG.
- Learning the inter-slice connectivity is equivalent to the variable-selection problem, since for each node in slice t we must choose its parents from slice t-1.
- Structure learning for DBNs thus reduces to feature selection if we assume the intra-slice connections are fixed.

Outline: Introduction, Definition, Representation, Inference, Learning, Comparison, Summary

Comparison (HMM: Hidden Markov Model)
Structure:
- One discrete hidden node per time slice (X: hidden variables)
- One discrete or continuous observed node per time slice (Y: observations)
Parameters:
- The initial state distribution P(X_1)
- The transition model P(X_t | X_t-1)
- The observation model P(Y_t | X_t)
Features:
- A discrete state variable with arbitrary dynamics and arbitrary measurements
- Structure and parameters remain the same over time
(Figure: a chain X_1 → X_2 → X_3 → X_4 with observations Y_1 ... Y_4.)

Comparison with HMMs
(Figure: an HMM drawn as a state-transition diagram with states q = 1, 2, 3, allowed transitions P(q_i | q_i-1), and emission distributions P(obs_i | q_i), next to the equivalent DBN unrolled over frames i-1, i, i+1, where arrows denote allowed dependencies between the variables Q_i and obs_i.)

Comparison (KFM: Kalman Filter Model)
A KFM has the same topology as an HMM, but all the nodes are assumed to have linear-Gaussian distributions:
- x(t+1) = F·x(t) + w(t), w ~ N(0, Q): process noise, with x(0) ~ N(x_0, V_0)
- y(t) = H·x(t) + v(t), v ~ N(0, R): measurement noise
Features:
- A continuous state variable with linear-Gaussian dynamics and measurements.
- Also known as linear dynamical systems (LDSs): a partially observed stochastic process with linear dynamics and linear observations (f(a + b) = f(a) + f(b)), both subject to Gaussian noise.
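A one-dimensional Kalman filter step follows directly from the two equations above; the scalar parameters F, Q, H, R below are illustrative choices, not values from the slides:

```python
# Scalar Kalman filter step for x(t+1) = F x(t) + w, y(t) = H x(t) + v.
def kalman_step(mean, var, y, F=1.0, Q=0.1, H=1.0, R=0.5):
    # Predict through the linear-Gaussian dynamics.
    pred_mean = F * mean
    pred_var = F * var * F + Q
    # Update with measurement y via the Kalman gain.
    gain = pred_var * H / (H * pred_var * H + R)
    new_mean = pred_mean + gain * (y - H * pred_mean)
    new_var = (1 - gain * H) * pred_var
    return new_mean, new_var

mean, var = 0.0, 1.0            # illustrative Gaussian prior on x(0)
for y in [0.9, 1.1, 1.0]:       # illustrative measurements
    mean, var = kalman_step(mean, var, y)
# The posterior mean moves toward the measurements; the variance shrinks.
```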

Comparison with HMMs and KFMs
- A DBN represents the hidden state in terms of a set of random variables; an HMM's state space consists of a single random variable.
- A DBN allows arbitrary CPDs; a KFM requires all the CPDs to be linear-Gaussian.
- A DBN allows much more general graph structures; HMMs and KFMs have a restricted topology.
- A DBN generalizes both HMMs and KFMs (more expressive power).

Summary
DBN: a Bayesian network with a temporal probability model.
Complexity in DBNs: inference; structure learning.
Comparison with other methods: HMMs (discrete variables); KFMs (continuous variables).
Discussion:
- Why use DBNs instead of HMMs or KFMs?
- Why use DBNs instead of BNs?