# Temporal Probabilistic Models



## Motivation
Observing a stream of data:
- Monitoring (of people, computer systems, etc.)
- Surveillance, tracking
- Finance & economics
- Science

Questions: modeling & forecasting; unobserved variables.

## Time Series Modeling
- Time occurs in steps t = 0, 1, 2, …; a time step can be seconds, days, years, etc.
- State variable X_t, t = 0, 1, 2, …
- For partially observed problems, we see observations O_t, t = 1, 2, … and do not see the X's.
- The X's are hidden variables (aka latent variables).

## Modeling Time
The arrow of time: causality? Bayesian networks to the rescue, with edges running from causes to effects.

## Probabilistic Modeling
For now, assume the fully observable case: a chain of state variables X_0, X_1, X_2, X_3, … What parents should each X_t have?

## Markov Assumption
Assume X_{t+k} is independent of all X_i for i < t, given X_t: the future is independent of the past given the present. An order-k Markov chain conditions each state on the k most recent states, P(X_t | X_0, …, X_{t-1}) = P(X_t | X_{t-k}, …, X_{t-1}).

## 1st Order Markov Chain
MCs of order k > 1 can be converted into a 1st order MC [left as exercise]. So w.l.o.g., "MC" refers to a 1st order MC: X_0 → X_1 → X_2 → X_3 → …

## Inference in an MC
What independence relationships can we read from the BN X_0 → X_1 → X_2 → X_3? Observing X_1 makes X_0 independent of X_2, X_3, …. P(X_t | X_{t-1}) is known as the transition model.

## Inference in an MC: Prediction
What is the probability of a future state?

P(X_t) = Σ_{x_0,…,x_{t-1}} P(x_0, …, x_{t-1}, X_t)
       = Σ_{x_0,…,x_{t-1}} P(x_0) Π_{i=1}^{t} P(x_i | x_{i-1})
       = Σ_{x_{t-1}} P(X_t | x_{t-1}) P(x_{t-1})   [incremental approach]

The prediction "blurs" over time and approaches a stationary distribution as t grows, so prediction power is limited. The rate of blurring is known as the mixing time.
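
The incremental recursion above can be sketched in a few lines. The 2-state weather chain below (states and transition probabilities) is an illustrative assumption, not from the slides:

```python
import numpy as np

# Illustrative 2-state transition model (states: 0 = sunny, 1 = rainy).
T = np.array([[0.9, 0.1],   # P(X_t | X_{t-1} = sunny)
              [0.3, 0.7]])  # P(X_t | X_{t-1} = rainy)

p = np.array([1.0, 0.0])    # P(X_0): start certainly sunny
for _ in range(50):
    p = p @ T               # P(X_t) = sum_{x_{t-1}} P(X_t | x_{t-1}) P(x_{t-1})

# The prediction "blurs" toward the stationary distribution pi = pi T,
# which for this chain is [0.75, 0.25].
print(p)
```

After 50 steps the initial certainty is essentially gone: the prediction is nearly the stationary distribution regardless of the starting state.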

## How Does the Markov Assumption Affect the Choice of State?
Suppose we're tracking a point (x, y) in 2D. What if the point is…
- a momentumless particle subject to thermal vibration?
- a particle with velocity?
- a particle with intent, like a person?

## How Does the Markov Assumption Affect the Choice of State?
Suppose the point is the position of our robot, and we observe velocity and intent. What if:
- terrain conditions affect speed?
- battery level affects speed?
- position is noisy, e.g. GPS?

## Is the Markov Assumption Appropriate For:
- A car on a slippery road?
- Sales of toothpaste?
- The stock market?

## History Dependence
In Markov models, the state must be chosen so that the future is independent of history given the current state. Often this requires adding variables that cannot be directly observed.

## Partial Observability
Hidden Markov Model (HMM): hidden state variables X_0 → X_1 → X_2 → X_3 → …, with an observed variable O_t depending on each X_t. P(O_t | X_t) is called the observation model (or sensor model).

## Inference in HMMs
- Filtering
- Prediction
- Smoothing, aka hindsight
- Most likely explanation


## Filtering
The name comes from signal processing. The query variable is the current state X_t:

P(X_t | o_1:t) = Σ_{x_{t-1}} P(x_{t-1} | o_1:t-1) P(X_t | x_{t-1}, o_t)

P(X_t | X_{t-1}, o_t) = P(o_t | X_{t-1}, X_t) P(X_t | X_{t-1}) / P(o_t | X_{t-1})
                      = α P(o_t | X_t) P(X_t | X_{t-1})

## Filtering
P(X_t | o_1:t) = α Σ_{x_{t-1}} P(x_{t-1} | o_1:t-1) P(o_t | X_t) P(X_t | x_{t-1})

This is the forward recursion. If we keep track of P(X_t | o_1:t), each update takes constant work regardless of t.
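
A minimal sketch of this forward recursion for a discrete HMM; the transition model, observation model, and observation sequence are illustrative assumptions:

```python
import numpy as np

T = np.array([[0.7, 0.3],
              [0.3, 0.7]])   # T[i, j] = P(X_t = j | X_{t-1} = i)
O = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # O[i, k] = P(o_t = k | X_t = i)

def filter_step(belief, o):
    """P(X_t | o_1:t) = alpha P(o_t | X_t) sum_{x_{t-1}} P(X_t | x_{t-1}) P(x_{t-1} | o_1:t-1)."""
    predicted = belief @ T          # transition: sum over x_{t-1}
    unnorm = O[:, o] * predicted    # multiply in the observation likelihood
    return unnorm / unnorm.sum()    # normalize (the alpha constant)

belief = np.array([0.5, 0.5])       # prior P(X_0)
for o in [0, 0, 1]:
    belief = filter_step(belief, o)
print(belief)                       # P(X_3 | o_1:3)
```

Note that only the previous belief is kept, never the observation history, which is what makes the per-step cost constant.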


## Prediction
Compute P(X_{t+k} | o_1:t) in 2 steps: filter to get P(X_t | o_1:t), then predict forward with P(X_{t+k} | X_t) as in a standard MC.


## Smoothing
Compute P(X_k | o_1:t) for k < t:

P(X_k | o_1:k, o_k+1:t) = P(o_k+1:t | X_k, o_1:k) P(X_k | o_1:k) / P(o_k+1:t | o_1:k)
                        = α P(o_k+1:t | X_k) P(X_k | o_1:k)

Standard filtering to time k supplies P(X_k | o_1:k).

## Smoothing
Computing P(o_k+1:t | X_k), the probability of the remaining observation sequence given the state:

P(o_k+1:t | X_k) = Σ_{x_k+1} P(o_k+1:t | X_k, x_k+1) P(x_k+1 | X_k)
                 = Σ_{x_k+1} P(o_k+1:t | x_k+1) P(x_k+1 | X_k)
                 = Σ_{x_k+1} P(o_k+2:t | x_k+1) P(o_k+1 | x_k+1) P(x_k+1 | X_k)

This is the backward recursion.
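
The forward and backward recursions combine into forward-backward smoothing. A sketch, with illustrative models and observations:

```python
import numpy as np

T = np.array([[0.7, 0.3], [0.3, 0.7]])  # P(X_t = j | X_{t-1} = i)
O = np.array([[0.9, 0.1], [0.2, 0.8]])  # P(o_t = k | X_t = i)
obs = [0, 0, 1]                         # o_1, o_2, o_3
prior = np.array([0.5, 0.5])            # P(X_0)

# Forward pass: f[k] = P(X_k | o_1:k)
f = [prior]
for o in obs:
    u = O[:, o] * (f[-1] @ T)
    f.append(u / u.sum())

# Backward pass: b[k] = P(o_{k+1:t} | X_k), with b[t] = 1
b = [np.ones(2)]
for o in reversed(obs):
    b.insert(0, T @ (O[:, o] * b[0]))   # the backward recursion above

# Smoothed estimates: P(X_k | o_1:t) = alpha f[k] b[k]
smoothed = [fk * bk / (fk * bk).sum() for fk, bk in zip(f, b)]
print(smoothed[2])                      # P(X_2 | o_1:3)
```

At k = t the backward message is uniform, so the smoothed estimate coincides with the filtered one; earlier estimates are revised in hindsight by the later observations.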

## Inference in HMMs: Most Likely Explanation
The most-likely-explanation query returns a path through state space, x_0, …, x_t.

## MLE: Viterbi Algorithm
Recursive computation of the maximum likelihood of a path to each x_t in Val(X_t):

m_t(X_t) = max_{x_1:t-1} P(x_1, …, x_t-1, X_t | o_1:t)
         = α P(o_t | X_t) max_{x_t-1} P(X_t | x_t-1) m_t-1(x_t-1)

The previous ML state is argmax_{x_t-1} P(X_t | x_t-1) m_t-1(x_t-1).
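
A sketch of the Viterbi recursion with backpointers for recovering the path; the 2-state model and observation sequence are illustrative assumptions:

```python
import numpy as np

T = np.array([[0.7, 0.3], [0.3, 0.7]])  # P(X_t = j | X_{t-1} = i)
O = np.array([[0.9, 0.1], [0.2, 0.8]])  # P(o_t = k | X_t = i)
obs = [0, 0, 1, 1]
prior = np.array([0.5, 0.5])

m = prior * O[:, obs[0]]                # m_1(X_1), up to a constant
back = []                               # backpointers: previous ML state
for o in obs[1:]:
    scores = m[:, None] * T             # scores[i, j] = m_{t-1}(i) P(X_t = j | i)
    back.append(scores.argmax(axis=0))  # argmax over x_{t-1}, for each X_t
    m = O[:, o] * scores.max(axis=0)    # m_t(X_t)

# Backtrack from the most likely final state.
path = [int(m.argmax())]
for ptr in reversed(back):
    path.insert(0, int(ptr[path[0]]))
print(path)  # most likely state sequence x_1, ..., x_4
```

For this observation sequence the most likely path tracks the observations, printing `[0, 0, 1, 1]`.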

## Applications of HMMs in NLP
Speech recognition:
- Hidden phones (e.g., ah, eh, ee, th, r)
- Observed, noisy acoustic features (produced by signal processing)

## Phone Observation Models
Signal processing converts the raw audio into a feature vector, e.g. (24, 13, 3, 59), observed for each Phone_t. The observation model is defined to be robust over variations in accent, speed, pitch, and noise.

## Phone Transition Models
Good models of P(Phone_{t+1} | Phone_t) will capture (among other things):
- Pronunciation of words
- Subphone structure
- Coarticulation effects

Triphone models = order-3 Markov chain.

## Word Segmentation
Words run together when pronounced. N-gram models: unigrams P(w_i), bigrams P(w_i | w_i-1), trigrams P(w_i | w_i-1, w_i-2).

Random 20-word samples from R&N using N-gram models:
- Unigram: "logical are as confusion a may right tries agent goal the was diesel more object then information-gathering search is"
- Bigram: "Planning purely diagnostic expert systems are very similar computational approach would be represented compactly using tic tac toe a predicate"
- Trigram: "Planning and scheduling are integrated the success of naïve bayes model is just a possible prior source by that time"
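
Samples like those above come from estimating the model by counting and then sampling from it. A sketch for a bigram model; the tiny corpus is a made-up stand-in for a real training text:

```python
import random

# Made-up miniature corpus (a real model would be trained on a large text).
corpus = ("the agent searches the state space "
          "and the agent plans the next action").split()

# Estimate P(w_i | w_{i-1}) by collecting the successors of each word;
# sampling uniformly from the successor list matches the count-based estimate.
bigram = {}
for prev, word in zip(corpus, corpus[1:]):
    bigram.setdefault(prev, []).append(word)

random.seed(0)
w = "the"
sample = [w]
for _ in range(7):
    w = random.choice(bigram.get(w, corpus))  # fall back to unigram if unseen
    sample.append(w)
print(" ".join(sample))
```

Higher-order models (trigrams and up) condition on more history, which is why the samples above become progressively more coherent.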

## Tricks to Improve Recognition
- Narrow the number of variables: digits, yes/no, phone tree.
- Train with real user data. (Real story: "Yes ma'am".)

## Kalman Filtering
In a nutshell: efficient filtering in continuous state spaces, with Gaussian transition and observation models. Ubiquitous for tracking with noisy sensors, e.g. radar, GPS, cameras.

## Hidden Markov Model for Robot Localization
Hidden state variables X_0 → X_1 → X_2 → X_3 with observed variables z_1, z_2, z_3. Use the observations to get a better idea of where the robot is at time t: predict, observe, predict, observe, …

## Linear Gaussian Transition Model
Consider position and velocity x_t, v_t and time step h. Without noise:

x_{t+1} = x_t + h v_t
v_{t+1} = v_t

With Gaussian noise of std σ_x:

P(x_{t+1} | x_t) ∝ exp(−(x_{t+1} − (x_t + h v_t))² / (2 σ_x²))

i.e. x_{t+1} ~ N(x_t + h v_t, σ_x²).

## Linear Gaussian Transition Model
If the prior on position is Gaussian, then the predicted position is also Gaussian: a prior N(μ, σ²) maps to N(μ + v h, σ² + σ_x²), since the mean shifts by vh and the variances add.

## Linear Gaussian Observation Model
Position observation z_t with Gaussian noise of std σ_z: z_t ~ N(x_t, σ_z²).

## Linear Gaussian Observation Model
If the prior on position is Gaussian, then the posterior is also Gaussian:

μ ← (σ_z² μ + σ² z) / (σ² + σ_z²)
σ² ← σ² σ_z² / (σ² + σ_z²)

The posterior mean is a precision-weighted average of the prior mean and the observation, and the posterior variance is smaller than either.
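
The predict and observe updates above can be written as two small functions; the numbers in the example are arbitrary:

```python
# Predict step: shift the mean by v*h and add the transition noise variance,
# as in the transition-model slide.
def gaussian_predict(mu, var, v, h, var_x):
    return mu + v * h, var + var_x

# Observe step: combine prior N(mu, var) with observation z ~ N(x, var_z),
# per the posterior update above.
def gaussian_update(mu, var, z, var_z):
    mu_post = (var_z * mu + var * z) / (var + var_z)
    var_post = var * var_z / (var + var_z)
    return mu_post, var_post

mu, var = gaussian_predict(mu=0.0, var=3.0, v=1.0, h=1.0, var_x=1.0)
mu, var = gaussian_update(mu, var, z=2.0, var_z=4.0)
# Equal predicted and observation variances (both 4.0): the posterior mean
# lands midway between the predicted mean (1.0) and the observation (2.0).
print(mu, var)  # 1.5 2.0
```

Alternating these two steps is exactly the predict-observe cycle of the 1D Kalman filter.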

## Multivariate Case
Transition matrix F with noise covariance Σ_x; observation matrix H with noise covariance Σ_z:

μ_{t+1} = F μ_t + K_{t+1} (z_{t+1} − H F μ_t)
Σ_{t+1} = (I − K_{t+1} H)(F Σ_t Fᵀ + Σ_x)

where K_{t+1} = (F Σ_t Fᵀ + Σ_x) Hᵀ (H (F Σ_t Fᵀ + Σ_x) Hᵀ + Σ_z)⁻¹.

Got that memorized?
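
One step of these updates for the position-velocity model from the earlier slides; h, the noise covariances, and the observations are illustrative assumptions:

```python
import numpy as np

h = 1.0
F = np.array([[1.0, h], [0.0, 1.0]])   # transition: x' = x + h v, v' = v
H = np.array([[1.0, 0.0]])             # observe position only
Sx = 0.01 * np.eye(2)                  # transition noise covariance
Sz = np.array([[0.25]])                # observation noise covariance

def kalman_step(mu, S, z):
    P = F @ S @ F.T + Sx                           # predicted covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + Sz)  # Kalman gain K_{t+1}
    mu_new = F @ mu + K @ (z - H @ F @ mu)         # corrected mean
    S_new = (np.eye(2) - K @ H) @ P                # corrected covariance
    return mu_new, S_new

mu, S = np.array([0.0, 1.0]), np.eye(2)            # initial belief
for z in [1.1, 2.0, 2.9]:                          # roughly constant velocity 1
    mu, S = kalman_step(mu, S, np.array([z]))
print(mu)  # position near 2.9, velocity near 1
```

Even though only position is observed, the filter infers the velocity through the correlations in the covariance matrix.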

## Properties of the Kalman Filter
- Optimal Bayesian estimate for linear Gaussian transition/observation models.
- Needs estimates of the covariances, so model identification is necessary.
- Extensions to nonlinear systems:
  - Extended Kalman Filter: linearize the models.
  - Unscented Kalman Filter: pass sample points through the nonlinear model to reconstruct a Gaussian.
- These work as long as the system isn't too nonlinear.

## Non-Gaussian Distributions
Gaussian distributions are a single "lump", and the Kalman filter estimate can represent nothing else.

## Non-Gaussian Distributions
Integrating continuous and discrete states: splitting on a binary choice (e.g. "up" vs. "down") produces a multimodal distribution that a single Gaussian cannot capture.

## Example: Failure Detection
Consider a battery meter sensor: Battery = the true level of the battery, BMeter = the sensor reading.
- Transient failures: the sensor sends garbage at time t, e.g. readings 5555500555…
- Persistent failures: the sensor is broken and sends garbage forever, e.g. readings 5555500000…

## Dynamic Bayesian Network
Battery_{t-1} → Battery_t → BMeter_t, with BMeter_t ~ N(Battery_t, σ²). (Think of this structure "unrolled" forever…)

## Dynamic Bayesian Network
Transient failure model: same structure, but the sensor model gets an extra outcome, P(BMeter_t = 0 | Battery_t = 5) = 0.03, alongside BMeter_t ~ N(Battery_t, σ²).

## Results on Transient Failure
The meter reads 55555005555… and a transient failure occurs. Without the transient failure model, E(Battery_t) drops sharply at the bad readings; with the model, the estimate stays near the true level.

## Results on Persistent Failure
The meter reads 5555500000… and a persistent failure occurs. With only the transient failure model, E(Battery_t) eventually follows the zero readings down.

## Persistent Failure Model
Add a hidden Broken variable: Broken_{t-1} → Broken_t and Broken_t → BMeter_t, with P(BMeter_t = 0 | Broken_t) = 1; when not broken, BMeter_t ~ N(Battery_t, σ²) with P(BMeter_t = 0 | Battery_t = 5) = 0.03 as before. This is an example of a Dynamic Bayesian Network (DBN).

## Results on Persistent Failure
The meter reads 5555500000… and a persistent failure occurs. With the persistent failure model, E(Battery_t) stays near the true level after the failure, unlike with the transient model alone.

## How to Perform Inference on a DBN?
- Exact inference on the "unrolled" BN: variable elimination, eliminating old time steps. But after a few time steps all variables in the state space become dependent, so the sparsity structure is lost.
- Approximate inference: particle filtering.

## Particle Filtering (aka Sequential Monte Carlo)
Represent distributions as a set of particles. Applicable to non-Gaussian, high-dimensional distributions, with convenient implementations; widely used in vision and robotics.

## Particle Representation
Bel(x_t) = {(w_k, x_k)}, where the w_k are weights (summing to 1) and the x_k are state hypotheses. The particle set approximates the underlying distribution.

## Particle Filtering
Represent a distribution at time t as a set of N "particles" S_t^1, …, S_t^N. Repeat for t = 0, 1, 2, …:
1. Sampling step: sample S[i] from P(X_{t+1} | X_t = S_t^i) for all i.
2. Compute the weight w[i] = P(o_{t+1} | X_{t+1} = S[i]) for all i.
3. Weighted resampling step: sample S_{t+1}^i from S[·] according to the weights w[·].
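
The three steps above, sketched for a small 2-state model; the transition and observation probabilities, prior, and observations are illustrative assumptions:

```python
import random

random.seed(1)
N = 5000

def sample_transition(x):          # sample from P(X_{t+1} | X_t = x)
    return x if random.random() < 0.7 else 1 - x

def obs_likelihood(x, o):          # P(o_{t+1} | X_{t+1} = x)
    return [[0.9, 0.1], [0.2, 0.8]][x][o]

particles = [random.randint(0, 1) for _ in range(N)]        # samples of P(X_0)
for o in [0, 0, 1]:
    particles = [sample_transition(p) for p in particles]   # 1. sampling step
    weights = [obs_likelihood(p, o) for p in particles]     # 2. weighting step
    particles = random.choices(particles, weights, k=N)     # 3. weighted resampling

estimate = sum(particles) / N      # approximate P(X_3 = 1 | o_1:3)
print(estimate)
```

As N grows, the particle estimate converges to the exact filtered posterior, here a value near 0.8 for state 1.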

## Battery Example
Walking the particle filter through the battery DBN (each particle carries Battery_t and Broken_t):
1. Sampling step: propagate each particle through the transition model.
2. Observe BMeter = 0: compute weights P(BMeter = 0 | sample), 0.03 for working particles and 1 for broken ones (drawn as particle size).
3. Weighted resampling according to the weights.
4. Sampling step again.
5. Now observe BMeter_t = 5: compute weights, 1 for working particles and 0 for broken ones.
6. Weighted resampling.

## Applications of Particle Filtering in Robotics
Simultaneous Localization and Mapping (SLAM). Observations: laser rangefinder. State variables: position, walls.

## Simultaneous Localization and Mapping (SLAM)
Mobile robots have two information sources:
- Odometry: locally accurate, but drifts significantly over time.
- Vision/ladar/sonar: inaccurate locally, but in a global reference frame.

SLAM combines the two. State: (robot pose, map). Observations: (sensor input).

## Couple of Plugs
- CSCI B553
- CSCI B659: Principles of Intelligent Robot Motion, http://cs.indiana.edu/classes/b659-hauserk
- CSCI B657: Computer Vision (David Crandall / Chen Yu)

## Next Time
Learning distributions from data. Read R&N 20.1-3.

## MLE: Viterbi Algorithm
Recall the recursion:

m_t(X_t) = max_{x_1:t-1} P(x_1, …, x_t-1, X_t | o_1:t)
         = α P(o_t | X_t) max_{x_t-1} P(X_t | x_t-1) m_t-1(x_t-1)

with previous ML state argmax_{x_t-1} P(X_t | x_t-1) m_t-1(x_t-1). Does this sound familiar?

## MLE: Viterbi Algorithm
Do the "logarithm trick":

log m_t(X_t) = log α P(o_t | X_t) + max_{x_t-1} [log P(X_t | x_t-1) + log m_t-1(x_t-1)]

View log α P(o_t | X_t) as a reward, log P(X_t | x_t-1) as a cost, and log m_t(X_t) as a value function: this is a Bellman equation.

