
Temporal Probabilistic Models

Motivation
- Observing a stream of data: monitoring (of people, computer systems, etc.), surveillance and tracking, finance and economics, science
- Questions: modeling and forecasting; unobserved variables

Time Series Modeling
- Time occurs in steps t = 0, 1, 2, …; a time step can be seconds, days, years, etc.
- State variable X_t, t = 0, 1, 2, …
- For partially observed problems, we see observations O_t, t = 1, 2, … and do not see the X's
- The X's are hidden variables (aka latent variables)

Modeling Time
- Arrow of time; causality?
- Bayesian networks to the rescue: causes precede effects

Probabilistic Modeling
- For now, assume the fully observable case
- What parents should each variable have? [Figure: two candidate chain structures over X_0, X_1, X_2, X_3]

Markov Assumption
- Assume X_{t+k} is independent of all X_i for i < t, given X_t, …, X_{t+k-1} (a k-th order Markov chain; k = 1 gives the usual first-order assumption)

First-Order Markov Chain
- Markov chains of order k > 1 can be converted into a first-order MC [left as an exercise]
- So, without loss of generality, "MC" refers to a first-order MC
[Figure: chain X_0 → X_1 → X_2 → X_3]

Inference in MC
- What independence relationships can we read from the BN? [Figure: chain X_0 → X_1 → X_2 → X_3]
- Observing X_1 makes X_0 independent of X_2, X_3, …
- P(X_t | X_{t-1}) is known as the transition model

Inference in MC
- Prediction: what is the probability of a future state?
  P(X_t) = Σ_{x_0,…,x_{t-1}} P(X_0, …, X_t)
         = Σ_{x_0,…,x_{t-1}} P(X_0) Π_{i=1..t} P(X_i | X_{i-1})
         = Σ_{x_{t-1}} P(X_t | x_{t-1}) P(x_{t-1})
- The prediction "blurs" over time, and approaches a stationary distribution as t grows: limited prediction power
- The rate of blurring is known as the mixing time
- [Incremental approach: compute P(X_t) from P(X_{t-1}) at each step]
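The prediction recursion above can be sketched numerically. The 2-state transition matrix below is a made-up example, not taken from the slides:

```python
import numpy as np

# Toy 2-state chain (assumed numbers): T[i, j] = P(X_t = j | X_{t-1} = i)
T = np.array([[0.9, 0.1],
              [0.5, 0.5]])

p = np.array([1.0, 0.0])   # P(X_0): start in state 0 with certainty
for _ in range(50):
    p = p @ T              # P(X_t) = sum_{x_{t-1}} P(X_t | x_{t-1}) P(x_{t-1})

# The prediction "blurs" toward the stationary distribution pi = pi T,
# which for this T is [5/6, 1/6].
print(p)
```

Each loop iteration is the incremental update from the slide; the initial certainty is forgotten at a rate governed by the mixing time.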

How Does the Markov Assumption Affect the Choice of State?
- Suppose we're tracking a point (x, y) in 2D
- What if the point is…
  - a momentumless particle subject to thermal vibration?
  - a particle with velocity?
  - a particle with intent, like a person?

How Does the Markov Assumption Affect the Choice of State?
- Suppose the point is the position of our robot, and we observe velocity and intent
- What if:
  - terrain conditions affect speed?
  - battery level affects speed?
  - position is noisy, e.g. GPS?

Is the Markov Assumption Appropriate For:
- a car on a slippery road?
- sales of toothpaste?
- the stock market?

History Dependence
- In Markov models, the state must be chosen so that the future is independent of history given the current state
- Often this requires adding variables that cannot be directly observed

Partial Observability
- Hidden Markov Model (HMM)
[Figure: hidden state variables X_0 → X_1 → X_2 → X_3, with observed variables O_1, O_2, O_3 attached to X_1, X_2, X_3]
- P(O_t | X_t) is called the observation model (or sensor model)

Inference in HMMs
- Filtering
- Prediction
- Smoothing, aka hindsight
- Most likely explanation


Filtering
- The name comes from signal processing
  P(X_t | o_{1:t}) = Σ_{x_{t-1}} P(x_{t-1} | o_{1:t-1}) P(X_t | x_{t-1}, o_t)
  P(X_t | x_{t-1}, o_t) = P(o_t | x_{t-1}, X_t) P(X_t | x_{t-1}) / P(o_t | x_{t-1})
                        ∝ P(o_t | X_t) P(X_t | x_{t-1})

Filtering
  P(X_t | o_{1:t}) ∝ P(o_t | X_t) Σ_{x_{t-1}} P(X_t | x_{t-1}) P(x_{t-1} | o_{1:t-1})
- Forward recursion
- If we keep track of P(X_t | o_{1:t}), each update is O(1) in t!
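The forward recursion can be written directly. The transition matrix, observation model, and observation sequence below are toy assumptions for illustration:

```python
import numpy as np

# Assumed toy HMM: T[i, j] = P(X_t = j | X_{t-1} = i), O[x, o] = P(o | X = x)
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])
O = np.array([[0.9, 0.1],
              [0.2, 0.8]])
prior = np.array([0.5, 0.5])   # P(X_0)

def filter_step(belief, obs):
    """One forward update: predict through T, weight by P(o_t | X_t), normalize."""
    predicted = belief @ T
    unnorm = O[:, obs] * predicted
    return unnorm / unnorm.sum()

belief = prior
for obs in [0, 0, 1]:
    belief = filter_step(belief, obs)
print(belief)   # P(X_3 | o_{1:3})
```

Because only the current belief is carried forward, each update costs the same regardless of how large t gets.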


Prediction
- P(X_{t+k} | o_{1:t})
- 2 steps: compute P(X_t | o_{1:t}), then P(X_{t+k} | X_t)
- Filter, then predict as with a standard MC


Smoothing
- P(X_k | o_{1:t}) for k < t
  P(X_k | o_{1:k}, o_{k+1:t}) = P(o_{k+1:t} | X_k, o_{1:k}) P(X_k | o_{1:k}) / P(o_{k+1:t} | o_{1:k})
                              ∝ P(o_{k+1:t} | X_k) P(X_k | o_{1:k})
- The second factor is standard filtering up to time k

Smoothing
- Computing P(o_{k+1:t} | X_k), the probability of the future observation sequence given the state:
  P(o_{k+1:t} | X_k) = Σ_{x_{k+1}} P(o_{k+1:t} | X_k, x_{k+1}) P(x_{k+1} | X_k)
                     = Σ_{x_{k+1}} P(o_{k+1:t} | x_{k+1}) P(x_{k+1} | X_k)
                     = Σ_{x_{k+1}} P(o_{k+2:t} | x_{k+1}) P(o_{k+1} | x_{k+1}) P(x_{k+1} | X_k)
- Backward recursion
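The backward recursion can be coded in the same style; the HMM numbers here are toy assumptions:

```python
import numpy as np

# Assumed toy HMM: T[i, j] = P(x_{k+1} = j | X_k = i), O[x, o] = P(o | x)
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])
O = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def backward(observations):
    """b[k][x] = P(o_{k+1:t} | X_k = x), computed right to left."""
    n = len(observations)
    b = np.ones((n + 1, 2))            # base case: empty future evidence
    for k in range(n - 1, -1, -1):
        # sum over x_{k+1} of P(o_{k+2:t}|x_{k+1}) P(o_{k+1}|x_{k+1}) P(x_{k+1}|X_k)
        b[k] = T @ (O[:, observations[k]] * b[k + 1])
    return b

b = backward([0, 0, 1])
print(b[0])
```

Smoothing then multiplies b[k] with the forward filtering result at time k and normalizes.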

Inference in HMMs: Most Likely Explanation
- The query returns a path through state space x_0, …, x_3

MLE: Viterbi Algorithm
- Recursive computation of the maximum likelihood of a path to each x_t in Val(X_t):
  m_t(X_t) = max_{x_{1:t-1}} P(x_1, …, x_{t-1}, X_t | o_{1:t})
           = P(o_t | X_t) max_{x_{t-1}} P(X_t | x_{t-1}) m_{t-1}(x_{t-1})
- Previous ML state: argmax_{x_{t-1}} P(X_t | x_{t-1}) m_{t-1}(x_{t-1})

Applications of HMMs in NLP
- Speech recognition
- Hidden phones (e.g., ah eh ee th r)
- Observed, noisy acoustic features (produced by signal processing)

Phone Observation Models
- Phone_t → signal processing → Features_t, e.g. features (24, 13, 3, 59)
- The model is defined to be robust over variations in accent, speed, pitch, and noise

Phone Transition Models
- Good models will capture (among other things): pronunciation of words, subphone structure, coarticulation effects
- Triphone models = order-3 Markov chain over Phone_t, Phone_{t+1}, …

Word Segmentation
- Words run together when pronounced
- Unigrams P(w_i), bigrams P(w_i | w_{i-1}), trigrams P(w_i | w_{i-1}, w_{i-2})
- Random 20-word samples from R&N using n-gram models:
  "logical are as confusion a may right tries agent goal the was diesel more object then information-gathering search is"
  "planning purely diagnostic expert systems are very similar computational approach would be represented compactly using tic tac toe a predicate"
  "planning and scheduling are integrated the success of naïve bayes model is just a possible prior source by that time"
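A bigram sampler of the kind that produces such text can be sketched in a few lines; the tiny corpus here is invented for illustration, not taken from R&N:

```python
import random

corpus = ("planning and scheduling are integrated "
          "planning and search are very similar").split()

# Empirical bigram model: P(w_i | w_{i-1}) stored as lists of observed successors
bigrams = {}
for a, b in zip(corpus, corpus[1:]):
    bigrams.setdefault(a, []).append(b)

random.seed(0)
w = "planning"
sample = [w]
for _ in range(7):
    w = random.choice(bigrams.get(w, corpus))  # fall back to unigram if no successor
    sample.append(w)
print(" ".join(sample))
```

Moving from unigram to bigram to trigram models yields progressively more locally coherent text, as the R&N samples illustrate.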

Tricks to Improve Recognition
- Narrow the number of variables: digits, yes/no, phone tree
- Training with real user data
- Real story: "Yes ma'am"

Kalman Filtering
- In a nutshell: efficient filtering in continuous state spaces
- Gaussian transition and observation models
- Ubiquitous for tracking with noisy sensors, e.g. radar, GPS, cameras

Hidden Markov Model for Robot Localization
- Use observations to get a better idea of where the robot is at time t
[Figure: hidden state variables X_0 → X_1 → X_2 → X_3, with observed variables z_1, z_2, z_3]
- Predict, observe, predict, observe…

Linear Gaussian Transition Model
- Consider position and velocity x_t, v_t, with time step h
- Without noise: x_{t+1} = x_t + h v_t, v_{t+1} = v_t
- With Gaussian noise of standard deviation σ_1:
  P(x_{t+1} | x_t) ∝ exp(-(x_{t+1} - (x_t + h v_t))^2 / (2 σ_1^2))
  i.e. x_{t+1} ~ N(x_t + h v_t, σ_1^2)

Linear Gaussian Transition Model
- If the prior on position is Gaussian, then the posterior after the transition is also Gaussian:
  N(μ, σ^2) → N(μ + vh, σ^2 + σ_1^2)

Linear Gaussian Observation Model
- Position observation z_t, with Gaussian noise of standard deviation σ_z:
  z_t ~ N(x_t, σ_z^2)

Linear Gaussian Observation Model
- If the prior on position is Gaussian, then the posterior is also Gaussian:
  posterior mean = (σ_z^2 μ + σ^2 z) / (σ^2 + σ_z^2)
  posterior variance = σ^2 σ_z^2 / (σ^2 + σ_z^2)
[Figure: position prior, observation probability, and posterior probability curves]
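The 1D predict/update cycle follows directly from these Gaussian formulas; the numeric noise values below are assumptions for illustration:

```python
# Minimal 1D Kalman predict/update sketch built from the slide's formulas.
def predict(mu, var, v, h, var1):
    # Transition x' = x + h v plus noise of variance var1: N(mu + v h, var + var1)
    return mu + v * h, var + var1

def update(mu, var, z, var_z):
    # Posterior after observing z ~ N(x, var_z)
    mu_post = (var_z * mu + var * z) / (var + var_z)
    var_post = var * var_z / (var + var_z)
    return mu_post, var_post

mu, var = 0.0, 1.0                                   # assumed prior N(0, 1)
mu, var = predict(mu, var, v=1.0, h=0.1, var1=0.01)  # predict step
mu, var = update(mu, var, z=0.2, var_z=0.25)         # observe z = 0.2
print(mu, var)
```

Note that the posterior variance is smaller than both the predicted variance and the observation variance, so each observation sharpens the estimate.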

Multivariate Case
- Transition matrix F, covariance Σ_x; observation matrix H, covariance Σ_z
  μ_{t+1} = F μ_t + K_{t+1} (z_{t+1} - H F μ_t)
  Σ_{t+1} = (I - K_{t+1} H)(F Σ_t F^T + Σ_x)
  where K_{t+1} = (F Σ_t F^T + Σ_x) H^T (H (F Σ_t F^T + Σ_x) H^T + Σ_z)^{-1}
- Got that memorized?
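One multivariate step looks like this in code; the specific F, H, and covariances are toy assumptions for a position/velocity tracker:

```python
import numpy as np

F = np.array([[1.0, 0.1],    # x' = x + h v with h = 0.1
              [0.0, 1.0]])
H = np.array([[1.0, 0.0]])   # observe position only
Sx = 0.01 * np.eye(2)        # transition covariance Sigma_x
Sz = np.array([[0.25]])      # observation covariance Sigma_z

def kalman_step(mu, S, z):
    """One predict + update cycle following the slide's equations."""
    P = F @ S @ F.T + Sx                              # predicted covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + Sz)     # Kalman gain K_{t+1}
    mu_new = F @ mu + K @ (z - H @ F @ mu)
    S_new = (np.eye(2) - K @ H) @ P
    return mu_new, S_new

mu, S = np.zeros(2), np.eye(2)
mu, S = kalman_step(mu, S, np.array([0.3]))
print(mu, S)
```

The update pulls the predicted mean toward the observation by the gain K, and shrinks the covariance in the observed direction.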


Properties of Kalman Filter
- Optimal Bayesian estimate for linear Gaussian transition/observation models
- Needs estimates of covariance… model identification necessary
- Extensions to nonlinear systems:
  - Extended Kalman Filter: linearize the models
  - Unscented Kalman Filter: pass points through the nonlinear model to reconstruct a Gaussian
- These work as long as the systems aren't too nonlinear

Non-Gaussian Distributions
- Gaussian distributions are a single "lump"
[Figure: a multimodal distribution vs. the unimodal Kalman filter estimate]

Non-Gaussian Distributions
- Integrating continuous and discrete states
- Splitting with a binary choice ("up" vs. "down")

Example: Failure Detection
- Consider a battery meter sensor: Battery = true level of battery, BMeter = sensor reading
- Transient failures: the sensor sends garbage at time t
- Persistent failures: the sensor is broken and sends garbage forever

Dynamic Bayesian Network
[Figure: Battery_{t-1} → Battery_t → BMeter_t]
- BMeter_t ~ N(Battery_t, σ^2)
- (Think of this structure "unrolled" forever…)

Dynamic Bayesian Network
- BMeter_t ~ N(Battery_t, σ^2)
- Transient failure model: P(BMeter_t = 0 | Battery_t = 5) = 0.03

Results on Transient Failure
[Figure: E(Battery_t) over time, with and without the transient failure model, when a transient failure occurs and the meter briefly reads 0]

Results on Persistent Failure
[Figure: E(Battery_t) over time with only the transient failure model, when a persistent failure occurs and the meter keeps reading 0]

Persistent Failure Model
[Figure: Broken_{t-1} → Broken_t added alongside Battery_{t-1} → Battery_t → BMeter_t]
- BMeter_t ~ N(Battery_t, σ^2); P(BMeter_t = 0 | Battery_t = 5) = 0.03
- P(BMeter_t = 0 | Broken_t) = 1
- An example of a Dynamic Bayesian Network (DBN)

Results on Persistent Failure
[Figure: E(Battery_t) over time when a persistent failure occurs and the meter keeps reading 0, comparing the transient model with the persistent failure model]

How to Perform Inference on a DBN?
- Exact inference on the "unrolled" BN: variable elimination, eliminating old time steps
- After a few time steps, all variables in the state space become dependent! The sparsity structure is lost
- Approximate inference: particle filtering

Particle Filtering (aka Sequential Monte Carlo)
- Represent distributions as a set of particles
- Applicable to non-Gaussian, high-dimensional distributions
- Convenient implementations
- Widely used in vision and robotics

Particle Representation
- Bel(x_t) = {(w_k, x_k)}: the w_k are weights, the x_k are state hypotheses
- Weights sum to 1
- Approximates the underlying distribution

Particle Filtering
- Represent the distribution at time t as a set of N "particles" S_t^1, …, S_t^N
- Repeat for t = 0, 1, 2, …:
  1. Sample S[i] from P(X_{t+1} | X_t = S_t^i) for all i
  2. Compute the weight w[i] = P(o_{t+1} | X_{t+1} = S[i]) for all i
  3. Weighted resampling step: sample S_{t+1}^i from S[.] according to the weights w[.]
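The three steps above can be sketched for a 1D state; the transition noise (std 0.1), observation noise (std 0.5), and observation data are all toy assumptions:

```python
import math
import random

def likelihood(z, x, sigma=0.5):
    return math.exp(-(z - x) ** 2 / (2 * sigma ** 2))  # unnormalized P(z | x)

def pf_step(particles, z):
    # 1. Sample each particle through the transition model
    moved = [random.gauss(x, 0.1) for x in particles]
    # 2. Weight each hypothesis by the observation likelihood
    w = [likelihood(z, x) for x in moved]
    total = sum(w)
    w = [wi / total for wi in w]
    # 3. Weighted resampling with replacement
    return random.choices(moved, weights=w, k=len(moved))

random.seed(1)
particles = [random.gauss(0.0, 1.0) for _ in range(500)]
for z in [0.5, 0.6, 0.55]:
    particles = pf_step(particles, z)
estimate = sum(particles) / len(particles)
print(estimate)
```

After a few observations near 0.55, the particle cloud, and hence the posterior mean estimate, concentrates near the observed values.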

Battery Example
[Figures: the DBN Broken_{t-1} → Broken_t, Battery_{t-1} → Battery_t → BMeter_t, with particles drawn at each stage]
- Sampling step: propagate each particle through the transition model
- Suppose we now observe BMeter = 0. What is P(BMeter = 0 | sample)?
- Compute weights (drawn as particle size)
- Weighted resampling
- Sampling step again
- Now observe BMeter_t = 5
- Compute weights: near 1 for working-sensor particles, near 0 for broken-sensor particles
- Weighted resample

Applications of Particle Filtering in Robotics
- Simultaneous Localization and Mapping (SLAM)
- Observations: laser rangefinder
- State variables: position, walls

Simultaneous Localization and Mapping (SLAM)
- Mobile robots
- Odometry: locally accurate, but drifts significantly over time
- Vision/ladar/sonar: inaccurate locally, but gives a global reference frame
- Combine the two
- State: (robot pose, map); observations: (sensor input)

A Couple of Plugs
- CSCI B553
- CSCI B659: Principles of Intelligent Robot Motion
- CSCI B657: Computer Vision (David Crandall / Chen Yu)

Next Time
- Learning distributions from data
- Read R&N

MLE: Viterbi Algorithm
- Recursive computation of the maximum likelihood of a path to each x_t in Val(X_t):
  m_t(X_t) = max_{x_{1:t-1}} P(x_1, …, x_{t-1}, X_t | o_{1:t})
           = P(o_t | X_t) max_{x_{t-1}} P(X_t | x_{t-1}) m_{t-1}(x_{t-1})
- Previous ML state: argmax_{x_{t-1}} P(X_t | x_{t-1}) m_{t-1}(x_{t-1})
- Does this sound familiar?

MLE: Viterbi Algorithm
- Do the "logarithm trick":
  log m_t(X_t) = log P(o_t | X_t) + max_{x_{t-1}} [ log P(X_t | x_{t-1}) + log m_{t-1}(x_{t-1}) ]
- View: log P(o_t | X_t) as a reward, log P(X_t | x_{t-1}) as a cost, log m_t(X_t) as a value function
- This is a Bellman equation
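The log-space recursion with backpointers runs as follows on a toy 2-state HMM; all numbers are assumptions for illustration:

```python
import math

T = [[0.7, 0.3], [0.3, 0.7]]    # P(X_t | X_{t-1})
O = [[0.9, 0.1], [0.2, 0.8]]    # P(o | X)
prior = [0.5, 0.5]
obs = [0, 0, 1]

# Initialization: log m_1(x) = log P(x) + log P(o_1 | x)
log_m = [math.log(prior[x]) + math.log(O[x][obs[0]]) for x in range(2)]

back = []                        # backpointers: previous ML state for each x
for o in obs[1:]:
    ptrs, new = [], []
    for x in range(2):
        best = max(range(2), key=lambda xp: log_m[xp] + math.log(T[xp][x]))
        new.append(math.log(O[x][o]) + log_m[best] + math.log(T[best][x]))
        ptrs.append(best)
    back.append(ptrs)
    log_m = new

# Backtrack from the best final state to recover the ML path
x = max(range(2), key=lambda s: log_m[s])
path = [x]
for ptrs in reversed(back):
    x = ptrs[x]
    path.append(x)
path.reverse()
print(path)
```

Working in log space turns the products of probabilities into sums, avoiding numerical underflow on long sequences and exposing the Bellman-equation structure directly.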
