Russell and Norvig, AIMA: Chapter 15, Part B – 15.3, 15.4


1 Russell and Norvig, AIMA: Chapter 15, Part B – 15.3, 15.4

2
- Temporal probabilistic agents
- Inference: filtering, prediction, smoothing, and most likely explanation
- Hidden Markov models
- Kalman filters
- Dynamic Bayesian networks
- Speech recognition

3 A stochastic process is a process that evolves in space or time in accordance with some probability distribution. In the simplest possible case ("discrete time"), a stochastic process amounts to a sequence of random variables.

4 A Markov chain is a stochastic process (family of random variables) {X_n, n = 0, 1, 2, ...} satisfying:
- it takes on a finite or countable number of possible values; if X_n = i, the process is said to be in state i at time n
- whenever the process is in state i, there is a fixed probability P_ij that it will next be in state j. Formally:
  P{X_{n+1} = j | X_n = i, X_{n-1} = i_{n-1}, ..., X_1 = i_1, X_0 = i_0} = P_ij
  for all states i_0, i_1, ..., i_{n-1}, i, j and all n ≥ 0.
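The definition above can be sketched in a few lines of code: a hypothetical two-state chain (states 0 and 1, not from the slides) with a fixed transition matrix P_ij, sampled forward in time.

```python
import random

# Hypothetical transition probabilities P_ij for a two-state chain;
# each row sums to 1.
P = [[0.3, 0.7],
     [0.2, 0.8]]

def step(state, rng):
    # The next state depends only on the current state (Markov property).
    return 0 if rng.random() < P[state][0] else 1

rng = random.Random(0)
state = 0
path = [state]
for _ in range(10):
    state = step(state, rng)
    path.append(state)
print(path)  # a sample trajectory X_0, X_1, ..., X_10
```

Because of the Markov property, the sampler never needs the history, only the current state.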

5
- Set of states: {s_1, s_2, ..., s_N}
- The process moves from one state to another, generating a sequence of states s_{i1}, s_{i2}, ..., s_{ik}, ...
- Markov chain property: the probability of each subsequent state depends only on the previous state: P(s_{ik} | s_{i1}, ..., s_{ik-1}) = P(s_{ik} | s_{ik-1})
- States are not visible, but each state randomly generates one of M observations (or visible states) {v_1, v_2, ..., v_M}

6 To define a hidden Markov model, the following probabilities have to be specified:
- matrix of transition probabilities A = (a_ij), where a_ij = P(s_i | s_j)
- matrix of observation probabilities B = (b_i(v_m)), where b_i(v_m) = P(v_m | s_i)
- a vector of initial probabilities π = (π_i), where π_i = P(s_i)

7 [Figures: a hidden Markov model, and the same model unfolded in time]

8 [Figure: the Markov chain (state) process and the output (observation) process]

9 Umbrella example:
- Transition matrix, T_ij = P(X_t = j | X_{t-1} = i):
  T = ( 0.7 0.3 ; 0.3 0.7 )
- Sensor matrix for U_1 = true, with the sensor probabilities on the diagonal:
  O_1 = ( 0.9 0 ; 0 0.2 )

10 Forward and backward messages as column vectors:
  f_{1:t+1} = α O_{t+1} Tᵀ f_{1:t}
  b_{k+1:t} = T O_{k+1} b_{k+2:t}
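A minimal sketch of the forward (filtering) recursion, using the AIMA umbrella example's numbers as an assumption: states (rain, not rain), T[i][j] = P(X_{t+1} = j | X_t = i), and sensor diagonal (0.9, 0.2) for Umbrella = true.

```python
# Forward recursion f_{1:t+1} = alpha * O_{t+1} * T^T * f_{1:t},
# with the AIMA umbrella model (assumed numbers, states = rain / not rain).
T = [[0.7, 0.3],
     [0.3, 0.7]]
O_true = [0.9, 0.2]       # diagonal of the sensor matrix for Umbrella = true

def forward(f, obs_diag):
    # Predict one step, weight by the evidence, then normalise.
    g = [obs_diag[j] * sum(T[i][j] * f[i] for i in range(2))
         for j in range(2)]
    z = sum(g)
    return [x / z for x in g]

f = [0.5, 0.5]            # uniform prior P(X_0)
f = forward(f, O_true)    # umbrella observed on day 1
f = forward(f, O_true)    # umbrella observed on day 2
print([round(x, 3) for x in f])  # [0.883, 0.117]
```

Each filtering step costs the same regardless of t, which is the constant-cost-per-step property the summary slide refers to.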

11 We can avoid storing all forward messages in smoothing by running the forward algorithm backwards:
  f_{1:t} = α' (Tᵀ)⁻¹ O_{t+1}⁻¹ f_{1:t+1}

12 [Figure: two-state diagram with hidden states Low and High (transition probabilities 0.3, 0.7, 0.2, 0.8) emitting observations Dry and Rain (emission probabilities 0.6, 0.4, ...)]

13
- Two states: 'Low' and 'High' atmospheric pressure
- Two observations: 'Rain' and 'Dry'
- Transition probabilities: P('Low'|'Low') = 0.3, P('High'|'Low') = 0.7, P('Low'|'High') = 0.2, P('High'|'High') = 0.8
- Observation probabilities: P('Rain'|'Low') = 0.6, P('Dry'|'Low') = 0.4, P('Rain'|'High') = 0.4, P('Dry'|'High') = 0.6
- Initial probabilities: say P('Low') = 0.4, P('High') = 0.6

14 Suppose we want to calculate the probability of a sequence of observations in our example, {'Dry', 'Rain'}. Consider all possible hidden state sequences:
P({'Dry','Rain'}) = P({'Dry','Rain'}, {'Low','Low'}) + P({'Dry','Rain'}, {'Low','High'}) + P({'Dry','Rain'}, {'High','Low'}) + P({'Dry','Rain'}, {'High','High'})

15 The first term can be calculated as:
P({'Dry','Rain'}, {'Low','Low'})
= P({'Dry','Rain'} | {'Low','Low'}) · P({'Low','Low'})
= P('Dry'|'Low') · P('Rain'|'Low') · P('Low') · P('Low'|'Low')
= 0.4 · 0.6 · 0.4 · 0.3 = 0.0288
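The full sum can be computed by brute force, enumerating every hidden state sequence. The numbers follow the example; P('Dry'|'High') is taken as 0.6 so that each emission distribution sums to 1.

```python
from itertools import product

# The example HMM: initial, transition, and emission probabilities.
init  = {'Low': 0.4, 'High': 0.6}
trans = {('Low', 'Low'): 0.3, ('Low', 'High'): 0.7,
         ('High', 'Low'): 0.2, ('High', 'High'): 0.8}
emit  = {('Low', 'Rain'): 0.6, ('Low', 'Dry'): 0.4,
         ('High', 'Rain'): 0.4, ('High', 'Dry'): 0.6}

obs = ['Dry', 'Rain']
total = 0.0
for states in product(['Low', 'High'], repeat=len(obs)):
    # Joint probability of this hidden sequence and the observations
    p = init[states[0]] * emit[(states[0], obs[0])]
    for t in range(1, len(obs)):
        p *= trans[(states[t - 1], states[t])] * emit[(states[t], obs[t])]
    total += p
print(round(total, 4))  # 0.232
```

The first term of the loop reproduces the 0.0288 computed above; enumeration is exponential in the sequence length, which is why the forward algorithm is used in practice.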

16
- Temporal probabilistic agents
- Inference: filtering, prediction, smoothing, and most likely explanation
- Hidden Markov models
- Kalman filters
- Dynamic Bayesian networks
- Speech recognition

17
- The system state cannot be measured directly
- We need to estimate it "optimally" from measurements
[Figure: block diagram. External controls and system error sources drive the system (a black box); measuring devices, subject to measurement error sources, produce the observed measurements; the Kalman filter turns these into an optimal estimate of the system state, which is desired but not known]

18 The Kalman filter:
- is a set of mathematical equations
- is an iterative, recursive process
- is an optimal data-processing algorithm under certain criteria
- for a linear system with white Gaussian errors, gives the "best" estimate based on all previous measurements
- estimates past, present, and future states

19 White noise is a random signal (or process) with a flat power spectral density. In other words, the signal contains equal power within a fixed bandwidth at any center frequency.

20
- Optimality depends on the criteria chosen to evaluate performance
- Under certain assumptions, the KF is optimal with respect to virtually any criterion that makes sense:
  - linear system
  - Gaussian noise model

21
- A Kalman filter only needs information from the previous state
- It is updated at each iteration
- Older data can be discarded, which saves computation capacity and storage

22 To use the Kalman filter to estimate the internal state of a process given only a sequence of noisy observations, one must model the process in accordance with the framework of the Kalman filter. This means specifying the matrices F_k, H_k, Q_k, R_k, and sometimes B_k, for each time step k.

23
- x_k = state vector of the process to be estimated
- w_k = process noise: white, Gaussian, zero mean, covariance matrix Q
- v_k = measurement noise: white, Gaussian, zero mean, covariance matrix R, uncorrelated with w_k
- S_k = covariance of the innovation (residual)
- K_k = Kalman gain matrix
- P_k = covariance of the prediction error
- z_k = measurement of the system state

24 Process and measurement model:
  x_k = F_k x_{k-1} + B_k u_k + w_k
  z_k = H_k x_k + v_k

25 Predict:
  x̂⁻_k = F_k x̂_{k-1} + B_k u_k
  P⁻_k = F_k P_{k-1} F_kᵀ + Q_k
Update:
  S_k = H_k P⁻_k H_kᵀ + R_k
  K_k = P⁻_k H_kᵀ S_k⁻¹
  x̂_k = x̂⁻_k + K_k (z_k − H_k x̂⁻_k)
  P_k = (I − K_k H_k) P⁻_k

26 The Kalman gain:
- relates the new estimate to the most certain of the previous estimates
- large measurement noise → small gain
- large system noise → large gain
- if the system and measurement noise are unchanged, the gain converges: a steady-state Kalman filter

27 The Kalman filter has two distinct phases: predict and update. The predict phase uses the state estimate from the previous time step to produce an estimate of the state at the current time step. In the update phase, measurement information at the current time step is used to refine this prediction to arrive at a new, (hopefully) more accurate state estimate, again for the current time step.

28
- Prediction:
  - the state
  - the error covariance
- Update:
  - Kalman gain
  - update the estimate with the new measurement
  - update the error covariance
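The predict/update steps above can be sketched with a scalar filter that estimates a constant value from noisy measurements. All numbers here are illustrative assumptions, not from the slides.

```python
import random

# Scalar Kalman filter: static state (A = 1), direct measurement (H = 1).
A, H = 1.0, 1.0
Q, R = 1e-5, 1.0          # process / measurement noise covariances

rng = random.Random(1)
true_x = 5.0              # the constant being estimated
x_hat, P = 0.0, 1000.0    # deliberately vague initial estimate

for _ in range(200):
    # Predict the state and the error covariance
    x_pred = A * x_hat
    P_pred = A * P * A + Q
    # Update: Kalman gain, new estimate, new error covariance
    z = true_x + rng.gauss(0.0, R ** 0.5)
    K = P_pred * H / (H * P_pred * H + R)
    x_hat = x_pred + K * (z - H * x_pred)
    P = (1.0 - K * H) * P_pred

print(round(x_hat, 1), round(P, 4))
```

As measurements accumulate, the gain K shrinks and the error covariance P falls, matching the earlier slide: trusted measurements pull the gain up, a confident estimate pulls it down.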

29 [Figure: the recursive predict-update cycle, with the equations for each phase]

30
- Lost on a one-dimensional line
- Position: y(t)
- Assume Gaussian-distributed measurements

31 Sextant measurement at t_1: mean = z_1 and variance = σ²_{z1}
Optimal estimate of position: ŷ(t_1) = z_1
Variance of the error in the estimate: σ²_x(t_1) = σ²_{z1}
The boat is in the same position at time t_2, so the predicted position is z_1

32 So we have the prediction ŷ⁻(t_2)
GPS measurement at t_2: mean = z_2 and variance = σ²_{z2}
We need to correct the prediction with the measurement to get ŷ(t_2), closer to the more trusted measurement: linear interpolation?
[Figure: the prediction ŷ⁻(t_2) and the measurement z(t_2) as two Gaussians]

33 The corrected mean is the new optimal estimate of position. The new variance is smaller than either of the previous two variances.
[Figure: prediction ŷ⁻(t_2), measurement z(t_2), and the corrected optimal estimate ŷ(t_2)]
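The correction just described is precision-weighted averaging of two Gaussians, equivalent to a single Kalman update. The numbers below are hypothetical stand-ins for the sextant and GPS fixes.

```python
# Fusing two independent Gaussian position estimates.
z1, var1 = 10.0, 4.0      # first fix and its variance
z2, var2 = 12.0, 1.0      # second, more trusted measurement

# The "gain" is the weight placed on the new measurement.
K = var1 / (var1 + var2)
y_hat = z1 + K * (z2 - z1)
var_new = (1.0 - K) * var1  # equals var1*var2/(var1+var2)

print(y_hat, round(var_new, 3))  # 11.6 0.8
```

Note the fused variance (0.8) is smaller than either input variance, exactly the property the slide states.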

34 Assume that the system variables, represented by the vector x, are governed by the equation x_{k+1} = A x_k + w_k, where w_k is random process noise and the subscripts on the vectors represent the time step. Example: a spacecraft accelerating with random bursts of gas from its reaction-control thrusters. The vector x might consist of position p and velocity v.

35 The system equation would then be
  [p_{k+1}; v_{k+1}] = [1 T; 0 1] [p_k; v_k] + [T²/2; T] a_k
where a_k is the random time-varying acceleration and T is the time between step k and step k+1.

36
- The system was simulated on a computer with random bursts of acceleration with a standard deviation of 0.5 ft/s²
- The position was measured with an error of 10 ft (one standard deviation)
- Software used: MATLAB®
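A sketch of the same simulation in plain Python instead of MATLAB: state x = [position, velocity], acceleration bursts with σ = 0.5 ft/s², position measured with σ = 10 ft (the slide's figures); the time step T = 0.1 s and the initial conditions are assumptions.

```python
import random

T = 0.1
A = [[1.0, T], [0.0, 1.0]]       # x_{k+1} = A x_k + G a_k
G = [T * T / 2.0, T]
sig_a, sig_z = 0.5, 10.0
Q = [[sig_a**2 * G[i] * G[j] for j in range(2)] for i in range(2)]
R = sig_z ** 2

def mat_mul(M, N):               # 2x2 matrix product
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

rng = random.Random(7)
x = [0.0, 100.0]                 # true position (ft) and velocity (ft/s)
x_hat = [0.0, 90.0]              # deliberately wrong initial estimate
P = [[100.0, 0.0], [0.0, 100.0]]

for _ in range(600):             # 60 s of flight
    # Simulate the truth and a noisy position measurement
    a = rng.gauss(0.0, sig_a)
    x = [x[0] + T * x[1] + G[0] * a, x[1] + G[1] * a]
    z = x[0] + rng.gauss(0.0, sig_z)
    # Predict
    x_hat = [x_hat[0] + T * x_hat[1], x_hat[1]]
    At = [[A[j][i] for j in range(2)] for i in range(2)]
    P = mat_mul(mat_mul(A, P), At)
    P = [[P[i][j] + Q[i][j] for j in range(2)] for i in range(2)]
    # Update with H = [1, 0]: only the position is observed
    S = P[0][0] + R
    K = [P[0][0] / S, P[1][0] / S]
    resid = z - x_hat[0]
    x_hat = [x_hat[0] + K[0] * resid, x_hat[1] + K[1] * resid]
    P = [[P[i][j] - K[i] * P[0][j] for j in range(2)] for i in range(2)]

pos_err = abs(x_hat[0] - x[0])
vel_err = abs(x_hat[1] - x[1])
print(round(pos_err, 1), round(vel_err, 2))
```

The filter's position error ends well under the 10 ft measurement error, and the unmeasured velocity is estimated as a by-product, which is the point of the example.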

37 [Figure: simulation results]

38
- Temporal models use state and sensor variables replicated over time
- With the Markov and stationarity assumptions, we need only:
  - a transition model, P(X_t | X_{t-1})
  - a sensor model, P(E_t | X_t)
- Tasks are filtering, prediction, smoothing, and most likely sequence; all done recursively with constant cost per time step

39
- Hidden Markov models have a single discrete state variable; used for speech recognition
- Kalman filters allow n continuous state variables with linear-Gaussian dynamics; the update costs O(n³)

40 References
- Stuart Russell and Peter Norvig (2003), Artificial Intelligence: A Modern Approach, 2/e, Prentice Hall, ISBN 0-13-790395-2
- Uncertain Reasoning over Time, CMSC 25000: Artificial Intelligence, University of Chicago, February 22, 2007
- Hidden Markov Models, CS294: Practical Machine Learning, UC Berkeley, October 23, 2006
41
- Probabilistic Reasoning over Time, CS570, KAIST, 2004
- Probabilistic Reasoning over Time, CMSC 421, University of Maryland, Fall 2005
- Introduction to Hidden Markov Chains, CS 224S, Stanford University
- Dan Simon, Kalman Filtering, Innovatia Software, 2009
- Wikipedia: Kalman filter, Hidden Markov model, White noise

