# Probabilistic Reasoning over Time

Probabilistic Reasoning over Time
Russell and Norvig, AIMA: Chapter 15, Part B (15.3, 15.4). Presented to: Prof. Dr. S. M. Aqil Burney. Presented by: Zain Abbas (MSCS-UBIT).

Agenda
Temporal probabilistic agents; inference (filtering, prediction, smoothing and most likely explanation); hidden Markov models; Kalman filters; dynamic Bayesian networks.

Stochastic (Random) Process
A process that evolves in space or time in accordance with some probability distribution. In the simplest possible case ("discrete time"), a stochastic process amounts to a sequence of random variables.

Markov Chain
A stochastic process (family of random variables) $\{X_n, n = 0, 1, 2, \ldots\}$ that takes on a finite or countable number of possible values. If $X_n = i$, the process is said to be in state $i$ at time $n$. Whenever the process is in state $i$, there is a fixed probability $P_{ij}$ that it will next be in state $j$. Formally: $P\{X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_1 = i_1, X_0 = i_0\} = P_{ij}$ for all states $i_0, i_1, \ldots, i_{n-1}, i, j$ and all $n \ge 0$.

Hidden Markov Model
Set of states: $\{s_1, s_2, \ldots, s_N\}$. The process moves from one state to another, generating a sequence of states. Markov chain property: the probability of each subsequent state depends only on the previous state. The states are not visible, but each state randomly generates one of $M$ observations (or visible states) $\{v_1, v_2, \ldots, v_M\}$.

Hidden Markov Model
To define a hidden Markov model, the following probabilities have to be specified: a matrix of transition probabilities $A = (a_{ij})$ where $a_{ij} = P(s_i \mid s_j)$; a matrix of observation probabilities $B = (b_i(v_m))$ where $b_i(v_m) = P(v_m \mid s_i)$; and a vector of initial probabilities $\pi = (\pi_i)$ where $\pi_i = P(s_i)$.

HMM (Graphical View)
[Figure: hidden Markov model unfolded in time]

Summary of the Concept
Markov chain process; output process.

Earlier Example
Transition matrix $T_{ij} = P(X_t = j \mid X_{t-1} = i)$. Sensor matrix $O_1$ for $U_1 = \text{true}$.

Messages as column vectors
Forward and backward messages as column vectors:
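
For reference, here is how these two updates are usually written in AIMA's matrix notation. This is a sketch under that assumption, since the formulas are not reproduced in this transcript. $\mathbf{T}$ is the transition matrix, $\mathbf{O}_t$ is the diagonal sensor matrix for the evidence at time $t$, $\mathbf{f}$ and $\mathbf{b}$ are the forward and backward messages, and $\alpha$ is a normalizing constant.

```latex
\mathbf{f}_{1:t+1} = \alpha \, \mathbf{O}_{t+1} \, \mathbf{T}^{\top} \, \mathbf{f}_{1:t}
\qquad\qquad
\mathbf{b}_{k+1:t} = \mathbf{T} \, \mathbf{O}_{k+1} \, \mathbf{b}_{k+2:t}
```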

Messages as column vectors
Can avoid storing all forward messages in smoothing by running forward algorithm backwards:
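
A minimal Python sketch of this idea follows. It assumes the forward update $\mathbf{f}_{1:t+1} = \alpha\, \mathbf{O}_{t+1} \mathbf{T}^{\top} \mathbf{f}_{1:t}$ and its inversion $\mathbf{f}_{1:t} = \alpha'\, (\mathbf{T}^{\top})^{-1} \mathbf{O}_{t+1}^{-1} \mathbf{f}_{1:t+1}$ from AIMA. The numeric values (transition 0.7/0.3, sensor 0.9/0.2, uniform prior) are taken from AIMA's umbrella example and are an assumption here, because the matrices are not shown in this transcript. All function names are illustrative.

```python
def normalize(v):
    """Scale a vector so that its entries sum to 1."""
    s = sum(v)
    return [x / s for x in v]

def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Assumed umbrella-world values: T[i][j] = P(X_t = j | X_{t-1} = i),
# states ordered (rain, no rain); O_TRUE is the sensor matrix for U = true.
T = [[0.7, 0.3], [0.3, 0.7]]
O_TRUE = [[0.9, 0.0], [0.0, 0.2]]

def forward_step(f, O):
    """f_{1:t+1} = alpha * O_{t+1} * T^T * f_{1:t}"""
    return normalize(mat_vec(O, mat_vec(transpose(T), f)))

def forward_step_reversed(f_next, O):
    """f_{1:t} = alpha' * (T^T)^{-1} * O_{t+1}^{-1} * f_{1:t+1}"""
    undo_sensor = mat_vec(inverse_2x2(O), f_next)
    return normalize(mat_vec(inverse_2x2(transpose(T)), undo_sensor))

f0 = [0.5, 0.5]                      # assumed uniform prior
f1 = forward_step(f0, O_TRUE)        # after one umbrella observation
f2 = forward_step(f1, O_TRUE)        # after two umbrella observations
f1_recovered = forward_step_reversed(f2, O_TRUE)
```

With these assumed values, `f1` is approximately (0.818, 0.182) and `f2` approximately (0.883, 0.117). `f1_recovered` matches `f1`, so a smoothing pass can regenerate earlier forward messages from the last one instead of storing them all.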

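Example
[Figure: state diagram with hidden states Low and High, observations Rain and Dry, and the transition and observation probabilities listed on the next slide]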

Example
Two states: 'Low' and 'High' atmospheric pressure. Two observations: 'Rain' and 'Dry'.
Transition probabilities: P('Low'|'Low')=0.3, P('High'|'Low')=0.7, P('Low'|'High')=0.2, P('High'|'High')=0.8.
Observation probabilities: P('Rain'|'Low')=0.6, P('Dry'|'Low')=0.4, P('Rain'|'High')=0.4, P('Dry'|'High')=0.6.
Initial probabilities: say P('Low')=0.4, P('High')=0.6.

Calculation of probabilities
Suppose we want to calculate the probability of a sequence of observations in our example, {'Dry', 'Rain'}. Consider all possible hidden state sequences:
P({'Dry','Rain'}) = P({'Dry','Rain'}, {'Low','Low'}) + P({'Dry','Rain'}, {'Low','High'}) + P({'Dry','Rain'}, {'High','Low'}) + P({'Dry','Rain'}, {'High','High'})

Calculation of probabilities
The first term can be calculated as: P({'Dry','Rain'}, {'Low','Low'}) = P({'Dry','Rain'} | {'Low','Low'}) * P({'Low','Low'}) = P('Dry'|'Low') * P('Rain'|'Low') * P('Low') * P('Low'|'Low') = 0.4 * 0.6 * 0.4 * 0.3 = 0.0288
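
The whole sum can be checked by brute-force enumeration, as in the sketch below. It uses the Low/High example's probabilities and assumes P('Dry'|'High') = 0.6, so that the observation probabilities for 'High' sum to 1. The dictionary layout and function names are illustrative only.

```python
from itertools import product

states = ("Low", "High")
initial = {"Low": 0.4, "High": 0.6}
# transition[prev][nxt] = P(nxt | prev)
transition = {
    "Low": {"Low": 0.3, "High": 0.7},
    "High": {"Low": 0.2, "High": 0.8},
}
# emission[state][obs] = P(obs | state); P(Dry | High) = 0.6 is an assumption
emission = {
    "Low": {"Rain": 0.6, "Dry": 0.4},
    "High": {"Rain": 0.4, "Dry": 0.6},
}

def joint_probability(observations, hidden):
    """P(observations, hidden) for one specific hidden state sequence."""
    prob = initial[hidden[0]] * emission[hidden[0]][observations[0]]
    for t in range(1, len(observations)):
        prob *= transition[hidden[t - 1]][hidden[t]]
        prob *= emission[hidden[t]][observations[t]]
    return prob

def sequence_probability(observations):
    """Sum the joint probability over every possible hidden sequence."""
    return sum(
        joint_probability(observations, hidden)
        for hidden in product(states, repeat=len(observations))
    )

obs = ("Dry", "Rain")
terms = {h: joint_probability(obs, h) for h in product(states, repeat=2)}
total = sequence_probability(obs)
```

Under these assumptions the four terms are 0.0288, 0.0448, 0.0432 and 0.1152, which gives P({'Dry','Rain'}) = 0.232. Enumeration grows exponentially with the sequence length, which is why the forward algorithm is used for longer sequences.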

Kalman Filters
The system state cannot be measured directly; it needs to be estimated "optimally" from measurements.
[Figure: block diagram. External controls and system error sources act on a black-box system whose state is desired but not known. Measuring devices, subject to measurement error sources, produce observed measurements, from which the Kalman filter produces an optimal estimate of the system state.]

What is a Kalman Filter?
A set of mathematical equations forming an iterative, recursive process. It is an optimal data processing algorithm under certain criteria: for a linear system and white Gaussian errors, the Kalman filter is the "best" estimate based on all previous measurements. It estimates past, present and future states.

White Gaussian Noise
White noise is a random signal (or process) with a flat power spectral density. In other words, the signal contains equal power within a fixed bandwidth at any center frequency. "Gaussian" means that the noise values are normally distributed.

Optimal
Optimality depends on the criteria chosen to evaluate performance. Under certain assumptions (linear data, Gaussian model), the KF is optimal with respect to virtually any criterion that makes sense.

Recursive
A Kalman filter only needs information from the previous state, and the estimate is updated at each iteration. Older data can be discarded, which saves computation capacity and storage.

Variables
In order to use the Kalman filter to estimate the internal state of a process given only a sequence of noisy observations, one must model the process in accordance with the framework of the Kalman filter. This means specifying the matrices $F_k$, $H_k$, $Q_k$, $R_k$, and sometimes $B_k$, for each time step $k$.

Variables
- $x_k$ = state vector, the process to examine
- $w_k$ = process noise: white, Gaussian, mean 0, covariance matrix $Q$
- $v_k$ = measurement noise: white, Gaussian, mean 0, covariance matrix $R$, uncorrelated with $w_k$
- $S_k$ = covariance of the innovation (residual)
- $K_k$ = Kalman gain matrix
- $P_k$ = covariance of the prediction error
- $z_k$ = measurement of the system state

Equations

More Equations
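
For reference, the standard discrete-time Kalman filter equations can be written with the matrices $F_k$, $H_k$, $Q_k$, $R_k$, $B_k$ and the quantities $S_k$, $K_k$, $P_k$, $z_k$ named on the Variables slides. The slides' own rendering is not reproduced in this transcript, so the exact form below is an assumption. The control input $u_k$, the innovation $\tilde{y}_k$ and the subscript convention $k \mid k-1$ (estimate at step $k$ given measurements up to $k-1$) are introduced here for illustration.

```latex
% Predict
\hat{x}_{k \mid k-1} = F_k \, \hat{x}_{k-1 \mid k-1} + B_k \, u_k
P_{k \mid k-1} = F_k \, P_{k-1 \mid k-1} \, F_k^{\top} + Q_k

% Update
\tilde{y}_k = z_k - H_k \, \hat{x}_{k \mid k-1}
S_k = H_k \, P_{k \mid k-1} \, H_k^{\top} + R_k
K_k = P_{k \mid k-1} \, H_k^{\top} \, S_k^{-1}
\hat{x}_{k \mid k} = \hat{x}_{k \mid k-1} + K_k \, \tilde{y}_k
P_{k \mid k} = (I - K_k H_k) \, P_{k \mid k-1}
```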

Kalman gain
Relates the new estimate to the most certain of the previous estimates. Large measurement noise -> small gain. Large system noise -> large gain. System and measurement noise unchanged -> steady-state Kalman filter.
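
These gain relationships can be illustrated with a scalar sketch. It assumes a hypothetical one-dimensional random-walk model ($x_k = x_{k-1} + w_k$, $z_k = x_k + v_k$) with constant process noise variance `q` and measurement noise variance `r`. The model, the function name and the numeric values are chosen for illustration and do not come from the slides.

```python
def steady_state_gain(q, r, iterations=1000):
    """Iterate the scalar Kalman recursion until the gain settles.

    q: process (system) noise variance, r: measurement noise variance.
    """
    p = 1.0  # arbitrary initial error variance
    k = 0.0
    for _ in range(iterations):
        p_pred = p + q                 # predict: uncertainty grows by q
        k = p_pred / (p_pred + r)      # Kalman gain
        p = (1.0 - k) * p_pred         # update: uncertainty shrinks
    return k

k_balanced = steady_state_gain(q=1.0, r=1.0)
k_noisy_sensor = steady_state_gain(q=1.0, r=100.0)   # large measurement noise
k_noisy_system = steady_state_gain(q=100.0, r=1.0)   # large system noise
```

In this sketch the gain converges to a constant when `q` and `r` do not change, which is the steady-state filter. The gain is smaller when the sensor is noisy and larger when the system is noisy.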

Kalman Filter
The Kalman filter has two distinct phases: Predict and Update. The predict phase uses the state estimate from the previous timestep to produce an estimate of the state at the current timestep. In the update phase, measurement information at the current timestep is used to refine this prediction to arrive at a new, (hopefully) more accurate state estimate, again for the current timestep.

Iterative calculations
Prediction: predict the state and the error covariance. Update: compute the Kalman gain, update the estimate with the new measurement, and update the error covariance. The filter then alternates between Predict and Update.

Example
Lost on a 1-dimensional line, with position $y(t)$. Assume Gaussian distributed measurements.

Example
Sextant measurement at $t_1$: mean = $z_1$ and variance = $\sigma^2_{z_1}$. The optimal estimate of position is $\hat{y}(t_1) = z_1$, and the variance of the error in the estimate is $\sigma^2_x(t_1) = \sigma^2_{z_1}$. If the boat is in the same position at time $t_2$, the predicted position is $z_1$.

Example
So we have the prediction $\hat{y}^-(t_2)$ and a GPS measurement $z(t_2)$ at $t_2$: mean = $z_2$ and variance = $\sigma^2_{z_2}$. We need to correct the prediction using the measurement to get $\hat{y}(t_2)$, which should lie closer to the more trusted measurement. Linear interpolation?

Example
The corrected mean, lying between the prediction $\hat{y}^-(t_2)$ and the measurement $z(t_2)$, is the new optimal estimate of position $\hat{y}(t_2)$. The new variance is smaller than either of the previous two variances.
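
The correction in this example can be sketched as a variance-weighted interpolation between the prediction and the measurement. The function name `fuse` and the numbers used (a sextant-based prediction of 10 with variance 4, a GPS measurement of 12 with variance 1) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def fuse(pred_mean, pred_var, meas_mean, meas_var):
    """Combine a Gaussian prediction with a Gaussian measurement (1-D)."""
    gain = pred_var / (pred_var + meas_var)          # weight on the measurement
    mean = pred_mean + gain * (meas_mean - pred_mean)
    var = (1.0 - gain) * pred_var
    return mean, var, gain

# Hypothetical values: prediction from the sextant fix, then a GPS reading.
y_pred, var_pred = 10.0, 4.0
z2, var_z2 = 12.0, 1.0
y_hat, var_hat, k = fuse(y_pred, var_pred, z2, var_z2)
```

With these values the gain is 0.8, so the corrected estimate is 11.6, closer to the more trusted GPS reading. The new variance is 0.8, smaller than both 4 and 1.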

Example – Accelerating Spacecraft
Assume that the system variables, represented by the vector $x$, are governed by the equation $x_{k+1} = A x_k + w_k$, where $w_k$ is random process noise and the subscripts on the vectors represent the time step. A spacecraft is accelerating with random bursts of gas from its reaction control system thrusters. The vector $x$ might consist of position $p$ and velocity $v$.

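Example – Accelerating Spacecraft
Taking the acceleration as constant over each step, the system equation would be given by

$$\begin{bmatrix} p_{k+1} \\ v_{k+1} \end{bmatrix} = \begin{bmatrix} 1 & T \\ 0 & 1 \end{bmatrix} \begin{bmatrix} p_k \\ v_k \end{bmatrix} + \begin{bmatrix} T^2/2 \\ T \end{bmatrix} a_k$$

where $a_k$ is the random time-varying acceleration, and $T$ is the time between step $k$ and step $k+1$.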

Example – Accelerating Spacecraft
The system was simulated on a computer with random bursts of acceleration that had a standard deviation of 0.5 ft/s². The position was measured with an error of 10 ft (one standard deviation). Software used: MATLAB®.
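
A comparable simulation can be sketched in plain Python rather than MATLAB. This is not the presenter's code. It assumes the position/velocity model $p_{k+1} = p_k + T v_k + \tfrac{T^2}{2} a_k$, $v_{k+1} = v_k + T a_k$, a position-only measurement, a time step of `T = 0.1` s, 600 steps, and a filter initialized at the true state. The time step, step count, seed and function name are assumptions made for illustration.

```python
import math
import random

def simulate_and_filter(steps=600, T=0.1, accel_std=0.5, meas_std=10.0, seed=0):
    """Return (RMS measurement error, RMS Kalman position error) in feet."""
    rng = random.Random(seed)
    # Process noise covariance Q = accel_std^2 * G G^T with G = [T^2/2, T].
    q = accel_std ** 2
    q00, q01, q11 = q * T**4 / 4, q * T**3 / 2, q * T**2
    r = meas_std ** 2                      # measurement noise variance

    p_true, v_true = 0.0, 0.0              # true position and velocity
    p_est, v_est = 0.0, 0.0                # estimate, started at the true state
    c00, c01, c11 = 0.0, 0.0, 0.0          # symmetric 2x2 estimate covariance

    sq_meas, sq_est = 0.0, 0.0
    for _ in range(steps):
        # Simulate the true system and a noisy position measurement.
        a = rng.gauss(0.0, accel_std)
        p_true += T * v_true + 0.5 * T * T * a
        v_true += T * a
        z = p_true + rng.gauss(0.0, meas_std)

        # Predict: x = F x, P = F P F^T + Q with F = [[1, T], [0, 1]].
        p_est += T * v_est
        m00 = c00 + 2 * T * c01 + T * T * c11 + q00
        m01 = c01 + T * c11 + q01
        m11 = c11 + q11

        # Update with H = [1, 0] (only position is measured).
        s = m00 + r                        # innovation variance
        k0, k1 = m00 / s, m01 / s          # Kalman gain
        resid = z - p_est                  # innovation
        p_est += k0 * resid
        v_est += k1 * resid
        c00 = (1 - k0) * m00
        c01 = (1 - k0) * m01
        c11 = m11 - k1 * m01

        sq_meas += (z - p_true) ** 2
        sq_est += (p_est - p_true) ** 2
    return math.sqrt(sq_meas / steps), math.sqrt(sq_est / steps)

rms_meas, rms_est = simulate_and_filter()
```

Under these assumptions the RMS error of the filtered position should come out well below the raw measurement error of about 10 ft. The exact figures depend on the seed and the assumed time step.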
