Kalman Filter: Bayes Interpretation


1 Kalman Filter: Bayes Interpretation
Study Guide for ES205
Yu-Chi Ho
Jonathan T. Lee
Jan. 15, 2001
Version: 3.1

2 Outline
Example
Problem Statement
Solutions
Propagation of Distribution
Applications

3 Example How do we estimate the state x given i+1 observations z(1), …, z(i+1)? By averaging. The ith observation is z(i) = x + eps(i), where the observation noises eps(i) are i.i.d. The estimate with i+1 observations is the sample mean: x_hat(i+1) = (1/(i+1)) [z(1) + … + z(i+1)].

4 Example (cont.) New Estimate = Old Estimate + Confidence Factor x Correction Term. Re-writing the simple average in recursive form, x_hat(i+1) = x_hat(i) + (1/(i+1)) [z(i+1) - x_hat(i)], shows the principle of successive averaging as more and more observations are added. The correction term z(i+1) - x_hat(i) is also known as the innovation: it is the new information provided by the new observation that is not already captured by the old estimate. It can also be shown that the innovations are i.i.d. Note that the importance of the correction decreases as the number of observations increases, because the certainty about the state grows with the number of observations.
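The recursive form can be checked against the batch average directly. A minimal sketch with made-up observation values:

```python
# Recursive averaging: each new observation corrects the old estimate,
# with a weight 1/(i+1) that shrinks as observations accumulate.

def recursive_average(observations):
    estimate = 0.0
    for i, z in enumerate(observations):
        # new estimate = old estimate + confidence factor * correction term
        estimate = estimate + (1.0 / (i + 1)) * (z - estimate)
    return estimate

zs = [2.0, 4.0, 6.0, 8.0]
print(recursive_average(zs))   # identical to the batch average
print(sum(zs) / len(zs))
```

The recursion never stores past observations, only the running estimate and the step count, which is what makes the filter form attractive.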

5 Example (cont.) Let P(i) = 1/i. Then
x_hat(i+1) = x_hat(i) + P(i+1) [z(i+1) - x_hat(i)], with P(i+1) = P(i) / (1 + P(i)).
The formulas for P(i+1) and x_hat(i+1) constitute the Kalman filter. We continue to manipulate the simple averaging formula to show, intuitively, that the Kalman filter averages the observations in a more meaningful way.

6 Example (cont.) The Kalman filter is a sophisticated way to do averaging.
P(i) becomes smaller as a function of i, so the correction term carries less and less weight. Since the Kalman filter equation can be obtained by averaging, one can think of the Kalman filter as a more sophisticated way to do averaging. P becomes smaller as more observations are taken in, because we grow more certain of the value of the state.
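The shrinking of P(i) can be demonstrated with a scalar sketch. Assuming unit observation-noise variance (so that P(i) = 1/i, an assumption, not stated on the slide):

```python
# Scalar Kalman filter for a constant state, assuming unit noise
# variance so that P(i) = 1/i.  P serves as both the estimator
# variance and the gain on the correction term.

def scalar_kf(observations):
    estimate, history = 0.0, []
    P = None                     # no prior: the first observation is taken as-is
    for z in observations:
        P = 1.0 if P is None else P / (1.0 + P)   # P(i+1) = P(i)/(1+P(i)) = 1/(i+1)
        estimate = estimate + P * (z - estimate)  # gain shrinks with each step
        history.append(P)
    return estimate, history

est, Ps = scalar_kf([2.0, 4.0, 6.0, 8.0])
print(est)   # same as the plain average
print(Ps)    # 1, 1/2, 1/3, 1/4: variance shrinks as more observations arrive
```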

7 Problem Statement z = Hx + eps, where z is the observation, x is the state vector, H is the scaling matrix, and eps is the noise in the observation. Assume M^-1 measures confidence in the prior knowledge of x, and R^-1 measures confidence in the measurement. The matrix H is a transformation of the state x: for example, we might be able to observe only the first component of the state vector x.

8 Solution Want to know: p(x|z). We know: x ~ N(x_bar, M) and eps ~ N(0, R), so z|x ~ N(Hx, R) and z ~ N(H x_bar, H M H^T + R).
The observation z is a linear combination of the state x and the noise eps. Since both the state and the noise are Gaussian, the distribution of z is also Gaussian, and so is p(x|z). Note that the variance of the observation z increases due to the measurement noise eps. In the conditional density p(z|x), x is treated as known, so the prior variance M contributes to the variance of z but not to the variance of z|x. The subscripts on p indicate that the different random variables have different distributions.

9 Solution (cont.) Let K = M H^T (H M H^T + R)^-1. Then x|z is Gaussian with conditional mean x_hat = x_bar + K (z - H x_bar) and covariance of error of the estimate P = M - K H M. The conditional mean can be interpreted as the old estimate + confidence factor x correction term. Note that the covariance of the error of the estimate, P, is smaller than the variance M based on prior knowledge alone. This shows that the observation provides additional information about the state: we are more certain of what the actual state is.
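A sketch of this measurement update, using the standard gain form K = M H^T (H M H^T + R)^-1 (the concrete numbers below are made up for illustration):

```python
# Static Bayesian measurement update:
#   gain            K     = M H^T (H M H^T + R)^-1
#   posterior mean  x_hat = x_bar + K (z - H x_bar)
#   posterior cov   P     = M - K H M  (smaller than the prior M)
import numpy as np

def measurement_update(x_bar, M, H, R, z):
    S = H @ M @ H.T + R                  # innovation covariance
    K = M @ H.T @ np.linalg.inv(S)       # confidence factor (gain)
    x_hat = x_bar + K @ (z - H @ x_bar)  # old estimate + gain * correction
    P = M - K @ H @ M                    # observation adds information
    return x_hat, P

x_bar = np.array([0.0, 0.0])
M = np.eye(2)
H = np.array([[1.0, 0.0]])               # observe only the first component
R = np.array([[0.5]])
z = np.array([1.2])
x_hat, P = measurement_update(x_bar, M, H, R, z)
print(x_hat)   # first component pulled toward the observation
print(P)       # P[0,0] < M[0,0]; unobserved component unchanged
```

Because H observes only the first component, only that component's variance shrinks; the second row of P stays at its prior value.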

10 Least Square Approach Set J = (x - x_bar)^T M^-1 (x - x_bar) + (z - Hx)^T R^-1 (z - Hx). Setting the partial derivatives of J to zero gives x_hat = (M^-1 + H^T R^-1 H)^-1 (M^-1 x_bar + H^T R^-1 z). The two terms of J are the cost factors.
Here is another, deterministic, way to derive the same results, via least squares; no probabilistic assumptions are made. The first cost term penalizes the difference between the estimate and the prior knowledge, whereas the second penalizes the difference between the estimate and the observation. In other words, the estimate tries to stay close to both the prior knowledge and the new observation. Aside from the lack of time indexing, the equation on this slide is the same as the solution based on the Bayesian approach. This is the basic idea of the Kalman filter: build a performance criterion from the physical interpretation, set the partial derivatives of J to zero to obtain the optimal estimate x_hat, introduce P to put the process in recursive form, and, after some manipulation, arrive at the recursive formula for x_hat.
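The claimed equivalence between the two derivations can be verified numerically. A sketch with made-up values, comparing the least-squares minimizer of J against the Bayesian conditional mean:

```python
# Check that the minimizer of
#   J = (x - x_bar)^T M^-1 (x - x_bar) + (z - Hx)^T R^-1 (z - Hx)
# agrees with the Bayesian form x_bar + K (z - H x_bar).
import numpy as np

x_bar = np.array([0.0, 0.0]); M = np.eye(2)
H = np.array([[1.0, 0.0]]);   R = np.array([[0.5]])
z = np.array([1.2])

# Least squares: dJ/dx = 0 gives
# (M^-1 + H^T R^-1 H) x = M^-1 x_bar + H^T R^-1 z
A = np.linalg.inv(M) + H.T @ np.linalg.inv(R) @ H
b = np.linalg.inv(M) @ x_bar + H.T @ np.linalg.inv(R) @ z
x_ls = np.linalg.solve(A, b)

# Bayesian form
K = M @ H.T @ np.linalg.inv(H @ M @ H.T + R)
x_bayes = x_bar + K @ (z - H @ x_bar)

print(np.allclose(x_ls, x_bayes))   # True: the two derivations coincide
```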

11 Propagation of Distribution
x(i+1) = F x(i) + w(i), where w(i) ~ N(0, Q). Instead of estimating a constant x, the state x(i) now changes with time according to the dynamics described on the slide. Since the variance propagates in the recursive formulation, we must analyze the propagation of the distribution. The term w(i) is the disturbance to the system at time i; a good example of the disturbance is the cross wind during the flight of an airplane. Note that the variance M(i+1) tends to increase after one time step due to the addition of the disturbance variance Q, but the effect of F may magnify or shrink it.

12 Propagation of Distribution (cont.)
Given the estimate from time i, the prior distribution of the state x(i+1) is N(F x_hat(i), M(i+1)), with M(i+1) = F P(i) F^T + Q. Assuming we have x_hat(i), the best estimate of the state at time i, and P(i), the variance of that estimator, these equations describe the propagation of the mean and variance of the distribution of the state at time i+1 under the state dynamics. In words, P(i) is the variance of the estimator at time i.

13 Propagation of Distribution (cont.)
Mean: x_hat(i+1) = F x_hat(i) + K(i+1) [z(i+1) - H F x_hat(i)], with K(i+1) = M(i+1) H^T (H M(i+1) H^T + R)^-1. Variance: P(i+1) = M(i+1) - K(i+1) H M(i+1). Note that the variance shrinks once the observation has been taken, since M(i+1) is the variance prior to the estimation using observation i+1.
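Putting propagation and update together gives one full discrete-time cycle. A sketch with made-up matrices and observations:

```python
# Full discrete-time Kalman cycle: predict (variance grows by Q),
# then update (variance shrinks via the gain).
import numpy as np

def kf_step(x_hat, P, z, F, Q, H, R):
    # propagation: M(i+1) = F P(i) F^T + Q
    x_pred = F @ x_hat
    M = F @ P @ F.T + Q
    # measurement update: variance shrinks after the observation
    K = M @ H.T @ np.linalg.inv(H @ M @ H.T + R)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = M - K @ H @ M
    return x_new, P_new

F = np.eye(2); Q = 0.1 * np.eye(2)
H = np.array([[1.0, 0.0]]); R = np.array([[0.5]])
x_hat, P = np.zeros(2), np.eye(2)
for z in ([1.0], [1.1], [0.9]):          # made-up observations of the first component
    x_hat, P = kf_step(x_hat, P, np.array(z), F, Q, H, R)
print(x_hat[0])   # pulled toward the observations near 1.0
print(P[0, 0])    # well below the prior variance 1.0
```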

14 Least Square Approach to Propagation of Distribution
Set J = (x - F x_hat(i))^T M(i+1)^-1 (x - F x_hat(i)) + (z(i+1) - Hx)^T R^-1 (z(i+1) - Hx). Once again, we derive the same formula using a least-squares approach, showing the equivalence between the probabilistic and the deterministic approaches to estimation. This is the same as the stationary case, except that everything is indexed by time i. The first cost term penalizes the difference between the estimate and the prior knowledge, whereas the second penalizes the difference between the estimate and the observation. In other words, the estimate tries to stay close to both the prior knowledge and the new observation. Aside from the indexing, the equation on this slide is the same as the solution based on the Bayesian approach.

15 Least Square to Propagation (cont.)
Let P(i+1) = (M(i+1)^-1 + H^T R^-1 H)^-1. Then x_hat(i+1) = F x_hat(i) + P(i+1) H^T R^-1 [z(i+1) - H F x_hat(i)]. The least-squares derivation continues; note that the result is the same as the one on slide 13.

16 Graphical Demonstration
This demonstration shows, once again, that without any observations the variance of the state increases as time goes on, due to magnification by the state transition matrix F. Once an observation is made, the variance decreases. This process continues as time passes and more observations are made.

17 Continuous Time of Propagation
dx/dt = F x(t) + w(t), z(t) = H x(t) + eps(t), where w(t) ~ N(0, Q) and eps(t) ~ N(0, R). In continuous time, the form of the model is the same as in the discrete case. The only difference is that instead of a difference equation, as on slide 11, we have a differential equation describing the dynamics of the system. Both the state variable and the observation are indexed by the continuous time t; the indexing emphasizes what is stationary and what changes with time. Note that the statistics of the disturbance w(t) and the observation noise eps(t) are independent of time, so each of them forms a white-noise process.

18 Continuous Time (cont.)
dx_hat/dt = F x_hat + P(t) H^T R^-1 [z(t) - H x_hat], dP/dt = F P + P F^T + Q - P H^T R^-1 H P, with initial conditions x_hat(0) and P(0). Mirroring what is done in the discrete-time propagation, the new estimate equals the old estimate plus the correction term scaled by the confidence factor; in continuous time this is described by a differential equation. The variance equation accounts for the variance from the system itself plus the disturbance, minus the amount of certainty gained through the observation. Since both the estimator and the variance are described by differential equations, initial conditions are needed to pin down the constants when solving them.
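The continuous-time variance equation can be integrated numerically. A scalar sketch using a simple Euler scheme and made-up constants (this is an illustration, not the lecture's own example):

```python
# Scalar continuous-time (Kalman-Bucy) variance equation,
#   dP/dt = F P + P F + Q - P H (1/R) H P,
# integrated by forward Euler.  With F = 0 and H = 1 the steady
# state satisfies Q = P^2 / R, i.e. P -> sqrt(Q * R).

def integrate_variance(P0, F, Q, H, R, dt=1e-3, T=5.0):
    P = P0
    for _ in range(int(T / dt)):
        dP = F * P + P * F + Q - P * H * (1.0 / R) * H * P
        P += dP * dt
    return P

P_final = integrate_variance(P0=1.0, F=0.0, Q=0.2, H=1.0, R=0.5)
print(P_final)   # settles near sqrt(0.2 * 0.5) ~= 0.316
```

The variance converges to a fixed point where the growth from the disturbance Q exactly balances the certainty gained through observation.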

19 Continuous Time (cont.)
In the case of pure prediction, i.e., R -> infinity: dX/dt = F X + X F^T + Q. The capital X represents the variance of the state x. In other words, if the observations are useless, what can we say about the state as time evolves? As the differential equation for the variance shows, the variance becomes larger and larger as time goes on, due to the lack of observations. Thus, we are less certain of the current state as the system evolves over time under the influence of the disturbance. R -> infinity captures the uncertainty of pure prediction: since R is the variance of the observation noise, and in pure prediction we learn nothing from observation, the variance of such an "observation" is infinite, so letting R -> infinity is reasonable.
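A scalar sketch of the pure-prediction case, with made-up constants, showing that without the observation term the variance only grows:

```python
# Pure prediction (R -> infinity): the observation term drops out and
# the variance equation reduces to dX/dt = F X + X F + Q.

def predict_variance(X0, F, Q, dt=1e-3, T=2.0):
    X = X0
    for _ in range(int(T / dt)):
        X += (F * X + X * F + Q) * dt   # no observation term: X never shrinks
    return X

X_end = predict_variance(X0=0.5, F=0.1, Q=0.2)
print(X_end)   # strictly larger than X0: certainty decays without observations
```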

20 Applications
Apollo program: to estimate the position of the spacecraft.
Radar gun: to estimate the speed of a vehicle.
GPS: to estimate the current location.

21 References
Bryson, A. E., Jr., and Y.-C. Ho, Applied Optimal Control: Optimization, Estimation, and Control, Taylor & Francis, 1975.
Ho, Y.-C., Lecture Notes, Harvard University, 1997.

