Kalman Filter: Bayes Interpretation


Kalman Filter: Bayes Interpretation. Study Guide for ES205. Yu-Chi Ho and Jonathan T. Lee, Jan. 15, 2001. Version 3.1, dated 2003-08-17.

Outline: Example; Problem Statement; Solution; Propagation of Distribution; Applications.

Example: How do we estimate the state x given i+1 observations z(1), …, z(i+1)? By averaging. The ith observation is z(i) = x + ε(i), where the observation noises ε(i) are i.i.d. The estimate with i+1 observations is the sample mean, x̂(i+1) = (z(1) + … + z(i+1)) / (i+1).

Example (cont.) New Estimate = Old Estimate + Confidence Factor × Correction Term. We can rewrite the simple average in this more meaningful form to show the principle of successive averaging as more and more observations are added. The correction term is also known as the innovation: it tells us the new information provided by the new observation that is not captured by the old estimate. It can also be shown that the correction terms are i.i.d. Note that the importance of the correction decreases as the number of observations increases, because certainty about the state grows with more observations.
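The successive-averaging principle described above can be sketched in a few lines of code (a minimal illustration, not taken from the slides; the function name and values are hypothetical):

```python
def recursive_average(observations):
    """Running mean computed recursively: new estimate = old estimate
    plus a shrinking gain times the innovation (correction term)."""
    xhat = 0.0
    for i, z in enumerate(observations):
        gain = 1.0 / (i + 1)             # confidence factor shrinks with i
        xhat = xhat + gain * (z - xhat)  # correction term = innovation
    return xhat

# The recursive form matches the batch average exactly.
zs = [2.0, 4.0, 6.0, 8.0]
print(recursive_average(zs))  # 5.0, same as sum(zs)/len(zs)
```

Note how the gain 1/(i+1) makes later corrections matter less, exactly as the slide argues.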

Example (cont.) Let P(i) denote the variance of the estimate after i observations. Then the recursions for x̂(i+1) and P(i+1) constitute the Kalman filter for this problem. We continue to manipulate the simple formula to show, intuitively, that the Kalman filter averages the observations in a more meaningful way.
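The equations on this slide were images in the original page; a standard reconstruction, assuming the noises ε(i) have common variance σ², is:

```latex
P(i) = \frac{\sigma^2}{i}, \qquad
\hat{x}(i+1) = \hat{x}(i) + \frac{P(i+1)}{\sigma^2}\bigl(z(i+1) - \hat{x}(i)\bigr)
             = \hat{x}(i) + \frac{1}{i+1}\bigl(z(i+1) - \hat{x}(i)\bigr).
```

The gain P(i+1)/σ² = 1/(i+1) recovers the successive-averaging form of the previous slide.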

Example (cont.) The Kalman filter is a sophisticated way to do averaging. P(i) becomes smaller as i grows, so the importance of the correction term decreases. Since we were able to obtain the Kalman filter equation by averaging, one can think of the Kalman filter as a more sophisticated way to do averaging. P becomes smaller as we take in more observations, because we are more certain of the value of the state.

Problem Statement: z = Hx + ε, where z is the observation, x is the state vector, H is the observation (scaling) matrix, and ε is the noise in the observation. Assume M⁻¹ is the confidence in the prior knowledge of x and R⁻¹ is the confidence in the measurement. The matrix H is a transformation of the state x; for example, we might be able to observe only the first component of the state vector x.

Solution: We want to know p(x|z); we know p(x) and p(z|x). The observation z is a linear combination of the state x and the noise ε. Since both the state and the noise are Gaussian, the distribution of z is also Gaussian, and so is p(x|z). Note that the variance of the observation z increases due to the measurement noise ε. In the conditional density p(z|x), x is assumed to be known exactly, so its variance M enters the variance of z but not that of z|x. The subscript on p indicates that different random variables have different distributions.

Solution (cont.) Let P = (M⁻¹ + HᵀR⁻¹H)⁻¹. Then p(x|z) is Gaussian with conditional mean x̂ = x̄ + PHᵀR⁻¹(z − Hx̄) and error covariance P. The conditional mean can be interpreted as the old estimate plus a confidence factor times the correction term. Note that P, the covariance of the error of the estimate, is smaller than the variance M based on prior knowledge alone: the observation provides additional information about the state, so we are more certain of what the actual state is.
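The static measurement update described above can be sketched with numpy (a hedged illustration; the values of M, H, R, and z are made up, not from the slides):

```python
import numpy as np

def measurement_update(xbar, M, z, H, R):
    """Combine prior N(xbar, M) with observation z = H x + eps, eps ~ N(0, R)."""
    Minv = np.linalg.inv(M)
    Rinv = np.linalg.inv(R)
    P = np.linalg.inv(Minv + H.T @ Rinv @ H)        # posterior covariance
    xhat = xbar + P @ H.T @ Rinv @ (z - H @ xbar)   # old estimate + gain * innovation
    return xhat, P

xbar = np.array([0.0, 0.0])
M = np.eye(2)                   # prior covariance
H = np.array([[1.0, 0.0]])      # observe only the first state component
R = np.array([[0.5]])           # measurement noise covariance
z = np.array([1.0])
xhat, P = measurement_update(xbar, M, z, H, R)
print(P[0, 0] < M[0, 0])  # True: variance shrinks in the observed direction
```

Only the observed component's variance shrinks here, since H reads out only the first state component.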

Least Squares Approach: Set the cost J to the sum of two quadratic terms, one penalizing deviation from the prior x̄ (weighted by M⁻¹) and one penalizing deviation from the observation z (weighted by R⁻¹); minimizing J yields the same estimate as before. This is another, deterministic, way to derive the same result via least squares; no probabilistic assumptions are made here. The goal of the first cost term is to minimize the difference between the estimate and the prior knowledge, whereas the goal of the second is to minimize the difference between the estimate and the observation; in other words, the estimate tries to stay close to both. Aside from the lack of time indexing, the equation on this slide is the same as the solution based on the Bayesian approach. This is the basic idea of the Kalman filter: first build a performance criterion with a physical interpretation; then set the partial derivative of J to zero to get the optimum estimate x̂; then introduce P to put the process in recursive form, which yields the recursive equation for x̂.
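Written out explicitly (a reconstruction using the symbols from the problem statement; the original slide equations were images), the cost and its minimizer are:

```latex
J(\hat{x}) = (\hat{x} - \bar{x})^{\mathsf T} M^{-1} (\hat{x} - \bar{x})
           + (z - H\hat{x})^{\mathsf T} R^{-1} (z - H\hat{x}),
```

and setting the gradient to zero gives

```latex
\frac{\partial J}{\partial \hat{x}} = 0
\;\Rightarrow\;
\hat{x} = \bigl(M^{-1} + H^{\mathsf T} R^{-1} H\bigr)^{-1}
          \bigl(M^{-1}\bar{x} + H^{\mathsf T} R^{-1} z\bigr)
        = \bar{x} + P H^{\mathsf T} R^{-1}(z - H\bar{x}),
```

with P = (M⁻¹ + HᵀR⁻¹H)⁻¹, matching the Bayesian solution term by term.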

Propagation of Distribution: x(i+1) = Φx(i) + w(i), where w(i) ~ N(0, Q). Instead of estimating a constant x, the state x(i) now changes with time according to this dynamic. Since the variance propagates in the recursive formulation, we must analyze the propagation of the distribution. The term w(i) is the disturbance to the system at time i; a good example of such a disturbance is the crosswind during the flight of an airplane. Note that the variance M(i+1) tends to increase after one time step due to the addition of the disturbance variance Q, but the effect of Φ may magnify or shrink it.
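The time-update (propagation) step can be sketched as follows (a hedged illustration; Φ and Q are made-up values, not from the slides):

```python
import numpy as np

def time_update(xhat, P, Phi, Q):
    """Propagate mean and covariance one step through x(i+1) = Phi x(i) + w(i)."""
    xbar = Phi @ xhat              # prior mean at time i+1
    M = Phi @ P @ Phi.T + Q        # prior covariance: magnified by Phi, grown by Q
    return xbar, M

Phi = np.array([[1.0, 1.0], [0.0, 1.0]])   # e.g. a constant-velocity model
Q = 0.1 * np.eye(2)
xhat = np.array([0.0, 1.0])
P = np.eye(2)
xbar, M = time_update(xhat, P, Phi, Q)
print(M[0, 0] > P[0, 0])  # True: variance increased after the time step
```

Here Φ couples position and velocity, so the position variance grows both through Φ and through Q, as the slide notes.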

Propagation of Distribution (cont.) Given the estimate from time i, the prior distribution of the state x(i+1) has mean x̄(i+1) = Φx̂(i) and variance M(i+1) = ΦP(i)Φᵀ + Q. Assuming we have x̂(i), the best estimate of the state at time i, and P(i), the variance of that estimator, these equations describe how the mean and variance of the distribution of the state propagate to time i+1 under the state dynamics.

Propagation of Distribution (cont.) Mean: x̂(i+1) = x̄(i+1) + P(i+1)HᵀR⁻¹(z(i+1) − Hx̄(i+1)). Variance: P(i+1) = (M(i+1)⁻¹ + HᵀR⁻¹H)⁻¹. Note that the variance shrinks once the observation has been taken, since M(i+1) is the variance prior to the estimation using observation i+1.
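One full predict/correct cycle, combining the propagation and update slides, can be sketched in scalar form (all numerical values below are illustrative assumptions, not from the slides):

```python
phi, q = 1.0, 0.2      # state transition and disturbance variance
h, r = 1.0, 0.5        # observation model and measurement noise variance

xhat, p = 0.0, 1.0     # estimate and variance at time i

# Time update: variance grows by q
xbar = phi * xhat
m = phi * p * phi + q

# Measurement update with z(i+1): variance shrinks below m
z = 0.8
p_new = 1.0 / (1.0 / m + h * h / r)
xhat_new = xbar + p_new * h / r * (z - h * xbar)

print(m > p, p_new < m)  # True True
```

The two inequalities are exactly the pattern the slides describe: the variance grows at the time step and shrinks at the observation.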

Least Squares Approach to Propagation of Distribution: Set J as before, with x̄ and M replaced by their time-indexed versions x̄(i+1) and M(i+1). Once again we derive the same formula using a least squares approach, showing the equivalence between the probabilistic and the deterministic approaches to estimation. This is the same as the stationary case, except that everything is indexed by time i. The first cost term penalizes the difference between the estimate and the prior knowledge, whereas the second penalizes the difference between the estimate and the observation; the estimate tries to stay close to both. Aside from the indexing, the equation on this slide is the same as the solution based on the Bayesian approach.

Least Squares to Propagation (cont.) Let P(i+1) be defined as before; then minimizing J yields the same estimate. The least squares derivation continues; note that the result is the same as the one from slide 13.

Graphical Demonstration: This demonstration shows, once again, that without any observation the variance of the state increases as time goes on, due to magnification by the state transition matrix Φ. Once an observation is made, the variance decreases. This process continues as more time passes and more observations are made.

Continuous Time Propagation: dx/dt = Fx(t) + w(t), z(t) = Hx(t) + ε(t), where w(t) ~ N(0, Q) and ε(t) ~ N(0, R). In continuous time the form of the model is the same as in the discrete case; the only difference is that, instead of a difference equation as in slide 11, a differential equation describes the dynamics of the system. Both the state variable and the observation are indexed by the continuous time t; the indexing emphasizes what is stationary and what changes with time. Note that the statistics of the disturbance w(t) and the observation noise ε(t) do not depend on time, so each of them forms a white noise process.

Continuous Time (cont.) Mirroring the discrete-time propagation, the new estimate equals the old estimate plus the correction term scaled by the confidence factor; in continuous time this is described by a differential equation. The variance term accounts for the variance from the system itself and the disturbance in the system, minus the certainty gained through the observation. Since both the estimator and the variance are described by differential equations, initial conditions are necessary to specify the constants when solving them.
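The differential equations described in words above can be written out (a standard reconstruction consistent with the discrete-time formulas; the slide's own equations were images):

```latex
\frac{d\hat{x}}{dt} = F\hat{x} + P H^{\mathsf T} R^{-1}\bigl(z - H\hat{x}\bigr),
\qquad
\frac{dP}{dt} = F P + P F^{\mathsf T} + Q - P H^{\mathsf T} R^{-1} H P,
```

with x̂(0) and P(0) given as initial conditions. The terms FP + PFᵀ + Q are the variance from the system and the disturbance; the subtracted term PHᵀR⁻¹HP is the certainty gained through the observation.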

Continuous Time (cont.) In the case of pure prediction, i.e., R → ∞: dX/dt = FX + XFᵀ + Q, where the capital X represents the variance of the state x. In other words, if the observations are useless, what can we say about the state as time evolves? As the differential equation for the variance shows, the variance grows larger and larger as time goes on, due to the lack of observations; we become less and less certain of the current state as the system evolves under the influence of the disturbance. R → ∞ captures the uncertainty of pure prediction: since R is the variance of the observation noise and R⁻¹ the confidence in the measurement, in pure prediction we learn nothing from the "observation," so its variance is infinite and R → ∞ is the reasonable limit.
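The pure-prediction variance growth can be checked numerically with a simple Euler integration of dX/dt = FX + XFᵀ + Q (a hedged sketch; F, Q, and the step size are illustrative assumptions):

```python
import numpy as np

F = np.array([[0.0, 1.0], [0.0, 0.0]])   # e.g. position driven by velocity
Q = 0.1 * np.eye(2)                      # disturbance intensity
X = np.eye(2)                            # initial state variance
dt = 0.01
for _ in range(100):                     # integrate over one time unit
    X = X + dt * (F @ X + X @ F.T + Q)   # no observation term: R -> infinity
print(np.trace(X) > 2.0)  # True: variance grows without observations
```

With no observation term to subtract, every step adds Q (and the Φ-like coupling through F), so the uncertainty only grows, as the slide argues.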

Applications: the Apollo program, to estimate the position of the spacecraft; the radar gun, to estimate the speed of a vehicle; GPS, to estimate the current location.

References: Bryson, A. E., Jr. and Y.-C. Ho, Applied Optimal Control: Optimization, Estimation, and Control, Taylor & Francis, 1975. Ho, Y.-C., Lecture Notes, Harvard University, 1997.