Presentation on theme: "ECMWF Data Assimilation Training Course - Kalman Filter Techniques"— Presentation transcript:
1 ECMWF Data Assimilation Training Course - Kalman Filter Techniques Mike Fisher
2 Kalman Filter – Derivation Consider a general linear analysis:where y is a vector of observations, xb is a background. K and L are matrices.Suppose also that we have an operator H that takes us from model space to observation space.Assume this can be done without error, so that:We require that, if y and xb are error-free, the analysis should also be error-free:I.e.
3 Kalman Filter – Derivation Consider any analysis of the form:Define errors,Then:Where is representativeness error.( is often ignored.)Assume that the errors are unbiased.
4 Kalman Filter – Derivation Assume that is uncorrelated withThe covariance matrix for the analysis error is:
5 Kalman Filter – Derivation Use the following matrix calculus identities:To get:
6 Kalman Filter – Derivation Minimum variance =>I.e.This is the Kalman gain matrix
7 Kalman Filter – Derivation The Kalman filter consists of two sets of equations.The first set define the minimum–variance, linear analysis, and its covariance matrix:where:
8 Kalman Filter – Derivation The second set of equations describe how to propagate the state and the covariance matrix so that they can be used for the background for the next cycle of analysis.For the state, we have:where is a linear model.The equation for the error is:where is the model error.
9 Kalman Filter – Derivation Assume that model error is unbiased, and uncorrelated with analysis error, and form the covariance matrix:I.e.
10 Kalman Filter – Derivation The Kalman Filter Equations:
11 Extended Kalman Filter (EKF) The Extended Kalman Filter is an ad hoc generalization of the Kalman filter to weakly non-linear systems.The forecast model and observation operators are allowed to be non-linear:The matrices and in the equations for and are the linearized about andNB: The EKF requires tangent linear and adjoint codes!The Iterated Extended Kalman Filter (IEKF) repeats the linearization of Hk as soon as a better estimate for the state is available – much like incremental 4dVar.
12 Kalman Filter for Large Dimensions The Kalman filter is impractical for large dimension systems.It requires us to handle matrices of dimension N×N, where N is the dimension of the state vector.This is out of the question if N~106.Propagating the covariance matrix using:requires N integrations of the tangent linear model.Even the matrix multiplies required to construct are prohibitively expensive.A range of reduced-rank, approximate Kalman filters have been developed for use in large systems.
13 Reduced-Rank MethodsReduced-rank methods approximate the Kalman filter by restricting the covariance equations to a small subspace.Suppose we can write , where is N×K, with K small (e.g. K~100).The Kalman gain becomes :The initial in this equation means that the analysis error covariance is also restricted to the subspace:
14 Reduced-Rank MethodsHence, the covariance may be propagated using only K integrations of the linear model:We can then project onto a new subspace to generate an approximate covariance matrix for use in the next analysis cycle.
15 Reduced-Rank MethodsIt is important to remember that reduced-rank Kalman filters are approximations to the full Kalman filter.They are not optimal.How sub-optimal are they, e.g. compared to 4dVar?The jury is still out!A particular defect is the leading in:This means that the analysis increment is restricted to lie in the space spanned by the columns ofThis is sometimes called the “rank problem”.
16 Reduced-Rank MethodsAnother consequence of approximating by a low-rank matrix is that spurious long-range correlations are produced.Example:Suppose there is a spurious long-range correlation between Antarctica and Europe.The analysis will find it difficult to generate increments over Antarctica, since these will contradict the observations over Europe.More generally, the analysis will not have enough available degrees of freedom to allow it to fit all the observations (100 degrees of freedom v. 105 obs!).Two ways around this problem are:Local analysis (e.g. Evensen, 2003, Ocean Dynamics pp )Schur product (e.g. Houtekamer and Mitchell, 2001, MWR pp )
17 Ways Around the Rank Problem Local analysis solves the analysis equations independently for each gridpoint (or for each of a set of regions).Each analysis uses only observations that are local to the gridpoint (or region).In this way, the analysis at each gridpoint (or region) is not influenced by distant observations.The global analysis is no longer a linear combination of the spanning vectors.The method acts to vastly increase the effective rank of the covariance matrix.The analysis is sub-optimal because, at each gridpoint, only a subset of available information is used.
18 Ways Around the Rank Problem The Schur product of two matrices, denoted , is the element-wise product:Spurious, long-range correlations may be removed from by forming the Schur product of the covariance matrix with a matrix representing a decaying function of distance.The modified covariance matrix is never explicitly formed (it is too big). Rather, the method deals with terms such asThe Schur product also has the effect of vastly increasing the effective rank of the covariance matrix.Choosing the product function is non-trivial. It is easy to modify the correlations in undesirable ways. E.g. balance relationships may be adversely affected.
19 Ensemble MethodsEnsemble Kalman filters are reduced-rank Kalman filters that construct their covariance matrices as sample covariance matrices:where the index, i, refers to sample (ensemble) member.is a sample (ensemble) of background states whose sample covariance matrix is an estimate of the true background error covariance matrix.
20 Ensemble MethodsThe terms involving in the analysis equation are represented as sample error covariance matrices:It is never necessary to explicitly form the N×N background error covariance matrix.No tangent linear or adjoint observation operators or models are required.
21 Ensemble MethodsThe sample of background states is generated from a sample of analysis states, :where is a noise process with covariance matrix Qk+1.NB: The full nonlinear model is used to propagate the states.The random sample , may be generated by perturbing the observations with random noise drawn from the covariance matrix of observation error (Burgers et al., 1998, MWR pp ).This method is similar to the analysis-ensemble method for generating Jb statistics.
22 Ensemble MethodsAdding noise to the observations results in a small amount of additional sampling noise.This additional noise is avoided in the Ensemble Adjustment Kalman Filter (EAKF, Anderson 2001, MWR ) and the Ensemble Transform Kalman Filter (ETKF, Bishop et al. 2001, MWR ).
23 Ensemble MethodsThe ensemble adjustment Kalman filter avoids the need to add noise by implicitly calculating a matrix A, such that:andThe ensemble transform Kalman filter calculates T such that Va represents an analysis sample in:These methods can be more accurate than the perturbed-observation method, but they make heavier demands on the linearity of the underlying system, and on the Gaussian assumption for the statistics.
24 Other Low-Rank Methods Ensemble methods are popular and attractive because they don’t require adjoint or tangent linear codes. However, a random basis is unlikely to be optimal.Singular vectors, bred modes, etc. can be used to define deterministic subspaces for reduced-rank Kalman filtering that attempt to capture important aspects of covariance evolution.ECMWF RRKF (R.I.P.)defined a subspace that evolved into the leading eigenvectors of the forecast error covariance matrix at day 2.SEEK, SEIK, SSEIK, SEPLK (Pham et al, 1998)a plethora of evolving/partially-evolving subspaces and a plague of acronyms!Reduced Order Kalman Filter (Farrell and Ioannou, 2001)uses model-reduction techniques to define an optimal subspace.
25 Non-Gaussian MethodsParticle filters approximate the forecast pdf by a discrete distribution:An ensemble of forecasts, x(1)...x(K) is run. Each member of the ensemble has an associated weight, w(i).When an observation is available, the weights are adjusted using Bayes’ theorem. E.g:Eventually, the weights for some members become tiny.These members are dropped from the ensemble, and replaced by new, more probable members.
27 Non-Gaussian MethodsParticle filters work well for highly-nonlinear, low- dimensional systems.Successful applications include missile tracking, computer vision, etc. (see the book “Sequential Monte Carlo Methods in Practice”, Doucet, de Freitas, Gordon (eds.), 2001)van Leeuwen (2003) has successfully applied the technique for an ocean model.The main problem to be overcome is that, for a large- dimensional system, with lots of observations, almost any forecast will contradict an observation somewhere on the globe.=> Every cycle, unless the ensemble is truly enormous, all the particles (forecasts) are highly unlikely (given the obseravtions).van Leeuwen has recently suggested methods that may get around this problem.
28 Non-Gaussian MethodsEstimated initial location of robotWhere am I?Actual initial location of robotfrom: Fox et al. 1999, proc 16th National Conference on Artificial Intelligence
29 Non-Gaussian Methodsfrom: Fox et al. 1999, proc 16th National Conference on Artificial Intelligence
30 The Non-Sequential Approach All the preceding is based on the sequential (recursive) view of the filtering problem:An optimal estimate for step k+1 is produced using only the state and covariance matrices from step k.All the information from earlier steps is brought to the present step via the covariance matrices.The advantage of the sequential approach is that we don’t need to go back any further than the previous step in order to determine the current analysis.The disadvantage is that we must explicitly propagate the covariance matrices.For very large systems, the matrices are so enormous that a non-sequential approach may be preferable.
31 Equivalence of Kalman Smoother and 4dVar Suppose we want to produce the optimal estimate of the states x0...xK, at steps 0...K, given observations y0...yK at steps 0...K, and a background state xb at step 0.Assuming errors are Gaussian, the probability of x0...xK given y0...yK and xb is:
32 Equivalence of Kalman Smoother and 4dVar Taking the logarithm gives us the weak-constraint 4dVar cost function:The maximum likelihood solution is given by the minimum of the cost function.For Gaussians, this is also the minimum-variance solution.Hence, at step K, weak-constraint 4dVar gives the same (minimum-variance) solution as the Kalman filter.
33 Equivalence of Kalman Smoother and 4D-Var The solution of the minimization problem differs from the Kalman filter solution at steps 0…K-1.The 4dVar solution for step k is optimal with respect to observations at steps 0…K.The Kalman filter solution at step k is optimal only with respect to observations at steps 0…k.I.e. weak constraint 4D-Var is equivalent to the Kalman smoother.A purely algebraic proof of this equivalence is given by Ménard and Daley (1996, Tellus 48A, ). (See also Li and Navon (2001) QJRMS, )This proof shows that the equivalence does not depend on the statistical assumptions made in formulating the analysis problem.
34 The Non-Sequential Approach So, to determine the optimal state at step K, given observations at steps 0...K, and a background at step 0, we can:Either: run a sequential Kalman filter, starting from the background at step 0, and updating the N×N covariance matrices at each step.Or: minimize the weak-constraint 4dVar cost function using all observations for steps 0…K.The sequential approach is impractical for large N.In principle, the 4dVar approach becomes impractical as K becomes large.What may save 4dVar is the limited memory of the Kalman filter.The analysis is insensitive to observations in the distant past=> We can minimize over steps K-p...K, instead of 0...K.
35 Martin Leutbecher’s “Planet L95” EKF (Lorenz, 1995, ECMWF Seminar onPredictability,and Lorenz and Emanuel, 1998)unit time ~ 5 daysChaotic system: 13 positive Lyapunov exponents.The largest exponent corresponds to a doubling time of 2.1 days.
36 Demonstration of Long-Window 4dVar for the L95 Toy Problem Weak-constraint 4dVar was run for 230 days, with window lengths of 1-10 days.One cycle of analysis performed every 6 hours.NB: Analysis windows overlap.Obs every 6h at 3 out of every 5 grid-points.First guess was constructed from the overlapping part of the preceding cycle, plus a 6-hour forecast:Quadratic cost function!No background term in the cost function!analysisfc
37 Mean RMS Analysis Error RMS error at end of 4dVar window.NB: No background term!RMS error for OIRMS error for EKF
38 Analysis experiments started with/without satellite data on 1st August 2002 More EvidenceMemory of the initial statedisappears after approx. 7 daysfrom: Graeme Kelly
39 Analysis experiment started with satellite data reintroduced on 16th August 2005 Limited MemoryMemory of the initial statedisappears after approx. 3 days
40 Analysis experiment started with satellite data reintroduced on 16th August 2005 Limited MemoryMemory of the initial statedisappears after approx. 3 days
41 Long-Window, Weak-Constraint 4dVar Long window, weak constraint 4dVar is equivalent to the Kalman filter, provided we make the window long enough.For NWP, 3-10 days is long enough.Long-window, weak constraint 4dVar is feasible (but expensive).The resulting Kalman filter is full-rank.
42 SummaryThe Kalman filter is the optimal (minimum-variance) analysis for a linear system.For weakly nonlinear systems, the extended Kalman filter can be used, but it needs adjoints and tangent linear models and observation operators.Ensemble methods are relatively easy to develop (no adjoints required), but little rigorous investigation of how well they approximate a full-rank Kalman filter has been carried out.Particle filters are interesting, but it is not clear how useful they are for large-dimension systems.For large systems, the covariance matrices are too big to be handled.We must either use low-rank approximations of the matrices (and risk destroying the advantages of the Kalman filter)Or use the non-sequential approach (weak-constraint 4dVar).