An introduction to data assimilation for the geosciences

Ross Bannister, Amos Lawless, Alison Fowler
National Centre for Earth Observation
School of Mathematics and Physical Sciences, University of Reading

(A) Introductory lecture
(B) Variational intro + practical
(C) Kalman filter + practical
DA ‘surgery’

NCEO Early Career Science Conference, 16th – 18th April 2012: Introduction to data assimilation

What is data assimilation?

What is the temperature, T, of the fluid inside each jar (A and B) as a function of time, t?

Measurements at t = 0: thermometer $y_A(0)$ (in-situ) and radiometer $y_B(0)$ (remotely sensed).

Model:
$T_A(t) = T_{env} + (T_A(0) - T_{env}) \exp(-\alpha_A t)$
$T_B(t) = T_{env} + (T_B(0) - T_{env}) \exp(-\alpha_B t)$

Measurements at a later time t: thermometer $y_A(t)$ and radiometer $y_B(t)$.
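The cooling model for one jar can be evaluated directly. The following is a minimal Python sketch; the function name `jar_temperature` and all numerical values (environment temperature, initial temperature, cooling rate) are assumptions made up for illustration and do not come from the slides.

```python
import math

def jar_temperature(t, T0, T_env, alpha):
    """Temperature at time t of a jar relaxing towards T_env at rate alpha."""
    return T_env + (T0 - T_env) * math.exp(-alpha * t)

# Hypothetical values chosen only for illustration.
T_ENV = 20.0     # environment temperature (deg C)
T_A0 = 80.0      # initial temperature of jar A (deg C)
ALPHA_A = 0.1    # cooling rate of jar A (per minute)

# Model estimates at t = 0, t = 10 and a very late time.
forecast = [jar_temperature(t, T_A0, T_ENV, ALPHA_A) for t in (0.0, 10.0, 1000.0)]
```

The model returns the initial temperature at t = 0 and relaxes towards the environment temperature at late times. Any error in $T_A(0)$ or $\alpha_A$ carries through into the model estimate, which is why the measurements are still needed.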

What is data assimilation?

Data assimilation is concerned with how we combine these pieces of information to obtain the best possible knowledge of the system as a function of time.

Observations + gauge of uncertainty
Model estimates + gauge of uncertainty
Data assimilation → Combined estimate + gauge of uncertainty

Note on uncertainty: the probability of each possible value (observed or modelled) is taken to be Gaussian, with standard deviation σ = √variance.

“All models are wrong …” (George Box)
“All models are wrong and all observations are inaccurate” (a data assimilator)

What is data assimilation?

[Figure: the unknown true state $x_{true}(t)$ plotted against time from the start of the system, with observations and the alternating forecast and analysis states $x_f(t_1)$, $x_a(t_1)$, $x_f(t_2)$, $x_a(t_2)$, $x_f(t_3)$.]

This is an example of a ‘filter’.

Data assimilation has:
prediction stages ($x_f$ = ‘forecast’, ‘prior’, ‘background’) (extrapolation)
analysis stages ($x_a$) (interpolation)

What is data assimilation?

“[The atmosphere] is a chaotic system in which errors introduced into the system can grow with time … As a consequence, data assimilation is a struggle between chaotic destruction of knowledge and its restoration by new observations.” Leith (1993)

Outline and references

What is data assimilation?
Applications of data assimilation in the geosciences
A prototype data assimilation system
Indirect observations and prior knowledge
Errors
Leading data assimilation methods
Essential mathematics
Challenges, subtleties, caveats, …

References:
Kalnay, 2003, Atmospheric Modeling, Data Assimilation and Predictability.
Daley, 1991, Atmospheric Data Analysis.
Lorenc, 2003, The potential of the ensemble Kalman filter for NWP – a comparison with 4D-Var, QJRMS 129, 3183-3203.
van Leeuwen, Particle filtering in geophysical systems.
Rodgers, 2000, Inverse methods for atmospheric sounding, theory and practice, World Scientific, Singapore.
Wang X., Snyder C., Hamill T.M., 2007, On the theoretical equivalence of differently proposed ensemble-3D-Var hybrid analysis schemes, Mon. Wea. Rev. 135, pp. 222-227.

Applications of data assimilation in the geosciences

Atmospheric retrievals
Atmospheric dynamics / NWP
Inverse modelling for sources/sinks
Reanalysis
Atmospheric chemistry
Hydrological cycle
Carbon cycle
Oceanography
Parameter estimation (α, β, γ)

A prototype data assimilation problem

Combining imperfect data

Consider two sources of information (e.g. measurements), $x_1 \pm \sigma_1$ and $x_2 \pm \sigma_2$, that each estimate x (assume Gaussian statistics).

$p_n(x_n|x)\,\delta x_n$: “the probability that the data $x_n$ lies between $x_n$ and $x_n + \delta x_n$ given that the ‘true’ value is x”.

The joint probability is $p_1(x_1|x)\,\delta x_1 \; p_2(x_2|x)\,\delta x_2$ (“the probability that $x_1$ is … and $x_2$ is … given x”).

In the above theory, x is known and $x_1$ and $x_2$ are unknown. Now introduce the actual information $x_1$ and $x_2$: now $x_1$ and $x_2$ are known and x is unknown.

What x maximizes $p(x_1, x_2|x)$?
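Under the stated Gaussian assumption, and assuming the two error sources are independent (which the product form of the joint probability implies), the densities would take the following standard form. This is a sketch of the textbook expressions, not a transcription of the slide.

```latex
p_n(x_n \mid x) = \frac{1}{\sqrt{2\pi}\,\sigma_n}
                  \exp\left[ -\frac{(x_n - x)^2}{2\sigma_n^2} \right],
\qquad n = 1, 2,

p(x_1, x_2 \mid x) = p_1(x_1 \mid x)\, p_2(x_2 \mid x)
  \propto \exp\left[ -\frac{(x_1 - x)^2}{2\sigma_1^2}
                     -\frac{(x_2 - x)^2}{2\sigma_2^2} \right].
```

Because the exponential is monotonic, the x that maximizes the joint probability is the x that makes the bracketed sum of squared, variance-weighted misfits as small as possible.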

A prototype data assimilation problem

Combining imperfect data

What x maximizes $p(x_1, x_2|x)$? The same x that minimizes the ‘cost function’, I(x).

To minimize, look for stationary values of I, i.e. values of x where dI/dx = 0.

If information source 1 is much more accurate than information source 2, then $\sigma_1 \ll \sigma_2$ and the estimate tends to $x_1$.
If information source 2 is much more accurate than information source 1, then $\sigma_2 \ll \sigma_1$ and the estimate tends to $x_2$.
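For Gaussian errors, maximizing the joint probability is equivalent to minimizing $I(x) = (x - x_1)^2/(2\sigma_1^2) + (x - x_2)^2/(2\sigma_2^2)$, where the factor 1/2 is a convention assumed here. Setting dI/dx = 0 gives the inverse-variance weighted mean $x = (x_1/\sigma_1^2 + x_2/\sigma_2^2)/(1/\sigma_1^2 + 1/\sigma_2^2)$. A minimal Python sketch follows; the function names and the numbers are made up for illustration.

```python
def cost(x, x1, sigma1, x2, sigma2):
    """Assumed cost function I(x): variance-weighted squared misfits to x1 and x2."""
    return (x - x1) ** 2 / (2 * sigma1 ** 2) + (x - x2) ** 2 / (2 * sigma2 ** 2)

def combine(x1, sigma1, x2, sigma2):
    """Stationary point of I(x) and the standard deviation of the combined estimate."""
    w1 = 1.0 / sigma1 ** 2          # weight = inverse variance
    w2 = 1.0 / sigma2 ** 2
    x_hat = (w1 * x1 + w2 * x2) / (w1 + w2)
    sigma_hat = (w1 + w2) ** -0.5
    return x_hat, sigma_hat

# Hypothetical data: source 1 is twice as accurate as source 2.
x_hat, sigma_hat = combine(10.0, 1.0, 14.0, 2.0)

# Limiting case: source 1 is far more accurate, so the estimate approaches x1.
x_limit, _ = combine(10.0, 1e-3, 14.0, 1.0)
```

With these numbers the combined estimate is 10.8, closer to the more accurate source, and its standard deviation is smaller than either input. This is the sense in which combining imperfect data improves knowledge.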

Indirect observations and prior information

If $x_1$ and $x_2$ were measurements, they would be direct measurements of x. Many observations are indirect, e.g.:

Interested in (x) …              Have observations of (y) …
Atmospheric T, O₃, q, ρ_x        Infrared radiances from satellite
Atmospheric T, q                 Time delays from GPS satellite
Sources of trace gases           Trace gas measurements
Leaf area index                  Optical reflectance from satellite
Sea surface temperature          Infrared or microwave radiances from satellite
Precipitation                    Radar reflectivity

Generalise:
x is the state vector (n elements)
y is the observation vector (p elements)
$y^{mo} = h(x)$ is the model’s version of the observations (mo = “model observations”) (p elements)
h is the forward model or observation operator (input n elements, output p elements)

Strategy: what x gives the best fit between y and $y^{mo}$?

Indirect observations and prior information

The structure of the state vector: for the example of meteorological fields, u, v, θ, p, q are 3-D fields; λ, φ and ℓ are longitude, latitude and vertical level. There are n elements in total. (Figure annotation: model parameters.)

The observation vector comprises each observation made. There are p observations.
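Packing several 3-D fields into one state vector can be sketched as follows. The grid size, the field ordering and the helper `unpack` are assumptions for illustration; real systems choose their own ordering and may append model parameters to the same vector.

```python
import numpy as np

# Hypothetical tiny grid: 4 longitudes, 3 latitudes, 2 vertical levels.
n_lon, n_lat, n_lev = 4, 3, 2
field_names = ["u", "v", "theta", "p", "q"]

rng = np.random.default_rng(0)
fields = {name: rng.standard_normal((n_lon, n_lat, n_lev)) for name in field_names}

# Flatten each field and stack them into a single state vector x.
x = np.concatenate([fields[name].ravel() for name in field_names])
n = x.size   # total number of elements in the state vector

def unpack(x):
    """Inverse of the packing step: split x back into named 3-D fields."""
    parts = np.split(x, len(field_names))
    return {name: part.reshape(n_lon, n_lat, n_lev)
            for name, part in zip(field_names, parts)}
```

Here n = 5 × 4 × 3 × 2 = 120. For an operational weather model the same construction gives a vector with very many more elements, which is what makes the matrices that act on x so expensive to handle.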

Indirect observations and prior information

Examples of h
For in-situ observations, h is an interpolation function.
For radiance observations, h is a radiative transfer operator.
For observations at a later time than that of x, h includes a forecast model.

Prior information
Often the observations are insufficient to determine x. Introduce prior information (a-priori, background, first guess, forecast), $x_f$.

One strategy (variational assimilation) for solving the assimilation problem is to ask: “What x (called $x_a$ [in an earlier slide this was called $x_e$]) gives $y^{mo}$ that is the closest possible to y, and x that is the closest possible to $x_f$?”

Construct a cost functional and minimize w.r.t. x (a generalized least-squares problem).
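The simplest of the three examples, h as an interpolation function, can be sketched in one dimension. The grid, the observation locations and the state values below are hypothetical, and linear interpolation is only one possible choice.

```python
import numpy as np

grid = np.array([0.0, 1.0, 2.0, 3.0])     # hypothetical model grid positions
obs_locations = np.array([0.5, 2.25])     # hypothetical in-situ observation positions

def h(x):
    """Observation operator: linearly interpolate the state x to the obs locations."""
    return np.interp(obs_locations, grid, x)

def build_H():
    """Matrix form of h, obtained by applying h to each unit vector (h is linear)."""
    n = grid.size
    return np.column_stack([h(np.eye(n)[:, j]) for j in range(n)])

x = np.array([10.0, 12.0, 16.0, 15.0])    # hypothetical state on the grid
y_mo = h(x)                               # model's version of the observations
H = build_H()                             # p x n matrix, here 2 x 4
```

Each row of H holds the interpolation weights for one observation, so `H @ x` reproduces `h(x)`. A radiative transfer operator or a forecast model is generally non-linear, and then H is only the linearization of h about a given state.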

Indirect observations and prior information

The cost function is built from terms of the form ‘square of length of vector’. Error covariance matrices define the norm (these respect the uncertainty of $x_f$ and y and are important!):
$P_f$: forecast (or background) error covariance matrix (n × n matrix). Sometimes called B.
R: observation error covariance matrix (p × p matrix).

This cost function:
can be derived from Bayes’ Theorem by assuming forecast and obs errors obey Gaussian stats,
has argument x (think of it as a control variable),
may be extended to include fit to other unknowns in the system (e.g. the fact that h is imperfect, including model parameters).
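A commonly used form of this cost function is $J(x) = \frac{1}{2}(x - x_f)^T P_f^{-1}(x - x_f) + \frac{1}{2}(y - h(x))^T R^{-1}(y - h(x))$; the factor 1/2 and the symbol J are assumptions here. The sketch below further assumes a linear observation operator, h(x) = Hx, so that the minimizer has a closed form. All numbers are made up.

```python
import numpy as np

def cost(x, x_f, P_f, y, H, R):
    """Assumed cost: fit to the forecast plus fit to the observations."""
    dx = x - x_f
    dy = y - H @ x
    return 0.5 * dx @ np.linalg.solve(P_f, dx) + 0.5 * dy @ np.linalg.solve(R, dy)

def gradient(x, x_f, P_f, y, H, R):
    """Gradient of the cost with respect to x (zero at the analysis)."""
    return np.linalg.solve(P_f, x - x_f) - H.T @ np.linalg.solve(R, y - H @ x)

def analysis(x_f, P_f, y, H, R):
    """Closed-form minimizer of the quadratic cost when h(x) = H x."""
    K = P_f @ H.T @ np.linalg.inv(H @ P_f @ H.T + R)
    return x_f + K @ (y - H @ x_f)

# Hypothetical two-element state observed through one averaging observation.
x_f = np.array([1.0, 3.0])
P_f = np.array([[1.0, 0.5], [0.5, 1.0]])
H = np.array([[0.5, 0.5]])
R = np.array([[0.25]])
y = np.array([3.0])

x_a = analysis(x_f, P_f, y, H, R)
```

With these values the analysis is (1.75, 3.75): a single observation of the average moves both elements, and the off-diagonal entries of $P_f$ help decide how the correction is shared between them. For non-linear h, or for very large n, the minimum is instead found iteratively using the gradient.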

Errors everywhere

Random errors:
background (a-priori) errors
observation errors
model errors
representivity errors

Systematic errors:
biases in background
biases in observations
biases in model

All significant sources of uncertainty should be accounted for in data assimilation.

Example 1: repeated observations of air temperature. [Figure: observations y of T compared with the truth, for an unbiased thermometer and for a biased thermometer.]

Example 2: representivity errors due to the model grid.
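The difference between random and systematic error in Example 1 can be illustrated with simulated thermometer readings. This is a sketch under made-up assumptions: the true temperature, the error standard deviation and the bias are hypothetical, and the errors are drawn from a Gaussian.

```python
import random
import statistics

random.seed(1)
TRUTH = 15.0   # hypothetical true air temperature (deg C)
SIGMA = 0.5    # hypothetical random error standard deviation (deg C)
BIAS = 1.0     # hypothetical systematic offset of the biased thermometer (deg C)

def observe(n, bias=0.0):
    """Simulate n repeated readings with random error and an optional bias."""
    return [TRUTH + bias + random.gauss(0.0, SIGMA) for _ in range(n)]

unbiased_mean = statistics.fmean(observe(10_000))
biased_mean = statistics.fmean(observe(10_000, bias=BIAS))
```

Averaging many readings removes most of the random error, so the unbiased mean lands close to the truth. The biased mean converges just as tightly, but to the wrong value, which is why biases have to be handled separately from random errors.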

Leading methods of solving the DA problem

Variational-type approach

Kalman filter-type approach (linear obs operator, $H_t x_t = h_t(x_t)$), whose equations give in turn:
the analysis update at time t,
the analysis error covariance,
the forecast,
the forecast error covariance.

These equations involve the model error covariance matrix and the linear forecast model.
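The four Kalman filter steps named above can be sketched with the standard textbook equations. The symbols M (linear forecast model), Q (model error covariance matrix) and K (gain) are assumed notation, and the small system below is hypothetical.

```python
import numpy as np

def kf_forecast(x_a, P_a, M, Q):
    """Forecast step: propagate the state and its error covariance."""
    x_f = M @ x_a
    P_f = M @ P_a @ M.T + Q
    return x_f, P_f

def kf_analysis(x_f, P_f, y, H, R):
    """Analysis step: update the state and its error covariance with observations y."""
    K = P_f @ H.T @ np.linalg.inv(H @ P_f @ H.T + R)   # Kalman gain
    x_a = x_f + K @ (y - H @ x_f)
    P_a = (np.eye(x_f.size) - K @ H) @ P_f
    return x_a, P_a

# Hypothetical two-variable system in which only the first variable is observed.
M = np.array([[1.0, 0.1], [0.0, 1.0]])
Q = 0.01 * np.eye(2)
H = np.array([[1.0, 0.0]])
R = np.array([[0.04]])

x_a, P_a = np.array([0.0, 1.0]), np.eye(2)
x_f, P_f = kf_forecast(x_a, P_a, M, Q)
x_a, P_a = kf_analysis(x_f, P_f, np.array([0.2]), H, R)
```

Repeating the two calls in a loop gives the forecast and analysis cycle of a filter. Propagating $P_f$ requires n × n matrix products at every step, which becomes prohibitive when n is large.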

Leading methods of solving the DA problem

Ensemble Kalman filter-type approach

Have N ensemble members (index i, 1 ≤ i ≤ N). Differences between them represent uncertainty.

Approximate the forecast error covariance matrix with an ensemble to make the Kalman update equation manageable for N << n. The result is a superposition of ensemble members.

But beware...
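The ensemble approximation to the forecast error covariance matrix can be sketched as the sample covariance of the members. The sizes n and N and the random ensemble below are hypothetical stand-ins for real model forecasts.

```python
import numpy as np

rng = np.random.default_rng(42)
n, N = 50, 10     # hypothetical state size and ensemble size, with N << n

ensemble = rng.standard_normal((n, N))        # column i is ensemble member i
mean = ensemble.mean(axis=1, keepdims=True)   # ensemble mean state
X = (ensemble - mean) / np.sqrt(N - 1)        # scaled perturbations from the mean

P_f = X @ X.T                                 # sample forecast error covariance (n x n)
rank = np.linalg.matrix_rank(P_f)
```

The sample covariance has rank at most N − 1, far below n. Any correction computed from it is confined to the space spanned by the ensemble perturbations, which is one way to see why a small ensemble calls for caution.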

Leading methods of solving the DA problem

(Cross-references such as “As B.1” point to the numbered item in the same column, Pros or Cons, of the named method.)

A. Data insertion
Description: Set grid points to observation values.
Pros:
1. Easy to do
Cons:
1. No respect of uncertainty
2. What about observation voids?
3. Can’t deal with indirect observations

B. Variational data assimilation
Description: Minimize a cost function. Many flavours: 3D, 4D, weak/strong constraint.
Pros:
1. Respect of data uncertainty
2. Direct and indirect observations
3. P_f gives smooth and balanced fields
4. Efficient
5. Can deal with (weakly) non-linear h
Cons:
1. P_f is difficult to know, often static and suboptimal
2. High development costs
3. h: need tangent linear, H, and adjoint, H^T
4. Gaussian pdf

C. Kalman filtering
Description: Evaluate KF equations.
Pros:
1. As B.1, B.2, B.3
2. P_f adapts with the state
Cons:
1. As B.3, B.4
2. Difficult to use with non-linear h
3. Prohibitively expensive for large n

D. Ensemble Kalman filtering
Description: Approximate KF equations with ensemble of N model runs. Many flavours.
Pros:
1. As B.1, B.2, B.4, B.5, C.2
2. h: do not need H and H^T
3. Have measure of analysis spread
Cons:
1. As B.4
2. Serious sampling issues when N << n
3. Need ensemble inflation and localization schemes to overcome D.2

E. Hybrid
Description: Cross between C/D.
Pros:
1. As B.1, B.2, B.3, B.4, B.5, C.2
Cons:
1. As D.2

F. Particle filter
Description: Assign weights to ensemble members to represent any pdf.
Pros:
1. As B.1, B.2
2. Can deal with non-linear h
3. Can deal with non-Gaussian pdf
4. Have measure of analysis spread
Cons:
1. As D.2
2. Inefficient: members often become redundant
3. Need special techniques to overcome F.2
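The weighting idea behind the particle filter, and the redundancy listed as F.2, can be sketched for a scalar state that is observed directly. The prior ensemble, the observation, the Gaussian likelihood and the effective-sample-size diagnostic `n_eff` are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
particles = rng.normal(0.0, 2.0, size=200)   # hypothetical prior ensemble (scalar state)
y, sigma_o = 1.5, 0.5                        # hypothetical observation and its error std dev

# Weight each particle by the likelihood of the observation given that particle.
log_w = -0.5 * ((y - particles) / sigma_o) ** 2
weights = np.exp(log_w - log_w.max())        # subtract the maximum for numerical safety
weights /= weights.sum()                     # normalize so the weights sum to one

posterior_mean = np.sum(weights * particles)
n_eff = 1.0 / np.sum(weights ** 2)           # effective number of particles
```

Particles far from the observation receive negligible weight, so `n_eff` falls well below the ensemble size. In high-dimensional systems this collapse is severe, which is why special techniques such as resampling are needed.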

Mathematics required

Vector representation of fields
Matrix algebra
Linear vector spaces
Matrix inversion
Vector derivative
Generalized chain rule
Jacobians
Eigenvectors/eigenvalues
Singular vectors/values
Variances, covariances, correlations
Matrix rank
Lagrange multipliers

www.met.reading.ac.uk/~ross/MTMD02/MathTools.pdf

Summary of basic principles

DA is concerned with estimating the state of a system given:
observations (direct [e.g. in-situ] and indirect [e.g. remotely sensed]),
forecast models (to provide a-priori data, given too-few obs),
observation operators (to connect model state with obs).

All data have uncertainties, which must be quantified.
DA estimates are sensitive to uncertainty characteristics, which are often poorly known.
Many observations and models have systematic as well as random errors.
DA should take into account all sources of error in the system.
DA theory is suited mostly to errors that are Gaussian distributed.
Most errors are non-Gaussian, and non-linearity is synonymous with non-Gaussianity.
DA problems are computationally expensive and require intensive development effort.

Some subtleties and caveats of DA

DA estimates are not the ‘truth’ and can be problematic for some kinds of analyses:
A good fit to observations does not guarantee that the analysis is correct! E.g. if the h-operator has inadequacies not accounted for, or if error covariance matrices are poor.
Unobserved parts of the system may be poor. E.g. in meteorology, horizontal winds may be constrained well by obs, but the implied vertical wind may be poor.

Assimilated fields may be subject to other constraints:
E.g. certain balance constraints.

Be careful with error covariance matrices:
$P_f$, R need to be tuned for variational DA,
$P_f$ is subject to sampling problems for ensemble DA.

DA systems should be well tested before using real data:
Test h-operators (forecast models and obs. operators): which parts of x is $y^{mo}$ sensitive to?
Adjoint tests of H, H^T if using variational data assimilation.
Test the DA system with simulated obs. from a made-up truth (identical twin experiments).
For assimilation of real data, validate the analysis against independent obs. if possible.
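An adjoint test checks that the coded adjoint really is the transpose of the coded tangent linear operator, by comparing the inner products ⟨H dx, dy⟩ and ⟨dx, H^T dy⟩ for random vectors. A minimal sketch follows; the random matrix H is a hypothetical stand-in for separately coded tangent linear and adjoint routines.

```python
import numpy as np

rng = np.random.default_rng(7)
H = rng.standard_normal((3, 5))   # hypothetical linearized observation operator (p=3, n=5)

def tangent_linear(dx):
    """Apply the tangent linear operator to a state perturbation."""
    return H @ dx

def adjoint(dy):
    """Apply the adjoint operator to an observation-space vector."""
    return H.T @ dy

dx = rng.standard_normal(5)       # random state-space perturbation
dy = rng.standard_normal(3)       # random observation-space vector

lhs = np.dot(tangent_linear(dx), dy)   # <H dx, dy>
rhs = np.dot(dx, adjoint(dy))          # <dx, H^T dy>
mismatch = abs(lhs - rhs)
```

The two inner products should agree to within rounding error. A larger mismatch points to a coding error in one of the two routines, and that would corrupt the gradient used in variational assimilation.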
