
1
Data Errors, Model Errors, and Estimation Errors Frontiers of Geophysical Inversion Workshop Waterways Experiment Station Vicksburg, MS 17-19 February 2002 P.B. Stark Department of Statistics University of California Berkeley CA www.stat.berkeley.edu/~stark

2
Acknowledgements Excerpted from Evans, S.N. and Stark, P.B., 2001. “Inverse Problems as Statistics,” Technical Report 609, Department of Statistics, University of California, Berkeley.

3
Theory and Practice In Theory, there is no difference between Theory and Practice, but in Practice, there is. + Reality. What a concept! ++ Challenge: accept the complexity and unpredictability of practice; develop useful (& computable) methodology. + Jan L.A. de Snepscheut. ++ Robin Williams

4
Forward Problems in Statistics Measurable space X of possible data. Set Θ of possible descriptions of the world—models. Family P = {P_θ : θ ∈ Θ} of probability distributions on X, indexed by models θ. Forward operator θ ↦ P_θ maps model θ into a probability measure on X. Data X are a sample from P_θ. P_θ is the whole story: stochastic variability in the “truth,” contamination by measurement error, systematic error, censoring, etc.

5
Models Index set Θ usually has special structure. For example, Θ could be a convex subset of a separable Banach space T. (geomag, seismo, grav, MT, …) The forward mapping θ ↦ P_θ maps the index θ of the model to a probability distribution for the data. The physical significance of θ generally gives θ ↦ P_θ reasonable analytic properties, e.g., continuity.

6
Example: Function estimation w/ systematic and random error Observe X_j = f(t_j) + v_j + ε_j, j = 1, 2, …, n, where f ∈ C, a set of smooth functions on [0, 1]; t_j ∈ [0, 1]; |v_j| ≤ 1, j = 1, 2, …, n; and ε_j iid N(0, 1).

7
Example, cont’d Let Θ = C × [-1, 1]^n, X = R^n, θ = (f, v_1, v_2, …, v_n). P_θ is the probability distribution on R^n with density f_θ(x) = (2π)^{-n/2} exp( -½ Σ_{j=1}^n (x_j - f(t_j) - v_j)² ).
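This forward model is easy to simulate. A minimal numpy sketch follows; the particular choice of f, the value of n, and the uniform draw for the systematic errors v_j are illustrative assumptions, not part of the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
t = np.linspace(0.0, 1.0, n)           # observation points t_j in [0, 1]
f = lambda s: np.sin(2 * np.pi * s)    # a hypothetical smooth f in C
v = rng.uniform(-1.0, 1.0, n)          # systematic errors, |v_j| <= 1
eps = rng.standard_normal(n)           # random errors, iid N(0, 1)
X = f(t) + v + eps                     # the data: a sample from P_theta

# Log of the density of X under theta = (f, v_1, ..., v_n):
# product of N(f(t_j) + v_j, 1) densities.
log_density = -0.5 * n * np.log(2 * np.pi) - 0.5 * np.sum((X - f(t) - v) ** 2)
```

Note that θ includes both f and the systematic errors v_j, so the "model" here is infinite-dimensional even though the data are a finite vector.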

8
Forward Problems in Geophysics Composition of steps: –transform idealized description of Earth into perfect, noise-free, infinite-dimensional data (“approximate physics”) –censor perfect data to retain only a finite list of numbers, because we can only measure, record, and compute with such lists –possibly corrupt the list with measurement error. Equivalent to a single-step procedure with corruption on a par with the physics, and a mapping that incorporates the censoring.

9
Geophysical v. Statistical Forward Problems The statistical framework for forward problems is more general: forward problems of applied math and geophysics are instances of statistical forward problems.

10
Parameters A parameter of a model θ is the value g(θ) at θ of a continuous G-valued function g defined on Θ. (g can be the identity.)

11
Inverse Problems Observe data X drawn from distribution P_θ for some unknown θ ∈ Θ. (Assume Θ contains at least two points; otherwise, the data are superfluous.) Use the data X and the knowledge that θ ∈ Θ to learn about θ; for example, to estimate a parameter g(θ).

12
Applied Math and Statistical Perspectives Applied math: recover a parameter of a PDE or the solution of an integral equation from infinitely many data, noise-free or with deterministic error. –Common issues: existence, uniqueness, construction, stability for deterministic noise. Statistics: estimate or draw inferences about a parameter from finitely many noisy data. –Common issues: identifiability, consistency, bias, variance, efficiency, MSE, etc.

13
Many Connections Identifiability—distinct parameter values yield distinct probability distributions for the observables—is similar to uniqueness—the forward operator maps at most one model into the observed data. Consistency—the parameter can be estimated with arbitrary accuracy as the number of data grows—is related to stability of a recovery algorithm—small changes in the data produce small changes in the recovered model. There are quantitative connections, too.

14
Geophysical Inverse Problems Inverse problems in geophysics are often “solved” using applied-math methods for ill-posed problems (e.g., Tikhonov regularization, analytic inversions). Those methods are designed to answer different questions; they can behave poorly with noisy data (e.g., bad bias & variance). Inference ≠ construction: the statistical viewpoint is more appropriate for interpreting geophysical data.

15
Elements of the Statistical View Distinguish between characteristics of the problem and characteristics of the methods used to draw inferences. One fundamental property of a parameter: g is identifiable if, for all η, ν ∈ Θ, {g(η) ≠ g(ν)} ⇒ {P_η ≠ P_ν}. In most inverse problems, g(θ) = θ is not identifiable, and few linear functionals of θ are identifiable.

16
Decision Rules A (randomized) decision rule δ: X → M_1(A), x ↦ δ_x(·), is a measurable mapping from the space X of possible data to the collection M_1(A) of probability distributions on a separable metric space A of actions. A non-randomized decision rule is a randomized decision rule that, to each x ∈ X, assigns a unit point mass at some value a = a(x) ∈ A.

17
Estimators An estimator of a parameter g(θ) is a decision rule for which the space A of possible actions is the space G of possible parameter values. ĝ = ĝ(X) is common notation for an estimator of g(θ). Usually write a non-randomized estimator as a G-valued function of x instead of an M_1(G)-valued function.

18
Comparing Decision Rules Infinitely many decision rules and estimators. Which one to use? The best one! But what does best mean?

19
Loss and Risk Formulate as a 2-player game: Nature v. Statistician. Nature picks θ from Θ. θ is secret, but the statistician knows Θ. Statistician picks δ from a set D of decision rules. δ is secret. Generate data X from P_θ; apply δ. Statistician pays loss l(θ, δ(X)). l should be dictated by the scientific context, but… Risk is expected loss: r(θ, δ) = E_θ[l(θ, δ(X))]. A good decision rule has small risk, but what does small mean?
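The risk of a rule can be approximated by Monte Carlo. A toy sketch, not from the slides: squared-error loss, the (hypothetical) rule δ(X) = mean(X), and X_1, …, X_n iid N(θ, 1), for which the exact risk is Var(mean) = 1/n:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0                 # Nature's secret choice
n, trials = 10, 100_000

X = theta + rng.standard_normal((trials, n))    # data from P_theta, many games
delta = X.mean(axis=1)                          # statistician's rule delta(X)
loss = (delta - theta) ** 2                     # squared-error loss l(theta, delta(X))
risk = loss.mean()                              # Monte Carlo estimate of r(theta, delta)
# risk is close to 1/n = 0.1
```

The estimated risk depends on Nature's θ; comparing rules requires a criterion over all of Θ, which is where minimax and Bayes strategies (next slides) come in.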

20
Strategy Rare that a single decision rule has smallest risk for every θ ∈ Θ. A decision rule is admissible if it is not dominated by another. A minimax decision rule minimizes sup_{θ ∈ Θ} r(θ, δ) over δ ∈ D. A Bayes decision rule minimizes ∫_Θ r(θ, δ) π(dθ) over δ ∈ D for a given prior probability distribution π on Θ.

21
Minimax is Bayes for the least favorable prior If minimax risk >> Bayes risk, the prior π controls the apparent uncertainty of the Bayes estimate. Pretty generally, for convex D and concave-convexlike r, inf_{δ ∈ D} sup_{θ ∈ Θ} r(θ, δ) = sup_π inf_{δ ∈ D} ∫_Θ r(θ, δ) π(dθ).

22
Common Risk: Mean Distance Error (MDE) Let d_G denote the metric on G. The MDE at θ of the estimator ĝ of g is MDE_θ(ĝ, g) = E_θ[d_G(ĝ, g(θ))]. When the metric derives from a norm, MDE is called mean norm error (MNE). When the norm is Hilbertian, (MNE)² is called mean squared error (MSE).

23
Bias When G is a Banach space, can define the bias at θ of ĝ: bias_θ(ĝ, g) = E_θ[ĝ - g(θ)] (when the expectation is well-defined). If bias_θ(ĝ, g) = 0, say ĝ is unbiased at θ (for g). If ĝ is unbiased at θ for g for every θ ∈ Θ, say ĝ is unbiased for g. If such a ĝ exists, g is unbiasedly estimable. If g is unbiasedly estimable, then g is identifiable.
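Unbiasedness says nothing by itself about risk. A numerical sketch (the Gaussian location model and both rules are illustrative choices, not from the slides): the sample mean and the first observation alone are both unbiased for θ, but their MSEs differ by a factor of n:

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 1.5
n, trials = 5, 200_000

X = theta + rng.standard_normal((trials, n))    # X_j iid N(theta, 1)
ghat_mean = X.mean(axis=1)                      # estimator 1: sample mean
ghat_first = X[:, 0]                            # estimator 2: first observation

bias_mean = ghat_mean.mean() - theta            # ~ 0: unbiased at theta
bias_first = ghat_first.mean() - theta          # ~ 0: also unbiased at theta
mse_mean = ((ghat_mean - theta) ** 2).mean()    # ~ 1/n = 0.2
mse_first = ((ghat_first - theta) ** 2).mean()  # ~ 1: unbiased but much worse
```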

24
More Notation Let T be a separable Banach space, T* its normed dual. Write the pairing between T* and T as ⟨•, •⟩: T* × T → R.

25
Linear Forward Problems A forward problem is linear if Θ is a subset of a separable Banach space T; for some fixed sequence (κ_j)_{j=1}^n of elements of T*, X = (X_j)_{j=1}^n, where X_j = ⟨κ_j, θ⟩ + ε_j, θ ∈ Θ; and ε = (ε_j)_{j=1}^n is a vector of stochastic errors whose distribution does not depend on θ (so X = R^n).

26
Linear Forward Problems, cont’d. The linear functionals {κ_j} are the “representers.” The distribution P_θ is the probability distribution of X. Typically, dim(Θ) = ∞; at the very least, n < dim(Θ), so estimating θ is an underdetermined problem. Define K: T → R^n, θ ↦ (⟨κ_j, θ⟩)_{j=1}^n. Abbreviate the forward problem by X = Kθ + ε, θ ∈ Θ.

27
Linear Inverse Problems Use X = Kθ + ε and the knowledge θ ∈ Θ to estimate or draw inferences about g(θ). The probability distribution of X depends on θ only through Kθ, so if there are two points θ_1, θ_2 ∈ Θ such that Kθ_1 = Kθ_2 but g(θ_1) ≠ g(θ_2), then g(θ) is not identifiable.
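A finite-dimensional sketch of that non-identifiability (the matrices and the choice of g are hypothetical, just to make the point concrete): with only n = 2 linear data on a 3-dimensional θ, two models in the null space direction of K produce the same data distribution but different parameter values:

```python
import numpy as np

# Representers kappa_1, kappa_2 as the rows of K; the third component
# of theta is never observed.
K = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

theta1 = np.array([1.0, 2.0, 5.0])
theta2 = np.array([1.0, 2.0, -3.0])   # differs only in the unobserved component

g = lambda th: th[2]                  # parameter: third component of theta

same_data = np.allclose(K @ theta1, K @ theta2)   # K theta_1 = K theta_2
same_g = g(theta1) == g(theta2)                   # g(theta_1) != g(theta_2)
# same_data is True while same_g is False, so this g is not identifiable
```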

28
Backus-Gilbert ++ : Necessary Conditions Let g be an identifiable real-valued parameter. Suppose ∃ θ_0 ∈ Θ, a symmetric convex set Ť ⊆ T, c ∈ R, and ğ: Ť → R such that: 1. θ_0 + Ť ⊆ Θ; 2. for t ∈ Ť, g(θ_0 + t) = c + ğ(t), and ğ(-t) = -ğ(t); 3. ğ(a_1 t_1 + a_2 t_2) = a_1 ğ(t_1) + a_2 ğ(t_2), t_1, t_2 ∈ Ť, a_1, a_2 ≥ 0, a_1 + a_2 = 1; and 4. sup_{t ∈ Ť} |ğ(t)| < ∞. Then ∃ a 1 × n matrix Λ s.t. the restriction of ğ to Ť is the restriction of Λ·K to Ť.

29
Backus-Gilbert ++ : Sufficient Conditions Suppose g = (g_i)_{i=1}^m is an R^m-valued parameter that can be written as the restriction to Θ of Λ·K for some m × n matrix Λ. Then 1. g is identifiable. 2. If E[ε] = 0, Λ·X is an unbiased estimator of g. 3. If, in addition, ε has covariance matrix Σ = E[εεᵀ], the covariance matrix of Λ·X is Λ·Σ·Λᵀ, whatever P_θ may be.
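Claims 2 and 3 can be checked numerically for a hypothetical finite-dimensional problem X = Kθ + ε (all the matrices below are arbitrary illustrative choices): the Monte Carlo mean of Λ·X matches Λ·K·θ, and its sample covariance matches Λ·Σ·Λᵀ.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, trials = 3, 4, 200_000

K = rng.standard_normal((n, p))       # forward map (representers as rows)
Lam = rng.standard_normal((2, n))     # a 2 x n matrix Lambda (so m = 2)
theta = rng.standard_normal(p)
g = Lam @ K @ theta                   # the R^2-valued parameter g = Lambda.K theta

Sigma = np.diag([1.0, 0.5, 2.0])      # error covariance, E[eps] = 0
eps = rng.multivariate_normal(np.zeros(n), Sigma, size=trials)
X = K @ theta + eps                   # one row per realization of the data

ghat = X @ Lam.T                      # Lambda.X for each realization
bias = ghat.mean(axis=0) - g          # ~ 0: Lambda.X is unbiased for g
cov = np.cov(ghat, rowvar=False)      # ~ Lambda.Sigma.Lambda^T
```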

30
Corollary: Backus-Gilbert Theory Let T be a Hilbert space; let Θ = T; let g ∈ T = T* be a linear parameter, g(θ) = ⟨g, θ⟩. The parameter g(θ) is identifiable iff g = Λ·K for some 1 × n matrix Λ. In that case, if E[ε] = 0, then ĝ = Λ·X is unbiased for g. If, in addition, ε has covariance matrix Σ = E[εεᵀ], then the MSE of ĝ is Λ·Σ·Λᵀ.

31
Consistency in Linear Inverse Problems X_i = ⟨κ_i, θ⟩ + ε_i, i = 1, 2, 3, …; Θ a subset of a separable Banach space T; {κ_i} ⊆ T* linear, bounded on Θ; {ε_i} iid. θ is consistently estimable w.r.t. the weak topology iff ∃ {T_k}, T_k a Borel function of X_1, …, X_k, s.t. ∀ θ ∈ Θ, ∀ ε > 0, ∀ κ ∈ T*, lim_k P_θ{|⟨κ, T_k⟩ - ⟨κ, θ⟩| > ε} = 0.

32
Importance of the Error Distribution µ a probability measure on R; µ_a(B) = µ(B - a), a ∈ R. The Hellinger distance between shifted copies of µ induces a pseudo-metric on T**. If its restriction to Θ converges to a metric compatible with the weak topology, θ can be estimated consistently in the weak topology. For a given sequence of functionals {κ_i}, the rougher µ is, the easier consistent estimation becomes.

33
Summary The statistical viewpoint is a useful abstraction. Physics in the map θ ↦ P_θ; prior information in the constraint θ ∈ Θ. Represents systematic and stochastic errors, censoring, etc. Separating the “model” from the parameters of interest is useful: Sabatier’s “well-posed questions.” “Solving” an inverse problem means different things to different audiences. Thinking about measures of performance is useful. Difficulty of the problem ≠ performance of a specific method.
