Statistical Measures of Uncertainty in Inverse Problems Workshop on Uncertainty in Inverse Problems Institute for Mathematics and Its Applications Minneapolis,

Slides:



Advertisements
Similar presentations
Statistical Decision Theory Abraham Wald ( ) Wald’s test Rigorous proof of the consistency of MLE “Note on the consistency of the maximum likelihood.
Advertisements

Week 11 Review: Statistical Model A statistical model for some data is a set of distributions, one of which corresponds to the true unknown distribution.
ECE 8443 – Pattern Recognition LECTURE 05: MAXIMUM LIKELIHOOD ESTIMATION Objectives: Discrete Features Maximum Likelihood Resources: D.H.S: Chapter 3 (Part.
Data Errors, Model Errors, and Estimation Errors Frontiers of Geophysical Inversion Workshop Waterways Experiment Station Vicksburg, MS February.
1 12. Principles of Parameter Estimation The purpose of this lecture is to illustrate the usefulness of the various concepts introduced and studied in.
Estimation  Samples are collected to estimate characteristics of the population of particular interest. Parameter – numerical characteristic of the population.
Visual Recognition Tutorial
Instructor : Dr. Saeed Shiry
Prénom Nom Document Analysis: Parameter Estimation for Pattern Recognition Prof. Rolf Ingold, University of Fribourg Master course, spring semester 2008.
Point estimation, interval estimation
0 Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.
Minimaxity & Admissibility Presenting: Slava Chernoi Lehman and Casella, chapter 5 sections 1-2,7.
1 Introduction to Kernels Max Welling October (chapters 1,2,3,4)
Evaluating Hypotheses
Presenting: Assaf Tzabari
. PGM: Tirgul 10 Parameter Learning and Priors. 2 Why learning? Knowledge acquisition bottleneck u Knowledge acquisition is an expensive process u Often.
Introduction to Signal Estimation. 94/10/142 Outline 
Inverse Problems and Data Errors Conference in Honor of Douglas O. Gough On the Occasion of his 60 th Birthday Chateau de Mons, Caussens, France P.B. Stark.
Visual Recognition Tutorial
Lehrstuhl für Informatik 2 Gabriella Kókai: Maschine Learning 1 Evaluating Hypotheses.
Maximum likelihood (ML)
Principles of Pattern Recognition
Statistical Decision Theory
Model Inference and Averaging
Estimation Bias, Standard Error and Sampling Distribution Estimation Bias, Standard Error and Sampling Distribution Topic 9.
Random Sampling, Point Estimation and Maximum Likelihood.
Lecture 12 Statistical Inference (Estimation) Point and Interval estimation By Aziza Munir.
A statistical model Μ is a set of distributions (or regression functions), e.g., all uni-modal, smooth distributions. Μ is called a parametric model if.
ECE 8443 – Pattern Recognition ECE 8423 – Adaptive Signal Processing Objectives: Deterministic vs. Random Maximum A Posteriori Maximum Likelihood Minimum.
Quantifying Uncertainty in Inverse Problems Workshop on Statistical Methods for Inverse Problems Institute for Pure and Applied Mathematics University.
Statistical Approaches to Inverse Problems
ELEC 303 – Random Signals Lecture 18 – Classical Statistical Inference, Dr. Farinaz Koushanfar ECE Dept., Rice University Nov 4, 2010.
Chapter 5 Parameter estimation. What is sample inference? Distinguish between managerial & financial accounting. Understand how managers can use accounting.
PROBABILITY AND STATISTICS FOR ENGINEERING Hossein Sameti Department of Computer Engineering Sharif University of Technology Principles of Parameter Estimation.
Statistical Decision Theory Bayes’ theorem: For discrete events For probability density functions.
BCS547 Neural Decoding. Population Code Tuning CurvesPattern of activity (r) Direction (deg) Activity
Geology 5670/6670 Inverse Theory 21 Jan 2015 © A.R. Lowry 2015 Read for Fri 23 Jan: Menke Ch 3 (39-68) Last time: Ordinary Least Squares Inversion Ordinary.
Sparse Kernel Methods 1 Sparse Kernel Methods for Classification and Regression October 17, 2007 Kyungchul Park SKKU.
CHAPTER 17 O PTIMAL D ESIGN FOR E XPERIMENTAL I NPUTS Organization of chapter in ISSO –Background Motivation Finite sample and asymptotic (continuous)
Ch15: Decision Theory & Bayesian Inference 15.1: INTRO: We are back to some theoretical statistics: 1.Decision Theory –Make decisions in the presence of.
The generalization of Bayes for continuous densities is that we have some density f(y|  ) where y and  are vectors of data and parameters with  being.
Statistics What is the probability that 7 heads will be observed in 10 tosses of a fair coin? This is a ________ problem. Have probabilities on a fundamental.
1 EE571 PART 3 Random Processes Huseyin Bilgekul Eeng571 Probability and astochastic Processes Department of Electrical and Electronic Engineering Eastern.
Lecture 3: MLE, Bayes Learning, and Maximum Entropy
Introduction to Estimation Theory: A Tutorial
Rate Distortion Theory. Introduction The description of an arbitrary real number requires an infinite number of bits, so a finite representation of a.
Week 21 Order Statistics The order statistics of a set of random variables X 1, X 2,…, X n are the same random variables arranged in increasing order.
Chapter 8 Estimation ©. Estimator and Estimate estimator estimate An estimator of a population parameter is a random variable that depends on the sample.
Parameter Estimation. Statistics Probability specified inferred Steam engine pump “prediction” “estimation”
Learning Theory Reza Shadmehr Distribution of the ML estimates of model parameters Signal dependent noise models.
Computacion Inteligente Least-Square Methods for System Identification.
Evaluating Hypotheses. Outline Empirically evaluating the accuracy of hypotheses is fundamental to machine learning – How well does this estimate its.
Week 21 Statistical Model A statistical model for some data is a set of distributions, one of which corresponds to the true unknown distribution that produced.
Evaluating Hypotheses. Outline Empirically evaluating the accuracy of hypotheses is fundamental to machine learning – How well does this estimate accuracy.
How Statistics Helps 9th US Congress on Computational Mechanics San Francisco, CA 25 July 2007 P.B. Stark Department of Statistics University of California.
12. Principles of Parameter Estimation
Probability Theory and Parameter Estimation I
LECTURE 09: BAYESIAN ESTIMATION (Cont.)
Statistical Measures of Uncertainty in Inverse Problems
Model Inference and Averaging
Ch3: Model Building through Regression
CONCEPTS OF ESTIMATION
Where did we stop? The Bayes decision rule guarantees an optimal classification… … But it requires the knowledge of P(ci|x) (or p(x|ci) and P(ci)) We.
Summarizing Data by Statistics
Parametric Methods Berlin Chen, 2005 References:
Learning From Observed Data
12. Principles of Parameter Estimation
Data Exploration and Pattern Recognition © R. El-Yaniv
Statistical Model A statistical model for some data is a set of distributions, one of which corresponds to the true unknown distribution that produced.
Statistical Model A statistical model for some data is a set of distributions, one of which corresponds to the true unknown distribution that produced.
Presentation transcript:

Statistical Measures of Uncertainty in Inverse Problems Workshop on Uncertainty in Inverse Problems Institute for Mathematics and Its Applications Minneapolis, MN April 2002 P.B. Stark Department of Statistics University of California Berkeley, CA

Abstract Inverse problems can be viewed as special cases of statistical estimation problems. From that perspective, one can study inverse problems using standard statistical measures of uncertainty, such as bias, variance, mean squared error and other measures of risk, confidence sets, and so on. It is useful to distinguish between the intrinsic uncertainty of an inverse problem and the uncertainty of applying any particular technique for “solving” the inverse problem. The intrinsic uncertainty depends crucially on the prior constraints on the unknown (including prior probability distributions in the case of Bayesian analyses), on the forward operator, on the statistics of the observational errors, and on the nature of the properties of the unknown one wishes to estimate. I will try to convey some geometrical intuition for uncertainty, and the relationship between the intrinsic uncertainty of linear inverse problems and the uncertainty of some common techniques applied to them.

References & Acknowledgements Donoho, D.L., Statistical Estimation and Optimal Recovery, Ann. Stat., 22, Evans, S.N. and Stark, P.B., Inverse Problems as Statistics, Inverse Problems, 18, R1- R43 (in press). Stark, P.B., Inference in infinite-dimensional inverse problems: Discretization and duality, J. Geophys. Res., 97, 14,055-14,082. Created using TexPoint by G. Necula,

Outline Inverse Problems as Statistics –Ingredients; Models –Forward and Inverse Problems—applied perspective –Statistical point of view –Some connections Notation; linear problems; illustration Identifiability and uniqueness –Sketch of identifiablity and extremal modeling –Backus-Gilbert theory Decision Theory –Decision rules and estimators –Comparing decision rules: Loss and Risk –Strategies; Bayes/Minimax duality –Mean distance error and bias –Illustration: Regularization –Illustration: Minimax estimation of linear functionals Distinguishing Models: metrics and consistency

Inverse Problems as Statistics Measurable space X of possible data. Set  of possible descriptions of the world—models. Family P = { P  :  2  } of probability distributions on X, indexed by models . Forward operator   P  maps model  into a probability measure on X. Data X are a sample from P . P  is whole story: stochastic variability in the “truth,” contamination by measurement error, systematic error, censoring, etc.

Models Set  usually has special structure.  could be a convex subset of a separable Banach space T. (geomag, seismo, grav, MT, …) Physical significance of  generally gives   P  reasonable analytic properties, e.g., continuity.

Forward Problems in Geophysics Composition of steps: –transform idealized description of Earth into perfect, noise-free, infinite-dimensional data (“approximate physics”) –censor perfect data to retain only a finite list of numbers, because can only measure, record, and compute with such lists –possibly corrupt the list with measurement error. Equivalent to single-step procedure with corruption on par with physics, and mapping incorporating the censoring.

Inverse Problems Observe data X drawn from distribution P θ for some unknown . (Assume  contains at least two points; otherwise, data superfluous.) Use data X and the knowledge that  to learn about  ; for example, to estimate a parameter g(  ) (the value g(θ) at θ of a continuous G -valued function g defined on  ).

Geophysical Inverse Problems Inverse problems in geophysics often “solved” using applied math methods for Ill-posed problems (e.g., Tichonov regularization, analytic inversions) Those methods are designed to answer different questions; can behave poorly with data (e.g., bad bias & variance) Inference  construction: statistical viewpoint more appropriate for interpreting geophysical data.

Elements of the Statistical View Distinguish between characteristics of the problem, and characteristics of methods used to draw inferences. One fundamental property of a parameter: g is identifiable if for all η,   Θ, {g(η)  g(  )}  { P   P  }. In most inverse problems, g(θ) = θ not identifiable, and few linear functionals of θ are identifiable.

Deterministic and Statistical Perspectives: Connections Identifiability—distinct parameter values yield distinct probability distributions for the observables—similar to uniqueness—forward operator maps at most one model into the observed data. Consistency—parameter can be estimated with arbitrary accuracy as the number of data grows— related to stability of a recovery algorithm—small changes in the data produce small changes in the recovered model.  quantitative connections too.

More Notation Let T be a separable Banach space, T * its normed dual. Write the pairing between T and T * : T * x T  R.

Linear Forward Problems A forward problem is linear if Θ is a subset of a separable Banach space T X = R n For some fixed sequence (κ j ) j=1 n of elements of T *, is a vector of stochastic errors whose distribution does not depend on θ.

Linear Forward Problems, contd. Linear functionals {κ j } are the “representers” Distribution P θ is the probability distribution of X. Typically, dim(Θ) =  ; at least, n < dim(Θ), so estimating θ is an underdetermined problem. Define K : T  R n Θ  ( ) j=1 n. Abbreviate forward problem by X = Kθ + ε, θ  Θ.

Linear Inverse Problems Use X = Kθ + ε, and the knowledge θ  Θ to estimate or draw inferences about g(θ). Probability distribution of X depends on θ only through Kθ, so if there are two points θ 1, θ 2  Θ such that Kθ 1 = Kθ 2 but g(θ 1 )  g(θ 2 ), then g(θ) is not identifiable.

Ex: Sampling w/ systematic and random error Observe X j = f(t j ) +  j +  j, j = 1, 2, …, n, f 2 C, a set of smooth of functions on [0, 1] t j 2 [0, 1] |  j |  1, j=1, 2, …, n  j iid N(0, 1) Take  = C £ [-1, 1] n, X = R n, and  = (f,  1, …,  n ). Then P  has density (2  ) -n/2 exp{-  j=1 n (x j – f(t j )-  j ) 2 }.

R X = R n  Sketch: Identifiability  KK X = K  KK  g(  )  PP P  = P  g(  ) g(  ) { P  = P  } ; {g(  ) = g(  )}, so g not identifiable { P  = P  } ; {  =  }, so  not identifiable g cannot be estimated with bounded bias

Backus-Gilbert Theory Let  = T be a Hilbert space. Let g 2 T = T * be a linear parameter. Let {  j } j=1 n µ T *. Then: g(  ) is identifiable iff g =  ¢ K for some 1 £ n matrix . If also E [  ] = 0, then  ¢ X is unbiased for g. If also  has covariance matrix  = E [  T ], then the MSE of  ¢ X is  ¢  ¢  T.

R X = R n  Sketch: Backus-Gilbert  KK X = K  g(  ) =  ¢ K  PP  ¢ X

Backus-Gilbert ++ : Necessary conditions Let g be an identifiable real-valued parameter. Suppose  θ 0  Θ, a symmetric convex set Ť  T, c  R, and ğ: Ť  R such that: 1. θ 0 + Ť  Θ 2. For t  Ť, g(θ 0 + t) = c + ğ(t), and ğ(-t) = -ğ(t) 3. ğ(a 1 t 1 + a 2 t 2 ) = a 1 ğ(t 1 ) + a 2 ğ(t 2 ), t 1, t 2  Ť, a 1, a 2  0, a 1 +a 2 = 1, and 4. sup t  Ť | ğ(t)| < . Then  1×n matrix Λ s.t. the restriction of ğ to Ť is the restriction of Λ. K to Ť.

Backus-Gilbert ++ : Sufficient Conditions Suppose g = (g i ) i=1 m is an R m -valued parameter that can be written as the restriction to Θ of Λ. K for some m×n matrix Λ. Then 1. g is identifiable. 2. If E[ε] = 0, Λ. X is an unbiased estimator of g. 3. If, in addition, ε has covariance matrix Σ = E[εε T ], the covariance matrix of Λ. X is Λ. Σ. Λ T whatever be P θ.

Decision Rules A (randomized) decision rule δ: X  M 1 ( A ) x  δ x (.), is a measurable mapping from the space X of possible data to the collection M 1 ( A ) of probability distributions on a separable metric space A of actions. A non-randomized decision rule is a randomized decision rule that, to each x  X, assigns a unit point mass at some value a = a(x)  A.

Estimators An estimator of a parameter g(θ) is a decision rule for which the space A of possible actions is the space G of possible parameter values. ĝ=ĝ(X) is common notation for an estimator of g(θ). Usually write non-randomized estimator as a G- valued function of x instead of a M 1 ( G )-valued function.

Comparing Decision Rules  Infinitely many decision rules and estimators. Which one to use? The best one! But what does best mean?

Loss and Risk 2-player game: Nature v. Statistician. Nature picks θ from Θ. θ is secret, but statistician knows Θ. Statistician picks δ from a set D of rules. δ is secret. Generate data X from P θ, apply δ. Statistician pays loss l (θ, δ(X)). l should be dictated by scientific context, but… Risk is expected loss: r(θ, δ) = E  l (θ, δ(X)) Good rule  has small risk, but what does small mean?

Strategy Rare that one  has smallest risk 8   is admissible if not dominated. Minimax decision minimizes sup  r (θ, δ) over  D Bayes decision minimizes over  D for a given prior probability distribution  on 

Minimax is Bayes for least favorable prior If minimax risk >> Bayes risk, prior π controls the apparent uncertainty of the Bayes estimate. Pretty generally for convex  D, concave- convexlike r,

Common Risk: Mean Distance Error (MDE) Let d G denote the metric on G. MDE at θ of estimator ĝ of g is MDE θ (ĝ, g) = E  [d(ĝ, g(θ))]. When metric derives from norm, MDE is called mean norm error (MNE). When the norm is Hilbertian, (MNE) 2 is called mean squared error (MSE).

Bias When G is a Banach space, can define bias at θ of ĝ: bias θ (ĝ, g) = E  [ĝ - g(θ)] (when the expectation is well-defined). If bias θ (ĝ, g) = 0, say ĝ is unbiased at θ (for g). If ĝ is unbiased at θ for g for every θ , say ĝ is unbiased for g. If such ĝ exists, g is unbiasedly estimable. If g is unbiasedly estimable then g is identifiable.

R X = R n  Sketch: Regularization  KK X = K  KK  g(  ) 0 error g(  ) bias

Minimax Estimation of Linear parameters: Hilbert Space, Gaussian error Observe X = K  +  2 R n, with  2  µ T, T a separable Hilbert space  convex {  i } i=1 n iid N(0,  2 ). Seek to learn about g(  ):  ! R, linear, bounded on  For variety of risks (MSE, MAD, length of fixed-length confidence interval), minimax risk is controlled by modulus of continuity of g, calibrated to the noise level. (Donoho, 1994.)

R X = R n  Modulus of Continuity   g(  ) g(  )  KK KK

Distinguishing two models Data tell the difference between two models  and  if the L 1 distance between P  and P  is large:

L 1 and Hellinger distances

Consistency in Linear Inverse Problems X i =  i  +  i, i=1, 2, 3, …  subset of separable Banach {  i }   * linear, bounded on  {  i } iid   consistently estimable w.r.t. weak topology iff  {T k }, T k Borel function of X 1,..., X k, s.t. ,  >0,   *, lim k P  {|  T k -  |>  } = 0

µ a prob. measure on  ; µ a (B) = µ(B-a), a   Pseudo-metric on  **: If restriction to  converges to metric compatible with weak topology, can estimate  consistently in weak topology. For given sequence of functionals {  i }, µ rougher  consistent estimation easier. Importance of the Error Distribution

Summary Statistical viewpoint is useful abstraction. Physics in mapping   P  Prior information in constraint . Separating “model” from parameters of interest is useful: Sabatier’s “well posed questions.” “Solving” inverse problem means different things to different audiences. Thinking about measures of performance is useful. Difficulty of problem  performance of specific method.