
1 Lecture 8 The Principle of Maximum Likelihood

2 Syllabus
Lecture 01 Describing Inverse Problems
Lecture 02 Probability and Measurement Error, Part 1
Lecture 03 Probability and Measurement Error, Part 2
Lecture 04 The L2 Norm and Simple Least Squares
Lecture 05 A Priori Information and Weighted Least Squares
Lecture 06 Resolution and Generalized Inverses
Lecture 07 Backus-Gilbert Inverse and the Trade-off of Resolution and Variance
Lecture 08 The Principle of Maximum Likelihood
Lecture 09 Inexact Theories
Lecture 10 Nonuniqueness and Localized Averages
Lecture 11 Vector Spaces and Singular Value Decomposition
Lecture 12 Equality and Inequality Constraints
Lecture 13 L1, L∞ Norm Problems and Linear Programming
Lecture 14 Nonlinear Problems: Grid and Monte Carlo Searches
Lecture 15 Nonlinear Problems: Newton's Method
Lecture 16 Nonlinear Problems: Simulated Annealing and Bootstrap Confidence Intervals
Lecture 17 Factor Analysis
Lecture 18 Varimax Factors, Empirical Orthogonal Functions
Lecture 19 Backus-Gilbert Theory for Continuous Problems; Radon's Problem
Lecture 20 Linear Operators and Their Adjoints
Lecture 21 Fréchet Derivatives
Lecture 22 Exemplary Inverse Problems, incl. Filter Design
Lecture 23 Exemplary Inverse Problems, incl. Earthquake Location
Lecture 24 Exemplary Inverse Problems, incl. Vibrational Problems

3 Purpose of the Lecture
Introduce the spaces of all possible data and all possible models, and the idea of likelihood.
Use maximization of likelihood as a guiding principle for solving inverse problems.

4 Part 1 The spaces of all possible data, all possible models and the idea of likelihood

5 Viewpoint: the observed data are one point in the space of all possible observations; that is, d_obs is a point in S(d).

6 [Figure: the data space with axes d1, d2, d3 and origin O; plot of d_obs]

7 [Figure: the data space (d1, d2, d3) with the point d_obs plotted]

8 Now suppose… the data are independent; each is drawn from a Gaussian distribution with the same mean m1 and variance σ² (but m1 and σ are unknown).

9 [Figure: plot of p(d) in the data space (d1, d2, d3)]

10 [Figure: p(d) is a cloud centered on the line d1 = d2 = d3 with radius proportional to σ]

11 Now interpret… p(d_obs) as the probability that the observed data were in fact observed. L = log p(d_obs) is called the likelihood.

12 Find the parameters of the distribution: maximize p(d_obs) with respect to m1 and σ, that is, maximize the probability that the observed data were in fact observed. This is the Principle of Maximum Likelihood.

13 Example
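The equations of this example were images on the original slide. For N independent Gaussian data with common mean m1 and variance σ², the log-likelihood presumably shown here is, in standard form,

```latex
L(m_1,\sigma) = \log p(\mathbf{d}^{\mathrm{obs}})
= -\frac{N}{2}\log(2\pi) - N\log\sigma
  - \frac{1}{2\sigma^{2}}\sum_{i=1}^{N}\left(d_i^{\mathrm{obs}} - m_1\right)^{2}
```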

14 solving the two equations
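The two equations, also images on the original slide, are presumably the stationarity conditions of L with respect to m1 and σ, whose solutions are the estimates summarized on the next slide:

```latex
\frac{\partial L}{\partial m_1} = \frac{1}{\sigma^{2}}\sum_{i=1}^{N}\left(d_i^{\mathrm{obs}} - m_1\right) = 0
\quad\Rightarrow\quad
m_1^{\mathrm{est}} = \frac{1}{N}\sum_{i=1}^{N} d_i^{\mathrm{obs}}

\frac{\partial L}{\partial \sigma} = -\frac{N}{\sigma} + \frac{1}{\sigma^{3}}\sum_{i=1}^{N}\left(d_i^{\mathrm{obs}} - m_1\right)^{2} = 0
\quad\Rightarrow\quad
\left(\sigma^{\mathrm{est}}\right)^{2} = \frac{1}{N}\sum_{i=1}^{N}\left(d_i^{\mathrm{obs}} - m_1^{\mathrm{est}}\right)^{2}
```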

15 The usual formula for the sample mean; almost the usual formula for the sample standard deviation (the maximum likelihood estimate divides by N rather than N - 1).
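A minimal numerical sketch of these two estimates; the data values are made up for illustration, and any 1-D array of observations would do.

```python
import numpy as np

# Hypothetical data vector (made-up values for illustration).
d_obs = np.array([1.2, 0.9, 1.1, 1.4, 0.8])
N = len(d_obs)

# Maximum likelihood estimates under the Gaussian model of the lecture:
m1_est = d_obs.mean()                                  # the usual sample mean
sigma_est = np.sqrt(np.sum((d_obs - m1_est)**2) / N)   # note the 1/N factor

# The "usual" sample standard deviation divides by N - 1 instead, which is
# why the slide calls the MLE formula only *almost* the usual one.
sigma_usual = np.sqrt(np.sum((d_obs - m1_est)**2) / (N - 1))

print(m1_est, sigma_est, sigma_usual)
```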

16 These two estimates are linked to the assumption that the data are Gaussian-distributed; one might get a different formula for a different p.d.f.

17 [Figure: example of a likelihood surface L(m1, σ), plotted against m1 and σ, with the maximum likelihood point marked]

18 [Figure: two panels (A) and (B) showing p(d1, d2); the likelihood maximization process will fail if the p.d.f. has no well-defined peak]

19 Part 2 Using the maximization of likelihood as a guiding principle for solving inverse problems

20 Linear inverse problem Gm = d with Gaussian-distributed data of known covariance [cov d]; assume Gm = d gives the mean of the data, so that p(d) ∝ exp[ -½ (d - Gm)^T [cov d]^{-1} (d - Gm) ].

21 Principle of maximum likelihood: maximize L = log p(d_obs), that is, minimize (d_obs - Gm)^T [cov d]^{-1} (d_obs - Gm) with respect to m.

22 Principle of maximum likelihood: maximize L = log p(d_obs), that is, minimize E = (d_obs - Gm)^T [cov d]^{-1} (d_obs - Gm). This is just weighted least squares.

23 Principle of maximum likelihood: when the data are Gaussian-distributed, solve Gm = d with weighted least squares, with weighting W_e = [cov d]^{-1}.

24 Special case of uncorrelated data, each datum with a different variance: [cov d]_ii = σ_di². Minimize E = Σ_i σ_di^{-2} (d_i^obs - Σ_j G_ij m_j)².

25 errors weighted by their certainty
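A minimal sketch of the weighted least squares solution implied by the preceding slides, assuming uncorrelated data with per-datum variances σ_di²; G, d_obs and the standard deviations are made-up illustrative values.

```python
import numpy as np

# Made-up linear problem Gm = d with M = 2 model parameters and N = 4 data.
G = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
d_obs = np.array([0.9, 2.1, 2.9, 4.2])
sigma_d = np.array([0.1, 0.2, 0.1, 0.3])   # per-datum standard deviations

# Weighting by the inverse data covariance, W_e = [cov d]^{-1}
We = np.diag(1.0 / sigma_d**2)

# Maximum likelihood / weighted least squares estimate:
# m_est = [G^T W_e G]^{-1} G^T W_e d_obs
m_est = np.linalg.solve(G.T @ We @ G, G.T @ We @ d_obs)
print(m_est)
```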

26 but what about a priori information?

27 Probabilistic representation of a priori information: the probability that the model parameters are near m is given by the p.d.f. p_A(m).

28 Probabilistic representation of a priori information: p_A(m) is centered at the a priori value.

29 Probabilistic representation of a priori information: the variance of p_A(m) reflects the uncertainty in the a priori information.

30 [Figure: two a priori p.d.f.s in the (m1, m2) plane, one certain (narrow) and one uncertain (broad)]

31 [Figure: an a priori p.d.f. in the (m1, m2) plane]

32 [Figure: a priori information expressing a linear relationship between m1 and m2, and its approximation with a Gaussian p.d.f.]

33 [Figure: an a priori p.d.f. in the (m1, m2) plane with p = constant inside an allowed region and p = 0 outside]

34 Assessing the information content in p_A(m): do we know a little about m or a lot about m?

35 Information Gain, S, also called Relative Entropy.

36 Relative Entropy, S, also called Information Gain. The null p.d.f. represents the state of no knowledge.

37 Relative Entropy, S, also called Information Gain. A uniform p.d.f. might work for the null p.d.f.
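The defining equation on these two slides was an image; the standard definition of relative entropy with respect to a null p.d.f. p_N(m), which is presumably what was shown, is

```latex
S = \int p_A(\mathbf{m}) \, \log\frac{p_A(\mathbf{m})}{p_N(\mathbf{m})} \, d^{M} m
```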

38 Probabilistic representation of data: the probability that the data are near d is given by the p.d.f. p_A(d).

39 Probabilistic representation of data: p_A(d) is centered at the observed data d_obs.

40 Probabilistic representation of data: the variance of p_A(d) reflects the uncertainty in the measurements.

41 Probabilistic representation of both prior information and observed data: assume the observations and the a priori information are uncorrelated.
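The corresponding equation was an image; under the uncorrelated assumption the combined p.d.f. presumably factors as

```latex
p(\mathbf{m}, \mathbf{d}) = p_A(\mathbf{m}) \, p_A(\mathbf{d})
```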

42 [Figure: example of the combined p.d.f. in the (model m, datum d) plane, centered at the a priori model m_ap and the observed datum d_obs]

43 The theory d = g(m) is a surface in the combined space of data and model parameters, on which the estimated model parameters and predicted data must lie.

44 The theory d = g(m) is a surface in the combined space of data and model parameters, on which the estimated model parameters and predicted data must lie; for a linear theory the surface is planar.

45 The principle of maximum likelihood says: maximize the combined p.d.f. on the surface d = g(m).

46 [Figure: (A) the (model m, datum d) plane showing m_ap, d_obs, the curve d = g(m), and the maximum likelihood point (m_est, d_pre); (B) the p.d.f. p(s) as a function of position s along the curve, with its maximum at s_max]

47 [Figure: the case where m_est ≈ m_ap, with the corresponding d_pre on the curve d = g(m) and the peak of p(s) at s_max]

48 [Figure: the case where d_pre ≈ d_obs, with the corresponding m_est on the curve d = g(m); panels (A) and (B) as before]

49 Principle of maximum likelihood with Gaussian-distributed data and Gaussian-distributed a priori information: minimize the combined objective shown below.
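The quantity to be minimized was an image on the slide; for Gaussian data and Gaussian a priori information it is presumably the combined weighted misfit

```latex
\Phi(\mathbf{m}) =
(\mathbf{d}^{\mathrm{obs}} - G\mathbf{m})^{T} [\mathrm{cov}\,\mathbf{d}]^{-1} (\mathbf{d}^{\mathrm{obs}} - G\mathbf{m})
+ (\mathbf{m} - \langle\mathbf{m}\rangle)^{T} [\mathrm{cov}\,\mathbf{m}]^{-1} (\mathbf{m} - \langle\mathbf{m}\rangle)
```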

50 This is just weighted least squares of the form Fm = f, so we already know the solution.

51 solve Fm=f with simple least squares
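The definitions of F and f were images; a standard reconstruction, assuming the inverse square roots of the covariance matrices are used as weights, is

```latex
F = \begin{bmatrix} [\mathrm{cov}\,\mathbf{d}]^{-1/2} G \\ [\mathrm{cov}\,\mathbf{m}]^{-1/2} I \end{bmatrix},
\qquad
\mathbf{f} = \begin{bmatrix} [\mathrm{cov}\,\mathbf{d}]^{-1/2} \mathbf{d}^{\mathrm{obs}} \\ [\mathrm{cov}\,\mathbf{m}]^{-1/2} \langle\mathbf{m}\rangle \end{bmatrix}
```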

52 when [cov d] = σ_d² I and [cov m] = σ_m² I

53 This provides an answer to the question: What should be the value of ε² in damped least squares? The answer: it should be set to the ratio of the variances of the data and of the a priori model parameters.
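In symbols (assuming the standard damped least squares objective with a priori model value zero), the stated answer is

```latex
\epsilon^{2} = \frac{\sigma_d^{2}}{\sigma_m^{2}},
\qquad
\mathbf{m}^{\mathrm{est}} = \left[ G^{T} G + \epsilon^{2} I \right]^{-1} G^{T} \mathbf{d}^{\mathrm{obs}}
```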

54 If the a priori information is Hm = h with covariance [cov h]_A, then Fm = f becomes the form shown below.
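The resulting form was an image; presumably, paralleling the earlier construction,

```latex
F = \begin{bmatrix} [\mathrm{cov}\,\mathbf{d}]^{-1/2} G \\ [\mathrm{cov}\,\mathbf{h}]_A^{-1/2} H \end{bmatrix},
\qquad
\mathbf{f} = \begin{bmatrix} [\mathrm{cov}\,\mathbf{d}]^{-1/2} \mathbf{d}^{\mathrm{obs}} \\ [\mathrm{cov}\,\mathbf{h}]_A^{-1/2} \mathbf{h} \end{bmatrix}
```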

55 Gm = d_obs with covariance [cov d]; Hm = h with covariance [cov h]_A; m_est = [F^T F]^{-1} F^T f, with F and f as above: the most useful formula in inverse theory.
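A minimal sketch of assembling F and f and solving by simple least squares; G, H, d_obs, h and the covariances are made-up illustrative values, and the Cholesky factor is used as one possible covariance square root.

```python
import numpy as np

# Made-up problem: N = 4 data, M = 3 model parameters.
G = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0],
              [1.0, 3.0, 9.0]])
d_obs = np.array([1.0, 2.2, 4.1, 7.8])
cov_d = np.diag([0.1**2] * 4)              # data covariance [cov d]

# A priori information Hm = h (here a smoothness-like constraint) with
# covariance [cov h]_A expressing how certain that information is.
H = np.array([[1.0, -2.0, 1.0]])
h = np.array([0.0])
cov_h = np.diag([0.5**2])

# Weight each block by the inverse Cholesky factor of its covariance and stack;
# this reproduces G^T [cov d]^{-1} G etc. in the normal equations.
Wd = np.linalg.inv(np.linalg.cholesky(cov_d))
Wh = np.linalg.inv(np.linalg.cholesky(cov_h))
F = np.vstack([Wd @ G, Wh @ H])
f = np.concatenate([Wd @ d_obs, Wh @ h])

# Simple least squares on Fm = f: m_est = [F^T F]^{-1} F^T f
m_est = np.linalg.solve(F.T @ F, F.T @ f)
print(m_est)
```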

