Environmental Data Analysis with MatLab Lecture 8: Solving Generalized Least Squares Problems.


SYLLABUS
Lecture 01  Using MatLab
Lecture 02  Looking At Data
Lecture 03  Probability and Measurement Error
Lecture 04  Multivariate Distributions
Lecture 05  Linear Models
Lecture 06  The Principle of Least Squares
Lecture 07  Prior Information
Lecture 08  Solving Generalized Least Squares Problems
Lecture 09  Fourier Series
Lecture 10  Complex Fourier Series
Lecture 11  Lessons Learned from the Fourier Transform
Lecture 12  Power Spectra
Lecture 13  Filter Theory
Lecture 14  Applications of Filters
Lecture 15  Factor Analysis
Lecture 16  Orthogonal Functions
Lecture 17  Covariance and Autocorrelation
Lecture 18  Cross-correlation
Lecture 19  Smoothing, Correlation and Spectra
Lecture 20  Coherence; Tapering and Spectral Analysis
Lecture 21  Interpolation
Lecture 22  Hypothesis Testing
Lecture 23  Hypothesis Testing continued; F-Tests
Lecture 24  Confidence Limits of Spectra, Bootstraps

purpose of the lecture:
use prior information to solve exemplary problems

review of last lecture

failure-proof least squares
add information to the problem that guarantees that matrices like $G^TG$ are never singular; such information is called prior information

examples of prior information
soil density will be around 1500 kg/m³, give or take 500 or so
chemical components sum to 100%
pollutant transport is subject to the diffusion equation
water in rivers always flows downhill

linear prior information $Hm = h$, with covariance $C_h$

simplest example: model parameters near known values
$m_1 = 10 \pm 5$ and $m_2 = 20 \pm 5$, with $m_1$ and $m_2$ uncorrelated
$Hm = h$ with $H = I$, $h = [10, 20]^T$, and
$C_h = \begin{bmatrix} 25 & 0 \\ 0 & 25 \end{bmatrix}$
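
a minimal MatLab sketch of this setup (the variable names are illustrative, not from the lecture):

H = eye(2);              % H = I: one row of prior information per model parameter
h = [10; 20];            % prior values of m1 and m2
Ch = diag([5^2, 5^2]);   % uncorrelated, so Ch is diagonal, with variances of 25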

another example, relevant to chemical constituents that sum to 100%:
$H = [1, 1, \dots, 1]$ and $h = [100]$

use a Normal p.d.f. to represent the prior information:
$p(h) \propto \exp\{ -\tfrac{1}{2} (Hm - h)^T C_h^{-1} (Hm - h) \}$

this Normal p.d.f. defines an "error in prior information", $Hm - h$, with the individual errors weighted by their certainty, $C_h^{-1}$

now suppose that we observe some data: $d = d^{obs}$, with covariance $C_d$

represent the observations with a Normal p.d.f.:
$p(d) = (2\pi)^{-N/2} |C_d|^{-1/2} \exp\{ -\tfrac{1}{2} (d - Gm)^T C_d^{-1} (d - Gm) \}$
where $Gm$ is the mean of the data predicted by the model and $d$ are the observations

this Normal p.d.f. defines an "error in data": the prediction error, $d - Gm$, weighted by its certainty, $C_d^{-1}$

Generalized Principle of Least Squares
the best $m^{est}$ is the one that minimizes the total error
$E(m) = (d - Gm)^T C_d^{-1} (d - Gm) + (Hm - h)^T C_h^{-1} (Hm - h)$
with respect to $m$; as shown in the last lecture, this is justified by Bayes' theorem

generalized least squares solution
the pattern is the same as ordinary least squares, but with more complicated matrices:
$m^{est} = [G^T C_d^{-1} G + H^T C_h^{-1} H]^{-1} [G^T C_d^{-1} d + H^T C_h^{-1} h]$
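
a minimal MatLab sketch of this solution (it assumes G, d, Cd, H, h and Ch are already defined; for large problems, use the Fm = f form developed below instead of these explicit matrix products):

GL = G' * (Cd \ G) + H' * (Ch \ H);    % left-hand side of the normal equations
gr = G' * (Cd \ d) + H' * (Ch \ h);    % right-hand side
mest = GL \ gr;                        % generalized least squares estimate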

(new material) How to use the Generalized Least Squares Equations

generalized least squares is equivalent to solving $Fm = f$ by ordinary least squares, with
$F = \begin{bmatrix} C_d^{-1/2} G \\ C_h^{-1/2} H \end{bmatrix}$ and $f = \begin{bmatrix} C_d^{-1/2} d \\ C_h^{-1/2} h \end{bmatrix}$

uncorrelated, uniform variance case: $C_d = \sigma_d^2 I$ and $C_h = \sigma_h^2 I$, so
$F = \begin{bmatrix} \sigma_d^{-1} G \\ \sigma_h^{-1} H \end{bmatrix}$ and $f = \begin{bmatrix} \sigma_d^{-1} d \\ \sigma_h^{-1} h \end{bmatrix}$

top part: the data equation, weighted by its certainty (the certainty of the measurements)
$\sigma_d^{-1} \{ Gm = d \}$

bottom part: the prior information equation, weighted by its certainty (the certainty of the prior information)
$\sigma_h^{-1} \{ Hm = h \}$

example: no prior information, but the data equation weighted by its certainty, called "weighted least squares":
$\begin{bmatrix} \sigma_{d1}^{-1} G_{11} & \sigma_{d1}^{-1} G_{12} & \cdots & \sigma_{d1}^{-1} G_{1M} \\ \sigma_{d2}^{-1} G_{21} & \sigma_{d2}^{-1} G_{22} & \cdots & \sigma_{d2}^{-1} G_{2M} \\ \vdots & \vdots & & \vdots \\ \sigma_{dN}^{-1} G_{N1} & \sigma_{dN}^{-1} G_{N2} & \cdots & \sigma_{dN}^{-1} G_{NM} \end{bmatrix} m = \begin{bmatrix} \sigma_{d1}^{-1} d_1 \\ \sigma_{d2}^{-1} d_2 \\ \vdots \\ \sigma_{dN}^{-1} d_N \end{bmatrix}$
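
a minimal MatLab sketch of weighted least squares (it assumes G, d, and a length-N vector sd of per-datum standard deviations):

F = diag(1./sd) * G;     % divide row i of the data equation by sd(i)
f = d ./ sd;
mest = (F'*F) \ (F'*f);  % solve Fm = f by ordinary least squares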

[figure: straight-line fit with no prior information, but the data equation weighted by its certainty; the fitted line passes closer to the data with low variance than to the data with high variance]

another example: prior information that the model parameters are small, $m \approx 0$, so $H = I$ and $h = 0$
assume uncorrelated errors with uniform variances: $C_d = \sigma_d^2 I$ and $C_h = \sigma_h^2 I$

$Fm = f$ with $F = \begin{bmatrix} \sigma_d^{-1} G \\ \sigma_h^{-1} I \end{bmatrix}$ and $f = \begin{bmatrix} \sigma_d^{-1} d \\ 0 \end{bmatrix}$
solving by ordinary least squares, $m^{est} = [F^T F]^{-1} F^T f$, gives
$m^{est} = [G^T G + \varepsilon^2 I]^{-1} G^T d$ with $\varepsilon = \sigma_d / \sigma_h$

called "damped least squares":
$m^{est} = [G^T G + \varepsilon^2 I]^{-1} G^T d$ with $\varepsilon = \sigma_d / \sigma_h$
$\varepsilon = 0$: minimize the prediction error
$\varepsilon \to \infty$: minimize the size of the model parameters
$0 < \varepsilon < \infty$: minimize a combination of the two

advantages: really easy to code
mest = (G'*G + (e^2)*eye(M)) \ (G'*d);
and it always works
disadvantages: often need to determine $\varepsilon$ empirically, and the prior information that the model parameters are small is not always sensible

smoothness as prior information

model parameters represent the values of a function m(x) at equally spaced increments along the x-axis

function approximated by its values at a sequence of x's:
$m(x) \to m = [m_1, m_2, m_3, \dots, m_M]^T$, where $m_i = m(x_i)$ and neighboring points $x_i$, $x_{i+1}$ are a distance $\Delta x$ apart

a rough function has a large second derivative; a smooth function is one that is not rough, so a smooth function has a small second derivative

approximate expression for the second derivative:
$\frac{d^2 m}{dx^2}\Big|_{x_i} \approx (\Delta x)^{-2} (m_{i-1} - 2m_i + m_{i+1})$

$i$-th row of $H$, the 2nd derivative at $x_i$:
$(\Delta x)^{-2} \, [\, 0, 0, 0, \dots, 0, 1, -2, 1, 0, \dots, 0, 0, 0 \,]$, with the $-2$ in column $i$

what to do about $m_1$ and $m_M$? there are not enough points for a 2nd derivative. two possibilities:
no prior information for $m_1$ and $m_M$, or
prior information about flatness (the first derivative)

first row of $H$, the 1st derivative at $x_1$:
$(\Delta x)^{-1} \, [\, -1, 1, 0, \dots, 0 \,]$

"smooth interior" / "flat ends" version of $Hm = h$: the first and last rows of $H$ are first-derivative (flatness) rows, the interior rows are second-derivative (smoothness) rows, and $h = 0$
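
a minimal MatLab sketch of this H (the values of M and Dx are illustrative):

M = 101;  Dx = 1.0;
H = zeros(M, M);
H(1, 1:2) = [-1, 1] / Dx;                % flatness (1st derivative) at the left end
for i = 2:(M-1)
    H(i, i-1:i+1) = [1, -2, 1] / Dx^2;   % smoothness (2nd derivative) in the interior
end
H(M, M-1:M) = [-1, 1] / Dx;              % flatness at the right end
h = zeros(M, 1);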

example problem: fill in the missing model parameters so that the resulting curve is smooth
[figure: the function m(x) plotted against x, with a few measured points d]

the model parameters, m: an ordered list of all model parameters
$m = [m_1, m_2, m_3, m_4, m_5, m_6, m_7]^T$

the data, d: just the model parameters that were measured
$d = [d_3, d_5, d_6]^T = [m_3, m_5, m_6]^T$

data equation, $Gm = d$:
$\begin{bmatrix} 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} m_1 \\ m_2 \\ m_3 \\ m_4 \\ m_5 \\ m_6 \\ m_7 \end{bmatrix} = \begin{bmatrix} d_3 \\ d_5 \\ d_6 \end{bmatrix}$
the data are just model parameters that have been observed; the data kernel "associates" a measured model parameter with an unknown model parameter
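
a minimal MatLab sketch of this data kernel:

M = 7;
iobs = [3, 5, 6];          % indices of the model parameters that were observed
N = length(iobs);
G = zeros(N, M);
for k = 1:N
    G(k, iobs(k)) = 1;     % row k picks out model parameter iobs(k)
end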

The prior information equation, $Hm = h$: the "smooth interior" / "flat ends" $H$, with $h = 0$

put them together into the generalized least squares equation:
$F = \begin{bmatrix} \sigma_d^{-1} G \\ \sigma_h^{-1} H \end{bmatrix}$ and $f = \begin{bmatrix} \sigma_d^{-1} d \\ 0 \end{bmatrix}$
choose $\sigma_d / \sigma_h \ll 1$, so that the data take precedence over the prior information

the solution using MatLab
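
a minimal sketch of what the MatLab solution might look like (the variable names and the values of sigmad and sigmah are illustrative):

sigmad = 1.0;  sigmah = 100.0;     % sigmad/sigmah << 1: data take precedence
F = [G / sigmad; H / sigmah];
f = [dobs / sigmad; zeros(M,1)];   % h = 0, so the bottom of f is zero
mest = (F'*F) \ (F'*f);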

[figure: graph of the solution, m plotted against x; the solution passes close to the data and is smooth]

Two MatLab issues
Issue 1: matrices like G and F can be quite big, but contain mostly zeros.
Solution 1: use "sparse matrices", which don't store the zeros.
Issue 2: matrices like $G^TG$ and $F^TF$ are not as sparse as G and F.
Solution 2: solve the equation by a method, such as "biconjugate gradients", that doesn't require the calculation of $G^TG$ and $F^TF$.

Using "sparse matrices", which don't store the zeros ("spalloc" is short for "sparse allocate"):
N=200000; M=100000;
F=spalloc(N,M,3*M);
creates a 200000 × 100000 matrix that can hold up to 300000 (that is, 3*M) non-zero elements; note that an ordinary matrix of that size would have 20,000,000,000 elements

Once allocated, sparse matrices are used just like ordinary matrices … they just consume less memory.
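
for example, the "smooth interior" / "flat ends" matrix H could be built in sparse form (a sketch; spdiags() fills the three diagonals, and the end rows are then overwritten with flatness rows):

M = 100000;  Dx = 1.0;
e = ones(M,1) / Dx^2;
H = spdiags([e, -2*e, e], -1:1, M, M);   % tridiagonal smoothness rows
H(1, 1:2) = [-1, 1] / Dx;                % flatness at the ends
H(M, M-1:M) = [-1, 1] / Dx;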

Issue 2: use the biconjugate gradient solver to avoid calculating $G^TG$ and $F^TF$.
Suppose that we want to solve $F^TFm = F^Tf$. The standard way would be:
mest = (F'*F)\(F'*f);
but that requires that we compute F'*F.

a "biconjugate gradient" solver requires only that we be able to multiply a vector, v, by $F^TF$, where the solver supplies the vector v. So we have to calculate $y = F^TFv$; the trick is to calculate $t = Fv$ first, and then calculate $y = F^Tt$. This is done in a MatLab function, afun().

function y = afun(v, transp_flag)
% the transp_flag argument can be ignored; it is never used
global F;
t = F*v;      % multiply by F first …
y = F'*t;     % … then by F', so F'*F is never formed
return

the bicg() solver ("bi-" for "biconjugate") is passed a "handle" to this function. The new way of solving the generalized inverse problem begins by putting
clear F;
global F;
at the top of the MatLab script, so that afun() can access F.

the arguments of bicg() are: a "handle" to the multiply function, the r.h.s. of the equation $F^TFm = F^Tf$, a tolerance, and a maximum number of iterations. The solution is found by iterative improvement of an initial guess; the iterations stop when the error falls beneath the specified tolerance (good) or, regardless, when the maximum number of iterations is reached (bad).
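
a minimal sketch of the call (the tolerance and iteration limit shown are illustrative assumptions, not values from the lecture):

tol = 1e-5;                             % stopping tolerance
maxit = 1000;                           % maximum number of iterations
mest = bicg(@afun, F'*f, tol, maxit);   % solve F'F m = F'f without forming F'F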

example of a large problem: fill in the missing model parameters that represent a 2D function, m(x, y), so that the function passes through the measured data points, $m(x_i, y_i) = d_i$, and satisfies the diffusion equation
$\partial^2 m / \partial x^2 + \partial^2 m / \partial y^2 = 0$

[figure: A) the observed data, $d_i^{obs} = m(x_i, y_i)$; B) the predicted function, m(x, y); see the text for details on how it is done]