Presentation on theme: "Lecture 13 L1 , L∞ Norm Problems and Linear Programming"— Presentation transcript:
1 Lecture 13 L1 , L∞ Norm Problems and Linear Programming In this lecture, we show how to solve problems involving the L1 norm.
2 SyllabusLecture 01 Describing Inverse Problems Lecture 02 Probability and Measurement Error, Part 1 Lecture 03 Probability and Measurement Error, Part 2 Lecture 04 The L2 Norm and Simple Least Squares Lecture 05 A Priori Information and Weighted Least Squared Lecture 06 Resolution and Generalized InversesLecture 07 Backus-Gilbert Inverse and the Trade Off of Resolution and Variance Lecture 08 The Principle of Maximum Likelihood Lecture 09 Inexact Theories Lecture 10 Nonuniqueness and Localized Averages Lecture 11 Vector Spaces and Singular Value DecompositionLecture 12 Equality and Inequality Constraints Lecture 13 L1 , L∞ Norm Problems and Linear Programming Lecture 14 Nonlinear Problems: Grid and Monte Carlo Searches Lecture 15 Nonlinear Problems: Newton’s Method Lecture 16 Nonlinear Problems: Simulated Annealing and Bootstrap Confidence Intervals Lecture 17 Factor Analysis Lecture 18 Varimax Factors, Empircal Orthogonal Functions Lecture 19 Backus-Gilbert Theory for Continuous Problems; Radon’s Problem Lecture 20 Linear Operators and Their Adjoints Lecture 21 Fréchet Derivatives Lecture 22 Exemplary Inverse Problems, incl. Filter Design Lecture 23 Exemplary Inverse Problems, incl. Earthquake Location Lecture 24 Exemplary Inverse Problems, incl. Vibrational Problems
3 Purpose of the LectureReview Material on Outliers and Long-Tailed DistributionsDerive the L1 estimate of themean and variance of an exponential distributionSolve the Linear Inverse Problem under the L1 normby Transformation to a Linear Programming ProblemDo the same for the L∞ problemThe first section emphasizes the link between p.d.f.’s and measures of error.The seconds applies the same likelihood method as was previously applied to the Gaussian p.d.f.The last two show how to solve general L1 problems
4 Part 1 Review Material on Outliers and Long-Tailed Distributions If the error has a long tailed p.d.f.the data will have lots of outliersso outliers are unimportant
5 Review of the Ln family of norms Just a review of the formulas for the norms
6 higher norms give increaing weight to largest element of e Higher the norm, the more important an outlier isThe higher norms give preferential weight to the largest of the errors.Fig ?. Hypothetical prediction error, ei(zi), and its absolute value, raised the powers, 1, 2 and 10. While most elements of |ei| are numerically significant, only a few elements of |ei|10 are. MatLab script gda03_??.
7 limiting caseThe limiting case say the norm of e is the absolute value of its largest element.
8 but which norm to use? it makes a difference! Different estimate of the model parameters for each norm.
9 outlierL1L2L∞Example of fitting a straight line to data. Note the outlier. The higher the norm,the more the outlier “pulls down” the line.Straight line problem solved under three different norms.This is a worst-case example, since the data have a horrible outlier (be sure to point it out).Fig ?. Straight line fits to (z,d) pairs where the error is measured under the L1, L2 and L∞ norms. The L1 norm gives the least weight to the one outlier. MatLab script gda03_??.
10 Answer is related to the distribution of the error Answer is related to the distribution of the error. Are outliers common or rare?B)A)Comparison of a long-tailed (left) and short-tailed p.d.f.Use low norms for long-tailed p.d.f.’s, high norms for short tailed p.d.f.’sFig ?. A) Long-tailed probability density function. B) Short-tailed probability density function. MatLab script gda03_??.short tailsoutliers uncommonoutliers importantuse high normgives high weight to outlierslong tailsoutliers commonoutliers unimportantuse low normgives low weight to outliers
11 as we showed previously … use L2 norm when data has Gaussian-distributed error as we will show in a moment … use L1 norm when data has Exponentially-distributed error
12 comparison of p.d.f.’s Gaussian Exponential Emphasize the “square” in the Gaussian and the “absolute value” in the exponentialFig Comparison of exponential probability density distribution (red) with a Normal probability density distribution (blue) of equal variance, σ2=1. The exponential probability density distribution is longer tailed. MatLab script gda08_01.
13 to make realizations of an exponentially-distributed random variable in MatLab mu = sd/sqrt(2);rsign = (2*(random('unid',2,Nr,1)-1)-1);dr = dbar + rsign .* ...random('exponential',mu,Nr,1);Note that Matlab’s “exponential” distribution is one-sized, so must fudge by multiplying it by randomly chosen sign,
14 Part 2 Derive the L1 estimate of the mean and variance of an exponential distribution Scenario: we measure the temperature in the room (presumed spatially uniform) N times.The thermometer has exponentially-distributed error.What’s the best estimate of the mean (the temperature in the room)and the variance of the measurement.
15 use of Principle of Maximum Likelihood maximize L = log p(dobs) the log-probability that the observed data was in fact observed with respect to unknown parameters in the p.d.f. e.g. its mean m1 and variance σ2We applied this principle many times before when discussing Gaussian-distributed errors.
16 Previous Example: Gaussian p.d.f. We did this a few lectures ago.Note that taking the log gets rid of the exponential and changes products of variables into sums of variables.The maximization is accomplished by setting derivatives to zero, as usual.
17 solving the two equations The two equations are easy to solve. The first can only be zero when the summation is zero, whichyields the formula for m1. Once m1 is known, solving the second is trivial.
18 solving the two equations I use the word almost, since the usual formula for standard deviation has a factor of (N-1) in it,not the factor of N that appears above. However, this difference is insignificant when N is large.usual formula for the sample meanalmost the usual formula for the sample standard deviation
19 New Example: Exponential p.d.f. Note that taking the log gets rid of the exponential and changes products of variables into sums of variables.The maximization is accomplished by setting derivatives to zero, as usual.Note that the derivative of an |x| is sign(x):if x>0 then dx/dx=1.if x<0 then d(-x)/dx=-1so derivative is +1 if x>0 and -1 if x<0(and 0 if x=0)
20 solving the two equations m1est=median(d) andThe first equation is satisfied if the sum sign(di-<d>)=0.So you must pick <d> so that half of the di are greater than <d> and half are smaller.Then half of the signs are +1 and half are -1, and their sum is zero.But that’s just the formula for the median.
21 solving the two equations m1est=median(d) andThe median is robust, in the sense that adding 1 datumsuch as an outliercan only change the median from one centrally-located dito a nearest-neighbor and thus still centrally located di.more robust than sample meansince outlier moves it only by “one data point”
22 E(m)(A)(C)(B)mestL1 error as a function of m, when N is even. Note that it’s flat between the two central data.L1 error as a function of m, when N is odd. Note that the estimate IS the central datum.L2 error is a smooth quadratic curve.Fig Inverse problem to estimate the mean, mest, of N observations. (A) L1 error , E(m), (red curve) with even N: The error has a flat minimum, bounded by two observations (circles), and the solution is non-unique. (B) L1 error with odd N: The error is minimum at a one of the observation points, and the solution is unique. (C) The L2 error, both odd and even N: The error is minimum at a single point, which may not correspond to an observation point, and the solution is unique. MatLab script gda08_02.
23 these properties carry over to the general linear problem observationsWhen the number of data are even, the solution in non-unique but boundedThe solution exactly satisfies one of the datathese properties carry over to the general linear problemThe notion of finite bounds is new. There is no analog in the L2 world.In certain cases, the solution can be non-unique but boundedThe solution exactly satisfies M of the data equations
24 Part 3 Solve the Linear Inverse Problem under the L1 norm by Transformation to a Linear Programming ProblemNow we develop the tools needed to solve practical problems.
25 the Linear Programming problem reviewthe Linear Programming problemHere’s the formal statement of the Linear Programming problem ...
26 Case A The Minimum L1 Length Solution Analogous to the minimum L2 length problem that we applied to Gm=d when it was underdetermined.
27 subject to the constraint minimizesubject to the constraintGm=dThe formal statement of the problem. Emphasize the absolute value ...
28 minimize subject to the constraint Gm=d weighted L1 solution length (weighted by σm-1)subject to the constraintGm=dusual data equations
29 transformation to an equivalent linear programming problem In the course, we encounter many cases where a “harder” problem can be transformed into an “easier” one.
30 Here’s the transformation. We’ll go through it step by step.
31 all variables are required to be positive First, all the variables are required to be positive. Notice that there are 5 variables. We’ll definethem in a minute.all variables are required to be positive
32 usual data equations with m=m’-m’’ Notice that one of the equality constraints is Gm=d, written with the substitution m=m’-m’’.
33 standard trick in linear programming “slack variables”standard trick in linear programmingto allow m to have any sign while m1 and m2 are non-negativeThis substitution is a standard trick in linear programming. It relaxed the positivity constraint on m.While m’ and m’’ are constrained to be positive, their difference is not required to be positive.
34 same asThe other two equality constraints can be rewritten as shown ...
35 can always be satisfied by choosing an appropriate x’ if +Suppose that m-<m> is positive.In the first equationsince x is non-negativealpha must be greater than m-<m>.In the second equation, you can always choose x’ to satisfy the equation, regardless of the value of alpha.so the second equation does not constrain alpha.can always be satisfied by choosing an appropriate x’then α ≥ (m-<m>)since x≥0
36 if - can always be satisfied by choosing an appropriate x’ Suppose that m-<m> is negative.In the second equationsince x is non-negativealpha must be greater than -(m-<m>).In the first equation, you can always choose x to satisfy the equation, regardless of the value of alpha.so the first equation does not constrain alpha.can always be satisfied by choosing an appropriate x’then α ≥ -(m-<m>)since x≥0
37 taken together then α ≥|m-<m>| Taken together the two equations imply that alpha is greater thanthe absolute value of m-<m>taken togetherthen α ≥|m-<m>|
38 same as minimizing weighted solution length minimizing zsame as minimizing weighted solution lengthso minimizing the weighted sum of alphais the same asminimizing the weighted sum of abs(m-<m>).Clever!
39 Case B Least L1 error solution (analogous to least squares)
40 transformation to an equivalent linear programming problem
41 This transformation is similar to the minimum-length case This transformation is similar to the minimum-length case. Once again, 5 variables, all positive.
42 so previous argument applies There are two complimentary equality constraints,rigged to constrain thatα ≥|Gm-d|so mimimizing sum alpha is the same a minimizing sum abs(d-Gm)same asα – x = Gm – dα – x’ = -(Gm – d)so previous argument applies
43 MatLab % variables % m = mp - mpp % x = [mp', mpp', alpha', x', xp']' % mp, mpp len M and alpha, x, xp, len NL = 2*M+3*N;x = zeros(L,1);f = zeros(L,1);f(2*M+1:2*M+N)=1./sd;MatLab implementationNote that there are lots of variables.This is a limitation of the method – problems can easily be too big to handle.
45 % inequality constraints A x <= b % part 1: everything positive A = zeros(L+2*M,L);b = zeros(L+2*M,1);A(1:L,:) = -eye(L,L);b(1:L) = zeros(L,1);% part 2; mp and mpp have an upper bound.A(L+1:L+2*M,:) = eye(2*M,L);mls = (G'*G)\(G'*dobs); % L2mupperbound=10*max(abs(mls));b(L+1:L+2*M) = mupperbound;Inequality constraints are simple; everything positive.
46 % solve linear programming problem [x, fmin] = linprog(f,A,b,Aeq,beq); fmin=-fmin; mest = x(1:M) - x(M+1:2*M);Call to MatLab’s linear porgamming functionand then compute madded benefit is that fmin is the L1 error
47 di outlier zi Example: Red-true; Green L1; Blue L2. L2 is badly affected by the outlier. L1 is not.Fig Curve fitting using the L1 norm. The true data (red curve) follow a cubic polynomial in an auxiliary variable, z. Observations (red circles) have additive noise with zero mean and variance, σ2=1, drawn from an exponential probability density function. Note the outlier at z≈1. The L1 fit (green curve) is not as affected by the outlier as a standard L2 (least-squares) fit (blue curve). MatLab script gda08_03.
48 the mixed-determined problem of minimizing L+E can also be solved via transformation but we omit it hereIn other words, add prior information that m is close to <m>.
49 Part 4 Solve the Linear Inverse Problem under the L∞ norm by Transformation to a Linear Programming ProblemNice to know, but actually, one rarely (if ever) wants to solve this problem.
50 we’re going to skip all the details and just show the transformation for the overdetermined case Its explained in the text.
51 minimize E=maxi (ei /σdi) where e=dobs-Gm Emphasize that the Linf problem is to minimize the largest prediction error.The transformation is similar to the ones that we’ve seen previously ...
52 note α is a scalarExcept that alpha is a scalar, not a vector ...
54 di outlier zi Example: Red-true; Green Linf; Blue L2. Linf is very badly affected by the outlier.Fig Curve fitting using the L∞ norm. The true data (red curve) follow a cubic polynomial in an auxiliary variable, z. Observations (red circles) have additive noise with zero mean and variance, σ2=1, drawn from an exponential probability density function. Note the outlier at z≈ The L∞ fit (green curve) is more affected by the outlier than is a standard L2 (least-squares) fit (blue curve). MatLab script gda08_04.