# Lecture 13 L1, L∞ Norm Problems and Linear Programming

In this lecture, we show how to solve problems involving the L1 norm.

Syllabus
- Lecture 01 Describing Inverse Problems
- Lecture 02 Probability and Measurement Error, Part 1
- Lecture 03 Probability and Measurement Error, Part 2
- Lecture 04 The L2 Norm and Simple Least Squares
- Lecture 05 A Priori Information and Weighted Least Squares
- Lecture 06 Resolution and Generalized Inverses
- Lecture 07 Backus-Gilbert Inverse and the Trade Off of Resolution and Variance
- Lecture 08 The Principle of Maximum Likelihood
- Lecture 09 Inexact Theories
- Lecture 10 Nonuniqueness and Localized Averages
- Lecture 11 Vector Spaces and Singular Value Decomposition
- Lecture 12 Equality and Inequality Constraints
- Lecture 13 L1, L∞ Norm Problems and Linear Programming
- Lecture 14 Nonlinear Problems: Grid and Monte Carlo Searches
- Lecture 15 Nonlinear Problems: Newton’s Method
- Lecture 16 Nonlinear Problems: Simulated Annealing and Bootstrap Confidence Intervals
- Lecture 17 Factor Analysis
- Lecture 18 Varimax Factors, Empirical Orthogonal Functions
- Lecture 19 Backus-Gilbert Theory for Continuous Problems; Radon’s Problem
- Lecture 20 Linear Operators and Their Adjoints
- Lecture 21 Fréchet Derivatives
- Lecture 22 Exemplary Inverse Problems, incl. Filter Design
- Lecture 23 Exemplary Inverse Problems, incl. Earthquake Location
- Lecture 24 Exemplary Inverse Problems, incl. Vibrational Problems

Purpose of the Lecture
- Review Material on Outliers and Long-Tailed Distributions
- Derive the L1 estimate of the mean and variance of an exponential distribution
- Solve the Linear Inverse Problem under the L1 norm by Transformation to a Linear Programming Problem
- Do the same for the L∞ problem

The first section emphasizes the link between p.d.f.’s and measures of error. The second applies the same likelihood method as was previously applied to the Gaussian p.d.f. The last two show how to solve general L1 and L∞ problems.

Part 1 Review Material on Outliers and Long-Tailed Distributions
If the error has a long-tailed p.d.f., the data will have lots of outliers, so any individual outlier is unimportant and should be given little weight.

Review of the Ln family of norms
Just a review of the formulas for the norms
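For reference, the formulas under review can be written as below. This is the standard definition of the family; the vector length N is a symbol assumed here, and e is the vector of prediction errors.

$$
\|e\|_1 = \sum_{i=1}^{N} |e_i| , \qquad
\|e\|_2 = \left[ \sum_{i=1}^{N} |e_i|^2 \right]^{1/2} , \qquad
\|e\|_n = \left[ \sum_{i=1}^{N} |e_i|^n \right]^{1/n}
$$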

higher norms give increasing weight to the largest element of e
The higher the norm, the more important an outlier is: the higher norms give preferential weight to the largest of the errors. Fig ?. Hypothetical prediction error, ei(zi), and its absolute value, raised to the powers 1, 2 and 10. While most elements of |ei| are numerically significant, only a few elements of |ei|^10 are. MatLab script gda03_??.

limiting case
In the limiting case n → ∞, the norm of e is the absolute value of its largest element.
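A small numerical illustration of the limiting case, as a sketch only: the error vector `e` below is made up for the purpose, and MatLab's built-in `norm(e,p)` is used to evaluate each norm.

```matlab
% hypothetical error vector; the last element plays the role of an outlier
e = [1; -2; 0.5; 10];

E1   = norm(e,1);     % sum of absolute values
E2   = norm(e,2);     % square root of the sum of squares
E10  = norm(e,10);    % dominated by the largest element
Einf = norm(e,Inf);   % absolute value of the largest element

% the norms decrease toward max(abs(e)) as the order increases
fprintf('L1=%.4f  L2=%.4f  L10=%.6f  Linf=%.4f\n', E1, E2, E10, Einf);
```

As the order of the norm increases, the value is controlled more and more by the single largest element.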

but which norm to use? it makes a difference!
Different estimate of the model parameters for each norm.

Example of fitting a straight line to data under three different norms: L1, L2 and L∞. Note the outlier. The higher the norm, the more the outlier “pulls down” the line. This is a worst-case example, since the data have a horrible outlier (be sure to point it out). Fig ?. Straight line fits to (z,d) pairs where the error is measured under the L1, L2 and L∞ norms. The L1 norm gives the least weight to the one outlier. MatLab script gda03_??.

Answer is related to the distribution of the error
Answer is related to the distribution of the error. Are outliers common or rare? Comparison of a long-tailed (left) and short-tailed p.d.f. Use low norms for long-tailed p.d.f.’s, high norms for short-tailed p.d.f.’s. Fig ?. A) Long-tailed probability density function. B) Short-tailed probability density function. MatLab script gda03_??.

- long tails: outliers common, outliers unimportant, use low norm (gives low weight to outliers)
- short tails: outliers uncommon, outliers important, use high norm (gives high weight to outliers)

As we showed previously, use the L2 norm when the data have Gaussian-distributed error. As we will show in a moment, use the L1 norm when the data have exponentially-distributed error.

comparison of p.d.f.’s Gaussian Exponential
Emphasize the “square” in the Gaussian and the “absolute value” in the exponential. Fig. Comparison of exponential probability density distribution (red) with a Normal probability density distribution (blue) of equal variance, σ²=1. The exponential probability density distribution is longer tailed. MatLab script gda08_01.
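The two p.d.f.’s being compared can be written as follows. This is a sketch in assumed notation: a single datum d with mean ⟨d⟩ and variance σ², with the two-sided exponential normalized so that its variance is also σ² (consistent with the scale sd/sqrt(2) used in the MatLab snippet of this lecture).

$$
\text{Gaussian:}\quad p(d) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{(d-\langle d\rangle)^2}{2\sigma^2} \right\}
\qquad
\text{Exponential:}\quad p(d) = \frac{1}{\sqrt{2}\,\sigma} \exp\left\{ -\frac{\sqrt{2}\,|d-\langle d\rangle|}{\sigma} \right\}
$$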

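to make realizations of an exponentially-distributed random variable in MatLab

```matlab
% sd (standard deviation), dbar (mean) and Nr (number of realizations)
% are assumed to be defined already
mu = sd/sqrt(2);                           % scale of the one-sided exponential
rsign = (2*(random('unid',2,Nr,1)-1)-1);   % random signs, -1 or +1
dr = dbar + rsign .* ...
    random('exponential',mu,Nr,1);         % two-sided exponential realizations
```

Note that MatLab’s “exponential” distribution is one-sided, so we must fudge by multiplying it by a randomly chosen sign.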

Part 2 Derive the L1 estimate of the mean and variance of an exponential distribution
Scenario: we measure the temperature in the room (presumed spatially uniform) N times. The thermometer has exponentially-distributed error. What’s the best estimate of the mean (the temperature in the room) and the variance of the measurement?

use of Principle of Maximum Likelihood
Maximize L = log p(dobs), the log-probability that the observed data was in fact observed, with respect to the unknown parameters in the p.d.f., e.g. its mean m1 and variance σ². We applied this principle many times before when discussing Gaussian-distributed errors.

Previous Example: Gaussian p.d.f.
We did this a few lectures ago. Note that taking the log gets rid of the exponential and changes products of variables into sums of variables. The maximization is accomplished by setting derivatives to zero, as usual.
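For reference, the Gaussian case can be sketched as follows, assuming N independent data d_i with common mean m_1 and standard deviation σ (the derivative is taken with respect to σ here, which is one of several equivalent choices):

$$
L = \log p(d^{obs}) = -\frac{N}{2}\log(2\pi) - N\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(d_i - m_1)^2
$$

$$
\frac{\partial L}{\partial m_1} = \frac{1}{\sigma^2}\sum_{i=1}^{N}(d_i - m_1) = 0 ,
\qquad
\frac{\partial L}{\partial \sigma} = -\frac{N}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{N}(d_i - m_1)^2 = 0
$$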

solving the two equations
The two equations are easy to solve. The first can only be zero when the summation is zero, which yields the formula for m1. Once m1 is known, solving the second is trivial.

solving the two equations
The solution is the usual formula for the sample mean and almost the usual formula for the sample standard deviation. I use the word almost, since the usual formula for standard deviation has a factor of (N-1) in it, not the factor of N that appears above. However, this difference is insignificant when N is large.
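Written out, and assuming the Gaussian likelihood with N independent data d_i, setting the two derivatives to zero gives the following estimates (note the factor 1/N rather than 1/(N-1)):

$$
m_1^{est} = \frac{1}{N}\sum_{i=1}^{N} d_i ,
\qquad
\sigma^{est} = \left[ \frac{1}{N}\sum_{i=1}^{N}\left(d_i - m_1^{est}\right)^2 \right]^{1/2}
$$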

New Example: Exponential p.d.f.
Note that taking the log gets rid of the exponential and changes products of variables into sums of variables. The maximization is accomplished by setting derivatives to zero, as usual. Note that the derivative of |x| is sign(x): if x>0 then |x|=x and dx/dx=1; if x<0 then |x|=-x and d(-x)/dx=-1. So the derivative is +1 if x>0 and -1 if x<0 (and is taken to be 0 if x=0).
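A sketch of the corresponding equations, assuming N independent data d_i drawn from a two-sided exponential p.d.f. with mean m_1 and variance σ², that is, p(d_i) = (√2 σ)⁻¹ exp(−√2 |d_i − m_1| / σ):

$$
L = \log p(d^{obs}) = -\frac{N}{2}\log 2 - N\log\sigma - \frac{\sqrt{2}}{\sigma}\sum_{i=1}^{N}|d_i - m_1|
$$

$$
\frac{\partial L}{\partial m_1} = \frac{\sqrt{2}}{\sigma}\sum_{i=1}^{N}\operatorname{sign}(d_i - m_1) = 0 ,
\qquad
\frac{\partial L}{\partial \sigma} = -\frac{N}{\sigma} + \frac{\sqrt{2}}{\sigma^2}\sum_{i=1}^{N}|d_i - m_1| = 0
$$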

solving the two equations
m1est = median(d), together with a corresponding estimate of σ. The first equation is satisfied if the sum of sign(di-<d>) is zero. So you must pick <d> so that half of the di are greater than <d> and half are smaller. Then half of the signs are +1 and half are -1, and their sum is zero. But that’s just the definition of the median.
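In symbols, and under the assumption of a two-sided exponential p.d.f. with variance σ² (the second formula follows from setting the σ-derivative of the log-likelihood to zero), the two estimates are:

$$
m_1^{est} = \operatorname{median}(d) ,
\qquad
\sigma^{est} = \frac{\sqrt{2}}{N}\sum_{i=1}^{N}\left|d_i - m_1^{est}\right|
$$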

solving the two equations
m1est = median(d) is more robust than the sample mean, since an outlier moves it by only “one data point”. The median is robust in the sense that adding one datum, such as an outlier, can only change the median from one centrally-located di to a nearest neighbor, which is still a centrally-located di.
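The robustness claim can be checked with a few lines of MatLab. This is a minimal sketch with made-up temperature readings; the σ estimate uses the formula (√2/N) Σ|di − median(d)|, which assumes a two-sided exponential p.d.f. with variance σ².

```matlab
d = [20.1; 19.8; 20.3; 20.0; 19.9];   % hypothetical temperature readings
dout = [d; 100];                       % same readings plus one wild outlier

% how far does each estimate of the mean move when the outlier is added?
shift_mean   = abs(mean(dout)   - mean(d));     % large: more than 13 degrees
shift_median = abs(median(dout) - median(d));   % small: 0.05 degrees

% maximum likelihood estimate of sigma for exponentially-distributed error
N = length(d);
sigma_est = (sqrt(2)/N) * sum(abs(d - median(d)));
```

The mean is dragged toward the outlier, while the median only steps to the midpoint of two neighboring central data.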

Fig. Inverse problem to estimate the mean, mest, of N observations. (A) L1 error, E(m), (red curve) with even N: The error has a flat minimum, bounded by two observations (circles), and the solution is non-unique. (B) L1 error with odd N: The error is minimum at one of the observation points, and the solution is unique. (C) The L2 error, both odd and even N: The error is minimum at a single point, which may not correspond to an observation point, and the solution is unique. MatLab script gda08_02.
Notes: (A) L1 error as a function of m, when N is even. Note that it’s flat between the two central data. (B) L1 error as a function of m, when N is odd. Note that the estimate IS the central datum. (C) L2 error is a smooth quadratic curve.
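The flat minimum for even N and the unique minimum for odd N can be reproduced numerically. The sketch below is not the gda08_02 script; the data values and the helper `E1` are invented for illustration.

```matlab
% L1 error E(m) = sum_i |d_i - m| for a trial mean m
E1 = @(d,m) sum(abs(d - m));

deven = [1; 2; 4; 7];    % hypothetical data, even N
dodd  = [1; 2; 4];       % hypothetical data, odd N

% even N: the error is the same everywhere between the central data 2 and 4
Eeven = [E1(deven,2), E1(deven,3), E1(deven,4)];

% odd N: search a grid of trial means; the minimum is at the central datum
m = (0:0.01:5)';
Eodd = arrayfun(@(mi) E1(dodd,mi), m);
[Emin, imin] = min(Eodd);
mest = m(imin);
```

With these data, `Eeven` contains three equal values, and `mest` coincides with the central datum of `dodd`.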

these properties carry over to the general linear problem
For the mean: when the number of data is even, the solution is non-unique but bounded, and the solution exactly satisfies one of the data (observations). These properties carry over to the general linear problem: in certain cases, the solution can be non-unique but bounded, and the solution exactly satisfies M of the data equations. The notion of finite bounds is new. There is no analog in the L2 world.

Part 3 Solve the Linear Inverse Problem under the L1 norm by Transformation to a Linear Programming Problem
Now we develop the tools needed to solve practical problems.

the Linear Programming problem
Review of the Linear Programming problem. Here’s the formal statement of the Linear Programming problem ...
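As a reminder of the form involved, a linear programming problem can be written in the convention of MatLab's linprog, which is the convention followed by the script in this lecture. The symbols f, A, b, A_eq and b_eq are the names used in that script; the notation on the slide itself may differ.

$$
\text{find } x \text{ that minimizes } z = f^{T}x
\quad\text{subject to}\quad
A x \le b
\quad\text{and}\quad
A_{eq}\,x = b_{eq}
$$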

Case A The Minimum L1 Length Solution
Analogous to the minimum L2 length problem that we applied to Gm=d when it was underdetermined.

subject to the constraint
Minimize the L1 solution length subject to the constraint Gm=d. This is the formal statement of the problem. Emphasize the absolute value ...

Minimize the weighted L1 solution length (weighted by σm⁻¹) subject to the constraint Gm=d, the usual data equations.
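In symbols, a sketch of the problem statement, assuming M model parameters, a prior model ⟨m⟩ and prior standard deviations σ_m (the symbol L for the solution length follows the lecture's later reference to minimizing L+E):

$$
\text{minimize } L = \sum_{i=1}^{M} \frac{|m_i - \langle m_i\rangle|}{\sigma_{m_i}}
\quad\text{subject to}\quad
Gm = d
$$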

transformation to an equivalent linear programming problem
In the course, we encounter many cases where a “harder” problem can be transformed into an “easier” one.

Here’s the transformation. We’ll go through it step by step.
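A sketch of the transformed problem, reconstructed from the step-by-step discussion in this part of the lecture. It uses five non-negative vectors m′, m″, α, x and x′, with the model recovered as m = m′ − m″:

$$
\text{minimize } z = \sum_{i=1}^{M} \frac{\alpha_i}{\sigma_{m_i}}
\quad\text{subject to}
$$

$$
G[m' - m''] = d , \qquad
[m' - m''] + x - \alpha = \langle m\rangle , \qquad
[m' - m''] - x' + \alpha = \langle m\rangle
$$

$$
m' \ge 0 , \quad m'' \ge 0 , \quad \alpha \ge 0 , \quad x \ge 0 , \quad x' \ge 0
$$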

all variables are required to be positive
First, all the variables are required to be positive. Notice that there are 5 variables. We’ll define them in a minute.

usual data equations with m=m’-m’’
Notice that one of the equality constraints is Gm=d, written with the substitution m=m’-m’’.

standard trick in linear programming
“Slack variables” are a standard trick in linear programming, used here to allow m to have any sign while m’ and m’’ are non-negative. The substitution relaxes the positivity constraint on m: while m’ and m’’ are constrained to be positive, their difference is not required to be positive.

The other two equality constraints can be rewritten into an equivalent form, as shown ...
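The rewritten form can be sketched as follows, with m = m′ − m″. It is obtained by moving α, x and x′ to one side of each of the two constraints, and it is the form used in the sign argument for α:

$$
\alpha - x = m - \langle m\rangle ,
\qquad
\alpha - x' = -\left(m - \langle m\rangle\right)
$$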

if m-<m> is positive, then α ≥ (m-<m>) since x≥0
Suppose that m-<m> is positive. In the first equation, since x is non-negative, alpha must be greater than or equal to m-<m>. The second equation can always be satisfied by choosing an appropriate x’, regardless of the value of alpha, so the second equation does not constrain alpha.

if m-<m> is negative, then α ≥ -(m-<m>) since x’≥0
Suppose that m-<m> is negative. In the second equation, since x’ is non-negative, alpha must be greater than or equal to -(m-<m>). The first equation can always be satisfied by choosing an appropriate x, regardless of the value of alpha, so the first equation does not constrain alpha.

taken together, α ≥ |m-<m>|
Taken together, the two equations imply that alpha is greater than or equal to the absolute value of m-<m>.

minimizing z is the same as minimizing the weighted solution length
So minimizing the weighted sum of alpha is the same as minimizing the weighted sum of abs(m-<m>). Clever!
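The minimum L1 length transformation can be tried on a tiny problem. The following is a minimal sketch, not one of the course scripts: it assumes one data equation m1 + m2 = 2, a zero prior model and unit prior standard deviations (all invented values), and it passes the non-negativity constraints to `linprog` as lower bounds.

```matlab
% hypothetical underdetermined problem: one equation, two unknowns
G = [1, 1];  d = 2;
M = 2;                          % number of model parameters
mprior = zeros(M,1);            % prior model <m>
sm = ones(M,1);                 % prior standard deviations

% unknown vector xx = [mp; mpp; alpha; x; xp], each block of length M
L = 5*M;
f = zeros(L,1);
f(2*M+1:3*M) = 1./sm;           % objective z = sum(alpha ./ sm)

Aeq = zeros(1+2*M,L);  beq = zeros(1+2*M,1);
% data equation G*(mp-mpp) = d
Aeq(1,1:M) = G;  Aeq(1,M+1:2*M) = -G;  beq(1) = d;
% (mp-mpp) + x - alpha = <m>
r = 2:M+1;
Aeq(r,1:M) = eye(M);         Aeq(r,M+1:2*M) = -eye(M);
Aeq(r,2*M+1:3*M) = -eye(M);  Aeq(r,3*M+1:4*M) = eye(M);
beq(r) = mprior;
% (mp-mpp) - xp + alpha = <m>
r = M+2:2*M+1;
Aeq(r,1:M) = eye(M);         Aeq(r,M+1:2*M) = -eye(M);
Aeq(r,2*M+1:3*M) = eye(M);   Aeq(r,4*M+1:5*M) = -eye(M);
beq(r) = mprior;

lb = zeros(L,1);                % all variables non-negative
[xx, zmin] = linprog(f,[],[],Aeq,beq,lb,[]);
mest = xx(1:M) - xx(M+1:2*M);   % recover m = mp - mpp
```

For this problem the minimum weighted length is 2, and any non-negative pair with m1 + m2 = 2 attains it, so the particular `mest` returned is one of many equally good solutions.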

Case B Least L1 error solution (analogous to least squares)

transformation to an equivalent linear programming problem

This transformation is similar to the minimum-length case
This transformation is similar to the minimum-length case. Once again, 5 variables, all positive.

so the previous argument applies
There are two complementary equality constraints, rigged to constrain α ≥ |Gm-d|, so minimizing the sum of alpha is the same as minimizing the sum of abs(d-Gm). The constraints are the same as α - x = Gm - d and α - x’ = -(Gm - d), so the previous argument applies.
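A sketch of the transformed least L1 error problem, written to agree with the constraint comments in the MatLab script of this lecture (N data with standard deviations σ_d, and five non-negative vectors m′, m″, α, x and x′):

$$
\text{minimize } z = \sum_{i=1}^{N} \frac{\alpha_i}{\sigma_{d_i}}
\quad\text{subject to}
$$

$$
G[m' - m''] + x - \alpha = d , \qquad
G[m' - m''] - x' + \alpha = d
$$

$$
m' \ge 0 , \quad m'' \ge 0 , \quad \alpha \ge 0 , \quad x \ge 0 , \quad x' \ge 0
$$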

MatLab implementation

```matlab
% variables
% m = mp - mpp
% x = [mp', mpp', alpha', x', xp']'
% mp, mpp have length M; alpha, x, xp have length N
L = 2*M+3*N;
x = zeros(L,1);
f = zeros(L,1);
f(2*M+1:2*M+N) = 1./sd;   % objective: sum of alpha ./ sd
```

Note that there are lots of variables. This is a limitation of the method: problems can easily be too big to handle.

Equality constraints

```matlab
% equality constraints
Aeq = zeros(2*N,L);
beq = zeros(2*N,1);
% first equation G(mp-mpp)+x-alpha=d
Aeq(1:N,1:M) = G;
Aeq(1:N,M+1:2*M) = -G;
Aeq(1:N,2*M+1:2*M+N) = -eye(N,N);
Aeq(1:N,2*M+N+1:2*M+2*N) = eye(N,N);
beq(1:N) = dobs;
% second equation G(mp-mpp)-xp+alpha=d
Aeq(N+1:2*N,1:M) = G;
Aeq(N+1:2*N,M+1:2*M) = -G;
Aeq(N+1:2*N,2*M+1:2*M+N) = eye(N,N);
Aeq(N+1:2*N,2*M+2*N+1:2*M+3*N) = -eye(N,N);
beq(N+1:2*N) = dobs;
```

Inequality constraints are simple: everything positive, plus an upper bound on mp and mpp.

```matlab
% inequality constraints A x <= b
% part 1: everything positive
A = zeros(L+2*M,L);
b = zeros(L+2*M,1);
A(1:L,:) = -eye(L,L);
b(1:L) = zeros(L,1);
% part 2: mp and mpp have an upper bound
A(L+1:L+2*M,:) = eye(2*M,L);
mls = (G'*G)\(G'*dobs);           % L2 (least-squares) solution
mupperbound = 10*max(abs(mls));
b(L+1:L+2*M) = mupperbound;
```

```matlab
% solve linear programming problem
[x, fmin] = linprog(f,A,b,Aeq,beq);   % linprog minimizes f'*x
mest = x(1:M) - x(M+1:2*M);           % recover m = mp - mpp
% fmin is the weighted L1 error, sum(alpha ./ sd)
```

Call to MatLab’s linear programming function, and then compute m. An added benefit is that fmin is the L1 error.

Example: Red-true; Green L1; Blue L2.
L2 is badly affected by the outlier. L1 is not. Fig. Curve fitting using the L1 norm. The true data (red curve) follow a cubic polynomial in an auxiliary variable, z. Observations (red circles) have additive noise with zero mean and variance, σ²=1, drawn from an exponential probability density function. Note the outlier at z≈1. The L1 fit (green curve) is not as affected by the outlier as a standard L2 (least-squares) fit (blue curve). MatLab script gda08_03.

The mixed-determined problem of minimizing L+E can also be solved via transformation, but we omit it here.
In other words, add prior information that m is close to <m>.

Part 4 Solve the Linear Inverse Problem under the L∞ norm by Transformation to a Linear Programming Problem
Nice to know, but actually, one rarely (if ever) wants to solve this problem.

we’re going to skip all the details and just show the transformation for the overdetermined case
It’s explained in the text.

minimize E = max_i (|ei| / σdi) where e = dobs - Gm
Emphasize that the L∞ problem is to minimize the largest prediction error. The transformation is similar to the ones that we’ve seen previously ...

Note that α is a scalar: the transformation is as before, except that alpha is a scalar, not a vector ...
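By analogy with the L1 case, the transformed problem can be sketched as below. This is a reconstruction under the assumption that the single scalar α bounds every weighted prediction error, α ≥ |(Gm − d)_i| / σ_{d_i}; the text has the authoritative version.

$$
\text{minimize } z = \alpha
\quad\text{subject to}
$$

$$
G[m' - m''] + x - \alpha\,\sigma_d = d , \qquad
G[m' - m''] - x' + \alpha\,\sigma_d = d
$$

$$
m' \ge 0 , \quad m'' \ge 0 , \quad \alpha \ge 0 , \quad x \ge 0 , \quad x' \ge 0
$$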

Example: Red-true; Green L∞; Blue L2.
L∞ is very badly affected by the outlier. Fig. Curve fitting using the L∞ norm. The true data (red curve) follow a cubic polynomial in an auxiliary variable, z. Observations (red circles) have additive noise with zero mean and variance, σ²=1, drawn from an exponential probability density function. Note the outlier. The L∞ fit (green curve) is more affected by the outlier than is a standard L2 (least-squares) fit (blue curve). MatLab script gda08_04.
