
1 Optimality Conditions for Nonlinear Optimization. Ashish Goel, Department of Management Science and Engineering, Stanford University, Stanford, CA 94305, U.S.A. Based on slides by Yinyu Ye.

2 PROOFS FROM THIS LECTURE ARE NOT PART OF THE EXAM/HWs.

3 Mathematical Optimization Problems. Problem form: (MOP) min f(x) s.t. x ∈ F. The first question: does the problem have a feasible solution, that is, a solution that satisfies all the constraints of the problem (a point in F)? The second question: how does one recognize or certify a (local) optimal solution? We answered this for LP by developing optimality conditions from LP duality. But what about a problem with a general objective and general constraints? We need a more general optimality condition theory.

4 Continuously Differentiable Functions. The objective and constraint functions are often continuously differentiable (in C^1) over certain regions. Sometimes they are twice continuously differentiable (in C^2). The theory distinguishes these two cases and develops first-order optimality conditions and second-order optimality conditions. For convex optimization, first-order optimality conditions suffice.

5 Descent Direction. Let f be a differentiable function on R^n. For a given point x ∈ R^n and a vector d ∈ R^n such that ∇f(x)^T d < 0, where ∇f(x) is the gradient vector of f (the vector of partial derivatives of f with respect to each x_j), we have f(x + τd) < f(x) for all sufficiently small τ > 0. Such a vector d is called a descent direction at x. If ∇f(x) ≠ 0, then ∇f(x) is the direction of steepest ascent and −∇f(x) is the direction of steepest descent at x. Denote by D_x the set of all descent directions at x, that is, D_x = {d ∈ R^n : ∇f(x)^T d < 0}.
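
A minimal numerical sketch of this definition (assuming NumPy is available; the quadratic f below is an illustrative choice, not from the slides): moving a small step along d = −∇f(x) decreases f.

```python
import numpy as np

def f(x):
    # Hypothetical objective: f(x) = x1^2 + 2*x2^2
    return x[0]**2 + 2*x[1]**2

def grad_f(x):
    return np.array([2*x[0], 4*x[1]])

x = np.array([1.0, 1.0])
d = -grad_f(x)                  # steepest-descent direction
assert grad_f(x) @ d < 0        # d is a descent direction
tau = 1e-3
print(f(x + tau*d) < f(x))      # True for small tau
```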

6 Feasible Direction. At a feasible point x, the set of feasible directions is F_x := {d ∈ R^n : d ≠ 0, x + λd ∈ F for all sufficiently small λ > 0}. Examples: Unconstrained: F = R^n ⇒ F_x = R^n. Linear equality: F = {x : Ax = b} ⇒ F_x = {d : Ad = 0}. Linear inequality: F = {x : Ax ≥ b} ⇒ F_x = {d : a_i d ≥ 0 for all i ∈ B(x)}, where B(x) is the binding constraint index set B(x) := {i : a_i x = b_i}. Linear equality and nonnegativity: F = {x : Ax = b, x ≥ 0} ⇒ F_x = {d : Ad = 0, d_i ≥ 0 for all i ∈ B(x)}, where B(x) := {i : x_i = 0}.

7 Optimality Conditions. A fundamental question in optimization: given a feasible solution or point x, what are the necessary conditions for x to be a local optimizer? A general answer: there exists no direction d at x that is both a descent direction and a feasible direction; that is, the intersection of D_x and F_x is empty.

8 Optimality Conditions for Unconstrained Problems. Consider the unconstrained problem, where f is differentiable on R^n: (UP) min f(x) s.t. x ∈ R^n. Theorem 1. Let x be a (local) minimizer of (UP) where f is continuously differentiable at x. Then ∇f(x) = 0. Equivalently, the descent direction set D_x = {d ∈ R^n : ∇f(x)^T d < 0} must be empty.
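
A quick check of Theorem 1 (a sketch, assuming SciPy is available; the objective is an illustrative choice): at a computed minimizer of a smooth unconstrained function, the gradient is numerically zero.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical smooth objective: f(x) = (x1 - 1)^2 + (x2 + 2)^2
f = lambda x: (x[0] - 1)**2 + (x[1] + 2)**2
grad = lambda x: np.array([2*(x[0] - 1), 2*(x[1] + 2)])

res = minimize(f, x0=np.zeros(2), jac=grad, method="BFGS")
print(res.x)         # approx. [1, -2]
print(grad(res.x))   # approx. [0, 0]  (first-order condition holds)
```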

9 Figure: local minimizers of a one-variable function.

10 Quadratic Optimization. Quadratic function: f(x) = x^T Q x − 2 c^T x. A minimizer or maximizer x must satisfy ∇f(x) = 2Qx − 2c = 0, or Qx = c. Example: d_1(x) = 2 − x_1 + x_2, d_2(x) = 3 − 2x_2 + x_1.
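
A sketch with NumPy (Q and c below are made-up data): the stationary point of f(x) = x^T Q x − 2 c^T x is found by solving the linear system Qx = c.

```python
import numpy as np

Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])          # hypothetical symmetric Q
c = np.array([1.0, 2.0])            # hypothetical c

x_star = np.linalg.solve(Q, c)      # solves Q x = c
grad = 2*Q @ x_star - 2*c           # gradient of x^T Q x - 2 c^T x
print(x_star, np.allclose(grad, 0)) # gradient vanishes at x_star
```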

11 Linear Equality Constrained Problems. Consider the linear equality constrained problem, where f is differentiable on R^n: (LEP) min f(x) s.t. Ax = b. Theorem 2. Let x be a (local) minimizer of (LEP) where f is continuously differentiable at x. Then ∇f(x) = A^T y for some vector y = (y_1; ...; y_m) ∈ R^m, whose entries are called Lagrange or dual multipliers. Geometric interpretation: the objective gradient is a combination of the constraint normals (perpendicular to every feasible direction), or equivalently the objective level set is tangent to the constraint hyperplanes.

12 Linear Equality Constrained Problems (example). min (x_1 − 1)^2 + (x_2 − 1)^2 s.t. x_1 + x_2 = 1. The objective level set is tangent to the constraint hyperplane. KKT conditions: 2(x_1 − 1) = y, 2(x_2 − 1) = y, and feasibility (y + 2)/2 + (y + 2)/2 = 1, giving y = −1 and x_1 = x_2 = 1/2.
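
A numerical confirmation of this example (a sketch using NumPy): the KKT system 2(x_1 − 1) = y, 2(x_2 − 1) = y, x_1 + x_2 = 1 is linear, and its solution satisfies ∇f(x) = A^T y.

```python
import numpy as np

# Unknowns (x1, x2, y); rows encode 2(x1-1)=y, 2(x2-1)=y, x1+x2=1
M = np.array([[2.0, 0.0, -1.0],
              [0.0, 2.0, -1.0],
              [1.0, 1.0,  0.0]])
rhs = np.array([2.0, 2.0, 1.0])
x1, x2, y = np.linalg.solve(M, rhs)
print(x1, x2, y)   # 0.5, 0.5, -1.0

A = np.array([[1.0, 1.0]])
grad_f = np.array([2*(x1 - 1), 2*(x2 - 1)])
print(np.allclose(grad_f, A.T @ np.array([y])))   # True: grad f = A^T y
```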

13 Farkas Lemma: Alternative Systems. Consider the primal feasible system {x : Ax = b, x ≥ 0}. If it is infeasible, then there exists a y in the system {y : A^T y ≤ 0, b^T y > 0} (so the dual is unbounded). The reverse is also true. These two systems are an alternative pair: one and only one of the two is feasible. Consider the dual feasible system {y : A^T y ≤ c}. If it is infeasible, then there exists an x in the system {x : Ax = 0, x ≥ 0, c^T x < 0} (so the primal is unbounded). The reverse is also true. These two systems are also an alternative pair: one and only one of the two is feasible.

14 Sketch of Proof. If x is a local optimizer, then the intersection of the descent and feasible direction sets at x must be empty; that is, Ad = 0, ∇f(x)^T d < 0 has no solution d. Equivalently, there are no d' ≥ 0 and d'' ≥ 0 such that A(d' − d'') = 0 and ∇f(x)^T (d' − d'') < 0. By the alternative system theorem, the alternative system must have a solution: there is a y ∈ R^m such that A^T y ≤ ∇f(x) and −A^T y ≤ −∇f(x), that is, ∇f(x) = A^T y.

15 Linear Inequality Constrained Problems. Consider the linear inequality constrained problem, where f is differentiable on R^n: (LIP) min f(x) s.t. Ax ≥ b. Theorem 3. Let x be a (local) minimizer of (LIP) where f is continuously differentiable at x. Then ∇f(x) = A^T y, y ≥ 0, and y_i (Ax − b)_i = 0 for all i = 1, ..., m, for some vector y = (y_1; ...; y_m) ∈ R^m of Lagrange or dual multipliers. Geometric interpretation: the objective gradient is a conic combination of the normal directions of the binding/active constraint hyperplanes.

16 Linear Inequality Constrained Problems (example, non-binding case). min (x_1 − 1)^2 + (x_2 − 1)^2 s.t. x_1 + x_2 ≥ 1. If the constraint were binding, the KKT equations 2(x_1 − 1) = y, 2(x_2 − 1) = y, (y + 2)/2 + (y + 2)/2 = 1 would give y = −1 < 0, violating y ≥ 0. Hence the constraint cannot be binding: y = 0 and x = (1, 1), the unconstrained minimizer.

17 Linear Inequality Constrained Problems (example, binding case). min (x_1 − 1)^2 + (x_2 − 1)^2 s.t. −x_1 − x_2 ≥ −1. Here the constraint is binding: 2(x_1 − 1) = −y, 2(x_2 − 1) = −y, (−y + 2)/2 + (−y + 2)/2 = 1, giving y = 1 ≥ 0 and x_1 = x_2 = 1/2.
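
The binding case can also be checked with a modeling tool; a sketch with CVXPY (an assumption on my part: the lecture mentions CVX, and CVXPY is its Python analogue) recovers the multiplier y = 1.

```python
import cvxpy as cp

x = cp.Variable(2)
con = [x[0] + x[1] <= 1]                 # same constraint as -x1 - x2 >= -1
prob = cp.Problem(cp.Minimize(cp.sum_squares(x - 1)), con)
prob.solve()

print(x.value)             # approx. [0.5, 0.5]
print(con[0].dual_value)   # approx. 1.0  (the Lagrange multiplier y)
```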

18 Sketch of Proof.

19 Linear Equality and Nonnegativity Constrained Problems. Consider the problem, where f is differentiable on R^n: (LENP) min f(x) s.t. Ax = b, x ≥ 0. Theorem 4. Let x be a (local) minimizer of (LENP) where f is continuously differentiable at x. Then ∇f(x) − A^T y ≥ 0 and x_j (∇f(x) − A^T y)_j = 0 for all j = 1, ..., n, for some vector y = (y_1; ...; y_m) ∈ R^m of Lagrange or dual multipliers.

20 Optimality Conditions for Linearly Constrained Problems. (LEP) min f(x) s.t. Ax = b: ∇f(x) = A^T y. (LIP) min f(x) s.t. Ax ≥ b: ∇f(x) = A^T y, y ≥ 0, y_i (Ax − b)_i = 0 for i = 1, ..., m. (LENP) min f(x) s.t. Ax = b, x ≥ 0: ∇f(x) − A^T y ≥ 0, x_j (∇f(x) − A^T y)_j = 0 for j = 1, ..., n. Here the vector y = (y_1; ...; y_m) ∈ R^m is called the Lagrange or dual multiplier vector. These are also called the KKT (Karush-Kuhn-Tucker) conditions.

21 Optimality Conditions for Linearly Constrained Problems (continued). The conditions on the previous slide are necessary but not sufficient in general; however, they are necessary and sufficient when f(x) is convex. An example of convex optimization follows.

22 Convex Functions. Let f(x) be a twice-continuously-differentiable function of one variable. Then f(x) is convex if any of the following equivalent conditions holds: 1. f'(x) is a non-decreasing function; 2. f(ax + (1 − a)y) ≤ a f(x) + (1 − a) f(y) for all x, y and all 0 ≤ a ≤ 1; 3. f''(x) is non-negative on the domain of f, i.e., wherever f is defined. When f is a function of many variables, condition 2 still holds as stated, while conditions 1 and 3 must be modified for higher dimensions. The sum of two convex functions is convex.
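
A small numerical sanity check of condition 2 (a sketch with NumPy; the function below is an illustrative sum of convex functions): sample random points and verify the convexity inequality.

```python
import numpy as np

f = lambda x: np.sum(x**2) + np.exp(x[0])   # sum of convex functions -> convex

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    a = rng.uniform()
    assert f(a*x + (1 - a)*y) <= a*f(x) + (1 - a)*f(y) + 1e-12
print("convexity inequality held on all samples")
```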

23 The Log-Barrier Problem. min −log(x_1) − log(x_2) s.t. x_1 + 2x_2 = 1, x_1, x_2 ≥ 0. KKT conditions: −1/x_1 = y, −1/x_2 = 2y, and feasibility −1/y + 2(−1/(2y)) = 1, giving y = −2 and x_1 = 1/2, x_2 = 1/4. More generally: min −log(x_1) − log(x_2) − ... − log(x_n) s.t. Ax = b, x_1, x_2, ..., x_n ≥ 0. There is a y such that −1/x_j = a_j^T y for j = 1, ..., n; equivalently x_j (−a_j^T y) = 1 for all j.
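
A sketch verifying the two-variable instance with NumPy: plug x = (1/2, 1/4) and y = −2 into the condition −1/x_j = a_j y.

```python
import numpy as np

a = np.array([1.0, 2.0])          # constraint row: x1 + 2*x2 = 1
x = np.array([0.5, 0.25])
y = -2.0

print(np.isclose(a @ x, 1.0))          # feasibility: x1 + 2*x2 = 1
print(np.allclose(-1.0/x, a * y))      # -1/x_j = a_j * y for each j
print(np.allclose(x * (-a * y), 1.0))  # x_j * (-a_j * y) = 1 for all j
```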

24 Optimality Conditions for Nonlinearly Constrained Optimization.

25 Nonlinearly Constrained Optimization Problems. (NEP) min f(x) s.t. h(x) = b'. (NIP) min f(x) s.t. g(x) ≥ b. (NEIP) min f(x) s.t. h(x) = b', g(x) ≥ b. The vector function h(x) maps R^n to R^m and g(x) maps R^n to R^k; that is, h(x) = (h_1(x); h_2(x); ...; h_m(x)) ∈ R^m and g(x) = (g_1(x); g_2(x); ...; g_k(x)) ∈ R^k.

26 Nonlinearly Constrained Optimization Problems.

27 Gradient Vector and Jacobian Matrix at x. The gradient of the objective is ∇f(x)^T = [∂f/∂x_1, ∂f/∂x_2, ..., ∂f/∂x_n]. The Jacobian of h stacks the constraint gradients as rows: ∇h(x) = (∇h_1(x)^T; ∇h_2(x)^T; ...; ∇h_m(x)^T) ∈ R^{m×n}, and likewise ∇g(x) = (∇g_1(x)^T; ∇g_2(x)^T; ...; ∇g_k(x)^T) ∈ R^{k×n}.
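
A sketch of this row-stacking structure (NumPy, forward differences, with a hypothetical constraint map h of my own choosing):

```python
import numpy as np

def h(x):
    # Hypothetical constraint map from R^3 to R^2
    return np.array([x[0]**2 + x[1], x[1]*x[2]])

def jacobian(h, x, eps=1e-6):
    m, n = len(h(x)), len(x)
    J = np.zeros((m, n))
    for j in range(n):
        e = np.zeros(n); e[j] = eps
        J[:, j] = (h(x + e) - h(x)) / eps   # column j: partials w.r.t. x_j
    return J

x = np.array([1.0, 2.0, 3.0])
print(jacobian(h, x))   # approx. [[2, 1, 0], [0, 3, 2]]; row i is grad h_i(x)^T
```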

28 Optimality Conditions for Nonlinearly Constrained Problems. (NEP) min f(x) s.t. h(x) = b': ∇f(x) − ∇h(x)^T y' = 0. (NIP) min f(x) s.t. g(x) ≥ b: ∇f(x) − ∇g(x)^T y = 0, y ≥ 0, y_i (g(x) − b)_i = 0 for i = 1, ..., k. (NEIP) min f(x) s.t. h(x) = b', g(x) ≥ b: ∇f(x) − ∇h(x)^T y' − ∇g(x)^T y = 0, y ≥ 0, y_i (g(x) − b)_i = 0 for i = 1, ..., k. Here y' = (y'_1; ...; y'_m) ∈ R^m and y = (y_1; ...; y_k) ∈ R^k are called Lagrange multiplier vectors. These are also called KKT conditions.
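
A sketch (NumPy; the instance is my own illustrative choice, not from the slides) checking the (NEP) condition ∇f(x) − ∇h(x)^T y' = 0 at a known solution: min x_1 + x_2 s.t. x_1^2 + x_2^2 = 1, whose minimizer is x = (−1/√2, −1/√2) with multiplier y' = −1/√2.

```python
import numpy as np

x = np.array([-1.0, -1.0]) / np.sqrt(2)   # candidate minimizer
y = -1.0 / np.sqrt(2)                     # candidate Lagrange multiplier

grad_f = np.array([1.0, 1.0])             # gradient of f(x) = x1 + x2
grad_h = 2.0 * x                          # gradient of h(x) = x1^2 + x2^2

print(np.isclose(x @ x, 1.0))                  # feasibility: h(x) = 1
print(np.allclose(grad_f - grad_h * y, 0.0))   # KKT stationarity holds
```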

29 Optimality Conditions for Nonlinearly Constrained Problems (continued). The conditions on the previous slide are the FONC (First Order Necessary Conditions).

30 Convex Optimization. (CVXP) min f(x) s.t. x ∈ F, where f is a convex function and F is a convex set. Examples: linear programming, quadratic programming, semidefinite programming.

31 Convex Optimization. (NEP) min f(x) s.t. h(x) = b'. (NIP) min f(x) s.t. g(x) ≥ b. (NEIP) min f(x) s.t. h(x) = b', g(x) ≥ b. The above are convex programs if h is linear, g is concave (i.e., −g is convex), and f is convex.

32 Convex Optimization (continued). With the KKT conditions of slide 28, the above are convex programs if h is linear, g is concave (i.e., −g is convex), and f is convex. For convex programs, the optimality conditions are necessary and sufficient.


34 Optimality Conditions: Example I (figure).

35 Optimality Conditions: Example II (figure).

36 Optimality Conditions: Example III (figure).

37 Second Order Optimality Conditions. The fundamental idea of the first-order necessary condition (FONC) is that there is no direction d that is feasible and descent at the same time. The fundamental idea of the second-order necessary condition (SONC) is that, even when the first-order condition is satisfied at x, one must also have d^T ∇²f(x) d ≥ 0 for every feasible direction d. Consider f(x) = x² and f(x) = −x²: both satisfy the first-order necessary condition at x = 0, but only one of them satisfies the second-order condition. The first-order condition becomes sufficient (for a strict local minimizer) if, in addition, d^T ∇²f(x) d > 0 for every feasible direction d.

38 When is FONC Sufficient? Theorem. If f is a differentiable convex function on the feasible region and the feasible region is a convex set, then the first-order KKT optimality conditions are sufficient for global optimality. How to check convexity of f(x)? The Hessian matrix is PSD in the feasible region, or the epigraph is a convex set. In this class we will only use functions where you can easily verify that the second derivative is positive, or where we have sums of convex functions or convex functions of a linear combination. How to check that the feasible region of min f(x) s.t. h(x) = b', g(x) ≥ b is a convex set? Linear equality and inequality constraints; or h(x) a linear function and g(x) a concave function.
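
For the Hessian-PSD check, a sketch (NumPy; the Hessian below is an illustrative constant matrix from a made-up quadratic): a symmetric matrix is PSD iff its smallest eigenvalue is nonnegative.

```python
import numpy as np

# Hessian of the (hypothetical) quadratic f(x) = x1^2 + x1*x2 + x2^2
H = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals = np.linalg.eigvalsh(H)       # eigenvalues of a symmetric matrix
print(eigvals, np.all(eigvals >= 0))  # [1, 3], True -> H is PSD, f is convex
```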

39 Remarks. Any problem in the previous theorem is called a convex optimization problem. Convexity plays a big role in optimization and greatly simplifies optimality certification. Linear programming is a special convex optimization problem. Many convex optimization modelers and solvers are available: CVX, MOSEK, CPLEX, and of course EXCEL.

40 Optimality Conditions for Linearly Constrained Problems. Another useful form: linear inequality and nonnegativity constraints. (LINP) min f(x) s.t. Ax ≥ b, x ≥ 0: ∇f(x) ≥ A^T y, y ≥ 0, y_i (Ax − b)_i = 0 for i = 1, ..., m, and x_j (∇f(x) − A^T y)_j = 0 for j = 1, ..., n.

41 "Convexification".

42 Another Interesting Example. Assume we control a probability distribution over N outcomes, i = 1, 2, ..., N, where x_i is the probability of the i-th outcome. Goal: maximize the entropy of the distribution, i.e., maximize Σ_i −x_i log x_i. Since x_i log x_i is convex, the entropy is concave, and hence maximizing the entropy of a probability distribution under linear constraints is a convex optimization problem. Important for communications! Natural constraints: x ≥ 0, Σ_i x_i = 1.
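
A sketch with CVXPY (an assumption: CVXPY's cp.entr(x) gives −x log x elementwise, and an exponential-cone-capable solver is installed): with only the simplex constraints, the entropy maximizer is the uniform distribution.

```python
import cvxpy as cp
import numpy as np

N = 4
x = cp.Variable(N)
objective = cp.Maximize(cp.sum(cp.entr(x)))   # sum_i -x_i log x_i
constraints = [cp.sum(x) == 1, x >= 0]
cp.Problem(objective, constraints).solve()

print(x.value)                                # approx. [0.25, 0.25, 0.25, 0.25]
print(np.allclose(x.value, 1/N, atol=1e-4))   # uniform distribution maximizes entropy
```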

43 OPTIONAL MATERIAL.

44 Second Order Optimality Conditions: Special Cases. min f(x) s.t. h(x) = b. First-order condition: ∇f(x) − ∇h(x)^T y = 0. The Lagrangian is L(x, y) = f(x) − (h(x) − b)^T y, so the first-order condition says ∇_x L(x, y) = 0.

45 Who satisfies the second order necessary condition? (figure).

46 Constraint Qualification. To develop optimality or KKT conditions for nonlinearly constrained optimization, one needs some technical assumptions, called constraint qualifications. For equality constraints, the standard assumption is that the Jacobian matrix at the test solution has full row rank, i.e., its rows are linearly independent. For inequality constraints, the standard assumption is that there is a feasible direction at the test solution pointing into the interior of the feasible region. When the problem data are randomly perturbed a little, the constraint qualification is met with probability one.
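
A sketch of the full-row-rank check for equality constraints (NumPy, with an illustrative Jacobian of my own choosing):

```python
import numpy as np

# Hypothetical Jacobian of h at the test solution: 2 constraints in R^3
J = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])

rank = np.linalg.matrix_rank(J)
print(rank == J.shape[0])   # True -> rows independent; constraint qualification holds
```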

