Computational Optimization


Computational Optimization: Steepest Descent (2/8)

Most Obvious Algorithm: The negative gradient always points downhill. Head in the direction of the negative gradient, use a linesearch to decide how far to go, and repeat.

Steepest Descent Algorithm: Start with x_0. For k = 1, …, K: if x_k is optimal, stop; otherwise set p_k = -∇f(x_k), perform an exact or backtracking linesearch to choose α_k, and update x_{k+1} = x_k + α_k p_k.

How far should we step? Linesearch: x_{k+1} = x_k + α_k p_k. Choices for α_k: fixed (α_k = c > 0, a constant), decreasing, exact, or approximate conditions (Armijo, Wolfe, Goldstein).

Key Idea: Sufficient Decrease The step can’t go to 0 unless the gradient goes to 0.

Armijo Condition: g(α) ≤ g(0) + c_1 α g'(0), where g(α) = f(x_k + α p_k). [Figure: plot of g(α) together with the lines g(0) + α g'(0) and g(0) + c_1 α g'(0), marking the acceptable step lengths.]
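
A minimal Python sketch of the Armijo test (not from the slides; the function name, signature, and the default c1 = 1e-4 are illustrative), with g(α) = f(x_k + α p_k) as above:

```python
import numpy as np

def armijo_holds(f, x, p, alpha, grad_x, c1=1e-4):
    """Check the Armijo (sufficient decrease) condition:
    f(x + alpha*p) <= f(x) + c1 * alpha * grad_x.p,
    i.e. g(alpha) <= g(0) + c1 * alpha * g'(0)."""
    return f(x + alpha * p) <= f(x) + c1 * alpha * np.dot(grad_x, p)
```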

Curvature Condition: make sure the step uses up most of the available decrease, i.e. g'(α) ≥ c_2 g'(0), so the slope at the accepted step is not still steeply negative.

Wolfe Conditions: for 0 < c_1 < c_2 < 1, require both sufficient decrease, g(α) ≤ g(0) + c_1 α g'(0), and curvature, g'(α) ≥ c_2 g'(0). A step satisfying both exists for any descent direction if f is bounded below along the linesearch ray (Lemma 3.1, Nocedal & Wright).
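
A companion sketch that checks both Wolfe conditions; the names and the common default values c1 = 1e-4, c2 = 0.9 are assumptions, not taken from the slides:

```python
import numpy as np

def wolfe_holds(f, grad, x, p, alpha, c1=1e-4, c2=0.9):
    """Check the (weak) Wolfe conditions for a step alpha along a descent direction p:
      sufficient decrease: f(x + a p) <= f(x) + c1 * a * g'(0)
      curvature:           g'(a) >= c2 * g'(0)
    where g(a) = f(x + a p) and g'(a) = grad(x + a p).p."""
    g0 = np.dot(grad(x), p)          # g'(0), negative for a descent direction
    x_new = x + alpha * p
    sufficient = f(x_new) <= f(x) + c1 * alpha * g0
    curvature = np.dot(grad(x_new), p) >= c2 * g0
    return sufficient and curvature
```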

Backtracking Search: Key point: the stepsize cannot be allowed to go to 0 unless the gradient is going to 0; we must have sufficient decrease. Fix an initial step ᾱ > 0 and two constants in (0,1) (a shrink factor and the Armijo constant); starting from α = ᾱ, repeatedly shrink α until the Armijo condition holds.
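
A sketch of the backtracking loop, under the assumption that the three elided parameters on the slide are an initial step, a shrink factor, and the Armijo constant; the default values below are illustrative:

```python
import numpy as np

def backtracking(f, grad_x, x, p, alpha0=1.0, rho=0.5, c1=1e-4):
    """Backtracking linesearch: start from alpha0 and shrink by rho until the
    Armijo condition f(x + a p) <= f(x) + c1 * a * grad_x.p holds.
    For a descent direction (grad_x.p < 0) on a smooth f, a suitable step exists."""
    fx = f(x)
    slope = np.dot(grad_x, p)        # g'(0) < 0 for a descent direction
    alpha = alpha0
    for _ in range(50):              # safeguard against an endless shrink
        if f(x + alpha * p) <= fx + c1 * alpha * slope:
            return alpha
        alpha *= rho
    return alpha
```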

Armijo Condition (repeated): the backtracking loop accepts the first α with g(α) ≤ g(0) + c_1 α g'(0). [Same figure as before.]

Steepest Descent Algorithm (with backtracking): Start with x_0. For k = 1, …, K: if x_k is optimal, stop; otherwise set p_k = -∇f(x_k), perform a backtracking linesearch to choose α_k, and update x_{k+1} = x_k + α_k p_k.
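
Putting the pieces together, a hedged sketch of the overall method; the tolerance, iteration cap, and passing the linesearch as a callable are implementation choices, not prescribed by the slides:

```python
import numpy as np

def steepest_descent(f, grad, x0, linesearch, tol=1e-6, max_iter=1000):
    """Steepest descent: p_k = -grad f(x_k), x_{k+1} = x_k + alpha_k p_k,
    with alpha_k chosen by the supplied linesearch(f, grad_x, x, p) routine."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:     # treat a (near-)zero gradient as "x_k is optimal"
            break
        p = -g                           # steepest descent direction
        alpha = linesearch(f, g, x, p)   # e.g. the backtracking sketch above
        x = x + alpha * p
    return x
```

For instance, `steepest_descent(f, grad, x0, backtracking)` with the backtracking sketch above reproduces the algorithm on this slide.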

Convergence Analysis: We usually analyze the quadratic problem min f(x) = 1/2 x'Qx - b'x; this is representative of more general problems because a smooth function looks quadratic near a minimizer. Let x* solve the quadratic problem and set y = x - x*, so x = y + x*; then f depends on y only through 1/2 y'Qy, so we may assume the minimizer is at the origin.
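
The algebra behind this reduction, written out in LaTeX (using that Qx* = b at the minimizer):

```latex
\begin{aligned}
f(y + x^*) &= \tfrac{1}{2}(y + x^*)^{\top} Q (y + x^*) - b^{\top}(y + x^*) \\
           &= \tfrac{1}{2}\, y^{\top} Q y + y^{\top}(Q x^* - b) + f(x^*) \\
           &= \tfrac{1}{2}\, y^{\top} Q y + f(x^*) \qquad \text{since } Q x^* = b .
\end{aligned}
```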

Convergence Rate: Consider the case of a constant stepsize α. For the quadratic, ∇f(x) = Qx - b, so the error satisfies x_{k+1} - x* = (I - αQ)(x_k - x*); square both sides (take squared norms) to bound the error reduction.

Convergence Rate (continued): the squared error shrinks by at least the factor (max_i |1 - αλ_i|)², where λ_i are the eigenvalues of Q.
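
The derivation summarized on the last two slides, written out in LaTeX for a constant stepsize α:

```latex
\begin{aligned}
x_{k+1} - x^* &= x_k - \alpha (Q x_k - b) - x^* = (I - \alpha Q)(x_k - x^*), \\
\|x_{k+1} - x^*\|^2 &= (x_k - x^*)^{\top} (I - \alpha Q)^2 (x_k - x^*)
  \;\le\; \Big(\max_i |1 - \alpha \lambda_i|\Big)^2 \|x_k - x^*\|^2 ,
\end{aligned}
```

so the iteration contracts whenever 0 < α < 2/λ_max.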

Best Step: the constant stepsize that minimizes the contraction factor max_i |1 - αλ_i| is α = 2/(λ_min + λ_max), which gives the factor (λ_max - λ_min)/(λ_max + λ_min) = (κ-1)/(κ+1).
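
A worked instance in LaTeX; the eigenvalue pair 1 and 50 is borrowed from the ill-conditioned example two slides below and is only illustrative (the contraction factor depends only on κ):

```latex
\lambda_{\min} = 1,\ \lambda_{\max} = 50:\qquad
\alpha^* = \frac{2}{1 + 50} = \frac{2}{51},\qquad
\max_i |1 - \alpha^* \lambda_i| = \frac{50 - 1}{50 + 1} = \frac{49}{51} \approx 0.96 .
```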

Condition number determines convergence rate: convergence is linear, since ||x_{k+1} - x*|| ≤ ((κ-1)/(κ+1)) ||x_k - x*||, where κ = λ_max/λ_min is the condition number of Q; κ = 1 is best.

Well-Conditioned Problem: x² + y². Condition number = 1/1 = 1. Steepest descent does one step, like Newton's method.

Ill-Conditioned Problem: 50(x-10)² + y². Condition number = 50/1 = 50. Steepest descent zigzags!
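
A small Python sketch of this example; the starting point (11, 50) is an arbitrary choice made to exhibit the worst-case behaviour, and the exact-linesearch step α_k = g_k'g_k / g_k'Q g_k is the closed form for quadratics:

```python
import numpy as np

# Ill-conditioned quadratic from the slide: f(x, y) = 50*(x - 10)**2 + y**2,
# with Hessian Q = diag(100, 2) and minimizer x* = (10, 0).
Q = np.diag([100.0, 2.0])
x_star = np.array([10.0, 0.0])

def grad(x):
    return Q @ (x - x_star)

x = np.array([11.0, 50.0])           # illustrative start chosen to show the worst-case rate
for k in range(15):
    g = grad(x)
    alpha = (g @ g) / (g @ Q @ g)    # exact linesearch step for a quadratic
    x = x - alpha * g
    print(k, x, np.linalg.norm(x - x_star))
```

The first coordinate overshoots the minimizer and flips sign every iteration (the zigzag), while the error norm shrinks only by roughly (κ-1)/(κ+1) = 49/51 ≈ 0.96 per step.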

Linear Convergence: with our best fixed stepsize, the error shrinks by the factor (κ-1)/(κ+1) every iteration, i.e. ||x_k - x*|| ≤ ((κ-1)/(κ+1))^k ||x_0 - x*||.

Theorem 3.3 (Nocedal & Wright): Let {x_k} be the sequence generated by steepest descent with exact linesearch applied to min 1/2 x'Qx - b'x, where Q is positive definite. Then for any x_0 the sequence converges to the unique minimizer x*, and ||x_{k+1} - x*||_Q² ≤ ((λ_n - λ_1)/(λ_n + λ_1))² ||x_k - x*||_Q², where λ_1 ≤ … ≤ λ_n are the eigenvalues of Q and ||x||_Q² = x'Qx. That is, the method converges linearly.

Condition Number: Officially, κ(A) = ||A|| ||A⁻¹||. For the 2-norm this is the ratio of the maximum and minimum eigenvalues when A is symmetric positive definite: κ(A) = λ_max/λ_min.
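
A quick numpy check of the two characterizations; the diagonal matrix is an arbitrary SPD example:

```python
import numpy as np

A = np.diag([100.0, 10.0, 2.0])     # arbitrary symmetric positive definite example
eigs = np.linalg.eigvalsh(A)
print(np.linalg.cond(A, 2))          # ||A|| * ||A^{-1}|| in the 2-norm
print(eigs.max() / eigs.min())       # ratio of extreme eigenvalues: same value (50.0)
```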

Theorem 3.4: Suppose f is twice continuously differentiable and the steepest descent iterates converge to a point x* satisfying the second-order sufficient conditions (∇²f(x*) positive definite). Let r be any constant with (λ_n - λ_1)/(λ_n + λ_1) < r < 1, where λ_1 ≤ … ≤ λ_n are the eigenvalues of ∇²f(x*). Then for all k sufficiently large, f(x_{k+1}) - f(x*) ≤ r² [f(x_k) - f(x*)].

Other possible directions: any direction p satisfying ∇f(x)'p < 0 is a descent direction. Which ones will work?

Zoutendijk's Theorem: for descent directions p_k and stepsizes satisfying the Wolfe conditions, if f is bounded below, continuously differentiable, and ∇f is Lipschitz continuous, then Σ_k cos²θ_k ||∇f(x_k)||² < ∞, where cos θ_k = -∇f(x_k)'p_k / (||∇f(x_k)|| ||p_k||). Corollary: cos²θ_k ||∇f(x_k)||² → 0.

Convergence Theorem: if cos θ_k ≥ δ > 0 for all iterations k, then ||∇f(x_k)|| → 0. Steepest descent has cos θ_k = 1, so steepest descent converges! Many other variations.

Theorem: Convergence of Gradient-Related Methods. (i) Assume the level set S = {x : f(x) ≤ f(x_0)} is bounded. (ii) Let ∇f be Lipschitz continuous on S, i.e. ||∇f(x) - ∇f(y)|| ≤ L ||x - y|| for all x, y ∈ S and some fixed finite L. Consider the sequence generated by x_{k+1} = x_k + α_k p_k.

Theorem 10.2 (Nash and Sofer), continued: (iii) let the directions p_k be descent directions, ∇f(x_k)'p_k < 0, and (iv) be gradient related, -∇f(x_k)'p_k ≥ ε ||∇f(x_k)|| ||p_k|| for some ε > 0, and bounded in norm: m ||∇f(x_k)|| ≤ ||p_k|| ≤ M ||∇f(x_k)|| for constants 0 < m ≤ M.

Theorem 10.2, continued: let the stepsize α_k be the first element of the sequence {1, 1/2, 1/4, 1/8, …} satisfying the sufficient-decrease condition f(x_k + α_k p_k) ≤ f(x_k) + μ α_k ∇f(x_k)'p_k with 0 < μ < 1. Then lim_k ||∇f(x_k)|| = 0.

Steepest Descent: an obvious choice of direction is p_k = -∇f(x_k). It is called steepest descent because the negative gradient minimizes the directional derivative ∇f(x_k)'p over all directions with ||p|| = 1. It clearly satisfies the gradient-related requirement of Theorem 10.2 (with cos θ_k = 1).

Steepest Descent Summary: simple algorithm; inexpensive per iteration; only requires first-derivative information; global convergence to a local minimum; linear convergence, which may be slow; convergence rate depends on the condition number; may zigzag.