Math for CS, Lecture 5: Function Optimization

Why Function Optimization?

There are three main reasons why most problems in robotics, vision, and arguably every other science or endeavor take on the form of optimization problems. One is that the desired goal may not be achievable, and so we try to get as close as possible to it. The second reason is that there may be more than one way to achieve the goal, and so we can choose among the solutions by assigning a quality to each of them and selecting the best one. The third reason is that we may not know how to solve the system of equations f(x) = 0, so instead we minimize the norm ||f(x)||, which is a scalar function of the unknown vector x.

Local Minimization and Steepest Descent

Suppose that we want to find a local minimum of the scalar function f of the vector variable x, starting from an initial point x_0. Picking an appropriate x_0 is crucial, but also very problem-dependent. We start from x_0 and we go downhill. At every step of the way, we must make the following decisions: whether to stop; in what direction to proceed; and how long a step to take. The following algorithm reflects the structure of the various 'descent minimization' procedures:

    k = 0
    while x_k is not a minimum
        compute a step direction p_k with ||p_k|| = 1
        compute a step size α_k
        x_{k+1} = x_k + α_k p_k
        k = k + 1
    end
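As a concrete illustration of this loop, here is a minimal Python sketch; the callables choose_direction and choose_step_size are hypothetical placeholders for the specific rules developed in the following slides:

    import numpy as np

    def descent_minimize(f, grad, x0, choose_direction, choose_step_size,
                         tol=1e-8, max_iter=1000):
        # Generic descent loop: x_{k+1} = x_k + alpha_k * p_k
        x = np.asarray(x0, dtype=float)
        for k in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) < tol:        # "is x_k a minimum?" (gradient test)
                break
            p = choose_direction(x, g)         # unit-norm step direction p_k
            alpha = choose_step_size(f, x, p)  # step size alpha_k
            x = x + alpha * p
        return x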

Minimization of Positive Definite Functions (1)

The best direction of descent is not necessarily the direction of steepest descent. Consider the function

    f(x) = c + a^T x + \frac{1}{2} x^T Q x,     (1)

where Q is a symmetric, positive definite matrix. Positive definite means that for every nonzero x the quantity x^T Q x is positive. In this case, the graph of f(x) - c is the plane a^T x plus the paraboloid \frac{1}{2} x^T Q x. Of course, if f were this simple, no descent methods would be necessary. In fact, the minimum of f can be found by setting its gradient to zero,

    \nabla f(x) = a + Q x = 0,

so that the minimum x* is the solution of the linear system

    Q x = -a.

Since Q is positive definite, it is also invertible (why?), and the solution x* is unique.
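Since the minimum satisfies Q x* = -a, it can be computed directly with a linear solve. Below is a small numpy sketch; the particular Q, a and c are arbitrary illustration values:

    import numpy as np

    Q = np.array([[4.0, 1.0],
                  [1.0, 3.0]])           # symmetric, positive definite (example)
    a = np.array([1.0, -2.0])
    c = 0.5

    f = lambda x: c + a @ x + 0.5 * x @ Q @ x

    np.linalg.cholesky(Q)                # succeeds only because Q is positive definite
    x_star = np.linalg.solve(Q, -a)      # minimum: the solution of Q x = -a
    print(x_star, np.allclose(Q @ x_star, -a))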

Minimization of Positive Definite Functions (2)

In order to simplify the mathematics, we observe that if we let

    e(x) = \frac{1}{2} (x - x^*)^T Q (x - x^*),

then we have

    e(x) = f(x) - c + \frac{1}{2} a^T Q^{-1} a = f(x) - f(x^*),

so that e and f differ only by a constant. Since e is simpler, we consider that we are minimizing e rather than f. In addition, we can let

    y = x - x^*,

which shifts the origin of the domain to x*, and study the function

    e(y) = \frac{1}{2} y^T Q y.     (2)
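As a quick numerical sanity check (reusing the illustrative Q, a and c from the previous sketch), e(x) indeed differs from f(x) only by the constant f(x*):

    import numpy as np

    Q = np.array([[4.0, 1.0], [1.0, 3.0]])
    a = np.array([1.0, -2.0])
    c = 0.5
    f = lambda x: c + a @ x + 0.5 * x @ Q @ x
    x_star = np.linalg.solve(Q, -a)
    e = lambda x: 0.5 * (x - x_star) @ Q @ (x - x_star)

    x = np.array([2.0, 5.0])                    # arbitrary test point
    print(np.isclose(e(x), f(x) - f(x_star)))   # True: e and f differ by a constant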

The Steepest Descent Direction

The function e reaches its minimum, with value zero, at y* = 0. Let our steepest descent algorithm find this minimum by starting from the initial point

    y_0 = x_0 - x^*.

At every iteration the algorithm chooses the direction of steepest descent,

    p_k = -\frac{g_k}{\|g_k\|},

which is opposite to the gradient of e evaluated at y_k:

    g_k = \nabla e(y_k) = Q y_k.

The Step Size

The most favorable step size will take us from y_k to the lowest point in the direction of p_k. It can be found by differentiating the function

    e(y_k + \alpha p_k) = \frac{1}{2} (y_k + \alpha p_k)^T Q (y_k + \alpha p_k)

with respect to α,

    \frac{d}{d\alpha}\, e(y_k + \alpha p_k) = p_k^T Q (y_k + \alpha p_k),

and setting this derivative to zero, which yields the optimal step

    \alpha_k = -\frac{p_k^T Q y_k}{p_k^T Q p_k} = -\frac{p_k^T g_k}{p_k^T Q p_k}.

The Step Size (2)

Thus, the basic step of our steepest descent method can be written as

    y_{k+1} = y_k + \alpha_k p_k,

or, substituting p_k = -g_k / \|g_k\| and the optimal step size α_k,

    y_{k+1} = y_k - \frac{g_k^T g_k}{g_k^T Q g_k}\, g_k, \qquad g_k = Q y_k.     (3)
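Putting the direction and the step size together, here is a minimal numpy sketch of iteration (3) applied to the shifted quadratic e(y) = (1/2) y^T Q y; the matrix Q and the starting point y0 are arbitrary illustration values:

    import numpy as np

    def steepest_descent_quadratic(Q, y0, iters=50):
        # Iteration (3): y_{k+1} = y_k - (g^T g)/(g^T Q g) * g, with g = Q y
        y = np.asarray(y0, dtype=float)
        for _ in range(iters):
            g = Q @ y
            if np.allclose(g, 0):
                break
            y = y - (g @ g) / (g @ Q @ g) * g
        return y

    Q = np.array([[4.0, 1.0], [1.0, 3.0]])
    print(steepest_descent_quadratic(Q, y0=[2.0, 5.0]))   # close to the minimum y* = 0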

e(y) Descent Rate

How much closer does one step bring us to the solution y* = 0? In other words, how much smaller is e(y_{k+1}) relative to e(y_k)? From the definition (2) of e(y) and equation (3) for y_{k+1}, we obtain

    e(y_{k+1}) = e(y_k) - \frac{1}{2} \frac{(g_k^T g_k)^2}{g_k^T Q g_k}.     (4)

e(y) Descent Rate (2)

Since Q is invertible, we have

    y_k = Q^{-1} g_k

and

    e(y_k) = \frac{1}{2}\, g_k^T Q^{-1} g_k,

which allows us to rewrite (4) as

    \frac{e(y_{k+1})}{e(y_k)} = 1 - \frac{(g_k^T g_k)^2}{(g_k^T Q g_k)(g_k^T Q^{-1} g_k)}

or

    e(y_{k+1}) = \left(1 - \frac{(g_k^T g_k)^2}{(g_k^T Q g_k)(g_k^T Q^{-1} g_k)}\right) e(y_k).
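This one-step decrease can be checked numerically; the following sketch (again with an arbitrary positive definite Q and starting point) compares e(y_{k+1}) after one step of (3) with the right-hand side of the rewritten (4):

    import numpy as np

    Q = np.array([[4.0, 1.0], [1.0, 3.0]])
    y = np.array([2.0, 5.0])
    e = lambda v: 0.5 * v @ Q @ v

    g = Q @ y
    y_next = y - (g @ g) / (g @ Q @ g) * g             # one step of (3)
    ratio = 1 - (g @ g) ** 2 / ((g @ Q @ g) * (g @ np.linalg.solve(Q, g)))
    print(np.isclose(e(y_next), ratio * e(y)))         # True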

Kantorovich Inequality

Kantorovich inequality: Let Q be a positive definite, symmetric, n × n matrix with largest and smallest singular values σ_1 and σ_n. Then, for any nonzero vector y, there holds

    \frac{(y^T y)^2}{(y^T Q y)(y^T Q^{-1} y)} \ge \frac{4\, \sigma_1 \sigma_n}{(\sigma_1 + \sigma_n)^2}.

This inequality allows us to prove the Steepest Descent Rate theorem.
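A quick numerical illustration of the inequality (not a proof), using the same illustrative Q and random test vectors:

    import numpy as np

    rng = np.random.default_rng(0)
    Q = np.array([[4.0, 1.0], [1.0, 3.0]])
    sigma = np.linalg.svd(Q, compute_uv=False)         # singular values, largest first
    bound = 4 * sigma[0] * sigma[-1] / (sigma[0] + sigma[-1]) ** 2

    for _ in range(1000):
        y = rng.standard_normal(2)
        lhs = (y @ y) ** 2 / ((y @ Q @ y) * (y @ np.linalg.solve(Q, y)))
        assert lhs >= bound - 1e-12                    # Kantorovich inequality holds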

Steepest Descent Rate Theorem (1)

Steepest Descent Rate theorem: Let

    f(x) = c + a^T x + \frac{1}{2} x^T Q x

be a quadratic function of x, with Q symmetric and positive definite. For any x_0, the method of steepest descent

    x_{k+1} = x_k - \frac{g_k^T g_k}{g_k^T Q g_k}\, g_k, \qquad \text{where} \quad g_k = \nabla f(x_k) = a + Q x_k,

Steepest Descent Rate Theorem (2)

converges to the unique minimum point

    x^* = -Q^{-1} a,

i.e. the solution of Q x = -a. Furthermore, the difference from the minimum value at every step satisfies

    f(x_{k+1}) - f(x^*) \le \left(\frac{\sigma_1 - \sigma_n}{\sigma_1 + \sigma_n}\right)^2 \left(f(x_k) - f(x^*)\right),

where σ_1 and σ_n are respectively the largest and the smallest singular values of Q.

Proof

From the definitions we obtain

    e(y_{k+1}) = \left(1 - \frac{(g_k^T g_k)^2}{(g_k^T Q g_k)(g_k^T Q^{-1} g_k)}\right) e(y_k)
              \le \left(1 - \frac{4\, \sigma_1 \sigma_n}{(\sigma_1 + \sigma_n)^2}\right) e(y_k)
              = \left(\frac{\sigma_1 - \sigma_n}{\sigma_1 + \sigma_n}\right)^2 e(y_k).

Here, the Kantorovich inequality was used. Since e(y_k) = f(x_k) - f(x*), the bound of the theorem follows.

Analysis

The ratio

    \kappa(Q) = \frac{\sigma_1}{\sigma_n}

is called the condition number of Q. The larger the condition number (the ratio between the largest and the smallest singular values), the smaller the ratio 4σ_1σ_n/(σ_1 + σ_n)^2, so the closer the per-step factor ((σ_1 - σ_n)/(σ_1 + σ_n))^2 is to 1, and therefore the slower the convergence.
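The slowdown and the per-step bound can be observed directly. The sketch below runs iteration (3) on a deliberately ill-conditioned quadratic (the diagonal Q and the starting point are illustration values chosen to exhibit worst-case behavior) and checks that every step satisfies e(y_{k+1}) <= ((σ_1 - σ_n)/(σ_1 + σ_n))^2 e(y_k):

    import numpy as np

    Q = np.diag([100.0, 1.0])                          # condition number kappa = 100
    sigma = np.linalg.svd(Q, compute_uv=False)
    factor = ((sigma[0] - sigma[-1]) / (sigma[0] + sigma[-1])) ** 2
    print("kappa =", sigma[0] / sigma[-1], " per-step factor =", factor)

    e = lambda v: 0.5 * v @ Q @ v
    y = np.array([0.01, 1.0])                          # a worst-case starting point
    for k in range(10):
        g = Q @ y
        y_next = y - (g @ g) / (g @ Q @ g) * g         # one step of (3)
        print(k, e(y_next) / e(y))                     # stays close to factor
        assert e(y_next) <= factor * e(y) * (1 + 1e-12)
        y = y_next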

Illustration

Consider the two-dimensional case, x ∈ R². The figure shows a trajectory x_k superimposed on the isocontours of f(x). The greater the ratio between the singular values of Q (which is the aspect ratio of the elliptical isocontours), the slower the convergence rate. If the isocontours are circular (κ(Q) = 1), or if the trajectory starts on one of the ellipse axes, a single step brings us to x*.

Convergence Rate

To characterize the speed of convergence of different minimization algorithms, we introduce the order of convergence. It is defined as the largest value of q for which the limit

    \lim_{k \to \infty} \frac{\|x_{k+1} - x^*\|}{\|x_k - x^*\|^q}

is finite. If β is the value of this limit, then for large values of k we can write

    \|x_{k+1} - x^*\| \approx \beta\, \|x_k - x^*\|^q.

The distance from x* is raised to the q-th power at every step, so the higher the order of convergence, the better.
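The order of convergence can be estimated empirically from the error norms ||x_k - x*|| of a run: fitting log ||x_{k+1} - x*|| against log ||x_k - x*|| gives q as the slope. A small sketch with synthetic error sequences (the function name and the example sequences are illustrative):

    import numpy as np

    def estimate_order(errors):
        # errors[k] = ||x_k - x*||; fit log e_{k+1} ≈ q * log e_k + log(beta)
        err = np.asarray(errors, dtype=float)
        q, log_beta = np.polyfit(np.log(err[:-1]), np.log(err[1:]), 1)
        return q, np.exp(log_beta)

    print(estimate_order([0.5 ** k for k in range(1, 10)]))      # q ≈ 1 (linear), beta ≈ 0.5
    print(estimate_order([0.1 ** (2 ** k) for k in range(4)]))   # q ≈ 2 (quadratic), beta ≈ 1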

Stop Criteria

We do not know x*, and therefore we do not know f(x*); thus the stopping criterion is not trivial. The criterion can be a small value of |f(x_k) - f(x_{k-1})| or of ||x_k - x_{k-1}||. The second criterion is better, since it indicates proximity to x*.
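Inside a descent loop, the two criteria might look as follows; the tolerances eps_f and eps_x are user-chosen assumptions:

    import numpy as np

    def should_stop(x_prev, x_curr, f_prev, f_curr, eps_x=1e-8, eps_f=1e-10):
        # Criterion 1: |f(x_k) - f(x_{k-1})| is small.
        small_f_change = abs(f_curr - f_prev) < eps_f
        # Criterion 2 (the one preferred above): ||x_k - x_{k-1}|| is small.
        small_x_change = np.linalg.norm(np.asarray(x_curr) - np.asarray(x_prev)) < eps_x
        return small_x_change or small_f_change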

Line Search

Steepest descent can be applied to a general function f, not necessarily quadratic and not defined via (1). In that case, the role of Q is played by the matrix of the second derivatives of f with respect to x, called the Hessian of f. Only the n first derivatives are needed to compute the direction p_k, but the step size derived above requires the Hessian of f(x), i.e. second derivatives, which are very expensive to compute. Using a line search allows us to reach the minimum of f(x) along the direction p_k without computing the Hessian.

Line Search (2)

The line search runs as follows. Let

    h(\alpha) = f(x_k + \alpha p_k)

be the scalar function of α representing the possible values of f(x) along the direction p_k. Let (a, b, c) be three values of α such that the (constrained) minimizer α' lies between a and c, that is, a < α' < c, with h(b) below both h(a) and h(c). (The slide's figure shows the points a, u, b, c along the α axis.) Then the following algorithm allows us to approach α' arbitrarily closely:

    If b - a > c - b:
        u = (a + b)/2
        if h(u) < h(b): (a, b, c) = (a, u, b)
        else:           (a, b, c) = (u, b, c)
    Otherwise:
        u = (b + c)/2
        if h(u) < h(b): (a, b, c) = (b, u, c)
        else:           (a, b, c) = (a, b, u)
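A direct Python transcription of this bracketing scheme, as a sketch assuming (a, b, c) is a valid bracket (h(b) below both h(a) and h(c)); the function name line_search is illustrative:

    def line_search(h, a, b, c, tol=1e-8, max_iter=200):
        # Shrink the bracket (a, b, c) around a local minimizer of h(alpha).
        for _ in range(max_iter):
            if c - a < tol:
                break
            if b - a > c - b:                 # bisect the larger half [a, b]
                u = 0.5 * (a + b)
                a, b, c = (a, u, b) if h(u) < h(b) else (u, b, c)
            else:                             # otherwise bisect [b, c]
                u = 0.5 * (b + c)
                a, b, c = (b, u, c) if h(u) < h(b) else (a, b, u)
        return b                              # approximate minimizer alpha'

    # Example: h(alpha) = (alpha - 0.7)^2, bracketed by (0, 0.5, 2)
    print(line_search(lambda t: (t - 0.7) ** 2, 0.0, 0.5, 2.0))   # ≈ 0.7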