Gradient Methods, April 2004. Preview: Background, Steepest Descent, Conjugate Gradient.

Presentation transcript:

Gradient Methods April 2004

Preview: Background, Steepest Descent, Conjugate Gradient

Preview: Background, Steepest Descent, Conjugate Gradient

Background: Motivation, The gradient notion, The Wolfe Theorems

Motivation. The min(max) problem: minimize (or maximize) a given function. But we learned in calculus how to solve that kind of problem!
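In standard notation (the symbols here are illustrative, not necessarily the slide's), the problem and the textbook calculus answer read:

$$ \min_{x \in \mathbb{R}^n} f(x), \qquad \text{first-order condition: } \nabla f(x^{*}) = 0 . $$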

Motivation. Not exactly. Functions: high-order polynomials. What about functions that don't have an analytic representation: a "black box"?

Motivation: a "real world" problem. Connectivity shapes (Isenburg, Gumhold, Gotsman). What do we get from the connectivity C alone, without geometry?

Motivation: a "real world" problem. First we introduce error functionals and then try to minimize them:

Motivation: a "real world" problem. Then we minimize: a high-dimensional non-linear problem. The authors use the conjugate gradient method, which is perhaps the most popular optimization technique built on what we will see here.

Motivation: a "real world" problem. Changing the parameter:

Motivation. General problem: find the global min(max). This lecture will concentrate on finding a local minimum.

Background: Motivation, The gradient notion, The Wolfe Theorems

Directional Derivatives: first, the one-dimensional derivative:
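For reference, the one-dimensional derivative this presumably refers to is the standard limit:

$$ f'(x) \;=\; \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} . $$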

Directional Derivatives: along the axes …

Directional Derivatives: in a general direction …
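A hedged sketch of the general definition (symbols illustrative): for a point p and a unit vector v, the directional derivative is

$$ D_{v} f(p) \;=\; \lim_{t \to 0} \frac{f(p + t v) - f(p)}{t}, $$

and choosing v along the coordinate axes recovers the partial derivatives from the previous slide.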

Directional Derivatives

The Gradient: Definition in the plane

The Gradient: Definition
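In the usual notation (illustrative), the gradient collects the partial derivatives, and it reproduces every directional derivative by a dot product:

$$ \nabla f(p) \;=\; \left( \frac{\partial f}{\partial x_1}(p), \ldots, \frac{\partial f}{\partial x_n}(p) \right), \qquad D_{v} f(p) \;=\; \nabla f(p) \cdot v . $$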

The Gradient Properties. The gradient defines a (hyper)plane approximating the function infinitesimally.
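One way to write that approximating (hyper)plane, as a sketch in standard notation:

$$ f(p + \delta) \;\approx\; f(p) + \nabla f(p) \cdot \delta \qquad \text{for small } \delta, $$

i.e. the first-order Taylor term is the tangent (hyper)plane to the graph of f at p.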

The Gradient Properties. By the chain rule (important for later use):
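The chain-rule identity in question is presumably the standard one (symbols illustrative): for a smooth curve x(t),

$$ \frac{d}{dt}\, f\bigl(x(t)\bigr) \;=\; \nabla f\bigl(x(t)\bigr) \cdot x'(t) . $$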

The Gradient Properties. Proposition 1: the directional derivative is maximal when the direction is chosen along the gradient and minimal when chosen against it (intuitively: the gradient points in the direction of greatest change).

The Gradient Properties. Proof (only for the minimum case): choose the candidate direction and apply the chain rule:

The Gradient Properties. On the other hand, for a general v:
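A hedged reconstruction of the argument (not necessarily the slide's exact steps): by the Cauchy-Schwarz inequality, for any unit vector v,

$$ D_{v} f(p) \;=\; \nabla f(p) \cdot v \;\ge\; -\,\|\nabla f(p)\|, $$

with equality exactly when v points opposite to the gradient; the maximum case is symmetric, with v along the gradient.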

The Gradient Properties. Proposition 2: let f be a smooth function around P. If f has a local minimum (maximum) at P, then the gradient of f vanishes at P (intuitively: this is a necessary condition for a local min(max)).

The Gradient Properties. Proof. Intuitively:

The Gradient Properties. Formally: for any direction v we get:
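A sketch of the formal step (symbols illustrative): for any direction v, the one-dimensional function g(t) = f(P + t v) has a local minimum at t = 0, so

$$ 0 \;=\; g'(0) \;=\; \nabla f(P) \cdot v \quad \text{for every } v, \qquad \text{hence} \qquad \nabla f(P) = 0 . $$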

The Gradient Properties. We found the best INFINITESIMAL DIRECTION at each point. Looking for the minimum: a "blind man" procedure. How can we derive the way to the minimum using this knowledge?

Background: Motivation, The gradient notion, The Wolfe Theorems

The Wolfe Theorem. This is the link from the previous gradient properties to the constructive algorithm. The problem:

The Wolfe Theorem. We introduce a model algorithm. Data: the objective function and a starting point. Step 0: set i = 0. Step 1: if the gradient vanishes, stop; otherwise compute a search direction. Step 2: compute the step size. Step 3: take the step and go to Step 1.
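In symbols (illustrative, not necessarily the slide's notation), the model iteration is

$$ x_{i+1} \;=\; x_i + \lambda_i h_i, $$

where h_i is the search direction from Step 1 and lambda_i is the step size from Step 2.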

The Wolfe Theorem. The theorem: suppose f is C1 smooth, there exists a continuous step-size function, and the search vectors constructed by the model algorithm satisfy:

The Wolfe Theorem. Then, if we take the sequence constructed by the model algorithm, any accumulation point y of this sequence satisfies: the gradient of f vanishes at y.
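For orientation, one common textbook formulation of this kind of result (the exact hypotheses may differ from the slide's): if f is bounded below with a Lipschitz-continuous gradient, the search directions are descent directions whose angle with the negative gradient stays bounded away from 90 degrees, and the step sizes satisfy the Wolfe conditions, then

$$ \lim_{i \to \infty} \|\nabla f(x_i)\| = 0, $$

so every accumulation point of the iterates is a stationary point.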

The Wolfe Theorem. The theorem has a very intuitive interpretation: always go in a descent direction.

Preview: Background, Steepest Descent, Conjugate Gradient

Steepest Descent. What does it mean? We now use what we have learned to implement the most basic minimization technique. First we introduce the algorithm, which is a version of the model algorithm. The problem:

Steepest Descent. The steepest descent algorithm. Data: the objective function and a starting point. Step 0: set i = 0. Step 1: if the gradient vanishes, stop; otherwise take the search direction to be the negative gradient. Step 2: compute the step size by minimizing the function along that direction. Step 3: take the step and go to Step 1.
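A minimal, runnable sketch of this algorithm in Python (the function names, the tolerance, and the use of a generic 1-D minimizer are assumptions for illustration, not the presentation's implementation):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(f, grad, x0, tol=1e-6, max_iter=1000):
    """Sketch of the steepest-descent version of the model algorithm:
    stop when the gradient (numerically) vanishes, otherwise move along
    the negative gradient with a step size from a 1-D line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:                        # Step 1: gradient vanishes -> stop
            break
        h = -g                                             # search direction: negative gradient
        lam = minimize_scalar(lambda t: f(x + t * h)).x    # Step 2: minimize f along the line
        x = x + lam * h                                    # Step 3: take the step
    return x

# Tiny usage example: a convex quadratic f(x) = 0.5 x^T A x - b^T x,
# whose unique minimizer solves A x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
print(steepest_descent(f, grad, np.zeros(2)))              # approx. [0.2, 0.4]
```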

Steepest Descent. Theorem: if we take a sequence constructed by the SD algorithm, then every accumulation point y of the sequence satisfies: the gradient of f vanishes at y. Proof: from the Wolfe theorem. Remark: the Wolfe theorem gives us numerical stability when the derivatives aren't given (and are calculated numerically).

Steepest Descent. From the chain rule: Therefore the method of steepest descent looks like this:
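A hedged reading of the chain-rule step (symbols illustrative): with an exact line search, the derivative along the search line vanishes at the chosen step size,

$$ 0 \;=\; \frac{d}{d\lambda}\, f(x_i + \lambda h_i)\Big|_{\lambda = \lambda_i} \;=\; \nabla f(x_{i+1}) \cdot h_i, $$

so each new steepest-descent direction is orthogonal to the previous one, which is what produces the characteristic zig-zag path.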

Steepest Descent

The steepest descent finds a critical point and a local minimum. Implicit step-size rule: actually, we reduced the problem to finding a one-dimensional minimum: There are extensions that give the step-size rule in a discrete sense (Armijo).
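A small sketch of the Armijo (backtracking) rule mentioned here, as a discrete alternative to the exact one-dimensional minimization (the constants and names are illustrative):

```python
def armijo_step(f, grad_x, x, h, alpha0=1.0, beta=0.5, sigma=1e-4, max_halvings=50):
    """Backtracking line search: shrink the trial step size until the
    sufficient-decrease (Armijo) condition holds along the descent direction h.
    x, h, grad_x are NumPy arrays; grad_x is the gradient of f at x."""
    fx = f(x)
    slope = grad_x @ h           # directional derivative along h (negative for descent)
    alpha = alpha0
    for _ in range(max_halvings):
        if f(x + alpha * h) <= fx + sigma * alpha * slope:
            return alpha         # sufficient decrease achieved
        alpha *= beta            # otherwise shrink the step and try again
    return alpha
```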

Steepest Descent. Back to our connectivity shapes: the authors solve the one-dimensional problem analytically. They change the spring energy and get a quartic polynomial in x.
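As an illustration of solving such a one-dimensional problem analytically (the paper's actual energy and coefficients are not given here; this uses a generic quartic with positive leading coefficient), the minimizer is found among the real roots of the cubic derivative:

```python
import numpy as np

def minimize_quartic(coeffs):
    """coeffs = [c4, c3, c2, c1, c0], highest degree first, with c4 > 0.
    The quartic is then bounded below, so its global minimizer is a real
    root of the cubic derivative; compare the candidates and keep the best."""
    deriv = np.polyder(np.poly1d(coeffs))                         # cubic derivative
    candidates = [r.real for r in deriv.roots if abs(r.imag) < 1e-12]
    return min(candidates, key=lambda t: np.polyval(coeffs, t))

# Example: minimize 2 t^4 - 3 t^2 + t + 5 over the real line.
print(minimize_quartic([2.0, 0.0, -3.0, 1.0, 5.0]))
```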

Preview: Background, Steepest Descent, Conjugate Gradient