1
Unconstrained Optimization
Rong Jin

2
Recap
Gradient ascent/descent
- Simple algorithm; requires only the first-order derivative
- Problem: difficulty in determining the step size
  - Small step size: slow convergence
  - Large step size: oscillation or "bubbling"
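
A toy illustration of the step-size trade-off (a hypothetical one-dimensional example, not from the slides): minimizing f(x) = x^2 with plain gradient descent.

def gradient_descent(step_size, x0=5.0, iters=20):
    # Gradient of f(x) = x^2 is 2x
    x = x0
    for _ in range(iters):
        x = x - step_size * 2 * x
    return x

print(gradient_descent(0.01))  # too small: after 20 steps x is still ~3.3, far from the minimum at 0
print(gradient_descent(0.4))   # reasonable: converges close to 0
print(gradient_descent(1.1))   # too large: the iterates alternate in sign and blow up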

3
Recap: Newton Method
- Univariate Newton method
- Multivariate Newton method (uses the Hessian matrix)
- Guaranteed to converge when the objective function is convex/concave
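
For reference, the updates the slide refers to, in their standard textbook form (not transcribed from the slide itself):

% Univariate Newton step
x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}

% Multivariate Newton step, where H(x_k) = \nabla^2 f(x_k) is the Hessian matrix
\mathbf{x}_{k+1} = \mathbf{x}_k - H(\mathbf{x}_k)^{-1} \, \nabla f(\mathbf{x}_k)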

4
Recap
Problems with the standard Newton method:
- Computing the inverse of the Hessian matrix H is expensive (O(n^3))
- The Hessian matrix H itself can be very large (O(n^2) entries)
Quasi-Newton method (BFGS):
- Approximates the inverse of the Hessian matrix H with another matrix B
- Avoids the difficulty of computing the inverse of H
- However, still problematic when B is large
Limited-memory Quasi-Newton method (L-BFGS):
- Stores a set of vectors instead of the matrix B
- Avoids the difficulty of computing the inverse of H
- Avoids the difficulty of storing the large matrix B
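
For reference, the standard BFGS update of the inverse-Hessian approximation B (the usual textbook form; the slide does not spell it out):

B_{k+1} = \left(I - \rho_k s_k y_k^{\top}\right) B_k \left(I - \rho_k y_k s_k^{\top}\right) + \rho_k s_k s_k^{\top},
\qquad s_k = x_{k+1} - x_k, \quad y_k = \nabla f(x_{k+1}) - \nabla f(x_k), \quad \rho_k = \frac{1}{y_k^{\top} s_k}

L-BFGS never forms B explicitly: it keeps only the last m pairs (s_k, y_k) and applies B to a vector implicitly, which is the "set of vectors instead of matrix B" mentioned above.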

5
Recap

Method                                   Cost      Number of variables   Convergence rate
Standard Newton method                   O(n^3)    Small                 V-Fast
Quasi-Newton method (BFGS)               O(n^2)    Medium                Fast
Limited-memory Quasi-Newton (L-BFGS)     O(n)      Large                 R-Fast

6
Empirical Study: Learning Conditional Exponential Model

Dataset    Instances    Features
Rule       29,602       246
Lex        42,509       135,182
Summary    24,044       198,467
Shallow    8,625,782    264,142

Dataset    Gradient ascent (iterations / time in s)    L-BFGS (iterations / time in s)
Rule       350 / 4.8                                   81 / 1.13
Lex        1545 / 114.21                               176 / 20.02
Summary    3321 / 190.22                               69 / 8.52
Shallow    14527 / 85962.53                            421 / 2420.30

7
Free Software
http://www.ece.northwestern.edu/~nocedal/software.html
- L-BFGS
- L-BFGS-B
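
As a quick way to try L-BFGS without building the Fortran packages, SciPy's minimize exposes an L-BFGS-B implementation; below is a hedged sketch on a made-up conditional (logistic) model, not the datasets from the experiments above.

import numpy as np
from scipy.optimize import minimize

# Synthetic data for a logistic model (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = rng.integers(0, 2, size=200)

def loss_and_grad(w):
    # Negative log-likelihood of a logistic model plus a small L2 penalty
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12)) + 0.5e-3 * w @ w
    grad = X.T @ (p - y) / len(y) + 1e-3 * w
    return loss, grad

# jac=True tells minimize that the objective returns (value, gradient)
res = minimize(loss_and_grad, np.zeros(X.shape[1]), jac=True, method="L-BFGS-B")
print(res.x, res.nit)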

8
Conjugate Gradient
Another great numerical optimization method!

9
Linear Conjugate Gradient Method
- Consider optimizing a quadratic function
- Conjugate vectors: the set of vectors {p_1, p_2, ..., p_l} is said to be conjugate with respect to a matrix A (condition written out below)
- Important property: the quadratic function can be optimized by simply optimizing along each direction in the conjugate set individually
- Optimal solution: α_k is the minimizer along the kth conjugate direction
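
In standard notation (the slide's own formulas are not reproduced here), the pieces are:

f(x) = \tfrac{1}{2}\, x^{\top} A x - b^{\top} x, \qquad A \ \text{symmetric positive definite}

p_i^{\top} A p_j = 0 \quad \text{for all } i \neq j \qquad \text{(conjugacy with respect to } A\text{)}

x^{*} = \sum_{k} \alpha_k\, p_k, \qquad \alpha_k = \frac{p_k^{\top} b}{p_k^{\top} A p_k}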

10
Example
- Minimize the following function
- Matrix A
- Conjugate directions
- Optimization:
  - First direction: x_1 = x_2 = x
  - Second direction: x_1 = -x_2 = x
- Solution: x_1 = x_2 = 1

11
How to Efficiently Find a Set of Conjugate Directions
- Iterative procedure:
  - Given conjugate directions {p_1, p_2, ..., p_(k-1)}
  - Set p_k from the current gradient and the previous direction (see the sketch below)
- Theorem: the direction generated in the above step is conjugate to all previous directions {p_1, p_2, ..., p_(k-1)}
- Note: computing the kth direction p_k requires only the previous direction p_(k-1)
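
A minimal linear CG sketch in Python (the standard algorithm in an equivalent form; variable names and the test problem are made up), showing that each new direction is built only from the current gradient and the previous direction:

import numpy as np

def linear_cg(A, b, x0, tol=1e-10):
    """Minimize 0.5 * x^T A x - b^T x for symmetric positive definite A."""
    x = np.asarray(x0, dtype=float)
    r = A @ x - b                      # gradient of the quadratic at x
    p = -r                             # first direction: steepest descent
    for _ in range(len(b)):            # at most n steps in exact arithmetic
        if np.linalg.norm(r) < tol:
            break
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)     # exact minimizer along p
        x = x + alpha * p
        r_new = r + alpha * Ap
        beta = (r_new @ r_new) / (r @ r)
        p = -r_new + beta * p          # only the previous direction is needed
        r = r_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(linear_cg(A, b, np.zeros(2)))    # approximately [0.0909, 0.6364]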

12
Nonlinear Conjugate Gradient
- Even though conjugate gradient is derived for a quadratic objective function, it can be applied directly to other nonlinear functions
- Convergence is guaranteed if the objective is convex/concave
- Variants:
  - Fletcher-Reeves conjugate gradient (FR-CG)
  - Polak-Ribiere conjugate gradient (PR-CG): more robust than FR-CG
- Compared to the Newton method:
  - A first-order method
  - Usually less efficient than the Newton method
  - However, it is simple to implement
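
For nonlinear problems, SciPy's minimize provides a Polak-Ribiere-style nonlinear conjugate gradient under method="CG"; a minimal usage sketch on the classic Rosenbrock test function (not from the slides):

import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

# rosen / rosen_der are SciPy's built-in Rosenbrock function and its gradient
res = minimize(rosen, np.array([-1.2, 1.0]), jac=rosen_der, method="CG")
print(res.x, res.nit)   # should end up near the minimizer [1, 1]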

13
Empirical Study: Learning Conditional Exponential Model

Dataset    Instances    Features
Rule       29,602       246
Lex        42,509       135,182
Summary    24,044       198,467
Shallow    8,625,782    264,142

Dataset    Conjugate Gradient (PR) (iterations / time in s)    L-BFGS (iterations / time in s)
Rule       142 / 1.93                                          81 / 1.13
Lex        281 / 21.72                                         176 / 20.02
Summary    537 / 31.66                                         69 / 8.52
Shallow    2813 / 16251.12                                     421 / 2420.30

14
Free Software
http://www.ece.northwestern.edu/~nocedal/software.html
- CG+

15
When Should We Use Which Optimization Technique?
- Use the Newton method if you can find a package for it
- Use conjugate gradient if you have to implement it yourself
- Use gradient ascent/descent if you are lazy

16
Logarithm Bound Algorithms
To maximize the objective:
- Start with a guess
- For t = 1, 2, ..., T:
  - Compute a decoupling (lower-bound) function that touches the objective at the current guess (the "touch point")
  - Find the optimal solution of that bound and take it as the next guess

17
Logarithm Bound Algorithm
- Start with an initial guess x_0
- Come up with a lower-bound function φ(x) such that f(x) ≥ φ(x) + f(x_0)
- Touch point: φ(x_0) = 0
- Find the optimal solution x_1 of φ(x)

18
Logarithm Bound Algorithm
- Start with an initial guess x_0
- Come up with a lower-bound function φ(x) such that f(x) ≥ φ(x) + f(x_0)
- Touch point: φ(x_0) = 0
- Find the optimal solution x_1 of φ(x)
- Repeat the above procedure

19
Logarithm Bound Algorithm
- Start with an initial guess x_0
- Come up with a lower-bound function φ(x) such that f(x) ≥ φ(x) + f(x_0)
- Touch point: φ(x_0) = 0
- Find the optimal solution x_1 of φ(x)
- Repeat the above procedure
- Converge to the optimal point
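
Putting the three slides together, the iteration can be written compactly (standard bound-optimization form; the slide's own symbols are not preserved):

\text{At step } t:\quad \text{find } \varphi_t(x) \text{ such that } f(x) \ge f(x_t) + \varphi_t(x) \text{ and } \varphi_t(x_t) = 0,
\qquad x_{t+1} = \arg\max_x \varphi_t(x)

\text{Then}\quad f(x_{t+1}) \ \ge\ f(x_t) + \varphi_t(x_{t+1}) \ \ge\ f(x_t) + \varphi_t(x_t) \ =\ f(x_t),

so the objective value never decreases and the sequence of touch points climbs toward the optimum.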

20
Property of Concave Functions
For any concave function:
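
The property the slide appeals to is presumably Jensen's inequality for a concave function f:

f\!\left(\sum_i \lambda_i x_i\right) \ \ge\ \sum_i \lambda_i f(x_i), \qquad \lambda_i \ge 0, \quad \sum_i \lambda_i = 1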

21
Important Inequality
log(x) and -exp(x) are concave functions; therefore:
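
Applying the concavity of log(x) and -exp(x) gives the bounds typically used to decouple a sum inside these functions (standard consequences of Jensen's inequality; the slide's exact statement is not reproduced):

\log\!\left(\sum_i \lambda_i x_i\right) \ \ge\ \sum_i \lambda_i \log x_i,
\qquad
\exp\!\left(\sum_i \lambda_i x_i\right) \ \le\ \sum_i \lambda_i \exp(x_i),
\qquad \lambda_i \ge 0, \quad \sum_i \lambda_i = 1

The first of these is the inequality behind the "logarithm bound" construction used in the EM derivation that follows.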

22
Expectation-Maximization Algorithm
Derive the EM algorithm for the hierarchical mixture model
[Figure: input x is routed by a gating function r(x) to component models m_1(x) and m_2(x), which produce the output y]
Log-likelihood of training data:
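
For concreteness, one common parameterization of a two-component hierarchical mixture consistent with the diagram (a hypothetical form; the slide's own notation is not preserved): the gate r(x) mixes the two component models, and the training log-likelihood is the quantity the logarithm bound is applied to.

p(y \mid x) \ =\ r(x)\, m_1(y \mid x) + \bigl(1 - r(x)\bigr)\, m_2(y \mid x)

\ell \ =\ \sum_{i=1}^{N} \log\Bigl[\, r(x_i)\, m_1(y_i \mid x_i) + \bigl(1 - r(x_i)\bigr)\, m_2(y_i \mid x_i) \Bigr]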
