
1 The role of optimization in machine learning
Dominik Csiba, MLMU Bratislava, 19 April 2017

2 Motivation

3 Linear models

4 Linear models as optimization
LASSO and logistic regression share the same template: a loss function over (features, label) pairs plus a regularizer.
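In standard textbook form (a reconstruction of the slide's rendered formulas, not their exact layout), the two objectives read:

    \min_{w \in \mathbb{R}^d} \ \frac{1}{n} \sum_{i=1}^n (x_i^\top w - y_i)^2 + \lambda \|w\|_1   (LASSO)

    \min_{w \in \mathbb{R}^d} \ \frac{1}{n} \sum_{i=1}^n \log\!\big(1 + e^{-y_i x_i^\top w}\big) + \frac{\lambda}{2} \|w\|_2^2   (logistic regression)

Here x_i are the features, y_i the label, each sum is the loss function, and the \lambda-term is the regularizer.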

5 Dimensionality Reduction - PCA

6 PCA as optimization
PCA fits the same template: a loss function over the features, with a constraint on the principal components.
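In standard form (again a reconstruction, since the slide's formula is lost):

    \max_{W \in \mathbb{R}^{d \times k}} \ \operatorname{tr}\big(W^\top X^\top X W\big) \quad \text{subject to } W^\top W = I

where X holds the centered features, the trace term is the loss function (variance captured), the columns of W are the principal components, and the orthonormality condition is the constraint.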

7 Matrix completion
(Diagram: a partially observed matrix of ±1 entries; the goal is to fill in the missing ones.)

8 Matrix completion as optimization
Fit a low-rank matrix to an observed matrix: a loss function over the observed indices, with a low-rank constraint.
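A standard way to write this (a reconstruction in textbook notation):

    \min_{W \in \mathbb{R}^{m \times n}} \ \sum_{(i,j) \in \Omega} \big(W_{ij} - M_{ij}\big)^2 \quad \text{subject to } \operatorname{rank}(W) \le r

with M the observed matrix, \Omega the set of observed indices, the sum the loss function, and the rank bound the constraint.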

9 Real-time bidding in online advertising
(Diagram: an auction for a banner slot on a webpage; competitors bid $0.10, $0.05 and $0.01, and the highest bidder wins.)

10 Real-time bidding as optimization
Bid requests arrive sequentially; we maximize a utility function over feasible allocations, with a penalty tied to the spent budget.
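One common formalization of this setup (a generic sketch of a budgeted allocation problem, not necessarily the talk's exact model):

    \max_{x \in \mathcal{X}} \ \sum_t u_t(x_t) \;-\; \lambda \Big( \sum_t c_t(x_t) - B \Big)_+

where u_t is the utility function, x_t the allocation for the request arriving at time t, \mathcal{X} the set of feasible allocations, c_t(x_t) the spend, B the budget, and the last term the penalty for overspending.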

11 Optimization in Machine Learning
Most of learning boils down to an optimization problem.
Attracts many mathematicians; the main topic of my PhD thesis.
Optimization has its own ecosystem in the machine learning community.
Treated as a black box by a lot of practitioners, and the black box gets faster each year.
Optimization becomes a consideration only when .fit() has issues learning.
Next: understanding some of the scenarios with such issues.

12 Supervised learning

13 Main idea
(Diagram: the world produces examples with labels, e.g. images tagged CAT, BREAD, CAT, DOG; the ground truth is the labeling rule we want to recover.)

14 Learning the true predictor
Empirical Risk Minimization (ERM): given samples, minimize the average loss over a hypothesis class; hopefully the minimizer is close to the true predictor. The usual form below is the problem we actually solve.
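In the standard formulation (a reconstruction of the slide's missing formulas):

    Samples:    (x_i, y_i) \sim \mathcal{D}, \quad i = 1, \dots, n \ \text{(i.i.d.)}
    ERM:        \hat{h} = \arg\min_{h \in \mathcal{H}} \ \frac{1}{n} \sum_{i=1}^n \ell(h(x_i), y_i)
    Hopefully:  L_{\mathcal{D}}(\hat{h}) \approx \min_{h \in \mathcal{H}} L_{\mathcal{D}}(h)
    Usual form: \min_{w \in \mathbb{R}^d} \ \frac{1}{n} \sum_{i=1}^n f_i(w) \quad \text{(solve this!)}

with \mathcal{H} the hypothesis class and \ell the loss function.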

15 How does optimization work?
Intuition: find the bottom of a valley in a dense fog using a teleport, where you can ask for the following information:
0th order info: the altitude at any location
1st order info: the slope at any location
2nd order info: the curvature at any location
In machine learning, first-order information is by far the most popular.

16 Going down the hill – Gradient Descent
By far the most popular iterative scheme. Intuition: we take a step down the hill. Stepsize: in some cases given by theory, otherwise difficult to pick. (Figure: a stepsize that is too small crawls; one that is too big overshoots.) A minimal sketch follows below.
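A minimal NumPy sketch on a toy least-squares problem (the name gradient_descent, the data and the stepsize are my own choices, not from the talk):

    import numpy as np

    def gradient_descent(grad, w0, stepsize, n_iters):
        # Iterate w <- w - stepsize * grad f(w): a constant step down the hill.
        w = w0.copy()
        for _ in range(n_iters):
            w = w - stepsize * grad(w)
        return w

    # Toy example: least squares, f(w) = ||Xw - y||^2 / (2n)
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 5))
    y = rng.standard_normal(100)
    grad_f = lambda w: X.T @ (X @ w - y) / len(y)
    w_gd = gradient_descent(grad_f, np.zeros(5), stepsize=0.1, n_iters=500)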

17 Big data

18 Large-scale data
A gradient descent step touches all #examples loss functions and computes all #dimension partial derivatives, so one iteration costs on the order of #examples × #dimension operations.

19 Gradient Descent – a wrong step in a smart direction
Do the best we can using first-order information. Stepsize: constant. Iteration cost depends on both the dimension and the number of examples! (Figure: iterates 1 through 6 stepping down the level sets until SOLVED.)

20 Randomized Coordinate Descent
Update only one randomly chosen coordinate at a time. Stepsize: constant for every dimension. Iteration cost is independent of the dimension! A smart step in a wrong direction. (Figure: iterates 1 through 7 moving in axis-aligned N/W/E/S steps until SOLVED.) A sketch follows below.
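A minimal sketch for the same least-squares toy problem, assuming we maintain the residual incrementally so that one step really costs O(#examples), not O(#examples × #dimension) (the name coordinate_descent and all constants are mine):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 5))
    y = rng.standard_normal(100)

    def coordinate_descent(X, y, stepsize, n_iters, rng):
        # Randomized coordinate descent for f(w) = ||Xw - y||^2 / (2n).
        n, d = X.shape
        w = np.zeros(d)
        r = X @ w - y                      # residual, maintained incrementally
        for _ in range(n_iters):
            j = rng.integers(d)            # one coordinate, uniformly at random
            g_j = X[:, j] @ r / n          # one partial derivative: O(n) work
            w[j] -= stepsize * g_j
            r -= stepsize * g_j * X[:, j]  # keep the residual in sync with w
        return w

    w_rcd = coordinate_descent(X, y, stepsize=0.5, n_iters=2000, rng=rng)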

21 Stochastic Gradient Descent
Update using only one random example at a time. Stepsize: decaying. Iteration cost is independent of the number of examples! A wrong step in a wrong direction. (Figure: iterates 1 through 7 zig-zagging noisily until SOLVED.) A sketch follows below.
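A minimal sketch on the same toy problem (the name sgd and the 1/k decay schedule are my own choices):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 5))
    y = rng.standard_normal(100)

    def sgd(X, y, stepsize0, n_iters, rng):
        # SGD for f(w) = (1/n) sum_i (x_i . w - y_i)^2 / 2.
        n, d = X.shape
        w = np.zeros(d)
        for k in range(n_iters):
            i = rng.integers(n)              # one example, uniformly at random
            g = (X[i] @ w - y[i]) * X[i]     # gradient of that example's loss: O(d)
            w -= stepsize0 / (k + 1) * g     # decaying stepsize, alpha_k = alpha_0 / k
        return w

    w_sgd = sgd(X, y, stepsize0=0.5, n_iters=5000, rng=rng)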

22 Magic of SGD explained
The step points in the correct direction in expectation, but its variance is not vanishing! The stepsize therefore has to decay, otherwise SGD will not converge. For GD/RCD there is no such issue, because the step itself vanishes at the optimum.
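Written out (a reconstruction of the slide's missing formulas):

    \mathbb{E}_i\big[\nabla f_i(w)\big] = \frac{1}{n} \sum_{i=1}^n \nabla f_i(w) = \nabla f(w)

but in general \nabla f_i(w^*) \neq 0 even though \nabla f(w^*) = 0, so the stochastic steps keep jittering around the optimum unless the stepsize decays. For GD/RCD the step is proportional to \nabla f(w) (or one of its coordinates), which vanishes at the optimum.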

23 Stochastic Variance Reduced Gradient
A new method was proposed in 2013 (SVRG, Johnson & Zhang) to reduce the variance of SGD.
Outer loop (repeat forever): store the current iterate as a snapshot; compute and store the full gradient at the snapshot.
Inner loop (repeat K times): sample an example uniformly at random; update with the sampled gradient, corrected by its value at the snapshot plus the stored full gradient.
We still have the correct direction in expectation, and the correction makes the variance vanish near the optimum, so a constant stepsize works. A sketch follows below.
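A minimal sketch on the same toy problem (the name svrg and all constants are mine):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 5))
    y = rng.standard_normal(100)

    def svrg(X, y, stepsize, n_epochs, K, rng):
        n, d = X.shape
        grad_i = lambda w, i: (X[i] @ w - y[i]) * X[i]   # one example's gradient
        w = np.zeros(d)
        for _ in range(n_epochs):                   # outer loop
            w_snap = w.copy()                       # store the current iterate
            g_full = X.T @ (X @ w_snap - y) / n     # full gradient at the snapshot
            for _ in range(K):                      # inner loop, K times
                i = rng.integers(n)                 # sample i uniformly at random
                # Corrected step: unbiased, with variance vanishing at the optimum.
                w -= stepsize * (grad_i(w, i) - grad_i(w_snap, i) + g_full)
        return w

    w_svrg = svrg(X, y, stepsize=0.1, n_epochs=20, K=200, rng=rng)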

24 Distributed problems

25 Distributed framework
(Diagram: the data is split across Node 1 … Node K, coordinated by a master node.)
Naïve approach 1: distributed GD, which communicates with the master every iteration; communication is very expensive.
Naïve approach 2: one-shot averaging, which communicates only once but sacrifices accuracy.

26 Distributed convergence rates
Standard convergence time measure: the number of iterations times the cost per iteration. Distributed convergence time measure: the number of communication rounds times the cost of a round, computation plus communication. Ideally the two are similar.
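One common way to write the two measures (a reconstruction; the slide's formulas are lost):

    standard:     T(\varepsilon) = \#\text{iterations}(\varepsilon) \times \text{cost per iteration}
    distributed:  T(\varepsilon) = \#\text{rounds}(\varepsilon) \times \big(\text{computation per round} + \text{communication per round}\big)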

27 Distributed methods: Intuition
Iterate over the following steps (a toy sketch follows below):
1. Compute the minimizers of the local objectives.
2. Send the minimizers to the master node.
3. Create a new local objective for each node based on the other minimizers (local estimates of the global objective).
4. Distribute the local objectives back to the local nodes, and so on.
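A toy illustration of the flavor of such schemes, not the specific method from the talk: each node minimizes its local least-squares objective plus a proximal term pulling toward the master's current average (the name distributed_rounds, the proximal weight mu, and the data are all my own):

    import numpy as np

    rng = np.random.default_rng(0)
    As = [rng.standard_normal((50, 5)) for _ in range(4)]
    bs = [rng.standard_normal(50) for _ in range(4)]

    def distributed_rounds(As, bs, mu, n_rounds):
        # Each node k holds a local objective ||A_k w - b_k||^2. Per round,
        # every node solves its local problem in closed form and ships the
        # minimizer to the master, which averages them.
        d = As[0].shape[1]
        w_bar = np.zeros(d)
        for _ in range(n_rounds):
            minimizers = []
            for A, b in zip(As, bs):
                # argmin_w ||Aw - b||^2 + mu * ||w - w_bar||^2, in closed form
                H = A.T @ A + mu * np.eye(d)
                minimizers.append(np.linalg.solve(H, A.T @ b + mu * w_bar))
            w_bar = np.mean(minimizers, axis=0)   # one communication round
        return w_bar

    w_dist = distributed_rounds(As, bs, mu=1.0, n_rounds=10)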

28 Complex objectives

29 Deep Learning / Neural Networks
For a practitioner: the ultimate tool for machine learning.
For a mathematician: a nightmare (or the ultimate challenge).
Optimization without any assumptions (except continuity).
No real guarantees on generalization error; no real guarantees on convergence.
A lot of attention: roughly a quarter of the 2,500 papers submitted to NIPS 2016.
Next: understand deep learning better by understanding when it fails.

30 Learning parities (Failures of Deep Learning, Ohad Shamir, 2017)
TASK: learn a function that outputs the parity of the active entries in an unknown subset of the coordinates of a vector. Formally: choose a vector v* in {0,1}^d; for x in {-1,1}^d define y(x) as the product of the coordinates x_i with v*_i = 1; learn y without any information on v*. The task is realizable by a small single-layer neural network; in the following experiment we try to learn it with a larger number of units. A data-generation sketch follows below.
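A sketch of the data-generating process (the name parity_data and the sizes are my own; the setup follows the definition above):

    import numpy as np

    rng = np.random.default_rng(0)

    def parity_data(n, d, rng):
        # v_star marks the unknown subset of coordinates (hidden from the learner);
        # the label is the product, i.e. the parity, of the selected +-1 entries.
        v_star = rng.integers(0, 2, size=d)
        X = rng.choice([-1.0, 1.0], size=(n, d))
        y = np.prod(np.where(v_star == 1, X, 1.0), axis=1)
        return X, y

    X_par, y_par = parity_data(n=1000, d=30, rng=rng)
    # Fitting any gradient-trained network to (X_par, y_par) reproduces the
    # experiment: as d grows, gradients carry almost no signal about v_star.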

31 Parities convergence (Failures of Deep Learning, Ohad Shamir, 2017)
(Figure: convergence curves for the parity experiment; progress stalls as the dimension grows.)

32 Non-informative gradients
Let F_v be the loss corresponding to the parity problem with target vector v. Claim: fix a point w and consider the gradients of F_v at w over all vectors v; their variance is upper-bounded by a quantity that decays exponentially with the dimension. It follows that for large dimensions, all methods based only on gradient information fail to converge. A more general version of the claim holds for linear functions composed with a periodic function (Shamir, 2016).

33 Objective function example
(Distribution-specific Hardness of Learning Neural Networks, Ohad Shamir, 2016)

34 Final remarks
Optimization is the backbone of machine learning; one does not realize how important it is until something goes wrong.
Optimization offers a lot of challenging problems, ideal for mathematically oriented people with applied tastes.
Optimization improves a lot by analyzing its failures ("Learning from failures is the key to success", put here any name); this is responsible for most of the modern advances in deep learning.

35 Thank you for your attention!
Feel free to contact me with any further questions!

