
1 The role of optimization in machine learning
Dominik Csiba, MLMU Bratislava, 19 April 2017

2 Motivation

3 Linear models

4 Linear models as optimization
LASSO and logistic regression share the same template: a loss function over (features, label) pairs plus a regularizer.
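In standard textbook form (a reconstruction of the slide's rendered formulas, not their exact layout), the two objectives read:

    \min_{w \in \mathbb{R}^d} \ \frac{1}{n} \sum_{i=1}^n (x_i^\top w - y_i)^2 + \lambda \|w\|_1   (LASSO)

    \min_{w \in \mathbb{R}^d} \ \frac{1}{n} \sum_{i=1}^n \log\!\big(1 + e^{-y_i x_i^\top w}\big) + \frac{\lambda}{2} \|w\|_2^2   (logistic regression)

Here x_i are the features, y_i the label, each sum is the loss function, and the \lambda-term is the regularizer.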

5 Dimensionality Reduction - PCA

6 PCA as optimization
PCA fits the same template: a loss function over the features, with a constraint on the principal components.
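In standard form (again a reconstruction, since the slide's formula is lost):

    \max_{W \in \mathbb{R}^{d \times k}} \ \operatorname{tr}\big(W^\top X^\top X W\big) \quad \text{subject to } W^\top W = I

where X holds the centered features, the trace term is the loss function (variance captured), the columns of W are the principal components, and the orthonormality condition is the constraint.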

7 Matrix completion
(Diagram: a partially observed matrix of ±1 entries; the goal is to fill in the missing ones.)

8 Matrix completion as optimization
Fit a low-rank matrix to an observed matrix: a loss function over the observed indices, with a low-rank constraint.
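A standard way to write this (a reconstruction in textbook notation):

    \min_{W \in \mathbb{R}^{m \times n}} \ \sum_{(i,j) \in \Omega} \big(W_{ij} - M_{ij}\big)^2 \quad \text{subject to } \operatorname{rank}(W) \le r

with M the observed matrix, \Omega the set of observed indices, the sum the loss function, and the rank bound the constraint.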

9 Real-time bidding in online advertising
(Diagram: an auction for a banner slot on a webpage; competitors bid $0.10, $0.05 and $0.01, and the highest bidder wins.)

10 Real-time bidding as optimization
Bid requests arrive sequentially; we maximize a utility function over feasible allocations, with a penalty tied to the spent budget.
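One common formalization of this setup (a generic sketch of a budgeted allocation problem, not necessarily the talk's exact model):

    \max_{x \in \mathcal{X}} \ \sum_t u_t(x_t) \;-\; \lambda \Big( \sum_t c_t(x_t) - B \Big)_+

where u_t is the utility function, x_t the allocation for the request arriving at time t, \mathcal{X} the set of feasible allocations, c_t(x_t) the spend, B the budget, and the last term the penalty for overspending.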

11 Optimization in Machine Learning
Most of learning boils down to an optimization problem.
Attracts many mathematicians; the main topic of my PhD thesis.
Optimization has its own ecosystem in the machine learning community.
Treated as a black box by a lot of practitioners, and the black box gets faster each year.
Optimization becomes a consideration only when .fit() has issues learning.
Next: understanding some of the scenarios with such issues.

12 Supervised learning

13 Main idea
(Diagram: the world produces examples with labels, e.g. images tagged CAT, BREAD, CAT, DOG; the ground truth is the labeling rule we want to recover.)

14 Learning the true predictor
Empirical Risk Minimization (ERM): given samples, minimize the average loss over a hypothesis class; hopefully the minimizer is close to the true predictor. The usual form below is the problem we actually solve.
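In the standard formulation (a reconstruction of the slide's missing formulas):

    Samples:    (x_i, y_i) \sim \mathcal{D}, \quad i = 1, \dots, n \ \text{(i.i.d.)}
    ERM:        \hat{h} = \arg\min_{h \in \mathcal{H}} \ \frac{1}{n} \sum_{i=1}^n \ell(h(x_i), y_i)
    Hopefully:  L_{\mathcal{D}}(\hat{h}) \approx \min_{h \in \mathcal{H}} L_{\mathcal{D}}(h)
    Usual form: \min_{w \in \mathbb{R}^d} \ \frac{1}{n} \sum_{i=1}^n f_i(w) \quad \text{(solve this!)}

with \mathcal{H} the hypothesis class and \ell the loss function.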

15 How does optimization work?
Intuition: find the bottom of a valley in a dense fog using a teleport, where you can ask for the following information:
0th order info: the altitude at any location
1st order info: the slope at any location
2nd order info: the curvature at any location
In machine learning, first-order information is by far the most popular.

16 Going down the hill – Gradient Descent
By far the most popular iterative scheme. Intuition: we take a step down the hill. Stepsize: in some cases given by theory, otherwise difficult to pick. (Figure: a stepsize that is too small crawls; one that is too big overshoots.) A minimal sketch follows below.
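A minimal NumPy sketch on a toy least-squares problem (the name gradient_descent, the data and the stepsize are my own choices, not from the talk):

    import numpy as np

    def gradient_descent(grad, w0, stepsize, n_iters):
        # Iterate w <- w - stepsize * grad f(w): a constant step down the hill.
        w = w0.copy()
        for _ in range(n_iters):
            w = w - stepsize * grad(w)
        return w

    # Toy example: least squares, f(w) = ||Xw - y||^2 / (2n)
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 5))
    y = rng.standard_normal(100)
    grad_f = lambda w: X.T @ (X @ w - y) / len(y)
    w_gd = gradient_descent(grad_f, np.zeros(5), stepsize=0.1, n_iters=500)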

17 Big data

18 Large-scale data
A gradient descent step touches all #examples loss functions and computes all #dimension partial derivatives, so one iteration costs on the order of #examples × #dimension operations.

19 Gradient Descent – a wrong step in a smart direction
Do the best we can using first-order information. Stepsize: constant. Iteration cost depends on both the dimension and the number of examples! (Figure: iterates 1 through 6 stepping down the level sets until SOLVED.)

20 Randomized Coordinate Descent
Update only one randomly chosen coordinate at a time. Stepsize: constant for every dimension. Iteration cost is independent of the dimension! A smart step in a wrong direction. (Figure: iterates 1 through 7 moving in axis-aligned N/W/E/S steps until SOLVED.) A sketch follows below.
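A minimal sketch for the same least-squares toy problem, assuming we maintain the residual incrementally so that one step really costs O(#examples), not O(#examples × #dimension) (the name coordinate_descent and all constants are mine):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 5))
    y = rng.standard_normal(100)

    def coordinate_descent(X, y, stepsize, n_iters, rng):
        # Randomized coordinate descent for f(w) = ||Xw - y||^2 / (2n).
        n, d = X.shape
        w = np.zeros(d)
        r = X @ w - y                      # residual, maintained incrementally
        for _ in range(n_iters):
            j = rng.integers(d)            # one coordinate, uniformly at random
            g_j = X[:, j] @ r / n          # one partial derivative: O(n) work
            w[j] -= stepsize * g_j
            r -= stepsize * g_j * X[:, j]  # keep the residual in sync with w
        return w

    w_rcd = coordinate_descent(X, y, stepsize=0.5, n_iters=2000, rng=rng)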

21 Stochastic Gradient Descent
Update using only one random example at a time. Stepsize: decaying. Iteration cost is independent of the number of examples! A wrong step in a wrong direction. (Figure: iterates 1 through 7 zig-zagging noisily until SOLVED.) A sketch follows below.
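A minimal sketch on the same toy problem (the name sgd and the 1/k decay schedule are my own choices):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 5))
    y = rng.standard_normal(100)

    def sgd(X, y, stepsize0, n_iters, rng):
        # SGD for f(w) = (1/n) sum_i (x_i . w - y_i)^2 / 2.
        n, d = X.shape
        w = np.zeros(d)
        for k in range(n_iters):
            i = rng.integers(n)              # one example, uniformly at random
            g = (X[i] @ w - y[i]) * X[i]     # gradient of that example's loss: O(d)
            w -= stepsize0 / (k + 1) * g     # decaying stepsize, alpha_k = alpha_0 / k
        return w

    w_sgd = sgd(X, y, stepsize0=0.5, n_iters=5000, rng=rng)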

22 Magic of SGD explained
The step points in the correct direction in expectation, but its variance is not vanishing! The stepsize therefore has to decay, otherwise SGD will not converge. For GD/RCD there is no such issue, because the step itself vanishes at the optimum.
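Written out (a reconstruction of the slide's missing formulas):

    \mathbb{E}_i\big[\nabla f_i(w)\big] = \frac{1}{n} \sum_{i=1}^n \nabla f_i(w) = \nabla f(w)

but in general \nabla f_i(w^*) \neq 0 even though \nabla f(w^*) = 0, so the stochastic steps keep jittering around the optimum unless the stepsize decays. For GD/RCD the step is proportional to \nabla f(w) (or one of its coordinates), which vanishes at the optimum.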

23 Stochastic Variance Reduced Gradient
A new method was proposed in 2013 (SVRG, Johnson & Zhang) to reduce the variance of SGD.
Outer loop (repeat forever): store the current iterate as a snapshot; compute and store the full gradient at the snapshot.
Inner loop (repeat K times): sample an example uniformly at random; update with the sampled gradient, corrected by its value at the snapshot plus the stored full gradient.
We still have the correct direction in expectation, and the correction makes the variance vanish near the optimum, so a constant stepsize works. A sketch follows below.
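A minimal sketch on the same toy problem (the name svrg and all constants are mine):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 5))
    y = rng.standard_normal(100)

    def svrg(X, y, stepsize, n_epochs, K, rng):
        n, d = X.shape
        grad_i = lambda w, i: (X[i] @ w - y[i]) * X[i]   # one example's gradient
        w = np.zeros(d)
        for _ in range(n_epochs):                   # outer loop
            w_snap = w.copy()                       # store the current iterate
            g_full = X.T @ (X @ w_snap - y) / n     # full gradient at the snapshot
            for _ in range(K):                      # inner loop, K times
                i = rng.integers(n)                 # sample i uniformly at random
                # Corrected step: unbiased, with variance vanishing at the optimum.
                w -= stepsize * (grad_i(w, i) - grad_i(w_snap, i) + g_full)
        return w

    w_svrg = svrg(X, y, stepsize=0.1, n_epochs=20, K=200, rng=rng)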

24 Distributed problems

25 Distributed framework
(Diagram: the data is split across Node 1 … Node K, coordinated by a master node.)
Naïve approach 1: distributed GD, which communicates with the master every iteration; communication is very expensive.
Naïve approach 2: one-shot averaging, which communicates only once but sacrifices accuracy.

26 Distributed convergence rates
Standard convergence time measure: the number of iterations times the cost per iteration. Distributed convergence time measure: the number of communication rounds times the cost of a round, computation plus communication. Ideally the two are similar.
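One common way to write the two measures (a reconstruction; the slide's formulas are lost):

    standard:     T(\varepsilon) = \#\text{iterations}(\varepsilon) \times \text{cost per iteration}
    distributed:  T(\varepsilon) = \#\text{rounds}(\varepsilon) \times \big(\text{computation per round} + \text{communication per round}\big)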

27 Distributed methods: Intuition
Iterate over the following steps (a toy sketch follows below):
1. Compute the minimizers of the local objectives.
2. Send the minimizers to the master node.
3. Create a new local objective for each node based on the other minimizers (local estimates of the global objective).
4. Distribute the local objectives back to the local nodes, and so on.
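A toy illustration of the flavor of such schemes, not the specific method from the talk: each node minimizes its local least-squares objective plus a proximal term pulling toward the master's current average (the name distributed_rounds, the proximal weight mu, and the data are all my own):

    import numpy as np

    rng = np.random.default_rng(0)
    As = [rng.standard_normal((50, 5)) for _ in range(4)]
    bs = [rng.standard_normal(50) for _ in range(4)]

    def distributed_rounds(As, bs, mu, n_rounds):
        # Each node k holds a local objective ||A_k w - b_k||^2. Per round,
        # every node solves its local problem in closed form and ships the
        # minimizer to the master, which averages them.
        d = As[0].shape[1]
        w_bar = np.zeros(d)
        for _ in range(n_rounds):
            minimizers = []
            for A, b in zip(As, bs):
                # argmin_w ||Aw - b||^2 + mu * ||w - w_bar||^2, in closed form
                H = A.T @ A + mu * np.eye(d)
                minimizers.append(np.linalg.solve(H, A.T @ b + mu * w_bar))
            w_bar = np.mean(minimizers, axis=0)   # one communication round
        return w_bar

    w_dist = distributed_rounds(As, bs, mu=1.0, n_rounds=10)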

28 Complex objectives

29 Deep Learning / Neural Networks
For a practitioner: the ultimate tool for machine learning.
For a mathematician: a nightmare (or the ultimate challenge).
Optimization without any assumptions (except continuity).
No real guarantees on generalization error; no real guarantees on convergence.
A lot of attention: roughly a quarter of the 2,500 papers submitted to NIPS 2016.
Next: understand deep learning better by understanding when it fails.

30 Learning parities (Failures of Deep Learning, Ohad Shamir, 2017)
TASK: learn a function that outputs the parity of the active entries in an unknown subset of the coordinates of a vector. Formally: choose a vector v* in {0,1}^d; for x in {-1,1}^d define y(x) as the product of the coordinates x_i with v*_i = 1; learn y without any information on v*. The task is realizable by a small single-layer neural network; in the following experiment we try to learn it with a larger number of units. A data-generation sketch follows below.
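A sketch of the data-generating process (the name parity_data and the sizes are my own; the setup follows the definition above):

    import numpy as np

    rng = np.random.default_rng(0)

    def parity_data(n, d, rng):
        # v_star marks the unknown subset of coordinates (hidden from the learner);
        # the label is the product, i.e. the parity, of the selected +-1 entries.
        v_star = rng.integers(0, 2, size=d)
        X = rng.choice([-1.0, 1.0], size=(n, d))
        y = np.prod(np.where(v_star == 1, X, 1.0), axis=1)
        return X, y

    X_par, y_par = parity_data(n=1000, d=30, rng=rng)
    # Fitting any gradient-trained network to (X_par, y_par) reproduces the
    # experiment: as d grows, gradients carry almost no signal about v_star.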

31 Parities convergence (Failures of Deep Learning, Ohad Shamir, 2017)
(Figure: convergence curves for the parity experiment; progress stalls as the dimension grows.)

32 Non-informative gradients
Let F_v be the loss corresponding to the parity problem with target vector v. Claim: fix a point w and consider the gradients of F_v at w over all vectors v; their variance is upper-bounded by a quantity that decays exponentially with the dimension. It follows that for large dimensions, all methods based only on gradient information fail to converge. A more general version of the claim holds for linear functions composed with a periodic function (Shamir, 2016).

33 Objective function example
(Distribution-specific Hardness of Learning Neural Networks, Ohad Shamir, 2016)

34 Final remarks
Optimization is the backbone of machine learning; one does not realize how important it is until something goes wrong.
Optimization offers a lot of challenging problems, ideal for mathematically oriented people with applied tastes.
Optimization improves a lot by analyzing its failures ("Learning from failures is the key to success", put here any name); this is responsible for most of the modern advances in deep learning.

35 Thank you for your attention!
Feel free to contact me with any further questions!

