
1 Learning by Loss Minimization

2 Machine learning: learn a function from examples. Function: a mapping from inputs to outputs. Examples: – Supervised: labeled (input, output) pairs. – Unsupervised: inputs only. – Semi-supervised: a few labeled examples plus many unlabeled ones.


4 Example: Regression Examples:

5 Example: Regression Examples: Function:

6 Example: Regression Examples: Function: How do we find it?

7 Loss Functions. Least squares: sum_i (y_i − f(x_i))². Least absolute deviations: sum_i |y_i − f(x_i)|.
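As a small illustration of how the choice of loss changes the fit (the data values and the brute-force grid search are invented for this sketch, not from the slides), here is a comparison of the two losses on 1-D data with an outlier:

```python
import numpy as np

# Toy 1-D data with one outlier (invented values for illustration).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 1.1, 1.9, 3.2, 10.0])  # the last point is an outlier

def least_squares_loss(a, b):
    # L2 loss: sum of squared residuals of the line y = a*x + b.
    return np.sum((y - (a * x + b)) ** 2)

def least_absolute_loss(a, b):
    # L1 loss: sum of absolute residuals.
    return np.sum(np.abs(y - (a * x + b)))

# Minimize each loss by brute force over a grid of candidate lines.
grid = np.linspace(-2.0, 4.0, 121)
l2_fit = min(((a, b) for a in grid for b in grid),
             key=lambda p: least_squares_loss(*p))
l1_fit = min(((a, b) for a in grid for b in grid),
             key=lambda p: least_absolute_loss(*p))

# The squared loss is dragged toward the outlier; the absolute loss is not.
print("least-squares slope:", l2_fit[0])
print("least-absolute slope:", l1_fit[0])
```

On this data the least-squares line is pulled up by the outlier, while the least-absolute-deviations line stays close to the inliers, which is the usual argument for the L1 loss being more robust.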

8 Open Questions How to choose the model function? How to choose the loss function? How to minimize the loss function?

9 Example: Binary Classification

10 Support Vector Machines (SVMs) Binary classification can be viewed as the task of separating classes in feature space:

11 Support Vector Machines (SVMs)


13 The sign of the classifier's response is the predicted label; multiplying the response by the right label gives the margin.

14 Support Vector Machines (SVMs)

15

16

17 Other losses?

18 Can minimize using Stochastic sub-Gradient Descent (SGD)


20 Can minimize using Stochastic sub-Gradient Descent (SGD)
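A minimal sketch of stochastic sub-gradient descent on the regularized hinge loss, using Pegasos-style 1/(λt) step sizes (the synthetic data, λ value, and iteration count are illustrative assumptions, not taken from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linearly separable 2-D data (invented for illustration).
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)

lam = 0.01   # regularization strength (assumed value)
w = np.zeros(2)

# Pegasos-style SGD: pick one example, take a sub-gradient step on
# lam/2 * ||w||^2 + max(0, 1 - y_i * w.x_i), with step size 1/(lam*t).
for t in range(1, 2001):
    i = rng.integers(len(X))
    eta = 1.0 / (lam * t)
    if y[i] * (X[i] @ w) < 1:
        # Margin violated: hinge sub-gradient contributes -y_i * x_i.
        w = (1 - eta * lam) * w + eta * y[i] * X[i]
    else:
        # Margin satisfied: only the regularizer shrinks w.
        w = (1 - eta * lam) * w

accuracy = np.mean(np.sign(X @ w) == y)
print("training accuracy:", accuracy)
```

Each step costs O(d) for one example, which is the point of SGD for large-scale learning: the loss is a sum over examples, so one sampled term gives an unbiased sub-gradient estimate.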


23 Papers
– Pegasos: Primal Estimated sub-GrAdient SOlver for SVM, Shai Shalev-Shwartz, Yoram Singer, Nathan Srebro, Andrew Cotter, 2011
– The Tradeoffs of Large Scale Learning, Léon Bottou and Olivier Bousquet, 2008
– Stochastic Gradient Descent Tricks, Léon Bottou, 2012

24 Non-Linear SVMs. Datasets that are linearly separable: [figure: 1-D data on the x axis]

25 Non-Linear SVMs. Datasets that are linearly separable. Datasets that are NOT linearly separable? [figures: separable and non-separable 1-D data on the x axis]

26 Non-Linear SVMs. Datasets that are NOT linearly separable? Mapping to other (here higher) dimensions, e.g. x ↦ x²: [figure: the same data plotted against x and x²]
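To make the slide's 1-D picture concrete, here is a tiny invented dataset (the points and labels are assumptions for illustration) where no threshold on x separates the classes, but a threshold on the added feature x² does:

```python
# 1-D points whose positive class lies far from the origin on BOTH sides,
# so no single threshold on x can separate the classes.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
labels = [1, 1, -1, -1, -1, 1, 1]

# Try every threshold between neighboring points, in both directions.
separable_1d = any(
    all((x > t) == (lab == 1) for x, lab in zip(xs, labels)) or
    all((x < t) == (lab == 1) for x, lab in zip(xs, labels))
    for t in [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
)

# After the mapping x -> (x, x^2), a threshold on x^2 separates them:
# x^2 > 2 exactly for the positive class here.
separable_2d = all((x * x > 2) == (lab == 1) for x, lab in zip(xs, labels))

print("separable on x:", separable_1d)
print("separable on x^2:", separable_2d)
```

The first check prints False and the second True: the data are not linearly separable in the original 1-D space but become separable in the mapped space, which is exactly the motivation for the feature mapping on this slide.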

27 What should be the mapping?


30 What should be the mapping in general?

31 Support Vector Machines (SVMs) The Lagrangian dual: maximize Σᵢ αᵢ − ½ Σᵢ,ⱼ αᵢ αⱼ yᵢ yⱼ (xᵢ·xⱼ) subject to 0 ≤ αᵢ ≤ C and Σᵢ αᵢ yᵢ = 0. Where the classifier is: f(x) = sign(Σᵢ αᵢ yᵢ (xᵢ·x) + b)


34 Support Vector Machines (SVMs) The Lagrangian dual:

35 Support Vector Machines (SVMs)


37 Support Vector Machines (SVMs) Primal with Kernels (Chapelle 06)


41 Popular Choices for Kernels
– Polynomial (homogeneous) kernel: K(x, z) = (x·z)^d
– Polynomial (inhomogeneous) kernel: K(x, z) = (x·z + 1)^d
– Gaussian Radial Basis Function (RBF) kernel: K(x, z) = exp(−‖x − z‖² / (2σ²))
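The three kernels can be written out directly (the parameter defaults d = 2, c = 1, σ = 1 are illustrative choices, not values from the slides):

```python
import numpy as np

def poly_homogeneous(x, z, d=2):
    # Homogeneous polynomial kernel: K(x, z) = (x . z)^d
    return np.dot(x, z) ** d

def poly_inhomogeneous(x, z, d=2, c=1.0):
    # Inhomogeneous polynomial kernel: K(x, z) = (x . z + c)^d
    return (np.dot(x, z) + c) ** d

def rbf(x, z, sigma=1.0):
    # Gaussian RBF kernel: K(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    diff = np.asarray(x) - np.asarray(z)
    return np.exp(-np.dot(diff, diff) / (2 * sigma ** 2))

x = np.array([1.0, 0.0])
z = np.array([0.0, 1.0])
print(poly_homogeneous(x, z))    # (0)^2 = 0.0
print(poly_inhomogeneous(x, z))  # (0 + 1)^2 = 1.0
print(rbf(x, z))                 # exp(-2/2) = exp(-1) ≈ 0.3679
```

Each of these computes an inner product in some implicit feature space without constructing that space, which is what lets the dual classifier above replace every xᵢ·x with K(xᵢ, x).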


44 Multiclass? One-vs-One: train k(k−1)/2 classifiers, each one to classify between two classes, and classify by majority vote. One-vs-All: train k classifiers, each one to classify between one class and all other classes, and classify by the largest response. Multiclass (Crammer and Singer): train the one-vs-all classifiers jointly.
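A small sketch of the classifier counts behind the two reductions (the helper names are invented for illustration):

```python
from itertools import combinations

def one_vs_one_pairs(k):
    # One-vs-One trains one classifier per unordered pair of classes:
    # k * (k - 1) / 2 of them.
    return list(combinations(range(k), 2))

def one_vs_all_tasks(k):
    # One-vs-All trains k classifiers, each separating one class
    # from all the others ("rest").
    return [(c, "rest") for c in range(k)]

k = 5
print(len(one_vs_one_pairs(k)))  # 10 = 5*4/2
print(len(one_vs_all_tasks(k)))  # 5
```

So One-vs-One grows quadratically in the number of classes but each binary problem is small, while One-vs-All trains only k classifiers, each on the full dataset.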

45 Multiclass (Crammer and Singer): train one-vs-all classifiers jointly.

46 The loss compares the right class's response with the wrong-class responses.

47 Multiclass (Crammer and Singer): train one-vs-all classifiers jointly; the loss penalizes the wrong class that got the largest response.
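One common form of the Crammer–Singer multiclass hinge loss, matching the slides' "right class response" vs. "wrong class that got the largest response" picture (the margin of 1 and the function name are assumptions for this sketch):

```python
import numpy as np

def crammer_singer_loss(scores, y):
    """Multiclass hinge loss (Crammer & Singer form):
    max(0, 1 + largest wrong-class response - right-class response)."""
    scores = np.asarray(scores, dtype=float)
    wrong = np.delete(scores, y)  # responses of all classes except the right one
    return max(0.0, 1.0 + wrong.max() - scores[y])

# Right class well above every wrong class: zero loss.
print(crammer_singer_loss([0.2, 3.0, -1.0], y=1))  # 0.0
# Right class barely beats the runner-up: positive loss inside the margin.
print(crammer_singer_loss([2.8, 3.0, -1.0], y=1))
```

Only the single strongest wrong class enters the loss, which is what "train the one-vs-all classifiers jointly" buys over training them independently.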

48 Complex labels – Structured Prediction

49 How to choose C, or sigma for the Gaussian kernel, or …?

50 How to evaluate performance?

51 Neural Nets = Deep Learning


Download ppt "Learning by Loss Minimization. Machine learning: Learn a Function from Examples Function: Examples: – Supervised: – Unsupervised: – Semisuprvised:"

Similar presentations


Ads by Google