
Learning by Loss Minimization

Machine learning: Learn a Function from Examples. Function: a model f mapping inputs to outputs. Examples: – Supervised: labeled pairs (x i, y i) – Unsupervised: unlabeled points x i only – Semi-supervised: a few labeled pairs plus many unlabeled points


Example: Regression. Examples: pairs (x i, y i) with real-valued targets y i. Function: a model f predicting y from x, e.g. a linear model f(x) = w · x + b. How to find f?
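A minimal sketch of fitting such a linear model by least squares (the data points here are made up for illustration; the closed-form solve stands in for the general loss minimization the slides discuss):

```python
import numpy as np

# Toy 1-D regression examples (made-up values for illustration).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 0.9, 2.1, 2.9, 4.2])

# Fit f(x) = w*x + b by minimizing the squared error, solved in
# closed form as a linear least-squares problem.
A = np.stack([x, np.ones_like(x)], axis=1)  # design matrix [x, 1]
w, b = np.linalg.lstsq(A, y, rcond=None)[0]
```

For this data the fit comes out very close to f(x) = x, as expected from how the points were generated.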

Loss Functions. Least squares: L(f) = Σ i (y i − f(x i))². Least absolute deviations: L(f) = Σ i |y i − f(x i)|.
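The two losses side by side, on made-up targets and predictions:

```python
import numpy as np

# Targets and model predictions (made-up values for illustration).
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 1.5, 3.5])

residuals = y_true - y_pred
least_squares = np.sum(residuals ** 2)      # sum of squared errors
least_abs_dev = np.sum(np.abs(residuals))   # sum of absolute errors
```

The squared loss penalizes large residuals much more heavily, which is why least absolute deviations is the more outlier-robust choice.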

Open Questions How to choose the model function? How to choose the loss function? How to minimize the loss function?

Example: Binary Classification

Support Vector Machines (SVMs) Binary classification can be viewed as the task of separating classes in feature space:

Support Vector Machines (SVMs)

The classifier computes f(x) = w · x + b: its sign is the predicted label, which is compared against the right label y.

Support Vector Machines (SVMs)

Other losses ?

Can minimize using Stochastic subGradient Descent (SGD): take a step along a subgradient of the loss on a single random example.
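A minimal Pegasos-style SGD sketch for the regularized SVM objective λ/2 ‖w‖² + mean hinge loss. The dataset, λ, and iteration count are illustrative choices, not from the slides:

```python
import numpy as np

def pegasos(X, y, lam=0.1, iters=1000, seed=0):
    """Stochastic subgradient descent on the SVM objective
    lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * (w . x_i))."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for t in range(1, iters + 1):
        i = rng.integers(len(y))       # pick one random example
        eta = 1.0 / (lam * t)          # decreasing step size 1/(lam*t)
        if y[i] * (X[i] @ w) < 1.0:    # margin violated: hinge term is active
            w = (1.0 - eta * lam) * w + eta * y[i] * X[i]
        else:                          # only the regularizer contributes
            w = (1.0 - eta * lam) * w
    return w

# Toy linearly separable data: the label is the sign of the first feature.
X = np.array([[2.0, 1.0], [1.5, -1.0], [-2.0, 0.5], [-1.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = pegasos(X, y)
```

The 1/(λt) step-size schedule is the one analyzed in the Pegasos paper cited below.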

Papers – Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. Shai Shalev-Shwartz, Yoram Singer, Nathan Srebro, Andrew Cotter, 2011. – The Tradeoffs of Large Scale Learning. Léon Bottou and Olivier Bousquet, 2011. – Stochastic Gradient Descent Tricks. Léon Bottou, 2012.

Non-Linear SVMs. Datasets that are linearly separable work directly. Datasets that are NOT linearly separable? Mapping to other (here higher) dimensions, e.g. x ↦ (x, x²), can make them separable. [Figure: 1-D points on the x axis; after mapping to the x² coordinate, a threshold separates the classes.]
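The 1-D picture can be sketched with made-up points: no single threshold on x separates the classes, but after mapping x ↦ (x, x²) a threshold on the x² coordinate does:

```python
import numpy as np

# Class -1 sits between two groups of class +1: not separable on the line.
x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
y = np.array([1, 1, -1, -1, -1, 1, 1])

# Map each point to a higher dimension: x -> (x, x^2).
phi = np.stack([x, x ** 2], axis=1)

# In the mapped space, thresholding the x^2 coordinate separates the classes.
pred = np.where(phi[:, 1] > 2.0, 1, -1)
```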

What should be the mapping?

What should be the mapping in general?

Support Vector Machines (SVMs). The Lagrangian dual: maximize Σ i α i − ½ Σ i Σ j α i α j y i y j (x i · x j) subject to 0 ≤ α i ≤ C and Σ i α i y i = 0. Where the classifier is: f(x) = sign(Σ i α i y i (x i · x) + b).

Support Vector Machines (SVMs)

Support Vector Machines (SVMs) Primal with Kernels (Chapelle 06)

Popular Choices for Kernels. Polynomial (homogenous) kernel: K(x, z) = (x · z)^d. Polynomial (inhomogenous) kernel: K(x, z) = (x · z + 1)^d. Gaussian Radial Basis Function (RBF) kernel: K(x, z) = exp(−‖x − z‖² / (2σ²)).
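These kernels can be written down directly; a sketch where d, c, and sigma are the usual hyperparameters and the two test vectors are made up:

```python
import numpy as np

def poly_kernel(x, z, d=2, c=0.0):
    """Polynomial kernel (x.z + c)^d; c = 0 is the homogenous case,
    c = 1 the inhomogenous one."""
    return (x @ z + c) ** d

def rbf_kernel(x, z, sigma=1.0):
    """Gaussian RBF kernel exp(-||x - z||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

x = np.array([1.0, 2.0])
z = np.array([2.0, 1.0])
```

Each of these is a dot product in some feature space, so any of them can replace x i · x j in the dual without ever computing the mapping explicitly.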

Multiclass? One-vs-One: train k(k−1)/2 classifiers, each one to classify between two classes, and classify by majority vote. One-vs-All: train k classifiers, each one to classify between one class and all other classes, and classify by the largest response. Multiclass (Crammer and Singer): train the one-vs-all classifiers jointly.
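For k classes, One-vs-One needs k(k−1)/2 pairwise classifiers while One-vs-All needs only k; a small sketch with made-up per-class responses, where One-vs-All predicts by the largest response:

```python
import numpy as np
from itertools import combinations

n_classes = 4

# One-vs-One: one binary classifier per unordered pair of classes.
ovo_pairs = list(combinations(range(n_classes), 2))
n_ovo = len(ovo_pairs)                    # k*(k-1)/2 = 6 classifiers for k = 4

# One-vs-All: one classifier per class; predict the class whose
# classifier responds most strongly.
scores = np.array([0.2, -1.0, 1.5, 0.3])  # made-up per-class responses
pred = int(np.argmax(scores))
```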

Multiclass (Crammer and Singer): train the one-vs-all classifiers jointly, comparing the right class's response against the wrong class that got the largest response.
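The per-example loss this describes is the multiclass hinge: max(0, 1 + largest wrong-class response − right-class response). A sketch with made-up scores:

```python
import numpy as np

def multiclass_hinge(scores, y):
    """Crammer-Singer multiclass hinge loss for one example:
    max(0, 1 + max_{r != y} scores[r] - scores[y])."""
    wrong = np.delete(scores, y)               # responses of the wrong classes
    return max(0.0, 1.0 + wrong.max() - scores[y])

scores = np.array([2.0, 0.5, 1.8])  # made-up per-class responses w_r . x
```

The loss is zero only when the right class beats every wrong class by a margin of at least one.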

Complex labels – Structured Prediction

How to choose C, or σ for the Gaussian kernel, or … ?
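A minimal model-selection sketch: measure each candidate value on held-out data and keep the best. The accuracy numbers below are made up; in practice each would come from training a model with that C and scoring it on a validation set (or averaging over cross-validation folds):

```python
# Candidate regularization values and their (made-up) validation accuracies.
candidates = [0.01, 0.1, 1.0, 10.0]
val_accuracy = {0.01: 0.71, 0.1: 0.84, 1.0: 0.88, 10.0: 0.80}

# Keep the candidate that scores best on held-out data.
best_C = max(candidates, key=lambda c: val_accuracy[c])
```

The same loop extends to grids over several hyperparameters at once, e.g. (C, σ) pairs for the RBF kernel.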

How to evaluate performance ?

Neural Nets = Deep Learning