CS 2750: Machine Learning Support Vector Machines Prof. Adriana Kovashka University of Pittsburgh February 16, 2017
Plan for today Linear Support Vector Machines Non-linear SVMs and the “kernel trick” Extensions (briefly) Soft-margin SVMs Multi-class SVMs Hinge loss SVM vs logistic regression SVMs with latent variables How to solve the SVM problem (next class)
Lines in R2
Let w = [a c] and x = [x y]. Then the line ax + cy + b = 0 can be written compactly as w·x + b = 0.
Distance from a point x0 to the line: |w·x0 + b| / ||w||
Kristen Grauman
Linear classifiers Find linear function to separate positive and negative examples Which line is best? C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
Support vector machines Discriminative classifier based on optimal separating line (for 2d case) Maximize the margin between the positive and negative training examples C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
Support vector machines
Want the line that maximizes the margin.
The decision boundary is w·x + b = 0; the margin boundaries are w·x + b = 1 and w·x + b = -1.
For support vectors, yi(w·xi + b) = 1.
Distance between a point and the line: |w·xi + b| / ||w||, so for support vectors the distance to the boundary is 1/||w||.
Therefore, the margin is 2 / ||w||.
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
Finding the maximum margin line
1. Maximize the margin 2/||w||
2. Correctly classify all training data points: for xi positive (yi = 1), w·xi + b ≥ 1; for xi negative (yi = -1), w·xi + b ≤ -1
Quadratic optimization problem:
Minimize (1/2) wTw
Subject to yi(w·xi + b) ≥ 1
One constraint for each training point. Note the sign trick: multiplying by yi folds both cases into a single constraint.
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
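Since the max-margin problem above is a small quadratic program, it can be handed directly to a generic convex solver. Below is a minimal sketch using the cvxpy library on toy 2-D data; cvxpy and the toy data are assumptions for illustration, not part of the lecture.

```python
import numpy as np
import cvxpy as cp

# Toy linearly separable data: X is (n, d), labels y are in {-1, +1}
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

n, d = X.shape
w = cp.Variable(d)
b = cp.Variable()

# Minimize (1/2)||w||^2 subject to y_i (w . x_i + b) >= 1 for every training point
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print("w =", w.value, "b =", b.value, "margin =", 2 / np.linalg.norm(w.value))
```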
Finding the maximum margin line
Solution: w = Σi αi yi xi, a weighted combination of the training points; the learned weights αi are nonzero only for the support vectors.
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
Finding the maximum margin line
Solution: b = yi – w·xi (for any support vector)
Classification function: f(x) = sign(w·x + b) = sign(Σi αi yi xi·x + b)
Notice that it relies on an inner product between the test point x and the support vectors xi. (Solving the optimization problem also involves computing the inner products xi·xj between all pairs of training points.) MORE DETAILS NEXT TIME
If f(x) < 0, classify as negative, otherwise classify as positive.
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
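As an illustration of how the classification function touches the data only through inner products, here is a small numpy sketch. The names (alphas, support_vectors, b) stand for quantities produced by an already-solved SVM; they are illustrative, not defined in the slides.

```python
import numpy as np

def svm_decision(x, support_vectors, alphas, labels, b):
    """f(x) = sum_i alpha_i * y_i * (x_i . x) + b, using only inner products
    between the test point x and the support vectors x_i."""
    return np.sum(alphas * labels * (support_vectors @ x)) + b

def svm_classify(x, support_vectors, alphas, labels, b):
    # Negative score -> negative class, otherwise positive class
    return -1 if svm_decision(x, support_vectors, alphas, labels, b) < 0 else +1
```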
Inner product
x · z = xTz = Σk x(k) z(k). Both the solution and the classification function depend on the data only through such inner products.
Adapted from Milos Hauskrecht
Plan for today Linear Support Vector Machines Non-linear SVMs and the “kernel trick” Extensions (briefly) Soft-margin SVMs Multi-class SVMs Hinge loss SVM vs logistic regression SVMs with latent variables How to solve the SVM problem (next class)
Nonlinear SVMs
Datasets that are linearly separable work out great. But what if the dataset is just too hard? We can map it to a higher-dimensional space, e.g. lifting 1-D data x to (x, x2), where it becomes separable.
Andrew Moore
Nonlinear SVMs General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable: Φ: x → φ(x) Andrew Moore
Nonlinear kernel: Example
Consider the mapping φ(x) = (x, x2). Then φ(x)·φ(y) = xy + x2y2, so the corresponding kernel is K(x, y) = xy + x2y2.
Svetlana Lazebnik
The “kernel trick”
The linear classifier relies on the dot product between vectors: K(xi, xj) = xi · xj.
If every data point is mapped into a high-dimensional space via some transformation Φ: xi → φ(xi), the dot product becomes K(xi, xj) = φ(xi) · φ(xj).
A kernel function is a similarity function that corresponds to an inner product in some expanded feature space.
The kernel trick: instead of explicitly computing the lifting transformation φ(x), define a kernel function K such that K(xi, xj) = φ(xi) · φ(xj).
Andrew Moore
Examples of kernel functions
Linear: K(xi, xj) = xiTxj
Polynomials of degree up to d: K(xi, xj) = (xiTxj + 1)^d
Gaussian RBF: K(xi, xj) = exp(-||xi - xj||2 / (2σ2))
Histogram intersection: K(xi, xj) = Σk min(xi(k), xj(k))
Andrew Moore / Carlos Guestrin
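The following numpy sketch spells out these four kernels for a pair of feature vectors; the default bandwidth sigma and degree d are arbitrary illustrative choices.

```python
import numpy as np

def linear_kernel(xi, xj):
    return xi @ xj

def polynomial_kernel(xi, xj, d=3):
    return (xi @ xj + 1) ** d

def gaussian_rbf_kernel(xi, xj, sigma=1.0):
    return np.exp(-np.sum((xi - xj) ** 2) / (2 * sigma ** 2))

def histogram_intersection_kernel(xi, xj):
    # For non-negative histogram features: sum of bin-wise minima
    return np.sum(np.minimum(xi, xj))
```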
The benefit of the “kernel trick”
Example: the degree-2 polynomial kernel K(x, y) = (xTy + 1)2 for 2-dim features corresponds to an explicit feature map φ(x) = (1, √2 x1, √2 x2, x12, x22, √2 x1x2), which lives in 6 dimensions, yet the kernel value can be computed with only 2-dim operations.
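The sketch below checks this numerically: the degree-2 polynomial kernel evaluated in the original 2-dim space matches the inner product of the explicit 6-dim feature maps. The particular lifting written in phi is one standard choice, assumed here for illustration.

```python
import numpy as np

def phi(x):
    # Explicit 6-dimensional lift for the degree-2 polynomial kernel (x.y + 1)^2
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1,
                     np.sqrt(2) * x2,
                     x1 ** 2,
                     x2 ** 2,
                     np.sqrt(2) * x1 * x2])

x = np.array([0.5, -1.0])
z = np.array([2.0, 3.0])

lhs = (x @ z + 1) ** 2        # kernel computed in the original 2-dim space
rhs = phi(x) @ phi(z)         # same value via the explicit 6-dim mapping
print(lhs, rhs)               # both equal 1.0 for this pair of points
```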
Is this function a kernel?
Not every similarity function is a valid kernel: K must correspond to an inner product in some feature space, which holds exactly when the kernel (Gram) matrix it produces is symmetric and positive semidefinite (Mercer's condition).
Blaschko / Lampert
Constructing kernels
New valid kernels can be built from existing ones: for valid kernels K1 and K2, the sum K1 + K2, a positive scaling cK1 (c > 0), and the product K1K2 are all valid kernels.
Blaschko / Lampert
Using SVMs
1. Define your representation for each example.
2. Select a kernel function.
3. Compute pairwise kernel values between labeled examples.
4. Use this “kernel matrix” to solve for the SVM support vectors & alpha weights.
5. To classify a new example: compute kernel values between the new input and the support vectors, apply the alpha weights, check the sign of the output.
Adapted from Kristen Grauman
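These steps line up with how an SVM package is typically called. The sketch below uses scikit-learn's SVC with a precomputed Gaussian RBF kernel matrix so each step of the recipe is visible; scikit-learn and the toy data are assumptions, not part of the slides.

```python
import numpy as np
from sklearn.svm import SVC

def rbf_kernel_matrix(A, B, sigma=1.0):
    # Pairwise Gaussian RBF kernel values between rows of A and rows of B
    sq_dists = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-sq_dists / (2 * sigma**2))

# Steps 1-2: representation = rows of X; kernel = Gaussian RBF
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0], [5.0, 5.0]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[0.5, 0.5], [4.5, 4.5]])

# Step 3: pairwise kernel values between labeled examples -> kernel matrix
K_train = rbf_kernel_matrix(X_train, X_train)

# Step 4: solve for support vectors and alpha weights
clf = SVC(kernel="precomputed").fit(K_train, y_train)

# Step 5: classify new examples via kernel values between new inputs and training points
K_test = rbf_kernel_matrix(X_test, X_train)
print(clf.predict(K_test))
```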
Example: Learning gender w/ SVMs Moghaddam and Yang, Learning Gender with Support Faces, TPAMI 2002; Moghaddam and Yang, Face & Gesture 2000 Kristen Grauman
Example: Learning gender w/ SVMs Support faces Kristen Grauman
Example: Learning gender w/ SVMs SVMs performed better than humans at either resolution. Kristen Grauman
Plan for today Linear Support Vector Machines Non-linear SVMs and the “kernel trick” Extensions (briefly) Soft-margin SVMs Multi-class SVMs Hinge loss SVM vs logistic regression SVMs with latent variables How to solve the SVM problem (next class)
Hard-margin SVMs
The w that minimizes (1/2)||w||2, subject to yi(w·xi + b) ≥ 1 for every training point, maximizes the margin 2/||w||.
Soft-margin SVMs (allowing misclassifications)
The w that minimizes (1/2)||w||2 + C Σi ξi (sum over the n data samples), subject to yi(w·xi + b) ≥ 1 - ξi and ξi ≥ 0, where C is the misclassification cost and ξi is the slack variable for example i.
The first term maximizes the margin; the second minimizes misclassification.
BOARD
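As a sketch of how the soft-margin objective can be solved directly, the snippet below encodes the slack variables and the cost C with cvxpy; the library and the toy data are assumptions for illustration.

```python
import numpy as np
import cvxpy as cp

# Toy data with one point on the wrong side, so some slack is needed
X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -2.0], [1.5, 1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])   # last point overlaps the positive class
C = 1.0                                 # misclassification cost

n, d = X.shape
w, b, xi = cp.Variable(d), cp.Variable(), cp.Variable(n)

# Minimize (1/2)||w||^2 + C * sum_i xi_i
# subject to y_i (w . x_i + b) >= 1 - xi_i and xi_i >= 0
objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi))
constraints = [cp.multiply(y, X @ w + b) >= 1 - xi, xi >= 0]
cp.Problem(objective, constraints).solve()

print("w =", w.value, "b =", b.value, "slacks =", xi.value)
```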
Multi-class SVMs
In practice, we obtain a multi-class SVM by combining two-class SVMs.
One vs. others. Training: learn an SVM for each class vs. the others. Testing: apply each SVM to the test example and assign it to the class of the SVM that returns the highest decision value.
One vs. one. Training: learn an SVM for each pair of classes. Testing: each learned SVM “votes” for a class to assign to the test example.
There are also “natively multi-class” formulations (Crammer and Singer, JMLR 2001).
Svetlana Lazebnik / Carlos Guestrin
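A brief sketch of the two strategies using scikit-learn's wrapper classes (an assumed tool, not mentioned in the slides); scikit-learn's SVC already uses one-vs-one internally for multi-class input, so the explicit wrappers here are only to make the two schemes visible.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

# Toy 3-class problem
X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]], dtype=float)
y = np.array([0, 0, 1, 1, 2, 2])

# One vs. others: one binary SVM per class; pick the class with the highest decision value
ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)

# One vs. one: one binary SVM per pair of classes; the classes then vote on the test example
ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)

print(ovr.predict([[4.5, 5.5]]), ovo.predict([[4.5, 5.5]]))
```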
Hinge loss
Let f(xi) = w·xi + b be the classifier score for example xi. We have the objective to minimize
(1/2)||w||2 + C Σi ξi   subject to   yi f(xi) ≥ 1 - ξi,  ξi ≥ 0,
where ξi measures how much example i violates the margin. Then we can define a loss
L(xi, yi) = max(0, 1 - yi f(xi))
and the unconstrained SVM objective:
min over w, b of (1/2)||w||2 + C Σi max(0, 1 - yi(w·xi + b))
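To make the unconstrained objective concrete, here is a minimal numpy sketch that minimizes it by subgradient descent on toy data; the learning rate and iteration count are arbitrary illustrative choices, not from the lecture.

```python
import numpy as np

def hinge_objective(w, b, X, y, C):
    margins = y * (X @ w + b)
    return 0.5 * w @ w + C * np.sum(np.maximum(0, 1 - margins))

def train_svm_subgradient(X, y, C=1.0, lr=0.01, iters=1000):
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(iters):
        margins = y * (X @ w + b)
        active = margins < 1                      # points that violate the margin
        grad_w = w - C * (y[active, None] * X[active]).sum(axis=0)
        grad_b = -C * y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = train_svm_subgradient(X, y)
print(w, b, hinge_objective(w, b, X, y, 1.0))
```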
Multi-class hinge loss
We want the score of the correct class to exceed the score of every other class by a margin of at least 1: w_{yi}·xi ≥ wj·xi + 1 for all j ≠ yi.
Minimize: (1/2) Σj ||wj||2 + C Σi Σ_{j≠yi} max(0, 1 + wj·xi - w_{yi}·xi)
Hinge loss for example i: Li = Σ_{j≠yi} max(0, 1 + wj·xi - w_{yi}·xi)
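A short numpy sketch of this per-example loss, assuming the class scores have already been computed as s = Wx; the scores and the margin of 1 here are illustrative.

```python
import numpy as np

def multiclass_hinge_loss(scores, correct_class, margin=1.0):
    """Sum over wrong classes of max(0, s_j - s_{y_i} + margin)."""
    diffs = np.maximum(0, scores - scores[correct_class] + margin)
    diffs[correct_class] = 0.0          # do not penalize the correct class itself
    return diffs.sum()

scores = np.array([3.2, 5.1, -1.7])    # e.g. scores s = W x for 3 classes
print(multiclass_hinge_loss(scores, correct_class=0))  # 5.1 - 3.2 + 1 = 2.9
```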
SVMs vs logistic regression
Logistic regression models the class posterior with a sigmoid, P(y = 1 | x) = σ(w·x + b), and is trained by minimizing the log (logistic) loss log(1 + exp(-yi(w·xi + b))). The SVM instead minimizes the hinge loss max(0, 1 - yi(w·xi + b)): it ignores points well beyond the margin, while the logistic loss is never exactly zero. Both losses are convex upper bounds on the 0/1 classification error.
Adapted from Tommi Jaakkola
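To see the difference between the two training criteria, this small sketch evaluates both losses on the same signed margin m = y(w·x + b): the SVM's hinge loss is exactly zero once m ≥ 1, while the logistic loss decays but never reaches zero.

```python
import numpy as np

margins = np.linspace(-2, 3, 6)              # m = y * (w . x + b)
hinge = np.maximum(0, 1 - margins)           # SVM: zero once the margin exceeds 1
logistic = np.log1p(np.exp(-margins))        # logistic regression: never exactly zero

for m, h, l in zip(margins, hinge, logistic):
    print(f"margin {m:+.1f}: hinge {h:.3f}, logistic {l:.3f}")
```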
SVMs with latent variables
When each example has unobserved structure h (a latent variable, e.g. the location of an object part), the score becomes f(x) = max over h of w·Φ(x, h). Training alternates between inferring the latent variables and updating w, which makes the objective non-convex.
Adapted from S. Nowozin and C. Lampert
SVMs: Pros and cons
Pros:
- Kernel-based framework is very powerful and flexible
- Often a sparse set of support vectors, so compact at test time
- Work very well in practice, even with very small training sample sizes
- Solution can be formulated as a quadratic program (next time)
- Many publicly available SVM packages: e.g. LIBSVM, LIBLINEAR, SVMLight
Cons:
- Can be tricky to select the best kernel function for a problem
- Computation and memory: at training time, must compute kernel values for all example pairs
- Learning can take a very long time for large-scale problems
Adapted from Lana Lazebnik