CS 2750: Machine Learning Support Vector Machines Prof. Adriana Kovashka University of Pittsburgh February 17, 2016

Announcement: Homework 2 deadline is now 2/29 – we’ll have covered everything you need today, or at the latest on Monday. Project proposal is due tonight on CourseWeb. How many of you want me to print handouts for next time?

Plan for today: Linear Support Vector Machines; Non-linear SVMs and the “kernel trick”; Soft-margin SVMs; Example use of SVMs; Advanced topics (very briefly): Structured SVMs, Latent variables; How to solve the SVM problem (next class)

Lines in R^2: Let … Kristen Grauman

Lines in R^2: Let … distance from point to line. Kristen Grauman
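The equations on these slides are not preserved in the transcript; as a hedged reconstruction of the standard setup, a line in R^2 and the distance from a point to it can be written as:

```latex
% Hedged reconstruction; the slide's own equations are not preserved in this transcript.
% A line $ax + by + c = 0$ in $\mathbb{R}^2$, written with $w = (a, b)$ and $x = (x, y)$:
\[ w \cdot x + c = 0 \]
% Distance from a point $(x_0, y_0)$ to the line:
\[ D = \frac{|a x_0 + b y_0 + c|}{\sqrt{a^2 + b^2}} = \frac{|w \cdot x_0 + c|}{\lVert w \rVert} \]
```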

Linear classifiers: Find linear function to separate positive and negative examples. Which line is best? C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Support vector machines: Discriminative classifier based on optimal separating line (for 2d case). Maximize the margin between the positive and negative training examples. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Support vector machines: Want line that maximizes the margin. For support vectors, wx+b = -1 and wx+b = +1; the decision boundary is wx+b = 0. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Support vector machines: Want line that maximizes the margin. For support vectors, wx+b = -1 and wx+b = +1; the decision boundary is wx+b = 0. Distance between point and line: … For support vectors: … C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Support vector machines: Want line that maximizes the margin. For support vectors, wx+b = -1 and wx+b = +1; the decision boundary is wx+b = 0. Distance between point and line: … Therefore, the margin is 2 / ||w||. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
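The distance and margin equations are missing from the transcript; assuming the standard derivation, the margin follows from the point-to-line distance:

```latex
% Hedged reconstruction of the missing distance/margin equations (standard derivation).
\[ \text{distance}(x_i, \text{line}) = \frac{|w \cdot x_i + b|}{\lVert w \rVert},
   \qquad |w \cdot x_i + b| = 1 \ \text{for support vectors} \]
% Each support vector therefore lies at distance $1/\lVert w \rVert$ from the boundary, giving
\[ \text{margin} = \frac{2}{\lVert w \rVert} \]
```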

Finding the maximum margin line: 1. Maximize margin 2/||w||. 2. Correctly classify all training data points. Quadratic optimization problem: Minimize … subject to y_i(w·x_i + b) ≥ 1, one constraint for each training point. Note the sign trick. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
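The objective on the slide is not visible here; the standard hard-margin primal that matches the stated constraint is (a sketch, not a transcription of the slide):

```latex
% Standard hard-margin primal (a sketch consistent with the constraint stated above).
\[ \min_{w,\, b} \ \frac{1}{2} \lVert w \rVert^2
   \quad \text{subject to} \quad y_i (w \cdot x_i + b) \ge 1 \ \ \forall i \]
% The "sign trick": with $y_i \in \{-1, +1\}$, the two cases $w \cdot x_i + b \ge 1$ (positives)
% and $w \cdot x_i + b \le -1$ (negatives) collapse into the single constraint above.
```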

Finding the maximum margin line. Solution: w = Σ_i α_i y_i x_i (the α_i are the learned weights; each x_i with α_i > 0 is a support vector). C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Finding the maximum margin line. Solution: b = y_i – w·x_i (for any support vector). Classification function: … Notice that it relies on an inner product between the test point x and the support vectors x_i. (Solving the optimization problem also involves computing the inner products x_i · x_j between all pairs of training points.) If f(x) < 0, classify as negative; otherwise classify as positive. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998 MORE DETAILS NEXT TIME
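The classification function’s equation is missing from the transcript; given the solution form above, the standard expression is (a sketch):

```latex
% Classification function in terms of the learned $\alpha_i$ (standard form; it depends on the
% data only through inner products with the support vectors).
\[ f(x) = \operatorname{sign}\Big( \sum_i \alpha_i y_i \,(x_i \cdot x) + b \Big) \]
```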

Inner product Adapted from Milos Hauskrecht
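The slide’s content is not preserved; for reference, the inner (dot) product the classifier relies on is the standard one (not transcribed from the slide):

```latex
% Standard inner (dot) product in $\mathbb{R}^d$.
\[ x \cdot z = \sum_{k=1}^{d} x_k z_k = \lVert x \rVert \, \lVert z \rVert \cos\theta \]
```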

Plan for today: Linear Support Vector Machines; Non-linear SVMs and the “kernel trick”; Soft-margin SVMs; Example use of SVMs; Advanced topics (very briefly): Structured SVMs, Latent variables; How to solve the SVM problem (next class)

Nonlinear SVMs. Datasets that are linearly separable work out great. But what if the dataset is just too hard? We can map it to a higher-dimensional space (figure: 1-D data on an x axis, lifted to an x² axis). Andrew Moore

Nonlinear SVMs. General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable: Φ: x → φ(x). Andrew Moore

Nonlinear kernel: Example. Consider the mapping … Svetlana Lazebnik
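Only an “x²” survives from the slide’s mapping; a hedged reconstruction of the usual one-dimensional example is:

```latex
% Reconstruction of the standard 1-D quadratic-mapping example (only "x^2" survives on the slide).
\[ \varphi(x) = (x, \; x^2), \qquad
   \varphi(x) \cdot \varphi(y) = x y + x^2 y^2 = K(x, y) \]
% Data that is not separable on the $x$-axis becomes linearly separable in the $(x, x^2)$ plane,
% and the lifted inner product can be computed directly from $x$ and $y$.
```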

The “Kernel Trick”: The linear classifier relies on the dot product between vectors, K(x_i, x_j) = x_i · x_j. If every data point is mapped into high-dimensional space via some transformation Φ: x_i → φ(x_i), the dot product becomes K(x_i, x_j) = φ(x_i) · φ(x_j). A kernel function is a similarity function that corresponds to an inner product in some expanded feature space. The kernel trick: instead of explicitly computing the lifting transformation φ(x), define a kernel function K such that K(x_i, x_j) = φ(x_i) · φ(x_j). Andrew Moore

Examples of kernel functions: Linear; Polynomials of degree up to d; Gaussian RBF; Histogram intersection. Andrew Moore / Carlos Guestrin
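The kernel formulas themselves are not preserved in this transcript; the sketch below implements the standard forms of the four kernels named above (NumPy assumed; the degree and bandwidth parameters are illustrative choices, not values from the slide):

```python
# Standard forms of the kernels listed above (a sketch; the slide's exact
# parameterizations are not visible in this transcript).
import numpy as np

def linear_kernel(x, z):
    return np.dot(x, z)

def polynomial_kernel(x, z, d=3, c=1.0):
    # Polynomials of degree up to d
    return (np.dot(x, z) + c) ** d

def gaussian_rbf_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

def histogram_intersection_kernel(x, z):
    # For non-negative histogram features (e.g., bag-of-features counts)
    return np.sum(np.minimum(x, z))
```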

(C) Dhruv Batra. Slide credit: Blaschko & Lampert.

Plan for today: Linear Support Vector Machines; Non-linear SVMs and the “kernel trick”; Soft-margin SVMs; Example use of SVMs; Advanced topics (very briefly): Structured SVMs, Latent variables; How to solve the SVM problem (next class)

Assuming data is separable: Maximize margin. The w that minimizes…

Allowing misclassifications: Maximize margin, minimize misclassification. The w that minimizes… (equation terms: slack variable, misclassification cost, # data samples). BOARD
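The objective on the slide is not reproduced here; the standard soft-margin formulation that matches the annotations above (slack variable, misclassification cost, number of data samples) is, as a sketch:

```latex
% Standard soft-margin primal (a sketch matching the annotations above).
\[ \min_{w,\, b,\, \xi} \ \frac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i
   \quad \text{subject to} \quad y_i (w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0 \]
% $\xi_i$: slack variables; $C$: misclassification cost; $n$: number of data samples.
```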

What about multi-class SVMs? In practice, we obtain a multi-class SVM by combining two-class SVMs. One vs. others – Training: learn an SVM for each class vs. the others. Testing: apply each SVM to the test example, and assign it to the class of the SVM that returns the highest decision value. One vs. one – Training: learn an SVM for each pair of classes. Testing: each learned SVM “votes” for a class to assign to the test example. There are also “natively multi-class” formulations – Crammer and Singer, JMLR 2001. Svetlana Lazebnik / Carlos Guestrin
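As a concrete illustration (the use of scikit-learn is an assumption of this write-up, not something the slide prescribes), both strategies can be built directly from binary SVMs:

```python
# One-vs-rest and one-vs-one built from binary SVMs (scikit-learn assumed; toy data).
import numpy as np
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.svm import SVC

X = np.random.rand(60, 5)
y = np.random.randint(0, 3, 60)  # three classes

ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)  # one SVM per class vs. the others
ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)   # one SVM per pair of classes; prediction by voting

print(ovr.predict(X[:5]), ovo.predict(X[:5]))
```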

SVMs for recognition: 1. Define your representation for each example. 2. Select a kernel function. 3. Compute pairwise kernel values between labeled examples. 4. Use this “kernel matrix” to solve for SVM support vectors & weights. 5. To classify a new example: compute kernel values between new input and support vectors, apply weights, check sign of output. Kristen Grauman
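A minimal sketch of this recipe using scikit-learn’s precomputed-kernel interface (the library, the toy data, and the χ² kernel choice are assumptions for illustration, not part of the slide):

```python
# Steps 1-5 above with a precomputed kernel matrix (scikit-learn assumed; toy data).
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import chi2_kernel

X_train = np.random.rand(100, 300)            # step 1: bag-of-features histograms (toy stand-in)
y_train = np.random.randint(0, 2, 100)
X_test = np.random.rand(10, 300)

K_train = chi2_kernel(X_train, X_train)       # steps 2-3: choose a kernel, compute pairwise values
clf = SVC(kernel="precomputed").fit(K_train, y_train)   # step 4: solve for support vectors & weights

K_test = chi2_kernel(X_test, X_train)         # step 5: kernel values between new inputs and training points
print(clf.decision_function(K_test))          # sign of the output gives the predicted class
```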

Example: learning gender with SVMs. Moghaddam and Yang, Learning Gender with Support Faces, TPAMI 2002. Moghaddam and Yang, Face & Gesture. Kristen Grauman

Support Faces Moghaddam and Yang, Learning Gender with Support Faces, TPAMI 2002.

Human vs. Machine: SVMs performed better than any single human test subject, at either resolution. Kristen Grauman

Plan for today: Linear Support Vector Machines; Non-linear SVMs and the “kernel trick”; Soft-margin SVMs; Example use of SVMs; Advanced topics (very briefly): Structured SVMs, Latent variables; How to solve the SVM problem (next class)

Structured SVMs: y is a vector. Tsochantaridis et al., Large Margin Methods for Structured and Interdependent Output Variables, JMLR 2005.

Latent Variables Adapted from S. Nowozin and C. Lampert

Application: Detecting objects (people). We can have a confident person prediction if the features in a person-sized window match a global template (root filter), the parts (e.g. head) match the part templates (filters), and the parts are not too far from where we’re used to seeing them (the lower the deformation weights, the better). (Figure: root filter, part filters, deformation weights.) P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, Object Detection with Discriminatively Trained Part Based Models, PAMI 32(9), 2010. Adapted from Lana Lazebnik

Scoring an object hypothesis: The score of a hypothesis is the sum of appearance scores (features times weights) minus the sum of deformation costs. Locations of parts are latent (unobserved). (Equation terms: appearance weights, subwindow features, deformation weights, displacements.) Adapted from Lana Lazebnik
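The scoring equation itself is not preserved in the transcript; a hedged sketch of the part-based score described above, loosely following the cited Felzenszwalb et al. paper, is:

```latex
% Sketch of the part-based score (notation loosely follows the cited paper; not transcribed
% from the slide). $p_0$: root location; $p_1, \ldots, p_n$: latent part locations;
% $\phi(H, p_i)$: subwindow (HOG) features; $w_i$: appearance weights; $d_i$: deformation
% weights; $(dx_i, dy_i)$: displacement of part $i$ from its anchor.
\[ \text{score}(p_0, \ldots, p_n) =
   \sum_{i=0}^{n} w_i \cdot \phi(H, p_i)
   - \sum_{i=1}^{n} d_i \cdot \big(dx_i,\; dy_i,\; dx_i^2,\; dy_i^2\big) \]
```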

Training with latent variables: Our classifier has the form f_w(x) = max_z w·h(x, z), where w are model parameters, z are latent hypotheses, and h is a feature representation that depends on part locations. Latent SVM training: Initialize w and iterate: Fix w and find the best z for each training example. Fix z and solve for w (standard SVM training). Lana Lazebnik
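A minimal runnable sketch of this alternation (the toy data, the helper names, and the use of scikit-learn’s LinearSVC are assumptions for illustration; in the detection setting z would range over part locations):

```python
# Sketch of latent SVM training by alternation (toy setup; the "latent variable" here is just
# which of two candidate feature vectors of each example to score, a stand-in for part locations).
import numpy as np
from sklearn.svm import LinearSVC

def joint_features(x, z):
    # Hypothetical feature map h(x, z): pick one of the candidate feature vectors of x.
    return x[z]

def find_best_latent(w, x):
    # Fix w: choose the latent hypothesis z with the highest score w . h(x, z).
    return int(np.argmax([np.dot(w, joint_features(x, z)) for z in range(len(x))]))

def latent_svm_train(examples, labels, n_rounds=5):
    w = np.zeros(examples[0][0].shape)
    for _ in range(n_rounds):
        z = [find_best_latent(w, x) for x in examples]                 # fix w, find best z per example
        H = np.array([joint_features(x, zi) for x, zi in zip(examples, z)])
        w = LinearSVC().fit(H, labels).coef_.ravel()                   # fix z, standard linear SVM for w
    return w

# Toy data: each example is a list of two candidate feature vectors (the latent choice).
rng = np.random.RandomState(0)
examples = [[rng.randn(4), rng.randn(4)] for _ in range(40)]
labels = rng.randint(0, 2, 40)
w = latent_svm_train(examples, labels)
```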

SVMs: Pros and cons. Pros: – Kernel-based framework is very powerful, flexible – Often a sparse set of support vectors – compact at test time – Work very well in practice, even with very small training sample sizes – Solution can be formulated as a quadratic program (next time) – Many publicly available SVM packages: e.g. LIBSVM, LIBLINEAR, SVMLight (or use the built-in Matlab version, but it is slower). Cons: – Can be tricky to select the best kernel function for a problem – Computation, memory: at training time, must compute kernel values for all example pairs; learning can take a very long time for large-scale problems. Adapted from Lana Lazebnik

SVM Solution. Equations from Andrew Ng’s notes.
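The equations themselves are missing from the transcript; the dual problem these slides reference has the standard form (sketched here from the standard result, not transcribed from the slide):

```latex
% Standard soft-margin dual over the $\alpha_i$ (a sketch; not transcribed from the slide).
\[ \max_{\alpha} \ \sum_{i=1}^{n} \alpha_i
   - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j \,(x_i \cdot x_j)
   \quad \text{subject to} \quad 0 \le \alpha_i \le C, \quad \sum_{i=1}^{n} \alpha_i y_i = 0 \]
```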

Sequential Minimal Optimization. Convergence: all α_i’s satisfy the Karush-Kuhn-Tucker (KKT) conditions, which are used to determine whether we are at the optimal solution. Repeat until convergence: – Pick an α_i that violates the conditions – Pick another α_j – Recompute new values for α_i and α_j. Proposed by John Platt in 1998: “Fast Training of Support Vector Machines using Sequential Minimal Optimization”.
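For reference, the KKT conditions that SMO checks can be written as follows (a standard statement, sketched here rather than taken from the slide):

```latex
% KKT conditions checked by SMO, with $f(x_i) = w \cdot x_i + b$ (standard form).
\[ \alpha_i = 0 \;\Rightarrow\; y_i f(x_i) \ge 1, \qquad
   0 < \alpha_i < C \;\Rightarrow\; y_i f(x_i) = 1, \qquad
   \alpha_i = C \;\Rightarrow\; y_i f(x_i) \le 1 \]
```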