Support Vector Machines Mei-Chen Yeh 04/20/2010

The Classification Problem Label instances, usually represented by feature vectors, into one of the predefined categories. Example: Image classification

Starting from the simplest setting: two classes, and the samples are linearly separable. The separating hyperplane is g(x) = w^T x + w_0 = 0, where w is the weight vector and w_0 the threshold; g(x) > 0 on one side of the hyperplane and g(x) < 0 on the other. How many classifiers can separate the data? Infinitely many!

Formulation Given training data (x_i, y_i), i = 1, 2, …, N, where x_i is a feature vector and y_i its label, learn a hyperplane that separates all the data (variables: w and w_0). Testing: the decision function is f(x) = sign(w^T x + w_0), where x is a test sample.
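For concreteness, a minimal NumPy sketch of the testing step; the names w, w0, and decide are illustrative placeholders, not part of the lecture:

import numpy as np

def decide(w, w0, x):
    # f(x) = sign(w^T x + w_0): +1 for class 1, -1 for class 2
    return 1 if np.dot(w, x) + w0 > 0 else -1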

Hyperplanes H1, H2, and H3 (separating Class 1 from Class 2 in the figure) are all candidate classifiers. Which one is preferred? Why?

Choose the one with the largest margin!

The margin is bounded by the two parallel hyperplanes w^T x + w_0 = δ and w^T x + w_0 = -δ on either side of w^T x + w_0 = 0. Scale w and w_0 so that δ = 1; the margin then equals 2 / ||w||.

Formulation Compute w and w_0 so as to minimize J(w) = (1/2) ||w||^2 subject to y_i (w^T x_i + w_0) ≥ 1, i = 1, …, N. Side information: minimizing ||w||^2 is equivalent to maximizing the margin 2 / ||w||.

Formulation The problem is equivalent to the dual optimization task: maximize Σ_i λ_i − (1/2) Σ_i Σ_j λ_i λ_j y_i y_j x_i^T x_j subject to Σ_i λ_i y_i = 0 and λ_i ≥ 0, where the λ_i are Lagrange multipliers. w can be recovered by w = Σ_i λ_i y_i x_i. Classification rule: assign x to ω_1 (ω_2) if g(x) = w^T x + w_0 > 0 (< 0).
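A small NumPy sketch of recovering w from the multipliers once a QP solver has produced the λ_i; the variable names are illustrative assumptions:

import numpy as np

# lam: (N,) Lagrange multipliers, y: (N,) labels in {+1, -1}, X: (N, d) feature vectors
def recover_w(lam, y, X):
    # w = sum_i lambda_i * y_i * x_i; only the support vectors (lambda_i > 0) contribute
    return (lam * y) @ X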

Remarks Only some of the λ_i are non-zero. The x_i with non-zero λ_i are called support vectors, and the hyperplane is determined only by the support vectors. The cost function involves the data only through inner products, so it does not depend explicitly on the dimensionality of the input space!

Non-separable Classes Allow training errors! Previous constraint: y_i (w^T x_i + w_0) ≥ 1. Introduce slack variables ξ_i: y_i (w^T x_i + w_0) ≥ 1 − ξ_i, where ξ_i > 1 for misclassified samples, 0 < ξ_i ≤ 1 for samples inside the margin but correctly classified, and ξ_i = 0 otherwise.

Formulation Compute w, w_0, and ξ so as to minimize (1/2) ||w||^2 + C Σ_i ξ_i subject to y_i (w^T x_i + w_0) ≥ 1 − ξ_i and ξ_i ≥ 0, where C is the penalty parameter controlling the trade-off between margin width and training errors.
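To make the role of the penalty parameter concrete, a hedged scikit-learn sketch on synthetic, overlapping data (scikit-learn is only one possible tool; the data and numbers are illustrative):

import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) + 2, rng.randn(20, 2) - 2])   # two overlapping clouds
y = np.array([1] * 20 + [-1] * 20)

# Small C tolerates more training errors (wider margin); large C penalizes errors heavily.
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(C, clf.score(X, y), len(clf.support_))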

Formulation The dual problem is the same as in the separable case, except that the multipliers are now bounded above by the penalty parameter: 0 ≤ λ_i ≤ C.

Non-linear Case Linearly separable in another space? Idea: map the feature vectors to a higher-dimensional space.

Non-linear Case Example: a mapping φ(·) sends each input x to φ(x) in a higher-dimensional feature space where the classes become linearly separable (figure).

Problems with an explicit mapping: high computational burden, and it is hard to get a good estimate in the high-dimensional space.

Kernel Trick Recall that in the dual problem, w can be recovered by w = Σ_i λ_i y_i φ(x_i), so g(x) = w^T φ(x) + w_0 = Σ_i λ_i y_i φ(x_i)^T φ(x) + w_0. All we need here is the inner product of (transformed) feature vectors!

Kernel Trick Decision function: f(x) = sign(Σ_i λ_i y_i K(x_i, x) + w_0). Kernel function: K(x_i, x_j) = φ(x_i)^T φ(x_j).
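A minimal sketch of evaluating this decision function, assuming the support vectors, their multipliers and labels, and w_0 are already available (the names and interface are illustrative):

import numpy as np

def kernel_decision(x, support_vectors, lam, y, w0, K):
    # g(x) = sum_i lambda_i * y_i * K(x_i, x) + w_0, summed over the support vectors only
    g = sum(l * yi * K(xi, x) for l, yi, xi in zip(lam, y, support_vectors)) + w0
    return np.sign(g)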

Example kernel: the inner product can be computed directly, without going through the mapping φ(·); see the sketch below for a standard instance.
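A standard instance of such a kernel (an assumption, since the slide's own equation is not in the transcript) is K(x, y) = (x^T y)^2 for two-dimensional inputs, which corresponds to the explicit mapping φ(x) = (x_1^2, √2 x_1 x_2, x_2^2); a quick NumPy check:

import numpy as np

def phi(x):
    # explicit mapping for 2-D inputs: (x1^2, sqrt(2)*x1*x2, x2^2)
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x, z = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(np.dot(x, z) ** 2)        # kernel value computed directly: 16.0
print(np.dot(phi(x), phi(z)))   # same value via the explicit mapping: 16.0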

Remarks In practice, we specify K, thereby specifying φ(·) indirectly, instead of choosing φ(·) explicitly. Intuitively, K(x, y) represents the similarity between data points x and y. K(x, y) needs to satisfy the Mercer condition in order for φ(·) to exist.

Examples of Kernel Functions: polynomial kernel with degree d, K(x, y) = (x^T y + 1)^d; radial basis function kernel with width σ, K(x, y) = exp(−||x − y||^2 / (2σ^2)); sigmoid kernel with parameters κ and θ, K(x, y) = tanh(κ x^T y + θ).
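For reference, simple NumPy implementations of the three kernels; the exact parameterizations (for instance, whether the polynomial kernel includes a +1 term) vary across texts, so treat these as one common convention:

import numpy as np

def polynomial_kernel(x, y, d=3):
    return (np.dot(x, y) + 1) ** d

def rbf_kernel(x, y, sigma=1.0):
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))

def sigmoid_kernel(x, y, kappa=1.0, theta=0.0):
    return np.tanh(kappa * np.dot(x, y) + theta)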

Pros and Cons Strengths: training is relatively easy; it scales relatively well to high-dimensional data; the trade-off between classifier complexity and error can be controlled explicitly. Weaknesses: there is no practical method for selecting the best kernel function; the basic formulation handles binary classification only.

Combining SVM binary classifiers for a multi-class problem (1) M-category classification (ω_1, ω_2, …, ω_M). Two popular approaches. 1. One-against-all (ω_i vs. the M−1 others): M classifiers; choose the class whose classifier gives the largest output. Example: 5 categories, winner ω_1.

Combining SVM binary classifiers for a multi-class problem (2) 2. Pairwise coupling (ω_i vs. ω_j): M(M−1)/2 classifiers; aggregate the outputs by voting. Example: 5 categories with votes 1: 4, 2: 1, 3: 3, 4: 0, 5: 2, so the winner is ω_1.
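A sketch of the pairwise voting rule, assuming a dictionary of trained binary classifiers indexed by class pairs (the interface is hypothetical, for illustration only):

from collections import Counter
from itertools import combinations

def one_vs_one_predict(classifiers, x, classes):
    # classifiers[(i, j)] returns a positive score for class i, negative for class j
    votes = Counter()
    for i, j in combinations(classes, 2):
        votes[i if classifiers[(i, j)](x) > 0 else j] += 1
    return votes.most_common(1)[0][0]   # the class with the most votes wins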

Data normalization The features may have different ranges. Example: we use weight (w) and height (h) to classify male and female college students; the average weight (in kg) and average height (in cm) differ between the two groups and are on different scales.

Data normalization (“data pre-processing”) Equalize scales among different features: zero mean and unit variance. Two cases in practice: scale to (0, 1) if all feature values are positive, or to (−1, 1) if feature values may be positive or negative.

Data normalization Let x_ik denote feature k of sample i. Mean and variance: x̄_k = (1/N) Σ_i x_ik, σ_k^2 = (1/N) Σ_i (x_ik − x̄_k)^2. Normalization: x̂_ik = (x_ik − x̄_k) / σ_k.
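A minimal NumPy sketch of this normalization; at test time the training-set mean and standard deviation should be reused rather than recomputed:

import numpy as np

def normalize(X):
    # per-feature (column-wise) zero mean, unit variance; X has shape (N, d)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / std, mean, std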

Assignment #4 Develop an SVM classifier using either OpenCV or LIBSVM. Use “training.txt” to train your classifier, and evaluate its performance on “test.txt”. Write a 1-page report that summarizes how you implemented your classifier and the classification accuracy you obtained.
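A hedged starting-point sketch using LIBSVM's official Python interface (pip install libsvm-official); it assumes training.txt and test.txt are in LIBSVM's sparse "label index:value" format, which may not match the actual course files:

from libsvm.svmutil import svm_read_problem, svm_train, svm_predict

y_train, x_train = svm_read_problem("training.txt")
y_test, x_test = svm_read_problem("test.txt")

model = svm_train(y_train, x_train, "-t 2 -c 1")        # RBF kernel, C = 1
_, (accuracy, _, _), _ = svm_predict(y_test, x_test, model)
print("classification accuracy (%):", accuracy)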

Final project announcement Please prepare a short (<5 minutes) presentation on what you’re going to develop for the final project.