SVM—Support Vector Machines

SVM—Support Vector Machines
- A new classification method for both linear and nonlinear data
- It uses a nonlinear mapping to transform the original training data into a higher dimension
- In the new dimension, it searches for the linear optimal separating hyperplane (i.e., the "decision boundary")
- With an appropriate nonlinear mapping to a sufficiently high dimension, data from two classes can always be separated by a hyperplane
- SVM finds this hyperplane using support vectors ("essential" training tuples) and margins (defined by the support vectors)
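A minimal sketch of this idea, assuming scikit-learn is available; the dataset and parameter values are purely illustrative:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two classes that are not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# An RBF kernel implicitly maps the data into a higher-dimensional space
# where a linear separating hyperplane exists; SVC finds that hyperplane
# from the support vectors and maximizes the margin.
clf = SVC(kernel="rbf", C=1.0).fit(X, y)
print("training accuracy:", clf.score(X, y))
print("support vectors per class:", clf.n_support_)
```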

SVM—History and Applications
- Introduced by Vapnik and colleagues (1992); the groundwork is Vapnik and Chervonenkis' statistical learning theory from the 1960s
- Features: training can be slow, but accuracy is high owing to the ability to model complex nonlinear decision boundaries (margin maximization)
- Used for both classification and prediction
- Applications: handwritten digit recognition, object recognition, speaker identification, benchmark time-series prediction tests

SVM—Linearly Separable
- A separating hyperplane can be written as W · X + b = 0, where W = {w1, w2, …, wn} is a weight vector and b a scalar (bias)
- In 2-D it can be written as w0 + w1 x1 + w2 x2 = 0
- The hyperplanes defining the two sides of the margin are H1: w0 + w1 x1 + w2 x2 ≥ 1 for yi = +1, and H2: w0 + w1 x1 + w2 x2 ≤ –1 for yi = –1
- Any training tuples that fall on hyperplanes H1 or H2 (i.e., the sides defining the margin) are support vectors
- This becomes a constrained (convex) quadratic optimization problem: a quadratic objective function with linear constraints → quadratic programming (QP) → Lagrange multipliers
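A sketch of the linearly separable case using scikit-learn's linear SVC; the toy points, the very large C (to approximate a hard margin), and the printed checks are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 2.5], [1.5, 2.0],   # class -1
              [4.0, 4.5], [5.0, 5.0], [4.5, 4.0]])  # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # very large C ~ hard margin
w, b = clf.coef_[0], clf.intercept_[0]

print("w =", w, "b =", b)
print("margin width = 2/||w|| =", 2 / np.linalg.norm(w))
# Margin constraints y_i (w·x_i + b): support vectors sit at ~1, the rest above 1.
print("y_i (w·x_i + b):", y * (X @ w + b))
```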

Support vectors
- The support vectors define the maximum margin hyperplane!
- All other instances can be deleted without changing its position or orientation
- This means the hyperplane can be written as a sum over the support vectors alone: f(x) = b + Σ αi yi (xi · x), where the sum runs only over the support vectors xi
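A small sketch of this dual form, reusing the clf and toy data from the previous sketch (an assumption); it rebuilds the decision function from the support vectors alone and compares it to scikit-learn's own value:

```python
import numpy as np

sv = clf.support_vectors_      # the support vectors x_i
alpha_y = clf.dual_coef_[0]    # alpha_i * y_i for each support vector
b = clf.intercept_[0]

x_new = np.array([3.0, 3.0])
f_dual = b + np.sum(alpha_y * (sv @ x_new))       # b + sum_i alpha_i y_i (x_i · x)
print(f_dual, clf.decision_function([x_new])[0])  # the two values agree
```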

Finding support vectors
- A support vector is a training instance for which αi > 0
- How are the αi and b determined? By solving a constrained quadratic optimization problem
- Off-the-shelf tools exist for solving these problems, but special-purpose algorithms are faster
- Example: Platt's sequential minimal optimization (SMO) algorithm (implemented in WEKA)
- Note: all of this assumes the data are separable!
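A brief sketch of the αi > 0 criterion, again reusing the fitted clf from the sketches above (an assumption; scikit-learn's SVC is backed by LIBSVM, whose solver is an SMO-style decomposition method):

```python
import numpy as np

alphas = np.abs(clf.dual_coef_[0])   # |alpha_i * y_i| = alpha_i, since alpha_i >= 0
print("support vector indices:", clf.support_)
print("their alpha_i values:  ", alphas)
# Every other training instance has alpha_i = 0 and drops out of the
# decision function entirely.
```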

Extending linear classification
- Linear classifiers can't model nonlinear class boundaries
- Simple trick: map the attributes into a new space consisting of combinations of attribute values
- E.g.: all products of n factors that can be constructed from the attributes
- Example with two attributes a1, a2 and n = 3: x = w1 a1³ + w2 a1² a2 + w3 a1 a2² + w4 a2³
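A minimal sketch of this explicit mapping for two attributes and n = 3; the function name and example values are illustrative:

```python
import numpy as np

def map_products_of_three(v):
    """Map (a1, a2) to all products of three factors: a1^3, a1^2*a2, a1*a2^2, a2^3."""
    a1, a2 = v
    return np.array([a1**3, a1**2 * a2, a1 * a2**2, a2**3])

print(map_products_of_three(np.array([2.0, 3.0])))   # [ 8. 12. 18. 27.]
```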

Nonlinear SVMs
- "Pseudo attributes" represent attribute combinations
- Overfitting not a problem because the maximum margin hyperplane is stable
- There are usually few support vectors relative to the size of the training set
- Computation time still an issue: each time the dot product is computed, all the "pseudo attributes" must be included

A mathematical trick
- Avoid computing the "pseudo attributes"!
- Compute the dot product before doing the nonlinear mapping
- Example: instead of f(x) = b + Σ αi yi (φ(xi) · φ(x)) with the explicit mapping φ, compute f(x) = b + Σ αi yi (xi · x)ⁿ
- This corresponds to a map into the instance space spanned by all products of n attributes
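A sketch of why this works for two attributes and n = 3: the kernel value (x · y)³ equals the dot product of the explicitly mapped points, provided the mixed products carry the binomial weights √3 (the example values are illustrative):

```python
import numpy as np

def phi(v):
    # All products of three factors, weighted so that phi(x) · phi(y) = (x · y)^3.
    a1, a2 = v
    return np.array([a1**3,
                     np.sqrt(3) * a1**2 * a2,
                     np.sqrt(3) * a1 * a2**2,
                     a2**3])

x, y = np.array([2.0, 3.0]), np.array([1.0, 4.0])
print((x @ y) ** 3)      # kernel: one dot product, then a power -> 2744.0
print(phi(x) @ phi(y))   # explicit mapping gives the same value -> 2744.0
```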

Other kernel functions
- The function applied in place of the explicit mapping is called a "kernel function"
- Polynomial kernel: K(xi, x) = (xi · x)ⁿ, giving f(x) = b + Σ αi yi K(xi, x)
- We can use other kernels as well
- Only requirement: K(xi, x) must correspond to a dot product in some feature space, i.e., K(xi, x) = φ(xi) · φ(x) (a symmetric, positive semi-definite kernel)
- Examples: the polynomial kernel (xi · x + 1)ᵈ and the RBF (Gaussian) kernel exp(–||xi – x||² / (2σ²))
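A short sketch writing two commonly used kernels out directly; the parameter values are arbitrary assumptions:

```python
import numpy as np

def polynomial_kernel(x, y, degree=3, c=1.0):
    return (x @ y + c) ** degree

def rbf_kernel(x, y, gamma=0.5):
    return np.exp(-gamma * np.sum((x - y) ** 2))

x, y = np.array([2.0, 3.0]), np.array([1.0, 4.0])
print(polynomial_kernel(x, y), rbf_kernel(x, y))
```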

Problems with this approach
- 1st problem: speed. With 10 attributes and n = 5 there are already more than 2,000 coefficients; using linear regression with attribute selection, run time is cubic in the number of attributes
- 2nd problem: overfitting. The number of coefficients is large relative to the number of training instances, so the curse of dimensionality kicks in

Sparse data
- SVM algorithms speed up dramatically if the data is sparse (i.e., many values are 0)
- Why? Because they compute lots and lots of dot products
- With sparse data, dot products can be computed very efficiently by iterating only over the non-zero values
- SVMs can therefore process sparse datasets with tens of thousands of attributes
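A sketch of this in practice: scikit-learn's SVM implementations accept SciPy sparse matrices directly, so the dot products only touch non-zero entries. The randomly generated data and labels below are purely illustrative:

```python
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
# 1,000 instances, 10,000 attributes, only ~0.1% of the entries non-zero.
X = sparse_random(1000, 10000, density=0.001, format="csr", random_state=rng)
y = rng.randint(0, 2, size=1000)

clf = LinearSVC(C=1.0).fit(X, y)   # trains quickly despite 10,000 attributes
print("non-zero values stored:", X.nnz)
```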

Applications
- Machine vision: e.g., face identification; outperforms alternative approaches (1.5% error)
- Handwritten digit recognition: on the USPS data, comparable to the best alternative (0.8% error)
- Bioinformatics: e.g., prediction of protein secondary structure
- Text classification
- The SVM technique can also be modified for numeric prediction problems