CSC 4510 Support Vector Machines 2 (SVMs). Paula Matuszek, Spring 2012. © Paula Matuszek 2012.

1 CSC 4510, Spring 2012. © Paula Matuszek
CSC 4510 Support Vector Machines 2 (SVMs)

2 CSC 4510, Spring 2012. © Paula Matuszek
So What's an SVM?
A Support Vector Machine (SVM) is a classifier.
– It uses features of instances to decide which class each instance belongs to.
It is a supervised machine-learning classifier.
– Training cases are used to calculate parameters for a model, which can then be applied to new instances to make a decision.
It is a binary classifier.
– It distinguishes between two classes.
For the squirrel vs. bird example, Grandis used size, a histogram of pixels, and a measure of texture as the features.
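To make the workflow concrete, here is a minimal sketch of training a binary SVM classifier. scikit-learn and the toy feature values are assumptions for illustration (the course itself uses Orange), but the pattern is the same: supervised training cases in, a binary decision for new instances out.

```python
# A minimal sketch of a binary SVM classifier.
# scikit-learn and these feature values are assumptions for illustration.
from sklearn.svm import SVC

# Hypothetical squirrel-vs-bird features: [size, pixel-histogram score, texture]
X_train = [[0.9, 0.3, 0.7],   # squirrel
           [0.8, 0.4, 0.6],   # squirrel
           [0.2, 0.8, 0.1],   # bird
           [0.3, 0.9, 0.2]]   # bird
y_train = ["squirrel", "squirrel", "bird", "bird"]

clf = SVC(kernel="linear")    # supervised: parameters fit from training cases
clf.fit(X_train, y_train)

print(clf.predict([[0.85, 0.35, 0.65]]))  # decide the class of a new instance
```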

3 CSC 4510, Spring 2012. © Paula Matuszek
Basic Idea Underlying SVMs
Find a line, a plane, or a hyperplane that separates our classes cleanly.
– This is the same concept as we have seen in regression.
Do so by finding the greatest margin separating the classes.
– This is not the same concept as we have seen in regression.
What does that mean?

4 CSC 4510, Spring 2012. © Paula Matuszek
Soft Margins
Intuitively, it still looks like we can make a decent separation here.
– We can't make a clean margin,
– but we can almost do so, if we allow some errors.
We introduce slack variables, which measure the degree of misclassification.
A soft margin is one which lets us make some errors in order to get a wider margin.
There is a tradeoff between a wide margin and classification errors.
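For concreteness, the standard soft-margin formulation (a textbook statement, not spelled out on the slide): with slack variables $\xi_i \ge 0$ and an error cost $C$, we solve

$\min_{w,b,\xi}\; \tfrac{1}{2}\lVert w \rVert^2 + C \sum_i \xi_i \quad \text{subject to} \quad y_i(w \cdot x_i - b) \ge 1 - \xi_i,\; \xi_i \ge 0.$

A larger $C$ penalizes errors more heavily, which is exactly the tradeoff above: high $C$ means a closer fit to the data and a narrower margin.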

5 CSC 4510, Spring 2012. © Paula Matuszek
Non-Linearly-Separable Data
Suppose we can't do a good linear separation of our data?
As with regression, allowing non-linearity will give us much better modeling of many data sets.
In SVMs, we do this by using a kernel.
A kernel is a function which maps our data into a higher-dimensional feature space where we can find a separating hyperplane.

6 CSC 4510, Spring 2012. © Paula Matuszek
Kernels for SVMs
As we saw in Orange, we always specify a kernel for an SVM.
Linear is simplest, but seldom a good match to the data.
Other common ones are
– polynomial
– RBF (Gaussian Radial Basis Function)
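A short sketch of the three kernel choices the slide names. scikit-learn and the toy data are assumptions for illustration; the Orange widget exposes the same options.

```python
# The same data with the three kernel choices named on the slide.
from sklearn.svm import SVC

X = [[0, 0], [1, 1], [1, 0], [0, 1]]
y = [0, 0, 1, 1]

for params in ({"kernel": "linear"},                   # simplest
               {"kernel": "poly", "degree": 3},        # polynomial
               {"kernel": "rbf", "gamma": "scale"}):   # Gaussian RBF
    clf = SVC(**params).fit(X, y)
    print(params, clf.predict([[0.9, 0.2]]))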

Borrowed heavily from Andrew Moore's tutorials. (Figure-only slide.)

Borrowed heavily from Andrew Moore's tutorials. (Figure-only slide.)

9 CSC 4510, Spring 2012. © Paula Matuszek
What If We Want Three Classes?
Suppose our task involves more than two classes, such as the Iris data set?
Reduce the multiple-class problem to multiple binary-class problems (see the sketch below):
– one-versus-all: N-1 classifiers, winner takes all
– one-versus-one: N(N-1)/2 classifiers, max-wins voting
– Directed Acyclic Graph (DAGSVM)
Orange will do this automatically if there are more than two classes.
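A sketch of the first two reductions on the Iris data, which Orange performs automatically. scikit-learn is an assumption used here for illustration.

```python
# Reducing a 3-class problem to binary SVMs.
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# one-versus-all: one binary classifier per class, winner takes all
ova = OneVsRestClassifier(SVC(kernel="rbf", gamma="scale")).fit(X, y)

# one-versus-one: N(N-1)/2 classifiers, max-wins voting
ovo = OneVsOneClassifier(SVC(kernel="rbf", gamma="scale")).fit(X, y)

print(ova.predict(X[:3]), ovo.predict(X[:3]))
```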

Borrowed heavily from Andrew Moore's tutorials.
Linear Classifiers
(Figure: a two-dimensional data set with points labeled +1 and -1.)
A linear classifier maps an input $x$ to an estimated label $y_{est}$: $f(x, w, b) = \mathrm{sign}(w \cdot x - b)$.
How would you classify this data?

Borrowed heavily from Andrew Moore's tutorials.
Maximum Margin
(Figure: the same labeled points with the widest-margin separating line.)
$f(x, w, b) = \mathrm{sign}(w \cdot x - b)$
The maximum margin linear classifier is the linear classifier with the, um, maximum margin.
This is the simplest kind of SVM (called an LSVM, a Linear SVM).
Support vectors are those data points that the margin pushes up against.

Borrowed heavily from Andrew Moore's tutorials.
Estimate the Margin
(Figure: points labeled +1 and -1, and the line $w \cdot x + b = 0$.)
What is the distance expression for a point $x$ to the line $w \cdot x + b = 0$?
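The slide leaves the answer to the figure; for completeness, the standard result: the distance from a point $x$ to the hyperplane $w \cdot x + b = 0$ is

$d(x) = \dfrac{|w \cdot x + b|}{\lVert w \rVert}.$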

Borrowed heavily from Andrew Moore's tutorials.
Estimate the Margin
(Figure: the margin shown as the band around the line $w \cdot x + b = 0$.)
What is the expression for the margin?

Borrowed heavily from Andrew Moore's tutorials.
Maximize Margin
(Figure: the separating line $w \cdot x + b = 0$ with its margin maximized.)
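Filling in the math these figures carried: the margin is the distance to the closest point, $\min_i |w \cdot x_i + b| \,/\, \lVert w \rVert$. If we rescale $w$ and $b$ so that $y_i(w \cdot x_i + b) \ge 1$, with equality for the closest points, the margin is $2 / \lVert w \rVert$, and maximizing it is equivalent to

$\min_{w,b}\; \tfrac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_i(w \cdot x_i + b) \ge 1 \;\text{for all } i.$

This is the quadratic program the next slide refers to.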

15 CSC 4510, Spring 2012. © Paula Matuszek
How Do We Solve It?
Gradient descent? Search the space of w and b for the largest margin that classifies all points correctly.
Better: our equation is in a form which can be solved by quadratic programming,
– a well-understood set of algorithms for optimizing a function,
– which uses only the dot products of pairs of points,
– and whose weights will be 0 except for the support vectors.
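The "uses only dot products" point is visible in the standard dual form of the quadratic program (a textbook statement, not on the slide):

$\max_{\alpha}\; \sum_i \alpha_i - \tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \quad \text{subject to} \quad \alpha_i \ge 0,\; \sum_i \alpha_i y_i = 0.$

The data appear only through the dot products $x_i \cdot x_j$, and the weights $\alpha_i$ are nonzero only for the support vectors. This is what makes the kernel trick (slide 21) possible.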

16 CSC 4510, Spring 2012. © Paula Matuszek
Back to Kernels
Data which are not linearly separable can generally be separated cleanly if we transform them into a higher-dimensional space.
So we want a new function over the data points that lets us transform them.

Borrowed heavily from Andrew Moore's tutorials.
Hard 1-dimensional dataset
(Figure: labeled points on a line centered at x = 0; no single threshold separates the classes.)
What can be done about this?

Borrowed heavily from Andrew Moore's tutorials.
Hard 1-dimensional dataset
(Figure: the same data after mapping each point into a higher dimension, where it becomes linearly separable.)

Borrowed heavily from Andrew Moore's tutorials.
Harder 1-dimensional dataset
(Figure: an interleaved one-dimensional data set around x = 0.)
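A small sketch of the idea these figure slides carry: 1-D data that no threshold separates becomes separable after mapping into a higher dimension. The particular quadratic feature map and data values here are illustrative assumptions, not taken from the slides.

```python
# 1-D data that no single threshold separates...
import numpy as np

x = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
y = np.array([ 1,    1,   -1,  -1,   1,   1 ])  # outer points vs. inner points

# ...becomes separable after mapping x -> (x, x^2):
Z = np.column_stack([x, x ** 2])

# In the new 2-D space, the horizontal line x^2 = 2.5 separates the classes.
print(Z[:, 1] > 2.5)   # matches (y == 1)
```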

Borrowed heavily from Andrew Moore's tutorials.
SVM Kernel Functions
$K(a, b) = (a \cdot b + 1)^d$ is an example of an SVM kernel function.
Beyond polynomials there are other very high dimensional basis functions that can be made practical by finding the right kernel function.
Most common: the Radial-Basis-style kernel function, $K(a, b) = \exp\!\left(-\dfrac{\lVert a - b \rVert^2}{2\sigma^2}\right)$.

21 CSC 4510, Spring 2012. © Paula Matuszek
Kernel Trick
We don't actually have to compute the complete higher-order function.
In the QP equation we only use the dot product,
so we replace it with a kernel function.
This means we can work with much higher dimensions without performance becoming hopeless.
The kernel trick in SVMs refers to all of this: using a kernel function instead of the dot product to give us separation of non-linear data without an impossible performance cost.
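A small numeric sketch of the trick (the explicit feature map and values are illustrative assumptions): the polynomial kernel computes the same number as a dot product in the higher-dimensional space, without ever constructing that space.

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([3.0, 0.5])

def phi(v):
    """Explicit degree-2 feature map for 2-D input (6 dimensions)."""
    x1, x2 = v
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

explicit = phi(a) @ phi(b)           # dot product in the 6-D feature space
kernel   = (a @ b + 1) ** 2          # K(a, b) = (a.b + 1)^d with d = 2
print(np.isclose(explicit, kernel))  # True: same value, no 6-D vectors needed
```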

22 CSC 4510, Spring 2012. © Paula Matuszek
Back to Orange

23 CSC 4510, Spring 2012. © Paula Matuszek
SVM Type
C-SVM: This is the soft-margin SVM we discussed, where C is the cost of an error. The higher the value of C, the closer the fit to the data and the narrower the margin.
ν-SVM: An alternate approach to noisy data, in which ν controls the proportion of support vectors which can be either in the margin or misclassified. The larger we make ν, the more errors we can make and the larger the margin can be.
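In scikit-learn terms (an assumption used for illustration; the Orange widget exposes the same choice), the two types look like this:

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC, NuSVC

X, y = load_iris(return_X_y=True)

# C-SVM: C is the cost of an error; higher C = closer fit, narrower margin.
c_svm = SVC(C=1.0, kernel="rbf").fit(X, y)

# nu-SVM: nu bounds the fraction of margin errors / support vectors;
# larger nu = more errors allowed, wider margin.
nu_svm = NuSVC(nu=0.1, kernel="rbf").fit(X, y)
```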

24 CSC 4510, Spring 2012. © Paula Matuszek
Kernel Parameters
For a linear "kernel" we specify only cost, or complexity.
For more complex kernels there's more. Some of them:
– For polynomial, d is the degree, and controls how complex we allow the match to be.
– For RBF, g is the width, which controls how steep the transformation curve is.

25 CSC 4510, Spring 2012. © Paula Matuszek
Setting the SVM Parameters
SVMs are quite sensitive to their parameters.
There are no good a priori rules for setting them. The usual recommendation is "try some and see what works in your domain."
The widget has an "automatic parameter search"; using it is generally a good idea.
Generally, a C-SVM with an RBF kernel, with data normalized and parameters set automatically, will give good results.

26 CSC 4510, Spring 2012. © Paula Matuszek
How We Automate: Grid Search
To automate the parameter selection, we basically try sets of parameters, increasing the value exponentially.
– For instance, we might try C at 1, 2, 4, 8, 16, 32.
– Calculate accuracy for each value and choose the best.
Can do two passes, coarse then fine:
– C = 1, for the first pass.
Can do cross-validation, learning repeatedly on subsets of the data.
A sketch of this search appears below.
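A sketch of an exponential grid search with cross-validation, as the slide describes. scikit-learn's GridSearchCV is an assumption here; Orange's automatic parameter search does the same kind of thing.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_grid = {"C": [1, 2, 4, 8, 16, 32],    # exponentially increasing values
              "gamma": [0.01, 0.1, 1.0]}

# cv=5: accuracy for each parameter set is estimated by learning
# repeatedly on subsets of the data (cross-validation).
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```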

27 CSC 4510, Spring 2012. © Paula Matuszek
What Do We Do With SVMs?
Popular because successful in a wide variety of domains. Some examples:
– Medicine: detecting breast cancer. (Y. Ireaneus Anna Rejani and S. Thamarai Selvi, "Early Detection of Breast Cancer Using SVM Classifier Technique," International Journal on Computer Science and Engineering, Vol. 1(3), 2009.)
– Natural language processing: text classification.
– Psychology: decoding cognitive states from fMRI data. (Maria Grazia Di Bono and Marco Zorzi, "Decoding Cognitive States from fMRI Data Using Support Vector Regression," PsychNology Journal, 2008, Vol. 6, No. 2, 189-201.)

28 CSC 4510, Spring 2012. © Paula Matuszek
Summary
SVMs are a form of supervised classifier.
The basic SVM is binary and linear, but there are non-linear and multi-class extensions.
"One key insight and one neat trick" [1]
– key insight: the maximum margin separator
– neat trick: the kernel trick
A good method to try first if you have no knowledge about the domain.
Applicable in a wide variety of domains.
[1] Russell and Norvig, Artificial Intelligence: A Modern Approach, third edition, 2010, p. 744.