SVM – Support Vector Machines Presented By: Bella Specktor

Lecture Topics: ► Motivation for SVM ► SVM – Algorithm Description ► SVM Applications

Motivation - Learning Our task is to detect and exploit complex patterns in data. For this, we should use learning algorithms. We would like to use an algorithm that is able to generalize, but not to over-generalize. Neural Networks can be used for this task.

Linear Classification Suppose we have linearly separable data that we want to classify into 2 classes. We label the training data {(x_i, y_i)}, i = 1, …, n, with y_i ∈ {−1, +1}. Linear separation of the input space is done by the function f(x) = sign(w·x + b); our purpose is to find w and b. Any of these hyperplanes would be fine for the separation. Which one should we choose?
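The decision rule above is just a sign test against the hyperplane. A minimal sketch in Python, where the weight vector w and bias b are illustrative placeholders rather than learned values:

```python
import numpy as np

def predict(w, b, X):
    """Label each row of X as +1 or -1 according to the hyperplane w.x + b = 0."""
    return np.sign(X @ w + b)

w = np.array([2.0, -1.0])                 # hypothetical weight vector
b = 0.5                                   # hypothetical bias
X = np.array([[1.0, 1.0], [-1.0, 2.0]])   # two test points
print(predict(w, b, X))                   # -> [ 1. -1.]
```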

Perceptron Algorithm If a sample is misclassified, i.e. y_i(w·x_i + b) ≤ 0, update w ← w + η y_i x_i and b ← b + η y_i; repeat over the training set until no mistakes remain. What about the non-linear case?
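A minimal sketch of this update rule, assuming the data is linearly separable so the loop terminates (eta is the learning rate):

```python
import numpy as np

def perceptron(X, y, eta=1.0, max_epochs=100):
    """Perceptron training: returns a separating (w, b) if one is found."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:   # sample is misclassified
                w += eta * yi * xi
                b += eta * yi
                mistakes += 1
        if mistakes == 0:                        # converged on separable data
            break
    return w, b
```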

Neural Networks We can use advanced network architectures with multiple layers. But… some of them have many local minima, we need to decide how many neurons are needed, and sometimes we get many different solutions.

SVM - History ► Said to start in 1979 with Vladimir Vapnik's paper. ► Major developments throughout the 1990's: introduced in its current form in 1992 by Boser, Guyon & Vapnik. ► Centralized web site: ► Has been applied to diverse problems very successfully in recent years.

The SVM Algorithm The margin of a linear classifier is the width by which the boundary could be increased before hitting a data point. SVM chooses the Maximal Margin hyperplane, for which the distance to the closest negative example equals the distance to the closest positive example.

Why Maximum Margin? Better empirical performance. Even if we have a small error in the location of the boundary, we have the least chance of misclassification. It also avoids local minima.

VC (Vapnik-Chervonenkis) Dimensions and Structural Risk Minimization A set of points P is said to be shattered by a function class F if for any labeling of the points there exists f ∈ F that separates P perfectly. The VC dimension of a model f is the maximal cardinality of a point set that can be shattered by f. For example: a line in the plane can shatter 3 points but not 4, so the VC dimension of linear classifiers in R² is 3.

The bound on the test error of the classification model is given by the Structural Risk Minimization principle (Vapnik, 1995): with probability 1 − η, R(f) ≤ R_emp(f) + sqrt( [ h(ln(2n/h) + 1) − ln(η/4) ] / n ), where h is the VC dimension of f and n is the number of training samples. Intuitively, functions with high VC dimension represent many dichotomies of a given data set.

VC Dimensions and Structural Risk Minimization A function that minimizes the empirical risk and has low VC dimension will generalize well regardless of the dimensionality of the input space (structural risk minimization). Vapnik has shown that maximizing the margin of separation between classes is equivalent to minimizing the VC dimension.

Support Vectors are the points closest to the separating hyperplane. Those are critical points whose removal would change the solution found. The optimal hyperplane is completely defined by the support vectors.

Let x_s be an example closest to the boundary. Set |w·x_s + b| = 1; this fixes the scale of w and b. Thus, the support vectors lie on the hyperplanes w·x + b = +1 and w·x + b = −1. Notice that for all training samples y_i(w·x_i + b) ≥ 1.

The margin is 2/||w||. Thus, we will get the widest margin by minimizing ||w||, or equivalently (1/2)||w||². But how do we do it? Convert the problem to: minimize (1/2)||w||² subject to the constraints y_i(w·x_i + b) ≥ 1 for all i. For this purpose, we will switch to the dual representation and use Quadratic Programming.

A quadratic program minimizes (1/2)uᵀRu + cᵀu subject to linear constraints. In a convex problem R is positive semidefinite, and in this case the QP has a global minimizer.

Our problem is: minimize (1/2)||w||² subject to the constraints y_i(w·x_i + b) ≥ 1. Introduce Lagrange multipliers α_i ≥ 0 associated with the constraints. The solution to the primal problem is equivalent to determining the saddle point of the Lagrangian: L(w, b, α) = (1/2)||w||² − Σ_i α_i [ y_i(w·x_i + b) − 1 ].

Setting the derivatives of L with respect to w and b to zero gives w = Σ_i α_i y_i x_i and Σ_i α_i y_i = 0. Substituting back yields the dual objective W(α) = Σ_i α_i − (1/2) Σ_{i,j} α_i α_j y_i y_j (x_i·x_j), which can be optimized by quadratic programming. It is formulated in terms of α alone, but the resulting classifier still depends on w and b, which are recovered from the optimal α.
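As a sketch of how this dual QP can be handed to an off-the-shelf solver, here is a hard-margin version using the cvxopt package (assumed installed); the matrices P, q, G, h, A, b are just the standard QP ingredients built from the training data:

```python
import numpy as np
from cvxopt import matrix, solvers

def solve_dual_hard_margin(X, y):
    """Return the Lagrange multipliers alpha for linearly separable (X, y)."""
    n = X.shape[0]
    K = X @ X.T                                   # Gram matrix of dot products
    P = matrix(np.outer(y, y) * K)                # P_ij = y_i y_j <x_i, x_j>
    q = matrix(-np.ones(n))                       # maximize sum(alpha) = minimize -sum(alpha)
    G = matrix(-np.eye(n))                        # -alpha_i <= 0, i.e. alpha_i >= 0
    h = matrix(np.zeros(n))
    A = matrix(y.astype(float).reshape(1, -1))    # equality constraint sum_i alpha_i y_i = 0
    b = matrix(0.0)
    sol = solvers.qp(P, q, G, h, A, b)
    return np.ravel(sol["x"])
```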

► b can be determined from the optimal α and the complementarity (KKT) condition α_i [ y_i(w·x_i + b) − 1 ] = 0, which implies that for every sample i one of the following must hold: α_i = 0, or y_i(w·x_i + b) = 1. ► Samples with α_i > 0 are Support Vectors, so b = y_s − w·x_s for any support vector x_s. ► The solution is typically sparse: most α_i are zero.
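A small sketch of recovering the primal solution once a QP solver has produced the multipliers α (the arrays alpha, X, y are assumed to come from such a solver):

```python
import numpy as np

def recover_w_b(alpha, X, y, tol=1e-8):
    """Recover the primal (w, b) and the support-vector mask from the dual alpha."""
    w = (alpha * y) @ X                   # w = sum_i alpha_i y_i x_i
    sv = alpha > tol                      # support vectors have alpha_i > 0
    # every support vector satisfies y_i (w.x_i + b) = 1, so b = y_i - w.x_i;
    # averaging over all support vectors is numerically more stable
    b = np.mean(y[sv] - X[sv] @ w)
    return w, b, sv
```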

Test Phase: determine on which side of the decision boundary a given test pattern x lies and assign the corresponding label: f(x) = sign( Σ_i α_i y_i (x_i·x) + b ).
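In this dual form only the support vectors are needed at test time; a minimal sketch (X_sv, y_sv, alpha_sv, b are assumed to be the support vectors, their labels and multipliers, and the bias found during training):

```python
import numpy as np

def classify(x, X_sv, y_sv, alpha_sv, b):
    """Assign +1 or -1 to a test point x using only the support vectors."""
    return np.sign(np.sum(alpha_sv * y_sv * (X_sv @ x)) + b)
```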

Soft Margin Classifier In real world problems it is unlikely that an exactly separating line divides the data; we might need a curved decision boundary, and exact linear separation may not even be desirable if the data is noisy. We therefore smooth the boundary: introduce slack variables ξ_i ≥ 0 and only require that y_i(w·x_i + b) ≥ 1 − ξ_i.

Σ_i ξ_i is an upper bound on the number of training errors. Thus, in order to control the error rate, we also minimize C Σ_i ξ_i, where a larger C corresponds to assigning a higher penalty to errors. The new QP: minimize (1/2)||w||² + C Σ_i ξ_i subject to the constraints y_i(w·x_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0. In the dual, the only change is that the multipliers become box-constrained: 0 ≤ α_i ≤ C.
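To see the effect of C in practice, a short illustration using scikit-learn's SVC (assuming scikit-learn is available); the dataset and the two C values are arbitrary choices for demonstration:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# overlapping two-class data, so some training errors are unavoidable
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

loose = SVC(kernel="linear", C=0.1).fit(X, y)     # small C: wide margin, tolerates slack
strict = SVC(kernel="linear", C=100.0).fit(X, y)  # large C: narrow margin, penalizes errors
print(len(loose.support_), len(strict.support_))  # the looser model keeps more support vectors
```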

Non Linear SVM Limitations of Linear SVM: ► It does not work on data that is not linearly separable. ► Noise problem. But… the advantage is that it deals directly with vectorial data. We saw earlier that we can use Neural Networks instead, but they have many limitations. What should we do?

Let's look at the following example: points on the real line, where the two classes are interleaved and cannot be separated by a single threshold. We would like to map the samples so that they become linearly separable. If we lift them to a two dimensional space with the mapping x → (x, x²), the two classes can be separated by a straight line.
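A tiny numeric illustration of this idea (the specific points and the mapping x → (x, x²) are chosen here just for demonstration):

```python
import numpy as np

x = np.array([-3.0, -1.0, 1.0, 3.0])   # samples on the real line
y = np.array([1, -1, -1, 1])           # outer points are +1, inner points are -1

phi = np.column_stack([x, x ** 2])     # lift each sample to (x, x^2)
# in the lifted space the horizontal line x2 = 5 separates the two classes
print(np.sign(phi[:, 1] - 5) == y)     # -> [ True  True  True  True]
```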

So, a possible solution is: map the data into a richer feature space (a Hilbert space) including non-linear features, and then use a linear classifier there. But… there is a computational problem (the feature space may be very high dimensional) and there is a generalization problem (more features can mean more overfitting).

Solution: using Kernels Remember that we used the dual representation, and hence the data appears only in the form of dot products. A kernel is a function that returns the value of the dot product between the images of its two arguments: K(x, z) = φ(x)·φ(z). Thus, we can replace dot products with kernels.

Now, rather than computing inner products of the new, larger vectors explicitly, we compute the dot product of the data after the non-linear mapping directly through K. For such a kernel we only need to use K in the training algorithm, and never need to know explicitly what φ is. The Kernel Matrix: K_ij = K(x_i, x_j).
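As a sketch, the kernel (Gram) matrix that the dual QP consumes can be built like this; the RBF kernel and the value of gamma are used here only as an example choice:

```python
import numpy as np

def rbf_kernel_matrix(X, gamma=0.5):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
    return np.exp(-gamma * sq_dists)

X = np.random.randn(5, 3)
K = rbf_kernel_matrix(X)
print(K.shape)   # (5, 5); symmetric and positive semidefinite
```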

Mercer’s Condition Which functions can serve as kernels? Every (semi) positive definite symmetric function is a kernel, i.e. for such a function there exists a mapping φ such that it is possible to write K(x, z) = φ(x)·φ(z).

Different Kernel Functions 1. Polynomial: K(x, z) = (x·z + 1)^p, where p is the degree of the polynomial. 2. Gaussian Radial Basis Function: K(x, z) = exp(−||x − z||² / (2σ²)). 3. Two layer sigmoidal NN: K(x, z) = tanh(κ(x·z) − δ).
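Minimal implementations of these three kernels; the parameters p, sigma, kappa and delta are free choices left to the user:

```python
import numpy as np

def polynomial_kernel(x, z, p=3):
    return (np.dot(x, z) + 1.0) ** p

def rbf_kernel(x, z, sigma=1.0):
    return np.exp(-np.linalg.norm(x - z) ** 2 / (2.0 * sigma ** 2))

def sigmoid_kernel(x, z, kappa=0.1, delta=1.0):
    # mimics a two-layer sigmoidal network; satisfies Mercer's condition
    # only for some choices of kappa and delta
    return np.tanh(kappa * np.dot(x, z) - delta)
```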

Multi-Class Classification Two basic strategies: 1. One Versus All: Q SVMs are trained, and each of the SVMs separates a single class from all the others. Classification is done by a "winner takes all" strategy, in which the classifier with the highest output function assigns the class.

Multi-Class Classification 2. Pairwise: Q(Q−1)/2 machines are trained, each SVM separating a pair of classes. The classification is done by a "max-wins" voting strategy, in which the class with the most votes determines the instance's classification. ► The first strategy is preferable in terms of training complexity. ► Experiments didn't show big performance differences between the two.
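A short illustration of the two strategies with scikit-learn (assuming it is available); the iris dataset is only a convenient three-class stand-in:

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)                           # 3 classes, so Q = 3

ova = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)   # Q one-versus-all SVMs
ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)    # Q(Q-1)/2 pairwise SVMs
print(ova.score(X, y), ovo.score(X, y))
```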

Summary - SVM Algorithm for Pattern Classification 1. Start with data x_1, …, x_n living in a feature space of dimension d. 2. Implicitly define the feature space by choosing a kernel K. 3. Find the largest-margin linear discriminant function in the higher dimensional space by using a quadratic programming package to solve: maximize Σ_i α_i − (1/2) Σ_{i,j} α_i α_j y_i y_j K(x_i, x_j) subject to 0 ≤ α_i ≤ C and Σ_i α_i y_i = 0.
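A compact end-to-end sketch of this recipe using scikit-learn's SVC (assumed available), which solves the dual QP internally; the moons dataset and the hyperparameters are arbitrary illustrative choices:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)       # step 1: the data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma=1.0)   # step 2: choose a kernel (and C)
clf.fit(X_tr, y_tr)                         # step 3: the dual QP is solved internally
print("test accuracy:", clf.score(X_te, y_te))
```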

Strengths and Weaknesses of SVM Strengths: 1. Training is relatively easy. 2. No local minima (unlike in NN). 3. Scales relatively well to high dimensional data. 4. The trade-off between complexity and error can be controlled explicitly. Major weakness: the need for a good kernel function.

What Is SVM Useful For? Pattern Recognition: - Object Recognition - Handwriting Recognition - Speaker Identification - Text Categorization - Face Recognition. Regression Estimation.

Face Recognition with SVM - Global Versus Component Approach Bernd Heisele, Purdy Ho & Tomaso Poggio Global Approach – basic algorithm: ► A one-versus-all strategy was used. ► One linear SVM for each person in the dataset. ► Each SVM was trained to distinguish between all images of a single person and all other images.

Face Recognition with SVM - Global Versus Component Approach

Face Recognition with SVM - Global Versus Component Approach ► The gray values of the face picture were converted to a feature vector x. ► Let d_j(x) be the signed distance from x to the hyperplane of the j-th SVM. ► Given a set of q people (a set of q SVMs), the class label y of a face pattern x is computed as y = argmax over j = 1..q of d_j(x).
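A sketch of this one-versus-all decision rule; the per-person (w, b) pairs below are hypothetical stand-ins for the trained linear SVMs, not values from the paper:

```python
import numpy as np

def classify_face(x, svms):
    """svms: list of (w, b) pairs, one linear SVM per person; x: feature vector."""
    # signed distance from x to each person's hyperplane
    distances = [(np.dot(w, x) + b) / np.linalg.norm(w) for w, b in svms]
    return int(np.argmax(distances))   # index of the recognized person
```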

Face Recognition with SVM - Global Versus Component Approach Global Approach – improved algorithm: ► A variation of this algorithm used a second-degree polynomial SVM (an SVM with a second-degree polynomial kernel).

Face Recognition with SVM - Global Versus Component Approach Component-based algorithm: ► In the detection phase, facial components were detected.

Face Recognition with SVM - Global Versus Component Approach ► Each of the components was normalized in size, and their gray levels were combined into a single feature vector. ► Then, the final detection was made by combining the results of the component classifiers. ► Again, one-versus-all linear SVMs were used.

Face Recognition with SVM - Global Versus Component Approach - Results The Component-Based algorithm showed much better results than the Global approach.

References ► B. Heisele, P. Ho & T. Poggio. Face Recognition with Support Vector Machines. Computer Vision and Image Understanding, vol. 91, no. 1-2. ► C. J. C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, vol. 2(1). ► J. P. Lewis. A Short SVM (Support Vector Machines) Tutorial.

► Prof. Bebis. Support Vector Machines. Pattern Recognition Course, Spring 2006, Lecture Slides. ► Prof. A. W. Moore. Support Vector Machines Lecture Slides. ► R. Osadchy. Support Vector Machines Lecture Slides. ► YouTube – Facial Expressions Recognition.