
1 SVM – Support Vector Machines Presented By: Bella Specktor

2 Lecture Topics: ► Motivation for SVM ► SVM – Algorithm Description ► SVM Applications

3 Motivation - Learning Our task is to detect and exploit complex patterns in data. For this, we use learning algorithms. We would like an algorithm that is able to generalize, but not over-generalize. Neural Networks can be used for this task.

4 Linear Classification Suppose we have linearly separable data that we want to classify into 2 classes. We label the training data {(x_i, y_i)}, i = 1..n, with y_i ∈ {-1, +1}. Linear separation of the input space is done by the function f(x) = sign(w·x + b), so our purpose is to find w and b. Any of these hyperplanes would be fine for the separation. Which one to choose?
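A minimal sketch of such a linear decision function on toy 2-D data; the weights shown are just one of the many possible separating hyperplanes (the data and values are illustrative, not from the slides):

```python
import numpy as np

def linear_classify(X, w, b):
    """Label each row of X with sign(w . x + b)."""
    return np.sign(X @ w + b)

# Toy 2-D data: two points per class, labels in {-1, +1}
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -2.0]])
w = np.array([1.0, 1.0])   # one of infinitely many separating hyperplanes
b = 0.0
print(linear_classify(X, w, b))   # -> [ 1.  1. -1. -1.]
```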

5 Perceptron Algorithm Iterate over the training samples; if a sample is misclassified, i.e. y_i(w·x_i + b) ≤ 0, update w ← w + η y_i x_i and b ← b + η y_i. What about the non-linear case?
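A minimal sketch of the classic perceptron update rule (the slide's own pseudocode is not in the transcript, so this is the standard rule, not necessarily the slide's exact version):

```python
import numpy as np

def perceptron(X, y, lr=1.0, epochs=100):
    """Classic perceptron: on a mistake (y_i * (w.x_i + b) <= 0), nudge w toward y_i * x_i."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:      # misclassified (or exactly on the boundary)
                w += lr * yi * xi
                b += lr * yi
                errors += 1
        if errors == 0:                     # converged on separable data
            break
    return w, b
```

On linearly separable data this converges, but it stops at whatever separating hyperplane it happens to reach, which motivates the maximal-margin criterion discussed next.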

6 Neural Networks We can use advanced network architectures with multiple layers. But… Some of them have many local minima. We need to decide how many neurons are needed. Sometimes we get many different solutions.

7 SVM - History  Said to start in 1979 with Vladimir Vapnik’s paper.  Major developments throughout the 1990s: the current form was introduced in 1992 by Boser, Guyon & Vapnik.  Centralized web site: www.kernel-machines.org  Has been applied to diverse problems very successfully in the last 10-15 years.

8 The SVM Algorithm The margin of a linear classifier is the width by which the boundary could be increased before hitting a data point. SVM chooses the maximal margin, where the distance to the closest negative example equals the distance to the closest positive example.

9 Why Maximum Margin? Better empirical performance. Even if there is a small error in the location of the boundary, we have the least chance of misclassification. It also avoids local minima.

10 VC (Vapnik-Chervonenkis) Dimension and Structural Risk Minimization A set of points P is said to be shattered by a function class F if for every labeling of the points there exists f ∈ F that separates them perfectly. The VC dimension of a model class F is the maximal cardinality of a point set that can be shattered by F. For example, a line in the plane shatters any 3 points in general position but no set of 4 points, so its VC dimension is 3.

11 The bound on the test error of a classification model is given, with probability 1 - η, by R(f) ≤ R_emp(f) + sqrt( (h(ln(2n/h) + 1) - ln(η/4)) / n ), where h is the VC dimension and n the number of training samples (Vapnik, 1995, structural risk minimization principle). Intuitively, functions with high VC dimension represent many dichotomies for a given data set.

12 VC Dimension and Structural Risk Minimization A function that minimizes the empirical risk and has low VC dimension will generalize well regardless of the dimensionality of the input space (structural risk minimization). Vapnik has shown that maximizing the margin of separation between classes is equivalent to minimizing the VC dimension.

13 Support Vectors Support vectors are the points closest to the separating hyperplane. They are the critical points whose removal would change the solution found. The optimal hyperplane is completely defined by the support vectors.

14 Let x_s be an example closest to the boundary. Rescale w and b so that |w·x_s + b| = 1. Thus, the support vectors lie in the hyperplanes w·x + b = +1 and w·x + b = -1. Notice that for all training samples y_i(w·x_i + b) ≥ 1.


16 The margin is 2/||w||. Thus, we get the widest margin by minimizing ||w||. But how to do it? Convert the problem to: minimize (1/2)||w||^2 subject to the constraints y_i(w·x_i + b) ≥ 1 for all i. For this purpose, we will use quadratic programming and later switch to the dual representation.
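A minimal sketch of solving this primal problem directly with a general-purpose constrained optimizer (scipy's SLSQP is my choice here, not something from the slides), on illustrative toy data:

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data (labels in {-1, +1})
X = np.array([[2.0, 2.0], [2.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

d = X.shape[1]
# Variables z = [w_1, ..., w_d, b]
objective = lambda z: 0.5 * z[:d] @ z[:d]                 # (1/2) ||w||^2
constraints = [{"type": "ineq",
                "fun": lambda z, i=i: y[i] * (X[i] @ z[:d] + z[d]) - 1.0}
               for i in range(len(y))]                    # y_i (w.x_i + b) >= 1

res = minimize(objective, x0=np.zeros(d + 1), constraints=constraints, method="SLSQP")
w, b = res.x[:d], res.x[d]
print("w =", w, "b =", b, "margin =", 2.0 / np.linalg.norm(w))
```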

17 A quadratic program minimizes (1/2)x^T R x + d^T x subject to linear constraints. In a convex problem, R is positive semidefinite; in this case the QP has a global minimizer.

18 Our problem is: minimize (1/2)||w||^2 subject to the constraints y_i(w·x_i + b) ≥ 1. Introduce Lagrange multipliers α_i ≥ 0 associated with the constraints. The solution to the primal problem is equivalent to determining the saddle point of the Lagrangian: L(w, b, α) = (1/2)||w||^2 - Σ_i α_i [y_i(w·x_i + b) - 1].

19 Setting the derivatives of L with respect to w and b to zero gives w = Σ_i α_i y_i x_i and Σ_i α_i y_i = 0. Substituting back yields the dual objective W(α) = Σ_i α_i - (1/2) Σ_i Σ_j α_i α_j y_i y_j (x_i·x_j), which can be optimized by quadratic programming. It is formulated in terms of α alone, but the solution still determines w and b.

20 The KKT complementarity condition α_i [y_i(w·x_i + b) - 1] = 0 implies that for every sample i one of the following must hold: either α_i = 0, or y_i(w·x_i + b) = 1. Samples with α_i > 0 are the support vectors. b can be determined from the optimal w and any support vector x_s: b = y_s - w·x_s. The solution is sparse: most α_i are zero.
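A minimal sketch of this dual route on the same style of toy data, again using scipy as an assumed solver (the slides do not name one): solve for α, then recover w, b, and the support vectors from the KKT conditions:

```python
import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [2.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)

G = (y[:, None] * X) @ (y[:, None] * X).T        # G_ij = y_i y_j (x_i . x_j)
neg_dual = lambda a: 0.5 * a @ G @ a - a.sum()   # minimize the negative of W(alpha)

res = minimize(neg_dual, x0=np.zeros(n), method="SLSQP",
               bounds=[(0.0, None)] * n,                              # alpha_i >= 0
               constraints=[{"type": "eq", "fun": lambda a: a @ y}])  # sum_i alpha_i y_i = 0
alpha = res.x

w = (alpha * y) @ X                    # w = sum_i alpha_i y_i x_i
sv = alpha > 1e-6                      # support vectors: alpha_i > 0
b = np.mean(y[sv] - X[sv] @ w)         # b from y_s (w.x_s + b) = 1, averaged over support vectors
print("alpha =", alpha.round(3), "support vectors:", np.where(sv)[0], "b =", b)
```

On separable toy data like this only a few α_i come out non-zero, which is the sparsity mentioned above.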

21 Test phase: determine on which side of the decision boundary a given test pattern x lies and assign the corresponding label: f(x) = sign(w·x + b) = sign(Σ_i α_i y_i (x_i·x) + b).
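Continuing the sketch above (reusing its alpha, y, X, b, all of which are illustrative names), the test phase needs only dot products with the training points:

```python
import numpy as np

def predict(x_new, alpha, y, X, b):
    """Classify a new point using only dot products with the training set."""
    return np.sign(np.sum(alpha * y * (X @ x_new)) + b)

print(predict(np.array([3.0, 3.0]), alpha, y, X, b))   # expected: +1
```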

22 Soft Margin Classifier In real-world problems it is not likely that an exact separating line divides the data; we might need a curved decision boundary. Exact linear separation may not even be desirable if the data has noise in it. To smooth the boundary, we introduce slack variables ξ_i ≥ 0 and relax the constraints: we want that y_i(w·x_i + b) ≥ 1 - ξ_i.

23 Σ_i ξ_i is an upper bound on the number of training errors. Thus, in order to control the error rate, we also minimize this sum, where a larger C corresponds to assigning a higher penalty to errors. The new QP: minimize (1/2)||w||^2 + C Σ_i ξ_i subject to the constraints y_i(w·x_i + b) ≥ 1 - ξ_i and ξ_i ≥ 0. In the dual, the only change is that the multipliers become bounded: 0 ≤ α_i ≤ C.
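A hedged illustration of the role of C using scikit-learn's SVC (a library the slides do not mention); the toy data, including one deliberately noisy positive point, and the C values are made up for illustration:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [2.0, 3.0], [2.5, 0.5],    # third point is a "noisy" positive
              [0.0, 0.0], [-1.0, 0.0], [0.5, 1.0]])
y = np.array([1, 1, 1, -1, -1, -1])

for C in (0.1, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C:<6} #support vectors={clf.n_support_.sum()}  w={clf.coef_[0].round(2)}")
```

Small C tolerates margin violations (a wider, smoother margin); large C penalizes every error heavily and pushes the boundary toward the noisy point.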

24 Non-Linear SVM Limitations of the linear SVM:  It doesn’t work on data that is not linearly separable.  It is sensitive to noise. But… the advantage is that it deals with vectorial data. We saw earlier that we could use Neural Networks, but they have many limitations. What should we do?

25 Let’s look at the following example: one-dimensional samples that are not linearly separable. We would like to map the samples so that they become linearly separable. If we lift them to a two-dimensional space, for instance with the mapping φ(x) = (x, x^2), we get linearly separable data.
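A small numerical sketch of this lifting idea; the mapping φ(x) = (x, x²), the labels, and the threshold are assumptions chosen for illustration (the slide's exact figure is not in the transcript):

```python
import numpy as np

x = np.array([-3.0, -2.0, 2.0, 3.0])     # 1-D samples
y = np.array([1, -1, -1, 1])             # +1 for |x| large, -1 for |x| small: not separable on the line

phi = np.column_stack([x, x ** 2])       # lift to 2-D: phi(x) = (x, x^2)
w, b = np.array([0.0, 1.0]), -6.5        # in the lifted space the line x^2 = 6.5 separates the classes
print(np.sign(phi @ w + b))              # -> [ 1. -1. -1.  1.] : now linearly separable
```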

26 So, a possible solution is: map the data into a richer feature space (a Hilbert space) including non-linear features, and then use a linear classifier there. But…  There is a computational problem (the feature space may be very high-dimensional).  There is a generalization problem.

27 Solution: Using Kernels Remember that we used the dual representation, and hence the data appears only in the form of dot products. A kernel is a function that returns the value of the dot product between the images of its two arguments: K(x, z) = φ(x)·φ(z). Thus, we can replace the dot products with kernels.

28 Now, rather than computing inner products of the new, larger vectors, the kernel directly gives the dot product of the data after the non-linear mapping. We only need to use K in the training algorithm, and never need to explicitly know what φ is. The kernel (Gram) matrix is K_ij = K(x_i, x_j).
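A short sketch showing that a kernel really does equal a dot product after a mapping, for the homogeneous degree-2 polynomial kernel (my choice of example, not the slide's; random data is used just for the check):

```python
import numpy as np

X = np.random.default_rng(0).normal(size=(5, 2))

def poly2_kernel(A, B):
    """Homogeneous degree-2 polynomial kernel K(x, z) = (x . z)^2."""
    return (A @ B.T) ** 2

def phi(A):
    """Explicit feature map for K above: phi(x) = (x1^2, sqrt(2) x1 x2, x2^2)."""
    return np.column_stack([A[:, 0] ** 2, np.sqrt(2) * A[:, 0] * A[:, 1], A[:, 1] ** 2])

K = poly2_kernel(X, X)                      # the kernel (Gram) matrix K_ij = K(x_i, x_j)
print(np.allclose(K, phi(X) @ phi(X).T))    # True: same numbers, without ever forming phi
```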


30 Mercer’s Condition Which functions can serve as kernels? Every (semi-)positive definite symmetric function is a kernel, i.e. there exists a mapping φ such that it is possible to write K(x, z) = φ(x)·φ(z).
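One practical way to sanity-check Mercer's condition on a finite sample is to look at the eigenvalues of the kernel matrix; a sketch, with an assumed Gaussian RBF kernel and random data:

```python
import numpy as np

X = np.random.default_rng(1).normal(size=(20, 3))

def rbf_kernel(A, B, sigma=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)   # squared distances
    return np.exp(-sq / (2 * sigma ** 2))

K = rbf_kernel(X, X)
eigvals = np.linalg.eigvalsh(K)                 # symmetric matrix -> real eigenvalues
print("smallest eigenvalue:", eigvals.min())    # >= 0 (up to round-off) for a valid Mercer kernel
```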

31 Different Kernel Functions 1. Polynomial: K(x, z) = (x·z + 1)^p, where p is the degree of the polynomial. 2. Gaussian Radial Basis Function: K(x, z) = exp(-||x - z||^2 / (2σ^2)). 3. Two-layer sigmoidal NN: K(x, z) = tanh(κ(x·z) - δ).
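A sketch of these three kernels as plain functions; the exact parameterizations (the +1 offset, σ, κ, δ) are the standard textbook ones and are assumed here, since the slide's formulas are not in the transcript:

```python
import numpy as np

def polynomial_kernel(x, z, p=2):
    return (x @ z + 1.0) ** p                               # degree-p polynomial

def rbf_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))  # Gaussian RBF

def sigmoid_kernel(x, z, kappa=1.0, delta=1.0):
    return np.tanh(kappa * (x @ z) - delta)                  # two-layer sigmoidal NN kernel
                                                             # (a valid Mercer kernel only for some kappa, delta)
```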

32 Multi-Class Classification Two basic strategies: 1. One-versus-all: Q SVMs are trained, and each of the SVMs separates a single class from all the others. Classification is done by a “winner-takes-all” strategy, in which the classifier with the highest output function assigns the class.

33 Multi-Class Classification 2. Pairwise: Q(Q-1)/2 machines are trained; each SVM separates a pair of classes. Classification is done by a “max-wins” voting strategy, in which the class with the most votes determines the instance’s classification.  The first strategy is preferable in terms of training complexity.  Experiments didn’t show big performance differences between the two.
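A sketch of the two strategies using scikit-learn's meta-estimators (an assumption; the slides name no library), on the 10-class digits dataset chosen purely for illustration:

```python
from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)                          # Q = 10 classes

ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)    # Q machines (one-versus-all)
ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)     # Q(Q-1)/2 machines (pairwise)
print(len(ovr.estimators_), len(ovo.estimators_))            # -> 10 45
```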

34 Summary - SVM Algorithm for Pattern Classification 1. Start with data x_1,…,x_n that lives in an input space of dimension d. 2. Implicitly define a richer feature space by choosing a kernel K. 3. Find the largest-margin linear discriminant function in the higher-dimensional space by using a quadratic programming package to solve: maximize Σ_i α_i - (1/2) Σ_i Σ_j α_i α_j y_i y_j K(x_i, x_j) subject to 0 ≤ α_i ≤ C and Σ_i α_i y_i = 0.
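An end-to-end sketch of the summarized algorithm, letting scikit-learn's SVC act as the QP solver (an assumed tool, not the slides' own) on a toy dataset that is not linearly separable in the input space:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not separable by any line in the input space
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma=2.0).fit(X, y)   # the kernel choice implicitly defines the feature space
print("training accuracy:", clf.score(X, y))
print("support vectors:", clf.n_support_.sum(), "of", len(y), "samples")
```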

35 Strengths and Weaknesses of SVM Strengths: 1. Training is relatively easy. 2. No local minima (unlike in NN). 3. Scales relatively well to high-dimensional data. 4. The trade-off between complexity and error can be controlled explicitly. Major weakness: the need for a good kernel function.

36 What is SVM useful for? Pattern Recognition: - Object Recognition - Handwriting Recognition - Speaker Identification - Text Categorization - Face Recognition Regression Estimation

37 Face Recognition with SVM - Global Versus Component Approach Bernd Heisele, Purdy Ho & Tomaso Poggio. Global approach – basic algorithm:  A one-versus-all strategy was used.  A linear SVM was trained for each person in the dataset.  Each SVM was trained to distinguish between all images of a single person and all other images.

38 Face Recognition with SVM - Global Versus Component Approach

39 Face Recognition with SVM - Global Versus Component Approach  The gray values of the face picture were converted to a feature vector.  Given a set of q people (a set of q SVMs), let d_i(x) be the distance from x to the hyperplane of the i-th SVM; the class label y of a face pattern x is computed as y = arg max_i d_i(x).
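A hedged sketch of this winner-takes-all decision; the "images" here are random placeholder vectors, and scikit-learn's one-vs-rest wrapper merely stands in for the paper's q linear SVMs (it is not the authors' actual pipeline):

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# x_faces: rows are flattened gray-value "images", y_ids: person labels (placeholder data)
rng = np.random.default_rng(0)
x_faces = rng.normal(size=(60, 40 * 40))
y_ids = np.repeat(np.arange(6), 10)                  # q = 6 "people", 10 images each

model = OneVsRestClassifier(SVC(kernel="linear")).fit(x_faces, y_ids)
d = model.decision_function(x_faces[:1])             # one signed distance per person's SVM
print("predicted person:", np.argmax(d, axis=1))     # winner takes all over the q outputs
```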

40 Face Recognition with SVM - Global Versus Component Approach Global approach – improved algorithm:  A variation of this algorithm used a second-degree polynomial SVM (an SVM with a second-degree polynomial kernel).

41 Face Recognition with SVM - Global Versus Component Approach Component-based algorithm:  In the detection phase, facial components were detected.

42 Face Recognition with SVM - Global Versus Component Approach  Then, the final classification was made by combining the results of the component classifiers. Each of the components was normalized in size, and their gray levels were combined into a single feature vector.  Again, a one-versus-all linear SVM was used.

43 Face Recognition with SVM - Global Versus Component Approach - Results The component-based algorithm showed much better results than the global approach.

44 References  B. Heisele, P. Ho & T. Poggio. Face Recognition with Support Vector Machines. Computer Vision and Image Understanding, vol. 91, no. 1-2, pp. 6-21, 2003.  C.J.C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, vol. 2, pp. 121-167, 1998.  J.P. Lewis. A Short SVM (Support Vector Machines) Tutorial.

45  Prof. Bebis. Support Vector Machines. Pattern Recognition Course, Spring 2006 Lecture Slides.  Prof. A.W. Moore. Support Vector Machines. 2003 Lecture Slides.  R. Osadchy. Support Vector Machines. 2008 Lecture Slides.  YouTube – Facial Expressions Recognition: http://www.youtube.com/watch?v=iPFg52yOZzY


