
1 SUPPORT VECTOR MACHINES. İsmail GÜNEŞ

2 What is SVM?
● A new-generation learning system.
● Based on recent advances in statistical learning theory.
● Uses a hypothesis space of linear functions in a high-dimensional feature space.
● Trained with techniques from optimisation theory and statistical learning theory.

3 Features of SVM
● Invented by Vapnik.
● Simple, geometric, and always trained to find the global optimum.
● Used for pattern recognition, regression, and linear operator inversion.
● Considered too slow at the beginning; for most applications this problem has now been overcome.

4 Features of SVM (Cont'd)
● Based on a simple idea.
● High performance in practical applications.
● Can deal with complex nonlinear problems, yet works with a simple linear algorithm.

5 The Main Idea of SVMs
● Find the optimal hyperplane for linearly separable patterns.
● Extend the approach to patterns that are not linearly separable.

6 Separating Line (or Hyperplane)
● Goal: find the best line (or hyperplane) to separate the training data. How do we formalize this?
● In two dimensions, the equation of the line can be written as w1 x1 + w2 x2 + b = 0, with Class 1 on one side and Class -1 on the other.
● Better notation for n dimensions: w · x + b = 0.

7 The Simple Classifier
● Points that fall on the right side of the line are classified as "1"; points that fall on the left are classified as "-1".
● Using the training set, find a hyperplane (line) so that sign(w · x + b) matches each training label, where w is the weight vector, x is the input vector, and b is the bias.
● How can we improve this simple classifier? (A sketch of the decision rule follows.)
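
A minimal sketch of this rule in Python (my own illustration; w and b are assumed to come from some training procedure, they are not produced by anything on this slide):

    import numpy as np

    def predict(w, b, x):
        # Classify x as +1 or -1 depending on which side of the hyperplane w.x + b = 0 it falls.
        return 1 if np.dot(w, x) + b >= 0 else -1

    # Toy hyperplane x1 + x2 - 1 = 0.
    w, b = np.array([1.0, 1.0]), -1.0
    print(predict(w, b, np.array([2.0, 2.0])))   # +1
    print(predict(w, b, np.array([0.0, 0.0])))   # -1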

8 Finding the Best Plane
● Which of two candidate separating planes is better? (Figure: two planes separating Class 1 from Class -1.)
● The green plane is the better choice, since it is more likely to do well on future test data.

9 Separating the Planes
● Construct the bounding planes: draw two planes parallel to the classification plane and push them as far apart as possible, until they hit data points. (Figure: bounding planes around Class 1 and Class -1.)
● The classification plane whose bounding planes are furthest apart is the best one.

10 Finding the Best Plane (Cont'd)
● All points in Class 1 should be to the right of bounding plane 1: w · x_i + b >= +1.
● All points in Class -1 should be to the left of bounding plane -1: w · x_i + b <= -1.
● Let y_i be +1 or -1 depending on the classification. Then the two inequalities can be written as one: y_i (w · x_i + b) >= 1.
● The distance between the bounding planes should be maximized.
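
For reference (implied by the slide but not written out on it), the bounding planes are w · x + b = +1 and w · x + b = -1, so the distance between them is

\[
  \text{margin} = \frac{(+1) - (-1)}{\|w\|} = \frac{2}{\|w\|},
\]

which means that maximizing the distance between the bounding planes is the same as minimizing \|w\|.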

11 The Optimization Problem
● Use mathematical techniques to find the hyperplane that optimizes this measure, i.e. maximizes the distance between the bounding planes.
● This is a mathematical program: an optimization problem subject to constraints. More specifically, it is a quadratic program.
● There are high-powered software tools, both commercial and academic, for solving this kind of problem (a small sketch follows).
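
As an illustration of the kind of quadratic program such tools solve (the slides do not name any particular package; cvxpy is used here only as an example), the hard-margin problem of minimizing 1/2 ||w||^2 subject to y_i (w · x_i + b) >= 1 can be written in a few lines of Python:

    import numpy as np
    import cvxpy as cp

    # Toy linearly separable data: Class 1 in the upper right, Class -1 in the lower left.
    X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0])

    w = cp.Variable(2)
    b = cp.Variable()
    # Maximize the margin by minimizing ||w||^2, subject to every point
    # lying on the correct side of its bounding plane.
    constraints = [cp.multiply(y, X @ w + b) >= 1]
    problem = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)), constraints)
    problem.solve()
    print(w.value, b.value)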

12 12 Data Which is Not Linearly Separable What if a separating plane does not exist? What if a separating plane does not exist? Class 1 Class -1 error Find the plane that maximizes the margin and minimizes the errors on the training points. Find the plane that maximizes the margin and minimizes the errors on the training points. Take original inequality and add a slack variable to measure error: Take original inequality and add a slack variable to measure error:

13 The Support Vector Machine
● Push the planes apart and minimize the error at the same time (see the formulation below); C is a positive number chosen to balance these two goals.
● This problem is called a Support Vector Machine, or SVM.
● The SVM is one of many techniques for doing supervised machine learning. Others: neural networks, decision trees, k-nearest neighbours.
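
The formulation the slide refers to (reconstructed here, since the formula itself is an image in the original) is the standard soft-margin problem:

\[
  \min_{w,\,b,\,\xi} \; \tfrac{1}{2}\|w\|^2 + C \sum_{i} \xi_i
  \quad \text{such that} \quad
  y_i (w \cdot x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0 .
\]

The first term pushes the bounding planes apart, the second penalizes the errors ξ_i, and C trades the two off. This is the same C exposed by most SVM software, for example the C parameter of scikit-learn's SVC.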

14 Terminology
● The points that touch a bounding plane, or lie on the wrong side of it, are called support vectors.
● If all the points except the support vectors were removed, the solution would be the same.
● The support vectors are the most difficult points to classify.
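
As a small aside (not on the slide), SVM libraries typically report the support vectors after training; for example, scikit-learn's SVC keeps them in the support_vectors_ attribute:

    import numpy as np
    from sklearn import svm

    X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
    y = np.array([-1, -1, 1, 1])

    clf = svm.SVC(kernel='linear', C=1.0)
    clf.fit(X, y)
    # Only the points touching (or violating) the bounding planes are reported.
    print(clf.support_vectors_)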

15 What About Nonlinear Surfaces?
● Some datasets may not be best separated by a plane.
● First idea (simple and effective): map each data point into a higher-dimensional space and find a linear fit there; for example, find a quadratic decision surface in the original space. (A sketch follows.)
● Problem: if the dimensionality of the mapped space is high, this requires a lot of calculation.
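
A minimal sketch of this idea on a 1-D toy problem (my own example, not from the slides): the two classes cannot be split by a single threshold on x, but after mapping x to (x, x^2) a straight line separates them.

    import numpy as np

    # Class +1 lies inside [-1, 1], class -1 outside; no single threshold on x separates them.
    x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
    y = np.array([-1, -1, 1, 1, 1, -1, -1])

    # Map each point into two dimensions: x -> (x, x^2).
    X_mapped = np.column_stack([x, x ** 2])

    # In the mapped space the linear rule "second coordinate < 2" classifies everything correctly,
    # which corresponds to the quadratic rule x^2 < 2 in the original space.
    pred = np.where(X_mapped[:, 1] < 2.0, 1, -1)
    print(np.array_equal(pred, y))   # True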

16 Solution
● Nonlinear surfaces can be used without these problems through the use of a kernel function.
● The kernel function specifies a similarity measure between two vectors.
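
One widely used example (not named on the slide) is the Gaussian, or RBF, kernel, which scores two vectors by how close they are:

\[
  K(x_i, x_j) = \exp\!\left( -\frac{\|x_i - x_j\|^2}{2\sigma^2} \right)
\]

It is close to 1 when the vectors are near each other and decays toward 0 as they move apart; σ is a width parameter chosen by the user.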

17 Solution (Cont'd)
● The only way in which the data appears in the training problem is in the form of dot products x_i · x_j.
● First map the data to some other (possibly infinite-dimensional) space H using a mapping Φ.
● The training algorithm now depends on the data only through dot products in H: Φ(x_i) · Φ(x_j).
● If there is a kernel function K such that K(x_i, x_j) = Φ(x_i) · Φ(x_j), we would only need to use K in the training algorithm and would never need to know Φ explicitly.
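
A concrete sketch of this point (my own toy example): for the quadratic mapping Φ(x) = (x1^2, √2 x1 x2, x2^2), the dot product in the mapped space equals the simple kernel K(x_i, x_j) = (x_i · x_j)^2, so Φ never has to be computed.

    import numpy as np

    def phi(x):
        # Explicit quadratic feature map for a 2-D input.
        return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

    def k(xi, xj):
        # Kernel that computes the same dot product without ever building phi(x).
        return np.dot(xi, xj) ** 2

    xi, xj = np.array([1.0, 2.0]), np.array([3.0, 0.5])
    print(np.dot(phi(xi), phi(xj)))   # 16.0
    print(k(xi, xj))                  # 16.0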

18 SVM Applications
● Pattern recognition: handwriting recognition, 3-D object recognition, speaker identification, face detection, text categorization, bioinformatics.
● Regression estimation.
● Density estimation.
● More...

19 Conclusions
● SVMs give good performance in a variety of applications such as pattern recognition, regression estimation, and time-series prediction.
● Some open issues remain:
● SVMs were considered too slow at the beginning; this problem has now been solved for most applications.
● The choice of kernel function: there are no firm guidelines.
● In most cases, SVMs generalize better than competing methods (at the time, holding the record for the lowest handwriting-recognition error rate, 0.56%).

20 References
● Cristianini, N. and Shawe-Taylor, J., "An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods", 2000. www.support-vector.net
● Burges, C. J. C., "A Tutorial on Support Vector Machines for Pattern Recognition", Data Mining and Knowledge Discovery, 1998.

