
1 Ch. Eick: Support Vector Machines: The Main Ideas
Reading material on Support Vector Machines:
1. Textbook
2. First 3 columns of the Smola/Schölkopf article on SV Regression
3. http://en.wikipedia.org/wiki/Kernel_trick

2 Likelihood-based vs. Discriminant-based Classification
Likelihood-based: assume a model for p(x|Ci) and use Bayes' rule to calculate P(Ci|x); gi(x) = log P(Ci|x).
Discriminant-based: assume a model for gi(x|Φi); no density estimation.
Prototype-based: make classification decisions based on the nearest prototypes, without constructing decision boundaries (the kNN and k-means approach).
Estimating the boundaries is enough; there is no need to accurately estimate the densities/probabilities inside the boundaries. We are just interested in learning decision boundaries (curves along which the densities of the two classes are equal), and many popular classification techniques learn decision boundaries without explicitly constructing density functions.

3 Support Vector Machines
SVMs use a single hyperplane; one possible solution.
Hyperplane: http://en.wikipedia.org/wiki/Hyperplane

4 Support Vector Machines
Another possible solution.

5 Support Vector Machines
Other possible solutions.

6 Support Vector Machines
Which one is better? B1 or B2? How do you define better?

7 Support Vector Machines
Find a hyperplane maximizing the margin => B1 is better than B2.

8 Key Properties of Support Vector Machines
1. SVMs use a single hyperplane that subdivides the space into two half-spaces, one occupied by Class 1 and the other by Class 2.
2. They maximize the margin of the decision boundary using quadratic optimization techniques that find the optimal hyperplane.
3. When used in practice, SVM approaches frequently map the examples (using a function Φ) to a higher-dimensional space and find margin-maximal hyperplanes in the mapped space, obtaining decision boundaries that are not hyperplanes in the original space.
4. Moreover, versions of SVMs exist that can be used when linear separability cannot be accomplished.

9 Support Vector Machines
Examples are (x1, .., xn, y) with y ∈ {-1, 1}.
L2 norm: http://en.wikipedia.org/wiki/L2_norm#Euclidean_norm
Dot product: http://en.wikipedia.org/wiki/Dot_product
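
To make the notation concrete, a tiny numpy sketch (the numbers are made up, not from the slides) of the building blocks used on the next slides: the L2 norm ||w||, the dot product w·x, and the decision rule sign(w·x + b).

```python
# Minimal sketch: L2 norm, dot product, and the decision rule sign(w.x + b).
# w, b, and the example (x, y) are illustrative values only.
import numpy as np

x = np.array([3.0, 1.0])      # one training example x = (x1, .., xn)
y = 1                         # its label, y in {-1, +1}
w = np.array([0.5, -2.0])     # a candidate hyperplane w.x + b = 0
b = 1.0

print("L2 norm ||w||       =", np.linalg.norm(w))       # Euclidean length of w
print("dot product w.x     =", np.dot(w, x))
print("predicted label     =", np.sign(np.dot(w, x) + b))
print("correctly classified:", y * (np.dot(w, x) + b) > 0)
```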

10 Support Vector Machines
We want to maximize the margin 2/||w||, which is equivalent to minimizing ||w||²/2, subject to the following N constraints: yi (w·xi + b) ≥ 1 for i = 1, .., N.
This is a constrained convex quadratic optimization problem that can be solved in polynomial time.
Numerical approaches to solve it (e.g., quadratic programming) exist.
The function to be optimized has only a single minimum => no local-minimum problem.
Dot product: http://en.wikipedia.org/wiki/Dot_product
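
A minimal sketch of what this optimization yields in practice, assuming a tiny hand-made separable dataset and scikit-learn's SVC (a very large C approximates the hard-margin problem stated above): the fitted w and b define the hyperplane, every constraint yi (w·xi + b) ≥ 1 holds, and the margin is 2/||w||.

```python
# Sketch: fit a (near) hard-margin linear SVM on a tiny separable dataset
# and recover w, b, and the margin 2/||w||. Data and C are illustrative.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 2.5], [1.5, 0.5],   # class -1
              [4.0, 4.0], [5.0, 3.5], [4.5, 5.0]])  # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)   # a huge C approximates the hard margin
clf.fit(X, y)

w = clf.coef_[0]                    # weight vector of the separating hyperplane
b = clf.intercept_[0]               # bias term
print("w =", w, " b =", b)
print("margin 2/||w|| =", 2.0 / np.linalg.norm(w))
print("support vectors:\n", clf.support_vectors_)
print("constraint values y_i*(w.x_i + b):", y * (X @ w + b))  # all >= 1 (up to tolerance)
```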

11 Support Vector Machines
What if the problem is not linearly separable?

12 Linear SVM for Non-linearly Separable Problems
What if the problem is not linearly separable? Introduce slack variables ξi, which allow the constraints to be violated to a certain degree.
Need to minimize: ||w||²/2 + C Σi ξi, where the first term is the inverse of the margin size between the hyperplanes, the second term measures the prediction error, and C is a parameter.
Subject to (i = 1, .., N): yi (w·xi + b) ≥ 1 - ξi and ξi ≥ 0.
C is chosen using a validation set, trying to keep the margins wide while keeping the training error low.
Remark: no kernel is used here.
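
A short sketch, assuming scikit-learn and synthetic overlapping classes (not data from the lecture), of how C trades margin width against slack: each ξi can be recovered from the fitted hyperplane as max(0, 1 - yi (w·xi + b)).

```python
# Sketch: effect of the parameter C in the soft-margin linear SVM.
# Small C -> wide margin, more slack tolerated; large C -> narrow margin,
# slack penalized heavily. The overlapping toy data are illustrative.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2)),
               rng.normal(loc=[2.0, 2.0], scale=1.0, size=(50, 2))])
y = np.array([-1] * 50 + [1] * 50)

for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w, b = clf.coef_[0], clf.intercept_[0]
    margin = 2.0 / np.linalg.norm(w)
    # slack xi_i = max(0, 1 - y_i (w.x_i + b)); a positive value means the
    # corresponding margin constraint is violated to some degree
    slack = np.maximum(0.0, 1.0 - y * (X @ w + b))
    train_err = (clf.predict(X) != y).mean()
    print(f"C={C:6.2f}  margin={margin:.3f}  violations={(slack > 0).sum():2d}  "
          f"training error={train_err:.2f}")
```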

13 Nonlinear Support Vector Machines
What if the decision boundary is not linear?
Alternative 1: use a technique that directly employs non-linear decision boundaries (i.e., learn a non-linear decision function).

14 Nonlinear Support Vector Machines
Alternative 2: transform the data into a higher-dimensional attribute space and find linear decision boundaries in this space:
1. Transform the data into a higher-dimensional space.
2. Find the best hyperplane using the methods introduced earlier.
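
A sketch of this two-step recipe under illustrative assumptions (scikit-learn's concentric-circles dataset and a hand-picked quadratic feature map): the same linear SVM that fails in the original two-dimensional space separates the classes once the data are mapped.

```python
# Sketch of Alternative 2: explicitly map the data into a higher-dimensional
# attribute space, then fit an ordinary linear SVM there. The concentric-circle
# data and the quadratic feature map are illustrative choices.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import LinearSVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

def phi(X):
    """Map (x1, x2) -> (x1, x2, x1^2, x2^2, x1*x2); circles become separable."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])

svm_original = LinearSVC(C=1.0, max_iter=10000).fit(X, y)
svm_mapped = LinearSVC(C=1.0, max_iter=10000).fit(phi(X), y)

print("training accuracy in the original space:", svm_original.score(X, y))     # poor
print("training accuracy in the mapped space:  ", svm_mapped.score(phi(X), y))  # ~1.0
```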

15 Nonlinear Support Vector Machines
1. Choose a non-linear function Φ to transform the data into a different, usually higher-dimensional, attribute space.
2. Minimize ||w||²/2, subject to the following N constraints: yi (w·Φ(xi) + b) ≥ 1 for i = 1, .., N; that is, find a good hyperplane in the transformed space.
Remark: the soft-margin SVM can be generalized similarly.

16 Example: Polynomial Kernel Function
Polynomial kernel function: Φ(x1,x2) = (x1², x2², √2·x1·x2, √2·x1, √2·x2, 1), so that K(u,v) = Φ(u)·Φ(v) = (u·v + 1)².
A support vector machine with a polynomial kernel function classifies a new example z as follows:
sign(Σi λi yi Φ(xi)·Φ(z) + b) = sign(Σi λi yi (xi·z + 1)² + b)
Remark: the λi and b are determined using the methods for linear SVMs that were discussed earlier.
Kernel function trick: perform the computations in the original space, although we solve an optimization problem in the transformed space => more efficient; more details in Topic 14.
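
A quick numerical check of the identity above (the two test vectors are arbitrary): the dot product of the explicitly mapped vectors equals the kernel value computed directly in the original space.

```python
# Sketch: verify numerically that the explicit quadratic feature map and the
# polynomial kernel K(u, v) = (u.v + 1)^2 yield identical dot products.
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel in 2 dimensions."""
    x1, x2 = x
    return np.array([x1**2, x2**2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1,
                     np.sqrt(2) * x2,
                     1.0])

def K(u, v):
    """Polynomial kernel computed directly in the original 2-D space."""
    return (np.dot(u, v) + 1.0) ** 2

u = np.array([0.7, -1.2])
v = np.array([2.0, 0.5])

print(np.dot(phi(u), phi(v)))   # dot product in the transformed space
print(K(u, v))                  # the same value, computed without mapping
```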

17 Other Material on SVMs
Support Vector Machines in RapidMiner: http://www.youtube.com/watch?v=27RQRUR7Ubc
Pointers to some good SVM tutorials: http://stackoverflow.com/questions/1072097/pointers-to-some-good-svm-tutorial
LIBSVM: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
Adaboost/SVM relationship lecture: http://videolectures.net/mlss05us_rudin_da/

18 Summary: Support Vector Machines
Support vector machines learn hyperplanes that separate two classes while maximizing the margin between them (the empty space between the instances of the two classes).
Support vector machines introduce slack variables (in the case that the classes are not linearly separable), trying to maximize the margins while keeping the training error low.
The most popular versions of SVMs use non-linear kernel functions and map the attribute space into a higher-dimensional space to facilitate finding "good" linear decision boundaries in the modified space.
Support vector machines find "margin-optimal" hyperplanes by solving a convex quadratic optimization problem. However, this optimization process is quite slow, and support vector machines tend to fail if the number of examples goes beyond 500/5000/50000…
In general, support vector machines achieve quite high accuracies compared to other techniques.
In the last 10 years, support vector machines have been generalized to other tasks such as regression, PCA, and outlier detection.

19 Kernels: What Can They Do for You?
Some machine learning/statistical problems only depend on the dot products of the objects in the dataset O = {x1, .., xn} and not on other characteristics of the objects; in other words, those techniques only depend on the Gram matrix of O, which stores x1·x1, x1·x2, …, xn·xn (http://en.wikipedia.org/wiki/Gramian_matrix).
These techniques can be generalized by mapping the dataset into a higher-dimensional space, as long as the non-linear mapping Φ can be kernelized; that is, a kernel function K can be found such that K(u,v) = Φ(u)·Φ(v).
In this case the results are computed in the mapped space based on K(x1,x1), K(x1,x2), …, K(xn,xn), which is called the kernel trick: http://en.wikipedia.org/wiki/Kernel_trick
Kernels have been successfully used to generalize PCA, k-means, support vector machines, and many other techniques, allowing them to use non-linear coordinate systems, more complex decision boundaries, or more complex cluster boundaries.
We will revisit kernels later when discussing transparencies 13-25 and 30-35 of the Vasconcelos lecture.
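
A minimal sketch of the Gram-matrix view, assuming a tiny made-up dataset and an RBF kernel: replacing the plain dot-product matrix with a kernel matrix is all that is needed to "kernelize" a method that only touches the data through dot products.

```python
# Sketch: the Gram matrix of a small dataset, first with plain dot products,
# then with an RBF kernel K(u, v) = exp(-gamma * ||u - v||^2). Any technique
# that only uses the data through this matrix can be kernelized by swapping
# the first matrix for the second. Dataset and gamma are illustrative.
import numpy as np

X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
gamma = 0.5

linear_gram = X @ X.T   # entry (i, j) is x_i . x_j

# pairwise squared Euclidean distances, then the RBF kernel values
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
rbf_gram = np.exp(-gamma * sq_dists)

print("linear Gram matrix:\n", linear_gram)
print("RBF Gram matrix:\n", rbf_gram)
```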

