
1 An Introduction to Support Vector Machine (SVM)

2 Famous Examples that helped SVM become popular

3

4 Classification Every day, all the time, we classify things.
E.g. crossing the street: Is there a car coming? At what speed? How far is it to the other side? Classification: safe to walk or not!

5 Discriminant Function
It can be an arbitrary function of x, such as: Nearest Neighbor, Decision Tree, Linear Functions, Nonlinear Functions

6 Background – Classification Problem
Applications: Personal Identification Credit Rating Medical Diagnosis Text Categorization Denial of Service Detection Character recognition Biometrics Image classification

7 Classification Formulation
Given an input space X and a set of classes Ω = {ω1, ω2, …, ωc}, the classification problem is to define a mapping f: X → Ω where each x in X is assigned to one class. This mapping function is called a decision function.

8 Decision Function The basic problem in classification is to find c decision functions d1(x), d2(x), …, dc(x) with the property that, if a pattern x belongs to class i, then di(x) ≥ dj(x) for all j ≠ i. Here di(x) is some similarity measure between x and class i, such as a distance or a probability.
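As a concrete illustration (my own sketch, not from the slides; the data and helper names are invented), a minimum distance classifier implements di(x) = -||x - mi||, where mi is the mean of the training samples of class i, so the class with the largest di(x) wins:

# Minimal sketch of decision functions via a minimum distance classifier.
import numpy as np

def fit_class_means(X, y):
    """Compute one mean vector m_i per class from training data."""
    return {label: X[y == label].mean(axis=0) for label in np.unique(y)}

def decision_functions(x, means):
    """Evaluate d_i(x) = -||x - m_i|| for every class i."""
    return {label: -np.linalg.norm(x - m) for label, m in means.items()}

def classify(x, means):
    """Assign x to the class whose decision function is largest."""
    d = decision_functions(x, means)
    return max(d, key=d.get)

# Toy usage with made-up 2-D data
X = np.array([[0.0, 0.0], [0.5, 0.2], [3.0, 3.0], [3.2, 2.8]])
y = np.array([1, 1, 2, 2])
means = fit_class_means(X, y)
print(classify(np.array([0.3, 0.1]), means))   # -> 1
print(classify(np.array([2.9, 3.1]), means))   # -> 2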

9 Decision Function Example [Figure: feature space partitioned into decision regions for three classes; the boundary between class i and class j lies where di(x) = dj(x), e.g. d1 = d3, and a point falls in class 1 where d2, d3 < d1.]

10 Single Classifier Most popular single classifiers:
Minimum Distance Classifier Bayes Classifier K-Nearest Neighbor Decision Tree Neural Network Support Vector Machine

11 Support Vector Machines (SVM)
(Separable case) Which is the best separating hyperplane? The one with the largest margin!

12 Linearly Separable Classes

13 Support Vector Machine
Basically a two-class classifier, developed by Vapnik and co-workers (1992). Which line is optimal?

14 Support Vector Machines (SVM)
A large margin provides better generalization ability. Maximizing the margin: maximize 2/||w||. Correct separation: yi(w·xi + b) ≥ 1 for all i.

15 Why named “Support Vector Machine”?
Support Vectors

16 Support Vector Machine
Training vectors: xi, i = 1, …, n. Consider a simple case with two classes: define a label vector y with yi = +1 if xi is in class 1 and yi = -1 if xi is in class 2. We look for a hyperplane which separates all the data. [Figure: the separating plane with its margin ρ; class 1 and class 2 points, with the support vectors of each class lying on the margin boundaries.]

17 2.8 SVM

18 Linearly Separable SVM Label the training data {xi, yi}, i = 1, …, n, with yi ∈ {-1, +1}.
Suppose we have a hyperplane which separates the "+" from the "-" examples (a separating hyperplane). The points x which lie on the hyperplane satisfy w·x + b = 0, where w is normal to the hyperplane and |b|/||w|| is the perpendicular distance from the hyperplane to the origin.
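For reference (standard hyperplane geometry, not an equation shown on the slide), the distance from any point x to this hyperplane is

    d(x) = \frac{|w \cdot x + b|}{\lVert w \rVert},

which for x = 0 reduces to the |b|/||w|| distance from the origin quoted above.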

19 Linearly Separable SVM
Define two support hyperplanes as H1: wTx + b = +δ and H2: wTx + b = -δ. To remove the over-parameterization, set δ = 1. Then the margin = distance between H1 and H2 = 2/||w||.
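A one-line check of the margin value (standard derivation, using the distance formula above): a point on H1 satisfies w·x + b = 1, so its distance to the decision plane w·x + b = 0 is 1/||w||; by symmetry the same holds for H2, giving

    \text{margin} = \frac{1}{\lVert w \rVert} + \frac{1}{\lVert w \rVert} = \frac{2}{\lVert w \rVert}.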

20 The Primal problem of SVM
Goal: find the separating hyperplane with the largest margin. An SVM finds w and b that (1) minimize ||w||²/2 = wTw/2, (2) subject to yi(xi·w + b) - 1 ≥ 0 for all i. We switch this problem to a Lagrangian formulation for two reasons: (1) the inequality constraints become easier to handle, and the problem turns into a quadratic program; (2) the training data then appear only in the form of dot products between vectors, so the method can be generalized to the nonlinear case.
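A minimal runnable sketch of this primal problem (my own illustration, assuming scikit-learn, which is not referenced in the slides): SVC with a linear kernel and a very large C approximates the hard-margin separable case, and w, b and the margin 2/||w|| can be read off the fitted model.

# Fit an (approximately) hard-margin linear SVM and inspect w, b and the margin.
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (two classes, labels +1 / -1)
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)   # very large C ~ hard margin
clf.fit(X, y)

w = clf.coef_[0]            # normal vector of the separating hyperplane
b = clf.intercept_[0]       # offset
margin = 2.0 / np.linalg.norm(w)

print("w =", w, "b =", b)
print("margin 2/||w|| =", margin)
print("support vectors:", clf.support_vectors_)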

21 Lagrange Multiplier Method
A method to find the extremum of a multivariate function f(x1, x2, …, xn) subject to the constraint g(x1, x2, …, xn) = 0. For an extremum of f to exist on g, the gradient of f must line up with the gradient of g: ∂f/∂xk + λ ∂g/∂xk = 0 for all k = 1, …, n, where the constant λ is called the Lagrange multiplier. The Lagrangian formulation of the SVM problem is given below.
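Written out (the standard primal Lagrangian for the constraints yi(xi·w + b) - 1 ≥ 0 from the previous slide, with one multiplier αi ≥ 0 per constraint; the original slide showed this as an image):

    L_P = \frac{1}{2}\lVert w \rVert^2 - \sum_{i=1}^{n} \alpha_i \left[ y_i (x_i \cdot w + b) - 1 \right], \qquad \alpha_i \ge 0.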

22 Lagrange Multiplier Method
To find the saddle point of L, we need its gradient with respect to w and b to vanish: (1) ∂L/∂w = 0 ⟹ w = Σi αi yi xi, and (2) ∂L/∂b = 0 ⟹ Σi αi yi = 0. Substituting them back into the Lagrangian gives the dual problem below. The data enter only through inner products, so the method can be generalized to the nonlinear case by applying a kernel.
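The resulting dual (the standard form obtained after this substitution; shown as an image on the original slide):

    \max_{\alpha} \; L_D = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \,(x_i \cdot x_j)
    \quad \text{subject to } \alpha_i \ge 0, \;\; \sum_{i=1}^{n} \alpha_i y_i = 0.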

23 KKT Conditions
Since the SVM problem is convex, the KKT conditions are necessary and sufficient for w, b and α to be a solution. w is determined by the training procedure (w = Σi αi yi xi). b is easily found from the KKT complementary slackness condition by choosing any i for which αi ≠ 0.
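Concretely (the standard complementary slackness condition and the resulting expression for b, using yi ∈ {-1, +1}):

    \alpha_i \left[ y_i (x_i \cdot w + b) - 1 \right] = 0 \;\; \forall i
    \qquad \Rightarrow \qquad
    b = y_i - w \cdot x_i \;\; \text{for any } i \text{ with } \alpha_i > 0.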

24 2.8 SVM What about non-linear boundary?

25 Non-Linearly Separable SVM: Kernel
To extend to the non-linear case, we need to map the data to some other Euclidean space.

26 Kernel Φ is the mapping function.
Since the training algorithm depends on the data only through dot products, we can use a "kernel function" K such that K(xi, xj) = Φ(xi)·Φ(xj). One commonly used example is the radial basis function (RBF). An RBF is a real-valued function whose value depends only on the distance from the origin, so that Φ(x) = Φ(||x||); or alternatively on the distance from some other point c, called a center, so that Φ(x, c) = Φ(||x - c||).
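A small sketch of the kernel trick (my own illustration, not code from the slides; the Gaussian RBF kernel K(xi, xj) = exp(-γ ||xi - xj||²) and the toy XOR-style data are assumptions of this write-up), first written by hand and then used via scikit-learn's SVC(kernel="rbf"):

# RBF kernel and a kernel SVM on data that is not linearly separable in 2-D.
import numpy as np
from sklearn.svm import SVC

def rbf_kernel(x1, x2, gamma=1.0):
    """K(x1, x2) = exp(-gamma * ||x1 - x2||^2): depends only on the distance."""
    return np.exp(-gamma * np.sum((x1 - x2) ** 2))

# XOR-style data: no single hyperplane separates the two classes in 2-D.
X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="rbf", gamma=1.0, C=10.0)
clf.fit(X, y)
print(clf.predict(X))          # typically recovers [ 1  1 -1 -1] on this toy data
print(rbf_kernel(X[0], X[1]))  # kernel value between two points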

27 Non-separable SVM Real-world applications usually have no optimal separating hyperplane (OSH). We need to add slack (error) terms ξi to the constraints. To penalize the error terms, we add a cost C Σi ξi to the objective; the new formulation and its Lagrangian are given below.
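Written out (the standard soft-margin formulation; the equations on the original slide were images, and C is the penalty parameter):

    \min_{w, b, \xi} \; \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i
    \quad \text{subject to } y_i (x_i \cdot w + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0,

with the new Lagrangian (μi are the multipliers for the constraints ξi ≥ 0):

    L = \frac{1}{2}\lVert w \rVert^2 + C \sum_i \xi_i - \sum_i \alpha_i \left[ y_i (x_i \cdot w + b) - 1 + \xi_i \right] - \sum_i \mu_i \xi_i .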

28 Non-separable SVM New KKT Conditions
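For completeness (the standard conditions that replace the separable-case ones; the original slide showed them only as an image), the main change is that the multipliers become box-constrained:

    0 \le \alpha_i \le C, \qquad \sum_i \alpha_i y_i = 0, \qquad
    \alpha_i \left[ y_i (x_i \cdot w + b) - 1 + \xi_i \right] = 0, \qquad \mu_i \xi_i = 0 .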

