
1 An Introduction to Support Vector Machine (SVM). Presenter: Ahey. Date: 2007/07/20. The slides are based on lecture notes of Prof. 林智仁 (Chih-Jen Lin) and Daniel Yeung.

2 Outline Background; Linearly Separable SVM; Lagrange Multiplier Method; Karush-Kuhn-Tucker (KKT) Conditions; Non-linear SVM: Kernel; Non-Separable SVM; libsvm

3 Background – Classification Problem The goal of classification is to organize and categorize data into distinct classes. A model is first created from previous data (the training samples); this model is then used to classify new data (unseen samples). A sample is characterized by a set of features. Classification is essentially finding the best boundary between classes.
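
As a hedged illustration of this train-then-classify workflow (a minimal sketch using scikit-learn's SVC, which wraps libsvm; the toy feature values and labels below are made up purely for illustration):

# Minimal train/predict sketch with scikit-learn's SVC (built on libsvm).
from sklearn.svm import SVC

# Training samples: each row is a feature vector, each label is its class.
X_train = [[0.0, 0.1], [0.2, 0.0], [1.0, 1.1], [0.9, 1.2]]
y_train = [0, 0, 1, 1]

model = SVC(kernel="linear")   # the model is built from the training samples
model.fit(X_train, y_train)

# The model is then used to classify new, unseen samples.
X_new = [[0.1, 0.0], [1.1, 1.0]]
print(model.predict(X_new))    # expected output: [0 1]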

4 Background – Classification Problem Applications:  Personal Identification  Credit Rating  Medical Diagnosis  Text Categorization  Denial of Service Detection  Character recognition  Biometrics  Image classification

5 Classification Formulation Given an input space X and a set of classes Ω = {ω_1, ω_2, ..., ω_c}, the classification problem is to define a mapping f : X → Ω where each x in X is assigned to one class. This mapping function is called a decision function.

6 Decision Function The basic problem in classification is to find c decision functions d_1(x), d_2(x), ..., d_c(x) with the property that, if a pattern x belongs to class i, then d_i(x) > d_j(x) for all j ≠ i. Here d_i(x) is some similarity measure between x and class i, such as a distance or a probability.

7 Decision Function Example [Figure: the feature space is divided into regions for Class 1, Class 2 and Class 3; the region of class i is where the other decision functions are smaller than d_i, and the region boundaries are the curves d_1 = d_2, d_1 = d_3 and d_2 = d_3.]

8 Single Classifier Most popular single classifiers:  Minimum Distance Classifier  Bayes Classifier  K-Nearest Neighbor  Decision Tree  Neural Network  Support Vector Machine

9 Minimum Distance Classifier Simplest approach to selection of decision boundaries. Each class ω_j is represented by a prototype (or mean) vector m_j = (1/N_j) Σ_{x ∈ ω_j} x, where N_j = the number of pattern vectors from ω_j. A new unlabelled sample is assigned to the class whose prototype is closest to the sample.
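
A minimal sketch of this rule in Python/NumPy (the class labels, feature values and function names are illustrative only):

# Minimum distance classifier: one prototype (mean) vector per class;
# a new sample is assigned to the class with the nearest prototype.
import numpy as np

def fit_prototypes(X, y):
    # m_j = (1/N_j) * sum of the pattern vectors belonging to class j
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict_min_distance(prototypes, x):
    # assign x to the class whose prototype is closest (Euclidean distance)
    return min(prototypes, key=lambda c: np.linalg.norm(x - prototypes[c]))

# Toy data, made up for illustration.
X = np.array([[0.0, 0.0], [0.2, 0.1], [2.0, 2.0], [2.1, 1.9]])
y = np.array([1, 1, 2, 2])
prototypes = fit_prototypes(X, y)
print(predict_min_distance(prototypes, np.array([0.1, 0.2])))   # -> 1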

10 Bayes Classifier Bayes rule: P(ω_i | x) = p(x | ω_i) P(ω_i) / p(x). The denominator p(x) is the same for each class, therefore assign x to class j if p(x | ω_j) P(ω_j) ≥ p(x | ω_i) P(ω_i) for all i.

11 Bayes Classifier The following information must be known: the probability density functions of the patterns in each class, and the probability of occurrence of each class. Training samples may be used to estimate these probability functions. The samples are assumed to follow a known distribution.
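
As a hedged sketch of the Bayes rule above, assuming (purely for this example) one-dimensional Gaussian class-conditional densities whose parameters would be estimated from training samples:

# Bayes classifier sketch: assign x to the class j maximizing p(x | class j) * P(class j).
from math import sqrt, pi, exp

def gaussian_pdf(x, mean, var):
    # p(x | class) under the assumed Gaussian model
    return exp(-(x - mean) ** 2 / (2 * var)) / sqrt(2 * pi * var)

def bayes_classify(x, params, priors):
    # params[c] = (mean, variance) estimated from training samples; priors[c] = P(c)
    return max(params, key=lambda c: gaussian_pdf(x, *params[c]) * priors[c])

params = {1: (0.0, 1.0), 2: (3.0, 1.0)}     # made-up estimates
priors = {1: 0.5, 2: 0.5}                   # made-up class priors
print(bayes_classify(1.0, params, priors))  # -> 1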

12 K-Nearest Neighbor K-Nearest Neighbor Rule (k-NNR): examine the labels of the k nearest samples and classify by using a majority voting scheme. [Figure: neighborhoods of the query point (7, 3) for 1-NN, 3-NN, 5-NN, 7-NN and 9-NN.]
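
A minimal k-NN sketch in Python (the training points are made up; the query point (7, 3) is taken from the figure only as an example):

# k-nearest-neighbour rule: find the k closest training samples
# and classify by majority vote over their labels.
import numpy as np
from collections import Counter

def knn_predict(X, y, query, k=3):
    dists = np.linalg.norm(X - query, axis=1)   # distance to every training sample
    nearest = np.argsort(dists)[:k]             # indices of the k closest samples
    votes = Counter(y[i] for i in nearest)      # majority voting scheme
    return votes.most_common(1)[0][0]

# Toy data, made up for illustration.
X = np.array([[6.0, 2.5], [7.5, 3.5], [1.0, 8.0], [2.0, 7.0], [6.5, 3.0]])
y = np.array(["A", "A", "B", "B", "A"])
print(knn_predict(X, y, np.array([7.0, 3.0]), k=3))   # -> 'A'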

13 Decision Tree The decision boundaries are hyperplanes parallel to the feature axes. A sequential classification procedure may be developed by considering successive partitions of the feature space R.

14 Decision Trees Example

15 Neural Network A neural network generally maps a set of inputs to a set of outputs. The number of inputs and outputs can vary. The network itself is composed of an arbitrary number of nodes with an arbitrary topology. It is a universal approximator. [Figure: nodes and their connections.]

16 Neural Network A popular NN is the feed-forward neural network, e.g. the Multi-Layer Perceptron (MLP) and the Radial Basis Function (RBF) network. Learning algorithm: backpropagation. Weights of nodes are adjusted based on how well the current weights match an objective.

17 Support Vector Machine Basically a 2-class classifier developed by Vapnik and co-workers (Boser, Guyon and Vapnik, 1992). Which line is optimal?

18 Support Vector Machine Training vectors: x_i, i = 1, ..., n. Consider a simple case with two classes: define a vector y with y_i = 1 if x_i is in class 1 and y_i = -1 if x_i is in class 2. We look for a hyperplane which separates all the data. [Figure: separating plane with margin ρ; the samples of Class 1 and Class 2 lying closest to the plane are the support vectors.]

19 Linearly Separable SVM Label the training data as pairs {x_i, y_i}. Suppose we have a hyperplane which separates the "+" examples from the "-" examples (a separating hyperplane). The points x which lie on the hyperplane satisfy w·x + b = 0, where w is normal to the hyperplane and |b|/||w|| is the perpendicular distance from the hyperplane to the origin.

20 Linearly Separable SVM Define the two support hyperplanes as H1: w^T x + b = +δ and H2: w^T x + b = -δ. To remove the over-parameterization, set δ = 1. The distance from the optimal separating hyperplane (OSH) to each support hyperplane is then 1/||w||, so the margin = distance between H1 and H2 = 2/||w||.
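
As a quick check of the margin formula, the distance between two parallel hyperplanes w^T x = c_1 and w^T x = c_2 is |c_1 - c_2| / ||w||, so with δ = 1:

\[
\text{margin} \;=\; \operatorname{dist}(H_1, H_2)
\;=\; \frac{\lvert (1-b) - (-1-b) \rvert}{\lVert w \rVert}
\;=\; \frac{2}{\lVert w \rVert},
\qquad H_{1,2}:\; w^{T}x + b = \pm 1 .
\]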

21 The Primal Problem of SVM Goal: find the separating hyperplane with the largest margin. An SVM is trained by finding w and b that (1) minimize ||w||²/2 = w^T w / 2, subject to (2) y_i (x_i·w + b) - 1 ≥ 0. Switch the above problem to a Lagrangian formulation for two reasons: (1) it is easier to handle after being transformed into a quadratic programming problem; (2) the training data appear only in the form of dot products between vectors, so the formulation can be generalized to the nonlinear case.
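
Written out, the primal problem is the following quadratic program (a standard statement of what the slide describes, with i ranging over the n training vectors from slide 18):

\[
\min_{w,\,b}\ \; \tfrac{1}{2}\, w^{T} w
\qquad \text{subject to} \qquad
y_i \,(x_i \cdot w + b) - 1 \;\ge\; 0, \quad i = 1, \dots, n .
\]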

22 Lagrange Multiplier Method A method to find the extremum of a multivariate function f(x_1, x_2, ..., x_n) subject to the constraint g(x_1, x_2, ..., x_n) = 0. For an extremum of f to exist on g, the gradient of f must line up with the gradient of g: ∂f/∂x_k = λ ∂g/∂x_k for all k = 1, ..., n, where the constant λ is called the Lagrange multiplier. The Lagrangian transformation of the SVM problem is given below.
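
For the SVM primal of slide 21, the Lagrangian referred to above takes the standard form (a sketch, with one multiplier α_i ≥ 0 per inequality constraint):

\[
L_P(w, b, \alpha) \;=\; \tfrac{1}{2}\lVert w \rVert^{2}
\;-\; \sum_{i=1}^{n} \alpha_i \big[\, y_i (x_i \cdot w + b) - 1 \,\big],
\qquad \alpha_i \ge 0 .
\]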

23 Lagrange Multiplier Method To find the saddle point, we need the gradient of L with respect to w and b to vanish: (1) ∂L/∂w = 0 ⟹ w = Σ_i α_i y_i x_i, and (2) ∂L/∂b = 0 ⟹ Σ_i α_i y_i = 0. Substituting them into the Lagrangian form, we obtain a dual problem in inner-product form (shown below), which can be generalized to the nonlinear case by applying a kernel.
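
Substituting (1) and (2) back into L_P gives the dual problem in inner-product form (standard result, stated here as a sketch):

\[
\max_{\alpha}\ \; L_D \;=\; \sum_{i} \alpha_i
\;-\; \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j\, y_i y_j \,(x_i \cdot x_j)
\qquad \text{subject to} \qquad
\alpha_i \ge 0, \quad \sum_{i} \alpha_i y_i = 0 .
\]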

24 KKT Conditions Since the problem for SVM is convex, the KKT conditions are necessary and sufficient for w, b and α to be a solution. w is determined by the training procedure. b is easily found from the KKT complementary slackness condition α_i [y_i (x_i·w + b) - 1] = 0: choose any i for which α_i ≠ 0, then y_i (x_i·w + b) = 1 and b = y_i - x_i·w.
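
For reference, the full set of KKT conditions for the separable case is (standard form, matching the primal and dual above):

\begin{align*}
& w - \sum_i \alpha_i y_i x_i = 0, \qquad \sum_i \alpha_i y_i = 0, \\
& y_i (x_i \cdot w + b) - 1 \ge 0, \qquad \alpha_i \ge 0, \\
& \alpha_i \big[\, y_i (x_i \cdot w + b) - 1 \,\big] = 0 \quad \text{(complementary slackness)} .
\end{align*}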

25 Non-linear SVM: Kernel To extend to the non-linear case, we need to map the data to some other Euclidean space.

26 Kernel Φ is a mapping function. Since the training algorithm depends on the data only through dot products, we can use a "kernel function" K such that K(x_i, x_j) = Φ(x_i)·Φ(x_j). One commonly used example is the radial basis function (RBF) kernel, K(x_i, x_j) = exp(-||x_i - x_j||² / (2σ²)). An RBF is a real-valued function whose value depends only on the distance from the origin, so that Φ(x) = Φ(||x||), or alternatively on the distance from some other point c, called a center, so that Φ(x, c) = Φ(||x - c||).
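
A small sketch of the RBF kernel and the resulting kernelized decision function, written directly from the formulas above; the multipliers alpha, labels y, bias b and width sigma are placeholders for values that training would produce:

# Gaussian RBF kernel and kernelized SVM decision function (illustrative sketch).
import numpy as np

def rbf_kernel(x1, x2, sigma=1.0):
    # K(x1, x2) = exp(-||x1 - x2||^2 / (2 * sigma^2))
    return np.exp(-np.linalg.norm(x1 - x2) ** 2 / (2.0 * sigma ** 2))

def decision_function(x, support_vectors, alpha, y, b, sigma=1.0):
    # f(x) = sum_i alpha_i * y_i * K(x_i, x) + b; sign(f) gives the class
    return sum(a * yi * rbf_kernel(sv, x, sigma)
               for a, yi, sv in zip(alpha, y, support_vectors)) + b

# Made-up support vectors and multipliers, for illustration only.
svs = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
alpha, y, b = [1.0, 1.0], [+1, -1], 0.0
print(np.sign(decision_function(np.array([0.2, 0.1]), svs, alpha, y, b)))   # -> 1.0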

27 Non-separable SVM Real-world applications usually have no OSH (the data are not perfectly separable). We need to add slack (error) terms ξ_i ≥ 0, so the constraints become y_i (x_i·w + b) ≥ 1 - ξ_i. To penalize the error terms, add C Σ_i ξ_i to the objective, where C controls the trade-off between the margin and the training error. The new Lagrangian form is given below.
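
Written out (a standard soft-margin formulation, using ξ_i for the slack/error terms and C for the penalty weight):

\begin{align*}
& \min_{w,\,b,\,\xi}\ \; \tfrac{1}{2}\lVert w \rVert^{2} + C \sum_i \xi_i
\qquad \text{subject to} \qquad
y_i (x_i \cdot w + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \\
& L_P = \tfrac{1}{2}\lVert w \rVert^{2} + C \sum_i \xi_i
- \sum_i \alpha_i \big[\, y_i (x_i \cdot w + b) - 1 + \xi_i \,\big]
- \sum_i \mu_i \xi_i ,
\end{align*}

with Lagrange multipliers α_i ≥ 0 for the margin constraints and μ_i ≥ 0 for ξ_i ≥ 0.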

28 Non-separable SVM New KKT Conditions
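
A sketch of those conditions in the standard soft-margin form (the stationarity condition C - α_i - μ_i = 0 is what bounds the multipliers, giving the box constraint 0 ≤ α_i ≤ C in the dual):

\begin{align*}
& w = \sum_i \alpha_i y_i x_i, \qquad \sum_i \alpha_i y_i = 0, \qquad C - \alpha_i - \mu_i = 0, \\
& y_i (x_i \cdot w + b) - 1 + \xi_i \ge 0, \qquad \xi_i \ge 0, \qquad \alpha_i \ge 0, \qquad \mu_i \ge 0, \\
& \alpha_i \big[\, y_i (x_i \cdot w + b) - 1 + \xi_i \,\big] = 0, \qquad \mu_i \xi_i = 0 .
\end{align*}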

