
1 Support Vector Machines
Lecturer: Yishay Mansour
Itay Kirshenbaum

2 Lecture Overview In this lecture we present in detail one of the most theoretically well-motivated and practically effective classification algorithms in modern machine learning: Support Vector Machines (SVMs).

3 Lecture Overview – Cont.
We begin by building the intuition behind SVMs, then define the SVM as an optimization problem and discuss how to solve it efficiently. We conclude with an analysis of the error rate of SVMs using two techniques: Leave-One-Out and VC-dimension.

4 Introduction
The Support Vector Machine is a supervised learning algorithm
Used to learn a hyperplane that solves the binary classification problem
Binary classification is among the most extensively studied problems in machine learning

5 Binary Classification Problem
Input space: $X = \mathbb{R}^d$
Output space: $Y = \{-1, +1\}$
Training data: $S = \{(x_1, y_1), \ldots, (x_m, y_m)\}$, drawn i.i.d. from a distribution $D$
Goal: select a hypothesis $h$ that best predicts the labels of other points drawn i.i.d. from $D$

6 Binary Classification – Cont.
Consider the problem of predicting the success of a new drug based on a patient's height and weight
$m$ ill people are selected and treated
This generates $m$ two-dimensional vectors (height, weight)
Each point is labeled +1 to indicate successful treatment, or -1 otherwise
This can be used as training data

7 Binary Classification – Cont.
There are infinitely many ways to classify the data
Occam's razor – simple classification rules provide better results
We use a linear classifier, i.e. a hyperplane
Our class of linear classifiers: $h(x) = \mathrm{sign}(w \cdot x + b)$

8 Choosing a Good Hyperplane
Intuition: consider two cases of positive classification:
$w \cdot x + b = 0.1$
$w \cdot x + b = 100$
We are more confident in the decision made in the latter case than in the former
Choose a hyperplane with maximal margin

9 Good Hyperplane – Cont.
A linear classifier: $h(x) = \mathrm{sign}(w \cdot x + b)$
Definition: the functional margin of $(w, b)$ with respect to a sample $(x_i, y_i)$ is $\hat{\gamma}_i = y_i (w \cdot x_i + b)$
The functional margin with respect to $S$ is $\hat{\gamma} = \min_i \hat{\gamma}_i$

10 Maximal Margin
$(w, b)$ can be scaled to increase the functional margin
$\mathrm{sign}(w \cdot x + b) = \mathrm{sign}(5w \cdot x + 5b)$ for all $x$
Yet the functional margin of $(5w, 5b)$ is 5 times greater than that of $(w, b)$
Cope by adding an additional constraint: $\|w\| = 1$

11 Maximal Margin – Cont. Geometric Margin
Consider the geometric distance between the hyperplane and the closest points

12 Geometric Margin
Definition: the geometric margin of $(w, b)$ with respect to a sample $(x_i, y_i)$ is $\gamma_i = y_i \left( \frac{w}{\|w\|} \cdot x_i + \frac{b}{\|w\|} \right)$
The geometric margin with respect to $S$ is $\gamma = \min_i \gamma_i$
Relation to the functional margin: $\gamma = \hat{\gamma} / \|w\|$
Both are equal when $\|w\| = 1$
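A quick numeric illustration (the numbers are our own):
\[
w = (3, 4),\ b = 0,\ (x, y) = ((1, 1), +1):\qquad \hat{\gamma} = y(w \cdot x + b) = 7,\qquad \|w\| = 5,\qquad \gamma = \hat{\gamma} / \|w\| = 7/5.
\]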

13 The Algorithm
We saw: two definitions of the margin, and the intuition behind seeking a margin-maximizing hyperplane
Goal: write an optimization program that finds such a hyperplane
We always look for $(w, b)$ maximizing the margin

14 The Algorithm – Take 1
First try: maximize $\gamma$ subject to $y_i (w \cdot x_i + b) \ge \gamma$ for all $i$, and $\|w\| = 1$
For each sample the functional margin is at least $\gamma$
Since $\|w\| = 1$, the functional and geometric margins are the same
The result is the largest possible geometric margin with respect to the training set

15 The Algorithm – Take 2
The first try can't be solved by any off-the-shelf optimization software
The constraint $\|w\| = 1$ is non-linear; in fact, it's even non-convex
How can we discard the constraint? Use the geometric margin: maximize $\hat{\gamma} / \|w\|$ subject to $y_i (w \cdot x_i + b) \ge \hat{\gamma}$ for all $i$

16 The Algorithm – Take 3
We now have a non-convex objective function $\hat{\gamma} / \|w\|$ – the problem remains
Remember: we can scale $(w, b)$ as we wish, so force the functional margin to be $\hat{\gamma} = 1$
The objective becomes: maximize $1 / \|w\|$
Same as: minimize $\frac{1}{2} \|w\|^2$
The factor of $\frac{1}{2}$ and the power of 2 do not change the program – they make things easier

17 The Algorithm – Final Version
The final program: $\min_{w, b} \frac{1}{2} \|w\|^2$ subject to $y_i (w \cdot x_i + b) \ge 1$ for all $i$
The objective is convex (quadratic)
All constraints are linear
Can solve efficiently using standard quadratic programming (QP) software
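Below is a minimal sketch of this final program solved as a generic QP; the CVXOPT package and the toy data are our own choices for illustration, and any QP solver would do. CVXOPT solves $\min \frac{1}{2} z^T P z + q^T z$ subject to $Gz \le h$, so we stack $z = (w, b)$ and encode each constraint $y_i (w \cdot x_i + b) \ge 1$ as $-y_i [x_i, 1] \cdot z \le -1$.

```python
import numpy as np
from cvxopt import matrix, solvers

def hard_margin_svm(X, y):
    """X: (m, d) float array; y: (m,) float array of labels in {-1, +1}."""
    m, d = X.shape
    P = matrix(np.diag(np.r_[np.ones(d), 0.0]))    # (1/2)||w||^2; bias b unpenalized
    q = matrix(np.zeros(d + 1))
    G = matrix(-y[:, None] * np.c_[X, np.ones(m)])  # -y_i [x_i, 1] . z
    h = matrix(-np.ones(m))                         # ... <= -1
    z = np.array(solvers.qp(P, q, G, h)['x']).ravel()
    return z[:d], z[d]                              # w, b

# Toy linearly separable data (our own, for illustration)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = hard_margin_svm(X, y)
print(w, b, np.sign(X @ w + b))   # the predictions recover y
```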

18 Convex Optimization We want to solve the optimization problem more efficiently than generic QP Solution – Use convex optimization techniques

19 Convex Optimization – Cont.
Definition: a function $f$ is convex if for all $x_1, x_2$ and $\lambda \in [0, 1]$: $f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2)$
Theorem: for a convex function, any local minimum is also a global minimum

20 Convex Optimization Problem
We look for a value of $w$ that
Minimizes $f(w)$
Under the constraints $g_i(w) \le 0$ for $i = 1, \ldots, k$

21 Lagrange Multipliers
Used to find a maximum or a minimum of a function subject to constraints
We use them to solve our optimization problem
Definition: the Lagrangian is $L(w, \alpha) = f(w) + \sum_{i=1}^{k} \alpha_i g_i(w)$, where the $\alpha_i \ge 0$ are the Lagrange multipliers
22 Primal Program
Plan: use the Lagrangian to write a program called the Primal Program
It is equal to $f(w)$ if all the constraints are met
Otherwise – it is $\infty$
Definition – Primal Program: $\theta_P(w) = \max_{\alpha \ge 0} L(w, \alpha)$

23 Primal Program – Cont.
The constraints are of the form $g_i(w) \le 0$
If they are met, $\sum_i \alpha_i g_i(w)$ is maximized when all $\alpha_i$ are 0, and the summation is 0, so $\theta_P(w) = f(w)$
Otherwise, some $g_i(w) > 0$ and $L(w, \alpha)$ is maximized by taking $\alpha_i \to \infty$, so $\theta_P(w) = \infty$

24 Primal Program – Cont.
Our convex optimization problem is now: $\min_w \theta_P(w) = \min_w \max_{\alpha \ge 0} L(w, \alpha)$
Define $p^* = \min_w \theta_P(w)$ as the value of the primal program

25 Dual Program
We define the Dual Program as: $\theta_D(\alpha) = \min_w L(w, \alpha)$
We'll look at $\max_{\alpha \ge 0} \theta_D(\alpha) = \max_{\alpha \ge 0} \min_w L(w, \alpha)$
Same as our primal program, but the order of min / max is different
Define $d^* = \max_{\alpha \ge 0} \theta_D(\alpha)$ as the value of our Dual Program

26 Dual Program – Cont.
We want to show $d^* \le p^*$, and identify when $d^* = p^*$
If we find a solution to one problem, we find the solution to the second problem
Start with $d^* \le p^*$: "max min" is always at most "min max"

27 Dual Program – Cont.
Claim: for every $w$ and $\alpha \ge 0$, $\theta_D(\alpha) \le \theta_P(w)$
Proof: $\theta_D(\alpha) = \min_{w'} L(w', \alpha) \le L(w, \alpha) \le \max_{\alpha' \ge 0} L(w, \alpha') = \theta_P(w)$
Conclude: taking the max over $\alpha$ on the left and the min over $w$ on the right gives $d^* \le p^*$

28 Karush-Kuhn-Tucker (KKT) Conditions
The KKT conditions give a characterization of an optimal solution to a convex problem.
Theorem: if $f$ and the $g_i$ are convex and the constraints are strictly feasible, then $w^*, \alpha^*$ are optimal for the primal and dual programs (with $p^* = d^* = L(w^*, \alpha^*)$) if and only if:
$\frac{\partial}{\partial w} L(w^*, \alpha^*) = 0$
$\alpha_i^* g_i(w^*) = 0$ for all $i$ (complementary slackness)
$g_i(w^*) \le 0$ and $\alpha_i^* \ge 0$ for all $i$

29 KKT Conditions – Cont. Proof The other direction holds as well

30 KKT Conditions – Cont. Example
Example: consider an optimization problem with objective $f$ and constraints $g_i(x) \le 0$
The Lagrangian will be $L(x, \alpha) = f(x) + \sum_i \alpha_i g_i(x)$
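To see the conditions in action, here is a minimal one-dimensional instance (the specific $f$ and $g$ are our own choice): minimize $f(x) = x^2$ subject to $g(x) = 1 - x \le 0$.
\[
L(x, \alpha) = x^2 + \alpha (1 - x), \qquad \frac{\partial L}{\partial x} = 2x - \alpha = 0, \qquad \alpha (1 - x) = 0, \qquad \alpha \ge 0.
\]
If $\alpha = 0$ then $x = 0$, which violates $g(x) \le 0$; so complementary slackness forces $x^* = 1$, and then $\alpha^* = 2$. All the KKT conditions hold at $(x^*, \alpha^*) = (1, 2)$, which is indeed the constrained minimum.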

31 Optimal Margin Classifier
Back to SVM: rewrite our optimization program with constraints $g_i(w, b) = 1 - y_i (w \cdot x_i + b) \le 0$
Following the KKT conditions, $\alpha_i > 0$ only for points in the training set with a functional margin of exactly 1
These are the support vectors of the training set

32 Optimal Margin – Cont.
Figure: an optimal margin classifier and its support vectors

33 Optimal Margin – Cont.
Construct the Lagrangian: $L(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{m} \alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right]$
To find the dual form, first minimize $L$ over $w$ and $b$ to get $\theta_D(\alpha)$
Do so by setting the derivatives to zero

34 Optimal Margin – Cont.
Take the derivative with respect to $w$: $\nabla_w L = w - \sum_i \alpha_i y_i x_i = 0$, so $w = \sum_i \alpha_i y_i x_i$
Take the derivative with respect to $b$: $\frac{\partial L}{\partial b} = -\sum_i \alpha_i y_i = 0$
Use these in the Lagrangian; we saw the last term is zero
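Carrying out the substitution explicitly (a standard computation the slide compresses into one step):
\[
L = \frac{1}{2} \Big\| \sum_i \alpha_i y_i x_i \Big\|^2 - \sum_i \alpha_i \Big[ y_i \Big( \sum_j \alpha_j y_j x_j \cdot x_i + b \Big) - 1 \Big]
= \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) - b \sum_i \alpha_i y_i,
\]
and the last term vanishes since $\sum_i \alpha_i y_i = 0$, leaving $W(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$.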

35 Optimal Margin – Cont.
The dual optimization problem: maximize $W(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$ subject to $\alpha_i \ge 0$ and $\sum_i \alpha_i y_i = 0$
The KKT conditions hold, so we can solve by finding the $\alpha$ that maximizes $W(\alpha)$
Assuming we have such an $\alpha$ – define $w = \sum_i \alpha_i y_i x_i$
This is the solution to the primal problem

36 Optimal Margin – Cont.
Still need to find $b$
Assume $x_s$ is a support vector, so $y_s (w \cdot x_s + b) = 1$
We get $b = y_s - w \cdot x_s$
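A sketch of the dual program solved the same way (CVXOPT is again our own choice of solver). CVXOPT minimizes $\frac{1}{2} \alpha^T P \alpha + q^T \alpha$ subject to $G\alpha \le h$, $A\alpha = b$, so we negate $W(\alpha)$ and encode $\alpha_i \ge 0$ and $\sum_i \alpha_i y_i = 0$.

```python
import numpy as np
from cvxopt import matrix, solvers

def svm_dual(X, y):
    """X: (m, d) float array; y: (m,) float array of labels in {-1, +1}."""
    m = X.shape[0]
    Yx = y[:, None] * X                              # rows are y_i * x_i
    P = matrix(Yx @ Yx.T)                            # P_ij = y_i y_j (x_i . x_j)
    q = matrix(-np.ones(m))                          # minimize -sum_i alpha_i + ...
    G, h = matrix(-np.eye(m)), matrix(np.zeros(m))   # -alpha_i <= 0
    A, b = matrix(y[None, :]), matrix(0.0)           # sum_i alpha_i y_i = 0
    alpha = np.array(solvers.qp(P, q, G, h, A, b)['x']).ravel()
    w = Yx.T @ alpha                                 # w = sum_i alpha_i y_i x_i
    s = np.argmax(alpha)                             # index of a support vector
    return w, y[s] - w @ X[s]                        # b = y_s - w . x_s
```

On separable data this recovers the same $(w, b)$ as the primal sketch above, and the nonzero $\alpha_i$ mark the support vectors.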

37 Error Analysis Using Leave-One-Out
The Leave-One-Out (LOO) method:
Remove one point at a time from the training set
Calculate an SVM for the remaining points
Test the result using the removed point
Definition: the indicator function $I(\exp)$ is 1 if $\exp$ is true, otherwise 0; the LOO error is $R_{LOO}(S) = \frac{1}{m} \sum_{i=1}^{m} I\left( h_{S \setminus \{(x_i, y_i)\}}(x_i) \ne y_i \right)$
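A minimal sketch of this procedure in Python; using scikit-learn's SVC with a very large C to approximate the hard-margin SVM is our own assumption.

```python
import numpy as np
from sklearn.svm import SVC

def loo_error(X, y):
    m = len(X)
    mistakes = 0
    for i in range(m):
        mask = np.arange(m) != i                             # remove point i
        clf = SVC(kernel="linear", C=1e6).fit(X[mask], y[mask])
        mistakes += int(clf.predict(X[i:i + 1])[0] != y[i])  # I(h(x_i) != y_i)
    return mistakes / m                                      # R_LOO(S)
```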

38 LOO Error Analysis – Cont.
Expected error: $E_{S \sim D^m}[R_{LOO}(S)] = E_{S' \sim D^{m-1}}[R(h_{S'})]$
It follows that the expected LOO error for a training set of size $m$ equals the expected generalization error of a hypothesis trained on a set of size $m - 1$

39 LOO Error Analysis – Cont.
Theorem: $E_{S' \sim D^{m-1}}[R(h_{S'})] \le E_{S \sim D^m}\left[ \frac{N_{SV}(S)}{m} \right]$, where $N_{SV}(S)$ is the number of support vectors of the SVM trained on $S$
Proof idea: removing a point that is not a support vector does not change the solution, so the LOO test can only err on support vectors; hence $R_{LOO}(S) \le N_{SV}(S) / m$, and taking expectations completes the proof

40 Generalization Bounds Using VC-dimension
Theorem: the class of hyperplanes with geometric margin at least $\gamma$ on points contained in a ball of radius $R$ has VC-dimension at most $R^2 / \gamma^2$
Proof

41 Generalization Bounds Using VC-dimension – Cont.
Proof – Cont.

