Presentation on theme: "Machine Learning Week 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour."— Presentation transcript:

1 Machine Learning Week 4 Lecture 2

2 Hand-in Data It is online. Only around 6000 images!!! The deadline is one week. Next Thursday's lecture will be only one hour and only about the hand-in; if you nailed it you can stay home.

3 Support Vector Machines Last Time Today

4 Functional Margins For each point we define the functional margin. Define the functional margin of the hyperplane, i.e. of the parameters w, b, as a "functional distance".
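The definitions on this slide are shown only as formulas; the standard definitions they presumably correspond to are:

    \hat{\gamma}_i = y_i (w^T x_i + b)        (functional margin of point (x_i, y_i))
    \hat{\gamma} = \min_i \hat{\gamma}_i      (functional margin of the hyperplane w, b)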

5 Geometric Margin How far is x_i from the hyperplane? How long is the segment from x_i to L, its projection onto the hyperplane? Since L is on the hyperplane, substitute the definition of L, multiply in, and solve.
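The derivation appears only as formulas on the slide; reconstructed in the usual way, with γ_i the geometric margin of x_i:

    L = x_i - \gamma_i \frac{w}{\|w\|}, \qquad w^T L + b = 0
    \Rightarrow\ w^T \left( x_i - \gamma_i \frac{w}{\|w\|} \right) + b = 0
    \Rightarrow\ \gamma_i = \frac{w^T x_i + b}{\|w\|}    (multiplied by y_i to make it positive for correctly classified points)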

6 Margins, functional and geometric: related by ||w||, i.e. geometric margin = functional margin / ||w||.

7 Optimizing Margins Maximize the geometric margin, subject to the point margin constraints and a scale constraint on w, b.

8 Optimization Minimize, subject to the constraints below: a convex quadratic program. Functional margin = 1 means the point is sitting on the margin.
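The objective and constraints are shown only as formulas; after scaling w, b so that the smallest functional margin is 1, the standard maximum-margin QP they presumably correspond to is:

    \min_{w, b}\ \tfrac{1}{2} \|w\|^2
    \text{s.t.}\ \ y_i (w^T x_i + b) \ge 1, \quad i = 1, \dots, m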

9 Linearly Separable SVM Minimize, subject to: the constrained problem above. We need to study the theory of Lagrange multipliers to understand the SVM.

10 Lagrange Multipliers Define the Lagrangian. Only consider convex f, g_i and affine h_i (the method is more general). α, β are called Lagrange multipliers.
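The Lagrangian on the slide is shown as a formula; for the problem min f(x) s.t. g_i(x) ≤ 0 and h_i(x) = 0 it is presumably the standard

    \mathcal{L}(x, \alpha, \beta) = f(x) + \sum_i \alpha_i g_i(x) + \sum_i \beta_i h_i(x)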

11 Primal Problem Which is what we are looking for!!! We denote the solution x*
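The primal construction referred to here is presumably the standard one:

    \theta_P(x) = \max_{\alpha \ge 0,\ \beta} \mathcal{L}(x, \alpha, \beta) = \begin{cases} f(x) & \text{if } x \text{ is feasible} \\ +\infty & \text{otherwise} \end{cases}
    p^* = \min_x \theta_P(x) = f(x^*)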

12 Dual Problem α, β are dual feasible if α_i ≥ 0 for all i. For dual feasible α, β this implies the bound below.
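The dual function and the weak-duality bound the slide refers to are presumably:

    \theta_D(\alpha, \beta) = \min_x \mathcal{L}(x, \alpha, \beta)
    \theta_D(\alpha, \beta) \le \mathcal{L}(x^*, \alpha, \beta) \le f(x^*) = p^*
    d^* = \max_{\alpha \ge 0,\ \beta} \theta_D(\alpha, \beta) \le p^*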

13 Weak and Strong Duality Weak duality: d* ≤ p*. Question: when are they equal? Under technical assumptions (convexity plus a constraint qualification such as Slater's condition) we can assume strong duality, d* = p*.

14 Complementary Slackness Let x* be primal optimal and α*, β* dual optimal (p* = d*). The terms α_i* g_i(x*) all have the same sign, and their sum must be zero since it is squeezed between p* and p*, so each term is zero for all i: complementary slackness.
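The squeezing argument presumably runs as follows (using h_i(x*) = 0, α_i* ≥ 0 and g_i(x*) ≤ 0):

    p^* = d^* = \theta_D(\alpha^*, \beta^*) \le \mathcal{L}(x^*, \alpha^*, \beta^*) = f(x^*) + \sum_i \alpha_i^* g_i(x^*) + \sum_i \beta_i^* h_i(x^*) \le f(x^*) = p^*
    \Rightarrow\ \sum_i \alpha_i^* g_i(x^*) = 0\ \Rightarrow\ \alpha_i^* g_i(x^*) = 0 \text{ for all } i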

15 Karush-Kuhn-Tucker (KKT) Conditions Let x* be primal optimal and α*, β* dual optimal (p* = d*): g_i(x*) ≤ 0 for all i and h_i(x*) = 0 for all i (primal feasibility); α_i* ≥ 0 for all i (dual feasibility); α_i* g_i(x*) = 0 for all i (complementary slackness); and stationarity. The KKT conditions are necessary and sufficient for optimality.
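The stationarity condition, shown only as a formula on the slide, is presumably

    \nabla_x \mathcal{L}(x^*, \alpha^*, \beta^*) = \nabla f(x^*) + \sum_i \alpha_i^* \nabla g_i(x^*) + \sum_i \beta_i^* \nabla h_i(x^*) = 0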

16 Finally Back To SVM Minimize, subject to the separable-SVM constraints. Define the Lagrangian (no β required, since there are no equality constraints).
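Writing the constraints as g_i(w, b) = 1 - y_i(w^T x_i + b) ≤ 0, the Lagrangian defined here is presumably

    \mathcal{L}(w, b, \alpha) = \tfrac{1}{2} \|w\|^2 - \sum_i \alpha_i \left[ y_i (w^T x_i + b) - 1 \right]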

17 SVM Summary The dual, subject to its constraints; the support vectors determine w (see below).
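The dual and the expression for w, shown only as formulas, are presumably the standard ones:

    \max_{\alpha}\ \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j\, x_i^T x_j
    \text{s.t.}\ \ \alpha_i \ge 0, \quad \sum_i \alpha_i y_i = 0
    w = \sum_i \alpha_i y_i x_i,\ \ \text{the support vectors being the } x_i \text{ with } \alpha_i > 0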

18 SVM Generalization The VC dimension for hyperplanes is the number of parameters. Theoretically speaking: why bother finding large-margin hyperplanes? There are other bounds: rich theory.

19 Kernel Support Vector Machines

20 Kernels Nonlinear feature transforms. Define the kernel and replace the inner products (see below). The two optimization problems are identical!!! The kernel is an inner product in another space.
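The substitution the slide refers to is presumably: define K(x, z) = Φ(x)^T Φ(z) for a feature map Φ and replace x_i^T x_j with K(x_i, x_j) in the dual,

    \max_{\alpha}\ \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j\, K(x_i, x_j)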

21 Kernels K is an inner product in Φ space!

22 Polynomial Kernel A feature space whose dimension grows like n^d!!! Computing the feature transform explicitly would take n^d time; computing the kernel takes n time.
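A small sketch (my own illustration, not from the slides) of why the kernel is cheap: for the degree-2 polynomial kernel K(x, z) = (x^T z)^2, the explicit feature map contains all pairwise products x_i x_j, yet the kernel value only needs one n-dimensional dot product.

    import numpy as np

    def phi_degree2(x):
        # Explicit degree-2 feature map: all products x_i * x_j (n^2 features).
        return np.outer(x, x).ravel()

    def poly2_kernel(x, z):
        # Kernel trick: (x^T z)^2 computed with a single O(n) dot product.
        return np.dot(x, z) ** 2

    rng = np.random.default_rng(0)
    x, z = rng.standard_normal(5), rng.standard_normal(5)
    # Both routes give the same inner product, but the kernel never builds the n^2 features.
    print(np.dot(phi_degree2(x), phi_degree2(z)))  # explicit feature space
    print(poly2_kernel(x, z))                      # kernel value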

23 Gaussian Kernel Think of this as a similarity measure It is essentially 0 if x and z are not close
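The definition behind this slide is presumably the standard Gaussian (RBF) kernel,

    K(x, z) = \exp\!\left( -\frac{\|x - z\|^2}{2 \sigma^2} \right)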

24 Gaussian Kernel Nonlinear Transform Simplest case: x, z are 1D, i.e. numbers. The kernel is an inner product between infinitely long feature-mapped x and z.
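Reconstructing the usual 1D argument (with the bandwidth chosen so the exponent is -(x - z)^2), the Taylor expansion exhibits the infinite feature map:

    e^{-(x - z)^2} = e^{-x^2} e^{-z^2} e^{2xz} = \sum_{k=0}^{\infty} \left( e^{-x^2} \sqrt{\tfrac{2^k}{k!}}\, x^k \right) \left( e^{-z^2} \sqrt{\tfrac{2^k}{k!}}\, z^k \right) = \Phi(x)^T \Phi(z)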

25 Let's Apply It

26-30 (Figure-only slides; no transcript text survives.)

31 Kernel Matrix Given points and a kernel K(x, z) = Φ(x)^T Φ(z), we form the kernel matrix (same name). If K is a valid kernel, i.e. K(x, z) = Φ(x)^T Φ(z) for some Φ, then the kernel matrix K is symmetric positive semidefinite (x^T K x ≥ 0). Mercer kernels: positive semidefiniteness is a sufficient and necessary condition.
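A small sketch (my own, not from the slides) that builds a kernel matrix for a set of points and checks the symmetric positive semidefinite property numerically, using the Gaussian kernel as an example:

    import numpy as np

    def gaussian_kernel(x, z, sigma=1.0):
        # K(x, z) = exp(-||x - z||^2 / (2 sigma^2))
        return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

    rng = np.random.default_rng(1)
    X = rng.standard_normal((20, 3))                # 20 points in R^3
    K = np.array([[gaussian_kernel(a, b) for b in X] for a in X])

    print(np.allclose(K, K.T))                      # symmetric
    print(np.linalg.eigvalsh(K).min() >= -1e-10)    # eigenvalues >= 0 up to round-off, i.e. PSD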

32 Kernels add nonlinearity to our SVM and give efficient computation in high- and even infinite-dimensional spaces. Few support vectors (on the margin) help us in generalization (in theory) and runtime. Kernels are not limited to SVMs: kernel perceptrons, kernel logistic regression, …, any place where we only depend on the inner product.

33 Non-Separable Data SVM

34 Violating the Margin A point on the wrong side of the track, by a distance of ξ.

35 Minimize, subject to the soft-margin constraints below. If a point is on the wrong side of the margin at distance ξ we penalize by Cξ. The hyperparameter C controls the competing goals of a large margin and points being on the right side of it. How to find C? Validation (model selection). Does this look like regularization to you?
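The soft-margin primal these slides presumably refer to is the standard one:

    \min_{w, b, \xi}\ \tfrac{1}{2} \|w\|^2 + C \sum_i \xi_i
    \text{s.t.}\ \ y_i (w^T x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0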

36 Effect of C C=1 C=100

37 Minimize, subject to: the primal variables are w, b, ξ and the Lagrange multipliers are α, β.
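With multipliers α_i for the margin constraints and β_i for ξ_i ≥ 0, the Lagrangian presumably shown here is

    \mathcal{L}(w, b, \xi, \alpha, \beta) = \tfrac{1}{2} \|w\|^2 + C \sum_i \xi_i - \sum_i \alpha_i \left[ y_i (w^T x_i + b) - 1 + \xi_i \right] - \sum_i \beta_i \xi_i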

38 Defining the Problems The primal and the dual, with primal optimum p* and dual optimum d*.

39 Find the minimizing w, b, ξ: use gradients. This yields a new constraint (see below).
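Setting the gradients of the Lagrangian to zero presumably gives the standard conditions:

    \nabla_w \mathcal{L} = 0\ \Rightarrow\ w = \sum_i \alpha_i y_i x_i
    \partial \mathcal{L} / \partial b = 0\ \Rightarrow\ \sum_i \alpha_i y_i = 0
    \partial \mathcal{L} / \partial \xi_i = 0\ \Rightarrow\ \beta_i = C - \alpha_i\ \ \text{(so } 0 \le \alpha_i \le C \text{, the new constraint)}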

40 Constraints

41 Look Familiar!

42 Subject to the constraints below. When done optimizing, set β = C - α. A convex quadratic program.
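The resulting dual, shown only as formulas on the slide, is presumably (with K(x_i, x_j) = x_i^T x_j in the linear case):

    \max_{\alpha}\ \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j\, K(x_i, x_j)
    \text{s.t.}\ \ 0 \le \alpha_i \le C, \quad \sum_i \alpha_i y_i = 0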

43 KKT Complementary Slackness The optimal solution must satisfy complementary slackness for all the inequality constraints, which we know from the KKT conditions above.

44 On the margin, on the right side, on the wrong side. Find b*: use a point on the margin. In practice: average over the margin points.
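Reconstructing the usual recipe: complementary slackness implies that a point with 0 < α_i < C sits exactly on the margin (ξ_i = 0), so

    y_i (w^T x_i + b^*) = 1\ \Rightarrow\ b^* = y_i - w^T x_i \quad \text{for any margin point } x_i,

and in practice b* is averaged over all points with 0 < α_i < C.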

45 (Formula-only slide; no transcript text survives.)

46 Coordinate Ascent Pick a coordinate, fix the others, solve for the picked one, repeat until done.
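A minimal sketch of coordinate ascent (my own illustration, not from the slides), maximizing a simple concave quadratic W(a) = -0.5 a^T Q a + c^T a one coordinate at a time; the exact 1D update here is specific to this toy objective:

    import numpy as np

    def coordinate_ascent(Q, c, steps=100):
        # Maximize W(a) = -0.5 * a^T Q a + c^T a (Q symmetric positive definite)
        # by repeatedly optimizing one coordinate while holding the rest fixed.
        a = np.zeros(len(c))
        for _ in range(steps):
            for i in range(len(c)):
                # dW/da_i = c_i - Q[i] @ a = 0, solved exactly for a_i.
                a[i] = (c[i] - Q[i] @ a + Q[i, i] * a[i]) / Q[i, i]
        return a

    Q = np.array([[2.0, 0.5], [0.5, 1.0]])
    c = np.array([1.0, 1.0])
    print(coordinate_ascent(Q, c))   # converges toward the maximizer Q^{-1} c
    print(np.linalg.solve(Q, c))     # closed-form maximizer for comparison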

47 Sequential Minimal Optimization (SMO) Algorithm Subject to the equality constraint, coordinate ascent cannot change only one variable, so take two.

48 Algorithm Outline Pick 2 indices, fix the non-picked α's, optimize W for the selected α's subject to the additional constraint, repeat until done.
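A structural sketch of the outline above (my own pseudocode-style Python, not from the slides; the hypothetical helper optimize_pair stands in for the closed-form two-variable update described on the following slides):

    import numpy as np

    def smo_outline(alpha, y, C, optimize_pair, max_passes=10):
        # alpha: dual variables, y: labels in {-1, +1}, C: box bound.
        # optimize_pair(alpha, i, j) is assumed to update alpha[i], alpha[j] jointly,
        # keeping sum(alpha * y) == 0 and 0 <= alpha <= C, and to return True if it moved.
        m = len(alpha)
        rng = np.random.default_rng(0)
        for _ in range(max_passes):
            changed = False
            for i in range(m):               # e.g. an alpha violating the KKT conditions
                j = int(rng.integers(m))     # heuristic choice of the partner index
                if j != i:
                    changed = optimize_pair(alpha, i, j) or changed
            if not changed:                  # close enough to the KKT conditions: stop
                break
        return alpha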

49 Linear Equation in α_1, α_2 The constraints we have are 0 ≤ α_1, α_2 ≤ C plus the linear equality, so the feasible pair (α_1, α_2) lies on a line segment inside the box [0, C] x [0, C], between bounds L and H.

50 y_1 is either 1 or -1 (so y_1^2 = 1 and α_1 can be expressed as a linear function of α_2). Optimize over α_2 subject to L ≤ α_2 ≤ H.

51 Grouping the terms of the objective by their indices (i = j = 1; i = 1, j = 2 and i = 2, j = 1; i = j = 2; i = 1 or 2 paired with some j > 2; and the remaining terms): the point is that, after substituting for α_1, the objective is a second-degree polynomial in α_2.

52 = a second-degree polynomial in α_2. We can maximize such things in closed form and then clip the result back into [L, H].
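A tiny sketch of that step (my own, with made-up coefficients): maximize a concave quadratic a*alpha^2 + b*alpha + c over an interval [L, H] by taking the unconstrained maximizer and clipping it.

    import numpy as np

    def maximize_quadratic_on_interval(a, b, L, H):
        # Maximize f(alpha) = a*alpha^2 + b*alpha + c on [L, H], assuming a < 0 (concave).
        unconstrained = -b / (2 * a)                 # stationary point of the quadratic
        return float(np.clip(unconstrained, L, H))   # clip into the feasible interval

    # Example: f(alpha) = -2*alpha^2 + 3*alpha, feasible interval [0, 0.5]
    print(maximize_quadratic_on_interval(-2.0, 3.0, 0.0, 0.5))  # 0.5 (clipped from 0.75)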

53 Remains How to pick the α's – pick one that violates the KKT conditions (or use a heuristic) – pick another one and optimize. Stopping criterion – close enough to the KKT conditions, or tired of waiting.

54 The End of SVMs Except you will use them in hand-in 2…

