Support Vector Machines
Joseph Gonzalez

From a linear classifier to...
*One of the most famous slides you will see, ever!

[Figure: scatter of O and X training examples]

Maximum margin
Maximum possible separation between positive and negative training examples.
*One of the most famous slides you will see, ever!
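To make "maximum possible separation" concrete, here is a short sketch. It assumes the separator is written as w·x + b = 0 and scaled so that the closest examples satisfy |w·x + b| = 1 (notation assumed here, matching the primal form on a later slide).

```latex
% Distance from a point x_0 to the hyperplane w.x + b = 0
\operatorname{dist}(x_0) = \frac{|w \cdot x_0 + b|}{\|w\|}

% With the closest positive and negative examples at |w.x + b| = 1,
% the separation between the two classes is
\text{margin} = \frac{1}{\|w\|} + \frac{1}{\|w\|} = \frac{2}{\|w\|}
```

Under that scaling, maximizing the margin is the same as minimizing ||w||, which is why ||w||^2 appears in the objective.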

Geometric Intuition
[Figure: O and X training examples, with the SUPPORT VECTORS labeled]

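Primal Version
$$\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\|w\|^2 + C\sum_i \xi_i \quad \text{s.t.}\quad y_i\,(w\cdot x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0$$
(The factor 1/2 is the usual convention; it is what makes the constraint C ≥ α_i ≥ 0 in the dual come out with the same C.)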
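As a rough illustration of what the primal objective measures, the sketch below evaluates it for a given (w, b). It assumes the conventional 1/2 factor on ||w||^2 and sets each slack to the smallest feasible value, max(0, 1 - y_i(w·x_i + b)). The function names and the toy data are hypothetical.

```python
def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def primal_objective(w, b, X, y, C):
    """Return (objective, slacks) for the soft-margin primal.

    Each slack is the smallest value satisfying both constraints:
    xi_i = max(0, 1 - y_i * (w.x_i + b)).
    """
    slacks = [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]
    return 0.5 * dot(w, w) + C * sum(slacks), slacks


# Hypothetical toy data: one positive and one negative example.
X = [(0.0, 1.0), (2.0, 2.0)]
y = [+1, -1]

# With w = 0 and b = 0 every constraint needs slack 1, so the cost is C * n.
value, slacks = primal_objective((0.0, 0.0), 0.0, X, y, C=10.0)
print(value, slacks)
```

A larger C makes slack more expensive, pushing the solution toward fewer margin violations at the cost of a larger ||w||.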

DUAL Version
Where did this come from?
Remember Lagrange multipliers.
Let us “incorporate” the constraints into the objective.
Then solve the problem in the “dual” space of Lagrange multipliers.
$$\max_{\alpha}\ \sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j\, y_i y_j\,(x_i \cdot x_j) \quad \text{s.t.}\quad \sum_i \alpha_i y_i = 0,\quad C \ge \alpha_i \ge 0$$
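One way to fill in the "where did this come from?" step is sketched below. It assumes the primal is written with the conventional 1/2 factor on ||w||^2, and it introduces multipliers μ_i for the constraints ξ_i ≥ 0 (a symbol not used on the slides).

```latex
% Lagrangian: constraints folded into the objective
L(w, b, \xi, \alpha, \mu) = \tfrac{1}{2}\|w\|^2 + C\sum_i \xi_i
  - \sum_i \alpha_i \bigl[\, y_i (w \cdot x_i + b) - 1 + \xi_i \,\bigr]
  - \sum_i \mu_i \xi_i, \qquad \alpha_i \ge 0,\ \mu_i \ge 0

% Stationarity in the primal variables
\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_i \alpha_i y_i x_i
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_i \alpha_i y_i = 0
\frac{\partial L}{\partial \xi_i} = 0 \;\Rightarrow\; C - \alpha_i - \mu_i = 0
  \;\Rightarrow\; 0 \le \alpha_i \le C

% Substituting back leaves a problem in \alpha only
\max_{\alpha}\ \sum_i \alpha_i
  - \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j\, y_i y_j\,(x_i \cdot x_j)
```

The training points enter the final expression only through the inner products x_i · x_j.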

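Primal vs Dual
Number of parameters? The primal has one weight per feature (plus b and the slacks ξ_i); the dual has one α_i per training example.
Large # of features? Large # of examples?
For a large # of features, the DUAL is preferred.
Many α_i can go to zero!
Primal: $$\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\|w\|^2 + C\sum_i \xi_i \quad \text{s.t.}\quad y_i\,(w\cdot x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0$$
Dual: $$\max_{\alpha}\ \sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j\, y_i y_j\,(x_i \cdot x_j) \quad \text{s.t.}\quad \sum_i \alpha_i y_i = 0,\quad C \ge \alpha_i \ge 0$$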

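DUAL: the “Support vector” version
$$\max_{\alpha}\ \sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j\, y_i y_j\,(x_i \cdot x_j) \quad \text{s.t.}\quad \sum_i \alpha_i y_i = 0,\quad C \ge \alpha_i \ge 0$$
How do we find α? Quadratic programming.
How do we find C? Cross-validation!
Wait... how do we predict y for a new point x?
$$y = \operatorname{sign}(w \cdot x + b)$$
How do we find w?
$$w = \sum_i \alpha_i y_i x_i$$
How do we find b? From any support vector x_i with 0 < α_i < C, which lies exactly on the margin: b = y_i - w·x_i.
Substituting w, prediction needs only inner products between the training points and the new point x:
$$y = \operatorname{sign}\Bigl(\sum_i \alpha_i y_i\,(x_i \cdot x) + b\Bigr)$$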
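The prediction rule can be sketched in a few lines of plain Python. This is a minimal illustration, not a solver: the α values are assumed to come from a quadratic-programming step that is not shown, and all function names are hypothetical.

```python
def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def weights_from_dual(alphas, X, y):
    """w = sum_i alpha_i * y_i * x_i."""
    w = [0.0] * len(X[0])
    for a, xi, yi in zip(alphas, X, y):
        for k, xik in enumerate(xi):
            w[k] += a * yi * xik
    return w


def bias_from_support_vector(w, x_sv, y_sv):
    """b = y - w.x, valid for a support vector lying exactly on the margin."""
    return y_sv - dot(w, x_sv)


def predict(alphas, X, y, b, x_new):
    """sign(sum_i alpha_i * y_i * (x_i . x_new) + b), using only inner products."""
    score = sum(a * yi * dot(xi, x_new) for a, xi, yi in zip(alphas, X, y)) + b
    return 1 if score >= 0 else -1


# Assumed inputs: two training points and multipliers from some QP solver.
X = [(0.0, 1.0), (2.0, 2.0)]
y = [+1, -1]
alphas = [0.4, 0.4]

w = weights_from_dual(alphas, X, y)
b = bias_from_support_vector(w, X[0], y[0])
print(predict(alphas, X, y, b, (0.0, 0.0)), predict(alphas, X, y, b, (3.0, 3.0)))
```

Points with α_i = 0 contribute nothing to the sum, so only the support vectors need to be stored for prediction.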

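“Support Vector”s?
[Figure: two training points, O at x_1 = (0,1) with multiplier α_1 and X at x_2 = (2,2) with multiplier α_2]
Here y_1 = +1 (O) and y_2 = -1 (X).
$$\max_{\alpha}\ \sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j\, y_i y_j\,(x_i \cdot x_j) \quad \text{s.t.}\quad \sum_i \alpha_i y_i = 0,\quad C \ge \alpha_i \ge 0$$
Plugging in the two points:
$$\max\ \sum_i \alpha_i - \alpha_1\alpha_2(-1)(0+2) - \tfrac{1}{2}\alpha_1^2(1)(0+1) - \tfrac{1}{2}\alpha_2^2(1)(4+4)$$
$$\max\ \alpha_1 + \alpha_2 + 2\alpha_1\alpha_2 - \tfrac{1}{2}\alpha_1^2 - 4\alpha_2^2 \quad \text{s.t.}\quad \alpha_1 - \alpha_2 = 0,\quad C \ge \alpha_i \ge 0$$
With α_1 = α_2 = α:
$$\max\ 2\alpha - \tfrac{5}{2}\alpha^2 = \max\ \tfrac{5}{2}\,\alpha\,\bigl(\tfrac{4}{5} - \alpha\bigr)$$
[Figure: plot of the objective against α, with α = 0, 2/5 and 4/5 marked]
The parabola is zero at α = 0 and α = 4/5, so it peaks halfway: α_1 = α_2 = 2/5 (assuming C ≥ 2/5).
$$w = \sum_i \alpha_i y_i x_i = 0.4\,([0\ \ 1] - [2\ \ 2]) = 0.4\,[-2\ \ -1]$$
$$y = w \cdot x + b \quad\Rightarrow\quad b = y - w \cdot x$$
Using x_1: b = 1 - 0.4 [-2 -1]·[0 1] = 1 + 0.4 = 1.4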
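The arithmetic in this example can be checked numerically. The sketch below evaluates the general dual objective for the two points and scans α_1 = α_2 = α over a grid. The grid search stands in for a real quadratic-programming solver, and C = 1 is an assumed value (any C ≥ 2/5 gives the same answer).

```python
def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def dual_objective(alphas, X, y):
    """sum_i alpha_i - 1/2 * sum_ij alpha_i alpha_j y_i y_j (x_i . x_j)."""
    n = len(X)
    quad = sum(
        alphas[i] * alphas[j] * y[i] * y[j] * dot(X[i], X[j])
        for i in range(n)
        for j in range(n)
    )
    return sum(alphas) - 0.5 * quad


X = [(0.0, 1.0), (2.0, 2.0)]  # x_1 (O) and x_2 (X) from the slide
y = [+1, -1]
C = 1.0  # assumed upper bound on each alpha

# The constraint alpha_1 - alpha_2 = 0 leaves one free variable, so scan it.
grid = [C * k / 1000 for k in range(1001)]
best_alpha = max(grid, key=lambda a: dual_objective([a, a], X, y))

# Recover w and b exactly as on the slide.
w = [best_alpha * (y[0] * X[0][k] + y[1] * X[1][k]) for k in range(2)]
b = y[0] - dot(w, X[0])
print(best_alpha, w, b)
```

Up to floating-point rounding, the result should agree with the hand calculation: α = 2/5, w = 0.4 [-2 -1], b = 1.4.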

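“Support Vector”s?
[Figure: the same two points, O at (0,1) with multiplier α_1 and X at (2,2) with multiplier α_2, plus a third O point with multiplier α_3]
$$\max_{\alpha}\ \sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j\, y_i y_j\,(x_i \cdot x_j) \quad \text{s.t.}\quad \sum_i \alpha_i y_i = 0,\quad C \ge \alpha_i \ge 0$$
What is α_3? Try this at home.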

Playing With SVMs
http://www.csie.ntu.edu.tw/~cjlin/libsvm/
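LIBSVM's command-line tools read training data in a sparse text format: one example per line, written as a label followed by index:value pairs with 1-based feature indices, where zero-valued features may be omitted. The helper below is a hypothetical convenience for producing such lines from dense feature tuples; it is not part of LIBSVM itself.

```python
def to_libsvm_line(label, features):
    """Format one example as '<label> <index>:<value> ...' with 1-based indices.

    Zero-valued features are skipped, as the sparse format allows.
    """
    parts = [str(label)]
    for index, value in enumerate(features, start=1):
        if value != 0:
            parts.append(f"{index}:{value:g}")
    return " ".join(parts)


# The two points from the worked example, as they might appear in a data file.
for label, x in [(1, (0, 1)), (-1, (2, 2))]:
    print(to_libsvm_line(label, x))
```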

More on Kernels
Kernels represent inner products
– K(a,b) = a.b
– K(a,b) = φ(a).φ(b)
The kernel trick allows an extremely complex φ( ) while keeping K(a,b) simple.
Goal: avoid having to directly construct φ( ) at any point in the algorithm.
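A small sketch can make the kernel trick concrete. The quadratic kernel K(a,b) = (a.b)^2 on 2-D inputs is chosen here purely as an illustration; it does not come from the slides. Its explicit feature map is φ(a) = (a_1^2, √2 a_1 a_2, a_2^2), and the code checks that both routes give the same number.

```python
import math


def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def quadratic_kernel(a, b):
    """K(a, b) = (a . b)^2, computed without ever building phi."""
    return dot(a, b) ** 2


def phi(a):
    """Explicit feature map for the quadratic kernel on 2-D inputs."""
    a1, a2 = a
    return (a1 * a1, math.sqrt(2.0) * a1 * a2, a2 * a2)


a, b = (1.0, 2.0), (3.0, 4.0)
print(quadratic_kernel(a, b), dot(phi(a), phi(b)))
```

The kernel needs one inner product in the 2-D input space, while the explicit route builds 3-D feature vectors first. For higher-degree kernels or higher-dimensional inputs, that gap grows quickly.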

Kernels
The complexity of the optimization problem depends only on the dimensionality of the input space, not on that of the feature space!

Can we use Kernels to Measure Distances?
Can we measure the distance between φ(a) and φ(b) using K(a,b)?

Continued:
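One way to see that the answer is yes, sketched using only the definition K(a,b) = φ(a).φ(b), is to expand the squared distance:

```latex
\|\phi(a) - \phi(b)\|^2
  = \phi(a)\cdot\phi(a) - 2\,\phi(a)\cdot\phi(b) + \phi(b)\cdot\phi(b)
  = K(a,a) - 2\,K(a,b) + K(b,b)
```

The distance in feature space therefore needs three kernel evaluations and no explicit φ( ).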

Popular Kernel Methods
Gaussian Processes
Kernel Regression (Smoothing)
– Nadaraya-Watson Kernel Regression
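As a brief illustration of the last item, Nadaraya-Watson regression predicts a kernel-weighted average of the training targets. The sketch below assumes 1-D inputs, a Gaussian kernel, and a hand-picked bandwidth; the function names and data are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Unnormalized Gaussian weight; the normalizer cancels in the ratio below."""
    return math.exp(-0.5 * u * u)


def nadaraya_watson(x_query, xs, ys, bandwidth=1.0):
    """Kernel-weighted average: sum_i K((x - x_i)/h) y_i / sum_i K((x - x_i)/h).

    Note: for a query very far from all training points the weights can
    underflow to zero, which this sketch does not guard against.
    """
    weights = [gaussian_kernel((x_query - xi) / bandwidth) for xi in xs]
    total = sum(weights)
    return sum(wi * yi for wi, yi in zip(weights, ys)) / total


# Hypothetical training data.
xs = [0.0, 2.0]
ys = [0.0, 2.0]

# Halfway between the two points both weights are equal, so the
# prediction is the plain average of the two targets.
print(nadaraya_watson(1.0, xs, ys))
```

A smaller bandwidth makes the estimate follow the nearest training points more closely, while a larger one smooths toward the global mean.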
