Presentation on theme: "Machine Learning Week 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour."— Presentation transcript:

1 Machine Learning Week 4 Lecture 2

2 Hand-in Data It is online. Only around 6000 images!!! The deadline is one week. Next Thursday's lecture will be only one hour and only about the hand-in; if you nailed it you can stay home.

3 Support Vector Machines Last Time Today

4 Functional Margins For each point we define the functional margin. Define the functional margin of the hyperplane, i.e. of the parameters w, b, as a "functional distance".
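The definitions on this slide are shown only as formulas; the standard definitions they presumably correspond to are:

    \hat{\gamma}_i = y_i (w^T x_i + b)        (functional margin of point (x_i, y_i))
    \hat{\gamma} = \min_i \hat{\gamma}_i      (functional margin of the hyperplane w, b)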

5 Geometric Margin How far is x_i from the hyperplane? How long is the segment from x_i to L, its projection onto the hyperplane? Since L is on the hyperplane, substitute the definition of L, multiply in, and solve.
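The derivation appears only as formulas on the slide; reconstructed in the usual way, with γ_i the geometric margin of x_i:

    L = x_i - \gamma_i \frac{w}{\|w\|}, \qquad w^T L + b = 0
    \Rightarrow\ w^T \left( x_i - \gamma_i \frac{w}{\|w\|} \right) + b = 0
    \Rightarrow\ \gamma_i = \frac{w^T x_i + b}{\|w\|}    (multiplied by y_i to make it positive for correctly classified points)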

6 Margins, functional and geometric: related by ||w||, i.e. geometric margin = functional margin / ||w||.

7 Optimizing Margins Maximize the geometric margin, subject to the point margin constraints and a scale constraint on w, b.

8 Optimization Minimize, subject to the constraints below: a convex quadratic program. Functional margin = 1 means the point is sitting on the margin.
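The objective and constraints are shown only as formulas; after scaling w, b so that the smallest functional margin is 1, the standard maximum-margin QP they presumably correspond to is:

    \min_{w, b}\ \tfrac{1}{2} \|w\|^2
    \text{s.t.}\ \ y_i (w^T x_i + b) \ge 1, \quad i = 1, \dots, m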

9 Linearly Separable SVM Minimize, subject to: the constrained problem above. We need to study the theory of Lagrange multipliers to understand the SVM.

10 Lagrange Multipliers Define the Lagrangian. Only consider convex f, g_i and affine h_i (the method is more general). α, β are called Lagrange multipliers.
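The Lagrangian on the slide is shown as a formula; for the problem min f(x) s.t. g_i(x) ≤ 0 and h_i(x) = 0 it is presumably the standard

    \mathcal{L}(x, \alpha, \beta) = f(x) + \sum_i \alpha_i g_i(x) + \sum_i \beta_i h_i(x)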

11 Primal Problem Which is what we are looking for!!! We denote the solution x*
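The primal construction referred to here is presumably the standard one:

    \theta_P(x) = \max_{\alpha \ge 0,\ \beta} \mathcal{L}(x, \alpha, \beta) = \begin{cases} f(x) & \text{if } x \text{ is feasible} \\ +\infty & \text{otherwise} \end{cases}
    p^* = \min_x \theta_P(x) = f(x^*)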

12 Dual Problem α, β are dual feasible if α_i ≥ 0 for all i. For dual feasible α, β this implies the bound below.
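The dual function and the weak-duality bound the slide refers to are presumably:

    \theta_D(\alpha, \beta) = \min_x \mathcal{L}(x, \alpha, \beta)
    \theta_D(\alpha, \beta) \le \mathcal{L}(x^*, \alpha, \beta) \le f(x^*) = p^*
    d^* = \max_{\alpha \ge 0,\ \beta} \theta_D(\alpha, \beta) \le p^*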

13 Weak and Strong Duality Weak duality: d* ≤ p*. Question: when are they equal? Under technical assumptions (convexity plus a constraint qualification such as Slater's condition) we can assume strong duality, d* = p*.

14 Complementary Slackness Let x* be primal optimal and α*, β* dual optimal (p* = d*). The terms α_i* g_i(x*) all have the same sign, and their sum must be zero since it is squeezed between p* and p*, so each term is zero for all i: complementary slackness.
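The squeezing argument presumably runs as follows (using h_i(x*) = 0, α_i* ≥ 0 and g_i(x*) ≤ 0):

    p^* = d^* = \theta_D(\alpha^*, \beta^*) \le \mathcal{L}(x^*, \alpha^*, \beta^*) = f(x^*) + \sum_i \alpha_i^* g_i(x^*) + \sum_i \beta_i^* h_i(x^*) \le f(x^*) = p^*
    \Rightarrow\ \sum_i \alpha_i^* g_i(x^*) = 0\ \Rightarrow\ \alpha_i^* g_i(x^*) = 0 \text{ for all } i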

15 Karush-Kuhn-Tucker (KKT) Conditions Let x* be primal optimal and α*, β* dual optimal (p* = d*): g_i(x*) ≤ 0 for all i and h_i(x*) = 0 for all i (primal feasibility); α_i* ≥ 0 for all i (dual feasibility); α_i* g_i(x*) = 0 for all i (complementary slackness); and stationarity. The KKT conditions are necessary and sufficient for optimality.
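The stationarity condition, shown only as a formula on the slide, is presumably

    \nabla_x \mathcal{L}(x^*, \alpha^*, \beta^*) = \nabla f(x^*) + \sum_i \alpha_i^* \nabla g_i(x^*) + \sum_i \beta_i^* \nabla h_i(x^*) = 0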

16 Finally Back To SVM Minimize, subject to the separable-SVM constraints. Define the Lagrangian (no β required, since there are no equality constraints).
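Writing the constraints as g_i(w, b) = 1 - y_i(w^T x_i + b) ≤ 0, the Lagrangian defined here is presumably

    \mathcal{L}(w, b, \alpha) = \tfrac{1}{2} \|w\|^2 - \sum_i \alpha_i \left[ y_i (w^T x_i + b) - 1 \right]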

17 SVM Summary The dual, subject to its constraints; the support vectors determine w (see below).
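The dual and the expression for w, shown only as formulas, are presumably the standard ones:

    \max_{\alpha}\ \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j\, x_i^T x_j
    \text{s.t.}\ \ \alpha_i \ge 0, \quad \sum_i \alpha_i y_i = 0
    w = \sum_i \alpha_i y_i x_i,\ \ \text{the support vectors being the } x_i \text{ with } \alpha_i > 0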

18 SVM Generalization The VC dimension for hyperplanes is the number of parameters. Theoretically speaking: why bother finding large-margin hyperplanes? There are other bounds: rich theory.

19 Kernel Support Vector Machines

20 Kernels Nonlinear feature transforms. Define the kernel and replace the inner products (see below). The two optimization problems are identical!!! The kernel is an inner product in another space.
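The substitution the slide refers to is presumably: define K(x, z) = Φ(x)^T Φ(z) for a feature map Φ and replace x_i^T x_j with K(x_i, x_j) in the dual,

    \max_{\alpha}\ \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j\, K(x_i, x_j)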

21 Kernels K is an inner product in Φ space!

22 Polynomial Kernel A feature space whose dimension grows like n^d!!! Computing the feature transform explicitly would take n^d time; computing the kernel takes n time.
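A small sketch (my own illustration, not from the slides) of why the kernel is cheap: for the degree-2 polynomial kernel K(x, z) = (x^T z)^2, the explicit feature map contains all pairwise products x_i x_j, yet the kernel value only needs one n-dimensional dot product.

    import numpy as np

    def phi_degree2(x):
        # Explicit degree-2 feature map: all products x_i * x_j (n^2 features).
        return np.outer(x, x).ravel()

    def poly2_kernel(x, z):
        # Kernel trick: (x^T z)^2 computed with a single O(n) dot product.
        return np.dot(x, z) ** 2

    rng = np.random.default_rng(0)
    x, z = rng.standard_normal(5), rng.standard_normal(5)
    # Both routes give the same inner product, but the kernel never builds the n^2 features.
    print(np.dot(phi_degree2(x), phi_degree2(z)))  # explicit feature space
    print(poly2_kernel(x, z))                      # kernel value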

23 Gaussian Kernel Think of this as a similarity measure It is essentially 0 if x and z are not close
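The definition behind this slide is presumably the standard Gaussian (RBF) kernel,

    K(x, z) = \exp\!\left( -\frac{\|x - z\|^2}{2 \sigma^2} \right)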

24 Gaussian Kernel Nonlinear Transform Simplest case: x, z are 1D, i.e. numbers. The kernel is an inner product between infinitely long feature-mapped x and z.
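Reconstructing the usual 1D argument (with the bandwidth chosen so the exponent is -(x - z)^2), the Taylor expansion exhibits the infinite feature map:

    e^{-(x - z)^2} = e^{-x^2} e^{-z^2} e^{2xz} = \sum_{k=0}^{\infty} \left( e^{-x^2} \sqrt{\tfrac{2^k}{k!}}\, x^k \right) \left( e^{-z^2} \sqrt{\tfrac{2^k}{k!}}\, z^k \right) = \Phi(x)^T \Phi(z)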

25 Let's Apply It

26-30 (Figure-only slides; no transcript text survives.)

31 Kernel Matrix Given points and a kernel K(x, z) = Φ(x)^T Φ(z), we form the kernel matrix (same name). If K is a valid kernel, i.e. K(x, z) = Φ(x)^T Φ(z) for some Φ, then the kernel matrix K is symmetric positive semidefinite (x^T K x ≥ 0). Mercer kernels: positive semidefiniteness is a sufficient and necessary condition.
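A small sketch (my own, not from the slides) that builds a kernel matrix for a set of points and checks the symmetric positive semidefinite property numerically, using the Gaussian kernel as an example:

    import numpy as np

    def gaussian_kernel(x, z, sigma=1.0):
        # K(x, z) = exp(-||x - z||^2 / (2 sigma^2))
        return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

    rng = np.random.default_rng(1)
    X = rng.standard_normal((20, 3))                # 20 points in R^3
    K = np.array([[gaussian_kernel(a, b) for b in X] for a in X])

    print(np.allclose(K, K.T))                      # symmetric
    print(np.linalg.eigvalsh(K).min() >= -1e-10)    # eigenvalues >= 0 up to round-off, i.e. PSD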

32 Kernels add nonlinearity to our SVM and give efficient computation in high- and even infinite-dimensional spaces. Few support vectors (on the margin) help us in generalization (in theory) and runtime. Kernels are not limited to SVMs: kernel perceptrons, kernel logistic regression, …, any place where we only depend on the inner product.

33 Non-Separable Data SVM

34 Violating the Margin A point on the wrong side of the track, by a distance of ξ.

35 Minimize, subject to the soft-margin constraints below. If a point is on the wrong side of the margin at distance ξ we penalize by Cξ. The hyperparameter C controls the competing goals of a large margin and points being on the right side of it. How to find C? Validation (model selection). Does this look like regularization to you?
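The soft-margin primal these slides presumably refer to is the standard one:

    \min_{w, b, \xi}\ \tfrac{1}{2} \|w\|^2 + C \sum_i \xi_i
    \text{s.t.}\ \ y_i (w^T x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0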

36 Effect of C C=1 C=100

37 Minimize, subject to: the primal variables are w, b, ξ and the Lagrange multipliers are α, β.
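With multipliers α_i for the margin constraints and β_i for ξ_i ≥ 0, the Lagrangian presumably shown here is

    \mathcal{L}(w, b, \xi, \alpha, \beta) = \tfrac{1}{2} \|w\|^2 + C \sum_i \xi_i - \sum_i \alpha_i \left[ y_i (w^T x_i + b) - 1 + \xi_i \right] - \sum_i \beta_i \xi_i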

38 Defining the Problems The primal and the dual, with primal optimum p* and dual optimum d*.

39 Find the minimizing w, b, ξ: use gradients. This yields a new constraint (see below).
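Setting the gradients of the Lagrangian to zero presumably gives the standard conditions:

    \nabla_w \mathcal{L} = 0\ \Rightarrow\ w = \sum_i \alpha_i y_i x_i
    \partial \mathcal{L} / \partial b = 0\ \Rightarrow\ \sum_i \alpha_i y_i = 0
    \partial \mathcal{L} / \partial \xi_i = 0\ \Rightarrow\ \beta_i = C - \alpha_i\ \ \text{(so } 0 \le \alpha_i \le C \text{, the new constraint)}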

40 Constraints

41 Look Familiar!

42 Subject to the constraints below. When done optimizing, set β = C - α. A convex quadratic program.
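The resulting dual, shown only as formulas on the slide, is presumably (with K(x_i, x_j) = x_i^T x_j in the linear case):

    \max_{\alpha}\ \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j\, K(x_i, x_j)
    \text{s.t.}\ \ 0 \le \alpha_i \le C, \quad \sum_i \alpha_i y_i = 0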

43 KKT Complementary Slackness The optimal solution must satisfy complementary slackness for all the inequality constraints, which we know from the KKT conditions above.

44 On the margin, on the right side, on the wrong side. Find b*: use a point on the margin. In practice: average over the margin points.
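Reconstructing the usual recipe: complementary slackness implies that a point with 0 < α_i < C sits exactly on the margin (ξ_i = 0), so

    y_i (w^T x_i + b^*) = 1\ \Rightarrow\ b^* = y_i - w^T x_i \quad \text{for any margin point } x_i,

and in practice b* is averaged over all points with 0 < α_i < C.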

45 (Formula-only slide; no transcript text survives.)

46 Coordinate Ascent Pick a coordinate, fix the others, solve for the picked one, repeat until done.
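A minimal sketch of coordinate ascent (my own illustration, not from the slides), maximizing a simple concave quadratic W(a) = -0.5 a^T Q a + c^T a one coordinate at a time; the exact 1D update here is specific to this toy objective:

    import numpy as np

    def coordinate_ascent(Q, c, steps=100):
        # Maximize W(a) = -0.5 * a^T Q a + c^T a (Q symmetric positive definite)
        # by repeatedly optimizing one coordinate while holding the rest fixed.
        a = np.zeros(len(c))
        for _ in range(steps):
            for i in range(len(c)):
                # dW/da_i = c_i - Q[i] @ a = 0, solved exactly for a_i.
                a[i] = (c[i] - Q[i] @ a + Q[i, i] * a[i]) / Q[i, i]
        return a

    Q = np.array([[2.0, 0.5], [0.5, 1.0]])
    c = np.array([1.0, 1.0])
    print(coordinate_ascent(Q, c))   # converges toward the maximizer Q^{-1} c
    print(np.linalg.solve(Q, c))     # closed-form maximizer for comparison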

47 Sequential Minimal Optimization (SMO) Algorithm Subject to the equality constraint, coordinate ascent cannot change only one variable, so take two.

48 Algorithm Outline Pick 2 indices, fix the non-picked α's, optimize W for the selected α's subject to the additional constraint, repeat until done.
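A structural sketch of the outline above (my own pseudocode-style Python, not from the slides; the hypothetical helper optimize_pair stands in for the closed-form two-variable update described on the following slides):

    import numpy as np

    def smo_outline(alpha, y, C, optimize_pair, max_passes=10):
        # alpha: dual variables, y: labels in {-1, +1}, C: box bound.
        # optimize_pair(alpha, i, j) is assumed to update alpha[i], alpha[j] jointly,
        # keeping sum(alpha * y) == 0 and 0 <= alpha <= C, and to return True if it moved.
        m = len(alpha)
        rng = np.random.default_rng(0)
        for _ in range(max_passes):
            changed = False
            for i in range(m):               # e.g. an alpha violating the KKT conditions
                j = int(rng.integers(m))     # heuristic choice of the partner index
                if j != i:
                    changed = optimize_pair(alpha, i, j) or changed
            if not changed:                  # close enough to the KKT conditions: stop
                break
        return alpha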

49 Linear Equation in α_1, α_2 The constraints we have are 0 ≤ α_1, α_2 ≤ C plus the linear equality, so the feasible pair (α_1, α_2) lies on a line segment inside the box [0, C] x [0, C], between bounds L and H.

50 y_1 is either 1 or -1 (so y_1^2 = 1 and α_1 can be expressed as a linear function of α_2). Optimize over α_2 subject to L ≤ α_2 ≤ H.

51 Grouping the terms of the objective by their indices (i = j = 1; i = 1, j = 2 and i = 2, j = 1; i = j = 2; i = 1 or 2 paired with some j > 2; and the remaining terms): the point is that, after substituting for α_1, the objective is a second-degree polynomial in α_2.

52 = a second-degree polynomial in α_2. We can maximize such things in closed form and then clip the result back into [L, H].
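A tiny sketch of that step (my own, with made-up coefficients): maximize a concave quadratic a*alpha^2 + b*alpha + c over an interval [L, H] by taking the unconstrained maximizer and clipping it.

    import numpy as np

    def maximize_quadratic_on_interval(a, b, L, H):
        # Maximize f(alpha) = a*alpha^2 + b*alpha + c on [L, H], assuming a < 0 (concave).
        unconstrained = -b / (2 * a)                 # stationary point of the quadratic
        return float(np.clip(unconstrained, L, H))   # clip into the feasible interval

    # Example: f(alpha) = -2*alpha^2 + 3*alpha, feasible interval [0, 0.5]
    print(maximize_quadratic_on_interval(-2.0, 3.0, 0.0, 0.5))  # 0.5 (clipped from 0.75)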

53 Remains How to pick the α's – pick one that violates the KKT conditions (or use a heuristic) – pick another one and optimize. Stopping criterion – close enough to the KKT conditions, or tired of waiting.

54 The End of SVMs Except you will use them in hand-in 2…

