Mathematical Programming in Support Vector Machines
1 Mathematical Programming in Support Vector Machines
Olvi L. Mangasarian
University of Wisconsin - Madison
High Performance Computation for Engineering Systems Seminar
MIT, October 4, 2000

2 What is a Support Vector Machine?
An optimally defined surface:
- Typically nonlinear in the input space
- Linear in a higher dimensional space
- Implicitly defined by a kernel function

3 What are Support Vector Machines Used For?
- Classification
- Regression & data fitting
- Supervised & unsupervised learning
(Will concentrate on classification.)

4 Example of Nonlinear Classifier: Checkerboard Classifier

5 Outline of Talk
- Generalized support vector machines (SVMs): a completely general kernel allows complex classification (no Mercer condition!)
- Smooth support vector machines: smooth & solve the SVM by a fast Newton method
- Lagrangian support vector machines: a very fast, simple iterative scheme; one matrix inversion, no LP, no QP
- Reduced support vector machines: handle large datasets with nonlinear kernels

6 Generalized Support Vector Machines 2-Category Linearly Separable Case

7 Generalized Support Vector Machines Algebra of 2-Category Linearly Separable Case
Given m points in n-dimensional space, represented by an m-by-n matrix A. Membership of each point in class +1 or -1 is specified by an m-by-m diagonal matrix D with +1 & -1 entries. Separate the classes by two bounding planes,
  x'w = gamma + 1,   x'w = gamma - 1.
More succinctly:
  D(Aw - e*gamma) >= e,
where e is a vector of ones.
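A minimal MATLAB sketch (not from the talk) of this condition, assuming A, D and a candidate pair (w, gamma) are given:

  % The two classes are correctly bounded by the planes x'w = gamma +/- 1
  % exactly when D(Aw - e*gamma) >= e holds componentwise.
  e = ones(size(A,1), 1);
  separated = all(D*(A*w - gamma*e) >= e - 1e-9);   % small tolerance for roundoff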

8 Generalized Support Vector Machines Maximizing the Margin between Bounding Planes

9 Generalized Support Vector Machines The Linear Support Vector Machine Formulation
Solve the following mathematical program for some nu > 0:
  min_{w, gamma, y}  nu*e'y + (1/2)w'w
  s.t.  D(Aw - e*gamma) + y >= e,   y >= 0.
The nonnegative slack variable y is zero iff:
- the convex hulls of the +1 and -1 points do not intersect, and
- nu is sufficiently large.
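A minimal MATLAB sketch (an illustration, not the talk's code) of this program using quadprog from the Optimization Toolbox, with stacked variable z = [w; gamma; y]:

  [m, n] = size(A); e = ones(m, 1); nu = 1;       % nu is a chosen parameter
  Hq = blkdiag(eye(n), 0, zeros(m));              % quadratic term (1/2)w'w
  f  = [zeros(n+1, 1); nu*e];                     % linear term nu*e'y
  Aineq = -[D*A, -D*e, eye(m)];                   % encodes D(Aw - e*gamma) + y >= e
  bineq = -e;
  lb = [-inf(n+1, 1); zeros(m, 1)];               % y >= 0
  z = quadprog(Hq, f, Aineq, bineq, [], [], lb);
  w = z(1:n); gamma = z(n+1); y = z(n+2:end);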

10 Breast Cancer Diagnosis Application
97% Tenfold Cross Validation Correctness; 780 Samples: 494 Benign, 286 Malignant

11 Another Application: Disputed Federalist Papers (Bosch & Smith)
Hamilton, 50 Madison, 12 Disputed

12 Generalized Support Vector Machine Motivation (Nonlinear Kernel Without Mercer Condition)
Linear SVM separating surface: x'w = gamma. Set w = A'Du. The resulting linear surface is:
  x'A'Du = gamma.
Replace the linear kernel x'A' by an arbitrary nonlinear kernel K(x', A'). The resulting nonlinear surface is:
  K(x', A')Du = gamma.
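A one-line MATLAB sketch (an assumption) of classifying a new point x with such a surface, where K is a kernel handle such as the Gaussian one sketched after slide 18, and u, gamma come from training:

  % The class of x is the side of the nonlinear surface K(x',A')Du = gamma it falls on.
  classify = @(x) sign(K(x(:)', A)*D*u - gamma);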

13 SSVM: Smooth Support Vector Machine (SVM as Unconstrained Minimization Problem)
Changing the slack term to the 2-norm and measuring the margin in (w, gamma) space gives:
  min_{w, gamma, y}  (nu/2)y'y + (1/2)(w'w + gamma^2)
  s.t.  D(Aw - e*gamma) + y >= e,   y >= 0.
At a solution y = (e - D(Aw - e*gamma))_+, so the problem becomes the unconstrained minimization:
  min_{w, gamma}  (nu/2)||(e - D(Aw - e*gamma))_+||^2 + (1/2)(w'w + gamma^2),
where (x)_+ replaces negative components of x by zero.
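A minimal MATLAB sketch (an assumption) of this unconstrained, still nonsmooth objective:

  % SSVM objective before smoothing; A, D, nu given, e a vector of ones.
  e = ones(size(A,1), 1);
  obj = @(w, gamma) (nu/2)*norm(max(e - D*(A*w - gamma*e), 0))^2 ...
                    + 0.5*(w'*w + gamma^2);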

14 Smoothing the Plus Function: Integrate the Sigmoid Function

15 SSVM: The Smooth Support Vector Machine Smoothing the Plus Function
Integrating the sigmoid approximation 1/(1 + exp(-alpha*x)) of the step function gives a smooth, excellent approximation to the plus function:
  p(x, alpha) = x + (1/alpha)*log(1 + exp(-alpha*x)).
Replacing the plus function in the nonsmooth SVM by this smooth approximation gives our SSVM:
  min_{w, gamma}  (nu/2)||p(e - D(Aw - e*gamma), alpha)||^2 + (1/2)(w'w + gamma^2).
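A short MATLAB sketch (an illustration) showing how p(x, alpha) approaches the plus function as alpha grows:

  alpha = 5;
  p    = @(x) x + (1/alpha)*log(1 + exp(-alpha*x));   % smooth plus function
  plus = @(x) max(x, 0);                              % the plus function (x)_+
  x = linspace(-2, 2, 9);
  disp([x; plus(x); p(x)])   % rows: x, (x)_+, p(x); agreement improves with alpha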

16 Newton-Armijo Algorithm for SSVM
- Newton: minimize a sequence of quadratic approximations to the strongly convex objective function, i.e. solve a sequence of linear equations in n+1 variables (small dimensional input space).
- Armijo: shorten the distance between successive iterates so as to generate sufficient decrease in the objective function. (In computational reality, not needed!)
- Global quadratic convergence: starting from any point, the iterates are guaranteed to converge to the unique solution at a quadratic rate, i.e. errors get squared. (Typically 6 to 8 iterations, without an Armijo step.)
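A generic MATLAB sketch of such a Newton-Armijo iteration (an illustration under assumed handles f, grad, hess for the objective, its gradient and its Hessian; not the talk's SSVM code):

  x = zeros(n+1, 1);                     % start from any point
  for it = 1:50
      g = grad(x);
      if norm(g) < 1e-8, break; end
      d = -hess(x) \ g;                  % Newton direction: one linear solve
      t = 1;                             % Armijo backtracking
      while f(x + t*d) > f(x) + 1e-4*t*(g'*d)
          t = t/2;                       % shorten step until sufficient decrease
      end
      x = x + t*d;
  end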

17 SSVM with a Nonlinear Kernel Nonlinear Separating Surface in Input Space

18 Examples of Kernels Generate Nonlinear Separating Surfaces in Input Space
- Polynomial kernel
- Gaussian (radial basis) kernel
- Neural network kernel
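A MATLAB sketch (an assumption, since the slide's exact formulas were lost) of the Gaussian kernel as a handle K(P, Q) returning kernel values between the rows of P and the rows of Q:

  mu = 0.1;                               % assumed kernel width parameter
  K = @(P, Q) exp(-mu*(sum(P.^2, 2) + sum(Q.^2, 2)' - 2*P*Q'));
  % K(A, A) is then the m-by-m matrix with entries exp(-mu*||A_i - A_j||^2).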

25 LSVM: Lagrangian Support Vector Machine Dual of SVM
Taking the dual of the SVM formulation of slide 13 gives the following simple dual problem:
  min_{u >= 0}  (1/2)u'(I/nu + D(AA' + ee')D)u - e'u.
The variables (w, gamma) of SSVM are related to u by:
  w = A'Du,   gamma = -e'Du.

26 LSVM: Lagrangian Support Vector Machine Dual SVM as Symmetric Linear Complementarity Problem
Defining the two matrices:
  H = D[A  -e],   Q = I/nu + HH',
reduces the dual SVM to:
  min_{u >= 0}  (1/2)u'Qu - e'u.
The optimality condition for this dual SVM is the linear complementarity problem (LCP):
  0 <= u  perp  Qu - e >= 0,
which, by Implicit Lagrangian theory, is equivalent to:
  Qu - e = ((Qu - e) - alpha*u)_+   for any alpha > 0.
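A minimal MATLAB sketch (consistent with the code on slide 28, but simplified) of the iteration this equivalence suggests:

  pl = @(x) max(x, 0);                        % the plus function
  u = Q \ e;                                  % any starting point works
  for it = 1:itmax
      u = Q \ (e + pl((Q*u - e) - alpha*u));  % LSVM fixed-point step
  end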

27 LSVM Algorithm Simple & Linearly Convergent – One Small Matrix Inversion
The LSVM iteration is:
  u^{i+1} = Q^{-1}(e + ((Qu^i - e) - alpha*u^i)_+),   where 0 < alpha < 2/nu.
Key idea: the Sherman-Morrison-Woodbury formula allows the inversion of the extremely large m-by-m matrix Q by merely inverting a much smaller (n+1)-by-(n+1) matrix, as follows:
  (I/nu + HH')^{-1} = nu*(I - H(I/nu + H'H)^{-1}H').
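A quick MATLAB check (an illustration, not from the talk) of this identity on random data:

  m = 500; n = 10; nu = 1; H = randn(m, n+1);
  lhs = inv(eye(m)/nu + H*H');                       % direct m-by-m inverse
  rhs = nu*(eye(m) - H*((eye(n+1)/nu + H'*H)\H'));   % SMW: only an (n+1)-by-(n+1) solve
  disp(norm(lhs - rhs, 'fro'))                       % ~1e-12, i.e. identical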

28 LSVM Algorithm – Linear Kernel 11 Lines of MATLAB Code
function [it, opt, w, gamma] = svml(A, D, nu, itmax, tol)
% lsvm with SMW for min 1/2*u'*Q*u-e'*u s.t. u=>0,
% Q=I/nu+H*H', H=D[A -e]
% Input: A, D, nu, itmax, tol; Output: it, opt, w, gamma
% [it, opt, w, gamma] = svml(A,D,nu,itmax,tol);
[m,n] = size(A); alpha = 1.9/nu; e = ones(m,1); H = D*[A -e]; it = 0;
S = H*inv((speye(n+1)/nu + H'*H));          % SMW: invert (n+1)-by-(n+1) only
u = nu*(1 - S*(H'*e)); oldu = u + 1;
while it < itmax & norm(oldu - u) > tol
    z = (1 + pl(((u/nu + H*(H'*u)) - alpha*u) - 1));
    oldu = u;
    u = nu*(z - S*(H'*z));                  % u = inv(Q)*z via SMW
    it = it + 1;
end;
opt = norm(u - oldu); w = A'*D*u; gamma = -e'*D*u;

function pl = pl(x); pl = (abs(x)+x)/2;     % the plus function (x)_+
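A hypothetical driver (an assumption, not part of the talk) exercising svml on random two-cloud data:

  m = 200; n = 5;
  A = [randn(m/2, n) + 1; randn(m/2, n) - 1];   % two shifted Gaussian clouds
  d = [ones(m/2, 1); -ones(m/2, 1)];            % class labels
  D = spdiags(d, 0, m, m);                      % diagonal +1/-1 matrix
  [it, opt, w, gamma] = svml(A, D, 1, 100, 1e-5);
  fprintf('%d iterations, training correctness %.2f\n', it, ...
          mean(sign(A*w - gamma) == d));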

29 LSVM Algorithm – Linear Kernel Computational Results
- 2 million random points in 10-dimensional space: classified in 6.7 minutes & 6 iterations to 1e-5 accuracy (250 MHz UltraSPARC II with 2 gigabytes of memory); CPLEX ran out of memory.
- 32562 points in 123-dimensional space (UCI Adult dataset): classified in 141 seconds & 55 iterations to 85% correctness (400 MHz Pentium II with 2 gigabytes of memory); SVM classified in 178 seconds & 4497 iterations.

30 LSVM – Nonlinear Kernel Formulation
For the nonlinear kernel K(G, G'), where G = [A  -e], the separating nonlinear surface is given by:
  K([x'  -1], G')Du = 0,
where u is the solution of the dual problem:
  min_{u >= 0}  (1/2)u'Qu - e'u,
with Q redefined as:
  Q = I/nu + DK(G, G')D.
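A MATLAB sketch (an assumption, with K the kernel handle sketched after slide 18) of this redefined Q:

  e = ones(size(A,1), 1);
  G = [A -e];                                   % append the -e column
  Q = speye(size(A,1))/nu + D*K(G, G)*D;        % replaces Q = I/nu + H*H'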

31 LSVM Algorithm – Nonlinear Kernel Application
100 Iterations, 58 Seconds on Pentium II, 95.9% Accuracy

32 Reduced Support Vector Machines (RSVM) Large Nonlinear Kernel Classification Problems
Key idea: use a rectangular kernel K(A, Abar'), where Abar is a small random sample of the rows of A (typically 1% to 10% of them). Two important consequences:
- RSVM can solve very large problems
- The nonlinear separating surface depends on Abar only: K(x', Abar')ubar = gamma
By contrast, a conventional SVM using the small sample Abar alone gives lousy results (next two slides).
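A MATLAB sketch (an assumption, with K from slide 18) of forming the rectangular kernel from a 5% row sample:

  m = size(A, 1); mbar = ceil(0.05*m);     % keep ~5% of the rows
  Abar = A(randperm(m, mbar), :);          % small random sample of A
  Krect = K(A, Abar);                      % m-by-mbar instead of m-by-m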

33 Conventional SVM Result on Checkerboard Using 50 Random Points Out of 1000

34 RSVM Result on Checkerboard Using SAME 50 Random Points Out of 1000

35 RSVM on Large Classification Problems
Standard Error over 50 Runs = 0; RSVM Time = 1.24 * (Random Points Time)

36 Conclusion
Mathematical programming plays an essential role in SVMs.
Theory:
- New formulations: generalized SVMs
- New algorithm-generating concepts: smoothing (SSVM), implicit Lagrangian (LSVM)
Algorithms:
- Fast: SSVM
- Massive: LSVM, RSVM

37 Future Research
Theory:
- Concave minimization
- Concurrent feature & data selection
- Multiple-instance problems
- SVMs as complementarity problems
- Kernel methods in nonlinear programming
Algorithms:
- Chunking for massive classification
- Multicategory classification algorithms

38 Talk & Papers Available on Web

