1
Proximal Support Vector Machine Classifiers
KDD 2001, San Francisco, August 26-29, 2001
Glenn Fung & Olvi Mangasarian
Data Mining Institute, University of Wisconsin - Madison
2
Key Contributions
- Fast new support vector machine classifier: an order of magnitude faster than standard classifiers
- Extremely simple to implement: 4 lines of MATLAB code, NO optimization packages (LP, QP) needed
3
Outline of Talk
- (Standard) support vector machines (SVM): classify by halfspaces
- Proximal support vector machines (PSVM): classify by proximity to planes
- Linear PSVM classifier
- Nonlinear PSVM classifier: full and reduced kernels
- Numerical results: correctness comparable to standard SVM, much faster classification! 2 million points in 10-space in 21 seconds, compared to over 10 minutes for standard SVM
4
Support Vector Machines: Maximizing the Margin between Bounding Planes
[figure: points of the two classes A+ and A- separated by the bounding planes x'w = γ + 1 and x'w = γ - 1; the margin is the distance between the planes]
5
Proximal Support Vector Machines: Fitting the Data Using Two Parallel Bounding Planes
[figure: the same two parallel planes, now placed so that the points of A+ and A- cluster around them while the planes are pushed as far apart as possible]
6
Standard Support Vector Machine: Algebra of the 2-Category Linearly Separable Case
- Given m points in n-dimensional space, represented by an m-by-n matrix A
- Membership of each point in class +1 or -1 is specified by an m-by-m diagonal matrix D with +1 and -1 entries
- Separate by two bounding planes, x'w = γ + 1 and x'w = γ - 1:
    A_i w ≥ γ + 1 for D_ii = +1,   A_i w ≤ γ - 1 for D_ii = -1
- More succinctly: D(Aw - eγ) ≥ e, where e is a vector of ones
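To make the notation concrete, here is a tiny MATLAB sketch (not from the slides; the points, labels, and plane parameters below are made up) of A, D, e, and the succinct separation condition:

  % Illustrative 2-category data: m = 4 points in n = 2 dimensions
  A = [1 2; 2 3; 6 5; 7 8];        % m-by-n matrix of points
  d = [1; 1; -1; -1];              % class labels +1 / -1
  D = diag(d);                     % m-by-m diagonal label matrix
  e = ones(size(A,1),1);           % vector of ones
  % Separation by the bounding planes x'*w = gamma +/- 1 is the condition
  % D*(A*w - e*gamma) >= e, checked here for a hypothetical plane (w, gamma):
  w = [-1; -1]; gamma = -8;
  separated = all(D*(A*w - e*gamma) >= e)   % returns true for this data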
7
Standard Support Vector Machine Formulation
The margin between the bounding planes is maximized by minimizing ½(w'w + γ²). Solve the quadratic program for some ν > 0:
  min_{w,γ,y}  (ν/2)||y||² + ½(w'w + γ²)
  s.t.  D(Aw - eγ) + y ≥ e        (QP)
where y denotes the slack variable and D_ii = +1 or -1 denotes the membership of A_i.
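For contrast with the 4-line PSVM code shown later, here is a minimal sketch (not from the talk) of solving this QP with the quadprog solver from MATLAB's Optimization Toolbox; the variable ordering z = [w; gamma; y] is an assumption of the sketch:

  % Sketch: solve the standard SVM QP above with quadprog.
  % Assumes A (m-by-n), labels d (+1/-1), and nu > 0 are given.
  [m,n] = size(A); D = diag(d); e = ones(m,1);
  Q = blkdiag(eye(n+1), nu*eye(m));   % (1/2)(w'w + gamma^2) + (nu/2)*||y||^2
  f = zeros(n+1+m,1);                 % no linear term
  Aineq = [-D*A, D*e, -eye(m)];       % -(D*(A*w - e*gamma) + y) <= -e
  bineq = -e;
  z = quadprog(Q, f, Aineq, bineq);   % z = [w; gamma; y]
  w = z(1:n); gamma = z(n+1);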
8
PSVM Formulation
We have from the QP SVM formulation:
  min_{w,γ,y}  (ν/2)||y||² + ½(w'w + γ²)
  s.t.  D(Aw - eγ) + y ≥ e        (QP)
Replacing the inequality constraint by an equality:
  s.t.  D(Aw - eγ) + y = e
This simple but critical modification changes the nature of the optimization problem tremendously!
Solving for y in terms of w and γ gives the unconstrained problem:
  min_{w,γ}  (ν/2)||e - D(Aw - eγ)||² + ½(w'w + γ²)
9
Advantages of New Formulation
- Objective function remains strongly convex
- An explicit exact solution can be written in terms of the problem data
- PSVM classifier is obtained by solving a single system of linear equations in the usually small dimensional input space
- Exact leave-one-out correctness can be obtained in terms of the problem data
10
Linear PSVM
We want to solve:
  min_{w,γ}  (ν/2)||e - D(Aw - eγ)||² + ½(w'w + γ²)
Setting the gradient equal to zero gives a nonsingular system of linear equations.
The solution of that system gives the desired PSVM classifier.
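For completeness, the gradient calculation behind that claim (a sketch added here, using the notation H = [A  -e] and z = [w; γ] introduced on the following slides): writing the objective as
  f(z) = (ν/2)||e - DHz||² + ½ z'z,
its gradient is
  ∇f(z) = -ν H'D(e - DHz) + z = (I + ν H'H)z - ν H'De,   using D'D = I,
so setting ∇f(z) = 0 gives the nonsingular linear system
  (I/ν + H'H) z = H'De,   i.e.,   [w; γ] = (I/ν + H'H)^{-1} H'De.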
11
Linear PSVM Solution
Here H = [A  -e], and the solution is
  [w; γ] = (I/ν + H'H)^{-1} H'De
The linear system to solve depends on H'H, which is of size (n+1) x (n+1); n+1 is usually much smaller than m.
12
Linear Proximal SVM Algorithm
Input: A, D, ν
Define: H = [A  -e]
Solve: (I/ν + H'H) r = H'De
Calculate: w = r(1:n), γ = r(n+1)
Classifier: sign(x'w - γ)
13
Nonlinear PSVM Formulation
Linear PSVM (linear separating surface x'w = γ):
  (QP)  min_{w,γ,y}  (ν/2)||y||² + ½(w'w + γ²)   s.t.  D(Aw - eγ) + y = e
By QP "duality", w = A'Du. Maximizing the margin in the "dual space" gives:
  min_{u,γ,y}  (ν/2)||y||² + ½(u'u + γ²)   s.t.  D(AA'Du - eγ) + y = e
Replace AA' by a nonlinear kernel K(A, A'):
  min_{u,γ,y}  (ν/2)||y||² + ½(u'u + γ²)   s.t.  D(K(A, A')Du - eγ) + y = e
14
The Nonlinear Classifier
The nonlinear classifier:
  K(x', A')Du - γ = 0
where K is a nonlinear kernel, e.g. the Gaussian (radial basis) kernel:
  (K(A, A'))_ij = exp(-μ ||A_i' - A_j'||²),  i, j = 1, ..., m   (A_i denotes the i-th row of A)
The ij-entry of K(A, A') represents the "similarity" of data points A_i and A_j.
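A minimal MATLAB sketch of such a Gaussian kernel matrix (not from the talk; the function name gaussian_kernel and the parameter mu are illustrative, and both inputs hold data points as rows, so this computes K(A, B') in the slides' notation):

  function K = gaussian_kernel(A, B, mu)
  % GAUSSIAN_KERNEL: K(i,j) = exp(-mu*||A(i,:) - B(j,:)||^2)
  m = size(A,1); k = size(B,1);
  sqdist = sum(A.^2,2)*ones(1,k) + ones(m,1)*sum(B.^2,2)' - 2*A*B';
  K = exp(-mu*sqdist);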
15
Nonlinear PSVM
Defining H slightly differently:  H = [K(A, A')  -e]
Similar to the linear case, setting the gradient equal to zero, we obtain:
  [u; γ] = (I/ν + H'H)^{-1} H'De
Here the linear system to solve is of size (m+1) x (m+1).
However, reduced kernel techniques (RSVM) can be used to reduce the dimensionality.
16
Nonlinear Proximal SVM Algorithm
Input: A, D, ν
Define: K = K(A, A'),  H = [K  -e]
Solve: (I/ν + H'H) r = H'De
Calculate: u = r(1:m), γ = r(m+1)
Classifier: sign(K(x', A')Du - γ)
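A minimal MATLAB sketch of this nonlinear algorithm (not from the talk; it uses the full m-by-m kernel with no reduced-kernel speedup, and the illustrative gaussian_kernel above):

  function [u, gamma] = psvm_nonlinear(A, d, nu, mu)
  % Nonlinear PSVM sketch with a full Gaussian kernel; d = diag(D)
  m = size(A,1); e = ones(m,1);
  K = gaussian_kernel(A, A, mu);      % m-by-m kernel matrix K(A,A')
  H = [K -e];
  v = H'*d;                           % = H'*D*e, since D*e = d
  r = (speye(m+1)/nu + H'*H)\v;       % solve (I/nu + H'*H) r = v
  u = r(1:m); gamma = r(m+1);

A new point x (an n-vector) would then be classified by sign(gaussian_kernel(x', A, mu)*(d.*u) - gamma), since Du = d.*u.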
17
Linear & Nonlinear PSVM MATLAB Code

function [w, gamma] = psvm(A,d,nu)
% PSVM: linear and nonlinear classification
% INPUT: A, d=diag(D), nu. OUTPUT: w, gamma
% [w, gamma] = psvm(A,d,nu);
[m,n]=size(A);e=ones(m,1);H=[A -e];
v=(d'*H)';                 % v=H'*D*e
r=(speye(n+1)/nu+H'*H)\v;  % solve (I/nu+H'*H)r=v
w=r(1:n);gamma=r(n+1);     % getting w,gamma from r
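A hypothetical usage sketch (not from the slides; the data and the value of nu are made up):

  % Train the linear PSVM on made-up 2-D data and classify a new point
  A  = [randn(50,2)+2; randn(50,2)-2];   % 100 points in 2-space
  d  = [ones(50,1); -ones(50,1)];        % class labels +1 / -1
  nu = 1;                                % regularization parameter
  [w, gamma] = psvm(A, d, nu);
  x = [1.5; 1.8];                        % a new point to classify
  class_of_x = sign(x'*w - gamma)        % +1 or -1

As the header comment indicates, the same function handles the nonlinear case: pass the kernel matrix K(A, A') in place of A, and the returned w then plays the role of u in the nonlinear classifier.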
18
Linear PSVM Comparisons with Other SVMs: Much Faster, Comparable Correctness

Data Set (m x n)          | PSVM: ten-fold test % / time (sec.) | SSVM: ten-fold test % / time (sec.) | SVM: ten-fold test % / time (sec.)
WPBC (60 mo.) 110 x 32    | 68.5 / 0.02  | 68.5 / 0.17  | 62.7 / 3.85
Ionosphere 351 x 34       | 87.3 / 0.17  | 88.7 / 1.23  | 88.0 / 2.19
Cleveland Heart 297 x 13  | 85.9 / 0.01  | 86.2 / 0.70  | 86.5 / 1.44
Pima Indians 768 x 8      | 77.5 / 0.02  | 77.6 / 0.78  | 76.4 / 37.00
BUPA Liver 345 x 6        | 69.4 / 0.02  | 70.0 / 0.78  | 69.5 / 6.65
Galaxy Dim 4192 x 14      | 93.5 / 0.34  | 95.0 / 5.21  | 94.1 / 28.33
19
Linear PSVM vs. LSVM: 2-Million-Point Dataset, Over 30 Times Faster

Dataset     | Method | Training correctness % | Testing correctness % | Time (sec.)
NDC "Easy"  | LSVM   | 90.86 | 91.23 | 658.5
NDC "Easy"  | PSVM   | 90.80 | 91.13 | 20.8
NDC "Hard"  | LSVM   | 69.80 | 69.44 | 655.6
NDC "Hard"  | PSVM   | 69.84 | 69.52 | 20.6
20
Nonlinear PSVM: Spiral Dataset (94 Red Dots & 94 White Dots)
[figure: the two intertwined spirals separated by the nonlinear PSVM classifier]
21
Nonlinear PSVM Comparisons

Data Set (m x n)        | PSVM: ten-fold test % / time (sec.) | SSVM: ten-fold test % / time (sec.) | LSVM: ten-fold test % / time (sec.)
Ionosphere 351 x 34     | 95.2 / 4.60   | 95.8 / 25.25  | 95.8 / 14.58
BUPA Liver 345 x 6      | 73.6 / 4.34   | 73.7 / 20.65  | 73.7 / 30.75
Tic-Tac-Toe 958 x 9     | 98.4 / 74.95  | 98.4 / 395.30 | 94.7 / 350.64
Mushroom * 8124 x 22    | 88.0 / 35.50  | 88.8 / 307.66 | 87.8 / 503.74

* A rectangular kernel of size 8124 x 215 was used.
22
Conclusion
- PSVM is an extremely simple procedure for generating linear and nonlinear classifiers
- The PSVM classifier is obtained by solving a single system of linear equations, in the usually small dimensional input space for a linear classifier
- Test set correctness comparable to standard SVM
- Much faster than standard SVMs: typically by an order of magnitude
23
Future Work
- Extension of PSVM to multicategory classification
- Massive data classification using an incremental PSVM
- Parallel formulation and implementation of PSVM