Mathematical Programming in Support Vector Machines


Mathematical Programming in Support Vector Machines
Olvi L. Mangasarian, University of Wisconsin - Madison
High Performance Computation for Engineering Systems Seminar, MIT, October 4, 2000

What is a Support Vector Machine?
An optimally defined surface
Typically nonlinear in the input space
Linear in a higher dimensional space
Implicitly defined by a kernel function

What are Support Vector Machines Used For?
Classification
Regression & data fitting
Supervised & unsupervised learning
(This talk concentrates on classification)

Example of Nonlinear Classifier: Checkerboard Classifier

Outline of Talk
Generalized support vector machines (SVMs)
  A completely general kernel allows complex classification (no Mercer condition!)
Smooth support vector machines (SSVM)
  Smooth the SVM and solve it by a fast Newton method
Lagrangian support vector machines (LSVM)
  A very fast, simple iterative scheme: one matrix inversion, no LP, no QP
Reduced support vector machines (RSVM)
  Handle large datasets with nonlinear kernels

Generalized Support Vector Machines
2-Category Linearly Separable Case

Generalized Support Vector Machines
Algebra of the 2-Category Linearly Separable Case
Given m points in n-dimensional space, represented by an m-by-n matrix A
Membership of each point in class +1 or -1 specified by an m-by-m diagonal matrix D with +1 & -1 entries
Separate by two bounding planes, written more succinctly with a vector of ones e (see the reconstruction below)
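The bounding-plane formulas on this slide were images in the original; a standard reconstruction in Mangasarian's notation:

$$ A_i w \ge \gamma + 1 \quad \text{for } D_{ii} = +1, \qquad A_i w \le \gamma - 1 \quad \text{for } D_{ii} = -1, $$

or, more succinctly,

$$ D(Aw - e\gamma) \ge e. $$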

Generalized Support Vector Machines Maximizing the Margin between Bounding Planes
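The figure here was an image; for the 2-norm, the distance between the bounding planes x'w = gamma + 1 and x'w = gamma - 1 is

$$ \text{margin} = \frac{2}{\|w\|_2}. $$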

Generalized Support Vector Machines
The Linear Support Vector Machine Formulation
Solve the following mathematical program for some nu > 0 (reconstruction below)
The nonnegative slack variable y is zero iff:
  The convex hulls of A+ and A- do not intersect
  nu is sufficiently large
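The program itself was an image in the original; a standard reconstruction from Mangasarian's papers, for some nu > 0:

$$ \min_{w,\gamma,y}\ \nu\, e'y + \tfrac{1}{2}\,w'w \quad \text{s.t.}\quad D(Aw - e\gamma) + y \ge e, \quad y \ge 0. $$

(The generalized formulation allows other norms on w; the 2-norm version shown here is the one smoothed in the SSVM section below.)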

Breast Cancer Diagnosis Application
97% tenfold cross-validation correctness
780 samples: 494 benign, 286 malignant

Another Application: Disputed Federalist Papers (Bosch & Smith 1998)
56 Hamilton, 50 Madison, 12 disputed

Generalized Support Vector Machine Motivation
(Nonlinear Kernel Without Mercer Condition)
Start from the linear SVM and its linear separating surface
Set w = A'Du; this gives a linear surface in terms of the dual variable u
Replace the inner product x'A' by an arbitrary nonlinear kernel K(x', A')
The resulting nonlinear surface is reconstructed below
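A reconstruction of the missing formulas, following the GSVM paper:

$$ x'w = \gamma \ \ \xrightarrow{\ w \,=\, A'Du\ }\ \ x'A'Du = \gamma \ \ \xrightarrow{\ x'A' \,\to\, K(x',A')\ }\ \ K(x',A')\,Du = \gamma. $$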

SSVM: Smooth Support Vector Machine
(SVM as Unconstrained Minimization Problem)
Changing to the 2-norm and measuring the margin in the (w, gamma) space gives an equivalent unconstrained minimization problem (reconstruction below)
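A reconstruction from the SSVM paper (Lee & Mangasarian): substituting the slack y = (e - D(Aw - e*gamma))_+ yields the unconstrained problem

$$ \min_{w,\gamma}\ \frac{\nu}{2}\,\big\|\big(e - D(Aw - e\gamma)\big)_+\big\|_2^2 \;+\; \frac{1}{2}\big(w'w + \gamma^2\big), $$

where $(\cdot)_+$ sets negative components to zero (the plus function).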

Smoothing the Plus Function: Integrate the Sigmoid Function

SSVM: The Smooth Support Vector Machine
Smoothing the Plus Function
Integrating the sigmoid approximation to the step function gives a smooth, excellent approximation to the plus function
Replacing the plus function in the nonsmooth SVM by this smooth approximation gives our SSVM (see below)
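The smoothing formulas, reconstructed from the SSVM paper: integrating the sigmoid $s(x,\alpha) = (1 + \exp(-\alpha x))^{-1}$ gives

$$ p(x,\alpha) = x + \frac{1}{\alpha}\log\big(1 + \exp(-\alpha x)\big) \ \approx\ (x)_+, $$

and SSVM minimizes the smooth objective $\frac{\nu}{2}\|p(e - D(Aw - e\gamma),\alpha)\|_2^2 + \frac{1}{2}(w'w + \gamma^2)$.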

Newton: Minimize a sequence of quadratic approximations to the strongly convex objective function, i.e. solve a sequence of linear equations in n+1 variables (a small dimensional input space).
Armijo: Shorten the distance between successive iterates so as to generate sufficient decrease in the objective function. (In computational reality, not needed!)
Global quadratic convergence: Starting from any point, the iterates are guaranteed to converge to the unique solution at a quadratic rate, i.e. errors get squared. (Typically 6 to 8 iterations without an Armijo step.) A generic sketch follows below.
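As an illustration only, a minimal MATLAB/Octave sketch of a generic Newton-Armijo loop; it is not the authors' SSVM code, and the function handles f, grad_f, hess_f are assumptions to be supplied by the caller:

function x = newton_armijo(f, grad_f, hess_f, x, tol, itmax)
% Damped Newton method for a smooth strongly convex objective f.
for it = 1:itmax
  g = grad_f(x);
  if norm(g) < tol, break; end            % stationarity reached
  d = -(hess_f(x)\g);                     % Newton direction: solve H*d = -g
  t = 1;                                  % Armijo backtracking: halve the step
  while f(x + t*d) > f(x) + 1e-4*t*(g'*d) % until sufficient decrease holds
    t = t/2;
  end
  x = x + t*d;
end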

SSVM with a Nonlinear Kernel
Nonlinear Separating Surface in Input Space

Examples of Kernels That Generate Nonlinear Separating Surfaces in Input Space
Polynomial kernel
Gaussian (radial basis) kernel
Neural network kernel
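The kernel formulas were images in the original slides. As one concrete example, a minimal MATLAB/Octave sketch of the Gaussian (radial basis) kernel K(i,j) = exp(-mu*||A_i - B_j||^2); the function name and the width parameter mu are illustrative assumptions:

function K = gaussian_kernel(A, B, mu)
% Gaussian kernel matrix between the rows of A (m-by-n) and B (k-by-n),
% computed via the expansion ||a - b||^2 = a'a - 2a'b + b'b.
sqA = sum(A.^2, 2);                  % m-by-1 squared row norms
sqB = sum(B.^2, 2)';                 % 1-by-k squared row norms
K = exp(-mu*(sqA - 2*A*B' + sqB));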

LSVM: Lagrangian Support Vector Machine
Dual of the SVM
Taking the dual of the SVM formulation gives a simple dual problem (reconstruction below)
The variables (w, gamma, y) of the SSVM are recovered directly from the dual variable u
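A reconstruction following the LSVM paper (Mangasarian & Musicant): with $H = D[A\ \ {-e}]$, the dual is

$$ \min_{0 \le u}\ \frac{1}{2}\,u'\Big(\frac{I}{\nu} + HH'\Big)u - e'u, $$

and the SSVM variables are recovered as $w = A'Du$, $\gamma = -e'Du$, $y = u/\nu$.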

LSVM: Lagrangian Support Vector Machine
Dual SVM as a Symmetric Linear Complementarity Problem
Defining the two matrices H and Q (below) reduces the dual SVM to a simple bound-constrained quadratic program
The optimality condition for this dual SVM is a linear complementarity problem (LCP) which, by Implicit Lagrangian theory, is equivalent to a fixed-point equation (reconstruction below)
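A reconstruction of the missing formulas: with

$$ H = D[A\ \ {-e}], \qquad Q = \frac{I}{\nu} + HH', $$

the dual SVM is $\min_{0 \le u} \frac{1}{2}u'Qu - e'u$, whose optimality condition is the LCP

$$ 0 \le u \ \perp\ Qu - e \ge 0, $$

equivalent, by the Implicit Lagrangian theory, to the fixed-point equation (for any $\alpha > 0$)

$$ Qu - e = \big((Qu - e) - \alpha u\big)_+. $$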

LSVM Algorithm
Simple & Linearly Convergent - One Small Matrix Inversion
Key idea: the Sherman-Morrison-Woodbury formula allows the inversion of an extremely large m-by-m matrix Q by merely inverting a much smaller (n+1)-by-(n+1) matrix (reconstruction below)
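A reconstruction from the LSVM paper: the iteration

$$ u^{i+1} = Q^{-1}\Big(e + \big((Qu^i - e) - \alpha u^i\big)_+\Big), \qquad 0 < \alpha < \frac{2}{\nu}, $$

converges linearly, and the Sherman-Morrison-Woodbury identity

$$ \Big(\frac{I}{\nu} + HH'\Big)^{-1} = \nu\,\Big(I - H\Big(\frac{I}{\nu} + H'H\Big)^{-1}H'\Big) $$

reduces the work to inverting only the small (n+1)-by-(n+1) matrix $\frac{I}{\nu} + H'H$.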

LSVM Algorithm - Linear Kernel
11 Lines of MATLAB Code

function [it, opt, w, gamma] = svml(A,D,nu,itmax,tol)
% lsvm with SMW for min 1/2*u'*Q*u-e'*u s.t. u=>0,
% Q=I/nu+H*H', H=D[A -e]
% Input: A, D, nu, itmax, tol; Output: it, opt, w, gamma
% [it, opt, w, gamma] = svml(A,D,nu,itmax,tol);
[m,n]=size(A);alpha=1.9/nu;e=ones(m,1);H=D*[A -e];it=0;
S=H*inv((speye(n+1)/nu+H'*H));            % small (n+1)-by-(n+1) SMW factor
u=nu*(1-S*(H'*e));oldu=u+1;               % initial point
while it<itmax & norm(oldu-u)>tol
  z=(1+pl(((u/nu+H*(H'*u))-alpha*u)-1));  % z = e + ((Qu-e)-alpha*u)_+
  oldu=u;
  u=nu*(z-S*(H'*z));                      % u = Q^{-1}*z via SMW
  it=it+1;
end;
opt=norm(u-oldu);w=A'*D*u;gamma=-e'*D*u;  % recover primal w and gamma

function pl = pl(x); pl = (abs(x)+x)/2;   % plus function (x)_+

LSVM Algorithm - Linear Kernel
Computational Results
2 million random points in 10-dimensional space
  Classified in 6.7 minutes in 6 iterations to 1e-5 accuracy
  250 MHz UltraSPARC II with 2 gigabytes of memory
  CPLEX ran out of memory
32562 points in 123-dimensional space (UCI Adult dataset)
  Classified in 141 seconds & 55 iterations to 85% correctness
  400 MHz Pentium II with 2 gigabytes of memory
  SVM classified in 178 seconds & 4497 iterations

LSVM - Nonlinear Kernel Formulation
For a nonlinear kernel, the separating nonlinear surface is given in terms of the solution u of the same dual problem, with Q redefined through the kernel (reconstruction below)
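A plausible reconstruction in the LSVM paper's notation (the exact formulas were images in the slides): with $G = [A\ \ {-e}]$, the separating surface is

$$ K\big([x'\ \ {-1}],\, G'\big)\,Du = 0, $$

where u solves

$$ \min_{0 \le u}\ \frac{1}{2}u'Qu - e'u, \qquad Q = \frac{I}{\nu} + D\,K(G, G')\,D. $$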

LSVM Algorithm - Nonlinear Kernel Application
100 iterations, 58 seconds on a Pentium II, 95.9% accuracy

Reduced Support Vector Machines (RSVM)
Large Nonlinear Kernel Classification Problems
Key idea: use a rectangular kernel K(A, Abar'), where Abar is a small random sample of the rows of A
  Abar typically has 1% to 10% of the rows of A
Two important consequences:
  RSVM can solve very large problems
  The nonlinear separating surface depends on Abar only
Using the small square kernel K(Abar, Abar') alone (a conventional SVM on just the random sample) gives lousy results (see the checkerboard comparison and the sketch below)
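A toy MATLAB/Octave sketch of the rectangular kernel idea, reusing the gaussian_kernel sketch from the kernels slide above; the data, the kernel width mu, and the 5% sampling rate are illustrative assumptions:

A = randn(1000, 10); mu = 0.1;       % toy data and kernel width
m = size(A, 1);
mbar = ceil(0.05*m);                 % keep roughly 5% of the rows
idx = randperm(m);
Abar = A(idx(1:mbar), :);            % Abar: small random row sample of A
K = gaussian_kernel(A, Abar, mu);    % rectangular kernel, m-by-mbar (not m-by-m)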

Conventional SVM Result on Checkerboard Using 50 Random Points Out of 1000

RSVM Result on Checkerboard Using SAME 50 Random Points Out of 1000

RSVM on Large Classification Problems
Standard error over 50 runs = 0.001 to 0.002
RSVM time = 1.24 * (random points time)

Conclusion
Mathematical programming plays an essential role in SVMs
Theory
  New formulations: generalized SVMs
  New algorithm-generating concepts: smoothing (SSVM), implicit Lagrangian (LSVM)
Algorithms
  Fast: SSVM
  Massive: LSVM, RSVM

Future Research
Theory
  Concave minimization
  Concurrent feature & data selection
  Multiple-instance problems
  SVMs as complementarity problems
  Kernel methods in nonlinear programming
Algorithms
  Chunking for massive classification
  Multicategory classification algorithms

Talk & Papers Available on the Web
www.cs.wisc.edu/~olvi