RSVM: Reduced Support Vector Machines
Y.-J. Lee & O. L. Mangasarian
First SIAM International Conference on Data Mining, Chicago, April 6, 2001
University of Wisconsin-Madison

Outline of Talk
 What is a support vector machine (SVM) classifier?
 The smooth support vector machine (SSVM)
   A new SVM solvable without an optimization package
 Difficulties with nonlinear SVMs:
   Computational: handling the massive $m \times m$ kernel matrix
   Storage: separating surface depends on almost the entire dataset
 Reduced Support Vector Machines (RSVMs)
   Reduced kernel: a much smaller rectangular matrix, with $\bar m$ typically 1% to 10% of $m$
   Speeds computation & reduces storage
 Numerical results
   e.g. a 32,562-point dataset classified in 17 minutes, compared to 2.15 hours by a standard algorithm (SMO)

What is a Support Vector Machine?
 An optimally defined surface
 Typically nonlinear in the input space
 Linear in a higher-dimensional feature space
 Implicitly defined by a kernel function

What are Support Vector Machines Used For?
 Classification
 Regression & data fitting
 Supervised & unsupervised learning
(This talk concentrates on classification.)

Geometry of the Classification Problem: 2-Category Linearly Separable Case
[Figure: two linearly separable point sets, A+ and A-.]

Support Vector Machines: Maximizing the Margin between Bounding Planes
[Figure: bounding planes $x'w = \gamma + 1$ and $x'w = \gamma - 1$ separating A+ from A-, with the margin between them.]

Support Vector Machines Formulation
 Solve the following quadratic program for some $\nu > 0$:
$$\min_{w,\gamma,y}\ \frac{\nu}{2}\|y\|^{2} + \frac{1}{2}\left(\|w\|^{2} + \gamma^{2}\right)\quad \text{s.t.}\quad D(Aw - e\gamma) + y \ge e,\ \ y \ge 0 \qquad \text{(QP)}$$
where $A \in R^{m \times n}$ holds the training points, $e$ is a vector of ones, and the diagonal matrix $D$ with $D_{ii} = \pm 1$ denotes $A+$ or $A-$ membership.
 The margin is maximized by minimizing $\frac{1}{2}\left(\|w\|^{2} + \gamma^{2}\right)$.
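For concreteness, a minimal sketch of (QP) in cvxpy; the toy data, parameter values, and solver defaults are ours, not from the talk.

```python
# A minimal sketch of (QP) above using cvxpy (toy data and parameters are ours).
import cvxpy as cp
import numpy as np

m, n, nu = 40, 2, 1.0
rng = np.random.default_rng(0)
d = np.where(np.arange(m) < m // 2, -1.0, 1.0)        # labels: the diagonal of D
A = rng.standard_normal((m, n)) + 2.0 * d[:, None]    # two separated point clouds

w, gamma, y = cp.Variable(n), cp.Variable(), cp.Variable(m)
objective = cp.Minimize(nu / 2 * cp.sum_squares(y) + 0.5 * (cp.sum_squares(w) + gamma ** 2))
constraints = [cp.multiply(d, A @ w - gamma) + y >= 1, y >= 0]
cp.Problem(objective, constraints).solve()
print(w.value, gamma.value)                           # separating plane: x'w = gamma
```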

SVM as an Unconstrained Minimization Problem
 At the solution of (QP): $y = (e - D(Aw - e\gamma))_{+}$, where $(\cdot)_{+}$ replaces negative components by zeros.
 Hence (QP) is equivalent to the nonsmooth SVM:
$$\min_{w,\gamma}\ \frac{\nu}{2}\|(e - D(Aw - e\gamma))_{+}\|^{2} + \frac{1}{2}\left(\|w\|^{2} + \gamma^{2}\right)$$
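One step the slide compresses, written out (our derivation): for fixed $(w,\gamma)$, (QP) minimizes $\frac{\nu}{2}\|y\|^{2}$ over $y \ge 0$ and $y \ge e - D(Aw - e\gamma)$, which separates componentwise into

$$y_{i} = \max\{0,\ (e - D(Aw - e\gamma))_{i}\}, \qquad \text{i.e.} \qquad y = (e - D(Aw - e\gamma))_{+};$$

substituting this $y$ back into the objective gives exactly the unconstrained nonsmooth SVM above.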

SSVM: The Smooth Support Vector Machine
 Here, $p(x,\alpha) = x + \frac{1}{\alpha}\log\left(1 + \exp(-\alpha x)\right)$ is an accurate smooth approximation of the plus function $x_{+}$, obtained by integrating the sigmoid function $\frac{1}{1+\exp(-\alpha x)}$ of neural networks. (sigmoid = smoothed step)
 Replacing the plus function in the nonsmooth SVM by the smooth $p(\cdot,\alpha)$ gives our SSVM:
$$\min_{w,\gamma}\ \frac{\nu}{2}\|p\left(e - D(Aw - e\gamma),\,\alpha\right)\|^{2} + \frac{1}{2}\left(\|w\|^{2} + \gamma^{2}\right)$$
 The solution of SSVM converges to the solution of the nonsmooth SVM as $\alpha$ goes to infinity. (Typically, $\alpha = 5$.)
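A tiny numpy check of the smoothing; the function names are ours.

```python
# Smooth plus p(x, alpha) vs. the plus function x_+ (names are ours).
import numpy as np

def plus(x):
    """The plus function x_+ = max(x, 0)."""
    return np.maximum(x, 0.0)

def p(x, alpha=5.0):
    """p(x, alpha) = x + (1/alpha) * log(1 + exp(-alpha * x));
    np.logaddexp(0, -alpha*x) evaluates the log term stably."""
    return x + np.logaddexp(0.0, -alpha * x) / alpha

x = np.linspace(-2.0, 2.0, 9)
for a in (1.0, 5.0, 25.0):
    print(a, np.abs(p(x, a) - plus(x)).max())  # gap = log(2)/alpha, shrinking as alpha grows
```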

Nonlinear Smooth Support Vector Machine
 Use a nonlinear kernel $K(A,A')$ in SSVM:
$$\min_{u,\gamma}\ \frac{\nu}{2}\|p\left(e - D(K(A,A')Du - e\gamma),\,\alpha\right)\|^{2} + \frac{1}{2}\left(\|u\|^{2} + \gamma^{2}\right)$$
 The $m \times m$ kernel matrix $K(A,A')$ is fully dense
 Use a Newton algorithm to solve the problem
 Each iteration solves $m+1$ linear equations in $m+1$ variables
 The nonlinear separating surface depends on the entire dataset: $K(x', A')Du = \gamma$
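The Newton step the slide refers to can be made explicit (our derivation from the smooth objective above, not spelled out in the talk): write $z = (u, \gamma)$, $E = [\,DK(A,A')D \;\; -De\,]$ and residual $r = e - Ez$, so the objective is $\Phi(z) = \frac{\nu}{2}\|p(r,\alpha)\|^{2} + \frac{1}{2}\|z\|^{2}$. Since $p'(x) = s(x) := \left(1 + \exp(-\alpha x)\right)^{-1}$ (the sigmoid),

$$\nabla\Phi(z) = -\nu\, E^{\top}\left(p(r,\alpha)\odot s(r)\right) + z,$$
$$\nabla^{2}\Phi(z) = \nu\, E^{\top}\,\mathrm{diag}\!\left(s(r)^{2} + \alpha\, p(r,\alpha)\, s(r)(1-s(r))\right)E + I,$$

and each iteration solves the $(m+1)\times(m+1)$ system $\nabla^{2}\Phi(z)\,\Delta z = -\nabla\Phi(z)$.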

Examples of Kernels
 Polynomial kernel: $K(A,A')_{ij} = (A_{i}A_{j}' + 1)^{d}$, where $d$ is an integer (linear kernel: $K(A,A') = AA'$)
 Gaussian (radial basis) kernel: $K(A,A')_{ij} = \exp\left(-\mu\|A_{i} - A_{j}\|^{2}\right)$, $\mu > 0$
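A minimal numpy sketch of the two kernels just named; the parameter names mu and d are ours.

```python
# The polynomial and Gaussian kernels above, vectorized in numpy.
import numpy as np

def polynomial_kernel(A, B, d=2):
    """Entrywise (A_i . B_j + 1)^d; dropping the +1 with d = 1 gives the linear kernel A B'."""
    return (A @ B.T + 1.0) ** d

def gaussian_kernel(A, B, mu=1.0):
    """Entrywise exp(-mu * ||A_i - B_j||^2), computed without explicit loops."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-mu * sq)
```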

Difficulties with Nonlinear SVM for Large Problems
 The nonlinear kernel $K(A,A') \in R^{m \times m}$ is fully dense:
   Long CPU time to compute the $m^{2}$ numbers
   Runs out of memory while storing the $m \times m$ kernel matrix
   Complexity of nonlinear SSVM is governed by the $m+1$ linear equations solved at each Newton iteration
 Separating surface depends on almost the entire dataset:
   Need to store the entire dataset $A$ even after solving the problem

Overcoming Computational & Storage Difficulties: Use a Rectangular Kernel
 Choose a small random sample $\bar A \in R^{\bar m \times n}$ of $A$
 The small random sample $\bar A$ is a representative sample of the entire dataset
 Typically $\bar m$ is 1% to 10% of the rows of $A$
 Replace $K(A,A')$ by the rectangular kernel $K(A,\bar A')$, with corresponding $\bar D \subset D$, in the nonlinear SSVM
 Only need to compute and store $m \times \bar m$ numbers for $K(A,\bar A')$
 Computational complexity is reduced accordingly: each Newton iteration now solves only $\bar m + 1$ linear equations
 The nonlinear separator depends only on $\bar A$
 Using $K(\bar A,\bar A')$ (training on the small sample alone) gives lousy results!
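To make the storage saving concrete, a back-of-the-envelope in Python; the 32,562-row figure comes from the Adult dataset used in the numerical results below, and the 1% sample size is illustrative.

```python
# Entry counts: full square kernel vs. a 1% rectangular kernel (illustrative sizes).
m = 32562            # training rows (UCI Adult, as in the results below)
mbar = m // 100      # a 1% random sample
print(f"K(A, A')    : {m * m:,} entries")      # ~1.06 billion
print(f"K(A, Abar') : {m * mbar:,} entries")   # ~10.6 million
```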

Reduced Support Vector Machine Algorithm
(i) Choose a random subset matrix $\bar A \in R^{\bar m \times n}$ of the entire data matrix $A \in R^{m \times n}$
(ii) Solve the following problem by the Newton method, with corresponding $\bar D \subset D$:
$$\min_{\bar u,\gamma}\ \frac{\nu}{2}\|p\left(e - D(K(A,\bar A')\bar D \bar u - e\gamma),\,\alpha\right)\|^{2} + \frac{1}{2}\left(\|\bar u\|^{2} + \gamma^{2}\right)$$
(iii) The separating surface is defined by the optimal solution $(\bar u,\gamma)$ of step (ii): $K(x',\bar A')\bar D \bar u = \gamma$
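A self-contained, hedged sketch of steps (i)-(iii) in numpy. This is our code, not the authors': it uses the Gaussian kernel from above, the Newton step derived earlier, and a plain Newton iteration, whereas the SSVM papers safeguard Newton with an Armijo stepsize.

```python
# A hedged numpy sketch of RSVM training, steps (i)-(iii) above (our code).
import numpy as np

def gaussian_kernel(A, B, mu=1.0):
    """Entrywise exp(-mu * ||A_i - B_j||^2)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-mu * sq)

def rsvm_train(A, d, frac=0.1, nu=1.0, mu=1.0, alpha=5.0, iters=30, seed=0):
    """A: m x n data, d: labels in {-1, +1}. Returns (Abar, dbar, u, gamma)."""
    rng = np.random.default_rng(seed)
    m = A.shape[0]
    mbar = max(1, int(frac * m))
    idx = rng.choice(m, size=mbar, replace=False)     # (i) random subset Abar of A
    Abar, dbar = A[idx], d[idx]

    K = gaussian_kernel(A, Abar, mu)                  # m x mbar rectangular kernel
    E = np.hstack([d[:, None] * K * dbar[None, :],    # E @ z = D(K(A,Abar') Dbar u - e*gamma)
                   -d[:, None]])                      # with z = (u, gamma)
    z = np.zeros(mbar + 1)
    for _ in range(iters):                            # (ii) Newton iteration on the SSVM objective
        r = 1.0 - E @ z                               # residual e - Ez
        s = 1.0 / (1.0 + np.exp(-alpha * r))          # sigmoid s(r) = p'(r)
        p = r + np.logaddexp(0.0, -alpha * r) / alpha # smooth plus p(r, alpha)
        grad = -nu * (E.T @ (p * s)) + z
        wts = s * s + alpha * p * s * (1.0 - s)       # diagonal Hessian weights
        H = nu * (E.T * wts) @ E + np.eye(mbar + 1)
        step = np.linalg.solve(H, grad)
        z -= step
        if np.linalg.norm(step) < 1e-8:
            break
    return Abar, dbar, z[:-1], z[-1]

def rsvm_classify(X, Abar, dbar, u, gamma, mu=1.0):
    """(iii) sign of K(x', Abar') Dbar u - gamma."""
    return np.sign(gaussian_kernel(np.atleast_2d(X), Abar, mu) @ (dbar * u) - gamma)
```

Note that the linear system solved at each iteration is only $(\bar m + 1) \times (\bar m + 1)$, which is where the speedup over the full nonlinear SSVM comes from.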

How to Choose $\bar A$ in RSVM?
 $\bar A$ is a representative sample of the entire dataset
 $\bar A$ need not be a subset of $A$
 A good selection of $\bar A$ may generate a classifier using a very small $\bar m$
 Possible ways to choose $\bar A$:
   Choose $\bar m$ random rows from the entire dataset $A$
   Choose $\bar A$ such that the distance between its rows exceeds a certain tolerance
   Use $k$ cluster centers of $A+$ and $A-$ as $\bar A$
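A small sketch of the last option using scikit-learn's KMeans; the per-class k and the function name are our choices.

```python
# Cluster centers of A+ and A- as the reduced set Abar (our sketch).
import numpy as np
from sklearn.cluster import KMeans

def cluster_reduced_set(A, d, k=10, seed=0):
    """Use k cluster centers of A+ and k of A- as Abar; its rows need not belong to A."""
    centers, labels = [], []
    for label in (+1.0, -1.0):
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(A[d == label])
        centers.append(km.cluster_centers_)
        labels.append(np.full(k, label))
    return np.vstack(centers), np.concatenate(labels)  # Abar, dbar
```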

A Nonlinear Kernel Application
Checkerboard training set: 1000 points in $R^{2}$. Separate 486 asterisks from 514 dots.
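For reproduction purposes, a hedged sketch of generating a checkerboard training set like the one described; the 4 x 4 grid on the unit square is our assumption, not stated in the talk.

```python
# Generate a 1000-point checkerboard training set (grid layout is our assumption).
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((1000, 2))                    # 1000 points in the unit square
cells = np.floor(4 * X).astype(int)          # which 4 x 4 checkerboard cell each point is in
y = np.where(cells.sum(axis=1) % 2 == 0, 1.0, -1.0)  # alternating cell labels
```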

Conventional SVM Result on Checkerboard Using 50 Randomly Selected Points Out of 1000

RSVM Result on Checkerboard Using SAME 50 Random Points Out of 1000

RSVM on Moderate-Sized Problems (Best Test Set Correctness %, CPU seconds)
Datasets (points x features): Cleveland Heart 297 x 13, BUPA Liver 345 x 6, Ionosphere 351 x 34, Pima Indians 768 x 8, Tic-Tac-Toe 958 x 9, Mushroom 8124 x 22 (N/A for the conventional full-kernel SVM).

RSVM on Large UCI Adult Dataset
Average test set correctness % & standard deviation over 50 runs, for (training, testing) splits: (6414, 26148), (11221, 21341), (16101, 16461), (22697, 9865), (32562, 16282).

CPU Times on UCI Adult Dataset: RSVM, SMO and PCGC with a Gaussian Kernel
Training set size vs. CPU time in seconds for RSVM, SMO and PCGC (PCGC ran out of memory at the larger training set sizes).

[Figure: CPU time (seconds) vs. training set size on the UCI Adult dataset for RSVM, SMO and PCGC with a Gaussian kernel.]

Conclusion
 RSVM: an effective classifier for large datasets
   Classifier uses 10% or less of the dataset
   Can handle massive datasets
   Much faster than other algorithms
 Test set correctness:
   Same or better than the full-dataset classifier
   Much better than a randomly chosen subset classifier
 Rectangular kernel:
   A novel practical idea
   Applicable to all nonlinear kernel problems