Concave Minimization for Support Vector Machine Classifiers
Unlabeled Data Classification & Data Selection
Glenn Fung, O. L. Mangasarian
Part 1: Unlabeled Data Classification
- Given a large unlabeled dataset, use a k-Median clustering algorithm to select a small (5% to 10%) representative sample.
- The representative sample is labeled by an expert or oracle.
- The combined labeled-unlabeled dataset is classified by a Semi-supervised Support Vector Machine.
- Test set correctness is within 5.2% of a linear support vector machine trained on the entire dataset labeled by an expert.
Part 2: Data Selection for Support Vector Machine Classifiers
- Extract a minimal set of data points from a given dataset.
- The minimal set is used to generate a Minimal Support Vector Machine (MSVM) classifier.
- The MSVM classifier is as good as or better than one obtained by training on the entire dataset.
- Feature selection is incorporated into the procedure to obtain a minimal set of input features.
- Data reduction was as high as 81% and averaged 66% over seven public datasets.
SVM: Linear Support Vector Machine
Separate the two classes by a plane x'w = γ, with bounding planes x'w = γ + 1 and x'w = γ - 1; the distance between the bounding planes is maximized while classification errors are penalized.
1-norm Linear SVM
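The linear program on this slide did not survive extraction. The standard 1-norm linear SVM, with data rows in A, labels on the diagonal of D = diag(d), slack vector y, and tradeoff ν, solves min ||w||_1 + ν e'y subject to D(Aw - eγ) + y ≥ e, y ≥ 0. A minimal sketch with SciPy's `linprog` (the variable split w = p - q is an implementation detail, not part of the model):

```python
import numpy as np
from scipy.optimize import linprog

def fit_1norm_svm(A, d, nu=1.0):
    """1-norm linear SVM as an LP:
        min ||w||_1 + nu * e'y   s.t.   D(Aw - e*gamma) + y >= e,  y >= 0,
    where D = diag(d) holds the +/-1 labels.  To keep the objective linear,
    w is split as p - q and gamma as g1 - g2, all variables >= 0."""
    m, n = A.shape
    DA = d[:, None] * A                       # rows of A scaled by their labels
    # variable order: [p (n), q (n), g1, g2, y (m)]
    c = np.concatenate([np.ones(2 * n), [0.0, 0.0], nu * np.ones(m)])
    # D(Aw - e*gamma) + y >= e, rewritten as A_ub x <= b_ub:
    A_ub = np.hstack([-DA, DA, d[:, None], -d[:, None], -np.eye(m)])
    res = linprog(c, A_ub=A_ub, b_ub=-np.ones(m),
                  bounds=[(0, None)] * (2 * n + 2 + m))
    x = res.x
    w = x[:n] - x[n:2 * n]
    gamma = x[2 * n] - x[2 * n + 1]
    return w, gamma

# Tiny separable example: the learned plane x'w = gamma splits the two classes.
A = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -2.0], [-1.0, -3.0]])
d = np.array([1.0, 1.0, -1.0, -1.0])
w, gamma = fit_1norm_svm(A, d)
pred = np.where(A @ w - gamma >= 0, 1.0, -1.0)
```

The 1-norm objective is what makes this SVM an LP rather than a QP, and it also tends to zero out components of w, a property Part 2 exploits for feature selection.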
Unlabeled Data Classification
- Given a completely unlabeled large dataset, it is costly to have points labeled by an expert or an oracle.
- Two questions arise:
  - How to choose a small subset for labeling?
  - How to combine labeled and unlabeled data?
- Answers:
  - Use k-median clustering to select "representative" points to be labeled.
  - Use a semi-supervised SVM to obtain a classifier based on both labeled and unlabeled data.
Unlabeled Data Classification (pipeline)
Unlabeled Data Set -> k-Median clustering -> Chosen Data + Remaining Data
Chosen Data -> Expert -> Labeled Data
Labeled Data + Remaining Data -> Semi-supervised SVM -> Separating Plane
K-Median Clustering Algorithm
Given m data points, find k cluster centers such that the sum of the 1-norm distances from each point to its closest cluster center is minimized.
K-Median Clustering Algorithm
Starting from k initial cluster centers, alternate until the centers stop changing:
(a) Cluster assignment: assign each point to a closest center in the 1-norm.
(b) Center update: replace each center by the coordinatewise median of the points assigned to it.
Neither step increases the objective, so the algorithm terminates in a finite number of iterations.
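The two alternating steps can be sketched directly in NumPy. The farthest-point initialization below is an assumption added so the example is deterministic and reproducible; it is not from the slides:

```python
import numpy as np

def k_median(X, k, iters=100):
    """k-Median clustering: alternate (a) assigning each point to the
    closest center in the 1-norm and (b) replacing each center by the
    coordinatewise median of its cluster (the 1-norm minimizer)."""
    # Deterministic farthest-point initialization (an assumption, for reproducibility).
    centers = [X[0]]
    for _ in range(k - 1):
        dmin = np.min([np.abs(X - c).sum(axis=1) for c in centers], axis=0)
        centers.append(X[dmin.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        # 1-norm distance from every point to every center
        dist = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array(
            [np.median(X[labels == j], axis=0) if np.any(labels == j) else centers[j]
             for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Two well-separated blobs: each ends up in its own cluster.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(5.0, 0.3, (20, 2)), rng.normal(-5.0, 0.3, (20, 2))])
centers, labels = k_median(X, 2)
```

In the pipeline above, the final centers (or the points closest to them) are the "chosen data" handed to the expert for labeling.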
Semi-supervised SVM (S3VM)
Given a dataset consisting of:
- labeled (+1, -1) points, represented by the rows of a matrix A with labels on the diagonal of D, and
- unlabeled points, represented by the rows of a matrix B,
classify the data into two classes as follows: assign each unlabeled point in B to a class (+1 or -1) so as to maximize the distance between the bounding planes obtained by a linear 1-norm SVM applied to the entire dataset.
S3VM Formulation
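The mathematics on this slide was lost in extraction. In the notation of the authors' related papers (A: labeled points, one per row, with label matrix D = diag(d), d_i = ±1; B: p unlabeled points, one per row; ν, μ > 0: error weights), the concave S3VM can be sketched as below; treat this as a reconstruction, not a verbatim copy of the slide:

```latex
\begin{aligned}
\min_{w,\gamma,y,r,s}\quad & \|w\|_{1} + \nu\, e^{\top} y
    + \mu \sum_{j=1}^{p} \min\{r_{j},\, s_{j}\} \\
\text{s.t.}\quad & D(Aw - e\gamma) + y \ge e, \quad y \ge 0,
    \quad \text{(labeled points)} \\
 & r_{j} \ge 1 - (B_{j} w - \gamma), \quad r_{j} \ge 0,
    \quad \text{(error if } B_{j} \text{ is labeled } +1\text{)} \\
 & s_{j} \ge 1 + (B_{j} w - \gamma), \quad s_{j} \ge 0.
    \quad \text{(error if } B_{j} \text{ is labeled } -1\text{)}
\end{aligned}
```

Each min{r_j, s_j} picks the cheaper of the two possible class assignments for unlabeled point B_j; it is this term that makes the objective concave.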
S3VM: A Concave Approach
The min(r, s) term in the objective function is concave because it is the minimum of two linear functions. A local solution to this problem is obtained by solving a finite succession of linear programs (typically 4 to 7).
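One way to read "a succession of linear programs": each unlabeled point contributes a concave term min(r_j, s_j), the smaller of its two class-assignment errors; fix whichever branch is currently active (i.e. tentatively label the point with its current side of the plane), solve the resulting ordinary labeled 1-norm SVM LP on all points, and repeat. A self-contained sketch, with the 1-norm SVM LP restated inline; the stopping rule and tolerances are assumptions:

```python
import numpy as np
from scipy.optimize import linprog

def fit_1norm_svm(A, d, nu=1.0):
    """1-norm SVM LP:  min ||w||_1 + nu*e'y  s.t.  D(Aw - e*gamma) + y >= e,
    with the split w = p - q, gamma = g1 - g2, all variables >= 0."""
    m, n = A.shape
    DA = d[:, None] * A
    c = np.concatenate([np.ones(2 * n), [0.0, 0.0], nu * np.ones(m)])
    A_ub = np.hstack([-DA, DA, d[:, None], -d[:, None], -np.eye(m)])
    res = linprog(c, A_ub=A_ub, b_ub=-np.ones(m),
                  bounds=[(0, None)] * (2 * n + 2 + m))
    x = res.x
    return x[:n] - x[n:2 * n], x[2 * n] - x[2 * n + 1]

def s3vm(A, d, B, nu=1.0, max_lps=7):
    """Successive linearization: commit each unlabeled row of B to the
    active branch of its min(r_j, s_j) term (its current side of the
    plane), re-solve the labeled LP on all points, and repeat."""
    w, gamma = fit_1norm_svm(A, d, nu)          # plane from labeled data only
    for _ in range(max_lps):
        d_B = np.where(B @ w - gamma >= 0, 1.0, -1.0)
        w_new, g_new = fit_1norm_svm(np.vstack([A, B]),
                                     np.concatenate([d, d_B]), nu)
        if np.allclose(w_new, w) and np.isclose(g_new, gamma):
            break
        w, gamma = w_new, g_new
    return w, gamma, np.where(B @ w - gamma >= 0, 1, -1)

# Two labeled points plus two unlabeled points near the respective clusters.
A = np.array([[3.0, 3.0], [-3.0, -3.0]])
d = np.array([1.0, -1.0])
B = np.array([[2.0, 2.5], [-2.5, -2.0]])
w, gamma, labels = s3vm(A, d, B)
```

Each iteration solves one LP, matching the slide's count of 4 to 7 linear programs for a local solution.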
S3VM: Graphical Example
Separate triangles & circles; hollow shapes represent labeled data, solid shapes represent unlabeled data.
[Figure: separating planes obtained by SVM vs. S3VM]
Numerical Tests
Part 2: Data Selection for Support Vector Machine Classifiers (pipeline)
Labeled dataset -> 1-norm SVM feature selection -> Smaller-dimension dataset -> Support vector suppression (MSVM) -> Separating surface
Support Vectors
The data points that lie on or between the bounding planes (equivalently, those with positive multipliers u_i); they are the only points on which the separating plane depends.
Feature Selection using the 1-norm Linear SVM (with a suitably small tradeoff parameter)
The 1-norm term drives many components of w to exactly zero; input features with w_i = 0 are discarded.
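One common parametrization of the feature-selecting 1-norm SVM is sketched below; the symbol λ ∈ (0, 1), trading classification error against sparsity of w, is an assumption, since the parenthetical on this slide was garbled:

```latex
\min_{w,\gamma,y}\;\; (1-\lambda)\, e^{\top} y \;+\; \lambda\, \|w\|_{1}
\quad\text{s.t.}\quad D(Aw - e\gamma) + y \ge e,\quad y \ge 0,
\qquad\text{retained features: } \{\, i : w_{i} \neq 0 \,\}.
```

Increasing the weight on the 1-norm term zeroes more components of w, shrinking the feature set fed to the MSVM stage.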
Motivation for the Minimal Support Vector Machine (MSVM)
Suppression of the error term y:
- Minimizes the number of misclassified points.
- Works remarkably well computationally.
- Reduces the number of positive components of the multiplier u, and hence the number of support vectors.
MSVM Formulation
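The formulation itself was lost with the slide graphics. Based on the motivation above (suppression of the error term y counts, rather than sums, the misclassified points), a hedged reconstruction in the style of the authors' concave-minimization work is:

```latex
\begin{aligned}
\min_{w,\gamma,y}\quad & \nu\, e^{\top} y_{*} + \|w\|_{1},
 \qquad\text{where } (y_{*})_{i} =
   \begin{cases} 1 & \text{if } y_{i} > 0, \\ 0 & \text{if } y_{i} = 0, \end{cases}\\
\text{s.t.}\quad & D(Aw - e\gamma) + y \ge e, \quad y \ge 0.
\end{aligned}
```

The discontinuous step vector y_* is approximated by the smooth concave exponential e - ε^{-αy} (applied componentwise, α > 0), and the resulting concave program is solved by a finite succession of linear programs; the exact constants and any additional support-vector-suppression term on the original slide may differ.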
Numerical Tests
Unlabeled Data Classification: Conclusions
- A fast, finite, linear-programming-based approach for Semi-supervised Support Vector Machines was proposed for classifying large datasets that are mostly unlabeled.
- Totally unlabeled datasets were classified by:
  - labeling a small percentage of the data, chosen by clustering, by an expert;
  - classifying the combined data with a semi-supervised SVM.
- Test set correctness was within 5.2% of a linear SVM trained on the entire dataset labeled by an expert.
Data Selection for SVM Classifiers: Conclusions
- The Minimal SVM (MSVM) extracts a minimal subset of the data that is used to classify the entire dataset.
- MSVM maintains or improves generalization over classifiers that use the entire dataset.
- Data reduction was as high as 81%, and averaged 66%, over seven public datasets.
Future work:
- MSVM is a promising tool for incremental algorithms.
- Improve chunking algorithms with MSVM.
- Nonlinear MSVM: strong potential for time and storage reduction.