
Incremental Support Vector Machine Classification
Second SIAM International Conference on Data Mining, Arlington, Virginia, April 11-13, 2002
Glenn Fung & Olvi Mangasarian, Data Mining Institute, University of Wisconsin - Madison

Key Contributions
- Fast incremental classifier based on PSVM (Proximal Support Vector Machine)
- Capable of modifying an existing linear classifier by both adding and retiring data
- Extremely simple to implement
- Small memory requirement, even for huge problems (1 billion data points)
- NO optimization packages (LP, QP) needed

Outline of Talk
- (Standard) support vector machines (SVM): classification by halfspaces
- Proximal linear support vector machines (PSVM): classification by proximity to planes
- The incremental and decremental algorithm: option of keeping or retiring old data
- Numerical results: 1 billion points in 10-dimensional space classified in less than 3 hours! The results confirm that algorithm time is linear in the number of data points.

Support Vector Machines: Maximizing the Margin Between Bounding Planes
[Figure: the two bounding planes of the classes A+ and A-, with the margin between them]

Proximal Support Vector Machines: Fitting the Data Using Two Parallel Bounding Planes
[Figure: two parallel planes, each fitted proximally to one of the classes A+ and A-]

Standard Support Vector Machine: Algebra of the 2-Category Linearly Separable Case
- Given m points in n-dimensional space, represented by an m-by-n matrix A.
- Membership of each point in class +1 or -1 is specified by an m-by-m diagonal matrix D with +1 and -1 diagonal entries.
- Separate by two bounding planes, $x'w = \gamma + 1$ and $x'w = \gamma - 1$:
  $A_i w \ge \gamma + 1$ for $D_{ii} = +1$, and $A_i w \le \gamma - 1$ for $D_{ii} = -1$.
- More succinctly: $D(Aw - e\gamma) \ge e$, where $e$ is a vector of ones.

Standard Support Vector Machine Formulation
- Solve the quadratic program for some $\nu > 0$:
  $$\min_{w,\gamma,y} \ \nu e'y + \tfrac{1}{2}w'w \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e,\ y \ge 0 \qquad \text{(QP)}$$
  where $D_{ii} = \pm 1$ denotes A+ or A- membership and $y$ is the slack (error) vector.
- The margin between the bounding planes is $2/\|w\|$; it is maximized by minimizing $\tfrac{1}{2}w'w$.

PSVM Formulation
We have from the standard QP SVM formulation:
$$\min_{w,\gamma,y} \ \nu e'y + \tfrac{1}{2}w'w \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e,\ y \ge 0 \qquad \text{(QP)}$$
PSVM squares the error $y$, adds $\gamma^2$ to the regularization term, and replaces the inequality constraint by an equality:
$$\min_{w,\gamma,y} \ \tfrac{\nu}{2}\|y\|^2 + \tfrac{1}{2}(w'w + \gamma^2) \quad \text{s.t.} \quad D(Aw - e\gamma) + y = e$$
This simple but critical modification changes the nature of the optimization problem tremendously! Solving for $y$ in terms of $w$ and $\gamma$ gives the unconstrained problem:
$$\min_{w,\gamma} \ \tfrac{\nu}{2}\|e - D(Aw - e\gamma)\|^2 + \tfrac{1}{2}(w'w + \gamma^2)$$

Advantages of New Formulation
- The objective function remains strongly convex.
- An explicit exact solution can be written in terms of the problem data.
- The PSVM classifier is obtained by solving a single system of linear equations in the usually small-dimensional input space.
- Exact leave-one-out correctness can be obtained in terms of the problem data.

Linear PSVM
We want to solve:
$$\min_{w,\gamma} \ \tfrac{\nu}{2}\|e - D(Aw - e\gamma)\|^2 + \tfrac{1}{2}(w'w + \gamma^2)$$
- Setting the gradient equal to zero gives a nonsingular system of linear equations.
- The solution of this system gives the desired PSVM classifier.
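For concreteness, here is a short derivation sketch of that linear system (not on the original slide; the notation $H = [A\ \ {-e}]$ and $z = [w;\ \gamma]$ anticipates the next slide):
$$f(z) = \tfrac{\nu}{2}\|e - DHz\|^2 + \tfrac{1}{2}\|z\|^2$$
$$\nabla f(z) = \nu H'D'(DHz - e) + z = \nu H'Hz - \nu H'De + z = 0,$$
using $D'D = I$, since $D$ is diagonal with $\pm 1$ entries. Dividing by $\nu$:
$$\left(\tfrac{I}{\nu} + H'H\right) z = H'De$$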

Linear PSVM Solution
$$\begin{bmatrix} w \\ \gamma \end{bmatrix} = \left(\tfrac{I}{\nu} + H'H\right)^{-1} H'De, \qquad \text{where } H = [A\ \ {-e}]$$
- The linear system to solve depends on $H'H$, which is of size $(n+1) \times (n+1)$.
- $n+1$ is usually much smaller than $m$.

Linear Proximal SVM Algorithm
Input: $A$, $D$, $\nu$.
Define $H = [A\ \ {-e}]$.
Calculate $v = H'De$.
Solve $(I/\nu + H'H)\,r = v$ for $r = [w;\ \gamma]$.
Classifier: $\operatorname{sign}(x'w - \gamma)$.

Linear & Nonlinear PSVM MATLAB Code

function [w, gamma] = psvm(A,d,nu)
% PSVM: linear and nonlinear classification
% INPUT: A, d=diag(D), nu. OUTPUT: w, gamma
% [w, gamma] = psvm(A,d,nu);
[m,n]=size(A); e=ones(m,1); H=[A -e];
v=(d'*H)';                  % v=H'*D*e
r=(speye(n+1)/nu+H'*H)\v;   % solve (I/nu+H'*H)r=v
w=r(1:n); gamma=r(n+1);     % getting w,gamma from r
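A minimal usage sketch (not part of the original slides; the synthetic data and the parameter value nu = 1 are illustrative assumptions):

% Hypothetical usage of psvm on synthetic 2-class data
m = 1000; n = 10; nu = 1;              % assumed sizes and parameter
A = [randn(m/2,n)+1; randn(m/2,n)-1];  % two shifted Gaussian clouds
d = [ones(m/2,1); -ones(m/2,1)];       % class labels +1 / -1
[w, gamma] = psvm(A, d, nu);
pred = sign(A*w - gamma);              % apply the linear classifier
trainAcc = mean(pred == d)             % fraction correctly classified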

Incremental PSVM Classification
- Suppose we have two "blocks" of data: $A = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix}$, with corresponding $D_1$, $D_2$ and $H_i = [A_i\ \ {-e_i}]$.
- The linear system to solve depends only on the compressed blocks $H'H = H_1'H_1 + H_2'H_2$ and $H'De = H_1'D_1e_1 + H_2'D_2e_2$, which are of size $(n+1) \times (n+1)$ and $(n+1) \times 1$, respectively.

Linear Incremental Proximal SVM Algorithm
[Flowchart] Initialization: $M = I/\nu$, $v = 0$. For each block read from disk: compute $H_i'H_i$ and $H_i'D_ie_i$, update $M$ and $v$ in memory, and either discard the raw block or keep it for later retiring. When no blocks remain, solve $Mr = v$ and compute the output classifier $r = [w;\ \gamma]$.
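A minimal MATLAB sketch of this loop, under stated assumptions (the block count nblocks and the loadBlock helper that reads block i from disk are hypothetical; only the (n+1)-by-(n+1) matrix M and the vector v persist across blocks):

function [w, gamma] = incremental_psvm(nblocks, n, nu)
% Incremental PSVM sketch: accumulate compressed block statistics,
% then solve one small linear system.
M = speye(n+1)/nu;              % running I/nu + sum_i Hi'*Hi
v = zeros(n+1,1);               % running sum_i Hi'*Di*ei
for i = 1:nblocks
    [Ai, di] = loadBlock(i);    % hypothetical reader: block i from disk
    Hi = [Ai -ones(size(Ai,1),1)];
    M = M + Hi'*Hi;             % update compressed statistics;
    v = v + Hi'*di;             % Ai, di can now be discarded
end
r = M\v;                        % solve (I/nu + H'*H) r = H'*D*e
w = r(1:n); gamma = r(n+1);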

Linear Incremental Proximal SVM: Adding and Retiring Data
- Capable of modifying an existing linear classifier by both adding and retiring data.
- Retiring old data is similar to adding new data: the retired block's compressed statistics are subtracted from the running sums rather than added (see the sketch after this list).
  - Financial data: old data is obsolete.
- Option of keeping old data and merging it with the new data:
  - Medical data: old data does not obsolesce.
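A sketch of one add/retire step, continuing the assumptions above (Mold and vold are the compressed statistics of the block being retired, which must have been stored, in memory or on disk, when that block was first processed; Anew and dnew are the incoming data):

% Hypothetical single update: retire block "old", add block "new"
Hnew = [Anew -ones(size(Anew,1),1)];
M = M - Mold + Hnew'*Hnew;   % Mold = Hold'*Hold, stored earlier
v = v - vold + Hnew'*dnew;   % vold = Hold'*Dold*eold
r = M\v;                     % refreshed classifier
w = r(1:n); gamma = r(n+1);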

Numerical Experiments: One-Billion-Point Two-Class Dataset
- Synthetic dataset consisting of 1 billion points in 10-dimensional input space
- Generated by the NDC (Normally Distributed Clustered) dataset generator
- Dataset divided into 500 blocks of 2 million points each
- Solution obtained in less than 2 hours and 26 minutes; about 30% of the time was spent reading data from disk
- Testing-set correctness: 90.79%

Numerical Experiments: Simulation of a Two-Month, 60-Million-Point Dataset
- Synthetic dataset consisting of 60 million points (1 million per day) in 10-dimensional input space, generated using NDC
- At the beginning, we only have data corresponding to the first month
- Every day: the oldest block of data (1 million points) is retired, a new block (1 million points) is added, and a new linear classifier is calculated
- Only an 11-by-11 matrix is kept in memory at the end of each day; all other data is purged (a driver sketch follows below)
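A hypothetical driver for this sliding-window simulation, assuming a dayStats helper that returns the compressed statistics of one day's block (an 11-by-11 matrix and an 11-by-1 vector), rereading a day's block from disk when it is retired so that only M and v stay in memory:

% Sliding one-month window over the two-month stream (sketch)
n = 10; nu = 1;                       % assumed dimension and parameter
M = speye(n+1)/nu; v = zeros(n+1,1);
for day = 1:30                        % first month: accumulate only
    [Md, vd] = dayStats(day);         % hypothetical per-day statistics
    M = M + Md;  v = v + vd;
end
for day = 31:60                       % second month: add new, retire old
    [Mnew, vnew] = dayStats(day);
    [Mold, vold] = dayStats(day-30);  % reread the retired day from disk
    M = M + Mnew - Mold;  v = v + vnew - vold;
    r = M\v;                          % daily classifier
    w = r(1:n); gamma = r(n+1);
end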

Numerical Experiments: Separator Changing Through Time
[Figure: the separating plane shifting as daily blocks are added and retired]

Numerical Experiments: Normals to the Separating Hyperplanes Corresponding to 5-Day Intervals
[Figure: plot of the normal vectors $w$ at 5-day intervals]

Conclusion
- The proposed algorithm is an extremely simple procedure for generating linear classifiers in an incremental fashion for huge datasets.
- The linear classifier is obtained by solving a single system of linear equations in the small-dimensional input space.
- The algorithm can retire old data and add new data in a very simple manner.
- Only a matrix of the size of the input space is kept in memory at any time.

Future Work
- Extension to nonlinear classification
- Parallel formulation and implementation on remotely located servers for massive datasets
- Real-time online applications, e.g., fraud detection