
1 Support Vector Machines in Data Mining AFOSR Software & Systems Annual Meeting Syracuse, NY June 3-7, 2002 Olvi L. Mangasarian Data Mining Institute University of Wisconsin - Madison

2 What is a Support Vector Machine?
 An optimally defined surface
 Linear or nonlinear in the input space
 Linear in a higher-dimensional feature space
 Implicitly defined by a kernel function

3 What are Support Vector Machines Used For?
 Classification
 Regression & data fitting
 Supervised & unsupervised learning

4 Principal Contributions
 Lagrangian support vector machine classification: fast, simple, unconstrained iterative method
 Reduced support vector machine classification: accurate nonlinear classifier using random sampling
 Proximal support vector machine classification: classify by proximity to planes instead of halfspaces
 Massive incremental classification: classify by retiring old data & adding new data
 Knowledge-based classification: incorporate expert knowledge into the classifier
 Fast Newton method classifier: finitely terminating fast algorithm for classification
 Breast cancer prognosis & chemotherapy: classify patients on the basis of distinct survival curves

5 Principal Contributions  Proximal support vector machine classification

6 Support Vector Machines Maximize the Margin between Bounding Planes
[Figure: two point sets, A+ and A-, separated by parallel bounding planes; the margin is the distance between the planes.]

7 Proximal Support Vector Machines Maximize the Margin between Proximal Planes
[Figure: point sets A+ and A- clustered around two parallel proximal planes that are pushed as far apart as possible.]

8 Standard Support Vector Machine: Algebra of the 2-Category Linearly Separable Case
 Given $m$ points in $n$-dimensional space
 Represented by an $m \times n$ matrix $A$
 Membership of each point in class $+1$ or $-1$ specified by an $m \times m$ diagonal matrix $D$ with $+1$ & $-1$ entries
 Separate by two bounding planes, $x'w = \gamma + 1$ and $x'w = \gamma - 1$, so that $A_i w \ge \gamma + 1$ when $D_{ii} = +1$ and $A_i w \le \gamma - 1$ when $D_{ii} = -1$
 More succinctly: $D(Aw - e\gamma) \ge e$, where $e$ is a vector of ones

9 Standard Support Vector Machine Formulation
 Solve the following quadratic program for some $\nu > 0$:
$$\min_{w,\gamma,y} \; \nu e'y + \tfrac{1}{2} w'w \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \quad y \ge 0 \qquad \text{(QP)}$$
where $y \ge 0$ denotes the slack (error) and the diagonal matrix $D$ of $\pm 1$'s denotes class membership
 The margin between the bounding planes is maximized by minimizing $\tfrac{1}{2} w'w$
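The margin claim has a one-line justification (standard, not spelled out on the slide): the distance between the parallel planes $x'w = \gamma + 1$ and $x'w = \gamma - 1$ is

$$\frac{(\gamma + 1) - (\gamma - 1)}{\|w\|} = \frac{2}{\|w\|},$$

so maximizing this margin is equivalent to minimizing $\tfrac{1}{2} w'w$.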

10 PSVM Formulation
Standard SVM formulation (QP): $\min_{w,\gamma,y} \nu e'y + \tfrac{1}{2} w'w$ s.t. $D(Aw - e\gamma) + y \ge e$, $y \ge 0$.
PSVM replaces the inequality constraint by an equality, measures the error with the squared 2-norm, and makes the objective strongly convex in all variables:
$$\min_{w,\gamma,y} \; \tfrac{\nu}{2} \|y\|^2 + \tfrac{1}{2}(w'w + \gamma^2) \quad \text{s.t.} \quad D(Aw - e\gamma) + y = e$$
This simple but critical modification changes the nature of the optimization problem tremendously!
Solving for $y = e - D(Aw - e\gamma)$ in terms of $w$ and $\gamma$ gives the unconstrained problem:
$$\min_{w,\gamma} \; \tfrac{\nu}{2} \|e - D(Aw - e\gamma)\|^2 + \tfrac{1}{2}(w'w + \gamma^2)$$

11 Advantages of New Formulation
 Objective function remains strongly convex
 An explicit exact solution can be written in terms of the problem data
 The PSVM classifier is obtained by solving a single system of linear equations in the usually small-dimensional input space
 Exact leave-one-out correctness can be obtained in terms of the problem data

12 Linear PSVM
 We want to solve:
$$\min_{w,\gamma} \; \tfrac{\nu}{2} \|e - D(Aw - e\gamma)\|^2 + \tfrac{1}{2}(w'w + \gamma^2)$$
 Setting the gradient equal to zero gives a nonsingular system of linear equations
 Solution of the system gives the desired PSVM classifier
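The slide leaves the algebra implicit; with the notation of the next slide, $r = (w, \gamma)$ and $H = [A \;\; {-e}]$, and using $D^2 = I$, setting the gradient to zero works out to

$$\nabla f(r) = \nu H'D(DHr - e) + r = 0 \quad \Longleftrightarrow \quad \left(\frac{I}{\nu} + H'H\right) r = H'De,$$

whose matrix is symmetric positive definite and hence nonsingular.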

13 Linear PSVM Solution
 With $H = [A \;\; {-e}]$, the explicit solution is $(w, \gamma) = \left(\frac{I}{\nu} + H'H\right)^{-1} H'De$
 The linear system to solve depends on $H'H$, which is of size $(n+1) \times (n+1)$
 $n+1$ is usually much smaller than $m$

14 Linear & Nonlinear PSVM MATLAB Code

function [w, gamma] = psvm(A,d,nu)
% PSVM: linear and nonlinear classification
% INPUT: A, d=diag(D), nu. OUTPUT: w, gamma
% [w, gamma] = psvm(A,d,nu);
[m,n]=size(A); e=ones(m,1); H=[A -e];
v=(d'*H)';                % v = H'*D*e
r=(speye(n+1)/nu+H'*H)\v; % solve (I/nu + H'*H) r = v
w=r(1:n); gamma=r(n+1);   % extract w, gamma from r
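A minimal usage sketch (not from the slides; the synthetic data and variable names are illustrative):

% Train and test psvm on two synthetic Gaussian clusters
m = 200; n = 2;
A = [randn(m/2,n)+1; randn(m/2,n)-1];  % class +1 around (1,1), class -1 around (-1,-1)
d = [ones(m/2,1); -ones(m/2,1)];       % d = diag(D), the +1/-1 labels
[w, gamma] = psvm(A, d, 1);            % nu = 1
pred = sign(A*w - gamma);              % classify by which side of x'*w = gamma
correctness = mean(pred == d)          % fraction of training points classified correctly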

15 Numerical Experiments: One-Billion Two-Class Dataset
 Synthetic dataset consisting of 1 billion points in 10-dimensional input space
 Generated by the NDC (Normally Distributed Clustered) dataset generator
 Dataset divided into 500 blocks of 2 million points each
 Solution obtained in less than 2 hours and 26 minutes
 About 30% of the time was spent reading data from disk
 Testing set correctness: 90.79%
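The slides do not show how the 500 blocks are combined, but because the linear system involves only the small matrix $H'H$ and vector $H'De$, both can be summed block by block so that only one block need be in memory at a time. A sketch of that accumulation in the same MATLAB style (the loader read_block is a hypothetical helper, not from the slides):

% Blockwise accumulation of the PSVM linear system (sketch)
nblocks = 500; n = 10; nu = 1;
S = zeros(n+1); v = zeros(n+1,1);
for i = 1:nblocks
    [Ai, di] = read_block(i);       % hypothetical: load block i (data, labels)
    Hi = [Ai -ones(size(Ai,1),1)];
    S = S + Hi'*Hi;                 % accumulate H'*H, size (n+1)x(n+1)
    v = v + (di'*Hi)';              % accumulate H'*D*e
end
r = (eye(n+1)/nu + S)\v;            % same small system as the in-core code
w = r(1:n); gamma = r(n+1);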

16 Principal Contributions  Knowledge-based classification

17 Conventional Data-Based SVM

18 Knowledge-Based SVM via Polyhedral Knowledge Sets

19 Incorporating Knowledge Sets Into an SVM Classifier
 Suppose that the knowledge set $\{x : Bx \le b\}$ belongs to the class A+. Hence it must lie in the halfspace $\{x : x'w \ge \gamma + 1\}$
 We therefore have the implication: $Bx \le b \;\Longrightarrow\; x'w \ge \gamma + 1$
 This implication is equivalent to a set of constraints that can be imposed on the classification problem
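The slide does not state the equivalent constraints. A standard way to obtain them (a sketch via linear programming duality, assuming the knowledge set $\{x : Bx \le b\}$ is nonempty): the implication holds if and only if there exists a multiplier vector $u$ with

$$u \ge 0, \qquad B'u + w = 0, \qquad b'u + \gamma + 1 \le 0,$$

and these linear conditions in $(u, w, \gamma)$ can be imposed directly on the SVM optimization problem.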

20 Numerical Testing: The Promoter Recognition Dataset
 Promoter: a short DNA sequence that precedes a gene sequence
 A promoter consists of 57 consecutive DNA nucleotides belonging to {A,G,C,T}
 It is important to distinguish promoters from nonpromoters: this distinction identifies the starting locations of genes in long uncharacterized DNA sequences

21 The Promoter Recognition Dataset: Comparative Test Results

22 Wisconsin Breast Cancer Prognosis Dataset: Description of the Data
 110 instances, corresponding to 41 patients whose cancer had recurred and 69 patients whose cancer had not recurred
 32 numerical features
 The domain theory: two simple prognostic rules used by doctors

23 Wisconsin Breast Cancer Prognosis Dataset: Numerical Testing Results
 The doctors' rules are applicable to only 32 of the 110 patients
 Only 22 of those 32 patients are classified correctly by the rules (22/110 = 20% correctness over the whole dataset)
 The KSVM linear classifier is applicable to all patients, with a correctness of 66.4%
 This correctness is comparable to the best available results using conventional SVMs
 KSVM can produce classifiers based on knowledge alone, without using any data

24 Principal Contributions  Fast Newton method classifier

25 Fast Newton Algorithm for Classification
Standard quadratic programming (QP) formulation of SVM (as on slide 9):
$$\min_{w,\gamma,y} \; \nu e'y + \tfrac{1}{2} w'w \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \quad y \ge 0$$
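The rest of the slide's mathematics did not survive extraction. In Mangasarian's finite Newton classification work, the QP is traded for an equivalent unconstrained, piecewise-quadratic problem (a reconstruction from that literature, with $(\cdot)_+$ denoting the replacement of negative components by zero):

$$\min_{w,\gamma} \; \tfrac{\nu}{2} \left\| \left( e - D(Aw - e\gamma) \right)_+ \right\|^2 + \tfrac{1}{2}(w'w + \gamma^2),$$

to which a Newton method with a generalized Hessian is applied.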

26 Newton Algorithm
 The Newton algorithm terminates in a finite number of steps
 Termination at a global minimum
 Error rate decreases linearly
 Can generate complex nonlinear classifiers by using nonlinear kernels K(x,y), e.g. the Gaussian kernel sketched below
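The slides only name the kernel K(x,y); a minimal MATLAB sketch of the commonly used Gaussian kernel (the width parameter mu is an assumption, not from the slides):

function K = gaussian_kernel(A,B,mu)
% K(i,j) = exp(-mu*||A(i,:) - B(j,:)||^2) for m-by-n A and k-by-n B
sqdist = sum(A.^2,2) + sum(B.^2,2)' - 2*A*B';  % pairwise squared distances
K = exp(-mu*sqdist);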

27 Nonlinear Spiral Dataset: 94 Red Dots & 94 White Dots

28 Principal Contributions  Breast cancer prognosis & chemotherapy

29 Kaplan-Meier Curves for Overall Patients: With & Without Chemotherapy

30 Breast Cancer Prognosis & Chemotherapy: Good, Intermediate & Poor Patient Clustering

31 Kaplan-Meier Survival Curves for Good, Intermediate & Poor Patients

32 Kaplan-Meier Survival Curves for Intermediate Group: With & Without Chemotherapy

33 Conclusion
 New methods for classification proposed
 All based on a rigorous mathematical foundation
 Fast computational algorithms capable of classifying massive datasets
 Classifiers based on both abstract prior knowledge and conventional datasets
 Identification of breast cancer patients who can benefit from chemotherapy

34 Future Work
 Extend the proposed methods to standard optimization problems: linear & quadratic programming (preliminary results beat state-of-the-art software)
 Incorporate abstract concepts into optimization problems as constraints
 Develop fast online algorithms for intrusion and fraud detection
 Classify the effectiveness of new drug cocktails in combating various forms of cancer (encouraging preliminary results)

