Applying Support Vector Machines to Imbalanced Datasets. Authors: Rehan Akbani, Stephen Kwek (University of Texas at San Antonio, USA) and Nathalie Japkowicz (University of Ottawa, Canada).

Presentation transcript:

Applying Support Vector Machines to Imbalanced Datasets. Authors: Rehan Akbani, Stephen Kwek (University of Texas at San Antonio, USA), Nathalie Japkowicz (University of Ottawa, Canada). Published: European Conference on Machine Learning (ECML), 2004. Presenter: Rehan Akbani.

Presentation Outline
- Motivation and Problem Definition
- Key Issues
- Support Vector Machines Background
- Problem in Detail
- Traditional Approaches to Solve the Problem
- Our Approach
- Results and Conclusions
- Future Work and Suggested Improvements

Motivation
Imbalanced datasets are datasets where the negative instances far outnumber the positive instances (or vice versa). Naturally occurring imbalanced datasets include:
- Gene profiling
- Medical diagnosis
- Credit card fraud detection
Ratios of negative to positive instances of 100 to 1 are not uncommon.

Key Issues
- Traditional algorithms such as SVMs, decision trees, and neural networks perform poorly on imbalanced data.
- Accuracy is not a good metric for measuring performance, as the sketch below illustrates.
- We need to improve traditional algorithms so that they can handle imbalanced data.
- We need to define other metrics to measure performance.
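A minimal sketch of why accuracy misleads (the synthetic labels are an assumption, not data from the paper): on a 99:1 dataset, a degenerate classifier that always predicts the majority class scores 99% accuracy while detecting no positives at all.

```python
import numpy as np

y_true = np.array([1] * 10 + [0] * 990)   # 1% positives, 99% negatives
y_pred = np.zeros_like(y_true)            # classifier that always says "negative"

accuracy = (y_pred == y_true).mean()
recall = (y_pred[y_true == 1] == 1).mean()  # fraction of positives found

print(f"accuracy = {accuracy:.3f}")   # 0.990: looks excellent
print(f"recall   = {recall:.3f}")     # 0.000: finds no positives at all
```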

Support Vector Machines: Background
Find the maximum-margin boundary that separates the green and red instances.

Support Vector Machines: Support Vectors
The circled instances are the support vectors.

Support Vector Machines: Kernels
Kernels allow non-linear separation of instances, e.g. the Gaussian (RBF) kernel: K(x, x') = exp(-γ ||x - x'||^2).
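As a rough illustration (the moons dataset and scikit-learn are assumptions, not part of the paper), an RBF-kernel SVM separates data that no hyperplane in the input space can:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-moons: not separable by any hyperplane in input space.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# Gaussian (RBF) kernel: K(x, x') = exp(-gamma * ||x - x'||^2)
clf = SVC(kernel="rbf", gamma=1.0, C=1.0)
clf.fit(X, y)

print("support vectors per class:", clf.n_support_)
print("training accuracy:", clf.score(X, y))
```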

Effects of Imbalance on SVM
1. Positive (minority) instances lie further away from the "ideal" boundary.

Effects of Imbalance on SVM
2. The ratio of positive to negative support vectors is imbalanced. Support vectors are shown in red.

Effects of Imbalance on SVM
3. Weakness of soft margins. The SVM minimizes the soft-margin primal objective
min (1/2) ||w||^2 + C Σ_i ξ_i, subject to y_i (w · x_i + b) ≥ 1 - ξ_i, ξ_i ≥ 0,
a compromise between minimizing the total error and maximizing the margin.

Effects of Imbalance on SVM
With few positive instances, the margin is maximized at the cost of a small total error: misclassifying the minority instances is cheap, so the boundary is pushed towards them.

Traditional Approaches
- Oversample the minority class or undersample the majority class.
- The sample is then no longer random: its distribution no longer approximates the target distribution. (Defense: the sample was biased to begin with.)
- With undersampling, we discard instances that may contain valuable information.
A minimal resampling sketch follows below.
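A minimal sketch of the two traditional fixes, assuming NumPy arrays X and y with labels in {0, 1} (the helper names are ours, not from the paper); note how each one distorts the sample distribution:

```python
import numpy as np

def random_undersample(X, y, seed=0):
    """Keep all positives; keep an equal-size random subset of negatives."""
    rng = np.random.default_rng(seed)
    pos = np.where(y == 1)[0]
    neg = np.where(y == 0)[0]
    keep = rng.choice(neg, size=len(pos), replace=False)  # discards information
    idx = np.concatenate([pos, keep])
    return X[idx], y[idx]

def random_oversample(X, y, seed=0):
    """Keep everything; duplicate random positives until the classes balance."""
    rng = np.random.default_rng(seed)
    pos = np.where(y == 1)[0]
    neg = np.where(y == 0)[0]
    extra = rng.choice(pos, size=len(neg) - len(pos), replace=True)
    idx = np.concatenate([pos, neg, extra])
    return X[idx], y[idx]
```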

Problem with Undersampling
(Figure: learned boundary before vs. after undersampling.) After undersampling, the learned plane approximates the position of the ideal plane more closely, but its orientation is no longer as accurate.

Our Approach – SMOTE with Different Costs (SDC) Do not undersample the majority class in order to retain all the information. Use Synthetic Minority Oversampling TEchnique (SMOTE) (Chawla et al, 2002). Use Different Error Costs (DEC) to push the boundary away from positive instances (Veropoulos et al, 1999).
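A rough approximation of SDC with off-the-shelf tools (an assumption, not the authors' implementation): imbalanced-learn's SMOTE stands in for the SMOTE step, scikit-learn's per-class weights for the DEC step, and cost_ratio is an illustrative parameter.

```python
from imblearn.over_sampling import SMOTE
from sklearn.svm import SVC

def fit_sdc(X, y, cost_ratio=10.0):
    # Step 1 (SMOTE): synthesize new positives by interpolating between
    # minority-class nearest neighbours instead of duplicating instances.
    X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
    # Step 2 (DEC): weight errors on the positive class more heavily
    # (C+ = cost_ratio * C-), pushing the boundary away from the positives.
    clf = SVC(kernel="linear", C=1.0, class_weight={1: cost_ratio})
    return clf.fit(X_res, y_res)
```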

Effect of DEC
(Figure: boundary before DEC vs. after DEC.)

Effect of SMOTE and DEC (SDC)
(Figure: boundary after DEC alone vs. after SMOTE and DEC.)

Experiments
- Used 10 different UCI datasets.
- Compared SDC against four other algorithms:
  - Regular SVM
  - Undersampling (US)
  - Different Error Costs (DEC) alone
  - SMOTE alone
- Used linear, polynomial (degree 2), and Radial Basis Function (RBF, γ = 1) kernels, as in the sketch below.
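The three kernel configurations, expressed as scikit-learn estimators for concreteness (an assumption; the slides do not say which SVM implementation the paper used):

```python
from sklearn.svm import SVC

kernels = {
    "linear": SVC(kernel="linear"),
    "poly-2": SVC(kernel="poly", degree=2),
    "rbf":    SVC(kernel="rbf", gamma=1.0),  # gamma = 1, as stated above
}
```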

Metric Used: g-means
- Used the g-means metric (Kubat et al., 1997); higher g-means means better performance:
  g-means = sqrt(Sensitivity × Specificity)
  Sensitivity = TP / (TP + FN)
  Specificity = TN / (TN + FP)
- Used by researchers such as Kubat, Matwin, Holte, Wu, and Chang (1997 to 2003) for imbalanced datasets.
- It can be computed easily and displayed compactly, which suits experiments over several datasets and SVMs, where time and space are limited.
A sketch of the computation follows below.
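A minimal sketch of the g-means computation (the helper name is ours; the formula follows Kubat et al., 1997):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def g_means(y_true, y_pred):
    """g-means = sqrt(sensitivity * specificity), per Kubat et al. (1997)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return np.sqrt(sensitivity * specificity)
```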

Datasets Used: UCI
(Table: the 10 UCI datasets used in the experiments.)

Results
(Table: g-means metric for each algorithm and dataset.)

Results
(Figure: g-means graphs for each algorithm and dataset.)

Conclusions
- Our algorithm (SDC) outperforms the other four algorithms; undersampling is the runner-up.
- SDC performs better than undersampling on 9 out of 10 datasets.
- It always performs better than or equal to SMOTE alone.
- It performs better than or equal to DEC alone on 7 out of 10 datasets.
- It shares SMOTE's limitations:
  - It assumes the space between two neighboring positive instances is positive.
  - It assumes the neighborhood of a positive instance is positive.

Future Work and Suggested Improvements
- Design a better oversampling technique that does not assume a convex positive space.
- Evaluate the algorithm on biological datasets with extremely high degrees of imbalance (over 10,000 to 1).
- Find out whether the technique can be extended to other ML algorithms with lower execution times than SVM.
- Analyze the robustness of the algorithm against noisy minority instances.

Questions?