Extreme Re-balancing for SVMs and other classifiers. Presenter: Cui, Shuoyang, 2005/03/02. Authors: Bhavani Raskutti & Adam Kowalczyk, Telstra Corporation, Victoria, Australia.



Imbalance pushes the minority-class samples farther from the true boundary than the majority-class samples, and the majority-class samples dominate the penalty introduced by the soft margin. (Figure: ideal boundary, majority vs. minority classes)

Data balancing:
- Up/down sampling: there is no convincing evidence for how the balanced data should be sampled.
- Imbalance-free algorithm design: the objective function should no longer be accuracy.
Reference: Machine Learning from Imbalanced Data Sets 101

In this paper: exploring the characteristics of two-class learning and analysing situations that arise with supervised learning. The experiments presented later compare one-class learning with two-class learning and examine different forms of imbalance compensation.

Two-class discrimination: take examples from the two classes and generate a model for discriminating between them. For many machine learning algorithms, the training data must include examples from both classes.

When the data has a heavily unbalanced representation of the two classes, we can either design a re-balancing scheme, or ignore the large pool of negative examples and learn from the positive examples only.

Why extreme re-balancing?
- Extreme imbalance in very high-dimensional input spaces.
- The minority class consists of 1-3% of the total data.
- The learning sample size is far below the dimensionality of the input space.
- Data sets have more than 10,000 features.

The kernel machine. The kernel machine is solved iteratively using the conjugate gradient method. Designing a kernel machine means taking a standard algorithm and massaging it so that all references to the original data vectors x appear only in dot products <xi, xj>. The input is a training sequence (xi, yi) of binary n-vectors xi and bipolar labels yi.
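The dot-product rewriting described above can be sketched in a few lines: once a learner touches the data only through dot products, it needs nothing but the Gram matrix K[i][j] = <xi, xj>. The function names below are illustrative, not from the paper.

```python
# Sketch of the kernel trick: a learner that references the data only
# through dot products can be "kernelised" by precomputing the Gram matrix.

def linear_kernel(a, b):
    """Dot product <a, b> of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def gram_matrix(xs, kernel=linear_kernel):
    """K[i][j] = kernel(xs[i], xs[j]); all the learner ever needs."""
    return [[kernel(xi, xj) for xj in xs] for xi in xs]

# Binary n-vectors with bipolar labels, as on the slide.
X = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
y = [+1, -1, +1]
K = gram_matrix(X)
```

Swapping `linear_kernel` for any other positive-definite kernel changes the feature space without changing the training algorithm.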

Two different cases of kernel machines used here

Two forms of imbalance compensation Sample balancing Weight balancing

Sample balancing:
- 1: the case of the 1-class learner using all of the negative examples
- 1: the case of the 2-class learner using all training examples
- 0: the case of the 1-class learner using all of the positive examples
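The sample-balancing scheme above can be sketched as varying the fraction of negative examples retained: a fraction of 1 keeps the full 2-class training set, while 0 leaves a 1-class problem on the positives only. The helper name and the fraction parameter are an assumption for illustration, not the paper's code.

```python
import random

def sample_balance(pos, neg, r, seed=0):
    """Keep all positive (minority) examples and a fraction r of the negatives.
    r = 1.0 -> ordinary 2-class training set;
    r = 0.0 -> 1-class learning from the positives only."""
    rng = random.Random(seed)
    k = round(r * len(neg))
    return pos + rng.sample(neg, k)

pos = [("p", i) for i in range(5)]    # minority class
neg = [("n", i) for i in range(95)]   # majority class
full = sample_balance(pos, neg, 1.0)  # 100 examples, 2-class
one_class = sample_balance(pos, neg, 0.0)  # 5 examples, positives only
```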

Weight balancing: using different values of the regularisation constants for the minority- and majority-class data. B is a parameter called the balance factor.
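One way to realise weight balancing is to derive per-class regularisation constants from the balance factor B; the mapping below is an illustrative sketch, not necessarily the paper's exact parameterisation.

```python
# Weight balancing: give the minority and majority classes different
# regularisation constants derived from a balance factor B.
# This particular mapping is an assumption for illustration only.

def class_weights(B, C=1.0):
    """B in [0, 1]: 0.5 treats both classes equally; values above 0.5
    penalise errors on the minority class more heavily."""
    assert 0.0 <= B <= 1.0
    return C * B, C * (1.0 - B)  # (C_minority, C_majority)

C_min, C_maj = class_weights(0.9)  # heavily favour the minority class
```

In practice the same effect is obtained in off-the-shelf SVM packages by setting per-class cost parameters (e.g. scikit-learn's `class_weight` argument to `SVC`).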

Experiments on real-world data collections: AHR data and Reuters data.

AHR data: a combined training and test data set. Each training instance is labeled with "control", "change" or "nc". All of the information from the different files is converted to a sparse feature matrix.

Reuters data: a collection of documents. Each document has been converted to a vector in a high-dimensional word-presence feature space.
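The word-presence conversion can be sketched as a binary bag-of-words: one dimension per vocabulary word, set to 1 if the word occurs in the document. A minimal sketch, with illustrative names:

```python
# Convert documents to binary word-presence feature vectors,
# one dimension per vocabulary word (counts are ignored).

def word_presence_vectors(docs):
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vecs = []
    for d in docs:
        v = [0] * len(vocab)
        for w in set(d.lower().split()):
            v[index[w]] = 1
        vecs.append(v)
    return vocab, vecs

vocab, vecs = word_presence_vectors(["wheat prices rise", "oil prices fall"])
```

With more than 10,000 features, these vectors would be stored sparsely (indices of the 1s only) rather than as dense lists.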

AROC, the area under the ROC curve, is used as the performance measure. Receiver operating characteristic (ROC) curves are used to describe and compare the performance of diagnostic technologies and diagnostic algorithms.

Experiments with real-world data:
- Impact of the regularisation constant
- Experiment with sample balancing
- Experiments with weight balancing

Impact of regularisation constant

Experiment with sample balancing: the AROC with 2-class learners is close to 1 for all categories, indicating that this categorization problem is easy to learn.

Experiments with weight balancing, 1: test on AHR data.

Experiments with weight balancing, 2: test on Reuters. The aim is to observe the performance of 1-class and 2-class SVMs when most of the features are removed.

Characteristics of the Reuters test: the accuracy of all classifiers is very high. As features are removed, the SVM models start degenerating, and the drop in performance is larger for the 2-class SVM; the 1-class SVM models start outperforming the 2-class models. The trends are similar across categories, and the AROC is always greater than 0.5.