Optimizing F-Measure with Support Vector Machines
David R. Musicant, Vipin Kumar, Aysel Ozgur
FLAIRS 2003, Tuesday, May 13, 2003
Carleton College

Slide 2: Overview
- Classification algorithms are often evaluated by test set accuracy.
- Test set accuracy can be a poor measure when one of the classes is rare.
- Support Vector Machines (SVMs) are designed to optimize test set accuracy.
- SVMs have been used in an ad-hoc manner on datasets with rare classes.
- Our new results: current ad-hoc heuristic techniques can be theoretically justified.

Slide 3: Roadmap
- The traditional SVM and variants
- Precision, recall, and F-measure metrics
- The F-measure maximizing SVM
- Equivalence of the traditional SVM and the F-measure SVM (for the right parameters)
- Implications and conclusions

Slide 4: The Classification Problem
[Figure: points of classes A+ and A- separated by a surface; the gap between the two bounding planes is the "margin".]

Slide 5: The Classification Problem
- Given m points in the n-dimensional space R^n, each point represented as x_i.
- Membership of each point A_i in the classes A+ or A- is specified by y_i = ±1.
- Separate by two bounding planes such that:
    w^T x_i >= b + 1  for y_i = +1,
    w^T x_i <= b - 1  for y_i = -1.
- More succinctly: y_i (w^T x_i - b) >= 1, for i = 1, 2, ..., m.
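To make the notation concrete, here is a minimal NumPy sketch (not from the paper; the toy data and plane parameters are made up) that checks the succinct constraint for every point:

```python
import numpy as np

# Toy data: m = 4 points in R^2 with labels +/-1 (illustrative values only).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

# A candidate separating plane w^T x = b, with bounding planes w^T x = b +/- 1.
w = np.array([0.5, 0.5])
b = 0.0

# A point satisfies the constraint iff y_i (w^T x_i - b) >= 1.
margins = y * (X @ w - b)
print(margins)               # [2. 3. 2. 2.]
print(np.all(margins >= 1))  # True: every point lies outside its bounding plane
```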

Slide 6: Misclassification Count SVM
- Minimize (1/2) w^T w + C Σ_i (ξ_i)_*  subject to  y_i (w^T x_i - b) + ξ_i >= 1, ξ_i >= 0,
  where (·)_* is the step function (1 if the argument is > 0, 0 otherwise).
- "Push the planes apart, and minimize the number of misclassified points."
- C balances the two competing objectives.
- Minimizing w^T w pushes the planes apart.
- The problem is NP-complete, and the objective is non-differentiable.
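A sketch of the counting objective (my own illustration, not code from the paper): the slack ξ_i = max(0, 1 - y_i(w^T x_i - b)) is positive exactly when a point violates its bounding-plane constraint, and the step function turns each violation into a count of 1.

```python
import numpy as np

def misclassification_count_objective(w, b, X, y, C):
    """(1/2) w^T w + C * (number of points with positive slack)."""
    xi = np.maximum(0.0, 1.0 - y * (X @ w - b))      # slack of each point
    return 0.5 * w @ w + C * np.count_nonzero(xi > 0)  # (xi)_* counts violations

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-0.5, 0.5]])
y = np.array([1, 1, -1, -1])
print(misclassification_count_objective(np.array([0.5, 0.5]), 0.0, X, y, C=1.0))
# 1.25: margin term 0.25 plus one violating point
```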

Slide 7: Approximate Misclassification Count SVM
- Replace the step function with a differentiable approximation, such as the sigmoid
  (ξ)_* ≈ 1 / (1 + e^(-αξ)).
- α > 0 is an arbitrary fixed constant that determines closeness of approximation.
- This is still difficult to solve.
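A quick numeric illustration (my own, with assumed values) of how the sigmoid approaches the step function as α grows:

```python
import numpy as np

def step(t):
    return (t > 0).astype(float)

def sigmoid(t, alpha):
    return 1.0 / (1.0 + np.exp(-alpha * t))

t = np.array([-1.0, -0.1, 0.1, 1.0])
for alpha in (1.0, 10.0, 100.0):
    print(alpha, np.round(sigmoid(t, alpha), 3))  # sharpens toward 0/1 as alpha grows
print("step:", step(t))                            # [0. 0. 1. 1.]
```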

Slide 8: Standard "Soft Margin" SVM
- Minimize (1/2) w^T w + C Σ_i ξ_i  subject to  y_i (w^T x_i - b) + ξ_i >= 1, ξ_i >= 0.
- "Push the planes apart, and minimize the distance of misclassified points."
- We minimize the total distances from misclassified points to the bounding planes, not the actual number of them.
- Much more tractable; does quite well in optimizing accuracy.
- Does poorly when one class is rare.
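This soft-margin formulation is what standard libraries solve; a minimal scikit-learn sketch (my own, assuming scikit-learn is available):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

# C trades off pushing the planes apart against the total slack.
clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.coef_, clf.intercept_)  # the learned w and -b
```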

Slide 9: Weighted Standard SVM
- Minimize (1/2) w^T w + C+ Σ_{i in A+} ξ_i + C- Σ_{i in A-} ξ_i, subject to the same constraints.
- "Push the planes apart, and minimize the weighted distance of misclassified points."
- Allows one to choose different C values for the two classes; often used to weight the rare class more heavily.
- How do we measure success when one class is rare? (Assume that A+ is the rare class.)
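The per-class costs C+ and C- correspond to scikit-learn's class_weight argument, which multiplies C for each class. A sketch (mine, with made-up imbalanced data), not the authors' code:

```python
import numpy as np
from sklearn.svm import SVC

# Toy imbalanced data: one positive point, five negatives (illustrative only).
X = np.array([[2.0, 2.0], [-1.0, -1.0], [-2.0, -1.0],
              [-1.5, -2.0], [-2.5, -2.5], [-3.0, -1.0]])
y = np.array([1, -1, -1, -1, -1, -1])

# class_weight multiplies C per class: effectively C+ = 5*C and C- = C here.
clf = SVC(kernel="linear", C=1.0, class_weight={1: 5.0, -1: 1.0}).fit(X, y)
print(clf.predict(X))
```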

Slide 10: Measures of Success
- Precision and recall are better descriptors when one class is rare:
  Precision P = TP / (TP + FP),  Recall R = TP / (TP + FN),
  where TP, FP, and FN are the counts of true positives, false positives, and false negatives.
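A small sketch computing these from raw counts and checking against scikit-learn (the example labels are mine):

```python
from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 1, -1, -1, -1, -1, -1]
y_pred = [1, 1, -1, 1, -1, -1, -1, -1]  # TP=2, FN=1, FP=1

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == -1 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == -1 for t, p in zip(y_true, y_pred))

print(tp / (tp + fp), precision_score(y_true, y_pred))  # 0.667 both
print(tp / (tp + fn), recall_score(y_true, y_pred))     # 0.667 both
```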

Slide 11: F-measure
- F-measure: a commonly used "average" (the harmonic mean) of precision and recall:
  F = 2PR / (P + R).
- Can C+ and C- in the weighted SVM be balanced to optimize F-measure?
- Can we start over and invent an SVM to optimize F-measure?
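The harmonic-mean formula in code, checked against scikit-learn's f1_score (my illustration):

```python
from sklearn.metrics import f1_score

def f_measure(p, r):
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

print(f_measure(2/3, 2/3))  # 0.667

y_true = [1, 1, 1, -1, -1, -1, -1, -1]
y_pred = [1, 1, -1, 1, -1, -1, -1, -1]
print(f1_score(y_true, y_pred))  # same value, computed from the labels directly
```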

Slide 12: Constructing an F-measure SVM
- How do we appropriately represent F-measure in an SVM?
- Substitute P and R into F:
  F = 2TP / (2TP + FP + FN).
- Since 1/F = 1 + (FP + FN) / (2TP), to maximize F-measure we minimize (FP + FN) / TP.
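A numeric sanity check of the algebra above (my own): F and (FP + FN)/TP always move in opposite directions, since 1/F = 1 + ratio/2.

```python
# F = 2*TP / (2*TP + FP + FN); maximizing F <=> minimizing (FP + FN) / TP.
def F(tp, fp, fn):
    return 2 * tp / (2 * tp + fp + fn)

def ratio(tp, fp, fn):
    return (fp + fn) / tp

for tp, fp, fn in [(10, 0, 0), (10, 2, 3), (8, 2, 5), (5, 5, 8)]:
    print(round(F(tp, fp, fn), 3), round(ratio(tp, fp, fn), 3))
# As the ratio increases (0, 0.5, 0.875, 2.6), F decreases (1, 0.8, 0.696, 0.435).
```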

Slide 13: Constructing an F-measure SVM
- Want to minimize (FP + FN) / TP, where
  FP = number of misclassified A- points,
  FN = number of misclassified A+ points.
- Since TP = |A+| - FN, the new F-measure maximizing SVM minimizes
  (FP + FN) / (|A+| - FN),
  with FP and FN counted via the step function, subject to the bounding-plane constraints.

Slide 14: The F-measure Maximizing SVM
- Approximate the step-function counts FP and FN with the sigmoid:
  (ξ)_* ≈ 1 / (1 + e^(-αξ)).
- Can we connect this with the standard SVM?
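A heavily simplified sketch (my own; the paper's exact formulation may differ) of evaluating the smoothed objective: replace the FP and FN counts with sums of sigmoids of the constraint violations.

```python
import numpy as np

def sigmoid(t, alpha=10.0):
    return 1.0 / (1.0 + np.exp(-alpha * t))

def smoothed_f_objective(w, b, X, y, alpha=10.0):
    """Approximate (FP + FN) / (|A+| - FN) with sigmoid-smoothed counts.
    t_i > 0 exactly when point i violates its margin constraint."""
    t = 1.0 - y * (X @ w - b)
    fn_hat = sigmoid(t[y == 1], alpha).sum()   # smoothed count of misclassified A+
    fp_hat = sigmoid(t[y == -1], alpha).sum()  # smoothed count of misclassified A-
    return (fp_hat + fn_hat) / ((y == 1).sum() - fn_hat)
```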

Slide 15: Weighted Misclassification Count SVM vs. F-measure Maximizing SVM
- How do these two formulations relate?
- We show:
  - Pick a parameter C.
  - Find the classifier that optimizes the F-measure SVM.
  - There exist parameters C+ and C- such that the misclassification counting SVM has the same solution.
  - Proof and formulas to obtain C+ and C- are in the paper.

Slide 16: Implications of the Result
- Since there exist C+, C- that yield the same solution as the F-measure maximizing SVM, finding the best C+ and C- for the weighted standard SVM is "the right thing to do" (modulo approximations).
- In practice, a common trick is to choose C+ and C- inversely proportional to class size: C+ |A+| = C- |A-|.
- This heuristic seems reasonable but is not optimal. (A good first guess?)
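This inverse-class-size rule is exactly the proportionality computed by scikit-learn's class_weight="balanced" option; a sketch (mine, with made-up labels) showing the equivalence:

```python
import numpy as np
from sklearn.svm import SVC

y = np.array([1, -1, -1, -1, -1, -1])  # |A+| = 1, |A-| = 5

# 'balanced' sets each class weight proportional to 1 / (class size),
# i.e. C+ * |A+| = C- * |A-|, the heuristic on this slide.
clf = SVC(kernel="linear", C=1.0, class_weight="balanced")

# Equivalent explicit weights: n_samples / (n_classes * class_size).
w_pos = len(y) / (2 * (y == 1).sum())   # 3.0
w_neg = len(y) / (2 * (y == -1).sum())  # 0.6
print(w_pos * (y == 1).sum(), w_neg * (y == -1).sum())  # equal: 3.0 3.0
```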

Slide 17: Implications of the Result
- Suppose that the SVM fails to provide good F-measure for a given problem, for a wide range of C+ and C- values.
- Q: Is there another SVM formulation that would yield better F-measure?
  A: Our evidence suggests not.
- Q: Is there another SVM formulation that would find the best possible F-measure more directly?
  A: Yes, the F-measure maximizing SVM.

Slide 18: Conclusions / Summary
- We provide theoretical evidence that standard heuristic practices in using SVMs to optimize F-measure are reasonable.
- We provide a framework for continued research in F-measure maximizing SVMs.
- All our results apply directly to SVMs with kernels (see paper).
- Future work: attacking the F-measure maximizing SVM directly to find faster algorithms.

Slide 19: The Classification Problem
[Figure: points of classes A+ and A- with two candidate separating lines. Which line is the better classifier?]

Slide 20: The Classification Problem
[Figure: points of classes A+ and A- separated by a surface; the gap between the two bounding planes is the "margin".]

Slide 21 “Hard Margin” SVM “Push the planes as far apart as possible, while maintaining points on proper sides of bounding planes.” Distance between planes: Minimizing w 0 w pushes planes apart. What if there are no planes that correctly separate classes?