Optimizing F-Measure with Support Vector Machines
David R. Musicant, Vipin Kumar, Aysel Ozgur
FLAIRS 2003, Tuesday, May 13, 2003
Carleton College

Overview
– Classification algorithms are often evaluated by test set accuracy.
– Test set accuracy can be a poor measure when one of the classes is rare.
– Support Vector Machines (SVMs) are designed to optimize test set accuracy.
– SVMs have been used in an ad hoc manner on datasets with rare classes.
– Our new result: current ad hoc heuristic techniques can be theoretically justified.

Roadmap
– The traditional SVM and variants
– Precision, recall, and F-measure metrics
– The F-measure maximizing SVM
– Equivalence of traditional SVM and F-measure SVM (for the right parameters)
– Implications and conclusions

The Classification Problem
[Figure: classes A+ and A- divided by a separating surface, with the “margin” marked.]

The Classification Problem
– Given m points in the n-dimensional space $R^n$.
– Each point is represented as $x_i$.
– Membership of each point $x_i$ in the class A+ or A- is specified by $y_i = \pm 1$.
– Separate by two bounding planes such that: $w'x_i \ge b + 1$ for $y_i = +1$, and $w'x_i \le b - 1$ for $y_i = -1$.
– More succinctly: $y_i (w'x_i - b) \ge 1$ for $i = 1, 2, \ldots, m$.
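
The bounding-plane condition can be checked directly on a small dataset. The following is a minimal sketch, assuming the planes are written as w·x = b + 1 and w·x = b - 1; the offset name `b`, the helper names, and the toy points are illustrative assumptions, not taken from the slides.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * c for a, c in zip(u, v))


def satisfies_bounding_planes(points, labels, w, b):
    """Return True if y_i * (w . x_i - b) >= 1 holds for every point."""
    return all(y * (dot(w, x) - b) >= 1 for x, y in zip(points, labels))


# Toy 2-D data: A+ points carry label +1, A- points carry label -1.
points = [(2.0, 2.0), (3.0, 1.0), (-1.0, -2.0), (-2.0, 0.0)]
labels = [1, 1, -1, -1]

# These planes keep every point on its proper side.
print(satisfies_bounding_planes(points, labels, w=(1.0, 1.0), b=0.0))

# Shrinking w widens the band between the planes until points fall inside it.
print(satisfies_bounding_planes(points, labels, w=(0.1, 0.1), b=0.0))
```

The second call fails because a smaller w places the bounding planes farther apart, so points that were outside the band now lie between the planes.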

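Misclassification Count SVM
$\min_{w,b,\xi} \; \tfrac{1}{2} w'w + C \sum_{i=1}^{m} (\xi_i)_*$ subject to $y_i (w'x_i - b) + \xi_i \ge 1$, $\xi_i \ge 0$, $i = 1, \ldots, m$
$(\cdot)_*$ is the step function (1 if $\xi > 0$, 0 otherwise).
“Push the planes apart, and minimize number of misclassified points.”
– C balances the two competing objectives.
– Minimizing $w'w$ pushes the planes apart.
– The problem is NP-complete, and the objective is non-differentiable.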

Approximate Misclassification Count SVM
– The same problem, where we replace the step function with some differentiable approximation, such as a sigmoid.
– $\nu > 0$ is an arbitrary fixed constant that determines closeness of approximation.
– This is still difficult to solve.

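Standard “Soft Margin” SVM
$\min_{w,b,\xi} \; \tfrac{1}{2} w'w + C \sum_{i=1}^{m} \xi_i$ subject to $y_i (w'x_i - b) + \xi_i \ge 1$, $\xi_i \ge 0$, $i = 1, \ldots, m$
“Push the planes apart, and minimize distance of misclassified points.”
– We minimize the total distance from misclassified points to the bounding planes, not the actual number of such points.
– Much more tractable, and does quite well in optimizing accuracy.
– Does poorly when one class is rare.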

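Weighted Standard SVM
$\min_{w,b,\xi} \; \tfrac{1}{2} w'w + C_+ \sum_{i:\, y_i = +1} \xi_i + C_- \sum_{i:\, y_i = -1} \xi_i$ subject to $y_i (w'x_i - b) + \xi_i \ge 1$, $\xi_i \ge 0$, $i = 1, \ldots, m$
“Push the planes apart, and minimize weighted distance of misclassified points.”
– Allows one to choose different C values for the two classes.
– Often used to weight the rare class more heavily.
– How do we measure success when one class is rare? Assuming that A+ is the rare class…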
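
To make the role of the two weights concrete, here is a small sketch that evaluates a weighted soft-margin objective for a fixed plane. It assumes the slack of a point is max(0, 1 - y_i (w·x_i - b)) and that the regularization term is (1/2) w·w; the names `c_pos` and `c_neg` (standing in for C+ and C-), the 1/2 scaling, and the toy data are assumptions for illustration only.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * c for a, c in zip(u, v))


def slacks(points, labels, w, b):
    """Slack of each point: how far it sits on the wrong side of its bounding plane."""
    return [max(0.0, 1.0 - y * (dot(w, x) - b)) for x, y in zip(points, labels)]


def weighted_soft_margin_objective(points, labels, w, b, c_pos, c_neg):
    """0.5 * w.w plus class-weighted sums of slacks (c_pos for A+, c_neg for A-)."""
    xi = slacks(points, labels, w, b)
    pos_slack = sum(s for s, y in zip(xi, labels) if y == 1)
    neg_slack = sum(s for s, y in zip(xi, labels) if y == -1)
    return 0.5 * dot(w, w) + c_pos * pos_slack + c_neg * neg_slack


points = [(2.0, 0.0), (0.5, 0.0), (-2.0, 0.0), (0.5, 0.0)]
labels = [1, 1, -1, -1]
w, b = (1.0, 0.0), 0.0

print(slacks(points, labels, w, b))
print(weighted_soft_margin_objective(points, labels, w, b, c_pos=1.0, c_neg=1.0))
print(weighted_soft_margin_objective(points, labels, w, b, c_pos=10.0, c_neg=1.0))
```

Raising `c_pos` makes the same A+ slack cost ten times as much, which is how the weighted formulation pushes the solution toward protecting the rare class.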

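Measures of success
– With A+ as the rare class, let TP = # correctly classified A+, FP = # misclassified A-, and FN = # misclassified A+.
– Precision: $P = \frac{TP}{TP + FP}$
– Recall: $R = \frac{TP}{TP + FN}$
– Precision and recall are better descriptors when one class is rare.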

F-measure
– F-measure: a commonly used “average” (the harmonic mean) of precision and recall, $F = \frac{2PR}{P + R}$.
– Can C+ and C- in the weighted SVM be balanced to optimize F-measure?
– Can we start over and invent an SVM to optimize F-measure?
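
The gap between accuracy and F-measure on a rare class is easy to see numerically. The sketch below assumes the usual definitions P = TP/(TP+FP), R = TP/(TP+FN), and F = 2PR/(P+R), with A+ labeled +1; returning 0.0 when a ratio is undefined is a convention chosen here, and the two sets of predictions are made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) treating +1 as the rare positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def precision_recall_f(y_true, y_pred):
    """Precision, recall, and F-measure; undefined ratios are reported as 0.0."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    return precision, recall, f


# 5 rare positives among 100 points.
y_true = [1] * 5 + [-1] * 95

# Classifier 1 never predicts the rare class.
all_negative = [-1] * 100
# Classifier 2 finds 4 of the 5 positives at the cost of 4 false alarms.
finds_rare = [1, 1, 1, 1, -1] + [1] * 4 + [-1] * 91

for name, y_pred in [("all negative", all_negative), ("finds rare", finds_rare)]:
    print(name, accuracy(y_true, y_pred), precision_recall_f(y_true, y_pred))
```

Both classifiers have the same accuracy, yet only the second has a nonzero F-measure, which is the motivation for optimizing F-measure directly.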

Constructing an F-measure SVM
– How do we appropriately represent F-measure in an SVM?
– Substitute P and R into F: $F = \frac{2\,TP}{2\,TP + FP + FN}$
– Thus to maximize F-measure, we minimize $\frac{FP + FN}{TP}$.
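
The substitution can be checked numerically. This is a small sketch, assuming the usual definitions P = TP/(TP+FP), R = TP/(TP+FN), F = 2PR/(P+R), and taking (FP+FN)/TP as the quantity to minimize (it follows from 1/F = 1 + (FP+FN)/(2 TP)); the function names and candidate counts are made up for illustration.

```python
def f_from_precision_recall(tp, fp, fn):
    """F-measure computed the long way, through precision and recall."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return 2 * p * r / (p + r)


def f_from_counts(tp, fp, fn):
    """F-measure after substituting P and R: 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)


def error_ratio(tp, fp, fn):
    """The quantity to minimize: (FP + FN) / TP."""
    return (fp + fn) / tp


# Hypothetical (TP, FP, FN) outcomes of four classifiers on 5 positives.
candidates = [(4, 4, 1), (5, 20, 0), (3, 0, 2), (2, 0, 3)]

# The two expressions for F agree on every candidate.
for c in candidates:
    assert abs(f_from_precision_recall(*c) - f_from_counts(*c)) < 1e-12

# Ranking by largest F is the same as ranking by smallest error ratio.
by_f = sorted(candidates, key=lambda c: f_from_counts(*c), reverse=True)
by_ratio = sorted(candidates, key=lambda c: error_ratio(*c))
print(by_f == by_ratio)
```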

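Constructing an F-measure SVM
– Want to minimize $\frac{FP + FN}{TP}$, where FP = # misclassified A-, FN = # misclassified A+, and TP = (# of A+ points) - FN.
– New F-measure maximizing SVM: the misclassification count in the Misclassification Count SVM objective is replaced by this ratio, with FP and FN counted by the step function $(\cdot)_*$ applied to the slack variables of the A- and A+ points.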

The F-measure Maximizing SVM
– Approximate the step function with a sigmoid to obtain a differentiable objective.
– Can we connect this with the standard SVM?
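
The exact sigmoid used on the slide did not survive in this transcript. As a rough illustration only, the sketch below assumes the logistic form 1 / (1 + exp(-nu * t)), with `nu` as a hypothetical steepness constant, and shows how it approaches the step function away from t = 0 as `nu` grows.

```python
import math


def step(t):
    """Step function: 1 if t > 0, 0 otherwise."""
    return 1.0 if t > 0 else 0.0


def sigmoid_step(t, nu):
    """Logistic stand-in for the step function (assumed form, not the slide's).

    Written in two branches so that exp() never receives a large positive
    argument.
    """
    z = nu * t
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# The gap to the true step function at t = 0.5 shrinks as nu increases.
for nu in (1.0, 10.0, 100.0):
    print(nu, abs(sigmoid_step(0.5, nu) - step(0.5)))
```

Unlike the step function, this stand-in is differentiable everywhere, which is what makes gradient-based optimization possible; note that with this particular form the value at t = 0 is 0.5 rather than 0.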

Weighted misclassification count SVM and F-measure maximizing SVM
How do these two formulations relate? We show:
– Pick a parameter C.
– Find the classifier that optimizes the F-measure SVM.
– There exist parameters C+ and C- such that the weighted misclassification count SVM has the same solution.
– The proof, and formulas to obtain C+ and C-, are in the paper.

Implications of result
– Since there exist C+, C- that yield the same solution as the F-measure maximizing SVM, finding the best C+ and C- for the weighted standard SVM is “the right thing to do” (modulo approximations).
– In practice, a common trick is to choose C+, C- by a fixed rule that weights the rare class more heavily.
– This heuristic seems reasonable but is not optimal. (Good first guess?)
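
The slide's exact rule for C+ and C- is not preserved in this transcript. One widely used version of such a trick sets the ratio C+/C- to the ratio of class sizes (# of A- points over # of A+ points); the sketch below assumes that rule, and the function name, the base value `c`, and the search multipliers are all hypothetical.

```python
def class_ratio_weights(labels, c=1.0):
    """Return (C+, C-) with C+/C- = (# negatives) / (# positives).

    This is an assumed class-size-ratio heuristic, not necessarily the
    formula shown on the original slide.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == -1)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return c * n_neg / n_pos, c


labels = [1] * 5 + [-1] * 95
c_pos, c_neg = class_ratio_weights(labels)
print(c_pos, c_neg)

# Treat the heuristic as a first guess and search around it for the best F-measure.
candidate_pairs = [(c_pos * scale, c_neg) for scale in (0.25, 0.5, 1.0, 2.0, 4.0)]
print(candidate_pairs)
```

Each candidate pair would then be used to train a weighted SVM, keeping the pair whose classifier scores the highest F-measure on held-out data.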

Implications of result
Suppose that the SVM fails to provide a good F-measure for a given problem, for a wide range of C+ and C- values.
Q: Is there another SVM formulation that would yield a better F-measure?
A: Our evidence suggests not.
Q: Is there another SVM formulation that would find the best possible F-measure more directly?
A: Yes, the F-measure maximizing SVM.

Conclusions / Summary
– We provide theoretical evidence that standard heuristic practices in using SVMs for optimizing F-measure are reasonable.
– We provide a framework for continued research in F-measure maximizing SVMs.
– All our results apply directly to SVMs with kernels (see paper).
– Future work: attacking the F-measure maximizing SVM directly to find faster algorithms.

The Classification Problem
[Figure: points of classes A+ and A- with candidate separating lines.]
Which line is the better classifier?

“Hard Margin” SVM
$\min_{w,b} \; \tfrac{1}{2} w'w$ subject to $y_i (w'x_i - b) \ge 1$, $i = 1, \ldots, m$
“Push the planes as far apart as possible, while maintaining points on proper sides of bounding planes.”
– Distance between planes: $\frac{2}{\|w\|_2}$
– Minimizing $w'w$ pushes the planes apart.
– What if there are no planes that correctly separate the classes?
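
The link between a small w and a wide margin can be seen with a two-line computation. This sketch assumes bounding planes of the form w·x = b + 1 and w·x = b - 1, for which the distance between the planes is 2 / ||w||; the function name and sample vectors are illustrative.

```python
import math


def margin_width(w):
    """Distance between the planes w.x = b + 1 and w.x = b - 1, i.e. 2 / ||w||_2."""
    return 2.0 / math.sqrt(sum(c * c for c in w))


print(margin_width((3.0, 4.0)))  # ||w|| = 5
print(margin_width((1.5, 2.0)))  # ||w|| = 2.5: halving w doubles the margin
```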
