Optimizing F-Measure with Support Vector Machines David R. Musicant Vipin Kumar Aysel Ozgur FLAIRS 2003 Tuesday, May 13, 2003 Carleton College.


1 Optimizing F-Measure with Support Vector Machines David R. Musicant Vipin Kumar Aysel Ozgur FLAIRS 2003 Tuesday, May 13, 2003 Carleton College

2 Slide 2 Overview: Classification algorithms are often evaluated by test set accuracy. Test set accuracy can be a poor measure when one of the classes is rare. Support Vector Machines (SVMs) are designed to optimize test set accuracy. SVMs have been used in an ad-hoc manner on datasets with rare classes. Our new results: current ad-hoc heuristic techniques can be theoretically justified.

3 Slide 3 Roadmap: The traditional SVM and variants; precision, recall, and F-measure metrics; the F-measure maximizing SVM; equivalence of traditional SVM and F-measure SVM (for the right parameters); implications and conclusions.

4 Slide 4 The Classification Problem: [Figure: classes A+ and A- divided by a separating surface, with the "margin" marked.]

5 Slide 5 The Classification Problem: Given m points in the n-dimensional space $R^n$. Each point is represented as $x_i$. Membership of each point in the classes A+ or A- is specified by $y_i = \pm 1$. Separate by two bounding planes such that: [equation image not in transcript]. More succinctly: [equation image not in transcript] for i = 1, 2, …, m.
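The constraint images did not survive in this transcript. Under the standard SVM convention, the two bounding planes and their combined form would read roughly as follows. This is an assumption: the symbols $w$ (normal vector) and $b$ (offset), and the sign attached to $b$, may differ from the original slides.

```latex
\begin{align*}
x_i'w &\ge b + 1 && \text{for } y_i = +1,\\
x_i'w &\le b - 1 && \text{for } y_i = -1,
\end{align*}
\[
y_i\,(x_i'w - b) \ge 1 \quad \text{for } i = 1, 2, \ldots, m.
\]
```

Multiplying each inequality by its label $y_i$ is what collapses the two cases into the single succinct constraint.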

6 Slide 6 Misclassification Count SVM: [equation image not in transcript] $(\cdot)_*$ is the step function (1 if its argument is > 0, 0 otherwise). “Push the planes apart, and minimize number of misclassified points.” – C balances two competing objectives. – Minimizing $w'w$ pushes planes apart. – Problem is NP-complete, objective is non-differentiable.
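A plausible form of the missing optimization problem, written as a sketch: the slack variables $\xi_i$, the factor $\tfrac{1}{2}$, and the offset symbol $b$ are assumptions taken from the usual SVM formulation rather than from the slide itself.

```latex
\[
\min_{w,\,b,\,\xi}\;\; \tfrac{1}{2}\,w'w \;+\; C \sum_{i=1}^{m} (\xi_i)_*
\quad \text{subject to} \quad
y_i\,(x_i'w - b) \ge 1 - \xi_i,\;\; \xi_i \ge 0,\;\; i = 1, \ldots, m.
\]
```

Here $(\xi_i)_*$ contributes 1 for every point that violates its bounding plane, so the sum is a count rather than a total distance.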

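7 Slide 7 Approx Misclassification Count SVM: [equation image not in transcript] where we use some differentiable approximation of the step function, such as [equation image not in transcript]. The approximation parameter > 0 is an arbitrary fixed constant that determines closeness of approximation. This is still difficult to solve.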
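To illustrate what a differentiable approximation of the step function can look like, here is a minimal sketch using the logistic sigmoid. The exact function on the slide is not preserved; the name `alpha` for the fixed positive constant and the choice of the plain logistic curve are assumptions.

```python
import math


def step(xi):
    """Step function: 1 if xi > 0, 0 otherwise."""
    return 1.0 if xi > 0 else 0.0


def sigmoid_step(xi, alpha=10.0):
    """Smooth stand-in for step(); alpha > 0 (hypothetical name) sets sharpness."""
    z = alpha * xi
    # Evaluate the logistic function in a numerically stable way.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


if __name__ == "__main__":
    for alpha in (1.0, 10.0, 100.0):
        values = [round(sigmoid_step(x, alpha), 4) for x in (-0.5, -0.1, 0.1, 0.5)]
        print(alpha, values)
```

Larger `alpha` values track the step function more closely away from zero. Note that this particular curve returns 0.5 at exactly zero, whereas the step function returns 0 there.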

8 Slide 8 Standard “Soft Margin” SVM: [equation image not in transcript] “Push the planes apart, and minimize distance of misclassified points.” We minimize total distances from misclassified points to bounding planes, not the actual number of them. Much more tractable; does quite well in optimizing accuracy. Does poorly when one class is rare.

9 Slide 9 Weighted Standard SVM: [equation image not in transcript] “Push the planes apart, and minimize weighted distance of misclassified points.” Allows one to choose different C values for the two classes. Often used to weight the rare class more heavily. How do we measure success when one class is rare? Assuming that A+ is the rare class…
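The weighted objective can be sketched as a plain function that evaluates a given plane. This is only an evaluation of the objective, not a solver, and it assumes the usual slack definition $\xi_i = \max(0, 1 - y_i(x_i'w - b))$; the names `c_pos`, `c_neg`, and the offset `b` are hypothetical stand-ins for the slide's $C_+$, $C_-$, and plane offset.

```python
def slacks(X, y, w, b):
    """Slack of each point: how far it violates its bounding plane (0 if it does not)."""
    result = []
    for x_i, y_i in zip(X, y):
        score = sum(x_j * w_j for x_j, w_j in zip(x_i, w)) - b
        result.append(max(0.0, 1.0 - y_i * score))
    return result


def weighted_soft_margin_objective(X, y, w, b, c_pos, c_neg):
    """0.5 * w'w plus class-weighted sums of slacks (labels are +1 / -1)."""
    xi = slacks(X, y, w, b)
    margin_term = 0.5 * sum(w_j * w_j for w_j in w)
    pos_term = sum(s for s, y_i in zip(xi, y) if y_i == 1)
    neg_term = sum(s for s, y_i in zip(xi, y) if y_i == -1)
    return margin_term + c_pos * pos_term + c_neg * neg_term


if __name__ == "__main__":
    X = [[2.0], [0.0], [-2.0]]
    y = [1, 1, -1]
    # The second point (a positive at x = 0) has slack 1; the others have slack 0.
    print(weighted_soft_margin_objective(X, y, w=[1.0], b=0.0, c_pos=10.0, c_neg=1.0))
```

Setting `c_pos == c_neg` recovers the unweighted soft margin objective, while a larger `c_pos` makes violations by the rare positive class more expensive.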

10 Slide 10 Measures of success: [equation image not in transcript] Precision and recall are better descriptors when one class is rare.
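The standard definitions are precision = TP / (TP + FP) and recall = TP / (TP + FN), with A+ as the positive class. The sketch below computes them from label lists and shows why accuracy misleads on a rare class; the function names and the convention of returning 0.0 when a denominator is zero are choices made for this example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) with `positive` as the rare class label."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(tp, fp, fn):
    """Precision and recall; 0.0 is returned when a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # 5 rare positives out of 100 points; the classifier predicts "negative" for all.
    y_true = [1] * 5 + [-1] * 95
    y_pred = [-1] * 100
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    print(accuracy, precision_recall(tp, fp, fn))
```

The trivial classifier reaches 95% accuracy while finding none of the rare class, which precision and recall expose immediately.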

11 Slide 11 F-measure: A commonly used “average” of precision and recall: [equation image not in transcript]. Can $C_+$ and $C_-$ in the weighted SVM be balanced to optimize F-measure? Can we start over and invent an SVM to optimize F-measure?
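The usual balanced F-measure is the harmonic mean of precision and recall, F = 2PR / (P + R). The sketch below assumes that balanced form; the slide's own equation is not preserved and may have been written differently.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # The harmonic mean is pulled toward the smaller of the two values.
    print(f_measure(0.5, 1.0))
    print(f_measure(0.1, 1.0))
```

Because the harmonic mean never exceeds the arithmetic mean, a classifier cannot score well by excelling at only one of precision or recall.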

12 Slide 12 Constructing an F-measure SVM: How do we appropriately represent F-measure in an SVM? Substitute P and R into F: [equation image not in transcript]. Thus to maximize F-measure, we minimize [equation image not in transcript].
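One way to carry out the substitution, assuming the balanced F-measure $F = 2PR/(P+R)$ and writing $m_+$ (an assumed symbol) for the number of points in A+, so that $TP = m_+ - FN$. The expression on the original slide may be an equivalent form of the final ratio.

```latex
\begin{align*}
P &= \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN},\\[4pt]
F &= \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN},\\[4pt]
\frac{1}{F} &= 1 + \frac{FP + FN}{2\,TP} = 1 + \frac{FP + FN}{2\,(m_+ - FN)}.
\end{align*}
```

Since $1/F$ is an increasing function of the ratio, maximizing $F$ amounts to minimizing $(FP + FN)/(m_+ - FN)$.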

13 Slide 13 Constructing an F-measure SVM: Want to minimize [equation image not in transcript], where FP = # misclassified A- and FN = # misclassified A+. New F-measure maximizing SVM: [equation image not in transcript].

14 Slide 14 The F-measure Maximizing SVM: Approximate with sigmoid: [equation image not in transcript]. Can we connect with the standard SVM?
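A rough sketch of how such an objective could be evaluated for a candidate plane follows. It is not the authors' formulation: it assumes the quantity to minimize is 0.5 * w'w + C * (FP + FN) / (m_pos - FN), counts a point as an error when its slack is positive, and smooths each count with a sigmoid rescaled to equal 0 at zero slack. The names `smooth_count`, `alpha`, `b`, and `m_pos` are all hypothetical.

```python
import math


def smooth_count(xi, alpha):
    """Rescaled sigmoid for xi >= 0: equals 0 at xi = 0 and approaches 1 as alpha * xi grows."""
    return 2.0 / (1.0 + math.exp(-alpha * xi)) - 1.0


def f_measure_svm_objective(X, y, w, b, C, alpha=20.0):
    """Evaluate an assumed F-measure-style SVM objective for the plane (w, b)."""
    margin_term = 0.5 * sum(w_j * w_j for w_j in w)
    fp = fn = 0.0
    m_pos = 0
    for x_i, y_i in zip(X, y):
        score = sum(x_j * w_j for x_j, w_j in zip(x_i, w)) - b
        slack = max(0.0, 1.0 - y_i * score)
        if y_i == 1:
            m_pos += 1
            fn += smooth_count(slack, alpha)  # smoothed count of A+ errors
        else:
            fp += smooth_count(slack, alpha)  # smoothed count of A- errors
    denom = m_pos - fn
    if denom <= 0:
        return math.inf  # no positives captured: F-measure is zero
    return margin_term + C * (fp + fn) / denom


if __name__ == "__main__":
    X = [[2.0], [-2.0], [-2.0], [-3.0]]
    y = [1, 1, -1, -1]
    print(f_measure_svm_objective(X, y, w=[1.0], b=0.0, C=1.0))
```

Unlike the weighted soft margin objective, the error term is a ratio, so each additional missed positive both raises the numerator and shrinks the denominator.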

15 Slide 15 How do these two formulations relate? Weighted misclassification count SVM: [equation image not in transcript]. F-measure maximizing SVM: [equation image not in transcript]. We show: – Pick a parameter C. – Find the classifier that optimizes the F-measure SVM. – There exist parameters $C_+$ and $C_-$ such that the misclassification counting SVM has the same solution. – Proof and formulas to obtain $C_+$ and $C_-$ are in the paper.

16 Slide 16 Implications of result: Since there exist $C_+$, $C_-$ that yield the same solution as the F-measure maximizing SVM, finding the best $C_+$ and $C_-$ for the weighted standard SVM is “the right thing to do” (modulo approximations). In practice, a common trick is to choose $C_+$, $C_-$ such that: [equation image not in transcript]. This heuristic seems reasonable but is not optimal. (Good first guess?)
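The heuristic's equation is missing from the transcript. A widely used rule of this kind sets the weights inversely proportional to class size, so that $C_+ / C_- = m_- / m_+$; the sketch below assumes that is the rule meant, and the function name and base parameter `C` are hypothetical.

```python
def inverse_frequency_weights(y, C=1.0):
    """Return (c_pos, c_neg) with c_pos / c_neg = m_neg / m_pos for labels +1 / -1."""
    m_pos = sum(1 for y_i in y if y_i == 1)
    m_neg = sum(1 for y_i in y if y_i == -1)
    if m_pos == 0 or m_neg == 0:
        raise ValueError("both classes must be present")
    c_neg = C
    c_pos = C * m_neg / m_pos
    return c_pos, c_neg


if __name__ == "__main__":
    y = [1] * 5 + [-1] * 95
    c_pos, c_neg = inverse_frequency_weights(y)
    print(c_pos, c_neg)
```

With this choice the total weight on each class is equal (`c_pos * m_pos == c_neg * m_neg`), which makes it a natural starting point for a search over $C_+$ and $C_-$ rather than a final answer.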

17 Slide 17 Implications of result: Suppose that the SVM fails to provide a good F-measure for a given problem, for a wide range of $C_+$ and $C_-$ values. Q: Is there another SVM formulation that would yield a better F-measure? A: Our evidence suggests not. Q: Is there another SVM formulation that would find the best possible F-measure more directly? A: Yes, the F-measure maximizing SVM.

18 Slide 18 Conclusions / Summary: We provide theoretical evidence that standard heuristic practices in using SVMs for optimizing F-measure are reasonable. We provide a framework for continued research in F-measure maximizing SVMs. All our results apply directly to SVMs with kernels (see paper). Future work: attacking the F-measure maximizing SVM directly to find faster algorithms.

19 Slide 19 The Classification Problem: [Figure: classes A+ and A- with candidate separating lines.] Which line is the better classifier?

20 Slide 20 The Classification Problem: [Figure: classes A+ and A- divided by a separating surface, with the "margin" marked.]

21 Slide 21 “Hard Margin” SVM: “Push the planes as far apart as possible, while maintaining points on proper sides of bounding planes.” Distance between planes: $2/\|w\|_2$. Minimizing $w'w$ pushes planes apart. What if there are no planes that correctly separate classes?

