The Disputed Federalist Papers : SVM Feature Selection via Concave Minimization Glenn Fung and Olvi L. Mangasarian CSNA 2002 June 13-16, 2002 Madison,


1 The Disputed Federalist Papers : SVM Feature Selection via Concave Minimization Glenn Fung and Olvi L. Mangasarian CSNA 2002 June 13-16, 2002 Madison, Wisconsin

2 Outline of Talk
• Support Vector Machines (SVM) Introduction
  • Standard Quadratic Programming Formulation
  • 1-norm Linear SVMs
• SVM Feature Selection
  • Successive Linearization Algorithm (SLA)
• The Disputed Federalist Papers
  • Description of the Classification Problem
  • Description of Previous Work
• Results
  • Separating Hyperplane in Three Dimensions Only
  • Classification Agrees with Previous Results

3 What is a Support Vector Machine?
• An optimally defined surface
• Typically nonlinear in the input space
• Linear in a higher dimensional space
• Implicitly defined by a kernel function

4 What are Support Vector Machines Used For?
• Classification
• Regression & Data Fitting
• Supervised & Unsupervised Learning
(Will concentrate on classification)

5 Geometry of the Classification Problem: 2-Category Linearly Separable Case
[Figure: the two point sets A+ and A-]

6 Algebra of the Classification Problem: 2-Category Linearly Separable Case
• Given m points in n-dimensional space
• Represented by an m-by-n matrix A
• Membership of each point $A_i$ in class +1 or -1 specified by:
  • An m-by-m diagonal matrix D with +1 & -1 entries
• Separate by two bounding planes, $x'w = \gamma \pm 1$:
  $A_i w \ge \gamma + 1$ for $D_{ii} = +1$; $A_i w \le \gamma - 1$ for $D_{ii} = -1$
• More succinctly: $D(Aw - e\gamma) \ge e$, where e is a vector of ones.
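The bounding-plane condition on this slide can be checked numerically. The sketch below assumes the condition has the usual form $D(Aw - e\gamma) \ge e$; the function name and the four toy points are illustrative and do not come from the talk.

```python
def satisfies_bounding_planes(A, d, w, gamma):
    """Return True if d_i * (A_i . w - gamma) >= 1 for every point.

    A: list of m points, each a list of n floats
    d: list of m labels (+1 or -1), i.e. the diagonal of D
    """
    for row, label in zip(A, d):
        score = sum(a * wi for a, wi in zip(row, w)) - gamma
        if label * score < 1.0:
            return False
    return True


# Hypothetical toy data: two points in class +1, two in class -1.
A = [[3.0, 3.0], [4.0, 2.0], [0.0, 0.0], [1.0, -1.0]]
d = [1, 1, -1, -1]

print(satisfies_bounding_planes(A, d, w=[0.5, 0.5], gamma=2.0))
print(satisfies_bounding_planes(A, d, w=[0.1, 0.1], gamma=2.0))
```

The first call uses a plane whose bounding planes separate the two classes; the second uses a normal that is too short, so the condition fails.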

7 Support Vector Machines: Maximizing the Margin between Bounding Planes
[Figure: point sets A+ and A- with the support vectors marked]

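8 Support Vector Machines: Quadratic Programming Formulation
• Solve the following quadratic program:
  $\min_{w,\gamma,y} \; \nu e'y + \frac{1}{2}\|w\|_2^2$ s.t. $D(Aw - e\gamma) + y \ge e,\; y \ge 0$
  where $\nu > 0$ is the weight of the training error
• Maximize the margin by minimizing $\frac{1}{2}\|w\|_2^2$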

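9 Support Vector Machines: Linear Programming Formulation
• Use the 1-norm instead of the 2-norm:
  $\min_{w,\gamma,y} \; \nu e'y + \|w\|_1$ s.t. $D(Aw - e\gamma) + y \ge e,\; y \ge 0$
• This is equivalent to the following linear program:
  $\min_{w,\gamma,y,v} \; \nu e'y + e'v$ s.t. $D(Aw - e\gamma) + y \ge e,\; -v \le w \le v,\; y \ge 0$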
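A minimal sketch of the 1-norm SVM as a linear program, assuming the standard form: minimize $\nu e'y + e'v$ subject to $D(Aw - e\gamma) + y \ge e$, $-v \le w \le v$, $y \ge 0$. It uses `scipy.optimize.linprog`; the function name, the variable ordering, and the toy data are assumptions made for illustration, not the authors' code.

```python
import numpy as np
from scipy.optimize import linprog


def one_norm_svm(A, d, nu=1.0):
    """Solve the 1-norm SVM LP; returns (w, gamma)."""
    A = np.asarray(A, dtype=float)
    d = np.asarray(d, dtype=float)
    m, n = A.shape
    # Variable order in the LP vector: w (n), gamma (1), y (m), v (n).
    c = np.concatenate([np.zeros(n + 1), nu * np.ones(m), np.ones(n)])
    DA = d[:, None] * A
    # D(Aw - e*gamma) + y >= e, rewritten as -DA w + d*gamma - y <= -e.
    margin = np.hstack([-DA, d[:, None], -np.eye(m), np.zeros((m, n))])
    # w - v <= 0 and -w - v <= 0 together encode -v <= w <= v.
    upper = np.hstack([np.eye(n), np.zeros((n, 1 + m)), -np.eye(n)])
    lower = np.hstack([-np.eye(n), np.zeros((n, 1 + m)), -np.eye(n)])
    A_ub = np.vstack([margin, upper, lower])
    b_ub = np.concatenate([-np.ones(m), np.zeros(2 * n)])
    # w and gamma are free; y and v are nonnegative.
    bounds = [(None, None)] * (n + 1) + [(0, None)] * (m + n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:n], res.x[n]


# Hypothetical, linearly separable toy data.
A = np.array([[3.0, 3.0], [4.0, 2.0], [0.0, 0.0], [1.0, -1.0]])
d = np.array([1.0, 1.0, -1.0, -1.0])
w, gamma = one_norm_svm(A, d, nu=10.0)
print("w =", w, "gamma =", gamma)
```

At the optimum each $v_j$ equals $|w_j|$, which is why minimizing $e'v$ reproduces the 1-norm of $w$.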

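10 Feature Selection and SVMs
• Use the step function to suppress components of the normal to the separating hyperplane:
  $\min_{w,\gamma,y,v} \; \nu e'y + e'v_*$ s.t. $D(Aw - e\gamma) + y \ge e,\; -v \le w \le v,\; y \ge 0$
  Where: $(v_*)_i = 1$ if $v_i > 0$, and $(v_*)_i = 0$ if $v_i \le 0$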

11 Smooth Approximation of the Step Function

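12 SVM Formulation with Feature Selection
• For $v \ge 0$, we use the approximation of the step vector $v_*$ by the concave exponential:
  $v_* \approx e - \varepsilon^{-\alpha v},\; \alpha > 0$
• Here $\varepsilon$ is the base of natural logarithms. This leads to:
  $\min_{w,\gamma,y,v} \; \nu e'y + e'(e - \varepsilon^{-\alpha v})$ s.t. $D(Aw - e\gamma) + y \ge e,\; -v \le w \le v,\; y \ge 0$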
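To see how closely a concave exponential tracks the step function, the sketch below compares the exact count of positive components of a nonnegative vector v with the smooth surrogate, the sum over i of 1 - exp(-alpha * v_i). The value alpha = 5 and the function names are assumptions chosen for illustration.

```python
import math


def step_count(v):
    """Exact step-function count: number of strictly positive components."""
    return sum(1 for vi in v if vi > 0)


def concave_exp_approx(v, alpha=5.0):
    """Smooth concave surrogate for step_count on v >= 0."""
    return sum(1.0 - math.exp(-alpha * vi) for vi in v)


v = [0.0, 2.0, 0.0, 3.0]  # hypothetical vector with two active features
print(step_count(v), concave_exp_approx(v))
```

Larger values of alpha make the surrogate hug the step function more tightly, at the cost of a steeper gradient near zero.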

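13 Successive Linearization Algorithm (SLA) for Feature Selection
• Choose $\nu > 0$. Start with some $(w^0, \gamma^0, y^0, v^0)$. Having $(w^i, \gamma^i, y^i, v^i)$, determine the next iterate by solving the LP:
  $\min_{w,\gamma,y,v} \; \nu e'y + \alpha(\varepsilon^{-\alpha v^i})'(v - v^i)$ s.t. $D(Aw - e\gamma) + y \ge e,\; -v \le w \le v,\; y \ge 0$
• Stop when: $\nu e'(y^{i+1} - y^i) + \alpha(\varepsilon^{-\alpha v^i})'(v^{i+1} - v^i) = 0$
• Proposition: The algorithm terminates in a finite number of steps (typically 5 to 7) at a stationary point.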
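The iteration on this slide might be sketched as follows. This is a rough illustration, not the authors' implementation: it assumes the concave objective is $\nu e'y + e'(e - \varepsilon^{-\alpha v})$, linearizes it at the current $v$, and solves each LP with `scipy.optimize.linprog`. The starting point (the plain 1-norm SVM solution), the defaults `alpha=5.0` and `tol`, and the toy data are all assumptions.

```python
import numpy as np
from scipy.optimize import linprog


def sla_feature_selection(A, d, nu=1.0, alpha=5.0, max_iter=20, tol=1e-8):
    """Successive linearization sketch; returns (w, gamma)."""
    A = np.asarray(A, dtype=float)
    d = np.asarray(d, dtype=float)
    m, n = A.shape
    DA = d[:, None] * A
    # Variable order: w (n), gamma (1), y (m), v (n).
    A_ub = np.vstack([
        np.hstack([-DA, d[:, None], -np.eye(m), np.zeros((m, n))]),
        np.hstack([np.eye(n), np.zeros((n, 1 + m)), -np.eye(n)]),
        np.hstack([-np.eye(n), np.zeros((n, 1 + m)), -np.eye(n)]),
    ])
    b_ub = np.concatenate([-np.ones(m), np.zeros(2 * n)])
    bounds = [(None, None)] * (n + 1) + [(0, None)] * (m + n)

    def solve(v_cost):
        # LP with cost nu*e'y + v_cost'v over the fixed feasible region.
        c = np.concatenate([np.zeros(n + 1), nu * np.ones(m), v_cost])
        return linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds,
                       method="highs").x

    # Assumed starting point: the 1-norm SVM solution (cost e'v).
    x = solve(np.ones(n))
    for _ in range(max_iter):
        y_old, v_old = x[n + 1:n + 1 + m], x[n + 1 + m:]
        # Gradient of e'(e - exp(-alpha*v)) with respect to v at v_old.
        grad = alpha * np.exp(-alpha * v_old)
        x_new = solve(grad)
        y_new, v_new = x_new[n + 1:n + 1 + m], x_new[n + 1 + m:]
        # Change in the linearized objective; zero means a stationary point.
        change = nu * np.sum(y_new - y_old) + grad @ (v_new - v_old)
        x = x_new
        if abs(change) <= tol:
            break
    return x[:n], x[n]


# Hypothetical toy data in which the first feature alone separates the classes.
A = np.array([[2.0, 0.1], [3.0, -0.2], [-2.0, 0.3], [-3.0, -0.1]])
d = np.array([1.0, 1.0, -1.0, -1.0])
w, gamma = sla_feature_selection(A, d, nu=10.0)
print("w =", w, "gamma =", gamma)
```

On data this small the first LP already yields a one-feature plane, so the loop stops after a single pass; the example only shows the mechanics of the iteration and the stopping test.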

14 The Federalist Papers
• Written in 1787-1788 by Alexander Hamilton, John Jay and James Madison to persuade the citizens of New York to ratify the Constitution.
• The papers are short essays, 900 to 3500 words in length.
• The authorship of 12 of those papers has been in dispute (Madison or Hamilton). These papers are referred to as the disputed Federalist papers.

15 Previous Work
• Mosteller and Wallace (1964)
  • Using statistical inference, determined the authorship of the 12 disputed papers.
• Bosch and Smith (1998)
  • Using linear programming techniques and the evaluation of every possible combination of one, two and three features, obtained a separating hyperplane using only three words.

16 Description of the data
• For every paper:
  • Machine-readable text was created using a scanner.
  • Relative frequencies were computed for 70 words that Mosteller and Wallace identified as good candidates for author-attribution studies.
  • Each document is represented as a vector containing the 70 real numbers corresponding to the 70 word frequencies.
• The dataset consists of 118 papers:
  • 50 Madison papers
  • 56 Hamilton papers
  • 12 disputed papers
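A document's feature vector could be built roughly as follows. The slide only says "relative frequencies"; scaling to occurrences per 1000 words, the simple letters-only tokenizer, and the three-word list used here are assumptions for illustration (the talk uses 70 words).

```python
import re
from collections import Counter


def relative_frequencies(text, function_words, per=1000):
    """Occurrences of each function word per `per` words of text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        return [0.0] * len(function_words)
    return [per * counts[word] / total for word in function_words]


# Hypothetical snippet and a three-word subset of the function-word list.
sample = "To be upon the hill, or not to be."
words = ["to", "upon", "would"]
print(relative_frequencies(sample, words))
```

Each paper then becomes one row of the matrix A, with one column per function word.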

17 Function Words Based on Relative Frequencies

18 SLA Feature Selection for Classifying the Disputed Federalist Papers
• Apply the successive linearization algorithm to:
  • Train on the 106 Federalist papers with known authors
  • Find a classification hyperplane that uses as few words as possible
  • Use the hyperplane to classify the 12 disputed papers
• The parameter $\nu$ (the weight of the training error) was obtained by a tuning procedure.

19 Hyperplane Classifier Using 3 Words
• A hyperplane depending on three words was found:
  $0.5368\,\mathrm{to} + 24.6634\,\mathrm{upon} + 2.9532\,\mathrm{would} = 66.6159$
• All disputed papers ended up on the Madison side of the plane
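Evaluating the three-word plane for a given paper is a single weighted sum. In the sketch below the coefficients are the ones on the slide, while the frequency values are made up and their units are an assumption. The slide does not say which sign corresponds to which author, so the function only reports the signed value.

```python
# Coefficients and threshold as given on the slide.
WEIGHTS = {"to": 0.5368, "upon": 24.6634, "would": 2.9532}
GAMMA = 66.6159


def plane_value(freqs):
    """Signed value w'x - gamma; its sign tells which side of the plane x is on."""
    return sum(WEIGHTS[word] * freqs[word] for word in WEIGHTS) - GAMMA


# Hypothetical word frequencies for one paper (not taken from the dataset).
paper = {"to": 40.0, "upon": 3.0, "would": 8.0}
value = plane_value(paper)
print("positive side" if value > 0 else "negative side", value)
```

Because "upon" carries by far the largest coefficient, small changes in its frequency dominate which side of the plane a paper falls on.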

20 Results: 3D plot of the resulting hyperplane

21 Comparison with Previous Work & Conclusion
• Bosch and Smith (1998) calculated all the possible sets of one, two and three words to find a separating hyperplane. They solved 118,895 linear programs.
• Our SLA algorithm for feature selection required the solution of only 6 linear programs.
• Our classification of the disputed Federalist papers agrees with that of Mosteller-Wallace and Bosch-Smith.

22 More on SVMs:
• My web page: www.cs.wisc.edu/~gfung
• Olvi Mangasarian web page: www.cs.wisc.edu/~olvi

