The Disputed Federalist Papers: Resolution via Support Vector Machine Feature Selection Olvi Mangasarian UW Madison & UCSD La Jolla Glenn Fung Amazon Inc.,

1 The Disputed Federalist Papers: Resolution via Support Vector Machine Feature Selection Olvi Mangasarian UW Madison & UCSD La Jolla Glenn Fung Amazon Inc., Seattle, Washington

2 The Federalist Papers
• Written in 1787-1788 by Alexander Hamilton and James Madison to persuade the citizens of New York to ratify the Constitution.
• The papers consisted of 118 short essays, 900 to 3500 words in length.
• Authorship of 106 of the papers is definitely known.
• Authorship of 12 of the papers has been in dispute (Madison or Hamilton). These papers are referred to as the disputed Federalist papers.
• In this talk we shall resolve this dispute using a linear support vector machine classifier.

3 Outline of Talk
• Support Vector Machines (SVM) Introduction
  - Standard Quadratic Programming Formulation
  - 1-norm Linear Programming Formulation
• SVM Feature Selection via Concave Minimization
  - Successive Linearization Algorithm (SLA)
• The Disputed Federalist Papers
  - Description of the Classification Problem
  - Description of Previous Work
• Results
  - Linear Classifier in Three Dimensions Resolves Dispute
  - Classification Agrees with Previous Results

4 What is a Support Vector Machine?
• An optimally defined surface
• Typically nonlinear in the input space
• Linear in a higher dimensional space
• Implicitly defined by a kernel function

5 What are Support Vector Machines Used For?
• Classification
• Regression & Data Fitting
• Supervised & Unsupervised Learning
(Will concentrate on classification)

6 Geometry of the Classification Problem: 2-Category Linearly Separable Case
[Figure: the two point sets, labeled A+ and A-]

7 Algebra of the Classification Problem: 2-Category Linearly Separable Case
• Given m points in n-dimensional space
• Represented by an m-by-n matrix A
• Membership of each point $A_i$ in class +1 or -1 is specified by an m-by-m diagonal matrix D with +1 & -1 entries
• Separate by two bounding planes, $x'w = \gamma + 1$ and $x'w = \gamma - 1$: $A_i w \ge \gamma + 1$ for $D_{ii} = +1$, and $A_i w \le \gamma - 1$ for $D_{ii} = -1$
• More succinctly: $D(Aw - e\gamma) \ge e$, where e is a vector of ones.
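
As a small illustration of the notation on this slide, the sketch below checks the bounding-plane condition $D(Aw - e\gamma) \ge e$ row by row in plain Python. The toy points, `w` and `gamma` are made-up values chosen for the example, not data from the talk, and the function name is hypothetical.

```python
def satisfies_bounding_planes(A, d, w, gamma):
    """Return True if d_i * (A_i . w - gamma) >= 1 holds for every row i.

    A: list of m points, each a list of n floats (rows of the matrix A)
    d: list of m labels, +1 or -1 (the diagonal of D)
    """
    for row, label in zip(A, d):
        score = sum(a * wj for a, wj in zip(row, w)) - gamma
        if label * score < 1.0:
            return False
    return True


# Hypothetical toy data: two points per class in the plane.
A = [[3.0, 3.0], [4.0, 2.0], [0.0, 0.0], [1.0, -1.0]]
d = [+1, +1, -1, -1]
w = [1.0, 1.0]

separated = satisfies_bounding_planes(A, d, w, gamma=3.0)
shifted = satisfies_bounding_planes(A, d, w, gamma=6.0)
print(separated, shifted)
```

With `gamma=3.0` every point lies on the correct side of its bounding plane; moving the plane to `gamma=6.0` puts the class +1 points on the wrong side, so the check fails.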

8 Support Vector Machines: Maximizing the Margin between Bounding Planes
[Figure: point sets A+ and A-, with the support vectors labeled]

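9 Support Vector Machines: Quadratic Programming Formulation
• Solve the following quadratic program: $\min_{w,\gamma,y}\ \nu e'y + \tfrac{1}{2} w'w$ s.t. $D(Aw - e\gamma) + y \ge e,\ y \ge 0$, where $\nu$ is the weight of the training error.
• Maximize the margin $2/\|w\|_2$ by minimizing $\tfrac{1}{2}\|w\|_2^2$.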

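10 Support Vector Machines: Linear Programming Formulation
• Use the 1-norm instead of the 2-norm: $\min_{w,\gamma,y}\ \nu e'y + \|w\|_1$ s.t. $D(Aw - e\gamma) + y \ge e,\ y \ge 0$
• This is equivalent to the following linear program: $\min_{w,\gamma,y,v}\ \nu e'y + e'v$ s.t. $D(Aw - e\gamma) + y \ge e,\ -v \le w \le v,\ y \ge 0$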
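
To make the 1-norm objective concrete, here is a minimal sketch that evaluates $\nu e'y + \|w\|_1$ for a given plane, taking the slack $y$ as small as the constraint $D(Aw - e\gamma) + y \ge e,\ y \ge 0$ allows. It assumes the standard 1-norm SVM formulation with error weight `nu`; the data and the function name are hypothetical, and the sketch only evaluates the objective, it does not solve the linear program.

```python
def one_norm_svm_objective(A, d, w, gamma, nu):
    """Evaluate nu * e'y + ||w||_1 using the smallest feasible slack y.

    For each point, y_i = max(0, 1 - d_i * (A_i . w - gamma)).
    """
    slack_total = 0.0
    for row, label in zip(A, d):
        margin = label * (sum(a * wj for a, wj in zip(row, w)) - gamma)
        slack_total += max(0.0, 1.0 - margin)
    one_norm = sum(abs(wj) for wj in w)
    return nu * slack_total + one_norm


# Hypothetical toy data; the second feature gets zero weight.
A = [[2.0, 0.0], [0.0, 0.0], [1.0, 0.0]]
d = [+1, -1, -1]
w = [2.0, 0.0]

objective = one_norm_svm_objective(A, d, w, gamma=2.0, nu=1.0)
print(objective)
```

Only the third point violates its bounding plane (slack 1), so the objective is one unit of weighted error plus the 1-norm of `w`. Raising `nu` makes that violation more expensive relative to the size of `w`.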

11 Feature Selection and SVMs
• Use the step function to suppress components of the normal to the separating hyperplane: [equation missing from transcript]

12 Smooth Approximation of the Step Function

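13 SVM Formulation with Feature Selection
• For $v \ge 0$, we use the approximation of the step vector $v_*$ by the concave exponential: $v_* \approx e - \varepsilon^{-\alpha v}$, with $\alpha > 0$.
• Here $\varepsilon$ is the base of natural logarithms. This leads to: [equation missing from transcript]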
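
The sketch below illustrates, for a scalar argument, how a concave exponential of the form $1 - \varepsilon^{-\alpha v}$ approximates the step function on $v \ge 0$, following Bradley & Mangasarian (1998). The values of `alpha` and the sample points are assumptions chosen for illustration, not parameters reported in the talk.

```python
import math


def step(v):
    """Step function on v >= 0: 0 at v == 0, 1 for v > 0."""
    return 1.0 if v > 0 else 0.0


def concave_exp(v, alpha):
    """Smooth concave approximation 1 - exp(-alpha * v) of the step function."""
    return 1.0 - math.exp(-alpha * v)


# Compare the step function with the approximation for two illustrative alphas.
for v in (0.0, 0.1, 0.5, 1.0):
    print(v, step(v), round(concave_exp(v, 5.0), 4), round(concave_exp(v, 50.0), 4))
```

The approximation is exact at zero, never exceeds the step function, and tightens as `alpha` grows, which is why minimizing it pushes small components toward exactly zero.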

14 Successive Linearization Algorithm (SLA) for Feature Selection
• Choose the parameters and start with some initial point. Given the current iterate, determine the next iterate by solving the LP: [equation missing from transcript]
• Stop when: [stopping criterion missing from transcript]
• Proposition: The algorithm terminates in a finite number of steps (typically 5 to 7) at a stationary point.
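
The formulas on this slide are not preserved in the transcript. As a hedged sketch only, here is one concave-minimization formulation consistent with the neighbouring slides and with Bradley & Mangasarian (1998), together with the linearization step that turns it into a sequence of LPs. The symbols $\nu$, $\mu$, $\alpha$, $v$ and the exact weighting of the terms are assumptions, not recovered from the slide.

```latex
% Assumed smooth feature-selection objective (v bounds |w| componentwise):
\min_{w,\gamma,y,v}\; \nu e'y + e'v + \mu\, e'\bigl(e - \varepsilon^{-\alpha v}\bigr)
\quad \text{s.t.}\quad D(Aw - e\gamma) + y \ge e,\;\; -v \le w \le v,\;\; y \ge 0.

% Gradient of the concave term with respect to v at the current iterate v^i:
\nabla_v \Bigl[\mu\, e'\bigl(e - \varepsilon^{-\alpha v}\bigr)\Bigr]_{v = v^i}
  = \mu\alpha\, \varepsilon^{-\alpha v^i}.

% Linearized problem solved at step i (constant terms dropped, same constraints):
\min_{w,\gamma,y,v}\; \nu e'y + e'v + \mu\alpha \bigl(\varepsilon^{-\alpha v^i}\bigr)' v.

% One natural stopping test: the linearized objective no longer decreases,
\nu e'(y^{i+1} - y^i) + e'(v^{i+1} - v^i)
  + \mu\alpha \bigl(\varepsilon^{-\alpha v^i}\bigr)'(v^{i+1} - v^i) = 0.
```

Because the exponential term is concave, its linearization at $v^i$ is an upper bound on it, so any step that decreases the linearized objective also decreases the true objective. This is the property that finite-termination results of the kind stated in the proposition rely on.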

15 The Federalist Papers (As Described Earlier)
• Written in 1787-1788 by Alexander Hamilton and James Madison to persuade the citizens of New York to ratify the Constitution.
• The papers consisted of short essays, 900 to 3500 words in length.
• Authorship of 12 of the papers has been in dispute (Madison or Hamilton). These papers are referred to as the disputed Federalist papers.

16 Previous Work
• Mosteller and Wallace (1964)
  - Using statistical inference, determined the authorship of the 12 disputed papers.
• Bosch and Smith (1998)
  - Using linear programming techniques and the evaluation of every possible combination of one, two and three features out of 70, obtained a best separating hyperplane using only three words.

17 Description of the Data
• For every paper:
  - Machine-readable text was created using a scanner.
  - Relative frequencies were computed for 70 words that Mosteller and Wallace identified as good candidates for author-attribution studies.
  - Each document is represented as a vector containing the 70 real numbers corresponding to the 70 word frequencies.
• The dataset consists of 118 papers:
  - 50 Madison papers
  - 56 Hamilton papers
  - 12 disputed papers
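
The feature construction described on this slide can be sketched roughly as follows. This is an illustration under stated assumptions: the tokenizer is a simple lowercase letter-run regex, frequencies are scaled per 1000 words (the slide does not give the scaling), and the three-word vocabulary and sample sentence are made up for the example rather than taken from the 70-word list.

```python
import re


def relative_frequencies(text, vocabulary, per=1000):
    """Return the frequency of each vocabulary word per `per` tokens of text."""
    # Simplified tokenizer: runs of letters, lowercased (contractions are split).
    tokens = re.findall(r"[a-z]+", text.lower())
    total = len(tokens)
    if total == 0:
        return [0.0] * len(vocabulary)
    counts = {}
    for token in tokens:
        counts[token] = counts.get(token, 0) + 1
    return [per * counts.get(word, 0) / total for word in vocabulary]


# Hypothetical vocabulary and text, for illustration only.
vocabulary = ["to", "upon", "would"]
sample = "It would be wise to rely upon reason and to act upon it."

features = relative_frequencies(sample, vocabulary)
print(features)
```

Each document then becomes one row of the data matrix, with one column per vocabulary word.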

18 Function Words Based on Relative Frequencies

19 SLA Feature Selection for Classifying the Disputed Federalist Papers
• Apply the successive linearization algorithm to:
  - Train on the 106 Federalist papers with known authors
  - Find a classification hyperplane that uses as few words as possible
  - Use the hyperplane to classify the 12 disputed papers
• The parameter was obtained by a tuning procedure.

20 Hyperplane Classifier Using 3 Words
• A hyperplane depending on three words was found: $-0.5368\,\text{to} - 24.6634\,\text{upon} - 2.9532\,\text{would} = -66.6159$
• Hamilton: $> -66.6159$
• Madison: $< -66.6159$
• All disputed papers ended up on the Madison side of the plane.
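
Applying a three-word hyperplane of this kind amounts to one weighted sum and a comparison. The sketch below uses the coefficients and threshold exactly as printed on the slide and follows the slide's rule for which side is which. The input frequencies are made-up numbers, since the scaling of the original word frequencies is not stated in the transcript, and assigning a tie to Madison is an arbitrary choice of this sketch.

```python
# Coefficients and threshold as printed on the slide.
COEFFICIENTS = {"to": -0.5368, "upon": -24.6634, "would": -2.9532}
THRESHOLD = -66.6159


def hyperplane_score(frequencies):
    """Weighted sum of the three word frequencies."""
    return sum(COEFFICIENTS[word] * frequencies[word] for word in COEFFICIENTS)


def classify(frequencies):
    """Slide's rule: score > threshold is Hamilton, score < threshold is Madison.

    A score exactly on the plane is assigned to Madison here (arbitrary choice).
    """
    return "Hamilton" if hyperplane_score(frequencies) > THRESHOLD else "Madison"


# Hypothetical frequency vectors, not taken from the Federalist data.
paper_a = {"to": 35.0, "upon": 0.2, "would": 6.0}
paper_b = {"to": 40.0, "upon": 3.0, "would": 8.0}

label_a = classify(paper_a)
label_b = classify(paper_b)
print(label_a, label_b)
```

The large coefficient on "upon" means that this single word dominates the score, which is consistent with a usable classifier needing only three words.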

21 Results: 3d plot of resulting hyperplane

22 Comparison with Previous Work & Conclusion
• Bosch and Smith (1998) calculated all the possible combinations of one, two and three words out of 70 to find a separating hyperplane. They solved 57,225 linear programs.
  - $-0.5242\,\text{are} + 0.8895\,\text{our} + 4.9235\,\text{upon} = 4.7368$
• Our SLA for feature selection required the solution of only 6 linear programs.
• Our classification of the disputed Federalist papers agrees with that of both Mosteller-Wallace and Bosch-Smith.
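
The figure of 57,225 linear programs is the number of ways to choose one, two or three words out of 70. A quick check, using only the Python standard library:

```python
from math import comb

n_words = 70

# One linear program per subset of size 1, 2 or 3.
subset_counts = [comb(n_words, k) for k in (1, 2, 3)]
lp_count = sum(subset_counts)

print(subset_counts, lp_count)
```

This gives 70 + 2,415 + 54,740 = 57,225, compared with the 6 linear programs reported for the SLA approach.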

23 References
• K.P. Bennett & O.L. Mangasarian: Robust linear programming discrimination of two linearly inseparable sets. Optimization Methods and Software 1, 1992, 23-34.
• P.S. Bradley & O.L. Mangasarian: Feature selection via concave minimization and support vector machines. ICML 1998: Machine Learning Proceedings of the Fifteenth International Conference, San Francisco, California, 1998, J. Shavlik, editor, pages 82-90, Morgan Kaufmann.

24 References (Continued)
• R.A. Bosch & J.A. Smith: Separating hyperplanes and the authorship of the disputed Federalist papers. American Mathematical Monthly 105(7), 601-608, 1998.
• F. Mosteller & D.L. Wallace: Inference and disputed authorship: The Federalist. Addison-Wesley, Reading, Massachusetts, 1964.
• F. Mosteller & D.L. Wallace: Applied Bayesian and classical inference: The case of the Federalist papers, Second Edition, Springer-Verlag, New York, 1984.

25 More on SVMs
• Olvi Mangasarian's web page: http://pages.cs.wisc.edu/~olvi

