
The Disputed Federalist Papers: Resolution via Support Vector Machine Feature Selection Olvi Mangasarian UW Madison & UCSD La Jolla Glenn Fung Amazon Inc., Seattle, Washington

The Federalist Papers
 Written by Alexander Hamilton and James Madison to persuade the citizens of New York to ratify the Constitution.
 Papers consisted of 118 short essays, 900 to 3500 words in length.
 Authorship of 106 of the papers is definitely known.
 Authorship of the remaining 12 papers has been in dispute (Madison or Hamilton); these papers are referred to as the disputed Federalist papers.
 In this talk we shall resolve this dispute using a linear support vector machine classifier.

Outline of Talk
 Support Vector Machines (SVM) Introduction
 Description of the Classification Problem
 Standard Quadratic Programming Formulation
 1-Norm Linear Programming Formulation
 SVM Feature Selection via Concave Minimization
 Successive Linearization Algorithm (SLA)
 The Disputed Federalist Papers
 Description of Previous Work
 Results
 Linear Classifier in Three Dimensions Resolves Dispute
 Classification Agrees with Previous Results

What is a Support Vector Machine?  An optimally defined surface  Typically nonlinear in the input space  Linear in a higher dimensional space  Implicitly defined by a kernel function

What are Support Vector Machines Used For?  Classification  Regression & Data Fitting  Supervised & Unsupervised Learning (Will concentrate on classification)

Geometry of the Classification Problem 2-Category Linearly Separable Case A+ A-

Algebra of the Classification Problem 2-Category Linearly Separable Case
 Given $m$ points in $n$ dimensional space
 Represented by an $m \times n$ matrix $A$
 Membership of each point in class $+1$ or $-1$ specified by an $m \times m$ diagonal matrix $D$ with $+1$ and $-1$ entries
 Separate by two bounding planes, $x^\top w = \gamma + 1$ and $x^\top w = \gamma - 1$:
$A_i w \ge \gamma + 1$ for rows with $D_{ii} = +1$, and $A_i w \le \gamma - 1$ for rows with $D_{ii} = -1$
 More succinctly: $D(Aw - e\gamma) \ge e$, where $e$ is a vector of ones.
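The matrix notation above can be checked numerically. This small sketch (not from the talk; the four points and the plane are made-up toy values) builds $A$, $D$, and $e$ with NumPy and verifies the separability condition $D(Aw - e\gamma) \ge e$:

```python
import numpy as np

# m = 4 points in n = 2 dimensions, one row of A per point
A = np.array([[2.0, 1.0], [3.0, 2.0], [-1.0, -1.0], [-2.0, 0.0]])
d = np.array([1, 1, -1, -1])   # class label of each point
D = np.diag(d)                 # m-by-m diagonal matrix with +1/-1 entries
e = np.ones(len(d))            # vector of ones

# A candidate plane x'w = gamma with bounding planes x'w = gamma +/- 1
w, gamma = np.array([1.0, 1.0]), 0.5
print("all bounding-plane constraints hold:",
      bool(np.all(D @ (A @ w - e * gamma) >= e)))
```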

Support Vector Machines Maximizing the Margin between Bounding Planes A+ A- Support vectors

Support Vector Machines: Quadratic Programming Formulation
 Solve the following quadratic program:
$\min_{w,\gamma,y} \; \nu e^\top y + \tfrac{1}{2}\|w\|_2^2$
s.t. $D(Aw - e\gamma) + y \ge e$, $y \ge 0$,
where $\nu > 0$ is the weight of the training error $e^\top y$
 Maximize the margin $2/\|w\|_2$ between the bounding planes by minimizing $\tfrac{1}{2}\|w\|_2^2$
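As an illustration only (not code from the talk), the quadratic program above can be solved on a tiny synthetic dataset with SciPy's general-purpose SLSQP solver; the data and the value $\nu = 1$ are assumptions, and a dedicated QP solver would be used in practice:

```python
import numpy as np
from scipy.optimize import minimize

# Toy 2-class data: rows of A are points, d holds the +1/-1 labels
A = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],
              [-1.0, -1.0], [-2.0, -2.0], [-1.5, -0.5]])
d = np.array([1.0, 1, 1, -1, -1, -1])
m, n = A.shape
nu = 1.0  # weight on the training-error term

# Decision variables packed as z = [w (n), gamma (1), y (m)]
def objective(z):
    w, y = z[:n], z[n + 1:]
    return nu * y.sum() + 0.5 * w @ w

def margin_con(z):  # D(Aw - e*gamma) + y - e >= 0
    w, gamma, y = z[:n], z[n], z[n + 1:]
    return d * (A @ w - gamma) + y - 1.0

cons = [{"type": "ineq", "fun": margin_con},
        {"type": "ineq", "fun": lambda z: z[n + 1:]}]  # y >= 0
res = minimize(objective, np.zeros(n + 1 + m), constraints=cons, method="SLSQP")
w, gamma = res.x[:n], res.x[n]
print("w =", w, "gamma =", gamma)
print("separated:", bool(np.all(d * (A @ w - gamma) > 0)))
```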

Support Vector Machines: Linear Programming Formulation
 Use the 1-norm $\|w\|_1$ instead of the 2-norm:
$\min_{w,\gamma,y} \; \nu e^\top y + \|w\|_1$
s.t. $D(Aw - e\gamma) + y \ge e$, $y \ge 0$
 This is equivalent to the following linear program:
$\min_{w,\gamma,y,s} \; \nu e^\top y + e^\top s$
s.t. $D(Aw - e\gamma) + y \ge e$, $-s \le w \le s$, $y \ge 0$
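The equivalent LP can be assembled directly. This sketch (toy data and $\nu = 1$ are illustrative assumptions, not values from the talk) encodes $\|w\|_1$ via the auxiliary variable $s$ with the constraints $-s \le w \le s$ and solves with `scipy.optimize.linprog`:

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],
              [-1.0, -1.0], [-2.0, -2.0], [-1.5, -0.5]])
d = np.array([1.0, 1, 1, -1, -1, -1])
m, n = A.shape
nu = 1.0

# Variables packed as z = [w (n), gamma (1), y (m), s (n)]
c = np.concatenate([np.zeros(n + 1), nu * np.ones(m), np.ones(n)])

# D(Aw - e*gamma) + y >= e  rewritten as  -(DA)w + d*gamma - y <= -e
row1 = np.hstack([-d[:, None] * A, d[:, None], -np.eye(m), np.zeros((m, n))])
# w <= s and -w <= s encode s >= |w|, so e's = ||w||_1 at the optimum
row2 = np.hstack([np.eye(n), np.zeros((n, 1 + m)), -np.eye(n)])
row3 = np.hstack([-np.eye(n), np.zeros((n, 1 + m)), -np.eye(n)])
A_ub = np.vstack([row1, row2, row3])
b_ub = np.concatenate([-np.ones(m), np.zeros(2 * n)])

bounds = [(None, None)] * (n + 1) + [(0, None)] * (m + n)  # w, gamma free
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
w, gamma = res.x[:n], res.x[n]
print("w =", w, "gamma =", gamma)
print("separated:", bool(np.all(d * (A @ w - gamma) > 0)))
```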

Feature Selection and SVMs
 Use the step function to suppress components of the normal $w$ to the separating hyperplane:
$\min_{w,\gamma,y,v} \; \nu e^\top y + e^\top v_*$
s.t. $D(Aw - e\gamma) + y \ge e$, $-v \le w \le v$, $y \ge 0$
 Where $(v_*)_i = 1$ if $v_i > 0$ and $(v_*)_i = 0$ if $v_i = 0$; thus $e^\top v_*$ counts the nonzero components of $w$

Smooth Approximation of the Step Function

SVM Formulation with Feature Selection
 For $v \ge 0$, we use the approximation of the step vector $v_*$ by the concave exponential:
$v_* \approx e - \varepsilon^{-\alpha v}$, $\alpha > 0$
 Here $\varepsilon$ is the base of natural logarithms. This leads to:
$\min_{w,\gamma,y,v} \; \nu e^\top y + e^\top (e - \varepsilon^{-\alpha v})$
s.t. $D(Aw - e\gamma) + y \ge e$, $-v \le w \le v$, $y \ge 0$
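A quick numerical check (illustrative, not from the slides) of how the concave exponential $1 - \varepsilon^{-\alpha t}$ approaches the step function as $\alpha$ grows:

```python
import numpy as np

def step_approx(t, alpha):
    """Concave-exponential approximation 1 - exp(-alpha*t) of step(t), t >= 0."""
    return 1.0 - np.exp(-alpha * t)

t = np.array([0.0, 0.1, 0.5, 1.0])
for alpha in (1, 5, 25):
    print(alpha, np.round(step_approx(t, alpha), 4))
```

At $t = 0$ the approximation is exactly 0, matching the step function, and for every $t > 0$ it tends to 1 as $\alpha \to \infty$.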

Successive Linearization Algorithm (SLA) for Feature Selection
 Choose $\nu, \alpha > 0$. Start with some $(w^0, \gamma^0, y^0, v^0)$. Having $(w^i, \gamma^i, y^i, v^i)$, determine the next iterate by solving the LP obtained by linearizing the concave exponential at $v^i$:
$\min_{w,\gamma,y,v} \; \nu e^\top y + (\alpha \varepsilon^{-\alpha v^i})^\top v$
s.t. $D(Aw - e\gamma) + y \ge e$, $-v \le w \le v$, $y \ge 0$
 Stop when the objective decrease between successive iterates is zero:
$\nu e^\top (y^i - y^{i+1}) + (\alpha \varepsilon^{-\alpha v^i})^\top (v^i - v^{i+1}) = 0$
 Proposition: Algorithm terminates in a finite number of steps (typically 5 to 7) at a stationary point.
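The loop can be sketched end to end. This toy implementation (not the authors' code; the data, $\nu$, and $\alpha$ are illustrative assumptions) reuses the LP constraint blocks and reweights the linear term on $v$ by $\alpha\varepsilon^{-\alpha v^i}$ at each pass; on data where only the first feature is informative, the weight vector should come out sparse:

```python
import numpy as np
from scipy.optimize import linprog

# Toy data: only feature 0 separates the classes, features 1-2 are noise
rng = np.random.default_rng(0)
m = 40
x1 = np.concatenate([rng.normal(2, 0.3, m // 2), rng.normal(-2, 0.3, m // 2)])
A = np.column_stack([x1, rng.normal(0, 1, m), rng.normal(0, 1, m)])
d = np.concatenate([np.ones(m // 2), -np.ones(m // 2)])
n = A.shape[1]
nu, alpha = 1.0, 5.0

# z = [w (n), gamma (1), y (m), v (n)]; constraint blocks shared by every LP
row1 = np.hstack([-d[:, None] * A, d[:, None], -np.eye(m), np.zeros((m, n))])
row2 = np.hstack([np.eye(n), np.zeros((n, 1 + m)), -np.eye(n)])   # w <= v
row3 = np.hstack([-np.eye(n), np.zeros((n, 1 + m)), -np.eye(n)])  # -w <= v
A_ub = np.vstack([row1, row2, row3])
b_ub = np.concatenate([-np.ones(m), np.zeros(2 * n)])
bounds = [(None, None)] * (n + 1) + [(0, None)] * (m + n)

v = np.ones(n)
for _ in range(10):
    grad = alpha * np.exp(-alpha * v)  # gradient of e'(e - eps^{-alpha v}) at v
    c = np.concatenate([np.zeros(n + 1), nu * np.ones(m), grad])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    w, gamma, v_new = res.x[:n], res.x[n], res.x[n + 1 + m:]
    if np.allclose(v_new, v, atol=1e-6):  # no further decrease: stop
        break
    v = v_new

selected = np.flatnonzero(np.abs(w) > 1e-6)
print("selected features:", selected)
print("separated:", bool(np.all(d * (A @ w - gamma) > 0)))
```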

The Federalist Papers (As Described Earlier)
 Written by Alexander Hamilton and James Madison to persuade the citizens of New York to ratify the Constitution.
 Papers consisted of 118 short essays, 900 to 3500 words in length.
 Authorship of 12 of those papers has been in dispute (Madison or Hamilton); these papers are referred to as the disputed Federalist papers.

Previous Work
 Mosteller and Wallace (1964): using statistical inference, determined the authorship of the 12 disputed papers.
 Bosch and Smith (1998): using linear programming techniques and the evaluation of every possible combination of one, two, and three features out of 70, obtained a best separating hyperplane using only three words.

Description of the Data
 For every paper:
 Machine-readable text was created using a scanner.
 Computed relative frequencies of 70 words that Mosteller and Wallace identified as good candidates for author-attribution studies.
 Each document is represented as a vector containing the 70 real numbers corresponding to the 70 word frequencies.
 The dataset consists of 118 papers:
 50 Madison papers
 56 Hamilton papers
 12 disputed papers
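For concreteness, a document-to-vector step of this kind might look as follows (a sketch; the word list here is a short stand-in, not the actual 70 Mosteller-Wallace function words):

```python
from collections import Counter
import re

# Stand-in list for the 70 function words (hypothetical subset)
FUNCTION_WORDS = ["to", "upon", "would", "of", "the", "and", "by", "there"]

def frequency_vector(text, words=FUNCTION_WORDS):
    """Relative frequency of each function word per 1000 words of text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return [1000.0 * counts[w] / total for w in words]

sample = "To be, or not to be: the question would depend upon the reader."
print(frequency_vector(sample))
```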

Function Words Based on Relative Frequencies

SLA Feature Selection for Classifying the Disputed Federalist Papers
 Apply the successive linearization algorithm to:
 Train on the 106 Federalist papers with known authors
 Find a classification hyperplane that uses as few words as possible
 Use the hyperplane to classify the 12 disputed papers
 The parameter $\nu$ was obtained by a tuning procedure.

Hyperplane Classifier Using 3 Words
 A hyperplane depending on only three words (to, upon, would) was found:
$w_1(\text{to}) + w_2(\text{upon}) + w_3(\text{would}) = \gamma$
(the numerical coefficients appear on the original slide)
 Papers with value greater than $\gamma$ are classified as Hamilton; papers with value less than $\gamma$ as Madison
 All disputed papers ended up on the Madison side of the plane

Results: 3d plot of resulting hyperplane

Comparison with Previous Work & Conclusion
 Bosch and Smith (1998) calculated all the possible combinations of one, two, and three words out of 70 to find a separating hyperplane. They solved 57,225 linear programs; their best plane also used only three words (are, our, upon).
 Our SLA algorithm for feature selection required the solution of only 6 linear programs.
 Our classification of the disputed Federalist papers agrees with that of both Mosteller-Wallace and Bosch-Smith.

References
 K. P. Bennett & O. L. Mangasarian: Robust linear programming discrimination of two linearly inseparable sets. Optimization Methods and Software 1, 1992.
 P. S. Bradley & O. L. Mangasarian: Feature selection via concave minimization and support vector machines. In: Machine Learning Proceedings of the Fifteenth International Conference (ICML 1998), San Francisco, California, 1998, J. Shavlik, editor, pages 82-90, Morgan Kaufmann.

References (Continued)
 R. A. Bosch & J. A. Smith: Separating hyperplanes and the authorship of the disputed Federalist papers. American Mathematical Monthly 105(7), 1998.
 F. Mosteller & D. L. Wallace: Inference and disputed authorship: The Federalist. Addison-Wesley, Reading, Massachusetts, 1964.
 F. Mosteller & D. L. Wallace: Applied Bayesian and classical inference: The case of the Federalist papers. Second edition, Springer-Verlag, New York, 1984.

More on SVMs:  Olvi Mangasarian’s web page: