Text Classification using Support Vector Machine
Debapriyo Majumdar
Information Retrieval – Spring 2015
Indian Statistical Institute Kolkata

A Linear Classifier
 A line (in general, a hyperplane) that separates the two classes of points
 Choose a "good" line → optimize some objective function
 LDA: an objective function based on the class means and scatter → depends on all the points
 There can be many such lines, and many parameters to optimize

Recall: A Linear Classifier
 What do we really want?
 Primarily: the fewest misclassifications
 Consider a separating line
 When should we worry about a misclassification?
 Answer: when the test point is near the margin
 So: why consider scatter, mean, etc. (which depend on all the points)? Instead, concentrate on the points near the "border"

Support Vector Machine: intuition
 Recall: a projection line w for the points lets us define a separating line L
 How? Not via mean and scatter
 Identify the support vectors: the training data points that act as "support"
 The separating line L lies between the support vectors
 Maximize the margin: the distance between the lines (hyperplanes) L1 and L2 defined by the support vectors
[Figure: separating line L between the margin lines L1 and L2, which pass through the support vectors; w is the direction normal to L]

Basics
 Write the separating line (hyperplane) L as w · x + b = 0, where w is the normal vector
 Distance of L from the origin: |b| / ‖w‖
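The distance formula on this slide appeared only as an image; the following is a standard statement of it, written as a LaTeX sketch under the assumption that L is given by w · x + b = 0.

```latex
% Distance from a point x0 to the hyperplane L : w . x + b = 0.
% The signed distance of x0 along the unit normal w/||w|| is (w . x0 + b)/||w||;
% taking absolute values, and setting x0 = 0 for the origin:
\[
  d(x_0, L) = \frac{\lvert w \cdot x_0 + b \rvert}{\lVert w \rVert},
  \qquad
  d(\mathbf{0}, L) = \frac{\lvert b \rvert}{\lVert w \rVert}.
\]
```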

Support Vector Machine: classification
 Denote the two classes as y = +1 and y = −1
 Then, for an unlabeled point x, the classification rule is f(x) = sign(w · x + b): predict +1 if w · x + b > 0, and −1 otherwise
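A minimal Python sketch of the decision rule stated above; the weight vector w and bias b here are made-up values for illustration, not learned ones.

```python
import numpy as np

w = np.array([0.8, -0.5])   # hypothetical weight vector (normal to the separating line)
b = -0.2                    # hypothetical bias

def classify(x):
    """Predict +1 if x lies on the positive side of the hyperplane w.x + b = 0, else -1."""
    return 1 if np.dot(w, x) + b > 0 else -1

print(classify(np.array([1.0, 0.3])))    # -> 1
print(classify(np.array([-0.5, 1.0])))   # -> -1
```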

Support Vector Machine: training
 Two classes: y_i = −1, +1
 Scale w and b so that the two margin lines are defined by the equations w · x + b = +1 and w · x + b = −1
 Then we have y_i (w · x_i + b) ≥ 1 for every training point
 The margin (separation of the two classes) is 2 / ‖w‖; training maximizes it (the full optimization problem is given below)
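The training objective on this slide was shown as a formula image; the standard hard-margin primal it corresponds to is, as a LaTeX sketch:

```latex
% Hard-margin SVM primal: maximizing the margin 2/||w|| is equivalent to
% minimizing ||w||^2 / 2 subject to every point being classified with margin >= 1.
\[
  \min_{w,\, b}\ \tfrac{1}{2}\lVert w \rVert^{2}
  \quad \text{subject to} \quad
  y_i \,(w \cdot x_i + b) \ge 1, \qquad i = 1, \dots, n.
\]
```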

Soft margin SVM
 The non-ideal case: non-separable training data
 Introduce a slack variable ξ_i for each training data point
 Soft margin SVM primal: minimize ‖w‖²/2 + C · Σ_i ξ_i subject to y_i (w · x_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0 (the hard-margin primal has no slack terms)
 C is the controlling parameter: small C allows large ξ_i's; large C forces small ξ_i's
 The sum Σ_i ξ_i is an upper bound on the number of misclassifications on the training data
[Figure: soft-margin SVM with margin width δ; points inside the margin or misclassified have slacks ξ_i, ξ_j > 0]
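A small sketch, using scikit-learn's SVC (an assumption: the slides do not name a library), of how the parameter C trades off margin width against slack; the toy data are made up for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Two made-up Gaussian blobs, one per class.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) + [2, 2], rng.randn(20, 2) - [2, 2]])
y = np.array([1] * 20 + [-1] * 20)

# Small C tolerates large slacks (wide margin, more points with slack > 0);
# large C forces small slacks (narrow margin, typically fewer support vectors).
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C}: {clf.n_support_.sum()} support vectors")
```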

Dual SVM
 The primal SVM optimization problem has an equivalent dual optimization problem in the multipliers α_i (see below)
 Theorem: the solution w* can always be written as a linear combination of the training vectors x_i, with coefficients satisfying 0 ≤ α_i ≤ C
 Properties:
 The factors α_i indicate the influence of the training examples x_i
 If ξ_i > 0, then α_i = C; if α_i < C, then ξ_i = 0
 x_i is a support vector if and only if α_i > 0
 If 0 < α_i < C, then y_i (w* · x_i + b) = 1
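The dual optimization problem on this slide was a formula image; the standard soft-margin dual consistent with the properties listed above is, as a LaTeX sketch:

```latex
% Soft-margin SVM dual in the multipliers alpha_i.
% The primal solution is recovered as w* = sum_i alpha_i y_i x_i.
\[
  \max_{\alpha}\ \sum_{i=1}^{n} \alpha_i
    - \tfrac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
      \alpha_i \alpha_j\, y_i y_j\, (x_i \cdot x_j)
  \quad \text{s.t.} \quad
  0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0.
\]
```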

Case: not linearly separable
 The data may not be linearly separable
 Map the data into a higher-dimensional space
 The data can become separable in that higher-dimensional space
 Idea: add more features, e.g. map the attributes (a, b, c) to (a, b, c, a², b², c², ab, bc, ac)
 Learn a linear rule in the feature space
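A minimal Python sketch of the feature expansion suggested on this slide: the attributes (a, b, c) are augmented with their squares and pairwise products, so a linear rule in the expanded space corresponds to a quadratic rule in the original space.

```python
def phi(a, b, c):
    """Map (a, b, c) to (a, b, c, a^2, b^2, c^2, ab, bc, ac)."""
    return (a, b, c, a * a, b * b, c * c, a * b, b * c, a * c)

print(phi(1.0, 2.0, 3.0))
# (1.0, 2.0, 3.0, 1.0, 4.0, 9.0, 2.0, 6.0, 3.0)
```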

Dual SVM
 If w* is a solution to the primal and α* = (α*_i) is a solution to the dual, then w* = Σ_i α*_i y_i x_i
 Mapping into the feature space with Φ gives an even higher dimension: p attributes become on the order of p^n features with a degree-n polynomial Φ
 But the dual problem depends only on the inner products Φ(x_i) · Φ(x_j)
 What if there were a way to compute Φ(x_i) · Φ(x_j) without computing Φ explicitly?
 Kernel functions: functions K such that K(a, b) = Φ(a) · Φ(b)

SVM kernels
 Linear: K(a, b) = a · b
 Polynomial: K(a, b) = (a · b + 1)^d
 Radial basis function: K(a, b) = exp(−γ ‖a − b‖²)
 Sigmoid: K(a, b) = tanh(γ (a · b) + c)
 Example: degree-2 polynomial
 Φ(x) = Φ(x1, x2) = (x1², x2², √2·x1, √2·x2, √2·x1·x2, 1)
 K(a, b) = (a · b + 1)² = Φ(a) · Φ(b)
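A small Python sketch checking the degree-2 example above numerically: K(a, b) = (a · b + 1)² equals Φ(a) · Φ(b) for the explicit map, so the kernel gives the inner product in the feature space without ever computing Φ.

```python
import math
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-dimensional point x = (x1, x2)."""
    x1, x2 = x
    return np.array([x1**2, x2**2,
                     math.sqrt(2) * x1, math.sqrt(2) * x2,
                     math.sqrt(2) * x1 * x2, 1.0])

def K(a, b):
    """Degree-2 polynomial kernel."""
    return (np.dot(a, b) + 1.0) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])
print(K(a, b), np.dot(phi(a), phi(b)))   # both are 4.0
```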

SVM Kernels: Intuition
[Figures: decision boundaries learned with a degree-2 polynomial kernel and with a radial basis function kernel]

Acknowledgments
 Thorsten Joachims' lecture notes, for some slides