Classifying and Clustering Using Support Vector Machines
2nd PhD report
PhD title: Data Mining in Unstructured Data
Daniel I. MORARIU, MSc
PhD Supervisor: Lucian N. VINŢAN
Sibiu, 2005
Contents
- Classification (clustering) steps
- Reuters Database processing
- Feature extraction and selection: Information Gain
- Support Vector Machine: binary classification, multiclass classification, clustering
- Sequential Minimal Optimization (SMO)
- Probabilistic outputs
- Experiments & results:
  - binary classification – aspects and results
  - feature subset selection – a comparative approach
  - multiclass classification – quantitative aspects
  - clustering – quantitative aspects
- Conclusions and further work
Classifying (clustering) steps
- Text mining – feature extraction
- Feature selection
- Classifying or clustering
- Testing results
Reuters Database Processing
- total documents, 126 topics, 366 regions, 870 industry codes
- industry category selection – system software: 7083 documents (4722 training samples, 2361 testing samples)
- attributes (features)
- 68 classes (topics)
- binary classification: topic c152 (only 2096 of the 7083 documents)
Feature extraction
- frequency vector (term frequencies)
- stop-word removal
- stemming
- threshold to prune the large frequency vector
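The extraction steps above (term frequency, stop-word removal, stemming, frequency threshold) can be sketched as follows; the stop-word list and the suffix stripper are illustrative stand-ins, not the tools the report actually used:

```python
import re
from collections import Counter

# Illustrative stop-word list; the report used a full list (not shown on the slide).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "for"}

def crude_stem(word):
    # Very rough stand-in for a real stemmer such as Porter's.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def term_frequencies(text, min_count=1):
    """Tokenize, drop stop words, stem, and keep terms at or above a threshold."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = [crude_stem(t) for t in tokens if t not in STOPWORDS]
    counts = Counter(stems)
    return {term: n for term, n in counts.items() if n >= min_count}
```

Raising `min_count` is what keeps the resulting frequency vector from growing too large.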
Feature selection
- Information Gain
- SVM feature selection: linear kernel – weight vector
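A minimal sketch of Information Gain for a single term, assuming presence/absence features; the report's exact formulation may differ:

```python
import math

def entropy(probs):
    # Shannon entropy in bits; zero-probability outcomes contribute nothing.
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_gain(docs, labels, term):
    """IG of a term: H(class) - H(class | term present/absent).

    docs: list of token sets; labels: parallel list of class labels.
    """
    n = len(docs)
    classes = set(labels)
    h_class = entropy([labels.count(c) / n for c in classes])
    h_cond = 0.0
    for present in (True, False):
        subset = [labels[i] for i, d in enumerate(docs) if (term in d) == present]
        if subset:
            h_cond += (len(subset) / n) * entropy(
                [subset.count(c) / len(subset) for c in classes]
            )
    return h_class - h_cond
```

Terms are then ranked by IG and only the top-scoring ones are kept.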
Support Vector Machine – binary classification
- optimal hyperplane
- higher-dimensional feature space
- primal optimization problem
- dual optimization problem – Lagrange multipliers
- Karush-Kuhn-Tucker conditions
- support vectors
- kernel trick
- decision function
Optimal hyperplane
[Figure: separating hyperplane {x | ⟨w,x⟩ + b = 0} with margin boundaries {x | ⟨w,x⟩ + b = −1} and {x | ⟨w,x⟩ + b = +1}; points labeled y_i = +1 and y_i = −1 on either side; w is the normal vector and the margin lies between the two boundaries.]
Higher-dimensional feature space
Primal optimization problem; Lagrange formulation; dual optimization problem (maximize the dual objective, subject to its constraints)
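The formulas on this slide did not survive extraction; assuming the standard separable (hard-margin) setting, they read:

```latex
% Primal optimization problem
\min_{w,\,b}\ \frac{1}{2}\,\lVert w\rVert^{2}
\quad\text{subject to}\quad
y_i\,\bigl(\langle w, x_i\rangle + b\bigr) \ge 1,\qquad i = 1,\dots,n

% Dual optimization problem (Lagrange multipliers \alpha_i)
\max_{\alpha}\ W(\alpha) = \sum_{i=1}^{n}\alpha_i
- \frac{1}{2}\sum_{i,j=1}^{n}\alpha_i\,\alpha_j\,y_i\,y_j\,\langle x_i, x_j\rangle
\quad\text{subject to}\quad
\alpha_i \ge 0,\qquad \sum_{i=1}^{n}\alpha_i\, y_i = 0
```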
SVM – characteristics
- Karush-Kuhn-Tucker (KKT) conditions: at the saddle point only the non-zero Lagrange multipliers matter
- support vectors: the patterns x_i whose Lagrange multipliers are non-zero
- kernel trick: positive definite kernel
- decision function
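The slide's formulas are missing; the standard statements of these conditions are:

```latex
% KKT complementarity at the saddle point
\alpha_i \bigl[\, y_i\,(\langle w, x_i\rangle + b) - 1 \,\bigr] = 0,
\qquad i = 1,\dots,n

% Support vectors: the patterns x_i with \alpha_i > 0

% Kernel trick: a positive definite kernel replaces the inner product
k(x, x') = \langle \Phi(x), \Phi(x') \rangle

% Decision function
f(x) = \operatorname{sgn}\!\Bigl(\sum_{i \in SV} \alpha_i\, y_i\, k(x_i, x) + b\Bigr)
```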
Multi-class classification Separate one class versus the rest
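The one-versus-the-rest scheme can be sketched as below; `CentroidScorer` is a hypothetical stand-in for a trained binary SVM, used only to keep the example self-contained:

```python
import numpy as np

class CentroidScorer:
    """Stand-in for a trained binary SVM: scores a point by (negative)
    distance to the centroid of the positive examples. Illustrative only."""
    def __init__(self, X, targets):
        pos = [x for x, t in zip(X, targets) if t == 1]
        self.center = np.mean(pos, axis=0)

    def decision_value(self, x):
        return -np.linalg.norm(np.asarray(x, dtype=float) - self.center)

class OneVsRest:
    """One class versus the rest: train one binary scorer per class and
    predict the class whose scorer returns the largest decision value."""
    def __init__(self, make_binary_scorer):
        self.make_binary_scorer = make_binary_scorer
        self.models = {}

    def fit(self, X, y):
        for c in set(y):
            # Relabel: +1 for the current class, -1 for all the rest.
            targets = [1 if label == c else -1 for label in y]
            self.models[c] = self.make_binary_scorer(X, targets)
        return self

    def predict(self, x):
        return max(self.models, key=lambda c: self.models[c].decision_value(x))
```

With a real SVM trainer plugged in, `fit` builds one binary classifier per topic, exactly as the one-versus-rest scheme requires.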
Clustering – characteristics
- data mapped into a higher-dimensional space
- search for the minimal enclosing sphere
- primal optimization problem
- dual optimization problem
- Karush-Kuhn-Tucker conditions
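The optimization problems were not preserved in extraction; in the usual minimal-enclosing-sphere formulation (center a, radius R, slacks ξ_i) they are:

```latex
% Primal: smallest sphere in feature space containing the mapped data
\min_{R,\,a,\,\xi}\ R^{2} + C\sum_{i}\xi_i
\quad\text{subject to}\quad
\lVert \Phi(x_i) - a \rVert^{2} \le R^{2} + \xi_i,\qquad \xi_i \ge 0

% Dual (multipliers \beta_i); KKT puts the support vectors on the sphere
\max_{\beta}\ \sum_{i}\beta_i\, k(x_i, x_i)
- \sum_{i,j}\beta_i\,\beta_j\, k(x_i, x_j)
\quad\text{subject to}\quad
0 \le \beta_i \le C,\qquad \sum_{i}\beta_i = 1
```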
SMO characteristics
Only two parameters are updated at each step (the minimal-size update). Benefits:
- doesn't need any extra matrix storage
- doesn't need a numerical QP optimization step
- needs more iterations to converge, but only a few operations at each step, which leads to an overall speed-up
Components:
- analytic method to solve the problem for two Lagrange multipliers
- heuristics for choosing the points
SMO – components
Analytic method; heuristics for choosing the points:
- choice of the 1st point (x_1, α_1): find KKT violations
- choice of the 2nd point (x_2, α_2): update the pair α_1, α_2 that causes a large change, which in turn results in a large increase of the dual objective – maximize the quantity |E_1 − E_2|
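The slide shows only headings; the analytic two-multiplier update that SMO applies at each step (Platt's formulas, restated here) is:

```latex
% Errors on the two chosen points
E_i = f(x_i) - y_i

% Curvature along the constraint line
\eta = k(x_1, x_1) + k(x_2, x_2) - 2\,k(x_1, x_2)

% Unconstrained optimum for the second multiplier, then clipping to [L, H]
\alpha_2^{\text{new}} = \alpha_2 + \frac{y_2\,(E_1 - E_2)}{\eta},
\qquad
\alpha_2^{\text{clipped}} = \min\bigl(H,\ \max(L,\ \alpha_2^{\text{new}})\bigr)

% First multiplier updated to keep \sum_i \alpha_i y_i = 0
\alpha_1^{\text{new}} = \alpha_1 + y_1 y_2\,\bigl(\alpha_2 - \alpha_2^{\text{clipped}}\bigr)
```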
Probabilistic outputs
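This slide's formula is missing; the usual way to turn the raw SVM output f(x) into a probability is Platt's sigmoid, with parameters A and B fitted by maximum likelihood on held-out data:

```latex
P\bigl(y = 1 \mid f(x)\bigr) = \frac{1}{1 + \exp\bigl(A\,f(x) + B\bigr)}
```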
Feature selection using SVM
- linear kernel, primal optimization form
- keep only the features whose weight in the learned w vector is greater than a threshold
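A sketch of the selection rule, assuming the learned weight vector w is available and that "greater than a threshold" refers to the weight's magnitude:

```python
import numpy as np

def select_features_by_weight(w, threshold):
    """Keep the features whose weight magnitude in the learned linear-SVM
    weight vector w exceeds the threshold; returns the kept indices."""
    w = np.asarray(w)
    return np.flatnonzero(np.abs(w) > threshold)
```

The retained indices define the reduced feature vector used in the later experiments.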
Kernels used: polynomial kernel, Gaussian kernel
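The kernel formulas themselves are not reproduced on the slide; generic forms with the parameters the experiments vary (degree d and a bias term for the polynomial kernel, a width parameter C for the Gaussian) might look like:

```python
import numpy as np

def polynomial_kernel(x, y, degree, bias=1.0):
    """Generic polynomial kernel (x.y + bias)^degree; the report studies
    the influence of the bias, but its exact parameterization is not shown."""
    return (np.dot(x, y) + bias) ** degree

def gaussian_kernel(x, y, C):
    """Gaussian (RBF) kernel; here C plays the role of the width parameter
    (an assumption -- the slide only names the constant C = 2.7)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-np.dot(diff, diff) / C)
```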
Data representation
- binary: values 0 and 1
- nominal
- Cornell SMART
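The formulas for these representations are not on the slide; a common formulation (treat the exact normalizations as assumptions) can be sketched as:

```python
import math

def binary_rep(tf):
    # Binary representation: 1 if the term occurs in the document, else 0.
    return 1 if tf > 0 else 0

def nominal_rep(tf, max_tf):
    # Nominal: term frequency normalized by the document's largest frequency.
    return tf / max_tf if max_tf else 0.0

def cornell_smart_rep(tf):
    # Cornell SMART weighting: 0 for absent terms, else 1 + log(1 + log(tf)).
    return 0.0 if tf == 0 else 1.0 + math.log(1.0 + math.log(tf))
```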
Binary classification
[Chart: results by kernel degree d for the binary, nominal and Cornell SMART representations]
Binary classification
[Chart: results by kernel degree d for the binary, nominal and Cornell SMART representations]
Influence of vector size Polynomial kernel
Influence of vector size Gaussian kernel
Polynomial kernel IG versus SVM – 427 features
Gaussian kernel IG versus SVM – 427 features
LibSvm versus UseSvm Polynomial kernel
LibSvm versus UseSvm Gaussian kernel
Multiclass classification Polynomial kernel features
Multiclass classification Gaussian kernel 2488 features
Clustering using SVM (values in percent; rows υ, columns number of features)
υ = 0.01: 0.6%, 0.7%, 0.6%
υ = 0.1: 0.5%
υ = 0.5: 25.2%, 25.1%
Conclusions – best results
- polynomial kernel and nominal representation (degrees 5 and 6)
- Gaussian kernel and Cornell SMART (C = 2.7)
- reduced number of support vectors for the polynomial kernel compared with the Gaussian kernel (24.41% versus 37.78%)
- number of features between 6% (1309) and 10% (2488)
- multiclass results follow the binary classification
- clustering has a smaller number of support vectors
- clustering follows binary classification
Further work
- Feature extraction and selection:
  - association rules between words (Mutual Information)
  - the synonymy and polysemy problem
  - better implementation of SVM with linear kernel
  - using families of words (WordNet)
  - SVM with kernel degree greater than 1
- Classification and clustering:
  - using classification and clustering together
Influence of bias – Pol. kernel
Influence of bias – RBF kernel