Classifying and clustering using Support Vector Machine
2nd PhD report
PhD title: Data mining in unstructured data
Daniel I. MORARIU, MSc
PhD Supervisor: Lucian N. VINŢAN
Sibiu, 2005

Contents
Classification (clustering) steps
Reuters Database processing
Feature extraction and selection
Information Gain
Support Vector Machine
Binary classification
Multiclass classification
Clustering
Sequential Minimal Optimization (SMO)
Probabilistic outputs
Experiments & results
Binary classification. Aspects and results.
Feature subset selection. A comparative approach.
Multiclass classification. Quantitative aspects.
Clustering. Quantitative aspects.
Conclusions and further work

Classifying (clustering) steps
Text mining (feature extraction)
Feature selection
Classifying or clustering
Testing results

Reuters Database Processing
806791 total documents, 126 topics, 366 regions, 870 industry codes
Industry category selection: "system software"
7083 documents
4722 training samples
2361 testing samples
19038 attributes (features)
68 classes (topics)
Binary classification: topic c152 (only 2096 of the 7083 documents)

Features extraction (diagram labels): frequency vector; term frequency; stopwords; stemming; threshold; large frequency vector
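
The extraction steps named on this slide can be sketched roughly as follows. This is a minimal illustration, not the report's implementation: the stopword list, the crude `naive_stem` suffix stripper (standing in for a real stemmer) and the `min_count` threshold are all assumptions introduced here.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list; a real run would use a complete one.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}

def naive_stem(word):
    """Crude suffix stripping, standing in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def frequency_vector(text, min_count=1):
    """Map a document to {stem: count}, dropping stopwords and rare stems."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = [naive_stem(t) for t in tokens if t not in STOPWORDS]
    counts = Counter(stems)
    # Threshold step: keep only stems that occur at least min_count times.
    return {term: n for term, n in counts.items() if n >= min_count}

doc = "The system is processing documents and the processed documents"
vec = frequency_vector(doc)
```

Merging the per-document vectors over a shared vocabulary would give the large frequency vector the slide refers to.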

Features selection (diagram labels): Information Gain; SVM features selection (linear kernel, weight vector)
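
As an illustration of the Information Gain criterion, the sketch below scores a single term by how much knowing its presence or absence reduces the entropy of the class labels. It assumes a binary present/absent feature; the function names are hypothetical and the report may have computed the quantity differently.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(present, labels):
    """Entropy of the labels minus their entropy conditioned on term presence."""
    with_term = [y for p, y in zip(present, labels) if p]
    without_term = [y for p, y in zip(present, labels) if not p]
    n = len(labels)
    conditional = ((len(with_term) / n) * entropy(with_term)
                   + (len(without_term) / n) * entropy(without_term))
    return entropy(labels) - conditional

labels = ["a", "a", "b", "b"]
perfect = information_gain([1, 1, 0, 0], labels)   # term separates the classes
useless = information_gain([1, 0, 1, 0], labels)   # term carries no information
```

Features would then be ranked by this score and only the highest-scoring ones kept.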

Support Vector Machine: binary classification
Optimal hyperplane
Higher-dimensional feature space
Primal optimization problem
Dual optimization problem (Lagrange multipliers)
Karush-Kuhn-Tucker conditions
Support vectors
Kernel trick
Decision function

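Optimal hyperplane (figure): axes X1 and X2; classes y_i = +1 and y_i = -1; separating hyperplane {x | <w, x> + b = 0}; margin hyperplanes {x | <w, x> + b = -1} and {x | <w, x> + b = +1}; normal vector w; margin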

Higher-dimensional feature space

Primal optimization problem Dual optimization problem Maximize: subject to: Lagrange formulation
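
The formulas on this slide did not survive in the transcript. For orientation, the standard textbook hard-margin formulation is given below; it is an assumption that the slide used this form and this notation (m training pairs (x_i, y_i), multipliers α_i).

```latex
% Primal problem: maximise the margin 2 / ||w||
\min_{w,\,b}\; \tfrac{1}{2}\lVert w\rVert^{2}
\quad\text{subject to}\quad
y_i\bigl(\langle w, x_i\rangle + b\bigr) \ge 1,\qquad i = 1,\dots,m

% Lagrange formulation, with multipliers \alpha_i \ge 0
L(w, b, \alpha) = \tfrac{1}{2}\lVert w\rVert^{2}
  - \sum_{i=1}^{m} \alpha_i \Bigl( y_i\bigl(\langle w, x_i\rangle + b\bigr) - 1 \Bigr)

% Dual problem
\max_{\alpha}\; W(\alpha) = \sum_{i=1}^{m} \alpha_i
  - \tfrac{1}{2} \sum_{i=1}^{m}\sum_{j=1}^{m}
    \alpha_i \alpha_j y_i y_j \langle x_i, x_j\rangle
\quad\text{subject to}\quad
\alpha_i \ge 0,\qquad \sum_{i=1}^{m} \alpha_i y_i = 0
```

Setting the derivatives of L with respect to w and b to zero gives w = Σ α_i y_i x_i and Σ α_i y_i = 0, which is how the dual follows from the Lagrangian.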

SVM characteristics
Karush-Kuhn-Tucker (KKT) conditions: only the Lagrange multipliers that are non-zero at the saddle point contribute to the solution
Support vectors: the patterns x_i for which the Lagrange multiplier is non-zero
Kernel trick: positive definite kernel
Decision function
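
A small sketch of how the decision function uses only the support vectors: the sign of a kernel-weighted sum over them plus a bias. The function names, and the toy multipliers in the usage lines, are illustrative assumptions rather than values from the report.

```python
def linear_kernel(x, z):
    """Plain dot product; any positive definite kernel could be passed instead."""
    return sum(a * b for a, b in zip(x, z))

def decision_function(x, support_vectors, alphas, labels, b, kernel=linear_kernel):
    """Return +1 or -1 from sign(sum_i alpha_i * y_i * k(x_i, x) + b)."""
    score = sum(a * y * kernel(sv, x)
                for sv, a, y in zip(support_vectors, alphas, labels)) + b
    return 1 if score >= 0 else -1

# Toy model: two support vectors on either side of the origin.
svs = [(1.0, 1.0), (-1.0, -1.0)]
alphas = [0.25, 0.25]
ys = [1, -1]
pos = decision_function((2.0, 0.0), svs, alphas, ys, b=0.0)
neg = decision_function((-3.0, 1.0), svs, alphas, ys, b=0.0)
```

Training points whose multiplier is zero never enter the sum, which is why only the support vectors need to be stored.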

Multi-class classification Separate one class versus the rest
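
One way to read "one class versus the rest" is sketched below: relabel the data once per class for binary training, then assign a new sample to the class whose binary scorer is most confident. Picking the maximum score is an assumption made here, and both function names are hypothetical.

```python
def one_vs_rest_labels(labels, positive_class):
    """Relabel a multiclass problem as +1 for one class and -1 for all others."""
    return [1 if y == positive_class else -1 for y in labels]

def predict_one_vs_rest(x, scorers):
    """scorers maps class name -> function returning a real-valued score for x."""
    return max(scorers, key=lambda c: scorers[c](x))

topics = ["c15", "c22", "c15", "c31"]
binary_labels = one_vs_rest_labels(topics, "c15")

# Stand-in scorers; in practice each would be a trained binary SVM.
scorers = {
    "c15": lambda x: x[0] - x[1],
    "c22": lambda x: x[1] - x[0],
}
winner = predict_one_vs_rest((3.0, 1.0), scorers)
```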

Clustering characteristics
Data are mapped into a higher-dimensional space
Search for the minimal enclosing sphere
Primal optimisation problem
Dual optimisation problem
Karush-Kuhn-Tucker conditions
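
The transcript does not preserve the formulas for this slide. The minimal-enclosing-sphere problem is commonly written as below; the notation (centre a, radius R, slack variables ξ_j, multipliers β_j and μ_j, penalty C) is an assumption and may differ from the report's, which appears to parametrise the problem by ν.

```latex
% Primal: smallest sphere enclosing the mapped points \Phi(x_j)
\min_{R,\,a,\,\xi}\; R^{2} + C \sum_{j} \xi_j
\quad\text{subject to}\quad
\lVert \Phi(x_j) - a \rVert^{2} \le R^{2} + \xi_j,\qquad \xi_j \ge 0

% Dual, with kernel K(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle
\max_{\beta}\; W(\beta) = \sum_{j} \beta_j K(x_j, x_j)
  - \sum_{i}\sum_{j} \beta_i \beta_j K(x_i, x_j)
\quad\text{subject to}\quad
0 \le \beta_j \le C,\qquad \sum_{j} \beta_j = 1

% KKT complementarity (\mu_j is the multiplier of \xi_j \ge 0)
\xi_j \mu_j = 0,\qquad
\bigl( R^{2} + \xi_j - \lVert \Phi(x_j) - a \rVert^{2} \bigr)\,\beta_j = 0
```

Points with 0 < β_j < C lie on the sphere surface and play the role of support vectors.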

SMO characteristics
Only two parameters are updated at each step (the minimal size of update).
Benefits:
does not need any extra matrix storage
does not need a numerical QP optimization step
needs more iterations to converge, but only a few operations at each step, which leads to an overall speed-up
Components:
analytic method to solve the problem for two Lagrange multipliers
heuristics for choosing the points

SMO components
Analytic method
Heuristics for choosing the points
Choice of 1st point (x_1 / α_1): find KKT violations
Choice of 2nd point (x_2 / α_2): update α_1, α_2 so as to cause a large change, which in turn results in a large increase of the dual objective; maximize the quantity |E_1 - E_2|
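
The two components can be sketched as follows, using the update rule from Platt's published description of SMO. Whether the report's implementation follows it exactly is an assumption; the function and parameter names are hypothetical. Here E_i is the prediction error f(x_i) - y_i, k11, k12 and k22 are kernel values, and c is the box constraint.

```python
def choose_second(i1, errors):
    """Second-choice heuristic: pick the index maximising |E_1 - E_2|."""
    candidates = (i for i in range(len(errors)) if i != i1)
    return max(candidates, key=lambda i: abs(errors[i1] - errors[i]))

def smo_pair_update(a1, a2, y1, y2, e1, e2, k11, k12, k22, c):
    """Analytically optimise two multipliers, keeping sum(alpha_i * y_i) fixed."""
    # Feasible segment for the new alpha_2 inside the box [0, c].
    if y1 != y2:
        low, high = max(0.0, a2 - a1), min(c, c + a2 - a1)
    else:
        low, high = max(0.0, a1 + a2 - c), min(c, a1 + a2)
    eta = k11 + k22 - 2.0 * k12  # curvature along the constraint line
    if eta <= 0 or low >= high:
        return a1, a2  # no progress possible for this pair in this sketch
    a2_new = a2 + y2 * (e1 - e2) / eta
    a2_new = min(high, max(low, a2_new))      # clip to the feasible segment
    a1_new = a1 + y1 * y2 * (a2 - a2_new)     # restore the equality constraint
    return a1_new, a2_new

# Toy problem: x1 = 1 (y = +1), x2 = -1 (y = -1), linear kernel, all alphas 0.
new_a1, new_a2 = smo_pair_update(0.0, 0.0, 1, -1, -1.0, 1.0,
                                 k11=1.0, k12=-1.0, k22=1.0, c=10.0)
second = choose_second(0, [-1.0, 0.2, 1.0])
```

In the toy problem a single step already reaches the optimum, since the resulting weight is w = 1 and both points sit exactly on the margin.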

Probabilistic outputs
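
The slide does not say how probabilities were obtained. One widely used approach, shown here purely as an assumption, is to pass the raw SVM output f(x) through a sigmoid with two fitted parameters (Platt scaling). The parameter values in the usage lines are made up for illustration.

```python
import math

def sigmoid_probability(f_value, a, b):
    """Estimate P(y = +1 | f) as 1 / (1 + exp(a * f + b)).

    a and b would normally be fitted on held-out data; a is typically
    negative so that larger SVM outputs give larger probabilities.
    """
    return 1.0 / (1.0 + math.exp(a * f_value + b))

on_boundary = sigmoid_probability(0.0, a=-1.0, b=0.0)
confident = sigmoid_probability(2.0, a=-1.0, b=0.0)
```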

Features selection using SVM
Linear kernel
Primal optimisation form
Keep only the features whose weight in the learned w vector is greater than a threshold
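
A minimal sketch of the selection rule, assuming a weight vector w has already been learned by a linear-kernel SVM. Comparing the absolute value of each weight with the threshold is an assumption, since the slide does not say whether the sign is taken into account; the names are hypothetical.

```python
def select_by_weight(weights, feature_names, threshold):
    """Keep the features whose learned weight magnitude exceeds the threshold."""
    return [name for name, w in zip(feature_names, weights)
            if abs(w) > threshold]

# Made-up weights for four stems.
names = ["system", "process", "document", "report"]
w = [0.80, -0.45, 0.02, 0.00]
kept = select_by_weight(w, names, threshold=0.1)
```

Raising the threshold shrinks the feature set, which is how different vector sizes can be produced from one trained model.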

Kernels used: polynomial kernel; Gaussian kernel
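
The kernel formulas are missing from the transcript. Generic forms of the two kernels are sketched below; the exact parametrisation used in the report (how the bias and the Gaussian constant enter) is not recoverable here, so `degree`, `bias` and `width` are assumed names.

```python
import math

def dot(x, z):
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, degree=2, bias=1.0):
    """(<x, z> + bias) ** degree"""
    return (dot(x, z) + bias) ** degree

def gaussian_kernel(x, z, width=1.0):
    """exp(-||x - z||^2 / width); width is a positive scale constant."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / width)

k_poly = polynomial_kernel((1.0, 2.0), (3.0, 4.0), degree=2, bias=1.0)
k_same = gaussian_kernel((1.0, 2.0), (1.0, 2.0))
k_far = gaussian_kernel((0.0, 0.0), (1.0, 1.0), width=2.0)
```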

Data representation: Binary (using values 0 and 1); Nominal; Cornell SMART
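
Only the names of the three representations appear on the slide. The sketch below uses the definitions commonly attached to those names in the text-mining literature: presence/absence, counts divided by the largest count in the document, and a double-logarithm damping. That the report used exactly these formulas is an assumption.

```python
import math

def binary_rep(counts):
    """1 if the term occurs in the document, 0 otherwise."""
    return [1 if n > 0 else 0 for n in counts]

def nominal_rep(counts):
    """Counts divided by the largest count in the document."""
    largest = max(counts, default=0)
    return [n / largest if largest else 0.0 for n in counts]

def cornell_smart_rep(counts):
    """0 for absent terms, otherwise 1 + log(1 + log(n))."""
    return [0.0 if n == 0 else 1.0 + math.log(1.0 + math.log(n))
            for n in counts]

counts = [0, 1, 4]  # raw term frequencies for one document
b = binary_rep(counts)
n = nominal_rep(counts)
s = cornell_smart_rep(counts)
```

The damping in the third form keeps very frequent terms from dominating the vector.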

Binary classification - 63 d - kernels
degree           1      2      3      4      5      6      7      10
Binary           40.13  64.78  66.54  27.23  46.54  71.62  56.95  55.19
Nominal          38.96  62.65  67.93  82.03  16.62  11.95  83.99  64.08
Cornell SMART    40.24  63.32  62.41  14.41   7.78  49.72  68.27  49.65

Binary classification - 7999 d - kernels
degree           1      2      3      4      5      6      7      10
Binary           35.77  41.74  61.88  77.64  69.21  81.87  10.95  35.77
Nominal          56.69  26.83  28.06  28.27  29.14  41.38  36.19  34.05
Cornell SMART    50.44  35.28  41.17  59.28  79.82  81.81  82.32  17.85

Influence of vector size Polynomial kernel

Influence of vector size Gaussian kernel

Polynomial kernel IG versus SVM – 427 features

Gaussian kernel IG versus SVM – 427 features

LibSvm versus UseSvm - 2493 Polynomial kernel

LibSvm versus UseSvm - 2493 Gaussian kernel

Multiclass classification Polynomial kernel - 2488 features

Multiclass classification Gaussian kernel 2488 features

Clustering using SVM
ν \ # features: 41; 63; 1309; 2111
ν = 0,01: 0,6%; 0,7%; 0,6%
ν = 0,1: 0,5%
ν = 0,5: 25,2%; 25,1%

Conclusions – best results
Polynomial kernel and nominal representation (degree 5 and 6)
Gaussian kernel and Cornell SMART (C = 2.7)
Reduced number of support vectors for the polynomial kernel in comparison with the Gaussian kernel (24.41% versus 37.78%)
Number of features between 6% (1309) and 10% (2488)
Multiclass follows the binary classification
Clustering has a smaller number of support vectors
Clustering follows binary classification

Further work
Features extraction and selection
Association rules between words (Mutual Information)
Synonymy and polysemy problem
Better implementation of SVM with linear kernel
Using families of words (WordNet)
SVM with kernel degree greater than 1
Classification and clustering
Using classification and clustering together

Influence of bias – Pol. kernel

Influence of bias – RBF kernel
