
1
Classifying and Clustering using Support Vector Machines
2nd PhD report
PhD title: Data mining in unstructured data
Daniel I. MORARIU, MSc
PhD Supervisor: Lucian N. VINŢAN
Sibiu, 2005

2
Contents
- Classification (clustering) steps
- Reuters Database processing
- Feature extraction and selection
  - Information Gain
- Support Vector Machine
  - Binary classification
  - Multiclass classification
  - Clustering
  - Sequential Minimal Optimization (SMO)
  - Probabilistic outputs
- Experiments & results
  - Binary classification: aspects and results
  - Feature subset selection: a comparative approach
  - Multiclass classification: quantitative aspects
  - Clustering: quantitative aspects
- Conclusions and further work

3
Classifying (clustering) steps
- Text mining – feature extraction
- Feature selection
- Classifying or clustering
- Testing results

4
Reuters Database Processing
- 806,791 total documents; 126 topics, 366 regions, 870 industry codes
- Industry category selection – system software: 7,083 documents
  - 4,722 training samples, 2,361 testing samples
  - 19,038 attributes (features), 68 classes (topics)
- Binary classification: topic c152 (only 2,096 of the 7,083 documents)

5
Feature extraction
- Frequency vector: term frequencies
- Stopword removal
- Stemming
- Threshold applied to the large frequency vector
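The extraction steps above can be sketched in Python; the tiny stopword list, the crude suffix-stripping stemmer, and the frequency threshold are illustrative stand-ins for the full stopword lists and stemming used in the thesis:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "in", "to", "is"}  # tiny illustrative list

def extract_features(documents, min_freq=2):
    """Build a reduced frequency vector: tokenize, drop stopwords,
    strip a few common suffixes (a stand-in for proper stemming),
    and drop rare terms via a frequency threshold."""
    counts = Counter()
    for doc in documents:
        for token in re.findall(r"[a-z]+", doc.lower()):
            if token in STOPWORDS:
                continue
            for suffix in ("ing", "ed", "s"):
                if token.endswith(suffix) and len(token) > len(suffix) + 2:
                    token = token[: -len(suffix)]
                    break
            counts[token] += 1
    # threshold: keep only terms at or above the minimum frequency
    return {term: freq for term, freq in counts.items() if freq >= min_freq}
```

For example, `extract_features(["The system software runs", "Software systems running the system"])` keeps only the frequent stems `system` and `software`.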

6
Feature selection
- Information Gain
- SVM feature selection: linear kernel – weight vector
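Information Gain for a term is the drop in class entropy once the term's presence or absence is known; a minimal sketch (function names are mine, not from the slides):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(C) of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(term_present, labels):
    """IG(T) = H(C) - sum over v in {present, absent} of P(T=v) * H(C | T=v)."""
    total = entropy(labels)
    cond = 0.0
    for value in (True, False):
        subset = [y for x, y in zip(term_present, labels) if x == value]
        if subset:
            cond += len(subset) / len(labels) * entropy(subset)
    return total - cond
```

A term that perfectly splits two balanced classes gets the maximal gain of 1 bit; terms scoring near zero are candidates for removal.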


8
Support Vector Machine – binary classification
- Optimal hyperplane
- Higher-dimensional feature space
- Primal optimization problem
- Dual optimization problem – Lagrange multipliers
- Karush-Kuhn-Tucker conditions
- Support vectors
- Kernel trick
- Decision function

9
Optimal hyperplane (figure): the separating hyperplane {x | <w,x> + b = 0} between the classes y_i = +1 and y_i = -1, with the margin bounded by {x | <w,x> + b = +1} and {x | <w,x> + b = -1}; w is the normal vector of the hyperplane.

10
Higher-dimensional feature space

11
Lagrange formulation
Primal optimization problem: minimize (1/2)||w||^2 subject to y_i(<w, x_i> + b) >= 1.
Dual optimization problem – maximize:
  W(α) = Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j <x_i, x_j>
subject to: α_i >= 0 and Σ_i α_i y_i = 0.

12
SVM – characteristics
- Karush-Kuhn-Tucker (KKT) conditions: only the Lagrange multipliers that are non-zero at the saddle point contribute
- Support vectors: the patterns x_i for which α_i > 0
- Kernel trick: positive definite kernel K(x_i, x_j) = <Φ(x_i), Φ(x_j)>
- Decision function: f(x) = sign(Σ_i α_i y_i K(x_i, x) + b)
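The decision function sums only over the support vectors; a sketch (the linear kernel and the toy support vectors in the usage note are illustrative assumptions):

```python
def linear_kernel(x, z):
    """K(x, z) = <x, z> (plain dot product)."""
    return sum(a * b for a, b in zip(x, z))

def decision(x, svs, alphas, ys, b, kernel=linear_kernel):
    """f(x) = sign( sum_i alpha_i * y_i * K(x_i, x) + b ),
    summing only over the support vectors (non-zero alpha_i)."""
    s = sum(a * y * kernel(sv, x) for sv, a, y in zip(svs, alphas, ys)) + b
    return 1 if s >= 0 else -1
```

With support vectors (1, 0) and (-1, 0), alphas (1, 1), labels (+1, -1) and b = 0, the point (2, 0) falls on the positive side and (-2, 0) on the negative side.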

13
Multi-class classification: separate one class versus the rest (one-vs-rest).
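One-vs-rest trains one binary classifier per class and predicts the class whose classifier returns the largest decision value. A minimal sketch, using a perceptron as a stand-in for the binary SVM of the slides:

```python
def train_perceptron(X, y, epochs=200):
    """Binary linear classifier; y entries are +1 / -1. The last weight is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for x, t in zip(X, y):
            z = list(x) + [1.0]
            if t * sum(wi * xi for wi, xi in zip(w, z)) <= 0:  # misclassified
                for i, xi in enumerate(z):
                    w[i] += t * xi
    return w

def one_vs_rest_fit(X, y):
    """Train one binary model per class: that class = +1, all others = -1."""
    return {c: train_perceptron(X, [1 if t == c else -1 for t in y])
            for c in set(y)}

def one_vs_rest_predict(models, x):
    """Predict the class whose binary model gives the largest decision value."""
    z = list(x) + [1.0]
    return max(models, key=lambda c: sum(wi * xi for wi, xi in zip(models[c], z)))
```

On a separable toy set the scheme recovers each class from its own classifier's positive score.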

14
Clustering
Characteristics:
- map the data into a higher-dimensional space
- search for the minimal enclosing sphere
- primal optimization problem
- dual optimization problem
- Karush-Kuhn-Tucker conditions
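The minimal-enclosing-sphere test needs only kernel evaluations: the squared distance from a mapped point Φ(x) to the sphere centre a = Σ_i β_i Φ(x_i) expands into three kernel terms. A sketch (the dot-product kernel and the coefficients in the test are illustrative):

```python
def dot_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))

def dist2_to_center(x, svs, betas, kernel=dot_kernel):
    """||Phi(x) - a||^2 = K(x,x) - 2 sum_i beta_i K(x_i,x)
                          + sum_i sum_j beta_i beta_j K(x_i,x_j)."""
    k_xx = kernel(x, x)
    cross = sum(b * kernel(s, x) for b, s in zip(betas, svs))
    inner = sum(bi * bj * kernel(si, sj)
                for bi, si in zip(betas, svs)
                for bj, sj in zip(betas, svs))
    return k_xx - 2 * cross + inner
```

A point belongs to the cluster when this distance does not exceed the sphere radius; with the dot-product kernel, support vectors (0,0) and (2,0) and equal betas, the centre is (1,0), so (1,0) has distance 0 and (3,0) has squared distance 4.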


16
SMO characteristics
Only two parameters are updated at a time (minimal size of the working set). Benefits:
- doesn't need any extra matrix storage
- doesn't need a numerical QP optimization step
- needs more iterations to converge, but only a few operations at each step, which leads to an overall speed-up
Components:
- an analytic method to solve the problem for two Lagrange multipliers
- heuristics for choosing the points

17
SMO – components
Analytic method and heuristics for choosing the points:
- Choice of 1st point (x_1, α_1): find KKT violations
- Choice of 2nd point (x_2, α_2): update the pair α_1, α_2 that causes a large change, which in turn results in a large increase of the dual objective; maximize the quantity |E_1 − E_2|
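The analytic step for two Lagrange multipliers can be sketched as follows (variable names follow Platt's SMO description; this sketch omits the threshold update and the full point-selection heuristics):

```python
def smo_step(a1, a2, y1, y2, E1, E2, K11, K12, K22, C):
    """One analytic SMO update of the pair (a1, a2); E_i = f(x_i) - y_i."""
    if y1 != y2:                       # box constraints for the pair
        L, H = max(0.0, a2 - a1), min(C, C + a2 - a1)
    else:
        L, H = max(0.0, a1 + a2 - C), min(C, a1 + a2)
    eta = K11 + K22 - 2 * K12          # curvature along the constraint line
    if eta <= 0 or L == H:
        return a1, a2                  # degenerate pair: skip in this sketch
    a2_new = min(max(a2 + y2 * (E1 - E2) / eta, L), H)   # clip to [L, H]
    a1_new = a1 + y1 * y2 * (a2 - a2_new)  # keeps sum_i a_i * y_i constant
    return a1_new, a2_new
```

Because a1 is moved by exactly y1*y2 times the change in a2, the equality constraint Σ_i α_i y_i = 0 is preserved by every step.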

18
Probabilistic outputs
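Platt's method maps the raw SVM decision value through a fitted sigmoid to get a class probability; a sketch, with illustrative values for the fitted parameters A and B:

```python
import math

def platt_probability(f, A=-1.0, B=0.0):
    """P(y = +1 | x) = 1 / (1 + exp(A*f + B)), where f is the SVM
    decision value; A and B are fitted on held-out data (values here
    are illustrative defaults, not fitted)."""
    return 1.0 / (1.0 + math.exp(A * f + B))
```

A decision value of 0 (on the hyperplane) maps to probability 0.5, and large positive or negative values saturate toward 1 or 0.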

19
Feature selection using SVM
- Linear kernel, primal optimization form
- Keep only the features whose weight in the learned w vector is greater than a threshold
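The thresholding rule can be sketched as follows (function and parameter names are mine):

```python
def select_by_weight(w, feature_names, threshold=0.1):
    """Keep only the features whose learned linear-kernel weight
    magnitude exceeds the threshold."""
    return [name for name, wi in zip(feature_names, w) if abs(wi) > threshold]
```

For instance, with weights (0.5, -0.01, 0.3) only the first and third features survive a threshold of 0.1.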


21
Kernels used
- Polynomial kernel
- Gaussian kernel
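The two kernels can be written as follows (parameter names and defaults are illustrative; the slides tune the degree of the polynomial kernel and the C parameter of the Gaussian one, e.g. C = 2.7):

```python
import math

def polynomial_kernel(x, z, degree=2, bias=1.0):
    """K(x, z) = (<x, z> + bias)^degree"""
    return (sum(a * b for a, b in zip(x, z)) + bias) ** degree

def gaussian_kernel(x, z, C=1.0):
    """K(x, z) = exp(-||x - z||^2 / C)"""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / C)
```

The Gaussian kernel equals 1 only when x = z and decays with distance; the polynomial kernel grows with the dot product raised to the chosen degree.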

22
Data representation
- Binary: values 0 and 1
- Nominal
- Connell SMART
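The three term-weighting schemes might look as follows; the slides do not show the exact formulas, so the nominal normalisation and the SMART-style damped frequency below are common variants, not necessarily the ones used in the thesis:

```python
import math

def binary_weight(tf):
    """1 if the term occurs in the document, else 0."""
    return 1 if tf > 0 else 0

def nominal_weight(tf, max_tf):
    """Term frequency normalised by the largest frequency in the document
    (one common 'nominal' variant; an assumption here)."""
    return tf / max_tf if max_tf else 0.0

def smart_weight(tf):
    """SMART-style damped frequency 1 + log(1 + log(tf)); one common
    choice, assumed rather than taken from the slides."""
    return 0.0 if tf == 0 else 1.0 + math.log(1.0 + math.log(tf))
```

All three map a raw term frequency to a bounded or slowly growing weight, which keeps long documents from dominating the representation.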

23
Binary classification – 63 features – results (%) by kernel degree

Degree          1      2      3      4      5      6      7      10
Binary          40.13  64.78  66.54  27.23  46.54  71.62  56.95  55.19
Nominal         38.96  62.65  67.93  82.03  16.62  11.95  83.99  64.08
CONNELL SMART   40.24  63.32  62.41  14.41   7.78  49.72  68.27  49.65

24
Binary classification – 7999 features – results (%) by kernel degree

Degree          1      2      3      4      5      6      7      10
Binary          35.77  41.74  61.88  77.64  69.21  81.87  10.95  35.77
Nominal         56.69  26.83  28.06  28.27  29.14  41.38  36.19  34.05
CONNELL SMART   50.44  35.28  41.17  59.28  79.82  81.81  82.32  17.85

25
Influence of vector size Polynomial kernel

26
Influence of vector size Gaussian kernel

27
Polynomial kernel IG versus SVM – 427 features

28
Gaussian kernel IG versus SVM – 427 features

29
LibSvm versus UseSvm - 2493 Polynomial kernel

30
LibSvm versus UseSvm - 2493 Gaussian kernel

31
Multiclass classification Polynomial kernel - 2488 features

32
Multiclass classification Gaussian kernel 2488 features

33
Clustering using SVM – results by ν and number of features

ν \ #features   41      63      1309    2111
0.01            0.6%    0.7%    0.6%
0.1             0.5%
0.5             25.2%   25.1%

34
Conclusions – best results
- Polynomial kernel with nominal representation (degrees 5 and 6)
- Gaussian kernel with Connell SMART (C = 2.7)
- Reduced number of support vectors for the polynomial kernel compared with the Gaussian kernel (24.41% versus 37.78%)
- Number of features between 6% (1309) and 10% (2488)
- Multiclass results follow the binary classification
- Clustering has a smaller number of support vectors
- Clustering results follow the binary classification

35
Further work
- Feature extraction and selection
  - Association rules between words (Mutual Information)
  - Synonymy and polysemy problems
  - Better implementation of SVM with linear kernel
  - Using families of words (WordNet)
  - SVM with kernel degree greater than 1
- Classification and clustering
  - Using classification and clustering together

36
Influence of bias – Pol. kernel

39
Influence of bias – RBF kernel
