
1 Classifying and clustering using Support Vector Machine. 2nd PhD report. PhD title: Data mining in unstructured data. Daniel I. MORARIU, MSc. PhD Supervisor: Lucian N. VINŢAN. Sibiu, 2005

2 Contents
- Classification (clustering) steps
- Reuters database processing
- Feature extraction and selection: Information Gain
- Support Vector Machine: binary classification, multiclass classification, clustering
- Sequential Minimal Optimization (SMO): probabilistic outputs
- Experiments & results: binary classification (aspects and results); feature subset selection (a comparative approach); multiclass classification (quantitative aspects); clustering (quantitative aspects)
- Conclusions and further work

3 Classification (clustering) steps
- Text mining: feature extraction
- Feature selection
- Classification or clustering
- Testing the results
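A minimal sketch of these four steps using scikit-learn. This is illustrative, not the implementation used in the report; the document lists (train_docs, train_labels, test_docs, test_labels) and the selected feature count are assumed for the example.

```python
# Illustrative sketch of the four steps; not the report's own SVM code.
# train_docs/train_labels and test_docs/test_labels are assumed to be
# lists of raw document strings and their topic labels.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

pipeline = Pipeline([
    ("extract", TfidfVectorizer(stop_words="english")),    # feature extraction
    ("select", SelectKBest(mutual_info_classif, k=1309)),  # feature selection
    ("classify", SVC(kernel="poly", degree=2)),            # classification
])
pipeline.fit(train_docs, train_labels)
print("accuracy:", pipeline.score(test_docs, test_labels))  # testing
```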

4 Reuters database processing
- 806,791 total documents; 126 topics, 366 regions, 870 industry codes
- Industry category selected: system software – 7,083 documents
- 4,722 training samples, 2,361 testing samples
- 19,038 attributes (features), 68 classes (topics)
- Binary classification: topic c152 (only 2,096 of the 7,083 documents)

5 Feature extraction
- Term frequency vector
- Stopword removal
- Stemming
- Frequency threshold (to trim the large frequency vector)
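A minimal sketch of this chain, assuming NLTK for the stopword list and Porter stemming; the regex tokenizer and the threshold value are illustrative assumptions.

```python
# Sketch of feature extraction: tokenize, drop stopwords, stem, and
# discard low-frequency terms. Requires nltk.download("stopwords").
import re
from collections import Counter
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
stop = set(stopwords.words("english"))
THRESHOLD = 5  # assumed cut-off; the report's exact value may differ

def term_frequencies(documents):
    counts = Counter()
    for doc in documents:
        for token in re.findall(r"[a-z]+", doc.lower()):
            if token not in stop:
                counts[stemmer.stem(token)] += 1
    # trim the large frequency vector with the threshold
    return {term: n for term, n in counts.items() if n >= THRESHOLD}
```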

6 Feature selection
- Information Gain
- SVM feature selection: linear kernel – weight vector
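Information Gain scores a term t by how much knowing its presence or absence reduces the entropy of the class labels: IG(t) = H(C) − P(t)·H(C|t present) − P(¬t)·H(C|t absent). A small sketch, assuming a boolean document-term matrix X and a label vector y (both numpy arrays):

```python
# Sketch of Information Gain over binary term presence/absence.
import numpy as np

def entropy(labels):
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(term_present, y):
    # IG(t) = H(C) - P(t) H(C|t present) - P(not t) H(C|t absent)
    p_t = term_present.mean()
    conditional = (p_t * entropy(y[term_present])
                   + (1.0 - p_t) * entropy(y[~term_present]))
    return entropy(y) - conditional

# rank all terms by gain and keep the top k:
# gains = [information_gain(X[:, j], y) for j in range(X.shape[1])]
```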

7 Contents (agenda repeated; next section: Support Vector Machine)

8 Support Vector Machine – binary classification
- Optimal hyperplane
- Higher-dimensional feature space
- Primal optimization problem
- Dual optimization problem – Lagrange multipliers
- Karush-Kuhn-Tucker conditions
- Support vectors
- Kernel trick
- Decision function

9 Optimal hyperplane
[Figure: the separating hyperplane $\{x \mid \langle w,x\rangle + b = 0\}$ with normal vector $w$, flanked by the margin hyperplanes $\{x \mid \langle w,x\rangle + b = +1\}$ and $\{x \mid \langle w,x\rangle + b = -1\}$; points labeled $y_i = +1$ and $y_i = -1$, axes $x_1$ and $x_2$.]

10 Higher-dimensional feature space

11 Primal and dual optimization problems (Lagrange formulation)
Primal: minimize $\tfrac{1}{2}\lVert w\rVert^2$ subject to $y_i(\langle w, x_i\rangle + b) \ge 1$ for all $i$.
Dual: maximize $W(\alpha) = \sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j\rangle$ subject to $\alpha_i \ge 0$ and $\sum_i \alpha_i y_i = 0$.

12 SVM characteristics
- Karush-Kuhn-Tucker (KKT) conditions: only the Lagrange multipliers that are non-zero at the saddle point matter.
- Support vectors: the patterns $x_i$ for which $\alpha_i > 0$.
- Kernel trick: any positive definite kernel $k(x, x')$ can replace the inner product.
- Decision function: $f(x) = \mathrm{sgn}\bigl(\sum_i \alpha_i y_i k(x_i, x) + b\bigr)$.
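The decision function transcribes directly into code; the kernel function and the support-vector arrays here are assumed inputs for illustration:

```python
# f(x) = sgn( sum_i alpha_i * y_i * k(x_i, x) + b ), summed over the
# support vectors only (the points with non-zero alpha_i).
import numpy as np

def decision_function(x, support_vectors, alphas, labels, b, kernel):
    s = sum(a * y * kernel(sv, x)
            for a, y, sv in zip(alphas, labels, support_vectors))
    return np.sign(s + b)
```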

13 Multiclass classification: separate one class versus the rest (one-versus-the-rest).
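A sketch of the one-versus-the-rest scheme; the scikit-learn classifier and the kernel settings are illustrative assumptions:

```python
# Train one binary SVM per class (class c versus the rest) and predict
# the class whose decision value is largest.
import numpy as np
from sklearn.svm import SVC

def train_one_vs_rest(X, y, classes):
    models = {}
    for c in classes:
        clf = SVC(kernel="poly", degree=2)
        clf.fit(X, (y == c).astype(int))  # 1 for class c, 0 for the rest
        models[c] = clf
    return models

def predict_one_vs_rest(models, X):
    labels = list(models.keys())
    scores = np.column_stack([m.decision_function(X) for m in models.values()])
    return [labels[i] for i in scores.argmax(axis=1)]
```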

14 Clustering
Characteristics: the data are mapped into a higher-dimensional space, where we search for the minimal enclosing sphere.
Primal optimization problem: minimize $R^2 + C\sum_j \xi_j$ subject to $\lVert\Phi(x_j) - a\rVert^2 \le R^2 + \xi_j$ and $\xi_j \ge 0$ (a sphere of radius $R$ centered at $a$ in feature space).
Dual optimization problem: maximize $\sum_j \beta_j k(x_j, x_j) - \sum_{i,j} \beta_i \beta_j k(x_i, x_j)$ subject to $0 \le \beta_j \le C$ and $\sum_j \beta_j = 1$.
Karush-Kuhn-Tucker conditions: determine which points lie inside, on, or outside the sphere (the support vectors lie on or outside the boundary).
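For a runnable counterpart, scikit-learn's OneClassSVM implements the closely related ν-parameterised one-class formulation (the ν of slide 33); with an RBF kernel, where k(x,x) is constant, it is equivalent to the minimal enclosing sphere. A hedged sketch, with X_train and X_test assumed to be document feature matrices:

```python
# nu plays the role of the slide's ν: an upper bound on the fraction of
# points left outside the learned region.
from sklearn.svm import OneClassSVM

oc = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale")
oc.fit(X_train)
inside = oc.predict(X_test)  # +1 inside the enclosing region, -1 outside
```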

15 Contents (agenda repeated; next section: Sequential Minimal Optimization)

16 SMO characteristics
Only two parameters are updated at each step (the minimal possible working-set size). Benefits:
- needs no extra matrix storage
- needs no numerical QP optimization step
- needs more iterations to converge, but only a few operations per step, which leads to an overall speed-up
Components:
- an analytic method to solve the problem for two Lagrange multipliers
- heuristics for choosing the points

17 SMO components
Analytic method, plus heuristics for choosing the two points:
- Choice of the 1st point (x_1 / α_1): find KKT violations.
- Choice of the 2nd point (x_2 / α_2): update the pair α_1, α_2 that causes a large change, which in turn results in a large increase of the dual objective – i.e., maximize the quantity |E_1 - E_2|.
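The analytic method solves the two-multiplier subproblem in closed form. A sketch of the standard update from Platt's SMO paper; the errors E_i = f(x_i) − y_i, the kernel values, and the clipping bounds L, H (derived from 0 ≤ α ≤ C and Σ α_i y_i = 0) are assumed to be supplied by the caller:

```python
def smo_step(alpha1, alpha2, y1, y2, E1, E2, K11, K12, K22, L, H):
    eta = K11 + K22 - 2.0 * K12            # curvature of the objective
    if eta <= 0:                           # degenerate pair: skip it
        return alpha1, alpha2
    a2 = alpha2 + y2 * (E1 - E2) / eta     # unconstrained optimum
    a2 = min(max(a2, L), H)                # clip to the feasible segment
    a1 = alpha1 + y1 * y2 * (alpha2 - a2)  # keep sum_i alpha_i y_i fixed
    return a1, a2
```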

18 Probabilistic outputs
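In the SMO literature, "probabilistic outputs" usually refers to Platt scaling: fit constants A and B so that P(y=1|f) = 1/(1 + exp(A·f + B)) maps SVM decision values f to probabilities. A minimal sketch under that reading, with f a numpy array of decision values and y the corresponding 0/1 labels:

```python
import numpy as np
from scipy.optimize import minimize

def fit_platt(f, y):
    def neg_log_likelihood(params):
        A, B = params
        p = 1.0 / (1.0 + np.exp(A * f + B))
        eps = 1e-12  # guard against log(0)
        return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    A, B = minimize(neg_log_likelihood, x0=[-1.0, 0.0]).x
    return A, B
```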

19 Feature selection using SVM: linear kernel, primal optimization form. Keep only the features whose weight in the learned vector w is greater than a threshold.
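A sketch of this weight-based selection; scikit-learn's LinearSVC and the threshold value are illustrative assumptions:

```python
# Train a linear SVM, then keep only the features whose learned weight
# magnitude exceeds a threshold.
import numpy as np
from sklearn.svm import LinearSVC

clf = LinearSVC()
clf.fit(X_train, y_train)            # X_train, y_train assumed loaded
weights = np.abs(clf.coef_).ravel()  # |w_j| for each feature j
keep = weights > 0.01                # assumed threshold
X_reduced = X_train[:, keep]
```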

20 Contents (agenda repeated; next section: experiments & results)

21 Kernels used: polynomial kernel and Gaussian kernel.
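For reference, the standard forms of the two kernels; the exact bias b, degree d, and width parameter used in the report's experiments may differ from these textbook conventions:

```latex
\[
  k_{\mathrm{poly}}(x, x') = \bigl(\langle x, x'\rangle + b\bigr)^{d},
  \qquad
  k_{\mathrm{RBF}}(x, x') = \exp\!\left(-\frac{\lVert x - x'\rVert^{2}}{2\sigma^{2}}\right)
\]
```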

22 Data representations: binary (values 0 and 1), nominal, and Cornell SMART.
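Common definitions of the three representations, writing n(d,t) for the number of occurrences of term t in document d; these are the standard variants and may differ in detail from the report's:

```latex
\[
  \mathrm{TF}_{\mathrm{bin}}(d,t) =
    \begin{cases} 1 & n(d,t) > 0 \\ 0 & \text{otherwise} \end{cases}
  \qquad
  \mathrm{TF}_{\mathrm{nom}}(d,t) = \frac{n(d,t)}{\max_{\tau} n(d,\tau)}
\]
\[
  \mathrm{TF}_{\mathrm{SMART}}(d,t) =
    \begin{cases} 0 & n(d,t) = 0 \\ 1 + \log\bigl(1 + \log n(d,t)\bigr) & \text{otherwise} \end{cases}
\]
```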

23 Binary classification – 63 features – results (%) by polynomial kernel degree

Representation   d=1    d=2    d=3    d=4    d=5    d=6    d=7    d=10
Binary           40.13  64.78  66.54  27.23  46.54  71.62  56.95  55.19
Nominal          38.96  62.65  67.93  82.03  16.62  11.95  83.99  64.08
Cornell SMART    40.24  63.32  62.41  14.41   7.78  49.72  68.27  49.65

24 Binary classification – 7999 features – results (%) by polynomial kernel degree

Representation   d=1    d=2    d=3    d=4    d=5    d=6    d=7    d=10
Binary           35.77  41.74  61.88  77.64  69.21  81.87  10.95  35.77
Nominal          56.69  26.83  28.06  28.27  29.14  41.38  36.19  34.05
Cornell SMART    50.44  35.28  41.17  59.28  79.82  81.81  82.32  17.85

25 Influence of vector size – polynomial kernel [figure]

26 Influence of vector size – Gaussian kernel [figure]

27 IG versus SVM feature selection – polynomial kernel – 427 features [figure]

28 IG versus SVM feature selection – Gaussian kernel – 427 features [figure]

29 LibSvm versus UseSvm – polynomial kernel – 2493 features [figure]

30 LibSvm versus UseSvm – Gaussian kernel – 2493 features [figure]

31 Multiclass classification – polynomial kernel – 2488 features [figure]

32 Multiclass classification – Gaussian kernel – 2488 features [figure]

33 Clustering using SVM – results (%) by ν and number of features

ν \ #features   41      63      1309    2111
0.01            0.6%    0.7%    0.6%
0.1             0.5%
0.5             25.2%   25.1%

34 Conclusions – best results
- Polynomial kernel with nominal representation (degrees 5 and 6)
- Gaussian kernel with Cornell SMART representation (C = 2.7)
- Reduced number of support vectors for the polynomial kernel compared with the Gaussian kernel (24.41% versus 37.78%)
- Best number of features between 6% (1309) and 10% (2488)
- Multiclass classification follows the binary classification results
- Clustering has a smaller number of support vectors
- Clustering follows binary classification

35 Further work
Feature extraction and selection:
- association rules between words (Mutual Information)
- the synonymy and polysemy problem
- a better implementation of SVM with the linear kernel
- using families of words (WordNet)
- SVM with kernel degree greater than 1
Classification and clustering:
- using classification and clustering together

36 Influence of bias – polynomial kernel [figures continue on slides 37–38]

39 Influence of bias – RBF kernel [figures continue on slides 40–41]

