
1
**Clustering High Dimensional Data Using SVM**

Tsau Young Lin and Tam Ngo, Department of Computer Science, San José State University

2
**Overview**

Introduction; Support Vector Machine (SVM): what SVM is, how SVM works; Data Preparation Using SVD: Singular Value Decomposition (SVD), analysis of SVD; The Project: conceptual exploration, result analysis; Conclusion; Future Work

3
**Introduction**

World Wide Web: the No. 1 place for information. It contains billions of documents, far too many for humans to classify.

Project’s goals: cluster documents, reduce document size, and get reasonable results when compared to human classification.

4
**Support Vector Machine (SVM)**

A supervised learning machine that has been reported to outperform many popular methods for text classification. It is used for bioinformatics, signature/handwriting recognition, image and text classification, pattern recognition, and spam categorization.

5
**Motivation for SVM**

How do we separate these points? With a hyperplane.

Source: Author’s Research

6
SVM Process Flow (figure: data mapped from the input space to the feature space and back to the input space). Source: DTREG

7
Convex Hulls Source: Bennett, K. P., & Campbell, C., 2000

8
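**Simple SVM Example**

How would SVM separate these points?

| Class | $X_1$ |
|---|---|
| +1 | 0 |
| -1 | 1 |
| -1 | 2 |
| +1 | 3 |

No single threshold on $X_1$ separates the two classes. Use the kernel trick instead: map each point with $\Phi(X_1) = (X_1, X_1^2)$, so the data becomes 2-dimensional. Source: Author’s Research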

9
**Simple Points in Feature Space**

| Class | $X_1$ | $X_1^2$ |
|---|---|---|
| +1 | 0 | 0 |
| -1 | 1 | 1 |
| -1 | 2 | 4 |
| +1 | 3 | 9 |

All points here are support vectors. Source: Author’s Research
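The mapping on this slide can be checked with a minimal Python sketch. The names `phi`, `kernel`, and `data` are illustrative and do not come from the project. The sketch computes Φ explicitly. It also shows what the kernel trick buys: the kernel $K(x, z) = xz + x^2z^2$ returns the same value as the dot product of the mapped points, without building them.

```python
def phi(x1):
    """Explicit feature map from the example: x1 -> (x1, x1**2)."""
    return (x1, x1 * x1)

def kernel(x, z):
    """Kernel for the same map: equals phi(x) . phi(z) without computing phi."""
    return x * z + (x * x) * (z * z)

# 1-D training points and class labels from the example table.
data = [(0, 1), (1, -1), (2, -1), (3, 1)]

# Map every point into the 2-D feature space.
feature_points = [(phi(x), y) for x, y in data]
for point, label in feature_points:
    print(f"{label:+d} {point}")

# The kernel value matches the explicit dot product for every pair of points.
for x, _ in data:
    for z, _ in data:
        px, pz = phi(x), phi(z)
        assert kernel(x, z) == px[0] * pz[0] + px[1] * pz[1]
```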

10
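**SVM Calculation**

Positive plane: $\mathbf{w}\cdot\mathbf{x} + b = +1$; negative plane: $\mathbf{w}\cdot\mathbf{x} + b = -1$; hyperplane: $\mathbf{w}\cdot\mathbf{x} + b = 0$.

Find the unknowns, $\mathbf{w}$ and $b$. Expanding the equations with $\mathbf{x} = (x_1, x_2)$, where $x_1 = X_1$ and $x_2 = X_1^2$:

$$w_1x_1 + w_2x_2 + b = +1,\qquad w_1x_1 + w_2x_2 + b = -1,\qquad w_1x_1 + w_2x_2 + b = 0$$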

11
**Use Linear Algebra to Solve w and b**

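Substitute the positive points $(0, 0)$ and $(3, 9)$ into $w_1x_1 + w_2x_2 + b = +1$, and the negative points $(1, 1)$ and $(2, 4)$ into $w_1x_1 + w_2x_2 + b = -1$:

$$\begin{aligned}
w_1\cdot 0 + w_2\cdot 0 + b &= +1\\
w_1\cdot 3 + w_2\cdot 9 + b &= +1\\
w_1\cdot 1 + w_2\cdot 1 + b &= -1\\
w_1\cdot 2 + w_2\cdot 4 + b &= -1
\end{aligned}$$

The solution is $w_1 = -3$, $w_2 = 1$, $b = 1$. Solving by hand works here only because every point is a support vector. In general, the SVM algorithm finds the solution whose hyperplane has the largest margin.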
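The four equations can be solved exactly with a small Python sketch. It applies Gauss-Jordan elimination with `fractions.Fraction` to three of the equations, then checks that the fourth also holds. The sketch only reproduces the hand calculation. It is not a general SVM training algorithm, which instead maximizes the margin over all training points.

```python
from fractions import Fraction

def solve(matrix, rhs):
    """Solve a square linear system by Gauss-Jordan elimination with exact fractions."""
    n = len(matrix)
    # Augmented matrix [A | rhs] in exact rational arithmetic.
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Swap up a row with a nonzero pivot (raises StopIteration if singular).
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot becomes 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

# Feature-space points (x1, x2) with their class labels.
points = [((0, 0), 1), ((3, 9), 1), ((1, 1), -1), ((2, 4), -1)]

# Each point on a margin plane gives: w1*x1 + w2*x2 + b = label.
rows = [[x1, x2, 1] for (x1, x2), _ in points]
labels = [y for _, y in points]

# Three equations determine (w1, w2, b); use points 0, 2, and 3.
w1, w2, b = solve([rows[0], rows[2], rows[3]], [labels[0], labels[2], labels[3]])
print(w1, w2, b)  # -3 1 1

# The remaining equation must also hold for the system to be consistent.
assert all(w1 * x1 + w2 * x2 + b == y for (x1, x2), y in points)
```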

12
**Use Solutions to Draw the Planes**

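Positive plane: $w_1x_1 + w_2x_2 + b = +1 \;\Rightarrow\; -3x_1 + x_2 + 1 = +1 \;\Rightarrow\; x_2 = 3x_1$

Negative plane: $w_1x_1 + w_2x_2 + b = -1 \;\Rightarrow\; -3x_1 + x_2 + 1 = -1 \;\Rightarrow\; x_2 = 3x_1 - 2$

Hyperplane: $w_1x_1 + w_2x_2 + b = 0 \;\Rightarrow\; -3x_1 + x_2 + 1 = 0 \;\Rightarrow\; x_2 = 3x_1 - 1$

| $x_1$ | Positive plane $x_2$ | Hyperplane $x_2$ | Negative plane $x_2$ |
|---|---|---|---|
| 0 | 0 | -1 | -2 |
| 1 | 3 | 2 | 1 |
| 2 | 6 | 5 | 4 |
| 3 | 9 | 8 | 7 |

Source: Author’s Research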
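A small Python sketch can recompute the plane values on this slide from $w = (-3, 1)$ and $b = 1$. It also computes the margin width $2/\lVert w\rVert$ and classifies new 1-D inputs by the sign of $w\cdot\Phi(x) + b$. The helper names and the sample inputs 1.5 and 4 are illustrative assumptions.

```python
import math

w = (-3, 1)  # (w1, w2) from the linear-algebra step
b = 1

def decision(point):
    """Return w . point + b for a 2-D feature-space point."""
    return w[0] * point[0] + w[1] * point[1] + b

def x2_on_plane(x1, level):
    """Solve w1*x1 + w2*x2 + b = level for x2 (assumes w2 != 0)."""
    return (level - b - w[0] * x1) / w[1]

def classify(x1):
    """Map a 1-D input with phi(x1) = (x1, x1**2) and return +1 or -1."""
    return 1 if decision((x1, x1 * x1)) >= 0 else -1

# Rebuild the positive plane, hyperplane, and negative plane values for x1 = 0..3.
for level, name in [(1, "positive plane"), (0, "hyperplane"), (-1, "negative plane")]:
    print(name, [(x1, x2_on_plane(x1, level)) for x1 in range(4)])

# Width of the margin between the +1 and -1 planes.
margin = 2 / math.hypot(*w)
print(round(margin, 4))  # 0.6325

print(classify(1.5), classify(4))  # -1 1
```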

13
**Simple Data Separated by a Hyperplane**

Source: Author’s Research

14
**LIBSVM and Parameter C**

LIBSVM: a library for SVM, available in C++ and Java versions.

When C is very small, SVM mainly maximizes the margin, and some points may lie on the wrong side of the plane. When C is very large, slack penalties become expensive. SVM then tries to separate every data point in each group correctly, even at the cost of a narrower margin.
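The role of C can be illustrated with the standard soft-margin (C-SVM) primal objective, $\tfrac12\lVert w\rVert^2 + C\sum_i \max(0,\,1 - y_i(w\cdot x_i + b))$, where each hinge term is a slack penalty. This Python sketch only evaluates that objective for two hand-picked candidate planes. It is not how LIBSVM trains; LIBSVM solves the dual problem. The extra point at $x_1 = 1.5$ labeled +1 is a hypothetical outlier added for illustration.

```python
def soft_margin_objective(w, b, points, C):
    """0.5*||w||^2 + C * total slack, with slack = max(0, 1 - y*(w.x + b))."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    slack = 0.0
    for x, y in points:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        slack += max(0.0, 1.0 - y * score)
    return regularizer + C * slack

# Feature-space points from the example plus one hypothetical outlier labeled +1.
points = [((0, 0), 1), ((3, 9), 1), ((1, 1), -1), ((2, 4), -1), ((1.5, 2.25), 1)]

# Candidate 1: the example's plane (fits the four original points exactly).
fit = ((-3, 1), 1)
# Candidate 2: w = 0, b = 0 (an infinitely wide margin, but no separation).
wide = ((0, 0), 0)

for C in (0.01, 100):
    costs = {name: soft_margin_objective(w, b, points, C)
             for name, (w, b) in [("fit", fit), ("wide", wide)]}
    print(C, costs, "preferred:", min(costs, key=costs.get))
```

With the small C the wide candidate has the lower cost even though it separates nothing. With the large C the slack term dominates and the fitted plane wins.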

15
Choosing Parameter C Source: LIBSVM

16
4 Basic Kernel Types

LIBSVM implements 4 basic kernel types: linear, polynomial, radial basis function, and sigmoid. The numbers are the values of LIBSVM’s `-t` (kernel type) option:

- 0 -- linear: u'*v
- 1 -- polynomial: (gamma*u'*v + coef0)^degree
- 2 -- radial basis function: exp(-gamma*|u-v|^2)
- 3 -- sigmoid: tanh(gamma*u'*v + coef0)

We use the radial basis function kernel with a large parameter C for this project.
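The four kernel formulas translate directly into Python. This sketch mirrors LIBSVM’s notation (`gamma`, `coef0`, `degree`, with `u'*v` as a dot product). It is a standalone illustration, not LIBSVM code, and the parameter values in the checks are arbitrary.

```python
import math

def dot(u, v):
    """u'*v: inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def linear(u, v):
    """Kernel type 0: u'*v."""
    return dot(u, v)

def polynomial(u, v, gamma, coef0, degree):
    """Kernel type 1: (gamma*u'*v + coef0)^degree."""
    return (gamma * dot(u, v) + coef0) ** degree

def rbf(u, v, gamma):
    """Kernel type 2: exp(-gamma*|u-v|^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)

def sigmoid(u, v, gamma, coef0):
    """Kernel type 3: tanh(gamma*u'*v + coef0)."""
    return math.tanh(gamma * dot(u, v) + coef0)

u, v = [1.0, 2.0], [3.0, 4.0]
print(linear(u, v))                   # 11.0
print(polynomial(u, v, 1.0, 1.0, 2))  # 144.0
print(rbf(u, u, 0.5))                 # 1.0 (identical points)
print(sigmoid(u, v, 0.0, 0.0))        # 0.0
```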

17
**Data Preparation Using SVD**

SVM is excellent for text classification, but it requires labeled documents for training. Singular Value Decomposition (SVD) separates a matrix into three parts:

- left singular vectors (eigenvectors of $AA^T$);
- singular values;
- right singular vectors (eigenvectors of $A^TA$).

SVD is used to decompose data such as images and text and to reduce data size. We will use SVD to cluster the documents, which produces the labels SVM needs.

18
**SVD Example of 4 Documents**

- D1: Shipment of gold damaged in a fire
- D2: Delivery of silver arrived in a silver truck
- D3: Shipment of gold arrived in a truck
- D4: Gold Silver Truck

Source: Garcia, E., 2006

19
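**Matrix $A = USV^T$**

| Term | D1 | D2 | D3 | D4 |
|---|---|---|---|---|
| a | 1 | 1 | 1 | 0 |
| arrived | 0 | 1 | 1 | 0 |
| damaged | 1 | 0 | 0 | 0 |
| delivery | 0 | 1 | 0 | 0 |
| fire | 1 | 0 | 0 | 0 |
| gold | 1 | 0 | 1 | 1 |
| in | 1 | 1 | 1 | 0 |
| of | 1 | 1 | 1 | 0 |
| shipment | 1 | 0 | 1 | 0 |
| silver | 0 | 2 | 0 | 1 |
| truck | 0 | 1 | 1 | 1 |

Given a matrix A, we can factor it into three parts: U, S, and $V^T$. Source: Garcia, E., 2006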
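The term-document matrix can be rebuilt from the four documents with a short Python sketch. It assumes the simplest preprocessing consistent with the term list: lowercasing, splitting on spaces, no stop-word removal or stemming, and raw term counts as entries.

```python
from collections import Counter

docs = {
    "D1": "Shipment of gold damaged in a fire",
    "D2": "Delivery of silver arrived in a silver truck",
    "D3": "Shipment of gold arrived in a truck",
    "D4": "Gold Silver Truck",
}

# Lowercase and split on whitespace; no stop-word removal or stemming.
counts = {name: Counter(text.lower().split()) for name, text in docs.items()}

# Alphabetical vocabulary gives the row order of A.
terms = sorted(set().union(*counts.values()))

# A[i][j] = number of times term i occurs in document j (raw term frequency).
A = [[counts[name][term] for name in docs] for term in terms]

for term, row in zip(terms, A):
    print(f"{term:10s}", row)
```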

20
**Using JAMA to Decompose Matrix A**

S, the diagonal matrix of the singular values of A, is computed with JAMA. Source: JAMA (MathWorks and the National Institute of Standards and Technology (NIST))

21
**Using JAMA to Decompose Matrix A**

V (the right singular vectors) and its transpose $V^T$ are also computed with JAMA. Matrix A can be reconstructed by multiplying the matrices: $A = USV^T$. Source: JAMA

22
**Rank 2 Approximation (Reduced U, S, and V Matrices)**

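S’ keeps the two largest singular values, and V’ keeps the matching two columns of V, so $A \approx U'S'V'^T$. Each row of V’ gives one document’s coordinates in two dimensions.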
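JAMA computes the SVD directly. As a dependency-free alternative, this Python sketch swaps in power iteration with deflation on $A^TA$. The eigenvectors of $A^TA$ are the right singular vectors (columns of V), and its eigenvalues are the squared singular values. Keeping the top $k = 2$ gives S’ and V’. The iteration count and starting vector are arbitrary choices. Singular vectors are only determined up to sign, so signs may differ from JAMA’s output.

```python
import math

# Term-document matrix A of the four example documents (11 terms x 4 documents).
# Rows: a, arrived, damaged, delivery, fire, gold, in, of, shipment, silver, truck.
A = [
    [1, 1, 1, 0], [0, 1, 1, 0], [1, 0, 0, 0], [0, 1, 0, 0],
    [1, 0, 0, 0], [1, 0, 1, 1], [1, 1, 1, 0], [1, 1, 1, 0],
    [1, 0, 1, 0], [0, 2, 0, 1], [0, 1, 1, 1],
]

def gram(M):
    """Return M^T M, the document-by-document matrix."""
    cols = len(M[0])
    return [[sum(row[i] * row[j] for row in M) for j in range(cols)] for i in range(cols)]

def matvec(M, v):
    """Multiply a square matrix by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def top_eigenpairs(G, k, iters=500):
    """Top-k eigenpairs of a symmetric positive semidefinite matrix (power iteration + deflation)."""
    G = [row[:] for row in G]  # work on a copy
    n = len(G)
    pairs = []
    for _ in range(k):
        # Arbitrary start; it must not be orthogonal to the wanted eigenvector.
        v = normalize([1.0] + [0.0] * (n - 1))
        for _ in range(iters):
            v = normalize(matvec(G, v))
        eigenvalue = sum(x * y for x, y in zip(v, matvec(G, v)))  # Rayleigh quotient
        pairs.append((eigenvalue, v))
        # Deflate: remove the found component so the next eigenvector dominates.
        G = [[G[i][j] - eigenvalue * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs

k = 2
pairs = top_eigenpairs(gram(A), k)
singular_values = [math.sqrt(lam) for lam, _ in pairs]          # diagonal of S'
V_k = [[vec[j] for _, vec in pairs] for j in range(len(A[0]))]  # rows of V' = documents

print("S' diagonal:", [round(s, 4) for s in singular_values])
for name, row in zip(["D1", "D2", "D3", "D4"], V_k):
    print(name, [round(x, 4) for x in row])
```

Up to sign and rounding, the first column should agree with the V’ values quoted for the cosine-similarity example (0.4652, 0.6406, 0.5622, 0.2391).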

23
**Use Matrix V to Calculate Cosine Similarities**

Calculate the cosine similarity between each pair of documents:

$$\text{sim}(D_i', D_j') = \frac{D_i' \cdot D_j'}{|D_i'|\,|D_j'|}$$

For example, for D1’:

$$\text{sim}(D_1', D_2') = \frac{D_1'\cdot D_2'}{|D_1'|\,|D_2'|},\qquad \text{sim}(D_1', D_3') = \frac{D_1'\cdot D_3'}{|D_1'|\,|D_3'|},\qquad \text{sim}(D_1', D_4') = \frac{D_1'\cdot D_4'}{|D_1'|\,|D_4'|}$$
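The pairing rule can be sketched in Python: compare each document with every other one and keep the most similar. The 2-D vectors below are made-up illustrative values, not the paper’s V’ entries. Merging the best-match pairs into clusters with a small union-find is an assumption about how pairs become labels.

```python
import math

def cosine(u, v):
    """sim(u, v) = (u . v) / (|u| |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def best_matches(vectors):
    """For each document, return the other document with the highest cosine similarity."""
    return {
        name: max((other for other in vectors if other != name),
                  key=lambda other: cosine(vectors[name], vectors[other]))
        for name in vectors
    }

def clusters_from_matches(matches):
    """Merge documents linked by best-match pairs (connected components via union-find)."""
    parent = {name: name for name in matches}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for a, b in matches.items():
        parent[find(a)] = find(b)
    groups = {}
    for name in matches:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical 2-D document vectors (illustrative values only).
vectors = {"D1": [1.0, -1.0], "D2": [1.0, 1.0], "D3": [1.0, -0.5], "D4": [0.5, 0.5]}

matches = best_matches(vectors)
print(matches)
print(clusters_from_matches(matches))
```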

24
**Result for Cosine Similarities**

Example result for D1’. Here $d_{12}$ and $d_{32}$ are the second components of D1’ and D3’:

$$\text{sim}(D_1', D_2') = \frac{D_1'\cdot D_2'}{\sqrt{(0.4652^2 + d_{12}^2)\,(0.6406^2 + 0.6401^2)}}$$

$$\text{sim}(D_1', D_3') = \frac{D_1'\cdot D_3'}{\sqrt{(0.4652^2 + d_{12}^2)\,(0.5622^2 + d_{32}^2)}}$$

$$\text{sim}(D_1', D_4') = \frac{D_1'\cdot D_4'}{\sqrt{(0.4652^2 + d_{12}^2)\,(0.2391^2 + 0.2450^2)}}$$

D3 returns the highest value, so pair D1 with D3. Do the same for D2, D3, and D4.

25
**Result of Simple Data Set**

Label 1: documents 1 and 3. Label 2: documents 2 and 4.

Label 1:
- D1: Shipment of gold damaged in a fire
- D3: Shipment of gold arrived in a truck

Label 2:
- D2: Delivery of silver arrived in a silver truck
- D4: Gold Silver Truck

26
**Check Cluster Using SVM**

Now that we have the labels, we can use them to train an SVM. Each line of the SVM input format on the original data is a label followed by `index:value` for each of the 11 terms in A:

- `1 1:1.00 2:0.00 3:1.00 4:0.00 5:1.00 6:1.00 7:1.00 8:1.00 9:1.00 10:0.00 11:0.00`
- `2 1:1.00 2:1.00 3:0.00 4:1.00 5:0.00 6:0.00 7:1.00 8:1.00 9:0.00 10:2.00 11:1.00`
- `1 1:1.00 2:1.00 3:0.00 4:0.00 5:0.00 6:1.00 7:1.00 8:1.00 9:1.00 10:0.00 11:1.00`
- `2 1:0.00 2:0.00 3:0.00 4:0.00 5:0.00 6:1.00 7:0.00 8:0.00 9:0.00 10:1.00 11:1.00`
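Each line follows LIBSVM’s text format: a label, then `index:value` pairs with 1-based, increasing indices. A Python sketch that produces such lines from a document’s column of term counts is shown below; the function name `to_libsvm_line` is illustrative. Because the format is sparse, zero-valued features may also be omitted.

```python
def to_libsvm_line(label, features, skip_zeros=False):
    """Format one example as '<label> <index>:<value> ...' with 1-based indices."""
    parts = [str(label)]
    for index, value in enumerate(features, start=1):
        if skip_zeros and value == 0:
            continue  # sparse form: absent indices are treated as zero
        parts.append(f"{index}:{value:.2f}")
    return " ".join(parts)

# D1's column of the term-document matrix, labeled with cluster 1.
d1 = [1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0]
print(to_libsvm_line(1, d1))
print(to_libsvm_line(1, d1, skip_zeros=True))
```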

27
**Results from SVM’s Prediction**

Results from SVM’s Prediction on Original Data

| Documents Used for Training | Predicted Document | SVM Prediction Result | SVD Cluster Result |
|---|---|---|---|
| D1, D2, D3 | D4 | 1.0 | 2 |
| D1, D2, D4 | D3 | | 1 |
| D1, D3, D4 | D2 | 2.0 | 2 |
| D2, D3, D4 | D1 | | 1 |

Source: Author’s Research

28
**Using Truncated V Matrix**

To reduce data size, it is more practical to use the truncated V matrix. The SVM input format for the truncated V matrix has two features per document:

- `1 1:… 2:…`
- `2 1:… 2:0.6401`
- `1 1:… 2:…`
- `2 1:… 2:0.2450`

29
**SVM Result From Truncated V Matrix**

Results from SVM’s Prediction on Reduced Data

| Documents Used for Training | Predicted Document | SVM Prediction Result | SVD Cluster Result |
|---|---|---|---|
| D1, D2, D3 | D4 | 2.0 | 2 |
| D1, D2, D4 | D3 | 1.0 | 1 |
| D1, D3, D4 | D2 | | 2 |
| D2, D3, D4 | D1 | | 1 |

Using the truncated V matrix gives better results. Source: Author’s Research

30
**Vector Documents on a Graph**

Source: Author’s Research

31
**Analysis of the Rank Approximation**

Cluster results from different rank approximations (Rank 1, Rank 2, Rank 3, Rank 4): D1: 4, D2: 4, D3: 4, D4: 3, D1: 3, D3: 1, D4: 2, D2: 3, D1: 2, D3: 2. With Rank 2, label 1: 1, 3 and label 2: 2, 4. Source: Author’s Research

32
**Program Process Flow**

Use the previous methods on larger data sets and compare the results with human classification.

33
**Conceptual Exploration**

Reuters-21578: a collection of newswire articles that have been human-classified by Carnegie Group, Inc. and Reuters, Ltd. It is one of the most widely used data sets for text categorization.

34
**First Data Set from Reuters-21578 (200 x 9928)**

Result Analysis: Clustering with SVD vs. Human Classification, First Data Set from Reuters (200 x 9928)

| Rank | # of Naturally Formed Clusters Using SVD | Accuracy (SVD Cluster / SVM Prediction) |
|---|---|---|
| 2 | 80 | 75.0% / 65.0% |
| 5 | 66 | 81.5% / 82.0% |
| 10 | | 60.5% / 54.0% |
| 15 | 64 | 52.0% / 51.5% |
| 20 | 67 | 38.0% / 46.5% |
| 30 | 72 | 60.0% |
| 40 | | 62.5% / 58.5% |
| 50 | 73 | 54.5% |
| 100 | 75 | 45.5% |

Source: Author’s Research

35
**Second Data Set from Reuters-21578 (200 x 9928)**

Result Analysis: Clustering with SVD vs. Human Classification, Second Data Set from Reuters (200 x 9928)

| Rank | # of Naturally Formed Clusters Using SVD | Accuracy (SVD Cluster / SVM Prediction) |
|---|---|---|
| 2 | 76 | 67.0% / 84.5% |
| 5 | 73 | |
| 10 | 64 | 70.0% / 85.5% |
| 15 | | 63.0% / 81.0% |
| 20 | 67 | 59.5% / 50.0% |
| 30 | 69 | 68.5% / 83.5% |
| 40 | | 59.0% / 79.0% |
| 50 | | 44.5% / 25.5% |
| 100 | 71 | 52.0% / 47.0% |

Source: Author’s Research

36
Result Analysis: The highest accuracy for SVD clustering is 81.5% (rank 5, first data set). Lower rank values seem to give better results. SVM predicts with about the same accuracy as SVD clustering.

37
**Result Analysis: Why results may not be higher?**

Human classification is more subjective than a program’s. Reducing many small clusters to only 2 clusters by computing the average may decrease the accuracy.

38
**Conclusion**

We showed how SVM works and explored its strengths, and we showed how SVD can be used for clustering. We analyzed simple and complex data; the method seems to cluster data reasonably.

Our method is able to reduce data size (by using the truncated V matrix), cluster data reasonably, and classify new data efficiently (based on SVM). By combining known methods, we created a form of unsupervised SVM.

39
Future Work: extend SVD to very large data sets that can only be stored in secondary storage, and look for more efficient SVM kernels.

40
Thank You!

41
References

- Bennett, K. P., & Campbell, C. (2000). Support Vector Machines: Hype or Hallelujah? ACM SIGKDD Explorations, Vol. 2, No. 2, 1-13.
- Chang, C., & Lin, C. (2006). LIBSVM: a library for support vector machines. Retrieved November 29, 2006, from
- Cristianini, N. (2001). Support Vector and Kernel Machines. Retrieved November 29, 2005, from
- Cristianini, N., & Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines. Cambridge, UK: Cambridge University Press.
- Garcia, E. (2006). SVD and LSI Tutorial 4: Latent Semantic Indexing (LSI) How-to Calculations. Retrieved November 28, 2006, from
- Guestrin, C. (2006). Machine Learning. Retrieved November 8, 2006, from
- Hicklin, J., Moler, C., & Webb, P. (2005). JAMA: A Java Matrix Package. Retrieved November 28, 2006, from

42
References

- Joachims, T. (1998). Text Categorization with Support Vector Machines: Learning with Many Relevant Features.
- Joachims, T. (2004). Support Vector Machines. Retrieved November 28, 2006, from
- Reuters Text Categorization Test Collection. Retrieved November 28, 2006, from
- SVM - Support Vector Machines. DTREG. Retrieved November 28, 2006, from
- Vapnik, V. N. (2000, 1995). The Nature of Statistical Learning Theory. Springer-Verlag New York, Inc.
