Clustering High Dimensional Data Using SVM


Clustering High Dimensional Data Using SVM
Tsau Young Lin and Tam Ngo
Department of Computer Science
San José State University

Overview
- Introduction
- Support Vector Machine (SVM): what is SVM, how SVM works
- Data Preparation Using SVD: Singular Value Decomposition (SVD), analysis of SVD
- The Project: conceptual exploration, result analysis
- Conclusion
- Future Work

Introduction
World Wide Web:
- the No. 1 place for information
- contains billions of documents
- impossible to classify by humans
Project's goals:
- cluster documents
- reduce document size
- get reasonable results when compared to human classification

Support Vector Machine (SVM)
- a supervised learning machine
- outperforms many popular methods for text classification
- used for bioinformatics, signature/handwriting recognition, image and text classification, pattern recognition, and e-mail spam categorization

Motivation for SVM
How do we separate these points? With a hyperplane.
Source: Author's Research

SVM Process Flow
(figure: data mapped from the input space to the feature space)
Source: DTREG

Convex Hulls Source: Bennett, K. P., & Campbell, C., 2000

Simple SVM Example
How would SVM separate these points?

Class   X1
 +1      0
 -1      1
 -1      2
 +1      3

Use the kernel trick: Φ(X1) = (X1, X1²). The data become 2-dimensional.
Source: Author's Research
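
To make the mapping concrete, here is a minimal sketch (not from the original slides) of the explicit feature map Φ(X1) = (X1, X1²) applied to the four example points; the class labels are carried along unchanged.

```java
// Minimal sketch of the explicit feature map used in this example:
// Phi(x1) = (x1, x1^2), lifting the 1-D points into 2-D feature space.
public class FeatureMap {
    static double[] phi(double x1) {
        return new double[] { x1, x1 * x1 };
    }

    public static void main(String[] args) {
        double[] x1Values = { 0, 1, 2, 3 };      // X1 values from the table above
        int[] labels      = { +1, -1, -1, +1 };  // corresponding classes
        for (int i = 0; i < x1Values.length; i++) {
            double[] p = phi(x1Values[i]);
            System.out.printf("class %+d: (%.0f, %.0f)%n", labels[i], p[0], p[1]);
        }
    }
}
```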

Simple Points in Feature Space

Class   X1   X1²
 +1      0     0
 -1      1     1
 -1      2     4
 +1      3     9

All points here are support vectors.
Source: Author's Research

SVM Calculation
Positive plane:  w · x + b = +1
Negative plane:  w · x + b = -1
Hyperplane:      w · x + b = 0
Find the unknowns w and b.
Expanding the equations:
w1x1 + w2x2 + b = +1
w1x1 + w2x2 + b = -1
w1x1 + w2x2 + b = 0

Use Linear Algebra to Solve for w and b
w1x1 + w2x2 + b = +1:
  w1(0) + w2(0) + b = +1
  w1(3) + w2(9) + b = +1
w1x1 + w2x2 + b = -1:
  w1(1) + w2(1) + b = -1
  w1(2) + w2(4) + b = -1
The solution is w1 = -3, w2 = 1, b = 1.
The SVM algorithm finds the solution that gives the hyperplane with the largest margin.
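
As a sanity check, the same system can be solved numerically with JAMA (the matrix package used later for the SVD); this is an illustrative sketch, not part of the original project code. Each row of the left-hand side is (x1, x2, 1) for one training point, and the right-hand side holds its label.

```java
import Jama.Matrix;

// Hedged sketch: solve w1, w2, b from the four margin equations above.
// With 4 equations and 3 unknowns, JAMA's solve() returns the least-squares
// solution, which is exact here because the system is consistent.
public class SolveMargin {
    public static void main(String[] args) {
        double[][] lhs = {
            { 0, 0, 1 },   // point (0, 0), class +1
            { 3, 9, 1 },   // point (3, 9), class +1
            { 1, 1, 1 },   // point (1, 1), class -1
            { 2, 4, 1 }    // point (2, 4), class -1
        };
        double[][] rhs = { { +1 }, { +1 }, { -1 }, { -1 } };
        Matrix solution = new Matrix(lhs).solve(new Matrix(rhs));
        // Expected: w1 = -3, w2 = 1, b = 1
        System.out.printf("w1 = %.1f, w2 = %.1f, b = %.1f%n",
            solution.get(0, 0), solution.get(1, 0), solution.get(2, 0));
    }
}
```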

Use the Solutions to Draw the Planes
Positive plane:  w · x + b = +1  =>  -3x1 + 1x2 + 1 = +1  =>  x2 = 3x1
  X1: 0, 1, 2, 3  ->  X2: 0, 3, 6, 9
Negative plane:  w · x + b = -1  =>  -3x1 + 1x2 + 1 = -1  =>  x2 = -2 + 3x1
  X1: 0, 1, 2, 3  ->  X2: -2, 1, 4, 7
Hyperplane:      w · x + b = 0   =>  -3x1 + 1x2 + 1 = 0   =>  x2 = -1 + 3x1
  X1: 0, 1, 2, 3  ->  X2: -1, 2, 5, 8
Source: Author's Research

Simple Data Separated by a Hyperplane Source: Author’s Research

LIBSVM and Parameter C
LIBSVM: a Java library for SVM.
If C is very small, SVM only cares about maximizing the margin, and points may end up on the wrong side of the plane.
If C is very large, SVM wants very small slack penalties, to make sure that all data points in each group are separated correctly.
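
For reference, the soft-margin objective that C controls is the standard formulation (not written out on the slide): C trades off margin width against the slack variables ξ_i that allow points to violate the margin.

```latex
\min_{w,\,b,\,\xi}\;\; \frac{1}{2}\lVert w \rVert^{2} \;+\; C \sum_{i=1}^{n} \xi_i
\qquad \text{subject to} \qquad
y_i \,(w \cdot x_i + b) \;\ge\; 1 - \xi_i, \quad \xi_i \ge 0.
```

A small C makes the slack term cheap (wide margin, more violations allowed); a large C makes violations expensive (fewer misclassified training points).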

Choosing Parameter C Source: LIBSVM

4 Basic Kernel Types
LIBSVM implements 4 basic kernel types:
0 -- linear: u'*v
1 -- polynomial: (gamma*u'*v + coef0)^degree
2 -- radial basis function: exp(-gamma*|u-v|^2)
3 -- sigmoid: tanh(gamma*u'*v + coef0)
We use the radial basis function with a large parameter C for this project.
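
A hedged sketch of how these settings look with LIBSVM's Java API (svm_parameter); the specific C and gamma values here are illustrative, not the ones used in the project.

```java
import libsvm.svm_parameter;

public class KernelConfig {
    // Configure C-SVC with an RBF kernel and a large C, as described above.
    static svm_parameter rbfParams() {
        svm_parameter param = new svm_parameter();
        param.svm_type    = svm_parameter.C_SVC;
        param.kernel_type = svm_parameter.RBF;   // exp(-gamma*|u-v|^2)
        param.gamma       = 0.5;                 // illustrative value
        param.C           = 1000;                // large C: heavy penalty on misclassified points
        param.cache_size  = 100;                 // kernel cache in MB
        param.eps         = 1e-3;                // stopping tolerance
        return param;
    }
}
```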

Data Preparation Using SVD
SVM is excellent for text classification, but it requires labeled documents for training.
Singular Value Decomposition (SVD):
- separates a matrix into three parts: left singular vectors, singular values, and right singular vectors (the factorization is written out below)
- can decompose data such as images and text
- can reduce data size
We will use SVD to cluster.
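
In standard notation (not spelled out on the slide), the factorization and the rank-k truncation used later are:

```latex
A \;=\; U\, S\, V^{T},
\qquad
A_k \;=\; U_k\, S_k\, V_k^{T} \;\approx\; A,
```

where U_k and V_k keep only the first k columns of U and V, and S_k keeps the k largest singular values.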

SVD Example with 4 Documents
D1: Shipment of gold damaged in a fire
D2: Delivery of silver arrived in a silver truck
D3: Shipment of gold arrived in a truck
D4: Gold Silver Truck
Source: Garcia, E., 2006

Matrix A = U*S*VT
Term-document matrix A (rows are terms, columns are documents D1-D4):

            D1   D2   D3   D4
a            1    1    1    0
arrived      0    1    1    0
damaged      1    0    0    0
delivery     0    1    0    0
fire         1    0    0    0
gold         1    0    1    1
in           1    1    1    0
of           1    1    1    0
shipment     1    0    1    0
silver       0    2    0    1
truck        0    1    1    1

Given a matrix A, we can factor it into three parts: U, S, and VT.
Source: Garcia, E., 2006

Using JAMA to Decompose Matrix A
(rows of U follow the term order of A; terms with identical rows in A, such as "fire", "in", and "of", have identical rows in U)

U =
 0.3966  -0.1282  -0.2349   0.0941
 0.2860   0.1507  -0.0700   0.5212
 0.1106  -0.2790  -0.1649  -0.4271
 0.1523   0.2650  -0.2984  -0.0565
 0.1106  -0.2790  -0.1649  -0.4271
 0.3012  -0.2918   0.6468  -0.2252
 0.3966  -0.1282  -0.2349   0.0941
 0.3966  -0.1282  -0.2349   0.0941
 0.2443  -0.3932   0.0635   0.1507
 0.3615   0.6315  -0.0134  -0.4890
 0.3428   0.2522   0.5134   0.1453

S =
 4.2055   0.0000   0.0000   0.0000
 0.0000   2.4155   0.0000   0.0000
 0.0000   0.0000   1.4021   0.0000
 0.0000   0.0000   0.0000   1.2302

Source: JAMA (MathWorks and the National Institute of Standards and Technology (NIST))

Using JAMA to Decompose Matrix A

V =
 0.4652  -0.6738  -0.2312  -0.5254
 0.6406   0.6401  -0.4184  -0.0696
 0.5622  -0.2760   0.3202   0.7108
 0.2391   0.2450   0.8179  -0.4624

VT =
 0.4652   0.6406   0.5622   0.2391
-0.6738   0.6401  -0.2760   0.2450
-0.2312  -0.4184   0.3202   0.8179
-0.5254  -0.0696   0.7108  -0.4624

Matrix A can be reconstructed by multiplying the matrices: U*S*VT.
Source: JAMA
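
A hedged sketch (illustrative, not the project's actual code) of producing U, S, and V from the term-document matrix with JAMA; the 11×4 counts come from the table above, and JAMA's svd() expects at least as many rows as columns.

```java
import Jama.Matrix;
import Jama.SingularValueDecomposition;

public class DecomposeA {
    public static void main(String[] args) {
        // Term-document counts from the example (rows: a, arrived, damaged, delivery,
        // fire, gold, in, of, shipment, silver, truck; columns: D1..D4).
        double[][] counts = {
            {1, 1, 1, 0}, {0, 1, 1, 0}, {1, 0, 0, 0}, {0, 1, 0, 0},
            {1, 0, 0, 0}, {1, 0, 1, 1}, {1, 1, 1, 0}, {1, 1, 1, 0},
            {1, 0, 1, 0}, {0, 2, 0, 1}, {0, 1, 1, 1}
        };
        Matrix a = new Matrix(counts);
        SingularValueDecomposition svd = a.svd();
        Matrix u = svd.getU();   // left singular vectors  (11 x 4)
        Matrix s = svd.getS();   // singular values on the diagonal (4 x 4)
        Matrix v = svd.getV();   // right singular vectors (4 x 4)

        // A is recovered (up to rounding and sign conventions) as U * S * V^T.
        Matrix reconstructed = u.times(s).times(v.transpose());
        System.out.println("||A - U*S*V'||_F = " + a.minus(reconstructed).normF());
    }
}
```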

Rank 2 Approximation (Reduced U, S, and V Matrices)

U' =
 0.3966  -0.1282
 0.2860   0.1507
 0.1106  -0.2790
 0.1523   0.2650
 0.1106  -0.2790
 0.3012  -0.2918
 0.3966  -0.1282
 0.3966  -0.1282
 0.2443  -0.3932
 0.3615   0.6315
 0.3428   0.2522

S' =
 4.2055   0.0000
 0.0000   2.4155

V' =
 0.4652  -0.6738
 0.6406   0.6401
 0.5622  -0.2760
 0.2391   0.2450
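
Continuing the JAMA sketch above, the rank-2 truncation keeps the first two columns of U and V and the top-left 2×2 block of S (JAMA's getMatrix indices are 0-based and inclusive); again illustrative rather than the project's exact code.

```java
// Rank-2 truncation of the decomposition from the previous sketch
// (u, s, v are the matrices returned by svd() above).
Matrix u2 = u.getMatrix(0, u.getRowDimension() - 1, 0, 1);   // U' : 11 x 2
Matrix s2 = s.getMatrix(0, 1, 0, 1);                         // S' :  2 x 2
Matrix v2 = v.getMatrix(0, v.getRowDimension() - 1, 0, 1);   // V' :  4 x 2
Matrix a2 = u2.times(s2).times(v2.transpose());              // rank-2 approximation of A
```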

Use Matrix V to Calculate Cosine Similarities
Calculate the cosine similarity for each pair of documents:
sim(Di', Dj') = (Di' • Dj') / (|Di'| |Dj'|)
For example, for D1':
sim(D1', D2') = (D1' • D2') / (|D1'| |D2'|)
sim(D1', D3') = (D1' • D3') / (|D1'| |D3'|)
sim(D1', D4') = (D1' • D4') / (|D1'| |D4'|)

Results for the Cosine Similarities
Example results for D1':
sim(D1', D2') = ((0.4652)(0.6406) + (-0.6738)(0.6401)) / (sqrt((0.4652)² + (-0.6738)²) · sqrt((0.6406)² + (0.6401)²)) = -0.1797
sim(D1', D3') = ((0.4652)(0.5622) + (-0.6738)(-0.2760)) / (sqrt((0.4652)² + (-0.6738)²) · sqrt((0.5622)² + (-0.2760)²)) = 0.8727
sim(D1', D4') = ((0.4652)(0.2391) + (-0.6738)(0.2450)) / (sqrt((0.4652)² + (-0.6738)²) · sqrt((0.2391)² + (0.2450)²)) = -0.1921
D3' returns the highest value, so pair D1 with D3.
Do the same for D2, D3, and D4.
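
A small helper (illustrative, not taken from the project) that computes the same cosine similarity over rows of the truncated V matrix; running it on D1' and D3' reproduces the 0.8727 above.

```java
public class CosineSimilarity {
    // Cosine similarity between two document vectors (rows of the truncated V matrix).
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        double[] d1 = { 0.4652, -0.6738 };   // D1' (row 1 of V')
        double[] d3 = { 0.5622, -0.2760 };   // D3' (row 3 of V')
        System.out.printf("sim(D1', D3') = %.4f%n", cosine(d1, d3));   // ~0.8727
    }
}
```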

Result of the Simple Data Set
label 1: 1 3
label 2: 2 4
label 1:
  D1: Shipment of gold damaged in a fire
  D3: Shipment of gold arrived in a truck
label 2:
  D2: Delivery of silver arrived in a silver truck
  D4: Gold Silver Truck
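
The pairing step the slides describe (match each document with its most similar document and give matched documents the same label) could be sketched like this, reusing the cosine helper above; this is an assumption about the implementation, not the project's actual code.

```java
public class PairByCosine {
    public static void main(String[] args) {
        double[][] vRows = {
            { 0.4652, -0.6738 },   // D1'
            { 0.6406,  0.6401 },   // D2'
            { 0.5622, -0.2760 },   // D3'
            { 0.2391,  0.2450 }    // D4'
        };
        for (int i = 0; i < vRows.length; i++) {
            int best = -1;
            double bestSim = Double.NEGATIVE_INFINITY;
            for (int j = 0; j < vRows.length; j++) {
                if (j == i) continue;
                double sim = CosineSimilarity.cosine(vRows[i], vRows[j]);
                if (sim > bestSim) { bestSim = sim; best = j; }
            }
            // D1 pairs with D3 and D2 pairs with D4, giving label 1: {1, 3}, label 2: {2, 4}.
            System.out.printf("D%d pairs with D%d (sim = %.4f)%n", i + 1, best + 1, bestSim);
        }
    }
}
```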

Check the Clusters Using SVM
Now that we have the labels, we can use them to train SVM.
SVM input format on the original data (label followed by index:value pairs):
1 1:1.00 2:0.00 3:1.00 4:0.00 5:1.00 6:1.00 7:1.00 8:1.00 9:1.00 10:0.00 11:0.00
2 1:1.00 2:1.00 3:0.00 4:1.00 5:0.00 6:0.00 7:1.00 8:1.00 9:0.00 10:2.00 11:1.00
1 1:1.00 2:1.00 3:0.00 4:0.00 5:0.00 6:1.00 7:1.00 8:1.00 9:1.00 10:0.00 11:1.00
2 1:0.00 2:0.00 3:0.00 4:0.00 5:0.00 6:1.00 7:0.00 8:0.00 9:0.00 10:1.00 11:1.00

Results from SVM's Prediction on Original Data

Documents Used for Training   Document to Predict   SVM Prediction Result   SVD Cluster Result
D1, D2, D3                    D4                    1.0                     2
D1, D2, D4                    D3                                            1
D1, D3, D4                    D2                    2.0
D2, D3, D4                    D1

Source: Author's Research

Using the Truncated V Matrix
Since we want to reduce data size, it is more practical to use the truncated V matrix.
SVM input format (truncated V matrix):
1 1:0.4652 2:-0.6738
2 1:0.6406 2:0.6401
1 1:0.5622 2:-0.2760
2 1:0.2391 2:0.2450
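
A hedged sketch of training and predicting with LIBSVM's Java API on the reduced data, mirroring the leave-one-out experiment in the table that follows (training on D1-D3 and predicting D4); parameter values are illustrative, not the project's exact settings.

```java
import libsvm.*;

public class TrainPredict {
    // Build a two-feature LIBSVM instance (index:value pairs, as in the input format above).
    static svm_node[] row(double f1, double f2) {
        svm_node a = new svm_node(); a.index = 1; a.value = f1;
        svm_node b = new svm_node(); b.index = 2; b.value = f2;
        return new svm_node[] { a, b };
    }

    public static void main(String[] args) {
        // Train on D1, D2, D3; hold out D4.
        svm_problem prob = new svm_problem();
        prob.l = 3;
        prob.y = new double[] { 1, 2, 1 };
        prob.x = new svm_node[][] {
            row(0.4652, -0.6738),   // D1, label 1
            row(0.6406,  0.6401),   // D2, label 2
            row(0.5622, -0.2760)    // D3, label 1
        };

        svm_parameter param = new svm_parameter();
        param.svm_type    = svm_parameter.C_SVC;
        param.kernel_type = svm_parameter.RBF;
        param.gamma       = 0.5;     // illustrative
        param.C           = 1000;    // large C, as described earlier
        param.cache_size  = 100;
        param.eps         = 1e-3;

        svm_model model = svm.svm_train(prob, param);
        double prediction = svm.svm_predict(model, row(0.2391, 0.2450));   // D4
        System.out.println("Predicted label for D4: " + prediction);
    }
}
```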

SVM Results from the Truncated V Matrix
Results from SVM's Prediction on Reduced Data

Documents Used for Training   Document to Predict   SVM Prediction Result   SVD Cluster Result
D1, D2, D3                    D4                    2.0                     2
D1, D2, D4                    D3                    1.0                     1
D1, D3, D4                    D2
D2, D3, D4                    D1

Using the truncated V matrix gives better results.
Source: Author's Research

Vector Documents on a Graph Source: Author’s Research

Analysis of the Rank Approximation
Cluster Results from Different Rank Approximations

Rank 1: D1: 4, D2: 4, D3: 4, D4: 3   ->  label 1: 1 4 2 3
Rank 2: D1: 3, D3: 1, D4: 2          ->  label 1: 1 3; label 2: 2 4
Rank 3: D2: 3, D1: 2                 ->  label 1: 1 3 2 4
Rank 4: D3: 2                        ->  label 1: 1 2 3 4

Source: Author's Research

Program Process Flow
- Use the previous methods on larger data sets.
- Compare the results with those of human classification.
(figure: Program Process Flow)

Conceptual Exploration
Reuters-21578:
- a collection of newswire articles that have been human-classified by Carnegie Group, Inc. and Reuters, Ltd.
- the most widely used data set for text categorization

Result Analysis: Clustering with SVD vs. Human Classification
First Data Set from Reuters-21578 (200 x 9928)

Rank       # of Naturally Formed Clusters (SVD)   SVD Cluster Accuracy   SVM Prediction Accuracy
Rank 002   80                                     75.0%                  65.0%
Rank 005   66                                     81.5%                  82.0%
Rank 010                                          60.5%                  54.0%
Rank 015   64                                     52.0%                  51.5%
Rank 020   67                                     38.0%                  46.5%
Rank 030   72                                     60.0%
Rank 040                                          62.5%                  58.5%
Rank 050   73                                     54.5%
Rank 100   75                                     45.5%

Source: Author's Research

Result Analysis: Clustering with SVD vs. Human Classification
Second Data Set from Reuters-21578 (200 x 9928)

Rank       # of Naturally Formed Clusters (SVD)   SVD Cluster Accuracy   SVM Prediction Accuracy
Rank 002   76                                     67.0%                  84.5%
Rank 005   73
Rank 010   64                                     70.0%                  85.5%
Rank 015                                          63.0%                  81.0%
Rank 020   67                                     59.5%                  50.0%
Rank 030   69                                     68.5%                  83.5%
Rank 040                                          59.0%                  79.0%
Rank 050                                          44.5%                  25.5%
Rank 100   71                                     52.0%                  47.0%

Source: Author's Research

Result Analysis
- The highest accuracy for SVD clustering is 81.5%.
- Lower rank values seem to give better results.
- SVM predictions are about as accurate as the SVD clusters.

Result Analysis: Why are the results not higher?
- Human classification is more subjective than a program.
- Reducing many small clusters to only 2 clusters by computing the average may decrease the accuracy.

Conclusion
- Showed how SVM works and explored its strengths.
- Showed how SVD can be used for clustering.
- Analyzed simple and complex data; the method seems to cluster data reasonably.
Our method is able to:
- reduce data size (by using the truncated V matrix)
- cluster data reasonably
- classify new data efficiently (based on SVM)
By combining known methods, we created a form of unsupervised SVM.

Future Work
- Extend SVD to very large data sets that can only be stored in secondary storage.
- Look for more efficient SVM kernels.

Thank You!

References
Bennett, K. P., & Campbell, C. (2000). Support Vector Machines: Hype or Hallelujah? ACM SIGKDD Explorations, Vol. 2, No. 2, 1-13.
Chang, C., & Lin, C. (2006). LIBSVM: A Library for Support Vector Machines. Retrieved November 29, 2006, from http://www.csie.ntu.edu.tw/~cjlin/libsvm
Cristianini, N. (2001). Support Vector and Kernel Machines. Retrieved November 29, 2005, from http://www.support-vector.net/icml-tutorial.pdf
Cristianini, N., & Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines. Cambridge, UK: Cambridge University Press.
Garcia, E. (2006). SVD and LSI Tutorial 4: Latent Semantic Indexing (LSI) How-to Calculations. Retrieved November 28, 2006, from http://www.miislita.com/information-retrieval-tutorial/svd-lsi-tutorial-4-lsi-how-to-calculations.html
Guestrin, C. (2006). Machine Learning. Retrieved November 8, 2006, from http://www.cs.cmu.edu/~guestrin/Class/10701/
Hicklin, J., Moler, C., & Webb, P. (2005). JAMA: A Java Matrix Package. Retrieved November 28, 2006, from http://math.nist.gov/javanumerics/jama/
Joachims, T. (1998). Text Categorization with Support Vector Machines: Learning with Many Relevant Features. http://www.cs.cornell.edu/People/tj/publications/joachims_98a.pdf
Joachims, T. (2004). Support Vector Machines. Retrieved November 28, 2006, from http://svmlight.joachims.org/
Reuters-21578 Text Categorization Test Collection. Retrieved November 28, 2006, from http://www.daviddlewis.com/resources/testcollections/reuters21578/
SVM - Support Vector Machines. DTREG. Retrieved November 28, 2006, from http://www.dtreg.com/svm.htm
Vapnik, V. N. (2000, 1995). The Nature of Statistical Learning Theory. Springer-Verlag New York, Inc.