Concept Decomposition for Large Sparse Text Data Using Clustering

1 Concept Decomposition for Large Sparse Text Data Using Clustering
Dhillon, I. S. and Modha, D. S. Machine Learning, 42(1), 2001.
Nov. 9, 2001. Summarized by Jeong-Ho Chang.

2 Introduction
Studies a spherical k-means algorithm for clustering document vectors.
Empirically demonstrates that the clusters produced exhibit a certain "fractal-like" and "self-similar" behavior.
Matrix approximation by concept decomposition: explores the intimate connections between clustering with the spherical k-means algorithm and the problem of matrix approximation for word-by-document matrices.

3 Vector Space Model for Text
The term weighting component depends on the number of occurrences of word j in document i.
The global weighting component depends on the number of documents that contain word j.
The normalization component scales each document vector to unit length.
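The three components above can be sketched in Python with NumPy. The toy count matrix and the specific choice of term frequency with inverse document frequency are illustrative assumptions (the paper considers several weighting schemes):

```python
import numpy as np

# Toy word-by-document count matrix: rows = words, columns = documents.
counts = np.array([
    [2, 0, 1],
    [0, 3, 0],
    [1, 1, 4],
], dtype=float)

n_words, n_docs = counts.shape

# Term weighting component: raw term frequency of word j in document i.
tf = counts

# Global weighting component: inverse document frequency of word j,
# based on the number of documents that contain the word.
df = np.count_nonzero(counts, axis=1)
idf = np.log(n_docs / df)

weighted = tf * idf[:, None]

# Normalization component: scale each document (column) to unit norm.
x = weighted / np.linalg.norm(weighted, axis=0)
```

After normalization each column of x is a unit vector, which is the form the spherical k-means algorithm assumes.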

4 The Spherical k-means Algorithm

5 Concept Vectors
Cosine similarity: documents are compared by the cosine of the angle between their unit vectors.
Concept vector: the mean vector of a cluster, normalized to unit length.
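As a small sketch (the two unit document vectors are made up), the concept vector is the cluster's mean vector renormalized to unit length, and cosine similarity with unit vectors reduces to a dot product:

```python
import numpy as np

# Two unit document vectors in one cluster (columns); values are made up.
docs = np.array([
    [0.8, 0.6],
    [0.6, 0.8],
])

mean = docs.mean(axis=1)                 # mean vector of the cluster
concept = mean / np.linalg.norm(mean)    # concept vector: unit-norm mean

# Cosine similarity of each (unit) document with the concept vector
# is just the dot product.
cosines = concept @ docs
```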

6 Spherical k-means (1/4) Objective function and optimal partitioning
The objective function measures the "coherence" or "quality" of each cluster: the sum of cosine similarities between the documents and their cluster's concept vector.

7 Spherical k-means (2/4)
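The transcript omits the algorithm itself; below is a minimal sketch of spherical k-means as described in the paper. The function name and the random-document initialization are my own choices, not the paper's:

```python
import numpy as np

def spherical_kmeans(x, k, iters=50, seed=0):
    """Spherical k-means on unit-norm document vectors (columns of x).

    Alternates between (1) assigning each document to the concept
    vector with the largest cosine similarity and (2) recomputing each
    concept vector as the normalized mean of its cluster.
    """
    rng = np.random.default_rng(seed)
    d, n = x.shape
    # Initialize concept vectors from k distinct random documents.
    concepts = x[:, rng.choice(n, size=k, replace=False)].copy()
    assign = np.zeros(n, dtype=int)
    for _ in range(iters):
        sims = concepts.T @ x              # k x n cosine similarities
        assign = sims.argmax(axis=0)
        for j in range(k):
            members = x[:, assign == j]
            if members.shape[1] > 0:
                m = members.mean(axis=1)
                concepts[:, j] = m / np.linalg.norm(m)
    # Objective: sum of cosines of each document with its concept vector.
    objective = (concepts.T @ x).max(axis=0).sum()
    return concepts, assign, objective
```

Because each document is a unit vector, the objective is at most n, and each iteration can only increase it (the monotonicity property discussed on the convergence slides).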

8 Spherical k-means (3/4): Convergence
Monotone: the objective function value is non-decreasing from one iteration to the next.

9 Spherical k-means (4/4): Convergence
Bounded: the objective function is also bounded above, so the limit exists.
This does not imply that the underlying partitioning converges.

10 Experimental Results (1/4)
Data sets
CLASSIC3 data set: 3,893 documents, MEDLINE (1,033), CISI (1,460), CRANFIELD (1,400); 4,099 words after preprocessing; uses only term frequency.
NSF data set: 13,297 abstracts of grants awarded by the NSF; 5,298 words after preprocessing; uses term frequency and inverse document frequency.

11 Experimental Results (2/4)
Confusion matrix for the CLASSIC3 data.
Objective function plot.

12 Experimental Results (3/4)
Intra-cluster structure

13 Experimental Results (4/4)
Inter-cluster structure

14 Relation with Euclidean k-means Algorithms
Euclidean k-means can also be thought of as a matrix approximation problem.

15 Matrix Approximation using Clustering

16 Clustering as Matrix Approximation
Formulation: X is the word-by-document matrix; the approximation X̂ replaces each column x_i with the concept vector closest to x_i.
How effective is the approximation? Measured by the Frobenius norm of the residual, ||X - X̂||_F.
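A sketch of this approximation in NumPy. Random matrices stand in for real document and concept vectors; in practice the concept vectors would come from a spherical k-means run:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins: x is a word-by-document matrix with unit-norm columns,
# concepts holds k unit-norm concept vectors.
x = rng.random((5, 8))
x /= np.linalg.norm(x, axis=0)
concepts = rng.random((5, 2))
concepts /= np.linalg.norm(concepts, axis=0)

# Approximation: replace each column x_i by the concept vector
# closest to it (largest cosine similarity).
assign = (concepts.T @ x).argmax(axis=0)
x_hat = concepts[:, assign]

# Effectiveness of the approximation: Frobenius norm of the residual.
error = np.linalg.norm(x - x_hat, "fro")
```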

17 Concept Decomposition (1/2)
Formulation: the concept decomposition approximates X by its least-squares fit in the span of the concept vectors.
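A sketch of the least-squares step, again with random stand-ins for X and the concept matrix C: the concept decomposition solves min over Z of ||X - C Z||_F, so its residual can only be smaller than that of the plain clustering approximation above.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-ins: x (words x documents) and c (words x k concept vectors).
x = rng.random((6, 10))
x /= np.linalg.norm(x, axis=0)
c = rng.random((6, 3))
c /= np.linalg.norm(c, axis=0)

# Least-squares fit of each document in the span of the concept vectors:
# z minimizes ||x - c @ z||_F.
z, *_ = np.linalg.lstsq(c, x, rcond=None)
x_cd = c @ z                               # concept decomposition of X
error = np.linalg.norm(x - x_cd, "fro")
```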

18 Concept Decomposition (2/2)

19 Concept Vectors and Singular Vectors: A Comparison

20 Concept vectors are local and sparse (1/6)
Locality Three concept vectors for CLASSIC3 data

21 Concept vectors are local and sparse (2/6)
Three singular vectors for CLASSIC3 data

22 Concept vectors are local and sparse (3/6)
Four (among 10) concept vectors for NSF data

23 Concept vectors are local and sparse (4/6)
Four (among 10) singular vectors for NSF data

24 Concept vectors are local and sparse (5/6)
Sparsity: as the number of clusters increases, the concept vectors become progressively sparser.

25 Concept vectors are local and sparse (6/6)
Orthonormality: as the number of clusters increases, the concept vectors tend towards "orthonormality."
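One way to quantify this trend (a sketch with random stand-in concept vectors): since the columns of C have unit norm, the Gram matrix C^T C has ones on its diagonal, and the largest off-diagonal magnitude measures the deviation from orthonormality.

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in: k = 5 unit-norm concept vectors as the columns of c.
c = rng.random((50, 5))
c /= np.linalg.norm(c, axis=0)

# For unit-norm columns, c.T @ c has ones on the diagonal; the largest
# off-diagonal magnitude measures how far c is from orthonormal
# (it is 0 exactly when the columns are orthonormal).
gram = c.T @ c
deviation = np.abs(gram - np.eye(5)).max()
```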

26 Principal Angles: Comparing Concept and Singular subspaces (1/4)
Principal angles generalize the notion of the angle between two lines to higher-dimensional subspaces of R^d.
Formulation: F and G are subspaces of R^d; they are compared by the average cosine of their principal angles.
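A sketch of the computation (the function name is mine): given orthonormal bases Q_F and Q_G for the two subspaces, the cosines of the principal angles are the singular values of Q_F^T Q_G.

```python
import numpy as np

def avg_principal_cosine(f, g):
    """Average cosine of the principal angles between the subspaces
    spanned by the columns of f and g (both d x p matrices)."""
    qf, _ = np.linalg.qr(f)   # orthonormal basis for span(f)
    qg, _ = np.linalg.qr(g)   # orthonormal basis for span(g)
    # Singular values of qf.T @ qg are the principal-angle cosines.
    return np.linalg.svd(qf.T @ qg, compute_uv=False).mean()
```

Identical subspaces give an average cosine of 1; orthogonal subspaces give 0.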

27 Principal Angles: Comparing Concept and Singular subspaces (2/4)
CLASSIC3 data set With singular subspace S3 With singular subspace S10

28 Principal Angles: Comparing Concept and Singular subspaces (3/4)
NSF data set (1/2) With singular subspace S64

29 Principal Angles: Comparing Concept and Singular subspaces (4/4)
NSF data set (2/2) With singular subspace S235

30 Conclusions
Presents a spherical k-means algorithm for text documents, which are high-dimensional and sparse.
Average cluster coherence tends to be quite low: there is a large void surrounding each concept vector, which is uncommon for low-dimensional, dense data sets.
The concept decompositions derived from concept vectors can be used for matrix approximation, with accuracy comparable to that of truncated SVDs.
The concept vectors constitute a powerful sparse and localized "basis" for text data sets.

