
1 Information-Theoretic Co-Clustering Inderjit S. Dhillon et al. University of Texas, Austin presented by Xuanhui Wang

2 Introduction Clustering – Group "similar" objects together. – Typically the data is represented as a two-dimensional co-occurrence matrix, e.g. the document-term co-occurrence matrix in text analysis.

3 One-dimensional Clustering Document Clustering: – Treat each row as one document – Define a similarity measure – Cluster the documents using e.g. k-means (see the sketch below) Term Clustering: – Symmetric with document clustering Doc-Term Co-occurrence Matrix
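As a concrete illustration (not part of the original slides), one-dimensional document clustering of the rows of a doc-term matrix might look like this minimal sketch, assuming a small dense count matrix and scikit-learn's KMeans; the data is a toy placeholder.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy document-term co-occurrence matrix: rows = documents, columns = terms.
doc_term = np.array([
    [5, 4, 0, 0],
    [4, 6, 1, 0],
    [0, 1, 7, 5],
    [0, 0, 6, 4],
], dtype=float)

# One-dimensional clustering: cluster documents (rows) only.
doc_clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(doc_term)
print("document clusters:", doc_clusters)

# Term clustering is symmetric: run the same procedure on the transpose.
term_clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(doc_term.T)
print("term clusters:", term_clusters)
```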

4 Idea of Co-Clustering Characteristics of co-occurrence matrices: – Data sparseness – High dimensionality – Noise Motivation: – Is it possible to combine document and term clustering? Can they bootstrap each other? Yes: co-clustering simultaneously clusters the rows X and columns Y of the co-occurrence matrix.

5 Information-Theoretic Co-Clustering View the (scaled) co-occurrence matrix as a joint probability distribution between row and column random variables X and Y. We seek a hard clustering of both dimensions such that the loss in mutual information, I(X;Y) − I(X̂;Ŷ) (where X̂ and Ŷ denote the clustered row and column variables), is minimized for a fixed number of row and column clusters.

6 Example Mutual information between random variables X and Y: I(X;Y) = Σ_x Σ_y p(x,y) log [ p(x,y) / (p(x) p(y)) ]. [The original slide shows an example joint distribution p(x,y) and a co-clustering of it.] It can be verified that this co-clustering achieves the minimum mutual information loss.
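A small numeric sketch (not from the slides) of how the mutual information of a joint distribution matrix can be computed; the matrix values below are illustrative only.

```python
import numpy as np

# Joint distribution p(x, y): rows index X, columns index Y; entries sum to 1.
p = np.array([
    [0.20, 0.05, 0.00],
    [0.05, 0.20, 0.00],
    [0.00, 0.05, 0.45],
])

px = p.sum(axis=1)        # marginal p(x)
py = p.sum(axis=0)        # marginal p(y)
outer = np.outer(px, py)  # product of marginals p(x) p(y)

# I(X;Y) = sum_{x,y} p(x,y) log[ p(x,y) / (p(x) p(y)) ], with the convention 0 log 0 = 0.
mask = p > 0
mutual_info = np.sum(p[mask] * np.log2(p[mask] / outer[mask]))
print(f"I(X;Y) = {mutual_info:.4f} bits")
```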

7 Information-Theoretic Co-clustering (Lemma) The loss in mutual information equals D( p(X,Y) || q(X,Y) ), where q(x,y) = p(x̂,ŷ) p(x|x̂) p(y|ŷ) for x ∈ x̂ and y ∈ ŷ. – It can be shown that q(x,y) is a "maximum entropy" approximation to p(x,y). – q(x,y) preserves the marginals: q(x) = p(x) and q(y) = p(y).

8 Given a co-clustering result, we can compute three distributions: p(x̂,ŷ), p(x|x̂) and p(y|ŷ). From these we obtain the approximation q(x,y) = p(x̂,ŷ) p(x|x̂) p(y|ŷ), as sketched below.
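The following sketch (with assumed variable names, not code from the paper) shows how the three distributions and the approximation q(x, y) can be assembled from a joint matrix p and hard row/column assignments:

```python
import numpy as np

def coclustering_approximation(p, row_labels, col_labels, k, l):
    """Build q(x, y) = p(x_hat, y_hat) p(x | x_hat) p(y | y_hat) for hard assignments."""
    px, py = p.sum(axis=1), p.sum(axis=0)
    R = np.eye(k)[row_labels]              # row-cluster indicator matrix, shape (m, k)
    C = np.eye(l)[col_labels]              # column-cluster indicator matrix, shape (n, l)

    # 1) Joint distribution over cluster pairs: p(x_hat, y_hat).
    p_clusters = R.T @ p @ C               # shape (k, l)

    # 2) p(x | x_hat) = p(x) / p(x_hat) for x in its cluster x_hat.
    p_x_given = px / p_clusters.sum(axis=1)[row_labels]

    # 3) p(y | y_hat) = p(y) / p(y_hat) for y in its cluster y_hat.
    p_y_given = py / p_clusters.sum(axis=0)[col_labels]

    # q(x, y) = p(x_hat, y_hat) * p(x | x_hat) * p(y | y_hat)
    return p_clusters[np.ix_(row_labels, col_labels)] * np.outer(p_x_given, p_y_given)
```

For any hard co-clustering this q preserves the row and column marginals of p (q(x) = p(x), q(y) = p(y)), which is the property stated on slide 7.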

9 Preserving Mutual Information Lemma: the loss D( p(X,Y) || q(X,Y) ) can be written as a weighted sum of KL divergences between each row distribution p(Y|x) and its row-cluster distribution q(Y|x̂) (see the restatement below). Note that q(Y|x̂) may be thought of as the "prototype" of row cluster x̂ (whereas the usual "centroid" of the cluster would be the average of its row distributions p(Y|x)). A symmetric statement holds for columns, with p(X|y) and the column-cluster prototype q(X|ŷ).
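Restating the lemma in symbols (reconstructed from the paper's notation, where x̂ denotes the cluster of row x and ŷ the cluster of column y):

```latex
D\big(p(X,Y)\,\|\,q(X,Y)\big)
  = \sum_{\hat{x}} \sum_{x \in \hat{x}} p(x)\, D\big(p(Y \mid x)\,\|\,q(Y \mid \hat{x})\big),
\qquad
q(y \mid \hat{x}) = p(\hat{y} \mid \hat{x})\, p(y \mid \hat{y}) \quad \text{for } y \in \hat{y}.
```

The symmetric identity for columns justifies the alternating row/column reassignment steps of the algorithm on slide 11.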

10 Example – Cont'd [Numeric example matrices from the original slide omitted.]

11 Co-Clustering Algorithm 1. Given a partition, compute the "prototype" of each row cluster. 2. Assign each row x to its nearest cluster. 3. Update the probabilities based on the new row clusters, then compute the new column-cluster "prototypes". 4. Assign each column y to its nearest cluster. 5. Update the probabilities based on the new column clusters, then compute the new row-cluster "prototypes". 6. If converged, stop; otherwise go to Step 2. (A runnable sketch follows below.)
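A minimal, hedged sketch of the algorithm above in NumPy (function and variable names are my own; the paper also monitors the loss in mutual information for convergence, whereas this sketch simply runs a fixed number of iterations):

```python
import numpy as np

def kl_div(a, b):
    """D(a || b) along the last axis, with 0 log 0 := 0 and b floored to avoid log(0)."""
    b = np.maximum(b, 1e-12)
    return np.where(a > 0, a * np.log2(np.maximum(a, 1e-12) / b), 0.0).sum(axis=-1)

def co_cluster(p, k, l, n_iter=20, seed=0):
    """Information-theoretic co-clustering of joint matrix p into k row and l column clusters."""
    rng = np.random.default_rng(seed)
    m, n = p.shape
    px, py = p.sum(axis=1), p.sum(axis=0)          # marginals (assumed strictly positive)
    rows = rng.integers(k, size=m)                 # initial row assignment
    cols = rng.integers(l, size=n)                 # initial column assignment
    p_y_given_x = p / px[:, None]                  # p(Y|x): each row of p normalized
    p_x_given_y = (p / py[None, :]).T              # p(X|y): each column of p normalized

    for _ in range(n_iter):
        # Steps 1-2: row-cluster prototypes q(Y|x_hat), then reassign rows by KL divergence.
        R, C = np.eye(k)[rows], np.eye(l)[cols]
        pc = R.T @ p @ C                                            # p(x_hat, y_hat)
        p_yhat_given_xhat = pc / np.maximum(pc.sum(axis=1, keepdims=True), 1e-12)
        p_y_given_yhat = py / np.maximum(pc.sum(axis=0), 1e-12)[cols]
        q_row_proto = p_yhat_given_xhat[:, cols] * p_y_given_yhat   # shape (k, n)
        rows = np.argmin([kl_div(p_y_given_x, q_row_proto[c]) for c in range(k)], axis=0)

        # Steps 3-4: column-cluster prototypes q(X|y_hat), then reassign columns.
        pc = np.eye(k)[rows].T @ p @ C                              # updated p(x_hat, y_hat)
        p_xhat_given_yhat = pc / np.maximum(pc.sum(axis=0, keepdims=True), 1e-12)
        p_x_given_xhat = px / np.maximum(pc.sum(axis=1), 1e-12)[rows]
        q_col_proto = p_xhat_given_yhat[rows, :].T * p_x_given_xhat # shape (l, m)
        cols = np.argmin([kl_div(p_x_given_y, q_col_proto[c]) for c in range(l)], axis=0)
        # Steps 5-6: probabilities and row prototypes are refreshed at the top of the next pass.
    return rows, cols
```

On small synthetic joint matrices with clear block structure, co_cluster(p, 2, 2) should recover the blocks; for real data one would also track the loss in mutual information and stop once it no longer decreases, as in step 6.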

12 Properties of the Co-clustering Algorithm Theorem: the co-clustering algorithm monotonically decreases the loss in mutual information (the objective function value). The marginals p(x) and p(y) are preserved at every step (q(x) = p(x) and q(y) = p(y)).

13 [Figure-only slide in the original presentation.]

14 Experiments Data sets: – 20 Newsgroups data: 20 classes, 20,000 documents – Classic3 data set: 3 classes (CISI, MED and CRAN), 3893 documents
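To reproduce this kind of setup, a document-term co-occurrence matrix for 20 Newsgroups can be built roughly as follows (a sketch using scikit-learn; preprocessing choices such as the vocabulary size are assumptions, not the paper's):

```python
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer

# Document-term count matrix for the 20 Newsgroups corpus.
news = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes"))
counts = CountVectorizer(stop_words="english", max_features=2000).fit_transform(news.data)

# Scale the counts into a joint distribution p(document, term).
p = counts.toarray().astype(float)   # dense only for illustration; keep it sparse for real runs
p /= p.sum()
print(p.shape, p.sum())              # (n_documents, 2000), entries summing to 1.0
```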

15 Results – CLASSIC3 [Confusion matrices shown on the original slide.] 1-D Clustering (0.821) vs. Co-Clustering (0.9835)

16 Results (Monotonicity) Loss in mutual information decreases monotonically with the number of iterations.

17 Conclusions Information-theoretic approaches to clustering and co-clustering. Co-clustering intertwines row and column clustering at all stages and is guaranteed to reach a local minimum. It can deal with high-dimensional, sparse data efficiently.

18 Remarks A theoretically solid paper! It is like k-means or EM in spirit, but uses a different formula to compute the cluster "prototype" (the centroid in k-means). It requires the number of row and column clusters to be specified in advance.

19 Thank you!

