Information-Theoretic Co-Clustering. Inderjit S. Dhillon et al., University of Texas at Austin. Presented by Xuanhui Wang.

Introduction. Clustering:
– Group "similar" objects together.
– Typically, the data is represented as a two-dimensional co-occurrence matrix, e.g. in text analysis, the document-term co-occurrence matrix.

One-dimensional Clustering (on the doc-term co-occurrence matrix). Document clustering:
– Treat each row as one document.
– Define a similarity measure.
– Cluster the documents using e.g. k-means (see the sketch below).
Term clustering:
– Symmetric with document clustering, applied to the columns.
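As a concrete illustration of this baseline, here is a minimal sketch using NumPy and scikit-learn; the toy count matrix, the cosine-style normalisation, and the choice of two clusters are illustrative assumptions, not details from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

# Toy document-term count matrix (rows = documents, columns = terms).
counts = np.array([
    [4, 3, 0, 0, 1],
    [5, 2, 0, 1, 0],
    [0, 0, 6, 4, 0],
    [1, 0, 5, 3, 0],
])

# Document clustering: treat each row as a document vector and run k-means.
doc_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    normalize(counts))            # L2-normalise rows before clustering

# Term clustering is symmetric: cluster the columns of the same matrix.
term_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    normalize(counts.T))

print(doc_labels, term_labels)
```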

Idea of Co-Clustering. Characteristics of co-occurrence matrices:
– Data sparseness.
– High dimensionality.
– Noise.
Motivation:
– Is it possible to combine document and term clustering? Can they bootstrap each other?
– Yes: co-clustering simultaneously clusters the rows X and the columns Y of the co-occurrence matrix.

Information-Theoretic Co-Clustering.
– View the (scaled) co-occurrence matrix as a joint probability distribution p(X, Y) between the row and column random variables.
– We seek a hard clustering of both dimensions such that the loss in mutual information is minimized, given a fixed number of row and column clusters (see the objective below).
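Written out, the objective is to pick the row clustering X̂ and the column clustering Ŷ that minimise the loss in mutual information:

```latex
\min_{\hat{X}, \hat{Y}} \; \bigl[\, I(X; Y) \;-\; I(\hat{X}; \hat{Y}) \,\bigr]
```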

Example. Mutual information between random variables X and Y is defined as I(X; Y) = Σ_{x,y} p(x, y) log [ p(x, y) / (p(x) p(y)) ]. The slide's worked example (a small co-occurrence matrix, omitted from this transcript) shows a co-clustering for which it can be verified that the mutual information loss is the minimum; a small helper for computing these quantities is sketched below.
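For completeness, a short helper for computing mutual information from a joint-distribution matrix; the example matrix here is made up for illustration and is not the one from the slide.

```python
import numpy as np

def mutual_information(P):
    """I(X; Y) for a joint distribution given as a 2-D array that sums to 1."""
    px = P.sum(axis=1, keepdims=True)        # marginal p(x)
    py = P.sum(axis=0, keepdims=True)        # marginal p(y)
    mask = P > 0                             # use the convention 0 * log 0 = 0
    return float(np.sum(P[mask] * np.log2(P[mask] / (px @ py)[mask])))

# Illustrative joint distribution (NOT the matrix from the slide).
P = np.array([[0.20, 0.05, 0.00],
              [0.05, 0.20, 0.00],
              [0.00, 0.05, 0.45]])
print(mutual_information(P))                 # mutual information in bits
```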

Information-Theoretic Co-clustering (Lemma). The loss in mutual information equals the KL divergence between the true distribution p(x, y) and an approximation q(x, y) determined by the co-clustering (see below), where:
– It can be shown that q(x, y) is a "maximum entropy" approximation to p(x, y).
– q(x, y) preserves the marginals: q(x) = p(x) and q(y) = p(y).
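The formula missing from the transcript is, following the lemma in the Dhillon et al. paper:

```latex
I(X; Y) - I(\hat{X}; \hat{Y}) \;=\; D\bigl(\, p(X, Y) \,\|\, q(X, Y) \,\bigr),
\qquad
q(x, y) \;=\; p(\hat{x}, \hat{y})\, p(x \mid \hat{x})\, p(y \mid \hat{y}),
\quad x \in \hat{x},\; y \in \hat{y},
```

where D(· || ·) denotes Kullback-Leibler divergence.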

Given a co-clustering result, we can compute three distributions: p(x̂, ŷ), p(x | x̂), and p(y | ŷ). From these we then get the approximation q(x, y) = p(x̂, ŷ) p(x | x̂) p(y | ŷ), as in the sketch below.
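A minimal NumPy sketch of this construction, assuming hard cluster labels for rows and columns; the function name and the tiny example are illustrative, not taken from the paper.

```python
import numpy as np

def coclustering_approximation(P, rows, cols):
    """q(x, y) = p(xhat, yhat) * p(x | xhat) * p(y | yhat) for a joint
    distribution P (sums to 1) and hard row/column cluster labels."""
    px, py = P.sum(axis=1), P.sum(axis=0)            # marginals p(x), p(y)
    k, l = rows.max() + 1, cols.max() + 1
    # p(xhat, yhat): total mass of each co-cluster
    Pc = np.array([[P[np.ix_(rows == a, cols == b)].sum() for b in range(l)]
                   for a in range(k)])
    p_xhat = np.array([px[rows == a].sum() for a in range(k)])
    p_yhat = np.array([py[cols == b].sum() for b in range(l)])
    p_x_given = px / p_xhat[rows]                    # p(x | xhat) at x's own cluster
    p_y_given = py / p_yhat[cols]                    # p(y | yhat) at y's own cluster
    return Pc[rows][:, cols] * np.outer(p_x_given, p_y_given)

# Tiny made-up example: the marginals of q match those of p.
P = np.array([[0.20, 0.05, 0.00],
              [0.05, 0.20, 0.00],
              [0.00, 0.05, 0.45]])
rows, cols = np.array([0, 0, 1]), np.array([0, 0, 1])
q = coclustering_approximation(P, rows, cols)
print(np.allclose(q.sum(axis=1), P.sum(axis=1)))     # q(x) == p(x)
print(np.allclose(q.sum(axis=0), P.sum(axis=0)))     # q(y) == p(y)
```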

Preserving Mutual Information. Lemma: the loss in mutual information can be written as a weighted sum of KL divergences between rows and their row-cluster "prototypes" (see below). Note that q(Y | x̂) may be thought of as the "prototype" of row cluster x̂ (whereas the usual "centroid" of the cluster would be a weighted average of the rows p(Y | x) for x ∈ x̂). Similarly, the loss can be expressed in terms of column-cluster prototypes q(X | ŷ).
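The identity the lemma refers to, reconstructed from the original paper since the slide's formula is not in the transcript, is:

```latex
D\bigl(p(X,Y)\,\|\,q(X,Y)\bigr)
 \;=\; \sum_{\hat{x}} \sum_{x \in \hat{x}} p(x)\,
       D\bigl(p(Y \mid x) \,\|\, q(Y \mid \hat{x})\bigr)
 \;=\; \sum_{\hat{y}} \sum_{y \in \hat{y}} p(y)\,
       D\bigl(p(X \mid y) \,\|\, q(X \mid \hat{y})\bigr),
```

where the row-cluster prototype is q(y | x̂) = p(ŷ | x̂) q(y | ŷ) for y ∈ ŷ, and the column-cluster prototype is defined symmetrically.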

Example – Cont’d

Co-Clustering Algorithm (a Python sketch follows the list):
1. Given an initial partition, calculate the "prototype" of each row cluster.
2. Assign each row x to its nearest row cluster.
3. Update the probabilities based on the new row clusters, then compute the new column-cluster "prototypes".
4. Assign each column y to its nearest column cluster.
5. Update the probabilities based on the new column clusters, then compute the new row-cluster "prototypes".
6. If converged, stop; otherwise go to Step 2.
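A minimal, hypothetical NumPy sketch of these steps; the random initialisation, the fixed number of sweeps (in place of a convergence test), and all function names are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

EPS = 1e-12

def kl(p, q):
    """KL divergence D(p || q) between discrete distributions (with clipping)."""
    p, q = np.clip(p, EPS, None), np.clip(q, EPS, None)
    return float(np.sum(p * np.log(p / q)))

def reassign_rows(P, rows, cols, k, l):
    """One row pass: build the prototypes q(Y | xhat) and move every row x to
    the row cluster minimising D( p(Y|x) || q(Y|xhat) )."""
    m, n = P.shape
    px, py = P.sum(axis=1), P.sum(axis=0)
    # p(xhat, yhat): joint distribution aggregated over the co-clusters
    Pc = np.array([[P[np.ix_(rows == a, cols == b)].sum() for b in range(l)]
                   for a in range(k)])
    p_xhat = np.maximum(Pc.sum(axis=1), EPS)
    # prototypes: q(y | xhat) = p(yhat | xhat) * q(y | yhat)
    proto = np.zeros((k, n))
    for b in range(l):
        mask = cols == b
        p_yhat = max(py[mask].sum(), EPS)
        proto[:, mask] = np.outer(Pc[:, b] / p_xhat, py[mask] / p_yhat)
    new_rows = rows.copy()
    for i in range(m):
        p_y_given_x = P[i] / max(px[i], EPS)
        new_rows[i] = int(np.argmin([kl(p_y_given_x, proto[a]) for a in range(k)]))
    return new_rows

def itcc(counts, k, l, n_iter=20, seed=0):
    """Information-theoretic co-clustering sketch on a nonnegative count matrix."""
    P = counts / counts.sum()                       # joint distribution p(x, y)
    rng = np.random.default_rng(seed)
    rows = rng.integers(0, k, size=P.shape[0])      # step 1: initial partition
    cols = rng.integers(0, l, size=P.shape[1])
    for _ in range(n_iter):
        rows = reassign_rows(P, rows, cols, k, l)   # steps 1-3: row pass
        cols = reassign_rows(P.T, cols, rows, l, k) # steps 4-5: column pass
    return rows, cols
```

For example, itcc(counts, k=2, l=2) on the toy count matrix from the earlier sketch would return hard labels for the documents (rows) and the terms (columns).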

Properties of the Co-clustering Algorithm.
– Theorem: the co-clustering algorithm monotonically decreases the loss in mutual information (the objective function value); a helper to check this empirically is sketched below.
– The marginals p(x) and p(y) are preserved at every step, i.e. q(x) = p(x) and q(y) = p(y).
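One way to verify the monotone decrease empirically is to track the objective after each sweep; the helper below is an illustrative addition (not from the paper) that computes the loss I(X;Y) − I(X̂;Ŷ) for a given hard labelling.

```python
import numpy as np

def _mi(P):
    """I(X; Y) of a joint distribution given as a 2-D array that sums to 1."""
    px = P.sum(axis=1, keepdims=True)
    py = P.sum(axis=0, keepdims=True)
    mask = P > 0
    return float(np.sum(P[mask] * np.log2(P[mask] / (px @ py)[mask])))

def mi_loss(P, rows, cols):
    """Objective value: loss in mutual information, I(X;Y) - I(Xhat;Yhat)."""
    k, l = rows.max() + 1, cols.max() + 1
    Pc = np.zeros((k, l))
    np.add.at(Pc, (rows[:, None], cols[None, :]), P)   # aggregate into p(xhat, yhat)
    return _mi(P) - _mi(Pc)
```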

Experiments. Data sets:
– 20 Newsgroups data: 20 classes of documents.
– Classic3 data set: 3 classes (CISI, MED, and CRAN), 3,893 documents.

Results – CLASSIC3: 1D clustering (0.821) vs. co-clustering (0.9835).

Results (Monotonicity) Loss in mutual information decreases monotonically with the number of iterations.

Conclusions.
– An information-theoretic approach to clustering and co-clustering.
– Co-clustering intertwines row and column clusterings at all stages and is guaranteed to reach a local minimum.
– It deals with high-dimensional, sparse data efficiently.

Remarks. A theoretically solid paper! It is like k-means or EM in spirit, but it uses a different formula to compute the cluster "prototype" (the centroid in k-means). It requires the numbers of row and column clusters to be specified in advance.

Thank you!