Margareta Ackerman Joint work with Shai Ben-David Measures of Clustering Quality: A Working Set of Axioms for Clustering

The Theory-Practice Gap
Clustering is one of the most widely used tools for exploratory data analysis. Social sciences, biology, astronomy, computer science... all apply clustering to gain a first understanding of the structure of large data sets. Yet, there is distressingly little theoretical understanding of clustering.

Questions That Research on the Fundamentals of Clustering Should Address
Can clustering be given a formal and general definition? What is a “good” clustering? Can we distinguish “clusterable” from “structureless” data?

Inherent Obstacles
Clustering is not well defined: there is a wide variety of different clustering tasks, with different (often implicit) measures of quality. In most practical clustering tasks there is no clear ground truth to evaluate a solution by (in contrast with classification tasks, where a held-out labeled set can be used to evaluate the classifier). Moreover, a clustering may have different value to different users, e.g., clustering paintings by painter vs. by topic.

Common Solutions
Objective utility functions: sum of in-cluster distances, average distances to center points, cut weight, spectral clustering, etc. (Shmoys, Charikar, Meyerson, Luxburg, ...); analyze the computational complexity of the resulting discrete optimization problems.
Consider a restricted set of distributions (“generative models”), e.g., mixtures of Gaussians [Dasgupta ’99], [Vempala ’03], [Kannan et al. ’04], [Achlioptas, McSherry ’05]; recover the parameters of the model generating the data.
Add structure (“relevant information”), e.g., the information bottleneck approach [Tishby, Pereira, Bialek ’99]; factor out user-irrelevant information.
And many more...

Quest for a General Theory
What can we say independently of any specific algorithm, specific objective function, or specific generative data model? Clustering axioms: postulate axioms that, ideally, every clustering approach should satisfy, e.g., [Hartigan 1975], [Puzicha, Hofmann, Buhmann ’00], [Kleinberg ’02]. Such attempts usually conclude with negative results.

Our Formal Setup
For a finite domain set S, a distance function d specifies the distance between pairs of domain points. A clustering function maps
Input: a distance function d over S
to
Output: a partition (clustering) of S.

Kleinberg’s Work on Clustering Functions
Kleinberg proposes natural-looking “axioms” that distinguish clustering functions from other functions that output domain partitions.

Kleinberg’s Axioms
Scale Invariance: F(λd) = F(d) for all d and all strictly positive λ.
Consistency: If d’ equals d, except for shrinking distances within clusters of F(d) or stretching between-cluster distances, then F(d) = F(d’).
Richness: For any partition P of S, there exists a distance function d over S so that F(d) = P.
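To make the first axiom concrete, here is a minimal empirical sanity check. It is a sketch under assumptions not in the slides: the clustering function F is represented as a Python callable that takes an (n, n) pairwise-distance matrix and returns a canonical, hashable partition (e.g., a tuple of cluster labels).

```python
import numpy as np

def is_scale_invariant(F, d, scales=(0.5, 2.0, 10.0)):
    """Empirically test the Scale Invariance axiom: F(lambda * d) == F(d).

    F : clustering function; takes an (n, n) distance matrix and returns a
        canonical hashable partition (hypothetical interface, for illustration).
    d : (n, n) symmetric distance matrix.
    """
    base = F(d)
    # Check a few strictly positive scalings of the distance matrix.
    return all(F(s * d) == base for s in scales)
```

A failing check refutes scale invariance for F; a passing one is merely consistent with it, since the axiom quantifies over all d and all λ > 0.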

Theorem [Kleinberg, 2002]: These axioms are inconsistent; namely, no function can satisfy all three axioms. How come “axioms” that seem to capture our intuition about clustering are inconsistent? Our answer: the formalization of these axioms is stronger than the intuition they intend to capture. We express that same intuition in an alternative framework, and achieve consistency.

Clustering-Quality Measures
Clustering-quality measures quantify the quality of clusterings: how good is this clustering?

Defining Clustering-Quality Measures
A clustering-quality measure (CQM) is a function m(dataset, clustering) ∈ ℝ satisfying some properties that make it a meaningful clustering-quality measure. What properties should it satisfy?

Rephrasing Kleinberg’s Axioms as Axioms for Clustering-Quality Measures
Scale Invariance: m(C, d) = m(C, λd) for all d, all strictly positive λ, and every clustering C over d.
Richness: For any clustering C of S, there exists a distance function d over S so that C = argmax_{C’} m(C’, d).

Rephrasing Kleinberg’s Axioms as Axioms for Clustering-Quality Measures (cont.)
Consistency: If d’ equals d, except for shrinking distances within clusters of C or stretching between-cluster distances, then m(C, d) ≤ m(C, d’).

An Additional Axiom
Clusterings C over (X, d) and C’ over (X, d’) are isomorphic if there exists a distance-preserving automorphism f: X → X such that x and y share the same C-cluster iff f(x) and f(y) share the same C’-cluster.
Isomorphism Invariance: If C and C’ are isomorphic, then m(C, d) = m(C’, d’).

Major Gain: Consistency of the New Axioms
Theorem: Consistency, scale invariance, richness, and isomorphism invariance for clustering-quality measures form a consistent set of requirements.
We prove this result by demonstrating measures that satisfy these axioms. Moreover, every reasonable CQM satisfies our axioms.

An Example of a CQM for Center-Based Clustering: Relative Margin
The relative margin of a point x in C is (distance from x to its closest center) / (distance from x to its second-closest center). The relative margin of C is the average relative margin over all non-center points (over all possible center settings). Relative Margin satisfies scale invariance, consistency, richness, and isomorphism invariance.
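As a concrete illustration, here is a minimal sketch of the per-point computation for one fixed center setting (aggregating over all possible center settings, as the definition requires, is omitted). The interface below, a precomputed distance matrix and a list of center indices, is an assumption for illustration, not part of the original formulation.

```python
import numpy as np

def relative_margin(dist, centers):
    """Average relative margin of a clustering for one fixed center setting.

    dist    : (n, n) array of pairwise distances over the domain.
    centers : list of at least two center indices (hypothetical interface).
    """
    margins = []
    for x in range(dist.shape[0]):
        if x in centers:
            continue  # relative margin is defined over non-center points only
        # Ratio of distance to the closest center over the second-closest.
        d_to_centers = np.sort(dist[x, centers])
        margins.append(d_to_centers[0] / d_to_centers[1])
    return float(np.mean(margins))
```

Note that scaling dist by any λ > 0 leaves each ratio, and hence the measure, unchanged, which is exactly the scale-invariance axiom.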

Additional CQMs Satisfying Our Axioms
- C-index (Dalrymple-Alford, 1970)
- Gamma (Baker & Hubert, 1975)
- Adjusted ratio of clustering (Roenker et al., 1971)
- D-index (Dalrymple-Alford, 1970)
- Modified ratio of repetition (Bower, Lesgold, and Tieman, 1969)
- Dunn’s index (Dunn, 1973)
- Variations of Dunn’s index (Bezdek and Pal, 1998)
- Strict separation (based on Balcan, Blum, and Vempala, 2008)
- And many more...

Why Is the CQM Formalism More Faithful to Intuition?
In the setting of clustering functions, the consistency axiom requires that consistent changes to the underlying distance should not create any new contenders for the best clustering of the data. (Illustration: a consistent change takes d to d’; under d’ a new clustering C’ emerges.) A clustering function that satisfies Kleinberg’s consistency cannot output C’.

Why Is the CQM Formalism More Faithful to Intuition? (cont.)
In the setting of clustering-quality measures, the consistency axiom requires only that the quality of a given clustering C does not get worse. While the quality of C improves under the change from d to d’, a different clustering, C’, can still have better quality.

Summary
The intuition behind Kleinberg’s axioms is consistent (in spite of his impossibility result). The impossibility result can be overcome by a change of formalism; we do this by focusing on clustering-quality measures. Every reasonable clustering-quality measure satisfies our axioms.

Future Work
How can the “completeness” of a set of axioms be argued? Are the axioms useful for gaining interesting new insights about clusterings? Can we find properties that distinguish different clustering paradigms?

Appendix: Another Clustering-Quality Measure: Gamma (Baker & Hubert, 1975)
Gamma is the best-performing measure in Milligan’s study of 30 internal criteria (Milligan, 1981). Let d(+) denote the number of times that two points which were clustered together in C had distance greater than that of two points which were not in the same cluster, and let d(-) denote the opposite result; Gamma is then (d(-) − d(+)) / (d(-) + d(+)) (the standard Baker-Hubert formula, stated here for completeness). Gamma satisfies scale-invariance, consistency, richness, and isomorphism invariance.
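A minimal sketch of this computation, assuming (as an illustrative interface, not from the slides) a precomputed distance matrix and an integer cluster label per point:

```python
import numpy as np
from itertools import combinations

def gamma_index(dist, labels):
    """Baker-Hubert Gamma for a clustering.

    dist   : (n, n) array of pairwise distances.
    labels : length-n sequence of cluster labels.

    Compares every within-cluster distance against every between-cluster
    distance; assumes at least one comparison of each kind exists.
    """
    within, between = [], []
    for i, j in combinations(range(len(labels)), 2):
        (within if labels[i] == labels[j] else between).append(dist[i, j])
    within = np.asarray(within)
    # d_plus counts discordant comparisons (within > between),
    # d_minus concordant ones (within < between).
    d_plus = sum(int(np.sum(within > b)) for b in between)
    d_minus = sum(int(np.sum(within < b)) for b in between)
    return (d_minus - d_plus) / (d_minus + d_plus)
```

The all-pairs comparison above is the definition taken literally; practical implementations sort the two distance lists so the counting takes near-linear time after sorting.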

Variants of Quality Measures
Given a clustering-quality measure m, we can create new ones by applying it to a subset of the clusters: m_min(C, d) = min_S m(S, d), where S ⊆ C is a subset of at least 2 clusters in C. Similarly, we can define m_max and m_average. If m satisfies the axioms of clustering-quality measures, then so do m_min, m_max, and m_average.
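A minimal sketch of the m_min construction, assuming (purely for illustration) that the base measure is a Python callable taking a tuple of clusters and the distance data:

```python
from itertools import combinations

def m_min(m, clustering, d):
    """Worst-case variant of a base clustering-quality measure m.

    m          : base CQM, called as m(clusters, d) on a tuple of clusters
                 (hypothetical interface).
    clustering : iterable of clusters, e.g. frozensets of point indices.
    d          : whatever distance data the base measure expects.
    """
    clusters = tuple(clustering)
    # Evaluate m on every subset of at least 2 clusters; return the worst.
    return min(m(subset, d)
               for r in range(2, len(clusters) + 1)
               for subset in combinations(clusters, r))
```

m_max and m_average follow by replacing min with max or a mean; note that the number of subsets grows exponentially with the number of clusters, so this literal form suits only small cluster counts.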