Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING.

TALK OUTLINE
 Questions being addressed
 Introduce Axioms
 Introduce Properties
 Uniqueness Theorem for Single-Linkage
 Taxonomy of Partitioning Functions
Bosagh Zadeh, Ben-David, UAI 2009

WHAT IS CLUSTERING?
Given a collection of objects (characterized by feature vectors, or just a matrix of pairwise similarities), detect the presence of distinct groups and assign objects to groups.

SOME BASIC UNANSWERED QUESTIONS
 Are there principles governing all clustering paradigms?
 Which clustering paradigm should I use for a given task?

THERE ARE MANY CLUSTERING TASKS
“Clustering” is an ill-defined problem: there are many different clustering tasks, leading to different clustering paradigms.

WE WOULD LIKE TO DISCUSS THE BROAD NOTION OF CLUSTERING
Independently of any particular algorithm, particular objective function, or particular generative data model.

WHAT FOR?
 Choosing a suitable algorithm for a given task.
 Axioms: to capture intuition about clustering in general; expected to be satisfied by all clustering functions.
 Properties: to capture differences between different clustering paradigms.

TIMELINE
 Jardine & Sibson 1971: considered only hierarchical functions.
 Kleinberg 2003: presented an impossibility result.
 This paper: presents a uniqueness result for Single-Linkage.

THE BASIC SETTING
 For a finite domain set S, a distance function is a symmetric mapping d: S×S → R+ such that d(x,y) = 0 iff x = y.
 A partitioning function takes a distance function on S and returns a partition of S.
 We wish to define the axioms that distinguish clustering functions from any other functions that output domain partitions.

KLEINBERG’S AXIOMS
 Scale Invariance: F(λd) = F(d) for all d and all strictly positive λ.
 Richness: the range of F(d) over all d is the set of all possible partitionings.
 Consistency: if d’ equals d except for shrinking distances within clusters of F(d) or stretching between-cluster distances, then F(d) = F(d’).

KLEINBERG’S AXIOMS
 Scale Invariance: F(λd) = F(d) for all d and all strictly positive λ.
 Richness: the range of F(d) over all d is the set of all possible partitionings.
 Consistency: if d’ equals d except for shrinking distances within clusters of F(d) or stretching between-cluster distances, then F(d) = F(d’).
Inconsistent! No algorithm can satisfy all 3 of these.

CONSISTENT AXIOMS (fix k):
 Scale Invariance: F(λd, k) = F(d, k) for all d and all strictly positive λ.
 k-Richness: the range of F(d, k) over all d is the set of all possible k-partitionings.
 Consistency: if d’ equals d except for shrinking distances within clusters of F(d, k) or stretching between-cluster distances, then F(d, k) = F(d’, k).
Consistent! (And satisfied by Single-Linkage, Min-Sum, …)

CLUSTERING FUNCTIONS
Definition. Call any partitioning function which satisfies Scale Invariance, k-Richness, and Consistency a Clustering Function.

TWO CLUSTERING FUNCTIONS
Single-Linkage (hierarchical):
1. Start with all points in their own cluster.
2. While there are more than k clusters, merge the two most similar clusters, where the similarity between two clusters is the similarity of the two most similar points from differing clusters.
Min-Sum k-Clustering (not hierarchical):
Find the k-partitioning Γ which minimizes the sum of within-cluster pairwise distances. (NP-hard to optimize.)
Both functions satisfy Scale Invariance, k-Richness, and Consistency. Proofs in paper.
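The Single-Linkage procedure above can be sketched as follows (an illustrative sketch only; the function name, arguments, and toy data are not from the paper):

```python
# Minimal Single-Linkage sketch, assuming `points` is a list of items,
# `d` a symmetric distance function, and `k` the target number of clusters.
def single_linkage(points, d, k):
    clusters = [{p} for p in points]        # 1. every point in its own cluster
    while len(clusters) > k:                # 2. merge until k clusters remain
        # single-link distance: closest pair of points from differing clusters
        _, i, j = min(
            (min(d(x, y) for x in a for y in b), i, j)
            for i, a in enumerate(clusters)
            for j, b in enumerate(clusters) if i < j
        )
        clusters[i] |= clusters.pop(j)      # merge the two closest clusters
    return clusters
```

For instance, on points on a line, `single_linkage([0, 1, 2, 10, 11], lambda a, b: abs(a - b), 2)` returns the two clusters {0, 1, 2} and {10, 11}.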

CLUSTERING FUNCTIONS
Single-Linkage and Min-Sum are both clustering functions. How can we distinguish between them in an axiomatic framework? Use Properties.
Not all properties are desired in every clustering situation: pick and choose properties for your task.

PROPERTIES - ORDER-CONSISTENCY
 Order-Consistency: if two datasets d and d’ have the same ordering of the distances, then for all k, F(d, k) = F(d’, k).
o In other words, the clustering function only cares about whether a pair of points is closer/further than another pair of points.
o Satisfied by Single-Linkage, Max-Linkage, Average-Linkage, …
o NOT satisfied by most objective functions (Min-Sum, k-means, …)

PATH-DISTANCE
P_d(x, y): over all paths from x to y, the minimum of the largest jump along the path.
In other words, we find the path from x to y which has the smallest longest jump in it.
(Figure: P_d of the two pictured points = 2, since the path shown has a largest jump of distance 2; undrawn edges are large.)

PATH-DISTANCE
Imagine each point is an island, and we would like to go from island a to island b, as if we were trying to cross a river by jumping on rocks. Being human, we are restricted in how far we can jump from island to island. Path-Distance would have us find the path with the smallest longest jump, ensuring that we could complete all the jumps successfully.
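The minimax path distance described above can be computed with a bottleneck variant of Dijkstra's algorithm (an illustrative sketch; the function name and data are assumptions, not from the paper):

```python
import heapq

# Path distance: the smallest achievable "longest jump" on any path from x to y,
# over the complete graph on `points` with edge lengths given by `d`.
def path_distance(points, d, x, y):
    best = {p: float('inf') for p in points}   # best known bottleneck to each point
    best[x] = 0.0
    heap = [(0.0, x)]
    while heap:
        b, u = heapq.heappop(heap)
        if u == y:
            return b                           # smallest bottleneck reaching y
        if b > best[u]:
            continue                           # stale heap entry
        for v in points:
            if v != u:
                nb = max(b, d(u, v))           # longest jump along this path
                if nb < best[v]:
                    best[v] = nb
                    heapq.heappush(heap, (nb, v))
    return best[y]
```

On the island picture: with points 0, 1, 2, 4 and d(a, b) = |a - b|, jumping directly from 0 to 4 requires a jump of length 4, but hopping 0 → 1 → 2 → 4 needs jumps of at most 2, so the path distance from 0 to 4 is 2.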

PROPERTIES - PATH-DISTANCE COHERENCE
 Path-Distance Coherence: if two datasets d and d’ have the same induced path distance, then for all k, F(d, k) = F(d’, k).

UNIQUENESS THEOREM
Theorem (this work). Single-Linkage is the only clustering function satisfying Order-Consistency and Path-Distance Coherence.

UNIQUENESS THEOREM
Theorem (this work). Single-Linkage is the only clustering function satisfying Order-Consistency and Path-Distance Coherence.
Is Path-Distance Coherence doing all the work? No:
 Consistency is necessary for uniqueness.
 k-Richness is necessary.
“X is necessary” means: with all other axioms/properties satisfied and just X missing, uniqueness no longer holds.

PRACTICAL CONSIDERATIONS
Single-Linkage is not always the right function to use, because Path-Distance Coherence is not always desirable. It is not always immediately obvious when we want a function to focus on the path distance, so we introduce a different formulation involving Minimum Spanning Trees.

PROPERTIES - MST-COHERENCE
 MST-Coherence: if two datasets d and d’ have the same Minimum Spanning Tree, then for all k, F(d, k) = F(d’, k).
(Figure: two weightings of the same graph with the same MST receive the same clustering.)

A TAXONOMY OF CLUSTERING FUNCTIONS
Min-Sum satisfies neither MST-Coherence nor Order-Consistency.
Future work: characterize other clustering functions.

THANKS FOR YOUR ATTENTION!

ASIDE: MINIMUM SPANNING TREES
Spanning tree: a tree sub-graph of the original graph which touches all nodes. The weight of a tree is the sum of its edge weights. With spanning trees ordered by weight, we are interested in the Minimum Spanning Tree.
(Picture: Wikipedia; bold edges mark the Minimum Spanning Tree of the graph.)
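A standard way to build the MST is Prim's algorithm, sketched below (an illustrative sketch; the function name and toy data are assumptions):

```python
import heapq

# Prim's algorithm sketch: grow a tree from an arbitrary start node,
# repeatedly adding the lightest edge that leaves the tree.
def mst_weight(points, d):
    start = points[0]
    in_tree = {start}
    heap = [(d(start, p), p) for p in points[1:]]   # candidate edges out of the tree
    heapq.heapify(heap)
    total = 0.0
    while len(in_tree) < len(points):
        w, p = heapq.heappop(heap)
        if p in in_tree:
            continue                    # already connected by a lighter edge
        in_tree.add(p)
        total += w                      # this edge joins the MST
        for q in points:
            if q not in in_tree:
                heapq.heappush(heap, (d(p, q), q))
    return total
```

For example, on [0, 1, 2, 10, 11] with d(a, b) = |a - b|, the MST uses edges of weight 1, 1, 8, and 1, for a total weight of 11.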

PROOF OUTLINE: CHARACTERIZATION OF SINGLE-LINKAGE
1. Start with arbitrary d, k.
2. By k-Richness, there exists a d1 such that F(d1, k) = SL(d, k).
3. Through a series of Consistent transformations, d1 can be transformed into d6, which has the same MST as d.
4. Invoke MST-Coherence to get F(d1, k) = F(d6, k) = F(d, k) = SL(d, k).

KLEINBERG’S IMPOSSIBILITY RESULT
There exists no clustering function satisfying all 3 properties.
Proof idea: scaling up + Consistency.

AXIOMS AS A TOOL FOR A TAXONOMY OF CLUSTERING PARADIGMS
The goal is to generate a variety of axioms (or properties) over a fixed framework, so that different clustering approaches can be classified by the different subsets of axioms they satisfy.
(Table: rows Single Linkage, Center Based, Spectral, MDL, Rate Distortion; “Axioms” columns Scale Invariance, k-Richness, Consistency; “Properties” columns Separability, Order Invariance, Hierarchy.)

PROPERTIES
 Order-Consistency: the function only compares distances to one another, not using their absolute values.
 Minimum Spanning Tree Coherence: if two datasets d and d’ have the same Minimum Spanning Tree, then for all k, F(d, k) = F(d’, k). The function makes all its decisions using the Minimum Spanning Tree.

SOME MORE EXAMPLES

AXIOMS - SCALE INVARIANCE
 Scale Invariance: F(λd) = F(d) for all d and all strictly positive λ.
(Figure: e.g. doubling the distances, 3 → 6, leaves the clustering unchanged.)
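The slide's point can be checked empirically: doubling every distance (λ = 2) leaves the output of a minimal Single-Linkage unchanged (a self-contained sketch; the helper, its arguments, and the data are hypothetical):

```python
# Sketch: verify F(λd, k) = F(d, k) for a minimal Single-Linkage with λ = 2.
def single_linkage(points, d, k):
    clusters = [{p} for p in points]
    while len(clusters) > k:
        # merge the two clusters whose closest cross-cluster pair is smallest
        _, i, j = min(
            (min(d(x, y) for x in a for y in b), i, j)
            for i, a in enumerate(clusters)
            for j, b in enumerate(clusters) if i < j
        )
        clusters[i] |= clusters.pop(j)
    return sorted(sorted(c) for c in clusters)   # canonical form for comparison

d = lambda a, b: abs(a - b)
doubled = lambda a, b: 2 * d(a, b)               # λ = 2: double every distance
pts = [0, 1, 5, 6]
assert single_linkage(pts, doubled, 2) == single_linkage(pts, d, 2)
```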

AXIOMS - RICHNESS
 Richness: the range of F(d) over all d is the set of all possible partitionings.
(Figure: by varying d, F can be made to output every partitioning of the points.)

AXIOMS - CONSISTENCY
 Consistency: if d’ equals d except for shrinking distances within clusters of F(d) or stretching between-cluster distances, then F(d) = F(d’).
(Figure: shrinking within-cluster and stretching between-cluster distances leaves the clustering unchanged.)

PROPERTIES - ORDER-CONSISTENCY
 Order-Consistency: if two datasets d and d’ have the same ordering of the distances, then for all k, F(d, k) = F(d’, k).
(Figure: changing edge lengths, e.g. 3 → 5, while maintaining the edge ordering leaves the clustering unchanged.)