Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING.


1 Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009 A UNIQUENESS THEOREM FOR CLUSTERING

2 TALK OUTLINE
- Questions being addressed
- Introduce Axioms
- Introduce Properties
- Uniqueness Theorem for Single-Linkage
- Taxonomy of Partitioning Functions
Bosagh Zadeh, Ben-David, UAI 2009

3 WHAT IS CLUSTERING?
Given a collection of objects (characterized by feature vectors, or just a matrix of pairwise similarities), detect the presence of distinct groups and assign objects to groups.

4 SOME BASIC UNANSWERED QUESTIONS
- Are there principles governing all clustering paradigms?
- Which clustering paradigm should I use for a given task?

5 THERE ARE MANY CLUSTERING TASKS
- “Clustering” is an ill-defined problem.
- There are many different clustering tasks, leading to different clustering paradigms.


7 WE WOULD LIKE TO DISCUSS THE BROAD NOTION OF CLUSTERING
Independently of any particular algorithm, particular objective function, or particular generative data model.

8 WHAT FOR?
- Choosing a suitable algorithm for a given task.
- Axioms: capture intuition about clustering in general. Expected to be satisfied by all clustering functions.
- Properties: capture differences between different clustering paradigms.

9 TIMELINE
- Jardine, Sibson 1971: considered only hierarchical functions
- Kleinberg 2003: presented an impossibility result
- This paper: presents a uniqueness result for Single-Linkage

10 THE BASIC SETTING
- For a finite domain set S, a distance function is a symmetric mapping d: S × S → R⁺ such that d(x, y) = 0 iff x = y.
- A partitioning function takes a dissimilarity function on S and returns a partition of S.
- We wish to define the axioms that distinguish clustering functions from any other functions that output domain partitions.
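As a minimal sketch of the setting on this slide, the check below tests whether a square matrix is a distance function in the slide's sense: symmetric, with d(x, y) = 0 exactly when x = y. The function name `is_distance_function` and the list-of-lists representation are assumptions made for illustration. The slide's definition does not mention the triangle inequality, so the sketch does not test it.

```python
def is_distance_function(d):
    """Return True if d (a square list of lists) is symmetric,
    zero on the diagonal, and strictly positive off the diagonal."""
    n = len(d)
    for i in range(n):
        # Every row must have n entries, and d(x, x) must be 0.
        if len(d[i]) != n or d[i][i] != 0:
            return False
        for j in range(i + 1, n):
            # Symmetry, and d(x, y) > 0 whenever x != y.
            if d[i][j] != d[j][i] or d[i][j] <= 0:
                return False
    return True


if __name__ == "__main__":
    print(is_distance_function([[0, 1], [1, 0]]))  # symmetric, zero diagonal
    print(is_distance_function([[0, 1], [2, 0]]))  # not symmetric
```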

11 KLEINBERG’S AXIOMS
- Scale Invariance: F(λd) = F(d) for all d and all strictly positive λ.
- Richness: the range of F(d) over all d is the set of all possible partitionings.
- Consistency: if d’ equals d except for shrinking distances within clusters of F(d) or stretching between-cluster distances, then F(d) = F(d’).

12 KLEINBERG’S AXIOMS
Inconsistent! No algorithm can satisfy all 3 of these (Scale Invariance, Richness, Consistency).

13 CONSISTENT AXIOMS: Fix k
- Scale Invariance: F(λd, k) = F(d, k) for all d and all strictly positive λ.
- k-Richness: the range of F(d, k) over all d is the set of all possible k-partitionings.
- Consistency: if d’ equals d except for shrinking distances within clusters of F(d, k) or stretching between-cluster distances, then F(d, k) = F(d’, k).
Consistent! (And satisfied by Single-Linkage, Min-Sum, …)

14 CLUSTERING FUNCTIONS
Definition. Call any partitioning function which satisfies
- Scale Invariance
- k-Richness
- Consistency
a Clustering Function.

15 TWO CLUSTERING FUNCTIONS
Single-Linkage (hierarchical)
1. Start with all points in their own cluster.
2. While there are more than k clusters, merge the two most similar clusters.
Similarity between two clusters is the similarity of the most similar two points from differing clusters.
Min-Sum k-Clustering (not hierarchical)
Find the k-partitioning Γ which minimizes the sum of distances d(x, y) over all pairs x, y that share a cluster in Γ. (Is NP-hard to optimize.)
Both functions satisfy:
- Scale Invariance
- k-Richness
- Consistency
Proofs in paper.
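The two functions on this slide can be sketched in a few lines. This is an illustrative sketch and not the authors' code. It assumes distances are given as a symmetric matrix, so "most similar" means "smallest distance". It also assumes the usual Min-Sum objective, the sum of within-cluster pairwise distances. The names `single_linkage`, `min_sum_cost` and `min_sum` are hypothetical. The Min-Sum search is brute force and only usable on tiny inputs.

```python
from itertools import product


def single_linkage(d, k):
    """Start with singletons; repeatedly merge the two closest clusters
    (closest pair of points from differing clusters) until k remain."""
    clusters = [{i} for i in range(len(d))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                link = min(d[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or link < best[0]:
                    best = (link, a, b)
        _, a, b = best
        clusters[a] |= clusters[b]
        del clusters[b]
    return {frozenset(c) for c in clusters}


def min_sum_cost(d, partition):
    """Sum of d(x, y) over all pairs x < y that share a cluster."""
    return sum(d[i][j] for c in partition for i in c for j in c if i < j)


def min_sum(d, k):
    """Exhaustive search over all k-partitionings (exponential time)."""
    n = len(d)
    best_cost, best_part = None, None
    for labels in product(range(k), repeat=n):
        if len(set(labels)) != k:
            continue  # skip labelings that leave a cluster empty
        part = {frozenset(i for i in range(n) if labels[i] == c)
                for c in range(k)}
        cost = min_sum_cost(d, part)
        if best_cost is None or cost < best_cost:
            best_cost, best_part = cost, part
    return best_part


if __name__ == "__main__":
    # Hypothetical example: four points on a line at 0, 1, 2 and 4.
    pos = [0, 1, 2, 4]
    d = [[abs(a - b) for b in pos] for a in pos]
    print(single_linkage(d, 2))  # chains the first three points together
    print(min_sum(d, 2))         # prefers two balanced pairs
```

On this small example the two functions disagree, which is the kind of difference the properties on the following slides are meant to capture.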

16 CLUSTERING FUNCTIONS
Single-Linkage and Min-Sum are both Clustering functions. How to distinguish between them in an axiomatic framework? Use Properties.
Not all properties are desired in every clustering situation: pick and choose properties for your task.

17 PROPERTIES - ORDER-CONSISTENCY
- Order-Consistency: if two datasets d and d’ have the same ordering of the distances, then for all k, F(d, k) = F(d’, k).
  o In other words, the clustering function only cares about whether a pair of points is closer or further apart than another pair of points.
  o Satisfied by Single-Linkage, Max-Linkage, Average-Linkage…
  o NOT satisfied by most objective functions (Min-Sum, k-means, …)
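Order-Consistency can be probed on a small hand-built example. The sketch below is illustrative only, and the two distance matrices are made up. It implements Single-Linkage by sorting edges and merging components until k remain, which is a Kruskal-style formulation assumed here for brevity. It pairs this with a brute-force Min-Sum that assumes the within-cluster pairwise-sum objective. All function names are hypothetical.

```python
from itertools import combinations, product


def symmetric(n, upper):
    """Build a symmetric n x n matrix from {(i, j): weight} with i < j."""
    d = [[0] * n for _ in range(n)]
    for (i, j), w in upper.items():
        d[i][j] = d[j][i] = w
    return d


def same_order(d1, d2):
    """True if every pair of edges compares the same way in d1 and d2."""
    pairs = list(combinations(range(len(d1)), 2))
    sign = lambda x: (x > 0) - (x < 0)
    return all(sign(d1[a][b] - d1[c][e]) == sign(d2[a][b] - d2[c][e])
               for a, b in pairs for c, e in pairs)


def single_linkage(d, k):
    """Merge along edges in increasing order until k components remain."""
    n = len(d)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    components = n
    edges = sorted(combinations(range(n), 2), key=lambda e: d[e[0]][e[1]])
    for i, j in edges:
        if components == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            components -= 1
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), set()).add(i)
    return {frozenset(g) for g in groups.values()}


def min_sum(d, k):
    """Brute-force minimizer of the within-cluster pairwise distance sum."""
    n = len(d)
    best = None
    for labels in product(range(k), repeat=n):
        if len(set(labels)) != k:
            continue
        part = {frozenset(i for i in range(n) if labels[i] == c)
                for c in range(k)}
        cost = sum(d[i][j] for c in part for i in c for j in c if i < j)
        if best is None or cost < best[0]:
            best = (cost, part)
    return best[1]


if __name__ == "__main__":
    # Two made-up datasets with the same ordering of distances.
    d = symmetric(4, {(0, 1): 1, (1, 2): 1, (0, 2): 2,
                      (2, 3): 2.5, (1, 3): 3.5, (0, 3): 4.5})
    d_prime = symmetric(4, {(0, 1): 1, (1, 2): 1, (0, 2): 2,
                            (2, 3): 10, (1, 3): 11, (0, 3): 12})
    print(same_order(d, d_prime))
    print(single_linkage(d, 2) == single_linkage(d_prime, 2))
    print(min_sum(d, 2) == min_sum(d_prime, 2))
```

Because this Single-Linkage sketch only ever sorts the distances, its output cannot change when the ordering is preserved. Min-Sum adds distances together, so stretching the three largest ones is enough to change its answer on this example.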

18 PATH-DISTANCE
In other words, we find the path from x to y which has the smallest longest jump in it.
[Figure: a small graph with labeled edge weights; undrawn edges are large. For the two marked points, P_d = 2, since the path between them shown in the figure has a longest jump of distance 2.]

19 PATH-DISTANCE
Imagine each point is an island, and we would like to go from island a to island b, as if we were trying to cross a river by jumping on rocks. Being human, we are restricted in how far we can jump from island to island. Path-Distance would have us find the path with the smallest longest jump, ensuring that we could complete all the jumps successfully.
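The "smallest longest jump" idea can be computed with a small variant of the Floyd-Warshall algorithm, where path length is the largest edge on the path instead of the sum of edges. This is a sketch under that reading of the slides. The function name `path_distance` and the three-point example are assumptions and do not reproduce the slide's figure.

```python
def path_distance(d):
    """For every pair (x, y), return the minimum over all paths from x to y
    of the largest single jump on that path (minimax path distance)."""
    n = len(d)
    p = [row[:] for row in d]  # start from the direct jumps
    for m in range(n):         # allow m as an intermediate island
        for i in range(n):
            for j in range(n):
                via = max(p[i][m], p[m][j])  # longest jump when going via m
                if via < p[i][j]:
                    p[i][j] = via
    return p


if __name__ == "__main__":
    # Hypothetical example: the direct jump 0 -> 2 has length 7,
    # but hopping 0 -> 1 -> 2 needs jumps of only 1 and 2.
    d = [[0, 1, 7],
         [1, 0, 2],
         [7, 2, 0]]
    print(path_distance(d)[0][2])
```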

20 PROPERTIES - PATH-DISTANCE COHERENCE
- Path-Distance Coherence: if two datasets d and d’ have the same induced path distance, then for all k, F(d, k) = F(d’, k).

21 UNIQUENESS THEOREM
Theorem (This work): Single-Linkage is the only clustering function satisfying Order-Consistency and Path-Distance-Coherence.

22 UNIQUENESS THEOREM
Theorem (This work): Single-Linkage is the only clustering function satisfying Order-Consistency and Path-Distance-Coherence.
Is Path-Distance-Coherence doing all the work? No.
- Consistency is necessary for uniqueness.
- k-Richness is necessary for uniqueness.
“X is necessary”: with all other axioms/properties satisfied and just X missing, we still do not get uniqueness.

23 PRACTICAL CONSIDERATIONS
Single-Linkage is not always the right function to use, because Path-Distance-Coherence is not always desirable. It’s not always immediately obvious when we want a function to focus on the Path Distance. We introduce a different formulation involving Minimum Spanning Trees.

24 PROPERTIES - MST-COHERENCE
- MST-Coherence: if two datasets d and d’ have the same Minimum Spanning Tree, then for all k, F(d, k) = F(d’, k).
[Figure: If/Then illustration of F applied to two datasets that share a Minimum Spanning Tree.]
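To make the MST formulation concrete, the sketch below builds a Minimum Spanning Tree with Kruskal's algorithm and forms k clusters by dropping the k-1 heaviest tree edges. That is the standard MST view of Single-Linkage. It is an illustration only. The function names and both distance matrices are made up. "Same Minimum Spanning Tree" is taken here to mean the same edges with the same weights, which is an assumption about the slide's intent. Distinct edge weights are also assumed, so that the MST is unique.

```python
from itertools import combinations


def find(parent, i):
    """Follow parent pointers to the representative of i's component."""
    while parent[i] != i:
        i = parent[i]
    return i


def symmetric(n, upper):
    """Build a symmetric n x n matrix from {(i, j): weight} with i < j."""
    d = [[0] * n for _ in range(n)]
    for (i, j), w in upper.items():
        d[i][j] = d[j][i] = w
    return d


def minimum_spanning_tree(d):
    """Kruskal's algorithm; returns the tree as a set of (i, j, weight)."""
    n = len(d)
    parent = list(range(n))
    tree = set()
    edges = sorted(combinations(range(n), 2), key=lambda e: d[e[0]][e[1]])
    for i, j in edges:
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:  # edge joins two components, so it belongs to the tree
            parent[rj] = ri
            tree.add((i, j, d[i][j]))
    return tree


def clusters_from_mst(tree, n, k):
    """Keep the n-k lightest tree edges (drop the k-1 heaviest) and
    return the connected components as clusters."""
    kept = sorted(tree, key=lambda e: e[2])[: n - k]
    parent = list(range(n))
    for i, j, _ in kept:
        parent[find(parent, j)] = find(parent, i)
    groups = {}
    for i in range(n):
        groups.setdefault(find(parent, i), set()).add(i)
    return {frozenset(g) for g in groups.values()}


if __name__ == "__main__":
    # Two made-up datasets that differ only on non-tree edges.
    d = symmetric(4, {(0, 1): 1, (1, 2): 2, (2, 3): 5,
                      (0, 2): 7, (1, 3): 8, (0, 3): 9})
    d_prime = symmetric(4, {(0, 1): 1, (1, 2): 2, (2, 3): 5,
                            (0, 2): 3, (1, 3): 6, (0, 3): 20})
    tree = minimum_spanning_tree(d)
    print(tree == minimum_spanning_tree(d_prime))
    print(clusters_from_mst(tree, 4, 2))
```

Any function that, like `clusters_from_mst`, reads only the tree gives the same output on both datasets, which is what MST-Coherence asks for.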

25 A TAXONOMY OF CLUSTERING FUNCTIONS
- Min-Sum satisfies neither MST-Coherence nor Order-Consistency.
- Future work: characterize other clustering functions.

26 THANKS FOR YOUR ATTENTION!

27 ASIDE: MINIMUM SPANNING TREES
- Spanning tree: a tree subgraph of the original graph which touches all nodes.
- The weight of a tree is equal to the sum of all its edge weights.
- With spanning trees ordered by weight, we are interested in the Minimum Spanning Tree.
[Figure (picture: Wikipedia): a graph with its Minimum Spanning Tree drawn in bold.]

28 PROOF OUTLINE: CHARACTERIZATION OF SINGLE-LINKAGE
1. Start with arbitrary d, k.
2. By k-Richness, there exists a d_1 such that F(d_1, k) = SL(d, k).
3. Through a series of Consistent transformations, we can transform d_1 into d_6, which will have the same MST as d.
4. Invoke MST-Coherence to get F(d_1, k) = F(d_6, k) = F(d, k) = SL(d, k).

29 KLEINBERG’S IMPOSSIBILITY RESULT
There exists no clustering function satisfying all 3 properties (Scale Invariance, Richness, Consistency).
[Figure: proof sketch, with steps labeled “Scaling up” and “Consistency”.]

30 AXIOMS AS A TOOL FOR A TAXONOMY OF CLUSTERING PARADIGMS
The goal is to generate a variety of axioms (or properties) over a fixed framework, so that different clustering approaches can be classified by the different subsets of axioms they satisfy.
Columns (“Axioms”): Scale Invariance, k-Richness, Consistency
Columns (“Properties”): Separability, Order Invariance, Hierarchy
Rows, with marks in the order they appear:
- Single Linkage: + + + + + +
- Center Based: + + + + -
- Spectral: + + - - -
- MDL: + + -
- Rate Distortion: + + -

31 PROPERTIES
- Order-Consistency: the function only compares distances with one another and does not use their absolute values.
- Minimum Spanning Tree Coherence: if two datasets d and d’ have the same Minimum Spanning Tree, then for all k, F(d, k) = F(d’, k). The function makes all its decisions using the Minimum Spanning Tree.

32 SOME MORE EXAMPLES

33 AXIOMS - SCALE INVARIANCE
- Scale Invariance: F(λd) = F(d) for all d and all strictly positive λ.
[Figure: If/Then illustration, e.g. double the distances (3 becomes 6) and F returns the same clustering.]

34 AXIOMS - RICHNESS
- Richness: the range of F(d) over all d is the set of all possible partitionings.
[Figure: several outputs of F, etc.; F can produce all partitionings of the points.]

35 AXIOMS - CONSISTENCY
- Consistency: if d’ equals d except for shrinking distances within clusters of F(d) or stretching between-cluster distances, then F(d) = F(d’).
[Figure: If/Then illustration of F before and after such a transformation.]

36 PROPERTIES - ORDER-CONSISTENCY
- Order-Consistency: if two datasets d and d’ have the same ordering of the distances, then for all k, F(d, k) = F(d’, k).
[Figure: If/Then illustration in which edge weights change while the edge ordering is maintained.]

