Tree Clustering & COBWEB

Remember: k-Means Clustering

k-Means Example (K=2)

[Figure: pick seeds → reassign clusters → compute centroids → reassign clusters → compute centroids → reassign clusters → converged]
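The loop illustrated above (seed, assign, recompute, repeat) can be sketched in a few lines. This is a minimal illustrative implementation for 2-D points; the function name and data representation are assumptions, not from the slides:

```python
import random

def kmeans(points, k, max_iter=100):
    """Minimal k-means: pick seeds, assign each point to the nearest
    centroid, recompute centroids, repeat until they stop moving."""
    centroids = random.sample(points, k)  # pick k seeds
    for _ in range(max_iter):
        # reassign: each point goes to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2
                                + (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        # recompute centroids as cluster means (keep the old one if a cluster is empty)
        new = [(sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
               if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:  # converged
            break
        centroids = new
    return centroids, clusters
```

Note that the result depends on the initial seeds, which is why k-means is usually run several times with different initializations.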

EM-Clustering

Outline:
- Tree clustering
- Linkage rules
- Conceptual clustering: COBWEB
- Category utility

Tree Clustering
Tree clustering algorithms allow us to reveal the internal similarities of a given pattern set and to structure these similarities hierarchically. They are typically applied to a small set of typical patterns. For n patterns, the algorithm generates a sequence of 1 to n clusters.

The sequence of 1 to n clusters has the form of a binary tree (two branches for each tree node). The tree can be built in two ways:
- bottom-up, by a merging algorithm starting with the individual patterns
- top-down, by a splitting algorithm starting with a single cluster composed of all patterns

Merging Algorithm
given n patterns x_i, start with k = n singleton clusters C_i = {x_i}; /* every cluster has only one element */
while k > 1 do {
    determine the two nearest clusters C_i and C_j using an appropriate similarity rule;
    merge C_i and C_j: C_ij = {C_i, C_j}, obtaining a solution with k-1 clusters;
    k = k - 1;
}
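The merging loop above can be put in runnable form. This is a sketch under illustrative assumptions: the helper names are invented, and a complete-linkage distance on 1-D patterns is used only for the demo:

```python
def merge_clustering(patterns, cluster_dist):
    """Bottom-up merging: start with n singleton clusters, repeatedly
    merge the two nearest clusters, and record each merge."""
    clusters = [(p,) for p in patterns]          # k = n singletons
    history = []
    while len(clusters) > 1:
        # find the two nearest clusters under the given linkage rule
        i, j = min(((a, b) for a in range(len(clusters))
                           for b in range(a + 1, len(clusters))),
                   key=lambda ab: cluster_dist(clusters[ab[0]], clusters[ab[1]]))
        history.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]       # C_ij = {C_i, C_j}
        clusters = [c for n_, c in enumerate(clusters) if n_ not in (i, j)]
        clusters.append(merged)                  # k = k - 1
    return history

def complete_link(a, b):
    # complete linkage: distance of the furthest cross-cluster pair
    return max(abs(x - y) for x in a for y in b)
```

For the 1-D patterns [1, 2, 9, 10], the algorithm first merges {1, 2} and {9, 10} before the final merge, producing the binary tree described above.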

The determination of the nearest clusters depends on:
- the similarity measure
- the rule used to assess the similarity of the clusters

Example: the similarity between two clusters is assessed by measuring the similarity of the furthest pair of patterns (one from each cluster). This is the so-called complete linkage rule.

As the merging process evolves, the similarity of the merged clusters decreases

A schedule graph may help in selecting the best solution. Solutions with very small clusters, or even singletons, are rather suspicious.

Linkage rules
Complete linkage (furthest neighbor, FN): evaluates the dissimilarity between two clusters as the greatest distance between any two patterns, one from each cluster. This rule performs well when the clusters are compact and of equal size, but is inadequate for filamentary clusters.

Complete Link Example

Single linkage (nearest neighbor, NN): evaluates the dissimilarity between two clusters as the dissimilarity of the nearest patterns, one from each cluster. Produces a chaining effect and works well with filamentary shapes.

[Figure: (a) globular data, (b) filamentary data]

Single Link Example

Average linkage between groups
Also known as UPGMA (unweighted pair-group method using arithmetic averages). This rule assesses the distance between two clusters as the average of the distances between all pairs of patterns, one from each cluster.

Impact of cluster distance measures
- Single-link: inter-cluster distance = distance between the closest pair of points
- Complete-link: inter-cluster distance = distance between the farthest pair of points
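The three linkage rules differ only in how they aggregate a base distance d over cross-cluster pairs; a small sketch (the function names are illustrative):

```python
def single_link(a, b, d):
    # nearest neighbor: distance of the closest cross-cluster pair
    return min(d(x, y) for x in a for y in b)

def complete_link(a, b, d):
    # furthest neighbor: distance of the farthest cross-cluster pair
    return max(d(x, y) for x in a for y in b)

def average_link(a, b, d):
    # UPGMA: average distance over all cross-cluster pairs
    return sum(d(x, y) for x in a for y in b) / (len(a) * len(b))

d = lambda x, y: abs(x - y)
a, b = [0, 1, 2], [5, 9]
# single_link(a, b, d)   → 3    (|2 - 5|)
# complete_link(a, b, d) → 9    (|0 - 9|)
# average_link(a, b, d)  → 6.0  (mean of the six pair distances)
```

On the same pair of clusters the three rules can disagree considerably, which is what drives the globular-versus-filamentary behavior described above.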

Conceptual Clustering - COBWEB
Conceptual clustering begins with a collection of unclassified objects and some means of measuring the similarity of objects. Numeric taxonomy: objects are represented as collections of features, each of which may have a numerical value; a distance function then treats each object as a vector of n features.

A bird is defined by the following features: flies, sings, lays eggs, nests in trees, eats insects. A bat is defined by the following features: flies, gives milk, eats insects.

Humans distinguish degrees of category membership. We generally think of a robin as a better example of a bird than a chicken, and an oak as a more typical example of a tree than a palm.

Family resemblance theory (Wittgenstein 1953)
Categories are defined by a complex system of similarities between their members; a category may have no property shared by all of its members. Games:
- Not all games require two or more players - solitaire (patience)
- Not all games are fun for the players - football
- Not all games involve competition - jumping rope
Yet the game category is well defined.

Logic, feature vectors, or decision trees do not account for these effects. COBWEB (Fisher 1987) addresses these issues:
- models base-level categorization and degrees of category membership
- represents categories probabilistically, instead of defining category membership by a set of values that must be present
- builds up a hierarchy (tree)

COBWEB represents the probability with which each feature value is present in an object: p(f_i = v_ij | c_k) is the conditional probability that feature f_i has value v_ij, given that an object is in category c_k.

Example COBWEB forms a taxonomy (tree, hierarchy) of categories Example: Categorization of four single-cell animals

Each animal is defined by a number of features: number of tails, color, and number of nuclei. For example, members of category C3 have a 1.0 probability of having 2 tails, a 0.5 probability of having light color, and a 1.0 probability of having 2 nuclei.

When given a new example, COBWEB considers the overall quality of either placing the example in an existing category or modifying the hierarchy The criterion COBWEB uses for evaluating the quality of the classification is called category utility

Category utility
Developed in research on human categorization (Gluck and Corter 1985). Category utility attempts to maximize both the probability that two objects in the same category have values in common and the probability that objects in different categories have different property values.

Category utility

CU = Σ_k Σ_i Σ_j p(f_i = v_ij) · p(f_i = v_ij | c_k) · p(c_k | f_i = v_ij)

This sum is taken across all categories c_k, all features f_i, and all feature values v_ij.

- p(f_i = v_ij | c_k) is called the predictability: the probability that an object has value v_ij for feature f_i, given that it belongs to category c_k. The higher this probability, the more likely two objects in a category share the same feature values.
- p(c_k | f_i = v_ij) is called the predictiveness: the probability that an object belongs to category c_k, given that it has value v_ij for feature f_i. The greater this probability, the less likely objects outside the category will have those values.
- p(f_i = v_ij) serves as a weight: frequent feature values exert a stronger influence.
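Estimating these three probabilities from raw counts, the category-utility sum can be computed directly. A minimal sketch; the representation of the data as (category, {feature: value}) pairs is an assumption made for illustration:

```python
from collections import Counter

def category_utility(examples):
    """examples: list of (category, {feature: value}) pairs.
    Returns the sum over all (c_k, f_i, v_ij) of
    p(f_i=v_ij) * p(f_i=v_ij | c_k) * p(c_k | f_i=v_ij)."""
    n = len(examples)
    cat = Counter(c for c, _ in examples)  # category counts
    fv = Counter((f, v) for _, fe in examples for f, v in fe.items())
    cfv = Counter((c, f, v) for c, fe in examples for f, v in fe.items())
    return sum(
        (fv[f, v] / n)      # weight: p(f=v)
        * (k / cat[c])      # predictability: p(f=v | c)
        * (k / fv[f, v])    # predictiveness: p(c | f=v)
        for (c, f, v), k in cfv.items()
    )
```

When one feature perfectly separates two categories, predictability and predictiveness are both 1 for every term and the sum reduces to Σ p(f_i = v_ij).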

By combining these values, a high category utility indicates a high likelihood that objects in the same category share properties, while objects in different categories are unlikely to have properties in common.

COBWEB performs a hill-climbing search of the space of possible taxonomies (trees) using category utility to evaluate and select possible categorizations

COBWEB initializes the taxonomy to a single category whose features are those of the first example. For each subsequent example, the algorithm begins with the root category and moves through the tree. At each level it uses category utility to evaluate the result of:
1. Placing the example in the best existing category
2. Adding a new category containing the example
3. Merging two existing categories and adding the example to the merged category
4. Splitting an existing category and placing the example in the best category in the tree
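A heavily simplified, runnable sketch of this insertion step follows. It shows only operations 1 and 2 (place in the best existing child vs. create a new child), scoring each option by category utility with child indices as category labels; the class and helper names are illustrative, and real COBWEB also evaluates the merge and split operations:

```python
from collections import Counter

def category_utility(examples):
    """Category-utility sum over (category, {feature: value}) pairs."""
    n = len(examples)
    cat = Counter(c for c, _ in examples)
    fv = Counter((f, v) for _, fe in examples for f, v in fe.items())
    cfv = Counter((c, f, v) for c, fe in examples for f, v in fe.items())
    return sum((fv[f, v] / n) * (k / cat[c]) * (k / fv[f, v])
               for (c, f, v), k in cfv.items())

class Node:
    def __init__(self, feats=None):
        self.members = [feats] if feats is not None else []
        self.children = []

def partition_score(children):
    """Score a partition: treat each child's index as the category label."""
    labelled = [(i, fe) for i, ch in enumerate(children) for fe in ch.members]
    return category_utility(labelled)

def insert(node, feats):
    node.members.append(feats)
    if not node.children:
        if len(node.members) > 1:  # leaf with 2+ members: expand into children
            node.children = [Node(m) for m in node.members]
        return
    options = []
    for i, ch in enumerate(node.children):    # op 1: try each existing child
        ch.members.append(feats)
        options.append((partition_score(node.children), ("existing", i)))
        ch.members.pop()
    trial = node.children + [Node(feats)]     # op 2: try a brand-new child
    options.append((partition_score(trial), ("new", None)))
    _, (kind, i) = max(options, key=lambda t: t[0])
    if kind == "existing":
        insert(node.children[i], feats)       # descend into the chosen child
    else:
        node.children.append(Node(feats))
```

Inserting two "light" and two "dark" examples into an empty root groups the two dark examples under one child, because that partition scores highest under category utility.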

COBWEB is efficient at producing trees with a reasonable number of classes. Because it allows probabilistic membership, its categories are flexible and robust.

Summary:
- Tree clustering
- Linkage rules
- Conceptual clustering: COBWEB
- Category utility

Assessment
Cluster validation
