GRAPH-BASED HIERARCHICAL CONCEPTUAL CLUSTERING by Istvan Jonyer, Lawrence B. Holder and Diane J. Cook The University of Texas at Arlington.


1 GRAPH-BASED HIERARCHICAL CONCEPTUAL CLUSTERING by Istvan Jonyer, Lawrence B. Holder and Diane J. Cook The University of Texas at Arlington

2 Outline
What is hierarchical conceptual clustering?
Overview of Subdue
Conceptual clustering in Subdue
Evaluation of hierarchical clusterings
Experiments and results
Conclusions

3 What is clustering?

4 What is hierarchical conceptual clustering?
Unsupervised concept learning
Generating hierarchies to explain data
Applications
  – Hypothesis generation and testing
  – Prediction based on groups
  – Finding taxonomies

5 Example hierarchical conceptual clustering
Animals
  HeartChamber: four, BodyTemp: regulated, Fertilization: internal
    Name: mammal, BodyCover: hair
    Name: bird, BodyCover: feathers
  BodyTemp: unregulated
    Name: reptile, BodyCover: cornified-skin, HeartChamber: imperfect-four, Fertilization: internal
    Fertilization: external
      Name: fish, BodyCover: scales, HeartChamber: two
      Name: amphibian, BodyCover: moist-skin, HeartChamber: three

6 The Problem
Hierarchical conceptual clustering in discrete-valued structural databases
Existing systems:
  – Continuous-valued
  – Discrete but unstructured
  – We can do better! (The field is under-explored)

7 Related Work
Unsupervised learning algorithms
  – Cobweb
  – Labyrinth
  – AutoClass
  – Snob
  – In Euclidean space: Chameleon, Cure

8 The Solution Take Subdue and extend it!

9 Overview of Subdue
Data mining in graph representations of structural databases
[Figure: example input graph with vertex labels A to F and edge labels a to g]

10 Overview of Subdue
Iteratively searching for best substructure by MDL heuristic
[Figure: best substructure, with vertices A, B, C, D and edges a, b, c]
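The MDL idea on this slide can be illustrated with a small sketch. It assumes a simplified size measure (vertices plus edges) standing in for a true bit-level description length, and the function names are hypothetical, not part of Subdue.

```python
def description_length(num_vertices, num_edges):
    """Simplified stand-in for description length: vertices plus edges."""
    return num_vertices + num_edges


def compressed_length(graph, sub, instances):
    """Size of the substructure plus size of the graph compressed with it.

    graph, sub: (num_vertices, num_edges) tuples.
    instances: number of non-overlapping occurrences of sub in graph.
    A smaller result means the substructure compresses the graph better.
    """
    gv, ge = graph
    sv, se = sub
    # Each instance collapses into one vertex and its internal edges disappear.
    remaining_v = gv - instances * sv + instances
    remaining_e = ge - instances * se
    return description_length(sv, se) + description_length(remaining_v, remaining_e)
```

Under this measure a substructure only pays off when it occurs often enough: a single occurrence adds the cost of defining the substructure without removing enough of the graph to compensate.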

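11 Overview of Subdue
Compress using best substructure
[Figure: compressed graph in which each instance of the best substructure is replaced by a vertex S; remaining vertices E, F and edges d, e, f, g]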
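A minimal sketch of the compression step, assuming the graph is held as a dictionary of vertex labels and a list of labeled edges, and that the instances of the best substructure are already known and do not overlap. The representation and the function name are illustrative assumptions, not Subdue's internals.

```python
def compress(vertices, edges, instances, label="S"):
    """Replace each instance of a substructure with a single vertex.

    vertices: {vertex_id: label}
    edges: list of (source_id, target_id, label)
    instances: list of sets of vertex ids, assumed not to overlap
    """
    owner = {}          # original vertex id -> id of the vertex replacing it
    new_vertices = {}
    for k, instance in enumerate(instances):
        new_id = f"{label}{k}"
        new_vertices[new_id] = label
        for v in instance:
            owner[v] = new_id
    for v, vertex_label in vertices.items():
        if v not in owner:
            new_vertices[v] = vertex_label
    new_edges = []
    for u, v, edge_label in edges:
        cu, cv = owner.get(u, u), owner.get(v, v)
        if cu == cv and u in owner:
            continue    # edge lies inside one instance, so it is absorbed
        new_edges.append((cu, cv, edge_label))
    return new_vertices, new_edges
```

Edges that cross an instance boundary are redirected to the new vertex, so the rest of the graph stays connected to the compressed instances.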

12 Overview of Subdue
Fuzzy match
  – Inexact matching of subgraphs
  – Applications:
    Defining fuzzy concepts
    Evaluation of clusterings

13 Conceptual Clustering with Subdue
Use Subdue to identify clusters
  – The best subgraph in an iteration defines a cluster
When to stop within an iteration?
  1) Use -limit option
  2) Use -size option
  3) Use first minimum heuristic (new)

14 The First Minimum Heuristic
Use subgraph at first local minimum
  – Detect it using -prune2 option

15 The First Minimum Heuristic
Not a greedy heuristic!
  – Although the first local minimum is usually the global minimum
  – First local minimum is caused by a smaller, more frequently occurring subgraph
  – Subsequent minima are caused by bigger, less frequently occurring subgraphs
=> First subgraph is more general
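To make the stopping rule concrete, here is a small sketch. It assumes the search produces one description-length value per successively larger candidate subgraph, which is a simplification of the real search space, and the function name is hypothetical.

```python
def first_local_minimum(values):
    """Return the index of the first value that is followed by a larger one.

    If the values never rise, the last index is returned.
    An empty sequence yields None.
    """
    for i in range(len(values) - 1):
        if values[i + 1] > values[i]:
            return i
    return len(values) - 1 if values else None


# Illustrative numbers only: the first minimum (index 2) is not the global one (index 4).
lengths = [10, 8, 7, 9, 6, 11]
chosen = first_local_minimum(lengths)
```

In this made-up sequence the rule stops at the earlier, shallower minimum, which corresponds to the smaller and more general subgraph described on the slide.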

16 The First Minimum Heuristic A multi-minimum search space:

17 Lattice vs. Tree
Previous work defined classification trees
  – Inadequate in structured domains
Better hierarchical description: classification lattice
  – A cluster can have more than one parent
  – A parent can be at any level (not only one level above)
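The difference between a tree and a lattice can be sketched with a tiny data structure. The class and cluster names below are hypothetical and only illustrate the two properties listed on the slide.

```python
from collections import defaultdict


class ClassificationLattice:
    """Cluster hierarchy in which a cluster may have several parents."""

    def __init__(self):
        self.parents = defaultdict(set)

    def add_cluster(self, name, parents=("root",)):
        # Parents may sit at any level, not only one level above.
        self.parents[name].update(parents)

    def level(self, name):
        """Length of the longest path from the root to this cluster."""
        if name == "root":
            return 0
        return 1 + max(self.level(p) for p in self.parents[name])

    def is_tree(self):
        """True only if every cluster has exactly one parent."""
        return all(len(ps) == 1 for ps in self.parents.values())


lattice = ClassificationLattice()
lattice.add_cluster("S1")
lattice.add_cluster("S2", parents=("S1",))
lattice.add_cluster("S3", parents=("S1", "S2"))  # two parents, two levels apart
```

Here S3 sits at level 3 while one of its parents, S1, sits at level 1, which a classification tree could not express.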

18 Hierarchical Clustering in Subdue
Subdue can compress by a subgraph after each iteration
Subsequent clusters may be defined in terms of previously defined clusters
This results in a hierarchy
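One way to picture how the hierarchy arises: if each cluster definition is recorded as the list of vertex labels in its subgraph, then any label that names an earlier cluster becomes a parent link. The sketch below assumes that representation; the names are illustrative, not Subdue's output format.

```python
def build_hierarchy(definitions):
    """Derive parent links from cluster definitions.

    definitions: dict mapping cluster name to the vertex labels of its
    defining subgraph, in the order the clusters were discovered.
    A cluster that uses no earlier cluster hangs directly off the root.
    """
    parents = {}
    for name, labels in definitions.items():
        # Only clusters discovered earlier are already keys of `parents`.
        used = sorted({label for label in labels if label in parents})
        parents[name] = used or ["root"]
    return parents


hierarchy = build_hierarchy({
    "S1": ["A", "B"],
    "S2": ["S1", "C"],
    "S3": ["S1", "S2", "D"],
})
```

Because the graph is compressed after each iteration, later subgraphs can contain the vertices that stand for earlier clusters, which is what produces the parent links.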

19 Hierarchical Conceptual Clustering of an Artificial Domain

20 Root

21 Evaluation of Clusterings
Traditional evaluation:
  – Not applicable to hierarchical domains
No known evaluation for hierarchical clusterings
  – Most hierarchical evaluations are anecdotal

22 New Evaluation Heuristic for Hierarchical Clusterings
Properties of a good clustering:
  – Small number of clusters
    Large coverage => good generality
  – Big cluster descriptions
    More features => more inferential power
  – Minimal or no overlap between clusters
    More distinct clusters => better defined concepts

23 New Evaluation Heuristic for Hierarchical Clusterings
Big clusters: bigger distance between disjoint clusters
Overlap: less overlap => bigger distance
Few clusters: averaging comparisons
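The slides do not give the formula, so the following is only a toy stand-in that shows the three properties in action, not the authors' heuristic. It assumes each cluster is described by a set of features and uses the count of unshared features as the distance.

```python
from itertools import combinations


def distance(a, b):
    """Toy distance: number of features that the two descriptions do not share."""
    return len(a ^ b)


def clustering_quality(clusters):
    """Average pairwise distance between cluster descriptions (toy measure)."""
    pairs = list(combinations(clusters, 2))
    if not pairs:
        return 0.0
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


disjoint = [{"hair", "four"}, {"scales", "two"}]
overlapping = [{"hair", "four"}, {"feathers", "four"}]
```

With this measure larger disjoint descriptions push the score up, shared features pull it down, and averaging over pairs keeps the score from growing merely because there are more clusters.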

24 Experiments and Results
Validation in an artificial domain
Validation in unstructured domains
Comparison to existing systems
Real world applications

25 The Animal Domain
Name       Body Cover      Heart Chamber   Body Temp.   Fertilization
mammal     hair            four            regulated    internal
bird       feathers        four            regulated    internal
reptile    cornified-skin  imperfect-four  unregulated  internal
amphibian  moist-skin      three           unregulated  external
fish       scales          two             unregulated  external
[Figure: graph representation of the mammal row: an "animal" vertex with edges labeled Name, BodyCover, HeartChamber, BodyTemp and Fertilization pointing to the vertices mammal, hair, four, regulated and internal]
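The graph representation described on this slide can be sketched as follows: each table row becomes one "animal" vertex with an edge per attribute. The data comes from the table above; the function name and the vertex/edge containers are assumptions made for illustration.

```python
ATTRIBUTES = ["Name", "BodyCover", "HeartChamber", "BodyTemp", "Fertilization"]
ROWS = [
    ("mammal", "hair", "four", "regulated", "internal"),
    ("bird", "feathers", "four", "regulated", "internal"),
    ("reptile", "cornified-skin", "imperfect-four", "unregulated", "internal"),
    ("amphibian", "moist-skin", "three", "unregulated", "external"),
    ("fish", "scales", "two", "unregulated", "external"),
]


def rows_to_graph(rows, attributes):
    """Turn each row into a star: an 'animal' hub with one labeled edge per attribute."""
    vertices = {}   # vertex id -> label
    edges = []      # (hub id, value vertex id, attribute name)
    next_id = 0
    for row in rows:
        hub = next_id
        vertices[hub] = "animal"
        next_id += 1
        for attribute, value in zip(attributes, row):
            vertices[next_id] = value
            edges.append((hub, next_id, attribute))
            next_id += 1
    return vertices, edges


animal_vertices, animal_edges = rows_to_graph(ROWS, ATTRIBUTES)
```

Repeated values such as "four" or "regulated" get their own vertex in every star, so a substructure shared by several animals shows up as several instances of the same small subgraph.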

26 Hierarchical Clustering of the Animal Domain
Animals
  HeartChamber: four, BodyTemp: regulated, Fertilization: internal
    Name: mammal, BodyCover: hair
    Name: bird, BodyCover: feathers
  BodyTemp: unregulated
    Name: reptile, BodyCover: cornified-skin, HeartChamber: imperfect-four, Fertilization: internal
    Fertilization: external
      Name: fish, BodyCover: scales, HeartChamber: two
      Name: amphibian, BodyCover: moist-skin, HeartChamber: three

27 Hierarchical Clustering of the Animal Domain by Cobweb
animals
  amphibian/fish
    fish
    amphibian
  mammal/bird
    mammal
    bird
  reptile

28 Comparison of Subdue and Cobweb
Quality of Subdue’s lattice (tree): 2.60
Quality of Cobweb’s tree: 1.74
Therefore Subdue’s clustering scores higher on this heuristic
Reasons for a higher score:
  – Better generalization resulting in fewer clusters
  – Eliminating overlap between (reptile) and (amphibian/fish)

29 Chemical Application: Clustering of a DNA sequence

30 DNA
Coverage
  – 61%
  – 68%
  – 71%
[Figure: fragments of the DNA molecule covered by the discovered clusters, including a phosphate group]

31 Conclusions
Goal of hierarchical conceptual clustering of structured databases was achieved
Synthesized classification lattice
Developed new evaluation heuristic for hierarchical clusterings
Good performance in comparison to other systems, even in unstructured domains

32 Future Work
More experiments on real-world domains
Comparison to other systems
Incorporation of evaluation tool into Subdue

