Iterative Optimization and Simplification of Hierarchical Clusterings. Doug Fisher, Department of Computer Science, Vanderbilt University. Journal of Artificial Intelligence Research, 4 (1996).


Iterative Optimization and Simplification of Hierarchical Clusterings Doug Fisher Department of Computer Science, Vanderbilt University Journal of Artificial Intelligence Research, 4 (1996) Presented by: Biyu Liang ('06), Paul Haake ('07)

2 Outline
- Introduction
- Fast but Rough Clustering: Hierarchical Sorting
- Iterative Optimization Methods and Comparison
- Simplification of Hierarchical Clustering
- Conclusion

3 Introduction
- Overview of the method:
  - Construct an initial clustering inexpensively
  - Iteratively optimize the clustering using some control strategy
  - Simplify the clustering
- Goals:
  - Find high-quality clusterings without overfitting
  - Good CPU efficiency

4 Introduction (continued)
Properties of any clustering algorithm:
- Objective function: evaluates the quality of a particular clustering on a set of data.
- Control strategy: specifies how the algorithm searches the space of all possible clusterings, given some objective function.
In this paper, the authors compare different control strategies using the same objective function.

5 Outline
- Introduction
- Fast but Rough Clustering: Hierarchical Sorting
- Iterative Optimization Methods and Experiments
- Simplification of Hierarchical Clustering
- Conclusion

6 Hierarchical Sorting Greedy algorithm to quickly build an initial rough clustering. All three control strategies (discussed later) begin with the clustering generated by hierarchical sorting. By shuffling records around, they improve the clustering.

7 Hierarchical Sorting
CU(C_k) = P(C_k) Σ_i Σ_j [ P(A_i = V_ij | C_k)^2 - P(A_i = V_ij)^2 ]
- Clusters whose data records have similar attribute values have a higher CU score.
- Objective function = the “partition utility” (PU), the average CU value over all clusters.
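A minimal Python sketch (my own rendering, not code from the paper or the slides) of how CU and PU might be computed, assuming each record is a dict mapping attribute names to categorical values and a clustering is a list of clusters, each a list of records:

```python
from collections import Counter

def value_probs(records, attr):
    """Estimate P(A_i = V_ij) from a list of records."""
    counts = Counter(r[attr] for r in records)
    n = len(records)
    return {v: c / n for v, c in counts.items()}

def category_utility(cluster, all_records, attrs):
    """CU(C_k) = P(C_k) * sum_i sum_j [P(A_i=V_ij | C_k)^2 - P(A_i=V_ij)^2]."""
    p_cluster = len(cluster) / len(all_records)
    score = 0.0
    for attr in attrs:
        within = value_probs(cluster, attr)       # conditioned on the cluster
        overall = value_probs(all_records, attr)  # unconditional baseline
        score += sum(p * p for p in within.values())
        score -= sum(p * p for p in overall.values())
    return p_cluster * score

def partition_utility(clusters, attrs):
    """PU = average CU over all clusters in the partition."""
    all_records = [r for c in clusters for r in c]
    return sum(category_utility(c, all_records, attrs)
               for c in clusters) / len(clusters)
```

A cluster whose members agree on many attribute values has within-cluster probabilities near 1, which is what lifts its CU above the unconditional baseline.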

8 Hierarchical Sorting
- Start with an empty clustering and add each data record one at a time.
- For each record being added, there are two choices:
  - Place the record in some existing cluster in the hierarchy
  - Place the record in a new cluster
- Select the option that yields the highest quality score (PU); a rough sketch follows below.
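For illustration only, here is the greedy placement step simplified to a flat (single-level) partition rather than a full hierarchy; it reuses partition_utility from the previous sketch and is an assumption about a reasonable rendering, not the paper's algorithm:

```python
def sort_record(clusters, record, attrs):
    """Place one record wherever it yields the highest PU."""
    candidates = []
    # Option 1: add the record to each existing cluster in turn.
    for i in range(len(clusters)):
        option = [list(c) for c in clusters]
        option[i].append(record)
        candidates.append(option)
    # Option 2: put the record in a brand-new cluster.
    candidates.append([list(c) for c in clusters] + [[record]])
    return max(candidates, key=lambda cs: partition_utility(cs, attrs))

def initial_sort(records, attrs):
    """Greedy, order-dependent construction of an initial clustering."""
    clusters = []
    for record in records:
        clusters = sort_record(clusters, record, attrs)
    return clusters
```

Because the result depends on the order in which records arrive, this pass is only meant to be fast, not optimal; the iterative methods that follow refine it.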

9

10

11 Outline
- Introduction
- Fast but Rough Clustering: Hierarchical Sorting
- Iterative Optimization Methods and Comparison
- Simplification of Hierarchical Clustering
- Conclusion

12 Iterative Optimization Methods
Important note: the primary goal of clustering in this paper is to obtain a single-level partitioning of optimal quality; hierarchical clustering is used only as an intermediate means toward that end. To evaluate the quality of a solution, the authors therefore apply the objective function only to the first-level partition.

13 Iterative Optimization Methods
- Reorder-resort (CLUSTER/2): very similar to k-means
- Iterative redistribution of single observations: reassign each record to a better cluster
- Iterative hierarchical redistribution: reassign each record or subtree of records to a better cluster

14 Reorder-resort (k-means)
- k random seeds are selected, and k clusters are grown around these attractors.
- The centroids of the clusters are picked as new seeds.
- The process iterates until there is no further improvement in the quality of the generated clustering.
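A rough sketch of this loop for categorical data; the overlap distance and modal "centroid" below are my own simplifications, not necessarily what the CLUSTER/2 variant in the paper uses:

```python
import random
from collections import Counter

def mismatch(r1, r2, attrs):
    """Overlap distance: number of attributes on which two records disagree."""
    return sum(r1[a] != r2[a] for a in attrs)

def modal_record(cluster, attrs):
    """A categorical 'centroid': the most common value of each attribute."""
    return {a: Counter(r[a] for r in cluster).most_common(1)[0][0] for a in attrs}

def reorder_resort(records, attrs, k, max_iters=20):
    seeds = random.sample(records, k)
    clusters = []
    for _ in range(max_iters):
        # Grow k clusters around the current seeds.
        clusters = [[] for _ in range(k)]
        for r in records:
            nearest = min(range(k), key=lambda j: mismatch(r, seeds[j], attrs))
            clusters[nearest].append(r)
        # Pick new seeds from the clusters just formed.
        new_seeds = [modal_record(c, attrs) if c else seeds[i]
                     for i, c in enumerate(clusters)]
        if new_seeds == seeds:   # no change, so no further improvement
            break
        seeds = new_seeds
    return clusters
```

The next slide's point is that seeds drawn from a dissimilarity ordering of the hierarchical sort tend to work better than the purely random seeding used in this sketch.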

15 Reorder-resort (k-means), continued
- Ordering the data so that consecutive observations are dissimilar leads to good clusterings.
- Extract a “dissimilarity” ordering from the hierarchical sort: consecutive records will tend to be dissimilar.

16 Iterative Redistribution of Single Observations
Repeat until the clustering doesn't change:
- For every record, remove it from the clustering and resort it, beginning at the root.
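A sketch of the single-observation loop, again simplified to a flat partition and reusing sort_record and partition_utility from the earlier sketches (the paper resorts each record through the whole hierarchy from the root; this flat version is my simplification):

```python
def redistribute_single(clusters, attrs, max_passes=10):
    """Remove each record and resort it; stop when a full pass changes nothing."""
    for _ in range(max_passes):
        moved = False
        for record in [r for c in clusters for r in c]:
            # Take the record out, dropping its cluster if it becomes empty.
            remaining = [[r for r in c if r is not record] for c in clusters]
            remaining = [c for c in remaining if c]
            candidate = sort_record(remaining, record, attrs)
            if partition_utility(candidate, attrs) > partition_utility(clusters, attrs):
                clusters = candidate
                moved = True
        if not moved:
            break
    return clusters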

17 Iterative Hierarchical Redistribution Problem: The last control strategy resorts only one record at a time. Solution: Resort entire subtrees of records at a time.

18 Iterative Hierarchical Redistribution
Hierarchical-Redistribute-Recurse(SiblingSet):
  Repeat until two consecutive clusterings have the same set of siblings:
    For each sibling in SiblingSet:
      - Remove the sibling from the hierarchy and resort it
      - SiblingSet ← remaining siblings
  For each sibling S in SiblingSet:
    Call Hierarchical-Redistribute-Recurse(S.children)

Top level: Repeat until the clustering converges:
  Clustering ← Hierarchical-Redistribute-Recurse(Clustering.root.children)

19

20 Main findings from the experiments
- Hierarchical redistribution achieves the highest mean PU scores in most cases.
- Reordering and re-clustering comes closest to hierarchical redistribution’s performance in all cases.
- Single-observation redistribution modestly improves an initial sort, and is substantially worse than the other two optimization methods.

21 Outline
- Introduction
- Generating Initial Hierarchical Clustering
- Iterative Optimization Methods and Comparison
- Simplification of Hierarchical Clustering
- Conclusion

22 Simplifying Hierarchical Clustering Higher levels of the hierarchy are meaningful, but lower levels are subject to overfitting. Solution: post-process the hierarchy with validation and pruning.

23 Validation Strategy: Find internal nodes that are most predictive on unseen data (a testing set). What does “predictive” mean in this case? When a data record is classified into a cluster, we want to know how accurately that cluster, in turn, can predict the data record's attribute values. In a high-quality clustering, we expect that an unseen data record, classified into some cluster, will have attribute values similar to the attribute values of other data records in the cluster.

24 Validation
For each variable A_i:
  For each data record:
    - Classify the data record through the cluster hierarchy, beginning at the root and ignoring the value of A_i.
    - At each node, compare the record's A_i value to the node's expected A_i value; keep a counter of correct predictions for each variable at each node.
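The counting step might look roughly like this on a flat partition; this is a simplification of the hierarchical walk described above, and best_cluster plus the modal prediction are assumed stand-ins rather than the paper's procedure:

```python
from collections import Counter

def best_cluster(clusters, record, attrs_to_use):
    """Assign a record to the cluster it matches best, ignoring the masked attribute."""
    def fit(cluster):
        return sum(r[a] == record[a] for r in cluster for a in attrs_to_use)
    return max(clusters, key=fit)

def prediction_scores(clusters, test_records, attrs):
    """For each attribute A_i, classify each test record with A_i masked and
    count how often the chosen cluster's modal A_i value matches the record."""
    correct = {a: 0 for a in attrs}
    for record in test_records:
        for a in attrs:
            visible = [x for x in attrs if x != a]
            cluster = best_cluster(clusters, record, visible)
            predicted = Counter(r[a] for r in cluster).most_common(1)[0][0]
            if predicted == record[a]:
                correct[a] += 1
    return correct
```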

25 Validation After processing all variables, for each variable, identify a “frontier” in the hierarchy such that the number of correct predictions of that variable is maximized. If a node lies below the frontier of every variable, then it is pruned.
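One way the frontier selection and pruning could be realized, assuming each node of the hierarchy already carries per-attribute counts of correct predictions from a validation pass like the one above; the Node class and the tie-breaking are illustrative assumptions, not the paper's code:

```python
class Node:
    def __init__(self, children=None, correct=None):
        self.children = children or []   # empty list means a leaf
        self.correct = correct or {}     # attr -> correct-prediction count

def frontier(node, attr):
    """Return (score, cut): the set of nodes in this subtree that maximizes
    correct predictions of attr, plus that maximal count."""
    if not node.children:
        return node.correct.get(attr, 0), {node}
    child_score, child_cut = 0, set()
    for child in node.children:
        s, cut = frontier(child, attr)
        child_score += s
        child_cut |= cut
    own = node.correct.get(attr, 0)
    # Stop at this node if it predicts at least as well as its descendants.
    return (own, {node}) if own >= child_score else (child_score, child_cut)

def nodes_at_or_above(node, cut):
    """All nodes from `node` down to (and including) the members of the cut."""
    kept = {node}
    if node not in cut:
        for child in node.children:
            kept |= nodes_at_or_above(child, cut)
    return kept

def prune(root, attrs):
    """Drop every node that lies below the frontier of every attribute."""
    keep = set()
    for a in attrs:
        _, cut = frontier(root, a)
        keep |= nodes_at_or_above(root, cut)
    def trim(node):
        node.children = [c for c in node.children if c in keep]
        for c in node.children:
            trim(c)
    trim(root)
    return root
```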

26

27 Validation The authors' experiments show that their validation method substantially reduces clustering size without diminishing predictive accuracy.

28 Concluding Remarks
There are three phases in searching the space of hierarchical clusterings:
- Inexpensive generation of an initial clustering
- Iterative optimization of clusterings
- Post-processing simplification of generated clusterings
Experiments found that the new method, hierarchical redistribution optimization, beats the other iterative optimization methods in most cases.

29 Final Exam Question #1
The main idea in this paper is to construct clusterings which satisfy two conditions.
- Name the conditions:
  - Consistently constructs high-quality clusterings
  - Computationally inexpensive
- Name the two steps to satisfy the conditions:
  - Generate a tentative clustering inexpensively, using hierarchical sorting
  - Iteratively optimize that initial clustering

30 Final Exam Question #2
Describe the three iterative methods for clustering optimization:
- Seed Selection, Reordering, and Reclustering (pp. 14-15)
- Iterative Redistribution of Single Observations (p. 16)
- Iterative Hierarchical Redistribution (pp. 17-18)

31 Final Exam Question #3
- A cluster is better when the relative CU score is: a) big, b) small, c) equal to 0
- Which sorting method is better? a) random sorting, b) similarity sorting

Thanks! Questions?