Iterative Optimization and Simplification of Hierarchical Clusterings
Doug Fisher, Department of Computer Science, Vanderbilt University
Journal of Artificial Intelligence Research 4 (1996)
Presentation: Yugong Cheng, 04/23/02

2 Outline: Introduction – Objective Function – Iterative Optimization Methods and Experiments – Simplification of Hierarchical Clustering – Conclusion – Final Exam Questions – Summary

3 Introduction Clustering is a process of unsupervised learning, which groups objects into clusters. Major Clustering Methods –Partitioning –Hierarchical –Density-based –Grid-based –Model-based

4 Introduction (Continued) Clustering systems differ in their objective function and control strategy. Usually a search strategy cannot both be computationally inexpensive and give any guarantee about clustering quality.

5 Introduction (Continued)  This paper discusses the use of iterative optimization and simplification to construct clusterings that satisfy both conditions: high quality and low computational cost.  The suggested method involves two steps: constructing a clustering inexpensively, then using an iterative optimization method to improve the clustering.

6 Category Utility
$CU(C_k) = P(C_k) \sum_i \sum_j \big[ P(A_i = V_{ij} \mid C_k)^2 - P(A_i = V_{ij})^2 \big]$
$PU(\{C_1, C_2, \ldots, C_N\}) = \sum_k CU(C_k) / N$
where an observation is a vector of values V_ij along attributes (or variables) A_i. This measure rewards clusters C_k that increase the predictability of the V_ij within C_k (i.e., P(A_i = V_ij | C_k)) relative to their predictability in the population as a whole (i.e., P(A_i = V_ij)).
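
To make the objective function concrete, here is a minimal Python sketch (not taken from the paper) that computes CU and PU for a single-level partition of nominal observations; the data layout, with observations as dicts mapping attribute names to values, is an assumption made purely for illustration.

```python
from collections import Counter

def category_utility(cluster, population, attributes):
    """CU(C_k) = P(C_k) * sum_i sum_j [P(A_i = V_ij | C_k)^2 - P(A_i = V_ij)^2]."""
    p_ck = len(cluster) / len(population)
    score = 0.0
    for attr in attributes:
        in_cluster = Counter(obs[attr] for obs in cluster)
        overall = Counter(obs[attr] for obs in population)
        # Sum over every value the attribute takes anywhere in the population.
        for value, total in overall.items():
            p_cond = in_cluster.get(value, 0) / len(cluster)
            p_uncond = total / len(population)
            score += p_cond ** 2 - p_uncond ** 2
    return p_ck * score

def partition_utility(partition, attributes):
    """PU({C_1 ... C_N}): average CU over the clusters of the partition."""
    population = [obs for cluster in partition for obs in cluster]
    return sum(category_utility(c, population, attributes)
               for c in partition) / len(partition)

# Toy usage: two clusters of animals described by two nominal attributes.
animals = [{"cover": "fur", "legs": "4"}, {"cover": "fur", "legs": "4"},
           {"cover": "feathers", "legs": "2"}, {"cover": "feathers", "legs": "2"}]
print(partition_utility([animals[:2], animals[2:]], ["cover", "legs"]))
```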

8 Hierarchical Sorting Given an observation and the current partition, evaluate the quality of the clusterings that result from: –Placing the observation in each of the existing clusters –Creating a new cluster that covers only the new observation Select the option that yields the highest quality score (PU).
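
A minimal sketch of this placement step at a single level of the hierarchy, assuming a one-argument `quality` scorer over partitions (for example, a wrapper around the `partition_utility` function sketched above); the list-of-lists partition representation is illustrative, not the paper's data structure.

```python
def sort_observation(observation, partition, quality):
    """Place the observation where it yields the highest-quality partition:
    either inside one of the existing clusters or in a new singleton cluster."""
    candidates = []
    for i in range(len(partition)):
        candidate = [list(c) for c in partition]   # copy, then try cluster i
        candidate[i].append(observation)
        candidates.append(candidate)
    candidates.append([list(c) for c in partition] + [[observation]])
    return max(candidates, key=quality)

# Usage sketch: quality = lambda p: partition_utility(p, ["cover", "legs"])
```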

10 Iterative Optimization Methods Reorder-resort (Cluster/2): seed selection, reordering, and re-clustering. Iterative redistribution of single observations: moving single observations one by one. Iterative hierarchical redistribution: moving clusters together with their sub-trees.

11 Reorder-resort (k-means) k-means: k random seeds are selected, and k clusters are grown around these attractors; the centroids of the clusters are then picked as new seeds, and new clusters are grown. The process iterates until there is no further improvement in the quality of the generated clustering.
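
The k-means procedure the slide describes might be sketched as follows for numeric data; this is a generic implementation, used here only to illustrate the seed/centroid iteration (the paper optimizes the PU measure rather than plain Euclidean k-means).

```python
import numpy as np

def kmeans(data, k, max_iter=100, seed=None):
    """Grow k clusters around random seeds, recompute the centroids as the new
    seeds, and repeat until the assignment no longer changes."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)]
    assignment = None
    for _ in range(max_iter):
        # Assign each observation to its nearest center (Euclidean distance).
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        new_assignment = dists.argmin(axis=1)
        if assignment is not None and np.array_equal(new_assignment, assignment):
            break  # no change in assignment => no further improvement
        assignment = new_assignment
        # The centroids of the current clusters become the new seeds.
        centers = np.array([data[assignment == j].mean(axis=0)
                            if np.any(assignment == j) else centers[j]
                            for j in range(k)])
    return assignment, centers

# Usage: assignment, centers = kmeans(np.random.rand(100, 2), k=3)
```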

12 Reorder-resort (k-means) Ordering the data so that consecutive observations are dissimilar (based on Euclidean distance) leads to good clusterings. A biased "dissimilarity" ordering is extracted from the hierarchical clustering. The overall procedure: initial sorting, extraction of a dissimilarity ordering, re-clustering.
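
One plausible way to realize the biased "dissimilarity" ordering is to round-robin over the top-level clusters of the current hierarchy, so that consecutive observations come from different regions of the data; this is a simplified sketch of the idea, not the paper's exact extraction procedure.

```python
from itertools import zip_longest

def dissimilarity_ordering(top_level_clusters):
    """Interleave observations from different clusters so that consecutive
    observations in the resulting ordering tend to be dissimilar."""
    ordering = []
    for group in zip_longest(*top_level_clusters):
        ordering.extend(obs for obs in group if obs is not None)
    return ordering

# Example: [[a1, a2, a3], [b1, b2]] -> [a1, b1, a2, b2, a3]
```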

13 Iterative Redistribution of Single Observations Moves single observations from cluster to cluster. A cluster that contains only one observation is removed and its single observation is resorted. Iterate until two consecutive iterations yield the same clustering.

14 Single Observation Redistribution Variations The ISODATA algorithm determines a target cluster for each observation but does not move any observation until targets for all observations have been determined. A sequential version moves each observation as soon as its target is identified through sorting.
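
The two variations can be sketched over a flat partition with a one-argument `quality` scorer (e.g., PU); the helper names and the simple list-of-lists representation are assumptions made for illustration.

```python
def move(partition, obs, src, dst):
    """Move obs from cluster src to cluster dst; dst == len(partition) creates a new cluster."""
    partition[src].remove(obs)
    if dst == len(partition):
        partition.append([])
    partition[dst].append(obs)

def best_target(partition, obs, src, quality):
    """Index of the cluster (possibly a brand-new singleton) where obs scores best;
    ties favour leaving the observation where it is."""
    best, best_score = src, quality([c for c in partition if c])
    for dst in range(len(partition) + 1):          # last index means "new cluster"
        if dst == src:
            continue
        trial = [list(c) for c in partition]
        move(trial, obs, src, dst)
        score = quality([c for c in trial if c])   # emptied clusters are dropped before scoring
        if score > best_score:
            best, best_score = dst, score
    return best

def redistribute_batch(partition, quality):
    """ISODATA-style: determine a target for every observation first, then move them all."""
    moves = [(obs, src, best_target(partition, obs, src, quality))
             for src, cluster in enumerate(partition) for obs in cluster]
    for obs, src, dst in moves:
        if dst != src:
            move(partition, obs, src, dst)
    return [c for c in partition if c]

def redistribute_sequential(partition, quality):
    """Sequential variation: move each observation as soon as its target is identified."""
    for src in range(len(partition)):
        for obs in list(partition[src]):
            dst = best_target(partition, obs, src, quality)
            if dst != src:
                move(partition, obs, src, dst)
    return [c for c in partition if c]
```

Wrapping either function in a loop that stops when two consecutive passes return the same partition gives the iteration described on the previous slide.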

15 Iterative Hierarchical Redistribution Takes large steps in the search for a better clustering. Resorts sub-trees instead of single observations. Removing a sub-tree requires that the variable-value counts of its ancestors be decremented; likewise, the host cluster's variable-value counts need to be incremented.

16 Scheme Given an existing hierarchical clustering, a recursive loop examines sibling clusters in the hierarchy in a depth-first fashion. An inner, iterative loop examines each sibling based on the objective function and repeats until two consecutive iterations lead to the same set of siblings.

17 (Continued) The recursive loop then turns its attention to the children of each of the remaining siblings. Finally, the leaves are reached and resorted. The recursive loop is applied several times until no changes occur from one pass to the next.
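
A hedged sketch of this control structure, using a simple `Node` tree and a one-argument `quality` scorer over partitions of observations; the real algorithm also incrementally maintains the attribute-value counts along the path (the decrement/increment on slide 15), which is omitted here for brevity.

```python
class Node:
    """A cluster node: a leaf holds one observation, an internal node holds children."""
    def __init__(self, obs=None, children=None):
        self.obs = obs
        self.children = children or []

    def leaves(self):
        if not self.children:
            return [self.obs]
        return [o for c in self.children for o in c.leaves()]

def hierarchical_redistribution(node, quality):
    """Depth-first recursive loop: at each level, repeatedly try to re-home whole
    sibling sub-trees, then recurse into the siblings that remain."""
    changed = True
    while changed and len(node.children) > 1:      # inner iterative loop over siblings
        changed = False
        for child in list(node.children):
            rest = [c for c in node.children if c is not child]
            # Candidate partitions of this level's observations: keep `child` as its
            # own sibling, or merge its whole sub-tree into one of the other siblings.
            candidates = [[c.leaves() for c in node.children]]
            for i in range(len(rest)):
                merged = [c.leaves() for c in rest]
                merged[i] = merged[i] + child.leaves()
                candidates.append(merged)
            best = max(range(len(candidates)), key=lambda i: quality(candidates[i]))
            if best != 0:
                host = rest[best - 1]
                if not host.children:              # a leaf host becomes an internal node
                    host.children, host.obs = [Node(obs=host.obs)], None
                host.children.append(child)        # the sub-tree moves as a unit
                node.children.remove(child)
                changed = True
    for child in node.children:
        hierarchical_redistribution(child, quality)
```

An outer loop would invoke this on the root repeatedly until a full pass produces no changes, as the slide describes.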

19 Experiment conditions –The initial clustering is generated by hierarchical sorting over either a random ordering of observations or a similarity ordering of observations, which samples observations within the same region before sampling observations from differing regions. –The optimization strategies are then applied. –The primary goal of clustering is assumed to be the discovery of a single-level partitioning of the data that is of optimal quality.

20 Comparison between Iterative Optimization Strategies

21 Main findings from the table: Hierarchical redistribution achieves the highest mean PU scores, in both the random and similarity cases, in 3 of 4 domains. Reordering and re-clustering comes closest to hierarchical redistribution's performance in all cases, bettering it in 1 domain. Single-observation redistribution modestly improves an initial sort, but is substantially worse than the other two optimization methods.

22 Time requirements

23 Level of Tree

24 Simplifying Hierarchical Clustering Simplify the hierarchical clustering and minimize classification cost. Minimize error rate. A validation set is used to identify the frontier of clusters for prediction of each variable. A node that lies below the frontier of every variable is pruned.

25 Validation For each variable A_i, the objects from the validation set are each classified through the hierarchical clustering with the value of variable A_i "masked" for purposes of classification. At each cluster encountered during classification, the observation's value for A_i is compared to the most probable value for A_i at the cluster. A count of all correct predictions for each variable at a cluster is maintained. A preferred frontier for each variable is identified that maximizes the number of correct predictions for the variable.
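
A hedged sketch of this validation pass, reusing the simple `Node` tree from the earlier redistribution sketch; the rule used to descend the tree (match the most probable values of the unmasked attributes) is a simplification, and all names here are illustrative rather than the paper's.

```python
from collections import Counter, defaultdict

def most_probable_value(node, attr):
    """Most frequent value of attr among the observations covered by node."""
    return Counter(o[attr] for o in node.leaves()).most_common(1)[0][0]

def classify_path(root, obs, masked):
    """Classify obs down the tree with attribute `masked` hidden: at each step,
    descend to the child that best matches obs on the unmasked attributes."""
    path, node = [root], root
    while node.children:
        node = max(node.children,
                   key=lambda c: sum(most_probable_value(c, a) == obs[a]
                                     for a in obs if a != masked))
        path.append(node)
    return path

def correct_counts(root, validation_set, masked):
    """Per-node count of validation observations whose masked value is predicted
    correctly by that node's most probable value."""
    counts = defaultdict(int)
    for obs in validation_set:
        for node in classify_path(root, obs, masked):
            if most_probable_value(node, masked) == obs[masked]:
                counts[id(node)] += 1
    return counts

def preferred_frontier(node, counts):
    """Frontier under node that maximizes correct predictions: either predict at
    node itself, or delegate to the best frontiers of its children."""
    here = counts[id(node)]
    if not node.children:
        return {id(node)}, here
    below, below_score = set(), 0
    for c in node.children:
        f, s = preferred_frontier(c, counts)
        below |= f
        below_score += s
    return (below, below_score) if below_score > here else ({id(node)}, here)
```

A node that lies below the preferred frontier of every variable would then be pruned, which is the pruning rule stated on the previous slide.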

28 Concluding Remarks There are three phases in searching the space of hierarchical clusterings: –Inexpensive generation of an initial clustering –Iterative optimization of the clustering –Retrospective simplification of the generated clustering The new method, hierarchical redistribution optimization, works well.

29 Final Exam Questions 1. The main idea of the paper is to construct clusterings that satisfy two conditions: 1) name the conditions, 2) name the two steps used to satisfy them. 1) The clusterings should satisfy both conditions: high quality and low computational cost. 2) First construct a clustering inexpensively (hierarchical sorting), then use an iterative optimization method to improve the quality of the clustering (reorder-resort, iterative single-observation redistribution, hierarchical redistribution).

Final Exam Question 2. Describe the three iterative methods for clustering optimization. Reorder-resort (k-means): extract a biased "dissimilarity" ordering from the initial hierarchical clustering, then perform k-means-style partitioning iteratively. Iterative redistribution of single observations: move single observations one by one; a cluster that contains only one observation is removed and its single observation is resorted; iterate until two consecutive iterations yield the same clustering. Hierarchical redistribution: takes large steps in the search for a better clustering by resorting sub-trees instead of single observations. Given an existing hierarchical clustering, a recursive loop examines sibling clusters in the hierarchy in a depth-first fashion; an inner, iterative loop examines each sibling based on the objective function and repeats until two consecutive iterations lead to the same set of siblings. The recursive loop then turns its attention to the children of each of the remaining siblings; finally, the leaves are reached and resorted. The recursive loop is applied several times until no changes occur from one pass to the next.

31 Final Exam Question 3. (1) A cluster is better when the relative CU score is a) big, b) small, c) equal to 0. A cluster is better with a higher CU score, so choose a). (2) Which ordering method is better? a) random sorting, b) similarity sorting. A dissimilarity ordering yields better clusterings, so a random ordering of the samples is better: choose a).