Intelligent Database Systems Lab, 國立雲林科技大學 National Yunlin University of Science and Technology. Advisor: Dr. Hsu. Graduate: Jian-Lin Kuo. Authors: Aristidis Likas, Nikos Vlassis, Jakob J. Verbeek. The global k-means clustering algorithm. Received 23 March 2001; accepted 4 March 2002.

Outline: Motivation, Objective, Literature review, The global k-means algorithm, Speeding up execution, Experiments I, II, III, IV, Conclusions. N.Y.U.S.T. I.M.

Motivation: The k-means algorithm is a popular clustering method, but it has a serious drawback: its performance depends heavily on the initial starting conditions.

Objective: The global k-means clustering algorithm is proposed. It is a deterministic, effective global clustering algorithm for the minimization of the clustering error.

Literature review: Clustering error. The most widely used clustering criterion is the sum of squared Euclidean distances:
E(m_1, ..., m_M) = Σ_{i=1}^{N} Σ_{k=1}^{M} I(x_i ∈ C_k) ||x_i − m_k||²
where x_i is a data point of the set X, m_k is the center of cluster C_k, and I(X) = 1 if X is true and 0 otherwise.
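As a small illustration of this criterion (toy data of my own, not from the paper), the clustering error can be computed directly from the definition once each point is assigned to a cluster:

```python
import numpy as np

def clustering_error(X, centers, labels):
    """Sum of squared Euclidean distances from each data point x_i
    to the center m_k of the cluster it is assigned to."""
    diffs = X - centers[labels]          # x_i - m_k for every point
    return float((diffs ** 2).sum())

# Toy example (hypothetical data, for illustration only)
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0]])
centers = np.array([[0.0, 1.0], [10.0, 0.0]])
labels = np.array([0, 0, 1])             # the indicator I(x_i ∈ C_k) as an assignment
print(clustering_error(X, centers, labels))  # 1 + 1 + 0 = 2.0
```

The `labels` array plays the role of the indicator function I: each entry names the cluster its point belongs to.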

Literature review (cont.): The k-means algorithm randomly initializes the cluster centers and finds locally optimal solutions with respect to the clustering error. Its sensitivity to the initial positions of the cluster centers is its main disadvantage.

Literature review (cont.): K-d trees. A k-d tree is a recursive partitioning of the data space into disjoint subspaces. The recursion terminates when a terminal node (bucket) contains no more points than the bucket size. The k-d tree structure was originally used for speeding up distance-based search operations.
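A sketch of the bucket idea (this is the classic median split along the widest-spread coordinate; the paper uses its own k-d tree variant, so treat this as an assumption rather than the authors' construction):

```python
import numpy as np

def kdtree_bucket_centers(X, bucket_size):
    """Recursively split the data at the median of the largest-variance
    coordinate until each terminal node (bucket) holds at most
    bucket_size points; return the bucket centroids."""
    if len(X) <= bucket_size:
        return [X.mean(0)]                 # terminal node: one bucket center
    dim = int(X.var(0).argmax())           # split along the widest-spread axis
    order = X[:, dim].argsort()
    mid = len(X) // 2
    return (kdtree_bucket_centers(X[order[:mid]], bucket_size) +
            kdtree_bucket_centers(X[order[mid:]], bucket_size))

X = np.arange(16, dtype=float).reshape(8, 2)     # 8 toy points
centers = kdtree_bucket_centers(X, bucket_size=2)  # 8 points / buckets of 2 -> 4 centers
```

These bucket centroids are what the speed-up section later reuses as candidate insertion locations.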

The global k-means algorithm employs the k-means algorithm as a local search procedure. It proceeds incrementally, attempting to optimally add one new cluster center at each stage.

The global k-means algorithm (cont.): To solve a clustering problem with M clusters: 1. Start with one cluster (k=1); its optimal position is the centroid of the data set X. 2. To solve the problem with two clusters (k=2), perform N executions of the k-means algorithm: a) the first cluster center is always placed at the optimal position for k=1; b) in execution n, the second center is initialized at the position of data point x_n (n=1...N); c) the best solution obtained over the N executions is kept.

The global k-means algorithm (cont.): 3. In general, to solve the k-clustering problem we start from the solution already found for the (k-1)-clustering problem and again perform N runs of k-means, each initializing the new kth center at a different data point. 4. The best solution obtained from the N runs is taken as the solution of the k-clustering problem. 5. Proceeding this way, a solution with M clusters is finally obtained. ※ In general, let (m_1*(k), ..., m_k*(k)) denote the final solution for the k-clustering problem.
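The steps above can be sketched as follows (a toy NumPy implementation under the stated scheme; the `kmeans` helper is a plain Lloyd iteration, not the authors' code):

```python
import numpy as np

def kmeans(X, centers, iters=100):
    """Local search procedure: plain Lloyd's k-means from given initial centers."""
    centers = centers.astype(float).copy()
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        new = np.array([X[labels == k].mean(0) if (labels == k).any() else centers[k]
                        for k in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, float(((X - centers[labels]) ** 2).sum())

def global_kmeans(X, M):
    """Solve k = 1 ... M incrementally: try every data point as the initial
    position of the newly added center and keep the best of the N runs."""
    centers = X.mean(0, keepdims=True)          # optimal 1-clustering: the centroid
    for _ in range(2, M + 1):
        runs = (kmeans(X, np.vstack([centers, x])) for x in X)
        centers, _ = min(runs, key=lambda r: r[1])
    return centers

X = np.array([[-1.0, 0], [1, 0], [-1, 10], [1, 10]])  # two tight pairs (toy data)
centers = global_kmeans(X, 2)   # recovers the two pairs, clustering error 4.0
```

Because every data point is tried as the new center, the result is deterministic: no random restarts are involved.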

Speeding-up execution: The global k-means algorithm is computationally heavy. There are several options for reducing its cost (examining fewer initial positions): the fast global k-means algorithm, and initialization with k-d trees.

Speeding-up execution (cont.): The fast global k-means algorithm: 1. Does not execute the k-means algorithm to convergence for each of the N initial states to obtain the final clustering error. 2. Instead, for each candidate point it computes an upper bound E_n ≤ E − b_n on the resulting error. 3. It initializes the new cluster center at the point that maximizes b_n (equivalently, minimizes the bound E_n). 4. It then executes the k-means algorithm once to obtain the solution with k clusters.

Speeding-up execution (cont.): where
b_n = Σ_{j=1}^{N} max(d_{k−1}^j − ||x_n − x_j||², 0)
for all possible allocation positions x_n; E is the clustering error of the (k−1)-clustering solution; and d_{k−1}^j is the squared distance between x_j and the closest of the existing centers (the center of the cluster to which x_j belongs).
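A sketch of this guaranteed error reduction b_n under the definitions above (NumPy, toy data; the function and variable names are mine, not the paper's):

```python
import numpy as np

def error_reduction_bounds(X, centers):
    """b_n = sum_j max(d_j - ||x_n - x_j||^2, 0) for every candidate x_n,
    where d_j is the squared distance from x_j to its closest existing
    center. Inserting the new center at argmax b_n minimizes the upper
    bound E - b_n on the new clustering error."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).min(1)   # d_j
    pair = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)             # ||x_n - x_j||^2
    return np.maximum(d[None, :] - pair, 0.0).sum(1)

X = np.array([[-1.0, 0], [1, 0], [-1, 10], [1, 10]])   # toy data
centers = np.array([[0.0, 0.0]])                       # current 1-clustering
b = error_reduction_bounds(X, centers)
best_candidate = X[b.argmax()]   # a point from the far pair, where a new center helps most
```

Points in the far pair would pull a large portion of the current error down with them, so their b_n dominates and one of them is selected as the next insertion position.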

Speeding-up execution (cont.): Initialization with k-d trees. This paper uses a variant of the k-d tree, and uses the bucket centers as the possible insertion locations for the algorithm presented previously.

Experiment 1: The data sets: 10 data sets, each consisting of 300 data points, each point drawn from a mixture of 15 Gaussians. Three methods are compared on the clustering problem with k=15 centers.

Experiment 1 (cont.): Solid line with error bars: the fast global k-means algorithm using a k-d tree (μ=14.9, σ=1.3). Solid line: the standard k-means algorithm initialized with a k-d tree (μ=24.4, σ=9.8). Dashed line: the fast global k-means algorithm (μ=15.7, σ=12). (The horizontal axis varies the number of buckets for the fast global k-means algorithm with a k-d tree.)

Experiment 1 (cont.): We can conclude that the incremental fast global k-means approach outperforms initializing all centers at the same time with the k-d tree method, and that using the k-d tree within fast global k-means does not degrade performance significantly (provided the number of buckets is larger than the number of clusters).

Experiment 2: The data sets: the iris data set (150 four-dimensional data points), the synthetic data set (250 two-dimensional data points), and the image segmentation data set (210 six-dimensional data points). Three methods are compared on the clustering problem with up to M=15 centers.

Experiment 2 (cont.): For each data set we conducted the following experiments: one run of the global k-means algorithm for M=15; one run of the fast global k-means algorithm for M=15; and, for each k=1...15, the k-means algorithm executed N times starting from random initial positions for the k centers (N is the number of data points).

Experiment 2 (cont.): (results figure)

Experiment 2 (cont.): The global k-means algorithm is better than or equal to the k-means algorithm in all cases. The fast global k-means algorithm provides solutions comparable to the original method while being significantly faster, and can run much faster still if k-d trees are employed.

Experiment 3: The objective is to cluster 16×16-pixel image patches extracted from a set of 37 Brodatz texture images. Each complete texture image is 256×256 pixels; patches were extracted per texture by randomly selecting 16×16 windows. Patches originating from the same texture image are expected to form an individual cluster.

Experiment 3 (cont.): Brodatz texture images (figure)

Experiment 3 (cont.): The data sets were constructed by randomly selecting k of the 37 textures for each data set (k=2...6), then selecting 200 patches for each texture, resulting in 200k patches per data set.

Experiment 3 (cont.): We compared the performance of three algorithms: k-means initialized with a uniformly selected random subset of the data; fast global k-means; and fast global k-means with the insertion locations limited to the top 2k nodes of the corresponding k-d tree. We measured the mean squared clustering error (MSE) of the patches to their closest mean.

Experiment 3 (cont.): Results for the texture segmentation problem using as many clusters as textures (table). Results for the texture segmentation problem using twice as many clusters as textures (table).

Experiment 3 (cont.): It can be observed that the k-means algorithm with random initialization gives the worst results, and that the fast global k-means variant using the top 2k nodes of the k-d tree is not only faster than the generic fast global k-means algorithm but also provides slightly better results.

Experiment 4: The purpose is to compare the randomly initialized k-means algorithm against the "greedy" algorithm: the fast global k-means algorithm that uses the top 2k nodes of the corresponding k-d tree as candidate insertion locations. The data were drawn from randomly generated Gaussian mixtures; the number of data points in each data set was 50k (k is the number of sources).

Experiment 4 (cont.): The greedy algorithm is applied first; the randomly initialized k-means algorithm is then applied repeatedly for as long as the run time of the greedy algorithm. Three properties of the sources are varied: the number of sources k = {2, 7, 12, 17, 22}; the dimensionality of the data space d = {2, 4, 6, 8}; and the separation c = {0.5, 0.6, 0.7, 0.8, 0.9, 1.0}.

Experiment 4 (cont.): In the tables, 'min' is (1 − min/μ) × 100 and 'σ' is (σ/μ) × 100, where min, μ, and σ are the minimum, mean, and standard deviation of the clustering error over the k-means runs; 'trial' is the number of runs of the k-means algorithm performed. Bold values in the tables indicate the smallest values.


Experiment 4 (cont.): It is clear from the experiments that the benefit of the greedy method grows as the number of clusters increases, the separation becomes larger, and the dimensionality gets smaller; that the greedy algorithm gives better results in almost all cases; and that the number of trials allowed for the random k-means algorithm grows only slowly as the number of clusters increases.

Conclusions: The global k-means algorithm achieves excellent results in terms of the clustering error criterion, is independent of any starting conditions, and compares favorably with the k-means algorithm. The speed-up techniques reduce the computational load without significantly affecting solution quality. Future work concerns the use of parallel processing to accelerate the proposed methods.