
1 Intelligent Database Systems Lab, 國立雲林科技大學 National Yunlin University of Science and Technology
A k-mean clustering algorithm for mixed numeric and categorical data
Presenter: Shao-Wei Cheng
Authors: Amir Ahmad, Lipika Dey
DKE 2007

2 Outline
- Motivation
- Objective
- Methodology
- Experiments
- Conclusion
- Comments

3 Motivation
The traditional k-means algorithm is limited to numeric data. Huang's cost function was an attempt to cluster mixed numeric and categorical data:
- The cluster center for categorical attributes is represented by the mode of the cluster.
- The distance between two categorical attribute values is binary (0 if they are equal, 1 otherwise).
- The significance (weight) of each numeric attribute is taken to be 1, and the weight of the categorical attributes, γ_j, is a user-defined parameter.
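The distance described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper or the presentation: the function name `huang_distance`, its argument layout, and the use of a single shared `gamma` for all categorical attributes are assumptions made for the example.

```python
def huang_distance(x_num, x_cat, center_num, center_mode, gamma):
    """Mixed distance in the style of Huang's cost function (sketch).

    x_num, center_num: numeric attribute values of the element and the center.
    x_cat, center_mode: categorical values of the element and the cluster modes.
    gamma: user-chosen weight of the categorical part (assumed shared here).
    """
    # Numeric part: squared Euclidean distance, each attribute weighted 1.
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, center_num))
    # Categorical part: binary distance, 1 per mismatch with the mode.
    mismatch_count = sum(1 for a, b in zip(x_cat, center_mode) if a != b)
    return numeric_part + gamma * mismatch_count


# Hypothetical element and center, for illustration only.
d = huang_distance([1.0, 2.0], ["red", "small"],
                   [0.0, 2.0], ["red", "large"], gamma=0.5)
```

The sketch makes the two limitations visible: every categorical mismatch costs the same regardless of which values differ, and the balance between the numeric and categorical parts depends entirely on the hand-picked `gamma`.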

4 Objectives
This paper attempts to alleviate the shortcomings of Huang's cost function:
- Propose a new representation for the cluster center.
- Compute the distance between two categorical values from the overall distribution of the categorical attributes.
- Define the weight parameter by the contribution of each categorical attribute.

5 Methodology: Cost function
- Huang's cost function
- The proposed cost function
What is the distance between De Niro and Stewart?
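A question like this one can be approached by comparing how two categorical values co-occur with the values of the other attributes. The transcript does not preserve the formula on the slide, so the measure below is an assumption: for each other attribute it takes half the sum of absolute differences between the two conditional distributions, then averages over those attributes. The function names and the toy rows are hypothetical.

```python
from collections import Counter


def conditional_distribution(rows, given_col, given_val, target_col):
    """Distribution of target_col among rows where given_col == given_val."""
    counts = Counter(r[target_col] for r in rows if r[given_col] == given_val)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def cooccurrence_distance(rows, col, x, y):
    """Distance between values x and y of attribute `col` (assumed measure).

    Both x and y are assumed to occur in `rows`, and `rows` is assumed to
    have at least two attributes.
    """
    other_cols = [c for c in range(len(rows[0])) if c != col]
    total = 0.0
    for c in other_cols:
        p = conditional_distribution(rows, col, x, c)
        q = conditional_distribution(rows, col, y, c)
        values = set(p) | set(q)
        # Half the L1 difference between the two conditional distributions.
        total += 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in values)
    return total / len(other_cols)


# Made-up rows of (actor, genre); not data from the paper.
rows = [
    ("actor_a", "drama"), ("actor_a", "drama"),
    ("actor_b", "drama"), ("actor_b", "comedy"),
    ("actor_c", "comedy"), ("actor_c", "comedy"),
]
d_ab = cooccurrence_distance(rows, 0, "actor_a", "actor_b")
d_ac = cooccurrence_distance(rows, 0, "actor_a", "actor_c")
```

Unlike a binary distance, this kind of measure can rank pairs: in the toy rows, `actor_a` ends up closer to `actor_b` than to `actor_c` because their genre distributions overlap.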

6 Methodology

7 Methodology: Significance of numeric attributes
The numeric attributes need to be discretized. Equal-width discretization is used.
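Equal-width discretization splits the range of an attribute into intervals of the same width and replaces each value with the index of its interval. A minimal sketch follows; the function name `equal_width_bins` and the choice of `n_bins` are assumptions, since the slide does not say how many intervals are used.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index in 0..n_bins-1 (sketch)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls into one bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical attribute values, for illustration only.
bins = equal_width_bins([0.0, 2.5, 5.0, 7.5, 10.0], 4)
```

Once discretized, a numeric attribute can be treated like a categorical one for the purpose of estimating its significance.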

8 Methodology: Algorithm
1. Initialization.
2. Compute the cluster centers.
3. Assign each data element to the cluster whose center is closest to it.
4. Repeat steps 2 and 3 until the clusters do not change or a fixed number of iterations is reached.
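The four steps above can be sketched as a generic loop with the center computation and the distance passed in as functions. This is an illustration under stated assumptions, not the authors' implementation: the slide does not say how initialization is done, so random assignment of elements to clusters is assumed, and reseeding an empty cluster with a random element is a choice made only for this sketch. The names `cluster`, `mean_center`, and `squared_distance` are hypothetical, and the usage example is numeric-only for brevity.

```python
import random


def cluster(points, k, compute_center, distance, max_iter=100, seed=0):
    """Generic k-means-style loop with pluggable center and distance."""
    rng = random.Random(seed)
    # Step 1: initialization (assumed: random assignment to clusters).
    labels = [rng.randrange(k) for _ in points]
    for _ in range(max_iter):
        # Step 2: compute each cluster center from its current members.
        centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Empty cluster: reseed with a random element (sketch-only choice).
            centers.append(compute_center(members) if members else rng.choice(points))
        # Step 3: assign each element to the cluster with the closest center.
        new_labels = [
            min(range(k), key=lambda c: distance(p, centers[c])) for p in points
        ]
        # Step 4: stop when the clusters no longer change.
        if new_labels == labels:
            break
        labels = new_labels
    return labels


def mean_center(members):
    """Component-wise mean of numeric tuples."""
    return tuple(sum(col) / len(members) for col in zip(*members))


def squared_distance(p, center):
    """Squared Euclidean distance between two numeric tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, center))


# Hypothetical one-dimensional data with two obvious groups.
points = [(0.0,), (0.1,), (10.0,), (10.1,)]
labels = cluster(points, 2, mean_center, squared_distance)
```

For mixed data, `compute_center` and `distance` would be replaced by functions that handle both numeric and categorical attributes; the loop itself stays the same.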

9 Experiments: Evaluation method
Data sets:
- Iris: all numeric attributes
- Vote: all categorical attributes
- Heart disease data: mixed data set
- Australian credit data: mixed data set

10 Experiments

11 Conclusion
This paper introduced a new distance measure for categorical attribute values and proposed a modified k-means algorithm for clustering mixed data sets. The results obtained with this algorithm on a number of real-world data sets are highly encouraging.
Future work:
- Other methods for discretizing numeric-valued attributes.
- Other implementations of the k-means algorithm.

12 Comments
- Advantage: Taking the overall distribution of the attributes into account is a good idea.
- Drawback: …
- Application: Clustering of mixed data sets.

