Presentation is loading. Please wait.

Presentation is loading. Please wait.

Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology Advisor : Dr. Hsu Graduate : Yu Cheng Chen Author: Chung-hung.

Similar presentations


Presentation on theme: "Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology Advisor : Dr. Hsu Graduate : Yu Cheng Chen Author: Chung-hung."— Presentation transcript:

1 Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology Advisor : Dr. Hsu Graduate : Yu Cheng Chen Author: Chung-hung Cheng, Ada Wai-chee Fu, Yi Zhang Entropy-based Subspace Clustering for Mining Numerical Data ACM 1999

2 Intelligent Database Systems Lab N.Y.U.S.T. I. M. Outline Motivation Objective Introduction Related Work Criteria of Subspace clustering Entropy-based Method Entropy vs the Clustering Criteria Algorithm Experiments Conclusions

3 Intelligent Database Systems Lab N.Y.U.S.T. I. M. Motivation Real-life databases contain many attributes. Most traditional clustering methods are shown to handle problem sizes of several hundreds to several thousands transactions.

4 Intelligent Database Systems Lab N.Y.U.S.T. I. M. Objective Propose a method that gives reasonable performance on high dimensionality and large data sets.

5 Intelligent Database Systems Lab N.Y.U.S.T. I. M. Introduction A good clustering algorithm need: Handle arbitrary shapes for clusters Do not make assumptions about distribution of data Not be sensitive to the outliers Not require input parameters Convey the resulting clusters to the users

6 Intelligent Database Systems Lab N.Y.U.S.T. I. M. Introduction A solution to the above problem would consist of the following steps: (1) Find the subspaces with good clustering (2) Identify the clusters in the selected subspaces. (3) present the result to the users

7 Intelligent Database Systems Lab N.Y.U.S.T. I. M. Related Work CLIQUE is the only published algorithm that satisfied to identify clusters embedded in subspaces of datasets. Two parameters, ξand τ. Partition every dimension into ξintervals of equal length (unit). A unit is dense if data points contained in it is > τ

8 Intelligent Database Systems Lab N.Y.U.S.T. I. M. Related Work Clusters are unions of connected dense units. To reduce the search space we used a bottom- up algorithm. If a collection of points S is a cluster in k dimensional space, then S is also part of a cluster in (k-1) dimensional projections of space.

9 Intelligent Database Systems Lab N.Y.U.S.T. I. M. Related Work Example

10 Intelligent Database Systems Lab N.Y.U.S.T. I. M. Criteria of Subspace Clustering High coverage. High density Correlation of dimensions

11 Intelligent Database Systems Lab N.Y.U.S.T. I. M. Entropy-based Method Entropy is defined as following: Calculation of Entropy where d(x) be the density of a cell x in terms of the percentage of data contained in x.

12 Intelligent Database Systems Lab N.Y.U.S.T. I. M. Entropy vs the Clustering Criteria Entropy and the coverage criterion.

13 Intelligent Database Systems Lab N.Y.U.S.T. I. M. Entropy vs the Clustering Criteria We want to establish the relationship that, under certain conditions, the entropy decreases as the coverage increases.

14 Intelligent Database Systems Lab N.Y.U.S.T. I. M. Entropy vs the Clustering Criteria Entropy and the density criterion. Assume that the density of dense units are all equal to α, the density of non-dense units are all equal to p.

15 Intelligent Database Systems Lab N.Y.U.S.T. I. M. Entropy vs the Clustering Criteria Entropy and variable correlation

16 Intelligent Database Systems Lab N.Y.U.S.T. I. M. Algorithm Overall strategy consists of three main steps:

17 Intelligent Database Systems Lab N.Y.U.S.T. I. M. Algorithm A subspace whose entropy is below w is considered to have good clustering. We start with finding 1 dimensional subspace with good clustering, then we use them to generate the can candidate 2 dimensional subspaces.

18 Intelligent Database Systems Lab N.Y.U.S.T. I. M. Algorithm Dimensions Correlation

19 Intelligent Database Systems Lab N.Y.U.S.T. I. M. Algorithm We define the term interest as below: The higher the interest, the stronger the correlation

20 Intelligent Database Systems Lab N.Y.U.S.T. I. M. Algorithm  Downward closure is a pruning property. ─ If a subspace does not satisfy this property, we can cross out all its super-spaces  Upward closure is a constructive property. ─ If a subspace satisfies the property, all its Super-spaces also satisfy this property.

21 Intelligent Database Systems Lab N.Y.U.S.T. I. M. Algorithm

22 Intelligent Database Systems Lab N.Y.U.S.T. I. M. Experiments We use data of 10 dimensions and 300,000 transaction in the experiment.

23 Intelligent Database Systems Lab N.Y.U.S.T. I. M. Experiments Figure 10 & 11

24 Intelligent Database Systems Lab N.Y.U.S.T. I. M. Conclusions We establish some relationship between entropy and the three criteria. We incorporates the idea of using a pair of downward and upward closure properties which is shown effective in the reduction of the search space.

25 Intelligent Database Systems Lab N.Y.U.S.T. I. M. Personal Opinion ……


Download ppt "Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology Advisor : Dr. Hsu Graduate : Yu Cheng Chen Author: Chung-hung."

Similar presentations


Ads by Google