Slide 1
Intelligent Database Systems Lab, 國立雲林科技大學 National Yunlin University of Science and Technology
HE-Tree: a framework for detecting changes in clustering structure for categorical data streams
Keke Chen · Ling Liu
VLDB, Vol. 18, 2009, pp. 1241–1260
Presenter: Wei-Shen Tai
2010/8/4

N.Y.U.S.T. I. M. Intelligent Database Systems Lab

Slide 2: Outline
- Introduction
- Entropy-based categorical clustering
- BKPlot for determining the “Best K” for categorical clustering
- HE-Tree: capturing cluster entropy of the categorical data stream
- A monitoring framework based on the HE-Tree
- Experiments
- Conclusion
- Comments

Slide 3: Motivation
Problems of clustering categorical data streams:
- None of the existing methods addresses the problem of monitoring changes in the clustering structure of categorical data streams.
- Most methods assume a fixed number of clusters in the data stream.

Slide 4: Objective
Hierarchical Entropy Tree structure (HE-Tree):
- Captures the entropy characteristics of the clusters in a data stream and detects changes in the best K (the best number of clusters).

Slide 5: Entropy-based categorical clustering
- Classical entropy definition.
- Optimal partition: minimizes the weighted (expected) entropy of the clusters C_k.
- Incremental entropy (IE): merging two clusters of a partition never decreases the expected entropy; the size of the increase is measured by the incremental entropy.
- Clustering criterion: minimize the expected entropy.
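The criteria on this slide can be made concrete with a short Python sketch. It assumes (our assumption, following common practice in entropy-based categorical clustering) that attributes are treated as independent, so a cluster's entropy is the sum of its per-attribute entropies. It also uses the count-weighted form of incremental entropy, I(Cp, Cq) = (n_p + n_q) H(Cp ∪ Cq) - n_p H(Cp) - n_q H(Cq). Records are plain tuples of categorical values, and all function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(values):
    """Shannon entropy (natural log) of one categorical attribute."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def cluster_entropy(records):
    """H(C): sum of per-attribute entropies (attribute independence assumed)."""
    return sum(column_entropy(col) for col in zip(*records))


def expected_entropy(partition):
    """Weighted entropy of a partition: sum_k (n_k / N) * H(C_k)."""
    total = sum(len(cluster) for cluster in partition)
    return sum(len(cluster) / total * cluster_entropy(cluster) for cluster in partition)


def incremental_entropy(cp, cq):
    """I(Cp, Cq) = (n_p + n_q) H(Cp u Cq) - n_p H(Cp) - n_q H(Cq); never negative."""
    merged = cp + cq
    return (len(merged) * cluster_entropy(merged)
            - len(cp) * cluster_entropy(cp)
            - len(cq) * cluster_entropy(cq))


a = [("high school", "teaching"), ("high school", "teaching")]
b = [("university", "engineering"), ("university", "engineering")]
print(expected_entropy([a, b]))    # 0.0: both clusters are pure
print(incremental_entropy(a, b))   # 8 ln 2, about 5.545
print(incremental_entropy(a, a))   # 0.0: identical value distributions
```

The last line shows why an IE of zero signals two clusters with identical value distributions: merging them adds no uncertainty.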

Slide 6: BKPlot for determining the “Best K” for categorical clustering
BKPlot method:
- Determines candidate best Ks for static datasets.
- Examines the entropy difference between optimal neighboring partitions, i.e., the optimal partitions with K and K+1 clusters.
- Plots the second-order difference of this entropy curve; its peaks mark the candidate best Ks.
ACE (entropy-based agglomerative hierarchical clustering):
- Generates high-quality approximate BKPlots.
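As a hedged illustration of the second-order difference, the sketch below assumes that the expected entropies H(K) of the (approximately) optimal partitions are already available, for example from an agglomerative run. The indexing convention is our own assumption: I(K) = H(K) - H(K+1) and δI(K) = I(K-1) - I(K). The entropy values in the usage lines are made up for illustration and are not experimental results.

```python
def bkplot(entropies):
    """entropies[i] is the expected entropy H(K) for K = i + 1.

    Returns {K: second-order difference} for K = 2 .. Kmax - 1.
    """
    # First-order difference I(K): entropy saved by going from K to K + 1 clusters.
    first = {k: entropies[k - 1] - entropies[k] for k in range(1, len(entropies))}
    # Second-order difference: large when reaching K clusters still paid off
    # but going on to K + 1 clusters saves little.
    return {k: first[k - 1] - first[k] for k in range(2, len(entropies))}


def candidate_best_ks(plot, top=1):
    """The K values at the highest peaks of the BKPlot."""
    return sorted(plot, key=plot.get, reverse=True)[:top]


# Illustrative entropies for K = 1..5: steep drops up to K = 3, flat afterwards.
plot = bkplot([1.5, 1.0, 0.5, 0.45, 0.42])
print(candidate_best_ks(plot))  # [3]
```

Returning several peaks (top > 1) matches the slide's wording of *candidate* best Ks, since a dataset can have more than one plausible clustering level.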

Slide 7: ACE
- IE (incremental entropy) is a natural inter-cluster similarity measure and can be used directly to build a hierarchical clustering algorithm.
- Summary table: counts the occurrences of each attribute value in a cluster.
- M-table: stores M(Cp, Cq) for every pair of clusters Cp and Cq.
- M-heap: keeps the minimum M value available at each merge step.
(Example summary table for the attributes Education, with values Elementary school, High school and University, and Work, with values Engineering and Teaching.)
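The bookkeeping on this slide can be sketched in Python. This is a minimal, unoptimized agglomerative loop built on stated assumptions, not the authors' ACE implementation. M(Cp, Cq) is assumed to be the count-weighted incremental entropy. A summary table is one Counter of value counts per attribute. The M-table is a dict keyed by cluster-id pairs, and the M-heap is a heapq list whose stale entries (pairs whose clusters were already merged) are skipped lazily. All function names are hypothetical.

```python
import heapq
import math
from collections import Counter
from itertools import combinations


def summary(records):
    """Summary table: one Counter of value counts per attribute, plus the size."""
    return [Counter(col) for col in zip(*records)], len(records)


def entropy(table, n):
    """Cluster entropy from its summary table (sum of attribute entropies)."""
    return -sum((c / n) * math.log(c / n) for col in table for c in col.values())


def merge(a, b):
    """Summary table of the union of two clusters."""
    return [x + y for x, y in zip(a[0], b[0])], a[1] + b[1]


def m_value(a, b):
    """Assumed M(Cp, Cq): count-weighted incremental entropy of merging a and b."""
    t, n = merge(a, b)
    return n * entropy(t, n) - a[1] * entropy(*a) - b[1] * entropy(*b)


def ace(records):
    """Merge the pair with minimum M until one cluster is left.

    Returns {K: expected entropy of the K-cluster partition}.
    """
    clusters = {i: summary([r]) for i, r in enumerate(records)}
    m_table, m_heap = {}, []
    for p, q in combinations(clusters, 2):
        m_table[(p, q)] = m_value(clusters[p], clusters[q])
        heapq.heappush(m_heap, (m_table[(p, q)], p, q))
    weighted = 0.0  # sum_k n_k H(C_k); singletons have zero entropy
    history = {len(clusters): 0.0}
    next_id = len(records)
    while len(clusters) > 1:
        m, p, q = heapq.heappop(m_heap)
        if (p, q) not in m_table:
            continue  # stale heap entry: p or q was merged earlier
        merged = merge(clusters.pop(p), clusters.pop(q))
        weighted += m  # merging raises sum_k n_k H(C_k) by exactly M(Cp, Cq)
        m_table = {pair: v for pair, v in m_table.items()
                   if p not in pair and q not in pair}
        for r in clusters:
            m_table[(r, next_id)] = m_value(clusters[r], merged)
            heapq.heappush(m_heap, (m_table[(r, next_id)], r, next_id))
        clusters[next_id] = merged
        next_id += 1
        history[len(clusters)] = weighted / len(records)
    return history


data = [("high school", "teaching"), ("high school", "teaching"),
        ("university", "engineering"), ("university", "engineering")]
print(ace(data))  # K = 4, 3 and 2 map to 0.0; K = 1 maps to 2 ln 2 (about 1.386)
```

Sorting the returned dictionary by K gives the H(K) curve from which a BKPlot is computed. Because every M value is recomputed from summary tables, no merge step needs to revisit the raw records.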

Slide 8: HE-Tree: capturing cluster entropy of the categorical data stream
- Find the sub-tree most similar to the incoming sample e.
- Growing stage:
  - If M(e, e_i) = 0, e is merged into entry e_i.
  - Else, if the leaf node has an empty entry, e is assigned to that entry.
  - Else, the leaf node is split.
- Absorbing stage: e is merged into the entry e_i with the minimum M(e, e_i).
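A heavily simplified sketch of the two insertion stages follows. It is not the authors' HE-Tree. The hierarchy is flattened into a list of leaf nodes, and the descent to the most similar sub-tree is replaced by a linear scan over all leaf entries. The growing stage is assumed to last until the tree reaches a hypothetical max_leaves limit. The split rule, which moves the new entry plus the half of the full leaf most similar to it into a new leaf, is our own placeholder. Entries are summary tables, and M is again assumed to be count-weighted incremental entropy.

```python
import math
from collections import Counter


def entropy(table, n):
    """Cluster entropy from a summary table (sum of attribute entropies)."""
    return -sum((c / n) * math.log(c / n) for col in table for c in col.values())


def merge(a, b):
    """Summary table of the union of two entries."""
    return [x + y for x, y in zip(a[0], b[0])], a[1] + b[1]


def m_value(a, b):
    """Assumed M(e, e_i): count-weighted incremental entropy."""
    t, n = merge(a, b)
    return n * entropy(t, n) - a[1] * entropy(*a) - b[1] * entropy(*b)


class FlatHETree:
    """Hypothetical flattened stand-in for an HE-Tree: a list of leaf nodes."""

    def __init__(self, leaf_capacity=4, max_leaves=8):
        self.leaf_capacity = leaf_capacity
        self.max_leaves = max_leaves
        self.leaves = [[]]

    def _closest(self, e):
        """(leaf, index, M) of the existing entry most similar to e, or None."""
        best = None
        for leaf in self.leaves:
            for i, entry in enumerate(leaf):
                m = m_value(e, entry)
                if best is None or m < best[2]:
                    best = (leaf, i, m)
        return best

    def _split(self, leaf, e):
        # Placeholder split: e and the half of the leaf closest to it form a new leaf.
        leaf.sort(key=lambda entry: m_value(e, entry))
        half = len(leaf) // 2
        self.leaves.append([e] + leaf[:half])
        del leaf[:half]

    def insert(self, record):
        e = ([Counter([v]) for v in record], 1)
        best = self._closest(e)
        if best is None:  # first record
            self.leaves[0].append(e)
            return
        leaf, i, m = best
        if len(self.leaves) < self.max_leaves:  # growing stage
            if m < 1e-12:  # M(e, e_i) = 0: same value distribution
                leaf[i] = merge(leaf[i], e)
            elif len(leaf) < self.leaf_capacity:
                leaf.append(e)
            else:
                self._split(leaf, e)
        else:  # absorbing stage: merge into the entry with minimum M
            leaf[i] = merge(leaf[i], e)


tree = FlatHETree(leaf_capacity=2, max_leaves=2)
for r in [("a", "x"), ("a", "x"), ("b", "y"), ("c", "z"), ("c", "z")]:
    tree.insert(r)
print([[entry[1] for entry in leaf] for leaf in tree.leaves])  # [[2], [2, 1]]
```

Once max_leaves is reached the structure stops growing, so memory stays bounded while every later record is still absorbed into its most similar entry.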

Slide 9: A monitoring framework based on the HE-Tree
- Time-decaying HE-Tree: the decaying rate λ, 0 < λ < 1, is the proportion of information preserved from the previous window (record counts, summary tables and M-table).
- Extended ACE: takes the sub-clusters as input and repeatedly merges the pair of clusters with the minimum M value.
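The sketch below illustrates both ideas under our own assumptions. Decaying multiplies every count in a leaf entry's summary table, its record count, and the stored M values by λ. Extended ACE is written as a plain greedy loop over weighted sub-clusters that merges the pair with minimum M. With M taken as count-weighted incremental entropy, scaling the stored M values by λ gives the same result as recomputing them from the decayed counts, which the usage lines check. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def summary(records):
    """Summary table: one Counter of value counts per attribute, plus the size."""
    return [Counter(col) for col in zip(*records)], len(records)


def entropy(table, n):
    """Cluster entropy from a summary table (sum of attribute entropies)."""
    return -sum((c / n) * math.log(c / n) for col in table for c in col.values())


def merge(a, b):
    return [x + y for x, y in zip(a[0], b[0])], a[1] + b[1]


def m_value(a, b):
    """Assumed M: count-weighted incremental entropy."""
    t, n = merge(a, b)
    return n * entropy(t, n) - a[1] * entropy(*a) - b[1] * entropy(*b)


def decay(entry, lam):
    """Keep a fraction lam (0 < lam < 1) of an entry's counts for the next window."""
    table, n = entry
    return [Counter({v: c * lam for v, c in col.items()}) for col in table], n * lam


def decay_m_table(m_table, lam):
    """Scale every stored M value by lam instead of recomputing it."""
    return {pair: m * lam for pair, m in m_table.items()}


def extended_ace(entries):
    """Greedily merge weighted sub-clusters; return {K: expected entropy}."""
    clusters = list(entries)
    total = sum(n for _, n in clusters)
    history = {}
    while True:
        history[len(clusters)] = sum(n * entropy(t, n) for t, n in clusters) / total
        if len(clusters) == 1:
            return history
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: m_value(clusters[ij[0]], clusters[ij[1]]))
        merged = merge(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]


entries = [summary([("a", "x"), ("a", "x"), ("a", "y")]),
           summary([("b", "y"), ("b", "y")]),
           summary([("a", "x")])]
lam = 0.5
m_table = {(0, 1): m_value(entries[0], entries[1])}
decayed = [decay(e, lam) for e in entries]
print(math.isclose(decay_m_table(m_table, lam)[(0, 1)],
                   m_value(decayed[0], decayed[1])))  # True
h, hd = extended_ace(entries), extended_ace(decayed)
print(h.keys() == hd.keys() and all(math.isclose(h[k], hd[k]) for k in h))  # True
```

Because uniform decay leaves every H(K) unchanged, its effect only appears once the next window's records are inserted at full weight, which tilts the entries, and the resulting BKPlot, toward recent data.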

Slide 10: Experiments: detecting changes

Slide 11: Effect of the time-decaying HE-Tree

Slide 12: Conclusion
HE-Tree:
- Detects changes in the clustering structure of categorical data streams.
- A time-decaying HE-Tree makes the framework more sensitive to recently emerging clustering structures.

Slide 13: Comments
Advantages:
- The proposed scheme provides a way to detect changes in the clustering structure of categorical data streams.
- The entropy-based HE-Tree and its time-decaying idea are intuitive.
Drawbacks:
- Because the summary table cannot handle mixed-type data, the proposed method applies only to categorical data streams.
- Is the decaying process still necessary if the fixed-interval window is replaced by a moving window?
Application:
- Categorical data stream clustering

