DENCLUE 2.0: Fast Clustering based on Kernel Density Estimation Alexander Hinneburg Martin-Luther-University Halle-Wittenberg, Germany Hans-Henning Gabriel.


1 DENCLUE 2.0: Fast Clustering based on Kernel Density Estimation Alexander Hinneburg Martin-Luther-University Halle-Wittenberg, Germany Hans-Henning Gabriel 101tec GmbH, Halle, Germany

2 Overview: Density-based clustering and DENCLUE 1.0; Hill climbing as EM-algorithm; Identification of local maxima; Applications of general EM-acceleration; Experiments

3 Density-Based Clustering Assumption –clusters are regions of high density in the data space. How to estimate density? –parametric models: mixture models –non-parametric models: histogram, kernel density estimation

4 Kernel Density Estimation Idea –influence of a data point is modeled by a kernel –density is the normalized sum of all kernels –smoothing parameter h Gaussian Kernel Density Estimate
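The formula on this slide did not survive in the transcript. As a minimal sketch, assuming the standard Gaussian kernel density estimate f(x) = 1/(N h^d) * sum_i K((x - x_i)/h) with K(u) = (2*pi)^(-d/2) * exp(-|u|^2 / 2), the density can be evaluated as follows. The function name `gaussian_kde` and the toy data are illustrative only and do not come from the slides.

```python
import math

def gaussian_kde(x, data, h):
    """Gaussian kernel density estimate at point x.

    x    -- query point, a sequence of d floats
    data -- list of data points, each a sequence of d floats
    h    -- smoothing parameter (bandwidth), h > 0
    """
    d = len(x)
    n = len(data)
    # Normalization: N * h^d * (2*pi)^(d/2)
    norm = n * (h ** d) * (2.0 * math.pi) ** (d / 2.0)
    total = 0.0
    for xi in data:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
        total += math.exp(-sq_dist / (2.0 * h * h))  # influence of one data point
    return total / norm

# Toy 1-d data: two points close together, one far away (hypothetical values).
data = [(0.0,), (0.2,), (5.0,)]
dense = gaussian_kde((0.1,), data, h=0.5)   # between the two close points
sparse = gaussian_kde((2.5,), data, h=0.5)  # in the empty region
```

Each data point contributes one kernel, and a larger h spreads that influence more widely, which smooths the estimate.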

5 DENCLUE 1.0 Framework Clusters are defined by local maxima of the density estimate –find all maxima by hill climbing Problem –gradient hill climbing uses a constant step size
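The update rule on this slide is lost in the transcript. A minimal sketch of gradient hill climbing with a constant step size, assuming the usual form x_new = x + delta * grad / |grad| and a stop as soon as the density no longer increases, might look like this. The names `fixed_step_climb` and `delta` are assumptions made for illustration. Constant factors of the density and gradient are dropped because only the direction and the comparison of densities matter here.

```python
import math

def density(x, data, h):
    """Unnormalized Gaussian kernel density at x (constant factor dropped)."""
    return sum(
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, xi)) / (2.0 * h * h))
        for xi in data
    )

def gradient(x, data, h):
    """Unnormalized gradient of the density: sum_i K_i * (x_i - x)."""
    grad = [0.0] * len(x)
    for xi in data:
        k = math.exp(-sum((a - b) ** 2 for a, b in zip(x, xi)) / (2.0 * h * h))
        for j in range(len(x)):
            grad[j] += k * (xi[j] - x[j])
    return grad

def fixed_step_climb(x, data, h, delta, max_steps=10000):
    """Climb with constant step size delta; return the list of visited points."""
    x = tuple(x)
    path = [x]
    for _ in range(max_steps):
        g = gradient(x, data, h)
        norm = math.sqrt(sum(c * c for c in g))
        if norm == 0.0:
            break  # exactly at a stationary point
        candidate = tuple(a + delta * c / norm for a, c in zip(x, g))
        if density(candidate, data, h) <= density(x, data, h):
            break  # overshot the maximum: stop near it, not on it
        x = candidate
        path.append(x)
    return path

# Toy data with a single density maximum at 0 for h = 2 (hypothetical values).
path = fixed_step_climb((3.0,), [(-1.0,), (1.0,)], h=2.0, delta=0.1)
```

With delta = 0.1 the climb from 3.0 needs about 30 steps and can only guarantee an end point within delta of the maximum, which illustrates the trade-off a constant step size forces.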

6 Problem of const. Step Size Not efficient –many unnecessary small steps Not effective –does not converge to a local maximum, it only comes close Example

7 New Hill Climbing Approach General approach –differentiate the density estimate and set the gradient to zero –no closed-form solution, but the resulting equation can be used as an iteration
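The equations for this slide are missing from the transcript. The following is a reconstruction under the assumption of a Gaussian kernel and the standard kernel density estimate with N data points x_i in d dimensions. The notation is chosen here and is not taken from the slide.

```latex
\[
\hat f(x) = \frac{1}{N h^{d}} \sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right),
\qquad
K(u) = (2\pi)^{-d/2} \exp\!\left(-\tfrac{1}{2}\lVert u \rVert^{2}\right)
\]
\[
\nabla \hat f(x)
= \frac{1}{N h^{d+2}} \sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right)(x_i - x)
= 0
\quad\Longrightarrow\quad
x = \frac{\sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right) x_i}
         {\sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right)}
\]
\[
x^{(l+1)} = \frac{\sum_{i=1}^{N} K\!\left(\frac{x^{(l)} - x_i}{h}\right) x_i}
                 {\sum_{i=1}^{N} K\!\left(\frac{x^{(l)} - x_i}{h}\right)}
\]
```

Because x appears on both sides of the second line, it cannot be solved directly. Inserting the current iterate on the right-hand side gives the update in the third line.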

8 New DENCLUE 2.0 Hill Climbing Efficient –automatically adjusted step size at no extra cost Effective –converges to local maximum (proof follows) Example
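A minimal sketch of such a hill climbing, assuming a Gaussian kernel and the update "move to the kernel-weighted mean of the data", is shown below. The stopping rule (relative density increase at most `eps`) and the names `ssa_step` and `ssa_climb` are assumptions made for illustration. The kernel sum needed for the update is also the unnormalized density at the current point, so the density comes at no extra cost.

```python
import math

def ssa_step(x, data, h):
    """One update: kernel-weighted mean of the data.

    Returns (new_x, dens), where dens is the unnormalized density at x.
    Assumes x is close enough to the data that the kernel sum is not zero.
    """
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, xi)) / (2.0 * h * h))
        for xi in data
    ]
    total = sum(weights)
    new_x = tuple(
        sum(w * xi[j] for w, xi in zip(weights, data)) / total
        for j in range(len(x))
    )
    return new_x, total

def ssa_climb(x, data, h, eps=1e-9, max_iter=1000):
    """Iterate until the relative density increase is at most eps."""
    x = tuple(x)
    path = [x]
    new_x, dens = ssa_step(x, data, h)  # dens: density at the start point
    for _ in range(max_iter):
        x = new_x
        path.append(x)
        new_x, new_dens = ssa_step(x, data, h)  # new_dens: density at x
        if (new_dens - dens) / new_dens <= eps:
            break
        dens = new_dens
    return path

# Same toy setting as a fixed-step climb would use (hypothetical values).
path = ssa_climb((3.0,), [(-1.0,), (1.0,)], h=2.0)
```

On this toy data the iterate moves from 3.0 to roughly 0.64 in its first step and then takes ever smaller steps toward the maximum at 0, so no step size has to be chosen by hand.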

9 Proof of Convergence Cast the problem of maximizing the kernel density as maximizing the likelihood of a mixture model Introduce a hidden variable

10 Proof of Convergence Complete likelihood is maximized by the EM-Algorithm; this also maximizes the original likelihood, which is the kernel density estimate. When the EM is started at a given point, alternating E-Step and M-Step performs the hill climbing for that point
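The E-step and M-step formulas are not preserved in the transcript. One way to write them, assuming a Gaussian kernel, is given below. The hidden variable z, which names the kernel responsible for the current point, and the posterior symbol \theta_i are notation chosen here, not taken from the slides.

```latex
\[
\text{E-step:}\qquad
\theta_i = P\!\left(z = i \mid x^{(l)}\right)
= \frac{K\!\left(\frac{x^{(l)} - x_i}{h}\right)}
       {\sum_{j=1}^{N} K\!\left(\frac{x^{(l)} - x_j}{h}\right)},
\qquad i = 1, \dots, N
\]
\[
\text{M-step:}\qquad
x^{(l+1)} = \sum_{i=1}^{N} \theta_i \, x_i
\]
```

Substituting the E-step into the M-step gives the kernel-weighted mean of the data, so one EM iteration equals one hill-climbing step.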

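11 Identification of local Maxima EM-Algorithm iterates until a stopping criterion is met, which gives –the reached end point –the sum of the k last step sizes Assumption –the true local maximum lies in a ball around the end point, with a radius given by the sum of the k last step sizes Points whose end points lie close enough together (within the radii of their balls) belong to the same maximum M In case of non-unique assignment, do a few extra EM iterations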
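The exact thresholds on this slide are lost in the transcript. The sketch below assumes that the ball radius is the sum of the k last step sizes and that two end points share a maximum when their distance is at most the sum of their two radii. The greedy grouping and the names `radius` and `group_end_points` are simplifications introduced here. The extra EM iterations for ambiguous assignments are not included.

```python
import math

def radius(path, k):
    """Sum of the last k step sizes along one hill-climbing path."""
    steps = [math.dist(a, b) for a, b in zip(path[:-1], path[1:])]
    return sum(steps[-k:])

def group_end_points(paths, k=2):
    """Assign each path a maximum label by overlapping balls (greedy).

    A path joins the first existing group whose representative end point
    is no farther away than the sum of the two radii; otherwise it opens
    a new group.
    """
    labels = []
    reps = []  # (end point, radius) of the first member of each group
    for p in paths:
        end, r = p[-1], radius(p, k)
        for label, (rep_end, rep_r) in enumerate(reps):
            if math.dist(end, rep_end) <= r + rep_r:
                labels.append(label)
                break
        else:
            reps.append((end, r))
            labels.append(len(reps) - 1)
    return labels

# Three hypothetical 1-d paths: two end near 0, one ends near 5.
paths = [
    [(0.3,), (0.1,), (0.02,)],
    [(-0.4,), (-0.1,), (-0.01,)],
    [(5.5,), (5.1,), (5.0,)],
]
labels = group_end_points(paths, k=2)
```

A greedy pass depends on the order of the paths. A union-find over all overlapping pairs would be a more thorough alternative.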

12 Acceleration Sparse EM –update only the p% of points with the largest posterior –saves (100-p)% of kernel computations after the first iteration Data Reduction –use only p% of the data as representative points –random sampling –k-Means
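A minimal sketch of the sparse EM idea, assuming a Gaussian kernel: the first iteration evaluates every kernel, later iterations re-evaluate only the fraction `p` of points with the largest posterior, and the remaining weights stay frozen at their first-iteration values. The function name `sparse_climb`, the fixed iteration count, and the counter `kernel_evals` are illustrative assumptions.

```python
import math

def kernel(x, xi, h):
    """Unnormalized Gaussian kernel between x and data point xi."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, xi)) / (2.0 * h * h))

def sparse_climb(x, data, h, p=0.5, iters=20):
    """Hill climbing that updates only the top fraction p of posteriors.

    Returns (end point, number of kernel evaluations).
    """
    x = tuple(x)
    # First iteration: full E-step over all data points.
    weights = [kernel(x, xi, h) for xi in data]
    kernel_evals = len(data)
    # Posteriors share one normalizer, so ranking weights ranks posteriors.
    n_active = max(1, int(round(p * len(data))))
    order = sorted(range(len(data)), key=lambda i: weights[i], reverse=True)
    active = order[:n_active]
    for _ in range(iters):
        # M-step: weighted mean using both updated and frozen weights.
        total = sum(weights)
        x = tuple(
            sum(w * xi[j] for w, xi in zip(weights, data)) / total
            for j in range(len(x))
        )
        # Sparse E-step: refresh only the active weights.
        for i in active:
            weights[i] = kernel(x, data[i], h)
        kernel_evals += len(active)
    return x, kernel_evals

# Hypothetical 1-d data: three points near 0, two far away.
data = [(-0.1,), (0.0,), (0.1,), (10.0,), (10.2,)]
end, evals = sparse_climb((0.5,), data, h=1.0, p=0.6, iters=20)
```

Here each later iteration costs 3 kernel evaluations instead of 5, for 65 in total rather than 105, and the two frozen far-away points carry almost no weight.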

13 Experiments Comparison of DENCLUE 1.0 (FS, fixed step size) vs. 2.0 (SSA, step size adjusting) on 16-dim. artificial data; both methods are tuned to find the correct clustering

14 Experiments Comparison of acceleration methods

15 Experiments Clustering quality (normalized mutual information, NMI) vs. sample size for random sampling (RS)
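The slides do not say which normalization of mutual information was used. As a sketch, assuming the common variant NMI = I(A;B) / sqrt(H(A) * H(B)), the score for two labelings of the same points can be computed as follows. Returning 0.0 when either labeling has zero entropy is a convention chosen here.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def nmi(a, b):
    """Normalized mutual information between two labelings of equal length."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    # I(A;B) = sum over label pairs of p(u,v) * log(p(u,v) / (p(u) * p(v)))
    mi = sum(
        (c / n) * math.log(c * n / (count_a[u] * count_b[v]))
        for (u, v), c in joint.items()
    )
    denom = math.sqrt(entropy(a) * entropy(b))
    return mi / denom if denom > 0 else 0.0

same = nmi([0, 0, 1, 1], [1, 1, 0, 0])      # same partition, renamed labels
unrelated = nmi([0, 0, 1, 1], [0, 1, 0, 1]) # statistically independent labels
```

The score ignores how clusters are named, so a clustering that matches the reference partition scores 1 even if its labels differ.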

16 Experiments Cluster quality (NMI) of DENCLUE 2.0 (SSA), its acceleration methods, and k-Means on real data; sample sizes 0.8, 0.4, 0.2

17 Conclusion New hill climbing for DENCLUE; Automatic step size adjustment; Convergence proof by reduction to EM; Allows the application of general EM accelerations. Future work –automatic setting of smoothing parameter h (so far tuned manually)

18 Thank you for your attention!

