Topic9: Density-based Clustering

Slides:



Advertisements
Similar presentations
DBSCAN & Its Implementation on Atlas Xin Zhou, Richard Luo Prof. Carlo Zaniolo Spring 2002.
Advertisements

Clustering (2). Hierarchical Clustering Produces a set of nested clusters organized as a hierarchical tree Can be visualized as a dendrogram –A tree like.
Cluster Analysis: Basic Concepts and Algorithms
Hierarchical Clustering, DBSCAN The EM Algorithm
Data Mining Cluster Analysis: Advanced Concepts and Algorithms
Lecture outline Density-based clustering (DB-Scan) – Reference: Martin Ester, Hans-Peter Kriegel, Jorg Sander, Xiaowei Xu: A Density-Based Algorithm for.
Presented by: GROUP 7 Gayathri Gandhamuneni & Yumeng Wang.
DBSCAN – Density-Based Spatial Clustering of Applications with Noise M.Ester, H.P.Kriegel, J.Sander and Xu. A density-based algorithm for discovering clusters.
Density-based Approaches
Cluster Analysis Part III. Learning Objectives Density-Based Methods Grid-Based Methods Model-Based Clustering Methods Outlier Analysis Summary.
Qiang Yang Adapted from Tan et al. and Han et al.
Clustering Prof. Navneet Goyal BITS, Pilani
CS685 : Special Topics in Data Mining, UKY The UNIVERSITY of KENTUCKY Clustering CS 685: Special Topics in Data Mining Spring 2008 Jinze Liu.
Clustering Methods Professor: Dr. Mansouri
More on Clustering Hierarchical Clustering to be discussed in Clustering Part2 DBSCAN will be used in programming project.
Chapter 3: Cluster Analysis
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ What is Cluster Analysis? l Finding groups of objects such that the objects in a group will.
1 Clustering Instructor: Qiang Yang Hong Kong University of Science and Technology Thanks: J.W. Han, I. Witten, E. Frank.
K-Means and DBSCAN Erik Zeitler Uppsala Database Laboratory.
Cluster Analysis.
4. Clustering Methods Concepts Partitional (k-Means, k-Medoids)
An Introduction to Clustering
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Lecture Notes for Chapter 9 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Clustering II.
Overview Of Clustering Techniques D. Gunopulos, UCR.
Instructor: Qiang Yang
SCAN: A Structural Clustering Algorithm for Networks
Cluster Analysis.
Cluster Analysis: Basic Concepts and Algorithms
© University of Minnesota Data Mining for the Discovery of Ocean Climate Indices 1 CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance.
Clustering IV. Outline Impossibility theorem for clustering Density-based clustering and subspace clustering Bi-clustering or co-clustering.
The UNIVERSITY of Kansas EECS 800 Research Seminar Mining Biological Data Instructor: Luke Huan Fall, 2006.
DATA MINING LECTURE 8 Clustering The k-means algorithm
Clustering Part2 BIRCH Density-based Clustering --- DBSCAN and DENCLUE
Advanced Database Technologies
1 Lecture 10 Clustering. 2 Preview Introduction Partitioning methods Hierarchical methods Model-based methods Density-based methods.
An Efficient Approach to Clustering in Large Multimedia Databases with Noise Alexander Hinneburg and Daniel A. Keim.
1 CSE 980: Data Mining Lecture 17: Density-based and Other Clustering Algorithms.
Density-Based Clustering Algorithms
Han/Eick: Clustering II 1 Clustering Part2 continued 1. BIRCH skipped 2. Density-based Clustering --- DBSCAN and DENCLUE 3. GRID-based Approaches --- STING.
Han/Eick: Clustering II 1 Clustering Part2 continued 1. BIRCH skipped 2. Density-based Clustering --- DBSCAN and DENCLUE 3. GRID-based Approaches --- STING.
Clustering.
DBSCAN Data Mining algorithm Dr Veljko Milutinović Milan Micić
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Clustering COMP Research Seminar BCB 713 Module Spring 2011 Wei Wang.
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Lecture Notes for Chapter 9 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Presented by Ho Wai Shing
Density-Based Clustering Methods. Clustering based on density (local cluster criterion), such as density-connected points Major features: –Discover clusters.
Ch. Eick: Introduction to Hierarchical Clustering and DBSCAN 1 Remaining Lectures in Advanced Clustering and Outlier Detection 2.Advanced Classification.
Data Mining Cluster Analysis: Advanced Concepts and Algorithms
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Other Clustering Techniques
CLUSTERING DENSITY-BASED METHODS Elsayed Hemayed Data Mining Course.
Clustering By : Babu Ram Dawadi. 2 Clustering cluster is a collection of data objects, in which the objects similar to one another within the same cluster.
Parameter Reduction for Density-based Clustering on Large Data Sets Elizabeth Wang.
1 Similarity and Dissimilarity Between Objects Distances are normally used to measure the similarity or dissimilarity between two data objects Some popular.
DATA MINING: CLUSTER ANALYSIS (3) Instructor: Dr. Chun Yu School of Statistics Jiangxi University of Finance and Economics Fall 2015.
1 Cluster Analysis What is Cluster Analysis? Types of Data in Cluster Analysis A Categorization of Major Clustering Methods Partitioning Methods Density-Based.
Clustering (2) Center-based algorithms Fuzzy k-means Density-based algorithms ( DBSCAN as an example ) Evaluation of clustering results Figures and equations.
Data Mining: Basic Cluster Analysis
More on Clustering in COSC 4335
CSE 4705 Artificial Intelligence
Hierarchical Clustering: Time and Space requirements
CSE 5243 Intro. to Data Mining
©Jiawei Han and Micheline Kamber Department of Computer Science
Data Mining Cluster Analysis: Advanced Concepts and Algorithms
Overview Of Clustering Techniques
CSE572, CBS572: Data Mining by H. Liu
CSE572: Data Mining by H. Liu
Lecture 10 Clustering.
CS 685: Special Topics in Data Mining Jinze Liu
Presentation transcript:

Topic9: Density-based Clustering DBSCAN DENCLUE Remark: “short version” of Topic9

Density-Based Clustering Methods Clustering based on density (local cluster criterion), such as density-connected points or based on an explicitly constructed density function Major features: Discover clusters of arbitrary shape Handle noise One scan Need density parameters Several interesting studies: DBSCAN: Ester, et al. (KDD’96) DENCLUE: Hinneburg & D. Keim (KDD’98/2006) OPTICS: Ankerst, et al (SIGMOD’99). CLIQUE: Agrawal, et al. (SIGMOD’98)

DBSCAN (http://www2.cs.uh.edu/~ceick/7363/Papers/dbscan.pdf ) DBSCAN is a density-based algorithm. Density = number of points within a specified radius r (Eps) A point is a core point if it has more than a specified number of points (MinPts) within Eps These are points that are at the interior of a cluster A border point has fewer than MinPts within Eps, but is in the neighborhood of a core point A noise point is any point that is not a core point or a border point.

DBSCAN: Core, Border, and Noise Points

DBSCAN Algorithm (simplified view for teaching) Create a graph whose nodes are the points to be clustered For each core-point c create an edge from c to every point p in the -neighborhood of c Set N to the nodes of the graph; If N does not contain any core points terminate Pick a core point c in N Let X be the set of nodes that can be reached from c by going forward; create a cluster containing X{c} N=N/(X{c}) Continue with step 4 Remark: points that are not assigned to any cluster are outliers;

DBSCAN: Core, Border and Noise Points Original Points Point types: core, border and noise Eps = 10, MinPts = 4

When DBSCAN Works Well Original Points Clusters Resistant to Noise Can handle clusters of different shapes and sizes

When DBSCAN Does NOT Work Well (MinPts=4, Eps=9.75). Original Points Varying densities High-dimensional data (MinPts=4, Eps=9.92)

DBSCAN: Determining EPS and MinPts Idea is that for points in a cluster, their kth nearest neighbors are at roughly the same distance Noise points have the kth nearest neighbor at farther distance So, plot sorted distance of every point to its kth nearest neighbor Core-points Non-Core-points Run K-means for Minp=4 and not fixed

Complexity DBSCAN Time Complexity: O(n2)—for each point it has to be determined if it is a core point, can be reduced to O(n*log(n)) in lower dimensional spaces by using efficient data structures (n is the number of objects to be clustered); Space Complexity: O(n).

Summary DBSCAN Good: can detect arbitrary shapes, not very sensitive to noise, supports outlier detection, complexity is kind of okay, beside K-means the second most used clustering algorithm. Bad: does not work well in high-dimensional datasets, parameter selection is tricky, has problems of identifying clusters of varying densities (SSN algorithm), density estimation is kind of simplistic (does not create a real density function, but rather a graph of density-connected points)

DBSCAN Algorithm Revisited Skip! DBSCAN Algorithm Revisited Eliminate noise points Perform clustering on the remaining points:

DENCLUE (http://www2.cs.uh.edu/~ceick/ML/Denclue2.pdf ) DENsity-based CLUstEring by Hinneburg & Keim (KDD’98) Major features Solid mathematical foundation Good for data sets with large amounts of noise Allows a compact mathematical description of arbitrarily shaped clusters in high-dimensional data sets Significant faster than existing algorithm (faster than DBSCAN by a factor of up to 45) ???????? But needs a large number of parameters

Denclue: Technical Essence Uses grid cells but only keeps information about grid cells that do actually contain data points and manages these cells in a tree-based access structure. Influence function: describes the impact of a data point within its neighborhood. Overall density of the data space can be calculated as the sum of the influence function of all data points. Clusters can be determined using hill climbing by identifying density attractors; density attractors are local maximal of the overall density function. Objects that are associated with the same density attractor belong to the same cluster.

Gradient: The steepness of a slope Example

Example: Density Computation D={x1,x2,x3,x4} fDGaussian(x)= influence(x,x1) + influence(x,x2) + influence(x,x3) + influence(x4)=0.04+0.06+0.08+0.6=0.78 x1 x3 0.04 0.08 y x2 x4 0.06 x 0.6 Remark: the density value of y would be larger than the one for x

Density Attractor

Examples of DENCLUE Clusters

Basic Steps DENCLUE Algorithms Determine density attractors Associate data objects with density attractors using hill climbing Possibly, merge the initial clusters further relying on a hierarchical clustering approach (optional; not covered in this lecture)