OPTICS: Ordering Points To Identify the Clustering Structure. Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, Jörg Sander. Presented by Chris Mueller.


OPTICS: Ordering Points To Identify the Clustering Structure
Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, Jörg Sander
Presented by Chris Mueller, November 4, 2004

Clustering

Goal: Group objects into meaningful subclasses, either as part of an exploratory process to gain insight into the data or as a preprocessing step for other algorithms.

[Figure: example hierarchy with leaves Porpoise, Beluga, Sperm, Fin, Sei, Cow, Giraffe]

Clustering strategies: hierarchical, partitioning (k-means), and density based.

Density-based clustering requires a distance metric between points and works well on high-dimensional data and on data that forms irregular clusters.

DBSCAN: Density Based Clustering

[Figure: objects a, b, c, d and their ε-neighborhoods, with MinPts = 3]

An object p is in the ε-neighborhood of q if the distance from p to q is no greater than ε.

A core object has at least MinPts objects in its ε-neighborhood.

An object p is directly density-reachable from object q if q is a core object and p is in the ε-neighborhood of q.

An object p is density-reachable from object q if there is a chain of objects p_1, …, p_n, where p_1 = q and p_n = p, such that p_{i+1} is directly density-reachable from p_i.

An object p is density-connected to object q if there is an object o such that both p and q are density-reachable from o.

A cluster is a set of density-connected objects that is maximal with respect to density-reachability.

Noise is the set of objects not contained in any cluster.
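The definitions above can be sketched in a few lines of Python. This is only an illustration, not the DBSCAN algorithm itself: the function names are hypothetical, the neighborhood is assumed to include the object itself and to use distance ≤ ε, and neighbors are found by a brute-force scan rather than a spatial index.

```python
from math import dist

def neighborhood(points, q, eps):
    """Indices of all points within distance eps of points[q] (q included)."""
    return [i for i, p in enumerate(points) if dist(p, points[q]) <= eps]

def is_core(points, q, eps, min_pts):
    """A core object has at least min_pts objects in its eps-neighborhood."""
    return len(neighborhood(points, q, eps)) >= min_pts

def directly_reachable(points, p, q, eps, min_pts):
    """True if p is directly density-reachable from q."""
    return is_core(points, q, eps, min_pts) and p in neighborhood(points, q, eps)

def density_reachable(points, p, q, eps, min_pts):
    """True if a chain of directly density-reachable steps leads from q to p."""
    seen, frontier = {q}, [q]
    while frontier:
        cur = frontier.pop()
        if not is_core(points, cur, eps, min_pts):
            continue  # only core objects can extend the chain
        for n in neighborhood(points, cur, eps):
            if n not in seen:
                seen.add(n)
                frontier.append(n)
    return p in seen

# Four points on a line plus one far-away point (illustrative data).
points = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 10)]
eps, min_pts = 1.0, 3
print(density_reachable(points, 3, 1, eps, min_pts))
print(density_reachable(points, 1, 3, eps, min_pts))
```

The two calls at the end illustrate that density-reachability is not symmetric: a border object can be reached from a core object, but no chain can start from a border object.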

OPTICS: Density-Based Cluster Ordering

OPTICS generalizes density-based clustering by creating an ordering of the points that allows the extraction of clusters for arbitrary values of ε.

The generating-distance ε is the largest distance considered for clusters. Clusters can be extracted for all ε_i such that 0 ≤ ε_i ≤ ε.

The core-distance of p is the smallest distance ε′ between p and an object in its ε-neighborhood such that p would be a core object with respect to ε′. It is undefined if p is not a core object with respect to ε.

The reachability-distance of p with respect to a core object o is the smallest distance such that p is directly density-reachable from o, that is, max(core-distance(o), dist(o, p)).

[Figure: MinPts = 3; the core-distance ε′ and the reachability-distance are marked within the ε-neighborhood; one point is labeled as having an undefined reachability-distance]

    OPTICS(Objects, e, MinPts, OrderFile):
        for each unprocessed obj in Objects:
            neighbors = Objects.getNeighbors(obj, e)
            obj.processed = True
            obj.setCoreDistance(neighbors, e, MinPts)
            OrderFile.write(obj)
            if obj.coreDistance != NULL:
                orderSeeds.update(neighbors, obj)
                while orderSeeds is not empty:
                    # next() returns the seed with the smallest reachability
                    obj = orderSeeds.next()
                    neighbors = Objects.getNeighbors(obj, e)
                    obj.processed = True
                    obj.setCoreDistance(neighbors, e, MinPts)
                    OrderFile.write(obj)
                    if obj.coreDistance != NULL:
                        orderSeeds.update(neighbors, obj)

    OrderSeeds::update(neighbors, centerObj):
        d = centerObj.coreDistance
        for each unprocessed obj in neighbors:
            newRdist = max(d, dist(obj, centerObj))
            if obj.reachability == NULL:
                # first time obj is reached: add it to the seed list
                obj.reachability = newRdist
                insert(obj, newRdist)
            elif newRdist < obj.reachability:
                # a closer core object was found: move obj forward
                obj.reachability = newRdist
                decrease(obj, newRdist)
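A runnable sketch of the ordering loop follows. It is a simplified illustration under several assumptions that are not in the slides: the names `optics`, `process`, and `update` are hypothetical, undefined distances are stored as `math.inf` instead of NULL, neighbors come from a brute-force scan, and the seed list is a binary heap where stale entries are skipped on removal instead of being updated with a decrease operation.

```python
import heapq
from math import dist, inf

def optics(points, eps, min_pts):
    """Return (order, reach, core) for a list of coordinate tuples.

    Undefined core- and reachability-distances are represented by inf.
    """
    n = len(points)
    reach = [inf] * n
    core = [inf] * n
    processed = [False] * n
    order = []

    def process(i):
        """Record object i in the ordering; return its neighbors if it is core."""
        processed[i] = True
        order.append(i)
        nbrs = [j for j in range(n) if dist(points[i], points[j]) <= eps]
        if len(nbrs) < min_pts:
            return []
        # Core-distance: distance to the MinPts-th closest object, i included.
        core[i] = sorted(dist(points[i], points[j]) for j in nbrs)[min_pts - 1]
        return nbrs

    def update(seeds, nbrs, center):
        """Lower the reachability of unprocessed neighbors of a core object."""
        for j in nbrs:
            if processed[j]:
                continue
            new_r = max(core[center], dist(points[center], points[j]))
            if new_r < reach[j]:
                reach[j] = new_r
                heapq.heappush(seeds, (new_r, j))  # old entry becomes stale

    for start in range(n):
        if processed[start]:
            continue
        seeds = []
        update(seeds, process(start), start)
        while seeds:
            _, j = heapq.heappop(seeds)
            if processed[j]:
                continue  # stale heap entry, already handled
            update(seeds, process(j), j)
    return order, reach, core

# Two small groups and one isolated point (illustrative data).
sample = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (20, 20)]
order, reach, core = optics(sample, eps=2.0, min_pts=3)
for i in order:
    print(i, reach[i], core[i])
```

Pushing a new heap entry instead of decreasing a key keeps the sketch short; the object is still taken at its smallest reachability because that entry is popped first.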

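[Animation: step-by-step walk-through of OPTICS. Each iteration (1) gets the neighbors of the current object, calculates its core-distance, and saves the object to the order file; (2) calculates or updates the reachability-distances of its neighbors; (3) updates the processing order. The first order-file entry shown is pt1 with ε′ = 10 and rd = NULL, followed by pt2, pt3, …]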

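Reachability Plots

A reachability plot is a bar chart that shows each object's reachability-distance in the order in which the objects were processed. Clusters appear as valleys: runs of consecutive objects with low reachability-distance, separated by higher bars.

[Figure: reachability plot with axis labels ε, > ε, and n]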
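As a rough illustration, a reachability plot can be drawn as text bars. The helper `reachability_plot` and the sample values below are hypothetical; undefined reachability-distances are assumed to be stored as `math.inf` and are drawn as a full-width bar ending in `>`.

```python
from math import inf

def reachability_plot(reach_in_order, width=40):
    """Return one text bar per object, in processing order."""
    finite = [r for r in reach_in_order if r != inf]
    top = max(finite, default=1.0) or 1.0  # avoid dividing by zero
    lines = []
    for r in reach_in_order:
        if r == inf:
            lines.append("undefined " + "#" * width + ">")
        else:
            lines.append(f"{r:9.2f} " + "#" * round(width * r / top))
    return lines

# Made-up reachability values: two valleys separated by one tall bar.
lines = reachability_plot([inf, 1.0, 1.0, 4.0, 1.0], width=8)
print("\n".join(lines))
```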

ii  Automatic Cluster Extraction Retrieving DBSCAN clusters Extracting hierarchical clusters 1 ExtractDBSCAN(OrderedPoints, ei, MinPts): 2 clusterId = NOISE 3 for each obj in OrderedPoints: 4 if obj.reachability > ei: 5 if obj.coreDistance <= ei: 6 clusterId = nextId(clusterId) 7 obj.clusterId = clusterId 8 else: 9 obj.clusterId = NOISE 10 else: 11 obj.clusterId = clusterId A steep upward point is a point that is t% lower that its successor. A steep downward point is similarly defined. A steep upward area is a region from [s, e] such that s and e are both steep upward points, each successive point is at least as high as its predecessors, and the region does not contain more than MinPts successive points that are not steep upward. A cluster: Starts with a steep downward area Ends with a steep upward area Contains at least MinPts The reachability values in the cluster are at least t% lower than the first point in the cluster. 1 HierachicalCluster(objects): 2 for each index: 3 if start of down area D: 4 add D to steep down areas 5 index = end of D 6 elif start of steep up area U: 7 index = end of U 8 for each steep down area D: 9 if D and U form a cluster: 10 add [start(D), end(U)] to set of clusters  Core Distance Reachability

References

[DBSCAN] Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise", Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining, Portland, OR, AAAI Press, 1996, pp.

[OPTICS] Ankerst, M., Breunig, M., Kriegel, H.-P., Sander, J.: "OPTICS: Ordering Points To Identify the Clustering Structure", Proc. ACM SIGMOD Conference, Philadelphia, PA, pp. 49-60.