Osmar Zaïane and Chi-Hoon Lee Database Laboratory Dept. of Computing Science University of Alberta Density-Based Clustering of Spatial Data when facing.


1 Density-Based Clustering of Spatial Data when facing Physical Constraints
Authors: Dr. Osmar R. Zaïane and Chi-Hoon Lee
Database Laboratory, Department of Computing Science, University of Alberta

2 DBCluC (Density-Based Clustering with Constraints)
Introduction
Related works
Background Concepts
Modeling Constraints
DBCluC Algorithm
Performance Evaluation
Conclusion

3 Introduction
Cluster Analysis
–Clustering (unsupervised classification) is the process of partitioning data objects into a set of meaningful sub-classes, called clusters, by maximizing closeness within each cluster and minimizing closeness between clusters.
Taxonomy of Clustering methods
Data Clustering
–Non-Constraint Based
  Partitioning: K-means, K-medoids, CLARANS
  Graph-Partitioning: CHAMELEON, AUTOCLUST
  Hierarchical: AGNES/DIANA, BIRCH, CURE
  Density-Based: DBSCAN, DENCLUE
  Grid-Based: STING, WaveCluster
–Constraint Based

4 Introduction
Cluster Analysis
–Clustering (unsupervised classification) is the process of partitioning data objects into a set of meaningful sub-classes, called clusters, by maximizing closeness within each cluster and minimizing closeness between clusters.
Taxonomy of Clustering methods
Data Clustering
–Non-Constraint Based
  Partitioning: CLARANS
  Graph-Partitioning: AUTOCLUST
  Density-Based: DBSCAN
–Constraint Based
  Partitioning: COD-CLARANS
  Graph-Partitioning: AUTOCLUST+
  Density-Based: DBCluC

5 Introduction (Cont.)
Key factors for a spatial clustering algorithm
–Scalability
–Discover arbitrary shaped clusters
–Discriminate noise and outliers
–Minimum Domain Knowledge
–Insensitive to data input order
–Constraints
  Operational Constraints
    Ex) SQL aggregate and existence constraints [4]
  Physical Constraints
    Ex) Obstacles [1, 2] and crossings

6 DBCluC (Density-Based Clustering with Constraints)
Introduction
Related works
Background Concepts
Modeling Constraints
DBCluC Algorithm
Performance Evaluation
Conclusion

7 Related Works
COD-CLARANS (A.K.H. Tung, et al. 2001)
–Defines the relationship between obstacles and data objects by visibility graphs in order to compute obstructed distances between data objects.
–Requires expensive preprocessing steps.
–Inherits the disadvantages of CLARANS
  Number of clusters (k)
  Main memory management
  Micro-clustering method, detection of only spherical shaped clusters
AUTOCLUST+ (Vladimir Estivill-Castro, et al. 2000)
–Delaunay structure for data points
–Models obstacles as a set of line segments
–Scalable and efficient in 2-dimensional space

8 DBCluC (Density-Based Clustering with Constraints)
Introduction
Related works
Background Concepts
Modeling Constraints
DBCluC Algorithm
Performance Evaluation
Conclusion

9 Background Concepts
DBSCAN
–Proposed by Ester, Kriegel, Sander, and Xu (KDD'96).
–Density-based spatial clustering algorithm that discriminates noise.
–Detects arbitrary shaped clusters in the presence of noise.
–R*-tree indexing structure ($O(\log n)$ average cost per region query).
–Density notion evaluated by two parameters: Eps and MinPts.
  Eps: Maximum radius of the neighbourhood.
  MinPts: Minimum number of points in an Eps-neighbourhood of a given query point.
–Eps-neighbourhood: $N_{Eps}(p) = \{q \in D \mid dist(p, q) \le Eps\}$.
–Core point condition: $|N_{Eps}(p)| \ge MinPts$.
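
As a rough illustration of the Eps-neighbourhood and the MinPts condition on this slide, the following Python sketch uses a brute-force scan in place of the R*-tree index. The function names and the sample points are assumptions made for this example, not part of the presentation.

```python
import math

def eps_neighbourhood(points, p, eps):
    """Return every q in points with dist(p, q) <= eps (p itself included).

    Brute-force O(n) scan; the slides use an R*-tree index instead.
    """
    return [q for q in points if math.dist(p, q) <= eps]

def is_core_point(points, p, eps, min_pts):
    """Check the density condition |N_Eps(p)| >= MinPts."""
    return len(eps_neighbourhood(points, p, eps)) >= min_pts

# Hypothetical sample data: a small square of points plus one far-away point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
dense = is_core_point(points, (0.0, 0.0), eps=1.5, min_pts=4)
sparse = is_core_point(points, (5.0, 5.0), eps=1.5, min_pts=4)
```

Here the point at the origin has four points within Eps (counting itself), so it meets the condition, while the isolated point does not.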

10 Background Concepts: DBSCAN
Directly density-reachable: A point p is directly density-reachable from a point q wrt. Eps, MinPts if $p \in N_{Eps}(q)$.
Density-reachable: A point p is density-reachable from a point q wrt. Eps, MinPts if there is a chain of points $p_1, \dots, p_n$ with $p_1 = q$, $p_n = p$ such that $p_{i+1}$ is directly density-reachable from $p_i$.
Density-connected: A point p is density-connected to a point q wrt. Eps, MinPts if there is a point o such that both p and q are density-reachable from o wrt. Eps and MinPts.
[Figure: examples of directly density-reachable, density-reachable, and density-connected points p, q, o, drawn with MinPts: 4, Eps: 2cm]

11 Background Concepts: DBSCAN
Cluster – A non-empty subset C of data points satisfying the following conditions:
–1) Maximality: $\forall p, q$: if $p \in C$ and q is density-reachable from p with respect to Eps and MinPts, then $q \in C$.
–2) Connectivity: $\forall p, q \in C$: p is density-connected to q with respect to Eps and MinPts.
Noise – Data point that does not belong to any cluster
Motivating Concepts - Obstacle
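
The cluster and noise definitions can be made concrete with a minimal DBSCAN-style sketch in Python. It is an illustration under simplifying assumptions (brute-force neighbour search, points given as coordinate tuples, hypothetical function names), not the authors' implementation.

```python
import math
from collections import deque

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (brute force)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        labels[i] = cluster_id
        queue = deque(neighbours)
        while queue:  # collect everything density-reachable from point i
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # only core points expand further
                queue.extend(j_neighbours)
        cluster_id += 1
    return labels

# Hypothetical data: two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=4)
```

The expansion loop enforces maximality (every point density-reachable from a core point is pulled into its cluster), and points that are never reached keep the NOISE label.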

12 Background Concepts (cont.)
Obstacle Constraints:
1. An Obstacle entity
  -Disconnectivity functionality
    Grouping nearest data objects is not feasible
  A polygon denoted by P(V, E), where V is a set of points from the polygon and E is a set of line segments
  Types: Convex and Concave.

13 Background Concepts: Obstacle free density notions
Directly obstacle free density-reachable: A point p is directly obstacle free density-reachable from a point q wrt. Eps, MinPts if $p \in N_{Eps}(q)$ and the edge joining p and q is obstacle-free.
Obstacle free density-reachable: A point p is obstacle free density-reachable from a point q wrt. Eps, MinPts if there is a chain of points $p_1, \dots, p_n$ with $p_1 = q$, $p_n = p$ such that $p_{i+1}$ is directly obstacle free density-reachable from $p_i$.
Obstacle free density-connected: A point p is obstacle free density-connected to a point q wrt. Eps, MinPts if there is a point o such that both p and q are obstacle free density-reachable from o.
[Figure: examples of directly obstacle free density-reachable, obstacle free density-reachable, and obstacle free density-connected points p, q, o, r, drawn with MinPts: 4, Eps: 2cm]
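
One way to check that "an edge joining p and q is obstacle-free" is a standard segment intersection test against each obstacle line segment. The Python sketch below assumes obstacles are given as a list of 2-D line segments; the helper names and sample data are hypothetical, and touching a segment is treated as blocked.

```python
import math

def orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1 left turn, -1 right turn, 0 collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)

def on_segment(a, b, c):
    """True if c, already known to be collinear with a-b, lies between a and b."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))

def segments_intersect(p, q, a, b):
    """True if segment p-q shares at least one point with segment a-b."""
    d1, d2 = orient(a, b, p), orient(a, b, q)
    d3, d4 = orient(p, q, a), orient(p, q, b)
    if d1 != d2 and d3 != d4:
        return True
    return ((d1 == 0 and on_segment(a, b, p)) or (d2 == 0 and on_segment(a, b, q))
            or (d3 == 0 and on_segment(p, q, a)) or (d4 == 0 and on_segment(p, q, b)))

def is_visible(p, q, lines):
    """The edge p-q is obstacle-free if it meets none of the given segments."""
    return not any(segments_intersect(p, q, a, b) for a, b in lines)

def obstacle_free_neighbourhood(points, p, eps, lines):
    """Points within eps of p that are also visible from p."""
    return [q for q in points if math.dist(p, q) <= eps and is_visible(p, q, lines)]

# Hypothetical data: a vertical wall at x = 1 between the origin and (2, 0).
wall = [((1.0, -1.0), (1.0, 1.0))]
points = [(0.0, 0.0), (0.0, 0.5), (2.0, 0.0)]
neighbours = obstacle_free_neighbourhood(points, (0.0, 0.0), 3.0, wall)
```

With the wall in place, (2, 0) lies within Eps of the origin but is dropped from the neighbourhood because the joining edge crosses the wall.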

14 Background Concepts: DBCluC
Cluster – A non-empty subset C of data points satisfying the following conditions:
–1) Maximality: $\forall p, q$: if $p \in C$ and q is obstacle free density-reachable from p with respect to Eps and MinPts, then $q \in C$.
–2) Connectivity: $\forall p, q \in C$: p is obstacle free density-connected to q with respect to Eps and MinPts.
Noise – Data point that does not belong to any cluster
Motivating Concepts - Obstacle

15 DBCluC (Density-Based Clustering with Constraints)
Introduction
Background Concepts
Modeling Constraints
DBCluC Algorithm
Performance Evaluation
Conclusion

16 Modeling Constraints – Obstacles
Modeling Obstacles
–Objectives
  Assign disconnectivity functionality.
  Enhance performance when processing a large number of obstacles by reducing search spaces.
–Method: Polygon Reduction Algorithm
Observation
–An obstacle can be modeled by a polygon.
–A given polygon creates a set of visible spaces with respect to the data objects to be clustered.
Goal
–Maintain the set of visible spaces created by an obstacle associated with data objects.
Approach
–Represent an obstacle as a set of Obstruction Lines.
Crossings

17 Modeling Constraints
Polygon Reduction Algorithm
Two steps
1. Convexity Test
2. Construct obstruction lines
1. Convexity Test
  A pre-stage that determines whether a polygon is convex or concave by checking the type of all points in the polygon.
  Approaches
  –Turning Directional Approach
    »Assumes the points of a polygon are enumerated in order: clockwise or counterclockwise
    »$O(n)$
  –Externality Approach
    »Checks the relation between a polygon and an assessment edge that is “very” close to a query point
    »$O(n^2)$
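
The turning directional approach can be sketched with cross products: walking the vertices in order, a convex polygon always turns the same way. The Python snippet below is a minimal sketch assuming a simple (non-self-intersecting) polygon given as an ordered vertex list; the function names are made up for the example.

```python
def cross(o, a, b):
    """Cross product (a - o) x (b - o): positive for a left turn, negative for a right turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def is_convex_polygon(vertices):
    """O(n) convexity test: every non-degenerate turn must have the same direction.

    Assumes the vertices are enumerated clockwise or counterclockwise
    and that the polygon does not intersect itself.
    """
    n = len(vertices)
    direction = 0
    for i in range(n):
        turn = cross(vertices[i], vertices[(i + 1) % n], vertices[(i + 2) % n])
        if turn == 0:
            continue  # collinear vertices do not change the turning direction
        sign = 1 if turn > 0 else -1
        if direction == 0:
            direction = sign
        elif sign != direction:
            return False  # a turn in the opposite direction marks a concave point
    return True

# Hypothetical polygons: a square and a polygon with one concave vertex at (2, 1).
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
notched = [(0, 0), (4, 0), (4, 4), (2, 1), (0, 4)]
```

Each vertex is visited once, which matches the O(n) cost stated for this approach.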

18 Examples of Convexity Test – Turning Directional Approach
[Figure: polygon vertices v1, v2, v3]

19 Examples of Convexity Test – Externality Approach
[Figure: query points classified as convex or concave points, using an assessment edge and a point inside the triangle formed by the query point and the two endpoints of the assessment edge]

20 Modeling Constraints – Polygon Reduction Algorithm
1. Define the type of a polygon via the Convexity Test
  A polygon is concave if $\exists$ a concave point in the polygon.
  A polygon is convex if $\forall$ points are convex points.
2. Convex – obstruction lines *.
3. Concave – The number of obstruction lines depends on the shape of a given polygon.

21 Modeling Obstacles: An example
[Figure: example obstacle with regions labelled vs1 to vs6 and numeric labels 8 and 4]

22 Modeling Constraints – a crossing
Crossing Modeling
–Objective
  Efficiently assign connectivity functionality.
–Method: A polygon with Entry Points and Entry Edges.
  Defined by users’ or applications’ demands
–Entry points modeled from a crossing connect reachable objects
[Figure: a crossing with Entry Points, Entry Edges, and radius Eps]

23 DBCluC (Density-Based Clustering with Constraints)
Introduction
Background Concepts
Modeling Constraints
DBCluC Algorithm
Performance Evaluation
Conclusion

24 DBCluC
–Extension of DBSCAN
–Starts clustering from an arbitrary data point.
–Indexes data points with an SR-tree
  K-NN Query and Range Query available.
–Considers crossing constraints while (after) clustering.
–Considers obstacles after retrieving the neighbours of a given query point.
  Visibility between a query point and its neighbours is checked for all obstacles.
–Complexity $O(N \cdot \log N \cdot L)$, where N is the number of data points and L is the number of obstruction lines.
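
The obstacle-handling step described here (retrieve neighbours, then filter them by visibility against every obstruction line) can be sketched as a small variation of DBSCAN. The Python code below is only an illustration under stated assumptions: it uses a brute-force neighbour scan instead of an SR-tree, ignores crossings, counts only proper crossings of an obstruction line as blocking, and all names are hypothetical.

```python
import math
from collections import deque

NOISE = -1

def _orient(a, b, c):
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)

def _blocked(p, q, line):
    """True if segment p-q properly crosses the obstruction line (touching is ignored)."""
    a, b = line
    return (_orient(a, b, p) * _orient(a, b, q) < 0
            and _orient(p, q, a) * _orient(p, q, b) < 0)

def visible_neighbours(points, i, eps, lines):
    """Range query followed by a visibility check against all L obstruction lines."""
    p = points[i]
    return [j for j, q in enumerate(points)
            if math.dist(p, q) <= eps and not any(_blocked(p, q, ln) for ln in lines)]

def dbcluc_sketch(points, eps, min_pts, lines):
    """DBSCAN-style expansion in which neighbourhoods only contain visible points."""
    labels = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = visible_neighbours(points, i, eps, lines)
        if len(neighbours) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster_id
        queue = deque(neighbours)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = visible_neighbours(points, j, eps, lines)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)
        cluster_id += 1
    return labels

# Hypothetical data: one dense strip of points, optionally split by a wall at x = 1.5.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1), (3, 0), (3, 1)]
wall = [((1.5, -1.0), (1.5, 2.0))]
without_wall = dbcluc_sketch(points, eps=1.5, min_pts=4, lines=[])
with_wall = dbcluc_sketch(points, eps=1.5, min_pts=4, lines=wall)
```

Without the wall the eight points form a single cluster; with it, the visibility filter separates them into two. The per-neighbour loop over all obstruction lines is where the factor L in the stated complexity comes from.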

25 DBCluC (Density-Based Clustering with Constraints)
Introduction
Background Concepts
Modeling Constraints
DBCluC Algorithm
Performance Evaluation
Conclusion

26 Performance
Performance Evaluation, based on synthetic data sets
–Detecting arbitrary shaped clusters
–Insensitive to data input order
–Discriminating noise and outliers
–Pruning search spaces

                                                    DS3           DS5
Number of data objects                              12k           1000
Number of obstacles (line segments) / crossings     7 (29) / 2    18 (114) / 3
Number of obstruction lines                         15            74

27 Performance (DS3)
(a) Before clustering
(b) Clustering ignoring constraints
(c) Clustering with bridges
(d) Clustering with obstacles
(e) Clustering with obstacles and bridges

28 Performance (DS5)
(a) Before clustering
(b) Clustering ignoring constraints

29 Performance (DS5)
(c) Clustering with bridges
(d) Clustering with obstacles
(e) Clustering with obstacles and bridges

30 Performance
(a) Run time varying size of data objects
[Figure: run-time plot, time in seconds]

31 Performance
(b) Run time varying size of obstacles

32 Conclusion
Proposed a spatial clustering algorithm that works in the presence of constraints: Obstacles and Crossings.
Modeling constraints
–Obstacles
  Polygon Reduction Algorithm.
  –Reduces search spaces, allowing DBCluC to handle a large number of obstacles
–Crossing
  Entry point and Entry edge.
  –Control connectivity flow
Experiments
–Scalable, efficient, and effective.

33 Future Work
Indexing obstacles
–Prune search spaces for a large number of obstacles
–Reduce the complexity of DBCluC to $O(N \log N)$
Extension to higher dimensions with obstruction hyperplanes
Consider the object altitude
Consider more constraints: Time, Length of a crossing, Direction of Crossing (one direction/bi-direction)
Extension to operational constraints

34 References
[1] A. K. H. Tung, J. Hou, and J. Han. Spatial Clustering in the Presence of Obstacles. In Proc. 2001 Int. Conf. on Data Engineering (ICDE'01), Heidelberg, Germany, April 2001.
[2] Vladimir Estivill-Castro and IckJai Lee. Autoclust+: Automatic clustering of point-data sets in the presence of obstacles. In International Workshop on Temporal and Spatial and Spatio-Temporal Data Mining (TSDM2000), pages 133-146, 2000.
[3] M. G. Stone. A mnemonic for areas of polygons. Amer. Math. Monthly, 93:479-480, 1986.
[4] Anthony K. H. Tung, Raymond T. Ng, Laks V. S. Lakshmanan, and Jiawei Han. Constraint-based clustering in large databases. In ICDT, pages 405-419, 2001.
[5] Osmar R. Zaïane and Chi-Hoon Lee. Clustering Spatial Data in the Presence of Obstacles: a Density-Based Approach. In Sixth International Database Engineering and Applications Symposium (IDEAS 2002), Edmonton, Alberta, Canada, July 17-19, 2002.
[6] Osmar R. Zaïane, Andrew Foss, Chi-Hoon Lee, and Weinan Wang. On Data Clustering Analysis: Scalability, Constraints and Validation. In Proc. of the Sixth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'02), pages 28-39, Taipei, Taiwan, May 2002.
[7] Osmar R. Zaïane and Chi-Hoon Lee. Clustering Spatial Data When Facing Physical Constraints. In Proc. of the 2002 IEEE International Conference on Data Mining (ICDM 2002), pages ??-??, Maebashi City, Japan, December 9-12, 2002.

35 Visibility Graph from [1]
[Figure: obstacles O1 and O2, data points p and q, obstacle vertices v1 to v5]

36 Delaunay diagram
-Collection of edges satisfying an "empty circle" property: for each edge we can find a circle containing the edge's endpoints but not containing any other points.
-Dual of the Voronoi Diagram
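
For small point sets in general position (no three points collinear, no four on a common circle), the empty-circle property can be checked by brute force. The Python sketch below uses the triangle form of the test, keeping every triangle whose circumcircle contains no other point and collecting its edges. It is an O(n^4) illustration with hypothetical names, not how AUTOCLUST+ or any library builds the diagram.

```python
from itertools import combinations

def orient(a, b, c):
    """Cross product (b - a) x (c - a); positive if a, b, c are counterclockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circle through a, b, c."""
    if orient(a, b, c) < 0:
        b, c = c, b  # the determinant test below expects counterclockwise order
    ax, ay = a[0] - d[0], a[1] - d[1]
    bx, by = b[0] - d[0], b[1] - d[1]
    cx, cy = c[0] - d[0], c[1] - d[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    return det > 0

def delaunay_edges(points):
    """Edges of all triangles whose circumcircle is empty of other points."""
    edges = set()
    for a, b, c in combinations(points, 3):
        if orient(a, b, c) == 0:
            continue  # collinear triple, no circumcircle
        others = (p for p in points if p not in (a, b, c))
        if not any(in_circumcircle(a, b, c, p) for p in others):
            for u, v in ((a, b), (b, c), (a, c)):
                edges.add(tuple(sorted((u, v))))
    return edges

# Hypothetical sample: four points where the diagonal (0,0)-(3,3) is not a Delaunay edge.
points = [(0, 0), (2, 0), (0, 2), (3, 3)]
edges = delaunay_edges(points)
```

Integer coordinates keep the determinant exact in this example; with floating-point input a robust geometric predicate would be needed.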

37 Delaunay diagram

38 Visible Space
Given a set D of n data objects with a polygon P(V, E), a visible space S is a space that has a set P of data objects satisfying the following:
1. Space S is defined by three edges: the first edge (or edges) $e \in E$ connects two minimal convex points $v_i, v_j \in V$; the second edge f is the extension of the line connecting $v_i$ and its other adjacent point $v_k \in V$; and the third edge g is the extension of the line connecting $v_j$ and its other adjacent point $v_l \in V$.
2. $\forall p, q \in P$, p and q are visible to each other in S. Thus, $P \subseteq D$.
3. S is not visible to any other visible space S'. Thus, $S' \cap S = \emptyset$.
[Figure: polygon edges e1 to e5 and visible spaces S1 to S5]

