
SCAN: A Structural Clustering Algorithm for Networks


1 SCAN: A Structural Clustering Algorithm for Networks
Xiaowei Xu, Nurcan Yuruk, Zhidan Feng, and Thomas Schweiger KDD’07

2 An Introduction to DBSCAN
DBSCAN is a density-based algorithm.
Density = number of points within a specified radius (Eps).
A point is a core point if it has at least a specified number of points (MinPts) within Eps, counting the point itself. These are points in the interior of a cluster.
A border point has fewer than MinPts within Eps, but is in the neighborhood of a core point.
A noise point is any point that is neither a core point nor a border point.
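The point-type definitions above can be sketched as a brute-force check. This is a minimal illustration, assuming points are coordinate tuples, Euclidean distance, and that a point's Eps-neighborhood includes the point itself; `classify_points` is a hypothetical helper name.

```python
import math

def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise'.

    Assumption: the Eps-neighborhood includes the point itself.
    Brute force: O(n^2) distance computations.
    """
    n = len(points)
    # Indices of all points within eps of point i (i itself included).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = []
    for i in range(n):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            labels.append("border")  # not core, but near a core point
        else:
            labels.append("noise")
    return labels
```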

3 DBSCAN: Core, Border, and Noise Points

4 DBSCAN Algorithm
Eliminate noise points.
Perform clustering on the remaining points.
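The two steps above (set noise aside, then cluster what remains) can be sketched as a small DBSCAN. This is a minimal sketch under the same assumptions as before (Euclidean distance, neighborhood includes the point itself); `dbscan` is a hypothetical name, and marking noise with -1 is an illustrative convention.

```python
import math
from collections import deque

def dbscan(points, eps, min_pts):
    """Return one cluster id per point; -1 marks noise."""
    n = len(points)
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = [-1] * n
    cluster = 0
    for start in range(n):
        if not is_core[start] or labels[start] != -1:
            continue
        # Grow a new cluster breadth-first from an unlabeled core point.
        labels[start] = cluster
        queue = deque([start])
        while queue:
            p = queue.popleft()  # only core points are ever queued
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster  # core or border point joins
                    if is_core[q]:
                        queue.append(q)  # only cores expand the cluster
        cluster += 1
    return labels
```

Because only core points are expanded, a border point reachable from two clusters simply keeps the first label it receives, and points never reached stay as noise.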

5 DBSCAN: Core, Border and Noise Points
[Figure: original points and their point types (core, border, noise); Eps = 10, MinPts = 4]

6 When DBSCAN Works Well
[Figure: original points and the resulting clusters]
Resistant to noise
Can handle clusters of different shapes and sizes

7 DBSCAN: Determining Eps and MinPts
The idea is that, for points in a cluster, their kth nearest neighbors are at roughly the same distance.
Noise points have their kth nearest neighbor at a farther distance.
So, plot the sorted distance of every point to its kth nearest neighbor.
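The k-distance heuristic can be computed with a brute-force sketch that returns the values one would plot. It assumes Euclidean distance and excludes each point from its own neighbor list; `sorted_k_distances` is a hypothetical name. A sharp bend (knee) in the sorted curve is commonly read as a candidate Eps for a MinPts close to k.

```python
import math

def sorted_k_distances(points, k):
    """Distance from each point to its k-th nearest other point, sorted ascending.

    The point itself is excluded, so k=1 means the nearest other point.
    Brute force: O(n^2 log n).
    """
    kdist = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        kdist.append(dists[k - 1])
    return sorted(kdist)
```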

8 Network Clustering Problem
Networks formed by the mutual relationships among data elements usually have an underlying structure. Because these relationships are complex, the structure is difficult to discover. How can it be made clear? Stated another way: given only information about who associates with whom, can one identify clusters of individuals with common interests or special relationships (families, cliques, terrorist cells)?

9 An Example of Networks
How many clusters? What size should they be? What is the best partitioning? Should some points be differentiated?

10 A Social Network Model
Individuals in a tight social group, or clique, know many of the same people, regardless of the size of the group. Individuals who are hubs know many people in different groups but belong to no single group; politicians, for example, bridge multiple groups. Individuals who are outliers reside at the margins of society; hermits, for example, know few people and belong to no group.

11 The Neighborhood of a Vertex
Define () as the immediate neighborhood of a vertex (i.e. the set of people that an individual knows ).

12 Structural Similarity
The desired features tend to be captured by a measure we call structural similarity:
$\sigma(v, w) = \frac{|\Gamma(v) \cap \Gamma(w)|}{\sqrt{|\Gamma(v)|\,|\Gamma(w)|}}$
Structural similarity is large for members of a clique and small for hubs and outliers.
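The structural similarity formula translates almost line for line. A minimal sketch, assuming `adj` maps each vertex to the set of its neighbors in an undirected graph; `structural_similarity` is a hypothetical name.

```python
import math

def structural_similarity(adj, v, w):
    """sigma(v, w) = |Gamma(v) & Gamma(w)| / sqrt(|Gamma(v)| * |Gamma(w)|)."""
    gv = adj[v] | {v}  # Gamma(v) includes v itself
    gw = adj[w] | {w}
    return len(gv & gw) / math.sqrt(len(gv) * len(gw))
```

Two vertices of a triangle with no other edges share their whole neighborhood and score 1, while a pendant vertex attached to a denser vertex scores noticeably lower.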

13 Structural Connectivity [1]
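$\varepsilon$-Neighborhood: $N_\varepsilon(v) = \{w \in \Gamma(v) \mid \sigma(v, w) \ge \varepsilon\}$
Core: $CORE_{\varepsilon,\mu}(v) \Leftrightarrow |N_\varepsilon(v)| \ge \mu$
Direct structure reachable: $DirREACH_{\varepsilon,\mu}(v, w) \Leftrightarrow CORE_{\varepsilon,\mu}(v) \wedge w \in N_\varepsilon(v)$
Structure reachable: $REACH_{\varepsilon,\mu}(v, w)$, the transitive closure of direct structure reachability
Structure connected: $CONNECT_{\varepsilon,\mu}(v, w) \Leftrightarrow \exists u \in V: REACH_{\varepsilon,\mu}(u, v) \wedge REACH_{\varepsilon,\mu}(u, w)$
[1] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu (KDD'96)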
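The $\varepsilon$-neighborhood, core, and direct-reachability predicates can be sketched as plain functions. Same assumptions as before (undirected graph as a dict of neighbor sets); all function names are hypothetical. Since $\sigma(v, v) = 1$, $v$ always belongs to its own $\varepsilon$-neighborhood for any $\varepsilon \le 1$.

```python
import math

def gamma(adj, v):
    return adj[v] | {v}

def sigma(adj, v, w):
    gv, gw = gamma(adj, v), gamma(adj, w)
    return len(gv & gw) / math.sqrt(len(gv) * len(gw))

def eps_neighborhood(adj, v, eps):
    """N_eps(v): members of Gamma(v) whose similarity to v is at least eps."""
    return {w for w in gamma(adj, v) if sigma(adj, v, w) >= eps}

def is_core(adj, v, eps, mu):
    """CORE(v): the eps-neighborhood has at least mu members."""
    return len(eps_neighborhood(adj, v, eps)) >= mu

def directly_reachable(adj, v, w, eps, mu):
    """DirREACH(v, w): v is a core and w lies in v's eps-neighborhood."""
    return is_core(adj, v, eps, mu) and w in eps_neighborhood(adj, v, eps)
```

Structure reachability is then whatever can be reached by chaining `directly_reachable` steps, which is what a breadth-first search from a core computes.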

14 Structure-Connected Clusters
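Structure-connected cluster $C$:
Connectivity: $\forall v, w \in C: CONNECT_{\varepsilon,\mu}(v, w)$
Maximality: $\forall v, w \in V: v \in C \wedge REACH_{\varepsilon,\mu}(v, w) \Rightarrow w \in C$
Hubs: do not belong to any cluster; bridge many clusters
Outliers: do not belong to any cluster; connect to fewer clusters
[Figure labels: hub, outlier]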
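Given a finished clustering, the hub/outlier distinction can be sketched as follows. This assumes the rule that a vertex outside every cluster is a hub when its neighbors lie in two or more different clusters and an outlier otherwise; `classify_unclustered` is a hypothetical name.

```python
def classify_unclustered(adj, labels):
    """Split vertices that belong to no cluster into hubs and outliers.

    adj: dict vertex -> set of neighbors (undirected graph).
    labels: dict vertex -> cluster id, for clustered vertices only.
    """
    roles = {}
    for v in adj:
        if v in labels:
            continue
        # Distinct clusters that v's neighbors belong to.
        neighbor_clusters = {labels[w] for w in adj[v] if w in labels}
        roles[v] = "hub" if len(neighbor_clusters) >= 2 else "outlier"
    return roles
```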

15 Algorithm
[Figure: example network with vertices 1 to 13; μ = 2, ε = 0.7]

16 Algorithm
[Figure: SCAN step on the example network; μ = 2, ε = 0.7; similarity shown: 0.63]

17 Algorithm
[Figure: SCAN step on the example network; μ = 2, ε = 0.7; similarities shown: 0.67, 0.82, 0.75]

18 Algorithm
[Figure: example network with vertices 1 to 13; μ = 2, ε = 0.7]

19 Algorithm
[Figure: SCAN step on the example network; μ = 2, ε = 0.7; similarity shown: 0.67]

20 Algorithm
[Figure: SCAN step on the example network; μ = 2, ε = 0.7; similarities shown: 0.73, 0.73, 0.73]

21 Algorithm
[Figure: SCAN step on the example network; μ = 2, ε = 0.7]

22 Algorithm
[Figure: SCAN step on the example network; μ = 2, ε = 0.7; similarity shown: 0.51]

23 Algorithm
[Figure: SCAN step on the example network; μ = 2, ε = 0.7; similarity shown: 0.68]

24 Algorithm
[Figure: SCAN step on the example network; μ = 2, ε = 0.7; similarity shown: 0.51]

25 Algorithm
[Figure: SCAN step on the example network; μ = 2, ε = 0.7]

26 Algorithm
[Figure: SCAN step on the example network; μ = 2, ε = 0.7; similarities shown: 0.51, 0.68, 0.51]

27 Algorithm
[Figure: SCAN step on the example network; μ = 2, ε = 0.7]
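The steps illustrated on these slides can be condensed into a compact sketch built from the preceding definitions: visit vertices, start a new cluster at each unclassified core, grow it breadth-first over direct structure reachability, then split the leftover vertices into hubs and outliers. This is an illustrative sketch under stated assumptions (undirected graph as an edge list, sortable vertex ids), not the authors' implementation; `scan` is a hypothetical name. It recomputes similarities instead of caching one value per edge, so it does not attain the running time claimed for SCAN.

```python
import math
from collections import deque

def scan(edges, eps, mu):
    """Cluster an undirected graph; return (clusters, hubs, outliers)."""
    adj = {}
    for u, w in edges:
        adj.setdefault(u, set()).add(w)
        adj.setdefault(w, set()).add(u)

    def gamma(v):
        return adj[v] | {v}  # neighbors plus v itself

    def sigma(v, w):
        gv, gw = gamma(v), gamma(w)
        return len(gv & gw) / math.sqrt(len(gv) * len(gw))

    def n_eps(v):
        return {w for w in gamma(v) if sigma(v, w) >= eps}

    def is_core(v):
        return len(n_eps(v)) >= mu

    label = {}  # vertex -> cluster id, for clustered vertices only
    next_id = 0
    for v in sorted(adj):
        if v in label or not is_core(v):
            continue  # a non-core may still be reached from a core later
        # v is an unclassified core: grow a new cluster from it.
        label[v] = next_id
        queue = deque([v])
        while queue:
            y = queue.popleft()  # every queued vertex is a core
            for x in n_eps(y):   # x is directly structure-reachable from y
                if x not in label:
                    label[x] = next_id
                    if is_core(x):
                        queue.append(x)
        next_id += 1

    clusters = {}
    for v, c in label.items():
        clusters.setdefault(c, set()).add(v)

    hubs, outliers = set(), set()
    for v in adj:
        if v in label:
            continue
        touching = {label[w] for w in adj[v] if w in label}
        if len(touching) >= 2:
            hubs.add(v)      # bridges two or more clusters
        else:
            outliers.add(v)
    return list(clusters.values()), hubs, outliers
```

For example, two 4-cliques joined through a single middle vertex, plus one pendant vertex hanging off the first clique, yield two clusters, the middle vertex as a hub, and the pendant vertex as an outlier when ε = 0.7 and μ = 2.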

28 Running Time
Running time = O(|E|); for sparse networks, where |E| = O(|V|), this is O(|V|).
[2] A. Clauset, M. E. J. Newman, and C. Moore, Phys. Rev. E 70, (2004).

29 Conclusion
We propose a novel network clustering algorithm:
It is fast: O(|E|), which is O(|V|) for scale-free networks.
It can find clusters, as well as hubs and outliers.

