ICDE 2014 LinkSCAN*: Overlapping Community Detection Using the Link-Space Transformation Sungsu Lim †, Seungwoo Ryu ‡, Sejeong Kwon§, Kyomin Jung ¶, and.

1 ICDE 2014
LinkSCAN*: Overlapping Community Detection Using the Link-Space Transformation
Sungsu Lim †, Seungwoo Ryu ‡, Sejeong Kwon §, Kyomin Jung ¶, and Jae-Gil Lee †
† Dept. of Knowledge Service Engineering, KAIST
‡ Samsung Advanced Institute of Technology
§ Graduate School of Cultural Technology, KAIST
¶ Dept. of Electrical and Computer Engineering, SNU

2 Contents Motivation Link-Space Transformation
Proposed Algorithm: LinkSCAN* Experiment Evaluation Conclusions

3 Community Detection
Network communities: sets of nodes such that nodes in the same set are similar (more internal links) and nodes in different sets are dissimilar (fewer external links). Also called communities, clusters, modules, groups, etc.
Non-overlapping community detection: finding a good partition of the nodes. Clusters are NOT overlapped.

4 Overlapping Community Detection
A person (node) can belong to multiple communities, e.g., family, friends, and colleagues. Overlapping community detection allows a node to be included in several groups.

5 Existing Methods
Node-based: a node overlaps if more than one of its belonging coefficients is larger than some threshold $\tau$.
  Label Propagation (COPRA) [Gregory 2010, Subelj and Bajec 2011], with $f(i,c) = \mathrm{mean}_{j \in \mathrm{nbr}(i)} f(j,c)$.
  Example with $\tau = 0.3$: $f(i,c_1) = 0.35$, $f(i,c_2) = 0.05$, $f(i,c_3) = 0.4$, …
Structure-based: a node overlaps if it participates in multiple base structures with different memberships.
  Clique Percolation (CPM) [Palla et al. 2005, Derenyi et al. 2005]. Base structure: cliques of size $k$ (example: $k = 4$).
  Link Partition [Evans and Lambiotte 2009, Ahn et al. 2010]. Base structure: links.

6 Limitations of Existing Methods
The existing methods do not perform well for
1. networks with many highly overlapping nodes,
2. networks with various base structures, and
3. networks with many weak ties.
[Figure: three failure cases. (1) $\tau = 0.3$ with $f(i,c_1) = 0.2$, $f(i,c_2) = 0.15$, $f(i,c_3) = 0.25$, $f(i,c_4) = 0.2$, …; i: overlapping; COPRA fails. (2) $k \ge 3$; i: non-overlapping; CPM fails. (3) Weak tie; link partition fails.]
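The node-based overlap rule can be made concrete with a minimal Python sketch. It only applies the "larger than the threshold" test to the coefficient values printed on slides 5 and 6; the function names are hypothetical, the elided coefficients ("…") are left out, and this is not COPRA itself, only its final thresholding step.

```python
def communities_above_threshold(coeffs, tau):
    """Return the communities whose belonging coefficient is larger than tau."""
    return sorted(c for c, f in coeffs.items() if f > tau)


def is_overlapping(coeffs, tau):
    """A node counts as overlapping if more than one coefficient exceeds tau."""
    return len(communities_above_threshold(coeffs, tau)) > 1


# Only the coefficients listed on the slides; elided values are omitted.
slide5_node = {"c1": 0.35, "c2": 0.05, "c3": 0.4}
slide6_node = {"c1": 0.2, "c2": 0.15, "c3": 0.25, "c4": 0.2}

print(communities_above_threshold(slide5_node, 0.3))
print(is_overlapping(slide6_node, 0.3))
```

With the slide-6 values, no coefficient exceeds 0.3 because the membership is spread over four communities, which illustrates why a fixed threshold struggles with highly overlapping nodes.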

7 Contents Motivation Link-Space Transformation
Proposed Algorithm: LinkSCAN* Experiment Evaluation Conclusions

8 Our Solution
We propose a new framework, the link-space transformation, which transforms a given graph into the link-space graph.
We develop an algorithm that performs non-overlapping clustering on the link-space graph, which in turn lets us discover an overlapping clustering of the original graph.
Original Graph → (Link-Space Transformation) → Link-Space Graph → (Non-overlapping Clustering) → Link Communities → (Membership Translation) → Overlapping Communities

9 Overall Procedure
We propose an overlapping clustering algorithm using the link-space transformation.
Original Graph → (Link-Space Transformation) → Link-Space Graph → (Non-overlapping Clustering) → Link Communities → (Membership Translation) → Overlapping Communities

10 Link-Space Transformation
Topological structure: each link of the original graph maps to a node of the link-space graph. Two nodes of the link-space graph are adjacent if the corresponding two links of the original graph are incident (share an endpoint).
Weights: the weight of a link of the link-space graph is calculated from the similarity of the corresponding links of the original graph: $w(v_{ik}, v_{jk}) = \sigma(e_{ik}, e_{jk})$.
[Figure: links $e_{ik}$ and $e_{jk}$ sharing node $k$ in the original graph, and the corresponding nodes $v_{ik}$ and $v_{jk}$ in the link-space graph.]
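A minimal Python sketch of the transformation follows. The slide does not define the similarity $\sigma$, so the code assumes a Jaccard similarity between the inclusive neighborhoods of the two non-shared endpoints, in the style of link-similarity measures such as Ahn et al. 2010; that choice, the function name `link_space_graph`, and the toy edge list are assumptions for illustration, not the authors' implementation.

```python
from itertools import combinations


def link_space_graph(edges):
    """Map an undirected simple graph (no self-loops) to a weighted link-space graph.

    Returns {(link_a, link_b): weight}, where each link is a sorted node pair
    and two links are connected when they share an endpoint.
    """
    links = sorted({(min(u, v), max(u, v)) for u, v in edges})
    nbrs = {}      # inclusive neighborhood: the node itself plus its neighbors
    incident = {}  # node -> list of links touching it
    for u, v in links:
        nbrs.setdefault(u, {u}).add(v)
        nbrs.setdefault(v, {v}).add(u)
        incident.setdefault(u, []).append((u, v))
        incident.setdefault(v, []).append((u, v))

    weights = {}
    for k, touching in incident.items():
        for e1, e2 in combinations(touching, 2):
            # i and j are the endpoints of e1 and e2 other than the shared node k.
            i = e1[0] if e1[1] == k else e1[1]
            j = e2[0] if e2[1] == k else e2[1]
            # ASSUMPTION: sigma is the Jaccard similarity of inclusive neighborhoods.
            weights[(e1, e2)] = len(nbrs[i] & nbrs[j]) / len(nbrs[i] | nbrs[j])
    return weights


# Toy graph with links 03, 13, 34, 12, 23, 35, 45.
toy_edges = [(0, 3), (1, 3), (3, 4), (1, 2), (2, 3), (3, 5), (4, 5)]
toy_weights = link_space_graph(toy_edges)
strong = {pair: w for pair, w in toy_weights.items() if w >= 1 / 3}
print(len(toy_weights), len(strong))
```

Under the assumed similarity, the links inside each triangle of the toy graph receive weights of 1 or 1/2, while every pair involving the pendant link 03 or crossing between the two triangles stays below 1/3.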

11 Overall Procedure
Overlapping clustering algorithm using the link-space transformation.
Original Graph → (Link-Space Transformation) → Link-Space Graph → (Non-overlapping Clustering) → Link Communities → (Membership Translation) → Overlapping Communities

12 Clustering on Link-Space Graph
We apply a non-overlapping clustering algorithm to the link-space graph. We use structural clustering, which can classify a node as a hub or an outlier (neutral membership).
[Figure: original graph, and non-overlapping clustering on the link-space graph with link-space nodes 03, 13, 34, 12, 23, 35, 45. Weights of 1 and 1/2 are shown; the other weights are less than 1/3.]

13 Overall Procedure
Overlapping clustering algorithm using the link-space transformation.
Original Graph → (Link-Space Transformation) → Link-Space Graph → (Non-overlapping Clustering) → Link Communities → (Membership Translation) → Overlapping Communities

14 Membership Translation
Memberships of nodes of the link-space graph map to memberships of the corresponding links of the original graph. The memberships of a node of the original graph are taken from the memberships of its incident links.
[Figure: non-overlapping clustering on the link-space graph (link-space nodes 03, 13, 34, 12, 23, 35, 45) and its membership translation to the original graph.]
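The translation step can be sketched in a few lines of Python. The link labels below are a hypothetical clustering of a toy graph with links 03, 13, 34, 12, 23, 35, 45 (two link communities "A" and "B", with link 03 left unassigned as a hub or outlier); the function name and the use of `None` for neutral membership are assumptions for illustration.

```python
def translate_memberships(link_labels):
    """Give each node the set of communities of its incident links.

    link_labels maps a link (u, v) to a community label, or to None when the
    link was classified as a hub or an outlier (neutral membership).
    """
    node_memberships = {}
    for (u, v), label in link_labels.items():
        for node in (u, v):
            node_memberships.setdefault(node, set())
            if label is not None:
                node_memberships[node].add(label)
    return node_memberships


# Hypothetical link clustering, not taken from the slides.
link_labels = {
    (1, 2): "A", (1, 3): "A", (2, 3): "A",
    (3, 4): "B", (3, 5): "B", (4, 5): "B",
    (0, 3): None,
}
memberships = translate_memberships(link_labels)
print(memberships[3], memberships[0])
```

Node 3 touches links from both communities and therefore becomes an overlapping node, while node 0 has only a neutral link and ends up with no community.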

15 Advantages of Link-Space Graph
Inheriting the advantages of the link-space graph, finding disjoint communities on it lets us find overlapping communities of the original graph. The original structure is preserved because the similarity properly reflects the structure of the original graph.
Link-space graph: easier to find overlapping communities + preserving the original structure.

16 Contents Motivation Link-Space Transformation
Proposed Algorithm: LinkSCAN* Experiment Evaluation Conclusions

17 LinkSCAN*
We propose an efficient overlapping clustering algorithm using the link-space transformation.
Original Graph → (Link-Space Transformation) → Link-Space Graph → (Structural Clustering) → Link Communities → (Membership Translation) → Overlapping Communities
For a massive graph, the link-space graph may be dense.

18 LinkSCAN*
We propose an efficient overlapping clustering algorithm using the link-space transformation.
Original Graph → (Link-Space Transformation) → Link-Space Graph → (Structural Clustering) → Link Communities → (Membership Translation) → Overlapping Communities
[Diagram annotation: sampling process.]

19 LinkSCAN*
We propose an efficient overlapping clustering algorithm using the link-space transformation.
Original Graph → (Link-Space Transformation) → Link-Space Graph → (Link Sampling) → Sampled Graph → (Structural Clustering) → Link Communities → (Membership Translation) → Overlapping Communities

20 Link Sampling
Sampling strategy: for each node $v$, we sample $n_v$ incident links of $v$, where $n_v = \min\{d_v,\ \alpha + \beta \ln d_v\}$ and $d_v$ is the degree of $v$.
Thm 1 guarantees that sampling errors are not significant even when $n_v$ is small.
For real networks, a sampled graph and the link-space graph are close (NMI > 0.9), while the sampling rate is small (~0.1).
Thm 1 (Error bound): applying the Chernoff bound, the estimation error of selecting core nodes decreases exponentially as the $n_v$'s increase.
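The sampling rule can be sketched as follows. The slide gives $n_v = \min\{d_v, \alpha + \beta \ln d_v\}$ but does not say how the value is rounded to an integer or how links are drawn, so rounding up with `math.ceil` and uniform sampling without replacement are assumptions here, as are the function names and the adjacency-dictionary input.

```python
import math
import random


def sample_size(degree, alpha, beta):
    """n_v = min(d_v, alpha + beta * ln d_v), rounded up (rounding is an assumption)."""
    if degree < 1:
        return 0
    return max(0, min(degree, math.ceil(alpha + beta * math.log(degree))))


def sample_incident_links(adjacency, alpha, beta, seed=0):
    """For each node, keep a uniform random sample of n_v of its incident links.

    adjacency maps a node to an iterable of its neighbors.
    """
    rng = random.Random(seed)
    sampled = {}
    for v, neighbors in adjacency.items():
        pool = sorted(neighbors)
        sampled[v] = rng.sample(pool, sample_size(len(pool), alpha, beta))
    return sampled


# Hypothetical input: one high-degree node and one degree-1 node.
adjacency = {"hub": range(50), "leaf": [0]}
sampled = sample_incident_links(adjacency, alpha=2, beta=1)
print(len(sampled["hub"]), len(sampled["leaf"]))
```

Because the sample size grows only logarithmically with the degree, high-degree nodes keep a small fraction of their links while low-degree nodes keep all of them.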

21 Contents Motivation Link-Space Transformation
Proposed Algorithm: LinkSCAN* Experiment Evaluation Conclusions

22 Network Datasets
Synthetic network: LFR benchmark networks [Lancichinetti and Fortunato 2009]
Real network: Social and information networks [snap.stanford.edu/data/ and
Network      # nodes      # links      Aver. degree   Clust. coeff.
DBLP         1,068,037    3,800,963    7.50           0.19
Amazon       334,863      925,872      5.53           0.21
Enron-       36,692       183,831      10.02          0.08
Brightkite   58,228       214,078      7.35           0.11
Facebook     63,392       816,886      25.77          0.15
WWW          325,729      1,090,108    6.69           0.09

23 Performance Evaluation
When ground truth is known:
  NMI for overlapping clustering [Lancichinetti et al. 2009]
  F-score (performance of identifying overlapping nodes)
When ground truth is unknown:
  Quality (Mov): modularity for overlapping clustering [Lazar et al. 2010]
  Coverage (CC): clustering coverage [Ahn et al. 2010]
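As an illustration of the F-score for identifying overlapping nodes, the sketch below compares the set of nodes with more than one membership in the ground truth against the same set in a predicted clustering. The slide does not give the exact definition used in the evaluation, so the balanced F1 form, the function names, and the example memberships are assumptions.

```python
def overlapping_nodes(memberships):
    """Nodes that belong to more than one community."""
    return {node for node, comms in memberships.items() if len(comms) > 1}


def overlap_f_score(true_memberships, predicted_memberships):
    """F1 score of the predicted overlapping nodes against the true ones."""
    truth = overlapping_nodes(true_memberships)
    predicted = overlapping_nodes(predicted_memberships)
    true_positives = len(truth & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


# Hypothetical memberships: nodes 1 and 2 truly overlap; 1 and 3 are predicted to.
truth = {1: {"A", "B"}, 2: {"A", "B"}, 3: {"A"}, 4: {"B"}}
predicted = {1: {"A", "B"}, 2: {"A"}, 3: {"A", "B"}, 4: {"B"}}
print(overlap_f_score(truth, predicted))
```

In this example one of the two predicted overlapping nodes is correct and one of the two true overlapping nodes is found, so precision and recall are both 1/2.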

24 Problem 1 For networks with many highly overlapping nodes, LinkSCAN* outperforms the existing methods.

25 Problem 2 For networks with various base structures, LinkSCAN* performs well compared to the existing methods.

26 Problem 3 For networks with many weak ties, the existing methods fail on the toy networks shown, whereas LinkSCAN* detects all the clusters well.

27 Real Networks For real network datasets, the normalized measure of (Quality + Coverage) indicates that LinkSCAN* is better than the existing methods.

28 Link Sampling
Comparing the use of the link-space graph (LinkSCAN) with the use of sampled graphs (LinkSCAN*) shows that LinkSCAN* improves efficiency with small errors.
[Figure: Enron- network, # nodes = 37K, # links = 184K, $\alpha = 0.5d \sim 16d$, $\beta = 1$.]

29 Scalability
The running time of LinkSCAN* on a set of LFR benchmark networks shows that LinkSCAN* has near-linear scalability.
[Figure: LFR benchmark networks, # nodes = 1K to 1M, # links = 10K to 10M, $\alpha = 2d$, $\beta = 1$.]

30 Contents Motivation Link-Space Transformation
Proposed Algorithm: LinkSCAN* Experiment Evaluation Conclusions

31 Conclusions
We propose the notion of the link-space transformation and develop a new overlapping clustering algorithm, LinkSCAN*, that satisfies membership neutrality.
LinkSCAN* outperforms existing algorithms for networks with many highly overlapping nodes and for those with various base structures.

32 Acknowledgement
Coauthors. Funding agencies.
This research was supported by the National Research Foundation of Korea.

33 Thank You!

