Presentation on theme: "ICDE 2014 LinkSCAN*: Overlapping Community Detection Using the Link-Space Transformation Sungsu Lim †, Seungwoo Ryu ‡, Sejeong Kwon§, Kyomin Jung ¶, and."— Presentation transcript:
1 ICDE 2014
LinkSCAN*: Overlapping Community Detection Using the Link-Space Transformation
Sungsu Lim †, Seungwoo Ryu ‡, Sejeong Kwon §, Kyomin Jung ¶, and Jae-Gil Lee †
† Dept. of Knowledge Service Engineering, KAIST
‡ Samsung Advanced Institute of Technology
§ Graduate School of Cultural Technology, KAIST
¶ Dept. of Electrical and Computer Engineering, SNU
3 Community Detection
Network communities: sets of nodes where the nodes in the same set are similar (more internal links) and the nodes in different sets are dissimilar (fewer external links). Also called communities, clusters, modules, groups, etc.
Non-overlapping community detection: finding a good partition of nodes. Clusters are NOT overlapped.
4 Overlapping Community Detection
A person (node) can belong to multiple communities, e.g., family, friends, and colleagues.
Overlapping community detection allows a node to be included in several groups.
[Figure labels: family, friends, colleagues]
5 Existing Methods
Node-based: a node overlaps if more than one of its belonging coefficients is larger than some threshold.
Label Propagation (COPRA) [Gregory 2010, Subelj and Bajec 2011]
Example: f(i,c1)=0.35, f(i,c2)=0.05, f(i,c3)=0.4, … with threshold 𝜏=0.3, where $f(i,c) = \operatorname{mean}_{j \in \mathrm{nbr}(i)} f(j,c)$
Structure-based: a node overlaps if it participates in multiple base structures with different memberships.
Clique Percolation (CPM) [Palla et al. 2005, Derenyi et al. 2005]. Base structure: cliques of size 𝑘 (𝑘=4 in the example)
Link Partition [Evans and Lambiotte 2009, Ahn et al. 2010]. Base structure: links
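The node-based rule on this slide can be illustrated with a small sketch. This is not COPRA itself, only the threshold test and the neighbor-mean update written out; the function names and the dict-based data layout are assumptions made for illustration.

```python
def communities_above_threshold(coeffs, tau):
    """Return the communities whose belonging coefficient is larger than tau.

    coeffs: dict mapping community id -> belonging coefficient f(i, c).
    """
    return sorted(c for c, f in coeffs.items() if f > tau)


def is_overlapping(coeffs, tau):
    """A node overlaps if more than one coefficient is larger than tau."""
    return len(communities_above_threshold(coeffs, tau)) > 1


def mean_neighbor_coefficient(f, neighbors, c):
    """f(i, c) as the mean of f(j, c) over the neighbors j of node i.

    f: dict mapping node -> {community id -> coefficient} (hypothetical layout).
    """
    return sum(f[j].get(c, 0.0) for j in neighbors) / len(neighbors)


# Values shown on the slide, with threshold tau = 0.3.
node_i = {"c1": 0.35, "c2": 0.05, "c3": 0.4}
print(communities_above_threshold(node_i, 0.3), is_overlapping(node_i, 0.3))
```

With the slide's values, two coefficients exceed the threshold, so node i is reported as overlapping.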
6 Limitations of Existing Methods
The existing methods do not perform well for
1. networks with many highly overlapping nodes,
2. networks with various base structures, and
3. networks with many weak ties.
COPRA fails (i: overlapping): f(i,c1)=0.2, f(i,c2)=0.15, f(i,c3)=0.25, f(i,c4)=0.2, … with 𝜏=0.3, so none of the listed coefficients passes the threshold.
CPM fails (i: non-overlapping): 𝑘≥3.
Link partition fails: weak tie.
8 Our Solution
We propose a new framework called the link-space transformation, which transforms a given graph into the link-space graph.
We develop an algorithm that performs non-overlapping clustering on the link-space graph, which enables us to discover overlapping clusters.
Pipeline: Original Graph → (Link-Space Transformation) → Link-Space Graph → (Non-overlapping Clustering) → Link Communities → (Membership Translation) → Overlapping Communities
9 Overall Procedure
We propose an overlapping clustering algorithm using the link-space transformation.
Pipeline: Original Graph → (Link-Space Transformation) → Link-Space Graph → (Non-overlapping Clustering) → Link Communities → (Membership Translation) → Overlapping Communities
10 Link-Space Transformation
Topological structure: each link of the original graph maps to a node of the link-space graph. Two nodes of the link-space graph are adjacent if the corresponding two links of the original graph are incident.
Weights: the weights of links of the link-space graph are calculated from the similarity of the corresponding links of the original graph, $w(v_{ik}, v_{jk}) = \sigma(e_{ik}, e_{jk})$.
[Figure: an example original graph and its link-space graph]
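A minimal sketch of this transformation follows. The slide does not define the similarity σ, so the code assumes, purely as a stand-in, the Jaccard similarity of inclusive neighborhoods used for link similarity by Ahn et al. 2010; the function names are hypothetical, and nodes are assumed to be mutually comparable (e.g., integers).

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets."""
    return len(a & b) / len(a | b)


def link_space_graph(edges):
    """Build the link-space graph of an undirected simple graph.

    Returns (links, weights): links are the nodes of the link-space graph,
    and weights maps a pair of incident links to their similarity.
    """
    links = sorted({tuple(sorted(e)) for e in edges})
    nbrs, incident = {}, {}
    for u, v in links:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
        incident.setdefault(u, []).append((u, v))
        incident.setdefault(v, []).append((u, v))

    weights = {}
    for k, k_links in incident.items():
        # Every pair of links sharing node k is adjacent in the link-space graph.
        for a, b in combinations(k_links, 2):
            i = a[0] if a[1] == k else a[1]  # endpoint of link a other than k
            j = b[0] if b[1] == k else b[1]  # endpoint of link b other than k
            # ASSUMPTION: sigma(e_ik, e_jk) is the Jaccard similarity of the
            # inclusive neighborhoods of i and j; the slide leaves sigma open.
            weights[(a, b)] = jaccard(nbrs[i] | {i}, nbrs[j] | {j})
    return links, weights


# A triangle 1-2-3 with a pendant node 4 attached to node 3.
links, weights = link_space_graph([(1, 2), (2, 3), (1, 3), (3, 4)])
for pair, w in sorted(weights.items()):
    print(pair, round(w, 2))
```

In this toy graph the two triangle links at node 1 get a higher weight than the pair formed by a triangle link and the pendant link, which is the kind of contrast a clustering step on the link-space graph can exploit.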
12 Clustering on Link-Space Graph
We apply a non-overlapping clustering algorithm to the link-space graph.
We use structural clustering, which can also classify a node as a hub or an outlier (neutral membership).
[Figure: the original graph and the non-overlapping clustering on the link-space graph; the labeled weights are 1/2, and the other weights are less than 1/3]
14 Membership Translation
Memberships of nodes of the link-space graph map to the memberships of links of the original graph.
The memberships of a node of the original graph are derived from the memberships of its incident links.
[Figure: non-overlapping clustering on the link-space graph, followed by membership translation]
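The translation step can be sketched as below. It assumes that a node inherits the union of the cluster ids of its incident links and that links left with neutral membership (hubs or outliers, encoded here as None) contribute nothing; both choices and the function name are illustrative assumptions, not details taken from the slides.

```python
def translate_memberships(link_cluster):
    """Map link memberships back to node memberships.

    link_cluster: dict mapping a link (u, v) of the original graph to the id
    of its cluster in the link-space graph, or None for neutral membership.
    Returns a dict mapping each node to the set of its cluster ids.
    """
    node_memberships = {}
    for (u, v), cluster_id in link_cluster.items():
        for node in (u, v):
            members = node_memberships.setdefault(node, set())
            if cluster_id is not None:  # ASSUMPTION: neutral links add nothing
                members.add(cluster_id)
    return node_memberships


# Two triangles sharing node 2, plus one neutral link (4, 5).
link_cluster = {
    (0, 1): "A", (1, 2): "A", (0, 2): "A",
    (2, 3): "B", (3, 4): "B", (2, 4): "B",
    (4, 5): None,
}
memberships = translate_memberships(link_cluster)
overlapping = sorted(n for n, cs in memberships.items() if len(cs) > 1)
print(overlapping)
```

Node 2 touches links from both clusters, so it ends up with two memberships even though every link belongs to at most one cluster.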
15 Advantages of Link-Space Graph
The link-space graph combines two advantages: it is easier to find overlapping communities, and the original structure is preserved.
Because the similarity properly reflects the structure of the original graph, finding disjoint communities in the link-space graph enables us to find overlapping communities while preserving the original structure.
17 LinkSCAN*
We propose an efficient overlapping clustering algorithm using the link-space transformation.
For a massive graph, the link-space graph may be dense.
Pipeline: Original Graph → (Link-Space Transformation) → Link-Space Graph → (Structural Clustering) → Link Communities → (Membership Translation) → Overlapping Communities
18 LinkSCAN*
We propose an efficient overlapping clustering algorithm using the link-space transformation.
Sampling process.
Pipeline: Original Graph → (Link-Space Transformation) → Link-Space Graph → (Structural Clustering) → Link Communities → (Membership Translation) → Overlapping Communities
19 LinkSCAN*
We propose an efficient overlapping clustering algorithm using the link-space transformation.
Pipeline: Original Graph → (Link-Space Transformation) → Link-Space Graph → (Link Sampling) → Sampled Graph → (Structural Clustering) → Link Communities → (Membership Translation) → Overlapping Communities
20 Link Sampling
Sampling strategy: for each node $v$, we sample $n_v$ incident links of $v$, where $n_v = \min\{d_v, \alpha + \beta \ln d_v\}$ and $d_v$ is the degree of $v$.
Thm 1 guarantees that sampling errors are not significant even when $n_v$ is small.
For real networks, the results on a sampled graph and on the link-space graph are close (NMI > 0.9), while the sampling rate is small (~0.1).
Thm 1 (Error bound): by the Chernoff bound, the estimation error of selecting core nodes decreases exponentially as the $n_v$'s increase.
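The sampling rule can be sketched as follows. The slide does not say how the non-integer value α + β ln d is rounded or how the links are chosen, so this sketch assumes rounding up and uniform sampling without replacement; the function names and the adjacency-dict input are hypothetical.

```python
import math
import random


def sample_size(degree, alpha, beta):
    """n_v = min(d_v, alpha + beta * ln d_v), rounded up (ASSUMPTION)."""
    if degree <= 0:
        return 0
    return min(degree, math.ceil(alpha + beta * math.log(degree)))


def sample_incident_links(adj, alpha, beta, seed=0):
    """For each node v, pick n_v of its incident links uniformly at random.

    adj: dict mapping node -> set of neighbors (undirected simple graph).
    Returns a dict mapping node -> list of sampled links (u, v) with u < v.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sampled = {}
    for v, nbrs in adj.items():
        n_v = sample_size(len(nbrs), alpha, beta)
        chosen = rng.sample(sorted(nbrs), n_v)
        sampled[v] = [tuple(sorted((v, u))) for u in chosen]
    return sampled


# The sample size grows only logarithmically with the degree.
for d in (1, 10, 100, 1000):
    print(d, sample_size(d, alpha=2, beta=1))
```

Because the sample size grows with ln d, high-degree nodes contribute far fewer links than their degree, which is what keeps the sampled graph sparse.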
22 Network Datasets
Synthetic network: LFR benchmark networks [Lancichinetti and Fortunato 2009]
Real network: Social and information networks [snap.stanford.edu/data/ and
Network      # nodes     # links     Aver. degree  Clust. Coeff.
DBLP         1,068,037   3,800,963   7.50          0.19
Amazon       334,863     925,872     5.53          0.21
Enron-       36,692      183,831     10.02         0.08
Brightkite   58,228      214,078     7.35          0.11
Facebook     63,392      816,886     25.77         0.15
WWW          325,729     1,090,108   6.69          0.09
23 Performance Evaluation
When ground truth is known:
NMI for overlapping clustering [Lancichinetti et al. 2009]
F-score (performance of identifying overlapping nodes)
When ground truth is unknown:
Quality (Mov): modularity for overlapping clustering [Lazar et al. 2010]
Coverage (CC): clustering coverage [Ahn et al. 2010]
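As an illustration of the F-score for identifying overlapping nodes, here is a small sketch. The slide does not state which F-score variant is used, so the code assumes the usual F1 (harmonic mean of precision and recall) over the set of overlapping nodes; the function name is hypothetical.

```python
def overlap_f_score(true_overlapping, predicted_overlapping):
    """F1 score for identifying overlapping nodes (ASSUMPTION: F1 variant).

    Both arguments are iterables of node ids that belong to more than one
    community, in the ground truth and in the detected clustering.
    """
    true_set = set(true_overlapping)
    pred_set = set(predicted_overlapping)
    true_positives = len(true_set & pred_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(true_set)
    return 2 * precision * recall / (precision + recall)


# Ground truth has four overlapping nodes; the method reports three,
# two of which are correct.
print(overlap_f_score({1, 2, 3, 4}, {3, 4, 5}))
```

Here precision is 2/3 and recall is 1/2, so the score is 4/7.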
24 Problem 1
For networks with many highly overlapping nodes, LinkSCAN* outperforms the existing methods.
25 Problem 2
For networks with various base structures, our method performs well compared to the existing methods.
26 Problem 3
For networks with many weak ties, the existing methods fail on the following toy networks, but LinkSCAN* detects all the clusters well.
27 Real Networks
For real network datasets, the normalized measure of (Quality + Coverage) indicates that LinkSCAN* is better than the existing methods.
28 Link Sampling
The comparisons between the use of the link-space graph (LinkSCAN) and the use of sampled graphs (LinkSCAN*) show that LinkSCAN* improves efficiency with small errors.
Setting: Enron- network, # nodes = 37K, # links = 184K, 𝛼 = 0.5𝑑 ~ 16𝑑, 𝛽 = 1
29 Scalability
The running time of LinkSCAN* on a set of LFR benchmark networks shows that LinkSCAN* has near-linear scalability.
Setting: LFR benchmark networks, # nodes = 1K to 1M, # links = 10K to 10M, 𝛼 = 2𝑑, 𝛽 = 1
31 Conclusions
We propose the notion of the link-space transformation and develop a new overlapping clustering algorithm, LinkSCAN*, that satisfies membership neutrality.
LinkSCAN* outperforms existing algorithms for networks with many highly overlapping nodes and for those with various base structures.
32 Acknowledgement
Coauthors
Funding agencies: This research was supported by the National Research Foundation of Korea.