Community Detection Laks V.S. Lakshmanan (based on Girvan & Newman. Finding and evaluating community structure in networks. Physical Review E 69, 026113.


1 Community Detection Laks V.S. Lakshmanan (based on Girvan & Newman. Finding and evaluating community structure in networks. Physical Review E 69, 026113 (2004), and M. E. J. Newman. Fast algorithm for detecting community structure in networks. Physical Review E 69, 066133 (2004).)

2 The Problem Can we partition the network into groups s.t. the inter-group edges are sparse while the intra-group edges are dense? Why is it interesting/useful? ◦ Understanding comm. structure – means to understanding n/w structure. ◦ Graph partitioning – similar problem; graph of processes, edges=communication; assign subgraphs to processors to minimize inter-processor comm. & balance processor load. (NP-hard in general.) ◦ Diff. w/ graph partitioning.

3 An Example with Three Communities

4 A Hierarchical Clustering Approach

5 Community detection via hierarchical clustering Compute all pairwise node similarities for every edge present. Repeatedly add edges with greatest similarity → leads to a tree (called a dendrogram). A slice through the dendrogram represents a clustering or comm. structure.
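The procedure on this slide can be sketched in a few lines of Python. This is a minimal sketch under assumptions, not the method from the papers: the slide does not name a similarity measure, so the code uses the Jaccard overlap of closed neighbourhoods as a stand-in, scores every pair of nodes, and the function names `hierarchical_merges` and `cut` are hypothetical.

```python
from itertools import combinations

def hierarchical_merges(adj):
    """Return the merges of an agglomerative clustering, in order.

    adj: dict mapping node -> set of neighbours (undirected graph).
    Similarity (ASSUMED, not from the slides): Jaccard overlap of the
    closed neighbourhoods N[u] = adj[u] | {u}.
    """
    nodes = sorted(adj)
    closed = {v: adj[v] | {v} for v in nodes}
    parent = {v: v for v in nodes}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    pairs = []
    for u, v in combinations(nodes, 2):
        sim = len(closed[u] & closed[v]) / len(closed[u] | closed[v])
        pairs.append((sim, u, v))
    # Greatest similarity first; ties broken by node label.
    pairs.sort(key=lambda t: (-t[0], t[1], t[2]))

    merges = []
    for sim, u, v in pairs:
        ru, rv = find(u), find(v)
        if ru != rv:              # this pair joins two clusters
            parent[ru] = rv
            merges.append((u, v, sim))
    return merges

def cut(adj, merges, k):
    """Slice the dendrogram so that k clusters remain."""
    cluster_of = {v: frozenset([v]) for v in adj}
    for u, v, _ in merges[: len(adj) - k]:
        merged = cluster_of[u] | cluster_of[v]
        for x in merged:
            cluster_of[x] = merged
    return set(cluster_of.values())

# Two triangles joined by the single edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
clusters = cut(adj, hierarchical_merges(adj), 2)
```

The list of merges plays the role of the dendrogram, and `cut` replays only as many merges as are needed to leave `k` clusters, which corresponds to slicing the tree at one level.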

6 Dendrogram example

7 Limitations of HC approach “Misplaces” nodes in the periphery. E.g.: Which community should 5 belong to? → Alternative approach based on “edge betweenness”. (Figure: example graph on nodes 1–5.)

8 Key Intuition An inter-comm. edge has a higher “betweenness” compared to an intra-comm. edge, i.e., more paths between node pairs pass through it. Start with G. Repeatedly remove edges with highest betweenness. Communities = resulting components.

9 Basic Algorithm repeat { ◦ Calculate betweenness of all edges; ◦ Remove one with highest betweenness, breaking ties arbitrarily; } Until no edges left. Remarks: ◦ Which betweenness score? ◦ Calculate upfront and reuse or recalculate? ◦ Can we incrementally recalculate after each edge removal? ◦ Related algorithms for node betweenness by Newman and Brandes.
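The loop above might be sketched as follows. This is an illustrative sketch, assuming geodesic (shortest-path) betweenness, an unweighted undirected graph without self-loops, and full recalculation after every removal; the function names are hypothetical and the code is not taken from the papers.

```python
from collections import deque

def edge_betweenness(adj):
    """Geodesic betweenness of every edge: one BFS per source node."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, w, order = {s: 0}, {s: 1}, []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    w[v] = 0
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    w[v] += w[u]          # count geodesics from s to v
        credit = {v: 1.0 for v in order}
        for v in reversed(order):         # farthest nodes first
            for u in adj[v]:
                if dist[u] == dist[v] - 1:
                    share = credit[v] * w[u] / w[v]
                    score[frozenset((u, v))] += share
                    credit[u] += share
    # Every node pair was counted once from each endpoint.
    return {e: b / 2 for e, b in score.items()}

def components(adj):
    """Connected components as a list of frozensets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        seen.add(s)
        while stack:
            u = stack.pop()
            comp.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(frozenset(comp))
    return comps

def girvan_newman(adj):
    """Yield the components each time an edge removal splits the graph."""
    adj = {u: set(vs) for u, vs in adj.items()}   # work on a copy
    n_comp = len(components(adj))
    while any(adj.values()):                      # until no edges left
        eb = edge_betweenness(adj)                # recalculate every round
        u, v = max(eb, key=eb.get)                # ties broken arbitrarily
        adj[u].discard(v)
        adj[v].discard(u)
        comps = components(adj)
        if len(comps) > n_comp:
            n_comp = len(comps)
            yield comps
```

Taking `next(girvan_newman(adj))` gives the first split; iterating further walks down the hierarchy until every node is isolated. Recalculating from scratch each round is the simplest choice and leaves the remarks about reuse and incremental updates open.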

10 A Real Example (Zachary’s Karate Club) With recalculation of betweenness. Without recalculation of betweenness.

11 Scalability Issues

12 Computing edge betweenness: An Example. (Figure: graph on nodes a, b, c, d, e, f, g.) Breadth-first search – means for doing many things. Compute #geodesics from every node to g.

13 Computing edge betweenness: An Example. Breadth-first search from g labels each node with d = distance from g and w = #geodesics to g. Labels so far: d=0 w=1 (g).

14 Computing edge betweenness: An Example. Labels so far: d=0 w=1 (g); d=1 w=1 (two nodes).

15 Computing edge betweenness: An Example. Labels so far: d=0 w=1 (g); d=1 w=1 (two nodes); d=2 w=2 (three nodes).

16 Computing edge betweenness: An Example. Labels: d=0 w=1 (g); d=1 w=1 (two nodes); d=2 w=2 (three nodes); d=3 w=4 (one node). Have all info. we need for edge betweenness now.

17 Computing edge betweenness: An Example. Note: a and f are like leaves: no geodesic to g from other nodes passes through them. Edge scores so far: 2/4, 1/2.

18 Computing edge betweenness: An Example. Edge scores so far: 2/4, 1/2, ½(1+2/4).

19 Computing edge betweenness: An Example. Edge scores: 2/4, 1/2, ½(1+2/4), 1/1[1 + ½(1+2/4) + ½(1+2/4) + 1/2].
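The single-source computation walked through on slides 12–19 can be sketched as below. The edge list is an assumption: the figure did not survive in the transcript, so the graph here is a hypothetical one chosen only so that the d and w labels and the edge scores match those on the slides; which letter sits at which level (other than g, a and f) is a guess.

```python
from collections import deque

def single_source_edge_scores(adj, s):
    """BFS from s, then push credit back up towards s.

    Returns (dist, w, score): distance from s, number of geodesics to s,
    and the score each edge receives from geodesics ending at s.
    """
    dist, w, order = {s: 0}, {s: 1}, []
    queue = deque([s])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                w[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                w[v] += w[u]              # geodesics to v that come via u
    credit = {v: 1.0 for v in order}      # 1 for the node's own geodesics
    score = {}
    for v in reversed(order):             # leaves first, s last
        for u in adj[v]:
            if dist[u] == dist[v] - 1:    # u is one step closer to s
                share = credit[v] * w[u] / w[v]
                score[frozenset((u, v))] = share
                credit[u] += share
    return dist, w, score

# HYPOTHETICAL edge list consistent with the labels on the slides.
edges = ["gc", "ge", "cb", "cd", "cf", "eb", "ed", "ef", "ba", "da"]
adj = {}
for x, y in edges:
    adj.setdefault(x, set()).add(y)
    adj.setdefault(y, set()).add(x)

dist, w, score = single_source_edge_scores(adj, "g")
```

On this graph the edges into a score 2/4, the edges into f score 1/2, each edge from b or d up to the first level scores ½(1+2/4), and each edge into g scores 3, in line with the expressions on slide 19. Summing these scores over every choice of source node gives the full edge betweenness.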

20 EB Computation summary

21 EB Computation summary (contd.)

22 EB computation – complexity analysis

23 On scaling up CD algorithm Point to ponder!

24 Closing Remarks 1/2 Newman also proposed other bases for defining edge betweenness: ◦ Electrical current flow through the edge, where every edge is viewed as a unit resistance and we consider all source-sink pairs. ◦ Based on random walks. Both are less effective and more expensive than geodesics (see paper for details). What about directed and weighted cases?

25 Closing Remarks 2/2 Goodness metric of community division. Helpful when we don’t know the ground truth. $Q = \sum_i (e_{ii} - a_i^2)$, where $E_{k \times k}$ = matrix of community division: $e_{ij}$ = fraction of edges linking comm. $i$ to comm. $j$; $a_i = \sum_j e_{ij}$. Q measures fraction of intra-comm. edges over what is expected by chance (assuming uniform distribution). See paper for details of experimental results. Turns out study of influence/information propagation can suggest new ways of detecting communities: will revisit this issue after we study influence propagation.
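A small sketch of how Q might be computed from an edge list and a community assignment. It assumes an undirected, unweighted graph and the convention that an edge between two different communities contributes half to $e_{ij}$ and half to $e_{ji}$, so that the entries of E sum to 1; the function name `modularity` and its arguments are hypothetical.

```python
def modularity(edges, community):
    """Q = sum_i (e_ii - a_i^2) for a given division into communities.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping node -> community label.
    """
    labels = sorted(set(community.values()))
    idx = {c: i for i, c in enumerate(labels)}
    k, m = len(labels), len(edges)
    e = [[0.0] * k for _ in range(k)]
    for u, v in edges:
        i, j = idx[community[u]], idx[community[v]]
        # Half of the edge goes to e[i][j] and half to e[j][i]; an
        # intra-community edge (i == j) therefore adds 1/m to e[i][i].
        e[i][j] += 0.5 / m
        e[j][i] += 0.5 / m
    a = [sum(row) for row in e]           # a_i = sum_j e_ij
    return sum(e[i][i] - a[i] ** 2 for i in range(k))

# Two triangles joined by one edge, split into the two triangles.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q = modularity(edges, split)
```

For this division $e_{AA} = e_{BB} = 3/7$ and $a_A = a_B = 1/2$, so $Q = 2(3/7 - 1/4) = 5/14$. Putting every node in a single community gives $Q = 0$, which is why Q is useful for choosing where to slice the dendrogram.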

26 Recommended Reading J. Ruan and W. Zhang. An Efficient Spectral Algorithm for Network Community Discovery and Its Applications to Biological and Social Networks. ICDM 2007. M. E. J. Newman. Modularity and community structure in networks. Proceedings of the National Academy of Sciences (USA) 103 (2006): 8577–8582; physics/0602124. Jure Leskovec, Kevin J. Lang, and Michael W. Mahoney. Empirical Comparison of Algorithms for Network Community Detection. WWW 2010. M. E. J. Newman. Communities, modules and large-scale structure in networks. Nature Physics 8, 25–31 (2012), doi:10.1038/nphys2162.

