Discovering Community Structure

1 Discovering Community Structure by Optimizing Community Quality Metrics
Mingming Chen, Department of Computer Science, Rensselaer Polytechnic Institute, 09/28/2015

2 Community Structure
Community is the basic structure in many networks. In a social setting, a community is a group of people who are similar to each other; in a network, it is a group of nodes that are more densely connected with each other than with the rest of the network.
Disjoint community structure: each node belongs to one and only one community.
Overlapping community structure: each node belongs to one or more communities.
(Figure: disjoint vs. overlapping community structure)

3 Community Structure
Strong community: each node has more connections inside its community than with the rest of the network.
Weak community: the total internal degree of community c exceeds its total external degree.
Radicchi et al., Proc. Natl. Acad. Sci. 101, 2658–2663 (2004)
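Both definitions can be checked directly from node degrees. The sketch below is a minimal illustration, assuming an undirected, unweighted graph given as a list of (u, v) edge pairs and a community given as a set of nodes; the function names are hypothetical, not taken from the talk.

```python
from collections import defaultdict


def inside_outside_degrees(edges, community):
    """Count, for every node of `community`, its internal and external degree."""
    k_in, k_out = defaultdict(int), defaultdict(int)
    for u, v in edges:
        for a, b in ((u, v), (v, u)):  # visit each edge from both endpoints
            if a in community:
                if b in community:
                    k_in[a] += 1
                else:
                    k_out[a] += 1
    return k_in, k_out


def is_strong_community(edges, community):
    # Strong: every node has more neighbors inside the community than outside.
    k_in, k_out = inside_outside_degrees(edges, community)
    return all(k_in[n] > k_out[n] for n in community)


def is_weak_community(edges, community):
    # Weak: the total internal degree exceeds the total external degree.
    k_in, k_out = inside_outside_degrees(edges, community)
    return sum(k_in[n] for n in community) > sum(k_out[n] for n in community)


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(is_strong_community(edges, {0, 1, 2}))     # True
print(is_strong_community(edges, {0, 1, 2, 3}))  # False: node 3 has 1 inside, 2 outside
print(is_weak_community(edges, {0, 1, 2, 3}))    # True: 8 internal vs 2 external
```

Every strong community is also weak, but not the reverse, as the set {0, 1, 2, 3} shows.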

4 Zachary’s Karate Club Network
Nodes: club members. Edges: interactions.

5 American College Football Network
Communities: NCAA conferences. Nodes: football teams. Edges: games played.

6 Collaboration Network between Scientists at the Santa Fe Institute
Communities: research divisions. Nodes: scientists. Edges: collaborations.

7 Facebook Network
Social communities: high school, summer internship, Stanford (basketball), Stanford (squash). Nodes: Facebook users. Edges: friendships.

8 Protein-Protein Interactions
Communities: functional modules. Nodes: proteins. Edges: physical interactions.

9 A New Community Quality Metric
Modularity Density: A New Community Quality Metric
Mingming Chen, Tommy Nguyen, and Boleslaw K. Szymanski, “A New Metric for Quality of Network Community Structure”, ASE Human Journal, vol. 2, no. 4, Sep. 2013, pp
Mingming Chen, Tommy Nguyen, and Boleslaw K. Szymanski, “On Measuring the Quality of a Network Community Structure”, The ASE/IEEE International Conference on Social Computing (SocialCom), Washington D.C., Sep. 2013, pp

10 Part I: What Makes a Good Community Structure?

11 Part I: Community Quality Metrics
How can we measure the quality of the community structure found by a community detection algorithm? With community quality metrics, such as modularity.

12 Part I: Modularity
Modularity (Q): the fraction of edges inside the communities minus the expected fraction in an equivalent network with edges placed at random.
Newman and Girvan, Phys. Rev. E 69, (2004); Newman, Proc. Natl. Acad. Sci. 103, 8577–8582 (2006)
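Written out, with $|E|$ the number of edges, $|E_c^{in}|$ the number of edges inside community c, and $|E_c^{out}|$ the number of edges with exactly one endpoint in c, modularity is $Q = \sum_c \big[ \frac{|E_c^{in}|}{|E|} - \big( \frac{2|E_c^{in}| + |E_c^{out}|}{2|E|} \big)^2 \big]$. The sketch below computes it for a disjoint partition of an undirected, unweighted graph; the function name and the two-clique example graph are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def modularity(edges, membership):
    """Modularity Q of a disjoint partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs; membership: dict mapping node -> community label.
    """
    m = len(edges)
    e_in = defaultdict(int)     # |E_c^in|: edges with both endpoints in c
    deg_sum = defaultdict(int)  # 2|E_c^in| + |E_c^out|: total degree of c
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        deg_sum[cu] += 1
        deg_sum[cv] += 1
        if cu == cv:
            e_in[cu] += 1
    return sum(e_in[c] / m - (deg_sum[c] / (2 * m)) ** 2 for c in deg_sum)


# Two 4-cliques {0..3} and {4..7} joined by two edges.
edges = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2))
edges += [(0, 4), (1, 5)]
two = {n: n // 4 for n in range(8)}
one = {n: 0 for n in range(8)}
print(round(modularity(edges, two), 3))  # 0.357
print(round(modularity(edges, one), 3))  # 0.0
```

Putting the whole network into one community always gives Q = 0, since every edge is internal and the community holds the entire degree sum.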

13 Part I: Two Problems of Modularity Maximization
In some cases, modularity maximization splits large communities by favoring small ones.
In other cases, it favors large communities, failing to discover communities smaller than a certain size even when such communities are well defined. This size depends on the total number of edges in the network and on the degree of interconnectedness of the communities. This second problem is known as the resolution limit problem of modularity.
Fortunato et al., 2008; Li et al., 2008; Arenas et al., 2008; Berry et al., 2009; Good et al., 2010; Ronhovde et al., 2010; Fortunato, 2010; Lancichinetti et al., 2011; Traag et al., 2011; Darst et al., 2013.

14 Part I: Multi-resolution Modularity
Multi-resolution modularity (Qλ) introduces a resolution parameter λ into modularity. High values of λ lead to smaller communities; low values of λ lead to larger communities.
Lancichinetti and Fortunato, Phys. Rev. E 84, (2011)
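One common way to introduce λ, assumed here, is to scale the random-graph (null model) term: $Q_\lambda = \sum_c \big[ \frac{|E_c^{in}|}{|E|} - \lambda \big( \frac{2|E_c^{in}| + |E_c^{out}|}{2|E|} \big)^2 \big]$, so that λ = 1 recovers ordinary modularity. The minimal sketch below (hypothetical function name, toy graph of two 4-cliques joined by two edges) shows how the preferred partition flips as λ changes.

```python
from collections import defaultdict
from itertools import combinations


def multires_modularity(edges, membership, lam):
    """Q_lambda: modularity with the null-model term scaled by lam (assumed form)."""
    m = len(edges)
    e_in, deg_sum = defaultdict(int), defaultdict(int)
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        deg_sum[cu] += 1
        deg_sum[cv] += 1
        if cu == cv:
            e_in[cu] += 1
    return sum(e_in[c] / m - lam * (deg_sum[c] / (2 * m)) ** 2 for c in deg_sum)


# Two 4-cliques joined by two edges, scored as one or as two communities.
edges = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2))
edges += [(0, 4), (1, 5)]
two = {n: n // 4 for n in range(8)}
one = {n: 0 for n in range(8)}
for lam in (0.2, 1.0):
    q_two = multires_modularity(edges, two, lam)
    q_one = multires_modularity(edges, one, lam)
    print(lam, "two" if q_two > q_one else "one")
# 0.2 one
# 1.0 two
```

With λ = 0.2 the single merged community scores higher (0.8 vs. about 0.757), while with λ = 1 the two cliques win, in line with the statement that low λ favors larger communities.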

15 Part I: Multi-resolution Modularity (continued)
Qλ still suffers from the two opposite yet coexisting issues:
Favoring small communities, e.g., splitting a random subgraph.
The resolution limit problem, e.g., merging loosely connected cliques.
It is often very difficult, or even impossible, to tune the resolution parameter so as to avoid both problems simultaneously, especially when community sizes are heterogeneously distributed.
Lancichinetti and Fortunato, Phys. Rev. E 84, (2011)
(Figure: schematic network with a random subgraph and two cliques)

16 Part I: Modularity with Split Penalty
Split penalty (SP): the fraction of edges that connect nodes in different communities.
Qs = Q - SP addresses modularity's problem of favoring small communities.
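With $|E_{c,c'}|$ the number of edges between communities c and c', the split penalty can be written as $SP = \sum_c \sum_{c' \neq c} \frac{|E_{c,c'}|}{2|E|}$; because each crossing edge is counted once from each side, for an unweighted graph this is simply the fraction of edges that cross communities. A minimal sketch with a hypothetical function name, using two 4-cliques joined by two edges as a toy graph:

```python
from collections import defaultdict
from itertools import combinations


def q_and_split_penalty(edges, membership):
    """Return (Q, SP, Qs) for a disjoint partition of an unweighted graph."""
    m = len(edges)
    e_in, deg_sum = defaultdict(int), defaultdict(int)
    crossing = 0  # edges whose endpoints lie in different communities
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        deg_sum[cu] += 1
        deg_sum[cv] += 1
        if cu == cv:
            e_in[cu] += 1
        else:
            crossing += 1
    q = sum(e_in[c] / m - (deg_sum[c] / (2 * m)) ** 2 for c in deg_sum)
    # Each crossing edge appears in both E_{c,c'} and E_{c',c}.
    sp = 2 * crossing / (2 * m)
    return q, sp, q - sp


edges = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2))
edges += [(0, 4), (1, 5)]
two = {n: n // 4 for n in range(8)}
print([round(x, 3) for x in q_and_split_penalty(edges, two)])  # [0.357, 0.143, 0.214]
```

These are the same Q, SP, and Qs values listed for the two-community split of "Two Well-Separated Communities" (slide 19), which suggests that example is of this shape.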

17 Part I: Qs with Community Density: Modularity Density
Supplementing both modularity and split penalty with edge densities yields Modularity Density (Qds).
Modularity density solves both problems of modularity: the resolution limit problem and the problem of favoring small communities.
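Written out, one density-weighted form of Qds is assumed here because it reproduces the Qds values tabulated on slides 18 to 24, when those examples are read as two 4-node cliques joined by 0, 2, 3, 4, 6, or 10 edges and as the complete graph on 8 nodes:

$Q_{ds} = \sum_c \Big[ \frac{|E_c^{in}|}{|E|} d_c - \Big( \frac{2|E_c^{in}| + |E_c^{out}|}{2|E|} d_c \Big)^2 - \sum_{c' \neq c} \frac{|E_{c,c'}|}{2|E|} d_{c,c'} \Big]$, with $d_c = \frac{2|E_c^{in}|}{|c|(|c|-1)}$ and $d_{c,c'} = \frac{|E_{c,c'}|}{|c||c'|}$.

The cited papers remain the authoritative definition. The sketch below implements this form; the helper names are hypothetical, and single-node communities are given density 0 by assumption.

```python
from collections import defaultdict
from itertools import combinations, product


def modularity_density(edges, membership):
    """Qds of a disjoint partition of an undirected, unweighted graph (assumed form)."""
    m = len(edges)
    size, e_in, deg_sum = defaultdict(int), defaultdict(int), defaultdict(int)
    between = defaultdict(int)  # (c, c') -> |E_{c,c'}|, stored in both orders
    for c in membership.values():
        size[c] += 1
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        deg_sum[cu] += 1
        deg_sum[cv] += 1
        if cu == cv:
            e_in[cu] += 1
        else:
            between[cu, cv] += 1
            between[cv, cu] += 1
    qds = 0.0
    for c, n_c in size.items():
        # Internal edge density d_c; 0 for single-node communities (assumption).
        d_c = 2 * e_in[c] / (n_c * (n_c - 1)) if n_c > 1 else 0.0
        qds += e_in[c] / m * d_c - (deg_sum[c] / (2 * m) * d_c) ** 2
    for (c, c2), e_cc2 in between.items():
        d_cc2 = e_cc2 / (size[c] * size[c2])  # density of edges between c and c2
        qds -= e_cc2 / (2 * m) * d_cc2
    return qds


def two_cliques(bridges):
    """Two 4-cliques {0..3} and {4..7} plus `bridges` distinct edges between them."""
    edges = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2))
    return edges + list(product(range(4), range(4, 8)))[:bridges]


two = {n: n // 4 for n in range(8)}
one = {n: 0 for n in range(8)}
for b in (0, 2, 3, 4, 6, 10):
    edges = two_cliques(b)
    print(b, round(modularity_density(edges, two), 3), round(modularity_density(edges, one), 3))
```

For 0, 2, 3, 4, 6, and 10 bridging edges, the two-community scores come out at about 0.5, 0.339, 0.263, 0.188, 0.042, and -0.239, and the one-community scores at about 0.245, 0.25, 0.249, 0.245, 0.23, and 0.168. Unlike Q, a single community can score above zero, because its term reduces to $d_c(1 - d_c)$.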

18 Part I: Two Very Well-Separated Communities
Partition       Q      SP     Qs = Q - SP   Qds
2 communities   0.5    0      0.5           0.5
1 community     0      0      0             0.245

19 Part I: Two Well-Separated Communities
Partition       Q       SP      Qs = Q - SP   Qds
2 communities   0.357   0.143   0.214         0.339
1 community     0       0       0             0.25

20 Part I: Two Weakly Connected Communities
Partition       Q      SP     Qs = Q - SP   Qds
2 communities   0.3    0.2    0.1           0.263
1 community     0      0      0             0.249

21 Part I: Ambiguity between One and Two Communities
Partition       Q      SP     Qs = Q - SP   Qds
2 communities   0.25   0.25   0             0.188
1 community     0      0      0             0.245

22 Part I: One Well-Connected Community
Partition       Q       SP      Qs = Q - SP   Qds
2 communities   0.167   0.333   -0.167        0.0417
1 community     0       0       0             0.23

23 Part I: One Very Well-Connected Community
Partition       Q        SP      Qs = Q - SP   Qds
2 communities   0.0455   0.455   -0.409        -0.239
1 community     0        0       0             0.168

24 Part I: One Complete Graph
Community quality on a complete graph with 8 nodes
Partition       Q         SP      Qs = Q - SP   Qds
2 communities   -0.0714   0.571   -0.643        -0.643
1 community     0         0       0             0

25 Part I: Modularity Has Nothing to Do with #Nodes

26 Part I: Example of Resolution Limit Problem
Partition        Q        SP       Qs = Q - SP   Qds
30 communities   0.8758   0.0909   0.7848        0.8721
15 communities   0.8879   0.0455   0.8424        0.4305
∆Qs = (0.8424 - 0.7848) = 0.0576 > ∆Q = (0.8879 - 0.8758) = 0.0121
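The numbers on this slide are consistent with a ring of 30 five-node cliques, each linked to the next by a single edge. This is an assumption about the pictured network, but it reproduces every Q, Qs, and Qds value in the table. The sketch below recomputes the four metrics for the 30-clique partition and for the 15-community partition that merges adjacent cliques. `ring_of_cliques` and `quality_metrics` are hypothetical helpers. Qds is computed in the density-weighted form $\sum_c \big[ \frac{|E_c^{in}|}{|E|} d_c - \big( \frac{2|E_c^{in}| + |E_c^{out}|}{2|E|} d_c \big)^2 - \sum_{c' \neq c} \frac{|E_{c,c'}|}{2|E|} d_{c,c'} \big]$ with $d_c = 2|E_c^{in}|/(|c|(|c|-1))$ and $d_{c,c'} = |E_{c,c'}|/(|c||c'|)$, which is itself an assumption about the exact definition.

```python
from collections import defaultdict
from itertools import combinations


def ring_of_cliques(num_cliques, clique_size):
    """Cliques on consecutive node ranges; clique i links to clique i+1 by one edge."""
    edges = []
    for i in range(num_cliques):
        base = i * clique_size
        edges.extend(combinations(range(base, base + clique_size), 2))
        nxt = ((i + 1) % num_cliques) * clique_size
        edges.append((base + clique_size - 1, nxt))  # ring edge to the next clique
    return edges


def quality_metrics(edges, membership):
    """Return (Q, SP, Qs, Qds); Qds uses the density-weighted form (assumed)."""
    m = len(edges)
    size, e_in, deg = defaultdict(int), defaultdict(int), defaultdict(int)
    between = defaultdict(int)  # (c, c') -> |E_{c,c'}|, stored in both orders
    for c in membership.values():
        size[c] += 1
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        deg[cu] += 1
        deg[cv] += 1
        if cu == cv:
            e_in[cu] += 1
        else:
            between[cu, cv] += 1
            between[cv, cu] += 1
    q = sum(e_in[c] / m - (deg[c] / (2 * m)) ** 2 for c in size)
    sp = sum(between.values()) / (2 * m)
    qds = 0.0
    for c, n in size.items():
        d_c = 2 * e_in[c] / (n * (n - 1)) if n > 1 else 0.0
        qds += e_in[c] / m * d_c - (deg[c] / (2 * m) * d_c) ** 2
    for (c, c2), e in between.items():
        qds -= e / (2 * m) * e / (size[c] * size[c2])
    return q, sp, q - sp, qds


edges = ring_of_cliques(30, 5)
fine = {n: n // 5 for n in range(150)}     # 30 communities: one per clique
coarse = {n: n // 10 for n in range(150)}  # 15 communities: adjacent pairs merged
for name, part in (("30", fine), ("15", coarse)):
    print(name, [round(x, 4) for x in quality_metrics(edges, part)])
# 30 [0.8758, 0.0909, 0.7848, 0.8721]
# 15 [0.8879, 0.0455, 0.8424, 0.4305]
```

Both Q and Qs increase when the cliques are merged, and Qs increases by more than Q does. Qds, by contrast, falls from about 0.872 to about 0.430, so it is the only one of the three that prefers the 30 separate cliques.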

27 Part I: Proof of Solving the Resolution Limit Problem
(a) Modularity density Qds does not merge two or more consecutive cliques.
(b) Qds does not merge two small communities.

28 Part I: Proof of Solving the Two Problems
Modularity density does not split the random subgraph, and it does not merge the two cliques.
(Figure: schematic network with a random subgraph and two cliques)

29 Part I: Other Community Quality Metrics
Number of intra-edges: $|E_c^{in}|$, the number of edges inside community c.
Contraction: $2|E_c^{in}|/|c|$, the average number of edges per node inside community c.
Number of inter-edges: $|E_c^{out}|$, the number of edges leaving community c.
Expansion: $|E_c^{out}|/|c|$, the average number of edges per node that point outside community c.
Conductance: $|E_c^{out}|/(2|E_c^{in}| + |E_c^{out}|)$, the fraction of the total number of edges of c that point outside community c.
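The five per-community metrics need only three counts per community: its size $|c|$, its internal edges $|E_c^{in}|$, and its boundary edges $|E_c^{out}|$. A minimal sketch, assuming an unweighted graph and a disjoint partition given as a dict; `community_scores` is a hypothetical name. For the last three metrics, lower values indicate a better-separated community.

```python
from collections import defaultdict
from itertools import combinations


def community_scores(edges, membership):
    """Per-community intra-edges, contraction, inter-edges, expansion, conductance."""
    size, e_in, e_out = defaultdict(int), defaultdict(int), defaultdict(int)
    for c in membership.values():
        size[c] += 1
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        if cu == cv:
            e_in[cu] += 1
        else:
            e_out[cu] += 1
            e_out[cv] += 1
    scores = {}
    for c, n in size.items():
        volume = 2 * e_in[c] + e_out[c]  # total degree of the community
        scores[c] = {
            "intra_edges": e_in[c],
            "contraction": 2 * e_in[c] / n,
            "inter_edges": e_out[c],
            "expansion": e_out[c] / n,
            "conductance": e_out[c] / volume if volume else 0.0,
        }
    return scores


# Two 4-cliques joined by two edges; look at the community {0, 1, 2, 3}.
edges = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2))
edges += [(0, 4), (1, 5)]
print(community_scores(edges, {n: n // 4 for n in range(8)})[0])
# intra_edges 6, contraction 3.0, inter_edges 2, expansion 0.5, conductance 2/14
```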

30 Part I: Evaluation and Analysis
Senate dataset: 111 snapshots in total, spanning 220 years. Nodes are senators; the weight of the edge between two senators is the fraction of times they voted similarly.
Reality mining Bluetooth scan data: each week is a snapshot, 43 snapshots in total. Nodes are subjects; the weight of an edge is the number of Bluetooth scans between two subjects.
Best value of the parameter q (0.05 ≤ q ≤ 0.95) of LabelRankT compared with Estrangement, under the metrics Q, Qs, Qds, #Intra-edges, Contraction, #Inter-edges, Expansion, and Conductance:
Senate: 0.2; 0.6; 0.7, 0.8
Reality mining: 0.5; 0.7

31 Summary
Modularity density solves the two issues simultaneously without requiring any parameter to be specified.
We demonstrated with proofs and experiments on real dynamic datasets that modularity density is an effective alternative to modularity.
Modularity density for overlapping community structure:
Mingming Chen, Konstantin Kuzmin, and Boleslaw Szymanski, “Extension of Modularity Density for Overlapping Community Structure”, IEEE/ACM ASONAM Workshop on Social Network Analysis in Applications (SNAA), Beijing, China, Aug. 2014, pp
Mingming Chen and Boleslaw K. Szymanski, “Fuzzy Overlapping Community Quality Metrics”, Social Network Analysis and Mining 5:40, Jul

32 Thanks! Q & A

33 Fine-tuned Disjoint Community Detection Algorithm
Mingming Chen, Konstantin Kuzmin, and Boleslaw Szymanski, “Community Detection via Maximization of Modularity and Its Variants”, IEEE Transactions on Computational Social Systems 1(1), Mar. 2014, pp

34 Part II: Introduction
Optimize community quality metrics to detect communities.
Community quality metrics: modularity and modularity density.
Modularity optimization methods: greedy algorithms and spectral algorithms.
Fine-tuned disjoint community detection algorithm: iteratively tries to improve the quality metric by splitting and merging the given community structure. It combines both greedy and spectral methods, but is somewhat more sophisticated.

35 Part II: How to Find Communities: Splitting and Merging
Spectral algorithm (top down): split the network, starting from the whole network as one community, until each node is a community of its own.
Greedy algorithm (bottom up): merge two communities at a time until only a single community is left.
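A minimal sketch of the bottom-up direction, assuming a simple undirected graph with integer node ids and no self-loops. It starts from singletons, repeatedly merges the connected pair of communities with the largest modularity gain $\Delta Q = |E_{i,j}|/|E| - 2 a_i a_j$, and remembers the best partition seen. Here $a_i$ is the total degree of community i divided by $2|E|$. This is a naive version of the idea behind Greedy Q (Clauset et al.), not their heap-based implementation, and `greedy_modularity` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations


def greedy_modularity(edges):
    """Agglomerative modularity maximization; returns (best Q, best partition)."""
    m = len(edges)
    a = defaultdict(float)      # a[c]: total degree of community c / (2m)
    between = defaultdict(int)  # (i, j) with i < j -> number of edges between i and j
    for u, v in edges:
        a[u] += 0.5 / m
        a[v] += 0.5 / m
        between[min(u, v), max(u, v)] += 1
    members = {c: {c} for c in a}        # start from singletons
    q = -sum(x * x for x in a.values())  # Q of the all-singleton partition
    best_q, best = q, [set(s) for s in members.values()]
    while between:
        # Merge the connected pair with the largest gain dQ = e_ij/m - 2 a_i a_j.
        (i, j), e_ij = max(
            between.items(), key=lambda kv: kv[1] / m - 2 * a[kv[0][0]] * a[kv[0][1]]
        )
        q += e_ij / m - 2 * a[i] * a[j]
        members[i] |= members.pop(j)
        a[i] += a.pop(j)
        del between[i, j]
        for x, y in list(between):       # re-point j's remaining edges to i
            if j in (x, y):
                other = y if x == j else x
                e = between.pop((x, y))
                between[min(i, other), max(i, other)] += e
        if q > best_q:
            best_q, best = q, [set(s) for s in members.values()]
    return best_q, best


edges = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2))
edges += [(0, 4), (1, 5)]
best_q, best = greedy_modularity(edges)
print(round(best_q, 3), sorted(sorted(c) for c in best))
# 0.357 [[0, 1, 2, 3], [4, 5, 6, 7]]
```

The loop runs all the way to a single community per connected component, so the whole dendrogram is explored. The answer is the best intermediate partition, not the final one.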

36 Part II: Spectral Partitioning: Laplacian Matrix
(Figure: a six-node example graph and its Laplacian matrix)
Laplacian matrix: $L = D - A$, a $|V| \times |V|$ symmetric matrix, where D is the diagonal matrix of node degrees and A is the adjacency matrix.
What is the trivial eigenpair? For $x = (1, \dots, 1)$, $L \cdot x = 0$, so $\lambda = \lambda_1 = 0$.
Important properties: the eigenvalues are non-negative real numbers, and the eigenvectors are real and can be chosen to be mutually orthogonal.
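The properties above can be checked numerically. The sketch builds $L = D - A$ with NumPy for a small example graph and confirms that the all-ones vector is in its null space and that no eigenvalue is negative. The slide's exact six-node graph cannot be recovered, so the two-triangle graph used here is a hypothetical stand-in, as is the `laplacian` helper name.

```python
import numpy as np


def laplacian(num_nodes, edges):
    """Unnormalized graph Laplacian L = D - A of an undirected, unweighted graph."""
    A = np.zeros((num_nodes, num_nodes))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    D = np.diag(A.sum(axis=1))
    return D - A


# Hypothetical 6-node graph: triangles {0, 1, 2} and {3, 4, 5} joined by (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
L = laplacian(6, edges)
eigenvalues = np.linalg.eigvalsh(L)    # ascending order for symmetric matrices
print(np.allclose(L @ np.ones(6), 0))  # True: (1, ..., 1) is the trivial eigenvector
print(eigenvalues.min() > -1e-9)       # True: eigenvalues are non-negative
```

`eigvalsh` is used because it is specialized for symmetric matrices and always returns real eigenvalues in ascending order.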

37 Part II: How to Split a Community?
(Figure: community C approximately split into C1 and C2 using the Fiedler vector)
Fiedler vector: the eigenvector of the Laplacian matrix corresponding to the second smallest eigenvalue.
Split: put the nodes corresponding to positive values of the Fiedler vector into one group and the remaining nodes into the other group.
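The sign-based split can be sketched in a few lines with NumPy's `eigh`, which returns eigenvalues in ascending order, so column 1 of the eigenvector matrix is the Fiedler vector. The example graph of two triangles joined by one edge and the name `fiedler_split` are illustrative assumptions.

```python
import numpy as np


def fiedler_split(num_nodes, edges):
    """Split nodes by the sign of the Fiedler vector of L = D - A."""
    A = np.zeros((num_nodes, num_nodes))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    L = np.diag(A.sum(axis=1)) - A
    _, vectors = np.linalg.eigh(L)  # eigenvalues in ascending order
    fiedler = vectors[:, 1]         # eigenvector of the second smallest eigenvalue
    positive = {n for n in range(num_nodes) if fiedler[n] > 0}
    return positive, set(range(num_nodes)) - positive


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(fiedler_split(6, edges))  # the two triangles, in either order
```

The overall sign of an eigenvector is arbitrary, so which triangle lands in the "positive" group can vary. If the graph is disconnected, the zero eigenvalue repeats and the second column is no longer a meaningful Fiedler vector.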

38 Part II: Example: Spectral Partitioning
(Figure: components of x2, plotted by value of x2 and by rank in x2)

39 Part II: Fine-tuned Algorithm
Iteratively split and merge the community structure until doing so can no longer improve the community quality metric.
(Figure: split stage and merging stage)

40 Part II: Split Stage Based on the Fiedler Vector
(Figure: example cliques and their Fiedler vector x2)
Sort the elements of the Fiedler vector in decreasing (or increasing) order, then cut them into two groups in each of the |c| - 1 possible ways. Choose the cut that improves the metric the most.
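A minimal sketch of the split stage for one community. It assumes integer community labels and uses modularity Q as the metric, though another metric such as Qds could be passed in through the `metric` argument. The steps are: compute the Fiedler vector of the community's induced subgraph, sort its nodes by that vector, score each of the |c| - 1 cuts, and keep the best cut only if it beats leaving the community whole. Sorting in one direction is enough, since decreasing order yields the same set of cuts. `best_sweep_split` is a hypothetical name, not the authors' code.

```python
from collections import defaultdict

import numpy as np


def modularity(edges, membership):
    m = len(edges)
    e_in, deg_sum = defaultdict(int), defaultdict(int)
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        deg_sum[cu] += 1
        deg_sum[cv] += 1
        if cu == cv:
            e_in[cu] += 1
    return sum(e_in[c] / m - (deg_sum[c] / (2 * m)) ** 2 for c in deg_sum)


def best_sweep_split(edges, membership, target, metric=modularity):
    """Try all |c| - 1 cuts of community `target` along its sorted Fiedler vector."""
    nodes = sorted(n for n, c in membership.items() if c == target)
    index = {n: i for i, n in enumerate(nodes)}
    L = np.zeros((len(nodes), len(nodes)))
    for u, v in edges:  # Laplacian of the subgraph induced by the community
        if u in index and v in index:
            i, j = index[u], index[v]
            L[i, j] -= 1
            L[j, i] -= 1
            L[i, i] += 1
            L[j, j] += 1
    _, vectors = np.linalg.eigh(L)
    order = [nodes[i] for i in np.argsort(vectors[:, 1])]  # increasing Fiedler values
    new_label = max(membership.values()) + 1  # assumes integer community labels
    best_score, best = metric(edges, membership), None  # None: no cut improves
    for k in range(1, len(order)):
        trial = dict(membership)
        for n in order[k:]:
            trial[n] = new_label
        score = metric(edges, trial)
        if score > best_score:
            best_score, best = score, trial
    return best_score, best


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
score, split = best_sweep_split(edges, {n: 0 for n in range(6)}, target=0)
print(round(score, 3))  # 0.357
```

Here the whole graph starts as one community (Q = 0), and the best of the five cuts separates the two triangles.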

41 Part II: Evaluation and Analysis
Three community detection algorithms:
Greedy Q*, the greedy algorithm for modularity maximization
Fine-tuned Q, the fine-tuned algorithm maximizing modularity
Fine-tuned Qds, the fine-tuned algorithm optimizing modularity density
Metrics based on ground truth community structure:
Information theory based metrics: Variation of Information (VI), Normalized Mutual Information (NMI)
Cluster matching based metrics: F-measure, Normalized Van Dongen metric (NVD)
Pair counting based metrics: Rand Index (RI), Adjusted Rand Index (ARI), Jaccard Index (JI)
Network datasets: Zachary's karate club network; American college football network; clique network for the resolution limit problem; LFR benchmark networks (0.1 ≤ μ ≤ 0.5)
*Clauset et al., Phys. Rev. E 70, (2004)
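Of the pair-counting metrics listed, the Rand and Jaccard indices are the simplest to sketch: classify every node pair by whether the ground-truth and detected partitions each put it in the same community. A minimal sketch, assuming both partitions are dicts over the same set of sortable node ids. `pair_counting_scores` is a hypothetical name, and ARI's correction for chance is not included.

```python
from itertools import combinations


def pair_counting_scores(truth, found):
    """Rand index and Jaccard index between two partitions (dict node -> label)."""
    both = only_truth = only_found = neither = 0
    for u, v in combinations(sorted(truth), 2):
        same_truth = truth[u] == truth[v]
        same_found = found[u] == found[v]
        if same_truth and same_found:
            both += 1
        elif same_truth:
            only_truth += 1
        elif same_found:
            only_found += 1
        else:
            neither += 1
    total = both + only_truth + only_found + neither
    rand = (both + neither) / total
    together = both + only_truth + only_found
    jaccard = both / together if together else 1.0
    return rand, jaccard


truth = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
found = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 1}  # node 2 assigned to the wrong community
print(pair_counting_scores(truth, found))  # RI = 10/15, JI = 4/9
```

The Jaccard index ignores pairs that both partitions separate, which makes it stricter than the Rand index when communities are small relative to the network.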

42 Part II: Zachary’s Karate Club Network

43 Part II: American College Football Network
American college football network: the schedule of games between American college football teams in a single season, with 115 nodes, 613 edges, and 12 ground truth communities.

44 Part II: (Figure: communities found on the American college football network; panel labels: 12 communities, 7 communities, 9 communities, 12 communities)

45 Part II: Clique Network

46 Part II: LFR Benchmark Networks
μ is the mixing parameter. Low values of μ indicate strong community structure.
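In the LFR benchmark, μ is the fraction of a node's edges that connect it to nodes outside its community. Given a network and a partition, an empirical value can be measured by averaging that fraction over nodes, as in the minimal sketch below. `mixing_parameter` is a hypothetical name, and a per-node average is only one reasonable choice; a global ratio of external to total edge endpoints is another.

```python
from collections import defaultdict


def mixing_parameter(edges, membership):
    """Average over nodes of the fraction of each node's edges leaving its community."""
    degree, external = defaultdict(int), defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if membership[u] != membership[v]:
            external[u] += 1
            external[v] += 1
    return sum(external[n] / degree[n] for n in degree) / len(degree)


# Two triangles joined by (2, 3): only nodes 2 and 3 have an external edge (1 of 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(mixing_parameter(edges, {n: n // 3 for n in range(6)}))  # about 0.111 (= 1/9)
```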

47 Part II: LFR Benchmark Networks (continued)

48 Part II: Summary
Fine-tuned Qds performs the best among the three algorithms, followed by fine-tuned Q; both are much more effective than Greedy Q.
Fine-tuned Qds can be used to significantly improve the community detection results of other algorithms.
All seven quality metrics based on ground truth community structure are consistent with Qds, but not with Q, which shows the superiority of Qds over Q as a community quality metric.

49 Resources
Papers and book chapters worth reading:
S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, pp. 75–174, 2010.
J. Xie, S. Kelley, and B. K. Szymanski, “Overlapping community detection in networks: The state-of-the-art and comparative study,” ACM Comput. Surv., vol. 45, no. 4, pp. 43:1–43:35, Aug
M. Chen, K. Kuzmin, and B. K. Szymanski, “Community detection via maximization of modularity and its variants,” IEEE Trans. Comput. Soc. Syst., vol. 1, no. 1, pp. 46–65, 2014.
Community detection algorithms:
Greedy Q or Fast Modularity:
Fine-tuned Algorithm:
Louvain Algorithm:
GANXiS:
CFinder:
Datasets:
Visualization tools:
Gephi:
Cytoscape:

50 Thanks! Q & A

