# Community Detection Algorithm and Community Quality Metric

Mingming Chen & Boleslaw K. Szymanski, Department of Computer Science, Rensselaer Polytechnic Institute

Community Structure

- Many networks display community structure: groups of nodes within which connections are denser than between them.
- Community detection algorithms
- Community quality metrics

Two Related Community Detection Topics

- Community detection algorithm
  - LabelRank: a stabilized label propagation community detection algorithm
  - LabelRankT: an extension of LabelRank for dynamic networks
- A new community quality metric that solves two problems of Modularity

M. E. J. Newman, 2006; Newman and Girvan, 2004; Xie, Chen, and Szymanski, 2013; Xie and Szymanski, 2013.

LabelRank Algorithm

- Four operators are applied to the labels:
  - Label propagation operator
  - Inflation operator
  - Cutoff operator
  - Conditional update operator

Example (figure: nodes 1 to 4 voting on a question). Question: NP = P? Node 1: No; Node 2: No; Node 3: No; Node 4: Yes.

- With all weights equal to 1: $P_1(\text{No}) = 3/4$; $P_1(\text{Yes}) = 1/4$. Node 1: No.
- With a weight of 97 on the Yes vote: $P_1(\text{No}) = 3/100$; $P_1(\text{Yes}) = 97/100$. Node 1: Yes.
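The voting example can be reproduced with a few lines of Python. This is only a sketch: the function name `vote_distribution` is hypothetical, and it assumes node 1 counts its own answer with weight 1 and that the weight of 97 belongs to node 4, which is what the printed fractions suggest.

```python
from collections import defaultdict

def vote_distribution(votes, weights):
    """Return the weight-normalized share of each answer seen by a node.

    votes:   {node: answer} for the node itself and its neighbors
    weights: {node: weight given to that node's answer}
    """
    totals = defaultdict(float)
    for node, answer in votes.items():
        totals[answer] += weights[node]
    norm = sum(totals.values())
    return {answer: w / norm for answer, w in totals.items()}

# Answers from the slide: nodes 1, 2, 3 say No and node 4 says Yes.
votes = {1: "No", 2: "No", 3: "No", 4: "Yes"}

# Unit weights: No wins 3/4 to 1/4.
unweighted = vote_distribution(votes, {1: 1, 2: 1, 3: 1, 4: 1})

# Assumed placement of the weight 97 on node 4: Yes wins 97/100 to 3/100.
weighted = vote_distribution(votes, {1: 1, 2: 1, 3: 1, 4: 97})

print(max(unweighted, key=unweighted.get), unweighted)
print(max(weighted, key=weighted.get), weighted)
```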

Label Propagation Operator

- The operator computes $W \times P$, where $W$ is the $n \times n$ weighted adjacency matrix and $P$ is the $n \times n$ label probability distribution matrix, composed of $n$ row vectors $P_i$ (each $1 \times n$), one for each node.
- Each element $P_i(c)$ holds the current estimate of the probability that node $i$ observes label $c \in C$, where $C$ is the set of labels (here, suppose $C = \{1, 2, \ldots, n\}$).
- Example: $P_i = (0.1, 0.2, \ldots, 0.05, \ldots)$
- To initialize $P$, each node is assigned a probability distribution over all of its incoming edges.

Label Propagation Operator

- Each node receives the label probability distributions of its neighbors and computes its new distribution.

Before propagation:

- $P_1 = (0.25, 0.25, 0.25, 0.25, 0, 0, 0, 0, 0, 0)$
- $P_2 = (0.25, 0.25, 0, 0, 0.25, 0.25, 0, 0, 0, 0)$
- $P_3 = (0.25, 0, 0.25, 0, 0, 0, 0.25, 0.25, 0, 0)$
- $P_4 = (0.25, 0, 0, 0.25, 0, 0, 0, 0, 0.25, 0.25)$

After propagation:

- $P_1 = (0.25, 0.125, 0.125, 0.125, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625)$
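The updated $P_1$ on this slide is the average of the four listed distributions. A minimal sketch follows, assuming unit edge weights and that node 1 includes its own distribution (a self-loop); the helper name `propagate_node` is hypothetical.

```python
def propagate_node(neighbor_dists, weights):
    """Weighted average of the neighbors' label distributions for one node."""
    total = sum(weights)
    n_labels = len(neighbor_dists[0])
    return [
        sum(w * dist[c] for w, dist in zip(weights, neighbor_dists)) / total
        for c in range(n_labels)
    ]

# Distributions of node 1 (assumed self-loop) and nodes 2, 3, 4 from the slide.
P1 = [0.25, 0.25, 0.25, 0.25, 0, 0, 0, 0, 0, 0]
P2 = [0.25, 0.25, 0, 0, 0.25, 0.25, 0, 0, 0, 0]
P3 = [0.25, 0, 0.25, 0, 0, 0, 0.25, 0.25, 0, 0]
P4 = [0.25, 0, 0, 0.25, 0, 0, 0, 0, 0.25, 0.25]

# Assumption: all four edges carry weight 1.
new_P1 = propagate_node([P1, P2, P3, P4], weights=[1, 1, 1, 1])
print(new_P1)
```

Label 1 keeps probability 0.25 because all four nodes hold it, labels 2 to 4 drop to 0.125 because two nodes hold each, and labels 5 to 10 get 0.0625 because one node holds each.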

Inflation Operator

- Each element $P_i(c)$ is raised to the $in$-th power.
- Inflation increases the probabilities of labels that already have high probability and decreases those of labels with low probability during label propagation.

Before inflation:

- $P_1 = (0.25, 0.125, 0.125, 0.125, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625)$

After inflation:

- $P_1 = (0.129, 0.0323, 0.0323, 0.0323, 0.00806, 0.00806, 0.00806, 0.00806, 0.00806, 0.00806)$
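A minimal sketch of inflation, under two assumptions not stated on the slide: the exponent is $in = 2$, and each row is rescaled to sum to 1. The slide's printed values show the same 4:1 ratios between successive probability levels but do not sum to 1, so the slide evidently uses a different scaling constant.

```python
def inflate(dist, power=2):
    """Raise each probability to `power`, then rescale so the row sums to 1.

    The rescaling to 1 is an assumption of this sketch.
    """
    raised = [p ** power for p in dist]
    norm = sum(raised)
    return [x / norm for x in raised]

# P_1 after propagation, as listed on the slide.
P1 = [0.25, 0.125, 0.125, 0.125] + [0.0625] * 6
inflated = inflate(P1, power=2)

# The dominant label gains probability; the weakest labels lose it.
print(inflated[0] > P1[0], inflated[4] < P1[4])
# Squaring turns the 2:1 ratio between levels into 4:1.
print(inflated[0] / inflated[1], inflated[1] / inflated[4])
```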

Cutoff Operator

- The cutoff operator removes from $P$ the labels whose probability is below the threshold $r$. It works together with the inflation operator, which decreases the probabilities of low-probability labels during propagation.
- This efficiently reduces the space complexity from quadratic to linear.
- With $r = 0.1$, the average number of labels in each node is less than 3.

Before cutoff:

- $P_1 = (0.129, 0.0323, 0.0323, 0.0323, 0.00806, 0.00806, 0.00806, 0.00806, 0.00806, 0.00806)$

After cutoff:

- $P_1 = (0.129)$
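The cutoff step can be sketched as a filter that stores only the surviving labels. The sparse dictionary layout and the name `cutoff` are assumptions of this sketch; keeping a handful of labels per node rather than all $n$ is what brings storage down from quadratic to linear.

```python
def cutoff(dist, r=0.1):
    """Drop labels whose probability is below r; keep the rest sparsely.

    Returns {label: probability}, with labels numbered from 1.
    """
    return {label: p for label, p in enumerate(dist, start=1) if p >= r}

# P_1 after inflation, as listed on the slide.
P1 = [0.129, 0.0323, 0.0323, 0.0323] + [0.00806] * 6

kept = cutoff(P1, r=0.1)
print(kept)  # only label 1 survives the threshold
```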

Conditional Update Operator

- At each iteration, node $i$ is updated only when it differs significantly from its incoming neighbors in terms of labels:

  $$\sum_{j \in Nb(i)} \mathrm{isSubset}(C_i^*, C_j^*) \le q\,k_i$$

- $Nb(i)$ is the set of neighbors of node $i$, and $C_i^*$ is the set of maximum-probability labels at node $i$ at the last step.
- $\mathrm{isSubset}(s_1, s_2)$ returns 1 if $s_1 \subseteq s_2$ and 0 otherwise.
- $k_i$ is the node degree and $q \in [0, 1]$.
- isSubset can be viewed as a measure of similarity between two nodes.
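A hedged sketch of the update test. The formula on this slide did not survive in the transcript, so the condition used here (update when the number of neighbors whose top labels contain node $i$'s top labels is at most $q k_i$) is an assumption based on the surrounding definitions. The names `max_labels` and `should_update` and the sample data are hypothetical.

```python
def max_labels(dist):
    """Labels holding the maximum probability in a sparse {label: prob} row."""
    top = max(dist.values())
    return {label for label, p in dist.items() if p == top}

def should_update(i, prev_P, neighbors, q):
    """Assumed rule: update node i only if few neighbors share its top labels."""
    own = max_labels(prev_P[i])
    # isSubset(own, other) is 1 when own is a subset of other, else 0.
    similar = sum(1 for j in neighbors[i] if own <= max_labels(prev_P[j]))
    return similar <= q * len(neighbors[i])

# Hypothetical label distributions from the previous step.
prev_P = {
    1: {1: 0.6, 2: 0.4},
    2: {1: 0.7, 5: 0.3},
    3: {1: 0.5, 3: 0.5},
    4: {4: 1.0},
}
neighbors = {1: [2, 3, 4]}

# Two of the three neighbors already agree with node 1's top label.
print(should_update(1, prev_P, neighbors, q=0.5))  # 2 <= 1.5 fails: skip update
print(should_update(1, prev_P, neighbors, q=0.7))  # 2 <= 2.1 holds: update
```

A larger $q$ makes updates more frequent; a smaller $q$ freezes nodes that already agree with most of their neighbors.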

Effect of Conditional Update Operator

Running time of LabelRank

- $O(Tm)$, where $m$ is the number of edges and $T$ is the number of iterations.
- LabelRank is a linear-time algorithm.

Performance of LabelRank

LabelRankT

- LabelRankT is LabelRank with one extra conditional update rule, by which only nodes involved in changes are updated.
- Changes are detected by comparing the neighbors of node $i$ at two consecutive time steps.
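The change test can be sketched as a comparison of neighbor sets between two snapshots. This is an illustration only: the function name `changed_nodes`, the adjacency-dictionary layout, and the sample snapshots are assumptions, not details from the slides.

```python
def changed_nodes(prev_adj, curr_adj):
    """Nodes whose neighbor set differs between two consecutive snapshots.

    prev_adj, curr_adj: {node: set of neighbors}; a node missing from one
    snapshot is treated as having no neighbors in it.
    """
    nodes = set(prev_adj) | set(curr_adj)
    return {
        v for v in nodes
        if prev_adj.get(v, set()) != curr_adj.get(v, set())
    }

# Hypothetical snapshots: edge 1-3 disappears and edge 3-4 appears.
prev_adj = {1: {2, 3}, 2: {1}, 3: {1}}
curr_adj = {1: {2}, 2: {1}, 3: {4}, 4: {3}}

print(sorted(changed_nodes(prev_adj, curr_adj)))  # node 2 is untouched
```

Only the returned nodes would need their label distributions recomputed at the new step; the rest keep their labels from the previous snapshot.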

Two Problems of Modularity Maximization

- Splitting large communities (favors small communities)
- Resolution limit problem (favors large communities)
  - Modularity optimization may fail to discover communities smaller than a certain scale, even in cases where communities are unambiguously defined.
  - This scale depends on the total number of edges in the network and the degree of interconnectedness of the communities.

Fortunato et al, 2008; Li et al, 2008; Arenas et al, 2008; Berry et al, 2009; Good et al, 2010; Ronhovde et al, 2010; Fortunato, 2010; Lancichinetti et al, 2011; Traag et al, 2011; Darst et al, 2013.

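Modularity

- Modularity ($Q$): the fraction of edges falling within communities minus the expected value of that fraction in an equivalent network with edges placed at random:

  $$Q = \sum_{c} \left[ \frac{|E_c^{in}|}{|E|} - \left( \frac{2|E_c^{in}| + |E_c^{out}|}{2|E|} \right)^2 \right]$$

  where $|E|$ is the total number of edges, $|E_c^{in}|$ is the number of edges inside community $c$, and $|E_c^{out}|$ is the number of edges from $c$ to the rest of the network.

- Equivalent definition:

  $$Q = \frac{1}{2|E|} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2|E|} \right] \delta(c_i, c_j)$$

  where $A$ is the adjacency matrix, $k_i$ is the degree of node $i$, and $\delta(c_i, c_j) = 1$ if nodes $i$ and $j$ are in the same community and 0 otherwise.

M. E. J. Newman, 2006; Newman and Girvan, 2004.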

Modularity with Split Penalty

- Modularity ($Q$): the modularity of the community detection result
- Split penalty ($SP$): the fraction of edges that connect nodes of different communities
- $Q_s = Q - SP$: addresses Modularity's tendency to favor small communities
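A minimal sketch that computes $Q$, $SP$, and $Q_s$ from an edge list, using the standard per-community form of modularity. The test graph is an assumption: the slides do not list their example graphs, but two 4-cliques joined by two edges reproduce the values 0.357, 0.143, and 0.214 shown in the first "Two Weakly Connected Communities" table. The function name `q_sp_qs` is hypothetical.

```python
from itertools import combinations

def q_sp_qs(edges, community_of):
    """Return (Q, SP, Q_s) for an undirected, unweighted graph."""
    m = len(edges)
    e_in, e_out = {}, {}
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        if cu == cv:
            e_in[cu] = e_in.get(cu, 0) + 1
        else:
            # A cross edge counts as outgoing for both of its communities.
            e_out[cu] = e_out.get(cu, 0) + 1
            e_out[cv] = e_out.get(cv, 0) + 1
    q = sum(
        e_in.get(c, 0) / m
        - ((2 * e_in.get(c, 0) + e_out.get(c, 0)) / (2 * m)) ** 2
        for c in set(community_of.values())
    )
    # Each cross edge was counted twice above, hence the division by 2m.
    sp = sum(e_out.values()) / (2 * m)
    return q, sp, q - sp

# Assumed graph: two 4-cliques (nodes 0-3 and 4-7) joined by two edges.
edges = (
    list(combinations(range(4), 2))
    + list(combinations(range(4, 8), 2))
    + [(0, 4), (1, 5)]
)
two = {v: 0 if v < 4 else 1 for v in range(8)}
one = {v: 0 for v in range(8)}

print([round(x, 3) for x in q_sp_qs(edges, two)])
print([round(x, 3) for x in q_sp_qs(edges, one)])
```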

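$Q_s$ with Community Density

- Resolution limit: Modularity optimization may fail to detect communities smaller than a certain scale.
- Intuitively, putting density into both Modularity and Split Penalty addresses the resolution limit problem:

  $$Q_{ds} = \sum_{c} \left[ \frac{|E_c^{in}|}{|E|} d_c - \left( \frac{2|E_c^{in}| + |E_c^{out}|}{2|E|} d_c \right)^2 - \sum_{c' \ne c} \frac{|E_{c,c'}|}{2|E|} d_{c,c'} \right]$$

  where $|E|$ is the total number of edges, $|E_c^{in}|$ and $|E_c^{out}|$ are the numbers of edges inside community $c$ and leaving it, $|E_{c,c'}|$ is the number of edges between communities $c$ and $c'$, $d_c = \frac{2|E_c^{in}|}{|c|(|c|-1)}$ is the internal density of $c$, and $d_{c,c'} = \frac{|E_{c,c'}|}{|c||c'|}$ is the pair-wise density between $c$ and $c'$.

- Equivalent definition:

  $$Q_{ds} = \frac{1}{2|E|} \sum_{i,j} \left[ \left( A_{ij} - \frac{k_i k_j}{2|E|} d_{c_i} \right) d_{c_i}\, \delta(c_i, c_j) - A_{ij}\, d_{c_i,c_j} \left( 1 - \delta(c_i, c_j) \right) \right]$$

  where $A$ is the adjacency matrix, $k_i$ is the degree of node $i$, and $\delta(c_i, c_j) = 1$ if nodes $i$ and $j$ are in the same community and 0 otherwise.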
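A hedged sketch of a density-weighted score. The formula is an assumption reconstructed from the published definition of $Q_{ds}$ (internal density scales the modularity terms, pair-wise density scales the split penalty); it is used here because, on an assumed graph of two 4-cliques joined by two edges, it reproduces the $Q_{ds}$ values 0.339 and 0.25 of the first "Two Weakly Connected Communities" table. The function name `q_ds` is hypothetical.

```python
from itertools import combinations

def q_ds(edges, community_of):
    """Density-weighted modularity minus density-weighted split penalty."""
    m = len(edges)
    size = {}
    for c in community_of.values():
        size[c] = size.get(c, 0) + 1
    e_in = {c: 0 for c in size}
    e_between = {}  # ordered pair (c1, c2) -> number of edges between them
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        if cu == cv:
            e_in[cu] += 1
        else:
            e_between[(cu, cv)] = e_between.get((cu, cv), 0) + 1
            e_between[(cv, cu)] = e_between.get((cv, cu), 0) + 1
    total = 0.0
    for c, n in size.items():
        # Internal density: existing internal edges over possible ones.
        d_c = 2 * e_in[c] / (n * (n - 1)) if n > 1 else 0.0
        e_out = sum(cnt for (a, _), cnt in e_between.items() if a == c)
        total += (e_in[c] / m) * d_c
        total -= (((2 * e_in[c] + e_out) / (2 * m)) * d_c) ** 2
        for (a, b), cnt in e_between.items():
            if a == c:
                # Pair-wise density between communities a and b.
                d_ab = cnt / (size[a] * size[b])
                total -= cnt / (2 * m) * d_ab
    return total

# Assumed graph: two 4-cliques (nodes 0-3 and 4-7) joined by two edges.
edges = (
    list(combinations(range(4), 2))
    + list(combinations(range(4, 8), 2))
    + [(0, 4), (1, 5)]
)
two = {v: 0 if v < 4 else 1 for v in range(8)}
one = {v: 0 for v in range(8)}

print(round(q_ds(edges, two), 3), round(q_ds(edges, one), 3))
```

Unlike $Q$ and $Q_s$, the single-community score is not zero: it equals $d - d^2$ for a graph of density $d$, which is why the one-community rows of the tables carry nonzero $Q_{ds}$ values.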

Example of Two Well-Separated Communities

| Partition | Modularity (Q) | Split Penalty (SP) | $Q_s = Q - SP$ | $Q_{ds}$ |
|---|---|---|---|---|
| 2 communities | 0.5 | 0 | 0.5 | 0.5 |
| 1 community | 0 | 0 | 0 | 0.245 |

Example of Two Weakly Connected Communities

| Partition | Modularity (Q) | Split Penalty (SP) | $Q_s = Q - SP$ | $Q_{ds}$ |
|---|---|---|---|---|
| 2 communities | 0.357 | 0.143 | 0.214 | 0.339 |
| 1 community | 0 | 0 | 0 | 0.25 |

Ambiguity between One and Two Communities

| Partition | Modularity (Q) | Split Penalty (SP) | $Q_s = Q - SP$ | $Q_{ds}$ |
|---|---|---|---|---|
| 2 communities | 0.3 | 0.2 | 0.1 | 0.263 |
| 1 community | 0 | 0 | 0 | 0.249 |

Ambiguity between One and Two Communities

| Partition | Modularity (Q) | Split Penalty (SP) | $Q_s = Q - SP$ | $Q_{ds}$ |
|---|---|---|---|---|
| 2 communities | 0.25 | 0.25 | 0 | 0.188 |
| 1 community | 0 | 0 | 0 | 0.245 |

Example of One Well Connected Community

| Partition | Modularity (Q) | Split Penalty (SP) | $Q_s = Q - SP$ | $Q_{ds}$ |
|---|---|---|---|---|
| 2 communities | 0.167 | 0.333 | -0.167 | 0.0417 |
| 1 community | 0 | 0 | 0 | 0.23 |

Example of One Very Well Connected Community

| Partition | Modularity (Q) | Split Penalty (SP) | $Q_s = Q - SP$ | $Q_{ds}$ |
|---|---|---|---|---|
| 2 communities | 0.0455 | 0.455 | -0.409 | -0.239 |
| 1 community | 0 | 0 | 0 | 0.168 |

Example of One Complete Graph

Community quality on a complete graph with 8 nodes:

| Partition | Modularity (Q) | Split Penalty (SP) | $Q_s = Q - SP$ | $Q_{ds}$ |
|---|---|---|---|---|
| 2 communities | -0.0714 | 0.571 | -0.643 | -0.643 |
| 1 community | 0 | 0 | 0 | 0 |

Modularity Has Nothing to Do with #Nodes

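5-clique Example

| Partition | Modularity (Q) | Split Penalty (SP) | $Q_s = Q - SP$ | $Q_{ds}$ |
|---|---|---|---|---|
| 30 communities | 0.8758 | 0.09091 | 0.7848 | 0.8721 |
| 15 communities | 0.8879 | 0.04545 | 0.8424 | 0.4305 |

$\Delta Q_s = 0.8424 - 0.7848 = 0.0576 > \Delta Q = 0.8879 - 0.8758 = 0.0121$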
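The $Q$, $SP$, and $Q_s$ values in this example can be checked from per-community edge counts alone. The sketch assumes the graph is a ring of 30 five-node cliques with one edge between neighboring cliques (330 edges in total); the slide does not describe the graph, but this construction reproduces its numbers. The helper name `q_and_sp` is hypothetical.

```python
def q_and_sp(n_comm, e_in, e_out, m):
    """Q and SP when all n_comm communities have identical edge counts.

    e_in:  edges inside one community
    e_out: edges leaving one community
    m:     total number of edges in the graph
    """
    q = n_comm * (e_in / m - ((2 * e_in + e_out) / (2 * m)) ** 2)
    sp = n_comm * e_out / (2 * m)
    return q, sp

# Assumed ring: 30 cliques x 10 internal edges + 30 connecting edges.
m = 30 * 10 + 30

# 30 communities: one clique each (10 internal edges, 2 outgoing).
q30, sp30 = q_and_sp(30, e_in=10, e_out=2, m=m)
# 15 communities: two adjacent cliques each (10 + 10 + 1 internal, 2 outgoing).
q15, sp15 = q_and_sp(15, e_in=21, e_out=2, m=m)

print(round(q30, 4), round(sp30, 5), round(q30 - sp30, 4))
print(round(q15, 4), round(sp15, 5), round(q15 - sp15, 4))

# Merging cliques raises Q_s even more than it raises Q.
print((q15 - sp15) - (q30 - sp30) > q15 - q30)
```

Both $Q$ and $Q_s$ score the merged 15-community partition higher, and the split penalty widens that gap; among the four columns of the table only $Q_{ds}$ prefers the 30 single-clique communities.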

Thanks! Q & A

Example of Two Weakly Connected Communities

| Partition | Modularity (Q) | Split Penalty (SP) | $Q_s = Q - SP$ | $Q_{ds}$ |
|---|---|---|---|---|
| 2 communities | 0.309 | 0.25 | 0.0586 | 0.264 |
| 1 community | -0.00586 | 0.125 | -0.131 | 0.202 |
