Community Detection Algorithm and Community Quality Metric
Mingming Chen & Boleslaw K. Szymanski
Department of Computer Science, Rensselaer Polytechnic Institute

Community Structure
- Many networks display community structure: groups of nodes within which connections are denser than between them.
- Community detection algorithms
- Community quality metrics

Two Related Community Detection Topics
- Community detection algorithm
  - LabelRank: a stabilized label propagation community detection algorithm
  - LabelRankT: an extended algorithm for dynamic networks based on LabelRank
- A new community quality metric solving two problems of Modularity
(M. E. J. Newman, 2006; Newman and Girvan; Xie, Chen, and Szymanski; Xie and Szymanski, 2013)

LabelRank Algorithm
- Four operators are applied to the labels:
  - Label propagation operator
  - Inflation operator
  - Cutoff operator
  - Conditional update operator
- Illustration with the question "NP = P?":
  - Node 1: No; Node 2: No; Node 3: No; Node 4: Yes. Then $P_1(\text{No}) = 3/4$ and $P_1(\text{Yes}) = 1/4$, so node 1 answers No.
  - With 3 No answers and 97 Yes answers, $P_1(\text{No}) = 3/100$ and $P_1(\text{Yes}) = 97/100$, so node 1 answers Yes.

Label Propagation Operator
- The operator acts on W, the n × n weighted adjacency matrix, and P, the n × n label probability distribution matrix, which is composed of n row vectors $P_i$ (each 1 × n), one for each node.
- Each element $P_i(c)$ holds the current estimate of the probability that node i observes label c ∈ C, where C is the set of labels (here, suppose C = {1, 2, …, n}).
- Example: $P_i = (0.1, 0.2, …, 0.05, …)$.
- To initialize P, each node is assigned a probability distribution over all of its incoming edges.
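A minimal Python sketch of the initialization, assuming an undirected graph stored as nested dicts and assuming every node also carries a self-loop, which is what the initial distributions in the 10-node example imply (node 1 puts 0.25 on each of labels 1 to 4). The helper names `add_self_loops` and `init_distributions` and the edge list are hypothetical, read off the example's initial distributions.

```python
from typing import Dict

Graph = Dict[int, Dict[int, float]]   # node -> {neighbour: edge weight}
Dist = Dict[int, float]               # label -> probability


def add_self_loops(graph: Graph, weight: float = 1.0) -> Graph:
    """Return a copy of the graph in which every node also links to itself."""
    looped = {i: dict(nbrs) for i, nbrs in graph.items()}
    for i in looped:
        looped[i][i] = weight
    return looped


def init_distributions(graph: Graph) -> Dict[int, Dist]:
    """P_i(c) = w_ic / sum_j w_ij: spread node i's probability mass over
    the labels (node ids) at the other ends of its incoming edges."""
    dists: Dict[int, Dist] = {}
    for i, nbrs in graph.items():
        total = sum(nbrs.values())
        dists[i] = {c: w / total for c, w in nbrs.items()}
    return dists


# Hypothetical 10-node graph: node 1 links to 2, 3, 4,
# and each of those links to two further leaves.
edges = [(1, 2), (1, 3), (1, 4), (2, 5), (2, 6),
         (3, 7), (3, 8), (4, 9), (4, 10)]
graph: Graph = {i: {} for i in range(1, 11)}
for u, v in edges:
    graph[u][v] = graph[v][u] = 1.0   # undirected, unit weights

P = init_distributions(add_self_loops(graph))
print(sorted(P[1].items()))  # [(1, 0.25), (2, 0.25), (3, 0.25), (4, 0.25)]
```

Node 2 likewise gets 0.25 on labels 1, 2, 5, and 6, matching the example's $P_2$.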

Label Propagation Operator
- Each node receives the label probability distributions from its neighbors and computes its new distribution from them.
- Example: node 1 starts with $P_1 = (0.25, 0.25, 0.25, 0.25, 0, 0, 0, 0, 0, 0)$ and receives
  $P_2 = (0.25, 0.25, 0, 0, 0.25, 0.25, 0, 0, 0, 0)$,
  $P_3 = (0.25, 0, 0.25, 0, 0, 0, 0.25, 0.25, 0, 0)$,
  $P_4 = (0.25, 0, 0, 0.25, 0, 0, 0, 0, 0.25, 0.25)$.
  Averaging these four distributions gives the new $P_1 = (0.25, 0.125, 0.125, 0.125, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625)$.
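The worked example can be reproduced with a short sketch, assuming the new distribution of node i is the weight-averaged distribution of its neighbors with the self-loop included. That averaging rule is an assumption, but it yields the slide's 0.25 and 0.125 entries for node 1. The edge list is hypothetical, and `propagate` is an illustrative name.

```python
from collections import defaultdict
from typing import Dict

Graph = Dict[int, Dict[int, float]]   # node -> {neighbour: weight}
Dist = Dict[int, float]               # label -> probability


def propagate(graph: Graph, P: Dict[int, Dist]) -> Dict[int, Dist]:
    """One propagation step: node i's new distribution is the
    weight-averaged distribution of its neighbours (self-loop included)."""
    new_P: Dict[int, Dist] = {}
    for i, nbrs in graph.items():
        total = sum(nbrs.values())
        acc: Dict[int, float] = defaultdict(float)
        for j, w in nbrs.items():
            for c, p in P[j].items():
                acc[c] += w * p / total
        new_P[i] = dict(acc)
    return new_P


# Same hypothetical 10-node graph, with a unit self-loop on every node.
edges = [(1, 2), (1, 3), (1, 4), (2, 5), (2, 6),
         (3, 7), (3, 8), (4, 9), (4, 10)]
graph: Graph = {i: {i: 1.0} for i in range(1, 11)}
for u, v in edges:
    graph[u][v] = graph[v][u] = 1.0

# Initial distributions: P_i(c) = w_ic / sum_j w_ij.
P0 = {i: {c: w / sum(nbrs.values()) for c, w in nbrs.items()}
      for i, nbrs in graph.items()}
P1 = propagate(graph, P0)
print([P1[1].get(c, 0.0) for c in range(1, 11)])
# [0.25, 0.125, 0.125, 0.125, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625]
```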

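Inflation Operator
- Each element $P_i(c)$ is raised to the power in, and the row is renormalized: $P_i(c) \leftarrow P_i(c)^{in} / \sum_{j \in C} P_i(j)^{in}$.
- This increases the probabilities of labels with high probability and decreases those of labels with low probability during label propagation.
- Example: $P_1 = (0.25, 0.125, 0.125, 0.125, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625)$ becomes $P_1 = (0.129, \dots)$.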
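A sketch of inflation, assuming the powered row is renormalized to sum to 1 (as in Markov-clustering-style inflation). Because `in` is a Python keyword, the exponent is called `in_exp` here; the value 2.0 is an arbitrary illustration, not the exponent used on the slide.

```python
from typing import Dict

Dist = Dict[int, float]  # label -> probability


def inflate(dist: Dist, in_exp: float) -> Dist:
    """Raise every probability to the power in_exp and renormalize,
    sharpening the distribution when in_exp > 1."""
    powered = {c: p ** in_exp for c, p in dist.items()}
    total = sum(powered.values())
    return {c: p / total for c, p in powered.items()}


# Node 1's distribution after one propagation step in the running example.
p1 = {1: 0.25, 2: 0.125, 3: 0.125, 4: 0.125,
      5: 0.0625, 6: 0.0625, 7: 0.0625, 8: 0.0625, 9: 0.0625, 10: 0.0625}
inflated = inflate(p1, 2.0)
print(round(inflated[1], 4), round(inflated[5], 4))  # 0.4706 0.0294
```

The dominant label rises from 0.25 to about 0.47, while each 0.0625 label drops to about 0.03.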

Cutoff Operator
- The cutoff operator removes from P the labels whose probabilities are below a threshold r; the inflation operator helps by decreasing the probabilities of low-probability labels during propagation.
- It efficiently reduces the space complexity from quadratic to linear.
- Example: $P_1 = (0.129, \dots)$ becomes $P_1 = (0.129)$.
- With r = 0.1, the average number of labels in each node is less than 3.
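A sketch of how inflation helps the cutoff, reusing the assumed normalized inflation. The threshold r = 0.1 follows the slide; the exponent 3.0 is an arbitrary illustration value. Applied directly to the example's propagated $P_1$, the cutoff keeps four labels; after inflation only the dominant label survives.

```python
from typing import Dict

Dist = Dict[int, float]  # label -> probability


def inflate(dist: Dist, in_exp: float) -> Dist:
    """Raise every probability to the power in_exp and renormalize."""
    powered = {c: p ** in_exp for c, p in dist.items()}
    total = sum(powered.values())
    return {c: p / total for c, p in powered.items()}


def cutoff(dist: Dist, r: float) -> Dist:
    """Drop every label whose probability is below the threshold r."""
    return {c: p for c, p in dist.items() if p >= r}


p1 = {1: 0.25, 2: 0.125, 3: 0.125, 4: 0.125,
      5: 0.0625, 6: 0.0625, 7: 0.0625, 8: 0.0625, 9: 0.0625, 10: 0.0625}
without = cutoff(p1, r=0.1)
with_inflation = cutoff(inflate(p1, 3.0), r=0.1)
print(sorted(without), sorted(with_inflation))  # [1, 2, 3, 4] [1]
```

Keeping only a handful of labels per node is what makes the label storage grow with the number of nodes rather than quadratically.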

Conditional Update Operator
- At each iteration, node i is updated only when it is significantly different from its incoming neighbors in terms of labels:
  $\sum_{j \in Nb(i)} \mathrm{isSubset}(C_i^*, C_j^*) \le q \, k_i$,
  where Nb(i) is the set of neighbors of node i, $C_i^*$ is the set of maximum-probability labels at node i at the last step, $\mathrm{isSubset}(C_i^*, C_j^*)$ returns 1 if $C_i^* \subseteq C_j^*$ and 0 otherwise, $k_i$ is the node degree, and $q \in [0, 1]$.
- isSubset can be viewed as a measure of similarity between two nodes.
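A sketch of the conditional update test, assuming $k_i$ counts neighbors without the self-loop and that all labels tied at the maximum belong to $C_i^*$. Python's `<=` on sets is the subset test, so it plays the role of isSubset; `max_labels`, `should_update`, and the label sets below are hypothetical.

```python
from typing import Dict, Iterable, Set

Dist = Dict[int, float]  # label -> probability


def max_labels(dist: Dist, tol: float = 1e-12) -> Set[int]:
    """C_i*: the set of labels that attain the node's maximum probability."""
    best = max(dist.values())
    return {c for c, p in dist.items() if p >= best - tol}


def should_update(i: int, neighbours: Iterable[int],
                  prev_max: Dict[int, Set[int]], q: float) -> bool:
    """Update node i only if at most q * k_i neighbours j satisfy
    C_i* <= C_j* (subset), i.e. i differs from enough of its neighbours."""
    nbrs = list(neighbours)
    similar = sum(1 for j in nbrs if prev_max[i] <= prev_max[j])  # isSubset
    return similar <= q * len(nbrs)


prev_max = {1: {1}, 2: {1}, 3: {1, 3}, 4: {7}}
print(should_update(1, [2, 3, 4], prev_max, q=0.7))  # True: 2 <= 2.1
print(should_update(1, [2, 3, 4], prev_max, q=0.5))  # False: 2 > 1.5
```

A larger q lets more nodes update in each iteration; a smaller q freezes nodes that already agree with most of their neighbors.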

Effect of Conditional Update Operator

Running time of LabelRank
- O(Tm), where m is the number of edges and T is the number of iterations.
- LabelRank is therefore a linear algorithm in the number of edges.
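Putting the four operators together gives the sketch below, which also shows where the O(Tm) bound comes from: each of the T iterations visits every node's neighborhood once (O(m) edge visits), and the cutoff keeps the number of labels per node small. It is an illustration under stated assumptions (unit weights, self-loops, normalized inflation, a fixed T instead of a convergence test, and keeping a node's best label if the cutoff would remove all of them), not the authors' implementation. The function name `label_rank` and all parameter defaults are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Dist = Dict[int, float]  # label -> probability


def label_rank(edges: List[Tuple[int, int]], in_exp: float = 2.0,
               r: float = 0.1, q: float = 0.5, T: int = 20) -> Dict[int, int]:
    """Sketch of LabelRank on an undirected, unweighted edge list."""
    graph: Dict[int, Dist] = defaultdict(dict)
    for u, v in edges:
        graph[u][v] = graph[v][u] = 1.0
    for i in list(graph):
        graph[i][i] = 1.0  # self-loop on every node

    # Initialization: P_i(c) = w_ic / sum_j w_ij.
    P: Dict[int, Dist] = {}
    for i, nbrs in graph.items():
        total = sum(nbrs.values())
        P[i] = {c: w / total for c, w in nbrs.items()}

    def max_set(dist: Dist) -> Set[int]:
        best = max(dist.values())
        return {c for c, p in dist.items() if p >= best - 1e-12}

    for _ in range(T):  # fixed iteration count instead of a convergence test
        prev_max = {i: max_set(P[i]) for i in graph}
        new_P: Dict[int, Dist] = {}
        for i, nbrs in graph.items():
            # Conditional update: keep P_i if i already agrees with more
            # than q * k_i of its neighbours (isSubset via set <=).
            others = [j for j in nbrs if j != i]
            similar = sum(1 for j in others if prev_max[i] <= prev_max[j])
            if similar > q * len(others):
                new_P[i] = P[i]
                continue
            # Propagation: weight-averaged neighbour distributions.
            total = sum(nbrs.values())
            acc: Dist = defaultdict(float)
            for j, w in nbrs.items():
                for c, p in P[j].items():
                    acc[c] += w * p / total
            # Inflation: raise to in_exp and renormalize.
            powered = {c: p ** in_exp for c, p in acc.items()}
            norm = sum(powered.values())
            inflated = {c: p / norm for c, p in powered.items()}
            # Cutoff: drop labels below r, but never drop every label.
            kept = {c: p for c, p in inflated.items() if p >= r}
            if not kept:
                best = max(inflated, key=inflated.get)
                kept = {best: inflated[best]}
            new_P[i] = kept
        P = new_P

    # Each node's community is its most probable label.
    return {i: max(dist, key=dist.get) for i, dist in P.items()}


edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
print(label_rank(edges))
```

Because labels only travel along edges, nodes in different connected components never share a label; ties between equally likely labels are broken by dictionary order.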

Performance of LabelRank

LabelRankT
- LabelRank with one extra conditional update rule: only nodes involved in changes are updated.
- Changes are detected by comparing the neighbors of node i at two consecutive time steps.
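A sketch of the change detection that LabelRankT's extra rule relies on, assuming each network snapshot is an adjacency map from node to neighbor set; `changed_nodes` and the snapshots are hypothetical. LabelRankT would then apply the LabelRank update only to the returned nodes and leave all other nodes' distributions as they were.

```python
from typing import Dict, Set

Adjacency = Dict[int, Set[int]]  # node -> set of neighbours


def changed_nodes(prev: Adjacency, curr: Adjacency) -> Set[int]:
    """Nodes whose neighbour set differs between two consecutive snapshots
    (nodes that appear or disappear also count as changed)."""
    nodes = set(prev) | set(curr)
    return {i for i in nodes if prev.get(i, set()) != curr.get(i, set())}


snapshot_t0 = {1: {2}, 2: {1, 3}, 3: {2}}
snapshot_t1 = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}  # edge (3, 4) added
print(sorted(changed_nodes(snapshot_t0, snapshot_t1)))  # [3, 4]
```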

Two Problems of Modularity Maximization
- It splits large communities, i.e., it favors small communities.
- Resolution limit problem, i.e., it favors large communities:
  - Modularity optimization may fail to discover communities smaller than a certain scale, even in cases where the communities are unambiguously defined.
  - This scale depends on the total number of edges in the network and the degree of interconnectedness of the communities.
(Fortunato et al., 2008; Li et al., 2008; Arenas et al., 2008; Berry et al., 2009; Good et al., 2010; Ronhovde et al., 2010; Fortunato, 2010; Lancichinetti et al., 2011; Traag et al., 2011; Darst et al., 2013)

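Modularity
- Modularity (Q): the fraction of edges falling within communities minus the expected value of that fraction in an equivalent network with edges placed at random:
  $Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$
- Equivalent definition:
  $Q = \sum_{c} \left[ \frac{|E_c^{in}|}{m} - \left( \frac{2|E_c^{in}| + |E_c^{out}|}{2m} \right)^2 \right]$
- Here m is the number of edges, A the adjacency matrix, $k_i$ the degree of node i, $c_i$ the community of node i, and $|E_c^{in}|$ and $|E_c^{out}|$ the numbers of edges inside community c and leaving it.
(M. E. J. Newman; Newman and Girvan, 2004)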
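A minimal sketch that computes Q in both forms, the pairwise sum $\frac{1}{2m}\sum_{ij}[A_{ij} - k_i k_j/2m]\,\delta(c_i, c_j)$ and the per-community sum $\sum_c [|E_c^{in}|/m - ((2|E_c^{in}| + |E_c^{out}|)/2m)^2]$, assuming an unweighted, undirected graph without self-loops. The test graph (two triangles joined by one edge) and the function names are hypothetical; both forms give 5/14 ≈ 0.3571.

```python
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def modularity_pairwise(edges: List[Edge], membership: Dict[int, int]) -> float:
    """Q = (1/2m) * sum_ij [A_ij - k_i k_j / (2m)] * delta(c_i, c_j)."""
    m = len(edges)
    degree: Counter = Counter()
    adjacency: Counter = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adjacency[(u, v)] += 1
        adjacency[(v, u)] += 1
    total = 0.0
    for i in membership:
        for j in membership:
            if membership[i] == membership[j]:
                total += adjacency[(i, j)] - degree[i] * degree[j] / (2 * m)
    return total / (2 * m)


def modularity_by_community(edges: List[Edge], membership: Dict[int, int]) -> float:
    """Q = sum_c [ |E_c^in| / m - ((2|E_c^in| + |E_c^out|) / 2m)^2 ]."""
    m = len(edges)
    internal: Counter = Counter()    # |E_c^in|
    degree_sum: Counter = Counter()  # 2|E_c^in| + |E_c^out|
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles {0,1,2} and {3,4,5} joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
membership = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(round(modularity_pairwise(edges, membership), 4),
      round(modularity_by_community(edges, membership), 4))  # 0.3571 0.3571
```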

Modularity with Split Penalty
- Modularity (Q): the modularity of the community detection result.
- Split penalty (SP): the fraction of edges that connect nodes of different communities.
- $Q_s = Q - SP$: solves Modularity's problem of favoring small communities.
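A sketch of the split penalty and $Q_s = Q - SP$ on a hypothetical graph of two triangles joined by one edge; SP is simply the share of edges whose endpoints lie in different communities. The function names are illustrative.

```python
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def modularity(edges: List[Edge], membership: Dict[int, int]) -> float:
    """Q = sum_c [ |E_c^in| / m - (total degree of c / 2m)^2 ]."""
    m = len(edges)
    internal: Counter = Counter()
    degree_sum: Counter = Counter()
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


def split_penalty(edges: List[Edge], membership: Dict[int, int]) -> float:
    """SP: fraction of edges connecting nodes of different communities."""
    crossing = sum(1 for u, v in edges if membership[u] != membership[v])
    return crossing / len(edges)


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
membership = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
q = modularity(edges, membership)
sp = split_penalty(edges, membership)
q_s = q - sp
print(round(q, 4), round(sp, 4), round(q_s, 4))  # 0.3571 0.1429 0.2143
```

Every extra cut edge lowers $Q_s$ by 1/m on top of its effect on Q, which is what discourages splitting communities.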

$Q_s$ with Community Density
- Resolution limit: Modularity optimization may fail to detect communities smaller than a certain scale.
- Intuitively, put community density into Modularity and Split Penalty to solve the resolution limit problem; the resulting metric is $Q_{ds}$.
- Equivalent definition
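Since the formula is not reproduced here, the sketch below assumes one concrete form of $Q_{ds}$: each community's Q term is weighted by its internal density $d_c = 2|E_c^{in}| / (|c|(|c|-1))$, and each split-penalty term by the inter-community density $d_{c,c'} = |E_{c,c'}| / (|c||c'|)$, i.e. $Q_{ds} = \sum_c [ \frac{|E_c^{in}|}{m} d_c - ( \frac{2|E_c^{in}| + |E_c^{out}|}{2m} d_c )^2 - \sum_{c' \ne c} \frac{|E_{c,c'}|}{2m} d_{c,c'} ]$. Singleton communities get density 0, which is also an assumption. The helper `community_quality` is hypothetical and returns all four metrics.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple


def community_quality(edges: List[Tuple[Hashable, Hashable]],
                      membership: Dict[Hashable, Hashable]) -> Tuple[float, float, float, float]:
    """Return (Q, SP, Q_s, Q_ds) for an undirected, unweighted graph
    without self-loops; membership maps node -> community id."""
    m = len(edges)
    size = Counter(membership.values())   # |c|
    internal: Counter = Counter()         # |E_c^in|
    between: Counter = Counter()          # |E_{c,c'}|, stored in both orders
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        if cu == cv:
            internal[cu] += 1
        else:
            between[(cu, cv)] += 1
            between[(cv, cu)] += 1
    q = sp = q_ds = 0.0
    for c, n_c in size.items():
        links = [(b, e) for (a, b), e in between.items() if a == c]
        out_c = sum(e for _, e in links)  # |E_c^out|
        # Intra-community density; singletons get 0 (an assumption).
        d_c = 2 * internal[c] / (n_c * (n_c - 1)) if n_c > 1 else 0.0
        deg_frac = (2 * internal[c] + out_c) / (2 * m)
        q += internal[c] / m - deg_frac ** 2
        sp += out_c / (2 * m)
        q_ds += (internal[c] / m * d_c - (deg_frac * d_c) ** 2
                 - sum(e / (2 * m) * e / (n_c * size[b]) for b, e in links))
    return q, sp, q - sp, q_ds


# Two triangles {0,1,2} and {3,4,5} joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
membership = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print([round(x, 4) for x in community_quality(edges, membership)])
# [0.3571, 0.1429, 0.2143, 0.3413]
```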

Example of Two Well-Separated Communities
[Table: Modularity (Q), Split Penalty (SP), $Q_s = Q - SP$, and $Q_{ds}$ for 2 communities vs. 1 community]

Example of Two Weakly Connected Communities
[Table: Modularity (Q), Split Penalty (SP), $Q_s = Q - SP$, and $Q_{ds}$ for 2 communities vs. 1 community]

Ambiguity between One and Two Communities
[Table: Modularity (Q), Split Penalty (SP), $Q_s = Q - SP$, and $Q_{ds}$ for 2 communities vs. 1 community]

Ambiguity between One and Two Communities
[Table: Modularity (Q), Split Penalty (SP), $Q_s = Q - SP$, and $Q_{ds}$ for 2 communities vs. 1 community]

Example of One Well Connected Community
[Table: Modularity (Q), Split Penalty (SP), $Q_s = Q - SP$, and $Q_{ds}$ for 2 communities vs. 1 community]

Example of One Very Well Connected Community
[Table: Modularity (Q), Split Penalty (SP), $Q_s = Q - SP$, and $Q_{ds}$ for 2 communities vs. 1 community]

Example of One Complete Graph
[Table: community quality on a complete graph with 8 nodes; Modularity (Q), Split Penalty (SP), $Q_s = Q - SP$, and $Q_{ds}$ for 2 communities vs. 1 community; for 1 community all four metrics are 0]
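A sketch that checks the 8-node complete graph, restating the hypothetical `community_quality` helper with the same assumed $Q_{ds}$ form so the block runs on its own. Treating the whole graph as one community gives 0 for all four metrics; the 2-community row uses a hypothetical split into two halves of 4 nodes, and its numbers come from this sketch, not from the slide.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Hashable, List, Tuple


def community_quality(edges: List[Tuple[Hashable, Hashable]],
                      membership: Dict[Hashable, Hashable]) -> Tuple[float, float, float, float]:
    """Return (Q, SP, Q_s, Q_ds); Q_ds uses the assumed density weighting."""
    m = len(edges)
    size = Counter(membership.values())
    internal: Counter = Counter()
    between: Counter = Counter()
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        if cu == cv:
            internal[cu] += 1
        else:
            between[(cu, cv)] += 1
            between[(cv, cu)] += 1
    q = sp = q_ds = 0.0
    for c, n_c in size.items():
        links = [(b, e) for (a, b), e in between.items() if a == c]
        out_c = sum(e for _, e in links)
        d_c = 2 * internal[c] / (n_c * (n_c - 1)) if n_c > 1 else 0.0
        deg_frac = (2 * internal[c] + out_c) / (2 * m)
        q += internal[c] / m - deg_frac ** 2
        sp += out_c / (2 * m)
        q_ds += (internal[c] / m * d_c - (deg_frac * d_c) ** 2
                 - sum(e / (2 * m) * e / (n_c * size[b]) for b, e in links))
    return q, sp, q - sp, q_ds


nodes = range(8)
k8 = list(combinations(nodes, 2))   # complete graph: 28 edges
one = {n: 0 for n in nodes}         # 1 community
two = {n: n // 4 for n in nodes}    # hypothetical split into halves
print([round(x, 4) for x in community_quality(k8, one)])  # [0.0, 0.0, 0.0, 0.0]
print([round(x, 4) for x in community_quality(k8, two)])  # [-0.0714, 0.5714, -0.6429, -0.6429]
```

All four metrics prefer the single community here, since any split of a complete graph cuts many edges.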

Modularity Has Nothing to Do with the Number of Nodes
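Modularity depends only on edge counts and degree sums per community, so two communities with the same $|E_c^{in}|$ and $|E_c^{out}|$ get the same Q regardless of how many nodes they contain. The sketch below illustrates this with two hypothetical graphs: a triangle joined to a second triangle, and a triangle joined to a 4-node path (3 edges either way). It restates the hypothetical `community_quality` helper with the assumed $Q_{ds}$ form, under which the sparser path community lowers $Q_{ds}$.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple


def community_quality(edges: List[Tuple[Hashable, Hashable]],
                      membership: Dict[Hashable, Hashable]) -> Tuple[float, float, float, float]:
    """Return (Q, SP, Q_s, Q_ds); Q_ds uses the assumed density weighting."""
    m = len(edges)
    size = Counter(membership.values())
    internal: Counter = Counter()
    between: Counter = Counter()
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        if cu == cv:
            internal[cu] += 1
        else:
            between[(cu, cv)] += 1
            between[(cv, cu)] += 1
    q = sp = q_ds = 0.0
    for c, n_c in size.items():
        links = [(b, e) for (a, b), e in between.items() if a == c]
        out_c = sum(e for _, e in links)
        d_c = 2 * internal[c] / (n_c * (n_c - 1)) if n_c > 1 else 0.0
        deg_frac = (2 * internal[c] + out_c) / (2 * m)
        q += internal[c] / m - deg_frac ** 2
        sp += out_c / (2 * m)
        q_ds += (internal[c] / m * d_c - (deg_frac * d_c) ** 2
                 - sum(e / (2 * m) * e / (n_c * size[b]) for b, e in links))
    return q, sp, q - sp, q_ds


# G1: two triangles joined by the edge (2, 3).
g1 = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
m1 = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
# G2: a triangle joined to the 4-node path 3-4-5-6 by the edge (2, 3).
g2 = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (5, 6), (2, 3)]
m2 = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 1}

q1, _, _, qds1 = community_quality(g1, m1)
q2, _, _, qds2 = community_quality(g2, m2)
print(round(q1, 4), round(q2, 4), round(qds1, 4), round(qds2, 4))
# 0.3571 0.3571 0.3413 0.3185
```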

5-clique Example
[Table: Modularity (Q), Split Penalty (SP), $Q_s = Q - SP$, and $Q_{ds}$ for 30 communities vs. an alternative partition]
$\Delta Q_s > \Delta Q = 0.0121$
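A sketch of the 5-clique example, assuming the standard resolution-limit construction: a ring of 30 five-node cliques in which neighboring cliques are joined by a single edge, compared against a hypothetical partition into 15 communities of adjacent clique pairs. Under these assumptions the sketch reproduces the slide's ΔQ ≈ 0.0121 and ΔQ_s > ΔQ. The `community_quality` helper and its $Q_{ds}$ density weighting are restated assumptions; with them, ΔQ_ds comes out negative, i.e. the density-weighted metric keeps the 30 cliques apart.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Hashable, List, Tuple


def community_quality(edges: List[Tuple[Hashable, Hashable]],
                      membership: Dict[Hashable, Hashable]) -> Tuple[float, float, float, float]:
    """Return (Q, SP, Q_s, Q_ds); Q_ds uses the assumed density weighting."""
    m = len(edges)
    size = Counter(membership.values())
    internal: Counter = Counter()
    between: Counter = Counter()
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        if cu == cv:
            internal[cu] += 1
        else:
            between[(cu, cv)] += 1
            between[(cv, cu)] += 1
    q = sp = q_ds = 0.0
    for c, n_c in size.items():
        links = [(b, e) for (a, b), e in between.items() if a == c]
        out_c = sum(e for _, e in links)
        d_c = 2 * internal[c] / (n_c * (n_c - 1)) if n_c > 1 else 0.0
        deg_frac = (2 * internal[c] + out_c) / (2 * m)
        q += internal[c] / m - deg_frac ** 2
        sp += out_c / (2 * m)
        q_ds += (internal[c] / m * d_c - (deg_frac * d_c) ** 2
                 - sum(e / (2 * m) * e / (n_c * size[b]) for b, e in links))
    return q, sp, q - sp, q_ds


edges = []
for k in range(30):
    edges.extend(combinations(range(5 * k, 5 * k + 5), 2))  # 10 edges in clique k
    edges.append((5 * k + 4, 5 * ((k + 1) % 30)))           # 1 edge to clique k+1
cliques = {n: n // 5 for n in range(150)}   # 30 communities, one per clique
pairs = {n: n // 10 for n in range(150)}    # 15 communities of clique pairs

q30, sp30, qs30, qds30 = community_quality(edges, cliques)
q15, sp15, qs15, qds15 = community_quality(edges, pairs)
dq, dq_s, dq_ds = q15 - q30, qs15 - qs30, qds15 - qds30
print(round(dq, 4), round(dq_s, 4), round(dq_ds, 4))  # 0.0121 0.0576 -0.4416
```

A positive ΔQ means plain modularity prefers merging the cliques into pairs, which is the resolution limit; the split penalty removes fewer cut edges from the 30-clique partition than from the pair partition, so ΔQ_s is even larger.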

Thanks! Q & A
