Community Detection. Laks V.S. Lakshmanan (based on Girvan & Newman. Finding and evaluating community structure in networks. Physical Review E 69, 026113).


Community Detection
Laks V.S. Lakshmanan
Based on: Girvan & Newman. Finding and evaluating community structure in networks. Physical Review E 69, 026113 (2004). M. E. J. Newman. Fast algorithm for detecting community structure in networks. Physical Review E 69 (2004).

The Problem
Can we partition the network into groups s.t. the inter-group edges are sparse while the intra-group edges are dense?
Why is it interesting/useful?
◦ Understanding community structure is a means to understanding network structure.
◦ Graph partitioning is a similar problem: given a graph of processes whose edges represent communication, assign subgraphs to processors to minimize inter-processor communication and balance processor load. (NP-hard in general.)
◦ Differences from graph partitioning.

An Example with Three Communities

A Hierarchical Clustering Approach

Community detection via hierarchical clustering
Compute a similarity for every pair of nodes.
Start with the nodes and no edges; repeatedly add the edge between the pair with the greatest similarity.
→ Leads to a tree (called a dendrogram).
A slice through the dendrogram represents a clustering, i.e., a community structure.
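The steps above can be sketched in a few lines of Python. This is only an illustration: the slide does not say which similarity to use, so the Jaccard overlap of closed neighbourhoods (`neighbor_overlap`) is an assumption, as are the function names and the choice to stop once `k` groups remain (one way of "slicing" the dendrogram).

```python
from itertools import combinations


def neighbor_overlap(adj, u, v):
    """Assumed similarity: Jaccard overlap of the closed neighbourhoods of u and v."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return len(nu & nv) / len(nu | nv)


def hierarchical_communities(adj, k):
    """Join node pairs in order of decreasing similarity until k groups remain.

    Returns (clusters, merges); merges is the sequence of joins, i.e. the
    part of the dendrogram built so far.
    """
    parent = {v: v for v in adj}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    pairs = sorted(combinations(sorted(adj), 2),
                   key=lambda p: neighbor_overlap(adj, *p),
                   reverse=True)
    groups, merges = len(adj), []
    for u, v in pairs:
        if groups == k:
            break
        ru, rv = find(u), find(v)
        if ru != rv:          # this pair joins two different groups
            parent[ru] = rv
            groups -= 1
            merges.append((u, v))

    clusters = {}
    for v in adj:
        clusters.setdefault(find(v), set()).add(v)
    return sorted(clusters.values(), key=min), merges


# Toy graph (hypothetical): two triangles joined by the edge 2-3.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
clusters, merges = hierarchical_communities(toy, k=2)
```

With `k=2` the sketch stops once two groups remain; other values of `k` correspond to slices at other heights of the dendrogram.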

Dendrogram example

Limitations of HC approach
“Misplaces” nodes in the periphery. E.g.: Which community should 5 belong to?
→ Alternative approach based on “edge betweenness”

Key Intuition
An inter-community edge has a higher “betweenness” than an intra-community edge, i.e., more shortest paths between node pairs pass through it.
Start with G. Repeatedly remove the edge with the highest betweenness.
Communities = resulting components.

Basic Algorithm
repeat {
◦ Calculate betweenness of all edges;
◦ Remove one with highest betweenness, breaking ties arbitrarily;
} until no edges left.
Remarks:
◦ Which betweenness score?
◦ Calculate upfront and reuse or recalculate?
◦ Can we incrementally recalculate after each edge removal?
◦ Related algorithms for node betweenness by Newman and Brandes.
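A minimal Python sketch of this loop, assuming shortest-path (geodesic) betweenness that is recalculated after every removal. The function names and the adjacency-dict representation are illustrative choices, not taken from the slides or the paper, and ties are broken by whichever edge `max` returns first.

```python
from collections import deque


def edge_betweenness(adj):
    """Geodesic betweenness of every edge of an undirected graph."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: distance and number of shortest paths to each node.
        dist, paths, order = {s: 0}, {s: 1}, []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    paths[v] = 0
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    paths[v] += paths[u]
        # Push credit from the farthest nodes back toward s.
        credit = {v: 1.0 for v in order}
        for v in reversed(order):
            for u in adj[v]:
                if dist[u] == dist[v] - 1:
                    share = credit[v] * paths[u] / paths[v]
                    score[frozenset((u, v))] += share
                    credit[u] += share
    # Every node pair was counted once from each endpoint.
    return {e: b / 2 for e, b in score.items()}


def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def girvan_newman(adj):
    """Yield the components each time an edge removal splits the graph."""
    adj = {u: set(vs) for u, vs in adj.items()}   # work on a copy
    n_comps = len(components(adj))
    while any(adj.values()):                      # until no edges left
        scores = edge_betweenness(adj)            # recalculate every round
        u, v = max(scores, key=scores.get)        # ties broken arbitrarily
        adj[u].discard(v)
        adj[v].discard(u)
        comps = components(adj)
        if len(comps) > n_comps:
            n_comps = len(comps)
            yield comps


# Toy graph (hypothetical): two triangles joined by the bridge 2-3.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
first_split = next(girvan_newman(toy))
```

The successive splits yielded by the generator form the dendrogram from the top down; where to stop is a separate question.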

A Real Example (Zachary’s Karate Club) With recalculation of betweenness. Without recalculation of betweenness.

Scalability Issues

Computing edge betweenness: An Example
[Figure, built up over several slides: a graph on nodes a, b, c, d, e, f, g, each node labeled with its BFS distance d from g and its number of geodesics w to g.]
Breadth-first search: means for doing many things.
Compute #geodesics from every node to g.
◦ Node labels by BFS level: d=0 w=1 (g); d=1 w=1 (two nodes); d=2 w=2 (three nodes); d=3 w=4 (one node).
Have all info. we need for edge betweenness now.
Note: a and f are like leaves: no geodesic to g from other nodes passes through them.
◦ Edge credits shown in the figure, working from the leaves back toward g: 2/4; 1/2; ½(1+2/4); 1/1[1 + ½(1+2/4) + ½(1+2/4) + 1/2].
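The single-source computation in this example can be sketched as follows. The figure itself is not part of the transcript, so the edge list below is an assumption: a hypothetical layout chosen only because it reproduces the d/w labels and the edge credits quoted above (d and e at distance 1, b, c and f at distance 2, a at distance 3). Function and variable names are likewise illustrative.

```python
from collections import deque


def single_source_credits(adj, source):
    """BFS from `source`, then push credit from the leaves back toward it.

    Returns (d, w, credit_on_edge), where credit_on_edge[(child, parent)] is
    the share of geodesics to `source` carried by that edge.
    """
    d, w, order = {source: 0}, {source: 1}, []
    queue = deque([source])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in d:              # first visit: one level further out
                d[v] = d[u] + 1
                w[v] = 0
                queue.append(v)
            if d[v] == d[u] + 1:        # u lies on geodesics from v to source
                w[v] += w[u]

    credit = {v: 1.0 for v in order}    # each node contributes one unit
    credit_on_edge = {}
    for v in reversed(order):           # farthest nodes first
        for u in adj[v]:
            if d[u] == d[v] - 1:
                share = credit[v] * w[u] / w[v]
                credit_on_edge[(v, u)] = share
                credit[u] += share
    return d, w, credit_on_edge


# Assumed edge list (hypothetical reconstruction of the figure).
edges = [("g", "d"), ("g", "e"),
         ("d", "b"), ("d", "c"), ("d", "f"),
         ("e", "b"), ("e", "c"), ("e", "f"),
         ("b", "a"), ("c", "a")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

d, w, credit_on_edge = single_source_credits(adj, "g")
```

Under this assumed layout, `credit_on_edge[("a", "b")]` is 2/4, `credit_on_edge[("f", "d")]` is 1/2, `credit_on_edge[("b", "d")]` is ½(1+2/4), and `credit_on_edge[("d", "g")]` is 1 + ½(1+2/4) + ½(1+2/4) + 1/2 = 3. Repeating the procedure from every node and summing the credits gives the edge betweenness.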

EB Computation summary

EB Computation summary (contd.)

EB computation – complexity analysis

On scaling up CD algorithm Point to ponder!

Closing Remarks 1/2
Newman also proposed other bases for defining edge betweenness:
◦ Electrical current flow through the edge, where every edge is viewed as a unit resistance and we consider all source-sink pairs.
◦ Random walks.
Both are less effective and more expensive than geodesics (see paper for details).
What about directed and weighted cases?

Closing Remarks 2/2
Goodness metric of a community division. Helpful when we don’t know the ground truth.
$Q = \sum_i (e_{ii} - a_i^2)$, where $E$ is the $k \times k$ matrix of the community division: $e_{ij}$ = fraction of edges linking comm. $i$ to comm. $j$; $a_i = \sum_j e_{ij}$.
Q measures the fraction of intra-comm. edges in excess of what is expected by chance (assuming uniform distribution).
See paper for details of experimental results.
Turns out the study of influence/information propagation can suggest new ways of detecting communities: will revisit this issue after we study influence propagation.
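A small Python sketch of the Q computation, under stated assumptions: the graph is undirected and unweighted, `edges` lists each edge once, and an edge between two different communities is split evenly between e_ij and e_ji so that E is symmetric and its entries sum to 1. The function name and input format are illustrative.

```python
def modularity(edges, community):
    """Q = sum_i (e_ii - a_i^2) for a given assignment of nodes to communities.

    edges     : list of (u, v) pairs, each undirected edge listed once
    community : dict mapping node -> community label
    """
    labels = sorted(set(community.values()))
    index = {c: i for i, c in enumerate(labels)}
    k, m = len(labels), len(edges)

    # e[i][j]: fraction of edges linking community i to community j.
    e = [[0.0] * k for _ in range(k)]
    for u, v in edges:
        i, j = index[community[u]], index[community[v]]
        e[i][j] += 0.5 / m          # half of the edge in each direction;
        e[j][i] += 0.5 / m          # for i == j the two halves add up to 1/m

    a = [sum(row) for row in e]     # a_i = sum_j e_ij
    return sum(e[i][i] - a[i] ** 2 for i in range(k))


# Toy graph (hypothetical): two triangles joined by one edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
lumped = {v: "A" for v in range(6)}

q_split = modularity(edges, split)    # 2 * (3/7 - 0.5**2) = 5/14
q_lumped = modularity(edges, lumped)  # a single community gives Q = 0
```

Evaluating Q at each level of the dendrogram and keeping the level with the largest value is one way to choose where to slice it.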

Recommended Reading
◦ J. Ruan and W. Zhang. An Efficient Spectral Algorithm for Network Community Discovery and Its Applications to Biological and Social Networks. ICDM.
◦ M. E. J. Newman. Modularity and community structure in networks. Proceedings of the National Academy of Sciences (USA) 103 (2006): 8577-8582.
◦ Jure Leskovec, Kevin J. Lang, and Michael W. Mahoney. Empirical Comparison of Algorithms for Network Community Detection. WWW.
◦ M. E. J. Newman. Communities, modules and large-scale structure in networks. Nature Physics 8, 25-31 (2012).