Department of Computer and IT Engineering University of Kurdistan Social Network Analysis Communities By: Dr. Alireza Abdollahpouri.

Slides:



Advertisements
Similar presentations
Class 12: Communities Network Science: Communities Dr. Baruch Barzel.
Advertisements

Fast algorithm for detecting community structure in networks M. E. J. Newman Department of Physics and Center for the Study of Complex Systems, University.
Social network partition Presenter: Xiaofei Cao Partick Berg.
SEEM Tutorial 4 – Clustering. 2 What is Cluster Analysis?  Finding groups of objects such that the objects in a group will be similar (or.
Cluster Analysis: Basic Concepts and Algorithms
1 CSE 980: Data Mining Lecture 16: Hierarchical Clustering.
Hierarchical Clustering. Produces a set of nested clusters organized as a hierarchical tree Can be visualized as a dendrogram – A tree-like diagram that.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ What is Cluster Analysis? l Finding groups of objects such that the objects in a group will.
Modularity and community structure in networks
Community Detection Laks V.S. Lakshmanan (based on Girvan & Newman. Finding and evaluating community structure in networks. Physical Review E 69,
Community Detection Algorithm and Community Quality Metric Mingming Chen & Boleslaw K. Szymanski Department of Computer Science Rensselaer Polytechnic.
Graph Partitioning Dr. Frank McCown Intro to Web Science Harding University This work is licensed under Creative Commons Attribution-NonCommercial 3.0Attribution-NonCommercial.
Online Social Networks and Media. Graph partitioning The general problem – Input: a graph G=(V,E) edge (u,v) denotes similarity between u and v weighted.
Introduction to Bioinformatics
V4 Matrix algorithms and graph partitioning
Structural Inference of Hierarchies in Networks BY Yu Shuzhi 27, Mar 2014.
Identity and search in social networks Presented by Pooja Deodhar Duncan J. Watts, Peter Sheridan Dodds and M. E. J. Newman.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ What is Cluster Analysis? l Finding groups of objects such that the objects in a group will.
© University of Minnesota Data Mining for the Discovery of Ocean Climate Indices 1 CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance.
Fast algorithm for detecting community structure in networks.
Modularity in Biological networks.  Hypothesis: Biological function are carried by discrete functional modules.  Hartwell, L.-H., Hopfield, J. J., Leibler,
Cluster Analysis: Basic Concepts and Algorithms
A scalable multilevel algorithm for community structure detection
Network analysis and applications Sushmita Roy BMI/CS 576 Dec 2 nd, 2014.
Clustering Unsupervised learning Generating “classes”
Community detection algorithms: a comparative analysis Santo Fortunato.
Uncovering Overlap Community Structure in Complex Networks using Particle Competition Fabricio A. Liang
Chapter 3. Community Detection and Evaluation May 2013 Youn-Hee Han
CSE5334 DATA MINING CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai Li (Slides.
Clustering.
Most of contents are provided by the website Graph Essentials TJTSD66: Advanced Topics in Social Media.
Communities. Questions 1.What is a community (intuitively)? Examples and fundamental hypothesis 2.What do we really mean by communities? Basic definitions.
Network Community Behavior to Infer Human Activities.
COMMUNITY DISCOVERY PART 1: A (BRIEF) INTRODUCTION Giulio Rossetti WMA - 4 May 2015.
University at BuffaloThe State University of New York Detecting Community Structure in Networks.
Community Discovery in Social Network Yunming Ye Department of Computer Science Shenzhen Graduate School Harbin Institute of Technology.
Community detection via random walk Draft slides.
Finding community structure in very large networks
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Cluster Analysis This lecture node is modified based on Lecture Notes for Chapter.
Community structure in graphs Santo Fortunato. More links “inside” than “outside” Graphs are “sparse” “Communities”
Network Theory: Community Detection Dr. Henry Hexmoor Department of Computer Science Southern Illinois University Carbondale.
Selected Topics in Data Networking Explore Social Networks:
James Hipp Senior, Clemson University.  Graph Representation G = (V, E) V = Set of Vertices E = Set of Edges  Adjacency Matrix  No Self-Inclusion (i.
Hierarchical Organization in Complex Networks by Ravasz and Barabasi İlhan Kaya Boğaziçi University.
GUILLOU Frederic. Outline Introduction Motivations The basic recommendation system First phase : semantic similarities Second phase : communities Application.
Cmpe 588- Modeling of Internet Emergence of Scale-Free Network with Chaotic Units Pulin Gong, Cees van Leeuwen by Oya Ünlü Instructor: Haluk Bingöl.
Graph clustering to detect network modules
Unsupervised Learning
Cohesive Subgraph Computation over Large Graphs
Hierarchical Agglomerative Clustering on graphs
Computational Molecular Biology
Semi-Supervised Clustering
Clustering CSC 600: Data Mining Class 21.
Groups of vertices and Core-periphery structure
Data Mining K-means Algorithm
Frontiers of Network Science Class 15: Degree Correlations II
Greedy Algorithm for Community Detection
Community detection in graphs
TELCOM2125: Network Science and Analysis
Assessing Hierarchical Modularity in Protein Interaction Networks
Finding modules on graphs
Michael L. Nelson CS 495/595 Old Dominion University
Overcoming Resolution Limits in MDL Community Detection
Data Mining – Chapter 4 Cluster Analysis Part 2
Cluster Analysis.
Text Categorization Berlin Chen 2003 Reference:
Detecting Important Nodes to Community Structure
Unsupervised Learning
Presentation transcript:

Department of Computer and IT Engineering University of Kurdistan Social Network Analysis Communities By: Dr. Alireza Abdollahpouri

2 1.What is a community (intuitively)? Examples and fundamental hypothesis 2.What do we really mean by communities? Basic definitions 3.Graph partitioning and its computational complexity 4.Hierarchical clustering: Ravasz algorithm and its computational complexity 5.Hierarchical clustering: Girvan-Newman algorithm and its complexity 6.Hierarchy in real networks 7.Modularity Outline

3 Introduction

4 Introduction: Belgium

5 Same area as Massachusetts (~12,000 sq miles) Same population as Ohio (~11.5 millions ) Introduction: Belgium

6 V.D. Blondel et al, J. Stat. Mech. P10008 (2008). A.-L. Barabási, Network Science: Communities. Introduction: Belgium

7 Examples of communities

8 Zachary’s Karate Club

9 Citation history of the Zachary’s Karate club paper Zachary’s Karate Club

10 The first scientist at any conference on networks who uses Zachary's karate club as an example is inducted into the Zachary Karate Club Club, and awarded a prize. Chris Moore (9 May 2013). Mason Porter (NetSci, June 2013). Yong-Year Ahn (Oxford University, July 2013) Marián Boguñá (ECCS, September 2013). Mark Newman (Netsci, June 2014) Zachary’s Karate Club

11  Karate Club: Breakup of the club  Belgian Phone Data: Language spoken Auxiliary information

12 Basics of communities

13 We focus on the mesoscopic scale of the network Microscopic MesoscopicMacroscopic Communities

14 H1: A network’s community structure is uniquely encoded in its wiring diagram Fundamental Hypothesis

15 H2: Connectedness Hypothesis A community corresponds to a connected subgraph. H3: Density Hypothesis Communities correspond to locally dense neighborhoods of a network. Basics of Communities

16 H2: Connectedness Hypothesis A community corresponds to a connected subgraph. H3: Density Hypothesis Communities correspond to locally dense neighborhoods of a network. Basics of Communities

17 Cliques as communities A clique is a complete subgraph of k-nodes R.D. Luce & A.D. Perry, Psychometrika 14 (1949) A.-L. Barabási, Network Science: Communities. Basics of Communities

18 Triangles are frequent; larger cliques are rare. Communities do not necessarily correspond to complete subgraphs, as many of their nodes do not link directly to each other. Finding the cliques of a network is computationally rather demanding, being a so-called NP-complete problem. Cliques as communities Basics of Communities

19 Consider a connected subgraph C of N c nodes Internal degree, k i int : set of links of node i that connects to other nodes of the same community C. External degree k i ext : the set of links of node i that connects to the rest of the network. If k i ext =0: all neighbors of i belong to C, and C is a good community for i. If k i int =0, all neighbors of i belong to other communities, then i should be assigned to a different community. Strong and weak communities A.-L. Barabási, Network Science: Communities. Basics of Communities

20 Strong community: Each node of C has more links within the community than with the rest of the graph. Weak community: The total internal degree of C exceeds its total external degree, Clique Strong Weak A.-L. Barabási, Network Science: Communities. Basics of Communities

21 How many ways can we partition a network into 2 communities? Divide a network into two equal non-overlapping subgraphs, such that the number of links between the nodes in the two groups is minimized. Two subgroups of size n 1 and n 2. Total number of combinations: N=10  256 partitions (1 ms) N=100  partitions (10 21 years) Graph bisection A.-L. Barabási, Network Science: Communities. Number of Partitions

billion transistors partition the full wiring diagram of an integrated circuit into smaller subgraphs, so that they minimize the number of connections between them. Graph Partitioning Graph Partitions (history)

23 Kerninghan-Lin Algorithm for graph bisection Partition a network into two groups of predefined size. This partition is called cut. Inspect each a pair of nodes, one from each group. Identify the pair that results in the largest reduction of the cut size (links between the two groups) if we swap them Swap them. If no pair deduces the cut size, we swap the pair that increases the cut size the least. The process is repeated until each node is moved once. Graph Partitions (history)

24 Community detection The number and size of the communities are unknown at the beginning. Partition Division of a network into groups of nodes, so that each node belongs to one group. Bell Number: number of possible partitions of N nodes A.-L. Barabási, Network Science: Communities. Number of communities

25 Hierarchical Clustering

26 Agglomerative algorithms merge nodes and communities with high similarity. Divisive algorithms split communities by removing links that connect nodes with low similarity. 1. Build a similarity matrix for the network 2. Similarity matrix: how similar two nodes are to each other  we need to determine from the adjacency matrix 3. Hierarchical clustering iteratively identifies groups of nodes with high similarity, following one of two distinct strategies: Hierarchical tree or dendrogram: visualize the history of the merging or splitting process the algorithm follows. Horizontal cuts of this tree offer various community partitions. 4. Hierarchical Clustering

27 Step 1: Define the Similarity Matrix (Ravasz algorithm) High for node pairs that likely belong to the same community, low for those that likely belong to different communities. Nodes that connect directly to each other and/or share multiple neighbors are more likely to belong to the same dense local neighborhood, hence their similarity should be large. Topological overlap matrix: J N (i,j) : number of common neighbors of node i and j ; (+1) if there is a direct link between i and j ; Agglomerative algorithms merge nodes and communities with high similarity. Agglomerative Algorithms

28 Step 2: Decide Group Similarity Groups are merged based on their mutual similarity through single, complete or average cluster linkage Agglomerative Algorithms

29 Step 2: Decide Group Similarity. groups are merged based on their mutual similarity: single, complete and average cluster similarity Agglomerative Algorithms

30 Step 3: Apply Hierarchical Clustering Assign each node to a community of its own and evaluate the similarity for all node pairs. The initial similarities between these “communities” are simply the node similarities. Find the community pair with the highest similarity and merge them to form a single community. Calculate the similarity between the new community and all other communities. Repeat from Step 2 until all nodes are merged into a single community. Step 4: Build Dendrogram Describes the precise order in which the nodes are assigned to communities. E. Ravasz et al., Science 297 (2002). A.-L. Barabási, Network Science: Communities. Agglomerative Algorithms

31 Computational complexity: Step 1 (calculation similarity matrix): Step 2-3 (group similarity): Step 4 (dendrogram): E. Ravasz et al., Science 297 (2002). A.-L. Barabási, Network Science: Communities. Agglomerative Algorithms

32 Step 1: Define a Centrality Measure (Girvan-Newman algorithm) Link betweenness is the number of shortest paths between all node pairs that run along a link. Random-walk betweenness. A pair of nodes m and n are chosen at random. A walker starts at m, following each adjacent link with equal probability until it reaches n. Random walk betweenness x ij is the probability that the link i→j was crossed by the walker after averaging over all possible choices for the starting nodes m and n Divisive algorithms split communities by removing links that connect nodes with low similarity. M. Girvan & M.E.J. Newman, PNAS 99 (2002). A.-L. Barabási, Network Science: Communities. Divisive Algorithms

33 M. Girvan & M.E.J. Newman, PNAS 99 (2002). A.-L. Barabási, Network Science: Communities. Step 2: Hierarchical Clustering a)Compute of the centrality of each link. b)Remove the link with the largest centrality; in case of a tie, choose one randomly. c)Recalculate the centrality of each link for the altered network. d)Repeat until all links are removed (yields a dendrogram). Divisive Algorithms

34 M. Girvan & M.E.J. Newman, PNAS 99 (2002). A.-L. Barabási, Network Science: Communities. Step 2: Hierarchical Clustering a)Compute of the centrality of each link. b)Remove the link with the largest centrality; in case of a tie, choose one randomly. c)Recalculate the centrality of each link for the altered network. d)Repeat until all links are removed (yields a dendrogram). Divisive Algorithms

35 Step 2: Hierarchical Clustering a)Compute of the centrality of each link. b)Remove the link with the largest centrality; in case of a tie, choose one randomly. c)Recalculate the centrality of each link for the altered network. d)Repeat until all links are removed (yields a dendrogram). M. Girvan & M.E.J. Newman, PNAS 99 (2002). A.-L. Barabási, Network Science: Communities. Divisive Algorithms

36 M. Girvan & M.E.J. Newman, PNAS 99 (2002). A.-L. Barabási, Network Science: Communities. Computational complexity: Step 1a (calculation betweenness centrality): Step 1b (Recalculation of betweenness centrality for all links): for sparse networks Divisive Algorithms

37 Hierarchy in networks

38 (1) Scale-free property The obtained network is scale-free, its degree distribution following a power- law with E. Ravasz & A.-L. Barabási, PRE 67 (2003). A.-L. Barabási, Network Science: Communities. Hierarchy in networks

39 (1) Scale-free property The obtained network is scale-free, its degree distribution following a power- law with E. Ravasz & A.-L. Barabási, PRE 67 (2003). A.-L. Barabási, Network Science: Communities. Hierarchy in networks

40 (2) Clustering coefficient scaling with k Small k nodes: *high clustering coefficient; *their neighbors tend to link to each other in highly interlinked, compact communities. High k nodes (hubs): *small clustering coefficient; *connect independent communities. E. Ravasz & A.-L. Barabási, PRE 67 (2003). A.-L. Barabási, Network Science: Communities. Hierarchy in networks

41 (3) Clustering coefficient independent of N E. Ravasz & A.-L. Barabási, PRE 67 (2003). A.-L. Barabási, Network Science: Communities. Hierarchy in networks

42 (3) Clustering coefficient independent of N E. Ravasz & A.-L. Barabási, PRE 67 (2003). A.-L. Barabási, Network Science: Communities. Hierarchy in networks

Scale-free (a) Modular (b) Network Science: Degree Correlations March 7, 2011 Hierarchy in networks

2. Scaling clustering coefficient (DGM) 1. Scale-free 3. Clustering coefficient independent of N x E. Ravasz & A.-L. Barabási, PRE 67 (2003). A.-L. Barabási, Network Science: Communities. Hierarchy in networks

A.-L. Barabási, Network Science: Communities. POWER GRID INTERNET Hierarchy in real networks

Network Science: Degree Correlations March 7, 2011 Barabási, Ravasz, Vicsek, Physica A 2003 Dorogovtsev, Goltsev, Mendes, 2001 HIERARCHICAL MODELS

47 A.-L. Barabási, Network Science: Communities. Where to “cut”? Ambiguity in Hierarchical clustering

48 Modularity

49 MEJ Newman, PNAS 103 (2006). A.-L. Barabási, Network Science: Communities. H4: Random Hypothesis Randomly wired networks are not expected to have a community structure. Modularity

50 MEJ Newman, PNAS 103 (2006). A.-L. Barabási, Network Science: Communities. Imagine a partition in n c communities Modularity H4: Random Hypothesis Randomly wired networks are not expected to have a community structure. Modularity

51 MEJ Newman, PNAS 103 (2006). A.-L. Barabási, Network Science: Communities. Imagine a partition in n c communities Modularity Original data H4: Random Hypothesis Randomly wired networks are not expected to have a community structure. Modularity

52 MEJ Newman, PNAS 103 (2006). A.-L. Barabási, Network Science: Communities. Imagine a partition in n c communities Modularity Original data Expected connections, a model H4: Random Hypothesis Randomly wired networks are not expected to have a community structure. Modularity

53 MEJ Newman, PNAS 103 (2006). A.-L. Barabási, Network Science: Communities. Imagine a partition in n c communities Modularity Original data Expected connections, a model Relative to a specific partition H4: Random Hypothesis Randomly wired networks are not expected to have a community structure. Modularity

54 MEJ Newman, PNAS 103 (2006). A.-L. Barabási, Network Science: Communities. Imagine a partition in n c communities Modularity Original data Expected connections, a model Relative to a specific partition Modularity is a measure associated to a partition Random network H4: Random Hypothesis Randomly wired networks are not expected to have a community structure. Modularity

55 Another way of writing M MEJ Newman, PNAS 103 (2006). A.-L. Barabási, Network Science: Communities. where L C is the number of links within C. In a similar fashion, the second term becomes We can rewrite the first term as Finally we get: kc is the total degree of the nodes in this community. Modularity

56 MEJ Newman, PNAS 103 (2006). A.-L. Barabási, Network Science: Communities. H5: Maximal Modularity Hypothesis The partition with the maximum modularity M for a given network offers the optimal community structure Modularity

57 MEJ Newman, PNAS 103 (2006). A.-L. Barabási, Network Science: Communities. H5: Maximal Modularity Hypothesis The partition with the maximum modularity M for a given network offers the optimal community structure Find Goal that maximizes M Modularity

58 Optimal partition, that maximizes the modularity. Sub-optimal but positive modularity. Negative Modularity: If we assign each node to a different community. Zero modularity: Assigning all nodes to the same community, independent of the network structure. Modularity is size dependent Which partition ? A.-L. Barabási, Network Science: Communities. Modularity

59 A greedy algorithm, which iteratively joins nodes if the move increases the new partition’s modularity. Step 1. Assign each node to a community of its own. Hence we start with N communities. Step 2. Inspect each pair of communities connected by at least one link and compute the modularity variation obtained if we merge these two communities. Step 3. Identify the community pairs for which ΔM is the largest and merge them. Note that modularity of a particular partition is always calculated from the full topology of the network. Step 4. Repeat step 2 until all nodes are merged into a single community. Step 5. Record for each step and select the partition for which the modularity is maximal. MEJ Newman, PRE 69 (2004). A.-L. Barabási, Network Science: Communities. Modularity based community identification

60 Which partition ? A.-L. Barabási, Network Science: Communities. Modularity can be used to compare different partitions provided by other algorithms, like hierarchical clustering It can be used to design new algorithms, aiming at maximizing M Modularity

61 Which partition ? A.-L. Barabási, Network Science: Communities. Modularity for the Girvan-Newman

62 MEJ Newman, PRE 69 (2004). A.-L. Barabási, Network Science: Communities. Computational complexity: Step 1-2 (calculation of ΔM for L links ): Step 3 (matrix update): Step 4 ( N-1 community merges): for sparse networks Modularity based community identification

63 MEJ Newman, PRE 69 (2004). A.-L. Barabási, Network Science: Communities. Computational complexity: Step 1-2 (calculation of ΔM for L links ): Step 3 (matrix update): Step 4 ( N-1 community merges): for sparse networks Modularity based community identification

64 A.-L. Barabási, Network Science: Communities. k A and k B total degree in A and B A B Resolution limit Limits of Modularity

65 A.-L. Barabási, Network Science: Communities. k A and k B total degree in A and B If and A B Resolution limit Limits of Modularity

66 A.-L. Barabási, Network Science: Communities. k A and k B total degree in A and B If and A B We merge A and B to maximize modularity. Resolution limit Limits of Modularity

67 A.-L. Barabási, Network Science: Communities. k A and k B total degree in A and B If and Assuming A B We merge A and B to maximize modularity. Resolution limit Limits of Modularity

68 A.-L. Barabási, Network Science: Communities. k A and k B total degree in A and B If and Assuming Modularity has a resolution limit, as it cannot detect communities smaller than this size. A B We merge A and B to maximize modularity. Resolution limit Limits of Modularity

69 A.-L. Barabási, Network Science: Communities. One maximum? Limits of Modularity

70 Null models Expected connections, a model can take into account weights can take into account directions can take into account attributes or space S. Fortunato, Phys. Rep. 486 (2010) P. Expert el al., PNAS 108 (2011) Limits of Modularity

71 Gephi NetworkX R assigns self-loops to nodes to increase or decrease the aversion of nodes to form communities Finds the partition that maximizes modularity (considers weights and direction) Calculates the modularity of the partition you provide Online Resources(Modularity)

72 The greedy algorithm is neither particularly fast nor particularly successful at maximizing M. Scalability: Due to the sparsity of the adjacency matrix, the update of the matrix involves a large number of useless operations. The use of data structures for sparse matrices can decrease the complexity of the computational algorithm to, which allows us to analyze is of networks up to nodes. See "Fast Modularity" Community Structure Inference Algorithm for the code. A fast greedy algorithm was proposed by Blondel and collaborators, that can process networks with millions of nodes. For the description of the algorithm see ‪Louvain method: Finding communities in large networks‬ for the code.