Social network partition. Presenters: Xiaofei Cao, Partick Berg.


Problem Statement
Say we have a graph whose nodes could represent anything; for our purposes, let’s say it represents a population spread out across a country. Some of these nodes are clustered closely together, forming a “community”, while others lie further away and form a different “community”. We want to detect the communities in this graph. So how do we do this?

Some Definitions
Before we try to develop a solution to this problem, we should get a few common definitions out of the way.
Degree – The degree of a node is the number of edges connected to it.
Community – A community is a set of nodes, grouped by some similarity, that are densely connected internally.

[Figure: example graph with nodes A, B, C, D, E, F, G, and H. The degree of C is 3; the degree of D is 2.]

Why find Communities?
We want to find communities to see the relations within groups and their connections to other groups. We can use this to identify groups of people who share particular traits, such as terrorist organizations (or groups in any other social network).

How do we find Communities?
Vertex Betweenness – This is a measure of a vertex’s (or node’s) centrality within the graph. It counts the number of times a node acts as a bridge on a shortest path between two other nodes.

Use BFS to find shortest paths
We use the BFS (Breadth First Search) algorithm to find the shortest paths from each node to every other node. From these we can calculate the vertex betweenness of each node.
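As a minimal sketch of the BFS step, the snippet below computes hop-count distances from one source and repeats that for every node. It assumes an unweighted, undirected graph stored as an adjacency dictionary; the function name `bfs_distances` and the four-node graph are illustrative and do not come from the slides.

```python
from collections import deque

def bfs_distances(adj, source):
    """Return the number of hops from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:          # the first visit is along a shortest path
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Hypothetical example graph (not the one drawn on the slides).
adj = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}

# One BFS per starting node gives the distances between all pairs.
all_pairs = {s: bfs_distances(adj, s) for s in adj}
```

Running one BFS per node is what makes the later cost analysis scale with the number of nodes.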

Girvan Newman Algorithm
We can use the Girvan Newman algorithm to detect communities in the graph. Girvan Newman takes the “betweenness” score and extends the definition to edges: the “betweenness” score of an edge is the number of shortest paths between pairs of nodes that run along it. If there is more than one shortest path between a pair of nodes, each path is assigned an equal weight, so that the weights of all paths between that pair sum to one.

Girvan Newman Algorithm Continued
With this edge “betweenness” scoring, edges between nodes inside a community receive lower scores, while edges that connect one community to another receive higher scores. To find the communities, we remove the highest scoring edge and recalculate the “betweenness” score for each of the affected edges.

Example
[Figure: example graph with nodes A, B, C, and D.]
The highest edge score is 6, on the edge connecting node A to node C, so we remove this edge first.

Girvan Newman Algorithm Continued
We continue to remove the highest scoring edge from the graph and recalculate until no edges remain. The end result is a dendrogram that shows the clusters of communities in our graph.
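The removal loop can be sketched as follows. This is an illustrative sketch rather than the presenters' code: it assumes an unweighted, undirected graph without self-loops, stored as a dictionary of neighbor sets, and it recalculates every edge score after each removal instead of only the affected ones. The names `edge_betweenness`, `components`, and `girvan_newman` are made up for this example, and the list of successive partitions stands in for the dendrogram.

```python
from collections import deque

def edge_betweenness(adj):
    """Edge betweenness of an unweighted, undirected graph (dict of sets)."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, paths, order = {s: 0}, {s: 1}, []
        queue = deque([s])
        while queue:                        # top down: count shortest paths
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    paths[v] = 0
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    paths[v] += paths[u]
        credit = {v: 1.0 for v in order}
        for v in reversed(order):           # bottom up: pass scores to upper layer
            for u in adj[v]:
                if dist[u] == dist[v] - 1:
                    share = credit[v] / paths[v] * paths[u]
                    score[frozenset((u, v))] += share
                    credit[u] += share
    # Every pair of nodes was counted once from each endpoint, so halve.
    return {e: c / 2 for e, c in score.items()}

def components(adj):
    """Connected components, each returned as a set of nodes."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def girvan_newman(adj):
    """Remove the highest scoring edge until none remain; record each split."""
    adj = {u: set(vs) for u, vs in adj.items()}    # work on a copy
    levels = [components(adj)]
    while any(adj.values()):
        scores = edge_betweenness(adj)
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
        comps = components(adj)
        if len(comps) > len(levels[-1]):           # the graph just split
            levels.append(comps)
    return levels

# Hypothetical graph: two triangles joined by the bridge c-d.
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
levels = girvan_newman(graph)
first_split = sorted(sorted(c) for c in levels[1])
# first_split == [["a", "b", "c"], ["d", "e", "f"]]
```

In this example the bridge c-d carries all nine shortest paths between the two triangles, so it is removed first and the first split recovers the two triangles.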

Proposed by Girvan and Newman in the paper: "Community structure in social and biological networks." Proceedings of the National Academy of Sciences (2002).
Complete algorithm in the paper: "Finding and evaluating community structure in networks." Physical Review E 69.2 (2004).
Sequential Algorithm

Girvan Newman algorithm
Goal: find the edge with the highest betweenness score and remove it. Continue doing so until the graph has been partitioned.
Input: the graph at each iteration (adjacency matrix).
Output: the betweenness score of every edge (betweenness matrix).
The algorithm can be separated into 2 parts.

Part I: Find the number of shortest paths from one node to every other node
Work from the top down. Use breadth-first search to generate a new view of the graph from that node, then find the number of shortest paths to each node.
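Part I can be sketched as a BFS that keeps a path counter next to each distance. The code assumes an unweighted graph stored as an adjacency dictionary; `count_shortest_paths` and the diamond-shaped example are hypothetical names and data chosen for illustration.

```python
from collections import deque

def count_shortest_paths(adj, source):
    """Return (layer, paths): BFS layer and number of shortest paths per node."""
    layer = {source: 0}
    paths = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in layer:             # v is discovered one layer below u
                layer[v] = layer[u] + 1
                paths[v] = 0
                queue.append(v)
            if layer[v] == layer[u] + 1:   # every upper-layer neighbor adds its paths
                paths[v] += paths[u]
    return layer, paths

# Hypothetical diamond: S reaches T through A or through B.
adj = {"S": ["A", "B"], "A": ["S", "T"], "B": ["S", "T"], "T": ["A", "B"]}
layer, paths = count_shortest_paths(adj, "S")
# layer == {"S": 0, "A": 1, "B": 1, "T": 2}
# paths == {"S": 1, "A": 1, "B": 1, "T": 2}
```

A node's count is the sum of the counts of its neighbors in the layer above, which is why T ends up with two shortest paths.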

[Figure: view of the graph from the starting node.]

Part II: Calculate the edge betweenness scores for every iteration
Work from the bottom up. Every node holds one score. The score of each edge equals Node_score/#shortest_path*(# of shortest paths to the upper-layer node). Sum up the edge scores over every iteration.
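The bottom-up pass might look like the sketch below. It assumes, following Newman and Girvan's description, that every node starts with a score of 1 and then collects the scores of the edges beneath it; the slide only says that every node holds one score, so treat that starting value as an assumption. Exact fractions are used so the results resemble the fractional labels on the slides. The name `edge_scores_from` and the diamond graph are hypothetical.

```python
from collections import deque
from fractions import Fraction

def edge_scores_from(adj, source):
    """Edge scores contributed by one starting node (one 'view')."""
    # Part I (top down): BFS layers and shortest-path counts.
    layer, paths, order = {source: 0}, {source: 1}, []
    queue = deque([source])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in layer:
                layer[v] = layer[u] + 1
                paths[v] = 0
                queue.append(v)
            if layer[v] == layer[u] + 1:
                paths[v] += paths[u]

    # Part II (bottom up): assumed starting score of 1 per node.
    node_score = {v: Fraction(1) for v in order}
    edge_score = {}
    for v in reversed(order):                  # deepest layer first
        for u in adj[v]:
            if layer[u] == layer[v] - 1:       # u is an upper-layer node
                # Node_score / #shortest_path * (# of shortest paths to u)
                s = node_score[v] / paths[v] * paths[u]
                edge_score[frozenset((u, v))] = s
                node_score[u] += s
    return edge_score

# Hypothetical diamond: S reaches T through A or through B.
adj = {"S": ["A", "B"], "A": ["S", "T"], "B": ["S", "T"], "T": ["A", "B"]}
scores = edge_scores_from(adj, "S")
# Edges A-T and B-T each get 1/2; edges S-A and S-B each get 3/2.
```

Summing these per-source dictionaries over every starting node gives the betweenness matrix for one iteration.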

[Figure: view from the starting node, with nodes and edges labeled by their scores, including 1/3, 4/3, 1, 5/6, 25/6, 11/6, 1/2, and 3/2.]
Score=Node_score/#shortest_path*(# of shortest paths to the upper-layer node)

Analysis of time complexity
Number of iterations of the outer loop: n (the number of nodes)
Time complexity of finding the shortest paths: O(n^2)
Time complexity of calculating the betweenness scores: O(n)
Adding to the betweenness matrix: O(n^2)
Total time complexity: n*(n^2+n+n^2)=O(n^3)

Parallel algorithm (intuitively)
Assign every processor the same adjacency matrix of the original network. The processors start from different nodes, generating the views and calculating the betweenness matrix for each of their starting nodes. Each processor first sums its matrices locally. The processors then perform a parallel prefix sum and update the original network by removing the highest scoring edge.

[Figure: the parallel algorithm on 8 processors, P1 to P8, with 24 starting nodes. Gn: start from node n in the graph.]
Starting graphs: P1: G1, G2, G3; P2: G4, G5, G6; P3: G7, G8, G9; P4: G10, G11, G12; P5: G13, G14, G15; P6: G16, G17, G18; P7: G19, G20, G21; P8: G22, G23, G24
Breadth-first search produces the views: P1: V1, V2, V3; P2: V4, V5, V6; P3: V7, V8, V9; P4: V10, V11, V12; P5: V13, V14, V15; P6: V16, V17, V18; P7: V19, V20, V21; P8: V22, V23, V24
Sum the betweenness scores locally: one matrix per processor, B1 to B8
Parallel prefix sum over B1 to B8
Use the B8 value to update the network
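The prefix sum stage can be simulated sequentially to see why it takes about log(p) rounds. The sketch below is a simulation under stated assumptions, not the presenters' implementation: each list entry stands for one processor's local result, plain integers stand in for the n-by-n betweenness matrices, and the doubling-distance scheme is one common way to do a parallel prefix sum. The name `parallel_prefix_sum` is hypothetical.

```python
def parallel_prefix_sum(local):
    """Simulate an inclusive prefix sum across p processors.

    Returns the final values and the number of communication rounds.
    """
    values = list(local)
    p = len(values)
    step, rounds = 1, 0
    while step < p:
        # All processors act at once: read the old list, build a new one.
        values = [
            values[i] + values[i - step] if i >= step else values[i]
            for i in range(p)
        ]
        step *= 2
        rounds += 1
    return values, rounds

# Stand-ins for the local sums B1..B8 (integers instead of matrices).
local_sums = [1, 2, 3, 4, 5, 6, 7, 8]
prefix, rounds = parallel_prefix_sum(local_sums)
# prefix == [1, 3, 6, 10, 15, 21, 28, 36], rounds == 3
# The last processor (B8) now holds the global total, 36.
```

With real betweenness matrices each round adds two n-by-n matrices, which costs O(n^2) per round and O(n^2*log(p)) for the whole stage.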

Analysis of time complexity (parallel)
Number of iterations: n/p
Finding the number of shortest paths: O(n^2)
Finding the betweenness scores: O(n)
Adding the betweenness scores locally: O(n^2)
Adding the betweenness scores globally (prefix sum): O(n^2*log(p))
Total time complexity: n/p*(n^2+n+n^2)+n^2*log(p)=O(n^2(n/p+log(p)))

Continued
Speedup: n^3/(n^2(n/p+log(p))) = n/(n/p+log(p)).
When n=p*log(p), speedup = p*log(p)/(2*log(p)) = p/2, which is proportional to p, so the algorithm is cost optimal.
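The speedup expression is easy to check numerically. In the sketch below the logarithm is taken base 2, which is an assumption (the slides write only log(p)), and the function name `speedup` is illustrative. Substituting n = p*log(p) makes the denominator 2*log(p), so the ratio works out to p/2.

```python
import math

def speedup(n, p):
    """Sequential time over parallel time: n / (n/p + log2(p))."""
    return n / (n / p + math.log2(p))

p = 8
n = p * int(math.log2(p))    # n = p*log(p) = 24 starting nodes
s = speedup(n, p)            # 24 / (3 + 3) = 4.0, which is p / 2
efficiency = s / p           # 0.5, a constant fraction of the ideal
```

Because the efficiency stays at a constant 1/2 when n grows as p*log(p), the processor-time product remains within a constant factor of the sequential O(n^3) cost.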

Question