Community detection via random walk Draft slides.

Slides:



Advertisements
Similar presentations
Social network partition Presenter: Xiaofei Cao Partick Berg.
Advertisements

1 CSE 980: Data Mining Lecture 16: Hierarchical Clustering.
Hierarchical Clustering. Produces a set of nested clusters organized as a hierarchical tree Can be visualized as a dendrogram – A tree-like diagram that.
Analysis and Modeling of Social Networks Foudalis Ilias.
Data Mining Cluster Analysis: Basic Concepts and Algorithms
The Greedy Approach Chapter 8. The Greedy Approach It’s a design technique for solving optimization problems Based on finding optimal local solutions.
Community Detection Laks V.S. Lakshmanan (based on Girvan & Newman. Finding and evaluating community structure in networks. Physical Review E 69,
Community Detection Algorithm and Community Quality Metric Mingming Chen & Boleslaw K. Szymanski Department of Computer Science Rensselaer Polytechnic.
More on Clustering Hierarchical Clustering to be discussed in Clustering Part2 DBSCAN will be used in programming project.
Distributed Breadth-First Search with 2-D Partitioning Edmond Chow, Keith Henderson, Andy Yoo Lawrence Livermore National Laboratory LLNL Technical report.
Networks. Graphs (undirected, unweighted) has a set of vertices V has a set of undirected, unweighted edges E graph G = (V, E), where.
Lecture 21: Spectral Clustering
Clustering… in General In vector space, clusters are vectors found within  of a cluster vector, with different techniques for determining the cluster.
Fast algorithm for detecting community structure in networks.
Modularity in Biological networks.  Hypothesis: Biological function are carried by discrete functional modules.  Hartwell, L.-H., Hopfield, J. J., Leibler,
Clustering Social Networks Isabelle Stanton, University of Virginia Joint work with Nina Mishra, Robert Schreiber, and Robert E. Tarjan.
A scalable multilevel algorithm for community structure detection
1 AutoPart: Parameter-Free Graph Partitioning and Outlier Detection Deepayan Chakrabarti
1 Analyzing Kleinberg’s (and other) Small-world Models Chip Martel and Van Nguyen Computer Science Department; University of California at Davis.
© University of Minnesota Data Mining for the Discovery of Ocean Climate Indices 1 CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance.
Clustering Unsupervised learning Generating “classes”
Models of Influence in Online Social Networks
Randomized Algorithms - Treaps
Computer Vision James Hays, Brown
Network Characterization via Random Walks B. Ribeiro, D. Towsley UMass-Amherst.
Clustering Spatial Data Using Random Walk David Harel and Yehuda Koren KDD 2001.
Spanning Trees CSIT 402 Data Structures II 1. 2 Two Algorithms Prim: (build tree incrementally) – Pick lower cost edge connected to known (incomplete)
CSE5334 DATA MINING CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai Li (Slides.
Communities. Questions 1.What is a community (intuitively)? Examples and fundamental hypothesis 2.What do we really mean by communities? Basic definitions.
Ch. Eick: Introduction to Hierarchical Clustering and DBSCAN 1 Remaining Lectures in Advanced Clustering and Outlier Detection 2.Advanced Classification.
Hierarchical Clustering Produces a set of nested clusters organized as a hierarchical tree Can be visualized as a dendrogram – A tree like diagram that.
Presentation: Genetic clustering of social networks using random walks ELSEVIER Computational Statistics & Data Analysis February 2007 Genetic clustering.
Data Structures and Algorithms in Parallel Computing Lecture 7.
Community Discovery in Social Network Yunming Ye Department of Computer Science Shenzhen Graduate School Harbin Institute of Technology.
CS 590 Term Project Epidemic model on Facebook
Complexity and Efficient Algorithms Group / Department of Computer Science Testing the Cluster Structure of Graphs Christian Sohler joint work with Artur.
Exponential random graphs and dynamic graph algorithms David Eppstein Comp. Sci. Dept., UC Irvine.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Cluster Analysis This lecture node is modified based on Lecture Notes for Chapter.
Community structure in graphs Santo Fortunato. More links “inside” than “outside” Graphs are “sparse” “Communities”
Network Theory: Community Detection Dr. Henry Hexmoor Department of Computer Science Southern Illinois University Carbondale.
Correlation Clustering Nikhil Bansal Joint Work with Avrim Blum and Shuchi Chawla.
Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems M. RosvallM. Rosvall and C. T. BergstromC.
Project 1 : Phase 1 22C:021 CS II Data Structures.
Network Partition –Finding modules of the network. Graph Clustering –Partition graphs according to the connectivity. –Nodes within a cluster is highly.
Graphs Definition: a graph is an abstract representation of a set of objects where some pairs of the objects are connected by links. The interconnected.
Non-parametric Methods for Clustering Continuous and Categorical Data Steven X. Wang Dept. of Math. and Stat. York University May 13, 2010.
Complexity and Efficient Algorithms Group / Department of Computer Science Testing the Cluster Structure of Graphs Christian Sohler joint work with Artur.
Topics In Social Computing (67810) Module 1 (Structure) Centrality Measures, Graph Clustering Random Walks on Graphs.
Department of Computer and IT Engineering University of Kurdistan Social Network Analysis Communities By: Dr. Alireza Abdollahpouri.
Breadth-First Search (BFS)
Graph clustering to detect network modules
Data Mining: Basic Cluster Analysis
More on Clustering in COSC 4335
Clustering CSC 600: Data Mining Class 21.
BackTracking CS255.
Parallel Graph Algorithms
Lecture 13 Shortest Path.
CSC317 Graph algorithms Why bother?
Know thy Neighbor’s Neighbor Better Routing for Skip Graphs and Small Worlds Moni Naor Udi Wieder.
Community detection in graphs
Clustering.
Peer-to-Peer and Social Networks
CSE 421: Introduction to Algorithms
Michael L. Nelson CS 495/595 Old Dominion University
Randomized Algorithms CS648
Outline This topic covers Prim’s algorithm:
On the effect of randomness on planted 3-coloring models
Shortest Path Algorithms
Clustering The process of grouping samples so that the samples are similar within each group.
Lecture 12 Shortest Path.
Presentation transcript:

Community detection via random walk Draft slides

Background Consider a social graph G=(V, E), where |V|= n and |E|= m Girvan and Newman’s algorithm for community detection runs in O(m 2 n) time, and O(n 2 ) space. The Walktrap algorithm by Pons et al. computes a community structure (dendogram) in O(mnH) time, where H is the height of the dendogram – more scalable. The worst case is O(m 2 n) time. H

Random walk = probability that a random walk from j reaches a neighbor k, where A is the graph matrix (0-1) The probability of going from i to j through a random walk of length t is

Random Walk If two vertices are in the same community, the probability then will surely be high. But the fact that is high does not necessarily imply that are in the same community.

Ward’s agglomerative clustering Well known statistical method that estimates the distance between two clusters C1 and C2 (see Wikipedia). Walktrap uses this idea, but defines its own measure of distance based on random walk and probability.

Random Walk Intuition behind Walktrap Random walkers tend to get ‘trapped’ into densely connected parts (communities). Establish a distance measure between vertices (and between clusters) based on P t

Distance between nodes Let be two vertices of the graph. Pons et al. defined distance where is the probability of reaching j from through a random walk of length t

Distance between two communities Consider the probability that a random walk from a random vertex in community C to reach a vertex in steps. Call it Then the distance between two communities is

Distance between two communities

The Algorithm Initially there is one partition In each step, choose two communities (based on the distance between them) and create a new partition Where is the new community Update the distance between them.

Communities to merge This choice plays a central role in the quality of the community detection. At each step k, merge two communities that minimize the mean of the squared distance between them NP-hard for a given k Can be reduced to the K—median problem L

Conclusion Walktrap (Pons et al.) has been implemented in iGraph Library