Clustering Social Networks. Isabelle Stanton, University of Virginia. Joint work with Nina Mishra, Robert Schreiber, and Robert E. Tarjan.

Outline: Motivation; Previous Work; Combinatorial Properties; Finding Tightly Knit Clusters; Finding Loosely Knit Clusters; Future Work.

Motivation: There are many large social networks, and a fundamental problem is finding communities in them automatically. Applications include viral and targeted marketing and recommendation engines.

Previous Work: Modularity (M.E.J. Newman, 2002). Spectral methods (Kannan, Vempala, and Vetta 2000; Spielman and Teng 1996; Shi and Malik 2000; Kempe and McSherry 2004; Karypis and Kumar 1998; and many others). Both approaches require a disjoint partition of all elements.

Communities in Social Networks: Disjoint partitions are not a good fit for social networks, where communities can overlap.

Objective: Internal Density, β. Each vertex in C is adjacent to at least a β fraction of (the rest of) C. Examples: β = 1/2, β = 3/4, β = 1.

Objective: External Sparsity, α. Each vertex outside of C is adjacent to at most an α fraction of C, where α < β. Example: α = 1/5, β = 1.

(α, β)-Clusters: C is an (α, β)-cluster if it is internally dense (every vertex in the cluster neighbors at least a β fraction of the cluster) and externally sparse (every vertex outside the cluster neighbors at most an α fraction of the cluster). Examples: a (1/4, 1)-cluster and a (1/4, 2/3)-cluster.
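A minimal Python sketch of the (α, β)-cluster test, assuming the graph is a dict mapping each vertex to a set of neighbors. As a hypothetical convention, each vertex counts as its own neighbor (a self-loop), so a clique is exactly a (·, 1)-cluster; the slide phrases internal density relative to "the rest of" C, and the two readings differ only in whether v itself is counted. The function name is_alpha_beta_cluster and the example graph (two 4-cliques joined by one edge) are illustrative.

```python
from fractions import Fraction

def is_alpha_beta_cluster(adj, C, alpha, beta):
    """Check whether vertex set C is an (alpha, beta)-cluster of graph adj.

    adj maps each vertex to the set of its neighbors. Assumption: each vertex
    is treated as adjacent to itself, so a clique has internal density 1.
    """
    C = set(C)
    size = len(C)
    if size == 0:
        return False
    # Internally dense: every member neighbors at least a beta fraction of C.
    for v in C:
        if len((adj[v] | {v}) & C) < beta * size:
            return False
    # Externally sparse: every non-member neighbors at most an alpha fraction of C.
    for u in adj:
        if u not in C and len(adj[u] & C) > alpha * size:
            return False
    return True

# Hypothetical example: two 4-cliques joined by the single edge 3-4.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4},
       4: {3, 5, 6, 7}, 5: {4, 6, 7}, 6: {4, 5, 7}, 7: {4, 5, 6}}
print(is_alpha_beta_cluster(adj, {0, 1, 2, 3}, Fraction(1, 4), 1))  # True
print(is_alpha_beta_cluster(adj, {0, 1, 2, 3}, Fraction(1, 5), 1))  # False: vertex 4 sees 1/4 of C
```

Using Fraction for α and β keeps the threshold comparisons exact.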

Previous Work – (α, β)-clusters: Solved regions of the (α, β) plane: β > ½ + α/2 (this work); (1 − ε, 1), i.e. maximal cliques (Tsukiyama et al., Johnson et al.); α = 0, i.e. connected components.

Outline Motivation Previous Work Combinatorial properties  Can clusters overlap arbitrarily?  How many clusters can there be? Finding Tightly Knit Clusters Finding Loosely Knit Clusters Future Work

Combinatorial Properties – Overlaps: Let A and B be distinct (α, β)-clusters with |A| = |B|. Theorem: A and B overlap in at most (1 − (β − α))|A| vertices.
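One way to see the overlap bound, sketched under an assumed counting convention: N(u) denotes the neighbors of u and N[u] = N(u) ∪ {u} (our notation), with internal density read as |N[u] ∩ A| ≥ β|A|. Since A ≠ B and |A| = |B|, some vertex u lies in A but not in B:

```latex
\begin{aligned}
|N[u] \cap A| &\ge \beta |A| && \text{($u \in A$, internal density)} \\
|N[u] \cap A \cap B| &\le |N(u) \cap B| \le \alpha |B| = \alpha |A| && \text{($u \notin B$, external sparsity)} \\
|A \setminus B| &\ge |N[u] \cap (A \setminus B)| \ge (\beta - \alpha)|A| \\
|A \cap B| &= |A| - |A \setminus B| \le \bigl(1 - (\beta - \alpha)\bigr)|A|
\end{aligned}
```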

Combinatorial Properties – Number of Clusters: Claim: There are at most … (α, 1)-clusters of size s in a graph. The proof draws on Steiner systems. Example: 7 points, block size 3, restriction 2: {1,2,4}, {2,3,5}, {3,4,6}, {4,5,7}, {1,5,6}, {2,6,7}, {1,3,7}. The bound is tight as α → 1 and at α = 0, but seems loose elsewhere.
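The seven blocks in the example can be checked directly. Reading "block size 3, restriction 2" as a Steiner system S(2, 3, 7) (our interpretation: every 2-element subset of the 7 points lies in exactly one block), the short Python sketch below verifies that property and also that any two blocks share exactly one point.

```python
from itertools import combinations

blocks = [{1, 2, 4}, {2, 3, 5}, {3, 4, 6}, {4, 5, 7}, {1, 5, 6}, {2, 6, 7}, {1, 3, 7}]
points = range(1, 8)

# Every pair of points should appear in exactly one block (t = 2).
pair_counts = {
    pair: sum(1 for b in blocks if set(pair) <= b)
    for pair in combinations(points, 2)
}
is_steiner = all(count == 1 for count in pair_counts.values())

# Any two distinct blocks should intersect in exactly one point.
pairwise_overlaps = {len(a & b) for a, b in combinations(blocks, 2)}

print(is_steiner)         # True
print(pairwise_overlaps)  # {1}
```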

Too Many Clusters: The figure shows n vertices, x_1, x_2, …, x_{n/2} and y_1, y_2, …, y_{n/2}, with the MISSING edges drawn. Problem: Every vertex in every cluster has as many neighbors outside the cluster as in it.
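A small Python sketch of this construction, under our hypothetical reading of the figure: the graph is complete except for the n/2 missing edges x_i y_i. Every set that takes exactly one endpoint from each missing pair is then a clique of size n/2, so there are 2^(n/2) such clusters, and each member has as many neighbors outside its cluster as inside. The helper names are illustrative, and the brute-force enumeration is only meant for tiny n.

```python
from itertools import combinations

def complete_minus_matching(n):
    """Hypothetical reconstruction: K_n with the edges x_i--y_i removed."""
    half = n // 2
    verts = [("x", i) for i in range(half)] + [("y", i) for i in range(half)]
    adj = {v: set() for v in verts}
    for u, v in combinations(verts, 2):
        if u[1] != v[1]:  # skip only the missing pair x_i, y_i (same index i)
            adj[u].add(v)
            adj[v].add(u)
    return adj

def cliques_of_size(adj, k):
    """Brute-force enumeration of all k-vertex cliques (fine for tiny graphs)."""
    return [
        set(S) for S in combinations(adj, k)
        if all(b in adj[a] for a, b in combinations(S, 2))
    ]

n = 8
adj = complete_minus_matching(n)
clusters = cliques_of_size(adj, n // 2)
print(len(clusters))  # 16, i.e. 2 ** (n // 2)

# Each member has n/2 - 1 neighbors inside its cluster and n/2 - 1 outside.
balanced = all(len(adj[v] & C) == len(adj[v] - C) for C in clusters for v in C)
print(balanced)  # True
```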

ρ-Champions (example figure): Wes Anderson, Ben Stiller, Owen Wilson, Bill Murray, Gwyneth Paltrow, Will Ferrell, Vince Vaughn, Anjelica Huston, Steve Martin.

ρ-Champions: Definition: A vertex c of C is a ρ-champion of C if it has at most ρ|C| neighbors outside C. Claim: If ρ < 2β − 1 − α, every vertex can be a ρ-champion of at most one cluster.
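A one-function Python sketch of the definition, assuming a dict-of-sets graph; the name is_rho_champion and the example graph (two 4-cliques joined by the edge 3-4) are illustrative. Requiring c to belong to C follows the intuition slide, where a champion's neighbors inside C are counted.

```python
from fractions import Fraction

def is_rho_champion(adj, C, c, rho):
    """Return True if c is in C and has at most rho*|C| neighbors outside C."""
    C = set(C)
    return c in C and len(adj[c] - C) <= rho * len(C)

adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4},
       4: {3, 5, 6, 7}, 5: {4, 6, 7}, 6: {4, 5, 7}, 7: {4, 5, 6}}
C = {0, 1, 2, 3}
print(is_rho_champion(adj, C, 0, 0))               # True: no outside neighbors
print(is_rho_champion(adj, C, 3, 0))               # False: vertex 4 lies outside C
print(is_rho_champion(adj, C, 3, Fraction(1, 4)))  # True: 1 <= |C|/4
```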

Intuition behind the Algorithm: Let c be a ρ-champion of a cluster C. If v is in C, then v and c share at least (2β − 1)|C| neighbors. If v is outside C, then v and c share at most (ρ + α)|C| neighbors.
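Both counts can be sketched with inclusion-exclusion. Writing N(x) for the neighbors of x and N[x] = N(x) ∪ {x} (our notation and counting convention, an assumption), and using that c and v each have at least β|C| closed neighbors in C when v ∈ C:

```latex
\begin{aligned}
v \in C:\quad |N[c] \cap N[v]| &\ge |N[c] \cap C| + |N[v] \cap C| - |C| \ge (2\beta - 1)|C| \\
v \notin C:\quad |N[c] \cap N[v]| &\le \underbrace{|N(v) \cap C|}_{\le\, \alpha |C|} + \underbrace{|N(c) \setminus C|}_{\le\, \rho |C|} \le (\rho + \alpha)|C|
\end{aligned}
```

So the threshold (2β − 1)|C| separates members from non-members whenever 2β − 1 > ρ + α, i.e. β > ½ + (ρ + α)/2, the condition in the Algorithmic Guarantees claim.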

Deterministic Algorithm: To find all clusters of size s:
for each c in V do
    C ← ∅
    for each v within two steps of c do
        if v and c share at least (2β − 1)s neighbors then add v to C
    if C is an (α, β)-cluster then output C
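A direct, unoptimized Python transcription of this pseudocode, written as a sketch under stated assumptions: adjacency is a dict of neighbor sets; every vertex counts as its own neighbor (a hypothetical self-loop convention that makes the (2β − 1)s threshold exact for cliques); "within two steps" includes c itself; the candidate set is also required to have exactly s vertices; and duplicate outputs from different champions are merged. It computes common-neighbor counts naively and does not attempt to match the stated running time. Function names are illustrative.

```python
from fractions import Fraction

def closed_nbrs(adj, v):
    # Assumption: each vertex is treated as its own neighbor.
    return adj[v] | {v}

def is_alpha_beta_cluster(adj, C, alpha, beta):
    size = len(C)
    if size == 0:
        return False
    if any(len(closed_nbrs(adj, v) & C) < beta * size for v in C):
        return False  # some member is not internally dense enough
    return all(len(adj[u] & C) <= alpha * size for u in adj if u not in C)

def find_clusters(adj, s, alpha, beta):
    """Find (alpha, beta)-clusters of size s that have a champion (sketch)."""
    threshold = (2 * beta - 1) * s
    found = set()
    for c in adj:
        nc = closed_nbrs(adj, c)
        # Vertices within two steps of c (including c itself).
        candidates = set(nc)
        for w in adj[c]:
            candidates |= adj[w]
        # Keep candidates that share enough neighbors with c.
        C = {v for v in candidates if len(nc & closed_nbrs(adj, v)) >= threshold}
        if len(C) == s and is_alpha_beta_cluster(adj, C, alpha, beta):
            found.add(frozenset(C))
    return found

# Hypothetical example: two 4-cliques joined by the single edge 3-4.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4},
       4: {3, 5, 6, 7}, 5: {4, 6, 7}, 6: {4, 5, 7}, 7: {4, 5, 6}}
print(sorted(sorted(C) for C in find_clusters(adj, 4, Fraction(1, 4), 1)))
# [[0, 1, 2, 3], [4, 5, 6, 7]]
```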

Algorithmic Guarantees: Claim: Our algorithm finds all clusters that have a ρ-champion, provided β > ½ + (ρ + α)/2. It runs in O(d^0.7 n^1.9 + n^(2+o(1))) time, where d is the average degree. Since d is small for social networks, this is roughly O(n^2).


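Loosely Knit Clusters: Example: a (0, 4/9)-cluster, for which β < ½. Technical problem: when β < ½, the bound (2β − 1)|C| on the number of neighbors shared by two cluster members is no longer positive.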

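Expansion: The expansion of a cut (A, B) is |cut(A, B)| / |A|, where cut(A, B) is the set of edges with one endpoint in A and the other in B. Expansion is often used as part of a clustering criterion: [Shi, Malik], [Kannan, Vempala, Vetta], [Flake, Tarjan, Tsioutsiouliklis], etc.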
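A minimal Python sketch of this formula for the cut (A, V − A), assuming a dict-of-sets graph; the function name is illustrative. It follows the slide in dividing by |A|; some of the cited works normalize differently (for example by min(|A|, |B|) or by edge volume).

```python
from fractions import Fraction

def expansion(adj, A):
    """Expansion of the cut (A, V - A): crossing edges divided by |A|."""
    A = set(A)
    # Each crossing edge is counted once, from its endpoint inside A.
    crossing = sum(1 for u in A for v in adj[u] if v not in A)
    return Fraction(crossing, len(A))

adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4},
       4: {3, 5, 6, 7}, 5: {4, 6, 7}, 6: {4, 5, 7}, 7: {4, 5, 6}}
print(expansion(adj, {0, 1, 2, 3}))  # 1/4: one edge (3-4) leaves a 4-vertex set
```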

Randomized Algorithm:
for each c in V do
    draw a sample of size t, k times
    for each sample, iteratively add vertices that have many neighbors in the sample
    when no more vertices can be added, check whether the result is an (α, β)-cluster
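A hedged Python sketch of this loop. The slide does not say where samples are drawn from or what "many neighbors" means, so both are hypothetical choices here: each sample of t vertices is drawn from c's closed neighborhood (which, for a ρ-champion, lies mostly inside its cluster), and a vertex is added when at least a θ fraction of the sample are its neighbors, with θ a free parameter (5/8, halfway between α = 1/4 and β = 1, in the example). Because the test depends only on the fixed sample, one pass over all vertices reaches the point where no more vertices can be added. The self-loop convention and all names are assumptions.

```python
import random
from fractions import Fraction

def closed_nbrs(adj, v):
    return adj[v] | {v}  # assumption: each vertex counts itself as a neighbor

def is_alpha_beta_cluster(adj, C, alpha, beta):
    size = len(C)
    if size == 0:
        return False
    if any(len(closed_nbrs(adj, v) & C) < beta * size for v in C):
        return False
    return all(len(adj[u] & C) <= alpha * size for u in adj if u not in C)

def randomized_clusters(adj, alpha, beta, t, k, theta, seed=0):
    rng = random.Random(seed)
    found = set()
    for c in adj:
        # Hypothetical: sample from c's closed neighborhood (vertices must be sortable).
        pool = sorted(closed_nbrs(adj, c))
        for _ in range(k):
            sample = set(rng.sample(pool, min(t, len(pool))))
            # Add every vertex with at least a theta fraction of the sample as neighbors.
            C = {v for v in adj if len(closed_nbrs(adj, v) & sample) >= theta * len(sample)}
            if is_alpha_beta_cluster(adj, C, alpha, beta):
                found.add(frozenset(C))
    return found

adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4},
       4: {3, 5, 6, 7}, 5: {4, 6, 7}, 6: {4, 5, 7}, 7: {4, 5, 6}}
result = randomized_clusters(adj, Fraction(1, 4), 1, t=3, k=5, theta=Fraction(5, 8))
print(sorted(sorted(C) for C in result))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```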

Guarantees: Claim: With probability 1 − δ, the randomized algorithm finds all clusters that have a ρ-champion and whose expansion is greater than …. The algorithm relies on ρ-champions only to obtain good sampling probabilities.

Conclusions: We defined (α, β)-clusters, explored some of their combinatorial properties, introduced ρ-champions, and developed algorithms for a subset of the problem.

Future Work: algorithms that reduce the necessary gap between α and β; relaxing the ρ-champion restriction; weighted and directed graphs; decentralized algorithms; streaming algorithms.

Evaluation: Do ρ-champions exist in real graphs? Tsukiyama's algorithm finds all maximal cliques ((1 − ε, 1)-clusters) in a graph. We compare our algorithm's output with Tsukiyama's output as ground truth.

HEP Co-Author Dataset Results: Found 115 of 126 clusters (~90%).

Theory Co-Author Dataset Results: Found 797 of 854 clusters (~93%).

LiveJournal Dataset Results: The dataset is too big to run Tsukiyama's algorithm. We found 4289 clusters; 876 of them have large ρ-champions.

Timing Experiment:
                 HEP      TA            LJ
Our Algorithm    8 sec    2 min 4 sec   3 hours 37 min
Tsukiyama        8 hours  36 hours      N/A*
* Estimated running time: 25 weeks.
All experiments were written in Python and run on a machine with two dual-core 3 GHz Intel Xeons and 16 GB of RAM.

Datasets: High Energy Physics co-authorship graph (HEP), Theory co-authorship graph (TA), and a subset of LiveJournal.com (LJ).
Data Set   Size     Avg. Degree   Avg. τ(v)
HEP        8,…
TA         31,…
LJ         581,…
τ(v) = the neighbors and neighbors' neighbors of v.
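A short Python sketch of τ(v) and its average size, on a hypothetical 4-vertex path. This is a literal reading of the definition: because v is a neighbor of each of its neighbors, τ(v) contains v whenever v has at least one neighbor; whether the original statistic counts v is not stated. τ(c), plus c itself, is the candidate set the deterministic algorithm scans for each c.

```python
def tau(adj, v):
    """Neighbors of v together with its neighbors' neighbors (may include v itself)."""
    result = set(adj[v])
    for w in adj[v]:
        result |= adj[w]
    return result

def average_tau_size(adj):
    return sum(len(tau(adj, v)) for v in adj) / len(adj)

path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(average_tau_size(path))  # 3.5
```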

Previous Work – Modularity: Modularity compares the distribution of edges within groups with the expected distribution in a random graph with the same degrees. Many competitive methods have been developed to optimize it, but it is inherently defined over a partition. Introduced by Newman (2002).
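For comparison, a minimal Python sketch of Newman's modularity for a given partition of a simple undirected graph, using the standard form Q = Σ_c [L_c/m − (D_c/2m)^2], where m is the number of edges, L_c the number of edges inside community c, and D_c the sum of degrees in c. Unlike (α, β)-clusters, the input must be a partition: every vertex belongs to exactly one community. The function name and example graph are illustrative.

```python
from fractions import Fraction

def modularity(adj, communities):
    """Newman modularity of a partition of a simple undirected graph."""
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # each edge counted twice
    q = Fraction(0)
    for comm in communities:
        comm = set(comm)
        internal = sum(1 for u in comm for v in adj[u] if v in comm) // 2
        degree_sum = sum(len(adj[u]) for u in comm)
        q += Fraction(internal, m) - Fraction(degree_sum, 2 * m) ** 2
    return q

# Two 4-cliques joined by the single edge 3-4.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4},
       4: {3, 5, 6, 7}, 5: {4, 6, 7}, 6: {4, 5, 7}, 7: {4, 5, 6}}
print(modularity(adj, [{0, 1, 2, 3}, {4, 5, 6, 7}]))  # 11/26
```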
