Charalampos (Babis) E. Tsourakakis WAW 2010, Stanford 16 th December ‘10 WAW '101.

Slides:



Advertisements
Similar presentations
Charalampos (Babis) E. Tsourakakis KDD 2013 KDD'131.
Advertisements

A Model of Computation for MapReduce
Jure Leskovec, CMU Lars Backstrom, Cornell Ravi Kumar, Yahoo! Research Andrew Tomkins, Yahoo! Research.
Week 5 - Models of Complex Networks I Dr. Anthony Bonato Ryerson University AM8002 Fall 2014.
Theory of Computing Lecture 3 MAS 714 Hartmut Klauck.
CHARALAMPOS E. TSOURAKAKIS SCHOOL OF COMPUTER SCIENCE CARNEGIE MELLON UNIVERSITY Fast counting of triangles in large networks without counting: Algorithms.
Bahman Bahmani  Fundamental Tradeoffs  Drug Interaction Example [Adapted from Ullman’s slides, 2012]  Technique I: Grouping 
Discussion #33 Adjacency Matrices. Topics Adjacency matrix for a directed graph Reachability Algorithmic Complexity and Correctness –Big Oh –Proofs of.
Maximizing the Spread of Influence through a Social Network
Online Social Networks and Media. Graph partitioning The general problem – Input: a graph G=(V,E) edge (u,v) denotes similarity between u and v weighted.
Noga Alon Institute for Advanced Study and Tel Aviv University
On the Spread of Viruses on the Internet Noam Berger Joint work with C. Borgs, J.T. Chayes and A. Saberi.
Fast FAST By Noga Alon, Daniel Lokshtanov And Saket Saurabh Presentation by Gil Einziger.
DATA MINING LECTURE 12 Link Analysis Ranking Random walks.
CPSC 689: Discrete Algorithms for Mobile and Wireless Systems Spring 2009 Prof. Jennifer Welch.
Chapter 9 Graph algorithms Lec 21 Dec 1, Sample Graph Problems Path problems. Connectedness problems. Spanning tree problems.
EXPANDER GRAPHS Properties & Applications. Things to cover ! Definitions Properties Combinatorial, Spectral properties Constructions “Explicit” constructions.
CSE 421 Algorithms Richard Anderson Lecture 4. What does it mean for an algorithm to be efficient?
A scalable multilevel algorithm for community structure detection
1 On the Benefits of Adaptivity in Property Testing of Dense Graphs Joint work with Mira Gonen Dana Ron Tel-Aviv University.
1 Algorithmic Aspects in Property Testing of Dense Graphs Oded Goldreich – Weizmann Institute Dana Ron - Tel-Aviv University.
Computing Sketches of Matrices Efficiently & (Privacy Preserving) Data Mining Petros Drineas Rensselaer Polytechnic Institute (joint.
1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 6 May 7, 2006
The community-search problem and how to plan a successful cocktail party Mauro SozioAris Gionis Max Planck Institute, Germany Yahoo! Research, Barcelona.
Distributed Combinatorial Optimization
Some 3CNF Properties are Hard to Test Eli Ben-Sasson Harvard & MIT Prahladh Harsha MIT Sofya Raskhodnikova MIT.
Spectral Graph Theory (Basics)
Charalampos (Babis) E. Tsourakakis Brown University Brown University May 22 nd 2014 Brown University1.
Decomposing Networks and Polya Urns with the Power of Choice Joint work with Christos Amanatidis, Richard Karp, Christos Papadimitriou, Martha Sideri Presented.
Alan Frieze Charalampos (Babis) E. Tsourakakis WAW June ‘12 WAW '121.
CMU SCS Large Graph Mining: Power Tools and a Practitioner’s Guide Christos Faloutsos Gary Miller Charalampos (Babis) Tsourakakis CMU.
Neighbourhood Sampling for Local Properties on a Graph Stream A. Pavan, Iowa State University Kanat Tangwongsan, IBM Research Srikanta Tirthapura, Iowa.
Primal-Dual Meets Local Search: Approximating MST’s with Non-uniform Degree Bounds Author: Jochen Könemann R. Ravi From CMU CS 3150 Presentation by Dan.
Charalampos (Babis) E. Tsourakakis SODA th January ‘11 SODA '111.
Graph Sparsifiers Nick Harvey University of British Columbia Based on joint work with Isaac Fung, and independent work of Ramesh Hariharan & Debmalya Panigrahi.
Dense subgraphs of random graphs Uriel Feige Weizmann Institute.
1 Introduction to Approximation Algorithms. 2 NP-completeness Do your best then.
FAST COUNTING OF TRIANGLES IN LARGE NETWORKS: ALGORITHMS AND LAWS RPI Theory Seminar, 24 November 2008 Charalampos (Babis) Tsourakakis School of Computer.
Network Characterization via Random Walks B. Ribeiro, D. Towsley UMass-Amherst.
Greedy Approximation Algorithms for finding Dense Components in a Graph Paper by Moses Charikar Presentation by Paul Horn.
Streaming Algorithms Piotr Indyk MIT. Data Streams A data stream is a sequence of data that is too large to be stored in available memory Examples: –Network.
Expanders via Random Spanning Trees R 許榮財 R 黃佳婷 R 黃怡嘉.
Graph Sparsifiers Nick Harvey Joint work with Isaac Fung TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A A A.
Jure Leskovec Computer Science Department Cornell University / Stanford University Joint work with: Jon Kleinberg (Cornell), Christos.
Xiaowei Ying, Xintao Wu Dept. Software and Information Systems Univ. of N.C. – Charlotte 2008 SIAM Conference on Data Mining, April 25 th Atlanta, Georgia.
Graph Colouring L09: Oct 10. This Lecture Graph coloring is another important problem in graph theory. It also has many applications, including the famous.
On-line Social Networks - Anthony Bonato 1 Dynamic Models of On-Line Social Networks Anthony Bonato Ryerson University WAW’2009 February 13, 2009 nt.
CMU SCS KDD '09Faloutsos, Miller, Tsourakakis P5-1 Large Graph Mining: Power Tools and a Practitioner’s guide Task 5: Graphs over time & tensors Faloutsos,
Artur Czumaj DIMAP DIMAP (Centre for Discrete Maths and it Applications) Computer Science & Department of Computer Science University of Warwick Testing.
Miniconference on the Mathematics of Computation
1 How to burn a graph Anthony Bonato Ryerson University GRASCan 2015.
Estimating PageRank on Graph Streams Atish Das Sarma (Georgia Tech) Sreenivas Gollapudi, Rina Panigrahy (Microsoft Research)
Complexity and Efficient Algorithms Group / Department of Computer Science Testing the Cluster Structure of Graphs Christian Sohler joint work with Artur.
Presented by Alon Levin
CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P9-1 Large Graph Mining: Power Tools and a Practitioner’s guide Christos Faloutsos Gary Miller Charalampos.
Hongyu Liang Institute for Theoretical Computer Science Tsinghua University, Beijing, China The Algorithmic Complexity.
Complexity and Efficient Algorithms Group / Department of Computer Science Testing the Cluster Structure of Graphs Christian Sohler joint work with Artur.
Random Walk for Similarity Testing in Complex Networks
Cohesive Subgraph Computation over Large Graphs
DOULION: Counting Triangles in Massive Graphs with a Coin
Stochastic Streams: Sample Complexity vs. Space Complexity
Community detection in graphs
Density Independent Algorithms for Sparsifying
CIS 700: “algorithms for Big Data”
Haim Kaplan and Uri Zwick
CSCI B609: “Foundations of Data Science”
On the effect of randomness on planted 3-coloring models
Lecture 6: Counting triangles Dynamic graphs & sampling
Approximate Graph Mining with Label Costs
Presentation transcript:

Charalampos (Babis) E. Tsourakakis WAW 2010, Stanford 16 th December ‘10 WAW '101

Richard Peng SCS, CMU Gary L. Miller SCS, CMU Mihail N. Kolountzakis Math, University of Crete WAW '102

 Motivation  Existing Work  Our contributions  Experimental Results  Ramifications  Conclusions WAW '103

A C B (Wasserman Faust ‘94) Friends of friends tend to become friends themselves! (left to right) Paul Erdös, Ronald Graham, Fan Chung Graham WAW '104

5 Eckmann-Moses, Uncovering the Hidden Thematic Structure of the Web (PNAS, 2001) Key Idea: Connected regions of high curvature (i.e., dense in triangles) indicate a common topic!

6 WAW '10 Triangles used for Web Spam Detection (Becchetti et al. KDD ‘08) Key Idea: Triangle Distribution among spam hosts is significantly different from non-spam hosts!

7 WAW '10 Triangles used for assessing Content Quality in Social Networks Welser, Gleave, Fisher, Smith Journal of Social Structure 2007 Key Claim: The amount of triangles in the self-centered social network of a user is a good indicator of the role of that user in the community!

WAW '108

9 (Watts,Strogatz’98)

 Signed triangles in structural balance theory Jon Kleinberg’s talk (Leskovec et al. ‘10)  Triangle closing models also used to model the microscopic evolution of social networks (Leskovec et.al., KDD ‘08) WAW '1010

 CAD applications,  E.g., solving systems of geometric constraints involves triangle counting! (Fudos, Hoffman 1997) WAW '1011

Numerous other applications including : Motif Detection/ Frequent Subgraph Mining (e.g., Protein-Protein Interaction Networks) Community Detection (Berry et al. ‘09) Outlier Detection (Tsourakakis ‘08) Link Recommendation 12 WAW '10 Fast triangle counting algorithms are necessary.

 Motivation  Existing Work  Our contributions  Experimental Results  Ramifications  Conclusions WAW '1013

AlonYusterZwick Asymptotically the fastest algorithm but not practical for large graphs. In practice, one of the iterator algorithms are preferred. Node Iterator (count the edges among the neighbors of each vertex) Edge Iterator (count the common neighbors of the endpoints of each edge) Both run asymptotically in O(mn) time. WAW '1014

 Remarks  In Alon, Yuster, Zwick appears the idea of partitioning the vertices into “large” and “small” degree and treating them appropriately.  For more work, see references in our paper: ▪ Itai, Rodeh (STOC ‘77) ▪ Papadimitriou, Yannakakis (IPL ‘81) …… WAW '1015

 r independent samples of three distinct vertices WAW '10 16 Then the following holds: with probability at least 1-δ Works for dense graphs. e.g., T 3 n 2 logn

 (Yosseff, Kumar, Sivakumar ‘02) require n 2 /polylogn edges  More follow up work:  (Jowhari, Ghodsi ‘05)  (Buriol, Frahling, Leondardi, Marchetti, Spaccamela, Sohler ‘06)  (Becchetti, Boldi, Castillio, Gionis ‘08) WAW '1017

 Triangle Sparsifiers  Keep an edge with probability p. Count the triangles in sparsified graph and multiply by 1/p 3.  If the graph has O(n polylogn) triangles we get concentration and we know how to pick p (Tsourakakis, Kolountzakis, Miller ‘08)  Proof uses the Kim-Vu concentration result for multivariate polynomials which have bad Lipschitz constant but behave “well” on average. WAW '1018

19 WAW '10 Keep only 3! 3 eigenvalues of adjacency matrix i-th eigenvector Political Blogs More: Tsourakakis (KAIS 2010) SVD also works Haim Avron (KDD 2010) randomized trace estimation Tsourakakis (ICDM 2008)

 Motivation  Existing Work  Our contributions  Experimental Results  Ramifications  Conclusions WAW '1020

 Theorem If then with probability 1-1/n 3-d the sampled graph has a triangle count that ε-approximates the true number of triangles for any 0<d<3. WAW '1021

WAW ' k+1 2 Every graph on n vertices with max. degree Δ(G) =k is (k+1) -colorable with all color classes differing at size by at most 1. ….

 Create an auxiliary graph where each triangle is a vertex and two vertices are connected iff the corresponding triangles share an edge. Observe: Δ(G)=Ο(n)  Invoke Hajnal-Szemerédi theorem and apply Chernoff bound per each chromatic class. Finally, take a union bound. Q.E.D. WAW '1023

WAW '1024

WAW '1025

WAW '1026 Given a graph G with n vertices and m edges which graph maximizes the edges in the line graph L(G)?

WAW '1027

 Motivation  Existing Work  Our contributions  Experimental Results  Ramifications  Conclusions WAW '1028

WAW '1029

WAW '1030 LiveJournal (5.4M,48M) Orkut (3.1M,117M) Web-EDU (9.9M,46.3M) YouTube (1.2M,3M) Flickr, (1.9M,15.6M)

WAW '1031 Social networks abundant in triangles!

WAW '1032 secs

 p was set to 0.1. More sophisticated techniques for setting p exist (Tsourakakis, Kolountzakis, Miller ‘09) using a doubling procedure.  From our results, there is not a clear winner, but the hybrid algorithm achieves both high accuracy and speed.  Sampling from a binomial can be done easily in (expected) sublinear time.  Our code, even our exact algorithm, outperforms the fastest approximate counting competitors code, hence we compared different versions of our code! WAW '1033

 Motivation  Existing Work  Our contributions  Experimental Results  Ramifications  Conclusions WAW '1034

WAW '1035

 Observe that if we have an edge u~v and we “dot” the corresponding rows of the adjacency matrix we get the number of triangles.  Obviously a RP cannot preserve all inner products: consider the basis e 1,..,e n. Clearly we cannot have all Re i be orthogonal since they belong to a lower dimensional space.  When does RP work for triangle counting? WAW '1036

This random projection does not work! E[Y]=0 R kxn RP matrix, e.g., iid N(0,1) r.v Y= WAW '1037

This random projection gives E[Y]=kt! To have concentration it suffices: Var[Y]=k(#circuits of length 6)=o(k(E[Y]) 2 ) WAW '1038 R kxn RP matrix, e.g., iid N(0,1) r.v

 We can adapt our proposed method in the semi-streaming model with space usage so that it performs only 3 passes over the data.  More experiments, all the implementation details. WAW '1039

 Motivation  Existing Work  Our contributions  Experimental Results  Ramifications  Conclusions WAW '1040

Remove edge (1,2) Remove any weighted edge w sufficiently large 41 WAW '10 Spielman-Srivastava and Benczur-Karger sparsifiers also don’t work! (Tsourakakis, Kolountzakis, Miller ‘08)

 State-of-the art results in triangle counting for massive graphs (sparsify and sample triples carefully)  Sampling results of different “flavor” compared to existing work.  Implement the algorithm in the MapReduce framework (done by Sergei Vassilvitskii et al., Yahoo! Research MADALGO ‘10)  For which graphs do random projections work? WAW '1042

THANK YOU! WAW '1043

WAW '1044

WAW ' ,963,073

Best method for our applications: best running time, high accuracy WAW '1046 Hybrid vs. Naïve Sampling improves accuracy, Increases running time

 Semi-streaming model (Feigenbaum et al., ICALP 2004) relaxes the strict constraints of the streaming model.  Semi-external memory constraint  Graph stored on disk as an adjacency list, no random access is allowed (only sequential accesses)  Limited number of sequential scans WAW '1047

WAW '1048

WAW '10 49 ij k ?? Sample uniformly at random an edge (i,j) and a node k in V-{i,j} Check if edges (i,k) and (j,k) exist in E(G) samples

50 WAW '10 Mildness, pick p=1 Concentration How to choose p? Tsourakakis,Kolountzakis,Miller(‘09): keep each edge with probability p