# Connectivity Structure of Bipartite Graphs via the KNC-Plot

Erik Vee, joint work with Ravi Kumar and Andrew Tomkins



2 The fundamental question: given a graph with millions or billions of nodes, how do we understand it?

3 Macroscopic success stories
- Spectral graph analysis
  - Eigenvalues give intuition about mixing time and connectivity
- Conductance of a graph
- Degree distribution

4 Macroscopic models of graphs: understanding connectivity
- Bow-tie model [Broder et al.]: the Web graph
- Jellyfish model [Faloutsos et al.]: the Internet AS graph
- No equivalent model for bipartite graphs

5 Our goals
- Develop macroscopic tools to analyze social networks
  - Massive networks
  - What are simple, easy-to-understand properties?
  - Today: the KNC-plot for bipartite graphs
- Given an implicit graph representation, do something smarter than explicitly building the graph
  - The bipartite representation gives an implicit graph
  - Our algorithms never build the actual graph
  - Same spirit as the work of [Feder, Motwani 95]

6 Outline
- Definition of the KNC-plot
  - The k-neighborhood graph
- Analysis of real social networks using the KNC-plot
- Description of the algorithm

7 The k-neighborhood graph, $G_k$: given a bipartite graph $B$ with users on the left and interests on the right, connect two users if they share at least $k$ interests.
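
To make the definition concrete, here is a minimal Python sketch that builds the edges of $G_k$ explicitly by checking every pair of users. It assumes, purely for illustration, that $B$ is given as a dict mapping each user to the set of their interests; the function name and toy data are hypothetical. This brute-force construction is exactly what the algorithms later in the talk avoid on large graphs.

```python
from itertools import combinations


def k_neighborhood_edges(interests: dict[str, set[str]], k: int) -> set[tuple[str, str]]:
    """Edges of G_k: pairs of users that share at least k interests."""
    edges = set()
    # Check every pair of users explicitly (quadratic; fine for toy inputs only).
    for u, v in combinations(sorted(interests), 2):
        if len(interests[u] & interests[v]) >= k:
            edges.add((u, v))
    return edges


# Hypothetical toy bipartite graph B: users on the left, interests on the right.
B = {"u1": {"A", "B", "C"}, "u2": {"A", "C", "D"}, "u3": {"D", "E"}}
print(sorted(k_neighborhood_edges(B, 1)))  # [('u1', 'u2'), ('u2', 'u3')]
print(sorted(k_neighborhood_edges(B, 2)))  # [('u1', 'u2')]
```

Raising $k$ can only remove edges, which is why the graphs $G_1, G_2, G_3, \ldots$ get sparser.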

8 The k-neighborhood graph, $G_k$ (figure: $G_1$)

9 The k-neighborhood graph, $G_k$ (figure: $G_2$)

10 The k-neighborhood graph, $G_k$ (figure: $G_3$)

11 Illustration k=1

12 Illustration k=2

13 Illustration k=3

14 Illustration k=4

15 Illustration k=5

16 The KNC-plot (k-neighbor connectivity plot), plotted as a function of $k$:
- How many connected components does $G_k$ have?
- What is the size of the largest component?
- Answers the question: how many shared interests are meaningful?
  - Communities, cuts
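
The two quantities on this slide can be computed for each $k$ with one union-find pass over the edges of $G_k$. The sketch below is a small reference version under stated assumptions: the input is a hypothetical dict from user to interest set, isolated users count as singleton components, and all user pairs are enumerated, so it is only suitable for toy graphs.

```python
from itertools import combinations


def knc_plot(interests: dict[str, set[str]], ks: range) -> list[tuple[int, int, int]]:
    """For each k, return (k, #components of G_k, size of largest component).

    Isolated users count as singleton components. Pairs are enumerated
    explicitly, so this is only a reference for small inputs.
    """
    users = sorted(interests)
    rows = []
    for k in ks:
        parent = {u: u for u in users}  # union-find forest over users

        def find(x: str) -> str:
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        for u, v in combinations(users, 2):
            if len(interests[u] & interests[v]) >= k:  # (u, v) is an edge of G_k
                parent[find(u)] = find(v)

        sizes: dict[str, int] = {}
        for u in users:
            root = find(u)
            sizes[root] = sizes.get(root, 0) + 1
        rows.append((k, len(sizes), max(sizes.values(), default=0)))
    return rows


B = {"u1": {"A", "B", "C"}, "u2": {"A", "C", "D"}, "u3": {"D", "E"}}
for row in knc_plot(B, range(1, 4)):
    print(row)  # (1, 1, 3), then (2, 2, 2), then (3, 3, 1)
```

Plotting the second and third columns against $k$ gives the two curves of a KNC-plot.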

17 Analysis
- Four graphs:
  - LiveJournal: blogging site where users can specify interests
  - Y! query logs (interests = queries): queries issued to Yahoo! Search (try it at www.yahoo.com)
  - Content match (users = web pages, interests = ads): ads shown on web pages
  - Flickr photo tags (users = photos, interests = tags)
- All data anonymized, sanitized, and downsampled
  - Graphs have hundreds of thousands to a million users

18 Examples (plots: size of the largest component and number of components vs. $k$). Annotations: "At $k=5$, all connected." "At $k=6$, interesting!" "At $k=6$, nobody connected."

19 Examples, labeled by dataset: Content match (web pages = “users”, ads = “interests”); Flickr (photos = “users”, tags = “interests”).

20 Examples (plots: size of the largest component and number of components vs. $k$). Connectivity varies smoothly (“heavy-tailed”): at $k=14$, 10% connected; at $k=36$, 1% connected.

21 Examples, labeled by dataset: Y! queries (users = users, queries = “interests”); LiveJournal (users = users, interests = interests).

22 Algorithms (plot: running time of the naïve algorithm vs. ours, for $k = 2$)
- Naïve implementation takes $O(mn)$ time
  - Impractical for large graphs

23 Algorithms, continued (plot: running time of the naïve algorithm vs. ours, for $k = 2$)
- Our implementation takes $O(m^{2-1/k})$ time
  - Social networks are generally sparse
  - Faster for power-law degree distributions (no change in the algorithm)
  - Very fast for $k=2$; can trim the graph for $k=3$, etc.
- Space $O(km)$

24 Alg-Intersect: roughly speaking, for every pair of users, determine whether they have $k$ interests in common.
- For each node $u$, record its neighborhood
  - For each node $v$, check whether the neighborhoods of $u$ and $v$ intersect in at least $k$ nodes
  - If so, connect them; otherwise, don't
- Takes $O(nm)$ time ($n$ = number of nodes, $m$ = number of edges)
- Space $O(m)$

25 Alg-Intersect, restricted to a set $S$
- For each node $u \in S$, record its neighborhood
  - For each node $v$, check whether the neighborhoods of $u$ and $v$ intersect in at least $k$ nodes
  - If so, connect them; otherwise, don't
- Over all nodes this takes $O(nm)$ time ($n$ = number of nodes, $m$ = number of edges)
- But we may explore only the nodes in $S$
  - Takes $O(|S|m)$ time
- Space $O(m)$
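
A minimal Python sketch of Alg-Intersect restricted to $S$, assuming (hypothetically) that $B$ is a dict mapping each user to a set of interests. For each $u \in S$ it compares $u$'s interest set with every other user's set and reports every $G_k$ edge with an endpoint in $S$. The function name and data layout are illustrative, not the authors' implementation.

```python
def alg_intersect(interests: dict[str, set[str]], k: int, S: set[str]) -> set[frozenset[str]]:
    """Alg-Intersect restricted to S: all G_k edges with at least one endpoint in S.

    For each u in S we compare u's interest set with every other user's set,
    so the work is roughly O(n + m) per u, i.e. O(|S| m) when m >= n.
    """
    edges = set()
    for u in S:
        Nu = interests[u]  # u's neighborhood in B
        for v, Nv in interests.items():
            # CPython's set intersection iterates over the smaller set: cost <= O(deg(v)).
            if v != u and len(Nu & Nv) >= k:
                edges.add(frozenset((u, v)))  # undirected edge {u, v}
    return edges


B = {"u1": {"A", "B", "C"}, "u2": {"A", "C", "D"}, "u3": {"D", "E"}}
print(alg_intersect(B, 1, {"u2"}))  # edges {u1, u2} and {u2, u3}
```

Only $O(m)$ space is needed beyond the output, since nothing is stored except $B$ itself.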

26 Alg-Tuples: consider $k=2$. Suppose user 1 has interests {A, B, C} and user 2 has interests {A, C, D}. Create “virtual nodes”: connect user 1 to {AB}, {AC}, {BC}, and user 2 to {AC}, {AD}, {CD}. There is an edge between user 1 and user 2 in $G_k$ iff there is a virtual node that both are connected to (here, {AC}).
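
The example can be checked directly: each virtual node is a $k$-subset of a user's interests, and two users are adjacent in $G_k$ exactly when their sets of virtual nodes overlap. A small Python sketch using `itertools.combinations`; the helper name `virtual_nodes` is hypothetical.

```python
from itertools import combinations


def virtual_nodes(interests: set[str], k: int) -> set[frozenset[str]]:
    """The virtual nodes a user connects to: all k-subsets of its interests."""
    return {frozenset(c) for c in combinations(sorted(interests), k)}


user1 = {"A", "B", "C"}
user2 = {"A", "C", "D"}
shared = virtual_nodes(user1, 2) & virtual_nodes(user2, 2)
# A shared virtual node means an edge between user 1 and user 2 in G_2.
print(sorted("".join(sorted(t)) for t in shared))  # ['AC']
```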

27 Alg-Tuples
- For each node $u$:
  - Create the virtual nodes for $u$ (if not already created)
  - Connect $u$ to those virtual nodes (there are $O(\deg(u)^k)$ of them)
- Figure out the connectivity of $G_k$ using the virtual graph
- Runtime $O(\sum_u \deg(u)^k)$
  - Uses a union-find (disjoint-set) structure
  - Edges of $G_k$ are never explicitly computed
- Space $O(\sum_u \deg(u)^k)$
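
A minimal sketch of this idea in Python, assuming a dict-of-sets input (a hypothetical representation). Each virtual node is keyed by a sorted $k$-tuple of interests; the first user to reach a virtual node is remembered, and every later user reaching it is unioned with that first user. This gives the components of $G_k$ without ever listing its edges.

```python
from itertools import combinations


def alg_tuples_components(interests: dict[str, set[str]], k: int) -> list[set[str]]:
    """Connected components of G_k from virtual nodes, never listing G_k's edges."""
    parent = {u: u for u in interests}  # union-find forest over users

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Virtual node (sorted k-tuple of interests) -> first user attached to it.
    first_user: dict[tuple[str, ...], str] = {}
    for u, items in interests.items():
        for t in combinations(sorted(items), k):  # O(deg(u)^k) virtual nodes
            if t in first_user:
                # u shares the k interests in t with first_user[t]: same component.
                parent[find(u)] = find(first_user[t])
            else:
                first_user[t] = u

    groups: dict[str, set[str]] = {}
    for u in interests:
        groups.setdefault(find(u), set()).add(u)
    return list(groups.values())


B = {"u1": {"A", "B", "C"}, "u2": {"A", "C", "D"}, "u3": {"D", "E"}}
print(alg_tuples_components(B, 2))  # two components: {u1, u2} and {u3}
```

Linking each user only to the first user of each virtual node is enough: all users attached to one virtual node end up in the same tree, and the `first_user` table is the $O(\sum_u \deg(u)^k)$ space term.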

28 Combining them (figure: $S$ = the high-degree nodes, vs. the other nodes)
- Run Alg-Intersect for some subset $S$ of nodes
  - We know all edges in $G_k$ that go from $u \in S$ to any node $v$
  - Runtime $O(|S|m)$

29 Combining them, continued (figure: $S$ vs. the other nodes)
- Run Alg-Intersect for some subset $S$ of nodes
  - We know all edges in $G_k$ that go from $u \in S$ to any node $v$
  - Runtime $O(|S|m)$
- Run Alg-Tuples on the rest of the nodes
  - We “know” all edges in $G_k$ that go from $u \notin S$ to $v \notin S$
  - Runtime $O(\sum_{u \notin S} \deg(u)^k)$

30 Finding $S$ (the high-degree nodes)
- Order $u_1, u_2, \ldots$ by decreasing $\deg(u_i)$
- Initialize $b = 1$; increase $b$ until $\sum_{i \ge b} \deg(u_i)^k \le bm$
- Let $S = \{u_1, u_2, \ldots, u_b\}$
- Run Alg-Intersect on the nodes in $S$
- Run Alg-Tuples on the nodes not in $S$
  - Connect the two
- Runtime is $O(bm) + O(\sum_{i \ge b} \deg(u_i)^k) = O(2bm) = O(bm)$
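
The steps above can be put together in one Python sketch. Only the stopping rule for $b$ and the split between Alg-Intersect (on $S$) and Alg-Tuples (on the rest) come from the slides; the dict-of-sets input, the function name, and the single shared union-find structure used to "connect the two" are assumptions made for illustration.

```python
from itertools import combinations


def hybrid_components(interests: dict[str, set[str]], k: int) -> list[set[str]]:
    """Components of G_k: Alg-Intersect on S = {u_1..u_b}, Alg-Tuples on the rest."""
    # u_1, u_2, ... in order of decreasing degree.
    users = sorted(interests, key=lambda u: len(interests[u]), reverse=True)
    m = sum(len(items) for items in interests.values())  # edges of B
    degk = [len(interests[u]) ** k for u in users]

    # Smallest b >= 1 with sum_{i >= b} deg(u_i)^k <= b*m (1-based indices as on the slide).
    tail, b = sum(degk), 1
    while b < len(users) and tail > b * m:
        tail -= degk[b - 1]  # drop u_b; tail is now the sum over i >= b + 1
        b += 1
    S = users[:b]

    parent = {u: u for u in users}  # one union-find shared by both phases

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Alg-Intersect: every G_k edge with an endpoint in S, about O(b m) work.
    for u in S:
        for v in users:
            if v != u and len(interests[u] & interests[v]) >= k:
                parent[find(u)] = find(v)

    # Alg-Tuples on u_{b+1}, u_{b+2}, ...: edges with both endpoints outside S.
    first_user: dict[tuple[str, ...], str] = {}
    for u in users[b:]:
        for t in combinations(sorted(interests[u]), k):
            if t in first_user:
                parent[find(u)] = find(first_user[t])
            else:
                first_user[t] = u

    groups: dict[str, set[str]] = {}
    for u in users:
        groups.setdefault(find(u), set()).add(u)
    return list(groups.values())


B = {"u1": {"A", "B", "C"}, "u2": {"A", "C", "D"}, "u3": {"D", "E"}}
print(hybrid_components(B, 2))  # here S = [u1, u2]; components {u1, u2} and {u3}
```

Every edge of $G_k$ either touches $S$ (found by the first phase) or lies entirely outside $S$ (found through a shared virtual node in the second phase), so the two phases together cover all of $G_k$.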

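31 Combining them: the runtime bound
- Runtime is $O(bm) + O(\sum_{i \ge b} \deg(u_i)^k)$
- But for any graph, $\deg(u_i) \le m/i$ (by Markov: $u_1, \ldots, u_i$ each have degree at least $\deg(u_i)$, and all degrees sum to $m$)
  - Do not need a power law
- Hence, for $k \ge 2$, $bm \approx \sum_{i \ge b} \deg(u_i)^k \le \sum_{i \ge b} m^k/i^k = O(m^k/b^{k-1})$
- So $b = O(m^{1-1/k})$, and the runtime is $O(m^{2-1/k})$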
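
The bound can be spelled out as a worked derivation. It assumes $k \ge 2$ and uses the fact that the stopping rule failed at $b-1$, so $(b-1)m < \sum_{i \ge b-1} \deg(u_i)^k$; the same chain of inequalities then bounds $b-1$, and hence $b$, up to constants. The tail sum is bounded by comparison with an integral.

```latex
\begin{align*}
i\,\deg(u_i) &\le \sum_{j \le i} \deg(u_j) \le m
  \quad\Longrightarrow\quad \deg(u_i) \le \frac{m}{i},\\
bm \approx \sum_{i \ge b} \deg(u_i)^k
  &\le m^k \sum_{i \ge b} \frac{1}{i^k}
  \le m^k \Bigl(\frac{1}{b^k} + \int_b^\infty \frac{dx}{x^k}\Bigr)
  = O\!\Bigl(\frac{m^k}{b^{k-1}}\Bigr),\\
b^k &= O\bigl(m^{k-1}\bigr) \quad\Longrightarrow\quad b = O\bigl(m^{1-1/k}\bigr),\\
\text{runtime} &= O(bm) = O\bigl(m^{2-1/k}\bigr).
\end{align*}
```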

32 Extensions
- Power-law degree distributions are provably faster
  - $O(m^{1+(1-1/k)/\alpha})$ for a power law with exponent $\alpha$
  - The algorithm works exactly the same
  - No need to know ahead of time whether the distribution is a power law
- When the set of interests is logarithmic, we can get quasi-linear time algorithms
  - Different algorithm
  - In the paper

33 Conclusion
- The KNC-plot is a useful tool
  - Exposes how meaningful shared interests are
- The k-neighborhood graph is defined implicitly
  - Efficient algorithm for the implicit graph
  - Other algorithms for $G_k$, given the bipartite representation
- Find additional social-graph properties that are meaningful and computable
  - Describe the macroscopic structure of social networks
