Weighted Graphs and Disconnected Components Patterns and a Generator Mary McGlohon, Leman Akoglu, Christos Faloutsos Carnegie Mellon University School.

1 Weighted Graphs and Disconnected Components: Patterns and a Generator
Mary McGlohon, Leman Akoglu, Christos Faloutsos
Carnegie Mellon University, School of Computer Science

2 2 McGlohon, Akoglu, Faloutsos KDD08

3 “Disconnected” components
● In graphs, a largest connected component emerges.
● What about the smaller-size components?
● How do they emerge, and join with the large one?

4 Weighted edges
● Graphs have heavy-tailed degree distributions.
● What can we also say about the edges themselves?
● How are they repeated, or otherwise weighted?

5 Our goals
● Observe “next-largest connected components” (NLCC’s)
  Q1: How does the giant connected component (GCC) emerge?
  Q2: How do NLCC’s emerge and join with the GCC?
● Find properties that govern edge weights
  Q3: How does the total weight of the graph relate to the number of edges?
  Q4: How do the weights of nodes relate to degree?
  Q5: Does this relation change with the graph?
● Q6: Can we produce an emergent, generative model?

6 Outline
● Motivation
● Related work
● Preliminaries
● Data
● Observations
● Model
● Summary

7 Properties of networks
● Small diameter (“small world” phenomenon)
  – [Milgram 67] [Leskovec, Horvitz 07]
● Heavy-tailed degree distribution
  – [Barabasi, Albert 99] [Faloutsos, Faloutsos, Faloutsos 99]
● Densification
  – [Leskovec, Kleinberg, Faloutsos 05]
● “Middle region” components as well as GCC and singletons
  – [Kumar, Novak, Tomkins 06]

8 Generative Models
● Erdos-Renyi model [Erdos, Renyi 60]
● Preferential Attachment [Barabasi, Albert 99]
● Forest Fire model [Leskovec, Kleinberg, Faloutsos 05]
● Kronecker multiplication [Leskovec, Chakrabarti, Kleinberg, Faloutsos 07]
● Edge Copying model [Kumar, Raghavan, Rajagopalan, Sivakumar, Tomkins, Upfal 00]
● “Winners don’t take all” [Pennock, Flake, Lawrence, Glover, Giles 02]

10-12 Diameter
● The diameter of a graph is the “longest shortest path”.
● The effective diameter is the distance within which 90% of connected node pairs can reach each other.
(Figure: example graph on nodes n1 to n7, diameter = 3.)
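
The two definitions on the Diameter slide can be made concrete with a small sketch. It assumes an unweighted, undirected graph stored as an adjacency dict, and it uses an integer-valued effective diameter (no interpolation between hop counts), which is one common convention rather than necessarily the one used in the talk. The function names and the example path graph are hypothetical.

```python
import math
from collections import deque


def bfs_distances(adj, source):
    """Hop counts from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def pairwise_distances(adj):
    """Sorted shortest-path lengths over all reachable ordered pairs (u != v)."""
    lengths = []
    for u in adj:
        for v, d in bfs_distances(adj, u).items():
            if v != u:
                lengths.append(d)
    lengths.sort()
    return lengths


def diameter(adj):
    """The 'longest shortest path' among reachable pairs."""
    lengths = pairwise_distances(adj)
    return lengths[-1] if lengths else 0


def effective_diameter(adj, fraction=0.9):
    """Smallest hop count d such that at least `fraction` of reachable
    pairs are within d hops of each other (integer version, an assumption)."""
    lengths = pairwise_distances(adj)
    if not lengths:
        return 0
    index = math.ceil(fraction * len(lengths)) - 1
    return lengths[index]


# Hypothetical example: a path a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(diameter(path), effective_diameter(path), effective_diameter(path, 0.5))
```

All-pairs BFS costs O(N·E), so on graphs of the size listed in the Data slides a sampled or approximate computation would be needed instead.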

14-16 Unipartite Networks
● Postnet: Posts in blogs, with hyperlinks between them
● Blognet: Aggregated Postnet, repeated edges
● Patent: Patent citations
● NIPS: Academic citations
● Arxiv: Academic citations
● NetTraffic: Packets, repeated edges
● Autonomous Systems (AS): Packets, repeated edges
(Figure: example graph on nodes n1 to n7, shown with a repeated edge of multiplicity 3 and with edge weights 10, 1.2, 8.3, 2, 6, 1.)

17 Unipartite Networks (nodes, edges, time span)
● Postnet: 250K nodes, 218K edges, 80 days
● Blognet: 60K nodes, 125K edges, 80 days
● Patent: 4M nodes, 8M edges, 17 yrs
● NIPS: 2K nodes, 3K edges, 13 yrs
● Arxiv: 30K nodes, 60K edges, 13 yrs
● NetTraffic: 21K nodes, 3M edges, 52 mo
● AS: 12K nodes, 38K edges, 6 mo

18-20 Bipartite Networks
● IMDB: Actor-movie network
● Netflix: User-movie ratings
● DBLP: repeated edges
  – Author-Keyword
  – Keyword-Conference
  – Author-Conference
● US Election Donations: $ weights, repeated edges
  – Orgs-Candidates
  – Individuals-Orgs
(Figure: example bipartite graph on nodes n1 to n4 and m1 to m3, with edge weights 10, 1.2, 2, 1, 5, 6.)

21 Bipartite Networks (nodes, edges, time span)
● IMDB: 757K nodes, 2M edges, 114 yr
● Netflix: 125K nodes, 14M edges, 72 mo
● DBLP: 25 yr
  – Author-Keyword: 27K nodes, 189K edges
  – Keyword-Conference: 10K nodes, 23K edges
  – Author-Conference: 17K nodes, 22K edges
● US Election Donations: 22 yr
  – Orgs-Candidates: 23K nodes, 877K edges
  – Individuals-Orgs: 6M nodes, 10M edges

23 Observation 1: Gelling Point
Q1: How does the GCC emerge?

24 Observation 1: Gelling Point
● Most real graphs display a gelling point, or burning-off period.
● The gelling point is marked by a spike in diameter. After it, graphs exhibit typical behavior.
(Figure: diameter vs. time for IMDB; gelling point at t=1914.)

25 Observation 2: NLCC behavior
Q2: How do NLCC’s emerge and join with the GCC? Do they continue to grow in size? Do they shrink? Stabilize?

26 Observation 2: NLCC behavior
● After the gelling point, the GCC takes off, but NLCC’s remain constant or oscillate.
(Figure: connected-component size vs. time for IMDB.)
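
One way to reproduce this kind of measurement is to stream timestamped edges through a union-find structure and record the largest component sizes after each timestamp. The following is a minimal sketch under that assumption; the class, the function name and the toy edge stream are hypothetical and are not taken from the talk or its datasets.

```python
class DisjointSet:
    """Union-find with path compression and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}  # component size, stored for root nodes only

    def find(self, x):
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # compress the path to the root
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size.pop(rb)


def component_sizes_over_time(timed_edges, top_k=3):
    """Map each timestamp to the sizes of the top_k largest components
    (GCC first, then NLCC's) after all edges up to that time are added.
    Nodes that never appear in an edge are not counted."""
    ds = DisjointSet()
    history = {}
    for t, u, v in sorted(timed_edges):
        ds.union(u, v)
        # Re-sorting every step is simple but slow; fine for a sketch.
        history[t] = sorted(ds.size.values(), reverse=True)[:top_k]
    return history


# Hypothetical edge stream: (time, node, node).
stream = [(1, "a", "b"), (1, "c", "d"), (2, "e", "f"), (3, "b", "c"), (4, "g", "h")]
history = component_sizes_over_time(stream)
for t in sorted(history):
    print(t, history[t])
```

In the toy stream the largest component grows from 2 to 4 nodes while the second-largest stays at 2, which is the qualitative pattern the slide describes.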

28 Observation 3
Q3: How does the total weight of the graph relate to the number of edges?

29 Observation 3: Fortification Effect
● Is $ simply proportional to # checks?
(Figure: |$| vs. |Checks| for Orgs-Candidates, 1980 to 2004.)

30 Observation 3: Fortification Effect
● Weight additions follow a power law with respect to the number of edges: W(t) ∝ E(t)^w
  – W(t): total weight of graph at t
  – E(t): total edges of graph at t
  – w is the power-law (PL) exponent
  – 1.01 < w < 1.5 = super-linear!
  – (more checks, even more $)
(Figure: |$| vs. |Checks| for Orgs-Candidates, 1980 to 2004.)
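
A power-law exponent such as w is commonly estimated as the slope of a least-squares line in log-log space. The sketch below assumes that approach; the talk does not say how its exponents were fitted. The function name is hypothetical and the data are synthetic, generated with a known exponent purely to check the fit.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = c * x**k by least squares on (log x, log y).
    Returns (k, c). All inputs must be positive."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, math.exp(intercept)


# Synthetic snapshots (not from any dataset): E(t) and W(t) = 2 * E(t)**1.2.
edge_counts = [10, 100, 1000, 10000]
total_weights = [2.0 * e ** 1.2 for e in edge_counts]
w, c = fit_power_law(edge_counts, total_weights)
print(round(w, 3), round(c, 3))
```

An exponent above 1 means total weight grows faster than the edge count, which is the "more checks, even more $" reading on the slide.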

31 Observations 4 and 5
Q4: How do the weights of nodes relate to degree?
Q5: Does this relation change over time?

32 Observation 4: Snapshot Power Law
● At any time, the total incoming weight of a node follows a power law in its in-degree, with PL exponent iw. 1.01 < iw < 1.26, super-linear
● More donors, even more $
(Figure: in-weights ($) vs. edges (# donors) for Orgs-Candidates; e.g. John Kerry, $10M received, from 1K donors.)
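
The quantities on the two axes of this plot can be computed from a weighted edge list in one pass. In the sketch below, in-degree is taken to be the number of distinct sources pointing at a node, which is an assumption: with repeated edges one could also count every edge. The function name and the donor and candidate labels are hypothetical.

```python
from collections import defaultdict


def in_degree_and_in_weight(weighted_edges):
    """For each destination node, return (in_degree, in_weight), where
    in_degree counts distinct sources and in_weight sums edge weights."""
    sources = defaultdict(set)
    weight = defaultdict(float)
    for src, dst, w in weighted_edges:
        sources[dst].add(src)
        weight[dst] += w
    return {node: (len(sources[node]), weight[node]) for node in weight}


# Hypothetical donations: (donor, candidate, dollars).
donations = [
    ("d1", "c1", 100.0),
    ("d2", "c1", 50.0),
    ("d1", "c1", 25.0),  # repeated edge: same donor, second check
    ("d3", "c2", 10.0),
]
stats = in_degree_and_in_weight(donations)
print(stats)
```

Plotting the resulting (in_degree, in_weight) pairs on log-log axes for one snapshot, and taking the slope, would give an estimate of iw for that snapshot.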

33 Observation 5: Snapshot Power Law
● For a given graph, this exponent is constant over time.
(Figure: exponent vs. time for Orgs-Candidates.)

34 Outline
● Motivation
● Related work
● Preliminaries
● Data
● Observations
● Q6: Is there a generative, “emergent” model?
● Summary

35-36 Goals of model
● a) Emergent, intuitive behavior
● b) Shrinking diameter
● c) Constant NLCC’s
● d) Densification power law
● e) Power-law degree distribution
= “Butterfly” Model

37-42 Butterfly model in action
● A node joins a network, with its own parameter p_step (“Curiosity”).
● With (global) p_host (“Cross-disciplinarity”), chooses a random host
  – With (global) p_link (“Friendliness”), creates link
  – With p_step, travels to random neighbor. Repeat.
(Figure: a new node n8 joining a graph on nodes n1 to n7.)

43-46 Butterfly model in action
● Once there are no more “steps”, repeat “host” procedure:
  – With p_host, choose new host, possibly link, etc.
  – Until no more steps, and no more hosts.
(Figure: node n8 choosing further hosts in the graph on nodes n1 to n7.)

47 a) Emergent, intuitive behavior
Novelties of model:
● Nodes link with probability
  – May choose host, but not link (start new component)
● Incoming nodes are “social butterflies”
  – May have several hosts (merges components)
● Some nodes are friendlier than others
  – p_step different for each node
  – This creates power-law degree distribution (theorem)

48 Validation of Butterfly
● Chose following parameters:
  – p_host = 0.3
  – p_link = 0.5
  – p_step ~ U(0,1)
● Ran 10 simulations
● 100,000 nodes per simulation
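
The procedure on the Butterfly slides, with the parameter values listed above, can be sketched roughly as follows. This is one reading of the slides, not the authors' implementation. In particular it assumes that a walk stops when the current node has no other neighbor, that the link test with p_link is applied at every node visited, and that the host test with p_host is repeated until it first fails. The function name and the `seed` argument are hypothetical.

```python
import random


def butterfly_graph(n_nodes, p_host=0.3, p_link=0.5, seed=0):
    """Grow an undirected graph one node at a time, Butterfly-style.
    Returns an adjacency dict {node: set(neighbors)}."""
    rng = random.Random(seed)
    adj = {}
    for new in range(n_nodes):
        adj[new] = set()
        if new == 0:
            continue  # the first node has no possible host
        p_step = rng.random()  # per-node parameter, drawn from U(0, 1)
        # "Host" procedure: keep picking hosts while the p_host test succeeds.
        while rng.random() < p_host:
            current = rng.randrange(new)  # uniformly random existing node
            while True:
                if rng.random() < p_link:  # link to the node being visited
                    adj[new].add(current)
                    adj[current].add(new)
                # Candidate next stops: neighbors other than the new node.
                neighbors = sorted(adj[current] - {new})
                if not neighbors or rng.random() >= p_step:
                    break  # no more steps from this host
                current = rng.choice(neighbors)
    return adj


graph = butterfly_graph(500, seed=1)
isolated = sum(1 for nbrs in graph.values() if not nbrs)
print(len(graph), "nodes,", isolated, "currently isolated")
```

A node that picks a host but never links stays disconnected, and a node with several hosts can bridge components, which are the two behaviors the "novelties" slide highlights. The sketch is not tuned for the 100,000-node runs mentioned above.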

49 b) Shrinking diameter
● Shrinking diameter
  – In model, gelling usually occurred around N=20,000
(Figure: diameter vs. nodes; N=20,000 marked.)

50 c) Oscillating NLCC’s
● Constant / oscillating NLCC’s
(Figure: NLCC size vs. nodes; N=20,000 marked.)

51 d) Densification power law
● Densification:
  – Our datasets had a = (1.03, 1.7)
  – In [Leskovec+05-KDD], a = (1.1, 1.7)
  – Simulation produced a = (1.1, 1.2)
(Figure: edges vs. nodes; N=20,000 marked.)

52 e) Power-law degree distribution
● Power-law degree distribution
  – Exponents approx. -2
(Figure: count vs. degree.)

53 Summary
● Studied several diverse public graphs
  – Measured at many timestamps
  – Unipartite and bipartite
  – Blogs, citations, real-world, network traffic
  – Largest was 6 million nodes, 10 million edges

54 Summary
● Observations on unweighted graphs:
  A1: The GCC emerges at the “gelling point”
  A2: NLCC’s are of constant / oscillating size
● Observations on weighted graphs:
  A3: Total weight increases super-linearly with edges
  A4: A node’s weight increases super-linearly with its degree, with power-law exponent iw
  A5: iw remains constant over time
● A6: Intuitive, emergent generative “butterfly” model that matches these properties

55 References
[Barabasi+99] Barabasi, A. L. & Albert, R. (1999), 'Emergence of scaling in random networks', Science 286(5439), 509-512.
[Erdos+60] Erdos, P. & Renyi, A. (1960), 'On the evolution of random graphs', Publ. Math. Inst. Hungar. Acad. Sci. 5, 17-61.
[Faloutsos+99] Faloutsos, M.; Faloutsos, P. & Faloutsos, C. (1999), 'On Power-law Relationships of the Internet Topology', SIGCOMM, 251-262.
[Kumar+00] Kumar, R.; Raghavan, P.; Rajagopalan, S.; Sivakumar, D.; Tomkins, A. & Upfal, E. (2000), 'Stochastic models for the Web graph', Proceedings of the 41st FOCS, pp. 57-65.
[Kumar+06] Kumar, R.; Novak, J. & Tomkins, A. (2006), 'Structure and evolution of online social networks', in 'KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining', pp. 611-617.
[Leskovec+05KDD] Leskovec, J.; Kleinberg, J. & Faloutsos, C. (2005), 'Graphs over time: densification laws, shrinking diameters and possible explanations', in 'KDD '05'.
[Leskovec+07] Leskovec, J. & Faloutsos, C. (2007), 'Scalable modeling of real graphs using Kronecker Multiplication', ICML.
[Milgram+67] Milgram, S. (1967), 'The small-world problem', Psychology Today 2, 60-67.
[Pennock+02] Pennock, Flake, Lawrence, Glover & Giles (2002), 'Winners don’t take all: Characterizing the competition for links on the web', PNAS.
[Wang+2002] Wang, M.; Madhyastha, T.; Chang, N. H.; Papadimitriou, S. & Faloutsos, C. (2002), 'Data Mining Meets Performance Evaluation: Fast Algorithms for Modeling Bursty Traffic', ICDE.

56 Contact us
Leman Akoglu: www.andrew.cmu.edu/~lakoglu, lakoglu@cs.cmu.edu
Christos Faloutsos: www.cs.cmu.edu/~christos, christos@cs.cmu.edu
Mary McGlohon: www.cs.cmu.edu/~mmcgloho, mmcgloho@cs.cmu.edu

57-60 Entropy plots [Wang+2002]
● From time series data, begin with resolution r = T/2.
● Record entropy H_R.
● Recursively take finer resolutions.
(Figure: weights vs. time, and entropy vs. resolution.)

61-64 Entropy Plots
● Self-similarity → linear plot
● Uniform: slope of plot s = 1. Point mass: s = 0.
● Bursty: 0.2 < s < 0.9
(Figure: entropy vs. resolution, with fitted slope s = 0.59.)
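
The entropy-plot construction summarized on these slides can be sketched as follows. The sketch assumes the series length is a power of two and indexes resolution by level r, where level r splits the timeline into 2^r equal intervals (so level 1 corresponds to interval length T/2). Entropy is measured in bits, so a uniform series gives slope 1. Function names and the two test series are hypothetical.

```python
import math


def entropy_plot(series):
    """Return [(r, H_r)] where H_r is the entropy (bits) of the weight
    distribution over 2**r equal-width time intervals."""
    n = len(series)
    levels = n.bit_length() - 1
    total = sum(series)
    if n < 2 or n != 1 << levels or total <= 0:
        raise ValueError("need a power-of-two length and positive total weight")
    points = []
    for r in range(1, levels + 1):
        bins = 1 << r
        width = n // bins
        h = 0.0
        for b in range(bins):
            p = sum(series[b * width:(b + 1) * width]) / total
            if p > 0:
                h -= p * math.log2(p)
        points.append((r, h))
    return points


def plot_slope(points):
    """Least-squares slope s of entropy against resolution level
    (needs at least two levels)."""
    n = len(points)
    mean_r = sum(r for r, _ in points) / n
    mean_h = sum(h for _, h in points) / n
    sxx = sum((r - mean_r) ** 2 for r, _ in points)
    sxy = sum((r - mean_r) * (h - mean_h) for r, h in points)
    return sxy / sxx


uniform = [1.0] * 16              # weight spread evenly over time
point_mass = [0.0] * 15 + [5.0]   # all weight in a single time step
print(plot_slope(entropy_plot(uniform)), plot_slope(entropy_plot(point_mass)))
```

The two test series reproduce the limiting cases on the slide: slope 1 for uniform traffic and slope 0 for a point mass. Bursty series fall in between.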

