Weighted Graphs and Disconnected Components: Patterns and a Generator
Mary McGlohon, Leman Akoglu, Christos Faloutsos. In KDD '08.
Presented by 현근수, IDB Lab, 2014-08-01.

2 / 44 Outline
- Introduction
- Related Work
- Data
- Observations
- Generative model
- Conclusion

3 / 44 “Disconnected” components
- In real graphs, a giant connected component emerges.
- What about the smaller components?
- How do they emerge, and how do they join the giant one?

4 / 44 Weighted edges
- Real graphs have heavy-tailed degree distributions.
- What else can we say about their edges?
- How are they repeated, or otherwise weighted?

5 / 44 Goals
- Observe “next-largest connected components” (NLCCs)
  - Q1: How does the giant connected component (GCC) emerge?
  - Q2: How do NLCCs emerge and join the GCC?
- Find properties that govern edge weights
  - Q3: How does the total weight of the graph relate to the number of edges?
  - Q4: How do the weights of nodes relate to their degree?
  - Q5: Does this relation change over time?
- Q6: Can we produce an emergent, generative model that matches these properties?

6 / 44 Properties of networks
- Small diameter (“small world” phenomenon): [Milgram 67] [Leskovec, Horvitz 07]
- Heavy-tailed degree distribution: [Barabási, Albert 99] [Faloutsos, Faloutsos, Faloutsos 99]
- Densification: [Leskovec, Kleinberg, Faloutsos 05]
- “Middle region” components, in addition to the GCC and singletons: [Kumar, Novak, Tomkins 06]

7 / 44 Generative models
- Erdős-Rényi model [Erdős, Rényi 60]
- Preferential attachment [Barabási, Albert 99]
- Forest Fire model [Leskovec, Kleinberg, Faloutsos 05]
- Kronecker multiplication [Leskovec, Chakrabarti, Kleinberg, Faloutsos 07]
- Edge copying model [Kumar, Raghavan, Rajagopalan, Sivakumar, Tomkins, Upfal 00]
- “Winners don’t take all” [Pennock, Flake, Lawrence, Glover, Giles 02]

8 / 44 Diameter
- The diameter of a graph is its “longest shortest path”.
- The effective diameter is the smallest distance d such that at least 90% of connected node pairs are within d hops of each other.
[Figure: example graph on nodes n1 to n7 with diameter = 3]
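Both definitions can be made concrete with a small sketch (the helper names are hypothetical, not from the paper): run a BFS from every node, collect the hop distances of all reachable pairs, take the maximum as the diameter, and take the smallest distance that covers at least 90% of those pairs as the effective diameter. This is the simple non-interpolated variant; some work interpolates between integer distances, and exact all-pairs BFS is only practical for small graphs.

```python
from collections import deque


def bfs_distances(adj, src):
    """Hop distances from src to every node reachable from it (plain BFS)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter_and_effective_diameter(adj, percent=90):
    """Exact diameter and (integer) effective diameter of an unweighted graph.

    adj maps each node to an iterable of neighbours. Only reachable ordered
    pairs (s, t) are counted, so disconnected graphs are handled.
    """
    pair_dists = []
    for s in adj:  # one BFS per node: O(N * E), small graphs only
        pair_dists.extend(d for t, d in bfs_distances(adj, s).items() if t != s)
    if not pair_dists:
        return 0, 0  # no edges, hence no connected pairs
    pair_dists.sort()
    diameter = pair_dists[-1]  # the "longest shortest path"
    # k = ceil(percent% of the pairs), in integer arithmetic to avoid float rounding
    k = -(-percent * len(pair_dists) // 100)
    return diameter, pair_dists[k - 1]


# Path a-b-c-d: only 10 of its 12 ordered pairs are within 2 hops (< 90%)
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(diameter_and_effective_diameter(path))  # (3, 3)
```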

9 / 44 Unipartite networks
- Postnet: blog posts, with hyperlinks between them
- Blognet: Postnet aggregated to the blog level; repeated edges
- Patent: patent citations
- NIPS: academic citations
- Arxiv: academic citations
- NetTraffic: packets; repeated edges
- Autonomous Systems (AS): packets; repeated edges
[Figure: example unipartite graph on nodes n1 to n7; one edge labeled (3)]

11 / 44 Unipartite networks (nodes, edges, time span)
- Postnet: 250K, 218K, 80 days
- Blognet: 60K, 125K, 80 days
- Patent: 4M, 8M, 17 yrs
- NIPS: 2K, 3K, 13 yrs
- Arxiv: 30K, 60K, 13 yrs
- NetTraffic: 21K, 3M, 52 mo
- AS: 12K, 38K, 6 mo
[Figure: example unipartite graph on nodes n1 to n7]

12 / 44 Bipartite networks
- IMDB: actor-movie network
- Netflix: user-movie ratings
- DBLP: repeated edges
  - Author-Keyword
  - Keyword-Conference
  - Author-Conference
- US Election Donations: $ weights, repeated edges
  - Orgs-Candidates
  - Individuals-Orgs
[Figure: example bipartite graph, nodes n1 to n4 on one side and m1 to m3 on the other]

14 / 44 Bipartite networks (nodes, edges, time span)
- IMDB: 757K, 2M, 114 yrs
- Netflix: 125K, 14M, 72 mo
- DBLP (25 yrs):
  - Author-Keyword: 27K, 189K
  - Keyword-Conference: 10K, 23K
  - Author-Conference: 17K, 22K
- US Election Donations (22 yrs):
  - Orgs-Candidates: 23K, 877K
  - Individuals-Orgs: 6M, 10M
[Figure: example bipartite graph, nodes n1 to n4 on one side and m1 to m3 on the other]

15 / 44 Observation 1: Gelling point
Q1: How does the GCC emerge?

16 / 44 Observation 1: Gelling point
- Most real graphs display a gelling point, which ends an initial “burning off” period.
- After the gelling point, they exhibit typical behavior, such as a shrinking diameter. The gelling point itself is marked by a spike in diameter.
[Figure: diameter vs. time, IMDB; t = 1914 marked]

17 / 44 Observation 2: NLCC behavior
Q2: How do NLCCs emerge and join the GCC? Do they keep growing? Do they shrink? Do they stabilize?

18 / 44 Observation 2: NLCC behavior
- After the gelling point, the GCC takes off, but the NLCCs remain constant in size or oscillate.
[Figure: connected-component size vs. time, IMDB]
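One way to measure the GCC and NLCC sizes tracked here is to replay the edge stream in time order through a union-find (disjoint-set) structure and record the largest component sizes after each timestamp. This is a sketch under assumptions: edges arrive as hypothetical `(t, u, v)` tuples, edge direction is ignored, and a node enters the graph with its first edge, so nodes that never link (singletons) are not counted.

```python
from collections import defaultdict


class DisjointSet:
    """Union-find with path halving and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def add(self, x):
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the smaller tree under the larger one
        self.size[ra] += self.size[rb]


def component_sizes_over_time(timed_edges, k=3):
    """timed_edges: iterable of (t, u, v), treated as undirected.

    Returns [(t, sizes)], where sizes lists the k largest component sizes
    after all edges up to and including t: sizes[0] is the GCC and
    sizes[1:] are the next-largest components (NLCCs).
    """
    by_t = defaultdict(list)
    for t, u, v in timed_edges:
        by_t[t].append((u, v))
    ds = DisjointSet()
    history = []
    for t in sorted(by_t):
        for u, v in by_t[t]:
            ds.add(u)
            ds.add(v)
            ds.union(u, v)
        # Recomputing all roots is O(N) per timestamp; fine for a sketch.
        roots = {ds.find(x) for x in list(ds.parent)}
        sizes = sorted((ds.size[r] for r in roots), reverse=True)
        history.append((t, sizes[:k]))
    return history


edges = [(1, "a", "b"), (1, "c", "d"), (2, "b", "c"), (2, "e", "f"), (3, "g", "h")]
for t, sizes in component_sizes_over_time(edges):
    print(t, sizes)
```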

19 / 44 Observation 3
Q3: How does the total weight of the graph relate to the number of edges?

20 / 44 Observation 3: Fortification effect
- Is the total amount ($) simply proportional to the number of checks?
[Figure: |$| vs. |Checks|, Orgs-Candidates]

21 / 44 Observation 3: Fortification effect
- Weight additions follow a power law with respect to the number of edges: $W(t) \propto E(t)^{w}$
  - W(t): total weight of the graph at time t
  - E(t): total number of edges of the graph at time t
  - w: power-law exponent
  - 1.01 < w < 1.5: super-linear! (more checks, even more $)
[Figure: |$| vs. |Checks|, Orgs-Candidates]
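Measuring the fortification exponent w amounts to fitting a straight line in log-log space. The sketch below assumes each check is a hypothetical `(timestamp, amount)` record (one edge per check), builds the cumulative E(t) and W(t) series, and fits log W against log E by ordinary least squares; the authors' exact fitting procedure may differ.

```python
import math
from collections import defaultdict
from itertools import accumulate


def loglog_slope(xs, ys):
    """Ordinary least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return sxy / sxx


def fortification_exponent(checks):
    """checks: iterable of (timestamp, amount), one edge (check) each.

    Returns the fitted w in W(t) ~ E(t)^w, using cumulative totals per timestamp.
    """
    per_t = defaultdict(lambda: [0, 0.0])  # t -> [checks at t, $ at t]
    for t, amount in checks:
        per_t[t][0] += 1
        per_t[t][1] += amount
    ts = sorted(per_t)
    if len(ts) < 2:
        raise ValueError("need at least two distinct timestamps")
    E = list(accumulate(per_t[t][0] for t in ts))  # E(t): edges so far
    W = list(accumulate(per_t[t][1] for t in ts))  # W(t): total weight so far
    return loglog_slope(E, W)


# Doubling the checks (1 -> 2) quadruples the money (100 -> 400), so w = 2
print(round(fortification_exponent([(1, 100.0), (2, 300.0)]), 6))  # 2.0
```

A value of w above 1 on real data is what the slide calls super-linear; equal-sized checks would give w = 1.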

22 / 44 Observations 4 and 5
Q4: How do the weights of nodes relate to their degree?
Q5: Does this relation change over time?

23 / 44 Observation 4: Snapshot power law
- At any time, the total incoming weight of a node grows as a power of its in-degree, $W_{\mathrm{in}} \propto d_{\mathrm{in}}^{\,iw}$, with a super-linear exponent (1 < iw < 1.26)
- More donors, even more $ (e.g., John Kerry: $10M received from 1K donors)
[Figure: in-weight ($) vs. in-degree (# donors), Orgs-Candidates]

24 / 44 Observation 5: Snapshot power law
- For a given graph, this exponent iw is constant over time.
[Figure: exponent iw vs. time, Orgs-Candidates]
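Observations 4 and 5 can be checked together: fit the per-node exponent iw on several cumulative snapshots and see whether it stays flat. The sketch assumes hypothetical `(t, src, dst, weight)` records with positive weights, counts in-degree as the number of distinct sources (“# donors”), and fits by ordinary least squares over nodes in log-log space; the original analysis may bin or fit differently.

```python
import math
from collections import defaultdict


def loglog_slope(xs, ys):
    """OLS slope of log(y) against log(x); None if all x are equal."""
    if len(set(xs)) < 2:
        return None  # slope undefined without at least two distinct x values
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return sxy / sxx


def snapshot_exponents(timed_edges, snapshot_times):
    """For each snapshot time T, keep edges with t <= T and fit
    in_weight(node) ~ in_degree(node)^iw over nodes with incoming edges.

    Returns {T: iw}, with None when iw cannot be fitted.
    """
    result = {}
    for T in snapshot_times:
        donors = defaultdict(set)       # node -> distinct sources so far
        in_weight = defaultdict(float)  # node -> total incoming weight so far
        for t, src, dst, w in timed_edges:
            if t <= T:
                donors[dst].add(src)
                in_weight[dst] += w
        nodes = list(donors)
        if not nodes:
            result[T] = None
            continue
        result[T] = loglog_slope([len(donors[n]) for n in nodes],
                                 [in_weight[n] for n in nodes])
    return result


edges = [(1, "x", "A", 10.0), (1, "y", "B", 20.0), (2, "z", "B", 20.0)]
print(snapshot_exponents(edges, [1, 2]))  # at T=1 all in-degrees are equal: no fit
```

On real data, Observation 5 corresponds to the returned values staying roughly constant across snapshot times.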

26 / 44 Goals of model
a) Emergent, intuitive behavior
b) Shrinking diameter
c) Constant NLCCs
d) Densification power law
e) Power-law degree distribution
Proposed model meeting these goals: the “Butterfly” model

27-32 / 44 Butterfly model in action
- A node joins the network with its own parameter p_step (“curiosity”).
- With (global) probability p_host (“cross-disciplinarity”), it chooses a random host.
  - With (global) probability p_link (“friendliness”), it creates a link.
  - With probability p_step, it travels to a random neighbor. Repeat.
[Figure: animation on an example graph with nodes n1 to n8]

33-36 / 44 Butterfly model in action
- Once there are no more “steps”, repeat the “host” procedure:
  - With probability p_host, choose a new host, possibly link, etc.
  - Continue until there are no more steps and no more hosts.
[Figure: animation on the same example graph, nodes n1 to n8]

37 / 44 a) Emergent, intuitive behavior
Novelties of the model:
- Nodes link only with some probability
  - A node may choose a host but not link to it (starting a new component)
- Incoming nodes are “social butterflies”
  - A node may have several hosts (merging components)
- Some nodes are friendlier than others
  - p_step differs for each node
  - This creates a power-law degree distribution (theorem)

38 / 44 Validation of Butterfly
- Chose the following parameters:
  - p_host = 0.3
  - p_link = 0.5
  - p_step ~ U(0,1)
- Ran 10 simulations
- 100,000 nodes per simulation
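To make the procedure on the preceding slides concrete, here is a minimal Python sketch of a Butterfly-style generator with the validation parameters above as defaults (p_host = 0.3, p_link = 0.5, p_step ~ U(0,1)). It follows the slides' description, but several details are assumptions rather than the authors' specification: hosts are chosen uniformly among existing nodes, the first host is also picked with probability p_host (so a node may stay a singleton), the walk never steps back onto the arriving node, and repeated links collapse into one unweighted edge. The function name `butterfly_graph` is hypothetical.

```python
import random


def butterfly_graph(n_nodes, p_host=0.3, p_link=0.5, seed=None):
    """Hypothetical sketch of a Butterfly-style generator, following the slides.

    Nodes 0..n_nodes-1 arrive one at a time. p_host and p_link are global;
    each arriving node draws its own p_step ~ U(0, 1). Returns an undirected
    adjacency dict {node: set of neighbours}.
    """
    if n_nodes <= 0:
        return {}
    rng = random.Random(seed)
    adj = {0: set()}
    for new in range(1, n_nodes):
        p_step = rng.random()  # this node's own "curiosity"
        adj[new] = set()
        # With probability p_host pick a (further) host; repeat until the draw fails.
        while rng.random() < p_host:
            current = rng.randrange(new)  # uniformly random existing node
            while True:
                if rng.random() < p_link:  # possibly link to the node we are at
                    adj[new].add(current)
                    adj[current].add(new)
                # Candidate next stops: neighbours of current, excluding the newcomer.
                nbrs = sorted(v for v in adj[current] if v != new)
                if not nbrs or rng.random() >= p_step:
                    break  # no more steps from this host
                current = rng.choice(nbrs)  # step to a random neighbour
    return adj


adj = butterfly_graph(2000, seed=42)
n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
print(len(adj), "nodes,", n_edges, "edges")
```

Per slide 37, it is the per-node variation in p_step that yields the power-law degree distribution. At the 100,000-node scale of the validation runs this pure-Python version may be slow, because nodes with p_step close to 1 take very long walks.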

39 / 44 b) Shrinking diameter
- In the model, gelling usually occurred around N = 20,000 nodes.
[Figure: diameter vs. number of nodes; N = 20,000 marked]

40 / 44 c) Oscillating NLCCs
- NLCC sizes stay constant or oscillate.
[Figure: NLCC size vs. number of nodes; N = 20,000 marked]

41 / 44 d) Densification power law
- Densification exponent a, where $E(t) \propto N(t)^{a}$:
  - Our datasets: a in (1.03, 1.7)
  - [Leskovec+05-KDD]: a in (1.1, 1.7)
  - Simulation: a in (1.1, 1.2)
[Figure: edges vs. nodes; N = 20,000 marked]
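The densification exponent a is the exponent of the densification power law [Leskovec+05-KDD], with N(t) the number of nodes and E(t) the number of edges at time t. Taking logs turns it into a straight line of slope a, and dividing by N(t) shows why a > 1 means the average degree keeps growing. The a = 1.2 line below is only an arithmetic illustration, not a measured value.

```latex
E(t) \propto N(t)^{a}
\quad\Longleftrightarrow\quad
\log E(t) = a \log N(t) + c
\qquad\Longrightarrow\qquad
\bar{d}(t) \propto \frac{E(t)}{N(t)} \propto N(t)^{a-1}
\\[4pt]
a = 1.2:\quad N \to 2N \;\Rightarrow\; E \to 2^{1.2}E \approx 2.30\,E,
\qquad \bar{d} \to 2^{0.2}\,\bar{d} \approx 1.15\,\bar{d}
```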

42 / 44 e) Power-law degree distribution
- Power-law degree distribution, with exponents of approximately -2
[Figure: count vs. degree]

43 / 44 Summary
- Studied several diverse public graphs
  - Measured at many timestamps
  - Unipartite and bipartite
  - Blogs, citations, real-world, network traffic
  - Largest had 6 million nodes and 10 million edges

44 / 44 Summary
- Observations on unweighted graphs:
  - A1: The GCC emerges at the “gelling point”.
  - A2: NLCCs are of constant / oscillating size.
- Observations on weighted graphs:
  - A3: Total weight increases super-linearly with the number of edges.
  - A4: A node's weight increases super-linearly with its degree (power-law exponent iw).
  - A5: iw remains constant over time.
- A6: An intuitive, emergent generative “Butterfly” model that matches these properties