Graphs over time: densification laws, shrinking diameters and possible explanations

Jurij Leskovec, CMU Jon Kleinberg, Cornell Christos Faloutsos, CMU

Introduction (I)
Key questions:
- Evolution over time
- “Normal” growth patterns?
Conventional wisdom:
- Constant average degree
- Slowly growing diameter
Recent work: social, technological (router or AS networks), scientific networks, and citation graphs
Key parameters:
- Node degree: heavy-tailed in-degree distributions
- Node-pair distance, or (effective) diameter

Introduction (II)

Observations I: Densification Law Exponents
9 datasets from 4 different sources. For each graph G{n(t), e(t)} up to time t, the DPL plot shows e(t) = f(n(t)) on a log-log scale.
1. ArXiv Citation Graph
- Directed: (i, j) in E if paper i cites paper j
- Time: January [year missing] to April 2003, monthly snapshots
- Densification Law plot: exponent α = 1.68 (well above 1: deviation from linear growth)
- Out-degree: phantom nodes/edges phenomenon due to the “missing past”
2. Patents Citation Graph
- 37 years (January 1, 1963 to December 30, 1999)
- α = [value missing]
3. Autonomous Systems Graph
- Communication network between Autonomous Systems (AS), i.e. sub-graphs; edges represent traffic between peers (source: Border Gateway Protocol logs)
- Time: 785 days (November [year missing] to January [year missing])
- Both addition and deletion of nodes/edges
- α = [value missing]
4. Affiliation Graphs
- Bipartite graphs from ArXiv: edges connect people to the papers they authored; co-authors each have at least one edge to the same paper node
- Time: April [year missing] to March 2002
- α ≈ 1.08 to 1.15 (5 sets)
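A densification exponent of this kind can be estimated by fitting a straight line to the log-log plot of e(t) against n(t). The following is a minimal sketch of that fit using ordinary least squares; the function name and the synthetic snapshot counts are illustrative assumptions, not data or code from the presentation.

```python
import math

def fit_densification_exponent(nodes, edges):
    """Fit e = c * n**alpha by least squares on (log n, log e).

    nodes, edges: per-snapshot node and edge counts (hypothetical inputs).
    Returns (alpha, c).
    """
    xs = [math.log(n) for n in nodes]
    ys = [math.log(e) for e in edges]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx                   # slope of the log-log line
    log_c = mean_y - alpha * mean_x     # intercept of the log-log line
    return alpha, math.exp(log_c)

# Synthetic snapshots that follow e = 2 * n**1.5 exactly (illustrative only).
nodes = [100, 200, 400, 800, 1600]
edges = [2.0 * n ** 1.5 for n in nodes]
alpha, c = fit_densification_exponent(nodes, edges)
```

An exponent of 1 corresponds to a constant average degree, while values above 1 indicate that the graph densifies as it grows.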

Observations II: Shrinking Effective Diameter
Definitions:
- g(d): fraction of connected node pairs with a shortest path of at most d hops
- Hop-plot: the pairs (d, g(d)), i.e. the CDF of distances between connected node pairs (interpolated to extend to the real line)
- Effective diameter: the value of d such that g(d) = 0.9
Robustness: could this result from artifacts?
(a) Sampling problems: shortest paths estimated with BFS and the Approximate Neighborhood Function (ANF)
(b) Disconnected components
(c) “Missing” past: pick a cut-off time t_0 > 0 as the beginning of the data, then put back the nodes/edges from before time t_0. Measurements:
  i. Full graph: effective diameter of the graph’s giant component
  ii. Post-t_0 sub-graph: as if we knew the full past, i.e. for t > t_0, take the graph with all nodes dated before t and measure the effective diameter of the sub-graph of nodes dated in [t_0, t]
  iii. Post-t_0 sub-graph, no past: for all nodes dated before t_0, delete all out-links
(d) Emergence of giant component
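The effective diameter defined on this slide can be computed exactly on a small graph with one BFS per node. The sketch below assumes an adjacency-list dictionary as input, counts a pair as reached at distance d when its shortest path has at most d hops, takes g(0) = 0, and interpolates the hop-plot linearly; these conventions and all names are assumptions made for illustration. Large graphs would need sampling or ANF instead.

```python
from collections import deque

def hop_counts(adj):
    """counts[d] = number of ordered connected pairs at distance exactly d >= 1."""
    counts = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:                      # plain BFS from this source
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    counts[dist[w]] = counts.get(dist[w], 0) + 1
                    queue.append(w)
    return counts

def effective_diameter(adj, q=0.9):
    """Smallest real d with interpolated g(d) = q, where g is the hop-plot CDF."""
    counts = hop_counts(adj)
    total = sum(counts.values())
    if total == 0:
        return 0.0                        # no connected pairs at all
    cumulative = 0
    prev_g = 0.0                          # convention: g(0) = 0
    for d in range(1, max(counts) + 1):
        cumulative += counts.get(d, 0)
        g = cumulative / total
        if g >= q:
            # linear interpolation between (d - 1, prev_g) and (d, g)
            return (d - 1) + (q - prev_g) / (g - prev_g)
        prev_g = g
    return float(max(counts))

# Undirected path 0-1-2-3-4: g(1) = 0.4, g(2) = 0.7, g(3) = 0.9, g(4) = 1.0
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
d_eff = effective_diameter(path)
```

For the five-node path the hop-plot reaches 0.9 at three hops, so the effective diameter is 3 even though the true diameter is 4.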

(c) “Missing Past”

(d) Emergence of giant component
- E.g. the diameter of the GCC shrinks in the ER model
- Could this lead to shrinking diameters? Answer: NO
Remarks:
- No-past graph: smaller GCC, but the deleted nodes become negligible as t grows large
- After 4 years, 90% of nodes are in the GCC
- The “mature” graph exhibits the same behavior
[Figure: fraction of all nodes that are part of the largest connected component (GCC) over time, for the full graph and the no-past graph]

Proposed Models
Objective: models that capture the observed phenomena:
- Densification power law
- Shrinking diameters
- Heavy-tailed in- and out-degree distributions
- Small-world phenomena (6 degrees of separation)
How?
- Prescribing each new node's out-links directly: ✗ (no insight)
- Instead: self-similarity, “communities within communities”
Examples of recursive patterns: geography-based links in computer networks, easier ties within smaller communities in social networks, hyperlinks between related topics on the WWW
Qualitative: linkage probability depends on the height of the least common ancestor
Quantitative: Difficulty Constant = measure of the difficulty of crossing communities

Community Guided Attachment (CGA)
- Tree Γ with height H and branching factor b
- Number of leaves: n = b^H
- h(v,w): height of the smallest sub-tree containing both v and w
- Difficulty function f(h) (linkage probability): decreasing in h and scale-free, i.e. level-independent: f(h) = c^(-h), where c ≥ 1 is the difficulty constant
THEOREM 1. In the CGA random graph model, the expected average out-degree d of a node is proportional to
- n^(1 − log_b(c)) if 1 ≤ c < b
- log_b(n) if c = b
- a constant if c > b
Note: if 1 ≤ c < b, the densification exponent is α = 2 − log_b(c), which takes all values in (1, 2] as c ranges over [1, b)
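The scaling in Theorem 1 can be checked numerically. The sketch below assumes the difficulty function has the form f(h) = c^(-h) (the transcript does not preserve the formula, so this is an assumption), and uses the fact that a leaf of a b-ary tree has (b − 1)·b^(h−1) other leaves whose smallest common sub-tree has height h. The function names are illustrative.

```python
import math

def expected_out_degree(b, c, H):
    """Expected out-degree of a leaf in a b-ary tree of height H,
    assuming linkage probability f(h) = c**(-h)."""
    return sum((b - 1) * b ** (h - 1) * c ** (-h) for h in range(1, H + 1))

def local_exponent(b, c, H):
    """Growth exponent of the out-degree in n = b**H, measured between
    heights H - 1 and H (n grows by a factor of b in that step)."""
    d_prev = expected_out_degree(b, c, H - 1)
    d_next = expected_out_degree(b, c, H)
    return math.log(d_next / d_prev) / math.log(b)

# c < b: out-degree grows like n**(1 - log_b(c)), so alpha = 2 - log_b(c)
growing = local_exponent(2, 1.5, 30)
predicted = 1 - math.log(1.5, 2)

# c > b: out-degree converges to a constant, so the exponent tends to 0
flat = local_exponent(2, 4.0, 30)
```

With b = 2 and c = 1.5 the measured exponent approaches 1 − log_2(1.5) ≈ 0.415, which corresponds to a densification exponent α ≈ 1.415.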

Dynamic Community Guided Attachment
Idea: nodes also connect to internal nodes of Γ
Construction: in time step t,
1) add b new leaves as children of each leaf, turning the b-ary tree of depth (t − 1) into one of depth t
2) each node v forms out-links with any node w independently w.p. c^(-d(v,w)/2), which equals c^(-h(v,w)) for leaves
Extension: tree-distance d(v,w) = length of the path between v and w in Γ, e.g. for leaves: d(v,w) = 2h(v,w)
THEOREM 2. The dynamic CGA model has the following properties:
- When c < b, the average node degree is n^(1 − log_b(c)) and the in-degrees follow a Zipf distribution with exponent (1/2) log_b(c)
- When b < c < b^2, the average node degree is constant, and the in-degrees follow a Zipf distribution with exponent 1 − (1/2) log_b(c)
- When c > b^2, the average node degree is constant and the probability of an in-degree exceeding any constant bound k decreases exponentially in k
Power-law distribution, Zipf's law: given natural language utterances, the frequency of a word ~ 1/rank

Forest fire model
Idea: link {v_i} (i = 1, ..., k) to nodes of G such that G_final shows:
- “rich get richer” attachment → heavy-tailed in-degrees
- “copying” model → communities
- Community Guided Attachment → Densification Power Law
- shrinking effective diameters
Construction:
i. nodes arrive over time
ii. each node has a “center of gravity” in some part of the network
iii. linkage probability decreases rapidly with distance from this center of gravity
Result: some nodes form a very large number of out-links, giving a skewed out-degree distribution and “bridges” between disparate sub-graphs

Forest fire model: Model Formulation
Parameters: p = forward burning probability, p_b = backward burning probability, r = p_b / p
{t = 1}: G_1 = {v}
{t > 1}: node v joins; G_t = graph up to time t:
(i) v first chooses an ambassador node w uniformly at random
(ii) for a random variable x (binomially distributed) with E(x) = 1/(1 − p), v selects x nodes w_1, w_2, ..., w_x incident to w, choosing in-links with probability r times less than out-links
(iii) v forms out-links to w_1, w_2, ..., w_x, and then applies step (ii) recursively to each w_i (nodes are not re-visited, so there is no cycling)
Extensions:
- “Orphans”: isolated nodes, e.g. papers in arXiv citing non-arXiv ones. Two incorporation approaches: (a) start with n_0 > 1 nodes; (b) with probability q > 0 a newcomer forms no links
- Multiple ambassadors: newcomers choose more than one ambassador with some positive probability, i.e. burning starts from 2 or more nodes
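The formulation above can be sketched as a short simulation. This is a simplified illustration under stated assumptions, not the authors' implementation: the newcomer also links to its ambassador, and x is drawn from a geometric distribution with mean p/(1 − p) so that a fire can die out, whereas the slide states a binomially distributed x with mean 1/(1 − p). All function and variable names are hypothetical.

```python
import random

def burn_count(p, rng):
    """Geometric draw with mean p / (1 - p) (assumption: replaces the
    slide's binomially distributed x so that x can be zero)."""
    x = 0
    while rng.random() < p:
        x += 1
    return x

def forest_fire(n, p, r, seed=0):
    """Grow a directed graph of n nodes; returns {node: set of out-links}."""
    rng = random.Random(seed)
    out_links = {0: set()}
    in_links = {0: set()}
    for v in range(1, n):
        ambassador = rng.randrange(v)          # step (i): uniform ambassador
        visited = {ambassador}
        frontier = [ambassador]
        while frontier:                        # steps (ii)-(iii), iteratively
            w = frontier.pop()
            # neighbours reached via w's out-links have weight 1,
            # neighbours reached via w's in-links have weight r
            candidates = [(u, 1.0) for u in out_links[w] if u not in visited]
            if r > 0:
                candidates += [(u, r) for u in in_links[w]
                               if u not in visited and u not in out_links[w]]
            for _ in range(min(burn_count(p, rng), len(candidates))):
                weights = [weight for _, weight in candidates]
                idx = rng.choices(range(len(candidates)), weights=weights)[0]
                u, _ = candidates.pop(idx)     # sample without replacement
                visited.add(u)                 # never re-visit a node
                frontier.append(u)
        out_links[v] = set(visited)            # v links to every burned node
        in_links[v] = set()
        for u in visited:
            in_links[u].add(v)
    return out_links

graph = forest_fire(200, p=0.35, r=0.5, seed=1)
edge_count = sum(len(targets) for targets in graph.values())
```

Recording the node and edge counts after each arrival gives the n(t) and e(t) series needed to measure a densification exponent for a given (p, r) pair.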

Results
Heavy-tailed in- and out-degree distributions
- α = 1.01: p = 0.35, p_b = 0.20
- α = 1.32: p = 0.37, p_b = 0.33
- α ≈ 2: p = 0.38, p_b = 0.35

Phase Plot
Contour plot of the densification exponent α with respect to
(a) backward burning ratio r (y-axis)
(b) forward burning probability p (x-axis)
White: α = 1 (constant average degree)
Black: α = 2 (graph is “dense”)
Grey area: very narrow, i.e. a sharp transition
Note (diameter): fit a logarithmic function d = α log t + β, where t is the current time (~ n(t)); a coefficient α < 0 means a decreasing diameter

Conclusions
Innovation: time evolution of real graphs
Main findings:
- Densification Power Law: the average out-degree grows over time
- Shrinking diameters: in real networks, contrary to the standard assumption, the diameter may exhibit a gradual decrease as the network grows
- The Community Guided Attachment model can lead to the Densification Power Law; a single parameter suffices
- The Forest Fire Model, based on only two parameters, captures patterns observed both in previous work and in the current study: heavy-tailed in- and out-degrees, the Densification Power Law, and a shrinking diameter
Potential relevance in multiple settings:
- “what if” scenarios
- forecasting of future parameters of computer and social networks
- anomaly detection on monitored graphs
- designing graph sampling algorithms
- realistic graph generators
