Dynamics of networks Jure Leskovec Computer Science Department

Presentation transcript:

Dynamics of networks
Jure Leskovec, Computer Science Department, Cornell University / Stanford University

Today: Rich data
Large on-line systems have detailed records of human activity:
On-line communities: Facebook (64 million users, billion-dollar business), MySpace (300 million users)
Communication: Instant Messenger (~1 billion users)
News and social media: Blogging (250 million blogs worldwide; presidential candidates run blogs)
On-line worlds: World of Warcraft (internal economy of 1 billion USD), Second Life (GDP of 700 million USD in ’07)

Rich data as networks
[Figure panels: a) World Wide Web, b) Internet (AS), c) Social networks, d) Communication, e) Citations, f) Protein interactions]

Networks: What do we know?
We know lots about the network structure:
Properties: Scale free [Barabasi ’99], 6-degrees of separation [Milgram ’67], Small-world [Watts-Strogatz ’98], Bipartite cores [Kumar et al. ’99], Network motifs [Milo et al. ’02], Communities [Newman ’99], Hubs and authorities [Page et al. ’98, Kleinberg ’99]
Models: Preferential attachment [Barabasi ’99], Small-world [Watts-Strogatz ’98], Copying model [Kleinberg et al. ’99], Heuristically optimized tradeoffs [Fabrikant et al. ’02], Latent Space Models [Raftery et al. ’02], Searchability [Kleinberg ’00], Bowtie [Broder et al. ’00], Exponential Random Graphs [Frank-Strauss ’86], Transit-stub [Zegura ’97], Jellyfish [Tauro et al. ’01]
We know much less about processes and dynamics of networks

This talk: Network dynamics
Network evolution: How does network structure change as the network grows and evolves?
Diffusion and cascading behavior: How do rumors and diseases spread over networks?

This talk: Scale matters
We need massive network data for the patterns to emerge:
MSN Messenger network [WWW ’08] (the largest social network ever analyzed): 240M people, 255B messages, 4.5 TB data
Product recommendations [EC ’06]: 4M people, 16M recommendations
Blogosphere [work in progress]: 60M posts, 120M links

This talk: The structure
Patterns & Observations
  Diffusion & Cascades: Q1: What do cascades look like? Q2: How likely are people to get influenced?
  Network Evolution: Q5: How does network structure change as the network grows/evolves?
Models & Algorithms
  Diffusion & Cascades: Q3: How do we find influential nodes? Q4: How do we quickly detect epidemics?
  Network Evolution: Q6: How do we generate realistic looking and evolving networks?

Diffusion and Cascades
Behavior that cascades from node to node like an epidemic:
News, opinions, rumors
Word-of-mouth in marketing
Infectious diseases
As activations spread through the network they leave a trace: a cascade
[Figure: a network and a cascade (propagation graph) on it]

Network diffusion
Network diffusion has been extensively studied:
Human behavior [Granovetter ’78]
Diseases and epidemics [Bailey ’75]
Innovations [Rogers ’95]
On the web [Gruhl et al. ’04]
Organizations [Burt ’04, Aral-Brynjolfsson-van Alstyne ’07]
For marketing purposes [Richardson-Domingos ’02, Hill-Provost-Volinsky ’06]
Trading behaviors [Hirshleifer et al. ’94]
Decision making [Bikhchandani ’98, Surowiecki ’05]
We know much less about individual cascading events that lead to diffusion

Setting 1: Viral marketing [w/ Adamic-Huberman, EC ’06]
People send and receive product recommendations, purchase products
Data: Large online retailer: 4 million people, 16 million recommendations, 500k products
[Figure: recommendation incentive scheme with labels "10% credit" and "10% off"]

Setting 2: Blogosphere [w/ Glance-Hurst et al., SDM ’07]
Bloggers write posts and refer (link) to other posts, and the information propagates
Data: 10.5 million posts, 16 million links

Q1) What do cascades look like? [w/ Kleinberg-Singh, PAKDD ’06]
Are they stars? Chains? Trees?
[Figure: information cascades (blogosphere) and viral marketing cascades (DVD recommendations), ordered by frequency]
Viral marketing cascades are more social: collisions (no summarizers), richer non-tree structures

Q2) Human adoption curves
Prob. of adoption depends on the number of friends who have adopted [Bass ’69, Schelling ’78]
What is the shape? Diminishing returns? Critical mass?
[Figure: two candidate curves; x-axis: k = number of friends adopting; y-axis: prob. of adoption]
The distinction has consequences for models and algorithms
To find the answer we need lots of data

Q2) Adoption curve: Validation [w/ Adamic-Huberman, EC ’06]
DVD recommendations (8.2 million observations)
[Figure: probability of purchasing vs. # recommendations received]
Logarithmic contribution: strength of the group increases logarithmically (?)
Adoption curve follows the diminishing returns. Can we exploit this?
Later similar findings were made for group membership [Backstrom-Huttenlocher-Kleinberg ’06] and probability of communication [Kossinets-Watts ’06]

My research: The structure
Patterns & Observations
  Diffusion & Cascades: A1: Cascade shapes. A2: Human adoption follows diminishing returns
  Network Evolution: Q5: How does network structure change as the network grows/evolves?
Models & Algorithms
  Diffusion & Cascades: Q3: How do we find influential nodes? Q4: How do we quickly detect epidemics?
  Network Evolution: Q6: How do we generate realistic looking and evolving networks?

Cascade & outbreak detection
Blogs (information epidemics): Which are the influential/infectious blogs?
Viral marketing: Who are the trendsetters? Influential people?
Disease spreading: Where to place monitoring stations to detect epidemics?

The problem: Detecting cascades [w/ Krause-Guestrin et al., KDD ’07]
How to quickly detect epidemics as they spread?
[Figure: network with cascades c1, c2, c3]

Two parts to the problem [w/ Krause-Guestrin et al., KDD ’07]
Cost: Cost of monitoring is node dependent
Reward: Minimize the number of affected nodes: if A are the monitored nodes, let R(A) denote the number of nodes we save
We also consider other rewards: minimize time to detection, maximize number of detected outbreaks
[Figure: cascade with monitored set A and saved nodes R(A)]

Optimization problem
Given: graph G(V,E), budget M, and data on how cascades $C_1, \dots, C_i, \dots, C_K$ spread over time
Select a set of nodes A maximizing the reward $R(A) = \sum_i \mathrm{Prob}(i)\, R_i(A)$, where $R_i(A)$ is the reward for detecting cascade i, subject to cost(A) ≤ M
Solving the problem exactly is NP-hard (max-cover) [Khuller et al. ’99]
We extend the result of [Kempe-Kleinberg-Tardos ’03]

Detection: Solution outline [w/ Krause-Guestrin et al., KDD ’07]
Problem structure: submodularity of the reward functions (think of it as “concavity”)
CELF algorithm with an approximation guarantee
Speed up: lazy evaluation

Problem structure: Submodularity [w/ Krause-Guestrin et al., KDD ’07]
[Figure: new monitored node S’ added to placement A = {S1, S2}, where adding S’ helps a lot, and to placement B = {S1, S2, S3, S4}, where adding S’ helps very little]
Gain of adding a node to a small set is larger than gain of adding a node to a large set
Submodularity: diminishing returns (think of it as “concavity”)
Other objective functions are not just set cover

Problem structure: Submodularity
We must show R is submodular: for all $A \subseteq B$ and every node u,
$R(A \cup \{u\}) - R(A) \ge R(B \cup \{u\}) - R(B)$
(left side: gain of adding a node to a small set; right side: gain of adding a node to a large set)
Natural example: sets $A_1, A_2, \dots, A_n$; R(A) = size of the union of the $A_i$ (size of covered area)
If $R_1, \dots, R_K$ are submodular, then $\sum_i p_i R_i$ is submodular (for nonnegative weights $p_i$)
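
The diminishing-returns inequality above can be checked by brute force on a small set-cover instance. The following is a minimal sketch, not code from the talk: the function names and the toy sets are made up for illustration, and the exhaustive check is only feasible for a handful of sets.

```python
from itertools import combinations

def coverage(sets, chosen):
    """R(A): size of the union of the sets whose keys are in `chosen`."""
    covered = set()
    for key in chosen:
        covered |= sets[key]
    return len(covered)

def is_submodular(sets):
    """Check R(A + u) - R(A) >= R(B + u) - R(B) for every A subset of B and u not in B."""
    keys = list(sets)
    for size_b in range(len(keys) + 1):
        for b in combinations(keys, size_b):
            for size_a in range(len(b) + 1):
                for a in combinations(b, size_a):
                    for u in keys:
                        if u in b:
                            continue
                        gain_small = coverage(sets, a + (u,)) - coverage(sets, a)
                        gain_large = coverage(sets, b + (u,)) - coverage(sets, b)
                        if gain_small < gain_large:
                            return False
    return True

# Hypothetical toy instance: four sensors, each "covering" a few items.
toy_sets = {"s1": {1, 2, 3}, "s2": {3, 4}, "s3": {4, 5, 6}, "s4": {1, 6}}
print(is_submodular(toy_sets))
```

Because coverage functions are submodular, the check passes for any choice of sets; it is meant to make the definition concrete rather than to prove anything.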

Reward function is submodular [w/ Krause-Guestrin et al., KDD ’07]
Theorem: Reward function is submodular
Consider cascade i: $R_i(u_k)$ = set of nodes saved from $u_k$; $R_i(A)$ = size of the union of $R_i(u_k)$ over $u_k \in A$; hence $R_i$ is submodular
Global optimization: $R(A) = \sum_i \mathrm{Prob}(i)\, R_i(A)$, hence R is submodular
Other objective functions are not just set cover
[Figure: cascade i with nodes u1, u2 and the saved sets R_i(u1), R_i(u2)]

Solution: CELF Algorithm [w/ Krause-Guestrin et al., KDD ’07]
We develop the CELF algorithm: two independent runs of a modified greedy
Solution set A’: ignore cost, greedily optimize reward
Solution set A’’: greedily optimize reward/cost ratio
Pick the best of the two: arg max(R(A’), R(A’’))
Theorem: If R is submodular then CELF is near optimal: CELF achieves a ½(1-1/e) factor approximation
[Figure: marginal reward of candidate nodes a, b, c, d, e as the current solution grows from {} to {a} to {a, c}]
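
The two greedy runs, together with the lazy evaluation mentioned in the solution outline, can be sketched as follows. This is an illustrative reading of the slide and not the authors' implementation: the reward is a plain coverage count, the "ignore cost" run still skips nodes that do not fit the budget, and all names and toy numbers (`saved`, `cost`, the budget of 3) are assumptions.

```python
import heapq

def reward(saved, chosen):
    """R(A): number of distinct nodes saved by the sensors in `chosen`."""
    covered = set()
    for u in chosen:
        covered |= saved[u]
    return len(covered)

def lazy_greedy(saved, cost, budget, use_ratio):
    """Greedy selection with lazy re-evaluation of marginal gains.

    Submodularity means a stale marginal gain is an upper bound on the
    current one, so only the top of the heap ever needs recomputing.
    """
    chosen, spent, current, rnd = [], 0, 0, 0
    heap = []
    for u in saved:
        gain = reward(saved, [u])
        prio = gain / cost[u] if use_ratio else gain
        heap.append((-prio, u, rnd))  # rnd = round in which prio was computed
    heapq.heapify(heap)
    while heap:
        neg_prio, u, stamp = heapq.heappop(heap)
        if -neg_prio <= 0:
            break  # no remaining node can improve the reward
        if spent + cost[u] > budget:
            continue  # does not fit the budget, drop it
        if stamp == rnd:
            # Priority is up to date, so u is the best remaining choice.
            chosen.append(u)
            spent += cost[u]
            current = reward(saved, chosen)
            rnd += 1
        else:
            gain = reward(saved, chosen + [u]) - current
            prio = gain / cost[u] if use_ratio else gain
            heapq.heappush(heap, (-prio, u, rnd))
    return chosen

def celf(saved, cost, budget):
    """Run both greedy variants and keep the solution with the higher reward."""
    a_unit = lazy_greedy(saved, cost, budget, use_ratio=False)
    a_ratio = lazy_greedy(saved, cost, budget, use_ratio=True)
    return max(a_unit, a_ratio, key=lambda s: reward(saved, s))

# Hypothetical toy data: nodes saved by each candidate sensor, and sensor costs.
saved = {"a": {1, 2, 3, 4}, "b": {3, 4, 5}, "c": {6}, "d": {5, 6, 7}}
cost = {"a": 3, "b": 1, "c": 1, "d": 2}
best = celf(saved, cost, budget=3)
print(best, reward(saved, best))
```

Running both variants matters because either one alone can be arbitrarily bad: the cost-blind run may spend the budget on one expensive node, while the ratio run may prefer cheap nodes with tiny rewards.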

Blogs: Information epidemics
Question: Which blogs should one read to be most up to date?
Idea: Select blogs to cover the blogosphere.
[Figure: each dot is a blog; proximity is based on the number of common cascades]

Blogs: Information epidemics [w/ Krause-Guestrin et al., KDD ’07]
Which blogs should one read to catch big stories?
[Figure: reward (higher is better) vs. number of selected blogs (sensors); curves: CELF, In-links (used by Technorati), Out-links, # posts, Random]
For more info see our website: www.blogcascade.org

Same problem: Water Network [w/ Krause et al., J. of Water Resource Planning]
Given: a real city water distribution network and data on how contaminants spread over time
Place sensors (to save lives)
Problem posed by the US Environmental Protection Agency
[Figure: water network with contamination events c1, c2 and sensors S]

Water network: Results [w/ Ostfeld et al., J. of Water Resource Planning]
[Figure: population saved (higher is better) vs. number of placed sensors; curves: CELF, Degree, Random, Population, Flow]
Our approach performed best at the Battle of Water Sensor Networks competition

Author            Score
CMU (CELF)        26
Sandia            21
U Exeter          20
Bentley systems   19
Technion (1)      14
Bordeaux          12
U Cyprus          11
U Guelph          7
U Michigan        4
Michigan Tech U   3
Malcolm           2
Proteo
Technion (2)      1

This talk: The structure
Patterns & Observations
  Diffusion & Cascades: A1: Cascade shapes. A2: Human adoption follows diminishing returns
  Network Evolution: Q5: How does network structure change as the network grows/evolves?
Models & Algorithms
  Diffusion & Cascades: A3, A4: CELF algorithm for detecting cascades and outbreaks
  Network Evolution: Q6: How do we generate realistic looking and evolving networks?

Background: Network models
Empirical findings on real graphs led to new network models
[Figure: log prob. vs. log degree; the preferential attachment model explains the power-law degree distribution]
Such models make assumptions/predictions about other network properties
What about network evolution?

Q5) Network evolution [w/ Kleinberg-Faloutsos, KDD ’05]
What is the relation between the number of nodes and the edges over time?
Prior work assumes: constant average degree over time
Networks become denser over time
Densification Power Law: $E(t) \propto N(t)^a$, where a is the densification exponent (1 ≤ a ≤ 2)
[Figure: E(t) vs. N(t) on log-log axes; Internet: a = 1.2; Citations: a = 1.6]
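
A densification exponent of this kind is usually read off as the slope of a straight-line fit of log E(t) against log N(t). The sketch below shows that fit with ordinary least squares; the function name and the snapshot numbers are invented for illustration and are not data from the talk.

```python
import math

def fit_densification_exponent(node_counts, edge_counts):
    """Least-squares fit of log E = a * log N + log c; returns (a, c)."""
    xs = [math.log(n) for n in node_counts]
    ys = [math.log(e) for e in edge_counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    c = math.exp(mean_y - a * mean_x)
    return a, c

# Hypothetical snapshots of a growing graph, generated with exponent 1.5.
nodes = [1000, 2000, 4000, 8000]
edges = [round(2 * n ** 1.5) for n in nodes]
a, c = fit_densification_exponent(nodes, edges)
print(f"densification exponent a = {a:.3f}")
```

An exponent of 1 corresponds to constant average degree, and an exponent of 2 to a graph whose edge count grows like that of a dense graph, which is why the slide bounds a between 1 and 2.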

Q5) Network evolution [w/ Kleinberg-Faloutsos, KDD ’05]
Prior models and intuition say that the network diameter slowly grows (like log N, log log N)
Diameter shrinks over time: as the network grows, the distances between the nodes slowly decrease
[Figure: diameter vs. size of the graph (Internet); diameter vs. time (Citations)]
What individual node behaviors are causing such patterns?

Q5) Edge attachment [w/ Backstrom-Kumar-Tomkins, KDD ’08]
We directly observe atomic events of network evolution (and not only network snapshots), and so on for millions…
We observe evolution at the finest scale
Test individual edge attachment: directly observe events leading to network properties
Compare network models by likelihood (and not just by summary network statistics)

Setting: Edge-by-edge evolution [w/ Backstrom-Kumar-Tomkins, KDD ’08]
Network datasets with full temporal information from the first edge onwards: LinkedIn (N=7m, E=30m), Flickr (N=600k, E=3m), Delicious (N=200k, E=430k), Answers (N=600k, E=2m)
We study 3 processes that control the evolution:
P1) Node arrival: node enters the network
P2) Edge initiation: node wakes up, initiates an edge, goes to sleep
P3) Edge destination: where to attach a new edge
Are edges more likely to attach to high degree nodes?
Are edges more likely to attach to nodes that are close?

Edge attachment degree bias [w/ Backstrom-Kumar-Tomkins, KDD ’08]
Are edges more likely to connect to higher degree nodes?
[Figure panels: PA, Gnp, Flickr]
[Table: Network vs. τ. Gnp; PA 1; Flickr; Delicious; Answers 0.9; LinkedIn 0.6]
First direct proof of preferential attachment!

But, edges also attach locally [w/ Backstrom-Kumar-Tomkins, KDD ’08]
Just before the edge (u,w) is placed, how many hops are there between u and w?
[Figure panels: PA, Gnp, Flickr; diagram of a triad u, v, w]

Fraction of triad closing edges
Network      % Δ
Flickr       66%
Delicious    28%
Answers      23%
LinkedIn     50%

Real edges are local. Most of them close triangles!
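
Given a time-ordered stream of edges, the fraction of triad closing edges can be measured by checking, just before each edge is placed, whether its endpoints already share a neighbor (that is, whether they are two hops apart). This is a minimal sketch under assumptions of my own: the graph is undirected, repeated edges and self-loops are ignored, the fraction is taken over all new edges, and the edge stream is a made-up example.

```python
from collections import defaultdict

def triad_closing_fraction(edge_stream):
    """Fraction of new edges (u, w) whose endpoints already share a neighbor."""
    neighbors = defaultdict(set)
    closing = 0
    total = 0
    for u, w in edge_stream:
        if u == w or w in neighbors[u]:
            continue  # skip self-loops and repeated edges
        total += 1
        if neighbors[u] & neighbors[w]:
            closing += 1  # u and w were exactly two hops apart
        neighbors[u].add(w)
        neighbors[w].add(u)
    return closing / total if total else 0.0

# Hypothetical time-ordered edge stream.
stream = [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4), (5, 6)]
fraction = triad_closing_fraction(stream)
print(f"{fraction:.2f} of edges close a triangle")
```

In this toy stream the edges (1, 3) and (2, 4) each connect nodes that already have a common neighbor, so two of the six edges are triad closing.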

Q6) Generating realistic graphs
Want to generate realistic networks: given a real network, generate a synthetic network and compare graph properties, e.g., degree distribution
Why synthetic graphs? Anomaly detection, simulations, predictions, null-model, sharing privacy sensitive graphs, …
Q: Which network properties do we care about?
A: Don’t commit, let’s match adjacency matrices

The model: Kronecker graphs [w/ Chakrabarti-Kleinberg-Faloutsos, PKDD ’05]
Kronecker product of graph adjacency matrices (actually, there is also a nice social interpretation of the model)
[Figure: initiator (3x3) and its Kronecker powers (9x9), (27x27); entries are edge probabilities p_ij]
We prove Kronecker graphs mimic real graphs: power-law degree distribution, densification, shrinking/stabilizing diameter, spectral properties
Given a real graph, how do we estimate the initiator G1?
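
The construction can be sketched in a few lines: take repeated Kronecker products of a small initiator matrix of probabilities, then include each edge independently with the resulting probability p_ij. This is an illustrative sketch only; the 2x2 initiator values, the function names, and the seed are assumptions, and the dense matrix is practical only for tiny graphs.

```python
import random

def kronecker(a, b):
    """Kronecker product of two square matrices given as lists of lists."""
    m = len(b)
    size = len(a) * m
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(size)]
            for i in range(size)]

def kronecker_power(initiator, k):
    """k-th Kronecker power of the initiator (k >= 1)."""
    result = initiator
    for _ in range(k - 1):
        result = kronecker(result, initiator)
    return result

def sample_graph(prob, seed=0):
    """Include directed edge (i, j) independently with probability prob[i][j]."""
    rng = random.Random(seed)
    n = len(prob)
    return [(i, j) for i in range(n) for j in range(n) if rng.random() < prob[i][j]]

# Hypothetical 2x2 initiator of edge probabilities.
initiator = [[0.9, 0.5], [0.5, 0.1]]
prob = kronecker_power(initiator, 3)   # 8 x 8 matrix of edge probabilities
edges = sample_graph(prob)
print(len(prob), "nodes,", len(edges), "sampled edges")
```

The expected number of edges is the sum of all entries of the power, which equals the sum of the initiator entries raised to the k-th power; that is one way to see how the initiator controls densification.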

Kronecker graphs: Estimation [w/ Faloutsos, ICML ’07]
Maximum likelihood estimation: $\arg\max_{G_1} P(G \mid G_1)$, where G is the real graph and $G_1$ is the initiator with entries a, b, c, d
Naïve estimation takes $O(N!\,N^2)$:
N! for different node labelings. Our solution: Metropolis sampling, N! → (big) const
$N^2$ for traversing the graph adjacency matrix. Our solution: exploit the Kronecker product structure ($E \ll N^2$), $N^2$ → E
Do stochastic gradient descent
We estimate the model in O(E)
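
For one fixed node labeling, the likelihood being maximized is a product over all node pairs: p_ij for each edge present and 1 - p_ij for each edge absent. The sketch below computes the naive log-likelihood that traverses all N^2 pairs; it is not the O(E) approximation or the permutation sampling from the slide, and the initiator values and edge set are hypothetical.

```python
import math

def edge_prob(initiator, k, i, j):
    """Entry (i, j) of the k-th Kronecker power, without building the matrix."""
    n0 = len(initiator)
    p = 1.0
    for _ in range(k):
        p *= initiator[i % n0][j % n0]  # one base-n0 digit of i and j per level
        i //= n0
        j //= n0
    return p

def log_likelihood(initiator, k, edges):
    """Naive O(N^2) log-likelihood of a directed graph for a fixed node labeling."""
    n = len(initiator) ** k
    edge_set = set(edges)
    total = 0.0
    for i in range(n):
        for j in range(n):
            p = edge_prob(initiator, k, i, j)
            total += math.log(p) if (i, j) in edge_set else math.log(1.0 - p)
    return total

# Hypothetical initiator and a tiny observed graph on 4 nodes (k = 2).
initiator = [[0.9, 0.5], [0.5, 0.1]]
observed = [(0, 0), (0, 1), (1, 0), (2, 0)]
print(log_likelihood(initiator, 2, observed))
```

The full estimation problem also has to account for the unknown correspondence between graph nodes and rows of the Kronecker matrix, which is where the N! factor on the slide comes from.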

Estimation: Epinions (N=76k, E=510k) [w/ Faloutsos, ICML ’07]
We search the space of ~$10^{1,000,000}$ permutations
Fitting takes 2 hours
Estimated initiator entries: 0.99, 0.54, 0.49, 0.13
Real and Kronecker are very close
[Figure panels: degree distribution (probability vs. node degree); path lengths (# reachable pairs vs. number of hops); “network” values (network value vs. rank)]

Kronecker & Network structure [w/ Dasgupta-Lang-Mahoney, WWW ’08]
Fitting Epinions we obtained G1 = [0.91 0.54; 0.49 0.13]
What does this tell about the network structure?
Core-periphery: core 0.9 edges, periphery 0.1 edges, 0.5 edges
No communities, no good cuts
As opposed to an initiator with entries 0.9 and 0.1, which gives a hierarchy

Small vs. Large networks Small and large networks are fundamentally different Scientific collaborations (N=397, E=914) Collaboration network (N=4,158, E=13,422)

Conclusions and reflections
Why are networks the way they are?
Only recently have basic properties been observed on a large scale
Confirms social science intuitions; calls others into question.
Benefits of working with large data:
What patterns do we observe in massive networks?
What microscopic mechanisms cause them?
Social network of the whole world?
Propagation of local interaction to global scale
Web projections, surprise prediction

Planetary look on a small-world [w/ Horvitz, WWW ’08, Nature ’08]
Small-world experiment [Milgram ’67]: people send letters from Nebraska to Boston. How many steps does it take?
Messenger social network, the largest network analyzed: 240M people, 30B conversations, 4.5TB data
[Figure panels: Milgram’s small world experiment; MSN Messenger network; x-axis: number of steps between pairs of people (i.e., hops + 1)]

Reflections on Diffusion
Predictive modeling of information diffusion:
When, where and what information will create a cascade?
Where should one tap the network to get the effect they want?
Social Media Marketing:
How do news and information spread?
New ranking and influence measures
Sentiment analysis from cascade structure
How to introduce incentives?

What’s next?
Observations: Data analysis
Models: Predictions
Algorithms: Applications
Actively influencing the network