Understanding and Managing Cascades on Large Graphs B. Aditya Prakash Virginia Tech. Christos Faloutsos Carnegie Mellon University Sept 28, Tutorial, ECML-PKDD.

Slides:



Advertisements
Similar presentations
Understanding and Managing Cascades on Large Graphs
Advertisements

Yasuko Matsubara (Kyoto University),
1 Dynamics of Real-world Networks Jure Leskovec Machine Learning Department Carnegie Mellon University
On the Vulnerability of Large Graphs
CMU SCS I2.2 Large Scale Information Network Processing INARC 1 Overview Goal: scalable algorithms to find patterns and anomalies on graphs 1. Mining Large.
FUNNEL: Automatic Mining of Spatially Coevolving Epidemics Yasuko Matsubara, Yasushi Sakurai (Kumamoto University) Willem G. van Panhuis (University of.
Influence propagation in large graphs - theorems and algorithms B. Aditya Prakash Christos Faloutsos
Modeling Blog Dynamics Speaker: Michaela Götz Joint work with: Jure Leskovec, Mary McGlohon, Christos Faloutsos Cornell University Carnegie Mellon University.
Dynamical Processes on Large Networks B. Aditya Prakash Carnegie Mellon University MMS, SIAM AN, Minneapolis, July 10,
Interacting Viruses: Can Both Survive? Alex Beutel, B. Aditya Prakash, Roni Rosenfeld, Christos Faloutsos Carnegie Mellon University, USA KDD 2012, Beijing.
Cost-effective Outbreak Detection in Networks Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, Natalie Glance.
DAVA: Distributing Vaccines over Networks under Prior Information
School of Information University of Michigan Network resilience Lecture 20.
© 2012 IBM Corporation IBM Research Gelling, and Melting, Large Graphs by Edge Manipulation Joint Work by Hanghang Tong (IBM) B. Aditya Prakash (Virginia.
Advanced Topics in Data Mining Special focus: Social Networks.
CMU SCS Large Graph Mining - Patterns, Tools and Cascade Analysis Christos Faloutsos CMU.
Making Diffusion Work for You B. Aditya Prakash Computer Science Virginia Tech. GraphEx Symposium, MIT Endicott House, Aug 21, 2014.
Dynamical Processes on Large Networks B. Aditya Prakash Carnegie Mellon University UMD-College Park, April 26, 2012.
CS 599: Social Media Analysis University of Southern California1 The Basics of Network Analysis Kristina Lerman University of Southern California.
CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU.
Social Networks and Graph Mining Christos Faloutsos CMU - MLD.
1 Epidemic Spreading in Real Networks: an Eigenvalue Viewpoint Yang Wang Deepayan Chakrabarti Chenxi Wang Christos Faloutsos.
Cascading Behavior in Large Blog Graphs Patterns and a Model Leskovec et al. (SDM 2007)
INFERRING NETWORKS OF DIFFUSION AND INFLUENCE Presented by Alicia Frame Paper by Manuel Gomez-Rodriguez, Jure Leskovec, and Andreas Kraus.
The community-search problem and how to plan a successful cocktail party Mauro SozioAris Gionis Max Planck Institute, Germany Yahoo! Research, Barcelona.
U. Michigan participation in EDIN Lada Adamic, PI E 2.1 fractional immunization of networks E 2.1 time series analysis approach to correlating structure.
Models of Influence in Online Social Networks
Online Social Networks and Media Epidemics and Influence.
Influence propagation in large graphs - theorems and algorithms B. Aditya Prakash Christos Faloutsos
CMU SCS Large Graph Mining Christos Faloutsos CMU.
Understanding and Managing Cascades on Large Graphs B. Aditya Prakash Computer Science Virginia Tech. CS Seminar 11/30/2012.
Understanding and Managing Cascades on Large Graphs B. Aditya Prakash Carnegie Mellon University  Virginia Tech. Christos Faloutsos Carnegie Mellon University.
Jure Leskovec PhD: Machine Learning Department, CMU Now: Computer Science Department, Stanford University.
V5 Epidemics on networks
1 1 Stanford University 2 MPI for Biological Cybernetics 3 California Institute of Technology Inferring Networks of Diffusion and Influence Manuel Gomez.
1 1 Stanford University 2 MPI for Biological Cybernetics 3 California Institute of Technology Inferring Networks of Diffusion and Influence Manuel Gomez.
Making Diffusion Work for You: From Social Media to Epidemiology B. Aditya Prakash Computer Science Virginia Tech. BSEC Conference, ORNL, Aug 26, 2015.
Influence Maximization in Dynamic Social Networks Honglei Zhuang, Yihan Sun, Jie Tang, Jialin Zhang, Xiaoming Sun.
Directed-Graph Epidemiological Models of Computer Viruses Presented by: (Kelvin) Weiguo Jin “… (we) adapt the techniques of mathematical epidemiology to.
Interacting Viruses in Networks: Can Both Survive? Authors: Alex Beutel, B. Aditya Prakash, Roni Rosenfeld, and Christos Faloutsos Presented by: Zachary.
December 7-10, 2013, Dallas, Texas
Winner-takes-all: Competing Viruses or Ideas on fair-play Networks B. Aditya Prakash, Alex Beutel, Roni Rosenfeld, Christos Faloutsos Carnegie Mellon University,
Spotting Culprits in Epidemics: How many and Which ones? B. Aditya Prakash Virginia Tech Jilles Vreeken University of Antwerp Christos Faloutsos Carnegie.
Maximizing the Spread of Influence through a Social Network David Kempe, Jon Kleinberg, Eva Tardos Cornell University KDD 2003.
Hidden Hazards: Finding Missing Nodes in Large Graph Epidemics Shashidhar Sundareisan Virginia Tech Jilles Vreeken Max Planck Institute B. Aditya Prakash.
Influence propagation in large graphs - theorems and algorithms B. Aditya Prakash Christos Faloutsos
Lecture 10: Network models CS 765: Complex Networks Slides are modified from Networks: Theory and Application by Lada Adamic.
ECML-PKDD 2010, Barcelona, Spain B. Aditya Prakash*, Hanghang Tong* ^, Nicholas Valler+, Michalis Faloutsos+, Christos Faloutsos* * Carnegie Mellon University,
Propagation on Large Networks B. Aditya Prakash Christos Faloutsos Carnegie Mellon University.
Understanding and Predicting Human Behavior using Propagation: From Flu-trends to Cyber-Security B. Aditya Prakash Computer Science Virginia Tech. Keynote.
RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs Leman Akoglu, Mary McGlohon, Christos Faloutsos Carnegie Mellon University School.
CS 1944: Sophomore Seminar Big Data and Machine Learning B. Aditya Prakash Assistant Professor Nov 3, 2015.
Propagation on Large Networks B. Aditya Prakash Christos Faloutsos Carnegie Mellon University.
CS 590 Term Project Epidemic model on Facebook
Propagation on Large Networks
Speaker : Yu-Hui Chen Authors : Dinuka A. Soysa, Denis Guangyin Chen, Oscar C. Au, and Amine Bermak From : 2013 IEEE Symposium on Computational Intelligence.
1 Patterns of Cascading Behavior in Large Blog Graphs Jure Leskoves, Mary McGlohon, Christos Faloutsos, Natalie Glance, Matthew Hurst SDM 2007 Date:2008/8/21.
Controlling Propagation at Group Scale on Networks Yao Zhang*, Abhijin Adiga +, Anil Vullikanti + *, and B. Aditya Prakash* *Department of Computer Science.
1 1 Stanford University 2 MPI for Biological Cybernetics 3 California Institute of Technology Inferring Networks of Diffusion and Influence Manuel Gomez.
Leveraging Information Theory for Mining Graphs and Sequences: From Propagation to Segmentation B. Aditya Prakash Computer Science Virginia Tech. ITA Workshop,
Arizona State University Fast Eigen-Functions Tracking on Dynamic Graphs Chen Chen and Hanghang Tong - 1 -
B. Aditya Prakash Computer Science Virginia Tech.
Inferring Networks of Diffusion and Influence
B. Aditya Prakash Naren Ramakrishnan
B. Aditya Prakash Department of Computer Science
B. Aditya Prakash Computer Science Virginia Tech.
MEIKE: Influence-based Communities in Networks
Cost-effective Outbreak Detection in Networks
Large Graph Mining: Power Tools and a Practitioner’s guide
Navigation and Propagation in Networks
Presentation transcript:

Understanding and Managing Cascades on Large Graphs B. Aditya Prakash Virginia Tech. Christos Faloutsos Carnegie Mellon University Sept 28, Tutorial, ECML-PKDD 2012, Bristol

Networks are everywhere! Human Disease Network [Barabasi 2007] Gene Regulatory Network [Decourty 2008] Facebook Network [2010] The Internet [2005] 2Prakash and Faloutsos 2012

3 Dynamical Processes over networks are also everywhere! Prakash and Faloutsos 2012

Why do we care? Social collaboration Information Diffusion Viral Marketing Epidemiology and Public Health Cyber Security Human mobility Games and Virtual Worlds Ecology Prakash and Faloutsos 20124

Why do we care? (1: Epidemiology) Dynamical Processes over networks [AJPH 2007] CDC data: Visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts Diseases over contact networks 5Prakash and Faloutsos 2012

Why do we care? (1: Epidemiology) Dynamical Processes over networks Each circle is a hospital ~3000 hospitals More than 30,000 patients transferred [US-MEDICARE NETWORK 2005] 6 Problem: Given k units of disinfectant, whom to immunize? Prakash and Faloutsos 2012

Why do we care? (1: Epidemiology) CURRENT PRACTICEOUR METHOD ~6x fewer! [US-MEDICARE NETWORK 2005] 7 Hospital-acquired inf. took 99K+ lives, cost $5B+ (all per year) Prakash and Faloutsos 2012

Why do we care? (2: Online Diffusion) 8 > 800m users, ~$1B revenue [WSJ 2010] ~100m active users > 50m users Prakash and Faloutsos 2012

Why do we care? (2: Online Diffusion) Dynamical Processes over networks Celebrity Buy Versace™! Followers 9 Social Media Marketing Prakash and Faloutsos 2012

Why do we care? (3: To change the world?) Dynamical Processes over networks Social networks and Collaborative Action 10Prakash and Faloutsos 2012

High Impact – Multiple Settings Q. How to squash rumors faster? Q. How do opinions spread? Q. How to market better? 11 epidemic out-breaks products/viruses transmit s/w patches Prakash and Faloutsos 2012

Research Theme DATA Large real-world networks & processes 12 ANALYSIS Understanding POLICY/ ACTION Managing Prakash and Faloutsos 2012

Research Theme – Public Health DATA Modeling # patient transfers ANALYSIS Will an epidemic happen? POLICY/ ACTION How to control out-breaks? 13Prakash and Faloutsos 2012

Research Theme – Social Media DATA Modeling Tweets spreading POLICY/ ACTION How to market better? ANALYSIS # cascades in future? 14Prakash and Faloutsos 2012

In this tutorial 15 ANALYSIS Understanding Given propagation models: Q1: What is the epidemic threshold? Q2: How do viruses compete? Prakash and Faloutsos 2012

In this tutorial 16 Q3: How to immunize and control out-breaks better? Q4: How to detect outbreaks? Q5: Who are the culprits? POLICY/ ACTION Managing Prakash and Faloutsos 2012

In this tutorial 17 DATA Large real-world networks & processes Q6: How do cascades look like? Q7: How does activity evolve over time? Q8: How does external influence act? Prakash and Faloutsos 2012

Outline Motivation Part 1: Understanding Epidemics (Theory) Part 2: Policy and Action (Algorithms) Part 3: Learning Models (Empirical Studies) Conclusion 18Prakash and Faloutsos 2012

Part 1: Theory Q1: What is the epidemic threshold? Q2: How do viruses compete? 19Prakash and Faloutsos 2012

A fundamental question Strong Virus Epidemic? 20Prakash and Faloutsos 2012

example (static graph) Weak Virus Epidemic? 21Prakash and Faloutsos 2012

Problem Statement Find, a condition under which – virus will die out exponentially quickly – regardless of initial infection condition above (epidemic) below (extinction) # Infected time 22 Separate the regimes? Prakash and Faloutsos 2012

Threshold (static version) Problem Statement Given: – Graph G, and – Virus specs (attack prob. etc.) Find: – A condition for virus extinction/invasion 23Prakash and Faloutsos 2012

Threshold: Why important? Accelerating simulations Forecasting (‘What-if’ scenarios) Design of contagion and/or topology A great handle to manipulate the spreading – Immunization – Maximize collaboration ….. 24Prakash and Faloutsos 2012

Part 1: Theory Q1: What is the epidemic threshold? – Background – Result and Intuition (Static Graphs) – Proof Ideas (Static Graphs) – Bonus: Dynamic Graphs Q2: How do viruses compete? 25Prakash and Faloutsos 2012

“SIR” model: life immunity (mumps) Each node in the graph is in one of three states – Susceptible (i.e. healthy) – Infected – Removed (i.e. can’t get infected again) 26 Prob. β Prob. δ t = 1t = 2t = 3 Prakash and Faloutsos 2012

Terminology: continued Other virus propagation models (“VPM”) – SIS : susceptible-infected-susceptible, flu-like – SIRS : temporary immunity, like pertussis – SEIR : mumps-like, with virus incubation (E = Exposed) ….…………. Underlying contact-network – ‘who-can-infect- whom’ 27Prakash and Faloutsos 2012

Related Work  R. M. Anderson and R. M. May. Infectious Diseases of Humans. Oxford University Press,  A. Barrat, M. Barthélemy, and A. Vespignani. Dynamical Processes on Complex Networks. Cambridge University Press,  F. M. Bass. A new product growth for model consumer durables. Management Science, 15(5):215–227,  D. Chakrabarti, Y. Wang, C. Wang, J. Leskovec, and C. Faloutsos. Epidemic thresholds in real networks. ACM TISSEC, 10(4),  D. Easley and J. Kleinberg. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press,  A. Ganesh, L. Massoulie, and D. Towsley. The effect of network topology in spread of epidemics. IEEE INFOCOM,  Y. Hayashi, M. Minoura, and J. Matsukubo. Recoverable prevalence in growing scale-free networks and the effective immunization. arXiv:cond-at/ v2, Aug  H. W. Hethcote. The mathematics of infectious diseases. SIAM Review, 42,  H. W. Hethcote and J. A. Yorke. Gonorrhea transmission dynamics and control. Springer Lecture Notes in Biomathematics, 46,  J. O. Kephart and S. R. White. Directed-graph epidemiological models of computer viruses. IEEE Computer Society Symposium on Research in Security and Privacy,  J. O. Kephart and S. R. White. Measuring and modeling computer virus prevalence. IEEE Computer Society Symposium on Research in Security and Privacy,  R. Pastor-Santorras and A. Vespignani. Epidemic spreading in scale-free networks. Physical Review Letters 86, 14,  ……… All are about either: Structured topologies (cliques, block-diagonals, hierarchies, random) Specific virus propagation models Static graphs All are about either: Structured topologies (cliques, block-diagonals, hierarchies, random) Specific virus propagation models Static graphs 28Prakash and Faloutsos 2012

Part 1: Theory Q1: What is the epidemic threshold? – Background – Result and Intuition (Static Graphs) – Proof Ideas (Static Graphs) – Bonus: Dynamic Graphs Q2: How do viruses compete? 29Prakash and Faloutsos 2012

How should the answer look like? Answer should depend on: – Graph – Virus Propagation Model (VPM) But how?? – Graph – average degree? max. degree? diameter? – VPM – which parameters? – How to combine – linear? quadratic? exponential? 30 ….. Prakash and Faloutsos 2012

Static Graphs: Our Main Result Informally, 31 For,  any arbitrary topology (adjacency matrix A)  any virus propagation model (VPM) in standard literature the epidemic threshold depends only 1.on the λ, first eigenvalue of A, and 2.some constant, determined by the virus propagation model λ λ No epidemic if λ * < 1 In Prakash+ ICDM 2011 Prakash and Faloutsos 2012

Our thresholds for some models s = effective strength s < 1 : below threshold Models Effective Strength (s) Threshold (tipping point) SIS, SIR, SIRS, SEIR s = λ. s = 1 SIV, SEIV s = λ. ( H.I.V. ) s = λ. 32Prakash and Faloutsos 2012

Our result: Intuition for λ “Official” definition: Let A be the adjacency matrix. Then λ is the root with the largest magnitude of the characteristic polynomial of A [det(A – xI)]. Doesn’t give much intuition! “Un-official” Intuition λ ~ # paths in the graph 33 u u ≈. (i, j) = # of paths i  j of length k Prakash and Faloutsos 2012

N nodes Largest Eigenvalue (λ) λ ≈ 2λ = Nλ = N-1 34 N = 1000 λ ≈ 2λ= 31.67λ= 999 better connectivity higher λ Prakash and Faloutsos 2012

Examples: Simulations – SIR (mumps) (a) Infection profile (b) “Take-off” plot PORTLAND graph 31 million links, 6 million nodes Fraction of Infections Footprint Effective Strength Time ticks 35Prakash and Faloutsos 2012

Examples: Simulations – SIRS (pertusis) Fraction of Infections Footprint Effective StrengthTime ticks (a) Infection profile (b) “Take-off” plot PORTLAND graph 31 million links, 6 million nodes 36Prakash and Faloutsos 2012

Part 1: Theory Q1: What is the epidemic threshold? – Background – Result and Intuition (Static Graphs) – Proof Ideas (Static Graphs) – Bonus: Dynamic Graphs Q2: How do viruses compete? 37Prakash and Faloutsos 2012

λ * < 1 Graph-based Model-based 38 Proof Sketch General VPM structure Topology and stability Prakash and Faloutsos 2012

Models and more models ModelUsed for SIRMumps SISFlu SIRSPertussis SEIRChicken-pox …….. SICRTuberculosis MSIRMeasles SIVSensor Stability H.I.V. ………. 39Prakash and Faloutsos 2012

Ingredient 1: Our generalized model 40 Endogenous Transitions SusceptibleInfected Vigilant Exogenous Transitions Endogenous Transitions SusceptibleInfected Vigilant Prakash and Faloutsos 2012

Special case: SIR SusceptibleInfected Vigilant 41Prakash and Faloutsos 2012

Special case: H.I.V. Multiple Infectious, Vigilant states 42 “Terminal” “Non-terminal” Prakash and Faloutsos 2012

Ingredient 2: NLDS+Stability View as a NLDS – discrete time – non-linear dynamical system (NLDS) Probability vector Specifies the state of the system at time t 43 size mN x size N (number of nodes in the graph) S S I I V V Prakash and Faloutsos 2012

Ingredient 2: NLDS + Stability View as a NLDS – discrete time – non-linear dynamical system (NLDS) Non-linear function Explicitly gives the evolution of system 44 size mN x Prakash and Faloutsos 2012

Ingredient 2: NLDS + Stability View as a NLDS – discrete time – non-linear dynamical system (NLDS) Threshold  Stability of NLDS 45Prakash and Faloutsos 2012

= probability that node i is not attacked by any of its infectious neighbors Special case: SIR size 3N x 1 I I R R S S NLDS I I R R S S 46Prakash and Faloutsos 2012

Fixed Point State when no node is infected Q: Is it stable? 47Prakash and Faloutsos 2012

Stability for SIR Stable under threshold Unstable above threshold 48Prakash and Faloutsos 2012

λ * < 1 Graph-based Model-based 49 General VPM structure Topology and stability See paper for full proof Prakash and Faloutsos 2012

Part 1: Theory Q1: What is the epidemic threshold? – Background – Result and Intuition (Static Graphs) – Proof Ideas (Static Graphs) – Bonus: Dynamic Graphs Q2: How do viruses compete? 50Prakash and Faloutsos 2012

Dynamic Graphs: Epidemic? adjacency matrix 8 8 Alternating behaviors DAY (e.g., work) 51Prakash and Faloutsos 2012

adjacency matrix 8 8 Dynamic Graphs: Epidemic? Alternating behaviors NIGHT (e.g., home) 52Prakash and Faloutsos 2012

SIS model – recovery rate δ – infection rate β Set of T arbitrary graphs Model Description day N N night N N, weekend….. Infected Healthy XN1 N3 N2 Prob. β Prob. δ 53Prakash and Faloutsos 2012

Informally, NO epidemic if eig (S) = < 1 Our result: Dynamic Graphs Threshold Single number! Largest eigenvalue of The system matrix S In Prakash+, ECML-PKDD S = Prakash and Faloutsos 2012

Synthetic MIT Reality Mining log(fraction infected) Time BELOW AT ABOVE AT BELOW Infection-profile 55Prakash and Faloutsos 2012

“Take-off” plots Footprint (# “steady state”) Our threshold (log scale) NO EPIDEMIC EPIDEMIC NO EPIDEMIC SyntheticMIT Reality 56Prakash and Faloutsos 2012

Part 1: Theory Q1: What is the epidemic threshold? Q2: What happens when viruses compete? – Mutually-exclusive viruses – Interacting viruses 57Prakash and Faloutsos 2012

Competing Contagions iPhone v Android Blu-ray v HD-DVD Biological common flu/avian flu, pneumococcal inf etc 58 AttackRetreat v Prakash and Faloutsos 2012

A simple model Modified flu-like Mutual Immunity (“pick one of the two”) Susceptible-Infected1-Infected2-Susceptible 59 Virus 1 Virus 2 Prakash and Faloutsos 2012

Question: What happens in the end? 60 green: virus 1 red: virus 2 Steady State = ? Number of Infections ASSUME: Virus 1 is stronger than Virus 2 Prakash and Faloutsos 2012

Question: What happens in the end? 61 green: virus 1 red: virus 2 Number of Infections ASSUME: Virus 1 is stronger than Virus 2 Strength ?? = Strength 2 Steady State Prakash and Faloutsos 2012

Answer: Winner-Takes-All 62 green: virus 1 red: virus 2 ASSUME: Virus 1 is stronger than Virus 2 Number of Infections Prakash and Faloutsos 2012

Our Result: Winner-Takes-All 63 In Prakash+ WWW 2012 Given our model, and any graph, the weaker virus always dies-out completely 1.The stronger survives only if it is above threshold 2.Virus 1 is stronger than Virus 2, if: strength(Virus 1) > strength(Virus 2) 3.Strength(Virus) = λ β / δ  same as before! Prakash and Faloutsos 2012

Real Examples 64 Reddit v DiggBlu-Ray v HD-DVD [Google Search Trends data] Prakash and Faloutsos 2012

Part 1: Theory Q1: What is the epidemic threshold? Q2: What happens when viruses compete? – Mutually-exclusive viruses – Interacting viruses 65Prakash and Faloutsos 2012

A simple model: SI 1|2 S Modified flu-like (SIS) Susceptible-Infected 1 or 2 -Susceptible Interaction Factor ε – Full Mutual Immunity: ε = 0 – Partial Mutual Immunity (competition): ε < 0 – Cooperation: ε > 0 66 Virus 1 Virus 2 & Prakash and Faloutsos 2012

Question: What happens in the end? 67 ASSUME: Virus 1 is stronger than Virus 2 ε = 0 Winner takes all ε = 1 Co-exist independently ε = 2 Viruses cooperate What about for 0 < ε <1 ? Is there a point at which both viruses can co-exist? What about for 0 < ε <1 ? Is there a point at which both viruses can co-exist? Prakash and Faloutsos 2012

Answer: Yes! There is a phase transition 68 ASSUME: Virus 1 is stronger than Virus 2 Prakash and Faloutsos 2012

Answer: Yes! There is a phase transition 69 ASSUME: Virus 1 is stronger than Virus 2 Prakash and Faloutsos 2012

Answer: Yes! There is a phase transition 70 ASSUME: Virus 1 is stronger than Virus 2 Prakash and Faloutsos 2012

1.The stronger survives only if it is above threshold 2.Virus 1 is stronger than Virus 2, if: strength(Virus 1) > strength(Virus 2) 3.Strength(Virus) σ = N β / δ Our Result: Viruses can Co-exist 71 Given our model and a fully connected graph, there exists an ε critical such that for ε ≥ ε critical, there is a fixed point where both viruses survive. In Beutel+ KDD 2012 Prakash and Faloutsos 2012

Real Examples 72 Hulu v Blockbuster [Google Search Trends data] Prakash and Faloutsos 2012

Real Examples 73 Chrome v Firefox [Google Search Trends data] Prakash and Faloutsos 2012

Outline Motivation Part 1: Understanding Epidemics (Theory) Part 2: Policy and Action (Algorithms) Part 3: Learning Models (Empirical Studies) Conclusion 74Prakash and Faloutsos 2012

Part 2: Algorithms Q3: Whom to immunize? Q4: How to detect outbreaks? Q5: Who are the culprits? 75Prakash and Faloutsos 2012

? ? Given: a graph A, virus prop. model and budget k; Find: k ‘best’ nodes for immunization (removal). k = 2 ? ? Full Static Immunization 76Prakash and Faloutsos 2012

Part 2: Algorithms Q3: Whom to immunize? – Full Immunization (Static Graphs) – Full Immunization (Dynamic Graphs) – Fractional Immunization Q4: How to detect outbreaks? Q5: Who are the culprits? 77Prakash and Faloutsos 2012

Challenges Given a graph A, budget k, Q1 (Metric) How to measure the ‘shield- value’ for a set of nodes (S)? Q2 (Algorithm) How to find a set of k nodes with highest ‘shield-value’? 78Prakash and Faloutsos 2012

Proposed vulnerability measure λ Increasing λ Increasing vulnerability λ is the epidemic threshold “Safe”“Vulnerable”“Deadly” 79Prakash and Faloutsos 2012

Original Graph Without {2, 6} Eigen-Drop(S) Δ λ = λ - λ s Eigen-Drop(S) Δ λ = λ - λ s Δ A1: “Eigen-Drop”: an ideal shield value 80Prakash and Faloutsos 2012

(Q2) - Direct Algorithm too expensive! Immunize k nodes which maximize Δ λ S = argmax Δ λ Combinatorial! Complexity: – Example: 1,000 nodes, with 10,000 edges It takes 0.01 seconds to compute λ It takes 2,615 years to find 5-best nodes ! 81Prakash and Faloutsos 2012

A2: Our Solution Part 1: Shield Value – Carefully approximate Eigen-drop (Δ λ) – Matrix perturbation theory Part 2: Algorithm – Greedily pick best node at each step – Near-optimal due to submodularity NetShield (linear complexity) – O(nk 2 +m) n = # nodes; m = # edges 82 In Tong+ ICDM 2010 Prakash and Faloutsos 2012

Our Solution: Part 1 Approximate Eigen-drop (Δ λ) Δ λ ≈ SV(S) = – Result using Matrix perturbation theory – u(i) == ‘eigenscore’ ~~ pagerank(i) Au = λ. u u(i) 83Prakash and Faloutsos 2012

P1: node importanceP2: set diversity Original Graph Select by P1Select by P1+P2 84Prakash and Faloutsos 2012

Our Solution: Part 2: NetShield We prove that: SV(S) is sub-modular ( & monotone non-decreasing ) NetShield: Greedily add best node at each step Corollary: Greedy algorithm works 1. NetShield is near-optimal (w.r.t. max SV(S)) 2. NetShield is O(nk 2 +m) Footnote: near-optimal means SV(S NetShield ) >= (1-1/e) SV(S Opt ) 85Prakash and Faloutsos 2012

Experiment: Immunization quality Log(fraction of infected nodes) NetShield Degree PageRank Eigs (=HITS) Acquaintance Betweeness (shortest path) Lower is better Time 86Prakash and Faloutsos 2012

Part 2: Algorithms Q3: Whom to immunize? – Full Immunization (Static Graphs) – Full Immunization (Dynamic Graphs) – Fractional Immunization Q4: How to detect outbreaks? Q5: Who are the culprits? 87Prakash and Faloutsos 2012

Full Dynamic Immunization Given: Set of T arbitrary graphs Find: k ‘best’ nodes to immunize (remove) day N N night N N, weekend….. In Prakash+ ECML-PKDD Prakash and Faloutsos 2012

Full Dynamic Immunization Our solution – Recall theorem – Simple: reduce (= ) Goal: max eigendrop Δ No competing policy for comparison We propose and evaluate many policies Matrix Product Δ = daynight 89Prakash and Faloutsos 2012

Performance of Policies MIT Reality Mining Lower is better 90Prakash and Faloutsos 2012

Part 2: Algorithms Q3: Whom to immunize? – Full Immunization (Static Graphs) – Full Immunization (Dynamic Graphs) – Fractional Immunization Q4: How to detect outbreaks? Q5: Who are the culprits? 91Prakash and Faloutsos 2012

92 Fractional Immunization of Networks B. Aditya Prakash, Lada Adamic, Theodore Iwashyna (M.D.), Hanghang Tong, Christos Faloutsos Under Submission Prakash and Faloutsos 2012

? ? Given: a graph A, virus prop. model and budget k; Find: k ‘best’ nodes for immunization (removal). k = 2 Previously: Full Static Immunization 93Prakash and Faloutsos 2012

Fractional Asymmetric Immunization 94 Fractional Effect [ f(x) = ] Asymmetric Effect # antidotes = 3 Prakash and Faloutsos 2012

Now: Fractional Asymmetric Immunization 95 Fractional Effect [ f(x) = ] Asymmetric Effect # antidotes = 3 Prakash and Faloutsos 2012

Fractional Asymmetric Immunization 96 Fractional Effect [ f(x) = ] Asymmetric Effect # antidotes = 3 Prakash and Faloutsos 2012

Fractional Asymmetric Immunization Hospital Another Hospital 97 Drug-resistant Bacteria (like XDR-TB) Prakash and Faloutsos 2012

Fractional Asymmetric Immunization Hospital Another Hospital Drug-resistant Bacteria (like XDR-TB) 98 = f Prakash and Faloutsos 2012

Fractional Asymmetric Immunization Hospital Another Hospital 99 Problem: Given k units of disinfectant, how to distribute them to maximize hospitals saved? Prakash and Faloutsos 2012

Our Algorithm “SMART-ALLOC” CURRENT PRACTICESMART-ALLOC [US-MEDICARE NETWORK 2005] 100 Each circle is a hospital, ~3000 hospitals More than 30,000 patients transferred ~6x fewer! Prakash and Faloutsos 2012

Running Time 101 ≈ SimulationsSMART-ALLOC > 1 week 14 secs > 30,000x speed-up! Wall-Clock Time Lower is better Prakash and Faloutsos 2012

Experiments 102 K = 200K = 2000 PENN-NETWORK SECOND-LIFE ~5 x ~2.5 x Lower is better Prakash and Faloutsos 2012

Part 2: Algorithms Q3: Whom to immunize? Q4: How to detect outbreaks? Q5: Who are the culprits? 103Prakash and Faloutsos 2012

Break! 104Prakash and Faloutsos 2012

Part 2: Algorithms Q3: Whom to immunize? Q4: How to detect outbreaks? Q5: Who are the culprits? 105Prakash and Faloutsos 2012

Outbreak detection Spot contamination points – Minimize time to detection, population affected – Maximize probability of detection. – Minimize sensor placement cost. Blo gs Po sts Links Information cascade 106Prakash and Faloutsos 2012

Outbreak detection Spot `hot blogs’ – Minimize time to detection, population affected – Maximize probability of detection. – Minimize sensor placement cost. Blo gs Po sts Links Information cascade 107Prakash and Faloutsos 2012

J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, N. Glance. "Cost- effective Outbreak Detection in Networks” KDD Prakash and Faloutsos 2012

CELF: Main idea Given: a graph G(V,E) – a budget of B sensors – data on how contaminations spread over the network: Place the sensors To minimize time to detect outbreak CELF algorithm uses submodularity and lazy evaluation 109Prakash and Faloutsos 2012

Blogs: Comparison to heuristics Benefit (higher= better) 110Prakash and Faloutsos 2012

kPA score Blog NP IL OLO OLA http://instapundit.com http://donsurber.blogspot.com http://donsurber.blogspot.com http://sciencepolitics.blogspot.com http:// http://michellemalkin.com http://themodulator.org http:// http:// http://atrios.blogspot.com http://atrios.blogspot.com “Best 10 blogs to read” NP - number of posts, IL- in-links, OLO- blog out links, OLA- all out links 111Prakash and Faloutsos 2012

Part 2: Algorithms Q3: Whom to immunize? Q4: How to detect outbreaks? Q5: Who are the culprits? 112Prakash and Faloutsos 2012

B. Aditya Prakash, Jilles Vreeken, Christos Faloutsos ‘Detecting Culprits in Epidemics: Who and How many?’ ICDM 2012, Brussels 113Prakash and Faloutsos 2012

Problem definition d grid ‘+’ -> infected Who started it? Prakash and Faloutsos 2012

Problem definition d grid ‘+’ -> infected Who started it? Prakash and Faloutsos 2012 Prior work: [Lappas et al. 2010, Shah et al. 2011]

Culprits: Exoneration 116Prakash and Faloutsos 2012

Culprits: Exoneration 117Prakash and Faloutsos 2012

Who are the culprits Two-part solution – use MDL for number of seeds – for a given number: exoneration = centrality + penalty our method uses smallest eigenvector of Laplacian submatrix Running time = – linear! (in edges and nodes) 118Prakash and Faloutsos 2012

Outline Motivation Part 1: Understanding Epidemics (Theory) Part 2: Policy and Action (Algorithms) Part 3: Learning Models (Empirical Studies) Conclusion 119Prakash and Faloutsos 2012

Part 3: Empirical Studies Q6: How do cascades look like? Q7: How does activity evolve over time? Q8: How does external influence act? 120Prakash and Faloutsos 2012

Cascading Behavior in Large Blog Graphs How does information propagate over the blogosphere? Blogs Posts Links Information cascade J. Leskovec, M.McGlohon, C. Faloutsos, N. Glance, M. Hurst. Cascading Behavior in Large Blog Graphs. SDM

Cascades on the Blogosphere Cascade is graph induced by a time ordered propagation of information (edges) Cascades B1B1 B2B2 B4B4 B3B3 a b c d e B1B1 B2B2 B4B4 B3B d e b c e a Blogosphere blogs + posts Blog network links among blogs Post network links among posts 122Prakash and Faloutsos 2012

Blog data  45,000 blogs participating in cascades  All their posts for 3 months (Aug-Sept ‘05)  2.4 million posts  ~5 million links (245,404 inside the dataset) Time [1 day] Number of posts 123

Popularity over time Post popularity drops-off – exponentially? lag: days after post # in links 1 + lag 124Prakash and Faloutsos 2012

Popularity over time Post popularity drops-off – exponentially? POWER LAW! Exponent? # in links (log) days after post (log) 125Prakash and Faloutsos 2012

Popularity over time Post popularity drops-off – exponentially? POWER LAW! Exponent? -1.6 close to -1.5: Barabasi’s stack model and like the zero-crossings of a random walk # in links (log) -1.6 days after post (log) 126Prakash and Faloutsos 2012

-1.5 slope Prakash and Faloutsos J. G. Oliveira & A.-L. Barabási Human Dynamics: The Correspondence Patterns of Darwin and Einstein. Nature 437, 1251 (2005). [PDF]PDF

Topological Observations How do we measure how information flows through the network? Common cascade shapes extracted using algorithms in [Leskovec, Singh, Kleinberg; PAKDD 2006]. 128Prakash and Faloutsos 2012

Topological Observations Cascade size distributions also follow power law. What graph properties do cascades exhibit? Observation 2: The probability of observing a cascade on n nodes follows a Zipf distribution: p(n) n -2 Cascade size (# of nodes) Count a=-2 129Prakash and Faloutsos 2012

Topological Observations What graph properties do cascades exhibit? Stars and chains also follow a power law, with different exponents (star -3.1, chain -8.5). Size of chain (# nodes) Count Size of star (# nodes) Count a=-3.1 a= Prakash and Faloutsos 2012

Blogs and structure Cascades take on different shapes (sorted by frequency): How can we use cascades to identify communities? 131Prakash and Faloutsos 2012

PCA on cascade types Perform PCA on sparse matrix. Use log(count+1) Project onto 2 PC….01 … … … 5.1 … 4.2 … boingboing slashdot ………… ~9,000 cascade types ~44,000 blogs % -> 132Prakash and Faloutsos 2012

28 PCA on cascade types Observation: Content of blogs and cascade behavior are often related. Distinct clusters for “conservative” and “humorous” blogs (hand-labeling). 133Prakash and Faloutsos 2012 M. McGlohon, J. Leskovec, C. Faloutsos, M. Hurst, N. Glance. Finding Patterns in Blog Shapes and Blog Evolution. ICWSM 2007.

29 PCA on cascade types Observation: Content of blogs and cascade behavior are often related. Distinct clusters for “conservative” and “humorous” blogs (hand-labeling). M. McGlohon, J. Leskovec, C. Faloutsos, M. Hurst, N. Glance. Finding Patterns in Blog Shapes and Blog Evolution. ICWSM Prakash and Faloutsos 2012

Part 3: Empirical Studies Q6: How do cascades look like? Q7: How does activity evolve over time? Q8: How does external influence act? 135Prakash and Faloutsos 2012

Meme (# of mentions in blogs) – short phrases Sourced from U.S. politics in “you can put lipstick on a pig” “yes we can” Rise and fall patterns in social media Prakash and Faloutsos 2012

Rise and fall patterns in social media 137 Can we find a unifying model, which includes these patterns? four classes on YouTube [Crane et al. ’08] six classes on Meme [Yang et al. ’11] Prakash and Faloutsos 2012

Rise and fall patterns in social media 138 Answer: YES! We can represent all patterns by single model In Matsubara+ SIGKDD 2012 Prakash and Faloutsos 2012

Main idea - SpikeM -1. Un-informed bloggers (uninformed about rumor) -2. External shock at time n b (e.g, breaking news) -3. Infection (word-of-mouth) 139 Infectiveness of a blog-post at age n: -Strength of infection (quality of news) -Decay function (how infective a blog posting is) Time n=0Time n=n b Time n=n b +1 β Power Law Prakash and Faloutsos 2012

-1.5 slope Prakash and Faloutsos J. G. Oliveira & A.-L. Barabási Human Dynamics: The Correspondence Patterns of Darwin and Einstein. Nature 437, 1251 (2005). [PDF]PDF

SpikeM - with periodicity Full equation of SpikeM 141 Periodicity 12pm Peak activity 3am Low activity Time n Bloggers change their activity over time (e.g., daily, weekly, yearly) Bloggers change their activity over time (e.g., daily, weekly, yearly) activity Prakash and Faloutsos 2012

Details Analysis – exponential rise and power-raw fall 142 Liner-log Log-log Rise-part SI -> exponential SpikeM -> exponential Rise-part SI -> exponential SpikeM -> exponential Prakash and Faloutsos 2012

Details Analysis – exponential rise and power-raw fall 143 Liner-log Log-log Fall-part SI -> exponential SpikeM -> power law Fall-part SI -> exponential SpikeM -> power law Prakash and Faloutsos 2012

Tail-part forecasts 144 SpikeM can capture tail part Prakash and Faloutsos 2012

“What-if” forecasting 145 e.g., given (1) first spike, (2) release date of two sequel movies (3) access volume before the release date ? ? ? ? (1) First spike(2) Release date(3) Two weeks before release Prakash and Faloutsos 2012

“What-if” forecasting 146 – SpikeM can forecast not only tail-part, but also rise-part ! SpikeM can forecast upcoming spikes (1) First spike(2) Release date(3) Two weeks before release Prakash and Faloutsos 2012

Part 3: Empirical Studies Q6: How do cascades look like? Q7: How does activity evolve over time? Q8: How does external influence act? 147Prakash and Faloutsos 2012

Tweets Diffusion: Problem Definition Given: – Action log of people tweeting a #hashtag ( ) – A network of users ( ) Find: – How external influence varies with #hashtags? ?? ? ? ?? 148Prakash and Faloutsos 2012

Results: External Influence vs Time time “External Effects” #nowwatching, #nowplaying, #epictweets #purpleglasses, #brits, #famouslies #oscar, #25jan #openfollow, #ihatequotes, #tweetmyjobs Can also use for Forecasting, Anomaly Detection! Bursty, external events “Word-of-mouth” Not trending Long-running tags “Word-of-mouth” 149Prakash and Faloutsos 2012

Outline Motivation Part 1: Understanding Epidemics (Theory) Part 2: Policy and Action (Algorithms) Part 3: Learning Models (Empirical Studies) Conclusion 150Prakash and Faloutsos 2012

Conclusions Epidemic Threshold – It’s the Eigenvalue Fast Immunization – Max. drop in eigenvalue, linear-time near-optimal algorithm Bursts: SpikeM model – Exponential growth, Power-law decay 151Prakash and Faloutsos 2012

ML & Stats. Comp. Systems Theory & Algo. Biology Econ. Social Science Physics 152 Propagation on Networks Prakash and Faloutsos 2012

References 1. Winner-takes-all: Competing Viruses or Ideas on fair-play networks (B. Aditya Prakash, Alex Beutel, Roni Rosenfeld, Christos Faloutsos) – In WWW 2012, Lyon 2. Threshold Conditions for Arbitrary Cascade Models on Arbitrary Networks (B. Aditya Prakash, Deepayan Chakrabarti, Michalis Faloutsos, Nicholas Valler, Christos Faloutsos) - In IEEE ICDM 2011, Vancouver (Invited to KAIS Journal Best Papers of ICDM.) 3. Times Series Clustering: Complex is Simpler! (Lei Li, B. Aditya Prakash) - In ICML 2011, Bellevue 4. Epidemic Spreading on Mobile Ad Hoc Networks: Determining the Tipping Point (Nicholas Valler, B. Aditya Prakash, Hanghang Tong, Michalis Faloutsos and Christos Faloutsos) – In IEEE NETWORKING 2011, Valencia, Spain 5. Formalizing the BGP stability problem: patterns and a chaotic model (B. Aditya Prakash, Michalis Faloutsos and Christos Faloutsos) – In IEEE INFOCOM NetSciCom Workshop, On the Vulnerability of Large Graphs (Hanghang Tong, B. Aditya Prakash, Tina Eliassi-Rad and Christos Faloutsos) – In IEEE ICDM 2010, Sydney, Australia 7. Virus Propagation on Time-Varying Networks: Theory and Immunization Algorithms (B. Aditya Prakash, Hanghang Tong, Nicholas Valler, Michalis Faloutsos and Christos Faloutsos) – In ECML-PKDD 2010, Barcelona, Spain 8. MetricForensics: A Multi-Level Approach for Mining Volatile Graphs (Keith Henderson, Tina Eliassi-Rad, Christos Faloutsos, Leman Akoglu, Lei Li, Koji Maruhashi, B. Aditya Prakash and Hanghang Tong) - In SIGKDD 2010, Washington D.C. 9. Parsimonious Linear Fingerprinting for Time Series (Lei Li, B. Aditya Prakash and Christos Faloutsos) - In VLDB 2010, Singapore 10. EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs (B. Aditya Prakash, Ashwin Sridharan, Mukund Seshadri, Sridhar Machiraju and Christos Faloutsos) – In PAKDD 2010, Hyderabad, India 11. BGP-lens: Patterns and Anomalies in Internet-Routing Updates (B. Aditya Prakash, Nicholas Valler, David Andersen, Michalis Faloutsos and Christos Faloutsos) – In ACM SIGKDD 2009, Paris, France. 12. Surprising Patterns and Scalable Community Detection in Large Graphs (B. Aditya Prakash, Ashwin Sridharan, Mukund Seshadri, Sridhar Machiraju and Christos Faloutsos) – In IEEE ICDM Large Data Workshop 2009, Miami 13. FRAPP: A Framework for high-Accuracy Privacy-Preserving Mining (Shipra Agarwal, Jayant R. Haritsa and B. Aditya Prakash) – In Intl. Journal on Data Mining and Knowledge Discovery (DKMD), Springer, vol. 18, no. 1, February 2009, Ed: Johannes Gehrke. 14. Complex Group-By Queries For XML (C. Gokhale, N. Gupta, P. Kumar, L. V. S. Lakshmanan, R. Ng and B. Aditya Prakash) – In IEEE ICDE 2007, Istanbul, Turkey. 153Prakash and Faloutsos 2012

Acknowledgements Collaborators Christos Faloutsos Roni Rosenfeld, Michalis Faloutsos, Lada Adamic, Theodore Iwashyna (M.D.), Dave Andersen, Tina Eliassi-Rad, Iulian Neamtiu, Varun Gupta, Jilles Vreeken, Deepayan Chakrabarti, Hanghang Tong, Kunal Punera, Ashwin Sridharan, Sridhar Machiraju, Mukund Seshadri, Alice Zheng, Lei Li, Polo Chau, Nicholas Valler, Alex Beutel, Xuetao Wei 154Prakash and Faloutsos 2012

Acknowledgements Funding 155Prakash and Faloutsos 2012

Analysis Policy/Action Data Dynamical Processes on Large Networks B. Aditya Prakash Christos Faloutsos 156Prakash and Faloutsos 2012