Cost-effective Outbreak Detection in Networks Presented by Amlan Pradhan, Yining Zhou, Yingfei Xiang, Abhinav Rungta -Group 1.

Slides:



Advertisements
Similar presentations
Cost-effective Outbreak Detection in Networks Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, Natalie Glance.
Advertisements

1 Adaptive Submodularity: A New Approach to Active Learning and Stochastic Optimization Joint work with Andreas Krause 1 Daniel Golovin.
Non-competitive VM – Parting Shots Where do we go from here?
Learning Influence Probabilities in Social Networks 1 2 Amit Goyal 1 Francesco Bonchi 2 Laks V. S. Lakshmanan 1 U. of British Columbia Yahoo! Research.
LEARNING INFLUENCE PROBABILITIES IN SOCIAL NETWORKS Amit Goyal Francesco Bonchi Laks V. S. Lakshmanan University of British Columbia Yahoo! Research University.
Minimizing Seed Set for Viral Marketing Cheng Long & Raymond Chi-Wing Wong Presented by: Cheng Long 20-August-2011.
Spread of Influence through a Social Network Adapted from :
Maximizing the Spread of Influence through a Social Network
Cost-effective Outbreak Detection in Networks Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, Natalie Glance.
Cost-effective Outbreak Detection in Networks Jure Leskovec Machine Learning Department, CMU Joint work with Andreas Krause, Carlos Guestrin, Christos.
Randomized Sensing in Adversarial Environments Andreas Krause Joint work with Daniel Golovin and Alex Roper International Joint Conference on Artificial.
Maximizing the Spread of Influence through a Social Network
Suqi Cheng Research Center of Web Data Sciences & Engineering
Online Distributed Sensor Selection Daniel Golovin, Matthew Faulkner, Andreas Krause theory and practice collide 1.
CIKM’2008 Presentation Oct. 27, 2008 Napa, California
A Decentralised Coordination Algorithm for Maximising Sensor Coverage in Large Sensor Networks Ruben Stranders, Alex Rogers and Nicholas R. Jennings School.
Near-optimal Nonmyopic Value of Information in Graphical Models Andreas Krause, Carlos Guestrin Computer Science Department Carnegie Mellon University.
Sensor placement applications Monitoring of spatial phenomena Temperature Precipitation... Active learning, Experiment design Precipitation data from Pacific.
INFERRING NETWORKS OF DIFFUSION AND INFLUENCE Presented by Alicia Frame Paper by Manuel Gomez-Rodriguez, Jure Leskovec, and Andreas Kraus.
1 Efficient planning of informative paths for multiple robots Amarjeet Singh *, Andreas Krause +, Carlos Guestrin +, William J. Kaiser *, Maxim Batalin.
Simpath: An Efficient Algorithm for Influence Maximization under Linear Threshold Model Amit Goyal Wei Lu Laks V. S. Lakshmanan University of British Columbia.
Internet Quarantine: Requirements for Containing Self-Propagating Code David Moore et. al. University of California, San Diego.
Models of Influence in Online Social Networks
Distributed Constraint Optimization Michal Jakob Agent Technology Center, Dept. of Computer Science and Engineering, FEE, Czech Technical University A4M33MAS.
Dynamics of networks Jure Leskovec Computer Science Department
1 1 Stanford University 2 MPI for Biological Cybernetics 3 California Institute of Technology Inferring Networks of Diffusion and Influence Manuel Gomez.
1 1 Stanford University 2 MPI for Biological Cybernetics 3 California Institute of Technology Inferring Networks of Diffusion and Influence Manuel Gomez.
Influence Maximization in Dynamic Social Networks Honglei Zhuang, Yihan Sun, Jie Tang, Jialin Zhang, Xiaoming Sun.
Thang N. Dinh, Dung T. Nguyen, My T. Thai Dept. of Computer & Information Science & Engineering University of Florida, Gainesville, FL Hypertext-2012,
December 7-10, 2013, Dallas, Texas
Maximizing the Spread of Influence through a Social Network David Kempe, Jon Kleinberg, Eva Tardos Cornell University KDD 2003.
Randomized Composable Core-sets for Submodular Maximization Morteza Zadimoghaddam and Vahab Mirrokni Google Research New York.
Maximizing the Spread of Influence through a Social Network Authors: David Kempe, Jon Kleinberg, É va Tardos KDD 2003.
Online Social Networks and Media
DynamicMR: A Dynamic Slot Allocation Optimization Framework for MapReduce Clusters Nanyang Technological University Shanjiang Tang, Bu-Sung Lee, Bingsheng.
Submodular Maximization with Cardinality Constraints Moran Feldman Based On Submodular Maximization with Cardinality Constraints. Niv Buchbinder, Moran.
Manuel Gomez Rodriguez Bernhard Schölkopf I NFLUENCE M AXIMIZATION IN C ONTINUOUS T IME D IFFUSION N ETWORKS , ICML ‘12.
1 Latency-Bounded Minimum Influential Node Selection in Social Networks Incheol Shin
Algorithms For Solving History Sensitive Cascade in Diffusion Networks Research Proposal Georgi Smilyanov, Maksim Tsikhanovich Advisor Dr Yu Zhang Trinity.
Distributed Optimization Yen-Ling Kuo Der-Yeuan Yu May 27, 2010.
1 1 MPI for Intelligent Systems 2 Stanford University Manuel Gomez Rodriguez 1,2 Bernhard Schölkopf 1 S UBMODULAR I NFERENCE OF D IFFUSION NETWORKS FROM.
Controlling Propagation at Group Scale on Networks Yao Zhang*, Abhijin Adiga +, Anil Vullikanti + *, and B. Aditya Prakash* *Department of Computer Science.
1 1 Stanford University 2 MPI for Biological Cybernetics 3 California Institute of Technology Inferring Networks of Diffusion and Influence Manuel Gomez.
Biao Wang 1, Ge Chen 1, Luoyi Fu 1, Li Song 1, Xinbing Wang 1, Xue Liu 2 1 Shanghai Jiao Tong University 2 McGill University
Observability-Based Approach to Optimal Design of Networks Atiye Alaeddini Spring 2016.
Inferring Networks of Diffusion and Influence
Seed Selection.
Nanyang Technological University
Independent Cascade Model and Linear Threshold Model
Greedy & Heuristic algorithms in Influence Maximization
Monitoring rivers and lakes [IJCAI ‘07]
Near-optimal Observation Selection using Submodular Functions
Moran Feldman The Open University of Israel
Influence Maximization
Who are the most influential bloggers?
Optimizing Submodular Functions
Distributed Submodular Maximization in Massive Datasets
Jure Leskovec and Christos Faloutsos Machine Learning Department
Independent Cascade Model and Linear Threshold Model
Influence Maximization
The Importance of Communities for Learning to Influence
Effective Social Network Quarantine with Minimal Isolation Costs
Coverage Approximation Algorithms
Cost-effective Outbreak Detection in Networks
Alan Kuhnle*, Victoria G. Crawford, and My T. Thai
Influence Maximization
Viral Marketing over Social Networks
Independent Cascade Model and Linear Threshold Model
Navigation and Propagation in Networks
Submodular Maximization with Cardinality Constraints
Presentation transcript:

Cost-effective Outbreak Detection in Networks Presented by Amlan Pradhan, Yining Zhou, Yingfei Xiang, Abhinav Rungta -Group 1

Diffusion in Social Networks ▪ A fundamental process in social networks: Behaviors that cascade from node to node like an epidemic – News, opinions, rumors, fads, urban legends, … – Word‐of‐mouth effects in marketing: rise of new websites, free web based services – Virus, disease propagation – Change in social priorities: smoking, recycling – Saturation news coverage: topic diffusion among bloggers – Internet‐energized political campaigns – Cascading failures in financial markets – Localized effects: riots, people walking out of a lecture

Empirical Studies of Diffusion ▪ Experimental studies of diffusion have long history: – Spread of new agricultural practices [Ryan‐Gross 1943] Adoption of a new hybrid‐corn between the 259 farmers in Iowa Classical study of diffusion Interpersonal network plays important role in adoption Diffusion is a social process – Spread of new medical practices [Coleman et al 1966] Studied the adoption of a new drug between doctors in Illinois Clinical studies and scientific evaluations were not sufficient to convince the doctors It was the social power of peers that led to adoption

Diffusion in Networks ▪ Initially some nodes are active ▪ Active nodes spread their influence on the other nodes, and so on … b e a g h i d f c

Scenario 1: Water network  Given a real city water distribution network  And data on how contaminants spread in the network  Problem posed by US Environmental Protection Agency S On which nodes should we place sensors to efficiently detect the all possible contaminations? S

Scenario 2: Cascades in blogs Blogs Posts Time ordered hyperlinks Information cascade Which blogs should one read to detect cascades as effectively as possible?

General problem  Given a dynamic process spreading over the network  We want to select a set of nodes to detect the process effectively  Many other applications:  Epidemics  Influence propagation  Network security

Problem setting  Given a graph G(V,E)  and a budget B for sensors  and data on how contaminations spread over the network:  for each contamination i we know the time T(i, u) when it contaminated node u  Select a subset of nodes A that maximize the expected reward subject to cost(A) < B SS Reward for detecting contamination i

Two parts to the problem  Reward, e.g.:  1) Minimize time to detection  2) Maximize number of detected propagations  3) Minimize number of infected people  Cost (location dependent):  Reading big blogs is more time consuming  Placing a sensor in a remote location is expensive

Overview  Problem definition  Properties of objective functions  Submodularity  Our solution  CELF algorithm  New bound  Experiments  Conclusion

Solving the problem  Solving the problem exactly is NP-hard  Our observation:  objective functions are submodular, i.e. diminishing returns S1S1 S2S2 Placement A={S 1, S 2 } S’S’ New sensor: Adding S’ helps a lot S2S2 S4S4 S1S1 S3S3 Placement A={S 1, S 2, S 3, S 4 } S’S’ Adding S’ helps very little

Analysis ▪ Analysis: diminishing returns at individual nodes implies diminishing returns at a “global” level – Covered area grows slower and slower with placement size R e w a r d R ( c o v e r e d a r ea) Number of sensors Δ1

Reward functions: Submodularity We must show thatR is submodular: – 2) Natural example: Sets A 1, A 2, …, A n : R(S) = size of union of A i A ▪ What do we know about submodular functions? – 1) If R 1, R 2, …, R k are submodular, and a 1, a 2, … a k > 0 then ∑a i R i is also submodular B x Benefit of adding a sensor to a small placement Benefit of adding a sensor to a large placement

An Approximation Result ▪ Diminishing returns: Covered area grows slower and slower with placement size R is submodular: ifA  B then R(A  {x}) – R(A)≥R(B  {x}) – R(B) Theorem [Nehmhauser et al. ‘78]: If f is a function that is monotone and submodular, then k‐step hill‐climbing produces set S for which f(S) is within(1‐1/e) of optimal.

Objective functions are submodular  Objective functions from Battle of Water Sensor Networks competition [Ostfeld et al]:  1) Time to detection (DT)  How long does it take to detect a contamination?  2) Detection likelihood (DL)  How many contaminations do we detect?  3) Population affected (PA)  How many people drank contaminated water?  Our result: all are submodular

Background: Optimizing submodular functions  How well can we do?  A greedy is near optimal  at least 1-1/e (~63%) of optimal [Nemhauser et al ’78]  But  1) this only works for unit cost case (each sensor/location costs the same)  2) Greedy algorithm is slow  scales as O(|V|B) a b c a b c d d reward e e Greedy algorithm

Towards a New Algorithm ▪ Possible algorithm: hill‐climbing ignoring the cost – Repeatedly select sensor with highest marginal gain – Ignore sensor cost ▪ It always prefers more expensive sensor with reward r to a cheaper sensor with reward r‐ε → For variable cost it can fail arbitrarily badly ▪ Idea – What if we optimize benefit‐cost ratio?

Benefit‐Cost: More Problems ▪ Bad news: Optimizing benefit‐cost ratio can fail arbitrarily badly ▪ Example: Given a budget B, consider: – 2 locations s 1 and s 2 : )=B Costs: c(s 1 )=ε,c(s 2 )=B Only 1 cascade with reward: R(s 1 )=2ε, R(s 2 en benefit‐cost ratio is bc(s 1 )=2 and bc(s 2 )=1 – So, we first select s 1 and then can not afford s 2 →We get reward 2ε instead of B Now send ε to 0 and we get arbitrarily bad

Solution: CELF Algorithm ▪ CELF (cost‐effective lazy forward‐selection): A two pass greedy algorithm: Set (solution) A: use benefit‐cost greedy Set (solution) B: use unit cost greedy – Final solution: argmax(R(A), R(B)) ▪ How far is CELF from (unknown) optimal solution? ▪ Theorem: CELF is near optimal – CELF achieves ½(1-1/e ) factor approximation ▪ CELF is much faster than standard hill‐climbing

Scaling up CELF algorithm  Submodularity guarantees that marginal benefits decrease with the solution size  Idea: exploit submodularity, doing lazy evaluations! (considered by Robertazzi et al for unit cost case) d reward

Scaling up CELF  CELF algorithm:  Keep an ordered list of marginal benefits b i from previous iteration  Re-evaluate b i only for top sensor  Re-sort and prune a b c a b c d d reward e e

Scaling up CELF  CELF algorithm:  Keep an ordered list of marginal benefits b i from previous iteration  Re-evaluate b i only for top sensor  Re-sort and prune a a b c d dbc reward e e

Scaling up CELF  CELF algorithm:  Keep an ordered list of marginal benefits b i from previous iteration  Re-evaluate b i only for top sensor  Re-sort and prune a c a b c d d b reward e e

Overview  Problem definition  Properties of objective functions  Submodularity  Our solution  CELF algorithm  New bound  Experiments  Conclusion

Experiments: Questions  Q1: How close to optimal is CELF?  Q2: How tight is our bound?  Q3: Unit vs. variable cost  Q4: CELF vs. heuristic selection  Q5: Scalability

Experiments: Case Study  We have real propagation data  Water distribution network:  Real city water distribution networks  Realistic simulator of water consumption provided by US Environmental Protection Agency

Case study : Water network  Real metropolitan area water network (largest network optimized):  V = 21,000 nodes  E = 25,000 pipes  3.6 million epidemic scenarios (152 GB of epidemic data)  By exploiting sparsity we fit it into main memory (16GB)

Solution quality  Again our bound is much tighter Old bound Our bound CELF

Heuristic placement  Again, CELF consistently wins

Placement visualization  Different objective functions give different sensor placements Population affected Detection likelihood

Scalability  CELF is 10 times faster than greedy

Conclusion  General methodology for selecting nodes to detect outbreaks  Results:  Submodularity observation  Variable-cost algorithm with optimality guarantee  Tighter bound  Significant speed-up (700 times)  Evaluation on large real datasets (150GB)  CELF won consistently

QUESTIONS ? 33