Controlling Propagation at Group Scale on Networks Yao Zhang, Abhijin Adiga +, Anil Vullikanti + , and B. Aditya Prakash* *Department of Computer Science.

Presentation transcript:

Controlling Propagation at Group Scale on Networks
Yao Zhang*, Abhijin Adiga+, Anil Vullikanti+*, and B. Aditya Prakash*
*Department of Computer Science, +NDSSL, Virginia Bioinformatics Institute, Virginia Tech
ICDM, Atlantic City, November 17th, 2015

Outline
- Motivation
- Problem Formulation
- Our Proposed Methods
- Experiments
- Conclusion
ZAVP, ICDM 2015

Propagation over networks
- Epidemiology: disease spreads over contact networks (flu) [Figure: 2014 Week 51 flu spread in the US, from CDC]
- Social media: information spreads over friendship networks (memes) [Figure from forbes.com]

Immunization
- Epidemiology
  - Centers for Disease Control (CDC)
  - Contain epidemic diseases (flu)
- Social media
  - Facebook, Twitter, ...
  - How to stop rumor spread (memes)
Immunization problem: how to control propagation over networks?

Immunization: two interventions
Two popular interventions:
- Vaccination: remove node
- Quarantining: remove edge
We do both vaccination and quarantining!

Background: individual-based immunization
Problem: find the best nodes/edges to remove to control propagation over networks.
Popular individual-based immunization strategies:
- For threshold models, e.g., the LT model [Khalil+ KDD2014]
- For cascade-style models, e.g., the SIR/SIS/IC models [Tong+ CIKM2012, Tong+ ICDM2010]
Example: which node to remove?

In reality
- Sometimes individual immunization cannot easily be turned into implementable policies
  - E.g., it is hard to ensure that specific individuals take the appropriate vaccine
- Observation: groups naturally exist in underlying networks
  - People can be grouped by age, demographics, occupation, ...
  - Friends are grouped by shared interests, geolocations, ...
Note: groups need NOT be just link-based communities.
[Figures: occupation groups; geolocation groups]

Immunization at group scale
- More realistic
  - Epidemiology: CDC distributes flu vaccines based on demographics, locations, ...
  - Social media: easier to put a warning bulletin on group pages
- Cheaper
  - Expensive to target individuals
Hence, we study: how to select groups to control propagation over networks?

Outline
- Motivation
- Problem Formulation
- Our Proposed Methods
- Experiments
- Conclusion

Problem Formulation
How to formulate the problem (wish list):
- Aim 1: usefulness
  - Model the process of group immunization
- Aim 2: consistency
  - Generalize individual immunization to group immunization

Aim 1: process of group immunization
- Idea:
  - Distribute vaccines to groups
  - Randomly vaccinate/quarantine within groups
This simulates the vaccine distribution process in real life:
- A decision maker (e.g., CDC) gives vaccines to groups (school, community, plant, ...)
- People voluntarily take vaccines

Group immunization: how to do it
- Idea:
  - Distribute vaccines to groups
  - Randomly vaccinate/quarantine within groups
Example: vaccination (node removal) with budget 3
- Distribute vaccines: one group gets one, one group gets two, one group gets zero
- Randomly remove nodes within each group, which yields all possible worlds
The quarantining (edge removal) process is similar.
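
The two-step process on this slide (allocate vaccines to groups, then remove nodes at random inside each group) can be sketched in Python. This is a minimal illustration, not the authors' code: the group names, the `sample_possible_world` helper, and the choice of uniform sampling without replacement inside a group are all assumptions made for the example.

```python
import random

def sample_possible_world(nodes_by_group, allocation, rng):
    """Sample one possible world: the set of nodes removed by vaccination.

    Assumption (hypothetical, for this sketch): within a group, vaccinated
    nodes are chosen uniformly at random without replacement.
    """
    removed = set()
    for group, members in nodes_by_group.items():
        # A group cannot use more vaccines than it has members.
        k = min(allocation.get(group, 0), len(members))
        removed.update(rng.sample(sorted(members), k))
    return removed

# Hypothetical groups and an allocation of a budget of 3 vaccines.
groups = {"school": {1, 2, 3}, "community": {4, 5, 6, 7}, "plant": {8, 9}}
allocation = {"school": 1, "community": 2, "plant": 0}

removed = sample_possible_world(groups, allocation, random.Random(0))
print(sorted(removed))  # one node from "school", two from "community"
```

Calling the helper repeatedly with different random states enumerates samples from the set of possible worlds; averaging a quality measure over those samples approximates the expected quality of an allocation.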

Aim 2: from individual to group immunization
Which metrics measure the quality of immunization?
- For threshold models (e.g., the LT model): epidemic size (minimize)
- For cascade-style models (e.g., the SIS/SIR/IC models): spectral radius (minimize)
We do both for group immunization, using the expected quality over all possible worlds.

Background: threshold-based model
[Figure: rumor spreading]

Problem 1: edge deletion under LT model
Given: graph G(V,E), a partition C of the node set, infected node set A, a budget of m vaccines
Find: the best allocation of vaccines to groups
Such that: the final expected epidemic size is minimized after removing edges within groups
Formally (equation on slide): the quality function is the expected number of infected nodes, and the decision variable is an allocation vector over groups.

Problem 2: node deletion under LT model
Given: graph G(V,E), a partition C of the node set, infected node set A, a budget of m vaccines
Find: the best allocation of vaccines to groups
Such that: the final expected epidemic size is minimized after removing nodes within groups
Formally (equation on slide): the quality function is the expected number of infected nodes, and the decision variable is an allocation vector over groups.
Example: how to allocate three vaccines? Distribute them among groups: one group gets one, one gets two, one gets zero.
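
The quality function in Problems 1 and 2, the expected number of infected nodes under the LT model, can be estimated by simulation. The sketch below assumes the standard Linear Threshold model (each node draws a random threshold and becomes infected once the total weight of its infected in-neighbors reaches it). The data layout `in_weights` and both function names are hypothetical choices for this example, not taken from the slides.

```python
import random

def simulate_lt(in_weights, seeds, rng):
    """Run one Linear Threshold cascade and return the set of infected nodes.

    in_weights[v] maps each in-neighbor u of v to the weight of edge u -> v;
    the weights into any node are assumed to sum to at most 1.
    """
    # Thresholds are drawn from (0, 1] so that a node with no infected
    # in-neighbors (influence 0) can never become infected.
    thresholds = {v: 1.0 - rng.random() for v in in_weights}
    infected = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, neighbors in in_weights.items():
            if v in infected:
                continue
            influence = sum(w for u, w in neighbors.items() if u in infected)
            if influence >= thresholds[v]:
                infected.add(v)
                changed = True
    return infected

def expected_epidemic_size(in_weights, seeds, runs=1000, seed=0):
    """Monte Carlo estimate of the expected number of infected nodes."""
    rng = random.Random(seed)
    return sum(len(simulate_lt(in_weights, seeds, rng)) for _ in range(runs)) / runs

# Hypothetical chain a -> b -> c with full-weight edges, seeded at "a".
chain = {"a": {}, "b": {"a": 1.0}, "c": {"b": 1.0}}
# Same graph after the edge a -> b has been removed (quarantined).
chain_cut = {"a": {}, "b": {}, "c": {"b": 1.0}}

print(expected_epidemic_size(chain, {"a"}, runs=50))      # 3.0
print(expected_epidemic_size(chain_cut, {"a"}, runs=50))  # 1.0
```

Removing a node can be modeled in the same layout by deleting its entry and every edge that leaves it.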

Background: cascade-style model
- Epidemic threshold: spectral radius
  - The largest eigenvalue λ1 of the adjacency matrix of a network
  - Connects to the reproduction number in epidemiology
  - Determines the phase transition ("epidemic threshold") between epidemic and non-epidemic regimes
- Cascade-style: SIR/SIS/IC models
λ1 is the epidemic threshold [Prakash+, ICDM 2011]
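
As a small illustration of the quantity on this slide, the sketch below computes λ1 for an undirected graph with power iteration and shows how it changes when a node is removed. The function names and the star-graph example are hypothetical; a numerical library's eigenvalue routine would normally be used for real graphs.

```python
def spectral_radius(adj, iters=500):
    """Largest eigenvalue of a symmetric 0/1 adjacency matrix (list of lists).

    Power iteration is run on A + I: the shift avoids oscillation on
    bipartite graphs and leaves the leading eigenvector unchanged.
    """
    n = len(adj)
    u = [1.0] * n
    for _ in range(iters):
        v = [u[i] + sum(adj[i][j] * u[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in v) ** 0.5
        u = [x / norm for x in v]
    # Rayleigh quotient u^T A u for the unit-norm vector u.
    return sum(u[i] * adj[i][j] * u[j] for i in range(n) for j in range(n))

def remove_node(adj, k):
    """Adjacency matrix after deleting node k (vaccination)."""
    return [[a for j, a in enumerate(row) if j != k]
            for i, row in enumerate(adj) if i != k]

# Star graph: node 0 is the hub, nodes 1..4 are leaves.
star = [
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]

lam_full = spectral_radius(star)                     # 2.0 (sqrt of 4 leaves)
lam_no_leaf = spectral_radius(remove_node(star, 1))  # sqrt(3), about 1.732
lam_no_hub = spectral_radius(remove_node(star, 0))   # 0.0, no edges remain
```

Removing the hub drops λ1 far more than removing a leaf, which is the kind of difference the eigenvalue-based problems below try to exploit.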

Problem 3: edge deletion for spectral radius
Given: graph G(V,E), a partition C of the node set, a budget of m vaccines
Find: the best allocation of vaccines to groups
Such that: the expected drop of the first eigenvalue is maximized after removing edges within groups
Formally (equation on slide): the quality function is the expected drop of the eigenvalue, and the decision variable is an allocation vector over groups.

Problem 4: node deletion for spectral radius
Given: graph G(V,E), a partition C of the node set, a budget of m vaccines
Find: the best allocation of vaccines to groups
Such that: the expected drop of the first eigenvalue is maximized after removing nodes within groups
Formally (equation on slide): the quality function is the expected drop of the eigenvalue (eigendrop) over all possible worlds, and the decision variable is an allocation vector over groups.

Hardness of our problems
- Individual-based vs. group-based immunization
  - If each node forms its own group, our problems reduce exactly to individual-based immunization problems:
    - P1 and P2 reduce to [Khalil+ KDD2014]
    - P3 and P4 reduce to [Tong+ CIKM2012, Tong+ ICDM2010]
  - Our problems: even harder
All are NP-hard problems.

Outline
- Motivation
- Problem Definition
- Our Proposed Methods
  - Problem 1 and 2 (LT model/epidemic size)
  - Problem 3 and 4 (Cascade style/spectral radius)
- Experiments
- Conclusion

Prob. 1: edge removal under LT model
- Formally, our problem is to minimize the expected number of infected nodes after vaccines are allocated
- We rewrite it in terms of f(x), the expected number of nodes saved after vaccines are allocated according to x
- Hence we want to maximize f(x)
- Note:
  - x is a vector
  - f(x) is not a function over sets, but a function over the integer lattice

Main idea: diminishing returns over lattices
- Result 1: we prove that f has the following three properties:
  - P1: (two conditions; formulas on slide)
  - P2: f is non-decreasing
  - P3: f has diminishing returns (stated on the slide as an "if ..., then ..." condition)
- Greedy algorithm: Greedy-LT
  - Each time, give one vaccine to the group i with the maximum marginal gain
- Result 2: we prove that our algorithm provides a (1-1/e)-approximation
See paper for details.
Note: having the diminishing-returns property is not equivalent to submodularity over the integer lattice.
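
The greedy rule on this slide (repeatedly give one vaccine to the group with the largest marginal gain) can be sketched as follows. The quality function `toy_saved` is a hypothetical stand-in chosen only because it is non-decreasing with diminishing returns; in the talk, f is the expected number of nodes saved under the LT model, which would be estimated by simulation.

```python
def greedy_allocate(f, num_groups, budget):
    """Allocate `budget` vaccines one at a time over the integer lattice.

    f takes an allocation vector (list of ints, one entry per group) and
    returns its quality; higher is better.
    """
    x = [0] * num_groups
    for _ in range(budget):
        base = f(x)
        best_group, best_gain = None, float("-inf")
        for i in range(num_groups):
            x[i] += 1                 # tentatively give group i one more vaccine
            gain = f(x) - base        # marginal gain of that vaccine
            x[i] -= 1
            if gain > best_gain:
                best_group, best_gain = i, gain
        x[best_group] += 1
    return x

def toy_saved(x, weights=(8.0, 3.0, 1.0)):
    """Hypothetical quality function: each extra vaccine in a group saves
    half as many nodes as the previous one (diminishing returns)."""
    return sum(w * (1 - 0.5 ** k) for w, k in zip(weights, x))

allocation = greedy_allocate(toy_saved, num_groups=3, budget=3)
print(allocation)  # [2, 1, 0]
```

With a budget of 3, the first two vaccines go to the first group (gains 4.0 and 2.0) and the third goes to the second group (gain 1.5 beats the first group's 1.0), which mirrors the "two, one, zero" split used as the running example.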

Prob. 2: node removal under LT model
Result:
- The number of nodes saved after removing nodes within groups (defined from the expected number of infected nodes after vaccines are allocated) also has the three properties:
  - P1: (two conditions; formulas on slide)
  - P2: non-decreasing
  - P3: diminishing returns
- Use a similar greedy algorithm with a (1-1/e)-approximation guarantee

Outline
- Motivation
- Problem Definition
- Our Proposed Methods
  - Problem 1 and 2 (LT model/epidemic size)
  - Problem 3 and 4 (Cascade style/spectral radius)
- Experiments
- Conclusion

Prob. 3: edge removal for spectral radius
- Formally, we want to maximize the expected drop of the eigenvalue
- Idea: stochastic process
  - Define the expected adjacency matrix of the graph, using the probability that each edge is preserved
  - Instead of maximizing the expected drop, minimize the first eigenvalue of the expected adjacency matrix
  - The allocation x that minimizes it can be obtained by solving a semidefinite program (SDP)
  - Approximation guarantee: a constant factor
  - Slow: running time O(|V|^4)

Prob. 3: edge removal for spectral radius (another method: matrix perturbation theory)
- The expected drop of the first eigenvalue can be estimated (formula on slide) from the proportion of edges removed in each group a and the entries u_i of the eigenvector u, where Mu = λu
- x can be solved for using linear programming (LP)
- Faster: O(n^4)
  - n: number of groups (much smaller than the size of the graph)
See paper for details.
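
The slide's own estimate is not preserved in this transcript. As a hedged illustration of the underlying idea, the sketch below uses the standard first-order perturbation result for a symmetric adjacency matrix: with a unit-norm leading eigenvector u, removing the undirected edge (i, j) lowers λ1 by roughly 2·u_i·u_j. This is an assumption about the flavor of the estimate, not the paper's exact formula, and all function names are hypothetical.

```python
def leading_eigenpair(adj, iters=1000):
    """Leading eigenvalue and unit eigenvector of a symmetric matrix,
    via power iteration on A + I (the shift avoids oscillation)."""
    n = len(adj)
    u = [1.0] * n
    for _ in range(iters):
        v = [u[i] + sum(adj[i][j] * u[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in v) ** 0.5
        u = [x / norm for x in v]
    lam = sum(u[i] * adj[i][j] * u[j] for i in range(n) for j in range(n))
    return lam, u

def estimated_edge_drop(u, i, j):
    """First-order estimate of the drop in the leading eigenvalue when the
    undirected edge (i, j) is removed; u must have unit norm."""
    return 2.0 * u[i] * u[j]

def without_edge(adj, i, j):
    """Copy of the adjacency matrix with undirected edge (i, j) removed."""
    new = [row[:] for row in adj]
    new[i][j] = new[j][i] = 0
    return new

# Triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

lam, u = leading_eigenpair(adj)
estimate = estimated_edge_drop(u, 0, 1)
lam_after, _ = leading_eigenpair(without_edge(adj, 0, 1))
actual = lam - lam_after
print(estimate, actual)
```

For a symmetric matrix the first-order estimate never understates the drop, because the old eigenvector still gives a lower bound on the new leading eigenvalue. An estimate of this form is linear in per-edge removal probabilities, which is consistent with the slide's statement that the allocation can then be found with an LP.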

Prob. 4: node removal for spectral radius
- Idea: use matrix perturbation theory
  - As in the LP approach, the expected drop of the first eigenvalue can be estimated; here the estimate is a quadratic function of x
  - The allocation x can be obtained using quadratic programming (QP)
  - Fast: O(n^4)
    - n: number of groups (much smaller than the size of the graph)

Summary of our methods

Problem                 Our method   Approx. guarantee   Running time
P1, P2 (LT model)       GreedyLT     (1-1/e)-approx.     O(mnL|V|)
P3 (spectral radius)    SDP          constant factor     O(|V|^4 polylog(|V|))
P3 (spectral radius)    LP           heuristic           O(n^4)
P4 (spectral radius)    QP           heuristic           O(n^4)

m: number of vaccines (budget); n: number of groups; V: node set; L: number of simulations for the greedy algorithm

Outline
- Motivation
- Problem Definition
- Our Proposed Methods
- Experiments
- Conclusion

Experiments: datasets
- Different domains with a range of sizes
  - SBM: Stochastic Block Model
  - PROTEIN: protein-protein interaction network
  - OREGON: Oregon AS router graph
  - YOUTUBE: friendship network
  - PORTLAND and MIAMI: epidemiology contact networks
    - Large urban social-contact graphs used in national smallpox modeling studies [Eubank+, 2004]
Each dataset has its natural division into groups.

        SBM     PROTEIN   OREGON   YOUTUBE   PORTLAND      MIAMI
|V|     1,500   2,361     10K      50K       0.5 million   0.6 million
|E|     5,000   7,182     22K      450K      1.6 million   2.1 million
Group

Experiments: baselines
- RANDOM
  - Assign vaccines to groups uniformly at random
- DEGREE
  - Independently assign vaccines to groups based on the average degree of each group
- EIGEN
  - Independently assign vaccines to groups based on the average eigenscore of each group
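
A rough sketch of the first two baselines follows. The slide does not say exactly how "based on the average degree" is turned into an allocation; the sketch assumes each vaccine is assigned independently with probability proportional to the group's average degree. That interpretation, the function names, and the sample values are assumptions for illustration.

```python
import random

def random_baseline(num_groups, budget, rng):
    """RANDOM: each vaccine goes to a group chosen uniformly at random."""
    x = [0] * num_groups
    for _ in range(budget):
        x[rng.randrange(num_groups)] += 1
    return x

def degree_baseline(avg_degree, budget, rng):
    """DEGREE (assumed interpretation): each vaccine is assigned
    independently, with probability proportional to the group's
    average degree."""
    x = [0] * len(avg_degree)
    for g in rng.choices(range(len(avg_degree)), weights=avg_degree, k=budget):
        x[g] += 1
    return x

rng = random.Random(0)
uniform_alloc = random_baseline(num_groups=3, budget=10, rng=rng)
# Hypothetical average degrees for three groups.
degree_alloc = degree_baseline([5.0, 0.0, 1.0], budget=10, rng=rng)
print(uniform_alloc, degree_alloc)
```

The EIGEN baseline would follow the same pattern with average eigenscores in place of average degrees.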

Results: effectiveness (P1, P2)
[Figures: P1/edge on YOUTUBE; P2/node on PORTLAND. Y-axis: ratio of infected nodes (lower is better). Annotation: 25K nodes.]
GREEDY-LT (our method) consistently outperforms the baseline algorithms.

Results: effectiveness (P3, P4)
[Figures: P3/edge on PROTEIN; P4/node on PORTLAND. Y-axis: ratio of EigenDrop (lower is better).]
SDP, LP and QP (our methods) consistently outperform the baseline algorithms.

Results: varying the number of groups
[Figures: P2/node on YOUTUBE; P4/node on PORTLAND. Lower is better.]
Our algorithms consistently outperform the baseline algorithms as the number of groups changes.

Result: case study on age groups
Vaccine distributions for P4 on realistic epidemiological networks (PORTLAND and MIAMI; budget = 10000).
Observations:
1. Our methods choose older people.
2. Other methods tend to distribute vaccines uniformly.
The results match the current practice in which CDC targets vulnerable people.

Outline
- Motivation
- Problem Definition
- Our Proposed Methods
- Experiments
- Conclusion

Conclusion: group immunization
- Problem formulations
  - Group immunization policy
    - Select groups to distribute vaccines
    - Randomly remove edges/nodes from groups
  - Edge deletion and node deletion
  - Minimize the epidemic size and the spectral radius
- Near-optimal algorithms
  - Greedy algorithm under LT model
    - (1-1/e)-approximation
  - Edge deletion for min. spectral radius
    - SDP: good approximation but slow
    - LP: fast
  - Node deletion for min. spectral radius
    - QP: fast

Any questions?
Code at:
Funding:
Yao Zhang, B. Aditya Prakash, Abhijin Adiga, Anil Vullikanti

Backup slides

Why not epidemic size for cascade models
- We do not specifically use the IC model, for two primary reasons:
  - The spectral radius naturally generalizes the corresponding individual-level immunization problems studied in past literature ([Tong+ ICDM2010], [Tong+ CIKM2012])
  - Using the spectral radius allows us to immediately formulate a general problem for multiple cascade-style models (like SIR/SIS/IC)
    - We can ignore the differences in their exact spreading processes

Submodular over integer lattice
- [Soma+ ICML2014]: a function f over the integer lattice is submodular if: (condition on slide, stated for an element s in a vector)
- Different from the diminishing-returns property in our paper