Published by Bartholomew Lynch. Modified over 9 years ago.
1
Scalable Vaccine Distribution in Large Graphs given Uncertain Data
Yao Zhang, B. Aditya Prakash
Department of Computer Science, Virginia Tech
CIKM, Shanghai, November 6, 2014
2
Outline
  Motivation
  Problem Definition
  Our Proposed Methods
  Experiments
  Conclusion
Zhang and Prakash, CIKM2014
3
Propagation on networks
  Information spreads over social networks
    E.g., millions of photos and messages are shared
  Viruses break out over population networks
    E.g., WHO estimates 5,000 to 10,000 new Ebola cases weekly in West Africa by the first week of December
  [Figures from leverage.com and the Economist]
4
Motivation I: Diffusion models – Social Media
  In social media, information spreads over friendship networks
    E.g., a rumor spreads over the Facebook friendship network
  Independent cascade model (IC) [Kempe+, KDD03]
    Weights β_ij: propagation probability from i to j
    Each node has only one chance to infect its neighbors
  [Figure: rumor spreading, with edge weights β_12 and β_13]
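The IC model described on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `simulate_ic`, the adjacency-dict layout, and the toy graph are all assumptions made for the example.

```python
import random
from collections import deque


def simulate_ic(edges, seeds, rng):
    """Run one independent-cascade simulation.

    edges: dict mapping node i -> list of (j, beta_ij) pairs (assumed layout).
    seeds: iterable of initially infected nodes.
    Returns the set of nodes infected when the cascade stops.
    """
    infected = set(seeds)
    frontier = deque(infected)
    while frontier:
        i = frontier.popleft()
        # Node i gets exactly one chance to infect each healthy neighbor.
        for j, beta in edges.get(i, []):
            if j not in infected and rng.random() < beta:
                infected.add(j)
                frontier.append(j)
    return infected


# Hypothetical toy graph: node 1 points to nodes 2 and 3, node 2 points to node 4.
toy_edges = {1: [(2, 0.5), (3, 0.5)], 2: [(4, 1.0)]}
final = simulate_ic(toy_edges, seeds=[1], rng=random.Random(0))
```

Because each node leaves the frontier after a single pass over its neighbors, the "only one chance" rule is enforced by construction.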
5
Motivation I: Diffusion models – Epidemiology
  In epidemiology, a virus spreads over population contact networks
    E.g., Ebola, chickenpox, etc. may spread when people come into contact
  SIR model [Anderson+ 1991]: Susceptible-Infectious-Recovered
    Weights β_ij: propagation probability from i to j
    Recovery probability δ for each infected node
  [Figure: Ebola spreading, with edge weights β_12, β_13 and recovery probability δ]
6
Motivation II: Immunization
  Epidemiology: Centers for Disease Control (CDC)
    Which people to vaccinate to control the spread of Ebola?
  Social Media: Twitter
    Which people to warn to stop rumors like “Wall Street crashing”?
  Common abstract goal: “find best nodes to remove”
7
Immunization Strategies
  Pre-emptive strategy: choose nodes before the epidemic starts
    Netshield [Tong+ 2010]
    Minimizes the largest eigenvalue of the graph, which determines the epidemic threshold [Prakash+ 2011] above which a lot of people get infected
  [Figure: which nodes to vaccinate?]
8
Immunization Strategies
  Pre-emptive strategy: choose nodes before the epidemic starts
    Netshield [Tong+ 2010]
  Data-aware strategy: choose nodes knowing the current infections (which nodes are infected)
    DAVA-fast algorithm [Zhang and Prakash 2014]
  However…
  [Figure: which nodes to vaccinate?]
9
Motivation III: Real Data is Uncertain
  Epidemiology: public-health surveillance
    Surveillance Pyramid [Nishiura+, PLoS ONE 2011], with levels including Hospital, Lab, CDC, and CNN headlines
    Each level has a certain probability of missing some truly infected people
  We don’t know exactly who is infected
10
Motivation III: Real Data is Uncertain
  Social Media: Twitter
    Because only uniform samples of tweets are available [Morstatter+, ICWSM 2013], relevant ‘infected’ tweets may be missed
  We don’t know exactly who is infected
  [Figure: Tweets, Sampling, Sampled Tweets, with some tweets missing]
11
Motivation III: Real Data is Uncertain
  How to design an immunization strategy in the presence of uncertainty?
    We are not sure whether some nodes are infected
    A more realistic intervention setting
  Challenge
    We cannot vaccinate/warn people who are already infected
  We call it the Uncertain Data-Aware Vaccination problem (this paper)
  [Figure: which nodes to vaccinate?]
12
Outline
  Motivation
  Problem Definition
    Uncertainty Models
    Problem Formulation
  Our Proposed Methods
  Experiments
  Conclusion
13
Uncertainty Models
  Uniform: every node has an identical probability of being infected
    E.g., Twitter API
  Surveillance: each node takes a probability from a set P
    E.g., surveillance pyramid
  Prop-Deg: the probability of being infected is proportional to a node’s degree
    E.g., people with more connections have a higher probability of being infected
  General: each node has its own infection probability
  We assume factorizable distributions, i.e., the joint distribution is a product of per-node terms
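The uncertainty models on this slide can be read as different ways of assigning a per-node infection probability. The sketch below is illustrative only: the function names are hypothetical, the Surveillance model is assumed to pick a level from P at random per node (the slides do not say how levels are assigned), and Prop-Deg is normalized by the maximum degree.

```python
import random


def uniform_probs(nodes, p):
    """Uniform: every node gets the same infection probability p."""
    return {v: p for v in nodes}


def surveillance_probs(nodes, levels, rng):
    """Surveillance: each node takes a probability from the set `levels`.

    Assumption: the level is picked uniformly at random per node.
    """
    return {v: rng.choice(levels) for v in nodes}


def prop_deg_probs(degree):
    """Prop-Deg: probability proportional to degree, scaled by the max degree."""
    d_max = max(degree.values())
    return {v: d / d_max for v, d in degree.items()}


def sample_infected(probs, rng):
    """Draw one 'possible world' of infected nodes.

    Factorizable assumption: one independent coin flip per node.
    """
    return {v for v, p in probs.items() if rng.random() < p}


rng = random.Random(0)
probs = prop_deg_probs({"a": 1, "b": 4, "c": 2})
world = sample_infected(probs, rng)
```

The General model needs no helper: it is simply any dict of per-node probabilities passed to `sample_infected`.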
14
Problem Formulation
  Uncertain Data-Aware Vaccination Problem (UDAV)
    Given: graph G(V,E), uncertainty model U, infected node set I
    Find: the best set S of k nodes to vaccinate
    Such that: the final expected epidemic size is minimized
  Formally: minimize the expected epidemic size, i.e., the expected number of infected nodes after vaccination in G_i, taken over the “possible” worlds G_i
  [Figure: which two nodes to vaccinate? Uncertain nodes carry infection probabilities 0.5 and 0.8]
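The formal objective is not preserved in this transcript. One way to write the stated goal, using notation that is assumed here rather than taken from the paper, is the following, where σ_{G_i}(S) denotes the expected number of infected nodes in possible world G_i after vaccinating S, I_i is the set of infected nodes in G_i, and p_v is the infection probability of node v under a factorizable U:

```latex
% Assumed notation; the paper's own symbols may differ.
S^{*} \;=\; \arg\min_{S \subseteq V,\; |S| = k} \;\sum_{G_i} \Pr(G_i)\,\sigma_{G_i}(S),
\qquad
\Pr(G_i) \;=\; \prod_{v \in I_i} p_v \prod_{v \notin I_i} (1 - p_v)
```

With two uncertain nodes at probabilities 0.5 and 0.8, the sum ranges over four possible worlds, with weights 0.4, 0.4, 0.1, and 0.1.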
15
Complexity of UDAV
  NP-hard, and cannot be approximated within an absolute error
  A special case of UDAV (the deterministic case) is NP-hard [Zhang+ 2014]
16
Outline
  Motivation
  Problem Definition
  Our Proposed Methods
  Experiments
  Conclusion
17
Overview of proposed methods
  UDAV is a stochastic optimization problem
  Sampling-based method: the Sample Average Approximation (SAA) framework
  Expectation-based method: the expected “situation”
  Hedging uncertainty
  [Figure: which two nodes to vaccinate? Uncertain nodes carry infection probabilities 0.5 and 0.8]
18
Outline
  Motivation
  Problem Definition
  Our Proposed Methods
    Sample-Cascade
    Expect-Max
  Experiments
  Conclusion
19
Sample-Cascade: Idea
  Idea: sample deterministic cases, and take the average
  UDAV can also be formulated as maximizing the expected benefit, where the benefit is that of vaccinating the healthy node set S_i in the deterministic graph G_i
  Working on the sampled graphs: Sample 1, ..., Sample L
  [Figure: uncertain nodes with probabilities 0.5 and 0.8 give 4 “possible” worlds]
20
Sample-Cascade
  Issue 1: how to approximate the expected benefit
  Solution: use its lower bound (Lemma 1), the expected benefit on the dominator tree of G_i
  Dominator tree: u dominates v if every path from the root to v contains u; the tree links each node to its immediate dominator (see [Lengauer and Tarjan, 1979]). Here, the root is the set of infected nodes.
  Working on trees: build the dominator tree of each sampled graph
  See paper for details
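To make the dominator-tree definition concrete, here is a deliberately naive sketch. It is not the Lengauer-Tarjan algorithm cited on the slide: it tests each node by blocking it and re-running a reachability search, which costs O(|V|(|V|+|E|)). The function names are hypothetical, and the infected nodes are assumed to be merged into a single root.

```python
def reachable(succ, root, blocked=None):
    """Nodes reachable from root without passing through `blocked`."""
    seen = {root}
    stack = [root]
    while stack:
        u = stack.pop()
        for v in succ.get(u, []):
            if v != blocked and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def dominators(succ, root):
    """dom[v] = nodes that lie on every path from root to v (including v)."""
    all_reach = reachable(succ, root)
    dom = {v: {root, v} for v in all_reach}
    for u in all_reach:
        if u == root:
            continue
        # Everything that becomes unreachable when u is blocked is dominated by u.
        for v in all_reach - reachable(succ, root, blocked=u):
            dom[v].add(u)
    return dom


def dominator_tree(succ, root):
    """Parent pointers: each node maps to its immediate dominator."""
    dom = dominators(succ, root)
    parent = {}
    for v, ds in dom.items():
        if v == root:
            continue
        # Dominators of v form a chain; the closest one has the largest dom set.
        parent[v] = max(ds - {v}, key=lambda d: len(dom[d]))
    return parent


# Diamond example: two paths from r to c, then a single edge c -> d.
diamond = {"r": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["d"]}
tree = dominator_tree(diamond, "r")
```

In the diamond example neither `a` nor `b` dominates `c`, because each can be bypassed through the other, so `c` hangs directly off the root.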
21
Sample-Cascade
  Algorithm:
    1. Sample graphs G_i from G and U, and build the dominator tree of each G_i
    2. Select the node a* with the maximum average benefit over the dominator trees
    3. Remove a* from G and add it to S
    4. Go to Step 2 until |S|=k
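The greedy loop above can be sketched as follows. This is a simplified stand-in, not Sample-Cascade itself: instead of the dominator-tree lower bound, it scores each candidate by plain Monte Carlo simulation of the IC model on each sampled world. All names are hypothetical, and `probs` is assumed to contain every node of the graph.

```python
import random


def epidemic_size(edges, infected0, vaccinated, rng):
    """Size of one IC cascade in one sampled world."""
    # Vaccinating an already infected node has no effect.
    removed = set(vaccinated) - set(infected0)
    infected = set(infected0)
    stack = list(infected)
    while stack:
        i = stack.pop()
        for j, beta in edges.get(i, []):
            if j not in infected and j not in removed and rng.random() < beta:
                infected.add(j)
                stack.append(j)
    return len(infected)


def sample_average_greedy(edges, probs, k, num_samples, seed=0):
    """Greedily pick k nodes that minimize the average sampled epidemic size."""
    rng = random.Random(seed)
    # Sample "possible" worlds: one independent coin flip per node.
    worlds = [{v for v, p in probs.items() if rng.random() < p}
              for _ in range(num_samples)]
    # Only nodes that are not definitely infected are worth vaccinating.
    candidates = [v for v, p in probs.items() if p < 1.0]
    chosen = []
    for _ in range(k):
        remaining = [v for v in candidates if v not in chosen]
        if not remaining:
            break

        def avg_size(v):
            return sum(epidemic_size(edges, w, chosen + [v], rng)
                       for w in worlds) / num_samples

        chosen.append(min(remaining, key=avg_size))
    return chosen


# Hypothetical chain 0 -> 1 -> 2 -> 3 where node 0 is certainly infected.
chain = {0: [(1, 1.0)], 1: [(2, 1.0)], 2: [(3, 1.0)]}
picked = sample_average_greedy(chain, {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0},
                               k=1, num_samples=5)
```

On the chain, vaccinating node 1 cuts the cascade immediately after the infected node, so it is the node the greedy step selects.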
22
Sample-Cascade
  Issue 2: the number of samples l
  Solution: bound l using Hoeffding's inequality
    Worst case: l = O(|V|^2)
  Running time: O(l(k|E| + k|V| + |V| log|V|))
  Accurate, but too slow for large networks!
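The slide does not give the exact bound, but the standard two-sided Hoeffding inequality shows where a quadratic sample count can come from. For samples bounded in an interval of width R, the sample mean is within ε of the true mean with probability at least 1 − η once l ≥ R² ln(2/η) / (2ε²). The sketch below assumes this textbook form; the function name and the parameter `fail_prob` (η) are hypothetical.

```python
import math


def hoeffding_samples(value_range, eps, fail_prob):
    """Samples needed so the mean is within eps with prob. >= 1 - fail_prob.

    value_range: width R of the interval containing each sample
                 (for an epidemic size this can be as large as |V|).
    """
    return math.ceil(value_range ** 2 * math.log(2 / fail_prob) / (2 * eps ** 2))


# With values in [0, 1], eps = 0.1 and 95% confidence:
small = hoeffding_samples(1, 0.1, 0.05)
```

Since the epidemic size can range up to |V|, setting R = |V| for a fixed ε and η gives a sample count that grows as |V|², which is consistent with the worst case quoted on the slide.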
23
Outline
  Motivation
  Problem Definition
  Our Proposed Methods
    Sample-Cascade
    Expect-Max
  Experiments
  Conclusion
24
Expect-Max: Idea
  Idea: construct the expected “situation” (graph)
    Create a “super node” and add an edge from the super node to each possibly infected node, weighted by that node’s infection probability (0.5, 0.8, and 1.0 in the example)
  Lemma: when the budget is 1, UDAV can be solved exactly on the expected graph G_E
  How to calculate the benefit on G_E?
  See more details in the paper
  [Figure: original graph and expected graph G_E with the super node]
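A minimal sketch of the super-node construction follows. It assumes that the weight of each super-node edge equals the node's infection probability, as the figure values suggest; the paper may define the weights differently. The name `SUPER` and the adjacency-dict layout are hypothetical.

```python
SUPER = "super"  # hypothetical label for the added super node


def build_expected_graph(edges, probs):
    """Return a copy of the graph with a super node attached.

    edges: dict node -> list of (neighbor, beta) pairs.
    probs: dict node -> infection probability under the uncertainty model.
    """
    expected = {u: list(nbrs) for u, nbrs in edges.items()}
    # One edge from the super node to every node that may be infected.
    expected[SUPER] = [(v, p) for v, p in probs.items() if p > 0]
    return expected


original = {"a": [("b", 0.3)], "b": [("c", 0.3)]}
g_e = build_expected_graph(original, {"a": 0.5, "b": 0.8, "c": 0.0})
```

The single super node then plays the role of the root, so tree-based or spectral computations can be run once on G_E instead of once per sampled world.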
25
Calculating Benefit on the Expected Graph
  We propose two methods to calculate the benefit:
    Expect-Dom: using the dominator tree
    Expect-Eig: using the drop of the first eigenvalue
26
Expect-Dom
  Idea: use the benefit on the dominator tree of G_E to approximate the benefit on the expected graph G_E
  Steps:
    1. G_E = construct the expected graph
    2. T = build a dominator tree of G_E
    3. Select v with max. benefit on T
    4. Remove v from G
    5. Go to Step 3 until |S|=k
27
Expect-Eig
  Idea: use the drop of the first eigenvalue to approximate the benefit on the expected graph G_E
    The first eigenvalue measures the threshold of the epidemic
    The drop can be computed fast [Tong+, ICDM 2010]
  Lemma: the number of newly infected nodes is bounded by the first eigenvalue (details in the paper)
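The eigen-drop idea can be illustrated with a brute-force sketch. This is not the fast approximation of [Tong+, ICDM 2010]: it recomputes the first eigenvalue by power iteration after removing each candidate. It also assumes a symmetric weighted adjacency structure, which may not hold for the directed expected graph. Function names are hypothetical.

```python
import math


def first_eigenvalue(adj, removed=frozenset(), iters=200):
    """Estimate the largest eigenvalue of a symmetric weighted graph.

    adj: dict node -> dict of neighbor -> weight (assumed symmetric).
    removed: nodes to treat as deleted.
    """
    nodes = [v for v in adj if v not in removed]
    if not nodes:
        return 0.0
    x = {v: 1.0 / math.sqrt(len(nodes)) for v in nodes}
    est = 0.0
    for _ in range(iters):
        # Multiply by (A + I); the shift avoids oscillation on bipartite graphs.
        y = {v: x[v] + sum(w * x[u] for u, w in adj[v].items() if u not in removed)
             for v in nodes}
        norm = math.sqrt(sum(val * val for val in y.values()))
        est = norm - 1.0  # undo the shift
        x = {v: val / norm for v, val in y.items()}
    return est


def best_eigen_drop(adj, candidates):
    """Candidate whose removal lowers the first eigenvalue the most."""
    base = first_eigenvalue(adj)
    return max(candidates, key=lambda v: base - first_eigenvalue(adj, removed={v}))


# Star graph: center "c" with three leaves.
star = {"c": {"a": 1, "b": 1, "d": 1}, "a": {"c": 1}, "b": {"c": 1}, "d": {"c": 1}}
best = best_eigen_drop(star, ["a", "b", "c", "d"])
```

On the star, removing the center disconnects every edge and drops the first eigenvalue from √3 to 0, so the center is selected.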
28
Expect-Eig
  Steps:
    1. G_E = construct the expected graph
    2. Select v with max. drop of the first eigenvalue
    3. Remove v from G
    4. Go to Step 2 until |S|=k
29
Expect-Dom vs. Expect-Eig
  Let α be the support of U: the percentage of nodes that may be initially infected
  [Figure: example with α=0.5]
30
Expect-Dom vs. Expect-Eig
  Let α be the support of U: the percentage of nodes that may be initially infected
  As α increases:
    Observation I: Expect-Dom becomes worse
      Intuition: when α is equal to 0, UDAV reduces to the deterministic case (can be solved by DAVA-fast [Zhang and Prakash 2014])
    Observation II: Expect-Eig becomes better
      Intuition: as α increases, we have more and more uncertainty, which is close to the pre-emptive case (can be solved by Netshield [Tong+ 2010])
  More formal justification in the paper
31
Expect-Max: a hybrid algorithm
  Idea: put Expect-Dom and Expect-Eig together
    They are complementary for different distributions and different networks (we don’t know where the cross point is)
    Pick the better one between Expect-Dom and Expect-Eig
  Running time (subquadratic): O(k(|V|+|E|) + |V| log|V| + T)
32
Extending to SIR
  Our methods can be extended to the SIR model
  Idea: use an equivalent IC model with a suitably chosen propagation probability
  See paper for details
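The propagation probability itself is not preserved in this transcript. As an illustration of how such an equivalence can arise, and not necessarily the formula used in the paper, assume discrete time, where in each step an infected node i first tries to infect neighbor j with probability β_ij and then recovers with probability δ. The chance that i ever infects j is then a geometric sum:

```latex
% Assumption: per-step infection attempt (\beta_{ij}) followed by recovery (\delta).
p_{ij} \;=\; \sum_{t=1}^{\infty} \beta_{ij}\,\bigl[(1-\beta_{ij})(1-\delta)\bigr]^{t-1}
\;=\; \frac{\beta_{ij}}{1-(1-\beta_{ij})(1-\delta)}
\;=\; \frac{\beta_{ij}}{\beta_{ij}+\delta-\beta_{ij}\delta}
```

Two sanity checks on this expression: with δ = 1 every node recovers after one step and p_ij = β_ij, which is the single chance of the IC model; with δ = 0 nodes never recover and p_ij = 1 for any β_ij > 0.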
33
Outline
  Motivation
  Problem Definition
  Our Proposed Methods
  Experiments
  Conclusion
34
Experiments: datasets
  Social Media
    AS router graph: OREGON
    Hyperlink network: STANFORD
    Peer-to-peer network: GNUTELLA
    Friendship network: BRIGHTKITE
  Epidemiology
    PORTLAND and MIAMI: large urban social-contact graphs used in national smallpox modeling studies [Eubank+, 2004]

  Dataset      |V|           |E|
  KARATE       34            156
  OREGON       633           2,172
  STANFORD     8,929         53,829
  GNUTELLA     10,876        39,994
  BRIGHTKITE   59,228        0.2 million
  PORTLAND     0.5 million   1.6 million
  MIAMI        0.6 million   2.1 million

  Model: IC (social media datasets) and SIR (epidemiology datasets)
35
Experiments: setup
  Uncertainty models
    Uniform: p=0.6
    Surveillance: p is chosen from {0.1, 0.5}
    Prop-Deg: p_i = d_i/d_max
  Settings
    Pick 5% of nodes uniformly at random as infected
    Number of samples: 500
  See more details in the paper
36
Experiments: baselines
  W: the set of nodes that are not definitely infected (0<=p<1)
  OPTIMAL: brute-force algorithm that tries all possible cases (optimal; only run on KARATE)
  RANDOM: choose nodes from W uniformly at random
  DEGREE: choose the top-k nodes from W according to weighted degree
  PAGERANK: choose the top-k nodes from W by PageRank
  PER-PRANK: choose the top-k nodes from W by personalized PageRank with respect to the infected nodes
  DAVA-fast: a fast data-aware immunization method in the presence of already infected nodes [Zhang+, SDM 14]
37
Results: Sample-Cas
  Sample-Cas saves at least 90% of nodes compared to OPTIMAL
  Close to optimal
  [Figure: higher is better]
38
Results: Expect-Max: α matters
  R>1: Expect-Dom is better
  R<1: Expect-Eig is better
  R=1: cross point (different for different networks and different distributions)
  This is why we use Expect-Max
  [Figure: panels for STANFORD and BRIGHTKITE]
39
Results: Effectiveness
  Sample-Cas and Expect-Max consistently outperform the baseline algorithms.
  (See more results in the paper)
  [Figure: panels for GNUTELLA (IC) and MIAMI (SIR); higher is better; annotation: 10K nodes]
40
Results: Scalability
  [Figure: running time (sec.); lower is better; annotation: did not finish within 24 hours]
41
Outline
  Motivation
  Problem Definition
  Our Proposed Methods
  Experiments
  Conclusion
42
Conclusion
  Uncertain Data-Aware Vaccination
    Given: graph and uncertainty model
    Find: ‘best’ k nodes for vaccination
  Uncertainty models
    Uniform, Surveillance, Prop-Deg, General
  Proposed methods
    Sample-Cas: sampling graphs (slow, accurate)
    Expect-Max: constructing the expected graph (fast, subquadratic)
43
Any questions?
  Code at: http://people.cs.vt.edu/~yaozhang
  Yao Zhang, B. Aditya Prakash
  [Funding logos]