Scalable Vaccine Distribution in Large Graphs given Uncertain Data
Yao Zhang, B. Aditya Prakash
Department of Computer Science, Virginia Tech
CIKM, Shanghai, November 6, 2014

Outline: Motivation; Problem Definition; Our Proposed Methods; Experiments; Conclusion.
Zhang and Prakash, CIKM2014

Propagation on networks
Information spreads over social networks, e.g., millions of photos and messages are shared.
Virus outbreaks spread over population networks, e.g., the WHO estimates 5,000 to 10,000 new Ebola cases weekly in West Africa by the first week of December.
[Images from leverage.com and the Economist]

Motivation I: Diffusion models – Social Media
In social media, information spreads over friendship networks, e.g., a rumor spreads over the Facebook friendship network.
Independent cascade (IC) model [Kempe+, KDD03]:
Weights β_ij: the propagation probability from node i to node j.
Each node has only one chance to infect its neighbors.
[Figure: rumor spreading, with edge weights β_12 and β_13]
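To make the IC rule concrete, here is a minimal simulation sketch. The dict-of-dicts graph layout and the function name `simulate_ic` are assumptions made for illustration; they do not come from the slides or the paper's code.

```python
import random


def simulate_ic(graph, seeds, rng=None):
    """Run one independent-cascade simulation.

    graph: dict mapping node i -> {neighbor j: propagation probability beta_ij}
           (hypothetical layout chosen for this sketch)
    seeds: iterable of initially infected nodes
    Returns the set of all nodes infected by the end of the cascade.
    """
    rng = rng or random.Random()
    infected = set(seeds)
    frontier = list(infected)
    while frontier:
        next_frontier = []
        for i in frontier:
            # Node i is processed exactly once, so it gets a single chance
            # to infect each neighbor j that is still healthy.
            for j, beta_ij in graph.get(i, {}).items():
                if j not in infected and rng.random() < beta_ij:
                    infected.add(j)
                    next_frontier.append(j)
        frontier = next_frontier
    return infected
```

With probabilities of 0 or 1 the cascade is deterministic, which is a convenient way to sanity-check the sketch.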

Motivation I: Diffusion models – Epidemiology
In epidemiology, viruses spread over population contact networks, e.g., Ebola or chickenpox may spread when people come into contact.
SIR (Susceptible-Infectious-Recovered) model [Anderson+ 1991]:
Weights β_ij: the propagation probability from node i to node j.
Recovery probability δ for each infected node.
[Figure: Ebola spreading, with edge weights β_12 and β_13 and recovery probability δ]
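The SIR dynamics can be sketched in the same style. This version assumes discrete time steps in which every infectious node first tries to infect its susceptible neighbors and then recovers with probability δ; the slides do not spell out the update order, so that ordering, the graph layout, and the name `simulate_sir` are assumptions.

```python
import random


def simulate_sir(graph, seeds, delta, rng=None, max_steps=10_000):
    """Discrete-time SIR sketch.

    graph: dict mapping node i -> {neighbor j: propagation probability beta_ij}
    delta: per-step recovery probability of an infectious node
    Returns the set of nodes that were ever infected.
    """
    rng = rng or random.Random()
    infectious = set(seeds)
    recovered = set()
    for _ in range(max_steps):
        if not infectious:
            break
        newly_infected = set()
        for i in infectious:
            for j, beta_ij in graph.get(i, {}).items():
                susceptible = j not in infectious and j not in recovered
                if susceptible and rng.random() < beta_ij:
                    newly_infected.add(j)
        # Each currently infectious node recovers with probability delta.
        still_infectious = {i for i in infectious if rng.random() >= delta}
        recovered |= infectious - still_infectious
        infectious = still_infectious | newly_infected
    return recovered | infectious
```

Setting `delta=1.0` gives every node exactly one infectious step, which mirrors the one-chance behavior of the IC model.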

Motivation II: Immunization
Epidemiology, Centers for Disease Control (CDC): which people should be vaccinated to control the spread of Ebola?
Social media, Twitter: which people should be warned to stop rumors like “Wall Street crashing”?
Common abstract goal: “find the best nodes to remove”.

Immunization Strategies
Pre-emptive strategy: choose nodes before the epidemic starts.
Netshield [Tong+ 2010] targets the epidemic threshold, which is governed by the largest eigenvalue of the graph [Prakash+ 2011]: it removes nodes so as to reduce that eigenvalue. Above the threshold, a lot of people get infected.
[Figure: which nodes to vaccinate?]

Immunization Strategies
Pre-emptive strategy: choose nodes before the epidemic starts. Example: Netshield [Tong+ 2010].
Data-aware strategy: choose nodes knowing the current infections (which nodes are infected). Example: the DAVA-fast algorithm [Zhang and Prakash 2014].
[Figure: which nodes to vaccinate?]
However…

Motivation III: Real Data is Uncertain
Epidemiology: public-health surveillance.
We don’t know exactly who is infected.
Surveillance pyramid [Nishiura+, PLoS ONE 2011]: each level has a certain probability of missing some truly infected people.
[Figure: surveillance pyramid with levels CDC, Lab, Hospital; CNN headlines]

Motivation III: Real Data is Uncertain
Social media, Twitter: because tweets are uniformly sampled [Morstatter+, ICWSM 2013], the relevant ‘infected’ tweets may be missed.
We don’t know exactly who is infected.
[Figure: tweets are sampled into a smaller set of sampled tweets; some are missing]

Motivation III: Real Data is Uncertain
How should an immunization strategy be designed in the presence of uncertainty?
We are not sure whether some nodes are infected, which makes for a more realistic intervention setting.
Challenge: we cannot vaccinate or warn people who are already infected.
We call this the Uncertain Data-Aware Vaccination Problem (this paper).
[Figure: which nodes to vaccinate?]

Outline: Motivation; Problem Definition (Uncertainty Models, Problem Formulation); Our Proposed Methods; Experiments; Conclusion.

Uncertainty Models
Uniform: every node has an identical probability of being infected, e.g., the Twitter API.
Surveillance: each node takes a probability from a set P, e.g., the surveillance pyramid.
Prop-Deg: the probability of being infected is proportional to a node’s degree, e.g., people with more connections have a higher probability of being infected.
General: each node has its own infection probability.
We assume factorizable distributions, i.e., each node’s infection status is independent of the others’.
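The models above can be sketched as functions that assign each node an infection probability, plus a sampler that draws one possible set of infected nodes. The function names are hypothetical, the Surveillance model is assumed here to pick a level uniformly at random from P, and Prop-Deg uses p_i = d_i / d_max as stated on the experiment-setup slide.

```python
import random


def uniform_model(nodes, p):
    """Uniform: every node gets the same infection probability p."""
    return {v: p for v in nodes}


def surveillance_model(nodes, level_probs, rng):
    """Surveillance: each node takes one probability from the set P.

    Assumption of this sketch: the level is picked uniformly at random.
    """
    choices = sorted(level_probs)
    return {v: rng.choice(choices) for v in nodes}


def prop_deg_model(graph, nodes):
    """Prop-Deg: p_i = d_i / d_max, with d_i the degree of node i."""
    degree = {v: len(graph.get(v, ())) for v in nodes}
    d_max = max(degree.values(), default=0)
    return {v: (degree[v] / d_max if d_max else 0.0) for v in nodes}


def sample_infected(infect_prob, rng):
    """Draw one set of infected nodes.

    Because the distribution is factorizable, each node is sampled
    independently with its own probability.
    """
    return {v for v, p in infect_prob.items() if rng.random() < p}
```

The General model needs no helper: it is simply any dict mapping each node to its own probability.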

Problem Formulation
Uncertain Data-Aware Vaccination Problem (UDAV)
Given: graph G(V,E), uncertainty model U, infected node set I.
Find: the best set S of k nodes to vaccinate.
Such that: the final expected epidemic size is minimized.
Formally, the expected epidemic size is the expected number of infected nodes after vaccination in G_i, where each G_i is a “possible” world.
[Figure: which two nodes to vaccinate? Uncertain nodes carry infection probabilities 0.5 and 0.8]
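For a tiny graph, the objective can be evaluated exactly by enumerating every possible world. The sketch below makes a strong simplifying assumption that is not in the slides: every edge transmits with probability 1, so the epidemic size in a world is just the number of nodes reachable from the infected set. A vaccinated node that is already infected in a given world offers no protection there. All function names are hypothetical.

```python
from itertools import combinations, product


def reachable(graph, sources, removed):
    """Nodes reachable from `sources` without passing through `removed`."""
    seen = {s for s in sources if s not in removed}
    stack = list(seen)
    while stack:
        u = stack.pop()
        for v in graph.get(u, ()):
            if v not in seen and v not in removed:
                seen.add(v)
                stack.append(v)
    return seen


def expected_epidemic_size(graph, infect_prob, vaccinated):
    """Exact expectation over all possible worlds (exponential cost)."""
    nodes = list(infect_prob)
    total = 0.0
    for states in product([False, True], repeat=len(nodes)):
        weight = 1.0
        infected = set()
        for v, is_infected in zip(nodes, states):
            p = infect_prob[v]
            weight *= p if is_infected else 1.0 - p
            if is_infected:
                infected.add(v)
        if weight == 0.0:
            continue
        # Only vaccinated nodes that are healthy in this world are removed.
        healthy_vaccinated = set(vaccinated) - infected
        total += weight * len(reachable(graph, infected, healthy_vaccinated))
    return total


def brute_force_udav(graph, infect_prob, k):
    """Try every k-subset of nodes; only feasible for toy graphs."""
    return min(
        combinations(sorted(graph), k),
        key=lambda S: expected_epidemic_size(graph, infect_prob, S),
    )
```

On the path a → b → c → d with node a certainly infected, the best single node to vaccinate is b, since vaccinating a itself has no effect.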

Complexity of UDAV
UDAV is NP-hard, and cannot be approximated within an absolute error.
A special case of UDAV (equal to the deterministic case) is NP-hard [Zhang+ 2014].

Overview of proposed methods
UDAV is a stochastic optimization problem.
Hedging uncertainty:
Sampling-based method: the Sample Average Approximation (SAA) framework.
Expectation-based method: the expected “situation”.

Outline: Motivation; Problem Definition; Our Proposed Methods (Sample-Cascade, Expect-Max); Experiments; Conclusion.

Sample-Cascade: Idea
Idea: sample deterministic cases, and take the average.
UDAV can also be formulated in terms of the expected benefit: the benefit of vaccinating the healthy node set S_i in the deterministic graph G_i, averaged over the “possible” worlds.
Working on the sampled graphs: Sample 1, ..., Sample L.
[Figure: 4 “possible” worlds]
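The sample-average idea can be sketched as follows. Each sampled world here is a live-edge graph (every edge kept with its probability β_ij) together with a sampled infected set; this is a standard device for the IC model and is an assumption of the sketch, not necessarily the paper's exact sampling procedure. The function names are hypothetical.

```python
import random


def sample_world(graph, infect_prob, rng):
    """Draw one deterministic world: live edges plus an infected set."""
    live_edges = {
        u: [v for v, beta in nbrs.items() if rng.random() < beta]
        for u, nbrs in graph.items()
    }
    infected = {v for v, p in infect_prob.items() if rng.random() < p}
    return live_edges, infected


def epidemic_size(live_edges, infected, vaccinated=()):
    """Number of nodes reached from the infected set in one world."""
    blocked = set(vaccinated) - set(infected)  # only healthy nodes are protected
    seen = set(infected)
    stack = list(seen)
    while stack:
        u = stack.pop()
        for v in live_edges.get(u, ()):
            if v not in seen and v not in blocked:
                seen.add(v)
                stack.append(v)
    return len(seen)


def estimated_benefit(graph, infect_prob, vaccinated, num_samples, rng):
    """Average, over sampled worlds, of the nodes saved by vaccinating."""
    total = 0
    for _ in range(num_samples):
        live_edges, infected = sample_world(graph, infect_prob, rng)
        total += epidemic_size(live_edges, infected)
        total -= epidemic_size(live_edges, infected, vaccinated)
    return total / num_samples
```

The estimate becomes exact when all probabilities are 0 or 1, because every sampled world is then identical.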

Sample-Cascade
Issue 1: how to approximate the benefit.
Solution: use its lower bound (Lemma 1), the expected benefit on the dominator tree of G_i. See the paper for details.
Dominator tree: u is an ancestor of v in the tree when every path from the root to v contains u (see [Lengauer and Tarjan, 1979]). Here, the root is the set of infected nodes.
Working on trees: build the dominator tree of each sampled graph.
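A small sketch may help with the dominator-tree definition. It uses the simple iterative set-intersection method rather than the Lengauer-Tarjan algorithm cited on the slide, so it is only suitable for small graphs. The single `root` argument stands in for a node that represents the whole infected set, and subtree size equals the benefit only under the assumption that every edge transmits with certainty; the paper's expected benefit also accounts for edge probabilities.

```python
def immediate_dominators(graph, root):
    """Return {node: immediate dominator} for nodes reachable from root.

    graph: dict mapping node -> iterable of successors.
    """
    # Collect reachable nodes in BFS order (the list grows while we scan it).
    reach = [root]
    seen = {root}
    for u in reach:
        for v in graph.get(u, ()):
            if v not in seen:
                seen.add(v)
                reach.append(v)
    preds = {v: [] for v in reach}
    for u in reach:
        for v in graph.get(u, ()):
            preds[v].append(u)
    # dom[v] = set of nodes that lie on every path from root to v.
    dom = {v: set(reach) for v in reach}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for v in reach[1:]:
            new = set.intersection(*(dom[p] for p in preds[v])) | {v}
            if new != dom[v]:
                dom[v] = new
                changed = True
    # Strict dominators form a chain; the closest one has the largest dom set.
    return {
        v: max(dom[v] - {v}, key=lambda u: len(dom[u]))
        for v in reach[1:]
    }


def subtree_sizes(idom, root):
    """Size of each node's subtree in the dominator tree."""
    size = {root: 1}
    size.update({v: 1 for v in idom})
    for v in idom:
        u = idom[v]
        while True:
            size[u] += 1
            if u == root:
                break
            u = idom[u]
    return size
```

In a diamond r → a, r → b, a → c, b → c, c → d, neither a nor b dominates c, so removing either one alone saves only that node, while removing c saves both c and d.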

Sample-Cascade
Algorithm:
1. Sample the graphs G_i from G and U, and build the dominator trees of the G_i.
2. Select the node a* with the maximum benefit over the dominator trees of the sampled graphs.
3. Remove a* from G.
4. Go to Step 2 until |S| = k.
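The select-remove-repeat loop in the steps above is a greedy procedure, which can be sketched generically. The `benefit` callable is a hypothetical stand-in for whatever estimate is used (for example, an average over the dominator trees of the sampled graphs); the sketch shows only the loop structure, not the paper's bookkeeping.

```python
def greedy_select(candidates, k, benefit):
    """Pick k nodes one at a time, each maximizing the marginal benefit.

    benefit: callable taking a list of chosen nodes and returning a number
             (hypothetical interface for this sketch).
    """
    chosen = []
    remaining = set(candidates)
    while len(chosen) < k and remaining:
        base = benefit(chosen)
        # Sorting makes tie-breaking deterministic.
        best = max(
            sorted(remaining),
            key=lambda a: benefit(chosen + [a]) - base,
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

Any set function can be plugged in as `benefit`, which keeps the loop independent of how each candidate is scored.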

Sample-Cascade
Issue 2: the number of samples l.
Solution: bound l using Hoeffding's inequality; in the worst case, l = O(|V|^2).
Running time: O(l(k|E| + k|V| + |V| log|V|)).
Accurate, but too slow for large networks!
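The slide's exact bound is not reproduced in this transcript, but the textbook two-sided Hoeffding bound shows where a quadratic dependence on |V| can come from. For i.i.d. samples taking values in a range of width R, l ≥ R² ln(2/η) / (2ε²) samples suffice for the sample mean to be within ε of the true mean with probability at least 1 − η. If the per-world benefit ranges over [0, |V|] and ε is a fixed absolute error, l grows as |V|². The parameter names below are assumptions of this sketch.

```python
import math


def hoeffding_num_samples(value_range, epsilon, failure_prob):
    """Samples needed so the sample mean is within epsilon of the true mean
    with probability at least 1 - failure_prob (two-sided Hoeffding bound).

    value_range: width of the interval containing each sample, e.g. |V|
    """
    return math.ceil(
        value_range ** 2 * math.log(2.0 / failure_prob) / (2.0 * epsilon ** 2)
    )
```

Multiplying `value_range` by 10 multiplies the required number of samples by roughly 100, which is the quadratic growth that makes the sampling approach slow on large networks.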

Outline: Motivation; Problem Definition; Our Proposed Methods (Sample-Cascade, Expect-Max); Experiments; Conclusion.

Expect-Max: Idea
Idea: construct the expected “situation” (graph).
Starting from the original graph, create a “super node” and add an edge from the super node to each possibly infected node, weighted by that node's infection probability (0.5, 0.8, and 1.0 in the example). The result is the expected graph G_E.
Lemma: when the budget = 1, UDAV can be solved exactly on the expected graph. See more details in the paper.
How to calculate the benefit on G_E?
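A minimal sketch of the expected-graph construction, assuming (as the figure suggests) that the super node gets one outgoing edge per possibly infected node, weighted by that node's infection probability. The function name, the `SUPER` label, and the graph layout are hypothetical.

```python
def build_expected_graph(graph, infect_prob, super_node="SUPER"):
    """Return a copy of `graph` with an added super node.

    graph: dict mapping node -> {neighbor: propagation probability}
    infect_prob: dict mapping node -> probability of being infected
    """
    if super_node in graph:
        raise ValueError("super node label collides with an existing node")
    # Copy so that the original graph is left untouched.
    expected = {u: dict(nbrs) for u, nbrs in graph.items()}
    # One edge from the super node to every node that may be infected.
    expected[super_node] = {v: p for v, p in infect_prob.items() if p > 0}
    return expected
```

Spread from the uncertain infected set then looks like an ordinary cascade that starts at the single super node.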

Calculating Benefit on the Expected Graph
We propose two methods to calculate it:
Expect-Dom: using the dominator tree.
Expect-Eig: using the drop of the first eigenvalue.

Expect-Dom
Idea: use the benefit on the dominator tree of G_E to approximate the benefit on the expected graph G_E.
Steps:
1. G_E = construct the expected graph.
2. T = build a dominator tree of G_E.
3. Select v with max. benefit on T.
4. Remove v from G.
5. Go to Step 3 until |S| = k.

Expect-Eig
Idea: use the drop of the first eigenvalue to approximate the benefit on the expected graph G_E.
The first eigenvalue measures the threshold of the epidemic, and its drop can be computed fast [Tong+, ICDM 2010].
Lemma: the number of newly infected nodes is bounded by the first eigenvalue (details in the paper).

Expect-Eig
Idea: use the drop of the first eigenvalue to approximate the benefit on the expected graph G_E.
Steps:
1. G_E = construct the expected graph.
2. Select v with the max. drop of the first eigenvalue.
3. Remove v from G.
4. Go to Step 2 until |S| = k.
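The eigenvalue-drop loop can be sketched with a plain power iteration. This is a brute-force illustration that recomputes the first eigenvalue for every candidate; it is not the fast approximation of [Tong+, ICDM 2010], and it runs on any weighted adjacency dict rather than specifically on G_E. The function names are hypothetical, and every node is assumed to appear as a key of `adj`.

```python
def largest_eigenvalue(adj, nodes, iters=200):
    """Estimate the first eigenvalue of the subgraph induced by `nodes`.

    adj: dict mapping node -> {neighbor: weight}
    Power iteration is run on A + I so that it does not oscillate on
    bipartite graphs; the shift of 1 is subtracted at the end.
    """
    nodes = list(nodes)
    if not nodes:
        return 0.0
    x = {v: 1.0 for v in nodes}
    lam = 0.0
    for _ in range(iters):
        y = {
            v: x[v] + sum(w * x[u] for u, w in adj.get(v, {}).items() if u in x)
            for v in nodes
        }
        norm = max(y.values())  # at least 1 because of the identity shift
        lam = norm - 1.0
        x = {v: val / norm for v, val in y.items()}
    return lam


def expect_eig_greedy(adj, candidates, k):
    """Greedily remove the node whose removal drops the eigenvalue most."""
    alive = set(adj)
    chosen = []
    for _ in range(k):
        pool = sorted(c for c in candidates if c in alive)
        if not pool:
            break
        base = largest_eigenvalue(adj, alive)
        best = max(pool, key=lambda v: base - largest_eigenvalue(adj, alive - {v}))
        chosen.append(best)
        alive.remove(best)
    return chosen
```

On a star graph the center is selected first, because removing it leaves only isolated leaves and drives the first eigenvalue to zero.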

Expect-Dom vs. Expect-Eig
Let α be the support of U: the percentage of nodes that may be initially infected.
[Figure: example with α = 0.5]

Expect-Dom vs. Expect-Eig
Let α be the support of U: the percentage of nodes that may be initially infected. As α increases:
Observation I: Expect-Dom becomes worse. Intuition: α equal to 0 is the deterministic case of UDAV (which can be solved by DAVA-fast [Zhang and Prakash 2014]).
Observation II: Expect-Eig becomes better. Intuition: as α increases, we have more and more uncertainty, which is close to the pre-emptive case (which can be solved by Netshield [Tong+ 2010]).
More formal justification in the paper.

Expect-Max: a hybrid algorithm
Idea: put Expect-Dom and Expect-Eig together.
As they are complementary for different distributions and different networks (we don’t know where the cross point is), pick the better one between Expect-Dom and Expect-Eig.
Running time (subquadratic): O(k(|V| + |E|) + |V| log|V| + T).
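The hybrid step amounts to running both selectors and keeping the better result. How "better" is measured is not stated on the slide, so the `score` callable below is a hypothetical placeholder, as are the other parameter names.

```python
def expect_max(select_dom, select_eig, score):
    """Run both selectors and return the higher-scoring node set.

    select_dom, select_eig: zero-argument callables returning a node list
    score: callable mapping a node list to a number (higher is better);
           a hypothetical stand-in for the paper's comparison criterion
    """
    dom_set = select_dom()
    eig_set = select_eig()
    return dom_set if score(dom_set) >= score(eig_set) else eig_set
```

Because both selectors run once and the comparison is a single evaluation, the hybrid costs roughly the sum of the two methods.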

Extending to SIR
Our methods can be extended to the SIR model.
Idea: use an equivalent IC model with a correspondingly modified propagation probability. See the paper for details.

Experiments: datasets
Social Media: AS router graph (OREGON), hyperlink network (STANFORD), peer-to-peer network (GNUTELLA), friendship network (BRIGHTKITE).
Epidemiology: PORTLAND and MIAMI, large urban social-contact graphs used in national smallpox modeling studies [Eubank+, 2004].

Dataset       |V|            |E|            Model
KARATE        34             156            IC
OREGON        633            2,172          IC
STANFORD      8,929          53,829         IC
GNUTELLA      10,876         39,994         IC
BRIGHTKITE    59,228         0.2 million    IC
PORTLAND      0.5 million    1.6 million    SIR
MIAMI         0.6 million    2.1 million    SIR

Experiments: setup
Uncertainty models:
Uniform: p = 0.6.
Surveillance: p is chosen from {0.1, 0.5}.
Prop-Deg: p_i = d_i / d_max.
Settings:
Uniformly at random, pick 5% of the nodes as infected.
Number of samples: 500.
See more details in the paper.

Experiments: baselines
W: the set of nodes that are not definitely infected (0 <= p < 1).
OPTIMAL: a brute-force algorithm which tries all possible cases (optimal; only run on KARATE).
RANDOM: choose nodes from W uniformly at random.
DEGREE: choose the top-k nodes from W according to weighted degree.
PAGERANK: choose the top-k nodes from W by PageRank.
PER-PRANK: choose the top-k nodes from W by personalized PageRank with respect to the infected nodes.
DAVA-fast: a fast data-aware immunization method in the presence of already infected nodes [Zhang+, SDM 14].
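As an illustration of how a baseline restricts itself to W, here is a sketch of DEGREE. It assumes that "weighted degree" means the sum of a node's outgoing edge weights, which the slide does not define, and the function name is hypothetical.

```python
def degree_baseline(graph, infect_prob, k):
    """Pick the k nodes of W with the largest weighted degree.

    graph: dict mapping node -> {neighbor: edge weight}
    infect_prob: dict mapping node -> infection probability (default 0)
    W is the set of nodes that are not definitely infected (p < 1).
    """
    W = [v for v in graph if infect_prob.get(v, 0.0) < 1.0]
    weighted_degree = {v: sum(graph[v].values()) for v in W}
    # Sort by descending weighted degree, breaking ties by node label.
    return sorted(W, key=lambda v: (-weighted_degree[v], v))[:k]
```

Nodes with p = 1 are skipped even when they have the highest degree, since vaccinating a definitely infected node has no effect.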

Results: Sample-Cas
Sample-Cas saves at least 90% as many nodes as OPTIMAL, i.e., it is close to optimal.
[Plot: higher is better]

Results: Expect-Max: α matters
[Plots: STANFORD and BRIGHTKITE]
R > 1: Expect-Dom is better.
R < 1: Expect-Eig is better.
R = 1: cross point (different for different networks and different distributions).
This is why we use Expect-Max.

Results: Effectiveness
[Plots: GNUTELLA (IC) and MIAMI (SIR); higher is better; annotation: 10K nodes]
Sample-Cas and Expect-Max consistently outperform the baseline algorithms.
(See more results in the paper.)

Results: Scalability
[Plot: running time (sec.); lower is better; annotation: did not finish within 24 hours]

Conclusion
Uncertain Data-Aware Vaccination. Given: a graph and an uncertainty model. Find: the ‘best’ k nodes for vaccination.
Uncertainty models: Uniform, Surveillance, Prop-Deg, General.
Proposed methods:
Sample-Cas: sampling graphs (slow, accurate).
Expect-Max: constructing the expected graph (fast, subquadratic).

Any questions?
Code at: http://people.cs.vt.edu/~yaozhang
Funding: [sponsor logos]
Yao Zhang, B. Aditya Prakash
