Making Diffusion Work for You: From Social Media to Epidemiology. B. Aditya Prakash, Computer Science, Virginia Tech. BSEC Conference, ORNL, Aug 26, 2015.


1 Making Diffusion Work for You: From Social Media to Epidemiology B. Aditya Prakash Computer Science Virginia Tech. BSEC Conference, ORNL, Aug 26, 2015

2 Networks are everywhere! Human Disease Network [Barabasi 2007]; Gene Regulatory Network [Decourty 2008]; Facebook Network [2010]; The Internet [2005].

3 Dynamical processes over networks are also everywhere!

4 Why do we care? Social collaboration; information diffusion; viral marketing; epidemiology and public health; cyber security; human mobility; games and virtual worlds; ecology; ...

5 Why do we care? (1: Epidemiology) Dynamical processes over networks: diseases over contact networks (SI model). CDC data: visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts [AJPH 2007].

6 Why do we care? (1: Epidemiology) Dynamical processes over networks. [US-MEDICARE NETWORK 2005]: each circle is a hospital; ~3000 hospitals; more than 30,000 patients transferred. Problem: given k units of disinfectant, whom to immunize?

7 Why do we care? (1: Epidemiology) [US-MEDICARE NETWORK 2005] Current practice vs. our method: ~6x fewer! Hospital-acquired infections took 99K+ lives and cost $5B+ (all per year).

8 Why do we care? (2: Online Diffusion) > 800m users, ~$1B revenue [WSJ 2010]; ~100m active users; > 50m users.

9 Why do we care? (2: Online Diffusion) Dynamical processes over networks: social media marketing. Celebrity ("Buy Versace™!") to followers.

10 Why do we care? (3: To change the world?) Dynamical processes over networks: social networks and collaborative action.

11 High Impact: Multiple Settings. Q. How to squash rumors faster? Q. How do opinions spread? Q. How to market better? Epidemic outbreaks; products/viruses; transmit s/w patches.

12 Research Theme. DATA: large real-world networks & processes. ANALYSIS: understanding. POLICY/ACTION: managing/utilizing.

13 Research Theme: Public Health. DATA: modeling # patient transfers. ANALYSIS: will an epidemic happen? POLICY/ACTION: how to control outbreaks?

14 Research Theme: Social Media. DATA: modeling tweets spreading. ANALYSIS: # cascades in future? POLICY/ACTION: how to market better?

15 In this talk. DATA: large real-world networks & processes. Q1: How to predict flu trends better? Q2: How does 'activity' evolve over time?

16 In this talk. POLICY/ACTION: utilizing. Q3: How to control outbreaks?

17 Outline. Motivation; Part 1: Learning Models (Empirical Studies); Part 2: Policy and Action (Algorithms); Conclusion.

18 Part 1: Empirical Studies. Q1: How to predict flu trends better? Q2: How does activity evolve over time?

19 Surveillance. How to estimate and predict flu trends? Population survey, hospital record, and lab survey feed into a surveillance report.

20 GFT & Twitter. Estimate flu trends using online electronic sources. Example tweets: "So cold today, I'm catching cold." "I have headache, sore throat, I can't go to school today." "My nose is totally congested, I have a hard time understanding what I'm saying."

21 Observation 1: States. There are different states in an infection cycle. SEIR model: 1. Susceptible 2. Exposed 3. Infected 4. Recovered.
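The four SEIR states on this slide can be illustrated with a minimal discrete-time simulation. This is only a sketch of the textbook compartmental model, not code from the talk: the function name `simulate_seir` and the rates `beta` (infection), `sigma` (incubation) and `gamma` (recovery) are assumptions introduced here, and the numbers in the example run are made up.

```python
def simulate_seir(s, e, i, r, beta, sigma, gamma, steps):
    """Discrete-time SEIR on population fractions (one Euler step per time unit)."""
    history = [(s, e, i, r)]
    for _ in range(steps):
        new_exposed = beta * s * i    # S -> E: contact with infected individuals
        new_infected = sigma * e      # E -> I: incubation period ends
        new_recovered = gamma * i     # I -> R: recovery
        s -= new_exposed
        e += new_exposed - new_infected
        i += new_infected - new_recovered
        r += new_recovered
        history.append((s, e, i, r))
    return history


# Illustrative run with made-up rates: 1% of the population starts infected.
trajectory = simulate_seir(0.99, 0.0, 0.01, 0.0,
                           beta=0.5, sigma=0.2, gamma=0.1, steps=100)
```

Each tuple in `trajectory` holds the fractions (S, E, I, R) at one time step; the four fractions always sum to 1 because every transition only moves mass between compartments.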

22 Observation 2: The gap between epidemiology and social media. Infection cases drop exponentially in epidemiology (Hethcote 2000), whereas keyword mentions drop in a power-law pattern in social media (Matsubara 2012).
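To see why this gap matters, it helps to compare the two decay shapes numerically. The sketch below is illustrative only: the rate `0.5` and the exponent `1.5` are assumed values chosen for the comparison, not parameters reported on the slide.

```python
import math


def exponential_decay(t, rate=0.5):
    """Exponential drop-off, as in classical epidemic models."""
    return math.exp(-rate * t)


def power_law_decay(t, exponent=1.5):
    """Power-law drop-off (heavy tail); defined here for t >= 1."""
    return t ** -exponent


# Print both curves at a few time points to compare their tails.
for t in (1, 2, 10, 50):
    print(t, exponential_decay(t), power_law_decay(t))
```

The exponential curve can sit above the power law early on, but it falls far below it for large `t`: a power-law tail keeps producing mentions long after an exponential process would have died out.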

23 HFSTM Model. Hidden Flu-State from Tweet Model (HFSTM): each word ($w$) in a tweet ($O_i$) can be generated by a background topic, non-flu related topics, or state related topics. Model components (labels from the model diagram): binary background switch, binary non-flu related switch, word distribution, latent state, initial prob., transit. prob., transit. switch.

24 HFSTM Model: generating tweets. Generate the state for a tweet, then generate the topic for a word. State: [S, E, I]. Topic: [Background, Non-flu, State]. Examples: S: "This restaurant is really good"; E: "The movie was good but it was freezing"; I: "I think I have flu".

25 EM-based algorithm: HFSTM-FIT. E-step: $A_t(i) = P(O_1, O_2, \dots, O_t, S_t = i)$; $B_t(i) = P(O_{t+1}, \dots, O_{T_u} \mid S_t = i)$; $\gamma_t(i) = P(S_t = i \mid O_u)$. M-step: other parameters such as state transition probabilities, topic distributions, etc. Parameters learned; inference.
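The E-step quantities above have the form of the standard forward-backward recursion for hidden Markov models. The sketch below shows that generic recursion only; it is not the HFSTM-FIT implementation. It assumes the per-tweet likelihoods `emit_probs[t][i]` = P(O_t | S_t = i) have already been computed (in HFSTM these would come from the topic mixture), and all names and example numbers are hypothetical.

```python
def forward_backward(init, trans, emit_probs):
    """
    init[i]          : P(S_1 = i)
    trans[i][j]      : P(S_{t+1} = j | S_t = i)
    emit_probs[t][i] : P(O_t | S_t = i), assumed precomputed
    Returns gamma[t][i] = P(S_t = i | O_1..O_T).
    Unscaled version: fine for short sequences, underflows on long ones.
    """
    T, K = len(emit_probs), len(init)
    alpha = [[0.0] * K for _ in range(T)]   # A_t(i): forward messages
    beta = [[1.0] * K for _ in range(T)]    # B_t(i): backward messages

    for i in range(K):
        alpha[0][i] = init[i] * emit_probs[0][i]
    for t in range(1, T):
        for j in range(K):
            alpha[t][j] = emit_probs[t][j] * sum(
                alpha[t - 1][i] * trans[i][j] for i in range(K))

    for t in range(T - 2, -1, -1):
        for i in range(K):
            beta[t][i] = sum(
                trans[i][j] * emit_probs[t + 1][j] * beta[t + 1][j]
                for j in range(K))

    gamma = []
    for t in range(T):
        unnorm = [alpha[t][i] * beta[t][i] for i in range(K)]
        z = sum(unnorm)
        gamma.append([u / z for u in unnorm])
    return gamma


# Made-up three-state example (S, E, I) over three tweets from one user.
init = [0.8, 0.15, 0.05]
trans = [[0.7, 0.25, 0.05],
         [0.1, 0.5, 0.4],
         [0.2, 0.1, 0.7]]
emit_probs = [[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.2, 0.7]]
gamma = forward_backward(init, trans, emit_probs)
```

Each row of `gamma` is a posterior distribution over the states for one tweet, which is what the M-step would consume when re-estimating transition probabilities.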

26 A possible issue with HFSTM: it suffers from a large, noisy vocabulary. Semi-supervision for improvement: introduce weak supervision into HFSTM.

27 HFSTM-A (HFSTM-Aspect). Introduce an aspect variable y, expressing our belief on whether a word is flu-related or not. The value of y biases the switch variables such that flu-related words are more likely to be explained by state topics. When the aspect value (y) is introduced, the switching probabilities are updated accordingly.

28 Vocabulary & Dataset. Vocabulary (230 words): flu-related keyword list by Chakraborty SDM 2014; extra state-related keyword list. Dataset (34,000 tweets): identify infected users and collect their tweets; train on data from Jun 20, 2013 to Aug 06, 2013; test on two time periods: Dec 01, 2012 to July 08, 2013, and Nov 10, 2013 to Jan 26, 2014.

29 Learned word distributions. The most probable words learned in each state. S: probably healthy; E: having symptoms; I: definitely sick.

30 Learned state transition. Transition probabilities learned by HFSTM. Transitions in real tweets: not directly flu-related, yet correctly identified.

31 Flu trend fitting. Ground truth: the Pan American Health Organization (PAHO). Algorithms: (a) Baseline: count the number of keywords weekly as features, and regress to the ground-truth curve. (b) Google Flu Trends: take the Google Flu Trends data as input, and regress to the PAHO curve. (c) HFSTM: distinguish the different states of keywords, and only use the number of keywords in the I state; again regress to PAHO.

32 Flu trend fitting. Linear regression to the case count reported by PAHO (the ground truth).
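The regression step can be sketched as ordinary least squares with a single feature (weekly count of I-state keywords) against weekly case counts. The function `fit_line` and the numbers below are hypothetical and synthetic; they are not data or results from the talk, and the actual experiments may use a different feature set.

```python
def fit_line(x, y):
    """Closed-form ordinary least squares for y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic toy numbers (constructed so that cases = 2 * keywords + 1).
weekly_i_state_keywords = [3, 5, 9, 14, 11, 6]
weekly_case_counts = [7, 11, 19, 29, 23, 13]

slope, intercept = fit_line(weekly_i_state_keywords, weekly_case_counts)
predicted = [slope * k + intercept for k in weekly_i_state_keywords]
```

Comparing `predicted` with the held-out ground-truth curve is then what distinguishes the baseline, Google Flu Trends and HFSTM features from each other.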

33 HFSTM-A. Results are qualitatively similar to HFSTM, even when the vocabulary is 10 times larger. See poster!

34 Part 1: Empirical Studies. Q1: How to predict flu trends better? Q2: How does activity evolve over time?

35 Google Search Volume. E.g., given (1) first spike, (2) release date of two sequel movies, (3) access volume before the release date: what comes next? Plot annotations: (1) first spike; (2) release date; (3) two weeks before release.

36 Patterns (plot with axes X and Y).

37 Patterns: more data.

38 Patterns: anomaly?

39 Patterns: anomaly? Extrapolation.

40 Patterns: anomaly, imputation, extrapolation.

41 Patterns: anomaly, imputation, extrapolation, compression.

42 Rise and fall patterns in social media. Meme (# of mentions in blogs): short phrases sourced from U.S. politics in 2008, e.g. "you can put lipstick on a pig", "yes we can".

43 Rise and fall patterns in social media. Can we find a unifying model, which includes these patterns? Four classes on YouTube [Crane et al. '08]; six classes on Meme [Yang et al. '11].

44 Rise and fall patterns in social media. Answer: YES! We can represent all patterns by a single model. In Matsubara, Sakurai, Prakash+ SIGKDD 2012.

45 Main idea: SpikeM. 1. Uninformed bloggers (uninformed about rumor). 2. External shock at time $n_b$ (e.g., breaking news). 3. Infection (word-of-mouth). Timeline: time $n = 0$; time $n = n_b$; time $n = n_b + 1$. Infectiveness of a blog post at age n: strength of infection β (quality of news) combined with a decay function (how infective a blog posting is), which follows a power law.

46 -1.5 slope. J. G. Oliveira et al. Human Dynamics: The Correspondence Patterns of Darwin and Einstein. Nature 437, 1251 (2005). (Also in Leskovec, McGlohon+, SDM 2007.)

47 SpikeM with periodicity. Full equation of SpikeM (equation not captured in this transcript). Periodicity: bloggers change their activity over time (e.g., daily, weekly, yearly). Plot of activity over time n: peak activity at 12pm, low activity at 3am.
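Since the full equation did not survive in the transcript, the sketch below reconstructs the model from the ingredients listed on these slides (uninformed bloggers, an external shock at time n_b, word-of-mouth infection with a power-law decay of slope -1.5, and a sinusoidal periodicity term), following the formulation of the SIGKDD 2012 paper as best it can be recalled here. Treat it as an assumption-laden illustration rather than the authors' code: the function name, the argument names (`N`, `beta`, `n_b`, `S_b`, `eps`, `P_a`, `P_s`, `P_p`), the safety clamp and the example values are all introduced for this sketch.

```python
import math


def simulate_spikem(N, beta, n_b, S_b, eps, P_a, P_s, P_p, steps):
    """
    Returns dB, where dB[n] is the number of newly informed bloggers at time n.
    N    : total number of bloggers (all uninformed at n = 0)
    beta : strength of infection
    n_b  : time of the external shock; S_b : size of the shock
    eps  : background noise
    P_a, P_s, P_p : amplitude, phase shift and period of the periodicity
    """
    uninformed = float(N)
    dB = [0.0] * (steps + 1)
    for n in range(steps):
        # Word-of-mouth pressure: past posts (plus the shock) weighted by a
        # power-law decay in their age, f(age) = beta * age ** -1.5.
        pressure = 0.0
        for t in range(n_b, n + 1):
            shock = S_b if t == n_b else 0.0
            pressure += (dB[t] + shock) * beta * (n + 1 - t) ** -1.5
        # Periodic modulation of activity, between 1 - P_a and 1.
        periodicity = 1.0 - 0.5 * P_a * (
            math.sin(2.0 * math.pi / P_p * (n + 1 + P_s)) + 1.0)
        new_posts = periodicity * (uninformed * pressure + eps)
        new_posts = min(new_posts, uninformed)  # safety clamp (added here)
        dB[n + 1] = new_posts
        uninformed -= new_posts
    return dB


# Made-up parameters: a shock of size 10 at time 5, no noise, no periodicity.
dB = simulate_spikem(N=1000, beta=0.001, n_b=5, S_b=10, eps=0.0,
                     P_a=0.0, P_s=0, P_p=24, steps=50)
```

With these values nothing happens before the shock, activity rises right after it, and the curve eventually falls as the pool of uninformed bloggers is exhausted, which is the rise-and-fall shape the slides describe.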

48 Tail-part forecasts. SpikeM can capture the tail part.

49 "What-if" forecasting. E.g., given (1) first spike, (2) release date of two sequel movies, (3) access volume before the release date: what comes next? Plot annotations: (1) first spike; (2) release date; (3) two weeks before release.

50 "What-if" forecasting. SpikeM can forecast not only the tail part, but also the rise part! SpikeM can forecast upcoming spikes. Plot annotations: (1) first spike; (2) release date; (3) two weeks before release.

51 Modeling Malware Penetration. Worldwide Intelligence Network: which machine got which malware (or legitimate files); 1 billion nodes; 37 billion edges. Q: Temporal patterns? [Papalexakis et al. 2013]

52 Q: Temporal patterns. Looks familiar?

53 SpikeM again (or SharkFin). 7 parameters only! ~400 points.

54 Latent Propagation Patterns.

55 Bonus: Protest Predictions. Can Twitter provide a lead time? South American Twitter dataset (language: Spanish/Portuguese). Idea: 1. Look for trending keywords. 2. Predict event type for protest using SpikeM parameters! Event types for a political tweet: Violent Protest (VP), Non-Violent Protest (P). [Sundereisan et al. ASONAM 2014] [Jin et al. SIGKDD 2014]

56 Part 2: Algorithms. Q3: How to control outbreaks? (Broad theme: Network Topology Manipulation)

57 Immunization (= Interventions). Different flavors: pre-emptive; data-aware.

58 Pre-emptive: Vulnerability. The first eigenvalue $\lambda_1$ (of the adjacency matrix) is sufficient for most diffusion models [Prakash et al. ICDM'12 selected for best papers]. $\lambda_1$ is the epidemic threshold. Increasing $\lambda_1$ means increasing vulnerability: "safe", "vulnerable", "deadly".
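As a concrete illustration of the role of the first eigenvalue, the sketch below computes it by power iteration and applies the well-known SIS threshold condition (the infection dies out when (beta / delta) * lambda_1 < 1). The function names and the parameters `beta` (infection rate) and `delta` (recovery rate) are assumptions made for this example; other diffusion models use a different model-dependent constant in place of beta / delta.

```python
import math


def largest_eigenvalue(adj, iters=500):
    """
    Largest eigenvalue of a symmetric, non-negative adjacency matrix.
    Power iteration is run on A + I (a shift) so that it also converges
    on bipartite graphs; the Rayleigh quotient of A is returned.
    """
    n = len(adj)
    v = [1.0] * n
    for _ in range(iters):
        w = [v[i] + sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    av = [sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * av[i] for i in range(n))


def sis_dies_out(adj, beta, delta):
    """SIS model: True when the effective strength is below the threshold."""
    return (beta / delta) * largest_eigenvalue(adj) < 1.0


# Complete graph on 4 nodes (lambda_1 = 3) and a star with 4 leaves (lambda_1 = 2).
K4 = [[0, 1, 1, 1],
      [1, 0, 1, 1],
      [1, 1, 0, 1],
      [1, 1, 1, 0]]
STAR = [[0, 1, 1, 1, 1],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0]]
```

For example, `sis_dies_out(K4, beta=0.1, delta=0.5)` evaluates the condition 0.2 * 3 < 1, so the same virus that dies out on a sparse graph can persist on a denser one with a larger first eigenvalue.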

59 Goal: decrease $\lambda_1$ as much as possible. Node-based [Tong, Prakash+ ICDM 2010]; edge-based [Tong, Prakash, Eliassi-Rad+ CIKM 2012, Best Paper Award]; edge manipulation (see next).

60 Fractional Asymmetric Immunization. Drug-resistant bacteria (like XDR-TB) move between one hospital and another. [Prakash, Adamic, Iwashyna (M.D.) SDM 2013]

61 Fractional Asymmetric Immunization. Hospital; another hospital; drug-resistant bacteria (like XDR-TB) = f.

62 Fractional Asymmetric Immunization. Problem: given k units of disinfectant, how to distribute them to maximize hospitals saved?

63 Our Solution. Part 1 (value): approximate the eigen-drop ($\Delta\lambda$) using matrix perturbation theory. Part 2 (algorithm): greedily pick the best node at each step; near-optimal due to submodularity. SmartAlloc (linear complexity).
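The greedy idea can be sketched with a deliberately naive stand-in: at each step, fully remove the node whose removal lowers the first eigenvalue the most, recomputing the eigenvalue exactly each time. This is not SmartAlloc: it ignores fractional allocation and asymmetry, it does not use the perturbation-theory approximation of the eigen-drop, and it is far from linear time. All function names are hypothetical.

```python
import math


def largest_eigenvalue(adj, iters=500):
    """Shifted power iteration (on A + I) for a symmetric adjacency matrix."""
    n = len(adj)
    v = [1.0] * n
    for _ in range(iters):
        w = [v[i] + sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    av = [sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * av[i] for i in range(n))


def remove_nodes(adj, removed):
    """Copy of adj with every edge touching a removed node zeroed out."""
    n = len(adj)
    return [[0 if (i in removed or j in removed) else adj[i][j]
             for j in range(n)] for i in range(n)]


def greedy_immunize(adj, k):
    """Pick k nodes one at a time, each giving the largest exact eigen-drop."""
    removed = []
    for _ in range(k):
        best_node, best_lam = None, float("inf")
        for node in range(len(adj)):
            if node in removed:
                continue
            lam = largest_eigenvalue(remove_nodes(adj, set(removed) | {node}))
            if lam < best_lam:
                best_node, best_lam = node, lam
        removed.append(best_node)
    return removed


# Star with hub 0: immunizing the hub disconnects the whole graph.
STAR = [[0, 1, 1, 1, 1],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0]]
# Path 0-1-2-3-4: removing the middle node splits it into two single edges.
PATH = [[0, 1, 0, 0, 0],
        [1, 0, 1, 0, 0],
        [0, 1, 0, 1, 0],
        [0, 0, 1, 0, 1],
        [0, 0, 0, 1, 0]]
```

Recomputing the eigenvalue for every candidate is what makes this version slow; replacing that exact recomputation with a cheap approximation of the eigen-drop is the kind of shortcut the slide attributes to matrix perturbation theory.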

64 Our algorithm "SMART-ALLOC". [US-MEDICARE NETWORK 2005]: each circle is a hospital; ~3000 hospitals; more than 30,000 patients transferred. Current practice vs. SMART-ALLOC: ~6x fewer!

65 Running Time. Wall-clock time (lower is better): simulations (best competitor) > 1 week; SMART-ALLOC 14 secs. > 30,000x speed-up!

66 Experiments (lower is better). Panels: K = 200 and K = 2000; PENN-NETWORK and SECOND-LIFE; ~5x and ~2.5x.

67 Latest results. First (provable) approximation algorithms for the edge-based problem [Saha, Adiga, Prakash, Vullikanti SDM 2015]: an O(log^2 n)-factor algorithm (can be improved to O(log n)), based on the idea of removing closed walks; and a semi-definite programming rounding-based O(1)-factor algorithm.

68 Data-aware Immunization. Given: graph and infected nodes. Find: 'best' nodes for immunization. Complexity: NP-hard; hard to approximate within an absolute error. DAVA-tree: optimal solution on the tree. DAVA and DAVA-fast: merge infected nodes, build a "dominator tree", and run DAVA-tree. Running time (subquadratic): DAVA: O(k(|E| + |V| log |V|)); DAVA-fast: O(|E| + |V| log |V|). Figures: graph with infected nodes; dominator tree. [Zhang and Prakash, SDM 2014]

69 Extensions. Can be extended to uncertain and noisy initial data as well! Twitter Firehose API; 1% sample. [Zhang and Prakash, CIKM 2014]

70 Group-based Immunization. How to select groups to minimize the epidemic? Epidemiology: contact networks; people are grouped by ages, demographics, occupations, ... Social media: friendship networks; friends are grouped by the same interests, e.g., Facebook pages. [Zhang, Adiga, Vullikanti, Prakash, ICDM 2015] See poster!

71 Outline. Motivation; Part 1: Learning Models (Empirical Studies); Part 2: Policy and Action (Algorithms); Conclusion and Future Plans.

72 Future Plans. DATA: large real-world networks & processes. ANALYSIS: understanding. POLICY/ACTION: managing.

73 Scalability: Big Data. Datasets of unprecedented scale: high dimensionality and sample size! Need scalable algorithms for learning models and developing policy. Leverage parallel systems: Map-Reduce clusters (like Hadoop) for data-intensive jobs (more than 6000 machines); parallelized compute-intensive simulations (like Condor).

74 Uncertain Data in Cascade Analysis (more implementable policies). Correcting for missing data: original; nodes sampled off; culprits, and missing nodes filled in [Sundereisan, Vreeken, Prakash 2014]. Designing more robust immunization policies [Zhang and Prakash, CIKM 2014].

75 References
1. Scalable Vaccine Distribution in Large Graphs given Uncertain Data (Yao Zhang and B. Aditya Prakash). In CIKM 2014.
2. Fast Influence-based Coarsening for Large Networks (Manish Purohit, B. Aditya Prakash, Chanhyun Kang, Yao Zhang and V. S. Subrahmanian). In SIGKDD 2014.
3. DAVA: Distributing Vaccines over Large Networks under Prior Information (Yao Zhang and B. Aditya Prakash). In SDM 2014.
4. Fractional Immunization on Networks (B. Aditya Prakash, Lada Adamic, Jack Iwashyna, Hanghang Tong, Christos Faloutsos). In SDM 2013.
5. Spotting Culprits in Epidemics: Who and How many? (B. Aditya Prakash, Jilles Vreeken, Christos Faloutsos). In ICDM 2012, Brussels. (Invited to KAIS Journal Best Papers of ICDM.)
6. Gelling, and Melting, Large Graphs through Edge Manipulation (Hanghang Tong, B. Aditya Prakash, Tina Eliassi-Rad, Michalis Faloutsos, Christos Faloutsos). In ACM CIKM 2012, Hawaii. (Best Paper Award)
7. Rise and Fall Patterns of Information Diffusion: Model and Implications (Yasuko Matsubara, Yasushi Sakurai, B. Aditya Prakash, Lei Li, Christos Faloutsos). In SIGKDD 2012, Beijing.
8. Interacting Viruses on a Network: Can both survive? (Alex Beutel, B. Aditya Prakash, Roni Rosenfeld, Christos Faloutsos). In SIGKDD 2012, Beijing.
9. Winner-takes-all: Competing Viruses or Ideas on fair-play networks (B. Aditya Prakash, Alex Beutel, Roni Rosenfeld, Christos Faloutsos). In WWW 2012, Lyon.
10. Threshold Conditions for Arbitrary Cascade Models on Arbitrary Networks (B. Aditya Prakash, Deepayan Chakrabarti, Michalis Faloutsos, Nicholas Valler, Christos Faloutsos). In IEEE ICDM 2011, Vancouver. (Invited to KAIS Journal Best Papers of ICDM.)
11. Time Series Clustering: Complex is Simpler! (Lei Li, B. Aditya Prakash). In ICML 2011, Bellevue.
12. Epidemic Spreading on Mobile Ad Hoc Networks: Determining the Tipping Point (Nicholas Valler, B. Aditya Prakash, Hanghang Tong, Michalis Faloutsos and Christos Faloutsos). In IEEE NETWORKING 2011, Valencia, Spain.
13. Formalizing the BGP stability problem: patterns and a chaotic model (B. Aditya Prakash, Michalis Faloutsos and Christos Faloutsos). In IEEE INFOCOM NetSciCom Workshop, 2011.
14. On the Vulnerability of Large Graphs (Hanghang Tong, B. Aditya Prakash, Tina Eliassi-Rad and Christos Faloutsos). In IEEE ICDM 2010, Sydney, Australia.
15. Virus Propagation on Time-Varying Networks: Theory and Immunization Algorithms (B. Aditya Prakash, Hanghang Tong, Nicholas Valler, Michalis Faloutsos and Christos Faloutsos). In ECML-PKDD 2010, Barcelona, Spain.
16. MetricForensics: A Multi-Level Approach for Mining Volatile Graphs (Keith Henderson, Tina Eliassi-Rad, Christos Faloutsos, Leman Akoglu, Lei Li, Koji Maruhashi, B. Aditya Prakash and Hanghang Tong). In SIGKDD 2010, Washington D.C.

76 Acknowledgements: Collaborators. Christos Faloutsos, Roni Rosenfeld, Michalis Faloutsos, Lada Adamic, Theodore Iwashyna (M.D.), Dave Andersen, Tina Eliassi-Rad, Iulian Neamtiu, Varun Gupta, Jilles Vreeken, V. S. Subrahmanian, John Brownstein (M.D.), Deepayan Chakrabarti, Hanghang Tong, Kunal Punera, Ashwin Sridharan, Sridhar Machiraju, Mukund Seshadri, Alice Zheng, Lei Li, Polo Chau, Nicholas Valler, Alex Beutel, Xuetao Wei.

77 Acknowledgements: Students. Liangzhe Chen, Shashidhar Sundereisan, Benjamin Wang, Yao Zhang, Sorour Amiri.

78 Acknowledgements: Funding.

79 Data, Analysis, Policy/Action. Making Diffusion Work for You. B. Aditya Prakash, http://www.cs.vt.edu/~badityap

