Presentation is loading. Please wait.

Presentation is loading. Please wait.

B. Aditya Prakash Naren Ramakrishnan

Similar presentations


Presentation on theme: "B. Aditya Prakash Naren Ramakrishnan"— Presentation transcript:

1 B. Aditya Prakash Naren Ramakrishnan
Leveraging Propagation for Data Mining Models, Algorithms & Applications B. Aditya Prakash Naren Ramakrishnan August 10, Tutorial, SIGKDD 2016, San Francisco

2 Prakash and Ramakrishnan 2016
About us B. Aditya Prakash Asst. Professor CS, Virginia Tech. PhD. CMU, 2012. Data Mining, Applied ML Graph and Time-series mining Applications to Social Media, Epidemiology/Public Health, Cyber Security Homepage: Prakash and Ramakrishnan 2016

3 Prakash and Ramakrishnan 2016
About us Naren Ramakrishnan Thomas L. Phillips Prof. CS, Virginia Tech. PhD. Purdue, 1997. Data mining for intelligence analysis, forecasting, sustainability, and health informatics Homepage: Prakash and Ramakrishnan 2016

4 Prakash and Ramakrishnan 2016
Tutorial webpage All Slides will be posted there. Talk video as well (later). Prakash and Ramakrishnan 2016

5 Networks are everywhere!
Facebook Network [2010] Gene Regulatory Network [Decourty 2008] Human Disease Network [Barabasi 2007] The Internet [2005] Prakash and Ramakrishnan 2016

6 Dynamical Processes over networks are also everywhere!
Prakash and Ramakrishnan 2016

7 Prakash and Ramakrishnan 2016
Why do we care? Social collaboration Information Diffusion Viral Marketing Epidemiology and Public Health Cyber Security Human mobility Games and Virtual Worlds Ecology Prakash and Ramakrishnan 2016

8 Why do we care? (1: Epidemiology)
Dynamical Processes over networks [AJPH 2007] CDC data: Visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts Diseases over contact networks Prakash and Ramakrishnan 2016

9 Why do we care? (1: Epidemiology)
Dynamical Processes over networks Each circle is a hospital ~3000 hospitals More than 30,000 patients transferred Mention number of hospitals Patients transferred [US-MEDICARE NETWORK 2005] Problem: Given k units of disinfectant, whom to immunize? Prakash and Ramakrishnan 2016

10 Why do we care? (1: Epidemiology)
~6x fewer! [US-MEDICARE NETWORK 2005] CURRENT PRACTICE OUR METHOD Hospital-acquired inf. took 99K+ lives, cost $5B+ (all per year) Prakash and Ramakrishnan 2016

11 Why do we care? (2: Online Diffusion)
> 800m users, ~$1B revenue [WSJ 2010] ~100m active users > 50m users Prakash and Ramakrishnan 2016

12 Why do we care? (2: Online Diffusion)
Dynamical Processes over networks Buy Versace™! Celebrity Followers Social Media Marketing Prakash and Ramakrishnan 2016

13 Why do we care? (3: To change the world?)
Dynamical Processes over networks Social networks and Collaborative Action Prakash and Ramakrishnan 2016

14 High Impact – Multiple Settings
epidemic out-breaks Q. How to squash rumors faster? Q. How do opinions spread? Q. How to market better? products/viruses transmit s/w patches Prakash and Ramakrishnan 2016

15 Research Theme ANALYSIS POLICY/ ACTION DATA Understanding Managing
Large real-world networks & processes Prakash and Ramakrishnan 2016

16 Research Theme – Public Health
ANALYSIS Will an epidemic happen? POLICY/ ACTION How to control out-breaks? DATA Modeling # patient transfers Prakash and Ramakrishnan 2016

17 Research Theme – Social Media
ANALYSIS # cascades in future? POLICY/ ACTION How to market better? DATA Modeling Tweets spreading Prakash and Ramakrishnan 2016

18 Prakash and Ramakrishnan 2016
In this tutorial Given propagation models, on arbitrary networks: Q1: What is the epidemic threshold? Q2: How do viruses compete? With extensions to dynamic networks, multi-profile networks etc. Fundamental Models Understanding Prakash and Ramakrishnan 2016

19 In this tutorial Algorithms Q3: How to estimate and learn
influence and networks? Q4: How to immunize and control out-breaks better? Q5: How to reverse-engineer epidemics? Q6: How to leverage viral marketing? Q7: How to pick sensors for graphs? Algorithms Managing/Manipulating Prakash and Ramakrishnan 2016

20 In this tutorial How to use propagation for _________
Q8: Memes, Tweets, Blogs Q9: Disease Surveillance Q10: Protest Trends Q11: Malware Attacks Q12: General Graph Mining Applications Large real-world networks & processes Prakash and Ramakrishnan 2016

21 Prakash and Ramakrishnan 2016
Plan Three breaks! 2-2:05pm 3-3:30pm (conference coffee break) 4:15-4:20pm Part 2: Algorithms starts at roughly 1:50pm Part 3: Applications at 3:30pm (after the coffee break) Please interrupt anytime for questions Prakash and Ramakrishnan 2016

22 Prakash and Ramakrishnan 2016
Outline Motivation Part 1: Understanding Epidemics (Theory) Part 2: Policy and Action (Algorithms) Part 3: Applications (Data-Driven) Conclusion single virus VS multiple viruses Prakash and Ramakrishnan 2016

23 Prakash and Ramakrishnan 2016
Part 1: Theory Q1: What is the epidemic threshold? Q2: How do viruses compete? Prakash and Ramakrishnan 2016

24 A fundamental question
Strong Virus Epidemic? Prakash and Ramakrishnan 2016

25 example (static graph)
Weak Virus Epidemic? Prakash and Ramakrishnan 2016

26 Prakash and Ramakrishnan 2016
Problem Statement above (epidemic) below (extinction) # Infected time Find, a condition under which virus will die out exponentially quickly regardless of initial infection condition Separate the regimes? Prakash and Ramakrishnan 2016

27 Threshold (static version)
Problem Statement Given: Graph G, and Virus specs (attack prob. etc.) Find: A condition for virus extinction/invasion Prakash and Ramakrishnan 2016

28 Threshold: Why important?
Accelerating simulations Forecasting (‘What-if’ scenarios Design of contagion and/or topology A great handle to manipulate the spreading Immunization Maximize collaboration ….. Bring the snapshot of previous slide Prakash and Ramakrishnan 2016

29 Prakash and Ramakrishnan 2016
Part 1: Theory Q1: What is the epidemic threshold? Background Result and Intuition (Static Graphs) Proof Ideas (Static Graphs) Bonus: Dynamic Graphs Q2: How do viruses compete? Prakash and Ramakrishnan 2016

30 “SIR” model: life immunity (mumps)
Background “SIR” model: life immunity (mumps) Each node in the graph is in one of three states Susceptible (i.e. healthy) Infected Removed (i.e. can’t get infected again) Prob. δ Prob. β spend less time explaining the stuff…. t = 1 t = 2 t = 3 Prakash and Ramakrishnan 2016

31 Terminology: continued
Background Terminology: continued Other virus propagation models (“VPM”) SIS : susceptible-infected-susceptible, flu-like SIRS : temporary immunity, like pertussis SEIR : mumps-like, with virus incubation (E = Exposed) ….…………. Underlying contact-network – ‘who-can-infect-whom’ Prakash and Ramakrishnan 2016

32 Prakash and Ramakrishnan 2016
Background Related Work R. M. Anderson and R. M. May. Infectious Diseases of Humans. Oxford University Press, 1991. A. Barrat, M. Barthélemy, and A. Vespignani. Dynamical Processes on Complex Networks. Cambridge University Press, 2010. F. M. Bass. A new product growth for model consumer durables. Management Science, 15(5):215–227, 1969. D. Chakrabarti, Y. Wang, C. Wang, J. Leskovec, and C. Faloutsos. Epidemic thresholds in real networks. ACM TISSEC, 10(4), 2008. D. Easley and J. Kleinberg. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press, 2010. A. Ganesh, L. Massoulie, and D. Towsley. The effect of network topology in spread of epidemics. IEEE INFOCOM, 2005. Y. Hayashi, M. Minoura, and J. Matsukubo. Recoverable prevalence in growing scale-free networks and the effective immunization. arXiv:cond-at/ v2, Aug H. W. Hethcote. The mathematics of infectious diseases. SIAM Review, 42, 2000. H. W. Hethcote and J. A. Yorke. Gonorrhea transmission dynamics and control. Springer Lecture Notes in Biomathematics, 46, 1984. J. O. Kephart and S. R. White. Directed-graph epidemiological models of computer viruses. IEEE Computer Society Symposium on Research in Security and Privacy, 1991. J. O. Kephart and S. R. White. Measuring and modeling computer virus prevalence. IEEE Computer Society Symposium on Research in Security and Privacy, 1993. R. Pastor-Santorras and A. Vespignani. Epidemic spreading in scale-free networks. Physical Review Letters 86, 14, 2001. ……… All are about either: Structured topologies (cliques, block-diagonals, hierarchies, random) Specific virus propagation models Static graphs Point out may anderson is a book, survey Prakash and Ramakrishnan 2016

33 Prakash and Ramakrishnan 2016
Part 1: Theory Q1: What is the epidemic threshold? Background Result and Intuition (Static Graphs) Proof Ideas (Static Graphs) Bonus: Dynamic Graphs Q2: How do viruses compete? Prakash and Ramakrishnan 2016

34 How should the answer look like?
Answer should depend on: Graph Virus Propagation Model (VPM) But how?? Graph – average degree? max. degree? diameter? VPM – which parameters? How to combine – linear? quadratic? exponential? Put example equations (\beta d + \delta/diameter) \beta\sqrt(s) ….. Prakash and Ramakrishnan 2016

35 Static Graphs: Our Main Result
Informally, For, any arbitrary topology (adjacency matrix A) any virus propagation model (VPM) in standard literature the epidemic threshold depends only on the λ, first eigenvalue of A, and some constant , determined by the virus propagation model λ No epidemic if λ * < 1 In Prakash+ ICDM 2011

36 Our thresholds for some models
s = effective strength s < 1 : below threshold Models Effective Strength (s) Threshold (tipping point) SIS, SIR, SIRS, SEIR s = λ . s = 1 SIV, SEIV (H.I.V.)

37 Our result: Intuition for λ
“Official” definition: “Un-official” Intuition  Let A be the adjacency matrix. Then λ is the root with the largest magnitude of the characteristic polynomial of A [det(A – xI)]. Doesn’t give much intuition! λ ~ # paths in the graph u u ≈ . Have a better intuition slide ‘unofficial intuition’ (i, j) = # of paths i  j of length k Prakash and Ramakrishnan 2016

38 Largest Eigenvalue (λ)
better connectivity higher λ λ ≈ 2 λ = N λ = N-1 Square root λ ≈ 2 λ= 31.67 λ= 999 N = 1000 N nodes Prakash and Ramakrishnan 2016

39 Examples: Simulations – SIR (mumps)
Fraction of Infections Footprint Cite Barrett and Marathe (a) Infection profile (b) “Take-off” plot PORTLAND graph 31 million links, 6 million nodes Effective Strength Time ticks

40 Examples: Simulations – SIRS (pertusis)
Fraction of Infections Footprint (a) Infection profile (b) “Take-off” plot PORTLAND graph 31 million links, 6 million nodes Time ticks Effective Strength

41 Prakash and Ramakrishnan 2016
Part 1: Theory Q1: What is the epidemic threshold? Background Result and Intuition (Static Graphs) Proof Ideas (Static Graphs) Bonus: Dynamic Graphs Q2: How do viruses compete? Prakash and Ramakrishnan 2016

42 Prakash and Ramakrishnan 2016
Proof Sketch General VPM structure Model-based λ * < 1 Dimensional arguments… Full proof 30 pages on arxiv Graph-based Topology and stability Prakash and Ramakrishnan 2016

43 Prakash and Ramakrishnan 2016
Models and more models Model Used for SIR Mumps SIS Flu SIRS Pertussis SEIR Chicken-pox …….. SICR Tuberculosis MSIR Measles SIV Sensor Stability H.I.V. ………. Prakash and Ramakrishnan 2016

44 Ingredient 1: Our generalized model
Exogenous Transitions Endogenous Transitions Susceptible Infected Vigilant Endogenous Transitions Susceptible Infected Vigilant (S*I*V*?) Prakash and Ramakrishnan 2016

45 Prakash and Ramakrishnan 2016
Special case: SIR Susceptible Infected (S*I*V*?) Vigilant Prakash and Ramakrishnan 2016

46 Prakash and Ramakrishnan 2016
Special case: H.I.V. “Non-terminal” “Terminal” Multiple Infectious, Vigilant states Prakash and Ramakrishnan 2016

47 Ingredient 2: NLDS + Stability
Details Ingredient 2: NLDS + Stability View as a NLDS discrete time non-linear dynamical system (NLDS) size mN x 1 . size N (number of nodes in the graph) S Probability vector Specifies the state of the system at time t Curly braces…say what the blue yellow are… I V Prakash and Ramakrishnan 2016

48 Ingredient 2: NLDS + Stability
Details Ingredient 2: NLDS + Stability View as a NLDS discrete time non-linear dynamical system (NLDS) size mN x 1 . Non-linear function Explicitly gives the evolution of system Prakash and Ramakrishnan 2016

49 Ingredient 2: NLDS + Stability
View as a NLDS discrete time non-linear dynamical system (NLDS) Threshold  Stability of NLDS Prakash and Ramakrishnan 2016

50 Prakash and Ramakrishnan 2016
Details Special case: SIR I R S S size 3N x 1 I R Apply g on p_T, you get p_t+1 = probability that node i is not attacked by any of its infectious neighbors NLDS Prakash and Ramakrishnan 2016

51 Prakash and Ramakrishnan 2016
Details Fixed Point 1 . State when no node is infected Q: Is it stable? Prakash and Ramakrishnan 2016

52 Prakash and Ramakrishnan 2016
Stability for SIR Stable under threshold Unstable above threshold Prakash and Ramakrishnan 2016

53 Prakash and Ramakrishnan 2016
See paper for full proof General VPM structure Model-based λ * < 1 Dimensional arguments… Graph-based Topology and stability Prakash and Ramakrishnan 2016

54 Prakash and Ramakrishnan 2016
Part 1: Theory Q1: What is the epidemic threshold? Background Result and Intuition (Static Graphs) Proof Ideas (Static Graphs) Bonus: Dynamic Graphs Q2: How do viruses compete? Prakash and Ramakrishnan 2016

55 Dynamic Graphs: Epidemic?
Alternating behaviors DAY (e.g., work) adjacency matrix 8 Prakash and Ramakrishnan 2016

56 Dynamic Graphs: Epidemic?
Alternating behaviors NIGHT (e.g., home) adjacency matrix 8 Prakash and Ramakrishnan 2016

57 Prakash and Ramakrishnan 2016
Model Description Infected Healthy X N1 N3 N2 Prob. β Prob. δ SIS model recovery rate δ infection rate β Set of T arbitrary graphs weekend day N night N , weekend….. Prakash and Ramakrishnan 2016

58 Our result: Dynamic Graphs Threshold
Informally, NO epidemic if eig (S) = < 1 Single number! Largest eigenvalue of The system matrix S Details S = In Prakash+, ECML-PKDD 2010

59 Prakash and Ramakrishnan 2016
Infection-profile log(fraction infected) Synthetic MIT Reality Mining ABOVE ABOVE AT AT write ‘below threshold’ for eig < 1 AT threshold above threshold BELOW BELOW Time Prakash and Ramakrishnan 2016

60 Prakash and Ramakrishnan 2016
“Take-off” plots Footprint (# “steady state”) Synthetic MIT Reality EPIDEMIC Our threshold Our threshold EPIDEMIC Plot variance instead? Size and NO EPIDEMIC NO EPIDEMIC (log scale) Prakash and Ramakrishnan 2016

61 Extension: Multi-profile networks
Setting: NOT fair play---same network, multiple profiles Example: PS4 tweet on Twitter network PS4 fans are positive People like brother may not care  XBOX fans may be hostile So, different βs and δs for different profiles In Rapti, Sioutas, Tsichlas, Tzimas SIGKDD 2015 Prakash and Ramakrishnan 2016

62 Extension: Multi-profile networks
Setting: NOT fair play---same network, multiple profiles Main results: Situation much more complex High sensitivity in one profile and low in another can still lead to epidemic Prakash and Ramakrishnan 2016

63 Prakash and Ramakrishnan 2016
Part 1: Theory Q1: What is the epidemic threshold? Q2: What happens when viruses compete? Mutually-exclusive viruses Interacting viruses Prakash and Ramakrishnan 2016

64 Prakash and Ramakrishnan 2016
Competing Contagions iPhone v Android Blu-ray v HD-DVD v Attack Retreat Biological common flu/avian flu, pneumococcal inf etc Prakash and Ramakrishnan 2016

65 Prakash and Ramakrishnan 2016
Details A simple model Modified flu-like Mutual Immunity (“pick one of the two”) Susceptible-Infected1-Infected2-Susceptible Android apple sign for the viruses Virus 1 Virus 2 Prakash and Ramakrishnan 2016

66 Question: What happens in the end?
green: virus 1 red: virus 2 Number of Infections Steady State = ? Steady-state number of greens/num of reds = ? ASSUME: Virus 1 is stronger than Virus 2 Prakash and Ramakrishnan 2016

67 Question: What happens in the end?
Steady State green: virus 1 red: virus 2 Number of Infections Strength ?? = 2 Strength Steady-state number of greens/num of reds = ? ASSUME: Virus 1 is stronger than Virus 2 Prakash and Ramakrishnan 2016

68 Answer: Winner-Takes-All
green: virus 1 red: virus 2 Number of Infections ASSUME: Virus 1 is stronger than Virus 2 Prakash and Ramakrishnan 2016

69 Our Result: Winner-Takes-All
Given our model, and any graph, the weaker virus always dies-out completely Details The stronger survives only if it is above threshold Virus 1 is stronger than Virus 2, if: strength(Virus 1) > strength(Virus 2) Strength(Virus) = λ β / δ  same as before! In Prakash+ WWW 2012 Prakash and Ramakrishnan 2016

70 Prakash and Ramakrishnan 2016
Real Examples [Google Search Trends data] logos Reddit v Digg Blu-Ray v HD-DVD Prakash and Ramakrishnan 2016

71 Prakash and Ramakrishnan 2016
Part 1: Theory Q1: What is the epidemic threshold? Q2: What happens when viruses compete? Mutually-exclusive viruses Interacting viruses Prakash and Ramakrishnan 2016

72 Prakash and Ramakrishnan 2016
A simple model: SI1|2S Modified flu-like (SIS) Susceptible-Infected1 or 2-Susceptible Interaction Factor ε Full Mutual Immunity: ε = 0 Partial Mutual Immunity (competition): ε < 0 Cooperation: ε > 0 & Virus 1 Virus 2 Prakash and Ramakrishnan 2016

73 Question: What happens in the end?
ε = 0 Winner takes all ε = 1 Co-exist independently ε = 2 Viruses cooperate Strengths: 6 and 4 What about for 0 < ε <1? Is there a point at which both viruses can co-exist? ASSUME: Virus 1 is stronger than Virus 2 Prakash and Ramakrishnan 2016

74 Answer: Yes! There is a phase transition
ASSUME: Virus 1 is stronger than Virus 2 Prakash and Ramakrishnan 2016

75 Answer: Yes! There is a phase transition
ASSUME: Virus 1 is stronger than Virus 2 Prakash and Ramakrishnan 2016

76 Answer: Yes! There is a phase transition
ASSUME: Virus 1 is stronger than Virus 2 Prakash and Ramakrishnan 2016

77 Our Result: Viruses can Co-exist
Given our model and a fully connected graph, there exists an εcritical such that for ε ≥ εcritical, there is a fixed point where both viruses survive. Details The stronger survives only if it is above threshold Virus 1 is stronger than Virus 2, if: strength(Virus 1) > strength(Virus 2) Strength(Virus) σ = N β / δ In Beutel+ SIGKDD 2012

78 Prakash and Ramakrishnan 2016
Real Examples [Google Search Trends data] Hulu v Blockbuster Prakash and Ramakrishnan 2016

79 Prakash and Ramakrishnan 2016
Real Examples [Google Search Trends data] Chrome v Firefox Prakash and Ramakrishnan 2016

80 We can also extend this to
Composite networks [Wei+ JSAC 2013] phone call + SMS networks power grid + telecomm networks Main result: Behavior depends on viral strengths in the different networks Also depends on strength of interaction between the networks Prakash and Ramakrishnan 2016


Download ppt "B. Aditya Prakash Naren Ramakrishnan"

Similar presentations


Ads by Google