Presentation is loading. Please wait.

Presentation is loading. Please wait.

Understanding and Managing Cascades on Large Graphs B. Aditya Prakash Computer Science Virginia Tech. CS Seminar 11/30/2012.

Similar presentations


Presentation on theme: "Understanding and Managing Cascades on Large Graphs B. Aditya Prakash Computer Science Virginia Tech. CS Seminar 11/30/2012."— Presentation transcript:

1 Understanding and Managing Cascades on Large Graphs B. Aditya Prakash Computer Science Virginia Tech. CS Seminar 11/30/2012

2 Networks are everywhere! Human Disease Network [Barabasi 2007] Gene Regulatory Network [Decourty 2008] Facebook Network [2010] The Internet [2005] Prakash 2012

3 Dynamical Processes over networks are also everywhere! Prakash 2012

4 Why do we care? Social collaboration Information Diffusion Viral Marketing Epidemiology and Public Health Cyber Security Human mobility Games and Virtual Worlds Ecology Localized effects: riots…

5 Why do we care? (1: Epidemiology) Dynamical Processes over networks [AJPH 2007] CDC data: Visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts Diseases over contact networks Prakash 2012

6 Why do we care? (1: Epidemiology) Dynamical Processes over networks Each circle is a hospital ~3000 hospitals More than 30,000 patients transferred [US-MEDICARE NETWORK 2005] Problem: Given k units of disinfectant, whom to immunize? Prakash 2012

7 Why do we care? (1: Epidemiology) CURRENT PRACTICEOUR METHOD ~6x fewer! [US-MEDICARE NETWORK 2005] Hospital-acquired inf. took 99K+ lives, cost $5B+ (all per year) Prakash 2012

8 Why do we care? (2: Online Diffusion) > 800m users, ~$1B revenue [WSJ 2010] ~100m active users > 50m users Prakash 2012

9 Why do we care? (2: Online Diffusion) Dynamical Processes over networks Celebrity Buy Versace™! Followers Social Media Marketing Prakash 2012

10 Why do we care? (4: To change the world?) Dynamical Processes over networks Social networks and Collaborative Action Prakash 2012

11 High Impact – Multiple Settings Q. How to squash rumors faster? Q. How do opinions spread? Q. How to market better? epidemic out-breaks products/viruses transmit s/w patches Prakash 2012

12 Research Theme DATA Large real-world networks & processes ANALYSIS Understanding POLICY/ ACTION Managing Prakash 2012

13 Research Theme – Public Health DATA Modeling # patient transfers ANALYSIS Will an epidemic happen? POLICY/ ACTION How to control out-breaks? Prakash 2012

14 Research Theme – Social Media DATA Modeling Tweets spreading POLICY/ ACTION How to market better? ANALYSIS # cascades in future? Prakash 2012

15 In this talk Q1: How to immunize and control out-breaks better? Q2: How to find culprits of epidemics? POLICY/ ACTION Managing Prakash 2012

16 In this lecture DATA Large real-world networks & processes Q3: How do cascades look like? Q4: How does activity evolve over time? Prakash 2012

17 Outline Motivation Part 1: Policy and Action (Algorithms) Part 2: Learning Models (Empirical Studies) Conclusion Prakash 2012

18 Part 1: Algorithms Q1: Whom to immunize? Q2: How to detect culprits? Prakash 2012

19 Hanghang Tong, B. Aditya Prakash, Tina Eliassi- Rad, Michalis Faloutsos, Christos Faloutsos “Gelling, and Melting, Large Graphs by Edge Manipulation” in ACM CIKM 2012 (Best Paper Award) Prakash 2012 [Thanks to Hanghang Tong for some slides!]

20 An Example: Flu/Virus Propagation HealthySick Contact 1: Sneeze to neighbors 2: Some neighbors  Sick 3: Try to recover Q: How to guild propagation by opt. link structure? - Q1: Understand tipping point  existing work - Q2: Minimize the propagation - Q3: Maximize the propagation 20  This paper

21 Vulnerability measure λ [ICDM 2011, PKDD2010] Increasing λ Increasing vulnerability λ is the epidemic threshold “Safe”“Vulnerable”“Deadly” Prakash 2012

22 Minimizing Propagation: Edge Deletion Given: a graph A, virus prop model and budget k; Find: delete k ‘best’ edges from A to minimize λ Bad Good

23 Q: How to find k best edges to delete efficiently? Left eigen-score of source Right eigen-score of target

24 Minimizing Propagation: Evaluations Time Ticks Log (Infected Ratio) (better) Our Method Aa Data set: Oregon Autonomous System Graph (14K node, 61K edges)

25 Discussions: Node Deletion vs. Edge Deletion Observations: Node or Edge Deletion  λ Decrease Nodes on A = Edges on its line graph L(A) Questions? Edge Deletion on A = Node Deletion on L(A)? Which strategy is better (when both feasible)? Original Graph ALine Graph L(A)

26 Discussions: Node Deletion vs. Edge Deletion Q: Is Edge Deletion on A = Node Deletion on L(A)? A: Yes! But, Node Deletion itself is not easy: 26 Theorem: Hardness of Node Deletion. Find Optimal k-node Immunization is NP-Hard Theorem: Line Graph Spectrum. Eigenvalue of A  Eigenvalue of L(A)

27 Discussions: Node Deletion vs. Edge Deletion Q: Which strategy is better (when both feasible)? A: Edge Deletion > Node Deletion 27 (better) Green: Node Deletion (e.g., shutdown a twitter account) Red: Edge Deletion (e.g., un-friend two users)

28 Maximizing Propagation: Edge Addition Given: a graph A, virus prop model and budget k; Find: add k ‘best’ new edges into A. By 1 st order perturbation, we have λ s - λ ≈Gv(S)= c ∑ eєS u(i e )v(j e ) So, we are done  need O(n 2 -m) complexity Left eigen-score of source Right eigen-score of target Low Gv High Gv 28

29 λ s - λ ≈Gv(S)= c ∑ eєS u(i e )v(j e ) Q: How to Find k new edges w/ highest Gv(S) ? A: Modified Fagin’s algorithm k k #3: Search space k+d Search space :existing edgeTime Complexity: O(m+nt+kt 2 ), t = max(k,d) #1: Sorting Sources by u #2: Sorting Targets by v Maximizing Propagation: Edge Addition

30 Maximizing Propagation: Evaluation Time Ticks Log (Infected Ratio) (better) 30 Our Method

31 Fractional Immunization of Networks B. Aditya Prakash, Lada Adamic, Theodore Iwashyna (M.D.), Hanghang Tong, Christos Faloutsos Under Submission Prakash 2012

32 ? ? Given: a graph A, virus prop. model and budget k; Find: k ‘best’ nodes for immunization (removal). k = 2 Previously: Full Static Immunization Prakash 2012

33 Fractional Asymmetric Immunization Fractional Effect [ f(x) = ] Asymmetric Effect # antidotes = 3 Prakash 2012

34 Now: Fractional Asymmetric Immunization Fractional Effect [ f(x) = ] Asymmetric Effect # antidotes = 3 Prakash 2012

35 Fractional Asymmetric Immunization Fractional Effect [ f(x) = ] Asymmetric Effect # antidotes = 3 Prakash 2012

36 Fractional Asymmetric Immunization Hospital Another Hospital Drug-resistant Bacteria (like XDR-TB) Prakash 2012

37 Fractional Asymmetric Immunization Hospital Another Hospital Drug-resistant Bacteria (like XDR-TB) = f Prakash 2012

38 Fractional Asymmetric Immunization Hospital Another Hospital Problem: Given k units of disinfectant, how to distribute them to maximize hospitals saved? Prakash 2012

39 Our Algorithm “SMART-ALLOC” CURRENT PRACTICESMART-ALLOC [US-MEDICARE NETWORK 2005] Each circle is a hospital, ~3000 hospitals More than 30,000 patients transferred ~6x fewer! Prakash 2012

40 Running Time ≈ SimulationsSMART-ALLOC > 1 week 14 secs > 30,000x speed-up! Wall-Clock Time Lower is better Prakash 2012

41 Experiments K = 200K = 2000 PENN-NETWORK SECOND-LIFE ~5 x ~2.5 x Lower is better Prakash 2012

42 Part 1: Algorithms Q2: Whom to immunize? Q3: How to detect culprits? Prakash 2012

43 B. Aditya Prakash, Jilles Vreeken, Christos Faloutsos ‘Detecting Culprits in Epidemics: Who and How many?’ in ICDM 2012, Brussels 43Prakash and Faloutsos 2012

44 Culprits: Problem definition 44 2-d grid ‘+’ -> infected Who started it? Prakash and Faloutsos 2012

45 Culprits: Problem definition 45 2-d grid ‘+’ -> infected Who started it? Prakash and Faloutsos 2012 Prior work: [Lappas et al. 2010, Shah et al. 2011]

46 Culprits: Exoneration 46Prakash and Faloutsos 2012

47 Culprits: Exoneration 47Prakash and Faloutsos 2012

48 Who are the culprits Two-part solution – use MDL for number of seeds – for a given number: exoneration = centrality + penalty Running time = – linear! (in edges and nodes) 48Prakash and Faloutsos 2012

49 Modeling using MDL Minimum Description Length Principle == Induction by compression Related to Bayesian approaches MDL = Model + Data Model – Scoring the seed-set Number of possible |S|- sized sets En-coding integer |S|

50 Modeling using MDL Data: Propagation Ripples Original Graph Infected Snapshot Ripple R2 Ripple R1

51 Modeling using MDL Ripple cost Total MDL cost How the ‘frontier’ advances How long is the ripple Ripple R

52 How to optimize the score? Two-step process – Given k, quickly identify high-quality set – Given these nodes, optimize the ripple R

53 Optimizing the score High-quality k-seed-set – Exoneration Best single seed: – Smallest eigenvector of Laplacian sub-matrix Exonerate neighbors Repeat

54 Optimizing the score Optimizing R – Just get the MLE ripple! Finally use MDL score to tell us the best set NetSleuth: Linear running time in nodes and edges

55 Experiments Evaluation functions: – MDL based – Overlap based (JD == Jaccard distance) Closer to 1 the better

56 Experiments

57

58 Part 2: Empirical Studies Q4: How do cascades look like? Q5: How does activity evolve over time? Prakash 2012

59 Cascading Behavior in Large Blog Graphs How does information propagate over the blogosphere? Blogs Posts Links Information cascade J. Leskovec, M.McGlohon, C. Faloutsos, N. Glance, M. Hurst. Cascading Behavior in Large Blog Graphs. SDM 2007.

60 Cascades on the Blogosphere Cascade is graph induced by a time ordered propagation of information (edges) Cascades B1B1 B2B2 B4B4 B3B3 a b c d e B1B1 B2B2 B4B4 B3B3 1 1 2 1 3 1 d e b c e a Blogosphere blogs + posts Blog network links among blogs Post network links among posts Prakash 2012

61 Blog data  45,000 blogs participating in cascades  All their posts for 3 months (Aug-Sept ‘05)  2.4 million posts  ~5 million links (245,404 inside the dataset) Time [1 day] Number of posts

62 Popularity over time Post popularity drops-off – exponentially? lag: days after post # in links 1 2 3 @t @t + lag Prakash 2012

63 Popularity over time Post popularity drops-off – exponentially? POWER LAW! Exponent? # in links (log) days after post (log) Prakash 2012

64 Popularity over time Post popularity drops-off – exponentially? POWER LAW! Exponent? -1.6 close to -1.5: Barabasi’s stack model and like the zero-crossings of a random walk # in links (log) -1.6 days after post (log) Prakash 2012

65 -1.5 slope Prakash 2012 J. G. Oliveira & A.-L. Barabási Human Dynamics: The Correspondence Patterns of Darwin and Einstein. Nature 437, 1251 (2005). [PDF]PDF

66 Part 2: Empirical Studies Q4: How do cascades look like? Q5: How does activity evolve over time? Prakash 2012

67 Meme (# of mentions in blogs) – short phrases Sourced from U.S. politics in 2008 “you can put lipstick on a pig” “yes we can” Rise and fall patterns in social media Prakash 2012

68 Rise and fall patterns in social media Can we find a unifying model, which includes these patterns? four classes on YouTube [Crane et al. ’08] six classes on Meme [Yang et al. ’11] Prakash 2012

69 Rise and fall patterns in social media Answer: YES! We can represent all patterns by single model In Matsubara+ SIGKDD 2012 Prakash 2012

70 Main idea - SpikeM -1. Un-informed bloggers (uninformed about rumor) -2. External shock at time n b (e.g, breaking news) -3. Infection (word-of-mouth) Infectiveness of a blog-post at age n: -Strength of infection (quality of news) -Decay function (how infective a blog posting is) Time n=0Time n=n b Time n=n b +1 β Power Law Prakash 2012

71 -1.5 slope Prakash 2012 J. G. Oliveira & A.-L. Barabási Human Dynamics: The Correspondence Patterns of Darwin and Einstein. Nature 437, 1251 (2005). [PDF]PDF

72 SpikeM - with periodicity Full equation of SpikeM Periodicity 12pm Peak activity 3am Low activity Time n Bloggers change their activity over time (e.g., daily, weekly, yearly) Bloggers change their activity over time (e.g., daily, weekly, yearly) activity Prakash 2012

73 Tail-part forecasts SpikeM can capture tail part Prakash 2012

74 “What-if” forecasting e.g., given (1) first spike, (2) release date of two sequel movies (3) access volume before the release date ? ? ? ? (1) First spike(2) Release date(3) Two weeks before release Prakash 2012

75 “What-if” forecasting – SpikeM can forecast not only tail-part, but also rise-part ! SpikeM can forecast upcoming spikes (1) First spike(2) Release date(3) Two weeks before release Prakash 2012

76 Outline Motivation Part 1: Understanding Epidemics (Theory) Part 2: Policy and Action (Algorithms) Part 3: Learning Models (Empirical Studies) Conclusion Prakash 2012

77 Conclusions Fast Immunization – Min/Max. drop in eigenvalue, NetGel, NetMelt Finding Culprits Automatically – MDL+Exoneration, Linear Time Algo Bursts: SpikeM model – Exponential growth, Power-law decay Prakash 2012

78 ML & Stats. Comp. Systems Theory & Algo. Biology Econ. Social Science Engg. Propagation on Networks Prakash 2012

79 References 1. Winner-takes-all: Competing Viruses or Ideas on fair-play networks (B. Aditya Prakash, Alex Beutel, Roni Rosenfeld, Christos Faloutsos) – In WWW 2012, Lyon 2. Threshold Conditions for Arbitrary Cascade Models on Arbitrary Networks (B. Aditya Prakash, Deepayan Chakrabarti, Michalis Faloutsos, Nicholas Valler, Christos Faloutsos) - In IEEE ICDM 2011, Vancouver (Invited to KAIS Journal Best Papers of ICDM.) 3. Times Series Clustering: Complex is Simpler! (Lei Li, B. Aditya Prakash) - In ICML 2011, Bellevue 4. Epidemic Spreading on Mobile Ad Hoc Networks: Determining the Tipping Point (Nicholas Valler, B. Aditya Prakash, Hanghang Tong, Michalis Faloutsos and Christos Faloutsos) – In IEEE NETWORKING 2011, Valencia, Spain 5. Formalizing the BGP stability problem: patterns and a chaotic model (B. Aditya Prakash, Michalis Faloutsos and Christos Faloutsos) – In IEEE INFOCOM NetSciCom Workshop, 2011. 6. On the Vulnerability of Large Graphs (Hanghang Tong, B. Aditya Prakash, Tina Eliassi-Rad and Christos Faloutsos) – In IEEE ICDM 2010, Sydney, Australia 7. Virus Propagation on Time-Varying Networks: Theory and Immunization Algorithms (B. Aditya Prakash, Hanghang Tong, Nicholas Valler, Michalis Faloutsos and Christos Faloutsos) – In ECML-PKDD 2010, Barcelona, Spain 8. MetricForensics: A Multi-Level Approach for Mining Volatile Graphs (Keith Henderson, Tina Eliassi-Rad, Christos Faloutsos, Leman Akoglu, Lei Li, Koji Maruhashi, B. Aditya Prakash and Hanghang Tong) - In SIGKDD 2010, Washington D.C. 9. Parsimonious Linear Fingerprinting for Time Series (Lei Li, B. Aditya Prakash and Christos Faloutsos) - In VLDB 2010, Singapore 10. EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs (B. Aditya Prakash, Ashwin Sridharan, Mukund Seshadri, Sridhar Machiraju and Christos Faloutsos) – In PAKDD 2010, Hyderabad, India 11. BGP-lens: Patterns and Anomalies in Internet-Routing Updates (B. Aditya Prakash, Nicholas Valler, David Andersen, Michalis Faloutsos and Christos Faloutsos) – In ACM SIGKDD 2009, Paris, France. 12. Surprising Patterns and Scalable Community Detection in Large Graphs (B. Aditya Prakash, Ashwin Sridharan, Mukund Seshadri, Sridhar Machiraju and Christos Faloutsos) – In IEEE ICDM Large Data Workshop 2009, Miami 13. FRAPP: A Framework for high-Accuracy Privacy-Preserving Mining (Shipra Agarwal, Jayant R. Haritsa and B. Aditya Prakash) – In Intl. Journal on Data Mining and Knowledge Discovery (DKMD), Springer, vol. 18, no. 1, February 2009, Ed: Johannes Gehrke. 14. Complex Group-By Queries For XML (C. Gokhale, N. Gupta, P. Kumar, L. V. S. Lakshmanan, R. Ng and B. Aditya Prakash) – In IEEE ICDE 2007, Istanbul, Turkey. Prakash 2012

80 Understanding and Managing Cascades on Large Networks B. Aditya Prakash http://www.cs.vt.edu/~badityap Prakash 2012 Sounds Interesting? I am looking for Ph.D. students---drop me an email with your CV!


Download ppt "Understanding and Managing Cascades on Large Graphs B. Aditya Prakash Computer Science Virginia Tech. CS Seminar 11/30/2012."

Similar presentations


Ads by Google