Worms, Viruses, and Cascading Failures in networks D. Towsley U. Massachusetts Collaborators: W. Gong, C. Zou (UMass) A. Ganesh, L. Massoulie (Microsoft)


1 Worms, Viruses, and Cascading Failures in networks D. Towsley U. Massachusetts Collaborators: W. Gong, C. Zou (UMass) A. Ganesh, L. Massoulie (Microsoft)

2 o Internet as enabler of terrific apps

3 o … but also of malicious behavior  worms, viruses o Internet as a complex system  critical DNS, BGP infrastructures

4 Worms and failures o Code Red worm  more than 360,000 infected in less than one day  disrupted parts of BGP infrastructure o SQL Slammer  less than 15 minutes to infect 75,000 hosts  congested parts of Internet  BGP errors in one network → cascade of faults in BGP in another network

5 Goals o what are appropriate models?  deterministic  stochastic o what makes worm/virus/failure virulent? o how does topology affect virulence?

6 Outline o worms, deterministic models o cascading failures, stochastic models o summary

7 Worm spreading behavior o scan for vulnerable hosts  sequential, random, topological  uniform, local preference o virulence sensitive to  scanning strategy  host speed, bandwidth  protocol  …

8 Worm spreading model o address space, size Ω o N vulnerable hosts o scan rate (per host), η o Ω ≫ N

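9 Simple worm spreading model I(t) - number of infected hosts at time t Epidemic model: dI(t)/dt = (η/Ω) I(t) [N − I(t)], with initial condition I(0), where η is the per-host scan rate, Ω the size of the scanned address space and N the number of vulnerable hosts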

10 Code Red: model o measurements from two Class A networks  scan rate ∝ I(t) o epidemic model matches increasing part of observed Code Red data (Staniford) What about decrease? o human countermeasures o congestion (Zou et al., 2002) [Figure: observed scan rate vs. time; data from D. Goldsmith and K. Eichman]

11 Assumptions o classic epidemic model  ignore countermeasures  ignore congestion o Code Red parameters  scan rate η = 358/min  N = 360,000  uniform scan, address space Ω = 2^32 o I(0) = 10 o hundreds of minutes to spread
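The spread time quoted on this slide can be checked by integrating the epidemic model numerically. The following is a minimal sketch, assuming the model has the standard logistic form dI/dt = (η/Ω) I (N − I), with η the per-host scan rate and Ω the size of the scanned address space (symbol names chosen here for illustration). The function name, step size and time horizon are illustrative choices, not taken from the slides.

```python
def simulate_epidemic(eta, omega, n, i0, dt=0.1, t_end=1000.0):
    """Integrate dI/dt = (eta/omega) * I * (N - I) with classical RK4.

    Times are in minutes when eta is given in scans per minute.
    Returns a list of (time, infected) pairs.
    """
    rate = eta / omega

    def growth(i):
        return rate * i * (n - i)

    infected = float(i0)
    trajectory = [(0.0, infected)]
    steps = int(round(t_end / dt))
    for step in range(1, steps + 1):
        k1 = growth(infected)
        k2 = growth(infected + 0.5 * dt * k1)
        k3 = growth(infected + 0.5 * dt * k2)
        k4 = growth(infected + dt * k3)
        infected += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        trajectory.append((step * dt, infected))
    return trajectory


# Code Red parameters from the slide: 358 scans/min, N = 360,000, 2^32 addresses.
traj = simulate_epidemic(eta=358.0, omega=2**32, n=360_000, i0=10)
t_half = next(t for t, i in traj if i >= 180_000)
print(f"half of the vulnerable hosts infected after about {t_half:.0f} minutes")
```

Under these assumptions the half-infection time comes out in the hundreds of minutes, consistent with the slide.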

12 Worm virulence o increase scan rate η o increase I(0) o decrease address space Ω

13 Worm virulence o increase scan rate η o increase I(0) o decrease address space Ω o smarter scanning

14 The perfect worm o perfect worm  scan vulnerable nodes exactly once o flash worm (Staniford,…)  uniform scan of vulnerable nodes (address space Ω = N)

15 Perfect Code Red worm o I(0) = 10 o scan rate η = 358/min o N = 360,000 o all hosts infected within 2 sec. o add 2 sec. infection delay -> six-fold slowdown o random scan almost perfect!
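The "within 2 sec." figure can be reproduced with a back-of-the-envelope calculation. The sketch below assumes that a perfect worm never wastes a scan, so that dI/dt = η I until all N hosts are infected, and that a flash worm scanning uniformly over exactly the N vulnerable hosts follows the logistic model with Ω = N. Both model forms, the symbol η for the scan rate, and the 99% cutoff are assumptions made for illustration, and infection delay is ignored.

```python
import math


def perfect_worm_time(eta_per_sec, n, i0):
    """Time for I(t) = I(0) * exp(eta * t) to reach N (every scan is a new hit)."""
    return math.log(n / i0) / eta_per_sec


def flash_worm_time(eta_per_sec, n, i0, frac=0.99):
    """Time for logistic growth with rate eta (uniform scan over the N
    vulnerable hosts only) to infect a fraction `frac` of them."""
    return math.log(frac * (n - i0) / ((1.0 - frac) * i0)) / eta_per_sec


eta = 358.0 / 60.0  # slide value of 358 scans/min, converted to scans/sec
t_perfect = perfect_worm_time(eta, 360_000, 10)
t_flash = flash_worm_time(eta, 360_000, 10)
print(f"perfect worm: {t_perfect:.2f} s, flash worm (99%): {t_flash:.2f} s")
```

With the slide's parameters the perfect-worm time is a little under 2 seconds, and the uniform scan over vulnerable hosts is slower by less than a factor of two.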

16 Perfect Code Red worm o I(0) = 10 o scan rate η = 358/min o N = 360,000 o all hosts infected within 2 sec. o add 2 sec. infection delay -> six-fold slowdown o random scan almost perfect!

17 Hitlist, routing worms o hitlist worm  increases I(0) o routing worm  decreases address space Ω  BGP table information: Ω = 0.29 × 2^32 (29% of IP address space)

18 Hitlist, routing worms o Code Red style worm o scan rate η = 358/min o N = 360,000 o hitlist, I(0) = 10,000 o routing worm as effective as hitlist worm o hitlist/routing worm extremely virulent
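The claim that a routing worm is about as effective as a hitlist worm can be illustrated with the closed-form solution of the logistic model. This is a sketch under the assumption that all three variants follow dI/dt = (η/Ω) I (N − I), with η the scan rate and Ω the scanned address space (symbol names assumed), and that the routing worm scans 0.29 × 2^32 addresses as stated on the previous slide. The function name and the 50% target are illustrative.

```python
import math


def time_to_fraction(frac, eta, omega, n, i0):
    """Closed-form time at which the logistic model reaches I = frac * N."""
    k = eta * n / omega  # early-phase exponential growth rate
    return math.log(frac * (n - i0) / ((1.0 - frac) * i0)) / k


N, ETA, FULL_SPACE = 360_000, 358.0, 2**32
variants = {
    "Code Red (uniform scan)": dict(omega=FULL_SPACE, i0=10),
    "hitlist worm": dict(omega=FULL_SPACE, i0=10_000),
    "routing worm": dict(omega=0.29 * FULL_SPACE, i0=10),
}
half_times = {
    name: time_to_fraction(0.5, ETA, n=N, **params)
    for name, params in variants.items()
}
for name, minutes in half_times.items():
    print(f"{name}: 50% infected after about {minutes:.0f} min")
```

Shrinking Ω scales the whole time axis, while a hitlist only removes the slow start-up phase; with these parameters the two effects happen to be of similar size.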

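19 Local preference worm o K subnetworks o p – probability scan local subnet o (1-p) – prob. scan outside local subnet [Figure: subnets 1, 2, …, K; a scan stays in the local subnet with probability p and leaves it with probability 1-p]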

20 Local preference worm o N_k, no. vulnerable hosts in subnet k o I_k(t), no. infected hosts in subnet k o fits epidemic model for interacting groups: set of coupled ODEs

21 Local preference worm o K = 116 o N_k = 360,000/K o I_1(0) = 10; I_k(0) = 0, k > 1 o scan rate η = 358/min o provides some of the locality of a routing worm
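The coupled ODEs for the local preference worm can be sketched as follows. The slides do not give the equations, so this example assumes a specific form: every subnet is a /8 with 2^24 addresses, an infected host scans at rate η (the slide's 358/min), picks its own subnet with probability p and otherwise a uniformly random address in the other K − 1 subnets, which gives dI_k/dt = (N_k − I_k) [p η I_k / 2^24 + (1 − p) η Σ_{j≠k} I_j / ((K − 1) 2^24)]. The value p = 0.5, the function name and the Euler step are hypothetical choices.

```python
def local_preference(k_subnets=116, n_total=360_000, eta=358.0, p=0.5,
                     subnet_size=2**24, i0=10.0, dt=0.5, t_end=600.0):
    """Forward-Euler integration of the assumed interacting-groups model.

    Returns the total number of infected hosts at each time step (minutes).
    """
    n_k = n_total / k_subnets                # vulnerable hosts per subnet
    infected = [0.0] * k_subnets
    infected[0] = i0                         # infection starts in subnet 1
    local = p * eta / subnet_size            # per-pair rate inside a subnet
    remote = (1.0 - p) * eta / ((k_subnets - 1) * subnet_size)
    totals = [sum(infected)]
    for _ in range(int(round(t_end / dt))):
        total = sum(infected)
        infected = [
            min(n_k, i + dt * (n_k - i) * (local * i + remote * (total - i)))
            for i in infected
        ]
        totals.append(sum(infected))
    return totals


totals = local_preference()
print(f"infected after 120 min: {totals[240]:.0f}, after 600 min: {totals[-1]:.0f}")
```

Varying p in this sketch shows how much of the early growth stays inside the initially infected subnet before the worm seeds the others.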

22 Questions o topological worms o sequential scan o bandwidth constraints

23 o topology? o failure recovery?

24 Topology and fast/slow recovery o model description o general network topologies  conditions for fast-slow recovery o specific network topologies  complete graphs (BGP routers)  hypercubes (peer-to-peer networks)  power-law graphs (Internet AS graph; e-mail address book graph)

25 Susceptible-Infective-Susceptible (SIS) epidemic model Also known as contact process; see [Liggett] o topology: undirected, finite graph G=(V,E), connected o X_v = 1 if node v down (infected); X_v = 0 if node v up (healthy)

26 Model o {X_v : v ∈ V} Markov process on {0,1}^V with jump rates (infection rate β, recovery rate δ):  X_v → 1 with rate β Σ_{w~v} X_w (sum over neighbours w of v)  X_v → 0 with rate δ o unique absorbing state at 0 o all other states communicate, 0 is reachable
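The Markov process on this slide can be simulated directly with a Gillespie-style event loop. The sketch below assumes the usual reading of the garbled rates: a healthy node becomes infected at rate β times its number of infected neighbours, and an infected node recovers at rate δ (symbol names assumed). The function names, the `t_max` cutoff and the ring example are illustrative.

```python
import random


def sis_extinction_time(adj, beta, delta, infected0, rng, t_max=1e6):
    """Simulate the SIS contact process until no node is infected.

    adj maps each node to a list of neighbours. Returns the absorption time,
    or a value >= t_max if the epidemic is still alive at the cutoff.
    """
    infected = set(infected0)
    t = 0.0
    while infected and t < t_max:
        # pressure[v] = number of infected neighbours of healthy node v
        pressure = {}
        for w in infected:
            for v in adj[w]:
                if v not in infected:
                    pressure[v] = pressure.get(v, 0) + 1
        infect_rate = beta * sum(pressure.values())
        cure_rate = delta * len(infected)
        total = infect_rate + cure_rate
        t += rng.expovariate(total)          # time to the next event
        if rng.random() * total < cure_rate:
            # every infected node recovers at the same rate delta
            infected.remove(rng.choice(sorted(infected)))
        else:
            # a healthy node is infected in proportion to its pressure
            nodes = list(pressure)
            weights = [pressure[v] for v in nodes]
            infected.add(rng.choices(nodes, weights)[0])
    return t


def ring(n):
    return {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}


rng = random.Random(0)
runs = [sis_extinction_time(ring(20), 0.2, 1.0, range(20), rng) for _ in range(200)]
print(f"mean time to recovery on a 20-node ring: {sum(runs) / len(runs):.2f}")
```

Raising β in this example makes the sample mean grow quickly, which is why the cutoff `t_max` is there.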

27 Time to absorption o system eventually recovers o how long does this take? o T = time to hit 0 (from a given initial condition)  how does E[T] depend on β, δ, G?

28 Example o G = line segment or ring with n nodes o Fix recovery rate δ o Theorem (Durrett and Liu): There is a critical infection rate β_c > 0 such that  if β < β_c, then E[T] = O(log n)  if β > β_c, then log E[T] ≈ n^a o signature of phase transition in infinite 1-D lattice.

29 Fast recovery, spectral radius ρ - spectral radius of graph adjacency matrix, A; n=|V|. Then, with infection rate β and recovery rate δ, P(X(t) ≠ 0) ≤ c n^{1/2} exp([βρ − δ]t). Hence, when βρ < δ, survival time T satisfies: E(T) ≤ [log(n)+1]/[δ − βρ]

30 Coupling proof Consider “Branching Random Walk”, i.e. Markov process {Y_v}_{v ∈ V}  Y_v → Y_v + 1 with rate β Σ_{w~v} Y_w = β (AY)_v  Y_v → Y_v − 1 with rate δ Y_v Can couple processes so that, for all t, X(t) ≤ Y(t).

31 Branching random walk bound By “linearity” of Y, dE[Y(t)]/dt = (βA − δI) E[Y(t)], so E[Y(t)] = exp(t(βA − δI)) Y(0); Use P(X(t) ≠ 0) ≤ Σ_{v ∈ V} E[Y_v(t)]
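The step from the branching random walk to the bound on the expected recovery time can be written out as follows. This is a sketch of the argument, assuming β is the infection rate, δ the recovery rate, ρ the spectral radius of the symmetric adjacency matrix A, Y(0) = X(0) ∈ {0,1}^V, and βρ < δ; the constant in front of the exponential is bounded crudely by n.

```latex
\begin{align*}
P\bigl(X(t) \neq 0\bigr)
  &\le \sum_{v \in V} E[Y_v(t)]
   = \mathbf{1}^{\top} e^{t(\beta A - \delta I)}\, Y(0) \\
  &\le \|\mathbf{1}\|_2 \, \|Y(0)\|_2 \, e^{(\beta\rho - \delta)t}
   \le n \, e^{-(\delta - \beta\rho)t}, \\[4pt]
E[T] &= \int_0^\infty P\bigl(X(t) \neq 0\bigr)\, dt
   \le \int_0^\infty \min\bigl(1,\; n\, e^{-(\delta - \beta\rho)t}\bigr)\, dt
   = \frac{\log n + 1}{\delta - \beta\rho}.
\end{align*}
```

The second line uses that A is symmetric, so the operator norm of exp(t(βA − δI)) is exp((βρ − δ)t), and that both vectors have Euclidean norm at most n^{1/2}. The last integral splits at t = log(n)/(δ − βρ), where the two arguments of the minimum coincide.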

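32 Slow recovery Graph isoperimetric constant: η = min over S ⊂ V with 0 < |S| ≤ n/2 of E(S, V∖S)/|S|, where E(S, V∖S), the number of edges leaving S, is the “perimeter” and |S| is the “area”

33 Generalized isoperimetric constant η_m = min over S ⊂ V with 0 < |S| ≤ m of E(S, V∖S)/|S|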
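For small graphs the generalized isoperimetric constant can be computed by brute force. The sketch below assumes the definition η_m = min over non-empty node sets S with |S| ≤ m of (number of edges leaving S)/|S|, which is consistent with the values the later slides quote for complete graphs, hypercubes and stars. The function names are illustrative, and the enumeration is exponential in m, so it is only meant for toy examples.

```python
from itertools import combinations


def isoperimetric_constant(adj, m):
    """Brute-force eta_m for a graph given as adjacency lists (nodes 0..n-1)."""
    best = float("inf")
    for size in range(1, m + 1):
        for subset in combinations(range(len(adj)), size):
            inside = set(subset)
            # count edges with exactly one endpoint inside the subset
            boundary = sum(1 for v in inside for w in adj[v] if w not in inside)
            best = min(best, boundary / size)
    return best


def complete(n):
    return [[w for w in range(n) if w != v] for v in range(n)]


def hypercube(d):
    return [[v ^ (1 << b) for b in range(d)] for v in range(2 ** d)]


def star(n):
    return [list(range(1, n))] + [[0] for _ in range(n - 1)]


print("complete graph, n=8, m=3:", isoperimetric_constant(complete(8), 3))
print("hypercube, d=4, m=4:", isoperimetric_constant(hypercube(4), 4))
print("star, n=9, m=4:", isoperimetric_constant(star(9), 4))
```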

34 Slow die-out and isoperimetric constant Suppose for some m ≤ n/2, r := βη_m/δ > 1. Then, with positive probability, epidemics survive for time at least r^m/[2δm]. Hence, if m = n^a, survival time T satisfies log(E[T]) = Ω(n^a)

35 Coupling proof Let |X| = Σ_v X_v. Then |X| dominates process Z on {0,…,m} with transition rates: z → z+1 at rate βη_m z, z → z−1 at rate δz. Then study absorption time for Z

36 Complete graph Here, spectral radius ρ = n−1, isoperimetric constant η_m = n−m. By picking m = n^a, a < 1, thresholds on the infection-to-recovery ratio β/δ: fast recovery if β/δ < 1/(n−1); slow recovery if β/δ > 1/(n − n^a)

37 Hypercube {0,1}^d Here, d = log_2(n) and spectral radius ρ = d. For m = 2^k, k < d, isoperimetric constant η_m = d − k. Hence, for k = a d, thresholds: fast recovery if β/δ < 1/d; slow recovery if β/δ > 1/[d(1 − a)]

38 Erdős-Rényi random graph o edge between each pair of nodes present with probability p_n independent of others o dense: d_n := n p_n = Ω(log n)  then ρ ~ η ~ d_n with high probability

39 Star network o spectral radius: n^{1/2} o isoperimetric constant: η_m = 1 for all m < n/2 o general results not useful Specialized analysis yields:  for arbitrary constant c > 0, if β < c/n^{1/2}, fast recovery, E[T] = O(log(n))  if β/δ > n^{a−1/2}, for a > 0, slow recovery, log(E[T]) = Ω(n^a)

40 Power-law random graph Power-law graph with exponent γ: number of degree-k vertices ∝ k^{−γ}. E.g. Internet AS graph with γ = 2.1. Expected degree PLRG [Chung et al]: o expected degrees w_1 > ··· > w_n: edge (i,j) present w.p. w_i w_j / Σ_k w_k  particular choice: w_i = c_1 (i + c_2)^{−1/(γ−1)}
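The expected-degree construction can be sketched in a few lines. The example assumes the reading w_i = c_1 (i + c_2)^(−1/(γ−1)) of the garbled formula, with γ the power-law exponent, and connects each pair i < j independently with probability w_i w_j / Σ_k w_k. Self-loops are skipped and the probability is clipped at 1 as a safeguard; both are choices made here. The constants c_1 = 50, c_2 = 5, γ = 2.5 and n = 300 are hypothetical values picked so that the edge probabilities stay below 1.

```python
import random


def expected_degrees(n, gamma, c1, c2):
    """Weights w_1 > ... > w_n following the assumed power-law form."""
    return [c1 * (i + c2) ** (-1.0 / (gamma - 1.0)) for i in range(1, n + 1)]


def sample_plrg(weights, rng):
    """Sample an undirected graph; edge {i, j} appears w.p. w_i * w_j / sum(w)."""
    n = len(weights)
    total = sum(weights)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < min(1.0, weights[i] * weights[j] / total):
                adj[i].add(j)
                adj[j].add(i)
    return adj


weights = expected_degrees(300, gamma=2.5, c1=50.0, c2=5.0)
graph = sample_plrg(weights, random.Random(0))
mean_degree = sum(len(nbrs) for nbrs in graph) / len(graph)
print(f"max expected degree m = {weights[0]:.1f}, sampled mean degree = {mean_degree:.2f}")
```

In such a sample the degree of node i fluctuates around w_i, so the first few nodes act as hubs.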

41 Power-law random graph (2) Spectral radius of PLRG [Chung et al., 03]: Denote by m max. expected degree (m = w_1), and by d average of expected degrees. Then:

42 PLRG, γ > 2.5 Epidemics on full graph live longer than on sub-graph. Look at star induced by node 1: slow die-out for β/δ > m^{a−1/2}. Compare to spectral radius condition: fast die-out for β/δ < m^{−1/2}. Two thresholds differ by m^a; same gap as for star

43 PLRG, 2 < γ < 2.5 Consider top N nodes, for suitable N; Erdős-Rényi core, with isoperimetric constant: η = F(γ) ρ. Gap between thresholds ρ and η: constant factor, F(γ)

44 Open problems o gap between upper and lower bounds in  sparse ER graphs  power law random graphs for γ < 2.5 o spectral radius bound tight in examples, always true? o conditioned on slow recovery, how many nodes are down at intermediate times? o extensions to other graphs and to SIR epidemics

45 Observations o neither parameter tight o gap for topologies with diverse degrees  spectral radius “seems” to be right o nothing between log n and exp(n^a)?

46 Hitlist, routing worms o hitlist worm  increase I(0) o routing worm  decrease address space Ω  BGP table information: Ω = 0.29 × 2^32 (29% of IP address space)  /8 aggregation: Ω = 0.45 × 2^32 (116 out of 256 possible 8-bit prefixes)

47 The appearance of phase transitions N = 200, k_s = 1, k_l = 0.01 Mean time to absorption drops from 10^47 to about 0 within a few states

48 Accuracy of fluid model o population: 360,000 o scan rate η = N(358/min, 100^2) normal distr. o scanning space: 2^32 o I(0) = 1 o 100 simulations

49 Accuracy of fluid model o population: 360,000 o scan rate η = N(358/min, 100^2) normal distr. o scanning space: 2^32 o I(0) = 10 o 100 simulations

50 Accuracy of fluid model o population: 360,000 o scan rate η = N(358/min, 100^2) normal distr. o scanning space: 2^32 o I(0) = 10 o 100 simulations

51 Local preference worm o β - local scan rate o β’ - global scan rate o initial conditions I_k(0)

52 Erdős-Rényi random graph o edge between each pair of nodes present with probability p_n independent of others o sparse: p_n = c log(n)/n, c > 1.  then ρ ≤ c’ log(n), η ≥ c’’ log(n) with high probability, for some c’’ < c < c’ o dense: d_n := n p_n = Ω(log n)  then ρ ~ η ~ d_n with high probability

