Rumors, consensus and epidemics on networks

J. Ganesh, University of Bristol

Rumor spreading
Population of size n. One person knows a rumor at time 0. Time is discrete. In each time step, each person who knows the rumor chooses another person at random and informs them of it. How long before everyone knows the rumor?
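The discrete-time push process can be simulated directly. This is a minimal sketch: the function name `push_rumor_rounds` is hypothetical, and "at random" is taken to mean uniformly among the other n - 1 people. Each informed person informs at most one new person per step, so the informed count at most doubles. At least log2 n rounds are therefore needed.

```python
import math
import random

def push_rumor_rounds(n, seed=None):
    """Synchronous push: number of rounds until all n people are informed."""
    rng = random.Random(seed)
    informed = [False] * n
    informed[0] = True              # person 0 knows the rumor at time 0
    count, rounds = 1, 0
    while count < n:
        rounds += 1
        # snapshot: only people informed before this round make a call in it
        callers = [i for i in range(n) if informed[i]]
        for i in callers:
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1              # uniform over the other n - 1 people
            if not informed[j]:
                informed[j] = True
                count += 1
    return rounds

n = 1024
print(push_rumor_rounds(n, seed=1), "rounds; log2(n) + ln(n) =",
      round(math.log2(n) + math.log(n), 1))
```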

Motivation
A simple model for the diffusion of information, spread of infection, etc., over social networks. Basis of information dissemination algorithms in large-scale distributed systems. A primitive of algorithms for distributed computing, distributed consensus, etc.

Results
With high probability, all n people learn the rumor in time
$\log_2 n + \ln n + o(\log n)$: Frieze and Grimmett
$\log_2 n + \ln n + O(1)$: Pittel
Intuition: in the early stages, the number of informed people doubles in each time step; in the late stages, the number of uninformed people shrinks by a factor of $e$ (i.e., is multiplied by $1/e$) in each time step.
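The two-phase intuition can be checked with a heuristic deterministic recursion. This is a sketch, not the Frieze-Grimmett or Pittel proof. With informed fraction i, an uninformed person is missed by all pushes with probability about $e^{-i}$. The update $i \leftarrow 1 - (1-i)e^{-i}$ is therefore roughly doubling while i is small, and multiplies the uninformed fraction by about $1/e$ once i is near 1. The function name is hypothetical.

```python
import math

def mean_field_push_steps(n):
    """Steps of the heuristic recursion i <- 1 - (1 - i) * exp(-i), starting from i = 1/n."""
    i, steps = 1.0 / n, 0
    while (1.0 - i) * n >= 1.0:   # stop once fewer than one person is expected to be uninformed
        i = 1.0 - (1.0 - i) * math.exp(-i)
        steps += 1
    return steps

n = 10**6
print(mean_field_push_steps(n), "vs log2(n) + ln(n) =", round(math.log2(n) + math.log(n), 1))
```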

A continuous time model
Identify individuals with nodes of a complete graph. Associate mutually independent unit-rate Poisson processes, one with each node. If a node is informed, then, at the points of its associated Poisson process, it informs a randomly chosen node. How long until all nodes are informed?

Edge-driven model

Analysis
$T_k$: first time that k nodes know the rumor.
The number of edges between informed and uninformed nodes is $k(n-k)$. The time to inform one more node is the minimum of $k(n-k)$ independent Exp(1/n) random variables, i.e., it is Exp($k(n-k)/n$).

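Analysis (cont.)
Time to inform all nodes is
$T_n = (T_n - T_{n-1}) + (T_{n-1} - T_{n-2}) + \cdots + (T_2 - T_1) + T_1$, where $T_1 = 0$.
So $E[T_n] = 2\left(1 + \tfrac{1}{2} + \cdots + \tfrac{1}{n-1}\right) \approx 2 \log n$. Similar calculations show that the variance stays bounded (it tends to $\pi^2/3$). Chebyshev's inequality implies $T_n = 2 \log n + O(1)$ in probability.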

$G=(V,E)$: directed, strongly connected graph. $R = (r_{ij})_{i,j \in V}$: contact rate matrix.
Model: node i contacts node j at the points of a Poisson process of rate $r_{ij}$, and informs node j at this time if node i is informed.
Mosk-Aoyama & Shah: bound the time to inform all nodes, based on properties of G or R.
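This is a minimal event-driven (Gillespie-style) sketch of the model on a general rate matrix R. The names `rumor_time` and `cycle_rates` are hypothetical. Contacts between two informed nodes change nothing, and Poisson clocks are memoryless, so only informed-to-uninformed rates matter. The check uses the cycle with $r_{ij} = 1/2$, where the informed arc grows at total rate 1. The mean is then $n-1$.

```python
import random

def rumor_time(R, source=0, seed=None):
    """Time until every node is informed, when node i contacts node j at rate R[i][j]."""
    rng = random.Random(seed)
    n = len(R)
    informed = {source}
    t = 0.0
    while len(informed) < n:
        # rate at which each uninformed node j is contacted by some informed node
        targets = [j for j in range(n) if j not in informed]
        rates = [sum(R[i][j] for i in informed) for j in targets]
        total = sum(rates)
        if total == 0:
            raise ValueError("remaining nodes cannot be reached from the informed set")
        t += rng.expovariate(total)                         # wait for the next new node
        informed.add(rng.choices(targets, weights=rates)[0])
    return t

def cycle_rates(n):
    """Cycle on n nodes with rate 1/2 on each directed edge."""
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        R[i][(i + 1) % n] = R[i][(i - 1) % n] = 0.5
    return R

n = 10
times = [rumor_time(cycle_rates(n), seed=s) for s in range(400)]
print(sum(times) / len(times), "vs n - 1 =", n - 1)
```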

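Graph properties
The generalized conductance of the non-negative matrix R is defined as
$\Phi(R) = \min_{\emptyset \neq S \subsetneq V} \dfrac{n \sum_{i \in S,\, j \notin S} r_{ij}}{|S|\,|S^c|}$.
If R is the adjacency matrix, this is closely related to the isoperimetric constant.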

$T_k$: first time that k nodes are informed; $S(k)$: set of informed nodes at this time.
The total rate at which informed nodes contact uninformed nodes is $\sum_{i \in S(k),\, j \notin S(k)} r_{ij}$, which is at least $k(n-k)\Phi(R)/n$ by the definition of $\Phi(R)$.
So the time to inform one more node is stochastically dominated by Exp($k(n-k)\Phi(R)/n$). This implies that the mean time to inform all nodes is bounded by $2 \log(n)/\Phi(R)$.

Examples
$G = K_n$; $r_{ij} = 1/n$ for all $i, j \in V$: $\Phi(R) = 1$, bound is $2 \log n$, $E[T] \approx 2 \log n$.
G is the star on n nodes; $r_{ij} = 1/n$ if i is the hub and j a leaf, and $r_{ij} = 1$ if i is a leaf and j the hub: $\Phi(R) \approx 1/n$, bound is $2n \log n$, $E[T] \approx n \log n$.
G is the cycle on n nodes; $r_{ij} = 1/2$ for all $(i,j) \in E$: $\Phi(R) = 4/n$, bound is $(n \log n)/2$, $E[T] = n - 1$.
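The values of $\Phi(R)$ in the three examples can be checked by brute force on small graphs. This sketch assumes $\Phi(R)$ is the minimum, over nonempty proper subsets S, of $n \sum_{i \in S, j \notin S} r_{ij} / (|S||S^c|)$. That normalisation reproduces the quoted values 1, about 1/n (exactly 1/(n-1)) and 4/n. The enumeration is exponential in n, and the function names are hypothetical.

```python
import math

def conductance(R):
    """Brute-force Phi(R) over all nonempty proper subsets S (small n only)."""
    n = len(R)
    best = math.inf
    for mask in range(1, 2**n - 1):
        S = [i for i in range(n) if mask >> i & 1]
        Sc = [j for j in range(n) if not mask >> j & 1]
        cut = sum(R[i][j] for i in S for j in Sc)
        best = min(best, n * cut / (len(S) * len(Sc)))
    return best

def complete(n):
    return [[0.0 if i == j else 1 / n for j in range(n)] for i in range(n)]

def star(n):
    R = [[0.0] * n for _ in range(n)]
    for leaf in range(1, n):            # node 0 is the hub
        R[0][leaf], R[leaf][0] = 1 / n, 1.0
    return R

def cycle(n):
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        R[i][(i + 1) % n] = R[i][(i - 1) % n] = 0.5
    return R

n = 8
for name, R in [("complete", complete(n)), ("star", star(n)), ("cycle", cycle(n))]:
    phi = conductance(R)
    print(name, round(phi, 4), "bound 2 log(n)/Phi =", round(2 * math.log(n) / phi, 2))
```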

Rumor spreading is fast on the complete graph and on expander graphs, but slow on the cycle, grids and geometric random graphs. Can it be sped up by passing the rumor to random contacts rather than neighbors? Not obvious: sampling random contacts takes time.
Dimakis, Sarwate, Wainwright: Geographic gossip
Benezit, Dimakis, Thiran, Vetterli: Randomized path averaging

Other models: stifling
Stop spreading rumors when they are stale. Nodes may be uninformed (U), spreaders (I) or stiflers (S):
U + I → I + I; I + I → I + S; I + S → S + S.
The rumor only reaches a fraction of the population, rather than all nodes.
Daley & Kendall; Maki & Thompson
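This sketch simulates the population counts of the stifling model through its embedded jump chain. At each step a spreader contacts a uniformly random other person. Both I + I and I + S remove one spreader. The classical Maki-Thompson analysis gives a limiting fraction that never hears the rumor of about 0.203, the root in (0, 1) of $x = e^{-2(1-x)}$. The function names are hypothetical.

```python
import math
import random

def stifling_final_fraction(n, seed=None):
    """Fraction of the population that never hears the rumor."""
    rng = random.Random(seed)
    u, i, s = n - 1, 1, 0
    while i > 0:
        r = rng.randrange(n - 1)       # contacted person, among the other n - 1
        if r < u:
            u, i = u - 1, i + 1        # U + I -> I + I
        else:
            i, s = i - 1, s + 1        # I + I -> I + S, or I + S -> S + S
    return u / n

def limiting_fraction(iters=200):
    """Root in (0, 1) of x = exp(-2 (1 - x)), by fixed-point iteration from 0.5."""
    x = 0.5
    for _ in range(iters):
        x = math.exp(-2 * (1 - x))
    return x

print(stifling_final_fraction(20000, seed=3), limiting_fraction())
```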

Other models: push-pull
Discrete-time model: Push is effective in the early stages; in the late stages, Pull is much better.
Say a fraction $\epsilon$ of nodes is uninformed at time t.
Push: about $\epsilon e^{-1}$ uninformed at time t+1
Pull: about $\epsilon^2$ uninformed at time t+1
Exploited by Karp et al. to reduce the number of messages required from $n \log n$ to $n \log \log n$.
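This sketch iterates the two late-stage updates from $\epsilon = 0.5$ until fewer than one of n nodes is expected to be uninformed. The updates are heuristic mean-field approximations, and the function names are hypothetical. Push needs about ln n further steps. Pull's squaring needs only about log log n.

```python
import math

def push_update(e):
    return e * math.exp(-(1 - e))   # missed by every push: probability about exp(-(1 - e))

def pull_update(e):
    return e * e                    # pulled from an uninformed node: probability e

def steps_below(eps, update, n):
    """Apply `update` to the uninformed fraction until it drops below 1/n."""
    steps = 0
    while eps >= 1.0 / n:
        eps = update(eps)
        steps += 1
    return steps

n = 10**6
print("push:", steps_below(0.5, push_update, n), "pull:", steps_below(0.5, pull_update, n))
```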

Consensus: de Groot model
n agents, initial opinions $x_i(0)$, $i = 1, \dots, n$. Discrete time. Agents update opinions according to $x_i(t+1) = \sum_j p_{ij} x_j(t)$, where P is a stochastic matrix. Do all agents reach consensus on a value? If so, what is the consensus value, and how long does it take?

Results for de Groot model
The recursion is $x(t+1) = P x(t)$. Reaching consensus means $x(t) \to c\mathbf{1}$ as $t \to \infty$, where c is a constant and $\mathbf{1}$ is the all-ones vector. This is guaranteed for all initial conditions if P is irreducible and aperiodic. The consensus value is $\pi x(0)$, where $\pi$ is the unique invariant distribution of P. The time to reach consensus is determined by the spectral gap of P.
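This is a small numerical check of the consensus value $\pi x(0)$, written in plain Python. The 3 x 3 matrix is an illustrative lazy walk on a path. Its invariant distribution is (1/4, 1/2, 1/4) and its eigenvalues are 1, 1/2 and 0, so convergence is fast. The function names are hypothetical.

```python
def degroot(P, x0, steps):
    """Iterate x(t+1) = P x(t) for a row-stochastic matrix P."""
    x = list(x0)
    for _ in range(steps):
        x = [sum(p * xj for p, xj in zip(row, x)) for row in P]
    return x

def invariant_distribution(P, steps=1000):
    """Left eigenvector pi = pi P by power iteration (P irreducible and aperiodic)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(steps):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
x0 = [3.0, 1.0, 2.0]
pi = invariant_distribution(P)
print(degroot(P, x0, 100), sum(p * x for p, x in zip(pi, x0)))
```

All three opinions converge to $\pi x(0) = 0.75 + 0.5 + 0.5 = 1.75$.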

Consensus: the voter model
n agents, with opinions in {0,1}. Agent i contacts agent j at the points of a Poisson($q_{ij}$) process, and adopts its opinion. Once all agents have the same opinion, no further change is possible. How long does it take to reach consensus? What is the probability that the consensus value is 1?

The voter model in pictures

Voter model on the complete graph
All agents can contact all other agents, and they do so at equal rates: $q_{ij} = 1/n$ for all $i \neq j$. Equivalently, each undirected edge is activated at rate 2/n and then oriented at random; the agent at the tail of the arrow copies the agent at the head.

Motivation
The voter model on the complete graph is the same as the Moran model in population genetics. It is also used to model cultural transmission, and competition between products or technologies, especially with network externalities. Consensus is important in distributed systems and algorithms, and in collective decision making in biology.

Final state: complete graph case
Each direction is equally likely to be chosen, so 0→1 and 1→0 are equally likely transitions. Hence, the number of 1s is a martingale, and since it is eventually absorbed at 0 or n, P(consensus value is 1) = initial fraction of 1s.
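This is a Monte Carlo sketch of the voter model on the complete graph, with rate 1/n per ordered pair. It estimates P(consensus value is 1), which should match the initial fraction of 1s. It also records the consensus time, which by duality with coalescing random walks has mean at most $n - 1$. The function name is hypothetical.

```python
import random

def voter_complete(n, ones, rng):
    """Run the voter model on K_n until consensus; return (final opinion, time)."""
    state = [1] * ones + [0] * (n - ones)
    count, t = ones, 0.0
    while 0 < count < n:
        t += rng.expovariate(n - 1)      # total rate n(n-1) * (1/n)
        i = rng.randrange(n)
        j = rng.randrange(n - 1)
        if j >= i:
            j += 1                       # j uniform over the other agents
        count += state[j] - state[i]     # agent i copies agent j
        state[i] = state[j]
    return count // n, t

rng = random.Random(42)
runs = [voter_complete(10, 3, rng) for _ in range(4000)]
print("P(1) ~", sum(v for v, _ in runs) / len(runs),
      "mean time ~", sum(t for _, t in runs) / len(runs))
```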

Final state: general case
Contact rates $q_{ij}$, $i \neq j$. Define $q_{ii} = -\sum_{j \neq i} q_{ij}$. Assume Q is an irreducible rate matrix; then it has a unique invariant distribution $\pi$. Hassin and Peleg: $\pi X(t)$ is a martingale. Therefore, P(consensus value is 1) = $\pi X(0)$.
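The Hassin-Peleg result can be checked on a non-uniform example. This sketch uses the star from the rumor-spreading examples: the hub contacts each leaf at rate 1/n, and each leaf contacts the hub at rate 1. Then $\pi_{\text{hub}} = n/(2n-1)$. If only the hub starts with opinion 1, the prediction is P(consensus value is 1) = 5/9 for n = 5. The function names are hypothetical. Only off-diagonal rates are stored, so $q_{ii}$ is implicit.

```python
import random

def invariant_of_rates(Q, steps=5000):
    """Solve pi Q = 0 by power iteration on the uniformised chain I + Q / lam."""
    n = len(Q)
    out = [sum(Q[i][j] for j in range(n) if j != i) for i in range(n)]
    lam = 2 * max(out)   # factor 2 leaves a self-loop at every state, so the chain is aperiodic
    pi = [1.0 / n] * n
    for _ in range(steps):
        pi = [pi[j] * (1 - out[j] / lam)
              + sum(pi[i] * Q[i][j] for i in range(n) if i != j) / lam
              for j in range(n)]
    return pi

def voter_final_opinion(Q, x0, rng):
    """Jump chain of the voter model: agent i copies agent j at rate Q[i][j]."""
    n = len(Q)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j and Q[i][j] > 0]
    weights = [Q[i][j] for i, j in pairs]
    x = list(x0)
    while 0 < sum(x) < n:
        i, j = rng.choices(pairs, weights=weights)[0]
        x[i] = x[j]
    return x[0]

n = 5
Q = [[0.0] * n for _ in range(n)]
for leaf in range(1, n):                 # node 0 is the hub
    Q[0][leaf], Q[leaf][0] = 1 / n, 1.0
x0 = [1] + [0] * (n - 1)                 # only the hub starts with opinion 1
pi = invariant_of_rates(Q)
rng = random.Random(7)
p_hat = sum(voter_final_opinion(Q, x0, rng) for _ in range(4000)) / 4000
print(round(sum(p * x for p, x in zip(pi, x0)), 4), round(p_hat, 4))
```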

Duality with coalescing random walks

Coalescing random walks
Initially, there is a single particle at each site. Particles perform random walks according to the rate matrix Q, but if a particle moves from node i to node j and j is occupied, it coalesces with the particle there. The random walks are independent between coalescence events. When there is a single particle left, consensus has been reached.

Coalescence time: complete graph
$T_k$: time when k particles remain; $T_n = 0$. At $T_k$, there are $k(k-1)$ directed edges between occupied nodes, with rate 1/n on each edge, so $T_{k-1} - T_k \sim \mathrm{Exp}(k(k-1)/n)$. The mean time to consensus is bounded by $\sum_{k=2}^{n} \frac{n}{k(k-1)} = n - 1 < n$: linear in population size for consensus, versus logarithmic for rumor spreading.
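This sketch simulates only the particle count of coalescing random walks on the complete graph. Each particle jumps at total rate (n-1)/n to a uniformly random other node. The jump causes a coalescence with probability (k-1)/(n-1), so the coalescence rate is $k(k-1)/n$ as on the slide. The function names are hypothetical.

```python
import random

def coalescence_time(n, rng):
    """Time until one particle remains, for coalescing random walks on K_n."""
    k, t = n, 0.0
    while k > 1:
        t += rng.expovariate(k * (n - 1) / n)   # next jump by any of the k particles
        if rng.random() < (k - 1) / (n - 1):    # it lands on another occupied node
            k -= 1
    return t

def exact_mean(n):
    return sum(n / (k * (k - 1)) for k in range(2, n + 1))   # telescopes to n - 1

n = 10
rng = random.Random(1)
est = sum(coalescence_time(n, rng) for _ in range(2000)) / 2000
print(exact_mean(n), round(est, 2))
```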

Coalescence time: general graphs
Suppose Q is the generator of a reversible random walk, with invariant distribution $\pi$. Aldous and Fill: the mean coalescence time of two independent random walks started at any nodes i, j is bounded by … n log 4, where …

General graphs (continued)
Example: G is a connected, undirected graph and $q_{ij} = 1\{(i,j) \in E\}$. Then $\pi_i = 1/n$ for all i, and … ≤ n/4, since there is always at least one edge between any A and $A^c$. The mean coalescence time of any two random walks is bounded by $n^2 \log(2)/2$.
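This sketch compares the bound $n^2 \log(2)/2$ with the exact meeting time on one graph, the n-cycle with $q_{ij} = 1$ on edges. In continuous time the two walkers never jump at the same instant, so they cannot swap places without meeting. Their gap is a symmetric ±1 walk that moves at rate 4. From distance d the mean meeting time is $d(n-d)/4$, which is at most $n^2/16$. The function name is hypothetical.

```python
import math
import random

def meeting_time_cycle(n, d, rng):
    """Two independent walkers on an n-cycle (rate 1 to each neighbour), started d apart."""
    t = 0.0
    while d % n != 0:
        t += rng.expovariate(4.0)              # 2 walkers x 2 neighbours x rate 1
        d += 1 if rng.random() < 0.5 else -1   # the gap moves by +-1 with equal probability
    return t

n = 16
rng = random.Random(5)
est = sum(meeting_time_cycle(n, n // 2, rng) for _ in range(2000)) / 2000
print(round(est, 2), "exact:", (n // 2) * (n - n // 2) / 4,
      "bound:", round(n**2 * math.log(2) / 2, 1))
```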

Consensus time on general graphs
Even-Dar and Shapira: for Q as above, use Markov's inequality to bound the probability that two random walks haven't coalesced, then a union bound to bound the probability that there is some random walk that hasn't coalesced with a specific one, say the one starting at i. This implies that, with high probability, consensus is reached within $O(n^3 \log n)$ time.

Open problem: Evolving voter model
Graph G, nodes in state 0 or 1. Pick a discordant edge at random and orient it at random:
with probability 1 − p, the caller copies the called node;
with probability p, it rewires to a random node with the same current state.
Simulations show a critical value of p, below which the network reaches consensus, and above which it fragments.
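This is a minimal simulation sketch of the evolving voter dynamics, usable for the kind of experiment mentioned on the slide. Two details are assumptions not fixed by the slide. If the caller has no other like-minded node, it copies instead of rewiring. Multi-edges are not prevented. The function name is hypothetical. At p = 0 the graph never changes, so a connected start must end in consensus. At p = 1 opinions never change, so the network fragments.

```python
import random

def evolving_voter(n, edges, state0, p, seed=None):
    """Run until no discordant edge remains; return (final states, final edges)."""
    rng = random.Random(seed)
    state = list(state0)
    edges = [list(e) for e in edges]
    while True:
        discordant = [e for e in edges if state[e[0]] != state[e[1]]]
        if not discordant:
            return state, edges
        e = rng.choice(discordant)
        if rng.random() < 0.5:
            e.reverse()                       # orient the chosen edge at random
        caller, called = e
        like_minded = [k for k in range(n) if k != caller and state[k] == state[caller]]
        if like_minded and rng.random() < p:
            e[1] = rng.choice(like_minded)    # rewire: (caller, called) becomes (caller, k)
        else:
            state[caller] = state[called]     # caller copies the called node

n = 20
ring = [(i, (i + 1) % n) for i in range(n)]
start = [i % 2 for i in range(n)]
for p in (0.0, 1.0):
    final, _ = evolving_voter(n, ring, start, p, seed=1)
    print(p, "consensus" if len(set(final)) == 1 else "fragmented")
```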

Epidemics: SIS model
Graph $G=(V,E)$ on n nodes, undirected.
Each node is in one of two states, {S, I}. Nodes change state independently:
S → I at rate $\beta \times$ (# of infected neighbours)
I → S at rate $\delta$
How long is it until all nodes are in state S?
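This is an event-driven (Gillespie-style) sketch of the SIS process, starting from the all-infected state. The function names and the `t_max` cutoff are hypothetical. As a sanity check, with $\beta = 0$ there are no infections. The lifetime is then the maximum of n independent Exp($\delta$) recovery times, with mean $H_n/\delta$.

```python
import random

def sis_extinction_time(nbrs, beta, delta, seed=None, t_max=1e6):
    """Time until no node is infected (or a time >= t_max if still alive then)."""
    rng = random.Random(seed)
    infected = set(range(len(nbrs)))
    t = 0.0
    while infected and t < t_max:
        # each susceptible node is infected at rate beta * (# infected neighbours)
        inf_rate = {}
        for u in infected:
            for v in nbrs[u]:
                if v not in infected:
                    inf_rate[v] = inf_rate.get(v, 0.0) + beta
        rec_total = delta * len(infected)
        total = rec_total + sum(inf_rate.values())
        t += rng.expovariate(total)
        if rng.random() * total < rec_total:
            infected.remove(rng.choice(sorted(infected)))   # a uniformly chosen node recovers
        else:
            nodes = sorted(inf_rate)
            infected.add(rng.choices(nodes, weights=[inf_rate[v] for v in nodes])[0])
    return t

def complete_graph(n):
    return [[j for j in range(n) if j != i] for i in range(n)]

g = complete_graph(10)
lifetimes = [sis_extinction_time(g, beta=0.0, delta=1.0, seed=s) for s in range(1000)]
print(sum(lifetimes) / len(lifetimes))
```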

SIS model in pictures

Motivation
Models the spread of certain diseases and certain kinds of malware (the SIR model is better for others), the propagation of faults, and the persistence of data in peer-to-peer / cloud networks. Can be used to model the diffusion of certain technologies or behaviours.

Upper bound: branching random walk
Infected individuals are initially placed on the graph. Each individual gives birth to offspring at rate $\beta$ at each neighboring node, and dies at rate $\delta$. How long does it take for the population to die out?

Branching random walks
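$Y_i(t)$: # of individuals at node i at time t
$Y_i \to Y_i + 1$ at rate $\beta \sum_{j \sim i} Y_j(t)$
$Y_i \to Y_i - 1$ at rate $\delta Y_i(t)$
A: adjacency matrix of graph G
$\frac{d}{dt} E[Y(t)] = (\beta A - \delta I)\, E[Y(t)]$, so $E[Y(t)] = \exp((\beta A - \delta I) t)\, Y(0)$
$\rho$: spectral radius of A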

G., Massoulie, Towsley: the epidemic is stochastically bounded by the branching random walk; therefore, so is the epidemic lifetime. If $\beta\rho < \delta$, then $E[Y(t)] \to 0$. By Markov's inequality, $P(|Y(t)| \ge 1) \to 0$. This implies that the mean epidemic lifetime is bounded by $\log(n)/(\delta - \beta\rho)$.
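Evaluating the upper bound needs the spectral radius $\rho$ of A. This sketch computes it by power iteration on $A + I$ and assumes a connected graph. The shift matters on bipartite graphs, such as even cycles or stars, where $-\rho$ is also an eigenvalue and plain power iteration would oscillate. The function names are hypothetical. The bound is the leading-order expression from the slide.

```python
import math

def spectral_radius(nbrs, iters=1000):
    """Largest adjacency eigenvalue of a connected graph, via power iteration on A + I."""
    n = len(nbrs)
    x = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        y = [x[i] + sum(x[j] for j in nbrs[i]) for i in range(n)]   # y = (A + I) x
        norm = math.sqrt(sum(v * v for v in y))
        lam = norm / math.sqrt(sum(v * v for v in x))
        x = [v / norm for v in y]
    return lam - 1.0

def lifetime_bound(nbrs, beta, delta):
    """log(n) / (delta - beta * rho), valid only when beta * rho < delta."""
    gap = delta - beta * spectral_radius(nbrs)
    if gap <= 0:
        raise ValueError("bound needs beta * rho(A) < delta")
    return math.log(len(nbrs)) / gap

star5 = [[1, 2, 3, 4], [0], [0], [0], [0]]
ring6 = [[(i - 1) % 6, (i + 1) % 6] for i in range(6)]
print(spectral_radius(star5), lifetime_bound(ring6, beta=0.25, delta=1.0))
```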

Lower bound
Generalised isoperimetric constant of G: $\eta(G, m) = \min_{S \subset V,\, 0 < |S| \le m} \dfrac{|E(S, S^c)|}{|S|}$
$S(t)$: set of infected nodes at time t. If $|S(t)| \le m$, then the rate of infecting new nodes is at least $\beta\, \eta(G,m)\, |S(t)|$, and the rate of recovery of infected nodes is $\delta |S(t)|$.

G., Massoulie, Towsley: if $\beta\, \eta(G,m) > \delta$, then the epidemic lifetime is exponential in m. When the number of infected nodes is less than m, new nodes are infected faster than infected nodes recover. The count then behaves like a biased random walk, which hits m exponentially many times before hitting 0.
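For small graphs the lower-bound condition can be evaluated by brute force. This sketch takes $\eta(G, m)$ to be the minimum of $|E(S, S^c)|/|S|$ over node sets with $1 \le |S| \le m$, which is the quantity the rate comparison uses. The enumeration is exponential in m, and the function name is hypothetical.

```python
import math
from itertools import combinations

def isoperimetric(nbrs, m):
    """eta(G, m): min over node sets S with 1 <= |S| <= m of |E(S, S^c)| / |S|."""
    best = math.inf
    for size in range(1, m + 1):
        for S in combinations(range(len(nbrs)), size):
            inside = set(S)
            cut = sum(1 for u in S for v in nbrs[u] if v not in inside)
            best = min(best, cut / size)
    return best

cycle8 = [[(i - 1) % 8, (i + 1) % 8] for i in range(8)]
k6 = [[j for j in range(6) if j != i] for i in range(6)]
beta, delta = 1.0, 1.0
for name, g, m in [("cycle", cycle8, 4), ("complete", k6, 3)]:
    eta = isoperimetric(g, m)
    print(name, eta, "beta * eta > delta:", beta * eta > delta)
```

The cycle's small cuts make $\eta$ tiny, which is one way to see the large gap between the bounds on cycles.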

Remarks
Upper and lower bounds match on complete graphs, hypercubes, dense Erdos-Renyi and random regular graphs, but are separated by a big gap on cycles, grids, etc. The gap is also big on scale-free random graphs, but these can be handled by focusing on high-degree stars.
Results imply that epidemic lifetime is logarithmic in population size for small infection rates, and exponential in population size for large infection rates.

Epidemics: SIR model
$G=(V,E)$: n nodes, undirected, connected.
Each node is in one of three states, {S, I, R}. Nodes change state independently:
S → I at rate $\beta \times$ (# of infected neighbours)
I → R at rate $\delta$, or after a random time with a specified distribution
How many nodes are ever infected?

SIR model description
Single initial infective.
p: probability that a node which becomes infected ever tries to infect a given neighbor; $p \le \beta\, E(\text{length of infectious period})$, a bound that is insensitive to the distribution of the infectious period.
$\psi_i$: probability that node i is ever infected
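This sketch reads p literally as the probability of at least one infection attempt on a given edge. For an infectious period L that gives $p = 1 - E[e^{-\beta L}]$, and the value depends on the distribution of L. By contrast, $\beta E[L]$ (the mean number of attempts) depends only on the mean, and bounds p because $1 - e^{-x} \le x$. The two periods compared have the same mean. The function names and parameter values are illustrative.

```python
import math

def p_exponential(beta, delta):
    """P(at least one attempt on an edge) when the infectious period is Exp(delta)."""
    return beta / (beta + delta)

def p_fixed(beta, length):
    """Same probability when the infectious period has fixed length."""
    return 1 - math.exp(-beta * length)

beta, delta = 0.5, 1.0            # both periods have mean 1 / delta = 1
print(p_exponential(beta, delta), p_fixed(beta, 1 / delta), "beta * E[L] =", beta / delta)
```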

Upper bound on epidemic sizes
Draief, G., Massoulie: $\psi_j \le \sum_i p\, a_{ij}\, \psi_i$ for $j \neq s$ (union bound), so $\psi \le \sum_{k \ge 0} (pA)^k e_s$, where s is the initial infective. If $p\rho < 1$, this implies that the mean number of nodes ever infected is at most $\sqrt{n}/(1 - p\rho)$. The upper bound can be improved to $1/(1 - p\rho)$ if the graph is regular. Matching lower bounds exist in some cases: star, Erdos-Renyi random graphs.
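This sketch compares the regular-graph bound $1/(1 - p\rho)$ with a simulated epidemic size on the 100-cycle, where $\rho = 2$ is supplied by hand. It assumes each infected node infects each neighbour independently with probability p, which holds for a fixed-length infectious period. The final size is then the cluster of the source under bond percolation, and on the cycle its mean is about $1 + 2p/(1-p)$. The function names are hypothetical.

```python
import math
import random

def sir_size(nbrs, source, p, rng):
    """Final epidemic size under independent transmission with probability p per edge."""
    infected = {source}
    frontier = [source]
    while frontier:
        u = frontier.pop()
        for v in nbrs[u]:
            if v not in infected and rng.random() < p:
                infected.add(v)
                frontier.append(v)
    return len(infected)

def upper_bound(n, p, rho, regular=False):
    """sqrt(n) / (1 - p rho), or 1 / (1 - p rho) for regular graphs; needs p rho < 1."""
    if p * rho >= 1:
        raise ValueError("bound needs p * rho < 1")
    return (1.0 if regular else math.sqrt(n)) / (1 - p * rho)

n, p = 100, 0.3
ring = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
rng = random.Random(11)
mean = sum(sir_size(ring, 0, p, rng) for _ in range(5000)) / 5000
print(round(mean, 3), "approx:", round(1 + 2 * p / (1 - p), 3),
      "bound:", upper_bound(n, p, rho=2.0, regular=True))
```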

Lower bound on epidemic sizes
Bandyopadhyay and Sajadi: consider any BFS spanning tree T of G. The epidemic on G stochastically dominates the epidemic on T. Hence $\psi_i \ge p^{d(i,s)}$, where $d(\cdot,\cdot)$ is the graph distance. This implies a lower bound on the mean number of infected nodes: $\sum_{i \in V} p^{d(i,s)}$. How good is this lower bound?
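The tree-based lower bound only needs distances from the source, which a breadth-first search provides. This sketch computes $\sum_i p^{d(i,s)}$; the function name is hypothetical. On the 100-cycle with p = 0.3 the value is about 1.857, essentially equal to the true mean size there. On the star with the hub as source it is $1 + (n-1)p$.

```python
from collections import deque

def bfs_lower_bound(nbrs, source, p):
    """Sum over nodes i of p ** d(i, source), with distances from breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in nbrs[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sum(p ** d for d in dist.values())

ring = [[(i - 1) % 100, (i + 1) % 100] for i in range(100)]
star = [[1, 2, 3, 4], [0], [0], [0], [0]]
print(bfs_lower_bound(ring, 0, 0.3), bfs_lower_bound(star, 0, 0.5))
```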

Results
$G_n$: sequence of graphs indexed by |V|
$s_n$: infection source in $G_n$
$X_n$: mean number of infected nodes
$LB_n$: lower bound based on BFS spanning tree
Theorem: If there is an (log n) sequence of neighborhoods of $s_n$ in which $G_n$ is a tree, then there is a $p_c > 0$ such that, for $p < p_c$, $X_n / LB_n \to 1$.

Results (continued)
Theorem: Suppose there is a deterministic or random rooted tree (T, s) such that $(G_n, s_n) \to (T, s)$ in the sense of local weak convergence. Suppose the maximum node degree in all $G_n$ is bounded uniformly by $\Delta$, and $p\Delta < 1$. Then $X_n - LB_n \to 0$.

What if a node is only influenced if some number, or some fraction, of its neighbors have a different opinion? This can be motivated by best-response dynamics in network games.

Bootstrap percolation
Connected, undirected graph $G=(V,E)$. Initial states of nodes in {0,1}. A node changes state from 0 to 1 if at least k of its neighbors are in state 1. Nodes don't change from 1 to 0. Can we guarantee that all nodes will eventually be in state 1?

Results
$G=(V,E)$ is the random d-regular graph.
Bernoulli initial condition: each node is in state 1 with probability p, independently of the others.
Theorem (Balogh and Pittel): Suppose $1 < k < d-1$. There is a $p^* \in (0,1)$ such that, for any $\epsilon > 0$:
$p > p^* + \epsilon$: all nodes are eventually in state 1 whp;
$p < p^* - \epsilon$: the fraction of nodes in state 0 tends to a non-zero constant whp.
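This sketch implements bootstrap percolation as a queue-based cascade and runs it on a configuration-model random 4-regular multigraph with k = 2. Loops and multi-edges are allowed for simplicity, a simplification of the random regular graph in the theorem. The slides do not give $p^*$. The values 0.05 and 0.9 are illustrative choices meant to fall on either side of the threshold, and the function names are hypothetical.

```python
import random
from collections import deque

def bootstrap(nbrs, initially_active, k):
    """A node in state 0 switches to 1 once at least k neighbours are in state 1."""
    active = [False] * len(nbrs)
    hits = [0] * len(nbrs)              # active neighbours seen so far (with multiplicity)
    queue = deque()
    for v in initially_active:
        if not active[v]:
            active[v] = True
            queue.append(v)
    while queue:
        u = queue.popleft()
        for v in nbrs[u]:
            hits[v] += 1
            if not active[v] and hits[v] >= k:
                active[v] = True
                queue.append(v)
    return active

def random_regular_multigraph(n, d, rng):
    """Configuration model: pair up n*d half-edges uniformly at random."""
    stubs = [v for v in range(n) for _ in range(d)]
    rng.shuffle(stubs)
    nbrs = [[] for _ in range(n)]
    for a, b in zip(stubs[::2], stubs[1::2]):
        nbrs[a].append(b)
        nbrs[b].append(a)
    return nbrs

rng = random.Random(2)
n, d, k = 2000, 4, 2                    # satisfies 1 < k < d - 1
g = random_regular_multigraph(n, d, rng)
for p in (0.05, 0.9):
    seeds = [v for v in range(n) if rng.random() < p]
    print(p, "final fraction in state 1:", sum(bootstrap(g, seeds, k)) / n)
```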

Conclusions
A variety of stochastic processes on graphs can be studied using elementary probabilistic tools. Analysis can often be greatly simplified by choosing the right model. Often, exact analysis is intractable, but we can get good (?) bounds. Many applications!