Information diffusion

1 Information diffusion
Lecture 27: Information diffusion CS 765 Complex Networks Slides are modified from Lada Adamic, David Kempe, Bill Hackbor

2 factors influencing information diffusion
outline
factors influencing information diffusion
  network structure: which nodes are connected?
  strength of ties: how strong are the connections?
studies in information diffusion:
  Granovetter: the strength of weak ties
  J-P Onnela et al: strength of intermediate ties
  Kossinets et al: strength of backbone ties
  Davis: board interlocks and adoption of practices
network position and access to information
  Burt: Structural holes and good ideas
  Aral and van Alstyne: networks and information advantage
networks and innovation
  Lazer and Friedman: innovation

3 factors influencing diffusion
network structure (unweighted): density, degree distribution, clustering, connected components, community structure. strength of ties (weighted): frequency of communication, strength of influence. spreading agent: attractiveness and specificity of information.

4 Less likely to be a bridge (or a local bridge)
Strong tie defined. A strong tie: frequent contact, affinity, many mutual contacts. Less likely to be a bridge (or a local bridge). “forbidden triad”: strong ties are likely to “close”. Source: Granovetter, M. (1973). "The Strength of Weak Ties",

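5 edge embeddedness
embeddedness: number of common neighbors the two endpoints have. neighborhood overlap: (number of common neighbors of the two endpoints) / (number of nodes that are neighbors of at least one endpoint, not counting the endpoints themselves)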
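
A minimal sketch of the two edge measures, assuming an undirected graph stored as a dictionary of neighbor sets. The toy graph and the function names are illustrative and not part of the original slides.

```python
def embeddedness(adj, u, v):
    """Number of common neighbors of the endpoints of edge (u, v)."""
    return len(adj[u] & adj[v])


def neighborhood_overlap(adj, u, v):
    """Common neighbors divided by all neighbors of u or v (endpoints excluded)."""
    common = adj[u] & adj[v]
    either = (adj[u] | adj[v]) - {u, v}
    return len(common) / len(either) if either else 0.0


# Hypothetical toy graph: a triangle a-b-c plus a chain a-d-e.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}

# Edge a-b sits inside a triangle; edge a-d is a local bridge.
print(embeddedness(adj, "a", "b"), neighborhood_overlap(adj, "a", "b"))
print(embeddedness(adj, "a", "d"), neighborhood_overlap(adj, "a", "d"))
```

An edge with zero overlap, such as a-d here, is a local bridge: its endpoints share no friends.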

6 school kids and 1st through 8th choices of friends
snowball sampling: will you reach more different kids by asking each kid to name their 2 best friends, or their 7th & 8th closest friend? Source: M. van Alstyne, S. Aral. Networks, Information & Social Capital

7 is it good to be embedded?
What are the advantages of occupying an embedded position in the network? What are the disadvantages of being embedded? Advantages of being a broker (spanning structural holes)? Disadvantages of being a broker?

9 how does strength of a tie influence diffusion?
M. S. Granovetter: The Strength of Weak Ties, AJS, 1973: finding a job through a contact that one saw: frequently (2+ times/week): 16.7%; occasionally (more than once a year but < 2x week): 55.6%; rarely: 27.8%. but… length of path is short: contact directly works for/is the employer or is connected directly to employer

10 strength of tie: frequency of communication
Kossinets, Watts, Kleinberg, KDD 2008: which paths yield the most up to date info? how many of the edges form the “backbone”? source: Kossinets et al. “The structure of information pathways in a social communication network”

11 the strength of intermediate ties
strong ties: frequent communication, but ties are redundant due to high clustering. weak ties: reach far across network, but communication is infrequent… “Structure and tie strengths in mobile communication networks”: use nation-wide cellphone call records and simulate diffusion using actual call timing.
Figure caption: The dynamics of spreading on the weighted mobile call graph, assuming that the probability for a node v_i to pass on the information to its neighbor v_j in one time step is given by P_ij = x w_ij, with x = 2.59 × 10^-4. (A) The fraction of infected nodes as a function of time t. The blue curve (circles) corresponds to spreading on the network with the real tie strengths, whereas the black curve (asterisks) represents the control simulation, in which all tie strengths are considered equal. (B) Number of infected nodes as a function of time for a single realization of the spreading process. Each steep part of the curve corresponds to invading a small community. The flatter part indicates that the spreading becomes trapped within the community. (C and D) Distribution of strengths of the links responsible for the first infection for a node in the real network (C) and control simulation (D). (E and F) Spreading in a small neighborhood in the simulation using the real weights (E) or the control case, in which all weights are taken to be equal (F). The infection in all cases was released from the node marked in red, and the empirically observed tie strength is shown as the thickness of the arrows (right-hand scale). The simulation was repeated 1,000 times; the size of the arrowheads is proportional to the number of times that information was passed in the given direction, and the color indicates the total number of transmissions on that link (the numbers in the color scale refer to percentages of 1,000). The contours are guides to the eye, illustrating the difference in the information direction flow in the two simulations.
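
The spreading rule in the caption (an infected node i passes the information to neighbor j in one time step with probability P_ij = x w_ij) can be sketched as follows. This is an illustrative toy, not the authors' simulation: the graph, the weights and the value of x are invented, and the control case simply replaces every weight with the mean weight.

```python
import random


def spread(weights, seed, x, steps, rng):
    """Run the spreading process and return the infected count after each step.

    weights[i][j] is the tie strength w_ij; transmission probability is x * w_ij.
    """
    infected = {seed}
    history = [1]
    for _ in range(steps):
        newly = set()
        for i in infected:
            for j, w in weights[i].items():
                if j not in infected and rng.random() < min(1.0, x * w):
                    newly.add(j)
        infected |= newly
        history.append(len(infected))
    return history


def equalize(weights):
    """Control case: every tie gets the mean tie strength."""
    values = [w for nbrs in weights.values() for w in nbrs.values()]
    mean = sum(values) / len(values)
    return {i: {j: mean for j in nbrs} for i, nbrs in weights.items()}


# Hypothetical toy graph: two strongly tied triangles joined by one weak tie (2-3).
strong, weak = 100.0, 1.0
real = {
    0: {1: strong, 2: strong},
    1: {0: strong, 2: strong},
    2: {0: strong, 1: strong, 3: weak},
    3: {2: weak, 4: strong, 5: strong},
    4: {3: strong, 5: strong},
    5: {3: strong, 4: strong},
}

x = 0.005  # illustrative value only, chosen for these toy weights
print("real weights: ", spread(real, 0, x, 30, random.Random(1)))
print("equal weights:", spread(equalize(real), 0, x, 30, random.Random(1)))
```

With the real weights the information tends to saturate the first triangle quickly and then wait at the weak bridge, which is the trapping effect described in panel (B).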

12 Localized strong ties slow infection spread.
source: Onnela, J. et al., Structure and tie strengths in mobile communication networks

13 in lab: complex contagion (Centola & Macy, AJS, 2007)
how can information diffusion be different from simple contagion (e.g. a virus)? simple contagion: an infected individual infects neighbors with information at some rate. threshold contagion: individuals must hear information (or observe behavior) from a number or fraction of friends before adopting. in lab: complex contagion (Centola & Macy, AJS, 2007): how do you pick individuals to “infect” such that your opinion prevails?

14 Each node is in one of two states
Framework. The network consists of nodes (individuals) and edges (links between nodes). Each node is in one of two states: Susceptible (in other words, healthy) or Infected. Susceptible-Infected-Susceptible (SIS) model: cured nodes immediately become susceptible. [State diagram: Susceptible to Infected (infected by neighbor); Infected to Susceptible (cured internally).]

15 Framework (Continued)
Homogeneous birth rate β on all edges between infected and susceptible nodes. Homogeneous death rate δ for infected nodes. [Diagram: node X with neighbors N1, N2, N3; legend: healthy, infected; edge labels: Prob. β, Prob. δ.]
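
A minimal discrete-time sketch of this SIS framework, assuming that in each step every infected neighbor independently transmits with probability β and every infected node is cured with probability δ. The synchronous update, the ring graph and the parameter values are assumptions made for illustration.

```python
import random


def sis_step(adj, infected, beta, delta, rng):
    """One synchronous SIS update; returns the new set of infected nodes."""
    nxt = set()
    for node in adj:
        if node in infected:
            # An infected node is cured with probability delta,
            # and is then immediately susceptible again.
            if rng.random() >= delta:
                nxt.add(node)
        else:
            # Each infected neighbor transmits independently with probability beta.
            k = sum(1 for nb in adj[node] if nb in infected)
            if rng.random() < 1 - (1 - beta) ** k:
                nxt.add(node)
    return nxt


# Hypothetical ring of 20 nodes with a single initial infection.
n = 20
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
rng = random.Random(42)
infected = {0}
for _ in range(50):
    infected = sis_step(ring, infected, beta=0.4, delta=0.2, rng=rng)
print("fraction infected after 50 steps:", len(infected) / n)
```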

16 SIR and SIS Models
An SIS model consists of two groups. Susceptible: those who may contract the disease. Infected: those infected. An SIR model consists of three groups: the two above plus Recovered: those with natural immunity or those that have died.

17 If R0 > 1 then ∆I > 0 and an epidemic occurs
Important Parameters. α is the transmission coefficient, which determines the rate at which the disease travels from one population to another. γ is the recovery rate: (I persons)/(days required to recover). R0 is the basic reproduction number: (number of new cases arising from one infective per unit time) × (average duration of infection). If R0 > 1 then ∆I > 0 and an epidemic occurs.

18 SIR and SIS Models SIS Model: SIR Model:
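
The model equations for this slide are not preserved in the transcript. As a hedged sketch, the code below assumes the standard mass-action forms, using α and γ as defined on the previous slide: for SIR, dS/dt = -αSI, dI/dt = αSI - γI, dR/dt = γI; for SIS, recovered individuals return to S instead. The function name, the Euler integration and the parameter values are illustrative choices.

```python
def simulate(s0, i0, alpha, gamma, days, dt=0.01, model="SIR"):
    """Euler integration of the assumed mass-action SIR or SIS equations."""
    s, i, r = float(s0), float(i0), 0.0
    for _ in range(int(round(days / dt))):
        new_infections = alpha * s * i * dt
        recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - recoveries
        if model == "SIR":
            r += recoveries   # recovered individuals leave the process
        else:
            s += recoveries   # SIS: recovered individuals are susceptible again
    return s, i, r


# Under these assumed equations R0 = alpha * S0 / gamma.
s0, i0, alpha, gamma = 999, 1, 0.0005, 0.1
print("R0 =", alpha * s0 / gamma)
print("SIR after 30 days:", simulate(s0, i0, alpha, gamma, days=30))
print("SIS after 30 days:", simulate(s0, i0, alpha, gamma, days=30, model="SIS"))
```

With R0 above 1 the infected count grows at first; lowering α until R0 falls below 1 makes it decay from the start.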

19 Diffusion in networks: ER graphs
review: diffusion in ER graphs

20 When the density of the network increases, diffusion in the network is
Quiz Q: When the density of the network increases, diffusion in the network is: faster; slower; unaffected

21 ER graphs: connectivity and density
nodes infected after 10 steps, infection rate = 0.15. [Figure panels: average degree = 2.5; average degree = 10.]

22 Quiz Q: When nodes preferentially attach to high degree nodes, the diffusion over the network is: faster; slower; unaffected

23 Diffusion in “grown networks”
nodes infected after 4 steps, infection rate = 1. [Figure panels: preferential attachment; non-preferential growth.]

24 Diffusion in small worlds
What is the role of the long-range links in diffusion over small world topologies?

25 Quiz Q: Relative to the simple contagion process the threshold contagion process: is better able to use shortcuts; advances more rapidly through the network; infects a greater number of nodes

26 diffusion of innovation
surveys: farmers adopting new varieties of hybrid corn by observing what their neighbors were planting (Ryan and Gross, 1943); doctors prescribing new medication (Coleman et al. 1957); spread of obesity & happiness in social networks (Christakis and Fowler, 2008). online behavioral data: spread of Flickr photos & Digg stories (Lerman, 2007); joining LiveJournal groups & CS conferences (Backstrom et al. 2006); + others, e.g. Anagnostopoulos et al. 2008

27 Open question: how do we tell influence from correlation?
Did Alice purchase a camera because her friend Bob influenced her to (social influence in action)? Or did Alice and Bob become friends due to homophily (photographers stick together) and they would have both bought the camera anyway (correlation between interests and friendships)? approaches: time resolved data: if adoption time is shuffled, does it yield the same patterns? if edges are directed: does reversing the edge direction yield less predictive power?

28 Example: adopting new practices
poison pills: diffused through interlocks; geography had little to do with it; more likely to be influenced by tie to firm doing something similar & having similar centrality. golden parachutes: did not diffuse through interlocks; geography was a significant factor; more likely to follow “central” firms. why did one diffuse through the “network” while the other did not? Source: Corporate Elite Networks and Governance Changes in the 1980s.

29 Social Network and Spread of Influence
A social network plays a fundamental role as a medium for the spread of INFLUENCE among its members: opinions, ideas, information, innovation… Direct marketing exploits “word-of-mouth” effects to significantly increase profits: Gmail, Tupperware popularization, Microsoft Origami …

30 Application besides product marketing
Problem Setting. Given: a limited budget B for initial advertising (e.g. give away free samples of product); estimates for influence between individuals. Goal: trigger a large cascade of influence (e.g. further adoptions of a product). Question: which set of individuals should the budget B target? Applications besides product marketing: spread an innovation; detect stories in blogs (which blogs should one read to be most up to date?). If we can try to convince a subset of individuals to adopt a new product or innovation, how should we choose the few key individuals to use for seeding this process?

31 First mathematical models Large body of subsequent work:
Models of Influence. First mathematical models: [Schelling '70/'78, Granovetter '78]. Large body of subsequent work: [Rogers '95, Valente '95, Wasserman/Faust '94]. Two basic classes of diffusion models: threshold and cascade. General operational view: a social network is represented as a directed graph, with each person (customer) as a node. Nodes start either active or inactive. An active node may trigger activation of neighboring nodes. Monotonicity assumption: a node can switch from inactive to active, but active nodes never deactivate. We focus on more operational models from mathematical sociology [15, 28] and interacting particle systems [11, 17] that explicitly represent the step-by-step dynamics of adoption.

32 Threshold dynamics
The network: a_ij is the adjacency matrix (N × N), unweighted and undirected. The nodes: are labelled i, with i from 1 to N; have a state v_i; and a threshold r_i from some distribution.

33 Threshold dynamics The fraction of nodes in state vi=1 is r(t):
Node i has state and threshold Neighbourhood average: Updating: The fraction of nodes in state vi=1 is r(t):
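
The formulas for the neighbourhood average, the update and the active fraction are not preserved in this transcript. The sketch below assumes a common form that is consistent with the definitions given: the neighbourhood average of node i is the mean state of its neighbours, node i switches to state 1 when that average reaches its threshold, and the reported fraction is the mean state over all nodes. Keeping active nodes active and updating all nodes synchronously are additional assumptions.

```python
def threshold_step(adj, state, thresholds):
    """One synchronous update of the assumed threshold rule."""
    new_state = {}
    for i, nbrs in adj.items():
        if state[i] == 1 or not nbrs:
            new_state[i] = state[i]  # assumption: active nodes stay active
            continue
        avg = sum(state[j] for j in nbrs) / len(nbrs)  # neighbourhood average
        new_state[i] = 1 if avg >= thresholds[i] else 0
    return new_state


def run(adj, state, thresholds):
    """Iterate to a fixed point; return the fraction of active nodes over time."""
    fractions = [sum(state.values()) / len(state)]
    while True:
        nxt = threshold_step(adj, state, thresholds)
        if nxt == state:
            return fractions
        state = nxt
        fractions.append(sum(state.values()) / len(state))


# Hypothetical path graph 0-1-2-3 with node 0 initially active.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
start = {0: 1, 1: 0, 2: 0, 3: 0}
print(run(path, start, {i: 0.5 for i in path}))  # thresholds of 0.5
print(run(path, start, {i: 0.6 for i in path}))  # slightly higher thresholds
```

On this path a threshold of 0.5 lets the state spread one node per step, while 0.6 blocks it at the first interior node, which has only one of its two neighbours active.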

34 Linear Threshold Model
A node v has random threshold θv ~ U[0,1]. A node v is influenced by each neighbor w according to a weight bvw such that the weights from all of v's neighbors sum to at most 1. A node v becomes active when at least a (weighted) θv fraction of its neighbors are active. Given a random choice of thresholds, and an initial set of active nodes A0 (with all other nodes inactive), the diffusion process unfolds deterministically in discrete steps: in step t, all nodes that were active in step t-1 remain active, and we activate any node v for which the total weight of its active neighbors is at least θv.
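
A minimal sketch of the Linear Threshold process described above. The dictionary layout (weights[v][w] holds the weight of neighbor w on node v), the function name and the toy graph are assumptions for illustration; thresholds are passed in explicitly so that a run is reproducible.

```python
import random


def linear_threshold(weights, seeds, thresholds):
    """Return the final active set of the Linear Threshold process.

    weights[v][w] is the influence weight of neighbor w on node v;
    every node must appear as a key, and each node's weights sum to at most 1.
    """
    active = set(seeds)
    while True:
        newly = {
            v for v in weights
            if v not in active
            and sum(b for w, b in weights[v].items() if w in active) >= thresholds[v]
        }
        if not newly:
            return active  # no more activations are possible
        active |= newly    # active nodes never deactivate


# Hypothetical toy graph.
weights = {
    "u": {},
    "v": {"u": 0.5, "w": 0.3},
    "w": {"v": 0.6},
    "x": {"w": 0.2},
}

# Fixed thresholds for a reproducible run.
fixed = {"u": 1.0, "v": 0.4, "w": 0.5, "x": 0.9}
print(sorted(linear_threshold(weights, {"u"}, fixed)))

# Thresholds drawn uniformly from [0, 1), as in the model.
rng = random.Random(7)
drawn = {v: rng.random() for v in weights}
print(sorted(linear_threshold(weights, {"u"}, drawn)))
```

Once the thresholds are fixed the outcome is deterministic, so estimating the expected spread of a seed set means averaging over many threshold draws.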

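35 Example
[Figure: linear threshold example. Nodes include U, X, v and w; edges carry weights between 0.1 and 0.6. Legend: inactive node, active node, threshold, active neighbors. The process runs until no further node can be activated ("Stop!").]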

36 Independent Cascade Model
When node v becomes active, it has a single chance of activating each currently inactive neighbor w. The activation attempt succeeds with probability pvw. We again start with an initial set of active nodes A0, and the process unfolds in discrete steps according to the following randomized rule. When node v first becomes active in step t, it is given a single chance to activate each currently inactive neighbor w; it succeeds with probability pvw, a parameter of the system, independently of the history thus far. (If w has multiple newly activated neighbors, their attempts are sequenced in an arbitrary order.) If v succeeds, then w will become active in step t+1; but whether or not v succeeds, it cannot make any further attempts to activate w in subsequent rounds. Again, the process runs until no more activations are possible.
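
A minimal sketch of the Independent Cascade process described above, assuming a directed graph stored as prob[v][w] = pvw. The function name and the toy graph are illustrative.

```python
import random


def independent_cascade(prob, seeds, rng):
    """Return the final active set of one Independent Cascade run.

    prob[v][w] is the probability that v activates w on its single attempt.
    """
    active = set(seeds)
    frontier = list(seeds)          # nodes that became active in the last step
    while frontier:
        next_frontier = []
        for v in frontier:
            for w, p in prob.get(v, {}).items():
                # v gets exactly one attempt on each currently inactive neighbor.
                if w not in active and rng.random() < p:
                    active.add(w)
                    next_frontier.append(w)
        frontier = next_frontier    # only newly active nodes attempt next
    return active


# Hypothetical toy graph.
prob = {
    "a": {"b": 0.5, "c": 0.2},
    "b": {"c": 0.3, "d": 0.4},
    "c": {"d": 0.1},
    "d": {},
}

# The process is random, so average the cascade size over many runs.
rng = random.Random(0)
runs = 2000
mean_size = sum(len(independent_cascade(prob, {"a"}, rng)) for _ in range(runs)) / runs
print("mean cascade size from seed a:", mean_size)
```

Because each node enters the frontier only once, no node ever makes a second attempt on the same neighbor.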

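37 Example
[Figure: independent cascade example. Nodes include U, X, v and w; edges carry probabilities between 0.1 and 0.6. Legend: inactive node, active node, newly active node, successful attempt, unsuccessful attempt. The process runs until no further activation is possible ("Stop!").]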

39 Burt: structural holes and good ideas
Managers asked to come up with an idea to improve the supply chain. Then asked: whom did you discuss the idea with? whom do you discuss supply-chain issues with in general? do those contacts discuss ideas with one another? 673 managers (455 (68%) completed the survey); ~4000 relationships (edges)

40

41

42 people whose networks bridge structural holes have
results: people whose networks bridge structural holes have higher compensation, positive performance evaluations, more promotions, more good ideas. these brokers are: more likely to express ideas; less likely to have their ideas dismissed by judges; more likely to have their ideas evaluated as valuable

43 networks & information advantage
Betweenness Constrained vs. Unconstrained Source: M. van Alstyne, S. Aral. Networks, Information & Social Capital slides: Marshall van Alstyne

44 Aral & van Alstyne: Study of a head hunter firm
Three firms initially. Unusually measurable inputs and outputs: 1300 projects over 5 yrs and 125,000 messages over 10 months (avg 20% of time!). Metrics: (i) revenues per person and per project, (ii) number of completed projects, (iii) duration of projects, (iv) number of simultaneous projects, (v) compensation per person. Main firm: 71 people in executive search (+2 firms partial data): 27 Partners, 29 Consultants, 13 Research, 2 IT staff. Four data sets per firm: 52-question survey (86% response rate); accounting; 15 semi-structured interviews

45 Email structure matters
New Contract Revenue (dependent variable: Bookings02), unstandardized coefficients:
  (Base Model): Adj. R2 = 0.40
  Best structural pred.: B = …***, Std. Error = 4454.0, Adj. R2 = 0.52, Sig. F Δ = .006
  Ave. Size: B = -10.7**, Std. Error = 4.9, Adj. R2 = 0.56, Sig. F Δ = .042
  Colleagues’ Ave. Response Time: B = …, Std. Error = …, Adj. R2 = 0.56, Sig. F Δ = .248
Contract Execution Revenue (dependent variable: Billings02), unstandardized coefficients:
  (Base Model): Adj. R2 = 0.19
  Best structural pred.: B = 1544.0**, Std. Error = 639.0, Adj. R2 = 0.30, Sig. F Δ = .021
  Ave. Size: B = -9.3*, Std. Error = 4.7, Adj. R2 = 0.34, Sig. F Δ = .095
  Colleagues’ Ave. Response Time: B = …**, Std. Error = …, Adj. R2 = 0.42, Sig. F Δ = .026
Base Model: YRS_EXP, PARTDUM, %_CEO_SRCH, SECTOR(dummies), %_SOLO. N=39. *** p<.01, ** p<.05, * p<.1
Slight negative correlation for more structural holes and execution (billings) controlling for bookings => maintaining diverse contacts requires a lot of work. Sending shorter emails helps get contracts and finish them. Faster response from colleagues helps finish them.

46 getting information sooner correlates with $$ brought in
diverse networks drive performance by providing access to novel information. network structure (having high degree) correlates with receiving novel information sooner (as deduced from hashed versions of their emails). getting information sooner correlates with $$ brought in, controlling for # of years worked, job level, ….

47 Network Structure Matters
New Contract Revenue (dependent variable: Bookings02), unstandardized coefficients:
  (Base Model): Adj. R2 = 0.40
  Size Struct. Holes: B = 13770***, Std. Error = 4647, Adj. R2 = 0.52, Sig. F Δ = .006
  Betweenness: B = 1297*, Std. Error = 773, Adj. R2 = 0.47, Sig. F Δ = .040
Contract Execution Revenue (dependent variable: Billings02), unstandardized coefficients:
  (Base Model): Adj. R2 = 0.19
  Size Struct. Holes: B = 7890*, Std. Error = 4656, Adj. R2 = 0.24, Sig. F Δ = .100
  Betweenness: B = 1696**, Std. Error = 697, Adj. R2 = 0.30, Sig. F Δ = .021
Base Model: YRS_EXP, PARTDUM, %_CEO_SRCH, SECTOR(dummies), %_SOLO. N=39. *** p<.01, ** p<.05, * p<.1
Slight negative correlation for more structural holes and execution (billings) controlling for bookings => maintaining diverse contacts requires a lot of work. Bridging diverse communities is significant. Being in the thick of information flows is significant.

49 networked coordination game
choice between two things, A and B (e.g. basketball and soccer). if both friends choose A, they each get payoff a. if both friends choose B, they each get payoff b. if one chooses A while the other chooses B, their payoff is 0

50 coordinating with one’s friends
Let A = basketball, B = soccer. Which one should you learn to play? fraction p = 3/5 play basketball; fraction 1-p = 2/5 play soccer

51 which choice has higher payoff?
d neighbors: p fraction play basketball (A), (1-p) fraction play soccer (B). if choose A, get payoff p * d * a. if choose B, get payoff (1-p) * d * b. so should choose A if p d a ≥ (1-p) d b, or p ≥ b / (a + b)

52 two equilibria everyone adopts A everyone adopts B

53 what happens in between?
What if two nodes switch at random? Will a cascade occur? example: a = 3, b = 2. the payoff for two nodes interacting using behavior A is 3/2 as large as what they get if they both choose B. nodes will switch from B to A if at least q = 2/(3+2) = 2/5 of their neighbors are using A
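
The switching rule can be sketched as a small cascade simulation. It assumes, as in the Easley/Kleinberg treatment, that nodes who adopt A never switch back; the path graph and the function name are hypothetical.

```python
from fractions import Fraction


def cascade(adj, initial_adopters, a, b):
    """Spread behavior A from the initial adopters; return (q, final adopters).

    A node switches from B to A once at least q = b / (a + b)
    of its neighbors use A.
    """
    q = Fraction(b, a + b)
    adopters = set(initial_adopters)
    while True:
        switching = {
            v for v in adj
            if v not in adopters and adj[v]
            and Fraction(len(adj[v] & adopters), len(adj[v])) >= q
        }
        if not switching:
            return q, adopters
        adopters |= switching  # assumption: adopters of A never switch back


# Hypothetical path graph 0-1-2-3-4; nodes 0 and 1 start with A.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}

q, final = cascade(path, {0, 1}, a=3, b=2)
print("threshold q =", q, "final adopters:", sorted(final))

q2, final2 = cascade(path, {0, 1}, a=2, b=3)
print("threshold q =", q2, "final adopters:", sorted(final2))
```

With a = 3 and b = 2 each interior node needs only one of its two neighbors to use A, so the cascade runs down the whole path; swapping the payoffs raises the threshold to 3/5 and the cascade never starts.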

54 how does a cascade occur
suppose 2 nodes start playing basketball due to external factors e.g. they are bribed with a free pair of shoes by some devious corporation

55 Quiz Q: Which node(s) will switch to playing basketball next?

56 the complete cascade

57 you pick the initial 2 nodes
A larger example (Easley/Kleinberg Ch. 19) does the cascade spread throughout the network?

58 implications for viral marketing
if you could pay a small number of individuals to use your product, which individuals would you pick?

59 What is the role of communities in complex contagion
Quiz question: What is the role of communities in complex contagion: enabling ideas to spread in the presence of thresholds; creating isolated pockets impervious to outside ideas; allowing different opinions to take hold in different parts of the network

60 so far nodes could only choose between A and B
bilingual nodes so far nodes could only choose between A and B what if you can play both A and B, but pay an additional cost c?

61 try it on a line Increase the cost of being bilingual so that no node chooses to do so. Let the cascade run Now lower the cost. What happens?

62 The presence of bilingual nodes
Quiz Q: The presence of bilingual nodes: helps the superior solution to spread throughout the network; helps inferior options to persist in the network; causes everyone in the network to become bilingual

63 knowledge, thresholds, and collective action
nodes need to coordinate across a network, but have limited horizons

64 can individuals coordinate?
each node will act if at least x people (including itself) mobilize. [Figure outcome: nodes will not mobilize.]

65 mobilization there will be some turnout

66 Quiz Q: will this network mobilize (at least some fraction of the nodes will protest)?

67 innovation in networks
network topology influences who talks to whom who talks to whom has important implications for innovation and learning

68 better to innovate or imitate?
brainstorming: more minds together, but also danger of groupthink working in isolation: more independence slower progress

69 in a network context

70 modeling the problem space
Kauffman’s NK model: N dimensional problem space; N bits, each can be 0 or 1. K describes the ruggedness of the fitness landscape: how similar is the fitness of sequences with only 1-2 bits flipped (in Kauffman's model, K = 0 gives a smooth, single-peaked landscape; large K gives a rugged landscape where nearby sequences have little similarity in fitness)

71 Kauffman’s NK model
[Figure: fitness vs. distance for K large, K medium and K small.]

72 As a node, you start out with a random bit string At each iteration
Update rules. As a node, you start out with a random bit string. At each iteration: if one of your neighbors has a solution that is more fit than yours, imitate (copy their solution); otherwise innovate by flipping one of your bits.
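
A minimal sketch of these update rules on a toy NK-style landscape. Several choices here are assumptions rather than part of the slides: the fitness contribution of each bit depends on that bit and the K bits after it (cyclically), contributions are drawn at random on first use, a flipped bit is kept only if it improves fitness, and the network is a ring.

```python
import random


def make_nk_fitness(n, k, rng):
    """Build a random NK-style fitness function over tuples of n bits."""
    table = {}

    def fitness(bits):
        total = 0.0
        for i in range(n):
            key = (i, tuple(bits[(i + d) % n] for d in range(k + 1)))
            if key not in table:
                table[key] = rng.random()  # contribution drawn once, then fixed
            total += table[key]
        return total / n

    return fitness


def step(adj, solutions, fitness, rng):
    """One synchronous round: imitate a fitter neighbor, otherwise innovate."""
    new = {}
    for node, bits in solutions.items():
        best_nb = max(adj[node], key=lambda nb: fitness(solutions[nb]))
        if fitness(solutions[best_nb]) > fitness(bits):
            new[node] = solutions[best_nb]          # imitate
        else:
            trial = list(bits)
            trial[rng.randrange(len(trial))] ^= 1   # flip one random bit
            trial = tuple(trial)
            # Assumption: keep the flip only if it improves fitness.
            new[node] = trial if fitness(trial) > fitness(bits) else bits
    return new


n_bits, k, n_nodes = 10, 3, 12
fitness = make_nk_fitness(n_bits, k, random.Random(1))
rng = random.Random(2)
ring = {i: [(i - 1) % n_nodes, (i + 1) % n_nodes] for i in range(n_nodes)}
solutions = {i: tuple(rng.randrange(2) for _ in range(n_bits)) for i in ring}

for _ in range(30):
    solutions = step(ring, solutions, fitness, rng)
print("distinct solutions left:", len(set(solutions.values())))
print("best fitness:", max(fitness(s) for s in solutions.values()))
```

Replacing the ring with a fully connected graph makes imitation dominate sooner, which is the trade-off between quick convergence and continued exploration that the following slides discuss.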

73 Quiz Q: Relative to the regular lattice, the network with many additional, random connections has on average: slower convergence to a local optimum; smaller improvement in the best solution relative to the initial maximum; more oscillations between solutions

74 networks and innovation: is more information diffusion always better?
linear network fully connected network Nodes can innovate on their own (slowly) or adopt their neighbor’s solution Best solutions propagate through the network source: Lazer and Friedman, The Parable of the Hare and the Tortoise: Small Worlds, Diversity, and System Performance

75 networks and innovation
fully connected network converges more quickly on a solution, but if there are lots of local maxima in the solution space, it may get stuck without finding optimum. linear network (fewer edges) arrives at better solution eventually because individuals innovate longer

76 Coordination: graph coloring
Application: coloring a map: limited set of colors, no two adjacent countries should have the same color

77 graph coloring on a network
Each node is a human subject. Different experimental conditions: knowledge of neighbors’ color; knowledge of entire network. Compare: regular ring lattice; small-world topology; scale-free networks. Kearns et al., ‘An Experimental Study of the Coloring Problem on Human Subject Networks’, Science, 313(5788), pp , 2006

78 simulation Kearns et al. “An Experimental Study of the Coloring Problem on Human Subject Networks” network structure affects convergence in coordination games, e.g. graph coloring

79 Quiz Q: As the rewiring probability is increased from 0 to 1 the following happens: the solution time decreases; the solution time increases; the solution time initially decreases then increases again

80 network topology influences processes occurring on networks
to sum up: network topology influences processes occurring on networks: what state the nodes converge to, and how quickly they get there. process mechanism matters: simple vs. complex contagion, coordination, learning. diffusion can be simple (person to person) or complex (individuals having thresholds). people in special network positions (the brokers) have an advantage in receiving novel info & coming up with “novel” ideas. in some scenarios, information diffusion may hinder innovation

