Presentation on theme: "Information diffusion"— Presentation transcript:
1 Information diffusion Lecture 27:Information diffusionCS 765 Complex NetworksSlides are modified from Lada Adamic, David Kempe, Bill Hackbor
2 factors influencing information diffusion outlinefactors influencing information diffusionnetwork structure: which nodes are connected?strength of ties: how strong are the connections?studies in information diffusion:Granovetter: the strength of weak tiesJ-P Onnela et al: strength of intermediate tiesKossinets et al: strength of backbone tiesDavis: board interlocks and adoption of practicesnetwork position and access to informationBurt: Structural holes and good ideasAral and van Alstyne: networks and information advantagenetworks and innovationLazer and Friedman: innovation
3 factors influencing diffusion network structure (unweighted)densitydegree distributionclusteringconnected componentscommunity structurestrength of ties (weighted)frequency of communicationstrength of influencespreading agentattractiveness and specificity of information
4 Less likely to be a bridge (or a local bridge) Strong tie definedA strong tiefrequent contactaffinitymany mutual contactsLess likely to be a bridge (or a local bridge)“forbidden triad”:strong ties are likely to “close”Source: Granovetter, M. (1973). "The Strength of Weak Ties",
5 edge embeddenessembeddeness: number of common neighbors the two endpoints haveneighborhood overlap:
6 school kids and 1st through 8th choices of friends snowball sampling:will you reach more different kids by asking each kid to name their 2 best friends, or their 7th & 8th closest friend?Source: M. van Alstyne, S. Aral. Networks, Information & Social Capital
7 is it good to be embedded? What are the advantages of occupying an embedded position in the network?What are the disadvantages of being embedded?Advantages of being a broker (spanning structural holes)?Disadvantages of being a broker?
8 factors influencing information diffusion outlinefactors influencing information diffusionnetwork structure: which nodes are connected?strength of ties: how strong are the connections?studies in information diffusion:Granovetter: the strength of weak tiesJ-P Onnela et al: strength of intermediate tiesKossinets et al: strength of backbone tiesDavis: board interlocks and adoption of practicesnetwork position and access to informationBurt: Structural holes and good ideasAral and van Alstyne: networks and information advantagenetworks and innovationLazer and Friedman: innovation
9 how does strength of a tie influence diffusion? M. S. Granovetter: The Strength of Weak Ties, AJS, 1973:finding a job through a contact that one sawfrequently (2+ times/week) 16.7%occasionally (more than once a year but < 2x week) 55.6%rarely 27.8%but… length of path is shortcontact directly works for/is the employeror is connected directly to employer
10 strength of tie: frequency of communication Kossinets, Watts, Kleinberg, KDD 2008:which paths yield the most up to date info?how many of the edges form the “backbone”?source: Kossinets et al. “The structure of information pathways in a social communication network”
11 the strength of intermediate ties strong tiesfrequent communication, but ties are redundant due to high clusteringweak tiesreach far across network, but communication is infrequent…“Structure and tie strengths in mobile communication networks”use nation-wide cellphone call records and simulate diffusion using actual call timingThe dynamics of spreading on the weighted mobile call graph, assuming that the probability for a node vi to pass on the information to its neighbor vj in one time step is given by Pij = xwij, with x = 2.59 × 10−4. (A) The fraction of infected nodes as a function of time t. The blue curve (circles) corresponds to spreading on the network with the real tie strengths, whereas the black curve (asterisks) represents the control simulation, in which all tie strengths are considered equal. (B) Number of infected nodes as a function of time for a single realization of the spreading process. Each steep part of the curve corresponds to invading a small community. The flatter part indicates that the spreading becomes trapped within the community. (C and D) Distribution of strengths of the links responsible for the first infection for a node in the real network (C) and control simulation (D). (E and F) Spreading in a small neighborhood in the simulation using the real weights (E) or the control case, in which all weights are taken to be equal (F). The infection in all cases was released from the node marked in red, and the empirically observed tie strength is shown as the thickness of the arrows (right-hand scale). The simulation was repeated 1,000 times; the size of the arrowheads is proportional to the number of times that information was passed in the given direction, and the color indicates the total number of transmissions on that link (the numbers in the color scale refer to percentages of 1,000). The contours are guides to the eye, illustrating the difference in the information direction flow in the two simulations.
12 Localized strong ties slow infection spread. source: Onnela J. et.al. Structure and tie strengths in mobile communication networks
13 in lab: complex contagion (Centola & Macy, AJS, 2007) how can information diffusion be different from simple contagion (e.g. a virus)?simple contagion:infected individual infects neighbors with information at some ratethreshold contagion:individuals must hear information (or observe behavior) from a number or fraction of friends before adoptingin lab: complex contagion (Centola & Macy, AJS, 2007)how do you pick individuals to “infect” such that your opinion prevails
14 Each node is in one of two states FrameworkThe network consists of nodes (individual) and edges (links between nodes)Each node is in one of two statesSusceptible (in other words, healthy)InfectedSusceptible-Infected-Susceptible (SIS) modelCured nodes immediately become susceptibleInfected by neighborSusceptibleInfectedCured internally
15 Framework (Continued) Homogeneous birth rate β on all edges between infected and susceptible nodesHomogeneous death rate δ for infected nodesHealthyProb. δN2Prob. βN1XInfectedN3
16 SIR and SIS Models An SIS model consists of two group Susceptible: Those who may contract the diseaseInfected: Those infectedAn SIR model consists of three groupRecovered: Those with natural immunity or those that have died.
17 If R0 > 1 then ∆I > 0 and an epidemic occurs Important Parametersα is the transmission coefficient, which determines the rate at which the disease travels from one population to another.γ is the recovery rate: (I persons)/(days required to recover)R0 is the basic reproduction number.(Number of new cases arising from one infective) x (Average duration of infection)If R0 > 1 then ∆I > 0 and an epidemic occurs
19 Diffusion in networks: ER graphs review: diffusion in ER graphs
20 When the density of the network increases, diffusion in the network is Quiz Q:When the density of the network increases, diffusion in the network isfasterslowerunaffected
21 ER graphs: connectivity and density nodes infected after 10 steps, infection rate = 0.15average degree = 2.5average degree = 10
22 Quiz Q:When nodes preferentially attach to high degree nodes, the diffusion over the network isfasterslowerunaffected
23 Diffusion in “grown networks” nodes infected after 4 steps, infection rate = 1preferential attachmentnon-preferential growth
24 Diffusion in small worlds What is the role of the long-range links in diffusion over small world topologies?
25 Quiz Q:Relative to the simple contagion process the threshold contagion process:is better able to use shortcutsadvances more rapidly through the networkinfects a greater number of nodes
26 diffusion of innovation surveys:farmers adopting new varieties of hybrid corn by observing what their neighbors were planting (Ryan and Gross, 1943)doctors prescribing new medication (Coleman et al. 1957)spread of obesity & happiness in social networks (Christakis and Fowler, 2008)online behavioral data:Spread of Flickr photos & Digg stories (Lerman, 2007)joining LiveJournal groups & CS conferences (Backstrom et al. 2006)+ others e.g. Anagnostopoulos et al. 2008
27 Open question: how do we tell influence from correlation? Did Alice purchase a camera because her friend Bob influenced her to (social influence in action)?Or did Alice and Bob become friends due to homophily (photographers stick together) and they would have both bought the camera anyway (correlation between interests and friendships)?approaches:time resolved data: if adoption time is shuffled, does it yield the same patterns?if edges are directed: does reversing the edge direction yield less predictive power?
28 Example: adopting new practices poison pillsdiffused through interlocksgeography had little to do with itmore likely to be influencedby tie to firm doing somethingsimilar & having similar centralitygolden parachutesdid not diffuse through interlocksgeography was a significant factormore likely to follow “central” firmswhy did one diffuse through the “network” while the other did not?Source: Corporate Elite Networks and Governance Changes in the 1980s.
29 Social Network and Spread of Influence Social network plays a fundamental role as a medium for the spread of INFLUENCE among its membersOpinions, ideas, information, innovation…Direct Marketing takes the “word-of-mouth” effects to significantly increase profitsGmail, Tupperware popularization, Microsoft Origami …
30 Application besides product marketing Problem SettingGivena limited budget B for initial advertisinge.g. give away free samples of productestimates for influence between individualsGoaltrigger a large cascade of influencee.g. further adoptions of a productQuestionWhich set of individuals should B target at?Application besides product marketingspread an innovationdetect stories in blogsif we can try to convince a subset of individuals to adopt a newproduct or innovationBut how should we choose the few key individuals to use for seeding this process?Which blogs should one read to be most up to date?
31 First mathematical models Large body of subsequent work: Models of InfluenceFirst mathematical models[Schelling '70/'78, Granovetter '78]Large body of subsequent work:[Rogers '95, Valente '95, Wasserman/Faust '94]Two basic classes of diffusion models: threshold and cascadeGeneral operational view:A social network is represented as a directed graph, with each person (customer) as a nodeNodes start either active or inactiveAn active node may trigger activation of neighboring nodesMonotonicity assumption: active nodes never deactivatewe focus on more operational models from mathematical sociology [15, 28] and interacting particle systems[11, 17] that explicitly represent the step-by-step dynamics of adoption.Assumption: node can switch to active from inactive, but does not switch in the other direction
32 Threshold dynamics The network: aij is the adjacency matrix (N ×N) un-weightedundirectedThe nodes:are labelled i , i from 1 to N;have a state ;and a threshold ri from some distribution.
33 Threshold dynamics The fraction of nodes in state vi=1 is r(t): Node i has stateand thresholdNeighbourhood average:Updating:The fraction of nodes in state vi=1 is r(t):
34 Linear Threshold Model A node v has random threshold θv ~ U[0,1]A node v is influenced by each neighbor w according to a weight bvw such thatA node v becomes active when at least(weighted) θv fraction of its neighbors are activeGiven a random choice of thresholds, and an initial set ofactive nodes A0 (with all other nodes inactive), the diffusion processunfolds deterministically in discrete steps: in step t, all nodesthat were active in step t-1 remain active, and we activate anynode v for which the total weight of its active neighbors is at leastTheta(v)
35 Stop! Example w v Inactive Node 0.6 Active Node 0.2 Threshold 0.2 0.3 Active neighborsX0.10.4U0.30.5Stop!0.20.5wv
36 Independent Cascade Model When node v becomes active, it has a single chance of activating each currently inactive neighbor w.The activation attempt succeeds with probability pvw .We again start with an initial set of active nodes A0, and the process unfolds in discrete steps according tothe following randomized rule. When node v first becomes activein step t, it is given a single chance to activate each currently inactiveneighbor w; it succeeds with a probability pv;w —a parameterof the system — independently of the history thus far. (If w hasmultiple newly activated neighbors, their attempts are sequenced inan arbitrary order.) If v succeeds, then w will become active in stept+1; but whether or not v succeeds, it cannot make any further attemptsto activate w in subsequent rounds. Again, the process runsuntil no more activations are possible.
38 factors influencing information diffusion outlinefactors influencing information diffusionnetwork structure: which nodes are connected?strength of ties: how strong are the connections?studies in information diffusion:Granovetter: the strength of weak tiesJ-P Onnela et al: strength of intermediate tiesKossinets et al: strength of backbone tiesDavis: board interlocks and adoption of practicesnetwork position and access to informationBurt: Structural holes and good ideasAral and van Alstyne: networks and information advantagenetworks and innovationLazer and Friedman: innovation
39 Burt: structural holes and good ideas Managers asked to come up with an idea to improve the supply chainThen asked:whom did you discuss the idea with?whom do you discuss supply-chain issues with in generaldo those contacts discuss ideas with one another?673 managers (455 (68%) completed the survey)~ 4000 relationships (edges)
42 people whose networks bridge structural holes have resultspeople whose networks bridge structural holes havehigher compensationpositive performance evaluationsmore promotionsmore good ideasthese brokers aremore likely to express ideasless likely to have their ideas dismissed by judgesmore likely to have their ideas evaluated as valuable
43 networks & information advantage BetweennessConstrained vs. UnconstrainedSource: M. van Alstyne, S. Aral. Networks, Information & Social Capitalslides: Marshall van Alstyne
44 Aral & Alstyne: Study of a head hunter firm Three firms initiallyUnusually measurable inputs and outputs1300 projects over 5 yrs and125,000 messages over 10 months (avg 20% of time!)Metrics(i) Revenues per person and per project,(ii) number of completed projects,(iii) duration of projects,(iv) number of simultaneous projects,(v) compensation per personMain firm 71 people in executive search (+2 firms partial data)27 Partners, 29 Consultants, 13 Research, 2 IT staffFour Data Sets per firm52 Question Survey (86% response rate)Accounting15 Semi-structured interviews
45 Email structure matters New Contract RevenueCoefficientsaContract Execution RevenueCoefficientsaUnstandardized CoefficientsUnstandardized CoefficientsBStd. ErrorAdj. R2Sig. F BStd. ErrorAdj. R2Sig. F (Base Model)0.400.19Best structural pred.***4454.00.52.0061544.0**639.00.30.021Ave. Size-10.7**4.90.56.042-9.3*4.70.34.095Colleagues’ Ave.0.56.248**0.42.026Response Timea.Dependent Variable: Bookings02a.Dependent Variable: Billings02b.b.Base Model: YRS_EXP, PARTDUM, %_CEO_SRCH, SECTOR(dummies), %_SOLO.N=39. *** p<.01, ** p<.05, * p<.1Slight negative correlation for more structural holes and execution (billings) controlling for bookings => maintaining diverse contacts requires a lot of work.Sending shorter helps get contracts and finish them.Faster response from colleagues helps finish them.
46 getting information sooner correlates with $$ brought in diverse networks drive performance by providing access to novel informationnetwork structure (having high degree) correlates with receiving novel information sooner (as deduced from hashed versions of their )getting information sooner correlates with $$ brought incontrolling for # of years workedjob level….
47 Network Structure Matters New Contract RevenueCoefficientsaContract Execution RevenueCoefficientsaUnstandardized CoefficientsUnstandardized CoefficientsBStd. ErrorAdj. R2Sig. F BStd. ErrorAdj. R2Sig. F (Base Model)0.400.19Size Struct. Holes13770***46470.52.0067890*46560.24.100Betweenness1297*7730.47.0401696**6970.30.021a.Dependent Variable: Bookings02a.Dependent Variable: Billings02b.b.Base Model: YRS_EXP, PARTDUM, %_CEO_SRCH, SECTOR(dummies), %_SOLO.N=39. *** p<.01, ** p<.05, * p<.1Slight negative correlation for more structural holes and execution (billings) controlling for bookings => maintaining diverse contacts requires a lot of work.Bridging diverse communities is significant.Being in the thick of information flows is significant.
48 factors influencing information diffusion outlinefactors influencing information diffusionnetwork structure: which nodes are connected?strength of ties: how strong are the connections?studies in information diffusion:Granovetter: the strength of weak tiesJ-P Onnela et al: strength of intermediate tiesKossinets et al: strength of backbone tiesDavis: board interlocks and adoption of practicesnetwork position and access to informationBurt: Structural holes and good ideasAral and van Alstyne: networks and information advantagenetworks and innovationLazer and Friedman: innovation
49 networked coordination game choice between two things, A and Be.g. basketball and soccerif friends choose A, they get payoff aif friends choose B, they get payoff bif one chooses A while the other chooses B, their payoff is 0
50 coordinating with one’s friends Let A = basketball, B = soccer. Which one should you learn to play?fraction p = 3/5 play basketballfraction p = 2/5 play soccer
51 which choice has higher payoff? d neighborsp fraction play basketball (A)(1-p) fraction play soccer (B)if choose A, get payoff p * d *aif choose B, get payoff (1-p) * d * bso should choose A ifp d a ≥ (1-p) d borp ≥ b / (a + b)
52 two equilibriaeveryone adopts Aeveryone adopts B
53 what happens in between? What if two nodes switch at random? Will a cascade occur?example:a = 3, b = 2payoff for nodes interaction using behavior A is 3/2as large as what they get if they both choose Bnodes will switch from B to A if at least q = 2/(3+2) = 2/5 of their neighbors are using A
54 how does a cascade occur suppose 2 nodes start playing basketball due to external factorse.g. they are bribed with a free pair of shoes by some devious corporation
55 Quiz Q:Which node(s) will switch to playing basketball next?
57 you pick the initial 2 nodes A larger example (Easley/Kleinberg Ch. 19)does the cascade spread throughout the network?
58 implications for viral marketing if you could pay a small number of individuals to use your product,which individuals would you pick?
59 What is the role of communities in complex contagion Quiz question:What is the role of communities in complex contagionenabling ideas to spread in the presence of thresholdscreating isolated pockets impervious to outside ideasallowing different opinions to take hold in different parts of the network
60 so far nodes could only choose between A and B bilingual nodesso far nodes could only choose between A and Bwhat if you can play both A and B,but pay an additional cost c?
61 try it on a lineIncrease the cost of being bilingual so that no node chooses to do so. Let the cascade runNow lower the cost.What happens?
62 The presence of bilingual nodes Quiz Q:The presence of bilingual nodeshelps the superior solution to spread throughout the networkhelps inferior options to persist in the networkcauses everyone in the network to become bilingual
63 knowledge, thresholds, and collective action nodes need to coordinate across a network, but have limited horizons
64 can individuals coordinate? each node will act if at least x people (including itself) mobilizenodes will not mobilize
70 modeling the problem space Kauffman’s NK modelN dimensional problem spaceN bits, each can be 0 or 1K describes the smoothness of the fitness landscapehow similar is the fitness of sequences with only 1-2 bits flipped (K = 0, no similarity, K large, smooth fitness)
71 Kauffman’s NK modelK largeK mediumK smallfitnessdistance
72 As a node, you start out with a random bit string At each iteration Update rulesAs a node, you start out with a random bit stringAt each iterationIf one of your neighbors has a solution that is more fit than yours, imitate (copy their solution)Otherwise innovate by flipping one of your bits
73 Quiz Q:Relative to the regular lattice, the network with many additional, random connections has on average:slower convergence to a local optimumsmaller improvement in the best solution relative to the initial maximummore oscillations between solutions
74 networks and innovation: is more information diffusion always better? linear networkfully connected networkNodes can innovate on their own (slowly) or adopt their neighbor’s solutionBest solutions propagate through the networksource: Lazer and Friedman, The Parable of the Hare and the Tortoise: Small Worlds, Diversity, and System Performance
75 networks and innovation fully connected network converges more quickly on a solution, but if there are lots of local maxima in the solution space, it may get stuck without finding optimum.linear network (fewer edges) arrives at better solution eventually because individuals innovate longer
76 Coordination: graph coloring Application: coloring a map: limited set of colors, no two adjacent countries should have the same color
77 graph coloring on a network Each node is a human subject. Different experimental conditions:knowledge of neighbors’ colorknowledge of entire networkCompare:regular ring latticesmall-world topologyscale-free networksKearns et al., ‘An Experimental Study of the Coloring Problem on Human Subject Networks’,Science, 313(5788), pp , 2006
78 simulationKearns et al. “An Experimental Study of the Coloring Problem on Human Subject Networks”network structure affects convergence in coordination games, e.g. graph coloring
79 Quiz Q:As the rewiring probability is increased from 0 to 1 the following happens:the solution time decreasesthe solution time increasesthe solution time initially decreases then increases again
80 network topology influences processes occurring on networks to sum upnetwork topology influences processes occurring on networkswhat state the nodes converge tohow quickly they get thereprocess mechanism matters:simple vs. complex contagioncoordinationlearningdiffusion can be simple (person to person) or complex (individuals having thresholds)people in special network positions (the brokers) have an advantage in receiving novel info & coming up with “novel” ideasin some scenarios, information diffusion may hinder innovation