Presentation is loading. Please wait.

Presentation is loading. Please wait.

Influence and Correlation in Social Networks Aris Anagnostopoulos Ravi Kumar Mohammad Mahdian.

Similar presentations


Presentation on theme: "Influence and Correlation in Social Networks Aris Anagnostopoulos Ravi Kumar Mohammad Mahdian."— Presentation transcript:

1 Influence and Correlation in Social Networks Aris Anagnostopoulos Ravi Kumar Mohammad Mahdian

2 Preliminaries - Correlations exist in users' behaviors

3 Preliminaries - Correlations exist in users' behaviors - Representation: individuals are nodes of a social graph, G every node is "active" or "inactive" - Formally, correlation = if u and v are adjacent in G: the event that u becomes active is correlated with v becoming active

4 Preliminaries - Correlations exist in users' behaviors - Representation: individuals are nodes of a social graph, G every node is "active" or "inactive" - Formally, correlation = if u and v are adjacent in G: the event that u becomes active is correlated with v becoming active - Want to distinguish between different sources of social correlation

5 Models of Social Correlation - Homophily = tendency for individuals to choose friends with similar characteristics / preferences

6 Models of Social Correlation - Homophily = tendency for individuals to choose friends with similar characteristics / preferences - Confounding = external influence from elements in the environment (confounding factors)‏

7 Models of Social Correlation - Homophily = tendency for individuals to choose friends with similar characteristics / preferences - Confounding = external influence from elements in the environment (confounding factors)‏ - Influence = the action of one individual induces another individual to act in a similar way.

8 Motivation - Useful to know when social influence is the source of correlation

9 Motivation - Useful to know when social influence is the source of correlation - Viral marketing -> want to target select individuals

10 Motivation - Useful to know when social influence is the source of correlation - Viral marketing -> want to target select individuals - Influence behavior -> create "role models" (e.g. in fashion)‏

11 Motivation - Useful to know when social influence is the source of correlation - Viral marketing -> want to target select individuals - Influence behavior -> create "role models" (e.g. in fashion)‏ - We want to identify situations when such techniques can be applied.

12 Motivation - Useful to know when social influence is the source of correlation - Viral marketing -> want to target select individuals - Influence behavior -> create "role models" (e.g. in fashion)‏ - We want to identify situations when such techniques can be applied. - Also useful for analysis (predicting future state of network)‏

13 Modeling Influence 1. Graph G drawn according to some distribution

14 Modeling Influence 1. Graph G drawn according to some distribution 2. In each of the time steps 1,..., T, each non-active agent decides whether to become active.

15 Modeling Influence 1. Graph G drawn according to some distribution 2. In each of the time steps 1,..., T, each non-active agent decides whether to become active. 3. An agent becomes active with probability p(a), a function of the number of neighboring and active nodes.

16 or, alternatively,

17 Some remarks... - The coefficient α measures social correlation.

18 Some remarks... - The coefficient α measures social correlation. - Since actions are stored, a represents the number of users active at any earlier time step

19 Some remarks... - The coefficient α measures social correlation. - Since actions are stored, a represents the number of users active at any earlier time step - This model is relatively simplistic: - the probability does not vary between nodes - or as time passes

20 Some remarks... - The coefficient α measures social correlation. - Since actions are stored, a represents the number of users active at any earlier time step - This model is relatively simplistic: - the probability does not vary between nodes - or as time passes - However, these simplifying assumption are practical

21 Estimating α, β - Can estimate using maximum likelihood logistic regression - Maximize expression where is the number of users who at the beginning of time had a active friends and became active at time t

22 The Shuffle Test - Idea: if influence does not play a role, then the timing of activations amongst users should be independent of each other: Pr(a active before b) = Pr(b active before a)‏

23 The Shuffle Test 1. Estimate α for initial graph 2. Randomly permute the order in which active nodes have been activated: set the time of 3. Estimate α' for this configuration 4. If the values for α and α' are close to each other, the model exhibits little or no social influence.

24 The Edge-reversal Test 1. reverse direction of all the edges 2. run the same logistic regression on the data using the new graph If correlation is not due to influence, then α should not change

25 Generative Models - No Correlation - Influence - Correlation, no influence

26 Generative Models - No Correlation - network grows just as the real data - at every step, randomly pick n nodes, and make them active

27 Influence Model - network grows just as the real data - at every step, every inactive node flips a coin, with

28 Correlation, No Influence Model - network grows just as the real data - Pick a subset S of G: - randomly pick centers, add a ball of radius 2 from each to S - do this until |S| reaches parameter L - Pick nodes to become active uniformly at random, from S

29 Distinguishing Influence: Shuffle Test Influence: Correlation:

30 Distinguishing Influence: Edge Reversal Correlation: Influence:

31 Real Data: the Flickr Dataset - analyzed 800K users over 16 months - about 340K exhibited tagging behavior - size of giant component: 160K - 2.8M directed edges, 28.5% not mutual - analyzed 1,700 tags independently - various types (event, color, object, etc)‏ - various numbers of users - various growth patterns (bursty, smooth, periodic)‏

32 Distinguishing Influence in Flickr Shuffle test

33 Distinguishing Influence in Flickr Edge reversal test

34 Some Influence - can discover traces of influence by looking at similar tags

35 Some Influence - can discover traces of influence by looking at similar tags - for the tag "graffiti", the difference between αs was 0 - however, for the misspelling "grafitti", difference was slightly larger - with even less common misspelling "graffitti", difference increased even more

36 Conclusions - distinguishing between correlation and causation is difficult

37 Conclusions - distinguishing between correlation and causation is difficult - timing information can help answer the question (shuffle)‏

38 Conclusions - distinguishing between correlation and causation is difficult - timing information can help answer the question (shuffle)‏ - knowing of asymmetric social ties is also useful (edge-reversal)‏

39 Further research directions - formal verification of results? (controlled experiments)‏ - quantification of the strength of influence? - identify which nodes influence others - what if social ties are symmetric? - distinguishing between other forms of correlation - distinguishing between different forms of social influence

40 Questions?


Download ppt "Influence and Correlation in Social Networks Aris Anagnostopoulos Ravi Kumar Mohammad Mahdian."

Similar presentations


Ads by Google