
Reti complesse 2011/2012 Social media analysis: From raw data to services.




1 Reti complesse 2011/2012 Social media analysis: From raw data to services

2 WORKFLOW Data collection Static data analysis Dynamics (homophily and influence) Prediction Social media website Services Why are social links formed? To what extent are users influenced by each other? Can we predict the evolution of the network?

3 WORKFLOW Data collection Static data analysis Dynamics (homophily and influence) Prediction Social media website Services

4 ANOBII PAGE OVERVIEW

5 MODELING THE SOCIAL NETWORK Friendship Neighborhood Communication Social ID, gender, books, groups, …

6 DATA COLLECTION
Diagram labels: Crawlers, Scraper, Storage, Get, Web
Crawling strategy: BFS, or "snowball sampling"
Python standard libraries: urllib, urllib2, cookielib, threading (Python 2 names; in Python 3 the equivalents live in urllib.request and http.cookiejar)
Python igraph to load the graph from file: g = igraph.read(networkfile)
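The BFS / snowball crawl named on this slide can be sketched as follows. This is a minimal illustration, not the crawler used for the study: `fetch_neighbors` is a hypothetical callable standing in for the download-and-scrape step, and the toy dictionary replaces the real website so the example runs offline.

```python
from collections import deque

def snowball_sample(seed, fetch_neighbors, max_users=1000):
    """Breadth-first crawl starting from `seed`.

    `fetch_neighbors(user)` is a hypothetical callable returning the users
    linked from `user`; a real crawler would download and scrape the
    profile page here.
    """
    visited = {seed}
    queue = deque([seed])
    edges = []
    while queue:
        user = queue.popleft()
        for neighbor in fetch_neighbors(user):
            edges.append((user, neighbor))
            # Enqueue unseen users until the size cap is reached.
            if neighbor not in visited and len(visited) < max_users:
                visited.add(neighbor)
                queue.append(neighbor)
    return visited, edges

# Toy stand-in for the website: each user maps to the contacts she lists.
toy_site = {"a": ["b", "c"], "b": ["a", "d"], "c": [], "d": ["a"]}
users, edges = snowball_sample("a", lambda u: toy_site.get(u, []))
```

The collected edge list can then be written to a file and loaded with igraph as shown on the slide.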

7 WORKFLOW Data collection Static data analysis Dynamics (homophily and influence) Prediction Social media website Services

8 BASIC STATISTICS
Python igraph: g.diameter(), g.average_path_length(directed=True), g.components(mode="strong") or g.components(mode="weak"), g.density(loops=False), g.reciprocity(), …

                 Friendship   Neighborhood   Social       Communication
#Nodes           126,858      77,356         140,686      80,303
#Edges           557,258      633,635        1,187,650    574,281
#Loops           0            0              0            22,579
Reciprocation    0.60         0.43           0.54         0.61
⟨k_out⟩          4.4          8.2            8.4          7.2
GWCC size        121,143      76,760         140,686      75,965
GSCC size        81,292       41,063         100,492      38,336
Density          3.4 · 10^-5  1.1 · 10^-4    6.0 · 10^-5  8.9 · 10^-5
⟨SPL⟩            7.3          4.7            5.3          4.8
Diameter         25           15             20           17

GWCC / GSCC = giant weakly / strongly connected component; ⟨SPL⟩ = average shortest path length.

SO WHAT? Take your time, look carefully and compare…
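To make some of the table rows concrete, here is a small pure-Python sketch of mean out-degree, density and reciprocity computed from a directed edge list. The definitions are assumptions (self-loops dropped, reciprocity taken as the fraction of edges whose reverse edge also exists); igraph's own methods listed on the slide are the ones to use on real data.

```python
def basic_stats(n_nodes, edges):
    """Mean out-degree, density and reciprocity of a directed graph."""
    edge_set = {(u, v) for u, v in edges if u != v}  # drop self-loops
    m = len(edge_set)
    # An edge counts as reciprocated if its reverse is also present.
    reciprocated = sum(1 for u, v in edge_set if (v, u) in edge_set)
    return {
        "mean_out_degree": m / n_nodes,
        "density": m / (n_nodes * (n_nodes - 1)),
        "reciprocity": reciprocated / m if m else 0.0,
    }

# Toy graph: 4 nodes, one reciprocated pair (0 <-> 1) and two one-way links.
toy_edges = [(0, 1), (1, 0), (1, 2), (2, 3)]
stats = basic_stats(4, toy_edges)
```

As a sanity check on the table, the Friendship column gives 557,258 / 126,858 ≈ 4.4, which matches the reported mean out-degree.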

9 CLUSTERING AND FIRST VISUALIZATION Visualization with Gephi 0.8 (plus some post-editing). Gephi can import several graph formats, including simple CSV gephi.org

10 DISTRIBUTIONS LEGEND g=groups b=books w=wishlist r=reviews s=ratings t=tags a=annotations Broad behavior Python igraph: g.degree_distribution() XMGrace, a very useful plotting tool: http://plasma-gate.weizmann.ac.il/Grace

11 CORRELATIONS Are the different activities of a user correlated with each other? Strong correlations emerge. (Plot axes: Activity1 vs. Activity2)

12 MIXING PATTERNS General assortative behavior Disassortative trend for some particular ranges and features Is the activity of a user correlated with the activity of her neighbors?
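One simple way to quantify the mixing question is the Pearson correlation, taken over all edges, between the activity of the source user and the activity of the target user. The sketch below assumes a single numeric activity per user (for example, number of books); the names and toy values are illustrative only, and the slides may have used a different measure.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def edge_assortativity(edges, activity):
    """Correlate the activity at the two ends of every edge."""
    xs = [activity[u] for u, v in edges]
    ys = [activity[v] for u, v in edges]
    return pearson(xs, ys)

# Toy data: active users link to active users, inactive to inactive.
activity = {"a": 10, "b": 12, "c": 1, "d": 2}
mix_edges = [("a", "b"), ("b", "a"), ("c", "d"), ("d", "c")]
r = edge_assortativity(mix_edges, activity)  # positive => assortative
```

A value near +1 indicates assortative mixing, a value near -1 disassortative mixing.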

13 TOPICAL OVERLAP Does similarity between user profiles depend on the social distance? Is the statistical correlation due to assortative biases? Use a null model to discern real overlap from purely statistical effects. No topical overlap other than that caused by statistical mixing patterns.

14 TOPICAL OVERLAP Do "interaction" ties imply higher similarity? (Slightly) stronger similarity in the interaction network.

15 GEOGRAPHIC OVERLAP What about local overlap of the "geographic" features? Null model test with random link rewiring: g.rewire()
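The null-model idea can be illustrated without igraph: randomize the links while keeping every node's in- and out-degree, then compare the overlap measured on real links with the overlap on rewired links. This is a sketch under assumptions: overlap is taken as the number of shared items, and the toy data is invented for illustration.

```python
import random

def rewire(edges, n_swaps, rng):
    """Degree-preserving randomization: swap the targets of two random edges."""
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        (a, b), (c, d) = edges[i], edges[j]
        new1, new2 = (a, d), (c, b)
        # Reject swaps that would create self-loops or duplicate edges.
        if a == d or c == b or new1 in present or new2 in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {new1, new2}
        edges[i], edges[j] = new1, new2
    return edges

def mean_overlap(edges, items):
    """Average number of items shared by the two endpoints of an edge."""
    return sum(len(items[u] & items[v]) for u, v in edges) / len(edges)

# Toy data: two communities with disjoint tastes, links only inside each.
items = {0: {"x", "y"}, 1: {"x", "y"}, 2: {"x", "y"},
         3: {"p", "q"}, 4: {"p", "q"}, 5: {"p", "q"}}
toy_edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]
null_edges = rewire(toy_edges, n_swaps=100, rng=random.Random(42))
real, null = mean_overlap(toy_edges, items), mean_overlap(null_edges, items)
```

If the real overlap is clearly above the null value, the similarity between linked users is not just a by-product of the degree structure.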

16 SUMMARY OF FINDINGS
- Two-core network (language barriers)
- Friendship & neighborhood used slightly differently
- High reciprocation
- Broad activity behavior
- Assortative mixing patterns
- Correlations between different activities
- People residing closer in the network are more similar, on average
- Communication determines stronger ties

17 WORKFLOW Data collection Static data analysis Dynamics (homophily and influence) Prediction Social media website Services

18 DYADIC CENSUS AND TRIANGLE CLOSURE
New edges can be classified as: Direct 75%, Reciprocated 20%, Bidirectional 25%, Closure 30%, Double closure 10%
Python igraph: g = Graph.Erdos_Renyi(100, 0.2, directed=True); dc = g.dyad_census(); tc = g.triad_census()
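A rough sketch of how a newly created edge could be labeled against the network at the previous time step. The two checks below are simplified assumptions, not the exact class definitions behind the percentages on the slide: "reciprocates" means the reverse link already existed, and "closes_triangle" means the two users already shared at least one neighbor (direction ignored).

```python
def classify_new_edge(old_edges, u, v):
    """Label a new edge u -> v relative to the edges present at time t."""
    old = set(old_edges)
    # Undirected neighborhoods, used only for the triangle check.
    neighbors = {}
    for a, b in old:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    common = (neighbors.get(u, set()) & neighbors.get(v, set())) - {u, v}
    return {
        "reciprocates": (v, u) in old,
        "closes_triangle": bool(common),
    }

# Toy network at time t: a -> b -> c -> a.
old_edges = [("a", "b"), ("b", "c"), ("c", "a")]
label_ac = classify_new_edge(old_edges, "a", "c")  # reverse c -> a exists, b is shared
label_ad = classify_new_edge(old_edges, "a", "d")  # d is a newcomer
```

Counting these labels over all edges created between two snapshots gives the kind of breakdown reported on the slide.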

19 DYNAMICS TO EXPLORE CAUSES Explore the causality relationship between profile similarity and link creation using the time dimension. Topical overlap might be caused by: 1. Homophily 2. Influence 3. Both

20 SIMILARITY → LINK CREATION

               ⟨n_cb⟩   σ_b    ⟨n_cg⟩   σ_g
d_uv = 2       9.5      0.02   1.12     0.05
u → v          12.9     0.04   1.10     0.08
u ↔ v          18.5     0.04   1.67     0.11
Closure        18.2     0.04   1.81     0.10
Dbl closure    23.4     0.05   1.20     0.12

Average similarity of pairs forming new links between t and t+1, compared with the average similarity of all the pairs at distance 2 at time t. Pairs that are going to get connected show a substantially higher similarity.

21 LINK CREATION → SIMILARITY Evolution of the similarity between pairs linking together at different times. (Plot panels: Groups, Books)

22 INFLUENCE AS "BOOK CONTAGION"
Figure: two panels (TIME = 0 and TIME = 1) with nodes A, B, C; legend: Susceptible, Infected, Social tie
K_b(A) = 2, F_b(A) = 1
K_b(B) = 0, F_b(B) = 0
K_b(C) = 3, F_b(C) = 0.75
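The example values on this slide are consistent with reading K_b(u) as the number of u's neighbors who already have book b, and F_b(u) as the corresponding fraction of neighbors. Under that assumption, and with a hypothetical toy graph built to reproduce the slide's numbers, the two quantities can be computed as:

```python
def exposure(user, book, neighbors, library):
    """Return (K_b, F_b): count and fraction of neighbors owning `book`."""
    nbrs = neighbors.get(user, set())
    k = sum(1 for n in nbrs if book in library.get(n, set()))
    f = k / len(nbrs) if nbrs else 0.0
    return k, f

# Hypothetical toy graph: x1, x2, x3 are "infected" (own book "b").
neighbors = {
    "A": {"x1", "x2"},
    "B": {"y1"},
    "C": {"x1", "x2", "x3", "y1"},
}
library = {"x1": {"b"}, "x2": {"b"}, "x3": {"b"}, "y1": set()}

k_a, f_a = exposure("A", "b", neighbors, library)  # (2, 1.0)
k_b, f_b = exposure("B", "b", neighbors, library)  # (0, 0.0)
k_c, f_c = exposure("C", "b", neighbors, library)  # (3, 0.75)
```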

23 ADOPTERS vs NON-ADOPTERS Adopters are more likely to have a higher number/fraction of neighbors with the book. At fixed out-degree, adopters have on average many more neighbors with the book than non-adopters.

24 INFLUENCE IS STRONGER WHEN INTERACTING P_a = (#adopters with K_b) / (#users with K_b)
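The ratio on this slide can be tabulated per exposure level. The sketch assumes the data has already been reduced to one observation per (user, book) pair, holding the number of neighbors with the book and whether the user adopted it afterwards; the observations below are made up.

```python
from collections import Counter

def adoption_probability(observations):
    """Map each exposure level K to (#adopters with K) / (#users with K)."""
    users = Counter()
    adopters = Counter()
    for k, adopted in observations:
        users[k] += 1
        if adopted:
            adopters[k] += 1
    return {k: adopters[k] / users[k] for k in sorted(users)}

# Made-up observations: (neighbors with the book, adopted afterwards?)
obs = [(0, False), (0, False), (0, True), (0, False),
       (1, True), (1, False),
       (2, True), (2, True)]
p_a = adoption_probability(obs)  # {0: 0.25, 1: 0.5, 2: 1.0}
```

Computing this curve separately for each network (for example friendship versus communication) is one way to compare how strongly exposure relates to adoption.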

25 SUMMARY OF FINDINGS
- Link creation is driven by: balance (triangle closure), reciprocation, homophily, proximity
- Link creation triggers a boost in similarity
- Influence spreads along social ties

26 WORKFLOW Data collection Static data analysis Dynamics (homophily and influence) Prediction Social media website Services

27 FRIENDSHIP RECOMMENDATION SERVICE Predicting the creation of new links, and so anticipating the actions of users, can be used for contact recommendation:
1. Focus on a user u
2. Compute some similarity between u and all* the other users
3. Rank the users according to their similarity
4. Recommend the top N
* Computational constraints apply…
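The four steps above can be sketched with the simplest similarity, the number of common neighbors. To respect the computational constraint, candidates are limited to friends-of-friends instead of all users. Both choices are assumptions made for illustration, not necessarily the features used in the study.

```python
from collections import Counter

def recommend(user, neighbors, top_n=3):
    """Rank friends-of-friends of `user` by number of common neighbors."""
    friends = neighbors.get(user, set())
    scores = Counter()
    for friend in friends:
        for candidate in neighbors.get(friend, set()):
            # Skip the user herself and people she already follows.
            if candidate != user and candidate not in friends:
                scores[candidate] += 1
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [candidate for candidate, _ in ranked[:top_n]]

# Toy network: u follows a and b; x is reachable through both, y through a only.
rec_neighbors = {"u": {"a", "b"}, "a": {"u", "x", "y"}, "b": {"u", "x"}}
suggestions = recommend("u", rec_neighbors, top_n=2)  # ['x', 'y']
```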

28 FRIENDSHIP RECOMMENDATION Precision at N Many features can be used (common neighbors, reciprocity, similarity of profile features, etc.) Features can be profitably combined with classifiers ( http://www.cs.waikato.ac.nz/ml/weka )
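Precision at N is the fraction of the top N recommendations that turn out to be correct. A minimal sketch, assuming the ground truth is the set of contacts the user actually added in the following time window:

```python
def precision_at_n(recommended, actual_new_links, n):
    """Fraction of the first n recommendations found in the ground truth."""
    if n <= 0:
        return 0.0
    hits = sum(1 for candidate in recommended[:n] if candidate in actual_new_links)
    return hits / n

ranked = ["x", "y", "z"]   # recommender output, best first
added = {"x", "z"}         # links the user really created afterwards
p2 = precision_at_n(ranked, added, 2)  # 1 hit out of 2 -> 0.5
p3 = precision_at_n(ranked, added, 3)  # 2 hits out of 3
```

Averaging this value over many users gives a single score for comparing features or classifiers.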

29 NOTE ON METHODOLOGY Compare your results with other real networks or ad-hoc null models Quantitative and qualitative analysis Keep in mind that correlation != causation Plots help lots!

30 REFERENCES
L. M. Aiello, A. Barrat, R. Schifanella, C. Cattuto, B. Markines, F. Menczer. "Friendship prediction and homophily in social media." ACM Transactions on the Web (TWEB). To appear. www.di.unito.it/~aiello
L. M. Aiello, A. Barrat, C. Cattuto, G. Ruffo, R. Schifanella. "Link creation and profile alignment in the aNobii social network." In SocialCom'10: Proceedings of the 2nd IEEE International Conference on Social Computing, Minneapolis, MN, USA, 2010.
Ask for more references! aiello@di.unito.it


