Network Structure of Folksonomies

1 Network Structure of Folksonomies
Vito D. P. Servedio
Dipartimento di Fisica, Università di Roma "La Sapienza"
Centro Studi e Ricerche "Enrico Fermi"
TAGora: Semiotic Dynamics of Online Social Communities (EU-IST)

2 In collaboration with:
Andrea Baldassarri, Ciro Cattuto, Vittorio Loreto, Miranda Grahl, Andreas Hotho, Christoph Schmitz, Gerd Stumme

3 AGENDA
Properties of folksonomy hypergraphs
Network of tag co-occurrences
Clustering of resources
V D P Servedio ~ECCS07~

4 a folksonomy example: del.icio.us screenshot
[Screenshot: a del.icio.us post, with its resource, user and tags highlighted]

5 data structure: basic units of information
Post = ({tags}, user, resource)
TAS = tag assignment = (tag, user, resource)
Example: the post ({bookmarking, sharing, collaborative, folksonomy}, andreab, …) splits into four TAS:
(bookmarking, andreab, …), (sharing, andreab, …), (collaborative, andreab, …), (folksonomy, andreab, …)
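A minimal sketch of the post-to-TAS decomposition, assuming posts are stored as Python named tuples. The class names `Post` and `TAS` mirror the slide's notation, while `to_tas` and the resource string `"resource_x"` are hypothetical placeholders (the slide's actual resource is not recoverable).

```python
from typing import NamedTuple


class Post(NamedTuple):
    """One post: a user annotates one resource with a set of tags."""
    tags: frozenset
    user: str
    resource: str


class TAS(NamedTuple):
    """One tag assignment (tag, user, resource), i.e. a single hyperedge."""
    tag: str
    user: str
    resource: str


def to_tas(post: Post) -> list:
    """Split a post into one TAS per tag (hypothetical helper; tags sorted for a stable order)."""
    return [TAS(tag, post.user, post.resource) for tag in sorted(post.tags)]


# "resource_x" is a hypothetical placeholder for the unrecoverable resource.
post = Post(frozenset({"bookmarking", "sharing", "collaborative", "folksonomy"}),
            "andreab", "resource_x")
for tas in to_tas(post):
    print(tas)
```

A post with k tags always yields exactly k tag assignments sharing the same user and resource.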

6 folksonomy hypergraph structure
A folksonomy can be viewed as a 3-mode network: each hyperlink (tag assignment) joins one tag, one user and one resource.
[Figure: Users 1 to 4, Tags 1 to 3 and Resources 1 to 3 joined by hyperlinks]
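One simple way to hold this 3-mode structure in memory, sketched here as an assumption rather than the format used in the talk, is a list of (tag, user, resource) triples. The degree of a node is then the number of hyperedges it belongs to, counted separately for each class. The toy data and the helper `hyperdegrees` are hypothetical.

```python
from collections import Counter

# Hypothetical toy folksonomy: each hyperedge (tag assignment) joins one tag,
# one user and one resource.
hyperedges = [
    ("T1", "U1", "R1"),
    ("T2", "U1", "R1"),
    ("T2", "U2", "R2"),
    ("T3", "U2", "R3"),
]


def hyperdegrees(edges):
    """Number of hyperedges each node belongs to, computed separately per class."""
    tag_deg = Counter(t for t, _, _ in edges)
    user_deg = Counter(u for _, u, _ in edges)
    res_deg = Counter(r for _, _, r in edges)
    return tag_deg, user_deg, res_deg


tag_deg, user_deg, res_deg = hyperdegrees(hyperedges)
print(dict(tag_deg), dict(user_deg), dict(res_deg))
```

The degrees of each class sum to the number of hyperedges, since every hyperedge contains exactly one node of each class.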

7 data collection: TAGora Project (STREP FP6)
Semiotic Dynamics in Online Social Communities
Consortium: University of Roma "La Sapienza", SONY CSL Paris, University of Kassel, University of Koblenz-Landau, University of Southampton
del.icio.us (work coordinated by Uni of Kassel): collected data from Nov; over 667K users, ~19 million resources, nearly 2.5 million tags, ~140 million tag assignments, 50 GB of data
flickr (work coordinated by Uni of Koblenz): collected data 21st May 2007; ~300K users, ~25 million photos, 1.5 million tags, over 110 million tag assignments
bibsonomy: complete dataset (June 2007); ~1385 users, 37651 tags, over 149K resources

8 artificial networks: permuted and binomial
In the following slides we use some artificial networks, defined as follows.
PERMUTED: take the original folksonomy and shuffle all nodes within the same class (tags among tags, users among users, resources among resources). Example:
original                      permuted
Resource1 User1 Tag1          Resource1 User3 Tag2
Resource1 User1 Tag2          Resource2 User2 Tag3
Resource1 User1 Tag3          Resource1 User1 Tag1
Resource1 User2 Tag1          Resource1 User1 Tag2
Resource1 User2 Tag2          Resource1 User2 Tag1
Resource2 User3 Tag4          Resource1 User1 Tag4
We end up with a hypergraph in which every node has the same degree as in the original one.
BINOMIAL: same number of hyperedges; the endpoints of each hyperedge are chosen uniformly at random among the tags T, users U and resources R.
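The two null models can be sketched as below, assuming the folksonomy is a list of triples. The code stores triples as (tag, user, resource), the reverse of the column order shown in the example, and the function names `permuted` and `binomial` are hypothetical. Whether the talk discards duplicate hyperedges created by shuffling is not stated; this sketch keeps them.

```python
import random

# Tag assignments of the slide 8 example, written as (tag, user, resource).
ORIGINAL = [
    ("Tag1", "User1", "Resource1"),
    ("Tag2", "User1", "Resource1"),
    ("Tag3", "User1", "Resource1"),
    ("Tag1", "User2", "Resource1"),
    ("Tag2", "User2", "Resource1"),
    ("Tag4", "User3", "Resource2"),
]


def permuted(edges, rng):
    """PERMUTED: shuffle each class (tags, users, resources) independently.

    Each node keeps the number of hyperedges it belongs to (its degree);
    only the way nodes are combined into hyperedges changes.
    """
    tags, users, resources = (list(column) for column in zip(*edges))
    for column in (tags, users, resources):
        rng.shuffle(column)
    return list(zip(tags, users, resources))


def binomial(edges, rng):
    """BINOMIAL: same number of hyperedges, endpoints drawn uniformly from T, U, R."""
    T = sorted({t for t, _, _ in edges})
    U = sorted({u for _, u, _ in edges})
    R = sorted({r for _, _, r in edges})
    return [(rng.choice(T), rng.choice(U), rng.choice(R)) for _ in edges]


rng = random.Random(7)
perm = permuted(ORIGINAL, rng)
binom = binomial(ORIGINAL, rng)
print(perm)
print(binom)
```

In the permuted model the multiset of nodes in each column is unchanged, which is exactly why degrees are preserved; the binomial model preserves only the number of hyperedges and the node sets.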

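9 average path length (estimated)
Paths move along hyperedges: consecutive nodes on a path belong to a common hyperedge.
[Figure: a path through the hyperedges (T1, U1, R1), (T2, U1, R1), (T2, U2, R2), (T3, U2, R3), ordered in time]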
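A minimal sketch of how such an estimate could be obtained, assuming (the slide does not say) that breadth-first search is run from a random sample of source nodes. Two nodes are treated as adjacent when they share a hyperedge. The helpers `node_graph`, `bfs_distances` and `estimated_avg_path_length` are hypothetical.

```python
import random
from collections import defaultdict, deque


def node_graph(edges):
    """Two nodes are adjacent if they appear together in some hyperedge."""
    adj = defaultdict(set)
    for t, u, r in edges:
        nodes = [("tag", t), ("user", u), ("res", r)]
        for a in nodes:
            for b in nodes:
                if a != b:
                    adj[a].add(b)
    return adj


def bfs_distances(adj, source):
    """Shortest-path length (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def estimated_avg_path_length(edges, n_sources, seed=0):
    """Mean distance over pairs (s, v), s from a random sample of sources (assumed estimator)."""
    adj = node_graph(edges)
    nodes = sorted(adj)
    sources = random.Random(seed).sample(nodes, min(n_sources, len(nodes)))
    total, count = 0, 0
    for s in sources:
        for v, d in bfs_distances(adj, s).items():
            if v != s:
                total += d
                count += 1
    return total / count if count else float("nan")


# The hyperedges drawn on the slide, as (tag, user, resource).
path = [("T1", "U1", "R1"), ("T2", "U1", "R1"), ("T2", "U2", "R2"), ("T3", "U2", "R3")]
print(estimated_avg_path_length(path, n_sources=100))
```

With at least as many sources as nodes the estimate becomes exact; on a real folksonomy one would sample far fewer sources than nodes.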

10 cliquishness
A high resource cliquishness indicates that many of the users related to that resource assign overlapping sets of tags to it.
$|tu_r|$ = number of hyperlinks connected to r
$|T_r|$ = number of adjacent tags
$|U_r|$ = number of adjacent users
[Figure: resource r connected to tags t1, t2, t3 and users u1, u2, next to a second resource r2; for r, $|T_r| = 3$, $|U_r| = 2$, $|tu_r| = 3$]
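The three counts can be read directly off the list of tag assignments. The edges below are hypothetical, chosen only so that resource r reproduces the counts quoted on the slide (the figure's actual links are not recoverable), and `resource_counts` is a hypothetical helper. The sketch stops at the counts and does not assume the exact cliquishness formula.

```python
# Hypothetical tag assignments (tag, user, resource) reproducing the slide's
# counts for resource r; r2 is a second resource.
EDGES = [
    ("t1", "u1", "r"),
    ("t2", "u1", "r"),
    ("t3", "u2", "r"),
    ("t1", "u1", "r2"),
]


def resource_counts(edges, r):
    """Return (|T_r|, |U_r|, |tu_r|) for resource r."""
    tu_r = {(t, u) for t, u, res in edges if res == r}  # hyperlinks connected to r
    T_r = {t for t, _ in tu_r}                          # adjacent tags
    U_r = {u for _, u in tu_r}                          # adjacent users
    return len(T_r), len(U_r), len(tu_r)


n_tags, n_users, n_links = resource_counts(EDGES, "r")
print(n_tags, n_users, n_links)  # 3 2 3
# |tu_r| can be at most |T_r| * |U_r| (6 here), reached only when every
# user of r assigns every tag of r to it.
print(n_tags * n_users)
```

The gap between $|tu_r|$ and its upper bound $|T_r|\,|U_r|$ is what a cliquishness measure quantifies: the closer they are, the more the users' tag sets overlap.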

11 connectedness / transitivity
$|tu_r|$ = number of hyperlinks connected to r
$|\widetilde{tu}_r|$ = number of tag-user pairs from $tu_r$ that also occur with some resource other than r
[Figure: same example as the previous slide; $|tu_r| = 3$, $|\widetilde{tu}_r| = 1$]
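Counting the tag-user pairs of r that reappear on other resources is a set intersection. The edges are the same hypothetical example used for cliquishness (chosen to match the slide's counts), and `connectedness_counts` is a hypothetical helper. The final ratio $|\widetilde{tu}_r| / |tu_r|$ is an assumed normalisation, not a formula shown on the slide.

```python
# Hypothetical tag assignments (tag, user, resource) matching the slide's counts.
EDGES = [
    ("t1", "u1", "r"),
    ("t2", "u1", "r"),
    ("t3", "u2", "r"),
    ("t1", "u1", "r2"),
]


def connectedness_counts(edges, r):
    """Return (|tu_r|, |~tu_r|): r's tag-user pairs, and those also seen elsewhere."""
    tu_r = {(t, u) for t, u, res in edges if res == r}
    tu_other = {(t, u) for t, u, res in edges if res != r}
    tu_tilde = tu_r & tu_other  # pairs that also occur with another resource
    return len(tu_r), len(tu_tilde)


n_links, n_shared = connectedness_counts(EDGES, "r")
print(n_links, n_shared)  # 3 1
# Assumed normalisation: fraction of r's tag-user pairs that also appear
# with some other resource.
print(n_shared / n_links)
```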

12 AGENDA
Properties of folksonomy hypergraphs
Network of tag co-occurrences
Clustering of resources

13 networks of tag co-occurrence
Tags acquire a stronger semantic context when they co-occur with each other, e.g. {Roma, holidays, Italy} vs {Roma, football, we_won} vs {Roma, love, girls}.
Tag co-occurrences in posts define a weighted graph of tags:
weight of an edge = number of posts the two tags have in common
strength of a tag = sum of its edge weights
Can we study the "semantics" of tags? ({japan, tokyo} is more frequent than {physics, sex}; check with Google!)
Compare the statistics with those of shuffled graphs.

14 weighted network of tag co-occurrence
Two tags co-occur if they are present in the same post. More precisely, two tags t, t' co-occur with weight w if they are simultaneously present in w posts. In terms of the adjacency tensor, with $A_{tur} = 1$ if (t, u, r) is a tag assignment and 0 otherwise, this is a tensor contraction (in flat space) over users and resources: $W_{tt'} = \sum_{u,r} A_{tur} A_{t'ur}$, for $t \neq t'$.
We examine the weighted undirected network defined by W.
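The contraction $W_{tt'} = \sum_{u,r} A_{tur} A_{t'ur}$ maps directly onto `numpy.einsum`. This sketch uses a dense 0/1 tensor built from the slide 8 example, which only makes sense for toy data; a real folksonomy would need a sparse approach. `cooccurrence_matrix` is a hypothetical helper.

```python
import numpy as np

# Slide 8 example, written as (tag, user, resource) triples.
EDGES = [
    ("Tag1", "User1", "Resource1"),
    ("Tag2", "User1", "Resource1"),
    ("Tag3", "User1", "Resource1"),
    ("Tag1", "User2", "Resource1"),
    ("Tag2", "User2", "Resource1"),
    ("Tag4", "User3", "Resource2"),
]


def cooccurrence_matrix(edges):
    """Return (tags, W) with W[t, t'] = sum over u, r of A[t,u,r] * A[t',u,r]."""
    tags = sorted({t for t, _, _ in edges})
    users = sorted({u for _, u, _ in edges})
    resources = sorted({r for _, _, r in edges})
    t_idx = {t: i for i, t in enumerate(tags)}
    u_idx = {u: i for i, u in enumerate(users)}
    r_idx = {r: i for i, r in enumerate(resources)}

    # Dense 0/1 adjacency tensor: fine for a toy example, far too large for real data.
    A = np.zeros((len(tags), len(users), len(resources)), dtype=int)
    for t, u, r in edges:
        A[t_idx[t], u_idx[u], r_idx[r]] = 1

    # Contract over the user and resource indices: each post (u, r) containing
    # both t and t' adds 1 to W[t, t'].
    W = np.einsum("tur,sur->ts", A, A)
    np.fill_diagonal(W, 0)  # the diagonal would count each tag's own posts
    return tags, W


tags, W = cooccurrence_matrix(EDGES)
print(tags)
print(W)
```

Here Tag1 and Tag2 appear together in the posts of both User1 and User2 on Resource1, so their weight is 2, while Tag4 co-occurs with nothing.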

15 strength cumulative distribution
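Strength of node i: $s_i = \sum_j W_{ij}$
[Figure: cumulative strength distribution P(s) for the original and the tag-shuffled network, with a spam-related feature marked]
Tag-shuffled example (the tags of the slide 8 example are reshuffled among tag assignments; users and resources stay in place):
Resource1 User1 Tag2
Resource1 User1 Tag3
Resource1 User1 Tag1
Resource1 User2 Tag4
Resource1 User2 Tag1
Resource2 User3 Tag2
The tag reshuffling procedure changes P(s) very little: the strength is related to the frequency of tags, not to their semantics.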
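A small sketch of the strength computation, the tag reshuffling and the cumulative distribution $P(s' \geq s)$, assuming the triple layout (tag, user, resource). The helpers `strengths`, `tag_shuffled` and `cumulative_distribution` are hypothetical. Note one side effect of shuffling: a post may receive the same tag twice, and here it is counted once.

```python
import random
from collections import Counter, defaultdict
from itertools import combinations

# Slide 8 example and the tag-shuffled version on this slide, as (tag, user, resource).
ORIGINAL = [
    ("Tag1", "User1", "Resource1"), ("Tag2", "User1", "Resource1"),
    ("Tag3", "User1", "Resource1"), ("Tag1", "User2", "Resource1"),
    ("Tag2", "User2", "Resource1"), ("Tag4", "User3", "Resource2"),
]
SHUFFLED = [
    ("Tag2", "User1", "Resource1"), ("Tag3", "User1", "Resource1"),
    ("Tag1", "User1", "Resource1"), ("Tag4", "User2", "Resource1"),
    ("Tag1", "User2", "Resource1"), ("Tag2", "User3", "Resource2"),
]


def strengths(edges):
    """s_t = sum over t' of W[t, t'], W[t, t'] = posts (u, r) containing both tags."""
    posts = defaultdict(set)
    for t, u, r in edges:
        posts[(u, r)].add(t)  # a repeated tag in one post counts once
    s = Counter({t: 0 for t, _, _ in edges})
    for post_tags in posts.values():
        for a, b in combinations(sorted(post_tags), 2):
            s[a] += 1  # the pair adds 1 to W[a, b] and 1 to W[b, a]
            s[b] += 1
    return s


def tag_shuffled(edges, rng):
    """Randomly permute the tag column; users and resources stay in place."""
    tags = [t for t, _, _ in edges]
    rng.shuffle(tags)
    return [(t, u, r) for t, (_, u, r) in zip(tags, edges)]


def cumulative_distribution(values):
    """Return [(x, fraction of values >= x)] for every distinct value x."""
    n = len(values)
    return [(x, sum(v >= x for v in values) / n) for x in sorted(set(values))]


print(cumulative_distribution(list(strengths(ORIGINAL).values())))
print(cumulative_distribution(list(strengths(SHUFFLED).values())))
random_version = tag_shuffled(ORIGINAL, random.Random(1))
```

On this tiny example the total strength is 8 in both cases; on real data the point of the comparison is that the whole distribution barely moves.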

16 Average neighbour strength
Examine strength correlations between neighbours:
positive correlation: assortative mixing (e.g. social networks)
negative correlation: disassortative mixing (e.g. technological networks)
Goals: look for spam infection; reveal semantics by comparing with the shuffled graph.
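The slide does not give its exact formula, so this sketch assumes the plain mean of the neighbours' strengths, $s_{nn}(t) = \frac{1}{k_t} \sum_{t' \sim t} s_{t'}$ (a strength-weighted mean is a common alternative). Plotting $s_{nn}$ against $s$ then shows assortative mixing as an increasing trend and disassortative mixing as a decreasing one. Helper names are hypothetical; tags with no co-occurrences are left out.

```python
from collections import defaultdict
from itertools import combinations

# Slide 8 example, as (tag, user, resource).
EDGES = [
    ("Tag1", "User1", "Resource1"), ("Tag2", "User1", "Resource1"),
    ("Tag3", "User1", "Resource1"), ("Tag1", "User2", "Resource1"),
    ("Tag2", "User2", "Resource1"), ("Tag4", "User3", "Resource2"),
]


def cooccurrence_weights(edges):
    """W[(a, b)] = number of posts (u, r) in which tags a and b both appear."""
    posts = defaultdict(set)
    for t, u, r in edges:
        posts[(u, r)].add(t)
    W = defaultdict(int)
    for post_tags in posts.values():
        for a, b in combinations(sorted(post_tags), 2):
            W[(a, b)] += 1
            W[(b, a)] += 1
    return W


def average_neighbour_strength(W):
    """Return (s, s_nn): each tag's strength and the plain mean strength of its neighbours."""
    neighbours = defaultdict(set)
    s = defaultdict(int)
    for (a, b), w in W.items():
        neighbours[a].add(b)
        s[a] += w
    s_nn = {t: sum(s[n] for n in neighbours[t]) / len(neighbours[t]) for t in neighbours}
    return dict(s), s_nn


s, s_nn = average_neighbour_strength(cooccurrence_weights(EDGES))
for t in sorted(s_nn):
    print(t, s[t], s_nn[t])
```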

17 average neighbour strength
[Figure: scatter plot of average neighbour strength vs strength, with spam regions marked]
Tags introduced by spamming cluster together. Shuffling the graph changes this measure, so correlations related to semantics correspond to a region of the plot.

18 AGENDA
Properties of folksonomy hypergraphs
Network of tag co-occurrences
Clustering of resources

19 clustering and community detection
Folksonomies are complex tripartite networks (tag, user, resource). Cluster detection can reveal:
subsets of users (social communities)
subsets of tags (semantic frames, jargons…)
subsets of resources (social classification)
other structures…
Here we focus on clustering resources using only tag assignments.

20 resource similarity network
Weighted network: how should the weights be chosen? How can tag frequency be taken into account?

21 tag clouds for resources
Each resource is characterised by a tag cloud: tags are assigned by users and appear with different frequencies.

22 TF/IDF-like weighting procedure
Similarity metrics based on tag frequencies: the global frequency of each tag and its relative frequencies.
STATEMENT: resources sharing "rare" tags are closely related.
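The exact weights of the talk are not recoverable, so this sketch swaps in standard TF-IDF with cosine similarity: relative tag frequency within a resource's tag cloud times $\log(N / n_t)$, where $N$ is the number of resources and $n_t$ the number of resources carrying tag t. It reproduces the slide's statement that sharing rare tags matters more than sharing common ones. The data and all helper names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical tag assignments (tag, user, resource), for illustration only.
EDGES = [
    ("design", "u1", "r1"), ("typography", "u1", "r1"),
    ("design", "u2", "r2"), ("typography", "u2", "r2"),
    ("design", "u3", "r3"), ("politics", "u3", "r3"),
    ("design", "u4", "r4"), ("politics", "u4", "r4"),
]


def tag_clouds(edges):
    """Tag cloud of each resource: how many times each tag was assigned to it."""
    clouds = defaultdict(Counter)
    for t, _, r in edges:
        clouds[r][t] += 1
    return clouds


def tfidf_vectors(clouds):
    """Relative tag frequency in the resource times log(#resources / #resources with the tag)."""
    n_resources = len(clouds)
    df = Counter(t for cloud in clouds.values() for t in cloud)
    vectors = {}
    for r, cloud in clouds.items():
        total = sum(cloud.values())
        vectors[r] = {t: (n / total) * math.log(n_resources / df[t]) for t, n in cloud.items()}
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vectors = tfidf_vectors(tag_clouds(EDGES))
print(cosine(vectors["r1"], vectors["r2"]))  # share the rarer tag "typography"
print(cosine(vectors["r1"], vectors["r3"]))  # share only "design", used on every resource
```

A tag present on every resource gets zero weight, so it contributes nothing to similarity; this is the IDF-like behaviour the slide's statement describes.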

23 case study
Sample of 400 resources: 200 resources tagged with "design" and 200 resources tagged with "politics".
Does the similarity network show two clusters? Is there a finer structure, with subclusters?

24 similarity matrix
Similarity strengths vary broadly on a logarithmic scale. [Figure: distribution P(w) of the similarity weights w]
A small power (0.1) is used as an effective way to deal with vanishing weights: the matrix used is $W = \{w'\}$, with $w' = w^{0.1}$.
TASK: find row and column permutations that uncover a block structure.
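The transform $w' = w^{0.1}$ is a one-line element-wise operation. This sketch (hypothetical helper `compress_weights`) shows how it keeps zeros at zero while pulling weights spread over many decades into a narrow range.

```python
import numpy as np


def compress_weights(W, power=0.1):
    """Element-wise w' = w**power: zeros stay zero, tiny weights are lifted."""
    return np.power(np.asarray(W, dtype=float), power)


w = np.array([0.0, 1e-8, 1e-4, 1.0])
print(compress_weights(w))
```

A ratio of $10^8$ between the smallest and largest nonzero weights becomes $10^{0.8} \approx 6.3$, so vanishing similarities no longer disappear next to the strong ones, while their order is preserved.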

25 spectral analysis
[Figure: eigenvalues of the "Laplacian" matrix Q, with the first non-trivial eigenvalues highlighted]
A. Capocci, V. D. P. Servedio, G. Caldarelli and F. Colaiori, Physica A 352, 669 (2005), and many others.
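A sketch of the spectral step, assuming Q is the row-normalised matrix $Q = D^{-1} W$ with $D$ the diagonal matrix of strengths (whether this is exactly the slide's Q is an assumption). Its largest eigenvalue is the trivial 1; eigenvalues close to 1 signal weakly connected groups. The similarity matrix is hypothetical and `spectrum` is a hypothetical helper.

```python
import numpy as np


def spectrum(W):
    """Eigenvalues (descending) and right eigenvectors of Q = D^{-1} W (assumed form of Q).

    Q is similar to the symmetric S = D^{-1/2} W D^{-1/2}, so S is diagonalised
    with eigh and its eigenvectors are mapped back with D^{-1/2}.
    Every node needs a positive strength (no isolated nodes).
    """
    W = np.asarray(W, dtype=float)
    inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = inv_sqrt[:, None] * W * inv_sqrt[None, :]
    vals, X = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    return vals[order], inv_sqrt[:, None] * X[:, order]


# Hypothetical similarity matrix: two groups of 3 resources, strong links inside
# each group (1.0) and weak links between them (0.01).
group = np.array([0, 0, 0, 1, 1, 1])
W = np.where(group[:, None] == group[None, :], 1.0, 0.01)
np.fill_diagonal(W, 0.0)

vals, V = spectrum(W)
print(np.round(vals, 3))  # 1 (trivial), then one eigenvalue close to 1
print(np.sign(V[:, 1]))   # the second eigenvector separates the two groups
```

Diagonalising the symmetric matrix S instead of Q guarantees real eigenvalues and numerically stable eigenvectors.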

26 cluster identification
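Correlation of homologous components reveals the cluster structure. Take the eigenvectors $V_2 = \{v_{2,1}, v_{2,2}, \dots, v_{2,n}\}$, $V_3 = \{v_{3,1}, v_{3,2}, \dots, v_{3,n}\}$, $V_4 = \{v_{4,1}, v_{4,2}, \dots, v_{4,n}\}$ and represent each resource i by its components $[v_{2,i}; v_{3,i}; v_{4,i}]$.
[Figure: reordered similarity matrix, showing the politics and design blocks and sub-clusters numbered 2, 4, 3, 1]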
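A sketch of the cluster-identification step under stated assumptions: Q is taken as $D^{-1} W$, the correlation between resources i and j is the uncentred correlation (cosine) of their vectors $[v_{2,i}; v_{3,i}; v_{4,i}]$, and clusters are groups of resources whose correlation exceeds a hypothetical threshold of 0.9. The data (4 hidden groups presented in shuffled order) and all helper names are hypothetical.

```python
import numpy as np


def nontrivial_eigenvectors(W, k):
    """Eigenvectors 2..k+1 of Q = D^{-1} W (assumed form of Q), by decreasing eigenvalue."""
    W = np.asarray(W, dtype=float)
    inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    vals, X = np.linalg.eigh(inv_sqrt[:, None] * W * inv_sqrt[None, :])
    order = np.argsort(vals)[::-1]
    V = inv_sqrt[:, None] * X[:, order]
    return V[:, 1:k + 1]  # drop the trivial eigenvector (eigenvalue 1)


def component_correlation(coords):
    """Uncentred correlation (cosine) between the rows [v2_i, v3_i, v4_i]."""
    unit = coords / np.linalg.norm(coords, axis=1, keepdims=True)
    return unit @ unit.T


def clusters(C, threshold=0.9):
    """Label connected groups of resources whose correlation exceeds threshold."""
    n = len(C)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and C[i, j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Hypothetical data: 4 groups of 3 resources (strong links inside groups, weak
# links across), presented in a shuffled order.
block = np.repeat(np.arange(4), 3)
W = np.where(block[:, None] == block[None, :], 1.0, 0.01)
np.fill_diagonal(W, 0.0)
perm = np.random.default_rng(0).permutation(len(block))
W_shuffled = W[perm][:, perm]

coords = nontrivial_eigenvectors(W_shuffled, k=3)  # columns: V2, V3, V4
labels = clusters(component_correlation(coords))
order = np.argsort(labels, kind="stable")
W_reordered = W_shuffled[order][:, order]  # block structure restored
print(labels)
```

Resources in the same group get identical components, hence correlation 1, while resources in different groups end up negatively correlated; sorting by label is one simple choice of the row and column permutation that uncovers the blocks.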

27 cooperative classification
Tag clouds of the six identified clusters of resources, including: "humor" in politics, news in politics, visual design, web design.

28 Conclusions and outlooks
Folksonomies are the way people are building the information and communication systems of our future.
Folksonomies are a laboratory for studying human, social and semiotic dynamics.
A folksonomy is a growing tripartite network whose nodes are users, resources and metadata (tags), while (hyper)links are annotation events (note that this structure is similar to that of search queries: user, search string, resource retrieved).
The statistical structure of folksonomies reveals many complex features typical of interacting humans.
Projections of a folksonomy onto different spaces can be useful to study: spam infection; semantics of tags; emerging resource classification.

