The Complex Dynamics of Collaborative Tagging Harry Halpin University of Edinburgh Valentin Robu CWI, Netherlands Hana Shepherd Princeton University WWW.

1 The Complex Dynamics of Collaborative Tagging Harry Halpin University of Edinburgh Valentin Robu CWI, Netherlands Hana Shepherd Princeton University WWW 2007

2 Introduction
A central concern remains: how should metadata for web resources be generated?
– The concern is with both efficiency and efficacy
Social bookmarking
– An increasingly influential web application
– del.icio.us, Flickr, Furl, Rojo, Connotea, Technorati, etc.
Folksonomies vs. ontologies
– Categorization (tagging) by unsupervised users vs. classification by formal ontologies defined by experts
– Multiple categories per resource vs. exactly one class

3 Benefits and drawbacks of collaborative tagging
Benefits
– Higher malleability and adaptability ("users do not have to agree on a hierarchy of tags or detailed taxonomy")
– Enables retrieving and sharing data more efficiently
Drawbacks
– Ambiguity in the meaning of tags
– The use of synonyms creates informational redundancy
– The central concern: does the system become relatively stable with time and use?
The most problematic claim for tagging systems:
Because users are not under a centralized controlling vocabulary, no coherent categorization scheme can emerge at all from collaborative tagging.

4 The Dynamics of Tagging
Tag distribution
– The collection of all tags and their frequencies, ordered by rank frequency, for a given resource
Features of complex systems
– A large number of users
– A lack of central coordination
– Non-linear dynamics
Two important features of collaborative tagging systems
– Imitation of others
– Shared knowledge

5 The Tripartite Structure of Tagging
Figure: tripartite graph structure of a tagging system. An edge linking a user, a tag and a resource (website) represents one tagging instance.
Tags provide the link between the users and the resources (search → tagging [feedback])

6 A Generative Model
Preferential attachment
– Known popularly as the "rich get richer" model
– P(a) = the probability of a user committing a tagging action
– P(o) = the probability that an "old tag" is reinforced
– If an old tag x is added, it happens with the probability
Preferential attachment does not explain why a particular new tag is added.
– In practice, a new tag may be added that uncovers an informational dimension not captured by older tags.
– Information value: the information conveyed by the tag
Linear combination:
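
The rich-get-richer step can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' model: the transcript does not preserve the slide's probability formula, so the code assumes the usual rule that an old tag is chosen in proportion to how often it has already been used. It also treats every step as a tagging action (P(a) is left out) and ignores information value. The function name `simulate_tagging` and the parameter `p_old` (standing in for P(o)) are hypothetical.

```python
import random
from collections import Counter


def simulate_tagging(steps, p_old, seed=0):
    """Simulate tag choices for one resource under a rich-get-richer rule.

    Assumption: with probability p_old an existing tag is reinforced,
    chosen in proportion to its current count; otherwise a new tag is added.
    """
    rng = random.Random(seed)
    counts = Counter({"tag0": 1})  # start with a single tag used once
    for _ in range(steps):
        if rng.random() < p_old:
            # Reinforce an old tag, weighted by how often it was used so far.
            tags = list(counts)
            weights = [counts[t] for t in tags]
            chosen = rng.choices(tags, weights=weights, k=1)[0]
        else:
            # Introduce a brand-new tag.
            chosen = f"tag{len(counts)}"
        counts[chosen] += 1
    return counts


counts = simulate_tagging(steps=5000, p_old=0.9)
# Frequencies ordered by rank, most used tag first.
ranked = [n for _, n in counts.most_common()]
```

Plotting `ranked` against rank on log-log axes is one way to check whether the simulated distribution looks heavy-tailed.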

7 An Example of Preferential Attachment
Figure: an example of how shuffling leads to preferential attachment.
This process produces a power law distribution.

8 Abstract Example of Information Value
$I(t_1)=1$, $I(t_3)=0$, $I(t_2) > I(t_4)$, $I(t_2,t_4)=1$, $I(t_1,t_5)=0$ (not additive)
Following Zipf's famous "Principle of Least Effort", users presumably minimize the number of tags used.

9 Empirical Study
Data set
– 500 sites from the "Popular" section of del.icio.us: mean 2074.8 users, standard deviation 92.9
– 500 sites from the "Recent" section: mean 286.1 users, standard deviation 18.2
Power law distribution
$y = c x^{\alpha} \;\Rightarrow\; \log y = \alpha \log x + \log c$
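
The log-log form above means the exponent α can be estimated with an ordinary least-squares line fit. The sketch below shows the idea on synthetic data; the function name `fit_power_law` and the example frequencies are illustrative assumptions, and the slides do not say which fitting procedure or logarithm base the authors used (the slope α does not depend on the base).

```python
import math


def fit_power_law(frequencies):
    """Fit y = c * x**alpha to rank-ordered frequencies.

    Uses least squares on the log-log form log y = alpha * log x + log c,
    where x is the rank (1 = most frequent tag) and y is the frequency.
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx                 # slope of the fitted line
    log_c = mean_y - alpha * mean_x   # intercept of the fitted line
    return alpha, math.exp(log_c)


# Synthetic check: 25 frequencies generated from an exact power law.
freqs = [1000 * rank ** -1.2 for rank in range(1, 26)]
alpha, c = fit_power_law(freqs)
```

Because the synthetic frequencies follow an exact power law, the fit recovers α = -1.2 and c = 1000 up to floating-point rounding; real tag counts would scatter around the line.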

10 Power Law Regression for Popular Sites
Figure: frequency of tag usage, based on relative position (the 25 most frequently used tags)
Average α = -1.22, standard deviation ±0.03

11 Empirical Results for Popular Sites
Figure: cumulative frequency of tag use, based on relative position
Positions seven to ten show a considerably sharper drop

12 Regression Results for Less Popular Sites
Average α = -3.9, standard deviation ±4.63

13 The Dynamics of Tag Distributions
Study how the shape of these distributions forms over time from the tagging actions of individual users
Kullback-Leibler divergence (relative entropy)
Two complementary ways to detect whether or not a distribution has converged to a steady state
– Take the relative entropy between every two consecutive points in time of the distribution
– Take the relative entropy of the tag distribution for each time point with respect to the final tag distribution
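
Both convergence checks can be sketched in a few lines. The code below is a hedged illustration, not the authors' procedure: the tag events are made up, each event is treated as one time step, zero counts are smoothed with a small `eps`, and the direction of the divergence (newer distribution relative to older, and each snapshot relative to the final one) is an assumption. The helper names `kl_divergence` and `distribution` are hypothetical.

```python
import math
from collections import Counter


def kl_divergence(p, q):
    """Relative entropy D(P || Q) = sum_x P(x) * log(P(x) / Q(x)), in nats."""
    return sum(px * math.log(px / q[tag]) for tag, px in p.items() if px > 0)


def distribution(counts, vocabulary, eps=1e-9):
    """Normalize counts over a shared vocabulary.

    Assumption: unseen tags get a small eps so the divergence stays finite.
    """
    total = sum(counts.get(t, 0) + eps for t in vocabulary)
    return {t: (counts.get(t, 0) + eps) / total for t in vocabulary}


# Made-up sequence of tagging events for a single resource.
events = ["web", "design", "web", "css", "web", "design", "web", "web"]
vocab = sorted(set(events))

# Cumulative tag distribution after each event.
snapshots = []
running = Counter()
for tag in events:
    running[tag] += 1
    snapshots.append(distribution(running, vocab))

# Method 1: relative entropy between consecutive time steps.
consecutive = [
    kl_divergence(snapshots[i + 1], snapshots[i])
    for i in range(len(snapshots) - 1)
]

# Method 2: relative entropy of each time step with respect to the final one.
to_final = [kl_divergence(s, snapshots[-1]) for s in snapshots]
```

A distribution that has settled would show both series shrinking toward zero; the last entry of `to_final` is zero by construction.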

14 Empirical Results for Tag Dynamics (1/2) Figure: relative entropy between tag frequency distributions at consecutive time-steps

15 Empirical Results for Tag Dynamics (2/2) Figure: the relative entropy of the tag distribution for each time point with respect to the final distribution

16 Constructing Inter-Tag Correlation Graphs
The information value of tags is a central aspect governing the evolution of tag distributions.
Distance between two tags
$N(T_i)$ = the number of pages tagged by $T_i$
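
The transcript does not preserve the slide's distance formula, so the following is only an assumed stand-in: a cosine-style normalized co-occurrence, which divides the number of pages carrying both tags by the square root of N(T_i) · N(T_j). The authors' actual measure may differ. The function name `tag_similarity` and the page sets are hypothetical.

```python
import math


def tag_similarity(pages_by_tag, tag_i, tag_j):
    """Cosine-style co-occurrence between two tags (assumed measure).

    N(T_i) is the number of pages tagged with T_i; the numerator counts
    pages tagged with both T_i and T_j.
    """
    pages_i = pages_by_tag[tag_i]
    pages_j = pages_by_tag[tag_j]
    both = len(pages_i & pages_j)
    return both / math.sqrt(len(pages_i) * len(pages_j))


# Made-up page sets for three tags.
pages_by_tag = {
    "complexity": {"p1", "p2", "p3", "p4"},
    "chaos": {"p2", "p3", "p4", "p5"},
    "recipes": {"p6"},
}

close = tag_similarity(pages_by_tag, "complexity", "chaos")    # 3 / sqrt(16)
far = tag_similarity(pages_by_tag, "complexity", "recipes")    # no shared pages
```

Under this assumption a value near 1 means two tags almost always appear on the same pages, and 0 means they never co-occur; such values could serve as edge weights in a correlation graph.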

17 Tag Correlation Network
Figure: visualization of a tag correlation network, considering only the correlations corresponding to one central node, "complexity"

18 Tag Correlation Network
Figure: visualization of a tag correlation network, considering all relevant correlations ("small world" structure → Zipf's law)

19 Conclusion and Future Work
This work has explored a number of issues highly relevant to the question of whether a coherent way of organizing metadata can emerge from distributive tagging systems.
It is shown that tagging distributions tend to stabilize into power law distributions.
Using an example domain, we explored one of the most empirically challenging aspects of the generative model: the information value of a tag as a function of the number of pages.
Future work will elaborate on the results presented here regarding categorization schemes based on tag co-occurrence and information value, and will examine whether these results hold across many different tagging applications.
