TAGora: Semiotic Dynamics of Online Social Communities
EU-IST
Semantic Modelling of User Interests Based on Cross-Folksonomy Analysis
Martin Szomszor, Harith Alani, Kieron O’Hara, Nigel Shadbolt
University of Southampton
Iván Cantador
Universidad Autónoma de Madrid
Outline
Introduction and Motivation
–Why is your folksonomy interaction useful?
–How could it be exploited?
Architecture
–Matching user accounts
–Collecting Data
–Tag Filtering
–Profile Building
Experiment and Evaluation
Conclusions and Future Work
Introduction delicious.com Dream Theater Metallica Rush
Increasing number of online identities
A recent Ofcom study found that UK adults have on average 1.6 profiles.
39% of those that have a profile have at least 2
Many predict that in the near future, individuals will have in excess of 10 profiles
–[Ofcom 2008] Social Networking: A quantitative and qualitative research report into attitudes, behaviours, and use.
Profile of Interests The Big Picture delicious.com
Personalisation
Profile of Interests
Profiles could be exported to other sites to improve recommendation quality
Profiles could be used to support personalised searching
Better user experience
Consolidation and Integration currency travel hotels cuba cuba holiday
User Tagging delicious.com
Tag Clouds
Tagging Variation
Raw Tags / Filtered Tags
[1] Szomszor, M., Cantador, I. and Alani, H. (2008). Correlating User Profiles from Multiple Folksonomies. In: ACM Conference on Hypertext and Hypermedia, 2008, Pittsburgh, Pennsylvania.
Architecture for Building Profiles of Interests
Account Correlation Using Google’s Social Graph API delicious.com account homepage
Data Collection
Delicious
–Custom Python scripts
Flickr
–Using public API
Only public information is harvested
Tag Filtering Process
Creating User Profiles
Three-stage process:
1. Identify Wikipedia page
–London is matched with
2. Extract category list
–Host cities of the Summer Olympic Games | Host cities of the Commonwealth Games | London | 1st century establishments | British capitals | Capitals in Europe | Port cities and towns in the United Kingdom
3. Select representative categories
–Only choose categories that match the tag string
–Excludes spurious categories such as: Host cities of the Summer Olympic Games; Needs more sources
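The third stage (selecting representative categories) can be illustrated with a minimal Python sketch. It assumes that "match the tag string" means a case-insensitive exact match between the tag and the category name; the slides do not say how strict the comparison is, so a looser rule may have been used. The function name `select_representative_categories` is hypothetical, and the Wikipedia lookup of stages 1 and 2 is not shown: the category list is taken as given from the slide.

```python
def select_representative_categories(tag, categories):
    """Keep only the categories whose name equals the tag string.

    Assumption: comparison is a case-insensitive exact match after
    trimming whitespace. The original system may use a different rule.
    """
    needle = tag.strip().lower()
    return [c for c in categories if c.strip().lower() == needle]


# Category list for the Wikipedia page "London", as shown on the slide.
london_categories = [
    "Host cities of the Summer Olympic Games",
    "Host cities of the Commonwealth Games",
    "London",
    "1st century establishments",
    "British capitals",
    "Capitals in Europe",
    "Port cities and towns in the United Kingdom",
]

selected = select_representative_categories("london", london_categories)
print(selected)
```

With this rule the tag london keeps only the category London, and categories such as Host cities of the Summer Olympic Games are dropped.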
Profile of Interest
Experiment Setup
Bootstrapped using 667,141 delicious profiles obtained in previous work
Only accounts with a matching Flickr profile and > 50 distinct tags were added
Final list contains 1,392 users

                 Delicious      Flickr
Total Posts      1,134,527   2,215,913
Distinct Tags      138,028     307,182
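The selection rule can be sketched in Python as follows. This is only an illustration: the slide does not say whether the "> 50 distinct tags" threshold applies to each site separately or to one of them, so the sketch assumes it must hold for both. The function name `select_users` and the input layout (user name mapped to a pair of tag lists, with `None` when no Flickr account was matched) are hypothetical.

```python
def select_users(accounts, min_distinct_tags=50):
    """Return the users that pass the experiment's selection rule.

    accounts maps a user name to (delicious_tags, flickr_tags), where
    flickr_tags is None if no matching Flickr profile was found.
    Assumption: the threshold is applied to each site separately.
    """
    selected = []
    for user, (delicious_tags, flickr_tags) in accounts.items():
        if flickr_tags is None:
            continue  # no matching Flickr profile
        enough_delicious = len(set(delicious_tags)) > min_distinct_tags
        enough_flickr = len(set(flickr_tags)) > min_distinct_tags
        if enough_delicious and enough_flickr:
            selected.append(user)
    return selected
```

Counting distinct tags through `set` means that a tag reused across many posts is only counted once.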
Evaluation Four evaluation procedures: –The performance of the tag filtering and matching to Wikipedia Entries –The difference between the most common categories found in delicious and Flickr –The amount learnt from merging profiles from the two folksonomies –The accuracy of matching tags to Wikipedia categories
Tag Filtering and Matching
Global Category View
What are the differences in the interests that are learnt from each domain?

Delicious                          Flickr
Wikipedia Category   Total Freq    Wikipedia Category   Total Freq
Design                   69,215    Travel                   51,674
Blogs                    68,319    Australia                51,617
Music                    45,063    London                   46,623
Photography              41,356    Festivals                42,504
Tools                    35,795    Music                    40,943
Video                    34,318    Cats                     38,230
Arts                     29,966    Holidays                 37,610
Software                 28,746    Family                   37,100
Maps                     26,912    Japan                    36,513
Teaching                 22,120    Concerts                 35,374
Games                    21,549    Surnames                 34,947
How-to                   19,533    Washington               33,924
Technology               18,032    Given Names              32,843
News                     17,737    Dogs                     32,206
Humor                    15,816    Birthdays                22,290
Learning More About Users How much more can we learn by using multiple profiles?
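One way to quantify what a second folksonomy adds is sketched below in Python. It assumes that a profile is a mapping from Wikipedia category to tag frequency, and that a "new concept" is a category present in one profile but absent from the other; the slides do not define the measure precisely, so both are assumptions. The function names and the example profiles are hypothetical.

```python
from collections import Counter


def merge_profiles(delicious_profile, flickr_profile):
    """Combine two category-frequency profiles by summing frequencies."""
    return Counter(delicious_profile) + Counter(flickr_profile)


def new_concepts(base_profile, other_profile):
    """Categories that appear in other_profile but not in base_profile."""
    return set(other_profile) - set(base_profile)


# Illustrative profiles only, not data from the experiment.
delicious_profile = {"Design": 5, "Music": 2}
flickr_profile = {"Music": 3, "Travel": 4}

merged = merge_profiles(delicious_profile, flickr_profile)
gained = new_concepts(delicious_profile, flickr_profile)
print(merged, gained)
```

Averaging the size of the gained set over all users would give a per-user figure for the number of new concepts.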
Category Matching
How good is the category matching?
Take 100 random users and choose 1 Delicious tag and 1 Flickr tag
Classify tag into one of 3 classes:
–Correct
–Unresolved (not matched to any category)
–Ambiguous (Disambiguation required)

             Correct   Unresolved   Ambiguous
Delicious        66%          20%         14%
Flickr           63%          25%         12%
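The 22.5% unresolved and 13% ambiguous figures quoted under Future Work appear to be these results pooled over both sites. The Python sketch below checks that arithmetic, assuming 100 sampled tags per site (one per user) so that each percentage can be read as a count. The names `results` and `pooled_rate` are hypothetical.

```python
# Counts per class, reading each percentage as a count out of 100 tags.
results = {
    "Delicious": {"Correct": 66, "Unresolved": 20, "Ambiguous": 14},
    "Flickr": {"Correct": 63, "Unresolved": 25, "Ambiguous": 12},
}


def pooled_rate(results, label):
    """Percentage of all sampled tags, across sites, with the given label."""
    total = sum(sum(site.values()) for site in results.values())
    hits = sum(site[label] for site in results.values())
    return 100.0 * hits / total


print(pooled_rate(results, "Unresolved"))  # 45 of 200 tags
print(pooled_rate(results, "Ambiguous"))   # 26 of 200 tags
```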
Conclusions
We have proposed a novel method for the creation of Profiles of Interest by exploiting an individual’s tagging activities across two popular folksonomy sites
Frequently used tags often specify areas of interest, but not always!
–Common delicious tags are daily, toread, howto
–Flickr tags often include names of people
Expanding the analysis across folksonomies increases the amount learnt
–On average 15 new concepts per user
Future Work
Improve page matching
–22.5% of sample tags unresolved
Handle disambiguation
–13% of sample tags refer to ambiguous terms
Cooccurrence networks
Category hierarchy
Increase network coverage
–Already have the data to include Last.fm
Understand which tags actually specify an interest of the individual
–Filter out categories such as ‘Surname’