Center for E-Business Technology Seoul National University Seoul, Korea Social Ranking: Uncovering Relevant Content Using Tag-based Recommender Systems.


1 Center for E-Business Technology, Seoul National University, Seoul, Korea
  Social Ranking: Uncovering Relevant Content Using Tag-based Recommender Systems
  Valentina Zanardi, Licia Capra, Dept. of Computer Science, University College London
  2nd ACM International Conference on Recommender Systems, October 23-25, 2008, Lausanne, Switzerland
  2009. 05. 29. Summarized & presented by Babar Tareen, IDS Lab., Seoul National University

2 Introduction (Copyright © 2008 by CEBT)
  - Taxonomies
    - Hierarchical classification
    - Standardized
    - Expert opinions
  - Social (or folksonomic) tagging enhances content by enabling users to
    - Describe
    - Categorize
    - Search
    - Discover
    - Navigate (tag clouds)

3 Introduction (2)
  - At times, the use of tagging may lower search efficiency
  - Downsides of social tagging
    - Informally defined
    - Dynamically changing
    - Ungoverned
    - Heterogeneity of users
    - Heterogeneity of context
  - Language-related problems
    - Synonyms: words with similar meaning
      – Book (Schedule, Reserve, Record)
    - Homonyms: words with the same pronunciation but different meanings
      – Berry (Fruit), Bury (take under)
    - Polysemy: a single word having several meanings
      – Foot (Length, Body Part)
      – Left (Direction, Action of leaving a place)

4 Social Ranking (In a Nutshell)
  - Aims to efficiently find content that is relevant to a user's query
    - Assumptions
      – Typical Web 2.0 content
      – Content is arbitrarily tagged by users
  - Answers queries by exploiting recommender system techniques
    - User similarity is based on past tag activity
    - Tag relationships are based on association to content
    - Ranking by
      – Inferred distance of the query to the tags associated with such content
      – Weighted by the similarity of the querying user to the users who created those tags

5 Dataset Analysis
  - CiteULike dataset (social bookmarking site for researchers)
    - Article, User, Tag
    - 820,000 articles (papers)
    - 28,000 users
    - 240,000 tags
  - Pre-processing
    - Removed bookmarks and tags used by only one user
    - 100,000 articles (papers)
    - 28,000 users
    - 55,000 tags

6 Long Tails
  - Long tail of tags
    - 70% of the tags used by 20 users
    - On average 5 tags per paper (max. 10)
    - This suggests that standard keyword search will likely fail
  - Long tail of papers
    - 85% of the papers tagged by fewer than 5 users
    - This suggests that standard recommender system techniques would likely perform poorly in terms of accuracy and coverage
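The long-tail percentages on this slide can be measured directly from (user, tag, paper) triples. The following is a minimal sketch, not the authors' analysis code; the function name `share_of_rare_items`, the triple layout, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict

def share_of_rare_items(triples, item_index, max_users):
    """Fraction of distinct items (tags or papers) touched by at most
    `max_users` distinct users. `item_index` is 1 for tags, 2 for papers."""
    users_per_item = defaultdict(set)
    for triple in triples:
        users_per_item[triple[item_index]].add(triple[0])
    if not users_per_item:
        return 0.0
    rare = sum(1 for users in users_per_item.values() if len(users) <= max_users)
    return rare / len(users_per_item)

# Toy (user, tag, paper) triples; purely illustrative, not CiteULike data.
bookmarks = [
    ("u1", "ranking", "p1"),
    ("u2", "ranking", "p1"),
    ("u3", "ranking", "p2"),
    ("u1", "folksonomy", "p2"),
    ("u2", "tagging", "p3"),
]

tag_share = share_of_rare_items(bookmarks, item_index=1, max_users=1)
paper_share = share_of_rare_items(bookmarks, item_index=2, max_users=1)
```

On the toy data, two of the three tags and one of the three papers are used by a single user, so a threshold of one user already exposes the tail.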

7 Ranking (Basic Model)
  - The higher the number of query tags associated with the resource, the higher its ranking (Accuracy)
  - The higher the number of users u_i who tagged the resource using (some of the) query tags, the higher its ranking
  - Works fine for popular content
  - Fails to address queries that look for the long tail of medium-to-low popularity content (Accuracy Problem)
  - If the user running the query also uses tags that belong to the long tail of tags, then chances are that relevant content is not found (Coverage Problem)
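One simple way to combine the two criteria of the basic model is to count, for each paper, the distinct (user, tag) pairs whose tag appears in the query. The slides do not give the exact formula, so the scoring rule below is an assumption, as are the name `basic_rank` and the toy data.

```python
from collections import Counter

def basic_rank(triples, query_tags):
    """Score each paper by the number of distinct (user, query tag) pairs
    attached to it; more matching tags and more users both raise the score."""
    query = set(query_tags)
    scores = Counter()
    for user, tag, paper in set(triples):  # set() drops duplicate triples
        if tag in query:
            scores[paper] += 1
    # Highest score first; ties broken by paper id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Toy (user, tag, paper) triples; illustrative only.
bookmarks = [
    ("u1", "ranking", "p1"),
    ("u2", "ranking", "p1"),
    ("u2", "tagging", "p1"),
    ("u3", "ranking", "p2"),
    ("u3", "music", "p3"),
]

basic_result = basic_rank(bookmarks, ["ranking", "tagging"])
```

Paper p3 never appears in the result because none of its tags literally match the query, which is the coverage problem the slide describes.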

8 Social Ranking
  - Based on the following observations
    - Clustering of users for improved accuracy
      – Most active users bookmark a tiny portion of the whole paper set
      – Users have clearly defined interests
      – Each user masters a small subset of the whole folksonomy
      – Users sharing parts of the folksonomy form fairly small clusters
    - Clustering of tags for improved coverage
      – Each paper was described by just a handful of tags
      – This suggests that there is a core of shared knowledge about tags within communities

9 Social Ranking (2)
  - Identify the users with similar interests to the querying user
    - Based on users' tag activity
  - Identify tags similar to the query tags
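Both similarity steps can be sketched with the same vector comparison: users are compared through the tags they have used, and tags through the papers they are attached to. The slides do not name the similarity measure, so the use of cosine similarity here is an assumption, as are the helper names and the toy data.

```python
import math
from collections import Counter, defaultdict

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def build_profiles(triples):
    """Return (user -> tag counts, tag -> paper counts) from (user, tag, paper)."""
    user_tags = defaultdict(Counter)
    tag_papers = defaultdict(Counter)
    for user, tag, paper in triples:
        user_tags[user][tag] += 1
        tag_papers[tag][paper] += 1
    return user_tags, tag_papers

# Toy (user, tag, paper) triples; illustrative only.
bookmarks = [
    ("u1", "ranking", "p1"),
    ("u1", "tagging", "p1"),
    ("u2", "ranking", "p2"),
    ("u2", "tagging", "p2"),
    ("u3", "music", "p3"),
]

user_tags, tag_papers = build_profiles(bookmarks)
sim_u1_u2 = cosine(user_tags["u1"], user_tags["u2"])        # same tag vocabulary
sim_u1_u3 = cosine(user_tags["u1"], user_tags["u3"])        # no tags in common
sim_tags = cosine(tag_papers["ranking"], tag_papers["tagging"])  # same papers
```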

10 Two Step Query Model
  - Query
  - Query expansion
  - Ranking
    - Papers with tags from the original query should rank higher than papers matching only extra tags from the expanded query
    - Papers shared by similar users should be ranked higher
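The two steps can be sketched end to end as follows. This is an illustration under stated assumptions rather than the authors' formula: cosine similarity is assumed for both tags and users, original query tags get weight 1.0 while expanded tags get their similarity to the query tag, and each tagging is weighted by (user similarity + 1) so that dissimilar users still contribute. The names `social_rank` and `top_k` and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def social_rank(triples, querying_user, query_tags, top_k=2):
    user_tags = defaultdict(Counter)   # user -> tag usage counts
    tag_papers = defaultdict(Counter)  # tag -> paper counts
    for user, tag, paper in triples:
        user_tags[user][tag] += 1
        tag_papers[tag][paper] += 1

    # Step 1: query expansion. Original tags keep weight 1.0; each query
    # tag contributes its top_k most similar other tags, weighted by similarity.
    query = set(query_tags)
    weights = {t: 1.0 for t in query}
    for q in query:
        q_vec = tag_papers.get(q, Counter())
        candidates = sorted(
            ((cosine(q_vec, vec), t) for t, vec in tag_papers.items() if t not in query),
            reverse=True,
        )[:top_k]
        for sim, t in candidates:
            if sim > 0:
                weights[t] = max(weights.get(t, 0.0), sim)

    # Step 2: ranking. Each (user, tag) on a paper adds the tag weight,
    # scaled up when the tagging user resembles the querying user.
    me = user_tags.get(querying_user, Counter())
    scores = defaultdict(float)
    for user, tag, paper in set(triples):
        if tag in weights:
            scores[paper] += weights[tag] * (cosine(me, user_tags[user]) + 1.0)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Toy (user, tag, paper) triples; illustrative only.
bookmarks = [
    ("alice", "ranking", "p1"), ("alice", "recsys", "p1"),
    ("bob", "ranking", "p2"), ("bob", "recsys", "p2"),
    ("carol", "recsys", "p3"), ("carol", "music", "p3"),
    ("dave", "music", "p4"),
]

ranked = social_rank(bookmarks, "alice", ["ranking"])
ranked_papers = [paper for paper, _ in ranked]
```

In the toy run, p3 carries no original query tag but is still retrieved through the expanded tag, and it ranks below the papers that match the original query.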

11 Evaluation
  - Dataset
    - Considered only those tags
      – Which are used on at least 15 different papers
      – By at least 20 different users
    - Users: 12,000
    - Papers: 83,000
    - Tags: 16,000
  - Long Tails
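The tag filter used for the evaluation dataset can be sketched as below. The default thresholds come from the slide; the function name `filter_tags`, the triple layout, and the toy data (run with smaller thresholds) are assumptions for illustration.

```python
from collections import defaultdict

def filter_tags(triples, min_papers=15, min_users=20):
    """Keep only triples whose tag appears on at least `min_papers` distinct
    papers and is used by at least `min_users` distinct users."""
    papers_per_tag = defaultdict(set)
    users_per_tag = defaultdict(set)
    for user, tag, paper in triples:
        papers_per_tag[tag].add(paper)
        users_per_tag[tag].add(user)
    kept = {
        tag for tag in papers_per_tag
        if len(papers_per_tag[tag]) >= min_papers
        and len(users_per_tag[tag]) >= min_users
    }
    return [t for t in triples if t[1] in kept]

# Toy (user, tag, paper) triples; illustrative only.
bookmarks = [
    ("u1", "ranking", "p1"),
    ("u2", "ranking", "p2"),
    ("u1", "rare", "p1"),
    ("u1", "rare", "p2"),
]

# "rare" is on two papers but used by a single user, so it is dropped.
filtered = filter_tags(bookmarks, min_papers=2, min_users=2)
```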

12 Simulation Setup

13 Results (without query expansion)

14 Results (2)

15 Results (3)

16 Discussion
  - The paper assumes that users have fixed interests
    - Works for CiteULike because many people have a limited number of research directions
    - May not work well enough for Delicious because people tend to bookmark different types of pages
  - Tags in CiteULike may be comparatively well organized because technical users are adding tags to technical papers
    - Maximum tags per paper on CiteULike: 10
    - May not work well enough for Delicious, where some bookmarks have 46 tags

