Thesis Proposal: Prediction of popular social annotations Abon.

1 Thesis Proposal: Prediction of popular social annotations Abon

2 Outline
Background
Related Work
Problem Definition
Possible Solution
Experiment Plan
Evaluation Plan

3 Background
Prevalence of social web services.
What do they have in common? Tags and user-generated content.

4 Background
What are tags for? According to the del.icio.us founder:
"Tags are one-word descriptors that you can assign to your bookmarks on del.icio.us to help you organize and remember them. Tags are a little bit like keywords, but they're chosen by you, and they do not form a hierarchy. You can assign as many tags to a bookmark as you like and rename or delete the tags later. So, tagging can be a lot easier and more flexible than fitting your information into preconceived categories or folders."

6 Background A usage example

7 Why TAGs are useful
In information retrieval, expanding a query is a common technique for retrieving more related data. Tags act like human-expanded index terms.

8

9 Query expansion here

10 Why TAGs are useful
Traditional term expansion schemes rely on term-document relations, and each term's importance to a document is often determined by tf-idf. Each tag a user applies is like a vote for which tag should go with a document. Thus term-document relations can be measured by tag applications.
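
The voting idea can be sketched in a few lines. This is only an illustration: the function name `tag_weights`, the sample data, and the choice to normalize votes into fractions are assumptions, not something the slides specify.

```python
from collections import Counter

def tag_weights(applications):
    """Turn the tag applications for one document into vote shares.

    `applications` lists every tag applied to the document, one entry
    per (user, tag) application. Normalizing by the total number of
    applications is an assumption made for this sketch.
    """
    votes = Counter(applications)
    total = sum(votes.values())
    return {tag: count / total for tag, count in votes.items()}

# Hypothetical applications for a single URL.
weights = tag_weights(["python", "python", "tutorial", "python"])
print(weights)
```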

11 Why TAGs are useful
Tags form a human-expanded query set that enables more complete concept mapping. As more and more people apply tags, tag popularity reaches a stable pattern, and the top tags can be used as weighting parameters for search optimization.

12 Related Work
Golder SA, Huberman BA. Usage patterns of collaborative tagging systems. J. Inf. Sci., Vol. 32, No. 2 (April 2006), pp. 198-208.
With 100+ users, a stable pattern appears.
Urn model

13 Stable pattern: top 7 tags remain for one year+

14 Related Work
Cattuto C, Loreto V, Pietronero L. Collaborative Tagging and Semiotic Dynamics.
Long-term memory version of the classic Yule–Simon process
Memory model based on a cognitive model

15 Yule–Simon process
Q_t(x) = a(t) / (x + τ)
a(t) is a normalizing factor
τ is the memory parameter
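
As a rough illustration of the kernel Q_t(x), the sketch below computes it for every x from 1 to t and derives a(t) so that the values sum to 1. It assumes x counts how many steps back in the tag stream an earlier tag was used, which the slide does not state; `memory_kernel` is a hypothetical name.

```python
def memory_kernel(t, tau):
    """Return [Q_t(1), ..., Q_t(t)] with Q_t(x) = a(t) / (x + tau).

    Assumption: x is the distance (in steps) back from the current
    position t, so recently used tags receive a higher weight.
    """
    weights = [1.0 / (x + tau) for x in range(1, t + 1)]
    a_t = 1.0 / sum(weights)  # normalizing factor a(t)
    return [a_t * w for w in weights]

probs = memory_kernel(t=5, tau=2.0)
print(probs)
```

A larger τ flattens the kernel, so older tags are copied almost as often as recent ones; a small τ concentrates the weight on the most recent tags.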

16 Related Work
Halpin H, Robu V, Shepherd H. The Complex Dynamics of Collaborative Tagging. In Proceedings of WWW 2007.

17 Empirical Results for Power Law Regression for Popular Sites

18
P(x): the tag probability distribution at each time point
Q(x): the final tag probability distribution
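
The slide does not name the measure used to compare P(x) with Q(x). One standard choice for comparing two probability distributions is the Kullback-Leibler divergence, sketched here purely as an assumption; the function name and the sample distributions are hypothetical.

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) for distributions given as {tag: probability} dicts.

    Assumes every tag with nonzero probability in p also has nonzero
    probability in q (true when q is the final distribution and p is an
    earlier snapshot of the same URL).
    """
    return sum(
        prob * math.log(prob / q[tag])
        for tag, prob in p.items()
        if prob > 0
    )

# Hypothetical snapshot P and final distribution Q for one URL.
p = {"web": 0.5, "design": 0.5}
q = {"web": 0.6, "design": 0.3, "css": 0.1}
print(kl_divergence(p, q))
```

A value that shrinks toward 0 over time would indicate that the tag distribution is converging to its final shape.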

19 Problem definition
In its initial stage, a URL has not yet been sufficiently annotated by people, so it is hard to retrieve. For such an immature URL, predicting its future popular tags could provide a better retrieval experience.
Mature URL: a definition borrowed from the empirical results on tag dynamics in [Halpin]. Mature URLs are those with 3+ years of history on del.icio.us.

20 Expanding tag set
T_i: the tag set applied by the i-th user to a URL.
ET_i: the expanded tag set after the i-th user.
T_0: the tag set suggested by tf-idf term extraction.
ST_0 = T_0
ET_i = ET_{i-1} ∪ relevant_n(T_i)
relevant_n(T_i) = the n tags with the highest mutual information to each tag in T_i
Mutual information of t_i and t_j: f(t_i, t_j) / (f(t_i) · f(t_j))
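
The expansion step can be sketched as follows. Several details are assumptions, since the slide leaves them open: f(t) is taken to be the number of bookmarks containing tag t, f(t_i, t_j) the number containing both, tags with a score of zero are never added, and ties are broken alphabetically. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def count_tags(posts):
    """Count single-tag and pairwise co-occurrence frequencies."""
    f, f_pair = Counter(), Counter()
    for tags in posts:
        tags = set(tags)
        f.update(tags)
        f_pair.update(frozenset(p) for p in combinations(tags, 2))
    return f, f_pair

def mutual_info(a, b, f, f_pair):
    """The slide's score: f(a, b) / (f(a) * f(b))."""
    if f[a] == 0 or f[b] == 0:
        return 0.0
    return f_pair[frozenset((a, b))] / (f[a] * f[b])

def relevant(tags, n, f, f_pair):
    """For each tag in `tags`, collect its n highest-scoring neighbours."""
    result = set()
    for t in tags:
        scored = [(mutual_info(t, c, f, f_pair), c) for c in f if c != t]
        scored = [(s, c) for s, c in scored if s > 0]  # assumption: skip unrelated tags
        scored.sort(key=lambda sc: (-sc[0], sc[1]))    # best score first, then by name
        result.update(c for _, c in scored[:n])
    return result

def expand(prev_expanded, user_tags, n, f, f_pair):
    """ET_i = ET_{i-1} ∪ relevant_n(T_i)."""
    return set(prev_expanded) | relevant(user_tags, n, f, f_pair)

# Hypothetical bookmarks used to estimate the frequencies.
posts = [
    {"python", "programming"},
    {"python", "programming", "tutorial"},
    {"design", "css"},
    {"css", "web"},
]
f, f_pair = count_tags(posts)
et_1 = expand({"web"}, {"python"}, n=2, f=f, f_pair=f_pair)
print(et_1)
```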

21 Cohesivity
Each tag in ET_i has a score that indicates its cohesivity to ET_i.
Cohesivity of t_j to ET_i = Σ_{t_k ∈ ET_i} f(t_k, t_j) / (f(t_j) · f(t_k))
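
A minimal sketch of the cohesivity score, under stated assumptions: `f` maps each tag to its frequency, `f_pair` maps an unordered tag pair to its co-occurrence count, and a tag is not paired with itself (the slide does not say whether t_k = t_j is included in the sum). The names and sample counts are hypothetical.

```python
def cohesivity(tag, expanded, f, f_pair):
    """Sum of f(t_k, tag) / (f(tag) * f(t_k)) over t_k in the expanded set."""
    total = 0.0
    for other in expanded:
        if other == tag:
            continue  # assumption: the self-pair is skipped
        pair_count = f_pair.get(frozenset((tag, other)), 0)
        total += pair_count / (f[tag] * f[other])
    return total

# Hypothetical frequencies.
f = {"python": 2, "programming": 2, "css": 2}
f_pair = {frozenset(("python", "programming")): 2}
expanded = {"python", "programming", "css"}

for t in sorted(expanded):
    print(t, cohesivity(t, expanded, f, f_pair))
```

In this toy data "css" never co-occurs with the other tags, so its score is 0 and it would be the first candidate to drop when the set is pruned.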

22 Pruning ET_i
1. Sort the tags in ET_i by popularity and take the top 7 as the suggested tag set ST_i.
2. Sort the tags in ET_i by popularity × cohesivity and take the top 7 as the suggested tag set ST_i.
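
Both pruning rules can be sketched with a single function. The slide does not define "popularity"; here it is assumed to be a per-tag count supplied by the caller. The function name `prune`, the alphabetical tie-break, and the sample scores are illustrative only.

```python
def prune(expanded, popularity, cohesion=None, k=7):
    """Return the top-k tags of the expanded set as the suggestion.

    With cohesion=None, tags are ranked by popularity alone (rule 1).
    With a cohesion dict, they are ranked by popularity * cohesivity (rule 2).
    """
    def score(tag):
        pop = popularity.get(tag, 0)
        if cohesion is None:
            return pop
        return pop * cohesion.get(tag, 0.0)

    # Highest score first; ties are broken alphabetically (an assumption).
    return sorted(expanded, key=lambda t: (-score(t), t))[:k]

# Hypothetical scores; k=2 keeps the example short.
expanded = {"a", "b", "c"}
popularity = {"a": 10, "b": 5, "c": 1}
cohesion = {"a": 0.1, "b": 0.5, "c": 2.0}

by_popularity = prune(expanded, popularity, k=2)
by_product = prune(expanded, popularity, cohesion, k=2)
print(by_popularity, by_product)
```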

23 Experiment Plan
Dataset from the del.icio.us RSS API, Mar 28 to April 19: 30000 URLs, 234982 taggings, 8392 users.
1. del.icio.us/rss/popular every 30 min; del.icio.us/rss/recent every 2 min
2. del.icio.us/rss/url?url= xxx.com
Tags are suggested from no user up to the 10th user.

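24 Evaluation Plan
For each URL, we have the mature tags and the suggested tags at each iteration, so the recall rate and precision rate can be calculated.
Four configurations are compared, combining "expanding with relevant tags" (with or without) and "pruning with cohesivity" (with or without):
1. Baseline: without either step
2. and 3. With one of the two steps only
4. With both steps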
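
Precision and recall for one URL at one iteration can be computed as in the sketch below. The function name and the sample tag sets are hypothetical; returning 0.0 for an empty set is a convention chosen for this example.

```python
def precision_recall(suggested, mature):
    """Compare the suggested tags with the URL's mature tags."""
    suggested, mature = set(suggested), set(mature)
    hits = len(suggested & mature)
    precision = hits / len(suggested) if suggested else 0.0
    recall = hits / len(mature) if mature else 0.0
    return precision, recall

# Hypothetical tag sets for a single URL.
suggested = {"python", "code", "web"}
mature = {"python", "programming", "code", "tutorial"}
precision, recall = precision_recall(suggested, mature)
print(precision, recall)
```

Averaging these two numbers over all URLs, separately for each iteration and each configuration, would give the curves needed to compare the four setups.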

