1 Mining a Web 2.0 service for the discovery of semantically similar terms: A case study with Del.icio.us
Kwan Yi, School of Library and Information Science, College of Communications and Information Studies, University of Kentucky

2 Social bookmarking: Del.icio.us
Del.icio.us is one of the most popular social bookmarking systems. As of September 2007:
– 3 million registered users
– 100 million unique URLs bookmarked

3 Folksonomy
We define folksonomy as a collective set of tags (keywords or terms) assigned by participants in a social tagging system.
– User-created vocabulary
– Uncontrolled vocabulary
– Built in a collaborative manner

4 Example: A folksonomy in Delicious.com
[Screenshot with labeled parts: resource title, resource taggers, resource URL, tagging history, popular tags]

5 Objective of the Study
To examine an effective way of mining semantically similar terms from a folksonomy, in order to assess the feasibility of folksonomy as a data source of such terms.

6 Proposed algorithms for mining similar terms from folksonomy
– Co-occurrence-based similarity algorithm
– Correlation-based similarity algorithm

7 Experiment (I)
To identify similar terms for each of the 121 most popular tags on Del.icio.us, as posted on 15 May 2008.

9 Result: How many similar terms for the 121 popular tags?
Co-occurrence-based algorithm
– 2.6 similar terms (level of similarity = 0.9)
– 5.1 similar terms (level of similarity = 0.7)
– 10.1 similar terms (level of similarity = 0.5)
Correlation-based algorithm
– 0.9 similar terms (level of similarity = 0.9)
– 1.6 similar terms (level of similarity = 0.7)
– 2.6 similar terms (level of similarity = 0.5)

10 Experiment (II)
To identify similar terms for each of the 32 tags (out of the 121) that are not listed in the online version of the Merriam-Webster Dictionary.

11 Result: How many similar terms for the 32 not-in-the-dictionary tags?
Co-occurrence-based algorithm
– 3.3 similar terms (level of similarity = 0.9)
– 5.9 similar terms (level of similarity = 0.7)
– 10.1 similar terms (level of similarity = 0.5)
Correlation-based algorithm
– 1 similar term (level of similarity = 0.9)
– 1.7 similar terms (level of similarity = 0.7)
– 2.4 similar terms (level of similarity = 0.5)

12 Webdesign (similarity level: 0.9)
Co-occurrence [12]: resources, css, web, design, reference, html, tutorial, tutorials, inspiration, gallery, development, webdev
Correlation [4]: css, design, html, inspiration

13 Findings
– The correlation-based algorithm is more selective than the co-occurrence-based algorithm.
– The co-occurrence-based algorithm appears most attractive at a similarity level of 0.7.

14 Conclusion
As social bookmarking systems become more widely used, the potential of their folksonomies for this mining task will increase.

15 Thanks!

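18 Co-occurrence-based similarity algorithm (identifying similar terms of the term W)
Step 1: Tag lists, with counts, of four resources tagged with W:
– W (100), A (50), B (20), C (10)
– W (87), B (57), C (40), A (30)
– W (1032), A (250), F (120), D (78)
– W (37), A (29), B (16), F (9)
Step 2: Number of these resources in which each tag appears together with W: A (4), B (3), C (2), F (2), D (1)
Step 3: Resulting similarities:
– CoSA(s=1: A W)
– CoSA(s=0.75: B W)
– CoSA(s=0.5: C W)
– CoSA(s=0.5: F W)
– CoSA(s=0.25: D W)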
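The worked example on slide 18 can be reproduced with a short Python sketch. It assumes, based only on the numbers shown on the slide, that the similarity s of a tag to W is the fraction of W-tagged resources whose tag list also contains that tag, and that a "level of similarity" acts as a minimum threshold on s. The function names `cosa_scores` and `cosa_similar` are hypothetical and do not come from the presentation.

```python
from collections import Counter

# Tag lists (tag -> count) of the four resources on slide 18.
RESOURCES = [
    {"W": 100, "A": 50, "B": 20, "C": 10},
    {"W": 87, "B": 57, "C": 40, "A": 30},
    {"W": 1032, "A": 250, "F": 120, "D": 78},
    {"W": 37, "A": 29, "B": 16, "F": 9},
]


def cosa_scores(resources, target):
    """Return {tag: s}, where s is the share of resources tagged with
    `target` whose tag list also contains `tag` (assumed definition)."""
    with_target = [r for r in resources if target in r]
    if not with_target:
        return {}
    # Count, for every other tag, how many of those resources list it.
    counts = Counter(tag for r in with_target for tag in r if tag != target)
    return {tag: n / len(with_target) for tag, n in counts.items()}


def cosa_similar(resources, target, level):
    """Tags whose score reaches `level` (threshold reading is an assumption)."""
    scores = cosa_scores(resources, target)
    return sorted(tag for tag, s in scores.items() if s >= level)


scores = cosa_scores(RESOURCES, "W")
similar_at_half = cosa_similar(RESOURCES, "W", 0.5)
print(scores)
print(similar_at_half)
```

Under these assumptions the scores match the slide: A scores 1, B 0.75, C and F 0.5, and D 0.25. The per-tag counts inside each resource are not used by this sketch; whether the original algorithm weights them is not stated on the slide.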

19 Correlation-based similarity algorithm
Term X is said to be similar to term W on the basis of the correlation-based algorithm: CrSA(s: X W).
CrSA(s: X W) can be defined only if both CoSA(s: X W) and CoSA(s: W X) are satisfied.
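The two-way condition on slide 19 can be illustrated with a minimal Python sketch. It assumes that CoSA(s: X W) means X appears in at least a fraction s of the resources tagged with W, and that CrSA requires this in both directions at the same level s. The tag sets below are made-up illustration data, and the names `cosa` and `crsa_similar` are hypothetical.

```python
# Hypothetical tag sets for six resources (illustration only).
RESOURCES = [
    {"W", "A", "B"},
    {"W", "A"},
    {"W", "A", "B"},
    {"B", "C"},
    {"B", "C"},
    {"B"},
]


def cosa(resources, x, w):
    """Share of resources tagged with w that are also tagged with x
    (assumed reading of the co-occurrence-based similarity)."""
    with_w = [tags for tags in resources if w in tags]
    if not with_w:
        return 0.0
    return sum(x in tags for tags in with_w) / len(with_w)


def crsa_similar(resources, w, level):
    """Tags x for which both directions reach `level`:
    cosa(x, w) >= level and cosa(w, x) >= level."""
    candidates = set().union(*resources) - {w}
    return sorted(
        x
        for x in candidates
        if cosa(resources, x, w) >= level and cosa(resources, w, x) >= level
    )


one_way = sorted(
    x for x in {"A", "B", "C"} if cosa(RESOURCES, x, "W") >= 0.6
)
two_way = crsa_similar(RESOURCES, "W", 0.6)
print(one_way, two_way)
```

In this toy data, B passes the one-way test for W (it appears in two of the three W-tagged resources) but fails the reverse test (W appears in only two of the five B-tagged resources), so only A remains. This is one way to see why the two-way condition yields fewer similar terms than the one-way condition.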