An Integrated Approach to Extracting Ontological Structures from Folksonomies. Huairen Lin, Joseph Davis, Ying Zhou, ESWC 2009. Hyewon Lim, October 9th, 2009.

1 An Integrated Approach to Extracting Ontological Structures from Folksonomies
Huairen Lin, Joseph Davis, Ying Zhou, ESWC 2009
Hyewon Lim, October 9th, 2009

2 Contents
• Introduction
• Related Research
• System Architecture
• Experimental Evaluation
• Conclusion and Future Work

3 Introduction (1/3)

4 Introduction (2/3)
• Problems with folksonomies
  – Tags can be idiosyncratic
  – Tags may not be understood by many users
  – Concepts and internal structure are not explicit to the machine
• Various solutions have been proposed
  – Refine the query result
    • Clustering, tag cloud
  – Take an existing upper ontology as the base structure
    • WordNet
• An integrated approach
  – Knowledge extracted from folksonomies + relevant terms from an existing upper ontology

5 Introduction (3/3)
• Ontological structures extracted from folksonomies can be useful in many areas of collaborative tagging systems (CTS)
  – Providing multi-dimensional views
  – Cataloguing and indexing
  – Query translation and tagging suggestion
• They can enhance precision and recall
  – By matching the query keywords and the potential results at the level of semantics

6 Related Research
• Cosine similarity between tags
  – Measure the distance from one tag to another
  – Organize them into a hierarchical tree
• Association rule mining has been adopted to analyze and structure folksonomies
  – Output of association rule mining on a folksonomy dataset
    • Association rules of the form A → B
• To discover the relationships among tags within clusters, several existing ontology resources can be used as a reference
  – E.g. WordNet
    • An et al., “Automatic Generation of Ontology from the Deep Web”
    • Laniado et al., “Using WordNet to turn a folksonomy into a hierarchy of concepts”
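The cosine measure named on this slide can be sketched as follows. This is a minimal illustration, assuming each tag is represented as a sparse vector of counts over the resources it annotates; the function name `cosine_similarity` and the toy data are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    # Only keys present in both vectors contribute to the dot product.
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


# Toy data (hypothetical): resource id -> times the tag was applied to it.
tagging = {"r1": 2, "r2": 1}
folksonomy = {"r1": 1, "r2": 1, "r3": 1}
sim = cosine_similarity(tagging, folksonomy)
```

A value close to 1 means the two tags annotate largely the same resources; a value of 0 means they never co-occur.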

7 System Architecture (1/7)
• Vocabularies used in folksonomies
  – Standard tags: genomics
  – Compound tags: evolutionary-genomics
  – Jargon tags: scientometrics, CSCW
  – Other nonsense tags: misspelled tags

8 System Architecture (2/7) - Low Support Association rule mining
• Aim of association rule mining in CTS
  – Generate associations of the form t_a → t_c between tags t_a and t_c that have support and confidence above certain thresholds
• Traditional association rule mining
  – Sets relatively high support and confidence thresholds
  – This is likely to miss important associations among tags
    • Tags in folksonomies usually follow a Zipf distribution
    • The majority of tags do not occur very frequently in the dataset
• Low Support Association rule mining
  – Uses a very low support threshold
  – Lower support may bring a lot of noise into the rule set
    • Cosine similarity is used to filter out possible noise
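For reference, the standard definitions behind these thresholds can be written as follows. The notation is introduced here and is not from the slides: N is the number of tagged records, n(t) the number of records carrying tag t, and n(t_a, t_c) the number carrying both tags. The cosine form shown is the usual one for binary occurrence vectors; the paper may weight occurrences differently.

```latex
\begin{align*}
\mathrm{supp}(t_a \rightarrow t_c) &= \frac{n(t_a, t_c)}{N}, \\
\mathrm{conf}(t_a \rightarrow t_c) &= \frac{n(t_a, t_c)}{n(t_a)}, \\
\cos(t_a, t_c) &= \frac{n(t_a, t_c)}{\sqrt{n(t_a)\, n(t_c)}}.
\end{align*}
```

Support and cosine are symmetric in the two tags, while confidence depends on the direction of the rule.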

9 System Architecture (3/7) - Low Support Association rule mining
• LApriori algorithm (a simplified version of the Apriori algorithm)
  – Only calculates the relationships between tag pairs
* Apriori algorithm
  – Finds frequent itemsets using candidate generation
  – Finds L_{k-1}, the set of frequent (k-1)-itemsets, and then uses L_{k-1} to find L_k
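A pair-only miner in the spirit of this slide might look like the sketch below. It is not the authors' LApriori implementation: the function name `mine_pair_rules`, the record format (one set of tags per record) and the toy thresholds are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
import math


def mine_pair_rules(tag_sets, min_support, min_confidence, min_cosine):
    """Return (antecedent, consequent, support, confidence, cosine) for tag pairs."""
    n = len(tag_sets)
    single = Counter()  # records containing each tag
    pair = Counter()    # records containing each unordered tag pair
    for tags in tag_sets:
        unique = sorted(set(tags))
        single.update(unique)
        pair.update(combinations(unique, 2))

    rules = []
    for (a, b), n_ab in pair.items():
        support = n_ab / n
        cosine = n_ab / math.sqrt(single[a] * single[b])
        # Support and cosine are symmetric, so test them once per pair.
        if support < min_support or cosine < min_cosine:
            continue
        # Confidence is directional, so test both rule directions.
        for ante, cons in ((a, b), (b, a)):
            confidence = n_ab / single[ante]
            if confidence >= min_confidence:
                rules.append((ante, cons, support, confidence, cosine))
    return rules


# Toy records (hypothetical), with thresholds scaled up for the tiny dataset.
posts = [
    {"wine", "beverage"},
    {"wine", "beverage", "food"},
    {"milk", "beverage"},
    {"food", "beverage"},
]
rules = mine_pair_rules(posts, min_support=0.25, min_confidence=0.8, min_cosine=0.2)
```

Restricting the search to pairs avoids the level-by-level candidate generation of full Apriori, which is what makes a very low support threshold affordable.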

10 System Architecture (4/7) - Standard Tags
• Use WordNet as the upper ontology
  – Compute each semantic relation between tags in terms of the hypernym relation from WordNet
  – Possible semantic relations
    • More general (⊇), less general (⊆), equivalence (=)
• Additional definitions for folksonomies
  – Essential tags: all distinct tags that appear in the association rules that pass the thresholds
  – Candidate hypernyms: hypernyms of a tag that occur among its related tags

11 System Architecture (5/7) - Standard Tags
• Folk2Onto algorithm
  – Example tag set: {food, beverage, wine, milk}
  – For tag “wine”:
    ① U_k = {}, candidate hypernym = {food}; then U_k = {food}
    ② U_k = {beverage}, candidate hypernym = {food}; then U_k = {beverage} (break)
    ③ U_k = {food}, candidate hypernym = {beverage}; then U_k = {beverage}
  – [Figure: hierarchy over food, beverage, wine and milk, annotated with cases ①, ②, ③]
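The three cases in the "wine" example suggest that the current choice U_k is replaced only when a candidate hypernym is more specific. The sketch below illustrates that reading. It is not the published Folk2Onto algorithm: the hand-written `HYPERNYM_PATH` table is a hypothetical stand-in for a WordNet lookup, and the function names are invented for this example.

```python
# Toy hypernym paths, ordered from the nearest to the most general ancestor.
# Hypothetical stand-in for a WordNet lookup; not real WordNet output.
HYPERNYM_PATH = {
    "wine": ["alcohol", "beverage", "food"],
    "milk": ["beverage", "food"],
    "beverage": ["food"],
    "food": [],
}


def closest_hypernym(tag, related_tags):
    """Pick the most specific hypernym of `tag` among its related tags."""
    path = HYPERNYM_PATH.get(tag, [])
    best = None  # plays the role of U_k on the slide
    for candidate in related_tags:
        if candidate not in path:
            continue  # not a hypernym of this tag
        # Keep the candidate only if it sits closer to the tag than `best`.
        if best is None or path.index(candidate) < path.index(best):
            best = candidate
    return best


def build_hierarchy(tags):
    """Map every tag to its chosen parent (None for a root)."""
    return {t: closest_hypernym(t, [o for o in tags if o != t]) for t in tags}


hierarchy = build_hierarchy(["food", "beverage", "wine", "milk"])
```

With this toy table, "wine" and "milk" both attach to "beverage", and "beverage" attaches to "food", regardless of the order in which candidates are visited.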

12 System Architecture (6/7) - Compound Tags
• Compound tags are non-standard terms
  – Cannot be processed by WordNet without transformation
    • Jawbone (by Mike Wallace)
  – If a compound tag matches certain defined criteria, it is retained and represented by its base term when searching for a more general parent
  – EndWithFilter
    • The last token is used to represent the whole compound
    • collaborative_tagging → tagging
  – StartsWithFilter
    • The first token is used to represent the whole word
    • Applied after the EndWithFilter
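The two filters can be sketched as a single fallback chain. The slide does not spell out the matching criteria, so this example assumes a compound is kept when the chosen token appears in a known vocabulary; the function `base_term`, the separator set and the toy vocabulary are all hypothetical.

```python
import re


def base_term(compound, vocabulary):
    """Return the token that represents a compound tag, or None if no filter matches."""
    tokens = [t for t in re.split(r"[-_\s]+", compound.lower()) if t]
    if len(tokens) < 2:
        return None  # not a compound tag
    if tokens[-1] in vocabulary:
        return tokens[-1]  # EndWithFilter: last token stands for the compound
    if tokens[0] in vocabulary:
        return tokens[0]   # StartsWithFilter: tried only after EndWithFilter
    return None


# Toy vocabulary (hypothetical); in the paper this role is played by WordNet.
vocab = {"tagging", "genomics", "web"}
examples = {tag: base_term(tag, vocab)
            for tag in ["collaborative_tagging", "evolutionary-genomics", "web-2.0"]}
```

Trying the last token first matches the usual English pattern in which the head noun ends the compound.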

13 System Architecture (7/7) - Jargon Tags
• Association rules show the relations of jargon tags with other common tags
• Jargon tags are incorporated into the previously built ontological structure by a matcher that uses graph centrality in a similarity graph of tags
  – Considers each jargon tag as the central node of a subgraph
  – If more than one standard tag is associated with the jargon tag
    • The tag with the highest cosine similarity index has priority
    • E.g. “folksonomy” is associated with “tagging”, “plurality”, “social”, “ontology”
  – “folksonomy → tagging” was selected (ranked by cosine similarity)
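The selection step for a jargon tag can be sketched as picking the neighbour with the highest cosine similarity. This covers only the ranking described on the slide, not the full graph-centrality matcher; the function `attach_jargon` and the similarity values are hypothetical and chosen only so that "tagging" ranks first, as in the slide's example.

```python
def attach_jargon(jargon_tag, similarity_graph, standard_tags):
    """Return the standard tag most similar to `jargon_tag`, or None."""
    neighbours = similarity_graph.get(jargon_tag, {})
    # Only tags already placed in the ontological structure are eligible parents.
    candidates = {t: s for t, s in neighbours.items() if t in standard_tags}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Hypothetical cosine similarities; the slides do not report the actual values.
similarity_graph = {
    "folksonomy": {"tagging": 0.61, "plurality": 0.22, "social": 0.35, "ontology": 0.40},
}
standard = {"tagging", "plurality", "social", "ontology"}
parent = attach_jargon("folksonomy", similarity_graph, standard)
```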

14 Experimental Evaluation (1/6)
• CiteULike
  – Crawling keywords: including “science”, “philosophy”, “research”
  – 30,769 rows of data
• Flickr
  – Crawling keyword: “fruit”
  – 18,555 rows of data
• Pre-processing operations were performed to clean up the datasets
  – For the Flickr dataset, only one record was kept for each user
  – Tags called “no-tag” (a system-generated tag for empty tag fields) were removed
  – Objects with only one tag were removed

15 Experimental Evaluation (2/6)
• Parameter thresholds
  – Minimum support: 0.02%
  – Minimum confidence: 0.8
  – Minimum cosine similarity: 0.2
• 24,025 rules were obtained from CiteULike at 0.02% minimum support, 0.2 cosine similarity and 0.8 confidence
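To give a sense of how low this support threshold is, the arithmetic below converts it to an absolute count. It assumes the threshold is applied to all 30,769 crawled CiteULike rows and that each row counts as one record; the slides do not state the record count after pre-processing, so the result is only indicative. Here n(t_a, t_c) denotes the number of records in which both tags occur.

```latex
\[
0.02\% \times 30{,}769 = 0.0002 \times 30{,}769 \approx 6.15
\quad\Longrightarrow\quad n(t_a, t_c) \ge 7 .
\]
```

Under that assumption, a pair of tags needs to co-occur only about seven times to pass the support filter, which is why the confidence and cosine thresholds carry most of the noise filtering.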

16 Experimental Evaluation (3/6)
• Measure how far the extracted ontological structure helps to improve the results of certain tasks
  – Multi-dimensional view, cataloguing and indexing
• Multi-dimensional view
  – Results retrieved with the keyword “fruit” were organized into several dimensions

17 Experimental Evaluation (4/6)
• Multi-dimensional view
  – Comparing the structure to an ontology (sei.cmu.edu)

18 Experimental Evaluation (5/6)
• Multi-dimensional view
  – Comparing the structure to a cluster result (flickr.com)

19 Experimental Evaluation (6/6)
• Cataloguing and indexing
  – The catalogues were evaluated manually
  – Compound and jargon terms were observed to be appropriately incorporated
• In total, 1540 terms were incorporated into the ontological structure
  – 35.65%: standard terms
  – 64%: non-standard terms (including 36.17% compound and 28.18% jargon terms)

20 Conclusion and Future Work
• Mapping terms to the WordNet ontology is not enough to find the relationships among them
  – WordNet does not cover special domain vocabulary and cannot reflect changes in usage
  – In CTS, many of the tags are jargon or compound terms
• Association rules were applied to find semantically related tags
• Ontological structures could be enriched and deepened using larger tag datasets and more specialized semantic lexical resources
• Representing the extracted ontologies on the web using RDF and SPARQL will enable integration with other web services

