June 19-21, 2006, WMS'06, Chania, Crete: Design and Evaluation of Semantic Similarity Measures for Concepts Stemming from the Same or Different Ontologies


1 Design and Evaluation of Semantic Similarity Measures for Concepts Stemming from the Same or Different Ontologies
Euripides G.M. Petrakis, Giannis Varelas, Angelos Hliaoutakis, Paraskevi Raftopoulou

2 Semantic Similarity
• Relates to computing the conceptual similarity between terms that are not necessarily lexically similar, e.g., “car”-“automobile”-“vehicle”, “drug”-“medicine”
• A tool for making knowledge commonly understandable in applications such as information retrieval (IR) and information communication in general

3 Methodology
• Terms from different communicating sources are represented by ontologies
• Map two terms to an ontology and compute their relationship in that ontology
• Terms from different ontologies: discover linguistic relationships or affinities between terms in the different ontologies

4 Contributions
• We investigate several semantic similarity methods and evaluate their performance: http://www.intelligence.tuc.gr/similarity
• We propose a novel semantic similarity measure for comparing concepts from different ontologies

5 Ontologies
• Tools of information representation on a subject
• Hierarchical categorization of terms from general to most specific, e.g., object → artifact → construction → stadium
• Domain ontologies represent knowledge of a domain, e.g., the MeSH medical ontology
• General ontologies represent common-sense knowledge about the world, e.g., WordNet

6 WordNet
• A vocabulary and a thesaurus offering a hierarchical categorization of natural language terms
  - More than 100,000 terms
• Nouns, verbs, adjectives and adverbs are grouped into synonym sets (synsets)
• Synsets represent terms or concepts with similar meaning
  - e.g., stadium, bowl, arena, sports stadium: a large structure for open-air sports or entertainments

7 WordNet Hierarchies
• The synsets are also organized into senses
  - Senses: different meanings of the same term
• The synsets are related to other synsets higher or lower in the hierarchy by different types of relationships, e.g.,
  - Hyponym/Hypernym (Is-A relationships)
  - Meronym/Holonym (Part-Of relationships)
• Nine noun and several verb Is-A hierarchies

8 A Fragment of the WordNet Is-A Hierarchy (figure)

9 MeSH
• MeSH: an ontology of medical and biological terms by the U.S. National Library of Medicine (NLM)
• Organized in Is-A hierarchies
  - More than 15 taxonomies, more than 22,000 terms
• No Part-Of relationships
• The terms are organized into synsets called “entry terms”

10 A Fragment of the MeSH Is-A Hierarchy (figure)

11 Semantic Similarity Methods
• Map terms to an ontology and compute their relationship in that ontology
• Four main categories of methods:
  - Edge counting: path length between terms
  - Information content: a function of the terms' probability of occurrence in a corpus
  - Feature based: similarity between the terms' properties (e.g., definitions) or based on their relationships to other similar terms
  - Hybrid: combine the above ideas
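
As a rough illustration of the edge counting category, the sketch below counts Is-A edges between two terms in a toy taxonomy. The chain object → artifact → construction → stadium comes from the Ontologies slide; the `instrumentality` branch and the names `PARENT`, `ancestors` and `path_length` are assumptions made for the example, not WordNet data or the authors' code.

```python
# Toy Is-A taxonomy stored as child -> parent (illustrative, not real WordNet data).
PARENT = {
    "artifact": "object",
    "construction": "artifact",
    "stadium": "construction",
    "instrumentality": "artifact",   # assumed branch for the example
    "conveyance": "instrumentality",
}


def ancestors(term):
    """Return the chain [term, parent, ..., root]."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def path_length(a, b):
    """Number of Is-A edges on the path from a to b through their lowest common subsumer."""
    steps_from_b = {t: i for i, t in enumerate(ancestors(b))}
    for steps_from_a, t in enumerate(ancestors(a)):
        if t in steps_from_b:  # first shared ancestor is the lowest common subsumer
            return steps_from_a + steps_from_b[t]
    raise ValueError("terms do not share a hierarchy")


print(path_length("stadium", "conveyance"))
```

A shorter path means more closely related terms; the individual edge counting methods in the evaluation differ in how they turn this length (and the depth of the terms) into a similarity score.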

12 Example
• The edge counting distance between “conveyance” and “ceramic” is 2
• An information content method would associate the two terms with their common subsumer and with their probabilities of occurrence in a corpus
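
To make the information content idea concrete, here is a minimal sketch of two well-known scores built on the common subsumer: a Resnik-style score (the information content of the lowest common subsumer) and a Lin-style normalized score. The slide does not give these formulas, and the probabilities, the shared parent `instrumentality`, and all function names are hypothetical values chosen for illustration.

```python
import math

# Hypothetical corpus probabilities p(c); made-up numbers, not measured values.
PROB = {
    "object": 1.0,
    "artifact": 0.4,
    "instrumentality": 0.1,
    "conveyance": 0.02,
    "ceramic": 0.01,
}

# Assumed toy hierarchy in which both terms share one parent (path length 2).
PARENT = {
    "artifact": "object",
    "instrumentality": "artifact",
    "conveyance": "instrumentality",
    "ceramic": "instrumentality",
}


def ancestors(term):
    """Return the chain [term, parent, ..., root]."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def information_content(concept):
    """IC(c) = -log p(c): rare concepts carry more information."""
    return -math.log(PROB[concept])


def lowest_common_subsumer(a, b):
    seen = set(ancestors(a))
    for t in ancestors(b):  # walk upward from b; first hit is the lowest shared ancestor
        if t in seen:
            return t
    raise ValueError("terms do not share a hierarchy")


def resnik(a, b):
    """Resnik-style similarity: IC of the lowest common subsumer."""
    return information_content(lowest_common_subsumer(a, b))


def lin(a, b):
    """Lin-style similarity: shared IC normalized by the IC of both terms."""
    return 2 * resnik(a, b) / (information_content(a) + information_content(b))


print(resnik("conveyance", "ceramic"), lin("conveyance", "ceramic"))
```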

13 X-Similarity
• Relies on matching between synsets and term description sets
• A, B: synsets or term description sets
• Do the same with all Is-A and Part-Of relationships and take their maximum
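
The transcript does not show the matching formula itself, so the sketch below assumes a Jaccard-style overlap, |A ∩ B| / |A ∪ B|, as the set matching step, and then takes the maximum over relationship types as the last bullet describes. The function names and the example term sets are hypothetical; this is not the authors' implementation.

```python
def jaccard(a, b):
    """Overlap of two term sets: |A ∩ B| / |A ∪ B| (assumed matching function)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def neighborhood_similarity(neigh_a, neigh_b):
    """Match the neighbor sets per relationship type and keep the best score.

    neigh_a, neigh_b: dict mapping a relationship name (e.g. "is_a", "part_of")
    to the set of terms related to the concept through that relationship.
    """
    shared_relations = neigh_a.keys() & neigh_b.keys()
    return max(
        (jaccard(neigh_a[r], neigh_b[r]) for r in shared_relations),
        default=0.0,
    )


# Hypothetical neighbor sets for two concepts from different ontologies.
concept_a = {"is_a": {"disease", "disorder", "condition"}, "part_of": {"thyroid"}}
concept_b = {"is_a": {"disease", "disorder"}, "part_of": {"gland"}}

print(neighborhood_similarity(concept_a, concept_b))
```

Because the comparison only needs sets of terms, it works the same way whether the two concepts come from one ontology or from two different ones.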

14 Example
WordNet term: “Hypothyroidism”
  - hypothyroidism: An underactive thyroid gland; a glandular disorder resulting from insufficient production of thyroid hormones.
  - Hypothyroidism: glandular disease, disorder, condition, state; myxedema, cretinism
MeSH term: “Hyperthyroidism”
  - hyperthyroidism: Hypersecretion of Thyroid Hormones from Thyroid Gland. Elevated levels of thyroid hormones increase Basal Metabolic Rate.
  - Hyperthyroidism: disease, thyroid, Endocrine System Diseases, diseases; thyrotoxicosis, thyrotoxicoses
• S(Hypothyroidism, Hyperthyroidism) = 0.387

15 Evaluation
• The most popular methods are evaluated
• All methods are applied to a set of 38 term pairs
• Their similarity values are correlated with scores obtained from humans
• The higher the correlation of a method, the better the method
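
The evaluation step can be sketched as a correlation between a method's scores and the human scores for the same term pairs. The slide does not say which correlation coefficient was used; the sketch assumes Pearson's, and the score lists are invented placeholders rather than the study's data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Placeholder scores for four term pairs (not the 38 pairs used in the study).
human_scores = [0.9, 0.7, 0.3, 0.1]
method_scores = [0.8, 0.75, 0.4, 0.2]

print(round(pearson(human_scores, method_scores), 3))
```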

16 Evaluation on WordNet
Method            Type            Correlation
Rada 1989         Edge Counting   0.59
Wu 1994           Edge Counting   0.74
Li 2003           Edge Counting   0.82
Leacock 1998      Edge Counting   0.82
Richardson 1994   Edge Counting   0.63
Resnik 1999       Info. Content   0.79
Lin 1993          Info. Content   0.82
Lord 2003         Info. Content   0.79
Jiang 1998        Info. Content   0.83
Tversky 1977      Feature Based   0.73
X-Similarity      Feature Based   0.74
Rodriguez 2003    Hybrid          0.71

17 Evaluation on MeSH
Method            Type            Correlation
Rada 1989         Edge Counting   0.50
Wu 1994           Edge Counting   0.67
Li 2003           Edge Counting   0.70
Leacock 1998      Edge Counting   0.74
Richardson 1994   Edge Counting   0.64
Resnik 1999       Info. Content   0.71
Lin 1993          Info. Content   0.72
Lord 2003         Info. Content   0.70
Jiang 1998        Info. Content   0.71
Tversky 1977      Feature Based   0.67
X-Similarity      Feature Based   0.71
Rodriguez 2003    Hybrid          0.71

18 Cross Ontology Measures
• We used 40 MeSH term pairs
• One of the terms is also a WordNet term
• We measured correlation with scores obtained from experts
Method         Type            Correlation
X-Similarity   Feature Based   0.70
Rodriguez      Hybrid          0.55

19 Comments
• Edge counting and information content methods work by exploiting structure information
• Good methods take the position of the terms into account
  - Higher similarity for terms which are close together but lower in the hierarchy, e.g., [Li et al. 2003]
• X-Similarity performs at least as well as other feature-based methods
• X-Similarity outperforms other cross-ontology methods
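
The point about term position can be illustrated with the commonly cited form of the Li et al. 2003 measure, which multiplies a factor that decays with path length by a factor that grows with the depth of the common subsumer. The slides do not reproduce the formula; the form exp(-αl) · tanh(βh) and the defaults α = 0.2, β = 0.6 used below are taken as assumptions for this sketch.

```python
import math


def li_similarity(path_len, subsumer_depth, alpha=0.2, beta=0.6):
    """Assumed Li-style score: exp(-alpha * l) * tanh(beta * h).

    path_len (l): number of edges between the two terms.
    subsumer_depth (h): depth of their lowest common subsumer in the hierarchy.
    """
    return math.exp(-alpha * path_len) * math.tanh(beta * subsumer_depth)


# Same path length, but the second pair sits lower (deeper) in the hierarchy.
shallow_pair = li_similarity(path_len=2, subsumer_depth=1)
deep_pair = li_similarity(path_len=2, subsumer_depth=5)

print(shallow_pair < deep_pair)
```

With the path length held fixed, the deeper pair scores higher, which matches the slide's remark that terms close together but lower in the hierarchy should be judged more similar.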

20 Conclusions
• Semantic similarity methods approximate the human notion of similarity, reaching a correlation of up to 0.83 (83%)
• Cross-ontology similarity is a difficult problem that requires further investigation
• Ongoing work: integrating semantic similarity into the IntelliSearch information retrieval system for Web documents: http://www.intelligence.tuc.gr/intellisearch

21 Try our system on the Web: http://www.intelligence.tuc.gr/similarity
Implementation: Giannis Varelas, Spyros Argyropoulos

22 www.intelligence.tuc.gr/similarity

