An Empirical Study of Instance-Based Ontology Mapping Antoine Isaac, Lourens van der Meij, Stefan Schlobach, Shenghui Wang funded by NWO Vrije.

An Empirical Study of Instance-Based Ontology Mapping. Antoine Isaac, Lourens van der Meij, Stefan Schlobach, Shenghui Wang (funded by NWO). Vrije Universiteit Amsterdam; Koninklijke Bibliotheek, Den Haag; Max Planck Institute, Nijmegen.

ISWC 2007. Metamotivation: ontology mapping in practice
- Based on real problems in the host institution, the Dutch Royal Library
- Task-driven: annotation support, merging of thesauri
- Real thesauri (100 years of tradition): really messy, conceptually difficult, inexpressive
- Generic solutions to specific questions and tasks
- Using Semantic Web standards (SKOSification)

Overview: Use case, Instance-based mapping, Evaluation, Experiments, Results, Conclusions

The alignment task: context
- National Library of the Netherlands (KB)
- 2 main collections:
  - Legal Deposit: all Dutch printed books
  - Scientific Collections: history, language…
- Each collection is described (indexed) by its own thesaurus

A need for thesaurus mapping
The KB wants to:
- (Scenario 1) possibly discontinue one of the two annotation and retrieval methods;
- (Scenario 2) possibly merge the thesauri.
We explore mapping for two tasks:
- (Task 1) in a single, new, or merged retrieval system, find books annotated with the old system, using mappings;
- (Task 2) propose candidate terms for a merged thesaurus.
We use the doubly annotated corpus to calculate instance-based mappings.

Calculating mappings using concept extensions: how much are they related?

Standard approach (Jaccard)
Use a co-occurrence measure to calculate the similarity between two concepts, e.g. B and G, over the set of books in the library.
[Figure: elements of B, elements of G, and their joint elements]
- Similarity = 5/9 ≈ 56 % (overlap, e.g. degree of greenness)
- Similarity = 1/7 ≈ 14 % (overlap, e.g. degree of greenness)
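As a minimal sketch of the overlap measure on this slide, the following computes Jaccard similarity over two concept extensions represented as Python sets of book identifiers. The function name and the toy book ids are illustrative assumptions; only the 5/9 ratio comes from the slide.

```python
def jaccard(ext_a, ext_b):
    """Jaccard overlap between two concept extensions (sets of book ids)."""
    union = ext_a | ext_b
    if not union:
        # Two empty extensions share nothing; treat the similarity as 0.
        return 0.0
    return len(ext_a & ext_b) / len(union)


# Hypothetical toy data: 7 books under B, 7 under G, 5 of them shared.
books_b = set(range(1, 8))   # books 1..7
books_g = set(range(3, 10))  # books 3..9

similarity = jaccard(books_b, books_g)  # 5 shared books out of 9 in the union
```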

Issues with this measure (sparse data)
What is more reliable?
- Jacc = 18/21 = 86 %
- Jacc = 1/1 = 100 %
The second mapping is worse despite its higher score: it rests on a single book b, with b_B = {MemberOfParliament} and b_G = {Cricket}.
We need more reliable measures, or thresholds (at least n doubly annotated books), or?
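The threshold idea can be sketched as follows. This assumes the threshold counts the books shared by both concepts, which is one reading of "at least n doubly annotated books"; the function name, the default of 10, and the toy sets are illustrative.

```python
def jaccard_with_threshold(ext_a, ext_b, min_shared=10):
    """Jaccard overlap, set to 0 when fewer than min_shared books co-occur.

    Assumption: the threshold applies to the number of shared books.
    """
    shared = len(ext_a & ext_b)
    if shared == 0 or shared < min_shared:
        return 0.0
    return shared / len(ext_a | ext_b)


# Hypothetical extensions reproducing the two ratios on the slide.
well_supported = (set(range(0, 20)), set(range(2, 21)))  # 18 shared, 21 in union
single_book = ({"b"}, {"b"})                             # 1 shared, 1 in union

score_supported = jaccard_with_threshold(*well_supported)               # 18/21
score_single = jaccard_with_threshold(*single_book)                     # filtered out
score_single_raw = jaccard_with_threshold(*single_book, min_shared=0)   # 1/1
```

With the threshold in place the single-book pair no longer outranks the well-supported pair, which is the effect the slide asks for.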

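Issue with measure (hierarchy)
Consider a hierarchy.
[Figure: concepts B and G over the set of books in the library, non-hierarchical versus hierarchical, with the elements of B']
Jacc(B', G) = 1/2 = 50 % versus Jacc(B', G) = 2/6 = 33 %, depending on whether the hierarchy is taken into account.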
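One way to bring the hierarchy into the extension, sketched under the assumption that B' means "the books of B plus the books of every concept below B". The slide does not spell out the construction, so the helper name, the dictionary layout, and the toy data are hypothetical.

```python
def hierarchical_extension(concept, direct_books, narrower):
    """Books annotated with a concept or with any of its descendants.

    direct_books: dict mapping concept -> set of book ids annotated with it
    narrower:     dict mapping concept -> iterable of narrower concepts
    """
    books, seen, stack = set(), set(), [concept]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against cycles in a messy thesaurus
        seen.add(current)
        books |= direct_books.get(current, set())
        stack.extend(narrower.get(current, ()))
    return books


# Hypothetical thesaurus fragment: B has one narrower concept B1.
direct_books = {"B": {1}, "B1": {2, 3}}
narrower = {"B": ["B1"]}

flat = direct_books["B"]                                      # {1}
extended = hierarchical_extension("B", direct_books, narrower)  # {1, 2, 3}
```

The extended set can then be passed to any of the similarity measures in place of the flat extension.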

An empirical study of instance-based OM
We experimented with three dimensions:
- Similarity measure: Jaccard, Corrected Jaccard, Pointwise Mutual Information, Log Likelihood Ratio, Information Gain
- Threshold: 0, 10
- Hierarchy: yes, no
Why only two thresholds? Because of evaluation costs!

Evaluation: building a gold standard
[Figure: GTT and Brinkman concepts, with the possible thesaurus relations (~ SKOS)]

User evaluation statistics
- 3 evaluators with 1500 evaluations
- 90 % agreement under ONLYEQ
- If one evaluator says "equivalent", 73 % of the other evaluators say the same
- Comparing two evaluators, agreement is best for equivalence, followed by "No Link", "Narrower than", and "Broader than", all at or above 50 % agreement; "Related To" has 35 % agreement
- There are correlations between evaluators: for example, Ev1 and Ev2 agreed with each other on "no link" much more often than they did with Ev3

Evaluation interpretation: what is a good mapping?
This is use-case specific. We considered:
- ONLYEQ: only an "equivalent" answer counts as correct
- NOTREL: EQ, BT, NT count as correct
- ALL: EQ, BT, NT, RT count as correct
ONLYEQ ⊆ NOTREL ⊆ ALL
The obvious question is: do they produce the same results?
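The three interpretations can be read as nested sets of accepted relation labels. The sketch below encodes them that way; the dictionary and function names are illustrative, and the string labels simply reuse the abbreviations from the slide.

```python
# Relations an evaluator can assign that count as "correct" under each scheme.
CORRECT_RELATIONS = {
    "ONLYEQ": {"EQ"},
    "NOTREL": {"EQ", "BT", "NT"},
    "ALL": {"EQ", "BT", "NT", "RT"},
}


def is_correct(judgement, scheme):
    """True if an evaluator's judgement counts as correct under the scheme."""
    return judgement in CORRECT_RELATIONS[scheme]


# A mapping judged "broader than" is wrong under ONLYEQ but right otherwise.
bt_results = {scheme: is_correct("BT", scheme) for scheme in CORRECT_RELATIONS}
```

Because each set contains the previous one, precision under ALL can never be lower than under NOTREL, and likewise for NOTREL versus ONLYEQ.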

Evaluation: validity of the (different) methods
The answer is: yes. All evaluations produce the same results (on different scales).

A remark about evaluation
- The use of mappings is strongly task-dependent: Scenario 1 (legacy data / annotation support) and Scenario 2 (thesaurus merging) require different mappings.
- Our evaluation is useful (correct) for Scenario 2 (intensional).
- Scenario 1 can be evaluated differently (e.g. cross-validation on test data).
- See our paper at the Cultural Heritage Workshop.

Experiments: setup, data and thesauri
We calculated 5 different similarity measures, with threshold 0 and 10, and with hierarchy yes or no, for pairs of GTT and Brinkman concepts, based on books with double annotations.

Experiments: result calculation
- Average precision at similarity position i: $P_i = N_{\mathrm{good},i} / N_i$ (take the first i mappings and return the percentage of correct ones).
- Example: at the 798th mapping the curve reads 86 %, meaning that 86 % of the first 798 mappings were correct.
- Recall is estimated based on lexical mappings.
- F-measure is calculated as usual.
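The precision-at-rank and F-measure calculations can be sketched as below. The list of booleans stands in for evaluator verdicts on mappings sorted by decreasing similarity; that representation, the function names, and the toy verdicts are assumptions, and the F-measure is taken to be the usual harmonic mean of precision and recall.

```python
def precision_at(ranked_correct, i):
    """Fraction of correct mappings among the first i of a ranked list."""
    if i <= 0:
        raise ValueError("i must be a positive rank")
    top = ranked_correct[:i]
    if not top:
        return 0.0
    return sum(top) / len(top)


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical verdicts for the four highest-ranked mappings.
verdicts = [True, True, False, True]
p4 = precision_at(verdicts, 4)   # 3 correct out of 4
f = f_measure(0.8, 0.8)          # equal precision and recall give the same F
```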

Results: three research questions
1. What is the influence of the choice of threshold?
2. What is the influence of hierarchical information?
3. What is the best measure and setting for instance-based mapping?

What is the influence of the choice of threshold?
- A threshold is needed for Jaccard
- A threshold is NOT needed for LLR

What is the influence of hierarchical information?
Results are inconclusive!

Best measure and setting for instance-based mapping?
[Figure label: 10]
We have two winners! The corrected Jaccard measures.

Conclusion
Summary:
- About 80 % precision at an estimated 80 % recall
- Simple measures perform better if a statistical correction is applied (a threshold or an explicit statistical correction)
- Hierarchical aspects unresolved
- Some measures are really unsuited
Future work:
- Generalize the results
- Other use cases, web directories, …
- Study other measures

Thank you.

Similarity measures: formulae
- Jaccard:
- Corrected Jaccard: assigns a smaller score to less frequently co-occurring annotations.
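For reference, the two measures named on this slide can be written out as follows. The Jaccard formula is the standard one, with $S^i$ and $T^i$ denoting the sets of books annotated with concepts $S$ and $T$ (notation introduced here). The corrected form is an assumption: it is one way to penalize small overlaps, with a constant $c$ that the slide does not give; $c = 0.8$ is the value recalled from the accompanying paper and should be checked against it.

```latex
% Standard Jaccard overlap of two concept extensions
J(S, T) = \frac{|S^i \cap T^i|}{|S^i \cup T^i|}

% Assumed corrected form: shrinks the score when few books co-occur
J_{\mathrm{corr}}(S, T) =
  \frac{\sqrt{|S^i \cap T^i| \cdot \bigl(|S^i \cap T^i| - c\bigr)}}{|S^i \cup T^i|},
  \qquad 0 < c < 1
```

Under this form a pair sharing one book out of one scores $\sqrt{1 - c}$ rather than 1, while pairs with many shared books stay close to their plain Jaccard value.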

Information-theoretic measures
- Pointwise Mutual Information: measures the reduction of uncertainty that the annotation with one concept yields for the annotation with another concept. Disadvantage: inadequate for sparse data.
- Log Likelihood Ratio:
- Information Gain: the difference in entropy; it determines the attribute that best distinguishes between positive and negative examples.
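A small sketch can make the sparse-data remark concrete. It computes pointwise mutual information from co-occurrence counts and, as an assumption about what "Log Likelihood Ratio" refers to, the G² statistic over a 2×2 contingency table; the slide's own formulas are not preserved, and all names and counts below are hypothetical.

```python
import math


def pmi(n_ab, n_a, n_b, n_total):
    """Pointwise mutual information (in bits) from annotation counts."""
    if n_ab == 0:
        return float("-inf")
    return math.log2(n_ab * n_total / (n_a * n_b))


def log_likelihood_ratio(n_ab, n_a, n_b, n_total):
    """G^2 statistic for the 2x2 table of concept co-occurrence (assumed form)."""
    table = [
        [n_ab, n_a - n_ab],
        [n_b - n_ab, n_total - n_a - n_b + n_ab],
    ]
    rows = [sum(row) for row in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            if observed > 0:  # empty cells contribute nothing
                g2 += observed * math.log(observed * n_total / (rows[i] * cols[j]))
    return 2 * g2


# Hypothetical counts in a corpus of 1000 books.
rare = (1, 1, 1, 1000)        # both concepts used once, on the same book
frequent = (50, 60, 60, 1000)  # both concepts used 60 times, 50 times together
```

On these counts PMI ranks the single-book pair above the well-supported pair, while the likelihood-ratio statistic ranks them the other way round, which fits the observation that LLR does not need a threshold.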