Very Large Cross-lingual Resources at OAEI 2008 Laura Hollink Véronique Malaisé Vrije Universiteit Amsterdam.

Task description
Mappings between three resources:
– GTAA: thesaurus of the Dutch Institute for Sound and Vision. 4 facets: Subject, Location, People, Name
– WordNet: lexical database
– DBPedia: ‘structured version of Wikipedia’
3 sets of mappings
exactMatch, broadMatch, narrowMatch
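
A minimal sketch of how one such mapping could be held in code, assuming a simple record per mapping. The `Mapping` class, its field names, and the `confidence` field are illustrative assumptions, not the format the track required. The `inverse` helper relies only on the SKOS definitions: exactMatch is symmetric, and broadMatch and narrowMatch are each other's inverse.

```python
from dataclasses import dataclass

# The three match types named on the slide.
RELATIONS = {"exactMatch", "broadMatch", "narrowMatch"}

# SKOS: exactMatch is symmetric; broadMatch and narrowMatch are inverses.
INVERSE = {
    "exactMatch": "exactMatch",
    "broadMatch": "narrowMatch",
    "narrowMatch": "broadMatch",
}


@dataclass(frozen=True)
class Mapping:
    """Hypothetical record for one mapping between two resources."""
    source: str              # e.g. "Location:Melbourne"
    target: str              # e.g. "synset-Melbourne-noun-1"
    relation: str            # one of RELATIONS
    confidence: float = 1.0  # assumed field; not specified on the slide

    def __post_init__(self):
        if self.relation not in RELATIONS:
            raise ValueError(f"unknown relation: {self.relation}")


def inverse(mapping: Mapping) -> Mapping:
    """Return the same mapping read in the opposite direction."""
    return Mapping(mapping.target, mapping.source,
                   INVERSE[mapping.relation], mapping.confidence)


m = Mapping("Location:Melbourne", "synset-Melbourne-noun-1", "exactMatch")
print(inverse(m))
```

Storing the direction explicitly matters because the same pair of resources appears in both directions (for example GTAA to WordNet and WordNet to GTAA).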

Rationale
Different languages
– Archive with GTAA metadata in Dutch only
– Broaden user group
– Integrated access to archives of other countries?
Large resources
– Disambiguation becomes a serious problem
Heterogeneous resources
– Different structure
– Weak or inconsistent structure
– Large parts have no counterparts; when to stop mapping?

Cross-Lingual
GTAA in Dutch:
– Preferred labels
– Alternative labels
– Scope notes
WordNet in English:
– Word-senses
– Glosses
DBPedia in English and (most of the time) Dutch:
– Titles
– Abstracts

Different Schemas
GTAA in SKOS
– skos:Concepts with pref- and altLabels
– Narrower/Broader relations between the concepts
WordNet
– Synsets with word-senses
– Hyponym relations between synsets
DBPedia
– Things with titles and abstracts
– Links to DBPedia categories, rdf:type links to YAGO classes
– Hierarchical structure of YAGO classes and categories

Results
One participant: DSSIM
Vocabulary    #Concepts in    #Mappings to
              vocabulary      WordNet    DBPedia    GTAA
WordNet       82,000          NA         28,974     2,405
DBPedia       2,700,000       28,974     NA         13,156
GTAA          160,000         2,405      13,156     NA
  Subject     3,…             …          …,363      NA
  People      97,000          82         2,238      NA
  Name        27,…            …          …,989      NA
  Location    14,…            …          …,566      NA

Evaluation Process
Precision
– GTAA-WordNet & GTAA-DBPedia: inspection of 100 mappings per GTAA facet
– WordNet-DBPedia: inspection of 100 mappings
– Correct or Incorrect or Narrow/Broader/Related
Recall
– GTAA-WordNet & GTAA-DBPedia: comparison to a reference alignment of 100 GTAA People and 100 GTAA Subjects
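
The two measures above can be sketched as follows. This is an illustration under stated assumptions, not the organisers' scoring script: the function names and judgement labels are hypothetical, and counting only "correct" judgements toward precision (so that Narrow/Broader/Related count as misses) is one possible reading of the slide.

```python
def sample_precision(judgements):
    """Fraction of manually inspected mappings judged 'correct'.

    `judgements` holds one label per inspected mapping, for example
    'correct', 'incorrect', 'narrow', 'broader' or 'related'.
    Assumption: only 'correct' counts as a hit.
    """
    if not judgements:
        raise ValueError("no judgements to score")
    hits = sum(1 for label in judgements if label == "correct")
    return hits / len(judgements)


def recall(found, reference):
    """Fraction of reference mappings that the matcher also returned.

    Both arguments are collections of (source, target) pairs.
    """
    reference = set(reference)
    if not reference:
        raise ValueError("empty reference alignment")
    return len(set(found) & reference) / len(reference)


# Toy data, not results from the track.
print(sample_precision(["correct", "correct", "related", "incorrect"]))
print(recall({("a", "x"), ("b", "y")}, {("a", "x"), ("c", "z")}))
```

Precision here is estimated from a sample, so it describes the inspected mappings only; recall is bounded by what the reference alignment covers (People and Subjects in this task).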

Pre-existing WN-DBPedia Mappings
Type links
– Air_New_Zealand wordnet-type synset-airline-noun-2
– No overlap between DSSIM mappings and wordnet-type links.
– DSSIM mappings of things with a wordnet-link performed worse than DSSIM mappings of things without a wordnet-link.
YAGO
– Most DBPedia "things" are instances in the YAGO ontology.
– DBPedia categories are classes in the YAGO ontology, subclasses of WordNet synsets.
– “Crazy”_Joe_Davola rdf:type FictionalCharacter
Overlap between DSSIM results and YAGO? We are looking (mainly) for exactMatches, not type links.

Conclusions
Link types other than exactMatch are also necessary:
– Subject:pausbezoeken -> List_of_pastoral_visits_of_Pope_John_Paul_II_outside_Italy
– Location:Venezuela -> synset-Venezuelan-noun-1
– Subject:Verdedigingswerken -> fortification

Conclusions
Context would help a lot.
GTAA facet information
– Person:GoghVincentvan -> synset-vacationing-noun-1
– Location:Harlem -> synset-hammer-noun-8
– Location:Melbourne -> synset-Melbourne-noun-1
Titles of DBPedia ‘referring pages’ used as alternative labels.
– Person:SummerGordon -> Sting_(musician)
But: no longer a generally applicable tool.
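
One way facet information could act as context is sketched below. Everything here is an assumption made for illustration: the `ALLOWED_KINDS` table, the coarse "kind" labels attached to each candidate, and the `filter_by_facet` function are hypothetical and are not part of DSSIM or of the track. The sketch only shows the idea that a Location concept should not end up on a synset that is not a place.

```python
# Hypothetical table: which coarse target kinds each facet prefix accepts.
ALLOWED_KINDS = {
    "Person": {"person"},
    "Location": {"place"},
}


def filter_by_facet(concept, candidates):
    """Drop candidate targets whose kind conflicts with the concept's facet.

    `concept` is written facet:label as on the slides.
    `candidates` is a list of (target, kind) pairs; the kind labels are
    assumed to be supplied by some earlier step, which is not shown.
    Facets without an entry in ALLOWED_KINDS are left unfiltered.
    """
    facet = concept.split(":", 1)[0]
    allowed = ALLOWED_KINDS.get(facet)
    if allowed is None:
        return list(candidates)
    return [(target, kind) for target, kind in candidates if kind in allowed]


# The kind labels below are invented for the example.
print(filter_by_facet("Location:Melbourne",
                      [("synset-Melbourne-noun-1", "place")]))
print(filter_by_facet("Location:Harlem",
                      [("synset-hammer-noun-8", "artifact")]))
```

A filter like this depends on knowing the GTAA facets in advance, which is the trade-off the slide points out: the matcher stops being generally applicable.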

Reflection on the evaluation process
Disambiguation of DBPedia/WordNet concepts is very hard, also for the evaluator.
– Subject:leguanen -> Iguana or Iguanidae?
– Multiple mappings are reasonable.
When is a mapping ‘Related’?
DBPedia disambiguation pages.
GTAA contains many Dutch-specific concepts
– Diogenes = TV program
Take into account the confidence measures

Input from you
What do OAEI participants think of this task?
How can we improve it?

Why Not?
Too much preprocessing required?
The three resources have different schemas?
Tool was not built for resources that large?
Tool was not built for multi-lingual matching?

Thank you!