Multilingual Ontologies – Standards and Technologies Gerhard Budin University of Vienna Chair, ISO/TC 37/SC 2 22nd APAN Meeting NUS, Singapore 20 July,

Multilingual Ontologies – Standards and Technologies Gerhard Budin University of Vienna Chair, ISO/TC 37/SC 2 22nd APAN Meeting NUS, Singapore 20 July, 2006

Outline
Problem description
Methods integration
Multi-layer data modeling
Multi-standard frameworks
Multiple representation languages
Interlinking and harmonization of standards and specifications
Interoperability frameworks
Integration of tools
Examples from risk management ontology engineering

Problem Description
1. There is (still) a communication gap between formalized knowledge representations such as ontologies and the users of the information and communication systems in which such ontologies are used, including at the user interface.
2. Although the Semantic Web has been designed primarily for machine-to-machine communication, we need seamless natural-language interaction workflows in (semantic) web services of any kind.
3. While the Semantic Web is (still) essentially monolingual and the international lingua franca is English, there is a growing need for multilingual ontology resources as well as ontology-based translation services that overcome communication barriers arising from cultural-linguistic differences, limited command of English, the need for high precision in communication, etc.

Need for integration of diverse methods
As expressed in standards and implemented in technologies, the following traditions increasingly merge:
– Ontology engineering standards, frameworks, technologies
    e.g. OWL (based on RDF), SKOS (also based on RDF) (W3C), DOLCE/SUMO, description logic, frame logic, unified logic, annotation
    Types of ontologies (e.g. domain, upper, application, and task ontologies)
    Editors such as Protégé, Altova, OntoEdit, various merging/annotation tools
– Translation engineering standards
    i.e. various paradigms in machine translation and computer-assisted translation (language-based, statistical MT, translation memories, patterns)
– Terminology and language engineering standards (as the prerequisite for and interface between ontology and translation)
    Terminology and lexical markup frameworks: TMF, LMF (ISO)
    Markup languages such as TBX (language industry + ISO)
    Lexical databases/linguistic ontologies: WordNet, Ontowordnet, EuroWordNet
    Linguistic enrichment of ontologies (e.g. FrameNet)
    Interaction mechanisms, translation of ontologies
    Integration of multilingual ontologies in machine translation processes

Diversity and interoperability
Strong diversity of lexico-terminological resources
– Data models, data structures + data semantics
– Diversity of semantic, linguistic/cultural complexity and semantic depth/richness
Diversity of user groups and their requirements
Sheer quantity of resources
Data interchange between organizations (within and across domains) as well as (distributed) data integration – early needs asking for immediate solutions
History of data modeling
History of interchange standards
History of semantic interoperability management

Need for multi-level modeling architectures

generic interoperability framework terminological interoperability

Developing the Terminology Markup Framework in order to cope with this complexity-diversity
Based on empirical studies and practical user-driven requirements analysis
Markup/representation/modeling: XML, XMLS, RDF, UML
Open standards strategy (ISO TC 37)
– ISO Data categories – meta-model element + semantics registry (RDF)
– ISO Terminology Markup Framework (TMF) – meta-model architecture and specifications (UML)
– ISO – Terminology Markup Language (XML)
    Instance for language industry: TBX Termbase Exchange Format (XML)
    Instance for lexicography/publishing: LexML ISO 1951
– Lexical Markup Framework (LMF) (UML)
– ISO 704 and ISO 1087 (foundational level)
– ISO (workflow and collaborative issues)
– Alignment with ISO 11179, W3C, OASIS, etc.

Introduction to TBX
TBX® stands for TermBase eXchange
TBX is a Terminological Markup Framework (TMF) markup language
– TMF is an ISO standard (16642)
TBX is consistent with ISO (MARTIF)
TBX is maintained by OSCAR (
The TBX specification is free
TBX serves the portability of resources across proprietary terminology management systems, as well as the interoperability of application-specific resources

TBX structure
A TBX file is an XML document
A TBX file consists of:
– A header that describes the file
– A set of entries, one per concept in the termbase
– For each concept, a set of terms, grouped by language, that designate the concept
A terminological concept entry (termEntry)
– Can be multilingual
– Can be monolingual
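The header/entry/language/term layout described above can be sketched in a few lines of Python. This is a minimal illustration, not a full TBX reader: the sample document is hypothetical and heavily simplified, and the function name `read_entries` is an assumption introduced here. The element names (`martifHeader`, `termEntry`, `langSet`, `tig`, `term`) follow the ones named on the slides.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical, simplified TBX-style document: one header, one concept entry,
# terms grouped by language.
SAMPLE_TBX = """<martif type="TBX" xml:lang="en">
  <martifHeader>
    <fileDesc><sourceDesc><p>Hypothetical sample termbase</p></sourceDesc></fileDesc>
  </martifHeader>
  <text><body>
    <termEntry id="c1">
      <langSet xml:lang="fr">
        <tig><term>copiste</term></tig>
        <tig><term>scribe</term></tig>
      </langSet>
      <langSet xml:lang="en">
        <tig><term>scribe</term></tig>
      </langSet>
    </termEntry>
  </body></text>
</martif>"""


def read_entries(tbx_text):
    """Return {entry id: {language code: [terms]}} for every termEntry."""
    root = ET.fromstring(tbx_text)
    entries = {}
    for entry in root.iter("termEntry"):
        by_lang = {}
        for lang_set in entry.findall("langSet"):
            lang = lang_set.get(XML_LANG)
            by_lang[lang] = [term.text for term in lang_set.iter("term")]
        entries[entry.get("id")] = by_lang
    return entries


entries = read_entries(SAMPLE_TBX)
```

An entry with a single `langSet` would be a monolingual concept entry; the one shown here is multilingual.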

TBX and Other Standards
(1) TBX and ISO (TMF)
(2) TBX and ISO (Data Categories)
(3) TBX and SKOS

1: TBX and ISO
TBX is a TML (Terminological Markup Language) of TMF (ISO 16642) (see Annex B)
TBX maps to the TMF meta-model
– A TBX file is a TDC (terminological data collection)
– martifHeader provides GI (global information)
– termEntry: TE (terminological entry)
– langSet: LS (language section)
– tig/ntig: TS (term section)
A TMF DCS (Data Category Selection) in TBX is in XCS (eXtensible Constraint Specification) format
TBX uses ISO for its XML style

TMF Metamodel
Terminological Data Collection (TDC)
Global Information (GI)
Complementary Information (CI)
Terminological (Concept) Entry/Entries (TE)
Language Section(s) (LS)
Term Section(s) (TS)
Term Component Section(s) (TCS)
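As a rough illustration, the metamodel components can be written down as nested Python dataclasses. The nesting used here (collection, then entry, then language section, then term section, then term component) is an assumption based on the three levels the deck describes later (entry level, language section level, term level); all class and field names are hypothetical and are not taken from the standard.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TermComponentSection:  # TCS: a part of a term (e.g. one word of a multiword term)
    text: str


@dataclass
class TermSection:  # TS: one term and its term-level data
    term: str
    components: List[TermComponentSection] = field(default_factory=list)


@dataclass
class LanguageSection:  # LS: all terms for one language
    lang: str
    terms: List[TermSection] = field(default_factory=list)


@dataclass
class TerminologicalEntry:  # TE: one concept
    entry_id: str
    languages: List[LanguageSection] = field(default_factory=list)


@dataclass
class TerminologicalDataCollection:  # TDC: the whole termbase
    global_info: Dict[str, str] = field(default_factory=dict)         # GI
    complementary_info: Dict[str, str] = field(default_factory=dict)  # CI
    entries: List[TerminologicalEntry] = field(default_factory=list)


def count_terms(tdc):
    """Count term sections across all entries and languages."""
    return sum(len(ls.terms) for te in tdc.entries for ls in te.languages)


tdc = TerminologicalDataCollection(
    global_info={"title": "Hypothetical sample termbase"},
    entries=[
        TerminologicalEntry(
            entry_id="c1",
            languages=[
                LanguageSection("fr", [TermSection("copiste"), TermSection("scribe")]),
                LanguageSection("en", [TermSection("scribe")]),
            ],
        )
    ],
)
term_total = count_terms(tdc)
```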

TMF and lexical resources
In general, a terminological resource is organized into concept entries, each of which includes one or more terms designating a particular concept
In general, a lexical resource is organized into lexical entries, each of which includes one or more senses of a particular lexical item (a word or phrase)
A concept entry containing multiple terms can be split into multiple lexical entries, one per term, and multiple lexical entries associated with the same concept can be combined into one concept entry
Link to Lexical Markup Framework (LMF)
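The split and merge between concept entries and lexical entries can be sketched as follows. This is a simplified illustration with hypothetical names and dictionary shapes: each lexical entry carries a list of senses, and a sense is represented only by the identifier of the concept it points to.

```python
from collections import defaultdict


def split_concept_entry(concept_id, terms):
    """Turn one concept entry into one lexical entry per term."""
    return [{"lemma": term, "senses": [concept_id]} for term in terms]


def merge_lexical_entries(lexical_entries):
    """Group lexical entries by the concept each sense points to."""
    concepts = defaultdict(list)
    for entry in lexical_entries:
        for concept_id in entry["senses"]:
            concepts[concept_id].append(entry["lemma"])
    return dict(concepts)


# Round trip: concept entry -> lexical entries -> concept entry.
lexical = split_concept_entry("c1", ["copiste", "scribe"])
merged = merge_lexical_entries(lexical)
```

A lexical item with several senses would appear under several concepts after merging, which is the asymmetry between the two organizing principles.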

2: TBX and ISO
All data categories in the default TBX DCS are taken from ISO
ISO is organized as an online registry and serves as a meta-ontology for resource modeling and for resource interoperability

3: TBX and SKOS
A typical concept entry will contain a subject field to specify the domain of the concept. However, the subject field is typically some kind of hierarchy that is flattened into a string within TBX.
SKOS makes it possible to represent the subject field hierarchy as a hierarchy and then create a link within TBX.
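One way to picture the step from a flattened subject field to a SKOS hierarchy is the sketch below. It assumes the flattened string uses "/" as a separator and invents a base URI for the concepts; both are hypothetical, as is the sample path. Only the SKOS namespace and the `prefLabel` and `broader` properties come from SKOS itself.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/subject/"  # hypothetical base URI for subject concepts


def slug(label):
    """Make a simple URI fragment from a label."""
    return label.strip().lower().replace(" ", "-")


def subject_field_to_skos(path, sep="/"):
    """Expand a flattened subject-field string into (subject, predicate, object) triples."""
    labels = [part.strip() for part in path.split(sep) if part.strip()]
    triples = []
    parent = None
    for label in labels:
        uri = BASE + slug(label)
        triples.append((uri, SKOS + "prefLabel", label))
        if parent is not None:
            # Each level points to the level above it.
            triples.append((uri, SKOS + "broader", parent))
        parent = uri
    return triples


triples = subject_field_to_skos("Restaurant/Menu Item/Soup")
```

The URI of the most specific concept (here the one for "Soup") is what a TBX entry could then link to instead of carrying the whole string.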

Simple Knowledge Organization System (SKOS)
SKOS is an area of work developing specifications and standards to support the use of knowledge organisation systems (KOS) such as thesauri, classification schemes, subject heading lists, taxonomies, other types of controlled vocabulary, and perhaps also terminologies and glossaries, within the framework of the Semantic Web.
- (Accessed on 3/17/06)

Sample SKOS Food Recipe Ingredient Restaurant Menu Item

Visual Representation of SKOS
[Diagram nodes: Food, Recipe, Ingredient, Restaurant, Menu Item, Appetizer, Entree, Salad, Soup, Grocery Store Item, Homemade Item]

GEvTerm Initiative
The food examples on the previous slides are taken from FooNaVar, a project of the GEvTerm Initiative.
The GEvTerm Initiative is a terminological database that has committed to being fully TBX and SKOS compliant.

C: Multilingual Thesaurus for Medieval Studies (MLTMS)
Imagine the ability to search across web resources using your native modern European language and find appropriate primary and secondary sources in Latin, French, Italian, German, Spanish, English, etc., based upon the meaning rather than the form of the search term.
Imagine having a tool that would enable you to search for a concept and be able to construct the forms it has taken historically as well as the ability to link outward for both evidence and argument.
Imagine a tool that would enable you to study the slippage of concept which is beyond naming.
Imagine having a tool that can deconstruct ontological orders asking for different kinds of readings.
(Accessed on 3/17/06)

Why did MLTMS use TBX?
– integration of terminological data from multiple sources;
– querying multiple termbases through a single user interface by passing data through a common intermediate format on a batch or dynamic basis;
– placing data on an FTP site for download by interested parties;
– peer review by colleagues of tentative entries
- (Accessed on 3/17/06)

MLTMS Sample
personnel
personne qui accomplit un travail copie ou d'écriture
copiste (entryTerm)
écrivain (synonym)
scribe (entryTerm)

MLTMS Sample (Rendered with XSLT)

TBX -> HTML
The last few slides have provided an example of rendering HTML from a TBX file. Here is a brief diagram of the process.
TBX -> (processed by) XSLT -> (results in) HTML
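The rendering step can be approximated without an XSLT processor. The sketch below swaps the XSLT stylesheet for a small Python function, since the Python standard library has no XSLT engine; it is therefore an illustration of the same TBX-to-HTML idea, not the stylesheet used in the project. The sample fragment and the function name `entry_to_html` are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical, simplified concept entry.
SAMPLE_ENTRY = """<termEntry id="c1">
  <langSet xml:lang="fr">
    <tig><term>copiste</term></tig>
    <tig><term>scribe</term></tig>
  </langSet>
</termEntry>"""


def entry_to_html(entry_xml):
    """Render one termEntry as an HTML definition list: language, then its terms."""
    entry = ET.fromstring(entry_xml)
    parts = ["<dl>"]
    for lang_set in entry.findall("langSet"):
        parts.append("<dt>%s</dt>" % escape(lang_set.get(XML_LANG, "")))
        for term in lang_set.iter("term"):
            parts.append("<dd>%s</dd>" % escape(term.text or ""))
    parts.append("</dl>")
    return "".join(parts)


html_out = entry_to_html(SAMPLE_ENTRY)
```

With a real XSLT processor the same mapping would live in template rules rather than in a loop, which keeps the presentation logic outside the application code.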

D: Other Standards
ISO and XCS, which defines a flavor of TBX, both provide a list of data element types
XMDR
RDF
OWL
Topic Maps/XTM

E: Tasks for TBX
Encourage translation technology vendors to implement TBX
Revise the specification
Compare ISO to XCS
Render TBX in RDF -> TBR for TBX-SKOS interoperability implementation
TBR -> OWL
TBX – TMX (translation memory exchange standard)
TBX in Machine Translation applications

XMDR Prototype Architecture: Initial Implemented Modules
[Diagram labels: Ontology Editor (Protege); OWL Ontology; MetadataValidator (defer), schema-driven syntax checker; Authentication Service (defer); MappingEngine (defer); Registry External Interface; RetrievalIndex; FullTextIndex; LogicBasedIndex; RegistryStore; WritableRegistryStore; Jena; Xerces; Java; Lucene; OWI KS; Racer; Kowari; Subversion. Legend: Generalization; Composition (tight ownership); Aggregation (loose ownership)]

OWL, RDF & XML Schema used to specify XMDR as UML used for Edition 2
[Diagram labels: UML; 11179 Metamodel; Relational Schema; Relational Metadata; OWL; XMDR Ontology & annotations; XMDRs; Relax NG Schema; XMDR XML Schema; RDF Spec; TRang; XML Schema Language spec; XML; Objects; Types & Cardinalities; What things go in own files?; Which property direction stored?; Sequential ordering of properties; Triples: binary labeled relationships]

XMDR XML schema provides a number of important benefits…
Schema specifies what is required as well as what is legal
Divides metadata into files conforming to XML schema
Normalizes data (à la relational: one fact in one place)
Facilitates XSLT transformations by reducing degrees of freedom to a canonical encoding within the RDF standard
Relax NG used to create and check XMDR-it schema
RNG validator enforces many OWL ontology constraints
TRang automatically translates into XML schema syntax

From texts and terminologies to ontologies
The MULTH-WIN Project as an example of methods integration:
Using the Risk scenario
– Termbase
    Export XML
    Domain Models – meta-models -> patterns
– Text corpus
    Term extraction – comparative testing ProTerm, MultiTerm Extract, MultiCorpora
    Aligning with termbase
    Convert to RDF
– Ontology import -> editor

Bornemisza

Terminological frame semantics
INTERVENTION (ACTOR(S), ACTIVITIES/PHASES): RISK DETECTING (PRE-EVENT)
-R-ASSESSMENT
-R-PERCEPTION (X is risk)
-EXPERIENCE (statistics, case studies)
-OBSERVATION (monitoring)
-METHOD
-SATELLITE
-PROGNOSES
-R-ANALYSIS
-R-FEATURES
-SITUATION/CONTEXT (danger/hazard)
-SIMULATION (course of events)
-PROBABILISTIC METHODS (safety)
-RELIABILITY
-R-IDENTIFICATION (DAMAGE)
-R-SOURCE
-DAMAGE CAUSE
-VULNERABILITY (DAMAGE TARGET)
-SUSCEPTIBILITY (capacity/people)

Terminological frame semantics
I. Pre-event
B. Public awareness and planning,
II. In-event:
C. Events and response
afflux/Hochwasser durch Aufstau
BE [[TYPE=flood], [PLACE=], [TIME=]], HAVE [CAUSE [[ORIGIN=], [NIEDERSCHLAG [TYPE=]], [STAU [TYPE= Aufstau]]], DAMAGE [TARGET=, SOURCE=, DEGREE=]], HAPPEN [STATES=, PROCESSES=]]
backwater/Rückstau
BE [[TYPE=flood], [PLACE=], [TIME=]], HAVE [CAUSE [[ORIGIN=], [NIEDERSCHLAG [TYPE=]], [STAU [TYPE= Rückstau]]], DAMAGE [TARGET=, SOURCE=, DEGREE=]], HAPPEN [STATES=, PROCESSES=]]
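The two frames above differ in a single slot, which becomes easy to see once they are written as nested dictionaries. The sketch below is one possible encoding, assuming that empty slots (such as `PLACE=`) can be represented as `None`; the function names are hypothetical, and the comparison assumes both frames share the same shape.

```python
def make_flood_frame(stau_type):
    """Build the flood frame from the slide; only the STAU type varies."""
    return {
        "BE": {"TYPE": "flood", "PLACE": None, "TIME": None},
        "HAVE": {
            "CAUSE": {
                "ORIGIN": None,
                "NIEDERSCHLAG": {"TYPE": None},
                "STAU": {"TYPE": stau_type},
            },
            "DAMAGE": {"TARGET": None, "SOURCE": None, "DEGREE": None},
        },
        "HAPPEN": {"STATES": None, "PROCESSES": None},
    }


def diff_frames(a, b, path=()):
    """List (slot path, value in a, value in b) for slots whose fillers differ.

    Assumes both frames have identical keys at every level.
    """
    diffs = []
    for key in a:
        if isinstance(a[key], dict):
            diffs.extend(diff_frames(a[key], b[key], path + (key,)))
        elif a[key] != b[key]:
            diffs.append((".".join(path + (key,)), a[key], b[key]))
    return diffs


afflux = make_flood_frame("Aufstau")
backwater = make_flood_frame("Rückstau")
differences = diff_frames(afflux, backwater)
```

In this encoding the distinguishing feature between the two terms is the filler of `HAVE.CAUSE.STAU.TYPE`, while all other slots stay open until an actual event instance fills them.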

Relationship modeling
[Diagram labels: disaster; general; rain; hail; snow; type; origin; cause; precipitation; man-made; natural; Stau; Aufstau `afflux`; Rückstau `backwater`; im Entwässerungssystem `drainage flood`]

TBX-SKOS interoperability
Differences
– XML vs. RDF (-> TBX will be turned into TBR)
– Inherent flexibility + open data modeling for a large variety of resources vs. traditional thesaurus data model as a default for a KOS (different scopes)
– TBX has documented use cases and mapping tools -> language industry standard
– Different semantics + vocabularies (12620 vs. thesaurus standard)
Commonalities
– Conceptual approach
– W3C
– Integrated applications in the future
Vocabulary mapping (RDF)

TMF Metamodel
Terminological Data Collection (TDC)
Global Information (GI)
Complementary Information (CI)
Terminological (Concept) Entry/Entries (TE)
Language Section(s) (LS)
Term Section(s) (TS)
Term Component Section(s) (TCS)

Term Entry Level (Level 1)
[Diagram labels: Terminological Data Collection (TDC); Global Information (GI); Complementary Information (CI); Terminological (Concept) Entry/Entries (TE); Concept-Related Dat-cats; Subject Field; Note; Definition; SourceID; Responsibility; Date; Transaction; Administrative Dat-cats; Notes; Concept System DatCats]

Language Section Level (Level 2)
[Diagram labels: Terminological Entry; Language Section(s) (LS) (LS * n …); Concept-Related Dat-cats; Note; Definition; SourceID; Responsibility; Date; Transaction; Language-Related Dat-cats; Notes; Administrative Dat-cats; xml:lang; Transfer-comment; Equivalence; Concept System Dat-cats]

Term Level (Level 3)
[Diagram labels: Language Section(s) (LS); Term Section(s) (TS) (TS * n …); Definition; Term-related DatCats (TRD); Term; Context; Note; SourceID; Responsibility; Date; Transaction; Notes; Concept-Related Dat-cats; Administrative DatCats; Transfer-comment]

SKOS Vocabulary
SKOS Core is a model for expressing the structure and content of concept schemes (thesauri, classification schemes, subject heading lists, taxonomies, terminologies, glossaries and other types of controlled vocabulary).
The SKOS Core Vocabulary is an application of the Resource Description Framework (RDF) that can be used to express a concept scheme as an RDF graph.
Using RDF allows data to be linked to and/or merged with other RDF data by semantic web applications.

SKOS Graphs

RDF Representation of SKOS Graph
[Graph node labels: milk by source animal; buffalo milk; cow milk; goat milk; sheep milk]

Mapping TBX/12620 DatCats to SKOS Vocabulary
TBX data categories (data element concepts in the sense of ISO/IEC ) contain instantiations of information that are expressed in SKOS using the SKOS Core vocabulary.
Interoperability (a cross-walk between the two standards) depends on mapping between the two systems.
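A cross-walk of this kind can be pictured as a lookup table from data category names to SKOS properties. The table below is purely illustrative: the pairings and the data category names on the left are assumptions made for this sketch, not the mapping defined by either standard. The SKOS properties on the right (`prefLabel`, `altLabel`, `definition`, `note`) do exist in the SKOS vocabulary.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical cross-walk: data category name -> SKOS property.
CROSSWALK = {
    "term": SKOS + "prefLabel",
    "synonym": SKOS + "altLabel",
    "definition": SKOS + "definition",
    "note": SKOS + "note",
}


def map_entry(concept_uri, datcats):
    """Map (data category, value) pairs to SKOS triples; report what has no mapping."""
    triples, unmapped = [], []
    for name, value in datcats:
        prop = CROSSWALK.get(name)
        if prop is None:
            unmapped.append(name)
        else:
            triples.append((concept_uri, prop, value))
    return triples, unmapped


triples, unmapped = map_entry(
    "http://example.org/concept/c1",  # hypothetical concept URI
    [("term", "copiste"), ("synonym", "écrivain"), ("transferComment", "example comment")],
)
```

Tracking the unmapped categories is the useful part in practice: it shows where the richer terminological model has no counterpart in the thesaurus-oriented one.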

Terminological knowledge engineering framework
1. meta modeling level: Terminology Markup Framework (TMF), Lexical Markup Framework (LMF); UML, RDF, XML
   DC selection; ISO in RDF is the meta-ontology; DC selection; subsets, value sets
2. modeling level: Terminology Markup Languages, e.g. TBX; Lexical Markup, e.g. LexML (ISO 1951); XML
3. resource level: Terminological resources, Lexical resources; Markup, Annotation, Alignment, Analysis, Term Extraction
4. workflow level: ISO and other workflow specifications govern resource management processes (logistics, organizational measures, maintenance, quality assurance, etc.)

Framework integration ontology translation engineering framework interoperable integrative multilingual applications e.g. MULTH-WIN project terminology and language engineering framework

Thank you for your attention
Acknowledgements:
Slides 9-27 together with Alan Melby, Sue Ellen Wright
Slides Bruce Bargmeyer
Slide 33 Entry from the WordNet database
Slides WIN project (Rothkegel)
Slides Flood Risk Project
Slides 44 WIN, 45: ThesShow Legat/Stallbaumer
46: GEMET, 47: Bandholtz, 48/49: Gangemi, 56-59: Miles/SKOS