MLIF: The Multi Lingual Information Framework ISO WD 24616 Samuel CRUZ-LARA LORIA / INRIA, France.


Pisa, 26/09/2008 ISO TC37/SC4 MLIF (S. Cruz-Lara) (c) ISO 2008 All rights reserved

Outline: Introduction; MLIF [ISO WD 24616]; Ongoing activities; Actions; Date Extension Request; Conclusion

Introduction

The “Multi Lingual Information Framework” (MLIF), ISO AWI (TC37/SC4 WG3)
Nasredine Semmar (CEA, France), WG3 Convenor
Samuel Cruz-Lara (LORIA / INRIA, France), MLIF Project Leader

New Work Item Proposal

ISO TC37/SC4 Meeting, Beijing, China, August 2006
ISO AWI 24616

Scope

This standard aims to propose a specification platform for a computer-oriented representation of multilingual data within a large variety of applications such as translation memories, localization, computer-aided translation, multimedia, or electronic document management.

As with the “Terminological Markup Framework” [ISO 16642] used in terminology, MLIF will introduce a metamodel in combination with chosen data categories that will be integrated within the TC37 Data Category Registry [ISO/DIS ] in order to allow the description of any specific domain.

The standard will thus provide a way to validate any instance of this metamodel, as well as interoperability principles with numerous translation and localization standards.

Purpose and Justification

The extremely fast pace of technological development in Communication and Information Technologies, and in natural language processing in particular, makes the question of standardization particularly acute. The issues related to this standardization are of an industrial, economic and cultural nature.

Controlling the interoperability between the industrial standards currently used for localization (XLIFF), translation memory (TMX), or any other Multi Lingual Markup Language (ML2) is a major objective for a coherent and global management of these data.

MLIF could be associated with several multimedia standards such as MPEG-4 [ISO/IEC 14496], MPEG-7 [ISO/IEC 15938], and W3C SMIL, in order to handle multilingual data within multimedia applications such as interactive TV, video conferencing, subtitling, karaoke and accessibility.

MLIF may also be used in cultural heritage related activities such as digital museums, e-learning and electronic document management.

As with the “Terminological Markup Framework” (TMF) used in terminology, MLIF will introduce a metamodel in combination with chosen data categories.

These data categories will be derived as a subset of a Data Category Registry (DCR) [ISO/DIS ], in order to ensure interoperability between several multilingual applications and corpora.

A Data Category Specification (DCS) will define, in combination with the metamodel, the various constraints that apply to a given domain-specific information structure or interchange format. A DCS and a metamodel represent the organization of an individual application and the organization of a specific domain.

MLIF should be considered as a unified conceptual representation of multilingual content. MLIF is not meant to replace or to compete with any existing standard such as TMX or XLIFF.

Rather, MLIF is being designed with the objective of providing a common conceptual model and a platform allowing interoperability among several translation and localization standards and, by extension, the tools that rely on them.

The main asset of MLIF is interoperability, which allows experts to gather various tools and representations related to multilingual data under the same conceptual unit. In addition, MLIF will make it possible to evaluate and compare these multilingual resources and tools.

All XML elements will be described using RelaxNG [ISO/IEC ] with the help of ODD, the creation and documentation language for XML schemas proposed by the TEI (Text Encoding Initiative). This follows a recent decision taken by the World Wide Web Consortium.

MLIF will also provide a variety of filters allowing transformations between several MLIF-compatible formats, as well as several automatic validation tools. The adoption of this new work item proposal may make it possible to instantiate the business plan proposal of TC 37/SC4 with a view to creating a work specification on multilingual content (WG 3).

MLIF ISO WD 24616

Introduction

The scope of research and development in localization and translation memory (TM) processes is very large, and numerous independent groups are working on these aspects, such as LISA, OASIS, W3C, ISO, etc.

Under the guidance of the above-mentioned groups, many formats have been developed. Some of the major formats of specific interest for localization and TM are TMX (LISA/OSCAR) and XLIFF (OASIS).

All these formats share many requirements, irrespective of the differences in their final output. For example, all of them aim at being user-friendly and easy to learn, and at reusing existing databases or knowledge.

All these formats work well in the specific field they are designed for, but they lack the synergy that would make them interoperable when one type of information is used in a slightly different context, which gives rise to the fear of competition between them.

Scope

As with the “Terminological Markup Framework” TMF [ISO 16642] in terminology, MLIF will introduce a metamodel in combination with chosen data categories [ISO/DIS ], as a means of ensuring interoperability between several multilingual applications and corpora.

MLIF deals with multilingual corpora, multilingual fragments, and the translation relations between them.

In each domain where MLIF can be used, we may consider a specific granularity of segmentation and description, built on MAF [ISO CD 24611], SynAF [ISO CD 24615], TMF [ISO 16642] or LAF [ISO CD 24612] for, respectively, morphological description, syntactical annotation, terminological description, and linguistic annotation.

MLIF will thus describe elementary linguistic segments (e.g. sentence, syntactical component, word, …).

Supporting the construction and the interoperability of localization and “Translation Memory” (TM) resources, MLIF deals with the description of a metamodel for multilingual content.

MLIF will not propose a closed list of description features. Rather, it will provide a list of Data Categories, which is much easier to update and extend. This list represents a point of reference for multilingual information in the context of various application scenarios.

Normative References

ISO Computer applications in terminology -- Terminological Markup Framework.
ISO CD Syntactical Annotation Framework.
ISO/DIS Computer applications in terminology -- Data categories.
ISO/IEC 639-1, Information technology - ISO 639:1988, Code for the representation of names of languages.
ISO Code for the representation of names and languages - Part 2: Alpha-3 code.
ISO 8601 Data elements and interchange formats - Information interchange - Representation of dates and times.

Starting point of MLIF

MLIF promotes the use of a common framework for the future development of several different formats: TMX, XLIFF, … MLIF can be considered as a parent for all these formats, since all of them deal with multilingual data expressed in the form of segments or text units. They can all be stored, manipulated and translated in a similar manner.

MLIF Metamodel

Multi Lingual Data Collection (MLDC): Represents a collection of data containing global information and several multilingual units.

Global Information (GI): Represents technical and administrative information applying to the entire data collection. Example: title of the data collection, revision history, …

Multi Lingual Component (MULTI): Represents a unique multilingual entry.

Mono Lingual Component (MONO): Part of a multilingual component containing information related to one language.

Segment Component (SEG): The textual content itself (may be “decorated” with several attributes, as in SynAF [ISO CD 24615]).

History Component: A generic component that traces modifications made to the component it is anchored to (i.e. versioning).
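The containment hierarchy described on these slides (MLDC holding one GI and several MULTI, each MULTI holding MONO components, each MONO holding SEG components) can be sketched as plain data classes. This is only an illustration of the nesting: the class names, field names, and the XML element names `MLDC`, `GI`, `MULTI`, `MONO`, `SEG` and `History` are assumptions borrowed from the slide abbreviations, not the element names defined by ISO WD 24616.

```python
from dataclasses import dataclass, field
from typing import List, Optional
import xml.etree.ElementTree as ET

# Attribute keys for xml:lang and xml:id in ElementTree's namespace notation.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


@dataclass
class History:
    """History Component: one traced modification (hypothetical fields)."""
    transaction: str
    date: str
    author: Optional[str] = None


@dataclass
class Segment:
    """SEG: the textual content itself."""
    text: str


@dataclass
class MonoLingual:
    """MONO: the part of a multilingual entry for one language."""
    language: str
    segments: List[Segment] = field(default_factory=list)
    history: List[History] = field(default_factory=list)


@dataclass
class MultiLingual:
    """MULTI: one multilingual entry."""
    identifier: str
    monos: List[MonoLingual] = field(default_factory=list)


@dataclass
class GlobalInformation:
    """GI: administrative information for the whole collection."""
    source_language: str
    domain: Optional[str] = None


@dataclass
class DataCollection:
    """MLDC: global information plus several multilingual units."""
    info: GlobalInformation
    units: List[MultiLingual] = field(default_factory=list)


def to_xml(mldc: DataCollection) -> ET.Element:
    """Serialize the nesting to XML using illustrative element names."""
    root = ET.Element("MLDC")
    gi = ET.SubElement(root, "GI")
    ET.SubElement(gi, "sourceLanguage").text = mldc.info.source_language
    if mldc.info.domain is not None:
        ET.SubElement(gi, "domain").text = mldc.info.domain
    for multi in mldc.units:
        multi_el = ET.SubElement(root, "MULTI", {XML_ID: multi.identifier})
        for mono in multi.monos:
            mono_el = ET.SubElement(multi_el, "MONO", {XML_LANG: mono.language})
            for entry in mono.history:
                attrs = {"transaction": entry.transaction, "date": entry.date}
                if entry.author is not None:
                    attrs["author"] = entry.author
                ET.SubElement(mono_el, "History", attrs)
            for seg in mono.segments:
                ET.SubElement(mono_el, "SEG").text = seg.text
    return root
```

A collection with one entry in two languages would be built as `DataCollection(GlobalInformation("en"), [MultiLingual("u1", [MonoLingual("en", [Segment("Hello")]), MonoLingual("fr", [Segment("Bonjour")])])])` and passed to `to_xml`.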

In order to provide a richer description of the linguistic content, the MLIF metamodel allows other metamodels to be anchored to it, such as MAF (morphological description), SynAF (syntactical annotation), TMF (terminological description), LAF (linguistic annotation), or any other metamodel based on ISO


Data Categories for MLIF

All models have a very similar hierarchical structure, but they use different terms and different methods of storing the metadata relevant to them. MLIF provides a generic structure that can establish a basic foundation for all other models.
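As an illustration of this shared hierarchy, the sketch below reads a small TMX fragment and maps its `tu` / `tuv` / `seg` nesting onto generic MULTI / MONO / SEG records. The TMX element and attribute names (`tu`, `tuv`, `seg`, `tuid`, `xml:lang`) come from TMX 1.4; the sample document is trimmed (a real TMX header carries more required attributes), and the output dictionary keys and the function name `tmx_to_generic` are assumptions made for this example, not part of MLIF.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Trimmed TMX sample: the header is reduced to a single attribute for brevity.
TMX_SAMPLE = """<tmx version="1.4">
  <header srclang="en"/>
  <body>
    <tu tuid="greeting">
      <tuv xml:lang="en"><seg>Hello</seg></tuv>
      <tuv xml:lang="fr"><seg>Bonjour</seg></tuv>
    </tu>
  </body>
</tmx>"""


def tmx_to_generic(tmx_text):
    """Map TMX translation units onto MULTI / MONO / SEG style records."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):  # one translation unit ~ one MULTI
        monos = []
        for tuv in tu.findall("tuv"):  # one language variant ~ one MONO
            segments = ["".join(seg.itertext()) for seg in tuv.findall("seg")]
            monos.append({"language": tuv.get(XML_LANG), "segments": segments})
        units.append({"identifier": tu.get("tuid"), "monos": monos})
    return units


units = tmx_to_generic(TMX_SAMPLE)
```

The same three-level shape could be produced from an XLIFF file by a different reader, which is the sense in which a generic structure can sit underneath several formats.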


Global Information (GI)

/domain/: Specifies the domain on which the MLDC is dependent.
/project/: Specifies a project within the domain on which the MLDC is dependent.
/source/: “A complete citation of the bibliographic information pertaining to a document or other resource.” [ISO12620] Reference to a resource from which the present resource is derived.
/sourceType/: “In multilingual and translation-oriented language resource or terminology management, the kind of text used to document the selection of lexical or terminological equivalents, collocations, and the like.” [ISO12620] “Both parallel and background texts serve as sources for information used in documenting multilingual terminology entries.” [ISO12620]
/sourceLanguage/: “In a translation-oriented language resource or terminology database, the language that is taken as the language in which the original text is written.” [ISO12620]
/note/: An optional descriptor providing further information on any part of the content.

Multilingual Component (MULTI)

/identifier/: A unique name [source: IMDI_Source_Tag]. Dublin Core equivalent: DC:Identifier [source: IMDI_Source_Tag]. XML equivalent “xml:id” [source:
/class/: A hierarchical high level description of the component it is anchored to.
/subclass/: A hierarchical low level description of the component it is anchored to.
/note/: An optional descriptor providing further information on any part of the content.

Monolingual Component (MONO)

/languageIdentifier/: A unique identifier in a language resource entry that indicates the name of a language. [source: ISO12620] XML equivalent “xml:lang”.
/languageLevel/: Specifies the language level of the unique language identifier associated with the monolingual component (e.g. adults, children, scientist, slang, …).
/identifier/: A unique name [source: IMDI_Source_Tag]. Dublin Core equivalent: DC:Identifier [source: IMDI_Source_Tag]. XML equivalent “xml:id” [source:
/class/: A hierarchical high level description of the component it is anchored to.
/subclass/: A hierarchical low level description of the component it is anchored to.
/xlink/: A data category refinement composed of several data categories. It is used to identify a name or a resource:
/uri/: can be represented by an xlink:href attribute.
/type/: can be represented by an xlink:type attribute.
/label/: can be represented by an xlink:label attribute.
/title/: can be represented by an xlink:title attribute.
/from/: can be represented by an xlink:from attribute.
/to/: can be represented by an xlink:to attribute.

Segment Component (SEG)

/identifier/: A unique name [source: IMDI_Source_Tag]. Dublin Core equivalent: DC:Identifier [source: IMDI_Source_Tag]. XML equivalent “xml:id” [source:
/class/: A hierarchical high level description of the component it is anchored to.
/subclass/: A hierarchical low level description of the component it is anchored to.
/xlink/: A data category refinement composed of several data categories. It is used to identify a name or a resource:
/uri/: can be represented by an xlink:href attribute.
/type/: can be represented by an xlink:type attribute.
/label/: can be represented by an xlink:label attribute.
/title/: can be represented by an xlink:title attribute.
/from/: can be represented by an xlink:from attribute.
/to/: can be represented by an xlink:to attribute.
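The correspondence between the /xlink/ data categories and XLink attributes listed on this slide can be written down as a small lookup table. The sketch below is an assumption-laden illustration: the namespace URI is the standard XLink one, but the helper name `xlink_attributes`, the element name `SEG`, and the example URI are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard XLink namespace URI.
XLINK_NS = "http://www.w3.org/1999/xlink"

# Data category name -> local name of the XLink attribute, as listed on the slide.
XLINK_ATTRIBUTE = {
    "uri": "href",
    "type": "type",
    "label": "label",
    "title": "title",
    "from": "from",
    "to": "to",
}


def xlink_attributes(categories):
    """Translate /xlink/ data categories into namespaced XLink attributes."""
    attrs = {}
    for name, value in categories.items():
        if name not in XLINK_ATTRIBUTE:
            raise ValueError(f"not an /xlink/ data category: {name!r}")
        attrs[f"{{{XLINK_NS}}}{XLINK_ATTRIBUTE[name]}"] = value
    return attrs


# Hypothetical segment element that points at an external resource.
seg = ET.Element(
    "SEG",
    xlink_attributes({"uri": "http://example.org/resource", "type": "simple"}),
)
seg.text = "Hello"
```

Only /uri/ changes name in the mapping (to `href`); the other five categories keep their names as XLink attribute names.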

HistoryComponent

The HistoryComponent is a generic component that traces modifications made to the component it is anchored to (e.g., creation, modification, validation). It can be anchored onto any component of the metamodel. In the MLIF metamodel, the HistoryComponent may be anchored to the GlobalInformation or to the MonoLingualComponent.

In the GlobalInformation component, the HistoryComponent keeps all information related to any modification of the context or of the domain. In the MonoLingualComponent, the HistoryComponent keeps track of every evolution or enhancement of the content.

/transaction/: One of the steps involved in the creation, approval, and use of a specific component (approval, check, exportation, importation, input, modification, origination, standardization, userAccess, withdrawal).
/date/: A date. The date is encoded according to a profile of [ISO8601] as described in [W3CDTF] and follows the YYYY-MM-DD format.
/author/: The person responsible for the creation of the content.
/note/: An optional descriptor providing further information on any part of the content.
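A history entry can be checked against the two constraints stated here: /transaction/ must be one of the listed values, and /date/ must follow the YYYY-MM-DD format. The following is a minimal sketch under those assumptions; the function names and the dictionary layout are hypothetical and are not taken from the working draft.

```python
import re
from datetime import date

# Transaction values as enumerated on the slide.
TRANSACTIONS = {
    "approval", "check", "exportation", "importation", "input",
    "modification", "origination", "standardization", "userAccess",
    "withdrawal",
}

# Strict YYYY-MM-DD: four-digit year, two-digit month, two-digit day.
_DATE_PATTERN = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def parse_history_date(text):
    """Parse a YYYY-MM-DD string, rejecting other layouts and impossible dates."""
    match = _DATE_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"date must be YYYY-MM-DD, got {text!r}")
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)  # raises ValueError for e.g. 2008-02-30


def make_history(transaction, when, author=None, note=None):
    """Build a validated history record (hypothetical dictionary layout)."""
    if transaction not in TRANSACTIONS:
        raise ValueError(f"unknown transaction: {transaction!r}")
    return {
        "transaction": transaction,
        "date": parse_history_date(when).isoformat(),
        "author": author,
        "note": note,
    }
```

The regular expression is applied before constructing the date so that loosely formatted input such as `2008-9-26` is rejected rather than silently accepted.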

MLIF example 1

MLIF example 2

MLIF example 3

Ongoing Activities

MLIF can be used in e-learning, interactive television programs and any other application having a user interface. It may be very helpful for future interactive television broadcasting. Within ITEA’s “Jules Verne” and “Passepartout” projects, we have identified several potential implementations of MLIF in association with interactive TV and multimedia.

The association of MLIF and multimedia offers ample opportunity to give value to different languages and cultures, as is the case in Europe and Asia.

Within the framework of the ITEA “Passepartout” project, we have experimented with some basic scenarios using XMT (“eXtensible MPEG4 Textual format”) and W3C SMIL (“Synchronized Multimedia Integration Language”).

We are currently collaborating with the W3C SYMM (SYnchronized MultiMedia) Working Group:
SMIL (Synchronized Multimedia Integration Language)
SMILText: new text container element
Multilinguality (yes)
Linguistic granularity (no)

SMILText

This module defines new functionality for SMIL 3.0. It extends the media types available for SMIL, but does not alter any other existing functionality from SMIL 2.1 or earlier versions.
Editors: Dick Bulterman (CWI, The Netherlands), Sjoerd Mullender (CWI, The Netherlands), Samuel Cruz-Lara (LORIA / INRIA, France)
TEXT IS BEAUTIFUL !

The SMILText modules provide a text container element with an explicit content model for defining in-line text, and a set of additional elements and attributes to control explicit in-line text rendering.

Since the SMILText elements and attributes are defined in a series of modules, designers of other markup languages can reuse these modules when they need to include a simple form of timed text functionality in their language.

SMIL Localization Roundtrip

Ongoing activities

We have recently joined four new projects where MLIF should be used:
ITEA2 SEMbySEM (INRIA - S. Cruz-Lara)
ITEA2 METAVERSE1 (INRIA - S. Cruz-Lara)
FP7-ICT MEDAR (CEA - N. Semmar)
ANR WEBCROSSLING (CEA - N. Semmar)

SEMbySEM

This project will provide an innovative and comprehensive Semantic Services Management System, based on a common open infrastructure, to allow the management of tomorrow's mixed systems (made of thousands of elementary software and hardware components), with facilities to build ad-hoc dynamic visualisations of the managed systems for information, management and SLA verification purposes.

To do so, it will make extensive use of semantic description with the help of a « new standard » that it will provide.
Keywords: Software Engineering, UML, OMG (SBVR, MOF, KDM, …)
Countries: Finland, France, and Turkey
INRIA’s contribution: Multilingual Ontologies

METAVERSE1

The Metaverse1 (global standards among real and virtual worlds) project will provide a standardized global framework that enables interoperability between virtual worlds (for example Second Life, World of Warcraft, IMVU, Google Earth and many others) and the real world (sensors, actuators, vision and rendering, social and welfare systems, banking, insurance, travel, real estate and many others).

The ‘Metaverse for all’ will receive special attention, aiming at the e-Inclusion of minorities in society.
Countries: Belgium, France, Germany, Greece, Israel, Luxembourg, The Netherlands, and Spain
INRIA’s contribution: Standardise the management and the representation of multilingual textual data

Pisa, 26/09/2008 ISO TC37/SC4 MLIF (S. Cruz-Lara) (c) ISO 2008 All rights reserved 76 FP7-ICT MEDAR Supporting the development of tools and resources in Machine Translation and MultiLingual Information Retrieval on the basis of other partners technologies and open source code CEA’s contribution: Multilingual Information Retrieval

ANR WebCrossling
Developing a Machine Translation prototype based on MultiLingual Information Retrieval technology
CEA’s contribution: Multilingual Information Retrieval

Actions

Actions
MLIF is an ISO “Working Draft” [ISO WD 24616].
A Committee of Experts for MLIF has been constituted.
The proposal we have just presented needs comments, remarks, … so it will shortly be sent to the Committee of Experts.

MLIF Experts Committee
Member bodies
Annelies Kesting - NEN (The Netherlands)
Bettina Seitl - ON (Austria)
Young-Shik Kang - KATS (Republic of Korea)
Marketa Jindrakova - CNI (Czech Republic)
Roberto Ravaglia - UNI (Italy)
Surayuth Boonmatat - TISI (Thailand)
Toni Hittema - AFNOR (France)

MLIF Experts Committee
Experts
Dewi Bryn Jones - Canolfan Bedwyr (UK)
Elena Montiel - UPM (Spain)
Elsa Sklavounou - SYSTRAN (France)
Emmanuel Planas - Lingua et Machina (France)
Felix Sasaki - W3C
Gerhard Budin - University of Vienna (Austria)
Guadalupe Aguado de Cea - UPM (Spain)
Harry Bunt - Tilburg University (The Netherlands)

MLIF Experts Committee
Experts
Key-Sun Choi - KORTERM/KAIST (Korea)
Kiyong Lee - Korea University (Korea)
Laurent Romary - Max Planck Digital Library (Germany), INRIA (France)
Mourad Amine - Université de Montréal (Canada)
Nasredine Semmar - CEA (France)
Nicoletta Calzolari - ILC-CNR (Italy)

MLIF Experts Committee
Experts
Samuel Cruz-Lara - LORIA / INRIA (France)
Stéphane Albucher - Business Objects (France)
Wim Peters - University of Sheffield (UK)
Yves Savourel - ENLASO (USA)
Julien Ducret - SAFARI Consulting (France)

Actions
The basis of the discussion must be the new proposal: Metamodel & Data Categories
Do we need to modify them?
Do we need to take any new aspects into account?
How can we progress? Use cases!

Actions
Use Cases
- Interoperability
- Publishing
- Linguistic Granularity
- Segmentation
- Related Standards
- Multimedia
- Multilingual Ontologies
- …

Interoperability
TMX: “the sentence contains different formatting information”

Interoperability

Interoperability TMX & MLIF

Publishing

Interoperability & Publishing

Linguistic Properties
Sentence, word, lemma, POS, …
Time-related issues?

Linguistic Properties Example by: Nasredine Semmar

Segmentation
How will segmentation issues be taken into account by MLIF?

Related Standards
TEI (Text Encoding Initiative)
All XML elements will be described using RelaxNG [ISO ] with the help of ODD, the language proposed by the TEI for creating and documenting XML schemas.

Related Standards
W3C ITS (Internationalization Tag Set)
ITS is a set of rules, expressed in elements, that provide information on how parts of a given DTD or XML Schema are related to specific internationalization & localization properties.
Should ITS be used inside MLIF (as ITS may be used in SMILText)?
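As a rough illustration of what "rules expressed in elements" means, the sketch below applies one ITS 1.0 translate rule to a small document using Python's standard library. The sample document (`doc`, `p`, `code`) is made up for this example, and rewriting the selector as a relative path is a workaround for ElementTree's limited XPath support, not part of ITS itself.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical document: only the its:rules block follows ITS 1.0 syntax;
# the surrounding element names are invented for this sketch.
DOC = """
<doc xmlns:its="http://www.w3.org/2005/11/its">
  <its:rules version="1.0">
    <its:translateRule selector="//code" translate="no"/>
  </its:rules>
  <p>Press <code>Ctrl+S</code> to save.</p>
</doc>
"""

root = ET.fromstring(DOC)

non_translatable = []
for rule in root.iter(f"{{{ITS_NS}}}translateRule"):
    if rule.get("translate") == "no":
        # ElementTree supports only a subset of XPath, so the absolute
        # selector "//code" is rewritten as the relative path ".//code".
        selector = "." + rule.get("selector")
        non_translatable.extend(el.text for el in root.findall(selector))

print(non_translatable)
```

A full ITS processor would need a complete XPath engine and the other ITS data categories; this only shows how a rule element points at the parts of a document it applies to.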

Multimedia
W3C SMIL
SMILText
- Multilinguality (yes)
- Linguistic Granularity (no)

Multimedia

Timed, Multilingual Textual Descriptions
W3C SMIL Standardization
- Development of Interactive TV Profile
- Integration of Annotation Support
- Definition of Temporal Text Processing
ISO MLIF Standardization
- Development of MLIF format
- Development of a multilingual processing pipeline
- Interaction with SMIL and MPEG standards
[Diagram labels: multilingual DB; multilingual component; two monolingual components, each with a linguistic segment: “l’histoire du courage d’une femme pour démasquer un mystère” and “la historia de la valentía de una mujer para desenmascarar un misterio”]
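The nesting suggested by the diagram labels (a multilingual component grouping monolingual components, each carrying a linguistic segment) could be modelled roughly as follows. This is a minimal sketch only: the class names, the `lang` codes and the `text_for` helper are assumptions made for illustration and are not taken from the MLIF draft.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LinguisticSegment:
    """A piece of text in one language (hypothetical name)."""
    text: str


@dataclass
class MonolingualComponent:
    """All segments for a single language (hypothetical name)."""
    lang: str  # language code, assumed here to be "fr", "es", ...
    segments: List[LinguisticSegment] = field(default_factory=list)


@dataclass
class MultilingualComponent:
    """Groups monolingual components that describe the same content."""
    components: List[MonolingualComponent] = field(default_factory=list)

    def text_for(self, lang: str) -> Optional[str]:
        """Return the joined segment text for `lang`, or None if absent."""
        for comp in self.components:
            if comp.lang == lang:
                return " ".join(seg.text for seg in comp.segments)
        return None


# The two example segments from the slide, one per language.
description = MultilingualComponent([
    MonolingualComponent("fr", [LinguisticSegment(
        "l’histoire du courage d’une femme pour démasquer un mystère")]),
    MonolingualComponent("es", [LinguisticSegment(
        "la historia de la valentía de una mujer para desenmascarar un misterio")]),
])

print(description.text_for("es"))
```

Keeping one monolingual component per language makes it straightforward to add a further translation without touching the existing ones.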

Multilingual Ontologies
In what way can MLIF be related to Multilingual Ontologies?
ITEA2 SEMbySEM

Date Extension Request

MLIF [ISO CD 24616]
Current state: Warning
Urgent actions needed?

Date Extension Request
Current and proposed dates for: DIS, FDIS, IS

Conclusion

Conclusion
We suggest developing several use cases to test and validate the new metamodel and related data categories.
Each use case should be led by one or more members of the MLIF Experts Committee.

Conclusion
A new working draft of MLIF, taking into account all proposed use cases, should be submitted to the Committee of Experts soon.

Thank you!
Thank you for your attention
Any questions?
Mailing list
Web site