The Language Archive – Max Planck Institute for Psycholinguistics Nijmegen, The Netherlands NP 24622 CMDI-1 Metadata Component Framework New Standardization.

Presentation transcript:

The Language Archive – Max Planck Institute for Psycholinguistics Nijmegen, The Netherlands NP CMDI-1 Metadata Component Framework New Standardization Work within TC37/SC4

BACKGROUND
Metadata for language resources has been discussed for more than a decade. Many initiatives have tried to create the definitive metadata set. However, the landscape is still fragmented across DC/OLAC, IMDI, TEI, and many metadata sets limited to individual projects and corpora.

How to address the fragmentation?
The assumption is that there is broad agreement about:
– The limitations of existing metadata schemas (DC/OLAC, IMDI, TEI header):
   – Inflexible: too many (IMDI) or too few (OLAC) metadata elements
   – Limited interoperability (both semantic and syntactic)
   – Problematic (unfamiliar) terminology for some sub-communities
   – Limited support for LT tool & service descriptions
– The way to address this:
   – Explicitly defined schema & semantics
   – User/project/community-defined components

Current “new” European Initiatives
European projects and initiatives have been funded that offer the possibility to create a flexible metadata framework that can overcome this fragmentation.
CLARIN
– Aimed at the research world
– Work done for 3 years on CMDI
– Uses ISO-DCR for semantic interoperability
NaLiDa
– Uses CMDI to describe LRs in Germany
META-SHARE
– Also oriented towards industry
– Plans to use component strategy for metadata
These initiatives are interested in ISO standardization of the metadata component “model”.

Metadata Components
NOT a single new metadata schema, but rather a framework that allows the coexistence of many (community/researcher) defined schemas with explicit semantics for interoperability.
How does this work?
– Components are bundles of related metadata elements that describe an aspect of the resource
– A complete description of a resource may require several components
– Components may contain other components
– Components should be designed for reusability

Metadata Components
Let's describe a speech recording. The build slides add one component at a time:
– Technical Metadata: Sample frequency, Format, Size, …
– Language: Name, Id, …
– Actor: Sex, Language, Age, Name, …
– Location: Continent, Country, Address, …
– Project: Name, Contact, …

Metadata Components
Let's describe a speech recording.
– Components: Technical Metadata, Language, Actor, Location, Project (each a component definition XML)
– Metadata profile: profile definition XML
– Metadata schema: W3C XML Schema
– Metadata description: XML file
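The last step of this chain, the metadata description as an XML file, can be illustrated with a minimal Python sketch. It is not CMDI's actual serialization: the function `build_description`, the element names, and all sample values are invented for illustration, and only the component names come from the slide.

```python
import xml.etree.ElementTree as ET


def build_description(name, content):
    """Turn a nested dict into an XML tree.

    Dict values become nested components; any other value becomes
    a leaf metadata element with that value as text.
    (Hypothetical helper, not part of any CMDI tool.)
    """
    node = ET.Element(name)
    for key, value in content.items():
        if isinstance(value, dict):
            node.append(build_description(key, value))
        else:
            child = ET.SubElement(node, key)
            child.text = str(value)
    return node


# Invented sample values; the component names follow the slide.
recording = {
    "TechnicalMetadata": {"SampleFrequency": "44100", "Format": "audio/x-wav"},
    "Language": {"Name": "Dutch", "Id": "nld"},
    "Actor": {"Name": "Speaker 1", "Sex": "female"},
}

root = build_description("SpeechRecording", recording)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

In a real setup the resulting file would be validated against the W3C XML Schema derived from the profile; that step is left out here.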

Recursive Component Model
– Components can contain other components
– Enhances reusability
[Diagram labels: Actor, Language, Address, Location, Project]
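A recursive component model can be sketched in a few lines of Python. The nesting chosen here (Actor containing Language and Location, Project reusing Location) is an assumption made for illustration; the slide diagram does not survive in the transcript, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    """A single metadata element, e.g. 'Country'."""
    name: str


@dataclass
class Component:
    """A bundle of elements that may itself contain other components."""
    name: str
    elements: List[Element] = field(default_factory=list)
    components: List["Component"] = field(default_factory=list)

    def element_paths(self, prefix=""):
        """List every element reachable from this component as a path."""
        path = f"{prefix}/{self.name}" if prefix else self.name
        paths = [f"{path}/{e.name}" for e in self.elements]
        for child in self.components:
            paths.extend(child.element_paths(path))
        return paths


# Location is defined once and reused by both Actor and Project.
location = Component(
    "Location", [Element("Continent"), Element("Country"), Element("Address")]
)
language = Component("Language", [Element("Name"), Element("Id")])
actor = Component(
    "Actor",
    [Element("Sex"), Element("Age"), Element("Name")],
    [language, location],
)
project = Component("Project", [Element("Name"), Element("Contact")], [location])

for p in actor.element_paths():
    print(p)
```

Because `Location` is a single definition referenced from two places, a change to it reaches both `Actor` and `Project`, which is the reusability benefit the slide points to.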

Reusability & Explicit Semantics
Selecting metadata components & profiles from the registry:
– Component registry: Location (Country, Coordinates), Actor (BirthDate, MotherTongue), Text (Language, Title), Recording (CreationDate, Type), Dance (Name, Type)
– ISOcat concept registry: BirthDate dcr:1000, Country dcr:1001, Language dcr:1002
– DCMI concept registry: Title: dc:title
– The user selects appropriate components to create a new metadata profile, or selects an existing profile
– Semantic interoperability is partly solved via references to the ISO DCR or another registry
– ISOcat, or ISO DCR: implementation of the ISO standard for data categories, under control of the linguistic community (ISO TC37). Metadata is just one of the seven “thematic domains”
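How concept references help with semantic interoperability can be shown with a small Python sketch. The concept identifiers (dcr:1000, dcr:1001, dcr:1002, dc:title) are taken from the slide; the profile names, the element paths, and the link from Dance/Name to dc:title are assumptions made for illustration.

```python
from collections import defaultdict

# (profile, element path) -> concept reference.
# Profile names and paths are hypothetical; only the concept ids are from the slide.
CONCEPT_LINKS = {
    ("SpeechProfile", "Actor/BirthDate"): "dcr:1000",
    ("SpeechProfile", "Location/Country"): "dcr:1001",
    ("SpeechProfile", "Text/Language"): "dcr:1002",
    ("SpeechProfile", "Text/Title"): "dc:title",
    ("DanceProfile", "Performer/DateOfBirth"): "dcr:1000",
    ("DanceProfile", "Dance/Name"): "dc:title",
}


def elements_by_concept(links):
    """Group elements from all profiles under the concept they point to."""
    index = defaultdict(list)
    for (profile, path), concept in links.items():
        index[concept].append((profile, path))
    return dict(index)


def equivalent_elements(links, profile, path):
    """Find elements in any profile that share a concept with the given one."""
    concept = links.get((profile, path))
    if concept is None:
        return []
    return [
        key for key, other in links.items()
        if other == concept and key != (profile, path)
    ]


print(equivalent_elements(CONCEPT_LINKS, "SpeechProfile", "Actor/BirthDate"))
```

Two elements with different names in different profiles are treated as equivalent because they reference the same concept, so a search over both profiles does not need a shared schema.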

STANDARDIZATION ROADMAP
It is worthwhile to split the standardization work into a number of separate tasks, so that the work can be divided over multiple people and the different stakeholder projects can share the responsibility.

Standardization Roadmap
– Standardization of metadata DCs in the ISO-DCR. Metadata TDG, chair: Peter Wittenburg
– Defining requirements for a Metadata Component Model and standardizing the model itself. Project leader: Daan Broeder, CLARIN, NEN
– Standardizing a Component Specification Language. Project leader: Thorsten Trippel, NaLiDa, DIN
– Designing/specifying a number of recommended components for specific data types and usages. Project leader: Maria Gavrilidou, META-SHARE, ELOT

ISO/NP CMDI-1 Ballot Results
– Results known since votes cast, 3 not cast
– Main question: “agree to the addition of the proposed new work item to the program of work of the committee”: 16 in favor, 2 against, 2 abstentions
– Experts added: Germany (DIN), Italy (UNI), Japan (JISC), Netherlands (2) (NEN), Slovenia (SIST)

Objections to NW CA (SCC)
This standard is a great idea, but it's going to send a shockwave through every standard that has been proposed by ISO TC 37/SC 4. This simply has not been addressed in this proposal. We believe this warrants further consideration before proceeding with a draft standard on this subject. In particular, we expect to see (1) an enumeration of what changes this standard will entail for the other language resource management standards, and (2) a schedule for synchronizing the adoption of a CLARIN-style meta-model across the subcommittee's standards and current work items.

Objections to NW UK (BSI)
The UK are extremely concerned that the content will be duplicating several standards. Standards duplicated include the following:
– ISO 12620:2009 Terminology and other language content resources - Specification of data categories and management of a Data Category Register for language resources
and from a metadata point of view:
– ISO/IEC series: Information technology - Metadata registries (MDR)
– ISO/IEC series: Information technology - Metamodel framework for interoperability (MFI)

Possible Reference Implementation
The CLARIN project has been working on:
– Metadata component registry and editor
– Metadata editor
If a potential ISO standard for the component model and specification is not too different from CLARIN requirements and practice, these could serve as a reference implementation.

Component Specification Language
CLARIN component example: ISO 639-5 component
[Flattened XML listing; surviving values: clarin.eu:cr1:c_ iso | The list of ISO language families. Based on: | aav afa alg alv [...]]

ISO Recommended Components
Our CMDI experience is that we may well need to limit the proliferation of components.
Offer a set of standardized ones for use:
– with specific data types
– for specific purposes
We hope that input from more “industry”-oriented initiatives such as META-SHARE will motivate the design of stable, well-thought-out components and profiles that can be standardized.

METADATA COMPONENT MODEL
The Metadata Component Model is an abstract model. It should be independent of the component specification language and of any specific components. However, it may be necessary to define a “required” component. Although it is inspired by the CMDI work, it should not depend on any of CMDI's implementations. For the moment, however, there is no alternative to CMDI for the terminology describing the model or for the requirements analysis.

Requirements for the component model
– A component has attributes: name, multiplicity, concept-reference, …
– The component model should support recursion
– A component contains a number of metadata elements
– A component can refer to a number of resources or to other metadata components
– A component grammar has to be fully deterministic to avoid ambiguity
– A component can contain information about resource relations
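Several of these requirements (name, multiplicity, concept reference, recursion) can be made concrete with a minimal Python sketch. This is an illustration under stated assumptions, not the model proposed for standardization: the class names, the `min_occurs`/`max_occurs` encoding of multiplicity, and the dict-based instance format are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ElementSpec:
    name: str
    min_occurs: int = 1
    max_occurs: Optional[int] = 1      # None means unbounded
    concept_ref: Optional[str] = None  # e.g. a link into a concept registry


@dataclass
class ComponentSpec:
    name: str
    min_occurs: int = 1
    max_occurs: Optional[int] = 1
    concept_ref: Optional[str] = None
    elements: List[ElementSpec] = field(default_factory=list)
    components: List["ComponentSpec"] = field(default_factory=list)


def _check_count(label, count, spec):
    """Compare an occurrence count against the multiplicity of a spec."""
    if count < spec.min_occurs:
        return [f"{label}: expected at least {spec.min_occurs}, found {count}"]
    if spec.max_occurs is not None and count > spec.max_occurs:
        return [f"{label}: expected at most {spec.max_occurs}, found {count}"]
    return []


def validate(spec, instance, path=""):
    """Validate an instance (child name -> list of occurrences) recursively.

    Element occurrences are strings; component occurrences are dicts
    of the same shape. Returns a list of error messages.
    """
    here = f"{path}/{spec.name}" if path else spec.name
    errors = []
    known = {e.name for e in spec.elements} | {c.name for c in spec.components}
    for name in instance:
        if name not in known:
            errors.append(f"{here}: unexpected child '{name}'")
    for child in spec.elements:
        count = len(instance.get(child.name, []))
        errors.extend(_check_count(f"{here}/{child.name}", count, child))
    for child in spec.components:
        occurrences = instance.get(child.name, [])
        errors.extend(_check_count(f"{here}/{child.name}", len(occurrences), child))
        for occurrence in occurrences:
            errors.extend(validate(child, occurrence, here))
    return errors


language_spec = ComponentSpec(
    "Language", min_occurs=0, max_occurs=None,
    elements=[ElementSpec("Name", concept_ref="dcr:1002")],
)
actor_spec = ComponentSpec(
    "Actor", elements=[ElementSpec("Name")], components=[language_spec]
)

good = {"Name": ["Speaker 1"], "Language": [{"Name": ["Dutch"]}]}
bad = {"Language": [{}]}
print(validate(actor_spec, good))
print(validate(actor_spec, bad))
```

Rejecting unknown children and checking explicit multiplicities is one simple way to keep an instance unambiguous; a full treatment of the determinism requirement would need a proper grammar, which this sketch does not attempt.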

Terminology
We first need to define some terminology and link it to established (metadata) practices:
– Metadata schema
– Metadata profile (application profile)
– Metadata component (LMF)
– Metadata element
– Metadata element concept link (DCR)

Metadata Components, Profiles and Schemas
[Figure: the dependencies between metadata components, profiles and schemas]

Metadata Element Context
[Figure: the metadata element component relation]

Metadata Component Model
[Figure: a complete component model as used by the CLARIN CMDI implementation]

Existing (ISO) standards use
– Everything is based on the recent ISO PISA (persistent identifiers) and ISO 12620:2009 (DCR)
– Cool URIs for the concept links to ISOcat and ISOCDB
– All references to resources and metadata can contain PIDs

The Language Archive – Max Planck Institute for Psycholinguistics Nijmegen, The Netherlands Thank you for your attention