CLARIN Concept Registry: the new semantic registry Ineke Schuurman, Menzo Windhouwer, Oddrun Ohren, Daniel Zeman

Presentation transcript:

CLARIN Concept Registry: the new semantic registry Ineke Schuurman, Menzo Windhouwer, Oddrun Ohren, Daniel Zeman CLARIN Annual Conference October Wroclaw, Poland

Background
- In the tools and resources offered by CLARIN, many (de facto) standards are referred to, concerning both metadata and content data, but …
  - What do they mean?
  - Do they mean the same in the various tools and resources?
- CMDI (CLARIN Metadata Infrastructure)
  - Makes use of several registries

Clear metadata
The metadata provided in CMDI should be clear, i.e., unambiguous, in order to be useful. The building blocks of a CMDI profile (components, elements, attributes and values) should be clearly defined in a Concept registry or, for value ranges, a Vocabulary registry.
Registries used:
- Dublin Core
- ISOcat (in the past)
- CLARIN Concept Registry (CCR)
- CLAVAS (in the future, CMDI 1.2)

Drawbacks ISOcat
- Too much proliferation
  - everybody could enter stuff
  - entries quite often did not meet our standards
  - entries were out of control
- Too complex
  - data category type, data type
  - while several ‘problematic’ fields were not useful for our (CLARIN) purposes
In addition: last year ISOcat had to be migrated (a decision of the Registration Authority) and became static.
CLARIN therefore decided to look for another solution.

New approach: CCR
- CCR (CLARIN Concept Registry): SIMPLIFIED, CONTROLLED
- Simplified: several ‘fields’ not adopted from ISOcat
- Controlled: national CCR-coordinators will filter the input

Characteristics CCR
- Browser: accessible for everybody
- Editor: just for CCR-coordinators to insert new entries
- API: for tools, e.g., the Component Registry
Browser: easy search for
- Label (name)
- Definition
- Other text fields (example, history, …)

High quality concepts
Definitions should be ‘as general as possible, as specific as necessary’; therefore they should be
1. Unique
2. Meaningful
3. Reusable
4. Concise
5. Unambiguous
Characteristic 5 also has to be obeyed in the other fields!

Entries are ‘for ever’
Trust and reliability
- Issue in ISOcat!
- CCR: controlled
  - Definitions cannot be updated in a way that changes their meaning
  - Only typos etc. can be corrected
  - Preferred label (name) will not be changed
  - Instead a new entry will be created, the old one being expired if necessary
  - What can be added: examples, alternative labels, ‘higher’ status, notes, additional scheme and/or collection
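The update rules on this slide can be sketched as a small check. This is only an illustration, not the CCR editor's actual logic: the `Concept` class, its field names and the three outcome strings are hypothetical. Since a program cannot tell a typo fix from a change of meaning, any edit to a definition is simply routed to a coordinator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """Hypothetical, simplified view of a registry entry (not the real CCR model)."""
    pref_label: str
    definition: str
    examples: tuple = ()
    alt_labels: tuple = ()
    notes: tuple = ()


def classify_update(old: Concept, new: Concept) -> str:
    """Classify a proposed edit according to the rules on the slide.

    Returns one of three hypothetical outcomes:
      "new-entry"          - the preferred label changed, so a new entry is needed
      "coordinator-review" - the definition changed; a human must decide whether
                             it is a typo fix or a change of meaning
      "ok"                 - only additive fields (examples, notes, ...) differ
    """
    if new.pref_label != old.pref_label:
        return "new-entry"
    if new.definition != old.definition:
        return "coordinator-review"
    return "ok"
```

For instance, adding an alternative label to an entry is classified as "ok", whereas renaming the entry is classified as "new-entry".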

OpenSKOS
- Existing OpenSKOS infrastructure was adapted.
- Already available
  - API to access, create, share thesauri and vocabularies
  - Editor
- New
  - Concepts have a handle as Persistent Identifier
  - Faceted browser
  - Support for SKOS collections
  - Shibboleth-based access

From ISOcat to the CCR
Imported in CCR
- Entries used in CLARIN, e.g., in CMDI
- Entries recognized as belonging to a standard
- Entries selected by the national CCR coordinators
ISOcat: over 5000 entries
CCR: 3139 entries (for CLARIN)
We will perform a clean-up action before adding new entries, in order to remove duplicates, project- or language-specific definitions, empty definitions, misspellings (organization vs organisation), …

More details …

CCR Coordinators
- If you
  - need a new concept, or
  - want to change an existing concept
  contact your CCR coordinator:
- If no CCR coordinator is appointed for your country:
- For information on the CCR, the coordinators and the (upcoming) procedures see

Decision procedure
- All ‘ERIC countries’ appointed a CCR content coordinator
With respect to decisions about entries:
- All CCR content coordinators (or deputies) are involved
- We aim for unanimity
- If necessary we will vote
- A change in the CCR (like adding a specific new entry) is accepted when 70% or more of the coordinators represented agree
- All changes are recorded in the CLARIN CCR-section
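The 70% rule can be made concrete with a short sketch. The function name and its parameters are illustrative only; the slide does not say how abstentions are counted, so this version assumes every represented coordinator either agrees or does not.

```python
def change_accepted(votes_for: int, coordinators_represented: int,
                    threshold_percent: int = 70) -> bool:
    """Return True when at least `threshold_percent` of the represented
    coordinators agree with a proposed change.

    Integer arithmetic is used so that exactly 70% counts as accepted
    without any floating-point rounding issues.
    """
    if coordinators_represented <= 0:
        raise ValueError("at least one coordinator must be represented")
    if not 0 <= votes_for <= coordinators_represented:
        raise ValueError("votes_for must be between 0 and the number represented")
    return votes_for * 100 >= threshold_percent * coordinators_represented
```

With 10 coordinators represented, 7 votes in favour are enough and 6 are not; with 19 represented, 14 are needed because 13 out of 19 is just under 70%.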

Correction of current entries, new entries
There are still ‘incorrect’ entries, i.e., entries not meeting our demands.
- We are working on these.
- In the future we will have 2 weeks to come to an agreement on a batch of entries.
- The same holds for proposals for new entries.
Exceptions: holiday season, and the initial period (= now!)

Moving to the CCR
- If you
  - have resources that contain references to ISOcat data categories which you want to replace by their CCR concept handles (if available), or
  - want to know which ISOcat data categories are imported into the CCR
- Visit
  where you can find
  - mapping files, and
  - a tool to use those files to replace ISOcat data category references by CCR concept handles
- If you run into problems contact your national CCR coordinator or, if necessary,
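To illustrate what such a replacement involves, here is a minimal sketch. It is not the tool mentioned on the slide: the mapping is a plain in-memory dictionary, the ISOcat reference pattern is an assumption, and the identifiers in the usage note are made-up placeholders rather than real data categories or handles.

```python
import re

# Assumed shape of an ISOcat data category reference; adjust to your data.
ISOCAT_REF = re.compile(r"http://www\.isocat\.org/datcat/DC-\d+")


def replace_isocat_refs(text: str, mapping: dict) -> tuple:
    """Replace every ISOcat reference in `text` that has an entry in `mapping`.

    `mapping` maps an ISOcat reference to its CCR concept handle.
    References without a mapping are left untouched and reported, because
    not every ISOcat data category was imported into the CCR.
    Returns (new_text, sorted list of unmapped references).
    """
    unmapped = set()

    def swap(match):
        ref = match.group(0)
        if ref in mapping:
            return mapping[ref]
        unmapped.add(ref)
        return ref

    return ISOCAT_REF.sub(swap, text), sorted(unmapped)
```

A call such as `replace_isocat_refs(record, {"http://www.isocat.org/datcat/DC-0001": "hdl:PLACEHOLDER/CCR_C-0001"})` rewrites the mapped reference and returns the list of references that still need attention.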

Thank you for your attention! (There will be a demo later today)