Agenda CMDI Workshop 9.15 Welcome 9.30 Introduction to metadata and the CLARIN Metadata Infrastructure (CMDI) 10.15Coffee 10.30Use of ISOCat within CMDI.

Slides:



Advertisements
Similar presentations
Towards a Persistent Identifier Infrastructure for European e-Research Daan Broeder CLARIN / MPG 2008 CNRI Handle System Workshop.
Advertisements

Putting the Pieces Together Grace Agnew Slide User Description Rights Holder Authentication Rights Video Object Permission Administration.
DC2001, Tokyo DCMI Registry : Background and demonstration DC2001 Tokyo October 2001 Rachel Heery, UKOLN, University of Bath Harry Wagner, OCLC
The Seven Pillars of Open Language Archiving: Introducing the OLAC Vision Gary Simons SIL International LSA Symposium: The Open Language Archives Community.
Building metadata components Dieter Van Uytvanck Max Planck Institute for Psycholinguistics CLARIN-NL Info Session Nijmegen
CLARIN Metadata & ISO DCR Daan Broeder. Max-Planck Institute for Psycholinguistics TKE ES05 Workshop, August 14th Dublin.
UKOLN, University of Bath
February Harvesting RDF metadata Building digital library portals with harvested metadata workshop EU-DL All Projects concertation meeting DELOS.
CLARIN AAI, Web Services Security Requirements
Interoperability aspects in the The Virtual Language Observatory Dieter Van Uytvanck Max Planck Institute for Psycholinguistics
Advanced Metadata Usage Daan Broeder TLA - MPI for Psycholinguistics / CLARIN Metadata in Context, APA/CLARIN Workshop, September 2010 Nijmegen.
Interoperability Aspects in Europeana Antoine Isaac Workshop on Research Metadata in Context 7./8. September 2010, Nijmegen.
From CLARIN Component Metadata to Linked Open Data
Flexible Syntax and Concept Registries as a basis for Metadata Daan Broeder TLA - MPI for Psycholinguistics & CLARIN Metadata in Context, APA/CLARIN Workshop,
The JISC IE Metadata Schema Registry Pete Johnston UKOLN, University of Bath JISC Joint Programmes Meeting Brighton, 6-7 July 2004
The Language Archive – Max Planck Institute for Psycholinguistics Nijmegen, The Netherlands Metadata Component Framework Possible Standardization Work.
The current state of Metadata - as far as we understand it - Peter Wittenburg The Language Archive - Max Planck Institute CLARIN Research Infrastructure.
OLC Spring Chapter Conferences Metadata, Schmetadata … Tell Me Why I Should Care? OLC Spring Chapter Conferences, 2004 Margaret.
The RDF meta model: a closer look Basic ideas of the RDF Resource instance descriptions in the RDF format Application-specific RDF schemas Limitations.
Dublin Core as a tool for interoperability Common presentation of data from archives, libraries and museums DC October 2006 Leif Andresen Danish.
Metadata: Its Functions in Knowledge Representation for Digital Collections 1 Summary.
UKOLUG - July Metadata for the Web RDF and the Dublin Core Andy Powell UKOLN, University of Bath UKOLN.
Populating the Infrastructure using Standards Daan Broeder CLARIN NL EB TLA - MPI for Psycholinguistics CLARIN Coordinators Meeting June 29,30 Budapest.
CLARIN-NL First Call Jan Odijk CLARIN-NL Kick-off Meeting Utrecht, 27 May 2009.
CLARIN-NL Second Open Call Jan Odijk CLARIN-NL Call 2 Info-session Amsterdam, 26 Aug 2010.
The ISO-DCR 17 January /20111CMDI tutorial Marc Kemps-Snijders a, Menzo Windhouwer b, Sue Ellen Wright c a Meertens Institute, b MPI for.
Eureka! User friendly access to the MPI linguistic data archive Max Planck Institute for Psycholinguistics Alexander Koenig Jacquelijn Ringersma Claus.
ISOcat demo and providing RELcat input Menzo Windhouwer The Language Archive tla.mpi.nl Data Archiving and Networked Solutions
Using IESR Ann Apps MIMAS, The University of Manchester, UK.
CLARINO WP2 National Registry and Long- Term Archiving Freddy Wetjen and Oddrun Pauline Ohren National Library of Norway Bergen, 12. September 2013.
The role of Parthenos for CLARIN ERIC Steven Krauwer CLARIN ERIC Executive Director 1.
Metadata & CMDI CLARIN Component Metadata Infrastructure Daan Broeder et al. Max-Planck Institute for Psycholinguistics CLARIN NL CMDI Metadata Tutorial.
The Language Archive – Max Planck Institute for Psycholinguistics Nijmegen, The Netherlands Why should we invest in DWF? Peter Wittenburg CLARIN Research.
Metadata Considerations Implementing Administrative and Descriptive Metadata for your digital images 1.
CMDI Component Registry Patrick Duin Max Planck Institute for Psycholinguistics 2011.
CLARIN Infrastructure Vision (and some real needs) Daan Broeder CLARIN EU/NL Max-Planck Institute for Psycholinguistics.
CLARIN Metadata Infrastructure Component Metadata and intermediate solutions Daan Broeder Claus Zinn Dieter van Uytvanck - Max-Planck Institute for Psycholinguistics.
Wishes from Hum infrastructures Examples: DOBES and CLARIN Peter Wittenburg Max Planck Institute for Psycholinguistics.
The JISC IE Metadata Schema Registry and IEEE LOM Application Profiles Pete Johnston UKOLN, University of Bath CETIS Metadata & Digital Repositories SIG,
Linguistics with CLARIN Storing resources in CLARIN Jan Odijk LOT Winterschool Amsterdam,
CLARIN for Linguists Portal & Searching for Resources Jan Odijk LOT Summerschool Nijmegen,
11 CMDI/ISOcat And Semantic Operability Ineke Schuurman ISOcat content coördinator CLARIN-NL Menzo Windhouwer ISOcat system administrator Utrecht
The Language Archive – Max Planck Institute for Psycholinguistics Nijmegen, The Netherlands NP CMDI-1 Metadata Component Framework New Standardization.
CLARIN Issues Peter Wittenburg MPI for Psycholinguistics Nijmegen, NL.
Technology – Broad View Aspects that play a role when integrating archives leave the details of some core topics to the 2. day Bernhard Neumair:Base Technologies.
A Data Category Registry- and Component- based Metadata Framework Daan Broeder et al. Max-Planck Institute for Psycholinguistics LREC 2010.
Recent Developments in CLARIN-NL Jan Odijk P11 LREC, Istanbul, May 23,
1 Dublin Core & DCMI – an introduction Some slides are from DCMI Training Resources at:
CMDI Software Components. MD Service Delivers services for the Catalog & Search GUI – Query – Populate UI Acts as a WS and exposes the query and “queryModel()*”
Slavic Digital Text Workshop 2006 The Open Archives Initiative Protocol for Metadata Harvesting: an Opportunity for Sharing Content in a Distributed Environment.
JISC Information Environment Service Registry (IESR) Ann Apps MIMAS, The University of Manchester, UK.
Search Interoperability, OAI, and Metadata Sarah Shreeves University of Illinois at Urbana-Champaign Basics and Beyond Grainger Engineering Library April.
Beyond ISOcat 20 June 2013CLARIN-NL ISOcat tutorial1.
1 CLARIN - NL What is going on? Jan Odijk Amsterdam 26 Aug 2010.
The Language Archive – Max Planck Institute for Psycholinguistics Nijmegen, The Netherlands TLA/MPI requirements for a Semantic Registry.
The RDF meta model Basic ideas of the RDF Resource instance descriptions in the RDF format Application-specific RDF schemas Limitations of XML compared.
Agenda CMDI Tutorial 9.30 Welcome & Coffee Introduction to metadata and the CLARIN Metadata Infrastructure (CMDI) 10.30CMDI & ISO-DCR 10.50The CMDI.
CLARIN Concept Registry: the new semantic registry Ineke Schuurman, Menzo Windhouwer, Oddrun Ohren, Daniel Zeman
Digitization – Basics and Beyond workshop Interoperability of cultural and academic resources New services for digitized collections Muriel Foulonneau.
NEFIS (WP5) Evaluation Meeting, November 2004 Evaluation Metadata Aljoscha Requardt, University of Hamburg Response rate: 93% (14 of 15 partners.
Creating & Testing CLARIN Metadata Components A CLARIN-NL project Folkert de Vriend Meertens Institute, Amsterdam 18/05/2010.
Differences and distinctions: metadata types and their uses Stephen Winch Information Architecture Officer, SLIC.
A Data Category Registry- and Component- based Metadata Framework Daan Broeder et al. Max-Planck Institute for Psycholinguistics LREC 2010.
Describing resources II: Dublin Core CERN-UNESCO School on Digital Libraries Rabat, Nov 22-26, 2010 Annette Holtkamp CERN.
AAI needs of the Distributed Computing Infrastructures - CLARIN Dieter Van Uytvanck Max Planck Institute for Psycholinguistics
Online Information and Education Conference 2004, Bangkok Dr. Britta Woldering, German National Library Metadata development in The European Library.
MICHAEL and the European Digital Library: promoting teaching, learning and research The MICHAEL Project is funded under the European Commission eTEN Programme.
Introduction to Metadata
Metadata for research outputs management
Session 2: Metadata and Catalogues
Presentation transcript:

Agenda CMDI Workshop 9.15 Welcome 9.30 Introduction to metadata and the CLARIN Metadata Infrastructure (CMDI) 10.15Coffee 10.30Use of ISOCat within CMDI 11.00The CMDI Component Registry and CMDI Component Editor ARBIL, the CMDI metadata editor 12.30Lunch Standard Metadata Components and Profiles available from the registry Metadata creation scenarios and try it your self opportunity 15.00Coffee 15.15Metadata creation scenario’s and try it your self opportunity, continued 16.00Further hands on practice with guidance 17.00End

CMDI CLARIN Component Metadata Infrastructure Daan Broeder et al. Max-Planck Institute for Psycholinguistics CLARIN NL CMDI Metadata Workshop May 27’th Nijmegen

CLARIN metadata project background  CLARIN EU WP2 since 2007 investigated and creates (prototypical) solutions for:  Common AAI infrastructure  Single system of persistent identifiers (PIDs) for resources  Common metadata domain - CMDI  …  CMDI is being developed by CLARIN partners: Austrian Academy, IDS, MPI for Psyl, Sprakbanken Univ. Gothenborg,  National CLARIN projects: CLARIN-NL, (D-SPIN) CLARIN- DE have committed resources to work with CMDI  CLARIN NL metadata project has been testing the CMDI basics

Metadata in General  Data about Data  Structured Data about Data  Not a prose description (although that can be a part)  … but keyword/value type of data: Name = “myresource”, Title = “mybook”  Internet: Machine readable Data about Data  XML format. Used for:  Resource discovery / accessing  Management  …

Dublin Core (DC) Metadata Set ContentIntellectual Property Instance TitleCreatorDate SubjectPublisherType DescriptionContributorFormat LanguageRightsIdentifier Relation Coverage Source

DC Example DC.Title = “My first book” DC.Title /Alternative = “My last book” DC.Creator = “L. Smith” DC.Subject /LCSH = “Building” DC.Description/Abstract = “…….” DC.Language/ ISO639-2 = “eng” Qualifiers either specify: encoding scheme refinement

Metadata for Language Resources I  Resource types:  Video, audio, pictures, annotations, primary texts, notes, grammars, lexica, …  Different levels of description (granularity):  complete corpora e.g. Brown Corpus.  sub corpora or corpus components: e.g. all Flemish recordings in the Spoken Corpus Dutch with all the transcriptions  (recording) sessions: e.g. the recording of a dialogue (sound file + transcript)  individual resources: e.g. a text file

Metadata for Language Resources II  Metadata was/is often embedded in annotations  CHAT format  TEI  Advantage of splitting this:  Independent formats allowing combinations as IMDI metadata with CHAT annotations  Keep several versions for different tools  … but danger of inconsistencies

Current Metadata Situation Fragmented landscape  Metadata sets, schema & infrastructures in our domain:  IMDI, OLAC/DCMI, TEI  Problems with current solutions:  Inflexible: too many (IMDI) or too few (OLAC) metadata elements  Limited interoperability (both semantic and functional)  Problematic (unfamiliar) terminology for some sub- communities.  Limited support for LT tool & services descriptions

Common metadata domain Why a common metadata domain:  Finding and sharing resources housed at all archives & repositories participating in CLARIN  Specify distributed heterogeneous collections of LRs and processing these collections  In general, a common metadata domain helps bringing along a single domain of LRs

Metadata Components CLARIN chose for a component approach: CMDI  NOT a single new metadata schema  but rather allow coexistence of many (community/researcher) defined schemas  with explicit semantics for interoperability How does this work?  Components are bundles of related metadata elements that describe an aspect of the resource  A complete description of a resource may require several components.  Components may use and contain other components  Components should be designed for reusability

Metadata Components Technical Metadata Sample frequency Format Size … Lets describe a speech recording

Metadata Components Language Technical Metadata Name Id … Lets describe a speech recording

Metadata Components Language Technical Metadata Actor Sex Language Age Name … Lets describe a speech recording

Metadata Components Language Technical Metadata Actor Location … Continent Country Address Lets describe a speech recording

Metadata Components Language Technical Metadata Actor Location Project … Name Contact Lets describe a speech recording

Metadata Components Language Technical Metadata Actor Location Project Metadata schema Metadata profile Lets describe a speech recording

Metadata Components Language Technical Metadata Actor Location Project Metadata schema Metadata description Lets describe a speech recording Metadata profile

Metadata Components Language Technical Metadata Actor Location Project Metadata schema Metadata description Lets describe a speech recording Component definition XML W3C XML Schema XML File Profile definition XML Metadata profile

Location Country Coordinates Actor BirthDate MotherTongue Text Language Title Recording CreationDate Type Component registry user Dance Name Type User selects appropriate components to create a new metadata profile or an existing profile Selecting metadata components from the registry CMDI Component Reuse

Concept registries  Basically a list with concepts and their descriptions where every concept has a unique identifier.  Some have a complicated structure and are associated with elaborate (administrative) processes to determine the status and acceptation of concepts in the registry. e.g. ISO- DCR.  others are static and simple lists of concepts and descriptions e.g. DCTERMS

Country dcr:1001 Language dcr:1002 Location Country Coordinates Actor BirthDate MotherTongue Text Language Title Recording CreationDate Type Component registry BirthDate dcr:1000 ISOcat concept registry user Dance Name Type Semantic interoperability partly solved via references to ISO DCR or other registry Selecting metadata components from the registry Title: dc:title DCMI concept registry CMDI Explicit Semantics User selects appropriate components to create a new metadata profile or an existing profile

CMDI Metadata Live-cycle Search Service Joint Metadata Repository Metadata Repository Metadata Repository Relation Registry ISOcat Concept Registry DCMI Concept Registry other Concept Registry CLARIN Component Registry Semantic Mapping Create metadata schema from selection of existing components. Allow creation of new components if they have references to ISOcat Perform search/browsing on the metadata catalog using the ISO DCR and other concept registries and CLARIN relation registry Metadata component profile was selected from metadata component registry Metadata harvesting by OAI-PMH protocol Metadata descriptions created

CMDI Architecture I  The CMDI takes an archivist or “production” first viewpoint  Prioritize that the metadata can be of good quality: consistent, coherent, correctly linked to the concept registries  The consumer side can be more “experimental” and diverse.  Many MD exploitation “stacks” or consumers applications can work in parallel on the same metadata

CMDI Architecture II MD Comp. Editor MD Comp. Registry ISO-Cat DCR MD Editor. Local MD Repository OAI-PMH Data provider OAI-PMH Service Provider CLARIN Joint MD Repository MD Services Semantic mapping Services Relation Registry MD Catalog user Metadata modeler ISO TDG MD Creator External agents Virtual Collection Registry

Current CMDI status I  ISO-DCR: 218 metadata concepts  CMDI component registry: 135 components, 19 profiles Produced & inspired by:  Deconstructing existing metadata schema IMDI, OLAC, TEI  Considering requirements of other CLARIN activities like profile matching  CLARIN NL metadata project tested the CMDI model and delivered components and profiles for the resources in two major Dutch Language Resource centers

Current CMDI status II Operational or test phase:  ISOCat DCR  Component registry & editor  ARBIL metadata editor Still working on:  Joint Metadata Repository, Metadata Catalog, Semantic Mapping, Relation Registry Expect a usable first version in third quarter 2010

Thank you for your attention CLARIN has received funding from the European Community's Seventh Framework Programme under grant agreement n°