CMD and TEI. CMDI interoperability workshop, 2013-06-04, Utrecht. Matej Ďurčo, ICLTT, Vienna.

Presentation transcript:

CMD and TEI
CMDI interoperability workshop, Utrecht
Matej Ďurčo, ICLTT, Vienna

TEI at ICLTT
AAC (Austrian Academy Corpus)
 – diachronic corpus, ~500 million tokens
 – being converted into TEI
C4
 – distributed corpus of 20th-century German
 – Basel, Berlin, Bozen, Wien
 – harmonized format (TEI/teiHeader)
Dict-Gate
 – TEI-encoded multilingual lexicons (Persian, Arabic, German, English)
 – however, described with LexicalResourceProfile
Abacus (Austrian Baroque Corpus)
 – 3 (5) historical texts encoded in TEI
 – elaborate teiHeader

TEI (and friends?) in CMD
Overview of currently existing TEI-ish CMD profiles

Project | Author, Year | Profile | Comp/Elem/Datcats | Instances
Deutsches Text Archiv | ? | teiHeader #clarin.eu:cr1:p_ (NOT in CompReg!) | 56/82/10857
ICLTT | Durco, 2010 | teiHeader #clarin.eu:cr1:p_ | /35/13 (7 dublincore, 6 isocat) | 467
Leipzig Corpora | Eckart, 2012 | TEIDocumentDescription #clarin.eu:cr1:p_ | /17/17 (isocat) | ?
Nederlab | Zhang, 2013 ? | DBNL_Tekst #clarin.eu:cr1:p_, DBNL_Tekst_Onzelfstandig #clarin.eu:cr1:p_ (private) | 20/38/15, 20/47/21 | ?
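
One way such Comp/Elem/Datcats figures could be derived is by walking a profile specification and counting components, elements, and distinct concept links. The sketch below assumes a simplified specification in the style of CMDI 1.1 (`CMD_Component`, `CMD_Element`, `ConceptLink`); the sample profile and its `example.org` links are hypothetical placeholders, not one of the profiles listed above.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical profile specification (CMDI 1.1 style).
# Names and ConceptLink values are placeholders for illustration only.
SPEC = """
<CMD_ComponentSpec isProfile="true">
  <CMD_Component name="teiHeader">
    <CMD_Component name="fileDesc">
      <CMD_Element name="title" ConceptLink="http://example.org/datcat/title"/>
      <CMD_Element name="author" ConceptLink="http://example.org/datcat/author"/>
      <CMD_Element name="note"/>
    </CMD_Component>
    <CMD_Component name="sourceDesc">
      <CMD_Element name="title" ConceptLink="http://example.org/datcat/title"/>
    </CMD_Component>
  </CMD_Component>
</CMD_ComponentSpec>
"""


def profile_stats(spec_xml):
    """Return (components, elements, distinct data categories) for a spec."""
    root = ET.fromstring(spec_xml)
    components = root.findall(".//CMD_Component")
    elements = root.findall(".//CMD_Element")
    # A data category is counted once, however many elements link to it.
    datcats = {e.get("ConceptLink") for e in elements if e.get("ConceptLink")}
    return len(components), len(elements), len(datcats)


print("%d/%d/%d" % profile_stats(SPEC))
```

Elements without a concept link (here `note`) count as elements but contribute no data category, which is why the third figure can be much smaller than the second.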

teiHeader (ICLTT)
size = reuse in other profiles

teiHeader (DTA)
size = count of elements in instance data

datcats in teiHeader (DTA)

TEI and ISOcat
a special DCS: TEI Header (2.1.0)
 – Windhouwer, 2012
 – a datcat for every element of the teiHeader (135 datcats)
 – based on an ODD file (ODD2DCIF.xsl and DCIF2ODD.xsl available)
 – motivated by CLARIN-NL projects using the TEI header
an enriched schema was generated = annotated with these new data categories (dcr:datcat attribute)
the enriched schema was put in SCHEMAcat
relations between TEI and other data categories are to be defined in RELcat (the relation registry)
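
To illustrate what "annotated with data categories via a dcr:datcat attribute" could look like in practice, here is a minimal sketch that reads such annotations from a schema fragment. The namespace URI bound to the `dcr` prefix is an assumption, and the element names and `example.org` identifiers are hypothetical, not taken from the actual enriched TEI schema.

```python
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"
# Assumed namespace URI for the dcr prefix; adjust to the schema at hand.
DCR_NS = "http://www.isocat.org/ns/dcr"

# Hypothetical fragment of an annotated schema; the datcat identifiers
# are placeholders, not real data category PIDs.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dcr="http://www.isocat.org/ns/dcr">
  <xs:element name="titleStmt" dcr:datcat="http://example.org/datcat/DC-0001"/>
  <xs:element name="publicationStmt" dcr:datcat="http://example.org/datcat/DC-0002"/>
  <xs:element name="note"/>
</xs:schema>
"""


def datcat_links(schema_xml):
    """Map element names to their dcr:datcat value, skipping unannotated ones."""
    root = ET.fromstring(schema_xml)
    links = {}
    for el in root.iter("{%s}element" % XS_NS):
        datcat = el.get("{%s}datcat" % DCR_NS)
        if datcat is not None:
            links[el.get("name")] = datcat
    return links


for name, datcat in sorted(datcat_links(SCHEMA).items()):
    print(name, "->", datcat)
```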

Next Step(s)?
create (or adapt existing) teiHeader profile
 – as a union of the existing profiles?
 – based on the enriched schema
 – i.e. linking to the new TEI data categories
 – define a relation set in RELcat between TEI and ISOcat (and dublincore) data categories
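
A relation set of the kind proposed above can be thought of as a list of (source, relation, target) triples between data categories. The sketch below is only an in-memory illustration of that idea; the identifiers and relation names are hypothetical and do not reflect the actual RELcat format or vocabulary.

```python
# Hypothetical relation set: (source datcat, relation type, target datcat).
# Identifiers and relation names are illustrative placeholders.
RELATIONS = [
    ("tei:title", "sameAs", "dc:title"),
    ("tei:author", "almostSameAs", "dc:creator"),
    ("tei:author", "sameAs", "isocat:DC-author"),
]


def related(datcat, relations):
    """Return the data categories directly linked to `datcat`, in either direction."""
    linked = set()
    for source, _relation, target in relations:
        if source == datcat:
            linked.add(target)
        elif target == datcat:
            linked.add(source)
    return linked


print(sorted(related("tei:author", RELATIONS)))
print(sorted(related("dc:creator", RELATIONS)))
```

Treating the triples as undirected for lookup means a search that starts from a dublincore field can also reach the TEI-based profiles, which is the point of defining the relation set.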

profile: data (LINDAT)
dublincore + metashare

profile: data (LINDAT)
resourceInfo, resourceInfo-component

dublincore I
2 profiles with dc-terms (55 data categories)
2 profiles with dc-elements (called "dc-terms")
as of

dublincore II
currently ( ): 4 DCMI-terms profiles

dublincore III
(almost) all datcats shared by all profiles
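
The overlap claimed on this slide could be checked by intersecting the sets of data categories used by each profile. This is a minimal sketch with made-up profile names and data categories, not the actual dublincore profiles from the Component Registry.

```python
# Hypothetical profiles mapped to the data categories they use.
PROFILES = {
    "dc_terms_a": {"title", "creator", "language", "date"},
    "dc_terms_b": {"title", "creator", "language", "date"},
    "dc_terms_extra": {"title", "creator", "language", "date", "extra_field"},
}


def shared_and_extra(profiles):
    """Return datcats common to all profiles and the per-profile leftovers."""
    shared = set.intersection(*profiles.values())
    extra = {
        name: sorted(datcats - shared)
        for name, datcats in profiles.items()
        if datcats - shared
    }
    return shared, extra


shared, extra = shared_and_extra(PROFILES)
print(sorted(shared))
print(extra)
```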

dublincore IV
1 profile has an extra component: DANS-DC-metadata
example: language