Interoperability aspects in the Virtual Language Observatory
Dieter Van Uytvanck, Max Planck Institute for Psycholinguistics


1 Interoperability aspects in the Virtual Language Observatory
  Dieter Van Uytvanck, Max Planck Institute for Psycholinguistics
  Metadata in Context workshop, Nijmegen

2 Overview
  Context sketch
  VLO: ideas, sources, modalities
  Interoperability issues
  Future plans

3 Context sketch
  Lots of resources somewhere out there: data collections, corpora, lexica, grammars, multimedia recordings, software, web applications / services
  Old-school linguistic resources: books, articles, CD-ROMs
  It’s like a jungle, sometimes...

4 VLO: the idea
  Researcher: “where do I start?”
  Provide a single entry point giving access to all information
  Because of the large amount of data: drill-down paradigm (decrease the search space gradually)
  Multiple ways of exploring: full-text search, facet browsing, geographic overlay
  Unified interface, links to the original context
  Available via
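The drill-down paradigm can be illustrated with a minimal Python sketch. It is not the VLO implementation: the records, field names and values below are invented for illustration, and a real system would use a search index rather than list filtering.

```python
from collections import Counter

# Hypothetical records; titles, field names and values are invented.
RECORDS = [
    {"title": "Spoken corpus A", "language": "Dutch",
     "country": "Netherlands", "genre": "conversation"},
    {"title": "Sign corpus B", "language": "Dutch Sign Language",
     "country": "Netherlands", "genre": "narrative"},
    {"title": "Field recordings C", "language": "Marquesan",
     "country": "French Polynesia", "genre": "narrative"},
]


def drill_down(records, **selected):
    """Keep only the records that match every selected facet value."""
    return [r for r in records
            if all(r.get(facet) == value for facet, value in selected.items())]


def facet_counts(records, facet):
    """Count how many of the remaining records carry each value of a facet."""
    return Counter(r[facet] for r in records if facet in r)


# Each step narrows the search space a little further.
step1 = drill_down(RECORDS, country="Netherlands")
step2 = drill_down(step1, genre="narrative")
```

After each selection, `facet_counts` can be recomputed on the remaining records so the user sees which values are still available to narrow the set further.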

5 VLO: the sources

6 VLO: the sources – LRT inventory
  Initiated by CLARIN
  Ad hoc, low-barrier, user-driven inventory of Language Resources and Tools
  Approximate number of records: 848 resources, 180 tools
  You can add new entries yourself!

7 VLO: the sources – OLAC catalogue
  http://catalog.clarin.eu > OLAC data providers
  Metadata as harvested from 40 OLAC providers (among them several CLARIN centres)
  Quality and quantity differ hugely

8 VLO: the sources – MPI catalogue
  About metadata records
  Broad spectrum: experimental data, Spoken Dutch corpus, sign language corpora, endangered languages documentation
  Archive in principle open for externally created linguistic data collections (e.g. endangered languages, see Donated Corpora), if these collections comply with the technical requirements (archivable formats, metadata, …)

9 VLO: the sources – DFKI tool registry
  Contains information about 292 (linguistic) software packages
  You can add entries yourself

10 VLO: the modalities – GIS

11 VLO: the modalities – Hierarchical catalogue

12 VLO: the modalities – Facet browser

13 Interaction between modalities

14 … all leading to the data

15 Interoperability issues (1)
  The facets to which all of the metadata records are mapped are currently: country, continent, origin, language, organization, genre, subject
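A minimal sketch of such a mapping, in Python, might look as follows. The facet names come from the slide; the source identifiers, the source field names, and the reading of "origin" as the catalogue a record came from are assumptions made for illustration only.

```python
FACETS = ("country", "continent", "origin", "language",
          "organization", "genre", "subject")

# Hypothetical per-source tables: source field name -> facet name.
FIELD_MAPS = {
    "olac": {"dc:language": "language",
             "dc:subject": "subject",
             "dc:publisher": "organization"},
    "lrt": {"Country": "country",
            "Institute": "organization",
            "Languages": "language"},
}


def to_facets(record, source):
    """Project one source record onto the common facet set."""
    mapping = FIELD_MAPS[source]
    facets = {name: None for name in FACETS}
    # Assumption: "origin" records which catalogue the record came from.
    facets["origin"] = source
    for field, value in record.items():
        facet = mapping.get(field)
        if facet is not None:
            facets[facet] = value
    # Source fields without a mapping are silently dropped here.
    return facets


example = to_facets({"dc:language": "nld", "dc:format": "audio/x-wav"}, "olac")
```

Fields that have no facet counterpart (here the hypothetical `dc:format`) do not survive the projection, which is why a link back to the original record matters.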

16 Interoperability issues (2)
  Observations: lots of inconsistencies and errors, e.g. for one organisation:
    MPI (5)
    MPI for Psycholinguistics (Nijmegen, Netherlands), Académie Marquisienne (Tuhuna 'Eo 'Enata) (2)
    MPI for Psycholinguistics (Nijmegen, Netherlands), Académie Marquisienne (Tuhuna 'Eo 'Enata) (39)
    Max Planck Institute for Psycholinguistics (Nijmegen, Netherlands) (112)
    Max Planck Institute for Psycholinguistics (13849)
    Max Planck Institute for Psycholinguistics & Volkswagen Stiftung (12)
    Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands (2)
    Max Planck Institute for Psycholinguistics, Postbus 310, 6500 AH Nijmegen, The Netherlands (15)
  Facets help to detect them
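One possible way to curate such variants is a hand-maintained alias table, sketched below in Python. The names and counts are taken from the slide; the alias table itself and the choice of canonical form are hypothetical, and joint entries (two organisations in one value) are deliberately left for a human curator.

```python
from collections import Counter

CANONICAL = "Max Planck Institute for Psycholinguistics"

# Hypothetical hand-curated alias table: normalised variant -> canonical name.
ALIASES = {
    "mpi": CANONICAL,
    "max planck institute for psycholinguistics": CANONICAL,
    "max planck institute for psycholinguistics (nijmegen, netherlands)": CANONICAL,
    "max planck institute for psycholinguistics, nijmegen, netherlands": CANONICAL,
}


def lookup_key(name):
    """Collapse whitespace and ignore case before looking up an alias."""
    return " ".join(name.split()).casefold()


def normalise(name):
    """Return the canonical name if the variant is known, else the input."""
    return ALIASES.get(lookup_key(name), name)


# Facet values and record counts as shown on the slide (subset).
observed = {
    "MPI": 5,
    "Max Planck Institute for Psycholinguistics (Nijmegen, Netherlands)": 112,
    "Max Planck Institute for Psycholinguistics": 13849,
    "Max Planck Institute for Psycholinguistics & Volkswagen Stiftung": 12,
    "Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands": 2,
}

merged = Counter()
for name, count in observed.items():
    merged[normalise(name)] += count
```

Unknown values pass through unchanged, so the facet list still exposes whatever the alias table does not yet cover.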

17 Interoperability issues (3)
  Because of the distributed approach:
    Distributed responsibilities
    Loss of specificity by converting all metadata records to a common subset
    Important to provide a link to the original record (also for the context!)
  Need for high-quality, well-maintained controlled vocabularies and relevant Persistent Identifiers:
    MIME types
    Organisation names
    ISO language codes (cf. ISOcat)
    Domain-specific vocabularies

18 Interoperability issues (4)
  Metadata exchange protocols exist (e.g. OAI-PMH), but:
    They are not always used
    For the VLO one still has to rely on non-continuous information flows like CSV files
    Clearly an undesirable situation in the longer term
  Granularity: how to indicate it in a standardized way?
  User feedback
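For readers unfamiliar with OAI-PMH, the following Python sketch shows the shape of a harvesting step: building a `ListRecords` request and reading one page of the response. It is an illustration only, not the VLO harvester. The base URL and the sample response are invented; the verb, the `metadataPrefix` and `resumptionToken` parameters, and the XML namespaces are those of OAI-PMH 2.0 and Dublin Core. No network access is performed.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request; a resumption token replaces other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return the Dublin Core titles on one page and the next token, if any."""
    root = ET.fromstring(xml_text)
    titles = [t.text for t in root.iterfind(".//dc:title", NS)]
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return titles, (token or None)


# Invented sample response, trimmed to the elements used above.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example corpus</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

BASE = "https://example.org/oai"  # hypothetical endpoint
first_url = list_records_url(BASE)
titles, next_token = parse_page(SAMPLE)
```

A harvester would repeat the request with each returned token until none is left, which gives the continuous information flow that periodic CSV exports lack.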

19 Future steps
  Curate the metadata: correct typographical errors, add information, use consistent terminology, etc.
  Process CMDI- and ISOcat-based metadata
  Use (emerging) standards to refer to persons, projects, resources, ... in a persistent and interoperable way

20 Thank you for your attention
  CLARIN has received funding from the European Community's Seventh Framework Programme under grant agreement n°

