The current state of Metadata - as far as we understand it - Peter Wittenburg The Language Archive - Max Planck Institute CLARIN Research Infrastructure.

1 The current state of Metadata - as far as we understand it - Peter Wittenburg The Language Archive - Max Planck Institute CLARIN Research Infrastructure Nijmegen, The Netherlands

2 Old Concept
- of course "metadata" is an old concept
- library cards were introduced to cope with mass and anonymity
- not surprising that library people started thinking about this to describe all kinds of web-accessible resources
- Dublin Core (DC) and qualified DC were the results
- however, the research world is different - not just search
- therefore solutions were developed in many domains
- 2 years ago CLARIN revised its 15 year old set & framework

3 Big Ideas
- of course managing increasing amounts of data
- of course finding valuable data in the growing haystacks
- but also machine usage of metadata
- automatic profile matching
- research statistics - virtual sub-collection building etc.
- multilinguality in a multilingual European society
- interdisciplinary research
- biodiversity people should find information in linguistic archives etc.
- linking with contextual information
- document lifecycle management (provenance)

4 Big Change
- until now researchers informed each other
- culture of personal exchange
- claim: this will only work partially in the future
- have distributed centers storing lots of data
- national and discipline dimensions
- depositors upload their data into these centers
- will have an anonymous landscape of data & tools
- all offered as services
- what do we have to find things:
- proper metadata descriptions
- social tagging by virtual organizations
- content to operate on by "smart" data mining

5 Big Question
- are we ready to meet these wishes and changes?
- probably not
- some major issues: quality, interoperability, registry and reference stability, functional, multilingual, scalability, IT principles

6 Quality Issue
- lack of quality in descriptions
- not all elements filled in (researchers are lazy, lack of tool support)
- often not schema based (XLS), thus inconsistent
- lack of agreed and standardized vocabularies
- ISO 639-3 - about 6000 language codes
- what about subject classification schemes
- what about institution names
- thus many errors and inconsistencies
- ontologies are expensive to maintain
- misinterpretations/misuse of element semantics
- etc
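The two most mechanical quality problems on this slide (unfilled elements and free-text values where a standardized code is expected) can be caught by a small check. The following is only a minimal sketch: the element names in `REQUIRED_ELEMENTS` are hypothetical and not taken from any CLARIN profile, and the language test only checks that the value has the three-letter shape of an ISO 639-3 code, not that the code is actually assigned.

```python
import re

# Hypothetical required elements; a real metadata profile defines its own set.
REQUIRED_ELEMENTS = ("title", "language", "creator", "institution")

# ISO 639-3 codes are three lowercase letters. This pattern checks the shape
# only; it does not verify that the code exists in the standard.
ISO_639_3_SHAPE = re.compile(r"^[a-z]{3}$")


def check_record(record):
    """Return a list of human-readable quality problems for one record."""
    problems = []
    for element in REQUIRED_ELEMENTS:
        value = record.get(element)
        if value is None or not str(value).strip():
            problems.append(f"missing element: {element}")
    language = str(record.get("language") or "").strip()
    if language and not ISO_639_3_SHAPE.match(language):
        problems.append(f"language is not an ISO 639-3 style code: {language}")
    return problems
```

A record with an empty creator and the language written out as "Dutch" would be reported with two problems, while the same record with a filled creator and the code "nld" would pass.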

7 Interoperability Issue
- hampered by different approaches (closed DB, no modularity, embedded ontologies)
- structural difficulties up to context dependency
- difficult semantic mapping
- different description dimensions
- bad element definitions
- bad vocabulary definitions
- only little support of OAI-PMH
- reliance on DC semantics - but useless for research etc
- often "hardwired" mappings
- lack of a flexible framework to create/share/use relations
- little is standardized - what about lifetime then
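For readers unfamiliar with OAI-PMH: harvesting is done with plain HTTP requests in which a `verb` and a `metadataPrefix` are passed as query parameters, and `oai_dc` (unqualified Dublin Core) is the prefix every compliant repository has to offer. The sketch below only builds such a request URL and does not contact any server; the base URL in the usage note is a made-up placeholder, and the helper name `list_records_url` is an assumption for illustration.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optional selective harvesting of one set offered by the repository.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"
```

For a hypothetical endpoint, `list_records_url("https://archive.example.org/oai")` yields `https://archive.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc`. A repository that also exposes a richer, research-specific format would be harvested by passing a different `metadata_prefix`.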

8 Registry and Reference Stability Issue
- flexibility only when we separate things
- define & register all concepts in open registries (we are using ISO 12620 - ISOcat)
- define & register all components/profiles (we are using CLARIN registry)
- register all mappings (nothing yet)
- but if we do this we need to refer
- are our references stable?
- some are using Cool URIs - are they just URLs?
- some are using explicit Handles - are they maintained? who takes care? (we are using EPIC - European PID Consortium)
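The separation argued for on this slide can be illustrated with a toy example: each schema element points to a registered concept, relations between concepts are kept in a separate table, and a mapping between two schemas is derived from both instead of being hardwired. This is a sketch under stated assumptions only. The schema names, element names and concept identifiers are all hypothetical and are not real ISOcat or CLARIN registry entries.

```python
# Hypothetical element -> concept links, as each schema would register them.
ELEMENT_CONCEPTS = {
    ("schema_a", "Language"): "concept:language-name",
    ("schema_a", "Title"): "concept:resource-title",
    ("schema_b", "lang"): "concept:language-name",
    ("schema_b", "name"): "concept:resource-name",
}

# Hypothetical relations between concepts, stored apart from the definitions
# so that they can be created, shared and revised independently.
CONCEPT_RELATIONS = {
    frozenset({"concept:resource-title", "concept:resource-name"}),
}


def concepts_match(a, b):
    """Two concepts match if they are identical or explicitly related."""
    return a == b or frozenset({a, b}) in CONCEPT_RELATIONS


def map_elements(source_schema, target_schema):
    """Derive an element mapping between two schemas via their concepts."""
    mapping = {}
    for (schema, element), concept in ELEMENT_CONCEPTS.items():
        if schema != source_schema:
            continue
        for (other, other_element), other_concept in ELEMENT_CONCEPTS.items():
            if other == target_schema and concepts_match(concept, other_concept):
                mapping[element] = other_element
    return mapping
```

With these tables, `map_elements("schema_a", "schema_b")` maps `Language` to `lang` through the shared concept and `Title` to `name` through the registered relation. Removing the relation changes the mapping without touching either schema, which is the flexibility the slide asks for.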

9 Functional Issue
- do we address new functional requirements?
- what about provenance information - is it automatically generated?
- what about versions - are they visible?
- what about long-term preservation (ltp) information?
- what about formal access information?
- do we know what is needed for the web services scenario (profile matching, deployment information, etc)?

10 Multilingual Issue
- what does it really include?
- localizing all software
- multilingual definitions of all concepts
- elements and vocabulary terms (no translations of proper names of course, or?)
- or do we simply rely on some lingua franca
- answer probably discipline dependent
- how much is (should be) the public involved
- whatever we do, it is a lot of work
- CLARIN: ISOcat covers almost all major EU languages

11 Scalability Issue
- are our solutions scalable?
- in EUROPEANA millions of metadata records
- in CLARIN about 270,000
- how to structure the offer
- how to present this to naive users
- do we share the same granularity (metadata at collection and/or resource level)
- can we deal with aggregations in the same way
- can we apply semantic web technology
- automatic mapping
- automatic quality improvement

12 IT Principles
- we need to disseminate the message of some basic IT principles
- define and register your semantics
- specify and register your syntax
- use a stable reference scheme
- in some areas separate definitions and relations
- get things standardized or use standards such as
- XML, some schema language
- ISO 12620, etc
- URI, Handles
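As a small illustration of the "stable reference scheme" principle: a Handle has the form prefix/suffix and is not itself a URL, but it can be turned into a resolvable link through a Handle proxy such as the public one at hdl.handle.net. The sketch below assumes that proxy and uses a made-up Handle in the usage note; the function name is hypothetical, and suffixes containing characters that need URL escaping are not handled.

```python
def handle_to_url(handle, resolver="https://hdl.handle.net/"):
    """Turn a Handle such as 'hdl:prefix/suffix' into a resolver URL."""
    handle = handle.strip()
    if handle.lower().startswith("hdl:"):
        # Drop the optional 'hdl:' scheme marker.
        handle = handle[4:]
    prefix, separator, suffix = handle.partition("/")
    if not separator or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix Handle: {handle!r}")
    return resolver + handle
```

For a made-up Handle, `handle_to_url("hdl:12345/example-1")` gives `https://hdl.handle.net/12345/example-1`. The point of the indirection is that the Handle stays the same when the resource moves; only the record behind the resolver has to be updated, which is exactly the maintenance question raised earlier in the talk.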

13 What can we do?
- listen to each other first
- increase awareness about metadata and basic principles
- see how we can create an interoperable landscape
- harmonizing approaches
- harmonizing along major issues
- making things explicit and scalable
- look for proper interdisciplinary solutions

14 Üm nicht to end in Babylonish scenario nous avons still algo time om sistemas te improve. Thanks for your attention. moving towards an ideal e-Science domain