Creating & Testing CLARIN Metadata Components A CLARIN-NL project Folkert de Vriend Meertens Institute, Amsterdam 18/05/2010.

1 Creating & Testing CLARIN Metadata Components: A CLARIN-NL project
Folkert de Vriend (Folkert.de.Vriend@meertens.knaw.nl), Meertens Institute, Amsterdam
18/05/2010, LREC, Malta

2 Project partners
- Daan Broeder
- Dieter Van Uytvanck
- Folkert de Vriend
- Laura van Eerten
- Griet Depoorter

3 Outline
1. What is CMDI?
2. What is the goal of our project?
3. How to go from a resource to harvestable metadata?
4. Findings of the project and future challenges

4 1) What is CMDI?
- The CLARIN MetaData Infrastructure (CMDI) is the infrastructure used for descriptive metadata in CLARIN (Common Language Resources and Technology Infrastructure).
- Descriptive metadata is used to characterize data resources and tools, and to facilitate discovery and management in large (virtual) infrastructures and repositories.

5 Advantages of CMDI
Compared to other metadata infrastructures:
- Flexibility
  - Researchers can decide what metadata fits their needs and use ready-made metadata components.
  - Researchers can also create new metadata components if they want.
- Complete infrastructure: software for metadata modeling, editing, harvesting and exploitation
- Still compatible with existing frameworks: OLAC, IMDI, TEI

6 Basic Component Metadata Modeling: let's describe a sound recording
- Technical Metadata: Sample frequency, Format, Size, …

7 Basic Component Metadata Modeling: let's describe a sound recording
- Technical Metadata
- Language: Name, Id, …

8 Basic Component Metadata Modeling: let's describe a sound recording
- Technical Metadata
- Language
- Actor: Sex, Language, Age, Name, …

9 Basic Component Metadata Modeling: let's describe a sound recording
- Technical Metadata
- Language
- Actor
- Location: Continent, Country, Address, …

10 Basic Component Metadata Modeling: let's describe a sound recording
- Technical Metadata
- Language
- Actor
- Location
- Project: Name, Contact, …

11 Basic Component Metadata Modeling: let's describe a sound recording
The components Technical Metadata, Language, Actor, Location and Project together form the metadata profile.
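
The build-up in slides 6 to 11 can be pictured as a small nested data structure. The following is only a sketch in Python, not CMDI's actual component format: the `Component` class, the `paths` helper and the profile name `SoundRecording` are hypothetical, and it assumes that the "Language" entry under Actor reuses the Language component. Component and element names are taken from the slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    """A reusable metadata component: a named group of elements and sub-components."""
    name: str
    elements: List[str] = field(default_factory=list)
    children: List["Component"] = field(default_factory=list)

    def paths(self, prefix: str = "") -> List[str]:
        """Return every element as a slash-separated path below this component."""
        here = f"{prefix}/{self.name}" if prefix else self.name
        result = [f"{here}/{element}" for element in self.elements]
        for child in self.children:
            result.extend(child.paths(here))
        return result


# Components and elements as named on the slides.
technical = Component("TechnicalMetadata", ["SampleFrequency", "Format", "Size"])
language = Component("Language", ["Name", "Id"])
# Assumption: Actor reuses the Language component rather than a plain element.
actor = Component("Actor", ["Sex", "Age", "Name"], [language])
location = Component("Location", ["Continent", "Country", "Address"])
project = Component("Project", ["Name", "Contact"])

# A profile is just the top-level component that bundles the others.
profile = Component(
    "SoundRecording",  # hypothetical profile name
    children=[language, technical, actor, location, project],
)

for path in profile.paths():
    print(path)
```

Because `language` is one object used in two places, the sketch also shows the reuse idea: the same component appears both at the top level of the profile and inside Actor.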

12 Main principles behind CMDI
- The component approach is flexible and lets you design your own metadata profile.
- But semantics need to be declared explicitly by making use of concepts that are stored in the ISOcat registry. This way interoperability can still be guaranteed.
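
Why explicit concept links keep profiles interoperable can be illustrated with a small sketch. Everything in it is hypothetical: the two profiles, their element paths and the `concept:` identifiers merely stand in for references to ISOcat data categories, and `align` is not part of any CLARIN tool.

```python
from typing import Dict, List

# Two independently designed profiles. Element names differ, but each
# element declares the concept it means (identifiers are placeholders).
profile_a: Dict[str, str] = {
    "Actor/BirthYear": "concept:birth-year",
    "Language/Name": "concept:language-name",
}
profile_b: Dict[str, str] = {
    "Speaker/YearOfBirth": "concept:birth-year",
    "Lang/Label": "concept:language-name",
    "Project/Name": "concept:project-name",
}


def align(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each element path in `a` to the paths in `b` that share its concept."""
    by_concept: Dict[str, List[str]] = {}
    for path, concept in b.items():
        by_concept.setdefault(concept, []).append(path)
    return {path: by_concept.get(concept, []) for path, concept in a.items()}


for path, matches in align(profile_a, profile_b).items():
    print(path, "->", matches)
```

A search tool built on such a mapping could treat `Actor/BirthYear` and `Speaker/YearOfBirth` as the same field, even though the two profiles were designed separately.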

13 2) What is the goal of our project?
Testing the CMDI principles by applying them to existing resources at the Meertens Institute (MI) and the Institute for Dutch Lexicology (INL).

14 Resources at MI and INL used
- Lexical resources (with proper names, monolingual and bilingual lexica, historical and scientific dictionaries)
- Linguistic databases (with syntactical, morphological and phonological dialect variation)
- Ethnological databases (containing data about folktales, songs, probate inventories and pilgrimages)
- Corpora (spoken and written)
- Historical documents (bible texts)

15 3) Workflow from resource to harvestable metadata instance
Resource -> A) Resource analysis -> B) Construction of XML metadata profiles for each granularity level present in the resource -> C) Add metadata to instances -> Harvestable metadata instance
Supported by a very basic tool kit for creating schemas and instances.

16 Let's apply this workflow to one of the resources in the project
- Dynamic Syntactic Atlas of the Dutch dialects (DynaSAND)
- A linguistic database of speech and text that charts the syntactic variation at the clausal level in 267 dialects of Dutch spoken in the Netherlands, Belgium and North-West France.

17 A) Resource analysis
Questions to ask of DynaSAND:
- Data, information, metadata?
- Granularity levels?

18 B) Profile construction
Construction of XML metadata profiles for each granularity level present in the resource:
- Use existing components

19 Existing components

20 B) Profile construction
Construction of XML metadata profiles for each granularity level present in the resource:
- Use existing components
- Introduce new components

21 New components

22 B) Profile construction
Construction of XML metadata profiles for each granularity level present in the resource:
- Use existing components
- Introduce new components
- Link concepts in new components to existing ISOcat concepts (ensuring semantic interoperability)

23 Link concepts in new components to existing ISOcat concepts

24 B) Profile construction
Construction of XML metadata profiles for each granularity level present in the resource:
- Use existing components
- Introduce new components
- Link concepts in new components to existing ISOcat concepts (ensuring semantic interoperability)
- Introduce new ISOcat concepts (ensuring semantic interoperability)
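
The four options of step B can be read as a decision procedure: reuse a component when one exists, otherwise create one, and for each element of a new component either link an existing concept or propose a new one. The sketch below is only one possible reading of the slides. The function `plan_profile`, the component and element names, and the contents of both registries are made up for illustration and do not reflect the real component registry or ISOcat.

```python
from typing import Dict, List, Set


def plan_profile(
    needed: Dict[str, List[str]],
    existing_components: Set[str],
    known_concepts: Set[str],
) -> List[str]:
    """Return the list of actions needed to cover `needed` (component -> elements)."""
    actions: List[str] = []
    for component, elements in needed.items():
        if component in existing_components:
            # Preferred case: the component already exists and is reused as is.
            actions.append(f"reuse component {component}")
            continue
        actions.append(f"create component {component}")
        for element in elements:
            if element in known_concepts:
                actions.append(f"link {component}/{element} to existing concept")
            else:
                actions.append(f"propose new concept for {component}/{element}")
    return actions


# Hypothetical input: what a resource needs and what the registries hold.
needed = {
    "Location": ["Country"],
    "DialectInfo": ["Dialect", "SyntacticPhenomenon"],
}
plan = plan_profile(
    needed,
    existing_components={"Location", "Actor"},
    known_concepts={"Dialect"},
)
for step in plan:
    print(step)
```

Ordering the checks this way reflects the preference expressed on the slides: new components and new concepts are introduced only when reuse is not possible.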

25 Introduce new ISOcat concepts

26 Result 1: DynaSAND collection profile

27 Result 2: DynaSAND subcollection profile

28 C) Generate schemas and add metadata to instances
B) Construction of XML metadata profiles for each granularity level present in the resource -> C) Add metadata to instances
Supported by a very basic tool kit for creating schemas and instances.

29 Instance for DynaSAND collection metadata
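
The instance shown on this slide is not part of the transcript. As a rough idea of what filling an instance involves, the sketch below builds a small XML document with Python's standard `xml.etree.ElementTree`. It is loosely modelled on a `CMD` root with a `Components` section and is not schema-valid CMDI. The profile name `DynaSandCollection` and the `GeneralInfo` element paths are assumptions, and the values are taken from the DynaSAND description on slide 16.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def build_instance(profile_name: str, values: Dict[str, str]) -> ET.Element:
    """Create a simplified metadata instance from slash-separated paths and values."""
    root = ET.Element("CMD")
    components = ET.SubElement(root, "Components")
    profile = ET.SubElement(components, profile_name)
    for path, text in values.items():
        node = profile
        for part in path.split("/"):
            # Reuse an existing child so sibling elements share one parent.
            child = node.find(part)
            if child is None:
                child = ET.SubElement(node, part)
            node = child
        node.text = text
    return root


# Hypothetical element paths; values come from the resource description.
values = {
    "GeneralInfo/Name": "Dynamic Syntactic Atlas of the Dutch dialects (DynaSAND)",
    "GeneralInfo/Description": (
        "Linguistic database of speech and text charting syntactic variation "
        "at the clausal level in 267 dialects of Dutch"
    ),
}
root = build_instance("DynaSandCollection", values)
print(ET.tostring(root, encoding="unicode"))
```

In the project itself the structure of an instance is dictated by the schema generated from the profile, so a real instance would be validated against that schema rather than assembled freely as it is here.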

30 Workflow from resource to harvestable metadata instance
Resource -> A) Resource analysis -> B) Construction of XML metadata profiles for each granularity level present in the resource -> C) Add metadata to instances -> Harvestable metadata instance
- A) Resource analysis
  - Data, information, metadata?
  - Granularity levels?
- B) Construction of XML metadata profiles
  - Use existing components
  - Introduce new components
  - Link concepts in new components to existing ISOcat concepts (ensuring semantic interoperability)
  - Introduce new ISOcat concepts (ensuring semantic interoperability)
- C) Add metadata to instances
  - Very basic tool kit for creating schemas and instances

31 4) Most important findings of the project
- CMDI proved flexible enough for the resources selected at MI and INL:
  - Many existing components could be reused.
  - Where this was not possible, the framework indeed made it possible to create new components.
- This was the case for both IMDI and non-IMDI types of resources.
- A very general issue when making existing resources available through a metadata infrastructure (not CMDI-specific) is how to deal with the distinction between data, information and metadata, and with granularity levels. -> Advice: keep an end-user perspective (discovery and management).
- A document with best practices will be made available on the CLARIN.EU website.

32 Future challenges for CMDI
- Existing ISOcat concept definitions can be too specific or too broad ("birth year" versus "birth date", for instance).
- What if too many components and concepts are created and the semantics become too diffuse to be useful?
  - Will we need increasingly more standardization and "cleaning" effort from ISOcat in the future?
  - Will we need more ways of encouraging reuse of existing components and concepts?
  - Should we add success indicators? "This component is already being used by 1 million satisfied customers!"
  - Should we make more explicit what the benefits of reuse are? "All of these great tools can be used on your data too when you reuse components X and Y!"

33 Some links
- CLARIN-NL components: http://www.clarin.eu/cmd/components/clarin-nl/
- ISOcat data category registry: http://www.isocat.org
- Tools for creating CMDI:
  - XML toolkit: http://www.clarin.eu/toolkit
  - Component registry and browser, and Arbil metadata editor: http://www.clarin.eu/cmdi

34 Thank you

