
Enhancing the Quality of Metadata by using Authority Control
Thorsten Trippel, Claus Zinn
LDL 2016 Workshop at LREC, May 23-28, Portorož (Slovenia)




1 Enhancing the Quality of Metadata by using Authority Control
Thorsten Trippel, Claus Zinn
Department of General and Computational Linguistics
LDL 2016 Workshop at LREC, May 23-28, 2016, Portorož (Slovenia)

2 Overview
Virtual Language Observatory (VLO)
- nearly 900,000 language resources
- described with CMDI-based metadata
- harvested from about 30 centres via OAI-PMH
- considerable heterogeneity in the data
- negative impact on VLO usability
Tackle data heterogeneity
- use controlled vocabularies whenever possible, in particular
- use the authority files of the library world to identify persons, organisations and geographic places
- bring VLO cataloguing together with library cataloguing
- improve the facet browsing experience in the VLO
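The VLO harvests its CMDI records via OAI-PMH. As a minimal sketch of one harvesting step, the following builds a `ListRecords` request and extracts the record identifiers and the `resumptionToken` that pages through large result sets. The endpoint URL is hypothetical, `cmdi` as the `metadataPrefix` is an assumption (a provider announces its actual prefixes via the `ListMetadataFormats` verb), and the sample response is invented and trimmed to record headers.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def list_records_url(base_url, metadata_prefix="cmdi", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    Follow-up requests carry only the resumptionToken, which in OAI-PMH
    is an exclusive argument (no metadataPrefix alongside it)."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)

def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) for one response page."""
    root = ET.fromstring(xml_text)
    ids = [h.text for h in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An empty <resumptionToken/> marks the last page of the list.
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token

# Invented sample response, reduced to record headers.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, token = parse_list_records(SAMPLE)
next_url = list_records_url("https://repo.example.org/oai", resumption_token=token)
```

A harvester repeats the request with each new token until none is returned; every centre is harvested separately, which is one reason the aggregated data is so heterogeneous.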

3 CMDI & VLO
Virtual Language Observatory
- faceted browser giving access to 900,000 items via a dozen facets
- items described with different CMDI profiles from about 30 data providers
- considerable data heterogeneity, making curation expensive
Component MetaData Infrastructure (CMDI)
- element-in-element, lego-brick approach to metadata modeling
- basic descriptors (concept registry), components (component registry)
- metadata standard in the CLARIN world
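To make the lego-brick idea concrete, here is a minimal sketch of components nesting inside components, with each element pointing to a concept. The class names, the `Resource`/`Actor` profile and the `example.org` concept URIs are hypothetical; real CMDI profiles are XML specifications kept in the CLARIN component registry.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

@dataclass
class Element:
    name: str
    concept_link: str  # hypothetical concept-registry URI giving the element's meaning

@dataclass
class Component:
    name: str
    elements: List[Element] = field(default_factory=list)
    components: List["Component"] = field(default_factory=list)

    def paths(self, prefix: str = "") -> Iterator[Tuple[str, str]]:
        """Yield (path, concept_link) for every element, depth-first."""
        here = f"{prefix}/{self.name}"
        for el in self.elements:
            yield f"{here}/{el.name}", el.concept_link
        for child in self.components:
            yield from child.paths(here)

# A reusable "Actor" brick plugged into a hypothetical resource profile.
actor = Component("Actor", [Element("Name", "http://example.org/concepts/name")])
profile = Component("Resource", [Element("Title", "http://example.org/concepts/title")], [actor])

for path, concept in profile.paths():
    print(path, concept)
```

Because the same concept can sit at different paths in different profiles, a facet such as "organisation" has to be filled from many places, which is where the heterogeneity noted above creeps in.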

4 Two issues:
- Data curation
- Connection with other available data

5 O.K., three...
- Availability of the content of language resources
- Copyright
- ...

6 Connection with other available data/linked data
Metadata publicly accessible
- Open protocol
- Standard format
- Widely used standard (!)
Definition of data categories accessible
- Human readable
- Examples
BUT:
- No SPARQL endpoints
- Data format not RDF
- Conversion of data formats into RDF != linked data

7 Data Curation Issues
VLO list of organizations

8 BBAW metadata (DNB)
VLO list of organizations

9 Authority Organizations
- GND (Gemeinsame Normdatei, Integrated Authority File): 2.5M persons
- VIAF (Virtual International Authority File)
- ISNI (International Standard Name Identifier): 2.5M persons
- Library of Congress Control Number
- Open Researcher and Contributor ID (ORCID): 1M persons
- ResearcherID, GeoNames, [...]
All offer persistent URLs and are widely used, e.g.:
http://www.isni.org/0000000118749683
http://d-nb.info/gnd/143840657
http://viaf.org/viaf/37069402
http://id.loc.gov/authorities/names/n97014413
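The persistent URLs above follow a simple pattern per provider, so a bare identifier stored in the metadata can be turned back into a resolvable link. A minimal sketch, with the patterns taken from the four example URLs on this slide; the scheme keys (`isni`, `gnd`, `viaf`, `lccn`) are hypothetical labels, and other providers such as ORCID or GeoNames would need their own entries.

```python
# URL patterns read off the example links above; the keys are hypothetical labels.
AUTHORITY_URL_PATTERNS = {
    "isni": "http://www.isni.org/{id}",
    "gnd": "http://d-nb.info/gnd/{id}",
    "viaf": "http://viaf.org/viaf/{id}",
    "lccn": "http://id.loc.gov/authorities/names/{id}",
}

def authority_url(scheme, identifier):
    """Build the persistent URL for an identifier issued by a known authority."""
    try:
        pattern = AUTHORITY_URL_PATTERNS[scheme]
    except KeyError:
        raise ValueError(f"unknown authority scheme: {scheme}") from None
    return pattern.format(id=identifier.strip())

print(authority_url("gnd", "143840657"))
```

Storing the full URL instead of the bare identifier avoids this step entirely, at the cost of baking one URL form into every record.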

10 Authority file information for the authors

11 Enhancing Metadata with authority identifiers
Interlinking with different resources possible
- Common identifier
- Link into the linked data cloud
Robust against non-normalized spelling
- Change of names (!)
- Language variation
Variety of different authority providers possible
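The robustness argument can be illustrated with facet counting: spelling and language variants of one organisation split into separate facet values, while keying on an authority identifier collapses them. A minimal sketch; the variant table and the `example.org` URI are hypothetical placeholders, not the real GND or VIAF record for the BBAW.

```python
from collections import Counter

# Hypothetical variant table; the URI is a placeholder, not a real authority record.
BBAW_URI = "http://example.org/authority/org-1"
NAME_TO_AUTHORITY = {
    "BBAW": BBAW_URI,
    "Berlin-Brandenburgische Akademie der Wissenschaften": BBAW_URI,
    "Berlin-Brandenburg Academy of Sciences and Humanities": BBAW_URI,
}

def facet_counts(names, lookup=None):
    """Count facet values. With a lookup, names that resolve to an authority
    URI are counted under that URI; all others under the stripped string."""
    lookup = lookup or {}
    return Counter(lookup.get(n.strip(), n.strip()) for n in names)

harvested = [
    "BBAW",
    "Berlin-Brandenburgische Akademie der Wissenschaften",
    "Berlin-Brandenburg Academy of Sciences and Humanities ",
    "Universität Tübingen",
]
raw = facet_counts(harvested)                        # every variant is its own value
linked = facet_counts(harvested, NAME_TO_AUTHORITY)  # variants merged under one URI
```

The lookup table is the part that requires curation; once a record carries the identifier itself, no string matching is needed at aggregation time.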

12 Add Authority Files to Metadata
For linguistic resources described according to the CMDI framework, associate the name strings of persons, organizations, etc. with their respective authority file records:
- define concepts and components in the CLARIN registries
- adapt all CMDI schemas to include components that allow authority file references
- add AF information to all CMDI instances
  - about 60% of all researcher names have an AF record
  - nearly all organizations have an authority file entry
  - similarly for geographic locations
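The last step, adding AF information to existing instances and measuring how many names could be linked, might look like the following minimal sketch. The flat record layout (a `persons` list of name strings), the names and the `example.org` URI are hypothetical simplifications of real CMDI instances, and exact string lookup stands in for whatever matching a curator actually uses.

```python
def add_authority_refs(records, lookup):
    """Attach authority URIs to person names and report coverage.

    records: list of dicts with a "persons" list of name strings
             (a hypothetical flat layout, not the CMDI structure).
    lookup:  mapping from name string to authority URI.
    Returns (annotated copies of the records, fraction of names with a URI)."""
    annotated, total, linked = [], 0, 0
    for rec in records:
        persons = []
        for name in rec["persons"]:
            uri = lookup.get(name)  # None when no authority record is known
            persons.append({"name": name, "authority": uri})
            total += 1
            if uri is not None:
                linked += 1
        annotated.append({**rec, "persons": persons})  # input records stay untouched
    coverage = linked / total if total else 0.0
    return annotated, coverage

records = [
    {"id": "r1", "persons": ["Researcher A", "Researcher B"]},
    {"id": "r2", "persons": ["Researcher A"]},
]
lookup = {"Researcher A": "http://example.org/authority/person-1"}
annotated, coverage = add_authority_refs(records, lookup)
```

Note that this coverage counts name occurrences rather than distinct persons; both are plausible readings of a figure like "60% of researcher names", so the choice should be stated when reporting it. The original name string is kept next to the reference rather than replaced by it.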

13 XML Encoding in CMDI
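The actual encoding is not reproduced in this transcript. As a rough, hypothetical sketch of the general idea, the name string stays as written and an authority reference is added next to it. The element names `Person`, `Name` and `AuthorityRef` and the `issuer` attribute are invented for illustration and omit the CMDI namespace; the real components are those defined in the CLARIN component registry.

```python
import xml.etree.ElementTree as ET

def person_component(name, authority_uri=None, issuer=None):
    """Build a hypothetical <Person> fragment: the name as written plus,
    when known, a reference to its authority record."""
    person = ET.Element("Person")
    ET.SubElement(person, "Name").text = name
    if authority_uri is not None:
        ref = ET.SubElement(person, "AuthorityRef")
        ref.text = authority_uri
        if issuer is not None:
            ref.set("issuer", issuer)  # which authority issued the identifier
    return person

fragment = person_component("Example Person", "http://example.org/authority/person-1", "GND")
print(ET.tostring(fragment, encoding="unicode"))
```

Keeping the reference optional means profiles can adopt it without invalidating records for which no authority entry exists.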

14 Related Work
- CLAVAS initiative for VLO data curation
- Effort to bring CMDI-based metadata closer to other metadata standards
  - mapping CMDI to MARC21
  - mapping CMDI-based representations to RDF
  - using persistent URIs (pURIs) whenever possible
→ Authority files as a high-quality source
→ links data to other data sources, in particular library catalogues

15 Connection with other available data/linked data
Metadata publicly accessible
- Open protocol
- Standard format
- Widely used standard (!)
Definition of data categories accessible
- Human readable
- Examples
BUT:
- No SPARQL endpoints → trivial
- Data format not RDF → conversion exists
- Conversion of data formats into RDF != linked data → but with authority file references...
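Why authority references make the RDF conversion worthwhile can be seen in the triples themselves: a creator given as an authority URI is a node that other datasets, such as library catalogues, can also point to, while a bare name literal links to nothing. A minimal sketch emitting N-Triples lines; `dcterms:creator` is the real DCMI term, but the resource and authority URIs are hypothetical, and a full converter would use an RDF library rather than string formatting.

```python
DCTERMS_CREATOR = "http://purl.org/dc/terms/creator"

def nt_literal(text):
    """Quote a string as an N-Triples literal (escaping backslash, quote, LF, CR)."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

def creator_triples(resource_uri, creators):
    """creators: list of (name string, authority URI or None).
    With a URI the creator becomes a shared, linkable node; without one,
    only an isolated string literal remains."""
    triples = []
    for name, authority_uri in creators:
        obj = f"<{authority_uri}>" if authority_uri else nt_literal(name)
        triples.append(f"<{resource_uri}> <{DCTERMS_CREATOR}> {obj} .")
    return triples

for line in creator_triples(
    "http://example.org/resource/r1",
    [("Researcher A", "http://example.org/authority/person-1"), ("Researcher B", None)],
):
    print(line)
```

The literal fallback only preserves the name; it is exactly the "RDF but not linked data" case from the slide.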

16 Conclusion
- Authority files are an excellent data source
- Use of authority files greatly improves metadata quality
- If all CMDI data providers adopted AFs:
  - search through aggregated data sources would improve
  - linking to library catalogues would be better
  - researchers would be linked both to their traditional publications and to the research data they created
- Data sharing at the URI level pays off and makes CMDI metadata part of the linked open data world
- Conversion of CMDI to RDF becomes sensible

