Enhancing the Quality of Metadata by using Authority Control Thorsten Trippel, Claus Zinn LDL 2016 Workshop at LREC May 23-28, Portorož (Slovenia)

Presentation transcript:

Enhancing the Quality of Metadata by using Authority Control Thorsten Trippel, Claus Zinn LDL 2016 Workshop at LREC May 23-28, Portorož (Slovenia) Department of General and Computational Linguistics

Overview
Virtual Language Observatory
- nearly language resources
- described with CMDI-based metadata
- harvested from about 30 centres via OAI-PMH
- considerable amount of heterogeneity in the data
- negative impact on VLO usability
Tackle data heterogeneity
- use of controlled vocabulary whenever possible, in particular,
- use of authority files for identification of persons, organisations, geographic places used in the library world
- bringing together VLO cataloguing with library cataloguing
- improve facet browsing experience in the VLO
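The harvesting step can be illustrated with a minimal sketch. OAI-PMH requests are plain HTTP GET URLs; the `ListRecords` verb and the `metadataPrefix` and `resumptionToken` arguments come from the protocol, whereas the endpoint URL and the `cmdi` prefix value are assumptions that would differ per centre. The sketch only builds the request URLs and sends nothing.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="cmdi", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (no network access)."""
    if resumption_token is not None:
        # resumptionToken is an exclusive argument in OAI-PMH:
        # it is sent without metadataPrefix when fetching follow-up pages.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and token, for illustration only.
ENDPOINT = "https://repository.example.org/oai"
first_page = list_records_url(ENDPOINT)
next_page = list_records_url(ENDPOINT, resumption_token="abc123")
```

A real harvester would fetch `first_page`, read the resumption token from the response, and repeat until no token is returned.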

CMDI & VLO
Virtual Language Observatory
- faceted browser, giving access to items via a dozen facets
- items described with different CMDI profiles from about 30 data providers
- with considerable amount of data heterogeneity, expensive curation
Component MetaData Infrastructure (CMDI)
- element-in-element, lego-brick approach to metadata modeling
- basic descriptors (concept registry), components (component registry)
- metadata standard in the CLARIN world

Two issues:
- Data curation
- Connection with other available data

O.K. Three Availability of content of language resources - Copyright

Connection with other available data/linked data
Metadata publicly accessible
- Open protocol
- Standard format
- Widely used standard (!)
Definition of data categories accessible
- Human readable
- examples
BUT:
- No SPARQL endpoints
- Data format not RDF
- Conversion of data formats into RDF != linked data

Data Curation Issues VLO list of organizations

BBAW metadata (DNB) VLO list of organizations

Authority Organizations
- GND (Gemeinsame Normdatei) 2.5M persons
- VIAF (Virtual International Authority File)
- ISNI (International Standard Name Identifier) 2.5M persons
- Library of Congress Control Number
- Open Researcher and Contributor ID (ORCID) 1M persons
- ResearcherId, GeoNames, […]
all offer persistent URLs, and are widely used

Authority file information for the authors

Enhancing Metadata with authority identifiers
Interlinking with different resources possible
- Common identifier
- Link into the linked data cloud
Robust against non-normalized spelling
- Change of names (!)
- Language variation
Variety of different authority providers possible
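Why a common identifier is robust against spelling and language variation can be shown with a small sketch. The records and the `authority.example.org` URIs below are made up for illustration and are not real authority file entries; the point is only that grouping by identifier collapses variants into a single facet value.

```python
from collections import defaultdict

# Hypothetical (name string, authority URI) pairs as they might appear
# in harvested metadata. The URIs are placeholders, not real entries.
records = [
    ("Max Planck Institute for Psycholinguistics", "https://authority.example.org/org/1"),
    ("MPI for Psycholinguistics", "https://authority.example.org/org/1"),
    ("Max-Planck-Institut für Psycholinguistik", "https://authority.example.org/org/1"),
    ("Universität Tübingen", "https://authority.example.org/org/2"),
    ("University of Tuebingen", "https://authority.example.org/org/2"),
]


def group_by_authority(pairs):
    """Collect all name variants that share the same authority URI."""
    groups = defaultdict(set)
    for name, uri in pairs:
        groups[uri].add(name)
    return dict(groups)


facet = group_by_authority(records)
```

Five distinct strings end up as two facet values, without any string normalization or fuzzy matching.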

Add Authority Files to Metadata for linguistic resources according to the CMDI Framework
- associate strings of persons, organizations, etc. with their respective authority file
- definition of concepts and components in CLARIN registries
- adapting all CMDI schemas to include components allowing authority file references
- adding AF information to all CMDI instances
  - about 60% of all researcher names have an AF record
  - nearly all organizations have an authority file entry
  - similarly, for geographic locations

XML Encoding in CMDI
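The XML shown on this slide is not part of the transcript. As a rough sketch of what an authority file reference next to a name string could look like, the snippet below builds a small XML fragment. All element names (`Person`, `Name`, `AuthoritativeIDs`, `AuthoritativeID`, `id`, `issuingAuthority`) are hypothetical and are not taken from an actual CMDI component definition; the identifier is the fictitious sample researcher that ORCID uses in its own documentation.

```python
import xml.etree.ElementTree as ET


def person_with_authority(name, authority, uri):
    """Build a person element carrying a name string and one authority reference.

    Element names are illustrative assumptions, not an actual CMDI component.
    """
    person = ET.Element("Person")
    ET.SubElement(person, "Name").text = name
    ids = ET.SubElement(person, "AuthoritativeIDs")
    entry = ET.SubElement(ids, "AuthoritativeID")
    ET.SubElement(entry, "id").text = uri
    ET.SubElement(entry, "issuingAuthority").text = authority
    return person


person = person_with_authority(
    "Josiah Carberry",
    "ORCID",
    "https://orcid.org/0000-0002-1825-0097",
)
xml_text = ET.tostring(person, encoding="unicode")
```

Keeping the name string alongside the identifier leaves existing displays and searches working, while the identifier carries the unambiguous reference.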

Related Work
CLAVAS initiative for VLO data curation effort
Bringing CMDI-based metadata closer towards other metadata standards
- mapping CMDI to MARC21
- mapping CMDI-based representation to RDF
- using pURIs whenever possible
→ Authority files as high quality source
→ links data to other data sources, in particular, library catalogues

Connection with other available data/linked data
Metadata publicly accessible
- Open protocol
- Standard format
- Widely used standard (!)
Definition of data categories accessible
- Human readable
- examples
BUT:
- No SPARQL endpoints → trivial
- Data format not RDF → conversion exists
- Conversion of data formats into RDF != linked data → But with authority file references

Conclusion
Authority files are an excellent data source
Use of authority files greatly improves metadata quality
If all CMDI data providers adopted AFs
- search through aggregated data sources would improve
- better linking to library catalogues
- researchers linked to their traditional publications and to the research data they created
Data sharing at the URI level pays off and makes CMDI metadata part of the linked open data world
Conversion of CMDI to RDF becomes sensible
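A minimal sketch of why authority URIs make an RDF conversion worthwhile: once a creator is identified by a URI rather than a string, a single triple connects the resource to everything else that uses the same URI. The resource URI is a placeholder, the ORCID iD is ORCID's fictitious sample researcher, and the choice of the Dublin Core `creator` property is an assumption for illustration, not the mapping used by the existing CMDI-to-RDF conversion.

```python
# Dublin Core Terms "creator" property; choosing it here is an assumption.
DCTERMS_CREATOR = "http://purl.org/dc/terms/creator"


def creator_triple(resource_uri, authority_uri):
    """Serialize one resource-to-creator link as an N-Triples line."""
    return f"<{resource_uri}> <{DCTERMS_CREATOR}> <{authority_uri}> ."


triple = creator_triple(
    "https://repository.example.org/resource/42",  # hypothetical resource URI
    "https://orcid.org/0000-0002-1825-0097",       # ORCID's sample iD
)
```

Had the object been a literal name string instead, the triple would be valid RDF but would link to nothing, which is the difference the slides draw between RDF conversion and linked data.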