Bibliographic data in the Semantic Web – what issues do we face in getting it there? Gordon Dunsire Presented to the ALCTS Cataloging and Classification Section Executive Committee Forum, ALA Annual, 24 June 2011
Overview: Introduction to linked data and the Semantic Web. From record to statement: a paradigm shift. Some issues.
Linked data and RDF: Resource Description Framework (RDF). Designed for machine processing of metadata at global scale (Semantic Web): 24/7/365, trillions of operations per second. Everything must be disambiguated: machines are dumb. Simplicity helps! Machine-readable identifiers.
RDF triple: Metadata expressed as atomic statements. A simple, single, irreducible statement: "The title of this book is Cataloguing is fun!" Constructed in 3 parts (a triple). Subject of the statement = Subject: This book. Nature of the statement = Predicate: has title. Value of the statement = Object: Cataloguing is fun! This book – has title – Cataloguing is fun! (subject – predicate – object)
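The three-part structure on this slide can be sketched in Python. This is only an illustration and not part of the original slides; the `Triple` type and its field names are assumptions chosen for readability.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One atomic statement (illustrative type, not from the slides)."""
    subject: str
    predicate: str
    obj: str


# The slide's example statement, split into its three parts.
statement = Triple(
    subject="This book",
    predicate="has title",
    obj="Cataloguing is fun!",
)

# Render the statement in a "subject - predicate - object" layout.
rendered = " - ".join(statement)
print(rendered)
```

Plain strings are used here only to mirror the slide; in real RDF the subject and predicate would be URIs.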
Machine-readable identifiers: Uniform Resource Identifier (URI). Can be any unique combination of numbers and letters. No intrinsic meaning; it's just an identifier. Can look like a URL. Cool URI: exploits existing processes developed for the World Wide Web, e.g. http://iflastandards.info/ns/isbd/elements/P1001 (but it does not lead to a Web page, in principle...). RDF requires the subject and predicate of a triple to be URIs. The object can be a URI or a literal string ("Cataloguing is fun!").
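The rule stated on this slide (subject and predicate are URIs, the object is a URI or a literal) can be sketched as a small serializer that writes one statement in N-Triples style. This is a simplified sketch under stated assumptions: the subject URI `http://example.org/book/12345` is hypothetical, the slide's P1001 URI is used purely as an example predicate, and the URI check is deliberately rough.

```python
from urllib.parse import urlparse


def is_uri(value: str) -> bool:
    """Rough check: a string with both a scheme and a host counts as a URI.

    A literal that merely looks like a URL would also pass, so real data
    would need to record explicitly whether an object is a URI or a literal.
    """
    parts = urlparse(value)
    return bool(parts.scheme) and bool(parts.netloc)


def to_ntriples(subject: str, predicate: str, obj: str) -> str:
    """Write one triple as an N-Triples-style line (simplified escaping)."""
    if not (is_uri(subject) and is_uri(predicate)):
        raise ValueError("subject and predicate must be URIs")
    if is_uri(obj):
        obj_term = f"<{obj}>"
    else:
        # Escape backslashes and double quotes inside the literal string.
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_term = f'"{escaped}"'
    return f"<{subject}> <{predicate}> {obj_term} ."


line = to_ntriples(
    "http://example.org/book/12345",                      # hypothetical subject URI
    "http://iflastandards.info/ns/isbd/elements/P1001",   # URI shown on the slide
    "Cataloguing is fun!",                                 # literal object
)
print(line)
```

Passing a plain string such as "This book" as the subject raises `ValueError`, which mirrors the slide's point that a machine needs an identifier rather than a description.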
Bibliographic record: 12345
  Title: Cataloguing is fun!
  Author: Mary MacDonald
  Content type: text
  Media type: microform
  LCSH: Cataloging
Name authority record: 8765
  Heading: MacDonald, Mary
Statements (subject, predicate, object):
  b12345   Author            Mary MacDonald
  b12345   Title             Cataloguing is fun!
  b12345   Content type      text
  b12345   Media type        microform
  b12345   LCSH              Cataloging
  n8765    Heading           MacDonald, Mary
  t9876    Preferred label   text
  t1234    Preferred label   microform
  lc1234   Heading           Cataloging
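The move from record to statements on this slide can be sketched as follows. The field names and identifiers (b12345, n8765, t9876, t1234, lc1234) come from the slide; the dictionary layout, the function names, and the step that swaps literal values for authority identifiers are assumptions about how the slide's animation is meant to be read.

```python
# The bibliographic record from the slide, held as a plain dictionary.
record = {
    "id": "b12345",
    "Author": "Mary MacDonald",
    "Title": "Cataloguing is fun!",
    "Content type": "text",
    "Media type": "microform",
    "LCSH": "Cataloging",
}

# Identifiers from the slide for values that have their own authority or
# vocabulary entry (this pairing is an assumed reading of the slide).
authority_ids = {
    ("Author", "Mary MacDonald"): "n8765",
    ("Content type", "text"): "t9876",
    ("Media type", "microform"): "t1234",
    ("LCSH", "Cataloging"): "lc1234",
}


def record_to_triples(rec):
    """Split one record into (subject, predicate, object) statements."""
    subject = rec["id"]
    return [(subject, field, value) for field, value in rec.items() if field != "id"]


def link_to_authorities(triples, lookup):
    """Replace a literal object with an identifier where one is known."""
    return [(s, p, lookup.get((p, o), o)) for s, p, o in triples]


triples = record_to_triples(record)
linked = link_to_authorities(triples, authority_ids)
for triple in linked:
    print(triple)
```

The title stays a literal string because nothing in the lookup identifies it, while the author, content type, media type, and subject heading now point at identifiers that carry their own statements.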
Identifiers for properties: Predicates are known as properties in RDF, e.g. http://iflastandards.info/ns/isbd/elements/P1004 ("has key title"). Properties can be mixed and matched: chosen from different sources (element sets). Different element sets contain similar properties, e.g. http://RDVocab.info/Elements/keyTitleManifestation ("Key title (Manifestation)"). Some element sets are not available in RDF, e.g. MARC21.
Choosing properties/URIs for legacy records: Closest inclusive meaning minimises information loss. Check the definition: ISBD's "has title proper" is better than Dublin Core's "title" ("a name given to the resource"). Check other semantic constraints: RDA's titleManifestation implies that a triple's subject URI is a Manifestation, so it is no good for non-FRBRized records.
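The semantic constraint mentioned on this slide can be sketched as a domain inference: if a property declares a class as its domain, any subject used with that property is inferred to belong to that class. This is a minimal sketch, not an RDF reasoner. The `titleManifestation` URI is assumed by analogy with the `keyTitleManifestation` URI shown in the slides, and the class and record URIs under `example.org` are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Assumed property URI (by analogy with the slides) and hypothetical class URI.
TITLE_MANIFESTATION = "http://RDVocab.info/Elements/titleManifestation"
MANIFESTATION_CLASS = "http://example.org/classes/Manifestation"

# Property -> class declared as its domain (assumed for illustration).
PROPERTY_DOMAINS = {TITLE_MANIFESTATION: MANIFESTATION_CLASS}


def infer_subject_types(triples):
    """Return the type statements implied by each property's domain."""
    inferred = []
    for subject, predicate, _obj in triples:
        domain = PROPERTY_DOMAINS.get(predicate)
        if domain is None:
            continue  # no domain declared, so nothing is implied
        statement = (subject, RDF_TYPE, domain)
        if statement not in inferred:
            inferred.append(statement)
    return inferred


# A legacy, non-FRBRized record described with the RDA property.
legacy = [
    ("http://example.org/record/b12345", TITLE_MANIFESTATION, "Cataloguing is fun!"),
]
inferred = infer_subject_types(legacy)
print(inferred)
```

The inferred statement says the legacy record is a Manifestation, which is exactly the unwanted side effect the slide warns about when the record was never FRBRized.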
Metadata rights: Potential legal minefield. Multiple agencies contributing to one record. Anxiety that others may use open triples to build rival, competitive services. Main rights associated with the record, i.e. as an aggregation of triples? Can a triple be copyrighted if its component URIs are openly published?
Minting URIs for resources: The specific subject of a triple, mainly bibliographic resources. URIs for Persons, Places, etc. are taken from RDF authorities. FRBRized records need a separate URI for the Work, Expression, Manifestation, (Item). Standard identifiers are only a partial solution: ISBN, ISSN, national bibliography numbers, etc. Risk of different agencies creating different URIs for the same resource: inefficient, and costly to maintain namespaces.
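The minting problem on this slide can be sketched as follows. Everything concrete here is hypothetical: the two agency namespaces, the URI pattern, the local record numbers, and the placeholder ISBN key are invented for illustration and do not describe any real service.

```python
from collections import defaultdict

FRBR_LEVELS = ("work", "expression", "manifestation")


def mint_uri(namespace: str, level: str, local_id: str) -> str:
    """Build a URI from an agency namespace, an entity level and a local id."""
    return f"{namespace.rstrip('/')}/{level}/{local_id}"


def mint_frbr_uris(namespace: str, local_id: str) -> dict:
    """Mint one URI per FRBR level for a single local record."""
    return {level: mint_uri(namespace, level, local_id) for level in FRBR_LEVELS}


# Two agencies describe the same book under their own local numbers.
agency_a = mint_frbr_uris("http://a.example.org/id", "12345")
agency_b = mint_frbr_uris("http://b.example.org/id", "98765")

# A shared standard identifier exposes the overlap at manifestation level,
# but says nothing about the matching Work or Expression URIs.
by_isbn = defaultdict(list)
by_isbn["ISBN-PLACEHOLDER"].append(agency_a["manifestation"])
by_isbn["ISBN-PLACEHOLDER"].append(agency_b["manifestation"])

duplicates = {isbn: uris for isbn, uris in by_isbn.items() if len(uris) > 1}
print(duplicates)
```

The sketch shows why standard identifiers are only a partial solution: the ISBN reveals two URIs for one manifestation, yet the Work and Expression URIs minted by each agency remain unconnected.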
Other costs: Providing access to triples: data dump, triple store, data query (SPARQL). URIs should last forever: preservation and archive regime required. De-referencing services: providing human- and machine-readable information about a URI. Cost of re-engineering systems, re-designing interfaces, re-training cataloguers... But long-term benefits will justify the investment.
The Semantic Web ecosystem: Not just professionally generated triples. Machines generate triples by parsing content and semantic inferencing (RDA anticipates...). User-generated tags: the madness (or wisdom) of crowds. Other communities generate relevant triples: memory institutions, publishers, reference services. Everybody uses triples, in ways beyond our dreams...
Thank you. firstname.lastname@example.org Sponsors: ALA; Cataloging & Classification Quarterly; MARCIVE, Inc.