Bibliographic data in the Semantic Web – what issues do we face in getting it there? Gordon Dunsire Presented to the ALCTS Cataloging and Classification Section Executive Committee Forum, ALA Annual, 24 June 2011
Overview Introduction to linked data and the Semantic Web From record to statement: a paradigm shift Some issues
Linked data and RDF Resource Description Framework (RDF) Designed for machine-processing of metadata at global scale (Semantic Web) 24/7/365 Trillions of operations per second Everything must be disambiguated Machines are dumb Simplicity helps! Machine-readable identifiers
RDF triple Metadata expressed as “atomic” statements A simple, single, irreducible statement The title of this book is “Cataloguing is fun!” Constructed in 3 parts “Triple” Subject of the statement = Subject: This book Nature of the statement = Predicate: has title Value of the statement = Object: “Cataloguing is fun!” This book – has title – “Cataloguing is fun!” subject – predicate – object
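The three-part structure described on this slide can be sketched in a few lines of Python. This is only an illustration: the `Triple` class and the `describe` helper are names made up for this example, and the values are still plain strings rather than machine-readable identifiers.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One atomic statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# The statement from the slide, split into its three parts.
statement = Triple(
    subject="This book",
    predicate="has title",
    obj="Cataloguing is fun!",
)


def describe(triple: Triple) -> str:
    """Render a triple in 'subject - predicate - object' order."""
    return f'{triple.subject} - {triple.predicate} - "{triple.obj}"'


print(describe(statement))
```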
Machine-readable identifiers Uniform Resource Identifier (URI) Can be any unique combination of numbers and letters No intrinsic meaning; it’s just an identifier Can look like a URL “Cool” URI: exploits existing processes developed for the World-Wide Web But does not lead to a Web page (in principle...) RDF requires the subject and predicate of a triple to be URIs Object can be a URI, or a literal string (“Cataloguing is fun!”)
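A minimal sketch of the rule on this slide: the subject and predicate must be URIs, while the object may be either a URI or a literal string. The `URI` and `Literal` classes, the `make_triple` and `to_ntriples` helpers, and both `example.org` URIs are assumptions made for illustration; the output line follows the general shape of the N-Triples syntax.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


Node = Union[URI, Literal]


def make_triple(subject: Node, predicate: Node, obj: Node):
    """Build a triple, rejecting literals in subject or predicate position."""
    if not isinstance(subject, URI):
        raise TypeError("subject must be a URI")
    if not isinstance(predicate, URI):
        raise TypeError("predicate must be a URI")
    return (subject, predicate, obj)


def to_ntriples(triple) -> str:
    """Serialise one triple as a single N-Triples style line."""
    def term(node: Node) -> str:
        if isinstance(node, URI):
            return f"<{node.value}>"
        # Escape backslashes and double quotes inside the literal.
        escaped = node.value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return " ".join(term(n) for n in triple) + " ."


# Hypothetical URIs, invented for this example.
book = URI("http://example.org/resource/b12345")
has_title = URI("http://example.org/property/hasTitle")

t = make_triple(book, has_title, Literal("Cataloguing is fun!"))
print(to_ntriples(t))
```

Passing a `Literal` as the subject raises a `TypeError`, which mirrors the constraint that only the object position may hold a plain string.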
Bibliographic record: b12345
Title: Cataloguing is fun!
Author: Mary MacDonald
Content type: text
Media type: microform
LCSH: Cataloging
subject – predicate – object
b12345 – Author – “Mary MacDonald”
b12345 – Title – “Cataloguing is fun!”
b12345 – Content type – “text”
b12345 – Media type – “microform”
b12345 – LCSH – “Cataloging”
Name authority record: n8765
Heading: MacDonald, Mary
n8765 – Heading – “MacDonald, Mary”
Related vocabulary records:
t9876 – Preferred label – “text”
t1234 – Preferred label – “microform”
lc1234 – Heading – “Cataloging”
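The records on this slide can be written as plain triples, and the literal values in the bibliographic record can then be swapped for the identifiers of the matching authority and vocabulary records. The sketch below assumes that this replacement is the intended reading of the slide; the `links` table is filled in by hand, and `link` and `label_of` are hypothetical helper names.

```python
# Triples from the slide, as (subject, predicate, object) tuples.
bibliographic = [
    ("b12345", "Title", "Cataloguing is fun!"),
    ("b12345", "Author", "Mary MacDonald"),
    ("b12345", "Content type", "text"),
    ("b12345", "Media type", "microform"),
    ("b12345", "LCSH", "Cataloging"),
]
authorities = [
    ("n8765", "Heading", "MacDonald, Mary"),
    ("t9876", "Preferred label", "text"),
    ("t1234", "Preferred label", "microform"),
    ("lc1234", "Heading", "Cataloging"),
]

# Assumed, hand-built mapping from (predicate, literal) to an identifier.
# Note that "Mary MacDonald" and "MacDonald, Mary" differ as strings,
# so a naive string match would not find this link.
links = {
    ("Author", "Mary MacDonald"): "n8765",
    ("Content type", "text"): "t9876",
    ("Media type", "microform"): "t1234",
    ("LCSH", "Cataloging"): "lc1234",
}


def link(triples, links):
    """Replace literal objects by identifiers where a link is known."""
    return [(s, p, links.get((p, o), o)) for s, p, o in triples]


def label_of(identifier, triples):
    """Return the first object recorded for an identifier, or None."""
    for s, _, o in triples:
        if s == identifier:
            return o
    return None


linked = link(bibliographic, links)
for triple in linked:
    print(triple)
```

The title stays a literal because no record describes it, while the other four objects now point at records that carry their own labels.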
Identifiers for properties Predicates are known as properties in RDF “has key title” Properties can be mixed and matched Chosen from different sources (element sets) Different element sets contain similar properties “Key title (Manifestation)” Some element sets are not available in RDF E.g. MARC21
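To make "mixing and matching" concrete, the sketch below describes one resource with properties drawn from three element sets and then groups the properties by namespace. The Dublin Core terms namespace is the published one; the `ISBD` and `RDA` namespaces, the property names built on them, and the resource URI are placeholders invented for this example, not the published URIs.

```python
from collections import defaultdict

DCTERMS = "http://purl.org/dc/terms/"  # Dublin Core terms namespace
ISBD = "http://example.org/isbd/"      # placeholder, not the published namespace
RDA = "http://example.org/rda/"        # placeholder, not the published namespace

book = "http://example.org/resource/b12345"  # hypothetical resource URI

# One description, with properties chosen from different element sets.
description = [
    (book, DCTERMS + "title", "Cataloguing is fun!"),
    (book, ISBD + "hasKeyTitle", "Cataloguing is fun!"),
    (book, RDA + "keyTitleManifestation", "Cataloguing is fun!"),
]


def namespace_of(uri: str) -> str:
    """Return the URI up to and including its last '/' or '#'."""
    cut = max(uri.rfind("/"), uri.rfind("#"))
    return uri[: cut + 1]


by_element_set = defaultdict(list)
for _, predicate, _ in description:
    by_element_set[namespace_of(predicate)].append(predicate)

for namespace, properties in by_element_set.items():
    print(namespace, len(properties))
```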
Choosing properties/URIs for legacy records Closest inclusive meaning Minimises information loss Check the definition ISBD’s “has title proper” is better than Dublin Core’s “title” (“A name given to the resource”) Check other semantic constraints RDA’s “titleManifestation” implies a triple’s subject URI is a Manifestation No good for non-FRBRized records
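The semantic constraint mentioned on this slide behaves like a domain declaration: using a property that is declared for Manifestations lets a machine infer that the subject is a Manifestation. The sketch below assumes a hand-written `domains` table and hypothetical `example.org` URIs for the RDA property and the Manifestation class; the `rdf:type` and Dublin Core `title` URIs are the published ones, and `infer_types` is a made-up helper.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DC_TITLE = "http://purl.org/dc/terms/title"

# Hypothetical URIs, invented for this example.
TITLE_MANIFESTATION = "http://example.org/rda/titleManifestation"
MANIFESTATION = "http://example.org/frbr/Manifestation"

# Assumed domain declarations: property URI -> class of its subjects.
domains = {TITLE_MANIFESTATION: MANIFESTATION}


def infer_types(triples, domains):
    """Derive 'subject is of type X' triples from property domains."""
    inferred = []
    for subject, predicate, _ in triples:
        if predicate in domains:
            typed = (subject, RDF_TYPE, domains[predicate])
            if typed not in inferred:
                inferred.append(typed)
    return inferred


legacy_record = "http://example.org/resource/b12345"  # not FRBRized

with_rda = [(legacy_record, TITLE_MANIFESTATION, "Cataloguing is fun!")]
with_dc = [(legacy_record, DC_TITLE, "Cataloguing is fun!")]

print(infer_types(with_rda, domains))  # the record is typed as a Manifestation
print(infer_types(with_dc, domains))   # no type is inferred
```

For a record that has not been FRBRized, the first result is an unintended claim, which is why the slide calls such a property "no good" for legacy data.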
Metadata rights Potential legal minefield Multiple agencies contributing to one record Anxiety that “others” may use open triples to build rival, competitive services Main rights associated with the record? i.e. as an aggregation of triples Can a triple be copyrighted if component URIs are openly published?
“Minting” URIs for resources Specific subject of a triple Mainly bibliographic resources URIs for Persons, Places, etc. taken from RDF “authorities” FRBRized records need separate URI for the Work, Expression, Manifestation, (Item) “Standard” identifiers only a partial solution ISBN, ISSN, national bibliography numbers, etc. Risk of different agencies creating different URIs for the same resource Inefficient, and costly to maintain namespaces
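A rough sketch of what "minting" can look like in practice, assuming each agency builds URIs from its own namespace plus a local record number. The `mint_uri` helper, both agency namespaces, and the local identifier `000987` are made up for illustration. It shows the two points on the slide: a FRBRized record needs one URI per entity, and two agencies describing the same resource end up with different URIs.

```python
from urllib.parse import quote


def mint_uri(namespace: str, entity: str, local_id: str) -> str:
    """Build a URI from an agency namespace, an entity type and a local id."""
    return f"{namespace.rstrip('/')}/{entity}/{quote(local_id, safe='')}"


# A FRBRized record needs a separate URI for each entity it describes.
frbr_uris = {
    entity: mint_uri("http://example.org/agency-a", entity, "b12345")
    for entity in ("work", "expression", "manifestation")
}

# Two agencies catalogue the same resource under their own local numbers.
agency_a = mint_uri("http://example.org/agency-a", "manifestation", "b12345")
agency_b = mint_uri("http://example.net/agency-b/", "manifestation", "000987")

print(frbr_uris)
print(agency_a)
print(agency_b)
print(agency_a == agency_b)  # False: same resource, two different URIs
```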
Other costs Providing access to triples Data-dump, triple store, data query (SPARQL) URIs should last forever Preservation and archive regime required De-referencing services Providing human- and machine-readable information about a URI Cost of re-engineering systems, re-designing interfaces, re-training cataloguers... But long-term benefits will justify the investment
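The data query option listed on this slide boils down to matching triple patterns against a store. The sketch below is not SPARQL and does not use a real triple store; it is a plain Python stand-in, with a hypothetical `match` function and short identifiers in place of full URIs, to show the kind of question such a service answers.

```python
# A tiny in-memory "store" of (subject, predicate, object) tuples.
store = [
    ("b12345", "Title", "Cataloguing is fun!"),
    ("b12345", "Author", "n8765"),
    ("n8765", "Heading", "MacDonald, Mary"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# Everything stated about b12345.
print(match(store, subject="b12345"))

# Follow the author link to the heading in the authority record.
author = match(store, subject="b12345", predicate="Author")[0][2]
print(match(store, subject=author, predicate="Heading"))
```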
The Semantic Web ecosystem Not just professionally-generated triples Machines generate triples by parsing content and semantic inferencing RDA anticipates... User-generated tags The madness (or wisdom) of crowds Other communities generate relevant triples Memory institutions, publishers, reference services Everybody uses triples In ways beyond our dreams...
Thank you Sponsors ALA Cataloging & Classification Quarterly MARCIVE, Inc.