1 Bibliographic data in the Semantic Web – what issues do we face in getting it there? Gordon Dunsire Presented to the ALCTS Cataloging and Classification Section Executive Committee Forum, ALA Annual, 24 June 2011

2 Overview
• Introduction to linked data and the Semantic Web
• From record to statement: a paradigm shift
• Some issues

3 Linked data and RDF
• Resource Description Framework (RDF)
• Designed for machine-processing of metadata at global scale (Semantic Web)
• 24/7/365
• Trillions of operations per second
• Everything must be dis-ambiguated
• Machines are dumb
• Simplicity helps!
• Machine-readable identifiers

4 RDF triple
• Metadata expressed as “atomic” statements
• A simple, single, irreducible statement
• The title of this book is “Cataloguing is fun!”
• Constructed in 3 parts
• “Triple”
• The title of this book is “Cataloguing is fun!”
• Subject of the statement = Subject: This book
• Nature of the statement = Predicate: has title
• Value of the statement = Object: “Cataloguing is fun!”
• This book – has title – “Cataloguing is fun!”
• subject – predicate – object
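The three-part statement on this slide can be sketched in a few lines of Python. This is only an illustration: the `Triple` type, its field names, and the `render` helper are assumptions made for the example, not part of RDF or of the presentation.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One atomic statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str  # named "obj" to avoid shadowing Python's built-in "object"


def render(triple: Triple) -> str:
    """Return the statement in 'subject - predicate - object' form."""
    return f"{triple.subject} - {triple.predicate} - {triple.obj}"


# The example statement from the slide, using plain labels rather than URIs.
statement = Triple("This book", "has title", "Cataloguing is fun!")
print(render(statement))
```

The labels here are human-readable strings; in real RDF the subject and predicate would be identifiers rather than words.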

5 Machine-readable identifiers
• Uniform Resource Identifier (URI)
• Can be any unique combination of numbers and letters
• No intrinsic meaning; it’s just an identifier
• Can look like a URL
• “Cool” URI: exploits existing processes developed for the World-Wide Web
• But does not lead to a Web page (in principle...)
• RDF requires the subject and predicate of a triple to be URIs
• Object can be a URI, or a literal string (“Cataloguing is fun!”)
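The rule in the last two bullets (subject and predicate must be URIs; the object may be a URI or a literal) can be expressed as a small type check. The sketch below is a hedged illustration: the `URI`, `Literal`, and `Triple` classes and the `http://example.org/...` identifiers are hypothetical and are not taken from the presentation or from any RDF library.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class Triple:
    subject: URI
    predicate: URI
    obj: Union[URI, Literal]

    def __post_init__(self) -> None:
        # Enforce the constraint at construction time.
        if not isinstance(self.subject, URI):
            raise TypeError("subject must be a URI")
        if not isinstance(self.predicate, URI):
            raise TypeError("predicate must be a URI")
        if not isinstance(self.obj, (URI, Literal)):
            raise TypeError("object must be a URI or a literal")


# Hypothetical identifiers, used only for illustration.
book = URI("http://example.org/book/b12345")
has_title = URI("http://example.org/property/title")

valid = Triple(book, has_title, Literal("Cataloguing is fun!"))

try:
    Triple(Literal("This book"), has_title, Literal("Cataloguing is fun!"))
    literal_subject_rejected = False
except TypeError:
    literal_subject_rejected = True
```

A literal in the subject position is rejected, while the same literal is accepted as an object.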

6 Bibliographic record:
Title: Cataloguing is fun!
Author: Mary MacDonald
Content type: text
Media type: microform
LCSH: Cataloging

subject   predicate        object
b12345    Author           “Mary MacDonald”
b12345    Title            “Cataloguing is fun!”
b12345    Content type     “text”
b12345    Media type       “microform”
b12345    LCSH             “Cataloging”

Name authority record: 8765
Heading: MacDonald, Mary

subject   predicate        object
n8765     Heading          “MacDonald, Mary”
t9876     Preferred label  “text”
t1234     Preferred label  “microform”
lc1234    Heading          “Cataloging”

Identifiers shown in place of the literal objects: n8765 (Author), t9876 (Content type), t1234 (Media type), lc1234 (LCSH)
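The move from record to statements on slide 6 can be sketched as follows. The field values and the identifiers (b12345, n8765, t9876, t1234, lc1234) come from the slide; the function names and the `LINKS` table pairing each literal with its identifier are assumptions made for this example, since the slide does not say how the pairing is established.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# The bibliographic record from slide 6, as field/value pairs.
record: Dict[str, str] = {
    "Author": "Mary MacDonald",
    "Title": "Cataloguing is fun!",
    "Content type": "text",
    "Media type": "microform",
    "LCSH": "Cataloging",
}

# Assumed link table: (predicate, literal) -> identifier from the slide.
LINKS: Dict[Tuple[str, str], str] = {
    ("Author", "Mary MacDonald"): "n8765",
    ("Content type", "text"): "t9876",
    ("Media type", "microform"): "t1234",
    ("LCSH", "Cataloging"): "lc1234",
}


def record_to_triples(record_id: str, fields: Dict[str, str]) -> List[Triple]:
    """Split one record into one triple per field."""
    return [(record_id, field, value) for field, value in fields.items()]


def link_objects(triples: List[Triple]) -> List[Triple]:
    """Swap a literal object for an identifier where a link is known."""
    return [(s, p, LINKS.get((p, o), o)) for s, p, o in triples]


literal_triples = record_to_triples("b12345", record)
linked_triples = link_objects(literal_triples)

for triple in linked_triples:
    print(triple)
```

The title has no entry in the link table, so it stays a literal string, while the other four objects become identifiers that point at authority or vocabulary entries.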

7 Identifiers for properties
• Predicates are known as properties in RDF
• “has key title”
• Properties can be mixed’n’matched
• Chosen from different sources (element sets)
• Different element sets contain similar properties
• “Key title (Manifestation)”
• Some element sets are not available in RDF
• E.g. MARC21

8 Choosing properties/URIs for legacy records
• Closest inclusive meaning
• Minimises information loss
• Check the definition
• ISBD’s “has title proper” is better than Dublin Core’s “title” (“a name given to the resource”)
• Check other semantic constraints
• RDA’s “titleManifestation” implies that a triple’s subject URI is a Manifestation
• No good for non-FRBRized records

9 Metadata rights
• Potential legal minefield
• Multiple agencies contributing to one record
• Anxiety that “others” may use open triples to build rival, competitive services
• Main rights associated with the record?
• i.e. As an aggregation of triples
• Can a triple be copyrighted if component URIs are openly published?

10 “Minting” URIs for resources
• Specific subject of a triple
• Mainly bibliographic resources
• URIs for Persons, Places, etc. taken from RDF “authorities”
• FRBRized records need a separate URI for the Work, Expression, Manifestation, (Item)
• “Standard” identifiers only a partial solution
• ISBN, ISSN, national bibliography numbers, etc.
• Risk of different agencies creating different URIs for the same resource
• Inefficient, and costly to maintain namespaces
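A minimal sketch of URI minting illustrates two points from this slide: a FRBRized record needs one URI per entity, and two agencies minting independently end up with different URIs for the same resource. Everything specific here is hypothetical: the `mint_uri` helper, the two `example.org` and `example.net` namespaces, and the URI pattern are assumptions for illustration, not a recommended scheme.

```python
from urllib.parse import quote


def mint_uri(namespace: str, entity: str, local_id: str) -> str:
    """Build a URI from a namespace, an entity type, and a local identifier."""
    base = namespace.rstrip("/")
    # Percent-encode the local identifier so it is safe inside a URI path.
    return f"{base}/{entity}/{quote(local_id, safe='')}"


# Hypothetical namespaces for two agencies describing the same book.
AGENCY_A = "http://example.org/id/"
AGENCY_B = "http://example.net/resource/"

# One local identifier, one URI per FRBR entity.
frbr_uris = {
    entity: mint_uri(AGENCY_A, entity, "b12345")
    for entity in ("work", "expression", "manifestation")
}

# The same manifestation minted independently by a second agency.
uri_a = frbr_uris["manifestation"]
uri_b = mint_uri(AGENCY_B, "manifestation", "b12345")

print(uri_a)
print(uri_b)
```

Because the two URIs differ, a machine cannot tell that they identify the same resource unless someone also publishes a statement linking them, which is part of the maintenance cost the slide mentions.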

11 Other costs
• Providing access to triples
• Data-dump, triple store, data query (SPARQL)
• URIs should last forever
• Preservation and archive regime required
• De-referencing services
• Providing human- and machine-readable information about a URI
• Cost of re-engineering systems, re-designing interfaces, re-training cataloguers...
• But long-term benefits will justify the investment
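To give a rough idea of what a data query over triples involves, the sketch below matches a simple pattern against an in-memory list. It is not SPARQL and not a triple store: the `match` function and its wildcard convention (`None` matches anything) are assumptions made for this example, and the triples reuse the identifiers from slide 6.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# A tiny in-memory "store" built from the slide 6 example.
store: List[Triple] = [
    ("b12345", "Title", "Cataloguing is fun!"),
    ("b12345", "Author", "n8765"),
    ("b12345", "Media type", "t1234"),
    ("n8765", "Heading", "MacDonald, Mary"),
    ("t1234", "Preferred label", "microform"),
]


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return triples that agree with every part of the pattern that is not None."""
    return [
        (s, p, o)
        for s, p, o in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Everything stated about the book.
about_book = match(store, subject="b12345")

# Follow the Author link to the authority heading.
author_id = match(store, subject="b12345", predicate="Author")[0][2]
heading = match(store, subject=author_id, predicate="Heading")[0][2]
print(heading)
```

Following a link from one triple's object to another triple's subject is the basic operation that a query service has to support at scale.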

12 The Semantic Web ecosystem
• Not just professionally-generated triples
• Machines generate triples by parsing content and semantic inferencing
• RDA anticipates...
• User-generated tags
• The madness (or wisdom) of crowds
• Other communities generate relevant triples
• Memory institutions, publishers, reference services
• Everybody uses triples
• In ways beyond our dreams...

13 Thank you
• Sponsors
• ALA
• Cataloging & Classification Quarterly
• MARCIVE, Inc.

