Toward a post-MARC view of bibliographic metadata
Jean Godby, Senior Research Scientist
Triangle Research Libraries Network workshop, Chapel Hill, North Carolina, March 15, 2012
Post-MARC bibliographic metadata
Outline for today
1. How did I get to this place?
2. The Library of Congress Bibliographic Framework for Digital Resources
3. The OCLC Beyond MARC work agenda
4. Four guiding assumptions
5. Some questions
Translations in the Crosswalk service
[Diagram: input formats (ONIX Books 2.1, ONIX Books 3.0, MODS, Dublin Core, DC-Qualified, MARC, OCLC MARC) are translated through OCLC MARC into the same set of output formats.]
Problems with mapping to and from MARC
Problem: In a MARC record, some critical information is represented redundantly.
Effect on the Crosswalk: Requires one-to-many mappings, which are semantically opaque and difficult to maintain.
Problem: Some MARC fields are ambiguous.
Effect on the Crosswalk: The distinctions they conflate are difficult to recover, or may be lost.
Problem: Many MARC free-text fields have formatting requirements.
Effect on the Crosswalk: The formatting must be added when translating into MARC and stripped out when translating from it.
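To make the redundancy and formatting problems concrete, here is a minimal sketch, assuming a toy record represented as a Python dict (not the structure of any real MARC library). A single publication year has to be written to two places, the coded Date 1 in 008 positions 07-10 and the transcribed date in 260 $c, and a title needs ISBD-style closing punctuation added on the way in and stripped on the way out. All function names are hypothetical.

```python
from typing import Any, Dict

# Hypothetical toy record: tag -> fixed-field string or {subfield: value}.
Record = Dict[str, Any]

# Punctuation ISBD may leave at the end of a MARC subfield (not exhaustive).
ISBD_TRAILING = (" /", " :", " ;", " =", ".", ",")


def date_to_marc(year: str, record: Record) -> None:
    """One-to-many: a single four-digit year is written in two places."""
    fixed = list(record.get("008", " " * 40))
    fixed[7:11] = list(year)  # 008/07-10: Date 1 (coded)
    record["008"] = "".join(fixed)
    # 260 $c: the same date again, transcribed, with a closing period.
    record.setdefault("260", {})["c"] = year + "."


def date_from_marc(record: Record) -> str:
    """Many-to-one: prefer the coded date, fall back to the transcribed one."""
    coded = record.get("008", "")[7:11]
    if coded.isdigit():
        return coded
    transcribed = record.get("260", {}).get("c", "")
    return "".join(ch for ch in transcribed if ch.isdigit())[:4]


def add_isbd(title: str, has_responsibility: bool) -> str:
    """Into MARC: 245 $a ends with ' /' when $c follows, else with a period."""
    return f"{title} /" if has_responsibility else f"{title}."


def strip_isbd(value: str) -> str:
    """Out of MARC: drop one trailing ISBD mark.

    Brittle on purpose: it would also strip the period from "Inc." or "Jr."
    """
    value = value.rstrip()
    for mark in ISBD_TRAILING:
        if value.endswith(mark):
            return value[: -len(mark)].rstrip()
    return value
```

Every rule encoded here lives in the software rather than in the data, which is the maintenance cost the slide describes.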
And so forth, and so on
Problem: Many formatting requirements are stated explicitly only in cataloging rules, not in the data that is processed algorithmically.
Effect on the Crosswalk: Knowledge of the cataloging rules must be embedded in the translation software.
Problem: Some MARC fields are coded with hidden assumptions.
Effect on the Crosswalk: Knowledge of the hidden assumptions must be embedded in the translation software, which requires complex and brittle Boolean logic.
Problem: MARC has a long tail.
Effect on the Crosswalk: A large number of mappings must be maintained even though they are rarely, if ever, used.
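As a rough illustration of the hidden-assumption problem, the sketch below derives a broad resource type from a few coded positions. It uses MARC 21 definitions (Leader/06 type of record, Leader/07 bibliographic level, and the 008 "form of item" code), but the output labels and the small selection of positions are simplifying assumptions; a production crosswalk checks far more. Note that the meaning of an 008 position depends on Leader/06: form of item sits at 008/23 for language material but at 008/29 for maps.

```python
ELECTRONIC_FORMS = ("o", "q", "s")  # online, direct electronic, electronic


def resource_type(leader: str, f008: str) -> str:
    """Map coded MARC positions to a hypothetical broad category label."""
    type_of_record = leader[6]  # Leader/06
    bib_level = leader[7]       # Leader/07

    if type_of_record == "a":  # language material
        # For text, form of item is at 008/23 ...
        form = f008[23] if len(f008) > 23 else " "
        electronic = form in ELECTRONIC_FORMS
        if bib_level in ("s", "i"):  # serial or integrating resource
            return "electronic serial" if electronic else "serial"
        return "ebook" if electronic else "book"

    if type_of_record in ("e", "f"):  # cartographic (printed or manuscript)
        # ... but for maps the same information moves to 008/29.
        form = f008[29] if len(f008) > 29 else " "
        return "electronic map" if form in ELECTRONIC_FORMS else "map"

    if type_of_record in ("c", "d"):  # notated music (printed or manuscript)
        return "musical score"
    if type_of_record == "j":  # musical sound recording
        return "music recording"
    return "other"
```

Each branch encodes an assumption that never appears in the record itself, so a change to the rules means a change to this logic.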
MARC's complexity needs to be quarantined.
[Diagram: the input formats (ONIX Books 2.1, ONIX Books 3.0, MODS, Dublin Core, DC-Qualified, MARC, OCLC MARC) are translated through RDA or another structured metadata vocabulary, rather than OCLC MARC, into the same set of output formats.]
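One way to read "quarantine" is as a hub-and-spoke design in which MARC is just another spoke. The sketch below assumes a neutral intermediate model represented as a plain dict of named properties, standing in for RDA or another structured vocabulary. The registry, the format names, and the field keys (such as "245a" or "dc:title") are hypothetical simplifications.

```python
from typing import Callable, Dict

Model = Dict[str, str]            # neutral intermediate description
Reader = Callable[[dict], Model]  # format -> model
Writer = Callable[[Model], dict]  # model -> format

READERS: Dict[str, Reader] = {}
WRITERS: Dict[str, Writer] = {}


def register(fmt: str, reader: Reader, writer: Writer) -> None:
    """Each format contributes exactly one reader and one writer."""
    READERS[fmt] = reader
    WRITERS[fmt] = writer


def translate(record: dict, source: str, target: str) -> dict:
    # Every translation passes through the model, so MARC-specific
    # rules stay inside MARC's own reader and writer.
    return WRITERS[target](READERS[source](record))


# Toy MARC spoke: keys are tag + subfield code (hypothetical layout).
register(
    "marc",
    lambda r: {"title": r["245a"], "publisher": r["260b"]},
    lambda m: {"245a": m["title"], "260b": m["publisher"]},
)

# Toy Dublin Core spoke.
register(
    "dc",
    lambda r: {"title": r["dc:title"], "publisher": r["dc:publisher"]},
    lambda m: {"dc:title": m["title"], "dc:publisher": m["publisher"]},
)
```

For example, `translate({"245a": "McBain's ladies", "260b": "Mysterious Press"}, "marc", "dc")` yields a Dublin Core dict without the Dublin Core code ever seeing a MARC tag. Adding a format means adding one reader and one writer, not touching the others.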
In other words, with MARC in the center of our model…
"The new bibliographic framework we are aiming for will broaden participation in the network of resources, librarians will be able to do a much better job of linking their patrons to resources of all kinds (from the library and from many other sources), and costs can be better contained."
Library of Congress
Bibliographic framework is... an environment rather than a format.
Source: A Bibliographic Framework for the Digital Age (October 31, 2011)
OCLC's Beyond MARC research agenda
[Word cloud of agenda themes: resource, relationship, manifestation, entity, object, data, abstract, library, RDA, service, format, linked, authority, MARC, carrier, groundtruthing, FRBR, semantic, beyond, content, transformation, RDF, instance, description, statement, schema, role, Hadoop, property, UML, model, identifier, legacy, web, theme]
The OCLC Beyond MARC research agenda: who's involved
Eric Childress, Consulting Product Manager
Jean Godby, Senior Research Scientist
Thom Hickey, Chief Scientist
Devon Smith, Consulting Software Engineer
Karen Smith-Yoshimura, Program Officer
Roy Tennant, Senior Program Officer
Diane Vizine-Goetz, Senior Research Scientist
Jeff Young, Software Architect
Assumption 1: There are many moving targets.
The OCLC Research response: some guiding principles
Don't add to the complexity.
Use publicly defined standards wherever possible.
Leverage the work of others.
Focus on data preparation, cleanup, and modeling that will support a variety of formats.
Data preparation: principles
Source: W3C
Make your stuff available on the web.
Make it available as structured data...
...in a non-proprietary format.
Use URLs to identify things.
Link your data to other people's data.
Source: Karen Coyle
Data, not text
Identifiers, not strings
Statements, not records
Machine-readable schema
Machine-readable lists
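"Identifiers, not strings" and "statements, not records" can be illustrated together. The sketch below turns a flat record into subject-predicate-object statements and swaps heading strings for identifiers where a match is known. It assumes a hypothetical in-memory `AUTHORITY` table and made-up example.org URIs as stand-ins for real authority files; the schema.org property names are used only for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
SCHEMA = "http://schema.org/"

# Hypothetical lookup from heading strings to identifiers (URLs).
AUTHORITY: Dict[str, str] = {
    "Hunter, Evan": "http://example.org/person/hunter-evan",
    "Mysterious Press": "http://example.org/org/mysterious-press",
}


def to_statements(subject: str, record: Dict[str, str]) -> List[Triple]:
    """Break one record into independent statements about `subject`."""
    triples: List[Triple] = []
    for prop, value in record.items():
        # Prefer an identifier; keep the string only when no match exists.
        obj = AUTHORITY.get(value, value)
        triples.append((subject, SCHEMA + prop, obj))
    return triples
```

Applied to a description of McBain's Ladies, the title stays a literal while the author and publisher become links that other people's data can point to.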
Assumption 2: Most bibliographic metadata will not be created by libraries.
Why ONIX is interesting
[Figure: the same book, McBain's Ladies by Evan Hunter (Mysterious Press, 320 p.), described in ONIX (BB; 01 McBain's Ladies; A01 Hunter, Evan; 02 Policewomen--Fiction) and in MARC (020 $a; $a Hunter, Evan; 245 $a McBain's ladies; 260 $b Mysterious Press; $a 320 p.; 650 #2 $a Policewomen -- Fiction). Callouts label the ONIX values as identifiers and data, and the MARC values as text and strings.]
A hypothetical bibliographic description expressed as linked data
[Figure: the McBain's Ladies description as a graph linking the title, the ONIX contributor role A01, and the author, Evan Hunter.]
This list is inadequate for describing the range of material types held by libraries.
Some proposed library extensions to Schema.org.
The extensions are derived from MARC data for the WorldCat search interface.
The WorldCat search interface terms reduce a complex MARC concept space to a list.
Assumption 3: MARC will be around for a while.
Assumption 4: Mapping is still necessary.
A publishing model
[Diagram: raw data in the input formats (ONIX Books 2.1, ONIX Books 3.0, MODS, Dublin Core, DC-Qualified, MARC, OCLC MARC) is mapped to an OCLC abstract model, which is modeled with standard vocabularies (RDA or another structured metadata vocabulary) and produces the same set of output formats.]
It is not enough to RDF-ify MARC.
The concepts must be extracted.
They eventually emerge.
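The difference between RDF-ifying MARC and extracting its concepts can be sketched on a toy record. `rdfify` below turns each tag and subfield into a predicate, which changes the syntax but keeps MARC's punctuation and semantics. `extract` states what the fields mean, for instance treating 650 $v as a genre/form subdivision rather than a topic. The `marc:` and `ex:` prefixes, the field keys, and the cleanup rule are all hypothetical, and real concept extraction is far more involved.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def rdfify(subject: str, fields: Dict[str, str]) -> List[Triple]:
    """Naive: every tag+subfield becomes a predicate; values keep MARC punctuation."""
    return [(subject, "marc:" + key, value) for key, value in fields.items()]


def extract(subject: str, fields: Dict[str, str]) -> List[Triple]:
    """Concept extraction: state what the fields mean, without MARC formatting."""

    def clean(value: str) -> str:
        # Crude removal of trailing ISBD punctuation (would also hit "Inc.").
        return value.rstrip(" /:;,.")

    triples: List[Triple] = []
    if "245a" in fields:
        triples.append((subject, "ex:title", clean(fields["245a"])))
    if "260b" in fields:
        triples.append((subject, "ex:publisher", clean(fields["260b"])))
    if "650a" in fields:
        triples.append((subject, "ex:topic", clean(fields["650a"])))
    if "650v" in fields:
        # 650 $v is a form subdivision, i.e. a genre rather than a topic.
        triples.append((subject, "ex:genre", clean(fields["650v"])))
    return triples
```

Only the second output could be merged with statements from non-MARC sources without MARC-specific knowledge.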
Some (perhaps uncomfortable) questions
1. How much work will be involved in building out the abstract model? What is the value proposition?
2. How can we engage communities of practice to contribute to the parts of the abstract model that describe their resources?
3. How will mappings be implemented in the post-MARC information landscape?
4. How much information in the MARC record will get lost?
5. What will content standards look like in post-MARC descriptions?
6. How many of the FRBR and RDA concepts are algorithmically recoverable from legacy data?
7. What happens if linked data does not live up to its promise or is not adopted quickly enough?
But maps from many MARC concepts look like this. Set-theoretic mappings can be implemented elegantly in RDF/OWL.
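The idea of a set-theoretic mapping can be sketched without an RDF library. Below, each hypothetical MARC-derived class is declared a subclass of one or more target classes, mirroring `rdfs:subClassOf`, and class membership is inferred transitively, as an RDFS reasoner would do. The class names are illustrative assumptions, not OCLC's model. Note that `rdfs:subClassOf` only states a subset relation; saying that a MARC concept is exactly the intersection of two classes would use `owl:equivalentClass` with `owl:intersectionOf`.

```python
from typing import Dict, List, Set

# Hypothetical subclass statements: class -> its direct superclasses.
SUBCLASS_OF: Dict[str, Set[str]] = {
    # A MARC-derived concept falls within two broader target classes.
    "marc:ElectronicBook": {"ex:Book", "ex:ElectronicResource"},
    "marc:PrintBook": {"ex:Book"},
    "ex:Book": {"ex:CreativeWork"},
    "ex:ElectronicResource": {"ex:CreativeWork"},
}


def superclasses(cls: str) -> Set[str]:
    """All classes that `cls` is (transitively) a subclass of."""
    seen: Set[str] = set()
    stack: List[str] = [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(asserted: str) -> Set[str]:
    """A resource typed `asserted` is also a member of every superclass."""
    return {asserted} | superclasses(asserted)
```

The mapping is declared as data (subclass statements) rather than as branching code, which is what makes this style easier to maintain than the Boolean logic a MARC crosswalk usually needs.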
References
Coyle, Karen. MARC 21 as data: a start.
Coyle, Karen. Taking library data from here to there.
Godby, Carol Jean. From records to streams: merging library and publisher metadata.
Library of Congress. A bibliographic framework for the digital age.
Library Linked Data Incubator Group. Final report.
OCLC. FAST Linked Data.
Schema.org.
Smith-Yoshimura, Karen, et al. Implications of MARC tag usage on library metadata practices.