© Rightscom – All rights reserved Testbed for Interoperable Metadata for Ebooks Hugh Look (Project Manager)

TIME: presentation to Discovery and Access seminar 13 December 2006 © Rightscom 2006 – All rights reserved
Testbed for Interoperable Metadata for Ebooks
Which spells… TIME
Weird coincidence, isn't it?

We built a TIME machine
  The team included the very memorable Kane Richmond as Brick Bradford & Linda Leighton as June Salisbury
  And, of course, the Time Top

When they heard we were building a time machine, of course the client wanted…

So we said: hold on… it's only a testbed…

Overview of the project
  Objectives
    > To develop a testbed system to support ebook cataloguing
    > "The testbed will help provide solutions to one of the key challenges identified for the takeup of ebooks: the lack of standardised e-book catalogue records and also the lack of interoperability between different e-book metadata records."
  The key participants
    > EPICentre
    > Rightscom
  Supported by
    > Book Industry Communications
    > Helen Henderson

Overview of the project (Cont.)
  Formats we are transforming
    > Relevance confirmed by librarians & VLE specialists
  Dublin Core – Simple and Qualified
  ONIX
  MARC
  LOM
    > No publishers in the project are using this at present, so we will transform other formats to LOM, but not from it
    > Not specifically an e-book standard
    > LOM input can be added later (as can any other format)

Overview of the project (Cont.)
  Key concept: map to and from a single intermediate format
    > Intermediate format is comprehensive
    > Extensible to new formats
  Data
    Records were obtained from publishers and intermediaries (to whom many thanks are due):
    > Oxford University Press
    > Taylor and Francis
    > Cambridge University Press
    > OCLC
    A total of 1886 records were received, in DC, MARC and ONIX formats.
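The case for a single intermediate format can be made with simple arithmetic. The sketch below is illustrative only and is not taken from the testbed: the function names are hypothetical, and it assumes each transform works in one direction.

```python
def pairwise_transforms(n_formats: int) -> int:
    """Directed format-to-format transforms needed when there is no hub."""
    return n_formats * (n_formats - 1)


def hub_transforms(n_inputs: int, n_outputs: int) -> int:
    """Transforms needed with a hub: one into it per input format,
    one out of it per output format."""
    return n_inputs + n_outputs


# Four formats (Dublin Core, ONIX, MARC, LOM), all read and written.
full_pairwise = pairwise_transforms(4)
full_hub = hub_transforms(4, 4)

# The testbed as described: LOM is output only, so three inputs, four outputs.
testbed_hub = hub_transforms(3, 4)

# Cost of adding a fifth format that is both read and written.
extra_pairwise = pairwise_transforms(5) - pairwise_transforms(4)
extra_hub = hub_transforms(5, 5) - hub_transforms(4, 4)
```

Under these assumptions, four fully supported formats need 12 pairwise transforms but only 8 through a hub, and a fifth format adds 8 pairwise transforms against 2 hub transforms.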

Requirements process
  Review of requirements from documentation
  Requirements analysis focused on needs of libraries
  Range of documents identified
  None contain complete requirements
  Synthesis presented to workshop

Requirements process: standards at the centre
  Requirements validation workshop
  Focused on standards
  No radical disagreements or additions to synthesis
  Confirmed standards identified during analysis process were appropriate
  No other significant issues identified

Delivery
  Working transform system
  Simple user interface
  Testbed released to JISC
  Packaged for installation by further testers

Standards (brace yourselves)

[Figure: STANDARDS landscape, with time markers 1980s, mid 90s and today]
  Sectors and domains: Books, Journals, Magazines, Newspapers, Texts, eBooks, Music, Audio, Audiovisual, Multimedia, Libraries, Archives, Museums, Education, Copyright, Technology
  Standards: FRBR, Handle, ISRC, ISAN, ISMN, CIS, Dublin Core, IMS, DOI, IIM, ISWC, URL, URN, SICI, MARC, CAE, ISBN, ISSN, EAN, UPC, ISO codes, IPI, UMID, ISTC, SMPTE, DMCS, EPICS, ONIX, LOM, abc, MPEG-7, MPEG-21, ISO11179, RDF, XML schema, IPDA, PRISM, OeBF, NITF, CIDOC, CrossRef, P/META, XrML, URI, BICI, MPEG21 RDD/REL, MI3P, SCORM, NewsML, GRid, MPid, MWLI, SAN, V-ISAN, ERMI, DAISY, METS, MODS, OWL

The testbed
  [Diagram]
  Centre: eBook Catalogue – Common (generic) semantic and syntactic format
  Formats shown on each side: MARC, Dublin Core, ONIX, LOM
  Label: Many data formats

The longer-term potential
  [Diagram]
  Centre: eBook Catalogue – Common (generic) semantic and syntactic format
  Formats shown on each side: MARC, Dublin Core, ONIX, LOM
  Additional label: Other

Technical
  Technology & tools
    > Fedora Open Source XML repository
    > XML schemas and XSLT transforms
    > Internal generic representation: Contextual Ontologyx Architecture (COA)
    > OAI-PMH compliance

Issues for interoperability
  The hub needs to be at least as rich as all of the spokes put together
  The value mappings need to preserve all their semantics in the hub

Mapping to COA
  MARC
    tag=100
    subfield a=Kriegel, Mark
    subfield e=author
  Dublin Core
    creator=Kriegel, Mark
  ONIX
    Contributor
      ContributorRole=B01
      PersonNameInverted=Kriegel, Mark
      NamesBeforeKey=Mark
      KeyName=Kriegel
  COA
    A IsA Resource
    A IsA EBook
    A HasAuthor B
    B IsA Party
    B HasName C
    C HasNameInverted Kriegel, Mark
    C HasNamePart D
    D HasValue Kriegel
    D IsA KeyName
    D HasIdentifier E
    E HasValue 1
    E IsA SequenceNumber
    C HasNamePart F
    F HasValue Mark
    F IsA NamesBeforeKeyName
    F HasIdentifier G
    G HasValue 2
    G IsA SequenceNumber
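To make the COA column concrete, here is a minimal sketch that stores those statements as (subject, relation, object) tuples and reads a Dublin Core creator back out of them. The statements are adapted from the slide, with the surname spelled Kriegel throughout; the helper functions and their names are hypothetical and say nothing about how the testbed itself is implemented.

```python
# Statements adapted from the COA column of the slide.
TRIPLES = [
    ("A", "IsA", "Resource"),
    ("A", "IsA", "EBook"),
    ("A", "HasAuthor", "B"),
    ("B", "IsA", "Party"),
    ("B", "HasName", "C"),
    ("C", "HasNameInverted", "Kriegel, Mark"),
    ("C", "HasNamePart", "D"),
    ("D", "HasValue", "Kriegel"),
    ("D", "IsA", "KeyName"),
    ("D", "HasIdentifier", "E"),
    ("E", "HasValue", "1"),
    ("E", "IsA", "SequenceNumber"),
    ("C", "HasNamePart", "F"),
    ("F", "HasValue", "Mark"),
    ("F", "IsA", "NamesBeforeKeyName"),
    ("F", "HasIdentifier", "G"),
    ("G", "HasValue", "2"),
    ("G", "IsA", "SequenceNumber"),
]


def objects(subject, relation):
    """All objects linked to `subject` by `relation`, in statement order."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]


def dc_creators(resource):
    """Follow HasAuthor -> HasName -> HasNameInverted to get dc:creator values."""
    creators = []
    for party in objects(resource, "HasAuthor"):
        for name in objects(party, "HasName"):
            creators.extend(objects(name, "HasNameInverted"))
    return creators


def name_parts(name):
    """Map each name part's type (KeyName, ...) to its value."""
    parts = {}
    for part in objects(name, "HasNamePart"):
        part_type = objects(part, "IsA")[0]
        parts[part_type] = objects(part, "HasValue")[0]
    return parts


creators = dc_creators("A")
parts = name_parts("C")
```

The same statements answer both the flat Dublin Core question (one inverted name string) and the structured ONIX question (separate key name and names before key), which is the sense in which the hub has to carry at least as much detail as the richest spoke.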

Scheme-to-scheme mapping issues
  ONIX: rich and well-structured; a good input format, creating accurate if limited output in MARC or DC
  MARC: rich, but not always well defined or unambiguous; weaker as an input format (made to be read by humans, not computers)
  Dublin Core: input data weak and often uncontrolled, so transforms no better
    > But can output richer Qualified Dublin Core from both MARC and ONIX
  LOM: pedagogic classifications not generally captured in MARC, ONIX or DC, so poor match at that level
    > But even weak transforms can create basic records that can be added to later

Semantic loss: relative strength of metadata schemes
  Transformations both in and out of Dublin Core were generally poor
    > Relative semantic poverty and ambiguity
  As a source schema, unqualified or lightly-qualified Dublin Core has huge limitations
    > dc:date may be the date of creation, date of publication (and if so, where?) or of anything else
    > Unless a default assumption is made, such data cannot be transformed and is lost
    > dc:identifier often does not provide the IdentifierType, which renders it meaningless
    > Text in dc:coverage may mean more or less anything
    > No controlled values in basic DC: code lists such as those supported by ONIX and MARC cannot be mapped into it
  Dublin Core as a basis for automated transformation is effectively a non-starter
    > Has its uses as a human-readable record
  As an output schema, DC does much better
    > Good DC records can be produced from ONIX or MARC input
  Both ONIX and MARC are good as source schemas for descriptive eBook metadata
    > Some inherent limitations, but most of these can be overcome
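The dc:date point can be illustrated with a short sketch. It assumes the hub needs to know what kind of date it is storing; the hub role names (DateOfCreation, DateOfPublication) and the function names are hypothetical, while dcterms:created and dcterms:issued are the standard qualified DCMI date terms.

```python
# Qualified date terms carry their own meaning; hub role names are hypothetical.
QUALIFIED_ROLES = {
    "dcterms:created": "DateOfCreation",
    "dcterms:issued": "DateOfPublication",
}


def date_role(element, default_role=None):
    """Return the hub role for a date element, or None if it cannot be known."""
    if element in QUALIFIED_ROLES:
        return QUALIFIED_ROLES[element]
    if element == "dc:date":
        # Unqualified: usable only if someone has supplied a default assumption.
        return default_role
    return None


def transform_dates(fields, default_role=None):
    """Split (element, value) pairs into those kept in the hub and those lost."""
    kept, lost = [], []
    for element, value in fields:
        role = date_role(element, default_role)
        if role is None:
            lost.append((element, value))
        else:
            kept.append((role, value))
    return kept, lost


record = [("dc:date", "2005"), ("dcterms:issued", "2006")]
kept_strict, lost_strict = transform_dates(record)
kept_default, lost_default = transform_dates(record, default_role="DateOfPublication")
```

Without a default, the unqualified date is dropped; with one, it is kept, but only on the strength of an assumption that may be wrong for any given record.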

Data quality issues: errors
  Generally the quality of data supplied was good (but small sample)
    > Amount of data contained in each record
    > Homogeneity of metadata from record to record
  Input data inevitably contains errors
    > Random and systematic

Data quality issues: errors (cont)
  Systematic
  Most frequent was misinterpretation of some fields: used for data that belongs elsewhere
    > One set of ONIX input data included address of sender in FromPerson element (There is a specific From element available for this)
  Use of a wrong ONIX code
    > Another, derived from print book data, contained ONIX format codes showing each eBook incorrectly to be either a Hardback or a Paperback book

Data quality issues: errors (cont)
  Random
  Discovered by chance when analysing a few test files
    > E.g. affiliated role of an author (Professor Of Physics) sometimes included with his affiliated institute (University Of Somewhere), sometimes included in a separate field
  For random data errors, nothing can be done post hoc apart from manual correction when an error is detected
    > Needs improved QA processes at source
  Considerable scope for post-hoc management of systematic or habitual errors
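A systematic error of the hardback/paperback kind can, in principle, be corrected by rule once it has been identified. The sketch below shows one possible shape for such a rule. The source identifier, the record layout and the function name are hypothetical, and the codes BB (hardback), BC (paperback) and DG (electronic book text) are assumed from the ONIX 2.1 product form code list.

```python
PRINT_FORMS = {"BB", "BC"}   # assumed ONIX 2.1 codes: hardback, paperback
EBOOK_FORM = "DG"            # assumed ONIX 2.1 code: electronic book text

# Sources known (hypothetically) to send print codes in an eBook-only feed.
EBOOK_ONLY_SOURCES = {"publisher-x"}


def correct_product_form(record):
    """Return a copy of the record with a known systematic error repaired.

    Only records from listed sources are touched, so the rule cannot
    damage genuine print records arriving from anywhere else.
    """
    fixed = dict(record)
    if (record.get("source") in EBOOK_ONLY_SOURCES
            and record.get("ProductForm") in PRINT_FORMS):
        fixed["ProductForm"] = EBOOK_FORM
        fixed["corrected"] = True  # keep a trace of the intervention
    return fixed


bad = {"source": "publisher-x", "ProductForm": "BB", "title": "Example"}
other = {"source": "publisher-z", "ProductForm": "BB", "title": "Example"}
fixed_bad = correct_product_form(bad)
fixed_other = correct_product_form(other)
```

Random errors offer no such pattern to key a rule on, which is why they still depend on manual correction and better QA at source.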

Data quality issues: variant schemas
  Variations in the way in which a particular schema is implemented
    > One publisher used an ONIX format code to indicate the code of the source printed book, instead of the eBook
    > Another supplier had a set of controlled values which they used for a particular MARC tag which were not standard, but were internally consistent
    > In another case MARC tag 043 (Geographic Area Code) was apparently used in a particular and consistent way by the supplier, but as MARC itself is non-specific, nothing could be done with the data in the general scheme
  MARC users also have their own variant practice
    > Especially in the use of internal 900 tags

Data quality issues: variant schemas (cont)
  It has been remarked that as there are over 50 UK publishers now providing data in ONIX, there are over 50 variant ONIX schemas
  Not in principle a problem for the COAX one-to-many approach
    > Variant mappings can be made for different sources where consistent behaviour is identified
  Maintaining such variations in conventional pairwise mapping is very resource-intensive
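One way to picture per-source variant mappings is as small overrides layered on a shared default mapping into the hub. This is a sketch under assumptions, not the COAX implementation: the hub field names, the supplier identifier and the flattened tag-plus-subfield keys are all hypothetical.

```python
# Default mapping from MARC tag + subfield to (hypothetical) hub fields.
DEFAULT_MARC_MAPPING = {
    "100a": "AuthorNameInverted",
    "245a": "Title",
}

# Per-source overrides, added only where a supplier behaves consistently.
SOURCE_OVERRIDES = {
    "supplier-y": {"043a": "SupplierRegionCode"},
}


def mapping_for(source):
    """Default mapping plus any overrides registered for this source."""
    mapping = dict(DEFAULT_MARC_MAPPING)
    mapping.update(SOURCE_OVERRIDES.get(source, {}))
    return mapping


def to_hub(record, source):
    """Translate the fields that this source's mapping knows about."""
    mapping = mapping_for(source)
    return {mapping[field]: value
            for field, value in record.items() if field in mapping}


marc_record = {"245a": "Namath", "043a": "n-us"}
from_variant = to_hub(marc_record, "supplier-y")
from_generic = to_hub(marc_record, "some-other-supplier")
```

Each variant is written once, against the hub. In a pairwise design the same variant would have to be repeated in every transform that reads that supplier's data.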

Issues at end of project?
  None of the existing metadata standards meets all the requirements
  Publishers apply the standards differently
  COA model can handle variations much more efficiently than pairwise mapping
  Rich standards (e.g. LOM) will require additional effort from publishers
  Impact of new models for selling and supplying e-books?

TIME is only just (the) beginning…

Thank you
Hugh Look