
Testbed for Interoperable Metadata for Ebooks
Hugh Look (Project Manager)
© Rightscom – All rights reserved



2 TIME: presentation to Discovery and Access seminar 13 December 2006 © Rightscom 2006– All rights reserved
Testbed for Interoperable Metadata for Ebooks
Which spells… TIME
Weird coincidence, isn't it?

3 We built a TIME machine
The team included the very memorable Kane Richmond as Brick Bradford and Linda Leighton as June Salisbury.
And, of course, the Time Top.

4 When they heard we were building a time machine, of course the client wanted…

5 So we said: hold on… it's only a testbed…

6 Overview of the project
Objectives: to develop a testbed system to support ebook cataloguing.
"The testbed will help provide solutions to one of the key challenges identified for the takeup of ebooks: the lack of standardised e-book catalogue records and also the lack of interoperability between different e-book metadata records."
Key participants: EPICentre; Rightscom
Supported by: Book Industry Communications; Helen Henderson

7 Overview of the project (cont.)
Formats we are transforming (relevance confirmed by librarians and VLE specialists):
> Dublin Core (Simple and Qualified)
> ONIX
> MARC
> LOM: no publishers in the project are using this at present, so we will transform other formats to LOM, but not from it
LOM is not specifically an e-book standard. LOM input can be added later (as can any other format).

8 Overview of the project (cont.)
Key concept: map to and from a single intermediate format
> The intermediate format is comprehensive
> Extensible to new formats
Data: records were obtained from publishers and intermediaries (to whom many thanks are due):
> Oxford University Press
> Taylor and Francis
> Cambridge University Press
> OCLC
A total of 1,886 records were received, in DC, MARC and ONIX format.
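The "single intermediate format" idea can be sketched in a few lines of Python. This is a minimal illustration, not the TIME implementation (which used XSLT over COA): the reader/writer functions, the flat dict used as a hub record and the field names are all hypothetical. It also counts the one-directional mappings each approach needs for n formats: direct pairwise mapping needs n(n-1), a hub needs only 2n (one mapping in, one out, per format).

```python
from typing import Callable, Dict

# Hypothetical hub record: a flat dict of generic property -> value.
HubRecord = Dict[str, str]

# Each format registers one reader (format -> hub) and one writer (hub -> format).
READERS: Dict[str, Callable[[Dict[str, str]], HubRecord]] = {}
WRITERS: Dict[str, Callable[[HubRecord], Dict[str, str]]] = {}


def register(fmt, reader, writer):
    READERS[fmt] = reader
    WRITERS[fmt] = writer


def convert(record, src, dst):
    """Any-to-any conversion always goes via the hub."""
    return WRITERS[dst](READERS[src](record))


def pairwise_mappings(n):
    """Direct mappings needed between every ordered pair of n formats."""
    return n * (n - 1)


def hub_mappings(n):
    """One mapping into and one out of the hub per format."""
    return 2 * n


# Toy field mappings for two formats (illustrative only).
register("dc",
         lambda r: {"title": r["dc:title"]},
         lambda h: {"dc:title": h["title"]})
register("onix",
         lambda r: {"title": r["TitleText"]},
         lambda h: {"TitleText": h["title"]})

print(convert({"TitleText": "An Example"}, "onix", "dc"))  # {'dc:title': 'An Example'}
print(pairwise_mappings(4), hub_mappings(4))               # 12 8
```

With the four formats in the testbed that is 12 direct mappings against 8 via the hub; pairwise grows quadratically while the hub grows linearly, so adding a new format means writing two mappings rather than one per existing format in each direction.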

9 Requirements process
Review of requirements from documentation
> Requirements analysis focused on the needs of libraries
> Range of documents identified; none contains complete requirements
> Synthesis presented to workshop

10 Requirements process: standards at the centre
Requirements validation workshop, focused on standards
> No radical disagreements with, or additions to, the synthesis
> Confirmed that the standards identified during the analysis process were appropriate
> No other significant issues identified

11 Delivery
> Working transform system
> Simple user interface
> Testbed released to JISC
> Packaged for installation by further testers

12 Standards (brace yourselves)

13 [Diagram: the metadata standards landscape, from the 1980s through the mid-90s to today, across sectors including books, audio, audiovisual, multimedia, libraries, copyright, journals, magazines, newspapers, education, music, texts, technology, archives, museums and eBooks. Standards shown: FRBR, Handle, ISRC, ISAN, ISMN, CIS, Dublin Core, IMS, DOI, IIM, ISWC, URL, URN, SICI, MARC, CAE, ISBN, ISSN, EAN, UPC, ISO codes, IPI, UMID, ISTC, SMPTE, DMCS, EPICS, ONIX, LOM, abc, MPEG-7, MPEG-21, ISO11179, RDF, XML schema, IPDA, PRISM, OeBF, NITF, CIDOC, CrossRef, P/META, XrML, URI, BICI, RDD/REL, MI3P, SCORM, NewsML, GRid, MPid, MWLI, SAN, V-ISAN, ERMI, DAISY, METS, MODS, OWL]

14 The testbed
[Diagram: MARC, Dublin Core, ONIX and LOM records mapped into and out of the eBook Catalogue, a common (generic) semantic and syntactic format. Caption: "Many data formats"]

15 The longer-term potential
[Diagram: MARC, Dublin Core, ONIX and LOM records, plus "Other" formats, mapped into and out of the eBook Catalogue, a common (generic) semantic and syntactic format]

16 Technical: technology and tools
> Fedora open-source XML repository
> XML schemas and XSLT transforms
> Internal generic representation: Contextual Ontologyx Architecture (COA)
> OAI-PMH compliance
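OAI-PMH compliance means the repository answers a small set of HTTP requests (the protocol's six verbs, such as Identify, ListRecords and GetRecord), with unqualified Dublin Core (`oai_dc`) as the mandatory metadata format. A harvester-side sketch in Python that builds request URLs and checks some required arguments; the base URL is a placeholder, and the checks cover only a subset of the protocol's argument rules.

```python
from urllib.parse import urlencode

# The six OAI-PMH 2.0 verbs.
VALID_VERBS = {"Identify", "ListMetadataFormats", "ListSets",
               "ListIdentifiers", "ListRecords", "GetRecord"}


def oai_request_url(base_url, verb, **args):
    """Build an OAI-PMH HTTP GET request URL (partial argument checking)."""
    if verb not in VALID_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    if "resumptionToken" in args and len(args) > 1:
        raise ValueError("resumptionToken is an exclusive argument")
    if verb in ("ListRecords", "ListIdentifiers") and not (
            "metadataPrefix" in args or "resumptionToken" in args):
        raise ValueError(f"{verb} needs metadataPrefix or resumptionToken")
    if verb == "GetRecord" and not {"identifier", "metadataPrefix"} <= set(args):
        raise ValueError("GetRecord needs identifier and metadataPrefix")
    # verb goes first; remaining arguments keep the order they were passed in.
    return f"{base_url}?{urlencode({'verb': verb, **args})}"


BASE = "http://repository.example.org/oai"  # placeholder endpoint
print(oai_request_url(BASE, "ListRecords", metadataPrefix="oai_dc",
                      **{"from": "2006-12-01"}))  # 'from' is a Python keyword
```

Passing `from` through a dict is needed only because it is a reserved word in Python; the protocol argument itself is simply `from`.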

17 Issues for interoperability
> The hub needs to be at least as rich as all of the spokes put together
> The value mappings need to preserve all their semantics in the hub
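The first condition can be checked mechanically if each spoke's mapping is written down as a table from source field to hub property. A minimal Python sketch; the field names, hub property names and mapping tables are hypothetical and chosen only to show one gap.

```python
# Hypothetical mapping tables: source field -> hub property (None = no home in hub).
SPOKE_MAPPINGS = {
    "dc": {"dc:title": "Title", "dc:creator": "HasAuthor", "dc:date": None},
    "onix": {"TitleText": "Title", "KeyName": "KeyName",
             "NamesBeforeKey": "NamesBeforeKeyName"},
    "marc": {"245$a": "Title", "100$a": "HasNameInverted"},
}

# Hypothetical set of properties the hub can represent.
HUB_PROPERTIES = {"Title", "HasAuthor", "KeyName", "NamesBeforeKeyName",
                  "HasNameInverted"}


def unmapped_fields(spokes, hub):
    """Return (format, field) pairs whose meaning the hub cannot hold."""
    gaps = []
    for fmt, table in spokes.items():
        for field, prop in table.items():
            if prop is None or prop not in hub:
                gaps.append((fmt, field))
    return gaps


print(unmapped_fields(SPOKE_MAPPINGS, HUB_PROPERTIES))  # [('dc', 'dc:date')]
```

An empty result is necessary but not sufficient: it shows every field has somewhere to go, while the second condition (value mappings preserving their semantics) still has to be checked value by value.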

18 Mapping to COA
MARC: tag 100, subfield a = Kriegel, Mark; subfield e = author
Dublin Core: creator = Kriegel, Mark
ONIX: Contributor; ContributorRole = A01; PersonNameInverted = Kriegel, Mark; NamesBeforeKey = Mark; KeyName = Kriegel
COA:
A IsA Resource
A IsA EBook
A HasAuthor B
B IsA Party
B HasName C
C HasNameInverted Kriegel, Mark
C HasNamePart D
D HasValue Kriegel
D IsA KeyName
D HasIdentifier E
E HasValue 1
E IsA SequenceNumber
C HasNamePart F
F HasValue Mark
F IsA NamesBeforeKeyName
F HasIdentifier G
G HasValue 2
G IsA SequenceNumber
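The ONIX-to-COA step on this slide can be sketched as a Python function that emits the same triples. This is a minimal sketch, assuming the ONIX name elements have already been parsed out of the record; the sequential node labels (A, B, C, …) simply mirror the slide and are illustrative, not how a real COA store would identify nodes.

```python
from itertools import count


def onix_author_to_triples(key_name, names_before_key):
    """Emit COA-style (subject, predicate, object) triples for one author.

    The resource is labelled A; other nodes get B, C, D, ... in creation order.
    """
    labels = (chr(ord("A") + i) for i in count(1))  # yields B, C, D, ...
    party, name = next(labels), next(labels)
    triples = [
        ("A", "IsA", "Resource"),
        ("A", "IsA", "EBook"),
        ("A", "HasAuthor", party),
        (party, "IsA", "Party"),
        (party, "HasName", name),
        (name, "HasNameInverted", f"{key_name}, {names_before_key}"),
    ]
    # Each name part gets its own node, a type and a sequence number.
    parts = [(key_name, "KeyName"), (names_before_key, "NamesBeforeKeyName")]
    for seq, (value, part_type) in enumerate(parts, start=1):
        part, ident = next(labels), next(labels)
        triples += [
            (name, "HasNamePart", part),
            (part, "HasValue", value),
            (part, "IsA", part_type),
            (part, "HasIdentifier", ident),
            (ident, "HasValue", str(seq)),
            (ident, "IsA", "SequenceNumber"),
        ]
    return triples


for triple in onix_author_to_triples("Kriegel", "Mark"):
    print(*triple)
```

Splitting the name into typed, sequenced parts is what lets the hub regenerate either the inverted form (MARC 100 $a, DC creator) or the separate ONIX elements without guessing where the surname ends.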

19 Scheme-to-scheme mapping issues
ONIX: rich and well-structured; a good input format, creating accurate if limited output in MARC or DC.
MARC: rich, but not always well-defined or unambiguous; weaker as an input format (made to be read by humans, not computers).
Dublin Core: input data is weak and often uncontrolled, so the transforms are no better. But richer Qualified Dublin Core can be output from both MARC and ONIX.
LOM: pedagogic classifications are not generally captured in MARC, ONIX or DC, so there is a poor match at that level. But even weak transforms can create basic records that can be added to later.
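Why ONIX makes a good source and DC a weak one shows up even in contributor roles. A Python sketch using three ONIX contributor role codes (A01 "By (author)", B01 "Edited by", A12 "Illustrated by"); the MARC relator terms and the choice of DC element for each role are illustrative mapping decisions, not taken from the TIME mappings. Going out of ONIX loses detail but stays accurate; going back from DC yields only a set of candidates.

```python
# Subset of ONIX ContributorRole codes with illustrative MARC relator
# terms and Dublin Core elements (hypothetical mapping choices).
ONIX_ROLES = {
    "A01": ("author", "dc:creator"),           # By (author)
    "B01": ("editor", "dc:contributor"),       # Edited by
    "A12": ("illustrator", "dc:contributor"),  # Illustrated by
}


def onix_role_to_targets(code):
    """ONIX -> (MARC relator term, DC element): accurate but coarser in DC."""
    return ONIX_ROLES[code]


def dc_element_to_onix_roles(element):
    """DC -> ONIX: every role code that could have produced this DC element."""
    return sorted(code for code, (_, dc) in ONIX_ROLES.items() if dc == element)


print(onix_role_to_targets("B01"))                 # ('editor', 'dc:contributor')
print(dc_element_to_onix_roles("dc:contributor"))  # ['A12', 'B01']
```

The reverse lookup returning more than one code is the semantic loss in miniature: a DC-sourced record cannot say whether a contributor was an editor or an illustrator without a default assumption.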

20 Semantic loss: relative strength of metadata schemes
Transformations both into and out of Dublin Core were generally poor, owing to its relative semantic poverty and ambiguity.
As a source schema, unqualified or lightly-qualified Dublin Core has huge limitations:
> dc:date may be the date of creation, the date of publication (and if so, where?) or the date of anything else. Unless a default assumption is made, such data cannot be transformed and is lost.
> dc:identifier often does not provide the IdentifierType, which renders it meaningless.
> Text in dc:coverage may mean more or less anything.
> There are no controlled values in basic DC, so code lists such as those supported by ONIX and MARC cannot be mapped into Dublin Core.
As a basis for automated transformation, basic DC is effectively a non-starter, though it has its uses as a human-readable record.
As an output schema, DC does much better: good DC records can be produced from ONIX or MARC input.
Both ONIX and MARC are good source schemas for descriptive eBook metadata. They have some inherent limitations, but most of these can be overcome.
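The dc:identifier and dc:date problems above can be made concrete. A minimal Python sketch, assuming identifiers and dates arrive as bare strings: it guesses an IdentifierType only where the value carries its own evidence (a valid ISBN-13 check digit, a DOI-shaped string, or an http(s) URI), and keeps dc:date only under an explicit default assumption. The type labels and the "publication" default are hypothetical choices, and the DOI pattern is simplified.

```python
import re
from typing import Optional


def isbn13_is_valid(s):
    """ISBN-13 check digit: weights alternate 1, 3 over the first 12 digits."""
    digits = s.replace("-", "")
    if not re.fullmatch(r"97[89]\d{10}", digits):
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3)
                for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == int(digits[12])


def guess_identifier_type(value) -> Optional[str]:
    """Guess an IdentifierType for an untyped dc:identifier, or None."""
    v = value.strip()
    if isbn13_is_valid(v):
        return "ISBN-13"
    if re.fullmatch(r"10\.\d{4,9}/\S+", v):  # simplified DOI shape
        return "DOI"
    if v.startswith(("http://", "https://")):
        return "URI"
    return None  # nothing safe to infer: the identifier is effectively lost


def map_dc_date(value, default_role="publication"):
    """An untyped dc:date survives only if a default assumption is applied."""
    if default_role is None:
        return None  # no assumption made: the date cannot be transformed
    return {"date": value, "role": default_role, "assumed": True}


print(guess_identifier_type("978-0-306-40615-7"))  # ISBN-13
print(guess_identifier_type("12345"))              # None
```

Recording `"assumed": True` keeps the default visible downstream, so a later, better-sourced date can override it rather than be silently contradicted.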

21 Data quality issues: errors
Generally the quality of the data supplied was good (but the sample was small), in terms of:
> the amount of data contained in each record
> the homogeneity of metadata from record to record
Input data inevitably contains errors, both random and systematic.

22 Data quality issues: errors (cont.)
Systematic errors
The most frequent was misinterpretation of some fields, which were used for data that belongs elsewhere.
> One set of ONIX input data included the email address of the sender in the FromPerson element (there is a specific FromEmail element available for this).
Use of a wrong ONIX code
> Another set, derived from print book data, contained ONIX format codes wrongly showing each eBook to be either a hardback or a paperback book.
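Because systematic errors repeat, they can be fixed or flagged by rule. A Python sketch of two such rules for the cases above, assuming ONIX 2.1 header and product elements have already been parsed into dicts; the rule functions are hypothetical, the email pattern is deliberately simple, and `flag_print_form` assumes the caller already knows the feed is for eBooks. ProductForm BB is Hardback and BC is Paperback in ONIX 2.1.

```python
import re

# Deliberately simple email pattern (illustrative, not RFC-complete).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# ONIX 2.1 ProductForm codes for print formats seen in the eBook feed.
PRINT_FORMS = {"BB": "Hardback", "BC": "Paperback"}


def fix_header(header):
    """Move an email address wrongly supplied in FromPerson into FromEmail."""
    fixed = dict(header)
    person = fixed.get("FromPerson", "")
    match = EMAIL.search(person)
    if match and "FromEmail" not in fixed:
        fixed["FromEmail"] = match.group(0)
        # Keep any remaining name text; drop FromPerson if nothing is left.
        rest = (person[:match.start()] + person[match.end():]).strip(" <>,")
        if rest:
            fixed["FromPerson"] = rest
        else:
            del fixed["FromPerson"]
    return fixed


def flag_print_form(product):
    """Flag an eBook product that carries a print ProductForm code."""
    form = product.get("ProductForm")
    if form in PRINT_FORMS:
        return f"eBook labelled as {PRINT_FORMS[form]} (ProductForm {form})"
    return None


print(fix_header({"FromPerson": "Jane Smith <jane@example.com>"}))
print(flag_print_form({"ProductForm": "BB"}))
```

Flagging rather than silently rewriting the format code is a deliberate choice here: the correct replacement code depends on the actual eBook format, which the faulty record does not reveal.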

23 Data quality issues: errors (cont.)
Random errors
Discovered by chance when analysing a few test files.
> E.g. the affiliated role of an author (Professor of Physics) was sometimes included with his affiliated institute (University of Somewhere) and sometimes placed in a separate field.
For random data errors, nothing can be done post hoc apart from manual correction when an error is detected.
> This needs improved QA processes at source.
There is considerable scope for post-hoc management of systematic or habitual errors.

24 Data quality issues: variant schemas
There are variations in the way in which a particular schema is implemented.
> One publisher used an ONIX format code to describe the source printed book, instead of the eBook.
> Another supplier used a set of controlled values for a particular MARC tag that were not standard, but were internally consistent.
> In another case, MARC tag 043 (Geographic Area Code) was apparently used in a particular and consistent way by the supplier, but as MARC itself is non-specific, nothing could be done with the data in the general scheme.
MARC users also have their own variant practice
> especially in the use of internal 900-series tags.

25 Data quality issues: variant schemas (cont.)
It has been remarked that as there are over 50 UK publishers now providing data in ONIX, there are over 50 variant ONIX schemas.
This is not in principle a problem for the COAX one-to-many approach:
> variant mappings can be made for different sources where consistent behaviour is identified.
Maintaining such variations in conventional pairwise mapping would be very resource-intensive.
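Source-specific variant mappings can be layered over a shared base mapping. A Python sketch using the standard library's `ChainMap`, where a supplier's override table is consulted before the base table; the source name and the override (a feed known to send print codes for eBooks, as on the previous slides) are hypothetical. DG is Electronic book text, BB Hardback and BC Paperback in ONIX 2.1.

```python
from collections import ChainMap

# Base mapping for one ONIX field: ProductForm code -> hub value (illustrative).
BASE_PRODUCT_FORM = {"DG": "EBook", "BB": "Hardback", "BC": "Paperback"}

# Hypothetical per-source overrides, added only where a supplier's variant
# usage has been identified as consistent.
SOURCE_OVERRIDES = {
    "publisher-x": {"BB": "EBook", "BC": "EBook"},
}


def value_map_for(source):
    """Source-specific overrides take precedence over the base mapping."""
    return ChainMap(SOURCE_OVERRIDES.get(source, {}), BASE_PRODUCT_FORM)


def map_value(source, code):
    """Map one code into the hub, or None if neither table knows it."""
    return value_map_for(source).get(code)


print(map_value("publisher-x", "BB"))  # EBook
print(map_value("publisher-y", "BB"))  # Hardback
```

Because the override applies on the way into the hub, one small table per variant source serves every output format; in pairwise mapping the same variant would have to be patched into each source-to-target mapping separately.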

26 Issues at end of project?
> None of the existing metadata standards meets all the requirements.
> Publishers apply the standards differently.
> The COA model can handle variations much more efficiently than pairwise mapping.
> Rich standards (e.g. LOM) will require additional effort from publishers.
> What will be the impact of new models for selling and supplying e-books?

27 TIME is only just (the) beginning…

28 Thank you
Hugh Look
www.rightscom.com
020 7620 4433

