JATS for Ejournals and BITS for Ebooks -- Adopting BITS for the Scholars Portal Ebook Repository. JATS conference, April 22, 2015
OCUL Scholars Portal Services OCUL is a consortium of twenty-one university libraries in the province of Ontario. Scholars Portal is an OCUL project that provides shared technology infrastructure and shared collections to OCUL universities. Scholars Portal services include digital content (ebooks, ejournals, statistical data, and geospatial data) and other services such as interlibrary loan and research management.
SP E-Books--Introduction Digital repository containing 600,000 ebooks from the collections of various commercial publishers and open access titles from the Internet Archive. Running on a PDF-based system. MARC as the metadata format.
SP E-Books--Introduction While PDF is still the dominant format, publishers are starting to move from PDF to XML books. XML books are loaded on the Ebook platform, but with some major problems. Using MARC as the metadata format delays the data loading process.
Publishers' Source Data
Metadata and PDF full text
--Book metadata in MARC/ONIX and one PDF file for the whole book
--Book metadata in XML in various DTDs/schemas and one PDF file for each chapter
--Chapter metadata in XML along with one PDF file for each chapter
XML full text with/without PDF full text
--One single XML file for the whole book
--One XML file for each chapter along with the book metadata XML
Adopting BITS The scope of BITS is the right fit for the SP Ebook collection. BITS offers the flexibility of managing books and book parts as separate files and assembling them into a final document as needed. We already have experience using JATS.
Project Goal The ultimate goal is to build a new book application based on BITS XML in MarkLogic. At the current stage, we are transforming the publishers' native data into BITS format and beginning to populate a new BITS XML database in MarkLogic, while feeding the BITS data into ISIS.
Work Flow
SP Ebook data structure in MarkLogic --File naming
SP Ebook data structure in MarkLogic --Book Level XML
SP Ebook data structure in MarkLogic --Book part XML
Data transformation process Crosswalk -- created manually in an Excel sheet. Loader -- written in XSLT. Testing.
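The crosswalk-then-loader idea can be sketched in a few lines. The project's loaders are written in XSLT; the Python below is only an illustration of the same pattern, and the crosswalk rows, the sample source element names, and the function name `apply_crosswalk` are assumptions made for the example, not the project's actual mapping. The sketch does not enforce BITS element order or validate against the BITS DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical crosswalk rows, as they might be exported from the spreadsheet:
# (source element name, path of BITS elements to create under <book-meta>).
CROSSWALK = [
    ("BookTitle", ["book-title-group", "book-title"]),
    ("BookElectronicISBN", ["isbn"]),
    ("PublisherName", ["publisher", "publisher-name"]),
]


def apply_crosswalk(source_xml: str) -> ET.Element:
    """Build a minimal BITS-like <book> from source XML using the crosswalk."""
    source = ET.fromstring(source_xml)
    book = ET.Element("book")
    book_meta = ET.SubElement(book, "book-meta")

    for source_tag, target_path in CROSSWALK:
        node = source.find(f".//{source_tag}")
        value = (node.text or "").strip() if node is not None else ""
        if not value:
            continue  # nothing to map for this row

        # Find or create the wrapper elements, then always add a new leaf.
        parent = book_meta
        for tag in target_path[:-1]:
            existing = parent.find(tag)
            parent = existing if existing is not None else ET.SubElement(parent, tag)
        ET.SubElement(parent, target_path[-1]).text = value

    return book


if __name__ == "__main__":
    # Made-up sample record; the ISBN is a placeholder, not a real identifier.
    sample = (
        "<Book><BookInfo><BookTitle>Example Title</BookTitle>"
        "<BookElectronicISBN>0000000000000</BookElectronicISBN></BookInfo>"
        "<PublisherInfo><PublisherName>Example Press</PublisherName></PublisherInfo></Book>"
    )
    print(ET.tostring(apply_crosswalk(sample), encoding="unicode"))
```

Keeping the mapping in a table separate from the code that applies it mirrors the split between the manually maintained crosswalk and the loader.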
Data transformation practice -- Springer book DTD to BITS Source metadata at book and chapter level uses Springer book DTD A++ V2.4. Elements that are hard to match to BITS: --ChapterCategory --SeriesAbbreviatedTitle
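One possible way to carry source elements that have no direct BITS counterpart is to keep them as name/value pairs in a `custom-meta-group`, which BITS inherits from JATS. The sketch below is an assumption about how that could look, not a description of what the project decided; the function name `to_custom_meta_group` and the sample input are hypothetical.

```python
import xml.etree.ElementTree as ET

# The two Springer elements named on the slide as hard to match.
UNMATCHED = ("ChapterCategory", "SeriesAbbreviatedTitle")


def to_custom_meta_group(source: ET.Element) -> ET.Element:
    """Copy unmatched source elements into <custom-meta> name/value pairs."""
    group = ET.Element("custom-meta-group")
    for tag in UNMATCHED:
        for node in source.iter(tag):
            value = (node.text or "").strip()
            if not value:
                continue  # skip empty elements
            custom = ET.SubElement(group, "custom-meta")
            # Assumption: the source element name is reused as the meta-name.
            ET.SubElement(custom, "meta-name").text = tag
            ET.SubElement(custom, "meta-value").text = value
    return group


if __name__ == "__main__":
    # Made-up sample fragment for illustration only.
    chapter = ET.fromstring(
        "<Chapter><ChapterInfo>"
        "<ChapterCategory>Original Paper</ChapterCategory>"
        "</ChapterInfo></Chapter>"
    )
    print(ET.tostring(to_custom_meta_group(chapter), encoding="unicode"))
```

This keeps the information retrievable without forcing it into a BITS element whose meaning does not match.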
Data transformation practice -- NLM book DTD v.2.3 to BITS Harvard University Press delivers its data as full-text XML using NLM book DTD v.2.3. Both book-level and chapter-level XML are delivered. Transformation to BITS is straightforward.
Data transformation practice -- MARC to BITS Elements not found in BITS: ##$a74 p. of ill., $a975.5/4252/00222$222
Meet the challenges Decide the types of loader needed for the data transformation. Plan a modular design to reduce the loader work. Maintain the related documentation on the wiki. Decide the list of key elements that must be transformed. Maintain a standard list of attributes to ensure that attribute values from different source data are normalized.
Future Plans Transform the XML full-text part. Transform XML full-text reference works such as dictionaries and encyclopedias. Build the new Ebook application based on BITS XML.
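The last point, a standard list of attribute values, can be illustrated with a small lookup. This is a minimal sketch under assumptions: the attribute name, the variant spellings, and the function name `normalize_attribute` are hypothetical examples, not the project's actual list.

```python
# Hypothetical standard list: for each attribute, the variants seen in source
# data (lowercased) mapped to the single value allowed in the output.
STANDARD_VALUES = {
    "contrib-type": {
        "author": "author",
        "aut": "author",
        "editor": "editor",
        "ed": "editor",
    },
}


def normalize_attribute(name: str, raw: str) -> str:
    """Return the standard value for an attribute, or raise if it is unknown."""
    table = STANDARD_VALUES.get(name)
    if table is None:
        return raw  # attribute is not on the standard list; pass through
    key = raw.strip().lower()
    if key not in table:
        # Fail loudly so new variants get added to the list, not loaded as-is.
        raise ValueError(f"unrecognized value {raw!r} for attribute {name!r}")
    return table[key]


if __name__ == "__main__":
    print(normalize_attribute("contrib-type", " AUT "))
```

Sharing one such table across loaders fits the modular design mentioned above, since each loader then only has to map its source into the table's input variants.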
Questions?