© January/2008 CCS Content Conversion Specialists GmbH Weidestr. 134, 22083 Hamburg, Germany consulting technology digitization services.

Similar presentations
Putting together a METS profile. Questions to ask when setting down the METS path Should you design your own profile? Should you use someone elses off.

The Future of Scholarship in the Digital Age: The Role of Institutional Repositories Ann J. Wolpert Director of Libraries Massachusetts Institute of Technology.
Delivering textual resources. Overview Getting the text ready – decisions & costs Structures for delivery Full text Marked-up Image and text Indexed How.
Capacity Building Passing on the Experience Dr. Noha Adly World Digital Library Arab Peninsula Regional Group meeting.
By : Swaran lata Country Manager W3C India Office 6,CGO Complex, Electronics Niketan, New Delhi E-Publishing standard 1.
DOCUMENT TYPES. Digital Documents Converting documents to an electronic format will preserve those documents, but how would such a process be organized?
Services Digitisation & Content Management. 600 People – India.
Web design Most digitisation projects are made available through Websites Effective Access depends on good web design Identify users and their information.
® Copyright 2008 Adobe Systems Incorporated. All rights reserved. ADOBE® ACCESSIBILITY Achieving Accessibility with PDF Greg Pisocky Accessibility Specialist.
Workflows for Digital Curation and Preservation Stacy Kowalczyk PASIG Dublin 2012 October 17, 2012.
On the Two Sides of the Pond By Hans-Jörg Lieder, Head of the Department of Bibliographic Services – Union Catalogue of Serials Staatsbibliothek zu Berlin.
The Library behind the scene How does it work ? The Library behind the scenes 1 JINR / CERN Grid and advanced information systems 2012 Anne Gentil-Beccot.
Cataloging in Publication: Moving Beyond the Print ALA Midwinter Chicago, Ill. January 31-February 1, 2015.
PDF (Portable Document Format) for Digital Preservation and Delivery John Laurie Digital Initiatives Librarian The University of Auckland Library National.
These ain’t “Old News”! Creating access to historic newspapers Christine Guenther OCLC Product Manager, Digital Services Preservation Service Centers Bethlehem,
ELPUB 2006 June Bansko Bulgaria: Automated Building of OAI Compliant Repository from Legacy Collection Kurt Maly Department of Computer.
Joachim Bauer Senior System Engineer, CCS
1 Archiving Workflow between a Local Repository and the National Library Archive Experiences from the DiVA Project Eva Müller, Peter Hansson, Uwe Klosa,
THE RUTGERS WORKFLOW MANAGEMENT SYSTEM Mary Beth Weber Cataloging and Metadata Services Rutgers University Libraries August 3, 2007.
Gerald Schmidt Learning and Teaching Solutions The Open University Embedding automated accessible outputs in open educational resources.
Digitization Workflow Management System for Massive Digitization Projects Bibliotheca Alexandrina November 19, 2006 The 2nd International Conference on.
1 Newspaper Digitisation Workflows Rose Holley- Manager ANDP Presentation to Cultural Heritage Digitisation professionals 26 November 2008.
1 Australian Newspapers Digitisation Program Development of the Newspapers Content Management System Rose Holley – ANDP Manager ANPlan/ANDP Workshop, 28.
Create and Manage METS in retrodigitization Markus Enders Goettingen State and University Library
The National Digital Newspaper Program (NDNP) An NEH/LC Collaborative Program Enhancing access to historical newspapers Release: September 2006.
Leonardo da Vinci Programme Project ACCELERATE Nicosia, May 2001 Services offered to VIP by the University of Graz, Austria Services to individuals Services.
Information innovation independence Reaching our Audience.
1 April 2004 – METS Opening Day West docWORKS/METAe Automated Conversion Of Printed Documents Into Fully Tagged METS Objects Claus Gravenhorst.
European Metadata Initiatives: The METAe Metadata Engine Simon Tanner Higher Education Digitisation Service
OCLC Online Computer Library Center CONTENTdm ® Digital Collection Management Software Ron Gardner, OCLC Digital Services Consultant ICOLC Meeting April.
Digitisation of Cultural Heritage at the National Library of Latvia: Past and Future Uldis Zariņš Head of Strategic Development National Library of Latvia.
Digitisation of Archival and Manuscript Materials in Libraries Presentation by Martin Bradley.
Interoperable Digitised Content “Discover, search, extract, link, associate, and view digitised content” Les Carr.
Digitization of the Federal Depository Library Program Judith C. Russell Superintendent of Documents & Managing Director, Information Dissemination “Electronic.
The Luminary Library Experience: Large scale digitization at Toronto Public Library Agenda Introduction Background The project Current status Implementation.
Solutions. People. Innovation. Content Transformation in the Next Decade
Planning a digital library How to Build a Digital Library Ian H. Witten and David Bainbridge.
Digitising Journals, March 2000, Copenhagen Astrid Wissenburg Information Services and Systems King’s College London
Cataloguing Electronic resources Prepared by the Cataloguing Team at Charles Sturt University.
1 XML as a preservation strategy Experiences with the DiVA document format Eva Müller, Uwe Klosa Electronic Publishing Centre Uppsala University Library,
The Metadata Object Description Schema (MODS) NISO Metadata Workshop May 20, 2004 Rebecca Guenther Network Development and MARC Standards Office Library.
Mass digitisation? Astrid Verheusen Projectmanager Research & Development Division National library of the Netherlands LIBER-EBLIDA Workshop on Digitisation.
JENN RILEY METADATA LIBRARIAN IU DIGITAL LIBRARY PROGRAM Introduction to Metadata.
International Conference, Russia, 02. September, 2008 Dr. Frank Jacobi, Midvox GmbH, Berlin Digitisation of printed books for marketing and flexible information.
Digital Archiving in the Hungarian Széchényi Library The story and the plans of the Hungarian Electronic Library Rome, 21. Oct István Moldován OSZK,
Planning a digital library How to Build a Digital Library Ian H. Witten and David Bainbridge.
Quality Levels of Reproduction Adolf Knoll National Library of the Czech Republic.
UVa's Digital Library CSG - September 2005 Slides courtesy of: Leslie Johnston Director, Digital Access Services, UVA Library Tim Sigmon University of.
1 Bridging the gap between the paper past and digital future.
Scientific Data and Electronic Publishing Renze Brandsma, Head, Digital Production Centre University of Amsterdam Maarten Hoogerwerf, Project Manager,
International Seminary on Digitisation: Experience and Technology 11th May 2004 | National Library | Lisbon – Portugal DIGITAL ARCHIVE OF PORTUGUESE ART.
Million Book Bibliotheca Alexandrina Youssef Eldakar 19 November 2006.
Introduction to Metadata Jenn Riley Metadata Librarian IU Digital Library Program.
Enterprise Solutions Chapter 10 – Enterprise Content Management.
Metadata “Data about data” Describes various aspects of a digital file or group of files Identifies the parts of a digital object and documents their content,
National Library of Finland Metadata in the Digitisation Process Cultural unity and diversity of the Baltic Sea Region – common history, different languages,
Institutional Repositories: the DSpace Experience Ann J. Wolpert Director of Libraries Massachusetts Institute of Technology.
Gerald Schmidt Learning and Teaching Solutions The Open University Producing DAISY talking books without manual intervention.
Feb 21-25, 2005, ICM 2005 Mumbai: Converting Existing Corpus to an OAI Compliant Repository J. Tang, K. Maly, and M. Zubair Department of Computer Science.
Primo at the British Library Mandy Stewart. 2 About the British Library The British Library is the National Library of the UK It is a world-class.
Digitizing Historical Newspapers South Carolina Digital Newspaper Program's participation with the Library of Congress' Chronicling America: Historic American.
CENTRAL/WESTERN MASSACHUSETTS AUTOMATED RESOURCE SHARING Digitization GOALS & THEIR LOGISTICS Michael J. Bennett Digital Initiatives Librarian C/WMARS,
1 July 2004 – METS Opening Day UK docWORKS/METAe The Engine for Automated Metadata Extraction and XML Tagging Claus Gravenhorst Content.
IFLA Newspapers pre-conference Geneva, Arturs Zogla
DIGITIZATION OF PAPER DOCUMENTS OF INSTITUTE OF OCEANOGRAPHY’S LIBRARY
Introduction to Metadata
VI-SEEM Data Repository
Internet Archive & OPENLIBRARY.ORG
DIGITAL LIBRARY.
Presentation transcript:


docWORKS/METAe: a tool for converting documents into structured digital objects
Accessible e-books, Paris, January 28th, 2008
Claus Gravenhorst, Director Strategic Initiatives

3. Background
- Only a fraction of the world's cultural and scientific heritage is available in electronic form.
- Access to digitised documents is limited.
- There is no common metadata standard for ingest and long-term preservation.
- In-house digitisation: setting up and operating an efficient workflow from a patchwork of digitisation tools is complicated and expensive, and manual work accounts for the largest share of these costs.
- Outsourced digitisation: control and administration mechanisms for quality, quantity and adherence to schedules are limited.
- The cost per digitised page is still high.

4. Challenges
- Increase the amount of accessible content within a reasonable timeframe by enabling digitisation on a larger scale.
- Enable quick and enhanced access for everyone through highly structured documents.
- Provide integrated digitisation and conversion technologies, together with efficient workflows, to lower the total cost of content creation.
- Provide a standardized output format.
- Open up new dimensions for research, presentation and distribution of digital content.

5. Goals of new technologies
- Automate the digitisation and conversion process to create more content at lower cost: less than 10 euro cents per book page.
- Increase the added value of digital content through automated tagging and metadata extraction.
- Provide effective workflow environments that integrate various state-of-the-art technologies.
- Full integration into the existing environment and workflows of content owners.
- Scalability to support networked and distributed production environments.

6. Workflow: value chain
Automated workflow for converting printed items into fully structured digital objects based on common open metadata standards.
- Sources: originals, microfilm, digital images, PDF (scanned from originals and microfilm, or imported from digital image or PDF files)
- Items: books, newspapers, journals, manuscripts, etc.
- Processing chain: image pre-processing -> layout analysis -> OCR (text) -> ISR (structure) -> metadata extraction -> automated QA
- QA feedback: resolution, deskew, OCR accuracy, page sequence
- Output: non-proprietary, OS-independent, METS/ALTO-compliant XML data, compatible with multiple presentation systems
- Improved navigation and search through structured information
- Reverse migration
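The value chain and its QA feedback loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the docWORKS implementation: the `Page` record, the `deskew` and `run_ocr` stand-ins and every threshold (300 dpi, 0.5 degrees of residual skew, 0.9 OCR confidence) are hypothetical, and layout analysis, ISR and metadata extraction are left out. The automated QA step checks exactly the four feedback items listed on the slide.

```python
from dataclasses import dataclass


# Hypothetical page record; the fields mirror the QA feedback items on the slide.
@dataclass
class Page:
    number: int          # position assigned to the page in the scanned sequence
    dpi: int             # scan resolution
    skew_deg: float      # measured skew angle
    ocr_confidence: float = 0.0


# Assumed thresholds, for illustration only.
MIN_DPI = 300
MAX_SKEW_DEG = 0.5
MAX_CORRECTABLE_SKEW_DEG = 5.0
MIN_OCR_CONFIDENCE = 0.9


def deskew(page: Page) -> Page:
    """Stand-in for image pre-processing: small skews are corrected, large ones are not."""
    if abs(page.skew_deg) <= MAX_CORRECTABLE_SKEW_DEG:
        page.skew_deg = 0.0
    return page


def run_ocr(page: Page) -> Page:
    """Stand-in for an OCR engine: low-resolution scans get a lower confidence."""
    page.ocr_confidence = 0.97 if page.dpi >= MIN_DPI else 0.80
    return page


def automated_qa(pages: list[Page]) -> list[str]:
    """Check resolution, deskew, OCR accuracy and page sequence; return feedback."""
    feedback = []
    for expected, page in enumerate(pages, start=1):
        if page.number != expected:
            feedback.append(f"page sequence: expected {expected}, found {page.number}")
        if page.dpi < MIN_DPI:
            feedback.append(f"page {page.number}: resolution {page.dpi} dpi too low")
        if abs(page.skew_deg) > MAX_SKEW_DEG:
            feedback.append(f"page {page.number}: residual skew {page.skew_deg} deg")
        if page.ocr_confidence < MIN_OCR_CONFIDENCE:
            feedback.append(f"page {page.number}: OCR confidence {page.ocr_confidence:.2f}")
    return feedback


def process(pages: list[Page]) -> list[str]:
    """Run the simplified chain: pre-processing -> OCR -> automated QA."""
    return automated_qa([run_ocr(deskew(p)) for p in pages])


if __name__ == "__main__":
    batch = [Page(1, 400, 1.2), Page(3, 200, 0.3), Page(2, 400, 8.0)]
    for message in process(batch):
        print(message)
```

For the sample batch, QA reports the swapped pages 2 and 3, the low resolution and resulting low OCR confidence of page 3, and the skew on page 2 that was too large to correct automatically. In a real workflow these messages would be fed back to scanning or pre-processing rather than printed.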

7. Traditional OCR
[Figure: first page of The American Missionary (Vol. XXXII, No. 1, January 1878, American Missionary Association) as output by traditional OCR, with the body text shown as one unstructured block]

8. Physical and logical structure

9. Structure analysis
[Figure: a book divided into FRONT, MAIN and BACK sections]

10. Structure analysis
[Figure: the main section divided into Chapter 1 and Chapter 2, with Subchapter 1 and Subchapter 2]

11. Physical and logical structure
[Figure: logical elements identified on the physical pages: Preface, Table of Contents, Title Page, Statement Page]
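The FRONT/MAIN/BACK division and the nesting of chapters and subchapters form a tree laid over the physical page sequence. The following minimal Python sketch shows that idea; the `Division` class, its `kind` labels and the page numbers are all hypothetical. Each node's page range is derived from its own pages and those of its descendants, which is enough to generate a table of contents.

```python
from dataclasses import dataclass, field


@dataclass
class Division:
    """Node of a hypothetical logical structure tree; pages are physical page numbers."""
    kind: str
    label: str
    pages: list[int] = field(default_factory=list)
    children: list["Division"] = field(default_factory=list)

    def page_range(self) -> tuple[int, int]:
        """First and last physical page covered by this division and its descendants."""
        covered = list(self.pages)
        for child in self.children:
            covered.extend(child.page_range())
        return min(covered), max(covered)


def table_of_contents(node: Division, depth: int = 0) -> list[str]:
    """Walk the tree depth-first and emit one indented line per division."""
    first, last = node.page_range()
    lines = [f"{'  ' * depth}{node.label} ({node.kind}) p. {first}-{last}"]
    for child in node.children:
        lines.extend(table_of_contents(child, depth + 1))
    return lines


# Hypothetical sample book following the FRONT / MAIN / BACK scheme.
book = Division("monograph", "Sample book", children=[
    Division("front", "FRONT", children=[
        Division("title_page", "Title Page", pages=[1]),
        Division("contents", "Table of Contents", pages=[2]),
        Division("preface", "Preface", pages=[3, 4]),
    ]),
    Division("main", "MAIN", children=[
        Division("chapter", "Chapter 1", pages=[5, 6], children=[
            Division("subchapter", "Subchapter 1", pages=[7, 8]),
            Division("subchapter", "Subchapter 2", pages=[9]),
        ]),
        Division("chapter", "Chapter 2", pages=[10, 11, 12]),
    ]),
    Division("back", "BACK", children=[
        Division("index", "Index", pages=[13]),
    ]),
])

if __name__ == "__main__":
    print("\n".join(table_of_contents(book)))
```

Keeping the logical tree separate from the physical page list means a chapter can start mid-sequence and nest subchapters without duplicating page data. The sketch assumes every leaf division has at least one page.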

12. METS/ALTO: open metadata standard
Advantages of fully tagged objects based on XML and open metadata standards:
- Supports open metadata standards such as METS, DC, MODS, NISO MIX and ALTO
- Full description of the original: the “Digital Original”
- Logical structures improve search results (chapter- and article-based) and make content easier to access (chapter titles, headlines, pictures with captions, footnotes, etc.)
- Enables data exchange with any XML-based third-party system
- Provides the source for transformation into other formats used for distribution (various e-book formats, up to the XML-based open e-book format EPUB of the International Digital Publishing Forum, IDPF), for navigation, and for adapted visual or audio-visual presentation (e.g. DAISY)
- Supports digital long-term preservation
- Enables migration to meet future standards
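To illustrate why logical structure improves search, here is a hypothetical sketch that reports hits per chapter rather than per page, so a result can be shown with its chapter title. It assumes the chapter texts have already been assembled from the logical structure; the function name, the input format and the sample chapters are invented for the example.

```python
import re


def search_by_chapter(chapters: dict[str, str], term: str) -> list[tuple[str, int]]:
    """Return (chapter title, hit count) pairs, most hits first.

    `chapters` maps a chapter title to the full text of that chapter, as it
    could be assembled from the logical structure (hypothetical input format).
    """
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    hits = []
    for title, text in chapters.items():
        count = len(pattern.findall(text))
        if count:
            hits.append((title, count))
    # Rank by number of occurrences; ties keep the reading order of the book.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical sample data.
chapters = {
    "Chapter 1: Early missions": "The mission opened schools. Mission work grew quickly.",
    "Chapter 2: Finances": "Donations funded the mission.",
    "Chapter 3: Statistics": "Tables of members and schools.",
}

if __name__ == "__main__":
    print(search_by_chapter(chapters, "mission"))
```

The search itself is a plain whole-word match. The gain comes from the unit of retrieval: the reader lands on "Chapter 1: Early missions" instead of an anonymous page image.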

13. METS/ALTO: open metadata standard
[Figure: a document described by METS (Metadata Encoding and Transmission Standard), linking TIFF images to ALTO (Analyzed Layout and Text Object) files to form a METS/ALTO XML object]
A document processed in docWORKS is converted into a single METS XML file. This file reflects the complete physical and logical structure and manages all links to the image files and their related ALTO XML files. There is exactly one ALTO file per image file. ALTO is based on a standardized page description schema and contains all information about a page (print space, margins, coordinates, OCR results).
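The relationship described above (one METS file per document, exactly one ALTO file per image) can be sketched with Python's standard `xml.etree.ElementTree`. The element and attribute names (`fileSec`, `fileGrp`, `file`, `FLocat`, `structMap`, `div`, `fptr`) are taken from the METS schema. The file paths, `ID` values and `USE` labels are hypothetical. A real docWORKS output would also carry descriptive metadata (e.g. MODS), technical metadata and a logical structMap, all omitted here.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def mets(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(page_count: int) -> ET.Element:
    """Build a skeleton METS document linking each page image to its own ALTO file."""
    root = ET.Element(mets("mets"))
    file_sec = ET.SubElement(root, mets("fileSec"))
    image_grp = ET.SubElement(file_sec, mets("fileGrp"), USE="IMAGE")
    alto_grp = ET.SubElement(file_sec, mets("fileGrp"), USE="ALTO")
    struct_map = ET.SubElement(root, mets("structMap"), TYPE="PHYSICAL")
    sequence = ET.SubElement(struct_map, mets("div"), TYPE="physSequence")

    for n in range(1, page_count + 1):
        image_id, alto_id = f"IMG{n:05d}", f"ALTO{n:05d}"
        # One <file> per image and exactly one <file> for its ALTO page description.
        for group, file_id, href, mimetype in (
            (image_grp, image_id, f"images/{n:05d}.tif", "image/tiff"),
            (alto_grp, alto_id, f"alto/{n:05d}.xml", "text/xml"),
        ):
            file_el = ET.SubElement(group, mets("file"), ID=file_id, MIMETYPE=mimetype)
            ET.SubElement(file_el, mets("FLocat"),
                          {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})
        # The physical page points at both its image and its ALTO file.
        page = ET.SubElement(sequence, mets("div"), TYPE="page", ORDER=str(n))
        ET.SubElement(page, mets("fptr"), FILEID=image_id)
        ET.SubElement(page, mets("fptr"), FILEID=alto_id)
    return root


if __name__ == "__main__":
    print(ET.tostring(build_mets(2), encoding="unicode"))
```

Because every physical `div` holds one pointer to an image and one to an ALTO file, a viewer can go from a page in the structure map to both the scan and its coordinates-plus-text description without any other lookup.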

14. Workflow: institution-based, integrated
[Figure: workflow diagram with the following components]
- Scanning: book identified by document UID (barcode, ISBN); image and metadata database; metadata (Z)
- Conversion: imaging, layout analysis, OCR, ISR, post-OCR correction
- Automated quality assurance; reject condition -> re-scan
- QA: random sampling; QA + correction (offshore)
- Delivery: final output to the repository
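The routing decisions in this workflow (reject and re-scan, random QA with correction, or delivery) can be sketched as a small Python function. Everything here is an assumption made for illustration: the `Document` record, the reject condition (minimum resolution), the OCR confidence threshold and the 10% sampling rate are not taken from the slide.

```python
import random
from dataclasses import dataclass


@dataclass
class Document:
    uid: str               # document UID, e.g. a barcode or ISBN
    min_dpi: int           # lowest resolution among the document's page images
    ocr_confidence: float  # mean OCR confidence reported by the conversion step


# Hypothetical reject condition and sampling rate, for illustration only.
MIN_DPI = 300
MIN_OCR_CONFIDENCE = 0.85
SAMPLE_RATE = 0.1


def route(docs: list[Document], sample_rate: float = SAMPLE_RATE,
          seed: int = 0) -> dict[str, list[str]]:
    """Send each converted document to re-scan, manual QA/correction or delivery."""
    rng = random.Random(seed)  # seeded so that the random QA sample is reproducible
    routes: dict[str, list[str]] = {"re-scan": [], "manual QA": [], "delivery": []}
    for doc in docs:
        if doc.min_dpi < MIN_DPI:
            # Reject condition: the image itself is unusable, so scan again.
            routes["re-scan"].append(doc.uid)
        elif doc.ocr_confidence < MIN_OCR_CONFIDENCE or rng.random() < sample_rate:
            # Low OCR quality, or picked for the random spot check.
            routes["manual QA"].append(doc.uid)
        else:
            routes["delivery"].append(doc.uid)
    return routes


if __name__ == "__main__":
    batch = [Document("A", 200, 0.99), Document("B", 400, 0.50), Document("C", 400, 0.99)]
    print(route(batch))
```

Checking the reject condition first avoids spending manual correction effort on pages whose images will be replaced anyway; the random sample catches problems that the automated checks do not measure.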

15. Selected reference
CCS technology is in use at digitisation service providers as well as at leading cultural and scientific institutions around the world.
British Library:
- Institution-based centre, fully operated by CCS staff
- Process aligned with the standards, databases and workflows in use at the library
- Robotic scanning technology (colour)
- 1 million pages per month, 25 million pages in 2 years
- Full production since the beginning of September 2007
- Out-of-copyright books, 19th century
- Output in METS/ALTO, JPEG2000 and PDF (e-book)
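The two throughput figures for the British Library project are consistent with each other. Spreading the two-year total evenly over 24 months (an assumption: the slide does not say the rate is constant) gives roughly the quoted monthly rate:

```latex
\frac{25 \times 10^{6}\ \text{pages}}{2 \times 12\ \text{months}}
  = \frac{25}{24} \times 10^{6}\ \frac{\text{pages}}{\text{month}}
  \approx 1.04 \times 10^{6}\ \frac{\text{pages}}{\text{month}}
```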

16. Conclusion and perspective
- docWORKS/METAe has reached a high degree of automation and supports a broad range of document types.
- Highly structured digital objects improve accessibility for everyone.
- docWORKS is in use at the in-house digitisation centres of various content owners around the world.
- Demand from the publishing industry is growing.
- Scalable technology enables mass digitisation.
- The level of automation is constantly increasing.
- The major goal is to lower the cost of digitisation while assuring high quality standards.

17. Contact
CCS Content Conversion Specialists GmbH
Weidestr. 134, D-22083 Hamburg, Germany
Phone: +49 (0)
Fax: +49 (0)
Mobile: +49 (0)
Internet:
Claus Gravenhorst, Director Strategic Initiatives