Web-based workflow software to support book digitization and dissemination The Mounting Books project books.northwestern.edu Open Repositories 2009 Meeting,


Web-based workflow software to support book digitization and dissemination The Mounting Books project books.northwestern.edu Open Repositories 2009 Meeting, May 19, 2009

What we will cover today
- History of Northwestern's Kirtas program
  o Scope of local book digitization
  o Workflow
- Goals of the Mounting Books project
- Details of the workflow system
- The Fedora architecture for Northwestern Books
- Future directions

2005: Acquisition and initial testing (in Preservation)
2006: Production of print facsimiles of brittle books
Northwestern University Library

Problem
We had a book scanner, but we weren't doing anything with the scans other than printing them, binding the prints, and returning them to the shelf. We had no system in place to publish them to the Library's Fedora repository.

The book digitization workflow

Production / Staffing
- Kirtas APT 1200: rated at 20 pages/minute (1200 pages/hour), assuming no setup time and a 1200-page book
- Produces raw images
- Real-world: APT 1200 produces pages/hour
- Each student processes pages/hour
- Currently 6 students: 2 scanning, 4 processing
- Before Book Workflow Interface software: 120 books/quarter

Brittle Image Processing

Patron/Collection Image Processing

2007: Shift away from brittle reformatting on Kirtas; new Digital Collections department formed; applied to Andrew W. Mellon Foundation
2008: Collection/Patron/Brittle, quarterly queues; new focus on digital-only delivery; Mellon-funded Mounting Books project
2009: Book Workflow Interface (BWI), books.northwestern.edu

Simple workflow

Workflow reality

Workflow reality over time

Scope of the Mounting Books project
- Streamline and extend the book scanning workflow
- Publish digitized books online and link to them from the library catalog
- Share as many of our software components as possible

Demonstration: Finding books in Voyager

The BWI workflow diagram
Generated by jBPM. Shows the steps taken to process a book.

List of in-process book jobs

Book job details

Status message: book files moving

The Queue Server
The Queue Server is notified of jobs that need to be processed and queues them up to avoid overloading the server. Each set of images goes through the following processes:
- Conversion to JPEG2000 using Aware software
- MIX metadata generation with JHOVE
- Exif extraction from the original camera images
- Extraction of three OCR derivatives (image-plus-text PDF, plain text, and XML) using ABBYY OCR software
The Queue Server also triggers FOXML generation and ingestion into Fedora.
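The queuing idea can be illustrated with a minimal Python sketch. It is not the project's implementation: the step names only echo the slide, the function names and the `max_workers` limit are assumptions, and each step is a placeholder that records what would have run rather than calling Aware, JHOVE, or ABBYY.

```python
import queue
import threading

# Step names echo the slide; they are labels only, not real tool invocations.
PROCESSING_STEPS = [
    "convert_to_jpeg2000",    # Aware in the real system
    "generate_mix_metadata",  # JHOVE
    "extract_exif",
    "run_ocr",                # ABBYY: PDF, plain text, XML
    "generate_foxml",
    "ingest_into_fedora",
]


def process_job(job_id, log, lock):
    """Run every step, in order, for one set of images (placeholder work)."""
    for step in PROCESSING_STEPS:
        with lock:
            log.append((job_id, step))


def run_queue(job_ids, max_workers=2):
    """Process queued jobs with at most max_workers running at once."""
    jobs = queue.Queue()
    for job_id in job_ids:
        jobs.put(job_id)

    log = []
    lock = threading.Lock()

    def worker():
        # Each worker pulls jobs until the queue is empty, then exits.
        while True:
            try:
                job_id = jobs.get_nowait()
            except queue.Empty:
                return
            process_job(job_id, log, lock)

    threads = [threading.Thread(target=worker) for _ in range(max_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log
```

Capping the number of workers is what keeps a burst of submitted books from overloading the server: extra jobs simply wait in the queue until a worker is free.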

The Book Builder
As part of the workflow, operators create and enhance structure for digitized books. Features of the Book Builder:
- Operators can create chapters and other subdivisions.
- Each part of the structure (such as a chapter) can be given a number and a label.
- Pages are numbered automatically in ascending or descending order.
- Optionally, operators can apply their own custom page numbers.
- Informational tags can be applied to indicate the presence of other elements, such as figures and blank pages. These tags will be useful in the future for scholarly analysis and collection maintenance.
- Information about the structure is stored as METS.
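As a rough illustration of the page-numbering features, the following Python sketch assigns labels to a run of page images. The function name `number_pages` and its parameters are hypothetical and are not taken from the Book Builder itself; the sketch only shows ascending or descending numbering with optional operator overrides.

```python
def number_pages(page_count, start=1, ascending=True, custom=None):
    """Return one label per page image.

    custom maps a zero-based page index to an operator-supplied label,
    which takes precedence over the automatic number.
    """
    custom = custom or {}
    labels = []
    for index in range(page_count):
        if index in custom:
            labels.append(custom[index])
        else:
            number = start + index if ascending else start - index
            labels.append(str(number))
    return labels
```

For example, `number_pages(4, custom={0: "i"})` labels the first image "i" and leaves the automatic numbering in place for the rest.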

Book Builder demonstration

Book approval
- Generate a Handle that links to the book in the public interface
- Update Voyager (the library catalog) with an 856 URL pointing to the Handle
- Update the search index
- Update PREMIS (future feature)
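To make the Handle and 856 steps concrete, here is a small Python sketch. The Handle prefix and suffix are made-up placeholders, the helper names are hypothetical, and the field is shown as display text only; how the project actually writes the field into Voyager is not described in the slides.

```python
def handle_url(prefix, suffix):
    """Build a resolvable URL for a Handle via the public hdl.handle.net proxy."""
    return f"http://hdl.handle.net/{prefix}/{suffix}"


def marc_856(url):
    """Render a MARC 856 field as text: indicators 4 (HTTP) and 0 (resource), subfield $u."""
    return f"856 40 $u {url}"


# Placeholder Handle values, for illustration only.
field = marc_856(handle_url("12345", "book-0001"))
```

Because the catalog stores the Handle rather than a direct repository address, the public URL of a book can change later without touching the catalog record.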

Fedora Datastreams and Disseminators
Datastreams are either pieces of content or pointers to external content, stored within one Fedora object. For example, a datastream may be a JPEG image, a Dublin Core XML file, or a redirect to a streaming movie on a server.
Disseminators are services that can provide a different view of the stored data. For instance, a dissemination may offer the ability to retrieve a stored image at a different size. The service takes the stored image, resizes it, and sends it to the user. The original image is not modified.
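The distinction can be sketched in a few lines of Python. This is a conceptual model only, not Fedora's API: `ImageDatastream` and `resized_view` are hypothetical names, and the "image" is reduced to its dimensions so the example stays self-contained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageDatastream:
    """Stand-in for a stored image datastream (immutable, like the stored original)."""
    ds_id: str
    width: int
    height: int


def resized_view(ds, max_width):
    """Disseminator-style service: describe a scaled-down view of ds.

    A new object is returned; the stored datastream is never changed.
    """
    if ds.width <= max_width:
        return ds
    # Integer arithmetic keeps the aspect ratio without floating-point rounding.
    new_height = ds.height * max_width // ds.width
    return ImageDatastream(ds.ds_id + "-derived", max_width, new_height)


original = ImageDatastream("IMG", 3000, 4000)
thumbnail = resized_view(original, 600)
```

Here `thumbnail` is a derived 600 by 800 view, while `original` still reports its full 3000 by 4000 size.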

Atomistic vs. Compound Approach
Compound objects:
- Include many datastreams within one object
- Have information about the whole and its parts
- Because the information is stored in one big object, it is harder to take advantage of Fedora's object model
- Objects cannot be reused as easily

Atomistic vs. Compound Approach An example Compound object (a book object):

Atomistic vs. Compound Approach
Atomistic objects:
- Have a small number of datastreams per object
- Relationships are stored so the objects can be recognized as one entity
- Many objects may be needed, but this takes best advantage of Fedora's object model
- Best for object reuse; for example, all image objects in the repository can share the same dissemination services
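A minimal Python sketch of the atomistic idea follows. The PIDs, datastream IDs, and the `isPartOf` relationship label are illustrative assumptions, not the project's actual content model; the point is only that each page is its own small object and that stored relationships let the pages be reassembled into one book.

```python
def build_atomistic_book(book_pid, page_count):
    """Create one book object plus one small object per page, linked by relationships."""
    objects = {book_pid: {"datastreams": ["DC", "METS"]}}
    relationships = []  # (subject, predicate, object) triples
    for n in range(1, page_count + 1):
        page_pid = f"{book_pid}-page-{n}"
        objects[page_pid] = {"datastreams": ["DC", "JP2"]}
        relationships.append((page_pid, "isPartOf", book_pid))
    return objects, relationships


def pages_of(book_pid, relationships):
    """Recover the pages of a book by following the stored relationships."""
    return [s for s, p, o in relationships if p == "isPartOf" and o == book_pid]


objects, relationships = build_atomistic_book("demo:book1", 3)
```

Because every page object has the same shape, a single image service can be applied to all of them, which is the reuse benefit described above.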

Atomistic vs. Compound Approach Example Atomistic model (a book object and related page objects):

Each page object shares the same basic access model (In fact, all of our Fedora image objects share the same dissemination access model)

Page Imaged Book Model

Page Image Model

Web-enabled services
Web-enabled services allow us to more easily use software components for other projects. The web-enabled services used for this project include:
- Voyager (Integrated Library System) updater
- Handle persistent URL creator
- Queue server
- Yaz Proxy Z39.50 server
- SOLR search service
- Fedora digital repository

Other technologies used in the project
- METS: used to store the book structure
- Ext JS: JavaScript library, includes interface and AJAX
- JSON: used to communicate between JavaScript and server-side Java
- ABBYY: OCR technology
- Aware: generates and serves up derivatives from JPEG2000 files
- JHOVE: for generating MIX data from images

"we" = Northwestern University Library Administration, Bibliographic Services, Digital Collections, Library Technology, Preservation, Business & Finance, Special Collections and University Archives, Africana, Science and Engineering Library, Transportation, GovInfo, Music Library, Circulation... and many more Northwestern University Academic & Research Technologies Flash viewer development OpenSky Solutions James Chartrand Project funded by the Andrew W. Mellon Foundation Thank you

where we are now...

future growth...

What next?
- PDF for download and printing, OCR tweaks
- Minor fixes to metadata display, OPAC linking
- Fine-tuning Book Builder navigation
- Northwestern Books beta launch early April
- Mounting Books software available June 30
then... ebook support, print-on-demand, expanding access to Book Builder, METS/ALTO & TEI, integration with Portico, Google content/HathiTrust?