The Oxford-Google Digitization Project*
Michael Popham, Oxford Digital Library
* Rules of commercial confidentiality apply to this presentation!


WISER – 4th June 2008

The Lawyers’ Vision (non-attributable)
Google and Oxford plan to digitize 1-1.5M books as part of the Google Books Library Project
The project will take at least 3 years to complete and involve approximately 35 digitization workstations running in 2 shifts
Files will be created as TIFFs and JPEGs and delivered as PNG or PDFs… etc.
As far as possible, both OULS and Google like to make information accessible…
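As a rough back-of-envelope check, the figures on this slide imply a per-workstation throughput. The sketch below is only illustrative: the 250 working days per year is an assumption not stated in the presentation, the function name is hypothetical, and "at least 3 years" means the results are upper bounds.

```python
def books_per_station_shift(total_books, years, stations=35, shifts=2,
                            working_days_per_year=250):
    """Average books one workstation must handle in one shift.

    stations and shifts come from the slide; working_days_per_year
    is an assumed value, not a project figure.
    """
    station_shifts = years * working_days_per_year * stations * shifts
    return total_books / station_shifts


# Lower and upper ends of the 1-1.5M range, over the minimum 3 years.
low = books_per_station_shift(1_000_000, years=3)
high = books_per_station_shift(1_500_000, years=3)
print(f"{low:.1f} to {high:.1f} books per workstation per shift")
```

Under these assumptions each workstation would need to average roughly 19 to 29 volumes per shift; a longer project or more working days would lower that figure.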

Why partner with Google?
The synergy between missions:
  – Bodley’s “Republic of Letters”
  – Google’s “To organize the world’s information and make it universally accessible and useful”
Emphasis is on access, not conservation
  – Oxford University Library Services: opening up our closed stacks
  – Google: “…the next generation of the card catalog”
Bring more Oxford-held content into the digital landscape, making it available for scholarly and public benefit
Builds on the work of the Oxford Digital Library (ODL)

The “Digital Library” at Oxford
1960s  Machine-readable texts for scholarly purposes
1976   Oxford Text Archive founded
1980s  Networked databases and CD-ROMs
1990s  Libraries on the web, e-journals etc.
       Oxford Digital Library (ODL)
2005   Google/Oxford partnership
2006   ORA (Oxford University Research Archive): e-prints/e-theses institutional repository
2008   New LMS → hybrid library service

Some Oxford digitization projects
Toyota City Imaging Project (1993)
Specialized Research Collections in the Humanities (NFF) and eLib projects ( )
  – John Johnson Collection
  – Broadside Ballads
  – Early manuscripts in Oxford
Oxford Digital Library (2001 onwards)
  – Scoping study ( )
  – ODL Development Fund (Mellon Foundation )
  – Three production phases

What to digitize?
Direct discussions with Google since 2003
Mutual benefits for both parties
Extensive holdings of out-of-copyright (and mostly out-of-print) material identified
  – Oxford differs from most other partners in this aspect of our agreement (Michigan vs Harvard)
  – Decision made to begin with the 19th century material
  – Scope = approximately 1+ million items

Overview of workflow (1)
[Flowchart: Selection; Suitable for digitization? (Y/N); Reshelve; Digitize; QA (Y/N); Generate deliverables; Store outputs; Update OULS OPAC; Update Google Books index; ODC]

Overview of workflow (2)
[Two-column diagram: OULS / Google]
Retrieve catalogue records
Survey items
Pick items
Bibliographic Evaluation → Metadata checks
Digitization
Quality Assurance
OCR and index
Receive and Reshelve items → Update catalogue records
Mount in books.google.com
Retrieve Oxford Digital Copy
Preserve/reprocess master files
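The decision points in the two workflow overviews can be sketched as a small function. This is a minimal illustration only, not the project's actual tracking system: the function and parameter names are hypothetical, the arrows of the original flowchart did not survive, and the sketch assumes that an item failing QA is scanned again and that every item is eventually reshelved. The ODC box is left out because its position in the flow is not clear from the slide.

```python
def run_workflow(suitable, qa_results):
    """Return the ordered step names one item passes through.

    suitable   -- outcome of the "Suitable for digitization?" check
    qa_results -- QA outcome for each scan attempt, in order
                  (hypothetical input; assumes a failed QA means a rescan)
    """
    steps = ["selection"]
    if not suitable:
        # Unsuitable items go straight back to the shelf.
        steps.append("reshelve")
        return steps
    for passed in qa_results:
        steps += ["digitize", "qa"]
        if passed:
            # Only a QA pass leads on to deliverables and index updates.
            steps += ["generate deliverables", "store outputs",
                      "update OULS OPAC", "update Google Books index"]
            break
    # Assumption: the physical item is reshelved whatever the outcome.
    steps.append("reshelve")
    return steps


print(run_workflow(True, [False, True]))
```

Keeping the steps as plain strings makes it easy to compare the sketch against the boxes on the slide; a real system would also need to track each item's location and status.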

Approach
OULS staff work closely with Google staff
  – e.g. training on how to handle the material
Each component of the workflow must be comfortable for both parties
  – Identify, survey, pick, track, reshelve, update OPAC…
A large and complex logistical operation that must not compromise the service to our users
  – or other parts of OULS(!)

Outputs and outcomes
Large raw colour images from digitization process
Per volume, OULS receives:
  – JPEG2000 page images
  – Uncorrected OCR (per page)
  – Report on scanning process
Quality Control checks at Google (and Oxford)
Deliverable images
  – hosted by Google in the first instance
  – linked to OPAC records
Ongoing software/hardware developments to improve the process and outputs

Challenges that lie ahead…
Building the local infrastructure to manage and deliver the Oxford Digital Copy of the data
Investigating ways to exploit the data, e.g.:
  – Correcting OCR files, adding additional markup
  – (Re-)structuring the data – moving beyond a simple search and page-turning presentation
  – Completing/extending volumes and collections
  – Automatic collation, authorship attribution, stylistic analysis
  …and many, many more(?!)
Raising the bar of what is possible, and end-users’ expectations about what we can deliver

Feel the Fear…
©opyright and IPR
Threat to (Scholarly) e-Publishers
Proliferating plagiarism
Encouraging poor research
Scope creep, scalability, data deluge
(Digital) preservation and access
  – Sun Center of Excellence
  – ODL DAMS

Useful links