The Oxford-Google Digitization Project* Michael Popham Oxford Digital Library * Rules of commercial confidentiality apply to this presentation!


1 The Oxford-Google Digitization Project* Michael Popham Oxford Digital Library * Rules of commercial confidentiality apply to this presentation!

2 WISER – 4th June 2008: The Lawyers’ Vision (non-attributable)
Google and Oxford plan to digitize 1-1.5M books as part of the Google Books Library Project
The project will take at least 3 years to complete and involve approximately 35 digitization workstations running in 2 shifts
Files will be created as TIFFs and JPEGs and delivered as PNG or PDFs… etc.
As far as possible, both OULS and Google like to make information accessible…
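A rough sense of the scale implied by these figures can be worked out directly. The sketch below uses only the numbers on the slide (1-1.5M books, 3 years, 35 workstations, 2 shifts); the function name is illustrative, and because the slide says "at least 3 years" and "approximately 35", the results are upper-bound averages rather than project targets.

```python
# Figures taken from the slide; everything derived from them is illustrative.
BOOKS_LOW, BOOKS_HIGH = 1_000_000, 1_500_000
YEARS = 3            # slide says "at least 3 years", so rates below are upper bounds
WORKSTATIONS = 35    # slide says "approximately 35"
SHIFTS = 2


def books_per_station_shift_year(total_books: int) -> float:
    """Average volumes one workstation must handle per shift, per year."""
    return total_books / (YEARS * WORKSTATIONS * SHIFTS)


low = books_per_station_shift_year(BOOKS_LOW)
high = books_per_station_shift_year(BOOKS_HIGH)
print(f"{low:,.0f} to {high:,.0f} volumes per workstation-shift per year")
# prints "4,762 to 7,143 volumes per workstation-shift per year"
```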

3 Why partner with Google?
The synergy between missions:
  – Bodley’s “Republic of Letters”
  – Google’s “To organize the world’s information and make it universally accessible and useful”
Emphasis is on access not conservation
  – Oxford University Library Services: opening up our closed stacks
  – Google: “…the next generation of the card catalog”
Bring more Oxford-held content into the digital landscape, making it available for scholarly and public benefit
Builds on the work of the Oxford Digital Library (ODL)

4 The “Digital Library” at Oxford
1960s  Machine-readable texts for scholarly purposes
1976   Oxford Text Archive founded
1980s  Networked databases and CD-ROMs
1990s  Libraries on the web, e-journals etc.
2001   Oxford Digital Library (ODL)
2005   Google/Oxford partnership
2006   ORA (Oxford University Research Archive): e-prints/e-theses institutional repository
2008   New LMS → hybrid library service

5 Some Oxford digitization projects
Toyota City Imaging Project (1993)
Specialized Research Collections in the Humanities (NFF) and eLib projects (1995-1998)
  – John Johnson Collection
  – Broadside Ballads
  – Early manuscripts in Oxford
Oxford Digital Library (2001 onwards)
  – Scoping study (1998-99)
  – ODL Development Fund (Mellon Foundation 2002-2005)
  – Three production phases


20 What to digitize?
Direct discussions with Google since 2003
Mutual benefits for both parties
Extensive holdings of out-of-copyright (and mostly out-of-print) material identified
  – Oxford differs from most other partners in this aspect of our agreement (Michigan vs Harvard)
  – Decision made to begin with the 19th century material
  – Scope = approximately 1+ million items

21 Overview of workflow (1)
[Flowchart: Selection; Suitable for digitization? (Y/N); Reshelve; Digitize; Generate deliverables; Store outputs; Update OULS OPAC; QA (Y/N); Update Google Books index; ODC]
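The decision points in this flowchart can be sketched as a small routine. The step names come from the slide, but the arrows were lost in the transcript, so the ordering of steps, the handling of a failed QA check, and the final reshelving step are all assumptions; `Item`, `process` and the field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    shelfmark: str            # hypothetical identifier for a physical volume
    suitable: bool            # outcome of "Suitable for digitization?"
    passes_qa: bool = True    # outcome of the QA check
    log: List[str] = field(default_factory=list)


def process(item: Item) -> List[str]:
    """Walk one item through the workflow and return the steps taken."""
    item.log.append("selection")
    if not item.suitable:
        # "N" branch of the suitability check: the item goes back to the shelf.
        item.log.append("reshelve")
        return item.log
    item.log += ["digitize", "generate deliverables", "QA"]
    if not item.passes_qa:
        # Assumption: the slide does not say where a failed QA leads.
        item.log.append("flag for review (assumed)")
        return item.log
    item.log += [
        "store outputs",
        "update OULS OPAC",
        "update Google Books index",
        "reshelve",  # assumption: digitized items are also returned to the shelf
    ]
    return item.log


print(process(Item("example-1", suitable=False)))
# prints ['selection', 'reshelve']
```

Keeping the two yes/no checks as plain booleans makes the two branch points from the slide easy to see; a production tracking system would record far more state per item.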

22 Overview of workflow (2)
[Two-column diagram, headed OULS and Google: Retrieve catalogue records; Survey items; Pick items; Bibliographic Evaluation → Metadata checks; Digitization; Quality Assurance; OCR and index; Receive and Reshelve items → Update catalogue records; Mount in books.google.com; Retrieve Oxford Digital Copy; Preserve/reprocess master files]

23 Approach
OULS staff work closely with Google staff
  – e.g. training on how to handle the material
Each component of the workflow must be comfortable for both parties
  – Identify, survey, pick, track, reshelve, update OPAC…
A large and complex logistical operation that must not compromise the service to our users
  – or other parts of OULS(!)

24 Outputs and outcomes
Large raw colour images from digitization process
Per volume, OULS receives:
  – JPEG2000 page images
  – Uncorrected OCR (per page)
  – Report on scanning process
Quality Control checks at Google (and Oxford)
Deliverable images
  – hosted by Google in the first instance
  – linked to OPAC records
Ongoing software/hardware developments to improve the process and outputs
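The per-volume delivery listed on this slide could be modelled along the following lines. This is only a sketch: the three components (JPEG2000 page images, per-page uncorrected OCR, scanning report) are from the slide, while the class names, field names, file naming and the example check are hypothetical and do not describe the actual delivery format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    image_file: str   # JPEG2000 page image, e.g. "0001.jp2" (file naming is assumed)
    ocr_text: str     # uncorrected OCR text for this page


@dataclass
class VolumeDelivery:
    volume_id: str        # hypothetical identifier linking back to the OPAC record
    pages: List[Page]
    scan_report: str      # report on the scanning process

    def pages_missing_ocr(self) -> List[int]:
        """Return 1-based page numbers whose OCR text is empty."""
        return [n for n, p in enumerate(self.pages, start=1)
                if not p.ocr_text.strip()]


delivery = VolumeDelivery(
    volume_id="example-0001",
    pages=[Page("0001.jp2", "Title page"), Page("0002.jp2", "")],
    scan_report="2 pages scanned",
)
print(delivery.pages_missing_ocr())  # prints [2]
```

A check like `pages_missing_ocr` is one example of the kind of local quality control the slide mentions; blank pages would also trigger it, so its results would need human review.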

25 Challenges that lie ahead…
Building the local infrastructure to manage and deliver the Oxford Digital Copy of the data
Investigating ways to exploit the data, e.g.:
  – Correcting OCR files, adding markup
  – (Re-)structuring the data: moving beyond a simple search and page-turning presentation
  – Completing/extending volumes and collections
  – Automatic collation, authorship attribution, stylistic analysis… and many, many more(?!)
Raising the bar for what is possible, and end-users’ expectations about what we can deliver

26 Feel the Fear…
©opyright and IPR
Threat to (Scholarly) e-Publishers
Proliferating plagiarism
Encouraging poor research
Scope creep, scalability, data deluge
(Digital) preservation and access
  – Sun Center of Excellence
  – ODL DAMS


51 Useful links
http://books.google.com/
http://books.google.com/googlebooks/library.html
http://www.bodley.ox.ac.uk/google/

