Ebooks: digitizing our print collections Sian Meikle University of Toronto Libraries.

1 Ebooks: digitizing our print collections
Sian Meikle
University of Toronto Libraries

2 Digitizing our print collections
• About mass digitization
  - Who is digitizing?
  - What is getting digitized?
  - What is the output?
  - Digitization issues
• Integration of print-to-digital content
  - Choices for discovery and access
  - Issues for delivery

3
• Partners: 11 libraries; several publishers
  - University of California
  - National Library of Catalonia
  - University Complutense of Madrid
  - Harvard University
  - University of Michigan
  - New York Public Library
  - Oxford University
  - Stanford University
  - University of Texas at Austin
  - University of Virginia
  - University of Wisconsin at Madison

4
• Scanning: in-copyright & out-of-copyright
• In-copyright:
  - Searching: all content fully indexed
  - Display: snippets can be viewed; full text can be bought or located in a library
• Out-of-copyright: PDFs can be downloaded for personal use, bought, or located in a library

5 Google output
• Metadata
• Scanned images: TIFFs
• Access derivatives:
  - JPEGs
  - Image-based PDFs (one per page or one per book)
  - Uncorrected OCR

6 Partners:
• Internet Archive
• 29 libraries
• Publishers: O’Reilly Media
• Industrial partners:
  - MSN
  - HP Labs
  - Adobe
  - Xerox

7 Open Content Alliance library partners
• Boston Public Library
• Boston Library Consortium
• Columbia University
• Emory University
• European Archive
• Indiana University
• Johns Hopkins University Libraries
• McMaster University
• Memorial University of Newfoundland
• Missouri Botanical Garden
• MSN
• National Archives (United Kingdom)
• National Library of Australia
• Rice University
• Tufts University
• San Francisco Public Library
• Simon Fraser University
• Smithsonian Libraries
• University of Alberta
• University of British Columbia
• University of California
• University of Chicago Library
• University of Georgia
• University of Illinois Urbana-Champaign
• University of North Carolina
• University of Ottawa
• University of Pittsburgh
• University of Texas
• University of Toronto
• University of Virginia
• Washington University
• York University

8
Scanning:
• Out-of-copyright and copyright-cleared material
Searching:
• Search full content via MSN Live Book
• Search metadata via Internet Archive
Use:
• All scans & derivatives can be downloaded

9 Open Content Alliance output
• Metadata
• Scanned images: TIFFs/J2Ks/CR2s
• Access derivatives:
  - JPEGs
  - DJVU (viewer, requires plugin)
  - Flip book (viewer, does not require plugin)
  - Image-based PDFs (one per book)
  - Uncorrected OCR integrated into the PDFs

10 Mass digitization content
• 900,000 pre-1923 titles
• 60% are unique
• 40% have more than one manifestation
Data courtesy of Microsoft, 2007

11 Comparison: scanned & born-digital

Scanned from print                 | Born-digital
Page images                        | E-text
Search uncorrected OCR             | Search text
TOC, title page, index are marked  | Can be highly segmented, linked
Literature, history, …             | STM, social sciences, reference, …

12 Mass digitization: some Q&A
Duplication
  Q: How do we guard against duplication?
  A: It might be cheaper just to scan duplicates.
Omissions
  Q: What about fold-outs, uncut pages, tightly bound books, print running into margins…?
  A: Mass digitization works because it is efficient. A parallel process should handle exception cases.

13 Mass digitization at U of Toronto
• Not scanned: 2,400 (8%)
• Scanned: 32,000 (92%)

14 Method 1: Union digital repository
Internet Archive (OCA)
• E-books integrated with non-book content
• User contributions (content, reviews)
• Other sites can point to this content

18 Method 2: Full-text search repository
• MSN Live Books and Google Books
• Both cross-book & intra-book searching
• Google’s goal is to index
• MSN is developing a reading environment

19 Google

23 MSN Live Book Search

26 Why load it locally?
• Safekeeping
  - Lots of copies keep stuff safe!
• Discovery
  - Integration with licensed books
  - Integration with non-book content
  - Local subject specialization

27 Method 3: Local load
University of Michigan
• E-books linked from OPAC
• Rights system decides who can view:
  - Nobody
  - University of Michigan
  - United States
  - World
• In-book searching: OCR, one book at a time
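The four audience levels on this slide can be read as nested tiers. The following is only a sketch of that reading, not the University of Michigan's actual rights system: the `Audience` enum, the `can_view` function, and the rule that a campus-affiliated reader also qualifies for the wider tiers are all assumptions made for illustration.

```python
from enum import IntEnum


class Audience(IntEnum):
    """Hypothetical encoding of the four viewing tiers named on the slide."""
    NOBODY = 0
    UNIVERSITY_OF_MICHIGAN = 1
    UNITED_STATES = 2
    WORLD = 3


def can_view(book_audience: Audience, um_affiliated: bool, in_us: bool) -> bool:
    """Return True if a reader may view a book released to `book_audience`.

    Assumption: tiers are nested, so a campus reader sees everything a
    US reader sees, and a US reader sees everything the world sees.
    """
    if book_audience is Audience.NOBODY:
        return False
    # The narrowest tier the reader belongs to.
    if um_affiliated:
        reader_tier = Audience.UNIVERSITY_OF_MICHIGAN
    elif in_us:
        reader_tier = Audience.UNITED_STATES
    else:
        reader_tier = Audience.WORLD
    # A book is visible when its audience is at least as wide as the reader's tier.
    return book_audience >= reader_tier


if __name__ == "__main__":
    print(can_view(Audience.UNITED_STATES, um_affiliated=False, in_us=True))
    print(can_view(Audience.UNIVERSITY_OF_MICHIGAN, um_affiliated=False, in_us=True))
```

Encoding the tiers as ordered integers keeps the decision to a single comparison; a real rights system would more likely derive the tier from copyright status and licence data per volume.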

28 MBooks at University of Michigan
• Download and validation: local data mover GROOVE (Perl & MySQL)
• Data integrity: MD5 fixity checks on JPEGs, TIFFs, UTF-8
• Quality assurance: GROOVE samples 20-page chunks for students to check with ACDSee
• Problems referred to Google for later correction
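The integrity and sampling steps on this slide can be sketched roughly as follows. This is not GROOVE itself (which the slide describes as Perl and MySQL); it is a minimal Python illustration that assumes the fixity check compares MD5 checksums, that the UTF-8 check means the OCR text decodes cleanly, and that a QA sample is one run of 20 consecutive pages. All function names are hypothetical.

```python
import hashlib
import random


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_fixity(path, expected_md5):
    """Fixity check: the file's checksum must match the one recorded for it."""
    return md5_of_file(path) == expected_md5.lower()


def is_valid_utf8(path):
    """Check that an OCR text file decodes as UTF-8."""
    try:
        with open(path, "rb") as f:
            f.read().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def sample_chunk(page_count, chunk_len=20, rng=None):
    """Pick one run of consecutive page numbers (1-based) for manual review.

    Books shorter than `chunk_len` are returned whole.
    """
    rng = rng or random.Random()
    if page_count <= chunk_len:
        return list(range(1, page_count + 1))
    start = rng.randint(1, page_count - chunk_len + 1)
    return list(range(start, start + chunk_len))
```

A reviewer would then open the sampled page images in an image viewer and report any problems, as the slide describes.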

29 University of Michigan: OPAC link

31 University of Michigan: e-book display

32 University of Michigan: search in e-book

33 How do people read?
Intentional reading
  - Attentive, sustained, linear reading of text
  - Heavily influenced by printed-book culture
  - Dominant in classical and scholarly literature
Functional reading
  - Manipulating different content types
  - Web browsing, text database searching
  - Most screen reading is functional
[Figure labels: Intentional, Functional]
Hillesund, T., & Noring, J. E. (2006)

34 How do people know what they’ve read?
“[A] strong relationship…exists between the sensory motor representation of the user and his/her treatment of the information content of the paper book or e-book… Because an electronic book is functionally closer to a computer than a traditional book […] it does not provide the external indicators to memory that the classical book does…”
Morineau et al., 2005

35 Delivering the book to the user
Printed books
  1. Make use copy
  2. Make discovery surrogate
  3. Search surrogates, choose candidates
  4. Examine candidates
  5. Browse more candidates
  6. Choose material
Online books
  1. Make discovery surrogate
  2. Search surrogates, choose candidates
  3. Examine candidates
  4. ???
  5. Choose material
  6. Make use copy
[Figure label: User tasks]

36 Implications for mass digitization
• Support production of good print copies for use
• Target TOC and index for indexing & correction
• Provide granular linking
• Provide browse functions

38 References
• Christianson, M., & Aucoin, M. (2005). Electronic or print books: Which are used? Library Collections, Acquisitions, and Technical Services, 29(1), 71-81.
• Hillesund, T., & Noring, J. E. (2006). Digital libraries and the need for a universal digital publication format. JEP: the Journal of Electronic Publishing, 9(2).
• Levine-Clark, M. (2006). Electronic book usage: A survey at the University of Denver. portal: Libraries and the Academy, 6(3), 285-299.
• Morineau, T., Blanche, C., Tobin, L., & Gueguen, N. (2005). The emergence of the contextual role of the e-book in cognitive processes through an ecological and functional analysis. International Journal of Human-Computer Studies, 62(3), 329-348.
• Su, S. (2005). Desirable search features of web-based scholarly e-book systems. Electronic Library, 23(1), 64-71.

39 Mass digitization archives
• Google Books: http://books.google.com/
• Internet Archive: http://archive.org/
• MSN Live Book Search: http://books.live.com/
• University of Michigan: http://mirlyn.lib.umich.edu/

