HATHITRUST A Shared Digital Repository HathiTrust Outside-In University of Michigan Law School June 14, 2011 Jeremy York HathiTrust Project Librarian.


1 HATHITRUST A Shared Digital Repository HathiTrust Outside-In University of Michigan Law School June 14, 2011 Jeremy York HathiTrust Project Librarian

2 Outline Front end What you see Backend – About (Mission and Goals) – Governance – Content – Services (including differences from Google) – How work gets done – Costs – Shared Strategies/Benefits

3 Front End

16 Descriptive headings added (hidden from GUI with CSS); info about SSD service and link to accessibility page; images used for style are in CSS, so no need to use alt tags; skip navigation link; access keys for navigating pages with keyboard; labels and descriptive titles added to forms and ToC table

22 Access Matrix
Columns: Type of work; Search – Bib and Full text; View; Full-PDF download; Print on Demand; Print disabilities; Section 108 (preservation uses)
Public domain worldwide: World World if no restrictions, Partners if restrictions World Partners worldwide N/A
Public domain in the US: World US US if no restrictions, US partners if restrictions US US Partners N/A
Open Access (+Creative Commons): World World if no restrictions World with permission Partners worldwide if no restrictions N/A
In copyright (and undetermined): World Not available Partners US and worldwide, where applicable

23 Backend

24 About

25 Partnership: Arizona State University; Boston University; Baylor University; California Digital Library; Columbia University; Cornell University; Dartmouth College; Duke University; Emory University; Harvard University Library; Indiana University; Johns Hopkins University; Library of Congress; Massachusetts Institute of Technology; Michigan State University; New York University; New York Public Library; North Carolina Central University; North Carolina State University; Northwestern University; The Ohio State University; The Pennsylvania State University; Princeton University; Purdue University; Stanford University; Texas A&M University; Universidad Complutense de Madrid; University of California (Berkeley, Davis, Irvine, Los Angeles, Merced, Riverside, San Diego, San Francisco, Santa Barbara, Santa Cruz); The University of Chicago; University of Illinois; University of Illinois at Chicago; The University of Iowa; University of Maryland; University of Michigan; University of Minnesota; The University of North Carolina at Chapel Hill; University of Pennsylvania; University of Pittsburgh; University of Utah; University of Virginia; University of Washington; University of Wisconsin-Madison; Utah State University; Yale University Library

26 Digital Repository Launched 2008 Initial focus on digitized book and journal content Light archive – As accessible as possible within the bounds of law

27 The Name The meaning behind the name – Hathi (hah-tee)--Hindi for elephant – Big, strong – Never forgets, wise – Secure – Trustworthy

28 Mission To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge

29 Goals Comprehensive collection Preservation…with Access Shared strategies – Collection management, development – Preservation – Copyright – Efficient user services Openness

30 Governance

31 HathiTrust Executive Committee Strategic Advisory Board Budget/Finances Decision-making Guidance on Policy, Planning

32 Executive Committee: Paul Courant, University Librarian and Dean of Libraries, UM; Laine Farley, Executive Director, CDL; John King, Vice Provost for Academic Information, UM; Paula Kaufman, University Librarian and Dean of Libraries, UI; Brian Schottlaender, University Librarian, UCSD; Ed Van Gemert, Deputy Director of Libraries, UW – Madison (ex officio); Brenda Johnson, Dean of Libraries, IU; Brad Wheeler, Chief Information Officer, IU; John Wilkin, Executive Director of HathiTrust and Associate University Librarian, LIT, UM

33 Strategic Advisory Board: Ed Van Gemert (Chair), Deputy Director of Libraries, University of Wisconsin - Madison; John Butler, AUL for Information Technology, University of Minnesota; Patricia Cruse, Director, Preservation, CDL; Todd Grappone, AUL for Digital Initiatives & IT, UCLA; Julia Kochi, Director, Digital Library and Collections, UC San Francisco; Sarah Pritchard, University Librarian, Northwestern University; Paul Soderdahl, Director, LIT, University of Iowa; John Wilkin, Executive Director, HathiTrust (ex officio); Robert Wolven, Columbia University

34 Constitutional Convention October 2011 Delegates from each institution and consortium – Carry certain number of votes determined according to formula approved by Executive Committee 3-year review Proposals – Print management – Ballot proposals

35 Content

36 What is in HathiTrust? 8,825,372 total volumes; 2,407,570 public domain; 4,819,000 book titles; 214,719 serial titles * As of June 14, 2011
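The proportions implied by these counts are easy to work out. The sketch below uses only the four figures on the slide; the helper name `share` is introduced here for illustration and is not part of the presentation.

```python
# Figures copied from the "What is in HathiTrust?" slide (as of June 14, 2011).
TOTAL_VOLUMES = 8_825_372
PUBLIC_DOMAIN_VOLUMES = 2_407_570
BOOK_TITLES = 4_819_000
SERIAL_TITLES = 214_719


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


public_domain_pct = share(PUBLIC_DOMAIN_VOLUMES, TOTAL_VOLUMES)
total_titles = BOOK_TITLES + SERIAL_TITLES

print(f"Public domain share of volumes: {public_domain_pct}%")  # 27.3%
print(f"Book and serial titles combined: {total_titles:,}")      # 5,033,719
```

Roughly one volume in four was in the public domain at the time of the talk.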

37 Content Sources * As of June 13, 2011

38 Content Distribution * As of June 13, 2011

39 Dates * As of June 13, 2011

40 Breakdown of HathiTrust book corpus by publication date Bibliographic Indeterminacy and the Scale of Problems and Opportunities of "Rights" in Digital Collection Building – 2/2011

41 Breakdown of HathiTrust book corpus by publication date

42 Language Distribution (1) The top 10 languages make up ~86% of all content * As of June 13, 2011

43 Language Distribution (2) The next 40 languages make up ~13% of total * As of June 13, 2011

44 Content over time * As of June 13, 2011

45 Content Growth

46 Services

47 Services (1) Ingest – Book and Journal content Google Internet Archive In-house, other vendor digitization – Images, Audio, Born Digital (coming soon…) Two parts – Content – Bibliographic metadata

48 Services (2) Long-term preservation – Bit-level, migration – Standard and open formats (ITU G4 TIFF, JPEG2000, JPG, Unicode) – Validation, integrity, redundancy – OAIS How reliable is it? – DRAMBORA, TRAC
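Bit-level integrity checking of the kind listed above can be illustrated with a checksum comparison. This is a minimal sketch using Python's standard `hashlib`, not HathiTrust's actual validation code; the file name and the function names `md5_of_file` and `verify_fixity` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, expected_md5):
    """Return True if the file's current MD5 matches the recorded value."""
    return md5_of_file(path) == expected_md5.lower()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        page = Path(tmp) / "00000001.tif"  # hypothetical page-image name
        page.write_bytes(b"example page image bytes")
        recorded = md5_of_file(page)           # checksum stored at ingest
        print(verify_fixity(page, recorded))   # True: file unchanged
        page.write_bytes(b"corrupted bytes")
        print(verify_fixity(page, recorded))   # False: content changed
```

A checksum recorded at ingest and rechecked later detects silent corruption; redundancy then allows the damaged copy to be replaced from another one.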

49 Technology - OAIS: GRIN Internal Data Loading; Google Internet Archive In-house Conversion; MARC record extensions (Aleph) Rights DB; Page Turner HathiTrust API OAI GeoIP DB CNRI Handles [Solr]; METS/PREMIS object TIFF G4/JPEG2000 OCR MD5 checksums; METS object PNG OCR PDF; Isilon Site Replication TSM MD5 checksum validation; GROOVE (JHOVE)

50 Quality: Partner Digitization; Google Digitization; Quality work / Volume certification

51 Services (3) Preservation…with Access – As part of preservation, service to partners, and as public good – Discovery Bibliographic (temporary catalog, OCLC/HathiTrust catalog) Full-text – Reading Interface optimized for users with print disabilities – Collections

52 Services (4) Rights Management – Rights Database – Copyright review IMLS Grant awarded to University of Michigan 2008 to determine copyright status of books published in US between 1923 and staff members, 4 institutions – Indiana University – University of Michigan – University of Minnesota – University of Wisconsin 140,000 reviewed through CRMS 77,500 (54%) in public domain

53 Copyright status of books published pre-1923 and US works published

55 Services (5) Data Availability – Tab-delimited inventory files – Bibliographic API – Data API – OAI feed of public domain – SFX target – Summon
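As an illustration of harvesting from an OAI feed, the sketch below builds request URLs, assuming the feed follows the standard OAI-PMH protocol (`ListRecords` verb, `metadataPrefix`, `set`, and `resumptionToken` arguments). The base URL is a placeholder, since the slide gives no address, and the function name is hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the slide does not give the feed's address.
BASE_URL = "https://example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: when continuing
    a partial list, it is sent with the verb and nothing else.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec is not None:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# First request, then a follow-up request using a token from the response.
print(list_records_url(BASE_URL))
print(list_records_url(BASE_URL, resumption_token="abc123"))
```

A harvester would fetch the first URL, parse the returned records, and keep requesting with each new resumption token until the server stops returning one.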

56 Some Examples of Use Catalogs – UM loaded every record – Chicago links to public domain volumes owned in print – TROVE harvesting through OAI – OCLC loads records into OCLC Link Resolves – UC created SFX target Vendors – H.W. Wilson database links to public domain volumes – ProQuest full-text index via Summon

60 Services (6) Collaborative Development Environment – Active repository development Support for Computational Research – Datasets 120,000-volume set Google-digitized public domain – Protocol-based access – Research Center

61 How does work get done? Collective work – e.g., working groups – Perform the work of the partnership – Now 40+ people across partner institutions Distributed work – Driven by needs of institutions – able to leverage across the partnership – Projects, e.g. grant work, ingest specifications, page-turner, bibliographic data management Leverage expertise across institutions

62 Working Groups (1) Operational focus – Appointed by Executive Director in coordination with Executive Committee – Current Usability User Support Communications – Previous Development Environment Storage Research Center

63 Working Groups (2) Planning or Exploratory focus – Appointed by Strategic Advisory Board – Recommendations reviewed by SAB and XCom; may call for subsequent implementation Collections Committee Surrogates Quality, Ingest, and Error rate Discovery

64 How is work prioritized? Initial functional objectives Collective processes – Working groups and committees

65 Financial contributions of partners HathiTrust Functional Framework

66 Costs

67 Base funding from partner institutions Basic infrastructure costs Commitments in 5-year periods

68 How much does it cost? (1) Cost

69 How much does it cost? (2) $0.149/volume/year for Google-digitized; $0.489/volume/year for IA-digitized; $0.154/volume/year for all content; $3.40 per GB
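To see what these per-volume rates mean for a collection, here is a small worked example. The rates come from the slide; the 100,000-volume collection size and the function name `annual_cost` are made up for illustration and are not figures from the presentation.

```python
from decimal import Decimal

# Per-volume annual rates from the slide, in US dollars.
RATES_PER_VOLUME_YEAR = {
    "google": Decimal("0.149"),
    "internet_archive": Decimal("0.489"),
    "all": Decimal("0.154"),
}


def annual_cost(volumes, source="all"):
    """Annual cost in dollars for a number of volumes at the given rate."""
    cost = RATES_PER_VOLUME_YEAR[source] * volumes
    return cost.quantize(Decimal("0.01"))


# Hypothetical collection of 100,000 volumes.
print(annual_cost(100_000))                      # 15400.00
print(annual_cost(100_000, "google"))            # 14900.00
print(annual_cost(100_000, "internet_archive"))  # 48900.00
```

The Internet Archive rate is a little over three times the Google rate.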

70 Cost Model 1. Based on contributed content 2. Based on overlap with print collections – Public Domain / In-copyright – Depends on Print Holdings Database Costs Lawful uses of materials Complete picture Volumes institutions own or have owned – OCLC number; Bib record ID; Condition; Holding – Status

71 Shared Strategies/Benefits

72 How Different from Google? Preservation Content Collective work Uses of materials Own trajectory Partnership – Not just about digital content or repository – Address challenges – Fulfill mission – Provide services for our communities

73 A global change in the library environment June 2010 Median duplication: 31% June 2009 Median duplication: 19% Academic print book collection already substantially duplicated in mass digitized book corpus

74 Digitized Books in Shared Repositories ~75% of mass digitized corpus is backed up in one or more shared print repositories ~3.5M titles ~2.5M

75 Shared Strategies Copyright Preservation – Digital and print Discovery / Use Bibliographic Indeterminacy Consolidate development talent Collective Attention to solving shared problems

76 How to find out more Website About section – Twitter – Monthly newsletter – – (RSS) Contact us – –

77 Thank you very much!

