HathiTrust Digital Library

1 HathiTrust Digital Library
Cooperation for Preservation

2 Outline
About HathiTrust: Mission & Goals
Background
What we do: Services
How we do it: Governance, Partnership & Resources, Technology
Future Directions

3 About

4 What is HathiTrust
Shared Digital Repository
Launched 2008 by 25 institutions (now 26)
Initial focus on digitized book and journal content
Expanding to non-book/non-journal, born digital
“Light” archive
Collaboration
Preservation and access
Print collections
Local services
Public Good

5 Background

6 History Michigan Digitization Project 2004
“…U of M shall have the right to use the U of M Digital Copy, in whole or in part at U of M's sole discretion, as part of services offered in cooperation with partner research libraries such as the institutions in the Digital Library Federation…”

7 History Collective Agreement with CIC Announced in June 2007
CIC agreed to establish a shared digital repository

8 History
CIC Shared Digital Repository
HathiTrust

9 The Partners
When announced in October 2008, partners included:
University of California system
CIC (Committee on Institutional Cooperation)
University of Virginia
University of Chicago
University of Illinois
Indiana University
University of Iowa
University of Michigan
Michigan State University
University of Minnesota
Northwestern University
Ohio State University
Pennsylvania State University
Purdue University
University of Wisconsin-Madison
Columbia University

10 The Name
The meaning behind the name
Hathi (hah-tee): Hindi for elephant
Big, strong
Never forgets, wise
Secure
Trustworthy

11 Content Distribution
As of February 1:
5,323,716 - Total
764,481 - Public Domain

12 Content Growth

13 What we do

14 Services
Long-term preservation: Bit-level preservation and migration
Access: Viewing; Redistribution; Print disabilities; Section 108
Rights management: Rights database; Copyright review
Publish virtual collections: Collection Builder
Availability of data: Metadata files; Bib API; Data API
Google ingest: Inbound validation; Fixity checks
Bibliographic search: Temporary catalog; Version 1 permanent catalog April 2010
Full-text search: November 2009
Print on Demand: UM public domain; UM Press

15 How we do it

16 Governance
Budget/Finances
Decision-making
Policy
Planning
Executive Committee
Strategic Advisory Board
HathiTrust

17 Executive Committee
Paul Courant, University Librarian and Dean of Libraries, UM
Laine Farley, Executive Director, CDL
John King, Vice Provost for Academic Information, UM
Paula Kaufman, University Librarian and Dean of Libraries, UI
Brian Schottlaender, University Librarian, UCSD
Ed Van Gemert, Director of Libraries, UW - Madison
Brenda Johnson, Dean of Libraries, IU
Brad Wheeler, Chief Information Officer, IU
John Wilkin, Executive Director of HathiTrust and Associate University Librarian, LIT, UM

18 Strategic Advisory Board
Ed Van Gemert (Chair), Director of Libraries, UW - Madison
John Butler, Associate University Librarian for Information Technology, U Minn
Patricia Cruse, Director, Preservation, CDL
Bernie Hurley, Director, Library Technologies, UC Berkeley
R. Bruce Miller, University Librarian, UC - Merced
Sarah Pritchard, University Librarian, Northwestern
Paul Soderdahl, Director, LIT, U Iowa
John Wilkin, Executive Director, HathiTrust (ex officio)

19 Partnership & Resources (1)
Funded for an initial 5 years with base funding from partners
Budget – separately held within UMich budget system, managed by the Executive Committee
Cost Model – Per-GB cost of storage per year, with a one-time fee on new content to build a capital fund
Review in 3rd year of each 5-year period
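
The cost model on this slide can be read as a simple formula: a recurring per-GB storage charge plus a one-time charge on newly deposited content. The following is a minimal sketch in Python, not HathiTrust's actual billing logic. The function name, the parameter names, and the assumption that the stored total already includes the new content are all hypothetical, and any rates passed in are placeholders.

```python
def annual_partner_cost(
    stored_gb: float,
    new_gb: float,
    rate_per_gb_year: float,
    one_time_fee_per_new_gb: float,
) -> float:
    """Estimate one partner's charge for a single year.

    Assumption (not from the slides): `stored_gb` is the partner's total
    stored content for the year, including `new_gb` deposited that year.
    """
    if min(stored_gb, new_gb, rate_per_gb_year, one_time_fee_per_new_gb) < 0:
        raise ValueError("sizes and rates must be non-negative")
    storage_charge = stored_gb * rate_per_gb_year          # recurring, every year
    capital_fund_fee = new_gb * one_time_fee_per_new_gb    # charged once, on ingest
    return storage_charge + capital_fund_fee
```

In later years the same content incurs only the storage charge, because the one-time fee applies to new deposits alone.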

20 Partnership & Resources (2)
Staff/Expertise – highly integrated
Project managers, IT and communications staff, copyright experts, administrators (UM, Indiana and UC taking the lead)
Working groups
UM recently hired a Digital Preservation Librarian
Shared development space

21 HathiTrust Functional Framework
Governance: Budget, Finances; Decision-making; Policy; Planning
Enterprise Management: Communication and Coordination with partner institutions; Project management
Repository Administration: Hardware configuration and maintenance; Web and application server configuration and maintenance; Security; Permissions; Logging; Data management (content storage, backup, integrity checks, deletion); Hardware selection and replacement; Content and Metadata specifications; Disaster Recovery; Processes for ensuring content integrity
Rights Management: Copyright determination; Copyright review; Copyright information management (database); Rightsholder permissions
Bibliographic Data Management: Entity description (record-level); Object identification (item-level); Data availability
Collection Development
Digital: Expansion beyond books and journals (born-digital, images and maps, audio); Selection of content (for non-Google volume ingest and pilot projects)
Print: Cloud Library (effect of digital on print)
e-Commerce: Print on Demand
Content Ingest: Transformation; Validation
Content Access: PageTurner; Collection Builder; Large-scale Search; Bibliographic Catalog; Research Center; APIs
Quality Assurance: Quality Review; Content Certification
User Services: Usability; User support (helpdesk)
Outreach: Project website; Monthly newsletter; Papers and presentations; Communication with potential partners; Surveys, general inquiries
Repository evaluation and audit (e.g., DRAMBORA, TRAC)
Legal: Risk management (use of materials); Partner agreements
Advocacy
Financial contributions of partners

22 Partnership & Resources (3)
Toward a Cloud Library
CLIR, Mellon Foundation
OCLC Research, NYU, HathiTrust, Recap Libraries
Objective: Characterize the near-term opportunity for externalizing management of academic research collections leveraging capacity of large-scale shared print and digital repositories*
Outcomes: opportunity and risk assessment based on aggregate collection analysis; draft service agreement enabling generic consumer library to selectively outsource preservation and access of low-use research collections to large-scale print and digital repositories
*From the RLG Partner Update January 7, 2010

23 Partnership & Resources (4)
CRL TRAC Audit
Portico and HathiTrust assessments timely
“Certification will augment CRL’s strategic archiving of print, and support a responsible transition to electronic-only formats where appropriate.”
Work with UC to design shared print journal archiving effort
“With this hybrid strategy CRL hopes to enable its community to accelerate the shift to electronic-only resources in a careful and responsible manner.” *

24 Partnership & Resources (5)
New cost model
Based on benefits to institutions
Public Domain
In-copyright
Volumes “held”

25 Partnership & Resources (6)
Timeline:
Implement in 2013
Accept new partners now with costs based on overlap calculations
Requirements:
Print holdings database
Update mechanisms
Manual remediation

26 Technology - OAIS
[Architecture diagram; labels recovered from the slide: Page Turner; HathiTrust API; GeoIP DB; CNRI Handles; Solr; MARC record extensions (Aleph); Rights DB; GROOVE (JHOVE); Google; OCA; In-house Conversion; GRIN; Internal Data Loading; METS object; PNG; OCR; PDF; METS/PREMIS object; TIFF G4/JPEG2000; OCR; MD5 checksums; Isilon Site Replication; TSM; MD5 checksum validation]

27 Technology – Architecture
Inbound validation, standards-based object storage and related metadata
Storage in Ann Arbor and Indianapolis
Encrypted backup to 3rd location
Rights database for rights metadata
Online catalog as source and storage for descriptive metadata

28 Technology - Ingest
Automatic validation in GROOVE
Check barcode check digit using Luhn algorithm
Fixity check on JPEG2000, TIFF, UTF-8 using MD5
Well-formedness and embedded metadata check on JPEG2000, TIFF, UTF-8 using JHOVE
Creation of METS and PREMIS
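
Two of the ingest checks listed on this slide, the Luhn check digit and the MD5 fixity check, are easy to illustrate. The sketch below is not GROOVE's code. It assumes the plain Luhn (mod 10) scheme applied to an all-digit barcode, and it assumes the expected MD5 arrives as a hex string alongside the file. The function names are hypothetical, and a particular barcode format may use a variant of the algorithm.

```python
import hashlib
from pathlib import Path


def luhn_valid(barcode: str) -> bool:
    """Return True if the final digit of `barcode` is a valid Luhn check digit."""
    if len(barcode) < 2 or not (barcode.isascii() and barcode.isdigit()):
        return False
    total = 0
    # Walk right to left; double every second digit, starting with the
    # digit immediately to the left of the check digit.
    for position, char in enumerate(reversed(barcode)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, expected_md5: str) -> bool:
    """Compare a file's current MD5 with the checksum recorded for it."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

The same `fixity_ok` comparison can be rerun later against stored files, which is the idea behind periodic fixity checking.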

29 Technology - Repository
Isilon storage
Simple filesystem layout
One directory per volume, holding a zip file and a METS file
Use of a namespace lets identifiers that would otherwise conflict coexist
Namespaces for institutions and, if needed, types of identifiers within the institution
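
To make the namespace idea concrete, here is a minimal sketch in Python of how a namespace plus a local identifier could map to one directory per volume holding a zip file and a METS file. The directory scheme, the file names, and the namespaces `inst_a` and `inst_b` are hypothetical illustrations, not the repository's actual layout.

```python
from pathlib import Path
from typing import NamedTuple


class VolumePaths(NamedTuple):
    directory: Path
    zip_file: Path
    mets_file: Path


def volume_paths(root: Path, namespace: str, local_id: str) -> VolumePaths:
    """Map (namespace, local identifier) to a per-volume directory.

    Hypothetical layout: <root>/<namespace>/<local_id>/ containing
    <local_id>.zip and <local_id>.mets.xml.
    """
    for part in (namespace, local_id):
        if not part or "/" in part or "\\" in part or part in (".", ".."):
            raise ValueError(f"unsafe path component: {part!r}")
    directory = root / namespace / local_id
    return VolumePaths(
        directory=directory,
        zip_file=directory / f"{local_id}.zip",
        mets_file=directory / f"{local_id}.mets.xml",
    )
```

Two institutions that both use the local identifier `39015012345678` end up in different directories, for example `inst_a/39015012345678` and `inst_b/39015012345678`, so neither has to rename anything.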

30 Technology – METS Object
Why METS?
Can serve as Archival Information Package and a Dissemination Information Package
Designed to record the relationship between pieces of complex digital objects
Can be created automatically as texts are loaded or reloaded
Preservation actions (PREMIS)

31 Technology – METS Object
What’s there?
metsHdr with an ID and CREATEDATE
2 dmdSecs: MARCXML and mdRef
amdSec containing one techMD with PREMIS metadata
fileSec with 4 fileGrps (zip, images, OCR, hOCR)
Physical structMap tying together files with metadata (pg. numbers and features)
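
The structure listed on this slide can be sketched as an empty METS skeleton. The Python below uses only the standard library and builds the sections named above. It is an illustration, not HathiTrust's METS profile: the ID values, the `OBJID`, the `LOCTYPE` value, and the `USE` labels are assumptions, the sections are left empty, and no schema validation is attempted (a real `mdRef` would also need a link to the referenced record).

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets_skeleton(object_id: str) -> ET.Element:
    root = ET.Element(q("mets"), {"OBJID": object_id})

    # metsHdr with an ID and CREATEDATE
    created = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    ET.SubElement(root, q("metsHdr"), {"ID": "HDR1", "CREATEDATE": created})

    # Two dmdSecs: one wrapping MARCXML, one pointing elsewhere via mdRef
    wrapped = ET.SubElement(root, q("dmdSec"), {"ID": "DMD1"})
    ET.SubElement(wrapped, q("mdWrap"), {"MDTYPE": "MARC"})
    referenced = ET.SubElement(root, q("dmdSec"), {"ID": "DMD2"})
    ET.SubElement(referenced, q("mdRef"), {"LOCTYPE": "OTHER", "MDTYPE": "MARC"})

    # amdSec containing one techMD that carries PREMIS metadata
    amd = ET.SubElement(root, q("amdSec"))
    tech = ET.SubElement(amd, q("techMD"), {"ID": "TMD1"})
    ET.SubElement(tech, q("mdWrap"), {"MDTYPE": "PREMIS"})

    # fileSec with four fileGrps, labeled as on the slide
    file_sec = ET.SubElement(root, q("fileSec"))
    for use in ("zip", "images", "OCR", "hOCR"):
        ET.SubElement(file_sec, q("fileGrp"), {"USE": use})

    # Physical structMap; per-page divs would link files to page metadata
    ET.SubElement(root, q("structMap"), {"TYPE": "physical"})
    return root
```

Calling `ET.tostring(build_mets_skeleton("inst_a.39015012345678"), encoding="unicode")` serializes the skeleton for inspection.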

32 Future Directions

33 Future Directions (1)
3-year review
SAB
OCLC catalog
Quality
De-duplication
TRAC compliance
Current and ongoing areas
Shibboleth
Full-PDF
Collection Builder
Section 108
Users with print disabilities
Non-Google print content: IA-digitized; locally-digitized
Non-book/non-journal: Audio pilot; Images (maps); Born-digital
Beginning to investigate ePub as a delivery format
Openness: Data API

34 Future Directions (2)
Collaborative Development
PageTurner
Advanced search
Search facets
Collection Builder
Fixity checking
Isilon software June 2010
Large-scale Search
CB Integration
Index optimizing
New hardware
Ingest reporting
Wisconsin
Bibliographic management
University of California
Content validation
Grant projects
NSF EAGER
Mellon Quality
Usage reporting
Partner Institutions
Holdings database
Data mining tools
Research Center
Data distribution
Tools such as SEASR

35 Links
Catalog, Full-text search, and Collection Builder
METS and PREMIS implementation
Technical profile:
Technical flow diagram
Rights management
TRAC

36 Thank You!
