HATHI TRUST A Shared Digital Repository HathiTrust Open Webinar Jeremy York Project Librarian, HathiTrust May 3 and 5, 2011.

1 HATHI TRUST A Shared Digital Repository HathiTrust Open Webinar Jeremy York Project Librarian, HathiTrust May 3 and 5, 2011

2 Outline Overview Mission and Goals Content Services Governance, how the partnership operates Partnership Changing Library Landscape

3 About

4 Current Partners Arizona State University Baylor University California Digital Library Columbia University Cornell University Dartmouth College Duke University Emory University Harvard University Library Indiana University Johns Hopkins University Library of Congress Massachusetts Institute of Technology Michigan State University New York University New York Public Library North Carolina Central University North Carolina State University Northwestern University The Ohio State University The Pennsylvania State University Princeton University Purdue University Stanford University Texas A&M University Universidad Complutense de Madrid University of California Berkeley Davis Irvine Los Angeles Merced Riverside San Diego San Francisco Santa Barbara Santa Cruz The University of Chicago University of Illinois University of Illinois at Chicago The University of Iowa University of Maryland University of Michigan University of Minnesota The University of North Carolina at Chapel Hill University of Pennsylvania University of Pittsburgh University of Utah University of Virginia University of Washington University of Wisconsin-Madison Utah State University Yale University Library HathiTrust Community

5 Mission To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge Mission and Goals

6 Universal Library Common Goal Single Entity, Many Partners HathiTrust

7 Goals Comprehensive collection Preservation…with Access Shared strategies – Collection management, development – Preservation – Copyright – Efficient user services Openness Mission and Goals

8 Content

9 What is in HathiTrust? 8,625,158 Total volumes 2,297,041 Public Domain 4,722,664 Book titles 209,930 Serial titles * As of May 1, 2011

10 Content Sources * As of May 1, 2011

11 Content Distribution * As of May 1, 2011

12 Dates * As of May 1, 2011 Statistics and Visualizations

13 Breakdown of HathiTrust book corpus by publication date Bibliographic Indeterminacy and the Scale of Problems and Opportunities of "Rights" in Digital Collection Building – 2/2011

14 Breakdown of HathiTrust book corpus by publication date

15 Language Distribution (1) The top 10 languages make up ~86% of all content Statistics and Visualizations * As of May 1, 2011

16 Language Distribution (2) The next 40 languages make up ~13% of total Statistics and Visualizations * As of May 1, 2011

17 Content over time * As of May 1, 2011

18 Content Growth

19 A global change in the library environment June 2010 Median duplication: 31% June 2009 Median duplication: 19% Academic print book collection already substantially duplicated in mass digitized book corpus

20 Digitized Books in Shared Repositories ~75% of mass digitized corpus is backed up in one or more shared print repositories ~3.5M titles ~2.5M

21 Services

22 Services (1) Ingest – Book and Journal content Google Internet Archive In-house, other vendor digitization – Images, Audio, Born digital (coming soon…) Two parts – Bibliographic Data – Content Getting Content Into HathiTrust | Building a Future by Preserving our Past

23 Services (2) Long-term preservation – Bit-level, migration – Standard and open formats (ITU G4 TIFF, JPEG2000, JPG, Unicode) – Validation, integrity, redundancy – OAIS How reliable is it? – DRAMBORA, TRAC Preservation | Technology | TRAC

24 Technology - OAIS GRIN Internal Data Loading; Google Internet Archive In-house Conversion; MARC record extensions (Aleph) Rights DB; Page Turner HathiTrust API OAI GeoIP DB CNRI Handles [Solr]; METS/PREMIS object TIFF G4/JPEG2000 OCR MD5 checksums; METS object PNG OCR PDF; Isilon Site Replication TSM MD5 checksum validation; GROOVE (JHOVE) Technology
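
The "MD5 checksum validation" step named on this slide can be illustrated with a minimal sketch. It assumes a package is a directory of files plus a manifest mapping each file name to its recorded MD5 digest. The function names `md5_of` and `validate_package` and the manifest shape are hypothetical, chosen for illustration; they are not HathiTrust's actual ingest code.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_package(manifest, root):
    """Compare each file under `root` against its recorded digest.

    `manifest` is an assumed structure: {relative file name: expected MD5 hex}.
    Returns a list of (file name, problem) pairs; an empty list means all files match.
    """
    failures = []
    for name, expected in manifest.items():
        path = Path(root) / name
        if not path.is_file():
            failures.append((name, "missing"))
        elif md5_of(path) != expected.lower():
            failures.append((name, "checksum mismatch"))
    return failures
```

Running the same check at ingest and again on a schedule against each replicated copy is one way to detect silent corruption. MD5 is adequate here for detecting accidental change, not for resisting deliberate tampering.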

25 Quality Partner Digitization Google Digitization Quality work / Volume certification Quality

26 Services (3) Preservation…with Access – As part of preservation, service to partners, and as public good – Discovery Bibliographic (temporary catalog, OCLC/HathiTrust catalog) Full-text – Reading Interface optimized for users with print disabilities – Collections Searching, Reading, and Building Collections

27 Type of work Search – Bib and Full text View Full-PDF download Print on Demand Print disabilities Section 108 (preservation uses) Public domain worldwide World World if no restrictions, Partners if restrictions World Partners worldwide N/A Public domain in the US World US US if no restrictions, US partners if restrictions US US Partners N/A Open Access (+Creative Commons) World World if no restrictions World with permission Partners worldwide if no restrictions N/A In copyright (and undetermined) World Not available Partners US and worldwide, where applicable Access Matrix

28 Services (4) Rights Management – Rights Database – Copyright review IMLS Grant awarded to University of Michigan 2008 to determine copyright status of books published in US between 1923 and 1963 18 staff members, 4 institutions – Indiana University – University of Michigan – University of Minnesota – University of Wisconsin 125k reviewed through CRMS 67,000 (54%) in public domain Copyright

29 Copyright status of books published pre-1923 and US works published 1923-1963


31 Services (5) Data Availability – Tab-delimited inventory files – Bibliographic API – Data API – OAI feed of public domain – SFX target – Summon Hathifiles | Data Distribution and APIs
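
As a rough illustration of working with a tab-delimited inventory file, the sketch below tallies volumes by rights status. The column names (`volume_id`, `rights`, `title`) and the header row are assumptions made for this example; the real inventory files may use a different layout, so check their documentation before relying on any field name or position.

```python
import csv
import io


def count_by_rights(tsv_text, rights_column="rights"):
    """Tally rows of a tab-delimited inventory by the value in `rights_column`.

    Assumes the first line is a header row naming the columns (an assumption
    for this sketch, not a documented property of the inventory files).
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters in titles as ordinary text
    )
    counts = {}
    for row in reader:
        key = row[rights_column]
        counts[key] = counts.get(key, 0) + 1
    return counts


# Made-up sample rows, not real HathiTrust data.
sample = (
    "volume_id\trights\ttitle\n"
    "vol1\tpd\tFirst \"quoted\" title\n"
    "vol2\tic\tSecond title\n"
    "vol3\tpd\tThird title\n"
)
print(count_by_rights(sample))
```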

32 Services (6) Collaborative Development Environment – Active repository development Support for Computational Research – Datasets 120,000-volume set Google-digitized public domain – Protocol-based access – Research Center Datasets

33 How Different from Google? Preservation Content Collective work Uses of materials Own trajectory Partnership – Not just about digital content or repository – Address challenges – Fulfill mission – Provide services for our communities

34 Governance and Work

35 Governance HathiTrust Executive Committee Strategic Advisory Board Budget/Finances Decision-making Guidance on Policy, Planning Governance

36 Executive Committee Paul Courant, University Librarian and Dean of Libraries, UM Laine Farley, Executive Director, CDL John King, Vice Provost for Academic Information, UM Paula Kaufman, University Librarian and Dean of Libraries, UI Brian Schottlaender, University Librarian, UCSD Ed Van Gemert, Deputy Director of Libraries, UW – Madison (ex officio) Brenda Johnson, Dean of Libraries, IU Brad Wheeler, Chief Information Officer, IU John Wilkin, Executive Director of HathiTrust and Associate University Librarian, LIT, UM Executive Committee

37 Strategic Advisory Board Ed Van Gemert (Chair), Deputy Director of Libraries, UW - Madison John Butler, Associate University Librarian for Information Technology, U Minn Patricia Cruse, Director, Preservation, CDL Bernie Hurley, Director, Library Technologies, UC Berkeley R. Bruce Miller, University Librarian, UC - Merced Sarah Pritchard, University Librarian, Northwestern Paul Soderdahl, Director, LIT, U Iowa John Wilkin, Executive Director, HathiTrust (ex officio) Robert Wolven, Columbia University Strategic Advisory Board

38 Constitutional Convention October 2011 Delegates from each institution and consortium – Carry certain number of votes determined according to formula approved by Executive Committee 3-year review Proposals – Print management – Ballot proposals

39 How does work get done? Collective work – e.g., working groups – Perform the work of the partnership – Now 40+ people across partner institutions Distributed work – Driven by needs of institutions – able to leverage across the partnership – Projects, e.g. grant work, ingest specifications, page-turner, bibliographic data management Leverage expertise across institutions Working Groups and Committees | Projects

40 Working Groups (1) Operational focus – Appointed by Executive Director in coordination with Executive Committee – Current Usability User Support Communications – Previous Development Environment Storage Research Center

41 Working Groups (2) Planning or Exploratory focus – Appointed by Strategic Advisory Board – Recommendations reviewed by SAB and XCom; may call for subsequent implementation Collections Committee Surrogates Quality, Ingest, and Error rate Discovery

42 How is work prioritized? Initial functional objectives Collective processes – Working groups and committees Functional Objectives | Working Groups and Committees

43 e-Commerce Print on Demand Content Ingest Transformation Validation Content Access PageTurner Collection Builder Large-scale Search Bibliographic Catalog Research Center APIs Quality Assurance Quality Review Content Certification User Services Usability User support (helpdesk) Outreach Project website Monthly newsletter Papers and presentations Communication with potential partners Surveys, general inquiries Repository evaluation and audit (e.g., DRAMBORA, TRAC) Legal Risk management (use of materials) Partner agreements Advocacy Governance Budget, Finances Decision-making Policy Planning Enterprise Management Communication and Coordination with partner institutions Project management Repository Administration Hardware configuration and maintenance Web and application server configuration and maintenance Security Permissions Logging Repository Administration Data management (content storage, backup, integrity checks, deletion) Hardware selection and replacement Content and Metadata specifications Disaster Recovery Processes for ensuring content integrity Rights Management Copyright determination Copyright review Copyright information management (database) Rightsholder permissions Bibliographic Data Management Entity description (record-level) Object identification (item-level) Data availability Collection Development Digital Expansion beyond books and journals (born-digital, images and maps, audio) Selection of content (for non-Google volume ingest and pilot projects) Print Cloud Library (effect of digital on print) Financial contributions of partners HathiTrust Functional Framework Functional Framework

44 Partnership

45 Who can become a partner? – Institutions worldwide – Libraries with print holdings Eligibility and Agreements

46 What are the benefits? (1) Cost-effective long-term preservation and access services for digitized content – Commitments on digital content facilitate decisions about digitization efforts and print collection management For those with content, immediately offering long-term preservation, bibliographic and full-text search, collection-building With content or not, full viewing and downloading capabilities for public domain materials and materials for which we have received permissions Features and Benefits | New Cost Model FAQ

47 What are the benefits? (2) Specialized access to public domain and in-copyright materials for users with print disabilities Other lawful uses of in-copyright materials such as Section 108 uses (print replacement copies, digital access to applicable works) HathiTrust encourages participation in initiatives and resources geared toward – Shared collection development and management (e.g., copyright review work, print holdings database, de-duplication, collaboration with other organizations and initiatives) – Participation in governance and collaborative initiatives – Defining future directions of the shared library.

48 What's involved? Contract – Sustaining – Content-Contributing Yearly fees Commitment – 5-year periods Shibboleth Print Holdings

49 How much does it cost? (1) Cost

50 How much does it cost? (2) $0.149/volume/year for Google-digitized $0.489/volume/year for IA-digitized $0.154/volume/year for all content $3.40 per GB

51 How does it work? (1) Sustaining membership is base – Pricing model for all partners beginning 2013 – Based on overlap of HathiTrust volumes with institution's print holdings – Share in infrastructure costs for public domain volumes: (PD*X*C)/N – Share in infrastructure costs for in-copyright volumes based on holdings For a given in-copyright volume: IC=(C*X)/H
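
The two formulas on this slide can be worked through in a short sketch. The slide does not define its symbols, so the readings used here are assumptions: PD is the number of public domain volumes in the repository, C is the infrastructure cost per volume, X is the flexible multiplier for programmatic activities, N is the number of partners, and H is the number of partners holding a given in-copyright volume in print. The function names and the sample figures are made up for illustration.

```python
def public_domain_share(pd_volumes, cost_per_volume, multiplier, partners):
    """(PD*X*C)/N: public domain costs split evenly across all partners."""
    return pd_volumes * multiplier * cost_per_volume / partners


def in_copyright_share(cost_per_volume, multiplier, holders):
    """IC=(C*X)/H: one volume's cost split across the partners that hold it."""
    return cost_per_volume * multiplier / holders


def sustaining_fee(pd_volumes, cost_per_volume, multiplier, partners, holders_per_volume):
    """Sum the public domain share and the per-volume in-copyright shares.

    `holders_per_volume` lists, for each in-copyright volume this institution
    holds in print, how many partners (including itself) hold that volume.
    """
    pd_part = public_domain_share(pd_volumes, cost_per_volume, multiplier, partners)
    ic_part = sum(
        in_copyright_share(cost_per_volume, multiplier, holders)
        for holders in holders_per_volume
    )
    return pd_part + ic_part


# Illustrative inputs only: 1,000 public domain volumes, 0.50 per volume,
# multiplier 2.0, 10 partners, and two held in-copyright volumes that are
# shared by 4 partners and by 1 partner respectively.
fee = sustaining_fee(1000, 0.5, 2.0, 10, [4, 1])
```

Under these readings, a widely held in-copyright volume costs each holder less than a uniquely held one, and the public domain share per partner falls as N grows.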

52 How does it work? (2) Main factors in costs are – Amount of content – Number of partners – Also a flexible multiplier designed to pay for programmatic activities Tend to result in lower costs and more benefits over time

53 How does it work? (3) In order to support these calculations – Need print holdings database (2013) – Update mechanisms – Manual remediation Using estimates currently – Based on infrastructure costs of anticipated content – Estimated partnership growth – Institution total volume counts Cost

54 How does it work? (4) Does not exclude contribution of content If contribute content, costs covered up to amount that would be paid as Sustaining partner – Barring additional costs that might be needed to accommodate content (e.g., specialized load routines, generation of OCR) Above that, pay per-GB cost ($3.40)

55 How does it work? (5) Partners share in costs of sustaining common resource Share in uses of relevant materials Voice in future directions Costs to institutions go down Quality of services increases – Realize in aggregated collection something you don't get through distributed search or federation Free riders?

56 Changing Library Landscape Rapidly changing landscape Libraries are making these decisions but they are more and more collective decisions We cannot afford anymore to do work separately that could be done collaboratively

57 HathiTrust overall benefits to libraries Digital Curation – Drive costs down – Reduce bibliographic indeterminacy – Make meaningful decisions about formats and quality – Increase discoverability, use – Consolidate development talent – Improve strength of archiving Print Curation – Means to associate our print holdings – Coordinated record-keeping Subsidiary benefits – Quantify problems – Collective attention to solving shared problems

58 How to find out more Web site About section: Twitter: Monthly newsletter: RSS: Contact us: Soon: Facebook, blog

59 Thank you very much
