HATHI TRUST A Shared Digital Repository HathiTrust How We Can Make A Difference Jeremy York Yale University November 3, 2010.

1 HATHI TRUST A Shared Digital Repository HathiTrust How We Can Make A Difference Jeremy York Yale University November 3, 2010

2 Current Partners Committee on Institutional Cooperation (CIC) Columbia University Cornell University Dartmouth College New York Public Library Yale University Library Princeton University Library Triangle Research Libraries Network University of California system University of Virginia Utah State University

3 Mission To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge

4 Universal Library Common Goal Single Entity, Many Partners HathiTrust

5 Goals Comprehensive collection Preservation…with Access Shared strategies – Collection management, development – Preservation – Copyright – Efficient user services Openness

6 Outline Content Services Governance How work is done What work there is How we can make a difference

7 Content Distribution 7,130,606 – Total volumes 1,678,161 – Public Domain 4,071,294 Book titles 170,535 Serial titles * As of November 3, 2010

8 Language Distribution (1) * As of November 2, 2010

9 Language Distribution (2) The next 40 languages make up ~13% of total * As of November 2, 2010

10 Dates * As of November 2, 2010

11 Content Growth

12 A global change in the library environment June 2010 Median duplication: 31% June 2009 Median duplication: 19% Academic print book collection already substantially duplicated in mass digitized book corpus

13 Digitized Books in Shared Repositories ~75% of mass digitized corpus is backed up in one or more shared print repositories ~3.5M titles ~2.5M

14 Services (1) Ingest – Google, Internet Archive, Local – Working toward sustainable model for ingest of content from diverse sources Long-term preservation – Bit-level, migration – Standard and open formats (ITU G4 TIFF, JPEG2000, JPEG, Unicode) – OAIS, TRAC – Validation, integrity, redundancy
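The "Validation, integrity, redundancy" item can be illustrated with a minimal fixity-check sketch. It is not a description of how HathiTrust itself verifies content: the choice of SHA-256, the function names, and the idea of comparing against a stored digest are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks so large page images need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_fixity(path: Path, expected_hex: str) -> bool:
    """Hypothetical check: does the file still match its recorded digest?"""
    return file_digest(path) == expected_hex.lower()
```

A repository would typically record the digest at ingest and rerun a check like this on every stored copy at intervals, so that a damaged copy can be replaced from a redundant one.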

15 Repository Philosophy/Design Consistency Standardization Simplicity (in design, not function) Practicality Sustainability


17 Services (2) Rights Management – Automatic review – Manual review (Michigan, Indiana, Minnesota, Wisconsin) Since 2007 IMLS in staff in all US ,497 US publications from ,000 reviewed, 175,000 remaining candidates 52,000 in public domain

18 Services (3) Preservation…with Access Brings concerns of research libraries to bear on the way the scholarly record is cared for and made available – Bibliographic Search – Full-text search – Collections – Full-PDF download of public domain – Scholarly Resource

32 Services (4) Data Distribution – Metadata files, Bib API, Data API, OAI Print on Demand Collaborative Development Environment Coming soon… – Non-Book/Non-Journal Ingest – Computational Research

33 Computational Research Data distribution Protocol-based access Research Center

34 Quality Partner Digitization Google Digitization Volume Certification

35 Outlook Leverage partner resources and input to create and maintain the library of the future This is our library The more we use it, the better it will become

36 Governance HathiTrust Executive Committee Strategic Advisory Board Budget/Finances Decision-making Guidance on Policy, Planning

37 Executive Committee Paul Courant, University Librarian and Dean of Libraries, UM Laine Farley, Executive Director, CDL John King, Vice Provost for Academic Information, UM Paula Kaufman, University Librarian and Dean of Libraries, UI Brian Schottlaender, University Librarian, UCSD Ed Van Gemert, Deputy Director of Libraries, UW – Madison (ex officio) Brenda Johnson, Dean of Libraries, IU Brad Wheeler, Chief Information Officer, IU John Wilkin, Executive Director of HathiTrust and Associate University Librarian, LIT, UM

38 Strategic Advisory Board Ed Van Gemert (Chair), Deputy Director of Libraries, UW - Madison John Butler, Associate University Librarian for Information Technology, U Minn Patricia Cruse, Director, Preservation, CDL Bernie Hurley, Director, Library Technologies, UC Berkeley R. Bruce Miller, University Librarian, UC - Merced Sarah Pritchard, University Librarian, Northwestern Paul Soderdahl, Director, LIT, U Iowa John Wilkin, Executive Director, HathiTrust (ex officio) Robert Wolven, Columbia University

39 How does work get done? Collective work – e.g., working groups Distributed work – Projects, e.g. grant work, ingest specifications, page turner, bibliographic data management

40 Working Groups (1) Operational focus – Appointed by Executive Director in coordination with Executive Committee Usability Communications Development Environment Storage Research Center

41 Working Groups (2) Planning or Exploratory focus – Appointed by Strategic Advisory Board – Recommendations reviewed by SAB and XCom; may call for subsequent implementation Collections Committee Surrogates Quality, Ingest, and Error rate Discovery

42 HathiTrust Functional Framework e-Commerce Print on Demand Content Ingest Transformation Validation Content Access PageTurner Collection Builder Large-scale Search Bibliographic Catalog Research Center APIs Quality Assurance Quality Review Content Certification User Services Usability User support (helpdesk) Outreach Project website Monthly newsletter Papers and presentations Communication with potential partners Surveys, general inquiries Repository evaluation and audit (e.g., DRAMBORA, TRAC) Legal Risk management (use of materials) Partner agreements Advocacy Governance Budget, Finances Decision-making Policy Planning Enterprise Management Communication and Coordination with partner institutions Project management Repository Administration Hardware configuration and maintenance Web and application server configuration and maintenance Security Permissions Logging Repository Administration Data management (content storage, backup, integrity checks, deletion) Hardware selection and replacement Content and Metadata specifications Disaster Recovery Processes for ensuring content integrity Rights Management Copyright determination Copyright review Copyright information management (database) Rightsholder permissions Bibliographic Data Management Entity description (record-level) Object identification (item-level) Data availability Collection Development Digital Expansion beyond books and journals (born-digital, images and maps, audio) Selection of content (for non-Google volume ingest and pilot projects) Print Cloud Library (effect of digital on print) Financial contributions of partners

43 What work is there? Usage Reporting Quality Copyright Review Specifications Metadata Development Environment Other?

44 Cost Model 1 Reasonable costs of sustaining the archive, includes cost of replacement, capital fund

45 Cost Model 1 Economies of scale keep costs low – $0.145/volume/year for Google-digitized – about $0.45/volume/year for IA-digitized Advantages not fully known until you jump in
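As a rough illustration of the per-volume rates on this slide, the sketch below estimates a yearly bill for a partner. Only the two rates come from the slide (and the IA rate is given there as approximate); the function name and the example volume counts are hypothetical.

```python
# Per-volume annual rates quoted on the slide (USD); the IA figure is approximate.
GOOGLE_RATE = 0.145
IA_RATE = 0.45


def annual_cost(google_volumes: int, ia_volumes: int) -> float:
    """Estimate the yearly cost in USD for a hypothetical mix of volumes."""
    return google_volumes * GOOGLE_RATE + ia_volumes * IA_RATE


# Hypothetical partner: 100,000 Google-digitized and 10,000 IA-digitized volumes.
estimate = annual_cost(100_000, 10_000)
print(f"${estimate:,.2f}")  # prints $19,000.00
```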

46 Cost Model 2 For public domain volumes: (PD*X*C)/N For a given in-copyright volume: IC=(C*X)/H Share in costs of curation Share in uses of relevant materials Voice in future directions Free riders?
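The two formulas on this slide can be sketched in code. The slide does not define its symbols, so the readings used here are assumptions for illustration only: PD as the number of public domain volumes, C as a per-volume cost, X as a multiplier applied to that cost, N as the number of partners, and H as the number of partners holding the print volume. The function names and sample values are likewise hypothetical.

```python
def public_domain_share(pd_volumes: int, x: float, c: float, n_partners: int) -> float:
    """(PD*X*C)/N: one partner's share of the public domain cost (assumed reading)."""
    return (pd_volumes * x * c) / n_partners


def in_copyright_share(c: float, x: float, holders: int) -> float:
    """IC=(C*X)/H: one holder's share for a single in-copyright volume (assumed reading)."""
    return (c * x) / holders


# Hypothetical inputs, not figures from the presentation.
pd_cost = public_domain_share(pd_volumes=1_600_000, x=1.0, c=0.25, n_partners=50)
ic_cost = in_copyright_share(c=0.25, x=1.0, holders=5)
print(pd_cost, ic_cost)
```

Under this reading, the public domain cost is split evenly across all partners, while the cost of an in-copyright volume is split only among the partners that hold it in print, which is why the model depends on a print holdings database.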

47 Sustaining a common resource Costs go down Quality of services increases – Realize in an aggregated collection something we don't get through distributed search or federation

48 Cost Model 2: Timeline & Requirements Timeline: – Implement in 2013 – Accept new partners now with costs based on overlap calculations Requirements: – Print holdings database – Update mechanisms – Manual remediation

49 Print Holdings Database Print holdings database will also benefit – De-duplication Compromises user experience, obscures collection development needs – Management of print volumes Information to withdraw volumes (journals) – Legal uses of copyright materials Section 108, 121, ADA uses will depend on knowledge of which institutions own(ed) which materials

50 Future Directions (1) Locally-digitized partner content Usage reporting Coordinate digital and print resources (holdings database) Computational Research Quality Strategies for openness Collaborative Development Extending Services through Shibboleth Non-book, non-journal content

51 Future Directions (2) Born-digital content New Bibliographic Management Compliance with TRAC Grant projects OCLC Catalog 3-year review Improvements to Large-scale Search Improvements to PageTurner Ingest Reporting

52 How can we make a difference? Digital Curation – Drive costs down – Reduce bibliographic indeterminacy – Make meaningful decisions about formats and quality – Increase discoverability – Consolidate development talent – Improve strength of archiving Print Curation – Means to associate our print holdings – Coordinated record-keeping Subsidiary benefits – Improve description – Quantify problems – Collective attention to solving shared problems

53 Thank you!
