HATHI TRUST A Shared Digital Repository HathiTrust 101 John Wilkin and Jeremy York August 27, 2010.

Slides:



Advertisements
Similar presentations
Beyond the Google Book: the Future of the Digital Library Cory Snavely Library IT Core Services manager University of Michigan April 20, 2010.
Advertisements

HathiTrust Digital Library
HATHI TRUST A Shared Digital Repository Building A Future By Preserving Our Past The Preservation Infrastructure of HathiTrust Digital Library Jeremy York.
HATHI TRUST A Shared Digital Repository HathiTrust Digital Library Is There A Past In Your Future? Princeton University February 2010.
KAT HAGEDORN HATHITRUST SPECIAL PROJECTS COORDINATOR UNIVERSITY OF MICHIGAN LIBRARIES OCTOBER 9, 2009 Seamless Sharing: NYU, HathiTrust, ReCAP and the.
HathiTrust: Building the Universal Collection John Wilkin 18 May 2009.
This Library Never Forgets Preservation, Cooperation, and the Making of HathiTrust Digital Library Jeremy York Project Librarian HathiTrust Digital Library.
HATHI TRUST A Shared Digital Repository Unpacking HathiTrusts New Cost Model Jeremy York Project Librarian, HathiTrust SUNY July 15, 2011.
HathiTrust: A Big Idea with Bold Plans
HATHI TRUST A Shared Digital Repository HathiTrust Open Webinar Jeremy York Project Librarian, HathiTrust May 3 and 5, 2011.
HATHI TRUST A Shared Digital Repository HathiTrust Overview Julie Bobay, Heather Christenson, and John Wilkin April 12, 2011.
Building the Universal Library: The Promise and Challenges of HathiTrust John Wilkin 2 April 2009.
HathiTrust Sharing a Federal Print Repository: Issues and Opportunities May 25, 2011 Heather Christenson.
HATHI TRUST A Shared Digital Repository Digital Preservation, HathiTrust, and the Reimagination of the Library Landscape Jeremy York Iceland August 5,
HATHI TRUST A Shared Digital Repository HathiTrust How We Can Make A Difference Jeremy York Yale University November 3, 2010.
What is HathiTrust and How Can it Make a Difference? Sourcing and Scaling brought to the collective collection.
HathiTrust and Print Storage Building around a digital core.
What is HathiTrust and Why is it relevant to research libraries? Sourcing and Scaling brought to the collective collection.
How HathiTrust Serves the UC Community Users Council May 21, 2012 Heather Christenson, California Digital Library.
Building the Universal Library: Introducing HathiTrust Patricia A. Steele Indiana University Libraries John Price Wilkin University of Michigan Libraries.
HATHI TRUST A Shared Digital Repository HathiTrust, Collections, and Collaboration COLD 2011 Spring Meeting Jeremy York May 20, 2011.
HATHITRUST A Shared Digital Repository HathiTrust Outside-In University of Michigan Law School June 14, 2011 Jeremy York HathiTrust Project Librarian.
KAT HAGEDORN HATHITRUST SPECIAL PROJECTS COORDINATOR UNIVERSITY OF MICHIGAN LIBRARIES OCTOBER 9, 2009 Seamless Sharing: NYU, HathiTrust, ReCAP and the.
E-Content Service Group Virtual Meeting Digital Preservation: How to Get Started.
Digital Preservation A Matter of Trust. Context * As of March 5, 2011.
HathiTrust and the Ecology of Shared Collections Paul N. Courant 21 May 2009.
HATHITRUST A Shared Digital Repository We’re Preserving the Past, What About the Present? NISO Webinar: Ensuring the Preservation of E-Books May 23, 2012.
What’s Next for HathiTrust?. We’re Growing Up! Partnership Arizona State University Baylor University Boston University California Digital Library Columbia.
HATHITRUST A Shared Digital Repository HathiTrust current work, challenges, and opportunities for public libraries Creating a Blueprint for a National.
HATHITRUST A Shared Digital Repository HathiTrust as a Model for Preservation and Access Jeremy York Media Preservation Conference April 17, 2013.
HATHI TRUST A Shared Digital Repository Digital Repositories for Preservation and Access Digital Directions 2013 Jeremy York July 22, 2013 Unless otherwise.
HATHITRUST A Shared Digital Repository Bibliographic Metadata and HathiTrust ALCTS CaMMS Catalog Management Interest Group Meeting American Library Association.
Digital archival storage for the University of Michigan Library collections.
HathiTrust Constitutional Convention Session #2: Report on 3-year review and Q & A Ed Van Gemert, Chair, Strategic Advisory Board Patricia Cruse, Member,
HATHITRUST A Shared Digital Repository HathiTrust on the Move A Growing Partnership Taking Stock and Looking Ahead National Library of Medecine October.
PREMIS in Thought: Data Center for LC Digital Holdings Ardys Kozbial, Arwen Hutt, David Minor February 11, 2008.
HATHITRUST A Shared Digital Repository A Preservation Infrastructure Built to Last: Preservation, Community, and HathiTrust UNESCO Memory of the World.
HATHITRUST A Shared Digital Repository HathiTrust Overview: Partnership and Services Jeremy York Wesleyan University Web Presentation February 18, 2014.
HATHI TRUST A Shared Digital Repository Columbia University and HathiTrust Collaboration at a new level.
HATHITRUST A Shared Digital Repository HathiTrust Past, Present, and Future A Brief Introduction.
HATHITRUST A Shared Digital Repository More, Better, Together: HathiTrust Accomplishments and Aspirations The Researcher of Tomorrow Universidad Complutense.
HathiTrust – How To By Dr. Rob McGeachin 20 th Annual AgNIC Meeting May 7, 2015.
HATHITRUST A Shared Digital Repository HathiTrust: Putting Research in Context HTRC UnCamp September 10, 2012 John Wilkin, Executive Director, HathiTrust.
HATHITRUST A Shared Digital Repository Collaborating Globally, Planning Locally HathiTrust and New Opportunities in Collection Management GWLA/UNM: Emerging.
HATHITRUST A Shared Digital Repository HathiTrust Infrastructure and Information Organization November 7, 2011 Jeremy York Project Librarian, HathiTrust.
HathiTrust Digital Library. Overview ›Began in 2008 ›Large scale digital preservation repository ›Partnership of major research libraries ›Focus on both.
HATHITRUST A Shared Digital Repository HathiTrust: Key Concepts and Issues in Managing the Digital Archive ICPSR Summer Workshop “Curating and Managing.
Breana McCracken University of Illinois at Urbana-Champaign HathiTrust and Copyright Future Implications - Strong precedent for libraries to continue to.
HATHITRUST A Shared Digital Repository HathiTrust and TRAC DigitalPreservation 2012 July 25, 2012 Jeremy York, Project Librarian, HathiTrust.
H ATHI T RUST HTTP :// WWW. HATHITRUST. ORG Large-Scale Digital Initiatives and their potential impact on the Maine Shared Collections Strategy Colby College.
Challenges and Opportunities for Academic Libraries Collaborative Imperatives to Support Collections, Digital Initiatives, and New Services for a Changing.
HathiTrust’s Past, Present and Future. Short- and Long-term Functional Objectives Short-term Page turner mechanism (and Mobile!) Branding (overall initiative;
Author(s): Jeremy York, 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution–Noncommercial–Share.
Archival Workshop on Ingest, Identification, and Certification Standards Certification (Best Practices) Checklist Does the archive have a written plan.
HATHITRUST A Shared Digital Repository HathiTrust and the Future of Research Libraries American Antiquarian Society March 31, 2012 Jeremy York, Project.
HATHITRUST A Shared Digital Repository Your Library, Now Online! Putting HathiTrust in the Context of Traditional (and New) Library Services MCLS Webinar.
HATHI TRUST A Shared Digital Repository Use of PREMIS for Internet Archive AIPs September 22, 2010.
Preservation Program Digital Preservation Program Digital Preservation Services: Extending tools to meet campus needs Patricia Cruse, Director, Digital.
HATHITRUST A Shared Digital Repository Institution Uses of HathiTrust Jeremy York University of Maine May 24, 2013.
HathiTrust: Collaboration in Building the Universal Collection John Wilkin 1 October 2009.
HATHITRUST A Shared Digital Repository HathiTrust Large Digital Libraries: Beyond Google Books Modern Language Association January 5, 2012 Jeremy York,
SEDAC Long-Term Archive Development Robert R. Downs Socioeconomic Data and Applications Center Center for International Earth Science Information Network.
Barbara Preece ICOLC, April Mark Sandler Center for Library Initiatives Chicago Illinois Indiana Iowa Michigan Michigan State Minnesota Northwestern.
HathiTrust: A valuable and visionary Partnership.
HathiTrust--a GovDocs Repository? Brian Vetruba, Catalog Librarian/Germanic Studies Librarian Washington University in St. Louis Leveraging.
HathiTrust Digital Library Interface and Services
Building the Universal Library: Introducing HathiTrust
HathiTrust And Its Research Center
Robin Dale RLG OAIS Functionality Robin Dale RLG
digital archival storage
Presentation transcript:

HATHI TRUST A Shared Digital Repository HathiTrust 101 John Wilkin and Jeremy York August 27, 2010

Outline About HathiTrust – Mission & Goals Governance Content What we do (services) Partnership & Resources Technology Future Directions

Current Partners – Columbia University – New York Public Library – University of California system – CIC (Committee on Institutional Cooperation) – Triangle Research Library Network – University of Virginia – Yale University University of Chicago University of Illinois Indiana University University of Iowa University of Michigan Michigan State University University of Minnesota Northwestern University Ohio State University Pennsylvania State University Purdue University University of Wisconsin-Madison

Universal Digital Library Common Goal Single Entity, Many Partners HathiTrust

Governance HathiTrust Executive Committee Strategic Advisory Board Budget/Finances Decision-making Guidance on Policy, Planning

Executive Committee Paul Courant, University Librarian and Dean of Libraries, UM Laine Farley, Executive Director, CDL John King, Vice Provost for Academic Information, UM Paula Kaufman, University Librarian and Dean of Libraries, UI Brian Schottlaender, University Librarian, UCSD Ed Van Gemert, Deputy Director of Libraries, UW – Madison (ex officio) Brenda Johnson, Dean of Libraries, IU Brad Wheeler, Chief Information Officer, IU John Wilkin, Executive Director of HathiTrust and Associate University Librarian, LIT, UM

Strategic Advisory Board Ed Van Gemert (Chair), Deputy Director of Libraries, UW - Madison John Butler, Associate University Librarian for Information Technology, U Minn Patricia Cruse, Director, Preservation, CDL Bernie Hurley, Director, Library Technologies, UC Berkeley R. Bruce Miller, University Librarian, UC - Merced Sarah Pritchard, University Librarian, Northwestern Paul Soderdahl, Director, LIT, U Iowa John Wilkin, Executive Director, HathiTrust (ex officio) Robert Wolven, Columbia University

Content Distribution 6,331,718 – Total 1,215,210 – Public Domain * As of July 25, 2010

Language Distribution (1) * As of July 25, 2010

Language Distribution (2) The next 40 languages make up ~13% of total * As of July 25, 2010

Dates * As of July 25, 2010

Originating Institution * As of July 25, 2010

Content over time * As of July 25, 2010

Content Growth

Bit-level preservation and migration Long-term preservation Viewing Redistribution Print disabilities Section 108 Content Access Rights database Copyright review Rights management Temporary catalog Version 1 permanent catalog Summer 2010 Bibliographic search November 2009 Full-text search UM public domain UM Press Print on Demand Collection Builder Publish virtual collections Metadata files Bib API Data API Availability of data Inbound validation Fixity checks Google and IA ingest Full-PDF download Collection Builder Shibboleth Supporting partner development Development Environment Datasets Protocol Research Center Computational Research Born digital Images/maps Audio Beyond Books and Journals Services

Focus on users Preservation…with Access Brings concerns of research libraries to bear on the way the scholarly record is cared for and made available – Scholarly Resource – Bibliographic Search – Full-text search – Collections – Full-PDF download of public domain

Cost Model 1 Reasonable costs of sustaining the archive, includes cost of replacement, capital fund

Cost Model 1 Economies of scale keep costs low – $0.145/volume/year for Google-digitized – about $0.45/volume/year for IA-digitized Advantages not fully known until you jump in

For public domain volumes: (PD*X*C)/N For a given in­copyright volume: IC=(C*X)/H Share in costs of curation Share in uses of relevant materials Voice in future directions Free riders? Cost Model 2

Sustaining common resource Costs go down Quality of services increases – Realize in aggregated collection, something dont get through distributed search or federation

Cost Model 2: Timeline & Requirements Timeline: – Implement in 2013 – Accept new partners now with costs based on overlap calculations Requirements: – Print holdings database – Update mechanisms – Manual remediation

Print Holdings Database Print holdings database will also benefit – De-duplication Compromises user experience, obscures collection development needs – Management of print volumes Information to withdraw volumes (journals) – Legal uses of copyright materials Section 108, 121, ADA uses will depend knowledge of which institutions own(ed) which materials

Staff Staff/Expertise – highly integrated – Project managers, IT and communications staff, copyright experts, administrators – Working groups Shared development space

e-Commerce Print on Demand Content Ingest Transformation Validation Content Access PageTurner Collection Builder Large-scale Search Bibliographic Catalog Research Center APIs Quality Assurance Quality Review Content Certification User Services Usability User support (helpdesk) Outreach Project website Monthly newsletter Papers and presentations Communication with potential partners Surveys, general inquiries Repository evaluation and audit (e.g., DRAMBORA, TRAC) Legal Risk management (use of materials) Partner agreements Advocacy Governance Budget, Finances Decision-making Policy Planning Enterprise Management Communication and Coordination with partner institutions Project management Repository Administration Hardware configuration and maintenance Web and application server configuration and maintenance Security Permissions Logging Repository Administration Data management (content storage, backup, integrity checks, deletion) Hardware selection and replacement Content and Metadata specifications Disaster Recovery Processes for ensuring content integrity Rights Management Copyright determination Copyright review Copyright information management (database) Rightsholder permissions Bibliographic Data Management Entity description (record-level) Object identification (item-level) Data availability Collection Development Digital Expansion beyond books and journals (born-digital, images and maps, audio) Selection of content (for non- Google volume ingest and pilots projects) Print Cloud Library (effect of digital on print) Financial contributions of partners HathiTrust Functional Framework

Collection Development Digitization/Collaboration with other initiatives Public domain determinations Duplicate volumes Citation Building Collections Quality

A global change in the library environment June 2010 Median duplication: 31% June 2009 Median duplication: 19% Academic print book collection already substantially duplicated in mass digitized book corpus

Digitized Books in Shared Repositories ~75% of mass digitized corpus is backed up in one or more shared print repositories ~3.5M titles ~2.5M

Technology - OAIS GRIN Internal Data Loading GRIN Internal Data Loading Google [OCA] In-house Conversion Google [OCA] In-house Conversion MARC record extensions (Aleph) Rights DB MARC record extensions (Aleph) Rights DB Page Turner HathiTrust API OAI GeoIP DB CNRI Handles [Solr] Page Turner HathiTrust API OAI GeoIP DB CNRI Handles [Solr] METS/PREMIS object TIFF G4/JPEG2000 OCR MD5 checksums METS/PREMIS object TIFF G4/JPEG2000 OCR MD5 checksums METS object PNG OCR PDF METS object PNG OCR PDF Isilon Site Replication TSM MD5 checksum validation Isilon Site Replication TSM MD5 checksum validation GROOVE (JHOVE) GROOVE (JHOVE) ;

Future Directions Locally-digitized partner content Usage reporting Coordinate digital and print resources (holdings database) Computational Research Quality Strategies for openness Collaborative Development Extending Services through Shibboleth Non-book, non-journal content Born-digital content New Bibliographic Management Compliance with TRAC Grant projects OCLC Catalog 3-year review Improvements to Large-scale Search Improvements to PageTurner Ingest Reporting

How can HathiTrust make a difference? Digital Curation – Drive costs down – Reduce bibliographic indeterminacy – Make meaningful decisions about formats and quality – Increase discoverability – Consolidate development talent – Improve strength of archiving Print Curation – Means to associate our print holdings – Coordinated record-keeping Subsidiary benefits – Improve description – Quantify problems – Collective attention to solving shared problems