HathiTrust and the Ecology of Shared Collections Paul N. Courant 21 May 2009
The Big Picture
Why Collaborate on Shared Digital? It used to make sense for libraries to compete on collections. Now it only makes sense to compete in a very small area of collecting: the rare and unique (and sometimes, sadly, the expensive). For everything else, it makes economic sense to collaborate.
Why not Google? Because Google is not a library.
Persistence Persistence is essential for scholarship. The libraries that care about persistence are relatively few. Most of them are in ARL. This makes it even more important that those of us who do care about persistence work to make it happen.
Two (and a half) models of participation 1) Contributing both collections and financial support 2) Using the collection and contributing financial support 2.5) Using the collection and contributing nothing, a.k.a. free riders
The $64K M challenge What does it take for me to be able to show in my catalog a work that is persistently available and held elsewhere?
What is HathiTrust? origins intentions size and growth projections aspirations
current members: California Digital Library, Indiana University, Michigan State University, Northwestern University, The Ohio State University, Penn State University, Purdue University, UC Berkeley, UC Davis, UC Irvine, UCLA, UC Merced, UC Riverside, UC San Diego, UC San Francisco, UC Santa Barbara, UC Santa Cruz, The University of Chicago, University of Illinois, University of Illinois at Chicago, The University of Iowa, University of Michigan, University of Minnesota, University of Wisconsin-Madison, University of Virginia
Preservation: OAIS Reference Model [architecture diagram; labels as extracted] GRIN Internal Data Loading; Google [OCA] In-house Conversion; MARC record extensions (Aleph) Rights DB; Page Turner HathiTrust API OAI GeoIP DB CNRI Handles [Solr]; METS/PREMIS object TIFF G4/JPEG2000 OCR MD5 checksums; METS object PNG OCR PDF; Isilon Site Replication TSM MD5 checksum validation; GROOVE (JHOVE)
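The diagram lists "MD5 checksums" stored with each object and "MD5 checksum validation" at the storage layer. The general idea of such a fixity check can be sketched as follows. This is a minimal illustration, not HathiTrust's actual implementation: the function names, the manifest layout (file name mapped to hex digest), and the status labels are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_package(package_dir, manifest):
    """Compare each file's current MD5 with the digest recorded at ingest.

    `manifest` is a hypothetical mapping of relative file names to
    expected hex digests. Returns a dict of file name -> "ok",
    "mismatch", or "missing".
    """
    results = {}
    for name, expected in manifest.items():
        path = Path(package_dir) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == expected.lower():
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

Run periodically against each replica, a check of this kind flags files that have silently changed or disappeared since ingest, so they can be restored from another copy.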
Mission and Goals to contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge – materials converted from print – improve access …to meet the needs of the co-owning institutions – reliable and accessible electronic representations – coordinate shared storage strategies – “public good” … free-riders. – simultaneously …centralized …open
accomplishments to date 1. 25 partners 2. successful ingest and millions of vols online 3. mirroring and backup 4. rich access
books and journals online?
Search inside in-copyright
accomplishments to date 1. 25 partners 2. successful ingest and millions of vols online 3. mirroring and backup 4. rich access 5. “collection builder”
accomplishments to date 1. 25 partners 2. successful ingest and millions of vols online 3. mirroring and backup 4. rich access 5. collection builder 6. soon, full text search and data API
Project staff review comments and enrich cataloging records. Title: Wasīlat al-ṭullāb li-ma‘rifat a‘māl al-layl wa-al-nahār bi-ṭarīq al-ḥisāb: وسيلة الطلاب لمعرفة أعمال الليل والنهار بطريق الحساب manuscript [between 1525? and 1861] Author: Ḥaṭṭāb, Yaḥyá ibn Muḥammad, 1496 or 7. يحيى بن محمد الحطاب. Comment 1 Comment 2 Comment 3 Catalog records Local OPAC Page images HathiTrust Project Website Comments Enriched records
next up … non-Google ingest (OCA & local digitization) corpus research support – SEASR – Data export – Research center openness strategies binding together shared print and digital in strategy to manage local print
Universal Library? collaborative work around collaborative problem; preserving the published record; comprehensiveness through consolidation and sense-making; commitment to perpetuity
opportunities: economies of scale; comprehensive collection; combining print and digital strategies; more effective digital preservation; stepping stone to preserving other forms of digital content; platform for new methods of discovery; non-consumptive research
challenges: digital preservation; collaboration; understanding what the right services are; The Silence of the Archive: The USPS problem