HathiTrust and the Ecology of Shared Collections Paul N. Courant 21 May 2009.


1 HathiTrust and the Ecology of Shared Collections Paul N. Courant 21 May 2009

2 The Big Picture

3 Why Collaborate on Shared Digital? It used to make sense for libraries to compete on collections. Now it only makes sense to compete in a very small area of collecting: the rare and unique (and sometimes, sadly, the expensive). For everything else, it makes economic sense to collaborate.

4 Why not Google? Because Google is not a library.

5 Persistence Persistence is essential for scholarship. The libraries that care about persistence are relatively few. Most of them are in ARL. This makes it even more important that those of us who do care about persistence work to make it happen.

6 Two (and a half) models of participation 1) Contributing both collections and financial support 2) Using the collection and contributing financial support 2.5) Using the collection and contributing nothing, a.k.a. free riders

7 The $64K M challenge What does it take for me to be able to show in my catalog a work that is persistently available and held elsewhere?

8 What is HathiTrust? origins intentions size and growth projections aspirations

9 current members California Digital Library Indiana University Michigan State University Northwestern University The Ohio State University Penn State University Purdue University UC Berkeley UC Davis UC Irvine UCLA UC Merced UC Riverside UC San Diego UC San Francisco UC Santa Barbara UC Santa Cruz The University of Chicago University of Illinois University of Illinois at Chicago The University of Iowa University of Michigan University of Minnesota University of Wisconsin-Madison University of Virginia

10 Preservation: OAIS Reference Model [diagram labels] GRIN Internal Data Loading; Google [OCA] In-house Conversion; MARC record extensions (Aleph) Rights DB; Page Turner HathiTrust API OAI GeoIP DB CNRI Handles [Solr]; METS/PREMIS object TIFF G4/JPEG2000 OCR MD5 checksums; METS object PNG OCR PDF; Isilon Site Replication TSM MD5 checksum validation; GROOVE (JHOVE)
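
The "MD5 checksum validation" label on slide 10 refers to a fixity check: recomputing each stored file's digest and comparing it with the digest recorded at ingest. The slide does not show how HathiTrust implements this, so the following is only a minimal sketch of the general technique. The function names, the manifest layout (file name mapped to expected hex digest), and the chunk size are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large page images never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_manifest(directory, manifest):
    """Return the names of files that are missing or whose MD5 differs.

    `manifest` is a hypothetical mapping of file name -> expected MD5 hex
    digest, standing in for the checksums recorded with a stored object.
    """
    failures = []
    for name, expected in manifest.items():
        target = Path(directory) / name
        if not target.is_file() or md5_of_file(target) != expected.lower():
            failures.append(name)
    return failures
```

An empty list from `validate_manifest` means every listed file is present and matches its recorded digest; any name returned would be a candidate for repair from a replica or backup copy.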

11 Mission and Goals to contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge – materials converted from print – improve access …to meet the needs of the co-owning institutions – reliable and accessible electronic representations – coordinate shared storage strategies – “public good” … free-riders. – simultaneously …centralized …open

12 growth trajectory

13 accomplishments to date 1. 25 partners 2. successful ingest and millions of vols online 3. mirroring and backup 4. rich access

14 books and journals online?

15 Search inside in-copyright

16 accomplishments to date 1. 25 partners 2. successful ingest and millions of vols online 3. mirroring and backup 4. rich access 5. “collection builder”

17 Collection Builder

18 accomplishments to date 1. 25 partners 2. successful ingest and millions of vols online 3. mirroring and backup 4. rich access 5. collection builder 6. soon, full-text search and data API

19 Project staff review comments and enrich cataloging records. Title: Wasīlat al-ṭullāb li-ma‘rifat a‘māl al-layl wa-al-nahār bi-ṭarīq al-ḥisāb: وسيلة الطلاب لمعرفة أعمال الليل والنهار بطريق الحساب manuscript [between 1525? and 1861] Author: Ḥaṭṭāb, Yaḥyá ibn Muḥammad, 1496 or 7. يحيى بن محمد الحطاب. Comment 1 Comment 2 Comment 3 Catalog records Local OPAC Page images HathiTrust Project Website Comments Enriched records

20 next up … non-Google ingest (OCA & local digitization) corpus research support – SEASR – Data export – Research center openness strategies binding together shared print and digital in strategy to manage local print

21 Universal Library? collaborative work around collaborative problem preserving the published record comprehensiveness through consolidation and sense-making commitment to perpetuity

22 opportunities economies of scale comprehensive collection combining print and digital strategies more effective digital preservation stepping stone to preserving other forms of digital content platform for new methods of discovery non-consumptive research

23 challenges digital preservation collaboration understanding what the right services are The Silence of the Archive: The USPS problem

24 thank you!

