HATHI TRUST A Shared Digital Repository Digital Preservation, HathiTrust, and the Reimagination of the Library Landscape Jeremy York Iceland August 5,


1 HATHI TRUST A Shared Digital Repository Digital Preservation, HathiTrust, and the Reimagination of the Library Landscape Jeremy York Iceland August 5, 2010

2 Outline Digital Preservation in U.S. HathiTrust – About HathiTrust – Content – What we do (services) – Governance – Partnership & Resources Google Settlement Publishing Changing Library Landscape

3 Books and Journals / Archives / Data – Portico: Centralized; Journals; Source files, mainly focused on XML, highly controlled transformation – Internet Archive: Centralized; Web files – ICPSR: Centralized; Social science data – LOCKSS: Distributed; Journals; Web files, not source images or XML – MetaArchive (NDIIPP): Distributed; Private LOCKSS Network; Web files – DATA-PASS (NDIIPP): Distributed; Social science data – HathiTrust: Centralized; Books and Journals; Master image and OCR files – International Internet Preservation Consortium: Distributed; Harvesting tools, Access, Preservation strategies – GeoMAPP (NDIIPP): Distributed; Geospatial data; State governments – OCLC Digital Archive: Centralized; Master files, web archiving; CONTENTdm, custom repository – LOCKSS, DuraCloud, DSpace, Fedora

4 NDIIPP Mission: Develop a national strategy to collect, preserve and make available significant digital content, especially information that is created in digital form only, for current and future generations. Since 2000 Broad collaborations with institutions and organizations (e.g., OCLC, Portico) Funding (Establishing a network, Preserving Creative America, Preserving State Government Information) Standards/Best Practices Tools o JHOVE2 (validation) o Chronopolis (data grid framework) o Dataverse (management, dissemination, exchange, and citation of virtual collections (dataverses) of quantitative data) o BagIt (transfer utilities - creation, manipulation and validation of bags) o Hub and Spoke (repository interoperability) o FITS (bundle of identification, validation and metadata extraction tools)

5 About

6 HathiTrust Digital Library Digital Repository – Initial focus on digitized book and journal content – Light archive Collections and Collaboration – Comprehensive collection – Shared strategies – Local services – Public Good

7 Current Partners – Columbia University – New York Public Library – University of California system – University of Virginia – Yale University – CIC (Committee on Institutional Cooperation): University of Chicago, University of Illinois, Indiana University, University of Iowa, University of Michigan, Michigan State University, University of Minnesota, Northwestern University, Ohio State University, Pennsylvania State University, Purdue University, University of Wisconsin-Madison

8 Content Distribution 6,383,209 – Total 1,234,088 – Public Domain * As of August 5, 2010

9 Language Distribution (1) * As of July 25, 2010

10 Language Distribution (2) The next 40 languages make up ~13% of total * As of July 25, 2010

11 Dates * As of July 25, 2010

12 Originating Institution * As of July 25, 2010

13 Content over time * As of July 25, 2010

14 Content Growth

15 What we do

16 Services (1) Ingest – Google, Internet Archive – Working toward sustainable model for ingest of content from diverse sources Long-term preservation – Bit-level, migration – Standard and open formats (ITU G4 TIFF, JPEG2000, JPG, Unicode) – OAIS, TRAC – Validation, integrity, redundancy
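
The "validation, integrity" items on this slide are commonly realized as a fixity check: a checksum recorded at ingest is recomputed later and compared. The following is a minimal Python sketch under stated assumptions. SHA-256, the `failed_fixity` helper, and the manifest layout (file name mapped to expected digest) are all hypothetical choices for illustration; the slides do not say which algorithm or layout HathiTrust uses.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def failed_fixity(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the names whose file is missing or whose digest differs.

    `manifest` maps a file name to its expected digest (assumed layout).
    """
    failures = []
    for name, expected in sorted(manifest.items()):
        target = root / name
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(name)
    return failures
```

An empty result means every listed file is present and unchanged. Running such a check against each redundant copy is one way the "redundancy" item could be used to repair a damaged copy from a good one.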

17 Services (2) Preservation…with Access Brings concerns of research libraries to bear on the way the scholarly record is cared for and made available – Scholarly Resource – Bibliographic Search – Full-text search – Collections – Full-PDF download of public domain







24 Services (4) Rights Management – Rights Database – Copyright review US k candidates, 85k reviewed 60% in public domain Data Distribution – Metadata files, Bib API, Data API Print on Demand

25 Services (5) Community Development Environment Non-Google Ingest Non-Book/Non-Journal Ingest Computational Research

26 Outlook Leverage partner resources and input to create and maintain the library of the future This is our library The more we use it, the better it will become

27 Governance

28 HathiTrust Executive Committee Strategic Advisory Board Budget/Finances Decision-making Guidance on Policy, Planning

29 Partnership & Resources

30 Funding Funded for an initial 5 years with base funding from partners 3-year review of governance and sustainability Budget – separately held within UMich budget system Cost Models – Per-GB cost of storage per year with a one-time fee on new content to build a capital fund – Volume overlap

31 Cost Model 1 Reasonable costs of sustaining the archive, includes cost of replacement, capital fund

32 Cost Model 1 Economies of scale keep costs low – $0.145/volume/year for Google-digitized – about $0.45/volume/year for IA-digitized Advantages not fully known until you jump in
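
To make the per-volume rates on this slide concrete, here is a small worked example in Python. The rates are the ones quoted on the slide (the Internet Archive figure is approximate). The deposit sizes and the `annual_storage_cost` helper are hypothetical, and the one-time fee on new content is not modeled.

```python
from decimal import Decimal

# Per-volume annual rates quoted on the slide (the IA figure is approximate).
GOOGLE_RATE = Decimal("0.145")
IA_RATE = Decimal("0.45")


def annual_storage_cost(google_volumes: int, ia_volumes: int) -> Decimal:
    """Estimated yearly cost in dollars for a hypothetical deposit."""
    return google_volumes * GOOGLE_RATE + ia_volumes * IA_RATE


# Hypothetical deposit: 100,000 Google-digitized and 2,000 IA-digitized volumes.
print(annual_storage_cost(100_000, 2_000))
```

For this hypothetical deposit the estimate is 14,500 dollars for the Google-digitized volumes plus 900 dollars for the IA-digitized volumes, 15,400 dollars per year in total.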

33 Cost Model 2 Shared space to deal with shared problems – Use HathiTrust as part of broader library strategies Beginning to see benefits of aggregating this body of materials together – Overlap, collection development – Coordinated print management – Begin to ask What is missing?

34 Cost Model 2 For public domain volumes: (PD*X*C)/N For a given in-copyright volume: IC=(C*X)/H Share in costs of curation Share in uses of relevant materials Voice in future directions Free riders?
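
The two formulas on this slide can be evaluated directly. The slide does not define its symbols, so the readings in this Python sketch are assumptions, not the author's definitions: PD as the number of public domain volumes, C as a per-volume cost, X as a multiplier, N as the number of partners sharing public domain costs, and H as the number of partners holding a given in-copyright volume. The function names are likewise hypothetical.

```python
def public_domain_share(PD: int, X: float, C: float, N: int) -> float:
    """Evaluate (PD*X*C)/N as written on the slide.

    Assumed reading: public domain costs are split evenly across N partners.
    """
    return (PD * X * C) / N


def in_copyright_share(C: float, X: float, H: int) -> float:
    """Evaluate IC=(C*X)/H as written on the slide.

    Assumed reading: one volume's cost is split across its H holders.
    """
    return (C * X) / H


# Hypothetical inputs, chosen only to show the arithmetic.
print(public_domain_share(PD=1000, X=1.0, C=0.5, N=25))
print(in_copyright_share(C=0.5, X=1.0, H=4))
```

Under these assumed readings, an in-copyright volume costs each holder less as more partners hold it, which fits the slide's "share in costs of curation" point.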

35 Staff Staff/Expertise – highly integrated – Project managers, IT and communications staff, copyright experts, administrators (UM, Indiana and UC taking the lead) Working groups Shared development space

36 e-Commerce Print on Demand Content Ingest Transformation Validation Content Access PageTurner Collection Builder Large-scale Search Bibliographic Catalog Research Center APIs Quality Assurance Quality Review Content Certification User Services Usability User support (helpdesk) Outreach Project website Monthly newsletter Papers and presentations Communication with potential partners Surveys, general inquiries Repository evaluation and audit (e.g., DRAMBORA, TRAC) Legal Risk management (use of materials) Partner agreements Advocacy Governance Budget, Finances Decision-making Policy Planning Enterprise Management Communication and Coordination with partner institutions Project management Repository Administration Hardware configuration and maintenance Web and application server configuration and maintenance Security Permissions Logging Repository Administration Data management (content storage, backup, integrity checks, deletion) Hardware selection and replacement Content and Metadata specifications Disaster Recovery Processes for ensuring content integrity Rights Management Copyright determination Copyright review Copyright information management (database) Rightsholder permissions Bibliographic Data Management Entity description (record-level) Object identification (item-level) Data availability Collection Development Digital Expansion beyond books and journals (born-digital, images and maps, audio) Selection of content (for non- Google volume ingest and pilots projects) Print Cloud Library (effect of digital on print) Financial contributions of partners HathiTrust Functional Framework

37 Working Groups Current Quality Discovery Interface (with OCLC) Collections Communication Usability Past Storage Research Center

38 Google Settlement (1) 2005 – Authors Guild and AAP sued; Google claimed fair use Settlement – 2008 Amended – Nov 2009 Works covered – registered with U.S. Copyright Office, or published in Canada, UK, Australia Works not covered – public domain, published after 5 Jan 2009

39 Google Settlement (2) Google continues scanning In copyright, non-commercially available out-of-print work – Sell individual access, any book retailer - 63% of revenue to rights holders, distributed by BRR – display up to 20% – Copy & paste and printing – Rights holders can open access, distribute under CC, set printing limits – Institutional subscription (available to libraries, fee based on FTE users) Includes unclaimed works – BRR required to search for rights holders and hold revenue on their behalf Public access terminals Cash payments to Rightsholders whose works were scanned before May 5, 2009

40 Book Rights Registry – Represent the interests of the Rightsholders – equal representation of Author and Publisher sub-classes on board; one author and publisher representative from US, UK, Canada, Australia; court-appointed representative for rights holders of unclaimed works – Establish and maintain a database of contact information for authors and publishers; – Use commercially reasonable efforts to locate Rightsholders; – Distribute payments received from Google for the Rightsholders' share of revenues; and – Assist in the resolution of disputes between Rightsholders. – Funded by Google (initial $34.5 million, ongoing percentage of revenues)

41 Settlement for HathiTrust Complementary – Settlement provides access to covered works, HathiTrust is preservation, trust for the future – Research Center (75% of Google Book Search scanned from HathiTrust partner libraries) Specifically sanctions – Section 108 uses, access for users with print disabilities, computational research Does not allow – Fair use, sale of access, interlibrary loan, e-reserves, use in course management systems

42 Publishing Libraries would like to buy more eBooks Cost is high Not good models for consortia (multiple users) Move to on-demand purchase, leasing of volumes Do we need to own it?

43 Changing Library Landscape Leverage collective resources, expertise – Drive costs down – Increase discoverability, use – Improve strength of archiving – Reduce redundancy of collections (digital and print), effort – Address collective challenges Focus on local resources and services Redefine who we are, what we provide – Collections, research

44 Thank you!
