HATHI TRUST A Shared Digital Repository HathiTrust Overview Julie Bobay, Heather Christenson, and John Wilkin April 12, 2011.

1 HATHI TRUST A Shared Digital Repository HathiTrust Overview Julie Bobay, Heather Christenson, and John Wilkin April 12, 2011

2 HathiTrust Overview Our organization and how it functions Our HathiTrust collection Perspectives on HathiTrust and public services Leveraging HathiTrust data How HathiTrust can make a difference How to find out more

3 Universal Library Common Goal Single Entity, Many Partners HathiTrust

4 Current Partners Arizona State University Baylor University California Digital Library Columbia University Cornell University Dartmouth College Duke University Emory University Harvard University Library Indiana University Johns Hopkins University Library of Congress Massachusetts Institute of Technology Michigan State University New York University New York Public Library North Carolina Central University North Carolina State University Northwestern University The Ohio State University The Pennsylvania State University Princeton University Purdue University Stanford University Texas A&M University Universidad Complutense de Madrid University of California Berkeley University of California Davis University of California Irvine University of California Los Angeles University of California Merced University of California Riverside University of California San Diego University of California San Francisco University of California Santa Barbara University of California Santa Cruz The University of Chicago University of Illinois University of Illinois at Chicago The University of Iowa University of Maryland University of Michigan University of Minnesota The University of North Carolina at Chapel Hill University of Pennsylvania University of Pittsburgh University of Utah University of Virginia University of Washington University of Wisconsin-Madison Utah State University Yale University Library

5 Governance HathiTrust Executive Committee Strategic Advisory Board Budget/Finances Decision-making Guidance on Policy, Planning

6 Executive Committee Paul Courant, University Librarian and Dean of Libraries, UM Laine Farley, Executive Director, CDL John King, Vice Provost for Academic Information, UM Paula Kaufman, University Librarian and Dean of Libraries, UI Brian Schottlaender, University Librarian, UCSD Ed Van Gemert, Deputy Director of Libraries, UW – Madison (ex officio) Brenda Johnson, Dean of Libraries, IU Brad Wheeler, Chief Information Officer, IU John Wilkin, Executive Director of HathiTrust and Associate University Librarian, LIT, UM

7 Strategic Advisory Board Ed Van Gemert (Chair), Deputy Director of Libraries, UW - Madison John Butler, Associate University Librarian for Information Technology, U Minn Patricia Cruse, Director, Preservation, CDL Bernie Hurley, Director, Library Technologies, UC Berkeley R. Bruce Miller, University Librarian, UC - Merced Sarah Pritchard, University Librarian, Northwestern Paul Soderdahl, Director, LIT, U Iowa John Wilkin, Executive Director, HathiTrust (ex officio) Robert Wolven, Columbia University

8 Working Groups Appointed by Strategic Advisory Board and Executive Committee Both operational and strategically-focused groups Collections, Communications, Discovery Interface, Full-text Search, Usability, User Support Now 40+ people across the country Expertise from across the partnership

9 Staff Staff/Expertise – highly integrated – Project managers, IT and communications staff, copyright experts, administrators – Working groups Shared development space

10 HathiTrust Functional Framework e-Commerce Print on Demand Content Ingest Transformation Validation Content Access PageTurner Collection Builder Large-scale Search Bibliographic Catalog Research Center APIs Quality Assurance Quality Review Content Certification User Services Usability User support (helpdesk) Outreach Project website Monthly newsletter Papers and presentations Communication with potential partners Surveys, general inquiries Repository evaluation and audit (e.g., DRAMBORA, TRAC) Legal Risk management (use of materials) Partner agreements Advocacy Governance Budget, Finances Decision-making Policy Planning Enterprise Management Communication and Coordination with partner institutions Project management Repository Administration Hardware configuration and maintenance Web and application server configuration and maintenance Security Permissions Logging Repository Administration Data management (content storage, backup, integrity checks, deletion) Hardware selection and replacement Content and Metadata specifications Disaster Recovery Processes for ensuring content integrity Rights Management Copyright determination Copyright review Copyright information management (database) Rightsholder permissions Bibliographic Data Management Entity description (record-level) Object identification (item-level) Data availability Collection Development Digital Expansion beyond books and journals (born-digital, images and maps, audio) Selection of content (for non-Google volume ingest and pilot projects) Print Cloud Library (effect of digital on print) Financial contributions of partners

11 What work is there? Usage Reporting Quality Copyright Review Specifications Metadata Development Environment Other?

12 Basic Infrastructure Costs

13 Cost Model 1 Economies of scale keep costs low – $0.149/volume/year for Google-digitized – $0.489/volume/year for IA-digitized – $0.154/volume/year for all content Advantages not fully known until you jump in
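As a rough illustration of the per-volume rates on the Cost Model 1 slide, the sketch below multiplies a flat rate by a volume count. The 8,234,081 figure is the total volume count reported elsewhere in this presentation (as of March 5, 2011); pairing it with the all-content rate is an assumption made here for illustration and is not a budget figure from the slides.

```python
# Per-volume annual rates quoted on the Cost Model 1 slide (USD).
RATES = {
    "google": 0.149,            # Google-digitized volumes
    "internet_archive": 0.489,  # IA-digitized volumes
    "all_content": 0.154,       # blended rate for all content
}


def annual_cost(volumes, rate_per_volume):
    """Return the yearly cost of keeping `volumes` at a flat per-volume rate."""
    if volumes < 0:
        raise ValueError("volumes must not be negative")
    return volumes * rate_per_volume


# Assumption: apply the blended rate to the March 5, 2011 total volume count.
total = annual_cost(8_234_081, RATES["all_content"])
print(f"${total:,.2f} per year")
```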

14 A global change in the library environment June 2010 Median duplication: 31% June 2009 Median duplication: 19% Academic print book collection already substantially duplicated in mass digitized book corpus

15 Digitized Books in Shared Repositories ~75% of mass digitized corpus is backed up in one or more shared print repositories ~3.5M titles ~2.5M

16 Cost Model 2 For public domain volumes: (PD*X*C)/N For a given in-copyright volume: IC=(C*X)/H Share in costs of curation Share in uses of relevant materials Voice in future directions Free riders?
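The two Cost Model 2 formulas can be sketched in code. The slide does not define its symbols, so the readings used here are assumptions: PD is the number of public domain volumes, C the cost per volume, X an overhead multiplier, N the number of partners, and H the number of partners holding a given in-copyright volume in print. Summing the two parts into a single partner fee is also an assumption, and every input value below is hypothetical.

```python
def public_domain_share(pd_volumes, cost_per_volume, overhead, partners):
    """(PD * X * C) / N: public domain costs split evenly across all partners."""
    if partners <= 0:
        raise ValueError("partners must be positive")
    return (pd_volumes * overhead * cost_per_volume) / partners


def in_copyright_share(cost_per_volume, overhead, holders):
    """IC = (C * X) / H: one volume's cost split among the partners holding it."""
    if holders <= 0:
        raise ValueError("holders must be positive")
    return (cost_per_volume * overhead) / holders


def partner_fee(pd_volumes, cost_per_volume, overhead, partners, holder_counts):
    """Assumed total: the public domain share plus one in-copyright share
    for each in-copyright volume the partner holds in print.

    `holder_counts` lists H for each of those volumes.
    """
    pd_part = public_domain_share(pd_volumes, cost_per_volume, overhead, partners)
    ic_part = sum(
        in_copyright_share(cost_per_volume, overhead, h) for h in holder_counts
    )
    return pd_part + ic_part


# Hypothetical inputs, chosen only to exercise the formulas.
fee = partner_fee(
    pd_volumes=2_000_000,
    cost_per_volume=0.15,
    overhead=1.0,
    partners=50,
    holder_counts=[1, 2, 5, 10],
)
print(round(fee, 2))
```

Under this reading, a volume held by many partners costs each of them less, which is why the model needs reliable print holdings data.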

17 Sustaining common resource Costs go down Quality of services increases – Realize in an aggregated collection something you don't get through distributed search or federation

18 Cost Model 2: Timeline & Requirements Timeline: – Implement in 2013 – Accept new partners now with costs based on overlap calculations Requirements: – Print holdings database – Update mechanisms – Manual remediation
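An overlap calculation of the kind mentioned on this slide could be sketched as a set intersection between a partner's print holdings and the titles already in the repository. This is only an illustration: the slide does not say how overlap is computed, and the use of shared identifiers such as OCLC numbers as matching keys is an assumption. The identifiers below are made up.

```python
def overlap(partner_holdings, repository_titles):
    """Return the shared identifiers and the fraction of the partner's
    print holdings that also appear in the repository."""
    shared = set(partner_holdings) & set(repository_titles)
    if not partner_holdings:
        return shared, 0.0
    return shared, len(shared) / len(set(partner_holdings))


# Hypothetical identifiers standing in for matched bibliographic keys.
partner = {"ocm0001", "ocm0002", "ocm0003", "ocm0004"}
repository = {"ocm0002", "ocm0003", "ocm9999"}

shared, fraction = overlap(partner, repository)
print(sorted(shared), f"{fraction:.0%}")
```

Real holdings data rarely matches this cleanly, which is presumably why the requirements list update mechanisms and manual remediation.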

19 Print Holdings Database Print holdings database will also benefit – De-duplication Compromises user experience, obscures collection development needs – Management of print volumes Information to withdraw volumes (journals) – Legal uses of copyright materials Section 108, 121, ADA uses will depend on knowledge of which institutions own(ed) which materials

20 Questions?

21 Our HathiTrust Collection

22 Content Distribution 8,234,081 – Total volumes 2,102,033 – Public Domain 4,527,381 Book titles 202,649 Serial titles * As of March 5, 2011

23 Language Distribution (1) * As of March 5, 2011 The top 10 languages make up ~86% of all content

24 Language Distribution (2) The next 40 languages make up ~13% of total * As of March 5, 2011

25 Dates * As of March 5, 2011

26 Originating Institution * As of March 5, 2011

27 Content over time * As of March 5, 2011

28 Content Growth

29 Collection Development and Management Collections Committee Appropriate principles for duplicate volumes Print management proposal Prioritization of collection development activities Process for decision-making and prioritization for new content types Recommendations for tools and services Prioritization of copyright review and rights-clearing processes

30 What about quality? Validation upon ingest Gating on metrics from Google Updated versions from Google Proactive work by Google library partners IMLS grant to develop framework and methodology for validating content in large-scale digital repositories Crowd sourcing in our future?

31 Questions?

32 Perspectives on HathiTrust and public services

33 HathiTrust and Reference HathiTrust: like Google and licensed databases – very large, rich repositories of content, with services supporting their use Reference librarians – are intermediaries between all these resources and researchers who use them

34 HathiTrust as a Reference Source HathiTrust is CONSTANTLY changing A requirement that's not new to reference librarians, but greatly increased: Stay engaged. Read updates. Use it.

35 HathiTrust is DIFFERENT We are THE PRODUCERS of this resource – HathiTrust is OUR COLLECTION – New role - not recipient/grader/purchaser – WE build this resource Close engagement of sort we have not experienced before

36 HathiTrust and Google Books Fact: content in HathiTrust, by the numbers, is currently largely a subset of Google Books That's how we started BUT it's just the start

37 HathiTrust stands on its own - Content HathiTrust content has been curated over time by librarians – Mirrors collections of large research libraries – Focus on quality Expanding Non-Google content – Public Domain: Copyright Review Management System – Content from non-Google sources Internet Archive, image collections, government publications

38 Copyright Review Management System – IMLS Grant awarded to University of Michigan in 2008 to determine copyright status of books published in US between 1923 and 1963 – Wisconsin, Minnesota and Indiana each devote 1 FTE to this effort for Phase 3 – As of March 2011, over 125,000 volumes reviewed; 54% opened up in HathiTrust

39 HathiTrust stands on its own - Functionality HathiTrust supports scholarship Proper metadata User interface designed for scholarly work Services for people with visual impairments Large-scale text mining

40 HathiTrust stands on its own - Services Collection builder Member services (via Shibboleth logons) – download full PDFs – create permanent collections

41 How do people use HathiTrust? Of course, to read public domain books and journals But much more

42 Use stories I now go to HathiTrust as my first destination for in-depth reference questions. Fantastic searchable corpus; good metadata; content and functionality designed for scholarly needs. Indiana University librarian

43 Use stories (2) Complete Works of Voltaire (52-volume set published in late 19th century) – scholar needed all volumes to do scholarly referencing from home – all in HathiTrust presented together under a single MARC record

44 Use stories (3) Open Folklore – a new way to use HathiTrust – Portal that provides access to open access published and unpublished folklore literature – Indiana University's Folklore Collection first CIC Collection of Distinction in Google – HathiTrust – the corner store in the shopping mall of digital repositories – Anchor for whole set of services and initiatives, including journal liberation projects

45 Questions?

46 Leveraging HathiTrust data

47 A bibliographic metadata moment Bib data for each digital volume must be present in HathiTrust in order for volumes to be ingested Depositors make bib data available to UM to be loaded into HathiTrust bibliographic management system Info in the submitted bib records is used to make an initial rights determination about each volume The bib record acts as a manifest for the digital content that is then ingested A snapshot in time of the bib data associated with an object is also stored in the preservation metadata

48 HathiTrust makes our data available Goal is to extend possibilities for development of local services and other uses Bibliographic API Data API OAI feed of public domain Hathifiles 120,000 public domain texts for computational research
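For partners building local services on the Bibliographic API, a lookup might be sketched as follows. The URL pattern and the response field names (`items`, `htid`, `usRightsString`) are assumptions based on the publicly documented API and should be checked against current HathiTrust documentation. The sample response is hand-written with made-up identifiers, and the sketch makes no network call.

```python
import json

# Assumed base URL for the Bibliographic API; verify before relying on it.
BIB_API_BASE = "https://catalog.hathitrust.org/api/volumes"


def bib_api_url(id_type, id_value, detail="brief"):
    """Build a lookup URL for one identifier (e.g. id_type 'oclc' or 'isbn')."""
    if detail not in ("brief", "full"):
        raise ValueError("detail must be 'brief' or 'full'")
    return f"{BIB_API_BASE}/{detail}/{id_type}/{id_value}.json"


def full_view_items(response):
    """Return the volume IDs of items marked as full view in a parsed response."""
    return [
        item["htid"]
        for item in response.get("items", [])
        if item.get("usRightsString") == "Full view"
    ]


# Hand-written sample in the assumed response shape; IDs are hypothetical.
SAMPLE = json.loads("""
{
  "records": {"000000001": {"titles": ["An example title"]}},
  "items": [
    {"htid": "xyz.0001", "fromRecord": "000000001", "usRightsString": "Full view"},
    {"htid": "xyz.0002", "fromRecord": "000000001", "usRightsString": "Limited (search-only)"}
  ]
}
""")

print(bib_api_url("oclc", "424023"))
print(full_view_items(SAMPLE))
```

A catalog or link resolver could use a check like `full_view_items` to decide whether to show a link to the digitized volume next to a print record.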

49 Some examples of use Catalogs UM loaded every record Chicago links to public domain volumes owned in print OCLC loaded records into WorldCat Link resolvers UC created SFX target Vendors H.W. Wilson databases linked to public domain volumes Needed: A guide with examples of how partners have used the data!

50 Future Directions (1) Locally-digitized partner content Usage reporting Coordinate digital and print resources (holdings database) Computational Research Quality Strategies for openness Collaborative Development Extending Services through Shibboleth Non-book, non-journal content

51 Future Directions (2) Born-digital content (Publishing) New Bibliographic Management Compliance with TRAC Grant projects OCLC Catalog 3-year review Improvements to Large-scale Search Improvements to PageTurner Ingest Reporting

52 How can HathiTrust make a difference? Digital Curation – Drive costs down – Reduce bibliographic indeterminacy – Make meaningful decisions about formats and quality – Increase discoverability – Consolidate development talent – Improve strength of archiving Print Curation – Means to associate our print holdings – Coordinated record-keeping Subsidiary benefits – Quantify problems – Collective attention to solving shared problems

53 How to find out more Web site About section: Twitter: RSS: Monthly newsletter: Contact us: Soon: Facebook, blog
