HathiTrust: A Big Idea with Bold Plans

1 HathiTrust: A Big Idea with Bold Plans
Brenda Johnson, Dean of University Libraries
Gary Charbonneau, Systems Librarian
Julie Bobay, Associate Dean for Collection Development and Scholarly Communication
Statewide IT Conference, Indiana University, Sept. 27, 2010

2 HathiTrust - Outline
A Big Idea
  Mission and Goals; Partners; Governance
  Content and Use
  Relationship to Google Books and Internet Archive
  Size, characteristics of content
  A few words about technology
Bold Plans

3 Importance of A Name
Hathi (pronounced hah-tee): Hindi word for elephant, an animal highly regarded for its memory, wisdom, and strength
Trust: A core value of research libraries and one of their greatest assets
In combination, the words convey the key benefits researchers can expect from a first-of-its-kind shared digital repository
There’s an elephant in the library.

4 What is HathiTrust?
Started in 2008 as a partnership among research libraries, HathiTrust is an open web resource that aggregates, preserves and provides access to the collections of member libraries.
Initial purpose was to provide trusted shared repository for books and journals digitized by and available through Google Books and Internet Archive

5 Google Books/Internet Archive
In 2004, Google began digitizing the books and journals from many major research libraries in U.S. – including, starting in 2008, IU’s
Some libraries, including the University of California, had similar digitization projects with the Internet Archive
Books and journals digitized from these projects were deposited in HathiTrust

6 Current HathiTrust Partners: 29 and Counting
Columbia University
Dartmouth College
University of California system (11 libraries)
CIC (Committee on Institutional Cooperation) (12 libraries)
  University of Chicago
  University of Illinois
  Indiana University
  University of Iowa
  University of Michigan
  Michigan State University
  University of Minnesota
  Northwestern University
  Ohio State University
  Pennsylvania State University
  Purdue University
  University of Wisconsin, Madison
New York Public Library
Princeton University
University of Virginia
Yale University

7 If Google and Internet Archive have these books, why do we need HathiTrust?
HathiTrust’s mission is much broader than simply to replicate Google Books:
Contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge.

8 Why do we need HathiTrust? (1)
Preservation…For The Long Term
Better entrusted to research libraries than to a private corporation, even a benevolent one
Not just preserving bits
Full preservation program, including active curation, metadata, migration, management plans, etc.
Seeking TRAC Certification (Trustworthy Repositories Audit and Certification)

9 Why do we need HathiTrust? (2)
Expanded access and discoverability
Full-text access to pre-1923 books and journals, plus those which have had rights cleared
Beyond full-text keyword search: enhanced discoverability options

10 Why do we need HathiTrust? (3)
Focus on scholarly values and needs
Develop content, access and functionality that meets needs of researchers
Share expertise and cost of preserving and providing access to scholarly record among institutions who share this fundamental mission

11 HathiTrust: Getting Started
Initial development responsibility: University of Michigan, with mirror site at IUPUI, administered by UITS Enterprise Infrastructure
Much future development will be distributed among partner institutions under direction of HathiTrust Executive Committee

12 A Unique Partnership
HathiTrust is library work at scale; an early example of an “above-campus” service
A new experiment in collaboration
Not a separate entity; not a 501(c)(3) like Sakai, Kuali, DuraSpace or many open source software projects
Instead, a jointly-funded, jointly governed, jointly developed partnership.
Together, we are HathiTrust.

13 Sustainability: HathiTrust Governance 2008-2012
Executive Committee: Budget, finances, decision making
Strategic Advisory Board: Guidance on policy and planning
HathiTrust staff
Working groups and committees

14 Current Working Groups
Discovery Interface
Collections
Quality
Communication
Usability
Storage
Development Environment
Research Center

15 HathiTrust Functional Framework
Governance: Budget, Finances; Decision-making; Policy; Planning
Enterprise Management: Communication and Coordination with partner institutions; Project management
Repository Administration: Hardware configuration and maintenance; Web and application server configuration and maintenance; Security; Permissions; Logging; Data management (content storage, backup, integrity checks, deletion); Hardware selection and replacement; Content and Metadata specifications; Disaster Recovery; Processes for ensuring content integrity
Rights Management: Copyright determination; Copyright review; Copyright information management (database); Rightsholder permissions
Bibliographic Data Management: Entity description (record-level); Object identification (item-level); Data availability
Collection Development: Digital; Expansion beyond books and journals (born-digital, images and maps, audio); Selection of content (for non-Google volume ingest and pilot projects); Print; Cloud Library (effect of digital on print)
e-Commerce: Print on Demand
Content Ingest: Transformation; Validation
Content Access: PageTurner; Collection Builder; Large-scale Search; Bibliographic Catalog; Research Center; APIs
Quality Assurance: Quality Review; Content Certification
User Services: Usability; User support (helpdesk)
Outreach: Project website; Monthly newsletter; Papers and presentations; Communication with potential partners; Surveys, general inquiries
Repository evaluation and audit (e.g., DRAMBORA, TRAC)
Legal: Risk management (use of materials); Partner agreements; Advocacy
Financial contributions of partners

16 Next steps in governance
5-year agreements, reviewed in the third year of every term
First Constitutional Convention will be in 2012
Partners will determine governance structures and partnership models, effective 2013

17 Focus On Users
Preservation…with access
Benefits to IU researchers and their colleagues around the world:
  Ensure long-term preservation and access
  Increase discoverability
  Create scholarly tools
  Expand content beyond Google and Internet Archive

18 HathiTrust – constantly changing
Rapid growth and development; fluid environment
Next few slides describe HathiTrust currently
Will follow with discussion about future plans

19 HathiTrust - Content
The vast majority of what is currently in HathiTrust consists of files received from Google from volumes digitized by Google for Google Book Search
Almost all of the remainder consists of files received from Internet Archive. Much of the content from University of California comes by way of Internet Archive

20 HathiTrust Content (2)
Since not all of Google’s “library partners” are members of HathiTrust, and none of Google’s publisher partners are, HathiTrust is still (mostly) a subset of what is in Google Book Search.
However….

21 HathiTrust Content (3)
Because of HathiTrust’s copyright clearance project, there are some things available in full text in HathiTrust that are only available in “snippet view” in Google.
Because of Internet Archive, there are probably some things in HathiTrust that are not available in Google at all.

22 HathiTrust - focus on collections
HathiTrust is about collections, not simply Google digitization
For example:
  access for persons with print disabilities
  opening access for public domain volumes
  collection building tool
  high-quality bibliographic data necessary for scholarly work

23 Content Growth

24 Content Distribution

25 Language Distribution (1)

26 Language Distribution (2)

27 Dates

28 Originating Institution

29 Content Over Time

34 HathiTrust DataGrid Using Isilon Clustered Storage System
Similar principles to a datagrid using WAFS (OneFS)
Wide Area File System (2.3 PB per file system)
Automated data replication among nodes
Currently Two Nodes
  Ann Arbor - University of Michigan
  Indianapolis – Indiana University NOC
Connected via I-Light and Michigan Lambda Rail

35 HathiTrust Grid
Indianapolis; Ann Arbor
Isilon OneFS Currently Supports up to 2.3 PB between Two Nodes
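
The slides mention automated replication between the Ann Arbor and Indianapolis nodes and, elsewhere, integrity checks as part of data management. As a minimal sketch of what an independent replica check could look like, the Python below compares two directory trees by SHA-256 digest. It is an illustration only, not HathiTrust's or Isilon's actual mechanism: the function names `sha256_of` and `compare_replicas`, the choice of SHA-256, and the idea that both replicas are visible as local directories are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_replicas(primary: Path, mirror: Path) -> dict[str, list[str]]:
    """Report files missing from either tree, or whose contents differ.

    Hypothetical helper: assumes both replicas are mounted as directories.
    """
    primary_files = {
        p.relative_to(primary).as_posix() for p in primary.rglob("*") if p.is_file()
    }
    mirror_files = {
        p.relative_to(mirror).as_posix() for p in mirror.rglob("*") if p.is_file()
    }
    report = {
        "missing_on_mirror": sorted(primary_files - mirror_files),
        "missing_on_primary": sorted(mirror_files - primary_files),
        "mismatched": [],
    }
    # Only files present on both sides need a content comparison.
    for rel in sorted(primary_files & mirror_files):
        if sha256_of(primary / rel) != sha256_of(mirror / rel):
            report["mismatched"].append(rel)
    return report
```

Calling `compare_replicas(Path("/primary"), Path("/mirror"))` with two placeholder paths would return three lists of relative file names. In a real preservation system the digests would more likely be stored at ingest time and rechecked on a schedule rather than recomputed on both sides for every comparison.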

36 More on HathiTrust Technology

37 A Use Case
IUB scholar needed quick access to a definitive 52-volume set of Voltaire’s work published in late 1800s; deadline approaching
Had been transferred to the Auxiliary Library Facility
Available in HathiTrust and Google Books
Google Books not usable for this scholarly purpose
Able to do work much more efficiently and quickly in HathiTrust

38 HathiTrust’s Bold Plans
We believe the HathiTrust of tomorrow will look very different from the HathiTrust of today
Google and Internet Archive digitized volumes just the beginning
The sky’s the limit (or, more accurately, the combined will and resources of the partnership are the limit)

39 Vision for the future: More Content
Current and backlist scholarly monographs
Born-digital materials
Some locally-digitized collections
Some non-book/non-journal resources
…anything that is appropriate for a research library collection AND IS A SHARED PRIORITY FOR PARTNERS

40 Vision for the future: More Content (2)
More full-text:
  Google Book Settlement - if approved:
    could receive all Google-digitized files to preserve
    could make much more full-text available
  Rights-clearing project - open access to public domain materials

41 Vision for the Future: More Functionality
Research tools
Computational research
Advanced collection builders
Advanced discovery
Expanded quality processes
Rigorous preservation guarantees
Defining paths for fair uses
Tools for shared print collection management

42 Vision for the Future: Enhanced Discoverability
Not just keyword searching of full-text
Highly-functional bibliographic access
HathiTrust catalog
Integration into other discovery tools: IUCAT, WorldCat, Discovery Services

43 HathiTrust and local digital library initiatives
HathiTrust is a solution for large-scale, shared high-priority needs of partners; currently optimized for digitized monographs and journals
Partners will identify priorities for content and functionality development
HathiTrust will not supplant all institutionally-based digital library initiatives
Local digital library collections and services will still be needed

44 How Can HathiTrust Make a Difference?
Future not yet known precisely, but…
For the first time in history, HathiTrust has:
  defined a large-scale partnership to achieve a large-scale goal
  built the first version of a very large, high-quality shared repository
Building blocks to ensuring that research collections, print and digital:
  are preserved, curated, highly discoverable and accessible
  retain their research value in a digital platform

45 Some lessons learned so far
HathiTrust can serve as shared repository for mass digitized library collections
HathiTrust can provide organizational structure for other collaborations
  Shared print collection management
  Bibliographic integration
The research library community is able to collaborate deeply to attain shared goals

46 HathiTrust Mission - redux
Contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge.

47 Credits
Our thanks to colleagues who generously granted us permission to use their slides for this presentation:
  John Wilkin, HathiTrust Executive Director
  Jeremy York, HathiTrust Project Librarian
  Heather Christenson, Mass Digitization Project Manager, California Digital Library
Also, many of the ideas for this presentation based on:
  Courant, Paul N. and John Wilkin. “Building ‘Above Campus’ Library Services.” Educause Review, July/August 2010,

