Archiving Electronic Journals: A Developmental Approach Eileen Fenton The JISC/CNI Meeting, July 2004.

2 Overview
–Preservation in a time of transition
–Organizational context for preservation
–Components of a trusted archive and emerging roles
–Overview of the Electronic-Archiving Initiative

3 Preservation: In Transition Libraries serve an important two-fold mission: –They provide the information resources necessary for their local community. –They are the traditional preservers of the scholarly record. In order to meet the information needs of the local community, a library must hold and preserve a local copy, within the confines of an extensive, expensive infrastructure. Preservation and access are tightly linked.

4 Preservation: In Transition For electronic resources, ownership is not required in order to meet local information needs. Ownership, preservation, and access are no longer linked. This shift has enormous implications for the preservation of electronic resources.

5 Preservation: In Transition First, there is no longer a natural motivation to build an infrastructure to ensure the long-term preservation of and access to electronic resources. Second, it is less clear who is responsible for fulfilling the preservation role. Third, new models (technical and organizational) and new infrastructure are needed.

6 The JSTOR Context JSTOR is a not-for-profit organization with a mission to help the scholarly community take advantage of advances in information technologies. JSTOR has pursued this mission through the creation and maintenance of a trusted digital archive of the full back runs of academic journals. To date JSTOR serves as a trusted digital archive of over 400 journals from more than 38 disciplines. –Over 15 million pages have been digitized. –JSTOR is supported by more than 2,000 participating libraries from 80+ countries.

7 The JSTOR Context JSTOR's commitment to serve as an archive is format neutral. From the inception of JSTOR, the inclusion of e-versions of journals was anticipated. JSTOR launched the Electronic-Archiving Initiative, or E-Archive, in response to the challenge of archiving e-journals. JSTOR approaches this challenge with a system-wide perspective, seeking to reduce costs and improve convenience for all participants in the scholarly communication cycle.

8 The JSTOR Context It is clear that archiving electronic resources will require a significant investment in the development of organizational and technological infrastructure. Maximum system-wide benefit from the investment in this infrastructure will be achieved by archiving a broad array of content that extends well beyond JSTOR's current collection scope and mission. A new entity is needed. Launching new organizations is beyond the scope of JSTOR's mission.

9 Mission Ithaka has been founded to accelerate the creation, development, and success of not-for-profit organizations focused on deploying new technologies for the benefit of higher education. It brings together: –Financial resources from (initially) three foundations (Mellon, Hewlett, Niarchos) –The experience derived from the creation of JSTOR, including a conviction that organizations such as JSTOR can contribute enormous value to the scholarly community –Relationships in all sectors and at all levels of the higher education community (developed at the sponsoring foundations and through JSTOR)

10 The Electronic-Archiving Initiative The mission of the Electronic-Archiving Initiative is to preserve scholarly literature published in electronic form and to ensure that these materials remain available to future generations of scholars, researchers, and students. E-Archive expects to take responsibility for archiving a broad range of scholarly e-journals and journal-like resources. JSTOR, Ithaka, and The Andrew W. Mellon Foundation are together supporting the development of E-Archive.

11 Components of a Trusted Archive 1. Mission –Mission is critical because it drives resource allocation and routine organizational priorities and activities. 2. Business Model –Sustainability is key. –The archive must generate funds adequate to cover the work of the archive from sufficiently diversified sources. –Together the community will need to find a way to develop and sustain an archiving capacity. Libraries and publishers will need to contribute. Foundations and government agencies may also have a role.

12 Components of a Trusted Archive 3. Technical Infrastructure –An infrastructure must be developed that supports, with sufficient redundancy, the key functions of the archive (ingest, verification, storage, delivery, migration). 4. Relations with Libraries –The archive must meet the needs of the library community and the scholars it serves. –Libraries and archives have an opportunity to work together to ensure that content is preserved in a way that fulfills the needs of scholars. 5. Relations with Content Producers –The archive must secure the rights necessary to the archival task and must arrange for timely, secure deposit of content. –Publishers and archives have an opportunity to work together to create archivable content.

13 E-Archive Approach Source File archive: E-Archive will seek to preserve the source files that make up publishers' e-journals. –This approach captures some content which is not presented online (e.g., higher resolution graphics). –This approach makes it very difficult to capture certain elements such as dynamic advertisements and editorial information.

14 E-Archive Areas of Activity 1. Define an archival service. 2. Develop a business model which ensures the short-, mid-, and long-term sustainability of the archive. 3. Design and build technological infrastructure and develop content processing protocols and tools. 4. Research the economic impact of electronic resources on operations costs for libraries and content producers.

15 Activities to Date: Define an Archival Service Engaged libraries in discussions of e-archiving needs and challenges. –Emerging themes: There is a widespread desire for a trusted solution to the e-archiving need. This is true for academic libraries of all sizes. Regardless of institution size, librarians believe it is important for their own institution to contribute to the solution of this problem. Librarians recognize that e-archiving raises complex technical and business issues. Librarians are concerned about perpetual access to materials that have been bought and paid for.

16 Activities to Date: Define an Archival Service Ten publishers are participating in the pilot, developmental phase:
–Association for Computing Machinery
–American Economic Association
–American Mathematical Society
–American Political Science Association
–Blackwell Publishing
–Ecological Society of America
–National Academy of Sciences
–The Royal Society
–University of Chicago
–John Wiley & Sons, Inc.

17 Activities to Date: Define an Archival Service Gathered publishers' perspectives on the e-archiving challenge. Emerging themes: –Establishing a trusted archival arrangement is an emerging best practice for leading publishers. –Multiple archival arrangements are being contemplated by some publishers. –An archive helps scholarly societies to maintain flexibility in publishing relationships. –An archive provides a practical way to fulfill the perpetual access clauses found in many content licenses. –An archive eliminates the need for the publisher to store older materials indefinitely, thereby freeing resources for enhancing current publications.

18 Activities to Date: Define an Archival Service Archival Service features: –Archive a publisher's full complement of scholarly journals. Seek payment from publishers for this service. –Libraries also support the work of the archive and in return can access the archive. This access is provided in order to allow supporters of the archive to see that the content is safely held in the archive. –Access to the archive is in accordance with a very JSTOR-like moving wall. The archive also provides access as needed to address perpetual access concerns.

19 Activities to Date: Develop Business Model Assumptions: –Those parties who benefit from an archive will help to pay for it. Libraries and publishers are the key beneficiaries. –A diversified revenue stream is important. Ideally the archive will be able to cover its costs via contributions from publishers, libraries, and possibly foundation and governmental sources. –An archive must provide enough access to its materials to enable those who rely on the archive to know that the content is safe and well cared for. A completely dark archive is not satisfactory. Activities: Working to assess costs and establish pricing.

20 Activities to Date: Technical Infrastructure Analyzed and processed sample e-journal source file data. Created prototype archive; production-level archive now in development. Developed tools for normalization and verification of archived content. Developing quality control routines and targets. Participating in a number of efforts focused on related issues: –Digital Library Federation Global File Format Registry –OCLC/RLG Preservation Metadata Framework Work Group (PREMIS) –Harvard/NLM Archival/Interchange DTD Advisory Group –Sponsored development of the JSTOR/Harvard Object Validation Environment (JHOVE)
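The verification step mentioned on this slide can be illustrated with a fixity check, a common way for an archive to confirm that a stored file is unchanged since ingest. What follows is a minimal sketch, not a description of E-Archive's actual tools: it assumes a SHA-256 digest was recorded for each source file at ingest, and the function names `sha256_of` and `verify_fixity` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large source files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded_digest: str) -> bool:
    """Compare the file's current digest with the one recorded at ingest.

    Assumption: recorded_digest is a SHA-256 hex string (hypothetical
    ingest metadata; the slides do not specify a checksum scheme).
    """
    return sha256_of(path) == recorded_digest.strip().lower()
```

A periodic job could call `verify_fixity` for every stored file and flag any mismatch for repair from a redundant copy.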

21 Activities to Date: Research Working with Ithaka's research unit, completed a study investigating the non-subscription costs to libraries for print and electronic periodicals. Working with a consultant to design a similar study involving publishers.

22 Current Focus
–Finalize business model
–Complete work on production-level archival repository
–Secure support from publisher and library communities

23 The Electronic-Archiving Initiative Eileen Gifford Fenton

