Open Access Conference, Pretoria, July 2004 Wouter Klapwijk, Univ. of Stellenbosch The LOCKSS Project: an overview Open Access Conference, Pretoria, July.

1 The LOCKSS Project: an overview. Open Access Conference, Pretoria, July 2004. Wouter Klapwijk, Univ. of Stellenbosch

2 Overview 1. What is LOCKSS? 2. Why use LOCKSS? 3. How does LOCKSS work? 4. Stellenbosch-SA LOCKSS project

3 WHAT IS LOCKSS?

4 Background. LOCKSS program initiated by Stanford University Libraries. Software under development since 1999. Development funded by Mellon Foundation and NSF. First beta version released in 2001, with Stellenbosch as a beta testing site. Production version released in April 2004 as Open Source (http://www.sourceforge.com)

5 Technological view: LOCKSS defined as a Persistent cache LOCKSS creates low-cost persistent digital “caches” of e-journal content at institutions that (a) subscribe to that content and (b) actively choose to preserve it. Enables institutions to locally collect, store, preserve and archive the (authorized) content. Unlike normal caches, pages in a LOCKSS cache are never flushed. LOCKSS system loads itself with newly published content before the first local user seeks it.
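A minimal sketch of the persistent-cache idea described on this slide, not the LOCKSS implementation itself. The class name `PersistentCache`, the `fake_fetch` helper and the example URL are hypothetical; the point is only that entries are preloaded before a reader asks for them and are never evicted.

```python
class PersistentCache:
    """Toy cache that preloads content and never flushes it."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable url -> bytes (hypothetical fetcher)
        self._store = {}      # url -> content; entries are never removed

    def preload(self, urls):
        """Collect newly published content before any reader asks for it."""
        for url in urls:
            if url not in self._store:
                self._store[url] = self._fetch(url)

    def get(self, url):
        """Serve preserved content; collect and keep it if not yet held."""
        if url not in self._store:
            self._store[url] = self._fetch(url)
        return self._store[url]


def fake_fetch(url):
    # Stand-in for an HTTP request to the publisher (assumption for the demo).
    return ("content of " + url).encode()


TOC_URL = "http://publisher.example/vol1/toc"  # hypothetical URL
cache = PersistentCache(fake_fetch)
cache.preload([TOC_URL])
print(cache.get(TOC_URL))
```

Note that there is deliberately no eviction method: that is the difference from an ordinary web cache that the slide points out.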

6 Accessing cached content. Preserved content remains accessible at the original publisher’s URL. Links, bookmarks and searches through indexing and abstracting (I+A) databases resolve either to the publisher’s site or to the locally cached content. Techniques used to access content at the publisher also work to find the preserved content.

7 So, let’s clarify what it’s all about by asking ourselves: “WHY USE LOCKSS?”

8 Paper Library System. For centuries libraries and publishers had stable roles: publishers produced information and libraries kept it safe for reader access. Librarians’ defence against irreplaceable loss has always rested on redundancy. “One library burns but only one of many copies of a work is destroyed.” A cooperative, affordable, decentralized ‘archive system’ with LOTS OF COPIES

9 Going electronic from a library perspective. Libraries are continuing with paper subscriptions due to the absence of sustainable digital archiving solutions. As a condition of moving towards electronic content, publishers must guarantee long-term access, but only some large publishers can. Library acquisition funds are insufficient to purchase both formats. Libraries only pay for access, not ownership.

10 Going electronic from a publisher perspective. Publishers do not currently guarantee perpetual access to their materials. Publishers are reluctant to place their publishing platforms at risk. Publishers might regard archiving as a responsibility of the librarian.

11 Still going electronic (from an accessibility perspective) A unilateral change of policy by the publisher may cause access to a title to cease completely. Failure to renew a subscription can remove a library’s electronic access to past material with no recourse. Governmental policies? Internet unavailable.

12 LOCKSS. The LOCKSS model capitalizes on the traditional roles of libraries and publishers. Libraries should retain the custodial role of preserving scholarly information. Publishers participate by permitting libraries to collect material as published for preservation (a so-called “Publisher manifest”). Effected by utilizing LOCKSS as a persistent access preservation system. A cooperative, affordable, decentralized ‘archive system’ with LOTS OF COPIES

13 “Publisher manifest” A Publisher manifest is a web page that lists a title’s top level URLs / volume and grants LOCKSS permission to collect and preserve the content. Each volume of a title needs a publisher manifest. Publishers permit libraries to use material preserved in caches consistent with original license terms. Caches provide content only to the original authorized and authenticated subscriber base. For paid e-journals, a library must participate at point of subscription or renewal to benefit from the system.
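As an illustration of what reading a publisher manifest could look like, the sketch below parses a manifest page, checks for a permission statement and lists the linked start URLs. The exact permission wording in `PERMISSION_PHRASE`, the function name `read_manifest` and the sample page are assumptions made for the example and should be checked against a real manifest.

```python
from html.parser import HTMLParser

# Assumed wording of the permission statement; verify against a real manifest.
PERMISSION_PHRASE = "LOCKSS system has permission to collect, preserve, and serve"


class ManifestParser(HTMLParser):
    """Collects link targets and visible text from a manifest page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text.append(data)


def read_manifest(html):
    """Return (permission_granted, start_urls) for one manifest page."""
    parser = ManifestParser()
    parser.feed(html)
    parser.close()
    page_text = " ".join(" ".join(parser.text).split())  # normalise whitespace
    granted = PERMISSION_PHRASE.lower() in page_text.lower()
    return granted, parser.links


# Hypothetical manifest for one volume of one title.
SAMPLE = """<html><body>
<h1>Example Journal, Volume 12</h1>
<a href="http://publisher.example/v12/issue1/">Issue 1</a>
<a href="http://publisher.example/v12/issue2/">Issue 2</a>
<p>LOCKSS system has permission to collect, preserve, and serve this Archival Unit.</p>
</body></html>"""

permitted, start_urls = read_manifest(SAMPLE)
print(permitted, start_urls)
```

A cache following this sketch would only start collecting from `start_urls` when `permitted` is true, which mirrors the slide's point that each volume needs its own manifest.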

14 What to Collect and Preserve?
E-Journals
– Titles you’ve paid for and are leasing
– Freely available titles
Other genres
– Newspapers, Gov Docs
Criteria: HTTP delivered, serial, stable URLs, authoritative version

15 6 free-access publishers: Absinthe Literary Review, Be (Berkeley Electronic) Press, Cultural Logic, Early Modern Literary Studies, Open Journal System, Other Voices. 2 subscription-based publishers: Project MUSE, HighWire Press

16 How does LOCKSS work?

17 LOCKSS Caches
Collect HTTP delivered content
– “Crawls” publisher sites in the same way a search engine does.
– All formats (PDF, HTML, JPEG, TIF, Audio, Video)
Preserve content integrity
– Independent collection
– Cooperate to audit and repair damage by means of polling and voting (“reputation based system”)
Provide access (i.e. serve content)
– Via web browser
– Utilizing EZproxy configurations or a PAC file.
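The audit-and-repair step can be sketched as a simple majority vote over content hashes. This is a deliberately simplified stand-in: the actual LOCKSS polling protocol is more elaborate (the slide mentions a reputation-based system), and the function name `poll_and_repair`, the cache names and the URL below are hypothetical.

```python
import hashlib
from collections import Counter


def digest(content):
    """Hash a cached copy so caches can compare content without sending it."""
    return hashlib.sha256(content).hexdigest()


def poll_and_repair(caches, url):
    """caches maps cache name -> {url: bytes}. Repairs losers in place.

    Returns the names of the caches that were repaired.
    """
    votes = {name: digest(store[url]) for name, store in caches.items()}
    winner, count = Counter(votes.values()).most_common(1)[0]
    if count <= len(caches) // 2:
        return []  # no strict majority: leave every copy untouched
    good = next(name for name, d in votes.items() if d == winner)
    repaired = []
    for name, d in votes.items():
        if d != winner:
            caches[name][url] = caches[good][url]  # repair from an agreeing peer
            repaired.append(name)
    return repaired


URL = "http://publisher.example/v12/issue1/art3.pdf"  # hypothetical URL
caches = {
    "lib_a": {URL: b"article bytes"},
    "lib_b": {URL: b"article bytes"},
    "lib_c": {URL: b"article bytEs"},  # one damaged copy
}
repaired = poll_and_repair(caches, URL)
print(repaired)
```

Because each cache collected its copy independently, agreement among a majority is evidence that their copy is the intact one; the sketch refuses to repair anything when no strict majority exists.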

18 LOCKSS machines. [Figure: approximate data flows]

19 LOCKSS machines (proxy servers) prevent the publisher from revoking access rights to back content. [Figure: approximate data flows]

20 Look and Feel to Readers. Configure LOCKSS as a web proxy. Example: PNAS Online table of contents page (9/11/02), from the web and from the LOCKSS cache.

23 Distributed Repository Model. Technology: uses many “unreliable repositories” (PCs); robustness through redundancy; inexpensive consumer hardware; low sys admin overhead (less than 1 hour/month). Leverages web technology: HTTP delivered and displayed content, all formats; no need to replicate publisher’s system. No single point of failure.
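A back-of-the-envelope illustration of "robustness through redundancy": if each cache loses its copy independently with probability p, the chance that all n copies are lost is p to the power n. Both the independence assumption and the 10% figure used below are assumptions chosen for the example, not numbers from the presentation.

```python
def prob_all_lost(p_single, n_copies):
    """Probability that every one of n independent copies is lost."""
    return p_single ** n_copies


# Hypothetical per-cache loss probability of 10% over some period.
for n in (1, 3, 6):
    print(n, prob_all_lost(0.1, n))
```

Under these assumptions each extra cache cuts the risk of total loss by a further factor of ten, which is why many cheap, individually unreliable PCs can be more robust than one well-managed server.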

24 Collection Access: LOCKSS and Local Networks, publisher is available. [Diagram labels: LOCKSS, PUB, PAC File or Proxy]

25 Collection Access: LOCKSS and Local Networks, publisher is unavailable. [Diagram labels: LOCKSS, PUB, PAC File or Proxy]
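The two access cases (publisher available, publisher unavailable) can be sketched as a fallback rule inside the proxy. This is an assumed simplification of the behaviour the slides describe; the function name `serve`, the fake fetchers and the URL are hypothetical.

```python
def serve(url, fetch_from_publisher, cache):
    """Return (source, content): the publisher's copy if reachable,
    otherwise the locally preserved copy."""
    try:
        return "publisher", fetch_from_publisher(url)
    except OSError:
        if url in cache:
            return "lockss-cache", cache[url]
        raise  # nothing preserved for this URL: surface the original error


def publisher_up(url):
    # Stand-in for a successful request to the publisher.
    return b"live page"


def publisher_down(url):
    # Stand-in for a network failure; ConnectionError is a subclass of OSError.
    raise ConnectionError("publisher unreachable")


PAGE = "http://publisher.example/v12/toc"  # hypothetical URL
local_cache = {PAGE: b"preserved page"}

print(serve(PAGE, publisher_up, local_cache))
print(serve(PAGE, publisher_down, local_cache))
```

The reader requests the same URL in both cases; only the source of the bytes changes, which is what keeps links and bookmarks working.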

26 Storage disc space? Terabytes of E-Journals
Median e-journal size is less than 0.5 GB/year
1 Terabyte (1000 GB) = 2000 journal years

Year   J-yr storage   TB/PC   J-yrs/PC
2004   $0.35           1.44    2,880
2005   $0.28           2.88    5,760
2006   $0.14           5.76   11,520
2007   $0.07          11.52   23,000
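The journal-years column follows directly from the two figures stated on this slide (0.5 GB per journal-year, 1000 GB per terabyte). A short check of that arithmetic, with the function name `journal_years` chosen only for the example:

```python
GB_PER_JOURNAL_YEAR = 0.5  # from the slide: median e-journal size per year
GB_PER_TB = 1000           # from the slide: 1 terabyte = 1000 GB


def journal_years(terabytes):
    """Journal-years that fit in the given storage at the median size."""
    return round(terabytes * GB_PER_TB / GB_PER_JOURNAL_YEAR)


for year, tb_per_pc in [(2004, 1.44), (2005, 2.88), (2006, 5.76), (2007, 11.52)]:
    print(year, tb_per_pc, journal_years(tb_per_pc))
```

The computed values match the slide for 2004 to 2006; for 2007 the calculation gives 23,040, which the slide appears to round to 23,000.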

27 The Stellenbosch-SA LOCKSS project

28 South African Goal. LOCKSS Project runs from 1 August 2004 to 31 December 2005 with OSI Foundation funding. PHASE 1 intended to collaboratively move towards setting up caches and developing plug-ins (i.e. maintain momentum). PHASE 2 focused on bandwidth savings related to the initial crawl for titles (i.e. “designated cache”). Focus on previously disadvantaged institutions.

29 Thank you. Long Lived: slow, determined, indestructible
