CS 791-S04 Digital Preservation Seminar Presentation of: Arms, "Preservation of Scientific Serials: Three Current Examples", JEP, 5(2), 1999 and Nelson.


1 CS 791-S04 Digital Preservation Seminar Presentation of: Arms, "Preservation of Scientific Serials: Three Current Examples", JEP, 5(2), 1999 and Nelson & Allen, "Object Persistence and Availability in Digital Libraries", D-Lib Magazine, 8(1), 2002 Michael L. Nelson 2/12/04 mln@cs.odu.edu http://www.cs.odu.edu/~mln/

2 Arms, "Preservation of Scientific Serials: Three Current Examples", JEP, 5(2), 1999

3 Three Serials ACM Digital Library –http://www.acm.org/dl/ Internet RFCs –http://www.rfc-editor.org/ http://www.rfc-editor.org/rfc-index.html D-Lib Magazine –http://www.dlib.org/

4 Arms's Three Levels of Preservation conservation –maintaining the “look and feel” cf. D-Lib Magazine’s approach preservation of access –maintenance of services on the content e.g.: search engines, author indexes, annotations, etc. preservation of content –maintain “content” only e.g.: maintain only the XML and not the stylesheets, transformations, etc.

5 Publishers as Archivists think really long-term: –“Tomorrow we could see the National Library of Medicine abolished by Congress, Elsevier dismantled by a corporate raider, the Royal Society declared bankrupt, or the University of Michigan Press destroyed by a meteor. All are highly unlikely, but over a long period of time unlikely events will happen.” emphasis mine - MLN

6 How Long is Forever? Average human life span (from: http://www.che.uc.edu/acs/archives/cintacs/vol39no5/vol39no5.html) –female: 78 –male: 77 Average Fortune 500 company lifespan: (from: http://www.businessweek.com/chapter/degeus.htm) –40 - 50 years Universities? U.S. Government agency or institution? –what about individual labs? NASA Zero Base Review U.S. Military BRAC

7 Partnerships With Publishers cf. next week’s lecture on LOCKSS –http://lockss.stanford.edu/ requirements: –cooperative publishers –cooperating libraries with significant individual and aggregate resources –IPR resolution…

8 Acting Independently of Publishers “The Library of Congress could play a special role. A prime function of the Library of Congress is to collect the cultural and intellectual output of today for the benefit of future generations. No legal changes are needed for the library to extend its mission to collecting and preserving information that is created in digital formats.” –this was in 1999; we now have http://www.digitalpreservation.gov/ (Week 15’s lecture) –“outsourcing examples” de jure: Theses -> UMI de facto: web pages -> Internet Archive (Week 6’s lecture)

9 Nelson & Allen, "Object Persistence and Availability in Digital Libraries", D-Lib Magazine, 8(1), 2002

10 Where to Measure Availability? HTML page? HTTP server? DL Service? Information Objects?

11 Previous Studies - HTML Pages / URLs “…estimates put the average lifetime for a URL at 44 days.” –Brewster Kahle, Scientific American, 1997 http://www.hackvan.com/pub/stig/articles/trusted-systems/0397kahle.html “…appears that the half-life of a Web page is somewhat less than two years and the half-life of a Web site is somewhat more than two years.” –Wallace Koehler, Information Research, 1999 http://informationr.net/ir/4-4/paper60.html see also JASIST 53(2), JASIS 50(2), and others
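To put the two quoted figures on a common scale, here is a minimal sketch that assumes URLs disappear at a constant rate (simple exponential decay). That model is an assumption made for illustration, not something either source claims, and the function names are hypothetical. Koehler's "somewhat less than two years" is rounded to 2.0 in the usage note.

```python
import math


def surviving_fraction(years_elapsed: float, half_life_years: float) -> float:
    """Fraction of pages still reachable after years_elapsed.

    Assumes constant-rate (exponential) decay, an illustrative model only.
    """
    return 0.5 ** (years_elapsed / half_life_years)


def half_life_from_mean_lifetime(mean_lifetime: float) -> float:
    """Convert a mean lifetime to a half-life under the same exponential assumption.

    For exponential decay, half-life = mean lifetime * ln(2).
    The result is in the same unit as the input.
    """
    return mean_lifetime * math.log(2)
```

Under this assumption, `surviving_fraction(4.0, 2.0)` gives one quarter of pages remaining after four years, and `half_life_from_mean_lifetime(44)` puts Kahle's 44-day average lifetime at a half-life of roughly 30 days, far shorter than Koehler's estimate.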

12 Previous Studies - DL Services –Powell & French, DL 2000 http://www.cs.virginia.edu/~cyberia/papers/DL00.pdf (note: this was for the Dienst-based NCSTRL, not the OAI-PMH-based NCSTRL) –see: Anan et al., JCDL 2002 »http://www.cs.odu.edu/~mln/pubs/ncstrl-oai.pdf

13 Previous Studies - HTTP Servers measured latency (~ 500 ms) and measured uptime probability to be ~ 0.95 –Viles & French, Computing Systems, 1995 not here: http://www.usenix.org/publications/computing/ not here: ftp://ftp.cs.virginia.edu/pub/techreports/CS-94-36.ps.Z cf. the problem as presented by Arms!

14 But What About the Information Objects? Access to the http server / DL service / web page is a necessary but not sufficient condition to actually getting “the stuff” Premise: items are put in a DL because they are more valuable than the “average” URL; they should be more available

15 Experiment Select 20 different DLs by hand –try to get a good mix between subject-based, author contributed, institution repository, different architectures, etc. (see figure 1) –by fiat declare that it is a DL if it “looks like a DL” –“randomly” (but still by hand) select 50 objects from each DL; only DLs with >= 50 objects were chosen –establish a baseline –harvest 3 times per week for > 1 year –record bytes received at each harvest
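The harvest step above can be sketched roughly as follows. This is a minimal illustration, not the authors' harvester: the function names, the three status labels, and the 50% size tolerance are all assumptions introduced here, since the slide only says that a baseline was established and bytes received were recorded.

```python
from typing import Optional
from urllib.error import URLError
from urllib.request import urlopen


def fetch_size(url: str, timeout: float = 30.0) -> Optional[int]:
    """Return the number of bytes received for url, or None if the request fails."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return len(response.read())
    except (URLError, OSError, ValueError):
        # Network errors, HTTP errors, timeouts, and malformed URLs all count as a failed harvest.
        return None


def classify_harvest(
    baseline_bytes: int,
    received_bytes: Optional[int],
    tolerance: float = 0.5,  # hypothetical threshold, not from the paper
) -> str:
    """Compare one harvest against the baseline size recorded at the start."""
    if received_bytes is None:
        return "unavailable"
    if baseline_bytes == 0:
        return "available" if received_bytes == 0 else "changed"
    relative_change = abs(received_bytes - baseline_bytes) / baseline_bytes
    return "available" if relative_change <= tolerance else "changed"
```

A scheduler (cron, for example) would call `fetch_size` for each of the 50 objects per DL three times a week and store the result of `classify_harvest` alongside the date. Comparing byte counts is a cheap proxy: it catches an object replaced by a short error page, but not an object replaced by different content of similar size.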

16 Results Table 2, Figures 1-20: http://www.dlib.org/dlib/january02/nelson/01nelson.html Results: 31 / 1000 objects unavailable –lots of additional analysis could be done here… –see me if you’d like to pick this up as a project –3% corresponds with the study by Lawrence et al., IEEE Computer, 1999 http://www.neci.nec.com/~lawrence/papers/persistence-computer01/persistence-computer01.pdf –more recent study by Spinellis, CACM, 2003: “…after four years 40%-50% of the referenced URLs [in CACM and IEEE Computer articles] become inaccessible.” –http://citeseer.nj.nec.com/spinellis03decay.html
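The arithmetic behind these figures can be checked with a short sketch. The first function simply divides the counts from the slide (20 DLs x 50 objects = 1000). The second converts a "fraction lost after N years" figure into an implied half-life, which assumes constant-rate exponential decay; that model is an assumption made here for comparison, not one stated by Spinellis. Both function names are hypothetical.

```python
import math


def unavailable_rate(unavailable: int, total: int) -> float:
    """Fraction of sampled objects that could not be retrieved."""
    return unavailable / total


def implied_half_life(fraction_lost: float, years: float) -> float:
    """Half-life implied by losing fraction_lost of URLs in the given number of years.

    Assumes exponential decay (an illustrative assumption):
    1 - fraction_lost = 0.5 ** (years / half_life).
    """
    return years * math.log(2) / -math.log(1.0 - fraction_lost)
```

`unavailable_rate(31, 1000)` is 3.1%, the "3%" on the slide. Under the exponential assumption, Spinellis's 40%-50% loss after four years corresponds to a half-life between about 5.4 years (`implied_half_life(0.4, 4.0)`) and 4 years (`implied_half_life(0.5, 4.0)`).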

