1 CS 502: Computing Methods for Digital Libraries Lecture 27 Preservation.


2 Administration
- Online survey: http://create.hci.cornell.edu/cssurvey.cfm
- Course evaluations at end of class today

3 Long-term preservation
Objective: Retain digital library materials over centuries
Longer than...
- computer architectures (Wintel, Linux, 390,...)
- magnetic storage (disks, tapes,...)
- formats, protocols, applications (Unicode, Java, XML,...)
- Internet or the web
for purposes that we have not yet considered


9 Levels of preservation
- Preserve full look and feel of digital material in its context
  e.g., a video game with its hardware
- Preserve content with an access system but migrate the look and feel to new environments
  e.g., successive versions of MS Windows
- Preserve raw content but no software system
  e.g., UTF-8 text with XML/XSL mark-up, but no XML/XSL software
The complexity of preservation varies greatly with the level.

10 Challenges: user needs
Digital information differs from print
- May be useless without its environment.
- Creator and subscriber may not have copies.
- Numerous versions.
Example: A scientific journal on-line
- If the author does not subscribe - no access to own article.
- If the library does not renew subscription - no access to anything.

11 Challenges: technical problems
Technical issues
- Storage media have short life-span.
- Formats and specifications change continually.
- Computing environments are very complex.
Example: personal files
I have retained all my personal computer files since 1984, but have great difficulty in reading some of them.

12 Challenges: economic and legal
Legal
- Archives require permission to save information.
Institutions
- Library of Congress, National Archives, etc. do not provide the same services for electronic information that they provide for physical artifacts.
Example: discontinued serials
What happens if a journal publisher goes bankrupt, or a scientific archive does not get its grant renewed?

13 Technical approaches: 1. Persistent storage
Material                 Approximate life (years)
Acid-free paper          500+
Microfilm                300
Optical disks            100?
Color film               25-50
CDs                      20?
Magnetic disk and tape   5
Persistent storage preserves raw content only.
Research in high-volume, long-term digital media is lacking.
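To make the table concrete, the following is a rough sketch of how often content would have to be re-copied onto fresh media to survive a given period. The lifetimes come from the table (using the lower bound where the slide gives a range or a guess). The 300-year target and the function name `copies_needed` are illustrative assumptions, not part of the lecture.

```python
import math

# Approximate media lifetimes in years, taken from the table above.
# Where the slide gives a range or a guess, the lower bound is used.
MEDIA_LIFE_YEARS = {
    "acid-free paper": 500,
    "microfilm": 300,
    "optical disk": 100,
    "color film": 25,
    "CD": 20,
    "magnetic disk or tape": 5,
}

def copies_needed(target_years, life_years):
    """Re-copies needed after the original to span target_years,
    assuming each copy is replaced exactly at the end of its life."""
    return max(0, math.ceil(target_years / life_years) - 1)

# Illustrative target: keep the content readable for 300 years.
for medium, life in MEDIA_LIFE_YEARS.items():
    print(f"{medium}: {copies_needed(300, life)} re-copies over 300 years")
```

Under these assumptions, short-lived magnetic media need dozens of copy cycles over a period that acid-free paper or microfilm could span without any, which is why the short-lived media depend on refreshing.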

14 Technical approaches: 2. Copying bits (refreshing)
Refreshing bits
- Repeatedly copy bits from one storage medium to the next.
- A standard technique in data processing.
- Benefits from the rapid fall in prices of storage devices.
- Preserves raw content only.
- Requires active management.
Mirrors
- Have many copies of the same information with independent management.
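A refresh step can be sketched as a copy followed by a bit-level check, and mirrors can be compared the same way. This is a minimal illustration, not a procedure described in the lecture: the function names and the choice of SHA-256 are assumptions.

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def refresh(source, destination):
    """Copy source to destination and confirm the bits are identical."""
    expected = sha256_of(source)
    Path(destination).parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, destination)
    if sha256_of(destination) != expected:
        raise IOError(f"refresh failed: {destination} differs from {source}")
    return expected

def mirrors_agree(paths):
    """True if every mirror copy has the same digest."""
    return len({sha256_of(p) for p in paths}) == 1
```

Running `mirrors_agree` on a schedule is one way to picture the "active management" the slide mentions: a copy that silently decays is only noticed if someone compares it against the others.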

15 Technical approaches: 3. Migration of content
Migration
- Retain content but change formats and representations to keep current with technology.
- Used by journal publishers.
- Preserves content and an access system.
Example: Pension funds
The Social Security Administration has records of every FICA payment, which migrate between systems over many years.
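As a small-scale picture of migration, the sketch below moves text from an older character encoding to UTF-8 and refuses to proceed if any character would be lost. The use of Latin-1 as the legacy format, the sample string, and the function name `migrate_text` are assumptions for illustration only. Real migrations (document formats, database schemas) are far more involved.

```python
def migrate_text(raw_bytes, old_encoding, new_encoding="utf-8"):
    """Re-encode text from an old encoding to a new one.

    Decoding and encoding are strict, so invalid or unrepresentable
    characters raise an error instead of being silently dropped.
    """
    text = raw_bytes.decode(old_encoding, errors="strict")
    migrated = text.encode(new_encoding, errors="strict")
    # Round-trip check: the migrated bytes must decode to the same text.
    if migrated.decode(new_encoding) != text:
        raise ValueError("content changed during migration")
    return migrated

# Hypothetical legacy record stored as Latin-1.
legacy = "Café résumé".encode("latin-1")
current = migrate_text(legacy, "latin-1")
print(len(legacy), "bytes before,", len(current), "bytes after")
```

The bytes change while the content does not, which is the distinction between migration and refreshing.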

16 Technical approaches: 4. Emulation
Concept
- Record a full specification of the computing environment in which the digital information was created.
- At some time in the future, emulate the original computing environment.
Would preserve full look and feel
Clearly not practical for complex computing systems
- Emulation is never perfect.
- Computing environments are remarkably complex.
But may be useful for parts of systems, e.g., Java virtual machine

17 Technical approaches: 5. Digital archeology
After periods of neglect, archeologists are needed
- Recover data from old media.
- Reverse engineer lost formats and specifications.
- Experts in digital paleography (reading archaic scripts and formats).
Example: East Germany
German archivists are reconstructing the records of the East German state from worn out tapes, broken computer systems, undocumented data bases, and the recollections of staff.

18 Preservation at publication
This is a period of experimentation and change in formats, protocols, object models, etc.
Some information is easier to preserve than others. Longevity is more likely if:
- Formats are widely used, in important applications.
- Methods are simple, without using obscure options.
- Coding schemes are easy to interpret.
Example: Internet RFC Series
The Internet RFC Series uses text/ascii. The RFCs go back to 1969 and have no preservation problems. A few RFCs are in PostScript and are already hard to decipher.
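One way to see why plain ASCII is "easy to interpret" is that checking for it takes a few lines and needs no knowledge of the file's origin. The sketch below is an illustration only. The function name `first_non_ascii` and the sample inputs are assumptions.

```python
def first_non_ascii(data):
    """Offset of the first byte outside 7-bit ASCII, or None if all are ASCII."""
    for offset, byte in enumerate(data):
        if byte > 0x7F:
            return offset
    return None

# Hypothetical samples: a plain-text line and a UTF-8 string with an accent.
plain = b"Network Working Group\n"
accented = "Café".encode("utf-8")

print(first_non_ascii(plain))     # no offset is reported for pure ASCII
print(first_non_ascii(accented))  # offset of the first non-ASCII byte
```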

19 Metadata
Digital information needs interpretation
- Self-documentation is always good.
- Persistent identification is vital.
- Simple, standard metadata has a chance of long life.
- Authentication of material need not be complex (e.g., hash).
- History of changes (e.g., migration to different format).
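The points above can be sketched as a small, self-describing record that carries an identifier, a hash for authentication, and a history of changes. The field names, the identifier `example:lecture-27`, and the choice of SHA-256 are all assumptions for illustration. The lecture does not prescribe a metadata schema.

```python
import hashlib
import json

def make_record(identifier, content, fmt):
    """Build a minimal metadata record for a piece of content."""
    return {
        "identifier": identifier,   # persistent identifier (hypothetical)
        "format": fmt,
        "sha256": hashlib.sha256(content).hexdigest(),
        "history": [],              # one entry per migration
    }

def verify(record, content):
    """Authenticate content against the hash stored in the record."""
    return hashlib.sha256(content).hexdigest() == record["sha256"]

def record_migration(record, new_content, new_format, note):
    """Log a format migration and update the format and hash."""
    record["history"].append({
        "from": record["format"],
        "to": new_format,
        "sha256_before": record["sha256"],
        "note": note,
    })
    record["format"] = new_format
    record["sha256"] = hashlib.sha256(new_content).hexdigest()

record = make_record("example:lecture-27", b"Preservation", "text/ascii")
record_migration(record, b"<p>Preservation</p>", "text/html", "hypothetical migration")
print(json.dumps(record, indent=2))  # plain text, readable without special software
```

Serializing the record as plain text keeps the metadata itself within the "simple, standard" category the slide recommends.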

20 Preservation of specifications
Digital information needs a context
Therefore store the specifications of:
- Formats
- Database designs
- Technical documentation
- User manuals
... on high-quality archival materials, e.g., paper.

21 Final word
Long-term preservation needs people and organizations who want it!

