1
Gerhard Schneider – Rechenzentrum der Universität Freiburg
Aspects of Long Term Preservation of Digital Libraries
Gerhard Schneider
Computing Centre & CS Department, University of Freiburg
gerhard.schneider@rz.uni-freiburg.de
2
Storage on Paper
Longevity of the media
– paper lasts for centuries, no special care required
– except perhaps: acid in paper, water from burst pipes, fire, etc.
Longevity of the description language
– except perhaps: Old English or the old German alphabet
– abstract terms: decoding is possible, as related information is available
– how about old Assyrian writings?
Loss of information is a well-known phenomenon
– loss of old information is not so relevant to current society (e.g. the 5th book of Aristotle)
– loss of new information is more or less impossible, thanks to Gutenberg, because knowledge is distributed to many places
3
Storage on Paper
Accessibility of printed information
– no special device is needed, except perhaps glasses
– no technical knowledge is required: "hands on"
Outsourcing of the handling of knowledge distribution to publishers
– economically very successful, so successful that we can no longer afford to buy the books we wrote
Long-term storage of information has been centralised in libraries
– high running costs: library building, maintenance, staff required to manage books
– the cost of storage may far exceed the cost of acquisition
4
Storage on Paper
If you don't live close to the library, accessing information can be very difficult (e.g. in Third World countries)
A rather costly machinery has been set up to ease the problem
– long-distance inter-library loans: staff-intensive, cost of transportation
– photocopies of articles vs. copyright
– now: scanning articles and delivery via fax (sic!) or email
– the user is charged a nominal fee, nominal w.r.t. the cost of operation, not w.r.t. the user's own budget
Information is produced electronically
– most features are lost when the information is brought to paper
It is only natural that scientists are asking for electronic libraries, given all the benefits
5
Electronic storage
There are a few pitfalls when it comes to digital storage
– can you still read your old 5¼" floppies? Do you still have a device to read them?
– a well-known problem in other areas: record players are rare these days. And if so, is there still anything on them?
– magnets can erase information, and each information bit is a little magnet interfering with the others
– a well-known phenomenon also in other areas of magnetic storage: music cassettes, tape recorders, video tapes
Solution: digitally stored information can be copied to new media without any loss!
– this is the problem the old-fashioned industry is now facing w.r.t. CD writers
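The claim that digital copies are lossless can be checked mechanically: compare a cryptographic checksum of the source and of the copy. A minimal Python sketch, assuming that files on an ordinary file system stand in for the old and the new medium; the function names are illustrative and not part of any archive product.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_copy(src: Path, dst: Path) -> str:
    """Copy src to dst and confirm that the copy is bit-identical."""
    shutil.copyfile(src, dst)
    before, after = sha256_of(src), sha256_of(dst)
    if before != after:
        raise OSError(f"copy of {src} to {dst} is not bit-identical")
    return after


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        old = Path(tmp) / "old_medium.bin"
        new = Path(tmp) / "new_medium.bin"
        old.write_bytes(b"\x00\x01 digital library content \xff" * 1000)
        print(refresh_copy(old, new))
```

Because the check compares digests rather than trusting the copy tool, the same function works for any pair of media that can be mounted as files.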
6
Electronic storage
Thus, in principle, we have a solution to the media problem:
– keep converting
– conversion can be done in a fully automated way, using robots
– the technology is available in most computer centres and is used for automated backup and archiving
– typical archive software recycles tapes that have been overused: it copies the information onto new tapes and ejects the old ones
Interpretation of the contents
– bits by themselves carry no real information; interpretation by software is required before they can be presented to the human eye/ear
New problem: convert the software that was used to generate the information
– a well-known problem: word processors can't read old files
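The tape-recycling step described above can be modelled in a toy Python sketch. The wear limit `MAX_MOUNTS`, the `Tape` record and the label generator are hypothetical; real archive software applies its own criteria (error rates, age, vendor limits).

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical wear limit; real limits depend on the tape technology.
MAX_MOUNTS = 5000


@dataclass
class Tape:
    label: str
    mounts: int = 0
    files: dict = field(default_factory=dict)  # file name -> contents


def refresh_pool(tapes, new_label):
    """Return (pool, ejected): worn tapes are copied to fresh ones and ejected."""
    pool, ejected = [], []
    for tape in tapes:
        if tape.mounts >= MAX_MOUNTS:
            # Copy every file onto a fresh tape, then retire the worn one.
            fresh = Tape(label=new_label(), files=dict(tape.files))
            pool.append(fresh)
            ejected.append(tape.label)
        else:
            pool.append(tape)
    return pool, ejected


labels = (f"NEW{n:03d}" for n in count())
tapes = [Tape("A001", mounts=12, files={"x.tex": b"..."}),
         Tape("A002", mounts=6000, files={"y.tif": b"..."})]
pool, ejected = refresh_pool(tapes, lambda: next(labels))
print([t.label for t in pool], ejected)  # ['A001', 'NEW000'] ['A002']
```

Only the worn tape's contents move; tapes below the limit stay in the pool untouched.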
7
Format issues
What do the bits mean?
– a simple but good example: TeX
– information and control commands are stored in plain ASCII
– the functionality of the control commands is exactly described in the TeX manual
– so if you sit on an island with nothing but the bits and the TeX manual, you can find out what the paper is supposed to look like
– try this with MS Word or, even better, MS PowerPoint
Putting data into electronic libraries only makes sense if the format is 100% specified
– whether this description is in the document or in an accompanying file is of secondary interest
– keep it simple: the original document should be understandable even if additional structure information gets lost (or is difficult to retrieve)
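Two properties of the TeX example can be checked with nothing but the bytes: the file is plain 7-bit ASCII, and the control commands follow a simple rule (with TeX's default category codes, a control word is a backslash followed by letters). A minimal Python sketch; which control characters count as acceptable is an assumption, and the byte signature used for contrast is the header of an OLE2 compound file, the container behind older Word documents.

```python
import re

# Bytes accepted besides printable ASCII (an assumption): tab, line feed, carriage return.
ALLOWED_CONTROLS = {0x09, 0x0A, 0x0D}


def is_plain_ascii(data: bytes) -> bool:
    """True if every byte is printable 7-bit ASCII or an allowed control byte."""
    return all(0x20 <= b < 0x7F or b in ALLOWED_CONTROLS for b in data)


def control_words(source: str) -> list[str]:
    """TeX control words: a backslash followed by letters (default category codes)."""
    return re.findall(r"\\[A-Za-z]+", source)


tex_source = b"\\documentclass{article}\n\\begin{document}\nHello\n\\end{document}\n"
ole2_header = bytes.fromhex("d0cf11e0a1b11ae1")  # signature of an OLE2 compound file

print(is_plain_ascii(tex_source))   # True
print(is_plain_ascii(ole2_header))  # False
print(control_words(tex_source.decode("ascii")))
```

Passing the ASCII check is necessary but not sufficient: the bits stay interpretable only because the meaning of each control word is documented elsewhere.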
8
Format issues
Example: the Kodak imaging software in MS Windows allows the annotation of TIFF files
– the annotation can only be read with the Kodak software
– or the annotation can be added permanently to the document, thus making it visible (and not removable) in any other TIFF reader
Text formats are precise, i.e. we know what has been typed
Image formats are different, as information is lost during the scanning process
– by the lens itself
– by the sensor (300 dpi means that only 300 dots per inch are stored)
– by the storage format (i.e. do we get back what we stored?)
Lossy vs. faithful
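The sensor limit is simple arithmetic: the resolution fixes how many pixels exist at all, before any storage format is chosen. A short Python sketch for an A4 page (210 × 297 mm); the page size and bit depths are chosen for illustration.

```python
MM_PER_INCH = 25.4


def scan_size(width_mm: float, height_mm: float, dpi: int, bits_per_pixel: int):
    """Return (width_px, height_px, uncompressed_bytes) for a scanned page."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px, height_px, width_px * height_px * bits_per_pixel // 8


# 200 dpi bitonal (fax-like), 300 dpi bitonal, 300 dpi 24-bit colour
for dpi, bpp in [(200, 1), (300, 1), (300, 24)]:
    print(dpi, bpp, scan_size(210, 297, dpi, bpp))
```

At 300 dpi a bitonal page has about 8.7 million pixels (about 1.1 MB uncompressed), while 24-bit colour needs about 26 MB; detail finer than 1/300 inch is never recorded.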
9
Format issues
What does "lossy" mean?
– we do not get back all the information that we stored; this sounds scary
– did we see it in the first place? Is fax lossy? (fax = 100 dpi or 200 dpi)
– analog recording vs. CD vs. MP3: CD is a lossy process, but what is really lost? MP3 is good enough, even for the young generation.
– what we lose depends on the algorithm; the doctor's scare, "vital X-ray data is lost", is completely wrong
– why lossy? Keeping the original information needs too much space and does not give any gain in knowledge. In addition, "writing things down" is already a lossy process.
"Lossy" does not imply that we lose more and more information over time
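The last point can be illustrated with a toy quantiser standing in for a real lossy codec (JPEG and MP3 are far more elaborate): the loss happens once, when the data is encoded, and applying the same quantiser again, or copying the result with a lossless method, loses nothing further. A minimal Python sketch with made-up sample values and step size.

```python
import zlib


def quantize(samples, step):
    """Lossy step: round each integer sample to the nearest multiple of step."""
    return [((s + step // 2) // step) * step for s in samples]


samples = [3, 17, 100, 130, 240]
once = quantize(samples, 16)
twice = quantize(once, 16)

print(once)                                            # [0, 16, 96, 128, 240]
print(twice == once)                                   # True: re-applying loses nothing further
print(max(abs(a - b) for a, b in zip(samples, once)))  # 4, never more than step // 2

data = bytes(samples) * 100
print(zlib.decompress(zlib.compress(data)) == data)    # True: lossless round trip
```

Re-encoding with a different lossy codec or different settings is a separate step and can lose more; merely storing and copying the encoded file does not.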
10
Complexity of storage
When it comes to electronic media, we tend to ask for overkill, forgetting that we cannot do anything like that on paper
When moving to the paperless office (as my office does!)
– after having solved the format issue in favour of RTF and TIFF, how do we store the documents?
– we use the file system and nothing else, as it reflects the current structure of an office pretty well
– thus we are independent of the operating system and the management software: all I need are long filenames and a tree structure, possibly access rights
Thus we can get quite far before running into another really hard problem
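The "file system only" approach needs nothing beyond directories and long file names. A minimal Python sketch, assuming a hypothetical layout of year / department / topic folders, with the path as the only metadata; the file contents are placeholders for RTF and TIFF documents.

```python
import tempfile
from pathlib import Path


def file_document(root: Path, *folders: str, name: str, content: bytes) -> Path:
    """Store a document under root/folder/.../name; the path is the only metadata."""
    target = root.joinpath(*folders, name)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(content)
    return target


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Placeholder contents: the start of an RTF file and of a little-endian TIFF header.
    file_document(root, "2002", "Procurement", "Tape robot",
                  name="2002-03-14 offer from vendor.rtf", content=b"{\\rtf1 ...}")
    file_document(root, "2002", "Staff",
                  name="2002-04-01 holiday request.tif", content=b"II*\x00")
    for path in sorted(root.rglob("*")):
        if path.is_file():
            print(path.relative_to(root).as_posix())
```

Any operating system that can list a directory tree can browse this archive; no database or management software is involved.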
11
Software issues
In a multimedia environment, it may not be enough to convert the media; the software has to be recompiled
– a standard job in science: whenever a new computer architecture appears, just recompile and run
– most scientific software has little sophisticated I/O
– what happens if the software is intimately married to the underlying operating system, like Word to Windows?
– can we really afford to store our information in proprietary systems, i.e. systems which we cannot look into?
Use system-independent data storage
– even if this means a loss of information: don't put such information in in the first place...
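One concrete rule for system-independent binary data is to fix byte order and field sizes explicitly instead of relying on the machine's native layout. A minimal Python sketch using the standard `struct` module; the three-field record is a hypothetical example.

```python
import struct

# Hypothetical record layout: record id (uint32), year (uint16), page count (uint16).
# "<" fixes little-endian byte order and standard sizes with no padding,
# so every platform writes and reads exactly the same 8 bytes.
RECORD = struct.Struct("<IHH")


def encode(record_id: int, year: int, pages: int) -> bytes:
    return RECORD.pack(record_id, year, pages)


def decode(blob: bytes) -> tuple:
    return RECORD.unpack(blob)


blob = encode(4711, 2002, 24)
print(blob.hex())    # 67120000d2071800
print(decode(blob))  # (4711, 2002, 24)
```

With `"@IHH"` (native mode) the byte order and sizes follow the host machine and padding may be inserted for alignment; `"<"` removes those dependencies, so the bytes decode identically on any platform.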
12
Live documents
After all, we want live documents
– query and retrieval: how many libraries are locked into old-fashioned systems because their data cannot be converted?
– hyperlinks
– computer games
– simulation
Upgrades to new versions on new operating systems are, hopefully, upward compatible
A manufacturer may decide NOT to move to a new platform
– make as much money as possible and vanish
Reimplementation may not be a solution
– incompatibilities, copyright issues; errors become historic features
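A library that wants to avoid lock-in can regularly export its catalogue into a documented plain-text format. A minimal Python sketch in which an in-memory SQLite database stands in for the proprietary system; the table and its rows are made up, and the output is CSV in the style of RFC 4180.

```python
import csv
import io
import sqlite3


def export_table_csv(conn: sqlite3.Connection, table: str) -> str:
    """Dump one table, header row included, as CSV text with CRLF line endings."""
    cur = conn.execute(f"SELECT * FROM {table}")  # table name must come from a trusted source
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\r\n")
    writer.writerow([col[0] for col in cur.description])  # column names
    writer.writerows(cur)                                  # one CSV row per record
    return out.getvalue()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalogue (id INTEGER, title TEXT, year INTEGER)")
conn.executemany("INSERT INTO catalogue VALUES (?, ?, ?)",
                 [(1, "Annual report", 1998), (2, "Conference proceedings", 2001)])
print(export_table_csv(conn, "catalogue"))
```

The export does not preserve queries or indexes, but the records themselves remain readable with nothing more than a text editor.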
13
Solution
Why not specify the programming environment along the lines of the file format discussion?
Use Java!
– port the Java engine to a new environment and you are "done"
Unfortunately:
– users like their own programming environment
– environments are made for performance (databases)
– and not for long-term storage
So we have to face the real world
14
Solutions??
Keep a museum of running machinery?
Emulation?? (idea of Rothenberg, RAND Corporation)
– during a phase of transition, emulators are typically available
Example: lots of games were available for the C64 and are still kept (collected) in libraries, without a working environment
– emulators are available: CCS64 v1.09 runs under Windows
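At its core an emulator such as CCS64 runs a fetch, decode, execute loop over a preserved memory image. A minimal Python sketch of that loop for an invented 8-bit accumulator machine; the instruction set is hypothetical, whereas a real C64 emulator has to model the 6510 CPU, the video and sound chips and their timing.

```python
class ToyCPU:
    """Emulator for a hypothetical 8-bit accumulator machine (not a real CPU)."""

    def __init__(self, image: bytes):
        self.mem = bytearray(256)
        self.mem[: len(image)] = image  # load the preserved memory image
        self.a = 0                      # accumulator
        self.pc = 0                     # program counter
        self.zero = False               # zero flag, set by DEC

    def fetch(self) -> int:
        byte = self.mem[self.pc]
        self.pc = (self.pc + 1) & 0xFF
        return byte

    def run(self, max_steps: int = 10_000) -> None:
        for _ in range(max_steps):
            op = self.fetch()           # fetch
            if op == 0x00:              # HALT
                return
            addr = self.fetch()         # every other opcode takes one address byte
            if op == 0x01:              # LDM: A = mem[addr]
                self.a = self.mem[addr]
            elif op == 0x02:            # ADD: A = A + mem[addr] (mod 256)
                self.a = (self.a + self.mem[addr]) & 0xFF
            elif op == 0x03:            # STA: mem[addr] = A
                self.mem[addr] = self.a
            elif op == 0x04:            # DEC: mem[addr] -= 1, update zero flag
                self.mem[addr] = (self.mem[addr] - 1) & 0xFF
                self.zero = self.mem[addr] == 0
            elif op == 0x05:            # JNZ: jump if zero flag is clear
                if not self.zero:
                    self.pc = addr
            else:
                raise ValueError(f"illegal opcode {op:#04x}")
        raise RuntimeError("step limit reached")


# Program: mem[0x20] += mem[0x22], repeated mem[0x21] times.
program = bytes([0x01, 0x20, 0x02, 0x22, 0x03, 0x20, 0x04, 0x21, 0x05, 0x00, 0x00])
image = bytearray(0x23)
image[: len(program)] = program
image[0x21] = 7  # loop counter
image[0x22] = 6  # value to add
cpu = ToyCPU(bytes(image))
cpu.run()
print(cpu.mem[0x20])  # 42
```

The program multiplies 6 by 7 through repeated addition; the point is that the stored bytes stay meaningful for as long as someone can run the loop.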
15
Emulators under Windows
– Sinclair ZX Spectrum
– Atari emulator
– even emulators for modern PalmPilots
16
Even more emulators
Even for Sony's PlayStation, there is an emulator under Win98
There is a Gameboy emulator for the Palm, running in the Windows emulator of the Palm, which runs…
17
What about Windows?
– running NT under Linux on an Intel machine...
– or: running Linux under NT on an Intel machine
– or: running NT under Windows XP
18
What about other hardware?
Emulate Windows on other hardware (Macintosh)
19
Observation
Many software developers use emulators when cross-compiling and testing applications for new environments
Thus emulators exist in most environments
Can we obtain them from the manufacturers?
– copyright issues
– company secrets
– maybe enforce a deposit of software emulators in a safe, for later use?
20
Using emulators
Emulators typically store an environment in one special file
Application example (tested) for VMware:
– install Windows 98 in a VMware box; keep the resulting file as a reference installation
– install one computer game (or one program setup) under a copy of the reference installation
– store the resulting file in a digital library with the name of the game as metadata
– to play the game, start your computer (NT, Win2k or Linux), start VMware with that specific file and... play!
– the file can be exchanged between operating systems
– to convert the file from one storage medium to another, use the standard process
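The bookkeeping side of this workflow can be sketched in Python: clone the reference installation, name the clone after the title, and record the title as metadata in a catalogue. File names, the `.img` suffix and the JSON catalogue are assumptions for illustration; a real VMware environment consists of its own configuration and disk files, and installing the game still happens inside the emulator.

```python
import json
import shutil
import tempfile
from pathlib import Path


def clone_reference(reference: Path, library: Path, title: str) -> Path:
    """Copy the reference installation to a new image file named after the title."""
    safe = "".join(c if c.isalnum() else "_" for c in title)
    image = library / f"{safe}.img"
    shutil.copyfile(reference, image)
    return image


def register(catalogue: Path, title: str, image: Path, guest_os: str) -> None:
    """Append one entry, with the title as metadata, to a JSON catalogue file."""
    entries = json.loads(catalogue.read_text()) if catalogue.exists() else []
    entries.append({"title": title, "image": image.name, "guest_os": guest_os})
    catalogue.write_text(json.dumps(entries, indent=2))


def lookup(catalogue: Path, title: str) -> str:
    """Return the image file name stored for a title."""
    for entry in json.loads(catalogue.read_text()):
        if entry["title"] == title:
            return entry["image"]
    raise KeyError(title)


with tempfile.TemporaryDirectory() as tmp:
    lib = Path(tmp)
    reference = lib / "win98-reference.img"
    reference.write_bytes(b"placeholder for a clean guest installation")
    image = clone_reference(reference, lib, "Some Game")
    # Here the game would be installed inside the emulator, using `image`.
    register(lib / "catalogue.json", "Some Game", image, "Windows 98")
    print(lookup(lib / "catalogue.json", "Some Game"))  # Some_Game.img
```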
21
Using emulators
At some stage, PC technology will die. Very likely there will be an emulator for the old-fashioned PC on the new hardware, at least for a limited time.
During this time, set up a scheme to use that emulator to run your favourite operating system and install your favourite emulator under the emulated environment.
– if this works, continue to use all the old files
– if it fails, some development has to be carried out; money spent on such projects is wisely spent: one local solution is a solution for the whole world
Performance is not an issue!
22
Performance of emulators
Machines get faster
– VMware loses about a factor of 2, so on an 800 MHz machine it appears as if the original code were running on a 350 MHz machine
– we thus even keep the original "feeling" of the software, for some time, before machines get faster
Experience: a whole server setup can be run under emulators
– VMware even has network and USB connections
– a complete digital library system installed under VMware can be kept in one (huge) file and preserved for the future, at least for a limited time, which is better than losing it right away
– the hardest part is to convince a sysadmin not to use the real machine
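The figures on this slide imply a slowdown of about 800 / 350 ≈ 2.3. A crude Python model, assuming each emulation layer divides the apparent speed by its own factor, also covers the nested setup from the previous slide; the numbers beyond the slide's are hypothetical.

```python
from math import prod


def effective_mhz(host_mhz: float, *slowdowns: float) -> float:
    """Apparent clock speed of code running under nested emulation layers.

    Crude model: each layer divides the speed by its slowdown factor;
    I/O, caching and instruction-set differences are ignored.
    """
    return host_mhz / prod(slowdowns)


print(round(800 / 350, 2))        # 2.29: factor implied by the slide
print(effective_mhz(800, 2))      # 400.0: one layer with factor 2
print(effective_mhz(800, 2, 2))   # 200.0: emulator inside an emulator
print(effective_mhz(3200, 2, 2))  # 800.0: same stack on a hypothetical faster host
```

Because the factors multiply, each extra layer is affordable only once the host hardware has outgrown the original by at least that factor.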
23
Using emulated environments even today
A typical "library loan" requires retrieving the software and handing it over to the customer
– the customer may lose parts of the software (diskette, documentation)
– the customer may have problems with the installation, and the librarian cannot help, since a computer expert is required
Using the emulated version means retrieving a file from a digital library (electronic storage) and installing it (i.e. a copy process) on the library computer (which has an emulator installed)
– no manpower involved, instant service to the customer
It suffices to have one reference installation in the world
– libraries could trade the files, provided they own the copyright of the "computer game"
24
Summary
Emulators may be the only way to preserve a complex software environment
– a "living" environment, in contrast to a "dead" environment like a book (text or image)
Digital libraries themselves are complex software environments, which depend on hardware and operating systems
This is a current Ph.D. project at the University of Freiburg.
– How far can we go?
– Apparently very far...