Long-Term Preservation
Technical Approaches to Long-Term Preservation: The challenge is to interpret formats. A similar development can be seen with sound carriers, from the phonograph to MP3.
Those who do not keep up with this development will soon lack support: – new audio documents are produced only in current formats – spare parts for out-of-date equipment are hard to come by.
Technical approaches to the long-term preservation of digital documents fall into two categories: – those that aim to preserve the original state of documents along with systems suitable for rendering them in their original format – those that aim to continually transform digital documents into the formats of state-of-the-art rendition systems while retaining their original look and feel.
Migration: Advantages: – well known – documents are available at all times – possibly improved quality. Disadvantages: – reduced authenticity – hard to automate.
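Migration can be pictured as a chain of format converters, one per format change. The sketch below is a minimal illustration, assuming hypothetical format names (`fmt_v1`, `fmt_v2`, `fmt_v3`) and toy converter functions. It shows how each step may drop information the newer format cannot represent, which is where the loss of authenticity comes from.

```python
from typing import Callable


# Hypothetical converters: each maps a document in one format to the next format.
def v1_to_v2(doc: dict) -> dict:
    # fmt_v2 keeps text and layout but cannot store the original font
    return {"format": "fmt_v2", "text": doc["text"], "layout": doc["layout"]}


def v2_to_v3(doc: dict) -> dict:
    # fmt_v3 keeps text only, so the layout is lost as well
    return {"format": "fmt_v3", "text": doc["text"]}


CONVERTERS: dict[str, Callable[[dict], dict]] = {
    "fmt_v1": v1_to_v2,
    "fmt_v2": v2_to_v3,
}


def migrate(doc: dict, target: str) -> dict:
    """Apply converters step by step until the document reaches the target format."""
    while doc["format"] != target:
        converter = CONVERTERS.get(doc["format"])
        if converter is None:
            raise ValueError(f"no converter from {doc['format']}")
        doc = converter(doc)
    return doc


original = {"format": "fmt_v1", "text": "Hello", "layout": "two-column", "font": "Garamond"}
migrated = migrate(original, "fmt_v3")
print(migrated)  # {'format': 'fmt_v3', 'text': 'Hello'}
```

Every time a format becomes obsolete, a new entry must be added to `CONVERTERS` and all affected documents must be run through it.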
Hardware Museums: The mission of a hardware museum is to collect (and keep operational) all relevant computing systems so that future generations can view our documents in their original environments.
Hardware museums are not feasible in practice: – too many items – additional software and hardware required – hard to maintain. (Figure: Hardware museum at the Universität der Bundeswehr München, Germany.)
Emulation: Emulators allow the function of processors and other hardware components to be simulated by software. When using emulation, the following items have to be preserved for each digital document (using, e.g., migration): – the character stream and the metadata – a specification of the hardware that can be interpreted by the emulator – the complete software of the rendition system (in the form of binary data streams).
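The items that must be preserved per document can be bundled into one record. This is a minimal sketch with hypothetical field names (`EmulationRecord`, `hardware_spec`, `rendition_software`); it only checks that each item needed for emulation-based access is present, not that it is usable.

```python
from dataclasses import dataclass, field


@dataclass
class EmulationRecord:
    """Hypothetical per-document record for emulation-based preservation."""
    character_stream: bytes                         # the document's raw bit stream
    metadata: dict = field(default_factory=dict)    # descriptive and technical metadata
    hardware_spec: str = ""                         # machine description read by the emulator
    rendition_software: bytes = b""                 # binary image of system + application software

    def missing_items(self) -> list[str]:
        """Return the names of required items that are still empty."""
        missing = []
        if not self.character_stream:
            missing.append("character_stream")
        if not self.metadata:
            missing.append("metadata")
        if not self.hardware_spec:
            missing.append("hardware_spec")
        if not self.rendition_software:
            missing.append("rendition_software")
        return missing


record = EmulationRecord(character_stream=b"Hello", metadata={"title": "Letter"})
print(record.missing_items())  # ['hardware_spec', 'rendition_software']
```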
If someone wants to access a document preserved this way in, say, 100 years, they would have to proceed as follows: 1. Create an emulator – load the hardware specification into an emulator to obtain a software implementation that is functionally equivalent to the original hardware. 2. Install software – on the emulated computer, install the system software and the application programs needed for rendering the document.
3. Render the document – load the character stream of the digital document into the emulated computer and start the rendition software to access the document.
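The three access steps (create an emulator, install software, render) can be sketched with a toy emulator. Everything here is hypothetical: the five-instruction accumulator machine, the opcode table standing in for the preserved hardware specification, and the tiny `VIEWER` program standing in for the rendition software. A real emulator models far more of the original hardware.

```python
# Hypothetical hardware specification: opcode byte -> operation name.
HARDWARE_SPEC = {0x00: "HALT", 0x01: "IN", 0x02: "JNEG", 0x03: "OUT", 0x04: "JMP"}


class Emulator:
    """Toy accumulator machine whose instruction set is read from a hardware spec."""

    def __init__(self, spec: dict[int, str]) -> None:
        # Step 1: create the emulator from the preserved hardware specification.
        self.spec = spec
        self.memory = b""

    def install(self, software: bytes) -> None:
        # Step 2: load the preserved rendition software (opcode/argument byte pairs).
        self.memory = software

    def render(self, document: bytes, max_steps: int = 10_000) -> str:
        # Step 3: run the installed software on the document's character stream.
        acc, pc, ptr, screen = 0, 0, 0, []
        for _ in range(max_steps):
            op, arg = self.spec[self.memory[pc]], self.memory[pc + 1]
            pc += 2
            if op == "HALT":
                return "".join(screen)
            if op == "IN":  # read the next input byte, or -1 at end of stream
                if ptr < len(document):
                    acc, ptr = document[ptr], ptr + 1
                else:
                    acc = -1
            elif op == "JNEG":  # jump to instruction `arg` if the accumulator is negative
                if acc < 0:
                    pc = 2 * arg
            elif op == "OUT":  # write the accumulator to the emulated screen
                screen.append(chr(acc))
            elif op == "JMP":  # unconditional jump to instruction `arg`
                pc = 2 * arg
        raise RuntimeError("step limit exceeded")


# Hypothetical rendition software as a binary stream: echo every input byte.
VIEWER = bytes([
    0x01, 0,  # 0: IN
    0x02, 4,  # 1: JNEG 4  (end of input -> halt)
    0x03, 0,  # 2: OUT
    0x04, 0,  # 3: JMP 0
    0x00, 0,  # 4: HALT
])

emulator = Emulator(HARDWARE_SPEC)   # 1. create an emulator
emulator.install(VIEWER)             # 2. install software
print(emulator.render(b"Hello"))     # 3. render the document -> Hello
```

Note that only the specification, the software image, and the character stream are inputs; the emulator itself can be rebuilt on whatever platform is current.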
Advantages of emulation: – relatively small cost per document – cost proportional to actual use – one emulator suffices for many documents – high authenticity. Migration, by contrast, requires that whenever an old format becomes obsolete (while new ones become popular), new conversion techniques and tools be developed to achieve the required transformation.
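The cost arguments (emulation cost proportional to actual use, migration needing new conversion tools whenever a format becomes obsolete) can be made concrete with a simple model. The formulas and all numbers below are illustrative assumptions, not figures from the source.

```python
def migration_cost(n_docs: int, n_format_changes: int,
                   tool_cost: float, conversion_cost_per_doc: float) -> float:
    # Each obsolete format needs a new conversion tool, and every document is converted.
    return n_format_changes * (tool_cost + n_docs * conversion_cost_per_doc)


def emulation_cost(n_accesses: int, n_emulators: int,
                   emulator_cost: float, cost_per_access: float) -> float:
    # One emulator serves many documents; per-document work is only done on access.
    return n_emulators * emulator_cost + n_accesses * cost_per_access


# Illustrative unit costs only; they are not measured figures.
print(migration_cost(n_docs=100_000, n_format_changes=3, tool_cost=50_000, conversion_cost_per_doc=1))  # 450000
print(emulation_cost(n_accesses=2_000, n_emulators=1, emulator_cost=200_000, cost_per_access=5))        # 210000
```

With these made-up numbers emulation is cheaper because only 2,000 of the 100,000 documents are ever accessed; if every document were accessed (100,000 accesses), emulation would cost 700,000 and the comparison would flip.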
Standard Formats: costs are proportional to the number of formats; there are standards for simple character sequences and for complex document types.
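Why standard formats reduce cost can be shown by counting the formats an archive must keep renderable. This is a sketch under assumptions: the source format names and the mapping `NORMALIZE` are hypothetical; "plain-text" stands for a standard for simple character sequences and "PDF/A" for a standard for complex document types.

```python
# Hypothetical mapping from proprietary formats to a small set of standard formats.
NORMALIZE = {
    "wordperfect-5": "PDF/A",   # complex document types
    "msword-97": "PDF/A",
    "ascii": "plain-text",      # simple character sequences
    "latin-1": "plain-text",
}


def formats_to_maintain(collection: list[str], normalize: bool) -> set[str]:
    """Formats the archive must keep renderable; maintenance cost grows with this set."""
    if normalize:
        return {NORMALIZE.get(fmt, fmt) for fmt in collection}
    return set(collection)


collection = ["wordperfect-5", "msword-97", "ascii", "latin-1", "msword-97"]
print(len(formats_to_maintain(collection, normalize=False)))  # 4
print(len(formats_to_maintain(collection, normalize=True)))   # 2
```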
Legal and Social Concerns: The long-term preservation of digital documents involves legal and social concerns: 1. digital rights management (DRM) and copy protection 2. reserved software rights 3. should hardware manufacturers provide emulators? 4. criteria for selection 5. costs as a limiting factor 6. making costs affordable 7. balance of interests between stakeholders
OAIS Model: The Open Archival Information System (OAIS) Reference Model is an ISO standard for the long-term preservation of digital documents. It takes two complementary points of view: an information model and a process model.
The Information Model: Data Object and Information Object. The knowledge required to understand data is called the Knowledge Base. To understand the data, one needs additional information, the Representation Information. For example, along with the source code of a Java program, a book about the Java programming language must be available.
The Content Information is the information object proper; it contains the data together with all the information necessary to interpret it. Preservation Description Information (PDI) denotes all the information required to suitably preserve the corresponding Content Information.
Content Information and PDI are combined into one logical entity, the Information Package. The Packaging Information specifies how Content Information and PDI are actually related to each other, e.g., by describing the directory structure of a CD-ROM.
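The parts of the information model can be sketched as nested records. This is a minimal illustration, not the OAIS schema: the class and field names are hypothetical, the PDI is reduced to provenance and a fixity value, and a SHA-256 digest is an assumed choice of fixity check. The Java example from the information model supplies the Representation Information.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ContentInformation:
    data_object: bytes                  # the preserved bit stream
    representation_information: str     # what is needed to interpret it


@dataclass
class PreservationDescriptionInformation:
    provenance: str                     # where the content came from
    fixity: str                         # SHA-256 digest used to detect corruption


@dataclass
class InformationPackage:
    content: ContentInformation
    pdi: PreservationDescriptionInformation
    packaging_information: str          # how content and PDI are laid out on the medium

    def is_intact(self) -> bool:
        """Recompute the digest of the data object and compare it with the stored fixity."""
        return hashlib.sha256(self.content.data_object).hexdigest() == self.pdi.fixity


source = b'class Hello { public static void main(String[] a) { System.out.println("Hi"); } }'
package = InformationPackage(
    content=ContentInformation(source, "A book about the Java programming language"),
    pdi=PreservationDescriptionInformation(
        provenance="Deposited by the author",
        fixity=hashlib.sha256(source).hexdigest(),
    ),
    packaging_information="Hello.java in /src on the CD-ROM; PDI in /meta",
)
print(package.is_intact())  # True
```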
Descriptive Information yields information about the content of the Information Package and thus allows the Information Package to be found in the archive.
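Finding an Information Package through its Descriptive Information can be sketched as a keyword index. The `Catalogue` class, the package IDs, and the record fields are hypothetical; a real archive would use a proper catalogue or search system.

```python
from collections import defaultdict


class Catalogue:
    """Hypothetical finding aid built from Descriptive Information records."""

    def __init__(self) -> None:
        self._index: dict[str, set[str]] = defaultdict(set)

    def add(self, package_id: str, descriptive_information: dict[str, str]) -> None:
        # Index every word of every descriptive field, case-insensitively.
        for value in descriptive_information.values():
            for word in value.lower().split():
                self._index[word].add(package_id)

    def find(self, query: str) -> set[str]:
        """Return IDs of packages whose descriptive information contains all query words."""
        words = query.lower().split()
        if not words:
            return set()
        result = set(self._index.get(words[0], set()))
        for word in words[1:]:
            result &= self._index.get(word, set())
        return result


catalogue = Catalogue()
catalogue.add("AIP-001", {"title": "Hello World in Java", "creator": "Jane Doe"})
catalogue.add("AIP-002", {"title": "Annual Report 1999", "creator": "Jane Doe"})
print(sorted(catalogue.find("jane doe")))  # ['AIP-001', 'AIP-002']
print(catalogue.find("java"))              # {'AIP-001'}
```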
Modeling Context and Processes: To define the processes within the archive in more detail, the OAIS Reference Model starts by considering the context of the archive. An archive's purpose is to maintain documents that are submitted to it and that are to be made available to future users.
Producers: authors, institutions, etc. that deliver documents to the archive. Management: defines the specific purpose of the archive, e.g., which documents are to be collected and which are not.
The OAIS Reference Model distinguishes three kinds of Information Packages according to their relation to the environment of the archive: – Submission Information Packages (SIP) are sent to the archive by Producers – Archive Information Packages (AIP) are preserved in the archive – Dissemination Information Packages (DIP) are passed from the archive to Consumers.
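The relation of the three package kinds to the archive's environment can be expressed as a small table of permitted transfers. The `Kind` enum, the party names, and the `transfer_allowed` helper are hypothetical illustrations of the three statements in the list, not part of the standard.

```python
from enum import Enum


class Kind(Enum):
    SIP = "Submission Information Package"
    AIP = "Archive Information Package"
    DIP = "Dissemination Information Package"


# Permitted (sender, receiver) pair for each package kind.
ALLOWED = {
    Kind.SIP: ("Producer", "Archive"),
    Kind.AIP: ("Archive", "Archive"),   # AIPs stay inside the archive
    Kind.DIP: ("Archive", "Consumer"),
}


def transfer_allowed(kind: Kind, sender: str, receiver: str) -> bool:
    """Return True if this package kind may travel from sender to receiver."""
    return ALLOWED[kind] == (sender, receiver)


print(transfer_allowed(Kind.SIP, "Producer", "Archive"))  # True
print(transfer_allowed(Kind.AIP, "Archive", "Consumer"))  # False
```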
The Ingest process receives an SIP from the Producer and prepares it for storage and administration within the archive: the SIP must be transformed into an AIP, and Descriptive Information corresponding to the AIP has to be created. The AIP is passed on to the Archival Storage process, and the corresponding Descriptive Information to the Data Management process.
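The Ingest process can be sketched as one function that splits an incoming SIP into an AIP and its Descriptive Information and hands each to its destination. The dictionaries standing in for Archival Storage and Data Management, the field names, and the `AIP-0001` ID scheme are all hypothetical; the SHA-256 fixity value is an assumed example of PDI.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class SIP:
    producer: str
    title: str
    data: bytes


@dataclass
class AIP:
    aip_id: str
    data: bytes
    provenance: str  # part of the PDI
    fixity: str      # part of the PDI: SHA-256 digest of the data


class Archive:
    def __init__(self) -> None:
        self.archival_storage: dict[str, AIP] = {}            # stands in for Archival Storage
        self.data_management: dict[str, dict[str, str]] = {}  # stands in for Data Management

    def ingest(self, sip: SIP) -> str:
        """Turn a SIP into an AIP plus Descriptive Information and hand both on."""
        aip_id = f"AIP-{len(self.archival_storage) + 1:04d}"  # hypothetical ID scheme
        aip = AIP(
            aip_id=aip_id,
            data=sip.data,
            provenance=f"submitted by {sip.producer}",
            fixity=hashlib.sha256(sip.data).hexdigest(),
        )
        descriptive_information = {"title": sip.title, "creator": sip.producer}
        self.archival_storage[aip_id] = aip                    # to Archival Storage
        self.data_management[aip_id] = descriptive_information  # to Data Management
        return aip_id


archive = Archive()
new_id = archive.ingest(SIP(producer="University Press", title="Annual Report", data=b"report text"))
print(new_id)                                    # AIP-0001
print(archive.data_management[new_id]["title"])  # Annual Report
```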
The Data Management process manages the Descriptive Information as well as the data necessary to run the system. The Administration process handles routine work in the archive, e.g., it negotiates with Producers the prerequisites for sending documents to the archive.
DSEP Model: Deposit System for Electronic Publications. The business routine of a library can be subdivided into four domains: – acquisition of stock – capturing metadata – preservation and maintenance – providing access
The process Delivery & Capture transforms documents into SIPs conforming to the DSEP standards. The process Packaging & Delivery unpacks the DIP and transforms it into a format that can be used by the library system.
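The two DSEP processes at the library's boundary can be sketched as a pair of functions: one wrapping a delivered publication into a SIP, one unpacking a DIP for the library system. The JSON encoding, the `REQUIRED_FIELDS` list standing in for "conforming to the DSEP standards", and the library record layout are hypothetical assumptions, not the DSEP specification.

```python
import json

# Hypothetical field list standing in for "conforming to the DSEP standards".
REQUIRED_FIELDS = ("title", "publisher", "content")


def delivery_and_capture(document: dict) -> str:
    """Wrap a delivered document into a SIP, serialized as JSON."""
    missing = [name for name in REQUIRED_FIELDS if name not in document]
    if missing:
        raise ValueError(f"SIP rejected, missing fields: {missing}")
    payload = {name: document[name] for name in REQUIRED_FIELDS}
    return json.dumps({"type": "SIP", "payload": payload})


def packaging_and_delivery(dip: str) -> dict:
    """Unpack a JSON-serialized DIP into a record for a hypothetical library system."""
    package = json.loads(dip)
    if package.get("type") != "DIP":
        raise ValueError("expected a DIP")
    payload = package["payload"]
    return {"display_title": payload["title"], "publisher": payload["publisher"],
            "full_text": payload["content"]}


sip = delivery_and_capture({"title": "E-Journal 1/2001", "publisher": "Example Press",
                            "content": "Full text of the issue"})
# Stand-in for everything the archive does between receiving the SIP and issuing a DIP.
dip = json.dumps({"type": "DIP", "payload": json.loads(sip)["payload"]})
print(packaging_and_delivery(dip)["display_title"])  # E-Journal 1/2001
```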