1
Digital Preservation
2
Losing digital information hurts everyone
An NHS doctor needs a 1987 clinical study found on Google Scholar
  She tries to open the ‘dvi’ file, but can’t
A father shows his children the computer game he wrote in school
  He wrote the game in PDP assembler
  He stored the program on paper tape
A small business owner wants to market the energy-saving device she developed in 1985
  She carefully stored all of the files
  Now she doesn’t have the applications to read the documents, spreadsheets, and CAD drawings
  The CAD company is long out of business
3
Losing digital information costs opportunity
A university research lab has provided its data, technical reports, and software on-line since 1984 and on the web since
The professor retires and closes the lab in 2004
  A university IP officer wants to defend against a patent challenge
  A biographer wants to review the unpublished work
  A former student wants to revive a line of research
The digital files
  Some are damaged
  Some rely on applications that are out-of-use
  Some rely on hardware that is unavailable
  Some rely on an environment that no longer exists
  Some rely on information that no-one recorded
4
Losing digital information costs money
An oil company collected extensive data for a reservoir and wants to exploit it in 2007
  All documents and data are held in v1.3 of an integrated management product
  The company now uses v9.0 and can’t read or access them
An oilfield services company collected dipmeter data in the 1970s
  Stored on 7-track tapes
  Recorded in optimised formats
  The measurements are difficult and expensive to repeat
5
Example – Loss of Information
[Figure: screenshots labelled “Acrobat 5” and “Acrobat 7”]
6
“There is no access without preservation”
“There is no access without preservation. … Action has to be taken at the outset to ensure viable and sustainable access to digital content. … In essence, preservation decisions need to be integrated with the creation process that will determine the way in which the material is created, described and stored.” (Brindley, 2000)
“Like it or not, the digital domain will have to include a preservation component.” (Hunter, 2000, p.2)
7
A Landmark Report
Garrett, J., & Waters, D. (Eds.) (1996). Preserving digital information: Report of the Task Force on Archiving of Digital Information. Washington, DC: Commission on Preservation and Access. Available online: report.pdf
Much of the recent awareness of digital preservation issues stems from this report.
8
Key Questions
What is digital preservation?
Why is digital preservation important?
What are the big challenges?
What are the relevant standards, initiatives, programs?
9
Some Confusion in Terms
Digital reformatting (digitizing for preservation) – Using digital technology to preserve the information content of print and non-print materials. The product is a digital reproduction of the original.
Digitizing for access – Converting artifactual information into digital form so that it may be viewed by a larger community and from a distance.
Preserving digital-only information – Long-term archiving of information that exists in digital form, such as born digital data, e-journals, etc.
(Lazinger, 2001)
10
Digital Information – An Estimate
UC Berkeley’s School of Information Management and Systems: How Much Information? 2003
Analysis of the year 2002 to estimate the yearly increase of new (digital and analog) information.
Finding: 30% increase of digital information per year
See: much-info-2003/index.htm
11
Can Digitization Be Preservation?
“We digitized it, so it is preserved.” – NOT REALLY!
Comparatively, digitized materials seem inherently ephemeral (i.e., they have a relatively short life span or “shelf-life”).
Since digitization has become the mainstream of publishing and communication, if we don’t make an effort to preserve these digitized objects, they may be lost for good.
Indeed, “the origins of the digital era are probably already lost!” (Lyman & Kahle, 1998)
12
Digital Preservation – A Definition
What is meant by “digital long-term preservation” or “digital preservation”?
Definition by Ute Schwens / Hans Liegmann (DNB/nestor):
“In terms of preserving digital resources, ‘long-term’ does not mean issuing a guarantee for five or fifty years, rather the responsible development of strategies which can cope with the constant changes brought about by the information market.”
13
What Is Preservation?
“Preservation itself is primarily concerned with the survival of information in a usable form for as long as it is required. Preservation … is not just concerned with the conservation or restoration of physical artifacts, but includes all of the strategic and organization considerations that relate to the survival of information over time.” (Day, 2003, p )
“Preserving anything beyond data means that we also have to preserve other things: the communication of the information, the context of the document, the integrity of the record, and the value of the archives.” (Hunter, 2000, p.4)
14
What Is Digital Preservation?
“Digital preservation is the ability to keep digital documents and files available for time periods that can transcend technological advances without concern for alteration or loss of readability” (The Association for Information and Image Management)
“Digital preservation refers to the series of managed activities necessary to ensure continued access to and preservation of digital materials” (RLG/OCLC report)
(Both definitions were cited in Chapman, 2001)
15
What is Preservation? (cont.)
Preservation encompasses a wide variety of interrelated activities, such as policies, standards, procedures, processes, etc., designed to prolong the usable life of human artifacts. Preservation won’t happen naturally. In essence, preservation is costly and takes a lot of time and effort. Preservation is a continuous obligation.
16
What Is Digital Preservation? (cont.)
Digital preservation refers to the series of managed activities necessary to ensure continued access to digital materials for as long as necessary.
Digital preservation … refers to all of the actions required to maintain access to digital materials beyond the limits of media failure or technological change.
Long-term preservation – Continued access to digital materials, or at least to the information contained in them, indefinitely.
Medium-term preservation – Continued access to digital materials beyond changes in technology for a defined period of time but not indefinitely.
Short-term preservation – Access to digital materials either for a defined period of time while use is predicted but which does not extend beyond the foreseeable future and/or until it becomes inaccessible because of changes in technology.
(A definition adapted from the Preservation Management of Digital Materials: A Handbook, Available:
17
What Is Digital Preservation? (cont.)
Preserving digital content entails far more than making backup copies and storing them in disparate locations.
Digital preservation is … to extend the usable life of machine-readable files and protect them from media failure, physical loss, and hardware and software obsolescence. These activities include:
Ensuring the long-term maintenance of a bitstream (the zeros and ones):
  Backing up files and keeping a copy at an offsite location
  Running checks to track the deterioration of storage media, files or bitstreams
Providing continued accessibility of the contents:
  Viability – making sure that information is intact and readable from the storage media
  Renderability – making sure that information is viewable by humans and able to be processed by computers
  Understandability – making sure that information is able to be interpreted by humans.
(Hixson, 2004)
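The bitstream checks mentioned above are commonly done by recording a checksum per file and re-computing it later. The following is a minimal sketch of that idea, not a method taken from Hixson (2004); the function names `build_manifest` and `audit` and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (relative POSIX path) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def audit(root, manifest):
    """Compare the files under root with a manifest recorded earlier.

    Hypothetical helper: reports files that disappeared and files whose
    bitstream no longer matches the recorded checksum.
    """
    current = build_manifest(root)
    missing = sorted(name for name in manifest if name not in current)
    changed = sorted(
        name for name in manifest
        if name in current and current[name] != manifest[name]
    )
    return {"missing": missing, "changed": changed}
```

A manifest built at ingest can be stored alongside the offsite copy and audited on a schedule; a non-empty `changed` list signals that a copy needs to be restored from another location.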
18
Digital Preservation – The Challenge
Hardware and software become obsolete within a very short period of time
Incompatibility of different versions of hard- and software
Fading knowledge of how to use older hard- and software
Aging and decaying storage media
19
How big is the problem?
Who is touched by digital preservation problems?
  Individual consumers
  Small and medium sized enterprises
  Large corporations
  University libraries, faculties, institutes
  Publishers
  Libraries
  Local, regional, national governments
  … every person or organisation that keeps digital material for more than 15 years!
Estimates suggest Europe loses €3bn per year in business value
20
Digital Resources
New forms of information:
  digital production (digitization, born digital, only digital)
  digital publication (only digital, object features like retrieval)
  digital distribution (portal, value chain)
Rapid change of technology
21
Digital Long-Term Preservation
Digital Preservation consists of processes that ensure that digital objects remain accessible, (re-)usable and understandable in the future. Digital Preservation has to ensure that future software and hardware tools retain the authenticity, integrity, and reliability of the digital object.
23
What Digital Materials Should Be Preserved? (1)
“The first act of preservation of any material … is identifying or selecting which material merits the effort of preservation. … With digital information, the time frame within which selection choices must be made is shortened due to the problems of media instability and technological obsolescence. … Digital resources require decisions not only about what items to keep but also about which elements of the resource should be preserved. …This may involve saving not only the resource, but also contextual information such as information on the medium and metadata about the contents.” (Lazinger, 2000, p.17)
24
What Digital Materials Should Be Preserved? (2)
“Not everything can be preserved, and that a selection process is necessary and unavoidable. The enormous amount of digital information which exists, and the ease with which it can be created or changed makes selection criteria even more essential, but in a way even more difficult. What should those selection criteria be? Can we be sure that what is selected for preservation now will be what is required in the future?” (Gould & Varlamoff, 2000, p.4)
25
What Content Should Be Preserved? An Example
For an e-journal, what content should be preserved?
While articles are the intellectual core of e-journals, the journals also contain many other kinds of information, such as:
  Editorial board
  Rights and usage terms
  Copyright statement
  Journal description
  Advertisements
  Reprint information
  Editorials
  Events lists
  Errata
  Conference announcements
  Various sorts of digital files related to individual articles (data sets, images, tables, videos, models, etc.)
Which of these need to be archived and preserved for the future? This should be carefully considered by the institutions with a mandate to preserve.
(Flecker, 2002, p.19)
26
Heterogeneity - Materials
Journals and monographs
  retrodigitized material
  genuine digital material
Web documents, web servers
Preprint servers, theses, e-proceedings, etc.
Primary data, research data, raw data
E-mails, blogs, etc.
Film, music, multimedia, etc.
...
27
Heterogeneity: Formats
Depends on subject, e.g.
  Mathematics (TEX, PS, ...)
  Geography (GIS)
  ...
Multimedia, e.g.
  Animated WWW pages
  Interactive objects in e-Learning
Different versions in e.g. PDF, TEX, ...
Presentation Format / Preservation Format
28
Heterogeneity - General
Metadata formats (Dublin Core, MODS, PREMIS, MIX, ...)
Exchange formats (XML, METS, XML/RDF, SOAP, ...)
Controlled vocabulary systems (Ontologies, Taxonomies, ...)
Architecture, Protocols ...
Standardisation & Interoperability
29
Dealing with the Heterogeneity
Preservation policy
Cooperation: international/national
Cooperation: cross-domain (e.g. museums, archives, research institutes, commercials, ...)
Redundancy of digital repositories explicitly desired
Cooperative management/administration of distributed digital archives/repositories
30
... Coordinated cooperation needed between:
  producers of digital objects (e.g. scientists)
  providers (e.g. libraries)
  distributors (e.g. publishers, hosts of db)
Use of international standards (e.g. DC, OAI, OAIS, METS)
31
Who Should Be Responsible for Digital Preservation? (1)
Creators
Publishers
Distributors
System administrators
Libraries
Archives
Users
(Lazinger, 2000, pp.49-53)
32
Who Should Be Responsible for Digital Preservation? (2)
“There is some agreement that it is unfair of libraries to expect publishers to begin to take on the role of archiving when they have never done so before, but similarly publishers cannot expect libraries to preserve material which they do not own and do not have long term access to. There is good reason to expect licensing agreements between publishers and libraries to change in due course to take account of this dilemma.” (Gould & Varlamoff, 2000, p.7)
33
How Can Digital Information Be Preserved?
Roughly speaking, there are four main technical options, but none of them provides a perfect solution:
  Medium refreshing – copying digital files from one storage medium to another medium
  Data migration – transfer of digital materials from one hardware/software configuration to another
  Technology preservation – preserving an information object together with all of the hardware and software needed to interpret it
  Software emulation – software designed to perform the functions of obsolete hardware and other software
(Day, 2003, p. 180; Lazinger, 2000, pp.76-77)
Print to Paper or Microfilm/fiche or barcode
34
Media Renewal / Refreshing
A common response to the medium deterioration problem is known as media renewal or media refreshing.
There is a need to clarify the following terms:
  Copying – transferring data from old storage media to new storage media with the same format specification.
  Reformatting – there is a change to the underlying bit stream, but there is no change in the intellectual content of the records.
  Converting – data export or import from one software environment to another.
  Migrating – usually involves moving data from proprietary legacy systems that lack software functionality to open systems.
(Hunter, 2000, pp.57-58)
Media refreshing may solve the problem of medium deterioration, but it won’t be able to solve the problem of technology obsolescence.
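Copying in the sense above leaves the bit stream unchanged, so a refresh can be verified by comparing old and new files byte for byte. A minimal sketch, assuming both media are mounted as directories; the function name `refresh` is hypothetical.

```python
import filecmp
import shutil
from pathlib import Path


def refresh(source_dir, target_dir):
    """Copy every file from the old medium to the new one and verify it.

    Returns the relative paths of files whose copy is not bit-identical
    to the original (an empty list means the refresh succeeded).
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    failures = []
    for source in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        target = target_dir / source.relative_to(source_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copies content and timestamps
        # shallow=False forces a byte-by-byte comparison of the contents
        if not filecmp.cmp(source, target, shallow=False):
            failures.append(source.relative_to(source_dir).as_posix())
    return failures
```

The sketch only addresses medium deterioration: the copied files are still in their original formats, so format obsolescence is untouched.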
35
Data Migration
Data migration involves the transfer of the entire technology environment, usually from one generation of computer technology to a subsequent one, not just the physical storage medium.
Data migration is a more tried and tested preservation strategy.
Data migration is labor-intensive and it’s difficult to predict what it will involve.
36
Developing Migration Strategies
Less stable to more stable media
Highly software-dependent formats to less software-intensive formats
Multiplicity of formats to smaller number of common formats
Development of backward compatibility paths
Standard development and imposition
Development of process centers for migration and reformatting
(Lazinger, 2000, pp.79-83)
37
Technology Preservation
This approach intends to retain the needed hardware and software.
This approach “may have an important role for the recovery of data from obsolete storage media and platforms, but it is unlikely to become a viable long-term strategy.” (Day, 2003, p.180)
This approach may end up with “a museum of ageing and incompatible computer hardware.” (Feeney, 1999, p.42)
38
Software Emulation - “Mimicking” obsolescent hardware in software
This approach is based on “the premise that the best way to preserve the functionality and ‘look and feel’ of digital resources would be to preserve the original software and then to run this on emulators that would mimic the behavior of the obsolete hardware and operating systems.” This approach requires “the encapsulation of a digital data object together with the application software used to create or interpret it and a description of the required hardware environment that could be used as a specification for an emulator.” (Day, 2003, p. 181)
39
Software Emulation (cont.)
The emulation approach involves encapsulating three kinds of information with each digital document:
  The document and its software environment, including the document itself, application software, and operating system
  The specification of an emulator for the document’s original hardware platform
  Explanatory material, labeling information, annotations, metadata about the document and its history, and documentation for the software and emulated hardware included in the encapsulation.
(Lazinger, 2000, pp.84-85)
41
Open Archival Information System (OAIS) Reference Model
The OAIS Reference Model is a conceptual framework for an archival system dedicated to preserving and maintaining access to digital information over the long term.
The reference model has been well received by a diverse community of institutions. Many current digital archiving projects are based on OAIS, such as CEDARS, PANDORA, and NEDLIB.
The reference model defines roles for three players in archiving: creators (producers), archive operators (management), and consumers (designated users). These three players and the archive itself constitute the OAIS environment.
The model also specifies the functional components of a complete digital archiving system.
More information:
42
The reference model:
  provides a framework for the understanding and increased awareness of archival.
  provides the concepts needed by non-archival organizations to be effective participants in the preservation process.
43
OAIS Reference Model Source: Lavoie (2004)
44
OAIS Reference Model
[Figure: OAIS functional entities (Ingest, Archival storage, Data management, Administration, Preservation Planning, Access) between producer and consumer, exchanging three package types: SIP (Submission Information Package), AIP (Archival Information Package), DIP (Dissemination Information Package)]
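The package flow in the diagram can be illustrated with a toy archive: a producer submits a SIP, ingest turns it into an AIP held in archival storage, and access hands a DIP to the consumer. This is only a sketch of the roles; OAIS is a conceptual model and prescribes no code, so every class, field, and method name below is an assumption.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SubmissionPackage:  # SIP: what the producer hands over
    producer: str
    filename: str
    content: bytes


@dataclass(frozen=True)
class ArchivalPackage:  # AIP: what the archive keeps
    identifier: str
    filename: str
    content: bytes
    sha256: str
    provenance: tuple


@dataclass(frozen=True)
class DisseminationPackage:  # DIP: what the consumer receives
    identifier: str
    filename: str
    content: bytes


class Archive:
    """Toy archive with in-memory storage; for illustration only."""

    def __init__(self):
        self._storage = {}    # stands in for archival storage
        self._catalogue = {}  # stands in for data management

    def ingest(self, sip):
        """Turn a SIP into an AIP, store it, and return its identifier."""
        digest = hashlib.sha256(sip.content).hexdigest()
        identifier = f"aip-{digest[:12]}"
        provenance = (f"received from {sip.producer}",)
        self._storage[identifier] = ArchivalPackage(
            identifier, sip.filename, sip.content, digest, provenance
        )
        self._catalogue[sip.filename] = identifier
        return identifier

    def access(self, filename):
        """Look up an AIP, re-check its checksum, and return a DIP."""
        aip = self._storage[self._catalogue[filename]]
        if hashlib.sha256(aip.content).hexdigest() != aip.sha256:
            raise ValueError(f"stored content of {filename} is corrupted")
        return DisseminationPackage(aip.identifier, aip.filename, aip.content)
```

Administration and preservation planning are left out; in a fuller sketch they would decide, for example, when a stored AIP should be migrated to a newer format.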
45
OAIS Reference Model Source: http://www. oclc
46
Relevant Aspects
Technical Issues / Obsolescence
Identification & Validation of Formats
Preservation Metadata
Preservation Policy
Legal Aspects
Trusted Repositories
47
Preservation Metadata
All preservation strategies (migration, emulation, etc.) depend on the creation, capture and maintenance of suitable metadata:
"Preserving the right metadata is key to preserving digital objects" (ERPANET Briefing Paper, 2003)
"It's all about metadata" (Cedars project manager, ca )
48
Preservation Metadata
Specific preservation metadata are necessary to ensure that information can be accessed in the future, e.g. metadata about:
  Provenance
  Structure
  File Format(s)
  Technical Environment
  Rights
Much of the necessary metadata can be extracted automatically, e.g. via tools like JHOVE
49
Preservation Metadata
Preservation metadata is information that supports and documents the long-term preservation of digital materials. It addresses the following attributes of an archived digital object:
  provenance - documenting the custodial history of the object
  authenticity - validating that the digital object is in fact what it purports to be, and has not been altered in an undocumented way
  preservation activity - documenting the actions taken to preserve the digital object, and any consequences of these actions that impact its look, feel, or functionality
  technical environment - describing the technical requirements, such as hardware and software, needed to render and use the digital objects
  rights management - recording any binding intellectual property rights that may limit the repository’s ability to preserve and disseminate the digital object over time
(Lavoie & Gartner, 2005, p.2)
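The five attributes can be pictured as fields of one record kept with each archived object. The sketch below is illustrative only and does not follow the PREMIS data dictionary or any other published schema; the class name, field names, and the sample values in the usage note are assumptions.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class PreservationRecord:
    """One record per archived object, mirroring the five attributes above."""

    object_id: str
    provenance: list = field(default_factory=list)             # custodial history
    authenticity: dict = field(default_factory=dict)           # e.g. checksums
    preservation_activity: list = field(default_factory=list)  # actions taken
    technical_environment: dict = field(default_factory=dict)  # needed hard-/software
    rights: list = field(default_factory=list)                 # binding IP rights

    def log_activity(self, action, consequence):
        """Document a preservation action and its effect on the object."""
        self.preservation_activity.append(
            {"action": action, "consequence": consequence}
        )

    def to_json(self):
        """Serialize the record so it can be stored next to the object."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

For example, `record = PreservationRecord(object_id="obj-1")` followed by `record.log_activity("migrated TIFF to PNG", "embedded colour profile dropped")` documents a hypothetical migration and its consequence for the object's appearance.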
50
Preservation Metadata (cont.)
Preservation metadata helps make an archived digital object self-documenting over time, even as the intellectual, economic, legal, and technical environments surrounding the object are in a constant state of change. (Lavoie & Gartner, 2005, p.2)
Examples:
  PREMIS (Preservation Metadata: Implementation Strategies)
  CEDARS Guide to Preservation Metadata
  National Library of Australia. Preservation Metadata for Digital Collections
51
Trusted Digital Repositories
A trusted digital repository is one whose mission is to provide reliable, long-term access to managed digital resources to its designated community, now and in the future. (Trusted Digital Repositories: Attributes and Responsibilities, p.i)
More information:
  Trusted Digital Repositories: Attributes and Responsibilities. An RLG-OCLC report (May 2002)
  Minimum Criteria for an Archival Repository of Digital Scholarly Journals (May 15, 2000)
52
Economic Infrastructure
Digital preservation relies heavily upon continual investment and long-term administrative support.
Building an economic infrastructure for digital preservation involves:
  Identifying preservation responsibilities and allocating them appropriately
  Providing sufficient incentives to encourage people to carry out these responsibilities
  Leveraging infrastructure in support of digital preservation, eliminating redundancies and using scarce resources effectively
More information:
53
Technical Issues / Obsolescence
Digital information is stored as a bit stream on physical media => Preservation of the bit stream!
  Storage media types change quickly and are subject to obsolescence
  Storage media are unstable and can degrade quickly
Keeping the bit stream accessible
  Migration (Medium and Format)
  Emulation (Hard- and Software)
  ...
54
Formats: Identification & Validation
Examples:
  Document - DOC, HTML
  Raster Images - TIFF, PNG, JPEG
  Structured graphics - CAD, VSD
  Audio - WAV, MP3, MIDI
  Video - MPEG, AVI
  Databases - DBF, MDB
  Raw data
  Collections - tar, zip
  …
We are dealing with lots of different formats!
File format registries may help to handle the heterogeneity.
55
File Format Registries: Use Cases
Identification: I have a digital object; what format is it?
Validation: I have an object purportedly of format F; is it?
Transformation: I have an object of format F, but need G; how can I produce it?
Characterization: I have an object of format F; what are its features?
Risk assessment: I have an object of format F; is it at risk of obsolescence?
Delivery: I have an object of format F; how can I render it?
(Abrams, Seaman: Towards a global digital format registry. IFLA 2003)
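The identification use case is often approached by matching the first bytes of a file against known signatures ("magic numbers"). A minimal sketch with a handful of well-known signatures follows; the table and function names are assumptions, and registries such as PRONOM hold far richer signature data than this.

```python
# A tiny, hand-written signature table; real format registries are far larger.
SIGNATURES = [
    ("PDF", b"%PDF-"),
    ("PNG", b"\x89PNG\r\n\x1a\n"),
    ("JPEG", b"\xff\xd8\xff"),
    ("TIFF", b"II*\x00"),  # little-endian TIFF
    ("TIFF", b"MM\x00*"),  # big-endian TIFF
    ("ZIP", b"PK\x03\x04"),
]


def identify(data):
    """Identification: return the format name for the leading bytes, or None."""
    for name, magic in SIGNATURES:
        if data.startswith(magic):
            return name
    return None


def matches_claimed_format(data, claimed):
    """A first step towards validation: does the signature fit the claim?

    This checks the signature only. Full validation, as done by tools
    like JHOVE, also checks that the whole file is well-formed.
    """
    return identify(data) == claimed.upper()
```

For instance, `identify(b"%PDF-1.4")` yields `"PDF"`, while a file named `scan.tiff` that starts with the PNG signature would fail `matches_claimed_format(data, "tiff")`.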
56
Format validation with JHOVE
JSTOR/Harvard Object Validation Environment see:
The concept of representation format, or type, permeates all technical areas of digital repositories. Policy and processing decisions regarding object ingest, storage, access, and preservation are frequently conditioned on a per-format basis. In order to achieve necessary operational efficiencies, repositories need to be able to automate these procedures to the fullest extent possible.
How much technical metadata do I need?
57
Preservation Policy
What do you want to preserve?
Why do you want to preserve?
How do you want to render an object in the future?
Furthermore ...
  Documentation
  Policy for short-term preservation
  Policy for long-term preservation
  …
58
Preservation Policy
What kind of digital objects is the repository responsible for?
  Fixed format texts, images, web resources, complex digital objects, datasets, …
What do you want to render in the future?
  Keep the original? What is the original?
  Offer extended functionalities?
59
Preservation Policy
What are the significant properties of the object?
  Appearance (layout, colour, font size, etc)
  Behaviour (functionality, interaction, etc)
  Structure (chapter, section, etc)
  Content (text, video, audio, etc)
  Context (cross-references, etc)
How do you want to provide access?
  Designated User Community
  Options for the user?
60
But Policies/Strategies are not enough ...
… we need tools that
  help choose & perform a strategy
  make the strategy possible (emulators, migration tools)
  maintain the link between originals and conversions
  enable interoperability and co-operation between different repositories/archives
Tools have to be implemented in the archiving system and archiving workflow.
Preservation has to come to practice!
61
Legal Aspects
Copyright and other intellectual property rights (IPR) have a substantial impact on digital preservation
Preservation of digital materials is dependent on a range of strategies, which has implications for IPR in those materials
Consideration may need to be given not only to content but to any associated software
Specific permissions may be very challenging, e.g. for web archiving or digital art
62
..., but
Digital preservation is often a legal grey area not yet understood or considered by legislators
Lack of legal certainty should not prevent digital preservation actions
Take action to manage risks
63
Examples of Current Initiatives and Projects
64
Organizations / Cooperatives
Digital Library Federation (DLF)
Council on Library and Information Resources (CLIR)
Electronic Media Group (EMG)
The Joint Information Systems Committee (JISC)
UKOLN
Electronic Resource Preservation and Access Network (ERPANET)
Digital Preservation Coalition
65
Initiatives, Programs & Projects
The International Research on Permanent Authentic Records in Electronic Systems (InterPARES)
Library of Congress. National Digital Information Infrastructure and Preservation Program (NDIIPP)
U. S. National Archives. Electronic Records Archive (ERA)
Canadian Initiative on Digital Libraries
PADI (Preserving Access to Digital Information)
PANDAS: PANDORA Digital Archiving System
The Joint Information Systems Committee (JISC). Digital Preservation and Record Management.
66
Initiatives, Programs & Projects (cont.)
Archiving Electronic Journals. Research Funded by the Andrew W. Mellon Foundation.
CAMiLEON Project (Creative Archiving at Michigan & Leeds: Emulating the Old on the New)
LOCKSS
Florida Center for Library Automation (FCLA). DAITSS (Dark Archive In The Sunshine State)
Internet Archive
DiVA: Digital Scientific Archive portal.org/index.xsql?lang=en
The Nordic Web Archive (NWA)
67
Tools and Services
JHOVE: JSTOR/Harvard Object Validation Environment
U.K. National Archives. PRONOM: The File Format Registry
University of Pennsylvania. Typed Object Model (TOM)
San Diego Supercomputer Center (SDSC). Storage Resource Broker (SRB)
Global Digital Format Registry (GDFR)
68
References
Brindley, L. (2000). Preservation 2000: Keynote speech, in Preservation 2000: An international conference on the preservation and long term accessibility of digital materials. Available:
Chapman, S. (2001). What is digital preservation? Paper presented at the Digital Preservation Resources Symposium. Retrieved September 5, 2005, from preservation/chapman.htm
Conway, P. (1996). Preservation in the digital world. Washington, DC: Commission on Preservation and Access.
Cornell University Libraries. Digital preservation management tutorial. Retrieved September 3, 2005, from
Council on Library and Information Resources, Digital Library Federation, and Coalition for Networked Information (2000). Minimum criteria for an archival repository of digital scholarly journals (version 1.2). Retrieved September 7, 2005, from
Croft, J. A. (2003). The preservation evolution: A review of preservation literature. Library Resources & Technical Services, 47(2):
Day, M. (2003). Digital preservation and long-term access to the content of electronic serials. In W. Jones (Ed.), E-serials: Publishers, libraries, users, and standards (2nd ed.) (pp ). Binghamton, NY: Haworth.
69
References (cont.)
Feeney, M. (Ed.) (1999). Digital culture: Maximizing the nation’s investment. London: National Preservation Office.
Flecker, D. (2001). Preserving scholarly e-journals. D-Lib Magazine, 7(9). Retrieved August 13, 2005, from
Flecker, D. (2002). Preserving digital periodicals. In Building a national strategy for digital preservation: Issues in digital media archiving (pp ). Washington, DC: Council on Library and Information Resources.
Garrett, J., & Waters, D. (Eds.) (1996). Preserving digital information: Report of the Task Force on Archiving of Digital Information. Washington, DC: Commission on Preservation and Access. Retrieved July 7, 2005, from
Gould, S., & Varlamoff, M-T. (2000). The preservation of digitized collections: Recent progress and persistent challenges world-wide. In R.W. Manning & V. Kremp (comp.), A reader in preservation and conservation (pp.1-12). München: Saur.
Graham, P. S. (1997). Building the digital research library: Preservation and access at the heart of scholarship. Retrieved August 15, 2005, from
Graham, P. S. (1998). Long-term intellectual preservation. Collection Management, 22(3/4):
Hixson, C. (2004). Digital preservation at the UO Libraries: A snapshot. Retrieved September 15, 2005, from
Hunter, G. S. (2000). Preserving digital information: A how-to-do-it manual. New York: Neal-Schuman Publishers.
70
References (cont.)
Lavoie, B. F. (Ed.). (2001). Preservation metadata for digital objects: A review of the state of the art. White paper by the OCLC/RLG working group on preservation metadata. OCLC Research publication. Retrieved August 30, 2005, from
Lavoie, B. F. (2004). The open archival information system reference model: Introductory guide. A joint report of the Digital Preservation Coalition (DPC) and OCLC, published electronically as a DPC Technology Report. Retrieved September 2, 2005, from
Lavoie, B. F., & Dempsey, L. (2004). Thirteen ways of looking at ... digital preservation. D-Lib Magazine, 10(7/8). Retrieved August 13, 2005, from
Lavoie, B. F., & Gartner, R. (2005). Preservation Metadata. A joint report of OCLC, Oxford Library Services, and the Digital Preservation Coalition (DPC), published electronically as a DPC Technology Watch Report (No ). Retrieved September 13, 2005, from
Lazinger, S. S. (2001). Digital preservation and metadata: History, theory, practice. Englewood, Colo.: Libraries Unlimited.
Lyman, P., & Kahle, B. (1998). Archiving digital cultural artifacts. D-Lib Magazine, 4(7/8). Retrieved August 13, 2005, from
Marcum, D. (1997). A moral and legal obligation: Preservation in the digital age. International Information and Library Review, 29(3/4):
71
Exercise
Which Digital Preservation issues are relevant in the context of your Digital Collection? How are they relevant?
  Data creation?
  Data management (collection management)?
  Data storage?
  Data documentation and description?
  Data preservation?
  Data use?
  Rights management?
  ...
Try to describe a digital preservation Framework for your institution.