Planning for Digital Preservation
Planning for Preservation Digital preservation issues come up much faster than traditional preservation issues Digital resources need on-going attention Build a preservation strategy into your project from the start Keep dealing with the short-term issues and you won't ever need to face the long-term problem
Issues The content of digital resources is only accessible with the aid of intermediary technologies Digital resources are complex Reliance on a specific combination of formats, software and hardware to operate correctly I.T. develops rapidly, and resources can become obsolete very quickly
Three Key Areas Content – the bits and bytes Technologies: software systems; hardware; websites, access and delivery systems Organisational
Planning for the Future Short-term: Initial technology still current and actively supported - 0 - 5 years Medium-term: Initial technology still in use and supported, but no longer used for new work - 5-10 years Long-term: Initial technology no longer used or supported - 10+ years
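As a rough illustration of the three horizons above, the sketch below maps the age of a resource's initial technology to a horizon. The function name is hypothetical, and because the slide's ranges overlap at 5 and 10 years, the choice to put exactly 5 years in medium-term and exactly 10 years in long-term is an assumption. The slide defines the horizons by whether the technology is still current and supported, so the year counts are only a guide.

```python
def preservation_horizon(years_since_creation: float) -> str:
    """Map the age of the initial technology (in years) to a planning horizon.

    Boundary handling is an assumption: the slide's ranges overlap at 5 and 10.
    """
    if years_since_creation < 0:
        raise ValueError("age cannot be negative")
    if years_since_creation < 5:
        return "short-term"   # technology still current and actively supported
    if years_since_creation < 10:
        return "medium-term"  # still in use, but no longer used for new work
    return "long-term"        # no longer used or supported
```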
….. In the Short Term Making digital assets available Website administration Website updates Software and operating system patches Periodic backups Periodic checks on master copies
…….. In the Medium Term Keeping your existing digital outputs up and running Upgrading operating systems and software Upgrading hardware Replacing hardware components Refreshing master copies Periodic backups Periodic checks on master copies
…….. In the Longer Term Overcoming technological obsolescence to preserve a usable digital resource Introducing completely new software Replacing entire hardware systems Enhancing functionality Periodic backups Periodic checks on master copies
During the Data Creation Phase Importance of backups Preferably more than one copy, on and off site Appropriate frequency More than one file format Check your backups But backup is not preservation!
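One way to "check your backups" is to compare checksums of the master copies against the backup copies. The sketch below is a minimal example, assuming the backup mirrors the master directory layout; the function names and the choice of SHA-256 are illustrative and not something the presentation prescribes.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(master_dir: Path, backup_dir: Path) -> list[str]:
    """List files that are missing from the backup or differ from the master."""
    problems = []
    for master_file in sorted(master_dir.rglob("*")):
        if not master_file.is_file():
            continue
        relative = master_file.relative_to(master_dir)
        backup_file = backup_dir / relative
        if not backup_file.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(master_file) != sha256_of(backup_file):
            problems.append(f"differs: {relative.as_posix()}")
    return problems
```

An empty list means every master file has an identical copy in the backup. It does not mean the resource is preserved, only that the backup is intact.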
What to Preserve? Significant Characteristics Very difficult to preserve everything (data, functionality and interaction) about a digital resource Documented or commonly understood significant characteristics help simplify preservation action
Analogue…… Book - Significant: Words, paragraphs, chapters, author, publication date, … Not Significant: Binding, print run, font, colour of paper, … Newspapers - Significant: Words, paragraphs, headlines, size of type, date, page number of article, … Not Significant: Size of page, spacing, text justification, colour of paper, …
Digital……… There is a shared understanding of what is important in a paper-based resource Less agreement about what is important in a digital resource Complicated to decide as software and formats support many options that are not knowingly used but have default settings
Questions to ask…. What are the significant characteristics of your digital outputs? What are the digital objects that make up your resource? What is the purpose of your digital resource? Think about the problem in terms of content and purpose Very difficult (if not impossible) to ensure your resource stays exactly the same in the future What can change without adverse effects? What changes must be limited, and by how much? How can you check changes are acceptable?
Assessing the scale of the Preservation Task Estimating volume and type: Textual Documents Still Images Moving Images Audio files Numeric dataset Database Markup Documents (XML etc.) CAD GIS Virtual reality Website Software executable
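Estimating volume and type can start with a simple survey of the files on disk. The sketch below counts files and bytes per category; the extension-to-category mapping is a hypothetical starting point that would need extending for a real collection, and file extensions are only a rough indicator of the actual format.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical mapping from file extension to the data types listed on the slide.
CATEGORY_BY_EXTENSION = {
    ".txt": "Textual documents",
    ".pdf": "Textual documents",
    ".tif": "Still images",
    ".jpg": "Still images",
    ".mpg": "Moving images",
    ".wav": "Audio files",
    ".csv": "Numeric dataset",
    ".xml": "Markup documents",
    ".html": "Website",
}


def survey(root: Path) -> dict[str, tuple[int, int]]:
    """Return {category: (file count, total bytes)} for every file under root."""
    totals = defaultdict(lambda: [0, 0])
    for path in root.rglob("*"):
        if path.is_file():
            category = CATEGORY_BY_EXTENSION.get(path.suffix.lower(), "Unclassified")
            totals[category][0] += 1
            totals[category][1] += path.stat().st_size
    return {name: (count, size) for name, (count, size) in totals.items()}
```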
Risk Assessment for file formats used Review data types and file formats Assess the risks associated with those file formats Establish policy for dealing with them
Preservation Metadata Metadata needed to manage preservation of digital collections: technical; administrative Not necessarily a complete set of preservation metadata elements Possible sources: OCLC/RLG Working Group; the Consultative Committee for Space Data Systems; CEDARS project; The UK National Archives (formerly the Public Record Office); Arts and Humanities Data Service; NEDLIB project; California Digital Library; Harvard University Library
File Structure Create an overview of the file structure Create a list of all files Create a logical file strategy from the outset Choose consistent filenames Avoid re-using the same filename, even in separate folders. Store files in a logical order with systems and contents files kept apart. Summary of contents may be included with each file. Keep a record of encryption keys – important for preservation.
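Two of the points above, listing all files and avoiding re-used filenames, are easy to automate. The following is a minimal sketch; the function names are hypothetical, and treating filenames as case-insensitive is an assumption made so that the check also holds on case-insensitive file systems.

```python
from collections import defaultdict
from pathlib import Path


def list_files(root: Path) -> list[str]:
    """Return every file under root as a sorted list of relative paths."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


def reused_filenames(root: Path) -> dict[str, list[str]]:
    """Return filenames that occur more than once, with the paths that use them."""
    by_name = defaultdict(list)
    for relative in list_files(root):
        # Assumption: compare names case-insensitively.
        by_name[Path(relative).name.lower()].append(relative)
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}
```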
Preservation Strategies: Content Migration: convert the data to work with new applications Emulation: convert the data, application (and operating system) to work on new hardware Technology preservation: Keep everything running Virtual computing: create a standard virtual runtime environment Migration on demand: convert original format directly into up-to-date format
Theory ----- Practice In practice, migration is the simplest and most common approach Limitations of migration are: Can be difficult to ensure accurate migration Does not capture functionality, only (possibly partial) data May need to be repeated frequently Might lead to mutation over time
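Because accurate migration is hard to guarantee, it can help to test a documented significant characteristic before and after conversion. The sketch below assumes the significant characteristic is the sequence of words in a text, and that spacing, line breaks and punctuation are not significant; both the assumption and the function names are illustrative only.

```python
import re


def significant_text(content: str) -> list[str]:
    """Reduce a text to its word sequence, ignoring spacing and punctuation."""
    return re.findall(r"\w+", content)


def migration_preserves_words(original: str, migrated: str) -> bool:
    """True if the migrated text contains the same words in the same order."""
    return significant_text(original) == significant_text(migrated)
```

A check like this only covers the characteristic it encodes. Functionality and interaction, which migration may not capture, need separate checks.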
Migrating to new standards – but which one? "The good thing about standards is that there are so many to choose from" (A. Tanenbaum) Quicktime 1.0 (1992); MPEG-1 (1992); Real Media (1995); MPEG-2 (1996); RealVideo (1997); MPEG-4 (1999); Quicktime 5.0 (1999); Active Streaming Format (1999); DIVX 5.0 (2002) The number of A/V de-facto standard formats has exploded in the past five years, and this does not cover the dozens of audio and video codec combinations!
Measuring Longevity of Standard Who developed it? Microsoft, Moving Picture Experts Group, etc. Has it received mainstream support? Can your hardware save data in that format? What organisations are using it? Is it used in industry? Is it widely accepted by the professional and amateur community? Technology watch – check web sites, developer forums and newsgroups. Has it been submitted as an ISO standard?
Measuring Longevity of Standard Are there any legal actions to change the standard? Is there a licensing fee? What tools are available to create and manipulate the format? Open source vs. proprietary PRONOM – The National Archives' database of 250 software products, 550 file formats and 100 manufacturers Can I execute these tools on my computer? Java, Windows-only, Mac-only
Choosing a Suitable Migration Path What are the main features? Small file size, streaming support Will it support your specialist needs? Subtitles, DRM, Internet delivery, etc. Does it provide sufficient quality? Lossless vs. lossy compression. Will it impose any restrictions on use? Can it actually be played by your target audience? Is the standard stable or does it change frequently? How will this affect your desire to use the format?
Migration problems Have you encountered any problems when accessing these files in other applications? Quirks (text not displaying, desynchronised audio/video, upside-down video playback). Version incompatibilities Migrating to other formats Are there any other problems when exporting to other formats? E.g. lossless-to-lossless conversion, uneditable output Document quirks & incompatibilities for later.
Updating Hardware Hardware has changed dramatically in the last 3 years Memory – DDR vs. SD-RAM CPU – pin compatibility Graphics cards – AGP 2x, 4x, 8x Operating system – will Windows NT4/98 run on newer hardware? Do you upgrade existing hardware or replace it with new equipment?
Updating Software Software changes on a frequent basis Four service packs available for Windows 2000. Microsoft issues 3 patches per week on average. Legal action forces changes to plugin handling. In addition, there are an estimated 20 unpatched vulnerabilities in Internet Explorer alone (PivX Solutions). Do you upgrade to a later operating system or continue to use an operating system & software with known security flaws?
Preserving Your Website: technical issues Standards And Formats Has the Web site been designed using open standards, which should help future-proofing? Have proprietary formats been used (for which backwards compatibility may not be considered)? Architecture & Implementation Has the technical architecture of the Web site been documented? Can you continue to use technical systems after funding has finished?
Preserving Your Website: content issues Accuracy: Is the content of the Web site accurate today? Who will make changes, and how? Could the content of the Web site be misleading in the future? Usability: Maintaining links – short, medium and long term Legal: Is the Web site legal (accessibility; copyright; defamation; IPR; …)? Will the Web site be legal tomorrow, if new legislation is enacted? How will you know – who will make necessary changes?
Maintaining a Website Run a link check across the Web site. Fix broken internal links and as many external links as is reasonable. Document the link report. Run HTML (and CSS) validation checks across the Web site. Fix as many invalid pages as is reasonable. Document the findings. Run an accessibility check across the Web site. Fix as many inaccessible pages as is reasonable. Document the findings.
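The link check on internal links can be sketched with the standard library alone. The example below assumes a static site stored as `.html` files under one directory; it skips external links (which would need network requests) and same-page anchors. The names `broken_internal_links` and `_LinkCollector` are hypothetical.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlparse


class _LinkCollector(HTMLParser):
    """Collect the values of href and src attributes from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def broken_internal_links(site_root: Path) -> list[tuple[str, str]]:
    """Return (page, link) pairs where an internal link points at no file."""
    broken = []
    for page in sorted(site_root.rglob("*.html")):
        collector = _LinkCollector()
        collector.feed(page.read_text(encoding="utf-8", errors="replace"))
        for link in collector.links:
            parsed = urlparse(link)
            target = unquote(parsed.path)
            if parsed.scheme or parsed.netloc or not target:
                continue  # external link, mailto:, or same-page anchor
            if target.startswith("/"):
                resolved = site_root / target.lstrip("/")
            else:
                resolved = page.parent / target
            if not resolved.exists():
                broken.append((page.relative_to(site_root).as_posix(), link))
    return broken
```

The returned list can be saved alongside the date of the check as the documented link report.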
Maintaining a Website Address technical areas: Remove any backend scripts which are no longer needed Remember that scripts, etc. are liable to go wrong. Ensure that applications are configured to break gracefully and provide meaningful errors – tell users who to contact if they find an error
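"Breaking gracefully" can be as simple as wrapping each backend script so that a failure is logged for the maintainer and the user sees a message that says who to contact. The sketch below is one possible shape; the contact address, logger name and function name are all hypothetical.

```python
import logging

logger = logging.getLogger("site")
CONTACT = "webmaster@example.org"  # hypothetical contact address


def run_gracefully(script, *args):
    """Run a backend script; on failure, log it and return a helpful message."""
    try:
        return script(*args)
    except Exception:
        # Keep the technical detail in the log, not on the user's screen.
        logger.exception("Script %s failed", getattr(script, "__name__", script))
        return (
            "Sorry, this page could not be generated. "
            f"Please report the problem to {CONTACT}."
        )
```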
Procedures framework From start to finish: Creation and Management Manuals within Procedures Framework Key File Format Conversion Guides Digital Object Preservation Handbook: a how-to guide
Options for Ensuring Preservation Once a project is completed…………… Live, (supported) system Archived Organisational Repository Shelved Abandoned
Not Recommended…….. Abandoned May be appropriate, probably isn't, think about archiving the resource instead Shelved Don't - shelving a digital resource without active, on-going attention is highly likely to result in its loss Media degradation Software and hardware obsolescence Loss of knowledge about the resource
Recommended……. But Think About Live System Importance of functionality/interface Organisational buy-in: who is running the system, and what is their commitment to it? What will happen if the system is shut down? Is the digital resource completed or on-going? Who Pays?
Recommended…… But Think About Deposit in an Archive Is the digital resource going to a trusted archive? Are only some aspects of the resource being archived? Will it be available for others to use? Will the resource be updated in the future? Costs?
Recommended…….. But think about Establish a Repository Business model and financial plan Management and administrative processes Policies and procedures Systems and tools Software and hardware Resource curation Metadata and documentation Preservation management
Establishing Requirements A pragmatic approach – workable and achievable Preservation requirements Establish common practices, procedures and use of standards Investigate and establish hardware, systems, and tools requirements Investigate and evaluate products Business planning and costings
Developing the Architecture The architecture must support: The entire activity cycle including ingest, data management, storage, long term preservation, discovery, access and delivery All necessary security aspects Complex resources Discovery and delivery options
Summary Build in preservation right from the start Document decisions/policies/procedures Balance longevity with innovation Be ruthless about what you must keep and what can be discarded Think content and functionality Planning It's a continuous process – not a one-off