Digital Preservation: What, Why, and How?
Dan Albertson’s Digital Libraries Class, April 13, 2016
Jody DeRidder, Head, Metadata & Digital Services, University of Alabama Libraries
What is Digital Preservation? The effort to ensure long-term access to digital content.
Why is it an issue? Hardware and software keep changing.
Why is it an issue? Computers and media … keep changing. (Image: Selection of obsolete media at the 2010 National Book Festival, by wlef70, on Flickr)
Why is it an issue? Operating systems … keep changing.
Why is it an issue? Software we use to open, use or make files … keeps changing.
Why is it an issue? File types and versions … keep changing.
Things that might matter… Research & scientific data
Things that might matter… Scholarly works
Things that might matter… Cultural history
Things that might matter… Databases
Things that might matter… Websites
What happens if we wait? What we need or want may soon be impossible to access.
What happens if we wait? Legal ramifications
What happens if we wait? No previous information on which to build new knowledge
What happens if we wait? Cultural memory loss
What happens if we wait? Digital Dark Age
Where do we start?
YOU Will Become a Guide! Learn HOW to learn more – constantly!
Digital Preservation is a SWAMP
Three Existing Maps: Open Archival Information System (OAIS) Model; Digital Curation Centre (DCC) Curation Lifecycle Model; Digital Preservation Outreach & Education (DPOE) Modules
Areas for best practices: OAIS "Reference Model for an Open Archival Information System (OAIS)," 2012.
Areas for best practices: DCC Digital Curation Centre (DCC):
Areas for best practices: DPOE baseline modules: identify, select, store, protect, manage, provide. (Digital Preservation Outreach & Education, DPOE Baseline Modules: Intro, version 2.0, Nov 2011)
What we need to do: Creating / Receiving; Identification; Appraisal & Selection; Ingestion & Preservation Actions; Storage & Protection; Providing Access
Creating & Receiving: Use open source software, such as OpenOffice; PDF/A; open format exports of databases; prefer archival formats & collect metadata; document, document, document.
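To make "open format exports of databases" concrete, here is a minimal Python sketch. It assumes the source is a SQLite database (the slide names no database system) and treats CSV plus a plain-text SQL dump as the open target formats; the function name `export_sqlite_to_open_formats` is hypothetical. Other database systems have their own export tools, so read this as an illustration of the idea rather than a recipe.

```python
import csv
import sqlite3
from contextlib import closing
from pathlib import Path


def export_sqlite_to_open_formats(db_path: str, out_dir: str) -> list[Path]:
    """Export a SQLite database to plain-text, openly documented formats.

    Writes one UTF-8 CSV per user table (with a header row naming the columns)
    plus dump.sql, a text SQL dump holding both the schema and the data.
    Returns the paths of the CSV files.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    csv_paths: list[Path] = []
    with closing(sqlite3.connect(db_path)) as conn:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
        for table in tables:
            quoted = '"' + table.replace('"', '""') + '"'  # safe identifier quoting
            cursor = conn.execute(f"SELECT * FROM {quoted}")
            target = out / f"{table}.csv"
            with target.open("w", newline="", encoding="utf-8") as fh:
                writer = csv.writer(fh)
                writer.writerow([col[0] for col in cursor.description])
                writer.writerows(cursor)
            csv_paths.append(target)
        # Schema + data as SQL statements, readable without SQLite itself
        with (out / "dump.sql").open("w", encoding="utf-8") as fh:
            for statement in conn.iterdump():
                fh.write(statement + "\n")
    return csv_paths
```

Keeping both forms is a deliberate choice: the CSVs are easy for anyone to open, while the SQL dump preserves table definitions that CSV alone loses.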
Identification Best Practices: 1. Don’t change the original! 2. Protect your computer. 3. Collect metadata during receipt. 4. Verify formats. 5. Extract and store technical metadata. PRONOM registry of file formats:
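A sketch of what collecting metadata during receipt might look like: the files are only read, never written, and each gets a size, a SHA-256 checksum, a timestamp, and a rough format guess from its leading bytes. The small signature table is an illustrative assumption; real identification should rely on a registry such as PRONOM (for example through DROID or FITS), which covers far more formats and versions. `receipt_record` and `identify` are hypothetical names.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path

# A few well-known leading "magic number" signatures. PRONOM holds a far
# larger, versioned signature registry; this table is only an illustration.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
]


def identify(head: bytes) -> str:
    """Guess a format from the first bytes of a file."""
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":  # WAV: RIFF container of type WAVE
        return "WAV"
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return "unknown"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Checksum a file in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def receipt_record(path: str) -> dict:
    """Collect receipt metadata; the original is opened read-only."""
    p = Path(path)
    with p.open("rb") as fh:
        head = fh.read(16)
    return {
        "file": p.name,
        "size_bytes": p.stat().st_size,
        "sha256": sha256_of(p),
        "format_guess": identify(head),
        "received_utc": datetime.now(timezone.utc).isoformat(),
    }
```

Recording the checksum at the moment of receipt gives a baseline against which any later copy can be checked.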
Appraisal & Selection Guidelines: Digital Preservation Outreach & Education (DPOE): Does the content have value? Does it fit your scope? Is it feasible for you to preserve the content? Is it technically and legally possible to make the content available? NARA:
Appraisal & Selection: At UA Libraries, 5 levels of support: Level 1: Full support (content we create); Level 2: ETDs, electronic theses and dissertations (support dependent on formats); Level 3: Bit-level support only; Level 4: Short-term access support; Level 5: Temporary storage only
Appraisal & Selection Tools: BitCurator: DigiPres Commons:
Ingestion & Preservation Actions: Extract and generate preservation metadata; relate all metadata to files; verify and validate digital objects; normalize file formats/structures.
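Verifying digital objects usually starts with fixity: recompute each file's checksum and compare it with the value recorded when the file arrived. The sketch below assumes a BagIt-style manifest with one `checksum  relative/path` line per file and SHA-256 checksums; the manifest layout and the `verify_manifest` name are assumptions, not part of the slide.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Checksum a file in 64 KiB blocks."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def verify_manifest(root: str, manifest_path: str) -> dict[str, list[str]]:
    """Compare files under root with 'checksum  relative/path' manifest lines.

    Each manifest entry ends up in exactly one bucket: ok, mismatch, or missing.
    """
    base = Path(root)
    report: dict[str, list[str]] = {"ok": [], "mismatch": [], "missing": []}
    for line in Path(manifest_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, rel = line.split(maxsplit=1)
        rel = rel.strip()
        target = base / rel
        if not target.is_file():
            report["missing"].append(rel)
        elif file_sha256(target) != expected.lower():
            report["mismatch"].append(rel)
        else:
            report["ok"].append(rel)
    return report
```

Because the manifest names every file, the same pass also relates each checksum (one piece of metadata) to the file it describes.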
Ingestion & Preservation Actions: Preservation metadata includes: descriptive; structural; administrative (rights, technical). PREMIS:
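As a rough illustration of where technical and fixity information lands in PREMIS, this sketch emits a single PREMIS object entry for a file using Python's standard XML library. It assumes the PREMIS 3 namespace and uses only a handful of core elements (identifier, fixity, size, format name); the identifier type `local` and the function name `premis_file_object` are placeholders, and the output should be validated against the official PREMIS schema before use.

```python
import xml.etree.ElementTree as ET

PREMIS_NS = "http://www.loc.gov/premis/v3"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("premis", PREMIS_NS)
ET.register_namespace("xsi", XSI_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the PREMIS namespace."""
    return f"{{{PREMIS_NS}}}{tag}"


def premis_file_object(identifier: str, sha256: str, size: int, format_name: str) -> str:
    """Build a minimal PREMIS <object> describing one file."""
    obj = ET.Element(q("object"), {f"{{{XSI_NS}}}type": "premis:file"})
    oid = ET.SubElement(obj, q("objectIdentifier"))
    ET.SubElement(oid, q("objectIdentifierType")).text = "local"  # placeholder scheme
    ET.SubElement(oid, q("objectIdentifierValue")).text = identifier
    chars = ET.SubElement(obj, q("objectCharacteristics"))
    ET.SubElement(chars, q("compositionLevel")).text = "0"  # not compressed/packaged
    fixity = ET.SubElement(chars, q("fixity"))
    ET.SubElement(fixity, q("messageDigestAlgorithm")).text = "SHA-256"
    ET.SubElement(fixity, q("messageDigest")).text = sha256
    ET.SubElement(chars, q("size")).text = str(size)
    fmt = ET.SubElement(chars, q("format"))
    designation = ET.SubElement(fmt, q("formatDesignation"))
    ET.SubElement(designation, q("formatName")).text = format_name
    return ET.tostring(obj, encoding="unicode")
```

Descriptive and rights metadata would normally live in separate records (for example a descriptive schema and PREMIS rights entities) linked to this object by its identifier.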
Ingestion & Preservation Actions: Formats & Transformation: Consider significant properties; normalize to archival formats; test results; retain original. Sustainability of Digital Formats (LOC):
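The "normalize, test results, retain original" sequence can be shown on the simplest case: plain text in a legacy encoding converted to UTF-8. This sketch assumes the source encoding is known (cp1252 by default, a hypothetical choice) and uses a hypothetical `original_` naming convention; it keeps a bit-for-bit copy of the original, writes the derivative, and accepts it only if the character content round-trips. Image or audio migration follows the same pattern with format-specific tools and checks on significant properties.

```python
import shutil
from pathlib import Path


def normalize_text_to_utf8(src: str, archive_dir: str, source_encoding: str = "cp1252") -> Path:
    """Retain the original, write a UTF-8 derivative, and verify it.

    The derivative is kept only if decoding it yields exactly the same
    characters as decoding the retained original.
    """
    original = Path(src)
    out = Path(archive_dir)
    out.mkdir(parents=True, exist_ok=True)

    kept = out / f"original_{original.name}"
    shutil.copy2(original, kept)  # retain the original bit-for-bit

    # Raises UnicodeDecodeError if the bytes are not valid in source_encoding
    text = original.read_bytes().decode(source_encoding)
    normalized = out / f"{original.stem}.utf8{original.suffix}"
    normalized.write_bytes(text.encode("utf-8"))

    # Test results: compare character content of derivative and original
    if normalized.read_bytes().decode("utf-8") != kept.read_bytes().decode(source_encoding):
        normalized.unlink()
        raise ValueError(f"normalization changed the content of {original.name}")
    return normalized
```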
FITS for Validation: File Information Tool Set (FITS): a wrapper for multiple open source tools; compares their results; identifies, validates, and extracts technical metadata; provides a single XML output; can generate a partial MIX file. Harvard:
FITS for Validation: Per JHOVE developer Gary McGath: Invalid = has errors that reduce functionality; Not well-formed = is unusable.
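The JHOVE distinction above can be applied directly to a FITS report. This sketch reads the `identification` and `filestatus` sections of FITS XML output; the namespace and element names reflect FITS output as it is commonly produced, and the abbreviated `SAMPLE` report is illustrative only, so check both against the output of the FITS version actually installed. `fits_status` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

FITS_NS = {"f": "http://hul.harvard.edu/ois/xml/ns/fits/fits_output"}

# Abbreviated, illustrative FITS report (real reports contain much more).
SAMPLE = """<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output">
  <identification>
    <identity format="TIFF EXIF" mimetype="image/tiff"/>
  </identification>
  <filestatus>
    <well-formed toolname="Jhove">true</well-formed>
    <valid toolname="Jhove">false</valid>
  </filestatus>
</fits>"""


def fits_status(fits_xml: str) -> tuple[str, str]:
    """Return (mimetype, verdict), with the verdict following McGath's distinction."""
    root = ET.fromstring(fits_xml)
    identity = root.find("f:identification/f:identity", FITS_NS)
    mimetype = identity.get("mimetype", "") if identity is not None else ""
    well_formed = root.findtext("f:filestatus/f:well-formed", default="", namespaces=FITS_NS).strip()
    valid = root.findtext("f:filestatus/f:valid", default="", namespaces=FITS_NS).strip()
    if well_formed == "false":
        return mimetype, "not well-formed: unusable"
    if valid == "false":
        return mimetype, "invalid: errors that reduce functionality"
    if well_formed == "true" and valid == "true":
        return mimetype, "well-formed and valid"
    return mimetype, "status not reported"
```

For the sample above, `fits_status(SAMPLE)` reports a TIFF that is well-formed but invalid: still usable, but worth flagging for repair.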
MIX for Images: NISO Metadata for Images in XML Schema: Basic Digital Object Information; Basic Image Information; Image Capture Metadata; Image Assessment Metadata
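A skeleton MIX record shows how the first two sections map to XML. This sketch assumes MIX 2.0 (namespace `http://www.loc.gov/mix/v20`) and fills only a few elements; the `mix_skeleton` helper and the sample values are hypothetical, and Image Capture Metadata and Image Assessment Metadata would still need to be added and the result validated against the MIX schema.

```python
import xml.etree.ElementTree as ET

MIX_NS = "http://www.loc.gov/mix/v20"
ET.register_namespace("mix", MIX_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the MIX namespace."""
    return f"{{{MIX_NS}}}{tag}"


def mix_skeleton(identifier: str, file_size: int, format_name: str,
                 width: int, height: int) -> str:
    """Build a partial MIX record: basic object info and image dimensions."""
    root = ET.Element(m("mix"))

    basic_obj = ET.SubElement(root, m("BasicDigitalObjectInformation"))
    oid = ET.SubElement(basic_obj, m("ObjectIdentifier"))
    ET.SubElement(oid, m("objectIdentifierType")).text = "local"  # placeholder scheme
    ET.SubElement(oid, m("objectIdentifierValue")).text = identifier
    ET.SubElement(basic_obj, m("fileSize")).text = str(file_size)
    fmt = ET.SubElement(basic_obj, m("FormatDesignation"))
    ET.SubElement(fmt, m("formatName")).text = format_name

    basic_img = ET.SubElement(root, m("BasicImageInformation"))
    chars = ET.SubElement(basic_img, m("BasicImageCharacteristics"))
    ET.SubElement(chars, m("imageWidth")).text = str(width)
    ET.SubElement(chars, m("imageHeight")).text = str(height)
    # ImageCaptureMetadata and ImageAssessmentMetadata sections would follow here.
    return ET.tostring(root, encoding="unicode")
```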
AES57 for Audio: Physical properties of the analog object; signal characteristics; digital file characteristics; condition comments. One AES57 file per analog object.
AES57 for Audio: Audio Object: format, byteOrder, block size, use, checksum, etc. Faces: startTime, duration, direction, label. Regions: startTime, duration, channels, condition, security note, label. Streams: channel number, position, condition, label.
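The Audio Object / Face / Region / Stream hierarchy can be modeled as nested records. In this sketch the class and field names paraphrase the slide rather than reproduce AES57 element names, several slide fields (use, direction, position, condition, security note) are left out for brevity, and a single face with a single full-length region is derived from a PCM WAV file using Python's `wave` module. A real AES57 description would also record the physical carrier and its condition.

```python
import hashlib
import wave
from dataclasses import dataclass, field


# Field names paraphrase the slide; they are not the AES57 XML element names.
@dataclass
class Stream:
    channel_number: int
    label: str = ""


@dataclass
class Region:
    start_time_s: float
    duration_s: float
    channels: int
    streams: list[Stream] = field(default_factory=list)


@dataclass
class Face:
    start_time_s: float
    duration_s: float
    label: str = ""
    regions: list[Region] = field(default_factory=list)


@dataclass
class AudioObject:
    format: str
    byte_order: str
    block_size: int
    checksum: str
    faces: list[Face] = field(default_factory=list)


def audio_object_from_wav(path: str) -> AudioObject:
    """Describe a PCM WAV file as one face holding one full-length region."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        block_size = channels * wav.getsampwidth()  # bytes per sample frame
        duration = wav.getnframes() / wav.getframerate()
    with open(path, "rb") as fh:
        checksum = hashlib.sha256(fh.read()).hexdigest()
    streams = [Stream(channel_number=i) for i in range(channels)]
    region = Region(0.0, duration, channels, streams)
    face = Face(0.0, duration, regions=[region])
    # RIFF/WAV stores PCM samples little-endian
    return AudioObject("WAVE", "little-endian", block_size, checksum, [face])
```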
Open Source Software:
fits2mix.pl: requires FITS, TIFF files, and the name of the producer; generates FITS and MIX files, OR, if a file is not valid, not well-formed, or not a TIFF, copies it to the repairMe directory.
fits2aes.pl: requires FITS, a spreadsheet, and WAV files; generates FITS and AES57 files.
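The triage rule described for fits2mix.pl can be restated in a few lines of Python. This is not the Perl script itself: it is a hypothetical `triage_for_mix` helper that assumes the FITS results (MIME type, well-formed, valid) have already been extracted, and it copies rather than moves the file so the original stays in place.

```python
import shutil
from pathlib import Path


def triage_for_mix(tiff_path: str, mimetype: str, well_formed: bool, valid: bool,
                   repair_dir: str = "repairMe") -> bool:
    """Return True if the file can go on to MIX generation.

    Otherwise copy it into repair_dir (the original is left in place) and
    return False, mirroring the rule described for fits2mix.pl.
    """
    if mimetype == "image/tiff" and well_formed and valid:
        return True
    target_dir = Path(repair_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(tiff_path, target_dir / Path(tiff_path).name)
    return False
```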
Storage & Protection: Trusted Digital Repositories and the Trustworthy Repositories Audit & Certification (TRAC) checklist. Use them for gap analysis: Where are we? Where do we want to be? Therefore, what gaps need to be filled?
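A gap analysis is essentially a comparison between the criteria a repository meets now and the full set it aims to meet. This sketch groups a self-assessment by the three TRAC sections (A: organizational infrastructure; B: digital object management; C: technologies, technical infrastructure & security); the `gap_report` helper, the criterion identifiers, and their yes/no statuses are invented for illustration.

```python
from collections import defaultdict


def gap_report(assessment: dict[str, tuple[str, bool]]) -> dict[str, dict]:
    """Summarize a self-assessment per TRAC section.

    assessment maps a criterion id to (section letter, currently met?).
    Per section: how many criteria are met ("where are we"), how many exist
    ("where do we want to be"), and which are unmet (the gaps to fill).
    """
    report = defaultdict(lambda: {"met": 0, "total": 0, "gaps": []})
    for criterion, (section, met) in sorted(assessment.items()):
        entry = report[section]
        entry["total"] += 1
        if met:
            entry["met"] += 1
        else:
            entry["gaps"].append(criterion)
    return dict(report)
```

For example, an assessment with criteria `A1.1` (met) and `A3.1` (not met) reports section A as 1 of 2 met, with `A3.1` as the gap.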
Storage & Protection: CLOCKSS (Controlled LOCKSS): ADPNet (Alabama Digital Preservation Network):
Storage & Protection: Plan for: Business Continuity; Succession Plan; Crisis Communication; Cyber Incident Response; IT Contingency; Disaster Recovery. (From NIST Contingency Planning Guide for Information Technology Systems, pg. 10.)
Storage & Protection: Disaster recovery policies
Scenario 1: software loss (such as Acumen).
Scenario 2: we lose all delivery content.
Scenario 3: we lose everything on the UA Libraries share drive.
Scenario 4: partial loss of archival files with no backups.
Scenario 5: total loss of archival files with only ADPNet content available for restoration.
Scenario 6: total loss of archival files and all backups.
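Recovery from partial loss depends on having independent copies to restore from. When several replicas exist (as in a LOCKSS-based network such as ADPNet), a damaged copy can be spotted by comparing checksums and repaired from the content most replicas agree on. The sketch below is a simplified strict-majority vote, not the LOCKSS polling protocol; `repair_from_replicas` and the replica names are hypothetical.

```python
import hashlib
from collections import Counter


def repair_from_replicas(replicas: dict[str, bytes]) -> tuple[bytes, list[str]]:
    """Return the content a strict majority of replicas agree on, plus the
    names of the replicas that disagree and should be overwritten with it."""
    if not replicas:
        raise ValueError("no replicas supplied")
    digests = {name: hashlib.sha256(data).hexdigest() for name, data in replicas.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(replicas):
        # No strict majority: automatic repair would be a guess.
        raise RuntimeError("replicas disagree without a majority; escalate to staff")
    good = next(data for name, data in replicas.items() if digests[name] == winner)
    damaged = sorted(name for name, digest in digests.items() if digest != winner)
    return good, damaged
```

Refusing to act without a strict majority is the conservative choice: with only two disagreeing copies there is no way to tell which one is damaged.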
Providing Access: to the information needed (metadata, search & retrieval); at the point of need (when and where it is required); in the form needed (emulating or transforming it as required).
Exercises
1. You are setting up a digitization program at your institution, and have to determine the best formats to use for preserving images and audio. What would you recommend, and why?
2. You have been tasked with setting up a Trusted Digital Repository. Where would you go to collect information? How long do you think it would take for you to assess what would be involved in such a project? Tell us a bit about it.
3. You need to ensure that your institution is supporting at least the baseline requirements of PREMIS Preservation Metadata. What are those requirements, and how does that information have to be stored?
4. Your institution needs to capture and store web content on a regular basis. How would you go about it, and in what form would you store the captures?
5. You are the person in charge of instructing students in digital scholarship. They need to know how to capture a version of their work that they can store for future access. What metadata would you recommend? Where should it be stored? And what guidelines would you provide for selecting formats, and why?
6. You’ve been asked to work with faculty who are involved in research projects which use databases and datasets. The faculty members are required by their funders to preserve their research data. How should they prepare their database content for preservation?
7. Your institutional archives now include emails, which have to be maintained indefinitely in a form that is accessible. For the moment, let’s ignore the attachments. What are your recommendations for how to store all this, and why?
Initial Resource Sites:
Library of Congress: Digital Preservation.
Digital Curation Centre: Resources for digital curation.
The National Archives: The Technical Registry.