1
Digital Preservation - File Integrity
Digital Stewardship Curriculum LNW
2
Digital Preservation Review
Digital Stewardship Curriculum
3
Your set of Policies should cover your whole digital lifecycle...
Digitization - Creating materials. Check it - quality control, checking file integrity. Save it - preserving materials for the long term. Share it - Mukurtu, one way to share. Get it - bring materials into your institution (donations, collecting, scanning documents at another institution), and also creating digital content. Check it - check and manage: check for quality and description, make sure materials come in safely, and make sure they stay safe and intact. Save it - secure storage, suited to your size requirements, with potential to grow, systematically backed up (multiple copies in multiple locations), with meaningful organization. Share it - providing access to materials: online, exhibits, education, research.
4
Digital Preservation Runs through all digital projects
Ongoing conversation with others in your institution: IT, administration, other departments. Digital storage: 3-2-1 backup system, storage media options, revisit storage setup every few years. -Not something to think about AFTER you have all your digital files created; start your digital preservation plan now. -Work with others to make sure that you are in the best situation possible; you might have to explain your needs to others. -Explain archival needs to IT. -Explain technical needs to admin and supervisors. -Integrate with what is already present. -Last time, in digital storage, you saw different media and the most important principles when it comes to storage (your base).
5
Again, here is the whole picture of digital preservation.
We broke digital preservation down into 3 essential parts that any Digital Preservation Plan will have to involve. File storage - where content is stored, on what media, and how it is backed up. File integrity - making sure that files stay the same over time, and that only the people responsible for them can delete or change files. File access - who has access to the files internally, what information goes along with the files, and what copies are for what purpose. All these elements come together to make up digital preservation, which is important for your institution, sustainability, grant funding, and disaster planning.
6
Storage Review Integrate with what IT already has
Plan for what you have now and in the future. Figure out your storage needs and solutions. Suggested tasks from Storage slides: -Get what you need for your digital content, AND what will be created in the future. -Talk to IT: they may be the ones setting you up, or may prefer a certain type of storage; also figure out funding. -Many options for storage: all storage options have pros and cons, so make a decision that will fit best for you (and try to avoid flash drives/single hard drives). -Think about migration! Think about how your backups fit in with each other. -Workflows for backing up content, multiple places for backups, failsafes if one is destroyed. Or the “rule of 3”: 3 copies total, 2 different types of media, 1 in a different geographic location, for ALL digital material marked for long term preservation.
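The 3-2-1 rule is easy to state but also easy to drift away from as storage changes. As a rough sketch only, the check below tests a list of copies against the three conditions. The `Copy` class, its field names, and the example media and location labels are all hypothetical, invented here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One stored copy of a collection (hypothetical structure, not from the slides)."""
    media: str     # e.g. "server", "external drive", "cloud"
    location: str  # e.g. "main office", "offsite"


def meets_3_2_1(copies: list[Copy]) -> bool:
    """Return True if the copies satisfy 3 copies, 2 media types, 1 offsite."""
    enough_copies = len(copies) >= 3
    media_types = {c.media for c in copies}
    locations = {c.location for c in copies}
    # "1 in a different geographic location" means at least two distinct places.
    return enough_copies and len(media_types) >= 2 and len(locations) >= 2
```

For example, a server copy and an external drive in the same office plus one cloud copy offsite would pass, while three copies on the same server would not.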
7
Versions of your Files Preservation Master files Access copies
Web Ready copies We recommend creating these multiple versions of files (images, documents, audio and video). -It makes sense for workflow, -but also for keeping the master files preserved in the long term. -You can decide whether you are backing up just masters, or access copies as well.
8
Digital Preservation - File Integrity
Digital Stewardship Curriculum
9
File integrity is the next step in our Digital Preservation pyramid. You can’t have file integrity without first having the storage tier, which is the base of the pyramid. So if you still have questions about storage, or don’t have a place to save files, figure this out first. If you are set with storage, your next step is File Integrity; if not, then this is a preview to start planning for the next step.
10
File integrity Ensuring your file has not changed over time Fixity
Security Write Blockers Virus Scans File integrity (in a nutshell) is the process of ensuring the files you first produced or accepted (digitized or took in as born digital) into your digital preservation workflow are viable, usable, safe and remain unchanged over time. File integrity is composed of 4 focus areas: Fixity, Security, Write Blocking, Virus Checks. FIXITY is your QA/QC process for the long term. You all quality assured your files at the time of creation, and fixity is just a fancy term for verifying that the file you created in 2016 is the same file you have in 2070. Most folks in the digital preservation field check fixity by creating and verifying checksums over time to know that the file is intact and unchanged. We will talk more in depth about this area in a moment. FILE SECURITY is making sure that your files are only accessible to the appropriate people and, as a next step, having logs of access and changes. WRITE BLOCKERS are most important for workflows that involve content that has been identified as contentious (e.g. documents or images that might be used in a legal context) or in digital forensics work. VIRUS CHECKS are exactly what they sound like and are most important if your repository takes digital files from donors or volunteers on media (hard drives or flash drives) that were once not in your control. All of these areas are important to consider based on your own context, but we are going to concentrate on two file integrity topics: Fixity and Security.
11
“Fixity” Definition: Stability of a digital object
Files should remain unchanged over time. Digital files can degrade without warning. Focusing specifically on fixity today. Much like physical objects, digital files can degrade over time and with repeated use, and can also undergo other detrimental changes. We cannot prevent this entirely, and it is hard to detect without tools in place. File fixity refers to the stability, steadiness, and constancy of a digital object. This helps us ensure that a digital object is the same, unchanged, before and after we view or access it.
12
Levels of Fixity Checks
Simple steps: Expected file size, expected file count. Good/Better: Higher level of detail, MD5 (hash algorithm). Best: Use a tool that uses SHA1 (hash algorithm). Tools available to check! -Low level of effort, low level of detail. Just gloss over low level and SHA1, focus on what we will be learning (MD5Summer). File fixity or integrity can be checked using one or a combination of several mechanisms. The simplest forms of file fixity are expected file size and expected file count. These checks are used on a regular basis when large numbers of files are being transferred between file systems. They are the easiest to perform, as they can be read without specialty software, but they also generate the least amount of detail about the integrity of the files being checked. As you move up the food chain of fixity checking, you see that with a little more effort you can get a high level of detail using a cryptographic hash algorithm. The lowest level of hash algorithm is MD5. It has the ability to, at least in all of our cases, uniquely identify a file with a string of letters and numbers (32 hexadecimal characters), which is enough to catch accidental changes, and it is relatively lightweight when it comes to computer overhead. The highest level of detail available is by using a SHA1 or SHA256 hash algorithm. These two algorithms have a higher level of detail because each file is represented by a longer string of letters and numbers (40 and 64 hexadecimal characters respectively). The difference between MD5 and SHA is like the difference between a 300 DPI scan and a 600 or 700 DPI scan.
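To make the levels concrete, here is a minimal sketch using only Python's standard library. It is not how MD5Summer or any other tool named in these slides works internally, and the function names `simple_fixity` and `checksum` are made up for this example. The first function covers the simple level (file count and total size); the second produces an MD5, SHA1 or SHA256 checksum depending on the algorithm name passed in.

```python
import hashlib
from pathlib import Path


def simple_fixity(folder: Path) -> tuple[int, int]:
    """Simple level: return (file count, total size in bytes) for a folder."""
    files = [p for p in folder.rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


def checksum(path: Path, algorithm: str = "md5") -> str:
    """Higher levels: return the hex digest of one file, read in 1 MB chunks."""
    digest = hashlib.new(algorithm)  # accepts names such as "md5", "sha1", "sha256"
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in chunks keeps memory use low for large video or audio masters. Changing a single byte in a file changes the resulting checksum, which is what makes it more detailed than a size or count check.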
13
How to check/store fixity info
Standalone software applications: MD5Summer, Fixity, Jacksum, BWF MetaEdit. Integrated into content management systems: DSpace, Islandora, Rosetta. AM OK great! I want to start checking and storing fixity information. How do I do it? Well, I am glad you asked. Thankfully there are many standalone software applications that can create and manage checksums, as well as many integrated solutions in content management systems like DSpace, Islandora and Rosetta. Some of these standalone apps, like MD5Summer, will store the fixity information in the same directory as the files that are being checked. Fixity information can also be added to the metadata that is initially created for your collection as it is digitized, much like creator or source information. Other programs, like the cleverly named Fixity by AV Preserve or the integrated content management system options, will store the checksum in databases (or in text logs) which are managed through the software application. Remember to strip it down to the essentials: a one trick pony and an integrated solution. Let’s look at a quick example.
14
MD5Summer Behind me is a screenshot of a checksum generator (MD5Summer) in action. This screenshot shows the directory that is being checked, the filenames that are created and the accompanying MD5 checksum or hash. MD5Summer writes a text file containing all this information right alongside the files being checked.
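The idea of keeping a checksum list in the same directory as the files can be sketched in a few lines of Python. This is an illustration of the approach, not a reproduction of MD5Summer: the manifest file name `checksums.md5` and the one-line-per-file layout are assumptions made for this example, and the exact file MD5Summer writes may differ.

```python
import hashlib
from pathlib import Path

MANIFEST_NAME = "checksums.md5"  # hypothetical name, chosen for this sketch


def file_md5(path: Path) -> str:
    """Return the MD5 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(folder: Path) -> Path:
    """Write one '<md5>  <file name>' line per file into the folder itself."""
    manifest = folder / MANIFEST_NAME
    lines = [
        f"{file_md5(p)}  {p.name}"
        for p in sorted(folder.iterdir())
        if p.is_file() and p.name != MANIFEST_NAME  # do not hash the manifest
    ]
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest
```

Because the manifest travels with the files, anyone who later receives the folder can recompute the checksums and compare them against the stored lines.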
15
Alongside other metadata
Where is this info (checksum) stored? It can be treated as a part of the technical or preservation metadata and stored within the metadata record in the repository or content management system where the digital object resides. The screenshot below is from a DSpace repository that created a hash upon ingestion. Some folks try to store fixity information within the object itself. This is a complicated endeavor, as it means the fixity check must only select certain parts of the file to check file integrity, or a wrapper-based file format like MXF could be used. This is not for the faint of heart.
16
Within Fixity logs or Fixity DB
External fixity logs can be generated by software packages. They include all component parts of a fixity check: the location of the file on the filesystem, the resultant hash of that file, the last time fixity was checked for that file, and whether it passed. The screenshot below is from a Fixity report generated by the Fixity software distributed by AVPreserve.
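The four parts of a fixity log entry (file location, hash, time of check, pass or fail) map naturally onto rows in a CSV file. The sketch below is an assumption-laden illustration, not the format the Fixity software produces: the function name `check_fixity`, the column names, and the `MISSING` marker are all invented here.

```python
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path

LOG_COLUMNS = ["path", "hash", "checked_at", "passed"]  # hypothetical layout


def file_md5(path: Path) -> str:
    """Return the MD5 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(expected: dict[str, str], folder: Path, log_path: Path) -> list[dict]:
    """Compare each file to its stored MD5 and append one row per file to a CSV log."""
    checked_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
    rows = []
    for name, stored_hash in sorted(expected.items()):
        target = folder / name
        current = file_md5(target) if target.is_file() else "MISSING"
        rows.append({
            "path": str(target),
            "hash": current,
            "checked_at": checked_at,
            "passed": current == stored_hash,
        })
    write_header = not log_path.exists()
    with log_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=LOG_COLUMNS)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)
    return rows
```

Appending to the same log on every run builds up a history, so you can see not only that a file failed but roughly when it last passed.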
17
When to Check Fixity Run a checksum tool When created
At regular intervals (monthly, weekly) During a change (transfer, recovery)
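A transfer is one of the moments when a checksum should be run, and the check can be built into the copy step itself. The following is a minimal sketch under the assumption that files are copied with Python; the function name `transfer_with_fixity` is hypothetical, and MD5 is used only because it is the algorithm these slides focus on (another algorithm name such as "sha256" could be passed instead).

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path, algorithm: str = "md5") -> str:
    """Return the hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_with_fixity(source: Path, destination: Path, algorithm: str = "md5") -> str:
    """Copy a file, then confirm the copy hashes the same as the original."""
    before = file_digest(source, algorithm)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)  # copy2 also keeps timestamps where possible
    after = file_digest(destination, algorithm)
    if before != after:
        raise OSError(f"Fixity mismatch after transfer: {source} -> {destination}")
    return after
```

The returned checksum can be stored as the file's fixity value in its new location, ready for the regular interval checks.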
18
Security Know who has access to your files
Have policies and/or technology in place to restrict access to appropriate people. File system logging (something to discuss with IT). Example: MASC utilizes a staging and a preservation storage area. Logging: This can be done using operating system logging and network level credentialing, or standalone software. While logging is ideal, it can be hard to manage lots of information and to sort the useful information from the not.
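One small technical safeguard that fits the staging and preservation split is to mark files read-only once they move into preservation storage. The sketch below is only an illustration: the function name `promote_to_preservation` is invented here, it is not a description of how MASC works, and it is no substitute for the account and network permissions that IT sets up. On Windows, `chmod` only toggles the read-only flag.

```python
import shutil
import stat
from pathlib import Path

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def promote_to_preservation(staged_file: Path, preservation_dir: Path) -> Path:
    """Move a checked file out of staging and mark the stored copy read-only."""
    preservation_dir.mkdir(parents=True, exist_ok=True)
    target = preservation_dir / staged_file.name
    if target.exists():
        # Never silently replace something already in preservation storage.
        raise FileExistsError(f"Refusing to overwrite {target}")
    shutil.move(str(staged_file), str(target))
    # Clear the write permission bits for owner, group and others.
    target.chmod(target.stat().st_mode & ~WRITE_BITS)
    return target
```

A read-only flag will not stop a determined administrator, but it does guard against the most common problem, which is accidental edits or overwrites by staff doing everyday work.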
19
Integrity - People and Questions
Who do I need to talk to? IT Department Staff in your department Anyone who is responsible for ingest of new files, management of files Whoever is responsible for system or network security and department/collection security Staff in your department need to know problems to look out for, and the correct procedures for preventing and dealing with issues
20
Integrity - Information
What do I need to know or find out? Does your department /IT already check fixity? Who on your staff (or other departments) has permissions to view/edit/delete digital files? Are regular virus checks run on computers? What about when new digital collections come in?
21
DP Activities List - Integrity
[Initial Activities] -Inventory existing digital content -Research tools, equipment, other policies you may want to use in your organization [Upon Ingest or File Creation] -Run virus checks -Run fixity checks LNW
22
DP Activities List - Integrity
[Regularly] -Run fixity checks [Less frequently] -Research new tools, equipment, or policies that you may want to use in your organization [Disaster] -Assess what loss or damage has occurred
23
DP File Integrity in Policies
What policy or document will relate to file integrity? Digital Preservation Plan or Policy Quality Control Workflows Workflows for managing and transferring files for staff working with digital content
24
Quality Control When digitizing...
Digitize ONCE, at the highest possible quality. Copy, edit, and share lower quality access copies. Preservation master, access copy, web ready copy. -Quality Control INCLUDES digital preservation steps, but there is also more. -Since you are putting so much time and money into digitizing, you want it to be done right the first time. -Digitize once, at highest possible quality; from there, you can create copies. -We talked about this concept of having different versions of your files back in August.
25
Quality Control Review
Two or more step process. Design for many stages of your workflow. Develop from beginning. Include Quality Control in your policies. Sample large projects. -Not just checking once; hopefully multiple people can be involved, including someone not involved in the digitization. -Not just for digitization, but also when you complete metadata, upload to your sharing platform, and so on. -This is something that should be implemented at a higher level: put it in policies and adapt to projects. --Digitization policies at all levels, institutional and project: include a statement. --Depends on which policies overlap (access and use, digitization, digital preservation). -Most important to have steps in your projects. -Figure out what the best sample size is: 10%, 20%, 50%? Really important: approval for making public, metadata (checking everything from cultural accuracy to typos), image/audio/video quality. Making sure all items stand up to the STANDARDS that you set out at the beginning of the project.
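For large projects, picking the sample by hand tends to favor the first or most convenient files. A random draw avoids that. The sketch below is one possible way to do it; the function name `qc_sample` and the choice to round the sample size up are assumptions made for this example, and the right percentage is still a project decision.

```python
import math
import random
from typing import Optional


def qc_sample(items: list[str], percent: int, seed: Optional[int] = None) -> list[str]:
    """Pick a random subset of items (for example file names) for a QC pass."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    if not items:
        return []
    # Round up so that small batches still get at least one item checked.
    sample_size = max(1, math.ceil(len(items) * percent / 100))
    return sorted(random.Random(seed).sample(items, sample_size))
```

Passing a `seed` makes the draw repeatable, which helps when you document the workflow and want a second reviewer to check exactly the same items.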
26
Create your Quality Control Workflow
Walk through all steps and make sure QC is built in. Match goals of the project to QC. Be consistent! Control your QC environment. Document workflow and results. -Figure out all the steps in the project, and figure out where QC will be needed. -Might be different for different projects (one list for a project scanning archival copies, a different list for a metadata project, a different list for filming a video). -Check the same way, each time. -This idea leads to making sure your environment is the same each time: audio (headphones, speakers), images (calibrated monitor). -To do all these steps, you need to document.
27
When to Perform QC Checking while digitizing After digitizing a batch
Metadata When transferring to storage (checksum time!) After uploading or preparing to share Second pass by other staff or supervisor LBHC example
28
3-3-3 Digital File Integrity Action Plan
Fill out worksheet People What you know What to find out/accomplish DP activities checklist (explain that this is just storage)
29
Digitization Policy – Structure/First Steps
Selection Criteria What you want/don’t want Where your materials are coming from Criteria for Digitization Purpose/Goals/Objectives Legal, Cultural, Ethical, Social and Permissions Criteria for Access and Use Who, What, Where, When, How?