Building an Audio Preservation System at Indiana University Using Standards and Best Practices Mike Casey, Archives of Traditional Music Jon Dunn, Digital.

Presentation on theme: "Building an Audio Preservation System at Indiana University Using Standards and Best Practices Mike Casey, Archives of Traditional Music Jon Dunn, Digital."— Presentation transcript:

1 Building an Audio Preservation System at Indiana University Using Standards and Best Practices Mike Casey, Archives of Traditional Music Jon Dunn, Digital Library Program Jenn Riley, Digital Library Program April 14, 2008

2 The Problem Numbers Degradation Obsolescence

3 Audio/Video at IUB AAAMC Music Library University Archives ATM HPER Radio/TV Center for the Study of History and Memory Kinsey Institute Athletics Emeriti House Office of University Marketing Wells Library AISRI Office of Dean of the Faculties Lilly Library Alumni Relations School of Journalism School of Music School of Law Traditional Arts Indiana Department of History Department of Anthropology Department of Folklore/Ethnomusicology Black Film Center/Archive Plus many more!

4 By the Numbers ATM: 110,000 (mostly) audio recordings Wells Library, Kent Cooper Room: 20,000 videos Music Library: 137,000 audio recordings *2,000 lacquer discs *8,000 DATs *50,000 open reel tapes CSHM: 3,200 audio recordings AAAMC: 5,900 audio recordings








12 Obsolescence Audio formats Equipment (playback machines, test devices) Repair parts Playback expertise Repair expertise Tools Supplies

13 Preservation in the Analog Domain Life expectancy critically important Predicting when a recording will fail Quest for the eternal carrier Target preservation format: mastering-quality open reel tape Standards set in the mid-1980s: ARSC/AAA

14 The New Paradigm Eternal sound carriers never available Maintaining equipment long-term unmanageable Therefore, classical preservation strategy is hopeless

15 The New Paradigm Preserve the content, not the carrier The eternal file, not the eternal carrier Use digital mass storage systems Longevity of carriers in mass storage systems of minor importance

16 Standards and Best Practices Ensure Quality Provide Philosophical/Ethical Foundation Encourage Sustainability Foster Interoperability Provide a Migration Path

17 Preserving Digital Information Advantage: Digital information may be copied without degradation Disadvantage: Digital information requires active management in order to remain accessible

18 Risks of Digital Information: Bit Loss Degradation of physical media –Optical, magnetic Damage or theft of physical media Media obsolescence –Ability to read physical media –Ability to read logical media format

19 Risks of Digital Information: Semantic Loss Even if the bits are intact, can a file still be understood? File format obsolescence Loss of context –Insufficient metadata

20 Risks of Digital Information: Integrity How do we know whether or not information has been altered, whether intentionally or unintentionally?

21 Methods of Mitigating Risks Migration –Migration of data to new physical media –Migration of data to new file formats Replication –Multiple copies of data in multiple locations Validation –Retain checksums for files, routinely retrieve files and compare against checksums
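The replication and validation steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`file_digest`, `replicate`, `validate`) and the idea of treating each storage location as a local directory are hypothetical and are not taken from the IU system.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path, algorithm: str = "md5") -> str:
    """Hash a file in 1 MiB blocks so large audio files are never fully in memory."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def replicate(source: Path, locations: list[Path]) -> str:
    """Copy the source file into every location; return the checksum to retain."""
    checksum = file_digest(source)
    for location in locations:
        location.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, location / source.name)
    return checksum


def validate(name: str, checksum: str, locations: list[Path]) -> dict[str, bool]:
    """Re-read every copy and compare it against the retained checksum."""
    report = {}
    for location in locations:
        copy = location / name
        report[str(location)] = copy.exists() and file_digest(copy) == checksum
    return report
```

Running `validate` on a schedule is what turns a retained checksum into an actual integrity check; a `False` entry identifies the location whose copy should be restored from a good replica.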

22 Scaling Digital Preservation Migration, replication, and validation require: –Automated processes –Ongoing monitoring, management, and planning –Ongoing funding for technology refresh

23 Digital Repositories Centrally-managed systems for storage (and delivery) of digital information Leverage economies of scale for storage and management costs Support preservation integrity functions (migration, replication, validation) Much easier to manage than many little pockets of digital information

24 OAIS: Open Archival Information System ISO Standard 14721:2003 Origins in space science community Conceptual framework for an archival system dedicated to preserving and maintaining access to digital information over the long term Basis for much work on digital preservation within the library and archive community

25 OAIS Reference Model

26 Preservation Packages in OAIS Preservation package –Digital content plus metadata SIP: Submission Information Package AIP: Archival Information Package DIP: Dissemination Information Package

27 From OAIS to Trusted Digital Repositories 2002 OCLC-RLG task force report: –Trusted Digital Repositories: Attributes and Responsibilities What are the attributes of a trusted repository? –OAIS compliance –Administrative responsibility –Organizational viability –Financial sustainability –System security –Procedural accountability

28 Trusted Digital Repositories: Auditing and Certification Digital Repository Audit Method Based on Risk Assessment (DRAMBORA) Trustworthy Repositories Audit & Certification (TRAC): Criteria and Checklist –OCLC/NARA/CRL report

29 Archives of Traditional Music Established 1948 110,000 recordings 1890s to present Field—30% World music traditions Endangered/extinct world languages

30 Sound Directions Digital Preservation and Access for Global Audio Heritage Collaboration between Harvard University and Indiana University Phase 1 an R&D project funded by NEH Focus on preservation

31 Sound Directions Digital Preservation and Access for Global Audio Heritage Project Partners Archives of Traditional Music, Indiana University Archive of World Music, Harvard University Harvard College Library Audio Preservation Services Digital Library Program, Indiana University Office for Information Systems, Harvard University

32 Sound Directions Digital Preservation and Access for Global Audio Heritage Objectives Research best practices in areas without standards or best practices Develop best practices to meet existing and emerging standards Test existing and emerging standards/best practices with a real world project

33 Sound Directions Digital Preservation and Access for Global Audio Heritage Results –Publication of “Sound Directions: Best Practices for Audio Preservation” –Development of audio preservation system –Software tools –Preservation of field collections


35 Sound Directions Digital Preservation and Access for Global Audio Heritage Project Future “Preservation” Phase funded by NEH Increase throughput Simultaneous transfer Indiana automation Release ATMC Develop new access system for field collections

36 Migration decision Workflow management Workflow management / scheduling Cleaning or physical restoration as needed System / Project Planning & Development Funding Personnel / Vendor Equipment Software Tools Creation / maintenance of software and scripts Selection for Preservation Assess research value Evaluate condition Consider political, technical, and other issues Establish priorities Digitization Analog playback A/D conversion Creation of Preservation Master Files Local filenames Digitization Technical metadata Structural metadata Checksums Quality control Local storage solution Post-Transfer Processing Quality control Generation of derivatives Marking areas of interest in files Signal processing (if appropriate) Preliminary Work / Pilot Project Exploratory transfers and metadata collection Quality control Reassessment of digitization plan Collection Setup Gather and assess documentation Evaluate collection needs / condition Assess cataloging / descriptive metadata issues Develop digitization plan Assess and calibrate equipment Ingestion into / Copy to Long-Term Storage Solution Preservation packages Periodic Evaluation Data integrity checking Format obsolescence analysis Migration New carrier New format

37 Common sense definition of a system: Set of interacting units or elements Forms an integrated whole Performs a function

38 A few basic principles… Each element/part affects the whole Whole is greater than sum of parts Inputs and outputs Equifinality

39 What should we preserve? Selection for Preservation Analysis of research value Evaluation of preservation condition and risk

40 Data Collection/Analysis Research Value Score Condition/Risk (FACET) Score Combined Selection Score Collection Ranking Curatorial Review Selection for Preservation
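The flow on this slide (two scores combined into a selection score, then a ranking for curatorial review) can be illustrated with a short Python sketch. The slides do not say how the two scores are combined, so the weighted average, the 0 to 5 scales, and every name below are assumptions made purely for illustration; they do not describe how FACET or the ATM actually compute scores.

```python
from dataclasses import dataclass


@dataclass
class Collection:
    name: str
    research_value: float  # hypothetical 0-5 score from curatorial analysis
    condition_risk: float  # hypothetical 0-5 score from a condition/risk tool


def selection_score(c: Collection, research_weight: float = 0.5) -> float:
    """Combine the two scores with an assumed weighted average."""
    return research_weight * c.research_value + (1 - research_weight) * c.condition_risk


def rank(collections: list[Collection], research_weight: float = 0.5) -> list[Collection]:
    """Order collections from highest to lowest combined score."""
    return sorted(
        collections,
        key=lambda c: selection_score(c, research_weight),
        reverse=True,
    )
```

The ranked list is an input to curatorial review, not a replacement for it: the final selection step on the slide remains a human decision.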

41 FACET Software tool—point-based, collection level Analyzes data on condition of field formats Returns a risk assessment score

42 FACET Package Software Formats document Procedures manual FACET worksheets



45 Where should preservation work be done? In-house or outsource? Issues: studio space, technical expertise, amount of work, future location of expertise Critical listening spaces Development of preservation studio


47 Who should do preservation transfer work? Audio engineer Importance of analog playback stage Audio examples

48 Who and Where Best Practices Use audio engineers in the workflow where their skill is required Critical listening environment Use cleanest, most direct signal path to converter Instant comparison from playback machine and post A/D converter Test/calibration chain

49 What is the target preservation format? Digital file Broadcast Wave Format (BWF or BWAV) Preservation involves a long-term responsibility to the digital file

50 What do we look for in a file format? Disclosure Adoption Transparency Self-documentation External dependencies Impact of patents Technical protection mechanisms

51 Broadcast Wave Format Audio file format based on .wav files EBU 1996 for the exchange of files Non-proprietary Recommended by IASA, AES, NARAS, Sound Directions for preservation “Chunk” for metadata residing with the file Time stamp

52 Broadcast Wave Format Metadata elements include: Description of the sound sequence Name of the originator Date/time Coding history (signal chain components) Format independent, sample accurate time stamp “Catastrophic” metadata
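To make the embedded metadata concrete, here is a minimal Python sketch that reads the broadcast extension (`bext`) chunk from a RIFF/WAVE stream. The field order and sizes follow the published EBU layout as I understand it (fixed fields totalling 602 bytes, followed by the coding history); the function name and the dictionary keys are my own, and the sketch does not handle RF64 files or the UMID and loudness fields.

```python
import struct
from typing import BinaryIO, Optional

# Description, Originator, OriginatorReference, OriginationDate,
# OriginationTime, 64-bit TimeReference (low word first), Version.
BEXT_FIXED = struct.Struct("<256s32s32s10s8sQH")
CODING_HISTORY_OFFSET = 602  # UMID, loudness and reserved bytes are skipped


def _text(raw: bytes) -> str:
    """Decode a null-padded ASCII field."""
    return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace").strip()


def read_bext(stream: BinaryIO) -> Optional[dict]:
    """Return the bext metadata as a dict, or None if the chunk is absent."""
    header = stream.read(12)
    if len(header) < 12 or header[:4] != b"RIFF" or header[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    while True:
        chunk_header = stream.read(8)
        if len(chunk_header) < 8:
            return None  # reached the end without finding a bext chunk
        chunk_id, size = struct.unpack("<4sI", chunk_header)
        if chunk_id != b"bext":
            stream.seek(size + (size & 1), 1)  # chunks are padded to even length
            continue
        data = stream.read(size)
        if len(data) < BEXT_FIXED.size:
            raise ValueError("truncated bext chunk")
        desc, originator, ref, date, time, time_ref, version = BEXT_FIXED.unpack_from(data)
        return {
            "description": _text(desc),
            "originator": _text(originator),
            "originator_reference": _text(ref),
            "origination_date": _text(date),
            "origination_time": _text(time),
            "time_reference_samples": time_ref,
            "version": version,
            "coding_history": _text(data[CODING_HISTORY_OFFSET:]),
        }
```

Because the chunk travels inside the file, these few fields survive even if the file is separated from its database record, which is the sense of “catastrophic” metadata on the slide.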


54 How do we define the files we create? What is in them? How are they created? What do they represent?

55 Preservation (Archival) Master Files Best Practice Documents Unmodified No subjective alterations or improvements Preserve history, not re-write it As true to the original source as possible

56 Preservation (Archival) Master Files Complete, unaltered stream from playback machine Carrier of raw material from transfer No editing, signal processing, data reduction, gain manipulation, announcements (slates) 24 bit, 96 kHz
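The storage cost of the 24 bit, 96 kHz target is easy to work out. The short Python calculation below assumes uncompressed PCM and two channels (the slide does not state a channel count) and ignores the small overhead of file headers and embedded metadata.

```python
def pcm_bytes(seconds: int, sample_rate: int = 96_000,
              bit_depth: int = 24, channels: int = 2) -> int:
    """Size in bytes of uncompressed PCM audio data."""
    return seconds * sample_rate * (bit_depth // 8) * channels


per_hour = pcm_bytes(3600)
cd_quality = pcm_bytes(3600, sample_rate=44_100, bit_depth=16)

print(f"24/96 stereo: {per_hour:,} bytes per hour ({per_hour / 1e9:.2f} GB)")
print(f"16/44.1 stereo: {cd_quality:,} bytes per hour ({cd_quality / 1e9:.2f} GB)")
```

Under these assumptions one hour of audio is 3600 × 96,000 × 3 × 2 = 2,073,600,000 bytes, a little over 2 GB and roughly 3.3 times the size of the same hour at CD quality.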

57 Preservation (Archival) Master Files Best Practices Define purpose of every digital file Written guidelines on characteristics of files Written guidelines on “technical” and content edits Maintain common reference timeline

58 Data Integrity –Data integrity checking –“Checksums” –MD5 hash or algorithm A7F1DAD8A7BF5E88EF44495E19683B18 *atm_01007_cass6936_010101_pres_20080228.wav

59 Data Integrity All files with enduring value As soon as possible Critical metadata stored in database and in preservation package Verify before trusting A7F1DAD8A7BF5E88EF44495E19683B18 *atm_01007_cass6936_010101_pres_20080228.wav
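The checksum line shown on the slide follows the common `md5sum` layout: 32 hexadecimal digits, a space, a `*` marking binary mode, then the file name. “Verify before trusting” can be sketched in Python as follows; the function names are hypothetical, and the sketch assumes the audio file sits in the directory passed to `verify`.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path) -> str:
    """Compute the MD5 digest of a file, reading it in 1 MiB blocks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def parse_checksum_line(line: str) -> tuple[str, str]:
    """Split an md5sum-style line into (lowercase digest, file name)."""
    digest, _, name = line.strip().partition(" ")
    if name[:1] in ("*", " "):  # binary-mode or text-mode marker
        name = name[1:]
    if len(digest) != 32 or not name:
        raise ValueError(f"malformed checksum line: {line!r}")
    int(digest, 16)  # raises ValueError if the digest is not hexadecimal
    return digest.lower(), name


def verify(line: str, directory: Path) -> bool:
    """Recompute the digest of the named file and compare with the stored one."""
    expected, name = parse_checksum_line(line)
    return md5_of(directory / name) == expected
```

Digests are compared in lowercase, so the uppercase form on the slide and the lowercase form that `hashlib` produces are treated as equal.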

60 How do we make the preserved content understandable and manageable? Descriptive Metadata Administrative—Technical Metadata Administrative—Digital Provenance Administrative—Rights Management Structural Metadata

61 Audio Technical Metadata Collector (ATMC) Enter/edit technical and structural metadata Audio object and process history metadata Enter/edit audio object evaluations Parse files to collect metadata






67 Quality Control and Assurance Quality control vs. quality assurance QC at ATM: aural, visual, software tools Collection setup—preliminary transfers Role of permanent staff QA at ATM

68 How do we store the data immediately after capture? Local, interim storage Backup copies at each stage ATM NAS Additional redundant copy

69 Director Project Development Selection for Preservation Archivist Selection Preview Collections QC Documentation Librarian Cataloging Issues Associate Director Project Management Selection—Format Issues Scheduling Coordination QC R&D Audio Engineer Preservation Transfer Preservation Master Files Technical MD Collection Checksums BWAV MD ADL’s Signal Processing Project Assistant Content Division Production Masters QC ADL’s Workflow Management Collection Setup Ingestion Process Programmer Software/Script Development Digital Library Program Preservation Repository Services Deliverables Access System

70 The Role of Metadata in Digital Preservation

71 What is metadata? “The stuff we need to know in order to discover and manage data over the long term” Here’s a better definition: “Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource.” NISO. “Understanding Metadata.” 2004.

72 Metadata standards Standards define mutually agreed-upon: –Definitions of key terms –“Fields” of data to record –Rules for structuring data in these fields In this area, generally expressed in XML Allow us to benefit from community experience Promote preservation by providing for more predictable data

73 Evaluating metadata standards Good fit for the type of material I have? Supports my access/management/preservation needs? Are there existing tools to help me create it? Has it been used before in similar situations? Who maintains it? How quickly are the standards in this environment changing?

74 Creating metadata Generally not done by humans encoding data directly in the storage format Instead: –Humans use tools designed for specific purposes –Derived computationally from the digital resource itself

75 Technical metadata Tracks properties of a digital file necessary for its rendering and processing Can also include data about the circumstances of creation of a digital file Often format- or media-specific Much can be generated automatically from digital file
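As an example of technical metadata generated automatically from the file, the Python sketch below uses the standard-library `wave` module to read basic properties of a PCM WAV file. The dictionary keys are illustrative only and are not element names from any metadata schema discussed here.

```python
import wave
from pathlib import Path


def technical_metadata(path: Path) -> dict:
    """Collect basic rendering properties from a PCM WAV file."""
    with wave.open(str(path), "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            "file_name": path.name,
            "byte_size": path.stat().st_size,
            "channels": w.getnchannels(),
            "bit_depth": w.getsampwidth() * 8,
            "sample_rate_hz": rate,
            "duration_seconds": frames / rate,
        }
```

Values such as playback equipment or tape speed cannot be derived this way; they describe the circumstances of creation and still have to be entered by a person or captured by the transfer tools.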

76 Digital provenance metadata Tracks the history of a set of related digital files –Can include the methodology by which the “master” file was created from an analog source (overlap with technical metadata) –What transformative processes have been applied to the file –Relationship of “derivative” files to the “master”

77 Structural metadata Documents relationships within and between digital files –Locating the same intellectual content on multiple representations –Noting points of interest within a single resource –Grouping and sequencing multiple files that make up a logical whole

78 Rights metadata Covers legal, moral/ethical, financial rights over resources –Rights holders –Copyright status –Conditions on access –Usage fees/royalty payments Can be in human- or machine-readable format

79 Descriptive metadata Like “cataloging” Allows users and collection managers to find and identify resources of interest Factual information such as creator, date created, running time (overlap with technical metadata) Constructed information such as title Subjective information such as topic, genre

80 Preservation metadata Some overlap with technical and process history metadata Catch-all for all the metadata we need to support the preservation process that’s not recorded elsewhere Most important feature: tracking events that occur during the preservation process
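Event tracking can be pictured as an append-only log keyed by object identifier. The Python sketch below is a generic illustration, not the IU design: the class names, field names, and event types are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _now() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class PreservationEvent:
    object_id: str    # identifier of the file or package acted upon
    event_type: str   # e.g. "ingest", "fixity check", "migration"
    outcome: str      # e.g. "success" or "failure"
    detail: str = ""
    timestamp: str = field(default_factory=_now)


class EventLog:
    """Append-only record of what happened to each preserved object."""

    def __init__(self) -> None:
        self._events: list[PreservationEvent] = []

    def record(self, event: PreservationEvent) -> PreservationEvent:
        self._events.append(event)
        return event

    def history(self, object_id: str) -> list[PreservationEvent]:
        """All events for one object, in the order they were recorded."""
        return [e for e in self._events if e.object_id == object_id]

    def to_json(self) -> str:
        return json.dumps([asdict(e) for e in self._events], indent=2)
```

Events are frozen once created, reflecting the idea that the history of an object is added to but never rewritten.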

81 Preservation Packages

82 Types of preservation packages According to OAIS: –Submission information package (SIP) –Archival information package (AIP) –Dissemination information package (DIP) The AIP is what is stored (potentially broken up into pieces) in the IU repository Metadata Encoding and Transmission Standard (METS) used to wrap various pieces together
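To show what “wrapping pieces together” with METS looks like, here is a minimal Python sketch that builds a METS document with a file section and a structural map. The element and attribute names come from the METS schema; the function name, the input dictionary keys, and the `TYPE` values are assumptions, and the result is far simpler than a real repository profile (no descriptive or administrative metadata sections).

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def _tag(name: str) -> str:
    """Qualify an element name with the METS namespace."""
    return f"{{{METS}}}{name}"


def build_mets(object_id: str, files: list[dict]) -> str:
    """Wrap files in METS; each dict needs id, use, href, md5 and mimetype keys."""
    root = ET.Element(_tag("mets"), {"OBJID": object_id})
    file_sec = ET.SubElement(root, _tag("fileSec"))
    struct_map = ET.SubElement(root, _tag("structMap"), {"TYPE": "physical"})
    top_div = ET.SubElement(struct_map, _tag("div"), {"TYPE": "recording"})

    groups: dict[str, ET.Element] = {}
    for f in files:
        # One fileGrp per use, e.g. preservation masters vs. derivatives.
        group = groups.get(f["use"])
        if group is None:
            group = ET.SubElement(file_sec, _tag("fileGrp"), {"USE": f["use"]})
            groups[f["use"]] = group
        file_el = ET.SubElement(group, _tag("file"), {
            "ID": f["id"],
            "MIMETYPE": f["mimetype"],
            "CHECKSUM": f["md5"],
            "CHECKSUMTYPE": "MD5",
        })
        ET.SubElement(file_el, _tag("FLocat"),
                      {"LOCTYPE": "URL", f"{{{XLINK}}}href": f["href"]})
        # The structural map points back at each file by its ID.
        ET.SubElement(top_div, _tag("fptr"), {"FILEID": f["id"]})
    return ET.tostring(root, encoding="unicode")
```

Keeping the checksum on each `file` element means the package carries the information needed to validate its own contents after transfer.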

83 Information representation Repository needs two simultaneous views of the content it manages –Physical files –Functions the repository needs to support

84 Technical metadata Audio Engineering Society, Core Audio Schema Draft. AES X098-B/SC-03-06. Also record for analog source object!

85 Digital provenance metadata Audio Engineering Society, Audio Processing History Draft. AES X098-C/SC-03-06.

86 Structural metadata (1) Audio Engineering Society, Audio Decision List. AES 31-3 and Metadata Encoding and Transmission Standard (METS), section

87 Structural metadata (2) Audio Engineering Society, Audio Decision List. AES 31-3 and Metadata Encoding and Transmission Standard (METS), section

88 Rights metadata For field audio collections, the ATM knows: –Collector –Terms of deposit governing access This area still under development for the IU repository No decision yet on metadata format; need more thorough analysis of the functions this metadata needs to support

89 Descriptive metadata (1) MARCXML

90 Descriptive metadata (2) METS reference to external Word document

91 Preservation metadata Still under investigation for IU repository, for all formats of material Will need to implement before any preservation events occur Will likely use PREservation Metadata: Implementation Strategies (PREMIS) data dictionary and schema

92 Need to share Copies in multiple repositories can help ensure preservation Sound Directions did a test exchange of content between IU and Harvard –Different repository architectures –Different preservation package structures...demonstrated how different levels of preservation are possible

93 Two Repositories Supported by the Digital Library Program IUScholarWorks Repository –“Institutional Repository” For preserving and providing access to IU’s research output: articles, papers, etc. –Based on DSpace software IU Digital Library Repository –General-purpose digital content repository –Based on Fedora software

94 Fedora Flexible Extensible Digital Object Repository Architecture Open source digital repository software developed by Cornell and the University of Virginia Supported by new organization: Fedora Commons Basis for IU Digital Library Repository

95 Moving Content to a Digital Repository – Idealized Workflow Master audio files in MDSS Delivery audio files on streaming server Metadata records on disk ATMC/Audio Workstation Upload preservation package Temporary Server Disk Storage Fedora Repository Validate and ingest

96 IU Massive Data Storage System (MDSS) Hierarchical storage management –Some storage on hard disks –Much more storage on automated tape Managed by UITS Research Technologies Servers in Bloomington and Indianapolis connected via I-Light high-speed fiber link Total capacity: 2+ petabytes Need to build Fedora-MDSS connection

97 Repository Status Fedora is running in production –Supporting access to image and text collections –Experiments with loading audio and video Need to improve tools for ingest and retrieval to support audio projects Not yet a true preservation repository

98 Toward a Preservation Repository Need to add: –File integrity validation –Integration with MDSS – replication of data –Eventually, file format obsolescence monitoring and migration Self-audit and/or external certification as Trusted Digital Repository –DRAMBORA, TRAC

99 Access Systems Variations2 –Provides access to cataloged commercial recordings from the Music Library and ATM Need access system to provide discovery and delivery of field collections and other types of archival audio collections

100 Questions?
