Lifecycle Metadata for Digital Objects The Final Curtain December 4, 2006
Dramatis Personae Mundee Harrison Kaczmarczik Sevcik Bibb Holt Addison Cofield Keenan
MIX - Metadata for Images in XML Schema Currently under development Schema for a set of technical data elements required to manage digital image collections Useful for digitized text (page images)
Profiling the Dynamic Web Page What Is This Dynamic Business?!?! The Deep End: The Database of All Databases Dynamic Web Pages and the Metadata Sets Who Love Them Why Dynamic Web Pages Die Harvesters, Crawlers, and Extractors Picking and Choosing Metadata Decisions, Decisions
Sound Recording (Digitized) Use case: Student recitals recorded as analog, digitized for streaming access Challenge: Find schemas that apply to musical performances and are useful for searching Metadata standards: MPEG-7, DC/MODS Susan Harwood Kaczmarczik December 4, 2006
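As a rough illustration of the DC option named on this slide, the sketch below builds a simple Dublin Core description of a digitized recital. The `record` wrapper element, the `dc_record` helper, and every field value are hypothetical placeholders, not data from the project. Only the Dublin Core element namespace and the DCMI Type value "Sound" are standard.

```python
import xml.etree.ElementTree as ET

# Dublin Core Metadata Element Set namespace (standard).
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dc_record(fields):
    """Build a flat record from (element name, value) pairs.

    The unqualified <record> wrapper is an assumption for illustration;
    a real package would embed these elements in its own container.
    """
    record = ET.Element("record")
    for name, value in fields:
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record


# All values below are placeholders.
recital = dc_record([
    ("title", "Student Recital (hypothetical example)"),
    ("creator", "Performer name (placeholder)"),
    ("type", "Sound"),            # term from the DCMI Type Vocabulary
    ("format", "audio/mpeg"),     # streaming derivative
    ("source", "Analog recording (placeholder description)"),
])

print(ET.tostring(recital, encoding="unicode"))
```

A flat record like this is easy to search but says little about the performance itself (works performed, instruments, roles), which is the gap that MODS or MPEG-7 would have to fill.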
Preserving ETDs Major Issues--electronic theses and dissertations Fonts--embedded--unrecognized--hacked? Big list of Unicode: Active features--links, fields, encryption Solutions PDF/A--too simple & still in development Multi-page TIFF + "too big to fail" Administrative Degree candidacy elements
Digitized Moving Image: VHS *High Points* Extension Schema: LOC AV Prototype dmdSec: MODS amdSec techMD: VMD rightsMD: RMD sourceMD: VMD digiprovMD: PMD *Problems Encountered* Getting started Overwhelming file sizes Copyright Confusing technical terminology related to video
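The section layout listed on this slide can be sketched as an empty METS skeleton. This is a minimal sketch, not the profile itself: the `ID` values and the `add_section` and `build_skeleton` helpers are hypothetical, and the slide's VMD, RMD, and PMD labels are recorded here through `MDTYPE="OTHER"` with `OTHERMDTYPE` as an assumption about how the extension schemas would be declared. Note that the METS schema spells the last element `digiprovMD`.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def q(tag):
    """Return a namespace-qualified METS tag name."""
    return f"{{{METS_NS}}}{tag}"


def add_section(parent, tag, section_id, mdtype, other=None):
    """Add a metadata section holding an empty mdWrap/xmlData pair."""
    section = ET.SubElement(parent, q(tag), {"ID": section_id})
    attrs = {"MDTYPE": mdtype}
    if other:
        attrs["OTHERMDTYPE"] = other  # label taken from the slide
    wrap = ET.SubElement(section, q("mdWrap"), attrs)
    ET.SubElement(wrap, q("xmlData"))  # extension-schema content goes here
    return section


def build_skeleton():
    mets = ET.Element(q("mets"))
    # Descriptive metadata: MODS
    add_section(mets, "dmdSec", "DMD1", "MODS")
    # Administrative metadata, in the order METS requires
    amd = ET.SubElement(mets, q("amdSec"), {"ID": "AMD1"})
    add_section(amd, "techMD", "TECH1", "OTHER", "VMD")
    add_section(amd, "rightsMD", "RIGHTS1", "OTHER", "RMD")
    add_section(amd, "sourceMD", "SOURCE1", "OTHER", "VMD")
    add_section(amd, "digiprovMD", "PROV1", "OTHER", "PMD")
    return mets


skeleton = build_skeleton()
print(ET.tostring(skeleton, encoding="unicode"))
```

The same video schema appears twice because `sourceMD` describes the VHS original while `techMD` describes the digitized file.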
DSpace SIP Profile for a Born Digital Audio Music File Preservation Issues - Formats and Guidelines Controlled Vocabularies - Library of Congress Subject Headings - Getty Thesaurus of Geographic Names - MARC Value List for Relators and Roles - DCMI Type Vocabulary - ISO Extension Schemas - MODS - Creative Commons - AUDIOMD - LC-AV Audio Metadata Extension Schema
Born Digital Still Images Similar lifecycle to digitized. MD not always stored. Primarily use NISO MIX format (includes EXIF, GPS). Images are numerical representations - different image formats compress differently - some need special MD. NISO MIX contains many fields that are seemingly unimportant but may be valuable as evidence. NISO MIX also includes many fields completely unintelligible to the layman, referenced or not. The previous two factors can spell trouble if the preservationist is not an expert! EXIF would help, but there is not a 1:1 ratio of information. Metadata is meant to help understand transformation, not to “step backwards” to recreate images although this is possible with sufficient detail. Addison 4 DEC 06
Born Digital Spoken Word Oral History Audio .WAV .MP3 .TXT AUDIOMD & PREMIS TEXTMD & PREMIS TEI Encoding Rules of Description - Name and Date formats HASSET Getty Thesaurus of Geographic Names LCSH
METS SIP Profile for Spreadsheets Melissa Keenan Preservation issues: –Saving formulas –Proprietary format –Open Document Format for Office Applications (ISO/IEC 26300:2006) Metadata: EAD (use case is archival) MathML (complex formulas) Automatically generated by Microsoft Microsoft.Office.Tools.Excel PREMIS
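To illustrate why an open format helps with the "saving formulas" issue: in OpenDocument spreadsheets, a cell's formula is stored as a `table:formula` attribute in the package's `content.xml`, so it can be read back with ordinary XML tools. The sketch below assumes the XML has already been extracted from the `.ods` package. The `list_formulas` helper and the `SAMPLE` fragment are hypothetical and heavily simplified compared with a real `content.xml`.

```python
import xml.etree.ElementTree as ET

# OpenDocument table namespace.
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"


def list_formulas(content_xml):
    """Return the formula string of every cell that carries one."""
    root = ET.fromstring(content_xml)
    formulas = []
    for cell in root.iter(f"{{{TABLE_NS}}}table-cell"):
        formula = cell.get(f"{{{TABLE_NS}}}formula")
        if formula is not None:
            formulas.append(formula)
    return formulas


# Simplified, hypothetical fragment; a real file wraps tables in
# office:document-content and adds value and style attributes.
SAMPLE = """<table:table
    xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
    table:name="Sheet1">
  <table:table-row>
    <table:table-cell/>
    <table:table-cell table:formula="=SUM([.A1:.A2])"/>
  </table:table-row>
</table:table>"""

print(list_formulas(SAMPLE))
```

A check like this could be run at ingest to record whether a spreadsheet contains formulas at all, which affects how much is lost if only the displayed values are preserved.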