Implementing PREMIS in Container Formats Rebecca Guenther, Library of Congress Zhiwu Xie, Los Alamos National Laboratory IS&T’s.

Slides:



Advertisements
Similar presentations
A Tour of the OAIS Reference Model Brian Lavoie Research Scientist Office of Research OCLC Museum Computer Network Annual Conference September 2002.
Advertisements

METS: Metadata Encoding & Transmission Standard Merrilee Proffitt Society of American Archivists August 2002.
Preservation Rumination Priscilla Caplan, FCLA OCLC DSS February 16, 2005.
The PREMIS Data Dictionary Michael Day Digital Curation Centre UKOLN, University of Bath JORUM, JISC and DCC.
Introduction to METS (Metadata Encoding and Transmission Standard) Jerome McDonough New York University
Preservation Metadata Initiatives: Practicality, Sustainability, and Interoperability Michael Day UKOLN, University of Bath ERPANET Training.
Standards showcase: MODS, METS, MARCXML ALA Annual 2006 Rebecca Guenther and Jackie Radebaugh Network Development and MARC Standards Office Library of.
METS: An Introduction Towards a Digital Object Standard Rick Beaubien Library Systems Office U.C. Berkeley.
An Introduction June 17, 2013 Open Archival Information System (OAIS)
Digital Preservation - Its all about the metadata right? “Metadata and Digital Preservation: How Much Do We Really Need?” SAA 2014 Panel Saturday, August.
An Introduction to Preservation Metadata and the PREMIS Data Dictionary Rebecca Guenther, Library of Congress ALA Midwinter 2009 Intellectual Access to.
Understanding and Implementing the PREMIS Data Dictionary for Preservation Metadata Rebecca Guenther, Library of Congress Digital Preservation Partners’
Fedora 3.0 and METS: A Partnership for the Organization, Presentation and Preservation of Digital Objects Open Repositories Georgia Tech, Atlanta,
Mark Evans, Tessella Digital Preservation Boot Camp – PASIG meeting, Washington DC, 22 nd May 2013 PREMIS Practical Strategies For Preservation Metadata.
Common Use Cases for Preservation Metadata Deborah Woodyard-Robinson Digital Preservation Consultant Long-term Repositories:
3. Technical and administrative metadata standards Metadata Standards and Applications.
Merrilee Proffitt e(X)literature / Digital Cultures Project April 2003 News from the Digital Library The Metadata Encoding and Transmission Standard; the.
Keeping the pieces together: The Role of METS in the Preservation of Digital Content Robin Wendler Harvard University Library January 16, 2005 [Men in.
PREMIS What is PREMIS? o Preservation Metadata Implementation Strategies When is PREMIS use? o PREMIS is used for “repository design, evaluation, and archived.
AIP Archival Information Package – Defines how digital objects and its associated metadata are packaged using XML based files. METS (binding file) MODS.
Rebecca Guenther Library of Congress
Metadata Standards and Applications 4. Metadata Syntaxes and Containers.
METS Intro & Overview Mets Opening Day Germany May 7, 2007 Nancy J. Hoebelheinrich Stanford University Libraries.
Metadata for preservation Michael Day, UKOLN, University of Bath Chinese-European Workshop on Digital Preservation,
Documenting to preserve your data: metadata in support of digital preservation Michael Day, UKOLN, University of Bath
METS-Based Cataloging Toolkit for Digital Library Management System Dong, Li Tsinghua University Library
3. Technical and administrative metadata standards Metadata Standards and Applications Workshop.
Addressing Metadata in the MPEG-21 and PDF-A ISO Standards NISO Workshop: Metadata on the Cutting Edge May 2004 William G. LeFurgy U.S. Library of Congress.
International Council on Archives Section on University and Research Institution Archives Michigan State University September 7, 2005 Preserving Electronic.
Metadata in support of digital preservation Michael Day, UKOLN, University of Bath Beginners Guide to Metadata:
Jenn Riley Metadata Librarian Indiana University Digital Library Program.
How to build your own Dark Archive (in your spare time) Priscilla Caplan FCLA.
Author(s): Paul Conway, Ph.D., 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution–Noncommercial–Share.
ESIP 2009 Summer Meeting, UC Santa Barbara, CA, July 7 – 10, Stanford Digital Repository PREMIS & Geospatial Resources Nancy J. Hoebelheinrich InfoAnalytics.
PREMIS Tutorial: Understanding & Implementing the PREMIS Data Dictionary for Preservation Metadata Rebecca Guenther, Library of Congress Brian Lavoie,
An Introduction to METS Morgan Cundiff Network Development and MARC Standards Office Library of Congress Metadata Encoding and Transmission Standard.
DAITSS: Dark Archive in the Sunshine State Priscilla Caplan, Florida Center for Library Automation DCC Workshop on Long-term Curation within Digital Repositories.
Archival Information Packages for NASA HDF-EOS Data R. Duerr, Kent Yang, Azhar Sikander.
PREMIS Rathachai Chawuthai Information Management CSIM / AIT.
Creating Archive Information Packages for Data Sets: Early Experiments with Digital Library Standards Ruth Duerr, NSIDC MiQun Yang, THG Azhar Sikander,
Implementor’s Panel: BL’s eJournal Archiving solution using METS, MODS and PREMIS Markus Enders, British Library DC2008, Berlin.
The FCLA Digital Archive Joint Meeting of CSUL Committees, 2005.
Implementation of PREMIS in METS Rebecca Guenther Sr. Networking & Standards Specialist, Library of Congress PREMIS Implementation Fair San.
Evolving MARC 21 for the future Rebecca Guenther CCS Forum, ALA Annual July 10, 2009.
Habing1 Integrating PREMIS and METS PREMIS Tutorial Implementers’ Panel June 21, 2007, 9:00-5:30 Library of Congress, Jefferson Building, Whittall.
OCLC Online Computer Library Center Preservation Metadata Standards PREMIS & METS Taylor Surface, OCLC.
Conceptual Data Modelling for Digital Preservation Planets and PREMIS Angela Dappert.
IMPLEMENTATION ISSUES. How PREMIS can be used  For systems in development as a basis for metadata definition  For existing repositories as a checklist.
Metadata for digital preservation: a review of recent developments Michael Day UKOLN, University of Bath ECDL2001, 5th European Conference.
PREMIS at the British Library Markus Enders, The British Library PREMIS Implementation Fair, San Fransisco, CA 07 October 2009.
PREMIS Tutorial: Understanding & Implementing the PREMIS Data Dictionary for Preservation Metadata Rebecca Guenther, Library of Congress Olaf Brandt, BStU.
Metadata “Data about data” Describes various aspects of a digital file or group of files Identifies the parts of a digital object and documents their content,
PREMIS Data Dictionary and the Future of Preservation Metadata Brian Lavoie Research Scientist OCLC Research Society of American Archivists.
The OAIS Reference Model Michael Day, Digital Curation Centre UKOLN, University of Bath Reference Models meeting,
Preservation metadata and the Cedars project Michael Day UKOLN: UK Office for Library and Information Networking University of Bath
Preservation Metadata Initiatives: Status and Direction Brian Lavoie Senior Research Scientist Office of Research OCLC Archiving Web Resources Canberra.
Lifecycle Metadata for Digital Objects November 15, 2004 Preservation Metadata.
The OAIS Reference Model and Trustworthy Repositories Josh Lubell Manufacturing Engineering Laboratory NIST
Cedars work on metadata Michael Day UKOLN, University of Bath Cedars Workshop Manchester, February 2002.
An Introduction to PREMIS Jenn Riley Metadata Librarian IU Digital Library Program.
Joint Meeting of CSUL Committees,
Building A Repository for Digital Objects
DAITSS: Dark Archive in the Sunshine State
Integrating PREMIS and METS
Rebecca Guenther, Library of Congress Brian Lavoie, OCLC
Metadata for preservation
Metadata in Digital Preservation: Setting the Scene
Oya Y. Rieger Cornell University Library May 2004
Open Archival Information System
Introduction to METS (Metadata Encoding and Transmission Standard)
Presentation transcript:

Implementing PREMIS in Container Formats Rebecca Guenther, Library of Congress Zhiwu Xie, Los Alamos National Laboratory IS&T’s Archiving 2007 Arlington, VA, May 23, 2007

OUTLINE  Introduction  OAIS Reference Model and Containers METS MPEG-21 DID  Implementing PREMIS in METS  Implementing PREMIS in MPEG-21 DID  Summary

Digital preservation: imperative and challenge  More and more of scholarly and cultural record exists in digital form; steps must be taken to secure its long-term future  Significant progress has been made in raising awareness about digital preservation imperative  Shift in focus from articulating problem to solving it … Not so much “Why is digital preservation important”, but “What must be done to achieve preservation objectives?”  Many practical challenges in implementing reliable, sustainable digital preservation programs One key challenge: preservation metadata

PREMIS Working Group  June 2003: OCLC, RLG sponsored new international working group: PREMIS: Preservation Metadata: Implementation Strategies  Membership: > 30 experts from 5 countries, representing libraries, museums, archives, government agencies, and the private sector Co-Chairs: Priscilla Caplan (FCLA), Rebecca Guenther (LC)  Objective 1: Identify and evaluate alternative strategies for encoding, storing, managing, and exchanging preservation metadata PREMIS Survey Report (September 2004) Snapshot of current practices/emerging trends related to managing and using preservation metadata in digital archiving systems  Objective 2: Define implementable, core preservation metadata, with guidelines/recommendations for management and use

PREMIS Data Model Intellectual Entities Objects Rights Agents Events

PREMIS Data Dictionary  May 2005: Data Dictionary for Preservation Metadata: Final Report of the PREMIS Working Group  237-page report includes: PREMIS Data Dictionary 1.0 Context/assumptions, data model, usage examples  Set of XML schema to support implementation  Data Dictionary: Comprehensive view of information needed to support digital preservation Guidelines/recommendations to support creation, use, management Used Framework as starting point Based on deep pool of institutional experiences in setting up and managing operational capacity for digital preservation  Received the 2005 Digital Preservation Award (UK) and 2006 Society of American Archivists Publication Award

Some guiding principles …  “Implementable, core, preservation metadata”: “Preservation metadata”: maintain viability, renderability, understandability, authenticity, identity in a preservation context “Core”: What most preservation repositories need to know to preserve digital materials over the long-term “Implementable”: rigorously defined; supported by usage guidelines/recommendations; emphasis on automated workflows  “Technical neutrality”: Digital archiving system: no assumptions about specific archiving technology, system/DB architectures, preservation strategy Metadata management: no assumptions about whether metadata is stored locally or in external registry; recorded explicitly or known implicitly; instantiated in one metadata element or multiple elements Promotes flexibility, applicability in wide range of contexts

Scope  What PREMIS DD is: Common data model for organizing/thinking about preservation metadata Guidance for local implementations Standard for exchanging information packages between repositories  What PREMIS DD is not: Out-of-the-box solution: need to instantiate as metadata elements in repository system All needed metadata: excludes business rules, format- specific technical metadata, descriptive metadata for access, non-core preservation metadata Lifecycle management of objects outside repository Rights management: limited to permissions regarding actions taken within repository

PREMIS Maintenance Activity  Web site: Permanent Web presence, hosted by Library of Congress Central destination for PREMIS-related info, announcements, resources Home of the PREMIS Implementers’ Group (PIG) discussion list  PREMIS Editorial Committee: Set directions/priorities for PREMIS development Coordinate future revisions of Data Dictionary and XML schema Membership: Library of Congress, OCLC, FCLA, National Archives of Scotland, British Library, National Library of Australia, U. of Goettingen, LANL, Ex Libris, Library Archives Canada

OAIS Reference Model and Containers

OAIS Reference Model  Developed by the Consultative Committee for Space Data Systems (CCSDS)  ISO 14721:2003  A functional model for preservation activities  An information model specifying types of information required for long-term preservation

OAIS Reference Model and PREMIS  OAIS reference model specifies the Preservation Description Information (PDI)  PREMIS used the OAIS information model as a starting point  PREMIS Data Dictionary consolidated and further developed the conceptual types of information objects into more than 100 structured and logically integrated semantic units.  PREMIS Data Dictionary provided detailed descriptions and guidelines to implement these semantic units.  PREMIS Data Dictionary does not provide semantic units for Intellectual Entities, but provides semantic units to link to other metadata sources for Intellectual Entities  All entities have reference (identification) information.  No “packaging information” that links content with metadata, but PREMIS can be used with container schemas  PREMIS deals mostly with representation, context, provenance, and fixity information, in keeping with PREMIS definition of preservation metadata.

PREMIS XML schemas  One schema for each PREMIS entity in data model Allows user to choose which parts of PREMIS to use  PREMIS container schema References schema for each entity type Provides a container if it is desirable to keep some or all PREMIS metadata together If using container requires at least an object which in turn requires objectIdentifier and objectCategory Individual schemas may used alone or with container  Semantic units in PREMIS schemas XML is faithful to data dictionary Only those units mandatory for all categories of objects are mandatory in object schema

Need a Container in the XML Implementation  Archival Information Package (AIP) may include much more metadata besides the preservation metadata  A well defined container is usually necessary to group and appropriately associate these metadata with the data object  For example: METS or MPEG-21 DID

 METS records the (possibly hierarchical) structure of digital objects, the names and locations of the files that comprise those objects, and the associated metadata  A METS document may be a unit of storage (e.g. OAIS AIP) or a transmission format (e.g. OAIS SIP or DIP)  METS is extensible and modular  METS uses extension “wrappers” or “sockets” where elements from other schemas can be plugged in  METS uses the XML Schema facility for combining vocabularies from different Namespaces  The METS Editorial Board has endorsed PREMIS as an extension schema  Many institutions trying to use PREMIS within the METS context

The structure of a METS file

Archival Information Package Descriptive Information Content Information described by derived from delimited by identifies further described by Representation Information Data Object Semantics Provenance Information Reference Information Fixity Information Context Information Preservation Description Information Packaging Information Structure described by premis:event MODS MARCXML DC premis:object metsRights premis:rights File formatspremis:object textMD MIX OAIS and METS Legend Black Arial = OAIS Red Times New Roman = METS Primary Schema Blue Times New Roman Italics = Extension Schema

METS extension schemas  “wrappers” or “sockets” where elements from other schemas can be plugged in  Provides extensibility  Uses the XML Schema facility for combining vocabularies from different Namespaces  Endorsed extension schemas: Descriptive: MODS, DC, MARCXML Technical metadata: MIX (image); textMD (text) Preservation related: PREMIS

Issues in using PREMIS with METS  Which METS sections to use and how many  Whether to record elements redundantly in PREMIS that are defined explicitly in the METS schema  How to record elements that are also part of a format specific technical metadata schema (e.g. MIX)  Recording structural relationships  How to deal with locally controlled vocabularies  Whether to use the PREMIS container

PREMIS and METS sections  Flexibility of METS requires implementation decisions  You can’t put all PREMIS metadata directly under amdSec  What sections to use for PREMIS metadata? Alternative 1 Object in techMD Event in digiProvMD Rights in rightsMD Agent with event or rights Alternative 2 Everything in digiProvMD Alternative 3 Everything in techMD  How many administrative MD sections to use?  Experimentation will result in best practices

Inserting technical metadata in a METS Document

SHA bc65c5b d09ad373eefd147382ecbf EchoDep/messageDigestOriginator> Elements defined in both METS and PREMIS: METS: Checksum, Checksumtype attribute of not repeatable  PREMIS: fixity also includes messageDigestOriginator allows multiples

<file ID="FID1" ADMID="TMD1PREMIS DP1EVENT DP1AGENT“ MIMETYPE="image/jpeg" <techMD ID="TMD1PREMIS“ image/jpeg 1.02 Elements defined both in METS and PREMIS: METS: MIMETYPE attribute of optional  PREMIS: more granular; includes name and version (although name may be MIMETYPE) mandatory

ECHODEP Hub Event echo12345 ECHODEP Hub Event echo12345 ingestion T15:12:53 Elements defined both in METS and PREMIS  METS ID/Idref: used to associate metadata in different sections and for different files  PREMIS identifiers: explicit linking between entity types

structural is sibling of UCB FID2 1 Elements defined both in METS and PREMIS:  METS: structMap details structural relationships and is the heart of the METS document hierarchical, so may be more expressive than PREMIS semantic units links the elements of the structure to content files and metadata  PREMIS: details all kinds of relationships, including structural data dictionary says that implementations may record by other means

How to record elements from 2 different technical metadata schemas  Format specific metadata may be included in addition to PREMIS general technical metadata  Use multiple techMD sections and specify source in MDType attribute and/or namespace declaration e.g. MDTYPE=“NISOIMG” or “PREMIS” Give MIX schema declaration in METS document  MIX was recently revised to correspond with the revision of the Z39.87 technical metadata for digital still images standard; names harmonized with corresponding PREMIS semantic units  For digital still images, best practice may be to use PREMIS for general semantic units defined in PREMIS and MIX for format specific units without redundancy

MPEG-21 Digital Item Declaration (DID)  ISO/IEC : Digital Item Declaration  A promising alternative to represent Digital Objects  Starting to get supported by some repositories, e.g., aDORe, DSpace, Fedora  A flexible and expressive model that easily represents compound objects (recursive “item”)  Attach well-formed XML from persistent namespaces as metadata  Strong industry support

Abstract Model for MPEG-21 DID resource component descriptor/statement item container resource: datastream component: binding of descriptor/statements to datastreams item: represents a Digital Item aka Digital Object aka asset. Descriptor/statement constructs convey information about the Digital Item container: grouping of items and descriptor/statement constructs pertaining to the container

Implementing PREMIS in DID  DID abstract model is an object-centric containment model  Semantically, Descriptor/statement constructs under a certain level are the metadata “about” that level of DID container or item or component.  Descriptor/statement about the DID container should be mapped to OAIS packaging information, therefore out of the PREMIS scope  Rights, Agents, and Events in the PREMIS model are linked to the objects, but not about the objects.  However, the PREMIS metadata as a whole (premis:premis), is about an object (the target of the preservation)

Mapping resource object3object4 premis: object premis:premis DIDInfo object2 object1 DID All rights, events, and agents go here. The top level object goes here. Other objects may be duplicated here or linked here. premis: object

Partial Implementation in DID resource object3object4 premis: format premis:significantProperties premis:premis DIDInfo object2 object1 DID When metadata are not sufficient to form the top level PREMIS elements, partial implementation may be done if PREMIS elements are globally defined. premis: creatingApplication

Examples of PREMIS in XML containers  PREMIS in METS: Portrait of Louis Armstrong (Library of Congress) Portrait of Louis Armstrong  PREMIS in MPEG DID: aDORe example (LANL) aDORe example

Proposed schema changes for new version  Define an abstract object type to allow for better validation of object category (representation, file, bitstream)  Define elements and types globally to allow for reuse  Implement an extensibility mechanism to provide for further structure when needed  Implement a mechanism to use controlled vocabularies  Adjust schemas to support changes in version 2 of data dictionary

Summary: container formats  A container format is needed to package together all forms of metadata (of which PREMIS is one) and digital content  Use of a container is compatible with and an implementation of the OAIS information package concept  Co-existence with other types of metadata requires best practices for both approaches; redundancy seems to be preferred  Changes to the next version of the PREMIS XML schemas will facilitate a phased approach to full PREMIS implementation  Development of registries (informal or formal) for controlled vocabularies will benefit implementation

Summary: METS vs. MPEG 21 DID  METS and MPEG DID are similar types of container formats in that both are expressed in XML, both represent the structure of digital objects, and both include metadata  MPEG DID doesn’t have the segmentation in metadata sections that METS does, so this implementation decision need not be made in DID  METS is open source and developed by open discussion, mainly cultural heritage community  MPEG DID is an ISO standard and has industry support, but is often implemented in a proprietary way and standards development is closed  It would be possible to transform a METS container to a MPEG DID and vice versa; development of stylesheets will enable transformations