PREMIS Tutorial: Understanding & Implementing the PREMIS Data Dictionary for Preservation Metadata Rebecca Guenther, Library of Congress Brian Lavoie,

Slides:



Advertisements
Similar presentations
PRESERVATION METADATA: IMPLEMENTATION STRATEGIES Preservation Metadata: The PREMIS Experience Priscilla Caplan Florida Center for Library Automation (FCLA)
Advertisements

Preservation Rumination Priscilla Caplan, FCLA OCLC DSS February 16, 2005.
The PREMIS Data Dictionary Michael Day Digital Curation Centre UKOLN, University of Bath JORUM, JISC and DCC.
Implementing PREMIS in Container Formats Rebecca Guenther, Library of Congress Zhiwu Xie, Los Alamos National Laboratory IS&T’s.
METS: An Introduction Structuring Digital Content.
The future’s so bright…. DAITSS DIGITAL PRESERVATION SYSTEM: RE-ARCHITECTED, RE- WRITTEN, AND OPEN SOURCE Priscilla Caplan Florida Center for Library Automation.
PREMIS Conformance. Agenda 1.NLNZ and NLB conformance exercise 2.History of PREMIS Conformance 3.Current status 4.Mapping to functionality.
Digital Preservation - Its all about the metadata right? “Metadata and Digital Preservation: How Much Do We Really Need?” SAA 2014 Panel Saturday, August.
An Introduction to Preservation Metadata and the PREMIS Data Dictionary Rebecca Guenther, Library of Congress ALA Midwinter 2009 Intellectual Access to.
Understanding and Implementing the PREMIS Data Dictionary for Preservation Metadata Rebecca Guenther, Library of Congress Digital Preservation Partners’
1 Extending the Implementation of PREMIS to Geospatial Resources in the Stanford Digital Repository: An Exploration By Nancy J. Hoebelheinrich Metadata.
Mark Evans, Tessella Digital Preservation Boot Camp – PASIG meeting, Washington DC, 22 nd May 2013 PREMIS Practical Strategies For Preservation Metadata.
The Promise of PREMIS: background, scope and purpose of the Data Dictionary for Preservation Metadata Rebecca Guenther, Library of Congress Long-term Repositories:
Common Use Cases for Preservation Metadata Deborah Woodyard-Robinson Digital Preservation Consultant Long-term Repositories:
3. Technical and administrative metadata standards Metadata Standards and Applications.
PREMIS What is PREMIS? – Preservation Metadata Implementation Strategies When is PREMIS use? – PREMIS is used for “repository design, evaluation, and archived.
Keeping the pieces together: The Role of METS in the Preservation of Digital Content Robin Wendler Harvard University Library January 16, 2005 [Men in.
US GPO AIP Independence Test CS 496A – Senior Design Fall 2010 Team members: Antonio Castillo, Johnny Ng, Aram Weintraub, Tin-Shuk Wong.
PREMIS What is PREMIS? o Preservation Metadata Implementation Strategies When is PREMIS use? o PREMIS is used for “repository design, evaluation, and archived.
PAWN: A Novel Ingestion Workflow Technology for Digital Preservation
THE RUTGERS WORKFLOW MANAGEMENT SYSTEM Mary Beth Weber Cataloging and Metadata Services Rutgers University Libraries August 3, 2007.
Descriptive Metadata o When will mods.xml be used by METS (aip.xml) ?  METS will use the mods.xml to encode descriptive metadata. Information that describes,
Introduction to Preservation Metadata
Rebecca Guenther Library of Congress
Metadata for preservation Michael Day, UKOLN, University of Bath Chinese-European Workshop on Digital Preservation,
METS-Based Cataloging Toolkit for Digital Library Management System Dong, Li Tsinghua University Library
3. Technical and administrative metadata standards Metadata Standards and Applications Workshop.
Statewide Digitization and the FCLA Digital Archive Priscilla Caplan, Florida Center for Library Automation Statewide Digitization Planners Meeting OCLC,
Jenn Riley Metadata Librarian Indiana University Digital Library Program.
How to build your own Dark Archive (in your spare time) Priscilla Caplan FCLA.
Author(s): Paul Conway, Ph.D., 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution–Noncommercial–Share.
ESIP 2009 Summer Meeting, UC Santa Barbara, CA, July 7 – 10, Stanford Digital Repository PREMIS & Geospatial Resources Nancy J. Hoebelheinrich InfoAnalytics.
Topic Rathachai Chawuthai Information Management CSIM / AIT Review Draft/Issued document 0.1.
DAITSS: Dark Archive in the Sunshine State Priscilla Caplan, Florida Center for Library Automation DCC Workshop on Long-term Curation within Digital Repositories.
Archival Information Packages for NASA HDF-EOS Data R. Duerr, Kent Yang, Azhar Sikander.
PREMIS Rathachai Chawuthai Information Management CSIM / AIT.
Creating Archive Information Packages for Data Sets: Early Experiments with Digital Library Standards Ruth Duerr, NSIDC MiQun Yang, THG Azhar Sikander,
Implementor’s Panel: BL’s eJournal Archiving solution using METS, MODS and PREMIS Markus Enders, British Library DC2008, Berlin.
The FCLA Digital Archive Joint Meeting of CSUL Committees, 2005.
Implementation of PREMIS in METS Rebecca Guenther Sr. Networking & Standards Specialist, Library of Congress PREMIS Implementation Fair San.
OCLC Online Computer Library Center Preservation Metadata Standards PREMIS & METS Taylor Surface, OCLC.
PREMIS Implementation Fair – SF 2009 PREMIS use in Rosetta Yair Brama – Ex Libris.
Conceptual Data Modelling for Digital Preservation Planets and PREMIS Angela Dappert.
The State of PREMIS Brian Lavoie Research Scientist OCLC PREMIS Implementation Fair San Francisco, CA October 7, 2009.
ETD2006 Preserving ETDs With D.A.I.T.S.S. FLORIDA CENTER FOR LIBRARY AUTOMATION FC LA PAPER AUTHORS: Chuck Thomas Priscilla.
OAIS Rathachai Chawuthai Information Management CSIM / AIT Issued document 1.0.
PREMIS Tutorial: Understanding & Implementing the PREMIS Data Dictionary for Preservation Metadata Rebecca Guenther, Library of Congress Olaf Brandt, BStU.
DAITSS: Dark Archive in the Sunshine State Priscilla Caplan Florida Center for Library Automation (FCLA)
PREMIS Data Dictionary and the Future of Preservation Metadata Brian Lavoie Research Scientist OCLC Research Society of American Archivists.
AGENTS, RIGHTS, EVENTS. Agents  The Agent entity aggregates information about agents (persons, organizations, or software) associated with rights management.
The OAIS Reference Model Michael Day, Digital Curation Centre UKOLN, University of Bath Reference Models meeting,
Preservation Metadata Initiatives: Status and Direction Brian Lavoie Senior Research Scientist Office of Research OCLC Archiving Web Resources Canberra.
Implementing PREMIS in DigiTool Michael Kaplan ALA 2007 Update.
DAITSS and the Florida Digital Archive Priscilla Caplan Florida Center for Library Automation iPRES 2006.
Lifecycle Metadata for Digital Objects November 15, 2004 Preservation Metadata.
Cedars work on metadata Michael Day UKOLN, University of Bath Cedars Workshop Manchester, February 2002.
An Introduction to PREMIS Jenn Riley Metadata Librarian IU Digital Library Program.
2/26/2004 Dan Swaney 1 Preservation Metadata and the OAIS Information Model A Metadata Framework to Support the Preservation of Digital Objects A review.
A Semi-Automated Digital Preservation System based on Semantic Web Services Jane Hunter Sharmin Choudhury DSTC PTY LTD, Brisbane, Australia Slides by Ananta.
Joint Meeting of CSUL Committees,
Preserving Digital Collections
FLORIDA CENTER FOR LIBRARY AUTOMATION
DAITSS: Dark Archive in the Sunshine State
DAITSS and the Florida Digital Archive
Statewide Digitization and the FCLA Digital Archive
Rebecca Guenther, Library of Congress Brian Lavoie, OCLC
Metadata for preservation
Metadata in Digital Preservation: Setting the Scene
Medusa at the University of Illinois
A Tale of Two Archives: Notes from the Dark Side
Presentation transcript:

PREMIS Tutorial: Understanding & Implementing the PREMIS Data Dictionary for Preservation Metadata Rebecca Guenther, Library of Congress Brian Lavoie, OCLC PREMIS Tutorial Library of Congress June 13 and 21, 2007

GOALS  Background and context of PREMIS Data Dictionary  Discuss PREMIS data model, identifiers, and relationships  Discuss semantic units defined in the Dictionary  Discuss major implementation issues  Show ways of representing PREMIS in XML PREMIS and METS  Discuss institutional experiences in working with the PREMIS Data Dictionary

INTRODUCTION: BACKGROUND AND CONTEXT

Digital preservation: imperative and challenge  More and more of scholarly and cultural record exists in digital form; steps must be taken to secure its long-term future  Significant progress has been made in raising awareness about digital preservation imperative  Shift in focus from articulating problem to solving it … Not so much “Why is digital preservation important”, but “What must be done to achieve preservation objectives?”  Many practical challenges in implementing reliable, sustainable digital preservation programs One key challenge: preservation metadata

Some background …  Pre-2002: various preservation metadata element sets released Different scopes, purposes, underlying models/assumptions No international standard; little consolidation of expertise/best practice  June 2002: Preservation Metadata Framework International working group (jointly sponsored by OCLC, RLG) Comprehensive, high-level description of types of information constituting preservation metadata Used OAIS reference model as starting point Set of “prototype” preservation metadata elements Consensus-based foundation for developing formal preservation metadata specifications … but not an “off-the-shelf, ready to implement” solution  Post-2002: Need implementable preservation metadata, with guidelines for application and use, relevant to a wide range of digital preservation systems and contexts Motivated formation of PREMIS Working Group

PREMIS Working Group  June 2003: OCLC, RLG sponsored new international working group: PREMIS: Preservation Metadata: Implementation Strategies  Membership: > 30 experts from 5 countries, representing libraries, museums, archives, government agencies, and the private sector Co-Chairs: Priscilla Caplan (FCLA), Rebecca Guenther (LC)  Objective 1: Identify and evaluate alternative strategies for encoding, storing, managing, and exchanging preservation metadata PREMIS Survey Report (September 2004) Snapshot of current practices/emerging trends related to managing and using preservation metadata in digital archiving systems  Objective 2: Define implementable, core preservation metadata, with guidelines/recommendations for management and use

PREMIS Data Dictionary  May 2005: Data Dictionary for Preservation Metadata: Final Report of the PREMIS Working Group  237-page report includes: PREMIS Data Dictionary 1.0 Context/assumptions, data model, usage examples  Set of XML schema to support implementation  Data Dictionary: Comprehensive view of information needed to support digital preservation Guidelines/recommendations to support creation, use, management Used Framework as starting point Based on deep pool of institutional experiences in setting up and managing operational capacity for digital preservation

2005 British Conservation Awards: Digital Preservation Award 2006 Society of American Archivists Preservation Publication Award

Some guiding principles …  “Implementable, core, preservation metadata”: “Preservation metadata”: maintain viability, renderability, understandability, authenticity, identity in a preservation context “Core”: What most preservation repositories need to know to preserve digital materials over the long-term “Implementable”: rigorously defined; supported by usage guidelines/recommendations; emphasis on automated workflows  “Technical neutrality”: Digital archiving system: no assumptions about specific archiving technology, system/DB architectures, preservation strategy Metadata management: no assumptions about whether metadata is stored locally or in external registry; recorded explicitly or known implicitly; instantiated in one metadata element or multiple elements Promotes flexibility, applicability in wide range of contexts

Scope  What PREMIS DD is: Common data model for organizing/thinking about preservation metadata Guidance for local implementations Standard for exchanging information packages between repositories  What PREMIS DD is not: Out-of-the-box solution: need to instantiate as metadata elements in repository system All needed metadata: excludes business rules, format- specific technical metadata, descriptive metadata for access, non-core preservation metadata Lifecycle management of objects outside repository Rights management: limited to permissions regarding actions taken within repository

PREMIS Maintenance Activity  Web site: Permanent Web presence, hosted by Library of Congress Central destination for PREMIS-related info, announcements, resources Home of the PREMIS Implementers’ Group (PIG) discussion list  PREMIS Editorial Committee: Set directions/priorities for PREMIS development Coordinate future revisions of Data Dictionary and XML schema Membership: Library of Congress, OCLC, FCLA, National Archives of Scotland, British Library, National Library of Australia, U. of Goettingen, LANL, Ex Libris, Library Archives Canada

Current activities  First revision of Data Dictionary (PREMIS 2.0) Documenting errata and proposed revisions to Data Dictionary (feedback through PIG list)  PREMIS Implementers’ Registry  Consultancies (funded by Library of Congress): Rights issues for digital preservation (Karen Coyle) PREMIS implementation guidelines and recommendations (Deborah Woodyard-Robinson)  PREMIS Tutorials: Glasgow, Boston, Stockholm, Albuquerque, Washington

DATA MODEL

The PREMIS Data Model  Data model includes: Entities: “things” relevant to digital preservation that are described by preservation metadata (Intellectual Entities, Objects, Events, Rights, Agents) Properties of Entities (semantic units) Relationships between Entities  Why have data model? Organizational convenience (for development and use) Useful framework for distinguishing applicability of semantic units across different types of Entities and different types of Objects But: not a formal entity-relationship model; not sufficient to design databases

PREMIS Data Model Intellectual Entities Objects Rights Agents Events

Intellectual Entity Examples:  Rabbit Run by John Updike (a book)  “Maggie at the beach” (a photograph)  The Library of Congress Website (a website)  The Library of Congress: American Memory Home page (a web page)  Set of content that is considered a single intellectual unit for purposes of management and description (e.g., a book, a photograph, a map, a database)  May include other Intellectual Entities (e.g. a website that includes a web page)  **Has one or more digital representations**  Not fully described in PREMIS DD, but can be linked to in metadata describing digital representation Int Entities Objects Events Agents Rights

Object Examples:  chapter1.pdf (a file)  chapter1.pdf + chapter2.pdf + chapter3.pdf (representation of a book w/3 chapters)  TIFF file containing header and 2 images (2 bitstreams (images), each with own set of properties (semantic units): e.g., identifiers, technical metadata, inhibitors, … )  Discrete unit of information in digital form  **Objects are what repository actually preserves**  Three types of Object: FILE: named and ordered sequence of bytes that is known by an operating system REPRESENTATION: set of files, including structural metadata, that, taken together, constitute a complete rendering of an Intellectual Entity BITSTREAM: data within a file with properties relevant for preservation purposes (but needs additional structure or reformatting to be stand-alone file) Int Entities Objects Events Agents Rights

Object Example 1: photo in two formats Intellectual Entity: “Picture of my dog” Representation1: TIFF version File 1: dog.TIFF Representation 2: JPEG2000 version File 2: dog.JP2 Bitstream 1: Embedded metadata

Object Example 2: book in two versions Intellectual Entity Da Vinci Code by Dan Brown Representation 1 Page image version File 1: page1.tiff File 2: page2.tiff File N: pageN.tiff File N+1: METS.xml Representation 2 ebook version File 1: book.lit

An important aside about Objects  Repository does NOT have to control Objects at all levels E.g., repository may only manage files, not representations or bit streams.  The PREMIS DD tells you: IF you control at the representation level, these are the semantic units (properties) that pertain to representations; IF you control at the file level, these are the semantic units (properties) that pertain to files; IF you control at the bit stream level, these are the semantic units (properties) that pertain to bit streams; AND IF you control at multiple levels, you need to record relationships between them (more on this soon).

Event Examples:  Validation Event: use JHOVE tool to verify that chapter1.pdf is a valid PDF file  Ingest Event: transform an OAIS SIP into an AIP (one Event or multiple Events?)  Migration Event: create a new version of an Object in an up-to- date format  An action that involves or impacts at least one Object or Agent associated with or known by the preservation repository  Helps document digital provenance. Can track history of Object through the chain of Events that occur during the Objects lifecycle  Determining which Events are in scope is up to the repository (e.g., Events which occur before ingest, or after de-accession)  Determining which Events should be recorded, and at what level of granularity is up to the repository Int Entities Objects Events Agents Rights

Agent Examples:  Priscilla Caplan (a person)  Florida Center for Library Automation (an organization)  Dark Archive in the Sunshine State implementation (a system)  JHOVE version 1.0 (a software program)  Person, organization, or software program/system associated with an Event or a Right (permission statement)  Agents are associated only indirectly to Objects through Events or Rights  Not defined in detail in PREMIS DD; not considered core preservation metadata beyond identification Int Entities Objects Events Agents Rights

Example:  Priscilla Caplan grants FCLA digital repository permission to make three copies of metadata_fundamentals.pdf for preservation purposes.  An agreement with a rights holder that grants permission for the repository to undertake an action(s) associated with an Object(s) in the repository.  Not a full rights expression language; focuses exclusively on permissions that take the form: Agent X grants Permission Y to the repository in regard to Object Z. Int Entities Objects Events Agents Rights

Semantic units  A semantic unit is a property of an Entity Something you need to know about an Object, Event, Agent, Right Piece of information most repositories need to know in order to carry out their digital preservation functions  Two kinds of semantic unit: Container: groups together related semantic units Semantic components: semantic units grouped under the same container  Example: ObjectIdentifier [container] ObjectIdentifierType [semantic component] ObjectIdentifierValue [semantic component]

Semantic units and metadata elements  A semantic unit is not a metadata element Metadata element is an implementation decision (how and whether a semantic unit is recorded in the system)  Examples: Semantic unit can be recorded in single metadata element, or multiple elements: Example: significantProperties: break up into separate elements for content, “look and feel”, and functionality, or record all in 1 element Semantic unit can be recorded explicitly, or known implicitly Example: IdentifierType: created/assigned internally by repository, assigned to all Objects, so no need to record  However it is implemented/recorded, a semantic unit should be recoverable from archiving system (broadly defined) PREMIS Data Dictionary describes semantic units relevant to most digital preservation activities and contexts

IDENTIFIERS AND RELATIONSHIPS

Identifiers  Instances of Objects, Events, Agents and Rights statements are uniquely identified by Identifiers [enitity]Identifier [entity]IdentifierType = a specification of the domain in which identifier is unique (e.g. URI, DOI, PURL) [entity]IdentifierValue = the identifier string itself ObjectIdentifier ObjectIdentifierType = DRS ObjectIdentifierValue = EventIdentifier EventIdentifierType = DRS EventIdentifierValue = Syntax Example

Some notes on Identifiers  “IdentifierType” optimally should contain sufficient information to indicate: How to build the value Who is the naming authority Example from previous slide: ObjectIdentifierType = “DRS” (Harvard’s Digital Repository Service). Could have also put “URL” (since identifier is unique in both domains) but “DRS” conveys more information.  If all identifiers are local to repository system, it is unlikely that IdentifierType would be recorded for each identifier in the system BUT should be supplied when exchanging data with others  Identifiers can be created inside or outside the repository Example: PURLs

Relationships  Many different types of information relevant to preservation can be expressed as relationships: e.g., “A is part of B”, “A is scanned from B”, “A is a version of B”  PREMIS Data Dictionary supports expression of relationships between: Different Objects Across same level or different levels Structural: relationships between parts of a whole Derivation: relationships resulting from replication or transformation of an Object Different Entities  Relationships are established through reference to Identifiers of other Objects or Entities

Relationships between Objects: Which, How, Why  WHICH Objects are related? relatedObjectIdentification: type, value relatedObjectSequence: documents “ordered” relationships: e.g., pages, chapters, slide #  HOW are the Objects related? relationshipType: structural, derivation relationshipSubType: “is part of”, “is source of”, “is derived from”  WHY are the Objects related? Was relationship result of an Event? (e.g., “migration”, “replication”) relatedEventIdentification: type, value relatedEventSequence: ordered sequence of Events Event 1: Convert Excel spreadsheet to ASCII tab-delimited file Event 2: Convert ASCII file to new spreadsheet format Avoids numerous bilateral format-to-format conversions

Example: Structural relationship File “is part of” Representation relationship [part of the description of File] relationshipType = structural relationshipSubType = is part of relatedObjectIdentification [the Web page] relatedObjectIdentifierType = repositoryID relatedObjectIdentifierValue = relatedObjectSequence = 0 relatedEventIdentification [none] File: graphic.gif Representation: Web Page is part of

Example: Derivation relationship File 1 “is source of” File 2 through Migration Event relationship [part of description of File 1] relationshipType = derivation relationshipSubType = is source of relatedObjectIdentification [identifier of File 2] relatedObjectIdentifierType = repositoryID relatedObjectIdentifierValue = F relatedObjectSequence [none] relatedEventIdentification [Migration Event ID] relatedEventIdentifierType = repEventID relatedEventIdentifierValue = E0192 relatedEventSequence [none] File 1 (original) is source of through event Migration Event File 2 (migrated)

Relationships between different Entities linkingIntellectualEntityIdentifier linkingIntellectualEntityIdentifierType linkingIntellectualEntityIdentifierValue linkingPermissionStatementIdentifier linkingPermissionStatementIdentifierType linkingPermissionStatementIdentifierValue linkingEventIdentifier – can you guess the two sub-elements?  Identifiers are used to link related Entities together  For example, an Object can link to one or more Intellectual Entities, Rights statements, and Events via “linking” semantic units Int Entities Objects Events Agents Rights

Data dictionary descriptions Semantic unit Name that is descriptive and unique. Use externally aids interoperability. Need not be used internally in repository. Semantic components If a container, lists its sub-elements. Each component has own entry. Definition Meaning of semantic unit Rationale Why the unit is needed (if not obvious) Data constraint How it should be encoded; ”Container”: an umbrella for two or more; no values given ”None”: can take any form ”Value should be taken from a controlled vocabulary” Object category Representation File Bit stream Applicability Whether it applies to the category of object Examples Illustrative examples of values Repeatability Whether it can take multiple values Obligation Whether values must be given. ”Mandatory”: something the repository must know independent of how or whether the repository records it. Means mandatory if applicable. If not explicitly recorded, it must be provided in exchange. Creation /maintenance notes Information about how values may be obtained or updated. Usage notes Information about intended use. For each level of Object

Sample Data Dictionary entry