PAWN: A Novel Ingestion Workflow Technology for Digital Preservation




Similar presentations
An Introduction to Repositories Thornton Staples Director of Community Strategy and Alliances Director of the Fedora Project.

October 28, 2003 Copyright MIT, 2003 METS repositories: DSpace MacKenzie Smith Associate Director for Technology MIT Libraries.
Fedora Users’ Conference Rutgers University May 14, 2005 Researching Fedora's Ability to Serve as a Preservation System for Electronic University Records.
PREMIS in Thought: Data Center for LC Digital Holdings Ardys Kozbial, Arwen Hutt, David Minor February 11, 2008.
ISO & OAI-PMH By Neal Harmeyer, Amy Hatfield, and Brandon Beatty PURDUE UNIVERSITY RESEARCH REPOSITORY.
1 Introduction to XML. XML eXtensible implies that users define tag content Markup implies it is a coded document Language implies it is a metalanguage.
ADAPT An Approach to Digital Archiving and Preservation Technology Principal Investigator: Joseph JaJa Lead Programmers: Mike Smorul and Mike McGann Graduate.
PAWN: Producer-Archive Workflow Network University of Maryland Institute for Advanced Computer Studies Joseph Ja’Ja, Mike Smorul, Mike McGann.
May Archiving PAWN: A Policy-Driven Software Environment for Implementing Producer- Archive Interactions in Support of Long Term Digital.
Tools and Services for the Long Term Preservation and Access of Digital Archives Joseph JaJa, Mike Smorul, and Sangchul Song Institute for Advanced Computer.
Producer-Archive Workflow Network (PAWN) Goals Consistent with the Open Archival Information System (OAIS) model Use of web/grid technologies and platform.
Supporting Customized Archival Practices Using the Producer-Archive Workflow Network (PAWN) Mike Smorul, Mike McGann, Joseph JaJa.
NextGRID & OGSA Data Architectures: Example Scenarios Stephen Davey, NeSC, UK ISSGC06 Summer School, Ischia, Italy 12 th July 2006.
Brief Overview of Major Enhancements to PAWN. Producer – Archive Workflow Network (PAWN) Distributed and secure ingestion of digital objects into the.
PREMIS What is PREMIS? o Preservation Metadata Implementation Strategies When is PREMIS use? o PREMIS is used for “repository design, evaluation, and archived.
July NAGARA 1 Producer-Archive Workflow Network Mike Smorul, Mike McGann, Joseph JaJa Institute for Advanced Computer Science Studies University.
Robust Tools for Archiving and Preserving Digital Data Joseph JaJa, Mike Smorul, and Mike McGann Institute for Advanced Computer Studies Department of.
WMS: Democratizing Data
PAWN Progress July 06, Overview of changes New flexible environment for setting up and managing interactions between producers and the archive Domains.
Robust Technologies for Automated Ingestion and Long-Term Preservation of Digital Information Principal Investigator: Joseph JaJa Lead Programmers: Mike.
PAWN: Producer-Archive Workflow Network University of Maryland Institute for Advanced Computer Studies Joseph JaJa, Mike Smorul, Mike McGann.
Mike Smorul Saurabh Channan Digital Preservation and Archiving at the Institute for Advanced Computer Studies University of Maryland, College Park.
UMIACS PAWN, LPE, and GRASP data grids Mike Smorul.
Robust Technologies for Automated Ingestion and Long-Term Preservation of Digital Information PI: Joseph JaJa Co-PIs: Allison Druin and Doug Oard Major.
PAWN: A Novel Ingestion Workflow Technology for Digital Preservation Mike Smorul, Joseph JaJa, Yang Wang, and Fritz McCall.
Archival Prototypes and Lessons Learned Mike Smorul UMIACS.
Metadata: Its Functions in Knowledge Representation for Digital Collections 1 Summary.
System Design/Implementation and Support for Build 2 PDS Management Council Face-to-Face Mountain View, CA Nov 30 - Dec 1, 2011 Sean Hardman.
Digital Object: A Virtual Online Storage Solution 598C Course Project Huajing Li.
METS-Based Cataloging Toolkit for Digital Library Management System Dong, Li Tsinghua University Library
Finding a New Way Richard Pearce-Moses Deputy Director for Technology & Information Resources Arizona State Library, Archives and Public Records Using.
San Diego Supercomputer Center University of California, San Diego Preservation Research Roadmap Reagan W. Moore San Diego Supercomputer Center
Implementing an Integrated Digital Asset Management System: FEDORA and OAIS in Context Paul Bevan DAMS Implementation Manager
1 XML as a preservation strategy Experiences with the DiVA document format Eva Müller, Uwe Klosa Electronic Publishing Centre Uppsala University Library,
Rule-Based Data Management Systems Reagan W. Moore Wayne Schroeder Mike Wan Arcot Rajasekar {moore, schroede, mwan, {moore, schroede, mwan,
OAIS Rathachai Chawuthai Information Management CSIM / AIT Issued document 1.0.
1 Schema Registries Steven Hughes, Lou Reich, Dan Crichton NASA 21 October 2015.
Archival Information Packages for NASA HDF-EOS Data R. Duerr, Kent Yang, Azhar Sikander.
Implementor’s Panel: BL’s eJournal Archiving solution using METS, MODS and PREMIS Markus Enders, British Library DC2008, Berlin.
Use & Access 26 March Use “Proof of Concept” Model for General Libraries & IS faculty Model for General Libraries & IS faculty Test bed for DSpace.
The FCLA Digital Archive Joint Meeting of CSUL Committees, 2005.
Persistent Digital Archives and Library System (PeDALS)
How to Implement an Institutional Repository: Part II A NASIG 2006 Pre-Conference May 4, 2006 Technical Issues.
Interoperability and Collection of Preservation Metadata for Digital Repository Content Matt Cordial, Tom Habing, Bill Ingram, Robert Manaster University.
Fedora and the Preservation of University Electronic Records Project NHPRC Electronic Records Research Grant Kevin L. Glick Manuscripts and Archives, Yale.
Metadata and Meta tag. What is metadata? What does metadata do? Metadata schemes What is meta tag? Meta tag example Table of Content.
M-1 INGEST OVERVIEW Don Sawyer National Space Science Data Center NASA/GSFC October 13, 1999.
DSpace System Architecture 11 July 2002 DSpace System Architecture.
Implementing PREMIS in DigiTool Michael Kaplan ALA 2007 Update.
Lifecycle Metadata for Digital Objects November 15, 2004 Preservation Metadata.
Rights Management for Shared Collections Storage Resource Broker Reagan W. Moore
NARA Report: NARA Persistent Archives Prototype Bill Underwood GTRI, Atlanta CCSDS, MOIMS DAI / IPR WGs Toulouse, 2 Nov-5 Nov 2004.
Building Preservation Environments with Data Grid Technology Reagan W. Moore Presenter: Praveen Namburi.
OAIS (archive) Producer Management Consumer. Representation Information Data Object Information Object Interpreted using its Yields.
OAIS (archive) OAIS (archive) Producer Management Consumer.
R2R ↔ NODC Steve Rutz NODC Observing Systems Team Leader May 12, 2011 Presented by L. Pikula, IODE OceanTeacher Course Data Management for Information.
PAWN: Producer-Archive Workflow Network
Joint Meeting of CSUL Committees,
Metadata Issues in Long-term Management of Data and Metadata
OAIS Producer (archive) Consumer Management
Building A Repository for Digital Objects
Policy-Based Data Management integrated Rule Oriented Data System
Joseph JaJa, Mike Smorul, and Sangchul Song
Implementing an Institutional Repository: Part II
How to Implement an Institutional Repository: Part II
Presentation transcript:

PAWN: A Novel Ingestion Workflow Technology for Digital Preservation Mike Smorul, Joseph JaJa, Yang Wang, Mike McGann, and Fritz McCall

Overall Principles
- Consistent with the Open Archival Information System (OAIS) model
- Distributed, secure ingestion
- Use of web/grid technologies (platform independent)
- Minimal client-side requirements
- Ease of integration with archival storage or data grid systems

Producer

Producer
- Provides data to an Archive based on a prior agreement.
- Consists of a management/metadata server and an ingestion client.
- Provides initial arrangement, context, and metadata.

Archive - receiving

Archive – receiving
- Receives data from a Producer.
- Validates bitstreams and metadata, and sends an acknowledgement to the Producer.
- Arranges data into collections and specifies preservation policy.
- Publishes bitstreams into a digital archive.

Archive – Long-term preservation
- Implemented using grid technologies.
- Uses the existing prototype NARA/UMD/SDSC site.
- Automated replication and integrity checking.
- Enforces access control and preservation policy.

Ingestion Workflow
1. Negotiate the Submission Agreement.
2. Initialize the workflow and create Submission Information Packages (SIPs).
3. Transfer SIPs to the archive.
4. Validate the SIP transfer.
5. Organize data into collections and transfer it into the persistent archive.
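The five steps form a strictly ordered pipeline. The following is a minimal Python sketch of that ordering. It assumes each SIP simply advances from one stage to the next, and the stage names are hypothetical labels rather than PAWN identifiers.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    # Hypothetical labels for the five ingestion steps, in workflow order
    NEGOTIATE_AGREEMENT = 1
    INITIALIZE_AND_CREATE_SIP = 2
    TRANSFER_SIP = 3
    VALIDATE_TRANSFER = 4
    ARRANGE_AND_ARCHIVE = 5


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage after `stage`, or None once data is in the persistent archive."""
    order = list(Stage)  # Enum iteration preserves definition order
    i = order.index(stage)
    return order[i + 1] if i + 1 < len(order) else None
```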

Submission Agreement
- Based on data appraisal and the records schedule, including format and metadata.
- Creates a machine-actionable set of rules describing the items.
- The final Submission Agreement is composed of:
  - a METS document supplying application defaults
  - a METS Constraints document that limits the METS document to the submission parameters

METS Overview
- METS (Metadata Encoding and Transmission Standard) provides a framework for linking the structural organization of objects with their metadata.
- Using XML namespaces, metadata from other XML schemas (e.g., Dublin Core, FGDC) can be attached to objects.
- Extensible for more complex metadata.
- http://www.loc.gov/standards/mets/
FGDC: Federal Geographic Data Committee
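The sketch below makes this linking concrete with Python's standard `xml.etree.ElementTree`. It builds a METS document containing three parts:

- Dublin Core descriptive metadata wrapped inline (`dmdSec`/`mdWrap`/`xmlData`).
- One file entry.
- A structural map whose `div` points at both.

The function name `build_mets` and the example values are hypothetical. Only the METS, Dublin Core, and XLink namespace URIs and element names come from the published standards.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
DC = "http://purl.org/dc/elements/1.1/"
XLINK = "http://www.w3.org/1999/xlink"
for prefix, uri in (("mets", METS), ("dc", DC), ("xlink", XLINK)):
    ET.register_namespace(prefix, uri)


def q(ns, tag):
    """Clark-notation qualified name, e.g. {http://www.loc.gov/METS/}div."""
    return f"{{{ns}}}{tag}"


def build_mets(title, file_id, href, mimetype):
    root = ET.Element(q(METS, "mets"))
    # Descriptive metadata from a third-party schema (Dublin Core), wrapped inline
    dmd = ET.SubElement(root, q(METS, "dmdSec"), ID="DMD1")
    wrap = ET.SubElement(dmd, q(METS, "mdWrap"), MDTYPE="DC")
    xml_data = ET.SubElement(wrap, q(METS, "xmlData"))
    ET.SubElement(xml_data, q(DC, "title")).text = title
    # Inventory of the files that make up the object
    grp = ET.SubElement(ET.SubElement(root, q(METS, "fileSec")), q(METS, "fileGrp"))
    f = ET.SubElement(grp, q(METS, "file"), ID=file_id, MIMETYPE=mimetype)
    ET.SubElement(f, q(METS, "FLocat"), {"LOCTYPE": "URL", q(XLINK, "href"): href})
    # Structural map: the div links the descriptive metadata (DMDID) to the file (fptr)
    struct_map = ET.SubElement(root, q(METS, "structMap"))
    div = ET.SubElement(struct_map, q(METS, "div"), LABEL=title, DMDID="DMD1")
    ET.SubElement(div, q(METS, "fptr"), FILEID=file_id)
    return root


print(ET.tostring(build_mets("R&D Project Case File", "F1",
                             "file:///data/case1.xml", "text/xml"), encoding="unicode"))
```

Because the Dublin Core element sits in its own namespace, the METS structure does not need to know anything about that schema. The same pattern would work for FGDC or other XML metadata.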

Sample METS Document

Why METS Constraints?
- METS does not provide a way to create machine-interpretable rules describing a collection, e.g., allowing only JPEG files in certain structural areas.
- METS profiles express rules that developers can interpret, but not rules that machines can act on.

METS Constraints
- Allow structural, metadata, and file constraints.
- Structural constraints: restrict child divs, and restrict pointers to divs, files, and other METS documents.
- File constraints: restrict files by MIME type or by validation tests.
- Metadata constraints: restrict the allowed metadata schemas.

METS Constraints - Template
<?xml version="1.0" encoding="UTF-8"?>
<mets …. >
  <!-- validation test section, referenced in the constraints document -->
  <amdSec>
    <techMD ID="xmltest">
      <mdWrap MDTYPE="OTHER">
        <xmlData>
          <val:validation NAME="xmltext" DESCRIPTION="Test for valid xml documents" MIMETYPE="text/xml">
            <val:valgrp required="true">
              <val:valtest name="xml" required="true">
                <val:description>generic xml test for any file</val:description>
              </val:valtest>
            </val:valgrp>
          </val:validation>
        </xmlData>
      </mdWrap>
    </techMD>
  </amdSec>
  <!-- base div structure to use for all clients -->
  <structMap>
    <div ID="ID1" LABEL="Research &amp; Development Records">
      <div ID="ID1.1" LABEL="Research &amp; Development Project Records">
        <div ID="ID1.1.1" LABEL="R&amp;D Project Case Files"/>
        <div ID="ID1.1.2" LABEL="R&amp;D Record Series"/>
      </div>
    </div>
  </structMap>
</mets>

METS Constraints - Rules
<?xml version="1.0" encoding="UTF-8"?>
<metsconstraint …>
  <filegrp ID="FILE1" NAME="Text Document">
    <!-- Files can be identified by MIMETYPE, by TESTID in the skeleton METS document, or both -->
    <file NAME="html document" MIMETYPE="text/html"/>
    <file TESTID="xmltext" NAME="xml document" MIMETYPE="text/xml"/>
  </filegrp>
  <!-- Apply rules to predefined divs and link them to the required file/metadata tests above -->
  <divrule DIVID="ID1" RESTRICTDIV="true" RESTRICTFTPR="true" RESTRICTMPTR="true"/>
  <divrule DIVID="ID1.1" RESTRICTDIV="true" RESTRICTFTPR="true" RESTRICTMPTR="true"/>
  <divrule DIVID="ID1.1.1" RESTRICTMPTR="true">
    <filetype FILEGROUPID="FILE1"/>
  </divrule>
  <divrule DIVID="ID1.1.2" RESTRICTMPTR="true"/>
</metsconstraint>
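The sketch below shows, in Python, one way a receiving server might act on these rules. It relies on an interpretation that the slides do not spell out:

- `RESTRICTDIV` forbids producers from adding child divs.
- `RESTRICTFTPR` forbids file pointers.
- `RESTRICTMPTR` forbids pointers to other METS documents.
- A `filetype` element limits a div to the MIME types listed in the referenced `filegrp`.

The root element's elided attributes are dropped, so no namespace is used. All helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified copy of the rules document. The root element's elided
# attributes are omitted, so this sketch assumes no XML namespace.
RULES = """<metsconstraint>
  <filegrp ID="FILE1" NAME="Text Document">
    <file NAME="html document" MIMETYPE="text/html"/>
    <file TESTID="xmltext" NAME="xml document" MIMETYPE="text/xml"/>
  </filegrp>
  <divrule DIVID="ID1" RESTRICTDIV="true" RESTRICTFTPR="true" RESTRICTMPTR="true"/>
  <divrule DIVID="ID1.1" RESTRICTDIV="true" RESTRICTFTPR="true" RESTRICTMPTR="true"/>
  <divrule DIVID="ID1.1.1" RESTRICTMPTR="true">
    <filetype FILEGROUPID="FILE1"/>
  </divrule>
  <divrule DIVID="ID1.1.2" RESTRICTMPTR="true"/>
</metsconstraint>"""


def load_rules(xml_text):
    """Parse divrules into a dict keyed by DIVID (attribute meanings are assumed)."""
    root = ET.fromstring(xml_text)
    # MIME types allowed by each file group
    groups = {
        g.get("ID"): {f.get("MIMETYPE") for f in g.findall("file")}
        for g in root.findall("filegrp")
    }
    rules = {}
    for r in root.findall("divrule"):
        allowed = set()
        for ft in r.findall("filetype"):
            allowed |= groups[ft.get("FILEGROUPID")]
        rules[r.get("DIVID")] = {
            "no_child_divs": r.get("RESTRICTDIV") == "true",
            "no_file_ptrs": r.get("RESTRICTFTPR") == "true",
            "no_mets_ptrs": r.get("RESTRICTMPTR") == "true",
            "mimetypes": allowed,  # empty set: no MIME restriction (assumption)
        }
    return rules


def check_file(rules, div_id, mimetype):
    """Violations for placing a file of `mimetype` under `div_id`."""
    rule = rules.get(div_id)
    if rule is None:
        return []  # assumption: divs without a rule are unconstrained
    if rule["no_file_ptrs"]:
        return [f"{div_id}: file pointers not allowed"]
    if rule["mimetypes"] and mimetype not in rule["mimetypes"]:
        return [f"{div_id}: MIME type {mimetype} not allowed"]
    return []


def check_child_div(rules, parent_id):
    """Violations for adding a new child div under `parent_id`."""
    rule = rules.get(parent_id)
    if rule is not None and rule["no_child_divs"]:
        return [f"{parent_id}: new child divs not allowed"]
    return []


def check_mets_ptr(rules, div_id):
    """Violations for pointing from `div_id` to another METS document."""
    rule = rules.get(div_id)
    if rule is not None and rule["no_mets_ptrs"]:
        return [f"{div_id}: METS pointers not allowed"]
    return []
```

Under this reading, a JPEG placed under `ID1.1.1` is rejected, while one placed under `ID1.1.2` passes, because only `ID1.1.1` carries a `filetype` restriction.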

Ingestion Workflow
1. Negotiate the Submission Agreement.
2. Initialize the workflow and create Submission Information Packages (SIPs).
3. Transfer SIPs to the archive.
4. Validate the SIP transfer.
5. Organize data into collections and transfer it into the persistent archive.

Initialize Ingestion Workflow
- Instantiate the Producer management server to track registered objects.
- Establish a working trust relationship with the Archive.
- Issue clients.

Create SIP
- Each client registers locally stored objects with the producer management server.
- Register file types, validation tests, etc.
- The client follows the rules in the Submission Agreement.
- Producer-wide agents can arrange registered objects to give them a broader context.
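Below is a minimal sketch of client-side registration. It assumes a hypothetical in-memory `ProducerRegistry` standing in for the producer management server's database. Each registered object records four things:

- its MIME type
- a SHA-256 digest
- its size
- the IDs of the validation tests it must pass

A later, producer-wide arrangement step places the object under a structural-map div.

```python
import hashlib
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List, Optional


@dataclass
class RegisteredObject:
    path: str
    mimetype: str
    sha256: str
    size: int
    tests: List[str] = field(default_factory=list)  # validation test IDs to run later
    div_id: Optional[str] = None  # set when the object is arranged


def sha256_of(path: Path) -> str:
    """Hash the file in chunks so large objects need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


class ProducerRegistry:
    """Hypothetical stand-in for the producer management server's object database."""

    def __init__(self) -> None:
        self.objects: Dict[str, RegisteredObject] = {}

    def register(self, path, mimetype, tests=()):
        p = Path(path)
        obj = RegisteredObject(str(p), mimetype, sha256_of(p), p.stat().st_size, list(tests))
        self.objects[str(p)] = obj
        return obj

    def arrange(self, path, div_id):
        """Producer-wide arrangement: place a registered object under a structMap div."""
        self.objects[str(Path(path))].div_id = div_id

    def unarranged(self):
        return [o.path for o in self.objects.values() if o.div_id is None]
```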

SIP Example
- METS handles all areas of a SIP except the Physical Object and Descriptive Information.
- Descriptive Information can be embedded into METS using third-party XML schemas.

Mapping SIP metadata to METS
- Packaging Information
  - The SIP exists in its entirety only during transit.
  - METS FLocat sections allow the metadata to be mapped to the physical object at the various stages of transit.
- Content Information
  - Physical Object: encoded in an HTTP/tar stream.
  - Representation Information: points to validation services at the archive rather than to viewers. The tests are assumed to be representative of viewers.

Mapping SIP metadata to METS (cont.)
- Preservation Description Information
  - Provenance: stacked file location (FLocat) tags.
  - Context: provided by the structural map section.
  - Reference: can be embedded in various descriptive metadata sections (Dublin Core, etc.).
  - Fixity: provided by the checksum recorded for each file.
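The provenance and fixity mappings can be sketched with `xml.etree.ElementTree`. Each METS `file` element carries a `CHECKSUM`/`CHECKSUMTYPE` pair. A new `FLocat` child is appended at every stage of transit, so the stack of locations doubles as a provenance trail. Recording the stage name in the `FLocat` `USE` attribute is an assumption of this sketch, not necessarily how PAWN encodes it, and the helper names are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"


def make_file_entry(file_id, data, mimetype):
    """METS <file> element whose CHECKSUM supplies the OAIS Fixity information."""
    return ET.Element(
        f"{{{METS}}}file",
        ID=file_id,
        MIMETYPE=mimetype,
        CHECKSUM=hashlib.sha256(data).hexdigest(),
        CHECKSUMTYPE="SHA-256",
    )


def record_location(file_elem, href, stage):
    """Append (stack) a new FLocat for the current transit stage; earlier ones are kept."""
    ET.SubElement(
        file_elem,
        f"{{{METS}}}FLocat",
        {"LOCTYPE": "URL", "USE": stage, f"{{{XLINK}}}href": href},
    )


def provenance(file_elem):
    """Read the stacked locations back as (stage, href) pairs, oldest first."""
    return [
        (loc.get("USE"), loc.get(f"{{{XLINK}}}href"))
        for loc in file_elem.findall(f"{{{METS}}}FLocat")
    ]
```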

Client Interface

Ingestion Workflow
1. Negotiate the Submission Agreement.
2. Initialize the workflow and create Submission Information Packages (SIPs).
3. Transfer SIPs to the archive.
4. Validate the SIP transfer.
5. Organize data into collections and transfer it into the persistent archive.

Transfer SIP to archive
- Retrieve the previously registered SIP from the producer management server.
- Authenticate to the archive.
- Update the provenance information in the METS document with the file structure of the SIP.
- Transfer the METS document describing the SIP, and a container for the SIP's physical objects.
- The archive acknowledges transfer completion to the producer management server.
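The slides describe the physical objects travelling in an HTTP/tar stream. Below is a minimal Python sketch of the container side, using the standard `tarfile` module over an in-memory buffer. The function names are hypothetical, and the METS document is assumed to travel separately. Authentication and the HTTP transport itself are left out.

```python
import io
import tarfile
from typing import Dict


def pack_objects(objects: Dict[str, bytes]) -> bytes:
    """Write each SIP physical object into an uncompressed tar stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in objects.items():
            info = tarfile.TarInfo(name)  # name is the path inside the container
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()  # buf stays open: tarfile does not close a caller's fileobj


def unpack_objects(stream: bytes) -> Dict[str, bytes]:
    """Receiving-server side: recover name -> bytes from the tar stream."""
    with tarfile.open(fileobj=io.BytesIO(stream), mode="r") as tar:
        return {m.name: tar.extractfile(m).read() for m in tar.getmembers() if m.isfile()}
```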

Ingestion Workflow
1. Negotiate the Submission Agreement.
2. Initialize the workflow and create Submission Information Packages (SIPs).
3. Transfer SIPs to the archive.
4. Validate the SIP transfer.
5. Organize data into collections and transfer it into the persistent archive.

Validation of SIP transfer
- Check the incoming SIP against the constraints documents.
- Ensure object integrity by verifying checksums/cryptographic digests.
- Validate bitstreams against the tests described in the METS document.
- Update the METS document with the validation results and the movement of objects on the receiving server.
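The integrity check can be sketched as a comparison between the checksums declared in the METS document and digests recomputed on the receiving server. This assumes SHA-256 digests. It also assumes a hypothetical `verify_fixity` helper that receives the manifest as a plain dictionary already extracted from the METS `CHECKSUM` attributes.

```python
import hashlib
from typing import Dict


def verify_fixity(manifest: Dict[str, str], received: Dict[str, bytes]) -> Dict[str, str]:
    """Compare declared SHA-256 digests with the bytes that actually arrived.

    manifest: name -> expected hex digest (taken from the METS document).
    received: name -> bytes as stored on the receiving server.
    Returns name -> 'ok' | 'missing' | 'checksum mismatch' | 'unexpected'.
    """
    results = {}
    for name, expected in manifest.items():
        if name not in received:
            results[name] = "missing"
        elif hashlib.sha256(received[name]).hexdigest() != expected.lower():
            results[name] = "checksum mismatch"
        else:
            results[name] = "ok"
    # Objects that arrived but were never declared in the METS document
    for name in received.keys() - manifest.keys():
        results[name] = "unexpected"
    return results
```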

Ingestion Workflow
1. Negotiate the Submission Agreement.
2. Initialize the workflow and create Submission Information Packages (SIPs).
3. Transfer SIPs to the archive.
4. Validate the SIP transfer.
5. Organize data into collections and transfer it into the persistent archive.

Final transfer to archive
- Transfer objects to the digital archive.
- Update the provenance information in the METS document with the handle of each object in the archive.
- Transfer the METS document into the archive.
- Return accept/reject messages to the producer metadata server.
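Below is a sketch of the final step. It assumes a hypothetical `ArchiveStore` interface that hides the storage backend (a data grid in PAWN's case) behind a single `put` call returning a handle. An in-memory backend stands in for the real archive. The returned handles are what would be written back into the METS provenance information, and the messages are what would go back to the producer.

```python
from abc import ABC, abstractmethod


class ArchiveStore(ABC):
    """Hypothetical abstract I/O layer in front of the digital archive."""

    @abstractmethod
    def put(self, name: str, data: bytes) -> str:
        """Store a bitstream and return a persistent handle."""


class InMemoryStore(ArchiveStore):
    """Stand-in backend; a real one might write to a data grid."""

    def __init__(self):
        self.objects = {}

    def put(self, name, data):
        handle = f"mem:{len(self.objects) + 1}/{name}"
        self.objects[handle] = data
        return handle


def final_transfer(store, objects, validation_results):
    """Publish objects whose validation result is 'ok'.

    Returns (handles, messages): handles to record in the METS provenance
    information, and accept/reject messages for the producer.
    """
    handles, messages = {}, {}
    for name, data in objects.items():
        if validation_results.get(name) == "ok":
            handles[name] = store.put(name, data)
            messages[name] = "accept"
        else:
            messages[name] = "reject"
    return handles, messages
```

Keeping `final_transfer` dependent only on the abstract interface means a different archive backend could be swapped in without touching the receiving-server logic.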

Component Overview

Producer Components
- Database to track registered objects.
- Certificate Authority management.
- Web service for archive security checks.
- The management server supplies web service interfaces to ingestion clients and management operations.
- Clients are designed to be standalone, with security certificates issued by the producer.

Archive Components
- Receiving servers validate connecting clients and validate SIPs.
- Validation services are simple web service calls.
- Abstract I/O layer into the digital archive.
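The slides describe validation services only as simple web service calls. The sketch below models them as a local registry that maps a test name to a function over the bitstream. The `xml` entry matches the `val:valtest name="xml"` test in the METS template and checks only that the bytes are well-formed XML. In a deployment, each entry would presumably wrap a remote call. That part, the registry, and the function names are all assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterable


def xml_wellformed(data: bytes) -> bool:
    """Stand-in for a remote 'xml' validation service: is the bitstream well-formed XML?"""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


# Hypothetical registry: test name (as referenced in the METS document) -> service
VALIDATION_SERVICES: Dict[str, Callable[[bytes], bool]] = {"xml": xml_wellformed}


def validate(test_names: Iterable[str], data: bytes) -> Dict[str, bool]:
    """Run each named test; unknown test names count as failures."""
    return {name: VALIDATION_SERVICES.get(name, lambda _: False)(data) for name in test_names}
```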

Recap
- Implemented using web technologies
- Architecture independent
- OAIS compliant
- XML-based metadata
- METS-based SIPs
- Add-on constraints describing the Submission Agreement