PAWN: A Novel Ingestion Workflow Technology for Digital Preservation


1 PAWN: A Novel Ingestion Workflow Technology for Digital Preservation
Mike Smorul, Joseph JaJa, Yang Wang, Mike McGann, and Fritz McCall

2 Overall Principles
Consistent with the Open Archival Information System (OAIS) model. Distributed, secure ingestion. Use of web/grid technologies (platform independent). Minimal client-side requirements. Ease of integration with archival storage or data grid systems.

3 Producer

4 Producer
Provides data to an Archive based on a prior agreement. Consists of a management/metadata server and an ingestion client. Provides initial arrangement, context, and metadata.

5 Archive - receiving

6 Archive – receiving
Receives data from a Producer. Validates bitstreams and metadata, and sends an acknowledgement to the Producer. Arranges data into collections and specifies preservation policy. Publishes bitstreams into a digital archive.

7 Archive – Long term preservation
Implemented using grid technologies. Uses the existing prototype NARA/UMD/SDSC site. Automated replication and integrity checking. Enforces access control and preservation policy.

8 Ingestion Workflow
Negotiate Submission Agreement. Workflow initialization and Submission Information Package (SIP) creation. Transfer of SIPs to archive. Validation of SIP transfer. Organization of data into collections and transfer into persistent archive.

9 Submission Agreement
Based on data appraisal and record schedule, including format and metadata. Create a machine-actionable set of rules describing items. The final Submission Agreement is composed of: (1) a METS document for application defaults; (2) a METS Constraint document that limits the METS form to the submission parameters.

10 METS Overview
Provides a framework for linking the structural organization of objects with metadata. Using XML namespaces, metadata from various XML schemas can be attached to objects, e.g., Dublin Core, FGDC (Federal Geographic Data Committee), etc. Extensible for more complex metadata.
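
As a rough illustration of the namespace mechanism described on this slide, the sketch below builds a very small METS tree with one Dublin Core title wrapped in a descriptive metadata section and linked to a structural div. It is not taken from PAWN: the function name `build_mets`, the IDs, and the sample label and title are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("dc", DC_NS)


def build_mets(label, title):
    """Build a tiny METS tree: one Dublin Core record linked to one div.

    Hypothetical helper for illustration; IDs such as "DMD1" are made up.
    """
    mets = ET.Element(f"{{{METS_NS}}}mets")

    # Descriptive metadata section wrapping third-party XML (Dublin Core).
    dmd = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", ID="DMD1")
    wrap = ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap", MDTYPE="DC")
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    ET.SubElement(xml_data, f"{{{DC_NS}}}title").text = title

    # Structural map: the div points back at its metadata through DMDID.
    struct_map = ET.SubElement(mets, f"{{{METS_NS}}}structMap")
    ET.SubElement(struct_map, f"{{{METS_NS}}}div", LABEL=label, DMDID="DMD1")
    return mets


doc = build_mets("R&D Project Case Files", "Sample case file")
print(ET.tostring(doc, encoding="unicode"))
```

Other schemas (FGDC, for instance) would be attached the same way, each under its own namespace inside an `xmlData` wrapper.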

11 Sample METS Document

12 Why METS Constraints?
METS doesn't provide a way to create machine-interpretable rules describing a collection, e.g., allow only JPEG files in certain structural areas. METS profiles allow for developer-interpretable rules, not machine-interpretable ones.

13 METS Constraints
Allows structural, metadata, and file constraints.
Structural Constraints: Restrict child divs and restrict pointers to div, file, and other METS documents.
File Constraints: Restrict files by MIME type or validation tests.
Metadata Constraints: Restrict allowed metadata schemas.

14 METS Constraints - Template
<?xml version="1.0" encoding="UTF-8"?>
<mets …. >
  <!-- validation test section, referenced in the constraints document -->
  <amdSec>
    <techMD ID="xmltest">
      <mdWrap MDTYPE="OTHER">
        <xmlData>
          <val:validation NAME="xmltext" DESCRIPTION="Test for valid xml documents" MIMETYPE="text/xml">
            <val:valgrp required="true">
              <val:valtest name="xml" required="true">
                <val:description>generic xml test for any file</val:description>
              </val:valtest>
            </val:valgrp>
          </val:validation>
        </xmlData>
      </mdWrap>
    </techMD>
  </amdSec>
  <!-- base div structure to use for all clients -->
  <structMap>
    <div ID="ID1" LABEL="Research &amp; Development Records">
      <div ID="ID1.1" LABEL="Research &amp; Development Project Records">
        <div ID="ID1.1.1" LABEL="R&amp;D Project Case Files"/>
        <div ID="ID1.1.2" LABEL="R&amp;D Record Series"/>
      </div>
    </div>
  </structMap>
</mets>

15 METS Constraints - Rules
<?xml version="1.0" encoding="UTF-8"?>
<metsconstraint …>
  <filegrp ID="FILE1" NAME="Text Document">
    <!-- Files can be identified either by MIMETYPE, or TESTID in skeleton METS document or both -->
    <file NAME="html document" MIMETYPE="text/html"/>
    <file TESTID="xmltext" NAME="xml document" MIMETYPE="text/xml"/>
  </filegrp>
  <!-- Apply rules to predefined div's and link to required file/metadata tests above -->
  <divrule DIVID="ID1" RESTRICTDIV="true" RESTRICTFTPR="true" RESTRICTMPTR="true"/>
  <divrule DIVID="ID1.1" RESTRICTDIV="true" RESTRICTFTPR="true" RESTRICTMPTR="true"/>
  <divrule DIVID="ID1.1.1" RESTRICTMPTR="true">
    <filetype FILEGROUPID="FILE1"/>
  </divrule>
  <divrule DIVID="ID1.1.2" RESTRICTMPTR="true"/>
</metsconstraint>
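
To make the idea of machine-interpretable rules concrete, here is a minimal sketch of how a client might read the rules above and decide whether a file may be attached to a given div. The element and attribute names are copied from the slide, but the interpretation is an assumption: a div accepts a file only if the file's MIME type appears in a file group linked from that div's rule. The helper names `allowed_mime_types` and `may_attach` are hypothetical, and the elided root attributes are simply omitted.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the rules from the slide (root attributes omitted).
RULES = """<metsconstraint>
  <filegrp ID="FILE1" NAME="Text Document">
    <file NAME="html document" MIMETYPE="text/html"/>
    <file TESTID="xmltext" NAME="xml document" MIMETYPE="text/xml"/>
  </filegrp>
  <divrule DIVID="ID1" RESTRICTDIV="true" RESTRICTFTPR="true" RESTRICTMPTR="true"/>
  <divrule DIVID="ID1.1.1" RESTRICTMPTR="true">
    <filetype FILEGROUPID="FILE1"/>
  </divrule>
</metsconstraint>"""


def allowed_mime_types(rules, div_id):
    """Collect the MIME types of every file group linked to one div rule."""
    groups = {g.get("ID"): g for g in rules.findall("filegrp")}
    allowed = set()
    for rule in rules.findall("divrule"):
        if rule.get("DIVID") != div_id:
            continue
        for file_type in rule.findall("filetype"):
            group = groups.get(file_type.get("FILEGROUPID"))
            if group is None:
                continue
            for f in group.findall("file"):
                if f.get("MIMETYPE"):
                    allowed.add(f.get("MIMETYPE"))
    return allowed


def may_attach(rules, div_id, mime_type):
    """Assumed semantics: a div with no linked file group accepts no files."""
    return mime_type in allowed_mime_types(rules, div_id)


rules = ET.fromstring(RULES)
print(may_attach(rules, "ID1.1.1", "text/html"))   # HTML is in group FILE1
print(may_attach(rules, "ID1.1.1", "image/jpeg"))  # JPEG is not listed
```

A fuller check would also run the validation test named by `TESTID`; that step is left out because the slides do not show how the tests are invoked.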

16 Ingestion Workflow
Negotiate Submission Agreement. Workflow initialization and Submission Information Package (SIP) creation. Transfer of SIPs to archive. Validation of SIP transfer. Organization of data into collections and transfer into persistent archive.

17 Initialize Ingestion workflow
Instantiate the Producer management server to track registered objects. Establish a working trust relationship with the Archive. Issue clients.

18 Create SIP
Each client registers objects stored locally with the producer management server. Register file types, validation tests, etc. The client follows the rules in the Submission Agreement. Producer-wide agents can arrange registered objects to give a broader context.

19 SIP Example
METS handles all areas of a SIP except Physical Object and Descriptive Information. Descriptive Information can be embedded into METS as third-party XML schema.

20 Mapping SIP metadata to METS
Packaging Information: the SIP only exists in its entirety during transit. METS FLocat sections allow mapping of metadata to the physical object at various stages in transit.
Content Information: the Physical Object is encoded in an HTTP/tar stream. Representation Information points to validation services at an archive rather than to a viewer; tests are assumed to be representative of viewers.

21 Mapping SIP metadata to METS (cont.)
Preservation Description Information:
Provenance: stacked file location tags.
Context: provided by the structural map section.
Reference: can be embedded in various descriptive metadata sections (Dublin Core, etc.).
Fixity: provided by checksums in each file.
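
One way to picture the "stacked file location tags" used for provenance is a METS `file` element that gains one more `FLocat` child each time the object moves. The sketch below assumes that reading; the helper names, the file ID, and all three locations are placeholders invented for the example, not values from PAWN.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def push_location(file_el, loctype, href, other_loctype=None):
    """Append one more FLocat to a file element (hypothetical helper)."""
    attrs = {"LOCTYPE": loctype, f"{{{XLINK_NS}}}href": href}
    if other_loctype:
        attrs["OTHERLOCTYPE"] = other_loctype
    return ET.SubElement(file_el, f"{{{METS_NS}}}FLocat", attrs)


def current_location(file_el):
    """The most recently stacked location, or None if nothing is recorded."""
    locations = file_el.findall(f"{{{METS_NS}}}FLocat")
    return locations[-1].get(f"{{{XLINK_NS}}}href") if locations else None


file_el = ET.Element(f"{{{METS_NS}}}file", ID="F1", MIMETYPE="text/xml")

# Placeholder locations for three stages of transit (all assumed).
push_location(file_el, "OTHER", "/home/producer/report.xml", other_loctype="SYSTEM")
push_location(file_el, "URL", "https://receiving.example.org/incoming/report.xml")
push_location(file_el, "HANDLE", "hdl:0000/placeholder")

print(current_location(file_el))
print(len(file_el.findall(f"{{{METS_NS}}}FLocat")), "locations recorded")
```

Because earlier entries are never removed, the list of locations doubles as a movement history for the object.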

22 Client Interface

23 Ingestion Workflow
Negotiate Submission Agreement. Workflow initialization and Submission Information Package (SIP) creation. Transfer of SIPs to archive. Validation of SIP transfer. Organization of data into collections and transfer into persistent archive.

24 Transfer SIP to archive
Retrieve the previously registered SIP from the producer management server. Authenticate to the archive. Update provenance information in the METS document with the file structure of the SIP. Transfer the METS document describing the SIP and the container for the SIP physical objects. The archive acknowledges transfer completion to the producer management server.
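
The "container for SIP physical objects" is described elsewhere in the deck as an HTTP/tar stream. The sketch below shows only the tar half of that idea, using Python's standard `tarfile` module in streaming mode; it makes no claim about PAWN's actual wire format, and the function names and sample objects are assumptions.

```python
import io
import tarfile


def pack_objects(objects):
    """Write name -> bytes pairs into an uncompressed tar stream."""
    buf = io.BytesIO()
    # "w|" writes a non-seekable stream, as one would over a network socket.
    with tarfile.open(fileobj=buf, mode="w|") as tar:
        for name, data in objects.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def unpack_objects(stream_bytes):
    """Read a tar stream sequentially and return name -> bytes pairs."""
    result = {}
    with tarfile.open(fileobj=io.BytesIO(stream_bytes), mode="r|") as tar:
        for member in tar:
            if member.isfile():
                result[member.name] = tar.extractfile(member).read()
    return result


# Sample objects invented for the example.
sip_objects = {
    "ID1.1.1/report.xml": b"<report/>",
    "ID1.1.1/index.html": b"<html></html>",
}
stream = pack_objects(sip_objects)
print(sorted(unpack_objects(stream)))
```

In a real transfer the stream would be sent as the body of an authenticated HTTP request alongside the METS document rather than held in memory.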

25 Ingestion Workflow
Negotiate Submission Agreement. Workflow initialization and Submission Information Package (SIP) creation. Transfer of SIP to archive. Validation of SIP transfer. Organization of data into collections and transfer into persistent archive.

26 Validation of SIP transfer
Check the incoming SIP against the constraints documents. Ensure object integrity by verifying checksums/cryptographic digests. Validate bitstreams against the tests described in the METS document. Update the METS document with validation results and the movement of objects on the receiving server.
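
The integrity step can be sketched as recomputing a digest over the received bitstream and comparing it with the value recorded by the producer. The slides do not say which digest algorithm PAWN uses, so SHA-256 below is an assumption, as are the function names.

```python
import hashlib
import io


def digest_stream(stream, algorithm="sha256", chunk_size=65536):
    """Hash a binary stream in chunks so large objects need not fit in memory."""
    hasher = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
    return hasher.hexdigest()


def verify_fixity(stream, expected_hex, algorithm="sha256"):
    """True when the recomputed digest matches the recorded one."""
    return digest_stream(stream, algorithm) == expected_hex.lower()


# Sample payload invented for the example.
payload = b"example bitstream"
recorded = hashlib.sha256(payload).hexdigest()

print(verify_fixity(io.BytesIO(payload), recorded))         # intact copy
print(verify_fixity(io.BytesIO(payload + b"!"), recorded))  # altered copy
```

A mismatch would be written back into the METS document as a failed validation result and reported to the producer.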

27 Ingestion Workflow
Negotiate Submission Agreement. Workflow initialization and Submission Information Package (SIP) creation. Transfer of SIP to archive. Validation of SIP transfer. Organization of data into collections and transfer into persistent archive.

28 Final transfer to archive
Transfer objects to the digital archive. Update provenance information in the METS document with a handle to the object in the archive. Transfer the METS document into the archive. Return accept/reject messages to the producer metadata server.

29 Component Overview

30 Producer Components
Database to track registered objects. Certificate Authority management. Web service for archive security check. Management server supplies web service interfaces to ingestion clients and management operations. Clients are designed to be standalone, with security certificates issued by the producer.

31 Archive Components
Receiving servers validate connecting clients and validate SIPs. Validation services are simple web service calls. Abstract I/O layer into the digital archive.

32 Recap
Implemented using web technologies. Architecture independent. OAIS compliant. XML-based metadata. METS-based SIPs. Add-on constraints describing the Submission Agreement.

