US GPO AIP Independence Test

Presentation transcript:

US GPO AIP Independence Test CS 496A – Senior Design Team members: Antonio Castillo, Johnny Ng, Aram Weintraub, Tin-Shuk Wong Faculty advisor: Dr. Russ Abbott GPO contact: Kate Zwaard

Overview
Background: OAIS, FDsys, Information Packages, Project Objective
AIP: METS, MODS, and PREMIS
Solution Strategy: XML parsing, Deliverables, Repositories, Testing
Conclusion

OAIS Open Archival Information System “An OAIS is an archive consisting of an organization of people and systems that has accepted the responsibility to preserve information and make it available for a Designated Community.” Developed by the Consultative Committee for Space Data Systems (CCSDS) and published as ISO 14721:2003.

FDsys Federal Digital System An OAIS maintained by the U.S. Government Printing Office to provide public access to information submitted by Congress and Federal agencies.

OAIS Primary Functions
Ingest – Turn SIPs into AIPs
Archival Storage – Storage and retrieval of AIPs
Data Management – Populating, maintaining, and accessing the varieties of information
Administration – Control day to day operations
Preservation Planning – Maintaining archive accessibility
Access – Functions for accessing the archive

Information Packages The information package is a conceptual linking of content information with its preservation description and packaging information. Information packages are a critical component of OAIS. There are three kinds of information packages, one for each stage of an object's life in the archive:
SIP – Submission Information Package (received at ingest)
AIP – Archival Information Package (stored and preserved)
DIP – Dissemination Information Package (delivered to consumers)

Project Objective Prove AIP Independence. An AIP is independent if, even in the event of catastrophic and irretrievable loss or damage to the content management system, it is still possible for a knowledgeable user to make sense of the data.

Project Objective This project simulates FDsys breaking down due to some catastrophic attack or error. We are attempting to categorize and reconstruct a sample of FDsys data without the context of the actual CMS. The only references we have available, other than the actual files in the archive, are publicly defined standards.

Composition of an AIP
An Archival Information Package defines how digital objects and their associated metadata are packaged using XML-based files:
METS (binding file)
MODS
PREMIS

AIP: METS Schema XML file format Seven major sections

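AIP: METS Schema
Five of the seven major sections:
1) METS Header
2) Descriptive Metadata
3) Administrative Metadata
4) File Section
5) Structural Map
The remaining two sections defined by METS, Structural Links and Behavior, are not listed here.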
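As a rough illustration of how these sections could be located in a binding file, the sketch below counts the top-level METS sections present in a document. The element names (metsHdr, dmdSec, amdSec, fileSec, structMap, structLink, behaviorSec) and the namespace URI come from the published METS schema; the class name, method name, and the tiny sample document are hypothetical and are not taken from FDsys.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: reports which top-level METS sections a file contains.
class MetsSectionCounter {

    // Namespace URI defined by the METS schema.
    static final String METS_NS = "http://www.loc.gov/METS/";

    // Returns section name -> number of occurrences, in document order.
    static Map<String, Integer> countSections(String metsXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getLocalName/getNamespaceURI
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(metsXml)));

            Map<String, Integer> counts = new LinkedHashMap<>();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Only direct child elements in the METS namespace are sections.
                if (child.getNodeType() == Node.ELEMENT_NODE
                        && METS_NS.equals(child.getNamespaceURI())) {
                    counts.merge(child.getLocalName(), 1, Integer::sum);
                }
            }
            return counts;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse METS document", e);
        }
    }

    public static void main(String[] args) {
        // Made-up skeleton for demonstration only, not an actual FDsys file.
        String sample = "<mets xmlns=\"http://www.loc.gov/METS/\">"
                + "<metsHdr/><dmdSec ID=\"d1\"/><dmdSec ID=\"d2\"/>"
                + "<amdSec/><fileSec/><structMap/></mets>";
        countSections(sample).forEach((name, n) -> System.out.println(name + ": " + n));
    }
}
```

A count of zero for structLink or behaviorSec would simply mean the file does not use those optional sections.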

AIP: MODS
Descriptive metadata
Extension to METS
Top-level elements: Mandatory, Recommended, Optional

AIP: PREMIS
Preservation metadata
Extension to METS
PREMIS Data Model: Intellectual Entity, Object Entity, Event Entity, Agent Entity, Rights Entity*

Solution Strategy The data we have been provided is in the form of AIPs, not SIPs. Repository software can only ingest SIPs. We must therefore write scripts to parse the AIPs in such a way as to construct SIPs from an arbitrary file structure, and then ingest those SIPs into a repository. The goal is to create new and independent AIPs for the input data.

XML Parsing We plan to use the Java programming language for our scripting needs. The Java API for XML Processing (JAXP) is the standard Java library for parsing XML. It provides several different possible representations for XML. After being rendered human-readable, the AIP files will need to be converted into a new SIP schema of our own design, which would describe only the information that still appears relevant.
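One of the representations JAXP offers is the event-based SAX interface, which reports elements as the parser streams through a file instead of building a tree in memory. The following is a minimal sketch of that style, assuming we only want an inventory of the element names that occur in a file; the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: lists element names in the order the parser meets them.
class SaxElementLister {

    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // so localName is populated
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attributes) {
                    // Called once for every opening tag.
                    names.add(localName);
                }
            });
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<extension><branch>legislative</branch>"
                + "<dateIngested>2010-10-06</dateIngested></extension>";
        System.out.println(elementNames(xml));
    }
}
```

SAX suits a first survey of large or unfamiliar files because nothing is held in memory; the tree-based DOM representation is more convenient once the structure is understood and values need to be looked up by position.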

XML Parsing Example This is a portion of a sample FDsys MODS file that summarizes a bill in Congress:
<extension>
  <collectionCode>BILLS</collectionCode>
  <searchTitle>To increase Federal Pell Grants for the children of fallen public safety officers, and for other purposes.;Officer Daniel Faulkner Children of Fallen Heroes Scholarship Act of 2010;S. 3880 (IS)</searchTitle>
  <category>Bills and Statutes</category>
  <waisDatabaseName>111_cong_bills</waisDatabaseName>
  <branch>legislative</branch>
  <dateIngested>2010-10-06</dateIngested>
</extension>

XML Parsing Example Once properly parsed, the output might look something like this:
<extension>
  Collection code: “BILLS”
  Search title: “To increase Federal Pell Grants for the children of fallen public safety officers, and for other purposes.;Officer Daniel Faulkner Children of Fallen Heroes Scholarship Act of 2010;S. 3880 (IS)”
  Category: “Bills and Statutes”
  WAIS database name: “111_cong_bills”
  Branch: legislative
  Date ingested: 2010-10-06
</extension>
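A transformation of this kind could be sketched with the JAXP DOM API roughly as follows. This is an illustration under stated assumptions, not the project's actual script: the class and method names are hypothetical, labels are derived mechanically from the camelCase element names (so an acronym such as WAIS comes out as "Wais"), and values are printed without quotation marks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: renders the children of an element as "Label: value" lines.
class ExtensionPrinter {

    // Turns a camelCase name such as "dateIngested" into "Date ingested".
    static String toLabel(String name) {
        StringBuilder label = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (i == 0) {
                label.append(Character.toUpperCase(c));
            } else if (Character.isUpperCase(c)) {
                label.append(' ').append(Character.toLowerCase(c));
            } else {
                label.append(c);
            }
        }
        return label.toString();
    }

    // Parses the XML and returns one readable line per child element of the root.
    static List<String> describe(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // lets getLocalName drop any prefix
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            List<String> lines = new ArrayList<>();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    lines.add(toLabel(child.getLocalName()) + ": "
                            + child.getTextContent().trim());
                }
            }
            return lines;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Shortened version of the sample shown on the slide.
        String xml = "<extension>"
                + "<collectionCode>BILLS</collectionCode>"
                + "<category>Bills and Statutes</category>"
                + "<branch>legislative</branch>"
                + "<dateIngested>2010-10-06</dateIngested>"
                + "</extension>";
        for (String line : describe(xml)) {
            System.out.println(line);
        }
    }
}
```

Deriving labels from element names keeps the script independent of any FDsys-specific lookup table, which fits the premise that only the archived files and public standards are available.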

A Note on Deliverables Because our aim is not to design software, this is not a typical computer science design project. Instead, we are conducting coded experimental tests on real data and forming conclusions based on the results. Our deliverables will most likely include:
a written report of our findings and any recommendations for improvement
a reorganized and re-ingested version of the input data

Testing After parsing and organizing the data, it will be important to perform checks to ensure that the reconstruction is accurate. We may send a preliminary report to GPO for verification. The exact testing procedure is still undefined, as we have not yet had a chance to investigate the data in depth. Our testing goals should become clearer once we understand exactly what types of data we are dealing with.

Repositories We will use third-party repository software to ingest the created SIPs.
DSpace, Fedora Commons (DuraSpace)
Based on a few simple technologies: Java, MySQL, Apache Tomcat, JavaScript Server

Conclusion If we succeed at interpreting and reusing the data, then we will have proven that the AIPs are truly independent from FDsys. If we fail, then we have evidence that perhaps they are not sufficiently independent. In either case, we hope that the results of our project will assist GPO in making FDsys more robust and secure. Our thanks to Kate, Dr. Abbott, and Dr. Pamula for their support.