
US GPO AIP Independence Test CS 496A – Senior Design Team members: Antonio Castillo, Johnny Ng, Aram Weintraub, Tin-Shuk Wong Faculty advisor: Dr. Russ.


1 US GPO AIP Independence Test
CS 496A – Senior Design
Team members: Antonio Castillo, Johnny Ng, Aram Weintraub, Tin-Shuk Wong
Faculty advisor: Dr. Russ Abbott
GPO contact: Kate Zwaard

2 Overview
• Background
  - OAIS
  - FDsys
  - Project Objectives
• AIP
  - METS, MODS, and PREMIS
• Solution Strategy
  - XML parsing
  - A note on deliverables
  - Repositories
  - Testing
• Conclusion

3 OAIS Open Archival Information System
• “An OAIS is an archive consisting of an organization of people and systems that has accepted the responsibility to preserve information and make it available for a Designated Community”
• Developed by the Consultative Committee for Space Data Systems (CCSDS) and published as ISO 14721:2003

4 FDsys Federal Digital System
• FDsys – An OAIS maintained by the U.S. Government Printing Office to provide public access to information submitted by Congress and Federal agencies.

5 OAIS Primary Functions
• Ingest – Turns SIPs into AIPs
• Archival Storage – Storage and retrieval of AIPs
• Data Management – Populating, maintaining, and accessing the varieties of information
• Administration – Controls day-to-day operations
• Preservation Planning – Maintaining archive accessibility
• Access – Functions for accessing the archive

6 Information Package - critical component of OAIS
• The information package is a conceptual linking of content information with its preservation description and packaging information.
• Three kinds of information packages, one for each stage of an item's life in the archive:
  - SIP – Submission Information Package (submitted for ingest)
  - AIP – Archival Information Package (stored in the archive)
  - DIP – Dissemination Information Package (delivered to users)

7 Composition of an AIP
• Archival Information Package: defines how a digital object and its associated metadata are packaged using XML-based files.
• METS (binding file)
• MODS
• PREMIS

8 Project Objective: Prove AIP Independence
• An AIP is independent if, in the event of catastrophic and irretrievable loss or damage of the content management system, a knowledgeable user can still make sense of the data.

9 Project Objectives
• This project simulates FDsys breaking down due to some catastrophic attack or error.
• We are attempting to categorize and reconstruct a set of sample data from FDsys outside the context of the actual CMS. The only references we have available, other than the actual files in the archive, are publicly defined standards.
• It is our hope that this project will help GPO improve the robustness of its file system.

10 AIP: METS
• Schema
• XML file format
• Seven major sections

11 AIP: METS Schema
• 5 Major Sections (of the seven that METS defines; Structural Links and Behavior are not covered here)
  1) METS Header
  2) Descriptive Metadata
  3) Administrative Metadata
  4) File Section
  5) Structural Map
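
As a rough illustration of how these sections could be located in a file, the sketch below lists which top-level METS sections a document contains. The element names (`metsHdr`, `dmdSec`, `amdSec`, `fileSec`, `structMap`, `structLink`, `behaviorSec`) and the namespace come from the public METS schema. The class name `MetsSections`, the method `sectionsPresent`, and the sample document are our own and are not taken from FDsys.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class MetsSections {
    static final String METS_NS = "http://www.loc.gov/METS/";

    // The seven major sections defined by the METS schema.
    static final Set<String> SECTIONS = Set.of(
            "metsHdr", "dmdSec", "amdSec", "fileSec",
            "structMap", "structLink", "behaviorSec");

    /** Returns the METS sections found directly under the root, in document order. */
    static List<String> sectionsPresent(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to compare namespace URIs
            Node root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            List<String> found = new ArrayList<>();
            for (Node child = root.getFirstChild(); child != null; child = child.getNextSibling()) {
                if (child.getNodeType() == Node.ELEMENT_NODE
                        && METS_NS.equals(child.getNamespaceURI())
                        && SECTIONS.contains(child.getLocalName())
                        && !found.contains(child.getLocalName())) {
                    found.add(child.getLocalName());
                }
            }
            return found;
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalStateException("Could not parse METS document", ex);
        }
    }

    public static void main(String[] args) {
        // Made-up sample, not a real FDsys file.
        String sample = "<mets xmlns=\"http://www.loc.gov/METS/\">"
                + "<metsHdr/><dmdSec ID=\"d1\"/><amdSec/><fileSec/><structMap/></mets>";
        System.out.println(sectionsPresent(sample));
    }
}
```

A section is reported once even if it repeats, since METS allows multiple `dmdSec`, `amdSec`, and `structMap` elements.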

12 AIP: MODS
• Descriptive metadata
• Extension to METS
• Top-level elements
  - Mandatory
  - Recommended
  - Optional

13 AIP: MODS

14 AIP: PREMIS
• Preservation metadata
• Extension to METS
• PREMIS Data Model
  - Intellectual Entity
  - Object Entity
  - Event Entity
  - Agent Entity
  - Rights Entity*

15 AIP: PREMIS

16 Solution Strategy
• The data we have received are AIPs, not SIPs. Repository software can only ingest SIPs. We must therefore write scripts that parse the AIPs, construct SIPs from an arbitrary file structure, and then ingest those SIPs into repository software to create new AIPs for the same information.
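
The SIP-construction step might look roughly like the sketch below, which writes a set of recovered metadata fields out as a small XML document using JAXP's DOM and Transformer classes. The `<sip>`/`<field name="...">` layout, the class `SipBuilderSketch`, and the method `buildSip` are placeholders invented for this example. The actual SIP schema has not been designed yet and would depend on the repository software chosen.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class SipBuilderSketch {
    /** Serializes name/value pairs into a placeholder SIP document (hypothetical layout). */
    static String buildSip(Map<String, String> fields) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element sip = doc.createElement("sip");
            doc.appendChild(sip);
            for (Map.Entry<String, String> entry : fields.entrySet()) {
                Element field = doc.createElement("field");
                field.setAttribute("name", entry.getKey());
                field.setTextContent(entry.getValue()); // escaped on output
                sip.appendChild(field);
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException ex) {
            throw new IllegalStateException("Could not build SIP document", ex);
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("collectionCode", "BILLS");
        fields.put("branch", "legislative");
        System.out.println(buildSip(fields));
    }
}
```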

17 XML Parsing
• We plan to use the Java programming language for our scripting needs. The Java API for XML Processing (JAXP) is the standard Java library for parsing XML.
• It provides several different ways to represent XML, including DOM trees, SAX event streams, and StAX pull parsing.
• After being rendered human-readable, the AIP files will need to be converted into a new SIP schema of our own design, which would describe only the information that still appears relevant.
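
For reference, a minimal sketch of the DOM flavor of JAXP follows. It parses an XML string into a tree and reads the text of the first element with a given tag name. The class `JaxpDomSketch`, its helper methods, and the sample XML are illustrative only. Real AIP files would be read from disk and are namespace-qualified, which this sketch does not handle.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class JaxpDomSketch {
    /** Parses an XML string into an in-memory DOM tree. */
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalStateException("Could not parse XML", ex);
        }
    }

    /** Returns the text of the first element named tag, or null if there is none. */
    static String firstText(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    public static void main(String[] args) {
        // Made-up sample input.
        Document doc = parse("<record><branch>legislative</branch></record>");
        System.out.println(doc.getDocumentElement().getTagName());
        System.out.println(firstText(doc, "branch"));
    }
}
```

DOM keeps the whole document in memory, which is convenient for small metadata files. SAX or StAX would be the more economical choice if individual files turn out to be very large.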

18 XML Parsing Example
• This is a portion of a sample FDsys MODS file that summarizes a bill in Congress:
  BILLS
  To increase Federal Pell Grants for the children of fallen public safety officers, and for other purposes.;Officer Daniel Faulkner Children of Fallen Heroes Scholarship Act of 2010;S. 3880 (IS)
  Bills and Statutes
  111_cong_bills
  legislative
  2010-10-06

19 XML Parsing Example
• We might expect this type of output once properly parsed:
  Collection code: “BILLS”
  Search title: “To increase Federal Pell Grants for the children of fallen public safety officers, and for other purposes.;Officer Daniel Faulkner Children of Fallen Heroes Scholarship Act of 2010;S. 3880 (IS)”
  Category: “Bills and Statutes”
  WAIS database name: “111_cong_bills”
  Branch: legislative
  Date ingested: 2010-10-06
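
One way to produce labeled output of this kind is sketched below. The element names (`collectionCode`, `searchTitle`, `category`, `waisDatabaseName`, `branch`, `dateIngested`) are assumptions inferred from the labels above, since the markup of the sample file is not reproduced in this transcript. The class `ModsFieldSketch` and the method `summarize` are likewise hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ModsFieldSketch {
    // Human-readable label paired with an ASSUMED element name.
    static final String[][] FIELDS = {
        {"Collection code", "collectionCode"},
        {"Search title", "searchTitle"},
        {"Category", "category"},
        {"WAIS database name", "waisDatabaseName"},
        {"Branch", "branch"},
        {"Date ingested", "dateIngested"},
    };

    /** Maps each label to the text of its element; missing elements are skipped. */
    static Map<String, String> summarize(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, String> summary = new LinkedHashMap<>();
            for (String[] field : FIELDS) {
                NodeList nodes = doc.getElementsByTagName(field[1]);
                if (nodes.getLength() > 0) {
                    summary.put(field[0], nodes.item(0).getTextContent().trim());
                }
            }
            return summary;
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalStateException("Could not parse MODS fragment", ex);
        }
    }

    public static void main(String[] args) {
        // Made-up fragment using the assumed element names.
        String sample = "<extension>"
                + "<collectionCode>BILLS</collectionCode>"
                + "<category>Bills and Statutes</category>"
                + "<waisDatabaseName>111_cong_bills</waisDatabaseName>"
                + "<branch>legislative</branch>"
                + "<dateIngested>2010-10-06</dateIngested>"
                + "</extension>";
        summarize(sample).forEach((label, value) ->
                System.out.println(label + ": " + value));
    }
}
```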

20 A Note on Deliverables
• Because our aim is not to design software, this is not a typical computer science design project. Instead, we are conducting coded experimental tests on real data and forming conclusions based on the results.
• Deliverables will most likely include:
  - a written report of our findings and recommendations
  - a reorganized version of the input data

21 Testing
• After parsing and organizing the data, it will be important to perform checks to ensure that the reconstruction is accurate. We may send a preliminary report to GPO for verification.
• The exact testing procedure is still undefined, as we haven’t had a chance to investigate the data in depth yet. Our goals should be clearer once we understand exactly what type of data we are dealing with.

22 Repositories
• Third-party repository software will ingest the SIPs we create.
• DSpace, Fedora Commons (DuraSpace)
• Based on a few simple technologies: Java, MySQL, Apache Tomcat, JavaScript Server

23 Conclusion
• Our thanks to Kate, Dr. Abbott, and Dr. Pamula for their support.

