US GPO AIP Independence Test

1 US GPO AIP Independence Test
CS 496A – Senior Design
Team members: Antonio Castillo, Johnny Ng, Aram Weintraub, Tin-Shuk Wong
Faculty advisor: Dr. Russ Abbott
GPO contact: Kate Zwaard

2 Overview
Background: OAIS, FDsys, Information Packages, Project Objective
AIP: METS, MODS, and PREMIS
Solution Strategy: XML Parsing, Deliverables, Repositories, Testing
Conclusion

3 OAIS Open Archival Information System
“An OAIS is an archive consisting of an organization of people and systems that has accepted the responsibility to preserve information and make it available for a Designated Community.” Developed by the Consultative Committee for Space Data Systems (CCSDS) and published as ISO 14721:2003.

4 FDsys Federal Digital System
An OAIS maintained by the U.S. Government Printing Office to provide public access to information submitted by Congress and Federal agencies.

5 OAIS Primary Functions
Ingest – Turns SIPs into AIPs
Archival Storage – Stores and retrieves AIPs
Data Management – Populates, maintains, and provides access to descriptive information about the archive's holdings and to administrative data
Administration – Controls day-to-day operations
Preservation Planning – Keeps archived information accessible over time
Access – Lets users find and retrieve archived information

6 Information Packages
An information package is a conceptual container that links content information with its preservation description information and packaging information. Information packages are a critical component of OAIS. There are three kinds, one for each stage of the archive's workflow:
SIP – Submission Information Package (submitted by a producer for ingest)
AIP – Archival Information Package (stored and preserved by the archive)
DIP – Dissemination Information Package (delivered to a consumer on request)
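To make the three package types concrete, here is a minimal Java sketch of how they relate through the Ingest and Access functions. OAIS defines concepts, not an API, so `InformationPackages`, `InformationPackage`, `ingest`, `access`, and the `fixity` key are all hypothetical names chosen for illustration (records require Java 16 or later).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration: OAIS defines concepts, not a Java API.
final class InformationPackages {

    enum Kind { SIP, AIP, DIP }

    // Content information (file names) linked with preservation description
    // information (key/value pairs); the record itself stands in for packaging information.
    record InformationPackage(Kind kind,
                              List<String> contentFiles,
                              Map<String, String> preservationDescription) { }

    // Ingest: turn a SIP into an AIP with the same content, adding a
    // (hypothetical) fixity entry to the preservation description.
    static InformationPackage ingest(InformationPackage sip, String checksum) {
        if (sip.kind() != Kind.SIP) {
            throw new IllegalArgumentException("Ingest accepts only SIPs");
        }
        Map<String, String> pdi = new HashMap<>(sip.preservationDescription());
        pdi.put("fixity", checksum);
        return new InformationPackage(Kind.AIP, sip.contentFiles(), Map.copyOf(pdi));
    }

    // Access: derive a DIP from a stored AIP for delivery to a consumer.
    static InformationPackage access(InformationPackage aip) {
        if (aip.kind() != Kind.AIP) {
            throw new IllegalArgumentException("Access serves only AIPs");
        }
        return new InformationPackage(Kind.DIP, aip.contentFiles(), aip.preservationDescription());
    }

    public static void main(String[] args) {
        InformationPackage sip = new InformationPackage(
                Kind.SIP, List.of("bill.pdf"), Map.of("source", "example agency"));
        InformationPackage aip = ingest(sip, "example-checksum");
        InformationPackage dip = access(aip);
        System.out.println(sip.kind() + " -> " + aip.kind() + " -> " + dip.kind());
    }
}
```

Running `main` prints `SIP -> AIP -> DIP`. The guard clauses encode the rule that each function accepts only one package type.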

7 Project Objective Prove AIP Independence.
An AIP is independent if, even in the event of catastrophic and irretrievable loss or damage to the content management system, it is still possible for a knowledgeable user to make sense of the data.

8 Project Objective This project simulates FDsys breaking down because of a catastrophic attack or error. We are attempting to categorize and reconstruct a set of sample data from FDsys without the context of the actual content management system (CMS). Apart from the files in the archive themselves, our only references are publicly defined standards.
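One concrete way to probe independence without the CMS is to check that every file a METS binding file points to is actually present in the package. The Java sketch below assumes that `<mets:FLocat>` elements carry package-relative paths in `xlink:href`. Whether FDsys AIPs follow that convention is an assumption to confirm against the data, and `MetsReferenceCheck` and `missingFiles` are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Sketch of one possible independence check (our assumption, not a GPO
// procedure): every file referenced by the METS document must be present.
final class MetsReferenceCheck {
    static final String METS_NS = "http://www.loc.gov/METS/";
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    // Returns the xlink:href of every <mets:FLocat> that fileExists rejects.
    static List<String> missingFiles(String metsXml, Predicate<String> fileExists) {
        Document doc = parse(metsXml);
        NodeList locations = doc.getElementsByTagNameNS(METS_NS, "FLocat");
        List<String> missing = new ArrayList<>();
        for (int i = 0; i < locations.getLength(); i++) {
            String href = ((Element) locations.item(i)).getAttributeNS(XLINK_NS, "href");
            if (!fileExists.test(href)) {
                missing.add(href);
            }
        }
        return missing;
    }

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the *NS lookups above
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse METS XML", e);
        }
    }

    // Usage: java MetsReferenceCheck path/to/mets.xml
    // Assumes hrefs are paths relative to the directory holding the METS file.
    public static void main(String[] args) throws IOException {
        Path mets = Path.of(args[0]).toAbsolutePath();
        Path packageRoot = mets.getParent();
        List<String> missing = missingFiles(Files.readString(mets),
                href -> Files.exists(packageRoot.resolve(href)));
        System.out.println(missing.isEmpty() ? "All referenced files present" : "Missing: " + missing);
    }
}
```

Passing the existence test in as a `Predicate` keeps the XML logic separate from the file system. The same method can therefore check a directory on disk or a list of names taken from some other inventory.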

9 Composition of an AIP
Archival Information Package
METS (binding file): defines how digital objects and their associated metadata are packaged using XML-based files
MODS
PREMIS

10 AIP: METS Schema XML file format Seven major sections

11 AIP: METS Schema
The 5 major sections used here (METS also defines Structural Links and Behavior sections):
1) METS Header
2) Descriptive Metadata
3) Administrative Metadata
4) File Section
5) Structural Map
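To check which of these sections a given FDsys METS file actually contains, a short JAXP DOM sketch can list the recognized children of the root `<mets:mets>` element. The namespace URI `http://www.loc.gov/METS/` and the element names (`metsHdr`, `dmdSec`, `amdSec`, `fileSec`, `structMap`) come from the METS schema. `MetsSections` and `topLevelSections` are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class MetsSections {
    static final String METS_NS = "http://www.loc.gov/METS/";

    // METS element names for the five sections listed on this slide.
    // structLink and behaviorSec are deliberately not recognized.
    static final Map<String, String> SECTION_NAMES = Map.of(
            "metsHdr", "METS Header",
            "dmdSec", "Descriptive Metadata",
            "amdSec", "Administrative Metadata",
            "fileSec", "File Section",
            "structMap", "Structural Map");

    // Lists the recognized child sections of the root element, in document order.
    static List<String> topLevelSections(String metsXml) {
        Element root;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(metsXml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse METS XML", e);
        }
        List<String> sections = new ArrayList<>();
        for (Node child = root.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && METS_NS.equals(child.getNamespaceURI())
                    && SECTION_NAMES.containsKey(child.getLocalName())) {
                sections.add(SECTION_NAMES.get(child.getLocalName()));
            }
        }
        return sections;
    }

    public static void main(String[] args) {
        String sample = """
                <mets:mets xmlns:mets="http://www.loc.gov/METS/">
                  <mets:metsHdr/>
                  <mets:dmdSec ID="dmd1"/>
                  <mets:amdSec ID="amd1"/>
                  <mets:fileSec/>
                  <mets:structMap/>
                </mets:mets>
                """;
        topLevelSections(sample).forEach(System.out::println);
    }
}
```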

12 AIP: MODS
Descriptive metadata
Extension to METS
Top-level elements: mandatory, recommended, optional
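A MODS record reaches a METS file in one of two ways. It can be embedded inline, inside `mets:dmdSec/mets:mdWrap/mets:xmlData`, or kept as a separate file referenced with `mets:mdRef`. The sketch below handles both layouts because it searches any document for `mods:mods` records in the MODS v3 namespace (`http://www.loc.gov/mods/v3`) and returns the title from each record's top-level `titleInfo`. Which layout FDsys uses is left as an assumption, and `ModsTitles` is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class ModsTitles {
    static final String MODS_NS = "http://www.loc.gov/mods/v3";

    // Returns the title of every MODS record in the document, taken from the
    // record's own titleInfo (titles nested deeper, e.g. in relatedItem, are skipped).
    static List<String> titles(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        List<String> result = new ArrayList<>();
        NodeList records = doc.getElementsByTagNameNS(MODS_NS, "mods");
        for (int i = 0; i < records.getLength(); i++) {
            for (Element titleInfo : modsChildren((Element) records.item(i), "titleInfo")) {
                for (Element title : modsChildren(titleInfo, "title")) {
                    result.add(title.getTextContent().trim());
                }
            }
        }
        return result;
    }

    // Direct child elements of parent in the MODS namespace with the given local name.
    static List<Element> modsChildren(Element parent, String localName) {
        List<Element> matches = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element e && MODS_NS.equals(e.getNamespaceURI())
                    && localName.equals(e.getLocalName())) {
                matches.add(e);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String mets = """
                <mets:mets xmlns:mets="http://www.loc.gov/METS/"
                           xmlns:mods="http://www.loc.gov/mods/v3">
                  <mets:dmdSec ID="dmd1">
                    <mets:mdWrap MDTYPE="MODS">
                      <mets:xmlData>
                        <mods:mods>
                          <mods:titleInfo><mods:title>Example bill title</mods:title></mods:titleInfo>
                        </mods:mods>
                      </mets:xmlData>
                    </mets:mdWrap>
                  </mets:dmdSec>
                </mets:mets>
                """;
        System.out.println(titles(mets)); // [Example bill title]
    }
}
```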

13 AIP: MODS

14 AIP: PREMIS
Preservation metadata
Extension to METS
PREMIS Data Model: Intellectual Entity, Object Entity, Event Entity, Agent Entity, Rights Entity*
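In the PREMIS data model, the Object entity is where fixity information is recorded: a message digest and the algorithm used to compute it. That makes fixity a natural first test of whether stored files are still intact. This Java sketch reads every `<fixity>` element, whichever PREMIS namespace version is in use, and recomputes the digest with `java.security.MessageDigest`. It assumes the recorded algorithm names (for example `MD5` or `SHA-256`) are ones `MessageDigest` recognizes. `PremisFixity` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class PremisFixity {
    record Fixity(String algorithm, String digest) { }

    // Collects every <fixity> element. "*" matches any namespace, so the same
    // code reads both PREMIS 2 and PREMIS 3 documents.
    static List<Fixity> fixities(String premisXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(premisXml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse PREMIS XML", e);
        }
        NodeList nodes = doc.getElementsByTagNameNS("*", "fixity");
        List<Fixity> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element fixity = (Element) nodes.item(i);
            result.add(new Fixity(text(fixity, "messageDigestAlgorithm"), text(fixity, "messageDigest")));
        }
        return result;
    }

    // Trimmed text of the first descendant with this local name, or "" if absent.
    static String text(Element parent, String localName) {
        NodeList found = parent.getElementsByTagNameNS("*", localName);
        return found.getLength() == 0 ? "" : found.item(0).getTextContent().trim();
    }

    // Recomputes the digest of content and compares it with the recorded value.
    static boolean verify(Fixity fixity, byte[] content) {
        try {
            byte[] hash = MessageDigest.getInstance(fixity.algorithm()).digest(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString().equalsIgnoreCase(fixity.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported algorithm: " + fixity.algorithm(), e);
        }
    }

    public static void main(String[] args) {
        String premis = """
                <premis:object xmlns:premis="info:lc/xmlns/premis-v2">
                  <premis:objectCharacteristics>
                    <premis:fixity>
                      <premis:messageDigestAlgorithm>MD5</premis:messageDigestAlgorithm>
                      <premis:messageDigest>5d41402abc4b2a76b9719d911017c592</premis:messageDigest>
                    </premis:fixity>
                  </premis:objectCharacteristics>
                </premis:object>
                """;
        Fixity recorded = fixities(premis).get(0);
        byte[] file = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(verify(recorded, file)); // true
    }
}
```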

15 AIP: PREMIS

16 Solution Strategy The data we have been provided is in the form of AIPs, not SIPs, and repository software ingests only SIPs. We must therefore write scripts that parse the AIPs, construct SIPs from their arbitrary file structure, and ingest those SIPs into a repository. The goal is to create new, independent AIPs for the input data.
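Because the input arrives as an arbitrary file structure, a first step could be finding out which XML files are METS binding files. This sketch uses StAX (`javax.xml.stream`, part of JAXP) to read only the first start tag of each `.xml` file under a directory. It keeps the files whose root element is `mets` in the METS namespace. Identifying METS files by root element rather than by file name is our assumption, and `MetsFinder` is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class MetsFinder {
    static final String METS_NS = "http://www.loc.gov/METS/";
    static final XMLInputFactory FACTORY = XMLInputFactory.newInstance();

    // Reads only up to the first start tag and reports whether it is <mets>
    // in the METS namespace. The rest of the document is never parsed.
    static boolean rootIsMets(XMLStreamReader reader) throws XMLStreamException {
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamReader.START_ELEMENT) {
                    return METS_NS.equals(reader.getNamespaceURI())
                            && "mets".equals(reader.getLocalName());
                }
            }
            return false;
        } finally {
            reader.close();
        }
    }

    static boolean isMetsDocument(String xml) {
        try {
            return rootIsMets(FACTORY.createXMLStreamReader(new StringReader(xml)));
        } catch (XMLStreamException e) {
            return false; // malformed XML cannot be a usable binding file
        }
    }

    static boolean isMetsFile(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            return rootIsMets(FACTORY.createXMLStreamReader(in));
        } catch (IOException | XMLStreamException e) {
            return false;
        }
    }

    // Walks an arbitrary directory tree and returns every .xml file whose root is mets:mets.
    static List<Path> findMetsFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().endsWith(".xml"))
                        .filter(MetsFinder::isMetsFile)
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        findMetsFiles(root).forEach(System.out::println);
    }
}
```

Reading only the first start tag keeps the scan cheap even when the archive contains large XML files.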

17 XML Parsing We plan to use Java for our scripting. The Java API for XML Processing (JAXP) is the standard Java library for parsing XML. It offers several processing models, including DOM (an in-memory tree), SAX (push-based event callbacks), and StAX (pull-based streaming). Once the AIP files have been rendered human-readable, they will need to be converted into a new SIP schema of our own design, which will describe only the information that still appears relevant.
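As an illustration of two of these models, the sketch below extracts the text of one element from the same XML string twice: once through a DOM tree and once through SAX callbacks. The class and method names are hypothetical. Both parsers run in their default, non-namespace-aware mode, which is enough for an unprefixed fragment such as the FDsys `<extension>` element.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class JaxpModels {
    // DOM: the whole document is loaded into a tree that can then be queried freely.
    static String firstTextDom(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName(tag);
            return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // SAX: the parser pushes events to a handler and keeps nothing itself, so
    // the handler records only the text of the first matching element.
    static String firstTextSax(String xml, String tag) {
        var handler = new DefaultHandler() {
            final StringBuilder text = new StringBuilder();
            boolean inside;
            boolean found;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if (!found && qName.equals(tag)) {
                    inside = true;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inside) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (inside && qName.equals(tag)) {
                    inside = false;
                    found = true;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
        return handler.found ? handler.text.toString() : null;
    }

    public static void main(String[] args) {
        String xml = "<extension><collectionCode>BILLS</collectionCode>"
                + "<branch>legislative</branch></extension>";
        System.out.println(firstTextDom(xml, "collectionCode")); // BILLS
        System.out.println(firstTextSax(xml, "branch"));         // legislative
    }
}
```

DOM is simpler to query but holds the whole file in memory. SAX uses little memory but pushes the bookkeeping (here the `inside` and `found` flags) onto the caller.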

18 XML Parsing Example
This is a portion of a sample FDsys MODS file that summarizes a bill in Congress:
<extension>
  <collectionCode>BILLS</collectionCode>
  <searchTitle>To increase Federal Pell Grants for the children of fallen public safety officers, and for other purposes.;Officer Daniel Faulkner Children of Fallen Heroes Scholarship Act of 2010;S (IS)</searchTitle>
  <category>Bills and Statutes</category>
  <waisDatabaseName>111_cong_bills</waisDatabaseName>
  <branch>legislative</branch>
  <dateIngested> </dateIngested>
</extension>

19 XML Parsing Example
Once properly parsed, the output might look something like this:
<extension>
Collection code: “BILLS”
Search title: “To increase Federal Pell Grants for the children of fallen public safety officers, and for other purposes.;Officer Daniel Faulkner Children of Fallen Heroes Scholarship Act of 2010;S (IS)”
Category: “Bills and Statutes”
WAIS database name: “111_cong_bills”
Branch: legislative
Date ingested:
</extension>
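A minimal sketch of this rendering step follows, assuming the `<extension>` fragment is parsed on its own with JAXP DOM. Labels are derived by splitting camelCase element names, with a hand-written override for `waisDatabaseName`. The label rule, the quoting style, and `ExtensionRenderer` are our assumptions, not anything FDsys defines.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class ExtensionRenderer {
    // Overrides for names whose automatic camelCase split reads badly.
    static final Map<String, String> LABELS = Map.of("waisDatabaseName", "WAIS database name");

    // "collectionCode" -> "Collection code"
    static String label(String name) {
        if (LABELS.containsKey(name)) {
            return LABELS.get(name);
        }
        String spaced = name.replaceAll("([a-z])([A-Z])", "$1 $2").toLowerCase(Locale.ROOT);
        return Character.toUpperCase(spaced.charAt(0)) + spaced.substring(1);
    }

    // One line per child element of the root: Label: "value" (no quotes when empty).
    static String render(String xml) {
        Element root;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        StringBuilder out = new StringBuilder();
        for (Node child = root.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip text, comments, and other non-element nodes
            }
            String value = child.getTextContent().trim();
            out.append(label(child.getLocalName())).append(':');
            if (!value.isEmpty()) {
                out.append(" \"").append(value).append('"');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String extension = "<extension><collectionCode>BILLS</collectionCode>"
                + "<category>Bills and Statutes</category>"
                + "<waisDatabaseName>111_cong_bills</waisDatabaseName>"
                + "<branch>legislative</branch><dateIngested> </dateIngested></extension>";
        System.out.print(render(extension));
    }
}
```

On this input, `main` prints one `Label: "value"` line per child element. `Date ingested:` is left empty because the source element contains only whitespace.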

20 A Note on Deliverables Because our aim is not to design software, this is not a typical computer science design project. Instead, we are running programmatic experiments on real data and drawing conclusions from the results. Our deliverables will most likely include:
a written report of our findings and any recommendations for improvement
a reorganized and re-ingested version of the input data

21 Testing After parsing and organizing the data, we will need to check that the reconstruction is accurate. We may send a preliminary report to GPO for verification. The exact testing procedure is still undefined, because we have not yet had a chance to investigate the data in depth. Our testing goals should become clearer once we understand exactly what types of data we are dealing with.
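One simple form such a check could take is a field-by-field comparison between the values extracted from an original AIP and the values found in the reconstructed package. The sketch below assumes both sides have already been parsed into maps from element name to text. `ReconstructionCheck` and `differences` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

final class ReconstructionCheck {
    // Reports every field whose value differs between the original and the
    // reconstruction, including fields present on only one side.
    static Map<String, String> differences(Map<String, String> original,
                                           Map<String, String> reconstructed) {
        Set<String> fields = new TreeSet<>(original.keySet()); // sorted for stable reports
        fields.addAll(reconstructed.keySet());
        Map<String, String> diffs = new LinkedHashMap<>();
        for (String field : fields) {
            String before = original.get(field);
            String after = reconstructed.get(field);
            if (!Objects.equals(before, after)) {
                diffs.put(field, before + " -> " + after);
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        Map<String, String> original = Map.of("collectionCode", "BILLS", "branch", "legislative");
        Map<String, String> rebuilt = Map.of("collectionCode", "BILLS", "branch", "Legislative");
        System.out.println(differences(original, rebuilt)); // {branch=legislative -> Legislative}
    }
}
```

An empty result means every compared field survived the round trip. Any entry points to a field that the parsing or re-ingest step altered or dropped.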

22 Repositories We will use third-party repository software to ingest the SIPs we create. Candidates: DSpace and Fedora Commons (both DuraSpace projects). Both are built on a few common technologies: Java, MySQL, Apache Tomcat, JavaScript, Server
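For DSpace specifically, one documented route for batch ingest is its Simple Archive Format (SAF). SAF uses one directory per item, holding a `dublin_core.xml` metadata file, a `contents` file that lists the item's files one per line, and the files themselves. The Java sketch below writes such an item directory. Mapping FDsys fields onto Dublin Core elements is an assumption left to the caller, and `SafWriter`, `DcValue`, and `writeItem` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

final class SafWriter {
    // One Dublin Core value. In SAF, qualifier "none" marks an unqualified element.
    record DcValue(String element, String qualifier, String value) { }

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Builds the item's dublin_core.xml.
    static String dublinCoreXml(List<DcValue> values) {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<dublin_core>\n");
        for (DcValue v : values) {
            xml.append("  <dcvalue element=\"").append(escape(v.element()))
               .append("\" qualifier=\"").append(escape(v.qualifier())).append("\">")
               .append(escape(v.value())).append("</dcvalue>\n");
        }
        return xml.append("</dublin_core>\n").toString();
    }

    // Writes one item directory: dublin_core.xml, copies of the files, and a
    // contents manifest naming each file on its own line.
    static void writeItem(Path itemDir, List<DcValue> metadata, List<Path> files) throws IOException {
        Files.createDirectories(itemDir);
        Files.writeString(itemDir.resolve("dublin_core.xml"), dublinCoreXml(metadata));
        StringBuilder contents = new StringBuilder();
        for (Path file : files) {
            String name = file.getFileName().toString();
            Files.copy(file, itemDir.resolve(name), StandardCopyOption.REPLACE_EXISTING);
            contents.append(name).append('\n');
        }
        Files.writeString(itemDir.resolve("contents"), contents.toString());
    }

    public static void main(String[] args) throws IOException {
        Path work = Files.createTempDirectory("saf-demo");
        Path file = Files.writeString(work.resolve("bill.txt"), "placeholder content");
        Path item = work.resolve("archive").resolve("item_000");
        writeItem(item, List.of(
                new DcValue("title", "none", "Example bill title"),
                new DcValue("type", "none", "Bills and Statutes")), List.of(file));
        System.out.println(Files.readString(item.resolve("dublin_core.xml")));
        System.out.println("Item written to " + item);
    }
}
```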

23 Conclusion If we succeed in interpreting and reusing the data, we will have proven that the AIPs are truly independent of FDsys. If we fail, that is evidence that they may not be sufficiently independent. In either case, we hope our results will help GPO make FDsys more robust and secure. Our thanks to Kate, Dr. Abbott, and Dr. Pamula for their support.

