US GPO AIP Independence Test CS 496A – Senior Design Team members: Antonio Castillo, Johnny Ng, Aram Weintraub, Tin-Shuk Wong Faculty advisor: Dr. Russ.

US GPO AIP Independence Test
CS 496A – Senior Design
Team members: Antonio Castillo, Johnny Ng, Aram Weintraub, Tin-Shuk Wong
Faculty advisor: Dr. Russ Abbott
GPO contact: Kate Zwaard

Overview
- Background
  - OAIS
  - FDsys
  - Project Objectives
- AIP
  - METS, MODS, and PREMIS
- Solution Strategy
  - XML parsing
  - A note on deliverables
  - Repositories
  - Testing
- Conclusion

OAIS
- Open Archival Information System
- “An OAIS is an archive consisting of an organization of people and systems that has accepted the responsibility to preserve information and make it available for a Designated Community”
- Developed by the Consultative Committee for Space Data Systems (CCSDS) and standardized as ISO 14721:2003

FDsys
- Federal Digital System
- FDsys – An OAIS maintained by the U.S. Government Printing Office to provide public access to information submitted by Congress and Federal agencies.

OAIS Primary Functions
- Ingest – Turns SIPs into AIPs
- Archival Storage – Stores and retrieves AIPs
- Data Management – Populates, maintains, and accesses descriptive information about archive holdings and administrative data
- Administration – Controls day-to-day operations
- Preservation Planning – Keeps the archive's contents accessible over time
- Access – Lets consumers find, request, and receive archive information

Information Package: a critical component of OAIS
- An information package conceptually links content information with its preservation description information and packaging information.
- There are three kinds of information packages, one for each stage of an archive's handling:
  - SIP – Submission Information Package (sent by a producer to the archive for ingest)
  - AIP – Archival Information Package (created during ingest and kept in archival storage)
  - DIP – Dissemination Information Package (delivered to a consumer on request)
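
The three package types can be modeled as a simple sequence. The sketch below is a hypothetical Java illustration (the `PackageLifecycle` class and its `PackageType` enum are not part of OAIS or FDsys), and it simplifies OAIS, where one stored AIP may yield many DIPs, into a single linear chain.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the three OAIS information package types and
// the order in which they appear as content moves through an archive.
class PackageLifecycle {

    enum PackageType {
        SIP("Submission Information Package"),    // sent by a producer for ingest
        AIP("Archival Information Package"),      // built by Ingest, kept in Archival Storage
        DIP("Dissemination Information Package"); // derived by Access for a consumer

        private final String fullName;

        PackageType(String fullName) {
            this.fullName = fullName;
        }

        String fullName() {
            return fullName;
        }

        // Package type produced by the next OAIS step, or null once disseminated.
        PackageType next() {
            switch (this) {
                case SIP:
                    return AIP; // Ingest turns SIPs into AIPs
                case AIP:
                    return DIP; // Access derives DIPs from stored AIPs
                default:
                    return null; // a DIP leaves the archive
            }
        }
    }

    // Follows next() starting from SIP and returns "CODE - full name" per stage.
    static List<String> chain() {
        List<String> stages = new ArrayList<>();
        for (PackageType t = PackageType.SIP; t != null; t = t.next()) {
            stages.add(t.name() + " - " + t.fullName());
        }
        return stages;
    }

    public static void main(String[] args) {
        chain().forEach(System.out::println);
    }
}
```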

Composition of an AIP
- Archival Information Package: defines how digital objects and their associated metadata are packaged using XML-based files
  - METS (binding file)
  - MODS
  - PREMIS

Project Objective: Prove AIP Independence
- An AIP is independent if, in the event of catastrophic and irretrievable loss of or damage to the content management system, a knowledgeable user can still make sense of the data.

Project Objectives
- This project simulates FDsys breaking down after a catastrophic attack or error.
- We are attempting to categorize and reconstruct a sample of FDsys data outside the context of the actual content management system (CMS). Apart from the files in the archive themselves, our only references are publicly defined standards.
- We hope this project will help GPO improve the robustness of its file system.

AIP: METS
- Schema
- XML file format
- Seven major sections

AIP: METS Schema
- Five of the seven major sections (the other two, Structural Links and Behavior, are not covered here):
  1) METS Header
  2) Descriptive Metadata
  3) Administrative Metadata
  4) File Section
  5) Structural Map
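
As a minimal sketch, assuming the binding file uses the standard METS namespace `http://www.loc.gov/METS/`, JAXP's DOM API can report which of the seven METS sections (`metsHdr`, `dmdSec`, `amdSec`, `fileSec`, `structMap`, `structLink`, `behaviorSec`) an AIP actually contains. The class name `MetsSections` and the sample document are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Sketch: list which top-level METS sections appear in an AIP's binding file.
class MetsSections {
    static final String METS_NS = "http://www.loc.gov/METS/";

    // Parses XML with namespace support so getNamespaceURI()/getLocalName() work.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Returns the local names of the METS elements directly under the root <mets>.
    static List<String> topLevelSections(String xml) {
        Element root = parse(xml).getDocumentElement();
        if (!METS_NS.equals(root.getNamespaceURI()) || !"mets".equals(root.getLocalName())) {
            throw new IllegalArgumentException("root element is not mets:mets");
        }
        List<String> sections = new ArrayList<>();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE && METS_NS.equals(child.getNamespaceURI())) {
                sections.add(child.getLocalName());
            }
        }
        return sections;
    }

    public static void main(String[] args) {
        // Hypothetical, stripped-down skeleton; real sections carry content.
        String xml = "<mets:mets xmlns:mets=\"http://www.loc.gov/METS/\">"
                + "<mets:metsHdr/><mets:dmdSec ID=\"dmd1\"/><mets:amdSec ID=\"amd1\"/>"
                + "<mets:fileSec/><mets:structMap/></mets:mets>";
        System.out.println(topLevelSections(xml)); // [metsHdr, dmdSec, amdSec, fileSec, structMap]
    }
}
```

Checking only direct children of the root keeps the result to section-level elements, even though the sections themselves may nest deeply.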

AIP: MODS
- Descriptive metadata
- Extension to METS
- Top-level elements
  - Mandatory
  - Recommended
  - Optional
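
A MODS record can be read the same way. This hypothetical sketch assumes MODS version 3 (namespace `http://www.loc.gov/mods/v3`) and extracts the text of each `titleInfo/title` element, `titleInfo` being one of the MODS top-level elements. The class name and sample record are illustrative only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Sketch: pull the titles out of a MODS record (e.g. the mods.xml referenced by METS).
class ModsTitles {
    static final String MODS_NS = "http://www.loc.gov/mods/v3";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getElementsByTagNameNS
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Text of the first mods:title in each mods:titleInfo, in document order.
    // getElementsByTagNameNS searches all descendants, so titles inside
    // nested relatedItem records are included as well.
    static List<String> titles(String xml) {
        NodeList infos = parse(xml).getElementsByTagNameNS(MODS_NS, "titleInfo");
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < infos.getLength(); i++) {
            Element info = (Element) infos.item(i);
            NodeList title = info.getElementsByTagNameNS(MODS_NS, "title");
            if (title.getLength() > 0) {
                titles.add(title.item(0).getTextContent().trim());
            }
        }
        return titles;
    }

    public static void main(String[] args) {
        // Hypothetical minimal record built around the bill title quoted later in the talk.
        String xml = "<mods xmlns=\"http://www.loc.gov/mods/v3\"><titleInfo><title>"
                + "Officer Daniel Faulkner Children of Fallen Heroes Scholarship Act of 2010"
                + "</title></titleInfo></mods>";
        System.out.println(titles(xml));
    }
}
```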

AIP: MODS

AIP: PREMIS
- Preservation metadata
- Extension to METS
- PREMIS Data Model
  - Intellectual Entity
  - Object Entity
  - Event Entity
  - Agent Entity
  - Rights Entity*
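
PREMIS Event entities can be listed in a similar way. This sketch assumes PREMIS 2.x markup (namespace `info:lc/xmlns/premis-v2`); PREMIS 3 files use `http://www.loc.gov/premis/v3` instead. The class name, output format, and sample event are hypothetical and not taken from FDsys data.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Sketch: summarize the PREMIS events recorded for an AIP.
class PremisEvents {
    // Assumed PREMIS 2.x namespace; PREMIS 3 uses http://www.loc.gov/premis/v3.
    static final String PREMIS_NS = "info:lc/xmlns/premis-v2";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getElementsByTagNameNS
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Text of the first PREMIS descendant with this local name, or "" if absent.
    static String firstText(Element parent, String localName) {
        NodeList nodes = parent.getElementsByTagNameNS(PREMIS_NS, localName);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent().trim() : "";
    }

    // One "eventType at eventDateTime" line per premis:event, in document order.
    static List<String> events(String xml) {
        NodeList events = parse(xml).getElementsByTagNameNS(PREMIS_NS, "event");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < events.getLength(); i++) {
            Element event = (Element) events.item(i);
            out.add(firstText(event, "eventType") + " at " + firstText(event, "eventDateTime"));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample event.
        String xml = "<premis:premis xmlns:premis=\"info:lc/xmlns/premis-v2\" version=\"2.0\">"
                + "<premis:event><premis:eventType>ingestion</premis:eventType>"
                + "<premis:eventDateTime>2010-10-01T09:30:00</premis:eventDateTime></premis:event>"
                + "</premis:premis>";
        System.out.println(events(xml)); // [ingestion at 2010-10-01T09:30:00]
    }
}
```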

AIP: PREMIS

Solution Strategy
- The data we have received are AIPs, not SIPs, and repository software ingests SIPs. We must therefore write scripts that parse the AIPs and construct SIPs from an arbitrary file structure, and then ingest those SIPs into repository software to create new AIPs for the same information.
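
The SIP format has not been chosen yet. Purely as an assumed example, DSpace's Simple Archive Format (SAF) is one SIP-like layout that DSpace's batch importer accepts: one directory per item holding the content files, a `contents` file listing them, and a `dublin_core.xml` file of `dcvalue` entries. The hypothetical `SafItemWriter` below writes such a directory; mapping every field to unqualified Dublin Core is a simplification for illustration, not a recommended crosswalk from MODS.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: write one item in DSpace's Simple Archive Format (SAF). Using SAF as
// the SIP format and unqualified Dublin Core for every field are assumptions.
class SafItemWriter {

    // Escapes the characters that are special in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Builds dublin_core.xml from DC element names (e.g. "title") to values.
    static String dublinCoreXml(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<dublin_core>\n");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            xml.append("  <dcvalue element=\"").append(escape(f.getKey()))
               .append("\" qualifier=\"none\">").append(escape(f.getValue()))
               .append("</dcvalue>\n");
        }
        return xml.append("</dublin_core>\n").toString();
    }

    // Copies the content files into itemDir and writes the two SAF metadata files.
    static void writeItem(Path itemDir, Map<String, String> fields, List<Path> files) throws IOException {
        Files.createDirectories(itemDir);
        StringBuilder contents = new StringBuilder();
        for (Path file : files) {
            String name = file.getFileName().toString();
            Files.copy(file, itemDir.resolve(name), StandardCopyOption.REPLACE_EXISTING);
            contents.append(name).append('\n'); // one bitstream file name per line
        }
        Files.write(itemDir.resolve("contents"), contents.toString().getBytes(StandardCharsets.UTF_8));
        Files.write(itemDir.resolve("dublin_core.xml"), dublinCoreXml(fields).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", "Officer Daniel Faulkner Children of Fallen Heroes Scholarship Act of 2010");
        System.out.print(dublinCoreXml(fields));
    }
}
```

Keeping the XML generation in a pure function separate from the file copying makes the metadata mapping easy to test without touching the disk.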

XML Parsing
- We plan to use the Java programming language for our scripting needs. The Java API for XML Processing (JAXP) is the standard Java library for parsing XML.
- It offers several ways to represent and process XML, including DOM (an in-memory tree), SAX (streaming event callbacks), and StAX (pull parsing).
- Once rendered human-readable, the AIP files will need to be converted into a new SIP schema of our own design that describes only the information that still appears relevant.
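
DOM loads an entire document into memory as a tree. SAX instead streams through the file and calls handler methods as it meets each tag, which suits a first survey of large or unfamiliar AIP files. The hypothetical `ElementCounter` sketch below uses JAXP's SAX API to tally element names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: stream through XML with JAXP's SAX API and count element local names.
class ElementCounter {

    static Map<String, Integer> count(String xml) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by element name
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                counts.merge(localName, 1, Integer::sum); // called once per start tag
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // makes the parser supply uri and localName
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return counts;
    }

    public static void main(String[] args) {
        String xml = "<mods xmlns=\"http://www.loc.gov/mods/v3\"><titleInfo><title>A</title></titleInfo>"
                + "<titleInfo><title>B</title></titleInfo></mods>";
        System.out.println(count(xml)); // {mods=1, title=2, titleInfo=2}
    }
}
```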

XML Parsing Example
- This is a portion of a sample FDsys MODS file that summarizes a bill in Congress:
  BILLS To increase Federal Pell Grants for the children of fallen public safety officers, and for other purposes.;Officer Daniel Faulkner Children of Fallen Heroes Scholarship Act of 2010;S (IS) Bills and Statutes 111_cong_bills legislative

XML Parsing Example
- We might expect this type of output once the file is properly parsed:
  Collection code: “BILLS”
  Search title: “To increase Federal Pell Grants for the children of fallen public safety officers, and for other purposes.;Officer Daniel Faulkner Children of Fallen Heroes Scholarship Act of 2010;S (IS)”
  Category: “Bills and Statutes”
  WAIS database name: “111_cong_bills”
  Branch: legislative
  Date ingested:
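
One way to produce that kind of output is sketched below. The labels follow the expected output above, but the XML element names (`collectionCode`, `searchTitle`, `category`, `waisDatabaseName`, `branch`) are assumptions that would have to be checked against real FDsys MODS files. Matching on local name in any namespace (`"*"`) avoids having to know which namespace the FDsys extension elements use.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Sketch: turn FDsys-style MODS extension fields into labeled, readable output.
class FdsysModsSummary {
    // Hypothetical element names paired with display labels; NOT confirmed FDsys names.
    static final String[][] FIELDS = {
        {"collectionCode", "Collection code"},
        {"searchTitle", "Search title"},
        {"category", "Category"},
        {"waisDatabaseName", "WAIS database name"},
        {"branch", "Branch"},
    };

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getElementsByTagNameNS
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Maps each label to the text of the first element with that local name in
    // any namespace; fields absent from the document are skipped.
    static Map<String, String> summarize(String xml) {
        Document doc = parse(xml);
        Map<String, String> out = new LinkedHashMap<>();
        for (String[] field : FIELDS) {
            NodeList nodes = doc.getElementsByTagNameNS("*", field[0]);
            if (nodes.getLength() > 0) {
                out.put(field[1], nodes.item(0).getTextContent().trim());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical markup around the values from the sample above.
        String xml = "<mods xmlns=\"http://www.loc.gov/mods/v3\"><extension>"
                + "<collectionCode>BILLS</collectionCode><category>Bills and Statutes</category>"
                + "<waisDatabaseName>111_cong_bills</waisDatabaseName><branch>legislative</branch>"
                + "</extension></mods>";
        summarize(xml).forEach((label, value) -> System.out.println(label + ": \"" + value + "\""));
    }
}
```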

A Note on Deliverables
- Because our aim is not to design software, this is not a typical computer science design project. Instead, we are running coded experimental tests on real data and drawing conclusions from the results.
- Deliverables will most likely include:
  - a written report of our findings and recommendations
  - a reorganized version of the input data

Testing
- After parsing and organizing the data, we will need to check that the reconstruction is accurate. We may send a preliminary report to GPO for verification.
- The exact testing procedure is still undefined because we have not yet investigated the data in depth. Our goals should become clearer once we understand exactly what type of data we are dealing with.
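
One concrete accuracy check, sketched here as an assumption rather than a settled procedure, is fixity: METS `file` elements can record a `CHECKSUM` and `CHECKSUMTYPE`, and each file's location is given by an `FLocat` element's `xlink:href`. The hypothetical `FixityCheck` class recomputes each checksum with `java.security.MessageDigest` and reports files that are missing or do not match, which also shows whether every file the AIP references is actually present.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Sketch: check files listed in a METS fileSec against their recorded checksums.
class FixityCheck {
    static final String METS_NS = "http://www.loc.gov/METS/";
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Lowercase hex digest; METS CHECKSUMTYPE values such as "MD5", "SHA-1" and
    // "SHA-256" are also valid MessageDigest algorithm names.
    static String digestHex(byte[] data, String checksumType) {
        try {
            StringBuilder hex = new StringBuilder();
            for (byte b : MessageDigest.getInstance(checksumType).digest(data)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("unsupported checksum type: " + checksumType, e);
        }
    }

    // Lists every mets:file with a recorded checksum whose first FLocat,
    // resolved against baseDir, is missing, unsupported, or does not match.
    static List<String> verify(String metsXml, Path baseDir) {
        List<String> problems = new ArrayList<>();
        NodeList files = parse(metsXml).getElementsByTagNameNS(METS_NS, "file");
        for (int i = 0; i < files.getLength(); i++) {
            Element file = (Element) files.item(i);
            String expected = file.getAttribute("CHECKSUM");
            String type = file.getAttribute("CHECKSUMTYPE");
            NodeList locs = file.getElementsByTagNameNS(METS_NS, "FLocat");
            if (expected.isEmpty() || type.isEmpty() || locs.getLength() == 0) {
                continue; // nothing recorded to check against
            }
            String href = ((Element) locs.item(0)).getAttributeNS(XLINK_NS, "href");
            byte[] data;
            try {
                data = Files.readAllBytes(baseDir.resolve(href));
            } catch (IOException | InvalidPathException e) {
                problems.add(href + " (missing or unreadable)");
                continue;
            }
            try {
                if (!digestHex(data, type).equalsIgnoreCase(expected)) {
                    problems.add(href + " (checksum mismatch)");
                }
            } catch (IllegalArgumentException e) {
                problems.add(href + " (unsupported checksum type " + type + ")");
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("aip");
        Files.write(dir.resolve("doc.txt"), "abc".getBytes(StandardCharsets.UTF_8));
        String mets = "<mets:mets xmlns:mets=\"http://www.loc.gov/METS/\""
                + " xmlns:xlink=\"http://www.w3.org/1999/xlink\"><mets:fileSec><mets:fileGrp>"
                + "<mets:file ID=\"f1\" CHECKSUMTYPE=\"MD5\" CHECKSUM=\"900150983cd24fb0d6963f7d28e17f72\">"
                + "<mets:FLocat LOCTYPE=\"URL\" xlink:href=\"doc.txt\"/></mets:file>"
                + "</mets:fileGrp></mets:fileSec></mets:mets>";
        System.out.println(verify(mets, dir)); // prints [] because doc.txt matches
    }
}
```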

Repositories
- Third-party repository software to ingest the SIPs we create:
  - DSpace, Fedora Commons (DuraSpace)
- Based on a few simple technologies:
  - Java
  - MySQL
  - Apache Tomcat
  - JavaScript Server

Conclusion
- Our thanks to Kate, Dr. Abbott, and Dr. Pamula for their support.