National Geospatial Digital Archive Greg Janée University of California at Santa Barbara.

Presentation transcript:

National Geospatial Digital Archive Greg Janée University of California at Santa Barbara

Greg Janée Edinburgh workshop

Overview
One of 8 NDIIPP projects funded by Library of Congress
  – joint project with Stanford University
Goal: long-term, wide-scale preservation of geospatial data
Preservation architecture & prototype archive
  – single-digit terabytes
  – CaSIL: GIS datasets, remote-sensing imagery, aerial photography
  – Rumsey collection: scanned maps

Common starting hypothesis
[Figure: timeline in years, with labels "recent content", "now", and "take action now"]

NGDA starting hypothesis
[Figure: timeline marked "now - 50", "now", and "now + 50", with labels "old content", "take action", "content", and "ancient content"; captioned “mid-century perspective”]

Mid-century perspective
Repeated migrations across storage media and storage systems
  – past and future
Repeated migrations across archive management systems
  – each possibly necessitating transformation and reorganization of archived content
Repeated handoffs between institutions
  – each implementing different policies

Mid-century perspective
Migrations/handoffs may occur asynchronously
  – different evolution rates, pressures
Ability to interpret archived data may change and deteriorate
Information value, resource levels change over time
  – need an ultra-low cost, “fallback” preservation mode

NGDA architecture goals
Facilitate migration at all levels
  – separate levels to accommodate asynchronicity
Provide fallback mode
  – for individual objects and entire archives
Capture semantics
Cheap & easy
  – or preservation can’t be large-scale

Semantics
Def: knowledge needed to interpret and use information that is not shared by the target user community
Simple documents:
  – descriptive metadata, format specification sufficient
Remote sensing imagery
  – data interpretation, usage, processing, calibration
  – in practice, such semantics are handled separately
Climate data records
  – require periodic reprocessing

Ozone reprocessing requirements
xDRs
Delivered IPs
Engineering data (incl. C3S data if not in RDRs)
Upload files
Databases
Software (source code)
Calibration artifacts
  – data
  – analysis tools
  – tables
  – logs
  – notebooks
  – instrument design
All project documentation
All scientific papers
All reports
* Courtesy of Mike Linda, NASA GSFC; from 2006 NOAA CLASS workshop

NGDA architecture
[Figure: component diagram with ingest, access, and export flows; component labels follow]
archive server: builds and validates archival objects; associates objects with semantics
reliable storage subsystem: Archivas cluster
format registry: maintains directory of formats; stores specification documents; models inter-format relationships
registry wiki: supports collaborative management of format registry
ingest crawler: crawls provider content; maps content to archival objects; maintains identifier associations
ADL: provides spatiotemporal, other types of search; integrated OAI server
NGDA archive data model: defines uniform, self-contained representation of archival objects, object semantics, and inter-object relationships
storage API: abstracts storage subsystem
webview: crawlable, HTML view of archive
ADL mapper: maps archival objects to ADL items
SII: “single item ingest”; archive management

Federation interaction points
1. Format registry… provides a central place for data providers to describe file semantics, and for archives and end users to reference those semantics.
2. Ingest services and tools… allow data providers to transfer content into an archive.
3. Access services… allow end users to search for and use content across the entire federation, and allow third parties to provide value-added access services.
4. Archive data model… defines a uniform representation of archive content; archives that implement or map to the data model can employ NGDA tools to provide access and export services.
5. Export function… transfers archive content in bulk to other archives for replication and migration purposes; ancillary object semantics are automatically included.
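
The export function's promise that ancillary object semantics are automatically included can be read as a reachability computation: export the requested objects plus every object they depend on for interpretation. The Python sketch below illustrates that reading only. The function name export_closure, the dictionary layout, and the "semantics" key are hypothetical and are not taken from NGDA.

```python
from collections import deque


def export_closure(archive, requested):
    """Return the requested object ids plus every object reachable from
    them through "semantics" links, in breadth-first order.

    `archive` is a hypothetical in-memory stand-in for an archive: a dict
    mapping an object id to {"semantics": [ids of defining objects]}.
    """
    ordered = []
    visited = set()
    queue = deque(requested)
    while queue:
        object_id = queue.popleft()
        if object_id in visited:
            continue  # already scheduled for export; also guards against cycles
        visited.add(object_id)
        ordered.append(object_id)
        queue.extend(archive[object_id].get("semantics", []))
    return ordered


# Illustrative data: a scanned map that needs a format definition,
# which in turn builds on a second format definition.
example_archive = {
    "map-1": {"semantics": ["geotiff"]},
    "geotiff": {"semantics": ["tiff"]},
    "tiff": {"semantics": []},
    "unrelated": {"semantics": []},
}
print(export_closure(example_archive, ["map-1"]))
```

Under this reading, an archive receiving the export gets the format definitions along with the data, so the content can still be interpreted after a handoff even if the original format registry is unavailable.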

Storage system requirements
Req’s:
  – associate UUIDs/RIDs with bitstreams
  – retrieve global/local bitstream by UUID/RID
  – determine (parent) UUID of any bitstream
  – list all UUIDs
Satisfied by:
  – any filesystem
  – any kind of UUIDs
    tag:library.ucsb.edu,2005:identifier
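
The four requirements above can be met by an ordinary directory tree, which is the point of "any filesystem". The following is a minimal Python sketch, not the NGDA storage API. The class name FilesystemStore, its method names, and the "_self" file name are hypothetical. It assumes that an RID is an identifier relative to a parent UUID, which the slide does not spell out, and it mints random UUIDs where the slide allows any kind of identifier.

```python
import tempfile
import uuid
from pathlib import Path
from typing import List, Optional


class FilesystemStore:
    """Hypothetical store: one directory per UUID, one file per RID."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put_global(self, data: bytes) -> str:
        """Associate a new UUID with a bitstream and return the UUID."""
        uid = str(uuid.uuid4())
        obj_dir = self.root / uid
        obj_dir.mkdir()
        (obj_dir / "_self").write_bytes(data)
        return uid

    def put_local(self, parent: str, rid: str, data: bytes) -> None:
        """Associate an RID, relative to an existing parent UUID, with a bitstream."""
        (self.root / parent / rid).write_bytes(data)

    def get(self, uid: str, rid: Optional[str] = None) -> bytes:
        """Retrieve a global bitstream by UUID, or a local one by UUID and RID."""
        name = "_self" if rid is None else rid
        return (self.root / uid / name).read_bytes()

    def parent_of(self, bitstream_path) -> str:
        """Determine the parent UUID of a stored bitstream from its path."""
        return Path(bitstream_path).parent.name

    def list_uuids(self) -> List[str]:
        """List all UUIDs known to the store."""
        return sorted(p.name for p in self.root.iterdir() if p.is_dir())


# Usage: store a manifest as a global bitstream and an image as a local one.
demo_store = FilesystemStore(tempfile.mkdtemp())
demo_uid = demo_store.put_global(b"manifest bytes")
demo_store.put_local(demo_uid, "scan.tif", b"image bytes")
print(demo_store.list_uuids() == [demo_uid])
```

Because the layout is plain files and directories, the content stays readable with basic filesystem tools even when no archive software is running, which fits the fallback preservation mode described earlier.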

Data model
Physical implementation of OAIS logical model
  – filesystem
  – files and directories identified by UUIDs
  – XML manifests
Organizing principle: archival object
  – individually reusable unit of information
  – groups metadata, data, derivatives, etc.
Inter-object relationships
  – semantic definitions
  – lineage
  – collections and other aggregations
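
To make the idea of an archival object described by an XML manifest concrete, here is a small Python sketch that writes one. It is illustrative only: the slides do not give the NGDA manifest schema, so the element names (manifest, component, relationship), the attribute names, and the function build_manifest are all invented for this example.

```python
import uuid
import xml.etree.ElementTree as ET


def build_manifest(components, semantics=()):
    """Return (object UUID, manifest XML text) for one archival object.

    `components` maps a relative identifier (RID) to a role such as
    "data" or "metadata". `semantics` lists identifiers of other objects
    that define how to interpret this one. The schema is hypothetical.
    """
    object_id = str(uuid.uuid4())
    root = ET.Element("manifest", {"uuid": object_id})
    # Components are grouped under the object, as the data model describes.
    for rid, role in sorted(components.items()):
        ET.SubElement(root, "component", {"rid": rid, "role": role})
    # Inter-object relationships point at other archival objects.
    for target in semantics:
        ET.SubElement(root, "relationship", {"type": "semantics", "target": target})
    return object_id, ET.tostring(root, encoding="unicode")


# Usage: a scanned map with one data file, one metadata file,
# and a link to a (hypothetical) format definition object.
example_id, example_xml = build_manifest(
    {"scan.tif": "data", "fgdc.xml": "metadata"},
    semantics=["format-definition-id"],
)
print(example_xml)
```

Keeping the relationships inside the manifest means each object carries its own pointers to the semantics it needs, so it can be moved or exported as a self-contained unit.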

Archival objects
[Figure: archival object diagram with labels "manifest", "UUID", "component", and "RID"]

Towards a more layered architecture
[Figure: diagram with labels "providers", "users", and "archive"]

Towards a more layered architecture
storage virtualization layer
  – provides structure-neutral storage
  – interoperability between archival, working storage
  – implements storage policies
archive object layer
  – defines standard structuring of content
  – maintains persistent associations to semantics
archive
  – asserts control
  – defines policy

Questions?