Preservation and Archiving Special Interest Group Spring Meeting San Francisco, 27-29 May 2008 Preservation Characterization Stephen Abrams California Digital Library


Preservation and Archiving Special Interest Group Spring Meeting San Francisco, May 2008 Preservation Characterization Stephen Abrams California Digital Library

Characterization /ker-ik-t(ə-)rə-zā'-shən/ noun
1. The action or result of characterizing.
2. Description of characteristics or essential features.

Characterization
Knowing what you have, as a stable starting point for iterative preservation analysis, planning, and action.
Adopted from A. Brown, “Developing Practical Approaches to Active Preservation,” IJDC 1:2 (June 2007).

What? So what?
What do you have?
–Identification
–Feature extraction
–Conformance
What should you do with what you have?
–Assessment

Ingest workflow

Migration workflow

Two approaches to characterization
Implicit
–Custom grammars, each defining a single format, processed by a generic engine that understands all grammars
  Unix file
  National Archives (UK) DROID
  Open Grid Forum DFDL
  Planets XCEL/XCDL
Explicit
–Plug-in framework with custom modules that each understand a single format
  NLNZ Metadata Extractor
  JHOVE
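The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how file, DROID, or JHOVE are implemented: the names `SIGNATURES`, `identify_implicit`, `GifModule`, `PdfModule`, and `identify_explicit` are hypothetical, and the "grammar" is reduced to a table of well-known magic numbers.

```python
from typing import Optional

# Implicit approach: each format is described as data (here, just magic
# numbers), and one generic engine interprets every description.
SIGNATURES = {
    "GIF": [b"GIF87a", b"GIF89a"],
    "PDF": [b"%PDF-"],
    "TIFF": [b"II*\x00", b"MM\x00*"],
}

def identify_implicit(data: bytes) -> Optional[str]:
    """Generic engine: knows nothing about any format beyond the table."""
    for fmt, magics in SIGNATURES.items():
        if any(data.startswith(magic) for magic in magics):
            return fmt
    return None

# Explicit approach: each format gets its own hand-written module behind
# a common interface, and a framework calls the modules in turn.
class GifModule:
    name = "GIF"

    def matches(self, data: bytes) -> bool:
        return data[:6] in (b"GIF87a", b"GIF89a")

class PdfModule:
    name = "PDF"

    def matches(self, data: bytes) -> bool:
        return data.startswith(b"%PDF-")

MODULES = [GifModule(), PdfModule()]

def identify_explicit(data: bytes) -> Optional[str]:
    """Framework: delegates all format knowledge to the plug-in modules."""
    for module in MODULES:
        if module.matches(data):
            return module.name
    return None
```

Adding a format to the implicit version means adding a table entry; adding one to the explicit version means writing another class.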

Why choose one over the other?
Implicit
–Pro: More sustainable in the long term
–Con: Is the formal notation rich enough to capture all nuances of formats of interest?
Explicit
–Pro: It’s just programming
–Con: It’s more programming

JHOVE
Extensible framework for format identification, validation, and characterization
–Pluggable format-specific modules for:
  GIF, JPEG, JPEG 2000, TIFF
  AIFF, WAVE
  ASCII, HTML, UTF-8, XML
  PDF
–GUI, command-line, and Java API
Collaborative project of Harvard University and the JSTOR Electronic-Archive Initiative
–Funded by Andrew W. Mellon Foundation
–GNU LGPL license

JHOVE2
A next-generation architecture for format-aware preservation processing
–Three-fold goals:
  Re-factor the existing architecture to achieve higher performance, simplify system integration, and encourage third-party enhancement
  Provide significant new function
  (Re-)Implement modules
Collaborative project of CDL, Portico, and Stanford University
–Funded by Library of Congress/NDIIPP
–Open source BSD license

JHOVE2 enhancements
JHOVE assumed 1 object = 1 file = 1 format
But what about…
–TIFF with embedded ICC profile and XMP metadata
  1 object = 1 file = 3 formats
–JPEG 2000 JPX fragmentation
  1 object = n files = 1 format
–ESRI Shapefile
  1 object = 3 files = 3 formats
JHOVE2 will support 1 object = n files = m formats
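One way to picture the "1 object = n files = m formats" model is as a small data structure. The sketch below is an assumption made for illustration, not the JHOVE2 object model: `SourceFile`, `PreservationObject`, and `shape` are hypothetical names, and the format labels are informal.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SourceFile:
    path: str
    formats: List[str]  # one file may carry several formats

@dataclass
class PreservationObject:
    name: str
    files: List[SourceFile] = field(default_factory=list)

    def shape(self) -> str:
        """Summarize the object as '1 object = n file(s) = m format(s)'."""
        formats = {fmt for f in self.files for fmt in f.formats}
        return f"1 object = {len(self.files)} file(s) = {len(formats)} format(s)"

# TIFF with an embedded ICC profile and XMP metadata: one file, three formats.
tiff = PreservationObject("scan", [SourceFile("scan.tif", ["TIFF", "ICC", "XMP"])])

# ESRI Shapefile: main file, index file, and dBASE attribute table.
shapefile = PreservationObject("roads", [
    SourceFile("roads.shp", ["Shapefile main"]),
    SourceFile("roads.shx", ["Shapefile index"]),
    SourceFile("roads.dbf", ["dBASE"]),
])

print(tiff.shape())
print(shapefile.shape())
```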

JHOVE2 enhancements
Generic plug-in interface
Configurable set of modules iteratively invoked against each object
Inter-module memory structure for stateful processing
Identification de-coupled from conformance
Standardized handling of format profiles and error reporting
Configurable conformance criteria
API-level support for limited editing
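The first four points (a generic plug-in interface, a configurable module list invoked in turn, shared memory between modules, and identification kept separate from conformance) can be sketched as follows. This is a hypothetical outline rather than the JHOVE2 API: `Module`, `Identifier`, `PdfConformance`, and `characterize` are invented names, and the PDF check is deliberately simplistic.

```python
from typing import Any, Dict, List, Protocol

class Module(Protocol):
    """Generic plug-in interface: every module sees the data and shared memory."""
    def process(self, data: bytes, memory: Dict[str, Any]) -> None: ...

class Identifier:
    """Identification only; records its result for later modules."""
    def process(self, data: bytes, memory: Dict[str, Any]) -> None:
        memory["format"] = "PDF" if data.startswith(b"%PDF-") else "unknown"

class PdfConformance:
    """Conformance only; relies on the identification left in shared memory."""
    def process(self, data: bytes, memory: Dict[str, Any]) -> None:
        if memory.get("format") != "PDF":
            return
        # Simplified stand-in for real validation: look for the EOF marker.
        memory["conforms"] = b"%%EOF" in data

def characterize(data: bytes, modules: List[Module]) -> Dict[str, Any]:
    """Invoke the configured modules in order against one object."""
    memory: Dict[str, Any] = {}
    for module in modules:
        module.process(data, memory)
    return memory

pipeline = [Identifier(), PdfConformance()]
print(characterize(b"%PDF-1.4\n...\n%%EOF", pipeline))
```

Because the module list is plain data, a site could reorder it or drop the conformance step without touching the identifier.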

JHOVE2 modules
Identification
Feature extraction and conformance for:
–GIF, JPEG, JPEG 2000, TIFF
–AIFF, WAVE
–ASCII, HTML, SGML, UTF-8, XML
–PDF
–Shapefile
–ICC
Symbolic display of selected binary formats
Assessment based on prior characterization and locally defined policy rules and heuristics

JHOVE2 modules
Identification
Feature extraction and conformance for:
–JPEG 2000, TIFF
–WAVE
–ASCII, SGML, UTF-8, XML
–PDF
–Shapefile
–ICC
Symbolic display of selected binary formats
Assessment based on prior characterization and locally defined policy rules and heuristics
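Assessment, as described above, takes the output of earlier characterization and checks it against local policy. A minimal sketch, assuming the characterization is available as a dictionary; the rule set, the property names (`format`, `valid`, `dpi`), the 300 dpi threshold, and the `assess` function are all hypothetical examples of what a local policy might contain.

```python
from typing import Any, Callable, Dict, List, Tuple

Rule = Tuple[str, Callable[[Dict[str, Any]], bool]]

# Locally defined policy: each rule is a description plus a predicate
# over previously extracted properties.
POLICY: List[Rule] = [
    ("format is on the accepted list",
     lambda c: c.get("format") in {"TIFF", "PDF", "JPEG 2000"}),
    ("object is valid",
     lambda c: c.get("valid") is True),
    ("TIFF images have at least 300 dpi",
     lambda c: c.get("format") != "TIFF" or c.get("dpi", 0) >= 300),
]

def assess(characterization: Dict[str, Any]) -> List[str]:
    """Return the descriptions of all policy rules the object fails."""
    return [desc for desc, check in POLICY if not check(characterization)]

print(assess({"format": "TIFF", "valid": True, "dpi": 600}))
print(assess({"format": "TIFF", "valid": True, "dpi": 72}))
```

An empty result means the object meets local policy; anything else is a list of reasons that could feed a preservation planning step.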

JHOVE2 data abstraction
Determine the “natural” conceptual structures of a format and their component attributes
–Each such structure maps to a class with methods for parsing, validating, reporting, and serializing
–Each such attribute maps to a field with accessor and mutator methods
  UTF-8 → Character
  TIFF → IFH and IFD
  JPEG 2000 → Box
  PDF → boolean, number, string, name, array, dictionary, stream, and null
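Taking the TIFF Image File Header (IFH) as the example structure, the mapping to a class might look roughly like this. The sketch is illustrative only and is not JHOVE2 code: the class and method names are assumptions, and only the 8-byte header layout (byte-order mark, the number 42, offset of the first IFD) comes from the TIFF specification.

```python
import struct
from typing import Any, Dict

class ImageFileHeader:
    """One 'natural' TIFF structure: the 8-byte Image File Header."""

    def __init__(self) -> None:
        self._byte_order = b""
        self._magic = 0
        self._first_ifd_offset = 0

    # Attribute mapped to a field with accessor and mutator.
    @property
    def first_ifd_offset(self) -> int:
        return self._first_ifd_offset

    @first_ifd_offset.setter
    def first_ifd_offset(self, value: int) -> None:
        self._first_ifd_offset = value

    def _endian(self) -> str:
        return "<" if self._byte_order == b"II" else ">"

    def parse(self, data: bytes) -> "ImageFileHeader":
        if len(data) < 8:
            raise ValueError("a TIFF header needs at least 8 bytes")
        self._byte_order = data[:2]
        self._magic, self._first_ifd_offset = struct.unpack(
            self._endian() + "HI", data[2:8])
        return self

    def validate(self) -> bool:
        return (self._byte_order in (b"II", b"MM")
                and self._magic == 42
                and self._first_ifd_offset >= 8)

    def report(self) -> Dict[str, Any]:
        return {"byteOrder": self._byte_order.decode("ascii", "replace"),
                "magic": self._magic,
                "firstIFDOffset": self._first_ifd_offset}

    def serialize(self) -> bytes:
        return self._byte_order + struct.pack(
            self._endian() + "HI", self._magic, self._first_ifd_offset)

header = ImageFileHeader().parse(b"II*\x00\x08\x00\x00\x00")
print(header.validate(), header.report())
```

The mutator and `serialize` together give a small example of the "limited editing" idea: change `first_ifd_offset`, then write the header back out.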

JHOVE2 timeline
–Months 1-6: Outreach, design, and prototyping
–Months 7-9: Core APIs and framework
–Months [...]: Modules

For more information…
droid.sourceforge.net
forge.gridforum.org/projects/dfdl-wg
hki.uni-koeln.de/planets/
meta-extractor.sourceforge.net
hul.harvard.edu/jhove