Preservation Seminar 8 Jan 2007 1 CASPAR: Long term preservation of digitally encoded information David Giaretta.

Presentation transcript:


CASPAR aims
Produce tools and techniques to support digital preservation and make it easier to share the cost
  - must be relatively easy to use
  - must have a low “buy-in” in terms of effort required for adoption
  - must avoid requiring wholesale change of everyone else’s systems
  - must be decentralised and reproducible so that it can live on after the formal end of the CASPAR project
  - must be “preservable”
  - must be open: open source, open standards
Cannot do everything but should do something broadly useful
Working closely with the UK Digital Curation Centre

Digital Preservation…
Easy to do… as long as you can provide money forever
Easy to test claims about tools… as long as you live a long time

Validation
Demonstrate theoretical basis
“Accelerated lifetime” tests
  - Changes in hardware
  - Changes in environment
  - Changes in Designated Community
Demonstrate increased trustworthiness
  - Measured using draft Certification Standard

Digital Preservation
Need to preserve information and knowledge, not just “the bits”
  - Documents and videos are rendered: simple?
  - Data must be processed, in new ways: harder
Need to manage knowledge to keep archives alive through time
  - Preservation is a process, not a one-time event
  - Preservation is expensive; costs need to be shared
The alternative is money: endless supplies of money
The Open Archival Information System (OAIS) Reference Model (ISO 14721) provides a general conceptual framework

Disincentives for preservation: cost
[Figure labels: Money; Time; Budget available]
If cost of preserving old information increases…
Need to show that costs are contained

Immediate benefits of Digital Preservation: Use of Unfamiliar Data
Global Cyber-Infrastructures allow users to find and try to use data from many sources
  - Some sources will be familiar
  - Most available sources will be unfamiliar
How can one be sure that the unfamiliar data is used correctly?
Garbage in, garbage out
Need to be able to deal with unfamiliar data whether it is contemporary or old (preserved)

OAIS Reference Model
ISO 14721: Reference Model for an Open Archival Information System (OAIS).
An OAIS is an archive, consisting of an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community.
Long Term Preservation: The act of maintaining information, in a correct and Independently Understandable form, over the Long Term.
Long Term: long enough to be concerned with the impacts of changing technologies, including support for new media and data formats, or with a changing user community.
Designated Community: An identified group of potential Consumers who should be able to understand a particular set of information. The Designated Community may be composed of multiple user communities.
Independently Understandable: has sufficient documentation to allow the information to be understood and used by the Designated Community without having to resort to special resources not widely available, including named individuals.
[Graphic: “OASIS” and “OAI”, each marked with an X]

OAIS Reference Model: Functional Model

OAIS Information Model
[Figure: class diagram]
Information Object: a Data Object interpreted using 1+ Representation Information
Data Object: either a Physical Object or a Digital Object; a Digital Object is made of 1+ Bit Sequence
Representation Information is itself interpreted using further Representation Information
Recursion ends at the KNOWLEDGE BASE of the DESIGNATED COMMUNITY (this knowledge will change over time and region)

Rep. Info. Classification

[Figure: diagram with node labels FITS FILE; FITS STANDARD; PDF STANDARD; FITS JAVA s/w; JAVA VM; PDF s/w; FITS DICTIONARY SPECIFICATION; UNICODE SPECIFICATION; XML SPECIFICATION]

Representation Information
The Data Object is “interpreted using” the Representation Information (RepInfo)
The Reference Model is designed to ensure that an OAIS is not set the impossible task of having to provide all possible RepInfo immediately
Hence:
  - Take account of the Designated Community and its associated Knowledge Base
The amount of RepInfo is not fixed
  - Additional RepInfo will be needed over time

Early Results
High level architecture for sharing cost and access to Representation Information
Detailed examinations of specific datasets to understand what is really needed to keep them understandable and usable

Rep. Info. Use and maintenance

Registry for Representation Info
1. User gets data from archive. Data has an associated Curation Persistent Identifier (CPID)
2. User is unfamiliar with the data, so requests RepInfo using the CPID
3. User receives RepInfo, which has its own CPID in case it is not immediately usable
The Digital Object could have RepInfo packed with it, as well as the CPID
Support automated access and processing
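
The lookup described on this slide can be sketched as a small loop: fetch the RepInfo for a CPID, and if that RepInfo points at further RepInfo the user does not already know, fetch that too. This is a minimal sketch, not the CASPAR or DCC registry API. The `RepInfo` class, the `cpid:` strings, the registry contents and the assumption that the FITS standard needs the PDF standard are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RepInfo:
    """One piece of Representation Information held by a registry (hypothetical shape)."""
    cpid: str                                   # the RepInfo's own identifier
    description: str
    needs: list = field(default_factory=list)   # CPIDs of further RepInfo


# Hypothetical registry contents: CPID -> RepInfo.
REGISTRY = {
    "cpid:fits-standard": RepInfo("cpid:fits-standard", "FITS standard",
                                  ["cpid:pdf-standard"]),
    "cpid:pdf-standard": RepInfo("cpid:pdf-standard", "PDF standard"),
}


def collect_rep_info(cpid, known, registry=REGISTRY):
    """Follow CPIDs until everything left is already in the user's knowledge base.

    `known` is the set of CPIDs the user can already interpret unaided.
    """
    needed = []
    pending = [cpid]
    seen = set()
    while pending:
        current = pending.pop()
        if current in known or current in seen:
            continue                      # recursion ends at the knowledge base
        seen.add(current)
        entry = registry[current]
        needed.append(entry)
        pending.extend(entry.needs)       # this RepInfo may need RepInfo itself
    return needed
```

A user who already understands PDF would call `collect_rep_info("cpid:fits-standard", known={"cpid:pdf-standard"})` and receive only the FITS standard entry.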

CASPAR information flow architecture
[Figure label: Rep Info]

CASPAR Testbeds
Three testbeds
  - Cultural: UNESCO
  - Performing Arts: INA, IRCAM
  - Scientific: ESA and CCLRC
Complex, multi-source, multifaceted data
Many common preservation, evaluation and validation issues
Some specific requirements on preservation (technical, delivery, legal)
  - Specific user communities / Knowledge Bases
Also test the OAIS model

Science: CCLRC example
World map of ionosondes

Example of use of RepInfo
Laser facility produces binary data normally used by proprietary software
Describe using EAST data description language
Use in generic application (shown here) to display/process

Some Issues
Difficult to derive physical quantities from data
  - Can be analysed in multiple ways
  - Raises fundamental questions about Representation Information
Common automated method is proprietary
  - Data structure also proprietary
  - Paper documentation: restricted access
Provenance and trust

ESA example
GOME: Global Ozone Monitoring Experiment on ERS-2

GOME data processing

GOME Level 4 product: Integration of GOME, other data and models
GOME Level 3 product: Integration of time and space data
GOME Level 2 product: Ozone profile at given location

Some Issues
Provenance and Context of processed data
  - relationship to Representation Information of raw data and Knowledge Base of Designated Community

UNESCO examples
DATA:
  - Scanned documents and maps
  - Aerial and close range photography (Digital photogrammetry)
  - Monument measurements (Laser scanning)
  - Satellite images (Remote sensing and image processing)
  - Multi-scale digital cartography (Geographic information systems (GIS) and CAD)
  - 3D models, virtual tours (Computer visualization)
Mandatory Documentation:
  - Identification of property
  - Description of property
  - Justification of inscription
  - State of conservation and factors affecting the property
  - Protection and Management
  - Monitoring
  - Documentation
  - Contact information of responsible authorities
  - Signature on behalf of the State Party(ies)
World Heritage List

Performing Arts examples
Examples:
  - Score
  - MAX/MSP patches
  - Additional instructions
[Figure 2: Preservation of interactive multimedia performances. Diagram labels: Motion Capture and Processing; Motion Analysis and Recognition; Motion-Multimedia Mapping Strategy; Multimedia Generation; GUI (for monitor & control); Motions; 3D motion data; Mapping Parameters; Multimedia output]

Some Issues
What is Preservation of “performability”?
  - Composer’s intention
Authenticity
Proprietary software and hardware
Copyright
Digital Rights Management

Shared Infrastructure
Registries of Representation Information
Persistent Identifier name resolvers
  - DOI? ARK? URL? None are guaranteed
Interfaces: support preservation and interoperability
Standards: Preservation Description Information
  - Fixity, Provenance, Reference, Context
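
As a rough illustration of the four Preservation Description Information categories named above, the sketch below stores them alongside a data object and uses the Fixity entry to check that the bits are unchanged. The slide does not name a fixity mechanism; using a SHA-256 digest, and the `PDI`, `make_pdi` and `fixity_ok` names, are assumptions made for this example.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PDI:
    """Preservation Description Information for one data object (illustrative)."""
    fixity: str       # assumed here to be a SHA-256 hex digest of the bits
    provenance: str   # where the data came from and what was done to it
    reference: str    # identifier by which the data is known
    context: str      # how the data relates to its environment


def make_pdi(data, provenance, reference, context):
    """Record the digest of the bits together with the other three PDI parts."""
    return PDI(hashlib.sha256(data).hexdigest(), provenance, reference, context)


def fixity_ok(data, pdi):
    """True if the bits still match the digest recorded at ingest."""
    return hashlib.sha256(data).hexdigest() == pdi.fixity
```

A digest only shows that the bits are unchanged; it says nothing about whether they can still be understood, which is the job of Representation Information.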

Accreditation/Certification for repositories
Long-standing demand for ability to measure trustworthiness of digital repositories
Part of OAIS “roadmap”
RLG/NARA working group
  - Version 1.0 Audit and Certification Checklist about to be released
New open workgroup to produce ISO standard for Audit and Certification
  - See http://mailman.ccsds.org/cgi-bin/mailman/listinfo/moims-rac to join mailing list

Knowledge at the heart of preservation
Knowledge driven approach
Knowledge management to support long-term preservation of concepts/information including:
  - Single, complex, on demand, interactive objects
  - DRM
  - Authenticity
  - Access
  - Storage
  - Designated Community: descriptions
Knowledge base definition ontologies

Possible Infrastructure Build-up
[Figure labels: European Preservation Infrastructure; Task Force on Permanent Access; Alliance; Other Alliance Members; CCLRC Curation Activities; CASPAR; Other CCLRC projects; FP7 projects]

WHEN
Component architecture and prototypes by month 12
Framework architecture month 18
Component integration months
Testbed implementations months
Project completion month 42


Conclusions
Information and Knowledge: preserving them needs more than just storing the “bits”
Understanding and being able to process the vast amount of unfamiliar data which is available is hard
It is expensive
  - Costs must be shared
So far the Open Archival Information System (OAIS) Reference Model provides a conceptual framework
  - Many similarities can be exploited
  - Many subtleties need to be explored
Watch this space

BACKUP SLIDES

Example RepInfo Label
A Label is itself RepInfo. It provides a way to collect together, in a sensible way, many individual pieces of RepInfo

Re-using RepInfo
Existing RepInfo can be used to build up further RepInfo
  - E.g. refer to existing RepInfo in labels

Versioning and LID
Each object has a unique identifier
Versions of an object share a “logical ID” (LID)
Simply using the LID gives the latest version
Can specify a particular version
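
The versioning rule on this slide can be sketched with a small in-memory store: every version gets its own unique identifier, all versions share one LID, and resolving the bare LID returns the newest version. The `VersionedStore` class and the `LID:number` identifier scheme are hypothetical; the slide does not say how the real registry forms its identifiers.

```python
class VersionedStore:
    """Toy store mapping a logical ID (LID) to its versions, oldest first."""

    def __init__(self):
        self._versions = {}   # LID -> list of unique identifiers

    def add_version(self, lid):
        """Add a new version under the LID and return its unique identifier."""
        versions = self._versions.setdefault(lid, [])
        unique_id = f"{lid}:{len(versions) + 1}"   # hypothetical ID scheme
        versions.append(unique_id)
        return unique_id

    def resolve(self, lid, version=None):
        """Return the latest version for a bare LID, or a specific one if asked."""
        versions = self._versions[lid]
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise KeyError(f"{lid} has no version {version}")
        return versions[version - 1]
```

Pinning a version matters for preservation: a label written today can keep pointing at the RepInfo that was valid when the data was archived, while casual users still get the latest description by default.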

Clients
DCC Registry:
  - Web browser
  - Thick client
Any Registry
  - Applications using API

GUI access to Registry

Classifications
Many Classification Schemes
Help to find RepInfo

Initial RepInfo
Simple text
  - ASCII
  - Unicode
  - UTF7/8
PDF, Word(!)
FITS format
FITS standard dictionaries
Things that are “MISSING”

RepInfo entry
Simple command line tool

Creating RepInfo
There are many tools which can be used to create RepInfo:
  - Simple text editor to create text describing the data
  - Complex tools to capture data description, e.g. EAST (see next slides), DFDL etc.
  - Programming languages of various sorts

EAST descriptions

OASIS tool for creating EAST descriptions
[Screenshot of OASIS]

Example of EAST description

Using RepInfo
A pointer to RepInfo can be attached to data
The RepInfo can be used to
  - Display
  - Examine
  - Process
  - Re-use
the data

Example of use of RepInfo
Laser facility produces binary data normally used by proprietary software
Describe using EAST data description language
Use in generic application (shown here) to display/process

Simple Buy-In
Need to add RepInfo to your Data Objects?
Does the RepInfo already exist?
  - Yes: get its ID and put that in a label
  - No: register what you have and be assigned an ID. Add more details later when needed
Or others can add more details
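
The decision on this slide is essentially "look up, else register". A minimal sketch follows, assuming an in-memory registry keyed by a human-readable name. The `RepInfoRegistry` class, its method names, the use of a random UUID as the assigned ID, and the dictionary shape of the label are all hypothetical, not the DCC registry interface.

```python
import uuid


class RepInfoRegistry:
    """Toy in-memory registry; names and methods are hypothetical."""

    def __init__(self):
        self._entries = {}      # ID -> {"name": ..., "details": [...]}
        self._id_by_name = {}   # name -> ID

    def find(self, name):
        """Return the ID of existing RepInfo, or None if it is not registered."""
        return self._id_by_name.get(name)

    def register(self, name, details=None):
        """Register what you have now and be assigned an ID."""
        rep_id = str(uuid.uuid4())
        self._entries[rep_id] = {"name": name, "details": list(details or [])}
        self._id_by_name[name] = rep_id
        return rep_id

    def add_details(self, rep_id, text):
        """The original depositor, or anyone else, can add more details later."""
        self._entries[rep_id]["details"].append(text)

    def details(self, rep_id):
        return list(self._entries[rep_id]["details"])


def make_label(data_name, rep_info_name, registry):
    """Reuse the RepInfo ID if it exists, otherwise register and get one."""
    rep_id = registry.find(rep_info_name)
    if rep_id is None:
        rep_id = registry.register(rep_info_name)
    return {"data": data_name, "rep_info_id": rep_id}
```

Two data sets described by the same RepInfo end up with labels carrying the same ID, which is what keeps the effort of adoption low: the second depositor does no description work at all.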

Preservation Issues
Given a file or a stream of bits, how does one know what Representation Information is needed? (This question applies to Representation Information itself as well as to the digital objects we are primarily interested in preserving and using.) How does one know, for example, whether this thing is in FITS format?
Someone may simply “know” what it is and how to deal with it, i.e. the bits are within the Knowledge Base
One may be able to recognise the format by looking for various types of patterns
One may feed the bits into all available interpreters to see which accept the data as valid
Other means…
The only safe way: have an associated label which points to the appropriate Representation Information
  - Note this does not exclude the other methods, e.g. for data rescue
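
The ordering of methods on this slide, label first and pattern matching only as a fallback, might look like the sketch below. The two signatures are the well-known leading bytes of FITS files (`SIMPLE  =`) and PDF files (`%PDF-`). The `identify` function, its return shape and the `rep_info_id` label key are hypothetical.

```python
# Leading byte patterns for two formats mentioned in this talk.
SIGNATURES = {
    "FITS": b"SIMPLE  =",   # first header card of a FITS primary header
    "PDF": b"%PDF-",        # start of a PDF file header
}


def identify(data, label=None):
    """Say how the needed RepInfo was found: by label, by pattern, or not at all.

    A label pointing at RepInfo is trusted first; pattern matching is only a
    fallback, e.g. for data rescue.
    """
    if label and "rep_info_id" in label:
        return ("label", label["rep_info_id"])
    for name, magic in SIGNATURES.items():
        if data.startswith(magic):
            return ("pattern", name)
    return ("unknown", None)
```

Pattern matching can only suggest a format name; it cannot say which version of the standard or which dictionaries apply, which is why the slide calls the label the only safe way.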

Example Label:

Access to Registry
Send a letter? Phone? ?
Read the Web page and copy the relevant information?
Software Access?
  - URL
  - Web Service
  - Application?

Registries: software access
Roll-your-own?

Lazy person’s Registry/Repository
Use existing standards
  - UDDI: no repository
  - ebXML
Additional advantage: helps integration with the GRID

Registry/Repository access
Interface and protocols: JAXR “standard”
Can talk to UDDI and ebXML registries
FreebXML implementation
  - many access methods: URL, Web Services, API, etc.

Persistent IDs
Findability
  - Persistent IDs: DOI, URN, ARK, PURL, etc.
What can we rely on?
Don’t put all your eggs in one basket

Example
e1fe9271-cd a63e-b112ebf792c /
For example, the ARK identifier is created by appending the string in "value" to that in the resolver of resolverType="ark".
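
The rule stated here, append the "value" string to the resolver string for each resolver type, can be sketched as follows. The resolver base strings use `example.org` placeholders and the `build_identifiers` name is hypothetical. The markup of the original example did not survive in this transcript, so nothing below reproduces it.

```python
def build_identifiers(value, resolvers):
    """Form one identifier per resolver type by appending `value` to its base.

    `resolvers` maps a resolverType (e.g. "ark") to that resolver's base string.
    """
    return {rtype: base + value for rtype, base in resolvers.items()}


# Hypothetical resolver bases, for illustration only.
EXAMPLE_RESOLVERS = {
    "ark": "https://ark.example.org/ark:/99999/",
    "url": "https://registry.example.org/id/",
}
```

Keeping one value with several resolvers follows the "eggs in one basket" advice from the Persistent IDs slide: if one resolver service disappears, the same value can still be resolved through another.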

Registry/Repository (regrep)
Has to be a trusted repository (of RepInfo)
  - Authenticity of RepInfo
  - Access control
  - Certificates/Digests (are they trustable over the long term?)
Extensibility
Distributed
  - Share the effort
Notification Service

Operating Registries
See RegistryProcedures