

a centre of expertise in data curation and preservation
eScience Collaborative Workshop, Imperial College, 16th October 2007

An overview of the OAIS and Representation Information
Digital Curation Centre – Imperial College Internet Centre Workshop
Imperial College, London, 16th October 2007
Manjula Patel, UKOLN, DCC, University of Bath, UK

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of this license, (a) visit sa/2.5/scotland/ ; or, (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.

Presentation Outline
- The OAIS Reference Model
  - Background
  - Concepts
  - Functional Model
  - Information Model
  - Representation Information and Networks
  - Responsibilities and Conformance
- Registry/Repository of Representation Information
  - DCC Development
  - RRoRI
- Case studies: crystallography, engineering

OAIS Background
- OAIS: Reference Model for an Open Archival Information System
- Development led by the Consultative Committee for Space Data Systems (CCSDS)
- Adopted as ISO 14721:2003 (currently under review)
- "Open" refers to development of the model in an open forum
- A reference model, not a blueprint for implementation
  - Establishes a common framework of terms and concepts
  - Identifies the basic functions of an OAIS
  - Defines an information model
- Three major areas of influence:
  - Preservation metadata schemas
  - Architecture and system design
  - Conformance criteria for archival repositories

OAIS Definition and Selected Concepts
- OAIS: an archive, consisting of an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community
- Designated Community: the community of stakeholders and users that the OAIS serves
- Knowledge Base: a set of information, incorporated by a user or system, that allows that user or system to understand the received information
- Information Object: Data Object + Representation Information
- Representation Information: any information required to render, interpret and understand digital data
- Information Package: Content Information + Preservation Description Information + Packaging Information (Submission, Archival and Dissemination Information Packages)
- Preservation Description Information: Provenance, Context, Reference and Fixity information
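The compositions above ("Information Object: Data Object + Representation Information", "Information Package: Content Information + PDI + Packaging Information") can be pictured as nested record types. The following Python sketch is illustrative only: every class name, field name and sample value is an assumption chosen to mirror the OAIS terms, not something defined by the standard or by any DCC software.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RepresentationInformation:
    """Anything needed to render, interpret and understand a data object."""
    description: str


@dataclass
class InformationObject:
    """Information Object = Data Object + Representation Information."""
    data_object: bytes
    representation_information: List[RepresentationInformation]


@dataclass
class PreservationDescriptionInformation:
    """PDI = Provenance, Context, Reference and Fixity information."""
    provenance: str
    context: str
    reference: str
    fixity: str


@dataclass
class InformationPackage:
    """Content Information + PDI + Packaging Information."""
    kind: str  # hypothetical tag: "SIP", "AIP" or "DIP"
    content_information: InformationObject
    pdi: PreservationDescriptionInformation
    packaging_information: str

    def __post_init__(self) -> None:
        # Reject anything other than the three OAIS package variants.
        if self.kind not in {"SIP", "AIP", "DIP"}:
            raise ValueError(f"unknown package kind: {self.kind}")


# Hypothetical sample values, for illustration only.
content = InformationObject(
    data_object=b"example bytes",
    representation_information=[
        RepresentationInformation("description of the file format"),
    ],
)
pdi = PreservationDescriptionInformation(
    provenance="deposited by a producer",
    context="part of an example collection",
    reference="example-id-001",
    fixity="checksum goes here",
)
package = InformationPackage("SIP", content, pdi, "single file, no container")
```

Modelling the package kind as a validated tag keeps the sketch short; a fuller design might use separate SIP, AIP and DIP types.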

OAIS Functional Model
[Figure: OAIS Functional Entities (Figure 4-1)]

OAIS Functional Entities
- Ingest: services and functions that accept SIPs from Producers, prepare AIPs for storage, and ensure that AIPs and their supporting Descriptive Information become established within the OAIS
- Archival Storage: services and functions used for the storage and retrieval of AIPs
- Data Management: services and functions for populating, maintaining, and accessing a wide variety of information
- Administration: services and functions needed to control the operation of the other OAIS functional entities on a day-to-day basis
- Preservation Planning: services and functions for monitoring the OAIS environment and ensuring that content remains accessible to the Designated Community
- Access: services and functions which make the archival information holdings and related services visible to Consumers
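As a rough illustration of how Ingest, Archival Storage, Data Management and Access relate, the Python sketch below passes one submission through them. It is a toy under stated assumptions: the function names, the "aip-" identifier scheme, the use of in-memory dictionaries as stores, and the choice of SHA-256 for fixity are all hypothetical and are not prescribed by the OAIS model.

```python
import hashlib
from typing import Dict


def ingest(sip_id: str, content: bytes, description: str,
           archival_storage: Dict[str, dict],
           data_management: Dict[str, str]) -> str:
    """Accept a SIP, prepare an AIP, and register it with the archive."""
    # Record fixity at ingest so later corruption can be detected.
    fixity = hashlib.sha256(content).hexdigest()
    aip_id = f"aip-{sip_id}"  # hypothetical identifier scheme
    # Archival Storage holds the AIP itself.
    archival_storage[aip_id] = {"content": content, "fixity": fixity}
    # Data Management holds the supporting Descriptive Information.
    data_management[aip_id] = description
    return aip_id


def access(aip_id: str, archival_storage: Dict[str, dict]) -> bytes:
    """Retrieve AIP content for a Consumer, checking fixity first."""
    aip = archival_storage[aip_id]
    if hashlib.sha256(aip["content"]).hexdigest() != aip["fixity"]:
        raise ValueError(f"fixity check failed for {aip_id}")
    return aip["content"]


# Hypothetical usage with in-memory stores.
storage: Dict[str, dict] = {}
catalogue: Dict[str, str] = {}
new_id = ingest("001", b"sample data", "a sample submission", storage, catalogue)
retrieved = access(new_id, storage)
```

Administration and Preservation Planning are left out because they govern the other entities over time rather than handling individual packages.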

OAIS Information Object
[Figure: OAIS Information Object (Figure 4-10). An Information Object is a Data Object interpreted using one or more items of Representation Information; a Data Object is either a Physical Object or a Digital Object; a Digital Object consists of one or more Bit Sequences; Representation Information is itself interpreted using further Representation Information.]

OAIS Representation Information (RI)
- Representation Information: any information required to render, interpret and understand digital data (includes file formats, software, algorithms, standards, semantic information etc.)
- Representation Information is recursive in nature
- It is essential that Representation Information itself is curated and preserved to maintain access to (render and interpret) digital data

OAIS Representation Information Classification

Types of Representation Information
- Structure, e.g. file formats for text, images, audio, moving images, datasets, 3D models
- Semantic, e.g. data dictionaries and knowledge organisation systems such as schemata, ontologies, metadata vocabularies and thesauri
- Other, e.g. software, algorithms, standards, time-dependent information, actions, processes

OAIS Representation Information Network
[Figure: OAIS Representation Information Object (Figure 4-11)]
- Recursion is terminated based on the Designated Community's Knowledge Base
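The termination rule can be made concrete with a small traversal. In this Python sketch, a Representation Information network is assumed to be a mapping from each item to the RI needed to interpret it, and the Knowledge Base is assumed to be a set of items the Designated Community already understands. The function name and the example network contents are hypothetical.

```python
from typing import Dict, List, Set


def required_ri(item: str,
                network: Dict[str, List[str]],
                knowledge_base: Set[str]) -> List[str]:
    """List the RI that must be supplied for `item` to be understood.

    The walk stops descending at anything already in the knowledge base.
    """
    needed: List[str] = []
    seen: Set[str] = set()

    def visit(node: str) -> None:
        for dependency in network.get(node, []):
            if dependency in knowledge_base or dependency in seen:
                continue  # recursion terminates here
            seen.add(dependency)
            needed.append(dependency)
            visit(dependency)

    visit(item)
    return needed


# Hypothetical network, for illustration only.
example_network = {
    "dataset": ["format specification"],
    "format specification": ["underlying syntax", "data dictionary"],
    "underlying syntax": ["character encoding"],
}

# A community that knows only the character encoding needs three items.
novice = required_ri("dataset", example_network, {"character encoding"})
# A community that already knows the format needs nothing further.
expert = required_ri("dataset", example_network, {"format specification"})
```

The same data therefore demands a larger or smaller RI network depending on who the archive has chosen to serve.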

OAIS Responsibilities and Conformance
- OAIS mandatory responsibilities:
  - Negotiating and accepting information
  - Obtaining sufficient control of the information to ensure long-term preservation
  - Determining the "designated community"
  - Ensuring that information is "independently understandable"
  - Following documented policies and procedures
  - Making the preserved information available
- Many repositories or preservation tools claim OAIS compliance, e.g. DSpace, OCLC Digital Archive, METS, LOCKSS etc.

OAIS: More
- Conformance and certification
  - OCLC/RLG Digital Archive Attributes Working Group (Report on Trusted Digital Repositories, 2002)
  - RLG-NARA Task Force on Digital Repository Certification (Draft checklist for self-certification, August 2005)
  - Trustworthy Repositories Audit & Certification (TRAC): Criteria and Checklist (CRL, Feb. 2007)
- Archival Information Units and Archival Information Collections
- Information Package transformations, e.g. for Ingest and Access
- Preservation perspectives:
  - Migration, e.g. refreshment, replication, repackaging, transformation
  - Preservation of look and feel (e.g. emulation, virtual machines)
- Archive interoperability, e.g. P2P, federation

DCC: Development
- Led by David Giaretta, Science and Technology Facilities Council
- The DCC Approach to Digital Curation sets out the path for development activities based on the OAIS:
  - Monitoring international standards
  - Development of a Registry/Repository of Representation Information (RRoRI)
  - Recommendations for tools and methods for generating Representation Information
  - Creating test-beds for digital curation tools
  - Creating auditing and certification processes for trusted repositories

RRoRI
- Representation Information is the key to long-term access
- RRoRI should be OAIS compliant
- Emphasis on interoperability and automated use
- Vision is to have a global, distributed network of RI
- Provide an infrastructure of reliable and trusted RI which other archives can rely on
- Investigate how RI fits into the work of other projects and initiatives
- Work now being undertaken jointly with the CASPAR Project
  - Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval
  - Integrated Project co-funded by EU FP6 Programme, April 2006

RRoRI: Curation Persistent Identifier
- The idea of RI is the key
  - Information Object: a specific object to be archived
  - RI: all information required to interpret and render the object
  - RI Label: used to connect RI to an Information Object
- The RI label serves as a mechanism for accessing RI in the RRoRI
  - A label is attached to each digital object
  - The label should identify RI
  - Provides a mechanism for combining individual RI components
  - May be a structured digital object itself (to cope with packaging of multiple objects)
- The RI label has a Curation Persistent Identifier (CPID)

Use of CPID
- The Digital Object could have RI packed with it, as well as a CPID
- Supports automated access and processing
1. User gets data from the archive. The data has an associated Curation Persistent Identifier (CPID)
2. User is unfamiliar with the data, so requests RI using the CPID
3. User receives RI, which has its own CPID in case it is not immediately usable
(David Giaretta, 2007)
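The request-and-follow pattern in these steps can be sketched as a loop over a registry. This Python example is a simplification under stated assumptions: the registry is an in-memory dictionary rather than the actual RRoRI service, and the record layout, the CPID strings and the `understands` callback are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class RIRecord:
    cpid: str                      # identifier of this piece of RI
    content: str                   # the RI itself (simplified to text)
    ri_cpid: Optional[str] = None  # CPID of the RI that explains this record


def resolve_chain(start_cpid: str,
                  registry: Dict[str, RIRecord],
                  understands: Callable[[RIRecord], bool]) -> List[RIRecord]:
    """Follow CPIDs until the user receives RI they can use."""
    chain: List[RIRecord] = []
    visited = set()
    cpid: Optional[str] = start_cpid
    while cpid is not None and cpid not in visited:
        visited.add(cpid)          # guard against circular references
        record = registry[cpid]    # raises KeyError for an unknown CPID
        chain.append(record)
        if understands(record):
            break                  # usable RI found, stop requesting
        cpid = record.ri_cpid      # otherwise ask for RI about the RI
    return chain


# Hypothetical registry contents.
example_registry = {
    "cpid:example:1": RIRecord("cpid:example:1", "binary format description",
                               "cpid:example:2"),
    "cpid:example:2": RIRecord("cpid:example:2", "plain-language guide"),
}

fetched = resolve_chain(
    "cpid:example:1",
    example_registry,
    understands=lambda record: "plain-language" in record.content,
)
```

Here the user cannot use the first record, so its own CPID is followed to a second record that is usable, and the chain ends there.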

RRoRI: Technical Platform
- freebXML registry
- SOAP messaging
- Java API
- HTTP access
- GUI Tool (label creation and RI ingest)

RRoRI Web access

GUI Tool
- Facilitates creation of RI labels and ingest of RI

Two case studies (preliminary work)
- eBank-UK Phase 3 study
  - JISC-funded from Sept 2006 to June 2007
  - UKOLN (lead), University of Southampton (NCS), University of Manchester
  - Open access to datasets
  - Linking research data to publications and scholarly communication
- Knowledge & Information Management through life (KIM-GC)
  - 8 academic partners
  - Industrial partners: construction; aerospace, defence suppliers; MOD; NHS
  - £5.5 million total funding, £3.68 million EPSRC/ESRC, Oct 2005 to Oct 2008
  - Develop tools and techniques for sustainable representation of product, process and design rationale
  - Develop approaches to learning about products in service: the performance of the artefact and its impact on users
  - Investigate the dynamics of knowledge use throughout the life-cycle of complex product-service systems, and make recommendations for improved effectiveness
  - Develop an intellectual framework for the above

eBank-UK Study
M. Patel and S. Coles, "A Study of Curation and Preservation issues in the eCrystals Data Repository and proposed federation", Sept
- Audit and certification (TRAC, DRAMBORA, NESTOR, ISO International repository audit and certification BOF Group)
- OAIS and Representation Information
- eBank-UK application profile and preservation metadata
- e-Prints.org repository platform

Crystallography Workflow (Simon Coles, 2006)
Data categories: RAW DATA, DERIVED DATA, RESULTS DATA
- Initialisation: mount new sample, set up data collection
- Collection: collect data
- Processing: process and correct images
- Solution: solve structures
- Refinement: refine structure
- CIF: produce Crystallographic Information File
- Validation: chemical and crystallographic checks
- Report: generate Crystal Structure Report

Capturing RI: eCrystals Repository
- Bounded domain (within an academic environment)
- Limited number of stakeholders
  - International Union of Crystallography (IUCr)
  - UK National Crystallography Service (NCS)
  - Cambridge Crystallographic Data Centre (CCDC)
  - Royal Society of Crystallography
  - Chemistry Central
  - Reciprocal Net
- Open standards and software, e.g. checkCIF, CML, InChI
- Culture of sharing data
- Well-established workflow for crystallography experiments
- One dominant file format (CIF), an international exchange format

Capturing RI: KIM-GC Project
- Engineering is a broad area (mechanical, electrical, civil; architecture, construction, defence etc.)
- Vested commercial interests
- Proliferation of proprietary file formats
- Closed software solutions
- IGES 5.3: first popular exchange format (STEP still immature)

Conclusions
- Digital curation is needed throughout the useful lifetime of digital data
  - Maximise potential of digital data
  - Maximise investment in digital data
  - Curation should be planned for from the outset
- A preservation strategy based on RI depends on a global, well-engineered, distributed network of RI
  - Needs coordination and collaboration on a global scale
- Domain expertise is required for the creation of comprehensive RI networks
- The actual task of creating RI networks is time-consuming and non-trivial
  - Need simple and automated tools and procedures
- There are likely to be gaps in global networks of RI
  - The business case for using a store of RI is clear; the case for submitting RI to the global effort is less clear

Selected References
- OAIS Reference Model
- DPC Technology Watch Report on OAIS model by Brian Lavoie (OCLC Research)
- Trustworthy Repositories Audit & Certification (TRAC): Criteria and Checklist (CRL)
- RLG/NARA Task Force on Digital Repository Certification
- DRAMBORA: Digital Repository Audit Method Based on Risk Assessment, March 2007, Digital Curation Centre (DCC) and Digital Preservation Europe (DPE)
- DCC Development White Paper: DCC Approach to Digital Curation under Development
- CASPAR Project
- M. Patel and S. Coles, "A Study of Curation and Preservation issues in the eCrystals Data Repository and proposed federation", Sept
- eBank-UK Project
- Knowledge & Information Management through Life: A Grand Challenge Project

Questions?
Thank you for your attention
Manjula Patel, UKOLN, DCC, University of Bath, UK