SCIDIP-ES services and toolkits David Giaretta
Preserving digitally encoded information: Ensure that digitally encoded information is understandable and usable over the long term – Long term could start at just a few years. Need to do something because things become “unfamiliar” over time. But the same techniques enable use of data which is “unfamiliar” right now.
The OAIS Reference Model
  – is concerned with the Long Term preservation of information
  – provides vital concepts that are necessary to preserve digitally encoded information
  – provides testable mandatory responsibilities
  – provides useful vocabulary and check-lists
  – is widely used in the design and description of archives and libraries
  – forms the basis of a number of follow-on standards which are being developed
  OAIS CONFORMANCE: Mandatory responsibilities
  – Negotiate for and accept appropriate information from information Producers.
  – Obtain sufficient control of the information provided to the level needed to ensure Long Term Preservation.
  – Determine, either by itself or in conjunction with other parties, which communities should become the Designated Community and, therefore, should be able to understand the information provided, thereby defining its Knowledge Base.
  – Ensure that the information to be preserved is Independently Understandable to the Designated Community. In particular, the Designated Community should be able to understand the information without needing special resources such as the assistance of the experts who produced the information.
  – Follow documented policies and procedures which ensure that the information is preserved against all reasonable contingencies, including the demise of the archive, ensuring that it is never deleted unless allowed as part of an approved strategy. There should be no ad-hoc deletions.
  – Make the preserved information available to the Designated Community and enable the information to be disseminated as copies of, or as traceable to, the original submitted Data Objects with evidence supporting its Authenticity.
  Long Term Preservation: The act of maintaining information, Independently Understandable by a Designated Community, and with evidence supporting its Authenticity, over the Long Term.
OAIS Functional Model – useful terminology
  “Open Archival Information System (OAIS), now adopted as the 'de facto' standard for building digital archives” (NSF: Cyberinfrastructure Vision for 21st Century Discovery)
  Available free from for more information see
  OAIS Information Model – key concepts needed for conformance
  – Representation Information: the information that maps a Data Object into more meaningful concepts. Examples include software, ontologies, formal data descriptions, human readable documentation, web pages...
  – Representation Information is itself Information and hence there is a network – a kind of recursion. This recursion stops when it matches the Designated Community’s Knowledge Base.
  – AIP: a set of information that has, in principle, all the qualities needed for permanent, or indefinite, Long Term Preservation of a designated Information Object
  2002, updated 2011
Information model: Representation Information The Information Model is key Recursion ends at KNOWLEDGEBASE of the DESIGNATED COMMUNITY (this knowledge will change over time and region)
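The recursion through a Representation Information network can be sketched in a few lines. This is only an illustration under stated assumptions: the network contents (a FITS file, its standard, a keyword dictionary) and the function name required_repinfo are hypothetical and are not taken from OAIS or from any CASPAR or SCIDIP-ES component. The sketch shows the one idea from the slide: the walk stops wherever it reaches something already in the Designated Community's Knowledge Base.

```python
from typing import Dict, List, Set

# Hypothetical RepInfo network: each entry lists the Representation
# Information needed to understand it (illustrative names only).
REPINFO_NETWORK: Dict[str, List[str]] = {
    "observations.fits": ["FITS standard"],
    "FITS standard": ["PDF specification", "astronomy keyword dictionary"],
    "PDF specification": [],
    "astronomy keyword dictionary": ["English"],
    "English": [],
}


def required_repinfo(item: str, knowledge_base: Set[str],
                     network: Dict[str, List[str]]) -> Set[str]:
    """Return the RepInfo an archive must hold for `item`.

    The walk follows dependencies recursively and stops at anything
    the Designated Community already knows (its knowledge base).
    """
    needed: Set[str] = set()
    stack = list(network.get(item, []))
    while stack:
        current = stack.pop()
        # Stop at the knowledge base; also skip items already collected,
        # which keeps the walk finite if the network contains cycles.
        if current in knowledge_base or current in needed:
            continue
        needed.add(current)
        stack.extend(network.get(current, []))
    return needed
```

Calling required_repinfo with a smaller knowledge base returns a larger set, which is one way to picture why the required Representation Information changes as the Designated Community's knowledge changes over time and region.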
Archival Information Package (diagram): Content Information consists of a Data Object (a Physical Object or a Digital Object, the latter made of Bits) interpreted using Representation Information (Structure Information, Semantic Information, which adds meaning to the structure, and Other Representation Information; Representation Information is itself interpreted using further Representation Information). Content Information is further described by Preservation Description Information (Reference, Provenance, Context, Fixity and Access Rights Information). The package is described by a Package Description, which is derived from it, and is delimited by Packaging Information, which identifies its contents.
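The parts of an Archival Information Package named above can be pictured as a small data structure. The following is a minimal sketch, not an OAIS-defined schema: the class and field names are hypothetical, and the use of SHA-256 for Fixity Information is an assumption made for illustration (OAIS does not prescribe a particular algorithm).

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class ContentInformation:
    data_object: bytes                      # the bits being preserved
    representation_information: List[str]   # references to RepInfo


@dataclass
class PreservationDescriptionInformation:
    reference: str          # identifier for the content
    provenance: List[str]   # history of custody and processing
    context: str            # relationship to other information
    fixity_sha256: str      # assumed fixity mechanism: SHA-256 digest
    access_rights: str      # access and use restrictions


@dataclass
class ArchivalInformationPackage:
    content: ContentInformation
    pdi: PreservationDescriptionInformation
    packaging_information: str  # how the parts are bound together

    def fixity_ok(self) -> bool:
        """Check the data object against the recorded fixity digest."""
        digest = hashlib.sha256(self.content.data_object).hexdigest()
        return digest == self.pdi.fixity_sha256


def make_aip(identifier: str, data: bytes,
             repinfo: List[str]) -> ArchivalInformationPackage:
    """Build a package, recording the fixity digest at ingest time."""
    pdi = PreservationDescriptionInformation(
        reference=identifier,
        provenance=["ingested from producer"],
        context="",
        fixity_sha256=hashlib.sha256(data).hexdigest(),
        access_rights="unspecified",
    )
    return ArchivalInformationPackage(
        content=ContentInformation(data, repinfo),
        pdi=pdi,
        packaging_information="single in-memory object",
    )
```

A fixity check of this kind only shows that the bits are unchanged; it says nothing about whether they remain understandable, which is the role of the Representation Information.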
PARSE.Insight: Indication of distribution of researchers’ responses. Researchers: 1/3 Europe, 1/3 USA, 1/3 rest of world. Incomplete sample of respondents. Overall: 44% Europe, 33% USA, 23% rest of world.
What? Data spectrum (R)
Sharing of data (R) How open is your data?
Sharing of data (R) Which constraints do you see in making data open?
Threats to preservation
  1. The ones we trust to look after the digital holdings may let us down.
  2. The current custodian of the data, whether an organisation or project, may cease to exist at some point in the future.
  3. Loss of ability to identify the location of data.
  4. Access and use restrictions (e.g. Digital Rights Management) may not be respected in the future.
  5. Evidence may be lost because the origin and authenticity of the data may be uncertain.
  6. Lack of sustainable hardware, software or support of computer environment may make the information inaccessible.
  7. Users may be unable to understand or use the data e.g. the semantics, format or algorithms involved.
Threats to preservation (R) The ones we trust to look after the digital holdings may let us down The current custodian of the data may cease to exist Loss of ability to identify the location of data Access and use restrictions may not be respected in the future Evidence may be lost Lack of sustainable hardware/software Users may be unable to understand or use the data
Threats to preservation (R) Users may be unable to understand or use the data e.g. the semantics, format or algorithms involved.
What works - evidence
CASPAR in brief
  – Prototyped discipline independent Infrastructure components
  – Carried out fundamental research based on and contributing to OAIS
  – Developed toolkits for Representation Information, Authenticity, Digital Rights etc
  – Provided substantial collection of evidence, validated by the designated communities, supporting their effectiveness for digital preservation by:
    – accelerated lifetime tests using changes in hardware, software, environment and knowledge base of designated communities
    – using many types of digitally encoded information – data and documents from science (STFC, ESA), cultural heritage (UNESCO) and contemporary performing arts (CIANT, INA, IRCAM, Univ Leeds)
  – Infrastructure to support preservation of all types of digitally encoded information. Supports maintenance of Representation Information Networks.
    – simple, re-implementable interfaces
    – no single point of failure
    – decentralised, heterogeneous, asynchronous
  – Toolkits to create all components of AIPs
  – Test scenarios vs Threats to digital preservation
  For more information see and
FUTURE
  – Users may be unable to understand or use the data e.g. the semantics, format, processes or algorithms involved
  – Non-maintainability of essential hardware, software or support environment may make the information inaccessible
  – The chain of evidence may be lost and there may be lack of certainty of provenance or authenticity
  – Access and use restrictions may not be respected in the future
  – Loss of ability to identify the location of data
  – The current custodian of the data, whether an organisation or project, may cease to exist at some point in the future
  – The ones we trust to look after the digital holdings may let us down
Preservation Infrastructure
  – Services which are not centralised, no single point of failure
  – Supplements for existing archives to improve their ability to preserve their holdings
    – Do not replace everything – small additions – better certification result
  – Simple services which can be maintained into the future
SCIDIP-ES in brief
  – Upgrade CASPAR prototype components into scalable, robust e-infrastructure components to support digital preservation of all types of digital objects
    – decentralised, heterogeneous, asynchronous, no single point of failure
    – persistent, simple re-implementable interfaces
  – Critical mass of users: Earth science as initial focus; other disciplines via APA
  – DIGITAL PRESERVATION RESEARCH needed to create the tools needed to create the “metadata” used by the e-infrastructure and user applications. Tools may be domain dependent. Must include Rep. Info. Network of the metadata
  SCIence Data Infrastructure for Preservation – with focus on Earth Science. Led by ESA. Currently in negotiation with EU. For more information see
  Diagram labels:
    E-INFRASTRUCTURE: Storage Service, Gap Identification Service, Orchestration Service, RepInfo Registry Service, Persistent ID i/f Service
    TOOLKITS: Preservation Strategy Toolkit, Process Virtualisation Toolkit, Finding Aid Toolkit, Certification Toolkit
    Other labels: Cloud Storage, External PI services, ISO Certification Organisation, External Access/Use Services, Archives, User applications
  – Domain independent Infrastructure counters threats identified by PARSE.Insight based on CASPAR prototypes
  – APARSEN will produce a common vision to allow a coherent approach
  – Will help archives with certification
Threat / Requirement for solution
  1. Threat: Users may be unable to understand or use the data e.g. the semantics, format, processes or algorithms involved
     Requirement: Ability to create and maintain adequate Representation Information
  2. Threat: Non-maintainability of essential hardware, software or support environment may make the information inaccessible
     Requirement: Ability to share information about the availability of hardware and software and their replacements/substitutes
  3. Threat: The chain of evidence may be lost and there may be lack of certainty of provenance or authenticity
     Requirement: Ability to bring together evidence from diverse sources about the Authenticity of a digital object
  4. Threat: Access and use restrictions may make it difficult to reuse data, or alternatively may not be respected in future
     Requirement: Ability to deal with Digital Rights correctly in a changing and evolving environment
  5. Threat: Loss of ability to identify the location of data
     Requirement: An ID resolver which is really persistent
  6. Threat: The current custodian of the data, whether an organisation or project, may cease to exist at some point in the future
     Requirement: Brokering of organisations to hold data and the ability to package together the information needed to transfer information between organisations ready for long term preservation
  7. Threat: The ones we trust to look after the digital holdings may let us down
     Requirement: Certification process so that one can have confidence about whom to trust to preserve data holdings over the long term

  RepInfo toolkit, Packager and Registry – to create and store Representation Information. In addition the Orchestration Manager and Knowledge Gap Manager help to ensure that the RepInfo is adequate.
  Registry and Orchestration Manager to exchange information about the obsolescence of hardware and software, amongst other changes. The Representation Information will include such things as software source code and emulators.
  Authenticity toolkit will allow one to capture evidence from many sources which may be used to judge Authenticity.
  Digital Rights and Access Rights tools allow one to virtualise and preserve the DRM and Access Rights information which exist at the time the Content Information is submitted for preservation.
  Persistent Identifier system: such a system will allow objects to be located over time.
  Orchestration Manager will, amongst other things, allow the exchange of information about datasets which need to be passed from one curator to another.
  The Audit and Certification standard to which CASPAR has contributed will allow a certification process to be set up.
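The idea behind a persistent identifier system, and behind handing datasets from one curator to another, can be illustrated with a small sketch. It is a toy model under stated assumptions: the class PersistentIdResolver, its methods and the example identifiers and URLs are all hypothetical and do not describe the SCIDIP-ES Persistent ID interface or any real PI service. The point it shows is the indirection: users keep the identifier, while the location behind it may change when the custodian changes.

```python
from typing import Dict, List, Tuple


class PersistentIdResolver:
    """Toy resolver: maps a persistent ID to its current location."""

    def __init__(self) -> None:
        # For each ID, the ordered list of (custodian, location) pairs.
        self._history: Dict[str, List[Tuple[str, str]]] = {}

    def register(self, pid: str, custodian: str, location: str) -> None:
        """Record a new identifier; IDs are never reused."""
        if pid in self._history:
            raise ValueError(f"identifier already registered: {pid}")
        self._history[pid] = [(custodian, location)]

    def transfer(self, pid: str, new_custodian: str,
                 new_location: str) -> None:
        """Hand the object to a new custodian; the ID stays the same."""
        self._history[pid].append((new_custodian, new_location))

    def resolve(self, pid: str) -> str:
        """Return the current location of the identified object."""
        return self._history[pid][-1][1]

    def history(self, pid: str) -> List[Tuple[str, str]]:
        """Return the chain of custodians and locations, oldest first."""
        return list(self._history[pid])
```

Keeping the full custody chain rather than only the latest location is a design choice in this sketch; the chain is the kind of evidence that could also feed Provenance Information.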
Q5: Please explain by means of a graphic a potential distribution of the SCIDIP-ES infrastructure with respect to geographical locations (for example for storage), and with a mapping to the OAIS model. Diagram labels: AUTHENTICITY/ANNOTATION, FINDING AIDS, DRM, DAMS, REGISTRY, DATA STORE, ORCHESTRATION, PACKAGING, REPINFO TOOLBOX, GAP MGR, AIP (Archival Information Package).
Summary – SCIDIP-ES services and toolkits: Demonstrated demand for these services. Demonstrated effectiveness across domains. Maintainable.