FAIR Data Management, Trustworthy Digital Repositories and Business Continuity / Disaster Preparedness


2 FAIR Data Management, Trustworthy Digital Repositories and Business Continuity / Disaster Preparedness

3 Typical EU H2020 Call Text
Research Infrastructures, such as the ones on the ESFRI roadmap and others, are characterised by the very significant data volumes they generate and handle. These data are of interest to thousands of researchers across scientific disciplines and to other potential users via Open Access policies. Effective data preservation and Open Access for immediate and future sharing and re-use are a fundamental component of today’s research infrastructures.

4 FAIR DMPs & TDRs
If we want to be able to share data, we need to store them in a Trustworthy Digital Repository (TDR). Data created and used by scientists should be managed, curated, and archived in such a way as to preserve the initial investment in collecting them. Researchers must be certain that data held in archives remain useful and meaningful into the future. Funding authorities increasingly require continued access to data produced by the projects they fund, and have made this an important element in [FAIR] Data Management Plans (DMPs). Indeed, some funders now stipulate that the data they fund must be deposited in a trustworthy repository.

5 What is FAIR? Expert Group on turning FAIR into reality
TO BE FINDABLE:
F1. (meta)data are assigned a globally unique and eternally persistent identifier.
F2. data are described with rich metadata.
F3. (meta)data are registered or indexed in a searchable resource.
F4. metadata specify the data identifier.
TO BE ACCESSIBLE:
A1. (meta)data are retrievable by their identifier using a standardized communications protocol.
A1.1. the protocol is open, free, and universally implementable.
A1.2. the protocol allows for an authentication and authorization procedure, where necessary.
A2. metadata are accessible, even when the data are no longer available.
TO BE INTEROPERABLE:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles.
I3. (meta)data include qualified references to other (meta)data.
TO BE RE-USABLE:
R1. (meta)data have a plurality of accurate and relevant attributes.
R1.1. (meta)data are released with a clear and accessible data usage license.
R1.2. (meta)data are associated with their provenance.
R1.3. (meta)data meet domain-relevant community standards.
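As a rough illustration of how a few of these principles could be checked mechanically, the Python sketch below tests whether a metadata record carries the fields that F1, F2, F4, R1.1 and R1.2 ask for. The field names (`identifier`, `description`, `data_identifier`, `license`, `provenance`) and the sample record are assumptions made for this example only; they are not taken from the presentation or from any FAIR metadata schema.

```python
from typing import Dict, List

# Hypothetical mapping from a metadata field name to the FAIR principle it
# supports. The field names are illustrative assumptions, not a standard.
REQUIRED_FIELDS: Dict[str, str] = {
    "identifier": "F1: globally unique, persistent identifier",
    "description": "F2: rich descriptive metadata",
    "data_identifier": "F4: metadata specify the data identifier",
    "license": "R1.1: clear and accessible data usage license",
    "provenance": "R1.2: provenance",
}


def missing_fair_fields(record: Dict[str, str]) -> List[str]:
    """Return the principle labels whose field is absent or empty."""
    return [
        label
        for field, label in REQUIRED_FIELDS.items()
        if not record.get(field)
    ]


# Placeholder record; the identifier is not a real DOI.
record = {
    "identifier": "doi:10.0000/example",
    "description": "Example dataset",
    "data_identifier": "doi:10.0000/example",
    "license": "CC-BY-4.0",
}

print(missing_fair_fields(record))  # ['R1.2: provenance']
```

A check like this only covers the presence of fields; principles such as I1 to I3 or R1.3 concern the quality and vocabulary of the metadata and would need community-specific validation.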

6 What is a Trustworthy Digital Repository?
Generally assumed to mean one that has undergone (self-)certification according to a recognised process. These include DSA, the new WDS / DSA, NESTOR and ISO 16363. All are based on the OAIS reference model (ISO 14721). Some view these processes as a hierarchy, whereby you start e.g. with DSA, then re-certify with NESTOR and finally with ISO 16363. CERN is pursuing ISO certification directly, as it is viewed as the most relevant (and there is no advantage in “triple certification”).

7 Organisational Infrastructure
3.1 Governance & Organisational Viability: Mission Statement, Preservation Policy, Implementation plan(s) etc. Operational Circular, DPHEP Reports.
3.2 Organisational Structure & Staffing: Duties, staffing, professional development etc.
3.3 Procedural accountability & preservation policy framework: Designated communities, knowledge bases, policies & reviews, change management, transparency & accountability etc. Generic descriptions refined by project DMPs.
3.4 Financial sustainability: Business planning processes, financial practices and procedures etc.
3.5 Contracts, licenses & liabilities: For the digital materials preserved…

8 Infrastructure & Security Risk Management
5.1 Technical Infrastructure Risk Management: Technology watches, h/w & s/w changes, detection of bit corruption or loss, reporting, security updates, storage media refreshing, change management, critical processes, handling of multiple data copies etc.
5.2 Security Risk Management: Security risks (data, systems, personnel, physical plant), disaster preparedness and recovery plans …
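Detection of bit corruption or loss is commonly done by recording a checksum for each file at ingest and recomputing it later. The Python sketch below shows that idea under stated assumptions: the manifest format (relative path mapped to a SHA-256 hex digest), the function names and the sample file are all hypothetical, and nothing here describes how CERN or any particular repository implements its checks.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_damaged(manifest, root):
    """Return relative paths that are missing or whose checksum has changed.

    `manifest` is an assumed format: {relative path: expected SHA-256 hex}.
    """
    damaged = []
    for rel_path, expected in sorted(manifest.items()):
        target = Path(root) / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(rel_path)
    return damaged


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    sample = Path(root) / "data.bin"
    sample.write_bytes(b"preserved bits")
    manifest = {"data.bin": sha256_of(sample)}   # recorded at "ingest"
    clean_report = find_damaged(manifest, root)  # nothing has changed yet
    sample.write_bytes(b"preserved bitz")        # simulate corruption
    damaged_report = find_damaged(manifest, root)

print(clean_report, damaged_report)  # [] ['data.bin']
```

When multiple data copies are held, the same manifest can be checked against each copy, so that a damaged file in one location can be restored from another.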

9 Digital Object Management
4.1 Ingest: acquisition of content
4.2 Ingest: creation of the AIP (Archival Information Package)
4.3 Preservation planning
4.4 AIP Preservation
4.5 Information management (“FAIR” etc.)
4.6 Access management

10 Selected Metrics (Sub-Metrics are supposed to clarify the parent metric)
The repository shall have a documented history of the changes to its operations, procedures, software, and hardware.
The repository shall define, collect, track, and appropriately provide its information integrity measurements.
The repository shall employ technology watches or other technology monitoring notification systems.
The repository shall have defined processes for storage media and/or hardware change.
The repository shall have implemented controls to adequately address each of the defined security risks.
The repository shall have suitable written disaster preparedness and recovery plans, including at least one off-site copy [ of recovery plan and key data ] [ more detail on this later ].

11 Why Certify? (And how can CERN help?)
Certification allows you to have confidence that you are offering solid long-term data preservation services, and that you have all the necessary “infrastructure” and procedures around them. It allows you to negotiate with your users, and perhaps also funding agencies, on the same basis. It may be of value in future H2020 and other projects. (It might eventually become a quasi-requirement.) And it is not that difficult anyway (but requires good knowledge across a broad spectrum of the organisation). CERN can help by sharing its responses to ISO metrics as well as its experience with this and other certification methods.

12 5.2.4 – Disaster Preparedness / Recovery (1/2)
5.2.4 The repository shall have suitable written disaster preparedness and recovery plan(s), including at least one off-site backup of all preserved information together with an offsite copy of the recovery plan(s).
Supporting Text: This is necessary in order to ensure that sufficient backup and recovery capabilities are in place to facilitate continuing preservation of and access to systems and their content with limited disruption of services.
Examples of Ways the Repository Can Demonstrate It Is Meeting This Requirement: Repository employs the codes of practice found in the ISO series of standards; disaster and recovery plans; information about and proof of at least one off-site copy of preserved information; service continuity plan; documentation linking roles with activities; local geological, geographical, or meteorological data or threat assessments. Repository maintains ISO certification.

13 5.2.4 – Discussion (2/2)
The level of detail in a disaster plan, and the specific risks addressed, need to be appropriate to the repository’s location and service expectations. Fire is an almost universal concern, but earthquakes may not require specific planning at all locations. The disaster plan must, however, deal with unspecified situations that would have specific consequences, such as lack of access to a building or widespread illness among critical staff. In the event of a disaster at the repository, the repository may want to contact local and/or national disaster recovery bodies for assistance. Repositories may also conduct a variety of disaster drills that may involve their parent organization or the community at large.

14 5.1.1.2 – Backup Functionality (1/2)
The repository shall have adequate hardware and software support for backup functionality sufficient for preserving the repository content and tracking repository functions.
Supporting Text: This is necessary in order to ensure continued access to and tracking of preservation functions applied to the digital objects in their custody.
Examples of Ways the Repository Can Demonstrate It Is Meeting This Requirement: Documentation of what is being backed up and how often; audit log/inventory of backups; validation of completed backups; disaster recovery plan, policy and documentation; fire drills; testing of backups; support contracts for hardware and software for backup mechanisms; demonstrated preservation of system metadata such as access controls, location of replicas, audit trails, checksum values. [ More … ]
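Several of these examples (documentation of what is backed up and how often, an audit log of backups, validation of completed backups) can be combined into a simple automated check. The Python sketch below is one possible way to do that, assuming a hypothetical policy table (dataset name mapped to the maximum allowed interval between backups) and a hypothetical audit-log format; the dataset names and dates are invented for the example.

```python
from datetime import datetime, timedelta


def overdue_backups(policy, audit_log, now):
    """Return datasets lacking a validated backup within their policy interval.

    policy:    {dataset name: timedelta}  (assumed documentation of "how often")
    audit_log: list of dicts with keys "dataset", "completed_at", "validated"
    """
    latest_valid = {}
    for entry in audit_log:
        if not entry["validated"]:
            continue  # an unvalidated backup does not count as evidence
        name = entry["dataset"]
        if name not in latest_valid or entry["completed_at"] > latest_valid[name]:
            latest_valid[name] = entry["completed_at"]
    return sorted(
        name
        for name, max_age in policy.items()
        if name not in latest_valid or now - latest_valid[name] > max_age
    )


# Invented example data: both the content and the system metadata are covered,
# since system metadata must be preserved alongside the main content.
policy = {
    "raw-data": timedelta(days=1),
    "system-metadata": timedelta(days=1),
}
audit_log = [
    {"dataset": "raw-data", "completed_at": datetime(2024, 1, 10, 2, 0), "validated": True},
    {"dataset": "system-metadata", "completed_at": datetime(2024, 1, 7, 2, 30), "validated": True},
    {"dataset": "system-metadata", "completed_at": datetime(2024, 1, 10, 2, 30), "validated": False},
]

overdue = overdue_backups(policy, audit_log, now=datetime(2024, 1, 10, 12, 0))
print(overdue)  # ['system-metadata']
```

Here the most recent backup of the system metadata exists but was never validated, so the last trustworthy copy is older than the policy allows and the dataset is reported.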

15 5.1.1.2 – Backup Functionality – Discussion (2/2)
The repository should be able to demonstrate the adequacy of the processes, hardware, and software for its backup systems and the full range of ingest, preservation, and dissemination functions required of a repository entrusted with long-term preservation. Simple backup mechanisms must preserve not only the repository main content, but also the system metadata generated by the preservation functions. Repositories need to develop backup plans that ensure their continuity of operations across all failure modes.

16 Summary
Disaster preparedness is important for Long Term Data Preservation (as well as “Business Continuity”): there is no point in preserving the bits if you are at risk from (and unable to recover from) “disasters”.
Experience with WLCG is that all sorts of “disasters” do happen, and can cause major disruptions:
Fire in UPS systems;
Leaking cooling systems damaging backup tapes;
Network outages (ocean floor cable cut by fishing trawler, network cables cut by motorway construction);
Earthquakes;
Hurricanes and typhoons;
Human error!
