M. Stockhause et al. Martina Stockhause, Michael Lautenschlager, Frank Toussaint Deutsches Klimarechenzentrum (DKRZ) World Data Centre for Climate (WDCC)


1 DKRZ: Long-term archiving requirements
Martina Stockhause, Michael Lautenschlager, Frank Toussaint
Deutsches Klimarechenzentrum (DKRZ), World Data Centre for Climate (WDCC)
DKRZ LTA, ESGF Conference 2014

2 Long-Term Archival at DKRZ (1)
The purpose of long-term archival (LTA) and the IPCC DDC is to provide stable data for long-term interdisciplinary (re-)use:
- permanent and persistent data access
- stable and complete data
- well-documented, high-quality data (for acceptance)
- citable data entities (for credit)
09.-11.12.2014, DKRZ LTA, ESGF Conference 2014

3 Long-Term Archival at DKRZ (2)
World Data Center for Climate (WDCC) at DKRZ:
- Long-term archive for climate data, esp. Earth System Model data
- IPCC Data Distribution Centre (IPCC-DDC) for climate model output: http://www.ipcc-data.org
- DOI data publisher since 2004 (1st DOI in the DataCite catalog, 18.03.2004: doi:10.1594/WDCC/EH4_OPYC_SRES_A2)
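As a small illustration of how a DOI such as the one above is turned into a resolvable link, the sketch below prepends the public https://doi.org/ resolver to the DOI string. The function name `doi_to_url` is hypothetical and not part of any WDCC or DataCite tooling.

```python
from urllib.parse import quote


def doi_to_url(doi):
    """Turn a DOI string (with or without a 'doi:' prefix) into a resolver URL."""
    doi = doi.strip()
    # Drop the optional 'doi:' scheme prefix used in citations.
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    # Percent-encode unusual characters but keep the '/' between prefix and suffix.
    return "https://doi.org/" + quote(doi, safe="/")


print(doi_to_url("doi:10.1594/WDCC/EH4_OPYC_SRES_A2"))
```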

4 CMIP5 Experience
[Diagram. Components: ESGF index node, ESGF data nodes, QC Repository, CIM Repository, CMIP5 data and metadata, Temporary Storage, Long-Term Archive of WDC Climate at DKRZ with CERA2 metadata, Data Manager. Numbered steps: 1 Transfer of data by WDCC; 1 Transfer of ext. MD by WDCC; 1 Transfer of Use MD by WDCC; 2 Technical Quality Assurance; 2 Citation Information; 3 Long-Term Archival; 4 DataCite DOI Publication Process.]
The LTA manager operates in a diverse and heterogeneous technical environment that is under development. Questions to be solved:
- Who is the repository contact?
- How to identify? Is a mapping needed for DRS_ids?
- How to access?
- Who is the data creator?

5 Requirements for LTA: Identification (1)
- Reliable identification of data and metadata objects by PID and DRS_id (and of persons by ORCID):
  - Use of controlled vocabulary (CV) for DRS components, e.g. institute, model, experiment
  - Consistent ESGF database over time: persistence of metadata and strict versioning
  - Links provided between data and external metadata
- Verification of data and metadata objects by MD5 checksums
See also: Data Citation Principles at https://www.force11.org/datacitation
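The checksum verification named on this slide can be sketched as follows. This is a minimal example using Python's standard `hashlib`. The function names and the chunk size are assumptions made for illustration, not the procedure actually used at WDCC.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large model output files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected_hex):
    """Return True if the file's MD5 digest matches the published checksum."""
    # Published checksums may differ in case or carry trailing whitespace.
    return md5_of_file(path) == expected_hex.strip().lower()
```

A transfer into the archive would then be accepted only if `verify_md5(local_path, published_checksum)` returns `True`, where both arguments come from the transfer record.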

6 Requirements for LTA: QC (2)
Quality control information and data citations to be published in ESGF together with the data:
- Quality control:
  - When? Quality control is to be performed as early as possible and in as much detail as affordable, e.g. check at least DRS conformance prior to ESGF publication.
  - How? Improve the operability of the QC2 tool.
- Citation of data:
  - Collection and ESGF publication of author lists, titles etc. together with the data
  - If operable: assign a PID to a citation entity
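A DRS conformance check of the kind mentioned above could look roughly like the sketch below. The component order, the tiny controlled vocabulary and the ensemble pattern are illustrative assumptions for a CMIP5-style dataset id. They are not taken from the QC2 tool or from the official CV files.

```python
import re

# Assumed component order of a CMIP5-style DRS dataset id (illustration only).
DRS_FIELDS = ("activity", "product", "institute", "model", "experiment",
              "frequency", "realm", "mip_table", "ensemble")

# Hypothetical, heavily truncated controlled vocabulary.
# Components without an entry here are not checked.
CV = {
    "activity": {"cmip5"},
    "product": {"output1", "output2"},
    "institute": {"MPI-M"},
    "model": {"MPI-ESM-LR"},
    "experiment": {"historical", "rcp45"},
}

# Assumed ensemble member pattern, e.g. r1i1p1.
ENSEMBLE_RE = re.compile(r"r\d+i\d+p\d+")


def check_drs_id(drs_id):
    """Return a list of problems found in a DRS id; an empty list means conformant."""
    parts = drs_id.split(".")
    if len(parts) != len(DRS_FIELDS):
        return [f"expected {len(DRS_FIELDS)} components, got {len(parts)}"]
    problems = []
    for name, value in zip(DRS_FIELDS, parts):
        allowed = CV.get(name)
        if allowed is not None and value not in allowed:
            problems.append(f"{name}={value!r} is not in the controlled vocabulary")
    if not ENSEMBLE_RE.fullmatch(parts[-1]):
        problems.append(f"ensemble={parts[-1]!r} does not match r<N>i<N>p<N>")
    return problems
```

Running such a check before ESGF publication would reject ids with unknown institutes, models or experiments early, instead of leaving the mismatch to be discovered during archival.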

7 Requirements for LTA: Organizational (3)
- An operable infrastructure with defined, stable interfaces is required (to ESGF and external repositories).
- Introduce a Data Management Plan defining the data workflow, including quality procedures and ESGF data node manager commitments on e.g. versioning.
- Definition/implementation of a core data subset to prioritize data replication and LTA, e.g. using the ESGF product facet.
- Improved interaction with data creators (in CMIP5 they were approached multiple times, by the project manager, CIM and Quality/Citation).

8 Requirements for LTA (4)
[Diagram. Components: ESGF index node, ESGF data nodes, QC Repository, CIM Repository, other related repositories (e.g. user annotations, version change information, data citation …), MIP data and metadata, Temporary Storage, Long-Term Archive of WDC Climate at DKRZ with CERA2 metadata, Data Manager. Numbered steps: 1 Get IDs for data and MD access; 1 Transfer of data + MD by WDCC; 2 Technical Quality Assurance; 3 Long-Term Archival; 4 DataCite DOI Publication Process.]
The LTA manager is able to collect data and metadata by asking the ESGF index for access information.
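Asking the ESGF index for access information could be sketched as building a file-level search request. The sketch assumes the ESGF search REST endpoint `/esg-search/search` and its `type`, `dataset_id`, `fields`, `format` and `limit` parameters. The host name and dataset id below are hypothetical placeholders, and no request is actually sent.

```python
from urllib.parse import urlencode


def build_file_query(index_node, dataset_id, limit=100):
    """Build a search URL that lists the files of one dataset with access info."""
    params = {
        "type": "File",                          # file-level records, not datasets
        "dataset_id": dataset_id,                # id as registered in the index
        "fields": "url,checksum,checksum_type",  # access URLs plus checksum info
        "format": "application/solr+json",       # JSON instead of the XML default
        "limit": limit,
    }
    # Endpoint path assumed from the ESGF search API; adjust to the actual index node.
    return f"https://{index_node}/esg-search/search?{urlencode(params)}"


# Hypothetical index node and dataset id, for illustration only.
print(build_file_query(
    "esgf-index.example.org",
    "cmip5.output1.MPI-M.MPI-ESM-LR.historical.mon.atmos.Amon.r1i1p1.v1|esgf-data.example.org",
))
```

The response to such a request would carry the access URLs and checksums that the transfer and verification steps need.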

9 www.dkrz.de, www.wdc-climate.de, www.ipcc-data.org
- Stockhause (2014): Long-term archiving workflow in CMIP5 – a first review, IS-ENES Workshop on workflow solutions, 03.-05.06.2014, Hamburg, Germany (PDF).
- Stockhause et al. (2012): Quality assessment concept of the World Data Center for Climate and its application to CMIP5 data, Geosci. Model Dev., 5, 1023–1032, doi:10.5194/gmd-5-1023-2012.

10 Requirements for LTA (4)
[Diagram as on slide 8, with the Data Manager's tasks labelled: Identification, Access, Validation. Components: ESGF index node, ESGF data nodes, QC Repository, CIM Repository, other related repositories (e.g. user annotations, version change information, data citation …), MIP data and metadata, Temporary Storage, Long-Term Archive of WDC Climate at DKRZ with CERA2 metadata.]

