Preservation and Long Term Access of Data at the World Data Centre for Climate Frank Toussaint N.P. Drakenberg, H. Höck, M. Lautenschlager, H. Luthardt,

1 Preservation and Long Term Access of Data at the World Data Centre for Climate
Frank Toussaint, N.P. Drakenberg, H. Höck, M. Lautenschlager, H. Luthardt, H. Ramthun, M. Stockhause, H. Thiemann
World Data Centre for Climate at the German Climate Computing Centre (DKRZ), Hamburg, Germany

2 Overview
The WDC for Climate in several collaborations
Data Storage: Technology – Tapes and Disks
Data Storage: LObStER – the Tape Storage Tool
Storage Policy
Long Term Archiving
DOI – Digital Object Identifier

3 WDCC: General Layout

4 The World Data Centre for Climate
The German Climate Computing Centre (DKRZ) is held by… Max Planck Society, University of Hamburg, and others.
Mission: provide high-performance computing power and storage for the German Earth science community.

5 The WDCC as WIS Data Collection & Production Centre
WMO Information System (WIS)
Components of the WIS:
NC: National Centres
GISC: Global Information System Centres (12..14)
DCPC: Data Collection and Production Centres (>100)
Some institutes are more than one of these (NC, GISC, DCPC).

6 The WDCC in the ICSU World Data System
International Council for Science (ICSU) World Data System (WDS) World Data Centres (WDC) WDC Cluster Earth System Research: WDC-Mare, WDC-RSAT, WDC-Climate

7 Replicated model output
CMIP5 Data Nodes: CMIP5/IPCC-AR5
PCMDI, BADC, & WDCC form a data federation. About 1 PB of data are replicated.
CMIP5/IPCC Data Federation:
UK: BADC, ~1 PByte HD
US: PCMDI, ~1 PByte HD
DE: WDCC, ~1 PByte HD

8 Evolution of Data Quantities
Climate Model Data: Relatively homogeneous, but huge amounts! Needed: Tape access (nearline)

9 TAPE STORAGE: Hardware Basis

10 1 Petabyte disks, 9 PB tapes Web access to 500 Terabytes
WDCC 1 Petabyte disks, 9 PB tapes Web access to 500 Terabytes

11 Data Flows
[Diagram labels: CERA; Midtier; Storage@DKRZ; Archive: files; Container: Blobs; Appl. Server; TDS; LobServer; HPSS 9 PB; CERA DB Layer (What, Where, Who, When, How)]

12 LOBSTER: DataStreams & ContainerFmt

13 LObStER: Large Object Storage and Efficient Retrieval
Huge amounts of data in each container file
Very different sizes of records: 64b .. 2 Gb
Efficient administration of all records
Irregular access patterns (access latency independent of the record position)
Transactional behaviour for read/write
Fault tolerance for HD, controller, tapes, etc.

14 Lobster configuration manager
[Diagram labels: Application; generic JDBC driver; Lobster configuration manager; Applic. Server (lks); Intranet; Internet; specific JDBC drivers loaded]

15 Lobster object manager
[Diagram labels: Oracle RDB (or other); Cache; Lobster object manager; show-container; read-record; fetch-records]

16 LObStER: The Data Containers
Container files with blocked format
64-bit files and 64-bit internal position referencing
Max file size: PBytes
Entries stored in ≥1 blocks
Block sizes 2^k, k ∈ {8, 9, 10, …, 62}
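The block-size rule above (powers of two with exponent k between 8 and 62, every entry occupying at least one block) can be illustrated with a short sketch. The function names and the ceiling-division rule for counting blocks are assumptions made for illustration; the slides do not show LObStER's actual allocation logic.

```python
# Sketch only: names and the allocation rule are hypothetical, not LObStER's API.
VALID_EXPONENTS = range(8, 63)  # k in {8, 9, ..., 62}, as stated on the slide


def block_size(k: int) -> int:
    """Return the block size in bytes for exponent k (2 to the power k)."""
    if k not in VALID_EXPONENTS:
        raise ValueError(f"exponent {k} outside the allowed range 8..62")
    return 1 << k


def blocks_needed(record_bytes: int, k: int) -> int:
    """Number of blocks of size 2**k needed for one record.

    Assumption: a record is split into equally sized blocks, and even an
    empty record occupies one block ("entries stored in >= 1 blocks").
    """
    if record_bytes < 0:
        raise ValueError("record size must be non-negative")
    size = block_size(k)
    return max(1, -(-record_bytes // size))  # ceiling division, minimum 1


if __name__ == "__main__":
    print(block_size(8))            # smallest allowed block: 256 bytes
    print(blocks_needed(1000, 8))   # a 1000-byte record in 256-byte blocks
```

Small exponents waste less space on small records, while large exponents keep the number of blocks per record low for multi-gigabyte entries; which trade-off LObStER makes per container is not stated in the slides.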

17 LObStER: The Data Containers
[Diagram labels: header-blocks; direct-pointer-blocks; indirect-pointer-block; data-blocks]
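The slide names four kinds of blocks but does not describe how they reference one another. As a minimal sketch, assuming a layout analogous to Unix file-system inodes (direct pointers first, then one level of indirection), a lookup from a record's logical block index to a data-block number could look like this. The class, field, and function names are hypothetical.

```python
# Sketch under an assumed inode-like layout; not taken from the LObStER source.
from dataclasses import dataclass, field
from typing import List


@dataclass
class EntryHeader:
    # Block numbers of data blocks referenced directly.
    direct: List[int]
    # Block numbers listed in a single indirect pointer block (assumed one level).
    indirect: List[int] = field(default_factory=list)


def resolve(header: EntryHeader, logical_index: int) -> int:
    """Map the logical block index of an entry to a data-block number."""
    if logical_index < 0:
        raise IndexError("logical block index must be non-negative")
    if logical_index < len(header.direct):
        return header.direct[logical_index]
    offset = logical_index - len(header.direct)
    if offset < len(header.indirect):
        return header.indirect[offset]
    raise IndexError("logical block index beyond end of entry")


if __name__ == "__main__":
    entry = EntryHeader(direct=[10, 11], indirect=[42, 43])
    print([resolve(entry, i) for i in range(4)])
```

With such a layout, the number of pointer lookups does not grow with the record's position in the container, which would be consistent with the stated goal of access latency independent of record position.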

18 MD: CERA & Catalogues

19 Metadata input
[Diagram labels: Input.wdc-climate.de; XML templates; validate; Experiment, Dataset, Dataset_group, Additional_info, ..; upload (ftp, http, WebDAV); CERA2 XML Repository; Cera2_temp; xmlload; split xml files; Insert/Update on views/tables; xsl]
Tools: Editors, CDO, Ncdump/ncgen, ESG Publisher, .., GeoNetwork, xmlspy, Oxygen, Attarabi, xforms, …

20 LTA: Storage Policy

21 Long Term Archiving
Several steps: specification & concept, filling of metadata & data quality checks & DOI
LTA for, e.g., EUCLIPSE, MedCLIVAR, combine

22 Storage Concept for Projects
Tape space distribution to archive classes at DKRZ:
Part of the “work” space is on tape because GFS is too small.
The “docu” domain consists of WDCC.
No expiration dates in the “arch” domain.
Parts of the “arch” domain belong to “docu” but are not yet documented.

23 LTA Costs
depend on complexity and efforts at our site: metadata, reformatting, etc.

24 Long Term Archiving
Quality checks on three levels:
QC L1: conformity to general standards (format, ...)
QC L2: coarse automated content checks
QC L3: detailed spot checks: TQA – Technical Quality Assurance, SQA – Scientific Quality Assurance
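The three QC levels can be pictured as a staged pipeline. The sketch below assumes that a dataset must pass each level before the next one is attempted, which the slide suggests but does not state. All check rules, dictionary keys, and function names are hypothetical placeholders, not the WDCC's actual QC tools.

```python
# Sketch only: the concrete rules and field names are invented for illustration.
from typing import Any, Callable, Dict, Sequence

Dataset = Dict[str, Any]
Check = Callable[[Dataset], bool]


def qc_l1_format(ds: Dataset) -> bool:
    # Hypothetical L1 rule: the file format is one of the accepted standards.
    return ds.get("format") in {"netCDF"}


def qc_l2_content(ds: Dataset) -> bool:
    # Hypothetical L2 rule: all values fall inside a coarse plausibility range.
    values = ds.get("values", [])
    low, high = ds.get("valid_range", (float("-inf"), float("inf")))
    return len(values) > 0 and all(low <= v <= high for v in values)


def qc_l3_spot(ds: Dataset) -> bool:
    # Hypothetical L3 rule: both technical (TQA) and scientific (SQA)
    # quality assurance have been signed off.
    return bool(ds.get("tqa_passed")) and bool(ds.get("sqa_passed"))


QC_LEVELS: Sequence[Check] = (qc_l1_format, qc_l2_content, qc_l3_spot)


def highest_qc_level(ds: Dataset, checks: Sequence[Check] = QC_LEVELS) -> int:
    """Return the highest consecutive QC level passed (0 if L1 fails)."""
    level = 0
    for number, check in enumerate(checks, start=1):
        if not check(ds):
            break
        level = number
    return level


if __name__ == "__main__":
    sample = {
        "format": "netCDF",
        "values": [280.0, 295.5],
        "valid_range": (150.0, 350.0),
        "tqa_passed": True,
        "sqa_passed": False,
    }
    print(highest_qc_level(sample))
```

Stopping at the first failed level keeps the expensive manual spot checks (L3) for data that has already passed the cheap automated checks.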

25 QC: Example CMIP5

26 LTA: CMIP5 as an Example of a Federated Activity
[Diagram labels: Distributed QC Level 2 checks at multiple sites; Central QC repository; Central QC Level 3 checks; DOI Publication Agency; Long-Term Archive; QC services; QC Service Layer; Project QC Metadata Repository; QC L3 Tools; QC L2 Tool; SQA GUI]

27 LTA: CMIP5 as an Example
[Diagram labels: Data Nodes; IDF; Data Catalogue; MD Input; DOI Catalogue; Data Quality Control; MD on model & simulation; MD on data; MD on quality; Project MD Repository; Registration; MD harvest during project; Data from nodes; MD export; DOI Publication Agency with Long Term Archive; TQA; SQA by Author; Data Long Term Archive (LTA); MD LTA; DOI Target Page; DOI access; MD harvest after archiving]

28 DOI: Publishing, IDF & Catalogues

29 WDC-Climate as Publishing Agency of the IDF
International DOI Foundation: International DOI Foundation (doi.org)
Registration Agencies: DataCite (DataCite.org)
National Organizations: TIB, BL, … (tib-hannover.de)
Publisher: WDCC, … (wdc-climate.de)

30 Visibility of LTA Data in Public Catalogues
A DOI is given. Catalogue metadata is sent to the Registration Agency via the national organization.

31 SUMMARY: Data Life Cycle

32 The Data Life Cycle Management
[Diagram labels: Virtual Research Environment; Data Dissemination; Data Production; Data Evaluation; Long Term Archive]

33 E N D

34 Thank you, Questions?

