Preservation and Long Term Access of Data at the World Data Centre for Climate Frank Toussaint N.P. Drakenberg, H. Höck, M. Lautenschlager, H. Luthardt,


1 Preservation and Long Term Access of Data at the World Data Centre for Climate. Frank Toussaint, N.P. Drakenberg, H. Höck, M. Lautenschlager, H. Luthardt, H. Ramthun, M. Stockhause, H. Thiemann. World Data Centre for Climate at the German Climate Computing Centre (DKRZ), Hamburg, Germany

2 The WDC for Climate in several collaborations Data Storage: Technology – Tapes and Disks Data Storage: LObStER – the Tape Storage Tool Storage Policy Long Term Archiving DOI - Digital Object Identifier

3 WDCC: General Layout

4 The World Data Centre for Climate. The German Climate Computing Centre (DKRZ) is held by the Max Planck Society, the University of Hamburg, and others. Mission: provide high-performance computing power and storage for the German Earth science community

5 WMO Information System (WIS) National Centres Global Information System Centres Data Collection and Production Centres The WDCC as WIS Data Collection & Production Centre

6 The WDCC in the ICSU World Data System International Council for Science (ICSU) World Data System (WDS) World Data Centres (WDC) WDC Cluster Earth System Research: WDC-Mare, WDC-RSAT, WDC-Climate

7 UK: BADC ~ 1 PByte HD DE: WDCC ~1 PByte HD US: PCMDI: ~1 PByte HD CMIP5/IPCC Data Federation Replicated model output 7 CMIP5 Data Nodes CMIP5/IPCC-AR5 PCMDI, BADC, & WDCC form a data federation About 1 PB Data are replicated

8 Evolution of Data Quantities. Climate model data: relatively homogeneous, but huge amounts! Needed: tape access (nearline)

9 TAPE STORAGE: Hardware Basis

10 WDCC 1 Petabyte disks, 9 PB tapes Web access to 500 Terabytes

11 Archive: files Container: Blobs Appl. Server TDS LobServer HPSS 9 PB CERA DB Layer What Where Who When How Midtier

12 LObStER: Data Streams & Container Format

13 LObStER: Large Object Storage and Efficient Retrieval
o Huge amounts of data in each container file
o Very different sizes of records: 64 b to 2 Gb
o Efficient administration of all records
o Irregular access patterns (access latency independent of the record position)
o Transactional behaviour for read/write
o Fault tolerance for HD, controller, tapes, etc.

14 Lobster configuration manager; generic JDBC driver; application-specific JDBC drivers loaded; LObStER; Intranet; Internet; Application

15 show-container read-record fetch-records Lobster object manager Cache Oracle RDB (or other) LObStER

16 LObStER: The Data Containers
o Container files with blocked format
o 64-bit files and 64-bit internal position referencing
o Max file size: PBytes
o Entries stored in ≥1 blocks
o Block sizes 2^k, k ∈ { 8, 9, 10, …, 62 }
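The block-size rule above can be illustrated with a short sketch. It is not LObStER code: the function names `block_size` and `blocks_needed` are hypothetical, and the sketch assumes that an entry simply occupies as many whole blocks as its size requires, which the slide does not state explicitly.

```python
# Illustrative sketch only; names and the rounding rule are assumptions,
# not taken from the LObStER implementation.

MIN_K = 8   # smallest exponent listed on the slide (2^8 = 256 bytes)
MAX_K = 62  # largest exponent listed on the slide


def block_size(k: int) -> int:
    """Return the block size 2^k in bytes for an allowed exponent k."""
    if not MIN_K <= k <= MAX_K:
        raise ValueError(f"k must be in [{MIN_K}, {MAX_K}], got {k}")
    return 1 << k


def blocks_needed(entry_bytes: int, k: int) -> int:
    """Number of whole blocks an entry occupies (always at least one)."""
    if entry_bytes < 0:
        raise ValueError("entry size must be non-negative")
    size = block_size(k)
    # Ceiling division, with a floor of one block per entry.
    return max(1, -(-entry_bytes // size))
```

With k = 8, for example, a 257-byte entry would occupy two 256-byte blocks under this assumption.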

17 direct-pointer-blocks data-blocks indirect-pointer-block header-blocks LObStER: The Data Containers

18 MD: CERA & Catalogues

19 XML Repository Cera2_temp XML templates Editors Experiment Dataset Dataset_group Additional_info.. GeoNetwork xmlspy,Oxygen Attarabi xforms … Tools CDO Ncdump/ncgen ESG Publisher.. validate upload (ftp, http, WebDAV) xmlload Split xml files Insert/Update on views/tables CERA2 Input.wdc-climate.de xsl Metadata input

20 LTA: Storage Policy

21 Long Term Archiving
Several steps:
o specification & concept
o filling of metadata & data
o quality checks & DOI
LTA for, e.g., EUCLIPSE, MedCLIVAR, combine

22 Tape space distribution to archive classes at DKRZ
o part of the “work” space on tape because GFS too small
o “docu” domain consists of WDCC
o no expiration dates in “arch” domain
o parts of “arch” domain belong to “docu” but not yet documented

23 LTA Costs depend on complexity and efforts at our site: metadata reformatting etc

24 Long Term Archiving: Quality Checks on three levels
o QC L1: conformity to general standards (format, ...)
o QC L2: coarse automated content checks
o QC L3: detailed spot checks:
  TQA – Technical Quality Assurance
  SQA – Scientific Quality Assurance
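The three QC levels can be pictured as a staged pipeline in which a dataset only reaches the next level after passing the previous one. The sketch below is a minimal illustration under that assumption; the check functions, the dictionary keys (`format`, `values`, `tqa_ok`, `sqa_ok`) and the value range are all hypothetical and are not the WDCC's actual checks.

```python
from typing import Callable, Dict, List, Tuple

Check = Callable[[Dict], bool]


def qc_l1(ds: Dict) -> bool:
    """Hypothetical L1 check: file format conforms to the expected standard."""
    return ds.get("format") == "netcdf"


def qc_l2(ds: Dict) -> bool:
    """Hypothetical L2 check: coarse automated range test on the values."""
    return all(150.0 <= v <= 350.0 for v in ds.get("values", []))


def qc_l3(ds: Dict) -> bool:
    """Hypothetical L3 check: both spot checks (TQA and SQA) were signed off."""
    return bool(ds.get("tqa_ok") and ds.get("sqa_ok"))


LEVELS: List[Tuple[str, Check]] = [("QC L1", qc_l1), ("QC L2", qc_l2), ("QC L3", qc_l3)]


def run_qc(ds: Dict) -> Tuple[bool, List[str]]:
    """Run the levels in order; stop at the first failure.

    Returns (all_passed, names_of_levels_passed).
    """
    passed: List[str] = []
    for name, check in LEVELS:
        if not check(ds):
            return False, passed
        passed.append(name)
    return True, passed
```

Stopping at the first failure keeps the cheap automated checks in front of the manual spot checks, which is one plausible reason for ordering the levels this way.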

25 QC: Example CMIP5

26 QC services QC Service Layer Distributed QC Level2 Checks at Multiple Sites Central QC Repository Central QC Level3 Checks DOI Publication Agency Long-Term Archive QC L2 Tool QC Service Layer QC L3 Tools SQA GUI Project QC Metadata Repository

27 Data DOI Publication Agency with Long Term Archive TQA DOI Target Page Project MD Repository Quality Control Data Catalogue MD Input DOI Catalogue MD LTA Data Long Term Archive (LTA) SQA by Author MD on data MD on quality MD on model & simulation MD harvest during project MD harvest after archiving DOI access Registration Data from nodes Data Nodes IDF MD export

28 DOI: Publishing, IDF & Catalogues

29 WDC-Climate as Publishing Agency of the IDF International DOI Foundation International DOI Foundation Registration Agencies National Organizations Publisher DataCite doi.org DataCite.org tib-hannover.de wdc-climate.de TIB, BL, … WDCC, …

30 Visibility of LTA Data in Public Catalogues DOI is given Catalogue metadata is sent to the Registration Agency via the national organization

31 SUMMARY: Data Life Cycle

32 The Data Life Cycle Management Virtual Research Environment Data Production Data Evaluation Data Dissemination Long Term Archive

33 E N D


