CAS2K11 in Annecy, France September 11 – 14, 2011 Data Infrastructures at DKRZ Michael Lautenschlager.

1 CAS2K11 in Annecy, France September 11 – 14, 2011 Data Infrastructures at DKRZ Michael Lautenschlager

2 Content
– Overview DKRZ
– Climate research as data intensive science
– Data life cycle and services at DKRZ
– Data infrastructure development
© DKRZ, 12.09.2011, M. Lautenschlager (DKRZ), CAS2K11

3 Mission
DKRZ - to provide high performance computing platforms, sophisticated and high capacity data management, and superior service for premium climate science.
Our Competences
– High performance compute, storage, and visualization systems optimized for climate research
– Parallelization and optimization of climate models and workflows
– Efficient management of very large data volumes
– 3D visualization to communicate research results
– Support of current projects on climate research

4 Building

5 Computer Hall (figure labels: Compute Nodes, Disk Subsystem, Air Conditioning)

6 Compute Service: IBM Power6 System
– 264 nodes with 8448 cores
– Clock rate 4.7 GHz
– Compute power per core 18.8 GFLOPS
– Maximum compute power 159 TFLOPS
– Linpack 110 TFLOPS and rank 72 in TOP500 of 2011
– Main memory more than 20 TB
– Hard disk storage 7 PB
– Interconnect 8x DDR Infiniband
– Cooling 75% water, 25% air
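The compute figures on this slide can be cross-checked against each other. The following is a minimal sketch, not part of the original talk: the variable names are illustrative, and the values for cores per node, floating-point operations per cycle, and Linpack efficiency are derived here from the slide's numbers rather than quoted from it.

```python
# Figures quoted on the slide.
nodes = 264
cores = 8448
clock_ghz = 4.7
gflops_per_core = 18.8
linpack_tflops = 110.0

# Derived quantities (not stated on the slide).
cores_per_node = cores // nodes                 # cores in each node
flops_per_cycle = gflops_per_core / clock_ghz   # operations per clock cycle per core
peak_tflops = cores * gflops_per_core / 1000.0  # GFLOPS -> TFLOPS
linpack_efficiency = linpack_tflops / peak_tflops

print(f"cores per node:     {cores_per_node}")
print(f"flops per cycle:    {flops_per_cycle:.1f}")
print(f"peak performance:   {peak_tflops:.1f} TFLOPS")
print(f"Linpack efficiency: {linpack_efficiency:.0%}")
```

The computed peak rounds to the 159 TFLOPS quoted on the slide, and the Linpack result corresponds to roughly 69% of that peak.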

7 Tape Library

8 Tape Library
– HPSS (High Performance Storage System)
– 7x Sun StorageTek SL8500
– In total 67,000 media slots
– More than 100 PB storage capacity
– 90 tape drives
  ◦ LTO-5, LTO-4, T10000A/B
  ◦ 9940B, 9840C

9 World Data Center for Climate (approved by ICSU in 2003)
– Long-term data archive
– Approx. 500 TB climate data
– Fully documented
– Search engine
– Field-based data access
– Server side data processing (sub-setting, format conversion)
– Data download free of charge

10 Data Volume Increase: small to high PB
[Figure: data volume in TB, labels "IPCC GCM Data" and "DKRZ"; source: Overpeck et al., Science 2011]

11 CMIP5 Data Federation
Data estimates 2010:
– 10 PB in total
– 2.5 PB WCRP requested
– 1 PB IPCC-AR5 core
CMIP5 Archive Status, Friday, 09 September 2011, 11:34 AM (UTC)
Summary
  Modeling centers   13
  Models             17
  Data nodes         13
  Gateways           5
  Datasets           11051
  Size               170.78 TB
ESG infrastructure for CMIP5 provided by NCAR (ESG Portal) and PCMDI (ESG Data Node)

12 Climate Research as Data Intensive Science
Hey, Tansley and Tolle (2009), "The Fourth Paradigm":
– Data-intensive science consists of three basic activities: capture, curation, and analysis. Data comes in all scales and shapes, covering large international experiments; cross-laboratory, single-laboratory, and individual observations; and potentially individuals' lives. The discipline and scale of individual experiments and especially their data rates make the issue of tools a formidable problem. (Page XIII)
Climate Modeling:
– In international experiments like CMIP5, data are produced without knowing all applications beforehand, and these data are intended for interdisciplinary use (impact). This broad range of applications increases the volume of archived data and adds requirements beyond those of community-specific data applications.

13 Data Management Requirements for Data Intensive Science
– Complete data description for browsing, discovering and using research data
– Efficient data access via common interfaces in standard formats
– Efficient data processing workflows even in data federations (data mining might provide new methods for information discovery)
– Common security management across data federations in order to offer unique access to individual archives
– Data replication for security and access performance
– Agreed quality assurance workflow, with data processing and quality level documented in the metadata, in order to assign accepted quality levels
– Transparent data federation management
– ...

14 My Essentials
From today onward, climate data archives need:
– Sufficient information to find and select data properly
– Sufficient standardization for automatic data processing
– Transparent data quality flags to convince people to trust the archive federation
– New methods to identify new information in federated data archives (data mining)
– Complete data life cycle support for seamless management of very large data volumes

15 Data Life Cycle Management
DKRZ distinguishes two layers:
a) Virtual research environments integrate community-based scientific research
b) Long-term archiving supports interdisciplinary data utilization

16 Services at DKRZ: Creation

17 Services at DKRZ: Evaluation (figure labels: Code Optimization, CMIP5)

18 Services at DKRZ: Archiving (figure labels: CERA, CIM (EU-METAFOR), WDCC (CERA))

19 Services at DKRZ: Dissemination (figure labels: IS-ENES, C3-Grid, CMIP5 / ESGF)

20 International Cooperation in Data Infrastructure Development
– IS-ENES: Infrastructure for the European Network for Earth System Modeling (https://is.enes.org/)
– ExArch: Climate analytics on distributed exascale data archives (G8 project)
– EUDAT: EUropean DATa (EU-FP7 project starting on October 1st)
– ESGF: Earth System Grid Federation (http://esgf.org/)
– GO-ESSP: Global Organization for Earth System Science Portals (http://go-essp.gfdl.noaa.gov/)
Target infrastructure: [figure]

21 Digital Object Architecture of Climate Model Data
Digital Object:
– Data Objects: NetCDF/CF including use metadata
– Metadata Objects: CIM metadata for browse + discovery
– Information Objects: Related more general information
– Transaction Record: Dissemination info. of digital objects
DataCite scientific data publication entity:
– DOI has been assigned
– Digital objects are frozen and approved by author
– Citation reference is assigned for direct use in scientific literature
– Realized with QC-L3 in the CMIP5 data quality assessment
Future Development: Identification of distinct data objects in data federations with PID and handle system (cooperation with European Persistent Identifier Consortium (EPIC), http://pidconsortium.eu/).
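The digital object described on this slide can be pictured as a small data structure. The sketch below is purely illustrative and is not DKRZ's or DataCite's implementation: the class name, field names, the numeric QC level, and all identifier and file name values are assumptions made for this example. It models only the rules the slide states: a DOI is assigned once the object has reached QC-L3 and has been approved by the author, and a published object is frozen.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DigitalObject:
    """Hypothetical container for the four parts named on the slide."""
    pid: str                      # persistent identifier (placeholder value below)
    data_objects: List[str]       # e.g. NetCDF/CF files
    metadata_objects: List[str]   # e.g. CIM documents for browse and discovery
    information_objects: List[str] = field(default_factory=list)
    transaction_record: List[str] = field(default_factory=list)
    qc_level: int = 1             # assumed numeric stand-in for QC-L1 .. QC-L3
    approved_by_author: bool = False
    doi: Optional[str] = None

    @property
    def frozen(self) -> bool:
        # In this sketch an object counts as frozen once it carries a DOI.
        return self.doi is not None

    def add_data_object(self, name: str) -> None:
        if self.frozen:
            raise RuntimeError("published digital objects are frozen")
        self.data_objects.append(name)

    def assign_doi(self, doi: str) -> None:
        if self.qc_level < 3 or not self.approved_by_author:
            raise ValueError("DOI requires QC level 3 and author approval")
        self.doi = doi
        self.transaction_record.append(f"DOI assigned: {doi}")


# Usage with placeholder identifiers and file names (all hypothetical).
obj = DigitalObject(
    pid="hdl:00000/example",
    data_objects=["tas_example.nc"],
    metadata_objects=["cim_example.xml"],
)
obj.qc_level = 3
obj.approved_by_author = True
obj.assign_doi("10.5072/example")
print(obj.frozen, obj.transaction_record)
```

Treating the DOI as the freeze marker is a design choice of this sketch; a real archive would more likely track freezing and approval as separate workflow states.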

22 Planned DKRZ extension in 2014
– Peak compute performance 150 TFLOPS -> 3 PFLOPS (x20)
– Disk capacity 7 PB -> 150 PB (x20)
– Tape capacity 100 PB -> 1 EB (x10)
Are we ready for the data tsunami?
Are the products ready for the data tsunami?
We will be happy to discuss these issues with you - before the data sweeps us away
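The growth factors quoted for the planned extension can be recomputed from the before and after values. This is a minimal sketch assuming decimal unit prefixes (1 PFLOPS = 1000 TFLOPS, 1 EB = 1000 PB); the dictionary layout and names are illustrative only.

```python
# Decimal prefixes are assumed here.
TFLOPS_PER_PFLOPS = 1000
PB_PER_EB = 1000

# (current, planned) values from the slide, converted to a common unit.
planned = {
    "peak compute (TFLOPS)": (150, 3 * TFLOPS_PER_PFLOPS),
    "disk capacity (PB)": (7, 150),
    "tape capacity (PB)": (100, 1 * PB_PER_EB),
}

factors = {name: new / old for name, (old, new) in planned.items()}

for name, factor in factors.items():
    print(f"{name}: x{factor:.1f}")
```

The disk factor comes out slightly above 21, so the x20 on the slide is a rounded figure; the compute and tape factors match exactly.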

23 Thank you for your Attention! http://www.dkrz.de

