CERA / WDCC Hannes Thiemann Max-Planck-Institut für Meteorologie Modelle und Daten zmaw.de NCAR, October 27th – 29th, 2008.


Contents
- Statistics
- Requirements + Features
- General architecture
- Implementation (current and new)
- Migration
- Summary

Basic Statistics
WDCC / CERA: General Statistics at :00:10
- Database size (TByte): 370
- Number of BLOBs: (6.6 billion)
- Data access is by fields and not by files.
- Number of experiments: 1146
- Number of datasets:
- Total size divided by number of BLOBs gives the average size of data access granules: kB/BLOB
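The average granule size can be estimated from the two rounded figures on this slide. The following is only a rough sketch of that division: it assumes decimal units (1 TByte = 10^12 bytes, 1 kB = 1000 bytes), and the function name is made up for illustration. Because the inputs are rounded, the result is an approximation and not the value shown on the original slide.

```python
# Rough estimate only: both inputs are the rounded figures quoted on the slide.
DATABASE_SIZE_TBYTE = 370     # "Database size (TByte): 370"
NUMBER_OF_BLOBS = 6.6e9       # "(6.6 billion)"


def average_granule_kb(size_tbyte: float, n_blobs: float,
                       bytes_per_tbyte: float = 1e12) -> float:
    """Average BLOB size in kB (1 kB = 1000 bytes).

    bytes_per_tbyte is an assumption; pass 2**40 for binary units.
    """
    return size_tbyte * bytes_per_tbyte / n_blobs / 1e3


estimate = average_granule_kb(DATABASE_SIZE_TBYTE, NUMBER_OF_BLOBS)
print(f"approx. {estimate:.0f} kB/BLOB")   # approx. 56 kB/BLOB
```

With binary units the estimate comes out a few kB higher. Either way the access granule is on the order of tens of kB, which is tiny compared with the total archive size.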

Users by continent (chart): active users, 1-Jan-2008 until 14-Oct-2008

Download destinations (chart): 1-Jan-2008 until 14-Oct-2008

Records per download (chart)

Record size (chart)

Requirements and constraints
- Access over WAN
- Downloads are typically quite small, though some are huge.
- Small downloads imply that users are not willing to wait long …
- We cannot scan through large files for each download
- Granularity has to be small

Datatypes
- Model data
  - Climatological runs (global and regional) (IPCC, …)
  - Weather forecasts (DPHASE, CEOP, …)
  - Reanalysis data
- Observational data (COPS, CARIBIC, …)
- Satellite data products

Formats
CERA provides the ability to store data of any format. These are the formats used:
- GRIB (60%)
- NetCDF (18%)
- Other (22%)

General Architecture (diagram): Midtier, Data

General Architecture (diagram)
- Components: Proxy, Webserver, Appl. Server, Metadata, Data
- Metadata blocks: Entry, Reference, Status, Distribution, Contact, Coverage, Parameter, Spatial Reference, Local Adm., Data Access, Data Org
- Processing: select timestep + region, convert format

Storage within CERA
Database table: data of a single variable

Index   Data
1       Data of timestep i
2       Data of timestep i+1
3       Data of timestep i+2
…       …
n       Data of timestep i+n
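The layout on this slide (one table per variable, one row per timestep, an index for direct access) can be sketched with Python's built-in sqlite3 module. This is only an illustration of the idea and not the CERA schema: the production system uses Oracle, and the table name, column names and helper functions below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table per variable (hypothetical name), one row per timestep.
conn.execute(
    "CREATE TABLE temperature_2m ("
    " blob_id INTEGER PRIMARY KEY,"   # index column, gives direct row access
    " blob_data BLOB NOT NULL)"       # encoded field, e.g. one GRIB record
)


def store_timesteps(conn, fields):
    """Insert the fields in time order, numbering the rows from 1."""
    rows = [(i, data) for i, data in enumerate(fields, start=1)]
    conn.executemany("INSERT INTO temperature_2m VALUES (?, ?)", rows)


def fetch_timesteps(conn, first, last):
    """Read a range of timesteps without touching any other row."""
    cur = conn.execute(
        "SELECT blob_id, blob_data FROM temperature_2m"
        " WHERE blob_id BETWEEN ? AND ? ORDER BY blob_id",
        (first, last),
    )
    return cur.fetchall()


store_timesteps(conn, [f"field for timestep {i}".encode() for i in range(10)])
selected = fetch_timesteps(conn, 3, 5)
print([blob_id for blob_id, _ in selected])   # [3, 4, 5]
```

Because each variable lives in its own table and each timestep in its own row, a request for a few timesteps of one variable reads only those rows, which is the "access by fields and not by files" idea from the statistics slide.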

Handicap: not enough disk space available
- Data stored within database: approx. 400 TB
- Disks available: approx. 24 TB
- Database has been coupled transparently to the HSM system
- How do we avoid frequent tape accesses?
  - Big cache
  - Store data that users request together as close together as possible: split into single variables

Data migration (diagram)
- Labels: TBS - RW (Tbl Partition 1), TBS - RW (Tbl Partition 2), dxdb, TBS - RO (Tbl Partition 1), Migout, Migin
- All tablespaces are moved “at once” to dxdb

Inside the datafile (diagram): Header 128k, Table, Primary Key, Lob Index, Blob data

Frontend versus Backend (diagram)
- Filesystem frontend: Header 128k
- HSM backend: Header 128k, Part 1 = 512 MB, Part 2 = 512 MB
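To illustrate why splitting a datafile into parts helps, the sketch below computes which backend parts a single read would need. It rests on assumptions that the slide does not confirm: that the 128k header is always available on the frontend filesystem, that the 512 MB parts follow directly after the header, and that "MB" means MiB. The function name is hypothetical.

```python
HEADER_BYTES = 128 * 1024         # "Header 128k", assumed to stay on frontend disk
PART_BYTES = 512 * 1024 * 1024    # "Part n = 512 MB", assumed to mean MiB


def parts_needed(offset: int, length: int) -> list[int]:
    """Return the 1-based part numbers touched by a read.

    offset counts from the start of the datafile. Bytes inside the header
    are assumed to be served from the frontend and need no part.
    """
    if length <= 0:
        return []
    end = offset + length - HEADER_BYTES            # exclusive, relative to part 1
    if end <= 0:
        return []                                   # read lies entirely in the header
    start = max(offset, HEADER_BYTES) - HEADER_BYTES
    first = start // PART_BYTES
    last = (end - 1) // PART_BYTES
    return [part + 1 for part in range(first, last + 1)]


print(parts_needed(0, 1024))                              # []
print(parts_needed(HEADER_BYTES, 60_000))                 # [1]
print(parts_needed(HEADER_BYTES + PART_BYTES - 10, 20))   # [1, 2]
```

Under these assumptions a typical small download touches a single part, so at most one 512 MB piece has to be staged from tape instead of the whole datafile.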

Retrieving data (diagram): Header 128k, Tape Request

Warehouse features
- Compression: nothing special used within the server
- Partitioning: allows parts of the data to be moved to HSM
- Backup
- Nologging: beware of crash …
- Read only: two copies on tape

New implementation
- The metadata database will stay as is
- The Oracle databases holding data will be replaced by a new in-house development
- Why?
  - There is a certain risk that a future version of Oracle may not work with a given (or any) HSM system
  - In the long run, some license costs will be saved

General Architecture - new (diagram): Webserver, Appl. Server, Metadata, Data, Oracle-DB, Blobserver

CERA-Container
Instead of keeping data within BLOBs in Oracle databases, data records will be kept within so-called CERA Container Files.
- They can hold a huge number of records.
- They provide fast access independent of position within the file (granular access).
- They provide fault tolerance against tape damage by keeping checksums within the files.
- Read/write operations on container files are enclosed in transactions.
- Well-known format
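Two of the container properties listed above, position-independent access and checksums, can be sketched in a few lines. This is a toy layout invented for illustration and not the actual CERA container format, which the slides do not specify. It assumes each record is stored as a small header (payload length plus CRC32) followed by the payload, with the byte offset serving as the record's address. Transactions are left out.

```python
import io
import struct
import zlib

# Hypothetical record header: payload length (8 bytes) and CRC32 (4 bytes).
RECORD_HEADER = struct.Struct("<QI")


def append_record(f, payload: bytes) -> int:
    """Append one record and return its byte offset (its address)."""
    f.seek(0, io.SEEK_END)
    offset = f.tell()
    f.write(RECORD_HEADER.pack(len(payload), zlib.crc32(payload)))
    f.write(payload)
    return offset


def read_record(f, offset: int) -> bytes:
    """Seek straight to a record and verify its checksum before returning it."""
    f.seek(offset)
    length, expected_crc = RECORD_HEADER.unpack(f.read(RECORD_HEADER.size))
    payload = f.read(length)
    if len(payload) != length or zlib.crc32(payload) != expected_crc:
        raise IOError(f"record at offset {offset} is damaged")
    return payload


container = io.BytesIO()   # stands in for a container file on disk
offsets = [append_record(container, f"record {i}".encode()) for i in range(1000)]
print(read_record(container, offsets[742]))   # b'record 742'
```

Reading record 742 costs one seek and one short read regardless of how many records precede it, and a damaged payload is detected by the checksum instead of being returned silently.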

Migration
- Concept / Team (namely Peter Drakenberg, DKRZ): not yet really finished
- Software: first software ready, in order to migrate data
- Convert old data: started last week, but will take at least a year

Dataflow: outbound (diagram): Webserver, Appl. Server, Metadata, Data, Processing; steps numbered 1 to 4

Dataflow: inbound (diagram): Model run, Postprocessing, GFS, Metadata, Dataserver

Summary
- CERA allows for the storage of data of different kinds
- Format independent
- Metadata enables addressing of internal and external data
- Users typically fetch only small amounts of data.
- The system allows for efficient access to small data granules by
  - using warehousing functions like partitioning
  - using small Oracle database BLOBs or, in future, CERA Container Files

Thank you!