CERA / WDCC Hannes Thiemann Max-Planck-Institut für Meteorologie Modelle und Daten zmaw.de NCAR, October 27th – 29th, 2008.


1 CERA / WDCC
Hannes Thiemann, Max-Planck-Institut für Meteorologie, Modelle und Daten, hannes.thiemann@zmaw.de
NCAR, October 27th – 29th, 2008

2 Contents
- Statistics
- Requirements + features
- General architecture
- Implementation (current and new)
- Migration
- Summary

3 Basic Statistics
WDCC / CERA general statistics as of 1 October 2008, 00:00:10:
- Database size: 370 TByte
- Number of BLOBs: 6,663,287,791 (about 6.66 billion)
- Number of experiments: 1,146
- Number of datasets: 142,062
Data is accessed by fields, not by files. Dividing the total size by the number of BLOBs gives the average size of a data access granule: 50 to 60 kB per BLOB.
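The average granule size on slide 3 follows from a single division. A minimal check in Python, using only the two figures from the slide; since the slide does not say whether "TByte" means 10^12 or 2^40 bytes, both readings are computed (the helper name avg_blob_size_kb is illustrative):

```python
# Figures from slide 3 (status as of 1 October 2008).
DB_SIZE_TBYTE = 370
N_BLOBS = 6_663_287_791


def avg_blob_size_kb(size_tbyte: float, n_blobs: int, binary: bool = False) -> float:
    """Average BLOB size in kB (powers of 1000) or KiB (powers of 1024)."""
    unit = 1024 if binary else 1000
    total_bytes = size_tbyte * unit**4   # TByte -> bytes
    return total_bytes / n_blobs / unit  # bytes per BLOB -> kB or KiB


decimal_kb = avg_blob_size_kb(DB_SIZE_TBYTE, N_BLOBS)
binary_kib = avg_blob_size_kb(DB_SIZE_TBYTE, N_BLOBS, binary=True)
print(f"{decimal_kb:.1f} kB (decimal), {binary_kib:.1f} KiB (binary)")
# prints: 55.5 kB (decimal), 59.6 KiB (binary)
```

Either way the result falls in the 50 to 60 kB range quoted on the slide.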

4 Users by continent: active users, 1 Jan 2008 to 14 Oct 2008

5 Download destinations, 1 Jan 2008 to 14 Oct 2008

6 Records per download

7 Record size

8 Requirements and constraints
- Access over WAN
- Downloads are typically quite small, but there are some huge downloads as well.
- Small downloads imply that users are not willing to wait long …
- We cannot scan through large files for each download.
- Granularity has to be small.

9 Data types
- Model data
  - Climatological runs, global and regional (IPCC, …)
  - Weather forecasts (DPHASE, CEOP, …)
  - Reanalysis data
- Observational data (COPS, CARIBIC, …)
- Satellite data products

10 Formats
CERA can store data in any format. The formats currently in use are:
- GRIB (60%)
- NetCDF (18%)
- Other (22%)

11 General Architecture (diagram; labels: Midtier, Data)

12 General Architecture (diagram)
- Components: Metadata, Data, Proxy, Webserver, Appl. Server
- Metadata blocks: Entry, Reference, Status, Distribution, Contact, Coverage, Parameter, Spatial Reference, Local Adm., Data Access, Data Org
- Processing: select timestep + region, convert format
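Slide 12 lists "select timestep + region" and "convert format" as processing steps. The first can be sketched as follows, assuming (hypothetically) that each stored record has already been decoded into a 2-D grid of values keyed by timestep index, with latitude/longitude axes given as plain lists. The names Grid and select are illustrative, and format conversion is left out:

```python
from dataclasses import dataclass


@dataclass
class Grid:
    lats: list[float]  # latitude of each grid row
    lons: list[float]  # longitude of each grid column


def select(records: dict[int, list[list[float]]], grid: Grid,
           t_start: int, t_end: int,
           lat_min: float, lat_max: float,
           lon_min: float, lon_max: float) -> dict[int, list[list[float]]]:
    """Cut the lat/lon box out of every stored timestep in [t_start, t_end]."""
    rows = [i for i, lat in enumerate(grid.lats) if lat_min <= lat <= lat_max]
    cols = [j for j, lon in enumerate(grid.lons) if lon_min <= lon <= lon_max]
    out = {}
    for t in range(t_start, t_end + 1):
        field = records.get(t)
        if field is None:
            continue  # timestep not stored for this variable
        out[t] = [[field[i][j] for j in cols] for i in rows]
    return out


# Usage: a 3 x 4 grid with five timesteps; cut out timesteps 1-2, 10-20N, 10-20E.
grid = Grid(lats=[0.0, 10.0, 20.0], lons=[0.0, 10.0, 20.0, 30.0])
records = {t: [[t * 100 + i * 10 + j for j in range(4)] for i in range(3)]
           for t in range(5)}
sub = select(records, grid, 1, 2, 10.0, 20.0, 10.0, 20.0)
```

Only the requested timesteps are touched, which is what makes small per-timestep records attractive for the small downloads described on slide 8.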

13 Storage within CERA
A database table holds the data of a single variable, one row per timestep, addressed through an index:
Index | Data
1 | data of timestep i
2 | data of timestep i+1
3 | data of timestep i+2
… | …
n | data of timestep i+n-1
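The table layout on slide 13 (one table per variable, one row per timestep, access through an index) can be mimicked with Python's built-in sqlite3 module. This is only an illustration: CERA used Oracle BLOBs, and the table name "temp2", the column names and the helper functions here are hypothetical.

```python
import sqlite3


def create_variable_table(conn: sqlite3.Connection, variable: str) -> None:
    # One table per variable. INTEGER PRIMARY KEY makes the timestep the row key,
    # so lookups by timestep use the index instead of scanning the table.
    # The variable name is interpolated into SQL: it must come from a trusted list.
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{variable}" '
                 "(timestep INTEGER PRIMARY KEY, data BLOB NOT NULL)")


def store(conn: sqlite3.Connection, variable: str, timestep: int, data: bytes) -> None:
    conn.execute(f'INSERT INTO "{variable}" (timestep, data) VALUES (?, ?)',
                 (timestep, data))


def fetch(conn: sqlite3.Connection, variable: str,
          t_start: int, t_end: int) -> list[tuple[int, bytes]]:
    # Only the rows for the requested timesteps are read.
    cur = conn.execute(f'SELECT timestep, data FROM "{variable}" '
                       "WHERE timestep BETWEEN ? AND ? ORDER BY timestep",
                       (t_start, t_end))
    return cur.fetchall()


# Usage: ten timesteps of a hypothetical variable "temp2".
conn = sqlite3.connect(":memory:")
create_variable_table(conn, "temp2")
with conn:  # commit all inserts in one transaction
    for t in range(10):
        store(conn, "temp2", t, bytes([t]) * 4)
rows = fetch(conn, "temp2", 3, 5)
```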

14 Handicap
Not enough disk space is available:
- Data stored within the database: approx. 400 TB
- Disks available: approx. 24 TB
The database has therefore been coupled transparently to the HSM system. How do we avoid frequent tape accesses?
- A big cache
- Storing data as closely as possible to the way users access it, i.e. split into single variables
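Slide 14 gives the ratio that makes caching essential: roughly 24 TB of disk for roughly 400 TB of data, so only about 6% (24 / 400) of the holdings can be on disk at any time. The slides do not say which cache policy CERA used. The sketch below assumes a least-recently-used (LRU) policy with a byte budget; the class name LRUByteCache and the fetch_from_tape callback are hypothetical stand-ins for the HSM recall.

```python
from collections import OrderedDict
from typing import Callable


class LRUByteCache:
    """Keep recently used records on disk up to a byte budget; evict the least recently used."""

    def __init__(self, capacity_bytes: int, fetch_from_tape: Callable[[str], bytes]) -> None:
        self.capacity = capacity_bytes
        self.fetch_from_tape = fetch_from_tape
        self.items = OrderedDict()  # key -> record bytes, least recently used first
        self.used = 0               # bytes currently held in the cache
        self.tape_reads = 0         # number of (slow) tape recalls

    def get(self, key: str) -> bytes:
        if key in self.items:
            self.items.move_to_end(key)  # hit: mark as most recently used
            return self.items[key]
        data = self.fetch_from_tape(key)  # miss: slow path through the HSM
        self.tape_reads += 1
        if len(data) <= self.capacity:    # records larger than the cache are not kept
            self.items[key] = data
            self.used += len(data)
            while self.used > self.capacity:  # evict oldest entries until within budget
                _, evicted = self.items.popitem(last=False)
                self.used -= len(evicted)
        return data


# Usage: a 10-byte "disk" in front of four 4-byte records on "tape".
tape = {"a": b"AAAA", "b": b"BBBB", "c": b"CCCC", "d": b"DDDD"}
cache = LRUByteCache(capacity_bytes=10, fetch_from_tape=tape.__getitem__)
for key in "abacda":
    cache.get(key)
```

Splitting data into single variables complements the cache: a request then recalls only the variable it needs, so the small disk is not filled with fields nobody asked for.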

15 Data migration (diagram)
- Tbl Partition 1 and Tbl Partition 2, each in a read-write tablespace (TBS - RW)
- dxdb: Tbl Partition 1 in a read-only tablespace (TBS - RO)
- All tablespaces are moved “at once” to dxdb.
- Migout / Migin

16 Inside the datafile (diagram; labels: Header 128k, Table, Primary Key, Lob Index, Blob data)

17 Frontend versus Backend (diagram)
- Filesystem frontend: Header 128k
- HSM backend: Header 128k, Part 1 = 512 MB, Part 2 = 512 MB

18 Retrieving data (diagram with steps 1 to 5; labels: Header 128k, Tape Request)
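Slides 16 to 18 suggest that each datafile keeps its 128k header on the disk frontend while the rest lives on the HSM backend in 512 MB parts, so a read only needs a tape request for the parts that actually hold the requested bytes. The sketch below computes those parts for a byte range. It is one reading of the diagrams, not a documented algorithm: the binary units, the numbering of parts from the start of the file, and the function name parts_to_stage are assumptions.

```python
HEADER_BYTES = 128 * 1024        # "Header 128k", read as 128 KiB (assumption)
PART_BYTES = 512 * 1024 * 1024   # "Part = 512 MB", read as 512 MiB (assumption)


def parts_to_stage(offset: int, length: int) -> list[int]:
    """Return the 1-based parts that must be recalled from tape to read
    bytes [offset, offset + length) of a datafile.

    Assumes parts are counted from the start of the file and that the
    header (the first HEADER_BYTES) always stays on the disk frontend.
    """
    if length <= 0:
        return []
    end = offset + length              # exclusive end of the request
    start = max(offset, HEADER_BYTES)  # header bytes need no tape access
    if start >= end:
        return []                      # request lies entirely in the header
    first = start // PART_BYTES + 1
    last = (end - 1) // PART_BYTES + 1
    return list(range(first, last + 1))
```

With BLOBs averaging around 55 kB (slide 3), a record almost always lies inside a single part, so one tape request suffices unless the record straddles a part boundary.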

19 Warehouse features
- Compression: nothing special used within the server
- Partitioning: allows parts of the data to be moved to HSM
- Backup
  - Nologging: beware of crashes …
  - Read only: two copies on tape

20 New implementation
- The metadata database will stay as is.
- The Oracle databases holding the data will be replaced by a new, in-house development.
Why?
- There is a certain risk that a future version of Oracle may not work with a (or any) HSM system.
- In the long run, some license costs can be saved.

21 General Architecture - new (diagram; labels: Metadata, Data, Webserver, Appl. Server, Oracle-DB, Blobserver)

22 CERA-Container
Instead of keeping data in BLOBs within Oracle databases, data records will be kept in so-called CERA Container Files, which:
- can hold a huge number of records;
- provide fast access independent of a record's position within the file (granular access);
- provide fault tolerance against tape damage by keeping checksums within the files;
- enclose read/write operations on container files in transactions;
- use a well-known format.
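The properties listed on slide 22 can be illustrated with a deliberately small file layout. The slides do not describe the actual CERA Container format (only that it is a well-known format), so everything below is hypothetical: the signature CCF1, the index stored at the end of the file, a CRC-32 per record, and writing to a temporary file followed by a rename as a simple stand-in for transactions. Reads seek directly to a record through the index, so their cost does not depend on where the record sits in the file, and the stored checksum detects damaged records.

```python
import os
import struct
import tempfile
import zlib

MAGIC = b"CCF1"                 # hypothetical 4-byte file signature
ENTRY = struct.Struct("<QQQI")  # index entry: key, offset, length, CRC-32
TRAILER = struct.Struct("<QQ")  # file trailer: index offset, entry count


def write_container(path: str, records: dict[int, bytes]) -> None:
    """Write all records at once; the final rename makes the write all-or-nothing."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(MAGIC)
            index = []
            for key, data in sorted(records.items()):
                index.append((key, f.tell(), len(data), zlib.crc32(data)))
                f.write(data)
            index_offset = f.tell()
            for entry in index:
                f.write(ENTRY.pack(*entry))
            f.write(TRAILER.pack(index_offset, len(index)))
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic on POSIX when tmp and path share a filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


class ContainerReader:
    """Random access to records through the index stored at the end of the file."""

    def __init__(self, path: str) -> None:
        self._f = open(path, "rb")
        if self._f.read(len(MAGIC)) != MAGIC:
            self._f.close()
            raise ValueError(f"{path}: not a container file")
        self._f.seek(-TRAILER.size, os.SEEK_END)
        index_offset, count = TRAILER.unpack(self._f.read(TRAILER.size))
        self._f.seek(index_offset)
        raw = self._f.read(count * ENTRY.size)
        # key -> (offset, length, crc32); kept in memory for simplicity
        self.index = {key: (off, length, crc)
                      for key, off, length, crc in ENTRY.iter_unpack(raw)}

    def read(self, key: int) -> bytes:
        offset, length, crc = self.index[key]
        self._f.seek(offset)  # one seek, wherever the record lies in the file
        data = self._f.read(length)
        if zlib.crc32(data) != crc:
            raise OSError(f"record {key}: checksum mismatch (damaged data)")
        return data

    def close(self) -> None:
        self._f.close()
```

Keeping the whole index in memory is a simplification; a format meant for billions of records would more likely keep its index on disk. Likewise, the rename only makes whole-file rewrites atomic; transactional appends to an existing container would need more machinery.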

23 Migration
- Concept / team (namely Peter Drakenberg, DKRZ): not yet really finished
- Software: a first version for migrating the data is ready
- Converting the old data: started last week, but will take at least a year
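Slide 23 estimates that converting the old data will take at least a year. As a rough, illustrative check, the average rate needed to convert the roughly 370 to 400 TB quoted on slides 3 and 14 within 365 days of continuous operation is computed below. Decimal units are assumed, and the slides give no actual throughput figures; the helper name required_rate_mb_s is illustrative.

```python
SECONDS_PER_YEAR = 365 * 86_400


def required_rate_mb_s(volume_tb: float, seconds: float = SECONDS_PER_YEAR) -> float:
    """Average throughput in MB/s (decimal units) needed to move volume_tb in the given time."""
    return volume_tb * 1e6 / seconds  # 1 TB = 1e6 MB


for volume in (370, 400):  # volumes quoted on slides 3 and 14
    print(f"{volume} TB in one year: {required_rate_mb_s(volume):.1f} MB/s")
# prints:
# 370 TB in one year: 11.7 MB/s
# 400 TB in one year: 12.7 MB/s
```

That is about 12 MB/s sustained around the clock, and since every byte has to be read in the old layout and written in the new one, the actual I/O load is roughly twice that.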

24 Dataflow: outbound (diagram with steps 1 to 8; labels: Webserver, Appl. Server, Metadata, Data, Processing)

25 Dataflow: inbound (diagram; labels: Model run, Postprocessing, Dataserver, Metadata, GFS)

26 Summary
- CERA can store data of different kinds, independent of format.
- Metadata enables addressing of both internal and external data.
- Users typically fetch only small amounts of data.
- The system allows efficient access to small data granules by using warehousing functions such as partitioning, and by storing data in small Oracle database BLOBs or, in future, in CERA Container files.

27 Thank you!

