1 Introduction to Observational DataBase (ODB)
sami.saarinen@ecmwf
25-Apr-2007
2 Overview
- Introduction to ODB
- Creating a simple database
- Use of the simulobs2odb program
- Visualizing data using basic odbviewer
- More complex databases
- ODB within the IFS/4DVAR system
- Manipulating ODB data from Fortran90
- A few tools: odbsql, odbdiff, odbcompress, odbdup, odb2netcdf
- ODBTk: a GUI-based ODB visualisation toolkit
- A separate presentation and demo by Paul Burton
3 Introduction to ODB
- ODB is tailor-made (hierarchical) database software developed at ECMWF to manage very large observational data volumes through the ECMWF IFS/4DVAR system on highly parallel supercomputer systems
- ODB also enables flexible post-processing of observational data, even on a desktop computer
- ODB software is written in C and Fortran90 and is available on virtually any Unix system (and now also on Windows/CYGWIN)
- The software can be installed from source code (a "tar-ball"), normally in less than an hour
7 … Introduction to ODB
- An observational database usually contains the following items:
  - Observation identification, position and time coordinates
  - Observation value, pressure levels, channel numbers
  - Various quality control flags
  - Obs. departures from background and analysis fields
  - Satellite-specific information
  - Other closely related information
- All information can be accessed via the ODB/SQL language and the Fortran90 interface
- Direct (read-only) access to ODB data is now also available
  - No programming effort is needed to "scan" ODB data
8 Basic components of ODB
- ODB/SQL language
  - Data Definition Language: describes which data items belong to the database, what their data types are, and how they are related (if at all) to each other
  - Data Query Language: queries and returns the subset of data that satisfies user-specified conditions. This is the key feature of the ODB software!
- Fortran90 interface layer
  - Data manipulation: create, update and remove data
  - Execute ODB/SQL queries and retrieve filtered data
  - Control MPI and OpenMP parallelization
9 Creating a simple ODB database
- We will create a very simple database using text files
- The 3 text files describe:
  - Data layout, i.e. what data items will go into ODB
  - Location and time information of observations
  - Actual observation measurement information for each location at the given pressure levels
- Feed these files into the simulobs2odb program
- Inspect the data values in the database by using odbviewer
10 Data definition layout: MYDB.ddl
    CREATE TABLE hdr AS (
      seqno pk1int,
      obstype pk1int,
      codetype pk1int,
      lat pk9real,
      lon pk9real,
      date yyyymmdd,
      time hhmmss,
      body @LINK,
    );
    CREATE TABLE body AS (
      entryno pk1int,
      varno pk1int,
      vertco_type pk1int,
      press pk9real,
      obsvalue pk9real,
    );
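The parent/child relationship declared by `body @LINK` can be pictured as each hdr row pointing at a contiguous run of body rows. The Python sketch below only illustrates that idea and is not ODB code. It assumes the link behaves like an (offset, length) pair; the slides show only a `body.len` column in hdr.txt, so the `body_offset` field, the sample values, and the helper `body_rows_for` are all hypothetical.

```python
# Illustration only: a parent/child link modelled as (offset, length).
# The offset field and every sample value are assumptions, not ODB internals.

hdr = [
    {"seqno": 1, "lat": 60.2, "lon": 24.9, "body_offset": 0, "body_len": 2},
    {"seqno": 2, "lat": 51.4, "lon": -0.9, "body_offset": 2, "body_len": 3},
]

body = [
    {"entryno": 1, "varno": 2, "press": 85000.0, "obsvalue": 268.1},
    {"entryno": 2, "varno": 2, "press": 50000.0, "obsvalue": 249.7},
    {"entryno": 1, "varno": 2, "press": 92500.0, "obsvalue": 275.0},
    {"entryno": 2, "varno": 2, "press": 70000.0, "obsvalue": 262.4},
    {"entryno": 3, "varno": 2, "press": 30000.0, "obsvalue": 228.9},
]


def body_rows_for(hdr_row, body_table):
    """Return the slice of body rows that one hdr row links to."""
    start = hdr_row["body_offset"]
    return body_table[start:start + hdr_row["body_len"]]


for h in hdr:
    linked = body_rows_for(h, body)
    print(h["seqno"], [row["obsvalue"] for row in linked])
```

With this layout, adding observations to a report means appending body rows and adjusting the length stored in the parent row, which is why a per-report length column is enough to describe the hierarchy in a text input file.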
11 Input file #2: hdr.txt
    #hdr obstype = 2 codetype = 141 seqno lat lon date time body.len
13 Running simulobs2odb
- Initialize the ODB interactive environment:
    use odb
- Create the database using the following simple command:
    simulobs2odb -l MYDB -i hdr.txt -i body.txt
- As a result of these commands, a small database called MYDB has been created. It contains one data pool with two tables, hdr and body, which are linked (related) to each other via the @LINK data type
- It is now easy to extend the database by providing more data, specifying more data items, adding more tables, or all of the above at the same time
14 Visualizing with odbviewer
- History: odbviewer was originally written as a debugging tool for ODB software development
- Linked with the ECMWF graphics package MAGICS/MAGICS++
  - Displays coverage plots
- Also a textual report generator
  - Displays output of data queries
- "Sensitive" to the ODB/SQL language: automatically tries to produce both a coverage plot and a textual report for the user
- The textual report itself can be an invaluable source of information for further post-processing tasks
  - Making use of the new and more economical tool odbsql
15 Running odbviewer
- Go to the database directory:
    cd MYDB
- Run:
    odbviewer -q 'SELECT lat,lon,press,obsvalue \
                  FROM hdr, body \
                  WHERE obstype = 2'
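For readers more used to standard SQL, the query above can be approximated with Python's built-in `sqlite3` module. This is a rough analogue under stated assumptions, not ODB itself: the slides list both tables in FROM and rely on the link declared in the DDL to relate them, whereas standard SQL needs the join written out. The `body.seqno` column used for that join, and all sample rows, are hypothetical.

```python
import sqlite3

# In-memory stand-in for the MYDB example; not an ODB database.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE hdr  (seqno INTEGER PRIMARY KEY, obstype INTEGER,
                       codetype INTEGER, lat REAL, lon REAL);
    -- body.seqno is an assumed foreign key replacing ODB's link column
    CREATE TABLE body (seqno INTEGER, entryno INTEGER, varno INTEGER,
                       press REAL, obsvalue REAL);
""")
con.executemany("INSERT INTO hdr VALUES (?,?,?,?,?)", [
    (1, 2, 141, 60.2, 24.9),
    (2, 1, 11, 51.4, -0.9),
])
con.executemany("INSERT INTO body VALUES (?,?,?,?,?)", [
    (1, 1, 2, 85000.0, 268.1),
    (1, 2, 2, 50000.0, 249.7),
    (2, 1, 39, 101300.0, 283.4),
])

# Same column list and WHERE clause as the slide, with an explicit join.
rows = con.execute("""
    SELECT lat, lon, press, obsvalue
    FROM hdr JOIN body ON body.seqno = hdr.seqno
    WHERE obstype = 2
    ORDER BY hdr.seqno, entryno
""").fetchall()

for row in rows:
    print(row)
```

Only the observations belonging to the report with obstype 2 are returned; the position columns from hdr are repeated once per matching body row, which is the shape a coverage plot needs.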
17 Some odbviewer options
    -h                  List of options (gimme some "help"!)
    -q 'SQL-stmt'       Provide ODB/SQL statement inline
    -v viewname/poolno  Choose SQL name (and optionally pool number)
    -p "1-10,12,15"     Choose from a subset of pools
    -R                  No radians-to-degrees conversion for (lat,lon)
    -r                  Enforce radians-to-degrees conversion
    -k                  Show (lat,lon) in degrees even if they were in radians in DB
    -c                  Clean start (i.e. recompile all)
    -e editor           Choose preferred editor
    -e batch            Run in batch mode (same as -e pipe)
    -N                  Do not produce a report at all
    -I                  Do not show plot immediately
    -P projection       Change display projection
    -C file.cmap        Supply a color map file
    -A plot_area        Choose plotting area
    -F                  (En)force use of the old-style odbviewer over odbsql
18 More complex databases
- In reality, databases usually contain many more tables (>>5) than in the simple example earlier
- Each table can contain 10-50 data columns
- There can also be a sophisticated data hierarchy (see the next slide) to describe potentially quite complex relationships between tables
- To provide good parallel performance on supercomputers, data tables are furthermore divided into data pools, which also enables parallel I/O:
  - They behave like sub-databases within a database
  - They allow much bigger data sets than would otherwise be possible
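The idea of data pools as independent sub-databases can be sketched in a few lines of Python. This is a conceptual illustration only: the round-robin distribution, the `active` flag, and the function names are assumptions made for the example, and the slides do not say how ODB actually assigns rows to pools.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_pools(rows, npools):
    """Distribute rows over npools partitions (round-robin is an assumption)."""
    pools = [[] for _ in range(npools)]
    for i, row in enumerate(rows):
        pools[i % npools].append(row)
    return pools


def count_active(pool):
    """Per-pool work: each pool is processed without touching the others."""
    return sum(1 for row in pool if row["active"])


# Twelve hypothetical observation rows, a third of them flagged active.
rows = [{"seqno": i, "active": i % 3 == 0} for i in range(1, 13)]
pools = split_into_pools(rows, npools=4)

# Each pool is handled by its own worker; results are combined at the end.
with ThreadPoolExecutor(max_workers=4) as executor:
    per_pool = list(executor.map(count_active, pools))

total_active = sum(per_pool)
print(per_pool, total_active)
```

Because no worker needs data from another pool, the same pattern scales from threads on a desktop to one pool per MPI task, which is the property the slide attributes to pools.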
22 AMSU-A data after screening
- Under 10% left active!
23 Typical ODB usage at ECMWF …
- A database can be created interactively or in batch mode
  - We usually run our in-house BUFR2ODB in batch mode
  - New observation types can also be fed in via text files
- Complete database manipulation is best done through the Fortran90 interface, but any read-only database can also be accessed via a rudimentary client-server interface (C/C++)
- Another possibility is to run the new tool, odbsql
  - No need to use the ODB/SQL compilation system
  - No need to write a single line of Fortran90
  - The tool is under development
24 … Typical ODB usage at ECMWF
- Once a database has been created, the application program queries data via precompiled ODB/SQL and places the result data (also known as a view) into a data matrix allocated by the user program
- There can be virtually any number of active views at any given time. These can be updated and fed back to the database
- Thanks to ODB, the use of WMO BUFR has been minimized at ECMWF in order to enable faster and more robust processing of observations
25 ECMWF BUFR to ODB conversion
- ODBs at ECMWF are normally created by using bufr2odb
  - Enables efficient MPI-parallel database creation
  - Allows retrospective inspection of Feedback BUFR data by converting it into ODB (slow, and not all data is in BUFR)
- bufr2odb can also be used interactively, for example:
    bufr2odb -i bufr_input_file -I 1-20 -n 4
- The preceding example creates 4 pools of the ECMA database from the given BUFR input file, but includes only BUFR subtypes from 1 to 20 (inclusive)
- Feedback BUFR to ODB works similarly:
    fb2odb -i feedback_bufr_file -n 8 -u 2
26 Manipulating ODB from Fortran90
- Currently Fortran90 is the only way to fill an ODB database
  - simulobs2odb is also a Fortran90 program underneath
  - Likewise odbviewer, or practically any other ODB tool
- Also: to fetch and update data, Fortran90 is necessary
- The ODB Fortran90 interface layer offers a comprehensive set of functions to:
  - Open and close a database
  - Attach to and execute precompiled ODB/SQL queries
  - Load, update and store queried data
27 An example ODB program
    program main
      use odb_module
      implicit none
      integer(4) :: h, rc, nra, nrows, ncols, npools, j, jp
      real(8), allocatable :: x(:,:)
      npools = 0
      h = ODB_open('MYDB', 'OLD', npools=npools)
      ! < data manipulation loop; see next page >
      rc = ODB_close(h, save=.TRUE.)
    end program main
28 Data manipulation loop
    DO jp=1,npools
      ! Execute SQL, allocate space, get data into matrix
      rc = ODB_select(h,'sqlview',nrows,ncols,poolno=jp)
      allocate(x(nrows,0:ncols))
      rc = ODB_get(h,'sqlview',x,nrows,ncols,poolno=jp)
      ! Update data, put back to DB, deallocate space
      call update(x,nrows,ncols) ! Not an ODB-routine
      rc = ODB_put(h,'sqlview',x,nrows,ncols,poolno=jp)
      deallocate(x)
      rc = ODB_cancel(h,'sqlview',poolno=jp)
      ! Use the following only with READONLY-databases
      ! rc = ODB_release(h,poolno=jp)
    ENDDO
29 Compile, link and run
    (1) use odb                                       # once per session
    (2) odbcomp MYDB.ddl                              # once only; often from file MYDB.sch
    (3) odbcomp sqlview.sql                           # recompile only when changed
    (4) odbf90 main.F90 update.F90 -lMYDB -o main.x   # link
    (5) ./main.x                                      # run
31 odbsql
- A new tool to access ODB data in read-only mode
- Does not generate C code, but dives directly into the data
- Usually faster than generated C code, with the exception of accessing large amounts of satellite data (being investigated)
- The tool is under active development right now
- Usage:
    odbsql -q 'SELECT column(s) FROM table(s) WHERE …' \
           -s starting_row -n number_of_rows_to_display \
           [-X] [other_options]
32 ODB/SQL – examples (1)
    SET $t2m = 39;    // Scalar parameters, whose values …
    SET $synop = 1;   // … can be overridden in Fortran90
    CREATE VIEW t2m AS
    SELECT an_depar, fg_depar, lat, lon, obsvalue
    FROM hdr, body
    WHERE obstype = $synop         // Give me synops
      AND varno = $t2m             // Give me 2 meter temperatures
      AND obsvalue is not NULL ;   // Don't want missing data
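The pattern of a stored query with default scalar parameters that the calling program may override can be mimicked with named parameters in Python's `sqlite3`. This is an analogue in standard SQL, not ODB/SQL: the explicit join on a hypothetical `body.seqno` column, the use of `varno` in the second condition (taken from the body table definition), the helper `run_view`, and the sample rows are all assumptions for illustration.

```python
import sqlite3

# Defaults mirror the SET lines on the slide; callers may override them.
DEFAULTS = {"synop": 1, "t2m": 39}

QUERY = """
    SELECT lat, lon, obsvalue
    FROM hdr JOIN body ON body.seqno = hdr.seqno
    WHERE obstype = :synop
      AND varno = :t2m
      AND obsvalue IS NOT NULL
"""


def run_view(con, **overrides):
    """Run the stored query with defaults, replaced by any overrides given."""
    params = {**DEFAULTS, **overrides}
    return con.execute(QUERY, params).fetchall()


con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE hdr  (seqno INTEGER PRIMARY KEY, obstype INTEGER,
                       lat REAL, lon REAL);
    CREATE TABLE body (seqno INTEGER, varno INTEGER, obsvalue REAL);
""")
con.executemany("INSERT INTO hdr VALUES (?,?,?,?)",
                [(1, 1, 60.2, 24.9), (2, 2, 51.4, -0.9)])
con.executemany("INSERT INTO body VALUES (?,?,?)", [
    (1, 39, 271.3),
    (1, 39, None),    # missing value, filtered out by IS NOT NULL
    (1, 1, 5.0),
    (2, 39, 280.0),
    (2, 2, 250.0),
])

default_rows = run_view(con)                   # uses synop=1, t2m=39
other_rows = run_view(con, synop=2, t2m=2)     # caller overrides both
print(default_rows, other_rows)
```

Keeping the defaults next to the query text means the view is usable as written, while a driver program can reuse it for another observation type without editing the SQL.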
36 odbdiff
- Enables comparison of two ODB databases for differences
- A very useful tool when trying to identify errors/differences between operational and experimental 4DVAR runs
  - Usually a non-trivial task
- Usage:
    odbdiff -q 'SELECT …' /dir1/DATABASE1 /dir2/DATABASE2
- By default the command brings up an xdiff window showing the differences
- If latitude and longitude are also given in the data query, it also produces a difference plot using the odbviewer tool
37 odbcompress
- Enables creation of very compact databases from existing ones
  - for archiving purposes, or
  - for a smaller database footprint (disk occupancy)
- Makes post-processing considerably faster
- The user can choose to:
  - Truncate the data precision, and/or
  - Leave out columns that are of lesser importance
- Typical compression ratios vary between 2.5x and 11x
  - The high compression is achieved for satellite data!
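Why truncating precision shrinks a database can be shown with a small, self-contained Python experiment. This is not odbcompress's algorithm: it uses `zlib` as a stand-in packer, keeps an arbitrary 20 mantissa bits, and works on random values in a temperature-like range, all of which are assumptions chosen for illustration.

```python
import random
import struct
import zlib


def truncate_mantissa(value, keep_bits):
    """Zero all but the top keep_bits of the 52-bit double mantissa."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    drop = 52 - keep_bits
    mask = ~((1 << drop) - 1) & 0xFFFFFFFFFFFFFFFF
    (out,) = struct.unpack("<d", struct.pack("<Q", bits & mask))
    return out


def packed_size(values):
    """Size in bytes of the values stored as doubles and zlib-compressed."""
    raw = struct.pack("<%dd" % len(values), *values)
    return len(zlib.compress(raw, 9))


random.seed(0)
values = [250.0 + 50.0 * random.random() for _ in range(5000)]
truncated = [truncate_mantissa(v, keep_bits=20) for v in values]

full_size = packed_size(values)
small_size = packed_size(truncated)
max_err = max(abs(a - b) for a, b in zip(values, truncated))

print("full: %d bytes, truncated: %d bytes" % (full_size, small_size))
print("largest change to any value: %.6f" % max_err)
```

Zeroing the low mantissa bits turns half of every stored number into identical bytes, which a general-purpose packer removes almost for free, while each value changes by less than 2^-12 in this range. Dropping whole columns, the other option on the slide, reduces the volume further in direct proportion to the columns removed.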
38 odbdup/odbmerge
- Allows, for example, database sharing between multiple users
  - Over shared (e.g. NFS, Lustre, GPFS, GFS) disks
- Duplicates [merges] database(s) by copying the metadata (low in volume), but shares the actual (high-volume) binary data
- Also enables creation of a time-series database, for example:
    odbdup -i "200701*/ECMA.conv" -o USERDB
- The previous example creates a new database labelled USERDB, which presumably spans all conventional observations during January 2007
- The main point: the user now has access to a whole month of data as if it were a single database!
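How a wildcard such as `200701*/ECMA.conv` selects a month of databases can be checked with Python's `fnmatch` module. The directory names below follow a hypothetical YYYYMMDDHH naming scheme that is not stated in the slides, and odbdup's own pattern expansion may differ from `fnmatch` in detail.

```python
from fnmatch import fnmatchcase

# Hypothetical database directories, one per analysis cycle.
candidates = [
    "2006123118/ECMA.conv",
    "2007010100/ECMA.conv",
    "2007010100/ECMA.amsua",
    "2007013118/ECMA.conv",
    "2007020100/ECMA.conv",
]

pattern = "200701*/ECMA.conv"

# Keep only the conventional-observation databases from January 2007.
selected = [path for path in candidates if fnmatchcase(path, pattern)]
for path in selected:
    print(path)
```

Listing the matches before running the real command is a cheap way to confirm that the pattern covers the intended period and nothing else.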
39 odb2netcdf
- Translates the result of a given ODB query (or a whole ODB table) into a series of NetCDF files, by default one file for each ODB data pool (i.e. partition)
- Usage:
    odb2netcdf -q 'SELECT …' [-p pool_number] [-P]
- The result files can be viewed with standard NetCDF tools like ncdump and ncview
- The files can also be created in the NetCDF packed format (caveat: truncated data precision) if the -P option is used
40 Some interesting facts on ODB
- Written mainly in C
  - Except the Fortran90 interface and the IFS/4DVAR interface
  - Except BUFR2ODB (by Milan Dragosavac, ECMWF)
- ODB/SQL is currently converted into C code
  - 10 lines of SQL generate >> 100 lines of C code
- A standalone ODB installation (without IFS) is also available
- Tested at least on the following machines:
  - SGI/Altix, IBM Power3/4/5, Linux Intel/AMD
  - Fujitsu VPPs, NEC SX, Cray XT3/4
- Automatic binary data conversion guarantees database portability between different machines
41 … and some ODB "limitations"
- ODB software is clearly meant for large-scale computation since, given lots of memory and disk space and fast CPUs:
  - A single program can handle up to 2^31 ODB databases
  - A single database can have up to 2^31 data pools
  - A single database can have any number of tables
  - A single table in a data pool can have up to 2^31 rows and (by default) 9999 columns
  - A single ODB/SQL query over active data pools can retrieve up to 2^31 rows in one go
- These really big numbers show that ODB's potential is on parallel computers. Yet we haven't forgotten the PCs!
42 Finally…
- ODB software was developed to allow unprecedented amounts of satellite data through the IFS/4DVAR system
- The software has been operational at ECMWF since June 2000, but is still evolving
- Emphasis is now on graphical post-processing and on enabling fast access to very large amounts of data
- Who is using ODB outside ECMWF? At least:
  - MeteoFrance, Hungarian MS, SMHI, FMI
  - Aladin and some HIRLAM nations
  - Australian Bureau of Meteorology
  - University of Vienna, via the ERA40 re-analysis collaboration