Slide 1: TIGGE phase 1: Experience with exchanging large amounts of NWP data in near real time. Baudouin Raoult, Data and Services Section, ECMWF
Slide 2: The TIGGE core dataset
- THORPEX Interactive Grand Global Ensemble
- Global ensemble forecasts to around 14 days, generated routinely at different centres around the world
- Outputs collected in near real time and stored in a common format for access by the research community
- Easy access to long series of data is necessary for applications such as bias correction and the optimal combination of ensembles from different sources
Slide 3: Building the TIGGE database
- Three archive centres: CMA, NCAR and ECMWF
- Ten data providers:
  - Already sending data routinely: ECMWF, JMA (Japan), UK Met Office (UK), CMA (China), NCEP (USA), MSC (Canada), Météo-France (France), BoM (Australia), KMA (Korea)
  - Coming soon: CPTEC (Brazil)
- Exchanges using UNIDATA LDM, HTTP and FTP
- Operational since 1st of October 2006
- 88 TB, growing by ~1 TB/week; 1.5 million fields/day
Slide 4: TIGGE Archive Centres and Data Providers
- Archive centres: NCAR, ECMWF, CMA
- Current data providers: NCEP, CMC, UKMO, ECMWF, Météo-France, JMA, KMA, CMA, BoM
- Future data provider: CPTEC
Slide 5: Strong governance
- Precise definition of:
  - Which products: list of parameters, levels, steps, units, …
  - Which format: GRIB2
  - Which transport protocol: UNIDATA's LDM
  - Which naming convention: WMO file name convention
- Only exception: the grid and resolution
  - Choice of the data provider; data provider to supply interpolation to a regular lat/lon grid
  - Best possible model output
- Many tools and examples:
  - Sample dataset available
  - Various GRIB2 tools, "tigge_check" validator, …
  - Scripts that implement the exchange protocol
- Web site with documentation, sample data set, tools, news, …
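The kind of validation the "tigge_check" tool performs can be sketched as below. This is a hypothetical simplification: the real tool decodes GRIB2 messages directly, whereas here each message is represented as a plain dict of already-decoded header fields, and the key and parameter lists are illustrative, not the real agreed product list.

```python
# Sketch of a "tigge_check"-style validator. Messages are hypothetical
# dicts of already-decoded GRIB2 header fields, not real GRIB2 data.
REQUIRED_KEYS = {"edition", "centre", "param", "levtype", "step", "units"}
ALLOWED_PARAMS = {"t", "2t", "10u", "10v", "msl", "tcc"}  # illustrative subset

def check_message(msg):
    """Return a list of problems found in one decoded message."""
    problems = []
    missing = REQUIRED_KEYS - msg.keys()
    if missing:
        problems.append("missing keys: %s" % sorted(missing))
    if msg.get("edition") != 2:
        problems.append("not GRIB edition 2")
    if msg.get("param") not in ALLOWED_PARAMS:
        problems.append("parameter %r not in agreed product list" % msg.get("param"))
    return problems

# A compliant message and a non-compliant one (wrong edition, unknown parameter)
ok = {"edition": 2, "centre": "ecmf", "param": "t",
      "levtype": "pl", "step": 6, "units": "K"}
bad = {"edition": 1, "centre": "egrr", "param": "xyz",
       "levtype": "sfc", "step": 6, "units": "K"}
```

Running a validator like this at both the provider and the archive centre is what makes the "strong governance" enforceable in practice.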
Slide 7: Quality assurance: homogeneity
- Homogeneity is paramount for TIGGE to succeed
  - The more consistent the archive, the easier it will be to develop applications
- There are three aspects to homogeneity:
  - Common terminology (parameter names, file names, …)
  - Common data format (format, units, …)
  - An agreed list of products (parameters, steps, levels, …)
- What is not homogeneous:
  - Resolution
  - Base time (although most providers have a run at 12 UTC)
  - Forecast length
  - Number of ensemble members
Slide 8: QA: Checking for homogeneity
- E.g. cloud cover: instantaneous or six-hourly?
Slide 9: QA: Completeness
- The objective is to have 100% complete datasets at the Archive Centres
- Completeness may not be achieved for two reasons:
  - The transfer of the data to the Archive Centre fails
  - Operational activities at a data provider are interrupted and back-filling past runs is impractical
- Incomplete datasets are often very difficult to use
- Most of the current tools for ensemble forecasts (e.g. EPSgrams) assume a fixed number of members from day to day
  - These tools will have to be adapted
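A completeness check of the kind an archive centre might run can be sketched as follows. The member counts, steps and parameter names are illustrative, not the real TIGGE product list.

```python
from itertools import product

# Minimal sketch of a completeness check for one forecast run:
# compare the agreed product list against the fields actually received.
members = range(1, 5)      # ensemble members 1..4 (illustrative)
steps = range(0, 25, 6)    # forecast steps 0, 6, ..., 24 h (illustrative)
params = ["t", "msl"]

expected = set(product(params, members, steps))

def missing_fields(received):
    """Return the expected (param, member, step) tuples not yet received."""
    return sorted(expected - set(received))

# Simulate a run where member 3's msl fields were lost in transfer
received = [(p, m, s) for (p, m, s) in expected
            if not (p == "msl" and m == 3)]
```

A report produced from `missing_fields` tells the data provider exactly which fields to resend, which is cheaper than back-filling whole past runs.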
Slide 10: GRIB to NetCDF Conversion
[Diagram: a GRIB file containing messages identified by (parameter, centre, member), e.g. t/EGRR/1 … d/ECMF/2, converted into a NetCDF file; (1,2,3,4) represents the ensemble member id (realization)]
- Gather metadata and message locations
- Create the NetCDF file structure
- Populate the NetCDF parameter arrays
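The "gather metadata and message locations" step above can be sketched as below. The records are hypothetical stand-ins for decoded GRIB message headers; the field names and offsets are illustrative.

```python
# Sketch of the metadata-gathering pass: scan decoded GRIB headers and
# index message offsets by (parameter, centre), so that each NetCDF
# variable can later be populated member by member without rescanning.
records = [
    {"param": "t", "centre": "EGRR", "member": 1, "offset": 0},
    {"param": "t", "centre": "EGRR", "member": 2, "offset": 4096},
    {"param": "t", "centre": "ECMF", "member": 1, "offset": 8192},
    {"param": "d", "centre": "ECMF", "member": 1, "offset": 12288},
]

def gather(records):
    """Map (param, centre) -> {member: file offset of the message}."""
    index = {}
    for r in records:
        index.setdefault((r["param"], r["centre"]), {})[r["member"]] = r["offset"]
    return index

index = gather(records)
```

With this index built, the converter can size the NetCDF dimensions (members per variable, variables per file) before writing anything, which is why the conversion is done in three passes.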
Slide 12: Ensemble NetCDF File Structure
- NetCDF file format:
  - Based on available CF conventions
  - File organization follows the NetCDF file structure proposed by Doblas-Reyes (ENSEMBLES project)
  - Provides grid/ensemble-specific metadata for each member: data provider, forecast type (perturbed, control, deterministic)
  - Allows multiple combinations of initialization times and forecast periods within one file: pairs of initialization time and forecast step
Slide 13: Ensemble NetCDF File Structure (continued)
- NetCDF parameter structure (5 dimensions):
  - reftime
  - realization (ensemble member id)
  - level
  - latitude
  - longitude
- "Coordinate" variables are used to describe:
  - realization: provides the metadata associated with each ensemble grid
  - reftime: allows multiple initialization times and forecast periods to be contained within one file
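The 5-dimensional layout above determines how a single value is addressed when the array is stored as a flat buffer in row-major (C) order, as NetCDF stores it. A minimal sketch, with illustrative dimension sizes:

```python
# Sketch of addressing one value in the 5-D parameter array
# (reftime, realization, level, latitude, longitude) stored as a flat
# row-major buffer. Dimension sizes are illustrative, not TIGGE's.
dims = {"reftime": 2, "realization": 4, "level": 3, "lat": 181, "lon": 360}

def flat_index(r, m, lev, y, x):
    """Row-major offset of element (r, m, lev, y, x)."""
    d = dims
    return ((((r * d["realization"] + m) * d["level"] + lev)
             * d["lat"] + y) * d["lon"] + x)
```

Because longitude varies fastest, reading one complete 2-D grid for a given (reftime, member, level) is a single contiguous slice, which suits the converter's member-by-member population step.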
Slide 14: Tool Performance
- GRIB-2 simple packing to NetCDF 32-bit: ~2x the GRIB-2 size
- GRIB-2 simple packing to NetCDF 16-bit: similar size
- GRIB-2 JPEG 2000 to NetCDF 32-bit: ~8x the GRIB-2 size
- GRIB-2 JPEG 2000 to NetCDF 16-bit: ~4x the GRIB-2 size
- Issue: packing of 4D fields (e.g. 2D + levels + time steps)
  - Packing in NetCDF is similar to simple packing in GRIB2: value = scale_factor * packed_value + add_offset
  - All dimensions share the same scale_factor and add_offset
  - For 16 bits, only 65536 different values can be encoded; this is a problem if there is a lot of variation across the 4D matrix
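The packing issue can be made concrete with a small sketch of 16-bit simple packing using the formula above. The temperature values are illustrative; the point is that one shared (scale_factor, add_offset) pair over a wide value range forces a coarse quantization step.

```python
# Sketch of NetCDF-style 16-bit simple packing:
#   value = scale_factor * packed_value + add_offset
# with ONE (scale_factor, add_offset) pair shared by the whole field.
def pack_params(values, nbits=16):
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (2 ** nbits - 1) if hi > lo else 1.0
    return scale, lo  # (scale_factor, add_offset)

def pack(v, scale, offset):
    return round((v - offset) / scale)

def unpack(p, scale, offset):
    return scale * p + offset

# A 4-D field mixing stratospheric and surface temperatures spans ~100 K,
# so the shared 16-bit step is ~100/65535 K for every value in the field.
values = [190.0, 230.0, 273.15, 290.0]
scale, offset = pack_params(values)
err = max(abs(unpack(pack(v, scale, offset), scale, offset) - v)
          for v in values)
```

GRIB2 avoids this by packing each 2-D field separately, so each level and time step gets its own scale and offset; a single NetCDF 4-D array cannot, which is the problem the slide describes.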
Slide 15: GRIB2
- WMO standard
- Fine control over the numerical accuracy of grid values
- Good compression (lossless JPEG 2000)
- GRIB is a record format: many GRIB messages can be written in a single file
- GRIB edition 2 is template-based: it can easily be extended
Slide 16: NetCDF
- Work on the converter gave us a good understanding of both formats
- NetCDF is a file format:
  - Merging/splitting NetCDF files is non-trivial
- Need to agree on a convention (CF):
  - Only lat/lon and reduced grids (?) so far; work in progress to add other grids to CF
  - There is no way to support multiple grids in the same file
- Need to choose a convention for multiple fields per NetCDF file:
  - All levels? All variables? All time steps?
- Simple packing is possible, but only as a convention
  - 2 to 8 times larger than GRIB2
Slide 17: Conclusion
- True interoperability requires:
  - Common data format and units
  - Clear definition of the parameters (semantics)
  - Common tools (the only guarantee of true interoperability)
- Strong governance is needed
- GRIB2 vs NetCDF: different usage patterns
  - NetCDF: file-based, little compression, a convention must be agreed
  - GRIB2: record-based, easier to manage large volumes, WMO standard