Metadata Standards for Gridded Climate Data in the Earth System Grid Robert Drach LLNL/PCMDI UCRL-PRES-149779.


1 Metadata Standards for Gridded Climate Data in the Earth System Grid Robert Drach LLNL/PCMDI UCRL-PRES-149779

2 Overview
Drach, Sept. 10, 2002
I. Earth System Grid: Grid Access to Climate Research Data
II. Metadata Standards for Gridded Climate Data

3 Part I ESG: Grid Access to Climate Research Data

4 Earth System Grid Overview
• The goal of ESG is to make climate data, particularly climate model data, an easily accessible community resource. The project is funded by the SciDAC (Scientific Discovery through Advanced Computing) program.
• Enabling researchers to understand and make effective use of very large, distributed climate datasets is critical. The broad strategy is to develop a collection of server-side capabilities that minimize the amount of data movement.
• Multiple interfaces to ESG will let researchers focus on science rather than on data transfer, data formats, and dataset manipulation.
• The foundation is Globus Grid technology.

5 ESG uses Globus Grid technology
• Globus middleware links distributed data archives, supercomputers, workstations, and local disk caches into data/computational grids.
• GridFTP: a high-performance, secure, robust data transfer mechanism (protocol, server, client library).
• ESG is integrating OpenDAP (the DODS protocol) with the GridFTP protocol.
• Single sign-on using the Grid Security Infrastructure
  - Proxy certificates
  - Community Authorization Service (CAS)
• Replica Location Service: manages copying and placement of files in a distributed environment.
  - Logical vs. physical files
• http://www.globus.org
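
The Replica Location Service separates a file's logical name from the physical copies stored at different sites. The toy catalog below is a minimal sketch of that mapping, assuming a simple "prefer these hosts" selection rule; the class, its methods, and all hosts and URLs are hypothetical and are not the Globus RLS API.

```python
from collections import defaultdict
from urllib.parse import urlparse


class ReplicaCatalog:
    """Hypothetical logical-name -> physical-copies catalog (not the RLS API)."""

    def __init__(self):
        # logical file name -> list of physical URLs holding a copy
        self._replicas = defaultdict(list)

    def register(self, logical_name, physical_url):
        """Record that a copy of logical_name exists at physical_url."""
        if physical_url not in self._replicas[logical_name]:
            self._replicas[logical_name].append(physical_url)

    def lookup(self, logical_name):
        """Return every known physical copy of a logical file."""
        return list(self._replicas.get(logical_name, []))

    def choose(self, logical_name, preferred_hosts):
        """Return the first copy on a preferred host, else any copy, else None."""
        copies = self.lookup(logical_name)
        for host in preferred_hosts:
            for url in copies:
                if urlparse(url).hostname == host:
                    return url
        return copies[0] if copies else None


catalog = ReplicaCatalog()
catalog.register("pcm/run1/tas.nc", "gsiftp://archive.example.org/data/tas.nc")
catalog.register("pcm/run1/tas.nc", "gsiftp://cache.example.edu/tmp/tas.nc")
print(catalog.choose("pcm/run1/tas.nc", ["cache.example.edu"]))
```

Clients ask for the logical name, so copies can be added or moved without changing how data is requested.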

6 ESG: U.S. Collaborations & Development
• ORNL: Climate storage & computational resources
• ANL: Computational grids & grid-based applications
• USC/ISI: Computational grids & grid-based applications
• NCAR: Climate change prediction and scenarios
• LBNL: Climate storage facility and access
• LLNL: Model diagnostics & intercomparison

7 Program for Climate Model Diagnosis and Intercomparison
• Validation and intercomparison of atmospheric general circulation models and coupled ocean-atmosphere models
• Development of analysis software; quality control, archiving, and distribution of model results. Climate Data Analysis Tools (CDAT) is a Python-based analysis and visualization system.
• Global warming detection studies
• CMIP (coupled models) and AMIP (atmospheric GCMs) gather model simulation results from thirty modeling groups worldwide.

8 PCMDI and Model Development
[Diagram. Labels: Modeling groups; Controlled simulation runs; Simulation data; PCMDI (diagnosis, quality control, data archival); Feedback to modelers; Observations; Data assimilation; Gridded observation data]

9 ESG-II Architecture
[Diagram layers: Portals, Servers, Middleware]

10 ESG: Metadata Services
[Diagram. Layers: ESG clients (API & user interfaces); high level metadata services; core metadata services; metadata holdings. Components: metadata extraction, display, browsing, query, annotation, validation, aggregation, and discovery; metadata access (update, insert, delete, query); service translation library; metadata & data registration; publishing; search & discovery; administration; browsing & display; analysis & visualization. Metadata holdings: data & metadata catalog, Dublin Core database, CF database mirror, Dublin Core XML files, comments XML files.]
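
Among the metadata holdings are Dublin Core XML files. As a rough sketch of what such a file might contain, the function below serializes one dataset description using the standard Dublin Core element namespace and checks names against the 15 Dublin Core 1.1 elements. The wrapper element name `record`, the function name, and the sample values are assumptions, not ESG's actual schema.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
}
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Serialize {Dublin Core element name: text} as XML.

    Unknown element names are rejected (a simple form of metadata validation).
    The wrapper element name "record" is a hypothetical choice.
    """
    unknown = set(fields) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    record = ET.Element("record")
    for name, value in fields.items():
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(record, encoding="unicode")


print(dublin_core_record({"title": "Sample control run", "format": "netCDF"}))
```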

11 ESG is leveraging existing software and projects
• OpenDAP (DODS): Distributed Oceanographic Data System (Unidata)
  - Integration of Globus GridFTP with DODS data access
• THREDDS: THematic Real-time Environmental Distributed Data Services (Unidata)
• LAS: Live Access Server (NOAA Pacific Marine Environmental Laboratory)
  - Works with CDAT, Ferret, GrADS, …
• CDAT: Climate Data Analysis Tools (PCMDI), including CDMS (Climate Data Management System) and VCDAT visualization
• Community Data Portal project (NCAR)
• NCL (NCAR)
• Globus Grid technology (ANL, ISI): GridFTP, CAS (Community Authorization Service)

12 CDAT: Example of ESG Access from a GUI Client

13 LAS/CDAT: Example of a Web-based Data Portal
• Technology: web-based (an end-user requirement); LAS, DODS, ESG (i.e., Globus), CDAT
• The portal should hide or simplify the Grid for users:
  - Single sign-on
  - Community-based authorization
  - Simplified resource location
  - Remote job submission and management
• Accesses the ESG Grid testbed

14 Part II Metadata Standards for Gridded Climate Data

15 Climate Model Datasets
• Most climate simulation data are gridded datasets: collections of variables that are functions of longitude, latitude, time, and vertical level.
• A dataset is a logical container:
  - A file
  - An aggregation of files
  - A collection of database tables
• Model-generated data
  - Model data
  - Derived data: zonal averages, global averages, virtual variables
• Observational data, including reanalyses
• Attributes in the form of (name, value) pairs; values may be arrays

16 Binary formats
• These formats are a suitable basis for storing data, but lack the metadata needed to support certain application requirements.
• netCDF (UCAR)
  - array data model
  - flexible attribute/value metadata model
  - simple API
• HDF (NCSA, NASA)
  - collection of APIs that can be tailored to specific data models, including scientific data sets, satellite data, and point data
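
The netCDF array data model can be summarized as named dimensions, variables defined over an ordered list of those dimensions, and (name, value) attributes. The sketch below captures just that structure in memory; the class and method names are hypothetical and this is not the netCDF library's API. It also applies the netCDF convention that a 1-D variable named after its own dimension is a coordinate variable.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    name: str
    dimensions: tuple                               # ordered dimension names
    attributes: dict = field(default_factory=dict)  # (name, value) pairs


@dataclass
class Dataset:
    dimensions: dict = field(default_factory=dict)  # dimension name -> length
    variables: dict = field(default_factory=dict)   # variable name -> Variable
    attributes: dict = field(default_factory=dict)  # global attributes

    def add_dimension(self, name, length):
        self.dimensions[name] = length

    def add_variable(self, name, dimensions, **attributes):
        """Define a variable over existing dimensions; attribute values may be arrays."""
        missing = [d for d in dimensions if d not in self.dimensions]
        if missing:
            raise KeyError(f"undefined dimensions: {missing}")
        var = Variable(name, tuple(dimensions), dict(attributes))
        self.variables[name] = var
        return var

    def shape(self, name):
        """Array shape of a variable, derived from its dimension lengths."""
        return tuple(self.dimensions[d] for d in self.variables[name].dimensions)

    def coordinate_variables(self):
        """1-D variables that share the name of their only dimension."""
        return [v.name for v in self.variables.values() if v.dimensions == (v.name,)]


ds = Dataset()
for dim, length in [("time", 12), ("lat", 73), ("lon", 144)]:
    ds.add_dimension(dim, length)
ds.add_variable("lat", ["lat"], units="degrees_north")
ds.add_variable("tas", ["time", "lat", "lon"], units="K", valid_range=[150.0, 350.0])
print(ds.shape("tas"), ds.coordinate_variables())
```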

17 Binary formats
• GRIB (WMO, ECMWF, NCEP)
  - mixed sequential/array data model
  - tailored for simulation output; supports common horizontal grid types
  - hardwired metadata model
  - good compression capabilities
  - lacks a standard API
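
Much of GRIB's compression comes from "simple packing": each value Y is stored as an n-bit unsigned integer X, with Y × 10^D = R + X × 2^E, where R is a reference value (the scaled minimum), E a binary scale factor, and D a decimal scale factor. The sketch below shows only that arithmetic; it does not produce GRIB sections or bit streams, and the function names and default parameters are illustrative choices.

```python
import math


def simple_pack(values, decimal_scale=0, nbits=16):
    """Return (R, E, X) such that value ~= (R + X * 2**E) / 10**decimal_scale."""
    scaled = [v * 10**decimal_scale for v in values]
    ref = min(scaled)
    span = max(scaled) - ref
    max_int = 2**nbits - 1
    # Smallest integer E for which span / 2**E fits in nbits unsigned bits.
    E = 0 if span == 0 else math.ceil(math.log2(span / max_int))
    packed = [min(max_int, round((s - ref) / 2**E)) for s in scaled]
    return ref, E, packed


def simple_unpack(ref, E, packed, decimal_scale=0):
    """Invert simple_pack; error per value is at most 2**(E-1) / 10**decimal_scale."""
    return [(ref + x * 2**E) / 10**decimal_scale for x in packed]


R, E, X = simple_pack([270.0, 280.5, 300.25], decimal_scale=2, nbits=16)
print(E, X, simple_unpack(R, E, X, decimal_scale=2))
```

Lowering `nbits` or `decimal_scale` shrinks the data at the cost of precision; the fixed metadata fields (R, E, D, nbits) are part of GRIB's hardwired metadata model.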

18 Coordinate Systems
• Self-describing binary formats are flexible, but they underconstrain the representation of coordinate systems.
[Diagram labels: Index Space, Variable Space, Coordinate Space, Coordinate System. Coordinate system: Time(i), Latitude(j,k), Longitude(j,k). V = Temperature(Time, Latitude, Longitude); V' = Temperature(i,j,k)]
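
One way to read the diagram: a file stores V' = Temperature(i,j,k) in index space, and the coordinate system (Time(i), Latitude(j,k), Longitude(j,k)) maps every index tuple to a physical location, giving V in coordinate space. The sketch below makes that mapping explicit with nested lists; the function names and sample numbers are made up for illustration.

```python
def coordinates_at(i, j, k, time, latitude, longitude):
    """Physical coordinates (time, lat, lon) of grid index (i, j, k)."""
    return time[i], latitude[j][k], longitude[j][k]


def to_coordinate_space(values, time, latitude, longitude):
    """Yield (time, lat, lon, value) for every index (i, j, k) of values[i][j][k]."""
    for i, plane in enumerate(values):
        for j, row in enumerate(plane):
            for k, v in enumerate(row):
                yield (*coordinates_at(i, j, k, time, latitude, longitude), v)


time = [0.0, 30.0]                            # Time(i)
latitude = [[10.0, 10.5], [11.0, 11.6]]       # Latitude(j, k): varies along k too
longitude = [[200.0, 201.0], [199.8, 200.9]]  # Longitude(j, k)
temperature = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
for record in to_coordinate_space(temperature, time, latitude, longitude):
    print(record)
```

Because latitude depends on both j and k, an application cannot recover a point's location from the index alone; it needs the metadata that links the variable to its coordinate arrays.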

19 Horizontal Grids
• Curvilinear grid: Los Alamos POP ocean model
  Temperature(i,j)
  Latitude(i,j)
  Longitude(i,j)
  Lat_bounds(i,j,4)
  Lon_bounds(i,j,4)
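
On a curvilinear grid such as POP's, latitude and longitude are 2-D arrays, so "which cell contains this point?" cannot be answered by indexing a 1-D axis. A minimal sketch, assuming a brute-force nearest-point search by great-circle (haversine) distance is acceptable; real tools would typically use cell bounds or a spatial index instead.

```python
import math


def great_circle_distance(lat1, lon1, lat2, lon2):
    """Angular distance in radians between two points given in degrees (haversine)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * math.asin(min(1.0, math.sqrt(a)))


def nearest_cell(latitude, longitude, target_lat, target_lon):
    """Return (i, j) of the grid point in Latitude(i,j), Longitude(i,j) closest to the target."""
    best, best_ij = None, None
    for i, (lat_row, lon_row) in enumerate(zip(latitude, longitude)):
        for j, (lat, lon) in enumerate(zip(lat_row, lon_row)):
            d = great_circle_distance(lat, lon, target_lat, target_lon)
            if best is None or d < best:
                best, best_ij = d, (i, j)
    return best_ij


lat2d = [[0.0, 0.0], [10.0, 10.0]]
lon2d = [[0.0, 10.0], [0.0, 10.0]]
print(nearest_cell(lat2d, lon2d, 9.0, 9.0))
```

The haversine form is periodic in longitude, so a grid point at 350E is correctly treated as close to a target at 5W.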

20 Horizontal Grids
• Reduced grid
  Temperature(i,j)
  Latitude(i)
  Longitude(i,j)
  Lat_bounds(i,2)
  Lon_bounds(i,j,4)
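
In a reduced grid each latitude row i has its own number of points, so Longitude depends on both i and j while Latitude depends only on i. The sketch below builds equally spaced longitudes for each row and, as one possible storage choice (an assumption; the slide does not say how ragged rows are stored), pads short rows with a fill value so Temperature(i,j) and Longitude(i,j) fit rectangular arrays. The row counts are hypothetical.

```python
def reduced_grid_longitudes(points_per_row, first_lon=0.0):
    """Longitude(i, j): row i has points_per_row[i] equally spaced points (degrees east)."""
    return [[first_lon + j * 360.0 / n for j in range(n)] for n in points_per_row]


def pad_rows(rows, fill=None):
    """Pad ragged rows to equal length with a fill value (hypothetical storage layout)."""
    width = max(len(r) for r in rows)
    return [r + [fill] * (width - len(r)) for r in rows]


points_per_row = [4, 8, 8, 4]  # hypothetical: fewer points near the poles
lons = reduced_grid_longitudes(points_per_row)
print(sum(len(r) for r in lons), "points vs", len(lons) * max(points_per_row), "on a full grid")
print(pad_rows(lons)[0])
```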

21 Horizontal Grids
• General grid: Colorado State geodesic grid
  Temperature(npts)
  Latitude(npts)
  Longitude(npts)
  Lat_bounds(npts,6)
  Lon_bounds(npts,6)
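
On a general grid the cells are simply numbered 1..npts, and each cell carries its own vertex list, hence Lat_bounds(npts,6). Assuming the geodesic grid is built by recursively bisecting the edges of an icosahedron (the usual construction; details of the Colorado State grid are not given here), the cell count after n bisections is 10·4^n + 2, of which exactly 12 cells are pentagons and the rest hexagons. How the sixth bound slot is filled for pentagons is not stated on the slide.

```python
def geodesic_cell_count(level):
    """Cells on an icosahedral geodesic grid after `level` recursive bisections."""
    return 10 * 4**level + 2


def cell_shapes(level):
    """(pentagons, hexagons): the 12 original icosahedron vertices stay pentagonal."""
    total = geodesic_cell_count(level)
    return 12, total - 12


for level in range(4):
    print(level, geodesic_cell_count(level), cell_shapes(level))
```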

22 Spatial/temporal location
• Applications must be able to recognize the spatial/temporal coordinate axes.
  - Visualization: continental overlays
  - Data: selection by axis type

    import cdms

    f = cdms.open('sample.nc')
    temperature = f['temperature']
    # Select by coordinate value on the axis that CDMS recognizes as latitude
    data = temperature(latitude=(-45.0, 45.0))
    f.close()
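
Selecting `latitude=(-45.0, 45.0)` means translating a range of coordinate values into a range of array indices (a hyperslab). The sketch below does that translation for a 1-D axis, assuming the axis is strictly monotonic (ascending or descending), so the matching indices are contiguous. It is an illustration of the idea, not how CDMS implements selection; the function name is hypothetical.

```python
def coord_range_to_slice(axis_values, lo, hi):
    """Return slice(start, stop) covering indices whose coordinate lies in [lo, hi].

    Assumes axis_values is strictly monotonic, so matching indices are contiguous.
    """
    idx = [i for i, v in enumerate(axis_values) if lo <= v <= hi]
    if not idx:
        return slice(0, 0)
    return slice(idx[0], idx[-1] + 1)


latitudes = [-90.0, -60.0, -30.0, 0.0, 30.0, 60.0, 90.0]
rows = [[float(i)] * 4 for i in range(len(latitudes))]  # stand-in data(lat, lon)
s = coord_range_to_slice(latitudes, -45.0, 45.0)
print(s, latitudes[s], rows[s])
```

Working in coordinate values rather than indices lets the same request work on datasets whose latitude axes run north-to-south or use different resolutions.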

23 Time representation and calendars
• Climate simulations use different types of calendars:
  - 'proleptic' Gregorian
  - Julian
  - Mixed Gregorian/Julian
  - No leap years (noleap)
  - 30-day months
• Climatologies represent multi-year averages.
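
The same time value decodes to different dates under different calendars. The sketch below decodes an integer "days since" offset from a reference date (year, month, day) under a no-leap calendar, a 30-day-month (360-day) calendar, and, for contrast, the proleptic Gregorian calendar that Python's `datetime.date` uses. The function name is hypothetical, the calendar names follow later CF usage (an assumption relative to this talk), and fractional days and the Julian/mixed calendars are not handled.

```python
from datetime import date, timedelta

NOLEAP_MONTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def decode_days(days, ref, calendar):
    """Return (year, month, day) reached by adding integer `days` to ref."""
    if calendar == "proleptic_gregorian":
        d = date(*ref) + timedelta(days=days)
        return d.year, d.month, d.day
    if calendar in ("noleap", "365_day"):
        lengths = NOLEAP_MONTHS
    elif calendar == "360_day":
        lengths = [30] * 12
    else:
        raise ValueError(f"unsupported calendar: {calendar}")
    days_per_year = sum(lengths)
    year, month, day = ref
    # Zero-based day of year of the reference date, plus the offset.
    doy = sum(lengths[: month - 1]) + (day - 1) + days
    year += doy // days_per_year  # floor division also handles negative offsets
    doy %= days_per_year
    month = 1
    for length in lengths:
        if doy < length:
            break
        doy -= length
        month += 1
    return year, month, doy + 1


for cal in ("proleptic_gregorian", "noleap", "360_day"):
    print(cal, decode_days(59, (2000, 1, 1), cal))
```

Day 59 after 2000-01-01 is 29 February under the Gregorian rules, 1 March in a no-leap model year, and "30 February" in a 360-day year, which is why calendar metadata must travel with model time axes.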

24 Metadata conventions
• Several conventions have been developed to augment the netCDF data model.
• They represent a balance between the needs of data producers and data consumers.
• COARDS convention
  - 1-D coordinate axes; rectilinear horizontal grids
  - axis identification based on units
  - variables limited to four dimensions
  - fixed ordering of dimensions
  - http://ferret.wrc.noaa.gov/noaa_coop/coop_cdf_profile.html
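
Under COARDS an application identifies an axis from its units string. A minimal sketch, assuming case-insensitive matching and a short, incomplete list of pressure units for the vertical axis (both simplifications); the latitude/longitude spellings are those accepted by COARDS/CF, and time units take the form "<unit> since <reference date>".

```python
LAT_UNITS = {"degrees_north", "degree_north", "degree_n", "degrees_n", "degreen", "degreesn"}
LON_UNITS = {"degrees_east", "degree_east", "degree_e", "degrees_e", "degreee", "degreese"}
TIME_UNITS = {"second", "seconds", "minute", "minutes", "hour", "hours", "day", "days"}
PRESSURE_UNITS = {"pa", "hpa", "mbar", "millibar"}  # assumption: illustrative subset


def axis_from_units(units):
    """Guess the axis type of a coordinate variable from its units attribute."""
    u = units.strip().lower()
    if u in LAT_UNITS:
        return "latitude"
    if u in LON_UNITS:
        return "longitude"
    words = u.split()
    if len(words) >= 3 and words[0] in TIME_UNITS and words[1] == "since":
        return "time"
    if u in PRESSURE_UNITS:
        return "vertical"
    return None


for units in ["degrees_north", "degrees_E", "days since 1979-01-01", "hPa", "K"]:
    print(units, "->", axis_from_units(units))
```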

25 Metadata conventions
• CF (Climate and Forecast) convention
  - based on earlier conventions, COARDS and GDT
  - multidimensional coordinates (auxiliary coordinate variables)
  - simplified axis identification
  - specific representation for several horizontal grid types: rectilinear, curvilinear, and reduced grids
  - variables can have an arbitrary number of dimensions
  - no constraint on the ordering of dimensions
  - non-Gregorian calendars
  - standard name table
  - http://www.cgd.ucar.edu/cms/eaton/cf-metadata/
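
In CF, a data variable names its auxiliary (for example 2-D) coordinate variables in a blank-separated `coordinates` attribute, and an axis can be identified directly by an `axis` attribute (X, Y, Z, T) or by its `standard_name`. The sketch below applies those two rules to attribute dictionaries; the function names, the precedence order, and the sample variables are assumptions for illustration.

```python
def auxiliary_coordinates(var_attrs, variables):
    """Names listed in a variable's `coordinates` attribute that exist in the dataset."""
    names = var_attrs.get("coordinates", "").split()
    return [n for n in names if n in variables]


def identify_axis(attrs):
    """Prefer an explicit `axis` attribute, else fall back to standard_name."""
    axis = attrs.get("axis")
    if axis in {"X", "Y", "Z", "T"}:
        return axis
    by_standard_name = {"longitude": "X", "latitude": "Y", "time": "T"}
    return by_standard_name.get(attrs.get("standard_name"))


variables = {
    "temp": {"units": "K", "coordinates": "lat lon"},
    "lat": {"standard_name": "latitude", "units": "degrees_north"},
    "lon": {"standard_name": "longitude", "units": "degrees_east"},
    "time": {"axis": "T", "units": "days since 1900-01-01"},
}
for name in auxiliary_coordinates(variables["temp"], variables):
    print(name, identify_axis(variables[name]))
```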

26 Comparability of quantities
• The ability to recognize comparable quantities is fundamental to model intercomparison.
• CF defines a schema for standard name tables.
  - An XML representation is used for the table of standard variable names and descriptions.
  - The standard_name attribute is optional; there is no restriction on variable names.
• Relationship to ontology development?
[Excerpt from a standard name table. Labels: Program for Climate Model Diagnosis and Intercomparison; support@pcmdi.llnl.gov; Pa; "Pressure defined at the level of the mean topography within the grid box."; air_pressure_at_sea_level]
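
Because variable names are unrestricted, two models may call sea-level pressure `psl` and `slp`; the standard_name attribute is what makes them comparable. The sketch below loads a small table shaped like the CF XML standard name table (`entry` elements with an `id` and `canonical_units`; the real table has more fields) and pairs variables from two models by standard name. The two-entry table, the model variable names, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET

TABLE_XML = """<standard_name_table>
  <entry id="air_pressure_at_sea_level"><canonical_units>Pa</canonical_units></entry>
  <entry id="air_temperature"><canonical_units>K</canonical_units></entry>
</standard_name_table>"""


def load_table(xml_text):
    """Map standard name -> canonical units."""
    root = ET.fromstring(xml_text)
    return {e.get("id"): e.findtext("canonical_units") for e in root.iter("entry")}


def match_by_standard_name(vars_a, vars_b, table):
    """Pair variable names from two models that share a standard_name in the table.

    Variables without a standard_name (the attribute is optional) cannot be matched.
    """
    index_b = {}
    for name, attrs in vars_b.items():
        sn = attrs.get("standard_name")
        if sn in table:
            index_b.setdefault(sn, []).append(name)
    return [(name, other)
            for name, attrs in vars_a.items()
            for other in index_b.get(attrs.get("standard_name"), [])]


table = load_table(TABLE_XML)
model_a = {"psl": {"standard_name": "air_pressure_at_sea_level"}, "t": {"standard_name": "air_temperature"}}
model_b = {"slp": {"standard_name": "air_pressure_at_sea_level"}, "ta": {"standard_name": "air_temperature"}}
print(match_by_standard_name(model_a, model_b, table))
```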

27 ESG metadata
• ESG has adopted the netCDF data model and the CF convention as standards.
• Other standards and conventions will follow.
  - NcML markup language

28 Aggregation
• CF and NcML apply to data aggregates as well as to files.
• Data aggregation: collections of files/datasets are treated as single entities.
  - array model
  - netCDF-like
  - tailored for extraction of 'hyperslabs' of data
• Aspects of aggregation:
  - combining/merging variables
  - joining variables
  - creating new coordinate axes
  - overlaying/adding metadata
  - nesting datasets
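
As an illustration of how an aggregation can be written down in NcML, the sketch below generates a "joinExisting" aggregation, which joins files along an existing dimension. It uses the element and attribute names of the later NcML 2.2 schema (`aggregation`, `dimName`, `type`, nested `netcdf` elements with `location`), which postdates this talk, so treat it as an assumption about syntax rather than what ESG used in 2002. In that schema, "union" roughly corresponds to merging variables and "joinNew" to creating a new coordinate axis. The file names are hypothetical.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
ET.register_namespace("", NCML_NS)  # serialize NcML elements in the default namespace


def join_existing_ncml(locations, dim_name="time"):
    """Describe files that are joined along an existing dimension as one dataset."""
    root = ET.Element(f"{{{NCML_NS}}}netcdf")
    agg = ET.SubElement(root, f"{{{NCML_NS}}}aggregation",
                        dimName=dim_name, type="joinExisting")
    for loc in locations:
        ET.SubElement(agg, f"{{{NCML_NS}}}netcdf", location=loc)
    return ET.tostring(root, encoding="unicode")


print(join_existing_ncml(["tas_1980-1989.nc", "tas_1990-1999.nc"]))
```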

29 Aggregation
• Aggregation maps well to multifile datasets: a multifile dataset can be thought of as 'partitioned' into files, and variables may 'span' multiple files.
• Usually a dataset is partitioned along the time and/or vertical level axes.
• PCMDI CDAT supports aggregation via the cdscan utility, which uses an XML representation.
• THREDDS/DODS aggregation server (http://www.unidata.ucar.edu/projects/THREDDS/)
[Diagram axes: Time, Level, Variable]
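
When a variable is partitioned along time across several files, reading a hyperslab of the aggregate means mapping global time indices to (file, local index) pairs. A minimal sketch, assuming the per-file time-step counts are known (for example from an aggregation catalog); this is not how cdscan or the THREDDS server are implemented, and the function names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def cumulative_counts(counts):
    """counts[f] = number of time steps stored in file f."""
    return list(accumulate(counts))


def locate(global_index, cumulative):
    """Map a time index of the aggregated variable to (file number, local index)."""
    if not 0 <= global_index < cumulative[-1]:
        raise IndexError(global_index)
    f = bisect_right(cumulative, global_index)
    start = cumulative[f - 1] if f > 0 else 0
    return f, global_index - start


def split_range(start, stop, cumulative):
    """Split the global range [start, stop) into (file, local_start, local_stop) pieces."""
    pieces, offset = [], 0
    for f, end in enumerate(cumulative):
        lo, hi = max(start, offset), min(stop, end)
        if lo < hi:
            pieces.append((f, lo - offset, hi - offset))
        offset = end
    return pieces


cum = cumulative_counts([12, 12, 6])  # e.g. two full years and half a year of monthly data
print(locate(12, cum), split_range(10, 26, cum))
```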

30 Summary
• The Earth System Grid project is developing metadata services to support a variety of schemas and conventions.
• The initial focus of ESG is to enable climate researchers to make effective use of distributed, model-generated datasets.
• The netCDF schema and the CF convention are the foundation for representing this data.

