Best Practices to Promote Data Interoperability Chris Lynnes Joe Glassy Technology Infusion Working Group.


1 Best Practices to Promote Data Interoperability Chris Lynnes Joe Glassy Technology Infusion Working Group

2 Outline
Data interoperability: what and why?
Factors affecting data interoperability
Implementations that support interoperability

3 What is Data Interoperability?
Data interoperability exists when a data user is able to work with (view, analyze, process, etc.) a data provider's science data or model output “transparently,” without having to reformat the data, write special tools to read or extract the data, or rely on specific proprietary software.
Quicker data usability, easier portability, more transparency – S. Volz

4 Illustration: Panoply
DATASET COMPARISON
– North American Regional Reanalysis (NARR) from NCDC
– Atmospheric Infrared Sounder (AIRS) from GES DISC
PROCEDURE
1. Cut and paste NARR OPeNDAP URL
2. Double-click variable to display
3. Repeat for AIRS

5 What good is data interoperability?
Makes it easier to write tools that work with many datasets...
...Which increases the ability to work with multiple datasets together...
...And promotes user satisfaction and early experiences with ({your|my|our} data)...
...Which enhances a dataset’s life-cycle economics.

6 FACTORS AFFECTING DATA INTEROPERABILITY
There is no single path to interoperability…

7 File Formats
Standard formats
– More economical to develop general tools
– Format is well documented
– APIs* exist
– Many datasets enabled by one set of code modules
“Self-describing” formats
– Contain embedded metadata to interpret the content, context, and/or structure of the file
*Application Programming Interfaces

8 File Structures
Coordinates: where and named how?
– Latitude, longitude
– Vertical dimension: altitude, pressure, sigma level, depth,...
– Time
Flat vs. hierarchical
Simple vs. complex
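The flat vs. hierarchical distinction can be illustrated with a minimal sketch. It assumes a file's layout is represented as nested Python dictionaries (groups) whose leaves are variables; the group and variable names are hypothetical. Flattening yields path-style names, and the helper reports leaf names that repeat across branches, which a tool that only looks at the leaf name could not tell apart.

```python
from collections import defaultdict
from typing import Dict, List


def flatten(tree: Dict[str, object], prefix: str = "") -> List[str]:
    """Return path-style names for every variable in a nested layout."""
    paths: List[str] = []
    for name, child in tree.items():
        path = f"{prefix}/{name}"
        if isinstance(child, dict):      # a group: descend into it
            paths.extend(flatten(child, path))
        else:                            # a variable: record its full path
            paths.append(path)
    return paths


def duplicate_leaf_names(paths: List[str]) -> Dict[str, List[str]]:
    """Group paths by final component and keep names used more than once."""
    by_leaf: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        by_leaf[path.rsplit("/", 1)[-1]].append(path)
    return {leaf: ps for leaf, ps in by_leaf.items() if len(ps) > 1}


# Hypothetical hierarchical file layout (leaf values are just unit strings).
layout = {
    "ascending": {"Temperature": "K", "Latitude": "degrees_north"},
    "descending": {"Temperature": "K"},
}
flat_names = flatten(layout)
collisions = duplicate_leaf_names(flat_names)
```

Here `collisions` lists both `Temperature` paths, the kind of ambiguity a flat structure with unique names avoids.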

9 Usage Metadata
Inside file vs. separate file
– Easy for users to lose a separate file
– A key benefit of self-describing formats
Variable-level metadata
– Units
– Fill Value
– Scale / offset
File-level metadata
Standards (e.g., CF-1, HDF-EOS, ISO 19115)
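To show why fill value and scale / offset metadata matter to a reader of the data, here is a minimal sketch of unpacking stored integers into physical values. It assumes the CF-style convention that the physical value is the stored value times the scale factor plus the offset; the function name, the sample numbers, and the fill value of -9999 are illustrative assumptions, not taken from any particular dataset.

```python
from typing import List, Optional


def unpack(raw: List[int], fill_value: int,
           scale_factor: float = 1.0,
           add_offset: float = 0.0) -> List[Optional[float]]:
    """Convert stored (packed) values to physical values.

    Fill values become None; every other value becomes
    raw * scale_factor + add_offset (the CF-style ordering).
    """
    return [None if v == fill_value else v * scale_factor + add_offset
            for v in raw]


# Hypothetical temperature variable stored as scaled integers.
stored = [27315, -9999, 28015]
kelvin = unpack(stored, fill_value=-9999, scale_factor=0.01)
```

Without the three metadata items, a user would read 27315 and -9999 as physical values; with them embedded in the file, a general tool can apply the conversion automatically.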

10 Grids
Common grids enable dataset comparison, merging, etc.
Reprojection from one grid to another usually loses information
Tradeoff
– Most appropriate grid for a dataset vs....
– ...most commonly used grid in the “community”
– Keep in mind that the potential community may be much broader than you think
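The information loss from regridding can be seen in a deliberately simplified sketch. It is one-dimensional, uses block averaging to go to a coarser grid and nearest-neighbour copying to come back, and assumes the number of cells is a multiple of the coarsening factor. Real map reprojection is more involved; the values are made up.

```python
from typing import List


def coarsen(values: List[float], factor: int) -> List[float]:
    """Average each run of `factor` fine cells into one coarse cell.

    Assumes len(values) is a multiple of `factor`.
    """
    return [sum(values[i:i + factor]) / factor
            for i in range(0, len(values), factor)]


def refine(values: List[float], factor: int) -> List[float]:
    """Copy each coarse cell into `factor` fine cells (nearest neighbour)."""
    return [v for v in values for _ in range(factor)]


fine = [1.0, 3.0, 2.0, 6.0]          # made-up values on a fine grid
coarse = coarsen(fine, 2)            # [2.0, 4.0]
round_trip = refine(coarse, 2)       # [2.0, 2.0, 4.0, 4.0]
information_lost = round_trip != fine
```

The mean is preserved, but the variation within each coarse cell cannot be recovered, which is why the choice of native grid is a tradeoff.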

11 Names and Units
Variable names
– Standard names (CF-1)
– Unique names within file
  Some tools have difficulty with hierarchies having variables with the same name in different branches
Dimension / coordinate names
– Latitude, longitude, time, altitude/pressure
Unit names
– Standard units
– Unit conversion
  Note that converting between altitude and pressure requires additional information
Filenames
– Descriptive filenames: dataset, version, data date/time…
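The note about altitude and pressure can be made concrete with a rough sketch. Unlike a pure unit conversion, it needs a model of the atmosphere: here an isothermal atmosphere with an assumed scale height of 7000 m and an assumed reference pressure of 1013.25 hPa. Both constants and the function name are assumptions chosen for illustration; a real conversion would use a temperature profile or a standard atmosphere.

```python
import math

SEA_LEVEL_PRESSURE_HPA = 1013.25   # assumed reference pressure
SCALE_HEIGHT_M = 7000.0            # assumed isothermal scale height


def pressure_to_altitude(pressure_hpa: float,
                         p0_hpa: float = SEA_LEVEL_PRESSURE_HPA,
                         scale_height_m: float = SCALE_HEIGHT_M) -> float:
    """Approximate altitude in metres for a pressure in hPa.

    Uses z = H * ln(p0 / p), which holds only for an isothermal atmosphere.
    """
    if pressure_hpa <= 0:
        raise ValueError("pressure must be positive")
    return scale_height_m * math.log(p0_hpa / pressure_hpa)


# The same pressure level maps to different altitudes under different
# assumptions, which is the "additional information" a user would need.
z_cold = pressure_to_altitude(500.0, scale_height_m=6500.0)
z_warm = pressure_to_altitude(500.0, scale_height_m=8000.0)
```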

12 Sidebar: Data Identifiers
Filenames, even descriptive ones, may not be completely reliable as unique identifiers
Identifiers are ideally embedded within the data file
Uniquely identifying datasets and data files helps:
– Catalog interoperability
– Transparency / provenance
– Citation metrics
See Ruth Duerr’s talk on recommendations for unique identifiers for datasets and granules
Future tools may make use of these embedded identifiers: look up references, get related data...


14 CF-1
Climate and Forecast (CF) metadata convention
– Popular in modeling community
– Extending to point and satellite data
Coordinate system: Key for tool usage
– Latitude + longitude
  Specifications for both regular L3 grids and L2 swaths
– Time, vertical
– Recognizable via units (e.g. “degrees_north”)
Standard variable names: Key for model incorporation
Most often associated with netCDF
– Also applicable in OPeNDAP
– Work is underway to apply to HDF5
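Recognizing coordinates via units, as mentioned above, might look like the following sketch. The unit spellings are the ones the CF conventions list for latitude and longitude; the dictionary-of-attributes representation, the function names, and the sample variable names are hypothetical stand-ins for whatever a real file reader returns.

```python
from typing import Dict, Optional

# Unit spellings that the CF conventions accept for each horizontal axis.
LATITUDE_UNITS = {"degrees_north", "degree_north", "degree_N",
                  "degrees_N", "degreeN", "degreesN"}
LONGITUDE_UNITS = {"degrees_east", "degree_east", "degree_E",
                   "degrees_E", "degreeE", "degreesE"}


def axis_from_units(units: str) -> Optional[str]:
    """Return 'latitude', 'longitude', or None based only on the units."""
    if units in LATITUDE_UNITS:
        return "latitude"
    if units in LONGITUDE_UNITS:
        return "longitude"
    return None


def find_coordinates(variables: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Map variable name -> axis for variables whose units identify them."""
    found: Dict[str, str] = {}
    for name, attributes in variables.items():
        axis = axis_from_units(attributes.get("units", ""))
        if axis is not None:
            found[name] = axis
    return found


# Hypothetical variables: the names differ, but the units still identify them.
variables = {
    "lat": {"units": "degrees_north"},
    "XLONG": {"units": "degree_east"},
    "t2m": {"units": "K"},
}
coordinates = find_coordinates(variables)
```

Because the test is on units rather than variable names, a tool can locate coordinates even when providers name them differently.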

15 OPeNDAP
Open-Source Project for a Network Data Access Protocol
Client-Server framework
– Standard web (GET) request syntax
Remote fine-grained access to data files
Presents a standard data model and “format” to clients
Supports multiple formats on the back end
– HDF, netCDF, ASCII, GRIB, binary
Multiple server implementations
– Hyrax, THREDDS, ERDDAP, GDS, Dapper, PyDAP, TSDS...
Client support in many tools
– IDV, McIDAS-V, GrADS, Matlab, IDL, Ferret, Panoply

16 Web Coverage Service
Client-Server framework
– Open Geospatial Consortium protocol
– Standard web (GET) request syntax
Multiple response formats, including GeoTIFF, netCDF/CF-1 and HDF-EOS
Includes spatial subsetting
BUT:
– Client support is still nascent outside GIS community
– Some datatypes are difficult or impossible to fit into WCS (e.g., limb-scanning profiles)
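As an illustration of the GET request syntax and spatial subsetting, the sketch below assembles a WCS 1.0.0 GetCoverage request using that version's key-value parameters. The endpoint, coverage name, bounding box, and output size are hypothetical; a real server advertises its coverages and supported formats through GetCapabilities and DescribeCoverage.

```python
from typing import Tuple
from urllib.parse import urlencode


def wcs_getcoverage_url(endpoint: str, coverage: str,
                        bbox: Tuple[float, float, float, float],
                        width: int, height: int,
                        fmt: str = "GeoTIFF") -> str:
    """Build a WCS 1.0.0 GetCoverage URL for a lat/lon bounding box.

    bbox is (west, south, east, north) in degrees (EPSG:4326).
    """
    west, south, east, north = bbox
    params = {
        "SERVICE": "WCS",
        "VERSION": "1.0.0",
        "REQUEST": "GetCoverage",
        "COVERAGE": coverage,
        "CRS": "EPSG:4326",
        "BBOX": f"{west},{south},{east},{north}",   # the spatial subset
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    # Keep ':' and ',' readable instead of percent-encoding them.
    return endpoint + "?" + urlencode(params, safe=":,")


# Hypothetical endpoint and coverage name.
url = wcs_getcoverage_url("https://example.org/wcs", "surface_temperature",
                          bbox=(-130, 20, -60, 55), width=700, height=350)
```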

17 Semantic Web
Enables machine recognition of:
– names
– relationships
Effective for:
– Metadata
– Small ASCII data
Use of semantic web to make Earth Science data interoperable is still in its experimental phase

18 Data Tools for Use with Interoperable Data
Panoply
IDV
McIDAS-V
GrADS
Ferret

19 Summary
Data users benefit from data interoperability
– More tools available to handle more datasets
Consider format, structure, grids, metadata and naming
If interoperability cannot be built in at data production, some tools (OPeNDAP, WCS, semantic web) can compensate...
...IF the metadata and information content of the data are sufficient


21 References
Practical Data Interoperability for Earth Scientists
Recommendations for Data Level Interoperability
HDF
HDF-EOS
netCDF
OPeNDAP
CF-1
Web Coverage Service

22 OPeNDAP URL examples
Get metadata in XML:
5/2010/285/AIRS.2010.10.12.090.L2.RetStd.v5.2.2.0.G10286064818.hdf.ddx
Get data slice in ASCII:
5/2010/285/AIRS.2010.10.12.090.L2.RetStd.v5.2.2.0.G10286064818.hdf.ascii?H2OMMRStd[0:1:44][0:1:29][4:1:5]
Data access URL for clients (IDV, Panoply):
5/2010/285/AIRS.2010.10.12.090.L2.RetStd.v5.2.2.0.G10286064818.hdf
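The three URL forms above follow one pattern: a dataset URL, an optional response suffix, and an optional constraint with one [start:stride:stop] hyperslab per dimension. The sketch below reproduces that pattern. The server prefix is a placeholder because the full URLs are truncated in this transcript, and the helper names are hypothetical.

```python
from typing import Sequence, Tuple

Hyperslab = Tuple[int, int, int]   # (start, stride, stop), stop is inclusive


def constraint(variable: str, slabs: Sequence[Hyperslab]) -> str:
    """Format a constraint such as H2OMMRStd[0:1:44][0:1:29][4:1:5]."""
    return variable + "".join(f"[{start}:{stride}:{stop}]"
                              for start, stride, stop in slabs)


def ddx_url(dataset_url: str) -> str:
    """URL that returns the dataset's metadata as XML."""
    return dataset_url + ".ddx"


def ascii_url(dataset_url: str, variable: str,
              slabs: Sequence[Hyperslab]) -> str:
    """URL that returns a slice of one variable as ASCII text."""
    return f"{dataset_url}.ascii?{constraint(variable, slabs)}"


# Placeholder server prefix; only the file name comes from the slide.
dataset = ("https://example.org/opendap/"
           "AIRS.2010.10.12.090.L2.RetStd.v5.2.2.0.G10286064818.hdf")
metadata_request = ddx_url(dataset)
slice_request = ascii_url(dataset, "H2OMMRStd",
                          [(0, 1, 44), (0, 1, 29), (4, 1, 5)])
```

Clients such as IDV or Panoply take the bare dataset URL and issue requests of this kind themselves.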
