Presentation is loading. Please wait.

Presentation is loading. Please wait.

File Formats, Conventions, and Data Level Interoperability ESDSWG New Orleans, Oct 20, 2010 Joe Glassy, Chris Lynnes ESDSWG Tech Infusion.

Similar presentations


Presentation on theme: "File Formats, Conventions, and Data Level Interoperability ESDSWG New Orleans, Oct 20, 2010 Joe Glassy, Chris Lynnes ESDSWG Tech Infusion."— Presentation transcript:

1 File Formats, Conventions, and Data Level Interoperability ESDSWG New Orleans, Oct 20, 2010 Joe Glassy, Chris Lynnes ESDSWG Tech Infusion

2 Introduction & overview Outline of objectives: – Discuss role of standard, self-describing “File formats” in data level interoperability – Summarize common file formats in use, their properties, & benefits --“data life cycle economics” – Discuss criteria for choosing a file format, matching it to needs of consumer/producers. – Discuss critical role of Conventions – any file format needs good recipes to make them interoperable! – Examples: NASA Measures F/T, SMAP, AIRs, Aura

3 Role(s) Of File Formats in Interoperability File formats represent versatile “packages” for multi-dimensional science data and metadata. Offer self-describing “well-known structures” to codify desired, common conventions and practices. Offer well-documented reference cases to encapsulate specific data models. Standard file formats dock with format-aware tools to offer users a seamless end-to-end experience and platform portability Enhance Mission-to-Mission continuity

4 …investment  life-cycle economics…

5 Why (and how) are file formats important? Standard formats – Come with thorough documentation – Provide good Reference implementations Common formats – More datasets in a format  more tools that read that format Canonical structures and names  general purpose handlers for coordinates, etc.  smarter tools

6 A generic work flow… Consider user community needs and culture, fit within architecture, institutional policies & preferences Choose a standard file format (or sub-variant) Design a convention-enabled, specific internal layout with metadata interfaces Prototype: Implement in prototype, evaluate Implement in production context Integrate within discovery and catalog environments (Catalog interoperability…)

7 Examples of standard file formats HDF5 – a file format on its own, as well as a broad foundation for others netCDF v4 (stable at v4.1.1, newest : v4.1.2-beta1) – v4 Classic (widespread adoption, some limitations…) – v4 Enhanced (support Groups, User-defined, variable length types, and more) netCDF v3 Classic (legacy+, tools+, but limited) HDFEOS2, HDFEOS5 – EOS Terra, Aqua, Aura… HDF4 – legacy, extensive use by MODIS Terra, Aqua Many other domain-specific, less generic formats abound… (need transform tools to/from HDF?)

8 Some selection criteria… Do file-format’s capabilities support required functionality? What is breadth of acceptance, adoption within larger community? (and/or, does institutional policy dictate a specific format?) Presence and quality of documentation (reference, examples and especially tutorials), API software, and community support? Contribution to investment, data life-cycle economics? What is the level of standardization? Adaptability of format to widely used conventions like CF 1.x, or other accepted convention(s)?

9 Internal Layout / Design (once format is chosen & adopted…) Define &refine High level organization /structure /DATA /METADATA Distinguish ‘data’ from ‘metadata’, core structure vs. ‘attributes’ – Dimensions, Coordinate Variables, projection attributes – Missing_data, _Fillvalue vs. internal fill value – Units, Gain, offset, min, max, range, etc. Prototype it! – Leverage script environments (Python H5Py, PyTables, etc) – Panoply, HDFView also quick, useful for prototyping, feedback

10 Using “Groups” HDF5 (and NetCDF v4-Enhanced) support full use of groups e.g. /DATA vs. /METADATA, etc. Groups useful in partitioning out functionally related sets of data or attributes; Hierarchical view mimics file-system Facilitates appropriate information-hiding, highlights needed info, shield other (principle of least privilege…) Well supported by modern tools (Panoply, HDFViews, PyTables, H5Py) and low-lev APIs.

11 Example(s) of File Formats In Action HDF5 – NASA Measures – NASA Measures Freeze/Thaw (soon available at NSIDC) – AQUA AIRS Level 2 (from earlier talk) : – 0/285/AIRS L2.RetStd.v G hdf 0/285/AIRS L2.RetStd.v G hdf Aura TES ( TES-Aura_L3-CH4_r _F01_05.he5 )

12 Example: NASA Measures Freeze/Thaw, Daily in HDF5 Metadata Block: Attributes

13 Example: NASA Measures Daily Freeze/Thaw in HDF5 Data Variable (FT_SSMI) and Attributes

14 Example: NASA Level 2 AIRS (Swath) in HDF4

15 Example: NetCDF, (tos) Sea surface temperatures collected by PCMDI for use by the IPCC, illustrating CF v1.0 layoutIPCC

16 Example: TES (HDFEOS5) illustrating CF v1.0 layout

17 CF Conventions & file formats: --how they contribute to interoperability. CF v1.4.x -- the term “CF” is now broader than just climate-forecasting! Standard Name Table -- a step towards wider adoption of names, controlled vocabularies, units terminology CF v1.4.x provides tool-makers with helpful “lingua- franca” guidance. Within a file-format, adopting conventions like CF promotes common layout, names, semantics, for dataset-to-dataset compatibility -- a key to wider data level interoperability.

18 Attributes vs. Metadata? one man’s ceiling is another man’s floor… Collection level vs. Data Set vs. Granule level Structural vs. science-content Swath vs. grid vs. point Commonly used attributes: – CONVENTIONS attrib, communicates which convention was used – Basic globals: title, history, institution, source, references – Coordinate variables, axis, formula_terms – Units, _Fillvalue, missing_data, valid_range – Short_name, long_name, other provenance – (gain,offset /scale_factor,addOffset), etc.

19 Challenges? (just a few remain…) Evolution, bifurcation, asymmetric support can result in occasional user confusion: – HDF v1.8.x vs. v1.6.x families? – NetCDF v4 Enhanced vs. NetCDF v4 Classic vs. v3? – HDFEOS5 vs. HDFEOS2? Both GUI tool and API support tends to vary by platform (Linux, Mac, Win7) and sub-flavor… Multi-library dependency stacks beg for fully bundled, version-matched end-to-end install pkg! Conventions community (CF v1.4.x) and metadata standards communities also in motion (but that’s good too…)

20 Resources : URLs Climate Forecast (CF) Conventions (now at 1.4.x): – – HDF: – HDFEOS – – NetCDF: – – ml ml General: – Describing_Formats –

21 Resources: File format related Tools Panoply: HDFView: OpenDAP : IDV : McIDAS : Python : – h5py : – PyTables: Perl : PDL-IO-HDF5, and Biohdf? Many others: HEG, MTD, HDFEOS plug-in for HDFview, HDFLook, (ncdump, h5dump, and cousins), GRADS, Matlab, binary APIs

22 A provisional DOI, UUID Strategy What we used for NASA Measures Freeze/Thaw, daily (v2) just delivered: – DOI: assigned to our reference paper, by IEEE Transactions in Geoscience and Remote Sensing – UUID recipe, seedString = Import uuid uuid= uuid.uuid5(seedString)


Download ppt "File Formats, Conventions, and Data Level Interoperability ESDSWG New Orleans, Oct 20, 2010 Joe Glassy, Chris Lynnes ESDSWG Tech Infusion."

Similar presentations


Ads by Google