Unidata & NetCDF BoF Scientific File Formats


1 Unidata & NetCDF BoF Scientific File Formats
RDA Fourth Plenary Meeting, 23 September 2014, Amsterdam, NL. Dr. Mohan Ramamurthy, Unidata Program Center, UCAR Community Programs.

2 Unidata: A Geosciences Data Facility
Established in 1984. Funded primarily by NSF on a five-year proposal cycle. The collateral benefits have been far and wide.

3 Core Activities
Acquire and distribute real-time meteorological data for education, research, and outreach.
Develop software for accessing, managing, analyzing, visualizing, and effectively using geosciences data.
Provide comprehensive training and support to users.
Facilitate advancement of standards and conventions.
Advocate on behalf of the community and negotiate data and software agreements.
Assess and respond to community needs, engage the stakeholders, and promote sharing of data, tools, and ideas.
Provide funds to universities to enable or enhance their participation.

4 A Snapshot of Products & Services
Software:
  Data Distribution: LDM
  Remote Data Access: THREDDS, ADDE, and RAMADDA
  Data Management: netCDF, UDUNITS, and Rosetta
  Analysis and Visualization: GEMPAK, McIDAS, IDV, and AWIPS II
  GIS support via TDS (WCS, WMS), KML, and Shapefiles
Data:
  Over 30 data streams provided in real time
  Data collection, cataloging, and distribution
  Both push and pull technologies are used
User Support & Training:
  Direct support
  Community mailing lists
  Annual Training Workshops, Triennial Users Workshops, and Regional Workshops as needed
Community:
  Equipment Awards to universities; Seminars; Information Commons; Advocacy; Data standards

5 Real-time Data Distribution
Figure panels: Model, Satellite, Radar.
About 30 different streams of real-time weather data from diverse sources are provided to the community. Collectively they move over 30 terabytes of data per week.

6 Remote Data Access
Complements the IDD/LDM push data delivery system. Made available via THREDDS Data Server, RAMADDA, and ADDE data servers that support several protocols: OPeNDAP, ADDE, HTTP, FTP, WCS, and WMS.

7 NetCDF: Experiences and Lessons Learned
Slides courtesy of Russ Rew, Unidata

8 NetCDF: not just a format
A standard format for platform-independent data (NASA ESDS-RFC-011). CF-netCDF has been adopted as a formal OGC binary encoding standard.
But netCDF is also:
  A data model for multidimensional and structured scientific data
  A set of application programming interfaces (C, Java, Fortran, C++, …) for data access
  A reference implementation for the APIs
David Arctur of OGC suggested CF-netCDF as an OGC standard in May 2009 (at an IOOS meeting). Ben Domenico presented this at the June 2009 OGC meeting, with general agreement that it would be appropriate. Use of netCDF reference implementation software in many analysis and visualization packages is at least as important as the standard format for interoperability and preservation. The software interfaces insulate programs from format changes. netCDF is also a foundation for other standards, e.g. the CF Conventions.

9 NetCDF History

10 Classic netCDF data model
A file has variables, dimensions, and attributes. Variables also have attributes. Variables may share dimensions, indicating a common grid. One dimension may be of unlimited length. Variables and attributes have one of six primitive data types.
Classes in the diagram:
  File: location: Filename; create( ), open( ), …
  Dimension: name: String; length: int; isUnlimited( )
  Attribute: name: String; type: DataType; values: 1D array
  Variable: name: String; shape: Dimension[ ]; type: DataType; array: read( ), …
  DataType: PrimitiveType (char, byte, short, int, float, double)
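The classic model on this slide can be sketched in a few lines of Python. This is only an illustration of the diagram, not the netCDF library API: the class names `ClassicFile`, `Dimension`, `Attribute`, and `Variable` mirror the boxes on the slide, while the helper methods `add_dimension` and `add_variable`, and the convention that `length=None` marks the unlimited dimension, are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence

# The six primitive types of the classic model, as listed on the slide.
CLASSIC_TYPES = ("char", "byte", "short", "int", "float", "double")


@dataclass
class Dimension:
    name: str
    length: Optional[int]  # None marks the unlimited dimension (sketch convention)

    def is_unlimited(self) -> bool:
        return self.length is None


@dataclass
class Attribute:
    name: str
    type: str
    values: Sequence  # 1D array of values


@dataclass
class Variable:
    name: str
    type: str
    shape: List[Dimension]
    attributes: Dict[str, Attribute] = field(default_factory=dict)


@dataclass
class ClassicFile:
    location: str
    dimensions: Dict[str, Dimension] = field(default_factory=dict)
    variables: Dict[str, Variable] = field(default_factory=dict)
    attributes: Dict[str, Attribute] = field(default_factory=dict)

    def add_dimension(self, name: str, length: Optional[int] = None) -> Dimension:
        # The classic model allows at most one unlimited dimension per file.
        if length is None and any(d.is_unlimited() for d in self.dimensions.values()):
            raise ValueError("classic model allows only one unlimited dimension")
        dim = Dimension(name, length)
        self.dimensions[name] = dim
        return dim

    def add_variable(self, name: str, type_: str, dim_names: List[str]) -> Variable:
        if type_ not in CLASSIC_TYPES:
            raise ValueError(f"not a classic primitive type: {type_}")
        var = Variable(name, type_, [self.dimensions[d] for d in dim_names])
        self.variables[name] = var
        return var


# Two variables sharing the same dimensions, i.e. a common grid.
f = ClassicFile("example.nc")
f.add_dimension("time")      # the single unlimited dimension
f.add_dimension("lat", 3)
f.add_dimension("lon", 4)
temp = f.add_variable("temperature", "float", ["time", "lat", "lon"])
pres = f.add_variable("pressure", "float", ["time", "lat", "lon"])
temp.attributes["units"] = Attribute("units", "char", "K")
```

Because both variables hold references to the same `Dimension` objects, the shared grid is visible in the structure itself rather than only by matching names.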

11 Enhanced netCDF data model, for netCDF-4
A file has a top-level unnamed group. Each group may contain one or more named subgroups, user-defined types, variables, dimensions, and attributes. Variables also have attributes. Variables may share dimensions, indicating a common grid. One or more dimensions may be of unlimited length. Variables and attributes have one of twelve primitive data types or one of four user-defined types.
Classes in the diagram: File, Group, Dimension, Attribute, Variable, DataType.
  File: location: Filename; create( ), open( ), …
  Dimension: name: String; length: int; isUnlimited( )
  Attribute: type: DataType; values: 1D array
  Variable: shape: Dimension[ ]; type: DataType; array: read( ), …
  DataType: PrimitiveType (char, byte, short, int, int64, float, double, unsigned byte, unsigned short, unsigned int, unsigned int64, string) or UserDefinedType (typename: String; one of Compound, VariableLength, Enum, Opaque)
Compatible evolution of the data model, by extensions, permits compatible evolution of format and APIs.
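To make "evolution by extension" concrete, here is a minimal Python sketch of the enhanced model in which a classic file is simply a root group that uses none of the new features. The `Group` class, its `walk` method, and the `is_classic` check are hypothetical names chosen for this illustration; they are not part of any netCDF API.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Optional, Tuple

CLASSIC_TYPES = ("char", "byte", "short", "int", "float", "double")
# The enhanced model keeps the six classic types and adds six more.
ENHANCED_TYPES = CLASSIC_TYPES + (
    "int64", "unsigned byte", "unsigned short",
    "unsigned int", "unsigned int64", "string",
)
USER_DEFINED_KINDS = ("Compound", "VariableLength", "Enum", "Opaque")


@dataclass
class Group:
    name: str
    # dimension name -> length, with None meaning unlimited (sketch convention)
    dimensions: Dict[str, Optional[int]] = field(default_factory=dict)
    # variable name -> (type name, tuple of dimension names)
    variables: Dict[str, Tuple[str, Tuple[str, ...]]] = field(default_factory=dict)
    attributes: Dict[str, object] = field(default_factory=dict)
    # user-defined type name -> one of USER_DEFINED_KINDS
    user_types: Dict[str, str] = field(default_factory=dict)
    groups: Dict[str, "Group"] = field(default_factory=dict)

    def add_group(self, name: str) -> "Group":
        child = Group(name)
        self.groups[name] = child
        return child

    def walk(self, path: str = "/") -> Iterator[Tuple[str, "Group"]]:
        """Yield (path, group) for this group and all nested subgroups."""
        yield path, self
        for name, child in self.groups.items():
            yield from child.walk(path + name + "/")


def is_classic(root: Group) -> bool:
    """True if the file uses only features that exist in the classic model."""
    unlimited = sum(1 for length in root.dimensions.values() if length is None)
    return (
        not root.groups
        and not root.user_types
        and unlimited <= 1
        and all(t in CLASSIC_TYPES for t, _ in root.variables.values())
    )


# A file that fits the classic model: one unnamed root group, no extensions.
root = Group("")
root.dimensions.update(time=None, lat=3)
root.variables["temperature"] = ("float", ("time", "lat"))
classic_before = is_classic(root)

# Adding a subgroup with two unlimited dimensions and a string variable
# uses enhanced-only features, so the file is no longer classic.
obs = root.add_group("observations")
obs.dimensions.update(station=None, obs=None)
obs.variables["id"] = ("string", ("station",))
classic_after = is_classic(root)
paths = [p for p, _ in root.walk()]
```

The point of the sketch is that nothing from the classic model had to be removed or redefined: every classic file is already a valid instance of the enhanced structure.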

12 Infrastructure for sharing scientific data
Applications depend on lower layers. Sharing requires agreements: formats, protocols, conventions. Data need metadata. Remember this?

13 NetCDF infrastructure
Provides format and library for netCDF data model. Endorsed by several standards bodies. Active conventions communities. Several servers and protocols for remote data access. Many open source and commercial utilities and applications. netCDF plays a part in all levels of the infrastructure.

14 How do formats change?
Simple formats don’t change; they’re defined once and frozen forever (example shown: ASCII).
Some formats change infrequently and usually incompatibly (example shown: GRIB 1, GRIB 2).
Complex formats (and their software) may evolve in lots of small increments (example shown: netCDF 1.0, 2.4.3, 3.6.3, 4.0.1).

15 Compatibility commitment
For scientific data, preserving access to data for future generations should be sacrosanct. Strong commitment is needed to ensure practical access to old data by new programs. Careful library evolution can ensure data and API compatibility.

16 Declaration of Compatibility
For future access to archives, netCDF development will continue to ensure the compatibility of:
  Data access: netCDF software will provide both read and write access to all earlier forms of netCDF data.
  Programming interfaces: C and Fortran programs using documented netCDF APIs from previous versions will continue to work after recompiling and relinking (if needed).
  Future versions: netCDF will continue to support both data access compatibility and API compatibility in future releases.
Along with this commitment was an admission that keeping netCDF releases backward compatible might seem difficult, but it would really be easy compared to the amount of effort required to use Microsoft PowerPoint drawing tools to create this slide’s border.

17 Aspects of compatibility
Costs:
  Effort to support older interfaces and formats
  Comprehensive compatibility testing with every software release
Benefits:
  Data in archives don’t have to change
  Client program sources don’t have to change
  Software can access archived data without being aware of format version
Implemented compatibly by evolving data model:
  Add or grow abstractions, instead of replacing them
  Ensure previous data model is included in enhanced data model
Software and formats can evolve without affecting access to archived data.
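One way a library can let software "access archived data without being aware of format version" is to inspect the first bytes of a file and pick the right reader internally. The sketch below is not the netCDF library's own code. It relies on the documented magic numbers: `CDF` followed by a version byte of 1 (classic), 2 (64-bit offset), or 5 (CDF-5), and the 8-byte HDF5 signature used by netCDF-4 files. The function names are hypothetical, and the sketch assumes the HDF5 signature sits at offset 0 and that an HDF5 file offered to it is in fact a netCDF-4 file.

```python
# 8-byte signature that begins an HDF5 file (netCDF-4 is stored as HDF5).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

# Version byte that follows the ASCII letters "CDF" in the older formats.
CDF_VERSIONS = {
    1: "classic",
    2: "64-bit offset",
    5: "CDF-5 (64-bit data)",
}


def detect_netcdf_variant(header: bytes) -> str:
    """Name the netCDF format variant from the first bytes of a file."""
    if header.startswith(HDF5_SIGNATURE):
        return "netCDF-4 (HDF5-based)"
    if len(header) >= 4 and header[:3] == b"CDF":
        version = header[3]  # indexing bytes yields an int
        if version in CDF_VERSIONS:
            return CDF_VERSIONS[version]
    raise ValueError("not a recognised netCDF file")


def sniff_file(path: str) -> str:
    """Read just enough of a file on disk to identify its variant."""
    with open(path, "rb") as handle:
        return detect_netcdf_variant(handle.read(8))
```

A caller only ever asks to open a file; the dispatch on format variant stays inside the library, which is why client program sources do not have to change when a new variant is added.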

18 NetCDF Metrics
NetCDF-C downloads (last 12 months): 101,703 (116 “countries” = Top-level Internet Domains)
NetCDF-java downloads (last 12 months): 14,716
Defects per 1000 lines of code (Coverity estimate): 0.36
Google hits in April 2014 for "netcdf-3": 828,000
Google hits in April 2014 for "netcdf-4": 759,000
Google scholar entries in April 2014 for "netcdf": 11,000
Free software packages that can access netCDF data: 83
Commercial packages that can access netCDF data: 23
Number of license plates with NETCDF: 1

19 Impact on Climate Science

20 NetCDF: Lessons Learned
Giving developers freedom to explore a solution space can lead to great software.
Reimplementing someone else's weak implementation of a good idea is a good strategy for developing useful software.
Porting software to a large variety of platforms leads to higher quality software, because some platforms reveal bugs that others don’t.
Employing a test driven development and agile development process
Comprehensive testing and tools help, but a large user community will find innovative ways to use the software to discover bugs that would be difficult to anticipate.
A serious bug can go undetected through many releases if it is rare and hard to detect (the "dreaded nofill bug").

21 Summary
NetCDF has become a de facto standard in the atmospheric sciences and oceanography, providing a key component of cyberinfrastructure for the geosciences. NetCDF has made it possible for data providers, scientists, software developers, and data archives to make use of standard interfaces and formats to share and reuse data and benefit from the resulting interoperability. It has a large, active, and diverse collection of users, willing to contribute to its continued use. Over 100 commercial and free applications can access netCDF data. Many projects and archives, including CMIP-5 and World Data Centers, are storing large volumes of data in netCDF form, for future use. Think of the children!

22 Concluding Remarks
Format obsolescence need not be an issue for data durability and preservation:
  Evolve data models by extension, not by incompatible modification
  Preserve previous programming interfaces
  Support previous format variants transparently
  Avoid gratuitous invention of new formats
Data preservation and stewardship require much more than dealing with format standards and their evolution. Think of the children!

23 Questions? Contact: mohan@ucar.edu http://www.unidata.ucar.edu/
Unidata is funded primarily by the National Science Foundation (Grant NSF )

