Unidata’s Common Data Model and NetCDF Java Library API Overview John Caron Unidata/UCAR Nov 2008.

Slides:



Advertisements
Similar presentations
Complex Scientific Analytics in Earth Science at Extreme Scale John Caron University Corporation for Atmospheric Research Boulder, CO Oct 6, 2010.
Advertisements

Reading HDF family of formats via NetCDF-Java / CDM
Recent Work in Progress
The Model Output Interoperability Experiment in the Gulf of Maine: A Success Story Made Possible By CF, NcML, NetCDF-Java and THREDDS Rich Signell (USGS,
The NCAR Command Language (NCL) and the NetCDF Data Format Research Tools Presentation Matthew Janiga 10/30/2012.
A Common Data Model In the Middle Tier Enabling Data Access in Workflows … HDF/HDF-EOS Workshop XIV September 29, 2010 Doug Lindholm Laboratory for Atmospheric.
ESCI/CMIP5 Tools - Jeudi 2 octobre CMIP5 Tools Earth System Grid-NetCDF4- CMOR2.0-Gridspec-Hyrax …
Streaming NetCDF John Caron July What does NetCDF do for you? Data Storage: machine-, OS-, compiler-independent Standard API (Application Programming.
THREDDS, CDM, OPeNDAP, netCDF and Related Conventions John Caron Unidata/UCAR Sep 2007.
7 +/- 2 Maybe Good Ideas John Caron June (1) NetCDF-Java (aka CDM) has lots of functionality, but only available in Java – NcML Aggregation – Access.
The Future of NetCDF Russ Rew UCAR Unidata Program Center Acknowledgments: John Caron, Ed Hartnett, NASA’s Earth Science Technology Office, National Science.
NetCDF An Effective Way to Store and Retrieve Scientific Datasets Jianwei Li 02/11/2002.
Unidata TDS Workshop THREDDS Data Server Overview October 2014.
Status of netCDF-3, netCDF-4, and CF Conventions Russ Rew Community Standards for Unstructured Grids Workshop, Boulder
John Caron Unidata October 2012
1 Writing NetCDF Files: Formats, Models, Conventions, and Best Practices Russ Rew, UCAR Unidata June 28, 2007.
OPeNDAP and the Data Access Protocol (DAP) Original version by Dave Fulker.
Implementation of Model Data Interoperability for IOOS: Successes and Lessons Learned Rich Signell USGS Woods Hole, MA / NOAA Silver Spring USA Model Data.
1 High level view of HDF5 Data structures and library HDF Summit Boeing Seattle September 19, 2006.
Unidata’s TDS Workshop TDS Overview – Part II October 2012.
HDF5 A new file format & software for high performance scientific data management.
A Metadata Based Approach For Supporting Subsetting Queries Over Parallel HDF5 Datasets Vignesh Santhanagopalan Graduate Student Department Of CSE.
Unidata TDS Workshop TDS Overview – Part I XX-XX October 2014.
Unidata’s Common Data Model John Caron Unidata/UCAR Nov 2006.
THREDDS Data Server Ethan Davis GEOSS Climate Workshop 23 September 2011.
Coverages and the DAP2 Data Model James Gallagher.
NetCDF-Java Overview John Caron Oct 29, Contents Data Models / Shared Dimensions Coordinate Systems Feature Types NetCDF Markup Language (NcML)
NcML Aggregation vs Feature Collections. NcML functionality 1.Modify the objects found in CDM files – Especially Attributes – Don’t have to rewrite the.
Unidata’s TDS Workshop TDS Overview – Part II Unidata July 2011.
Mid-Course Review: NetCDF in the Current Proposal Period Russ Rew
Accomplishments and Remaining Challenges: THREDDS Data Server and Common Data Model Ethan Davis Unidata Policy Committee Meeting May 2011.
The netCDF-4 data model and format Russ Rew, UCAR Unidata NetCDF Workshop 25 October 2012.
IOOS Model Data Interoperability Design ROMS POM WW3 WRF ECOM NcML Common Data Model OPeNDAP+CF WCS NetCDF Subset THREDDS Data Server Standardized (CF)
THREDDS Data Server Unidata’s Common Data Model Background / Summary John Caron Unidata/UCAR Mar 2007.
Integrating netCDF and OPeNDAP (The DrNO Project) Dr. Dennis Heimbigner Unidata Go-ESSP Workshop Seattle, WA, Sept
IOOS Modeling Testbed Cyberinfrastructure Rich Signell, USGS, Woods Hole, MA IOOS-RA-Briefing, Feb 14, 2012.
Unidata TDS Workshop THREDDS Data Server Overview
Recent developments with the THREDDS Data Server (TDS) and related Tools: covering TDS, NCML, WCS, forecast aggregation and not including stuff covered.
NetCDF Data Model Issues Russ Rew, UCAR Unidata NetCDF 2010 Workshop
Unidata’s Common Data Model and the THREDDS Data Server John Caron Unidata/UCAR, Boulder CO Jan 6, 2006 ESIP Winter 2006.
IOOS Data Services with the THREDDS Data Server Rich Signell USGS, Woods Hole IOOS DMAC Workshop Silver Spring Sep 10, 2013 Rich Signell USGS, Woods Hole.
THREDDS Catalogs Ethan Davis UCAR/Unidata NASA ESDSWG Standards Process Group meeting, 17 July 2007.
Unidata’s TDS Workshop TDS Overview – Part I July 2011.
The HDF Group Introduction to netCDF-4 Elena Pourmal The HDF Group 110/17/2015.
Unidata's Involvement in Developing and Supporting Climate Science Infrastructure Russ Rew UCAR Unidata April 2010.
NetCDF-4: Software Implementing an Enhanced Data Model for the Geosciences Russ Rew, Ed Hartnett, and John Caron UCAR Unidata Program, Boulder
NetCDF and Scientific Data Durability Russ Rew, UCAR Unidata ESIP Federation Summer Meeting
Data File Formats: netCDF by Tom Whittaker University of Wisconsin-Madison SSEC/CIMSS 2009 MUG Meeting June, 2009.
Advances in the NetCDF Data Model, Format, and Software Russ Rew Coauthors: John Caron, Ed Hartnett, Dennis Heimbigner UCAR Unidata December 2010.
GIS for Atmospheric Sciences and Hydrology By David R. Maidment University of Texas at Austin National Center for Atmospheric Research, 6 July 2005.
Data Interoperability at the IRI: translating between data cultures Benno Blumenthal International Research Institute for Climate Prediction Columbia University.
Weathertop Consulting, LLC Server-side OPeNDAP Analysis – Concrete steps toward a generalized framework via a reference implementation using F-TDS Roland.
Grids and Beyond: netCDF-CF and ISO/OGC Features and Coverages Ethan Davis, John Caron, Ben Domenico UCAR/Unidata AMS IIPS, 23 January 2008.
Common Data Model Scientific Feature Types John Caron UCAR/Unidata July 8, 2008.
Unidata Technologies Relevant to GO-ESSP: An Update Russ Rew
CF 2.0 Coming Soon? (Climate and Forecast Conventions for netCDF) Ethan Davis ESO Developing Standards - ESIP Summer Mtg 14 July 2015.
Developing Conventions for netCDF-4 Russ Rew, UCAR Unidata June 11, 2007 GO-ESSP.
Rich Signell Roland Viger Curtis Price USGS Community for Data Integration Feb 15, 2012.
NetCDF: Data Model, Programming Interfaces, Conventions and Format Adapted from Presentations by Russ Rew Unidata Program Center University Corporation.
Interoperability Day Introduction Standards-based Web Services Interfaces to Existing Atmospheric/Oceanographic Data Systems Ben Domenico Unidata Program.
Update on Unidata Technologies for Data Access Russ Rew
THREDDS Data Server (TDS) and Data Discovery John Caron Unidata/UCAR May 15, 2006.
Unidata Infrastructure for Data Services Russ Rew GO-ESSP Workshop, LLNL
NetCDF Data Model Details Russ Rew, UCAR Unidata NetCDF 2009 Workshop
Other Projects Relevant (and Not So Relevant) to the SODA Ideal: NetCDF, HDF, OLE/COM/DCOM, OpenDoc, Zope Sheila Denn INLS April 16, 2001.
NetCDF-Java version 2.2 Common Data Model John Caron Unidata/UCAR Dec 10, 2004.
Moving from HDF4 to HDF5/netCDF-4
Unidata & NetCDF BoF Scientific File Formats
What is NetCDF ? And what are its plans for world domination?
Recent Work in Progress
Presentation transcript:

Unidata’s Common Data Model and NetCDF Java Library API Overview John Caron Unidata/UCAR Nov 2008

Java = Programmer Productivity Portability Object Oriented Libraries everywhere Thriving open source development Strong typing (aka type safety) –needed for large development projects Good tools: IDEs, debuggers, profilers Very productive Java is faster than C for some applications – eg multithreaded server

Tomcat: The Definitive Guide, Jason Brittain (O’Reilley 2007)

Java Virtual Machine / Operating Systems JVM options –Linux, Solaris, Windows (Sun) –Mac OS X (Apple) –AIX, Linux, Windows, z/OS (IBM) –HP-UX (Hewlett-Packard)

Java Negatives Linking Java with C/Fortran apps is difficult Arguably not suitable for large scale numerical computation –Type safety, array safety, strict reproducibility –Multicore CPU challenge could shift Specialized languages can be more productive

NetCDF-Java library 100% Java Open Source (LGPL, MIT) Independent implementation Used as a component in other software (partial) –Integrated Data Viewer, THREDDS Data Server (Unidata) –Panoply (NASA) –ncBrowse (EPIC/NOAA) –Java NEXRAD Viewer (NCDC/NOAA) –MyWorld GIS (Northwestern) –EDC for ArcGIS, ERRDAP (SFSC/NOAA) –Live Access Server (PMEL/NOAA) –ncWMS (Reading) –Matlab plug-in (USGS)

NetcdfDataset Application Scientific Feature Types NetCDF-Java/ CDM architecture OPeNDAP THREDDS Catalog.xml NetCDF-3 HDF5 I/O service provider GRIB GINI NIDS NetcdfFile NetCDF-4 … Nexrad DMSP CoordSystem Builder Datatype Adapter NcML

NetCDF Java Release Plans Current Stable Release NetCDF-Java 2.2 –Maintenance, bug fixes only Development version 4.0 –Extensive refactor, enhance performance –Extended data types for NetCDF4 –Sequences : variable length Structures –Scientific Feature Types refactor –Nested Tables abstract model for point features (point, station, trajectory, profile) –By the end of the year

Format Readers (CDM files) General: NetCDF, OPeNDAP, HDF5, NetCDF4, HDF4, HDF-EOS Gridded: GRIB-1, GRIB-2, GEMPAK Radar: NEXRAD 2&3, DORADE, CINRAD, Universal Format, TDWR Point: BUFR, ASCII Satellite: DMSP, GINI, McIDAS AREA Misc: GTOPO, Lightning, etc Others in development (partial): –AVHRR, GPCP, GACP, SRB, SSMI, HIRS (NCDC)

HTTP Tomcat Server THREDDS Data Server Datasets catalog.xml motherlode.ucar.edu THREDDS Server NetCDF-Java library Remote Access IDD Data HTTPServer NetcdfSubset WCS OPeNDAP configCatalog.xml

NcML Datasets Application dataset NcML dataset Application dataset NcML dataset THREDDS dataset

<netcdf xmlns=" location=“/data/nids/N0R_ _2147"> NcML example

TDS / NcML example

TDS / NcML aggregation

Common Data Model

What’s a Data Model? An Abstract Data Model describes data objects and what methods you can use on them. An API is the interface to the Data Model for a specific programming language A file format is a way to persist the objects in the Data Model. A data access protocol like OPeNDAP plays the role of a file format (sort of). An Abstract Data Model removes the details of any particular API and the persistence format.

Scientific Feature Types Grid Point Radial Trajectory Swath StationProfile Coordinate Systems Common Data Model Data Access netCDF-3, HDF5, OPeNDAP BUFR, GRIB1, GRIB2, NEXRAD, NIDS, McIDAS, GEMPAK, GINI, DMSP, HDF4, HDF-EOS, DORADE, GTOPO, ASCII

NetcdfDataset Application Scientific Feature Types NetCDF-Java/ CDM architecture OPeNDAP THREDDS Catalog.xml NetCDF-3 HDF5 I/O service provider GRIB GINI NIDS NetcdfFile NetCDF-4 … Nexrad DMSP CoordSystem Builder Datatype Adapter NcML

Common Data Model (Data Access Layer)

NetCDF-4 Data Model Dimension name: String length: int isUnlimited( ) Attribute name: String type: DataType values: 1D array Variable name: String shape: Dimension[ ] type: DataType array: read( ), … Group name: String File location: Filename create( ), open( ), … UserDefinedType typename: String PrimitiveType char byte short intint64 float double unsigned byte unsigned short unsigned int unsigned int64 string DataType Compound VariableLength Enum Opaque User-defined types, including compound types, may be stored with other data. A file has a top-level unnamed group. Each group may contain one or more named subgroups, variables, dimensions, and attributes. A variable may also have attributes. Variables may share dimensions, indicating a common grid. One or more dimensions may be of unlimited length.

Coordinate Systems

Coordinate Systems as Functions Data variable V, with n dimensions Vdim = {dim k, k=0,n-1} is a function from domain Vdim to R V:Vdim → R A coordinate variable for V is also a function CV:Vdim → R A coordinate system for V, CSV, is a set of m coordinate variables for V CSV = {CV j, j=0,m-1} CSV:Vdim → R m The coordinates of the (i,j,k) data point are the m values {CV 1 (i,j,k), CV 2 (i,j,k), CV 3 (i,j,k),…} A coordinate system must be invertible.

Coordinate Systems NetCDF, OPeNDAP, HDF data models do not have integrated coordinate systems – so georeferencing not part of API –Need conventions to specify (eg CF-1, COARDS, etc) Contrast GRIB, HDF-EOS, other specialized formats

Coordinate Variables dimensions: lat = 64; lon = 128; variables: float lat(lat); float lon(lon); float time; double temperature(lat,lon); coordinates=“lat lon time”;

Limitations of 1D Coordinate Variables Non lat/lon horizontal grids: float temperature(y,x) float lat(y, x); float lon(y, x); Trajectory data: float NKoreaRadioactivity (pt); float lat(pt); float lon(pt); float altitude(pt); float time(pt)

Coordinate Systems UML

Projections (CF) albers_conical_equal_area lambert_azimuthal_equal_area lambert_conformal_conic mcidas_area mercator orthographic rotated_pole stereographic (including polar) transverse_mercator UTM (ellipsoidal) vertical_perspective

Vertical Transforms (CF) atmosphere_sigma atmosphere_hybrid_sigma_pressure atmosphere_hybrid_height ocean_s ocean_sigma existing3DField

Add your own Transform Pluggable framework –Add at runtime –CoordTransBuilder.registerTransform() Implement CoordTransBuilderIF

Scientific Feature Types

Based on datasets Unidata is familiar with –APIs are evolving Intended to scale to large, multifile collections Intended to support “specialized queries” –Space, Time These form the basis for NetCDF-Java implementations Two categories : Grids and Points

Gridded Data Grid: multidimensional grid, separable coordinates Radial: a connected set of radials using polar coordinates collected into sweeps Swath: a two dimensional grid, track and cross-track coordinates Unstructured Grids: finite element models, coastal modeling

Gridded Data float gridData(t,z,y,x); float t(t); float y(y); float x(x); float z(z); float lat(y,x); float lon(y,x); float height(t,z,y,x); Cartesian coordinates Data is 2,3,4D All dimensions have 1D coordinate variables (separable)

Radial Data float radialData(radial, gate) : float distance(gate) float azimuth(radial) float elevation(radial) float time(radial) float origin_lat; float origin_lon; float origin_alt; Polar coordinates 2D: radials collected into sweeps Not separate time dimension

Swath float swathData( track, xtrack) float lat( track, xtrack ) float lon( track, xtrack ) float alt( track, xtrack ) float time(track) two dimensional track and cross-track not separate time dimension orbit tracking allows fast search

Unstructured Grid Pt dimension not connected Need to specify the connectivity explicitly No implementation in the CDM yet float unstructGrid(t,z,pt); float lat(pt); float lon(pt); float time(t); float height(z);

float data(sample); Point: measured at one point in time and space Station: time-series of points at the same location Profile: points along a vertical line Station Profile: a time-series of profiles at same location. Trajectory: points along a 1D curve in time/space Section: a collection of profile features which originate along a trajectory. 1D Feature Types (“point data”)

Point Observation Data Table { lat, lon, z, time; obs1, obs2,... } obs(sample); Set of measurements at the same point in space and time = obs Collection of obs = dataset Sample dimension not connected float obs1(sample); float obs2(sample); float lat(sample); float lon(sample); float z(sample); float time(sample);

Time-series Station Data Table { stationId; lat, lon, z; Table { time; obs1, obs2,... } obs(*); // connected } stn(stn); // not connected float obs1(sample); float obs2(sample); int stn_id(sample); float time(sample); int stationId(stn); float lat(stn); float lon(stn); float z(stn); float obs1(sample); float obs2(sample); float lat(sample); float lon(sample); float z(sample); float time(sample); float obs1(stn, time); float obs2(stn, time); float time(stn, time); int stationId(stn); float lat(stn); float lon(stn); float z(stn);

Profile Data Table { profileId; lat, lon, time; Table { z; obs1, obs2,... } obs(*); // connected } profile(profile); // not connected float obs1(sample); float obs2(sample); int profile_id(sample); float z(sample); int profileId(profile); float lat(profile); float lon(profile); float time(profile); float obs1(profile, level); float obs2(profile, level); float z(profile, level); float time(profile); float lat(profile); float lon(profile); float obs1(sample); float obs2(sample); float lat(sample); float lon(sample); float z(sample); float time(sample);

Time-series Profile Station Data Table { stationId; lat, lon; Table { time; Table { z; obs1, obs2,... } obs(*); // connected } profile(*); // connected } stn(stn); // not connected float obs1(profile, level); float obs2(profile, level); float z(profile, level); float time(profile); float lat(profile); float lon(profile); float obs1(stn, time, level); float obs2(stn, time, level); float z(stn, time, level); float time(stn, time); float lat(stn); float lon(stn);

Table { trajectory_id; Table { lat, lon, z, time; obs1, obs2,... } obs(*); // connected } traj(traj) // not connected Trajectory Data float obs1(sample); float obs2(sample); float lat(sample); float lon(sample); float z(sample); float time(sample); int trajectory_id(sample); float obs1(traj,obs); float obs2(traj,obs); float lat(traj,obs); float lon(traj,obs); float z(traj,obs); float time(traj,obs); int trajectory_id(traj);

Section Data float obs1(traj,profile,level); float obs2(traj,profile,level); float z(traj,profile,level); float lat(traj,profile); float lon(traj,profile); float time(traj, profile); Table { section_id; Table { surface_obs // data anywhere lat, lon, time Table { depth; obs1, obs2,... } obs(*); // connected } profile(*); // connected } section(*) // not connected

Nested Table Notation (1) 1.A feature instance is a row in a table. 2.A table is a collection of features of the same type. The table may be fixed or variable length. 3.A nested (child) table is owned by a row in the parent table. 4.Both coordinates and data variables can be at any level of the nesting. 5.A feature type is represented as nested tables of specific form. 6.A feature collection is an unconnected collection of a specific feature type. Table { data1, data2 lat, lon, time; Table { z; obs1, obs2,... } obs(17); } profile(*);

Nested Table Notation (2) A constant coordinate can be factored out to the top level. This is logically joined to any nested table with the same dimension. dim level = 17; float z(level); Table { data1, data2 lat, lon, time; Table { obs1, obs2,... } obs(level); } profile(*);

Nested Table Notation (3) A coordinate in an inner table is connected; a coordinate in the outermost table is unconnected. Table { stationId; lat, lon; Table { time; Table { z; obs1, obs2,... } obs(*); // connected } profile(*); // connected } stn(stn); // not connected Table { lat, lon, z, time; obs1, obs2,... } point(sample); Table { trajectory_id; Table { lat, lon, z, time; obs1, obs2,... } obs(*); // connected } traj(traj) // not connected

Relational model Nested Tables are a hierarchical data model (tree structure) Simple transformation to relational model – explicitly add join variables to tables Table { stationId; lat, lon, z; Table { time; obs1, obs2,... } obs(42); } stn(stn); RTable { stationId // primary key lat, lon, z; } stn RTable { stationId // secondary key time; obs1, obs2,... } obs;

Nested Model Summary Compact notation to describe 1D point feature types –Connectivity of points is key property –Variable/fixed length table dimensions can be notated easily –Constant/varying coordinates can be easily seen Can be translated to relational model to get different performance tradeoffs More details

Feature Type implementations Netcdf-Java library GridGridDatatype RadialRadialSweepFeature Swath- Unstructured Grids- PointPointFeature StationStationTimeSeriesFeature ProfileProfileFeature TrajectoryTrajectoryFeature StationProfileStationProfileFeature SectionSectionFeature

Encoding Feature Types in NetCDF Using CF Conventions CF-1.0 focused on Grids Other types are being studied / proposed Unidata proposal for point obs NCAR/EOL working on Radial data (netCDF4) NPOESS/GOES-R using netCDF4 for satellite (swath) –Unidata has proposal to NOAA/NASA Working group for unstructured grids Happening now!