Unidata’s Common Data Model John Caron Unidata/UCAR Nov 2006.


Goals / Overview
Look at the landscape of scientific datasets from a few thousand feet up.
What semantics are needed to make these useful?
–georeferencing
–specialized subsetting

What’s a Data Model?
An Abstract Data Model describes data objects and what methods you can use on them.
An API is the interface to the Data Model for a specific programming language.
A file format is a way to persist the objects in the Data Model.
An Abstract Data Model removes the details of any particular API and the persistence format.
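
The three-way split can be sketched in Python. Everything here is hypothetical: the `Variable` class stands in for a data model object, and the two toy "file formats" (JSON and a line-based text layout) are illustrations only, not formats the CDM reads.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Variable:
    """Data model object: a named array with named dimensions."""
    name: str
    dims: tuple
    values: list = field(default_factory=list)


# Hypothetical file format #1: JSON.
def to_json(var):
    return json.dumps({"name": var.name, "dims": list(var.dims), "values": var.values})


def from_json(text):
    d = json.loads(text)
    return Variable(d["name"], tuple(d["dims"]), d["values"])


# Hypothetical file format #2: three lines of text (non-empty float values assumed).
def to_text(var):
    return "\n".join([var.name, ",".join(var.dims), ",".join(repr(v) for v in var.values)])


def from_text(text):
    name, dims, values = text.split("\n")
    return Variable(name, tuple(dims.split(",")), [float(v) for v in values.split(",")])
```

Code that works with `Variable` does not need to know which of the two formats the object came from, which is the point of separating the model from its persistence.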

Common Data Model Layers
Data Access
Coordinate Systems
Scientific Datatypes: Grid, Point, Radial, Trajectory, Swath, Station, Profile

NetCDF-Java version 2.2 architecture
[Diagram labels: Application; Scientific Datatypes; Datatype Adapter; NetcdfDataset; CoordSystem Builder; NcML; NetcdfFile; I/O service provider; NetCDF-3; NetCDF-4; HDF5; GRIB; GINI; NIDS; Nexrad; DMSP; …; OPeNDAP; ADDE; THREDDS; Catalog.xml]

NetCDF-4 and Common Data Model (Data Access Layer)

I/O Service Provider Implementations
General: NetCDF, HDF5, OPeNDAP
Gridded: GRIB-1, GRIB-2
Radar: NEXRAD level 2 and 3, DORADE
Point: BUFR, ASCII
Satellite: DMSP, GINI
In development
–NOAA: GOES (Knapp/Nelson), many others
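
The provider idea can be sketched as a registry in which each format-specific reader decides whether it recognizes a file. This is a minimal Python sketch, not the NetCDF-Java interface: the class and function names are hypothetical, and detection is simplified to checking the leading bytes (the classic NetCDF `CDF\x01` signature, the HDF5 signature, and the `GRIB` indicator). Real files may need a more thorough check.

```python
class IOServiceProvider:
    """Hypothetical base class: one subclass per file format."""
    name = "unknown"
    magic = b""

    def is_valid_file(self, header):
        # Simplified: real detection may need to look beyond the first bytes.
        return header.startswith(self.magic)


class NetCDF3Provider(IOServiceProvider):
    name, magic = "NetCDF-3", b"CDF\x01"


class HDF5Provider(IOServiceProvider):
    name, magic = "HDF5", b"\x89HDF\r\n\x1a\n"


class GribProvider(IOServiceProvider):
    name, magic = "GRIB", b"GRIB"


PROVIDERS = [NetCDF3Provider(), HDF5Provider(), GribProvider()]


def find_provider(header, providers=PROVIDERS):
    """Return the first provider that claims the file, or None."""
    for provider in providers:
        if provider.is_valid_file(header):
            return provider
    return None
```

Supporting a new format then means adding one provider class to the list, without touching the code that consumes the data.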

Coordinate Systems needed
NetCDF, OPeNDAP, HDF data models do not have integrated coordinate systems
–so georeferencing is not part of the API
–Need conventions to specify them (e.g. CF-1, COARDS)
Contrast with GRIB, HDF-EOS, and other specialized formats

NetCDF Coordinate Variables
dimensions:
  lat = 64;
  lon = 128;
variables:
  float lat(lat);
  float lon(lon);
  double temperature(lat,lon);

Coordinate Variables
–One-dimensional variable with the same name as its dimension
–Strictly monotonic values
–No missing values
The coordinates of a point (i,j,k) are {CV1(i), CV2(j), CV3(k)}
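
The three rules and the lookup formula above can be sketched in Python. The function names are hypothetical, and representing a missing value as `None` or NaN is an assumption made for this sketch (netCDF files typically mark missing data with a fill value instead).

```python
import math


def is_coordinate_variable(name, dims, values):
    """Check the three rules: 1-D with the same name as its dimension,
    strictly monotonic, and no missing values (None or NaN in this sketch)."""
    if dims != (name,):
        return False
    if any(v is None or math.isnan(v) for v in values):
        return False
    increasing = all(a < b for a, b in zip(values, values[1:]))
    decreasing = all(a > b for a, b in zip(values, values[1:]))
    return increasing or decreasing


def point_coordinates(index, coord_vars):
    """Coordinates of index (i, j, k, ...) are (CV1[i], CV2[j], CV3[k], ...)."""
    return tuple(cv[i] for i, cv in zip(index, coord_vars))
```

For example, with `lat = [-90.0, 0.0, 90.0]` and `lon = [0.0, 120.0, 240.0]`, the point at index `(2, 1)` has coordinates `(90.0, 120.0)`.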

Limitations of 1D Coordinate Variables
Non lat/lon horizontal grids:
  float temperature(y,x);
  float lat(y, x);
  float lon(y, x);
Trajectory data:
  float NKoreaRadioactivity(pt);
  float lat(pt);
  float lon(pt);
  float altitude(pt);
  float time(pt);

General Coordinates in CF-1.0
  float P(y,x);
    P:coordinates = "lat lon";
  float lat(y, x);
  float lon(y, x);
  float Sr90(pt);
    Sr90:coordinates = "lat lon altitude time";

Coordinate Systems (abstract)
A Coordinate System for a data variable is a set of Coordinate Variables such that the coordinates of the (i,j,k) data point are {CV1(i,j,k), CV2(i,j,k), CV3(i,j,k), CV4(i,j,k), …}
The previous definition was {CV1(i), CV2(j), CV3(k)}
The dimensions of each Coordinate Variable must be a subset of the dimensions of the data variable.
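
A minimal Python sketch of this generalized definition follows. The function names and the nested-list storage are assumptions made for illustration. Each coordinate variable is indexed only by the dimensions it actually has, which is why the subset rule is needed.

```python
def is_valid_coordinate_system(data_dims, coord_var_dims):
    """Each coordinate variable's dimensions must be a subset of the data variable's.

    coord_var_dims maps coordinate name -> tuple of dimension names."""
    return all(set(dims) <= set(data_dims) for dims in coord_var_dims.values())


def coordinates_at(index, data_dims, coord_vars):
    """Look up every coordinate of one data point.

    coord_vars maps name -> (dims, nested list of values)."""
    position = dict(zip(data_dims, index))
    result = {}
    for name, (dims, values) in coord_vars.items():
        v = values
        for d in dims:
            v = v[position[d]]  # index only along this variable's own dimensions
        result[name] = v
    return result
```

With a data variable on `(t, y, x)`, a 1-D `time(t)` and 2-D `lat(y,x)` and `lon(y,x)` all qualify, while a coordinate on a dimension the data variable lacks does not.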

Need Coordinate Axis Types
  float gridData(t,z,y,x);
  float time(t);
  float y(y);
  float x(x);
  float lat(y,x);
  float lon(y,x);
  float height(t,z,y,x);

  float radialData(radial, gate);
  float distance(gate);
  float azimuth(radial);
  float elevation(radial);
  float time(radial);

The same??
  float stationObs(pt);
  float lat(pt);
  float lon(pt);
  float z(pt);
  float time(pt);

  float trajectory(pt);
  float lat(pt);
  float lon(pt);
  float z(pt);
  float time(pt);

Revised Coordinate Systems
1. Specify Coordinate Variables
2. Specify Coordinate Types (time, lat, lon, projection x, y, height, pressure, z, radial, azimuth, elevation)
3. Specify connectivity (implicit or explicit) between data points
–Implicit: Neighbors in index space are (connected) neighbors in coordinate space. Allows efficient searching.
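
One way to see why implicit connectivity allows efficient searching: when neighbors in index space are neighbors in coordinate space, a 1-D axis is sorted, so a coordinate can be located by binary search rather than by scanning every point. The sketch below assumes a strictly increasing axis; the function name is hypothetical.

```python
from bisect import bisect_left


def nearest_index(axis, value):
    """Index of the axis value closest to `value`, for a strictly increasing axis."""
    pos = bisect_left(axis, value)
    if pos == 0:
        return 0
    if pos == len(axis):
        return len(axis) - 1
    before, after = axis[pos - 1], axis[pos]
    # Pick whichever neighbor is closer; ties go to the lower index.
    return pos if after - value < value - before else pos - 1
```

The lookup costs O(log n) comparisons. An unconnected point dimension offers no such ordering, so each point has to be tested individually.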

Gridded Data
Connected means neighbors in index space are neighbors in coordinate space
  float gridData(t,z,y,x);
  float time(t);      // Time
  float y(y);         // GeoY
  float x(x);         // GeoX
  float z(t,z,y,x);   // Height or Pressure
Cartesian coordinates
All dimensions are connected

Coordinate Systems UML

Scientific Data Types
Based on datasets Unidata is familiar with
–APIs are evolving
How are data points connected?
Intended to scale to large, multifile collections
Intended to support “specialized queries”
–Space, Time
Corresponding “standard” NetCDF file conventions

Gridded Data
  float gridData(t,z,y,x);
  float time(t);
  float y(y);
  float x(x);
  float lat(y,x);
  float lon(y,x);
  float height(t,z,y,x);
Cartesian coordinates
All dimensions are connected
x, y, z, time
recently added runtime and ensemble
refactored into GridDatatype interface

GridDatatype methods
  CoordinateAxis getTaxis();
  CoordinateAxis getXaxis();
  CoordinateAxis getYaxis();
  CoordinateAxis getZaxis();
  Projection getProjection();
  int[] findXYindexFromCoord(double x_coord, double y_coord);
  LatLonRect getLatLonBoundingBox();
  Array getDataSlice(Range[] …)
  GridDatatype makeSubset(Range[] …)
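
To illustrate what two of these methods might do, here is a toy Python stand-in. It is not the NetCDF-Java implementation: `ToyGrid`, its snake_case method names, the evenly spaced 1-D axes, and the use of -1 for out-of-range coordinates are all assumptions made for this sketch.

```python
class ToyGrid:
    """Hypothetical stand-in for a grid with 1-D, evenly spaced x and y axes."""

    def __init__(self, x_axis, y_axis, data):
        self.x_axis, self.y_axis, self.data = x_axis, y_axis, data  # data[j][i]

    @staticmethod
    def _index(axis, coord):
        # Evenly spaced axis assumed: invert coord = start + i * step.
        step = axis[1] - axis[0]
        i = round((coord - axis[0]) / step)
        return i if 0 <= i < len(axis) else -1

    def find_xy_index_from_coord(self, x_coord, y_coord):
        """Return [i, j] of the nearest cell, with -1 marking out-of-range."""
        return [self._index(self.x_axis, x_coord), self._index(self.y_axis, y_coord)]

    def make_subset(self, y_range, x_range):
        """Return a new grid restricted to the given index ranges."""
        return ToyGrid(
            [self.x_axis[i] for i in x_range],
            [self.y_axis[j] for j in y_range],
            [[self.data[j][i] for i in x_range] for j in y_range],
        )
```

The subset carries its own coordinate axes, so it remains georeferenced after slicing. That is the benefit of subsetting at the datatype level rather than on raw arrays.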

Radial Data
  radialData(radial, gate):
    distance(gate)
    azimuth(radial)
    elevation(radial)
    time(radial)
Polar coordinates
All dimensions are connected
Not separate time dimension

Swath
  swathData(line,cell)
    lat(line,cell)
    lon(line,cell)
    time(line)
    z(line,cell) ??
lat/lon coordinates
not separate time dimension
all dimensions are connected

Point Observation Data
Set of measurements at the same point in space and time
Point dimension not connected
  Structure {
    lat, lon, z, time;
    v1, v2, ...
  } obs(pt);

  float obs1(pt);
  float obs2(pt);
  float lat(pt);
  float lon(pt);
  float z(pt);
  float time(pt);

PointObsDataset Methods
  // Iterator
  Iterator getData(LatLonRect boundingBox, Date start, Date end);
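
A space/time query of this shape can be sketched as a Python generator. The `PointObs` record, the tuple form of the bounding box, and the assumption that the box does not cross the date line are all simplifications for illustration, not the NetCDF-Java types.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PointObs:
    lat: float
    lon: float
    z: float
    time: datetime
    values: dict


def get_data(observations, bbox, start, end):
    """Yield observations inside bbox = (lat_min, lat_max, lon_min, lon_max)
    and within [start, end]. Every point is tested because the point
    dimension is not connected."""
    lat_min, lat_max, lon_min, lon_max = bbox
    for ob in observations:
        if (lat_min <= ob.lat <= lat_max and lon_min <= ob.lon <= lon_max
                and start <= ob.time <= end):
            yield ob
```

Returning an iterator rather than a list lets a caller stream through a large collection without holding every matching observation in memory.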

Time series Station Data
  Structure {
    name;
    lat, lon, z;
    Structure {
      time;
      v1, v2, ...
    } obs(*); // connected
  } stn(stn); // not connected

StationObs Methods
  // List
  List getStations(LatLonRect boundingBox);
  // Iterator
  Iterator getData(Station s, Date start, Date end);
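
The two queries can be sketched in Python under the assumption that each station stores its observation times in sorted order (the "connected" inner dimension). The `Station` record, the numeric time values, and the station names in the usage note are made up for illustration.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field


@dataclass
class Station:
    name: str
    lat: float
    lon: float
    z: float
    times: list = field(default_factory=list)   # sorted, e.g. hours since an epoch
    values: list = field(default_factory=list)  # one record per time


def get_stations(stations, bbox):
    """Stations are not connected, so each one is tested against the box."""
    lat_min, lat_max, lon_min, lon_max = bbox
    return [s for s in stations
            if lat_min <= s.lat <= lat_max and lon_min <= s.lon <= lon_max]


def get_data(station, start, end):
    """Times are connected (sorted), so the range is found by binary search."""
    lo = bisect_left(station.times, start)
    hi = bisect_right(station.times, end)
    return list(zip(station.times[lo:hi], station.values[lo:hi]))
```

A typical use is to call `get_stations` with a bounding box first, then `get_data` on each returned station for the time window of interest.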

Trajectory Data
  Structure {
    lat, lon, z, time;
    v1, v2, ...
  } obs(pt); // connected
pt dimension is connected
  Structure {
    name;
    Structure {
      lat, lon, z, time;
      v1, v2, ...
    } obs(*); // connected
  } traj(traj) // not connected
Collection dimension not connected

Profiler/Sounding Station Data
  Structure {
    name;
    lat, lon, time;
    Structure {
      z;
      v1, v2, ...
    } obs(*); // connected
  } loc(nloc); // not connected

  Structure {
    name;
    lat, lon;
    Structure {
      time;
      Structure {
        z;
        v1, v2, ...
      } obs(*); // connected
    } time(*); // connected
  } stn(stn); // not connected

Unstructured Grid
  float unstructGrid(t,z,pt);
  float lat(pt);
  float lon(pt);
  float time(t);
  float height(z);
Pt dimension not connected
Looks the same as point data
Need to specify the connectivity explicitly
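
The slide does not say how the connectivity would be specified. One common choice, assumed here purely for illustration, is a triangular mesh given as a list of point-index triples. From that list a neighbor table can be built, which restores the "who is next to whom" information that the `pt` dimension alone lacks.

```python
from collections import defaultdict


def build_neighbors(triangles):
    """Map each point index to the set of point indices it shares a triangle edge with.

    `triangles` is a list of (a, b, c) point-index triples (assumed mesh encoding)."""
    neighbors = defaultdict(set)
    for a, b, c in triangles:
        neighbors[a].update((b, c))
        neighbors[b].update((a, c))
        neighbors[c].update((a, b))
    return neighbors
```

For two triangles `(0, 1, 2)` and `(1, 2, 3)`, points 1 and 2 are each adjacent to all other points, while points 0 and 3 are not adjacent to each other.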

Data Types Summary
Data access through a standard API
Convenient georeferencing
Specialized subsetting methods
–Efficiency for large datasets

Payoff
N + M instead of N * M things on your TODO List!
[Diagram labels: File Format #1; File Format #2; File Format #N; CDM; Visualization & Analysis; NetCDF file; OPeNDAP Server; WCS Service; Web Service]
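
The arithmetic behind the payoff can be written out directly. In this sketch N is the number of file formats and M the number of client applications or services; the function name is hypothetical.

```python
def adapters_needed(n_formats, m_clients, common_model):
    """Count the pieces of glue code to write."""
    if common_model:
        return n_formats + m_clients   # N readers into the model + M clients of it
    return n_formats * m_clients       # every client handles every format itself
```

With 8 formats and 4 clients, a common model needs 12 pieces of glue code instead of 32, and the gap widens as either number grows.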

THREDDS Data Server
[Diagram labels: hostname.edu; Tomcat Server; THREDDS Server; Application; NetCDF-Java library; Datasets; Catalog.xml; IDD Data; OPeNDAP; HTTPServer; WCS; HTTP]

Next: DataType Aggregation
Work at the CDM DataType level, know (some) data semantics
Forecast Model Collection
–Combine multiple model forecasts into single dataset with two time dimensions
–With NOAA/IOOS (Steve Hankin)
Point/Station/Trajectory/Profile Data
–Allow space/time queries, return nested sequences
–Start from / standardize “Dapper conventions”
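
A dataset with two time dimensions can be sketched as a table keyed by (run time, valid time). This Python sketch is only an illustration: the dictionary layout, the hour-based times, and the "most recent run wins" view are assumptions, not the design described in the slides.

```python
def build_collection(runs):
    """runs maps run_time (hours) -> {forecast_offset (hours): field}.
    Returns a dict keyed by (run_time, valid_time)."""
    return {(run, run + offset): fld
            for run, forecasts in runs.items()
            for offset, fld in forecasts.items()}


def latest_run_series(collection):
    """For each valid time keep the field from the most recent run (one possible view)."""
    best = {}
    for (run, valid), fld in sorted(collection.items()):
        best[valid] = fld  # later runs overwrite earlier ones
    return best
```

Because successive model runs overlap in valid time, the same valid time appears under several run times. Keeping both dimensions preserves that information, and a single-time view such as `latest_run_series` can be derived when needed.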

Forecast Model Collections

Conclusion
Standardized Data Access in good shape
–HDF5, NetCDF, OPeNDAP
–Write an IOSP for proprietary formats (Java)
But that’s not good enough!
To do:
–Standard representations of coordinate systems
–Classifications of data types, standard services for them