Streaming NetCDF John Caron July 2011

What does NetCDF do for you?
Data storage: machine-, OS-, and compiler-independent
Standard API (Application Programming Interface)
Multidimensional array data model
Efficient extraction of data subsets
  – Subset specified by array index ranges
  – Random access files
  – Predictable cost

NetCDF-3 file format
[Figure: file layout, in order on disk]
Header
Non-record variables: Variable 1, Variable 2, Variable 3, …
  – e.g. float var1(z, y, x), stored in row-major order
Record (unlimited) variables
  – Record 0: float rvar1(0, z, y, x), float rvar2(0, z, y, x), float rvar3(0, z, y, x)
  – Record 1: float rvar1(1, z, y, x), float rvar2(1, z, y, x), float rvar3(1, z, y, x)
  – … (unlimited)
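
The fixed layout above is what makes I/O cost predictable: the position of any element can be computed with plain arithmetic. The following is a minimal sketch of that arithmetic, not the netCDF library's code. The function name and its parameters are hypothetical, and it assumes no padding between variables (real files store each variable's starting offset in the header).

```python
def record_var_offset(header_size, fixed_sizes, record_var_sizes,
                      var_index, record, index, shape, elem_size=4):
    """Byte offset of one element of a record variable (illustrative only).

    header_size      -- bytes occupied by the header (assumed known)
    fixed_sizes      -- byte size of each non-record variable
    record_var_sizes -- byte size of one record's slab, per record variable
    var_index        -- which record variable (0-based)
    record           -- index along the unlimited dimension
    index, shape     -- indices and shape of the remaining dimensions (z, y, x)
    """
    # Row-major order: the last dimension varies fastest.
    flat = 0
    for i, n in zip(index, shape):
        flat = flat * n + i

    record_size = sum(record_var_sizes)  # all record variables, one record
    return (header_size
            + sum(fixed_sizes)                    # skip non-record variables
            + record * record_size                # skip earlier records
            + sum(record_var_sizes[:var_index])   # skip earlier variables in this record
            + flat * elem_size)                   # position inside the slab


# Hypothetical file: 100-byte header, one fixed float variable and three
# record float variables, each of shape (z, y, x) = (2, 3, 4).
slab = 2 * 3 * 4 * 4
offset = record_var_offset(100, [slab], [slab, slab, slab],
                           var_index=1, record=1,
                           index=(1, 2, 3), shape=(2, 3, 4))
```

Because the offset depends only on the header contents and the requested indices, a reader can seek straight to the data without consulting any other on-disk structure.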

NetCDF-3 = Read-optimized
Very fast to read in header = “schema”
Disk layout is fixed
  – Simplest possible
  – Programmer decides on unlimited dimension
  – Easy for programmers to understand and predict I/O costs

NetCDF-4 file format
Built on HDF-5
Big Data
Much more complicated than netCDF-3
  – “Fractal heaps”
  – B-trees everywhere
  – Data stored in variable-length chunks
  – Each chunk can have multiple “filters”, e.g. compression

Multidimensional chunking

HDF5
Disk layout is not fixed
Knowing schema != knowing data layout
Programmer chooses chunking/compression, then trusts library
Like a file system, but not part of OS
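
To make the contrast with the fixed netCDF-3 layout concrete, here is a small sketch of which chunks a subset request touches in a chunked layout. It is not HDF5 library code. The function name is hypothetical, and the sketch assumes a non-empty, half-open index box and a regular chunk grid.

```python
from itertools import product


def chunks_for_subset(start, stop, chunk_shape):
    """Chunk coordinates overlapping the half-open index box [start, stop).

    Assumes stop[d] > start[d] in every dimension d.
    """
    ranges = [range(s // c, (e - 1) // c + 1)
              for s, e, c in zip(start, stop, chunk_shape)]
    return list(product(*ranges))


# A 10 x 10 subset that straddles a chunk boundary in the second dimension.
touched = chunks_for_subset(start=(0, 5), stop=(10, 15), chunk_shape=(10, 10))
```

The index arithmetic only identifies the chunks. Where each chunk lives on disk, and how large it is after compression, still has to be looked up by the library, which is why knowing the schema does not tell you the I/O cost.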

OPeNDAP
Remote access to netCDF files
Index space
Similar data model
Different binary “format”
OPeNDAP response is not a netCDF file

New Paradigm: “Web Services”
HTTP / URL
Standard interfaces
Standard “payload” (HTML / XML)
OGC WxS (Web Map Service, Web Coverage Service, Web Feature Service)
  – Queries in coordinate space (lat/lon/time)
  – netCDF is now an OGC standard payload

Returning a netCDF response
Can we write a netCDF file directly to the socket without first writing to disk?
  – netCDF-3: sometimes
      Must know size of unlimited dimension beforehand
      Can’t use standard libraries
      NetCDF and application code get mixed
  – netCDF-4: not practical
      Not impossible, but not worth pursuing

Queries are not in index space
You have a large collection of “features” spread out over many files
User makes a request for all features in a bounding box
You don’t know how many features satisfy the request
Server wants to query multiple files in parallel, write out results directly to socket
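
The scenario above can be sketched in a few lines. This is an illustration only: the "files" are in-memory lists of hypothetical (lon, lat, value) tuples, the function names are made up, and the scan is sequential rather than parallel. The point it shows is that the server can write each match as it is found, while the total count is known only at the end.

```python
import io


def features_in_bbox(files, bbox):
    """Yield features from every file that fall inside bbox = (west, south, east, north)."""
    west, south, east, north = bbox
    for features in files:
        for lon, lat, value in features:
            if west <= lon <= east and south <= lat <= north:
                yield (lon, lat, value)


def stream_response(files, bbox, out):
    """Write each matching feature as soon as it is found; return the count."""
    count = 0
    for lon, lat, value in features_in_bbox(files, bbox):
        out.write(f"{lon},{lat},{value}\n".encode("ascii"))
        count += 1
    return count  # only known once every file has been scanned


# Two hypothetical "files"; out stands in for the network socket.
files = [[(10.0, 40.0, 1.5), (50.0, 40.0, 2.5)], [(12.0, 42.0, 3.5)]]
out = io.BytesIO()
n = stream_response(files, (0, 30, 20, 50), out)
```

A format whose header must state the number of records up front cannot be written this way without first buffering the whole result.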

Design goals for a new netCDF file format
Allows direct writes to network == streaming
Append only
Concat multiple files -> valid file
Easily convert to/from netCDF-3 and netCDF-4
  – No loss of information in either direction
Read with Java or C netCDF libraries without conversion
“Write optimized”

Implementation decisions for ncstream = “streaming netCDF”
ncstream = sequence of variable-length messages
Full CDM/netCDF-4 data model
Binary encoding using Google's Protobuf
Protobuf
  – Binary object serialization, cross language, transport neutral, extensible
  – Very fast compared to XML
  – 2-3x faster than TDS OPeNDAP
Post-processing creates indexes for efficiency
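
A sequence of variable-length messages is commonly framed by prefixing each message with its length. The sketch below shows that general idea using base-128 varints, the integer encoding Protobuf uses. It is not the actual ncstream wire format, which these slides do not specify. The function names are hypothetical, and the payloads are opaque bytes standing in for serialized Protobuf messages.

```python
import io


def encode_varint(n):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_varint(stream):
    """Read one varint; return None at a clean end of stream."""
    result, shift = 0, 0
    while True:
        b = stream.read(1)
        if not b:
            if shift == 0:
                return None
            raise EOFError("stream ended inside a varint")
        result |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return result
        shift += 7


def write_message(stream, payload):
    """Append one length-prefixed message; earlier bytes are never revisited."""
    stream.write(encode_varint(len(payload)))
    stream.write(payload)


def read_messages(stream):
    """Yield message payloads until the stream ends."""
    while True:
        size = read_varint(stream)
        if size is None:
            return
        payload = stream.read(size)
        if len(payload) != size:
            raise EOFError("truncated message")
        yield payload


# Two independently written streams, concatenated byte for byte.
a, b = io.BytesIO(), io.BytesIO()
write_message(a, b"header")
write_message(a, b"x" * 300)
write_message(b, b"data")
combined = io.BytesIO(a.getvalue() + b.getvalue())
messages = list(read_messages(combined))
```

With this kind of framing a writer only ever appends, and concatenating two streams yields another readable stream, which matches the append-only and concatenation goals stated for the new format.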

ncstream file format
[Figure: a sequence of messages (message, message, …) followed by an index]

CDM Remote Access Web Service
Subsetting in index space
Supports full CDM/netCDF-4 data model
  – Can be used instead of DAP 2.0 for queries in index space
Simple REST interface
Uses ncstream for encoding
Have experimental version in Java
  – CDM (NetCDF-Java library), ToolsUI client
  – TDS (THREDDS Data Server) cdmRemote service type
  – May enable in IDV soon
Have “pre-alpha” version in netCDF-C library

CDM Remote Feature Web Service
Subsetting in coordinate space
REST interface / ncstream for encoding
cdmrFeature service type in TDS
Follow-on to NetCDF Subset Service
  – Point Feature datasets
Alpha version in TDS since version 4.2
Beta version in TDS 4.3 (Sep 2011)

[Figure: Java client accessing point feature collections. Diagram labels: Application; Java Client; CDM Point Feature API; CDM Remote API; ncstream; cdmrFeature; cdmRemote; TDS; Coordinate Systems; Data Access; Data]

Problem: how to get CDM functionality into netCDF C library?
Desired functionality
  – NcML, Aggregation
  – Access many other file formats (GRIB, BUFR, NEXRAD, etc.)
Java has to run in its own process, can’t be linked into C code
Reimplement CDM functionality in C library
  – 200K+ LOC
  – OO, inheritance
  – OTOH, avoid blind alleys, use third-party libraries
Leave Java in its own process, communicate across processes

Possibility: CdmRemote Server
TDS variant
Lightweight server for CDM datasets
  – Zero configuration
  – Local filesystem
  – Allow one to cache expensive objects
Java and C clients
  – Allow non-Java applications access to CDM stack
  – Coordinate space queries
  – Virtual datasets
  – Feature Types

[Figure: C client. Diagram labels: Application; C Client; C library (enable other languages); Python / ?; CDM Point Feature API; CDM Remote API; ncstream; cdmrFeature; cdmRemote; TDS; Coordinate Systems; Data Access; Data]

TODO (lots)
Indexes
Compression
Convert to netCDF-4
CdmRemote server
Finalize protocol

Conclusions
ncstream = experimental netCDF file format
cdmremote = experimental remote access to CDM data
cdmrFeature = experimental “query in coordinate space” web service
Hope to have it kickable by end of year
More info: google cdmremote, ncstream