Toward Rich, User- Defined Aggregation & Subset-Selection Services Dave Fulker, President, OPeNDAP, Inc ESIP Summer Meeting on 9-12 July 2013 Thursday.

Slides:



Advertisements
Similar presentations
INDIANAUNIVERSITYINDIANAUNIVERSITY GENI Global Environment for Network Innovation James Williams Director – International Networking Director – Operational.
Advertisements

1 NASA CEOP Status & Demo CEOS WGISS-25 Sanya, China February 27, 2008 Yonsook Enloe.
James Gallagher OPeNDAP 1/10/14
The HDF Group ESIP Summer Meeting Easy access HDF files via Hyrax Kent Yang The HDF Group 1 July 8 – 11, 2014.
Recent Work in Progress
OPeNDAP-Unidata Development of DAP4 (a Data Access Protocol) Describing Progress and Seeking Input at the ESIP Summer Meeting 2012 by Dave Fulker (OPeNDAP.
A Common Data Model In the Middle Tier Enabling Data Access in Workflows … HDF/HDF-EOS Workshop XIV September 29, 2010 Doug Lindholm Laboratory for Atmospheric.
Streaming NetCDF John Caron July What does NetCDF do for you? Data Storage: machine-, OS-, compiler-independent Standard API (Application Programming.
® OGC Web Services Initiative, Phase 9 (OWS-9): Innovations Thread - OPeNDAP James Gallagher and Nathan Potter, OPeNDAP © 2012 Open Geospatial Consortium.
DLESE and NSDL The role of the Digital Library for Earth System Education* (DLESE) in the National SMETE Digital Library Presented by Dave Fulker Director.
View, through an architectural lens, of OPeNDAP’s Data Access Protocol (DAP2) A candidate OGC Standard (OGC Pending Document ) by James Gallagher.
HDF5 OPeNDAP Project Update and Demo MuQun Yang and Hyo-Kyung Lee (The HDF Group) James Gallagher (OPeNDAP, Inc.)
The Future of NetCDF Russ Rew UCAR Unidata Program Center Acknowledgments: John Caron, Ed Hartnett, NASA’s Earth Science Technology Office, National Science.
Enterprise Search With SharePoint Portal Server V2 Steve Tullis, Program Manager, Business Portal Group 3/5/2003.
LAS & NVODS S.Hankin -- Sep NVODS and the Live Access Server (LAS) Steve Hankin, PI (NOAA/PMEL) Jon Callahan (U of WA/JISAO) Ansley Manke (NOAA/PMEL)
OPeNDAP Present and Future An Overview Encompassing Current Projects & Potential New Directions Dave Fulker and James Gallagher.
The HDF Group HDF/HDF-EOS Workshop XIV1 Easy Access of NASA HDF data via OPeNDAP Kent Yang and Joe Lee The HDF Group September 28,2010.
Unidata TDS Workshop THREDDS Data Server Overview October 2014.
Introduction Downloading and sifting through large volumes of data stored in differing formats can be a time-consuming and sometimes frustrating process.
OPeNDAP and the Data Access Protocol (DAP) Original version by Dave Fulker.
VrRBO with THREDDS data store. Paths & URLs THREDDS server THREDDS data directory.
Quick Unidata Overview NetCDF Workshop 25 October 2012 Russ Rew.
Implementation of Model Data Interoperability for IOOS: Successes and Lessons Learned Rich Signell USGS Woods Hole, MA / NOAA Silver Spring USA Model Data.
The HDF Group ESIP Summer Meeting HDF OPeNDAP update Kent Yang The HDF Group 1 July 8 – 11, 2014.
Unidata’s TDS Workshop TDS Overview – Part II October 2012.
DM_PPT_NP_v01 SESIP_0715_AJ HDF Product Designer Aleksandar Jelenak, H. Joe Lee, Ted Habermann Gerd Heber, John Readey, Joel Plutchak The HDF Group HDF.
Unidata TDS Workshop TDS Overview – Part I XX-XX October 2014.
1 OPeNDAP/ECHO Demo Integrating and Chaining services September, 2006 CEOS WGISS 22 Annapolis, MD.
THREDDS Data Server Ethan Davis GEOSS Climate Workshop 23 September 2011.
Weathertop Consulting, LLC Wednesday, January 14, 2009 IIPS 11A.2 1 A General Purpose System for Server-side Analysis of Earth Science Data Roland Schweitzer.
ATMOSPHERIC SCIENCE DATA CENTER ‘Best’ Practices for Aggregating Subset Results from Archived Datasets Walter E. Baskin 1, Jennifer Perez 2 (1) Science.
Mid-Course Review: NetCDF in the Current Proposal Period Russ Rew
Accomplishments and Remaining Challenges: THREDDS Data Server and Common Data Model Ethan Davis Unidata Policy Committee Meeting May 2011.
The HDF Group ESIP Summer Meeting HDF Studio John Readey The HDF Group 1 July 8 – 11, 2014.
Integrated Model Data Management S.Hankin ESMF July ‘04 Integrated data management in the ESMF (ESME) Steve Hankin (NOAA/PMEL & IOOS/DMAC) ESMF Team meeting.
HDF5 OPeNDAP Project Update and Demo MuQun Yang and Hyo-Kyung Lee (The HDF Group) James Gallagher (OPeNDAP, Inc.) 1HDF and HDF-EOS Workshop XII10/17/2008.
Integrating netCDF and OPeNDAP (The DrNO Project) Dr. Dennis Heimbigner Unidata Go-ESSP Workshop Seattle, WA, Sept
DAP4 James Gallagher & Ethan Davis OPeNDAP and Unidata.
Unidata TDS Workshop THREDDS Data Server Overview
OPeNDAP Developer Meeting Summary Session Friday Feb 23 OPeNDAP Developer Meeting Friday - AM session.
Recent developments with the THREDDS Data Server (TDS) and related Tools: covering TDS, NCML, WCS, forecast aggregation and not including stuff covered.
Efficient RDF Storage and Retrieval in Jena2 Written by: Kevin Wilkinson, Craig Sayers, Harumi Kuno, Dave Reynolds Presented by: Umer Fareed 파리드.
Unidata’s Common Data Model and the THREDDS Data Server John Caron Unidata/UCAR, Boulder CO Jan 6, 2006 ESIP Winter 2006.
Unidata’s TDS Workshop TDS Overview – Part I July 2011.
Remote Data Access with OPeNDAP Dr. Dennis Heimbigner Unidata netCDF Workshop October 25, 2012.
CCGrid, 2012 Supporting User Defined Subsetting and Aggregation over Parallel NetCDF Datasets Yu Su and Gagan Agrawal Department of Computer Science and.
ESIP Vision: “Achieve a sustainable world” by Serving as facilitator and advisor for the Earth science information community Promoting efficient flow of.
1 NASA CEOP Final Summary CEOS WGISS-26 Boulder, Colorado September 23, 2008 Yonsook Enloe
Ed Armstrong – PI Luca Cinquini Chris Mattmann NASA Jet Propulsion Laboratory Frank O’Brien Zach Siegrist System Science Applications, Inc. 18 July 2012.
HDF5 OPeNDAP Project Update and Demo MuQun Yang and Hyo-Kyung Lee (The HDF Group) James Gallagher (OPeNDAP, Inc.) 1HDF and HDF-EOS Workshop XII, Aurora,
Weathertop Consulting, LLC Server-side OPeNDAP Analysis – Concrete steps toward a generalized framework via a reference implementation using F-TDS Roland.
Sun Earth Connection Distributed Data Services Presented at the Principle Investigator's Meeting NASA's Applied Information Systems Research Program 5.
End-to-End Data Services A Few Personal Thoughts Unidata Staff Meeting 2 September 2009.
1 2.5 DISTRIBUTED DATA INTEGRATION WTF-CEOP (WGISS Test Facility for CEOP) May 2007 Yonsook Enloe (NASA/SGT) Chris Lynnes (NASA)
Update on Unidata Technologies for Data Access Russ Rew
Unidata Infrastructure for Data Services Russ Rew GO-ESSP Workshop, LLNL
HDF5 OPeNDAP Project Update and Demo MuQun Yang and Hyo-Kyung Lee (The HDF Group) James Gallagher (OPeNDAP, Inc.) 1HDF and HDF-EOS Workshop XII, Aurora,
ESIP Vision: “Achieve a sustainable world” by Serving as facilitator and advisor for the Earth science information community Promoting efficient flow of.
Advancing netCDF-CF for the Geoscience Community
DAP+NETCDF Using the netCDF-4 Data Model
James Gallagher OPeNDAP
Plans for an Enhanced NetCDF-4 Interface to HDF5 Data
Improving Data Access, Discovery, and Usability
Remote Data Access Update
Access HDF5 Datasets via OPeNDAP’s Data Access Protocol (DAP)
Accessing Remote Datasets through the netCDF interface.
OPeNDAP: Accessing Data in a Distributed, Heterogeneous Environment
Dispatch Layer and the NetCDF Architecture
GENI Global Environment for Network Innovation
Adapting an existing web server to S3
Presentation transcript:

Toward Rich, User- Defined Aggregation & Subset-Selection Services Dave Fulker, President, OPeNDAP, Inc ESIP Summer Meeting on 9-12 July 2013 Thursday 3:30 PM Session Aggregation / Subsetting — What’s It To You

2 A Thin Slice of Subset-Selection History ✦ ‘70s — Relational data bases demonstrated Data access [data model + operations] ✴ Ops included select-by-value subsetting ✴ But without array types (generally) ✦ ‘80s-‘90s — CDF, NetCDF, HDF ✴ Non-tabular data models (arrays / select-by-index) ✴ But without select-by-value operations ✦ Mid ‘90s — DAP2 (DODS ➔ OPeNDAP) ✴ Remote access (a Web service) ✴ Most often employed via NetCDF API

3 Why Arrays Merited Sacrifice of Select-by-Value ✦ N-dim arrays are natural in key cases ✴ Imagery (satellites, radars, telescopes...) ✴ Simulations on rectangular meshes ✦ Arrays often yield great efficiencies ✴ Store/retrieve/subset without searching/testing ✴ Ideal for derivative & image-processing ops ✦ Furthermore... ✴ What does select-by-value mean?

4 Note: OPeNDAP Actually Offers Both Forms of Subset Selection ✦ DAP2 (1993) embraced “sequences” ✴ With select-by-value subset creation ✦ DAP2 distinguished these from arrays, where selection occurs by index ✴ Exception: bounding-box constraints may be applied to “coordinate-variable” arrays ✦ DAP sequences are relatively rare ✴ Partly because netCDF API ignores them ✴ Notable exceptions: ERDAP & JGOFS

5 A Thin Slice of (NetCDF) Aggregation History ✦ late ‘80s — Unidata discussions (incl ideas about Unix-style data filters) → NetCDF ✦ late ‘90‘s — Zender’s NetCDF Operators included “concatenation” ✦ ‘00s — THREDDS virtual aggregation ✴ Defined to be a server configuration (employing NCML), akin to concatenation ✴ Adopted in other DAP-based servers ✴ Configured by providers, not users

6 NCML-Style (Virtual) Aggregation Is Ideal when ✦ The results are “natural” E.g., time is a coordinate rather than something encoded in (obtuse) granule names ✦ Multi-granule access enhances usability Easy to subset in desired ways Does not lead to excessive delays, etc. ✦ Providers can’t go wrong...

7...but NCML-Based Aggregation by Providers Is Less than Ideal... ✦ If granules don’t “line up” naturally E.g., swath data might be aggregated in various ways, none satisfying all users ✦ If cross-granule access is slow/costly E.g., retrieval might involve per-granule overhead such that latencies accumulate ✦ Lesson: abstraction is nice; concrete is hard

8 Idea: Empower Users for Subset Selection & Aggregation ✦ Even in hard cases, users may know how & whether granules should be aggregated ✦ Users may know how they’d like to receive results of test-by-value operations ✦ Service-invocation protocols could allow (beyond subset selection) — ✴ A rich set of pre-retrieval operations or even workflows

9 OPeNDAP ✴ Is Pursuing ✴ with Unidata & NOAA/PMEL ✦ Funds (NSF & NASA) to explore: A rich set of pre-retrieval server functions (beyond index-based subset selection) A protocol/language for invoking these ✦ Success would augment current efforts (NOAA-funded) to complete DAP4

10 Final Thoughts ✦ Pre-retrieval ops will be of increasing value (data-proximate computation…) ✦ Invocation language is key to success ✴ Must enable community-driven extension ✴ Aggregation expressions must refer to multiple granules ✦ There are significant challenges ✴ Granule/subset selection often requires iteration ✴ Operations affect provenance & other metadata ✴ Should avoid ops that increase xfer volumes ✴ Async/cached responses may be required...

11 Thank You ✦ OPeNDAP site — ✦ Dave Fulker