Aggregating Gridded Data
  Aggregating time points: tens of thousands of data files, each holding sst[latitude][longitude], become one virtual dataset: sst[time][latitude][longitude].
  Aggregating variables: many files with one variable per file become one virtual dataset with all variables.
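The time aggregation above can be sketched in a few lines of Python. This is only an illustration of the idea, not how ERDDAP does it: the `files` dictionary, its dates, and its values are made-up stand-ins for real data files, and `aggregate_time` is a hypothetical helper.

```python
from datetime import date

# Hypothetical stand-ins for per-file grids: each "file" holds
# sst[latitude][longitude] for one time point. All values are made up.
files = {
    date(2012, 8, 12): [[20.1, 20.4], [21.0, 21.3]],
    date(2012, 8, 11): [[19.9, 20.2], [20.8, 21.1]],
    date(2012, 8, 13): [[20.3, 20.6], [21.2, 21.5]],
}

def aggregate_time(files_by_time):
    """Stack 2-D grids along a new leading time axis, ordered by time."""
    times = sorted(files_by_time)
    sst = [files_by_time[t] for t in times]  # sst[time][latitude][longitude]
    return times, sst

times, sst = aggregate_time(files)
print(len(sst), "time points;", "sst[1][0][1] =", sst[1][0][1])
```

The user then sees one dataset with a time axis instead of three separate files.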
Subsetting Gridded Data
  OPeNDAP projection constraints: sst[57:57][121:2:141][163:2:183]
  ERDDAP: sst[(2012-08-12)][(20):2:(40)][(-140):2:(-120)]
  Huge time-saver: the user can request just what she needs (perhaps 1% of the dataset).
  Aggregated datasets need to be subset-able.
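To make the index form concrete, here is a minimal sketch that parses a projection constraint into Python slices. It assumes the OPeNDAP forms [index], [start:stop], and [start:stride:stop] with an inclusive stop; `parse_projection` is a hypothetical helper, not part of any library, and it does not handle ERDDAP's parenthesized domain values.

```python
import re

def parse_projection(constraint):
    """Parse 'var[start:stride:stop]...' into (name, [slice, ...]).

    OPeNDAP stop indices are inclusive, so stop + 1 is used for Python slices.
    """
    name = constraint.split("[", 1)[0]
    slices = []
    for part in re.findall(r"\[([^\]]*)\]", constraint):
        nums = [int(n) for n in part.split(":")]
        if len(nums) == 1:        # [index]
            start, stride, stop = nums[0], 1, nums[0]
        elif len(nums) == 2:      # [start:stop]
            start, stride, stop = nums[0], 1, nums[1]
        else:                     # [start:stride:stop]
            start, stride, stop = nums
        slices.append(slice(start, stop + 1, stride))
    return name, slices

name, (t, lat, lon) = parse_projection("sst[57:57][121:2:141][163:2:183]")
# Count the selected indices on an axis assumed to have 1000 points.
n_lat = len(range(*lat.indices(1000)))
print(name, t, lat, lon, n_lat)
```

For this request, the time slice selects one index and each spatial slice selects every second index in its range.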
Aggregating In-Situ and Tabular Data
  The data is a database-like table with rows and columns.
  E.g., one file has data for one buoy for one month.
  It isn't a multi-dimensional grid. There are no dimensions.
  Aggregating features and time points:
    Features: stations, trajectories, profiles, ...
    Append everything into one giant virtual table.
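A minimal sketch of the "append into a giant virtual table" idea, assuming each file has already been read into a list of rows. The station names, times, and values are invented, and `aggregate_tables` is a hypothetical helper.

```python
from itertools import chain

# Hypothetical per-file tables: one buoy for one month each (values made up).
buoy_a_aug = [
    {"station": "A", "time": "2012-08-01", "sst": 18.2},
    {"station": "A", "time": "2012-08-02", "sst": 18.4},
]
buoy_a_sep = [{"station": "A", "time": "2012-09-01", "sst": 17.9}]
buoy_b_aug = [{"station": "B", "time": "2012-08-01", "sst": 24.6}]

def aggregate_tables(*tables):
    """Append the rows of every per-file table into one table."""
    return list(chain.from_iterable(tables))

virtual_table = aggregate_tables(buoy_a_aug, buoy_a_sep, buoy_b_aug)
print(len(virtual_table), "rows from 3 files")
```

Both features (stations A and B) and time points (August, September) end up as rows of the same table.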
Subsetting In-Situ and Tabular Data
  OPeNDAP selection constraints (no indices, because there are no multi-dimensional grids):
    longitude,latitude,time,sst&sst>35
  Easy to create. Uses domain units (degC). Very flexible. (Based on a database's SQL SELECT.)
  Huge time-saver: the user can request just what she needs (perhaps 1% of the dataset).
  Aggregated datasets need to be subset-able.
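The query above reads like a SQL SELECT: a column list, then constraints joined by &. The sketch below applies such a query to an in-memory table. It is an assumption-laden illustration, not a server implementation: `select` is a hypothetical helper, it handles numeric constraints only, and the rows are made up.

```python
import operator
import re

# Comparison operators supported by this sketch (numeric values only).
OPS = {">=": operator.ge, "<=": operator.le, "!=": operator.ne,
       ">": operator.gt, "<": operator.lt, "=": operator.eq}

def select(rows, query):
    """Apply 'col1,col2&name>value&...' to a list of row dicts."""
    columns_part, *constraints = query.split("&")
    columns = columns_part.split(",")
    tests = []
    for c in constraints:
        name, op, value = re.fullmatch(r"(\w+)(>=|<=|!=|>|<|=)(.+)", c).groups()
        tests.append((name, OPS[op], float(value)))
    return [
        {col: row[col] for col in columns}          # keep requested columns
        for row in rows
        if all(op(row[name], value) for name, op, value in tests)
    ]

# Hypothetical rows; values are made up.
rows = [
    {"station": "A", "longitude": -130.0, "latitude": 25.0, "time": "2012-08-12", "sst": 36.1},
    {"station": "B", "longitude": -125.0, "latitude": 30.0, "time": "2012-08-12", "sst": 22.4},
    {"station": "C", "longitude": -122.0, "latitude": 35.0, "time": "2012-08-12", "sst": 35.5},
]
hot = select(rows, "longitude,latitude,time,sst&sst>35")
print(hot)
```

The constraint is written in domain units (degC), and no row or column indices appear anywhere in the request.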
Don't Treat In-Situ/Tabular Data Like Gridded Data
  CF DSG stores in-situ data as gridded .nc files. Fine for storage, not for subsetting.
  Problem: Indices aren't domain units. How do you request sst>35 with indices?
  Problem: Indices aren't a real-world sequence.
    Grid: lat is a sequence. lat[42:53] has meaning.
    Table: buoy number isn't. &lat>20&lat<40 is buoys #2, 14, 26, 109, not buoy[42:53].
  Problem: 5 CF DSG data structures.
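A small sketch of the second problem. On a grid, a latitude range maps to one contiguous index range; for buoys listed in buoy-number order, the same range picks scattered indices. The latitudes below are made up, and `is_contiguous` is a hypothetical helper.

```python
def is_contiguous(indices):
    """True if the indices form one unbroken run, e.g. [5, 6, 7]."""
    return all(b - a == 1 for a, b in zip(indices, indices[1:]))

# Grid: the latitude axis is an ordered sequence (0, 5, ..., 60 degrees).
grid_lat = list(range(0, 61, 5))
grid_hits = [i for i, lat in enumerate(grid_lat) if 20 < lat < 40]

# Table: hypothetical buoy latitudes in buoy-number order (values made up).
buoy_lat = [55.0, 12.5, 33.0, 48.1, 27.4, 61.2, 5.0, 39.9]
buoy_hits = [i for i, lat in enumerate(buoy_lat) if 20 < lat < 40]

print("grid:", grid_hits, is_contiguous(grid_hits))
print("buoys:", buoy_hits, is_contiguous(buoy_hits))
```

The grid result can be written as a single start:stop range; the buoy result cannot, which is why a value-based constraint fits tabular data better.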
Option: Treat Gridded Data Like Tabular Data
  Standard request: time, lat, lon bounding box.
  What about unusual requests of gridded data, e.g., SST>35 ("select by value")?
  ERDDAP's EDDTableFromEDDGrid creates a giant virtual table from a gridded dataset.
    Columns: longitude, latitude, time, sst
    Query: e.g., longitude,latitude,time,sst&sst>35
    Response: a table (one data point per row)
  Risk: huge effort for the server.
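The grid-to-table idea can be sketched as follows. This is not EDDTableFromEDDGrid's actual implementation, only an illustration under simple assumptions: `grid_to_rows` is a hypothetical helper and the axis values and temperatures are made up.

```python
def grid_to_rows(times, lats, lons, sst):
    """Lazily yield one table row per grid cell of sst[time][lat][lon]."""
    for ti, t in enumerate(times):
        for yi, lat in enumerate(lats):
            for xi, lon in enumerate(lons):
                yield {"longitude": lon, "latitude": lat,
                       "time": t, "sst": sst[ti][yi][xi]}

# Hypothetical 1 x 2 x 2 grid; values are made up.
times = ["2012-08-12"]
lats = [20, 22]
lons = [-140, -138]
sst = [[[34.0, 35.5],
        [36.2, 30.1]]]

# "Select by value": the equivalent of ...&sst>35
hot = [row for row in grid_to_rows(times, lats, lons, sst) if row["sst"] > 35]
cells_scanned = len(times) * len(lats) * len(lons)
print(hot, "after scanning", cells_scanned, "cells")
```

Every cell has to be visited to answer a by-value query, which is where the risk of a huge effort for the server comes from on a large grid.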
Summary: Huge Advantages of Aggregation and Subsetting
  Users can find and deal with one aggregated dataset.
  Users can make one subset request to one aggregated dataset.
    Grids: use indices to get a temporal and spatial subset.
    Tables (selection constraints): any subset you want.
    (Not: one subset request to each unaggregated file, or worse, using FTP to download lots of entire files.)
  Don't treat tabular/in-situ data like gridded data.
Aggregation and Subsetting in ERDDAP (a middleman data server)
  http://coastwatch.pfeg.noaa.gov/erddap
  Bob Simons, NOAA NMFS SWFSC ERD