DAP, ERDDAP, and Tabular (Sequence) Datasets Try it: Bob Simons NOAA NMFS SWFSC ERD OBIS SOS Custom DAP ERDDAP...


1 DAP, ERDDAP, and Tabular (Sequence) Datasets
  Try it:
  Bob Simons, NOAA NMFS SWFSC ERD
  [Diagram labels: OBIS, SOS, Custom, DAP, ERDDAP, ..., Database, Files; ERDDAP; Your Favorite Client Software]

2 My Goals for this Presentation
  1. Tell you more about ERDDAP.
  2. Raise awareness and appreciation of tabular data.
  3. Convince you that tabular datasets are best served as DAP sequences, and that serving them in DAP as 1D or 2D gridded datasets is a bad idea. (This has nothing to do with how they are stored.)
  Bonus: 3 powerful ideas:
  1. Abstractions (capture the essence; hide the instance details)
  2. Representations (different file formats)
  3. Reusability (value is multiplied)

3 1) ERDDAP

4

5

6 ERDDAP Features
  (Re)serves diverse local and remote datasets.
    Abstraction: thanks to DAP, the source differences are hidden.
  Serves gridded and tabular datasets.
  Offers a unified place to search for datasets.
    Full-text, category-based, or advanced.
  Encourages improved metadata.
    So users can understand the dataset.
  Offers a standard way to request data from any dataset.
    For humans: forms on web pages.
    For computers: DAP, WMS, (SOS) web services.
  Offers a choice of response file formats.
    Different representations.
  Standardizes time formats. (Here, different representations are trouble.)
    As Strings - ISO 8601:2004(E), e.g., T20:00:00Z
    As numbers - seconds since 1970-01-01T00:00:00Z
  Is reusable.
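The two time representations named on the slide above can be converted into each other. The following is a minimal sketch, assuming UTC timestamps written with a trailing "Z" and the Unix epoch (1970-01-01T00:00:00Z) as the numeric reference; the function names are illustrative and are not part of ERDDAP.

```python
from datetime import datetime, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # ISO 8601 with a literal "Z" for UTC


def iso_to_epoch_seconds(text: str) -> float:
    """Parse an ISO 8601 UTC string and return seconds since the epoch."""
    parsed = datetime.strptime(text, ISO_FORMAT).replace(tzinfo=timezone.utc)
    return parsed.timestamp()


def epoch_seconds_to_iso(seconds: float) -> str:
    """Format seconds since the epoch as an ISO 8601 UTC string."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime(ISO_FORMAT)


if __name__ == "__main__":
    print(iso_to_epoch_seconds("2000-01-01T00:00:00Z"))
    print(epoch_seconds_to_iso(946684800))
```

Storing one canonical form and converting at the edges is what keeps the "different representations" from becoming trouble.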

7 2) Tabular Data

8 Tabular Datasets
  Tabular data sources: databases, OBIS, SOS, CSV files, flat .nc files, CF DSG .nc files, ...
  Geospatial: CF Discrete Sampling Geometry (DSG) feature types
    Point: whale sightings
    Profile: disposable CTD
    TimeSeries: moored buoy
    TimeSeriesProfile: CTD
    Trajectory: ship
    TrajectoryProfile: profiling glider
  Non-geospatial: laboratory data, references, fish disease lists, ecosystem (what eats what), ...
  Larry Ellison is rich because databases are reusable for numerous types of data.

9 (ERD)DAP Data Requests: Gridded vs. Tabular Datasets
  Gridded Datasets (DAP projection constraints)
    DAP: ?temperature[437][46:1:162][122:282]
    ERDDAP: ?temperature[( )][(22):(51)][(-145):(-105)]
  Tabular Datasets (DAP selection constraints)
    DAP: ?s.id,s.owner,s.time,s.latitude,s.longitude,s.wtemp&s.id="sp031"&s.time>=
    ERDDAP: ?id,owner,time,latitude,longitude,wtemp&id="sp031"&time>=
  Example table:
    id     owner  type       time        latitude  longitude  wtemp  atmp
    46088  NDBC   3m Discus  T14:20:00Z
           NDBC   3m Discus  T14:50:00Z
    SANF1  SFSU   C-MAN      T16:00:00Z
    SANF1  SFSU   C-MAN      T17:00:00Z
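The two request styles above differ mainly in how the query string is assembled: index (or value) ranges per dimension for grids, variable names plus constraints for tables. Here is a small sketch that builds both kinds of query string. The helper names `grid_query` and `table_query` are hypothetical, and a real client would probably also need to percent-encode the result before sending it.

```python
def grid_query(variable, ranges):
    """Build a DAP projection constraint such as ?temperature[437][46:1:162]."""
    parts = []
    for bounds in ranges:
        # Each dimension is (index,), (start, stop) or (start, stride, stop).
        parts.append("[" + ":".join(str(b) for b in bounds) + "]")
    return "?" + variable + "".join(parts)


def table_query(variables, constraints):
    """Build a DAP selection constraint such as ?id,owner&id="sp031"."""
    query = "?" + ",".join(variables)
    for name, op, value in constraints:
        if isinstance(value, str):
            value = '"' + value + '"'  # string values are double-quoted
        query += f"&{name}{op}{value}"
    return query


if __name__ == "__main__":
    print(grid_query("temperature", [(437,), (46, 1, 162), (122, 282)]))
    print(table_query(["id", "owner", "wtemp"],
                      [("id", "=", "sp031"), ("latitude", ">=", 22)]))
```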

10 (ERD)DAP Sequence Requests vs. Database SQL Requests
  (ERD)DAP: ?id,owner,type,time,latitude,longitude,wtemp&id="46088"&time>=
  SQL: SELECT id,owner,type,time,latitude,longitude,wtemp FROM s WHERE id="46088" AND time>=
  Pablo Picasso: "Good artists copy, great artists steal."
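Because the two request forms line up so closely, a selection constraint can be translated to SQL almost mechanically. The sketch below is illustrative only: `dap_to_sql` is a hypothetical helper, it handles just the simple comparison operators, and the table contents are made-up values in an in-memory SQLite database. It is not how ERDDAP itself talks to databases.

```python
import re
import sqlite3

# One constraint: a name, a comparison operator, then the value.
CONSTRAINT = re.compile(r"^(\w+)(>=|<=|!=|=|>|<)(.*)$")


def dap_to_sql(query, table):
    """Translate '?a,b&a="x"&b>=1' into a parameterized SELECT statement."""
    variables, *constraints = query.lstrip("?").split("&")
    columns = variables.split(",")
    if not all(column.isidentifier() for column in columns):
        raise ValueError("unexpected column name")
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    clauses, params = [], []
    for constraint in constraints:
        match = CONSTRAINT.match(constraint)
        if match is None:
            raise ValueError(f"cannot parse constraint: {constraint}")
        name, op, value = match.groups()
        clauses.append(f"{name} {op} ?")
        # Quoted values are strings; everything else is treated as a number.
        if value.startswith('"') and value.endswith('"'):
            params.append(value[1:-1])
        else:
            params.append(float(value))
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


# Made-up sample rows, only to show the round trip.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE s (id TEXT, owner TEXT, wtemp REAL)")
connection.executemany("INSERT INTO s VALUES (?, ?, ?)",
                       [("46088", "NDBC", 10.5), ("SANF1", "SFSU", 12.0)])

sql, params = dap_to_sql('?id,owner&id="46088"&wtemp>=10', "s")
rows = connection.execute(sql, params).fetchall()
print(sql, params, rows)
```

Passing the values as parameters rather than pasting them into the SQL text avoids quoting problems and SQL injection.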

11 Related Tables vs. One Table
  One table (Join, Denormalized):
    id     owner  type       latitude  longitude  time        wtemp  atmp
    46088  NDBC   3m Discus                       T14:20:00Z
    46088  NDBC   3m Discus                       T14:50:00Z
    NC312  NCSU   C-MAN                           T16:00:00Z
    NC312  NCSU   C-MAN                           T17:00:00Z
  Related tables (Normalized):
    Observation Table
      id     time        wtemp  atmp
      46088  T14:20:00Z
      46088  T14:50:00Z
      NC312  T16:00:00Z
      NC312  T17:00:00Z
    Buoy Table
      id     owner  type       latitude  longitude
      46088  NDBC   3m Discus
             NDBC   6m Discus
      BP114  BP     3m Discus
      NC312  NCSU   C-MAN
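The join that turns the two related tables into one table can be sketched in a few lines. The buoy ids, owners and types below come from the slide; the full timestamps and temperatures are made-up illustrative values, and `denormalize` is a hypothetical helper name.

```python
# Buoy table: one row per buoy, keyed by id.
buoys = {
    "46088": {"owner": "NDBC", "type": "3m Discus"},
    "NC312": {"owner": "NCSU", "type": "C-MAN"},
}

# Observation table: many rows per buoy (illustrative values).
observations = [
    {"id": "46088", "time": "2011-01-01T14:20:00Z", "wtemp": 10.1},
    {"id": "46088", "time": "2011-01-01T14:50:00Z", "wtemp": 10.2},
    {"id": "NC312", "time": "2011-01-01T16:00:00Z", "wtemp": 18.4},
]


def denormalize(buoys, observations):
    """Join each observation with its buoy's metadata into one flat row."""
    table = []
    for observation in observations:
        buoy = buoys[observation["id"]]
        # id first, then the buoy columns, then the observation columns.
        table.append({"id": observation["id"], **buoy, **observation})
    return table


table = denormalize(buoys, observations)
for row in table:
    print(row)
```

The buoy columns repeat on every row of the joined table, which is the price of presenting a single, simple structure to the user.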

12 Yeah, but why doesn't ERDDAP support nested sequences?
  It does, but just internally.
  ERDDAP (re)presents the dataset as a single table.
  One table is an abstraction. It hides details.
  The average user understands a table.
  One vs. many tables: just different representations.
  This lets all tabular datasets have the same structure.
  The result of a DAP or SQL query is always one table.
  There are many file format representations of one table.

13 3) Tabular datasets are best served as DAP sequences. (Why DAP Sequences Rock!) And that serving them in DAP as 1D or 2D gridded datasets is a bad idea. (This has nothing to do with how they are stored.)

14 Why Sequences Rock! Reason #1
  If the data is coming from a relational database, OBIS, or SOS, the dataset can't be served as a gridded dataset.
    There are no index (row) numbers.
    It isn't easy/possible to know how many rows there are.
    The order of the rows may change at any time.
    New rows are added as new data arrives: frequently.

15 Why Sequences Rock! Reason #2
  Serving tabular data in DAP as 1D or 2D gridded datasets is a bad idea.
  Logic:
    Men: mortal. Socrates: man. Socrates: mortal.
    Grids: handled well by DAP. Treat table as: grid. Treat table as grid: handled well?
  Grid dimensions usually represent a physical continuum.
    DAP: ?temperature[408:437][46:1:162][122:282]
    ERDDAP: ?temperature[( ):( )][(22):(51)][(-145):(-105)]
  No arrangement of tabular dataset dimensions works well.
    2D [buoy][time]: buoy is not a continuum, and time leads to wasted space.
    1D [time]: fine, but then you need 1000 datasets (1 per buoy).
    1D [row]: aggregated, but row isn't a continuum.
  In every case, it's hard to know which rows to request.
    The rows you want are scattered through the dataset, so you have to either download everything or make numerous requests.
  Serving a DSG file directly: too many formats, too hard to query.
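The "wasted space" in the 2D [buoy][time] arrangement can be made concrete: when buoys report at different times, the time axis must hold every distinct timestamp, so most cells are empty. This is a rough sketch with made-up observation times; `grid_fill_fraction` is a hypothetical helper.

```python
def grid_fill_fraction(observations):
    """Fraction of cells in a [buoy][time] grid that would hold real data."""
    buoys = {buoy for buoy, _ in observations}
    times = {time for _, time in observations}
    cells = len(buoys) * len(times)   # size of the full 2D grid
    filled = len(set(observations))   # cells that have an observation
    return filled / cells if cells else 0.0


# Illustrative (buoy id, time) pairs: each buoy reports on its own schedule.
observations = [
    ("46088", "2011-01-01T14:20:00Z"),
    ("46088", "2011-01-01T14:50:00Z"),
    ("BP114", "2011-01-01T02:00:00Z"),
    ("BP114", "2011-01-01T04:00:00Z"),
    ("NC312", "2011-01-01T16:00:00Z"),
    ("NC312", "2011-01-01T17:00:00Z"),
]

fraction = grid_fill_fraction(observations)
print(f"{fraction:.0%} of the [buoy][time] grid holds data")
```

With three buoys and six distinct times, only 6 of 18 cells are used, and the fraction shrinks further as more buoys with their own schedules are added.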

16 Why Sequences Rock! Reason #3
  DAP sequence requests use the terminology of the dataset. (It's easy.)
    ?id,owner,type,latitude,longitude&distinct()
    ?id,type,latitude,longitude&owner="NDBC"&distinct()
    ?id&latitude>=22&latitude<=51&longitude>=-145&longitude<=-105&distinct()
    ?id&latitude>=22&latitude<=51&longitude>=-145&longitude<=-105&time>= &distinct()
    ?&latitude>=22&latitude<=51&longitude>=-145&longitude<=-105&time>=
  Example table:
    index  id     owner  type       latitude  longitude  time        wtemp  atmp
                  NDBC   3m Discus                       T14:20:00Z
                  NDBC   3m Discus                       T14:50:00Z
           BP114  BP     3m Discus                       T02:00:00Z
           BP114  BP     3m Discus                       T04:00:00Z
           NC312  NCSU   C-MAN                           T16:00:00Z
           NC312  NCSU   C-MAN                           T17:00:00Z
                  NDBC   6m Discus                       T14:20:00Z
                  NDBC   6m Discus                       T14:50:00Z
  Making these requests with index numbers is a difficult (not for Roberto), multi-step programming task. And it's inefficient.
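To show what such requests mean, here is a minimal sketch that evaluates variable lists, comparison constraints and a distinct() step against an in-memory table. The `select` function is hypothetical, the latitude and longitude values are made up for illustration, and results are sorted in the distinct case only to give a stable order.

```python
import operator

OPS = {"=": operator.eq, "!=": operator.ne, ">=": operator.ge,
       "<=": operator.le, ">": operator.gt, "<": operator.lt}


def select(rows, variables, constraints=(), distinct=False):
    """Keep rows that satisfy every constraint; return the named columns."""
    result = []
    for row in rows:
        if all(OPS[op](row[name], value) for name, op, value in constraints):
            record = tuple(row[v] for v in variables)
            if not distinct or record not in result:
                result.append(record)
    return sorted(result) if distinct else result


# Illustrative rows: two observations per buoy (positions are made up).
rows = [
    {"id": "46088", "owner": "NDBC", "latitude": 48.3, "longitude": -123.2},
    {"id": "46088", "owner": "NDBC", "latitude": 48.3, "longitude": -123.2},
    {"id": "BP114", "owner": "BP", "latitude": 27.9, "longitude": -90.0},
    {"id": "BP114", "owner": "BP", "latitude": 27.9, "longitude": -90.0},
    {"id": "NC312", "owner": "NCSU", "latitude": 34.2, "longitude": -77.7},
    {"id": "NC312", "owner": "NCSU", "latitude": 34.2, "longitude": -77.7},
]

# Like ?id&latitude>=22&latitude<=51&longitude>=-145&longitude<=-105&distinct()
in_box = select(rows, ["id"],
                [("latitude", ">=", 22), ("latitude", "<=", 51),
                 ("longitude", ">=", -145), ("longitude", "<=", -105)],
                distinct=True)
print(in_box)
```

The caller names columns and values from the dataset itself and never has to know a row number.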

17 Why Sequences Rock! Reason #4
  Because declarative languages (SQL, DAP selection constraints) let you describe what you want, not how to get it.
    ?id,owner,type,latitude,longitude&distinct()
    ?id,type,latitude,longitude&owner="NDBC"&distinct()
    ?id&latitude>=22&latitude<=51&longitude>=-145&longitude<=-105&distinct()
    ?id&latitude>=22&latitude<=51&longitude>=-145&longitude<=-105&time>= &distinct()
    ?&latitude>=22&latitude<=51&longitude>=-145&longitude<=-105&time>=
  With imperative languages (C, Fortran, Java, Python), you must describe, step-by-step, how to solve the problem.
    1) Request all latitudes.
    2) Filter.
    3) Request all longitudes.
    4) Multiple requests because data is scattered throughout the dataset.
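The imperative steps listed above can be sketched for a 1D [row] arrangement: fetch the full latitude and longitude arrays, filter them client-side, then group the matching row numbers into contiguous runs, each of which needs its own index-range request. The function name and the coordinate values are hypothetical.

```python
def matching_index_runs(latitudes, longitudes, lat_range, lon_range):
    """Return (start, stop) row-index runs whose rows fall inside the box."""
    # Steps 1-3: scan every latitude and longitude, keeping matching rows.
    wanted = [
        index
        for index, (lat, lon) in enumerate(zip(latitudes, longitudes))
        if lat_range[0] <= lat <= lat_range[1]
        and lon_range[0] <= lon <= lon_range[1]
    ]
    # Step 4: merge consecutive row numbers; each run is one more request.
    runs = []
    for index in wanted:
        if runs and index == runs[-1][1] + 1:
            runs[-1][1] = index
        else:
            runs.append([index, index])
    return [tuple(run) for run in runs]


# Illustrative coordinates, two rows per buoy, buoys interleaved by location.
latitudes = [48.3, 48.3, 60.0, 60.0, 34.2, 34.2, 10.0, 10.0]
longitudes = [-123.2, -123.2, -150.0, -150.0, -119.0, -119.0, -120.0, -120.0]

runs = matching_index_runs(latitudes, longitudes, (22, 51), (-145, -105))
print(runs)  # each tuple would become one [start:stop] request
```

Even in this tiny case the wanted rows split into two separate ranges, after two full-array downloads just to find them. The declarative request states the box once and leaves the rest to the server.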

18 Why Sequences Rock! Reason #5
  Because the other options all suck.
  Serving the datasets as grids doesn't work.
    You now understand why, right?
  Serve the data files via FTP.
    Getting a chunk of data is all or nothing.
    Makes user deal with various file formats.
  Custom forms and web services are too much work to make.
    Custom: 6+ months per dataset? Ongoing maintenance. No consistency!
    Reusable: 1 day, minimal maintenance, consistent!
  Give trusted colleagues access to the database or the files.
    That's not making the data public!
  Don't let anyone else use the data.
    This is actually the #1 method of fisheries data distribution.

19 My Goals for this Presentation
  1. Tell you more about ERDDAP.
  2. Raise awareness and appreciation of tabular data.
  3. Convince you that tabular datasets are best served as DAP sequences, and that serving them in DAP as 1D or 2D gridded datasets is a bad idea. (This has nothing to do with how they are stored.)
  Bonus: 3 powerful ideas:
  1. Abstractions (capture the essence; hide the instance details)
  2. Representations (different file formats)
  3. Reusability (value is multiplied)

20 Thank you!

