SSDS: Data for Science A Walkthrough of Proposed SSDS Capabilities 4 April 2002 John Graybeal.


1 SSDS: Data for Science
A Walkthrough of Proposed SSDS Capabilities
4 April 2002
John Graybeal

2 Topics
What you want to hear:
   What data is in SSDS
   How to access data
   How to display data
   How to command instruments
What else you should know:
   How easy to use is it?
   Are we sure the data’s OK?
   –Raw data always available?
   –Is it reliable? Is time right?
   What if there’s a problem?
   –Can we tell what happened?
   –Can we gracefully recover?
   Is data distributable/secure?
   What aren’t you getting?

3 What Data is Available?
1. All data produced by MOOS instruments
   Data is available ‘right away’ if sent to shore, or
   Data could be loaded later, directly from device
2. Other data which has been submitted to SSDS
   Submitted data must follow basic ISI/SSDS guidelines
   Can be brand new (e.g., calibrations), or derived (e.g., from other SSDS data)
3. “Metadata” (descriptive info) about the above
Notes
–SSDS should not replicate external data stores
–Someday could re-process existing MBARI data
–Operational data can also be sent to SSDS and ingested

4 Metadata “Explained”
Metadata is just “data about other data”
–My metadata may be your science data, or vice versa
4 metadata types MOOS will handle (≈static)
–Packet headers (source, timestamp, sequence)
–Packet descriptions (item 1=“Depth”, 2=“Lat”)
–Device (data source) descriptions
–Rich science metadata (status, calibration info)
Everything else is ‘just data’
Wherever possible, we’ll try to keep it simple
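As a rough illustration of the first two metadata types, the sketch below models a packet header and a packet description and uses the description to label raw values. The class names, the `label_values` helper, the device name `ctd-01`, and the sample readings are all assumptions made for this example; the slides do not define a concrete schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class PacketHeader:
    """Per-packet metadata named on the slide: source, timestamp, sequence."""
    source: str
    timestamp: datetime
    sequence: int


@dataclass(frozen=True)
class PacketDescription:
    """Ordered item names: position 1 is "Depth", position 2 is "Lat"."""
    item_names: tuple


def label_values(description, values):
    """Pair each raw value with the item name at the same position."""
    if len(values) != len(description.item_names):
        raise ValueError("packet length does not match its description")
    return dict(zip(description.item_names, values))


# Hypothetical device name and readings, for illustration only.
header = PacketHeader("ctd-01", datetime(2002, 4, 4, tzinfo=timezone.utc), 1)
description = PacketDescription(("Depth", "Lat"))
labeled = label_values(description, (10.5, 36.8))
```

Keeping the description separate from the packets reflects the "≈static" note: the description changes rarely, while headers and values arrive with every packet.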

5 How To Access the Data?
Ask (catalog) for data of interest (search by device, date, data item name, or combination)
Choose a data set (sets?) of interest, click to access
–Probably multiple text formats; what’s important? (ASCII CSV? ODV? netCDF? other?)
–Do you need to monitor or process ‘streaming’ data?
What more advanced features are needed? Desired?
–Displaying same item across multiple data sets?
–Selecting specific items or times within data set?
–Processed data products… Sub-setting or interpolating data by time or item? Averaging? Filtering? …?
–Combining 2 data sets using time as reference?
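A minimal sketch of two ideas from this slide: searching a catalog by device, date, data item name, or a combination, and combining two data sets using time as the reference. `CatalogEntry`, `search`, `combine_on_time`, and the sample entries are hypothetical names and data chosen for illustration, not an SSDS interface.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CatalogEntry:
    device: str
    day: date
    items: tuple      # data item names carried by this data set
    reference: str    # hypothetical pointer to the archived data


def search(catalog, device=None, day=None, item=None):
    """Return entries matching every criterion that was supplied."""
    return [
        e for e in catalog
        if (device is None or e.device == device)
        and (day is None or e.day == day)
        and (item is None or item in e.items)
    ]


def combine_on_time(first, second):
    """Join two {timestamp: value} series on the timestamps they share."""
    shared = sorted(first.keys() & second.keys())
    return [(t, first[t], second[t]) for t in shared]


# Made-up catalog contents and readings.
catalog = [
    CatalogEntry("ctd-01", date(2002, 4, 4), ("Depth", "Lat"), "ref-1"),
    CatalogEntry("ctd-01", date(2002, 4, 5), ("Depth",), "ref-2"),
    CatalogEntry("gps-01", date(2002, 4, 4), ("Lat",), "ref-3"),
]
hits = search(catalog, device="ctd-01", item="Lat")
merged = combine_on_time({0: 10.5, 60: 11.0}, {60: 36.8, 120: 36.9})
```

This join keeps only exactly matching timestamps. Interpolating between samples, which the slide also raises as a question, would need a different approach.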

6 How to Display the Data?
Basic plots will be available via web interface
–Quick look in the truest sense
–We don’t want to create yet another plotting program
Data will be available to existing tools
–Minimum capability is usable files (ASCII, netCDF, ?)
–Ideal is to embed SSDS data access directly into tools
   In this model, software within Matlab (for example) can open anything in the archive
   Browsing from within application would be a big plus
   Some (many) tools may do this for free; others we can ‘help’
Before discussing further, you should understand the way we want SSDS (and MOOS) to work
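The "usable files" minimum could be as simple as plain ASCII CSV that existing tools already read. The sketch below writes timestamped rows to CSV text with Python's standard `csv` module. The `to_csv` helper, the column layout, and the sample row are assumptions for illustration only.

```python
import csv
import io


def to_csv(item_names, rows):
    """Render (timestamp, values) rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["time", *item_names])
    for timestamp, values in rows:
        writer.writerow([timestamp, *values])
    return buffer.getvalue()


# Made-up sample row; item names follow the "Depth"/"Lat" example.
csv_text = to_csv(("Depth", "Lat"), [("2002-04-04T00:00:00Z", (10.5, 36.8))])
```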

7 MOOS Data Architecture
[Diagram. Labels: Devices; Observing Platform; Shore Side Data System; User Applications (User Tools); Communications; Archiving; Cataloging; Data Presentation; Applications/Interfaces; Ocean Side; Shore Side]

8 How to Access Instrument (by the way, it’s not an SSDS task)
[Diagram. Labels: Devices; Observing Platform; Shore Side Data System; User Applications (User Tools); Communications; Archiving; Cataloging; Data Presentation; Applications/Interfaces; Ocean Side; Shore Side]

9 How Data Access Works
[Diagram. Labels: Devices; Observing Platform; Shore Side Data System; User Applications (User Tools); Communications; Archiving; Cataloging; Data Presentation; Applications/Interfaces; Ocean Side; Shore Side]

10 How Data Access Works
1. SSDS automatically notified of instrument information
   –Instrument qualification and installation on MOOS
   –Instrument configuration (default settings, changes)
   –Data record descriptions (syntactic and semantic)
   –Arrival of new data records
2. SSDS automatically catalogs, archives all arriving data
3. Users search catalog for data of interest
   –References to archived data returned with search results
   –Source data can be accessed via the references
4. User can then view (or subscribe to?) the source data
   –Various formats provided, including basic plots
   –Connections to advanced presentation packages supported
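The four steps above can be sketched as a small in-memory model: a device's record description is registered, arriving records are archived and cataloged, a search returns references, and the references are used to fetch source data. The class `ShoreSideSketch`, its method names, and the reference format are assumptions for illustration, not the proposed SSDS design.

```python
class ShoreSideSketch:
    """Toy model of notify -> catalog/archive -> search -> access."""

    def __init__(self):
        self._descriptions = {}  # device -> record description (step 1)
        self._archive = {}       # reference -> raw record (step 2)
        self._catalog = []       # (device, reference) pairs (step 2)

    def register(self, device, description):
        """Step 1: the system is told about a device and its records."""
        self._descriptions[device] = description

    def ingest(self, device, record):
        """Step 2: archive an arriving record and catalog a reference."""
        if device not in self._descriptions:
            raise KeyError(f"no description registered for {device}")
        reference = f"{device}/{len(self._catalog)}"
        self._archive[reference] = record
        self._catalog.append((device, reference))
        return reference

    def search(self, device):
        """Step 3: return references to archived data, not the data."""
        return [ref for dev, ref in self._catalog if dev == device]

    def fetch(self, reference):
        """Step 4: access the source data through a reference."""
        return self._archive[reference]


# Hypothetical device and readings.
store = ShoreSideSketch()
store.register("ctd-01", ("Depth", "Lat"))
store.ingest("ctd-01", (10.5, 36.8))
store.ingest("ctd-01", (11.0, 36.9))
references = store.search("ctd-01")
latest = store.fetch(references[-1])
```

Returning references from the search, rather than the records themselves, mirrors step 3: the catalog stays small, and the archive is only read when a user asks for the source data.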

11 Topics
What you want to hear:
   What data is in SSDS
   How to access data
   How to display data
   How to command instruments
What else you should know:
   How easy to use is it?
   Are we sure the data’s OK?
   –Raw data always available?
   –Is it reliable? Is time right?
   What if there’s a problem?
   –Can we tell what happened?
   –Can we gracefully recover?
   Is data distributable/secure?
   What aren’t you getting?

12 How easy to use is it?
The Hard Part I: Providing ISI instrument drivers
–Templates should be available, useful for most devices
The Hard Part II: Describe your data streams
–Must define instrument data streams before deploying
–Even this can be easy (define your data as a “blob”; but…)
Steps to get data should be pretty easy (1-step?)
–Find it in catalog (may be many items with similar names)
–Ask for it in your favorite basic format
–Plug it in to your favorite application
MOOS/ISI/SSDS makes many things simple
–Timestamps: synchronous, reliable, available
–Data transfer, archive, backup all handled automatically
–Operational relationships (particularly location) tracked

13 Are we sure the data is OK?
Raw data always available?
–The system is designed around this core concept
–Even if SSDS dies, raw data won’t go away
Is data reliable (what you see is what was sent)?
–Same software for ALL data communication and management -- excellent reliability, less work
Is time base correct for the data?
–Uniform time base for all MOOS/ISI components
–Of course, you have to send data via ISI data paths
   If you keep it in the instrument, ISI can’t timestamp it

14 What if there’s a problem?
Can we tell what happened (and avoid it)?
–Certain systematic information will be available
   Other data arrivals from device/platform/observatory
   Indications of instrument events, reconfigurations
–Operational data can be sent and maintained
   Transfer rates, connection reliability, power status
   Systemic events and errors
Can we gracefully recover? Yes! (within reason)
–All the transferred raw data is kept in SSDS
–All the instrument’s raw data is saved on wet side
–System designed for graceful data (re)processing

15 Is data distributable? Is data secure?
Request: Give colleagues access to ‘my’ data
–Model A: Everyone has access to all data (w/fuzz)
–Model B: MBARI Internal vs MBARI External
   Option 1: Make ‘your’ data available externally
   Option 2: Bring them to MBARI
   Option 3: Send them a report of your data
–Model C: Configurable data access security
   Notionally follow Unix (self, group, other) model
   Note this model costs more (amount TBD) to implement
(Note: Access security is also central to confidently enforcing proprietary periods.)
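To make Model C concrete, here is a minimal sketch of a Unix-style (self, group, other) read check. The `DataSetAccess` record, the `can_read` function, and the user and group names are hypothetical. The slide only says the model would "notionally" follow Unix, so details such as write permissions are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSetAccess:
    """Read permissions for one data set, in self/group/other order."""
    owner: str
    group: str
    owner_can_read: bool = True
    group_can_read: bool = False
    other_can_read: bool = False


def can_read(access, user, user_groups):
    """Apply the first class that matches the user, as Unix does."""
    if user == access.owner:
        return access.owner_can_read
    if access.group in user_groups:
        return access.group_can_read
    return access.other_can_read


# Hypothetical users and groups.
access = DataSetAccess(owner="alice", group="moos-science", group_can_read=True)
owner_ok = can_read(access, "alice", set())
colleague_ok = can_read(access, "bob", {"moos-science"})
outsider_ok = can_read(access, "carol", {"visitors"})
```

Under this sketch, giving colleagues access to ‘my’ data means adding them to the data set's group, and a proprietary period would amount to leaving `other_can_read` off until the period ends.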

16 What aren’t you getting?
Totally transparent way of doing business
–Some accommodation to infrastructure is required
Very low latencies in data streaming, archiving
–Latency may be from sensor to shore, and from shore to archival interface
–Total latency not to exceed 1 hour (?)
Domain-specific data (re-)processing
Advanced data merging and reprocessing
Sophisticated data plotting/analysis via web interface
(High-bandwidth, always-on access to device)
A perfect, fully functional system on day 1
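The latency bullet describes two legs (sensor to shore, then shore to archival interface) and a tentative one-hour ceiling on the total. A small sketch of that budget check follows. The function names and the sample durations are assumptions, and the one-hour limit is marked with "(?)" on the slide, so it is kept as a parameter.

```python
from datetime import timedelta

# Tentative ceiling from the slide: "not to exceed 1 hour (?)".
MAX_TOTAL_LATENCY = timedelta(hours=1)


def total_latency(sensor_to_shore, shore_to_archive):
    """Total latency is the sum of the two legs."""
    return sensor_to_shore + shore_to_archive


def within_budget(sensor_to_shore, shore_to_archive, limit=MAX_TOTAL_LATENCY):
    """True when the combined latency does not exceed the limit."""
    return total_latency(sensor_to_shore, shore_to_archive) <= limit


# Made-up durations for illustration.
ok = within_budget(timedelta(minutes=40), timedelta(minutes=15))
late = within_budget(timedelta(minutes=50), timedelta(minutes=15))
```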

17 Data Mgt Architecture

18 Conclusions
SSDS should improve data management for all users
–At minimum, easier access to your own data and plots
–Straightforward access to all MBARI data (references)
–More reliable data storage, time references, metadata links
–Better long-term usability (gives us more time)
Development will be incremental
–Full-featured release targeted for MSE 2003
–Prototypes will exist before then (soon!), but may evolve
–Features will grow with third-party solutions
Many questions about first-order science priorities
–Which general-purpose functions do you really need?
–What are most useful data formats? application interfaces?
–How important is fine-grained access security?

