
1 Three Flavors of Data
- Science Data: simulations and sensor readings.
- Catalog Data: metadata; descriptors of datasets, data products, and other processing artifacts.
- Active Data: data associated with logging, monitoring, and scheduling compute tasks.

2 Three Flavors of Data (1): Science Data
- Simulation data: solutions to the partial differential equations governing the physics of the Columbia River Estuary.
- Sensor data: measurements of the physical characteristics, used to guide and validate simulations.
Wanted: a simple means of specifying new data products from these raw data and computing them efficiently.
Approach: a data manipulation language based on a GridField data model.

3 Three Flavors of Data (2): Catalog Data
Explicit metadata to describe system artifacts.
Wanted:
- Tools to locate artifacts given descriptors (query).
- A metadata collection facility that tolerates change:
  - The metadata we wish to collect may change (e.g., new product ‘lines’ are developed).
  - The source of the metadata may change (e.g., file naming conventions or directory structures evolve).
Approach: generic database; custom collection scripts.

4 Three Flavors of Data (3): Active Data
Data describing past, current, and future compute tasks.
Wanted: tools for scheduling, monitoring, and managing...
- individual tasks (e.g., a single data product derivation)
- groups of interdependent tasks (e.g., a daily forecast run)
- campaigns (e.g., a series of calibration runs followed by a re-computation of the runs of 2002 with a different implicitness)
Approach: undecided.

5 Simulation Data: GridFields
The data product suite exhibits recurring processing idioms:
- Larger grids are reduced to smaller grids. Ex: ‘estuary’ data products vs. ‘far’ data products.
- Grids are mapped to other grids. Ex: a 3D grid mapped to a 2D slice.
- Grids are combined. Ex: a 1D depth grid ‘crossed’ with a 2D horizontal grid.
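The three idioms can be illustrated with a deliberately small sketch. It is not the GridField model itself: it tracks only grid nodes (no cells, no bound data), and the function names `cross`, `slice_at`, and `restrict` are made up for this illustration.

```python
from itertools import product


def cross(horizontal_nodes, depth_levels):
    """Combine a 2D horizontal grid with a 1D depth grid into 3D nodes."""
    return [(x, y, z) for (x, y), z in product(horizontal_nodes, depth_levels)]


def slice_at(nodes_3d, depth):
    """Map a 3D grid to the 2D slice at one depth level."""
    return [(x, y) for (x, y, z) in nodes_3d if z == depth]


def restrict(nodes, keep):
    """Reduce a larger grid to a smaller one by keeping nodes that satisfy a predicate."""
    return [node for node in nodes if keep(node)]


# Made-up coordinates, purely for illustration.
horizontal = [(0, 0), (1, 0), (0, 1)]
depths = [0.0, 5.0]

grid_3d = cross(horizontal, depths)               # grids combined
surface = slice_at(grid_3d, 0.0)                  # 3D grid mapped to a 2D slice
near = restrict(horizontal, lambda n: n[0] == 0)  # larger grid reduced to a smaller one
```

A real grid model would also have to carry cells and the data bound to them through each operation; the sketch only shows the shape of the three idioms.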

6 Simulation Data: GridFields (2)
We’re expressing these idioms as operators over a grid-based data model.
Advantages:
- Simpler recipes: 5 ops for all the data products (plus helper functions).
- Flexible model; fewer maintenance troubles:
  - N dimensions: uniform handling of space and time (maybe more...).
  - Any cell type: segments, triangles, quadrangles, arbitrary polytopes.
- Optimization opportunities:
  - Operators prescribe semantics, but not implementation.
  - Topological equivalences are exposed and exploited.

7 Simulation Data: GridFields (3)
Status:
- Core operators functional.
- Simple examples hooked to XMVIS for viewing.
Todo:
- Examples hooked to VTK.
- Write/test examples from the current product suite.
- Support GridFields too large for memory.
- Expose a nice syntax for writing recipes.

8 Catalog Data: Collection
Where is the metadata?
- File name: 1_salt.63
- File path: /forecasts/2003-184/run/images/isosal_estuary7/anim-sal_estuary_7.gif
- File content: Version: 1.04, Variable: salt, ...
- Other files?

9 Collection scripts
For each file type, the metadata collection mechanism is different:
- gifs
- binary output
- Param.in
Use a script for each file type that emits the metadata for that type of file. Only these simple scripts need to change as the system evolves.

10 Example: gif animation
/forecasts/2003-184/.../isosal_estuary7/anim-sal_estuary_7.gif
- CorieDate = “2003-184”
- Region = “Estuary”
- Lat = xxxx
- Long = xxxx
- Variable = “Salinity”
- Type = “Animation”
- Depth = “7”
- product line = “isoline”
Here, a script can just parse the path and file name.
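A collection script of this kind might look like the sketch below. It is written in Python for illustration only and uses the full path shown on slide 8. The regular expression, the `VARIABLE_NAMES` table, and the rule that a directory starting with "iso" means the "isoline" product line are all assumptions inferred from this one example, not the project's actual conventions. Lat and Long are left out because the slide does not say where they come from.

```python
import re

# Hypothetical abbreviation table; only "sal" -> "Salinity" appears on the slide.
VARIABLE_NAMES = {"sal": "Salinity"}

# Assumed layout: /forecasts/<date>/.../<product dir>/anim-<var>_<region>_<depth>.gif
PATH_PATTERN = re.compile(
    r"^/forecasts/(?P<date>\d{4}-\d{3})/"
    r"(?:.*/)?(?P<dir>[^/]+)/"
    r"anim-(?P<var>[a-z]+)_(?P<region>[a-z]+)_(?P<depth>\d+)\.gif$"
)


def describe_animation(path):
    """Return descriptors parsed from the path, or None if it does not match."""
    m = PATH_PATTERN.match(path)
    if m is None:
        return None
    descriptors = {
        "CorieDate": m.group("date"),
        "Region": m.group("region").capitalize(),
        "Variable": VARIABLE_NAMES.get(m.group("var"), m.group("var")),
        "Type": "Animation",
        "Depth": m.group("depth"),
    }
    # Assumption: an "iso..." directory name marks the isoline product line.
    if m.group("dir").startswith("iso"):
        descriptors["product line"] = "isoline"
    return descriptors


example = describe_animation(
    "/forecasts/2003-184/run/images/isosal_estuary7/anim-sal_estuary_7.gif"
)
```

If the naming convention changes, only the pattern and the lookup table in this one script need to be updated.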

11 Example: Binary output
We need a different mechanism than for gif animations; it might be convenient to implement it in a different script.
/forecasts/2003-184/run/1_salt.63
- Variable = “Salinity”
What about the number of nodes? Mean sea level? For those we need to access the file’s content:
1_salt.63
- nodes: 55817
- msl: 4285
- ...

12 Architecture
- The Reflector creates an XML file containing the metadata for each file and also stores the metadata in the database.
- The Reflector determines the file type (based on regular expressions) and calls the appropriate collection script.
- The collection script uses an “AddItem” Perl function to return the metadata to the Reflector.
Diagram: the Reflector invokes a collection script and writes the metadata to the DB and to XML.
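The dispatch step can be sketched as follows. This is a Python illustration, not the Perl implementation the slides describe: the two patterns in `RULES`, the placeholder collection scripts, and the dictionary layout of each item are assumptions. The only detail taken from the slides is that collection scripts report metadata through an AddItem-style callback with Name, Value, Notes, and Type arguments.

```python
import re


def collect_gif(path, add_item):
    # Placeholder collection script; a real one would parse the path and file name.
    add_item("Type", "Animation", "", "string")


def collect_binary(path, add_item):
    # Placeholder collection script; a real one would read the file's content.
    add_item("Type", "Binary output", "", "string")


# Hypothetical file-type rules: (pattern, collection script), checked in order.
RULES = [
    (re.compile(r"\.gif$"), collect_gif),
    (re.compile(r"\.\d+$"), collect_binary),
]


def reflect(path):
    """Pick a collection script by regular expression and gather its metadata."""
    items = []

    def add_item(name, value, notes, type_):
        items.append({"name": name, "value": value, "notes": notes, "type": type_})

    for pattern, script in RULES:
        if pattern.search(path):
            script(path, add_item)
            # At this point the Reflector would write the XML file
            # and store the same items in the database.
            return items
    return None  # unknown file type: nothing collected
```

Supporting a new file type then means adding one rule and one script, leaving the reflector itself unchanged.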

13 Metadata in XML and DB?
These XML files give you filesystem-based access to the metadata for an artifact.
Use “info” to present the XML in a readable form:
/../run> info 1_salt.63
variable: salt
version: 1.04
msl: 4285
nodes: 55817
Also useful if the DB is inaccessible.
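What an “info”-style tool does can be sketched in a few lines of Python. The slides do not show the XML format, so the `<metadata>` and `<item name="...">` layout below is entirely an assumption; only the descriptor names and values come from the example above.

```python
import xml.etree.ElementTree as ET

# Assumed XML layout; the real schema is not shown in the slides.
SAMPLE_XML = """<metadata file="1_salt.63">
  <item name="variable">salt</item>
  <item name="version">1.04</item>
  <item name="msl">4285</item>
  <item name="nodes">55817</item>
</metadata>"""


def info(xml_text):
    """Render each metadata item as a 'name: value' line."""
    root = ET.fromstring(xml_text)
    return "\n".join(
        f"{item.get('name')}: {item.text}" for item in root.iter("item")
    )


print(info(SAMPLE_XML))
```

Because the tool reads only the XML file that sits next to the artifact, it keeps working when the database cannot be reached.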

14 Minor Technical Change
Previously we had suggested that the collection scripts should emit metadata on standard output.
We have instead provided a Perl function for this: AddItem(Name,Value,Notes,Type)

15 How does this help?
Find artifacts via descriptors (query):
- ‘find animations showing the estuary where we used a constant bottom friction coefficient’
- where region = “estuary” and type = “animation” and ntau = “0”
Write robust metadata-driven programs:
- Chris’ low bandwidth zoom web app
- Stay-Fresh Powerpoint Slides
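A descriptor query like the one above could run against a generic name/value table. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The table layout, the `find_artifacts` helper, and the sample rows (file names a.gif, b.gif, c.gif) are all hypothetical; the slides say only that a generic database is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed generic layout: one row per (artifact, descriptor name, value).
conn.execute("CREATE TABLE metadata (artifact TEXT, name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?)",
    [
        # Made-up sample rows for illustration.
        ("a.gif", "region", "estuary"), ("a.gif", "type", "animation"), ("a.gif", "ntau", "0"),
        ("b.gif", "region", "estuary"), ("b.gif", "type", "animation"), ("b.gif", "ntau", "1"),
        ("c.gif", "region", "far"), ("c.gif", "type", "animation"), ("c.gif", "ntau", "0"),
    ],
)


def find_artifacts(conn, **wanted):
    """Return artifacts whose descriptors match every name=value pair given."""
    if not wanted:
        return []
    clauses = " OR ".join("(name = ? AND value = ?)" for _ in wanted)
    params = [x for pair in wanted.items() for x in pair]
    sql = (
        f"SELECT artifact FROM metadata WHERE {clauses} "
        "GROUP BY artifact HAVING COUNT(DISTINCT name) = ? ORDER BY artifact"
    )
    return [row[0] for row in conn.execute(sql, params + [len(wanted)])]


matches = find_artifacts(conn, region="estuary", type="animation", ntau="0")
```

With a name/value table, adding a new descriptor needs no schema change, which fits the goal of a collection facility that tolerates change.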

