1
Three Flavors of Data
- Science Data: simulations and sensor readings
- Catalog Data: metadata; descriptors of datasets, data products, and other processing artifacts
- Active Data: data associated with logging, monitoring, and scheduling compute tasks
2
Three Flavors of Data (1): Science Data
- Simulation Data: solutions to the partial differential equations governing the physics of the Columbia River Estuary
- Sensor Data: measurements of the physical characteristics, used to guide and validate simulations
- Wanted: a simple means for specifying new data products from these raw data and computing them efficiently
- Approach: a data manipulation language based on a GridField data model
3
Three Flavors of Data (2): Catalog Data
- Explicit metadata to describe system artifacts
- Wanted:
  - Tools to locate artifacts given descriptors (query)
  - A metadata collection facility that tolerates change
    - The metadata we wish to collect may change (e.g., new product ‘lines’ are developed)
    - The source of the metadata may change (e.g., file naming conventions or directory structures evolve)
- Approach: generic database; custom collection scripts
4
Three Flavors of Data (3): Active Data
- Data describing past, current, and future compute tasks
- Wanted: tools for scheduling, monitoring, and managing...
  - individual tasks (e.g., a single data product derivation)
  - groups of interdependent tasks (e.g., a daily forecast run)
  - campaigns (e.g., a series of calibration runs followed by a re-computation of the runs of 2002 with a different implicitness)
- Approach: undecided
5
Simulation Data: GridFields
The data product suite exhibits recurring processing idioms:
- Larger grids reduced to smaller grids. Ex: ‘estuary’ data products vs. ‘far’ data products
- Grids mapped to other grids. Ex: 3D grid mapped to a 2D slice
- Grids combined. Ex: 1D depth grid ‘crossed’ with a 2D horizontal grid
6
Simulation Data: GridFields (2)
We’re expressing these idioms as operators over a grid-based data model.
Advantages:
- Simpler recipes
  - 5 ops for all the data products (plus helper functions)
- Flexible model; fewer maintenance troubles
  - N dimensions: uniform handling of space and time (maybe more...)
  - Any cell type: segments, triangles, quadrangles, arbitrary polytopes
- Optimization opportunities
  - Operators prescribe semantics, but not implementation
  - Topological equivalences exposed and exploited
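To make the idioms concrete, here is a minimal Python sketch of two of them: combining grids ("cross") and reducing a grid ("restrict"). It is not the GridField implementation. A grid is simplified here to a list of nodes with one value bound per node, ignoring cells and topology, and the function names `cross` and `restrict`, the coordinates, and the data values are all hypothetical.

```python
from itertools import product

def cross(horizontal, depths):
    """Combine a 2D horizontal node list with a 1D depth list into 3D nodes."""
    return [(x, y, z) for (x, y), z in product(horizontal, depths)]

def restrict(nodes, values, predicate):
    """Keep only the nodes (and their bound values) that satisfy predicate."""
    kept = [(n, v) for n, v in zip(nodes, values) if predicate(n)]
    return [n for n, _ in kept], [v for _, v in kept]

# Made-up coordinates and placeholder data, for illustration only.
horizontal = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
depths = [0.0, 5.0]

grid3d = cross(horizontal, depths)             # 1D depth grid crossed with 2D grid
salt = [float(i) for i in range(len(grid3d))]  # one value bound to each 3D node

# 3D grid mapped to a 2D slice: keep only the nodes at depth 5.0.
slice_nodes, slice_salt = restrict(grid3d, salt, lambda n: n[2] == 5.0)
```

Because both functions only state what the result contains, an implementation would be free to evaluate the restriction before the cross product, which is the kind of optimization opportunity the slide refers to.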
7
Simulation Data: GridFields (3)
Status:
- Core operators functional
- Simple examples hooked to XMVIS for viewing
Todo:
- Examples hooked to VTK
- Write/test examples from the current product suite
- Support GridFields too large for memory
- Expose a nice syntax for writing recipes
8
Catalog Data: Collection
Where is the metadata?
- File name: 1_salt.63
- File path: /forecasts/2003-184/run/images/isosal_estuary7/anim-sal_estuary_7.gif
- File content: Version: 1.04, Variable: salt, ...
- Other files?
9
Collection scripts
- For each file type, the metadata collection mechanism is different:
  - gifs
  - binary output
  - Param.in
- Use a script for each file type that emits metadata for that type of file.
- Only these simple scripts need to change as the system evolves.
10
Example: gif animation
/forecasts/2003-184/.../isosal_estuary7/anim-sal_estuary_7.gif
- CorieDate = “2003-184”
- Region = “Estuary”
- Lat = xxxx
- Long = xxxx
- Variable = “Salinity”
- Type = “Animation”
- Depth = “7”
- product line = “isoline”
Here, a script can just parse the path and file name.
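A collection script of this kind could be sketched as follows. This is a Python illustration only (the deck's scripts are Perl). The regular expression, the `VARIABLE_NAMES` abbreviation table, and the function name `collect_gif_metadata` are assumptions inferred from the single example path, not the actual naming convention. Lat and Long are omitted because the slide gives no values for them.

```python
import re

# Hypothetical pattern inferred from one example path.
GIF_PATTERN = re.compile(
    r"^/forecasts/(?P<date>\d{4}-\d{3})/.*/"
    r"iso(?P<var>[a-z]+)_(?P<region>[a-z]+)(?P<depth>\d+)/"
    r"anim-[^/]+\.gif$"
)

# Assumed mapping from file-name abbreviations to variable names.
VARIABLE_NAMES = {"sal": "Salinity"}

def collect_gif_metadata(path):
    """Derive metadata from the path and file name alone; None if no match."""
    m = GIF_PATTERN.match(path)
    if m is None:
        return None
    return {
        "CorieDate": m.group("date"),
        "Region": m.group("region").capitalize(),
        "Variable": VARIABLE_NAMES.get(m.group("var"), m.group("var")),
        "Type": "Animation",
        "Depth": m.group("depth"),
        "product line": "isoline",  # assumed from the 'iso' directory prefix
    }

meta = collect_gif_metadata(
    "/forecasts/2003-184/run/images/isosal_estuary7/anim-sal_estuary_7.gif"
)
```

If the directory layout changes, only the pattern and the abbreviation table would need to be updated.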
11
Example: Binary output
/forecasts/2003-184/run/1_salt.63
- Variable = “Salinity”
- What about number of nodes? Mean sea level? We need to access the file’s content:
    1_salt.63
      nodes: 55817
      msl: 4285
      :
This needs a different mechanism than gif animations; it might be convenient to implement it in a different script.
12
Architecture
- Reflector creates an XML file containing the metadata for each file and also stores the metadata in the database
- Reflector determines the file type (based on regular expressions) and calls the appropriate collection script
- Collection script uses an “AddItem” Perl function to return the metadata to the reflector
[Diagram: Reflector invokes Collection Script; Reflector output goes to the Meta-data DB and to XML]
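The flow above can be sketched in Python as a rough illustration, not the actual reflector (which uses Perl collection scripts). The file-type patterns, the XML element names `artifact` and `item`, the `metadata` table layout, and the collector bodies are all assumptions. `add_item` stands in for the Perl AddItem function.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

_items = []  # metadata gathered for the file currently being processed

def add_item(name, value, notes="", type_="string"):
    """Python stand-in for the Perl AddItem(Name, Value, Notes, Type)."""
    _items.append({"name": name, "value": str(value),
                   "notes": notes, "type": type_})

def collect_gif(path):
    add_item("Type", "Animation")

def collect_binary(path):
    add_item("Type", "Binary output")

# File type is decided by regular expression; these patterns are illustrative.
COLLECTORS = [
    (re.compile(r"\.gif$"), collect_gif),
    (re.compile(r"\.6\d$"), collect_binary),
]

def reflect(path, db):
    """Run the matching collector, store its items in db, return them as XML."""
    _items.clear()
    for pattern, collector in COLLECTORS:
        if pattern.search(path):
            collector(path)
            break
    root = ET.Element("artifact", {"path": path})
    for item in _items:
        ET.SubElement(root, "item", item)
        db.execute("INSERT INTO metadata VALUES (?, ?, ?)",
                   (path, item["name"], item["value"]))
    return ET.tostring(root, encoding="unicode")

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE metadata (path TEXT, name TEXT, value TEXT)")
xml_text = reflect("/forecasts/2003-184/run/1_salt.63", db)
```

Supporting a new file type in this sketch means adding one collector function and one entry in `COLLECTORS`; the reflector itself stays unchanged.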
13
Metadata in XML and DB?
- These XML files give you filesystem-based access to the metadata for an artifact
- Use “info” to present the XML in a readable form:
    /../run> info 1_salt.63
    variable: salt
    version: 1.04
    msl: 4285
    nodes: 55817
- Also useful if the DB is inaccessible.
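As a rough idea of what an “info”-style tool does, the Python sketch below renders a metadata XML document as "name: value" lines. The XML layout (an `artifact` element holding `item` elements with `name` and `value` attributes) is an assumption. The deck does not show the real schema or the real tool's implementation.

```python
import xml.etree.ElementTree as ET

def info(xml_text):
    """Render metadata XML as one 'name: value' line per item."""
    root = ET.fromstring(xml_text)
    return "\n".join(
        f"{item.get('name')}: {item.get('value')}"
        for item in root.iter("item")
    )

# Hypothetical XML carrying the values shown on the slide.
sample = """<artifact path="1_salt.63">
  <item name="variable" value="salt"/>
  <item name="version" value="1.04"/>
  <item name="msl" value="4285"/>
  <item name="nodes" value="55817"/>
</artifact>"""

print(info(sample))
```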
14
Minor Technical Change
- Previously we had suggested that the collection scripts should emit metadata on standard output
- Instead, we have provided a Perl function: AddItem(Name, Value, Notes, Type)
15
How does this help?
- Find artifacts via descriptors (query)
    ‘find animations showing the estuary where we used a constant bottom friction coefficient’
    where region = “estuary” and type = “animation” and ntau = “0”
- Write robust metadata-driven programs
  - Chris’ low bandwidth zoom web app
  - Stay-Fresh Powerpoint Slides
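One way such a descriptor query might run against a generic database is sketched below in Python with SQLite. The single `metadata(path, name, value)` table, the helper `find_artifacts`, and the file names `a.gif` and `b.gif` are hypothetical. The deck does not describe the actual schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE metadata (path TEXT, name TEXT, value TEXT)")
db.executemany("INSERT INTO metadata VALUES (?, ?, ?)", [
    # Made-up artifacts, for illustration only.
    ("a.gif", "region", "estuary"), ("a.gif", "type", "animation"),
    ("a.gif", "ntau", "0"),
    ("b.gif", "region", "estuary"), ("b.gif", "type", "animation"),
    ("b.gif", "ntau", "1"),
])

def find_artifacts(db, **descriptors):
    """Return the paths whose metadata matches every name=value descriptor."""
    if not descriptors:
        raise ValueError("at least one descriptor is required")
    clauses = " OR ".join("(name = ? AND value = ?)" for _ in descriptors)
    params = [x for pair in descriptors.items() for x in pair]
    sql = (f"SELECT path FROM metadata WHERE {clauses} "
           "GROUP BY path HAVING COUNT(DISTINCT name) = ? ORDER BY path")
    return [row[0] for row in db.execute(sql, params + [len(descriptors)])]

# where region = "estuary" and type = "animation" and ntau = "0"
matches = find_artifacts(db, region="estuary", type="animation", ntau="0")
```

Storing descriptors as name/value rows means a new kind of metadata needs no schema change, which fits the goal of a collection facility that tolerates change.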