
1 Victoria, May 2006. DAL for theorists: Implementation of the SNAP service for the TVO. Claudio Gheller, Giuseppe Fiameni (Inter-University Computing Center CINECA, Bologna); Ugo Becciani, Alessandro Costa (Astrophysical Observatory of Catania).

2 The Simple Numerical Access Protocol Service
The Snap service extracts, or "cuts out", rectangular (and, in the future, spherical or even irregular) regions of a larger theory dataset, returning a subset of the requested size to the client.
Snap basic components:
- Data
- Snap code
- Service

3 1. Data and Data Model
To analyze the needs of data produced by numerical simulations, we have considered a wide spectrum of applications:
- Particle-based cosmological simulations
- Grid-based cosmological simulations
- Magnetohydrodynamics simulations
- Planck mission simulated data
- ...
(thanks to V. Antonuccio, G. Bodo, S. Borgani, N. Lanza, L. Tornatore)
At the moment, we consider only raw data.

4 1. Data
In general, data produced by numerical simulations are:
- Large (GB to TB scale)
- Monolithic (a few files contain most of the data)
- Hard to compress
- Non-standard (proprietary formats are the rule)
- Non-portable (dependent on the simulation machine)
- Poorly annotated (little or no metadata)
- Heterogeneous in units (often code units)

5 Data: the HDF5 format
HDF5 (http://hdf.ncsa.uiuc.edu) represents a possible solution for dealing with such data.
HDF5 is:
- Portable across most modern platforms
- High performance
- Well supported
- Well documented
- Rich in tools
HDF5 data files are:
- Platform independent (portable)
- Well organized
- Self-describing
- Metadata enriched
- Efficiently accessible
HDF5 drawbacks:
- Requires some expertise and skill to use
- Information can be difficult to access
- Can be subject to major library changes (see HDF4 to HDF5)

6 Data: our HDF5 implementation
Each file represents one output time. The structure is simple: all the data objects are at the root level:
/BmMassDensityDataset {512, 512, 512}
/BmTemperatureDataset {512, 512, 512}
/BmVelocityDataset {512, 512, 512, 3}
/DmMassDensityDataset {512, 512, 512}
/DmPositionDataset {134217728, 3}
/DmVelocityDataset {134217728, 3}
HDF5 metadata make the file completely self-consistent:
- Structural metadata (required by the library): rank, dimensionality
- Annotation metadata (required by our implementation): data object name, data object description, unit, formula
Data objects (at the moment) can be:
- Structured grids: rank 3 (scalars) or rank 4 (vectors)
- Unstructured points: rank 2 (scalars or vectors)
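The layout above can be sketched in plain C++ as follows. This is a minimal illustration of the per-object annotation metadata and of the rank-based classification of data objects; the struct and field names are illustrative, not the actual attribute names used in the HDF5 files.

```cpp
#include <string>
#include <vector>

// Annotation metadata attached to each data object (illustrative names).
struct DataObject {
    std::string name;             // e.g. "/BmMassDensityDataset"
    std::string description;      // human-readable description
    std::string unit;             // physical unit (often code units)
    std::string formula;          // conversion formula to physical units
    std::vector<long long> dims;  // dataset dimensions
};

// Classify an object from its rank, following the layout above:
// rank 2 -> unstructured points, rank 3 -> grid scalar, rank 4 -> grid vector.
enum class Kind { GridScalar, GridVector, Points, Unknown };

Kind classify(const DataObject& obj) {
    switch (obj.dims.size()) {
        case 2: return Kind::Points;      // {Npart, ncomp} particle arrays
        case 3: return Kind::GridScalar;  // {nx, ny, nz}
        case 4: return Kind::GridVector;  // {nx, ny, nz, ncomp}
        default: return Kind::Unknown;
    }
}
```

For example, `/BmMassDensityDataset {512, 512, 512}` classifies as a grid scalar, while `/DmPositionDataset {134217728, 3}` classifies as unstructured points.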

7 2. Data Model
A Simulation Data Model (hereafter DM) is a way of describing an astrophysical simulation: it provides a conceptual, logical and interoperable description of a simulation, and is not tied to the way data providers internally store, describe, manage or organize their archives. We propose a data model for simulation data providing a general architecture that encompasses any kind of simulation. The architecture is hierarchical: the root node of the hierarchy (Level 0), our basic object, is the Simulation. The DM must characterize the Simulation completely.
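The hierarchy described above can be sketched as a plain C++ type structure. This is only an illustration of the Level 0 / Level 1 decomposition under the assumption that the Level 1 classes are the three named in the following slides; the type and function names are not part of the actual implementation.

```cpp
#include <string>
#include <vector>

// Level 1 classes of the DM (bodies elided; each is refined into subclasses).
struct Data {};              // links to the raw data files
struct Characterization {};  // what the data contain
struct Provenance {};        // how the dataset was created

// Level 0: the root object of the hierarchy is the Simulation,
// which the DM must characterize completely.
struct Simulation {
    Data data;
    Characterization characterization;
    Provenance provenance;
};

// Names of the Level 1 classes, as defined by the model.
std::vector<std::string> level1Classes() {
    return {"Data", "Characterization", "Provenance"};
}
```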

8 Data Model schema

9 Data Model: 1
Starting from the Observation model, and according to pub1, for Simulations too we can define three main classes: Data, Characterization and Provenance. These three classes, which compose Level 1 of the DM, are further specialized into subclasses in a hierarchical pattern.

10 Data Model: 2
Raw data are annotated by a number of Level 2 metadata. The purpose of these metadata is to keep track of a file's location and its format characteristics. In particular:
- Link to the physical location of the file in the filesystem
- File format
- ...
Details about the file's content are managed by the Characterization class.

11 Data model: 3

12 Data Model: 4
The Provenance object contains the information describing the simulation as a whole. Provenance is defined as 'the description of how the dataset was created', which for a simulation we are able to describe entirely. Two simulations with the same Provenance parameters are identical. Provenance can be broken down into Theory, Computation and Parameters.
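The Provenance decomposition above can be sketched in plain C++. The field contents are illustrative assumptions; the point of the sketch is that, since a simulation is fully determined by its Provenance, "two simulations with the same Provenance parameters are identical" can be modeled as value equality of the Provenance object.

```cpp
#include <map>
#include <string>

// Sketch of Provenance = Theory + Computation + Parameters (illustrative fields).
struct Provenance {
    std::string theory;       // physical model assumed by the simulation
    std::string computation;  // code and machine used for the run
    std::map<std::string, double> parameters;  // e.g. {"OmegaM", 0.3}

    // Identical Provenance -> identical simulation (value equality).
    bool operator==(const Provenance& other) const {
        return theory == other.theory &&
               computation == other.computation &&
               parameters == other.parameters;
    }
};
```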

13 Data model: 5

14 Implementation of the model
The database is currently implemented on a PostgreSQL installation under Linux.

15 The Archive
The archive is made up of:
- Data
- Metadata
- Applications
The Data Archive:
- Filesystem (SFS technology, HDF5 files)
- Storage Resource Broker (SRB) (in progress)
- HPC systems
- Federated resources
Metadata:
- SQL database (PostgreSQL technology)
Server applications (C++, Java):
- Download, Snap, ConvertToHdf5, ...

16 The Snap Code: overview
The Snap code acts on large data files on different platforms. It has therefore been implemented according to the following requirements:
- Efficiency
- Robustness
- Portability
- Extensibility
We have adopted the C++ programming language on top of the HDF5 format and APIs. The code is compiled under Linux (GNU compiler) and AIX (xlC compiler).
[diagram: Source HDF5 file (Dataset 1 ... Dataset N) -> SNAP service -> Snapped HDF5 file (Dataset 1 ... Dataset M) -> Download]

17 The Snap Code
Input:
- Data filename
- Data objects (one or more)
- Spatial units
- Box center
- Box size
- Output filename
- Output data object names
Output:
- One or more HDF5 files with the same descriptive metadata as the original dataset.
Goal: select all the data that fall inside a pre-defined region. At present the region can only be rectangular.
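One step the inputs above imply is converting the (box center, box size) request into index ranges on the mesh. A minimal sketch of that conversion for one axis, assuming a uniform grid of n cells spanning [dom_min, dom_max) (the function and struct names are illustrative, not the actual Snap code):

```cpp
#include <algorithm>

// One axis of a hyperslab selection: first cell index and number of cells.
struct Slab { long long start, count; };

// Map a (center, size) box request onto cell indices of a uniform grid
// with n cells over [dom_min, dom_max), clamping to the grid edges.
Slab boxToSlab(double center, double size,
               double dom_min, double dom_max, long long n) {
    const double cell = (dom_max - dom_min) / n;               // cell width
    long long lo = (long long)((center - size / 2 - dom_min) / cell);
    long long hi = (long long)((center + size / 2 - dom_min) / cell);
    lo = std::max(lo, 0LL);                                    // clamp low edge
    hi = std::min(hi, n - 1);                                  // clamp high edge
    return {lo, hi - lo + 1};                                  // start and extent
}
```

For instance, on a 512-cell axis over [0, 1), a box with center 0.5 and size 0.25 selects cells starting at index 192; the one start/count pair per axis is exactly what an HDF5 hyperslab selection needs.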

18 The Snap Code
Data geometry and topology: at present we support regular mesh-based data and unstructured data (particles). The data structure is crucial for the Snap implementation.
Mesh-based data:
- Selection is performed using the HDF5 hyperslab selection functions.
- Only the necessary data are loaded into memory.
- Selection is extremely fast.
Particle-based data:
- Particle positions are loaded into memory.
- Particles inside the selected region are identified and their ids are stored (linked list).
- The other particle-based datasets are loaded into memory and the list is used to select the target particles.
- The selected particles are written to the output file.
- The procedure can become "heavy".
Future upgrades:
- Support for spherical (or even irregular) regions
- Support for periodic boundary conditions
- Parallel implementation
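The particle-based procedure above can be sketched in plain C++, leaving out the HDF5 I/O. This is an illustration of the logic only (a contiguous id vector stands in for the linked list; function names are not the actual Snap code): scan the positions, keep the ids of particles inside the box, then reuse that id list to filter every other particle dataset.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;

// Steps 1-2: scan positions, store the ids of particles inside the box.
std::vector<std::size_t> selectIds(const std::vector<Vec3>& pos,
                                   const Vec3& boxMin, const Vec3& boxMax) {
    std::vector<std::size_t> ids;
    for (std::size_t i = 0; i < pos.size(); ++i) {
        bool inside = true;
        for (int d = 0; d < 3; ++d)
            inside = inside && pos[i][d] >= boxMin[d] && pos[i][d] <= boxMax[d];
        if (inside) ids.push_back(i);
    }
    return ids;
}

// Step 3: apply the same id list to another particle dataset (e.g. velocities).
std::vector<Vec3> filterByIds(const std::vector<Vec3>& data,
                              const std::vector<std::size_t>& ids) {
    std::vector<Vec3> out;
    out.reserve(ids.size());
    for (std::size_t id : ids) out.push_back(data[id]);
    return out;
}
```

The "heavy" cost the slide mentions is visible here: the positions (and each filtered dataset) must be resident in memory, which for 134 million particles is several GB per dataset.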

19 Access to the Archive (the service)
The archive can be accessed in two complementary ways:
- Via web and web portal
- Via web service and high-level applications
[diagram: Data Archive (data + metadata + apps) -> WEB -> Web Portal, VisIVO; -> WEB SERVICE -> User app. 1, User app. 2; technologies: Tomcat + Axis, OGSA-DAI, PHP, Java, ...]
