Victoria, May 2006 DAL for theorists: Implementation of the SNAP service for the TVO Claudio Gheller, Giuseppe Fiameni InterUniversitary Computing Center.

Presentation transcript:

Victoria, May 2006
DAL for theorists: Implementation of the SNAP service for the TVO
Claudio Gheller, Giuseppe Fiameni (InterUniversitary Computing Center CINECA, Bologna)
Ugo Becciani, Alessandro Costa (Astrophysical Observatory of Catania)

The Simple Numerical Access Protocol Service
The Snap service extracts, or "cuts out", rectangular (spherical or even irregular) regions of a larger theory dataset and returns a subset of the requested size to the client.
Snap basic components:
- DATA
- SNAP code
- SERVICE

Data and Data Model
To analyze the needs of data produced by numerical simulations, we considered a wide spectrum of applications:
- Particle-based cosmological simulations
- Grid-based cosmological simulations
- Magnetohydrodynamics simulations
- Planck mission simulated data
- ...
(thanks to V. Antonuccio, G. Bodo, S. Borgani, N. Lanza, L. Tornatore)
At the moment, we consider only RAW data.

Data
In general, data produced by numerical simulations are:
- Large (GB to TB scale)
- Monolithic (a few files contain plenty of data)
- Incompressible
- Non-standard (proprietary formats are the rule)
- Non-portable (they depend on the simulation machine)
- Lacking annotations (little or no metadata)
- Heterogeneous in units (often code units)

Data: the HDF5 format
HDF5 represents a possible solution for dealing with such data.
HDF5 is:
- Portable across most modern platforms
- High performance
- Well supported
- Well documented
- Rich in tools
HDF5 data files are:
- Platform independent (portable)
- Well organized
- Self-describing
- Metadata enriched
- Efficiently accessible
HDF5 drawbacks:
- Requires some expertise and skill to use
- Information is difficult to access
- Can be subject to major library changes (see HDF4 to HDF5)

Data: our HDF5 implementation
Each file represents an output time.
The structure is simple: all the data objects are at the root level:
  /BmMassDensityDataset {512, 512, 512}
  /BmTemperatureDataset {512, 512, 512}
  /BmVelocityDataset {512, 512, 512, 3}
  /DmMassDensityDataset {512, 512, 512}
  /DmPositionDataset { , 3}
  /DmVelocityDataset { , 3}
HDF5 metadata make the file completely self-consistent:
- Structural metadata (strictly required by the library)
  - Rank
  - Dimensionality
- Annotation metadata (required by our implementation)
  - Data object name
  - Data object description
  - Unit
  - Formula
Data objects (at the moment) can be:
- Structured grid: rank 4 (scalars or vectors)
- Unstructured points: rank 2 (scalars or vectors)
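
The split between structural and annotation metadata can be pictured with a small record type. The following is only a sketch in plain Python, not the authors' C++/HDF5 code: the class `DataObject`, its method `kind`, and all annotation values are hypothetical, and treating rank-3 datasets as scalar grids is an assumption drawn from the dataset listing above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataObject:
    """One root-level data object and the metadata attached to it."""
    # Annotation metadata (the four fields named on the slide)
    name: str
    description: str
    unit: str
    formula: str
    # Structural metadata: the shape carries both rank and dimensionality
    shape: Tuple[int, ...]

    @property
    def rank(self) -> int:
        return len(self.shape)

    def kind(self) -> str:
        # Assumption: rank 2 means unstructured points; rank 3 or 4 means a
        # structured grid (the listing shows scalar grids with three axes).
        if self.rank == 2:
            return "unstructured points"
        if self.rank in (3, 4):
            return "structured grid"
        raise ValueError(f"unsupported rank {self.rank} for {self.name}")


# Hypothetical annotation values; only the names and grid shapes come from the slide.
velocity = DataObject(
    name="BmVelocityDataset",
    description="hypothetical description",
    unit="hypothetical unit",
    formula="hypothetical formula",
    shape=(512, 512, 512, 3),
)
# The particle count is elided on the slide, so 1000 is only a placeholder.
positions = DataObject("DmPositionDataset", "hypothetical description",
                       "hypothetical unit", "hypothetical formula", (1000, 3))

print(velocity.rank, velocity.kind())
print(positions.rank, positions.kind())
```

Keeping the shape as the single source of rank and dimensionality mirrors the idea that structural metadata come from the library, while the annotation fields are the ones a data provider has to fill in.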

Data Model
A Simulation Data Model (hereafter DM) is a way of describing an astrophysical simulation. It provides a conceptual, logical and interoperable description of a simulation, and it is not tied to the way data providers internally store, describe, manage or organize their archives.
We propose a data model for simulation data that provides a general architecture encompassing any kind of simulation.
- The architecture is hierarchical.
- The root node of the hierarchy (Level 0), our basic object, is the Simulation.
- The DM must characterize the Simulation completely.

Data Model schema

Data Model: 1
Starting from the Observation model and according to pub1, we can define three main classes for Simulations as well: Data, Characterization and Provenance. These three classes, which compose 'Level 1' of the DM, are further specified in subclasses in a hierarchical pattern.

Data Model: 2
Raw data are annotated by a number of Level 2 metadata. The purpose of these metadata is to keep track of a file's location and its format characteristics. In particular:
- Link to the physical location of the file in the filesystem
- File format
- …
Details about its content are managed by the Characterization class.

Data model: 3

Data Model: 4
The Provenance object contains the information describing the simulation as a whole. It is defined as 'the description of how the dataset was created', which for a simulation we are able to describe entirely. Two simulations with the same Provenance parameters are identical. Provenance can be broken down into Theory, Computation and Parameters.
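
The hierarchy described so far (Simulation at Level 0; Data, Characterization and Provenance at Level 1; Provenance split into Theory, Computation and Parameters) can be sketched as nested records. This is an illustrative sketch only: the field types, the helper `same_simulation`, and every example value are assumptions, not part of the proposed data model.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Data:
    # Level 2 metadata named on the slides: file location and file format
    location: str
    file_format: str


@dataclass
class Provenance:
    # The three parts named on the slide; their types here are assumptions
    theory: str
    computation: str
    parameters: Dict[str, float]


@dataclass
class Simulation:
    """Level 0 root object; its three fields are the Level 1 classes."""
    data: Data
    characterization: Dict[str, str]
    provenance: Provenance


def same_simulation(a: Simulation, b: Simulation) -> bool:
    # Slide rule: identical Provenance implies identical simulations,
    # regardless of where or how the output files are stored.
    return a.provenance == b.provenance


# All values below are hypothetical placeholders.
prov = Provenance("hypothetical theory", "hypothetical code", {"box_size": 100.0})
run_a = Simulation(Data("/archive/run_a.h5", "HDF5"), {}, prov)
run_b = Simulation(Data("/mirror/run_a_copy.h5", "HDF5"), {},
                   Provenance("hypothetical theory", "hypothetical code",
                              {"box_size": 100.0}))
run_c = Simulation(Data("/archive/run_c.h5", "HDF5"), {},
                   Provenance("hypothetical theory", "hypothetical code",
                              {"box_size": 200.0}))

print(same_simulation(run_a, run_b))
print(same_simulation(run_a, run_c))
```

Comparing only the Provenance field reflects the point that the model is independent of how a provider stores the files: two copies at different locations still describe the same simulation.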

Data model: 5

Implementation of the model
The database is currently implemented in PostgreSQL on a Linux installation.

The Archive
The archive is made up of:
- Data
- Metadata
- Applications
The Data Archive:
- Filesystem (SFS technology, HDF5 files)
- Storage Resource Broker (SRB) (in progress)
- HPC systems
- Federated resources
- SQL Database (PostgreSQL technology)
- Server Applications (C++, Java): Download, Snap, ConvertToHdf5…

The Snap Code: overview
The Snap code acts on large data files on different platforms. It has therefore been implemented according to the following requirements:
- Efficiency
- Robustness
- Portability
- Extensibility
We adopted the C++ programming language on top of the HDF5 format and APIs. The code compiles under Linux (GNU compiler) and AIX (xlC compiler).
[Diagram: source HDF5 file (Dataset 1 ... Dataset N); SNAP service; snapped HDF5 file (Dataset 1 ... Dataset M); Download]

The Snap Code
Input:
- Data filename
- Data objects (one or more)
- Spatial units
- Box center
- Box size
- Output filename
- Data object names
Output:
- One or more HDF5 files with the same descriptive metadata as the original dataset.
Goal: select all the data that fall inside a predefined region. At present the region can only be rectangular.
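
A rectangular region given by a box center and a box size reduces to a pair of corner coordinates and a per-axis comparison. The sketch below illustrates that idea in plain Python; the class `Box` and its methods are hypothetical, and it assumes that the box size is the full edge length along each axis and that points on the boundary count as inside.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Box:
    center: Vec3
    size: Vec3  # assumption: full edge length along each axis

    def bounds(self) -> Tuple[Vec3, Vec3]:
        """Return the lower and upper corners of the box."""
        lo = tuple(c - s / 2 for c, s in zip(self.center, self.size))
        hi = tuple(c + s / 2 for c, s in zip(self.center, self.size))
        return lo, hi

    def contains(self, point: Sequence[float]) -> bool:
        """True if the point lies inside the box (boundary included)."""
        lo, hi = self.bounds()
        return all(l <= p <= h for l, p, h in zip(lo, point, hi))


box = Box(center=(8.0, 8.0, 8.0), size=(4.0, 4.0, 4.0))
print(box.bounds())
print(box.contains((7.0, 9.0, 10.0)))
print(box.contains((5.0, 8.0, 8.0)))
```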

The Snap Code
Mesh-based data:
- Selection is performed using HDF5 hyperslab selection functions.
- Only the necessary data are loaded in memory.
- Selection is extremely fast.
Particle-based data:
- Particle positions are loaded in memory.
- Particles inside the selected region are identified and their ids are stored (linked list).
- Other particle-based datasets are loaded in memory and the list is used to select the target particles.
- Selected particles are written to the file.
- The procedure can become "heavy".
Data geometry and topology: at present we support regular mesh-based data and unstructured data (particles). The data structure is crucial for the Snap implementation features.
Future upgrades:
- Support for spherical (or even irregular) regions
- Support for periodic boundary conditions
- Parallel implementation
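
The two selection strategies can be sketched without the HDF5 library. For mesh data, the box is turned into the per-axis start and count that a hyperslab selection (for example the HDF5 C function `H5Sselect_hyperslab`) takes; for particle data, a first pass over the positions records the matching ids and later passes reuse them. This is a plain-Python illustration, not the authors' C++ code: the function names are hypothetical, the mesh is assumed to be uniform and to span [0, domain_size) on every axis, and a Python list stands in for the linked list mentioned on the slide.

```python
import math
from typing import List, Sequence, Tuple


def hyperslab_for_box(lo: Sequence[float], hi: Sequence[float],
                      domain_size: float,
                      shape: Sequence[int]) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Return (start, count) per axis for the cells overlapping [lo, hi]."""
    start, count = [], []
    for l, h, n in zip(lo, hi, shape):
        dx = domain_size / n                 # uniform cell width (assumption)
        first = max(0, math.floor(l / dx))   # first cell touched by the box
        last = min(n, math.ceil(h / dx))     # one past the last cell touched
        start.append(first)
        count.append(max(0, last - first))
    return tuple(start), tuple(count)


def select_particle_ids(positions: Sequence[Sequence[float]],
                        lo: Sequence[float], hi: Sequence[float]) -> List[int]:
    """Pass 1: scan the positions and keep the ids of particles inside the box."""
    return [i for i, p in enumerate(positions)
            if all(l <= x <= h for l, x, h in zip(lo, p, hi))]


def gather(values: Sequence, ids: Sequence[int]) -> list:
    """Pass 2: reuse the stored ids to pick the same particles from another dataset."""
    return [values[i] for i in ids]


lo, hi = (6.0, 6.0, 6.0), (10.0, 10.0, 10.0)

# Mesh case: only the (start, count) block would be read from disk.
start, count = hyperslab_for_box(lo, hi, domain_size=16.0, shape=(16, 16, 16))
print(start, count)

# Particle case: hypothetical positions and velocities for four particles.
positions = [(1.0, 1.0, 1.0), (7.0, 8.0, 9.0), (10.0, 10.0, 10.0), (11.0, 8.0, 8.0)]
velocities = ["v0", "v1", "v2", "v3"]
ids = select_particle_ids(positions, lo, hi)
print(ids, gather(velocities, ids))
```

The contrast in cost is visible in the sketch: the mesh path only computes index ranges, whereas the particle path has to read every position before anything can be discarded, which is why the procedure can become "heavy".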

Access to the Archive (the service)
The archive can be accessed in two complementary ways:
- Via web and web portal
- Via web service and high-level applications
[Diagram labels: Data Archive (data + metadata + apps); WEB; WEB SERVICE; Web Portal; VisIVO; User app. 1; User app. 2; TomCat+Axis; OGSA-DAI; PHP, Java…]