Object Persistency & Data Handling, Session C - Summary. Dirk Duellmann.

Presentation transcript:

Object Persistency & Data Handling, Session C - Summary
Dirk Duellmann

CHEP 2000, Padova | Session C Summary | Dirk Duellmann

Espresso Feasibility Study
- We identified solutions for the most critical components of a scalable and performant ODBMS
  - Prototype implementation shows promising performance and scalability
  - A strict component approach allows the effort to be split into independently developed, replaceable modules
- The development of an Open Source ODBMS seems possible within the HEP or general science community
- A collaborative effort of the order of 15 person-years seems sufficient to produce such a system with production quality

HERA
- ZEUS
  - Objectivity-based TagDB in production
  - Significant performance gain for event selection
- H1
  - H1 will move to an analysis and event display framework based on ROOT
  - DST and micro-DST (based on BOS & PAW) will be replaced by analysis objects stored in ROOT trees
- HERA-B
  - Conditions database based on Berkeley DB
  - ROOT currently being integrated

Files + Metadata Approach
- RHIC
  - STAR moved from Objectivity to ROOT I/O
    - ROOT files for event data
    - File catalogue implemented using MySQL
  - PHENIX
    - ROOT files for event data
    - Objectivity/DB for conditions, configuration and file catalogue
- Fermilab Run II
  - CDF
    - ROOT for event data
    - File catalogue stored in Oracle
  - D0
    - D0OM for event data
    - Metadata based on Oracle
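The files + metadata pattern listed above can be illustrated with a small sketch: event data stays in files, and a relational catalogue records which file belongs to which run. This is only an illustration under stated assumptions. The table layout, the function names and the file names are hypothetical, and Python's built-in sqlite3 module stands in for the MySQL and Oracle servers the experiments actually use.

```python
import sqlite3


def make_catalogue():
    """Create an in-memory catalogue (sqlite3 stands in for MySQL/Oracle)."""
    db = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per event-data file.
    db.execute(
        "CREATE TABLE files ("
        " path TEXT PRIMARY KEY,"
        " run INTEGER NOT NULL,"
        " n_events INTEGER NOT NULL)"
    )
    return db


def register_file(db, path, run, n_events):
    """Record one event-data file in the catalogue."""
    with db:  # commits on success, rolls back on error
        db.execute("INSERT INTO files VALUES (?, ?, ?)", (path, run, n_events))


def files_for_run(db, run):
    """Return the paths of all files belonging to a run, sorted by name."""
    rows = db.execute(
        "SELECT path FROM files WHERE run = ? ORDER BY path", (run,)
    )
    return [path for (path,) in rows]


# Hypothetical file names, for illustration only.
catalogue = make_catalogue()
register_file(catalogue, "run0042_a.root", 42, 1000)
register_file(catalogue, "run0042_b.root", 42, 800)
register_file(catalogue, "run0043_a.root", 43, 500)
```

An analysis job would first query the catalogue, for example `files_for_run(catalogue, 42)`, and only then open the returned files with the event-data I/O layer.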

Sequential Access Model
- Integrated information about
  - Tape volumes
  - File catalogue
  - Runs
  - Event properties
  - Trigger configuration
- Uses Enstore as MSS
  - 1.5TB on Mammoth tapes (1.TB/day peak)
- Being used
  - to store Monte Carlo data
  - for D0 analysis tasks

Mass Storage Systems
- The CASTOR project at CERN has moved into production
  - Staging system backward compatible with SHIFT, with additional HSM functionality
  - Main client will be COMPASS (35MB/s)
  - Planned ALICE Mock Data Challenge to prove feasibility of 100MB/s over one week
- EUROSTORE: Esprit project over the last 2 years
  - Parallel filesystem (QSW) + HSM (DESY)
  - Prototype installation & testing (CERN)
  - Operational system has been demonstrated
  - Follow-on proposal has been submitted with the aim of providing a fully tested product, including a Linux port
  - Deployment at DESY foreseen for end 2000

Language Binding & Insulation
- Language support: at least for C++
  - and Java
  - or (in some cases) FORTRAN
- Trade-off between
  - Risk for experiment code: insulation against change of persistency solution
  - Maintainability: additional manual work and many additional classes
  - Transparency for end users: as simple to use as transient data
  - Flexibility: more than one storage solution at the same time; implement workable schema evolution
  - Performance: e.g. is I/O on demand needed? If yes, what is the right granularity: one object? One event?

Language Binding & Insulation
- Two main approaches are used
  - In-place access of persistent objects
    - The framework is implemented using language binding
    - Only C++ pointers to persistent data are exposed to the user (CMS, STAR, PHENIX)
  - Access of transient copies
    - Complete conversion into transient objects (BaBar)
    - On-demand conversion into transient objects using smart pointers (LHCb)
- Experiment-specific insulation layer
  - usually coupled to a specific application framework
- In both cases: split into two interfaces
  - framework uses the more flexible, performant, exposed lower level
  - end user uses the more insulated, transparent, customised higher level
- Is the mapping layer in between really experiment specific?
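The "on-demand conversion using smart pointers" approach can be sketched as a handle that converts a persistent object into its transient form only on first access and caches the result. The sketch below is in Python rather than C++, and every name in it (`PersistentTrack`, `Track`, `Ref`, `to_transient`) is hypothetical; it does not reproduce any experiment's actual framework.

```python
class PersistentTrack:
    """Hypothetical stored layout: momentum components packed in a tuple."""

    def __init__(self, packed):
        self.packed = packed


class Track:
    """Transient object handed to end-user code."""

    def __init__(self, px, py, pz):
        self.px, self.py, self.pz = px, py, pz


def to_transient(persistent):
    """Convert the stored layout into the transient class."""
    return Track(*persistent.packed)


class Ref:
    """Smart-pointer-like handle: converts on first access, then caches."""

    def __init__(self, store, key, converter):
        self._store = store
        self._key = key
        self._converter = converter
        self._transient = None
        self.loads = 0  # counts conversions, for illustration only

    def get(self):
        if self._transient is None:
            # First access: read the persistent object and convert it.
            self._transient = self._converter(self._store[self._key])
            self.loads += 1
        return self._transient


# A plain dict stands in for the object store.
store = {"trk1": PersistentTrack((1.0, 2.0, 3.0))}
ref = Ref(store, "trk1", to_transient)
```

User code only ever calls `ref.get()`, so the storage technology behind `store` can change without touching analysis code. The cost is the extra converter and handle classes, which is the maintainability trade-off the slides mention.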

Schema Evolution & Object Conversion
- BaBar - Objectivity/DB
  - presented a conversion scheme using their transient/persistent mapping scheme
- STAR - ROOT
  - implemented an additional conversion mechanism which replaces the user schema evolution provided by ROOT
- CLEO III - Objectivity/DB
  - implements a system based on opaque data objects stored in Objectivity
- Is an experiment-independent implementation of schema evolution possible?
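One generic way to handle schema evolution at read time is to tag each stored record with a schema version and apply a chain of converters until it reaches the current version. The sketch below illustrates that general idea only; it is not the BaBar, STAR or CLEO III mechanism, and the record fields and version numbers are invented for the example.

```python
def v1_to_v2(rec):
    # Hypothetical change: v2 renamed the field "e" to "energy".
    return {"version": 2, "energy": rec["e"], "charge": rec["charge"]}


def v2_to_v3(rec):
    # Hypothetical change: v3 added "n_hits"; old data gets a default of 0.
    return {**rec, "version": 3, "n_hits": 0}


# Maps a schema version to the converter that upgrades it by one step.
UPGRADES = {1: v1_to_v2, 2: v2_to_v3}
CURRENT_VERSION = 3


def upgrade(rec):
    """Apply converters step by step until the record is current."""
    while rec["version"] < CURRENT_VERSION:
        rec = UPGRADES[rec["version"]](rec)
    return rec
```

Because each converter handles a single version step, adding a schema version 4 means writing one new function; data written under older versions is still readable through the chain.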

From Data Storage to Data Management
- Consistent management of a distributed data store
  - needs knowledge about the semantics of the data
  - which files belong to one event collection, run, calibration period
    - they should be discarded together
    - staged together
    - exported together
- Strong coupling to system details
  - application logic
  - batch system
  - mass storage system
- Significantly larger functionality & complexity
  - significant development effort
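The point that files of one collection should be staged and discarded together can be made concrete with a small sketch. It assumes a hypothetical `DataStore` class that only tracks file names in memory; a real system would talk to the mass storage and batch systems instead.

```python
class DataStore:
    """Tracks which files belong to which collection (run, calibration period, ...)."""

    def __init__(self):
        self._by_collection = {}
        self.staged = set()  # files currently on disk

    def add(self, collection, path):
        """Declare that a file belongs to a collection."""
        self._by_collection.setdefault(collection, set()).add(path)

    def stage(self, collection):
        """Stage every file of the collection as one unit."""
        paths = self._by_collection.get(collection, set())
        self.staged |= paths
        return set(paths)

    def discard(self, collection):
        """Drop the collection and remove all of its files from disk together."""
        paths = self._by_collection.pop(collection, set())
        self.staged -= paths
        return paths
```

Operating on the collection rather than on individual files is what requires knowledge of the data semantics: the store has to be told, by the application, which files form a unit.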

Performance Optimisation of Complex Storage Systems
- Successful system optimisation requires correlated diagnostics on all levels
  - Mass Storage System
    - number of mounted tapes, file lifetime in disk pool
  - Data Server
    - I/O per server, per filesystem, per network interface
  - Lock Server
    - number of locks, number of waiting processes, locked resources
  - Client Host
    - I/O per client, per filesystem, per machine, total CPU usage
    - number of running processes
  - Client Application
    - number of used objects, containers and databases, transaction timing
    - regular profiling runs
- All system components need monitoring instrumentation
  - understanding of chaotic areas like analysis servers is definitely non-trivial

Transactions & Recovery
- Are transactions needed to allow fail-safe concurrent access?
  - Is it cheaper/easier to work in the old (manual) way?
    - With sequential recovery: just throw away the last file, the last group of files, change some metadata …
    - Application-level consistency checks?
  - IT industry seems to have a different opinion
    - Use transactions to enforce consistency between the different parts of the store
- Is the recovery of our data and metadata really that much simpler?
- How does one integrate transactions in multiple storage systems?
- The production experience of the next generation of experiments will tell us more.
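As a small illustration of using transactions to enforce consistency between different parts of the store, the sketch below updates a per-run file counter and the file list in a single transaction, so a failure in either step leaves both unchanged. The two-table schema and the function names are assumptions made for the example, and sqlite3 stands in for whatever database a real experiment would use.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical schema: a file list plus a per-run summary that must agree with it.
db.execute("CREATE TABLE files (path TEXT PRIMARY KEY, run INTEGER)")
db.execute("CREATE TABLE runs (run INTEGER PRIMARY KEY, n_files INTEGER)")
db.execute("INSERT INTO runs VALUES (42, 0)")
db.commit()


def add_file(db, path, run):
    """Register a file and bump the run's counter atomically."""
    try:
        with db:  # one transaction: both statements commit, or neither does
            db.execute(
                "UPDATE runs SET n_files = n_files + 1 WHERE run = ?", (run,)
            )
            db.execute("INSERT INTO files VALUES (?, ?)", (path, run))
        return True
    except sqlite3.IntegrityError:
        # Duplicate path: the counter update above has been rolled back too.
        return False


def n_files(db, run):
    """Return the file counter stored for a run."""
    row = db.execute("SELECT n_files FROM runs WHERE run = ?", (run,)).fetchone()
    return row[0]
```

Registering the same path twice fails on the second attempt without leaving the counter one too high. In the manual approach the slides describe, an application-level consistency check would have to detect and repair that mismatch.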

Summary of the Summary
- Significant progress in providing object persistency for a real-life experiment
  - BaBar successfully went into production with an ODBMS-based store
- Management of complex systems is a significant effort
  - Solutions for schema evolution, insulation layers, data import/export have been developed for specific experiment frameworks
  - Can some of those solutions be generalised?
- Still open questions
  - direct use of persistent objects or converted copies?
  - single ODBMS system or files + metadata in an RDBMS?
- More experience needed from running experiments
  - RHIC and Fermilab Run II experiments will soon be able to tell us more