Object Persistency & Data Handling: Session C - Summary
Dirk Duellmann

CHEP 2000, Padova - Session C Summary - Dirk Duellmann

Espresso Feasibility Study
• We identified solutions for the most critical components of a scalable and performant ODBMS
  – The prototype implementation shows promising performance and scalability
  – A strict component approach allows the effort to be split into independently developed, replaceable modules
• The development of an Open Source ODBMS seems possible within the HEP or general science community
• A collaborative effort of the order of 15 person-years seems sufficient to produce such a system with production quality

HERA
• ZEUS
  – Objectivity-based TagDB in production
  – significant performance gain for event selection
• H1
  – H1 will move to an analysis and event display framework based on ROOT
  – DST and micro-DST (based on BOS & PAW) will be replaced by analysis objects stored in ROOT trees
• HERA-B
  – Conditions database based on Berkeley DB
  – ROOT currently being integrated

Files + Metadata Approach
• RHIC
  – STAR moved from Objectivity to ROOT I/O
    • ROOT files for event data
    • file catalogue implemented using MySQL
  – PHENIX
    • ROOT files for event data
    • Objectivity/DB for conditions, configuration and file catalogue
• Fermilab Run II
  – CDF
    • ROOT for event data
    • file catalogue stored in Oracle
  – D0
    • D0OM for event data
    • Metadata based on Oracle
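
The files + metadata pattern above can be sketched as follows. This is a minimal illustration only, not any experiment's actual catalogue: SQLite stands in for the MySQL or Oracle back ends named on the slide, and the table name, column names and file names are all hypothetical.

```python
import sqlite3


def create_catalogue(conn):
    # Hypothetical schema: one row per event-data file.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " path TEXT PRIMARY KEY,"
        " run INTEGER NOT NULL,"
        " n_events INTEGER NOT NULL)"
    )


def register_file(conn, path, run, n_events):
    # The connection context manager commits on success, rolls back on error.
    with conn:
        conn.execute(
            "INSERT INTO files (path, run, n_events) VALUES (?, ?, ?)",
            (path, run, n_events),
        )


def files_for_run(conn, run):
    # The event data stays in the files; only metadata is queried here.
    rows = conn.execute(
        "SELECT path FROM files WHERE run = ? ORDER BY path", (run,)
    )
    return [row[0] for row in rows]


catalogue = sqlite3.connect(":memory:")
create_catalogue(catalogue)
register_file(catalogue, "run0001_a.root", 1, 5000)
register_file(catalogue, "run0001_b.root", 1, 4200)
register_file(catalogue, "run0002_a.root", 2, 6100)
```

Calling `files_for_run(catalogue, 1)` returns the two paths registered for run 1; an analysis job would then open those files with the experiment's own I/O layer.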

Sequential Access Model
• Integrated information about
  – Tape volumes
  – File catalogue
  – Runs
  – Event properties
  – Trigger configuration
• Uses Enstore as MSS
  – 1.5TB on Mammoth tapes (1.TB/day peak)
• Being used
  – to store Monte Carlo data
  – for D0 analysis tasks

Mass Storage Systems
• The CASTOR project at CERN has moved into production
  – staging system backward compatible with SHIFT, with additional HSM functionality
  – main client will be COMPASS (@ 35MB/s)
  – planned ALICE Mock Data Challenge to prove feasibility of 100MB/s over one week
• EUROSTORE - Esprit project over the last 2 years
  – Parallel Filesystem (QSW) + HSM (DESY)
  – Prototype installation & testing (CERN)
  – Operational system has been demonstrated
  – Follow-on proposal has been submitted with the aim to provide a fully tested product including a Linux port
  – Deployment at DESY foreseen for end 2000

Language Binding & Insulation
• Language support - at least for C++
  – and Java
  – or (in some cases) FORTRAN
• Trade-off between
  – Risk for experiment code - insulation against change of persistency solution
  – Maintainability - additional manual work and many additional classes
  – Transparency for end users - as simple to use as transient data
  – Flexibility - more than one storage solution at the same time; implement workable schema evolution
  – Performance - e.g. is I/O on demand needed? If yes, what is the right granularity: one object? One event?

Language Binding & Insulation
• Two main approaches are used
  – In-place access of persistent objects
    • The framework is implemented using the language binding
    • Only C++ pointers to persistent data are exposed to the user (CMS, STAR, PHENIX)
  – Access of transient copies
    • Complete conversion into transient objects (BaBar)
    • On-demand conversion into transient objects using smart pointers (LHCb)
• Experiment-specific insulation layer
  – usually coupled to a specific application framework
• In both cases: split into two interfaces
  – framework uses the more flexible, performant, exposed lower level
  – end user uses the more insulated, transparent, customised higher level
• Is the mapping layer in between really experiment specific?
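
The on-demand conversion idea can be illustrated with a small sketch. It is written in Python rather than C++ for brevity, and it does not reproduce any experiment's framework: `PersistentTrack`, `Track`, `LazyRef` and the in-memory `store` are hypothetical names invented for this example. The reference object plays the role of the smart pointer: nothing is loaded or converted until the user dereferences it.

```python
class PersistentTrack:
    """Hypothetical stored layout: a flat tuple of momentum components."""

    def __init__(self, raw):
        self.raw = raw  # (px, py, pz)


class Track:
    """Hypothetical transient class seen by end-user code."""

    def __init__(self, px, py, pz):
        self.px, self.py, self.pz = px, py, pz


class LazyRef:
    """Smart-pointer analogue: load and convert on first access only."""

    def __init__(self, loader, converter):
        self._loader = loader        # fetches the persistent object
        self._converter = converter  # persistent -> transient mapping
        self._obj = None
        self.loads = 0               # counts actual conversions

    def get(self):
        if self._obj is None:
            self._obj = self._converter(self._loader())
            self.loads += 1
        return self._obj


# A dict stands in for the persistent store in this sketch.
store = {42: PersistentTrack((1.0, 2.0, 3.0))}
track_ref = LazyRef(lambda: store[42], lambda p: Track(*p.raw))
```

Creating `track_ref` touches nothing in the store; the first `track_ref.get()` performs the load and conversion, and later calls return the same cached transient object. Complete conversion, by contrast, would run the converter for every object up front.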

Schema Evolution & Object Conversion
• BaBar - Objectivity/DB
  – presented a conversion scheme using their transient/persistent mapping scheme
• STAR - ROOT
  – implemented an additional conversion mechanism which replaces the user schema evolution provided by ROOT
• CLEO III - Objectivity/DB
  – implements a system based on opaque data objects stored in Objectivity
• Is an experiment-independent implementation of schema evolution possible?
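
One generic way to approach object conversion is a chain of per-version converters applied at read time. The sketch below is an assumption made for illustration; it is not the BaBar, STAR or CLEO III mechanism, and the record fields (`e`, `energy`, `charge`) and version numbers are hypothetical.

```python
def v1_to_v2(rec):
    # Hypothetical change: version 2 renamed field "e" to "energy".
    out = {k: v for k, v in rec.items() if k != "e"}
    out["energy"] = rec["e"]
    out["version"] = 2
    return out


def v2_to_v3(rec):
    # Hypothetical change: version 3 added "charge" with a default of 0.
    out = dict(rec)
    out.setdefault("charge", 0)
    out["version"] = 3
    return out


# One converter per stored version; each maps to the next version.
CONVERTERS = {1: v1_to_v2, 2: v2_to_v3}
CURRENT_VERSION = 3


def upgrade(rec):
    """Apply converters step by step until the record is current."""
    while rec["version"] < CURRENT_VERSION:
        rec = CONVERTERS[rec["version"]](rec)
    return rec


old_record = {"version": 1, "e": 12.5}
new_record = upgrade(old_record)
```

Each converter returns a new record, so the stored version is never modified in place. Adding a version 4 would need only one more entry in `CONVERTERS`.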

From Data Storage to Data Management
• Consistent management of a distributed data store
  – needs knowledge about semantics of the data
  – which files belong to one event collection, run, calibration period
    • they should be discarded together
    • staged together
    • exported together
• Strong coupling to system details
  – which application logic
  – batch system
  – mass storage system
• Significantly larger functionality & complexity
  – significant development effort
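
The point that files belonging to one collection should be handled as a unit can be sketched as below. This is a toy model under stated assumptions: `DatasetCatalogue`, the collection names and the file names are hypothetical, and a real system would also have to drive the mass storage and batch systems.

```python
from collections import defaultdict


class DatasetCatalogue:
    """Maps a semantic unit (run, collection, calibration period) to files."""

    def __init__(self):
        self._files = defaultdict(set)

    def add(self, collection, path):
        self._files[collection].add(path)

    def files(self, collection):
        # Everything that must be staged or exported together.
        return sorted(self._files.get(collection, ()))

    def discard(self, collection):
        # Remove the whole unit at once; returns the files to delete.
        return sorted(self._files.pop(collection, set()))


datasets = DatasetCatalogue()
datasets.add("run1", "b.root")
datasets.add("run1", "a.root")
datasets.add("calib-A", "c.root")
```

Staging or exporting "run1" then operates on the list from `datasets.files("run1")`, so no member file is left behind.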

Performance Optimisation of Complex Storage Systems
• Successful system optimisation requires correlated diagnostics on all levels
  – Mass Storage System
    • number of mounted tapes, file lifetime in disk pool
  – Data Server
    • I/O per server, per filesystem, per network interface
  – Lock Server
    • number of locks, number of waiting processes, locked resources
  – Client Host
    • I/O per client, per filesystem, per machine, total CPU usage
    • number of running processes
  – Client Application
    • number of used objects, containers and databases, transaction timing
    • regular profiling runs
• All system components need monitoring instrumentation
  – understanding of chaotic areas like analysis servers is definitely non-trivial

Transactions & Recovery
• Are transactions needed to allow fail-safe concurrent access?
  – Is it cheaper/easier to work in the old (manual) way?
    • With sequential recovery: just throw away the last file, the last group of files, change some meta data …
    • Application level consistency checks?
  – IT industry seems to have a different opinion
    • Use transactions to enforce consistency between the different parts of the store
    • Is the recovery of our data and meta data really that much simpler?
    • How does one integrate transactions in multiple storage systems?
• The production experience of the next generation of experiments will tell us more.
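
To make the transactional option concrete, the sketch below keeps two metadata tables consistent by updating them in one transaction. It is an illustration only: SQLite is used as a stand-in database, the `runs` and `files` tables are hypothetical, and it covers a single storage system, not the multi-system case raised above.

```python
import sqlite3

meta = sqlite3.connect(":memory:")
meta.execute("CREATE TABLE files (path TEXT PRIMARY KEY)")
meta.execute(
    "CREATE TABLE runs (run INTEGER PRIMARY KEY, n_files INTEGER NOT NULL)"
)
meta.commit()


def register_run(conn, run, paths):
    # Both tables change together: the context manager commits if every
    # statement succeeds and rolls back all of them if any one fails.
    with conn:
        conn.execute("INSERT INTO runs VALUES (?, ?)", (run, len(paths)))
        for path in paths:
            conn.execute("INSERT INTO files VALUES (?)", (path,))


def count(conn, table):
    # Table name comes from this sketch only, never from user input.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


register_run(meta, 1, ["a.root", "b.root"])
try:
    # "a.root" is already registered, so the second insert fails and the
    # run 2 row and "c.root" are rolled back with it.
    register_run(meta, 2, ["c.root", "a.root"])
except sqlite3.IntegrityError:
    pass
```

After the failed call the store still holds exactly one run and two files. The manual alternative would need an explicit clean-up step to remove the half-registered run.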

Summary of the Summary
• Significant progress in providing object persistency for a real-life experiment
  – BaBar successfully went into production with an ODBMS-based store
• Management of complex systems is a significant effort
  – Solutions for schema evolution, insulation layers, data import/export have been developed for specific experiment frameworks
  – Can some of those solutions be generalised?
• Still open questions
  – direct use of persistent objects or converted copies?
  – single ODBMS system or files + metadata in an RDBMS?
• More experience needed from running experiments
  – RHIC and Fermilab Run II experiments will soon be able to tell us more
