KANGA: ROOT Access to BABAR Data for Physics Analysis. David Kirkby, UC Irvine, for the BABAR Computing Group. CHEP ’03, Data Management & Persistency, 25 March 2003.


1 KANGA: ROOT Access to BABAR Data for Physics Analysis. David Kirkby, UC Irvine, for the BABAR Computing Group. CHEP ’03, Data Management & Persistency, 25 March 2003. Primary reference: T.J. Adye, A. Dorigo, R. Dubitzky, A. Forti, S.J. Gowdy, G. Hamel de Monchenault, R.G. Jacobsen, D. Kirkby, S. Kluth, E. Leonardi, A. Salnikov, L. Wilden, Comp. Phys. Comm. 150, pp. 197-214 (2003).

2 ROOT Access to BABAR Data, D. Kirkby, CHEP 03. The BABAR Experiment. The BABAR experiment records e+e- collisions at the SLAC PEP-II collider. BABAR has ~600 collaborators from 77 institutions in 10 countries. Approximately half are from US institutions.

3 The BABAR Detector. The BABAR detector has ~200k channels read out at ~100 Hz, with a typical raw-data event size of 25 kB. The experiment wrote ~300 TB to tape for the ~40/fb recorded during 2001, with ~10 TB kept on disk at SLAC. Projected luminosity increases will deliver an integrated ~500/fb by the end of 2006.

4 BABAR Physics Analysis and Data Access. BABAR has published ~36 physics papers since Feb 2001. The typical physics analysis only needs access to a “micro-DST” for sparse subsets of data and Monte Carlo. Until 1999, data was stored exclusively in an Objectivity (Objy) database (now >750 TB). Raw, Sim and Reco data are no longer being kept.
[Figure: event data tiers: raw/simulated hit data, reconstructed data, event summary data, analysis objects, Tag, and Monte Carlo truth data. The “micro-DST” includes a subset of the truth data. Per-event size labels in the figure: ~0.7, ~3.0, ~8.5, ~120, ~53 and ~15 kB/evt.]

5 BABAR Analysis Framework. BABAR analysis uses a standard software framework built around Begin/NextEvent/Finalize transitions. Each transition is passed through a sequence of execution “modules” with a common base class. Special modules handle data I/O and the conversion between persistent and transient object representations. User modules deal only with transient object representations. Data access is handled differently for event and non-event (“conditions”) sources. This framework design completely decouples the reconstruction and analysis code from the data store technology, at some cost in performance.
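
The module sequence described on this slide can be illustrated with a minimal Python sketch. It is not the BABAR framework (which is not shown in this talk); the class names `Module`, `InputModule` and `CountTracks`, the `run` driver and the dictionary used as an event are all hypothetical, chosen only to show how an I/O module can hide the persistent representation from user modules.

```python
from typing import Any, Dict, List


class Module:
    """Common base class: every module sees the same three transitions."""

    def begin(self) -> None:
        pass

    def next_event(self, event: Dict[str, Any]) -> bool:
        """Return False to skip the remaining modules for this event."""
        return True

    def finalize(self) -> None:
        pass


class InputModule(Module):
    """Hypothetical I/O module: converts a persistent record to transient data."""

    def next_event(self, event: Dict[str, Any]) -> bool:
        persistent = event.pop("persistent")            # store-specific record
        event["n_tracks"] = len(persistent["tracks"])   # transient representation
        return True


class CountTracks(Module):
    """Hypothetical user module: touches transient data only."""

    def begin(self) -> None:
        self.total = 0

    def next_event(self, event: Dict[str, Any]) -> bool:
        self.total += event["n_tracks"]
        return True


def run(modules: List[Module], records: List[dict]) -> None:
    """Pass each transition through the module sequence in order."""
    for module in modules:
        module.begin()
    for record in records:
        event: Dict[str, Any] = {"persistent": record}
        for module in modules:
            if not module.next_event(event):
                break
    for module in modules:
        module.finalize()


counter = CountTracks()
run([InputModule(), counter], [{"tracks": [1, 2]}, {"tracks": [3]}])
print(counter.total)
```

In this sketch, swapping the data store would mean replacing `InputModule` only; `CountTracks` would not change, which is the decoupling the slide describes.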

6 Motivation for KANGA. An Aug 99 review of BABAR Computing examined the challenges involved in producing first physics results under conference deadline pressure. Access to data, both at SLAC and at remote sites, was identified as a critical bottleneck in physics analysis. Objectivity (Objy) performance problems were recognized as a weakness of the computing model at the time, in particular the limitations imposed by large files (~2 GB for analysis data) and poor lock-server scaling with many (~100) clients. The review committee recommended that BABAR develop a “limited-function short-to-medium term solution”…

7 KANGA Design Requirements. This recommendation led to the following design requirements:
1. Access to the identical micro-DST data available from Objy. No support for access to lower-level data.
2. Compatibility with the existing framework and user analysis code. Changes are almost transparent to analysis users (relink required).
3. Fast event filtering using simple “attributes” (TAG) data.
4. Simple and efficient distribution of data to remote (non-SLAC) sites.

8 The Implementation: KANGA (ROO). The name stands for Kind ANd Gentle Analysis (without Relying On Objectivity). The key technical decision was to use ROOT objects and files for the persistent data store. In general, there are many tradeoffs involved in the Objy/ROOT decision. Our decision was made in the context of a limited-function, short-term solution that would enhance the capabilities of a continuing Objy data store, and that could be completed quickly. KANGA was implemented and deployed in ~4 months by a small (~5) team in 1999.

9 Event Data: Overview. KANGA event data is stored in ROOT TTree objects. Each branch represents a small set of persistent classes with one branch instance per event. Events from one run are usually grouped into a single file containing 2 trees (Analysis objs, Tag attributes). Typical event size is ~1.7 kB for data (21.6 GB per /fb) and 4.7 kB for Monte Carlo. Tag attributes are stored as built-in types.
[Figure: a KANGA file containing two trees: Analysis Objs (branches class-1 … class-n) and Tag attributes (branches attr-1 … attr-m). There are ~10^6 such files now.]
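
The two data-size figures quoted on this slide imply an event count per unit luminosity. The short calculation below works it out, assuming decimal units (1 kB = 10^3 bytes, 1 GB = 10^9 bytes). The final line scales to the ~500/fb projected earlier in the talk, assuming the per-/fb volume stays constant; that is an illustrative assumption, not a figure from the talk.

```python
KB = 1e3   # bytes, decimal units assumed
GB = 1e9

event_size_bytes = 1.7 * KB      # typical KANGA data event
bytes_per_inv_fb = 21.6 * GB     # KANGA data volume per /fb

# Events per /fb implied by the two numbers above.
events_per_inv_fb = bytes_per_inv_fb / event_size_bytes
print(f"{events_per_inv_fb:.3g} events per /fb")

# Naive scaling to the projected 500/fb, assuming constant volume per /fb.
projected_tb = 500 * bytes_per_inv_fb / 1e12
print(f"{projected_tb:.1f} TB of KANGA data for 500/fb")
```

This gives roughly 1.3 x 10^7 events per /fb and about 11 TB of data for 500/fb, excluding Monte Carlo.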

10 Event Data: Architecture. BABAR event data I/O is managed by special-purpose framework execution modules. Only those modules dealing directly with persistent analysis objects and Tag attributes were re-implemented for KANGA. A significant factor in the rapid deployment of KANGA was the earlier design decision to completely decouple the event store technology from the analysis framework.
[Figure: two module sequences. One runs Input Module, Reco. Module, …, Output Module (RAW to DST); the other runs Input Module, Analysis Module, …]

11 Event Data: Attribute Tags. The design requirement of fast selection on a sparse set of event attributes (total energy, number of muons, etc.) required a small compromise in the persistent/transient decoupling to gain efficiency. Instead of converting attributes, KANGA uses the “adapter pattern” to implement the transient interface directly in terms of persistent objects. This compromise ties the transient class directly to the ROOT persistent class, but without exposing the persistent class to user code.
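
A minimal sketch of the adapter idea, in Python rather than the framework's own language. All names here (`TagAccessor`, `PersistentTagRow`, `TagAdapter`, `select`, and the attribute keys) are hypothetical stand-ins, not BABAR or ROOT classes. The point is only that the transient interface reads straight from the persistent object, with no conversion step, while user code sees the interface alone.

```python
from abc import ABC, abstractmethod


class TagAccessor(ABC):
    """Transient interface seen by user code (hypothetical name)."""

    @abstractmethod
    def get_float(self, name: str) -> float: ...

    @abstractmethod
    def get_int(self, name: str) -> int: ...


class PersistentTagRow:
    """Stand-in for one persistent tree entry holding built-in-typed attributes."""

    def __init__(self, values: dict):
        self._values = values

    def read(self, name: str):
        return self._values[name]


class TagAdapter(TagAccessor):
    """Implements the transient interface directly on the persistent row.

    Nothing is copied or converted up front: each call reads only the
    attribute that was asked for.
    """

    def __init__(self, row: PersistentTagRow):
        self._row = row

    def get_float(self, name: str) -> float:
        return float(self._row.read(name))

    def get_int(self, name: str) -> int:
        return int(self._row.read(name))


def select(tag: TagAccessor) -> bool:
    """User-level filter: depends on the transient interface only."""
    return tag.get_float("total_energy") > 5.0 and tag.get_int("n_muons") >= 2


rows = [
    PersistentTagRow({"total_energy": 9.2, "n_muons": 2}),
    PersistentTagRow({"total_energy": 3.1, "n_muons": 4}),
]
selected = [i for i, row in enumerate(rows) if select(TagAdapter(row))]
print(selected)
```

The cost of this design is the one the slide names: `TagAdapter` is tied to the persistent class, so it must change whenever the persistent layout does.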

12 Event Data: Object References. Direct references (e.g. by pointer) between transient classes require special handling to be persisted. KANGA implements a general mechanism to support persistence of references between transient objects that are valid in a single execution context. In practice, this limits references to be within an event and does not support inter-event references. BABAR transient classes do not use direct references, and rely instead on indirect indexing, so this feature is not currently being exploited.
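
One common way to persist direct references within an event is to replace each reference by the index of its target in the event's object list, and to rebuild the references on read. The sketch below shows that technique only. It is an assumption for illustration, not the KANGA mechanism, which this talk does not detail. The `Track` class and both functions are hypothetical.

```python
class Track:
    """Transient object holding a direct reference to another Track."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # direct reference, valid within one event only


def persist_event(tracks):
    """Replace direct references by positions in the event's track list."""
    index_of = {id(t): i for i, t in enumerate(tracks)}
    records = []
    for t in tracks:
        if t.parent is None:
            parent_idx = -1
        else:
            # Raises KeyError if the target is not part of this event,
            # which is why inter-event references cannot be stored this way.
            parent_idx = index_of[id(t.parent)]
        records.append({"name": t.name, "parent": parent_idx})
    return records


def restore_event(records):
    """Rebuild the transient objects, then re-link the references."""
    tracks = [Track(r["name"]) for r in records]
    for track, record in zip(tracks, records):
        if record["parent"] >= 0:
            track.parent = tracks[record["parent"]]
    return tracks


mother = Track("K0s")
event = [mother, Track("pi+", parent=mother), Track("pi-", parent=mother)]
restored = restore_event(persist_event(event))
print(restored[1].parent.name)
```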

13 Event Data: Schema Evolution. “Schema” describes the organization of data in a persistent object. Schema evolution is desirable to support improvements in data representation and pruning of obsolete data. ROOT I/O supports schema evolution for TObject subclasses via user-managed version numbers for each persistent class, which are used to dispatch the appropriate input-streamer code at object-read time. KANGA additionally requires updated classes to implement a standard (frozen) interface for persistent-to-transient conversion.
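
The version-dispatch idea can be sketched in a few lines of Python. This is not ROOT's streamer machinery: the byte layout, the `Cluster` class, the two persistent revisions and the `READERS` table are all hypothetical. The sketch shows a stored version number selecting the reader, and every revision offering the same `transient()` conversion so that downstream code does not change.

```python
import struct


class Cluster:
    """Transient class used by analysis code (hypothetical)."""

    def __init__(self, energy: float, time: float = 0.0):
        self.energy = energy
        self.time = time


class ClusterP_v1:
    """Revision 1 of the persistent class: energy only."""

    VERSION = 1

    def __init__(self, energy: float):
        self.energy = energy

    @classmethod
    def read(cls, payload: bytes) -> "ClusterP_v1":
        (energy,) = struct.unpack("<d", payload)
        return cls(energy)

    def transient(self) -> Cluster:          # frozen conversion interface
        return Cluster(self.energy)          # time defaults for old data


class ClusterP_v2:
    """Revision 2 adds a time field."""

    VERSION = 2

    def __init__(self, energy: float, time: float):
        self.energy = energy
        self.time = time

    @classmethod
    def read(cls, payload: bytes) -> "ClusterP_v2":
        energy, time = struct.unpack("<dd", payload)
        return cls(energy, time)

    def transient(self) -> Cluster:          # same interface as revision 1
        return Cluster(self.energy, self.time)


# Readers for every revision must remain available to read old files.
READERS = {c.VERSION: c for c in (ClusterP_v1, ClusterP_v2)}


def write_cluster(cluster: Cluster) -> bytes:
    """New code writes only the newest revision."""
    return struct.pack("<Bdd", ClusterP_v2.VERSION, cluster.energy, cluster.time)


def read_cluster(blob: bytes) -> Cluster:
    """Dispatch on the stored version number, then convert to transient."""
    version = blob[0]
    return READERS[version].read(blob[1:]).transient()


old_blob = struct.pack("<Bd", 1, 2.5)        # data written by revision-1 code
print(read_cluster(old_blob).energy, read_cluster(old_blob).time)
```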

14 After schema evolution, only new objects are written by new code. New and existing code must be linked against all versions of persistent classes. No change is required to user modules.
[Figure: before/after diagram with labels Rev.1, Rev.2 and Modules.]

15 Conditions Data: Overview. Non-event data tracks slowly-varying (<1 Hz) data-taking conditions, e.g. high voltages, gas flows, temperatures. Calibration results are also considered “conditions”. Conditions data is accessed using time as a key, unlike event data. The full BABAR conditions DB is implemented in Objy and supports a flexible revision mechanism.

16 KANGA Conditions Data. KANGA supports access to the limited set of conditions needed for typical physics analysis. Access is read-only and limited to a single revision. The most recent revision of specific conditions is automatically extracted from Objy and stored in a single ROOT file of ~20 MB. Separate files are used for data and MC. The ROOT persistent implementation uses a binary tree (BTree class) for efficient time-key lookup with 1 s resolution. Correct association of event and non-event ROOT files requires some non-trivial bookkeeping.
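
Time-keyed lookup means finding the conditions record whose validity interval contains the event time. The Python sketch below swaps in a sorted list with binary search (the standard `bisect` module) in place of the tree structure named on the slide. The `ConditionsStore` class and its methods are hypothetical. Times are integer seconds, matching the 1 s resolution quoted above.

```python
import bisect


class ConditionsStore:
    """Read-mostly store: each value is valid from its start time until
    the start time of the next entry (single revision, no history)."""

    def __init__(self):
        self._starts = []   # sorted validity start times, integer seconds
        self._values = []   # value valid from the matching start time

    def add(self, start_s: int, value) -> None:
        i = bisect.bisect_left(self._starts, start_s)
        self._starts.insert(i, start_s)
        self._values.insert(i, value)

    def lookup(self, event_time_s: float):
        t = int(event_time_s)                        # 1 s resolution
        i = bisect.bisect_right(self._starts, t) - 1  # last start <= t
        if i < 0:
            raise KeyError(f"no conditions valid at t={t}")
        return self._values[i]


store = ConditionsStore()
store.add(200, {"hv_volts": 1930})
store.add(100, {"hv_volts": 1900})
print(store.lookup(150.7))
```

Each lookup costs O(log n) comparisons, the same scaling a balanced tree would give.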

17 Event Collections. Physics analysis typically involves analyzing sparse subsets of the events in a data file, but different analyses require different subsets. Sparse collections used for analysis are grouped into ~100 “skims”. Skims were initially written using self-contained copies of each event. Grouping correlated skims into ~20 “streams” limited the event-duplication overhead to ~200%. More recently, pointer-based collections were implemented. These are more efficient for bulk storage and distribution, but carry additional bookkeeping overhead. BABAR is now moving in this direction.
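
The difference between the two kinds of collection can be sketched as follows. A pointer-based skim stores only (file, entry) pairs and reads the events from the original files on demand, so no event is duplicated. The `EventPointer` type, the function names and the in-memory dictionary standing in for event files are all hypothetical.

```python
from typing import Callable, Dict, Iterator, List, NamedTuple


class EventPointer(NamedTuple):
    """Location of one event: which file, and which entry within it."""

    file_name: str
    entry: int


def build_pointer_skim(
    files: Dict[str, List[dict]], keep: Callable[[dict], bool]
) -> List[EventPointer]:
    """Record where the selected events live instead of copying them."""
    return [
        EventPointer(name, i)
        for name, events in files.items()
        for i, event in enumerate(events)
        if keep(event)
    ]


def read_skim(
    files: Dict[str, List[dict]], skim: List[EventPointer]
) -> Iterator[dict]:
    """Follow each pointer back to the original file.

    This is where the extra bookkeeping comes in: every referenced file
    must still be present at the site reading the skim.
    """
    for ptr in skim:
        yield files[ptr.file_name][ptr.entry]


# In-memory stand-in for two event files.
files = {
    "run1.root": [{"n_muons": 0}, {"n_muons": 2}],
    "run2.root": [{"n_muons": 3}],
}
dimuon_skim = build_pointer_skim(files, lambda ev: ev["n_muons"] >= 2)
print(dimuon_skim)
```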

18 KANGA Bookkeeping & Production. The set of available KANGA event-data files and their processing history is tracked in a relational DB managed with Perl scripts (“SkimTools” package). This DB is used to schedule and monitor jobs for producing KANGA files from Objy (as well as physics skims from unfiltered data and MC). Users can query this database to prepare a TCL fragment that configures their analysis job to analyze a dataset. The size of the DB is ~400 MB. Tables and scripts are compatible with Oracle and MySQL.

19 Data Export. Straightforward and efficient data export was a primary requirement of the KANGA design. Goals:
- Only transfer files that are new (once created, a file is assumed never to change).
- Mirror the SLAC filesystem layout to simplify logical-to-physical name mapping between sites.
The initial implementation, based on rsync, was not efficient for typical directories containing O(1000) files. The present implementation uses the relational DB to efficiently generate lists of new files to transfer.
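
Generating the list of new files from a database amounts to one query: files known to the bookkeeping DB that have no transfer record for the destination site. The sketch below uses Python's built-in `sqlite3` with an in-memory database. The table layout (`files`, `transfers`), the column names and the site label are hypothetical and are not taken from the SkimTools schema.

```python
import sqlite3


def new_files(conn: sqlite3.Connection, site: str) -> list:
    """Return paths known to the DB but not yet transferred to `site`."""
    rows = conn.execute(
        """
        SELECT f.path
        FROM files AS f
        WHERE NOT EXISTS (
            SELECT 1 FROM transfers AS t
            WHERE t.path = f.path AND t.site = ?
        )
        ORDER BY f.path
        """,
        (site,),
    )
    return [path for (path,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE files (path TEXT PRIMARY KEY);
    CREATE TABLE transfers (path TEXT, site TEXT, PRIMARY KEY (path, site));
    """
)
conn.executemany(
    "INSERT INTO files VALUES (?)",
    [("run1/a.root",), ("run1/b.root",), ("run2/c.root",)],
)
conn.execute("INSERT INTO transfers VALUES (?, ?)", ("run1/a.root", "site_x"))

# Files are assumed never to change once created, so a path that has been
# transferred once never needs to be compared or sent again.
pending = new_files(conn, "site_x")
print(pending)
```

Because the destination mirrors the source layout, the relative path in `pending` can serve as both the source and the destination name. No directory scan is needed on either side, which is what made the rsync approach slow for large directories.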

20 Experience and Outlook. Since May 2002, the primary KANGA event store has been based at Rutherford (RAL). RAL currently stores 22 TB of data and Monte Carlo (~8B events) in 1.1M files. A survey in early 2002 found that at least 19 institutions operated a local KANGA event store, including 5 with the majority of data available. Head-to-head comparisons of analysis results obtained with KANGA and Objy provide a valuable QA tool.

21 Although conceived as a short-term solution, KANGA is still with us 3 years later. The burden of duplicated support and storage is becoming unsustainable. BABAR is now implementing a new Computing Model in which ROOT is the primary event store technology. This migration involves the eventual complete phase-out of Objectivity from the event store, and possibly significant changes to the original KANGA design to support other features of the new Computing Model.

