ATLAS Databases: An Overview, Athena use of Geometry/Conditions DB, and Conditions Metadata Elizabeth Gallas - Oxford ATLAS-UK Distributed Computing Tutorial.

1 ATLAS Databases: An Overview, Athena use of Geometry/Conditions DB, and Conditions Metadata
Elizabeth Gallas - Oxford
ATLAS-UK Distributed Computing Tutorial, Edinburgh, UK, March 21-22, 2011

2 Outline (Mar 2011, Elizabeth Gallas - Databases/COMA)
- Motivation: Databases
- Overview of ATLAS Databases
- Databases of Athena-based analysis interest
  - Geometry Database
  - Conditions Database
  - And how they are made accessible on the grid
- COMA (Conditions Metadata)
  - Selected/derived Run/Lb-wise Conditions/Configuration in relational format
  - Data Periods in COMA
  - Other COMA reports
- Summary and Conclusions

3 Motivation: Database use in ATLAS
- ATLAS “data” falls into 2 broad categories
  - Event-wise data: stored in files (RAW, ESD, AOD, TAG …)
    - Know something about themselves but also have some ‘metadata’ pointers to the bigger picture
  - Non-event-wise data: stored in databases
    - Enable construction of the ‘bigger picture’
    - Important information needed at our fingertips
    - Usually by diverse clients
- Database Management Systems (DBMS) provide:
  - persistent storage
    - for large/small collections of data of varied complexity
    - in data structures that provide access flexibility
  - powerful query language
    - for data entry, modification and retrieval
  - transaction management
    - appearance of isolation
    - while providing simultaneous multi-user access

4 Overview – Oracle usage in ATLAS
Oracle is used extensively at every stage of data taking, processing and analysis. Some of the more common applications:
- Configuration
  - PVSS – Detector Control System (DCS) Configuration & Monitoring
  - Trigger – Trigger Configuration (online and simulation data)
  - OKS – Configuration databases for the TDAQ
  - Geometry – Detector Description
- File and Job management
  - T0 – Tier 0 processing
  - DQ2/DDM – distributed file and dataset management
  - Dashboard – monitor jobs and data movement on the ATLAS grid
  - PanDA – workload management: production & distributed analysis
- Conditions data (non-event data for offline analysis)
  - Conditions Database
  - [POOL files in DDM (referenced from the Conditions DB)]
- “Metadata” == data about data
  - AMI (ATLAS Metadata Interface) – Dataset metadata
  - COMA (COnditions MetadatA) – Configuration/Conditions metadata
  - TAGs (not an acronym) – Event-level metadata

5 What does your Athena job need?
What does every Athena job need?
1. Data (Events)
2. Database (Geometry, Conditions)
3. Efficient I/O (sometimes across a network), CPU
4. (A Purpose and a) Place for Output
Next slides … more details about Geometry and Conditions
- What they contain
- How Athena accesses them
- How they are distributed for access on the grid
- User interfaces, documentation, and help
Needs: 1. Food 2. Water 3. Love 4. Place for output

6 Geometry Database
- Relational DB: Primary Numbers for the ATLAS Detector Description
  - All data for building the GeoModel description in a single place
  - Primary numbers stored in Data Tables (leaf)
  - Organized by subsystem (branch)
- Tagging (versioning) at various levels
  - Locked tags define a distinct detector description
  - And globally tagged/locked at higher levels
  - Associated with Software Releases
- Evolution of Geometry tags is set up such that
  - Each new tag is compatible with older Releases
- Location and Distribution:
  - Master copy: in Oracle server at CERN
  - Up to now: copy of the entire database dumped into an SQLite file
    - Delivered to sites using DB Release technology with each Software Release
  - Future … more diverse distribution model being tested (Frontier)
    - Update: (Vakho Tsulaia) in upcoming Software/Computing workshop
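The branch/leaf tagging described on this slide can be pictured with a small sketch. This is only an illustration of the idea that one global tag selects a tag for each leaf data table. The class, the function names and every tag string below are hypothetical and are not taken from the ATLAS Geometry DB.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TagNode:
    """One node of a hypothetical tag hierarchy (branch or leaf data table)."""
    name: str                 # subsystem (branch) or data table (leaf)
    tag: str                  # tag chosen for this node
    locked: bool = False      # locked tags can no longer change
    children: List["TagNode"] = field(default_factory=list)


def resolve_leaf_tags(node: TagNode) -> Dict[str, str]:
    """Walk down from a (global) tag and collect the tag of every leaf table."""
    if not node.children:
        return {node.name: node.tag}
    resolved: Dict[str, str] = {}
    for child in node.children:
        resolved.update(resolve_leaf_tags(child))
    return resolved


def fully_locked(node: TagNode) -> bool:
    """A description is frozen only if the node and all its descendants are locked."""
    return node.locked and all(fully_locked(child) for child in node.children)


# Invented example hierarchy: global tag -> subsystem tags -> leaf table tags.
global_tag = TagNode("ATLAS", "ATLAS-EXAMPLE-00", locked=True, children=[
    TagNode("InnerDetector", "ID-EXAMPLE-01", locked=True, children=[
        TagNode("PixelBarrelParams", "PixelBarrelParams-02", locked=True),
    ]),
    TagNode("Calorimeter", "Calo-EXAMPLE-03", locked=True, children=[
        TagNode("LArBarrelParams", "LArBarrelParams-01", locked=True),
    ]),
])

leaf_tags = resolve_leaf_tags(global_tag)
print(leaf_tags)
print("frozen description:", fully_locked(global_tag))
```

In this picture a job only needs to name the global tag. The tags of the individual tables follow from the hierarchy.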

7 Geometry DB Browser

8 “Conditions”
“Conditions” is a general term for information which is not ‘event-wise’, reflecting the conditions or states of a system. Conditions are valid for an ‘interval of validity’ (IOV) ranging from very short to infinity. IOVs can be expressed as a range in timestamps or in Run/LumiBlocks.
Any conditions data needed for offline processing and/or analysis must be stored in the ATLAS Conditions Database (aka COOL) or in its referenced POOL files (DDM).
[Figure: ATLAS Conditions Database, with labels ZDC, DCS, TDAQ, OKS, LHC, DQ]
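To make the IOV idea concrete, here is a minimal sketch of an interval-of-validity lookup keyed by (run, lumi block). It is not the COOL API. The class `IovSeries`, the `INFINITY` marker and the payload contents are assumptions made for illustration, and the sketch assumes the stored intervals do not overlap.

```python
from bisect import bisect_right

# Open-ended upper bound for "valid until further notice" (illustrative choice).
INFINITY = (float("inf"), 0)


class IovSeries:
    """Payloads for one conditions quantity, each valid over [since, until)."""

    def __init__(self):
        self._since = []     # sorted (run, lumi_block) start keys
        self._entries = []   # matching (since, until, payload) tuples

    def store(self, since, until, payload):
        # Keep both lists sorted by the start of the interval.
        idx = bisect_right(self._since, since)
        self._since.insert(idx, since)
        self._entries.insert(idx, (since, until, payload))

    def lookup(self, key):
        # Find the last interval starting at or before `key` ...
        idx = bisect_right(self._since, key) - 1
        if idx < 0:
            return None
        since, until, payload = self._entries[idx]
        # ... and accept it only if `key` is still inside it.
        return payload if key < until else None


hv = IovSeries()
hv.store((180000, 1), (180000, 50), {"hv_status": "ramping"})
hv.store((180000, 50), INFINITY, {"hv_status": "on"})

print(hv.lookup((180000, 10)))   # inside the first interval
print(hv.lookup((190000, 3)))    # covered by the open-ended interval
print(hv.lookup((170000, 1)))    # before any stored interval
```

The same structure works for timestamp-based IOVs by using integers instead of (run, lumi block) tuples as keys.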

9 Conditions DB infrastructure in ATLAS
- Relies on considerable infrastructure: COOL, CORAL, Athena (developed by ATLAS and CERN IT)
  - Generic schema design which can store, accommodate and deliver a large amount of data for a diverse set of subsystems
- IOV (‘interval of validity’) DB in relational DB tables
  - Data organized into folders … foldersets
    - By schema (subdetector)
    - By instance (for real data and MC)
  - Stores data ‘inline’ but can have references to external POOL files (managed by DDM)
- Athena / Conditions DB
  - Data maps to transient C++ objects, which are accessible to Athena at run time through the Transient Store
- COOL Tag (version): distinct sets of Conditions
  - Making specific computations reproducible
- Used at many stages of data taking and analysis
  - From online calibrations, alignment, monitoring, to offline … processing … more calibrations … further alignment … reprocessing … analysis … to luminosity and data quality
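The role of a tag as a named version of the conditions in a folder can be sketched as below. This is a toy in-memory model, not COOL. The folder path, tag names, run ranges and payload values are all invented for illustration.

```python
# Toy conditions store: (folder, tag) -> list of (since_run, until_run, payload).
# Every name and number here is hypothetical.
conditions = {
    ("/EXAMPLE/Align", "ExampleAlign-v1"): [
        (0, 100, {"dx_mm": 0.10}),
        (100, 200, {"dx_mm": 0.12}),
    ],
    ("/EXAMPLE/Align", "ExampleAlign-v2"): [
        (0, 200, {"dx_mm": 0.11}),
    ],
}


def read_conditions(folder, tag, run):
    """Return the payload valid for `run` in the given folder and tag."""
    for since, until, payload in conditions.get((folder, tag), []):
        if since <= run < until:
            return payload
    raise KeyError(f"no conditions for {folder} [{tag}] at run {run}")


# The same run gives different constants under different tags ...
v1 = read_conditions("/EXAMPLE/Align", "ExampleAlign-v1", 150)
v2 = read_conditions("/EXAMPLE/Align", "ExampleAlign-v2", 150)
print(v1, v2)

# ... while a fixed (folder, tag, run) always gives the same answer,
# which is what makes a computation reproducible.
assert read_conditions("/EXAMPLE/Align", "ExampleAlign-v1", 150) == v1
```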

10 Conditions: User interfaces
- Command line interface:
- Conditions TAG Browser:

11 Oracle Distribution of Conditions data
- Oracle stores a huge amount of essential data ‘at our fingertips’
  - But ATLAS has many… many… many… fingers
  - May be looking for oldest to newest data
- Conditions in Oracle: master copy at Tier-0
  - Replicated to many Tier-1 sites
- Running jobs at Oracle sites (direct access) performs well
- But direct Oracle access on the grid from remote sites:
  - Even after tuning, direct access requires many back/forth network transactions: RTT (Round Trip Time) multiplies … SLOW
  - Cascade effect: jobs hold connections longer, which prevents new jobs from starting
- Use alternative technologies, especially over WAN (Wide Area Network):
  - “caching” Conditions from Oracle when possible
[Simplified diagram: Online CondDB, Offline master CondDB, Tier-1 replicas, Tier-0 farm; computer centre / outside world; isolation / cut; calibration updates]
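A back-of-the-envelope calculation shows why "RTT multiplies" matters. The number of round trips and the two RTT values below are assumed purely for illustration; they are not measurements from ATLAS.

```python
def network_wait_seconds(round_trips, rtt_ms):
    """Time spent waiting on the network if every transaction costs one RTT."""
    return round_trips * rtt_ms / 1000.0


ROUND_TRIPS = 5000   # assumed number of back/forth transactions per job
LAN_RTT_MS = 0.5     # assumed RTT next to the Oracle server
WAN_RTT_MS = 150     # assumed RTT from a remote grid site

lan_wait = network_wait_seconds(ROUND_TRIPS, LAN_RTT_MS)
wan_wait = network_wait_seconds(ROUND_TRIPS, WAN_RTT_MS)

print(f"local site : {lan_wait:.1f} s waiting on the network")
print(f"remote site: {wan_wait:.1f} s waiting on the network")
```

With these assumed inputs the same job spends seconds on the network at an Oracle site and minutes from a remote site, while holding its database connection the whole time.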

12 Technologies for Conditions “caching”
- “DB Release”: make a system of files containing all data ‘needed’.
  - Used in reprocessing campaigns and for MC processing/analysis
  - Includes:
    - SQLite replicas: “mini” Conditions DB
      - with specific Folders, IOV range, CoolTag
      - (a ‘slice’: a small subset of all rows in Oracle tables)
    - And associated POOL files and a PFC (file catalog)
- “Frontier”: store results in a web cache.
  - Developed by Fermilab (used by CDF, further refined for CMS)
  - Basic idea: Frontier / Squid servers located at/near Oracle RAC
    - negotiate transactions between grid jobs and Oracle DB
    - reduce the load on Oracle by caching results of repeated queries
    - reduce latency observed connecting to Oracle over the WAN
  - Additional Squid servers at remote sites help even more
  - Used by default for user analysis jobs
  - Picture on next slide
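The ‘slice’ idea behind the SQLite replicas can be sketched with Python's standard `sqlite3` module. The single-table schema, the function `make_slice` and all folder and tag names are hypothetical. An in-memory SQLite database stands in for the Oracle master, and this is not the tool that actually builds a DB Release.

```python
import sqlite3

SCHEMA = ("CREATE TABLE iov (folder TEXT, tag TEXT, "
          "since INTEGER, until INTEGER, payload TEXT)")


def make_slice(master, folders, tag, since, until):
    """Copy only rows matching the folders, the tag and the IOV range."""
    replica = sqlite3.connect(":memory:")
    replica.execute(SCHEMA)
    placeholders = ",".join("?" for _ in folders)
    rows = master.execute(
        "SELECT folder, tag, since, until, payload FROM iov "
        f"WHERE folder IN ({placeholders}) AND tag = ? "
        "AND since < ? AND until > ?",          # IOV overlaps [since, until)
        (*folders, tag, until, since),
    ).fetchall()
    replica.executemany("INSERT INTO iov VALUES (?, ?, ?, ?, ?)", rows)
    replica.commit()
    return replica


# Stand-in for the master database, filled with invented rows.
master = sqlite3.connect(":memory:")
master.execute(SCHEMA)
master.executemany("INSERT INTO iov VALUES (?, ?, ?, ?, ?)", [
    ("/EXAMPLE/Align", "v1", 0, 100, "a"),
    ("/EXAMPLE/Align", "v1", 100, 200, "b"),
    ("/EXAMPLE/Align", "v2", 0, 200, "c"),
    ("/EXAMPLE/HV", "v1", 0, 200, "d"),
])

mini = make_slice(master, ["/EXAMPLE/Align"], "v1", since=120, until=180)
slice_rows = mini.execute("SELECT folder, tag, payload FROM iov").fetchall()
print(slice_rows)
```

Only the rows a given campaign needs end up in the replica, which is what keeps the file small enough to ship to sites.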

13 Conditions DB access via Frontier
Frontier for distributed database access
- Used by default for user analysis jobs.
Main components
- Frontier server
  - Communicates directly with Oracle server
  - Includes data caching
  - Provides data to Squids
- Squid
  - Communicates with Frontier server over HTTP
  - Caches retrieved data locally for its clients
ATLAS: Frontier in operation late in 2009
- Frontier servers at T1 sites on replication
- ~60 Squids all over the world
  - Mostly T2, some T3 too
[Figure labels: Tier 2, Tier 1]
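The effect of layered caching can be sketched with two small classes. They are stand-ins only: `OracleStandIn` and `CachingProxy` are hypothetical names, no real Frontier or Squid code is involved, and the sketch ignores cache expiry and HTTP details.

```python
class OracleStandIn:
    """Pretend database that counts how many queries actually reach it."""

    def __init__(self):
        self.queries_served = 0

    def query(self, sql):
        self.queries_served += 1
        return f"result of: {sql}"


class CachingProxy:
    """Plays the role of a Frontier server or a Squid: repeats come from cache."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.cache = {}
        self.hits = 0

    def query(self, sql):
        if sql in self.cache:
            self.hits += 1
        else:
            # Cache miss: ask the next layer up and remember the answer.
            self.cache[sql] = self.upstream.query(sql)
        return self.cache[sql]


oracle = OracleStandIn()
frontier = CachingProxy(oracle)      # near the database
squid_a = CachingProxy(frontier)     # at remote site A
squid_b = CachingProxy(frontier)     # at remote site B

# 100 jobs at each site ask for the same conditions.
for squid in (squid_a, squid_b):
    for _ in range(100):
        squid.query("SELECT payload FROM iov WHERE run = 180000")

print("queries reaching Oracle:", oracle.queries_served)
print("Frontier cache hits    :", frontier.hits)
print("Squid hits (A, B)      :", squid_a.hits, squid_b.hits)
```

In this toy setup 200 identical requests produce a single query on the database. Only the first request from each site leaves that site, and only the very first one goes past the Frontier layer.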

14 DB Access in Athena
- Athena applications access conditions and geometry DBs using the LCG software libraries POOL, COOL and CORAL
- Allows for transparent usage of various technologies (Oracle, SQLite, FroNTier/Squid)

15 Tips for Users (1)
- What Global Conditions and Geometry tags to use?
  - Autoconfigure your job
  - Have job read global tags from its input file (ESD, AOD)
- In job options:
    from RecExConfig.RecFlags import rec
    rec.AutoConfiguration=['everything']
- In job transforms: command line parameter 'autoConfiguration=everything'
Slide: V.Tsulaia

16 Tips for Users (2)
- How to configure my environment to access
  - FroNTier/Squid?
  - Conditions payload POOL files?
  - DB Release for geometry (and MC conditions if needed)?
- All that is done for you automatically...
- … just sit back and enjoy the ride!
Slide: V.Tsulaia

17 Tips for Users (3)
If things go wrong … and it seems to be related to database access
Useful information on TWiki:
- Athena DB Access:
- COOL Troubles:
- Atlas DB Release:
These TWiki documents should help you narrow down the problem, and then you will be in a position to
- Either ask your site admin
- Or send email to Database Operations
Slide: V.Tsulaia

18 Conclusions: Databases and DB Access from Athena
- Databases are used extensively in ATLAS
  - At every stage of data taking, processing, analysis
- Scratch the surface of many interactive user applications
  - And you will find a Database!
- I’ve attempted to give an overview of the issues and considerations in DB access from Athena
- The need to provide database information
  - In a variety of access patterns
  - With potentially widely varying data volumes
  - From diverse clients
  makes Athena access to ATLAS non-event-wise databases (Conditions and Geometry) complex.
- Supporting different technologies
  - allows us to optimally meet the various needs.
- A lot of effort has gone into making DB access for user analysis as transparent as possible …
- More details can be found:
  - See V.Tsulaia slides
    - Software Workshop in Tbilisi Oct 26, 2010
  - On various TWiki pages

