The ATLAS Database Project
Richard Hawkings, CERN
Torre Wenaus, BNL/CERN
ATLAS plenary meeting, June 24, 2004

Outline
- Mandate and scope
- Project definition process
- Organization and communication
- Subproject survey
- Concluding remarks
- The current draft plan can be found at: {ps, pdf} (temporary location until the new database web is up and running)

Project Mandate and Scope
- Lead and coordinate all ATLAS database activities
  - Including those so far under the Software, TC, TDAQ and Detector projects
  - Software, servers, distributed data management infrastructure
- Specifically, databases and data management for:
  - Detector production, detector installation, survey data
  - Detector geometry
  - Online configuration, run bookkeeping, run conditions
  - Event data and metadata
  - Calibration and alignment (online and offline)
  - Offline processing configuration and bookkeeping
  - Grid-based access to event and non-event data

Project Definition Process
- RH, TW appointments (each currently at 50%) effective May 1st 2004
- Serious project definition work began in late April
  - Based on the project outline presented at the January 2004 software week
  - The project should strengthen, not weaken or delay, DB activities across ATLAS
  - Many individual discussions, feeding into plan iterations
- Public draft circulated to the ATLAS software community in advance of the BNL software week, and discussed during the week
  - No major new input; generally favourable impression
- Draft plan was approved by the May 28th CMB/SPMB
- Continuing to gather input (TDAQ community, subdetectors, EB, …)
- The plan will evolve, but the current version will guide the project launch

Organization
- Part of the Software and Computing Project (Dario Barberis)
- Richard Hawkings, Torre Wenaus: Co-Leaders, 5/04-5/06
  - Computing Management Board (CMB) members
  - Ex-officio Trigger/DAQ steering group (TDAQ-SG) members
- The Database Steering Group, chaired by the Project Leaders, is the planning and decision-making body
- Twelve subprojects cover the mandated scope
- Some subprojects are embedded in other parts of ATLAS
  - Where tight integration of DB activities must be preserved
  - Subprojects are being organized in close consultation with the projects concerned, to ensure this happens

Work Breakdown
1) Project management - steering, planning, coordination, strategy
2) Detector production - long-term storage of subdetector production data
3) Detector installation - manufacturing and test (MTF), racks, cabling, survey
4) Detector geometry - primary numbers for detector description
5) Online databases - configuration, conditions, bookkeeping, offline transfer
6) Calibration and alignment - central tools, not subdetector algorithm work
7) Conditions database infrastructure - core sw and tools; framework integration
8) Event data - events and metadata from raw to analysis; common core sw
9) Distributed data management - event and conditions data; Grid integration
10) Offline processing configuration and bookkeeping - production metadata
11) Distributed database services - physical databases, distributed infrastructure
12) Software support services - supporting users and deployers of DB software

Steering Group
- Planning and decision-making body
- Integration mechanism to ensure synergy and coherence
  - Across subdetectors, across database areas
  - Representation from subprojects and associated projects
- Decisions taken following consensus in the steering group
  - Project Leaders have full authority for planning and execution, including cases lacking full consensus
  - In cases of serious dissent, the CMB and TDAQ-SG (where appropriate) take the final decision
- Strategic decisions go to the CMB (and TDAQ-SG) for endorsement
- The Steering Group is a large body, as the broad scope requires…

Steering Group Composition
- Still have to fill some appointments in consultation with the appropriate communities
- Technical coordination - Kathy Pommes, Luc Poggioli
- Online - Antonio Amorim, Mihai Caprini, Igor Soloviev
- High level trigger - TBD
- Calibration and Alignment - Richard Hawkings
- Detector geometry - Joe Boudreau
- Inner Detector - TBD
- LAr calorimeter - Hong Ma
- Tile calorimeter - Karl Gellerstedt
- Muon spectrometer - Joe Rothberg
- Conditions database infrastructure - RD Schaffer
- Event data - David Malon
- Distributed data management - TBD
- Offline processing - TBD
- Distributed database services - Alexandre Vaniachine
- User feedback - TBD, an informed+noisy+constructive user voice
- Persistency Framework Project (LCG Apps Area) - Dirk Duellmann
- Computing Coordinator (ex officio) - Dario Barberis
- Software Project Leader (ex officio) - David Quarrie
- Liaison from DB Project to Software Project Management Board - David Malon

Communication
- Meetings
  - All with agendas in advance, and minutes documenting technical progress, planning and decisions; phone connections to allow wide participation
  - Steering Group meeting bi-weekly (Friday 15:30, starting June 25th)
  - Weekly meeting covering primarily offline; continuation of the existing meeting
    - Technical planning and execution, within the overall guidelines and plan of the SG
  - A second weekly slot (to be defined) for:
    - Online database meeting, roughly bi-weekly
    - TC database meeting (production, installation) every 2-4 weeks
    - Conditions data working group meeting, periodically
  - Associated mailing lists for all these communities (online/TC lists to be set up)
- Web
  - A project web that is a comprehensive and current source of technical and planning information and documentation is a very high priority
  - We take this as a project management responsibility
  - We will both write web content and nag others to do the same!

Subproject Survey
A compressed survey of the subprojects…

1) Project Management
- Most of this area has already been addressed…
  - Planning and steering
  - Project meetings
  - Project web
- …but it also includes…
  - Monitoring of QA, testing and validation
  - Strategy and technology evolution

2) Detector Production
- The many production/construction DBs used worldwide in the subdetectors are not the responsibility of this project
- Ensuring that all data of long-term interest to ATLAS is gathered into central (CERN IT Oracle) databases is within the mandate
  - Central system for uniform access and long-term maintainability
  - Provision of tools, standards and guidelines to subdetectors
  - Data definition and entry are a subdetector responsibility
- A central DB exists and some subsystems are entering data, but a great deal of central/common work remains to be done
- Personnel and oversight for central/common work are mostly absent
  - Some ideas and possibilities: could we benefit from subdetector production finishing?

3) Detector Installation
- MTF (manufacturing and test) installation database
  - Installed parts with links to the production database
- Rack database (exists; being populated)
- Cabling database (partially exists)
- Survey database (does not exist)
- Extraction tools, e.g. to use cabling data for online configuration (not yet existing, and needed soon, e.g. for commissioning)
- Here again, personnel and oversight for central/common work (this project's mandate) are severely lacking

4) Detector Geometry
- Primary numbers used by the detector description software
- NOVA-based system deployed and operating for some time
- Work is underway to move to a successor with versioning support
  - The approach is consistent with the EB-mandated push to implement the final ‘as-built’ detector geometry before subdetector engineers leave
  - Involves a ‘fast track’ implementation using standard relational DB tools to quickly support gathering and loading data
  - Offline access (via LCG ‘relational POOL’, conditions DB) on a longer timescale, when that software is ready

5) Online Databases
- Configuration database: 30% of the online system; must remain integral to online
- Online run bookkeeping: expect to employ standard offline/online tools
- Conditions database interfaces: likewise
  - Online, through the Lisbon group, has provided the standard tool also used offline, and is now also contributing to the LCG CondDB project
- External interfaces and data flow: Information Server (IS), slow controls (DCS), offline (to AMI, production management)

6) Calibration and Alignment
- Activities organized via the conditions data working group
  - Communication forum for developing strategies and preparing online and offline algorithms
  - Conditions database loading and access
  - Contribution to the computing model
- Little manpower for central/common tasks
- Both subdetector and coordination effort currently focused on the CTB

7) Conditions Database Infrastructure
- Conditions database core software development
- Supporting tools (browsers, data distribution and synchronization, subsetting, etc.)
- Athena services for conditions data
- ATLAS participation in the LCG CondDB common project
  - Activity is increasing, with ATLAS the largest participating experiment
  - New: relational DB support for POOL, versioning system component
- Short-term focus is on CTB support and stability
- Planning (after the CTB) to converge from many tools…
  - Lisbon CondDB, NOVA, POOL, geometry DB
- …to essentially one, incorporating all the experience gained
  - Common project CondDB with POOL support

8) Event Data
- Core software support for event data, from raw data to analysis
  - Including event collections and physics datasets
- Athena integration: both event-data-specific and common persistency services
  - This activity moved from the Software Project to the Database Project
- Event data access outside Athena, e.g. in the ROOT analysis environment
- ATLAS participation in the POOL common project
- Event data storage for CTB and DC2 is generally OK
- File-level data management is handled by the next subproject (Distributed Data Management)

9) Distributed Data Management
- Management of ATLAS data around the world
  - Cataloging, replication, synchronization, access control, …
  - Event, conditions and other data; files and relational DBs
- Integration/interfacing with grid tools for data management
  - And working around grid software deficiencies
- Present focus on DC2 production needs
  - Key tool: Don Quixote, an interface to heterogeneous grids
- As yet there is no overall strategy for DDM, either today or for the future
- User-level data management tools urgently need to be addressed

10) Offline Processing Configuration and Bookkeeping
- Databases cataloging the metadata that is input to and output from offline processing jobs
  - Both managed production and (in the future) group- and individual-level jobs
- Cataloging of provenance information to unambiguously define the job/software configuration
- Key tools at present are AMI and the production DB
- Again, a plan and strategy are needed, including technology choices
- Present focus on DC2 and CTB support

11) Distributed Database Services
- Support for deployed database and data management services at CERN and throughout ATLAS
  - Physical servers, distributed (heterogeneous!) database infrastructure
  - Support and/or liaison for the administration and operation of databases away from CERN
  - Liaison to CERN IT/DB for CERN-based services
- A possible common project in distributed database infrastructure is under discussion, initiated by ATLAS (David Malon)
- Present focus is again on CTB and DC2 support

12) Database Software Support Services
- Support for software, distinct from support for physical services (preceding subproject)
- Documentation
  - Not authoring (developers are responsible), but organization, usability, monitoring and review, ‘encouragement’ to authors
- Tutorials and training
- User support services
  - E.g. Savannah problem reporting, feature requests

The challenges ahead
- The ATLAS database project is a big project
  - It covers many different areas and diverse communities
  - Key objectives: improving communication, facilitating data transfers
- Short- and medium-term concerns
  - Manpower for TC-related areas (detector production/installation)
    - Missing both subproject leadership effort and workers
    - Becoming increasingly important as we approach commissioning
    - Can we exploit effort freed up from the subdetectors?
  - Data management strategies and needs: DC2 and the vision beyond
    - Large-scale distributed infrastructure: LCG common project initiative
    - Individuals doing analysis/development: end-user tools
- New contributions and efforts are needed and welcome!

