Data Management at CERN's Large Hadron Collider (LHC)
Dirk Düllmann, CERN IT/DB, Switzerland

Outline
- Short introduction to CERN & the LHC
- Data management challenges
- The LHC Computing Grid (LCG)
- LCG data management components
- Object persistency and the POOL project
- Connecting to the Grid – the LCG Replica Location Service

CERN – the European Organisation for Nuclear Research
The European Laboratory for Particle Physics
- Fundamental research in particle physics
- Designs, builds & operates large accelerators
- Financed by 20 European countries (member states) + others (US, Canada, Russia, India, …)
- ~€650M budget (operation + new accelerators)
- 2000 staff, plus users (researchers) from all over the world
- Next major research project: LHC (start ~ )
- LHC experiments, each with 2000 physicists from 150 universities; apparatus costing ~€300M, computing ~€250M to set up and ~€60M/year to run, over a multi-year lifetime

[Aerial view of the Geneva area: the 27 km LHC ring, with the airport and the CERN Computer Centre marked.]

The LHC machine
- Two counter-circulating proton beams
- Collision energy 7 + 7 TeV
- 27 km of magnets with a field of 8.4 tesla
- Superfluid helium cooled to 1.9 K
- The world's largest superconducting structure

The online system
- A multi-level trigger filters out background and reduces the data volume from 40 TB/s to 500 MB/s (see the rate check below):
  - Detector output: 40 MHz (40 TB/s)
  - Level 1 (special hardware): reduces to 75 kHz (75 GB/s)
  - Level 2 (embedded processors): reduces to 5 kHz (5 GB/s)
  - Level 3 (PCs): reduces to 100 Hz (500 MB/s)
- Selected events go to data recording & offline analysis
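As a quick cross-check of the quoted rates, the following trivial snippet (not part of the original slides) computes the rejection factor contributed by each trigger level and the overall reduction from 40 MHz down to 100 Hz.

```cpp
#include <cstdio>

int main() {
    // Event rates (Hz) after each stage, as quoted on the slide.
    const double detector = 40e6;  // 40 MHz  (~40 TB/s)
    const double level1   = 75e3;  // 75 kHz  (~75 GB/s)
    const double level2   = 5e3;   //  5 kHz  (~ 5 GB/s)
    const double level3   = 100;   // 100 Hz  (~500 MB/s)

    std::printf("L1 rejection: %.0fx\n", detector / level1); // ~533x
    std::printf("L2 rejection: %.0fx\n", level1 / level2);   // 15x
    std::printf("L3 rejection: %.0fx\n", level2 / level3);   // 50x
    std::printf("Overall:      %.0fx\n", detector / level3); // 400000x
    return 0;
}
```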

LHC data challenges
- 4 large experiments, with a lifetime of many years
- Data rates: 500 MB/s – 1.5 GB/s (a consistency check follows below)
- Total data volume: 12–14 PB/year, several hundred PB in total!
- Analysed by thousands of users world-wide
- Data reduced from "raw data" to "analysis data" in a small number of well-defined steps
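For orientation only: the quoted data rates and yearly volume are mutually consistent if the experiments record data for roughly 10^7 seconds per year. The effective running time is an inference, not a figure from the slide; the snippet below just performs the division.

```cpp
#include <cstdio>

int main() {
    // Figures quoted on the slide.
    const double rate_low  = 500e6;  // 500 MB/s
    const double rate_high = 1.5e9;  // 1.5 GB/s
    const double volume    = 13e15;  // ~12-14 PB/year, take the middle

    // Effective data-taking time per year implied by volume / rate.
    std::printf("at 1.5 GB/s: %.1e s/year\n", volume / rate_high); // ~8.7e6 s
    std::printf("at 500 MB/s: %.1e s/year\n", volume / rate_low);  // ~2.6e7 s
    return 0;
}
```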

Data handling and computation for physics analysis (CERN)
[Schematic: the detector feeds an event filter (selection & reconstruction) producing raw data; event reprocessing yields event summary data, and event simulation also feeds in; batch physics analysis of the processed data extracts analysis objects by physics topic, which are then used in interactive physics analysis.]

[Plot: planned capacity evolution at CERN for mass storage, disk and CPU, split into LHC and other experiments, compared against Moore's law.]

The LHC Computing Centre: multi-tiered computing models – computing grids
[Schematic: the CERN Tier 0/Tier 1 centre at the top; national Tier 1 centres (Germany, USA, UK, France, Italy, …); Tier 2 regional centres with labs and universities; Tier 3 physics-department resources; and desktops for physics groups.]

LHC data models
- LHC data models are complex!
  - Typically hundreds of structure types (classes in OO)
  - Many relations between them
  - Different access patterns
- LHC experiments rely on OO technology
  - OO applications deal with networks of objects
  - Pointers (or references) are used to describe inter-object relations
- Need to support this navigational model in our data store (sketched below)
[Diagram: an Event referencing tracker and calorimeter data, with a TrackList of Track objects, each holding a HitList of Hit objects.]
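The navigational model can be made concrete with a small, self-contained C++ sketch. The class names (Event, Track, Hit) follow the diagram, but the code is purely illustrative and not the actual experiment data model.

```cpp
#include <vector>

// Illustrative only: simplified classes inspired by the diagram.
struct Hit   { double x, y, z; };
struct Track {
    std::vector<const Hit*> hits;   // references into the event's hit list
};
struct Event {
    std::vector<Hit>   hitList;     // owned hits (tracker, calorimeter, ...)
    std::vector<Track> trackList;   // tracks pointing back into hitList
};

int main() {
    Event ev;
    ev.hitList = { {0, 0, 0}, {1, 1, 0}, {2, 2, 0} };

    Track t;
    for (const Hit& h : ev.hitList) t.hits.push_back(&h);
    ev.trackList.push_back(t);

    // Navigation follows plain pointers: a persistent store must be able
    // to re-establish exactly these inter-object links when reading back.
    return 0;
}
```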

What is POOL?
- POOL is the common persistency framework for physics applications at the LHC: a Pool Of persistent Objects for LHC
- Hybrid store – object streaming & relational database (see the toy sketch below)
  - e.g. ROOT I/O for object streaming: complex data, simple consistency model (write once)
  - e.g. an RDBMS for consistent metadata handling: simple data, transactional consistency
- Initiated in April 2002; ramped up over the last year from 1.5 FTE to ~10 FTE
- Common effort between the LHC experiments and the CERN Database group on project scope, architecture and development => rapid feedback cycles between the project and its users
- First larger data productions starting now!
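The hybrid-store split can be pictured with a toy sketch: bulk object data is streamed once into a file, while bookkeeping metadata is written as a transactional database row. The function and table names below are placeholders of my own, not POOL, ROOT or SQL interfaces actually used by the project.

```cpp
// Conceptual sketch of the hybrid-store idea (illustrative, not POOL code).
#include <fstream>
#include <iostream>
#include <string>

struct EventHeader { long run; long event; };   // trivially copyable payload

// Stand-in for the object-streaming backend (ROOT I/O in the real system).
void streamToFile(const EventHeader& e, std::ofstream& out) {
    out.write(reinterpret_cast<const char*>(&e), sizeof(e));
}

// Stand-in for the relational backend; in reality this would be an SQL
// INSERT executed inside a transaction against e.g. MySQL or Oracle.
void recordMetadata(const std::string& fileGuid, long nEvents) {
    std::cout << "INSERT INTO file_metadata(guid, n_events) VALUES ('"
              << fileGuid << "', " << nEvents << ");\n";
}

int main() {
    std::ofstream out("events.dat", std::ios::binary);
    streamToFile(EventHeader{1, 42}, out);   // bulk data: write once
    recordMetadata("example-guid", 1);       // metadata: transactional
    return 0;
}
```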

Component architecture
- POOL (like most other LCG software) is based on a strict component software approach
  - Components provide technology-neutral APIs
  - Components communicate with each other only via abstract component interfaces
- Goal: insulate the very large experiment software systems from the concrete implementation details and technologies used today
  - POOL user code does not depend on any implementation libraries
  - No link-time dependency on any implementation packages (e.g. MySQL, ROOT, Xerces-C, …)
  - Component implementations are loaded at runtime via a plug-in infrastructure (sketched below)
- The POOL framework consists of three major, weakly coupled domains
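The plug-in idea can be sketched with a small factory registry: client code is compiled only against an abstract interface and picks a concrete backend by name at runtime. The names below (IStorageSvc, RootLikeSvc, RdbmsLikeSvc) are invented for illustration and do not correspond to the real POOL/SEAL plug-in manager.

```cpp
#include <functional>
#include <iostream>
#include <map>
#include <memory>
#include <string>

// Technology-neutral interface the user code is compiled against.
struct IStorageSvc {
    virtual ~IStorageSvc() = default;
    virtual void write(const std::string& container) = 0;
};

// Registry mapping a technology name to a factory; in the real system the
// concrete classes would live in separately loaded shared libraries.
using Factory = std::function<std::unique_ptr<IStorageSvc>()>;
std::map<std::string, Factory>& registry() {
    static std::map<std::string, Factory> r;
    return r;
}

struct RootLikeSvc : IStorageSvc {   // stand-in for an object-streaming backend
    void write(const std::string& c) override { std::cout << "stream to " << c << "\n"; }
};
struct RdbmsLikeSvc : IStorageSvc {  // stand-in for a relational backend
    void write(const std::string& c) override { std::cout << "insert into " << c << "\n"; }
};

int main() {
    registry()["ROOT"]  = [] { return std::make_unique<RootLikeSvc>(); };
    registry()["RDBMS"] = [] { return std::make_unique<RdbmsLikeSvc>(); };

    // User code selects the backend by name only; in the real framework this
    // removes any link-time dependency on the concrete implementations.
    auto svc = registry()["ROOT"]();
    svc->write("EventContainer");
    return 0;
}
```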

POOL components
[Component diagram, including an RDBMS storage service.]

POOL generic storage hierarchy
- An application may access databases (e.g. streaming files) from one or more file catalogs
- Each database is structured into containers of one specific technology (e.g. ROOT trees or RDBMS tables)
- POOL provides a "smart pointer" type, pool::Ref (sketched below), to
  - transparently load objects from the back end into a client-side cache
  - define persistent inter-object associations across file or technology boundaries
[Hierarchy: POOL context → file catalog → database → container → object]
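The slide names pool::Ref as POOL's smart-pointer type. The very reduced sketch below only illustrates the lazy load-into-cache idea behind it; the token format, loader signature and member names are assumptions, not the real pool::Ref interface.

```cpp
#include <functional>
#include <iostream>
#include <memory>
#include <string>

template <typename T>
class Ref {
public:
    // 'token' identifies the object location (database/container/entry);
    // 'loader' stands in for the storage service reading from the back end.
    Ref(std::string token,
        std::function<std::shared_ptr<T>(const std::string&)> loader)
        : token_(std::move(token)), loader_(std::move(loader)) {}

    T* operator->() {
        if (!cached_) cached_ = loader_(token_);   // load lazily on first access
        return cached_.get();
    }

private:
    std::string token_;
    std::function<std::shared_ptr<T>(const std::string&)> loader_;
    std::shared_ptr<T> cached_;                    // client-side cache entry
};

struct Track { double pt = 42.0; };

int main() {
    Ref<Track> r("fileGUID/Tracks/7", [](const std::string& t) {
        std::cout << "loading " << t << " from back end\n";
        return std::make_shared<Track>();
    });
    std::cout << r->pt << "\n";   // triggers the load transparently
    std::cout << r->pt << "\n";   // served from the cache
    return 0;
}
```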

Data dictionary & storage
[Diagram of the technology-dependent dictionary generation flow: C++ headers (or an abstract DDL) are parsed with GCC-XML; a code generator emits LCG dictionary code, which fills the LCG (reflection) dictionary; a gateway feeds the CINT dictionary used for data I/O, while other clients use the reflection information directly.]
A rough illustration of the reflection information involved is sketched below.
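As a rough illustration of what generated dictionary code provides, the sketch below hand-writes a reflection description (member names, types and offsets) for a small class and uses it to walk an object generically. The structures and names are assumptions for illustration, not the LCG dictionary API.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

struct MemberInfo { std::string name; std::string type; std::size_t offset; };
struct ClassInfo  { std::string name; std::size_t size; std::vector<MemberInfo> members; };

struct Hit { double x; double y; double z; };

// What a code generator could emit after parsing the header with GCC-XML.
ClassInfo hitDictionary() {
    return { "Hit", sizeof(Hit),
             { {"x", "double", offsetof(Hit, x)},
               {"y", "double", offsetof(Hit, y)},
               {"z", "double", offsetof(Hit, z)} } };
}

int main() {
    // A generic I/O service can stream any described object member by member.
    const ClassInfo ci = hitDictionary();
    const Hit h{1.0, 2.0, 3.0};
    for (const MemberInfo& m : ci.members) {
        const double* p = reinterpret_cast<const double*>(
            reinterpret_cast<const char*>(&h) + m.offset);
        std::cout << ci.name << "." << m.name << " = " << *p << "\n";
    }
    return 0;
}
```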

POOL file catalog
- Files are referred to inside POOL via a unique and immutable file identifier (FileID), generated by the system at file creation time
  - This provides stable inter-file references
- FileIDs are implemented as Globally Unique Identifiers (GUIDs)
  - Consistent sets of files with internal references can be created without requiring a central ID allocation service
  - Catalog fragments created independently can later be merged without modifying the corresponding data files (see the sketch below)
[Catalog schema: each FileID (file identity and metadata) maps to logical file names LFN1…LFNn for logical naming and to physical file names PFN1…PFNn, each with its technology, for object lookup.]
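A minimal sketch of the catalog idea, assuming a simple in-memory map keyed by GUID; the real POOL FileCatalog interface and its backends are not reproduced here. It shows why GUID-keyed fragments produced at different sites can be merged without touching the data files themselves.

```cpp
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct Replica { std::string pfn; std::string technology; };
struct CatalogEntry {
    std::vector<std::string> lfns;      // logical names (aliases)
    std::vector<Replica>     replicas;  // physical names + technology
};
using FileCatalog = std::map<std::string /*GUID*/, CatalogEntry>;

// Merging two independently produced fragments simply combines entries
// under their GUIDs; no central ID allocation or file rewrite is needed.
void merge(FileCatalog& target, const FileCatalog& fragment) {
    for (const auto& [guid, entry] : fragment) {
        auto& e = target[guid];
        e.lfns.insert(e.lfns.end(), entry.lfns.begin(), entry.lfns.end());
        e.replicas.insert(e.replicas.end(), entry.replicas.begin(), entry.replicas.end());
    }
}

int main() {
    FileCatalog siteA, siteB;   // hypothetical fragments from two sites
    siteA["guid-0001"] = { {"lfn:run1_events"}, { {"/castor/run1.root", "ROOT"} } };
    siteB["guid-0001"] = { {},                  { {"/dcache/run1.root", "ROOT"} } };

    merge(siteA, siteB);
    std::cout << "replicas of guid-0001: " << siteA["guid-0001"].replicas.size() << "\n";
    return 0;
}
```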

EDG Replica Location Services – basic functionality
[Diagram: the Replica Manager coordinating the Replica Location Service, the Replica Metadata Catalog and Storage Elements.]
- Files have replicas stored at many Grid sites on Storage Elements
- Each file has a unique GUID; the locations corresponding to a GUID are kept in the Replica Location Service
- Users may assign aliases to GUIDs; these are kept in the Replica Metadata Catalog
- The Replica Manager provides atomicity for file operations, assuring consistency of SE and catalog contents
(The lookup chain is sketched below.)
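The GUID-centred lookup chain can be sketched as two maps: alias to GUID (Replica Metadata Catalog) and GUID to storage locations (Replica Location Service). This is a conceptual illustration only, not the EDG client API, and the alias and URLs are made up.

```cpp
#include <iostream>
#include <map>
#include <string>
#include <vector>

std::map<std::string, std::string>              metadataCatalog; // alias -> GUID
std::map<std::string, std::vector<std::string>> locationService; // GUID  -> SE locations

// Resolve a user-visible alias to the list of storage elements holding replicas.
std::vector<std::string> lookup(const std::string& alias) {
    auto a = metadataCatalog.find(alias);
    if (a == metadataCatalog.end()) return {};
    auto l = locationService.find(a->second);
    return l == locationService.end() ? std::vector<std::string>{} : l->second;
}

int main() {
    metadataCatalog["lfn:example_dataset"] = "guid-4711";
    locationService["guid-4711"] = {"se.cern.ch/...", "se.ral.ac.uk/..."};

    for (const auto& se : lookup("lfn:example_dataset"))
        std::cout << "replica at " << se << "\n";
    return 0;
}
```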

Interactions with other Grid middleware components
[Diagram: the Replica Manager interacting with the Replica Location Service, Replica Optimization Service, Replica Metadata Catalog, SE and network monitors, the Information Service, the Resource Broker, Storage Elements and the Virtual Organization Membership Service, driven from a User Interface or Worker Node.]
Applications and users interface to data through the Replica Manager, either directly or through the Resource Broker.

RLS service goals
- Offer production-quality services for LCG-1 that meet the requirements of forthcoming (and current!) data challenges, e.g. CMS PCP/DC04, ALICE PDC-3, ATLAS DC2, LHCb CDC'04
- Provide distribution kits, scripts and documentation to assist other sites in offering production services
- Leverage the many years' experience in running such services at CERN and other institutes: monitoring, backup & recovery, tuning, capacity planning, …
- Understand the experiments' requirements for how these services should be established and extended, and clarify current limitations
- Not targeting small/medium-scale database applications that need to be run and administered locally (close to the user)

Conclusions
- Data management at the LHC remains a significant challenge because of the data volume, the project lifetime and the complexity of the software and hardware setups
- The LHC Computing Grid (LCG) approach builds on middleware projects such as EDG and Globus and uses a strict component approach for physics application software
- The LCG POOL project has developed a technology-neutral persistency framework which is currently being integrated into the experiment production systems
- In conjunction with POOL, a data catalog production service is provided to support several upcoming data productions in the hundreds-of-terabytes range

Component overview
[Diagram: each site (CERN, CNAF, RAL, IN2P3) runs a Replica Location Index, a Local Replica Catalog and a Storage Element.]

LHC software challenges
- Experiment software systems are large and complex
  - Developed by teams of expert developers
  - Permanent evolution and improvement for years…
- Analysis is performed by many end-user developers
  - Often participating only for a short time
  - Usually without a strong computer science background
  - They need a simple and stable software environment
- Need to manage change over a long project lifetime
  - Migration to new software and implementation languages
  - New computing platforms and storage media
  - New computing paradigms???
- The data management system needs to be designed to confine the impact of this unavoidable change during the project

Data types, volumes, distribution & access
[Diagram: data types RAW (1 PB/yr), ESD (100 TB/yr), AOD (10 TB/yr) and TAG (1 TB/yr); access ranges from sequential (RAW) to random (TAG); data is held at Tier 0 and Tier 1, with the number of users growing towards the smaller, derived formats.]

Object access via smart pointers
[Diagram: dereferencing a Ref goes to the Data Service and its object cache; on a cache miss the Persistency Service uses the token (the persistent reference, carrying object type and storage type) together with the File Catalog to locate and load the object, after which the Ref holds a plain pointer into the cache.]
A sketch of this access path follows below.
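The access path behind a smart-pointer dereference can be sketched as follows. Component and member names (token, data service, persistency service, file catalog) follow the diagram labels, but the interfaces are invented for illustration and are not the real POOL classes.

```cpp
#include <iostream>
#include <map>
#include <memory>
#include <string>

struct Token {                       // persistent reference
    std::string fileGuid;            // resolved via the file catalog
    std::string container;           // storage-side location
    long        entry = 0;
    std::string key() const {
        return fileGuid + "/" + container + "/" + std::to_string(entry);
    }
};

struct Object { std::string data; };

// File catalog: GUID -> physical file name (hypothetical entry).
std::map<std::string, std::string> fileCatalog = { {"guid-42", "/data/run42.root"} };

// Persistency service: reads the object from the resolved file (stubbed here).
std::shared_ptr<Object> readObject(const Token& t) {
    std::cout << "reading entry " << t.entry << " of " << t.container
              << " from " << fileCatalog.at(t.fileGuid) << "\n";
    return std::make_shared<Object>(Object{"payload"});
}

// Data service: keeps a client-side object cache keyed by token.
struct DataService {
    std::map<std::string, std::shared_ptr<Object>> cache;
    std::shared_ptr<Object> retrieve(const Token& t) {
        auto it = cache.find(t.key());
        if (it != cache.end()) return it->second;   // cache hit: plain pointer
        auto obj = readObject(t);                   // cache miss: load via token
        cache[t.key()] = obj;
        return obj;
    }
};

int main() {
    DataService ds;
    Token t{"guid-42", "Tracks", 7};
    std::cout << ds.retrieve(t)->data << "\n";  // triggers the read
    std::cout << ds.retrieve(t)->data << "\n";  // served from the cache
    return 0;
}
```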