Petabyte-scale computing challenges of the LHCb experiment
UK e-Science All Hands Meeting 2008, Edinburgh, 9th September 2008
Y. Y. Li, on behalf of the LHCb collaboration

Outline
- The questions…
- The LHC – the experiments taking up the challenge
- The LHCb experiment at the LHC
- LHCb computing model
  - Data flow, processing requirements
- Distributed computing in LHCb
  - Architecture and functionality
  - Performance

The questions…
- The Standard Model of particle physics explains many of the interactions between the fundamental particles that form the Universe, with all experiments so far confirming its predictions
- BUT many questions still remain…
  - How does gravity fit into the model?
  - Where does all the mass come from?
  - Why do we have a Universe made up of matter?
  - Does dark matter exist, and how much is there?
- Search for phenomena beyond our current understanding
- Go back to the 1st billionth of a second after the BIG BANG…

The Large Hadron Collider
- 100 m below the surface on the Swiss/French border; a 27 km ring
- 14 TeV proton-proton collider, 7x higher energy than previous machines
- 1,232 superconducting magnets chilled to -271 °C
- 4 experiments/detectors: ATLAS, CMS, ALICE, LHCb
- After ~25 years since its first proposal… 1st circulating beam tomorrow! 1st collisions in October 2008.

LHCb
- The LHC beauty experiment: a special-purpose detector to search for:
  - New physics in very rare b quark decays
  - Particle–antiparticle asymmetry
- ~1 trillion bb pairs per year!
- VErtex LOcator – locates the b decay vertex, operating only ~5 mm from the beam
- Ring Imaging CHerenkov detector – particle ID; the human eye manages ~100 photos/s, the RICH ~40 million photos/s

Data flow
- Five main LHCb applications (C++: Gauss, Boole, Brunel, DaVinci; Python: Bender)
- Production job: Gauss (event generation + detector simulation) → Sim → Boole (digitization) → Brunel (reconstruction) → DST, with detector calibrations folded in
- Analysis job: DaVinci / Bender run over the DST to produce statistics
- RAW data flows in from the detector
- Sim – simulation data format; DST – Data Storage Tape

CPU times
- 40 million collisions (events) per second in the detector
- 2,000 interesting events selected per second = 50 MB/s of data transferred and stored
- Offline: full reconstruction, 150 MB processed per second of running
- Full simulation and reconstruction: 100 MB / event and 500 KB / event
- Full simulation to DST: ~80 s / event (2.8 GHz Xeon processor)
- ~100 years for 1 CPU to simulate 1 s of real data!
- 10^7 s of data taking per year + simulation → ~O(PB) of data per year
- 962 physicists, 56 institutes on 4 continents
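As a rough cross-check of the "~100 years" figure (my own arithmetic, not from the slides), simulating each of the 40 million collisions produced in one second of running at ~80 s of CPU each gives:

# Rough cross-check of the "~100 CPU years per second of real data" estimate.
# Assumption (mine, for illustration): every one of the 40 million collisions
# in a second of running is fully simulated at ~80 s of CPU time each.
collisions_per_second = 40e6       # collision rate seen by the detector
cpu_seconds_per_event = 80.0       # full simulation time on a 2.8 GHz Xeon
seconds_per_year = 3.15e7          # roughly one year

cpu_years = collisions_per_second * cpu_seconds_per_event / seconds_per_year
print(f"~{cpu_years:.0f} CPU years to simulate 1 s of real data")   # prints ~102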

LHCb computing structure
- Tier 0: CERN – raw data, ~3k CPUs
- Tier 1: large centres (RAL UK, PIC Spain, IN2P3 France, GridKA Germany, NIKHEF Netherlands, CNAF Italy) – reconstruction and analysis, ~15k CPUs
- Tier 2: universities (~34) – simulation, ~19k CPUs
- Tier 3/4: laptops, desktops etc. – simulation
- Detector RAW data transfer: 10 MB/s; simulation data transfer: 1 MB/s
- Needs distributed computing

LHCb Grid middleware – DIRAC
- LHCb's grid middleware: Distributed Infrastructure with Remote Agent Control
- Written in Python; multi-platform (Linux, Windows)
- Built with common grid tools, e.g. GSI (Grid Security Infrastructure) authentication
- Pulls together all resources, shared with other experiments, using an experiment-wide CPU fair share
- Optimises CPU usage across:
  - Long, steady simulation jobs run by production managers
  - Chaotic analysis usage by individual users

DIRAC architecture
- Service-oriented architecture with 4 parts:
  - User interface
  - Services
  - Agents
  - Resources
- Uses a pull strategy for assigning CPUs: free, stable CPUs request jobs from the main server (see the sketch after this slide)
- Useful for masking the instability of resources from users
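A minimal sketch of the pull strategy described above, using an in-memory queue; the class and method names (TaskQueue, match) are invented for illustration and are not the real DIRAC WMS interface:

# Minimal illustration of the pull model: jobs wait in a central task queue,
# and free worker agents ask for work that matches what they can run.
# Names are invented for illustration; this is not the DIRAC WMS API.
import queue

class TaskQueue:
    def __init__(self):
        self._jobs = queue.Queue()

    def submit(self, job):
        self._jobs.put(job)                      # users / production managers push jobs

    def match(self, capabilities):
        # Hand out the next job this worker is able to run, if any.
        try:
            job = self._jobs.get_nowait()
        except queue.Empty:
            return None
        if job["platform"] != capabilities["platform"]:
            self._jobs.put(job)                  # not suitable: leave it for another agent
            return None
        return job

tq = TaskQueue()
tq.submit({"JobID": 1, "platform": "Linux_x86_64", "app": "Gauss"})

# A free, stable CPU pulls work instead of having work pushed onto it:
job = tq.match({"platform": "Linux_x86_64"})
print("agent received", job)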

[Architecture diagram: Linux-based and multi-platform components, a combination of DIRAC and non-DIRAC services, with web monitoring]

Security and data access
- DISET, the DIRAC SEcuriTy module
  - Uses openssl and a modified pyopenssl
  - Provides proxy support for secure access
  - A DISET portal is used to facilitate secure access on platforms where the authentication process is OS dependent
  - Platform binaries are shipped with DIRAC; the version is determined during installation
- Various data access protocols supported: SRM, GridFTP, .NetGridFTP on Windows, etc.
- Data Services operate on the main server
- Each file is assigned a logical file name that maps to its physical file name(s) (see the sketch after this slide)
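A toy illustration of the logical-to-physical file mapping mentioned above; the catalogue contents, paths and the resolve function are invented for illustration (the real mapping lives in a distributed file catalogue service):

# Toy file catalogue: one logical file name (LFN) maps to one or more
# physical replicas at different storage elements (all entries are made up).
catalogue = {
    "/lhcb/production/DC06/v2/DST/some_file.dst": [
        "srm://srm.cern.ch/castor/cern.ch/lhcb/DC06/some_file.dst",
        "gsiftp://gridftp.gridpp.rl.ac.uk/lhcb/DC06/some_file.dst",
    ],
}

def resolve(lfn, preferred_protocol=None):
    """Return the physical replicas for an LFN, optionally filtered by protocol."""
    replicas = catalogue.get(lfn, [])
    if preferred_protocol:
        replicas = [r for r in replicas if r.startswith(preferred_protocol)]
    return replicas

print(resolve("/lhcb/production/DC06/v2/DST/some_file.dst", "srm"))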

Compute element resources
- Other grids, e.g. WLCG (Worldwide LHC Computing Grid) – Linux machines
- Local batch systems, e.g. Condor
- Stand-alone machines: desktops, laptops etc.
- Windows resources: 3 sites so far, ~100 CPUs
  - Windows Server, Windows Compute Cluster, Windows XP
  - ~90% of the world's computers run Windows

Pilot agents
- Used to access other grid resources, e.g. WLCG via gLite
- A user job triggers the submission of a pilot agent by DIRAC as a 'grid job' to reserve CPU time
- The pilot on the worker node (WN) checks the environment before retrieving the user job from the DIRAC WMS
- Advantages (see the sketch after this slide):
  - Easy control of CPU quota for shared resources
  - Several pilot agents can be deployed for the same job if a failure occurs on the WN
  - If the full reserved CPU time is not used, another job can be retrieved from the DIRAC WMS
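A schematic of that pilot life cycle, with stub helpers (environment_ok, request_job, run_job) invented for illustration; the real pilots are submitted through gLite and talk to the DIRAC WMS services:

# Schematic pilot life cycle; all helpers are stand-ins invented for illustration.
import random

def environment_ok():
    # Stand-in for the real checks: required software, scratch disk, outbound network.
    return True

def request_job(capabilities):
    # Stand-in for the pull-style request to the DIRAC WMS task queue.
    if random.random() < 0.7:
        return {"JobID": random.randint(1, 10**6), "cpu_needed": 4 * 3600}
    return None

def run_job(job):
    print("running job", job["JobID"])
    return job["cpu_needed"]           # CPU seconds consumed by the payload

def pilot_agent(reserved_cpu_seconds=48 * 3600):
    # Runs on the worker node once the grid scheduler starts the pilot.
    if not environment_ok():
        return                         # another pilot deployed for the same job takes over
    used = 0
    while used < reserved_cpu_seconds:
        job = request_job({"platform": "Linux_x86_64"})
        if job is None:
            break                      # nothing matching left in the task queue
        used += run_job(job)           # keep filling the reserved CPU slot

pilot_agent()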

Agents on Windows
- Windows resources – CPU scavenging
  - Non-LHC-dedicated CPUs: spare CPUs at universities, private home computers etc.
  - Agent launch would be triggered by e.g. a screen saver
  - The CPU resource contribution is determined by the owner during DIRAC installation
- Windows Compute Cluster
  - Shared single DIRAC installation
  - A job wrapper submits retrieved jobs via Windows Compute Cluster submission calls
  - Local job scheduling is determined by the Windows Compute Cluster scheduling service

Cross-platform submissions
- Submissions are made with a valid grid proxy
- Three ways:
  - JDL (Job Description Language)
  - DIRAC API
  - Ganga job management system – built on DIRAC API commands; full porting to Windows is in progress
- User pre-compiled binaries can also be shipped; jobs are then bound to be processed on the same platform
- Successfully used in full selection and background analysis studies (user on Windows, resources on Windows and Linux)

JDL example:

SoftwarePackages = { "DaVinci.v19r12" };
InputSandbox = { "DaVinci.opts" };
InputData = { "LFN:/lhcb/production/DC06/v2/ /DST/Presel_ _ dst" };
JobName = "DaVinci_1";
Owner = "yingying";
StdOutput = "std.out";
StdError = "std.err";
OutputSandbox = { "std.out", "std.err", "DaVinci_v19r12.log", "DVhbook.root" };
JobType = "user";

DIRAC API example:

import DIRAC
from DIRAC.Client.Dirac import *

dirac = Dirac()
job = Job()
job.setApplication('DaVinci', 'v19r12')
job.setInputSandbox(['DaVinci.opts'])
job.setInputData(['LFN:/lhcb/production/DC06/v2/ /DST/Presel_ _ dst'])
job.setOutputSandbox(['DaVinci_v19r12.log', 'DVhbook.root'])
dirac.submit(job)
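For comparison, a Ganga submission of the same kind of job might look roughly like the sketch below; this is written from general knowledge of the Ganga interface rather than taken from the slides, so the attribute names (version, optsfile) should be treated as indicative:

# Rough Ganga sketch – intended to be run inside a Ganga session, where Job,
# DaVinci and Dirac are provided by the Ganga GPI; attribute names are indicative.
j = Job(name='DaVinci_1')
j.application = DaVinci(version='v19r12', optsfile='DaVinci.opts')
j.backend = Dirac()                                  # route the job through DIRAC
j.outputsandbox = ['DaVinci_v19r12.log', 'DVhbook.root']
j.submit()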

Performance
- Successful processing of data challenges since 2004
- Latest data challenge:
  - Record of >10,000 simultaneous jobs (analysis and production)
  - 700M events simulated in 475 days, ~1,700 years of CPU time
- [Plot: running jobs at Windows and Linux sites – total running jobs: 9,715]

Conclusions
- The LHC will produce O(PB) of data per year per experiment
- The data will be analysed by 1,000s of physicists on 4 continents
- The LHCb distributed computing structure is in place, pulling together a total of ~40k CPUs from across the world
- The DIRAC system has been fine-tuned through the experience of the past 4 years of intensive testing
- We now eagerly await the LHC switch-on and the true test! 1st beams tomorrow morning!!!