LHCb computing for the analysis: a naive user point of view. Workshop analyse cc-in2p3, 17 April 2008. Marie-Hélène Schune, LAL-Orsay, for LHCb-France.


Slide 1: LHCb computing for the analysis: a naive user point of view
Workshop analyse cc-in2p3, 17 April 2008
Marie-Hélène Schune, LAL-Orsay, for LHCb-France

Outline:
- Framework, data flow, computing model …
- A few specific points
- In practice?
- And today?

References: LHCb computing TDR; talks by Ph. Charpentier (e.g. LHCC, February 2008) and N. Brook (DESY computing seminar).
Many thanks to my LHCb colleagues for their help in preparing the talk (in particular M.-N. Minard, S. Poss, A. Tsaregorodtsev). Any mistake is mine, I am not at all an expert!

Slide 2: LHCb physics goals: search for New Physics signals in flavour physics (B and D):
- CP violation studies
- rare B decay studies
In one year: 2 fb⁻¹.
The LHCb collaboration: 15 countries, 47 institutes, ~600 physicists.

Slide 3: An LHCb event, from RAW data to stripped streams:
- RAW data: 2 GByte files, 60k events, 30 s on average; transferred from Online to Tier0 (CERN Castor) and copied from Tier0 to one of the Tier1s.
- Reconstruction (track reconstruction, clusters, PID, …) is run at Tier0 and the Tier1s; the resulting reduced DST is stored locally at the Tier0 and the Tier1s. A priori the reconstruction is foreseen to be run twice a year: in quasi real time and after the LHC shutdown.
- Preselection code, developed by the physics groups, performs the stripping of the events on RAW + reduced DST; the resulting data streams (DST + RAW, plus TAG) are created at Tier0 and the Tier1s and are distributed to all Tier1s.

Slide 4: The preselection step:
- Input: RAW data (35 kB/evt) and reduced DST (20 kB/evt), processed by the preselection code.
- Output: physics streams 1 … N, each containing DST + RAW (DST: 110 kB/evt) together with an Event Tag Collection, to allow quick access to the data.
- For 120 days of running: 6 TB for each stream. Numbers based on the computing TDR: factor 10 overall reduction (see the check below).
- A priori the preselection is foreseen to be run four times a year.
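These sizes can be cross-checked with a short calculation using only the figures quoted on slides 3 and 4; the sketch below is a back-of-the-envelope estimate, not an official computing-model number.

    # Back-of-the-envelope check of the sizes quoted on slides 3 and 4.
    raw_file_size = 2e9            # bytes per RAW file (2 GB)
    events_per_file = 60e3         # events per RAW file
    print("RAW size per event ~ %.0f kB" % (raw_file_size / events_per_file / 1e3))
    # ~33 kB/evt, consistent with the 35 kB/evt quoted for RAW data

    stream_volume = 6e12           # bytes per physics stream for 120 days (6 TB)
    event_size = (110 + 35) * 1e3  # bytes per stripped event (DST + RAW)
    print("events per stream ~ %.1e" % (stream_volume / event_size))
    # ~4e7 selected events per stream per 120 days of running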

Slide 5: Monte Carlo production (Tier0, Tier1s and Tier2s):
- Simulation is done using non-Tier1 CPU resources.
- MC data are stored at Tier0 and the Tier1s; there is no permanent storage at Tier2s.

Slide 6: The analysis chain in the LHCb computing TDR (run at CERN + the 6 Tier1s):
physics stream i (DST + RAW, with its Event Tag Collection) → analysis code → user DST, user Event Tag Collection, RooTuple → final analysis code (cuts, …) → result!

Slide 7: Data access through the GRID:
- For the users, the GANGA front-end is used to prepare and submit jobs.
- DIRAC wraps all the GRID (and non-GRID) resources for LHCb; it is not used directly by the users.
- DIRAC can be viewed as a (very) large batch system: accounting, priority mechanism, fair share.
A GANGA job (minimal sketch below):
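What follows is a minimal sketch of such a GANGA job, as typed at the ganga prompt (where Job, DaVinci, Dirac and File are provided by Ganga itself); the version and options-file name are placeholders, and a full analysis example appears on slide 11.

    # Minimal GANGA job: a DaVinci application sent to the grid through DIRAC.
    app = DaVinci()                         # the LHCb analysis application
    app.version = 'v19r9'                   # placeholder version
    app.optsfile = File('myAnalysis.opts')  # placeholder job options

    j = Job(name='myFirstGridJob',
            application=app,
            backend=Dirac())                # DIRAC hides the grid (and non-grid) resources
    j.submit()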

Slide 8: A few specific points:
- LHCb does not see cc-in2p3 directly: it appears that none of the French physicists doing analysis in LHCb logs on to cc-in2p3. For an LHCb user, where the job runs is fully transparent.
- After CERN, cc-in2p3 will be the largest centre for analysis in LHCb, so the use of cc-in2p3 is in fact dictated by the presence of the MC sample being analyzed.
- Data access is the main problem raised by the users: e.g. out of 2 million events, only about ¼ could be analyzed (after several trials).

Slide 9: In practice:
1. Create the list of datafile locations from the LHCb Bookkeeping web interface.
2. Set up the environment (versions, …).
3. Tell GANGA to work interactively.
4. Do a final check of the code.
5. Tell GANGA to send the jobs to the GRID using DIRAC (a sketch of steps 3-5 follows this list).
6. Have a few coffees.
7. Look at the monitoring page (…).
8. When the jobs have ended, copy the RooTuples.
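One plausible reading of steps 3-5, again at the ganga prompt: run the same configuration on a local backend as a final check, then copy the job and switch its backend to Dirac for the real grid submission. The Local() backend and the use of Job.copy() are assumptions of this sketch, not something stated on the slides.

    # Steps 3-5 (sketch): final local check, then submission to the grid via DIRAC.
    app = DaVinci()
    app.version = 'v19r9'                     # placeholder version
    app.optsfile = File('myAnalysis.opts')    # placeholder job options

    check = Job(name='finalCheck', application=app, backend=Local())
    check.submit()                            # runs locally; inspect the output by hand

    grid = check.copy()                       # same application and options
    grid.name = 'gridRun'
    grid.splitter = DiracSplitter(filesPerJob=4)
    grid.backend = Dirac(CPUTime=1000)        # now the job really goes to the grid
    grid.submit()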

Slide 10 (steps 1-5 recalled):
1. Create the list of datafile locations from the LHCb Bookkeeping web interface.
2. Set up the environment (versions, …).
3. Tell GANGA to work interactively.
4. Do a final check of the code.
5. Tell GANGA to send the jobs to the GRID using DIRAC.
Through the web interface a large file with all the requested data is obtained:

    //-- GAUDI data cards generated on 3/25/08 10:27 AM
    //-- For Event Type = / Data type = DST 1
    //-- Configuration = DC06 - phys-lumi2
    //-- DST 1 datasets produced by Brunel - v30r14
    //-- From DIGI 1 datasets produced by Boole - v12r10
    //-- From SIM 1 datasets produced by Gauss - v25r7
    //-- Database version = v30r14
    //-- Cards content = logical
    //--
    //-- Datasets replicated at ANY
    //--  dataset(s) - NbEvents =
    //--
    EventSelector.Input = {
      "DATAFILE='LFN:/lhcb/production/DC06/phys-lumi2/ /DST/0000/ _ _5.dst' TYP='POOL_ROOTTREE' OPT='READ'",
      "DATAFILE='LFN:/lhcb/production/DC06/phys-lumi2/ /DST/0000/ _ _5.dst' TYP='POOL_ROOTTREE' OPT='READ'",
      "DATAFILE='LFN:/lhcb/production/DC06/phys-lumi2/ /DST/0000/ _ _5.dst' TYP='POOL_ROOTTREE' OPT='READ'",
      …
    };

Slide 11: This is given to Ganga with:

    DaVinciVersion = 'v19r9'
    myJobName = 'Bu2LLK_bb1'
    myApplication = DaVinci()
    myApplication.version = DaVinciVersion
    myApplication.cmt_user_path = '/afs/cern.ch/user/m/mschune/cmtuser/DaVinci_v19r9'
    myApplication.masterpackage = 'PhysSel/Bu2LLK/v3r2'
    myApplication.optsfile = File('/afs/cern.ch/user/m/mschune/cmtuser/DaVinci_v19r9/PhysSel/Bu2LLK/v3r2/options/myBd2Kstaree-bb1.opts')
    mySplitter = DiracSplitter(filesPerJob=4, maxFiles=-1)
    myMerger = None
    myInputsandbox = []
    myBackend = Dirac(CPUTime=1000)
    j = Job(name=myJobName,
            application=myApplication,
            splitter=mySplitter,
            merger=myMerger,
            inputsandbox=myInputsandbox,
            backend=myBackend)
    j.submit()

This will automatically split the job into N subjobs, where (in this case) N = N_datafiles / 4.

In ganga:

    In [32]: jobs
    Out[32]: Statistics: 4 jobs
    # id   status     name        subjobs  application  backend  backend.actualCE
    # 104  completed  Bu2LLK_bb3   12      DaVinci      Dirac
    # 105  running    Bu2LLK_bb2   50      DaVinci      Dirac
    # 106  completed  Bu2LLK_bb1   50      DaVinci      Dirac
    # 116  submitted  Bs2DsX        3      DaVinci      Dirac

Depending on the user, the number of datafiles per job varies from 1 to ~10:
- 1 file per job: a lot of jobs to handle, but a low failure rate;
- ~10 files per job: too high a failure rate.
(A small check of the expected number of subjobs follows.)
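A small stand-alone check of how many subjobs to expect from the splitter: with filesPerJob = 4, DiracSplitter produces roughly one subjob per four datafiles. The sketch below assumes the data-cards file from slide 10 has been saved locally under a placeholder name.

    import math

    # Count the DATAFILE entries in the options file written by the Bookkeeping
    # web interface (the file name is a placeholder).
    with open('myData.opts') as opts:
        n_files = sum(line.count("DATAFILE") for line in opts)

    files_per_job = 4   # matches DiracSplitter(filesPerJob=4) above
    n_subjobs = math.ceil(n_files / files_per_job)
    print(n_files, "datafiles ->", n_subjobs, "subjobs expected")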

Slide 12: The full recipe:
1. Create the list of datafile locations from the LHCb Bookkeeping web interface.
2. Set up the environment (versions, …).
3. Tell GANGA to work interactively.
4. Do a final check of the code.
5. Tell GANGA to send the jobs to the GRID using DIRAC.
6. Have a few coffees.
7. Look at the monitoring page (…).
8. When the jobs have ended, copy the RooTuples (python scripts; a sketch follows this list).
Everything is available in: …
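A minimal sketch of step 8, again at the ganga prompt: collect the RooTuples of completed jobs into one local directory. It assumes the ntuples end up as *.root files in each (sub)job's output directory (Ganga's outputdir); the destination path is a placeholder and the actual copying scripts referred to on the slide are not reproduced here.

    # Step 8 (sketch): gather the RooTuples of completed jobs into one directory.
    import glob, os, shutil

    target = os.path.expanduser('~/ntuples')      # placeholder destination
    os.makedirs(target, exist_ok=True)

    for j in jobs:                                # 'jobs' is Ganga's job registry
        if j.status != 'completed':
            continue
        units = j.subjobs if len(j.subjobs) else [j]
        for sj in units:
            for rootfile in glob.glob(os.path.join(sj.outputdir, '*.root')):
                shutil.copy(rootfile, target)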

Slide 13: Two ways of working:
1. Use a generic LHCb code which works for any analysis and stores all needed information:
   + no need to write the code;
   - (very) large RooTuples;
   - analysing the RooTuples will require some CPU (plus a lot of disk space).
2. Write your own analysis code:
   + small RooTuples, which can then be read interactively with ROOT (see the sketch below);
   - need to know a little bit more about LHCb code and C++.
Both ways of working are still at the experimental stage; time will show which one the users prefer. The first approach raises more stringently the question of where to do the analysis of the large RooTuples: Tier1s, Tier2s, Tier3s? A significant amount of disk space is needed to store the RooTuples (cc-in2p3, labs, laptops, …?). Some students are using ~100 GB.
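A minimal sketch of the second approach: reading a small RooTuple interactively with ROOT through PyROOT. The file name, tree name and variable names are placeholders, not taken from the talk.

    import ROOT

    # Open a small user ntuple and take a quick interactive look (names are placeholders).
    f = ROOT.TFile.Open("Bu2LLK_ntuple.root")
    tree = f.Get("DecayTree")

    print(tree.GetEntries(), "entries")
    tree.Draw("B_mass", "B_pt > 2000")   # plot one variable with a simple cut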

Slide 14: Three examples (French physicists) for the period March 2008 to April 2008; numbers are for analysis jobs.
- Example 1: all sites 1049 jobs (317 stalled, 29 failed); cc-in2p3 253 jobs (252 stalled, 0 failed).
- Example 2: all sites 340 jobs (1 stalled, 19 failed); cc-in2p3 10 jobs (1 stalled, 0 failed).
- Example 3: all sites 294 jobs (62 stalled, 54 failed); cc-in2p3 0 jobs (0 stalled, 0 failed).
All users:
- CERN: 5086 jobs (664 stalled, 645 failed)
- cc-in2p3: … jobs (940 stalled, 14 failed)
- CNAF: 163 jobs (84 stalled, 41 failed)
- NIKHEF: 84 jobs (15 stalled, 24 failed)
- RAL: 349 jobs (68 stalled, 51 failed)
NB: failed jobs can be the user's fault.

Slide 15: Final remarks.
- A significant amount of know-how is needed to run on the GRID (tutorials and documentation are usually not enough: other users' help is needed!).
- Compared with my previous experiments (ALEPH and BaBar), there is an additional level of complexity: on the web page you know that your job is in the waiting state… but why? And how are the other jobs doing? You need to know the name of somebody running the same kind of jobs to find out what happens for them!
- When you have found the correct set of packages… it runs fast!
- Working with the GRID brings a little bit of dream into our everyday life…