
Slide 1: LHCb computing for the analysis: a naive user's point of view. Workshop analyse cc-in2p3, 17 April 2008. Marie-Hélène Schune, LAL-Orsay, for LHCb-France.
Outline: framework, data flow, computing model ...; a few specific points; practically?; and today?
References: LHCb computing TDR; talks by Ph. Charpentier (e.g. LHCC, February 2008) and N. Brook (DESY computing seminar).
Many thanks to my LHCb colleagues for their help in preparing the talk (in particular MN Minard, S. Poss, A. Tsaregorodtsev). Any mistake is mine, I am not at all an expert!

Slide 2: LHCb physics goals: search for New Physics signals in flavour physics (B and D):
- CP violation studies
- rare B decay studies
In one year: 2 fb⁻¹.
The LHCb collaboration: 15 countries, 47 institutes, ~600 physicists.

Slide 3: An LHCb event, from RAW data to analysis input:
- RAW data: 2 GByte files, 60k events, 30 s on average. Transferred from Online to Tier0 (CERN Castor), then copied from Tier0 to one of the Tier1s.
- Reconstruction (track reconstruction, clusters, PID ...) is run at Tier0 and at the Tier1s and produces the reduced DST, stored locally at the Tier0 and the Tier1. A priori the reconstruction is foreseen to be run twice a year: in quasi real time and after the LHC shutdown.
- Preselection code (stripping of the events, developed by the physics groups) runs on RAW + DST and produces event TAGs. Data streams are created at Tier0 and the Tier1s and distributed to all Tier1s.

Slide 4: Event sizes and physics streams:
- RAW data: 35 kB/evt; reduced DST: 20 kB/evt; DST: 110 kB/evt.
- The preselection code turns RAW data and reduced DST into physics streams 1 ... N, each containing DST + RAW, plus an Event Tag Collection to allow quick access to the data.
- For 120 days of running: 6 TB for each stream. Numbers based on the computing TDR: factor 10 overall reduction.
- A priori the preselection is foreseen to be run four times a year.
(A back-of-envelope check of these numbers is sketched below.)
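As a rough cross-check (not from the original slides), the quoted per-event sizes can be combined with the figures on slides 3 and 4; nothing beyond those numbers is assumed, so only per-file and per-stream quantities are derived. A minimal Python sketch:

# Back-of-envelope check of the numbers quoted on slides 3 and 4.
# Only figures given in the talk are used.
raw_file_size   = 2e9       # bytes per RAW file (slide 3: "2 GBytes file")
events_per_file = 60e3      # events per RAW file (slide 3)
print(raw_file_size / events_per_file)             # ~33 kB/event, consistent with the 35 kB/evt on slide 4

dst_per_event = 110e3       # DST size per event (slide 4)
raw_per_event = 35e3        # RAW size per event (slide 4)
stream_size   = 6e12        # 6 TB per stream for 120 days of running (slide 4)
print(stream_size / (dst_per_event + raw_per_event))   # ~4e7 events kept per stream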

Slide 5: Tier0, Tier1s and Tier2s: Monte Carlo production.
- Simulation is done using non-Tier1 CPU resources.
- MC data are stored at Tier0 and the Tier1s; there is no permanent storage at Tier2s.

Slide 6: Analysis flow in the LHCb computing TDR (run at CERN + the 6 Tier1s):
Physics stream i (DST + RAW) + Event Tag Collection → analysis code → user DST, user Event Tag Collection, RooTuple → final analysis code (cuts ...) → result!

Slide 7: Data access through the GRID:
- For the users, the GANGA front-end is used to prepare and submit jobs.
- DIRAC wraps all the GRID (and non-GRID) resources for LHCb; it is not used directly by the users.
- DIRAC can be viewed as a (very) large batch system: accounting, priority mechanism, fair share.
A GANGA job (a minimal sketch follows below; the full DaVinci example used in the talk is on slide 11):
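A minimal sketch of what a GANGA job looks like from the Ganga prompt, using a trivial Executable application rather than the DaVinci application used for real analysis (the executable and its arguments here are placeholders, not from the slide):

# Minimal Ganga job sketch (illustration only): a trivial executable sent to
# the Dirac backend instead of the DaVinci application used for analysis.
myApp = Executable(exe='/bin/echo', args=['hello from the GRID'])
j = Job(name='hello_grid', application=myApp, backend=Dirac())
j.submit()
print(j.status)      # the job then appears in the 'jobs' listing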

Slide 8: A few specific points:
- LHCb does not see cc-in2p3 directly: it appears that none of the French physicists doing analysis in LHCb logs on to cc-in2p3. For an LHCb user, where the job runs is fully transparent.
- After CERN, cc-in2p3 will be the largest centre for analysis in LHCb.
- So the use of cc-in2p3 is in fact dictated by the presence of the MC sample being analyzed.
- Data access is the main problem raised by the users, e.g. out of 2 million events only about a quarter could be analyzed (after several attempts).

Slide 9: Practically:
1. Create the list of datafile locations from the LHCb Bookkeeping web interface.
2. Set up the environment (versions ...).
3. Tell GANGA to work interactively.
4. Do a final check of the code.
5. Tell GANGA to send the jobs to the GRID using DIRAC.
6. Have a few coffees.
7. Look at the monitoring page (http://lhcb.pic.es/DIRAC/Monitoring/Analysis/).
8. When the jobs have ended, copy the RooTuples.

Slide 10:
1. Create the list of datafile locations from the LHCb Bookkeeping web interface.
2. Set up the environment (versions ...).
3. Tell GANGA to work interactively.
4. Do a final check of the code.
5. Tell GANGA to send the jobs to the GRID using DIRAC.
Through the web interface a large file with all the requested data is obtained:
//-- GAUDI data cards generated on 3/25/08 10:27 AM
//-- For Event Type = 11124001 / Data type = DST 1
//-- Configuration = DC06 - phys-lumi2
//-- DST 1 datasets produced by Brunel - v30r14
//-- From DIGI 1 datasets produced by Boole - v12r10
//-- From SIM 1 datasets produced by Gauss - v25r7
//-- Database version = v30r14
//-- Cards content = logical
//--
//-- Datasets replicated at ANY
//-- 158 dataset(s) - NbEvents = 78493
//--
EventSelector.Input = {
  "DATAFILE='LFN:/lhcb/production/DC06/phys-lumi2/00001558/DST/0000/00001558_00000001_5.dst' TYP='POOL_ROOTTREE' OPT='READ'",
  "DATAFILE='LFN:/lhcb/production/DC06/phys-lumi2/00001558/DST/0000/00001558_00000002_5.dst' TYP='POOL_ROOTTREE' OPT='READ'",
  "DATAFILE='LFN:/lhcb/production/DC06/phys-lumi2/00001558/DST/0000/00001558_00000003_5.dst' TYP='POOL_ROOTTREE' OPT='READ'",
  "DATAFILE='LFN:/lhcb/production/DC06/phys-lumi2/00001558/DST/0000/00001558_00000004_5.dst' TYP='POOL_ROOTTREE' OPT='READ'",
  "DATAFILE='LFN:/lhcb/production/DC06/phys-lumi2/00001558/DST/0000/00001558_00000005_5.dst' TYP='POOL_ROOTTREE' OPT='READ'",
  "DATAFILE='LFN:/lhcb/production/DC06/phys-lumi2/00001558/DST/0000/00001558_00000006_5.dst' TYP='POOL_ROOTTREE' OPT='READ'",
  "DATAFILE='LFN:/lhcb/production/DC06/phys-lumi2/00001558/DST/0000/00001558_00000007_5.dst' TYP='POOL_ROOTTREE' OPT='READ'",
  "DATAFILE='LFN:/lhcb/production/DC06/phys-lumi2/00001558/DST/0000/00001558_00000008_5.dst' TYP='POOL_ROOTTREE' OPT='READ'",
  "DATAFILE='LFN:/lhcb/production/DC06/phys-lumi2/00001558/DST/0000/00001558_00000009_5.dst' TYP='POOL_ROOTTREE' OPT='READ'",
  "DATAFILE='LFN:/lhcb/production/DC06/phys-lumi2/00001558/DST/0000/00001558_00000010_5.dst' TYP='POOL_ROOTTREE' OPT='READ'",
  "DATAFILE='LFN:/lhcb/production/DC06/phys-lumi2/00001558/DST/0000/00001558_00000011_5.dst' TYP='POOL_ROOTTREE' OPT='READ'",
  "DATAFILE='LFN:/lhcb/production/DC06/phys-lumi2/00001558/DST/0000/00001558_00000012_5.dst' TYP='POOL_ROOTTREE' OPT='READ'",
  …
};

Slide 11: This is given to Ganga with:
DaVinciVersion = 'v19r9'
myJobName = 'Bu2LLK_bb1'
myApplication = DaVinci()
myApplication.version = DaVinciVersion
myApplication.cmt_user_path = '/afs/cern.ch/user/m/mschune/cmtuser/DaVinci_v19r9'
myApplication.masterpackage = 'PhysSel/Bu2LLK/v3r2'
myApplication.optsfile = File( '/afs/cern.ch/user/m/mschune/cmtuser/DaVinci_v19r9/PhysSel/Bu2LLK/v3r2/options/myBd2Kstaree-bb1.opts' )
mySplitter = DiracSplitter( filesPerJob = 4, maxFiles = -1 )
myMerger = None
myInputsandbox = []
myBackend = Dirac( CPUTime=1000 )
j = Job( name = myJobName,
         application = myApplication,
         splitter = mySplitter,
         merger = myMerger,
         inputsandbox = myInputsandbox,
         backend = myBackend )
j.submit()

This will automatically split the job into N subjobs, where (in this case) N = N_datasets / 4.

In ganga:
In [32]: jobs
Out[32]: Statistics: 4 jobs
--------------
 # id  | status    | name       | subjobs | application | backend | backend.actualCE
 # 104 | completed | Bu2LLK_bb3 | 12      | DaVinci     | Dirac   |
 # 105 | running   | Bu2LLK_bb2 | 50      | DaVinci     | Dirac   |
 # 106 | completed | Bu2LLK_bb1 | 50      | DaVinci     | Dirac   |
 # 116 | submitted | Bs2DsX     | 3       | DaVinci     | Dirac   |

Depending on the user, the number of datafiles per job varies from 1 to ~10:
- 1 file per job: a lot of jobs to handle, but a low failure rate;
- ~10 files per job: too high a failure rate.
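From the Ganga prompt one can also inspect the subjobs of such a job. A minimal sketch (the job id 105 is taken from the listing above; 'jobs', 'subjobs' and 'status' are standard Ganga names, not shown on the slide, and the printed counts are only illustrative):

# Count the subjob statuses of one of the jobs listed above (id 105).
j = jobs(105)
counts = {}
for sj in j.subjobs:
    counts[sj.status] = counts.get(sj.status, 0) + 1
print(counts)      # e.g. {'completed': 47, 'running': 2, 'failed': 1}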

Slide 12:
1. Create the list of datafile locations from the LHCb Bookkeeping web interface.
2. Set up the environment (versions ...).
3. Tell GANGA to work interactively.
4. Do a final check of the code.
5. Tell GANGA to send the jobs to the GRID using DIRAC.
6. Have a few coffees.
7. Look at the monitoring page (http://lhcb.pic.es/DIRAC/Monitoring/Analysis/).
8. When the jobs have ended, copy the RooTuples (python scripts; a sketch of this step follows below).
Everything is available in:
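A minimal sketch of step 8, collecting the RooTuples from the subjob output directories from within Ganga; 'outputdir' is the standard Ganga per-(sub)job output directory, while the ntuple name and destination directory below are hypothetical, and the actual python scripts mentioned on the slide are not reproduced here:

# Gather the ntuples produced by the completed subjobs of job 106 (slide 11).
import os, shutil

j = jobs(106)
target = '/path/to/my/ntuples'          # placeholder destination (must exist)
for sj in j.subjobs:
    if sj.status != 'completed':
        continue
    src = os.path.join(sj.outputdir, 'Bu2LLK.root')   # hypothetical ntuple name
    shutil.copy(src, os.path.join(target, 'Bu2LLK_%d.root' % sj.id))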

Slide 13: Two ways of working:
1. Use a generic LHCb code which works for any analysis and stores all the needed information:
   advantage: no need to write the code;
   drawback: (very) large RooTuples, whose analysis will require some CPU (plus a lot of disk space).
2. Write your own analysis code:
   advantage: small RooTuples which can then be read interactively with ROOT (a sketch follows below);
   drawback: need to know a little bit more about the LHCb code and C++.
These two ways of working are still at the experimental stage; time will show which is the users' preferred way. The first approach raises more stringently the question of where to do the analysis of the large RooTuples: Tier1s, Tier2s, Tier3s? A significant amount of disk space is needed to store the RooTuples (cc-in2p3, labs, laptops ... ?); some students are using ~100 GB.
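For the second approach, a minimal sketch of reading a small RooTuple interactively with ROOT (via PyROOT, to stay in python); the file, tree and branch names are placeholders, not taken from the talk:

# Open a small user RooTuple and draw one variable.
import ROOT

f = ROOT.TFile.Open('Bu2LLK_ntuple.root')     # hypothetical ntuple file
tree = f.Get('ntuple')                        # hypothetical TTree name
print(tree.GetEntries())
tree.Draw('B_mass')                           # hypothetical branch: B candidate mass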

Slide 14: Three examples (French physicists) for the period March 2008 to April 2008. Numbers are for analysis jobs; NB failed jobs can be the user's fault ...

Example 1: all sites: 1049 jobs, 317 stalled, 29 failed; cc-in2p3: 253 jobs, 252 stalled, 0 failed.
Example 2: all sites: 340 jobs, 1 stalled, 19 failed; cc-in2p3: 10 jobs, 1 stalled, 0 failed.
Example 3: all sites: 294 jobs, 62 stalled, 54 failed; cc-in2p3: 0 jobs, 0 stalled, 0 failed.

All users, by site:
  CERN:     5086 jobs, 664 stalled, 645 failed
  cc-in2p3: 1419 jobs, 940 stalled, 14 failed
  CNAF:      163 jobs, 84 stalled, 41 failed
  NIKHEF:     84 jobs, 15 stalled, 24 failed
  RAL:       349 jobs, 68 stalled, 51 failed
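A quick way to turn the "all users" counts above into stalled and failed fractions (plain arithmetic on the quoted numbers; stalled and failed are counted separately, as on the slide):

# Stalled and failed fractions per site for "all users" (numbers from slide 14).
sites = {
    'CERN':     (5086, 664, 645),
    'cc-in2p3': (1419, 940, 14),
    'CNAF':     (163, 84, 41),
    'NIKHEF':   (84, 15, 24),
    'RAL':      (349, 68, 51),
}
for name, (n_jobs, stalled, failed) in sites.items():
    print('%-9s stalled %3.0f%%  failed %3.0f%%'
          % (name, 100.0 * stalled / n_jobs, 100.0 * failed / n_jobs))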

Slide 15: Final remarks:
- A significant amount of know-how is needed to run on the GRID (tutorials and documentation are usually not enough: other users' help is needed!).
- Compared with my previous experiments (ALEPH and BaBar), there is an additional level of complexity: on the web page you know that your job is in the waiting state ... but why? And how do you find out what happens to the others? You should know the name of somebody running the same kind of jobs to compare with!
- Once you have found the correct set of packages ... it runs fast!
Working with the GRID brings a little bit of dream into our everyday life ...

