
1 An Update about ALICE Computing
Federico Carminati, Peter Hristov
NEC’2011, Varna, September 12-19, 2011

2 ALICE@NEC
NEC’2001: AliRoot for simulation
NEC’2003: Reconstruction with AliRoot
NEC’2005: AliRoot for analysis
NEC’2007: Still no LHC data => status and plans of the ALICE offline software, calibration & alignment
NEC’2009: In preparation for the first LHC data, no presentation
NEC’2011: Almost two years of stable data taking and many published physics results => an update about ALICE computing

3 Why HI collisions?
Study QCD at its natural energy scale T = Λ_QCD = 200 MeV by creating a state of matter at high density and temperature using high-energy heavy-ion collisions.
Indication of a transition from hadron gas to QGP at T_c ≈ 170 MeV, ε_c ≈ 1 GeV/fm³ (STAR, NA49, 3 flavors).
Open questions: phase transition or crossover? An intermediate phase of strongly interacting QGP? Chiral symmetry restoration (constituent mass → current mass)?

4 History of High-Energy A+B Beams
BNL-AGS (mid 80’s, early 90’s): O+A, Si+A at 15 A GeV/c, √s_NN ~ 6 GeV; Au+Au at 11 A GeV/c, √s_NN ~ 5 GeV
CERN-SPS (mid 80’s, 90’s): O+A, S+A at 200 A GeV/c, √s_NN ~ 20 GeV; Pb+A at 160 A GeV/c, √s_NN ~ 17 GeV
BNL-RHIC (early 00’s): Au+Au at √s_NN ~ 130 GeV; p+p, d+Au at √s_NN ~ 200 GeV
LHC (2010!): Pb+Pb at √s_NN ~ 5,500 GeV (2,760 in ’10-’12); p+p at √s_NN ~ 14,000 GeV (7,000 in ’10-’12)

5
Trigger and DAQ chain: level 0 (special hardware) at 8 kHz (160 GB/s); level 1 (embedded processors); level 2 (PCs) at 200 Hz (4 GB/s); then 30 Hz (2.5 GB/s) and 30 Hz (1.25 GB/s) to data recording & offline processing.
Detector: total weight 10,000 t, overall diameter 16.00 m, overall length 25 m, magnetic field 0.5 T.
ALICE Collaboration: ~1000 people, 30 countries, ~80 institutes (~1/2 of ATLAS or CMS, ~2x LHCb).
A full pp programme; the data rate for pp is 100 Hz @ 1 MB.

6 Organization
Core Offline is a CERN responsibility: framework development, coordination activities, documentation, integration, testing & release, resource planning.
Each sub-detector is responsible for its own offline system: it must comply with the general ALICE Computing Policy as defined by the Computing Board, and it must integrate into the AliRoot framework.
http://aliweb.cern.ch/Offline/

7 Planning
"In preparing for battle I always found plans useless but planning essential" (Gen. D. Eisenhower)
(155 open items, 3266 total)

8 Resources: a sore point for ALICE computing

9 Computing model – pp (diagram)
RAW data goes to the CERN T0 disk buffer; calibration parameters are generated and the first-pass reconstruction runs at T0; data are written to T0 tape, registered in the AliEn file catalogue, and replicated to the T1s; CAF is used for analysis; T2s produce MC data; T1s run pass 1 & 2 reconstruction, ordered analysis and end-user analysis.

10 Computing model – AA (diagram)
RAW data goes to the CERN T0 disk buffer; calibration parameters are generated and a pilot reconstruction runs during HI data taking; data are written to T0 tape and registered in the AliEn file catalogue; the first-pass reconstruction runs from tape during the LHC shutdown; data are replicated to the T1s; CAF is used for analysis; T2s produce MC data; T1s run pass 1 & 2 reconstruction, ordered analysis and end-user analysis.

11 Prompt reconstruction
Based on PROOF (TSelector)
Very useful for high-level QA and debugging
Integrated in the AliEVE event display
Full Offline code sampling events directly from DAQ memory
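
As an illustration of the PROOF/TSelector approach, here is a minimal sketch of the kind of selector such prompt-QA code is built on. Only the TSelector interface itself is ROOT's; the class name, histogram and branch handling are hypothetical placeholders.

    // PromptQASelector.h -- minimal TSelector sketch (hypothetical class name)
    #include <TSelector.h>
    #include <TTree.h>
    #include <TH1F.h>

    class PromptQASelector : public TSelector {
    public:
       TTree *fChain = nullptr;   // tree/chain being processed
       TH1F  *fHist  = nullptr;   // example QA histogram

       void   Begin(TTree *) override {}              // runs on the client
       void   SlaveBegin(TTree *) override {          // runs on each PROOF worker
          fHist = new TH1F("hQA", "example QA quantity", 100, 0, 100);
          fOutput->Add(fHist);                        // register for merging on the master
       }
       void   Init(TTree *tree) override { fChain = tree; }
       Bool_t Process(Long64_t entry) override {      // called once per event
          fChain->GetEntry(entry);
          fHist->Fill(0.);                            // placeholder: would use real event quantities
          return kTRUE;
       }
       void   Terminate() override {}                 // runs on the client at the end

       ClassDefOverride(PromptQASelector, 1);
    };

On a PROOF cluster this would be run with something like chain->Process("PromptQASelector.cxx+").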

12 Visualization (event display): a V0 decay candidate

13 ALICE Analysis: Basic Concepts
(Event display: Ξ⁻ → π⁻ Λ, Λ → p π⁻)
Analysis models: prompt data processing (calibration, alignment, reconstruction, analysis) @CERN with PROOF; batch analysis using the GRID infrastructure; local analysis; interactive analysis with PROOF+GRID.
User interface: access to the GRID via the AliEn or ROOT UIs; PROOF/ROOT is the enabling technology for the (C)AF; the GRID API is the class TAliEn.
Analysis Object Data (AOD) contain only the data needed for a particular analysis and are extensible with ∆-AODs; the same user code runs locally, on the CAF and on the Grid.
Work on the distributed infrastructure has been done by the ARDA project.
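
A minimal sketch of what "access the GRID via the ROOT UI" looks like in practice, assuming a ROOT session with the AliEn plugin and a valid token; the AOD path is a hypothetical placeholder, while "aodTree" is the usual ALICE AOD tree name.

    // Sketch: open a file from the AliEn catalogue directly in ROOT
    #include <TGrid.h>
    #include <TFile.h>
    #include <TTree.h>
    #include <cstdio>

    void openAliceAOD()
    {
       TGrid::Connect("alien://");   // authenticates and sets the global gGrid pointer
       // Hypothetical lfn, given only to show the alien:// URL form:
       TFile *f = TFile::Open("alien:///alice/some/dataset/AliAOD.root");
       if (!f || f->IsZombie()) { std::printf("could not open the file\n"); return; }
       TTree *aod = (TTree *)f->Get("aodTree");   // usual ALICE AOD tree name
       if (aod) std::printf("AOD events: %lld\n", aod->GetEntries());
    }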

14 Analysis train
(Diagram: ESD/Kine input → TASK 1 → TASK 2 → TASK 3 → TASK 4 → AOD, with efficiency corrections.)
AOD production is organized in a ‘train’ of tasks, to maximize the efficiency of full-dataset processing and to optimize CPU/IO, using the analysis framework.
Needs monitoring of memory consumption and of the individual tasks.
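
Conceptually, the train amortizes a single I/O pass over the dataset across many analysis tasks. The sketch below only illustrates that pattern; the AnalysisTask interface and event type are invented for the example and are not the AliRoot API.

    // Conceptual sketch of an analysis "train": the dataset is read once and
    // each event is handed to every wagon (task), so CPU is shared over one I/O pass.
    #include <memory>
    #include <vector>
    #include <cstdio>

    struct Event { /* reconstructed event data (placeholder) */ };

    struct AnalysisTask {                      // invented interface, not AliRoot's
       virtual ~AnalysisTask() = default;
       virtual void ProcessEvent(const Event &ev) = 0;   // fill histograms, AOD branches, ...
       virtual void Finish() = 0;                        // write outputs
    };

    struct ExampleTask : AnalysisTask {
       long nEvents = 0;
       void ProcessEvent(const Event &) override { ++nEvents; }
       void Finish() override { std::printf("ExampleTask saw %ld events\n", nEvents); }
    };

    void runTrain(const std::vector<Event> &dataset)
    {
       std::vector<std::unique_ptr<AnalysisTask>> wagons;
       wagons.emplace_back(new ExampleTask);
       // ... add TASK 2, TASK 3, TASK 4 here ...

       for (const Event &ev : dataset)          // single pass over the (expensive) input
          for (auto &task : wagons)             // every task sees the event while it is in memory
             task->ProcessEvent(ev);

       for (auto &task : wagons)
          task->Finish();
    }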

15 Analysis on the Grid

16 Production of RAW
Successful despite rapidly changing conditions in the code and in detector operation.
74 major cycles; 7.2×10⁹ events (RAW) passed through the reconstruction.
Processed 3.6 PB of data; produced 0.37 TB of ESDs and other data.

17 ALICE central services: sending jobs to data
The user submits a job; the ALICE job catalogue holds the job descriptions (e.g. Job 1: lfn1, lfn2, lfn3, lfn4) and the ALICE file catalogue maps each lfn to a guid and the SEs holding replicas.
The optimizer splits jobs according to the location of their input data (e.g. Job 1.1: lfn1; Job 1.2: lfn2; Job 1.3: lfn3, lfn4); computing agents at the sites fetch matching sub-jobs, run them close to the data and send the results back; the output is registered in the file catalogue.
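
The splitting step can be illustrated schematically: group each job's input LFNs by the storage element holding a replica and emit one sub-job per SE. Everything below, including the simplified catalogue and the SE names, is invented for the illustration.

    // Conceptual sketch: split a job's input files into sub-jobs, one per storage element,
    // so that each sub-job can run at a site holding its data locally.
    #include <map>
    #include <string>
    #include <vector>
    #include <cstdio>

    using Catalogue = std::map<std::string, std::string>;   // lfn -> SE with a replica (simplified)

    std::map<std::string, std::vector<std::string>>
    splitByLocation(const std::vector<std::string> &lfns, const Catalogue &fc)
    {
       std::map<std::string, std::vector<std::string>> subjobs;   // SE -> lfns of one sub-job
       for (const auto &lfn : lfns)
          subjobs[fc.at(lfn)].push_back(lfn);
       return subjobs;
    }

    int main()
    {
       Catalogue fc = {{"lfn1", "SE_A"}, {"lfn2", "SE_B"}, {"lfn3", "SE_A"}, {"lfn4", "SE_B"}};
       for (const auto &[se, files] : splitByLocation({"lfn1", "lfn2", "lfn3", "lfn4"}, fc)) {
          std::printf("sub-job at %s:", se.c_str());
          for (const auto &f : files) std::printf(" %s", f.c_str());
          std::printf("\n");
       }
    }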

18 Storage strategy
(Diagram) Worker nodes access all storage through xrootd: an xrootd manager (VOBOX::SA) redirects to xrootd servers in front of the various back-ends: plain disk, DPM, CASTOR MSS and dCache MSS (the latter through an xrootd emulation), each also exposed through SRM.

19 The access to the data
The application asks the ALICE file catalogue for a file by GUID, lfn or metadata; the catalogue resolves lfn → guid → (acl, size, md5), finds who has the file, builds the pfn and returns the SE, the pfn and an access envelope; the data are then read directly via xrootd.
A tag catalogue maps events to tags (ev# → guid → tag1, tag2, tag3...).
Direct access to data via the TAliEn/TGrid interface.
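
A sketch of the catalogue-then-xrootd access path from the user's point of view, assuming the ROOT TGrid/TGridResult interface; the catalogue path, file pattern and the use of the "turl" key are given as assumptions of common practice, not taken from the slide.

    // Sketch: query the AliEn file catalogue for ESD files and chain them;
    // the actual reads then go directly over xrootd to the SEs holding the data.
    #include <TGrid.h>
    #include <TGridResult.h>
    #include <TChain.h>
    #include <cstdio>

    void chainFromCatalogue()
    {
       TGrid::Connect("alien://");                                // sets the global gGrid pointer
       // Hypothetical catalogue path and file pattern:
       TGridResult *res = gGrid->Query("/alice/data/2011/SomeRun/", "AliESDs.root");
       if (!res) return;

       TChain chain("esdTree");                                   // usual ALICE ESD tree name
       for (Int_t i = 0; i < res->GetEntries(); ++i)
          chain.Add(res->GetKey(i, "turl"));                      // transport URL, e.g. root://...
       std::printf("chained %lld events from %d files\n", chain.GetEntries(), res->GetEntries());
       delete res;
    }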

20 The ALICE way with XROOTD
Pure xrootd + the ALICE strong-authorization plugin; no difference between T1 and T2 (only size and QoS); WAN-wide globalized deployment, very efficient direct data access.
Tier-0: CASTOR + xrootd serving data normally. Tier-1: pure xrootd cluster serving conditions data to ALL the Grid jobs via WAN. "Old" DPM + xrootd at some Tier-2s.
Each xrootd site (e.g. GSI, CERN) runs a cmsd/xrootd cluster subscribed to the ALICE global redirector, forming a globalized cluster: local clients work normally at each site; if a file is missing, the site asks the global redirector, gets redirected to the right collaborating cluster and fetches it immediately; a smart client could also point directly at the global redirector. This is a virtual mass storage system built on data globalization.
More details and complete info in "Scalla/Xrootd WAN globalization tools: where we are." @ CHEP09.
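
From the client side, the "missing a file? ask the global redirector" idea amounts to a failover between two xrootd URLs. The sketch below only illustrates that; both host names are placeholders, not the real ALICE endpoints.

    // Sketch of client-side failover: try the local site redirector first,
    // then fall back to the global redirector, which locates a collaborating cluster.
    #include <TFile.h>
    #include <TString.h>
    #include <cstdio>

    TFile *openWithFallback(const char *path)
    {
       // Placeholder host names for the local and global xrootd redirectors:
       TString localUrl  = TString::Format("root://local-redirector.example//%s", path);
       TString globalUrl = TString::Format("root://global-redirector.example//%s", path);

       TFile *f = TFile::Open(localUrl);
       if (!f || f->IsZombie()) {                 // not available from the local cluster
          std::printf("falling back to the global redirector\n");
          delete f;
          f = TFile::Open(globalUrl);             // redirected to whichever site has the file
       }
       return f;
    }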

21 CAF
The whole CAF becomes an xrootd cluster: worker nodes running PROOF and xrootd behind a PROOF master, with data staged from CASTOR via xrootd and file locations resolved through the AliEn catalogue (lfn → guid → SEs).
Powerful and fast machinery, very popular with users.
Allows for any use pattern, however quite often leading to contention for resources.
Expected vs. observed speedup; ~70% utilization.
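
The typical CAF usage pattern, sketched with the standard ROOT PROOF calls; the master name and the input file are placeholders, and the selector is the one sketched earlier.

    // Sketch: run a selector on the CAF PROOF cluster instead of locally.
    #include <TProof.h>
    #include <TChain.h>

    void runOnCAF()
    {
       TProof::Open("caf-master.example");        // placeholder PROOF master
       TChain chain("esdTree");
       chain.Add("root://some-se.example//path/to/AliESDs.root");   // placeholder input
       chain.SetProof();                          // process the chain via the PROOF workers
       chain.Process("PromptQASelector.cxx+");    // the TSelector sketched on slide 11
    }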

22 Analysis facilities – profile
1.8 PB of data went through the CAF and 550 TB through SKAF.
For comparison, on the Grid we have written 15 PB and read 37 PB.

23 The ALICE Grid
AliEn working prototype in 2002: a single interface to distributed computing for all ALICE physicists; file catalogue, job submission and control, software management, user analysis.
~80 participating sites now: 1 T0 (CERN/Switzerland); 6 T1s (France, Germany, Italy, The Netherlands, Nordic DataGrid Facility, UK), with KISTI and UNAM coming (!); ~73 T2s spread over 4 continents.
~30,000 (out of ~150,000 WLCG) cores and 8.5 PB of disk.
Resources are "pooled" together: no localization of roles/functions; national resources must integrate seamlessly into the global grid to be accounted for; funding agencies contribute proportionally to the number of PhDs (M&O-A share); T3s have the same role as T2s, even if they do not sign the MoU.
http://alien.cern.ch

24 All is in MonALISA

25 GRID operation principle
Central AliEn services; at each site a VO-box in front of the WMS (gLite/ARC/OSG/local) and the SM (dCache/DPM/CASTOR/xrootd), plus monitoring and package management.
The VO-box system (very controversial in the beginning) has been extensively tested, allows for site-services scaling and provides a simple isolation layer for the VO in case of troubles.

26 Operation – central/site support
Central services support (2 FTEs equivalent): there are no experts doing exclusively support; rather, 6 highly qualified experts do both development and support.
Site services support is handled by ‘regional experts’ (one per country) in collaboration with the local cluster administrators: an extremely important part of the system; ~0.2 FTE/site in normal operation; regular weekly discussions and active all-activities mailing lists.

27 Summary
The ALICE offline framework (AliRoot) is a mature project that covers simulation, reconstruction, calibration, alignment, visualization and analysis.
Successful operation with "real data" since 2009; the results for several major physics conferences were obtained in time.
The Grid and AF resources are adequate to serve the RAW/MC and user analysis tasks; more resources would of course be better.
Site operation is very stable; the gLite (now EMI) software is mature and few changes are necessary.

28 Some Philosophy

29 The code
The move to C++ was probably inevitable, but it made a lot of "collateral damage": the learning process was long, and it is still going on. It is very difficult to judge what would have happened "had ROOT not been there".
The most difficult question is now "what next": a new language? There is none on the horizon. Different languages for different scopes (Python, Java, C, CUDA...)? Just think about debugging. A better discipline in using C++ (in ALICE, no STL/templates)?
Code management tools, build systems, (c)make, autotools: still a lot of "glue" has to be provided; there is no comprehensive system "out of the box".

30 The Grid
The half-empty glass: we are still far from the "Vision"; a lot of tinkering and hand-holding is needed to keep it alive; 4+1 solutions for each problem; we are just now seeing some light at the end of the tunnel of data management.
The half-full glass: we are using the Grid as a "distributed heterogeneous collection of high-end resources", which was the idea after all; LHC physics is being produced by the Grid.

31 Grid need-to-have
Far more automation and resilience: make the Grid less manpower-intensive.
More integration between workload management and data placement.
Better control of upgrades (OS, MW), or better transparent integration of different OS/MW.
Integration of the network as an active, provisionable resource; "close" storage element, file replication/caching vs. remote access.
Better monitoring, or perhaps simply more coherent monitoring...

