Status GridKa & ALICE T2 in Germany Kilian Schwarz GSI Darmstadt.

Presentation transcript:

1 Status GridKa & ALICE T2 in Germany Kilian Schwarz GSI Darmstadt

2 ALICE T2
Present status
Plans and timelines
Issues and problems

3 Status GridKa
Pledged: 600 KSI2k, delivered: 133%, 11% of ALICE jobs (last month)
(Figure labels: FZK, CERN)

4 GridKa – main issue
Resources provided according to megatable
–The share among Tier1s comes automatically when considering the Tier2s connecting to this Tier1 …
–GridKa pledges 2008: tape 1.5 PB, disk 1 PB
–Current megatable: tape 2.2 PB! This is much more than pledged and more than all other experiments together; most of the additional demand is due to the Russian T2 (0.8 PB)
The point is: the money is fixed. In principle, switching between tape/disk/CPU should be possible, though not on short notice. Possibly, things can still be changed for 2009.

5 GridKa – one more issue
Disk cache in front of the mass storage: how should this value be computed?
–Suggestion:
–The value depends strongly on the ALICE computing model, so the formula to compute it should be the same for all T1 centres.
–The various parameters in the formula should be defined by the individual sites, according to the actual MSS implementation (dCache, DPM, xrootd, …)

6 ALICE T2 – present status
(Diagram labels: vobox; LCG RB/CE; GSI Batchfarm (39 nodes/252 cores for ALICE) & GSIAF (14 nodes); directly attached disk storage (55 TB), ALICE::GSI::SE_tactical::xrootd; 30 TB, ALICE::GSI::SE::xrootd; PROOF/Batch; Grid; CERN; GridKa; 150 Mbps; GSI)

7 Present Status
ALICE::GSI::SE::xrootd
> 30 TB disk on fileserver (8 fileservers with 4 TB each)
… TB disk on fileserver
–20 fileservers, 3U, 15*500 GB disks, RAID 5
–6 TB user space per server
Batch Farm/GSIAF and ALICE::GSI::SE_tactical::xrootd
nodes dedicated to ALICE:
15 D-Grid funded boxes, each:
–2*2core 2.67 GHz Xeon, 8 GB RAM
–2.1 TB local disk space on 3 disks + system disk
Additionally 24 new boxes, each:
–2*4core 2.67 GHz Xeon, 16 GB RAM
–2.0 TB local disk space on 4 disks including system
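As a cross-check, the node and core counts for the two hardware generations listed above can be added up. This is a minimal sketch, assuming that "2*2core" and "2*4core" mean two sockets with two and four cores each; the names `NODE_GROUPS` and `farm_totals` are illustrative only and do not come from the slides.

```python
# Node groups as listed on the slide: (number of boxes, sockets, cores per socket).
# Assumption: "2*2core" = 2 sockets x 2 cores, "2*4core" = 2 sockets x 4 cores.
NODE_GROUPS = {
    "D-Grid funded boxes": (15, 2, 2),
    "new boxes": (24, 2, 4),
}


def farm_totals(groups):
    """Return (total nodes, total cores) over all node groups."""
    nodes = sum(count for count, _, _ in groups.values())
    cores = sum(
        count * sockets * cores_per_socket
        for count, sockets, cores_per_socket in groups.values()
    )
    return nodes, cores


nodes, cores = farm_totals(NODE_GROUPS)
print(f"{nodes} nodes, {cores} cores")
```

Under that reading the totals come out at 39 nodes and 252 cores, which agrees with the figure quoted for the GSI batch farm on slide 6.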

8 ALICE T2 – short term plans
Extend GSIAF to all 39 nodes
Study coexistence of interactive and batch processes on the same machines. Develop possibility to increase/decrease the number of batch jobs on the fly to give advantage to analysis.
Add newly bought fileservers (about 120 TB disk space) to ALICE::LCG::SE::xrootd

9 ALICE T2 – medium term plans
Add 25 additional nodes to GSI Batchfarm/GSIAF, to be financed via a 3rd party project (D-Grid)
Upgrade GSI network connection to 1 Gb/s, either as a dedicated line to GridKa (direct T2 connection to T0 problematic) or as a general internet connection

10 ALICE T2 – ramp up plans
RRB/MoU/WLCGMoU.pdf

11 Plans for the ALICE Tier 2&3 at GSI:
Year: …
ramp-up: …
CPU (kSI2k): 400/…, …/…, …/…, …/…
Disk (TB): 120/80, 300/200, 390/260, 510/…
WAN (Mb/s): …
Remarks:
2/3 of that capacity is for the tier 2 (ALICE central, fixed via WLCG MoU)
1/3 for the tier 3 (local usage, may be used via Grid)
according to the ALICE computing model no tape for tier2
tape for tier3 independent of MoU
HI run in October -> upgrade operational: 3Q each year

12 ALICE T2/T3
Remarks related to ALICE T2/3:
–The physicists who know what they are doing are at the T2 centres
–Analysis can be prototyped quickly with the experts close by
–GSI requires flexibility for optimising the ratio of calibration/analysis & simulation at tier2/3
Language definition according to GSI interpretation:
ALICE T2: central use
ALICE T3: local use. Resources may be used via Grid, but there are no pledged resources.

13 ALICE T2 use cases (see computing model)
Three kinds of data analysis
–Fast pilot analysis of the data “just collected” to tune the first reconstruction at CERN Analysis Facility (CAF)
–Scheduled batch analysis using GRID (Event Summary Data and Analysis Object Data)
–End-user interactive analysis using PROOF and GRID (AOD and ESD)
CERN
–Does: first pass reconstruction
–Stores: one copy of RAW, calibration data and first-pass ESDs
T1
–Does: reconstructions and scheduled batch analysis
–Stores: second collective copy of RAW, one copy of all data to be kept, disk replicas of ESDs and AODs
T2
–Does: simulation and end-user interactive analysis
–Stores: disk replicas of AODs and ESDs

14 Data reduction in ALICE
Requires AliRoot+Cond+AliEn (once). Has to run on a disconnected laptop.
(Diagram labels: RAW 14 MB/ev, RAW 1.1 MB/ev; ESD 3 MB/ev, ESD 40 kB/ev; Reco T0/T1s; AODs 300 kB/ev, AODs 5 kB/ev; S-AOD 300 kB/ev, S-AOD 5 kB/ev; Tag 2 kB/ev; Analysis T0/T1s/T2/laptop; Cond Data)
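To make the size of each reduction step concrete, the per-event sizes from the diagram can be turned into reduction factors. This is only a sketch under stated assumptions: the two values given for each format are treated as two separate chains (the slide does not say what distinguishes them), the stages are assumed to follow the order RAW, ESD, AOD, Tag, and decimal prefixes (1 MB = 1000 kB) are assumed. The names `CHAINS` and `reduction_factors` are hypothetical.

```python
KB = 1.0
MB = 1000.0  # assumption: decimal prefixes, sizes expressed in kB

# Per-event sizes read off the slide. Grouping the larger and the smaller
# value of each format into two chains is an assumption, not stated in the source.
CHAINS = {
    "larger values": [("RAW", 14 * MB), ("ESD", 3 * MB), ("AOD", 300 * KB), ("Tag", 2 * KB)],
    "smaller values": [("RAW", 1.1 * MB), ("ESD", 40 * KB), ("AOD", 5 * KB), ("Tag", 2 * KB)],
}


def reduction_factors(chain):
    """Return [(step label, size ratio)] for each consecutive pair of formats."""
    return [
        (f"{src}->{dst}", src_size / dst_size)
        for (src, src_size), (dst, dst_size) in zip(chain, chain[1:])
    ]


for name, chain in CHAINS.items():
    steps = ", ".join(f"{label}: x{factor:.1f}" for label, factor in reduction_factors(chain))
    print(f"{name}: {steps}")
```

With these inputs, the largest relative reduction in the first chain is the AOD to Tag step, while in the second chain it is the RAW to ESD step.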

15 data transfers CERN -> GSI
Motivation: the calibration model and algorithms need to be tested before October
Test the functionality of current T0/T1 -> T2 transfer methods.
At GSI the CPU and storage resources are available, but how do we bring the data here?

16 data transfer CERN -> GSI
The system is not ready yet for generic use. Therefore expert control by a "mirror" is necessary.
In principle: individual file transfer works fine now. Plan: next transfers with Pablo's new collection-based commands. Webpage where transfer requests can be entered and transfer status can be followed up.
So far about 700 ROOT files have been successfully transferred. This corresponds to about 1 TB of data.
30% of the newest request is still pending.
Maximum speed achieved so far: 15 MB/s (almost the complete bandwidth of GSI), but only for a relatively short time
Since August 8 there have been no relevant transfers. Reasons:
–August 8, pending xrootd update at Castor SE
–August 14, GSI SE failure due to network problems
–August 20, instability of central AliEn services. Production comes first
–Up to recently: AliEn update
GSI plans to analyse the transferred data ASAP and to continue with more transfers. PDC data also need to be transferred for prototyping and testing of analysis code.
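The transfer figures above can be put into perspective with a little arithmetic. This is a rough sketch, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and assuming that the 150 Mbps label in the slide 6 diagram is the capacity of GSI's external link; the function name `transfer_summary` is illustrative only.

```python
def transfer_summary(n_files, volume_bytes, peak_rate_bytes_per_s, link_mbps):
    """Derive average file size, time at peak rate and link utilisation."""
    avg_file_gb = volume_bytes / n_files / 1e9
    hours_at_peak = volume_bytes / peak_rate_bytes_per_s / 3600
    # Convert bytes/s to Mb/s (8 bits per byte) before comparing to the link.
    link_fraction = peak_rate_bytes_per_s * 8 / 1e6 / link_mbps
    return avg_file_gb, hours_at_peak, link_fraction


avg_gb, hours, fraction = transfer_summary(
    n_files=700,                  # "about 700 ROOT files"
    volume_bytes=1e12,            # "about 1 TB", assuming decimal units
    peak_rate_bytes_per_s=15e6,   # "15 MB/s"
    link_mbps=150,                # assumption: link capacity from the slide 6 diagram
)
print(f"average file size: {avg_gb:.2f} GB")
print(f"time for 1 TB at peak rate: {hours:.1f} h")
print(f"peak rate as fraction of link: {fraction:.0%}")
```

Under these assumptions the average file is roughly 1.4 GB, and 15 MB/s corresponds to 120 Mb/s, which is consistent with the remark that the peak rate used almost the complete bandwidth. Since the peak was held only briefly, the time at peak rate is a lower bound on the real transfer time.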

17 data transfer CERN -> GSI

18 ALICE T2 – problems and issues
Where do we get our KSI2k values from for monitoring of CPU usage? Currently: (but e.g. HEPiX: Intel CPUs – not complete performance available for typical HEP applications since optimised for Intel compilers etc. …)
–How to do the comparison between values published in ALICE and WLCG?

