Patricia Méndez Lorenzo (IT/GS) ALICE Offline Week (18th March 2009)


1 Patricia Méndez Lorenzo (IT/GS) ALICE Offline Week (18th March 2009)

2 Introduction
- ALICE is interested in the deployment of the CREAM-CE service at all sites that support the experiment
  - GOAL: deprecate the use of the WMS in favour of direct CREAM-CE submission
  - The WMS submission mode to CREAM-CE is not required
- ALICE has been testing the CREAM-CE in the real production environment since the beginning of summer 2008
- For the time being, ALICE is the only LHC experiment running stress tests and real production tests on the CREAM-CE
- This talk focuses on the ALICE experience with the CREAM-CE, the expectations, future plans and requirements for all sites
18/03/09 ALICE Offline Week -- CREAM-CE Use and Status for ALICE

3 The CREAM-CE
- CREAM (Computing Resource Execution And Management): a lightweight service for job management operations at the CE level
- Intended as the replacement for the current LCG-CE
- Submission procedures allowed by CREAM:
  - Submission to CREAM via the WMS
  - Direct submission via generic clients
- The submission method depends essentially on the experiment computing model
  - Pilot-based models normally follow the direct submission approach (4 LHC experiments)
  - Bulk submissions of real jobs follow the WMS submission approach (CMS)

4 Direct Submission to CREAM-CE
- Extra elements required for direct submission
  - Proxy renewal mechanism (required by CMS and ATLAS)
    - Automatically renews the user proxy when it is about to expire
    - Recently made available
- The lack of this element is not a showstopper for ALICE
  - 48h VOMS extensions are ensured by the security team at CERN
  - This is enough to run production and analysis jobs without any additional extension

5 The 1st test phase
- Performed in summer 2008 at FZK (T1 site, Germany)
- Tests were operated through a second VOBOX, in parallel to the existing service at the T1 (operating in WMS submission mode)
- Access to the local CREAM-CE was ensured through the PPS infrastructure
  - Initially 30 CPUs
  - Moved to the ALICE production queue within a few weeks (production setup)
- Intensive functionality and stability tests from July to September 2008
  - Production was then stopped to create an ALICE CREAM module in AliEn and to allow the site to upgrade its system
- Excellent support from the CREAM-CE developers and the site admins
  - Especially Massimo Sgaravatto (INFN-Padova) and Angela Poschlad (GridKa T1 site)

6 Results of the 1st test phase
- More than 55000 jobs successfully executed through the CREAM-CE in the mentioned period
- No interventions on the VOBOX required during the testing phase
- CREAM-CE used to distribute real (standard) ALICE jobs
[Figure with regions labelled "Running on the production queue" and "Running on PPS nodes"]

7 Implementation into AliEn (I)
- Creation of a new CREAM module
  - Specific to CREAM-CE submissions
  - Available since AliEn v2-16
  - Runs in parallel with the usual LCG module (restricted to WMS submissions only)
- Change in the JDL construction
  - The current ALICE JDL contains the outputsandbox field, which specifies the standard outputs of the job agents
  - CREAM-CE requires a new JDL field declaring the gridftp server from which the standard outputs are retrieved
  - ALICE PROCEDURE: remove the outputsandbox field from the JDL files created by the CREAM module
    - The standard outputs are only available when submitting in debug mode
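The JDL change on this slide can be illustrated with a minimal Python sketch. It is not the AliEn CREAM module (which is not shown in the slides); the function name `build_agent_jdl`, the file names `std.out`/`std.err` and the example GridFTP URI are hypothetical, and the attribute name `OutputSandboxBaseDestURI` is an assumption, since the slide does not name the new field.

```python
def build_agent_jdl(executable, debug=False, gridftp_base_uri=None):
    """Return JDL text for a job agent submitted directly to a CREAM-CE.

    Hypothetical helper: outside debug mode no output sandbox is declared,
    so no GridFTP destination is needed.
    """
    fields = {"Executable": f'"{executable}"'}
    if debug:
        if gridftp_base_uri is None:
            raise ValueError("debug mode needs a GridFTP destination")
        fields["StdOutput"] = '"std.out"'
        fields["StdError"] = '"std.err"'
        fields["OutputSandbox"] = '{"std.out", "std.err"}'
        # Assumed attribute name for the GridFTP destination of the outputs.
        fields["OutputSandboxBaseDestURI"] = f'"{gridftp_base_uri}"'
    body = "\n".join(f"  {key} = {value};" for key, value in fields.items())
    return "[\n" + body + "\n]"


if __name__ == "__main__":
    print(build_agent_jdl("agent.sh"))
    print(build_agent_jdl("agent.sh", debug=True,
                          gridftp_base_uri="gsiftp://vobox.example.org/tmp"))
```

Keeping the sandbox fields behind a single `debug` switch mirrors the slide: production submissions carry no output sandbox at all, and only debug submissions depend on the gridftp server.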

8 Implementation into AliEn (II)
- A gridftp server is required
  - Needed to retrieve the standard outputs of the job agents
  - Sites are free to decide its implementation (proposal: VOBOX)
  - 200 GB of space required
  - It will be used ONLY if the submission has been done in debug mode
- Change in the proxy renewal mechanism
  - Purpose: submission optimization
  - The user proxy will be renewed only once per hour
  - In previous AliEn versions this procedure was executed BEFORE each agent submission
  - The procedure has ALSO been implemented in LCG.pm
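The renewal policy described on this slide (at most once per hour instead of before every agent submission) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the AliEn code: the class name `ProxyRenewer` and the `renew` callback are hypothetical, and the actual renewal command is deliberately left to the caller.

```python
import time


class ProxyRenewer:
    """Run a renewal callback at most once per interval (default: 1 hour)."""

    def __init__(self, renew, interval_s=3600.0, clock=time.monotonic):
        self._renew = renew            # hypothetical callback doing the renewal
        self._interval_s = interval_s
        self._clock = clock            # injectable clock, useful for testing
        self._last = None              # time of the last renewal, if any

    def ensure_fresh(self):
        """Renew if no renewal happened within the interval.

        Returns True when a renewal was performed, False otherwise.
        """
        now = self._clock()
        if self._last is None or now - self._last >= self._interval_s:
            self._renew()
            self._last = now
            return True
        return False


if __name__ == "__main__":
    renewer = ProxyRenewer(lambda: print("renewing proxy"))
    for _ in range(3):
        # Called before each agent submission; only the first call renews.
        renewer.ensure_fresh()
```

Calling `ensure_fresh()` before every submission keeps the submission loop unchanged while removing the per-submission renewal cost.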

9 The 2nd test phase
- After a debug phase of the CREAM module in January 2009, the new CREAM module entered production on the 19th of February (start of the 2nd testing phase)
- Stability and performance are currently the most important test issues at the sites providing a CREAM-CE
- The deployment of a 2nd VOBOX ensures that production continues in parallel through the WMS
  - A single VOBOX would require dedicated babysitting of the system (not realistic)
- Feedback on all issues is provided directly to the CREAM developers
- As of today, 11 sites are providing a CREAM-CE

10 Site status table

Site    | CREAM-CEs         | CREAM status | 2nd VOBOX      | Clients in VOBOX | General status
--------|-------------------|--------------|----------------|------------------|---------------
FZK     | 1 (4 queues)      | OK           | YES            |                  | OK
Kolkata | 2                 | OK           | YES            |                  | OK
Athens  | 1                 | OK           | NO             |                  | NOT OK
KISTI   | 1                 | OK           | YES            |                  | OK
GSI     | 1                 | OK           | NO             | YES              | NOT OK*
IHEP    |                   |              |                |                  |
RAL     | 1                 | OK           | NO             | YES              | OK*
CNAF    | 1                 | OK           | YES            |                  | OK
CERN    | 2 (3 queues each) | OK           | YES            |                  | OK
Torino  | 1                 | OK           | YES            |                  | OK
SARA    | 1                 | OK           | In preparation | YES              | In testing

11 Status of the sites (I)
- FZK
  - Minor actions required during the 2nd test phase
  - Delete some sandbox directories (hitting the 32K subdirectory limit again)
  - This procedure will not be necessary in the next CREAM versions
  - 46530 jobs through the FZK CREAM-CE since the 19th of February
- RAL
  - No special actions reported by the site for service maintenance
  - 2678 jobs executed using the local CREAM-CE
- Kolkata
  - Debugging phase performed directly with the developer (Massimo Sgaravatto)
  - In production since the 9th of March

12 Status of the sites (II)
- CERN
  - Two CEs were provided to ALICE for testing on the 9th of March
  - In production since the 10th of March (voalice03 used for this production)
  - SLC5 WNs behind the CREAM-CE
  - 17247 jobs since the 10th of March
- GSI
  - The setup of a 2nd VOBOX is still pending
  - The CREAM-CE is performing well
- CNAF
  - CREAM-CE ready to enter production at the end of February
  - After some instabilities observed last week (lack of automatic purge), it re-entered production on the 13th of March
  - The info provider of the CREAM-CE is showing certain instabilities

13 Status of the sites (III)
- KISTI
  - Instabilities at the VOBOX level prevent the full setup of the local CREAM-CE in production
  - The CREAM-CE system is performing well
- ATHENS
  - The CREAM-CE is working but the site cannot be put in production
  - No CREAM clients on the VOBOX
- IHEP
  - The CREAM-CE is not working yet (site admin working on it)
  - Missing infrastructure: no 2nd VOBOX (it will be provided next week)

14 Status of the sites (IV)
- SARA
  - System tested yesterday evening with a few jobs
  - Still in the testing phase
- Torino
  - System in production since last week
  - Already 744 jobs executed through the local CREAM system
- Subatech
  - 2nd VOBOX already provided; the setup of the CREAM-CE is ongoing

15 Reminder: How to provide CREAM-CE services for ALICE
- During the last October pre-GDB meeting it was explicitly mentioned:
  - Unlikely to be deployable as an lcg-CE replacement on this timescale (downtime period), but we can continue with rollout in parallel.
- In addition, during the November pre-GDB meeting it was concluded:
  - The lcg-CE replacement will require the WMS submission to be in place and the resolution of the proxy renewal issue (among other points related to the service performance)
  - The deployment of the system in parallel to the LCG-CE was nevertheless encouraged

16 Reminder: How to provide CREAM-CE services for ALICE (II)
- In terms of the ALICE computing model, the parallel LCG-CE vs. CREAM-CE setup means the deployment of a 2nd VOBOX
  - Each VOBOX is able to submit to one specific backend
  - One VOBOX -> LCG-CE OR CREAM-CE submission: replacement approach
  - Two VOBOXES -> LCG-CE AND CREAM-CE submission: parallel approach
- This is a temporary solution during the parallel running phase
  - As soon as the replacement is ensured and the LCG-CE is deprecated, ALICE will no longer require a 2nd VOBOX
- Remarks on the 2nd VOBOX deployment
  - Its setup is not signed in blood
  - Each case can be studied individually
  - BUT! Sites with important storage capability for ALICE should be included in the list of sites providing a 2nd VOBOX

17 Reminder: How to provide CREAM-CE services for ALICE (III)
- Setup of the ALICE production queue behind the CREAM-CE
  - This procedure puts the CREAM-CE directly in production
- GridFTP server
  - Required to retrieve the job (agent) outputs
  - Removed from the VOBOX in January 2008 with the deployment of the gLite3.1 VOBOX
    - It was no longer required by the 4 LHC experiments at that time
  - ALICE has no specific wish for the placement of this service
    - It can be provided on the VOBOX, but this is the site's decision

18 Future Plans
- Small changes in the CREAM module are still needed
  - The current implementation of CREAM-CE submission via the CLI allows the declaration of a single queue only
  - Sites can provide several queues per site (especially T0/T1 sites)
  - Submission to several queues must be implemented at the application level
- PROPOSAL for ALICE (in 3 lines of code):
  - Definition of a range per queue at the LDAP level
  - Calculation of a random number before each agent submission
  - Assignment of a queue by matching the random number against the ranges
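The three-step proposal on this slide (a range per queue, a random number per submission, range matching) can be sketched in Python. This is only an illustration under stated assumptions: the queue names and weights are hypothetical, and the ranges are built from a plain dictionary here rather than read from LDAP as the slide proposes.

```python
import random


def build_ranges(weights):
    """Turn {queue: weight} into a list of (upper_bound, queue) covering [0, 1)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    ranges, upper = [], 0.0
    for queue, weight in weights.items():
        upper += weight / total
        ranges.append((upper, queue))
    return ranges


def pick_queue(ranges, rng=random.random):
    """Draw a number in [0, 1) and return the queue whose range contains it."""
    x = rng()
    for upper, queue in ranges:
        if x < upper:
            return queue
    # Guard against floating-point rounding in the last upper bound.
    return ranges[-1][1]


if __name__ == "__main__":
    # Hypothetical queues: the second one should receive about 3x more agents.
    ranges = build_ranges({"queue-short": 1, "queue-long": 3})
    print(pick_queue(ranges))
```

Making the range widths proportional to a per-queue weight lets a site steer the share of agents each queue receives without changing the submission code.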

19 Conclusions
- The ALICE experience with the current CREAM-CE service is very positive
  - Stable (and maintenance-free) operation is achieved quickly after the initial debugging period
  - High performance and scalability (FZK: 2000+ parallel jobs served by a single CREAM-CE)
- Excellent support provided by the developers
  - Special thanks to Massimo Sgaravatto (INFN Padova)
- ALICE is working with all sites to install a CREAM-CE
  - In full production before the start of data taking

