Current status WMS and CREAM CE deployment Patricia Mendez Lorenzo ALICE TF Meeting (CERN, 02/04/09)


1 Current status WMS and CREAM CE deployment Patricia Mendez Lorenzo ALICE TF Meeting (CERN, 02/04/09)

2 WMS: some highlights
- In December 2008 ALICE finished the migration of all sites to a WMS submission approach
- The instabilities found in the system have forced the experiment and the support team to babysit the system and the production continuously
- This procedure will not scale to real data taking (in a few months)
- ALICE has not changed its submission procedure, which was defined even before the 2006 DC
- IMHO the experiment should not have to change its submission procedure because a new service does not provide the corresponding stability
- It is the service that must cope with the experiment's requirements and computing model, not the opposite
- Let’s stop:
  - Saying that this issue affects ALICE only: it is simply NOT TRUE
    - Daily I see similar issues with Geant4, Lattice QCD, sixT.
  - Asking ALICE to change the submission procedure
    - It is not realistic at this point; in addition, I do not see the point of changing a workload management system due to (not well understood) instabilities in a service

3 ALICE approach
- ALICE requires deployment of the CREAM-CE at all sites
  - This is the highest priority
  - Sites might be excluded from the production if the service is not provided
- The experiment therefore will not maintain a new submission procedure for some months
  - Interim period from WMS to CREAM
  - In addition, both systems must be maintained together
  - Bulk submission is not yet supported at the CLI level by CREAM
- It is not realistic for ANY application to have 2 submission approaches at this time

4 Status of the WMS in production
- Distribution of WMS in the ALICE production
- For the T0 site
  - Optimal situation: 3 WMS covering the production and the Pass 1 reconstruction at the T0 only
  - The reality: each node has reached a limit of 13K jobs/day (confirmed by the WMS operation experts). In addition, these nodes have to cope with the instabilities of external WMS
- For T1 sites
  - Optimal situation: each T1 site should provide at least 2 WMS, which should be dedicated when many T2 sites in the country depend on it
  - The reality: this basically affects Italy and France, and it is ensured by Italy
- For T2 sites
  - Optimal situation: large federations WITHOUT a regional T1 should follow the structure asked for the T2 sites (case of Russia)
  - The reality: the available T1 WMS must move from one T2 to another depending on the daily overload status
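The 13K jobs/day per-node limit quoted above lends itself to a quick capacity estimate. The following is a minimal sketch of that arithmetic only; the 50,000 jobs/day load is a hypothetical figure chosen for illustration and does not come from the slides, and the function names are made up for this example.

```python
import math

# Per-node limit quoted on the slide (13K jobs/day).
WMS_LIMIT_JOBS_PER_DAY = 13_000


def max_jobs_per_day(nodes: int, limit: int = WMS_LIMIT_JOBS_PER_DAY) -> int:
    """Upper bound on daily throughput for a given number of WMS nodes."""
    return nodes * limit


def wms_nodes_needed(jobs_per_day: int, limit: int = WMS_LIMIT_JOBS_PER_DAY) -> int:
    """Smallest number of WMS nodes whose combined limit covers the load."""
    if jobs_per_day < 0:
        raise ValueError("jobs_per_day must be non-negative")
    return math.ceil(jobs_per_day / limit)


if __name__ == "__main__":
    # The 3 WMS of the "optimal situation" at the T0.
    print(max_jobs_per_day(3))        # 39000
    # Hypothetical load, not taken from the slides.
    print(wms_nodes_needed(50_000))   # 4
```

The estimate ignores the extra load caused by instabilities of external WMS, so it should be read as a lower bound on the number of nodes.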

5 Some truths and some lies about the ALICE submission procedure and the WMS
- The latest WMS mega-patch solves the overloading issues observed in gLite3.0: FALSE
- We have not seen huge backlogs anymore: TRUE
- The ALICE submission procedure has changed recently, producing the instabilities observed in some WMS: FALSE
- The experiment tried to accommodate the submission procedure to the WMS as much as possible within the limits of its own computing model: TRUE
  - Same WMS configuration file as in AFS@CERN
  - Proxy renewal triggered only once per hour
  - RESUBMISSION FEATURE OF THE WMS DISCARDED BY THE EXPERIMENT AT THE JDL LEVEL SINCE FEB2009
- ALICE is therefore using the WMS at tree level (RB mode)
  - All the rest of the features are simply not used and not required
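As an illustration of what discarding resubmission "at the JDL level" might look like, here is a minimal sketch that writes a job description with both retry counters set to zero. It assumes the resubmission was switched off through the gLite JDL attributes `RetryCount` and `ShallowRetryCount`; the slides do not say which attributes ALICE actually set. The helper `build_jdl` and the executable name are hypothetical.

```python
def build_jdl(executable: str, arguments: str = "") -> str:
    """Return a minimal JDL text with WMS-side resubmission disabled.

    Assumption: resubmission is controlled by RetryCount (deep retries)
    and ShallowRetryCount (shallow retries); both are set to 0 here.
    """
    attributes = {
        "Executable": f'"{executable}"',
        "Arguments": f'"{arguments}"',
        "RetryCount": "0",
        "ShallowRetryCount": "0",
    }
    body = "\n".join(f"  {name} = {value};" for name, value in attributes.items())
    return "[\n" + body + "\n]"


if __name__ == "__main__":
    # "job_agent.sh" is a placeholder, not the real ALICE executable.
    print(build_jdl("job_agent.sh"))
```

With both counters at zero, a failed job is reported back instead of being rescheduled by the WMS, which matches the plain "RB mode" usage described on the slide.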

6 WHAT WAS HAPPENING IN FRANCE?
- Issues in GRIF and CCIN2P3 are totally uncorrelated
- GRIF
  - grid33.lal.in2p3.fr got overloaded yesterday
  - In addition, it was announced that ALICE was overloading the CE
    - Resubmission approach was discarded
    - Number of jobs not visible in the IS nor in the LB (later on)
- CCIN2P3
  - This is the unique VO supporting CE in the T1 and T2
  - CEs with different ranks
    - This situation was filling up one CE (best ranking), leaving the rest of the CEs empty
    - The query to the info system was returning 0 waiting jobs for those (worse ranking) CEs, and therefore the system kept on submitting jobs
  - T1 and T2 clusters will be separated into different VOBOXES
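The CCIN2P3 behaviour described above can be reproduced with a toy model. This is only a sketch under stated assumptions: the submitter keeps sending jobs while any CE reports fewer waiting jobs than a threshold, matchmaking always picks the best-ranked CE, and jobs never start running. The class, function names, CE names and numbers are all hypothetical and are not taken from the ALICE or WMS code.

```python
from dataclasses import dataclass


@dataclass
class CE:
    name: str
    rank: int         # higher rank = preferred by matchmaking (assumption)
    waiting: int = 0  # waiting jobs as reported by the info system


def best_ranked(ces: list[CE]) -> CE:
    """Matchmaking stand-in: always choose the CE with the highest rank."""
    return max(ces, key=lambda ce: ce.rank)


def run(ces: list[CE], cycles: int, threshold: int, per_ce_check: bool) -> dict[str, int]:
    """Simulate submission cycles and return waiting jobs per CE."""
    for _ in range(cycles):
        target = best_ranked(ces)
        if per_ce_check:
            # Only look at the CE that will actually receive the job.
            allowed = target.waiting < threshold
        else:
            # Naive check: an empty worse-ranked CE is enough to keep submitting.
            allowed = any(ce.waiting < threshold for ce in ces)
        if allowed:
            target.waiting += 1
    return {ce.name: ce.waiting for ce in ces}


if __name__ == "__main__":
    naive = run([CE("ce-best", 2), CE("ce-worse", 1)], 100, 10, per_ce_check=False)
    fixed = run([CE("ce-best", 2), CE("ce-worse", 1)], 100, 10, per_ce_check=True)
    print(naive)  # {'ce-best': 100, 'ce-worse': 0}
    print(fixed)  # {'ce-best': 10, 'ce-worse': 0}
```

In the naive run the empty worse-ranked CE keeps the check open while every job lands on the best-ranked one. Separating the clusters, as planned for the VOBOXES, corresponds roughly to the per-CE check.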

7 Status of the CREAM-CE
- New sites providing CREAM-CE:
  - RU-SPbSU (under testing)
  - Prague (still to be tested)
  - Subatech (still to be tested)
- Already existing sites with production infrastructures:
  - FZK (just upgraded to the next version)
  - Kolkata (performing fine)
  - KISTI (no issues)
  - GSI (pending the setup in production)
  - RAL (no issues)
  - CNAF (no issues)
  - CERN (moving the system from SLC5 to SLC4 to increase the number of resources)
  - Torino (no issues)
  - SARA (no issues)
