Summary on PPS-pilot activity on CREAM CE


1 Summary on PPS-pilot activity on CREAM CE
D.Cesini (INFN-CNAF) D.Dongiovanni (INFN-CNAF) C.Aiftimiei (INFN-PD)

2 CREAM PPS Pilot PHASE1
Some of the PPS sites will gradually be asked to replace their lcg-CE with CREAM, starting with one site published in the PPS BDII and then extending the testbed as needed. The aims are:
-to fine-tune the installation tools (YAIM and release notes);
-to verify the correct interaction of the new services with the monitoring tools;
-to test direct submission to the CREAM CE (in collaboration with ALICE).
NOTE: this activity is by no means meant to replace the standard certification of the service. Certification will be carried out in parallel in the usual way, in close synergy with the pilot.
CREAM CEs:
-CNAF: cert-ce-03.cnaf.infn.it + 4 virtual WNs using PBS
-FZK: pps-cream-fzk.gridka.de
ICE WMS:
-FZK: pps-rb-fzk.gridka.de
-SCAI: glite-wms2.scai.fraunhofer.de
Available CLIs:
-CNAF: cert-ui-01.cnaf.infn.it
-FZK: pps-vobox-fzk.gridka.de (ALICE production setup)

3 Phase1 test results by ALICE
Plots: ALICE production jobs via CREAM CE (ca. 2000); ALICE jobs via lcg-CE.
The two CEs used have the same hardware. The CREAM CE used in this test (PATCH#2415) is now in production; the ICE component is not.

4 The CREAM CE performance
ALICE Task Force slide by P. Mendez Lorenzo
-Stable performance since we put it in production.
-Once the performance was ensured, the number of resources was decreased to CPUs.
-Stability tests (run this summer) show good results.
-No special baby-sitting was required during the summer: the system ran on its own with no special interventions.
-We have changed the ALICE queue setup so that the CREAM CE points to the ALICE production queue (aliceXL).

5 CREAM PPS Pilot PHASE2
-2 WMSs: CNAF and (FZK or SCAI)
-1 UI at CNAF: cert-ui-01.cnaf.infn.it
-1 BDII at CNAF, including the services in the pilot + LCG production
-1 VOBOX (if needed): FZK
-CREAM CEs: FZK, Padova (14 CEs: 7 PBS, 7 LSF), Bari, CNAF (~10 CEs), SCAI
The CREAM CEs will access production batch systems.
Phase2 started on 1 October. It focuses on the performance of the ICE WMS: the objective is to enable CMS users to submit continuously at a rate of 10K jobs/day over 5 weeks.
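The Phase2 objective can be turned into a total job count and an average submission rate. The sketch below is plain arithmetic on the two figures quoted on the slide (10K jobs/day, 5 weeks); it assumes the daily target is spread evenly over all 24 hours, which the slide does not state, and the function name `campaign_totals` is hypothetical.

```python
# Phase2 objective quoted on the slide: 10K jobs/day, sustained over 5 weeks.
TARGET_JOBS_PER_DAY = 10_000
CAMPAIGN_WEEKS = 5

MINUTES_PER_DAY = 24 * 60


def campaign_totals(jobs_per_day: int, weeks: int) -> tuple[int, int, float]:
    """Return (days, total jobs, average jobs per minute).

    Assumption: the daily target is spread evenly over all 24 hours.
    """
    days = weeks * 7
    total_jobs = jobs_per_day * days
    jobs_per_minute = jobs_per_day / MINUTES_PER_DAY
    return days, total_jobs, jobs_per_minute


days, total_jobs, jobs_per_minute = campaign_totals(TARGET_JOBS_PER_DAY, CAMPAIGN_WEEKS)
print(days, total_jobs, round(jobs_per_minute, 2))  # 35 350000 6.94
```

Under that assumption the target amounts to roughly 350,000 jobs in total, or about 7 jobs per minute on average.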

6 Latest important pilot updates
…..
12-Dec-08: There is a new yaim-cream-ce (v ) in the YUM repo for the CREAM PPS pilot (PATCH:2667).
13-Jan-09: A new version of CREAM was released to the pilot. This version fixes BUG:45437 and BUG:45736.
13-Jan-09: At the SA1 coordination meeting, the SA1 ROCs were invited to use the pilot version of CREAM for their regional installations.
13-Jan-09: Stress test of the ICE+CREAM submission chain: a submission rate of 40 jobs/min was sustained, but a higher failure rate than expected was observed. The issue is currently under analysis (done by PD SA3).
13-Jan-09: Pilot end date moved to mid-March.
20-Jan-09: ALICE successfully tested the CLI using the CE at FZK.
03-Feb-09: CMS will start ICE+CREAM submission tests in parallel with PD SA3.
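To put the rate sustained in the 13-Jan-09 stress test in context, it can be converted to a daily figure and compared with the Phase2 objective of 10K jobs/day set for CMS. This is back-of-the-envelope arithmetic that assumes the 40 jobs/min rate is held around the clock; it says nothing about the failure rate, which was the open issue. The helper `jobs_per_day` is hypothetical.

```python
# Rate sustained in the 13-Jan-09 ICE+CREAM stress test.
SUSTAINED_JOBS_PER_MINUTE = 40
# Phase2 objective for CMS users: 10K jobs/day.
PHASE2_TARGET_JOBS_PER_DAY = 10_000


def jobs_per_day(jobs_per_minute: int) -> int:
    """Daily throughput if the per-minute rate is held for a full 24 hours."""
    return jobs_per_minute * 60 * 24


daily = jobs_per_day(SUSTAINED_JOBS_PER_MINUTE)
# How many times the Phase2 target the sustained rate would deliver.
headroom = daily / PHASE2_TARGET_JOBS_PER_DAY
print(daily, round(headroom, 2))  # 57600 5.76
```

On submission rate alone, 40 jobs/min corresponds to about 5.8 times the Phase2 daily target.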

7 Test details on ICE+CREAM (1/2)
Tests done by SA3 personnel in Padova. A submission rate of jobs/min. The failure rate is still higher than desirable.
Test started on Wed Jan 7 16:01:32 CET 2009 (WMS: devel18)
Description:
-7200 collections, each of 40 jobs
-one collection every 60 seconds
-used the CEs of testbedB (PD+CNAF) plus cream-12.pd.infn.it
-used automatic delegation and the proxy renewal service
-the proxy has a 5-hour lifetime (renewed every 4 hours)
Results:
-Collections correctly submitted: 3733 ( jobs)
-DONE OK: (96.44%)
-ABORTED: 446 (0.3%)
-Not finished: 4870 (3.26%)
The numbers above were obtained with resubmission enabled (retrycount=2, shallowretrycount=3). They may be slightly skewed by the fact that 3 of the CEs had a configuration problem with LSF.
After this test, two issues were found in CREAM and reported as bugs:
# ("too many open files" exception raised by the job purger)
# ("problems in case of resubmission to the same CE")
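The slide omits the total job count, but the three percentages add up to 100% and are consistent with 3733 collections × 40 jobs = 149,320 jobs. The sketch below reproduces them under that assumption, which is not stated on the slide. It also checks that 7200 collections at one per 60 seconds correspond to a 5-day schedule. The helpers `percent` and `outcome_breakdown` are hypothetical, and treating the three states as exhaustive is an assumption.

```python
# Figures from the first ICE+CREAM test (started 7 Jan 2009, WMS devel18).
JOBS_PER_COLLECTION = 40
COLLECTIONS_PLANNED = 7200
SECONDS_BETWEEN_COLLECTIONS = 60
COLLECTIONS_SUBMITTED = 3733
ABORTED = 446
NOT_FINISHED = 4870

SECONDS_PER_DAY = 86_400


def percent(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, as a percentage rounded to 2 decimals."""
    return round(100 * part / whole, 2)


def outcome_breakdown(total: int, aborted: int, not_finished: int) -> tuple[int, float, float, float]:
    """Return (done_ok count, DONE OK %, ABORTED %, Not finished %).

    Assumption: every submitted job ends in exactly one of the three
    states, so the DONE OK count is whatever is left over.
    """
    done_ok = total - aborted - not_finished
    return done_ok, percent(done_ok, total), percent(aborted, total), percent(not_finished, total)


# Length of the planned schedule: one collection per interval.
planned_days = COLLECTIONS_PLANNED * SECONDS_BETWEEN_COLLECTIONS / SECONDS_PER_DAY
# Assumption: the job count missing from the slide equals collections x 40.
total_jobs = COLLECTIONS_SUBMITTED * JOBS_PER_COLLECTION
done_ok, pct_done, pct_aborted, pct_not_finished = outcome_breakdown(total_jobs, ABORTED, NOT_FINISHED)

print(planned_days)  # 5.0
print(total_jobs, done_ok)  # 149320 144004
print(pct_done, pct_aborted, pct_not_finished)  # 96.44 0.3 3.26
```

Because resubmission was enabled, these are end states after retries, not first-attempt outcomes.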

8 Test details on ICE+CREAM (2/2)
Last test: 1 collection of 40 jobs per minute for 5 days:
-DONE OK: (99.18%)
-ABORTED: 0 (0.0%)
-Not finished: 2362 (0.82%)
-Resubmissions: 4599 (1.60%)
The problems originated from a CNAF CE where a process kept dying, and from a blparser crash on the same CE. These problems were fixed in the following tests. When longer jobs are submitted at the same rate (40 jobs/min for 5 days), performance problems arise, probably because more jobs are “active” at the same time.
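The slide gives counts and percentages but no total. If one collection of 40 jobs was submitted every minute for the full 5 days, the total is 288,000 jobs, and the quoted percentages follow from the quoted counts. The sketch below checks this. That total, and expressing resubmissions relative to it, are assumptions consistent with the 1.60% figure rather than facts from the slide. The helper `percent` is hypothetical.

```python
# Figures from the last ICE+CREAM test: 1 collection of 40 jobs per minute for 5 days.
JOBS_PER_MINUTE = 40
TEST_DAYS = 5
ABORTED = 0
NOT_FINISHED = 2362
RESUBMISSIONS = 4599


def percent(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, as a percentage rounded to 2 decimals."""
    return round(100 * part / whole, 2)


# Assumption: a collection went out every minute for the whole 5 days.
total_jobs = JOBS_PER_MINUTE * 60 * 24 * TEST_DAYS
# Assumption: jobs that are neither aborted nor unfinished are DONE OK.
done_ok = total_jobs - ABORTED - NOT_FINISHED

print(total_jobs, done_ok)  # 288000 285638
print(percent(done_ok, total_jobs), percent(NOT_FINISHED, total_jobs), percent(RESUBMISSIONS, total_jobs))  # 99.18 0.82 1.6
```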

9 Some info
The WMS used for submission in the pilot has not yet been delivered to certification. It will be released as an add-on to the WMS with PATCH:2459.
The version of the WMS currently in PPS (PATCH:1841) supports submission to CREAM, but there are known performance issues.
The new CREAM+ICE (PATCH:2748, 2459) should be "ready for certification" in mid-February, with improved performance.
The workaround for the proxy renewal issue on WNs was delivered to certification with PATCH:2669 and PATCH:. These patches are still in certification (they have been for a month now). However, the mechanism was tested on the pilot and has not shown any issues.
Submission via Condor-G was tried about one month ago by CMS users in Wisconsin, who were able to submit to CEs in Padova. No further news has been received.
