Presentation is loading. Please wait.

Presentation is loading. Please wait.

Pledged and delivered resources to ALICE Grid computing in Germany Kilian Schwarz GSI Darmstadt ALICE Offline Week.

Similar presentations


Presentation on theme: "Pledged and delivered resources to ALICE Grid computing in Germany Kilian Schwarz GSI Darmstadt ALICE Offline Week."— Presentation transcript:

1 pledged and delivered resources to ALICE Grid computing in Germany Kilian Schwarz GSI Darmstadt ALICE Offline Week

2 Overview In this talk I will show the installed resources in Germany (especially GridKa) and how they compare to the pledged resources and the resources being used by Grid jobs. from the hardware point of view the resources are all there and can be used storage is hardly used by ALICE so far CPU: the pledged resources are not fully delivered. An analysis within this talk discusses the reasons. some suggestions how to improve the situation are shown summary

3 ALICE resources at GridKa GridKa provided 2009 25% of the requested ALICE T1 resources. The plan was to do the same for 2010. But since the globally requested T1 CPU resources reduced significantly compared to the original request GridKa will go in 2010 for 25% disk and tape, CPU will be ca. 30% of the requested ALICE T1 resources so that the increase in absolute numbers will stay the same as 2009 - CPUs have already been ordered GridKa resources for all 8 experiments & ALICE share

4 resources at GridKa (CPU)  all requested resources are installed and are ready to be used !!! nominal ALICE share: 38% of CPUs

5 GridKa CPU Share  If CPUs are not requested by ALICE they can be used easily by other experiments statistics September 2009

6 storage at GridKa new SE to follow: ALICE::FZK::TAPE  full xrootd based storage solution in the sense of WLCG (SRM interface, tape backend, space reservation, many more features) Generally: storage usage rather low. If the storage devices are not used the disks stay empty. They can not be used by other experiments easily. The 1.5 PB disk space installed at GridKa are to a large extend empty !!! Tapes are basically not used at all !!!

7 analysis of consumed CPU resources by ALICE Grid jobs at German sites the following 3 slides show 3 run periods of 1 month within the time span July to October

8 German sites no production due to AliRoot issue ALICE IS firewall issue Switch of hostcert from FZK to KIT ALICE total central Task Queue July 23 – August 23

9 issues listed chronologically 29.7. - 6.8. switch of hostcert on vobox from FZK to KIT (GridKa) 12.8. no production due to AliRoot issues (ALICE) 18.8. - 21.8. ALICE IS can not be contacted due to firewall issue (ALICE)

10 done job statistics GridKa WMS GSI + FZK: 12%

11 pledged resources

12 analysis (July-August) GSI ok GridKa various issues (GridKa, ALICE) before July 27: no official statement but number of Jobs in TQ on lower end in spite of that Germany largest group in terms of delivered CPUs. pledged resources not fulfilled, though in terms of delivered KSI2K Germany on same level with France

13 ALICE Computing issues at GridKa & GSI run period August 22 – September 21 Kilian Schwarz

14 WMS CREAM connection WMS Software dir Job profile Productio n pause CREAM DB Software dir readonly # job dirs too high (CREAM) German sites ALICE total central TQ August 22 – September 21

15 issues listed chronologically 25.8.-31.8. WMS performance (MW) 28.8.-31.8. CREAM connection (read only voms cert) (GridKa) 28.8.-31.8. proxy delegation to CREAM (see above) (GridKa) 30.8.-1.9. WMS overloaded (MW) 8.9.-9.9. WMS performance (MW) 9.9.-10.9. ALICE software dir not visible (GridKa) 11.9. ALICE job profile unstable (ALICE) 14.9.-16.9. login to CREAM vobox not possible (gsissh server) (GridKa) 15.9. ALICE production pause (ALICE) 16.9.-21.9. IS: CREAM DB returns wrong values (MW) 16.9.-17.9. ALICE software dir readonly (GridKa) 21.9. Can not read job wrapper (# job dirs too high in CREAM CE) (MW)

16 done job statistics GridKa GridKa: 15%

17 pledged resources

18 analysis GSI ok GridKa various issues (MW, GridKa, ALICE) End of the September: no official statement but number of Jobs in TQ on lower end in spite of that GridKa with 15% largest producer of DONE jobs in terms of delivered KSI2K Germany on same level with France

19 run period September – October 2009 ALICE Offline Week Kilian Schwarz GSI

20 WMS BDII no production planned productio n restart no production due to blocking AliRoot problems German sites ALICE total central TQ September 22 – October 22

21 issues listed chronologically 24.9. – 28.9. WMS overloaded (MW) 23.9. ALICE production break (ALICE) 29.9. planned ALICE production restart (ALICE) 02.10. ALICE production break (AliRoot) (ALICE) 12.10.-13.10. BDII returns 44444 (sensor replaced) (GridKa)

22 done job statistics FZK-CREAM FZK WMS GSI GridKa: 12.4 % GSI: 2.3 %

23 pledged resources

24 analysis (Sep-Oct) all sites affected various issues (MW, GridKa, ALICE) mainly no jobs due to no ALICE production for the 1st time a significantly higher contribution by CREAM compared to WMS GridKa with 12.4% largest producer of DONE jobs in terms of delivered KSI2K the largest producers are CERN, France, Germany, Nordic Countries in this run period almost no region did well if compared to pledged resources

25 done job statistics GridKa: 13% GridKa WMS GridKa CREAM

26 pledged resources

27 Suggestions to improve the situation Increased reaction time to GridKa related problems  More manpower  Better communication Participation in related meetings  Better monitoring Increased reaction time to Middleware related problems  Better monitoring (e.g. at CERN exists a technique to monitor overladed WMS)  Also here participation in related meetings Grid jobs  If no production jobs then user jobs should go to GridKa  Production free time should be reduced and better communicated In terms of resources: eventually GSI could pledge more resources to the Grid, resource distribution among German Grid sites can be done in a flexible way to ensure that ALICE Grid gets what it needs

28 summary and conclusion Germany is currently not able to deliver the pledged resources to a full extend reasons for this are many folded (site problems, Middleware problems, missing production jobs)  site problems could be solved with higher response time  MW problems could be monitored better (e.g. overloaded WMS following a procedure existing at CERN)  but if production jobs are missing the sites can not fulfill the request for pledged resources  the delivered CPUs have to be folded with the number of existing jobs (e.g. "jobs match the site resources") from the hardware point of view all resources are installed and available


Download ppt "Pledged and delivered resources to ALICE Grid computing in Germany Kilian Schwarz GSI Darmstadt ALICE Offline Week."

Similar presentations


Ads by Google