Pledged and delivered resources to ALICE Grid computing in Germany. Kilian Schwarz, GSI Darmstadt, ALICE Offline Week.


Overview
In this talk I show the resources installed in Germany (especially at GridKa) and compare them to the pledged resources and to the resources actually used by Grid jobs.
- From the hardware point of view the resources are all there and can be used.
- Storage is hardly used by ALICE so far.
- CPU: the pledged resources are not fully delivered; an analysis within this talk discusses the reasons.
- Some suggestions on how to improve the situation are given.
- Summary.

ALICE resources at GridKa
GridKa provided …% of the requested ALICE T1 resources. The plan was to do the same for … . But since the globally requested T1 CPU resources were reduced significantly compared to the original request, GridKa will provide in 2010 25% of the requested ALICE T1 disk and tape; CPU will be ca. 30%, so that the increase in absolute numbers stays the same, as the CPUs have already been ordered.
GridKa resources for all 8 experiments & ALICE share.
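To make the arithmetic behind the pledge explicit, here is a minimal sketch; the global ALICE T1 request values are invented placeholders, not numbers from the slides, only the 25%/30% fractions come from the text above.

    # Illustrative sketch only (placeholder numbers): deriving GridKa's 2010
    # ALICE pledge from the fractions quoted above.  The global ALICE T1
    # request values are invented, not taken from the slides.
    alice_t1_request = {"cpu_ksi2k": 8000.0, "disk_tb": 4000.0, "tape_tb": 5000.0}  # hypothetical global request
    gridka_fraction  = {"cpu_ksi2k": 0.30, "disk_tb": 0.25, "tape_tb": 0.25}        # ca. 30% CPU, 25% disk/tape

    gridka_pledge = {k: alice_t1_request[k] * gridka_fraction[k] for k in alice_t1_request}
    for resource, value in gridka_pledge.items():
        print(f"GridKa 2010 pledge, {resource}: {value:.0f}")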

Resources at GridKa (CPU)
All requested resources are installed and ready to be used.
Nominal ALICE share: 38% of CPUs.

GridKa CPU share
If CPUs are not requested by ALICE, they can easily be used by other experiments.
Statistics from September 2009.

Storage at GridKa
New SE to follow: ALICE::FZK::TAPE, a fully xrootd-based storage solution in the sense of WLCG (SRM interface, tape backend, space reservation, and many more features).
Generally, storage usage is rather low. If the storage devices are not used, the disks stay empty; unlike CPUs, they cannot easily be used by other experiments. The 1.5 PB of disk space installed at GridKa are to a large extent empty, and the tapes are basically not used at all.
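To make the point about empty disks concrete, a minimal sketch of a utilization check; the SE name ALICE::FZK::SE and all capacity numbers are assumptions for illustration, in practice the values would come from monitoring (e.g. MonALISA).

    # Minimal sketch with placeholder numbers: fraction of the installed ALICE
    # storage at GridKa that is actually used.  SE names other than
    # ALICE::FZK::TAPE and all values are hypothetical.
    installed_tb = {"ALICE::FZK::SE": 1500.0, "ALICE::FZK::TAPE": 2000.0}  # hypothetical
    used_tb      = {"ALICE::FZK::SE":   80.0, "ALICE::FZK::TAPE":    5.0}  # hypothetical

    for se, total in installed_tb.items():
        utilization = used_tb[se] / total if total else 0.0
        print(f"{se}: {used_tb[se]:.0f} of {total:.0f} TB used ({utilization:.1%})")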

Analysis of the CPU resources consumed by ALICE Grid jobs at German sites
The following slides show three run periods of one month each, within the time span July to October.

[Job profile plot, July 23 – August 23: running jobs at the German sites vs. the ALICE total and the central Task Queue; annotated incidents: no production due to AliRoot issue, ALICE IS firewall issue, switch of host certificate from FZK to KIT]

Issues listed chronologically:
- switch of host certificate on the vobox from FZK to KIT (GridKa)
- no production due to AliRoot issues (ALICE)
- ALICE IS cannot be contacted due to a firewall issue (ALICE)

[DONE job statistics plot; GridKa WMS; GSI + FZK: 12% of all DONE jobs]

Pledged resources

Analysis (July – August)
- GSI: ok.
- GridKa: various issues (GridKa, ALICE).
- Before July 27: no official statement, but the number of jobs in the TQ was at the lower end.
- In spite of that, Germany was the largest group in terms of delivered CPU.
- The pledged resources were not fulfilled, though; in terms of delivered kSI2k Germany is on the same level as France.
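As an illustration of how "delivered vs. pledged" can be quantified for one run period, a minimal sketch follows; both the pledge and the delivery numbers are invented, in practice the delivered CPU would come from the accounting records.

    # Illustrative sketch (invented numbers): pledge fulfilment per site for
    # one run period.  Real values would come from accounting/MonALISA.
    pledged_ksi2k   = {"GridKa": 3000.0, "GSI": 500.0}   # hypothetical pledges
    delivered_ksi2k = {"GridKa": 1800.0, "GSI": 480.0}   # hypothetical delivery in the period

    for site, pledge in pledged_ksi2k.items():
        fulfilled = delivered_ksi2k[site] / pledge
        print(f"{site}: delivered {fulfilled:.0%} of the pledge")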

ALICE computing issues at GridKa & GSI, run period August 22 – September 21 (Kilian Schwarz)

[Job profile plot, August 22 – September 21: running jobs at the German sites vs. the ALICE total and the central TQ; annotated incidents: WMS, CREAM connection, software dir, job profile, production pause, CREAM DB, software dir read-only, number of job dirs too high (CREAM)]

Issues listed chronologically:
- WMS performance (MW)
- CREAM connection (read-only voms cert) (GridKa)
- proxy delegation to CREAM (see above) (GridKa)
- WMS overloaded (MW)
- WMS performance (MW)
- ALICE software dir not visible (GridKa)
- ALICE job profile unstable (ALICE)
- login to CREAM vobox not possible (gsissh server) (GridKa)
- ALICE production pause (ALICE)
- IS: CREAM DB returns wrong values (MW)
- ALICE software dir read-only (GridKa)
- cannot read job wrapper (number of job dirs too high in the CREAM CE) (MW)

[DONE job statistics plot; GridKa: 15% of all DONE jobs]

Pledged resources

Analysis (August – September)
- GSI: ok.
- GridKa: various issues (MW, GridKa, ALICE).
- End of September: no official statement, but the number of jobs in the TQ was at the lower end.
- In spite of that, GridKa was with 15% the largest producer of DONE jobs.
- In terms of delivered kSI2k Germany is on the same level as France.

Run period September – October 2009 (ALICE Offline Week, Kilian Schwarz, GSI)

[Job profile plot, September 22 – October 22: running jobs at the German sites vs. the ALICE total and the central TQ; annotated incidents: WMS, BDII, no production, planned production restart, no production due to blocking AliRoot problems]

Issues listed chronologically:
- WMS overloaded (MW)
- ALICE production break (ALICE)
- planned ALICE production restart (ALICE)
- ALICE production break (AliRoot) (ALICE)
- BDII returns … (sensor replaced) (GridKa)

[DONE job statistics plot, split by FZK-CREAM, FZK WMS and GSI; GridKa: 12.4%, GSI: 2.3% of all DONE jobs]

Pledged resources

Analysis (September – October)
- All sites were affected by various issues (MW, GridKa, ALICE); mainly there were no jobs because of missing ALICE production.
- For the first time there was a significantly higher contribution via CREAM compared to the WMS.
- GridKa was with 12.4% the largest producer of DONE jobs.
- In terms of delivered kSI2k the largest producers are CERN, France, Germany and the Nordic countries.
- In this run period almost no region did well when compared to the pledged resources.

[DONE job statistics plot, split by GridKa WMS and GridKa CREAM; GridKa: 13% of all DONE jobs]

Pledged resources

Suggestions to improve the situation
Faster reaction to GridKa-related problems:
- more manpower
- better communication, participation in related meetings
- better monitoring
Faster reaction to Middleware-related problems:
- better monitoring (e.g. at CERN a technique exists to monitor overloaded WMS); see the sketch after this list
- also here, participation in related meetings
Grid jobs:
- if there are no production jobs, user jobs should go to GridKa
- production-free time should be reduced and better communicated
In terms of resources: possibly GSI could pledge more resources to the Grid; the resource distribution among the German Grid sites can be handled flexibly to ensure that the ALICE Grid gets what it needs.
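As an illustration of the kind of lightweight check meant by "better monitoring of overloaded WMS", here is a minimal sketch; it is not the actual CERN procedure, and the host names, the threshold and the get_queue_length() data source are all assumptions.

    # Minimal sketch, not the actual CERN monitoring procedure: flag a WMS
    # host as overloaded once its pending-job queue exceeds a threshold.
    # Host names, threshold and the data source are hypothetical.
    WMS_HOSTS = ["wms1.example.org", "wms2.example.org"]   # hypothetical hosts
    QUEUE_THRESHOLD = 5000                                  # hypothetical limit

    def get_queue_length(host: str) -> int:
        """Placeholder: in reality this would query the WMS or its monitoring page."""
        raise NotImplementedError

    def check_wms():
        for host in WMS_HOSTS:
            try:
                pending = get_queue_length(host)
            except NotImplementedError:
                continue  # no data source wired up in this sketch
            if pending > QUEUE_THRESHOLD:
                print(f"ALARM: {host} looks overloaded ({pending} pending jobs)")

    if __name__ == "__main__":
        check_wms()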

Summary and conclusion
Germany is currently not able to deliver the pledged resources to the full extent. The reasons for this are manifold (site problems, Middleware problems, missing production jobs):
- site problems could be solved with faster response
- MW problems could be monitored better (e.g. overloaded WMS, following a procedure existing at CERN)
- but if production jobs are missing, the sites cannot fulfil the request for pledged resources
- the delivered CPU has to be folded with the number of existing jobs (i.e. the jobs have to match the site resources)
From the hardware point of view all resources are installed and available.