Summary on PPS-pilot activity on CREAM CE


Summary on PPS-pilot activity on CREAM CE D.Cesini (INFN-CNAF) D.Dongiovanni (INFN-CNAF) C.Aiftimiei (INFN-PD)

CREAM PPS Pilot PHASE1
Some of the PPS sites will be gradually requested to replace their lcg-CE with CREAM. Start with one site, published in the PPS BDII, and then extend the testbed as needed.
Aims of the pilot:
- To fine-tune the installation tools (YAIM and release notes)
- To verify the correct interaction of the new services with the monitoring tools
- To test direct submission to the CREAM CE (in collaboration with ALICE)
NOTE: this activity is by no means meant to replace the standard certification of the service. The certification will be carried out in parallel in the usual way and in close synergy with the pilot.
CREAM CEs:
- CNAF: cert-ce-03.cnaf.infn.it + 4 virtual WNs using pbs
- FZK: pps-cream-fzk.gridka.de
ICE WMS:
- FZK: pps-rb-fzk.gridka.de
- SCAI: glite-wms2.scai.fraunhofer.de
Available CLIs:
- CNAF: cert-ui-01.cnaf.infn.it
- FZK: pps-vobox-fzk.gridka.de (alice prod setup)

Phase1 test results by ALICE
[Plots: ALICE production jobs via CREAM CE (ca. 2000); ALICE jobs via lcg-CE]
- The two CEs used have the same hardware
- The CREAM CE used in this test (PATCH#2415) is now in production; the ICE component is not

The CREAM CE performance (ALICE Task Force slide by P. Mendez Lorenzo)
- Stable performance since we put it in production
- Once the performance was ensured, the number of resources was decreased to 30-40 CPUs
- Stability tests (run this summer) show good results
- No special baby-sitting was required during this summer: the system has been running on its own with no special interventions
- We have changed the ALICE queue so that the CREAM CE points to the ALICE production queue (aliceXL)

CREAM PPS Pilot PHASE2
- 2 WMSs: CNAF and (FZK or SCAI)
- 1 UI at CNAF: cert-ui-01.cnaf.infn.it
- 1 BDII at CNAF: including services in pilot + LCG production
- 1 VOBOX (if needed): FZK
CREAM CEs:
- FZK
- Padova (14 CEs: 7 PBS, 7 LSF)
- Bari
- CNAF (~10 CEs)
- SCAI
The CREAM CEs will access production batch systems.
Phase 2 started on 1 October 2008 and focuses on the performance of the ICE WMS. Its objective is to enable CMS users to submit continuously at a rate of 10k jobs/day over 5 weeks.
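To put the Phase 2 target in the same units as the stress tests reported later in this summary (jobs per minute), the conversion can be sketched as below. This is only illustrative arithmetic on the figures quoted in the slides; the variable names are ours and the "5 weeks" is taken as 35 full days of continuous submission, which is an assumption.

```python
# Phase 2 target quoted on the slide: 10k jobs/day sustained over 5 weeks.
TARGET_JOBS_PER_DAY = 10_000
TARGET_WEEKS = 5            # assumed to mean 35 days of continuous submission
MINUTES_PER_DAY = 24 * 60

# Average submission rate needed to meet the target.
target_jobs_per_minute = TARGET_JOBS_PER_DAY / MINUTES_PER_DAY

# Total number of jobs submitted if the target is held for the whole period.
target_total_jobs = TARGET_JOBS_PER_DAY * TARGET_WEEKS * 7

# Daily volume corresponding to the 40 jobs/min rate used in the stress tests.
STRESS_TEST_JOBS_PER_MINUTE = 40
stress_test_jobs_per_day = STRESS_TEST_JOBS_PER_MINUTE * MINUTES_PER_DAY

if __name__ == "__main__":
    print(f"target rate: {target_jobs_per_minute:.2f} jobs/min")
    print(f"target total over {TARGET_WEEKS} weeks: {target_total_jobs} jobs")
    print(f"stress test volume: {stress_test_jobs_per_day} jobs/day")
```

Under these assumptions the stress-test rate is several times higher than the average rate the CMS target requires.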

Latest important pilot updates
- 12-Dec-08: There is a new yaim-cream-ce (v. 4.0.7-2) in the YUM repo for the CREAM PPS pilot (PATCH:2667).
- 13-Jan-09: A new version of CREAM was released to the pilot. This version fixes BUG:45437 and BUG:45736.
- 13-Jan-09: Within the SA1 coordination meeting, the SA1 ROCs were invited to use the pilot version of CREAM for their regional installations.
- 13-Jan-09: Stress test of the ICE+CREAM submission chain: a submission rate of 40 jobs/min was sustained, but a failure rate higher than expected was observed. The issue is currently under analysis (done by PD SA3).
- 13-Jan-09: Pilot end date moved to mid-March.
- 20-Jan-09: ALICE successfully tested the CLI using the CE at FZK.
- 03-Feb-09: CMS will start ICE+CREAM submission tests in parallel with PD SA3.

Test details on ICE+CREAM (1/2)
Tests done by SA3 personnel in Padova.
- A submission rate of 40-45 jobs/min was reached
- The failure rate is still higher than desirable
Test started on Wed Jan 7 16:01:32 CET 2009 (WMS: devel18)
Description:
- 7200 collections, each of 40 jobs
- One collection every 60 seconds
- Used the CEs of testbedB (PD+CNAF) plus cream-12.pd.infn.it
- Used automatic delegation and the proxy renewal service
- Proxy has 5 hours of lifetime (and is renewed every 4 hours)
Results:
- Collections correctly submitted: 3733 (149320 jobs)
- DONE OK: 144004 (96.44%)
- ABORTED: 446 (0.3%)
- Not finished: 4870 (3.26%)
The numbers above were obtained with resubmission on (retrycount=2, shallowretrycount=3). They may be slightly polluted by the fact that 3 of the CEs had a configuration problem with LSF.
After this test, two issues were found in CREAM and reported as bugs:
- #45437 https://savannah.cern.ch/bugs/?45437 ("too many open files" exception raised by the job purger)
- #45736 https://savannah.cern.ch/bugs/?45736 ("problems in case of resubmission to the same CE")
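The percentages in this test follow directly from the job counts. A minimal sketch of the arithmetic, using only the figures quoted above (the variable and function names are ours, not part of any test tooling):

```python
# Figures quoted for the test started on 7 Jan 2009.
JOBS_PER_COLLECTION = 40
collections_submitted = 3733

# Final job states as reported on the slide.
outcomes = {"DONE OK": 144004, "ABORTED": 446, "Not finished": 4870}

# Total jobs actually submitted: collections times jobs per collection.
total_jobs = collections_submitted * JOBS_PER_COLLECTION


def share(count, total):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100.0 * count / total, 2)


shares = {state: share(count, total_jobs) for state, count in outcomes.items()}

if __name__ == "__main__":
    print(f"total jobs: {total_jobs}")
    for state, pct in shares.items():
        print(f"{state}: {outcomes[state]} ({pct}%)")
```

The three states add up to the 149320 submitted jobs, so every job from the correctly submitted collections is accounted for.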

Test details on ICE+CREAM (2/2)
Last test: 1 collection of 40 jobs per minute for 5 days
- DONE OK: 284838 (99.18%)
- ABORTED: 0 (0.0%)
- Not finished: 2362 (0.82%)
- Resubmissions: 4599 (1.60%)
The problems originated from a CNAF CE where a process was continuously dying, and from the blparser crash on the same CE. Those problems were fixed in the following tests.
When submitting longer jobs at the same rate (40 jobs/min for 5 days), performance problems arise, probably because more jobs are "active" at the same time.
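The slide for this last test does not state the total number of jobs. As a rough check, assuming the percentages are taken over the sum of the reported final states (our assumption, not stated in the slides), the figures can be reproduced as follows. The nominal volume for 5 full days at the stated rate is computed alongside for comparison.

```python
# Figures quoted for the last test (1 collection of 40 jobs per minute, 5 days).
JOBS_PER_COLLECTION = 40
TEST_DAYS = 5

done_ok = 284838
aborted = 0
not_finished = 2362
resubmissions = 4599

# Assumption: the denominator is the sum of the reported final states.
accounted_jobs = done_ok + aborted + not_finished

# Nominal volume if exactly one collection per minute ran for 5 full days.
nominal_jobs = TEST_DAYS * 24 * 60 * JOBS_PER_COLLECTION


def pct(count, total):
    """Percentage of total, rounded to two decimals."""
    return round(100.0 * count / total, 2)


done_ok_pct = pct(done_ok, accounted_jobs)
not_finished_pct = pct(not_finished, accounted_jobs)
resubmission_pct = pct(resubmissions, accounted_jobs)

if __name__ == "__main__":
    print(f"accounted jobs: {accounted_jobs} (nominal: {nominal_jobs})")
    print(f"DONE OK: {done_ok_pct}%")
    print(f"Not finished: {not_finished_pct}%")
    print(f"Resubmissions: {resubmission_pct}%")
```

With this denominator the quoted percentages are reproduced. The accounted total is slightly below the nominal 5-day volume; the slides do not explain the difference.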

Some information
- The WMS used for submission in the pilot has still not been delivered to certification. It will be released as an add-on to the WMS with PATCH:2459.
- The version of the WMS currently in PPS (PATCH:1841) supports submission to CREAM, but there are known performance issues.
- The new CREAM+ICE (PATCH:2748, 2459) should be 'ready for certification' in mid-February, with improved performance.
- The workaround for the proxy renewal issue on WNs was delivered to certification with PATCH:2669 and PATCH:2667. These patches are still in certification (they have been for a month now). The mechanism was nevertheless tested on the pilot and has not shown any issues.
- Submission via condorG was tried about one month ago by CMS users in Wisconsin, who were able to submit to CEs in Padova. No further news has been received.
https://twiki.cern.ch/twiki/bin/view/LCG/PpsPilotCream
https://twiki.cern.ch/twiki/bin/view/LCG/EGEE_PPS_Coordination