Current status WMS and CREAM CE deployment Patricia Mendez Lorenzo ALICE TF Meeting (CERN, 02/04/09)


WMS: some highlights
- In December 2008 ALICE finished migrating all sites to a WMS-based submission approach
- The instabilities found in the system have forced the experiment and the support teams to babysit the system and the production continuously
  - This procedure will not scale to real data taking (in a few months)
- ALICE has not changed the submission procedure defined even before the 2006 DC
  - IMHO it is not up to the experiment to change its submission procedure because a new service does not provide the corresponding stability
  - It is the service that must cope with the experiment's requirements and computing model, not the opposite
- Let's stop:
  - saying that this issue affects ALICE only: it is simply NOT TRUE
    - Daily I see similar issues with Geant4, Lattice QCD, sixT.
  - asking ALICE to change the submission procedure
    - This is not realistic at this point; moreover, I do not see the point of changing a whole workload management system because of (not well understood) instabilities in one service

ALICE approach
- ALICE requires the deployment of the CREAM-CE at all sites
  - This is the highest priority
  - Sites might be excluded from production if the service is not provided
- The experiment will therefore not maintain a new submission procedure for just a few months
  - It would only cover the intermediate period from WMS to CREAM
  - In addition, both systems would have to be maintained together
  - Bulk submission is not yet supported by CREAM at the CLI level
- It is not realistic for ANY application to maintain 2 submission approaches at this time
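The cost of keeping two submission paths alive can be sketched in Python. The backend classes below are hypothetical stand-ins that do not call any real grid client; they only count client invocations, assuming the WMS path can send a whole batch in one bulk call while the CREAM path, lacking CLI bulk submission, needs one call per job.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical submission backend: it only counts client invocations."""

    def __init__(self):
        self.calls = 0

    @abstractmethod
    def submit(self, jobs):
        ...


class WmsBackend(Backend):
    def submit(self, jobs):
        # Assumed bulk path: the whole batch goes out in a single client call.
        self.calls += 1
        return list(jobs)


class CreamBackend(Backend):
    def submit(self, jobs):
        # No bulk submission at the CLI level: one client call per job.
        accepted = []
        for job in jobs:
            self.calls += 1
            accepted.append(job)
        return accepted


jobs = [f"job-{i}" for i in range(100)]
wms, cream = WmsBackend(), CreamBackend()
wms.submit(jobs)
cream.submit(jobs)
print(wms.calls, cream.calls)  # 1 100
```

Everything layered on top of such an interface (error handling, accounting, proxy handling) would need to be written and validated twice, which is the interim maintenance burden the slide argues against.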

Status of the WMS in production
- Distribution of WMS in the ALICE production
  - For the T0 site
    - Optimal situation: 3 WMS covering the production and the Pass 1 reconstruction at the T0 only
    - The reality: each node has reached a limit of 13K jobs/day (confirmed by the WMS operations experts). In addition, these nodes have to cope with the instabilities of external WMS
  - For T1 sites
    - Optimal situation: each T1 site should provide at least 2 WMS, which should be dedicated in the case of many dependent T2 sites in the country
    - The reality: this basically affects Italy and France, and it is ensured by Italy
  - For T2 sites
    - Optimal situation: large federations WITHOUT a regional T1 should follow the structure asked for the T2 sites (case of Russia)
    - The reality: the available T1 WMS must be switched from one T2 to another depending on the daily overload status
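A back-of-the-envelope check of the T0 numbers: taking the 13K jobs/day per-node ceiling quoted above at face value, three WMS nodes cap out at 39K jobs/day, before any load from external WMS is counted. The Python sketch below does that arithmetic; the 50K jobs/day load and the `spare` parameter are illustrative assumptions, not figures from the slides.

```python
import math

# Per-node ceiling quoted on the slide (confirmed by WMS operations experts).
JOBS_PER_WMS_PER_DAY = 13_000


def daily_capacity(nodes: int, per_node: int = JOBS_PER_WMS_PER_DAY) -> int:
    """Upper bound on the jobs/day a pool of WMS nodes can absorb."""
    return nodes * per_node


def nodes_needed(jobs_per_day: int, per_node: int = JOBS_PER_WMS_PER_DAY,
                 spare: int = 0) -> int:
    """Smallest number of WMS nodes whose combined ceiling covers the load,
    plus optional spare nodes in case one instance becomes unstable."""
    return math.ceil(jobs_per_day / per_node) + spare


print(daily_capacity(3))              # 39000
print(nodes_needed(50_000))           # 4
print(nodes_needed(50_000, spare=1))  # 5
```

Rounding up with `math.ceil` matters here: a load of 50K jobs/day needs about 3.85 nodes' worth of capacity, so a fourth node is required even though it will not be fully used.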

Some truths and some lies about the ALICE submission procedure and the WMS
- The latest WMS mega-patch solves the overloading issues observed in gLite 3.0: FALSE
- We have not seen huge backlogs anymore: TRUE
- The ALICE submission procedure has changed recently, producing the instabilities observed in some WMS: FALSE
- The experiment has tried to adapt its submission procedure to the WMS as much as possible within the limits of its own computing model: TRUE
  - Same WMS configuration file as in
  - Proxy renewal triggered only once per hour
  - RESUBMISSION FEATURE OF THE WMS DISCARDED BY THE EXPERIMENT AT THE JDL LEVEL SINCE FEB 2009
- ALICE is therefore using the WMS at "tree level" (RB mode)
  - All the other features are simply neither used nor required
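Discarding WMS resubmission at the JDL level is normally done through the gLite JDL attributes `RetryCount` (deep resubmission) and `ShallowRetryCount` (shallow resubmission). The slides do not show ALICE's actual JDL, so the Python helper below is only a sketch: it renders a minimal JDL with both counters set to 0, and the executable name `agent.sh` is hypothetical.

```python
def build_jdl(executable: str, arguments: str = "") -> str:
    """Render a minimal gLite-style JDL with WMS resubmission turned off."""
    attrs = {
        "Executable": f'"{executable}"',
        "Arguments": f'"{arguments}"',
        # Deep and shallow resubmission both disabled (plain RB-style usage).
        "RetryCount": "0",
        "ShallowRetryCount": "0",
    }
    lines = [f"  {key} = {value};" for key, value in attrs.items()]
    return "[\n" + "\n".join(lines) + "\n]"


jdl = build_jdl("agent.sh")  # hypothetical job-agent script name
print(jdl)
```

Keeping every other attribute out of the template mirrors the point above: only the basic matchmaking and submission path of the WMS is exercised.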

WHAT WAS HAPPENING IN FRANCE?
- The issues at GRIF and CCIN2P3 are totally uncorrelated
- GRIF
  - grid33.lal.in2p3.fr got overloaded yesterday
  - In addition, it was reported that ALICE was overloading the CE
    - The resubmission approach had already been discarded
    - The number of jobs was visible neither in the IS nor in the LB (later on)
- CCIN2P3
  - This is the only VO supporting CEs in both the T1 and the T2
  - The CEs have different ranks
  - This situation was filling up one CE (the best-ranked) and leaving the rest of the CEs empty
  - The query to the information system returned 0 waiting jobs for those (worse-ranked) CEs, so the system kept on submitting jobs
  - The T1 and T2 clusters will be separated into different VOBOXes
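The CCIN2P3 failure mode can be reproduced with a toy model. In the Python sketch below (all CE names, ranks and thresholds are hypothetical), a rank-based matchmaker sends every job to the best-ranked CE, while the submitter decides whether to continue by reading the waiting-job count of a worse-ranked CE. That count stays at 0, so submission only stops at the safety cap; when the monitored CE is the one actually receiving jobs, the loop stops at the target.

```python
from dataclasses import dataclass


@dataclass
class CE:
    name: str
    rank: int         # higher rank = preferred by the matchmaker
    waiting: int = 0  # jobs queued at this CE


def match(ces):
    """Pick the best-ranked CE, as a rank-based matchmaker would."""
    return max(ces, key=lambda ce: ce.rank)


def feed_queue(ces, monitored, target_waiting, max_submissions):
    """Submit jobs until the *monitored* CE reports enough waiting jobs.

    Each job lands on the best-ranked CE; if `monitored` is a different CE,
    its count never grows and only `max_submissions` stops the loop.
    """
    submitted = 0
    while monitored.waiting < target_waiting and submitted < max_submissions:
        match(ces).waiting += 1
        submitted += 1
    return submitted


# Broken setup: jobs go to ce-a, but the submitter watches ce-c.
ces = [CE("ce-a", rank=10), CE("ce-b", rank=5), CE("ce-c", rank=1)]
n = feed_queue(ces, monitored=ces[2], target_waiting=20, max_submissions=1000)
print(n, [(ce.name, ce.waiting) for ce in ces])
# 1000 [('ce-a', 1000), ('ce-b', 0), ('ce-c', 0)]

# Consistent setup: the submitter watches the CE it actually feeds.
fixed = [CE("ce-a", rank=10), CE("ce-b", rank=5)]
m = feed_queue(fixed, monitored=fixed[0], target_waiting=20, max_submissions=1000)
print(m)  # 20
```

In this toy model, separating the T1 and T2 clusters into different VOBOXes roughly corresponds to the second setup: each submitter's feedback comes from the CEs its own jobs are matched to.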

Status of the CREAM-CE
- New sites providing CREAM-CE:
  - RU-SPbSU (under testing)
  - Prague (still to be tested)
  - Subatech (still to be tested)
- Already existing sites with production infrastructures:
  - FZK (just upgraded to the next version)
  - Kolkata (performing fine)
  - KISTI (no issues)
  - GSI (pending the setup in production)
  - RAL (no issues)
  - CNAF (no issues)
  - CERN (moving the system from SLC5 to SLC4 to increase the number of resources)
  - Torino (no issues)
  - SARA (no issues)