CREAM-CE for the 4 LHC VOs
CERN IT Department, CH-1211 Genève 23, Switzerland
CHEP 2009, Monday 23rd March 2009 (Prague)
Patricia Méndez Lorenzo on behalf of the IT/GS-EIS section (CERN), Massimo Sgaravatto (INFN-Padova), Jaime Frey (Condor/Wisconsin), Burt Holzman (FNAL), Frank Wuerthwein (UCSD), Sanjay Padhi (UCSD)

Outlook
This talk concentrates on the CREAM-CE from the applications' point of view:
1. Migration plan in WLCG: from LCG-CE to CREAM-CE
2. Implementation of the CREAM-CE into the experiment computing models
3. First experiences using the service
The talk does not contain technical specifications of the CREAM-CE service itself
– See: [143] "Using CREAM and CEMON for job submission and management in the gLite middleware" (Massimo Sgaravatto et al.), 26th March, in the session Grid Middleware and Networking Technologies

The CREAM-CE
CREAM (Computing Resource Execution And Management): a lightweight service for job management operations at the CE level
Planned as a replacement of the current LCG-CE
Submission procedures allowed by CREAM:
– Submission to CREAM via the WMS
– Direct submission via generic clients
The submission method depends basically on the experiment computing model
– Pilot-based models normally follow the direct submission approach (the 4 LHC experiments)
– Bulk submission of real jobs follows the WMS submission approach (CMS)
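For illustration, a minimal sketch of the direct submission mode through the generic gLite CREAM CLI is given below; the CE endpoint ce.example.org, the queue name and the file names are placeholders, not taken from the talk.

  # test.jdl: a minimal job description for direct CREAM submission
  [
    Executable = "test.sh";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandbox = {"test.sh"};
    OutputSandbox = {"std.out", "std.err"};
  ]

  # Submit directly to the CREAM CE with automatic proxy delegation (-a),
  # addressing the CE by its CREAM CE ID (host:port/cream-<lrms>-<queue>)
  glite-ce-job-submit -a -r ce.example.org:8443/cream-pbs-alice test.jdl
  # Query the status of the job ID returned by the submit command
  glite-ce-job-status <jobID>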

WLCG Statement
Nov 2008 (WLCG GDB Meeting). CREAM status:
– WMS submission via ICE to CREAM was still not available
– The proxy renewal mechanism of the system was not optimized
– In addition, Condor-G submission to CREAM was not enabled
At that point these factors were preventing the setup of the system in production (and also in testing) for ATLAS and CMS, but not for ALICE and LHCb
– The highest priorities for LHCb were (and are) glexec and SLC5
– The highest priority for ALICE was (and is) CREAM
ALICE began to put CREAM into production in Summer 2008
– One experiment was ready to stress the system
– A good opportunity for the experiment, the developers and the site admins to gain experience with the new system
WLCG encouraged sites to provide a CREAM-CE system in parallel to the current LCG-CE

The Migration Plan
(Some) technical issues:
– Condor-G submission to CREAM must be in production (ATLAS and CMS)
  Status: tests ongoing (CMS)
– ICE-enabled WMS must be in production (CMS)
  Status: the patch is ready for certification
– Robust proxy renewal mechanism (site admins)
  Status: already in production
(Some) scalability aspects (still to be checked):
– At least 5000 simultaneous jobs per CE node
– Unlimited number of user/role/submission-node combinations from many VOs (at least 50), up to the limit of the number of jobs supported on a CE node
– Job failure rates below 0.1% due to normal CE operations, restart and reboot
– 1 month of unattended running without significant performance degradation (ALICE experience)

Implementation and Experiences in ALICE
ALICE is interested in the deployment of the CREAM-CE service at all sites which provide support to the experiment
– GOAL: deprecation of WMS use in favour of direct CREAM-CE submission
– The WMS submission mode to the CREAM-CE is not required
  The experiment is therefore not limited by the issues observed with the WMS submission mode
– In addition, the proxy renewal feature is not required either
  48h VOMS extensions are ensured by the security infrastructure, enough to run production/analysis jobs without any additional extension
ALICE has been testing the CREAM-CE in the real production environment since the beginning of Summer 2008
ALICE priority list:
– CREAM-CE
– SLC5 (DONE)
– glexec/SCAS (beginning of Summer 2009)
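As an illustration of how 48h VOMS extensions remove the need for proxy renewal, a minimal sketch is given below; the 48h validity is the value quoted in the slide, while the rest is illustrative and assumes the VOMS server is configured to grant such attribute lifetimes.

  # Create a VOMS proxy whose attributes cover the whole job lifetime (48h)
  voms-proxy-init -voms alice -valid 48:00
  # Check the remaining lifetime of the proxy and of the VOMS attributes
  voms-proxy-info -all

With such a proxy, jobs submitted directly to the CREAM-CE (as in the earlier sketch) run to completion without any renewal machinery on the VOBOX.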

Implementation and Experiences in ALICE
The 1st test phase of the CREAM-CE:
– Performed in Summer 2008 at FZK (T1 site, Germany)
– Tests operated through a second VOBOX, in parallel to the already existing service at the T1 (operating in WMS submission mode)
– Access to the local CREAM-CE was ensured through the PPS infrastructure
  Initially 30 CPUs; moved to the ALICE production queue within a few weeks (production setup)
– Intensive functionality and stability tests from July to September 2008
  Production was then stopped to create an ALICE CREAM module in AliEn and to allow the site to upgrade its system
– Results: more than … jobs successfully executed through the CREAM-CE in the mentioned period
  No interventions in the VOBOX required during the testing phase

Implementation and Experiences in ALICE
The 2nd test phase of the CREAM-CE:
– Full implementation of the CREAM-CE into the AliEn environment
– After a debugging phase of the CREAM module in January 2009, the new CREAM module went into production on the 19th of February (start of the 2nd testing phase)
– Stability and performance are currently the most important test issues at the sites providing a CREAM-CE
– The deployment of a 2nd VOBOX ensures that production continues in parallel through the WMS
  A single VOBOX would require dedicated babysitting of the system
– Feedback on all issues is provided directly to the CREAM developers
– As of today, 12 sites are providing a CREAM-CE:
  CERN, KISTI, INFN-Torino, CNAF, RAL, FZK, Kolkata, IHEP, SARA, Subatech, GSI, Athens
– During the 2nd test phase more than … jobs have been successfully executed through all CREAM-CE systems

Implementation and Experiences in ATLAS
The experiment is interested in the direct submission feature of the system
ATLAS requires Condor-G submission to the CREAM-CE to be enabled
– At that point the implementation of the CREAM-CE into the ATLAS environment will be transparent
– We will see the same requirement in CMS
ATLAS backends:
– PanDA (the main backend for production and analysis jobs) will implement direct submission to the CREAM-CE via Condor-G
  Using pathena as UI
– WMS or local batch system, and also PanDA
  Using Ganga as UI

Implementation and Experiences in ATLAS
ATLAS has not yet performed any test with the CREAM-CE system
CREAM-CE testing is not scheduled as the highest priority for ATLAS
– Assuming no decrease of available resources
  A parallel LCG-CE vs. CREAM-CE setup at the sites ensures the maintenance of the resources
– ATLAS assumes a transparent transition to the CREAM-CE once Condor-G submission is enabled

Implementation and Experiences in CMS
Condor is able to submit to any Globus CE
– This procedure currently excludes NorduGrid and CREAM
– GAHP (Grid ASCII Helper Protocol) is used in both cases as a translator between NorduGrid/CREAM and Condor

Implementation and Experiences in CMS
Translation is made from Condor-G (considered as the upper layer) to CREAM
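A hedged sketch of what such a Condor-G submit description might look like when targeting a CREAM CE is shown below; the endpoint, batch system, queue and file names are placeholders, not taken from the talk.

  # Hypothetical Condor-G (grid universe) submit file targeting a CREAM CE
  universe      = grid
  grid_resource = cream https://ce.example.org:8443/ce-cream/services/CREAM2 pbs cms_queue
  executable    = analysis.sh
  output        = job.out
  error         = job.err
  log           = job.log
  queue

The file is handed to condor_submit; the CREAM GAHP server then translates the Condor-G requests into CREAM operations, as described above.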

Implementation and Experiences in CMS
CMS test schedule:
– 1st phase (01 Nov - 30 Nov 2008):
  Initial goal: initial communication and state translation performing well
– 2nd phase (01 Feb - 28 Feb 2009):
  Objective: to ensure the functionalities of Phase 1 with respect to any change in the CE software version
  Input/output sandbox inclusion, prototype the integration with the glideinWMS, small-scale user jobs with glideinWMS + CRAB/CRABServer
– 3rd phase (15 April - Summer 2009):
  Based on how EGEE moves the software from Pre-Production to Production/Certification, and on the status of the ICE-based WMS, this phase can have a wide range of goals
  – In terms of production implementation
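For the WMS-based mode that CMS also foresees (through an ICE-enabled WMS), a minimal JDL sketch that steers jobs to CREAM-based CEs is given below; the Requirements expression relies on the GLUE 1.3 attribute GlueCEImplementationName and, like the file names, is an illustrative assumption rather than a CMS recipe.

  # analysis.jdl: hypothetical WMS job description restricted to CREAM CEs
  [
    Executable = "analysis.sh";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandbox = {"analysis.sh"};
    OutputSandbox = {"std.out", "std.err"};
    Requirements = (other.GlueCEImplementationName == "CREAM");
  ]

  # Submitted through an ICE-enabled WMS with automatic delegation
  glite-wms-job-submit -a analysis.jdl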

Implementation and Experiences in LHCb
LHCb priority list:
– glexec tests
– SLC5
– CREAM-CE
Successful glexec and SLC5 tests would allow a start-up of the CREAM-CE tests in about 1 month
– LHCb faces a similar situation to ALICE
  Direct job submission mode
  Implementation of the APIs into DIRAC
  No requirements in terms of ICE/WMS, Condor-G, proxy renewal, …
– DIRAC was therefore ready to begin the implementation of the CREAM-CE at the same time as ALICE
The CREAM-CE will be implemented in DIRAC only when the CREAM-CE is deployed (at least) at all T1 sites
– DIRAC will not maintain different implementations for different sites and backends

Implementation and Experiences in LHCb
LHCb requirement for CEMon:
– LHCb requires the CEMon architecture available in CREAM (DONE)
  A general framework for managing and publishing information
  Allows synchronous and asynchronous connections
– The CEMon architecture foresees a core component and one or more sensors:
  CREAM job sensor
  CE sensor
  OSG sensor
– When submitting to CREAM via the WMS, CEMon with the CREAM job sensor is used
  CREAM includes the CEMon core and the CREAM and CE sensors
– When submitting in direct mode, CEMon is not used by default
  However, the use of CEMon in direct submission mode is not prevented:
  With the CREAM job sensor, to be notified about job status changes
  With the CREAM CE sensor, to be notified about the CE status → the LHCb use case

Summary and Conclusions
All 4 LHC experiments have expressed their interest in using the CREAM-CE in direct submission mode
– CMS also foresees the usage of the system via the WMS
Condor-G interfaced with the CREAM-CE is required by ATLAS and CMS
– CMS is currently testing this setup with quite promising results
The proxy renewal feature of CREAM is the only pending issue in terms of patch distribution to all sites
– ATLAS will wait until its full deployment
ALICE and LHCb can already use the current setup of the CREAM-CE
– ALICE has been successfully testing the system since Summer 2008
– LHCb foresees the testing of the system in about 1 month
WLCG encourages all LHC sites to provide CREAM-CE services to all experiments in parallel to the LCG-CE
– This will give the experiments the opportunity to check the system
  Provide good feedback to developers and site admins
  Thereby speeding up the migration phase