WLCG Operations Coordination report Maria Dimou / CERN With input and on behalf of the WLCG Operations Coordination team May 2015 GDB CERN indico event.


WLCG Operations Coordination report. Maria Dimou / CERN, with input and on behalf of the WLCG Operations Coordination team. May 2015 GDB, CERN (Indico event), 13 May 2015.

Outline
Two months have passed since the last such report at the GDB, due to the Okinawa Workshop & CHEP. This report therefore covers the March 19th, April 2nd, April 23rd (virtual) and May 7th WLCG Operations Coordination meetings. Minutes’ index here.
– Tier-0 news
– Experiment reports
– Status of Task Forces & Working Groups
– Other news
– Last word on the Site Survey
– Conclusions

Middleware news
StoRM v is now installed everywhere after MW Readiness testing by QMUL.
Many dCache versions were made available in these last 2 months:
– verified by the MW Readiness WG.
– announced as Golden Release.
– Reminder: end of June = end of support for v2.6.x.
An OpenSSL security fix was announced on March 19th. Fortunately, it did not affect WLCG.
ARGUS PAP 1.6.2, followed by were announced in this period to fix blocking issues linked with the Java upgrade.
FTS3 v is now available for better REST API performance & memory usage.
Torque sites should upgrade to version , or downgrade IF they have a problem with version , which was found not compatible with the standard EMI torque installation.

Tier-0 news
Since May 4th a batch HTCondor pilot (96 CPUs, 2 ARC CEs) has been available for grid submission. ATLAS and CMS are starting to use it. Other interested experiments should contact the Tier-0 managers.
Batch overload is the most probable reason for CERN’s low WLCG availability figures in March for ATLAS and CMS. Argus failures/overload should also be investigated.

Experiment news
ALICE
– High activity throughout this reporting period (2 months).
– Taking advantage of opportunistic resources.
– Started automatic replication to the Tier-1s in mid-March.
– Most VOBOXes have now moved to RFC proxies.
– CASTOR instabilities at CERN were observed multiple times.
ATLAS
– Data taking (cosmics, 1 beam...) ongoing. Data replication set up and commissioned.
– Experienced network degradation at/between some sites (CERN, TRIUMF-RAL).
– Reminder: all ATLAS sites should provide multicore resources.
– A Rucio/FTS issue causes a few missing files, due to a race condition when a job submission is killed and retried.
– Doubled the job length to improve the CPU/WALLCLOCK ratio of the jobs and to minimize the draining of resources at the sites.
– Getting ready for Run-2: Computing Run Coordinator shifts have started and the GGUS ALARMs’ workflow is being verified.
CMS
– Work with cosmics and MC.
– After the vomrs switch-off, voms-admin asked all the VO members to re-sign the AUP.
– Experienced temporary problems with various services (EOS, squid, CASTORs, some CEs).
– Experienced network saturation on busy weeks.
LHCb
– Stopped writing into LFC on May 11th. The LFC will stay in read-only mode until June; it will then be decommissioned by the Tier-0 service managers.
– Data access problems experienced with SARA. Network problems reported for RAL.

Task Forces (TF) news I
gLExec Deployment
– Testing campaign for gLExec in PanDA ongoing. 61 out of 94 sites have gLExec.
SHA-2 Migration TF
– The old VOMS server aliases (lcg-)voms.cern.ch were removed on Tue Apr 28th. All due publicity was made. This completes this TF’s work. Thanks for this long and hard effort!
RFC Proxies
– RFC proxy readiness to be followed up per experiment. ALICE: done (being used at almost all sites where this matters). CMS: users have been using RFC proxies for months.
– The SAM-Nagios proxy renewal code fix to support RFC proxies may no longer be needed after the SAM upgrade to UMD-3 (the latest VOMS client enforces the correct proxy type automatically).
Machine/Job Features
– Started working with the UK on deployment of MJF on batch system infrastructures. Other sites were also contacted; it is still a low-priority issue for them.

Task Forces (TF) news II
Multicore Deployment
– New schemes for passing memory parameters to the batch systems were tested for ARC-CE/HTCondor & CREAM sites in the UK.
IPv6 Validation & Deployment TF
– FTS3 testbed operational with servers at KIT and Imperial College.
– Many sites activated IPv6 on the LHCOPN & LHCONE.
– OSG is testing glideinWMS with IPv6. Observation: they run squid2 but only squid3 is IPv6 compliant.
– LHCb tested IPv6-compatible DIRAC and discovered a CERN Python library that returned a wrong IPv6 address, causing 50% of client connections to fail.
Squid Monitoring and HTTP Proxy Discovery
– NTR
HTTP Deployment
– The kick-off meeting took place on April 29th to agree on mandate, priorities, methods and documentation. Agenda here.

Working Groups (WG) news I
Middleware Readiness
– Two meetings took place since the last such GDB report, on March 18th & on May 6th (minutes).
– After one year of full Readiness Verification activity, the WG is making a checkpoint of goals and priorities.
– ATLAS and CMS were invited to review their workflow twikis for possible changes in the MW products to verify.
– LHCb and ALICE are invited to declare if, and for which products, they plan to contribute to the MW Readiness WG.
– The pakiti client is available for reporting the rpms under verification by the Volunteer sites. Installation instructions here.
– Another application is being developed to offer an automated way to display Baseline versions instead of the current, manually updated, table.
– All this is possible thanks to the dedication of the MW Officer, the quality code of our pakiti expert and, above all, the active participation of the Volunteer sites.
– The next Vidyo MW Readiness WG meeting will take place on Wednesday June 17th at 4pm CEST.

WG news II
Network and Transfer Metrics WG
– The WG held 3 meetings in this reporting period, on March 18th, April 8th & May 6th.
– All WLCG sites should make sure they run PerfSONAR toolkit v pSPS & have NDT/NPAD disabled & auto-update enabled.
– A network performance follow-up procedure was put in place. Document here.
– FTS performance study integrated in the WG.
– Preparing a dedicated GGUS Support Unit for network issues, clearly showing the distinction from the existing ones: “Network Operations” & “WLCG PerfSONAR Support”.
– Next meeting: June 3rd at 4pm CEST.

Other news
A presentation on Data Preservation activities was given at the April 2nd WLCG Ops Coord meeting. Slides here.
EGI, in a presentation given at the May 7th WLCG Ops Coord meeting, confirmed that CentOS7 is not yet a viable option for sites, but many products are ready for it and more will be in the coming months. Slides here.
There were 2 GGUS Releases since the last such GDB report, on March 23rd & April 29th. Release notes here. Next release on June 24th (no May release!).

A last word on the Site Survey
This was the first time a survey of this scale was launched in the 10 years of WLCG. It was also the first and most significant occasion to collect feedback from the Tier2 and Tier3 sites, with which meeting opportunities are not as frequent as desired. The WLCG Ops Coord Team is grateful to the site managers for their detailed feedback. Detailed presentations were given in Okinawa. All details here.
The analysis of the results showed possible areas of improvement, including:
– The reduction and simplification of services deployed at sites (to be followed up with MW providers).
– The documentation quality & ease of retrieval (an operations portal).
– The information exchange within the community.
– The adoption of common solutions & industry standards.
The WLCG Operations team will have to define and implement a roadmap describing all these measures in detail in order to optimise the WLCG operational cost.

Conclusions
Cost optimisation conclusions were drawn from the Site Survey, presented at Okinawa and written up in the relevant CHEP paper.
Next meeting on May 21st at 3:30pm CEST.