www.egi.eu EGI-InSPIRE RI-261323. AMOD report 29.8.2011 – 4.9.2011. Fernando H. Barreiro Megino, CERN-IT-ES-VOS.

Summary
- Technical stop for LHC and ATLAS: no shifters during the whole week; stable beams expected tomorrow
- Overall a very quiet week. The few highlights were caused by:
  - Interventions scheduled during this period
  - Hurricane Irene
  - HLT reprocessing problems

DDM summary
[Plots: all activities, sources, destinations]

Panda summary
[Plot: Panda summary]

Monday
- CERN LFC migration
  - Sunday: CERN set offline in Panda
  - Monday 8:30: CERN excluded from DDM
  - Intervention as planned, ~9:20-11:30: update of the LFC information in information systems and services
  - 12:30: CERN re-included in DDM
  - ~17:00: CERN online in Panda
- CERN CASTORATLAS DB patch and defragmentation
- CERN ATONR rolling intervention
- BNL emergency downtime, Saturday evening to Monday evening
  - Frontier fallback mechanisms failed; US cloud set offline in Panda
  - ~19:00: back in DDM
  - ~22:30: test jobs and cloud online in Panda
  - Decision to postpone the dCache upgrade until the next technical stop
- ASGC in downtime for CASTOR upgrade to
- ASGC affected by a broken 10Gbps link between Chicago and Amsterdam

Tuesday, Wednesday
- CERN myproxy.cern.ch machine migration: transparent for ATLAS, less so for the other experiments
- CERN ATLR, ATCR & ATLDSC rolling intervention
  - ATLDSC migration problematic: a gap of transactions was created after restarting
  - Replication of conditions data to the T1s stopped on Wednesday between 12:30 and 16:30
- CERN: "Invalid SRM version [] for endpoint []" error (GGUS 73918) affecting T2-CERN transfers
  - Disappeared with the CERN FTS T2 OS upgrade
- ASGC network maintenance

Thursday, Friday
- Victor stopped with the reboot of its machine; it is now a critical service
  - Consequences: Victor should restart automatically with a machine reboot
  - Monitoring proposal: implement SLS monitoring in the ADC Central Services category. If it shows Victor as unavailable, shifters should check Victor's webpage and/or notify AMOD

Saturday, Sunday
Errors involving T1s, mostly caused by high load, in particular on the MCTAPE space tokens:
- To TAIWAN-LCG2_MCTAPE: destination file failed on the SRM with error [SRM_ABORTED]
  GGUS, Felix Lee: “We found a lot of jobs are queuing in Castor transfer manager, which might cause srm aborting new transfer, we have increased slot to see if things can be improved.”
- To RAL-LCG2_MCTAPE: destination file failed on the SRM with error [SRM_ABORTED]
  GGUS, Alastair Dewhurst: “MCTAPE is a quite underpowered space token at RAL. It appears the problem is just high usage and it can't cope. The immediate solution is to reduce the number of FTS transfers allowed into it and then during the working week add some more hardware so it can cope with this kind of load.”
- SARA-MATRIX: Get error: dccp failed with output
  GGUS, Onno Zweers: “pool node crashed with a kernel panic”
- PIC_DATADISK: [GENERAL_FAILURE] AsyncWait
  GGUS, Fernando Lopez: “This problem was caused due a lot of queued transfers in our PoolManager. Transfers has been forced manually and now all seems ok”
- BNL-OSG2_DATADISK: First non-zero marker not received within 300 seconds
  GGUS, Jane: “As checked, quite some pools were in high load and shown as not seen, which should be the cause of the failure. Now, the system is in good shape and I don't see problems. As tested, the transfer of the file was also fine. The problem should be gone.”

Sunday
- Problem on voatlas66: DDM Central Deletion stopped 11:30-21:30
- RAL-LCG2: network problems during the whole night; all services unavailable during this period