www.egi.eu EGI-InSPIRE RI-261323
AMOD report 29.8.2011 – 4.9.2011
Fernando H. Barreiro Megino, CERN-IT-ES-VOS



2 Summary
- Technical stop for LHC and ATLAS
- No Comp@P1 shifters during the whole week
- Stable beam expected for tomorrow
- Overall a very quiet week. The few highlights were caused by:
  - Interventions scheduled during this period
  - Hurricane Irene
  - HLT reprocessing problems

3 DDM summary 29.8.2011 – 6.9.2011
[Plots: All activities; Sources; Destinations]

4 Panda summary

5 Monday 29.8.2011
- CERN LFC migration
  - Sunday: CERN set offline in Panda
  - Monday 8:30: CERN excluded from DDM
  - Intervention as planned: ~9:20-11:30
  - Update of LFC information in information systems and services
  - 12:30: CERN re-included in DDM
  - ~17:00: CERN online in Panda
- CERN CASTORATLAS DB patch and defragmentation
- CERN ATONR rolling intervention
- BNL emergency downtime from Saturday evening to Monday evening
  - Frontier fallback mechanisms failed; US cloud offline in Panda
  - ~19:00: back in DDM
  - ~22:30: test jobs and cloud online in Panda
  - Decision to postpone the dCache upgrade until the next technical stop
- ASGC in downtime for CASTOR upgrade to 2.1.11-2
- ASGC affected by a broken 10 Gbps link between Chicago and Amsterdam

6 Tuesday 30.8.2011 – Wednesday 31.8.2011
- CERN myproxy.cern.ch machine migration: transparent for ATLAS, less so for the other experiments
- CERN ATLR, ATCR & ATLDSC rolling intervention
  - ATLDSC migration problematic: a gap of transactions was created after restarting
  - Replication of conditions data to the T1s stopped on Wednesday between 12:30 and 16:30
- CERN: "Invalid SRM version [] for endpoint []" error (GGUS 73918) affecting T2-CERN transfers
  - Disappeared with the CERN FTS T2 OS upgrade
- ASGC network maintenance

7 Thursday 1.9.2011 – Friday 2.9.2011
- Victor stopped with a reboot of its machine on 23.8.2011
  - Victor is now a critical service
- Consequences: Victor should restart automatically when the machine reboots
- Monitoring proposal: implement SLS monitoring in the ADC Central Services category
  - If it shows Victor as unavailable, shifters should check Victor's webpage and/or notify the AMOD

8 Saturday 3.9.2011 – Sunday 4.9.2011
- Errors involving T1s, mostly because of high load, in particular on MCTAPE:
  - To TAIWAN-LCG2_MCTAPE: destination file failed on the SRM with error [SRM_ABORTED] (GGUS 74039)
    Felix Lee: "We found a lot of jobs are queuing in Castor transfer manager, which might cause srm aborting new transfer, we have increased slot to see if things can be improved."
  - To RAL-LCG2_MCTAPE: destination file failed on the SRM with error [SRM_ABORTED] (GGUS 74041)
    Alastair Dewhurst: "MCTAPE is a quite underpowered space token at RAL. It appears the problem is just high usage and it can't cope. The immediate solution is to reduce the number of FTS transfers allowed into it and then during the working week add some more hardware so it can cope with this kind of load."
- SARA-MATRIX: Get error: dccp failed with output (GGUS 74043)
  Onno Zweers: "pool node crashed with a kernel panic"
- PIC_DATADISK: [GENERAL_FAILURE] AsyncWait (GGUS 74045)
  Fernando Lopez: "This problem was caused due a lot of queued transfers in our PoolManager. Transfers has been forced manually and now all seems ok"
- BNL-OSG2_DATADISK: First non-zero marker not received within 300 seconds (GGUS 74050)
  Jane: "As checked, quite some pools were in high load and shown as not seen, which should be the cause of the failure. Now, the system is in good shape and I don't see problems. As tested, the transfer of the file was also fine. The problem should be gone."

9 Sunday 4.9.2011
- Problem on voatlas66: DDM Central Deletion stopped from 11:30 to 21:30
- RAL-LCG2: network problems during the whole night; all services unavailable during this period

