Take the summary from the table on


1 Take the summary from the table on
GGUS summary (3 weeks)

VO      User  Team  Alarm  Total
ALICE
ATLAS
CMS
LHCb
Totals

To calculate the totals for this slide and copy/paste the usual graph, please:
1. Take the summary from the table on
2. Add them in
3. Add up the last 3 weeks, i.e. 25-Apr + 2-May + 9-May, and put them in this table.
4. Copy/paste the graph from the .xls file of point 2 above.
5. In case of trouble, contact maria.dimou, but you’ll have to wait until Monday evening as I am on official training without computer access.
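The addition in step 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the week labels come from the slide, but every count below is a made-up placeholder (the slide's table is empty), and the names `weekly_counts` and `three_week_summary` are hypothetical, not part of GGUS or of the .xls file.

```python
# Sketch: add up three weekly GGUS summaries per VO and ticket type.
# All counts below are made-up placeholders, NOT real GGUS data.
TICKET_TYPES = ("user", "team", "alarm")

# Hypothetical weekly summaries: week label -> VO -> (user, team, alarm).
weekly_counts = {
    "25-Apr": {"ALICE": (1, 0, 0), "ATLAS": (2, 3, 1), "CMS": (1, 1, 0), "LHCb": (0, 2, 0)},
    "2-May": {"ALICE": (0, 1, 0), "ATLAS": (3, 2, 2), "CMS": (2, 0, 1), "LHCb": (1, 1, 0)},
    "9-May": {"ALICE": (2, 0, 0), "ATLAS": (1, 4, 0), "CMS": (0, 2, 0), "LHCb": (1, 0, 0)},
}


def three_week_summary(weekly):
    """Return VO -> {user, team, alarm, total}, plus a 'Totals' row."""
    sums = {}
    for per_vo in weekly.values():
        for vo, counts in per_vo.items():
            row = sums.setdefault(vo, [0] * len(TICKET_TYPES))
            for i, count in enumerate(counts):
                row[i] += count
    # Per-VO rows with a row total.
    summary = {
        vo: dict(zip(TICKET_TYPES, row), total=sum(row)) for vo, row in sums.items()
    }
    # Column totals across all VOs form the final 'Totals' row.
    column_sums = [
        sum(row[i] for row in sums.values()) for i in range(len(TICKET_TYPES))
    ]
    summary["Totals"] = dict(zip(TICKET_TYPES, column_sums), total=sum(column_sums))
    return summary


summary = three_week_summary(weekly_counts)
for vo, row in summary.items():
    print(f"{vo:<7}{row['user']:>5}{row['team']:>6}{row['alarm']:>7}{row['total']:>7}")
```

The printed rows follow the same column order as the slide's table (VO, User, Team, Alarm, Total), so real weekly figures could be substituted for the placeholders and the result pasted in.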

2 Support-related events since last MB
There were 6 real ALARM tickets since the 2011/04/19 MB (3 weeks): 5 submitted by ATLAS and 1 by CMS, all ‘solved’ and ‘verified’.
Notified sites were: CERN, IN2P3, NDGF, FZK.
Details follow…
9/13/2018 WLCG MB Report WLCG Service Report

3 ATLAS ALARM->CERN CASTOR no read or write possible on T0merge GGUS:69954
What time UTC     What happened
2011/04/23 05:03  SATURDAY. GGUS ALARM ticket, automatic notification to [...] AND automatic assignment to ROC_CERN. Automatic SNOW ticket creation successful.
2011/04/23 05:21  Operator acknowledges and records in the GGUS ticket that the Castor piquet was called.
2011/04/23 08:11  Submitter comments that the problem is gone.
2011/04/24 06:08  SUNDAY. Submitter puts the ticket to status ‘solved’ and ‘verified’.

4 ATLAS ALARM->CERN CASTOR no read or write possible on T0merge GGUS:69992
What time UTC     What happened
2011/04/26 05:36  GGUS ALARM ticket, automatic notification to [...] AND automatic assignment to ROC_CERN. Automatic SNOW ticket creation successful.
2011/04/26 05:53  Operator acknowledges and records in the GGUS ticket that the Castor service was contacted. Same problem as on previous slide, i.e. GGUS:69954.
2011/04/26 06:45  Service mgr finds too many requests queued for the pool to process in parallel.
2011/04/26 08:03  Submitter says the problem is gone but occurred twice at the same time.
2011/04/26 09:35  Service mgr checks with the DBA and discovers that the node serves production and backup services by mistake left from a recent upgrade.
2011/04/26 09:59  Service mgr puts to ‘solved’. Backup activity moved elsewhere.
2011/05/02 08:16  Submitter sets to ‘verified’.
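As a rough illustration of how the response times in such a timeline can be read off, the sketch below computes the minutes elapsed since ticket creation for a few of the GGUS:69992 events. The timestamps are the ones listed for this ticket; the short labels and the names `timeline` and `minutes_since_creation` are assumptions introduced here, not part of the report.

```python
from datetime import datetime

FMT = "%Y/%m/%d %H:%M"  # timestamp layout used in the slides (UTC)

# Selected GGUS:69992 events; labels are shortened paraphrases.
timeline = [
    ("2011/04/26 05:36", "ALARM ticket created"),
    ("2011/04/26 05:53", "operator acknowledges"),
    ("2011/04/26 09:59", "set to 'solved'"),
    ("2011/05/02 08:16", "set to 'verified'"),
]


def minutes_since_creation(events):
    """Return (label, whole minutes elapsed since the first event) pairs."""
    start = datetime.strptime(events[0][0], FMT)
    result = []
    for stamp, label in events:
        delta = datetime.strptime(stamp, FMT) - start
        result.append((label, int(delta.total_seconds() // 60)))
    return result


elapsed = minutes_since_creation(timeline)
for label, minutes in elapsed:
    print(f"{minutes:>6} min  {label}")
```

With these timestamps the acknowledgement comes 17 minutes after creation and the ‘solved’ status after 263 minutes (4 h 23 min).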

5 ATLAS ALARM-> IN2P3 gridftp GGUS:70113
What time UTC     What happened
2011/04/28 21:57  GGUS TEAM ticket, automatic notification to [...] AND automatic assignment to NGI_FR.
2011/04/29 01:32  More information and examples from panda on failing jobs.
2011/04/29 07:17  Ticket upgrade to ALARM. Notification sent to [...]. Added dashboard information.
2011/04/29 07:20  Automatic IN2P3 acknowledgement of an ALARM registration.
2011/04/29 07:49  Service mgr finds gftp process stopped since 23:30.
2011/04/29 08:16  Service mgr restarts gftp service on ccdcatli012. A note follows on the French ticketing system internals for handling TEAM-ALARM upgrades.
2011/04/29 09:04  ALARMer sets to ‘solved’ and ‘verified’.

6 ATLAS ALARM -> srmNDGF from/to failures GGUS:70157
What time UTC     What happened
2011/04/30 05:38  SATURDAY. GGUS TEAM ticket, automatic notification to [...] AND automatic assignment to NGI_NDGF.
2011/04/30 06:12  Ticket upgrade to ALARM. Same notification published by NDGF, i.e. [...]. Is this the way they want it!!??
2011/04/30 06:13  Attached dashboard info.
2011/04/30 07:42  Site mgr reports dCache ws found stuck maybe due to too many staging requests. He sets the ticket to status ‘solved’ while suggesting further investigation.
2011/04/30 08:04  The supporter who upgraded the ticket to an ALARM sets it to ‘verified’. Where will the above investigation take place?

7 CMS ALARM -> CERN CMSR DB hung GGUS:70114
What time UTC     What happened
2011/04/28 22:03  GGUS ALARM ticket, automatic notification to [...] AND automatic assignment to ROC_CERN. Automatic SNOW ticket creation successful.
2011/04/28 22:28  Operator acknowledges and records in the GGUS ticket that the Castor piquet was called. This was a mistake. The DB should have been contacted.
2011/04/28 23:04  DB expert assigns the ticket to herself and sets the ticket to status ‘solved’ while suggesting further investigation.
2011/04/28 23:44  The submitter sets it to ‘verified’. Where will the above investigation take place?

8 ATLAS ALARM -> Transfer errors from/to FZK GGUS:70119
What time UTC     What happened
2011/05/02 19:16  GGUS TEAM ticket, automatic notification to [...] AND automatic assignment to NGI_DE.
2011/05/02 19:20  More evidence of the problem by another shifter/ALARMer.
2011/05/02 19:40  Priority raised from ‘urgent’ to ‘very urgent’.
2011/05/02 19:56  Site sets the ticket to ‘solved’ without detailed solution.
2011/05/02 19:58  Ticket upgrade from TEAM to ALARM. Notification sent to [...].
2011/05/02 20:02  Link to more evidence from panda on job failures.
2011/05/02 20:11  Site again sets the ticket to ‘solved’ without detailed solution.
2011/05/02 20:14  Exchange between ALARMer and site on the actual time of the solution.
2011/05/03 09:56  ALARMer sets the ticket to status ‘verified’.

