CERN IT Department CH-1211 Geneva 23 Switzerland www.cern.ch/it Problem management AI Thursday meeting 02/10/2014.


1 CERN IT Department CH-1211 Geneva 23 Switzerland www.cern.ch/it
Problem management
AI Thursday meeting 02/10/2014

2 Background
Existing problem management meetings
  – Monthly, dedicated
  – Original goal was to follow up and reduce the ITCM events handled by the SysAdmin and CCOperators teams; thanks to Anthony for the good work
New monitoring framework offers more possibilities to deal with notifications
  – Time for a redefinition of the goals
The department decided to evolve problem management to involve all IT services

3 Problem management: scope
ITIL: Ensure stability in services by identifying root causes and removing errors in the infrastructure.
A Problem is the unknown underlying cause of one or more Incidents
The goal of problem management is to minimise both the number and severity of incidents and potential problems
  – User-reported incidents versus automated tool-generated events (GNI notifications)
  – Our problem management effort will focus on the latter

4 Problem management: goal
What do we want to do? Move forward with automation, from automated problem detection/notification to automated recovery actions
How? By taking advantage of the capabilities of the toolset in use (Puppet, GNI monitoring, SNOW) to deal with notifications at the most efficient level
  – Instead of sending them to the CCOperators/sysadmins
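As a rough illustration of handling a notification at the most efficient level, the Python sketch below tries an automated recovery action first and escalates to a human team only when no action is registered or the action fails. Every name in it (`Notification`, `RECOVERY_ACTIONS`, `restart_daemon`, `handle`, the alarm names) is hypothetical; it does not use or represent any Puppet, GNI or SNOW API.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Notification:
    service: str  # service that raised the notification
    alarm: str    # hypothetical alarm name
    host: str     # affected host


def restart_daemon(n: Notification) -> bool:
    """Placeholder recovery action; a real one would act on n.host."""
    return True


# Hypothetical registry mapping alarm names to recovery actions.
RECOVERY_ACTIONS: Dict[str, Callable[[Notification], bool]] = {
    "daemon_down": restart_daemon,
}


def handle(n: Notification) -> str:
    """Return 'recovered' if an automated action succeeded, else 'escalated'."""
    action = RECOVERY_ACTIONS.get(n.alarm)
    if action is not None:
        try:
            if action(n):
                return "recovered"
        except Exception:
            pass  # a failing action falls through to escalation
    return "escalated"
```

With this layout, automating one more procedure means registering one more function, while anything not yet automated still reaches a person.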

5 In detail
Problem detection: AI monitoring
  – Define metrics/sensors/notifications for your service (plus use of the existing ones)
Problem logging: several possibilities
  – GNI dashboard
  – SNOW ticket
  – Dedicated consumers
Problem categorization:
  – Which service? Which FE? If SNOW ticket, which support group?
Problem diagnosis and solution:
  – Ad-hoc, procedure or automated?
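The logging, categorization and diagnosis choices above could be recorded per notification in a small routing table. The sketch below is one possible shape, not an existing tool: the `Route` fields, the `ROUTES` entries, the alarm names and the support group name are all invented for illustration, and "FE" is read here as "functional element", which is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Tuple


class LogTarget(Enum):
    DASHBOARD = "GNI dashboard"
    TICKET = "SNOW ticket"
    CONSUMER = "dedicated consumer"


class Resolution(Enum):
    AD_HOC = "ad-hoc"
    PROCEDURE = "procedure"
    AUTOMATED = "automated"


@dataclass
class Route:
    service: str
    functional_element: str              # assumed meaning of "FE"
    target: LogTarget
    resolution: Resolution
    support_group: Optional[str] = None  # only meaningful for tickets

    def __post_init__(self) -> None:
        # A ticket cannot be assigned without a support group.
        if self.target is LogTarget.TICKET and not self.support_group:
            raise ValueError("a ticket route needs a support group")


# Hypothetical routing table keyed by (service, alarm name).
ROUTES: Dict[Tuple[str, str], Route] = {
    ("batch", "worker_unresponsive"): Route(
        "batch", "Example Batch FE", LogTarget.CONSUMER, Resolution.AUTOMATED),
    ("batch", "scheduler_down"): Route(
        "batch", "Example Batch FE", LogTarget.TICKET, Resolution.PROCEDURE,
        support_group="Example Batch Support"),
}


def route_for(service: str, alarm: str) -> Route:
    """Look up the route; unknown alarms default to the dashboard, ad-hoc."""
    default = Route(service, "unknown", LogTarget.DASHBOARD, Resolution.AD_HOC)
    return ROUTES.get((service, alarm), default)
```

Defaulting unknown alarms to the dashboard keeps new notifications visible without opening tickets before anyone has decided how they should be handled.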

6 Next steps
No dedicated effort; instead, part of service manager duties (like config management, monitoring, etc.)
What we expect from service managers:
Decide how to set up notifications for your service
  – More details in Miguel's talk
Start thinking about which procedures you can automate
  – Some of the existing ops procedures?
  – Recovery actions for newly defined service notifications?
  – See Jerome's talk for an example (batch)

