
1 Problem Management Familiarisation Training
Michael Hall
Real-World IT
www.real-worldit.com

2 What We Cover
© Real-World IT 2014 All rights reserved
• Incidents and Problems – What’s the Difference?
• Definition of Problem Management
• Difference from Incident Management
• Keys to Success
• Advantages of structured problem solving
• How we work together – the ‘Rules of Engagement’
• How we run problems - logistics
• How we run problems - structure
• Appendix
  - Solving problems as a group
  - Benefits to the business and IT teams
  - Key Performance Indicators

3 Incidents and Problems – What’s the Difference?
ITIL® 2011 defines an incident as:
• An unplanned interruption to a business service,
• A reduction in the quality of a service, or
• The failure of a CI that has not yet impacted a service
(2011, Service Operation, p 72)
While a problem is defined as:
• The underlying cause of one or more incidents
(2011, Service Operation, p 97)
In other words:
• Incidents stop services being useful
• Problems are why they happen
ITIL® is a Registered Trade Mark of AXELOS Limited

4 So what is Problem Management?
ITIL® 2011 defines the objectives of Problem Management as:
• “Prevent problems and resulting incidents from happening
• Eliminate recurring incidents
• Minimise the impact of incidents that cannot be prevented”
(2011, Service Operation, p 97)
In other words:
• Finding the cause and
• Fixing it so it cannot happen again
• Will make a difference to stability

5 How is it different from Incident Management?
• Incident management is focused on restoring service
  - Reactive by nature
  - Minimise time to restore
  - Minimise business impact
  - But by itself does not reduce the long-term incidence of service interruptions
• Problem management is focused on identifying the root cause
  - Establish the real reason for the incident
  - Execute a plan to fix the cause permanently

6 What Makes Problem Management Successful?
• Keys to success:
  - Good handover from Incident Mgmt
  - Structured investigation methods
  - Key staff collaborate on investigation
  - Confirm root cause, then work out how to fix it
  - Fully costed solution options
  - Only proceed with fixes when approved
  - Check solution really has fixed the cause
  - Only close problems when definitely fixed
  - Report on success – what won’t happen again
[Slide diagram: Problem Detection, Problem Investigation, Error Resolution, Review and Closure; milestone: Root Cause Confirmed]

7 Why Adopt Structured Problem Solving?
• Use structured methods to improve
  - Speed to root cause - standard approach
  - Consistency - based on evidence
  - Certainty that real causes are found
  - Collaboration – teams know what to expect
• Repeatable process* used every time:
  - Define the problem precisely
  - Use rapid analysis first for root cause
    ◦ Why did that object have that fault?
    ◦ Repeat until cause is clear – 4 to 6 questions
  - If rapid analysis does not reveal cause, move to IS/BUT NOT and possible causes
    ◦ Identify more about the problem
    ◦ Find possible causes
    ◦ Test each logically to confirm true cause
  - Decide how to fix the problem
    ◦ Develop options and choose most effective
    ◦ Confirm actions and costs with customer
  - Implement the solution
  - Verify the problem has been eliminated
• Clear communication to customers of what happened and why, plus how and when a permanent fix will be deployed
*This is KEPNERandFOURIE, insert your preferred method here
[Slide graphic: 62%]

8 How we work together - the ‘Rules of Engagement’
• Assign no blame
  - What happened and why?
  - What can be done to stop it happening again?
  - Human error? No such thing!
• Attend problem sessions
  - Problem solving as a group of experts
  - Clear statement of the problem
  - Gather all the facts
  - List all possible causes
• Suspend Judgement
  - Keep an open mind
  - Assume you do not know cause
  - Fit theories to facts, not the other way around
• Make problem tasks a priority
  - Take responsibility for your tasks
  - Make your management aware
  - Raise conflicts to the problem manager
  - Don’t close tasks without review and approval

9 How we run problem investigations - logistics
Problem investigations are usually run as bridge calls. Join the bridge when asked. Who will be engaged is usually negotiated in advance with your management.
• Bridge Lines
  - Dial-in details will always be in the invitation
• Timing
  - For major problems, problem management should hold the initial call within 24 hours of service restoration
  - Aim is to keep the evidence fresh in people’s minds
  - As many follow-up calls as necessary are held to get to cause and then resolution
• Reporting
  - Problem management has a commitment to produce regular progress reports
  - The first is published immediately after the first analysis call is held
  - Follow-up reports are published as required: regularly while root cause is being investigated, then at significant milestones until resolved
  - The problem investigation team is always copied on these reports

10 How we run problem investigations – structure – 1
All problem investigations have a standard agenda that we always follow. The aim is to confirm root cause as quickly as possible – ‘Root Cause Analysis’ - then to decide what the permanent fix should be – ‘Problem Resolution’ (also called ‘Error Resolution’).
• Define the problem clearly
  - Start from technical cause, if found during incident response. If not, determine technical cause first
  - Make sure this statement is about one object with one fault
  - This statement is always an event in time
• Use a rapid analysis strategy first to determine root cause, using a cascade of questions:
  - Ask ‘Why did that object have that fault?’
  - Repeat until the underlying cause is clear (usually four to six questions does it)
  - Remember that the test for a root cause is: ‘If I fix this, will it stop repeats of the incident?’
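The cascade of ‘why’ questions on this slide can be sketched in a few lines of Python. This is only an illustration of the loop, not a tool from the training: the names `rapid_analysis`, `ask` and `stops_repeats` are hypothetical, the canned answers are invented, and in a real session both the answers and the root-cause test (‘If I fix this, will it stop repeats of the incident?’) come from the people on the call.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class WhyStep:
    question: str
    answer: str


def rapid_analysis(
    obj: str,
    fault: str,
    ask: Callable[[str], Optional[str]],
    stops_repeats: Callable[[str], bool],
    max_questions: int = 6,
) -> Tuple[List[WhyStep], Optional[str]]:
    """Ask 'why?' repeatedly until an answer passes the root-cause test.

    Returns the chain of questions and answers, plus the root cause,
    or None if the cascade stalls or runs out of questions.
    """
    chain: List[WhyStep] = []
    # The problem statement is one object with one fault.
    question = f"Why did {obj} have the fault '{fault}'?"
    for _ in range(max_questions):
        answer = ask(question)
        if answer is None:  # nobody can answer: rapid analysis has stalled
            break
        chain.append(WhyStep(question, answer))
        if stops_repeats(answer):  # the slide's test for a root cause
            return chain, answer
        question = f"Why: {answer}?"
    return chain, None


# Illustrative session with canned answers (invented for this example).
canned = iter([
    "its data disk filled up",
    "log files were never rotated",
    "the log rotation job was never deployed to the new server",
])
chain, root_cause = rapid_analysis(
    "the database server",
    "stopped accepting writes",
    ask=lambda q: next(canned, None),
    stops_repeats=lambda a: "never deployed" in a,
)
```

When the function returns `None` as the root cause, that corresponds to the point where the investigation would move on to the fuller analysis rather than keep guessing.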

11 How we run problem investigations – structure – 2
• If rapid analysis does not reveal root cause easily, move on to an ‘IS/BUT NOT’ and possible causes analysis:
  - Use the KEPNERandFOURIE method to ask 8 questions about the problem. For each question, also quickly brainstorm possible causes before moving to the next.
  - Do not judge or try to filter at this point. Suspend judgement and record all suggestions.
  - Select the most likely of the possible causes for verification (usually up to 4 or 5).
  - Test each logically using the KEPNERandFOURIE technique
  - The cause that meets all criteria is the most likely root cause
  - Remember that there can also be contributing factors and multiple causes that occur together to cause the problem
Note: This structure is based on the KEPNERandFOURIE™ methodology. Substitute the methodology as required
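The idea of testing each possible cause logically, and keeping the one that meets all criteria, can be illustrated with a generic sketch. This is not the KEPNERandFOURIE technique itself, whose eight questions are not reproduced in these slides. It assumes only that the team has recorded a set of IS and BUT NOT facts and has judged, for each possible cause, which of those facts the cause can explain. All names and sample data below are hypothetical.

```python
from typing import Dict, List, Set


def unexplained_facts(
    facts: Set[str], explained_by: Dict[str, Set[str]]
) -> Dict[str, List[str]]:
    """For each possible cause, list the recorded facts it cannot explain."""
    return {
        cause: sorted(facts - explained)
        for cause, explained in explained_by.items()
    }


def causes_meeting_all_criteria(gaps: Dict[str, List[str]]) -> List[str]:
    """Keep only the causes that leave no fact unexplained."""
    return [cause for cause, missing in gaps.items() if not missing]


# Invented facts from an imaginary investigation.
facts = {
    "IS: writes fail on server A",
    "IS: first seen after Tuesday's change",
    "BUT NOT: server B, which runs the same release",
}

# The team's judgement of what each shortlisted cause can explain.
explained_by = {
    "faulty software release": {
        "IS: writes fail on server A",
        "IS: first seen after Tuesday's change",
    },
    "data disk full on server A": set(facts),
}

gaps = unexplained_facts(facts, explained_by)
likely = causes_meeting_all_criteria(gaps)
```

A cause with an empty gap list is only the most likely root cause; as the slide notes, contributing factors and multiple causes acting together still need to be considered.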

12 How we run problem investigations – structure – 3
Once cause is found and confirmed, you are halfway there. The next step is to develop a solution and implement it.
• Develop a solution for the cause or causes
  - Develop options and decide the best and most cost-effective
  - Obtain approval for implementation, including spend, timing and resources
• Implement the solution
  - Track any implementation steps and dates until the solution is in place
• Verify the problem has been eliminated
  - Has the solution prevented future incidents?
• Report success to our customers
  - Make sure people know that the problem is solved

13 Appendix
• Solving problems as a group
• Benefits to the business and IT teams
• Key Performance Indicators

14 Why solve problems as a group?
• Evidence shows that group problem-solving is more effective
  - More effective than individual efforts or investigations run by groups of like-minded individuals
  - Diversity in perspective and ways of thinking leads to better outcomes than even the best problem solvers
  - Humans are ‘good at producing convincing arguments, but we are also adept at puncturing other people’s faulty reasoning’
• Mix of ‘insiders and outsiders’ critical
  - Too many insiders leads to uniform thinking, while too many outsiders dampens the free exchange of ideas
  - Ignoring the ‘outside view’ limits our thinking
  - Diverse groups generate many more interesting ideas to help solve problems
  - So take advantage of it!
References: Hong and Page (2004), Jones D (2012), Kahneman (2012)

15 What are the Business Benefits?
• Problem management increases the overall stability of systems supporting business operations
• Directly
  - Fixing problems – finding root causes and resolving them
  - Not just visible problems – potential problems as well
  - Reducing the number of recurring incidents
• Indirectly
  - Influence design of applications and infrastructure
  - Improved audit and regulatory compliance – effective governance seen to be done
• Cross silo and cross regional resolutions
  - Problems addressed comprehensively
    ◦ Identify all instances that could be affected
    ◦ Resolution plans with global reach
• Improve visibility of problems
  - Regular reporting to senior management
    ◦ Accountability for driving improvement
  - KPIs in place to measure success
    ◦ Make performance visible to all

16 And what are the Benefits for IS Teams?
• Improved stability
  - Gets rid of recurring problems
  - Reduces fire-fighting – more time for “value-add” work
• Framework for problem solving
  - Gives more confidence when attacking problems
  - Makes engaging the right people much easier
  - Clarifies which problems are being worked on
  - And who owns each problem
  - Results in happier customers and colleagues
• Satisfaction
  - Finds the cause without witch hunts or blame
  - Quickly shares knowledge about problems for everyone’s benefit
  - Makes working together across teams much easier
  - Delivers a higher success rate and a faster turnaround
  - Automates reporting to replace tedious manual work

17 Measuring the difference we make – Key Performance Indicators (KPIs)
• % of root causes found: 95% of root causes will be found
• % of root causes found within 5 working days: root cause will be found within 5 working days 80% of the time
• Number of recurring problems: zero recurring problems
• % of problems resolved within agreed time frame: 90% of problems will be resolved in the time frame agreed by management
• % reduction in incidents from agreed baseline: 25% reduction in incidents per period
These KPIs will go live with looser targets that will tighten as the problem management implementation progresses
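As a rough illustration of the arithmetic behind these KPIs, the sketch below computes them from a list of problem records. The record fields, the function names and the sample figures are all assumptions made for this example. The slide does not say which denominator applies to the ‘within 5 working days’ measure, so this sketch assumes it is all problems in the period.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ProblemRecord:
    # Working days taken to confirm root cause; None if not yet found.
    days_to_root_cause: Optional[int]
    resolved_within_agreed_time: bool
    recurred: bool


def pct(part: float, whole: float) -> float:
    """Percentage of part in whole, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0


def kpi_report(
    problems: List[ProblemRecord],
    baseline_incidents: int,
    current_incidents: int,
) -> Dict[str, float]:
    total = len(problems)
    found = [p for p in problems if p.days_to_root_cause is not None]
    within_5 = [p for p in found if p.days_to_root_cause <= 5]
    on_time = [p for p in problems if p.resolved_within_agreed_time]
    return {
        "root_cause_found_pct": pct(len(found), total),
        "root_cause_within_5_days_pct": pct(len(within_5), total),
        "recurring_problems": sum(1 for p in problems if p.recurred),
        "resolved_on_time_pct": pct(len(on_time), total),
        "incident_reduction_pct": pct(
            baseline_incidents - current_incidents, baseline_incidents
        ),
    }


# Invented sample period: four problems, incidents down from 40 to 30.
sample = [
    ProblemRecord(2, True, False),
    ProblemRecord(4, True, False),
    ProblemRecord(7, False, False),
    ProblemRecord(None, True, False),
]
report = kpi_report(sample, baseline_incidents=40, current_incidents=30)
```

With this sample, the report shows 75% of root causes found, 50% found within 5 working days, no recurring problems, 75% resolved on time and a 25% reduction in incidents, which can then be compared against whatever targets are currently in force.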

