Presentation on theme: "Problem Management Overview"— Presentation transcript:
1 Problem Management Overview: Ensures stability in services by identifying and removing errors in the infrastructure.
2 Definition of a Problem: A Problem is the unknown underlying cause of one or more Incidents. A Known Error is a Problem that has been successfully diagnosed and for which a work-around and/or a permanent solution has been identified.
3 Difference between Incident and Problem Management: Problem Management differs from Incident Management in that its main goal is to detect the underlying causes of Incidents, resolve them, and prevent them from recurring (“Root Cause Analysis”).
4 Problem Management Activities: Problem control; Error control; The proactive prevention of Problems; Identifying trends; Obtaining management information from Problem Management data; Major Problem reviews.
5 Problems Are Identified When: Analyzing Incidents as they occur (reactive Problem Management); Analyzing Incidents over differing time periods (proactive Problem Management); Analyzing the Infrastructure; Information provided by developers/vendors when new products are introduced.
6 Definition of a Known Error: A condition identified by successful diagnosis of the root cause of a Problem, when it is confirmed that a Configuration Item (CI) is at fault.
7 Problem Control: The process of identifying, recording, classifying and progressing Problems through investigation and diagnosis, until either ‘Known Error’ status is achieved or an alternative procedural reason for the ‘Problem’ is revealed.
8 Activities of Problem Control: Problem identification and registration; Incident Matching; Classification (Category / Priority); Allocation of resources (particularly by Functional Managers); Investigation and diagnosis; Root cause determination.
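The Problem Control activities above can be pictured as a record that moves through a few statuses until it becomes a Known Error. The following is only a minimal sketch under stated assumptions: the `Problem` class, the `Status` names, the field names and the incident identifiers are all hypothetical and do not come from the slides or from any particular ITIL tool.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Hypothetical status names, loosely following the slide's activity list.
    REGISTERED = "registered"
    CLASSIFIED = "classified"
    UNDER_INVESTIGATION = "under investigation"
    KNOWN_ERROR = "known error"


@dataclass
class Problem:
    summary: str
    category: str = ""
    priority: int = 0
    status: Status = Status.REGISTERED
    incident_ids: list = field(default_factory=list)
    root_cause: str = ""
    faulty_ci: str = ""

    def match_incident(self, incident_id):
        """Incident Matching: link an Incident to this Problem once."""
        if incident_id not in self.incident_ids:
            self.incident_ids.append(incident_id)

    def classify(self, category, priority):
        """Classification (Category / Priority)."""
        self.category = category
        self.priority = priority
        self.status = Status.CLASSIFIED

    def investigate(self):
        """Investigation and diagnosis has started."""
        self.status = Status.UNDER_INVESTIGATION

    def record_root_cause(self, root_cause, faulty_ci):
        """Root cause determined and a CI confirmed at fault: Known Error."""
        self.root_cause = root_cause
        self.faulty_ci = faulty_ci
        self.status = Status.KNOWN_ERROR


# Usage, with made-up identifiers:
problem = Problem("Server goes down repeatedly")
problem.match_incident("INC-1")
problem.match_incident("INC-2")
problem.classify("hardware", priority=1)
problem.investigate()
problem.record_root_cause("overheating", faulty_ci="server rack")
print(problem.status.value, problem.incident_ids)
```

Allocation of resources is left out of the sketch because it is an organisational step rather than a change to the record itself.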
9 Error Control: The removal, replacement or repair of the CI(s) which caused the Incident / Problem and led to the degradation of the agreed service level, by means of changes to the infrastructure.
10 Activities of Error Control: Root Cause Analysis (Determine Solution); Communication (Knowledge Management); Monitoring; Integration with Change Management.
11 Proactive Procedures: Identification of trends and potential problems (Service Owners have a key role); Identifying weak infrastructure CIs (Functional Managers have a key role); Initiation of Change to prevent Problems from occurring, Problems from repeating, and Problems from affecting other areas and systems. REQUIRES HISTORICAL DATA!!
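Identifying weak infrastructure CIs from historical data can be as simple as counting past Incidents per CI. The sketch below is illustrative only: the function name `weak_cis`, the `(ci_name, date)` record shape, the threshold and the sample data are assumptions, not part of the presentation.

```python
from collections import Counter


def weak_cis(incidents, threshold):
    """Return (ci_name, incident_count) pairs for CIs with at least
    `threshold` historical Incidents, most affected first.

    `incidents` is an iterable of (ci_name, date_string) pairs.
    """
    counts = Counter(ci_name for ci_name, _date in incidents)
    return [(ci, n) for ci, n in counts.most_common() if n >= threshold]


# Made-up historical Incident records:
history = [
    ("srv-01", "2024-01-03"),
    ("sw-02", "2024-01-09"),
    ("srv-01", "2024-02-11"),
    ("srv-01", "2024-03-20"),
]
print(weak_cis(history, threshold=3))
```

Without a history of Incident records there is nothing to count, which is why the slide stresses that proactive work requires historical data.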
12 Structured approach to problem solving (Kepner and Tregoe): Defining the Problem; Describing the Problem with regard to identity, location, time and size; Establishing possible causes; Testing the most probable cause; Verifying the true cause.
13 From Incident(s) To A Problem To A Known Error To A Change [Diagram: Incident Management (several Incidents); Problem Management (Problem, CI at Fault, Known Error); Change Management (RFC, Change)]
14 Example Scenario [Diagram: Incident: Server Down; SD Temporary Fix: Re-Boot Server; New Problem Identified; Problem: Root Cause Analysis (Overheating); Known Error, Solution: Rack Configuration (Take off Doors); Request For Change: Remove the issue permanently; Assess, Approve, Schedule, Implement, Review]
15 Problem Management Roles: Problem Process Owner; Problem Manager; Functional Manager; Service Owner; Support Group Staff; Service Desk; Development Staff; Vendor / Supplier.
16 Benefits of Problem Management: Better first-time fix at the Service Desk; Departments can show added value to the organisation; Reduced workload for staff and Service Desk (incident volume reduction); Better alignment between departments; Improved work environment for CERN staff; More empowered staff; Improved prioritization of effort; Better use of resources; More control over services provided.
17 Benefits of Problem Management (cont.): Improved quality of services; Higher service availability; Improved user productivity.
18 Problem Management Dependencies: Commitment of management for resources; Commitment of Functional Managers; Resources come from existing support teams; Support of Service Owners; Incident Management data; Problem / Error history.
19 Problem Management KPIs: Percentage reduction in repeat Incidents/Problems; Percentage reduction in the Incidents and Problems affecting service to users; Percentage reduction in the known Incidents and Problems encountered; No delays in production of management reports; Improved Customer Satisfaction Survey responses on business disruption caused by Incidents and Problems.
20 Problem Management KPIs (cont.): Percentage reduction in average time to resolve Problems; Percentage reduction of the time to implement fixes to Known Errors; Percentage reduction of the time to diagnose Problems; Percentage reduction of the average number of undiagnosed Problems; Percentage reduction of the average backlog of 'open' Problems and errors.
21 Problem Management KPIs (cont.): Percentage reduction of the impact of Problems on Users; Reduction in the business disruption caused by Incidents and Problems; Percentage reduction in the number of Problems escalated (missed target); Percentage reduction in the Problem Management budget; Increased percentage of proactive Changes raised by Problem Management, particularly from Major Incident and Problem reviews.
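Most of the KPIs above are expressed as a percentage reduction between two reporting periods. As a rough illustration, and assuming a simple baseline-versus-current comparison (the function name and the sample figures are made up, not taken from the slides), the arithmetic looks like this:

```python
def percentage_reduction(baseline, current):
    """Percentage by which `current` is lower than `baseline`.

    A negative result means the measure increased instead of decreasing.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - current) / baseline * 100.0


# Example: 40 repeat Incidents last quarter, 30 this quarter.
print(percentage_reduction(40, 30))  # 25.0
```

The same function applies whether the measure is a count (repeat Incidents, open Problems) or a duration (average time to diagnose), as long as both periods are measured the same way.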
22 Process Implementation [Diagram: Where are we now? Where do we want to be? How do we get there? Project Plans. Process: Review Current State?; High Level Process Model; Sign off; Detailed Process Description; Process Implementation. Technology: Gather Tool Requirements; Install & Customize; Deploy and Scale. People: Roles definition & authority matrix; Process Workshops; ITIL Training; Awareness Campaign]