Event Management & ITIL V3

Presentation on theme: "Event Management & ITIL V3"— Presentation transcript:

1 Event Management & ITIL V3
Copyright: The Art of Service 2008

2 Service Operation Processes
Functions: Service Desk, Technical Support Groups
Processes: Event Management, Request Fulfillment, Access Management, Incident Management, Problem Management
This slide brings together all the Service Operation processes and shows how much responsibility the two functions, the Service Desk and the Technical Support Groups, carry within them.

3 Service Operation: Event Management

4 Event Management
The process that monitors all events that occur through the IT infrastructure to allow for normal service operation and to detect and escalate exceptions. Effective service support relies on knowing the status of the infrastructure and detecting any deviation from normal or expected operation. This can be provided by good monitoring and control systems, which are based on two types of tools:
- Active monitoring tools that monitor key CIs to determine their status and availability.
- Passive monitoring tools that detect and correlate operational alerts or communications generated by CIs.
More information on the Roles and Responsibilities of Event Management is available within this toolkit.

5 Terminology
Event: A change of state that has significance for the management of a Configuration Item (including IT Services). Events can be detected by technical staff or raised as automated alerts or notifications by CI monitoring tools.
Alert: A warning that a threshold has been reached or something has changed (an event has occurred).
Trigger: An indication that some action or response to an Event may be needed.

6 Scope
Event Management can be applied to any aspect of Service Management that needs to be controlled and can be automated, e.g.:
- Configuration Items
- Environmental conditions
- Software license monitoring
- Security
- Normal activity

7 Different types of event
Events that signify regular operation:
- Notification that a scheduled workload has completed
- A user has logged in to use an application
- An email has reached its intended recipient
Events that signify an exception:
- A user attempts to log on to an application with the incorrect password
- An unusual situation has occurred in a business process that may indicate an exception requiring further business investigation
- A device's CPU is above the acceptable utilization rate
Events that signify unusual, but not exceptional, operation:
- A server's memory utilization reaches within 5% of its highest acceptable performance level
- The completion time of a transaction is 100% longer than normal

8 Event Management - Activities
Activities, in order: Event Occurs, Event Notification, Alert, Event Filtering, Significance of events, Event Correlation, Trigger, Response Selection, Review Actions, Close Event.
Each of the activities of Event Management is discussed in more detail in the following slides. An example diagram of the Event Management process flow is also available within this toolkit.

9 Event Occurs
Events occur continuously, but not all of them are detected or registered. It is therefore important that everybody involved in designing, developing, managing and supporting IT services, and the IT infrastructure they run on, understands what types of event need to be detected.

10 Event notification
Most CIs are designed to communicate certain information about themselves in one of two ways:
- A device is interrogated by a management tool, which collects certain targeted data. This is often referred to as 'polling'.
- The CI generates a notification when certain conditions are met. The ability to produce these notifications has to be designed and built into the CI.
In a perfect world, the service design team defines which events need to be generated and then specifies how this can be done for each type of CI. During the service testing and release phase, the event generation options are set and tested. In many organizations, however, deciding which events to generate is done by trial and error. The problem with this approach is that it is reactive: it considers only the immediate needs of the staff managing the device and provides no basis for planning or improvement.
A general principle of event notification is that the higher the quality of the data it contains and the more targeted the audience, the easier it is to make a decision about the event. Quality notification data and clearly defined roles and responsibilities need to be documented during the design, testing and release phases.
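The two notification styles above can be contrasted in a few lines of Python. This is only an illustrative sketch: the `ConfigurationItem` class, its `cpu_limit` threshold and the `poll` helper are hypothetical names invented here, not part of ITIL or of any monitoring product.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Event:
    ci_name: str   # Configuration Item the event is about
    message: str


@dataclass
class ConfigurationItem:
    """Hypothetical CI that can be polled and can also push notifications."""
    name: str
    cpu_percent: float = 0.0
    cpu_limit: float = 90.0   # assumed threshold, chosen for illustration
    listeners: List[Callable[[Event], None]] = field(default_factory=list)

    def read_status(self) -> dict:
        # Polling: the management tool asks and the CI answers.
        return {"ci": self.name, "cpu_percent": self.cpu_percent}

    def set_cpu(self, value: float) -> None:
        # Notification: the CI itself raises an event when a condition is met.
        self.cpu_percent = value
        if value > self.cpu_limit:
            event = Event(self.name, f"CPU at {value}% exceeds {self.cpu_limit}%")
            for listener in self.listeners:
                listener(event)


def poll(cis: List[ConfigurationItem]) -> List[dict]:
    """One polling cycle: interrogate every CI for targeted data."""
    return [ci.read_status() for ci in cis]


received: List[Event] = []
server = ConfigurationItem("app-server-01")
server.listeners.append(received.append)

server.set_cpu(50.0)        # below the limit: nothing is pushed
server.set_cpu(95.0)        # above the limit: one notification is pushed
snapshot = poll([server])   # polling works regardless of notifications
```

Polling puts the schedule in the hands of the management tool, while notifications depend on conditions that were designed into the CI, which is why the slide stresses defining them at design time.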

11 Alert / Event detection
Once an Event notification has been generated, it will be detected by an agent running on the same system, or transmitted directly to a management tool specifically designed to read and interpret the meaning of the event.

12 Event Filtering
The purpose of filtering is to decide on the best course of action, e.g.:
- Communicate the event to a management tool.
- Ignore it. In this case the event will still need to be recorded.
Events need to be filtered because it is not always possible to turn event notifications off.
During the filtering activity, the first level of correlation is performed. Correlation here means determining whether the event is informational, a warning, or an exception. This correlation is normally done by an agent that resides on the CI or on a server to which the CI is connected.
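A minimal sketch of the filtering decision described above, assuming events carry a simple type label. The `Event` fields and the event type names are hypothetical; the point is only that ignored events are recorded rather than silently dropped.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Event:
    ci_name: str
    event_type: str   # illustrative labels such as "heartbeat" or "login_failed"


def filter_events(
    events: Iterable[Event], ignored_types: Set[str]
) -> Tuple[List[Event], List[Event]]:
    """Split events into those forwarded to the management tool and those ignored.

    Ignored events are returned as well, so that they can be recorded.
    """
    forwarded: List[Event] = []
    ignored_log: List[Event] = []
    for event in events:
        if event.event_type in ignored_types:
            ignored_log.append(event)
        else:
            forwarded.append(event)
    return forwarded, ignored_log


incoming = [Event("db-01", "heartbeat"), Event("app-01", "login_failed")]
forwarded, ignored_log = filter_events(incoming, ignored_types={"heartbeat"})
```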

13 Significance of events
Every organization will have its own method and criteria for categorizing the significance of an event. The following are three broad category suggestions:
Informational: an event that does not require action and does not represent an exception. These should be documented and kept for a predetermined period. Events of this type are usually used to check on the status of a device or service, or to confirm successful completion of an activity. They can also be useful in supplying statistics and measurements.
Warning: an event that is generated when a service or device is approaching a threshold. Warnings are intended to notify the appropriate staff member or tool, so that the situation can be dealt with before an exception occurs.
Exception: a service or a device is currently operating abnormally. Typically this will mean that a Service Level Agreement or an Operational Level Agreement has been breached and the business is being impacted. Exceptions could represent a total failure, impaired functionality or degraded performance.
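As a hedged illustration of the three categories, the sketch below classifies a utilization reading against a limit. The function name, the 90% limit in the usage lines and the reading of "approaching a threshold" as "within 5 percentage points of the limit" are assumptions made for this example; real criteria are organization-specific.

```python
from enum import Enum


class Significance(Enum):
    INFORMATIONAL = "informational"
    WARNING = "warning"
    EXCEPTION = "exception"


def classify_utilization(
    value: float, limit: float, warning_margin: float = 5.0
) -> Significance:
    """Classify a utilization reading (in percent) against an acceptable limit.

    warning_margin is an assumed band, in percentage points, below the limit.
    """
    if value > limit:
        # Operating abnormally: the acceptable rate has been exceeded.
        return Significance.EXCEPTION
    if value >= limit - warning_margin:
        # Approaching the threshold: notify before an exception occurs.
        return Significance.WARNING
    return Significance.INFORMATIONAL


normal = classify_utilization(50.0, limit=90.0)
approaching = classify_utilization(86.0, limit=90.0)
breached = classify_utilization(95.0, limit=90.0)
```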

14 Event correlation
Correlation is normally done by a 'Correlation Engine', the part of a management tool that compares the event with a set of criteria and rules in a prescribed order. These criteria are often referred to as Business Rules, although they are generally quite technical. The idea is that the event may represent some impact on the business, and the rules can be used to determine the level and type of that impact. A correlation engine is programmed according to the performance standards created during the service design stage and any additional guidance specific to the operating environment.
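The idea of evaluating rules in a prescribed order can be sketched as a first-match loop. This is a simplified assumption about how such an engine might work, not a description of any particular product; the `Rule` structure, rule names and trigger labels are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Event:
    ci_name: str
    event_type: str
    value: float = 0.0


@dataclass
class Rule:
    name: str
    matches: Callable[[Event], bool]
    trigger: str   # label of the response mechanism to initiate


def correlate(event: Event, rules: List[Rule]) -> Optional[str]:
    """Return the trigger of the first matching rule, or None if unrecognized."""
    for rule in rules:   # list order is the prescribed evaluation order
        if rule.matches(event):
            return rule.trigger
    return None


rules = [
    Rule("unauthorized change",
         lambda e: e.event_type == "unauthorized_change", "change_trigger"),
    Rule("cpu above limit",
         lambda e: e.event_type == "cpu" and e.value > 90.0, "incident_trigger"),
]

cpu_result = correlate(Event("srv-01", "cpu", 95.0), rules)
change_result = correlate(Event("srv-01", "unauthorized_change"), rules)
unmatched = correlate(Event("srv-01", "cpu", 10.0), rules)
```

Because evaluation stops at the first match, more specific rules need to be placed ahead of more general ones.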

15 Trigger
If the correlation activity recognizes an event, a response will be required. The mechanism used to initiate that response is called a trigger. There are many different types of trigger, each designed specifically for the task it has to initiate, e.g.:
- Incident triggers.
- Change triggers.
- A trigger resulting from an approved RFC that has been implemented but caused the event, or from an unauthorized change that has been detected.
- Paging systems that notify a person of the event by mobile phone.
- Database triggers that restrict a user's access to specific records.

16 Response selection
At this point in the process, a number of response options are available. Different organizations will have different options, e.g. there will be a range of responses for each different technology. Some of the options available are:
- Event logged: regardless of what activity is performed, it is a good idea to have a record of the event and any subsequent actions. The event can be logged as an Event Record in the management tool, or it can be documented separately. The log should be monitored on a regular basis. Event Management procedures for each system or team need to define standards for how long events are kept in logs before being archived and disposed of.
- Auto response: some events are understood well enough that the appropriate response is already defined and automated. The trigger will initiate the action and then evaluate whether it was completed correctly. If it was not, an Incident or Problem Record will be created. Examples of auto responses include rebooting a device, restarting a service, submitting a job into batch, and locking a device or application to protect it.
- Alert and human intervention: if human intervention is required, the event will need to be escalated.
- Incident, Problem or Change: some events will need to be handled by these processes, e.g. an RFC is raised when an exception occurs and correlation identifies that a change is needed.
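The auto-response option above (initiate the action, then check whether it completed correctly) might look roughly like this. The function and the "closed"/"escalate" outcome labels are hypothetical, and the creation of the follow-up record is deliberately left out of the sketch.

```python
from typing import Callable, Dict, List


def auto_respond(
    action: Callable[[], None],
    verify: Callable[[], bool],
    log: List[str],
) -> str:
    """Run a predefined response and evaluate whether it completed correctly."""
    log.append("event logged")   # the event is recorded whatever happens next
    action()
    if verify():
        log.append("auto response succeeded")
        return "closed"
    # The response did not work: hand over to another process or a person.
    log.append("auto response failed")
    return "escalate"


# Usage: restarting a (simulated) service.
service: Dict[str, bool] = {"running": False}
history: List[str] = []


def restart_service() -> None:
    service["running"] = True


outcome = auto_respond(restart_service, lambda: service["running"], history)
failed_outcome = auto_respond(lambda: None, lambda: False, [])
```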

17 Review actions
With so many events occurring on a daily basis, it is not possible to review each one individually. However, it is essential to check that any significant events or exceptions have been handled correctly, and to track trends. In most cases this can be done automatically. Where events have initiated an incident, problem or change, the action review should not duplicate any reviews that have been completed as part of those processes. Reviews will also be used to identify continual service improvement opportunities and to evaluate the Event Management process.

18 Close Event
Some events remain open until specific actions take place, e.g. an event that is linked to an open incident. However, most events are not opened or closed:
- Informational events are simply logged.
- Auto-response events will typically be closed by the generation of a second event.
- Events that have generated an incident, problem or change will be formally closed with a link to the appropriate record from the other process.

19 Information Management
- SNMP messages, which are a standard way of communicating technical information about the status of components of an IT infrastructure
- Management Information Bases of IT devices
- Vendors' monitoring tools and agent software
- Correlation Engines
- Event Records
More information on the Technology Considerations for Event Management can be found in this toolkit.

20 Value to the business
The value to the business of implementing the Event Management process is generally indirect, but it is possible to determine the basis for its value, e.g.:
- It provides mechanisms for early detection of incidents.
- It enables some types of automated activity to be monitored by exception.
- It signals status changes or exceptions, allowing the appropriate person or team to respond early.
- It provides a basis for automated operations, increasing efficiency and allowing human resources to be better utilized.

21 Metrics
- Number of events by category and significance
- Number and percentage of events that required human intervention, and whether this was performed
- Number and percentage of events that resulted in incidents and changes
- Number and percentage of events caused by existing problems and known errors
- Number and percentage of repeated or duplicated events
- Number and percentage of events indicating performance issues and potential availability issues
- Number and percentage of each type of event per platform or application
- Number and ratio of events compared with the number of incidents
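A few of these metrics can be derived directly from stored Event Records. The sketch below assumes a very small, hypothetical record structure; the field names are illustrative and real Event Records would carry far more detail.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EventRecord:
    category: str           # e.g. "security", "capacity" (illustrative)
    significance: str       # "informational", "warning" or "exception"
    needed_human: bool      # human intervention was required
    raised_incident: bool   # the event resulted in an incident


def percentage(part: int, total: int) -> float:
    """Percentage rounded to one decimal place; 0.0 when there are no events."""
    return 0.0 if total == 0 else round(100.0 * part / total, 1)


def summarize(records: List[EventRecord]) -> Dict[str, object]:
    total = len(records)
    human = sum(r.needed_human for r in records)
    incidents = sum(r.raised_incident for r in records)
    return {
        "total": total,
        "by_category_and_significance": Counter(
            (r.category, r.significance) for r in records
        ),
        "human_intervention": (human, percentage(human, total)),
        "resulted_in_incident": (incidents, percentage(incidents, total)),
    }


records = [
    EventRecord("capacity", "warning", False, False),
    EventRecord("capacity", "exception", True, True),
    EventRecord("security", "exception", False, True),
    EventRecord("security", "informational", False, False),
]
summary = summarize(records)
```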

22 Challenges
- Obtaining funding for the necessary tools and effort is an initial challenge.
- Setting the correct level of filtering.
- Rolling out the necessary monitoring agents across the entire IT infrastructure can be difficult and time consuming, and requires ongoing commitment.
- Acquiring the necessary skills.

23 CSFs
To obtain the necessary funding, a compelling Business Case should be prepared showing how the benefits of effective Event Management can far outweigh the costs, giving a positive return on investment. One of the most important CSFs is achieving the correct level of filtering. This is complicated by the fact that the significance of events changes. E.g. a user logging into a system today is normal, but if that user leaves the organization and then tries to log in, it is a security breach.
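The login example above shows why filtering rules cannot be static: the same event type changes significance when its context changes. A minimal sketch, assuming the set of current users is available to the filter (the function and user names are hypothetical):

```python
from typing import Set


def classify_login(user: str, active_users: Set[str]) -> str:
    """A login attempt is routine for a current user, an exception otherwise."""
    return "informational" if user in active_users else "exception"


active_users = {"alice", "bob"}
before_leaving = classify_login("bob", active_users)

active_users.discard("bob")   # bob leaves the organization
after_leaving = classify_login("bob", active_users)
```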

24 Risks
- Failure to obtain adequate funding.
- Failure to achieve the correct level of filtering.
- Failure to maintain momentum in rolling out the necessary monitoring agents across the IT infrastructure.

