Section 1: Architecture


1 Section 1: Architecture
From Ingestion to Situation

2 Section Objectives
Articulate how event data flows from ingestion to Situation in AIOps
Identify the main components within AIOps, and explain the functionality of each

3 Topics In This Section
Key Concepts
Architecture Conceptual Overview
How AIOps Works

4 Key Concepts
Event: An Event can be any log, status or change event generated by a third-party monitoring tool.
Alert: An Alert is a de-duplicated Event or an instance of new data coming into AIOps.
Situation: A Situation is a cluster of Alerts that have been grouped together by one of the Sigalisers.
LAMs: The LAMs (Link Access Modules) connect the third-party monitoring tools to AIOps.
Moog_farmd: This is the master service, or service harness, which controls all other services and determines which algorithms are running.
Sigaliser: The Sigalisers are the algorithms which group Alerts based on factors such as time, language, topology and similarity.

5 Conceptual Overview

6 Architecture
A working system must have a farmd with its Moolets, some LAMs, a MooMS and a MySQL instance. In addition, to support the web user interface, instances of Apache Tomcat and Apache HTTPD must also be running.

7 Data Ingestion
Performed by “LAMs”
Take raw event data
Map to MOOG event fields
Pass events onto the Message Bus
A LAM may be supplemented with one or more optional JavaScript files called “LAMbots”
UI-based data ingestion option is available for popular sources
[Diagram labels: Status events, Log events, Change events, APM events; LAM; LAM + LAMbots]

The first step is data ingestion. At this stage the raw data from the external system is ingested and mapped to the AIOps event object. The component that handles this task is called a “LAM” (Link Access Module). We’ll discuss the different types of LAMs in the configuration section. LAMs are pieces of Java that are hard coded in the product, and therefore not editable by customers, but an administrator can still customize the data ingestion settings. A LAM may be accompanied by a “LAMbot”, a JavaScript file that is invoked and executed by the LAM. For example, if you want to ignore certain events and not create corresponding events in AIOps, you can specify that in a LAMbot. A UI-based data ingestion option is available under System Administration > Monitoring > Add Monitoring Integration.
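The ingestion steps above (take raw data, map it to event fields, optionally filter it, pass it on) can be sketched roughly as follows. This is only an illustration in Python: real LAMs are Java and real LAMbots are JavaScript, and the field names, the `FIELD_MAP` table, the filter rule and the signature format are all assumptions made for the example, not the product's actual schema.

```python
from typing import Optional

# Hypothetical mapping from raw source keys to event fields (illustrative names only).
FIELD_MAP = {"host": "source", "msg": "description", "sev": "severity", "ts": "time"}


def lambot_filter(raw: dict) -> bool:
    """Hypothetical LAMbot-style rule: return False to ignore an event."""
    return raw.get("sev") != "debug"


def lam_ingest(raw: dict) -> Optional[dict]:
    """Map a raw record to an event dict, or return None if it is filtered out."""
    if not lambot_filter(raw):
        return None
    event = {target: raw.get(src) for src, target in FIELD_MAP.items()}
    # Assumed signature format; it is used downstream for de-duplication.
    event["signature"] = f"{event['source']}::{event['description']}"
    return event


raw_events = [
    {"host": "web01", "msg": "disk full", "sev": "critical", "ts": 1700000000},
    {"host": "web01", "msg": "heartbeat", "sev": "debug", "ts": 1700000005},
]

# Stand-in for publishing to the Message Bus: collect the events that survive the filter.
bus = [e for e in (lam_ingest(r) for r in raw_events) if e is not None]
```

Here the debug-level heartbeat is dropped by the filter, so only the "disk full" event would be placed on the bus.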

8 Event Processing
Generic LAMs
REST: reads JSON data from a local socket and returns an acknowledgement
Socket: reads data from a local socket (TCP transport)
Logfile: reads data from a locally accessible log file
REST Client: reads event data from RESTful services
Trap: parses SNMP v1/v2 traps sent to a local socket
Vendor-specific examples
AppDynamics, JMS, Netcool, Microsoft SCOM, New Relic, Solarwinds, UIM, XClarity, Zenoss

Let’s drill down into each stage and component in the order of the general data flow. First is data ingestion. At this stage the raw data from the external system is ingested and mapped to the AIOps event object. The component that handles this task is called a “LAM” (Link Access Module). You need one LAM per monitored system, and there are five fundamentally different types of LAMs, listed on this slide. Each LAM is configured during the implementation, and its files are stored in the following locations:
LAM binaries: $MOOGSOFT_HOME/bin
Default configuration files: $MOOGSOFT_HOME/config
The LAMs then publish the data on the Message Bus as events.

9 Alert Processing
Handled by MooBots: Alert Builder, Maintenance Window Manager, and Alert Rules Engine
Alerts are created from Events, and published to the Message Bus for further processing

External events are mapped to the AIOps event fields and published to the Message Bus. The Message Bus in turn passes those AIOps events to the next components, which clean out noise and turn them into Alerts. This is done by Alert Builder, and optionally Alert Rules Engine. Alert Builder and Alert Rules Engine are both programs called “MooBots”. We’ll discuss this further later, so for now, just remember these are individual isolated applications running in a process called “farmd”.
Configuration file: $MOOGSOFT_HOME/config/moog_farmd.conf
Binary in: $MOOGSOFT_HOME/bin
Invocation: moog_farmd --config <config_file> --loglevel WARN
Service: service moogfarmd {start|stop|status|restart|reload}
Default logging to: /var/log/moogsoft/moogfarmd.log

10 Alert Processing – Alert Builder
If no Alert exists with the same signature as the Event, an Alert is created
Assigns Entropy values to Alerts
If an Alert exists with the same signature as the Event, the Event is de-duplicated
Alert count is increased
Alert fields are updated

Alert Builder listens for Events on the Message Bus, and decides whether a given Event should be made into a new Alert. It uses the entropy value to assess the significance of the Events, and assigns values to Alerts. The Alert Rules Engine is an optional add-on to the Alert Builder. It allows you to build business logic into the system.
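The create-or-de-duplicate behaviour of the Alert Builder can be sketched as a dictionary keyed by signature. This is a minimal sketch, not the product's implementation: the field names, the `entropy_store` lookup table and the `DEFAULT_ENTROPY` fallback are assumptions introduced for the example.

```python
DEFAULT_ENTROPY = 0.5  # hypothetical fallback for signatures with no stored entropy


def build_alert(event: dict, alerts: dict, entropy_store: dict) -> dict:
    """Create a new alert for an unseen signature, or de-duplicate into the existing one."""
    sig = event["signature"]
    alert = alerts.get(sig)
    if alert is None:
        # No Alert with this signature yet: create one and assign its entropy.
        alert = {
            "signature": sig,
            "count": 1,
            "description": event["description"],
            "last_seen": event["time"],
            "entropy": entropy_store.get(sig, DEFAULT_ENTROPY),
        }
        alerts[sig] = alert
    else:
        # Same signature: increase the count and refresh the alert fields.
        alert["count"] += 1
        alert["description"] = event["description"]
        alert["last_seen"] = event["time"]
    return alert


alerts = {}
entropy_store = {"web01::disk full": 0.9}  # assumed pre-computed values
events = [
    {"signature": "web01::disk full", "description": "disk full", "time": 1700000000},
    {"signature": "db01::slow query", "description": "slow query", "time": 1700000030},
    {"signature": "web01::disk full", "description": "disk full", "time": 1700000060},
]
for e in events:
    build_alert(e, alerts, entropy_store)
```

Three events produce two alerts here: the repeated "disk full" event only bumps the count and timestamp of the alert that already exists.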

11 Alert Processing – Maintenance Window Manager
Processes output of Alert Builder
Labels Alerts as “in maintenance”
Leverages UI-based scheduling
Optionally prevents in-maintenance events from being evaluated as part of a Situation
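A rough sketch of what the Maintenance Window Manager does with each Alert: check it against the scheduled windows, label it, and optionally keep labelled Alerts out of Situation evaluation. The `MaintenanceWindow` shape, the host-based matching rule and the `in_maintenance` field name are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class MaintenanceWindow:
    host: str   # hypothetical scope: a window applies to one host
    start: int  # epoch seconds, inclusive
    end: int    # epoch seconds, exclusive


def label_maintenance(alert: dict, windows: list) -> dict:
    """Return a copy of the alert with an 'in_maintenance' flag set."""
    in_maint = any(
        w.host == alert["source"] and w.start <= alert["time"] < w.end
        for w in windows
    )
    return {**alert, "in_maintenance": in_maint}


def eligible_for_situations(alerts: list, exclude_maintenance: bool = True) -> list:
    """Optionally drop in-maintenance alerts before clustering."""
    if not exclude_maintenance:
        return list(alerts)
    return [a for a in alerts if not a["in_maintenance"]]


windows = [MaintenanceWindow(host="web01", start=1000, end=2000)]
incoming = [
    {"source": "web01", "time": 1500, "description": "service restart"},
    {"source": "web01", "time": 2500, "description": "disk full"},
    {"source": "db01", "time": 1500, "description": "slow query"},
]
labelled = [label_maintenance(a, windows) for a in incoming]
candidates = eligible_for_situations(labelled)
```

Only the first alert falls inside the window, so it is labelled and, with the default setting, excluded from the candidates passed on for clustering.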

12 Alert Processing – Alert Rules Engine
Processes output of Maintenance Window Manager
Allows conditional processing of Alerts, e.g. link flapping, Alert suppression…
Allows non-binary fail/clear correlation
Fail/clear correlation is not enabled by simple de-duplication based on signature
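As an illustration of the kind of conditional processing mentioned above, the sketch below suppresses link flaps: a "down" alert is dropped if a matching "up" for the same link follows within a hold period. The rule itself, the `hold_seconds` parameter and the `link`/`state` fields are hypothetical; in the product such logic would be written as business rules for the Alert Rules Engine, not as this Python function.

```python
def suppress_flaps(alerts: list, hold_seconds: int = 30) -> list:
    """Drop down/up pairs for the same link when the clear arrives within hold_seconds."""
    pending = {}    # link -> earliest unmatched 'down' alert
    forwarded = []
    for alert in sorted(alerts, key=lambda a: a["time"]):
        link = alert["link"]
        if alert["state"] == "down":
            # Keep only the first unmatched 'down' per link.
            pending.setdefault(link, alert)
            continue
        down = pending.pop(link, None)
        if down is not None and alert["time"] - down["time"] <= hold_seconds:
            continue  # a flap: suppress both the fail and the clear
        if down is not None:
            forwarded.append(down)
        forwarded.append(alert)
    # 'down' alerts that never cleared are real failures.
    forwarded.extend(pending.values())
    return sorted(forwarded, key=lambda a: a["time"])


link_alerts = [
    {"link": "eth0", "state": "down", "time": 0},
    {"link": "eth0", "state": "up", "time": 10},    # clears within 30s: flap
    {"link": "eth1", "state": "down", "time": 0},
    {"link": "eth1", "state": "up", "time": 100},   # genuine outage and recovery
    {"link": "eth2", "state": "down", "time": 5},   # never clears
]
result = suppress_flaps(link_alerts)
```

Signature-based de-duplication alone could not do this, because the fail and the clear are different events; pairing them needs a rule that looks at state and timing together.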

13 What is Entropy?
The value (between 0 and 1) that signifies the informational importance of the Alert
The standalone events analyzer utility calculates entropy values, and these are used by the Alert Builder when assigning entropy to an Alert
Calculation based on relative characteristics of an event (content and context)
Once the database is primed, it will automatically update itself

Let’s go into a bit more detail about entropy. Based on the information it retrieves from the MySQL Entropy Store database, the Event Analyser in Alert Builder generates Entropy values for incoming Alerts on the fly. It looks at the signature field (a value defined by the LAM), and if the Event has been encountered before, then the Entropy for that Event will be used.

14 Alert Clustering
Alerts are clustered into Situations based on their similarities

After Events are de-duplicated and become Alerts, the next step is clustering. AIOps groups Alerts based on their similarity across various factors to create Situations, and you can configure the clustering logic to suit your needs. For example, you may cluster your Alerts based purely on the time of occurrence, or on the similarity of their descriptions. You may use topology to identify how similar given Alerts are, or not rely on machine learning at all and define rules in a deterministic manner. Also, based on past experience, you can choose to treat certain Situations with more urgency than others. All these behaviors are defined in what we call Sigalisers.

During the initial cleaning process, our machine learning engine removes noise, blacklists unwanted events, and de-duplicates many others. This typically reduces hundreds of millions of raw events per day down to 1 million alerts. At this point, the engine contextualizes the still large volume of alerts into far fewer incidents, called “situations”. The engine looks at multiple variables to assess how “surprising or abnormal” an event is, and how it relates to other events. These variables are:
Detects patterns in timestamps
Detects linguistic relationships in events
Detects patterns in events based on network proximity
Blueprints past faults for future detection and remediation
Automatically learns behavior from IT Ops users
Custom recipes for detecting patterns
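To make the idea of clustering "purely on the time of occurrence" concrete, here is a deliberately simple gap-based grouping: alerts that arrive within `max_gap` seconds of the previous one join the same group. This is a stand-in technique chosen for illustration, not the algorithm any actual Sigaliser uses, and `max_gap` and the field names are assumptions.

```python
def cluster_by_time(alerts: list, max_gap: int = 60) -> list:
    """Group alerts into candidate situations by splitting on gaps longer than max_gap seconds."""
    situations = []
    for alert in sorted(alerts, key=lambda a: a["time"]):
        if situations and alert["time"] - situations[-1][-1]["time"] <= max_gap:
            # Close enough in time to the previous alert: same situation.
            situations[-1].append(alert)
        else:
            # Gap too large (or first alert): start a new situation.
            situations.append([alert])
    return situations


sample_alerts = [
    {"signature": "web01::disk full", "time": 0},
    {"signature": "web01::service down", "time": 30},
    {"signature": "lb01::backend unreachable", "time": 50},
    {"signature": "db02::replication lag", "time": 500},
    {"signature": "db02::slow query", "time": 520},
]
situations = cluster_by_time(sample_alerts)
```

With these sample times the first three alerts form one group and the last two another. Language-based or topology-based grouping would replace the time-gap test with a different similarity measure.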

15 Enrichment
Integrate additional data sources with AIOps to add context to Alerts and Situations that is not already provided within the Event data
Examples include ServiceNow, AppDynamics, CMDB…

By integrating with external tools, your troubleshooting activities can be even more efficient. For example, you may want to integrate with ServiceNow so that the information related to a ticket in ServiceNow is shared bi-directionally. Or you may want to bring in additional data to add more context to Situations. You may want to automatically store in a CMDB the configuration changes made while resolving a Situation. Different components may be responsible for making this happen. If you are integrating more event sources, you will be adding more LAMs. If you are adding more contextual data, you may need to work with MooBots. We also have a REST API called Graze for integrating with third-party process tools. Graze is implemented as a set of servlets running in the AIOps Tomcat instance, and exposes selected AIOps functionality to authorized external clients.

16 Situation Workflow
Situation Manager: Acts on newly created or updated Situations, typically enriching, decorating or processing the Situation (for example automating invitations or notifications). Handles automatic notification, invitation and modification actions.
Journaler: Maintains an audit trail of operations on a Situation (alerts added, discussion entries, tools run, etc.).

Situation creation is quite a complex process, but now we are finally moving on to the investigation of a Situation. The Situation Manager listens for new Situations being created, and passes them to its bot, which handles automatic notification, automatic invitation of users into the Situation, and any change to the Situation parameters. The Notifier Moolet processes manual invitations: when you issue an invitation from a Situation Room, it is the Notifier that processes it. The Journaler lets you audit the actions performed on Situations.

17 Putting It All Together
[Diagram labels: MySQL, moogfarmd, Apache Tomcat, MooMS Message Bus, LAM, LAM, LAM]

Starting with the basics, we have Moogfarmd (farmd), MySQL, the Apache web stack, and at least one LAM.
An Event message comes into the LAM and is placed on the Message Bus.
Alert Builder picks up the new Event and looks for an open Alert with the same signature.
Alert Builder then publishes the new or updated Alert on the Message Bus.
Maintenance Window Manager examines the maintenance windows it knows about and labels any affected Alerts as “in maintenance”.
Alert Rules Engine then processes the Alert if necessary.
The Sigalisers now have access to the Alert, and each enabled Sigaliser will examine the Alert, deciding if it should be included in a cluster.
If a Situation is created, Situation Manager access…
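The hand-offs in this walkthrough can be pictured as components subscribing to and publishing on a shared bus. The sketch below uses a tiny in-memory publish/subscribe class; the `MessageBus` class, the topic names and the handler bodies are all hypothetical and only mirror the order of the steps above, not how MooMS or moogfarmd are actually implemented.

```python
from collections import defaultdict


class MessageBus:
    """Minimal in-memory publish/subscribe bus (illustrative stand-in for the Message Bus)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self._subscribers[topic]:
            handler(message)


bus = MessageBus()
open_alerts = {}   # signature -> alert
trace = []         # records which component handled each message, in order


def alert_builder(event):
    alert = open_alerts.setdefault(event["signature"], {"signature": event["signature"], "count": 0})
    alert["count"] += 1
    trace.append("alert_builder")
    bus.publish("alerts.built", alert)


def maintenance_window_manager(alert):
    alert.setdefault("in_maintenance", False)  # no windows defined in this sketch
    trace.append("maintenance_window_manager")
    bus.publish("alerts.checked", alert)


def alert_rules_engine(alert):
    trace.append("alert_rules_engine")
    bus.publish("alerts.ready", alert)


def sigaliser(alert):
    trace.append("sigaliser")


# Hypothetical topic names wiring the components in the order described above.
bus.subscribe("events", alert_builder)
bus.subscribe("alerts.built", maintenance_window_manager)
bus.subscribe("alerts.checked", alert_rules_engine)
bus.subscribe("alerts.ready", sigaliser)

# A LAM placing one event on the bus triggers the whole chain.
bus.publish("events", {"signature": "web01::disk full"})
```

After the single publish, `trace` lists the four components in pipeline order, which is the same sequence the slide animates step by step.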

18 Questions

