IBM Operations Analytics for z Systems Transforming Data into Insights The Next Generation of IT Service Management.


1 IBM Operations Analytics for z Systems Transforming Data into Insights The Next Generation of IT Service Management

2 Agenda Why IT Analytics?
Overview of IBM Operations Analytics for z Systems Functional capability What’s New in 2015 Architecture Out-of-the-box Value Customize to meet your needs Integration with Service Management tooling Additional Detail Bring Your own Data – Example using HMC log IOAz V2.2 Details CICS insights Network insights Security insights Log Forwarder improvements

3 Solution Branding IBM Operations Analytics for z Systems
This solution was previously branded as IBM SmartCloud Analytics - Log Analysis. The support to search and analyze z/OS logs was initially provided in March, 2014 under the following product names: IBM SmartCloud Analytics - Log Analysis z/OS - Insight Packs – SYSLOG V1.1 IBM SmartCloud Analytics - Log Analysis z/OS - Insight Packs - IBM WebSphere® Application Server V1.1 Subsequent releases were named with the SmartCloud brand until April 2015, when Version 2 of the product was rebranded to IBM Operations Analytics for z Systems Initial release under the new name: IBM Operations Analytics for z Systems v2.1 (GA on April 24, 2015) Current release: IBM Operations Analytics for z Systems v2.2 (GA on October 16, 2015) Note that the distributed version of the product is now named IBM Operations Analytics – Log Analysis

4 Analytics for System z addresses rapid growth of data and next generation technology
Much greater amount of critical IT operational data (SMF, log, journal) than distributed-only environments. Focus on problem determination and time to resolution while placing premium on availability of services and applications. 100x to 1000x explosion in data flooding existing tools. New runtimes, programming languages needing complex instrumentation. By 2016, 40% of Global 2000 enterprises will have IT operations analytics architecture in place, up from < 1% in 2014, looking to integrate across their enterprise to reduce outages (Gartner). 90% of the Fortune 1000 companies are running z and have ‘Systems of Record’ dependencies for transactional processing and data serving applications. Main Point: As technology improves and data increases, there is a requirement to be able to predict, search and optimize this new/additional data to gain insights from it that have not existed in the past.

5 Is managing IT today like sipping from a fire hose?
New technologies like cloud, mobile and big data are already challenging current enterprise tools. It takes too long to isolate and diagnose problems in applications and infrastructure. Complex application workloads span multiple platforms. Increasing amounts of IT data: performance metrics, events, infrastructure logs, application logs, configuration files, traces. Existing IT tools need additional data analysis capabilities to manage Systems of Engagement. 100x to 1000x explosion in data flooding existing tools. New runtimes, programming languages needing complex instrumentation. Reactive analytics misses critical information, leading to outages. Need to move to a more proactive model: analysing ALL information is better for predicting problems.

6 IBM is focused on managing end-to-end analytics for improved performance and workload management
Predict: Pro-Active Outage Avoidance. Predict problems before they occur. Search & Analyze: Quickly search and analyze large volumes of data from a single search bar. Perform log and performance analysis while searching. Correlate messages from multiple logs for end-to-end problem diagnosis. Optimize: Improve performance across IT Infrastructure. IBM Analytics solutions for z Systems: Proactive Outage Avoidance, Faster Problem Resolution, Optimized Performance. Main Point: Analytics is now a key focus for our customers. As we have discussed, Operations Analytics can help increase business value by ensuring system and application availability and reducing Mean Time to Repair (MTTR). Operations Analytics is about: Predict - Proactively surfacing problems using anomaly detection. The current solution is IBM zAware. IBM zAware surfaces anomalies by analyzing z/OS and zLinux system logs. OMEGAMON and NetView integrate with IBM zAware by monitoring the IBM zAware anomaly scores, correlating log analysis with performance monitoring and providing the option to generate events and trigger automation. Search - Search for information, including logs and metrics, to enable a much more efficient environment for performing problem determination. The current solution in this area is IBM Operations Analytics for z Systems. IOA for z Systems integrates with ITM/OMEGAMON and Network Operations Insights. Optimize - Provides analytics for both Business and IT. Capacity Management Analytics (CMA) for z/OS is a suite that includes SPSS, Cognos and TDSz. CMA enables customers to forecast capacity and more recently provides a feature for forecasting the 4 hour rolling average, enabling customers to manage subcap pricing. Products by area: Predict: IBM zAware. Search & Analyze: IBM Operations Analytics for z Systems. Optimize: IBM Capacity Management Analytics (CMA).

7 IBM Operations Analytics for z Systems
Accelerate problem isolation and identification … Reduce mean time to repair Analyze various types of data (logs, metrics, events, trouble tickets) from multiple sources (mainframe and distributed) Locate problems from system, configuration, software logs and performance metrics using rapid index search and pattern analysis Isolate issues across various domains including OS, Middleware, applications, etc. Leverage Expert Advice via links to support documentation and operations notes to resolve problems quickly Visualize search results with analytic tools to rapidly determine root cause Out-of-the-box analysis and insights for z/OS, WebSphere, DB2, CICS, IMS, MQ, Network, Security as well as distributed systems Enable early error detection and broaden scope of automation with event notifications Fully customizable to meet your needs in 2015 Network insights Security insights Event notification Hadoop support Analysis of performance metrics (SMF real time Data Provider) Integration with existing Service Management tooling (Automation, Monitoring, Event and Incident Management) Role-based access control Multi-time zone support Main Point: Search and analysis is the primary focus for Log Analytics and IBM Operations Analytics – Log Analysis provides this capability. This tool will enable you to perform problem determination and resolution more quickly and will ultimately decrease Mean Time To Recovery (MTTR). The Log Analysis server runs on Linux on x Systems or Linux on z Systems. The server can consume logs from multiple sources (distributed and mainframe systems), enabling users to search and analyze log data from all components of your cross-platform workloads or from all the log sources in your enterprise if you so choose. Customers are already seeing value from Analytics – One of the key values with IBM Operations Analytics is the ability to create Insight Packs designed to analyze specific logs. 
The offering named IBM Operations Analytics for z Systems includes the Log Analysis server as well as z/OS Insight Packs that enable search and analysis for z/OS logs and performance metrics. The initial release of the z/OS support was provided in March, 2014 under the product names ‘IBM SmartCloud Analytics - Log Analysis z/OS - Insight Packs – SYSLOG V1.1’ and ‘IBM SmartCloud Analytics - Log Analysis z/OS - Insight Packs - IBM WebSphere® Application Server V1.1’. Subsequent releases were named with the SmartCloud brand until April, 2015 when Version 2 of the product was rebranded to IBM Operations Analytics for z Systems V2.1. IBM Operations Analytics for z Systems provides the following: • Ability to collect z/OS logs across the enterprise and stream the logs to the Log Analysis server for the server to index and analyze. • Ability to index, search, and analyze application, middleware, and infrastructure log data across System z enterprise. • Ability to quickly search and visualize errors across huge volumes of log records. • Advanced search and text analytics across large volumes of data. • Expert advice by linking search results to available best practices and recommended resolution documentations. • Near real-time streaming of z/OS logs. The z/OS support consists of the following components: • z/OS log forwarder that is installed on the required z/OS LPARs where the logs are to be collected and forwarded. • SMF data provider that is installed on the required z/OS LPARs where SMF performance metrics are to be collected and forwarded. • Insight Packs to provide the index, search, and domain insights capability for logs and performance metrics. Search is provided for all messages in the logs and you can choose to search one or more or all logs. The user can also specify a timeframe of the search to help narrow the focus to the time period when the error occurred. 
The Insight Pack surfaces patterns as the logs are searched, enabling the user to quickly focus on errors and drill down to the offending problem area. IBM Operations Analytics for z Systems provides out-of-the-box insights and application views for z/OS, WebSphere, DB2, CICS, IMS and MQ with the addition of Network Insights in V2.1. Also in V2.1, we have included initial support for consuming and analyzing performance metrics using our SMF Data Provider component. The user interface is customizable such that users can build their own application views and create and save environment-specific queries. The search language is text based and easy to use, and users can easily create and save simple or complex search strings with minimal typing. The tool is helpful to novice as well as experienced users. Online help, product documentation and product videos are easily accessed from the Getting Started page. 5698-AAP V2.1.0 IBM Operations Analytics for z Systems Large Insurance Company – Customer story 1 Quote: “This tool can really save a pile of diagnostic time!” Customer experienced a problem that took 29 hours to debug. This process required time from both IBM (Level 2) and multiple employees from that company. The account team contacted the IBM development team and described an outage at the customer site. The development team received the Syslogs from the customer, fed them into Operations Analytics Server and immediately saw the high volume of error messages on the two LPARs (thousands of error messages were Severe errors). Most errors were in DB2 and MQ. The development team immediately noticed the high volume of some very specific messages (mostly DB2). The Log Analysis Application views graphically displayed the message peaks (as compared to normal message flows). ‘Needles’ (error messages) in the haystacks (LPARs) were immediately evident through visual representation of the message spikes. 
Ultimately, the problem was caused by a bad PTF that was applied as part of a z/OS maintenance window. The Expert Advice feature was used to pinpoint the relevant maintenance to fix the problem (based on the error messages that were generated). One member of the development team was able to pinpoint the problem using IBM Operations Analytics for z Systems in under 30 minutes … It went from 29 hours to 29 minutes. Moral of the story - IBM Operations Analytics for z Systems would have helped decrease the amount of time required for problem determination. The log analysis provided by IBM Operations Analytics for z Systems would have highlighted the high volume of error messages visually (in both the application views AND the insights (message pattern detection) to determine the scope of the problem (ie which systems are affected) and identify which additional components are affected (ie MQ, IMS, CICS, etc.). Once the focus was narrowed down to the problem area, the Expert Advice feature was used to perform a quick search of the IBM support site to identify a fix for the problem (PTF, technote, white paper, etc.). Another Insurance Company – Customer story 2 Quote: “This tool can quickly prove it is not my fault!” The DB2 support team within the customer shop often spends many hours isolating problems to discover it is not in fact a DB2 problem and needs to be routed to another group. In this specific case in point, there were serious MQ errors and the DB2 team spent hours isolating the problem as an MQ problem. With IBM Operations Analytics for z Systems, it was proven that the team could have gone directly to the source of the issue immediately. This would have saved them hours, and cumulatively days, of spinning unproductive cycles and they could have routed the issue to the internal MQ support team immediately. Large Bank – Customer Story 3 Quote: “Faster than a speeding Bullet! 
” Customer is running a WAS-based On-line Banking Application in a couple of datacenters. Often when they receive a trouble ticket from their external customer (i.e. the user of their online banking application), they cannot determine which datacenter originated the error messages. With IBM Operations Analytics for z Systems’ ability to consolidate logs, they stated they could reduce their initial isolation time significantly (maybe 50%). Government Agency IT department - Customer story 4 Quote: “Talk about Time to Value!” In a recent customer engagement, the client was able to download, install and configure the solution and had an operational environment in 2.5 hrs! Workflow shown on the slide: Search, Analyze, Launch to Support Doc, Resolve, Integrate.

8 IBM Operations Analytics Architecture and Flows
[Architecture diagram. Labels on the mainframe z/OS side: z/OS Syslog, WAS SYSPRINT, WAS SYSOUT, CICS MSGUSR, CICS EYULOG, USS Log Files, Joblogs, NetView Netlog, DB2, SMF Data, z/OS Log Forwarder, SMF Real-time Data Provider, NetView Message Provider, NetView Message Gatherer. Labels on the distributed systems side: z/Linux, Syslog, Web Access Log, WAS SYSOUT, WAS SYSPRINT, DB2, DB2 App, Other Logs, Log File Agent, Logstash, EIF, Script, SNMP. Labels on the Operations Analytics Server: Generic Receiver, Insight Pack (z/OS), Insight Packs, Annotators, Indexers, Search Index, Current/Archive Tier, Hadoop Tier, Applications, Alerts, Alert Actions.]
If you’re presenting to a customer that only cares about consuming mainframe data, then you should use this slide. There is another slide in backup that provides a more complete picture because it includes data coming from OMNIbus and distributed systems as well as z/OS. Note that Syslogd falls under USS Log Files. Distributed systems logs, insight packs, toolkits, etc. are documented here: Hadoop (frozen tier) and alerting are included in the 1Q, 2015 version of the IOA server.
The IBM Operations Analytics server is installed on z System (or x System) running Linux (64 bit). z/OS Insight Packs are installed on the IBM Operations Analytics server. The z/OS Log Forwarder / SMF Data Provider is installed on each z/OS LPAR where you want to provide Search and Analysis.

9 Simple search interface EASY to customize
Log data is analysed and insights are surfaced as you search Find problems you didn’t know existed Save My Search Timeframe Enter search string Search specific logs or ALL logs Quick Searches, Analysis, Annotations, Patterns, Expert Advice, Dashboards will populate the Navigation tree

10 Easy to use – Quick Search
Domain-specific ‘Quick Searches’ available out-of-the-box or create and save your own Provided with every z/OS Insight Pack Provided by subject matter experts, support teams and customers Immediate value out of the box Easy to modify or create and save your own

11 Dashboards, Information Links and Expert Advice
Visualize the data with Dashboards Quick links to additional information and support documents. Provided with every Insight Pack Expert Advice to access white papers, tech notes, APARs, etc. for faster problem resolution Dashboard views created by subject matter experts, support teams and customers Immediate value out of the box Easy to modify or create and save your own

12 Search for expert advice with the click of a button
Quickly and easily access IBM Support Portal based Expert Advice from Log Analysis Search for expert advice with the click of a button All IBM support site documents that reference messages from search results Launch to Tech Note

13 Analyze logs as you Search
Insights are surfaced automatically as you search. Patterns are surfaced based on the log type. Provided with every Insight Pack Logs are analysed automatically Log data is categorized by hostname, data source, message type, message source, etc. Patterns/Insights are surfaced to help you focus on the source of the problem. For example, log analysis automatically surfaces java exceptions in application logs. Perform searches and analyse multiple logs, organized per the needs of your enterprise. Create your own Insight Pack for any text logs with time stamps
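The categorization described on this slide can be illustrated with a minimal Python sketch. It is not the Insight Pack implementation: the record layout (timestamp, host, text), the sample lines, and both regular expressions are assumptions made only for illustration.

```python
import re
from collections import Counter

# Hypothetical sample records; real z/OS SYSLOG or WAS log lines are laid out differently.
SAMPLE_LOG = [
    "2015-10-16T08:00:01 SYSA DSNL511I TCP/IP conversation failed",
    "2015-10-16T08:00:02 SYSA DSNL511I TCP/IP conversation failed",
    "2015-10-16T08:00:03 SYSB IRRB069I RACF is being shut down",
    "2015-10-16T08:00:04 SYSA java.lang.NullPointerException at com.example.App",
]

# Assumed shape of a message ID: 3+ uppercase letters, 3+ digits, one severity letter.
MESSAGE_ID = re.compile(r"\b[A-Z]{3,}\d{3,}[A-Z]\b")
# Assumed shape of a Java exception: dotted package path ending in ...Exception or ...Error.
JAVA_EXCEPTION = re.compile(r"\b(?:[a-z]\w*\.)+[A-Z]\w*(?:Exception|Error)\b")


def summarize(lines):
    """Count records per host, per message ID, and per Java exception class."""
    by_host, by_message, exceptions = Counter(), Counter(), Counter()
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) < 3:
            continue  # skip records that do not match the assumed layout
        _timestamp, host, text = parts
        by_host[host] += 1
        message = MESSAGE_ID.search(text)
        if message:
            by_message[message.group()] += 1
        exception = JAVA_EXCEPTION.search(text)
        if exception:
            exceptions[exception.group()] += 1
    return by_host, by_message, exceptions


by_host, by_message, exceptions = summarize(SAMPLE_LOG)
print(by_message.most_common(1))
```

Counting by message ID is what makes a spike of one specific message stand out from normal traffic, which is the effect the slide describes as surfacing patterns.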

14 Sample dashboard View your log and metric data however you like
You do not need to be stuck with textual output; you can also use visuals and graphs.

15 Sample dashboard View your log and metric data however you like
Out-of-the-box dashboards (Example – Display message counts and java exceptions) OR Build Your Own Dashboard with the click of the mouse

16 Visualizing the Data Search and Analyze SMF Data (New in 2015)
Analyze your SMF data AND your log data for a complete view of the enterprise. CPU utilization, Working Set Size, Paging & IO Rates

17 Create your own – Queries, Dashboards, Feeds
Out-of-the-Box capabilities provide immediate value. Additionally, IOA can easily be tailored to your specific needs. Perform simple free-form searches using the standard set of search keywords and operators Build complex queries with range searches and DateMath functions To learn more, consult Online Help available from the Learn More → Search Bar → Search query syntax menu: BYOD – Bring your own Data – The z/OS Log Forwarder can be configured to forward your text logs to enable Search, Analysis, Dashboards and Expert advice. BYOIP – Build your own Insight Pack BYOV – Build your own Views (Graphs, Charts and Dashboards)

18 Customer Experiences Large Insurance Company (29 hours down to 29 minutes) Experienced an application outage that resulted in the team working around the clock for 29 hours. Multiple customer and IBM support staff pored over logs and traces to determine the root cause of the issue. After the issue was resolved, the logs were captured and sent to IBM lab for analysis using IBM Operations Analytics for z Systems. Within minutes, the IBM team was able to focus in on the root cause of the problem and to find the relevant PTF to resolve the issue through the integrated expert advice. State Agency (up and running in 2.5 hours) Were able to download, install, configure and use IBM Operations Analytics for z Systems to search their logs in 2.5 hours. Numerous Customers (improve visibility and find problems you weren’t aware of) Errors lurking in logs that are never examined because they don’t necessarily cause SLA or performance problems. For example, IBM Operations Analytics for z Systems found over 4,000 invalid login attempts in a three-day period that had otherwise gone unnoticed. MQ channel errors causing MQ errors in logs from distributed systems that were not being monitored. SQL errors in multiple logs.

19 New capabilities in 4Q, 2015 General capabilities (delivered via IBM Operations Analytics – Log Analysis and included with IOAz) Additional real-time alerting actions: SNMP Traps, EIF Events Role-based access control Support for multiple time zones and time intervals Service Desk Extension: Incident and service request analytics z/OS capabilities (included in the z/OS Insight Pack) Additional CICS insights from SMF 110 and EYULOG Additional network insights from NetView netlog Security insights Pattern-based configuration for z/OS Log Forwarder job log data gatherer Additional out-of-the-box searches for DB2 and MQ Translation of z/OS Insight Packs (English + 10 languages) and documentation Main Point: Analytics is now a key part of what customers are looking to improve on. As we have seen, analytics can help increase business value and IT metrics. Analytics is about: 1. Predict problems and anomalies – Current product is OMEGAMON V5.1.1 with IBM zAware support and NetView which also includes IBM zAware 2. Search for information, including logs – The current product in this area is SmartCloud Analytics – Log Analysis 3. Optimize analytics for both Business and IT – Capacity Management Analytics (CMA) for z/OS is a suite that includes SPSS, Cognos and TDSz. IBM SmartCloud Analytics - Predictive Insights: Reduce outages and increase service performance with predictive problem detection. IBM® SmartCloud® Analytics – Predictive Insights can provide early problem detection to predict application or middleware problems before they impact service. The software helps you avoid application outages and increase service performance. IBM SmartCloud Analytics – Predictive Insights helps you: Avoid outages to increase application availability and reduce service degradation. Perform faster root cause analysis to isolate problems sooner. Reduce operational costs without the need for complex service models or specialized skills.

20 Alerting actions: SNMP Traps, EIF Events
IOA now enables you to generate SNMP Traps and EIF Events. This is in addition to existing notifications (text, email, etc.). Benefit: Utilize your existing event management tooling to track, highlight, enrich, correlate and act upon conditions that are identified in your operational data by IBM Operations Analytics for z Systems through the use of SNMP Traps, Informs or EIF events. Broaden your scope of automation. Use NetView or other automation tools to take automatic action on any messages or other operational data as long as that data is consumed by IBM Operations Analytics for z Systems. This expands your current automation capabilities to automate on ANY data source that is fed into IBM Operations Analytics. Personas supported: Alice (Subject Matter Novice) Jim (Subject Matter Expert) Zach (Senior Systems Programmer)

21 Role-based access control and audit
Benefit: Role-based access control and auditing capabilities enable customers to maintain compliance with their data segregation and access control requirements. It is of special interest for service provider environments in which segregation of data is of particular importance. Personas supported: Alice (Subject Matter Novice) Eric (Application Developer) Jim (Subject Matter Expert) Zach (Senior Systems Programmer)

22 Support for multiple time zones and time intervals
Benefit: All users connected to a single IOA Log Analytics server, regardless of their location, are able to view search results and graphs in their local time zone or in a different time zone of their choice. This new capability is particularly helpful for teams that are distributed across multiple time zones. Applications can now specify more than a single occurrence of a relative time interval. Instead of specifying “Last Day”, applications can specify “Last 3 Days” for example. Personas supported: Alice (Subject Matter Novice) Eric (Application Developer) Jim (Subject Matter Expert) Zach (Senior Systems Programmer)
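As a rough illustration of the two ideas on this slide (one stored timestamp rendered in each viewer's zone, and relative intervals such as "Last 3 Days"), here is a minimal Python sketch. It does not use any IOA interface; the zone table, function names and sample timestamp are hypothetical, and fixed UTC offsets stand in for real time zones.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC offsets keep the sketch dependency-free. A real deployment
# would look zones up in a time zone database so daylight saving is handled.
USER_ZONES = {
    "new_york_edt": timezone(timedelta(hours=-4), "EDT"),
    "tokyo_jst": timezone(timedelta(hours=9), "JST"),
}


def to_user_zone(utc_timestamp, zone_key):
    """Render one stored UTC timestamp in the viewer's chosen zone."""
    return utc_timestamp.astimezone(USER_ZONES[zone_key])


def last_n_days(now, n):
    """Return the (start, end) bounds of a relative interval such as 'Last 3 Days'."""
    return now - timedelta(days=n), now


# One log record timestamp, stored once in UTC on the server.
event_utc = datetime(2015, 10, 16, 14, 30, tzinfo=timezone.utc)
new_york_view = to_user_zone(event_utc, "new_york_edt")
tokyo_view = to_user_zone(event_utc, "tokyo_jst")
window_start, window_end = last_n_days(event_utc, 3)
print(new_york_view.isoformat(), tokyo_view.isoformat())
```

Storing a single UTC instant and converting only at display time is what lets distributed teams see the same event at different local clock times without disagreeing about when it happened.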

23 Integration with Service Management Solutions
IOAz integrates with Monitoring, Automation and Event Management Automation NetView / SA (or other Automation tooling) Receive and enrich, action or forward Events from ANY log source (not just Syslog) Event Management Netcool Operations Insights (NOI) Launch to IOAz to analyze logs and metrics (IOA is included with NOI) Search and analyze Events Receive, correlate, enrich and action Events from IOAz (NOI or other Event Management System) Incident Management IBM Service Desk (or other incident management / trouble ticketing solutions) Generate Events to create Trouble Tickets Analyze Trouble Tickets Monitoring OMEGAMON Launch in context to IBM Operations Analytics from OMEGAMON and ITM workspaces OMEGAMON Insight Pack to analyze ITM logs (RKLVLOG) Service Management Unite (included with Performance Management and Service Management Suites) Launch in context to analyze logs and SMF data in context of performance problem diagnosis

24 Event Management and Automation
Using IOAz to broaden the scope of Event Management and Automation

25 Be Proactive! Enhance your Visibility & Automation Capabilities
IOAz can generate notifications for messages from any log in your enterprise. Event processing Generate Events from ANY log message(s) or other data in IOA Notifications can be in the form of: Text message SNMP Trap EIF Event Increase scope of log monitoring and automation Improve event correlation Be Proactive!

26 Getting the most out of IOA notification capabilities
IBM Operations Analytics provides the ability to generate events based on messages, combination of messages over time, number of occurrences, etc. Notifications can be generated from any data source: Messages from Mainframe and Distributed Logs, SMF data, Events, Other. Examples include: Send an email or text message whenever a specific message(s) is written to a log, for example message IRRB069I (RACF is being shut down). Generate an SNMP Trap or EIF event when there are more than 500 failed logon attempts in a 30 minute period.
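The second example (more than 500 failed logon attempts in a 30 minute period) is a count-over-time condition. A minimal Python sketch of one way to evaluate such a condition follows; it is a sliding-window counter written for illustration, not the product's alert runtime, and the class and method names are hypothetical. It assumes events arrive in timestamp order.

```python
from collections import deque
from datetime import datetime, timedelta


class ThresholdAlert:
    """Fire when more than `threshold` matching events fall inside `window`."""

    def __init__(self, threshold, window):
        self.threshold = threshold
        self.window = window
        self.timestamps = deque()  # timestamps of matching events still in the window

    def observe(self, timestamp):
        """Record one matching event; return True if the condition is now met."""
        self.timestamps.append(timestamp)
        cutoff = timestamp - self.window
        # Drop events that have aged out of the window.
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        return len(self.timestamps) > self.threshold


alert = ThresholdAlert(threshold=500, window=timedelta(minutes=30))
start = datetime(2015, 10, 16, 9, 0, 0)
# Simulate 501 failed logons, one per second.
results = [alert.observe(start + timedelta(seconds=i)) for i in range(501)]
print(results[499], results[500])
```

When `observe` returns True, a notification (email, text message, SNMP Trap or EIF event) would be sent; that delivery step is deliberately left out of the sketch.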

27 Send notifications in many forms …
Index alert action (i.e. send events back into IOA so they can be searched): You can use the index alert action template to index any triggered alerts. Email / Text alert action: You can use the email template to send an email when a condition is met. Emails can easily be sent as text messages by most carriers. EIF alert action: You can use the EIF template to send an EIF formatted event when a condition is met. SNMP Trap alert action: You can use the SNMP Trap template to send an SNMP Trap when a condition is met. Script alert action: You can use the Script template to execute a custom script when a condition is met. Write to Log alert action: You can use the Write to Log template to write an entry to a log file of your choice when a condition is met.
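One way to picture the action templates is as named handlers that a triggered alert is fanned out to. The Python sketch below models only that idea. The class, the action names and the alert fields are hypothetical and are not the product's configuration format; the SNMP, EIF and email actions are left as handlers you would register yourself, since sending them requires real network libraries.

```python
import json


class AlertDispatcher:
    """Fan one triggered alert out to the alert actions configured for it."""

    def __init__(self):
        self.indexed = []    # stands in for "index the alert so it can be searched"
        self.log_lines = []  # stands in for "write an entry to a log file"
        self._actions = {
            "index": self.indexed.append,
            "write_to_log": self._write_to_log,
        }

    def _write_to_log(self, alert):
        self.log_lines.append(json.dumps(alert, sort_keys=True))

    def register(self, name, handler):
        """Add another action, for example a script runner or an event sender."""
        self._actions[name] = handler

    def dispatch(self, alert, action_names):
        # Validate first so a typo in one action name does not cause a partial send.
        for name in action_names:
            if name not in self._actions:
                raise KeyError(f"no alert action registered as {name!r}")
        for name in action_names:
            self._actions[name](alert)


dispatcher = AlertDispatcher()
script_calls = []
dispatcher.register("script", script_calls.append)  # placeholder for a custom script
sample_alert = {"message_id": "IRRB069I", "text": "RACF is being shut down"}
dispatcher.dispatch(sample_alert, ["index", "write_to_log", "script"])
print(dispatcher.log_lines[0])
```

Keeping actions behind a name-to-handler table mirrors the slide's point that the same condition can be delivered in several forms without changing how the condition itself is defined.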

28 Sending Events to any Event Receiver
[Diagram: Data Sources 1 to N feed the IOA Server, which contains the Log Ingestion Pipeline, the Alert Runtime and the Alert Actions (Index, EIF, SNMP, Script). Event Receivers shown: Event Management System (NOI, OMNIbus or other Event Management tool), Automation (NetView/SA or other automation tool), and any Event Processor.] IOA can generate standard SNMP Traps and/or EIF Events that can be received and processed by ANY Event Receiver.

29 Event Configuration is Simple
From IOAz: Specify the message or messages to trigger the Notification. Choose the event criteria (message IDs, number of occurrences, time period, etc.). Specify the address of the Event Receiver (hostname/port or email address). From your automation tool: Create an automation statement(s) to: Enrich the Event, Forward the Event, Automate to correct the problem, Other. Increase the scope of automation to include ANY log message. Most z/OS automation tools are limited to z/OS Syslog and Console messages. From your Event Management tool: Enrich the Event, Correlate with other Events and Log Messages, Automate to correct the problem, Create Trouble Tickets, Other. Scenario: MQ environment spanning z/OS and Distributed systems. MQ channel goes down. MQ message is written to distributed system log. IOAz triggers an event from the message in the distributed log. Event is sent to z/OS automation tool (i.e. NetView / SA). Automation restarts the MQ channel. Failure is resolved quickly, avoiding an actual problem. Correlate z/OS Events with Events from distributed systems to resolve problems end-to-end.

30 Event driven automation scenarios
There are many scenarios where events can drive automation. Prior to IOAz, these scenarios were limited to events being driven from Syslog, because most z/OS automation tools only monitor the z/OS Syslog. Since IOAz has access to many more logs than Syslog, we now have the ability to drive automation from messages coming from other logs and even other platforms. We have included just a few examples in the subsequent slides. The possibilities are endless. Benefit: The subject matter expert can now access messages from ANY log in the enterprise. Events coming from IOA can be consumed by ANY Event receiver to automate, enrich, correlate or forward Events or generate trouble tickets. Events can be generated in SNMP or EIF format. As a result, the events can be consumed by any Event Receiver (Event Management or Automation tool). Since IBM Operations Analytics for z Systems can generate events from ANY message it consumes and NetView can act as an event receiver, NetView can now automate on ANY log message (not just messages from Syslog). This scenario will work with any automation tool that can drive automation from events. This feature gives customers the ability to ‘TAKE ACTION’ on any messages being consumed by IBM Operations Analytics for z Systems.

31 Alerting actions: SNMP Traps, EIF Events
Sample scenario for MQ: WebSphere MQ channel stopped abnormally. MQ server runs on Windows with a MQ channel defined to MQ running on z/OS. MQ server detects that the MQ channel to z/OS is not active and writes error messages to the Windows MQ AMQError log. Subsequent MQ communications fail. Without IBM Operations Analytics for z Systems: The ‘Channel down’ message is never proactively observed and the support team(s) struggle for hours to debug the problem and finally re-initiate the Channel. With IBM Operations Analytics for z Systems: IBM Operations Analytics for z Systems detects the problem through MQ error messages written to the Windows MQ AMQError log. IBM Operations Analytics for z Systems generates an SNMP Trap (or EIF event) and forwards it to NetView (or other automation solution). Automation is driven from this event and resolves the problem by issuing a command to restart the MQ channel. Customer Scenario (prior to using IOAz): MQ outage caused several hours of downtime and application failures. Multiple SMEs worked on the issue. MQ issues are often hard to debug. Environment (with IOAz): IOA server (running on System x or System z) receiving data from multiple sources. MQ server running on Windows server. Log File Agent (LFA) sending log data from Windows server into IOA server. NetView is running on z/OS and is driving Event and Message automation (Note that this could be ANY automation tool that can act as an Event receiver). Scenario Overview (with IOAz): MQ channel defined to z/OS system and MQ server on Windows stops abnormally. MQ server generates ‘channel down’ message (AMQ9999). LFA sends AMQ9999 message to IBM Operations Analytics server. IBM Operations Analytics sends SNMP trap (or EIF event) to NetView. NetView issues a command in response to restart the MQ channel. Outage avoided with IOAz!

32 Alerting actions: SNMP Traps, EIF Events
Sample scenario for DB2 DDF: DB2 DDF applications timed out. DB2 runs on z/OS; IBM Operations Analytics for z Systems collects DB2MSTR address space log. The customer applies bulk maintenance for z/OS and DB2 over the weekend. After application of maintenance, DB2 DDF applications experience time-outs. Without IBM Operations Analytics for z Systems: Because maintenance occurs on a Saturday, operators do not catch the resulting problem until later. The DBA is notified on Saturday evening, a PMR is opened against IBM DB2, and diagnostics are started with the DB2 and TCP/IP L2 teams. By Monday morning, none of the agents can run transactions. DB2 and z/OS maintenance have to be backed out. With IBM Operations Analytics for z Systems: IBM Operations Analytics is able to detect the time-out problem immediately after the maintenance is applied. Operators are notified immediately and are able to determine the root cause of the issue. End users do not experience downtime when they come into work on Monday morning. Customer Scenario (prior to using IOAz): Customer applied z/OS and DB2 maintenance during weekend maintenance window. After the maintenance was applied, DB2 DDF applications started to fail due to ‘time-outs’. DBA was finally notified on Saturday evening, after several hours of failures. DB2 and TCP/IP level 2 teams tried to debug the problem. By Monday morning, all transactions were failing. DB2 and z/OS maintenance had to be backed out. 
Environment (with IOAz) IOA server (running on System x or System z) receiving data from multiple sources DB2 is running on z/OS z/OS Log Forwarder sending DB2MSTR address space log data into IOA server NetView is running on z/OS and is driving Event and Message automation (Note that this could be ANY automation tool that can act as an Event receiver) Scenario Overview (with IOAz) DB2 errors written to DB2MSTR address space log after maintenance is applied z/OS Log Forwarder sends messages from DB2MSTR address space log to IBM Operations Analytics server IBM Operation Analytics receives DSNL511I, IXL043I and other DB2 failure messages and sends SNMP trap (or EIF event) to NetView NetView issues commands to collect additional data and forwards the Event to the Event Management system so a trouble ticket can be created for the SME Issue reported immediately with IOAz. Maintenance backed out. Problem avoided!
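As a rough illustration of the scenario overview, the sketch below counts watched message IDs in forwarded log lines and turns them into event records that name follow-up actions. Everything here is an assumption made for illustration: the function names, the simplified message-ID pattern, the `threshold` parameter, and the action labels are hypothetical and do not reflect the product's actual alerting configuration.

```python
import re
from collections import Counter
from typing import Iterable, List

# Simplified pattern for z/OS-style message IDs such as DSNL511I or IXL043I:
# a letter prefix, digits, and a one-letter type code. This is an assumed
# approximation, not a complete grammar for z/OS message IDs.
ZOS_MSG_ID = re.compile(r"\b([A-Z]{3,5}\d{3,5}[A-Z])\b")

# IDs named in the scenario; a real watch list would be longer.
WATCHED = {"DSNL511I", "IXL043I"}


def count_watched(lines: Iterable[str]) -> Counter:
    """Count how often each watched message ID appears in the log lines."""
    counts: Counter = Counter()
    for line in lines:
        for msg_id in ZOS_MSG_ID.findall(line):
            if msg_id in WATCHED:
                counts[msg_id] += 1
    return counts


def build_events(counts: Counter, threshold: int = 1) -> List[dict]:
    """Create one event record per message ID seen at least `threshold` times.

    The action labels are hypothetical placeholders for what the automation
    tool does next (collect more data, open a trouble ticket).
    """
    return [
        {
            "message_id": msg_id,
            "occurrences": seen,
            "actions": ["collect_diagnostics", "open_ticket"],
        }
        for msg_id, seen in sorted(counts.items())
        if seen >= threshold
    ]
```

Raising `threshold` is one simple way to avoid raising an event for a single transient message; whether that trade-off is appropriate depends on the message in question.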

33 Log Analysis and Event Management in Netcool Operations Insight (IOA is included in the box with NOI)
Search and analyze events, logs and metrics using IOA and Netcool Operations Insight. Easily identify ‘related’ events that may be candidates for suppression. Identify ‘difficult to spot’ seasonal events that often result in regular periodic problems. Easily identify which events occur in clusters. Leverage visualizations that help you quickly isolate more severe and significant problems. This also provides opportunities for event reduction, thus improving operational efficiency.

34 Log Analysis – Streamline Incident Management
Incident Management
The traditional incident management process usually begins with one or more trouble tickets being opened for an incident (for example, slow response time for a specific application). The first step is to engage the application support team and associated Subject Matter Experts for each of the application components (WebSphere, CICS, DB2, etc.). Each SME examines data from their specific subsystem, and we usually experience a phenomenon commonly referred to as ‘ticket hopping’. During the ticket hopping phase, the trouble ticket will be reassigned multiple times before it lands in the correct SME’s lap. Over the lifetime of the incident, there is very little collaboration with respect to data and there’s usually a fair amount of ‘finger pointing’. In the post mortem session, we usually conclude that the ‘time to resolution’ is very high and so is the number of people involved in the process of diagnosing the problem.
With IBM Operations Analytics for z Systems: IBM Operations Analytics provides a unified view of the data, enabling the application support team to quickly focus on the problem component. The ability to search and analyze the data helps to quickly identify the problem area, and the expert advice feature assists in finding the solution or workaround. If an SME is needed for a specific component, you can transfer the ticket to that SME with the data that was surfaced by IOAz. Post mortem reveals that time to resolution is decreased by as much as 50%, with less involvement by the SME community. To be more proactive and improve mean time to recovery even more, the team can incorporate the use of IOA notifications (Text, Email, SNMP Trap or EIF Event) to immediately signal that a problem is occurring. Early detection will significantly decrease time to resolution, and automation can be triggered to resolve the issue before the problem affects the end user.
I would like to introduce to you a couple of solutions which demonstrate the use cases of IT Operations Analytics. Firstly, we will talk about the Log Analysis Solution. If we take the example of a traditional incident lifecycle, we see that users report issues to the service desk or monitoring tools generate events. The operations team (L1 support) assigns the incident to a resolver group. Subsequently, the first resolver group engages other teams to drive incident troubleshooting and resolution. This is a time-consuming process, as each of the teams performs troubleshooting in silos and does not have a unified view. The Log Analysis Solution ingests system and sub-system logs from infrastructure and application components to provide a unified, time-sequenced view of logs with the ability to quickly search through massive amounts of data for specific issues. Log analysis enables the team to identify when and where the error happened. This drives swift engagement of the right resolver team(s) in parallel. The key differentiator is the reduction in time to isolate and resolve problems.
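The idea of a unified, time-sequenced view can be illustrated with a small Python sketch that merges already-sorted logs from several components by timestamp and then searches the merged stream. The function names and the line format (`<ISO timestamp> <message>`) are assumptions made for this example; real subsystem logs use their own formats and would need their own parsers.

```python
import heapq
from datetime import datetime
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[datetime, str, str]  # (timestamp, source, message)


def parse(source: str, lines: Iterable[str]) -> Iterator[Record]:
    """Parse lines of the assumed form '<ISO timestamp> <message>'."""
    for line in lines:
        stamp, _, message = line.strip().partition(" ")
        yield datetime.fromisoformat(stamp), source, message


def unified_view(logs: Dict[str, Iterable[str]]) -> Iterator[Record]:
    """Merge per-source logs into one stream ordered by timestamp.

    Each input log must already be in time order, which is the normal case
    for an append-only log file.
    """
    streams = [parse(source, lines) for source, lines in logs.items()]
    return heapq.merge(*streams, key=lambda record: record[0])


def search(records: Iterable[Record], term: str) -> List[Record]:
    """Return the records whose message contains `term`, ignoring case."""
    term = term.lower()
    return [record for record in records if term in record[2].lower()]
```

Because the merge is lazy, the sketch never loads a whole log into memory; a search across components then shows when and where an error first appeared, which is the information a resolver team needs to route the incident.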

35 Integration with Performance Monitoring
OMEGAMON + IBM Operations Analytics – Launch in Context from TEP
The One-Two Punch: Combine two very powerful tools to ensure performance and high availability of your enterprise.
Perform log analysis in context of OMEGAMON workspaces: this approach enables OMEGAMON users to perform in-context log analysis while doing problem determination. From your OMEGAMON workspace, use the IOA search bar to search logs (using LPAR or Sysplex as the default context). Easy to implement: configure TEP to display the IOA search bar.
Launch IOA from OMEGAMON performance monitoring workspaces to search logs in context. You need to install the following maintenance to enable the TEP launch-in-context to Operations Analytics for z Systems.
Required changes to distributed components:
ITM TEPS: Provisional fix TIV-ITM-FP0004-IV67740 Obtain FP5 fix by subscribing to:
Required changes to z/OS components:
PARMGEN: FMID HKCI310, Interim Feature APAR OA46184 (PTF UA76016) Obtain fix:
ITM 630 z/OS TEMA update: FMID HKDS630, APAR OA46976 (PTF UA76202, available 2/28/15) Obtain fix:
OMEGAMON XE for WebSphere MQ Monitoring: FMID HKMQ730, APAR OA46839 (PTF UA76091, available 2/28/15) Obtain fix:
OMEGAMON XE for WebSphere Message Broker Monitoring: FMID HKQI730, APAR OA46840 (PTF UA76092, available 2/28/15) Obtain fix:
OMEGAMON XE for Storage: FMID HKS3530, APAR OA46871 Subscribe and obtain fix:

36 Search and Analyze Operational Data in Context
Select a row first. In this example, a row specifies a Queue Manager. Specify a search string and timeframe to analyze operational data from the appropriate system(s)

37 Analysis of Operational Data
Launch into IBM Operations Analytics to analyze logs and other operational data to gain additional perspective and insights and help diagnose root cause.
IBM Operations Analytics analyzes log, metric and event data and surfaces insights
Built on industry expertise
Expert Advice for faster time to resolution
Expand analysis to include additional data sources (from mainframe and distributed systems)

38 Integration with existing Service Management solutions
(in a nutshell) Powerful tools integrate to ensure performance and high availability of your enterprise.
Surface anomalies: IBM zAware
Automation & Problem Determination: NetView
Performance Monitoring: ITM/OMEGAMON
Event Management: OMNIbus/NOI
Incident Management: Control Desk
Alert, enrich, correlate and automate
Service Management Unite
IBM Operations Analytics: Search and analyze logs, metrics, events and incident reports. Launch from ITM, OMEGAMON, Service Management Unite & NOI

39 Send us your logs! Or Take IOAz for a Test Drive
Request a product demo using logs from your own test, development or production environments
IBM will load your logs into an IBM Operations Analytics server, then demo the results back to you
A secure, dedicated drop box will be assigned to you
You will be sent detailed upload instructions via email
Any file uploaded will be automatically moved to a dedicated IBM Operations Analytics environment within 24 hours
All log data will be purged from the IBM Operations Analytics environment within 48 hours after the demo event
To request your hosted demo, visit:
Or Take IOAz for a Test Drive
A guided demo is provided online at:

40 IOA for z Systems Early Access and Beta Program https://ibm.biz/BdEkZV
Announcing the IBM Operations Analytics for z Systems Early Access and Beta Program! In 2015, we built on the strong foundation established over recent months as we develop and implement our product roadmap. We are looking for customers and business partners worldwide who would like to help influence our roadmap and test new capabilities. The program is open-ended; interested participants may join at any time and stay on as long as they wish. That said, it is our desire to establish a set of “customer sponsor” relationships that will become instrumental in shaping the future of our offering. To see the full program announcement, and to learn how to sign up, please visit us in our developerWorks community at:

41 Additional IBM Operations Analytics Reference Material
Analytics Overview Video
IOA for z Systems videos: Overview: Domain Insights: Installation and Configuration:
IOA for z Systems Documentation Knowledge Center:
IOA – Log Analysis (server) Documentation
Service Management Connect
Knowledge Center
