Improved Scripting of IDS Alarms and Events Thomas Horner Senior DBA/S1 Corporation Informix User Forum 2005 Moving Forward With Informix Atlanta, Georgia.


Improved Scripting of IDS Alarms and Events Thomas Horner Senior DBA/S1 Corporation Informix User Forum 2005 Moving Forward With Informix Atlanta, Georgia December 8-9, 2005

Overall Objectives
Enhancements to the supplied scripts
Help prevent unnecessary late-night pages or cell phone calls
Be proactive in monitoring dbspaces
The same shells can be used for 7.x, 9.x, and 10.x IDS engines

Presentation Overview
What does IBM/Informix supply?
Purpose of these custom shells
Overall design of the shells
Details of the alarm shell
Changes made to the evidence shell
Details of the “LookatSpace” shell
Other shells I use for administration
Limitations of these shells

IBM Supplied Scripts
alarmprogram.sh, log_full.sh, no_log.sh, and evidence.sh are supplied by IBM/Informix
The IDS 9.4+ and 10.x alarm program is improved over the older versions
  – it gathers additional data for certain alarms
  – it sends email to and/or pages the DBA
  – it recognizes the automatic log alarms
The first two functions are in my alarm shell, but not the last one

IBM onconfig Parameters
ALARMPROGRAM onconfig parameter
  – set to appropriate value (full path name)
ALRM_ALL_EVENTS onconfig parameter
  – set to 1
SYSALARMPROGRAM onconfig parameter
  – set to appropriate value (full path name)
DYNAMIC_LOGS onconfig parameter
  – this needs to be 1 or 0 for my alarm shell
  – all available space in the log dbspace is allocated up front
  – this is a design decision

Purpose of these Shells
Alarm Shell
  – combines functions of the “default” programs and adds features
Evidence Shell
  – matches the design of this program with the alarm program changes
LookatSpace Shell
  – gives the DBA “advance” notice of possible space issues

Purpose of these Shells (cont’d)
Other shells used to monitor and administer the databases:
  – check database shell: quick check of engine status
  – onchecks shell: performs oncheck commands weekly
  – update statistics shell: performs scheduled update statistics
  – prune log shell: prunes the online log and other logs

Overall Design of Shells
Alarm and Evidence Shells
  – add functionality to the supplied default programs
  – do not change how the shells are used by the Informix engine
LookatSpace Shell
  – run on a scheduled basis to check for low space that may not be obvious from simple onstat -d output
Other Shells
  – run on a daily or weekly schedule to perform other administrative functions

Overall Design of Shells (cont’d)
All Shells
  – can be used for multi-instance installations and multiple production databases in one instance
  – can be used across 7.x, 9.x, and 10.x engines

Installation
These are currently installed on four production servers and several test servers on the following versions:
  – IDS Version 7.24 on HPUX
  – IDS Version 9.21 on HPUX
Other installations are successfully using them (based on emails I have received)
Requires a means of notifying the DBA team and the Data Center

Alarm Program – Overview
Five parameters are passed from the instance:
  – Severity (severity) ranges from 1 through 5
  – Class_ID (class_id) contains the message ID that caused the alarm
  – Message (class_msg) contains the actual text of the alarm
  – Additional Text (specific_msg)
  – Event File (see_also)
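As a minimal sketch of how a shell might pick up these five positional parameters, the function below copies them into named variables and writes them in a layout similar to the alarm.log output shown later in the deck. The function name log_alarm_params is hypothetical, and this is not the author's actual code.

```sh
#!/bin/sh
# Hypothetical helper: name the five positional parameters passed by the
# engine and print them in an alarm.log-like layout.
log_alarm_params() {
    severity=$1       # 1 through 5
    class_id=$2       # ID of the message class that raised the alarm
    class_msg=$3      # text of the alarm
    specific_msg=$4   # additional text
    see_also=$5       # event (reference) file, may be empty

    printf '%s alarm.sh got event %s\n' "$(date)" "$class_id"
    printf 'severity : %s\n' "$severity"
    printf 'message : %s\n' "$class_msg"
    printf 'additional text: %s\n' "$specific_msg"
    printf 'reference file : %s\n' "$see_also"
}
```

A real alarm program would call something like `log_alarm_params "$@" >> "$ALARM_LOG"`, where ALARM_LOG is an assumed variable holding the path of the alarm log file.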

Alarm Program – Functions Added
Set the proper level of notification based on alarm severity
Prevent overload of machine resources and email caused by duplicate or multiple alarms for the same issue
Reduce “false” alarms by using mutex files
Perform logical log backups using ontape
Option for “no notification”
Alarm log file used to record alarms and actions

Alarm Changes – Proper Notification Level
Severity 1 or 2
  – no notification, as recommended by IBM/Informix
Severity 3
  – not critical: email is sent to the DBA team
  – no email if class 6, 15, 21, or 23 (more on why later)
Severity 4 or 5
  – critical: the data center is notified for action and an email is sent to the DBA team for our records
  – no notification if class 6, 15, or 21 (more on why later)
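The severity and class rules above can be sketched as a small decision function. The function name and the returned labels (none, email, critical, deferred) are hypothetical; "deferred" stands for the classes that the slides say are handled by their own section of code.

```sh
#!/bin/sh
# Hypothetical sketch: map severity and class to a notification level.
# Prints one of: none, email, critical, deferred, unknown.
notification_level() {
    severity=$1
    class_id=$2
    case "$severity" in
        1|2)
            echo none ;;                      # never notify for severity 1 or 2
        3)
            case "$class_id" in
                6|15|21|23) echo deferred ;;  # handled by dedicated code
                *)          echo email ;;     # email to the DBA team only
            esac ;;
        4|5)
            case "$class_id" in
                6|15|21) echo deferred ;;     # handled by dedicated code
                *)       echo critical ;;     # data center plus DBA email
            esac ;;
        *)
            echo unknown ;;                   # severity outside 1 through 5
    esac
}
```

Keeping the decision separate from the delivery mechanism makes it easy to change who is contacted without touching the rules.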

Stop Duplicate Alarms
Biggest design change I made from the default alarm programs
Classes 6, 15, and 21 can cause multiple alarms
  – class 6 is “non fatal” Internal Subsystem Failure
  – class 15 is Data Replication Failure
  – class 21 is Online Resource Overflow
Idea for this change came with my first encounter with multiple class 21 alarms
  – caused by a process exceeding the available number of locks (version 7.x engine)
  – hundreds of emails received within a minute – OOPS!

Stop Duplicate Alarms (cont’d)
Separate section of code to handle classes 6, 15, and 21
Class 23 (logical log backup needed) also has a specific section of code to perform log backups
Shell uses distinctly named files in /tmp for these three classes of alarms:
  – /tmp/event${ENV}${FILENO}.`date +%H`
Alarm is considered new if this file in /tmp does not exist or if that file is more than one hour old
The one-hour threshold was a design decision

Stop Duplicate Alarms (cont’d)
Steps used to handle classes 6, 15, and 21:
  – if the alarm severity is less than 3, ignore the alarm
  – if the file in /tmp exists and is less than one hour old: consider this a duplicate alarm of this class and simply log it
  – if the file in /tmp does not exist, or the file is more than one hour old, this is the first alarm of this class: follow the notification protocol and create (or update) the /tmp file for this alarm
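The steps above can be sketched as follows. This is an illustration only: the function names are hypothetical, the marker path is passed in rather than built from ${ENV} and ${FILENO}, and the age test uses `find -mmin`, which is available in GNU and BSD find but is not part of POSIX and may be missing on older platforms.

```sh
#!/bin/sh
# Hypothetical sketch of duplicate-alarm suppression with a marker file.

# Returns 0 (true) when the alarm should be treated as new.
is_new_alarm() {
    marker=$1
    # No marker file: this is the first alarm of this class.
    [ -f "$marker" ] || return 0
    # find prints the path only when the file was last modified more
    # than 60 minutes ago (assumes a find that supports -mmin).
    [ -n "$(find "$marker" -mmin +60 2>/dev/null)" ]
}

# Prints the action to take: ignore, notify, or duplicate.
handle_repeating_alarm() {
    severity=$1
    marker=$2
    if [ "$severity" -lt 3 ]; then
        echo ignore
        return 0
    fi
    if is_new_alarm "$marker"; then
        touch "$marker"     # create the marker or refresh its timestamp
        echo notify         # caller follows the notification protocol
    else
        echo duplicate      # caller simply logs the alarm
    fi
}
```

For example, `handle_repeating_alarm 3 /tmp/event.demo` prints `notify` on the first call and `duplicate` on calls made within the following hour; /tmp/event.demo is a made-up file name.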

Alarm – Real alarm.log output
Fri Jul 19 09:40:24 EDT 2002 alarm.sh got event 21
  severity : 3
  message : OnLine resource overflow: 'Locks'.
  additional text: Lock table overflow - user id 106, session id
  reference file :
Fri Jul 19 09:40:30 EDT 2002 alarm.sh got event 23
  severity : 2
  message : Logical Log Complete.
  additional text: Logical Log Complete.
  reference file :
Fri Jul 19 09:40:39 EDT 2002 alarm.sh got event 18
  severity : 2
  message : Log Backup completed:
  additional text: Logical Log Backup Completed
  reference file :

Alarm – Real alarm.log output (cont’d)
Fri Jul 19 09:40:39 EDT 2002 Multiple alarms - class 21, severity 3.
Fri Jul 19 09:40:40 EDT 2002 Multiple alarms - class 21, severity 3.
Fri Jul 19 09:41:02 EDT 2002 Existing class 21 issue - no notification needed.
Fri Jul 19 09:41:03 EDT 2002 Multiple alarms - class 21, severity 3.
Fri Jul 19 09:41:03 EDT 2002 Multiple alarms - class 21, severity 3.
Fri Jul 19 09:41:05 EDT 2002 Multiple alarms - class 21, severity 3.
Fri Jul 19 09:41:05 EDT 2002 Existing class 21 issue - no notification needed.
Fri Jul 19 09:41:17 EDT 2002 alarm.sh got event 23
  severity : 2
  message : Logical Log Complete.
  additional text: Logical Log Complete.
  reference file :

Alarm – Perform Logical Log Backups
Make sure no other log backup is running:
  – check for /tmp/ontape.L${ENV}, a mutex file
  – if it does exist, do not start another log backup and notify the DBA team via email
  – not considered critical because this can occur normally when logs turn over quickly
  – if it does not exist, create the /tmp/ontape.L${ENV} mutex file and continue
If the onconfig file has /dev/null for the LTAPEDEV parameter, run ontape -a to free the log, then exit

Alarm – Perform Logical Log Backups (cont’d)
Make sure the engine is up using the “onstat -” command
  – if not, follow the notification protocol (severity is critical)
Make sure the log backup device is ready
  – if not, follow the notification protocol (severity is critical)
Determine the number of the first and last log that will be in this backup file using the “onstat -l” command piped to a grep

Alarm – Perform Logical Log Backups (cont’d)
Note any “missing” log numbers in the log file
Perform the actual log backup using “ontape -a”
If the ontape command fails, follow the notification protocol (severity is critical)
Move, rename, and compress the log backup file using gzip
Remove the mutex file so that the next log backup can run
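The mutex-file portion of this procedure can be sketched as below. The function name run_log_backup and its return codes are hypothetical, and the backup command is passed in as arguments so the sketch can be tried without an Informix engine; in the slides the command is `ontape -a`. Two assumptions are worth stating: the mutex is removed after a failed backup as well as a successful one, and the check-then-create sequence is not atomic, so two alarms arriving at the same instant could both pass the check.

```sh
#!/bin/sh
# Hypothetical sketch: guard a log backup with a mutex file.
# Usage: run_log_backup MUTEX_FILE COMMAND [ARGS...]
# Returns 0 on success, 1 if another backup holds the mutex,
# 2 if the backup command failed.
run_log_backup() {
    mutex=$1
    shift
    if [ -f "$mutex" ]; then
        # Not critical: logs can turn over faster than a backup finishes.
        echo "log backup already running, skipping"
        return 1
    fi
    touch "$mutex"
    if "$@"; then
        status=0
    else
        status=2            # caller follows the critical notification path
    fi
    rm -f "$mutex"          # allow the next log backup to run
    return "$status"
}
```

A call matching the slides would look like `run_log_backup "/tmp/ontape.L${ENV}" ontape -a`, with the engine and device checks performed before it and the move, rename, and gzip steps after it.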

Alarm – No Notification Option
At the beginning of the alarm program, it looks for a file named alarm.nomail in /usr/informix
The MAILFLAG shell variable is set to “off” if the file exists and to “on” otherwise
Before every statement where a notification is to be sent, the MAILFLAG variable is checked
If MAILFLAG is “off”, do not send email or notify the Data Center
If MAILFLAG is “on”, send email and (if critical) notify the Data Center
You can simply remove the alarm.nomail file to start having notifications sent
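A minimal sketch of this switch follows. The function names are hypothetical, and the echo lines stand in for whatever mail and paging commands a site actually uses, which the slides do not specify.

```sh
#!/bin/sh
# Hypothetical sketch of the "no notification" switch.

# Set MAILFLAG from the presence of the nomail file.
set_mailflag() {
    nomail_file=${1:-/usr/informix/alarm.nomail}
    if [ -f "$nomail_file" ]; then
        MAILFLAG=off
    else
        MAILFLAG=on
    fi
}

# Check MAILFLAG before any notification is sent.
# Usage: notify LEVEL MESSAGE   (LEVEL is "email" or "critical")
notify() {
    level=$1
    message=$2
    if [ "$MAILFLAG" = off ]; then
        echo "suppressed: $message"
        return 0
    fi
    echo "email DBA team: $message"          # placeholder for the mail command
    if [ "$level" = critical ]; then
        echo "notify Data Center: $message"  # placeholder for the paging command
    fi
}
```

Creating the file with `touch /usr/informix/alarm.nomail` silences notifications, for example during planned maintenance, and removing it turns them back on.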

Evidence Program – Overview
Default (supplied) program is called evidence.sh
Normally called by the engine when an assert failure occurs to “gather evidence” for use by IBM/Informix support
Not supplied with 7.2x engines
Specified by the SYSALARMPROGRAM configuration parameter
Twelve parameters are passed to the program
IBM/Informix recommends not changing the functions of this more complex shell

Evidence Program – Issues Addressed
I did change the notification techniques to match those used in the alarm program
Added the use of MAILFLAG to stop notification
Added notification for warnings (email to the DBA team) in addition to failures
Put in appropriate values for the environment variables at the beginning of the program
I do not email the assert failure file (which the default program does) because of its large size
Named the program evidence.${ENV} for use in multiple instances

LookatSpace Program – Purpose
You may think that you have plenty of free space in a particular dbspace
  – one table that requests a large next extent can use up all the remaining free space in the dbspace
  – another table in the same dbspace that also needs additional space can be “out of luck” and a SQL error will be returned to the user
This shell looks for this type of situation and emails any issues found to the DBA team
The DBA team then has time to add a chunk to the dbspace before it becomes critical
We run this once a week on a scheduled basis

LookatSpace – Program Design
Get the name of the database with the largest table in the instance using sysmaster SQL, to get the name of the production database (assumes only one)
Obtain dbspace usage using sysmaster SQL
  – separate out those that contain blobs for use later
Obtain which non-fragmented tables are in which dbspace using SQL
Obtain which fragmented tables are in which dbspace using SQL

LookatSpace – Program Design (cont’d)
Two lists of dbspaces are created
  – we do not put non-fragmented and fragmented tables in the same dbspace
If a dbspace contains no tables or blobs, and has less than 3% free space:
  – assume that this dbspace contains only indexes
  – send email to the DBA team because it is low on space
If a dbspace has non-fragmented tables:
  – obtain table space usage and future needs
  – uses sysmaster SQL

LookatSpace – Program Design (cont’d)
If a dbspace has fragmented tables:
  – obtain table space usage and future needs
  – uses sysmaster SQL
If space is more than 80% used, and the next extent is greater than the free space remaining in the dbspace:
  – send an email to the DBA team
If space is more than 95% used, and the next extent is greater than the available dbspace:
  – add a warning message to that email to the DBA team
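The two thresholds can be sketched as a small function. It assumes the percent-used figure and the page counts have already been read from sysmaster (the SQL itself is not shown in the slides), and the function name, argument order, and returned labels are hypothetical.

```sh
#!/bin/sh
# Hypothetical sketch of the 80% / 95% checks for one table.
# Usage: check_table_space PCT_USED NEXT_EXTENT_PAGES DBSPACE_FREE_PAGES
# Prints one of: ok, email, email_with_warning.
check_table_space() {
    pct_used=$1
    next_extent_pages=$2
    dbspace_free_pages=$3

    # The next extent still fits in the dbspace: nothing to report.
    if [ "$next_extent_pages" -le "$dbspace_free_pages" ]; then
        echo ok
        return 0
    fi
    if [ "$pct_used" -gt 95 ]; then
        echo email_with_warning   # email plus the extra warning text
    elif [ "$pct_used" -gt 80 ]; then
        echo email                # plain low-space email
    else
        echo ok
    fi
}
```

For instance, `check_table_space 85 5000 1000` prints `email`, because the table is over 80% used and its next extent of 5000 pages does not fit in the 1000 free pages.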

LookatSpace – Program Design (cont’d)
If a dbspace contains blobs, check the free space in the dbspace and the number of blobs remaining
If the space available is less than 3% and the number of blobs remaining is less than 20000, send an email with a warning to the DBA team
While the program goes through all these steps, a basic text report (space report) is created
If there are no issues to report, no email is sent, but the space report is always available for review
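The blob check can be sketched in the same style. The function name and arguments are hypothetical, and how the "blobs remaining" figure is obtained is not stated in the slides, so it is simply taken as an input here. The comparison is done with multiplication so that integer division does not round the percentage.

```sh
#!/bin/sh
# Hypothetical sketch of the blob dbspace check.
# Usage: blobspace_is_low FREE_PAGES TOTAL_PAGES BLOBS_REMAINING
# Returns 0 (true) when free space is under 3% of the total and
# fewer than 20000 blobs remain.
blobspace_is_low() {
    free_pages=$1
    total_pages=$2
    blobs_remaining=$3
    # free/total < 3%  is the same as  free*100 < total*3
    [ $((free_pages * 100)) -lt $((total_pages * 3)) ] &&
        [ "$blobs_remaining" -lt 20000 ]
}
```

A caller could write `if blobspace_is_low 20 1000 15000; then ...; fi` to decide whether to send the warning email.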

LookatSpace – Program Design (cont’d)
The report is appended to each week, so a history of space utilization is available for analysis
A future enhancement could include looking at the index dbspaces
  – we have had these unexpectedly fill up when there is more than one large index in the same dbspace
Another enhancement could be to write code to analyze the space utilization reports and obtain trending information

LookatSpace – Sample Email
Space is low in DBSpace dbs1 with tables on Tue Sep 27 05:31:00 EDT 2005 for host sf8pdb1, instance sfarm_shm.
Table vfmtrnaudactvty next extent of pages will use all free pages in dbs1.
Table has pages allocated, pages free, and percent used.
Details are located in the /usr/informix/logs/checkspc.out file.

Other Shells I Use
Check Database Shell
  – checks to see if the engine is up and active on a scheduled basis
  – performs a log move if requested (uses onmode commands)
  – the log move is run from another shell (to prevent issues in case of a hung checkpoint)
  – the log move option is used in our shop for disaster recovery purposes
Onchecks Shell
  – performs basic oncheck commands on a weekly basis
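As a rough sketch of the "is the engine up" test, the function below inspects the banner text printed by `onstat -`. It assumes the banner contains the string "-- On-Line --" when the engine is fully up, which should be verified against the output of the IDS version in use; the function name is hypothetical.

```sh
#!/bin/sh
# Hypothetical sketch: decide from the "onstat -" banner whether the
# engine is on-line. The banner text is passed in as the first argument
# so the check can be exercised without an engine.
engine_is_online() {
    case "$1" in
        *"-- On-Line --"*) return 0 ;;   # banner reports on-line mode
        *)                 return 1 ;;   # any other mode, or no engine
    esac
}
```

In a scheduled script this would be used as `engine_is_online "$(onstat - 2>&1)" || notify critical "engine is not on-line"`, where notify is whatever notification routine the site uses.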

Other Shells I Use (cont’d)
Update Statistics Shell
  – can choose how update statistics is run via input parameters
  – temporarily changes certain Informix environment variables to improve performance while running update statistics
Prune Log Shell
  – archives various log files monthly
  – also archives the online.log

Limitations of these Shells
The shells (except alarm and evidence) are run on a scheduled basis, not on a demand basis
The LookatSpace shell requires that fragmented and non-fragmented tables not be in the same dbspace
The LookatSpace shell does not “predict” when index dbspaces will fill up
Certain thresholds are “hard-coded” in the shells and may need to be changed for your installation
Certain names of files and directories are coded in the shells and may need to be changed for your installation
The latest data gathering enhancements of the 9.4+ supplied alarm program are not in the alarm shell

Review
Alarm program
  – took the IBM/Informix “template” and ideas of others and myself to make it more robust
  – handles multiple alarms and performs log backups
Evidence program
  – took the IBM/Informix “template” and made notification consistent with the alarm program
LookatSpace program
  – helps the DBA team identify space issues before they impact the end user or become an “emergency”
Other shells we use to monitor the engines

Questions and Comments?
To get a copy of these shells, email me; I can package the files and send them to you via email.
The objective here was to prevent the unnecessary page or phone call that may result in fixing something that is actually not broken.
Proactive monitoring of dbspaces using LookatSpace is better than that 3 am page requiring you to add a chunk.
Thank you all for your attention. I hope that these shells enable you to keep better informed about the status of your production systems.
