Alerts and Monitors w/ SNMP

Symantec OpsCenter Alerts and Monitors w/ SNMP

Covered
OpsCenter monitoring
SNMP
Veritas Operations Manager monitors
Other monitors: Command Central, ???

Why Monitor?
Notifications of “critical” failures
Aggregation of data (graphs, performance)
Causal analysis (what failed when/where)
Historical performance
Recurring failures (e.g. “X on a Tuesday does this”)

What Should I Monitor?
K.I.S.S. (to start)
Depends on agency requirements
Critical services at the very least
More monitoring means more load
What constitutes a failure (or need for notification)?
Some things are department-driven (e.g. the Director of Y needs to know “blah happened”)

SNMP v. Email
Can use both
SNMP software can generate alerts, tickets, etc.
SNMP software gathers all data in one place for performance analysis
SNMP software can route alerts to separate support groups
Aggregation of data, alerts, and failures

Which SNMP Software to Use
Try before buying (all have online trials)
Know your learning curve
Nagios (Nagmin, Icinga)
Orion (SolarWinds)
Net-SNMP (backend)
OpenNMS

OpsCenter

OpsCenter Job
High Job Failure Rate: An alert is generated when the job failure rate exceeds the specified rate.
Hung Job: An alert is generated when a job for a selected policy/client hangs for a specified period.
Job Finalized: An alert is generated when a job of a specified type for the specified policy/client ends in the specified status.
Incomplete Job: An alert is generated when a job of a specified type for the specified policy/client moves to an incomplete state.
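To make the High Job Failure Rate condition concrete, here is a minimal sketch of the comparison it describes. The function names, the 20% threshold, and the choice to count every non-zero exit status as a failure are assumptions for illustration only; the slides do not show how OpsCenter computes the rate.

```python
def job_failure_rate(exit_statuses):
    """Return the fraction of jobs whose exit status is non-zero.

    Assumption: any non-zero exit status counts as a failure.
    """
    if not exit_statuses:
        return 0.0
    failed = sum(1 for status in exit_statuses if status != 0)
    return failed / len(exit_statuses)


def should_alert(exit_statuses, threshold=0.20):
    """True when the failure rate is strictly greater than the threshold."""
    return job_failure_rate(exit_statuses) > threshold


# Exit statuses 58 and 40 are the two failures seen in the sample traps.
recent = [0, 0, 58, 0, 40]
print(job_failure_rate(recent))
print(should_alert(recent))
```

A strict "greater than" comparison is used here so that a rate exactly at the threshold does not alert; whether the real product behaves that way is not stated in the source.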

OpsCenter High Job Failure Rate

OpsCenter

OpsCenter Hung Job

OpsCenter Job Finalized

OpsCenter Media
Frozen Media: An alert is generated when any of the selected media is frozen.
Suspended Media: An alert is generated when any of the selected media is suspended.
Exceeded Max Media Mounts: An alert is generated when a media exceeds the threshold number of mounts.
Media Required for Restore: An alert is generated when a restore operation is not running because the required media is unavailable.
Low Available Media: An alert is generated when the number of available media falls below the predefined threshold value.
High Suspended Media: An alert is generated when the number of suspended media exceeds the predefined threshold value.
High Frozen Media: An alert is generated when the number of frozen media exceeds the predefined threshold value.

OpsCenter Catalog
Catalog Space Low: An alert is generated when the space available for catalogs falls below the threshold value.
Catalog Not Backed Up: An alert is generated when no catalog backup has taken place for a predefined time period.
Catalog Backup Disabled: An alert is generated when the catalog backup is disabled.
Tape Mount Request: An alert is generated when a media mount request is pending.
No Cleaning Tape: An alert is generated when no cleaning tapes are left.
Zero Cleaning Left: An alert is generated if a cleaning tape has zero cleanings left.

OpsCenter Disk
Disk Pool Full: An alert is generated when one or more disk pools are full.
Disk Volume Down: An alert is generated when a selected disk volume is down.
Low Disk Volume Capacity: An alert is generated when a disk volume's capacity is running low.
Drive is Down: An alert is generated when a drive in a specified robot/media server in the selected view goes down.
High Down Drives: An alert is generated when the number of down drives exceeds the predefined threshold value.

OpsCenter Host
Agent Server Communication Break: An alert is generated if communication between agent and server is broken.
Master Server Unreachable: An alert is generated when OpsCenter loses contact with the master server.
Lost Contact with Media Server: An alert is generated when OpsCenter loses contact with the media server.
Service Stopped: An alert is generated when the selected services stop on any of the servers in the selected view.
Symantec ThreatCon: An alert is generated when the ThreatCon level is equal to or above the threshold value.
Job Policy Change: An alert is generated when one or more job policies change.

OpsCenter

OpenNMS
Little configuration
Fast learning curve
XML based (needs mib2opennms converter)

OpenNMS

OpenNMS
Received unformatted enterprise event (enterprise: generic:6 specific:3). 12 args:
="nmsserver03"
="2457 Active Job Completed with Exit Status 58"
="Alert Raised on: May 6, :38 PM .Job: Tree Type : Server .Tree Name : ALL MASTER SERVERS .Nodes : hsmasterserver .Job Policy: Solaris_OS-Deduped .Exit Status: 58 (can't connect to client) .Client: hsappserver2 .New State: Done .Alert Policy: Job Failed .OpsCenter Server: vsf02 ."
="Job Failed"
=""
=""
="vsf02"
=""
=""
=""
="Warning"
="Sun May 06 18:38:04 MST 2012"

OpenNMS
Need MIB:
/opt/SYMCOpsCenterServer/config/snmp/VERITAS-REG.mib
/opt/SYMCOpsCenterServer/config/snmp/VERITAS-TC.mib
/opt/SYMCOpsCenterServer/config/snmp/VRTS-cc.mib
/opt/SYMCOpsCenterServer/config/snmp/cc_trapd.conf

OpenNMS
/opt/SYMCOpsCenterServer/config/snmp $ head /opt/SYMCOpsCenterServer/config/snmp/VRTS-cc.mib
--
-- defines VERITAS-COMMAND-CENTRAL-MIB MIB
-- Copyright (C) by VERITAS SOFTWARE Corporation.
-- All rights reserved.
VERITAS-COMMAND-CENTRAL-MIB DEFINITIONS ::= BEGIN

OpenNMS Command Central? I thought we were using OpsCenter.

OpenNMS
/opt/SYMCOpsCenterServer/config/snmp $ head /opt/SYMCOpsCenterServer/config/snmp/VRTS-cc.mib
--
-- defines VERITAS-OPSCENTER-MIB MIB
-- Copyright (C) by VERITAS SOFTWARE Corporation.
-- All rights reserved.
VERITAS-COMMAND-CENTRAL-MIB DEFINITIONS ::= BEGIN

OpenNMS Much Better!

OpenNMS
-- Trap Variables
ccTrapVarsGroup OBJECT-GROUP
    OBJECTS { alertRecipients, alertSummary, alertDescription, policyName,
              objectType, collectorName, ccHost, sourceId, ccObject,
              sampleData, ccAlertSeverity, ccAlertTime }
    STATUS current
    DESCRIPTION "Group for CC Trap VarBinds"
    ::= { ccTrapDefinitionsBranch 100 }

ccTrapVarsBranch OBJECT-IDENTITY
    STATUS current
    DESCRIPTION "Branch of the CC MIB for VarBind Definitions"
    ::= { ccTrapDefinitionsBranch 1 }
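The twelve names in ccTrapVarsGroup line up with the "12 args" of the unformatted event shown earlier. The sketch below pairs them up by position. It assumes the varbinds arrive in the same order as the OBJECTS clause, which is consistent with the sample trap (recipient first, host seventh, severity eleventh) but is not confirmed by the slides. The helper name `name_varbinds` is made up for this example.

```python
# Names copied from the OBJECTS clause of ccTrapVarsGroup.
CC_TRAP_VARS = [
    "alertRecipients", "alertSummary", "alertDescription", "policyName",
    "objectType", "collectorName", "ccHost", "sourceId", "ccObject",
    "sampleData", "ccAlertSeverity", "ccAlertTime",
]


def name_varbinds(values, names=CC_TRAP_VARS):
    """Pair positional trap arguments with MIB names (assumed same order)."""
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} values, got {len(values)}")
    return dict(zip(names, values))


# The 12 args from the unformatted event; the description is shortened here.
args = [
    "nmsserver03",
    "2457 Active Job Completed with Exit Status 58",
    "(alert description text, shortened for this example)",
    "Job Failed", "", "", "vsf02", "", "", "",
    "Warning",
    "Sun May 06 18:38:04 MST 2012",
]
named = name_varbinds(args)
print(named["ccHost"], named["ccAlertSeverity"])
```

This is roughly the job the MIB-derived event definition does for OpenNMS: once the names are known, the trap stops being a list of anonymous strings.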

OpenNMS
/opt/opennms/etc/events
Command Central: /opt/opennms/etc/events/VRTS-cc.events.xml
OpsCenter: /opt/opennms/etc/events/VRTSopsctr.events.xml
VOM: /opt/opennms/etc/events/VRTSsfm.events.xml

OpenNMS
/opt/opennms/etc/events $ diff /opt/opennms/etc/events/VRTS-cc.events.xml /opt/opennms/etc/events/VRTSopsctr.events.xml
2c2
<
---
>
11c11
< <mevalue>0</mevalue>
---
> <mevalue>6</mevalue>
18,19c18,19
< <uei>uei.opennms.org/mib2opennms/ccCritical</uei>
< <event-label>VERITAS-COMMAND-CENTRAL-MIB defined trap event: ccCritical</event-label>
---
> <uei>uei.opennms.org/mib2opennms/opsCritical</uei>

OpenNMS
<events>
 <event>
  <mask>
   <maskelement>
    <mename>id</mename>
    <mevalue> </mevalue>
   </maskelement>
   <maskelement>
    <mename>generic</mename>
    <mevalue>6</mevalue>
   </maskelement>
   <maskelement>
    <mename>specific</mename>
    <mevalue>1</mevalue>
   </maskelement>
  </mask>
  <uei>uei.opennms.org/mib2opennms/opsCritical</uei>
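The mask in this event definition picks out traps by three fields: enterprise id, generic type, and specific type. As a rough illustration of that idea (not OpenNMS's actual matching code), the sketch below checks a trap against a mask by simple equality. The enterprise OID is blank on the slide, so `HYPOTHETICAL_ENTERPRISE` is a placeholder value, and any wildcard handling a real matcher may offer is deliberately left out.

```python
# Placeholder only: the real enterprise OID is not shown on the slide.
HYPOTHETICAL_ENTERPRISE = ".1.3.6.1.4.1.99999"


def matches_mask(trap, mask):
    """Return True when every mask element equals the trap's value.

    Values are compared as strings, since XML mask values are text.
    """
    return all(str(trap.get(name)) == str(value) for name, value in mask.items())


# Mask taken from the opsCritical definition: generic 6, specific 1.
ops_critical_mask = {"id": HYPOTHETICAL_ENTERPRISE, "generic": 6, "specific": 1}

critical_trap = {"id": HYPOTHETICAL_ENTERPRISE, "generic": 6, "specific": 1}
# The unformatted event shown earlier arrived as generic 6, specific 3.
other_trap = {"id": HYPOTHETICAL_ENTERPRISE, "generic": 6, "specific": 3}

print(matches_mask(critical_trap, ops_critical_mask))
print(matches_mask(other_trap, ops_critical_mask))
```

A trap that matches no mask is what shows up as an "unformatted enterprise event", which is why loading the right event file matters.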

OpenNMS
opsWarning trap received
alertRecipients=vsf03p
alertSummary=2587 Active Job Completed with Exit Status 40
alertDescription=Alert Raised on: May 25, :13 AM .Job: Tree Type : Server .Tree Name : ALL MASTER SERVERS .Nodes : wayback .Job Policy: Windows_OS-File-Users-Deduped .Exit Status: 40 (network connection broken) .Client: mfp01 .New State: Done .Alert Policy: Job Failed .OpsCenter Server: vsf02 .
policyName=Job Failed
objectType=
collectorName=
opsHost=vsf02
sourceId=
opsObject=
sampleData=
opsAlertSeverity=Warning
opsAlertTime=Fri May 25 01:13:34 MST 2012

Description
A Warning alert trap from Symantec OpsCenter.
alertRecipients hsphxvsf03p;
alertSummary 2587 Active Job Completed with Exit Status 40;
alertDescription Alert Raised on: May 25, :13 AM .Job: Tree Type : Server .Tree Name : ALL MASTER SERVERS .Nodes : wayback .Job Policy: Windows_OS-File-Users-Deduped .Exit Status: 40 (network connection broken) .Client: mfp01 .New State: Done .Alert Policy: Job Failed .OpsCenter Server: vsf02 .;
policyName Job Failed;
objectType ;
collectorName ;
opsHost hsphxvsf02;
sourceId ;
opsObject ;
sampleData ;
opsAlertSeverity Warning;
opsAlertTime Fri May 25 01:13:34 MST 2012;
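The alertDescription value packs several "Key: value" pairs into one string, separated by " ." markers. A minimal sketch of pulling those apart follows. The separator and the function names are assumptions read off the sample trap above, not a documented OpsCenter format, and the sample string omits the leading "Alert Raised on" field.

```python
import re

# Shortened from the sample trap; field separators are " ." as in the slide.
ALERT_DESCRIPTION = (
    "Job: Tree Type : Server .Tree Name : ALL MASTER SERVERS .Nodes : wayback "
    ".Job Policy: Windows_OS-File-Users-Deduped "
    ".Exit Status: 40 (network connection broken) .Client: mfp01 "
    ".New State: Done .Alert Policy: Job Failed .OpsCenter Server: vsf02 ."
)


def parse_alert_description(text):
    """Split on ' .' and then on the first ':' of each piece."""
    fields = {}
    for piece in text.split(" ."):
        key, sep, value = piece.partition(":")
        if sep:  # skip empty trailing pieces that contain no colon
            fields[key.strip()] = value.strip()
    return fields


def exit_status(fields):
    """Return (code, reason) from 'Exit Status', or None if it is absent."""
    match = re.match(r"(\d+)\s*\((.*)\)", fields.get("Exit Status", ""))
    if match is None:
        return None
    return int(match.group(1)), match.group(2).strip()


fields = parse_alert_description(ALERT_DESCRIPTION)
print(fields["Client"], exit_status(fields))
```

With the client and exit status as separate values, the monitoring side can group failures by client or status code instead of matching on free text.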

OpenNMS Set up Alerts

VOM
No need to add recipients
Can call custom scripts in response to event notifications
SNMP: can call commands in response to events

VOM Lots of options

VOM There are easier ways, but we’ll look at this anyway

VOM A Little Easier (maybe too late?)

VOM MIB /opt/VRTSsfmcs/config/snmp/VRTSsfm.mib

VOM
sfmAlertVarGrp OBJECT-GROUP
    OBJECTS { alertTime, ruleName, alertTopic, alertSeverity, alertSource,
              alertMessage, alertDescription, recommendedAction,
              classificationName, alertUserDefinedData }
    STATUS current
    DESCRIPTION "Trap Variable group"
    ::= { viptraps 3 }

VOM
Major   6/13/12 16:00:46   159.36.2.202
uei.opennms.org/vendor/symantec/traps/sfmAlertFS
sfmAlertFS trap received
alertTime=
ruleName=High Usage on Filesystem
alertTopic=event.alert.vom.vm.fs.highusage.warn
alertSeverity=3
alertSource=hsdev11
alertMessage=File system /CIST/u02 violated high usage warn threshold
alertDescription=File system /CIST/u02 violated high usage warn threshold
recommendedAction=none
classificationName=fs
alertUserDefinedData={"fssize": ,"volsize": ,"mountpoint":"/CIST/u02","dgname":"CIST_dg","volname":"CIST_u02","fsid":"{6b356ea8-aa1c-11df-bc ee42}","fstype":"vxfs"}

VOM
sfmAlertFS trap received
alertTime=
ruleName=High Usage on Filesystem
alertTopic=event.alert.vom.vm.fs.highusage.warn
alertSeverity=3
alertSource=dev11
alertMessage=File system /CIST/u02 violated high usage warn threshold
alertDescription=File system /CIST/u02 violated high usage warn threshold
recommendedAction=none
classificationName=fs
alertUserDefinedData={"fssize": ,"volsize": ,"mountpoint":"/CIST/u02","dgname":"CIST_dg","volname":"CIST_u02","fsid":"{6b356ea8-aa1c-11df-bc ee42}","fstype":"vxfs"}
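Because the trap values contain spaces, the "name=value" text cannot be split on whitespace. One workable approach, sketched below, is to split only at the field names listed in sfmAlertVarGrp. The function name is hypothetical and the sample string is a shortened copy of the trap text. A value that happened to contain one of the field names followed by "=" would be split incorrectly.

```python
import re

# Field names copied from the sfmAlertVarGrp OBJECTS clause.
SFM_FIELDS = [
    "alertTime", "ruleName", "alertTopic", "alertSeverity", "alertSource",
    "alertMessage", "alertDescription", "recommendedAction",
    "classificationName", "alertUserDefinedData",
]

SAMPLE = (
    "alertTime= ruleName=High Usage on Filesystem "
    "alertTopic=event.alert.vom.vm.fs.highusage.warn alertSeverity=3 "
    "alertSource=dev11 "
    "alertMessage=File system /CIST/u02 violated high usage warn threshold "
    "recommendedAction=none classificationName=fs"
)


def parse_trap_text(text, names=SFM_FIELDS):
    """Split 'name=value' text at known names so values may contain spaces."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")=")
    matches = list(pattern.finditer(text))
    fields = {}
    for i, match in enumerate(matches):
        # A value runs until the next known name, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        fields[match.group(1)] = text[match.end():end].strip()
    return fields


parsed = parse_trap_text(SAMPLE)
print(parsed["ruleName"], "|", parsed["alertSource"])
```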

VOM
Description
SFM alert FS
alertTime 1339628438;
ruleName High Usage on Filesystem;
alertTopic event.alert.vom.vm.fs.highusage.warn;
alertSeverity 3; critical(1) error(2) warning(3) info(4)
alertSource dev11;
alertMessage File system /CIST/u02 violated high usage warn threshold;
alertDescription
recommendedAction none;
classificationName fs;
alertUserDefinedData {"fssize": ,"volsize": ,"mountpoint":"/CIST/u02","dgname":"CIST_dg","volname":"CIST_u02","fsid":"{6b356ea8-aa1c-11df-bc ee42}","fstype":"vxfs"};
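The alertSeverity varbind is a number whose meaning comes from the enumeration shown in the description: critical(1), error(2), warning(3), info(4). A small sketch of decoding it follows. The class and function names are made up for this example, and returning "unknown" for unexpected values is an assumption rather than VOM behaviour.

```python
from enum import IntEnum


class SfmSeverity(IntEnum):
    """Values taken from the trap description: critical(1) ... info(4)."""
    CRITICAL = 1
    ERROR = 2
    WARNING = 3
    INFO = 4


def describe_severity(alert_severity):
    """Translate the numeric alertSeverity varbind into a label."""
    try:
        return SfmSeverity(int(alert_severity)).name.lower()
    except ValueError:
        # Raised for non-numeric text and for numbers outside 1..4.
        return "unknown"


print(describe_severity("3"))
print(describe_severity(9))
```

Note that the monitoring system may assign its own severity on top of this one: the OpenNMS event list above shows the same trap as Major even though alertSeverity is 3.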

Others Command Central

Others
Command Central
/opt/VRTSccs/VRTSamccs/SNMP/CC/VERITAS-REG.mib
/opt/VRTSccs/VRTSamccs/SNMP/CC/VERITAS-TC.mib
/opt/VRTSccs/VRTSamccs/SNMP/CC/VRTS-cc.mib

Others
Learn your alert reporting tools
Learn your SNMP software, even if you’re not the administrator of it.

Questions