Jeff Sly Principal IT Architect

Presentation transcript:

Case Study: Nagios @ Nu Skin
Jeff Sly, Principal IT Architect, jsly@nuskin.com
Isn’t it great to be here at the 1st Nagios World Conference? Ethan has done a great job with Nagios!

Who is in the Audience?
How many of you are:
- Suppliers of Nagios or some value add-on for Nagios?
- Customers using Nagios?
- Just implementing Nagios or expanding implementation?
- Using NagiosXI?
I would like to introduce some of our Nu Skin folks here that work on Nagios:
- Nate Broderick, Nagios Systems Engineer
- Scott McWhorter, Production Support

Who is Nu Skin? Direct Selling - Lotions and Potions (or supplements), also Nourish the children

Our Technology Footprint
- Ecommerce – Home grown
- Applications – Java, EJB, ABAP, .Net
- Databases – Oracle, MySQL, MSSQL
- OS – HPUX, Redhat, Windows, VMWare
- ERP – SAP Supply Chain, CRM, FI
- Datacenters – 6 locations in 6 countries
- Offices – 50 Countries

Monitoring Goals
- Monitoring presents operations with a completely integrated global view.
- Good monitoring is proactive; it helps teams prevent problems from becoming outages.
- Good monitoring helps minimize outage downtime, quickly identify the root cause, and contact the correct people.

Centralized Monitoring System

Our Monitoring History
We tried for 10 years…

Do it all in ‘One Tool Projects’
One Monitoring Tool to rule them all:
- Mercury SiteScope
- Remedy Help Desk
- HP OpenView
- Quest Foglight
- Home grown (several)
One monitoring person
- He decided to quit!

Could never get everything
All Failed – We always gave up! Why?
- Servers and agents that were proprietary
- Huge footprint, inefficient performance
- Steep learning curve
- Very expensive
- Updates costly and very time consuming
- System Administrators like their own scripts, where they can see what they are doing

Resulting Monitoring Issues
- Tried to make Operations the clearing house for all warnings and alerts from 10+ tools
- Operations was overwhelmed
- Took 4 process steps and lots of software to notify of critical failures
- Most Administrators set up their own private monitoring to receive warnings
- Many false notifications
- Late notifications

As Is (start of project) Our Business Customers were Unhappy

Old Monitoring Work Flow Four steps to notify system administrator

Step 1: Everything Emails Operations
Everything = critical errors, warnings, information, and false alerts.
[Diagram labels: Email HelpDesk Error Network HP NNM System Scripts Nagios Database Foglight SiteScope 8 BAC Sitescope 6]

Step 2: Operations Opens Email
Remember there are lots of emails, with the critical emails buried among the rest.
[Same diagram as Step 1]

Step 3: Operations Checks Source
Operations decides whether they think this is a critical problem; often it is a junior person trying to decide.
[Same diagram as Step 1]

Step 4: Operations Calls Admin
[Same diagram as Step 1]

Inventory of Existing Checks
- Regular Expression found on Web Page Monitoring
- HTTP Check - Up or Down
- Ping Host Up or Down
- PORT monitoring
- FTP checking
- SMTP checking
- SNMP monitoring - no trap catching yet
- Radius
- DNS monitoring
- Disk Space monitoring
- CPU and Load Average monitoring
- Memory Monitoring
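Most checks in this inventory follow the same Nagios plugin pattern: measure something, compare it to warning and critical thresholds, print one status line, and report a return code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). Below is a minimal sketch of the disk space check. The function name, the 20% and 10% free-space thresholds, and the message wording are illustrative assumptions, not the plugins used at Nu Skin.

```python
import shutil

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate_disk(free_bytes, total_bytes, warn_pct=20.0, crit_pct=10.0):
    """Return (return_code, status_line) for the given free and total space.

    warn_pct and crit_pct are hypothetical thresholds on percent free.
    """
    if total_bytes <= 0:
        return UNKNOWN, "DISK UNKNOWN - total size not available"
    free_pct = 100.0 * free_bytes / total_bytes
    if free_pct < crit_pct:
        code = CRITICAL
    elif free_pct < warn_pct:
        code = WARNING
    else:
        code = OK
    # Text after '|' is performance data that Nagios can graph.
    line = "DISK %s - %.1f%% free | free_pct=%.1f%%" % (LABELS[code], free_pct, free_pct)
    return code, line


def check_path(path="/"):
    """Measure a real filesystem and evaluate it with the default thresholds."""
    usage = shutil.disk_usage(path)
    return evaluate_disk(usage.free, usage.total)
```

A real plugin would print the status line and exit with the return code, for example `code, line = check_path("/")`, then `print(line)` and `sys.exit(code)`.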

Inventory of Existing Checks
- Service monitoring
- Transaction monitoring - page load times – performance graph
- Website click through (Webinject not working)
- Log File monitor – parse for Errors
- Java HEAP, Thread, Threadlock monitoring
- Apache thread and worker count monitors
- Ecommerce shop monitors
- Email can send and receive
- SQL query ODBC (catalog ODBC had bugs)
Later I show which of these we did in Nagios and how.

To Be Happy Customers

Key Ideas
- MoM
- Tool Requirements
- Shared Ownership
- Lowest Level
- Nagios Monitor Method

Idea 1: MoM
Our first “breakthrough” was the idea that even though we needed a centralized view for all monitoring, that did not mean all monitoring had to be done by one monitoring tool. We had to pick a “Manager of the Monitors” (MoM) to bring together the best-of-breed monitoring.

MoM - according to Gartner

Idea 2: Tool Requirements
- Open – not proprietary and closed
- Mainstream – wanted good native support and strong community
- Interface – to 3rd Party Monitoring
- Flexible – adapt to many types of monitoring
- Efficient – minimal footprint on production servers, not chatty on network
- Notification – granular control
- Reliable – good clean architecture
- Usability – GUI interface, reporting

Idea 3: Shared Ownership
Core team:
- Operation of Monitoring Environment: backups, upgrades, & custom plug-ins
- Monitoring Experts
- Training
Monitoring leads in Development & Admin teams:
- Set up own monitors
- Keep own monitors current
- Adjust monitors
- If something is not monitored, it is not the core team's fault

Operations Owned Monitoring
[Diagram labels: Email HelpDesk Error Network Foglight System Scripts BAC HP NNM SiteScope 8 Sitescope 6 Nagios Database]

Team Leads Own Monitoring
[Diagram labels: Operations Network System Scripts SAP Asia Europe Web Database]

How to Guides

How to Setup NRPE - HPUX

Idea 4: Lowest Level
- Handle alerts at the lowest possible level in the organization
- Only forward alerts if not handled at lower levels before they become critical
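The lowest-level idea can be sketched as a small routing rule: the owning team gets the first notifications, and an alert is forwarded upward only if nobody has acknowledged it. The `Alert` class, the `recipients` function, the two-notification cutoff, and the "Operations" escalation target are hypothetical names chosen for illustration; in Nagios itself this behaviour is normally configured with notification escalations rather than written as code.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    service: str
    owner_team: str             # team lead group that owns the monitor
    notification_number: int    # 1 for the first notification, then 2, 3, ...
    acknowledged: bool = False  # True once the owning team has responded


def recipients(alert, escalate_after=2, escalation_team="Operations"):
    """Return the teams to notify for this alert.

    escalate_after and escalation_team are illustrative assumptions.
    """
    if alert.acknowledged:
        return []                       # handled at the lowest level, stay quiet
    if alert.notification_number <= escalate_after:
        return [alert.owner_team]       # the owning team gets the first chance
    return [alert.owner_team, escalation_team]  # unhandled, so forward upward
```

For example, `recipients(Alert("Oracle listener", "Database", 3))` includes Operations, while the same alert with `acknowledged=True` notifies nobody.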

Handle events at lowest level
[Diagram labels: Operations Network System Scripts SAP Asia Europe Web Database]

Only forward unhandled alerts
[Diagram labels: Network System Scripts SAP Asia Europe Web Database]

Idea 5: Nagios Monitor Method
Choose the Nagios monitoring method:
- Active check from the Nagios server (normal)
- Active check performed by a remote client: NRPE, NSClient
- Passive check, listening to 3rd party monitors: NSCA

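Active Local Check
[Diagram: Nagios checks Web directly with HTTP or Ping. Other labels: Unix, Win, DB, DB Monitor]

Active Remote Check - UX
[Diagram: Nagios checks CPU and RAM on Unix through NRPE. Other labels: Web, Win, DB, DB Monitor]

Active Remote Check - Win
[Diagram: Nagios checks CPU and RAM on Win through NSClient. Other labels: Web, Unix, DB, DB Monitor]

Passive 3rd Party Alert
[Diagram: DB Monitor runs a 3rd party check on DB and sends a 3rd party alert to Nagios through NSCA. Other labels: Web, Unix, Win]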
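For the passive case, a third-party monitor hands its result to Nagios instead of being polled. The `send_nsca` client reads one result per line on standard input: host name, service description, return code, and plugin output, separated by tabs. Below is a minimal sketch of building such a line. The function name and the example host and service names are hypothetical.

```python
# Standard Nagios return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def nsca_service_line(host, service, return_code, output):
    """Format one passive service check result for send_nsca's stdin.

    Fields are tab-separated and the record ends with a newline.
    """
    if return_code not in (OK, WARNING, CRITICAL, UNKNOWN):
        raise ValueError("return code must be 0, 1, 2 or 3")
    # Tabs and newlines are the delimiters, so collapse them in the free text.
    clean = " ".join(output.split())
    return "%s\t%s\t%d\t%s\n" % (host, service, return_code, clean)


line = nsca_service_line("db01", "Oracle Tablespace", CRITICAL, "USERS at 97%")
```

The resulting line would be piped to `send_nsca -H <nagios server> -c <send_nsca.cfg>`. The host and service must match a service defined on the Nagios server with passive checks enabled.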

Bonus Idea - Tune
- Tune the database
- Add RAM Drive

Tune the Database
Modify the contents of the [mysqld] section of /etc/my.cnf:
    tmp_table_size=524288000
    max_heap_table_size=524288000
    table_cache=768
    set-variable=max_connections=100
    wait_timeout=7800
    query_cache_size = 12582912
    query_cache_limit=80000
    thread_cache_size = 4
    join_buffer_size = 128K
Info on MySQL tuning and Nagios tuning: http://web3us.com
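The byte values in the settings above are easier to review when converted to MiB. The sketch below parses a copy of the size-related settings with Python's standard `configparser` and prints them. The helper names `to_bytes` and `mysqld_sizes` are illustrative, and only the settings that hold sizes are included.

```python
import configparser

# Size-related settings copied from the slide.
MY_CNF = """\
[mysqld]
tmp_table_size=524288000
max_heap_table_size=524288000
query_cache_size = 12582912
query_cache_limit=80000
join_buffer_size = 128K
"""

SUFFIXES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def to_bytes(value):
    """Convert a my.cnf size such as '128K' or '524288000' to bytes."""
    value = value.strip().upper()
    if value and value[-1] in SUFFIXES:
        return int(value[:-1]) * SUFFIXES[value[-1]]
    return int(value)


def mysqld_sizes(text):
    """Return {setting: bytes} for every entry in the [mysqld] section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {key: to_bytes(val) for key, val in parser["mysqld"].items()}


sizes = mysqld_sizes(MY_CNF)
for name, size in sizes.items():
    print("%-22s %12d bytes = %8.2f MiB" % (name, size, size / 1024 ** 2))
```

Read this way, `tmp_table_size` and `max_heap_table_size` are both 500 MiB. They are set to the same value because MySQL limits in-memory temporary tables to the smaller of the two.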

RAM Drive
Create a RAM disk for Nagios temporary files.
I created a ramdisk by adding the following entry to the /etc/fstab file:
    none /mnt/ram tmpfs size=500M 0 0
Mount the disk using the following commands:
    # mkdir -p /mnt/ram; mount /mnt/ram
Verify the disk was mounted and created:
    # df -k
Modify the /usr/local/nagios/etc/nagios.cfg file with the following tuned parameters:
    temp_file=/mnt/ram/nagios.tmp
    temp_path=/mnt/ram
    status_file=/mnt/ram/status.dat
    precached_object_file=/mnt/ram/objects.precache
    object_cache_file=/mnt/ram/objects.cache
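After editing nagios.cfg it is easy to leave one of these paths pointing at the old location. The sketch below checks that each of the five settings from the slide resolves under the RAM disk mount point. The function names are hypothetical, and the configuration text is embedded as a string so the example runs anywhere; in practice it would be read from /usr/local/nagios/etc/nagios.cfg.

```python
import posixpath

NAGIOS_CFG = """\
temp_file=/mnt/ram/nagios.tmp
temp_path=/mnt/ram
status_file=/mnt/ram/status.dat
precached_object_file=/mnt/ram/objects.precache
object_cache_file=/mnt/ram/objects.cache
"""

# The settings the slide moves onto the RAM disk.
RAM_KEYS = ("temp_file", "temp_path", "status_file",
            "precached_object_file", "object_cache_file")


def parse_cfg(text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def not_on_ramdisk(settings, mount="/mnt/ram"):
    """Return the RAM_KEYS that are missing or point outside the mount."""
    problems = []
    for key in RAM_KEYS:
        path = settings.get(key)
        if (path is None or not path.startswith("/")
                or posixpath.commonpath([mount, path]) != mount):
            problems.append(key)
    return problems


settings = parse_cfg(NAGIOS_CFG)
print(not_on_ramdisk(settings))
```

An empty list means all five settings live on the RAM disk. Anything on a tmpfs mount is lost at reboot, so this layout suits only files that Nagios regenerates.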

Implementation Methodology
- Site Survey
- Inventory existing monitors
- Proof of concept
- Build new environment
- Migrate monitors from each platform to Nagios, one at a time
- Integrate OEM, and to send monitors to Nagios

Three Project Phases
- Deliver something useful in each phase
- Build a level at a time

Phase I
- Set up a pilot of Nagios XI using a trial license.
- Set up Foglight monitoring of JVM (Java Virtual Machine).
- Purchase NagiosXI and consulting support.
- Bring in a consultant for two weeks to help set up the architecture and help us work with the system.
- Documentation web site for Nagios learnings and “How to guides”.
- Define a set of standards and guidelines to follow to aid an effective monitoring process.
- Backups running on the production Nagios server.
- Set up services which aren't being caught right now and move a few of the important services over to the new Nagios XI monitoring system.
- Test Nagios plugins and server performance.

Phase II
- Migrate off of Sitescope 6 and shut it down
- Decommission Foglight
- Clean up the old monitoring server
- Migrate the network team from old Nagios to core NagiosXI system
- Set up standby NagiosXI system, with cron to replicate weekly
- Research missing alerts and add them to the new NagiosXI system

Phase III
Implement Global Monitoring
- Add monitors for existing international systems
- Add monitors using JMX to monitor Java servers
- Nagios Remote Plugin Executor (NRPE) to monitor remotely
- Remote monitoring for Windows servers (NSClient++)
- Implement notification and escalation of alerts
- Add monitors for critical business functions

Phase III continued…
Corporate Enhancements
- Request recurring downtime enhancement from Ethan Galstad
- Automate refresh of NagiosXI standby system
- Build Network Map
- Retire Windows SiteScope
- Add monitors for phone systems
- Add monitors to data center (UPS, temperature, humidity)
- Integrate with SAP Tidal monitoring tool

Phase III continued…
Business
- Business reviews and approves SLA (using business terms)
- Monitor both the Business Functions and the individual point devices that provide the Business Function
- Follow the Sun with Eyes on Glass
Training
- How to set up alerts
- How to receive alerts
- How to report on performance graphs
- Create a new Dashboard for HelpDesk and International IT Staff

Inventory of Monitor Checks

Inventory continued…

Nagios XI Interface

Data Centers in 7 Countries

IT Operations
Goal: Quick Notification & Recovery from Outage
Type of Monitor: Notification of outages with details on which system is down, so we know who to contact
Solution: Migrate from Sitescope, Openview to NagiosXI

IT Team Managers
Goal: Prevention of outage
Type of Monitor: Warnings about conditions before outages occur, allowing corrective actions that will prevent likely outages
Solution: Migrate from Sitescope, Openview to NagiosXI; integrate OEM, SAP and Scripts with Nagios

Summary
- MoM ~ Manager of Managers: allow specialized tools
- Tool Requirements: enough, but not all
- Ownership for implementation: shared
- Handle alerts: lowest level in organization
- Choose Nagios monitoring method

Tips, Tricks & Demos
Nagios XI Large Implementation, Day 3, 2:00, Track 3 (Nate Broderick)
- 3 Demos
- Performance challenges and solutions
- Integrating monitoring solutions
- Oracle
- Migrating from BAC & Foglight
- Customization
- Graphing, and more.