Published by Matthew Kennedy
Modified over 3 years ago
Nagios in the Real World
Dave Williams, Technical Architect
© Bull, 2011
Agenda
- Introduction
  - General Background
  - System Monitoring Background
- Example Implementations of Nagios
  - UK Customer Examples
- Datacentre Monitoring with Nagios
  - What is a Datacentre?
  - Software & Hardware combinations
- Vision
- Conclusions
Background
- UK based
  - Mainframe (IBM & Honeywell)
  - Unix (HP-UX, AIX, Solaris)
  - Network (CASE, 3COM, CISCO)
- Working for Bull
  - French Computer Manufacturer
  - Mainframes, Unix, HPC, Security, Managed Services
Background
- System Monitoring
  - OpenView
  - Netview
  - Open Master
- Open Source Monitoring
  - NetSaint on AIX
  - Nagios
Example Implementations
Crown Office Procurator Fiscal Service
- Responsible for the prosecution of crime in Scotland
- Investigation of suspicious deaths
- Complaints against the Police
- IT Locations in Glasgow & Edinburgh
- Windows at every Court of Justice in Scotland
- AIX / Oracle DB at Glasgow & Edinburgh
Crown Office Procurator Fiscal Service
- Already used Solarwinds for some network monitoring
- Strategy demanded AIX based monitoring & reporting
- Nagios selected in a competitive tender
- Main success points were simplicity and ease of customisation
- Fitted within AIX based distance data replication already in use
Crown Office Procurator Fiscal Service
- Windows systems monitored for CPU, Disk Space etc
- 2 AIX servers monitored for CPU, Disk Space etc
- Two Oracle Instances monitored for performance and DBspace usage
- All alerts shown on monitor screen and, if necessary, sent as SMS Text alerts
- Installed 2005, still working
- Provides backstop to Solarwinds for capacity monitoring on the WAN & LAN
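Checks such as the CPU and disk-space monitoring listed above follow the standard Nagios plugin convention: a plugin prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The following is a minimal sketch of that convention, not the plugin used at this site; the function names and the 80% / 90% thresholds are assumptions made for illustration.

```python
import shutil

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate_usage(used_pct, warn=80.0, crit=90.0):
    """Map a usage percentage onto a Nagios state (thresholds are illustrative)."""
    if used_pct >= crit:
        return CRITICAL
    if used_pct >= warn:
        return WARNING
    return OK


def check_disk_usage(path="/", warn=80.0, crit=90.0):
    """Return (exit_code, status_line) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero size"
    used_pct = 100.0 * usage.used / usage.total
    state = evaluate_usage(used_pct, warn, crit)
    # Text after the "|" is performance data: label=value[UOM];warn;crit
    line = (
        f"DISK {LABELS[state]} - {path} {used_pct:.1f}% used"
        f" | used_pct={used_pct:.1f}%;{warn};{crit}"
    )
    return state, line
```

A wrapper script would print the returned line and pass the returned code to `sys.exit()`, which is how Nagios learns the state.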
Rother District Council
- Working with the community to improve the overall well-being of the District
- Responsible for Waste Collection, Housing, Planning & Building Control
- The District covers some 200 square miles and serves a population of around 90,000 inhabitants
Rother District Council
- Monitoring 20+ Windows Servers for CPU, Disk Utilisation etc
- Monitoring numerous disparate Applications
- Reporting on Availability
- Monitoring Printer status
- Unexpected benefits
North Yorkshire County Council
- Internet Access system for 30,000 pupils
- Monitoring […], internet access, IDS, AV, Webservers
- Reporting on Availability
- Monitoring Service Level Indicators
- Mix of application providers (Scalix, Plesk)
- Mix of appliance systems: Cisco, Panda, Radware, NetEnforcer, MyFilter
North Yorkshire County Council - System Schematic
North Yorkshire County Council
- Uses NRPE to perform active checks on hosts
  - Multi O/S support
    - Debian
    - RedHat
- Uses NSCA to accept check results from Windows
  - Via NagiosEventLog
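With NSCA the remote host decides when to report and pushes the result to the Nagios server, whereas NRPE is polled by the server. The `send_nsca` client reads one tab-delimited record per service result: host name, service description, return code and plugin output. Below is a small sketch of building such a record; the function name and the host and service names in the usage note are hypothetical.

```python
def format_nsca_result(host, service, return_code, output):
    """Build one tab-delimited service check result line for send_nsca."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0, 1, 2 or 3")
    # Tabs and newlines would corrupt the record, so flatten them in the
    # free-text field.
    clean = output.replace("\t", " ").replace("\n", " ")
    return f"{host}\t{service}\t{return_code}\t{clean}\n"
```

For example, `format_nsca_result("winsrv01", "Event Log", 2, "3 errors found")` produces a record that could be piped into `send_nsca -H <nagios-server> -c <config-file>`.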
North Yorkshire County Council
- Scalix running on Redhat Cluster: checking all processes, cluster state etc.
- PLESK Web server
  - Checking availability of web sites via test installation
  - Monitoring disk utilisation and processor utilisation
- AV systems
  - Monitoring availability
  - Checking on AV database
- Myfilter
  - Monitoring filters running
  - Checking that sufficient filters are available
North Yorkshire County Council
- Nagios server runs external loopback test every 20 minutes to confirm external reachability
- PLESK Web server
  - Straightforward implementation of check_http
- NetBackup
  - Monitoring that backups have run
  - Checking that enough backup tapes are available
- Business Availability
  - Define which services constitute a business line
  - 07:00 check: tell support before the customers come on line
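The "business line" idea above can be pictured as rolling several service states up into one. The sketch below takes the worst state among the member services; it is an illustration only, and the severity ordering, the function name and the service names are assumptions rather than the configuration used at NYCC.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Severity ranking used to pick the "worst" state. Treating UNKNOWN as worse
# than WARNING but better than CRITICAL is an assumption of this sketch.
SEVERITY = {OK: 0, WARNING: 1, UNKNOWN: 2, CRITICAL: 3}


def business_line_state(members, service_states):
    """Return the worst state among the services that make up a business line.

    A member with no recorded state counts as UNKNOWN.
    """
    states = [service_states.get(name, UNKNOWN) for name in members]
    if not states:
        return UNKNOWN
    return max(states, key=SEVERITY.__getitem__)


# Hypothetical business line made of three services.
web_access = ["proxy", "filter", "dns"]
current = {"proxy": OK, "filter": WARNING, "dns": OK}
state = business_line_state(web_access, current)  # WARNING (1)
```

Running such a roll-up shortly before users arrive is one way to implement the early-morning check described above.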
NYCC - Nagiosgraph
- Nagiosgraph
  - Uses process_performance_data
  - Example of Unix load average
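Performance data is the part of a plugin's output after the `|`, written as space-separated items of the form `label=value[UOM];warn;crit;min;max`. A graphing tool has to split that string into metrics before it can store them. The parser below is a simplified sketch of that step, not nagiosgraph's own code; it ignores quoted labels and treats range-style thresholds as missing.

```python
import re

# One perfdata item: label=value[UOM];warn;crit;min;max (trailing fields optional).
# Quoted labels containing spaces are not handled in this sketch.
ITEM = re.compile(
    r"^(?P<label>[^=]+)=(?P<value>-?\d+(?:\.\d+)?)"
    r"(?P<uom>[A-Za-z%]*)(?P<rest>(?:;[^;]*)*)$"
)


def _to_float(text):
    """Return text as a float, or None if it is empty or not a plain number."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_perfdata(perfdata):
    """Parse a perfdata string into {label: {"value", "uom", "warn", "crit"}}."""
    metrics = {}
    for token in perfdata.split():
        match = ITEM.match(token)
        if match is None:
            continue  # skip anything that does not look like label=value
        # Drop the empty piece before the first ";" and pad to two fields.
        fields = match.group("rest").split(";")[1:]
        fields += [""] * (2 - len(fields))
        metrics[match.group("label")] = {
            "value": float(match.group("value")),
            "uom": match.group("uom"),
            "warn": _to_float(fields[0]),
            "crit": _to_float(fields[1]),
        }
    return metrics
```

A load-average check typically emits something like `load1=0.420;5.000;10.000;0; load5=0.310;4.000;6.000;0;`, which this function turns into one entry per label.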
NYCC - Nagios Monitoring - Scalix System
NYCC
- Alerts sent via email to customers as well as support
- Backup notifications via SMS Text
- Use Nagios Looking Glass for Customer View
- nagiosgraph used to catch all service performance data
  - Debian & Redhat performance metrics
  - Network throughput from LAN switches
  - LDAP response time
Datacentre Monitoring with Nagios
What is a DataCentre?
- A data center (or datacentre) is a facility used to house computer systems and associated components, such as telecommunications and storage systems. It generally includes redundant or backup power supplies, redundant data communications connections, environmental controls and security devices. (Wikipedia)
How good is your DataCentre?
- The TIA-942: Data Center Standards Overview describes the requirements for the data centre infrastructure. The simplest is a Tier 1 data centre, which is basically a server room, following basic guidelines for the installation of computer systems. The most stringent level is a Tier 4 data centre, which is designed to host mission critical computer systems, with fully redundant subsystems and compartmentalized security zones controlled by biometric access control methods. (Wikipedia)
What is a DataCentre?
- Tier 1 Requirements
  - Single non-redundant distribution path serving the IT equipment
  - Non-redundant capacity components
  - Basic site infrastructure guaranteeing % availability
- Tier 2 Requirements
  - Fulfills all Tier 1 requirements
  - Redundant site infrastructure capacity components guaranteeing % availability
- Tier 3 Requirements
  - Fulfills all Tier 1 and Tier 2 requirements
  - Multiple independent distribution paths serving the IT equipment
  - All IT equipment must be dual-powered and fully compatible with the topology of a site's architecture
  - Concurrently maintainable site infrastructure guaranteeing % availability
- Tier 4 Requirements
  - Fulfills all Tier 1, Tier 2 and Tier 3 requirements
  - All cooling equipment is independently dual-powered, including chillers and heating, ventilating and air-conditioning (HVAC) systems
  - Fault-tolerant site infrastructure with electrical power storage and distribution facilities guaranteeing % availability
(© Uptime Institute)
What is a Green DataCentre?
- The most commonly used metric to determine the energy efficiency of a data centre is power usage effectiveness, or PUE. This simple ratio is the total power entering the data centre divided by the power used by the IT equipment.
- PUE = Total Facility Power / IT Equipment Power
- Power used by support equipment, often referred to as overhead load, mainly consists of cooling systems, power delivery, and other facility infrastructure like lighting. The average data centre in the US has a PUE of 2.0, meaning that the facility uses one Watt of overhead power for every Watt delivered to IT equipment. A state-of-the-art data centre is estimated to achieve a PUE of roughly 1.2.
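The PUE ratio and its "overhead per IT Watt" reading can be checked with a few lines of arithmetic. The sketch below uses hypothetical function names and illustrative power figures; only the formula itself comes from the slide.

```python
def pue(total_facility_kw, it_equipment_kw):
    """Power usage effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    if total_facility_kw < it_equipment_kw:
        raise ValueError("total facility power cannot be below IT power")
    return total_facility_kw / it_equipment_kw


def overhead_per_it_watt(pue_value):
    """Watts of overhead (cooling, power delivery, lighting) per Watt of IT load."""
    return pue_value - 1.0


# Illustrative figures: 200 kW entering the site, 100 kW reaching IT equipment.
example = pue(200.0, 100.0)               # 2.0
overhead = overhead_per_it_watt(example)  # 1.0 W of overhead per IT Watt
```

At a PUE of 1.2 the same calculation gives about 0.2 W of overhead per Watt delivered to IT equipment.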
Bull Datacentre BC1?
- New datacentre build on an already existing site
- Design criteria
  - PUE
  - Easily expanded on demand
  - Tier 3
Bull UK Datacentre BC1 - What do you get for £1.2m?
Bull UK Datacentre BC1
- New Mains Incomer
  - Took feed from 11 kV ring
  - Had to build own substation
- 1.2 MW Generator
  - Required 8000 litre fuel tank
  - Switchgear to automatically start generator if mains incomer fails (10-45 seconds)
- 3 x Ambient CRAC Units
  - Cooling via external temperature differential
  - N+1 configuration
- Hot Aisle Containment
- In-Line UPS
  - UPS only required to keep IT equipment running until generator fires up
  - Uses space in Cab rows, easily scalable according to load
Bull UK Datacentre BC1 - Monitoring
- Physical Environment
  - APC Netbotz Devices: translate inputs from sensors (Humidity, Temperature, Dew Point)
  - SEAL I/O Dry Contact: voltage indicators for CRAC, FM200, Generator, UPS
- Electrical Efficiency
  - PowerLogic ION software reads from power meters
  - Power meter on every Distribution Board
  - Real-time calculation of PUE
- Power Distribution
  - Every PDU strip (2 per Cab) monitored for power consumption & problems
  - A number of PDU strips also have remote control down to socket level
- Management Network
  - LAN infrastructure required to support the Datacentre
  - Servers required to support the datacentre
  - External alert mechanisms
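Dry-contact inputs are binary, so turning them into a Nagios result is mostly a matter of deciding which position means "alarm". The sketch below assumes a closed contact signals an alarm, which depends on how each device is wired; the function name and the reading format are hypothetical and do not reflect the vendor's interface.

```python
OK, CRITICAL = 0, 2


def contact_alarms(readings, alarm_when_closed=True):
    """Return (state, message) for a dict of {input_name: is_closed}.

    Whether a closed or an open contact means "alarm" varies with the wiring,
    so it is a parameter here rather than a fixed rule.
    """
    alarmed = sorted(
        name for name, closed in readings.items() if closed == alarm_when_closed
    )
    if alarmed:
        return CRITICAL, "CONTACTS CRITICAL - alarm on: " + ", ".join(alarmed)
    return OK, f"CONTACTS OK - {len(readings)} inputs normal"
```

Called with readings for the CRAC, FM200, Generator and UPS inputs, it returns a single state and message that a check script can print and exit with.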
Bull UK Datacentre BC1 - What does Netbotz look like?
Bull UK Datacentre BC1 - What does SeaLevel look like?
Bull UK Datacentre BC1 - What does ION look like?
Bull UK Datacentre BC1 - What does a metered PDU look like?
Bull UK Datacentre BC1 - What does a managed PDU look like?
Bull UK Datacentre BC1 - Nagios Map
Bull UK Datacentre BC1 - Nagios Host Groups
Bull UK Datacentre BC1 - Do things go wrong - yes
Bull UK Datacentre BC1 - Do things go wrong - yes & no
Datacentre Monitoring Schematic
Nagios Products in use
- Nagios Core
- NRPE
- NSCA
- Nagios Looking Glass
- Nagvis
- EventDB
- SNMPTT
- Nagmap
- NDO
Other Open Source Products in use
- Nedi
- Arpwatch
- PSAD
- SMS-Client
- Bacula
- Confluence (Wiki)
- i-doit (ITIL CMDB)
- MRTG
- Routers2cgi
BC1 Datacentre Monitoring Elements
- Nagios Core
  - Normal install with direct polling of devices
  - Only looking at Datacentre
- Nagios Display System
  - Central reporting Nagios
  - Absorbs updates from other Nagios instances
- Information Display
  - Normal system with 5 heads
- Nagios Customer System
  - Running on an appliance connected to Customer network
  - Sends data via encrypted secured link to Display System
- Backup System
  - Use tape library
  - Hosts CMDB & Wiki
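A central Nagios instance typically absorbs results from other instances as passive checks: each incoming result becomes a `PROCESS_SERVICE_CHECK_RESULT` line in the server's external command file. The sketch below builds such a line. It illustrates the format only, not the configuration used at BC1, and the function name and the example host and service names are hypothetical.

```python
import time


def passive_result_command(host, service, return_code, output, timestamp=None):
    """Build a PROCESS_SERVICE_CHECK_RESULT external command line."""
    if ";" in host or ";" in service:
        raise ValueError("host and service names must not contain ';'")
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0, 1, 2 or 3")
    if timestamp is None:
        timestamp = int(time.time())
    # The command is a single line, so newlines in the output are flattened.
    clean = output.replace("\n", " ")
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{clean}\n"
    )
```

For example, `passive_result_command("cust-appliance", "PING", 0, "PING OK")` yields one line that could be appended to the command file configured on the central server.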
BC1 Datacentre Nagios Core
- Hardware Platform: Intel
- O/S Centos 5
- Xeon 2.8 GHz, 8 GB memory, 72 GB RAID-1 disk
- Nagios built from source tarball
- Nagios Plugins installed from RPM
BC1 Datacentre Nagios Display System
- Hardware Platform: Intel
- O/S Fedora Core 9
- P4 2.8 GHz, 2.5 GB memory, 76 GB RAID-1 disk
- Nvidia dual monitor display card with DVI interfaces
- Nagios built from source tarball
- Nagios Plugins installed from RPM
BC1 Datacentre Normal Display System
- Hardware Platform: AMD
- O/S Centos 5
- Athlon 1.2 GHz, 1.0 GB memory, 3 GB disk
- Matrox G200 Quad Head
- Runs console displays: http/RDP/ssh
BC1 Datacentre Customer System
- Hardware Platform: Motion Tablet
- O/S Ubuntu LTS
- Pentium M 1.5 GHz, 0.5 GB memory, 30 GB disk
- Touch Screen tablet system
- Nagios built from tarball
- Nagios Plugins built from tarball
- Nagios NSCA
  - Sends status (encrypted) to central reporting system
BC1 Datacentre Backup System
- Hardware Platform: Intel
- O/S Centos 5
- Xeon 3.06 GHz, 2.0 GB memory, 108 GB disk
- Uses Bacula
  - Controls SDLT 20 slot tape library
  - Backs up all Datacentre Infrastructure: Windows, Centos, Ubuntu
Conclusions
Conclusions
- Strategic Overall Design
  - Know what you need to monitor
  - Know who needs to be told
- Expect to throw the first version away
  - Only when you have fully engineered the solution will you understand all of the issues
  - Keep a record of design decisions
- You will have to make it pretty for management
  - Accept that an attractive display will be required
  - Reporting will become key
- It must be reliable
  - Make backups
  - Consider clustering & recovery options
& Hints
Hints & Experience
- Separate Display systems from Monitoring systems
  - If you are tracking 10,000s of services you don't want processor heavy graphics as well
- Escalation & Alerting take time
  - Firstly to get right with your organisation
  - Secondly to actually physically do!
- Suppliers go out of their way to make it difficult
  - Don't give in: there is always a way to get Nagios involved
  - Screen scrape, […], telnet, RS232 are all possible
- SNMP is your friend
  - When in doubt use SNMP to help you out
  - SNMP V3 with AES cypher is suitably secure for most implementations
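An SNMP V3 query with AES privacy can be sketched as a Net-SNMP `snmpget` call at the `authPriv` security level. The helper below only assembles the argument list. The function name, the user name and passphrases in the usage note, and the choice of SHA for authentication are assumptions; the slide names only the AES cipher.

```python
def snmpv3_get_command(host, oid, user, auth_pass, priv_pass):
    """Assemble a Net-SNMP snmpget command for SNMPv3 with auth and AES privacy."""
    return [
        "snmpget",
        "-v", "3",          # SNMP version 3
        "-l", "authPriv",   # require both authentication and encryption
        "-u", user,
        "-a", "SHA",        # authentication protocol (assumed, not from the slide)
        "-A", auth_pass,
        "-x", "AES",        # privacy (encryption) protocol
        "-X", priv_pass,
        host,
        oid,
    ]
```

A check script could pass the result of `snmpv3_get_command("10.0.0.5", "1.3.6.1.2.1.1.3.0", "nagios", "<auth-passphrase>", "<priv-passphrase>")` to `subprocess.run(...)` to read sysUpTime. Passphrases given on a command line are visible in the process list, so a production setup may prefer to keep them in a Net-SNMP configuration file.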