Monitoring OpenStack – The Relationship Between Nagios and Ceilometer

Presentation transcript:

Monitoring OpenStack – The Relationship Between Nagios and Ceilometer Konstantin Benz, Researcher @ Zurich University of Applied Sciences benn@zhaw.ch

Introduction & Agenda About me Working as researcher @ Zurich University of Applied Sciences OpenStack / Cloud Computing Engaged in monitoring and High Availability systems Currently working on a Europe-wide cloud federation: XIFI – eXtensible Infrastructure for Future Internet http://www.fi-xifi.eu 17 nodes / OpenStack clouds Test environment for Future Internet (FI-WARE) applications Infrastructure for smart cities, public healthcare, traffic management… European-wide L2-connected backbone network Nagios as main monitoring tool of that project

Introduction & Agenda What are you talking about in this presentation? How to use Nagios to monitor an OpenStack cloud environment Integrate Nagios with OpenStack Anything else? Cloud monitoring requirements OpenStack cloud management software and Ceilometer Comparison between Nagios and Ceilometer: Technological paradigms Commonalities and differences How to integrate Nagios with Ceilometer Can't wait!

Cloud Monitoring Requirements Cloud ≈ virtualization + elasticity Types of clouds: IaaS: virtual machines and network devices, elasticity in number/size of devices PaaS: virtual, elastically sized platform SaaS: software provided by employing virtual, elastic resources Cloud is a collection of virtual resources provided on physical infrastructure Cloud provides resources elastically

Cloud Monitoring Requirements Why should someone use clouds? Cloud consumer can outsource IT infrastructure No fixed costs for cloud consumer Pay for resource utilization Cloud provider responsible for building and maintaining physical infrastructure Cloud provider can rent out unused IT infrastructure Eliminate waste Get money back for overcapacity

Monitoring OpenStack OpenStack Architecture Open source cloud computing software Consists of multiple services: Keystone: OpenStack identity services (authentication, authorization, accounting) Cinder: management of block storage volumes Nova: management and provision of virtual resources (VM instances) Glance: management of VM images Swift: management of object storage Neutron: management of network resources (IPs, routing, connectivity) Horizon: GUI dashboard for end users Heat: orchestration of virtualized environments (important for providing elasticity) Ceilometer: monitoring of virtual resources

Monitoring OpenStack Things to monitor Operation of OpenStack itself: Services: Cinder, Glance, Nova, Swift ... Infrastructure: Hardware, Operating System where OpenStack services are running Operation of virtual resources provided by OpenStack: Resource availability: VMs, virtual network devices Resource utilization: VM uptime, CPU / memory usage → Virtual resources are commonly monitored by Ceilometer → Ceilometer gathers data through the API of OpenStack services

Monitoring OpenStack Why is Ceilometer not enough? → Ceilometer monitors virtual resources through APIs of OpenStack components, BUT NOT operation of the OpenStack components

Comparison Nagios / Ceilometer Nagios operational model Configuration: Check interval (and retry interval) to poll system status and update frontend GUI Remote execution of monitoring clients (usually Nagios plugins) Thresholds that result in "Okay", "Warning", "Critical" status messages which are sent back to Nagios server (and "Unknown" if status not measurable) Main usage: Effective monitoring solution for physical servers System administration console that allows for fast reaction in case of problems Strength: extensibility and customizability Nagios must be extended in order to monitor virtual resources inside administered systems
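
The threshold model described on this slide can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the function names and the sample reading are made up, and the only fixed facts are the exit codes of the Nagios plugin convention (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN).

```python
from enum import IntEnum
from typing import Optional


class Status(IntEnum):
    """Exit codes used by the Nagios plugin convention."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def classify(value: Optional[float], warning: float, critical: float) -> Status:
    """Map one measurement to a status; None means it could not be measured."""
    if value is None:
        return Status.UNKNOWN
    if value > critical:
        return Status.CRITICAL
    if value > warning:
        return Status.WARNING
    return Status.OK


def status_line(metric: str, value: Optional[float], status: Status) -> str:
    """Build the one-line text a plugin prints before it exits."""
    shown = "n/a" if value is None else f"{value:g}"
    return f"{status.name} - {metric} is {shown}"


def check(metric: str, value: Optional[float], warning: float, critical: float) -> int:
    """Print the status line and return the code the plugin should exit with."""
    status = classify(value, warning, critical)
    print(status_line(metric, value, status))
    return int(status)
```

A plugin script would end with something like `sys.exit(check("cpu_util", 63.5, 50.0, 80.0))`, where 63.5 is a hypothetical reading and 50/80 are the default thresholds used later in the talk.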

Comparison Nagios / Ceilometer Ceilometer operational model Configuration: Polling services check metrics OpenStack objects generate event notifications automatically All events and metrics collected in a database Main usage: OpenStack integrated metrics collector and database Temporal database that can be used for rating, charging and billing of virtual resource utilization Strength: fully integrated in OpenStack, collecting most important metrics and storing their change history Weakness: Does not monitor physical hosts

Nagios / OpenStack Integration Alternative 1: Ceilometer Plugin in Nagios Use Nagios server as frontend for Ceilometer: Nagios plugin that queries Ceilometer database Virtual resource utilization data collected by Ceilometer Nagios server responsible for monitoring non-virtual resources Benefits: Simple and easy to implement No extra Nagios plugins required to monitor virtual devices that are managed within OpenStack Ceilometer tool can be left unchanged Drawbacks: Monitoring data is stored at 2 different places: Nagios flat file and Ceilometer database

Nagios / OpenStack Integration Alternative 1: Ceilometer Plugin in Nagios Implementation: Nagios plugin on the client that hosts the Ceilometer API. Initialization with default values, OpenStack authentication:

    #!/bin/bash
    # initialization with default values
    SERVICE='cpu_util'
    THRESHOLD='50.0'
    CRITICAL_THRESHOLD='80.0'
    # get openstack token to access ceilometer-api
    export OS_USERNAME="youruser"
    export OS_TENANT_NAME="yourtenant"
    export OS_PASSWORD="yourpassword"
    export OS_AUTH_URL=http://yourkeystoneurl:35357/v2.0/

Nagios / OpenStack Integration Alternative 1: Ceilometer Plugin in Nagios The plugin should receive parameters for: Resource to be monitored (VM) Service (Ceilometer metric) Warning threshold Critical threshold

    # -r resource (VM), -s service (metric), -t warning threshold, -T critical threshold
    # the option string must list r: as well, otherwise -r is rejected as unknown
    while getopts ":hr:s:t:T:" opt
    do
        case $opt in
            h ) printusage;;
            r ) RESOURCE=${OPTARG};;
            s ) SERVICE=${OPTARG};;
            t ) THRESHOLD=${OPTARG};;
            T ) CRITICAL_THRESHOLD=${OPTARG};;
            ? ) printusage;;
        esac
    done

Nagios / OpenStack Integration Alternative 1: Ceilometer Plugin in Nagios Query Nova API to get resource to monitor (VM to be monitored):

    RESOURCE=$(nova list | grep "$RESOURCE" | tail -2 | head -1 | awk -F '|' '{print $2; end}')
    # unquoted echo trims the surrounding whitespace
    RESOURCE=$(echo $RESOURCE)

Query metric on that resource (multiple entries are possible, which requires an iterator):

    ITERATOR=$(ceilometer meter-list -q "resource_id=$RESOURCE" | grep -w "$SERVICE" | awk 'END{print NR; end}')

Initialize with return code 0 (no warning or error):

    RETURNCODE=0

Nagios / OpenStack Integration Alternative 1: Ceilometer Plugin in Nagios Iterate through metric:

    for (( C=1; C<=$ITERATOR; C++ ))
    do
        METER_NAME=$(ceilometer meter-list -q "resource_id=$RESOURCE" | grep -w $SERVICE | awk -F '|' -v var="$C" '{if (NR == var) {print $2 $1; end}}')
        METER_UNIT=$(ceilometer meter-list -q "resource_id=$RESOURCE" | grep -w $SERVICE | awk -F '|' -v var="$C" '{if (NR == var) {print $4 $1; end}}')
        RESOURCE_ID=$(ceilometer meter-list -q "resource_id=$RESOURCE" | grep -w $SERVICE | awk -F '|' -v var="$C" '{if (NR == var) {print $5 $1; end}}')
        ACTUAL_VALUE=$(ceilometer sample-list -m $METER_NAME -q "resource_id=$RESOURCE" -l 1 | grep $RESOURCE_ID | head -4 | tail -1 | awk -F '|' '{print $5; end}')

Nagios / OpenStack Integration Alternative 1: Ceilometer Plugin in Nagios Update return code if value of one metric is above a threshold:

        # bc prints 1 when the floating-point comparison is true
        if [ $(echo "$ACTUAL_VALUE > $THRESHOLD" | bc) -eq 1 ]
        then
            if (( RETURNCODE < 1 ))
            then
                RETURNCODE=1
            fi
        fi
        if [ $(echo "$ACTUAL_VALUE > $CRITICAL_THRESHOLD" | bc) -eq 1 ]
        then
            if (( RETURNCODE < 2 ))
            then
                RETURNCODE=2
            fi
        fi

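Nagios / OpenStack Integration Alternative 1: Ceilometer Plugin in Nagios Output status text and return code:

        STATUS=$(echo "$METER_NAME on $RESOURCE_ID is: $ACTUAL_VALUE $METER_UNIT")
        echo $STATUS
    done
    # Nagios reads the plugin's exit status, so exit with the code instead of only printing it
    exit $RETURNCODE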
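
The awk pipelines in the plugin pick columns out of the ASCII tables that the command-line clients print. The same idea can be sketched in Python; this is an illustration only. The sample table and its column names are hypothetical and merely imitate the bordered table layout; the real column set depends on the client version.

```python
from typing import Dict, List

# Hypothetical sample: imitates the bordered table layout, not real client output.
SAMPLE_OUTPUT = """
+-------------+----------+-------+--------+
| Resource ID | Name     | Type  | Volume |
+-------------+----------+-------+--------+
| vm-0001     | cpu_util | gauge | 63.5   |
| vm-0001     | cpu_util | gauge | 41.0   |
+-------------+----------+-------+--------+
"""


def parse_ascii_table(text: str) -> List[Dict[str, str]]:
    """Turn a '|'-delimited table into one dict per data row, keyed by header."""
    rows: List[List[str]] = []
    for line in text.splitlines():
        stripped = line.strip()
        # Border lines start with '+', so only '|' lines carry cells.
        if stripped.startswith("|") and stripped.endswith("|"):
            rows.append([cell.strip() for cell in stripped[1:-1].split("|")])
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


def column_as_floats(rows: List[Dict[str, str]], column: str) -> List[float]:
    """Extract one numeric column, e.g. the measured values of a meter."""
    return [float(row[column]) for row in rows]
```

Looking cells up by header name rather than by position (`$5` in awk) keeps the lookup working if a client release adds or reorders columns, which is one reason to prefer this form for anything longer-lived than a demo.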

Nagios / OpenStack Integration Alternative 1: Ceilometer Plugin in Nagios Plugin can be downloaded from GitHub: https://github.com/kobe6661/nagios_ceilometer_plugin.git Additionally: NRPE plugin: remote execution of Nagios calls to Ceilometer Install NRPE on the Nagios Core server and on the server that hosts the Ceilometer API Change nrpe.cfg to include the call to the VM metric

Nagios / OpenStack Integration Alternative 1: Implementation OpenStack installed on 3 nodes: Management node: responsible for monitoring other OpenStack nodes Controller node: responsible for management and configuration of cloud resources (VMs, network) Compute node: provisions virtual resources

Nagios / OpenStack Integration Alternative 2: Nagios OpenStack Plugins Nagios as a tool to monitor OpenStack services and VMs: Plugins to monitor health of OpenStack services As soon as new VMs are created, Nagios should monitor them Requires elastic reconfiguration of Nagios Benefits: No data duplication, Nagios is the only monitoring tool required to monitor OpenStack Drawbacks: Elastic reconfiguration Rather complex Nagios configuration

Nagios / OpenStack Integration Alternative 2: Nagios OpenStack Plugins Problem: Dynamic provisioning of resources (Virtual Machines) Dynamic configuration of hosts in Nagios Server required

Nagios / OpenStack Integration Alternative 2: Nagios OpenStack Plugins Problem: What happens if VM is terminated by end user? Nagios assumes a host failure and produces a critical warning

Nagios / OpenStack Integration Alternative 2: Nagios OpenStack Plugins Solution: Nova-API triggers reconfiguration of Nagios if VMs are created or terminated
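
Whatever fires the trigger, the reconfiguration step has to work out which hosts to add and which to drop, so that a VM terminated by its user is removed from monitoring instead of being reported as failed. A minimal sketch of that comparison follows, assuming both sides are available as simple name-to-IP mappings; the names `plan_changes` and `Changes` are invented for this illustration.

```python
from typing import Dict, NamedTuple, Set


class Changes(NamedTuple):
    to_add: Dict[str, str]   # VM name -> IP address that needs a (new) host config
    to_remove: Set[str]      # host names whose config should be deleted


def plan_changes(cloud_vms: Dict[str, str], nagios_hosts: Dict[str, str]) -> Changes:
    """Compare the VMs reported by the cloud with the hosts Nagios knows about."""
    # New VMs, plus VMs whose address changed, need their config (re)written.
    to_add = {
        name: ip for name, ip in cloud_vms.items() if nagios_hosts.get(name) != ip
    }
    # Hosts that no longer exist in the cloud were terminated on purpose.
    to_remove = set(nagios_hosts) - set(cloud_vms)
    return Changes(to_add, to_remove)
```

If both parts of the result are empty, nothing changed and Nagios does not need to be touched at all.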

Nagios / OpenStack Integration Alternative 2: Nagios OpenStack Plugins Another problem: VMs must have Nagios plugins installed when they are created Solution: Use only VM Images that contain Nagios plugins for VM creation OR Use package management tools like Puppet, Chef…

Nagios / OpenStack Integration Alternative 2: Nagios OpenStack Plugins Trigger for dynamic Nagios configuration: Find available resources via nova-api (requires name of host and IP address):

    #!/bin/bash
    # nova list prints three header lines before the first VM row
    NUMLINES=$(nova list | wc -l)
    NUMLINES=$((NUMLINES - 3))
    # the loop variable must match the one passed to awk (I);
    # the last remaining line is the closing table border, so stop one line early
    for (( I=1; I<NUMLINES; I++ ))
    do
        VM_NAME=$(nova list | tail -$NUMLINES | awk -F'|' -v var="$I" '{if (NR==var){print $3 $1;end}}')
        IP_ADDRESS=$(nova list | tail -$NUMLINES | awk -F'|' -v var="$I" '{if (NR==var){print $7 $1;end}}' | sed 's/[a-zA-Z0-9]*[=|-]//g')

Nagios / OpenStack Integration Alternative 2: Nagios OpenStack Plugins Trigger for dynamic Nagios configuration: Create a config file including VM name and IP address from a template (e.g. vm_template.cfg):

        # unquoted echo trims the whitespace left over from the table cell
        CONFIG_FILE=$(echo $VM_NAME).cfg
        sed "s/<vm_name>/$VM_NAME/g" vm_template.cfg > named_template.cfg
        sed "s/<ip_address>/$IP_ADDRESS/g" named_template.cfg > $CONFIG_FILE

Set Nagios as owner of the file and move the file to the Nagios configuration directory:

        chown nagios:nagios $CONFIG_FILE
        chmod 644 $CONFIG_FILE
        mv $CONFIG_FILE /usr/local/nagios/etc/objects/$CONFIG_FILE

Nagios / OpenStack Integration Alternative 2: Nagios OpenStack Plugins Trigger for dynamic Nagios configuration: Add config file to nagios.cfg:

        echo "cfg_file=/usr/local/nagios/etc/objects/$CONFIG_FILE" >> /usr/local/nagios/etc/nagios.cfg
    done

Restart Nagios:

    service nagios restart
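
The template and registration steps of the shell trigger can also be sketched in Python. This is an illustration under stated assumptions, not the talk's code: the template text is a guess at what vm_template.cfg might contain (the slides do not show it), `use linux-server` assumes the host template from the Nagios sample configuration, and the function names are invented. The placeholders `<vm_name>` and `<ip_address>` are the ones used on the slide.

```python
from pathlib import Path

# Assumed template content; the real vm_template.cfg is not shown in the talk.
HOST_TEMPLATE = """define host {
    use        linux-server
    host_name  <vm_name>
    address    <ip_address>
}
"""


def render_host_config(template: str, vm_name: str, ip_address: str) -> str:
    """Fill the two placeholders, as the sed calls in the shell script do."""
    return template.replace("<vm_name>", vm_name).replace("<ip_address>", ip_address)


def register_config(nagios_cfg: Path, config_path: Path) -> bool:
    """Append a cfg_file line to nagios.cfg unless it is already present.

    Returns True if the file was changed, i.e. Nagios needs to be told.
    """
    entry = f"cfg_file={config_path}"
    existing = nagios_cfg.read_text().splitlines() if nagios_cfg.exists() else []
    if entry in existing:
        return False
    with nagios_cfg.open("a") as handle:
        handle.write(entry + "\n")
    return True
```

Unlike a plain `echo ... >>`, the check in `register_config` keeps nagios.cfg from collecting duplicate `cfg_file` lines when the trigger runs repeatedly for the same VM.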

Nagios / OpenStack Integration Alternative 2: Nagios OpenStack Plugins Why restart Nagios? Nagios must know that a new VM is present or that an old VM has been terminated Reconfigure and restart Nagios (!)

Nagios / OpenStack Integration Alternative 2: Nagios OpenStack Plugins Trigger for dynamic Nagios configuration: Add trigger to Nova-API: Nagios Event Broker module: Check_MK: http://mathias-kettner.de/checkmk_livestatus.html Reconfigure Nagios dynamically: Edit nagios.cfg and restart Nagios – bad idea (!!) in a cloud environment Autoconfiguration tools: NagioSQL: http://www.nagiosql.org/documentation.html

Nagios / OpenStack Integration Alternative 2: Nagios OpenStack Plugins What other ways exist to dynamically reconfigure Nagios? Puppet master that triggers: VMs to install Nagios NRPE plugins and Nagios Server to update its configuration Same can be done with Chef, Ansible… Drawback: Puppet scalability if thousands of servers have to be (de-)commissioned dynamically

Nagios / OpenStack Integration Alternative 2: Nagios OpenStack Plugins What other ways exist to dynamically reconfigure Nagios? Python fabric with Cuisine to trigger: VMs to install Nagios NRPE plugins and Nagios Server to update its configuration Get list of VMs:

    from novaclient.client import Client

    nova = Client(VERSION, USERNAME, PASSWORD, PROJECT_ID, AUTH_URL)
    servers = nova.servers.list()

Write VM list to file:

    # servers is a list of objects, so write one VM name per line;
    # this assumes the VM names resolve as host names
    with open('servers', 'w') as f:
        for server in servers:
            f.write(server.name + '\n')

Nagios / OpenStack Integration Alternative 2: Nagios OpenStack Plugins What other ways exist to dynamically reconfigure Nagios? Python fabric with Cuisine to trigger: VMs to install Nagios NRPE plugins and Nagios Server to update its configuration Create fabfile.py and define which servers should be configured:

    from fabric.api import *
    from . import vm_recipe, nagios_recipe

    env.use_ssh_config = True
    # read one host per line, dropping newlines and empty lines
    with open('servers') as servers:
        serverlist = [line.strip() for line in servers if line.strip()]
    # each role maps to a list of hosts; xx.xx.xx.xx stands for the Nagios server address
    env.roledefs = {'vm': serverlist, 'nagios_server': ['xx.xx.xx.xx']}

Nagios / OpenStack Integration Alternative 2: Nagios OpenStack Plugins Assign recipes:

    @roles("vm")
    def configure_vm():
        vm_recipe.ensure()

    # the role name must match the key used in env.roledefs
    @roles("nagios_server")
    def configure_nagios():
        nagios_recipe.ensure()

Nagios / OpenStack Integration Alternative 2: Nagios OpenStack Plugins Create vm_recipe.py and nagios_recipe.py:

    from fabric.api import *
    import cuisine

    def ensure():
        # is_installed() and install() are not shown on the slide
        # and have to be defined elsewhere in the recipe
        if not is_installed():
            puts("Installing NRPE...")
            install()
        else:
            puts("NRPE already installed")

    def install_prerequisites():
        cuisine.package_ensure("nrpe")

Choice of Alternatives Which option should we choose? Implementation advantages and drawbacks:
A1: Ceilometer collects data. Advantages: very easy solution; scales well. Drawbacks: data duplication; two monitoring systems working in parallel.
A2: Shell script. Advantages: no data duplication; easy solution. Drawbacks: difficult to maintain; possibly insecure; Nagios is forced to restart.
A2: Puppet. Advantages: automatic VM and Nagios configuration; allows for elastic reconfiguration of Nagios. Drawbacks: heavyweight; bad scalability for large IaaS clusters.
A2: Python fabric & cuisine. Advantages: lightweight. Drawbacks: bigger configuration effort for package management with strong dependencies between packages.

Conclusion What did you talk about? How to use Nagios to monitor an OpenStack cloud environment Cloud monitoring requirements: Elasticity, dynamic provisioning of virtual machines OpenStack monitoring tools Nagios and Ceilometer Nagios as extensible monitoring system Ceilometer captures data through Nova-API Nagios/OpenStack integration Alternative 1: Ceilometer monitors VMs with Nagios as graphical frontend Alternative 2: Nagios monitors VMs and is automatically reconfigured Discovered need for dynamic reloading of Nagios configuration Discussed advantages/drawbacks of different implementations

Questions? Any questions? Thanks!

The End Konstantin Benz benn@zhaw.ch