Lemon Tutorial Lemon Overview Miroslav Siket, Dennis Waldron CERN-IT/FIO-FD.


09/10/2006 Lemon Tutorial
Tutorial
Why?
– The number of services is expanding. There is more to monitor every day.
For whom?
– Service managers, to configure monitoring of their services
– Developers, to simplify their life when writing sensors
– Site managers, to set up their monitoring instances

Tutorial Outline
– Architecture
– Writing sensors
– Running and configuring the agent
– Using Lemon tools
– Running Lemon server(s)
– Running and configuring the web interface
– Running the alarm system

Architecture

Architecture II
Three layers:
– Data producing/consuming
– Data manipulation
– Data storage

Client side
Agent
– forks sensors and communicates with them using a custom protocol over bi-directional “pipes”
– configures metric instances of the metric classes of a sensor and pulls metrics from them
– checks on the status of sensors
– sends data to servers using TCP or UDP
– monitors itself with the internal MSA sensor
– caches data locally
The default Linux client distribution comes with the agent and the linux and file sensors.
Footprint:
– agent: 5.5MB and 0.02% of CPU utilization*
– core sensors (Linux, file, exception): 10MB, 0.2% of CPU*
– parseLog: 9.4MB
C++ and Perl APIs are currently available.
* i386, SLC3/4, RHES3/4; average over the CERN CC

Server side
Two implementations:
– Oracle based: OraMon
    optimized for high performance and for large Computer Centers
    runs on Oracle 9i+ (with the alarm system on 10g)
    validation of metric samples, metadata information
– Flat file based: FlatMon (edg-fmon-server)
    uses OS files for storing data
    for smaller sites (scalable to 1000 machines max.)
General features:
– multithreaded UDP/TCP server
– built-in authentication mechanism

Server side - planning
Space considerations
– About 400kB of data per machine per day (Oracle Enterprise Edition with compression); 700kB without compression (XE, Standard)
– About 1.2MB per machine per day for FlatMon
CPU considerations
– A dual PIV, 3GHz, 4GB of memory machine running the Oracle DB server + OraMon requires about 15% CPU for 4000 monitored machines
– Adding the alarm system on Oracle requires an additional 5% of CPU
– FlatMon saturates the above machine with 1000 monitored hosts
– OraMon/FlatMon require about 105MB of memory
Functionality considerations
– FlatMon does not provide metric checks and has no metadata concept
– The Lemon Alarm System (LAS) runs on Oracle as PL/SQL procedures and requires Oracle 10g; it is integrated with the OraMon schema in the Oracle database
– For an HA architecture, use Oracle RAC and multiple OraMon servers
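The space figures above translate directly into a rough yearly storage estimate. The following is a minimal sketch in Python, assuming the quoted per-machine daily volumes hold on average; the function and dictionary names are illustrative and are not part of Lemon.

```python
# Per-machine daily data volumes quoted on the slide, in kB.
KB_PER_MACHINE_DAY = {
    "oramon_compressed": 400,    # Oracle Enterprise Edition with compression
    "oramon_uncompressed": 700,  # Oracle XE / Standard, no compression
    "flatmon": 1200,             # about 1.2MB per machine per day
}


def yearly_storage_gb(machines, backend, days=365):
    """Estimate storage in GB, taking 1 GB = 10**6 kB (a rounding choice made here)."""
    return machines * days * KB_PER_MACHINE_DAY[backend] / 1_000_000


if __name__ == "__main__":
    for backend in KB_PER_MACHINE_DAY:
        estimate = yearly_storage_gb(1000, backend)
        print(f"{backend}: about {estimate:.0f} GB per year for 1000 machines")
```

The estimate ignores indexes, metadata and retention policies, so it is only a lower bound for planning purposes.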

User/administration tools
Lemon-cli
– Retrieves monitoring data from the local machine cache
– Allows retrieving data from the server
– Currently uses a SOAP interface (to be retired soon)
Lemon-host-check
– Checks the status of the machine based on the values of exceptions
– Checks the status of the monitoring agent and sensors
– Manages the status of exceptions

Configuration management
At CERN we use the Quattor Configuration Database
– Configuration is stored in hierarchical templates per domain/cluster/node
– The NCM framework is used to download the configuration XML profile to nodes
– NCM components are used:
    for agent/sensor configuration: the fmonagent component
    for server configuration (metadata): the oramonserver component
For smaller sites with homogeneous structures
– Use the default agent and sensor rpms from Lemon
– Use rpms for custom sensors/settings

Lemon RRD framework
User front-end for visualizing and caching monitoring data
Two layers
– Pre-processing (lemonmrd): consumes monitoring data and creates rrd files per machine/cluster/… (aging, averages)
– Visualization (status web pages): uses rrd files for fast visualization, or direct access to the monitoring repository
Different plugins/options available:
– Synoptic display of the Computer Center (XML driven)
– Lemon Alarm GUI
– Quattor .tpl file browser, …
Requirements
– Web server with PHP (v5+ if you want to use LAS)
– rrdtool rpm
– 500kB of space per machine’s rrd file

Automatic recovery actions and alarms
Sensor exception
– For defined values of measured metrics, an actuator is called with a predefined action
– An example: ssh daemon dead, action /sbin/service sshd start
– Definition: metric X, field Y <operator> reference value Z => call actuator; the operator can be ==, <, >, regexp, range, +, -, *, / etc.
– Each occurrence is logged in the Monitoring Repository
– Already about 230 predefined exceptions with automatic recovery actions
– Exceptions are the basis for alarms in the Lemon Alarm System
– Allows multi-valued metrics and on-behalf metrics
– Allows corrective actions (actuators) up to n times or within a given time window
– Allows distinguishing the alarm state (failed actuator, silenced, …)
– Example: (10004:7 > 100 && (10005:3 - 34:5) > 100:56)
– On behalf: (soap_srvx:302:1 > 10)

Lemon Alarm System
Newest addition to Lemon
Built on top of the OraMon schema in the Oracle database
Comes in two pieces:
– PL/SQL stored procedures (requires Oracle 10g) to consume exceptions and to produce alarms
– GUI: web-based interface based on AJAX, part of LRF
Features
– Reduction of alarms (by type or by node/cluster)
– Possibility to hide/inhibit alarms
– Access control
– History tracking
– Future: notifications, RSS feeds

Software distribution
RPM
– direct download from or at
– YUM setup with /etc/yum.repos.d/lemon.repo
    [lemon]
    name=Lemon
    baseurl=
    enabled=1
    gpgcheck=1
    gpgkey=
– APT setup with /etc/apt/sources.list.d/lemon.list
    # Lemon stable
    rpm linux/RPMS/i386/sl4 lemon_stable_sl4
Source code
– CVS

Future and additional information
Things not covered/under development
– XML gateway with an API for several languages (C++, perl, python, java, …)
– Python Sensor API
– LAS notifications, RSS feeds
– Encryption of data between agent and server
– Authentication for user access
– Service views for LRF
Check the web pages for additional information: http://cern.ch/lemon

Lemon Tutorial Sensor How-To Miroslav Siket, Dennis Waldron CERN-IT/FIO-FD

Outline
– Terminology
– Examples of existing sensors
– Considerations
– Live examples
    Hello World
    Service based monitoring
– Do’s and Don’ts

Terminology
Sensor:
– A process or script which is connected to the lemon-agent via a bi-directional pipe and collects information on behalf of the agent. Sensors implement metric classes.
Metric Class:
– The equivalent of a class in OOP (Object-Oriented Programming)
Metric Instance:
– An instance (an object) of a metric class which has its own configuration data.
Metric ID:
– A unique identifier associated with a particular metric instance of a particular metric class.

Existing sensors
At CERN:
– Approx. 40 active sensors defined, providing 264 metrics and 227 exceptions.
– The default installation of the Lemon agent comes with three sensors:
    MSA (built in): self monitoring of the agent.
    Linux: performance, file system and process monitoring.
    File: file tests, e.g. size, mtime, ctime.
– Together they provide 135 metrics (51% of all CERN metrics)
– Other officially distributed sensors include:
    exception: correlation sensor for generating alarms.
    remote: provides ping and http web server checks.
    oracle: oracle database statistics monitoring.
    parselog: log file parsing sensor.
– All are available from the lemon software repository
– Other contributed sensors are available from CVS:

Considerations
Question: What is your goal? How do you intend to use the monitoring information you collect? Is it for:
– Pure data collection? OK
– Graphs displayed on the lemon status pages? Just because you have collected data does not give you graphs immediately! This is not automatic!
– Information to be alarmed? Make sure the structure of the data you collect can be alarmed!
Data that cannot be alarmed:
– Timestamps as strings: NO
– Timestamps as numbers: NO
– Parsing of complex strings: NO

Considerations (II) - Use Case
Grid Certificate Expiry Use Case
Outline: you wish to be notified, or to raise an alarm, if the Grid certificate on a machine will expire in the next two weeks.
You need 1 metric and 1 exception
– The metric will record the expiry time of the certificate.
– The exception will check the metric and decide whether it expires in the next two weeks.
The metric needs to be structured in such a way that the correlation unit of the exception sensor can understand it.
Can I record the data as a:
– String, e.g. “Sun Oct 8 16:05: ”? NO (cannot be converted to a number)
– UnixTime, e.g. “ ”? NO (the correlation unit does not understand time, yet!)
Solution:
– Record the number of seconds until the certificate expires.
– A value expressed in seconds (e.g. 3 wks) can be alarmed mathematically: if metric < (2 wks) then raise alarm
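The solution above can be sketched in a few lines. This is an illustration in Python rather than the Perl sensor API, and the function names are hypothetical. It assumes the certificate expiry time has already been obtained as a Unix timestamp; the value recorded as the metric is the remaining time in seconds, which a numeric comparison can then check.

```python
import time
from typing import Optional

TWO_WEEKS = 14 * 24 * 3600  # alarm threshold, in seconds


def seconds_until_expiry(expiry_epoch: float, now: Optional[float] = None) -> int:
    """Value to record as the metric: seconds left before the certificate expires."""
    if now is None:
        now = time.time()
    return int(expiry_epoch - now)


def should_alarm(seconds_left: int, threshold: int = TWO_WEEKS) -> bool:
    """Mirror of the exception check: metric < threshold => raise alarm."""
    return seconds_left < threshold
```

Because the metric is a plain number, the check stays a single comparison and no date parsing is needed on the correlation side.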

Considerations (III)
Misconception:
– In Lemon, a metric has to be related to one and only one distinct piece of information (1 to 1 mapping)
Not true:
– A metric can be associated with multiple values and have multiple rows, with each row identified by a unique key.

Considerations (IV) – Use Case
Recording partition information
Outline: you would like to know the total size, the space used in megabytes, the space used as a percentage and the mount options of all mounted partitions on a machine.
– Under the idea of a 1 to 1 mapping, that is 4 metrics per partition. An average machine may have 7 partitions (4 x 7 = 28 metrics in total).
– Why not convert the data into a multi-valued metric? That gives 7 metrics, each reporting 4 values. So,
    Metric 1 total_space
    Metric 2 space_used_mb
    Metric 3 space_used_perc
    Metric 4 mount_options
  becomes:
    Metric A total_space space_used_mb space_used_perc mount_options
– Go one step further: convert the data into a multi-valued, multi-rowed metric, with 1 metric reporting the values for all mount points. So,
    Metric A total_space space_used_mb space_used_perc mount_options
  becomes:
    Metric B mountname1 total_space space_used_mb space_used_perc mount_options
    Metric B mountname2 total_space space_used_mb space_used_perc mount_options
    ….
– Benefits: monitoring of new mount points is dynamic; there is no need for reconfiguration and no need to go through a registration process to get new metric ids.
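The step from four single-valued metrics to one multi-valued, multi-rowed metric can be pictured as a small data structure. The sketch below is in Python and is not the Lemon sensor API; the input dictionary, `PartitionRow`, `build_metric_rows` and `format_sample` are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class PartitionRow(NamedTuple):
    """One row of the multi-rowed metric; mountname is the unique row key."""
    mountname: str
    total_space: int
    space_used_mb: int
    space_used_perc: float
    mount_options: str


def build_metric_rows(partitions):
    """Turn {mount: (total_mb, used_mb, options)} into one row per mount point."""
    rows = []
    for mount, (total_mb, used_mb, options) in sorted(partitions.items()):
        perc = round(100.0 * used_mb / total_mb, 1) if total_mb else 0.0
        rows.append(PartitionRow(mount, total_mb, used_mb, perc, options))
    return rows


def format_sample(metric_name, row):
    """Render a row as 'metric key value value ...', as in the 'Metric B' lines."""
    return " ".join([metric_name] + [str(value) for value in row])
```

A newly mounted partition simply shows up as one more row, which is the dynamic behaviour listed under the benefits.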

Example 1 – Hello World
Objective: to create a Perl sensor which records the value “Hello World” into Lemon.
A simple sensor to demonstrate:
– The generic build framework for sensors.
– How to register your Perl module with the API.
– How to register the metric classes that your module provides.
– How to store the text “Hello World” into Lemon for the machine on which the sensor runs.
– Running and debugging your sensor on the command line.
Functions used:
– registerVersion()
– registerMetric()
– storeSample01()
Documented at:

Example 2 – Service Monitoring
Objective: to check if a web page is available on a remote web server and record the HTTP response code under a service name.
Demonstrates:
– The basics of on-behalf reporting
– The ability to parse configuration arguments
– The ability to log messages
Functions used:
– registerMetric()
– getParam()
– log()
– storeSample03()

Do’s and Don’ts
Don’t:
– Call die() or exit() from inside your sensor.
– Open or write to files in locations writeable by non-root users, such as /tmp/
– Read from filehandles (e.g. sockets) that may block. This will make your sensor unresponsive to requests from the agent.
– Rely on, or have dependencies on, files on remote file systems such as AFS (Andrew File System). Your sensor should aim to have as few dependencies as possible.
Do:
– Document your sensor. Refer to the sensor tutorial to see how this can be done automatically for you.
– If you have the ability to use a timeout around calls to databases and services like LSF, use it!
– Make your metric classes configurable; avoid hard-coded paths to non-standard files.
– Try to make your sensors as generic as possible so that others can benefit from your work.
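The advice to wrap calls to databases and batch services in a timeout can be illustrated as follows. This is a generic Python sketch, not Lemon code, and Lemon sensors at the time were written in C++ or Perl; `call_with_timeout` is a hypothetical helper built on the standard `concurrent.futures` module.

```python
import concurrent.futures


def call_with_timeout(func, timeout_s, default=None):
    """Run func() in a worker thread and give up waiting after timeout_s seconds.

    Returns func()'s result, or `default` when the call takes too long.
    The worker thread is not killed; the caller just stops waiting for it.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return default
    finally:
        # Do not block on a call that may never return.
        executor.shutdown(wait=False)
```

With a wrapper like this, a slow service produces a missing sample rather than a sensor that stops answering the agent.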

Lemon Tutorial Sensor Exception Miroslav Siket, Dennis Waldron CERN-IT/FIO-FD

Outline
– What is it?
– Configuration
– Correlation examples
– Actuators
– Dealing with transient alarms

What is it?
Sensor-exception
– An officially supported Lemon sensor coded in C++.
– Developed in collaboration between CERN and BARC.
– Implements the Lemon alarm protocol.
– Has a LEX & YACC correlation engine which allows it to evaluate 1 or more metrics to determine if a problem exists on a machine.
– Supports reporting alarms on behalf of other monitored entities.
– Allows corrective actions (actuators) up to n times or within a given time window.
– Is the primary interface for inserting alarms into the Lemon framework. The output of the sensor is used by LAS and lemon-host-check.
– Provides one and only one metric class, “alarm.exception”
Full documentation at: –

Configuration
The sensor has 6 configuration options:
– Correlation
    The power behind the sensor exception’s capabilities.
    Tells the sensor which metrics are involved in the alarm and how they should be evaluated.
– Actuator
    The path to an actuator to run if the correlation string is true.
– MaxRuns
    The maximum number of times an actuator can run consecutively before a final alarm is generated.
– Timeout
    The maximum number of seconds that an actuator is allowed to run before being terminated by the sensor.
– MinOccurs
    The minimum number of consecutive times a problem must be present before an alarm is raised. Good for dealing with transient alarms.
– Silent
    Defines whether the exception should run in silent mode. A silent exception will continue to be evaluated but the result will not be displayed on LAS or lemon-host-check. Good for testing and deployment of new alarms.

Configuration (II)
Basic format of a correlation is:
    [entity_name]:<metric_id>:<field_position> <operator> <reference_value> ...
Where,
– entity_name
    An optional parameter, used for reporting on behalf of other entities
    The name of the entity (wildcards ‘*’ are supported)
– metric_id
    The id of the metric to check
– field_position
    The field to use within the metric. Allows the correlation to extract a single value from a multi-valued metric
– operator
    E.g. ==, !=, >, <, eq, ne, regex, !regex …
– reference_value
    A string or number against which metric_id:field_position is compared
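To make the parts of a correlation term concrete, here is a small Python sketch that splits a single term into entity name, metric id, field position, operator and reference value. It is only an illustration: the real sensor uses a LEX & YACC grammar that also handles parentheses, && and ||, none of which is attempted here, and the operator list is limited to the ones named on the slide.

```python
import re
from typing import NamedTuple, Optional


class Term(NamedTuple):
    entity: Optional[str]   # None when the term refers to the local machine
    metric_id: int
    field: int
    operator: str
    reference: str


# One term only: [entity:]metric_id:field_position operator reference_value
_TERM = re.compile(
    r"(?:(?P<entity>[^:\s]+):)?(?P<metric>\d+):(?P<field>\d+)\s*"
    r"(?P<op>!=|!regex|==|>|<|eq|ne|regex)\s*(?P<ref>.+)"
)


def parse_term(text: str) -> Term:
    """Parse a single correlation term; raise ValueError if it does not match."""
    match = _TERM.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a simple correlation term: {text!r}")
    reference = match.group("ref").strip().strip("'")
    return Term(match.group("entity"), int(match.group("metric")),
                int(match.group("field")), match.group("op"), reference)
```

For example, `parse_term("soap_srvx:302:1 > 10")` yields the entity `soap_srvx`, metric 302, field 1, the operator `>` and the reference value `10`.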

Correlation Example (I)
Objective:
– To run an actuator when the occupancy of the /tmp partition is greater than 80%.
Involved metrics
– 9104 (system.partitionInfo)
– Field 1 = mountname, field 5 = percentage occupancy
Correlation
    Correlation ((9104:1 eq '/tmp') && (9104:5 > 80))
    Actuator /usr/local/sbin/clean-tmp-partition -o 75
    MaxRuns
    Timeout 300
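The effect of this correlation on a multi-rowed metric can be sketched in Python. The sketch assumes that both conditions must hold for the same row of metric 9104, which the slide does not state explicitly; the row layout beyond fields 1 and 5 is a placeholder and `tmp_over_threshold` is a hypothetical name.

```python
def tmp_over_threshold(rows, mount="/tmp", limit=80.0):
    """Return True if some row has field 1 == mount and field 5 > limit.

    Field positions are 1-based in the correlation syntax, so field 1 is
    row[0] (mountname) and field 5 is row[4] (percentage occupancy).
    """
    for row in rows:
        mountname = row[0]
        occupancy = float(row[4])
        if mountname == mount and occupancy > limit:
            return True
    return False


if __name__ == "__main__":
    # Fields 2 to 4 are not described on the slide, so they are left as None.
    sample = [("/", None, None, None, 42.0), ("/tmp", None, None, None, 91.5)]
    print(tmp_over_threshold(sample))
```

When the function returns True, the configured actuator would be the next step; here that is the cleanup script named in the example.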

Correlation Example (II)
Objective:
– To raise an alarm “lemon_agent_wrong” if the memory utilisation, the cpu utilisation or the number of errors in the agent’s log file is not within acceptable limits.
Correlation
    10004:1 > 600 && (10004:7 > 10 || (10004:8 > && 4109:3 eq 'i386') || (10004:8 > && 4109:3 regex '64') || 10007:2 > 50 || 10007:3 > 10 || 10007:4 > 0)
If:
    (the uptime of the agent (10004:1) is greater than 600 seconds)
    AND (
        (the cpu utilisation of the sensors (10004:7) over the last sampling period is greater than 10%)
        OR (the memory consumed by the sensors (10004:8) is greater than 150 megabytes for machines of architecture type (4109:3) i386, or 600 megabytes for machines of architecture type x86_64)
        OR (the number of warning messages (10007:2) recorded over the last sampling period is greater than 50)
        OR (the number of error messages (10007:3) recorded over the last sampling period is greater than 10)
        OR (the number of fatal messages (10007:4) recorded over the last sampling period is greater than 0)
    )
then raise an alarm

Actuators
Information:
– Run as forked processes.
– Are connected to the sensor via a pipe.
– All information written to stdout or stderr by the actuator is caught and recorded in the agent’s log file.
– All actuator attempts are logged centrally and recorded locally in the agent’s log file.
Running shell style actuators:
– The system call used to run an actuator does not provide shell style conveniences.
– To use shell style syntax like *, &&, | etc. you must define your actuator like this:
    Actuator /bin/sh -c \\" /bin/echo 'This is a demo message from $HOSTNAME' \\"
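The behaviour described above (a child process without a shell, output captured, termination after a timeout) can be approximated in Python with the standard `subprocess` module. This is a sketch of the idea, not the sensor's C++ implementation; `run_actuator` and `shell_actuator` are hypothetical names.

```python
import subprocess
import sys


def shell_actuator(command):
    """Wrap a command so that shell syntax such as *, && and | is interpreted."""
    return ["/bin/sh", "-c", command]


def run_actuator(argv, timeout_s):
    """Run argv without a shell, capturing stdout and stderr together.

    Returns (exit_code, output); exit_code is None if the timeout was hit,
    in which case the child process has been killed.
    """
    try:
        proc = subprocess.run(argv, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None, "actuator timed out"
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    # Uses the Python interpreter as a stand-in actuator so the demo is portable.
    print(run_actuator([sys.executable, "-c", "print('demo message')"], 10))
```

Passing an argument list rather than a string is what makes the explicit `/bin/sh -c` wrapper necessary for shell syntax, which matches the note on the slide.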

Dealing with transient alarms
Why do we get transient alarms?
– By default, monitoring is not very tolerant of outside interventions
– There may be network issues.
– A resource may be temporarily unavailable.
What can be done?
– Use the configuration option MinOccurs
– MinOccurs gives an exception a level of tolerance: a delay factor between detecting a problem and raising an alarm
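The tolerance that MinOccurs provides amounts to counting consecutive detections. A minimal Python sketch of that idea follows; it assumes the counter resets as soon as one evaluation finds no problem, and `ExceptionState` is a hypothetical name, not a class from the sensor.

```python
class ExceptionState:
    """Track consecutive detections of a problem for one exception."""

    def __init__(self, min_occurs=1):
        self.min_occurs = min_occurs
        self.consecutive = 0

    def update(self, problem_present):
        """Record one evaluation; return True when an alarm should be raised."""
        if problem_present:
            self.consecutive += 1
        else:
            self.consecutive = 0  # a single good sample clears the count
        return self.consecutive >= self.min_occurs
```

With `min_occurs=3`, a problem seen in only two consecutive samples never becomes an alarm, which filters out short network glitches at the cost of a delay of a few sampling periods.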

Lemon Tutorial Quattor and Non-Quattor Configuration of the lemon-agent Miroslav Siket, Dennis Waldron CERN-IT/FIO-FD

Outline
– What is the agent?
– How to install the agent
– Configuring the agent
– Demonstration

What is the agent?
A daemon on every monitored machine that is responsible for:
– Launching sensors, scheduling requests and communicating with sensors.
– Checking on the status of sensors.
– Sending sensor information to the central lemon servers using TCP and/or UDP.
– Monitoring itself with the internal MSA sensor.
– Caching data locally for use by other lemon tools, e.g. lemon-host-check and lemon-cli
Full documentation at:

Configuring the agent
Two supported ways:
– Quattor
    Configuration is stored in hierarchical templates per domain/cluster/node
    The NCM framework is used to download the configuration XML profile to nodes
    NCM components are used to convert the XML profile information into the agent’s native configuration file structure.
    Documented at:
– Non-Quattor
    Best suited for homogeneous sites.
    Use the default agent and sensor rpms from Lemon
    Use rpms for custom sensors/settings
    The agent supports a modular style of configuration where configuration files are placed into sub-directories depending on their purpose:
        /etc/lemon/agent/metrics/ <- metric configuration
        /etc/lemon/agent/sensors/ <- sensor configuration
        /etc/lemon/agent/transports/ <- transport configuration
Both the Quattor and Non-Quattor styles of configuration can live together on the same machine.

Demonstration
Installation of the agent and default sensors
– rpm -Uvh edg-fabricMonitoring-agent i386.rpm
– rpm -Uvh lemon-sensor-exception i386.rpm
Configuration of:
– The agent’s general settings (/etc/lemon/agent/general.conf)
– Servers (transports) (/etc/lemon/agent/udp.conf)
– Defining a new sensor
– Defining a new metric

Lemon Tutorial lemon-host-check Miroslav Siket, Dennis Waldron CERN-IT/FIO-FD

Outline
– What is it?
– Demonstration

What is it?
Lemon-host-check is:
– The latest Lemon tool.
– A tool for checking the current status of all configured exceptions on the machine.
– A tool for managing the state of exceptions, with the ability to turn exceptions off and on, on the fly, without the need to reconfigure the agent.
– The first command you should run whenever you believe monitoring is incorrect!
How it works:
– It instructs the local agent to refresh all metrics contributing towards exceptions (raw metrics) and then requests a refresh of all exceptions.
– It uses fresh monitoring data.
Fully documented at:

Demonstration
Installation of lemon-host-check
– rpm -Uvh lemon-host-check noarch.rpm
– rpm -Uvh edg-fabricMonitoring-mrs i386.rpm
Show how to:
– Interpret the information returned by lemon-host-check
– Enable and disable exceptions
– View pre-alarms, running actuators and disabled metrics

Lemon Tutorial FlatMon and OraMon servers Miroslav Siket, Dennis Waldron CERN-IT/FIO-FD

Authentication
– Available in both the flat file based (FlatMon) and Oracle based (OraMon) servers
– Used for both TCP and UDP (stateless connections)
– Uses the OpenSSL libraries with public key methods to authenticate: sign() and verify() methods
– Supports RSA/SHA1, MD5 and DSA/DSS1 algorithms with different key sizes (default = 1024 bit)
– Fastest: RSA/SHA1
– X509 would add too much overhead
Three levels:
– 0: no authentication
– 1: authentication of signed packets; non-signed packets are also accepted
– 2: full enforcement of authentication
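The three levels can be read as a small acceptance policy on the server side. The Python sketch below shows one possible interpretation; it assumes that at level 1 a packet carrying an invalid signature is rejected while unsigned packets pass, which the slide does not spell out. The signature check itself is passed in as a callable and is not implemented here, and `accept_packet` is a hypothetical name.

```python
def accept_packet(level, signature, verify):
    """Decide whether a server accepts a packet under authentication level 0, 1 or 2.

    signature is None for an unsigned packet; verify(signature) returns True
    when the signature is valid (for example, an RSA/SHA1 check done elsewhere).
    """
    if level not in (0, 1, 2):
        raise ValueError("authentication level must be 0, 1 or 2")
    if level == 0:
        return True                # no authentication at all
    if signature is None:
        return level == 1          # unsigned packets pass only at level 1
    return bool(verify(signature)) # signed packets must verify at levels 1 and 2
```

Level 1 is useful during a migration, when some agents already sign their packets and others do not yet have keys deployed.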

Authentication - schema
[Diagram: three nodes sending signed data to two servers]
– Node1: [rsa_encrypt(s.pub_key)] rsa_sign(n1.sec_key)
– Node2: [rsa_encrypt(s.pub_key)] rsa_sign(n2.sec_key)
– Node3: [rsa_encrypt(s.pub_key)] rsa_sign(n3.sec_key)
– Server1: rsa_verify(metric,n3.pub_key) [rsa_decrypt(s.sec_key)]
– Server2: rsa_verify(metric,n1.pub_key) [rsa_decrypt(s.sec_key)]
Legend:
– s.pub_key: server’s public key
– n(x).sec_key: agent’s secret key
– n(x).pub_key: agent’s public key

Setup of FlatMon
Fast overview:
– Install the server rpm
– Set up the /etc/lemon/server/edg-fmon-server.conf file
– Set up the /etc/lemon/server/keys directory with client keys
– Check authentication
– Check data arriving at the server
– Check log files for problems

Setup of OraMon
Fast overview:
– Rpm installation (lemon-ora-admin, lemon-OraMon)
– DBA creation of the schema (use an adapted lemon_user.sql)
– Setting up the schema for OraMon with lemon-ora.admin
– Configuring metadata information (/etc/oramon-server.conf)
– Configuring OraMon:
    System settings with /etc/sysconfig/OraMon
    Access settings with /etc/lemon/server/lemon-oramon-server.conf
– Checking the log file for problems
– Checking data with lemon-ora.retrieve
– Changing the metadata – admin/index.html –

Lemon Tutorial LRF Miroslav Siket, Dennis Waldron CERN-IT/FIO-FD

Components
Lemonmrd: pre-processing of data, in python, into RRD files:
– OraMon and FlatMon data backends
– Redefinition of RRD files (metrics)
– Cluster/grouping metric processing (average, aggregate)
– Primary key constraint selection (1 value)
– Added tools for history data upload
– Dynamic cluster definition via plugins (VO utilization)
Warning: use rrdtool v (aka RRD version 0001 only)

Components II
Status pages
– Data source: both OraMon and FlatMon
– Written in PHP, using JpGraph for graphs drawn directly from Oracle data and rrdtool for RRD graphs
– Includes LAS in the installation (PHP version 5 is needed)
– Allows defining groupings of clusters, virtual clusters, racks, HW models, …
– Added support for Oracle databases
– Synoptic view of the Computer Center
– For integration with CDB: template and XML profile viewer integrated

Setup
Fast overview:
– Rpm installation (lrf)
– Setting up clusters: /etc/lemon/lrf/clusters.conf
– Starting lemonmrd, checking the log file (/var/log/lemonmrd.log)
– Verifying PHP settings
– Adding a rack config file
– Setting up the Synoptic display
– Viewing the data on the status pages, fast overview
– Changing the setup to work with Oracle XE:
    Modification of config files (/etc/lemon/lemonmrd.conf and /var/www/html/lrf/config.php)
    Setting up /etc/init.d/lemonmrd with Oracle XE –

Lemon Tutorial Lemon Alarm System Miroslav Siket, Dennis Waldron CERN-IT/FIO-FD

Installation
Lemon Alarm System (LAS) installation:
– PL/SQL procedures are shipped with the lemon-ora-admin tools, as an add-on to the OraMon schema (--create-las)
– The LAS GUI is shipped with LRF, as an add-on to the status displays (integrated solution)
– Additional part: lemon-ora.entities, for state and entity tree management (using a state XML file)
DISCLAIMER: although in production at CERN, LAS is not yet used elsewhere, so consider it highly experimental.