Lemon Monitoring Miroslav Siket, German Cancio, David Front, Maciej Stepniewski CERN-IT/FIO-FS LCG Operations Workshop Bologna, 24-26 May 2005.

25/05/2005 LCG Operations Workshop /05/2005 Bologna

Outline
– Lemon
– Structure and design
– How it works, deployment
– Use cases, web interface
– Installation and setup
– Summary

Lemon – LHC Era Monitoring
Lemon is a system of tools for monitoring the status and performance of computers:
– Distributed monitoring system, scalable to ~10k nodes
– Provides active monitoring of software and hardware on centrally managed clusters in the Computer Center
– Facilitates early error detection and problem prevention
– Executes corrective actions and sends notifications
– Provides persistent storage of the monitoring data
– Offers a framework for creating further monitoring sensors
– Site-independent functionality
Part of the ELFms toolsuite.

Lemon Use
Lemon is used inside and outside CERN by:
– System administrators, service managers, and the people responsible for clusters
– Developers, and for service and data challenges
– Managers and general users
Deployments outside CERN:
– EDG testbeds
– Accelerator (AB) department at CERN
– CMS online
– GridICE
– BARC, India (development partner)

Lemon architecture
[Figure: Lemon architecture diagram. Components shown: nodes running a Monitoring Agent with Sensors; TCP/UDP transport to the Monitoring Repository and its repository backend; SOAP access from the correlation engines and the Lemon CLI; RRDTool/PHP on Apache serving users in a web browser over HTTP.]
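
The diagram shows agents on the nodes shipping samples to the Monitoring Repository over TCP/UDP. A minimal sketch of the UDP path, assuming a made-up plain-text sample format ("host metric_id timestamp value"); this is not Lemon's actual wire protocol, and the host name and metric id used below are hypothetical.

```python
import socket
from typing import Tuple

Sample = Tuple[str, int, int, str]  # (host, metric_id, timestamp, value)

# Hypothetical wire format, for illustration only: "host metric_id timestamp value".
def encode_sample(host: str, metric_id: int, timestamp: int, value: str) -> bytes:
    return f"{host} {metric_id} {timestamp} {value}".encode("ascii")

def decode_sample(payload: bytes) -> Sample:
    host, metric_id, timestamp, value = payload.decode("ascii").split(" ", 3)
    return host, int(metric_id), int(timestamp), value

def send_sample(sock: socket.socket, address, sample: Sample) -> None:
    # Agent side: fire-and-forget UDP datagram to the repository.
    sock.sendto(encode_sample(*sample), address)

def receive_sample(sock: socket.socket) -> Sample:
    # Repository side: read one datagram and decode it.
    payload, _sender = sock.recvfrom(4096)
    return decode_sample(payload)

if __name__ == "__main__":
    repo = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    repo.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    repo.settimeout(2.0)
    agent = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    send_sample(agent, repo.getsockname(), ("lxb0001", 42, 1117000000, "0.42"))
    print(receive_sample(repo))
    agent.close()
    repo.close()
```

UDP keeps the agent cheap (no connection state per node), at the cost of possible sample loss; the TCP option in the diagram trades that overhead for reliable delivery.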

Components
Lemon is a typical client/server application with the following components:
– MSA (Monitoring Sensor Agent, the Lemon Agent): a daemon on each client machine that spawns multiple Monitoring Sensors, which measure data at defined intervals, and sends the data to the Monitoring Repository
– MS (Monitoring Sensor): written against a standard C++ or Perl API, so it is easy to write your own sensor. Sensors exist for performance, process, hardware and software monitoring, Grid VO job reporting, database monitoring, security, and alarms (260 metrics in total)
– MR (Monitoring Repository): a server application that receives samples and processes/validates them; it stores the full history of the monitoring data and has two implementations, based on flat files or on an Oracle DB
– LRF (Lemon RRD Framework): pre-processes data into RRD files and creates cluster summaries, which are used for the web graphics; provides service and cluster overviews in its web displays
– LAG (Lemon Alarm Gateway): a generic gateway for alarms (in development); gateways to MonALISA and GridICE exist
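
The agent's job of sampling each metric at its own interval can be illustrated with a toy scheduler. This is a sketch under stated assumptions: sensors are plain Python callables rather than separate sensor processes, the clock is simulated, and queued samples stand in for data sent to the repository. `MetricConfig` and `ToyAgent` are hypothetical names, not Lemon's API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class MetricConfig:
    metric_id: int
    interval: int                 # seconds between two samples of this metric
    sensor: Callable[[], object]  # stand-in for a real Lemon sensor
    next_due: int = 0             # simulated time at which the next sample is due

@dataclass
class ToyAgent:
    metrics: List[MetricConfig]
    # Samples queued for the repository; a real agent would send them over the network.
    outbox: List[Tuple[int, int, object]] = field(default_factory=list)

    def tick(self, now: int) -> None:
        """Sample every metric whose interval has elapsed."""
        for m in self.metrics:
            if now >= m.next_due:
                self.outbox.append((m.metric_id, now, m.sensor()))
                m.next_due = now + m.interval

    def run(self, start: int, end: int, step: int = 1) -> List[Tuple[int, int, object]]:
        # Simulated clock keeps the example deterministic; a real daemon would sleep.
        for now in range(start, end, step):
            self.tick(now)
        return self.outbox

if __name__ == "__main__":
    agent = ToyAgent([MetricConfig(1, 30, lambda: "load"),
                      MetricConfig(2, 60, lambda: "kernel")])
    for sample in agent.run(0, 120):
        print(sample)
```

Over the two simulated minutes, metric 1 (every 30 s) is sampled four times and metric 2 (every 60 s) twice.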

Lemon at CERN
– Lemon monitors about 2200 computers in ~100 clusters
– On average it collects about 70 metrics from each host
– It collects about 1.5 GB of data per day
– Integrated with the Sure alarm system
– LEAF (LHC-Era Automated Fabric) for high-level intervention scheduling
[Figure labels: Node Configuration Management, Node Management]
Configuration
– Derived from the Quattor Configuration Database (CDB)
– Individual configuration per cluster/host, in a hierarchical structure
Alarm system
– Sure: legacy system receiving alarms from Lemon
– Integration with the new LASER system (LHC alarm system) via LAG is ongoing
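
The per-cluster/per-host configuration can be pictured as layers where more specific levels override more general ones. A minimal sketch, assuming a simple shallow dictionary merge; the real configuration comes from Quattor CDB profiles, and the setting names, host name, and port numbers below are hypothetical.

```python
from typing import Any, Dict, Mapping

def resolve(*levels: Mapping[str, Any]) -> Dict[str, Any]:
    """Merge configuration levels, most general first; later (more specific) levels win."""
    merged: Dict[str, Any] = {}
    for level in levels:
        merged.update(level)  # shallow merge: a key at a later level replaces it entirely
    return merged

# Hypothetical settings, for illustration only.
site_defaults = {"server": "lemon-mr.example.org", "port": 12409, "metrics": ("load", "disk")}
cluster_batch = {"metrics": ("load", "disk", "batch_jobs")}
host_override = {"port": 12410}

if __name__ == "__main__":
    print(resolve(site_defaults, cluster_batch, host_override))
    # {'server': 'lemon-mr.example.org', 'port': 12410, 'metrics': ('load', 'disk', 'batch_jobs')}
```

A shallow merge keeps the example short; a deeper hierarchy would merge nested structures key by key instead of replacing them.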

Web interface
– The cluster view displays accumulated statistics and status for all machines in the cluster
– The host view gives an overview of the host status with basic metrics
– Other views available:
  – Rack view
  – Hardware type view
  – Other views can be added; user-defined views are in progress
– With the newest version (to be released soon):
  – Generic entry page displaying a status overview of the key services
  – Configurable views
– In development: database services monitoring, with a database-specific view
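
A cluster view condenses per-host data into a few cluster-level numbers. A small sketch of that aggregation, assuming a hypothetical per-host record with a `status` string and a `load` value; Lemon's own summaries are computed by LRF into RRD files, which this does not reproduce.

```python
from statistics import mean
from typing import Dict, Optional

def cluster_summary(hosts: Dict[str, dict]) -> Dict[str, Optional[float]]:
    """Summarise hosts (name -> {"status": str, "load": float}) into one cluster view."""
    loads = [h["load"] for h in hosts.values()]
    return {
        "hosts": len(hosts),
        "hosts_ok": sum(1 for h in hosts.values() if h["status"] == "ok"),
        "load_mean": mean(loads) if loads else None,  # None for an empty cluster
        "load_max": max(loads) if loads else None,
    }

if __name__ == "__main__":
    print(cluster_summary({
        "lxb001": {"status": "ok", "load": 0.5},
        "lxb002": {"status": "ok", "load": 1.5},
        "lxb003": {"status": "alarm", "load": 1.0},
    }))
```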

Use(ful) case
Kernel upgrade:
– The kernel version is “measured” when the machine boots
– Automatic tools for upgrading the kernel on a cluster retrieve this information from Lemon and schedule machine reboots based on it
– The web interface allows monitoring of the progress
[Figure: reboot occurrence history graph]
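
The upgrade workflow above boils down to comparing each host's measured kernel with the target and rebooting the stragglers in batches. A hedged sketch, assuming the kernel versions have already been fetched from Lemon into a dictionary; `plan_reboots`, `progress`, the batch size, and the version strings are hypothetical.

```python
from typing import Dict, List

def plan_reboots(kernel_by_host: Dict[str, str], target: str, batch_size: int) -> List[List[str]]:
    """Group hosts still reporting an old kernel into reboot batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    pending = sorted(h for h, k in kernel_by_host.items() if k != target)
    return [pending[i:i + batch_size] for i in range(0, len(pending), batch_size)]

def progress(kernel_by_host: Dict[str, str], target: str) -> float:
    """Fraction of hosts already reporting the target kernel."""
    if not kernel_by_host:
        return 1.0
    done = sum(1 for k in kernel_by_host.values() if k == target)
    return done / len(kernel_by_host)

if __name__ == "__main__":
    measured = {"lxb001": "2.4.21-27.EL", "lxb002": "2.4.21-32.EL",
                "lxb003": "2.4.21-27.EL", "lxb004": "2.4.21-27.EL"}
    print(plan_reboots(measured, "2.4.21-32.EL", batch_size=2))
    print(progress(measured, "2.4.21-32.EL"))
```

Because the kernel version is re-measured at boot, re-running the same query after each batch shows which reboots took effect, which is what the progress view relies on.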

Computer Center display
– The Lemon web interface can be interfaced with a Computer Center database of objects (racks, silos, …)
– Provides search of objects as well as listing
– Interfaced through an XML-defined geometry of the computer center
– Generic design that can be used anywhere
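
To show how an XML-defined geometry can drive listing and search, here is a sketch using Python's standard `xml.etree.ElementTree`. The XML schema (element and attribute names), rack ids, and host names are hypothetical; the slide does not specify Lemon's actual geometry format.

```python
import xml.etree.ElementTree as ET

# Hypothetical geometry schema, for illustration only.
GEOMETRY = """
<computercenter name="example">
  <rack id="RA01" x="0" y="0">
    <host name="lxb001"/>
    <host name="lxb002"/>
  </rack>
  <rack id="RA02" x="1" y="0">
    <host name="lxb003"/>
  </rack>
</computercenter>
"""

def load_racks(xml_text: str) -> dict:
    """Listing: map rack id -> floor position and contained hosts."""
    root = ET.fromstring(xml_text.strip())
    racks = {}
    for rack in root.iter("rack"):
        racks[rack.get("id")] = {
            "pos": (int(rack.get("x")), int(rack.get("y"))),
            "hosts": [h.get("name") for h in rack.findall("host")],
        }
    return racks

def find_host(racks: dict, name: str):
    """Search: return the id of the rack holding the host, or None."""
    for rack_id, rack in racks.items():
        if name in rack["hosts"]:
            return rack_id
    return None

if __name__ == "__main__":
    racks = load_racks(GEOMETRY)
    print(racks)
    print(find_host(racks, "lxb003"))
```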

Service challenges, Grid VOs
Lemon allows for:
– Virtual clusters: clusters defined on request by service managers or by scripts, either updated dynamically on demand or defined for a specific purpose. Examples: ALICE MDC, network challenges, …
– Dynamically defined clusters: for example, the hosts on the batch cluster running Grid jobs that belong to a given Virtual Organization; Lemon has hooks for defining any dynamic grouping of hosts
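
A dynamic cluster can be modelled as a predicate evaluated against the current state of each host, so membership changes as the state changes. A minimal sketch; the `grid_job_vos` field, the function names, and the host data are hypothetical and do not represent Lemon's actual hooks.

```python
from typing import Callable, Dict, Set

def dynamic_cluster(hosts: Dict[str, dict], predicate: Callable[[dict], bool]) -> Set[str]:
    """Return the hosts whose current state satisfies the predicate."""
    return {name for name, state in hosts.items() if predicate(state)}

def hosts_running_vo(vo: str) -> Callable[[dict], bool]:
    """Predicate: the host currently runs at least one Grid job belonging to `vo`."""
    return lambda state: vo in state.get("grid_job_vos", ())

if __name__ == "__main__":
    batch = {
        "lxb001": {"grid_job_vos": ["alice", "cms"]},
        "lxb002": {"grid_job_vos": ["lhcb"]},
        "lxb003": {},  # no Grid jobs right now
    }
    print(dynamic_cluster(batch, hosts_running_vo("cms")))
```

A virtual cluster defined by a service manager would simply be a fixed set of host names; the predicate form is what makes the grouping follow the jobs.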

Automatic recovery actions and alarms
Alarm Sensor
– For defined values of measured metrics, an actuator is called with a predefined action
– Example: ssh daemon dead => action /sbin/service sshd start
– Definition: metric X, field Y, reference value Z => call actuator; the comparison can be ==, <, >, regexp, range, etc. On success, only log; otherwise call the action up to a maximum number of times
– Each occurrence is logged in the Monitoring Repository
– About 70 predefined alarms with automatic recovery actions already exist
– After the first month of deployment, it halved the number of problem tickets
Correlation engine (CMDaemon)
– Allows 'global' correlations and, in the future, client/server alarms and recovery actions
Lemon Alarm Gateway (LAG)
– LAG can be used to feed Lemon alarms into arbitrary alarm systems (under development)
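
The alarm definition (field Y of metric X compared with reference value Z, then an actuator with a bounded number of attempts) can be sketched as follows. This is one reading of the slide, assuming the actuator is retried until it reports success or the maximum is reached; `AlarmRule`, `evaluate`, and the field names are hypothetical, and the actuator is a Python callable rather than a real call to `/sbin/service`.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Comparison operators named on the slide; values may arrive as strings.
OPERATORS = {
    "==": lambda v, ref: v == ref,
    "<": lambda v, ref: float(v) < float(ref),
    ">": lambda v, ref: float(v) > float(ref),
    "regexp": lambda v, ref: re.search(ref, str(v)) is not None,
    "range": lambda v, ref: ref[0] <= float(v) <= ref[1],
}

@dataclass
class AlarmRule:
    field_name: str             # field Y of the measured metric X
    op: str                     # key into OPERATORS
    reference: object           # reference value Z
    action: Callable[[], bool]  # actuator; returns True if recovery worked
    max_calls: int = 3          # upper bound on actuator attempts

def evaluate(rule: AlarmRule, sample: dict, log: List[str]) -> str:
    """Return "ok", "recovered" or "failed"; each alarm occurrence is appended to `log`."""
    if not OPERATORS[rule.op](sample[rule.field_name], rule.reference):
        return "ok"  # condition not met: no alarm
    for attempt in range(1, rule.max_calls + 1):
        if rule.action():
            log.append(f"{rule.field_name} {rule.op} {rule.reference!r}: recovered on attempt {attempt}")
            return "recovered"
    log.append(f"{rule.field_name} {rule.op} {rule.reference!r}: failed after {rule.max_calls} attempts")
    return "failed"

if __name__ == "__main__":
    attempts = []
    def fake_sshd_restart() -> bool:
        attempts.append(1)
        return len(attempts) >= 2  # pretend the second restart succeeds
    log: List[str] = []
    rule = AlarmRule("sshd_count", "==", 0, fake_sshd_restart)
    print(evaluate(rule, {"sshd_count": 0}, log), log)
```

Capping the attempts keeps a broken recovery action from looping forever; a rule that still fails after the cap is the case that should turn into a notification.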

Installation and setup (I)
Lemon installation consists of three steps:
1. Server installation
2. Client installation
3. Web interface installation
1. Server installation:
– Install the edg-fabricMonitoring-server rpm (“flat file” server)
– Configure the receiving port in /etc/edg-fmon-server.conf
– Start the server daemon
2. Client installation:
– Install the edg-fabricMonitoring-agent rpm (comes with a default metric configuration)
– Configure the server and its port in /etc/edg-fmon-agent.conf
– Start the client daemon on all monitored hosts

Installation and setup (II)
3. Web interface installation:
– Install and start an Apache server (with PHP) on your server
– Install the rrdtool and lrf (Lemon RRD Framework) rpms
– Configure your clusters in the clusters.conf file and start the lemonmrd daemon
Drink champagne… you have Lemon up and running! ;-)
– You can do all this on your laptop!
Possible additional components:
– Computer center synoptic view through an XML file
– Problem tracking system integration (through a PHP plug-in to your DB/application)
– Quattor CDB configuration view, through CDB XML profiles
– Oracle-based repository (for very large installations, with high scalability and increased functionality)
– Other new components are easy to add
View detailed instructions at:

Summary
– Lemon provides monitoring information about the farms in Computer Centers (or on your laptop).
– Lemon provides a framework for recovery actions and alarms.
– Lemon is easy to install (…and it is easy to add your own metrics and visualize them).
– It is flexible with respect to your needs: you can add clusters and views, and specify your own definitions of virtual and dynamic clusters.
– It has been a useful tool for general performance monitoring, and for system administrators debugging problems.
For more information check