Lemon Monitoring Presented by Bill Tomlin CERN-IT/FIO/FD WLCG-OSG-EGEE Operations Workshop CERN, 19-20 June 2006.

Slides:



Advertisements
Similar presentations
GridPP7 – June 30 – July 2, 2003 – Fabric monitoring– n° 1 Fabric monitoring for LCG-1 in the CERN Computer Center Jan van Eldik CERN-IT/FIO/SM 7 th GridPP.
Advertisements

CERN – BT – 01/07/ Cern Fabric Management -Hardware and State Bill Tomlin GridPP 7 th Collaboration Meeting June/July 2003.
26/05/2004HEPIX, Edinburgh, May Lemon Web Monitoring Miroslav Šiket CERN IT/FIO
1 CHEP 2000, Roberto Barbera Roberto Barbera (*) Grid monitoring with NAGIOS WP3-INFN Meeting, Naples, (*) Work in collaboration with.
DataGrid is a project funded by the European Union 22 September 2003 – n° 1 EDG WP4 Fabric Management: Fabric Monitoring and Fault Tolerance
GGF Toronto Spitfire A Relational DB Service for the Grid Peter Z. Kunszt European DataGrid Data Management CERN Database Group.
NGOP J.Fromm K.Genser T.Levshina M.Mengel V.Podstavkov.
The CERN Computer Centres October 14 th 2005 CERN.ch.
Current Status of Fabric Management at CERN, 26/7/2004 Current Status of Fabric Management at CERN CHEP 2004 Interlaken, 27/9/2004 CERN IT/FIO: G. Cancio,
Institute of Computer Science AGH Performance Monitoring of Java Web Service-based Applications Włodzimierz Funika, Piotr Handzlik Lechosław Trębacz Institute.
CERN IT Department CH-1211 Genève 23 Switzerland t Integrating Lemon Monitoring and Alarming System with the new CERN Agile Infrastructure.
CERN IT Department CH-1211 Genève 23 Switzerland t Some Hints for “Best Practice” Regarding VO Boxes Running Critical Services and Real Use-cases.
Performance and Exception Monitoring Project Tim Smith CERN/IT.
International Workshop on Large Scale Computing, VECC, Kolkata, Feb 8-10, LCG Software Activities in India Rajesh K. Computer Division BARC.
CERN - IT Department CH-1211 Genève 23 Switzerland t Monitoring the ATLAS Distributed Data Management System Ricardo Rocha (CERN) on behalf.
ATLAS DQ2 Deletion Service D.A. Oleynik, A.S. Petrosyan, V. Garonne, S. Campana (on behalf of the ATLAS Collaboration)
7/2/2003Supervision & Monitoring section1 Supervision & Monitoring Organization and work plan Olof Bärring.
CCR GRID 2010 (Catania) Daniele Gregori, Stefano Antonelli, Donato De Girolamo, Luca dell’Agnello, Andrea Ferraro, Guido Guizzunti, Pierpaolo Ricci, Felice.
Large Computer Centres Tony Cass Leader, Fabric Infrastructure & Operations Group Information Technology Department 14 th January and medium.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Simply monitor a grid site with Nagios J.
Monitoring in EGEE EGEE/SEEGRID Summer School 2006, Budapest Judit Novak, CERN Piotr Nyczyk, CERN Valentin Vidic, CERN/RBI.
May PEM status report. O.Bärring 1 PEM status report Large-Scale Cluster Computing Workshop FNAL, May Olof Bärring, CERN.
CERN IT Department CH-1211 Geneva 23 Switzerland t Open projects in Grid Monitoring IT-GS-MDS Section Meeting 25 th January 2008.
Fermilab Distributed Monitoring System (NGOP) Progress Report J.Fromm K.Genser T.Levshina M.Mengel V.Podstavkov.
1 The new Fabric Management Tools in Production at CERN Thorsten Kleinwort for CERN IT/FIO HEPiX Autumn 2003 Triumf Vancouver Monday, October 20, 2003.
And Tier 3 monitoring Tier 3 Ivan Kadochnikov LIT JINR
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks WMSMonitor: a tool to monitor gLite WMS/LB.
CERN IT Department CH-1211 Genève 23 Switzerland t IT Monitoring WG IT/CS Monitoring System Virginie Longo September 14th 2011.
Lemon Monitoring Miroslav Siket, German Cancio, David Front, Maciej Stepniewski CERN-IT/FIO-FS LCG Operations Workshop Bologna, May 2005.
Storage cleaner: deletes files on mass storage systems. It depends on the results of deletion, files can be set in states: deleted or to repeat deletion.
EU 2nd Year Review – Feb – WP4 demo – n° 1 WP4 demonstration Fabric Monitoring and Fault Tolerance Sylvain Chapeland Lord Hess.
CERN IT Department CH-1211 Geneva 23 Switzerland t CF Computing Facilities Agile Infrastructure Monitoring CERN IT/CF.
Computing Facilities CERN IT Department CH-1211 Geneva 23 Switzerland t CF Lemon for Quattor I.Fedorko CERN CF/IT 16 March 2011.
INFSO-RI Enabling Grids for E-sciencE GridICE: Grid and Fabric Monitoring Integrated for gLite-based Sites Sergio Fantinel INFN.
INFSO-RI Enabling Grids for E-sciencE The gLite File Transfer Service: Middleware Lessons Learned form Service Challenges Paolo.
INFSO-RI Enabling Grids for E-sciencE ARDA Experiment Dashboard Ricardo Rocha (ARDA – CERN) on behalf of the Dashboard Team.
Fabric Management with ELFms BARC-CERN collaboration meeting B.A.R.C. Mumbai 28/10/05 Presented by G. Cancio – CERN/IT.
Site Manageability & Monitoring Issues for LCG Ian Bird IT Department, CERN LCG MB 24 th October 2006.
Lemon Tutorial Sensor Exception Miroslav Siket, Dennis Waldron CERN-IT/FIO-FD.
EGEE is a project funded by the European Union under contract IST GLite Integration Infrastructure Integration Team JRA1.
Computing Facilities CERN IT Department CH-1211 Geneva 23 Switzerland t CF Agile Infrastructure Monitoring HEPiX Spring th April.
Computing Facilities CERN IT Department CH-1211 Geneva 23 Switzerland t CF CF Monitoring: Lemon, LAS, SLS I.Fedorko(IT/CF) IT-Monitoring.
CERN IT Department CH-1211 Genève 23 Switzerland t CERN IT Monitoring and Data Analytics Pedro Andrade (IT-GT) Openlab Workshop on Data Analytics.
Lemon Tutorial Sensor How-To Miroslav Siket, Dennis Waldron CERN-IT/FIO-FD.
FTS monitoring work WLCG service reliability workshop November 2007 Alexander Uzhinskiy Andrey Nechaevskiy.
Operations model Maite Barroso, CERN On behalf of EGEE operations WLCG Service Workshop 11/02/2006.
Computing Facilities CERN IT Department CH-1211 Geneva 23 Switzerland t CF Lemon monitoring and Lemon Alarm System (sensors, exception, alarm)
Computing Facilities CERN IT Department CH-1211 Geneva 23 Switzerland t CF CC Monitoring I.Fedorko on behalf of CF/ASI 18/02/2011 Overview.
CERN IT Department CH-1211 Genève 23 Switzerland t CERN Agile Infrastructure Monitoring Pedro Andrade CERN – IT/GT HEPiX Spring 2012.
Streaming Analytics with Spark 1 Magnoni Luca IT-CM-MM 09/02/16EBI - CERN meeting.
DataTAG is a project funded by the European Union CERN, 8 May 2003 – n o 1 / 10 Grid Monitoring A conceptual introduction to GridICE Sergio Andreozzi
CMS Experience with the Common Analysis Framework I. Fisk & M. Girone Experience in CMS with the Common Analysis Framework Ian Fisk & Maria Girone 1.
Monitoring and Information Services Core Infrastructure (MIS-CI) Service Description Mark L. Green OSG Integration Workshop at UC Feb 15-17, 2005.
Experiment Support CERN IT Department CH-1211 Geneva 23 Switzerland t DBES Author etc Alarm framework requirements Andrea Sciabà Tony Wildish.
Lemon Computer Monitoring at CERN Miroslav Siket, German Cancio, David Front, Maciej Stepniewski Presented by Harry Renshall CERN-IT/FIO-FS.
Monitoring Working Group Update Grid Deployment Board 5 th December, CERN Ian Neilson.
Lemon Tutorial Quattor and Non-Quattor Configuration of the lemon-agent Miroslav Siket, Dennis Waldron CERN-IT/FIO-FD.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Nagios Grid Monitor E. Imamagic, SRCE OAT.
Service Availability Monitoring
Jean-Philippe Baud, IT-GD, CERN November 2007
WP4 meeting Heidelberg - Sept 26, 2003 Jan van Eldik - CERN IT/FIO
System Monitoring with Lemon
Monitoring and Fault Tolerance
Status of Fabric Management at CERN
LEMON – Monitoring in the CERN Computer Centre
Miroslav Siket, Dennis Waldron
Monitoring: problems, solutions, experiences
FTS Monitoring Ricardo Rocha
Data Management cluster summary
Database Services for CERN Deployment and Monitoring
Presentation transcript:

Lemon Monitoring Presented by Bill Tomlin CERN-IT/FIO/FD WLCG-OSG-EGEE Operations Workshop CERN, June 2006

19/06/2006WLCG-OSG-EGEE Operations Workshop 2 Lemon – LHC Era Monitoring Distributed monitoring framework + default metrics For nodes, DBs, power consumption, backups, VO jobs Scalable to ~10k nodes, 500+ metrics Early error detection and automatic recovery Web interface Integrated alarm system Data persisted to Oracle, Oracle Express or flat files Framework for plug-in sensors Site independent: BARC, CERN IT+AB, FZK, IN2P3, INFN, RAL GridICE based on LEMON (~180 sites) Easy to install out of the box Well documented at

19/06/2006WLCG-OSG-EGEE Operations Workshop 3 Lemon architecture Correlation Engines Web browser Lemon CLI User Monitoring Repository TCP/UDP SOAP Repository backend Prot Nodes Monitoring Agent Sensor RRDTool / PHP apache HTTP

19/06/2006WLCG-OSG-EGEE Operations Workshop 4 Automatic Recovery Actions Actuator called for defined conditions Complex correlations: m1 > m2 – 50 and m3 < m4 Retry n times before raising an alarm; All actions logged, including success/failure Example: ssh daemon dead – action /sbin/service sshd start ~62 corrective actions defined

19/06/2006WLCG-OSG-EGEE Operations Workshop 5 Web Interface

19/06/2006WLCG-OSG-EGEE Operations Workshop 6 LEMON Alarm System Oracle based AJAX web based GUI Oracle PL/SQL based business logic (reductions of alarms for operators) Notifications: RSS feeds, , SMS Integrated with quattor and State Management System Plug-ins for site-specific integration e.g. Remedy Phasing in Lemon Alarm System (August 2006) Ongoing work

19/06/2006WLCG-OSG-EGEE Operations Workshop 7 Summary –Can re-use whole or part of LEMON –Good fabric management essential to providing good grid services –Queries to: –More details: –LEMON tutorial at CERN on 22nd of September