System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam.

Slides:



Advertisements
Similar presentations
NAGIOS AND CACTI NETWORK MANAGEMENT AND MONITORING SYSTEMS.
Advertisements

CWG10 Control, Configuration and Monitoring Status and plans for Control, Configuration and Monitoring 16 December 2014 ALICE O 2 Asian Workshop
Clara Gaspar on behalf of the LHCb Collaboration, “Physics at the LHC and Beyond”, Quy Nhon, Vietnam, August 2014 Challenges and lessons learnt LHCb Operations.
Pankaj Kumar Qinglan Zhang Sagar Davasam Sowjanya Puligadda Wei Liu
CHEP 2012 – New York City 1.  LHC Delivers bunch crossing at 40MHz  LHCb reduces the rate with a two level trigger system: ◦ First Level (L0) – Hardware.
June 19, 2002 A Software Skeleton for the Full Front-End Crate Test at BNL Goal: to provide a working data acquisition (DAQ) system for the coming full.
1 Databases in ALICE L.Betev LCG Database Deployment and Persistency Workshop Geneva, October 17, 2005.
Cacti Workshop Tony Roman Agenda What is Cacti? The Origins of Cacti Large Installation Considerations Automation The Current.
23/04/2008VLVnT08, Toulon, FR, April 2008, M. Stavrianakou, NESTOR-NOA 1 First thoughts for KM3Net on-shore data storage and distribution Facilities VLV.
Monitoring a Large-Scale Network: Selecting the Right Tool Sayadur Rahman United International University & Network Manager, Financial Service.
Monitoring network traffic of Cisco 2950 switch and Cisco 1600 router Group 4 Ishan Shah (CIN: ) Jyotsna Mishra (CIN: ) Parth Chavda (CIN: )
CHEP04 - Interlaken - Sep. 27th - Oct. 1st 2004T. M. Steinbeck for the Alice Collaboration1/20 New Experiences with the ALICE High Level Trigger Data Transport.
F Fermilab Database Experience in Run II Fermilab Run II Database Requirements Online databases are maintained at each experiment and are critical for.
CERN IT Department CH-1211 Genève 23 Switzerland t Integrating Lemon Monitoring and Alarming System with the new CERN Agile Infrastructure.
1 Network Statistic and Monitoring System Wayne State University Division of Computing and Information Technology Information Technology.
Status Report on Tier-1 in Korea Gungwon Kang, Sang-Un Ahn and Hangjin Jang (KISTI GSDC) April 28, 2014 at 15th CERN-Korea Committee, Geneva Korea Institute.
Zabbix Performance Tuning
Inventory:OCSNG + GLPI Monitoring: Zenoss 3
Hsu Chun-Hung Network Benchmarking Lab
Boosting Event Building Performance Using Infiniband FDR for CMS Upgrade Andrew Forrest – CERN (PH/CMD) Technology and Instrumentation in Particle Physics.
Farm Management D. Andreotti 1), A. Crescente 2), A. Dorigo 2), F. Galeazzi 2), M. Marzolla 3), M. Morandin 2), F.
Open Science Grid The OSG Accounting System: GRATIA by Philippe Canal (FNAL) & Matteo Melani (SLAC) Mumbai, India CHEP2006.
Ramiro Voicu December Design Considerations  Act as a true dynamic service and provide the necessary functionally to be used by any other services.
1 Alice DAQ Configuration DB
Update on Database Issues Peter Chochula DCS Workshop, June 21, 2004 Colmar.
workshop eugene, oregon What is network management? System & Service monitoring  Reachability, availability Resource measurement/monitoring.
May PEM status report. O.Bärring 1 PEM status report Large-Scale Cluster Computing Workshop FNAL, May Olof Bärring, CERN.
The ALICE Data-Acquisition Software Framework DATE V5 F. Carena, W. Carena, S. Chapeland, R. Divià, I. Makhlyueva, J-C. Marin, K. Schossmaier, C. Soós,
Roberto Divià, CERN/ALICE 1 CHEP 2009, Prague, March 2009 The ALICE Online Data Storage System Roberto Divià (CERN), Ulrich Fuchs (CERN), Irina Makhlyueva.
Graphing and statistics with Cacti AfNOG 11, Kigali/Rwanda.
And Tier 3 monitoring Tier 3 Ivan Kadochnikov LIT JINR
A.Golunov, “Remote operational center for CMS in JINR ”, XXIII International Symposium on Nuclear Electronics and Computing, BULGARIA, VARNA, September,
Management of the LHCb DAQ Network Guoming Liu * †, Niko Neufeld * * CERN, Switzerland † University of Ferrara, Italy.
Clara Gaspar, March 2005 LHCb Online & the Conditions DB.
CHEP 2000: 7-11 February, 2000 I. SfiligoiData Handling in KLOE 1 CHEP 2000 Data Handling in KLOE I.Sfiligoi INFN LNF, Frascati, Italy.
Lemon Monitoring Miroslav Siket, German Cancio, David Front, Maciej Stepniewski CERN-IT/FIO-FS LCG Operations Workshop Bologna, May 2005.
1 Oracle Enterprise Manager Slides from Dominic Gélinas CIS
Local Monitoring at SARA Ron Trompert SARA. Ganglia Monitors nodes for Load Memory usage Network activity Disk usage Monitors running jobs.
Online Software 8-July-98 Commissioning Working Group DØ Workshop S. Fuess Objective: Define for you, the customers of the Online system, the products.
News on GEM Readout with the SRS, DATE & AMORE
Monitoring and Managing Server Performance. Server Monitoring To become familiar with the server’s performance – typical behavior Prevent problems before.
Xrootd Monitoring and Control Harsh Arora CERN. Setting Up Service  Monalisa Service  Monalisa Repository  Test Xrootd Server  ApMon Module.
P. Vande Vyvre – CERN/PH CERN – January Research Theme 2: DAQ ARCHITECT – Jan 2011 P. Vande Vyvre – CERN/PH2 Current DAQ status: Large computing.
CERN - IT Department CH-1211 Genève 23 Switzerland t High Availability Databases based on Oracle 10g RAC on Linux WLCG Tier2 Tutorials, CERN,
DØ Online Workshop3-June-1999S. Fuess Online Computing Overview DØ Online Workshop 3-June-1999 Stu Fuess.
Management of the LHCb Online Network Based on SCADA System Guoming Liu * †, Niko Neufeld † * University of Ferrara, Italy † CERN, Geneva, Switzerland.
Monitoring Update David Lawrence, JLab Feb. 20, /20/14Online Monitoring Update -- David Lawrence1.
AliEn central services Costin Grigoras. Hardware overview  27 machines  Mix of SLC4, SLC5, Ubuntu 8.04, 8.10, 9.04  100 cores  20 KVA UPSs  2 * 1Gbps.
CERN IT Department CH-1211 Genève 23 Switzerland t CERN IT Monitoring and Data Analytics Pedro Andrade (IT-GT) Openlab Workshop on Data Analytics.
Management of the LHCb DAQ Network Guoming Liu *†, Niko Neufeld * * CERN, Switzerland † University of Ferrara, Italy.
03/09/2007http://pcalimonitor.cern.ch/1 Monitoring in ALICE Costin Grigoras 03/09/2007 WLCG Meeting, CHEP.
Monitoring for the ALICE O 2 Project 11 February 2016.
R.Divià, CERN/ALICE 1 ALICE off-line week, CERN, 9 September 2002 DAQ-HLT software interface.
DAQ & ConfDB Configuration DB workshop CERN September 21 st, 2005 Artur Barczyk & Niko Neufeld.
ATLAS Off-Grid sites (Tier-3) monitoring A. Petrosyan on behalf of the ATLAS collaboration GRID’2012, , JINR, Dubna.
Computing Facilities CERN IT Department CH-1211 Geneva 23 Switzerland t CF CC Monitoring I.Fedorko on behalf of CF/ASI 18/02/2011 Overview.
P. Vande Vyvre – CERN/PH for the ALICE collaboration CHEP – October 2010.
TIFR, Mumbai, India, Feb 13-17, GridView - A Grid Monitoring and Visualization Tool Rajesh Kalmady, Digamber Sonvane, Kislay Bhatt, Phool Chand,
Fermilab Scientific Computing Division Fermi National Accelerator Laboratory, Batavia, Illinois, USA. Off-the-Shelf Hardware and Software DAQ Performance.
WLCG Transfers monitoring EGI Technical Forum Madrid, 17 September 2013 Pablo Saiz on behalf of the Dashboard Team CERN IT/SDC.
Database 12.2 and Oracle Enterprise Manager 13c Liana LUPSA.
ALICE Computing Data Challenge VI
Application or server monitoring
System Monitoring with Lemon
CMS High Level Trigger Configuration Management
ALICE – First paper.
Offline shifter training tutorial
CERN-Russia Collaboration in CASTOR Development
Oracle Database Monitoring and beyond
Network Performance Insight PoC Performance Readout
Presentation transcript:

System performance monitoring in the ALICE Data Acquisition System with Zabbix Adriana Telesca October 15 th, 2013 CHEP 2013, Amsterdam

The ALICE Data Acquisition system ALICE at the CERN LHC Data Acquisition system requirements: 4 GB/s sustained recording rate 2.5 GB/s transfer to tape 15/10/2013 Adriana Telesca, CHEP /24

The ALICE Data Acquisition system For Run 2 ( ): ~ 1000 nodes Readout Event building Recording Storage Support (network, PDUs) Operations For Run 3 ( ): ~ 2000 nodes 15/10/2013 Adriana Telesca, CHEP /24

The ALICE Data Acquisition system For Run 2 ( ): ~ 1000 nodes Readout Event building Recording Storage Support (network, PDUs) Operations For Run 3 ( ): ~ 2000 nodes 15/10/2013 Adriana Telesca, CHEP /24 O2: a new combined online and offline computing for ALICE after 2018 P. Vande Vyvre’s talk today at 16:45 – Data Acquisition track

Lemon was used to monitor the DAQ system during Run 1 ( ). Decision to replace it: Lemon future unsure Tools with additional/new functionalities LHC Long Shutdown 1 Lemon 15/10/2013 Adriana Telesca, CHEP /24

ALICE DAQ monitoring system needs Low impact Extensibility/Flexibility Scalability 15/10/2013 Adriana Telesca, CHEP /24

ALICE DAQ monitoring system needs Full administration GUI Easy access to data Interface with other components ORTHOS Alarming system 15/10/2013 Adriana Telesca, CHEP /24

Parameters to monitor CPU Memory Disk usage Network Interfaces Processes Voltage Current Temperature Outlet status Disk status Ethernet: CPU utilization Memory utilization Cards temperature Fiber Channel: RX/TX ports rate Readout links Bytes In/Out DAQ XOFF, HLT XOFF Processes CPU and memory 15/10/2013 Adriana Telesca, CHEP /24

Shortlist Selection criteria: 1.SNMP 2.Logical grouping 3. Large user community 4. Distributed monitoring NameAgentSNMPSyslogWebAppData Storage MethodLicense Cacti NoYes Full ControlRRDtool, MySQLGPL Icinga SupportedVia plugin Full Control MySQL, PostgreSQL, Oracle DatabaseGPL Zabbix SupportedYes Full Control Oracle, MySQL, PostgreSQL, IBM DB2, SQLiteGPL Zenoss NoYes Full ControlZODB, MySQL, RRDtoolGPL + Splunk + MonALISA SupportedYes Full controlRaw filesCommercial 15/10/2013 Adriana Telesca, CHEP /24 Source:

NameData gathering GraphingTriggeringScalabilityData Storage Extensibilit y IcingaAgent 011 – up to 1000 hosts DB 2 CactiServer 201 – up to 1000 hosts RRDtool – DB 2 ZenossServer 112 – RRDtool – DB 1 ZabbixAgent or Server 212 – DB 2 SplunkAgent 212 – Raw files 2 MonALISAAgent 212 – DB 2 Tools comparison 0-1 Absent-Present Absent - Present but not good - Good 15/10/2013 Adriana Telesca, CHEP /24

NameData gathering GraphingTriggeringScalabilityData Storage Extensibilit y IcingaAgent 011 – up to 1000 hosts DB 2 CactiServer 201 – up to 1000 hosts RRDtool – DB 2 ZenossServer 112 – RRDtool – DB 1 ZabbixAgent or Server 212 – DB 2 SplunkAgent 212 – Raw files 2 MonALISAAgent 212 – DB 2 Tools comparison 0-1 Absent-Present Absent - Present but not good - Good 15/10/2013 Adriana Telesca, CHEP /24

NameData gathering GraphingTriggeringScalabilityData Storage Extensibilit y IcingaAgent 011 – up to 1000 hosts DB 2 CactiServer 201 – up to 1000 hosts RRDtool – DB 2 ZenossServer 112 – RRDtool – DB 1 ZabbixAgent or Server 212 – DB 2 SplunkAgent 212 – Raw files 2 MonALISAAgent 212 – DB 2 Tools comparison 0-1 Absent-Present Absent - Present but not good - Good 15/10/2013 Adriana Telesca, CHEP /24

NameData gathering GraphingTriggeringScalabilityData Storage Extensibilit y IcingaAgent 011 – up to 1000 hosts DB 2 CactiServer 201 – up to 1000 hosts RRDtool – DB 2 ZenossServer 112 – RRDtool – DB 1 ZabbixAgent or Server 212 – DB 2 SplunkAgent 212 – Raw files 2 MonALISAAgent 212 – DB 2 Tools comparison 0-1 Absent-Present Absent - Present but not good - Good 15/10/2013 Adriana Telesca, CHEP /24

NameData gathering GraphingTriggeringScalabilityData Storage Extensibilit y IcingaAgent 011 – up to 1000 hosts DB 2 CactiServer 201 – up to 1000 hosts RRDtool – DB 2 ZenossServer 112 – RRDtool – DB 1 ZabbixAgent or Server 212 – DB 2 SplunkAgent 212 – Raw files 2 MonALISAAgent 212 – DB 2 Tools comparison 0-1 Absent-Present Absent - Present but not good - Good 15/10/2013 Adriana Telesca, CHEP /24

Tools comparison NameSNMPCommunit y GranularityAuto Discovery Free Icinga minute /metric 21 Cacti minute / metric 11 Zenoss minute /collector 21 Zabbix No limit /metric 21 Splunk No limit / metric 20 MonALISA minute /metric Absent-Present Absent - Present but not good - Good 15/10/2013 Adriana Telesca, CHEP /24

Tools comparison NameSNMPCommunit y GranularityAuto Discovery Free Icinga minute /metric 21 Cacti minute / metric 11 Zenoss minute /collector 21 Zabbix No limit /metric 21 Splunk No limit / metric 20 MonALISA minute /metric Absent-Present Absent - Present but not good - Good 15/10/2013 Adriana Telesca, CHEP /24

Tools comparison NameSNMPCommunit y GranularityAuto Discovery Free Icinga minute /metric 21 Cacti minute / metric 11 Zenoss minute /collector 21 Zabbix No limit /metric 21 Splunk No limit / metric 20 MonALISA minute /metric Absent-Present Absent - Present but not good - Good 15/10/2013 Adriana Telesca, CHEP /24

Tools comparison NameSNMPCommunit y GranularityAuto Discovery Free Icinga minute /metric 21 Cacti minute / metric 11 Zenoss minute /collector 21 Zabbix No limit /metric 21 Splunk No limit / metric 20 MonALISA minute /metric Absent-Present Absent - Present but not good - Good 15/10/2013 Adriana Telesca, CHEP /24

Tools comparison NameSNMPCommunit y GranularityAuto Discovery Free Total Icinga minute /metric Cacti minute / metric Zenoss minute /collector Zabbix No limit /metric Splunk No limit / metric MonALISA minute /metric Absent-Present Absent - Present but not good - Good 15/10/2013 Adriana Telesca, CHEP /24

Tools comparison NameSNMPCommunit y GranularityAuto Discovery Free Total Icinga minute /metric Cacti minute / metric Zenoss minute /collector Zabbix No limit /metric Splunk No limit / metric MonALISA minute /metric Absent-Present Absent - Present but not good - Good 15/10/2013 Adriana Telesca, CHEP /24

Graphing Full configuration GUI Many ways of data retrieval  scalability Zabbix characteristics 15/10/2013 Adriana Telesca, CHEP /24

Zabbix characteristics 15/10/2013 Adriana Telesca, CHEP /24

Zabbix characteristics 15/10/2013 Adriana Telesca, CHEP /24

Zabbix characteristics 15/10/2013 Adriana Telesca, CHEP /24

Zabbix characteristics 15/10/2013 Adriana Telesca, CHEP /24

Zabbix characteristics 15/10/2013 Adriana Telesca, CHEP /24

Zabbix characteristics 15/10/2013 Adriana Telesca, CHEP /24

Zabbix footprint tests 15/10/2013 Adriana Telesca, CHEP /24

Zabbix footprint tests 15/10/2013 Adriana Telesca, CHEP /24

Zabbix footprint tests 15/10/2013 Adriana Telesca, CHEP /24

Zabbix footprint tests 15/10/2013 Adriana Telesca, CHEP /24

Zabbix dashboard and usage 15/10/2013 Adriana Telesca, CHEP /24

Zabbix dashboard and usage 15/10/2013 Adriana Telesca, CHEP /24

Zabbix dashboard and usage 15/10/2013 Adriana Telesca, CHEP /24

Zabbix dashboard and usage 15/10/2013 Adriana Telesca, CHEP /24

The evaluation of different monitoring tools resulted in the selection of Zabbix. Zabbix meets the ALICE DAQ needs. Zabbix will be in production for Run 2. Conclusion 15/10/2013 Adriana Telesca, CHEP /24

Thanks. Questions?