System performance monitoring in the ALICE Data Acquisition System with Zabbix. Adriana Telesca, October 15th, 2013, CHEP 2013, Amsterdam.
The ALICE Data Acquisition system: ALICE at the CERN LHC. Data Acquisition system requirements: 4 GB/s sustained recording rate; 2.5 GB/s transfer to tape.
The ALICE Data Acquisition system: For Run 2 ( ): ~1000 nodes (readout, event building, recording, storage, support (network, PDUs), operations). For Run 3 ( ): ~2000 nodes. O2: a new combined online and offline computing for ALICE after 2018, P. Vande Vyvre's talk today at 16:45, Data Acquisition track.
Lemon: Lemon was used to monitor the DAQ system during Run 1 ( ). Decision to replace it: Lemon future unsure; tools with additional/new functionalities; LHC Long Shutdown 1.
ALICE DAQ monitoring system needs: low impact; extensibility/flexibility; scalability.
ALICE DAQ monitoring system needs: full administration GUI; easy access to data; interface with other components (ORTHOS alarming system).
Parameters to monitor: CPU, memory, disk usage, network interfaces, processes; voltage, current, temperature, outlet status; disk status; Ethernet: CPU utilization, memory utilization, cards temperature; Fiber Channel: RX/TX ports rate; readout links: bytes in/out, DAQ XOFF, HLT XOFF; processes: CPU and memory.
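To make the host-level parameters above concrete, here is a minimal Python sketch of how load, memory and disk usage could be sampled on a Linux node. It is only an illustration and not how the Zabbix agent or the ALICE DAQ collects these values; the function names and the returned metric names are hypothetical.

```python
import os
import shutil


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: kilobytes} dict."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def disk_used_percent(path="/"):
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def collect_host_metrics(meminfo_path="/proc/meminfo"):
    """Sample a few host metrics; entries are omitted when unavailable."""
    metrics = {"disk_used_percent": disk_used_percent("/")}
    try:
        # 1-minute load average (Unix only).
        metrics["load_1min"] = os.getloadavg()[0]
    except (AttributeError, OSError):
        pass
    try:
        with open(meminfo_path) as handle:
            mem = parse_meminfo(handle.read())
    except OSError:
        mem = {}
    # MemAvailable is assumed to be present (reasonably recent Linux kernels).
    if mem.get("MemTotal") and "MemAvailable" in mem:
        used = mem["MemTotal"] - mem["MemAvailable"]
        metrics["mem_used_percent"] = 100.0 * used / mem["MemTotal"]
    return metrics


if __name__ == "__main__":
    for key, value in sorted(collect_host_metrics().items()):
        print(f"{key}: {value:.2f}")
```

Values such as PDU voltage or switch temperature would typically come over SNMP instead, which is why SNMP support appears among the selection criteria.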
Shortlist
Selection criteria: 1. SNMP 2. Logical grouping 3. Large user community 4. Distributed monitoring
Name | Agent | SNMP, Syslog | WebApp | Data storage method | License
Cacti | No | Yes | Full Control | RRDtool, MySQL | GPL
Icinga | Supported | Via plugin | Full Control | MySQL, PostgreSQL, Oracle Database | GPL
Zabbix | Supported | Yes | Full Control | Oracle, MySQL, PostgreSQL, IBM DB2, SQLite | GPL
Zenoss | No | Yes | Full Control | ZODB, MySQL, RRDtool | GPL
Splunk | Supported | Yes | Full control | Raw files | Commercial
Added to the shortlist: Splunk, MonALISA
Source:
Tools comparison
Name | Data gathering | Graphing | Triggering | Scalability | Data storage | Extensibility
Icinga | Agent | 0 | 1 | 1 (up to 1000 hosts) | DB | 2
Cacti | Server | 2 | 0 | 1 (up to 1000 hosts) | RRDtool - DB | 2
Zenoss | Server | 1 | 1 | 2 | RRDtool - DB | 1
Zabbix | Agent or Server | 2 | 1 | 2 | DB | 2
Splunk | Agent | 2 | 1 | 2 | Raw files | 2
MonALISA | Agent | 2 | 1 | 2 | DB | 2
Scales: 0-1: Absent - Present. 0-1-2: Absent - Present but not good - Good.
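As an illustration of how such a score table can be reduced to a ranking, the sketch below sums the four numeric columns of the comparison above (graphing, triggering, scalability, extensibility), reading the flattened digits in that column order. This is an unweighted partial sum over one table only, an assumption made for the example; it is not the Total column of the presentation.

```python
# Scores as read from the comparison table above (column order assumed:
# graphing, triggering, scalability, extensibility).
SCORES = {
    "Icinga":   {"graphing": 0, "triggering": 1, "scalability": 1, "extensibility": 2},
    "Cacti":    {"graphing": 2, "triggering": 0, "scalability": 1, "extensibility": 2},
    "Zenoss":   {"graphing": 1, "triggering": 1, "scalability": 2, "extensibility": 1},
    "Zabbix":   {"graphing": 2, "triggering": 1, "scalability": 2, "extensibility": 2},
    "Splunk":   {"graphing": 2, "triggering": 1, "scalability": 2, "extensibility": 2},
    "MonALISA": {"graphing": 2, "triggering": 1, "scalability": 2, "extensibility": 2},
}


def partial_totals(scores):
    """Unweighted sum of the available criteria for each tool."""
    return {tool: sum(criteria.values()) for tool, criteria in scores.items()}


def ranked(scores):
    """Tools sorted by descending partial total, ties broken by name."""
    totals = partial_totals(scores)
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for tool, total in ranked(SCORES):
        print(f"{tool:10s} {total}")
```

On these four criteria alone Zabbix, Splunk and MonALISA tie, so the remaining criteria (SNMP, community, granularity, auto discovery, free) are what separate them.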
Tools comparison
Name | SNMP | Community | Granularity | Auto Discovery | Free
Icinga | | | minute / metric | 2 | 1
Cacti | | | minute / metric | 1 | 1
Zenoss | | | minute / collector | 2 | 1
Zabbix | | | No limit / metric | 2 | 1
Splunk | | | No limit / metric | 2 | 0
MonALISA | | | minute / metric | |
Scales: 0-1: Absent - Present. 0-1-2: Absent - Present but not good - Good.
Tools comparison: columns Name, SNMP, Community, Granularity, Auto Discovery, Free, Total. Granularity: Icinga, Cacti, MonALISA: minute / metric; Zenoss: minute / collector; Zabbix, Splunk: no limit / metric. Scales: Absent - Present; Absent - Present but not good - Good.
Zabbix characteristics: graphing; full configuration GUI; many ways of data retrieval; scalability.
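One of the ways to retrieve data from Zabbix is its JSON-RPC API. The sketch below builds a `history.get` request for the latest values of one item. The server URL, item ID and token are placeholders, and carrying the token in the `auth` field of the request body follows older Zabbix releases; later releases are understood to pass it in an HTTP header, so the API documentation for the installed version should be checked.

```python
import itertools
import json
import urllib.request

_request_ids = itertools.count(1)


def build_request(method, params, auth_token=None):
    """Build a JSON-RPC 2.0 request body as a dict."""
    request = {
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": next(_request_ids),
    }
    if auth_token is not None:
        # Assumption: token carried in the body, as in older Zabbix versions.
        request["auth"] = auth_token
    return request


def history_request(item_id, limit, auth_token):
    """Request the most recent `limit` float values of one item."""
    params = {
        "output": "extend",
        "history": 0,          # 0 selects numeric (float) history
        "itemids": [item_id],
        "sortfield": "clock",
        "sortorder": "DESC",
        "limit": limit,
    }
    return build_request("history.get", params, auth_token)


def send(url, request, timeout=10):
    """POST a request to the API endpoint and return the decoded reply."""
    data = json.dumps(request).encode("utf-8")
    http_request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json-rpc"}
    )
    with urllib.request.urlopen(http_request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder values; nothing is sent here, the body is only printed.
    body = history_request(item_id="12345", limit=10, auth_token="TOKEN")
    print(json.dumps(body, indent=2))
```

A call such as `send("http://zabbix.example.org/zabbix/api_jsonrpc.php", body)` would perform the query; the host name in it is hypothetical.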
Zabbix footprint tests
Zabbix dashboard and usage
Conclusion: The evaluation of different monitoring tools resulted in the selection of Zabbix. Zabbix meets the ALICE DAQ needs. Zabbix will be in production for Run 2.
Thanks. Questions?