System performance monitoring in the ALICE Data Acquisition System with Zabbix. Adriana Telesca, October 15th, 2013, CHEP 2013, Amsterdam.


1 System performance monitoring in the ALICE Data Acquisition System with Zabbix. Adriana Telesca, October 15th, 2013, CHEP 2013, Amsterdam.

2 The ALICE Data Acquisition system. ALICE at the CERN LHC. Data Acquisition system requirements: 4 GB/s sustained recording rate; 2.5 GB/s transfer to tape. 15/10/2013 Adriana Telesca, CHEP 2013 2/24

4 The ALICE Data Acquisition system. For Run 2 (2015-2017): ~1000 nodes. Readout; event building; recording; storage; support (network, PDUs); operations. For Run 3 (2019-2021): ~2000 nodes. O2: a new combined online and offline computing system for ALICE after 2018 (P. Vande Vyvre's talk today at 16:45, Data Acquisition track).

5 Lemon was used to monitor the DAQ system during Run 1 (2008-2013). Decision to replace it: Lemon's future is unsure; tools with additional or new functionalities are available; LHC Long Shutdown 1.

6 ALICE DAQ monitoring system needs: low impact; extensibility/flexibility; scalability.

7 ALICE DAQ monitoring system needs: full administration GUI; easy access to data; interface with other components (ORTHOS alarming system).

8 Parameters to monitor: CPU, memory, disk usage, network interfaces, processes, voltage, current, temperature, outlet status, disk status. Ethernet: CPU utilization, memory utilization, cards temperature. Fiber Channel: RX/TX ports rate. Readout links: bytes in/out, DAQ XOFF, HLT XOFF. Processes: CPU and memory.

9 Shortlist. Selection criteria: 1. SNMP; 2. Logical grouping; 3. Large user community; 4. Distributed monitoring.

| Name | Agent | SNMP | Syslog | WebApp | Data Storage Method | License |
|---|---|---|---|---|---|---|
| Cacti | No | Yes | Yes | Full Control | RRDtool, MySQL | GPL |
| Icinga | Supported | Via plugin | Via plugin | Full Control | MySQL, PostgreSQL, Oracle Database | GPL |
| Zabbix | Supported | Yes | Yes | Full Control | Oracle, MySQL, PostgreSQL, IBM DB2, SQLite | GPL |
| Zenoss | No | Yes | Yes | Full Control | ZODB, MySQL, RRDtool | GPL |
| + Splunk + MonALISA | Supported | Yes | Yes | Full control | Raw files | Commercial |

Source: http://en.wikipedia.org/wiki/Comparison_of_network_monitoring_systems

10 Tools comparison

| Name | Data gathering | Graphing | Triggering | Scalability | Data Storage | Extensibility |
|---|---|---|---|---|---|---|
| Icinga | Agent | 0 | 1 | 1 (up to 1000 hosts) | DB | 2 |
| Cacti | Server | 2 | 0 | 1 (up to 1000 hosts) | RRDtool, DB | 2 |
| Zenoss | Server | 1 | 1 | 2 (1000+) | RRDtool, DB | 1 |
| Zabbix | Agent or Server | 2 | 1 | 2 (1000+) | DB | 2 |
| Splunk | Agent | 2 | 1 | 2 (1000+) | Raw files | 2 |
| MonALISA | Agent | 2 | 1 | 2 (1000+) | DB | 2 |

Scoring: 0-1 = absent, present; 0-1-2 = absent, present but not good, good.

15 Tools comparison

| Name | SNMP | Community | Granularity | Auto Discovery | Free |
|---|---|---|---|---|---|
| Icinga | 2 | 2 | 1 (1 minute/metric) | 2 | 1 |
| Cacti | 2 | 2 | 1 (1 minute/metric) | 1 | 1 |
| Zenoss | 1 | 1 | 1 (1 minute/collector) | 2 | 1 |
| Zabbix | 2 | 2 | 2 (no limit/metric) | 2 | 1 |
| Splunk | 2 | 2 | 2 (no limit/metric) | 2 | 0 |
| MonALISA | 2 | 1 | 1 (1 minute/metric) | 2 | 1 |

Scoring: 0-1 = absent, present; 0-1-2 = absent, present but not good, good.

19 Tools comparison

| Name | SNMP | Community | Granularity | Auto Discovery | Free | Total |
|---|---|---|---|---|---|---|
| Icinga | 2 | 2 | 1 (1 minute/metric) | 2 | 1 | 12 |
| Cacti | 2 | 2 | 1 (1 minute/metric) | 1 | 1 | 12 |
| Zenoss | 1 | 1 | 1 (1 minute/collector) | 2 | 1 | 11 |
| Zabbix | 2 | 2 | 2 (no limit/metric) | 2 | 1 | 16 |
| Splunk | 2 | 2 | 2 (no limit/metric) | 2 | 0 | 15 |
| MonALISA | 2 | 1 | 1 (1 minute/metric) | 2 | 1 | 14 |

Scoring: 0-1 = absent, present; 0-1-2 = absent, present but not good, good.
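The slides do not state how the Total column is computed. The totals are consistent with a plain sum of the nine numerically scored criteria from the two "Tools comparison" tables (Graphing, Triggering, Scalability, Extensibility, SNMP, Community, Granularity, Auto Discovery, Free), with the unscored Data gathering and Data Storage columns left out. The following Python sketch checks that reading. The names `SCORES`, `SLIDE_TOTALS`, `totals` and `ranking` are illustrative and not taken from the talk.

```python
# Scores transcribed from the two "Tools comparison" slides.
# Column order (assumed to be the scored criteria):
#   Graphing, Triggering, Scalability, Extensibility,
#   SNMP, Community, Granularity, Auto Discovery, Free
SCORES = {
    "Icinga":   (0, 1, 1, 2, 2, 2, 1, 2, 1),
    "Cacti":    (2, 0, 1, 2, 2, 2, 1, 1, 1),
    "Zenoss":   (1, 1, 2, 1, 1, 1, 1, 2, 1),
    "Zabbix":   (2, 1, 2, 2, 2, 2, 2, 2, 1),
    "Splunk":   (2, 1, 2, 2, 2, 2, 2, 2, 0),
    "MonALISA": (2, 1, 2, 2, 2, 1, 1, 2, 1),
}

# Totals as printed in the Total column of the slide.
SLIDE_TOTALS = {
    "Icinga": 12, "Cacti": 12, "Zenoss": 11,
    "Zabbix": 16, "Splunk": 15, "MonALISA": 14,
}


def totals(scores):
    """Sum every scored criterion for each tool."""
    return {name: sum(row) for name, row in scores.items()}


def ranking(scores):
    """Tool names ordered from highest to lowest total (ties keep table order)."""
    computed = totals(scores)
    return sorted(computed, key=lambda name: computed[name], reverse=True)


if __name__ == "__main__":
    computed = totals(SCORES)
    for name in ranking(SCORES):
        matches = computed[name] == SLIDE_TOTALS[name]
        print(f"{name:9s} {computed[name]:2d}  matches slide: {matches}")
```

Under this assumption every computed sum equals the slide's Total, and Zabbix ranks first with 16 points.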

21 Zabbix characteristics: graphing; full configuration GUI; many ways of data retrieval; scalability.
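The slide does not say which data retrieval paths were used in the ALICE setup. One path Zabbix offers is its JSON-RPC web API, whose `history.get` method returns stored values for an item. The Python sketch below only builds such a request body and sends nothing over the network. The item ID and token are hypothetical placeholders, and carrying the session token in the `auth` field follows older Zabbix releases (recent releases prefer an HTTP `Authorization` header), so the API documentation for the installed version should be checked.

```python
import json


def build_history_request(item_id, auth_token, limit=10, request_id=1):
    """Build a JSON-RPC 2.0 body for the Zabbix `history.get` method.

    item_id and auth_token are placeholders supplied by the caller;
    nothing here is taken from the ALICE configuration.
    """
    return {
        "jsonrpc": "2.0",
        "method": "history.get",
        "params": {
            "output": "extend",      # return all fields of each history record
            "history": 0,            # 0 selects numeric (float) history
            "itemids": [item_id],
            "sortfield": "clock",    # sort by timestamp
            "sortorder": "DESC",     # newest values first
            "limit": limit,
        },
        "auth": auth_token,          # assumption: token in body, as in older releases
        "id": request_id,
    }


if __name__ == "__main__":
    # Hypothetical item ID and token, for illustration only.
    body = build_history_request("12345", "hypothetical-token", limit=5)
    print(json.dumps(body, indent=2))
```

The resulting body would be sent as an HTTP POST with content type `application/json-rpc` to the server's `api_jsonrpc.php` endpoint.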

22 Zabbix characteristics

23 Zabbix characteristics

28 Zabbix footprint tests

29 Zabbix footprint tests

30 Zabbix footprint tests

31 Zabbix footprint tests

32 Zabbix dashboard and usage

33 Zabbix dashboard and usage

34 Zabbix dashboard and usage

35 Zabbix dashboard and usage

36 Conclusion: The evaluation of different monitoring tools resulted in the selection of Zabbix. Zabbix meets the ALICE DAQ needs. Zabbix will be in production for Run 2.

37 Thanks. Questions?

