DIAMON. What is DIAMON ? Technology stack Current Situation & Plans.

1 DIAMON

2 What is DIAMON? Technology stack. Current situation & plans.

3 What is DIAMON? A service that lets you check what IS and what WAS going on.

4 The User Interface. Informs on: health of computers / applications / equipment; computer / application problem logs; host events (installations, restarts, reboots); who is responsible for each computer. Allows you to: restart software or a computer; subscribe to mail/SMS notifications; remotely introspect processes (JMX/CMX). History: created in 2006 based on LASER-1 technology; DIAMON V.2 based on C2MON since 2012. Targeted at developers + operators (+ service managers).

5 The User Interface. Shows general information on an entity: responsible, operational, location. Shows metrics and their health: computer, FECs, workstations, PLCs, module driver status, timing board status, process up/down. Allows access for diagnostics: remotely introspect processes (JMX/CMX); check FIP board status; check loaded drivers for modules; check timing board state; check host events (installations, restarts, reboots); check problem logs; check running processes. Actions: restart software or a computer; ping a computer, SSH to a machine; subscribe to mail/SMS notifications. Targeted at developers + operators (+ service managers).

6 What do we monitor? Applications: metrics via JMX/RDA/CMX + up/down state. Computers: FECs, workstations, servers, VMs (CPU/disk/uptime/network). SNMP devices, e.g. WhiteRabbit. Dedicated checks, e.g. JMS service performance. Timing board status. Module driver status (correct/incorrect). FIP module status. In total: 3400 computers, 6880 processes, 152,000 metrics.

7 Entities are organized in a tree structure. Slide labels: Details; Tools to filter; Groups / Hosts.

8 Find computer/process/metrics quickly. Slide labels: Hosts; Processes; Metrics.

9 Problem Overview for Host

10 Metric Details for Host

11 Host events: restarts, reboots, installations, ...

12 Shows logs from Tracing

13 Custom extensions: module driver status, FIP board state

14 Remote “live” introspection of Java/C/C++ processes. Slide labels: C/C++ services; Java services.
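
For the Java side, introspection of this kind rests on JMX. The following is a minimal sketch using only the standard `java.lang.management` and `javax.management` APIs. It reads two attributes from the local JVM's platform MBean server; the slides describe remote introspection, which would need a connection to the remote process instead and is not shown. The class `JmxIntrospectionSketch` and its method names are hypothetical and not taken from DIAMON.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

// Hypothetical helper for illustration; not part of DIAMON or C2MON.
final class JmxIntrospectionSketch {

    /** Reads one attribute of one MBean, wrapping checked JMX errors. */
    static Object read(MBeanServerConnection server, String mbean, String attribute) {
        try {
            return server.getAttribute(new ObjectName(mbean), attribute);
        } catch (Exception e) {
            throw new IllegalStateException("Cannot read " + mbean + " / " + attribute, e);
        }
    }

    /** JVM uptime in milliseconds, from the standard Runtime MXBean. */
    static long uptimeMillis(MBeanServerConnection server) {
        return (Long) read(server, "java.lang:type=Runtime", "Uptime");
    }

    /** Heap bytes currently in use, from the standard Memory MXBean. */
    static long usedHeapBytes(MBeanServerConnection server) {
        CompositeData usage =
                (CompositeData) read(server, "java.lang:type=Memory", "HeapMemoryUsage");
        return (Long) usage.get("used");
    }

    public static void main(String[] args) {
        // The local platform server stands in for a remote process here.
        MBeanServerConnection local = ManagementFactory.getPlatformMBeanServer();
        System.out.println("uptime ms: " + uptimeMillis(local));
        System.out.println("used heap bytes: " + usedHeapBytes(local));
    }
}
```

Because the helpers take an `MBeanServerConnection`, the same calls would work unchanged against a connection obtained from a remote JMX connector.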

15 Get Responsible quickly

16 Enable your notifications

17 Notification example. Alert: “IO Wait on computer too high”, followed by “IO Wait back to OK”.

18 Manage your notifications. Slide labels: if in WARN/ERROR, this sends a value-change notification; filter items; EGroup; edit mode; notification level; enable regular problem report.
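
The notification behaviour described on slides 17 and 18 can be sketched as a small state machine. This is an illustration under stated assumptions, not DIAMON's implementation: the class `NotificationSketch`, the three-level `Status` enum, and the message wording are hypothetical. The assumed rule is that a message goes out on every status or value change while the metric is at or above the subscribed level, and once more when it drops back below it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a per-metric notifier; names are not from DIAMON.
final class NotificationSketch {
    enum Status { OK, WARN, ERROR }

    private final Status minLevel;          // subscribed notification level
    private Status lastStatus = Status.OK;  // assume metrics start healthy
    private Double lastValue = null;        // no value seen yet
    private final List<String> sent = new ArrayList<>();

    NotificationSketch(Status minLevel) {
        this.minLevel = minLevel;
    }

    /** Feeds one metric update and records any message that would be sent. */
    void onUpdate(String metric, Status status, double value) {
        boolean wasProblem = lastStatus.compareTo(minLevel) >= 0;
        boolean isProblem = status.compareTo(minLevel) >= 0;
        boolean changed = status != lastStatus
                || lastValue == null
                || value != lastValue.doubleValue();
        if (isProblem && changed) {
            sent.add("Alert: " + metric + " is " + status + " (value " + value + ")");
        } else if (!isProblem && wasProblem) {
            sent.add(metric + " back to " + status);
        }
        lastStatus = status;
        lastValue = value;
    }

    /** Messages recorded so far, oldest first. */
    List<String> sent() {
        return sent;
    }
}
```

Feeding it an "IO Wait" metric that goes OK, ERROR, ERROR (new value), OK yields two alerts and a final "IO Wait back to OK", mirroring the pair of messages in the notification example.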

19 DIAMON GUI as portal to more information. Slide labels: JAPC Toolbox; CCDB; Tracing; CMWAdmin; CCDB DIAMON Configuration; Only for JMX & CLIC …

20 How is DIAMON used? GUI: operators, equipment specialists, service managers; user-specific configurations = “views” (CCDB). Backend: notify on problems (> 90 messages/day, 29 users); applications read data from DIAMON via RDA (i.e. MOON); host alarms are expected by OP; FECs host metrics in Lemon; driver feedback; FIP problems.

21 Technology Stack

22 The C2MON Project: CERN Control and Monitoring Platform. Acquisition and filtering of metrics; evaluation of rules and alarms; provides historical data and replay; extensible and modular. Diagram labels: C2MON Server; DAQ API; Client API; myApp; myDAQ; myMod; Acquisition; Filtering; Business; Client Apps.
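
To make "acquisition and filtering" concrete, here is a minimal sketch of one common filtering technique, a value deadband: an update is forwarded only if it differs from the last forwarded value by at least a configured amount. The slide does not say which filters are applied, so the choice of a deadband is an assumption, and the class `DeadbandFilterSketch` is hypothetical rather than part of the C2MON DAQ API.

```java
// Hypothetical value-deadband filter; not taken from the C2MON DAQ API.
final class DeadbandFilterSketch {
    private final double deadband;   // minimum change worth forwarding
    private Double lastForwarded;    // null until the first value passes

    DeadbandFilterSketch(double deadband) {
        this.deadband = deadband;
    }

    /** Returns true if the value should be forwarded to the server. */
    boolean accept(double value) {
        if (lastForwarded == null
                || Math.abs(value - lastForwarded.doubleValue()) >= deadband) {
            lastForwarded = value;   // remember what was actually sent
            return true;
        }
        return false;                // change too small, drop the update
    }
}
```

With a deadband of 1.0, the sequence 10.0, 10.4, 11.0, 11.5 forwards only 10.0 and 11.0. Dropping small changes at the acquisition side is one way to keep the update volume reaching the server manageable.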

23 DIAMON (C2MON) architecture diagram. Diagram labels: CLIC; RDA; LASER; JMX; PING; Lemon; SNMP; CCDB; MOON; History; DIAMON GUI; TRACING; “configures C2MON”; 1 server; 4 outputs. Acquisition & filtering: 24 DAQs; > 15 equipment types; ~8271 equipments; ~150,000 data points; > 26 million updates/day. Business logic layer: > 150k data points; > 300k alarms; > 20k commands; > 50k business rules; > 28 million updates/day; > 70 configurations/day.

24 The CLIC Agent. C++ process on every machine. Compatible with Windows/Linux down to PPC4. Accessible via JMS Java API or the /mcr/bin/dmnsh command. Sends metrics every x seconds. Extensions: allows access to timing board state; allows access to CMX-enabled C/C++ processes; allows access to FIP bus. Only client-side acquisition of metrics. CLIC communication via STOMP.
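
Since the agent communicates via STOMP, a short sketch of what a STOMP `SEND` frame looks like on the wire may help. The frame layout (command line, `key:value` headers, blank line, body, terminating NUL byte) follows the STOMP specification. Everything else is an assumption for illustration: the class `StompFrameSketch`, the destination name, and the metric payload format are hypothetical and do not describe the actual CLIC message format.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical frame builder; destination and payload format are assumptions.
final class StompFrameSketch {

    /**
     * Builds a STOMP SEND frame as text. Header values are not escaped here,
     * so destination must not contain ':' , '\n' or '\r'.
     */
    static String sendFrame(String destination, String body) {
        // content-length counts the bytes of the body, not its characters.
        int length = body.getBytes(StandardCharsets.UTF_8).length;
        return "SEND\n"
                + "destination:" + destination + "\n"
                + "content-type:text/plain\n"
                + "content-length:" + length + "\n"
                + "\n"      // blank line separates headers from body
                + body
                + '\0';     // every STOMP frame ends with a NUL byte
    }

    public static void main(String[] args) {
        String frame = sendFrame("/topic/metrics.host1", "cpu.load=0.42");
        // Show the NUL terminator visibly when printing.
        System.out.println(frame.replace("\0", "^@"));
    }
}
```

An agent that "sends metrics every x seconds" would build one such frame per interval and write it to an open broker connection; the scheduling and socket handling are left out of this sketch.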

25 What is good: system is used by a large user group on a daily basis; acquisition and standardizing of information from various sources (RDA, JMX, SNMP, host metrics…); C2MON platform is quite reliable & evolving in the right direction; online reconfiguration is VERY useful; mail/SMS notification is very useful for pre-failure detection. What is not so good: CCDB-driven (re)configuration; SQL-based history limited to 3 months & slow (C2MON); limited rule evaluation (C2MON); notification messages sometimes not really clear; some information is still missing compared to e.g. Lemon; notifications or triggering rules cannot be easily trained.

26 General experience. Sometimes not clear who is responsible for a certain problem. People have most experience, not the system. We can show the responsible for a host, but what about a module or timing card problem? No information on the responsible for a deployed software process. No concept of acknowledging a problem. Currently rather host-oriented, less “service”-oriented. Notifications cannot be trained.

27 Plans & Dreams. Move configuration part away from CCDB. Make information easily accessible: why only TN? Replace SQL history storage. Enable sending/receiving data via REST. C2MON: replace JCache implementation (Terracotta -> Ignite?); find a better solution for the rule engine. DIAMON@grafana.

28 Appendix

29 The C2MON Project: CERN Control and Monitoring Platform. Acquisition and filtering of metrics; evaluation of rules and alarms; provides historical data and replay; extensible and modular. Diagram labels: C2MON Server; DAQ API; Client API; myApp; myDAQ; myMod; Acquisition; Filtering; Business; Client Apps.

30 C2MON in Action. Removes database dependency. Allows scaling horizontally. Diagram labels: DAQ …; C2MON Server …; Terracotta; Terracotta Standby; Client; JMX; SNMP; Alarm; Web Interface; Access; Dashboard; Data Analysis.

31 Technologies used for C2MON. C2MON server: no J2EE server and only open source! Java 7, Spring 4.2; persistence framework: MyBatis (server), Hibernate (client). Dependency management through Maven. Database: Oracle, but no major dependencies; also works with HSQL, MySQL. Middleware: JMS ActiveMQ. Message transport format: XML and JSON. Remote caching solution for C2MON server cluster: Terracotta/Ehcache (Apache Ignite foreseen): horizontally scalable, proven technology, open source, support contract possible.

