ACET Accelerator Controls Exploitation Tools Progress and plans, December 2012.

1 ACET Accelerator Controls Exploitation Tools Progress and plans, December 2012

2 Outline
• Controls system overview
• Motivation and purpose
• Focus points
• 2013
• Conclusions
ACET - TC on 06 December 2012

3 Controls system overview
[Diagram: layered view of the controls system: Applications, Middletier, Front Ends, Hardware]
Diagram labels: Knobs, Services, “Core”, Diagnostics, Sequencer, Orbit, InCA/LSA, Proxies, JMS, SIS, CMW/FESA, Timing, Drivers, DB, Boot, NFS, cmwDir, RBAC, DiaMon, cmwAdmin, FESA Navigator, Video, Syslog, Tune RT
Scale: 425 consoles, 400 GUIs, 300 servers, 200 Java servers, 1300 FECs, 600 module types, 85,000 devices

5 ACET
• Motivation
  • Distributed and complex controls system
  • Knowledge distributed over many experts
  • Move towards uniform (LHC) exploitation model across machines
• Purpose: Allow (non-)experts to carry out more efficient diagnostics
• ACET collaborates with CO projects to improve diagnostic facilities of the control system

7 Focus points
• Diagnostic Tools – aggregation and training
• Process metrics – JMX & CMX
• DiaMon – GUI and CLIC agent
• Documentation
• Wiki/site structure, Portal and Useful links
• Dynamic/runtime dependencies
• Feedback – Tracing & Config
• message format, transport, analysis
• Trace analysis using Splunk
• Config analysis in CCDB

8 Diagnostic tools
• Tools evaluated for criticality
• Aggregation into CCM diagnostic menu
• Training given during shutdown lectures

10 Process Metrics – JMX architecture
• http://wikis/display/ACET/JMX+client+instrumentation
[Diagram labels: C2Mon, SRV, JMX-DAQ, DiaMon GUI, Metrics, RMI, JMX mBeans, JMX viewer, JmxDirectory, jConsole, jar1, jar2, mgt, JVM, jmx-dir-client, jVisualVM]

11 Process metrics – CMX architecture
• http://wikis/display/MW/CMX
[Diagram labels: C2Mon, CLIC-DAQ, DiaMon GUI, CLIC agent, CMX viewer, Command line tool, DB, Metrics, FEC, shared memory segments, cmx-lib registry, C process p1 (lib1, lib2, cmx-lib-c), C++ process p2 (lib3, lib4, cmx-lib-c++)]

12 Process metrics – DiaMon JMX integration

13 Process metrics - jConsole

14 Process metrics - Viewers

15 Process metrics – JMX lookup

17 Documentation - Structure

18 Documentation – Portal

19 Documentation – Useful links

21 Dependencies - architecture
• Data collection before LS1 http://wikis/display/MW/Statistics
[Diagram labels: FEC, CMW/FESA, cmwAdmin, cmwadmin-scanner, cmwDirectory, client connections, log files, Dependency analysis, “dot” files, Visualization]
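
The diagram labels above mention client connections being turned into “dot” files for visualization. As a rough illustration only, the sketch below shows how a list of connection records could be rendered as a Graphviz digraph. The record format, the host names, and the function name `connections_to_dot` are all hypothetical; the actual scanner output and tooling are not described in the slides.

```python
from collections import Counter


def connections_to_dot(connections):
    """Render (client, server) pairs as a Graphviz digraph in dot syntax."""
    # Repeated pairs are collapsed into one edge labelled with the count.
    weights = Counter(connections)
    lines = ["digraph dependencies {"]
    for (client, server), count in sorted(weights.items()):
        lines.append(f'  "{client}" -> "{server}" [label="{count}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical sample records; real data would come from the collected logs.
sample = [
    ("app-orbit", "fec-bpm-01"),
    ("app-orbit", "fec-bpm-01"),
    ("app-tune", "fec-bpm-01"),
]
dot_text = connections_to_dot(sample)
print(dot_text)
```

The resulting text can be passed to any Graphviz renderer; edge labels here simply count how often a client/server pair was seen.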

22 Dependencies – a view

23 Dependencies – a view
http://wikis/display/MW/Statistics
Face FecBook

25 Feedback – architecture
• http://wikis/display/MW/Log+and+Tracing
[Diagram labels: FEC/SRV, C process, Java process, jar1, jar2, cmw, FESA3, cmw-fb-c, cmw-log, cmw-log4j, Tracing & Config libs, logfiles, /var/log/messages, syslog@cs-ccr-feop, syslog@cs-ccr-tracing, JMS@cs-ccr-cmw, JMS@cs-ccr-tracing, syslog converters, Syslog tracing, Java tracing, Listeners, GUIs, APEX GUIs, Splunk, CCDB, Impl, make, Scripts, cmmnbld, deploy, wreboot]

26 Feedback – CCDB tracing GUI

27 Feedback – Hardware config CCDB GUI

28 Splunk - architecture
• Central instance running on dedicated machine
• Project accounts set up
• Training given to projects
• Project-specific searches created
Contact Steen for Splunk access
[Diagram labels: FEC, SRV, cmw-log, cmw-log4j, filters, filter&throttle, logfiles, /var/log/messages, syslog@cs-ccr-feop, syslog@cs-ccr-tracing, JMS@cs-ccr-cmw, JMS@cs-ccr-tracing, Splunk@cs-ccr-tracing]

29 Splunk – Message filter GUI

30 Splunk – saved searches

31 Splunk - visualization

32 Splunk – dashboard

33 Splunk – Use case: japc-ext-dir
• Queue overflow messages from CMW proxy
• Hosts and PIDs reported
• Client application identified
• japc-ext-dir suspected – and verified
• Subscriptions made to “constant” properties
• Data never consumed => Queue overflow in proxy
• Problem fixed by Eric
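
The failure mode in this use case (a subscriber that never consumes its data, so the proxy-side queue fills up) can be illustrated with a small simulation. This is only a sketch of the general mechanism using Python's standard `queue` module; the capacity of 5, the 20 updates, and the function name `publish` are made up for the example and do not reflect the CMW proxy implementation.

```python
import queue


def publish(subscriber_queue, updates):
    """Push updates to one subscriber; return how many were dropped."""
    dropped = 0
    for update in updates:
        try:
            subscriber_queue.put_nowait(update)
        except queue.Full:
            # A real proxy would emit a queue-overflow trace message here.
            dropped += 1
    return dropped


# Client that subscribes but never reads: the bounded queue overflows.
idle_client = queue.Queue(maxsize=5)
dropped_idle = publish(idle_client, range(20))

# Client that consumes each update as it arrives: nothing is dropped.
busy_client = queue.Queue(maxsize=5)
dropped_busy = 0
for update in range(20):
    dropped_busy += publish(busy_client, [update])
    busy_client.get_nowait()

print(dropped_idle, dropped_busy)
```

Counting dropped updates per subscriber is the kind of signal that, once traced with host and PID, lets an analysis tool point at the offending client.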

34 Splunk – Use cases
• Leap second
• RBAC tokens missing/malformed/expired
• CMW slow clients
• Telegram layout and configuration
• JAPC applying wrong token in certain cases
• FESA handling of Timlib error
• Separating test environment from operational

35 Splunk – Comments (1)
• “Proper usage requires very good configuration”
• “We need to rework our way to log information…”
• “Log files are a bit of a mess now, and only contain a sub-set of necessary data…it is necessary to clean up and extend logging…”
• “…it must be possible for others to access the data…”

36 Splunk – Comments (2)
• Positive comments
  • “Powerful tool for detecting and reporting anomalies”
  • “Very useful for proactive actions”
  • “Powerful tool to make statistics”
  • “It avoids spending time creating tools for decoding traces”
  • “It is an agile way to gather analytics, to inform design decisions”
  • “It is a very powerful auditing tool”
  • “Trends over time allow spotting new types of problems”
  • “It was useful for me several times for seeing if a problem is on one or multiple machines”
  • “It gives an easy, reusable way of looking at logfiles”
  • “It could become a valuable tool to spot errors, where currently we feel blind whenever there is a problem”

37 Splunk – vision
• Active, daily use by component providers - Dashboards
• Exploit tracing for
  • Pro-active operation
  • Informed evolution
  • Preventive maintenance
• 10 user-friendly message types per project
  • ERROR or WARNING
  • Contact information
  • Link to documentation
  • Message body meaningful to non-expert
  • No Java stack trace
• Continuous improvement of messages
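
The message properties listed on this slide (ERROR or WARNING level, contact information, documentation link, a body readable by non-experts, no stack trace) could be enforced at the point where a message is created. The sketch below is one possible shape for that, not the format used by the ACET feedback libraries: the class name `TraceMessage`, its field names, the rendering layout, and the example values are all assumptions, and the stack-trace check is a crude heuristic.

```python
from dataclasses import dataclass

ALLOWED_LEVELS = ("ERROR", "WARNING")


@dataclass(frozen=True)
class TraceMessage:
    level: str
    project: str
    body: str
    contact: str
    doc_url: str

    def __post_init__(self):
        # Only the two levels named on the slide are accepted.
        if self.level not in ALLOWED_LEVELS:
            raise ValueError(f"level must be one of {ALLOWED_LEVELS}")
        # Heuristic only: reject bodies that look like a Python or Java trace.
        if "Traceback" in self.body or "\tat " in self.body:
            raise ValueError("message body must not contain a stack trace")

    def render(self):
        """Return a single-line, key=value style message."""
        return (f"{self.level} [{self.project}] {self.body} "
                f"| contact={self.contact} | doc={self.doc_url}")


# Hypothetical example values.
msg = TraceMessage(
    level="WARNING",
    project="cmw-proxy",
    body="Subscriber queue is full; updates are being dropped.",
    contact="support@example.org",
    doc_url="http://example.org/doc/queue-overflow",
)
print(msg.render())
```

Keeping the contact and documentation link as `key=value` pairs on one line is a choice made here so that a log analysis tool can extract them as fields without a custom parser.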

39 Plans for 2013 (a)
• DiaMon
  • Interactive service-oriented dependency view
  • Declare and monitor process metrics
  • Integrate metrics viewers
  • Launching of external tools
  • Make contact information accessible
• Splunk
  • Improve current setup and configurations
  • Increase support and project uptake
  • Investigate integration of ITAT

40 Plans for 2013 (b)
• Documentation
  • Agree/implement CO-wide website/wiki structure
  • Agree on maintenance responsibilities
  • Portal – review, add and extend pages
  • Content – all projects provide ½-page description
• Databases
  • Finalize Hardware Configuration Feedback mechanisms
  • Capturing version information, detecting time bombs
  • Update contact information

41 Plans for 2013 (c)
• Feedback (Tracing and Configuration)
  • Improve message quality (structure, content, level)
  • Increase project usage of feedback API
  • All projects review configuration/version feedback
• Process metrics
  • Work with projects to expose metrics
  • Extend CMX (commands,…) ?
  • MW team take over jmxDirectory

42 Plans for 2013 (d)
• Runtime dependency data
  • Analysis and visualization of CMW data
  • Collecting network connection information
• Drivers
  • Finalize hardware configuration feedback
  • Version feedback implementation

44 Conclusions
• Done
  • Means for provision/transport of tracing, configuration and metrics
  • Centralized Tracing and analysis
• Todo
  • Data generation by projects
  • Documentation
  • Analysis and presentation
• Good support from projects in 2012, but…
  • Too many other priorities for developers – and for me…
• 2013 is for bringing the pieces together
ACET needs time from all projects in 2013

