Presentation is loading. Please wait.

Presentation is loading. Please wait.

MONITORING WITH MONALISA Costin Grigoras. M ON ALISA COMMUNICATION ARCHITECTURE MonALISA software components and the connections between them Data consumers.

Similar presentations


Presentation on theme: "MONITORING WITH MONALISA Costin Grigoras. M ON ALISA COMMUNICATION ARCHITECTURE MonALISA software components and the connections between them Data consumers."— Presentation transcript:

1 MONITORING WITH MONALISA Costin Grigoras

2 M ON ALISA COMMUNICATION ARCHITECTURE MonALISA software components and the connections between them Data consumers Multiplexing layer Helps firewalled endpoints connect Registration and discovery JINI-Lookup Services Secure & Public MonALISA services Proxies Clients HL services Agents Network of Data gathering services Fully Distributed System with no Single Point of Failure 2 26.01.2012 - T1/T2 ALICE Workshop @ Karlsruhe

3 ALICE G RID MONITORING ARCHITECTURE Monitoring follows the general AliEn layout: one service per site collects and aggregates site-local monitoring information 3 Long History DB LCG Tools MonALISA AliEn Site ApMon AliEn Job Agent ApMon AliEn Job Agent ApMon AliEn Job Agent MonALISA @CERN MonALISA LCG Site ApMon AliEn CE ApMon AliEn SE ApMon Cluster Monitor ApMon AliEn TQ ApMon AliEn Job Agent ApMon AliEn Job Agent ApMon AliEn Job Agent ApMon AliEn CE ApMon AliEn SE ApMon Cluster Monitor ApMon AliEn IS ApMon AliEn Optimizers ApMon AliEn Brokers ApMon MySQL Servers ApMon CastorGrid Scripts ApMon API Services MonALISARepository Aggregated Data rss vsz cpu time run time job slots free space nr. of files open files Queued JobAgents cpu ksi2k job status disk used processes load net In/out jobs status sockets migrated mbytes active sessions MyProxy status Alerts Actions http://alimonitor.cern.ch/ 3 26.01.2012 - T1/T2 ALICE Workshop @ Karlsruhe

4 A P M ON ApMon: embeddable APlication MONitoring library Lightweight library of APIs (C, C++, Java, Perl, Python) that can be used to send any information to MonALISA Service(s) over UDP (open XDR binary format) Flexible configuration (hardcoded / configuration file / URL) Dynamic change of options and target while the app is running Background system monitoring (optional) Load, CPU, memory & swap usage Network interfaces (in/out/ip/errs) Sockets in each state, processes in each state Disk IO, swap IO Background application monitoring (optional) Used CPU & wall time, % of the machine CPU Partition stats, size of workdir, open files Memory usage (rss, virtual and %), page faults Very high throughput (O(10K msg/s) on a regular machine) 4 26.01.2012 - T1/T2 ALICE Workshop @ Karlsruhe

5 S ERVICE D EPLOYMENT One service per site deployed on the VoBox AliEn services, proxies’ status, critical local directories on the VoBox Xrootd storage nodes (ALICE:: :: _xrootd_*) Machine parameters (CPU, load, memory, sockets, processes, network traffic, disk and swap IO) Xrootd internal parameters Storage space as seen by xrootd Job agents ( _JobAgent) CPU and memory usage, payload job ID, status Jobs (ALICE:: :: _Jobs) Full machine parameters (same as above) CPU and memory usage of the process tree itself (+ Si2K normalized values) Owner and Masterjob / Subjob IDs Status (STARTING, RUNNING, SAVING) Xrdcp transfers, Available bandwidth, Traceroute, PROOF monitoring, POD, … 5 26.01.2012 - T1/T2 ALICE Workshop @ Karlsruhe

6 A GGREGATED VALUES PER SITE Values are aggregated in various categories (*_Summary) Job summaries Number of jobs in each state (absolute and rates) min/max/avg/total resource usage (absolute and rates) Per cluster and per user Top memory consumers Job agent overview Number of JAs in each state Average TTL Queued JAs (that don’t report yet active monitoring data) Histograms of number of executed jobs per JA Cluster nodes’ summary status Aggregated network traffic per source/target SE Xrootd and PROOF cluster summary 6 26.01.2012 - T1/T2 ALICE Workshop @ Karlsruhe

7 O PEN DATA ACCESS Adapting ML to your needs You can recycle the VoBox ML instance for your own purposes Sending extra data to it from local services For example cluster monitoring with http://monalisa.cern.ch/MLSensor/ Or start a separate, independent service in the “alice” group In case you want to deploy custom filters / alarms Or start your own monitoring group For several sites / redundant monitoring Services and clients can belong/connect to several groups Any number of consumers can subscribe to any cut in the data stream But try to find the minimal expression that gives the data you want 7 26.01.2012 - T1/T2 ALICE Workshop @ Karlsruhe

8 M ON ALISA S ERVICE MonALISA service includes many modules; easily extendable The service package includes: Local host monitoring (CPU, memory, network traffic, processes and sockets in each state, LM sensors, APC UPSs), log files tailing SNMP generic & specific modules; Condor, PBS, LSF and SGE (accounting & host monitoring), Ganglia Ping, tracepath, traceroute, pathload, xrootd Ciena, Optical switches (TL1); Netflow/Sflow (Force10) Calling external applications/scripts that output the values as text XDR-formatted UDP messages (ApMon) New modules can be added by implementing a simple Java interface. Filters can also be defined to aggregate data in new ways The Service can also react to the monitoring data it receives, more about the actions it can take later 8 26.01.2012 - T1/T2 ALICE Workshop @ Karlsruhe

9 M ON ALISA CLIENTS Two important clients GUI client Interactive exploring of all the parameters Can plot history or real-time values Customizable history query interval Subscribes to those particular series and updates the plots in real time Storage client (aka Repository) Subscribes to a set of parameters and stores them in database structures suitable for long-term archival Is usually complemented by a web interface presenting these values Can also be embedded in another controlling application WebServices & REST clients Limited functionality: they lack the subscription mechanism 9 26.01.2012 - T1/T2 ALICE Workshop @ Karlsruhe

10 I NTERACTIVE CLIENT http://monalisa.cern.ch/monalisa__Interactive_Clients__MonALISA_client.html Select “alice” only 10 26.01.2012 - T1/T2 ALICE Workshop @ Karlsruhe

11 I NTERACTIVE CLIENT Browsing parameters 4 hierarchical levels of parameters (Farm, Cluster, Node, Function) 11 26.01.2012 - T1/T2 ALICE Workshop @ Karlsruhe

12 I NTERACTIVE CLIENT Dynamic views 12 26.01.2012 - T1/T2 ALICE Workshop @ Karlsruhe

13 W EB INTERFACE http://monalisa.cern.ch/monalisa__Download__Repository.html Single package including Headless client Apache Tomcat PostgreSQL database (64bit binaries) Web interface includes examples of dynamic views History charts Real-time bar plots Pie, spider, histograms Status tables The above only need a configuration file per plot describing what should be displayed (ask Stefano, Kilian, Martin for configuration tips) With this foundation you can build a custom repository Dynamic pages (JSP or servlets) Over plots with the included JFreeChart library Data aggregation filters Alarms 13 26.01.2012 - T1/T2 ALICE Workshop @ Karlsruhe

14 P LOT EXAMPLES History charts with annotations; any past time interval can be selected 14 26.01.2012 - T1/T2 ALICE Workshop @ Karlsruhe

15 P LOT EXAMPLES History in area mode 15 26.01.2012 - T1/T2 ALICE Workshop @ Karlsruhe Statistics per host below the charts

16 P LOT EXAMPLES Bar chart with optional second axis for cumulative values 16 26.01.2012 - T1/T2 ALICE Workshop @ Karlsruhe

17 P LOT EXAMPLES Dynamic data aggregation 17 26.01.2012 - T1/T2 ALICE Workshop @ Karlsruhe Pie with several data selection options (last / integral / min / max / avg) Bar charts, same options

18 P LOT EXAMPLES Radar plots 18 26.01.2012 - T1/T2 ALICE Workshop @ Karlsruhe

19 P LOT EXAMPLES Table view - mostly used to display current status, but aggregation functions are also available 19 26.01.2012 - T1/T2 ALICE Workshop @ Karlsruhe

20 P LOT EXAMPLES Custom views 20 26.01.2012 - T1/T2 ALICE Workshop @ Karlsruhe

21 P LOT EXAMPLES Custom views 21 26.01.2012 - T1/T2 ALICE Workshop @ Karlsruhe

22 MonALISA keeps a memory buffer for a minimal monitoring history In addition, data can be kept in configurable database structures -The service keeps one week of raw data and one month of averaged values -The client creates three averaged structures Parallel database backends can be used to increase performance and reliability D ATA STORAGE MODEL Shared codebase between components Response Short term, high resolution Short term, high resolution Medium term, lower resolution Medium term, lower resolution Long term, low resolution Long term, low resolution Memory buffer Volatile storage Persistent storage (DB) time Request at highest resolution 22 26.01.2012 - T1/T2 ALICE Workshop @ Karlsruhe

23 D ATA VOLUMES Disk space is not infinite … One can subscribe to detailed information, aggregate and discard the details The system is quick in forgetting The ALICE Central Repository is now 330GB database for 6 years of history data (~100B / data point) 100K distinct active parameters that are stored Usually receives updates for ~300K parameters (1KHz) Storing at a flat rate of 0.6KHz Generating dynamic pages at 2Hz in avg(0.2s) 23 26.01.2012 - T1/T2 ALICE Workshop @ Karlsruhe

24 A UTOMATIC A CTIONS What to do with this monitoring data? Decisions taken upon: Absence/presence of some parameter(s) Values above/below predefined thresholds Arbitrary correlations between values User-defined code Action types: Notifications RSS/Atom feeds Annotation of charts with the events Calling external programs Logging Running custom code, for example in ALICE restart site services maintain DNS-based load balancing ML Service Actions Traffic Jobs Hosts Apps Temperature Humidity A/C Power … Global ML Services Global ML Services 24 26.01.2012 - T1/T2 ALICE Workshop @ Karlsruhe

25 C USTOM FILTERS Memory watchdog 25 26.01.2012 - T1/T2 ALICE Workshop @ Karlsruhe

26 2Q||!2Q ? Questions ? P.S. Don’t forget to subscribe http://alimonitor.cern.ch/xml.jsp 26 26.01.2012 - T1/T2 ALICE Workshop @ Karlsruhe


Download ppt "MONITORING WITH MONALISA Costin Grigoras. M ON ALISA COMMUNICATION ARCHITECTURE MonALISA software components and the connections between them Data consumers."

Similar presentations


Ads by Google