
Lemon-web 2.0
Daniel Lenkes and Ivan Fedorko
Post-C5, 10.12.2010
CERN IT Department, Computing Facilities (CF), CH-1211 Geneva 23, Switzerland, www.cern.ch/it


Slide 2: Overview
- Lemon
- Current lemon-web and our experience
- Development in 2010:
  - Federated Lemon
  - Power measurement
  - Lemonmrd
- Lemon-web 2.0
- Lemon plans

Slide 3: Lemon overview
(Architecture diagram; links between components use TCP/UDP, SQL and HTTP.)
- Node monitoring: sensors, monitoring agent, local cache
- Measurement repository: Oracle database, repository backend
- User interfaces:
  - Application server (RRDtool / Python, Apache / PHP) serving web browsers
  - Lemon CLI (command-line tool to access data)
  - Lemon-host-check (command-line tool for node exceptions)

Slide 4: Lemon in numbers
- ~11k monitored entities (~8k nodes)
- ~1.1k metrics, 473 exceptions, 254 classes
- ~60% of metrics covered by core sensors
- ~1.7M monitored metrics across Lemon
- ~300 GB of data produced per month

Slide 5: How many services do we monitor?
(Chart: number of unique entities and of metrics.)
Metrics: count of all non-null metric entries over all metric tables (if two partitions are monitored on a host, two entries are counted).

Slide 6: How many services do we monitor?
(Chart: number of nodes by number of metrics monitored per node, i.e. sensors per agent.)

Metrics per node   Number of nodes
<50                2923
50-100             53
100-150            256
150-200            5329
200-250            2608
>250               2
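A table like this can be produced from per-node metric counts with a single bucketing pass. The sketch below is a hypothetical reconstruction, not the query behind the slide: it assumes each node reports one integer count, that bin edges are lower-inclusive (a node with exactly 50 metrics lands in 50-100; the slide does not say), and the node names are invented.

```python
from collections import Counter

# Bin edges as labelled on the slide; lower bound inclusive (an assumption)
BINS = [(0, 50, "<50"), (50, 100, "50-100"), (100, 150, "100-150"),
        (150, 200, "150-200"), (200, 250, "200-250")]

def bin_label(metric_count):
    """Return the slide's bin label for one node's metric count."""
    for low, high, label in BINS:
        if low <= metric_count < high:
            return label
    return ">250"

def histogram(metrics_per_node):
    """Count nodes per bin from a {node: metric_count} mapping."""
    return Counter(bin_label(n) for n in metrics_per_node.values())

# Hypothetical input: node name -> number of metrics monitored on it
sample = {"node001": 12, "node002": 180, "node003": 49, "node004": 260}
print(histogram(sample))

# The slide's own counts, summed as a sanity check
slide_counts = {"<50": 2923, "50-100": 53, "100-150": 256,
                "150-200": 5329, "200-250": 2608, ">250": 2}
print(sum(slide_counts.values()))  # 11171
```

The bins add up to 11171 nodes, which is consistent with the ~11k monitored entities given on slide 4.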

Slide 7: Lemon-db
Lemonops (latest data only):
Size   Used   Avail   Use%   Mounted on
32G    29G    3.8G    89%    /ORA/dbs03/LEMONOP

Lemonrac (historical data):
Size   Used   Avail   Use%   Mounted on
1.6T   1.5T   76G     96%    /ORA/dbs03/LEMONRAC

Data income: ~300 GB/month. The available space is not enough for one year of data.
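The back-of-the-envelope arithmetic behind "not enough for one year" is quick to check. This sketch uses only the figures on the slide. It assumes a 30-day month and treats all sizes as decimal units, although `df -h` actually reports powers of 1024, so the results are approximate.

```python
# Figures from the slide (decimal units assumed for simplicity)
avail_gb = 76                 # free space on LEMONRAC
total_gb = 1600               # 1.6T volume size
income_gb_per_month = 300

days_per_month = 30           # assumption for the conversion
days_left = avail_gb / (income_gb_per_month / days_per_month)
yearly_gb = income_gb_per_month * 12

print(f"free space lasts ~{days_left:.1f} days")  # ~7.6 days
print(f"one year of data: {yearly_gb} GB vs a {total_gb} GB volume")
```

At this rate, a year of data (~3.6 TB) is more than twice the size of the whole historical volume, not just its free space.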

Slide 8: Lemon-web suite
- Lemon-web
  - ~50-70k hits/day
  - LAS, lemon-web, entry point to cdb-tpl-viewer
  - ~140 unique IPs accessing lemon-web per day
- Lemon-gateway
  - Called by lemon-cli
  - ~100k hits/day
  - Used by ~100 sites
- All together: ~150k hits/day

Slide 9: Lemon overview
(The architecture diagram from slide 3, shown again: node monitoring, measurement repository and user interfaces, linked via TCP/UDP, SQL and HTTP.)

Slide 10: Web suite structure
The web suite has two parts, lemonmrd and Lemon-web, which retrieve, store and display information.
(Diagram: Lemon-web and lemonmrd, with Configuration, RRD files, DB and CDB.)

Slide 11: What is lemonmrd?
Lemon Monitoring Repository Daemon
- Caches data
  - Collects data from the DB and stores it in RRD files
- Aggregates data
  - Clustering at the RRD level (only sum and average)
- Repartitions the data
  - Data by metrics → data by nodes
(Diagram: lemonmrd, Configuration, RRD files, DB.)
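The "data by metrics → data by nodes" step is essentially a pivot, followed by the sum/average clustering. Below is a minimal sketch of that idea. The row layout `(metric, node, timestamp, value)` and all names and values are hypothetical, not lemonmrd's real schema or code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows as read from per-metric tables: (metric, node, timestamp, value)
rows = [
    ("LoadAvg", "node001", 1291975200, 0.5),
    ("LoadAvg", "node002", 1291975200, 1.5),
    ("MemUsed", "node001", 1291975200, 2048.0),
    ("MemUsed", "node002", 1291975200, 1024.0),
]

def repartition_by_node(rows):
    """Regroup metric-ordered rows into {node: {metric: [(timestamp, value), ...]}}."""
    by_node = defaultdict(lambda: defaultdict(list))
    for metric, node, ts, value in rows:
        by_node[node][metric].append((ts, value))
    return by_node

def cluster_aggregate(by_node, members, metric, ts):
    """Sum and average of one metric over the cluster members at one timestamp."""
    values = [v for node in members
              for t, v in by_node.get(node, {}).get(metric, []) if t == ts]
    if not values:
        return None
    return {"sum": sum(values), "avg": mean(values)}

by_node = repartition_by_node(rows)
print(cluster_aggregate(by_node, ["node001", "node002"], "LoadAvg", 1291975200))
```

Once the data is keyed by node, writing one RRD file per entity becomes a simple loop over `by_node`.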

Slide 12: What is RRD? (Round Robin Database)
- Fixed-size cache (~14 MB per entity)
- Round-robin archives of data (RRAs)
- Each RRA stores averages of the previous one, so precision is lost for historical data
(Figure: RRAs at resolutions of 1 sec, 5 min, 1 hour and 1 day.)
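A toy model shows why the storage size stays fixed and where precision goes. This is not RRDtool itself. Real RRDtool consolidates each RRA from primary data points according to its own step count, while this sketch chains the archives as the slide describes them. All sizes and sample values are hypothetical.

```python
from collections import deque

class RoundRobinArchive:
    """Toy RRA: a fixed number of rows, each the AVERAGE of `steps` inputs."""

    def __init__(self, rows, steps):
        self.rows = deque(maxlen=rows)  # the oldest row is dropped when full
        self.steps = steps
        self._pending = []

    def add(self, value):
        self._pending.append(value)
        if len(self._pending) == self.steps:
            self.rows.append(sum(self._pending) / self.steps)
            self._pending = []

# 1-second samples with a short spike, consolidated into 5-sample averages
fine = RoundRobinArchive(rows=4, steps=5)
for sample in [0, 0, 100, 0, 0] + [1] * 5:
    fine.add(sample)
print(list(fine.rows))  # [20.0, 1.0]: the 100 spike survives only as a 20.0 "hill"

# A coarser archive fed from the finer archive's rows
coarse = RoundRobinArchive(rows=4, steps=2)
for row in fine.rows:
    coarse.add(row)
print(list(coarse.rows))  # [10.5]
```

The same averaging explains the "peaks lost or became hills" symptom listed among the lemonmrd shortcomings.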

Slide 13: What is Lemon-web?
- PHP-based web application
- Provides data about entities
  - Hosts, clusters, power/temperature sensors, etc.
  - (All) metrics and exceptions, alarms
- Metric graphs
  - Based on RRD
  - Based on DB selects
(Diagram: Lemon-web, Configuration, RRD files, DB.)

Slide 14: Current version
- Working for a couple of years without major development
- Tightly bound to the CDB hierarchy
- Stable, but hitting its limits
- Maintenance limitations

Slide 15: Development by summer 2010
- Federated Lemon (in cooperation with Morgan Stanley)
- Power measurements with formulas
- Excel export functionality
- Many small enhancements / bug fixes
  - Metric distribution (over the whole CC, e.g. the OS metric)
  - Parent/child links between entities
  - Metric graphs (fix for metrics with multiple primary keys)
  - RRD parameter tuning to fix gap problems

Slide 16: Federated Lemon
(Diagram: a federated Lemon-web above several instances. Each instance has its own measurement repository and Lemon-web, with search over entities, grouping of entities and one RRD file per entity.)
The federated Lemon-web provides:
- Search over all instances
- A federated cluster built from all measurement repositories
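The federated layer mostly fans a query out to every instance and tags each result with its origin, so that the same host name in two repositories stays distinguishable. The sketch below illustrates the idea with in-memory lists. The instance names and entities are hypothetical, and a real implementation would query each instance's repository instead.

```python
# Hypothetical entity lists per Lemon instance (measurement repository)
INSTANCES = {
    "instance-a": ["node001", "node002", "web01"],
    "instance-b": ["node001", "db01"],
}

def federated_search(pattern, instances=INSTANCES):
    """Search every instance and tag each hit with the instance it came from."""
    hits = []
    for instance, entities in sorted(instances.items()):
        hits.extend((instance, e) for e in entities if pattern in e)
    return hits

def federated_cluster(instances=INSTANCES):
    """One cluster spanning all repositories; (instance, entity) keeps names unique."""
    return [(i, e) for i, ents in sorted(instances.items()) for e in ents]

print(federated_search("node"))
```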

Slide 17: Power measurement
Collect power data and provide trends and efficiencies:
- Beyond the cluster hierarchy
- Beyond simple sum and average

Slide 18: Power measurement
- Implemented and in production
- At the PHP + configuration level (extracts data from RRD files and performs RRD operations on the fly)
- Error-prone configuration

Slide 19: Current lemonmrd - Shortcomings
- Performance issues with more entities
  - Long update loops, causing gaps in the graphs
  - Long startup (> 30 minutes)
  - Peaks lost or flattened into hills
- Not capable of parallel processing
  - Bugs in the underlying (Python 2.3) libraries
- Maintenance
  - Log level can only be changed with a restart
  - Some debug information is missing

Slide 20: Current lemonmrd - Shortcomings
Required improvements:
- Configuration changes without restart (e.g. log level)
- Advanced logging
- Enhanced configuration (protection against mistakes)
- New math operations (-, *, /) for dynamic aggregation of cluster data; the current version has only sum and average
- Data aggregation from multiple DB backends
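Changing the log level without a restart only requires re-reading the configuration and calling `Logger.setLevel` on the live logger. Rejecting bad values at that point gives the "protection against mistakes". This is a minimal sketch assuming a hypothetical JSON config with a `log_level` key. lemonmrd's real configuration format is not described here.

```python
import json
import logging

log = logging.getLogger("lemonmrd-sketch")

LEVELS = {"DEBUG": logging.DEBUG, "INFO": logging.INFO,
          "WARNING": logging.WARNING, "ERROR": logging.ERROR}

def apply_runtime_config(text, logger=log):
    """Parse a (hypothetical) JSON config and apply its log level in place."""
    cfg = json.loads(text)
    name = str(cfg.get("log_level", "INFO")).upper()
    if name not in LEVELS:
        # Reject a bad value and keep the current level instead of failing later
        raise ValueError(f"unknown log level: {name!r}")
    logger.setLevel(LEVELS[name])
    return LEVELS[name]

# A daemon would call this from its main loop or from a signal handler
# (e.g. on SIGHUP) after re-reading its configuration file.
apply_runtime_config('{"log_level": "debug"}')
print(logging.getLevelName(log.level))  # DEBUG
```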

Slide 21: Lemonmrd 2.0 - Results
- Multithreaded application
- Runtime configuration parameters
  - No restart needed for any change
- Dynamic reference resolution for the cluster / sub-cluster hierarchy
  - Recursively checks the content of the cluster and preprocesses the subparts
  - Startup < 1 min (~30 times faster)
  - Collecting loop 1-3 sec (> 100 times faster)
- +, avg, -, *, / operations in the cluster configuration
- Based on Python 2.6, so portable to SLC6
- Failsafe, simplified configuration
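Resolving a cluster that contains sub-clusters is a recursive expansion. It needs two guards: one against cycles in the configuration, and one so that a node shared by two sub-clusters is counted only once. The sketch below shows this together with one possible reading of the operations: + and avg aggregate over the members, while -, * and / combine two aggregated values. The cluster layout, values and that reading of the operators are all assumptions, not lemonmrd 2.0's actual code.

```python
import operator
from statistics import mean

# Hypothetical cluster configuration: members are nodes or other clusters
CLUSTERS = {
    "batch": ["batch-a", "batch-b"],
    "batch-a": ["n1", "n2"],
    "batch-b": ["n2", "n3"],
}

def resolve(name, clusters=CLUSTERS, _path=()):
    """Recursively expand a cluster into leaf nodes, detecting cycles."""
    if name in _path:
        raise ValueError("cycle: " + " -> ".join(_path + (name,)))
    if name not in clusters:
        return [name]  # not a cluster, so it is a node
    nodes = []
    for member in clusters[name]:
        for node in resolve(member, clusters, _path + (name,)):
            if node not in nodes:  # a node shared by two sub-clusters counts once
                nodes.append(node)
    return nodes

# Aggregations over members, and binary operations between two aggregated values
AGGREGATE = {"+": sum, "avg": mean}
COMBINE = {"-": operator.sub, "*": operator.mul, "/": operator.truediv}

values = {"n1": 2.0, "n2": 4.0, "n3": 6.0}  # hypothetical latest metric values
members = resolve("batch")
total = AGGREGATE["+"]([values[n] for n in members])
average = AGGREGATE["avg"]([values[n] for n in members])
print(members, total, average, COMBINE["/"](total, average))
```

Resolving the hierarchy once, up front, and reusing the flat member list is one way to keep the collecting loop short.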

Slide 22: Lemonmrd 2.0 - Cluster math
Weighted summary for the lemon cluster:
lxmred0803 40%, lxmred0603 5%, dbsrvd277 35%, lxmred0605 10%, lxfsec1614 5%, lemon2build01 15%
Graph reliability: how many entities reported, out of those expected?
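A weighted summary multiplies each member's value by its weight and adds the results. Graph reliability can be reported next to it as the fraction of expected members that actually sent data. This is a minimal sketch with hypothetical node names and weights. The choice not to renormalize the weights when a member is missing is an assumption.

```python
# Hypothetical weights per member node
WEIGHTS = {"node-a": 0.5, "node-b": 0.25, "node-c": 0.25}

def weighted_summary(latest, weights=WEIGHTS):
    """Weighted sum over the nodes that reported, plus the reporting ratio."""
    reported = {n: v for n, v in latest.items() if n in weights and v is not None}
    total = sum(weights[n] * v for n, v in reported.items())
    reliability = len(reported) / len(weights)  # reported entities / expected entities
    return total, reliability

total, reliability = weighted_summary({"node-a": 10.0, "node-b": 20.0, "node-c": None})
print(total, f"{reliability:.0%}")  # 10.0 67%
```

Because missing members are not renormalized away, a low reliability value tells the reader that the graph is partial rather than silently rescaling it.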

Slide 23: Lemonmrd 2.0 - Results
Improved data precision.
(Figure: graphs from the current and the new version.)

Slide 24: Current Lemon-web - Shortcomings
- Security concerns
  - e.g. Lemon Alarm System (LAS)
- Difficult to add / modify functionality
- Limited performance

Slide 25: Lemon-web 2.0 - New design
- Security
  - CERN SSO (NICE) with e-group based authorization (critical for LAS)
- Architecture
  - MVC design pattern
  - Modular design
  - Single entry point
  - Using memcached, APC
- Maintainability
  - Advanced configuration
  - Advanced logging
(Diagram: a request (HTTP, CLI) reaches the Controller (modules), which demands data from the Model (database) and passes it to the View (templates, layout), which returns the response (HTML, RSS, XML, JSON).)
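The single-entry-point MVC flow can be sketched in a few lines. One `handle` function picks a controller module by route, the controller fetches data from a model, and a view renders it in the requested format. Lemon-web 2.0 itself is PHP, so this Python sketch only illustrates the pattern. The routes, model data and status codes are hypothetical.

```python
import html
import json

def host_model(name):
    """Model: stands in for a database query (hypothetical data)."""
    return {"host": name, "status": "ok"}

# Controller modules, registered by route name
CONTROLLERS = {"host": lambda params: host_model(params["name"])}

# Views, one per output format
VIEWS = {
    "json": lambda data: json.dumps(data, sort_keys=True),
    "html": lambda data: "<dl>" + "".join(
        f"<dt>{html.escape(str(k))}</dt><dd>{html.escape(str(v))}</dd>"
        for k, v in sorted(data.items())) + "</dl>",
}

def handle(route, params, fmt="html"):
    """Single entry point: route to a controller, fetch data, render a view."""
    controller = CONTROLLERS.get(route)
    if controller is None:
        return 404, "unknown route"
    view = VIEWS.get(fmt)
    if view is None:
        return 406, "unsupported format"
    return 200, view(controller(params))

print(handle("host", {"name": "node001"}, "json"))
```

Adding a new output format or module then means registering one more entry rather than touching every page, which is the maintainability gain the slide is after.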

Slide 26: Lemon-web 2.0 - New features
- Flexible menu structure
- Connecting to multiple DB sources
- Personal views by service
- Tiny URLs pointing at graphs, which can be embedded in any page
- Auto-complete search field
- Possibility to support other DB engines
- Data export
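One common way to build stable tiny URLs is to hash the graph's parameters in a canonical order and keep a key-to-parameters map. The same graph then always gets the same short link. This is a sketch of that technique, not Lemon-web's scheme. The base URL, parameter names and the in-memory store are hypothetical.

```python
import hashlib
from urllib.parse import urlencode

SHORT_LINKS = {}  # a real service would keep this mapping in its database

def tiny_url(graph_params, base="https://lemon.example/g/", length=8):
    """Map a graph's parameters to a short, stable URL (hypothetical scheme)."""
    canonical = urlencode(sorted(graph_params.items()))  # same params, same order
    key = hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:length]
    SHORT_LINKS[key] = canonical
    return base + key

def expand(key):
    """Return the stored query string for a short key, or None."""
    return SHORT_LINKS.get(key)

url = tiny_url({"entity": "node001", "metric": "LoadAvg", "period": "1w"})
print(url)
```

Truncating the hash keeps links short but makes collisions possible, so a production version would check for an existing, different entry before storing.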

Slide 27: Future plans
Current activity:
- Lemon-web 2.0 development is ongoing; it will be released by the end of January
Lemon enhancements under consideration:
- New Lemon DB schema
  - The growth of monitoring data affects the size and performance of the DB repository
  - Impact on many Lemon components
- Lemon repository data export
  - Reduce the amount of historical data stored in the DB by exporting it to data files
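A repository export comes down to two steps. First, copy rows older than a cutoff to a file. Then delete them from the database, and only after the copy succeeded. The sketch uses SQLite in memory as a stand-in for the Oracle repository, with a hypothetical `samples` table. It is meant only to show the ordering of the two steps, not the planned Lemon schema.

```python
import csv
import io
import sqlite3

def export_old_rows(conn, cutoff_ts, out):
    """Write samples older than cutoff_ts as CSV to `out`, then delete them."""
    rows = conn.execute(
        "SELECT node, metric, ts, value FROM samples WHERE ts < ? ORDER BY ts",
        (cutoff_ts,)).fetchall()
    csv.writer(out).writerows(rows)
    # Delete only after the rows have been written out
    with conn:  # commits the DELETE, or rolls it back on error
        conn.execute("DELETE FROM samples WHERE ts < ?", (cutoff_ts,))
    return len(rows)

conn = sqlite3.connect(":memory:")  # stand-in for the Oracle repository
conn.execute("CREATE TABLE samples (node TEXT, metric TEXT, ts INTEGER, value REAL)")
conn.executemany("INSERT INTO samples VALUES (?, ?, ?, ?)",
                 [("node001", "LoadAvg", 1, 0.5), ("node001", "LoadAvg", 2, 0.75),
                  ("node001", "LoadAvg", 3, 1.0)])
buf = io.StringIO()
print(export_old_rows(conn, 3, buf))  # 2
print(buf.getvalue())
```

With a real file as `out`, it would be flushed and closed before the DELETE runs.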

Slide 28: Future plans
- Lemon-sensors review/development
  - Pending enhancements on practically all core sensors
  - New sensors (e.g. for SafeHost)
  - Python API
- High-level objects
  - Trigger an alarm if > 40% of the cluster nodes are under high load
  - Data aggregation at data collection
- Integration with Windows monitoring (one LAS)
- Support for virtualization (new instances + federated web)
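The proposed high-level alarm is a threshold on a fraction of the cluster's nodes, not on any single node. The sketch below takes the slide's 40% figure and uses "more than" strictly, as the slide's ">" suggests. What counts as "high load" is not specified, so the `high_load=10.0` threshold and the node loads are hypothetical.

```python
def cluster_alarm(load_by_node, high_load=10.0, max_fraction=0.40):
    """True if more than max_fraction of the cluster's nodes are above high_load."""
    if not load_by_node:
        return False
    busy = sum(1 for load in load_by_node.values() if load > high_load)
    return busy / len(load_by_node) > max_fraction

loads = {"n1": 12.0, "n2": 15.0, "n3": 3.0, "n4": 2.5, "n5": 1.0}
print(cluster_alarm(loads))  # False: 2 of 5 nodes is exactly 40%, not more
loads["n3"] = 11.0
print(cluster_alarm(loads))  # True: 3 of 5 nodes is 60%
```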

Slide 29: Nagios feedback
Feedback from CHEP 2010, HEPiX and the LHC experiments:
- Based on a push model and probes
- Usually used at a scale of up to 2000 nodes
- In combination with other tools like Ganglia
- Limitations: ~3000 nodes, ~30k monitored services (service = (node, metric))
- Attractive for application/service testing

Slide 30: Lemon and Nagios
Lemon tasks under consideration:
- Fresh monitoring tool review
  - Nagios, Ganglia, etc.
- Interfacing Nagios in Lemon
  - Monitoring of Drupal, GRID

Slide 31: Summary
- Current development
  - Lemonmrd 2.0: startup time < 1 min (~30 times faster), collecting loop 1-3 sec (> 100 times faster)
  - Lemon-web 2.0
- Required development
  - New Lemon DB schema and repository data export
  - Lemon-sensors review/development
  - Integration with Windows monitoring
  - Support for virtualization
  - Interfacing Nagios in Lemon
- Available manpower
  - ~50% of a staff FTE
  - A fellow FTE for the next 6 months

Slide 32: Questions
Questions? Thank you for your attention!

