Presentation is loading. Please wait.

Presentation is loading. Please wait.

Overview of cluster management tools Marco Mambelli – August 9 2011 OSG Summer Workshop TTU - Lubbock, TX THE UNIVERSITY OF CHICAGO.

Similar presentations


Presentation on theme: "Overview of cluster management tools Marco Mambelli – August 9 2011 OSG Summer Workshop TTU - Lubbock, TX THE UNIVERSITY OF CHICAGO."— Presentation transcript:

1 Overview of cluster management tools Marco Mambelli – marco@hep.uchicago.edu August 9 2011 OSG Summer Workshop TTU - Lubbock, TX THE UNIVERSITY OF CHICAGO

2 Cluster Management Overview Management infrastructure Provisioning Configuration and software package management Monitoring 8/9/11 Overview of cluster management tools 2

3 Management Infrastructure Remote power-cycling and serial console access FEF has standardized on Avocent ACS serial console servers and Avocent PM line of PDU Other depts use IPMI or a mix of Avocent and APC products All depts have scripts to control and configure remote power- cycling and serial console access 8/9/11 Overview of cluster management tools 3

4 CMS Tier 1 Power Usage Plots CMS plots power utilization by querying PDUs using SNMP. This data can be particularly useful to datacenter managers 8/9/11 Overview of cluster management tools 4

5 Provisioning Tools  Preparing a system for use; OS installation and initial configuration Tools  PXE/Kickstart  Rocks  Cobbler  Perceus 8/9/11 Overview of cluster management tools 5

6 FEF’s PXE/Kickstart Setup FEF uses custom tool built on top of MySQL and Perl DCHP server modules No dhcpd restart required Web front-end for specifying kickstart / nodecombinations Very flexible Kickstart files are created dynamically based on selections from the web GUI Planning to eval Cobbler later this year. 8/9/11 Overview of cluster management tools 6

7 Rocks Clusters Open source Linux distribution based on CentOS Created in 2000 Created for easy deployment of large clusters Used by CMS Tier 1 at Fermilab for 2400 machines CMS can install ~500 systems in 1 hour Only one OS version per Rocks server may be a deal breaker for some 8/9/11 Overview of cluster management tools 7

8 Cobbler Cobbler is infrastructure to provision a node Cobbler is RedHat specific, uses kickstart Cobbler holds OS, profiles and system data A new system requires a profile, a MAC address and a name, nothing more. Provisions new system via PXE boot, via settings for individual system. CLI and Web GUI 8/9/11 8 Overview of cluster management tools

9 Cobler at MWT2 Cobbler is implemented through multiple services on a single server. It acts as a TFTP server for PXE booting, controls the repositories used for install, and provides DHCP/DNS support as well. 8/9/11 Overview of cluster management tools 9

10 Configuration and Package Management Tools that help manage system configuration (files, dirs, permissions, etc.) and software packages Popular open source solutions  Cfengine  Puppet  Bcfg2  Warewulf 8/9/11 Overview of cluster management tools 10

11 Cfengine First version released in 1993 Written in C Fairly easy to understand syntax Relatively easy to find sysadmins with experience FEF and UWM used for many years 8/9/11 Overview of cluster management tools 11

12 Puppet New generation of configuration management system Extensible, declarative language Understands dependencies (huge benefit) Better reporting than Cfengine Auto generation of documentation (think Javadoc) 8/9/11 Overview of cluster management tools 12

13 FEF Puppet Usage Management of all external mounts Kerberos files -- keytab files,.k5login, etc Package management (RPM sets grouped by cluster) NIC bonding configs Group quotas Grid host certs FEF_backup FEF avg is 325 actions per node every Puppet run 8/9/11 Overview of cluster management tools 13

14 Cfengine vs. Puppet (High Level) 8/9/11 Overview of cluster management tools 14

15 Puppet Add User Example 8/9/11 Overview of cluster management tools 15

16 Cfengine Add User Example 8/9/11 Overview of cluster management tools 16

17 Example Puppet Dependency Graph 8/9/11 Overview of cluster management tools 17

18 Puppet Dashboard Puppet dashboard is a web interface for quickly viewing puppet run status and the state of individual system configurations. 8/9/11 Overview of cluster management tools 18

19 Bcfg2 BCFG2 is an xml-based configuration management system Developed by Argonne National Lab Being used by CMS Tier 1 at Fermilab for 2 years to manage a limited number of configuration items. Not RPMS Developers are very responsive; provide support via mailing list and IRC Complex file manipulation can be tricky Does simple pre/post dependencies 8/9/11 Overview of cluster management tools 19

20 Bcfg2 Reporting System Node status and last run times are viewable from the Bcfg2 web interface 8/9/11 Overview of cluster management tools 20

21 Monitoring Tools to help monitor system state and performance In use at Fermilab:  Zabbix  Nagios  Ganglia  Many custom solutions using MRTG, RRDtool, etc 8/9/11 Overview of cluster management tools 21

22 Zabbix Being used by CMS Tier 1 to monitor approx 2.4K nodes; performs 100K checks. Does status and performance monitoring. Relatively new compared to Nagios. Most configuration is done via the web interface. Easy to add custom checks and alerts. 8/9/11 Overview of cluster management tools 22

23 Zabbix Dashboard Zabbix provides a polished web interface that displays finely grained status and performance information 8/9/11 Overview of cluster management tools 23

24 Nagios Used by FEF; 3.5K nodes, approx 30K checks on one server Around for many years. Create a new check by dropping shell script on node (check_mk plugin) Nagios support built-in to Puppet. Web interface can be slow and feels dated. 8/9/11 Overview of cluster management tools 24

25 Nagios Web Interface The Nagios web interface is functional but feels dated. Performance is an issue when monitoring many hosts 8/9/11 Overview of cluster management tools 25

26 OSG Survey The most used cluster management tool is Rocks, sometime customized Followed by Puppets, Cobbler and Cfengine. Users are happy with the performance of the tools they use, especially Rocks and Puppets. The difficulty of the first installation is average (sometime long or with some guesswork) to easy (works out of the box); same for the updates. 8/9/11 Overview of cluster management tools 26

27 OSG Survey (cont) The operation is automatic or requires simple documented tasks. Rocks seem the easiest to operate Puppet is the easiest to install/update Available documentation is good for Rocks, good to average (there could be more or sometime is confusing) for the others. There are some long time users of Rocks and Cfengine while Puppet gained popularity in recent times. 8/9/11 Overview of cluster management tools 27

28 Comparison of Puppet/Cfengine/Bcfg2  https://cd-docdb.fnal.gov:440/cgi- bin/ShowDocument?docid=3967 https://cd-docdb.fnal.gov:440/cgi- bin/ShowDocument?docid=3967 Evaluation and feature comparison of the Nagios and Zabbix monitoring systems  http://cd-docdb.fnal.gov/cgi- bin/ShowDocument?docid=3277 http://cd-docdb.fnal.gov/cgi- bin/ShowDocument?docid=3277 Survey about the setup of clusters in OSG  Available from OSG DOCDB 8/9/11 Overview of cluster management tools 28

29 Credits Thank you to Jason Allen, Head of Fermilab Experiments Facilities (FEF) Department in the Computing Division – this talk is based heavily on his Cluster Management talk at the 2011 OSG all hands meeting 8/9/11 Overview of cluster management tools 29

30 8/9/11 Overview of cluster management tools 30 ?!?!


Download ppt "Overview of cluster management tools Marco Mambelli – August 9 2011 OSG Summer Workshop TTU - Lubbock, TX THE UNIVERSITY OF CHICAGO."

Similar presentations


Ads by Google