Presentation is loading. Please wait.

Presentation is loading. Please wait.

DataGrid is a project funded by the European Commission under contract IST-2000-25182 IT Post-C5, 12.12.2003 Managing Computer Centre machines with Quattor.

Similar presentations


Presentation on theme: "DataGrid is a project funded by the European Commission under contract IST-2000-25182 IT Post-C5, 12.12.2003 Managing Computer Centre machines with Quattor."— Presentation transcript:

1 DataGrid is a project funded by the European Commission under contract IST-2000-25182 IT Post-C5, 12.12.2003 Managing Computer Centre machines with Quattor Germán Cancio and Piotr Poznański IT/FIO Post C5, 12/12/03 http://quattor.org

2 Managing Computer Centre Machines with Quattor – Post-C5 – Cancio, Poznański - n° 2 Outline u Concepts u Architecture and Functionality u Deployment, next steps

3 Managing Computer Centre Machines with Quattor – Post-C5 – Cancio, Poznański - n° 3 u Part of, together with n LEMON monitoring system n LEAF Hardware and State Mgmt system quattor in a nutshell u : fabric management system developed by EDG WP4 n Configuration, installation and management of fabric nodes u Used to manage most of the Linux nodes in the CERN CC n >1700 nodes out of ~ 2000 n Multiple functionality (batch nodes, disk servers, tape servers, DB, web, …) n Heterogeneous hardware (memory, HD size,..)

4 Managing Computer Centre Machines with Quattor – Post-C5 – Cancio, Poznański - n° 4 Key concepts behind quattor u Autonomous nodes: n Local configuration files n No remote management scripts n No reliance on global file systems AFS/NFS u Central control: n Primary configuration is kept centrally (and replicated on the nodes) n A single source for all configuration information u Reproducibility: n Idempotent operations n Atomicity of operations u Scalability: n Load balanced servers, scalable protocols u Use of standards: n HTTP, XML, RPM/PKG, SysV init scripts, … u Portability: n Linux, Solaris

5 Managing Computer Centre Machines with Quattor – Post-C5 – Cancio, Poznański - n° 5 quattor architecture - overview u Configuration Management n Configuration Database n Configuration access and caching n Graphical and Command Line Interfaces u Node and Cluster Management n Automated node installation n Node Configuration Management n Software distribution and management Node Configuration Management Node Management

6 Managing Computer Centre Machines with Quattor – Post-C5 – Cancio, Poznański - n° 6 Configuration Management

7 Managing Computer Centre Machines with Quattor – Post-C5 – Cancio, Poznański - n° 7 Configuration Information u Configuration is expressed using the Pan language u Information is arranged in templates n Common properties set only once u Using templates it is possible to create hierarchies to match service structure CERN CC name_srv1: 137.138.16.5 time_srv1: ip-time-1 lxbatch cluster_name: lxbatch master: lxmaster01 pkg_add (lsf5.1) lxplus cluster_name: lxplus pkg_add (lsf5.1) disk_srvlxplus001 eth0/ip: 137.138.4.246 pkg_add (lsf5.1_debug) lxplus020 eth0/ip: 137.138.4.225 lxplus029

8 Managing Computer Centre Machines with Quattor – Post-C5 – Cancio, Poznański - n° 8 Configuration Management Infrastructure CDB pan GUI Scripts CLI CCM Cache Node PERLPERL XML RDBMS SQLSQL SOAPSOAP HTTPHTTP

9 Managing Computer Centre Machines with Quattor – Post-C5 – Cancio, Poznański - n° 9 Configuration Database (CDB) u Keeps complete configuration information u Configuration describes the desired state of the managed machines. u Data consistency is enforced by a transaction mechanism n All changes are done in transactions u Configuration is validated and kept under version control n Built-in validation (e.g. types), user defined validation u Going back to previous versions of the configuration is possible n Full history is kept in CVS. u Conflicts of concurrent modification of the same configuration information are detected

10 Managing Computer Centre Machines with Quattor – Post-C5 – Cancio, Poznański - n° 10 SQL Query Interface u We can ask about properties spanning across machines u We can run SQL queries (SELECT) and create views: n “give me all machines with more than 512 Mbytes of memory” n “give me all machines that belong to lxplus” u Portability: available for Oracle and MySQL

11 Managing Computer Centre Machines with Quattor – Post-C5 – Cancio, Poznański - n° 11 Examples of information in CDB u Hardware n CPU n Hard disk n Network card n Memory size n Node location in CC u Software n Repository definitions n Service definitions = groups of packages (RPMs) u System n Partition table n Load balancing information u Cluster information n Cluster name and type n Batch master u Audit information n Contract type and number n Purchase date

12 Managing Computer Centre Machines with Quattor – Post-C5 – Cancio, Poznański - n° 12 Graphical User Interface - PanGUIn

13 Managing Computer Centre Machines with Quattor – Post-C5 – Cancio, Poznański - n° 13 Configuration Cache Manager (CCM) u Runs on every managed node u Provides a local interface to the node’s configuration information u Information is downloaded from CDB and cached: n The access to the configuration is fast n Avoid peaks on CDB servers n Disconnected operation are supported n Information is kept in sync with CDB using notification/polling mechanism u Access to local configuration information is performed through an easy-to-use API

14 Managing Computer Centre Machines with Quattor – Post-C5 – Cancio, Poznański - n° 14 Node (Cluster) Management

15 Managing Computer Centre Machines with Quattor – Post-C5 – Cancio, Poznański - n° 15 Managing (cluster) nodes Install server base OS dhcp pxe nfs/http Vendor System installer RH73, RHES, Fedora,… System services AFS,LSF,SSH,accounting.. Installed software kernel, system, applications.. CCM Node Configuration Manager (NCM) RPM, PKG nfs http ftp Software Servers packages (RPM, PKG) SWRep packages CDB Standard nodesManaged nodes Install Manager Node (re)install cache SW package Manager (SPMA)

16 Managing Computer Centre Machines with Quattor – Post-C5 – Cancio, Poznański - n° 16 Install Manager u Sits on top of the standard vendor installer, and configures it n Which OS version to install n Network and partition information n What core packages n Custom post-installation instructions u Automated generation of control file (KickStart) u It also takes care of managing DHCP (and TFTP/PXE) entries u Can get its configuration information from CDB or via command line u Available for RedHat Linux (Anaconda installer) n Allows for plugins for other distributions (SuSE, Debian) or Solaris

17 Managing Computer Centre Machines with Quattor – Post-C5 – Cancio, Poznański - n° 17 Node Configuration (I) u NCM (Node Configuration Manager) is responsible for ensuring that reality on a node reflects the desired state in CDB. u Framework system, where service specific plug-ins called Components make the necessary system changes n Regenerate local config files (eg. /etc/sshd/sshd_config) n Restard/reload services (SysV scripts) n configuration dependencies (eg. configure network before sendmail) u Components invoked on boot, via cron or on CDB config changes u Porting of SUE features to NCM components started, to be completed with next CERN certified Linux and Solaris versions. n Currently available: grub, quota, snmp, dns, automounter, network, inetd, globuscfg, spma, sysaccounting, edg-cfg, … n keep portability between Linux/Solaris whenever possible

18 Managing Computer Centre Machines with Quattor – Post-C5 – Cancio, Poznański - n° 18 Node Configuration (II) u Component support libraries for ease of component development n Configuration information access n Configuration file manipulation n Advanced file operations n Process management n Exception handling libraries u A tool geared towards sysadmins/operators allows to query/visualize the node’s configuration profile.

19 Managing Computer Centre Machines with Quattor – Post-C5 – Cancio, Poznański - n° 19 Software Management (I - Server) (SPMA and SWRep were introduced in post-C5 14/3/03) u SWRep = Software Repository u Universal repository for storing Software: n Extendable to multiple platforms and packagers (RH Linux RPM, Solaris PKG, others like Debian pkg) n Multiple package versions/releases u Management (“product maintainers”) interface: n ACL based mechanism to grant/deny modification rights (packages associated to “areas”) u Client access: via standard protocols n HTTP, AFS/NFS, FTP u Replication: using standard tools (rsync) n load balancing, redundancy

20 Managing Computer Centre Machines with Quattor – Post-C5 – Cancio, Poznański - n° 20 Software Management (II - Clients) u SPMA = Software Package Management Agent u Manage all or a subset of packages on the nodes n On production nodes: wipe out unknown packages, (re)install missing ones. n On development nodes (or desktops): non-intrusive, configurable management of system and security updates. u Package manager, not only upgrader n Can roll back package versions n Transactional verification of operations u Portability: Generic plug-in framework n Plug-ins available for Linux RPM and Solaris PKG, (can be extended) u Scalability: n Supports HTTP (also FTP, AFS/NFS) n time smearing n Package pre-caching u Possible to access multiple repositories (division/experiment specific) u Modularity: can be configured via CDB, or locally

21 Managing Computer Centre Machines with Quattor – Post-C5 – Cancio, Poznański - n° 21 Deployment

22 Managing Computer Centre Machines with Quattor – Post-C5 – Cancio, Poznański - n° 22 Quattor deployment @ CERN u Quattor is used by FIO to manage most CC Linux nodes: n >1700 nodes, 15 clusters – to be scaled up to >5000 in 2006-8 (LHC) n LXPLUS, LXBATCH, LXSHARE, LXBUILD, disk and tape servers, Oracle DB servers n RedHat 7.3 and RHES 2.1 n RHES30 (also on IA64) to come soon u Not deployed: InstallMgr – as CERN legacy solution (AIMS) still in use n AIMS interfaced to CDB by FIO (not fully automatic) u Server cluster (LXSERV) hosting CDB and SWRep replicas n 4 RH73 nodes n CDB: ~ 260 general templates, and 2 templates per node (one derived from LANDB) : >3800 templates in total n SWRep: > 5900 software packages u Solaris clusters, server nodes and desktops to come for Solaris9 n Cf. Ignacio’s C5 presentation

23 Managing Computer Centre Machines with Quattor – Post-C5 – Cancio, Poznański - n° 23 FIO usage examples @ CERN-CC u LSF batch system upgrade: n Upgrade from LSF 4.2 to LSF 5.1 on >1000 nodes within 15 minutes, without service interruption u Security updates: n All security upgrades are done by SPMA s SSH security updates s KDE upgrades (~ 400 MB per node) on >700 nodes s etc … (~once a week!) u Kernel upgrades: n SPMA can handle multiple versions of the same package -> n Allows to separate in time installation and activation (after reboot) of new kernel n NCM component configures which kernel version to use

24 Managing Computer Centre Machines with Quattor – Post-C5 – Cancio, Poznański - n° 24 Deployment outside CERN-CC u EDG: no time for wide deployment n Estimated effort for moving from LCFG to quattor exceeded remaining EDG lifetime n EDG focus on stability rather than middleware functionality u Tutorials held at HEPiX and EDG conferences have caused positive feedback and interests: n Experiments: LHCb, Atlas n HEP institutes: UAM Madrid, LAL/IN2P3, Liverpool University, NIKHEF n Projects: Grille 5K (CNRS France)

25 Managing Computer Centre Machines with Quattor – Post-C5 – Cancio, Poznański - n° 25 Work in Progress (I) u EDG finish-up n End user documentation, install guide, packaging n FIO will continue maintaining Quattor (as part of ELFms) after EDG finishes. u Remaining developments n CDB fine grained access control n Scalability audit n Data encryption mechanisms: for sensitive data (ACLs) n Specialized GUIs and CLIs (eg. operators) u Improved procedures and workflows n Take into account new commands and functionality n Release cycle (‘test’, ‘new’, ‘production’ branches of CDB information) u Finish migration out of legacy tools n Finish SUE migration port of SUE features to NCM components in time for next certified Linux n ASIS + SUE/security + rpmupdate phaseout was finished in August.

26 Managing Computer Centre Machines with Quattor – Post-C5 – Cancio, Poznański - n° 26 Work in Progress (II) u Single CDB for CERN-CC n Inclusion of Solaris clusters and nodes u More integration with ELFms LEMON/LEAF n Interfaces from LEAF HMS/SMS to quattor being developed n LEMON sensors for quattor u Upgrade LXSERV service cluster n New hardware n Split in front-end / back end nodes u LCG-2 integration for LXBATCH nodes (WorkerNodes) n Deploy software from EDG, LCG, experiment SW n Deploy NCM components for local grid services

27 Managing Computer Centre Machines with Quattor – Post-C5 – Cancio, Poznański - n° 27 http://quattor.org

28 Managing Computer Centre Machines with Quattor – Post-C5 – Cancio, Poznański - n° 28 Differences with ASIS/SUE SUE: u Focus on configuration, not installation u Powerful configuration language n True hierarchical structures n Extendable data manipulation language n (user defined) typing and validation n Sharing of configuration data between components now possible u Central Configuration Database u Supports unconfiguring services u Improved depenency model n Pre/post dependencies u Revamped component support libraries ASIS: See post-C5 14/3/2003 u Scalability n HTTP vs. shared file system u Supports native packaging system (RPM, PKG) u Manages all software on the node u ‘real’ Central Configuration database u (But: no end-user GUI, no package generation tool)

29 Managing Computer Centre Machines with Quattor – Post-C5 – Cancio, Poznański - n° 29 Differences with EDG-LCFG u New and powerful configuration language n True hierarchical structures n Extendable data manipulation language n (user defined) typing and validation u Portability n Plug-in architecture -> Linux and Solaris u Enhanced components n Sharing of configuration data between components now possible n New component support libraries n Native configuration access API (NVA-API) u Stick to the standards where possible n Installation subsystem uses system installer n Components don’t replace SysV init.d subsystem u Modularity n Clearly defined interfaces and protocols n Mostly independent modules n “light” functionality built in (eg. package management) u Removed non-scalable protocols n NFS mounts not necessary any longer u Enhanced management of software packages n ACL’s for SWRep n No need for RPM ‘header’ files

30 Managing Computer Centre Machines with Quattor – Post-C5 – Cancio, Poznański - n° 30 NCM Component example [...] sub Configure { my ($self,$config) = @_; # access configuration information my $arch=$config->getValue('/system/architecture’); # CDB API $self->Fail (“not supported") unless ($arch eq ‘i386’); # (re)generate and/or update local config file(s) open (myconfig,’/etc/myconfig’); … # notify affected (SysV) services if required if ($changed) { system(‘/sbin/service myservice reload’); … } sub Unconfigure {... }


Download ppt "DataGrid is a project funded by the European Commission under contract IST-2000-25182 IT Post-C5, 12.12.2003 Managing Computer Centre machines with Quattor."

Similar presentations


Ads by Google