Fabric Management: Progress and Plans PEB Tim Smith IT/FIO.

Slide 1: Fabric Management: Progress and Plans (PEB; Tim Smith, IT/FIO)

Slide 2: Contents (2003/09/23, Tim.Smith@cern.ch)
- Manageable nodes
- Framework for management
- Deployed at CERN
- Plans

Slide 3: Standard / Autonomous Nodes
(Diagram: a node with /etc, /etc/sysconfig, /etc/init.d, /var/log, /var/run)
- Linux Standards Base: http://www.linuxbase.org
- Distributed / autonomous: decoupling of nodes; local config files; local programs do all the work
- Avoid inherent drift: no external crontabs, no remote management scripts, no remote application updates, no parallel command engines, no global file systems (AFS/NFS)

Slide 4: Standard / Autonomous Nodes
(Diagram: the same node, now with the software delivered as RPMs)

Slide 5: Standard / Autonomous Nodes
(Diagram: a Config Manager and a Package Manager feeding the node)
- All software in RPM; control the complete software set
- XML desired state; transactions
- Update local configs; manage through SysV scripts

Slide 6: Standard / Autonomous Nodes
(Diagram: RPMs flowing from the Config Manager and Package Manager to the node)

Slide 7: Framework Principles
- Installation: automation of the complete process; reproducible
- Clear installation: a short, clean, and reproducible Kickstart base installation; all software installed by RPM; system configuration by one tool; configuration information through a unique interface, from a unique store
- Maintenance: the running system must be updateable; updates must afterwards go into new installations; system updates have to be automated, but steerable

Slide 8: Framework for Management
(Diagram: CDB feeds the Config Manager, which serves the node's Cfg Cache through a Cfg Agent (desired state); a Mon Agent reports to the Monitoring Manager (actual state))

Slide 9: Framework for Management
(Diagram: the previous picture extended with a SW Repository and SW Manager serving a SW Agent and SW Cache on the node, XML and RPM delivered over HTTP, plus Fault, State, and Hardware Managers)

Slide 10: ELFms
(Diagram: the same architecture mapped to the concrete tools: CDB, NCM, Node Cfg Cache, SPMA, SW Cache, and SWRep (quattor); OraMon and MSA (LEMON); SMS and HMS (LEAF))

Slide 11: quattor configuration
- Configuration Database (CDB): the configuration information store; the information is updated in transactions, and it is validated and versioned
- Pan: templates with configuration information are input into CDB via GUI and CLI; Pan templates are compiled into XML profiles
- Server modules: provide different access patterns (SQL/LDAP/HTTP) to the configuration information
- HTTP + notifications: nodes are notified about changes to their configuration and fetch the XML profiles via HTTP
- CCM cache: configuration information is stored in a local cache on the node and accessed via the NVA-API
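The notify-then-fetch flow above can be sketched as a small cache-update routine. This is a minimal illustration, not quattor's CCM code: the function name, the per-node file layout, and the checksum comparison are all assumptions; in this sketch the node, after being notified and downloading its XML profile over HTTP, rewrites the local cache only when the content actually changed.

```python
import hashlib
import os

def cache_profile(node: str, profile_xml: bytes, cache_dir: str) -> bool:
    """Store a freshly fetched XML profile in the node's local cache.

    Returns True if the cache was updated, False if the profile content was
    unchanged (compared by checksum). Hypothetical sketch: names and layout
    are illustrative, not the real CCM interface.
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, f"{node}.xml")
    new_sum = hashlib.sha256(profile_xml).hexdigest()
    if os.path.exists(path):
        with open(path, "rb") as f:
            if hashlib.sha256(f.read()).hexdigest() == new_sum:
                return False              # profile unchanged, keep the cache
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:            # write-then-rename: readers never
        f.write(profile_xml)              # see a half-written profile
    os.replace(tmp, path)
    return True
```

The local cache is what lets agents keep reading configuration even when the profile server is unreachable.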

Slide 12: Config/SW Considerations
- Hierarchical configuration specification: a graph rather than a tree structure; common properties are set only once
- Node profiles: the complete specification in one XML file; local cache; transactions / notifications
- Externally specified and versioned: CVS repository
- One tool to manage all software: SPMA (system and application)
- Update verification nodes + release cycle; procedures and workflows
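The "graph rather than tree, common properties set only once" idea can be illustrated with a toy template resolver. This is not Pan: the template names and the dict-based representation are assumptions for illustration. Shared templates may be included from several places (hence a graph), includes are applied first, and more specific templates override the common values; cycles are not handled in this sketch.

```python
def compose(template: str, templates: dict) -> dict:
    """Resolve a node profile from a graph of templates.

    Each template is a dict with an optional "include" list and a "values"
    dict. Included templates are merged first, so a common property is set
    once in a shared template and overridden where needed.
    """
    result = {}
    for parent in templates[template].get("include", []):
        result.update(compose(parent, templates))
    result.update(templates[template].get("values", {}))
    return result
```

For example, a hypothetical "cern_base" template can carry site-wide settings, with a cluster template and a per-node template layered on top.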

Slide 13: NCM details
- NCM is responsible for ensuring that reality on a node reflects the desired state in CDB
- Modules ("components") make the necessary system changes; components do not install software, they just ensure the configuration is correct
- Components register with the framework to be notified when the node configuration is updated in CDB; usually this implies regenerating and/or updating local config files (e.g. /etc/sshd_config)
- Standard system facilities (SysV scripts) are used for managing services; components notify services through SysV scripts when the configuration changes
- It is possible to define configuration dependencies between components (e.g. configure the network before sendmail)
- Also provided: an invocation and notification framework, component support libraries, and a configuration query tool
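The component behaviour described above, regenerate a config file from the desired state and report whether anything changed, can be sketched as follows. This is a hypothetical illustration, not a real NCM component: the function name and the key/value rendering are assumptions. A True return is the point at which a real component would notify the service via its SysV script (e.g. reload sshd).

```python
def configure_sshd(desired: dict, config_path: str) -> bool:
    """Regenerate an sshd-style config file from the profile's desired state.

    Returns True if the file changed (caller would then notify the service
    through its SysV init script); False if it was already correct. The
    component only fixes configuration; it never installs software.
    """
    rendered = "".join(f"{key} {value}\n" for key, value in sorted(desired.items()))
    try:
        with open(config_path) as f:
            if f.read() == rendered:
                return False          # already in the desired state
    except FileNotFoundError:
        pass                          # first run: file does not exist yet
    with open(config_path, "w") as f:
        f.write(rendered)
    return True
```

The changed/unchanged distinction matters: services are only restarted when their configuration actually moved, which keeps repeated runs idempotent.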

Slide 14: Hardware variety

Templates:
hardware_diskserv_elonex_1100, hardware_elonex_500, hardware_elonex_600, hardware_elonex_800, hardware_elonex_800_mem1024mb, hardware_elonex_800_mem128mb, hardware_seil_2002, hardware_seil_2002_interactiv, hardware_seil_2003, hardware_siemens_550, hardware_techas_600, hardware_techas_600_2, hardware_techas_600_mem512mb, hardware_techas_800

Example Pan templates:

    template hardware_diskserver_elonex_1100;
    "/hardware/cpus" = list(create("hardware_cpu_GenuineIntel_Pentium_III_1100"),
                            create("hardware_cpu_GenuineIntel_Pentium_III_1100"));
    "/hardware/harddisks" = nlist("sda", create("pro_hardware_harddisk_WDC_20"));
    "/hardware/ram" = list(create("hardware_ram_1024"));
    "/hardware/cards/nic" = list(create("hardware_card_nic_Broadcom_BCM5701"));

    structure template hardware_cpu_GenuineIntel_Pentium_III_1100;
    "vendor" = "GenuineIntel";
    "model" = "Intel(R) Pentium(R) III CPU family 1133MHz";
    "speed" = 1100;

    structure template hardware_card_nic_Broadcom_BCM5701;
    "manufacturer" = "Broadcom Corporation NetXtreme BCM5701 Gigabit Ethernet";
    "name" = "3Com Corporation 3C996B-T 1000BaseTX";
    "media" = "GigaBit Ethernet";
    "bus" = "pci";

Slide 15: What is in CDB?
- Hardware: CPU, hard disk, network card, memory size, location
- Software: repository definitions; service definitions = groups of packages (RPMs)
- System: partition table; cluster name and type; CCDB name; site release; load-balancing information

Slide 16: quattor installation
(Diagram: client nodes run CCM, NCM with its components, Cdispd, and SPMA with a local package cache; CDB notifies nodes and serves configuration (registration/notification, SPMA.cfg); SWRep servers hold packages (RPM, PKG) behind a management API with ACLs; an installation server handles DHCP, PXE, and KS/JS generation for node (re)installation; transport via nfs/http/ftp)

Slide 17: SWRep
- Client-server tool suite for the storage of software packages
- Universal repository: extendable to multiple platforms and package formats (RH Linux/RPM, Solaris/PKG, and others such as Debian dpkg); multiple package versions/releases
- Currently RPM for RH73 and RH21ES, and soon RH10, on the LXSERV cluster
- Management ("product maintainers") interface: an ACL-based mechanism to grant/deny modification rights (packages are associated with "areas"); management is done by FIO/FS
- Client access via standard protocols: HTTP on LXSERV
- Replication using standard tools (e.g. rsync), for availability and load balancing

Slide 18: SPMA
- Runs on every target node: all LXPLUS/LXBATCH/LXSHARE/... nodes, except old RH61 nodes
- A plug-in framework allows for portability: FIO/FS uses it for RPM; the PS group will use it for Solaris (PKG)
- Can manage either all or a subset of the packages on a node; configured by FIO/FS to manage ALL packages, which means packages that are not configured are wiped out, and packages missing on the node but present in the configuration are (re)installed
- Addresses scalability: packages can be stored ahead of time in a local cache, avoiding peak loads on the software repository servers during simultaneous upgrades of large farms

Slide 19: SPMA (II)
- SPMA functionality:
  1. Compares the packages currently installed on the local node with the packages listed in the configuration
  2. Computes the necessary install/deinstall/upgrade operations
  3. Invokes the packager RPMT/PKGT with the right operation transaction set
- RPMT (RPM transactions) is a small tool on top of the RPM libraries which allows multiple simultaneous package operations while resolving dependencies (unlike plain RPM)
- Example: 'upgrade X, deinstall Y, downgrade Z, install T', verifying/resolving the appropriate dependencies
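Steps 1 and 2 of the SPMA functionality amount to a set difference between installed and configured packages. A minimal sketch, assuming packages are identified by name and a single version string (real RPM comparison also involves epoch, release, and architecture, which this illustration ignores):

```python
def compute_operations(installed: dict, desired: dict) -> list:
    """Compare installed package versions with the configured ones and emit
    the operation list a transactional packager would apply.

    Both arguments map package name -> version string. Because ALL packages
    are managed, anything installed but absent from the configuration is
    scheduled for removal.
    """
    ops = []
    for name, version in sorted(desired.items()):
        if name not in installed:
            ops.append(("install", name, version))
        elif installed[name] != version:
            ops.append(("upgrade", name, version))   # covers downgrades too
    for name in sorted(installed):
        if name not in desired:
            ops.append(("deinstall", name, installed[name]))
    return ops
```

The resulting list is what would be handed to RPMT/PKGT as one transaction, so the whole set either applies or fails together.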

Slide 20: Currently deployed at CERN
- Scale: 1400 node profiles; 3M lines of node XML specifications, generated from 28k lines of Pan definitions
- Scalability: a 3-node load-balanced web server cluster, serving both configuration profiles and the software repository; time-smeared transactions; pre-deployment caching
- Security: a GPG key pair per host, created during installation; per-host encrypted files served
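"Time-smeared transactions" means each node delays its update by some offset inside a window so that a whole farm does not hit the repository servers at once. One way to get such an offset without any coordination is to derive it deterministically from the hostname; this is an illustrative sketch of the idea, not the deployed mechanism.

```python
import hashlib

def smear_delay(hostname: str, window_minutes: int = 15) -> float:
    """Deterministic per-host start delay, uniform across a time window.

    Hash-based, so each host always computes the same offset and no central
    scheduler is needed. Returns the delay in seconds. Illustrative sketch;
    the real deployment may smear updates differently.
    """
    digest = hashlib.sha256(hostname.encode()).hexdigest()
    fraction = int(digest, 16) / 16**64       # uniform in [0, 1)
    return fraction * window_minutes * 60
```

With a 15-minute window (as in the KDE/Gnome rollout on the next slide), 700 nodes spread their downloads instead of arriving simultaneously.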

Slide 21: Examples
- Clusters: LXPLUS, LXBATCH, LXBUILD, LXSHARE, Oracle DBs; tape servers, disk servers, stage servers
- Supported platforms: RH7.3.3 (2400 packages), RHES (1300 packages), RH10
- KDE/Gnome security patch rollout: 0.5 GB onto 700 nodes, with 15-minute time smearing from 3 load-balanced servers
- LSF 4 to 5 transition: 10 minutes, with no queues or jobs stopped (cf. last time: 3 weeks, 3 people!)

Slide 22: Plans
- Deploying NCM: port SUE features to NCM for RH10; use it for Solaris 10
- Deploying the State Manager (SMS)
- Use-case-targeted GUIs
- Web servers: split into a front-end / back-end architecture

Slide 23: Conclusions
- Maturity brings: degradation of the initial state definition (HW + SW); accumulation of innocuous temporary procedures
- Scale brings: marginal activities becoming full-time; many hands on the systems
- Combat these with strong management automation

