Running Computers in CC

1 Running Computers in CC
FIO Tools for all Service Managers

2 Overview
- Goal of this presentation
- FIO goals
- Existing tools (selection)
- Having all Service Managers able to use them

3 Goal of this presentation
- FIO has tools for managing computers
- Many of these are of general interest
- Non-FIO SMs could use them to easily set up their systems for administration/operation
- Goal 1: list/select the interesting tools
- Goal 2: identify what is missing for non-FIO Service Managers to be able to use them easily

List of FIO tools to promote => interesting from a Service Manager point of view. Offload FIO people by having non-FIO SMs able to use our tools correctly.

4 FIO goals
- Homogeneous way of managing nodes
  - Efficient and by everybody (?)
- Common configuration repository
- Promote our tools and methods
- Usage of SMS (monitoring, vendor calls,…)
- Keep CDB up-to-date
- Delegation of tasks (software updates, SysAdmins,…)
- etc…

Listed here because these were stated by individuals, but we want to make sure everybody agrees. Why:
- time consuming (for us), and it will off-load FIO
- not scalable otherwise
- avoid repetitive work (for other SMs)
- ensure correctness (CDB not obsolete)

5 Existing FIO tools (selection)
- CDB
- ELFms
- Console service
- OPMs (procedures)
- Remedy (interventions)

6 These are the tools, but…
What prevents non-FIO Service Managers from using those tools?

7 Using FIO tools
- Missing documentation
  - Easy to find
  - Explaining concepts
  - Giving solutions to Service Managers
- Offering good setup
- Tools easy to use

What can be improved: from my experience in assisting non-FIO Service Managers and discussions with several people, I see three areas of improvement:
- provide better documentation intended for SMs
- provide a good base on which SMs can build
- easy usage of our tools/interfaces

8 Documentation (finding)
- Create a single entry point (web page/site)
- Improve publishing:
  - Better structure of web sites
  - Some (sub-)service web sites are not user oriented
  - Some information is difficult to find, e.g. it is in a CERN-specific area
- Locations
  - TWiki: currently on a (private) server!
  - Some documents in private web spaces
- Look and feel (IT schema)

Lack of information: some (sub-)service web sites are not user oriented, but rather developer/maintainer help (the two are not mutually exclusive!). Recent cases (my experience): console service (found info after a long search), and the access control component (doc in a private space). TWiki = DES server, but not a central service.

9 Documentation (explaining)
- Brief explanations about
  - Management model: recommend ELFms or suggest alternatives (i.e. use AIMS + kickstart file)
  - Benefits for SMs
  - Expectations from Service Managers
- Not all SMs are equal (standard, advanced, power user)
  - SM to identify their involvement
- Indicate next steps
  - Depending on “category of user” (SM)
  - Propose tutorials

Recommend ELFms: when to quattorize, and what does that mean? Tutorials are missing that explain the usage of the tools (cdbop, spma_wrapper.sh, ...) in an adequate context, e.g. a kernel upgrade (different implications for Service Managers), and from the Service Manager point of view (different from that of FIO/programmers/power users). => you must provide this and keep CDB up-to-date this way

10 Documentation (solutions)
- Provide check lists
  - Information to provide according to category
  - Whom to contact, interface to use
- Publish workflows and use cases
  - e.g. using SMS, hook into it
  - e.g. running components
- Create HOW-TOs from the Service Manager point of view
  - e.g. adding extra software, having data backed up
  - e.g. upgrading the kernel

Check list example: machine to be quattorized: things to know (e.g. responsibility toward extra software), information to provide, actions expected, whom to contact, how,…

11 Documentation (solutions)
- For the “advanced users”
- Describe CDB structure and fields
  - Which templates hold which information: profile_X, pro_type_Y_Z
  - Purpose of fields and expected values
- List / explain components and features
  - certainly offering improvements, but not understood
  - raising some worry (loss of control vs. automation dilemma)

No exhaustive list and description of such components is known.

12 Offering good setup
- CDB registration
  - Avoid putting nodes in the lxnoq cluster
    - Not good for further maintenance, e.g. by FIO colleagues
  - Create instead a pro_type_cluster_noq or pro_type_cluster_os
  - Proactively grant permissions to update templates
- Provide a “ready-to-use” template/setup
  - Based on the current certified OS
  - With fixed software base(s)
  - With minimal monitoring
  - Foresee a few base components
  - Also put explanations in pro_type_

We provide 3 types of (standard) hardware, but we don’t provide any default (standard) setup. This does not lead toward a uniform way of management. AIMS+KS is not necessarily what is wanted. There are recurrent requests to use “simple linux boxes” …
- … installed by the SysAdmins
- … with little effort to maintain (auto-updates)
- … and basic monitoring (fs full, no contact,…)
- … on top of which more software can be added

Provide a “ready-to-use” template/setup that can anyway be customized:
- With fixed software base(s)
  - Not lxbatch (LSF, quota, accounting, …)
  - Not lxplus (but “interactive base” as additional option)
- Foresee a few base components
  - Not usable as is, but easily customized (e.g. access control)

13 Tools easy to use (improvements)
- Limit the number of logins/accounts
  - CDB, Remedy, OPMs, etc…
- Simple (web) interface for usual tasks:
  - To be determined from the check lists
  - Mainly for non-expert or administrative tasks
  - Configuration interface (CDB) for high level functions
- A more sophisticated CDB interface (advanced users)
  - cdbop not suitable for every SM
  - Remember Panguin?

Lack of simple (web = appealing and effective) interfaces. Examples: a request form for new equipment and interaction with operations; a configuration (CDB) interface hiding the underlying template structure but allowing high level functions.

14 Missing tools
- Configurable alarm system
  - for thresholds and views (filtering)
  - allowing recovery actions to be set (and notifications?)
  - would make visible what is monitored (alarms)
- CDB interface(s)
  - CDB web tool (suite?) being developed
    - All requirements collected?
    - Deserves better visibility
    - User guide or self-explanatory pages
  - Panguin replacement
    - Also showing impacted machines/clusters when editing a template

The alarm system is NOT the alarm display. It would possibly be configurable by the user (for thresholds and views) and would allow recovery actions to be set. It also means a quick view of what is currently monitored on my nodes (as opposed to a list of all existing metrics). For power users: a more sophisticated CDB interface, like Panguin was, but also highlighting the machines impacted by changes in templates.
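As an illustration of the configurable alarm system wished for on slide 14, here is a minimal Python sketch. It is not based on any existing FIO, Lemon or SMS interface: the names `AlarmRule`, `evaluate`, `view`, `monitored_metrics`, the metric names and the `cleanup_tmp` recovery action are all hypothetical. It only shows how user-set thresholds, per-node views (filtering), optional recovery actions and a "what is monitored" listing could fit together.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# (node, metric, value, result of the recovery action or None)
Alarm = Tuple[str, str, float, Optional[str]]


@dataclass
class AlarmRule:
    """One user-configurable rule (hypothetical structure)."""
    metric: str                                      # e.g. "fs_used_percent" (made-up name)
    threshold: float                                 # alarm is raised when value > threshold
    recovery: Optional[Callable[[str], str]] = None  # optional recovery action, called with the node name


def cleanup_tmp(node: str) -> str:
    """Placeholder recovery action; a real one would act on the node."""
    return f"cleanup triggered on {node}"


def evaluate(rules: List[AlarmRule], samples: Dict[str, Dict[str, float]]) -> List[Alarm]:
    """Compare the latest samples (node -> metric -> value) against every rule."""
    alarms: List[Alarm] = []
    for node in sorted(samples):
        for rule in rules:
            value = samples[node].get(rule.metric)
            if value is not None and value > rule.threshold:
                result = rule.recovery(node) if rule.recovery else None
                alarms.append((node, rule.metric, value, result))
    return alarms


def view(alarms: List[Alarm], nodes: Iterable[str]) -> List[Alarm]:
    """Filtered view: keep only the alarms for the given nodes."""
    wanted = set(nodes)
    return [alarm for alarm in alarms if alarm[0] in wanted]


def monitored_metrics(rules: List[AlarmRule]) -> List[str]:
    """Quick view of what is actually monitored, not of all existing metrics."""
    return sorted({rule.metric for rule in rules})
```

A Service Manager would then only maintain the list of rules for their own nodes, e.g. `AlarmRule("fs_used_percent", 90.0, cleanup_tmp)`, and ask for `view(alarms, my_nodes)` to see the alarms that concern them.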

15 Even more tools
- No (kind of) "control centre" which would allow:
  - to group access to other tools (entry point)
  - to offer access to the console with a simple click
  - to change the configuration and trigger the actions on relevant target nodes
  - making use of the ELFms tools whenever possible
- Lemon-status started to integrate access to other parts of the system (templates, Remedy tickets…)

More tools = wishes: a control centre, which would allow changing the configuration and triggering the actions on relevant target nodes, without having to wait for a synchronization. It could offer access to the console with a simple click and make use of the ELFms tools whenever possible (linux limitation!). ELFms tools and scripts: like CDBMoveHost.

