Status of Farm Monitor and Control CERN, February 24, 2005 Gianluca Peco, INFN Bologna.


1 Status of Farm Monitor and Control CERN, February 24, 2005 Gianluca Peco, INFN Bologna

2 Summary
Already done:
- SubFarm Monitor and Control architecture
New features:
- Process Controller
- PVSS SFM improvement
For the Real Time Trigger Challenge:
- Higher priority: Archiving, Boot Manager, IPMI
- Lower priority: fwTrending, FSM integration
To be done later:
- MS Windows SFM software porting
- Oracle & PVSS

3 SubFarm Monitor Architecture
[Diagram. Labels: SubFarm Node (repeated, each with Task Manager, Logger, Monitor); Control PC; TTY Client; PVSS Integration (Logger, Monitor, Process Control, Task Manager); DIM Communication Layer; Services; Command and Services.]

4 New Item - Process Controller
The Process Controller is a program executed on the Control PC. It supervises the processes running on all farm nodes and restarts them immediately if they die.
- It reads the list of processes to start on each node, and their execution mode (arguments, environment, user, scheduler, priority, re-spawn parameters, etc.), from an XML file (in future, from the Configuration DB).
- It works by contacting the Light Server Task Managers running on every node through DIM commands and services.
- Process restart is triggered by process death:
  - The Task Manager handles the SIGCHLD signals from its child processes.
  - On a SIGCHLD signal, the Task Manager updates the “DIM list service” (within about 0.1 s).
  - The update of the “DIM list service” triggers the reaction of the Process Controller, which then schedules the process restart (within about 0.1 s).
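The slides do not show the layout of the XML process list, so the following is only a minimal sketch of how such a file could be read. The element and attribute names (`farm`, `node`, `process`, `arg`, `env`), the paths and the `ProcessSpec` type are all hypothetical; only the kinds of parameters (arguments, environment, user, priority, re-spawn parameters) come from the slide.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical file layout: the real SFM schema is not given in the slides.
EXAMPLE_XML = """
<farm>
  <node name="SFNode_01">
    <process path="/opt/example/bin/monitor" user="online" priority="10"
             maxStartN="5" checkPeriod="60" disPeriod="300">
      <arg>--verbose</arg>
      <env name="LOG_LEVEL" value="info"/>
    </process>
  </node>
</farm>
"""


@dataclass
class ProcessSpec:
    """One process to be started on one farm node (illustrative type)."""
    node: str
    path: str
    args: list = field(default_factory=list)
    env: dict = field(default_factory=dict)
    user: str = ""
    priority: int = 0
    max_start_n: int = -1      # -1: re-spawn control switched off
    check_period: float = 0.0  # seconds
    dis_period: float = 0.0    # seconds, -1: never re-enable


def load_process_list(xml_text):
    """Parse the (hypothetical) XML layout into a list of ProcessSpec."""
    root = ET.fromstring(xml_text)
    specs = []
    for node in root.findall("node"):
        for proc in node.findall("process"):
            specs.append(ProcessSpec(
                node=node.get("name"),
                path=proc.get("path"),
                args=[a.text for a in proc.findall("arg")],
                env={e.get("name"): e.get("value")
                     for e in proc.findall("env")},
                user=proc.get("user", ""),
                priority=int(proc.get("priority", "0")),
                max_start_n=int(proc.get("maxStartN", "-1")),
                check_period=float(proc.get("checkPeriod", "0")),
                dis_period=float(proc.get("disPeriod", "0")),
            ))
    return specs
```

Calling `load_process_list(EXAMPLE_XML)` yields one `ProcessSpec` per process and node. Moving to the Configuration DB later would presumably only replace this loader, not the rest of the controller.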

5 SubFarm Monitor – Process Controller
[Diagram. Labels: Control PC; PVSS Integration; SFController; SFNode_01, SFNode_02, SFNode_03, ..., SFNode_n (Logger, Task Manager, Monitor); SFarm_n (Logger, Monitor, Process Control, Task Manager); DIM Communication Layer; C light Publisher; XML File Process List; DP; DIM.]

6 New Item - Process Controller (II)
A re-spawn control is also implemented:
- If a process is restarted more than maxStartN times within checkPeriod seconds, process restart is disabled for disPeriod seconds.
- If maxStartN = -1, the re-spawn control is switched off, i.e. the process can be restarted indefinitely.
- If disPeriod = -1, process restart, once disabled, is never re-enabled (one-time re-spawn).
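The re-spawn rule above can be sketched as a small sliding-window counter. This is an illustration, not the SFM code: the class name `RespawnControl`, the method `may_restart` and the explicit `now` argument are assumptions, and it reads "more than maxStartN times" as allowing maxStartN restarts inside the window and refusing the next one.

```python
from collections import deque


class RespawnControl:
    """Sliding-window restart limiter (illustrative sketch)."""

    def __init__(self, max_start_n, check_period, dis_period):
        self.max_start_n = max_start_n    # -1: control switched off
        self.check_period = check_period  # window length in seconds
        self.dis_period = dis_period      # -1: never re-enable
        self._starts = deque()            # times of recent restarts
        self._disabled_at = None          # time at which restart was disabled

    def may_restart(self, now):
        """Return True (and record the restart) if a restart is allowed."""
        if self.max_start_n == -1:
            return True
        if self._disabled_at is not None:
            still_disabled = (self.dis_period == -1
                              or now - self._disabled_at < self.dis_period)
            if still_disabled:
                return False
            # disPeriod has elapsed: re-enable and start a fresh window
            self._disabled_at = None
            self._starts.clear()
        # forget restarts that fell out of the checkPeriod window
        while self._starts and now - self._starts[0] >= self.check_period:
            self._starts.popleft()
        if len(self._starts) >= self.max_start_n:
            # this restart would exceed maxStartN within checkPeriod
            self._disabled_at = now
            return False
        self._starts.append(now)
        return True
```

Passing the time in explicitly keeps the sketch deterministic; a real controller would presumably use a monotonic clock such as `time.monotonic()`.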

7 New Items - PVSS Improvements
HELP
- Online help is available.
- The meaning of every item can be looked up by right-clicking the PVSS panel objects.
Alarm DP
- A PVSS script determines the status of each monitor sensor (DU) used by the FSM.
- The parametrization is currently hard-coded inside the CTRL script; this will be updated soon to use configuration DPs.

8 To be done for the Real Time Trigger Challenge: HIGH PRIORITY
PVSS Archiving (under development)
- Implement the PVSS archive using the native RAIMA DB.
- Static selection of the data to be archived: selecting the data to archive from the UI is not allowed.
Boot Manager (under investigation)
- Implement, inside the PVSS user interface, a mechanism to set the node boot configuration (DHCP, HOST, static ARP, route, etc.) using the Configuration DB and/or a graphical UI.
- Plug & Play system for adding/removing nodes.
IPMI (under investigation)
- Provide the methods needed to change the electrical power state without OS interaction.
- Messages are sent over the LAN directly to the hardware.
- Power down, soft reboot, power up, etc.

9 IPMI v2.0 Architecture
[Diagram. Labels: Baseboard; System Bus; Bridge Controller; ICMB; Aux. IPMB; Remote Mgmt. Card; SMBus/PCI Mgmt. Bus; Baseboard Mgmt. Controller (BMC); I2C/SMBus; SDR, SEL, FRU NV Store; Mgmt Netwk Ctrlr; LAN; PCI; RS-232; MODEM / Serial; IPMB (I2C); Chassis sensors & control circuitry; FRU SEEPROM; Satellite Mgmt. Controller; “side-band”; System Interface; SENSORs & control circuitry; IPMI Messages; IPMI Architecture and Initiative Update.]

10 IPMI in modular architecture
[Diagram: Typical Modular Application. Labels: LAN; chassis; compute node A; compute node B; i/o node; BMC; Satellite Controller; mgmt module; PS; FAN; temp; Sys I/F; BP I/F; Mgmt. Module Processor; Backplane Mgmt Interconnect; IPMI Messages; Remote Mgmt Console; System; CIM to IPMI.]

11 To be done for the Real Time Trigger Challenge: LOW PRIORITY
fwTrending integration in SFM (under development)
- Probably easy to implement using the framework features and its powerful graphical trending tool on archived DP elements (historical data).
- Excel export for further analysis.
FSM integration in SFM (just started)
- Using the DP alarm structure already implemented, we can trigger FSM alarms and the related commands for the node (start, stop, reboot, etc.).

12 To be done later
MS Windows SFM porting
- One possibility under investigation is to use Windows Management Instrumentation (WMI) and the related API (.NET platform).
- The idea is to recompile the Linux Monitor Sensor code on top of a low-level layer that takes its information from the WMI structures.
- Porting the Task Manager and the Process Controller should be more difficult: totally new code with different signal handling.
Oracle
- We are interested in working on the Oracle DB for the LHCb needs.
- In particular, we are taking care of the Oracle & PVSS integration.

13 END

14 Process Controller (Backup)
- If more than one process dies in a short time interval, more than one process list update is scheduled.
- If the Process Controller takes longer to start a new process than the interval between updates, it receives more than one update in which the process is missing and therefore restarts the process more than once.
- The problem can be solved by implementing a coalescence mechanism and by disabling list updating during process restart (achieved by means of a mutex which arbitrates between the update thread and the start-process thread).
- Is it possible to implement these mechanisms in PVSS?
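One way to read the coalescence idea is sketched below, under stated assumptions: the update thread only stores the newest process list, and the start-process thread holds the same mutex while it restarts the missing processes, so no list update is accepted in the middle of a restart. The class `CoalescingRestarter` and the `start_process` callback are hypothetical stand-ins; no real DIM or PVSS call is shown.

```python
import threading


class CoalescingRestarter:
    """Coalesce process-list updates and serialize them with restarts."""

    def __init__(self, expected, start_process):
        self._expected = set(expected)       # processes that should run
        self._start_process = start_process  # hypothetical start callback
        self._lock = threading.Lock()        # update vs. start arbitration
        self._latest = None                  # newest list not yet handled

    def on_list_update(self, running):
        """Update thread: keep only the most recent list (coalescence)."""
        with self._lock:
            self._latest = set(running)

    def restart_missing(self):
        """Start thread: restart each missing process exactly once.

        The lock is held for the whole restart, so updates arriving in the
        meantime wait instead of triggering a second restart.
        """
        with self._lock:
            if self._latest is None:
                return []  # nothing new since the last pass
            missing = sorted(self._expected - self._latest)
            self._latest = None
            for name in missing:
                self._start_process(name)
            return missing
```

With two quick updates (first one process missing, then two), a single `restart_missing()` pass starts each missing process once, because only the newest list survives. An update that was blocked on the mutex could still carry a stale list, so a real implementation would probably also wait for the Task Manager to publish the refreshed list before accepting it.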

