The Process Manager in the ATLAS DAQ System G. Avolio, M. Dobson, G. Lehmann Miotto, M. Wiesmann (CERN)

1 The Process Manager in the ATLAS DAQ System G. Avolio, M. Dobson, G. Lehmann Miotto, M. Wiesmann (CERN)

2 Outline
Giuseppe Avolio (CERN), RT07
- Introduction
- Process Manager requirements
  - General capabilities
  - Functionalities
- Process Manager implementation
  - Client interface
  - Server
  - Communication schema
  - Launching a process
  - Host integration
- Conclusions

3 Introduction
- The Process Manager (PM) is part of the distributed architecture upon which the DAQ system is built.
- It is responsible for launching and controlling processes.
- It cannot rely on any other system it has to launch.
- Its failure means the loss of system control.
- Scale: a distributed network of ~3K machines running ~15K concurrent processes.

4 Process Manager Requirements
General capabilities
- Start and kill processes
- Indicate and notify any change of the process status
General constraints
- PM failures shall not stop data-taking activities
- The PM has to be able to start any kind of process
- A started process is not required to have any knowledge of the PM system

5 Process Manager Requirements
Functional requirements
- Ownership: processes should be started on behalf of the requesting user
- Identification: each started process should be uniquely identified
- The user should be able to:
  - Creation notification: be notified about the process creation (successful or not)
  - Verification: verify the state of the created process
  - Signaling: send POSIX signals to a started process
  - Monitoring and termination notification: be notified when a process dies (exited or signaled)
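
The signaling and termination-notification requirements can be illustrated with a minimal Python sketch. It is not the PM client API: `describe_termination` and `run_and_signal` are hypothetical helper names, and the standard `subprocess` module stands in for the PM. The sketch starts a child, optionally sends it a POSIX signal, and reports whether the child exited or was signaled.

```python
import signal
import subprocess
import sys


def describe_termination(returncode: int) -> str:
    """Classify a finished child as 'exited' or 'signaled'."""
    if returncode < 0:
        # subprocess reports death by signal N as returncode -N.
        return f"signaled ({signal.Signals(-returncode).name})"
    return f"exited (status {returncode})"


def run_and_signal(argv, sig=None):
    """Start a child, optionally send it a POSIX signal, report how it ended."""
    child = subprocess.Popen(argv)
    if sig is not None:
        child.send_signal(sig)
    child.wait()
    return describe_termination(child.returncode)


if __name__ == "__main__":
    # One child exits on its own, the other is terminated by SIGTERM.
    print(run_and_signal([sys.executable, "-c", "raise SystemExit(3)"]))
    print(run_and_signal([sys.executable, "-c", "import time; time.sleep(30)"],
                         signal.SIGTERM))
```

The two outcomes correspond to the "exited" and "signaled" cases that the requirement distinguishes.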

6 Process Manager Requirements
Functional requirements
- Client control: clients should also be able to control already started processes, and to get process notifications for them
- Resources: the PM has to take SW/HW resources into account, collaborating with a Resource Manager
- Access control: process creation and signaling should be subject to access control, collaborating with an Access Manager
- Error recovery: after any failure the PM has to inherit every process that is still alive

7 Process Manager Implementation
Architecture: The Client Interface
- The Client module resides on the host where the requests are initiated.
- It offers a user interface to the PM system:
  - Process creation and control
  - Get process information
  - Get process identification
  - Tools to query a host about running processes
- Two kinds of processes:
  - Linked: process status updated by call-backs (push mode)
  - Unlinked: client can ask about process status (pull mode)

Allowed process operations, by process state:
  Operation      Unlinked    Linked     Terminated
  Control        Allowed     Allowed    Disallowed
  Process Info   Allowed – local data (all states)
  Link           Allowed     Ignore     Disallowed
  Unlink         Ignore      Allowed    Ignore
  Handle         Allowed – local data (all states)

[Diagram: Client connected through CORBA to the server side (Server, Launcher, Operating System)]
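
The difference between linked (push) and unlinked (pull) processes can be sketched in Python. This is an illustration only: `TrackedProcess` and its methods are hypothetical names, a local `subprocess.Popen` child stands in for a remotely managed process, and the way call-backs are stored is an assumption.

```python
import subprocess
import sys
import threading


class TrackedProcess:
    """Hypothetical client-side wrapper (not the PM client API)."""

    def __init__(self, argv):
        self._child = subprocess.Popen(argv)
        self._callbacks = []
        self._lock = threading.Lock()
        self._terminated = False
        # A watcher thread blocks in wait(), so nothing has to poll.
        threading.Thread(target=self._watch, daemon=True).start()

    def _watch(self):
        returncode = self._child.wait()
        with self._lock:
            self._terminated = True
            callbacks, self._callbacks = self._callbacks, []
        for callback in callbacks:
            callback(returncode)

    def link(self, callback):
        """Push mode: register a call-back; refused once terminated."""
        with self._lock:
            if self._terminated:
                return False
            self._callbacks.append(callback)
            return True

    def unlink(self, callback):
        """Stop push notifications; unlinking when not linked is ignored."""
        with self._lock:
            if callback in self._callbacks:
                self._callbacks.remove(callback)

    def status(self):
        """Pull mode: None while running, otherwise the return code."""
        return self._child.poll()
```

A linked client is told about termination through its call-back, while an unlinked client calls `status()` whenever it wants to know. Refusing `link()` after termination mirrors the "Disallowed" entry for Link on a terminated process.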

8 Process Manager Implementation
Architecture: The Server
- A Server instance runs on each host.
- Server responsibilities:
  - Manage the process hierarchy
  - Manage call-back lists
  - Handle process resource allocations, communicating with an external Resource Manager system
  - Handle user authorization, communicating with an external Access Manager system
  - Publish process information
  - Interact with Clients
  - Interact with Launchers
- The Server builds the process Handle: pmg:://server/partition/application/id
- The Handle uniquely identifies a started process and is needed for communication between the Client and the Server.
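
To make the Handle layout concrete, here is a minimal Python sketch that builds and parses a string of the form scheme/server/partition/application/id. The slide prints the scheme as `pmg:://`; the sketch assumes the doubled colon is a transcription slip and uses `pmg://`. The `Handle` class, `parse_handle`, and the sample field values are hypothetical and not part of the PM code.

```python
from typing import NamedTuple

SCHEME = "pmg://"  # assumption: the slide shows "pmg:://"


class Handle(NamedTuple):
    """Hypothetical in-memory form of a process Handle."""
    server: str
    partition: str
    application: str
    id: str

    def __str__(self) -> str:
        # Join the four fields in the order given on the slide.
        return SCHEME + "/".join(self)


def parse_handle(text: str) -> Handle:
    """Split a Handle string back into its four fields."""
    if not text.startswith(SCHEME):
        raise ValueError(f"not a process handle: {text!r}")
    parts = text[len(SCHEME):].split("/")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected server/partition/application/id: {text!r}")
    return Handle(*parts)


if __name__ == "__main__":
    handle = Handle("host-01", "my_partition", "my_app", "42")
    print(handle)
    print(parse_handle(str(handle)))
```

Because the string carries the server name, a client holding only the Handle knows which host's Server to contact.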

9 Process Manager Implementation
Architecture: The Launcher
- The Launcher is the component handling low-level process management.
- Why a separate process?
  - A large CORBA-based program running as root would represent a security risk
  - Root privileges are needed to:
    - start processes under an arbitrary user identity
    - wait on an arbitrary process to exit
  - It makes it easier for the Server to reattach to running processes after a failure
  - It is a small component, with less chance of crashing and a better abstraction layer
- Launcher tasks:
  - Start the process with the correct parameters
  - Monitor the process for termination
  - Send control signals to the process
  - Transmit process status information to the Server
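
The low-level work described for the Launcher (start under a given identity, wait for termination, report the outcome) maps onto a few POSIX calls. The following Python sketch is an assumption-laden illustration and not the Launcher itself: `launch_and_wait` is a hypothetical name, the exit status 127 for a failed exec is a shell convention chosen here, and switching identity with `os.setuid` only works when the caller is root.

```python
import os
import sys


def launch_and_wait(argv, uid=None):
    """Fork, optionally switch user identity, exec argv, wait for the child."""
    pid = os.fork()
    if pid == 0:
        # Child: exec replaces this process image and never returns on success.
        try:
            if uid is not None:
                os.setuid(uid)  # needs root privileges
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # reached only if setuid or exec failed
    # Parent: block until the child terminates, with no polling.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return ("signaled", os.WTERMSIG(status))
    return ("exited", os.WEXITSTATUS(status))


if __name__ == "__main__":
    print(launch_and_wait([sys.executable, "-c", "raise SystemExit(5)"]))
    print(launch_and_wait(["/nonexistent/binary"]))
```

Keeping this logic in a small separate program is what lets the larger CORBA-based Server run without root privileges.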

10 Process Manager Implementation
Communication Schema
- CORBA
  - Client – Server
  - Server – other DAQ services
- Mapped files (manifests)
  - Shared between Server and Launcher
  - Contain the process description and status
- Named pipes (FIFOs)
  - Server -> Launcher: send command
  - Launcher -> Server: update process status
- OS system calls
  - Launcher – Process: wait and signal
[Diagram: PM Client, PM Server, Access Manager, Resource Manager, PM Launchers and Processes, connected by CORBA, mapped files, named pipes and child status]
File-system-resident objects (mapped files and FIFOs) allow the PM Server to inherit running processes after a failure.
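
The named-pipe channel can be sketched in Python as follows. This is an illustration under assumptions: the command strings, the FIFO file name, and the function names are hypothetical, and a thread plays the Launcher role that a separate process has in the PM. The reader sits in a blocking read on the FIFO until the writer sends commands and closes its end.

```python
import os
import tempfile
import threading


def launcher_loop(fifo_path, received):
    """Launcher side: block on the FIFO and collect commands until EOF."""
    # open() blocks until a writer opens the FIFO; reads block without polling.
    with open(fifo_path, "r") as fifo:
        for line in fifo:
            received.append(line.strip())


def send_commands(fifo_path, commands):
    """Server side: write one command per line, then close (EOF for the reader)."""
    with open(fifo_path, "w") as fifo:
        for command in commands:
            fifo.write(command + "\n")


def demo():
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "launcher.fifo")
        os.mkfifo(path)  # the FIFO is a file-system object
        received = []
        reader = threading.Thread(target=launcher_loop, args=(path, received))
        reader.start()
        send_commands(path, ["SIGNAL 15", "STOP"])  # hypothetical commands
        reader.join()
        return received


if __name__ == "__main__":
    print(demo())
```

Because the FIFO lives in the file system, a restarted writer can open the same path again and resume talking to a reader that kept running.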

11 Process Manager Implementation
[Figure: Start Procedure, from launch to end]
- Client (client application): create a process description
- Server, steps shown:
  - Acquire resources (Resource Manager)
  - Create reference and manifest (filesystem)
  - Start the Launcher
  - Ask authorization (Access Manager)
[Figure: Process States]
- States: Requested, Created, Running, Exited, Signaled, Sync Error, Fail
- Transition labels (word order as extracted): Start Exec Error Exec Timeout Exit Signal Run

12 Process Manager Implementation
Host Integration
- The PM needs to launch any process belonging to the DAQ infrastructure
  - It is started automatically as a system service
- Processes can be started with any user identity
  - The Launcher binary has to belong to root and have the setuid bit set
- The PM runs on Linux and is adapted to the OS constraints
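
The ownership and setuid condition on the Launcher binary can be checked with a short Python sketch. The function names are hypothetical, and the path passed to `check_launcher` is whatever the installation uses; no actual PM path is assumed here.

```python
import os
import stat


def is_setuid_root(uid: int, mode: int) -> bool:
    """True if a file is owned by root (uid 0) and has the setuid bit set."""
    return uid == 0 and bool(mode & stat.S_ISUID)


def check_launcher(path: str) -> bool:
    """Apply the check to a binary on disk (path supplied by the caller)."""
    info = os.stat(path)
    return is_setuid_root(info.st_uid, info.st_mode)


if __name__ == "__main__":
    # 0o104755 = regular file, setuid bit, rwxr-xr-x
    print(is_setuid_root(0, 0o104755))
    print(is_setuid_root(0, 0o100755))
```

With both conditions met, the binary runs with root's effective identity regardless of who invokes it, which is what allows it to start processes under any user identity.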

13 Process Manager Performance
- High activity occurs mostly on system state transitions (i.e., boot or shutdown)
- No component needs to do polling
  - All the Launchers are blocked waiting for processes to terminate
  - Server and Launchers are in blocking reads on FIFOs
- Data transfer between Server and Launchers is basically free
  - FIFOs and mapped files
- Small file system usage
  - Each mapped file is only 32 KB
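
A mapped file of this size can be sketched in Python with the standard `mmap` module. Only the 32 KB size comes from the slide; the record layout (a pid followed by a state code), the state constant, and the function names are assumptions made for illustration.

```python
import mmap
import os
import struct
import tempfile

MANIFEST_SIZE = 32 * 1024          # 32 KB, as quoted on the slide
RECORD = struct.Struct("<iI")      # hypothetical layout: pid, state code
STATE_RUNNING = 2                  # hypothetical state code


def create_manifest(path):
    """Create a zero-filled file of the fixed manifest size."""
    with open(path, "wb") as f:
        f.truncate(MANIFEST_SIZE)


def write_status(path, pid, state):
    """Writer side: map the file and update the record in place."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), MANIFEST_SIZE) as m:
        RECORD.pack_into(m, 0, pid, state)
        m.flush()


def read_status(path):
    """Reader side: map the same file read-only and decode the record."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), MANIFEST_SIZE, access=mmap.ACCESS_READ) as m:
        return RECORD.unpack_from(m, 0)


def demo():
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "manifest")
        create_manifest(path)
        write_status(path, pid=4321, state=STATE_RUNNING)
        # A restarted reader would simply map the same file again.
        return os.path.getsize(path), read_status(path)


if __name__ == "__main__":
    print(demo())
```

Both sides see the same pages, so a status update is a memory write and needs no explicit message to transfer the data.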

14 Conclusions
- The current implementation of the Process Manager meets all the needed requirements:
  - Robustness
  - Tolerance of other system failures
  - Error recovery
  - Small usage of the host resources
- Successfully used during the ATLAS detector commissioning:
  - DAQ system testing
  - Cosmic runs
