The Process Manager in the ATLAS DAQ System
G. Avolio, M. Dobson, G. Lehmann Miotto, M. Wiesmann (CERN)

Giuseppe Avolio (CERN), RT07

Outline
- Introduction
- Process Manager requirements
  - General capabilities
  - Functionalities
- Process Manager implementation
  - Client interface
  - Server
  - Communication schema
  - Launching a process
  - Host integration
- Conclusions

Introduction
- The Process Manager (PM) is part of the distributed architecture upon which the DAQ system is built
- It is responsible for launching and controlling processes
  - It cannot rely on any other system it has to launch
  - Its failure means the loss of system control
- Distributed network (~3K machines)
- ~15K concurrent processes

Process Manager Requirements
- General capabilities
  - Start and kill processes
  - Indicate and notify any change of the process status
- General constraints
  - PM failures shall not stop data-taking activities
  - The PM has to be able to start any kind of process
  - The started process is not required to have any knowledge of the PM system

Process Manager Requirements
Functional requirements
- Ownership: processes should be started on behalf of the requesting user
- Identification: each started process should be uniquely identified
- The user should be able to:
  - Creation notification: be notified about the process creation (successful or not)
  - Verification: verify the state of the created process
  - Signaling: send POSIX signals to a started process
  - Monitoring & termination notification: be notified when a process dies (exited or signaled)

Process Manager Requirements
Functional requirements
- Client control: clients should also be able to control already started processes
  - Get process notifications too!
- Resources: the PM has to take into account SW/HW resources
  - Collaborate with a Resource Manager
- Access control: process creation and signaling should be subject to access control
  - Collaborate with an Access Manager
- Error recovery: after any failure the PM has to inherit any alive process

Process Manager Implementation
Architecture: the Client interface
- The Client module resides on the host where the requests are initiated
- It offers a user interface to the PM system
  - Process creation and control
  - Get process information
  - Get process identification
  - Tools to query a host about running processes
- Two kinds of processes:
  - Linked: process status updated by call-backs (push mode)
  - Unlinked: client can ask about process status (pull mode)

Allowed process operations

  Operation      Unlinked   Linked    Terminated
  Control        Allowed    Allowed   Disallowed
  Process Info   Allowed – local data (all states)
  Link           Allowed    Ignore    Disallowed
  Unlink         Ignore     Allowed   Ignore
  Handle         Allowed – local data (all states)

[Architecture diagram: Client, connected over CORBA to the Server side (Server, Launcher), on top of the Operating System]
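The linked (push) and unlinked (pull) modes, together with the Link and Unlink rows of the operations table, can be illustrated with a small sketch. This is not the PM client API: the class `ProcessProxy`, its methods, and the status strings are hypothetical names chosen for illustration, and the real client talks to the Server over CORBA rather than through local calls.

```python
from enum import Enum
from typing import Callable, Optional


class LinkState(Enum):
    UNLINKED = "unlinked"
    LINKED = "linked"
    TERMINATED = "terminated"


class ProcessProxy:
    """Hypothetical client-side object for one started process."""

    def __init__(self) -> None:
        self.state = LinkState.UNLINKED
        self._callback: Optional[Callable[[str], None]] = None

    def link(self, callback: Callable[[str], None]) -> None:
        # Link: allowed when unlinked, ignored when linked,
        # disallowed when terminated.
        if self.state is LinkState.TERMINATED:
            raise RuntimeError("link is disallowed on a terminated process")
        if self.state is LinkState.LINKED:
            return
        self._callback = callback
        self.state = LinkState.LINKED

    def unlink(self) -> None:
        # Unlink: allowed when linked, ignored otherwise.
        if self.state is not LinkState.LINKED:
            return
        self._callback = None
        self.state = LinkState.UNLINKED

    def push(self, status: str) -> None:
        # Push mode: a status update reaches the call-back only if linked.
        if self.state is LinkState.LINKED and self._callback is not None:
            self._callback(status)

    def pull(self, query: Callable[[], str]) -> str:
        # Pull mode: the client asks for the status explicitly.
        return query()

    def mark_terminated(self) -> None:
        # Once terminated, no further call-backs are delivered.
        self._callback = None
        self.state = LinkState.TERMINATED
```

A linked proxy receives updates through `push`, while an unlinked one has to call `pull` with some query function that stands in for a request to the Server.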

Process Manager Implementation
Architecture: the Server
- A Server instance runs on each host
- Server responsibilities:
  - Manage the process hierarchy
  - Manage call-back lists
  - Handle process resource allocations
    - Communicating with an external Resource Manager system
  - Handle user authorization
    - Communicating with an external Access Manager system
  - Publish process information
  - Interact with Clients
  - Interact with Launchers
- The Server builds the process Handle: pmg:://server/partition/application/id
  - It is used to uniquely identify a started process (needed for communication between the Client and the Server)

[Architecture diagram: Client, connected over CORBA to the Server side (Server, Launcher), on top of the Operating System]
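As a rough illustration of how a four-part Handle of this shape could be built and parsed, here is a minimal sketch. The prefix is copied verbatim from the slide and may not match the real scheme separator; the `Handle` class, its field names, and the validation rules are assumptions, not the PM's actual data structure.

```python
from dataclasses import dataclass

# Copied verbatim from the slide; the actual separator may differ.
HANDLE_PREFIX = "pmg:://"


@dataclass(frozen=True)
class Handle:
    """Hypothetical value object for server/partition/application/id."""
    server: str
    partition: str
    application: str
    proc_id: str

    def __str__(self) -> str:
        parts = (self.server, self.partition, self.application, self.proc_id)
        return HANDLE_PREFIX + "/".join(parts)


def parse_handle(text: str) -> Handle:
    """Split a handle string back into its four components."""
    if not text.startswith(HANDLE_PREFIX):
        raise ValueError(f"not a process handle: {text!r}")
    parts = text[len(HANDLE_PREFIX):].split("/")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected four non-empty components: {text!r}")
    return Handle(*parts)
```

Because the host name is the first component, a client holding only the string can tell which Server to contact for that process.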

Process Manager Implementation
Architecture: the Launcher
- The Launcher is the component handling low-level process management
- Why a separate process?
  - A large CORBA-based program running as root would represent a security risk
  - root privileges are needed to:
    - start processes under an arbitrary user identity
    - wait on an arbitrary process to exit
  - It makes it easier for the Server to reattach to running processes after a failure
  - Small component with less chance of crashing and a better abstraction layer
- Launcher tasks:
  - Start the process with correct parameters
  - Monitor the process for termination
  - Send control signals to the process
  - Transmit process status information to the Server

[Architecture diagram: Client, connected over CORBA to the Server side (Server, Launcher), on top of the Operating System]
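The start, monitor, and signal tasks map onto the usual POSIX fork/exec/wait pattern. The sketch below shows that pattern in Python and is not the Launcher's actual code: the function names are hypothetical, and it leaves out switching to the requesting user's identity (which needs root) and reporting status back to the Server.

```python
import os
import signal
from typing import List, Tuple


def launch(argv: List[str]) -> int:
    """Fork and exec argv; return the child's pid to the caller."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Reached only if exec failed; exit without running parent cleanup.
            os._exit(127)
    return pid


def wait_for_exit(pid: int) -> Tuple[str, int]:
    """Block until the child terminates and report how it died."""
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return ("signaled", os.WTERMSIG(status))
    return ("exited", os.WEXITSTATUS(status))


def send_signal(pid: int, signum: int = signal.SIGTERM) -> None:
    """Deliver a POSIX signal to the started process."""
    os.kill(pid, signum)
```

Blocking in `waitpid` is what lets a component of this kind sit idle until its child terminates, and the exited/signaled distinction matches the two ways a process can die named in the requirements.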

Process Manager Implementation
Communication schema
- CORBA
  - Client – Server
  - Server – other DAQ services
- Mapped files (manifests)
  - Shared between Server and Launcher
  - Contain process description and status
- Named pipes (FIFOs)
  - Server → Launcher: send command
  - Launcher → Server: update process status
- OS system calls
  - Launcher – Process: wait and signal

[Diagram: PM Client, PM Server, Access Manager, Resource Manager, three PM Launchers and a Process, with links labeled CORBA, Mapped Files, Named Pipe and Child Status]

File-system-resident objects (mapped files and FIFOs) allow the PM Server to inherit running processes after a failure.
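To illustrate why a file-backed manifest survives a Server restart, here is a minimal sketch using a memory-mapped file. The record layout (a pid followed by a 16-byte state string) and all function names are invented for this example; the 32 KB size is the figure quoted on the performance slide.

```python
import mmap
import struct
from typing import Tuple

MANIFEST_SIZE = 32 * 1024  # size quoted on the performance slide

# Hypothetical layout: 4-byte pid, then a 16-byte NUL-padded state name.
_RECORD = struct.Struct("<i16s")


def write_status(path: str, pid: int, state: str) -> None:
    """Update the record in place through a shared mapping."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), MANIFEST_SIZE) as m:
        _RECORD.pack_into(m, 0, pid, state.encode("ascii"))
        m.flush()


def create_manifest(path: str, pid: int, state: str) -> None:
    """Create a fixed-size manifest file and store the first record."""
    with open(path, "wb") as f:
        f.truncate(MANIFEST_SIZE)
    write_status(path, pid, state)


def read_status(path: str) -> Tuple[int, str]:
    """Re-read the record, as a restarted reader would after a failure."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), MANIFEST_SIZE, access=mmap.ACCESS_READ) as m:
            pid, raw = _RECORD.unpack_from(m, 0)
    return pid, raw.rstrip(b"\0").decode("ascii")
```

Since the state lives in a file rather than in the memory of one process, any process that opens the same path later sees the last status written.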

Process Manager Implementation
Launch and Process States

[Diagram: start procedure. Steps and participants as labeled: Client Application / Client: create a process description; Server: ask authorization (Access Manager), acquire resources (Resource Manager), create reference & manifest (file system), start the Launcher]

[Diagram: process states. States: Requested, Created, Running, Exited, Signaled, Sync Error, Fail. Transition labels: Start, Exec, Error, Exec Timeout, Exit, Signal, Run]

Process Manager Implementation
Host integration
- The PM needs to launch any process belonging to the DAQ infrastructure
  - It is started automatically as a system service
- Processes can be started with any user identity
  - The Launcher binary has to belong to root and have the setuid bit set
- The PM runs on Linux and is adapted to the OS constraints
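The ownership and setuid requirement can be verified from a file's metadata. The following sketch shows one way to do that check; the function names are hypothetical, and the path passed to `check_binary` is whatever path the binary is installed under, which the slides do not give.

```python
import os
import stat


def is_setuid_root(uid: int, mode: int) -> bool:
    """True if the owner is root (uid 0) and the setuid bit is set."""
    return uid == 0 and bool(mode & stat.S_ISUID)


def check_binary(path: str) -> bool:
    """Apply the check to a file on disk."""
    st = os.stat(path)
    return is_setuid_root(st.st_uid, st.st_mode)
```

With both conditions met, the kernel runs the binary with root's effective user ID regardless of who starts it, which is what allows it to start processes under another user's identity.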

Process Manager Performance
- High activity mostly on system state transitions (i.e., boot or shutdown)
- No component needs to do polling
  - All the Launchers are blocked waiting for processes to terminate
  - Server and Launchers are in blocking reads on FIFOs
- Basically free data transfer between Server and Launchers
  - FIFOs and mapped files
- Small file system usage
  - Mapped file of only 32 KB
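The "no polling" point rests on blocking reads: a reader of a FIFO sleeps in the kernel until a writer delivers data. A minimal sketch of that pattern follows. The function names and the one-line text message are assumptions; the actual message format exchanged between Server and Launcher is not described in the slides.

```python
import os


def make_fifo(path: str) -> None:
    """Create a named pipe in the file system."""
    os.mkfifo(path)


def send_update(fifo_path: str, message: str) -> None:
    """Writer side: open() blocks until a reader has the FIFO open."""
    with open(fifo_path, "w") as f:
        f.write(message + "\n")


def wait_for_update(fifo_path: str) -> str:
    """Reader side: blocks in open()/readline() without consuming CPU."""
    with open(fifo_path, "r") as f:
        return f.readline().rstrip("\n")
```

In a quick experiment the two sides can run in separate threads or processes; the reader returns as soon as the writer has sent its line, with no sleep-and-retry loop on either side.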

Conclusions
- The current implementation of the Process Manager meets all the requirements
  - Robustness
  - Tolerance of other system failures
  - Error recovery
  - Small usage of the host resources
- Successfully used during
  - the ATLAS detector commissioning
  - DAQ system testing
  - Cosmic runs