Presentation is loading. Please wait.

Presentation is loading. Please wait.

Controlling a large CPU farm using industrial tools

Similar presentations


Presentation on theme: "Controlling a large CPU farm using industrial tools"— Presentation transcript:

1 Controlling a large CPU farm using industrial tools
Alba Sambade Varela On behalf of the LHCb Online group Real Time 2009 May IHEP Beijing

2 ECS and data flow in LHCb
High Level Trigger Farm Second level of trigger input rate: 1MHz output rate: 2kHz Each server in the event filter farm runs one event-builder process, which distributes the assembled events to trigger processes. There will be as many trigger-processes as there are CPU cores. Each server also runs one instance of the data-writer, which sends accepted events to the streaming and formatting layers (below). Data are not actually moved, but put into a shared memory area. Descriptors to the events are passed between processes by means of a shared buffer library. All processes are implemented using the standard LHCb software frame-work, GAUDI, where some of the services have been specifically re-implemented for the Online environment. These include the logging service, eventreading and writing layers. These are described in detail in [7]. The selection algorithms in the event filter farm reduce the incoming data rate of 1 MHz to an output rate of 2 to 5 kHz, following the luminosity of the accelerator. RTSA2-3: A 40MHz Trigger-Free Readout Architecture for the LHCb Experiment at CERN RT2009 Alba Sambade Varela

3 Experiment Control System
ECS is in charge of the configuration, operation and supervision of all the online components in LHCb. Industrial SCADA system: PVSS FSM package Definition in terms of hierarchies of Finite State Machines. Distributed Information Management System (DIM) ECS Large amount of software processes + very different hardware devices Distributed hierarchical integration STORAGE HV DCS DAQ TFC HLT SubFarmN SubFarm2 See talk CMS1-1 An Integrated Control System for the LHCb experiment SubFarm1 RT2009 Alba Sambade Varela

4 What do we have to control?
Each server in the event filter farm runs one event-builder process, which distributes the assembled events to trigger processes. There will be as many trigger-processes as there are CPU cores. Each server also runs one instance of the data-writer, which sends accepted events to the streaming and formatting layers (below). Data are not actually moved, but put into a shared memory area. Descriptors to the events are passed between processes by means of a shared buffer library. All processes are implemented using the standard LHCb software frame-work, GAUDI, where some of the services have been specifically re-implemented for the Online environment. These include the logging service, eventreading and writing layers. These are described in detail in [7]. The selection algorithms in the event filter farm reduce the incoming data rate of 1 MHz to an output rate of 2 to 5 kHz, following the luminosity of the accelerator. RT2009 Alba Sambade Varela

5 High Level Trigger CPU Farm
Composed out of up to 2000 boxes (= nodes) grouped into 50 subfarms (50 racks) up to 40 nodes/subfarm 8 cores/node * 1 HLT algorithm running/core HLT * Nowadays RT2009 Alba Sambade Varela

6 HLT Top level control … HLT control PC Control PC per sub-farm
Running main HLT project Control PC per sub-farm Running its own control project Corresponds to a control unit in ECS Control levels HLT Farm Control Sub-farm Control Units ECS HLT Farm Nodes STORAGE DCS DAQ HV TFC HLT SubFarmN SubFarm2 SubFarm1 NodeN Node2 Node1 RT2009 Alba Sambade Varela

7 HLT low level control Logical aggregation
Functional algorithms Division of HLT node by tasks RT2009 Alba Sambade Varela

8 Task architecture on HLT node
Each node in the EFF runs one event-builder process. It distributes the assembled events to trigger processes. As many trigger-processes as there are CPU cores. Compute trigger decision and declare accepted events. Each node also runs one instance of the data-writer. Sends accepted events to Storage system. Algorithms implemented with GAUDI. GAUDI: data processing experiment independent framework. Same software used as for offline analysis. Event flow Readout network Event Building Trigger Trigger Trigger Trigger Trigger Trigger Trigger Trigger DataWriter Storage RT2009 Alba Sambade Varela

9 HLT Tasks control Algorithms are treated by the ECS as hardware devices. Integrated as Device Unit (DU) DU behavior modeled with Finite State Machines Common state diagram for all algorithms Transitions mapped to Gaudi transitions Command parameters UNKNOWN Task dies Load Unload Recover NOT_READY Configure Reset READY Start Stop RUNNING ERROR RT2009 Alba Sambade Varela

10 Communication layer Communication via Distributed Information Management System (DIM) Communication mechanism based on client-server paradigm Task behaves as a DIM server Publishes services Algorithm status Counters Receives commands ECS HLT task Controls based on HLT naming convention Sub-farm (row and index) UTGID (task name) RT2009 Alba Sambade Varela

11 Sub-Farm Control GUI RT2009 Alba Sambade Varela

12 Sub-farm control Hierarchical ECS main control + dynamic allocation Easy to include new farms into global system Single sub-farm controls as framework package + base on naming convention Easy to duplicate farm control system. ECS DAQ HLT TFC SubFarmN SubFarm2 SubFarm1 RT2009 Alba Sambade Varela

13 Partitioning Several sub-detectors (many teams)
Different possible configurations (commissioning stage). Possibility to modify dynamically the readout components included to control. Different running modes (Physics, Calibration, Cosmics, etc). Run_type parameter sent with “Configure” command Pool of sub-farms Dynamic allocation Different instances of run control running in parallel (readout partitions) RT2009 Alba Sambade Varela

14 Farm pool controls RT2009 Alba Sambade Varela
Add bubles: MUON and MUONC are allocated, there´s 1farm removed from pool, color coding… RT2009 Alba Sambade Varela

15 Conclusions HLT control completely defined and integrated into global LHCb control system. Implemented with same toolkit (PVSS & FSM) used through the ECS. Keeps Homogeneity Fully configurable at real time. Automatic control of processes for the shift operator. RT2009 Alba Sambade Varela


Download ppt "Controlling a large CPU farm using industrial tools"

Similar presentations


Ads by Google