Controlling a large CPU farm using industrial tools

Presentation transcript:

Controlling a large CPU farm using industrial tools
Alba Sambade Varela, on behalf of the LHCb Online group
Real Time 2009, May 10-15, IHEP Beijing

ECS and data flow in the LHCb High Level Trigger Farm
- Second level of trigger: input rate 1 MHz, output rate ~2 kHz
See talk RTSA2-3: A 40 MHz Trigger-Free Readout Architecture for the LHCb Experiment at CERN

Experiment Control System
- The ECS is in charge of the configuration, operation and supervision of all the online components in LHCb.
- Industrial SCADA system: PVSS
- FSM package: definition in terms of hierarchies of Finite State Machines
- Distributed Information Management System (DIM)
- Large amount of software processes + very different hardware devices: distributed, hierarchical integration
[Hierarchy diagram: ECS controlling DCS, HV, DAQ, TFC, STORAGE, …, HLT; HLT split into SubFarm1, SubFarm2, …, SubFarmN]
See talk CMS1-1: An Integrated Control System for the LHCb Experiment

What do we have to control?
- Each server in the event filter farm runs one event-builder process, which distributes the assembled events to trigger processes. There will be as many trigger processes as there are CPU cores.
- Each server also runs one instance of the data-writer, which sends accepted events to the streaming and formatting layers. Data are not actually moved, but put into a shared memory area; descriptors to the events are passed between processes by means of a shared buffer library.
- All processes are implemented using the standard LHCb software framework, GAUDI, where some of the services have been specifically re-implemented for the Online environment. These include the logging service and the event-reading and writing layers, described in detail in [7].
- The selection algorithms in the event filter farm reduce the incoming data rate of 1 MHz to an output rate of 2 to 5 kHz, following the luminosity of the accelerator.

High Level Trigger CPU Farm
- Composed of up to 2000 boxes (= nodes), grouped into 50 subfarms (50 racks)
- Up to 40 nodes per subfarm
- 8 cores per node (at present), with 1 HLT algorithm running per core
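For scale, and assuming a fully populated farm, these numbers imply: 50 subfarms × 40 nodes/subfarm = 2000 nodes, and 2000 nodes × 8 cores/node ≈ 16000 HLT trigger processes running in parallel.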

HLT Top-level control
- HLT control PC: runs the main HLT project
- Control PC per sub-farm: runs its own control project; corresponds to a control unit in the ECS
Control levels: ECS → HLT Farm Control → Sub-farm Control Units → HLT Farm Nodes
[Hierarchy diagram: ECS controlling STORAGE, DCS, DAQ, HV, TFC, …, HLT; HLT → SubFarm1, SubFarm2, …, SubFarmN; each sub-farm → Node1, Node2, …, NodeN]
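In such a control tree, commands propagate downwards and states are summarized upwards. The following Python sketch illustrates that idea only; the class names, state names and the aggregation rule ("any child in ERROR makes the parent ERROR, otherwise the least advanced child state wins") are assumptions for illustration, not the actual PVSS/FSM implementation.

```python
# Illustrative sketch of hierarchical state aggregation in a control tree.
# State names and aggregation rules are assumptions, not the real FSM code.

STATE_ORDER = ["UNKNOWN", "ERROR", "NOT_READY", "READY", "RUNNING"]

class DeviceUnit:
    def __init__(self, name, state="NOT_READY"):
        self.name, self._state = name, state

    def state(self):
        return self._state

    def send_command(self, command):
        # simplistic command -> state mapping, enough for the sketch
        mapping = {"Configure": "READY", "Start": "RUNNING",
                   "Stop": "READY", "Reset": "NOT_READY"}
        self._state = mapping.get(command, self._state)

class ControlUnit:
    def __init__(self, name, children):
        self.name, self.children = name, children

    def state(self):
        child_states = [c.state() for c in self.children]
        if "ERROR" in child_states:
            return "ERROR"
        # the least advanced child defines the state of the parent
        return min(child_states, key=STATE_ORDER.index)

    def send_command(self, command):
        for c in self.children:          # commands propagate down the tree
            c.send_command(command)

# Example: one sub-farm with two nodes, below the HLT control unit
subfarm = ControlUnit("SubFarm1", [DeviceUnit("Node1"), DeviceUnit("Node2")])
hlt = ControlUnit("HLT", [subfarm])
hlt.send_command("Configure")
print(hlt.state())                       # READY once all nodes are configured
```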

HLT low-level control
- Logical aggregation
- Functional algorithms
- Division of the HLT node by tasks

Task architecture on HLT node
- Each node in the EFF runs one event-builder process, which distributes the assembled events to the trigger processes.
- There are as many trigger processes as there are CPU cores; they compute the trigger decision and declare accepted events.
- Each node also runs one instance of the data-writer, which sends accepted events to the Storage system.
- Algorithms are implemented with GAUDI, the experiment-independent data-processing framework: the same software is used as for offline analysis.
[Event flow diagram: Readout network → Event Building → Trigger processes (one per core) → DataWriter → Storage]
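To make the descriptor-passing idea concrete (events stay in shared memory, only small descriptors travel between processes), here is a minimal Python sketch. The buffer layout, slot size and field names are invented for the example and do not reflect the actual LHCb shared buffer library.

```python
# Sketch: events live in shared memory; processes exchange only descriptors.
# Layout and names are illustrative, not the LHCb buffer-manager library.
from multiprocessing import Process, Queue, shared_memory

EVENT_SIZE = 1024          # assumed fixed-size event slots for simplicity
N_SLOTS = 16

def builder(desc_queue, shm_name):
    shm = shared_memory.SharedMemory(name=shm_name)
    for slot in range(N_SLOTS):
        offset = slot * EVENT_SIZE
        shm.buf[offset:offset + EVENT_SIZE] = bytes([slot % 256]) * EVENT_SIZE
        desc_queue.put({"slot": slot, "offset": offset, "size": EVENT_SIZE})
    desc_queue.put(None)               # end-of-data marker
    shm.close()

def trigger(desc_queue, accepted_queue, shm_name):
    shm = shared_memory.SharedMemory(name=shm_name)
    while (desc := desc_queue.get()) is not None:
        data = bytes(shm.buf[desc["offset"]:desc["offset"] + desc["size"]])
        if data[0] % 2 == 0:           # stand-in for the real trigger decision
            accepted_queue.put(desc)
    accepted_queue.put(None)
    shm.close()

if __name__ == "__main__":
    shm = shared_memory.SharedMemory(create=True, size=N_SLOTS * EVENT_SIZE)
    descs, accepted = Queue(), Queue()
    procs = [Process(target=builder, args=(descs, shm.name)),
             Process(target=trigger, args=(descs, accepted, shm.name))]
    for p in procs:
        p.start()
    while (d := accepted.get()) is not None:
        print("data-writer would stream event in slot", d["slot"])
    for p in procs:
        p.join()
    shm.close()
    shm.unlink()
```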

HLT Tasks control
- Algorithms are treated by the ECS as hardware devices, integrated as Device Units (DU).
- DU behaviour is modeled with Finite State Machines; a common state diagram is used for all algorithms.
- FSM transitions are mapped to Gaudi transitions; commands can carry parameters.
[State diagram: states UNKNOWN, NOT_READY, READY, RUNNING, ERROR; transitions Load/Unload (UNKNOWN ↔ NOT_READY), Configure/Reset (NOT_READY ↔ READY), Start/Stop (READY ↔ RUNNING), Recover (from ERROR); a dying task falls back to UNKNOWN]
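A minimal sketch of this common task state machine, following the transitions named on the state diagram. The Python class is illustrative only, not the actual Gaudi/FSM code; the recovery target and the error handling of unexpected commands are assumptions.

```python
# Illustrative finite state machine for an HLT task (not the real Gaudi/FSM code).
class TaskFSM:
    # (state, command) -> next state, following the slide's state diagram
    TRANSITIONS = {
        ("UNKNOWN",   "Load"):      "NOT_READY",
        ("NOT_READY", "Unload"):    "UNKNOWN",
        ("NOT_READY", "Configure"): "READY",
        ("READY",     "Reset"):     "NOT_READY",
        ("READY",     "Start"):     "RUNNING",
        ("RUNNING",   "Stop"):      "READY",
        ("ERROR",     "Recover"):   "NOT_READY",   # assumed recovery target
    }

    def __init__(self):
        self.state = "UNKNOWN"

    def handle(self, command, **parameters):
        # commands may carry parameters, e.g. Configure(run_type="PHYSICS")
        key = (self.state, command)
        if key not in self.TRANSITIONS:
            self.state = "ERROR"       # unexpected command: flag an error (assumption)
            return self.state
        self.state = self.TRANSITIONS[key]
        return self.state

    def task_died(self):
        self.state = "UNKNOWN"         # a dying task falls back to UNKNOWN

fsm = TaskFSM()
for cmd in ("Load", "Configure", "Start", "Stop"):
    print(cmd, "->", fsm.handle(cmd))
```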

Communication layer
- Communication via the Distributed Information Management System (DIM), a communication mechanism based on the client-server paradigm.
- Each task behaves as a DIM server: it publishes services (algorithm status, counters) and receives commands from the ECS.
- The controls address a task via the HLT naming convention: sub-farm (row and index) and UTGID (task name).
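The sketch below only illustrates the naming-convention idea, i.e. how a controller could derive service and command names of a task from the sub-farm row and index and the task's UTGID. The exact name format, the sub-farm label "HLTA01" and the UTGID "GauchoJob_1" are invented for the example; no real DIM API calls are shown.

```python
# Illustrative name construction from the HLT naming convention.
# The formats below are assumptions, not the actual LHCb conventions.

def subfarm_name(row: str, index: int) -> str:
    return f"HLT{row.upper()}{index:02d}"           # hypothetical, e.g. HLTA01

def task_service(subfarm: str, node: int, utgid: str, item: str) -> str:
    # one published service per item (status, counters, ...)
    return f"{subfarm}_{node:02d}/{utgid}/{item}"

def task_command(subfarm: str, node: int, utgid: str) -> str:
    return f"{subfarm}_{node:02d}/{utgid}/COMMAND"

sf = subfarm_name("a", 1)
print(task_service(sf, 3, "GauchoJob_1", "status"))   # HLTA01_03/GauchoJob_1/status
print(task_command(sf, 3, "GauchoJob_1"))             # HLTA01_03/GauchoJob_1/COMMAND
```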

Sub-Farm Control GUI

Sub-farm control
- Hierarchical ECS main control + dynamic allocation: easy to include new farms into the global system.
- A single sub-farm control, packaged as a framework component and based on the naming convention: easy to duplicate the farm control system.
[Hierarchy diagram: ECS → DAQ, TFC, HLT, …; HLT → SubFarm1, SubFarm2, …, SubFarmN]

Partitioning
- Several sub-detectors (many teams) and different possible configurations (commissioning stage): the readout components included in the control can be modified dynamically.
- Different running modes (Physics, Calibration, Cosmics, etc.): a run_type parameter is sent with the "Configure" command.
- Pool of sub-farms with dynamic allocation: different instances of the run control run in parallel (readout partitions).
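A minimal sketch of the pool/allocation idea: readout partitions (parallel run-control instances) take sub-farms from a common pool when configured and return them when released. The class, partition names and allocation policy are illustrative assumptions.

```python
# Illustrative sub-farm pool with dynamic allocation to readout partitions.
class SubFarmPool:
    def __init__(self, subfarms):
        self.free = set(subfarms)
        self.allocated = {}                    # partition name -> set of sub-farms

    def allocate(self, partition, n):
        if len(self.free) < n:
            raise RuntimeError(f"only {len(self.free)} sub-farms left in the pool")
        chosen = {self.free.pop() for _ in range(n)}
        self.allocated.setdefault(partition, set()).update(chosen)
        return chosen

    def release(self, partition):
        self.free |= self.allocated.pop(partition, set())

    def remove_from_pool(self, subfarm):
        self.free.discard(subfarm)             # e.g. a sub-farm taken out for maintenance

pool = SubFarmPool(f"SubFarm{i:02d}" for i in range(1, 51))
pool.remove_from_pool("SubFarm07")             # one farm removed from the pool
muon  = pool.allocate("MUON", 4)               # run-control instance 1
muonc = pool.allocate("MUONC", 2)              # run-control instance 2
print(sorted(muon), sorted(muonc), len(pool.free))
```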

Farm pool controls
[Screenshot of the farm-pool panel: the MUON and MUONC partitions have sub-farms allocated, one farm is removed from the pool, and colour coding indicates the state of each sub-farm]

Conclusions
- The HLT control is completely defined and integrated into the global LHCb control system.
- It is implemented with the same toolkit (PVSS & FSM) used throughout the ECS, which keeps the system homogeneous.
- Fully configurable in real time.
- Automatic control of the processes for the shift operator.