Clara Gaspar, April 2012 The LHCb Experiment Control System: Automation concepts & tools.

Clara Gaspar, April 2012 The LHCb Experiment Control System: Automation concepts & tools

Clara Gaspar, April 2012 LHCb Experiment
- LHCb is one of the four LHC experiments, specialized in detecting differences between matter and anti-matter
- Several sub-detectors; p-p collisions every 25 ns
- Data are collected by a large Data Acquisition System and filtered by the Trigger System (~1500 PCs)

Clara Gaspar, April 2012 The Experiment Control System
[Diagram: the Experiment Control System spans the whole experiment: the data path from the Detector Channels through the Front End Electronics, L0, TFC, Readout Network, HLT Farm, Monitoring Farm and Storage (DAQ); the DCS devices (HV, LV, GAS, Cooling, etc.); and the External Systems (LHC, Technical Services, Safety, etc.)]
- The Experiment Control System is in charge of the Control and Monitoring of all areas of the experiment

Clara Gaspar, April 2012 Some Requirements
- Large number of devices/IO channels
  -> Need for Distributed Hierarchical Control: de-composition in systems, sub-systems, ..., devices; local decision capabilities in sub-systems
- Large number of independent teams and very different operation modes
  -> Need for Partitioning Capabilities (concurrent usage)
- High complexity & non-expert operators
  -> Need for Full Automation of standard procedures and error-recovery procedures, and for intuitive user interfaces

Clara Gaspar, April 2012 Design Steps
In order to achieve an integrated system:
- Promoted HW standardization (so that common components could be re-used). Ex.: mainly two control interfaces to all LHCb electronics: Credit-Card sized PCs (CCPC) for non-radiation zones, and a serial protocol (SPECS) for electronics in radiation areas
- Defined an Architecture that could fit all areas and all aspects of the monitoring and control of the full experiment
- Provided a Framework: an integrated collection of guidelines, tools and components that allowed the development of each sub-system coherently, in view of its integration in the complete system

Clara Gaspar, April 2012 Generic SW Architecture
[Diagram: the generic control tree. The ECS Control Unit sits at the top; below it are per-domain Control Units (DCS, DAQ, HLT, TFC, LHC, Infrastructure), then per-sub-detector Control Units (e.g. SubDet1 DCS with LV, TEMP, GAS sub-systems; SubDet1 DAQ with FEE and RO sub-systems), down to the Device Units (LV Dev1..DevN, FEE Dev1..DevN). Commands flow down the tree; status & alarms flow up.]

Clara Gaspar, April 2012 The Control Framework
The JCOP* Framework is based on:
- SCADA System (PVSS II) for:
  - Device Description (Run-time Database)
  - Device Access (OPC, Profibus, drivers) + DIM
  - Alarm Handling (Generation, Filtering, Masking, etc.)
  - Archiving, Logging, Scripting, Trending
  - User Interface Builder
  - Alarm Display, Access Control, etc.
- SMI++ providing:
  - Abstract behaviour modeling (Finite State Machines)
  - Automation & Error Recovery (Rule-based system)
* JCOP: the Joint COntrols Project (between the 4 LHC experiments and the CERN Control Group)

Clara Gaspar, April 2012 Device Units
- Device Units provide access to real devices. The FW provides interfaces to all necessary types of devices:
  - LHCb devices: HV channels, Readout boards, trigger processes running in the HLT farm, monitoring tasks for data quality, etc.
  - External devices: the LHC, a gas system, etc.
- Each device is modeled as a Finite State Machine: its main interface to the outside world is a State and a (small) set of Actions.

Clara Gaspar, April 2012 Hierarchical Control
Each Control Unit:
- Is defined as one or more Finite State Machines; its interface to the outside is also a state and actions
- Can implement rules based on its children's states
- In general it is able to:
  - Include/Exclude children (Partitioning): excluded nodes can run in stand-alone
  - Implement specific behaviour & take local decisions
  - Sequence & automate operations
  - Recover errors
  - Interface with users: present information and receive commands
(Example tree: DCS -> Muon DCS, Tracker DCS, ...; Muon DCS -> Muon LV, Muon GAS, ...)
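As a flavour of what such child-based rules look like in the SMI++ language used later in these slides, here is a minimal, purely illustrative sketch (the object names are invented, this is not the actual LHCb code): a Muon DCS control unit derives its own state from its children and propagates a command to them.

   object: MUON_DCS
      state: NOT_READY
         when ( ( MUON_LV in_state READY ) and
                ( MUON_GAS in_state READY ) ) move_to READY
         action: GET_READY
            do GET_READY MUON_LV
            do GET_READY MUON_GAS
      state: READY
         when ( ( MUON_LV in_state ERROR ) or
                ( MUON_GAS in_state ERROR ) ) move_to ERROR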

Clara Gaspar, April 2012 FW - Graphical Editor
[Screenshot: the Framework's graphical FSM editor, showing SMI++ objects with their States & Actions, and support for parallelism, synchronization and asynchronous rules]

Clara Gaspar, April 2012 FW - Run-Time
- Dynamically generated operation panels (uniform look and feel)
- Configurable user panels and logos
- Embedded standard partitioning rules: Take, Include, Exclude, etc.

Clara Gaspar, April 2012 Operation Domains
FSM templates were distributed to all sub-detectors; all devices and sub-systems have been implemented using one of these templates:
- DCS Domain: equipment operation related to a running period (Ex: GAS, Cooling). States: NOT_READY, OFF, READY, ERROR; actions: Switch_ON, Switch_OFF, Recover
- HV Domain: equipment operation related to the LHC state (Ex: High Voltages). States: OFF, STANDBY1, STANDBY2, READY, NOT_READY, ERROR, plus the transitional RAMPING_STANDBY1, RAMPING_STANDBY2, RAMPING_READY; actions: Go_STANDBY1, Go_STANDBY2, Go_READY, Recover
- DAQ Domain: equipment operation related to a RUN (Ex: RO board, HLT process). States: NOT_READY, CONFIGURING, READY, RUNNING, ERROR, UNKNOWN; actions: Configure, Start, Stop, Reset, Recover
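To make the templates concrete, here is a rough, hand-written SML sketch of the DAQ-domain state machine. It only illustrates the states and actions listed above; it is not the actual Framework template, and CHILDREN is an invented object set standing for the sub-system's children.

   objectset: CHILDREN

   object: SUBDET_DAQ
      state: NOT_READY
         action: Configure
            do Configure all_in CHILDREN
            move_to CONFIGURING
      state: CONFIGURING
         when ( any_in CHILDREN in_state ERROR ) move_to ERROR
         when ( all_in CHILDREN in_state READY ) move_to READY
      state: READY
         when ( any_in CHILDREN in_state ERROR ) move_to ERROR
         action: Start
            do Start all_in CHILDREN
            move_to RUNNING
         action: Reset
            do Reset all_in CHILDREN
            move_to NOT_READY
      state: RUNNING
         when ( any_in CHILDREN in_state ERROR ) move_to ERROR
         action: Stop
            do Stop all_in CHILDREN
            move_to READY
      state: ERROR
         action: Recover
            do Recover all_in CHILDREN
            move_to NOT_READY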

Clara Gaspar, April 2012 ECS - Automation
Some examples:
- HLT Control (~1500 PCs): automatically excludes misbehaving PCs (within limits); can (re)include PCs at run-time (they get automatically configured and started)
- RunControl: automatically detects and recovers sub-detector desynchronizations; can reset sub-detectors when problems are detected by monitoring
- AutoPilot: knows how to start and keep a run going from any state
- BigBrother: based on the LHC state, controls sub-detector voltages and the VELO closure

Clara Gaspar, April 2012 Run Control
[Screenshot: the Run Control panel. Its matrix (Activity = Domain x Sub-Detector) is used for configuring all sub-systems]

Clara Gaspar, April 2012 LHCb Operations
- Two operators on shift: a Data Manager and a Shift Leader
- The Shift Leader has two views of the system:
  - Run Control
  - Big Brother, which manages LHC dependencies: sub-detector voltages and VELO closing

Clara Gaspar, April 2012 ECS: Some Numbers
Size of the Control Tree:
- Distributed over ~150 PCs: ~100 Linux (50 for the HLT), ~50 Windows
- >2000 Control Units
- >50000 Device Units
Run Control timing:
- Cold start to RUNNING: 4 minutes, to configure all sub-detectors and to start & configure ~40000 HLT processes (always done well before PHYSICS)
- Stop/Start run: 6 seconds

Clara Gaspar, April 2012 ECS Summary
- LHCb has designed and implemented a coherent and homogeneous control system
- The Experiment Control System makes it possible to:
  - Configure, monitor and operate the full experiment
  - Run any combination of sub-detectors in parallel, in standalone
- Some of its main features (partitioning, sequencing, error recovery, automation) come from the usage of SMI++ (integrated with PVSS)
- LHCb operations are now almost completely automated: the operator task is easier (basically only confirmations) and the DAQ efficiency has improved to ~98%

Clara Gaspar, April 2012 SMI++ A Tool for the Automation of large distributed control systems

Clara Gaspar, April 2012 SMI++ Method
- Classes and Objects: allow the decomposition of a complex system into smaller, manageable entities
- Finite State Machines: allow the modeling of the behaviour of each entity, and of the interaction between entities, in terms of STATES and ACTIONS
- Rule-based reasoning: reacts to asynchronous events (allows Automation and Error Recovery)

Clara Gaspar, April 2012 SMI++ Method (cont.)
- SMI++ objects can be:
  - Abstract (e.g. a Run or the DCS system)
  - Concrete (e.g. a power supply or a temperature sensor)
- Concrete objects are implemented externally, either in C, C++ or PVSS
- Logically related objects can be grouped inside "SMI domains" representing a given sub-system

Clara Gaspar, April 2012 SMI++ Run-time Environment
- Device level: Proxies (C, C++ or PVSS ctrl scripts) drive the hardware: they deduce the state and handle commands
- Abstract levels: SMI domains with their internal objects implement the logical model, described in a dedicated language
- User Interfaces: for user interaction

Clara Gaspar, April 2012 SMI++ - The Language
SML: the State Management Language
- Finite state logic: objects are described as FSMs; their main attribute is a STATE
- Parallelism: actions can be sent in parallel to several objects
- Synchronization and sequencing: the user can also wait until actions finish before sending the next one
- Asynchronous rules: actions can be triggered by logical conditions on the state of other objects

Clara Gaspar, April 2012 SML Example
[The slide shows two SML code fragments: the declaration of a concrete Device object and of an abstract Sub-System object that controls it]
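The code on the original slide is an image and is not reproduced in the transcript. Purely as a sketch of what such a Device/Sub-System pair can look like in SML (all names are invented, not the actual LHCb code):

   class: HV_CHANNEL_CLASS /associated
      state: OFF
         action: GO_READY
      state: READY
         action: GO_OFF
      state: ERROR
         action: RECOVER
   object: HV_CHANNEL is_of_class HV_CHANNEL_CLASS

   object: HV_SUBSYSTEM
      state: NOT_READY
         when ( HV_CHANNEL in_state ERROR ) move_to ERROR
         action: GET_READY
            do GO_READY HV_CHANNEL
            if ( HV_CHANNEL in_state READY ) then
               move_to READY
            endif
            move_to ERROR
      state: READY
         when ( HV_CHANNEL in_state ERROR ) move_to ERROR
         action: GO_OFF
            do GO_OFF HV_CHANNEL
            move_to NOT_READY
      state: ERROR
         action: RECOVER
            do RECOVER HV_CHANNEL
            move_to NOT_READY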

Clara Gaspar, April 2012 SML Example (many objects)
[The slide shows SML code for several Device objects and a Sub-System object that operates on them through an object set]
- Objects can be dynamically included/excluded in a Set
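Again the original code is an image; a hedged sketch of the same idea with an object set (invented names), where the sub-system acts on all devices at once and reacts if any of them fails. Objects can later be added to or removed from the set with the insert/remove instructions shown in the spare slides.

   objectset: HV_CHANNELS {HV_CHANNEL_1, HV_CHANNEL_2, HV_CHANNEL_3}

   object: HV_SUBSYSTEM
      state: NOT_READY
         action: GET_READY
            do GO_READY all_in HV_CHANNELS
            if ( all_in HV_CHANNELS in_state READY ) then
               move_to READY
            endif
            move_to ERROR
      state: READY
         when ( any_in HV_CHANNELS in_state ERROR ) move_to ERROR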

Clara Gaspar, April 2012 SML Example (automation)
[The slide shows SML code for an external device (an object in another domain) and a Sub-System that reacts to its state]
- Objects in different domains can be addressed by: <domain_name>::<object_name>
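The code is again an image in the original; below is only a rough sketch of automation driven by an object in another domain. The domain and object names (LHC::LHC_STATE, HV_SUBSYSTEMS, BIG_BROTHER) are invented for illustration, loosely inspired by the BigBrother example earlier in the talk.

   object: BIG_BROTHER
      state: STANDBY
         when ( LHC::LHC_STATE in_state PHYSICS ) do RAMP_UP
         action: RAMP_UP
            do GO_READY all_in HV_SUBSYSTEMS
            move_to PHYSICS_MODE
      state: PHYSICS_MODE
         when ( LHC::LHC_STATE not_in_state PHYSICS ) do RAMP_DOWN
         action: RAMP_DOWN
            do GO_STANDBY1 all_in HV_SUBSYSTEMS
            move_to STANDBY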

Clara Gaspar, April 2012 SMI++ Run-time Tools
- Device level: Proxies (C, C++, PVSS ctrl scripts) use a run-time library, smirtl, to communicate with their domain
- Abstract levels: a C++ engine, smiSM, reads the translated SML code and instantiates the objects
- User Interfaces use a run-time library, smiuirtl, to communicate with the domains
- All tools are available on Windows, Unix (Linux), etc.
- All communications are dynamically (re)established

Clara Gaspar, April 2012 SMI++ History
- 1989: first implemented for DELPHI, in ADA. Thanks to M. Jonker and B. Franek in DELPHI and the CERN DD/OC group (S. Vascotto, P. Vande Vyvre et al.). DELPHI used it in all domains: DAQ, DCS, Trigger, etc. A top-level domain, Big-Brother, automatically piloted the experiment
- 1997: rewritten in C++
- Used by BaBar for the Run-Control and high-level automation (above EPICS)
- 2002: integration with PVSS for use by the 4 LHC experiments
- Has become a very powerful, time-tested, robust toolkit

Clara Gaspar, April 2012 Features of SMI++
Task separation:
- SMI proxies execute only basic actions and contain minimal intelligence. Good practice: proxies know what to do but not when
- SMI objects implement the logic behaviour
- Advantages:
  - Change the HW -> change only the proxy
  - Change the logic behaviour (sequencing and dependency of actions, etc.) -> change only the SMI rules

Clara Gaspar, April 2012 Features of SMI++
Sub-system integration: SMI++ allows the integration of components at various different levels:
- Device level (SMI++ all the way to the bottom): each device is modeled by a proxy
- Any other higher level (simple SMI++ interface): a full sub-system can be modeled by a proxy
- Examples: the gas systems (or the LHC) for the LHC experiments; slow-control sub-systems (EPICS) in BaBar

Clara Gaspar, April 2012 Features of SMI++
Distribution and robustness:
- SMI proxies and SMI domains can run distributed over a large number of heterogeneous machines
- If any process dies/crashes/hangs, its /dead_state is propagated as its current state
- When a process restarts (even on a different machine):
  - All connections are dynamically re-established
  - Proxies should re-calculate their states
  - SMI objects will start in /initial_state and can recover their current state (if the rules are correct)

Clara Gaspar, April 2012 Features of SMI++
Error-recovery mechanism:
- Bottom-up: SMI objects react to changes of their children, in an event-driven, asynchronous fashion
- Distributed: each sub-system can recover its own errors (normally each team knows how to recover local errors)
- Hierarchical/parallel recovery
- Can provide complete automation even for very large systems
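As a hedged illustration of such a bottom-up rule (all names are invented): the parent reacts asynchronously to any child entering ERROR, and its RECOVER action is simply propagated down to the children.

   objectset: TRACKER_CHILDREN

   object: TRACKER_DCS
      state: READY
         when ( any_in TRACKER_CHILDREN in_state ERROR ) move_to ERROR
      state: ERROR
         action: RECOVER
            do RECOVER all_in TRACKER_CHILDREN
            if ( all_in TRACKER_CHILDREN in_state READY ) then
               move_to READY
            endif
            move_to ERROR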

Clara Gaspar, April 2012 Conclusions
SMI++ is:
- A well-tested and very robust tool
- Not only a Finite State Machine toolkit, but one with Expert System capabilities
- Advantage: a decentralized and distributed knowledge base
- Heavily used in BaBar and by the 4 LHC experiments (they depend on it)

Clara Gaspar, April 2012 Spare Slides

Clara Gaspar, April 2012 SMI++ Declarations
Classes, Objects and ObjectSets:
   class: <class_name> [/associated]
      ...
   object: <object_name> is_of_class <class_name>
   objectset: <set_name> [{obj1, obj2, ..., objn}]

Clara Gaspar, April 2012 SMI++ Parameters
- SMI objects can have parameters, ex: int n_events, string error_type
- Possible types: int, float, string
- For concrete objects, parameters are set by the proxy (they are passed to the SMI domain together with the state)
- Parameters are a convenient way to pass extra information up the hierarchy

Clara Gaspar, April 2012 SMI++ States
   state: <state_name> [/<qualifier>]
- /initial_state: for abstract objects only, the state the object takes when it first starts up
- /dead_state: for associated objects only, the state the object takes when the proxy or the external domain is not running
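A hedged sketch of how these qualifiers are typically used (invented names): the associated object declares which state it falls back to when its proxy disappears, and the abstract object declares where it starts up.

   class: GAS_SYSTEM_CLASS /associated
      state: NOT_RUNNING /dead_state
      state: RUNNING
   object: MUON_GAS is_of_class GAS_SYSTEM_CLASS

   object: GAS_SUPERVISOR
      state: UNKNOWN /initial_state
         when ( MUON_GAS in_state RUNNING ) move_to READY
         when ( MUON_GAS not_in_state RUNNING ) move_to NOT_READY
      state: READY
         when ( MUON_GAS not_in_state RUNNING ) move_to NOT_READY
      state: NOT_READY
         when ( MUON_GAS in_state RUNNING ) move_to READY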

Clara Gaspar, April 2012 SMI++ Whens
A set of conditions that will trigger an object transition. "when"s are executed in the order they are declared (if one fires, the others will not be executed).
   state: <state_name>
      when ( <condition> ) do <action>
      when ( <condition> ) move_to <state_name>
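For example (invented names), because the ERROR rule below is declared first, it takes precedence over the all-READY rule whenever both conditions happen to be true:

      state: CONFIGURING
         when ( any_in DEVICES in_state ERROR ) move_to ERROR
         when ( all_in DEVICES in_state READY ) move_to READY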

Clara Gaspar, April 2012 SMI++ Conditions
Conditions evaluate the states of objects or objectsets:
   ( <object> [not_]in_state <state> )
   ( <object> [not_]in_state {<state1>, <state2>, ...} )
   ( all_in <set> [not_]in_state <state> )
   ( all_in <set> [not_]in_state {<state1>, <state2>, ...} )
   ( any_in <set> [not_]in_state <state> )
   ( any_in <set> [not_]in_state {<state1>, <state2>, ...} )
   ( <condition> and|or <condition> )
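These forms can be combined; for instance (the object and set names are invented for illustration):

         when ( ( LV_SYSTEM not_in_state READY ) or
                ( any_in HV_CHANNELS in_state {ERROR, UNKNOWN} ) ) move_to NOT_READY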

Clara Gaspar, April 2012 SMI++ Actions
   action: <action_name> [( <parameters> )]
- If an object receives an undeclared action (in its current state), the action is ignored
- Actions can accept parameters, ex: action: START_RUN (string run_type, int run_nr)
- Parameter types: int, float and string
- If the object is a concrete object, the parameters are sent to the proxy with the action
- Action parameters are a convenient way to send extra information down the hierarchy

Clara Gaspar, April 2012 SMI++ Instructions
   insert <object> in <set>
   remove <object> from <set>
   set <parameter> = <value>
   set <parameter> = <object>.<parameter>
   set <parameter> = <action_parameter>

Clara Gaspar, April 2012 SMI++ Instructions
The "do" instruction sends a command to an object. "do" is non-blocking; several consecutive "do"s will proceed in parallel.
   do <action> [( <parameters> )] <object>
   do <action> [( <parameters> )] all_in <set>
Examples:
   do START_RUN (run_type = "PHYSICS", run_nr = 123) X
   action: START (string type)
      do START_RUN (run_type = type) EVT_BUILDER

Clara Gaspar, April 2012 SMI++ Instructions
The "if" instruction can be blocking if the objects involved in the condition are "transiting": the condition will be evaluated when all objects reach a stable state.
   if <condition> then
      <instructions>
   else
      <instructions>
   endif
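A typical pattern, shown here as a hedged sketch (invented names): a non-blocking "do" on a set of children, followed by a blocking "if" that waits for them to settle before deciding the outcome.

         action: CONFIGURE
            do CONFIGURE all_in DEVICES
            if ( all_in DEVICES in_state READY ) then
               move_to READY
            else
               move_to ERROR
            endif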

Clara Gaspar, April 2012 SMI++ Instructions
The "move_to" instruction terminates an action or a "when" statement. It sends the object directly to the specified state.
   action: <action_name>
      ...
      move_to <state_name>
   when ( <condition> ) move_to <state_name>

Clara Gaspar, April 2012 Future Developments
SML language:
- Parameter arithmetic, ex:
     set <parameter> = <parameter> + 2
     if ( <parameter> == 5 ) ...
- wait instruction: wait ( <object_list> )
- for instruction, ex:
     for (dev in DEVICES)
        if (dev in_state ERROR) then
           do RESET dev
        endif
     endfor

Clara Gaspar, April 2012 SML - The Language
An SML file corresponds to an SMI Domain. This file describes the objects contained in the domain:
- For abstract objects: the states & actions of each, and the detailed description of the logic behaviour of the object
- For concrete or external (associated) objects: the declaration of states & actions