CV
75: CERN (summer student)
–Drift-chamber (i.e. particle detector) tests
76: UvA/CERN (‘diploma work for IT option’)
–Development of detector readout electronics (hit recording system with 4 ns precision)
77-82: Charm neutrino experiment (PhD student)
–DAQ system, physics analysis, thesis
83-86: SLAC ASP experiment (Post-Doc)
–Detector building, readout electronics, DAQ and monitoring, physics analysis
86-93: Delphi (Fellow / Staff)
–DAQ, monitoring, control
93-99: SL/OP EIC
–LEP and SPS operations + control software, timing, RSO
99-?: SL/CO
–SPS2001, …
Current Activities & Interests
SPS2001 related
–Architecture
–Parameter Maintenance
–Contracts
–Device Server Contract Support
Rocs/mugef – BT device servers
Timing
SPS interlock
RT feedback
SPS2001 Project
Objectives: Design and implement the operational software to run the SPS in the LHC era.
Most important: deal with the multi-cycling environment.
It is not:
–just a matter of cycle scheduling (aka timing). If it were, it would have been solved a long time ago.
–rewriting the current SPS software with the latest software technologies.
SPS2001 Project
Major problems:
–Management of multiple resident cycles in the equipment
–… and in the application layer
–Parameter maintenance that is able to manage multiple instances of the same cycle with different settings
–A protection system (interlock) that is cycle aware
–Provide input for an efficient scheduling of the CERN accelerator complex (CBCM)
–Unify three underlying control systems (sps, orbit, tz)
–Eradicate the c-tree database
–Base the control system on subscription, not on polling (eradicate sl-equip)
In the implementation, aim for maximum reusability and homogeneity, and reduce inter-component coupling.
Build the control system up out of generic components.
SPS2001 Architecture (diagram)
SPS Equipment domains
–SPS2001 compliant devices
Equipment to logical entity mapping layer
–Logical State Devices
–Measurement Device Manager
–Cycle Configuration Manager
–Parameter Loader
Cycle generation domains
–Beam Process Model & Setting Generator
–Machine Equipment Model
Optimisation and surveillance domains
–Feed Forward and corrections
–Fixed Display generator
–Data logging
External domains
–Central Alarm System
–Central Beam and Cycle Manager
–Legacy Software
Interactive Applications
–Operator Work Context
–Parameter Editing Tools
–Interactive Optimisation Tools
–Generic equipment diagnostics
–Cycle Exploitation Applications
–Standard Parameter Display
Exploitation domains
–User Request Handler
–Interlock Handler
–Local Beam and Cycle Manager
–Exploitation Cycle and Sequence Manager
–Beam process, Cycle and Sequence Repository
Parameter maintenance domains
–Trim Manager
–Parameter Manager and Translator
–Parameter Repository and Version Manager
Exploitation domain: optimises SPS cycle usage; handles user requests; responds to interlock conditions; machine access
Parameter maintenance domain: trim server; settings translation; trim propagation
Beam optimisation domain: correction procedures; surveillance; logging and reporting
Example of a business application
… a minor detail of the overall SPS parameter model (diagram):
Beam Parameters (Non correctable / Operator / Expert)
–Q.h, Q.v; Q.ph62 (Phase.h 6-2); Energy
–Tune matrix
–Strength: KQ.f, KQ.d
–Strength: KQ.f1, KQ.f2
Beam model / Equipment model
Equipment Parameters (Non correctable / Operator / Expert)
–GQ.f1, GQ.f2, GQ.d
–Calibration: I(G)
–Magnet Currents: IQ.f1, IQ.f2, IQ.d
–PC Transfer Function: Vref(I)
–PC Ref: IQ.f1, IQ.f2, IQ.d
HW: Send to HW
Example of a business application: Parameter maintenance application
–Translates high-level (machine physics) settings into device settings
–Keeps track of the trim archive
–Propagates trims to related cycles depending on the trim context
(Diagram: Beam Parameters and Equipment Parameters, each split into Non correctable / Operator / Expert; Data Base; HW with "Send to HW"; SPS2001 compliant devices)
What and Why of Device Contracts?
Accelerator equipment is composed of a large variety of different types, such as:
–Magnets, RF, Kickers, Beam obstacles, Beam observation
The control of this equipment deals with various aspects, such as:
–State Control
–Setting Management
–Measurements
–Cycle Organisation
–Expert Settings and Options
–Diagnostics and Reporting
–etc.
The Device Contracts provide the framework for a homogeneous view of a heterogeneous set of devices.
Device model
One could be tempted to arrange accelerator control equipment into a hierarchical class model (i.e. based on single inheritance):
AcceleratorDevice
–ObservationDevice: Orbit, Tune, etc.
–ControlDevice
––Cycle Oriented ControlDevice: Kicker, RF, Magnet (Extraction, Correctors, Quads, MAIN Magnets)
––NonCycling ControlDevice: Vacuum, Stopper
However, this organization does not reflect the usage pattern of the business applications.
Multiple inheritance
It is more appropriate for accelerator control equipment to inherit from one or more “super-classes” that represent general control aspects.
Devices: RF, Orbit, Kicker, Stopper, Vacuum, Tune, Magnets
Super-classes: StateDevice, SettingDevice, MeasurementDevice, CycleOrientedDevice, …
The applications only care about these types of devices (the super-classes).
Every such “general control aspect” is represented by a specific device contract.
An accelerator device implements the contracts that are required for its functionality.
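The idea on this slide can be sketched in a few lines of Python (chosen here only for brevity; the talk itself refers to Java and C). The contract names StateDevice, SettingDevice and MeasurementDevice come from the slide. The method names, the Kicker and TuneMeter classes and the switch_on_all helper are assumptions made for illustration, not the SPS2001 API.

```python
from abc import ABC, abstractmethod


class StateDevice(ABC):
    """Contract: the device has a state that can be read and requested."""

    @abstractmethod
    def get_state(self): ...

    @abstractmethod
    def request_state(self, state): ...


class SettingDevice(ABC):
    """Contract: the device accepts settings."""

    @abstractmethod
    def load_setting(self, name, value): ...


class MeasurementDevice(ABC):
    """Contract: the device publishes measurements."""

    @abstractmethod
    def measure(self): ...


class Kicker(StateDevice, SettingDevice):
    """Hypothetical device implementing two contracts."""

    def __init__(self):
        self._state = "OFF"
        self.settings = {}

    def get_state(self):
        return self._state

    def request_state(self, state):
        self._state = state

    def load_setting(self, name, value):
        self.settings[name] = value


class TuneMeter(StateDevice, MeasurementDevice):
    """Hypothetical observation device implementing two other contracts."""

    def __init__(self):
        self._state = "OFF"

    def get_state(self):
        return self._state

    def request_state(self, state):
        self._state = state

    def measure(self):
        return 0.0  # placeholder value, no real acquisition


def switch_on_all(devices):
    """Application code: it only relies on the StateDevice contract."""
    for device in devices:
        if isinstance(device, StateDevice):
            device.request_state("ON")
```

The application function never asks whether it is talking to a kicker or a tune meter; it only checks for the contract it needs.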
Why contracts?
Contracts are NOT middleware; they are based on middleware:
–Standardisation on communication and a get-/set- “property” device model (aka middleware) is not enough.
–The semantics need to be defined (i.e. which device properties are used and how these properties are expected to behave).
This should not be left to the individual equipment groups:
–The SPS equipment has heterogeneous interfaces and semantics. This has led to the need for black boxes (i.e. equipment-specific driver routines), which provided some level of homogenisation of the equipment towards the application layer.
Contracts lay down the semantics of the device property model used by the SPS2001 applications. They avoid the need for black boxes (which makes the applications much simpler).
What do the SPS2001 contracts provide?
In general:
–Definition of groups of logical functionality, with a protocol for information exchange.
–Support for data subscription.
–Methods to obtain device and data descriptions from the servers.
Advantages:
–No need to know the equipment type; equipment self-description is part of the contract.
Examples:
–BT, PC, RF & BI equipment can all use the same interface for loading cycle settings.
–Virtual devices can easily be built from physical devices (simple recipes, because all equipment is accessed identically).
Note: Contracts are used by logical equipment as well.
Example: Device Contracts Usage (MCSM control)
(Diagram: SPS2001 application MCSM; logical devices SPS_ring and TT10_proton; physical devices PC, Kicker, PC, Stopper, "you name it")
SPS_ring:
–State Operational: Stopper out, QF1 on, etc.
–State Beam Safe: QF1 on, etc.
–State Shutdown: etc.
Note that these logical devices act in two directions:
–Commands propagate down
–Status propagates up
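A minimal sketch of such a logical device, assuming that each logical state is a simple recipe of member-device states. The names SPS_ring, Stopper and QF1 and the Operational recipe come from the slide; the class names, the remaining recipe entries and the "UNDEFINED" fallback are assumptions for illustration.

```python
class PhysicalDevice:
    """Stand-in for a device that implements the state contract."""

    def __init__(self, name, state):
        self.name = name
        self.state = state

    def request_state(self, state):
        self.state = state

    def get_state(self):
        return self.state


# Logical state -> required state of each member device.
# Only "OPERATIONAL" follows the slide; the rest is assumed.
RECIPES = {
    "OPERATIONAL": {"stopper": "OUT", "QF1": "ON"},
    "BEAM_SAFE": {"stopper": "IN", "QF1": "ON"},
    "SHUTDOWN": {"stopper": "IN", "QF1": "OFF"},
}


class LogicalStateDevice:
    """Hypothetical logical device built from physical devices."""

    def __init__(self, name, members, recipes):
        self.name = name
        self.members = {m.name: m for m in members}
        self.recipes = recipes

    def request_state(self, logical_state):
        # Commands propagate down to every member device.
        for dev_name, dev_state in self.recipes[logical_state].items():
            self.members[dev_name].request_state(dev_state)

    def get_state(self):
        # Status propagates up: report the logical state whose
        # recipe is fully satisfied by the members.
        for logical_state, recipe in self.recipes.items():
            if all(self.members[n].get_state() == s
                   for n, s in recipe.items()):
                return logical_state
        return "UNDEFINED"
```

Because the logical device exposes the same request/get operations as its members, an application can treat SPS_ring exactly like a single physical device.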
Example: Device Contracts Usage (setting management)
(Diagram: SPS2001 application Setting Management; logical device Settings Repository; physical devices PC, Kicker, PC, Stopper, "you name it")
When sending a setting to the hardware, we do not have to understand the contents of the setting.
We do have to care about the coherence of a setting update:
–data acceptance check
–HW commit
Device Contracts implementation
Device contracts are developed and implemented by SPS2001 according to the needs of the SPS2001 applications.
–They are implemented as a layer on top of the middleware (both on the client and on the server). This layer has its own API and syntax.
–The device contracts are independent of the underlying middleware, i.e. they hide the middleware. The middleware can be changed transparently to the contract users.
There is a rearrangement of responsibilities:
–To help equipment groups with a correct implementation of the contracts, the SPS2001 project offers a library with common code. This common code provides common solutions for common problems. Example: resident cycle management, commit handling. The common code comes with its own syntax (API).
–Equipment-specific processing is moved into the device servers. Example: switch-on procedures.
How contracts are implemented:
In Java the contracts are formalized through the definition of interfaces.
Contract Interface (i.e. what the user of the class sees)
An abstract class is provided that implements all the common behaviour of the contracts. This class is extended by the device import classes. A class representing an SPS2001 device will also extend these abstract classes.
(Diagram: Application; Contract Interface; Abstract Contract Class (common code); Factory; Specific Contract Class (middleware dependent) with network connection; Application Specific Contract Class, e.g. VMC.SD; Emulating Contract Class)
How contracts are implemented: exporting in Java
An abstract class is provided that contains all the common code to export the SPS2001 contracts. A specific middleware-dependent export factory will extend these abstract classes.
Identification Interface (i.e. what a device server has to provide for exporting a device)
However, many device server implementations will extend the Abstract Contracts…
(Diagram: Identification Interface; Abstract Contract Export Class (common code); Factory; Specific Contract Export Class (middleware dependent) with network connection; Abstract Contract Class (common code))
How contracts are implemented: the C framework
In the C framework there is no formal definition of a Contract. However, there is a lot of common code to provide a contract implementation in C.
–The C framework provides the equivalent of the export code and the abstract Java class.
–The user code does not extend this abstract class, but registers callbacks with the framework. (Note: all callbacks have a strongly typed signature.)
–The callbacks handle the communication with the real device using a device-specific communication mechanism (e.g. VME, PCI, shared memory, message queues, …).
The black box is in the device server. Its signature is assured by the code of the contract support library. The contract support library also avoids duplication of implementation effort.
(Diagram: Network; Middleware Specific library; Contract Export code; Contract Support (common code), C-framework library; Device Server code; VME; Message queue)
Contracts overview
Identification (mandatory for all devices)
–Identifies a particular device
–Identifies the implemented contracts of the device and other device constants
–Identifies associated devices (used in tree browsers)
StateManagement
–Request state change (reply through transaction and/or state change)
–Subscribe to state changes
–Get state
–Get detailed state (subcomponent states)
–Get State Model (both main state and subcomponent states)
–The state model defines which states are possible and which states can be requested. It also defines the mapping between a binary format and a user-readable format.
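What a state model carries can be sketched as follows. The slide only states that the model lists the possible states, the requestable states, and the mapping between the binary and the readable format; the class name, the field names and the example codes below are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateModel:
    """Hypothetical state model as returned by 'Get State Model'."""

    names: dict             # binary code -> user-readable name
    requestable: frozenset  # codes that a client may request

    def decode(self, code):
        """Binary format -> readable format."""
        return self.names[code]

    def encode(self, name):
        """Readable format -> binary format."""
        for code, known in self.names.items():
            if known == name:
                return code
        raise KeyError(name)

    def can_request(self, name):
        """True if a client is allowed to request this state."""
        return self.encode(name) in self.requestable


# Example model with invented codes: FAULT is a possible state,
# but it cannot be requested by a client.
MODEL = StateModel(
    names={0: "OFF", 1: "STANDBY", 2: "ON", 3: "FAULT"},
    requestable=frozenset({0, 1, 2}),
)
```

A generic application can use such a model to populate a menu of requestable states and to display states without any equipment-specific knowledge.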
Contracts overview
Data Catalogue contracts
–A data catalogue provides a list of properties published by a device
–Clients can get/set/subscribe to these properties
–Clients can obtain a data description of these device properties
DeviceData (intended usage: cycle-independent expert parameters)
–Read/write
–Restricted access (through the security contract)
SettingData (intended usage: operational settings)
–Read/write/subscribe
–Cycle aware (i.e. data is specific to a given cycle)
–Transaction handling including commit (through a timing message)
MeasurementData
–Read/write-filter/subscribe
–Cycle aware
–Data may be filtered on user-defined criteria
Contracts overview
Others:
–CycleManagement (cycle space management of cycle-oriented equipment)
–Expert
–Security
–Transaction Management
–Diagnostic
–Logging and Alarms
–Configuration
DataCatalogues in more detail
A data catalogue provides a list of DataEntries with their format descriptions.
Example: GefFunction: a table of undefined depth with a time and a value column.
DataCatalogue entries are described in XML format:
Example:
DataCatalogues in more detail
This XML description is used to:
–generate C structures and wrapper code (to enforce strong typing) that can be used by the device server
–generate Java metadata for creating a JTable model
In the future it will also be used to:
–generate data validation functions (for the device servers)
–generate device-specific Java wrapper classes based on generic DataContainer classes
–generate database mappings (to archive any device data in database tables)
–assist in generic filtering
XML based data description
Reminder: XML is used to describe the format of the data entry points (properties), not to encode the data itself.
Advantages of an XML data description:
–The format is independent of a database. However, a mapping between an XML-based data description and a database-resident data description is possible (i.e. one does not exclude the other).
–There is never a mismatch between the device server and its data description.
–Simple device explorers can be used to discover the data (no need for database connections).
–XML is source code and can be subjected to version management.
XML Parsing: how it works
XML provides an excellent syntax for defining configuration information.
Using SAX parsers (Simple API for XML), the contents of an XML text can be decoded and stored in an object:
XML: a table
Sax parsing:
class Table: String name; String comment; Column theColumns;
class Column: String name;
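The mechanism described above can be sketched with Python's standard xml.sax module. The Table and Column classes mirror the slide; the XML element and attribute names (table, column, name, comment) are assumptions, since the original XML listing is not available, and the GefFunction content follows the time/value description given earlier.

```python
import xml.sax
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str


@dataclass
class Table:
    name: str = ""
    comment: str = ""
    columns: list = field(default_factory=list)


class TableHandler(xml.sax.ContentHandler):
    """SAX handler that fills a Table object while parsing."""

    def __init__(self):
        super().__init__()
        self.table = None

    def startElement(self, tag, attrs):
        # Called by the parser for every opening tag.
        if tag == "table":
            self.table = Table(name=attrs.get("name", ""),
                               comment=attrs.get("comment", ""))
        elif tag == "column" and self.table is not None:
            self.table.columns.append(Column(name=attrs.get("name", "")))


def parse_table(xml_text):
    """Decode an XML table description into a Table object."""
    handler = TableHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.table


# Hypothetical description of the GefFunction entry (assumed syntax).
EXAMPLE_XML = """<table name="GefFunction" comment="function table">
  <column name="time"/>
  <column name="value"/>
</table>"""
```

Calling parse_table(EXAMPLE_XML) yields a Table named GefFunction with a time and a value column.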
XML Parsing: the schema
In the example of the previous slide, the definition of the classes has to match the syntax of the XML.
To enforce that an XML text obeys the syntax, we can define a schema. (Today a proper schema language is itself based on XML.)
XML: a table
Sax parsing (schema):
class Table: String name; String comment; Column theColumns;
class Column: String name;
etc…
XML Parsing: the schema compiler
The schema can be used to enforce that the XML code matches the target class, but it can do more…
It can be used to generate the target class itself, including a constructor that will invoke the SAX parser and create the objects.
XML: a table
Schema Compiler:
class Table: String name; String comment; Column theColumns; Table(String XML)
class Column: String name; Constructor()
etc…
(Diagram: XML → Schema Compiler → object)
Transaction handling
An important aspect of setting management in the devices is transaction handling. (Transaction handling is a standard feature of the Setting DataCatalogue entries.)
Transaction handling provides coherent setting updates: all settings take effect on the same cycle.
Transaction handling aims at optimizing bulk setting updates (100 - 2000 settings). The client does not need to wait for the individual replies.
–It is based on asynchronous communication (sends).
–The return status is obtained by subscription to the transaction property (a property shared by all devices in a device server).
–The transaction property on the server also keeps track of the pending update requests and accepts commands to abort or commit the pending requests.
Transaction handling
From the client perspective:
–Get a transaction object from the device access package.
–Get a hardware commit key and give it to the transaction object.
–Get the cycle identification.
These three items (transaction object, commit key, cycle identification) form part of the user context, which is passed when sending data to the equipment.
–Send data to one or more devices. (Note: data is sent asynchronously; the completion status will be communicated to the transaction object.)
–Check (and wait) for the completion status of the transaction object. The transaction object can tell which equipment failed or rejected the request.
–Either commit or abort the transaction object:
Commit: the transaction uses the HW commit key.
Abort: the transaction knows which equipment needs to be informed.
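The client-side sequence above can be sketched as follows. This is a simplified, synchronous illustration: the real system sends asynchronously and commits through a timing message, whereas here every reply arrives immediately and commit is a direct call. All class, method and device names are assumptions, not the SPS2001 device access API.

```python
class Transaction:
    """Collects the replies of all devices taking part in one update."""

    def __init__(self, commit_key):
        self.commit_key = commit_key
        self.replies = {}  # device -> True if the data was accepted

    def report(self, device, accepted):
        self.replies[device] = accepted

    def failed(self):
        """Names of the devices that rejected the request."""
        return [dev.name for dev, ok in self.replies.items() if not ok]

    def commit(self):
        for dev in self.replies:
            dev.commit(self.commit_key)

    def abort(self):
        # Only devices holding pending data need to be informed.
        for dev, ok in self.replies.items():
            if ok:
                dev.abort()


class DeviceStub:
    """Stand-in for a remote device with pending and active settings."""

    def __init__(self, name, accepts=True):
        self.name = name
        self.accepts = accepts
        self.pending = {}
        self.active = {}

    def send(self, context, setting, value):
        if self.accepts:  # data acceptance check
            self.pending[(context["cycle"], setting)] = value
        context["transaction"].report(self, self.accepts)

    def commit(self, commit_key):
        self.active.update(self.pending)
        self.pending.clear()

    def abort(self):
        self.pending.clear()


def update_setting(devices, cycle, setting, value, commit_key):
    """Send one setting to all devices, then commit or abort coherently."""
    tx = Transaction(commit_key)
    context = {"transaction": tx, "commit_key": commit_key, "cycle": cycle}
    for dev in devices:
        dev.send(context, setting, value)
    if tx.failed():
        tx.abort()
        return False, tx.failed()
    tx.commit()
    return True, []
```

If a single device rejects its data, nothing becomes active on any device, which is the coherence property the slides ask for.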
Contracts definition and implementation status
Contract: defined / implemented (server C-support)
–Identification: full
–Expert: full
–StateManagement: full
–DataCatalogues (3): full
–Cycle Management: rudimentary
–Security: no
–Transaction Management: full
–Diagnostic: - / -
–Logging and Alarms: - / -
–Configuration: full
Future developments
–Enhancement of the data description syntax of the data catalogue contracts (discussed in the data container presentation).
–Development of the diagnostics and alarm/logging contracts.
Discussion:
These device contracts could also be used by some of the business applications as their primary interface.
Example: one could create a virtual device SPS that has a data catalogue with entries for all trimmable parameters.
Device Server implementations
To assist equipment groups with a correct implementation of the contracts, the SPS2001 project offers a library with common code. Example: resident cycle management, commit handling.
The implementation of a device server calls functions of the support library to register properties and to declare set-/get-property callbacks (C version), or overrides virtual methods (future C++ version).
A callback can:
a) do the work itself directly (i.e. talk directly to the VME devices)
b) use different threads to schedule the real work
c) communicate with other processes through shared memory and other mechanisms
Options a) and b) are used by BT; options b) and c) are used by PO.
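The registration pattern can be sketched as follows, with the callbacks doing the work directly (option a). The ContractSupport class stands in for the support library and is purely hypothetical, as are the property names and the dictionary that simulates the hardware; the actual C library signatures are not given in the slides.

```python
class ContractSupport:
    """Hypothetical stand-in for the contract support library."""

    def __init__(self):
        self._getters = {}
        self._setters = {}

    def register_property(self, name, getter, setter=None):
        """Declare a property with its get callback and optional set callback."""
        self._getters[name] = getter
        if setter is not None:
            self._setters[name] = setter

    def get(self, name):
        return self._getters[name]()

    def set(self, name, value):
        if name not in self._setters:
            raise PermissionError(f"property {name!r} is read-only")
        self._setters[name](value)


def make_device_server(hardware):
    """Device server code: register callbacks that talk to the 'hardware'."""
    support = ContractSupport()

    def get_current():
        return hardware["current"]

    def set_current(value):
        hardware["current"] = float(value)

    def get_status():
        return hardware["status"]

    support.register_property("current", get_current, set_current)
    support.register_property("status", get_status)  # read-only property
    return support
```

The equipment-specific part stays inside the callbacks, while the library side decides how properties are exposed to clients.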
Device Server implementations
Implementation by SPS equipment groups:
BT:
–MKP device server in production.
–Various other device servers (for beam obstacles) ready for deployment.
PO:
–Implementation support provided by me (manpower shortage in the equipment group).
–Contracts are implemented based on a local SPS2001 server.
–Improve some of the underlying processes of the power converters to make a local integration of the SPS2001 device server more efficient and effective (see next slide):
PowerConverter state status/control
Setting Management
The rocs system
What makes up a rocs/mugef system: a VME crate with:
–SAC PPC (currently on LynxOS 2.5.1)
–themis battery backed-up memory board
–TG8
–1-8 ramp cards (up to 64 channels)
–1-4 ADC cards (up to 64*2 channels: ref, dcct)
–statophone controller (to control hw status)
SPS2001 architecture (diagram)
–SPS2001 device servers, each hosting SPS2001 devices (e.g. m3sba3, mkp, rf, m.sba.)
–SPS2001 state devices: SPS RING, TT60, TT20, TT10
–Measurement devices: SPS RING, TT60, TT20, TT10
–Function Loader: SPS RING, TT60, TT20, TT10
–Interlock, Alarm, Console, Display
ROCS software
Motivation: The involvement with the rocs/mugef system was motivated by the implementation work on the SPS2001 device server for the power converters.
Some work was started in September 2001 by John Brazier and me. I took it up again at the beginning of May.
The rocs clients
The initial SPS2001 server was developed as a regular EQUIP client. However, as such it was not alone. Other clients are:
–SSIS (polling individual status channels)
–Nodal alarm survey
–TZ survey for pvss/ST
–Steering
–CMW client
This has made some of the bottlenecks apparent.
Rocs architecture, or where to hang the SPS2001 server
(Diagram: SL_Equip client; mh; mugef server; Rocs system with rocsfe, sequence thread, timing, acquire, floader, stato, statoSurvey, Startup; SPS2001 server action; SPS2001 clients)
The stato process
The stato process reads the channel status on command:
–Reading one channel nominally takes 150 ms, but this has been shortened to 40 ms + 110 ms dead time.
–Reading many channels in one request takes 50 ms per channel. Reading them in individual requests takes 150 ms per channel.
–Reading the whole crate ‘en bloc’ takes 3.2 seconds!
–A command takes 400 ms + 150 ms overhead.
The SSIS process was the most sensitive to the bottlenecks (the how and why are a bit too detailed to explain here).
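The figures above are mutually consistent if a full crate is taken to be 64 channels (the maximum quoted for the ramp cards); that channel count is an inference, not a number stated on this slide. A short check of the arithmetic:

```python
CHANNELS = 64              # assumed: a fully equipped crate
BULK_MS_PER_CHANNEL = 50   # many channels in one request
SINGLE_REQUEST_MS = 150    # one channel per request


def bulk_read_ms(channels):
    """Time to read all channels in a single request."""
    return channels * BULK_MS_PER_CHANNEL


def individual_reads_ms(channels):
    """Time to read all channels with one request per channel."""
    return channels * SINGLE_REQUEST_MS


print(bulk_read_ms(CHANNELS) / 1000)         # seconds for one bulk read
print(individual_reads_ms(CHANNELS) / 1000)  # seconds for individual reads
```

Under this assumption the bulk read comes to 64 × 50 ms = 3.2 s, matching the slide, while polling the same channels individually would take 64 × 150 ms = 9.6 s.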
In trying to understand the delays and the timeouts of the stato process (which reads the pstat), I discovered the simplicity of this process (750 lines) and how ‘trivial’ it would be to upgrade the stato process into a statoSurvey process in a transparent way.
statoSurvey
The statoSurvey process continuously reads all the channels and caches the values in a local store to serve the clients. It has two threads:
The client thread, which accepts client requests:
–Commands to execute: inform the surveillance thread and (optionally) wait for this thread to complete the command.
–Status requests: if the local cache contains a valid entry, this value is returned; otherwise a stato busy status is returned.
The surveillance thread:
–If there is a command to execute: execute the command, declare the changed pstat values invalid, and wait for hw busy to be cleared. Execution time: 400 ms/channel + 150 ms.
–Else: read the next active channel and update the cache. Execution time: 40 ms.
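The logic of the two threads can be sketched as below. To keep the example deterministic it does not start real threads: survey_step() is one iteration of the surveillance loop, and status() and command() are what the client thread would call. The class, the method names and the dictionary that stands in for the statophone hardware are assumptions.

```python
from collections import deque

BUSY = "STATO_BUSY"


class StatoSurvey:
    """Single-process sketch of the cache-and-survey scheme."""

    def __init__(self, hardware, channels):
        self.hw = hardware          # dict: channel -> status (simulated hw)
        self.channels = list(channels)
        self.cache = {}             # channel -> last valid status
        self.commands = deque()     # commands waiting for the survey loop
        self._next = 0              # index of the next channel to read

    # --- client thread side ---
    def status(self, channel):
        """Return the cached value, or BUSY if there is no valid entry."""
        return self.cache.get(channel, BUSY)

    def command(self, channel, new_status):
        """Queue a command for the surveillance loop."""
        self.commands.append((channel, new_status))

    # --- surveillance thread side ---
    def survey_step(self):
        """One loop iteration: execute a pending command, else read a channel."""
        if self.commands:
            channel, new_status = self.commands.popleft()
            self.hw[channel] = new_status
            self.cache.pop(channel, None)  # declare the cached value invalid
        else:
            channel = self.channels[self._next]
            self._next = (self._next + 1) % len(self.channels)
            self.cache[channel] = self.hw[channel]
```

Clients never wait for the slow hardware read: they get either the last valid value or a busy status, which is what removes the per-request latency described on the previous slides.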
statoSurvey implementation
(Diagram: statoSurveyance process with Survey-Thread and Client-Thread; Statophon HW; PSTAT cache in shared memory (read only for others); Sequence task; SPS2001 server)
Anonymous data signal: signals report number updates to “whom it may concern”.
Implementation details
Note: J. Brazier has often suggested this approach as an improvement for the pstat reading.
Time involved:
–Code writing: 1.5 days
–Diagnostics tools: 1.5 days
–Debugging: 1.5 days
–Deployment: 0.5 day
The code is developed as a new version of the two files involved in the stato process, plus additional code for the diagnostics tools.
NO modifications to other processes were needed.
The code is to be handed over and added to the rocs software repository.
Performance
The new version was started up on four machines:
–m1sb80
–m3sba2
–m4sba6
–m2sba6
Note: The restart was done manually, which implies that the old version will come back if the machines are rebooted.
The stato fix has improved the situation: the access time for the SSIS client was reduced from 5 seconds to 1.5 seconds (dixit A. Bland).
Floader Implementation (in progress)
Based on the same principal components as the statoSurvey process:
–Client thread to accept and execute commands (reply is optional)
–A high-priority thread to respond to timing interrupts
–Structured shared memory segments (read-only access from outside) with status information
–Anonymous signal mechanism to inform anyone that something has changed
More complications:
–Manage memory on HW boards
–Manage memory of persistent memory boards
–Reply to external events
–Accept rt-corrections (one more thread)
Floader Implementation (in progress)
Implementation in C++:
Class to manage data stores held in shareable memory:
–position-independent memory management tool
Class to manage settings and setting transactions:
–A shareable transaction store
–A shareable persistent setting store
–Multiple shareable hw-resident setting stores (allows binding of multiple events to the same setting, a special rocs/mugef feature for coast/economy)
These are generic classes, which are then specialized into rocs-specific classes.
Floader Implementation
(Diagram: floader process with StartFunc-Thread and Client-Thread; Ramp Cards; Setting and transaction cache in shared memory (read only for others); Sequence task; SPS2001 server; Timing task (existing))
Anonymous data signal: signals report number updates to “whom it may concern”.
A failsafe solution
The mugef warning events are sent 101 ms before the start of the function.
The event wakes up a rocs process, which updates the start address on all the ramp cards to point to the correct function address.
The same event generates, in the TG8, the hardware start signal for the ramp cards with a delay of 100 ms (1 ms before the start of the new function).
If the rocs process did not do its work of updating the start address, the ramp cards will re-execute the previous function, with possibly unpleasant consequences for the beam pipe.
A failsafe solution
To avoid the failure described on the previous slide, add a hardware protection that demands a positive action from the rocs process before a function start is accepted:
(Diagram: TG8, Rocs, 101 ms one-shot, Ramp Cards, Beam Abort)
1: event
2: Warning, -101 ms
3: Update function start addresses
4: arm
5: start
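One possible reading of this protection is sketched below: the rocs process arms a one-shot after it has updated the start addresses, and a start signal that arrives while the one-shot is not armed triggers a beam abort instead of a function start. The exact hardware behaviour is not described in the slides, so the class, the timing convention (milliseconds relative to the warning event) and the abort-on-unarmed-start rule are assumptions.

```python
class RampStartGuard:
    """Simulated one-shot guard in front of the ramp card start input."""

    ONE_SHOT_MS = 101  # duration of the one-shot, from the diagram

    def __init__(self):
        self.armed_until = None  # time until which a start is accepted
        self.beam_abort = False
        self.starts = 0

    def arm(self, now_ms):
        """Positive action by the rocs process (step 4)."""
        self.armed_until = now_ms + self.ONE_SHOT_MS

    def start(self, now_ms):
        """Hardware start signal from the TG8 (step 5)."""
        if self.armed_until is not None and now_ms <= self.armed_until:
            self.armed_until = None  # the one-shot is consumed
            self.starts += 1
            return True
        # No positive action was seen: refuse the start, abort the beam.
        self.beam_abort = True
        return False


def run_cycle(guard, rocs_did_its_work):
    """Play one warning/start sequence; times are relative to the warning."""
    if rocs_did_its_work:
        guard.arm(now_ms=5)      # assumed: addresses updated after 5 ms
    return guard.start(now_ms=100)
```

In this reading a missed update can no longer cause the previous function to be replayed silently; the failure is turned into an explicit beam abort.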