DCS Tools during ATLAS Run 1: Questionnaire Results. Stefan Schlenker, CERN. ATLAS Post-LS1 Operations Workshop, 24th June 2013, CERN.


1 DCS Tools during ATLAS Run 1: Questionnaire Results
Stefan Schlenker, CERN
ATLAS Post-LS1 Operations Workshop, 24th June 2013, CERN

2 Detector Control System – Operation Tools
Architecture
►Designed to be controlled by single operator at GCS layer
Shift operation
►Alarm system – low level
►Finite State machine – high level and navigation
Expert operation
►Remote access via WTS/gateway – full access
►Web online monitoring – at a glance
►Web history – glance back in time
►DCS Data Viewer (DDV) – interface to DCS archive on Oracle servers
►Notification framework – manage SMS/email

3 System Assignment / Tools
DCS Tree → System assignment at end of Run 1
►ID desk: PIX, SCT, TRT, IDE (incl. BCM)
►CAL desk: LAR, TIL, FWD (=LUCID+ZDC+ALFA)
►Muon desk: MDT, CSC, RPC, TGC, MUON (=MUON)
►Data quality desk: LHC (=DQ, LUMI)
►Trigger/DAQ desk: TDQ (TRIGGER, DAQ, L1CALO)
►Slimos & ShiftLeader: CIC, EXT, SAFETY, DCS BE (OPM, DCS)
*System covered in questionnaire results (total 16, no ADC)
Separation of tools
►Desk Alarms: Assign respective DCS control stations to filter
►Desk FSM: Attach respective tree nodes to artificial desk top node, flexible in reassignment → IDG, CAL, MUO, TI
Q: System shifter has DCS tasks?
►14 Yes, all but DCS, DAQ
Q: Tools used?
►Desk Alarms: 13 (all but TRIGGER)
►Desk FSM: 14
►Constant monitoring of specific panel? 8 No, 2 Occasionally (for specific problems), 1 Permanently (RPC rates + tower status)
►Non-standard tools used for DCS? No

4 Finite State Machine (FSM)
State machine hierarchy
►Detector hardware and hierarchical structure represented by FSM entities
State, Status, and Command propagation
►State model for devices (ON, OFF, …) and logical objects (READY, NOT_READY, …), error handling with STATUS
►Propagation upwards (using programmed logic for parent depending on child states), Shifter monitors everything = READY/OK
►Commands propagated downwards
Current procedure in SL training:
►If problem cannot be fixed with instructions, contact expert
►Report to ELOG
►If problem cannot be solved on short timescale (<~15min), Disable or Exclude corresponding FSM object(s) in agreement with expert
►In particular Disable/Exclude when in state other than READY
►Interventions should be done by handing over ownership (Exclude) to experts on DCS desk, Include sub-tree when intervention is over
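The propagation rules on this slide can be illustrated with a minimal Python sketch. It is not the ATLAS FSM implementation: the class `FsmNode`, the `enabled` flag (a stand-in for both Disable and Exclude), the set `GOOD_STATES`, the node names, and the rule "parent is READY only if every enabled child is in a good state" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumption: these states count as "good" when summarising children.
GOOD_STATES = {"ON", "READY"}


@dataclass
class FsmNode:
    """Hypothetical FSM entity: a device (leaf) or a logical object (parent)."""
    name: str
    state: str = "ON"                   # only meaningful for leaf devices
    enabled: bool = True                # stand-in for Disable/Exclude
    last_command: Optional[str] = None
    children: List["FsmNode"] = field(default_factory=list)

    def add(self, child: "FsmNode") -> "FsmNode":
        self.children.append(child)
        return child

    def summary_state(self) -> str:
        """Upward propagation: derive the parent state from enabled children."""
        if not self.children:
            return self.state
        active = [c for c in self.children if c.enabled]
        if all(c.summary_state() in GOOD_STATES for c in active):
            return "READY"
        return "NOT_READY"

    def send_command(self, command: str) -> None:
        """Downward propagation: forward the command to enabled children only."""
        self.last_command = command
        for child in self.children:
            if child.enabled:
                child.send_command(command)


def build_demo_tree() -> FsmNode:
    """Small illustrative tree with one misbehaving leaf (names are made up)."""
    root = FsmNode("DESK")
    sub = root.add(FsmNode("SUBDET"))
    sub.add(FsmNode("HV_A"))
    sub.add(FsmNode("HV_B", state="OFF"))
    return root
```

In this sketch, disabling the faulty leaf lets the top node return to READY, which mirrors the Disable/Exclude step of the shift procedure; a disabled node also stops receiving commands from above.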

5 Finite State Machine: Questions
Q: System shifter instructed to use FSM actions?
►Yes: MUON
►Only by expert instruction: L1CALO, LAR (almost never), LUCID, SCT, TIL, TRT
►Never: ALFA, BCM, DQ, LUMI, PIX, TRIGGER
Q: Actions on VME crates by whom?
►Experts: all except LUCID, MDT/CSC (also shifters)
Q: Lost data because only expert could take action?
►13 No
►1 Yes (VME crate power cycle: expert who was not in reach of computer)
Q: Shifters were instructed to disable/exclude parts of tree?
►No: ALFA, BCM, DAQ, L1CALO, LAR, LUCID, LUMI, PIX, TIL, TRT, TRIGGER
►Yes, when instructed by expert: SCT
►Yes, for specific cases from instructions/training: MUON
Q: Shifters were instructed to leave FSM in a state other than READY for >1h?
►9 No (however, 2 of them gave inconsistent answers)
►3 Yes (TIL, TRT, LUMI)

6 Alarm Screen
Concept and tool options
►Alarm = parameter out of good range, ranges defined by experts
►Acknowledgement:
  ►Only for dedicated alarms
  ►Makes sure alarm does not disappear before operator action: acknowledge
►Options for each alarm (right-click menu):
  ►Mask alarm on UI level until the alarm condition goes
  ►Insert alarm to ELOG with comment
  ►Display trend plot of value
  ►Alarm help on Twiki
  ►Expert actions: change description, alert ranges
►Summary alert:
  ►Hides several individual alerts in a single alarm entry
  ►Access to individual alerts via “Details”
►Filters:
  ►Different sets of filters exist, e.g. sub-detectors
  ►“Default” filter can be pre-defined per desk
Current procedure in SL training:
►Understand alarm, check documentation for procedures
►If problem cannot be fixed, contact expert
►If alarm condition cannot be removed on short timescale (<~15min), mask alarm in agreement with expert
►If alarm condition cannot be removed on long timescale (>1week), expert is responsible for deactivating the alarm or changing alarm limits until problem is solved
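As a rough illustration of the alarm concept (a parameter leaving its expert-defined good range) and of masking "until the alarm condition goes", here is a small Python sketch. The names `GoodRange`, `AlarmScreen`, and the parameter `hv_current` are hypothetical, and the behaviour is a simplified reading of the slide, not the actual alarm screen logic.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class GoodRange:
    """Expert-defined good range for one parameter (hypothetical structure)."""
    low: float
    high: float
    severity: str = "ERROR"  # severity shown when the value leaves the range


class AlarmScreen:
    """Tracks active alarms and UI-level masks for a set of parameters."""

    def __init__(self, ranges: Dict[str, GoodRange]) -> None:
        self.ranges = ranges
        self.active: Dict[str, float] = {}  # parameter -> offending value
        self.masked: Set[str] = set()

    def update(self, parameter: str, value: float) -> None:
        good = self.ranges[parameter]
        if good.low <= value <= good.high:
            # Alarm condition gone: clear the alarm and drop any mask on it.
            self.active.pop(parameter, None)
            self.masked.discard(parameter)
        else:
            self.active[parameter] = value

    def mask(self, parameter: str) -> None:
        """Hide an active alarm on the UI until its condition goes away."""
        if parameter in self.active:
            self.masked.add(parameter)

    def visible(self) -> List[Tuple[str, str, float]]:
        """Active, unmasked alarms as (parameter, severity, value) tuples."""
        return [
            (p, self.ranges[p].severity, v)
            for p, v in sorted(self.active.items())
            if p not in self.masked
        ]
```

Because the mask is dropped as soon as the value returns to the good range, a later recurrence of the same alarm shows up again, which is the point of masking at UI level rather than deactivating the alarm.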

7 Alarm Screen: Questions
Q: Alarms, shifters were told to ignore?
►6 No
►2 Yes, but only during test runs
►3 Yes, but only for a very short time (few minutes)
►2 Yes
Q: Acknowledgement: shifter or expert action?
►8 experts only
►1 Only on expert instruction
►2 Only “non mission-critical” alarms
►1 Only for white-listed WENT alarms
Q: Mask: shifter or expert action?
►6 experts only
►3 Only on expert instruction
►3 Only for white-listed alarms
Q: How often did shifters ignore a critical (FATAL or ERROR) alarm?
►6 Never
►1 Single occurrence (hesitated to call during night)
►2 Few times
►1 25% (ALFA)
Q: Alarm authorization scheme should change (who is allowed to acknowledge, mask)?
►8 No
►2 No, but enforce expert notification (elog) on mask/acknowledge
►1 Shift Leaders should not be able to acknowledge
►1 Only experts should be able to mask
►1 Maybe
►2 No opinion

8 Web Alarm Help
Q: Usage?
►6 No
►6 Yes
►1 YES!
Q: Filled?
►2 100%
►4 > 50%
►1 most important ones
Concept
►For each alarm type a Twiki page is assigned by naming convention
►DCS help web portal: daily scan of DCS alarms in online system and Twiki server, fills portal page with
  ►Existing help pages grouped by description
  ►Not documented alarms suggesting Twiki page for creation
  ►Alarms with wrong descriptions
►Help portal for individual alarm reachable via direct alarm screen right-click → Twiki can be created from alarm screen or by systematically browsing the portal
Q: Experience?
►Quality varies, some shifters don’t know
Q: Not used because?
►2 Not necessary
►1 Not aware
►1 Manpower/had no strong need
►1 Small detector

9 Automatic Actions
Q: Did you rely on automatic DCS actions?
►8 Yes, used regularly
►2 Yes, but rarely or never triggered
►2 No
Q: Were automatic actions notified to shifter?
►7 Yes, in FSM or alarm screen
►1 MRS only
►1 No, experts only: log files, SMS, email…
Q: Shifter confusion due to automatism?
►2 Sometimes
►2 Rarely
►4 Never
Proposal for introducing new DCS tool:
►Central log viewer for DCS messages
►Currently, logs are only written to file but central message passing tool exists including archiving to Oracle
►ATLAS DCS log message types: Debug, Information, Action, Warning, Error, Fatal
►Could use for shifter feedback messages and define additional message classes, e.g. “UI Action”, “Procedure Start”, “Procedure End”, …
►Different LogViewer default configuration for ACR or Expert mode, Sub-det filters etc.
►DDV plugin exists → Web-accessible DCS logs
►Discussed already at DCS workshop last year
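To make the log viewer proposal more concrete, the following Python sketch shows how the six listed message types and per-mode default filters could fit together. The numeric ordering of the types, the `LogMessage` fields, and the `view` function are assumptions for illustration only; the slide gives just the type names and the idea of different default configurations.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable, List, Optional


class LogType(IntEnum):
    """The six message types from the slide; the ordering is an assumption."""
    DEBUG = 0
    INFORMATION = 1
    ACTION = 2
    WARNING = 3
    ERROR = 4
    FATAL = 5


@dataclass(frozen=True)
class LogMessage:
    subdet: str      # sub-detector name, e.g. "SCT"
    type: LogType
    text: str


def view(
    messages: Iterable[LogMessage],
    min_type: LogType,
    subdet: Optional[str] = None,
) -> List[LogMessage]:
    """Return messages at or above min_type, optionally for one sub-detector."""
    return [
        m
        for m in messages
        if m.type >= min_type and (subdet is None or m.subdet == subdet)
    ]
```

Under these assumptions, an ACR default might call `view(messages, LogType.ACTION)` to hide debug and information messages, while an expert configuration might use `view(messages, LogType.DEBUG, subdet="SCT")`.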

10 Beam Actions, LHC Handshake
Q: STANDBY/READY transition automatic?
►automatic: ALFA, MUON, SCT
►manual, expert action: PIX
Q: STABLE BEAMS transition time?
►MUON: 3~5 minutes
►PIX: About 3 min
►SCT: 60 seconds
Q: Automate LHC Handshake or keep manual SL action?
►1 Keep SL approval
►Rest: No strong opinion, but keep possibility of disabling automatic mode or have fallback in case of problems
[Figure labels: Train Injection, Energy Ramp, Collisions, Stable Beams; Pixel HV, Drift Chamber HV, Beam Energy, Luminosity, Beam Intensity, Stable Flag]

11 DCS Notifications: Questions
Q: Did you rely on DCS SMS notifications?
►10 Yes (FSM states, infrastructure alarms etc. all also available to shifters)
►2 No
Q: Did you rely on DCS email/Elog notifications?
►8 Yes (equal or less as SMS sending)
►4 No
Q: Why automatic notifications?
►3 Redundancy to shift operation (e.g. compensate missing shifter reaction)
►2 Timely reaction by experts on critical issues
►1 Make sure that correct expert is informed
►1 Wake up experts before shifter call
Q: Data loss due to notification failure?
►9 No
►1 Not sure
Q: Data loss due to no expert reaction to notification?
►9 No
►1 Not sure

12 DCS Notification Framework
Manage Notifications
►Generic tool to create/modify/manage SMS/email notifications via FSM screen (DCS experts)
►Define
  ►Addressing (who gets it) using phonebook search
  ►Flood protection (e.g. only 1 SMS per hour)
  ►Trigger conditions (e.g. alarms)
  ►Boundary conditions (stable beams = true)
►Generic handling of activation/deactivation
►Read-only browsing by non-experts (who sends me SMS???)
►Special mode: reports (e.g. send me all FSM nodes of type X which are disabled but OK on every weekday)
►Should facilitate debugging of notification problems by more than one expert
►Ready to be activated for your sub-det
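The four ingredients of a notification definition (addressing, flood protection, trigger condition, boundary condition) can be sketched in a few lines of Python. `NotificationRule`, its field names, and the snapshot keys `hv_trip` and `stable_beams` are hypothetical; the sketch only illustrates how such conditions might combine, not how the framework is built.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Snapshot = Dict[str, bool]  # assumed: a flat view of current DCS conditions


@dataclass
class NotificationRule:
    """Hypothetical SMS/email rule with flood protection."""
    recipient: str                            # addressing: who gets it
    trigger: Callable[[Snapshot], bool]       # e.g. an alarm is active
    boundary: Callable[[Snapshot], bool]      # e.g. stable beams = true
    min_interval_s: float = 3600.0            # flood protection: 1 per hour
    active: bool = True                       # activation/deactivation
    last_sent: Optional[float] = None

    def maybe_send(self, snapshot: Snapshot, now: float) -> bool:
        """Return True if a notification should go out at time `now` (seconds)."""
        if not self.active:
            return False
        if not (self.trigger(snapshot) and self.boundary(snapshot)):
            return False
        if self.last_sent is not None and now - self.last_sent < self.min_interval_s:
            return False  # suppressed by flood protection
        self.last_sent = now
        return True
```

A usage example under the same assumptions: `NotificationRule("expert", lambda s: s["hv_trip"], lambda s: s["stable_beams"])` would send at most one message per hour, and only while stable beams are declared.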

13 Remote Tools: Questions
Q: Need for additional remote DCS monitoring tools?
►9 No
►1 Application logs
►1 Smart phone apps?!
►1 Alarm acknowledgement log
[Figure: smartphone app mock-up “mySCT” with labels: On my way!, Redirect on-call, How to avoid panic, Call PL]

14 DDV: Alarm History
Web Alarm History
►Plugin of DcsDataViewer
►Query any time interval for DCS alarms
►Browsing by sub-det/system
►Easily handles tens of thousands of alarm events
►Powerful filtering functions for result
►Store search criteria (e.g. last 24h LAR HV trips) and call anytime using single URL
https://atlas-ddv.cern.ch

15 Discussion
Are we ready for a common DCS shifter?

