Control Room and Shift Operations: CMS Greg Rakness (CMS Deputy Run Coordinator) University of California, Los Angeles ATLAS Post-LS1 Operations Workshop.


1 Control Room and Shift Operations: CMS
Greg Rakness (CMS Deputy Run Coordinator), University of California, Los Angeles
ATLAS Post-LS1 Operations Workshop, CERN, 24 June 2013
https://indico.cern.ch/conferenceDisplay.py?confId=256916

2 The Compact Muon Solenoid (CMS) is as “big” as ATLAS
Not as big in the linear dimension, but certainly as big in data volume (and author list)…
So, some aspects of our situation may feel familiar…
24 June 2013, G. Rakness (UCLA)

3 Personnel

4 CMS shift hours
3 shifts per day (in sync with the LHC shifts)
– 07:00-15:00 “day”
– 15:00-23:00 “evening”
– 23:00-07:00 “night”
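The shift boundaries above can be written as a small lookup. This is only a sketch: the function name `shift_for` is hypothetical, and it assumes each shift includes its start time and excludes its end time, which the slide does not state.

```python
from datetime import time

# Shift boundaries as quoted on the slide (local time).
SHIFTS = [
    ("day", time(7, 0), time(15, 0)),
    ("evening", time(15, 0), time(23, 0)),
    ("night", time(23, 0), time(7, 0)),  # wraps past midnight
]

def shift_for(t: time) -> str:
    """Return the name of the shift covering local time t.

    Assumption: intervals are half-open, [start, end).
    """
    for name, start, end in SHIFTS:
        if start < end:
            if start <= t < end:
                return name
        elif t >= start or t < end:  # interval wraps past midnight
            return name
    raise ValueError(f"no shift covers {t}")  # not reached: shifts tile 24 h

print(shift_for(time(6, 59)))   # night
print(shift_for(time(7, 0)))    # day
print(shift_for(time(23, 30)))  # night
```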

5 CMS shift crew in the p5 control room
Beginning of 2010: 13 shifters during normal operations
– 8 subsystem shifters
– 5 central shifters (see below)
– 13 shifters x 3 shifts/day = 39 collaborators per day...
By the end of 2010: reduce from 13 to 5 “central” shifters
– Shift Leader
– Data Quality Monitor
– Trigger
– Data Acquisition
– Detector Control System (DCS)
When not running (e.g., overnight during Technical Stops, or now), minimum shift crew required when equipment is powered on in the experiment cavern…
– DCS (to monitor detector conditions)
– Shift Leader (at least 2 people needed for personnel safety)

6 CMS crew “beyond” the p5 control room
Run Field Managers
– Set the daily and weekly run plan and facilitate its execution
– Provide continuity from one shift to the next, advise shift leaders
– Lead the daily run meeting
– Communicate with Run Coordinators when issues or questions arise
– Another way to think of this role is “super shift-leader”
– Two Run Field Managers on-duty at all times
– Term = 3 weeks
Detector On-Call (DOC)
– If shift crew has a problem with a subsystem, they call the DOC…
– 15 DOCs on-duty at all times, one per subsystem
– Term ~ one week

7 Candidates and training
We specify a “preferred profile” for each central shifter (note: this is not strictly enforced)
– Run Field Manager: invited personally by run coordinator
– Shift Leader: certain level of seniority and experience
– DAQ: motivated to gain insight into a modern DAQ system
– Trigger: interest in trigger logic and web-based services
– DCS: experience with detector development, integration, and/or slow control
– DQM: experience with data analysis and/or detector performance assessment
Training done separately for each shift role
– Classroom training, some include practical test
– One block of training shifts

8 Filling Shifts
In order to fill central shifts, we had to separate the service work performed by CMS institutes into two categories
– Shifts
– Other service work
In order to achieve a satisfactory level of shifter experience, we require N shifts before a person’s credits apply as service work
– Normally N ~ 20
– Conversion from shifts to credits depends on the type of shifts taken
  E.g., weekend = 1.25 credits, night = 1.5 credits, weekend night = 2.0 credits…
Shift sign-up blocks constrained by CERN rules
– No more than 5 night shifts in a row
– At least one day rest in any 7-day period
– Any two shifts must be separated by at least 16 hours
– These rules are a (minor) source of complaints, mainly by those who travel to CERN specifically to perform shifts
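The three sign-up rules can be checked mechanically. The sketch below is not the CMS shift tool; `check_block` and its messages are hypothetical names. It assumes 8-hour shifts starting at 07:00, 15:00 and 23:00, that the 16-hour separation is measured from the end of one shift to the start of the next, that a shift belongs to the calendar day on which it starts, and that "in a row" means night shifts starting on consecutive days.

```python
from datetime import date, datetime, time, timedelta

START_HOUR = {"day": 7, "evening": 15, "night": 23}  # shift start times
SHIFT_LENGTH = timedelta(hours=8)                    # assumed shift length

def start_of(day: date, kind: str) -> datetime:
    """Start time of a shift, attributed to the day on which it begins."""
    return datetime.combine(day, time(START_HOUR[kind]))

def check_block(shifts):
    """shifts: list of (date, kind). Return a list of violated rules."""
    ordered = sorted(shifts, key=lambda s: start_of(*s))
    problems = []

    # Rule: any two shifts separated by at least 16 hours (end to next start).
    for (d1, k1), (d2, k2) in zip(ordered, ordered[1:]):
        gap = start_of(d2, k2) - (start_of(d1, k1) + SHIFT_LENGTH)
        if gap < timedelta(hours=16):
            problems.append("less than 16 h between shifts")
            break

    # Rule: no more than 5 night shifts in a row.
    run, last_night = 0, None
    for d, k in ordered:
        if k != "night":
            run, last_night = 0, None
            continue
        run = run + 1 if last_night == d - timedelta(days=1) else 1
        last_night = d
        if run > 5:
            problems.append("more than 5 night shifts in a row")
            break

    # Rule: at least one rest day in any 7-day period.
    worked = {d for d, _ in ordered}
    if any(all(d + timedelta(days=i) in worked for i in range(7)) for d in worked):
        problems.append("no rest day within 7 days")

    return problems

monday = date(2013, 6, 24)
six_nights = [(monday + timedelta(days=i), "night") for i in range(6)]
print(check_block(six_nights))  # ['more than 5 night shifts in a row']
```

Under these assumptions, taking the same shift on consecutive days leaves exactly 16 hours between shifts, so it passes the separation rule; an evening shift followed by the next morning's day shift does not.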

9 Shift statistics in 2012-2013
Plot annotations: minimum quota 32 credits ~ 21 shifts; minimum quota 21 shifts
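The credit weights quoted on the Filling Shifts slide (weekend = 1.25, night = 1.5, weekend night = 2.0) can be turned into a small calculator. This is a sketch under assumptions: the function name `shift_credits` is hypothetical, a weekday day or evening shift is assumed to be worth 1.0 credit (the slides do not say), and a shift counts as "weekend" if it starts on a Saturday or Sunday.

```python
from datetime import date

def shift_credits(day: date, kind: str) -> float:
    """Credits for one shift, using the weights quoted on the slides."""
    weekend = day.weekday() >= 5  # Saturday = 5, Sunday = 6
    night = kind == "night"
    if weekend and night:
        return 2.0
    if night:
        return 1.5
    if weekend:
        return 1.25
    return 1.0  # assumption: baseline weekday day/evening shift

# 24 June 2013 was a Monday; 29 June 2013 was a Saturday.
block = [
    (date(2013, 6, 24), "night"),
    (date(2013, 6, 25), "night"),
    (date(2013, 6, 29), "day"),
    (date(2013, 6, 29), "night"),
]
print(sum(shift_credits(d, k) for d, k in block))  # 6.25
```

If the annotation "32 credits ~ 21 shifts" is read as an equivalence, it implies an average of about 32/21 ≈ 1.5 credits per shift, which is close to the night-shift weight.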

10 Subsystem personnel
Operations Manager
– If Run Coordinators have a request or question about a subsystem, they call the Operations Manager…
– Typical term >~ 1 year
Detector On-Call (DOC)
– If shift crew has a problem with a subsystem, they call the DOC…
– 15 DOCs on-duty at all times representing all critical systems
– Rotate ~once per week
On-call experts
– If the DOC or Operations Manager has a problem, she calls the subsystem expert…
– Experts are “free” to act on the system remotely in case of problem (no strict access control to the CMS network)
These roles are filled within each subsystem

11 Transportation of crew
CMS is on the other side of LHC
– Rely on the CERN shuttle to transport shifters between Meyrin and p5 (45 minute ride)
– http://cern.ch/ShuttleService
Figure labels: ATLAS, ALICE, LHCb, CMS

12 CMS control room
Subsystem area
– Since p5 is so far away, experts tend to stay in the control room longer when they are there
Central area
– Focus of activity during standard operations
Figure labels: PIX, TRK, CSC, DT, RPC, HCAL, ECAL, Alignment, DCS, BRM, DAQ, SL, TRG, DQM, Magnet

13 Let’s compare the ATLAS and CMS control rooms…
Apparently the “Compact” in CMS describes both the detector and the control room…
http://acr.web.cern.ch/acr/ACR_Layout.htm
Figure labels: CMS, ATLAS

14 CMS Centre
Located at Bldg. 354 Meyrin
Computers, meeting rooms, tables, coffee nearby…
Location of offline Data Quality Monitoring shifts
Also used for some “analysis marathons” before major conferences…

15 Meetings
Daily 9:30 meeting at point 5
– Focus on previous 24 hours and following 24 hours
– LHC report from Run Coordinators, CMS overall report by Run Field Manager, round table report from each subsystem DOC
– Meet 7 days per week during LHC running, even during Technical Stops (canceled on weekends/holidays if not needed)
– If the 8:30 LHC meeting runs long, we have to rush to point 5 in order to make it to the 9:30 CMS meeting…
Weekly Run Meeting at Meyrin
– Summary of the week, topical discussions, longer term planning
– Normally attended by Operations Managers, but expect that any CMS collaborator might attend this meeting…
We use the same Vidyo booking for both meetings

16 Operation

17 CMS lifecycle defined by the LHC fill
The users of the LHC modes will include…
– Experiments…
– The modes are also used by the Detector Control System (DCS)…
“The mode will be made available by a number of channels. These will include… DIP”
From https://edms.cern.ch/document/1070479...

18 Things not automated
Shift leader checklist (twiki)
– A number of items must be done by the shift crew depending on the state of the machine
White board (20th century technology)
– We have found this is still the best way to…
  communicate short-term instructions from shift-to-shift
  remember the CCC phone number

19 Monitoring and alarms
Over the years, system monitoring and alarms were often implemented in an ad-hoc way to expeditiously satisfy specific needs…
– E.g., DCS alarms are different from DAQ alarms
– Found that audio alarms are an effective way to alert the shift crew of a crucial problem (set threshold correctly)
Presently working to overhaul system…
– … rationalize information into the database
– … factorize source from display
– … more easily establish cause-effect
– Timescale: 2015

20 Evolution to automation
New in 2012: detector HV-state fully based on Machine/Accelerator mode
2015 plan: Run Settings (clock, trigger, thresholds, …) to be fully based on Machine/Accelerator mode
It’s true: it is inefficient when humans touch the system…
Plot: time between “Stable Beams” and silicon tracker ON (min); annotation: HV turn on automated
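The idea of deriving the HV state from the machine mode can be illustrated with a toy state follower. This is a sketch only, not the CMS DCS implementation: `target_hv_state`, `HVFollower` and the "STANDBY" state name are hypothetical, the only mode taken from the slide is "Stable Beams", and the other mode strings are illustrative. How the mode actually reaches the detector control system (for example via DIP) is not modelled here.

```python
def target_hv_state(beam_mode: str) -> str:
    """Map a machine mode to the HV state wanted for a detector.

    Assumption: HV is ON only in Stable Beams and in a safe
    standby state otherwise.
    """
    return "ON" if beam_mode.strip().upper() == "STABLE BEAMS" else "STANDBY"

class HVFollower:
    """Tracks the HV state and logs every automatic transition."""

    def __init__(self):
        self.state = "STANDBY"
        self.log = []  # (mode, old_state, new_state)

    def on_mode_change(self, beam_mode: str) -> str:
        wanted = target_hv_state(beam_mode)
        if wanted != self.state:
            self.log.append((beam_mode, self.state, wanted))
            self.state = wanted
        return self.state

follower = HVFollower()
for mode in ["ADJUST", "STABLE BEAMS", "BEAM DUMP"]:  # illustrative sequence
    print(mode, "->", follower.on_mode_change(mode))
```

Because the transition is triggered by the mode change itself, the delay between Stable Beams and HV ON no longer depends on how quickly a shifter reacts, which is the point the slide's plot makes.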

21 Automated soft error recovery
Radiation from proton collisions causes single event effects in detector electronics
– Well-known phenomena accounted for in design of CMS
– Impact of effects ranges from not noticeable to stopping the run
– Started to become an issue with increasing luminosity in 2011…
2012: full commissioning of automatic soft error recovery
– Depending on the error and the system, this is done via hardware or software means
This will remain an issue for the rest of the lifetime of CMS
– Systems continue to automate recovery from known problems

22 What does a typical fill look like?
Look at the last two fills before the LHCC in Dec.
Fill 3363: 163.8/pb recorded
  97.0% data recording efficiency
  2 stops of data taking (manual)
  3 software recoveries (automatic)
  578 hardware recoveries (automatic)
Fill 3370: 74.8/pb recorded
  97.6% data recording efficiency
  0 stops of data taking (manual)
  1 software recovery (automatic)
  281 hardware recoveries (automatic)
In 2010, each error would have required manual intervention…
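The numbers for the two fills allow a quick back-of-the-envelope check. The sketch below assumes that "data recording efficiency" means recorded divided by delivered luminosity, and it counts manual stops and automatic recoveries together as "interventions"; both are interpretations, and the function name `summarize` is hypothetical.

```python
# Per-fill numbers as quoted on the slide.
fills = {
    3363: {"recorded_pb": 163.8, "efficiency": 0.970,
           "manual": 2, "software": 3, "hardware": 578},
    3370: {"recorded_pb": 74.8, "efficiency": 0.976,
           "manual": 0, "software": 1, "hardware": 281},
}

def summarize(fill: dict) -> dict:
    """Derive delivered luminosity and the share of automatic recoveries."""
    automatic = fill["software"] + fill["hardware"]
    total = automatic + fill["manual"]
    delivered = fill["recorded_pb"] / fill["efficiency"]  # assumed definition
    return {
        "delivered_pb": delivered,
        "lost_pb": delivered - fill["recorded_pb"],
        "automatic_fraction": automatic / total,
    }

for number, fill in fills.items():
    s = summarize(fill)
    print(f"Fill {number}: ~{s['delivered_pb']:.1f}/pb delivered, "
          f"~{s['lost_pb']:.1f}/pb lost, "
          f"{100 * s['automatic_fraction']:.1f}% of interventions automatic")
```

Under these assumptions, fill 3363 had 581 of 583 interventions handled automatically and fill 3370 had all 282 handled automatically.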

23-26 Running efficiency per year
CMS recorded 92.2% of 44/pb in 2010…
… then 90.5% of 6/fb in 2011…
… then 93.5% of 23/fb in 2012…
This high number was the result of a lot of hard work by a lot of smart people!
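The yearly figures can be combined into recorded luminosities and an overall efficiency. This is a rough sketch: it assumes each percentage is recorded divided by delivered luminosity for that year, and it takes the rounded values quoted on the slides at face value, so the results are only approximate.

```python
# (year, recording efficiency, delivered luminosity in /fb) as quoted;
# 44/pb in 2010 is written as 0.044/fb.
years = [
    (2010, 0.922, 0.044),
    (2011, 0.905, 6.0),
    (2012, 0.935, 23.0),
]

def recorded_fb(efficiency: float, delivered_fb: float) -> float:
    """Recorded luminosity, assuming efficiency = recorded / delivered."""
    return efficiency * delivered_fb

total_delivered = sum(d for _, _, d in years)
total_recorded = sum(recorded_fb(e, d) for _, e, d in years)
combined = total_recorded / total_delivered  # luminosity-weighted efficiency

for year, eff, delivered in years:
    print(f"{year}: ~{recorded_fb(eff, delivered):.3f}/fb recorded of {delivered}/fb")
print(f"2010-2012 combined: {100 * combined:.1f}%")  # 92.9%
```

The combined value is dominated by 2012 because that year accounts for most of the delivered luminosity.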

