Control Room and Shift Operations: CMS Greg Rakness (CMS Deputy Run Coordinator) University of California, Los Angeles ATLAS Post-LS1 Operations Workshop.

Presentation transcript:

Control Room and Shift Operations: CMS
Greg Rakness (CMS Deputy Run Coordinator)
University of California, Los Angeles
ATLAS Post-LS1 Operations Workshop
CERN, 24 June 2013

The Compact Muon Solenoid (CMS) is as “big” as ATLAS
Not as big in the linear dimension, but certainly as big in data volume (and author list)…
So, some aspects of our situation may feel familiar…
24 June 2013, G. Rakness (UCLA)

Personnel

CMS shift hours
3 shifts per day (in sync with the LHC shifts)
– 07:00-15:00 “day”
– 15:00-23:00 “evening”
– 23:00-07:00 “night”
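As a small illustration of the shift boundaries listed above, the following sketch maps a clock time to the shift that covers it. The function name `shift_name` and the `SHIFTS` table are hypothetical, introduced only for this example; the times themselves come from the slide.

```python
from datetime import time

# Shift boundaries as listed on the slide (start inclusive, end exclusive).
SHIFTS = [
    ("day", time(7, 0), time(15, 0)),
    ("evening", time(15, 0), time(23, 0)),
]


def shift_name(t: time) -> str:
    """Return the shift ("day", "evening" or "night") that covers clock time t."""
    for name, start, end in SHIFTS:
        if start <= t < end:
            return name
    # Everything else (23:00-07:00) wraps past midnight and is the night shift.
    return "night"
```

For example, `shift_name(time(23, 30))` falls through both daytime ranges and is classified as a night shift.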

CMS shift crew in the p5 control room
Beginning of 2010: 13 shifters during normal operations
– 8 subsystem shifters
– 5 central shifters (see below)
– 13 shifters x 3 shifts/day = 39 collaborators per day...
By the end of 2010: reduce from 13 to 5 “central” shifters
– Shift Leader
– Data Quality Monitor
– Trigger
– Data Acquisition
– Detector Control System (DCS)
When not running (e.g., overnight during Technical Stops, or now), minimum shift crew required when equipment is powered on in the experiment cavern…
– DCS (to monitor detector conditions)
– Shift Leader (at least 2 people needed for personnel safety)

CMS crew “beyond” the p5 control room
Run Field Managers
– Set the daily and weekly run plan and facilitate its execution
– Provide continuity from one shift to the next, advise shift leaders
– Lead the daily run meeting
– Communicate with Run Coordinators when issues or questions arise
– Another way to think of this role is “super shift-leader”
– Two Run Field Managers on-duty at all times
– Term = 3 weeks
Detector On-Call (DOC)
– If shift crew has a problem with a subsystem, they call the DOC…
– 15 DOCs on-duty at all times, one per subsystem
– Term ~ one week

Candidates and training
We specify a “preferred profile” for each central shifter (note: this is not strictly enforced)
– Run Field Manager: invited personally by run coordinator
– Shift Leader: certain level of seniority and experience
– DAQ: motivated to gain insight into a modern DAQ system
– Trigger: interest in trigger logic and web-based services
– DCS: experience with detector development, integration, and/or slow control
– DQM: experience with data analysis and/or detector performance assessment
Training done separately for each shift role
– Classroom training, some include practical test
– One block of training shifts

Filling Shifts
In order to fill central shifts, we had to separate the service work performed by CMS institutes into two categories
– Shifts
– Other service work
In order to achieve a satisfactory level of shifter experience, we require N shifts before a person’s credits apply as service work
– Normally N  20
– Conversion from shifts to credits depends on the type of shifts taken
  E.g., weekend = 1.25 credits, night = 1.5 credits, weekend night = 2.0 credits…
Shift sign-up blocks constrained by CERN rules
– No more than 5 night shifts in a row
– At least one day rest in any 7-day period
– Any two shifts must be separated by at least 16 hours
– These rules are a (minor) source of complaints, mainly by those who travel to CERN specifically to perform shifts
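The credit conversion and the sign-up constraints above can be sketched as a small checker. This is only an illustration under stated assumptions, not the actual CMS shift tool: an ordinary weekday day or evening shift is assumed to be worth 1.0 credit (the slide does not say), a shift is classified as "weekend" by the calendar date on which it starts, shifts are taken to be 8 hours long (from the shift hours listed earlier), the 16-hour rule is read as the gap between the end of one shift and the start of the next, and a "rest day" is a calendar date on which no shift starts. All function and variable names are hypothetical.

```python
from datetime import date, datetime, time, timedelta

# Start times from the shift-hours slide; each shift lasts 8 hours.
START = {"day": time(7), "evening": time(15), "night": time(23)}
SHIFT_LENGTH = timedelta(hours=8)
MIN_GAP = timedelta(hours=16)
MAX_NIGHTS_IN_A_ROW = 5


def credits(day: date, kind: str) -> float:
    """Credits for one shift, using the conversion examples from the slide."""
    weekend = day.weekday() >= 5  # Saturday or Sunday (by start date: assumption)
    night = kind == "night"
    if weekend and night:
        return 2.0
    if night:
        return 1.5
    if weekend:
        return 1.25
    return 1.0  # assumption: weekday day/evening shifts count as 1 credit


def violations(block):
    """Check a list of (date, kind) shifts against the three sign-up rules."""
    shifts = sorted(block, key=lambda s: datetime.combine(s[0], START[s[1]]))
    problems = set()
    night_run = 0  # consecutive night shifts on consecutive dates
    day_run = 0    # consecutive calendar dates with at least one shift
    prev_day = prev_kind = prev_end = None
    for day, kind in shifts:
        start = datetime.combine(day, START[kind])
        adjacent = prev_day is not None and (day - prev_day).days == 1
        same_day = prev_day is not None and day == prev_day
        if prev_end is not None and start - prev_end < MIN_GAP:
            problems.add("less than 16 hours between shifts")
        if kind == "night":
            night_run = night_run + 1 if (adjacent and prev_kind == "night") else 1
        else:
            night_run = 0
        if not same_day:
            day_run = day_run + 1 if adjacent else 1
        if night_run > MAX_NIGHTS_IN_A_ROW:
            problems.add("more than 5 night shifts in a row")
        if day_run >= 7:
            problems.add("no rest day within 7 days")
        prev_day, prev_kind, prev_end = day, kind, start + SHIFT_LENGTH
    return sorted(problems)
```

A block of five weekday day shifts passes all three checks, while an evening shift followed by a day shift the next morning leaves only 8 hours between them and is flagged.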

Shift statistics in
[Figure annotations: “Minimum quota 32 credits ~ 21 shifts”; “Minimum quota 21 shifts”]

Subsystem personnel
These roles are filled within each subsystem
Operations Manager
– If Run Coordinators have a request or question about a subsystem, they call the Operations Manager…
– Typical term >~ 1 year
Detector On-Call (DOC)
– If shift crew has a problem with a subsystem, they call the DOC…
– 15 DOCs on-duty at all times representing all critical systems
– Rotate ~once per week
On-call experts
– If the DOC or Operations Manager has a problem, she calls the subsystem expert…
– Experts are “free” to act on the system remotely in case of problem (no strict access control to the CMS network)

Transportation of crew
CMS is on the other side of LHC
– Rely on the CERN shuttle to transport shifters between Meyrin and p5 (45 minute ride)
[Map labels: ATLAS, ALICE, LHCb, CMS]

CMS control room
Subsystem area
– Since p5 is so far away, experts tend to stay in the control room longer when they are there
Central area
– Focus of activity during standard operations
[Floor-plan labels: PIX, TRK, CSC, DT, RPC, HCAL, ECAL, Alignment, DCS, BRM, DAQ, SL, TRG, DQM, Magnet]

Let’s compare the ATLAS and CMS control rooms…
[Photos: CMS, ATLAS]
Apparently the “Compact” in CMS describes both the detector and the control room…

CMS Centre
Located at Bldg. 354 Meyrin
Computers, meeting rooms, tables, coffee nearby…
Location of offline Data Quality Monitoring shifts
Also used for some “analysis marathons” before major conferences…

Meetings
Daily 9:30 meeting at point 5
– Focus on previous 24 hours and following 24 hours
– LHC report from Run Coordinators, CMS overall report by Run Field Manager, round table report from each subsystem DOC
– Meet 7 days per week during LHC running, even during Technical Stops (canceled on weekends/holidays if not needed)
– If the 8:30 LHC meeting runs long, we have to rush to point 5 in order to make it to the 9:30 CMS meeting…
Weekly Run Meeting at Meyrin
– Summary of the week, topical discussions, longer term planning
– Normally attended by Operations Managers, but expect that any CMS collaborator might attend this meeting…
We use the same Vidyo booking for both meetings

Operation

CMS lifecycle defined by the LHC fill
The users of the LHC modes will include…
– Experiments…
– The modes are also used by the Detector Control System (DCS)…
From “The mode will be made available by a number of channels. These will include… DIP”

Things not automated
Shift leader checklist (twiki)
– A number of items must be done by the shift crew depending on the state of the machine
White board (20th century technology)
– We have found this is still the best way to…
  communicate short-term instructions from shift-to-shift
  remember the CCC phone number

Monitoring and alarms
Over the years, system monitoring and alarms were often implemented in an ad-hoc way to expeditiously satisfy specific needs…
– E.g., DCS alarms are different from DAQ alarms
– Found that audio alarms are an effective way to alert the shift crew of a crucial problem (set threshold correctly)
Presently working to overhaul system…
– … rationalize information into the database
– … factorize source from display
– … more easily establish cause-effect
– Timescale:

Evolution to automation
New in 2012: detector HV-state fully based on Machine/Accelerator mode
2015 plan: Run Settings (clock, trigger, thresholds, …) to be fully based on Machine/Accelerator mode
[Figure: time between “Stable Beams” and silicon tracker ON (min); annotation: “HV turn on automated”]
It’s true: it is inefficient when humans touch the system…
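The idea of deriving the detector HV state from the machine mode can be sketched as a pure mapping from mode to target state. This is a minimal illustration, not the CMS DCS logic: the state names `ON` and `STANDBY`, the command strings, and the rule that only the “Stable Beams” mode allows HV on are assumptions made for the example; how the mode is actually received (e.g., via DIP) is left out.

```python
# Assumption for this sketch: HV may be fully on only in "STABLE BEAMS".
MODES_ALLOWING_HV_ON = {"STABLE BEAMS"}


def desired_hv_state(beam_mode: str) -> str:
    """Map a machine mode string to the HV state the detector should be in."""
    normalized = beam_mode.strip().upper()
    return "ON" if normalized in MODES_ALLOWING_HV_ON else "STANDBY"


def hv_command(beam_mode: str, current_state: str):
    """Return the (hypothetical) command to issue, or None if nothing to do."""
    target = desired_hv_state(beam_mode)
    if target == current_state:
        return None
    return f"GO_{target}"
```

Because the decision is a function of the mode alone, no shifter action sits between the mode change and the HV transition, which is the point the slide makes about removing human latency.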

Automated soft error recovery
Radiation from proton collisions causes single event effects in detector electronics
– Well-known phenomena accounted for in design of CMS
– Impact of effects range from not noticeable to stopping the run
– Started to become an issue with increasing luminosity in 2011…
2012: full commissioning of automatic soft error recovery
– Depending on the error and the system, this is done via hardware or software means
This will remain an issue for the rest of the lifetime of CMS
– Systems continue to automate recovery from known problems
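One way to picture “automate recovery from known problems” is a lookup table of known errors with a fallback to manual intervention for anything unrecognized. The sketch below is purely illustrative: the subsystem and error names in `KNOWN_RECOVERIES` are invented for the example and do not describe the real CMS recovery mechanism.

```python
from collections import Counter

# Hypothetical table: (subsystem, error) -> kind of automatic recovery.
KNOWN_RECOVERIES = {
    ("CSC", "fpga_config_lost"): "hardware",
    ("TRK", "readout_out_of_sync"): "software",
}


def recovery_kind(subsystem: str, error: str) -> str:
    """Return "hardware" or "software" for known errors, else "manual"."""
    return KNOWN_RECOVERIES.get((subsystem, error), "manual")


def summarize(events):
    """Count how many events of a fill were handled by each kind of recovery."""
    return Counter(recovery_kind(subsystem, error) for subsystem, error in events)
```

A per-fill summary of this kind separates automatic hardware recoveries, automatic software recoveries, and cases that still need manual action. Adding an entry to the table is what moves a problem from the manual category to an automatic one.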

What does a typical fill look like?
Look at the last two fills before the LHCC in Dec.
Fill 3363: 163.8/pb recorded
– 97.0% data recording efficiency
– 2 stops of data taking (manual)
– 3 software recoveries (automatic)
– 578 hardware recoveries (automatic)
Fill 3370: 74.8/pb recorded
– 97.6% data recording efficiency
– 0 stops of data taking (manual)
– 1 software recovery (automatic)
– 281 hardware recoveries (automatic)
In 2010, each error would have required manual intervention…

Running efficiency per year
CMS recorded 92.2% of 44/pb in 2010…
… then 90.5% of 6/fb in 2011…
… then 93.5% of 23/fb in 2012…
This high number was the result of a lot of hard work by a lot of smart people!
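The yearly figures above can be turned into recorded luminosities with recorded = efficiency x delivered. The short sketch below does that arithmetic and also forms a luminosity-weighted efficiency over the three years; the combined number is not quoted on the slides and depends on the rounded inputs shown there. The variable and function names are hypothetical.

```python
# (year, delivered luminosity in /fb, recording efficiency) from the slides.
# 44/pb is converted to 0.044/fb so that all years share one unit.
YEARS = [
    (2010, 0.044, 0.922),
    (2011, 6.0, 0.905),
    (2012, 23.0, 0.935),
]


def recorded(delivered_fb: float, efficiency: float) -> float:
    """Recorded luminosity in /fb."""
    return delivered_fb * efficiency


def overall_efficiency(years) -> float:
    """Luminosity-weighted efficiency: total recorded / total delivered."""
    total_delivered = sum(delivered for _, delivered, _ in years)
    total_recorded = sum(recorded(delivered, eff) for _, delivered, eff in years)
    return total_recorded / total_delivered


for year, delivered, eff in YEARS:
    print(f"{year}: {recorded(delivered, eff):.3f} /fb recorded")
print(f"2010-2012 combined: {overall_efficiency(YEARS):.1%}")
```

With these rounded inputs the combined efficiency comes out at roughly 92.9%, dominated by 2012 because that year delivered most of the luminosity.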