Draft of talk to be given in Madrid: CSC Operations Summary Greg Rakness University of California, Los Angeles CMS Run Coordination Workshop CIEMAT, Madrid.

Draft of talk to be given in Madrid: CSC Operations Summary
Greg Rakness, University of California, Los Angeles
CMS Run Coordination Workshop, CIEMAT, Madrid, 3 Nov 2011
https://indico.cern.ch/conferenceDisplay.py?confId=156187

CSC downtime at the beginning and end of 2011 running…

CSC downtime reduced after fix of CSC response to Resync
  – Since 21 July, CSC no longer has data corruption problems
Since 21 July, the biggest cause of CSC downtime is SEU’s
  – Currently, an SEU makes CSC go into TTS=Out-of-sync…
  – … which is fixed by the DAQ shifter pushing TTCHardReset
CSC to do: make the correct request for SEU recovery
  – Should go to TTS=Error

11 March – 21 July: Int. lumi = 1.3/fb, CSC downtime = 13/nb
21 July – 30 Oct: Int. lumi = 3.9/fb, CSC downtime = 7/nb
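To compare the two periods, the downtime can be expressed as a fraction of the integrated luminosity. The sketch below assumes the quoted downtime figures are integrated luminosity lost to CSC, and uses the unit conversion 1/fb = 10^6/nb; the function and variable names are illustrative only.

```python
# Unit conversion: 1/fb = 1e6/nb.
NB_PER_FB = 1.0e6


def downtime_fraction(lumi_inv_fb, downtime_inv_nb):
    """Fraction of the integrated luminosity lost to CSC downtime.

    Assumes the downtime is quoted as lost integrated luminosity.
    """
    return downtime_inv_nb / (lumi_inv_fb * NB_PER_FB)


# (integrated luminosity in /fb, CSC downtime in /nb), as quoted on the slide
periods = {
    "11 March - 21 July": (1.3, 13.0),
    "21 July - 30 Oct": (3.9, 7.0),
}

for label, (lumi, lost) in periods.items():
    print(f"{label}: {downtime_fraction(lumi, lost):.1e}")
```

Under that assumption the lost fraction is about 1e-5 before 21 July and about 2e-6 afterwards, i.e. roughly a factor of five lower.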

Preparation for high data rates

CSC data event size
  – With ALCT zero-suppression, CSC event size would be reduced by ~15-20%
      Requires firmware update
  – Under discussion to implement during Year-End Tech. Stop
      https://hypernews.cern.ch/HyperNews/CMS/get/csc-ops/2057.html
  – If CSC needs still more bandwidth, it is possible to increase the number of CSC SLinks to 16 or 36
HV trip threshold to be increased in inner rings
  – See next slide for more on HV…

Automation plan: CSC Expert System

Goal: reduce load on CSC DOC by using the CSC Expert System
First successful implementation: recovery from HV channel trips
  – 2010: HV trips had to be reset by DCS shifter
  – 2011: HV trips handled automatically by the Expert System
Next goal: automated firmware maintenance
  – Recall daily task: CSC DOC reloads the program on CFEB’s and ALCT’s whose EEPROM’s have flipped a bit
      Problem discovered in 2008
      Recovery procedure by the DOC has been reduced to a few clicks…
  – Example numbers: between Aug 2010 – Aug 2011…
      ~33% of ALCT’s needed to be touched (~150 ALCT’s)
      ~13% of CFEB’s needed to be touched (~300 CFEB’s)
      See https://indico.cern.ch/conferenceDisplay.py?confId=160517
  – Timescale: during 2012 run

Fraction of active CSC channels

Annotations on plot:
  – HV distribution board died
  – D-link board failure caused loss of the configurations of a few chambers
Diligence will be needed to preserve the high fraction of active CSC channels throughout 2012

A few items with impact beyond CSC

Question: if a subdetector has data corruption (e.g., sync-lost draining), is it possible to get a dump of the offending event?

Aging online computers

All the VME control computers in CMS are more than 4 years old
  – Expect them to die at any moment…
To facilitate the reinstallation of online software, CSC installs its RPM’s with quattor

“You’re calling the wrong DOC”

The following scenario has happened twice: a calorimeter is misconfigured, causing…
  → L1A rate = 105kHz (pre-deadtime rate = 700kHz)
  → CSCTF in 80% Warning (nobody can run at 700kHz)
  → Shift crew calls CSC…

Readout in this scenario:
  L1 Trigger rate = 105kHz
  Pre-deadtime rate = 700kHz
  Deadtime = 80%
  DAQ backpressure = False

It might be worth making one screen that has this information… (green if OK, red if bad)
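A minimal sketch of such a one-screen summary follows. The class and function names and the three limits are hypothetical placeholders chosen only for illustration; they are not actual CMS settings or existing tools.

```python
from dataclasses import dataclass

# Hypothetical limits, for illustration only; not actual CMS settings.
MAX_L1A_KHZ = 100.0
MAX_PRE_DEADTIME_KHZ = 100.0
MAX_DEADTIME_PCT = 5.0


@dataclass
class TriggerStatus:
    l1a_rate_khz: float
    pre_deadtime_rate_khz: float
    deadtime_pct: float
    daq_backpressure: bool


def colour(ok):
    """Return 'green' if the quantity is OK, 'red' if it is bad."""
    return "green" if ok else "red"


def summary(status):
    """Map each of the four quantities to a green/red state."""
    return {
        "L1 Trigger rate": colour(status.l1a_rate_khz <= MAX_L1A_KHZ),
        "Pre-deadtime rate": colour(
            status.pre_deadtime_rate_khz <= MAX_PRE_DEADTIME_KHZ
        ),
        "Deadtime": colour(status.deadtime_pct <= MAX_DEADTIME_PCT),
        "DAQ backpressure": colour(not status.daq_backpressure),
    }


# The scenario quoted on the slide.
scenario = TriggerStatus(105.0, 700.0, 80.0, False)
for name, state in summary(scenario).items():
    print(f"{name}: {state}")
```

With these placeholder limits, the scenario shows the two rates and the deadtime in red and the DAQ backpressure in green, which is the kind of at-a-glance information that could help the shift crew decide whom to call.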

Monitor timing more regularly

Plot: CSC Average Time (nsec); legend: ECAL – 0.325ns, CSC
  CSC = average wire time per chamber, averaged over all chambers
  ECAL = average EB time per crystal, averaged over all crystals
CSC and ECAL see similar changes in timing
  → CSC needs to automate the monitoring of our timing…
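The two-level average in the definitions above (first per chamber or crystal, then over all of them) can be sketched as follows. The chamber names and time values are made up for illustration, and the function is not taken from any CSC or ECAL software.

```python
from statistics import mean


def average_time(times_by_unit):
    """Average per unit (chamber or crystal) first, then over all units."""
    per_unit = [mean(times) for times in times_by_unit.values() if times]
    return mean(per_unit)


# Made-up wire times in ns, keyed by hypothetical chamber names.
wire_times = {
    "chamber_A": [1.0, 3.0],
    "chamber_B": [0.0, 0.0, 0.0, 4.0],
}

print(average_time(wire_times))
```

Averaging in two steps gives every chamber the same weight regardless of how many hits it recorded, which differs in general from a flat average over all hits.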

Summary of CSC Operations

Highlights from 2011 running include…
  – High fraction of live channels
  – Small amount of deadtime
  – Automatic HV trip recovery
Preparations for 2012 running include…
  – Correct request for SEU recovery
  – Reduction of event size
  – Automation of EEPROM reload
Some “CSC worries” which are “CMS worries,” too…
  – Diagnosing (rare) data corruption
  – Worry about age of VME control computers
  – Calling the right DOC at the right time
  – Monitoring of timing

Backup

Goals for workshop

Review the experience from the 2011 operations for the proton running and to plan for operations in 2012
We are asking speakers to comment on the following topics (as relevant):
  – Experience with beam operation in 2011
  – Summary of major data loss sources
  – Readiness for high pile up, ~25ns, and 6e33 luminosity
  – Readiness for 25 ns
  – Requests for CMS re-commissioning before LHC start up, e.g. cosmics
  – Plans for 2012 operation
  – Availability of experts
  – Are you affected by SEUs? If so, how and what is required to recover
  – Experience with shifts, on-call load
  – Are you using standard tools (quattor) for software maintenance?
