CMS Emu Meeting, Dec. 6, 2008
Electronics Long Term Operations: What we learned from Electronics Commissioning
G. Rakness, U.C.L.A., Dec 6, 2008



CMS CSC System is Huge
There is a tendency to forget the size of this system.
- ~400,000 channels
- >17,000 electronics boards
- 60 remote VME crates
- ~5,500 skew clear cables, over a million shielded conductors
- 1,400 gigabit optical fibers
This system has been cabled and commissioned in less than 11 months!

Turning on the Electronics
PCrate sequential LV power-up: major improvement, late October (Sytnik)
This ensures that the Proms properly load the FPGAs.
1) Power up DMB/TMB
2) Power up VMECC
3) Power up CCB/MPC
It is essential that DCS monitoring is turned off during the sequence. THERE IS NO AUTOMATIC WAY TO DO THIS IN DCS!
This works well, but there are rare problems.
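The power-up order and the requirement to pause monitoring can be sketched as follows. This is only an illustration under stated assumptions: `FakeCrate`, `monitoring_paused`, and `sequential_power_up` are hypothetical names, not part of DCS or the CSC online software, and the crate object merely records what it was asked to do.

```python
from contextlib import contextmanager

# Power-up order taken from the slide: DMB/TMB, then VMECC, then CCB/MPC.
POWER_UP_ORDER = [("DMB", "TMB"), ("VMECC",), ("CCB", "MPC")]


class FakeCrate:
    """Hypothetical stand-in for a peripheral crate; it only records actions."""

    def __init__(self):
        self.log = []
        self.monitoring = True

    def set_monitoring(self, enabled):
        self.monitoring = enabled
        self.log.append("monitoring=on" if enabled else "monitoring=off")

    def power_on(self, board):
        # The slide requires monitoring to be off while boards power up.
        if self.monitoring:
            raise RuntimeError("monitoring must be off during power-up")
        self.log.append(f"power_on:{board}")


@contextmanager
def monitoring_paused(crate):
    """Turn monitoring off for the block, then restore the previous state."""
    previous = crate.monitoring
    crate.set_monitoring(False)
    try:
        yield crate
    finally:
        # Restored even if a power-on step raises.
        crate.set_monitoring(previous)


def sequential_power_up(crate, order=POWER_UP_ORDER):
    """Power on each group in order with monitoring paused; return the log."""
    with monitoring_paused(crate):
        for group in order:
            for board in group:
                crate.power_on(board)
    return crate.log
```

Wrapping the sequence in a context manager is one way to guarantee that monitoring comes back on even when a step fails, which is the part the slide says cannot currently be automated.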

Peripheral Crate Power-up Problems
1) Problem: VMECC fails to program
   Solution:
   a) renegotiate gigabit link (shut down switch port via software PCSwitches)
   b) recycle power on slot (this presently takes ~five minutes using the DCS GUI; THIS HAS TO BE AUTOMATED!)
2) Problem: Netgear Gigabit Switch CPU locks out VMECC
   Solution:
   a) recycle switch power supply with new remote AC power switch (ssh)
3) Problem: TMB or DMB fails to program
   Solution:
   a) TTC hard reset (1/2 detector)
   b) CCB hard reset (whole crate)
   c) worst case (rare): power cycle DMB/TMB slot (2 slots)
   There is a run-around problem here: one would like to reset only the problem DMB or TMB.
4) Almost zero Prom programming loss observed
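The recovery steps for a TMB or DMB that fails to program form an escalation: try each remedy in the listed order and stop as soon as the board is healthy. A minimal sketch of that pattern is below. `recover`, `FakeBoard`, and `escalation_for` are hypothetical names for illustration, and the actions are placeholders rather than real TTC, CCB, or DCS calls.

```python
def recover(actions, is_healthy):
    """Apply recovery actions in order until the health check passes.

    actions: list of (name, callable) pairs, in the order they should be tried.
    is_healthy: callable returning True once the board is programmed.
    Returns (names of the actions applied, final health status).
    """
    applied = []
    for name, action in actions:
        if is_healthy():
            break
        action()
        applied.append(name)
    return applied, is_healthy()


class FakeBoard:
    """Hypothetical board that only programs after one particular action."""

    def __init__(self, fixed_by):
        self.fixed_by = fixed_by
        self.programmed = False

    def apply(self, name):
        if name == self.fixed_by:
            self.programmed = True


def escalation_for(board):
    """Build the slide's escalation list for a TMB/DMB that fails to program."""
    names = ["TTC hard reset", "CCB hard reset", "power cycle DMB/TMB slot"]
    # Bind each name as a default argument so every lambda keeps its own value.
    return [(name, (lambda n=name: board.apply(n))) for name in names]
```

For example, with `board = FakeBoard("CCB hard reset")`, calling `recover(escalation_for(board), lambda: board.programmed)` applies the TTC hard reset and the CCB hard reset and never reaches the slot power cycle.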

Front End Board Power-up Problems
FE Board LV power-up: switch on LV individually through DMB using LVMB
- Power-on problems are rare. Almost all are due to the infamous Erased Prom problem.
- CFEBs and ALCTs occasionally lose Prom data on power-up. This is rare, typically less than 1 in 458 ALCTs and 2300 CFEBs.
- Prom readback shows roughly equal numbers of Proms with one bit flip (1->0) and with no bit flips relative to the loaded data. (A typical Prom readback has millions of bits.) The 1->0 flip suggests charge loss on the gate.
- Solution: automatically detect problem Proms and reload firmware. This was successfully implemented in late November.
CCB initialization: resets TTC signal communications, e.g. hard resets
- This has been a bit problematic. Debugging possibly needed?
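Detecting a problem Prom comes down to comparing the readback against the loaded image and counting flipped bits by direction. The sketch below shows one way to do that comparison. It is not the implementation deployed in late November: `classify_readback` and `needs_reload` are hypothetical names, and the images are assumed to be available as equal-length byte strings.

```python
def classify_readback(expected: bytes, readback: bytes):
    """Count bit flips between the loaded image and the Prom readback.

    Returns (one_to_zero, zero_to_one): bits that were 1 in the loaded image
    but read back as 0, and bits that were 0 but read back as 1.
    """
    if len(expected) != len(readback):
        raise ValueError("images must have the same length")
    one_to_zero = 0
    zero_to_one = 0
    for exp_byte, read_byte in zip(expected, readback):
        diff = exp_byte ^ read_byte  # set bits mark positions that changed
        one_to_zero += bin(diff & exp_byte).count("1")
        zero_to_one += bin(diff & read_byte).count("1")
    return one_to_zero, zero_to_one


def needs_reload(expected: bytes, readback: bytes) -> bool:
    """Flag the Prom for a firmware reload if any bit differs."""
    return classify_readback(expected, readback) != (0, 0)
```

Separating the two flip directions matters because the slide's charge-loss hypothesis predicts flips in one direction only (1->0).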

Problems during Global/Local Data Taking
Global/Local Data Taking
- The electronics seems to just work on good boards. We have tested hard reset response (FPGA reload, reset, and Flash memory constant loads) and have never seen a problem.
- Rarely, a VMECC loses gigabit communications.
  Solution:
  a) renegotiate gigabit link (shut down switch port via software PCSwitches)
  b) recycle power on slot (this presently takes ~five minutes using the DCS GUI; THIS HAS TO BE AUTOMATED!)
- Rarely, a DMB or TMB loses VME communications
  - data/trigger operation unaffected
  - long period with no DCS access
  - this is under study; we have no explanation
  - only fixed on hard reset for a new run

Problems during Global/Local Data Taking
Failures that Require Board Replacement
- VMECC, DMB, TMB, CCB, and MPC failures are rare. These boards are easily accessible and are fixed within hours.
- FED DDU and DCC failures are even rarer. They are swapped out within minutes if needed.
- F.E. Board failures require access. Boards we discovered with problems last February have still not been replaced.
LVDB Fuses
- Rarely, ALCT and CFEB LVDB fuses blow. These are extremely difficult to replace.
- Earlier this year, one could blow an LVDB fuse by programming the ALCT with bad firmware. This has been fixed in software and is believed to be impossible now.
- There has been a random, unexplained source of blown fuses over the last six months.

Problems during Global/Local Data Taking
- ~4 ALCT fuses need replacing
- ~2 LVDB-CFEB fuses need replacing
- Two of the ALCT fuses blew on separate chambers on the same night! We presently have no idea of the source of these failures.
Sudden LV Power Loss on Peripheral Crate
There are electronics problems that can only be explained by sudden short-term power loss to peripheral crates:
- DDU has registered 9 FMM errors instantaneously in one crate
- MPC has been observed to go into power-up mode
These seem to have decreased in frequency since mid-summer.
There is no DCS voltage history available. Having one would help greatly in debugging and understanding this problem.
Solution: restart run

Failed Boards needing Replacement
Other Long-term Board Failures
ME1/1
- A third of the long-term board problems have occurred on ME1/1 CSCs.
- The ME1/1 group has shown data suggesting that many of these are skew clear cable related.
- ME1/1 skew clear cables have a patch panel. Damaged connectors suspected.
- ME1/1 skew clear cables are at the length limit of the technology.
- 4/72 ALCT problems, 9/360 CFEB problems
Other Chamber Board Failures (non-ME1/1)
- ~11/396 ME1/2,3 ME2, ME3, ME4/1 ALCT boards need replacement
- ~19/1908 ME1/2,3 ME2, ME3, ME4/1 CFEB boards need replacement, although some of these are skew clear cable related
- Systematic repairs of replaced boards have shown no repeat problems.
- We have had few boards with long-term failures to autopsy.
- Biggest problems are still on chamber.
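The board counts quoted above can be turned into failure fractions, and they allow a quick check of the "a third on ME1/1" statement. The short calculation below uses only the numbers from the slide, treating the approximate counts (~11, ~19) as exact. The dictionary layout and function names are illustrative choices.

```python
from fractions import Fraction

# (region, board type) -> (boards with problems, boards installed), from the slide.
FAILURES = {
    ("ME1/1", "ALCT"): (4, 72),
    ("ME1/1", "CFEB"): (9, 360),
    ("non-ME1/1", "ALCT"): (11, 396),
    ("non-ME1/1", "CFEB"): (19, 1908),
}


def failure_rates(failures):
    """Return the fraction of problem boards for each (region, board type)."""
    return {key: Fraction(bad, total) for key, (bad, total) in failures.items()}


def share_of_failures(failures, region):
    """Return the fraction of all problem boards that sit in the given region."""
    all_bad = sum(bad for bad, _total in failures.values())
    region_bad = sum(
        bad for (reg, _board), (bad, _total) in failures.items() if reg == region
    )
    return Fraction(region_bad, all_bad)
```

With these inputs, ME1/1 accounts for 13 of the 43 listed problem boards, about 30%, which is consistent with the "a third" quoted on the slide. The per-board rates are 4/72 (about 5.6%) for ME1/1 ALCTs against 11/396 (about 2.8%) for the other ALCTs.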

FED Crate Problems
- The Monster Event problem showed filtering problems on the DCC and on the global DAQ group's slink mezzanine boards. Through collaboration, the problem was eliminated on both sides.
- No single-board DDU or DCC problems seen.
- Software thread loading problem solved in September.
- DDUs report problems from other boards. The problems are on the other boards. "Don't kill the messenger."
Online Computer Problems
The online software runs on 16 computers. Known problems:
1) Problem: On power-up, a random number of machines don't boot
   Solution: Hand recycling power on machines. Although not optimal, ACPI cards are expensive and are reportedly flaky.

Computer Problems Encountered
2) Problem: Farm machine overheating alarms
   Solution: fans with 3x air volume installed
3) Problem: Farm machine eth_hook drivers have problems after weeks of running
   Solution: patches to the gigabit driver seem to have removed the problem
4) Problem: DCS machine drivers don't work after several days
   Solution: XMAS monitoring seems to have solved the problems
5) Problem: We do not manage the computers
   A motherboard was recently swapped on a farm machine; 9 days and 10s of NFS mounting problems later, the machine is still unusable.
   Solution: Eric Cano et al. are overworked. This is their problem, since we don't have root privileges on USA-owned machines ???!?
   - 2 spare machines live, configured and connected $$$$
   - space for 1 2U machine in USC ???