TE-MPE-CP, RD, 06-Oct-2011: Radiation Induced Faults in QPS Systems during LHC run 2011. R. Denz, TE-MPE Technical Meeting, October 6th.

Slide 2: Outline
- Introduction
- Radiation induced fault statistics 2011
- Fault analysis, mitigation and consolidation measures
  - Measures taken during LHC run 2011
  - Proposals for Xmas break 2011/2012
  - Proposals for LS1
- Summary

This presentation contains 23 slides and some jokes.

Slide 3: Introduction
- Due to functional requirements, a significant amount of QPS and EE equipment is exposed to radiation during LHC operation
  - The radiation load depends on the location and on LHC exploitation
- QPS and EE equipment locations
  - LHC tunnel
    - Main magnet protection, nQPS, some 13 kA EE systems (e.g. point 3)
    - Effects seen during LHC runs 2010 and 2011
  - Partly shielded areas (RR13, 17, 53, 57, 73, 77, UJ14, 16, 56)
    - IPQ, IPD, IT, 600 A protection, EE 600 A, EE 13 kA
    - Effects seen during LHC run 2011
    - Additional shielding for UJ14 and UJ16 during the Xmas break ;-))
  - Protected areas (UA23, 27, 43, 47, 63, 67, 83, 87, UJ33)
    - IPQ, IPD, IT, 600 A protection, EE 600 A, EE 13 kA
    - No confirmed radiation induced fault observed so far
    - Relocation during LS1

Slide 4: Fault statistics
- Fault analysis has to be done very carefully, as not all problems are related to radiation
  - Other causes: equipment faults, EMC, "friendly fire", bad connections, virtual equipment, circuit breakers, real triggers (very rare but not excluded)
  - In addition, there remain some doubtful cases where the exact cause of the trip cannot be determined
  - Enhanced diagnostic capabilities would be helpful, e.g. diagnostics for the power abort loops linking PC, PIC and QPS
- Confirmed radiation induced faults are regularly reported to the R2E project for inclusion in its statistics
  - Radiation-to-electronics problems are also discussed in the RADWG
  - Technical notes are compiled for selected events

Slide 5: Radiation induced fault statistics 2011

Slide 6: Radiation induced fault statistics 2011: spurious triggers

System | Locations
--- | ---
DQQDI (IPQ, IPD, IT) | UJ14, UJ16 (2x), RR53
DQQDG (600 A) | UJ14 (2x), UJ16 (2x), RR17, RR73, RR77 (3x)
nQPS (splice protection) | B8L1, B11L5, B9L8

Flat top?

Slide 7: Radiation induced fault statistics 2011: spurious triggers

Detection system type | Exposed systems | Radiation induced spurious triggers
--- | --- | ---
DQQDL (MB & MQ protection, analog, radiation tolerant) | 4032 | 0
DQQDS (MB & MQ protection, digital, radiation tolerant) | 1632 | 0
DQQDG (600 A, digital, partly hardened) | 250 out of 836 | 9 (3.6 %)
DQQDI,T (IPQ, IPD, IT, digital, partly hardened) | 138 out of 408 | 4 (2.9 %)
DQQBS (nQPS splice protection, partly hardened) | 2068 | 3 (0.15 %)
DQQDC (HTS lead protection, partly hardened) | 508 out of 1198 | 0

- DQQDG and DQQDI,T are hardware-equivalent and differ only in firmware
- DQQBS and DQQDC are hardware-equivalent and differ only in firmware
- DQQBS and DQQDC have on-board redundancy A/B (two interlock channels)
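
The percentages in the last column are simply spurious triggers divided by exposed units. The short Python sketch below reproduces them from the counts in the table (only the partly hardened systems with non-zero triggers are included); the dictionary and function names are illustrative. The total of 16 agrees with the 16 beam dumps quoted in the summary.

```python
# Counts from the table above: (exposed units, radiation induced spurious triggers).
COUNTS = {
    "DQQDG": (250, 9),
    "DQQDI,T": (138, 4),
    "DQQBS": (2068, 3),
}


def trigger_fraction_percent(exposed: int, triggers: int) -> float:
    """Spurious triggers per exposed unit, expressed in percent."""
    return 100.0 * triggers / exposed


if __name__ == "__main__":
    for system, (exposed, triggers) in COUNTS.items():
        print(f"{system}: {trigger_fraction_percent(exposed, triggers):.2f} %")
    # Sum of spurious triggers over the partly hardened systems listed above
    print("total spurious triggers:", sum(t for _, t in COUNTS.values()))
```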

Slide 8: Radiation induced fault statistics 2011: fault types

Equipment | Faults total | Faults remotely cleared or transparent | Faults requiring access | Faults causing a beam dump | Electronic component and failure mode(s)
--- | --- | --- | --- | --- | ---
DQAMC (DAQ system, local bus) | 91 | 72 | 19 | 0 | ISO150™ digital isolator, upset in capacitive transmission path
DQAMC (DAQ system, fieldbus) | 4 | 0 | 4 | 0 | uFIP™ fieldbus coupler
DQQDG (quench detection system 600 A) | 9 | 0 | 0 | 9 | SDRAM or DSP, program execution stalled and triggering watchdog, digital filter corruption (bit flips)
DQQDI,T (quench detection system IPQ, IPD, IT) | 4 | 0 | 0 | 4 |
DQQBS (nQPS splice protection) | 6 | 2 | 0 | 3 | ADuC834™, internal RAM, external SRAM, ADC register stuck, digital filter corruption

Slide 9: Radiation induced faults: technology versus hardness

Equipment | Processors | Remark
--- | --- | ---
DQQDL_STP | ECC83, EL84 | Under study, rack powering and cooling to be revised
DQQDL, nDQQDL | ADuC812, ADuC831 (DAQ only) | Microcontrollers used only for DAQ; detection part analog and classic digital logic
DQQDS, nDQQDI, nDQQDG, nDQAMG | ProASIC 3E A3PE1500 | FPGA configuration stored in FLASH, triplicate logic
DQAMC, DQAMG, DQAMS (fieldbus couplers) | ADuC831, ADuC841, uFIP | Program executed from FLASH, standard logic
DQQBS, DQQDC | ADuC834 | Program executed from FLASH, standard logic
DQQDG, DQQDI,T | TMS320C6211 | Program stored in FLASH but executed from SDRAM, standard logic

Slide 10: Radiation induced fault statistics 2011: conclusions
- While most radiation induced faults are transparent to LHC operation, the number of beam dumps caused by spurious triggers is close to reaching the maximum admissible limit
  - Consolidation measures are to be applied as early as the Xmas break 2011/2012
- Enhanced shielding for UJ14 and UJ16 will be beneficial but will not cure all problems
- The main problem lies with the DSP-based quench detectors, originally developed for radiation-free areas
  - Consolidation work was already launched in 2008
  - The symmetric quench detection board is the first result of these efforts: fully satisfying performance during LHC operation, one man-year of work required for R&D
  - While the technological challenges have been mastered, the lack of resources remains a major problem

[Image caption: Rather hypothetical scenario, to be avoided nevertheless …]

Slide 11: Mitigation and consolidation measures: DAQ systems
- Firmware upgrade for DQAMCMB and DQAMCMQ as a first mitigation measure
  - Deployment to be completed during the next TS
  - 88.5 % (1437 units) done so far, including all MB
  - The upgrade includes a 3-out-of-4 condition for MB quench heater power supply availability (no injection inhibit in case of loss of one power supply)
- Full consolidation requires a hardware upgrade (new board)
  - The incriminated chip is located on the quench detection board type DQQDL; a replacement has already been used successfully on the DQQDS board
  - Design completed (Joaquim), prototype expected for 10/2011
  - Production covering DS areas by 02/2012; procurement of components started
- Replacement of the fieldbus coupler chip (MicroFip™) by NanoFip (CERN)
  - The new chip is neither hardware- nor software-compatible
  - Significant development and integration work to be done
  - First fieldbus segments to be upgraded during LS1
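
The 3-out-of-4 availability condition is an instance of a generic k-out-of-n check. The Python sketch below assumes, purely for illustration, that the four quench heater power supplies of one MB are reported as simple ok/not-ok flags; the function names are hypothetical and do not reflect the actual DQAMC firmware.

```python
from typing import Sequence


def k_out_of_n(flags: Sequence[bool], k: int) -> bool:
    """True if at least k of the given flags are set."""
    return sum(1 for f in flags if f) >= k


def heater_supplies_available(supply_ok: Sequence[bool]) -> bool:
    """Hypothetical availability check for the 4 quench heater power supplies of one MB.

    With a 3-out-of-4 condition, the loss of a single supply no longer
    inhibits injection; the loss of two supplies still does.
    """
    if len(supply_ok) != 4:
        raise ValueError("expected exactly 4 quench heater power supply flags")
    return k_out_of_n(supply_ok, 3)
```

For example, `heater_supplies_available([True, True, True, False])` returns `True`, whereas the previous all-four requirement would have inhibited injection in that case.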

Slide 12: Mitigation and consolidation measures: nQPS splice protection
- Firmware upgrade
  - Triplication of digital filters and other modifications
  - Expected to cure a significant fraction, but not all, of the faults
  - Development to be completed 10/2011; partial deployment during the Xmas break (half cells 8 to 11 around IP1, 2, 5 and 8)
  - One test slot in CNRAD still available for type testing
- Additional shielding
  - Proposed by R2E, to be considered for half cells 8 to 11 around IP1 and 5; option currently being evaluated, installation not yet confirmed
  - 16 locations, ~14 tons of steel
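
Triplicating a digital filter means keeping three copies of its state and voting before each use, so that a single bit flip in one copy is out-voted. The Python sketch below illustrates the idea on a hypothetical first-order integer low-pass filter; it is a generic triple-modular-redundancy example, not the actual DQQBS firmware, and the filter form and `shift` parameter are assumptions.

```python
def majority3(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 vote: each result bit takes the value held by at least two copies."""
    return (a & b) | (a & c) | (b & c)


class TriplicatedLowPass:
    """Hypothetical first-order integer low-pass filter y += (x - y) >> shift
    whose state is held in three copies and voted before every update."""

    def __init__(self, shift: int = 4) -> None:
        self.shift = shift
        self.copies = [0, 0, 0]

    def update(self, x: int) -> int:
        y = majority3(*self.copies)   # a single corrupted copy is out-voted
        y += (x - y) >> self.shift    # filter step on the voted state
        self.copies = [y, y, y]       # scrubbing: rewrite all three copies
        return y
```

Rewriting all three copies after every step keeps upsets from accumulating; the scheme still fails if two copies are corrupted in the same bit between two updates.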

Slide 13: Mitigation and consolidation measures: nQPS splice protection
- Hardware upgrade
  - Technology evaluated; two possible options:
    - FPGA-based version using a high-resolution ADC; an additional radiation test campaign for the ADC is desirable
    - Standard technology with optimised firmware and modified evaluation logic, using three instead of two redundant processors and majority voting (introduction of the "famous" board C); this option could be implemented on a relatively short timescale but requires a more detailed study
  - Design in 2012 → installation in hot zones during LS1 or even in 2012
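
Adding board C allows a majority vote among three processors instead of relying on two. The Python sketch below contrasts the two evaluation rules; it assumes, as a hypothetical for illustration only, that with two channels either one can trigger on its own (1-out-of-2), and the function names are invented.

```python
def vote_1oo2(a: bool, b: bool) -> bool:
    """Assumed present behaviour: either of two channels triggers on its own."""
    return a or b


def vote_2oo3(a: bool, b: bool, c: bool) -> bool:
    """Trigger only if at least two of the three channels (boards A, B, C) agree."""
    return (a and b) or (a and c) or (b and c)


if __name__ == "__main__":
    # One channel disturbed by radiation, no real quench:
    print(vote_1oo2(True, False))         # True: spurious trigger
    print(vote_2oo3(True, False, False))  # False: out-voted by the two healthy channels
```

A real quench seen by at least two healthy channels still triggers under 2-out-of-3, so the gain in availability does not require giving up protection against a single failed channel.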

Slide 14: [Figure: QPS crate location, IP direction (cells 7, 6,…) and arc direction (cells 9, 10, 11…), with shielding upstream of the QPS crate. Detailed location of affected racks to be verified! Slide courtesy M. Brugger.]

Slide 15: Mitigation and consolidation measures: IPQ, IPD and IT protection
- New digital quench detection systems type nDQQDI
  - Similar to the symmetric quench detection board developed for nQPS; the core is a flash-based FPGA ProASIC™ A3PE1500
  - Board design and firmware development by Jens (= QPS FPGA guru)
  - The new board is (of course) not fully compatible with the previous version; some specialist work is required to integrate it into QPS supervision
  - 200 boards including spares required for consolidation of UJ14, 16, 56, RR13, 17, 53, 57
- By the way: the detection threshold for IPQs, especially Q9 and Q10, should be revised as well. Is 200 mV, 10 ms acceptable?
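
A threshold of 200 mV with 10 ms discrimination means the voltage must stay above the threshold for at least 10 ms before a trigger is issued, so short spikes are ignored. The Python sketch below shows that logic on uniformly sampled data; the sampling period, the function name and the exact reset-on-dropout behaviour are assumptions, not the nDQQDI implementation.

```python
import math
from typing import Optional, Sequence


def detect_quench(samples_v: Sequence[float], dt_s: float,
                  threshold_v: float = 0.200, t_disc_s: float = 0.010) -> Optional[int]:
    """Return the index of the sample at which a trigger would be issued, or None.

    A trigger requires |U| > threshold_v on consecutive samples covering at least
    t_disc_s; any sample back below the threshold resets the discrimination timer.
    """
    needed = math.ceil(round(t_disc_s / dt_s, 9))  # samples needed, robust to float noise
    count = 0
    for i, u in enumerate(samples_v):
        if abs(u) > threshold_v:
            count += 1
            if count >= needed:
                return i
        else:
            count = 0
    return None
```

With an assumed 1 ms sampling period, ten consecutive samples above 200 mV are needed; a single sample below threshold in between restarts the count.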

Slide 16: Mitigation and consolidation measures: IPQ, IPD and IT protection

Task | Status
--- | ---
Board design | Done
Technical review | Done
Prototypes I (5 units) | Done
Firmware development | Done, only minor modifications expected
Radiation test in CNRAD | Test successfully passed
Prototypes II (10 units) | In preparation (Vincent)
System integration: adaptation of QPS low level supervision (DQAMG firmware update) | Started and advancing well; tedious but no showstoppers so far
Adaptation of gateway application for new commands | Done
Type tests | Started, OK so far
Procurement of components | Started
Production of 200 boards and follow-up | Pending
Installation and QPS-IST | Pending, during Xmas break 2011/2012

Slide 17: Mitigation and consolidation measures: 600 A protection
- New digital quench detection systems type nDQQDG
  - Similar to the nDQQDI board developed for nQPS; the core is a flash-based FPGA ProASIC™ A3PE1500 or A3PE3000
  - The high dynamic range of the current reading requires a high-resolution ADC or a complex digital-to-analog feedback circuit: fast high-resolution 24-bit ΣΔ ADC TI ADS1271, whose modulator part has been successfully radiation-tested by TE-EPC
  - The firmware is far more complex than for nDQQDI: complex digital filter system including non-linear filters; numerical derivative of the current and look-up tables for the circuit inductance; the algorithms are well known, but the transfer to FPGA is not trivial
  - Board design and especially the firmware development to be done by Jens
  - 300 boards including spares required for consolidation of UJ14, 16, 56, RR13, 17, 53, 57, 73, 77
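
The numerical derivative of the current and the inductance look-up table serve to subtract the inductive voltage from the measured one, leaving the resistive part, U_res = U - L(I)·dI/dt, which indicates a quench. The Python sketch below illustrates this under stated assumptions: the inductance table values are invented, interpolation is piecewise linear with clamping, and the derivative is a simple backward difference; the non-linear filters of the real nDQQDG firmware are not modelled.

```python
from bisect import bisect_right
from typing import List, Sequence

# Hypothetical look-up table: current (A) -> differential circuit inductance (H).
I_TABLE = [0.0, 200.0, 400.0, 600.0]
L_TABLE = [0.50, 0.48, 0.42, 0.35]


def inductance(i_a: float) -> float:
    """Piecewise-linear interpolation in the look-up table, clamped at both ends."""
    if i_a <= I_TABLE[0]:
        return L_TABLE[0]
    if i_a >= I_TABLE[-1]:
        return L_TABLE[-1]
    k = bisect_right(I_TABLE, i_a)  # I_TABLE[k-1] <= i_a < I_TABLE[k]
    i0, i1 = I_TABLE[k - 1], I_TABLE[k]
    l0, l1 = L_TABLE[k - 1], L_TABLE[k]
    return l0 + (l1 - l0) * (i_a - i0) / (i1 - i0)


def resistive_voltage(u_v: Sequence[float], i_a: Sequence[float], dt_s: float) -> List[float]:
    """U_res[n] = U[n] - L(I[n]) * dI/dt, with a backward-difference dI/dt (n >= 1)."""
    out = []
    for n in range(1, len(i_a)):
        didt = (i_a[n] - i_a[n - 1]) / dt_s
        out.append(u_v[n] - inductance(i_a[n]) * didt)
    return out
```

During a purely inductive ramp the result stays near zero, so a detection threshold can be applied to it regardless of the ramp rate.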

Slide 18: Mitigation and consolidation measures: 600 A protection

Task | Status
--- | ---
Board design | Started
Technical review | To be decided
Prototypes I (5 units) | Pending
Firmware development | Started
Radiation test in CNRAD | Not required
Prototypes II (10 units) | Pending
System integration: adaptation of QPS low level supervision | Pending
Adaptation of gateway application for new commands | Done
Type tests | Pending
Procurement of components | Pending
Production of 300 boards and follow-up | Pending
Installation and QPS-IST | Pending, at the earliest at the end of the Xmas break 2011/2012

Slide 19: Mitigation and consolidation: current and new baseline

Device | R&D (current baseline) | Deployment (current baseline) | R&D (new proposal) | Deployment (new proposal)
--- | --- | --- | --- | ---
nDQQDI | 2011 | Partial 2012 | 2011 | Full, Xmas break 2011/2012
nDQQDG | 2012 | LS1 | 2011/12 | Early 2012
nDQQBS | 2012 | LS1 | 2011/12 | Mid 2012
nDQQDL | 2011 | Partial, Xmas break 2011/2012 | 2011 | Partial, Xmas break 2011/2012
nDQAMC (NanoFip CERN) | 2012 | LS1 | 2012 | LS1

Slide 20: Mitigation and consolidation: resources
- Financial resources
  - nDQQDI boards: ~300 CHF per board → 60 kCHF
  - nDQQDG boards: ~350 CHF per board → 105 kCHF
  - nDQQDL boards: ~200 CHF per board → 40 kCHF (production of 200 boards, which also serve as spares)
  - nDQQBS boards: ~300-400 CHF per board (redundant circuit board)
  - NanoFip boards: ~300-400 CHF per board
- Production
  - The lead time for many components is critical (e.g. A3PE3000 > 20 weeks); (pre-emptive) ordering already started
  - To be planned in detail, and firms to be selected very carefully; it is recommended to use known good suppliers only
  - Production follow-up could eventually be outsourced
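
The budget figures follow from unit price times number of boards: 200 nDQQDI, 300 nDQQDG and 200 nDQQDL boards, as quoted on this and the earlier slides. The Python sketch below reproduces them; nDQQBS and NanoFip boards are left out because only a price range and no quantity is given, and the names are illustrative.

```python
# (approximate unit price in CHF, number of boards including spares)
BOARDS = {
    "nDQQDI": (300, 200),
    "nDQQDG": (350, 300),
    "nDQQDL": (200, 200),
}


def cost_kchf(unit_price_chf: float, quantity: int) -> float:
    """Total cost in kCHF for a given unit price and quantity."""
    return unit_price_chf * quantity / 1000.0


if __name__ == "__main__":
    for board, (price, qty) in BOARDS.items():
        print(f"{board}: {qty} x ~{price} CHF = ~{cost_kchf(price, qty):.0f} kCHF")
    total = sum(cost_kchf(p, q) for p, q in BOARDS.values())
    print(f"total for these three board types: ~{total:.0f} kCHF")
```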

Slide 21: Mitigation and consolidation: resources (the controversial slide)
- Manpower
  - Most of the work is reserved to QPS specialists; there is no gain in outsourcing, as the necessary transfer of information would itself require a substantial specialist contribution. Production can be outsourced if a knowledgeable workforce is available
  - With the present baseline, i.e. no installation of nDQQDG and nDQQDB boards in 2012, the current assignment of activities does not need to be changed. The schedule for the nDQQDI board is very tight but still feasible
  - If an upgrade of the 600 A protection systems in 2012 is regarded as mandatory, resources need to be re-assigned. The FPGA specialist must be relieved of other tasks as much as possible (the number of FPGA specialists should be increased as well...)
  - The good news: installation and QPS-IST are estimated at 1-2 days (2 specialists working) per concerned area (installation during a TS is feasible)

Slide 22: Mitigation and consolidation: outlook LS1
- Relocation of all QPS equipment installed in UJ14, 16 and 56
- Installation of nQPS for IPQ, IPD and IT
- Hardware upgrades for 600 A protection
  - Upgrades started in 2012 to be completed
- Hardware upgrade of nQPS splice protection
  - Scope to be defined
- Consolidation of DAQ systems
  - NanoFip (CERN)
  - ISO150™ replacement
- Change of detector evaluation logic
  - E.g. 2 out of 3 instead of 1 out of 1
  - Significant change of QPS systems, to be studied in more detail
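
To see why a 2-out-of-3 evaluation can reduce spurious triggers without weakening protection, consider an idealised model: assume three independent, identical channels, each producing a spurious trigger with probability p and missing a real quench with probability q over a given interval. Common-cause effects, such as all channels sitting in the same radiation field, are ignored, so this is an illustration rather than a reliability assessment.

```latex
\begin{aligned}
P_{\text{spur}}^{1\text{oo}1} &= p, &
P_{\text{spur}}^{2\text{oo}3} &= 3p^{2}(1-p) + p^{3} = 3p^{2} - 2p^{3},\\
P_{\text{miss}}^{1\text{oo}1} &= q, &
P_{\text{miss}}^{2\text{oo}3} &= 3q^{2}(1-q) + q^{3} = 3q^{2} - 2q^{3}.
\end{aligned}
```

For example, p = 10^-2 gives 3·10^-4 - 2·10^-6 ≈ 2.98·10^-4, a reduction of spurious trips by a factor of about 34. Both probabilities improve over a single channel as long as p, q < 1/2, since 3x^2 - 2x^3 < x is equivalent to (2x - 1)(x - 1) > 0.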

Slide 23: Summary
- During LHC run 2011, 116 confirmed radiation induced faults have been observed so far
  - 16 beam dumps due to radiation induced spurious triggers
- Fault analysis must be done very carefully before drawing conclusions
- So far only soft errors, i.e. no destructive faults, have been observed
- None of the observed events caused a total loss of magnet and/or circuit protection
  - Redundancy of the protection systems is essential
- Solutions for mitigation and consolidation have been elaborated, and deployment has started in some cases
  - Priority is given to events requiring access to the LHC or causing beam dumps
- In order to keep radiation induced spurious QPS triggers in 2012 at a reasonable level, some consolidation measures have to be implemented as early as the coming Xmas break
  - Adequate resources have to be assigned

