Radiation Tolerance of an SRAM based FPGA Used in a Large Tracking Detector


1 Radiation Tolerance of an SRAM based FPGA Used in a Large Tracking Detector
Ketil Røed (1,2,3), Johan Alme (2), Dominik Fehlker (2), H. Helstrup (1), Matthias Richter (2), Kjetil Ullaland (2), Dieter Röhrich (2)
1. Bergen University College 2. University of Bergen 3. CERN

2 Outline
Main focus: a reconfiguration solution applied to reduce the probability of functional failures due to SEUs.
Introduction & background
System description
Testing & Results

3 ALICE: A Large Ion Collider Experiment
TPC RCU

4 Challenge
Make use of commercial SRAM based FPGAs for data readout in the TPC radiation environment.
Physics: Nuclear interaction
Effect: Single Event Upset (SRAM cell value 1 → 0 or 0 → 1)
Consequence: Functional failure

5 Failure prediction
Various SEU cross section results (29*, 63**, 180 MeV p***, mixed E n****): 2 - 4 x cm2 / bit
FPGAs exposed to a hadron flux of particles /cm2s* (n, π, p; E > 10 MeV)
Failure prediction for all 216 FPGAs and a 4 hour run SEUs
Conservatively, only 1 out of every 10 config. bits is used*****
Functional failures: 2 – 8
Main points:
It is a realistic scenario to have functional failures during a run.
However, this does not say anything about what type of failure to expect (it could be serious or have no effect at all).
A system was developed to reduce the probability of experiencing failure, with a test procedure to study the effect of mitigation (fault injection).
It can also be used to study failure signatures.
* K. Røed, Bergen University College, PhD thesis, to be published
** H. Quinn, Radiation-induced multi-bit upsets in SRAM-based FPGAs, IEEE Transactions on Nuclear Science, 52(6):2455-2461, Dec. 2005
*** G. Tröger, KIP Uni. Heidelberg, PhD thesis, to be published
**** Lesea et al., The Rosetta Experiment, IEEE Trans. on Device and Materials Reliability, vol. 5, no. 3, 2005
***** Using an SEUPI (Single Event Upset Probability Impact) = 10***
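The arithmetic behind this kind of prediction can be sketched as below. The sketch assumes the usual estimate (expected SEUs = cross section per bit × flux × number of configuration bits × number of devices × exposure time), followed by a division by the SEUPI factor of 10 quoted on the slide. The 216 FPGAs and the 4 hour run come from the slide; the cross section, flux and bit count in the example call are hypothetical placeholders, not the values used in the talk.

```python
def expected_seus(cross_section_cm2_per_bit, flux_per_cm2_s,
                  config_bits, n_fpgas, run_seconds):
    """Expected number of configuration upsets over one run.

    Assumes a constant flux and that every bit is equally likely to be hit.
    """
    return (cross_section_cm2_per_bit * flux_per_cm2_s
            * config_bits * n_fpgas * run_seconds)


def expected_functional_failures(n_seus, seupi=10):
    """Scale SEUs to functional failures: 1 in `seupi` upsets is assumed to matter."""
    return n_seus / seupi


if __name__ == "__main__":
    # Hypothetical placeholder inputs, chosen only to exercise the formula.
    seus = expected_seus(
        cross_section_cm2_per_bit=1e-13,  # placeholder, not from the slide
        flux_per_cm2_s=100.0,             # placeholder, not from the slide
        config_bits=1_000_000,            # placeholder, not from the slide
        n_fpgas=216,                      # from the slide
        run_seconds=4 * 3600,             # 4 hour run, from the slide
    )
    print(seus, expected_functional_failures(seus))
```

Read backwards, an SEUPI of 10 and 2 – 8 functional failures would correspond to roughly 20 to 80 SEUs per run across all devices.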

6 Repeated Outline
A system solution is developed both to reduce the probability of failure and to offer additional testing functionality.
Introduction & background
System description
Testing & Results

7 Readout Control Unit (RCU)
RCU main FPGA controls readout of detector data
Keep data path intact by correcting SEUs (task of the Support FPGA)
Reconfiguration solution based on Active Partial Reconfiguration

8 Active Partial Reconfiguration (APR)
Rewriting a subset (frame) of the configuration memory of an FPGA while the user design is operating. Source: UG012 - Virtex-II Pro and Virtex-II Pro X FPGA User Guide

9 Support FPGA Configuration Controller
Memory Mapped Interface to Detector Control System Configuration Interface

10 Frame by frame Readback, Verification and Correction
Memory Mapped Interface to Detector Control System Frame Readback Original frame data Reconfigure frame
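The frame by frame readback, verification and correction loop named on this slide can be sketched in software as follows. This is a minimal model, not the Support FPGA firmware: the configuration memory is represented as a list of byte arrays, the original frame data as a list of `bytes`, and the names `frvc_pass`, `config_memory` and `golden_frames` are hypothetical.

```python
def frvc_pass(config_memory, golden_frames):
    """Run one readback/verify/correct pass over all frames.

    config_memory: list of bytearray, the (possibly upset) device frames.
    golden_frames: list of bytes, the original frame data.
    Returns the indices of the frames that had to be rewritten.
    """
    corrected = []
    for index, golden in enumerate(golden_frames):
        readback = bytes(config_memory[index])        # frame readback
        if readback != golden:                        # verification
            config_memory[index] = bytearray(golden)  # reconfigure frame
            corrected.append(index)
    return corrected


if __name__ == "__main__":
    golden = [bytes([i] * 4) for i in range(4)]
    device = [bytearray(frame) for frame in golden]
    device[2][1] ^= 0x10  # emulate a single upset in frame 2
    print(frvc_pass(device, golden))
```

Only frames that differ from the original data are rewritten, which is what lets the user design keep running while the pass is in progress.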

11 Repeated Outline Introduction & background System description
Testing & Results

12 Testing
Irradiation testing (physical): errors (SEUs) are injected into the configuration memory using a proton beam.
Fault injection (software): errors ("SEUs") are injected into the configuration memory through manipulation of the configuration bitstream. Alternative to irradiation testing?
Main objectives:
Validate the implementation of the Support FPGA configuration controller and the fault injection solution
Investigate the effect of the mitigation approach
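Software fault injection comes down to flipping a chosen bit in the frame data before it is written to the device. A minimal sketch, assuming frames are handled as `bytes` and bits are numbered from the least significant bit of byte 0 (the numbering convention and the name `inject_bit_error` are assumptions for illustration, not the DCS software interface):

```python
def inject_bit_error(frame: bytes, bit_index: int) -> bytes:
    """Return a copy of `frame` with one bit inverted, emulating an SEU."""
    if not 0 <= bit_index < 8 * len(frame):
        raise IndexError("bit_index outside frame")
    corrupted = bytearray(frame)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)  # flip exactly one bit
    return bytes(corrupted)


if __name__ == "__main__":
    original = bytes(2)
    upset = inject_bit_error(original, 9)
    print(upset.hex())
```

Because the flip is an XOR, injecting the same bit twice restores the original frame.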

13 FPGA test design
Basic shift register extended with a configurable TMR solution (on/off)
Can reconfiguration and TMR reduce the failure probability?
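The idea of a shift register with switchable TMR can be illustrated with a small behavioural model. This is a sketch under simplifying assumptions: an upset is modelled as a flipped stored bit in one of three register copies, whereas a real configuration upset changes logic or routing; the class and function names are hypothetical.

```python
def majority(a, b, c):
    """Bitwise 2-out-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)


class ShiftRegister:
    def __init__(self, length):
        self.bits = [0] * length

    def clock(self, bit_in):
        """Shift one position and return the bit that falls out."""
        bit_out = self.bits[-1]
        self.bits = [bit_in] + self.bits[:-1]
        return bit_out


class TmrShiftRegister:
    """Three identical shift registers with an optional output voter."""

    def __init__(self, length, tmr_enabled=True):
        self.copies = [ShiftRegister(length) for _ in range(3)]
        self.tmr_enabled = tmr_enabled

    def clock(self, bit_in):
        outputs = [copy.clock(bit_in) for copy in self.copies]
        # With TMR off, only the first copy drives the output.
        return majority(*outputs) if self.tmr_enabled else outputs[0]
```

With the voter enabled, a single corrupted copy is outvoted; with it disabled, the same upset reaches the output.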

14 Test procedure
[Timeline diagram: start (Tstart), T1, T2, end (Tend). Mitigation procedure per interval: None, FRVC, TMR. Continuous checking of shift register output during irradiation or fault injection.]
FRVC: Frame by frame Readback, Verification and Correction

15 Irradiation test results (1)
Reconfiguration (FRVC) corrects and prevents accumulation of SEUs and reduces the lifetime of functional failures:
Corrects and prevents accumulation as long as the reconfiguration frequency is higher than the SEU rate (which it is in this case)
Reduces the lifetime of a functional failure (only limited by the time it takes to carry out one reconfiguration, 350us ms)
Additional mitigation (TMR):
Masks out functional failures due to individual SEUs
Plot labels: No mitigation, FRVC enabled, FRVC + TMR enabled
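The condition that the reconfiguration frequency must exceed the SEU rate can be made quantitative with a simple estimate. The sketch below assumes upsets arrive as a Poisson process, which the slides do not state, and computes the mean number of upsets per reconfiguration cycle and the probability that two or more accumulate before the next correction. The function names and the example numbers are hypothetical.

```python
import math


def expected_seus_per_cycle(seu_rate_per_s, cycle_time_s):
    """Mean number of upsets arriving during one reconfiguration cycle."""
    return seu_rate_per_s * cycle_time_s


def prob_at_least(k, mean):
    """P(N >= k) for a Poisson-distributed count with the given mean."""
    below_k = sum(math.exp(-mean) * mean ** i / math.factorial(i)
                  for i in range(k))
    return 1.0 - below_k


if __name__ == "__main__":
    # Hypothetical numbers: one upset per 100 s, one full cycle every 0.1 s.
    mean = expected_seus_per_cycle(seu_rate_per_s=0.01, cycle_time_s=0.1)
    print(mean, prob_at_least(2, mean))
```

When the mean per cycle is far below 1, the chance of two upsets coinciding within one cycle is very small, which is the regime in which TMR can mask each individual upset until it is corrected.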

16 Irradiation test results (2)
Only a fraction of the SEUs lead to functional failure (as expected)
Reconfiguration alone does not reduce the failure probability; it must be combined with mitigation at user design level to be effective
Fault injection reproduces irradiation test results

17 Fault injection results
Only a fraction of the SEUs lead to functional failure (as expected)
Reconfiguration alone does not reduce the failure probability; it must be combined with mitigation at user design level to be effective
Fault injection reproduces irradiation test results

18 Distribution of sensitive bits
[Plots: No mitigation; FRVC + TMR. Marked regions (1, 2, 3) include: I/O and clock resources (no mitigation implemented); voter + shift register; only shift register.]

19 Summary
Successful implementation of the reconfiguration network
Allows us to use COTS SRAM FPGAs in radiation environments.
Prevents accumulation of SEUs by continuous reconfiguration, but mitigation at the level of the user design is needed.
The combination will significantly reduce the probability of functional failures during operation.
The system allows SEUs to be monitored during operation
Fault injection implemented as an alternative test method
Locate sensitive bits → optimize mitigation approach
To do: Predict the failure probability of the final design

20 Acknowledgements
Gerd Tröger, University of Heidelberg
Luciano Musa, Blahoslav Pastircák, CERN
Austin Lesea, Xilinx
Alexander Prokofiev, TSL University of Uppsala
Jon Wikne, Eivind Olsen, OCL University of Oslo

21 Backup

22 Irradiation test results
Test flux: p/cm2s
TPC flux: h/cm2s (~ factor 10^4 lower flux)
[Plot labels: 1, 1+2; No action, FRVC enabled, TMR enabled]

23 RCU support FPGA SelectMAP mode FLASH mode Normal mode

24 Some numbers

25 General Fault Injection Flow
Flow steps: inject bit error, check design, 1 cycle FRVC (if requested), store result
Timing labels: 150 ms (1 frame) ms 15 ms (96 frames)
Readback and correct: main task of the reconfiguration network. Reconfigure the Xilinx FPGA with correct data.
Why not reconfigure with incorrect data? Fault injection.
How? Inject errors into the Xilinx configuration memory by bitstream manipulation, with errors injected from software. Solution implemented in DCS software.
FRVC: Frame by frame Readback, Verification and Correction
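A fault injection campaign built from these steps can be sketched as a loop over configuration bits: inject one bit error, check the design, restore the original data, and store the result. The model below is a toy under stated assumptions: the "design" is a hypothetical predicate that fails only when a bit covered by `used_mask` differs from the original data, and `run_campaign` and `make_toy_design` are illustrative names, not part of the DCS software.

```python
def make_toy_design(golden, used_mask):
    """Build a toy design check: it fails only if a 'used' bit is wrong."""
    def design_works(memory):
        return all(((m ^ g) & u) == 0
                   for m, g, u in zip(memory, golden, used_mask))
    return design_works


def run_campaign(golden, design_works):
    """Flip each bit in turn and record which flips break the design."""
    sensitive_bits = []
    memory = bytearray(golden)
    for bit in range(8 * len(golden)):
        memory[bit // 8] ^= 1 << (bit % 8)   # inject bit error
        if not design_works(bytes(memory)):  # check design
            sensitive_bits.append(bit)       # store result
        memory[:] = golden                   # one correction cycle restores the data
    return sensitive_bits


if __name__ == "__main__":
    golden = b"\xa5\x3c"
    design = make_toy_design(golden, used_mask=b"\x01\x80")
    print(run_campaign(golden, design))
```

The returned list plays the role of a sensitive-bit map: the fraction of flips that break the design gives an estimate of how many configuration bits matter for a given design.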

