Radiation Effects on FPGA and Mitigation Strategies. Bin Gui, Experimental High Energy Physics Group. Journal Club, 4/26/2015.

1 Radiation Effects on FPGA and Mitigation Strategies. Bin Gui, Experimental High Energy Physics Group. Journal Club, 4/26/2015

2 Outline
- General introduction of FPGA
- Radiation Effects
  - Single Event Upset
  - Single Event Functional Interrupt
- Mitigation
- Measurements of SEU and SEFI
  - Experiment Set-up
  - Heavy Ion Test Results
  - Proton Test Results
- Summary

3 Field Programmable Gate Array
- The field programmable gate array (FPGA) is a semiconductor device that can be programmed after manufacturing. Instead of being restricted to a predetermined hardware function, an FPGA lets you program product features and functions, adapt to new standards, and reconfigure hardware for specific applications even after the product has been installed in the field, hence the name "field-programmable".
- These devices have always offered significant advantages in flexibility, and recent advances in fabrication have greatly increased logic capacity, substantially increasing the number of applications for this technology.
- FPGAs have been an attractive choice for small-volume instrumentation and control system electronics.

4 The Uses of FPGA (figure text: "Sorry, no electricity")

5 FPGA Architecture
- The core of the Xilinx Virtex series FPGA consists of:
  - An array of configurable logic blocks (CLBs), each of which consists of two slices. Each slice contains two 4-input look-up tables for logic generation, two flip-flops, and arithmetic carry and clocking functions.
  - Two columns of dual-port RAM flanking the CLB matrix, divided into 4 Kbit blocks.
  - Input/output blocks along the edges of the device, which support several I/O standards.
- This FPGA is based on SRAM technology and can be reconfigured at will, allowing unmatched flexibility in the face of changing requirements.
- Unfortunately, the increased density (and the corresponding shrinkage of process geometry) has made these devices more susceptible to failure due to external radiation.

6 Single Event Effects
- Single event effects (SEE) are not cumulative: each one results from a single particle interaction in the silicon. A highly ionizing particle can deposit enough charge locally to disturb the function of an electronic circuit.
  - Single event upset (SEU): the deposited charge is sufficient to flip the value of a digital signal. SEUs normally refer to bit flips in memory circuits (RAM, latches, and flip-flops) but in rare cases may also directly affect digital signals in logic circuits. An SEU is usually reversible.
  - Single event latchup (SEL): a latched change of state of a circuit due to radiation. A power cycle may be needed to reset it.
  - Single event burnout (SEB): destructive failure of power MOSFET transistors in high-power applications.
  - Single event functional interrupt (SEFI): an event that requires either a complete reconfiguration or a power cycle of the device before it returns to normal operation. SEFIs are rare and are almost never seen in orbit. They may be observed in test environments, where event rates are greatly accelerated to obtain statistically significant measurements even of events with negligible cross-sections.

7 Mitigation
- Mitigation involves both repairing altered configuration and designing logic that is resistant to failure.
  - Scrubbing is the periodic readback of the FPGA's configuration memory, comparison against a known good copy, and writing back of any corrections required. Periodic scrubbing places an upper limit on how long a configuration error can be present in a device.
  - Triple Modular Redundancy (TMR), the most widely used technique, is an effective way to create fault-tolerant logic.
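The scrubbing loop described on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: plain Python lists stand in for the configuration frames and the golden copy, and the function name `scrub` is hypothetical. On real hardware the readback and write-back would go through the device's configuration port rather than list indexing.

```python
from typing import List


def scrub(config_frames: List[int], golden_frames: List[int]) -> List[int]:
    """One scrub pass over a simulated configuration memory.

    Reads back every frame, compares it to the known good copy, and
    rewrites any frame that differs. Returns the indices that were repaired.
    Both lists are stand-ins (assumption) for real configuration frames.
    """
    if len(config_frames) != len(golden_frames):
        raise ValueError("live and golden memories must have the same size")

    repaired = []
    for index, (live, good) in enumerate(zip(config_frames, golden_frames)):
        if live != good:                  # readback differs from golden copy
            config_frames[index] = good   # write back the correction
            repaired.append(index)
    return repaired


# Simulate a single event upset: flip one bit in frame 1, then scrub.
golden = [0b1010, 0b1111, 0b0001]
live = list(golden)
live[1] ^= 0b0100
fixed = scrub(live, golden)
```

Running the pass on a fixed interval is what bounds the lifetime of a configuration error: an upset can persist for at most one scrub period before it is overwritten.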

8 Triple Modular Redundancy
- In TMR, the logic of the design is simply triplicated, with redundant voters on the output. To recover smoothly from logic upsets, the internal state of the design must be restored to the repaired logic.
- In the feedback counter, the state of the counters is obtained from the output of the voters. The counter logic is therefore always presented with the correct state, which makes the logic self-restoring after an upset and subsequent repair.
- TMR does not come without a price. Designs are at least 3 times as large as a non-TMR design and suffer from speed degradation as well. In particular, feedback TMR slows operation by introducing a longer feedback path that includes the voter. Power consumption is also tripled along with the logic.
- The underlying assumption of TMR is that only one upset will occur within a given logic block. This is not always a good assumption: recent testing found that approximately 0.3 to 0.5% of upsets caused multiple bit upsets within the device.
Figures: TMR counter; feedback counter with TMR in the feedback path.
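A small software model may make the feedback-voter idea more concrete. It is a sketch, not the design from the slides: the class `TmrCounter`, its 4-bit width, and the use of a single shared voter (real designs triplicate the voter as well) are all assumptions made for illustration.

```python
def vote(a: int, b: int, c: int) -> int:
    """Bitwise majority vote: each output bit follows at least two inputs."""
    return (a & b) | (b & c) | (a & c)


class TmrCounter:
    """Toy model of a counter with TMR in the feedback path (hypothetical).

    Three replica registers hold the count. Every step computes the next
    value from the *voted* state, so a corrupted replica is overwritten
    with the correct value on the following clock.
    """

    def __init__(self, width: int = 4) -> None:
        self.mask = (1 << width) - 1
        self.regs = [0, 0, 0]

    def output(self) -> int:
        return vote(*self.regs)

    def step(self) -> int:
        next_value = (self.output() + 1) & self.mask
        self.regs = [next_value] * 3   # all replicas reload from voted state
        return self.output()

    def upset(self, replica: int, bit: int) -> None:
        """Simulate an SEU by flipping one bit in one replica."""
        self.regs[replica] ^= 1 << bit


counter = TmrCounter()
for _ in range(3):
    counter.step()                      # count is now 3

counter.upset(replica=0, bit=2)         # single upset in one replica
masked = counter.output()               # voter still reports 3
after_step = counter.step()             # feedback restores all replicas

counter.upset(replica=0, bit=0)         # same bit upset in two replicas
counter.upset(replica=1, bit=0)
defeated = counter.output()             # majority is now wrong
```

The last three lines show why the single-upset assumption matters: when the same bit is flipped in two replicas, the voter follows the corrupted majority.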

9 Measurements of SEU and SEFI
- Experiment Set-up
- Heavy Ion Test Results
- Proton Test Results

10 FPGAs Used in Measurement

11 Test Setup in Vacuum
The setup for in-air testing was essentially the same as in vacuum, the main exception being that the adapted connections for getting through the bulkheads were discarded. Also, USB programming cables were used via high-speed hubs for the in-air irradiations.

12 Latchup Testing – DUT FPGA
- DUT: device under test.
- For this experiment, a latchup was defined as any sudden high-current mode resulting from the test run that required a power cycle of the DUT to recover.
- Because the bottom of the silicon is solder "bumped" to a fully populated ball-grid package, it is difficult to heat the device enough for latchup testing with an external heating element. To reach the target temperature (near 125°C junction temperature) in vacuum, the devices were configured with a "heater" design (a long shift-register chain of CLB flip-flops) meant to increase dynamic current consumption enough to heat the transistor junctions to the desired temperature.

13 Latchup Testing – Results

14 Heavy-Ion Test
- The devices were tested at different angles of incidence over an LET (linear energy transfer) range of 1.2–108.7 MeV·cm²/mg. A combination of degraders and angles was used to achieve higher LET using the same ion. (How?)
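One way to read the angle part of this technique is the standard effective-LET relation, given here as general background rather than as something stated on the slide. Tilting the device by an angle θ from normal incidence lengthens the ion's path through a thin sensitive layer by a factor of 1/cos θ, so the charge deposited per strike scales the same way. The symbols LET_0 (LET at normal incidence) and θ are introduced only for this illustration.

```latex
\mathrm{LET}_{\mathrm{eff}}(\theta) = \frac{\mathrm{LET}_0}{\cos\theta},
\qquad \text{e.g. } \theta = 60^\circ \;\Rightarrow\; \mathrm{LET}_{\mathrm{eff}} = 2\,\mathrm{LET}_0
```

Degraders work differently: they lower the ion's energy before it reaches the die, which changes its LET at the sensitive layer (upward, for ions whose energy is above the Bragg peak). The relation above assumes a thin sensitive volume and is usually trusted only up to moderate tilt angles.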

15 SEU Results
- The data graphs shown in this report all have two-sigma statistical error bars plotted.
- The static heavy-ion SEU response data set has been fit with a Weibull curve function to facilitate orbital rate calculations. [Equation shown on the slide; not captured in this transcript.]
- The absolute LET threshold extrapolates to about 1 MeV·cm²/mg (or lower) for both the configuration memory and the block memory.
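For readers unfamiliar with this kind of fit, the sketch below uses the four-parameter Weibull form conventionally applied to SEE cross-section data. Whether the report uses exactly this parameterization is an assumption, and the parameter names (`sigma_sat`, `let_0`, `width`, `shape`) and the example values are illustrative only, not results from the test.

```python
import math


def weibull_cross_section(let: float, sigma_sat: float, let_0: float,
                          width: float, shape: float) -> float:
    """Cross-section versus LET using the conventional Weibull form.

    sigma(L) = sigma_sat * (1 - exp(-((L - L0) / W) ** s))  for L > L0
    sigma(L) = 0                                            for L <= L0

    let       : LET of the ion, MeV*cm^2/mg
    sigma_sat : saturated (limiting) cross-section, cm^2 per bit
    let_0     : LET threshold (onset), MeV*cm^2/mg
    width     : width parameter W, MeV*cm^2/mg
    shape     : dimensionless shape exponent s
    """
    if let <= let_0:
        return 0.0
    return sigma_sat * (1.0 - math.exp(-(((let - let_0) / width) ** shape)))


# Illustrative parameters only (assumption); not taken from the report.
params = dict(sigma_sat=1e-7, let_0=1.0, width=20.0, shape=1.5)
below_threshold = weibull_cross_section(0.5, **params)
at_one_width = weibull_cross_section(21.0, **params)
near_saturation = weibull_cross_section(1000.0, **params)
```

In this form the threshold `let_0` is the onset LET that the slide quotes as extrapolating to about 1 MeV·cm²/mg, and `sigma_sat` is the plateau the curve approaches at high LET.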

16 Single Event Functional Interrupt
- Power-On-Reset (POR) SEFI results in a global reset of all internal storage cells and the loss of all program and state data.
- SelectMAP (SMAP) SEFI is the loss of either read or write capabilities through the SelectMAP port.
- Frame Address Register (FAR) SEFI results in the frame address register incrementing continuously and uncontrollably.
- Global Signal SEFI is separated from other design-disrupting SEFIs for the first time in these tests. These signals include GSR (Global Set/Reset), GWE_B (Global Write Enable), GHIGH_B (Global Drive High), and others. They can all be observed through the status (STAT) register or the control (CTL) register.
- Readback SEFI occurs when a portion of the readback data has been upset and cannot be corrected.
- Scrub SEFI seems to be the result of an upset corrupting the data stream being scrubbed into the DUT.

17 SEFI Results

18 Proton Test Results - SEU

19 Proton Test Results - SEFI

20 CREME96 Calculated Orbital Upset Rates

21 CREME96 Calculated Orbital Upset Rates

22 CREME96 Calculated Orbital Upset Rates

23 CREME96 Calculated Orbital Upset Rates

24 Summary
- The SEFI cross sections are low enough to be almost academic.
- The space upset rates given in Table 9 are sufficiently low.
- Further study on orbital rate calculation.
- Considering our experiment (Actel).

25 Reference
- Radiation effects and mitigation strategies for modern FPGAs

