Prof. Fernanda Lima Kastensmidt, Ph.D. Instituto de Informatica

1 SEE Mitigation Strategies for Digital Circuit Design Applicable to ASIC and FPGAs
Prof. Fernanda Lima Kastensmidt, Ph.D. Instituto de Informatica Universidade Federal do Rio Grande do Sul Porto Alegre – RS – Brazil

2 Motivation
A large set of electronic devices used in avionics, space and ground-level applications can be upset by ionizing particles. [Figure: a general system built from FPGAs, memory, processors, ASICs and analog electronics, placed between hardened components (high reliability, high cost) and COTS components (low reliability, low cost).]

3 Motivation Solution I:
If hardened components are too expensive, one solution may be to design your own hardened device. Which fault tolerance techniques should be used? How much fault tolerance is enough? The hardened design must then be qualified. (Hardened components: high reliability, high cost.)

4 Motivation Solution II:
The device must be qualified to analyze its robustness for the application. Is it possible to apply some fault tolerance technique, at the software level or at the component replication level? (COTS components: low reliability, low cost.)

5 Types of SEE
Single event phenomena can be classified into three effects (in order of permanency): single event upset and single event transient (soft errors); single event latchup (soft or hard error); single event burnout (hard failure). Hard errors such as Single Event Latchup (SEL) are due to shorts between ground and power, and cause permanent functional damage.

6 Collected Charge
Depending on the circuit, the transistor size and the energy of the deposited charge, current pulses with different amplitudes, durations and shapes will appear.

7 Charge Collection Mechanism
$I_C(t) = I_{CRITICAL}(t) = I_P(t) - I_{ON}(t)$
A soft error occurs when $Q_{collected} > Q_{critical}$.

8 Fault Tolerance
Fault masking: any technique that prevents faults from introducing errors at the output (failure). [Figure: ionization generating charge and leading to a failure.]

9 Fault Tolerance
[Figure: chain from ionization to failure. Ionization produces a transient current (injected into or extracted from the junction) and a transient voltage pulse at the capacitive node (FAULT); if captured at a clk edge, the fault effect is a bit-flip (ERROR), which can become a FAILURE. Fault latency and error latency separate the stages; shielding is shown at the ionization stage.]
Fault masking (hardening by design): hardware and time redundancy; hardened memory cells; error-correction codes; self-checking mechanisms with recovery; sensors (detection).

10 Fault Tolerance
[Figure: the same chain from ionization (FAULT) to bit-flip (ERROR) to FAILURE, with fault latency and error latency, now with redundant spare components at the failure stage.]
Fault masking (hardening by design): hardware and time redundancy; hardened memory cells; error-correction codes; self-checking mechanisms with recovery; sensors (detection). Redundant spare components: for the case where the number of faults overcomes the mitigation technique.

11 Outline Radiation Effects on Digital ICs
Radiation Hardening by Design: Strategies for ASICs Radiation Effects on FPGAs Radiation Hardening by Design: Strategies for FPGAs Final Remarks

13 Single Event Effects (SEEs)
Transient effects: Single Event Upset (SEU), a bit-flip in a sequential logic element; Digital Single Event Transient (DSET), a transient voltage pulse in the combinational logic. [Figure: combinational logic between two stages of sequential logic.]

14 SEU in Sequential Logic
[Figure: memory cell with word lines (WL) and N and P transistors; ionization at an OFF transistor causes a BIT-FLIP.]

15 Hardened Memories Approach 1: use decoupling resistors to slow the regenerative feedback response of the cell, avoiding the bit-flip [Rockett, R., IEEE TNS, 1992]

16 Hardened Memory Approach 2: add transistors to create a feedback path dedicated to restoring corrupted data. IBM Memory Cell [Rockett cell, 88] HIT Memory Cell [Velazco, 92]

17 Hardened Memories The principle is to store the data in two different locations within the cell, in such a way that the corrupted part can be restored. Whitaker/Liu Memory Cell [Liu, 92] DICE Memory Cell [Calin, 96]

18 Dual Interlocked storage Cell (DICE)
[Figure: DICE latch with clk inputs and storage nodes Qa and Qb.]

19 Dual Interlocked storage Cell (DICE)
[Figure: DICE latch, continued.]

20 Dual Interlocked storage Cell (DICE)
The original value is restored. [Figure: DICE latch, continued.]

21 Challenges in Sequential Logic
Multiple bit upsets. Factors: particle incidence angle, transistor dimensions, supply voltage, memory array density. [Figure: ionization affecting a single memory cell versus multiple memory cells.]

22 Charge Sharing (NMOS transistor)
[Figure: snapshots at T = 50 ps, 100 ps, 250 ps, 800 ps and 2 ns.] [Reed, et al., New Electronic Technologies Insertion into Flight Programs Workshop, 2007]

23 Limitations of Hardened Memory
Multiple nodes collecting charge are able to upset hardened memory cells. Solutions: Shallow Trench Isolation (STI) structures; suitable transistor placement and routing; hardened memory cells combined with hardware redundancy.

24 Triple Modular Redundancy
[Figure: three copies of the sequential logic feed a majority voter (MAJ) ahead of the combinational logic; an upset (X) in one copy still yields a correct voted value (OK).]

inputs  MAJ
000     0
001     0
010     0
011     1
100     0
101     1
110     1
111     1

Each master-slave flip-flop can be composed of: standard latches, robust to charge collected by multiple nodes within the same latch; or hardened latches, also robust to charge collected by multiple nodes across latches of different domains.
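The majority function in the truth table can be written as a small sketch. This is only an illustration in Python rather than a hardware description; the function name `maj` and the use of integers as bit vectors are assumptions, not part of the slides.

```python
from itertools import product


def maj(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority vote (works on single bits or whole words)."""
    return (a & b) | (a & c) | (b & c)


# Rebuild the slide's truth table: inputs 000..111 -> MAJ output.
truth_table = {bits: maj(*bits) for bits in product((0, 1), repeat=3)}

for bits, out in truth_table.items():
    print("".join(str(b) for b in bits), out)
```

Because the vote is bitwise, the same function masks a single upset copy of a multi-bit register: with two good copies, the third cannot change the result.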

25 Triple Modular Redundancy
[Figure: TMR with a single MAJ voter and the same 2-out-of-3 majority truth table; a transient (X) appears at the voter output.] The voter's output can show a transient wrong value that may be captured by the next memory cell.

26 Triple Modular Redundancy
[Figure: TMR with a triple MAJ voter between the sequential and the combinational logic.] Triple MAJ voter: increases the current drive, helping to keep the node at its original value.

27 Triple Modular Redundancy
[Figure: TMR with a triple MAJ voter and the same 2-out-of-3 majority truth table; upsets (X) in two redundant copies.] Catastrophic effect: the system votes three wrong values out of three, and the result is assumed to be correct.

28 SET in Combinational Logic
Each node has an associated capacitance, resistance, SET pulse (amplitude x width) and critical charge (QCRIT). [Figure: current versus time, with the collected charge Qi split into Qdrift and Qdiffusion.]

29 SET in Combinational Logic
Not all SETs are captured by a memory cell. They can be: logically masked, electrically masked or latch-window masked. Logically masked: [Figure: the pulse is blocked inside the logic and does not reach Q.]

30 SET in Combinational Logic
Not all SETs are captured by a memory cell. They can be: logically masked, electrically masked or latch-window masked. Electrically masked: [Figure: the pulse reaches the memory cell as a negligible pulse.]

31 SET in Combinational Logic
Not all SETs are captured by a memory cell. They can be: logically masked, electrically masked or latch-window masked. Latch-window masked: [Figure: the pulse reaches the memory cell away from the clk edge.]
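The three masking effects can be combined into a rough capture check. The sketch below is a deliberately simplified model, not a method from the slides: the 2-input AND gate, the fixed attenuation per gate and the setup/hold window values are all assumptions chosen for illustration.

```python
def logically_masked(side_input: int) -> bool:
    """SET on one input of a 2-input AND gate: blocked when the other input is 0."""
    return side_input == 0


def electrically_masked(width_ps: float, gates_on_path: int,
                        attenuation_ps_per_gate: float = 20.0) -> bool:
    """Assumed model: each gate on the path narrows the pulse by a fixed amount."""
    return width_ps - gates_on_path * attenuation_ps_per_gate <= 0


def latch_window_masked(arrival_ps: float, width_ps: float, clk_edge_ps: float,
                        setup_ps: float = 30.0, hold_ps: float = 30.0) -> bool:
    """The pulse is ignored unless it overlaps the setup/hold window of the clk edge."""
    window_start = clk_edge_ps - setup_ps
    window_end = clk_edge_ps + hold_ps
    pulse_end = arrival_ps + width_ps
    return pulse_end < window_start or arrival_ps > window_end


def set_is_captured(side_input: int, width_ps: float, gates_on_path: int,
                    arrival_ps: float, clk_edge_ps: float) -> bool:
    """A SET becomes an error only if none of the three masking effects applies."""
    return not (logically_masked(side_input)
                or electrically_masked(width_ps, gates_on_path)
                or latch_window_masked(arrival_ps, width_ps, clk_edge_ps))
```

For example, a 200 ps pulse arriving at 900 ps with a clock edge at 1000 ps overlaps the assumed window and is captured, while the same pulse arriving at 100 ps is latch-window masked.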

32 Electrical Masking Heavy Ion Radiation Results: 180nm CMOS
Pulse too narrow!!! [Bruguier, G., et al., IEEE TNS, 1996]

33 SET vs. Frequency Radiation Results: DSET for 180nm vs. clk frequency
[Benedetto et al., IEEE TNS, 2004]

34 Challenges in Combinational Logic
The SET transient width (TW) may vary from a few hundred picoseconds to a few nanoseconds, depending on the LET. [Figure: critical transient width (ps) versus process technology (nm), with clk frequencies such as 500 MHz, 1 GHz, 2.5 GHz and 5 GHz marked, and TW compared against the clk period.] [Dodd, P., IEEE TNS 2004]

35 SET vs. SEU Error Rate

36 Challenges in Combinational Logic
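Depending on the fan-out of the logic topology, a single SET may originate multiple SETs. [Figure: circuit with inputs a0 to a5; a SET on an internal node reaches both outputs y0 and y1 and the flip-flops Q0 and Q1.]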

37 Identifying the most sensitive nodes
Fault injection performed by electrical (SPICE) and logic simulations can identify the most sensitive nodes: those with lower critical charge (QCRIT) and lower SET logical masking probability. [Figure: circuit with inputs A to F and output Z, with the most sensitive nodes marked.]

38 Transistor Resizing [Zhou et al., IRPS 2004]
[Cazeaux et al., IOLTS 2005] [Dhillon et al., IEEE Transaction on ISVLSI 2006] [Figure: circuit with inputs A to F and output Z; QCRITICAL is annotated at the most sensitive nodes.]

39 Gate Replication [Lisboa, C., et al., SBCCI 2005]
[Nieuwland et al., IOLTS 2006] Gate replication at the most sensitive nodes increases the current drive, helping to keep the node at its original value. [Figure: circuit with inputs A to F and output Z, with the most sensitive nodes marked.]

40 Temporal Filtering
Votes the SET out by time redundancy. The time redundancy is implemented by delays in the clock lines or at the latch/flip-flop inputs. [Figure: the combinational logic output is sampled at clk, clk+ΔT and clk+2ΔT and passed to a triple or single MAJ voter; one sample is wrong (X) and the voted value is OK.] [Nicolaidis, VTS 1999], [Anghel et al., DATE 2000]
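Temporal filtering can be illustrated with a small timing sketch: the same combinational output is sampled three times, one delay apart, and the samples are voted. The model below is an assumption made for illustration (a rectangular SET pulse, ideal sampling, times in picoseconds); `delta_t` stands for the delay between samples.

```python
def maj(a: int, b: int, c: int) -> int:
    """2-out-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)


def signal_with_set(t: float, correct_value: int,
                    set_start: float, set_width: float) -> int:
    """Combinational output at time t: inverted while the SET pulse is active."""
    if set_start <= t < set_start + set_width:
        return correct_value ^ 1
    return correct_value


def temporally_filtered_sample(clk_edge: float, delta_t: float, correct_value: int,
                               set_start: float, set_width: float) -> int:
    """Sample at clk, clk + delta_t and clk + 2*delta_t, then vote."""
    samples = [
        signal_with_set(clk_edge + k * delta_t, correct_value, set_start, set_width)
        for k in range(3)
    ]
    return maj(*samples)
```

With a 30 ps pulse starting at 95 ps and a clock edge at 100 ps, a delay of 40 ps lets only the first sample see the pulse, so the vote is correct; a delay of 10 ps puts all three samples inside the pulse and the error passes. In this model the delay has to exceed the transient width for the filter to work.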

41 Full time redundancy [Nicolaidis, VTS 1999] [Anghel et al., DATE 2000]
[Figure: the combinational logic output is captured by three flip-flops (ffp0, ffp1, ffp2) clocked at clk, clk+ΔT and clk+2ΔT, followed by a triple or single MAJ voter; the timing diagram shows the SET transient width (TW), the MAJ delay and the MAJ + combinational delays.] ΔT is directly proportional to the SET transient width (TW).

42 Full time redundancy
[Figure: the same scheme with flip-flops ffp0, ffp1 and ffp2 clocked at clk, clk+2ΔT and clk+4ΔT; the timing diagram relates the SET transient width (TW) to the clk period (T), the MAJ delay and the MAJ + combinational delays.]

43 Temporal Latching to Trigger SETs
The error cross-section decreases as ΔT increases. [Benedetto et al., IEEE TNS 2004]

44 Triple Sample Memory Robust to Multiple Bit Upsets and SET
[Mavis, IRPS 2002] [Figure: combinational logic output sampled by three memory elements driven by shifted clocks.]

45 Triple Sample Memory Robust to Multiple Bit Upsets and SET
[Mavis, IRPS 2002] [Figure: the same circuit; one sample is wrong (X) and the output is OK.]

46 Triple Sample Memory Robust to Multiple Bit Upsets and SET
[Mavis, IRPS 2002] [Figure: the same circuit with charge collected at multiple nodes; one value is wrong (X) and the others are OK.]

47 Triple Sample Memory Robust to Multiple Bit Upsets and SET
[Mavis, IRPS 2002] [Figure: the same circuit with charge collected at multiple nodes; all values are OK.]

48 Full Triple Modular Redundancy (TMR) with self-recovery
[Figure: three redundant domains, each with its own combinational logic, flip-flop (D0/clk0, D1/clk1, D2/clk2), voter output (TRV0, TRV1, TRV2) and signal E0, E1, E2; every voter receives TR0, TR1 and TR2. An upset (X) in domain 1 leaves the voted values OK.]

49 Full Triple Modular Redundancy (TMR) with self-recovery
[Figure: the same circuit, continued.]

50 Full Triple Modular Redundancy (TMR) with self-recovery
[Figure: the full TMR circuit driving three output pads, which form a wired voter on the output pad.]

51 How much mitigation is enough?
The circuits are becoming more and more complex. Hardware and time redundancy techniques can provide a certain level of protection against: Single Event Upsets (SEU); Single Event Transients (SET); multiple bit or node upsets. Problem: in some cases multiple faults can overcome the mitigation techniques, provoking a system failure.

52 Multiple Faults in the Full TMR
[Figure: the full TMR circuit with upsets (X) in two redundant domains (0 and 1); the voters produce a WRONG VALUE.]

53 How much mitigation is enough?
How is it possible to know that the mitigation technique is working properly for a certain Soft Error Rate (SER)? A mechanism is needed to inform the system when the number of multiple faults has passed a certain level. Built-in Self Test (BIST) mechanism: sensors working as watchdogs, so that each time an ionization occurs the system is informed.

54 How about sensors working as watchdogs?
Full TMR with sensors. [Figure: full TMR circuit (TR0 to TR2, TRV0 to TRV2, D0 to D2, clk0 to clk2, combinational logic and voters) with sensors attached.]

55 How about sensors working as watchdogs?
Full TMR with sensors. If the sensors detect one upset at a time: the technique is working!

56 How about sensors working as watchdogs?
Full TMR with sensors. If the sensors detect two or more upsets in distinct redundant modules at a time: the technique is not working!
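The decision rule on these slides (one upset at a time is tolerated, upsets in two or more distinct redundant modules are not) can be expressed as a short check. The sketch assumes one boolean sensor flag per redundant module for a given observation window; the function name and the flag layout are hypothetical.

```python
def mitigation_is_working(sensor_flags) -> bool:
    """sensor_flags: one flag per redundant module (tr0, tr1, tr2) for one window.

    TMR masks a fault confined to a single module, so the technique is
    considered to be working while at most one module reports an upset.
    """
    upset_modules = sum(1 for flag in sensor_flags if flag)
    return upset_modules <= 1


# Example windows: no upset, one upset, and upsets in two distinct modules.
print(mitigation_is_working([False, False, False]))
print(mitigation_is_working([False, True, False]))
print(mitigation_is_working([True, True, False]))
```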

57 Bulk Built-in Current Sensors
During normal operation, the current in the bulk is approximately zero. When an energetic particle generates an ionization, it creates a current that flows between the struck node and Vdd or gnd. The bulk-BICS senses the current generated by ionization at the bulk terminal. [Henes Neto et al. IEEE MICRO, 2006]

58 Bulk Built-in Current Sensors
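[Figure: bulk-BICS attached to the circuit design: BICS-P (transistors p1 to p6, RST, Vdd') monitors the bulk of the P transistors and BICS-N (transistors n1 to n6, nRST, Gnd') monitors the bulk of the N transistors; an ionization flips the BICS latch.]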

59 Trade-offs There is always some penalty to be paid when protecting circuits against upsets. Each technique may present a combination of: area overhead, performance penalty, power dissipation increase. The challenge is to select the most cost-effective techniques for the target circuit application.

60 CASE-STUDY: Adder, SET and SEU detection
Recomputing with Shifted Operands: S = A + B is compared with 2·S = 2·A + 2·B, computed on left-shifted operands and shifted back right. Duplication with Comparison (DWC): two adders whose outputs are compared. Bulk-BICS.
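Recomputing with shifted operands can be sketched as follows. The fault model (one adder output bit stuck at 1) and the 10-bit adder width are assumptions made for the example; the operands are kept to 8 bits so that the shifted sum still fits.

```python
WIDTH = 10                 # assumed adder width; operands are limited to 8 bits
MASK = (1 << WIDTH) - 1


def add(a: int, b: int, stuck_at_1_bit=None) -> int:
    """Adder model; optionally forces one output bit to 1 (hypothetical fault)."""
    s = (a + b) & MASK
    if stuck_at_1_bit is not None:
        s |= 1 << stuck_at_1_bit
    return s


def add_with_reso(a: int, b: int, stuck_at_1_bit=None):
    """Compute S = A + B, recompute 2S = 2A + 2B, shift back and compare.

    Returns (sum, ok); ok is False when the two computations disagree.
    """
    s = add(a, b, stuck_at_1_bit)
    s_from_shifted = add(a << 1, b << 1, stuck_at_1_bit) >> 1
    return s, s == s_from_shifted


print(add_with_reso(1, 2))                     # fault-free adder
print(add_with_reso(1, 2, stuck_at_1_bit=2))   # faulty adder: mismatch flags the error
```

The shift moves the operands to different bit positions, so the same faulty bit affects the two computations differently and the comparison exposes it. This only detects the error; it does not say which result is correct.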

61 CASE-STUDY: Adder, SEU correction
Error-Correction Code (Hamming): the adder data is encoded (enc) before the registers and decoded (dec) after them. Hardened flip-flops.
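As an illustration of the Hamming option, the sketch below uses the standard Hamming(7,4) code on one 4-bit nibble. The slide does not state which Hamming code or word size was used, so the code parameters and function names here are assumptions.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as [p1, p2, d1, p4, d2, d3, d4] (positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4      # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4      # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4      # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct up to one flipped bit; return (data_bits, error_position or 0)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4    # 1-based position of the flipped bit
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], syndrome


# An SEU flips one stored bit; decoding restores the original nibble.
stored = hamming74_encode([1, 0, 1, 1])
stored[4] ^= 1
print(hamming74_decode(stored))
```

A single-error-correcting code like this handles one upset per codeword; two flipped bits in the same codeword would be miscorrected.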

62 CASE-STUDY: Adder, SEU and SET correction
TMR with single voter: three adders feeding one voter. TMR with triple voter: three adders feeding three voters.

63 CASE-STUDY: Adder, SEU and SET correction
Time redundancy with TMR in the registers. [Figure: a single adder whose output is taken directly and after delays ΔT and 2ΔT, with voters in the registers.]

64 AREA vs. PERFORMANCE
[Figure: area versus performance overhead for the adder case study, comparing SEU and SET detection, SEU correction, and SEU and SET correction; annotated regions "more than 200%" and "less than 50%".]

65 How about Qualifying for SEE?
Testing by fault injection: model the SEU and SET effects at SPICE level, logic level or RTL level. Testing in a laser facility. Testing at ground-level facilities (in front of a beam of protons, heavy ions or neutrons). Testing in space (actual environment). [Figure: accuracy and cost axes alongside this list.]

66 When testing in a Ground Level facility for SEE:
Static testing: no application is running during the test. The register files are read during or after the test to check for SEU and/or SET and compared to a golden file. Applies to memories, microprocessors and ASICs in general. Dynamic testing: applications are running during the test. Outputs are analyzed and compared to a golden design. SEU and SET can be checked during the test. Applies to memories, microprocessors, ASICs in general, analog circuits, etc.

67 General System FPGA memory processors ASIC Analog logic

68 Outline Radiation Effects on Digital ICs
Radiation Hardening by Design: Strategies for ASICs Radiation Effects on FPGAs Radiation Hardening by Design: Strategies for FPGAs Final Remarks

69 Field-Programmable Gate Arrays
An array of logic blocks and interconnections customizable by programmable switches. High logic density. Customizable by the end user to realize different designs. [Figure: configurable logic blocks (CLBs), interconnections and switches for customization.]

70 Programmable Technologies
Programmable switches can be based on: Antifuse (antifuse-based FPGAs): an electrically programmable switch forms a low-resistance path between two metal layers; one-time configurable. SRAM (SRAM-based FPGAs): the state of a static latch controls pass transistors or multiplexers connected to pre-defined metal layers; re-configurable. Flash (Flash-based FPGAs): a floating gate controls the switches.

71 Antifuse-based FPGAs
Non-volatile: they hold the customized content even when not connected to the power supply. They can be programmed just once. FPGA products for space: ACTEL, AEROFLEX (based on Quicklogic).

72 ACTEL: RTAX-S device
[Figure: RTAX-S architecture built from Super Clusters of C-cells and R-cells, plus RAM blocks.] [Actel, RTAX-S RadTolerant FPGAs 2007]

73 ACTEL: RTAX-S device
C-CELL: susceptible to SET. R-CELL: robust to SEU. [Figure: C-cell schematic (inputs D0 to D3, DB, A0, A1, B0, B1, CFN, FCI; outputs Y and FCO) with a transient (X) leading to an ERROR.] [Actel, RTAX-S RadTolerant FPGAs 2007]

74 Effects of Frequency Response
Circuit: shift register with 8 levels of C-cells between R-cells. The error cross-section increases when the frequency increases. [Berg, M. et al., IEEE TNS 2006]

75 RadHard Eclipse FPGA from Aeroflex
[Figure: logic cell with hardened flip-flops (robust to SEU) and ViaLink connections; a transient (X) in the logic leads to an ERROR.]

76 Antifuse FPGAs: summary
Customized routing is not sensitive to SEU. Flip-flops are not sensitive to SEU: Actel and Aeroflex each provide a solution where all flip-flops are hardened. Logic is susceptible to DSETs. The user may protect the logic by using high-level mitigation techniques in the VHDL/Verilog description of the design (TMR, duplication and others).

77 SRAM-based FPGAs
Volatile: they lose their contents when the memories are not connected to the power supply. They can be reprogrammed as many times as necessary at the work site. They are programmed by loading a bitstream. FPGA products for space: XILINX, ATMEL, HONEYWELL.

78 SRAM-based FPGAs
The basic board must be composed of: FPGA; power supply (core and I/O); oscillator; EEPROM (FPGA loader and memory); FPGA programming interface; I/O interface. The original design bitstream must be stored in a memory outside the FPGA. Memory size needed: the bitstream may range from Kbytes to several Mbytes.

79 Reconfigurability
Can offer benefits for space and remote applications by: saving space in the system, since the same circuitry can be used with different configurations at different stages of a mission, reducing weight and power requirements; allowing in-orbit design changes; reducing the mission cost by correcting errors. If part of an FPGA fails, the circuitry can be reprogrammed to make use of the remaining functional portions of the chip.

80 FPGA Design Flow
Hardware Description Language, synthesis optimizations, logic mapping, placement, routing, configuration bitstream.

81 Technology Scaling in Xilinx FPGAs
Nanometer technologies Embedded Hard microprocessor Embedded memories (BRAM)

82 SRAM-based FPGA Architecture
[Figure: Xilinx FPGA with configurable logic blocks (CLBs) made of slices, the GRM and BRAM; a lookup table (LUT) with inputs A, B, C, D implements the Boolean function F(A,B,C,D).] The configuration memory is spread throughout the device in a large array. Each logic block contains its own configuration cells locally. A data frame is a one-bit vertical slice through the array. The configuration logic identifies each frame with a unique "minor" address as well as a unique address for the column that it lies in. Configuration data is loaded serially or in byte-parallel into the interface, where it is assembled into a frame in a shift register. The entire frame is loaded into memory all at once.

83 SEU in SRAM-based FPGAs: CLB slice
[Figure: CLB slice with a LUT (inputs I1 to I4), a flip-flop, routing and configuration memory bits.] Looking at the FPGA architecture, there are two basic elements that implement the logic in an FPGA: the lookup table (LUT), which implements the logic functions as truth tables, and the flip-flops. They are all located in the CLB slices. These basic elements are connected through programmable routing made of multiplexers or pass transistors controlled by static memory cells. When an SEU occurs in one of these memory cells, either in the configuration memory bits (LUTs and routing) or in the CLB flip-flop, the stored value is flipped. This creates a fault with a persistent effect in the case of the LUT or routing (corrected by scrubbing), or a fault with a transient effect in the case of the CLB flip-flop (corrected at the next flip-flop load).
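The persistent effect of an upset in a LUT configuration bit can be seen in a toy model. The sketch treats a 4-input LUT as a list of 16 configuration bits indexed by the inputs; the helper names and the bit ordering are assumptions and do not reflect any vendor's bitstream format.

```python
def make_lut(func, n_inputs: int = 4):
    """Fill the LUT configuration bits with the truth table of func."""
    return [func(*[(i >> k) & 1 for k in range(n_inputs)])
            for i in range(2 ** n_inputs)]


def lut_read(config_bits, inputs) -> int:
    """Evaluate the LUT: the inputs select one configuration bit."""
    index = sum(bit << k for k, bit in enumerate(inputs))
    return config_bits[index]


# Golden configuration: a 4-input AND function.
golden = make_lut(lambda a, b, c, d: a & b & c & d)

# SEU in one configuration bit: the implemented function itself changes and
# stays changed until the bit is rewritten (for example by scrubbing).
upset = list(golden)
upset[0] ^= 1

print(lut_read(golden, (0, 0, 0, 0)), lut_read(upset, (0, 0, 0, 0)))
```

Unlike a flipped flip-flop, which is overwritten at the next load, every later evaluation with inputs 0000 returns the wrong value here, which is why the slides call the effect persistent.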

84 SET in SRAM-based FPGAs : CLB slice
[Figure: CLB slice with LUT, routing and configuration memory bits; a transient pulse (X) travels towards the flip-flop.] A SET may be captured by the flip-flop. The transistors that implement the CLB Look-up Table (LUT) logic and routing are also susceptible to transient faults known as SETs. This type of fault is transient and only generates an error in the design if the SET pulse is captured by the CLB flip-flop.

85 General Routing Matrix (GRM)
[Figure: General Routing Matrix connecting an array of CLBs through direct connections, fast connect, double lines, hex lines and long lines.]

86 SEU in SRAM-based FPGAs: Routing
[Figure: SEU effects on direct connections and hex connections, producing open and short circuits in the routing.]

87 Other sensitive structures
Power-on Reset (POR): low probability of occurrence. Signature: the DONE pin transitions low, the I/Os become tri-stated and no user functionality is available. Solution: reconfigure the device. Single-Event Functional Interrupts (SEFI), in the SelectMAP and JTAG controllers: low probability of occurrence. Signature: loss of communication; read access to the configuration memory returns a constant value. Solution: reconfigure the device. Other structures: Digital Clock Manager (DCM), Power-PC hard IP, Input and Output Blocks (IOB), Multi-Gigabit Transceivers (MGT).

88 SEE Characterization – Heavy Ion: Static Testing in Virtex4
It is common to evaluate the SEU sensitivity of the configuration memory bits, BRAM bits, CLB flip-flops and the Power-on Reset (POR) by static testing. But what can we do if we want to measure the SET cross-section of the FPGA logic circuitry presented earlier? BRAMs present a higher error cross-section compared to CLBs. The error cross-section of the POR in Virtex4 has improved compared to Virtex-II. [George, et al. IEEE Radiation Effects Data Workshop, 2006]

89 Scrubbing
[Figure: design flow. Hardware Description Language with TMR by hand; ISE tool (synthesis optimizations, logic mapping, placement, routing); configuration bitstream; scrubbing (full or partial reconfiguration) and fault injection (fault tolerance verification) on the output.]

90 Scrubbing: continuous configuration
It does not correct upsets in embedded memory (BRAM) or in CLB flip-flops. No application interruption. [Figure: an XQR18V04 PROM holding the original bitstream boots the SRAM-based FPGA over DATA[7:0]; a SCRUB controller with a second XQR18V04 and an oscillator (OSC, CCLK) rewrites the configuration bits through the DATA[7:0], CS, WR, INIT and DONE signals.]

91 Configuration Scrubbing Example: to correct persistent effect faults
[Figure: configuration memory columns with a configuration upset in column x.]

92 Configuration Scrubbing Example: to correct persistent effect faults
[Figure: the scrub reaches the column and the upset is repaired.] The scrubbing rate is important to reduce the probability of multiple upsets. Scrubbing can be performed from outside the FPGA, by another FPGA controller, or from inside the FPGA, through the Hardware Internal Configuration Access Port (HWICAP).
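The scrubbing loop itself is simple, as the following sketch suggests. It models the configuration memory as a list of frames and rewrites every frame from a golden copy; the frame size, the data layout and the repair counter are assumptions added for illustration and are not tied to any real configuration interface.

```python
def scrub(config_frames, golden_frames) -> int:
    """Rewrite every configuration frame from the golden bitstream.

    Returns how many frames differed before being rewritten. The comparison
    exists only to count repairs in this model; the rewrite is unconditional.
    """
    repaired = 0
    for address, golden in enumerate(golden_frames):
        if config_frames[address] != golden:
            repaired += 1
        config_frames[address] = golden
    return repaired


# Golden bitstream of 8 frames (4 bytes each) and the live configuration memory.
golden_frames = [bytes([i] * 4) for i in range(8)]
config_frames = list(golden_frames)

# An SEU flips one bit in frame 3 of the live configuration.
config_frames[3] = bytes([3 ^ 0x10, 3, 3, 3])

print(scrub(config_frames, golden_frames))   # frames repaired in this pass
print(config_frames == golden_frames)
```

Only configuration frames are touched: user state such as flip-flop and BRAM contents is outside this loop. The shorter the interval between passes, the less time a second upset has to land before the first is repaired.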

93 Mitigation Techniques
[Figure: design flow. Hardware Description Language with TMR by hand; ISE tool (synthesis optimizations, logic mapping, placement, routing); configuration bitstream; scrubbing (full or partial reconfiguration) and fault injection (fault tolerance verification) on the output.]

94 X-TMR
Full TMR in combinational logic, sequential logic and input/output pads. Why do we need full TMR? To guarantee the correct output in the presence of persistent-effect errors, which are corrected only by loading the correct bitstream. [Figure: FPGA in which the INPUT package pin feeds three redundant logic domains (tr0, tr1, tr2) with TMR flip-flops between logic stages, ending in a TMR output voter on the OUTPUT package pin.]

95 [Figure: the X-TMR datapath (redundant logic tr0, tr1, tr2 from INPUT package pin to TMR output voter) with the TMR flip-flop detailed: registers R0, R1 and R2 on clk0, clk1 and clk2, each fed back through a majority voter (MAJ) implemented in a LUT.] The recovery path is mandatory to correct the state of the flip-flops, especially in FSMs.

96 [Figure: the X-TMR datapath with the TMR output voter detailed: R0, R1 and R2 each reach the package pin through a 3-state buffer (3-state_0, 3-state_1, 3-state_2) controlled by an O_voter signal generated in a LUT from R0, R1 and R2.]
O_voter = 0: it allows the data to pass to the output pad. O_voter = 1: it blocks the data.

97 Evaluating TMR I/O pads
[Swift et al, IEEE TNS 2004] Inputs at 66 MHz

98 Evaluating TMR I/O pads
[Swift et al., IEEE TNS 2004] Heavy Ion

99 Evaluating Multiple Bit Upsets
Heavy ion radiation static test: Virtex family (220nm CMOS) and Virtex-II family (130nm CMOS). [Quinn, et al., IEEE TNS, 2005]

100 Domain Crossing Events
Bit-flips in the routing can generate short-circuit connections among different blocks of the TMR (tr0, tr1 and tr2). [Figure: FPGA with redundant logic tr0, tr1 and tr2, TMR registers with voters and refresh, and a TMR output majority voter; bit-flip a lies inside tr0.] Bit-flip a: affects only the redundant logic tr0; consequently, the majority voter chooses the correct result (two out of three outputs).

101 Domain Crossing Events
Bit-flips in the routing can generate short-circuit connections among different blocks of the TMR (tr0, tr1 and tr2). [Figure: FPGA with redundant logic tr0, tr1 and tr2, TMR registers with voters and refresh, and a TMR output majority voter; bit-flip b connects two redundant domains.] Bit-flip b: affects two redundant logic parts; consequently, the majority voter will not choose the correct result, since two out of three outputs are affected.

102 Solution to Reduce Domain Crossing Events
Voter insertion: a barrier of voters can reduce the probability of a bit-flip in the routing causing a short-circuit connection between two or more redundant blocks. [Kastensmidt, et al., DATE 2005] [Figure: the TMR logic (tr0, tr1, tr2) divided into logic partitions by intermediate TMR majority voters, with voters and refresh in the TMR registers; with bit-flip b the output is OK.]

103 TMR BRAM (Embedded memory)
Upsets in BRAMs are not corrected by scrubbing. TMR with refreshing must be used to mitigate upsets. Dual-port BRAMs are needed. [Figure: three BRAM copies with voters and a counter-driven mechanism to refresh the memory contents; an upset (X) in one copy leaves the voted outputs OK.]
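TMR with refreshing can be sketched as three memory copies, a voter on the read path and a counter that walks the addresses and writes the voted value back. The class below is a behavioural illustration only; its name, its methods and the one-address-per-step refresh policy are assumptions, with the refresh standing in for the second BRAM port.

```python
def maj(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)


class TmrMemory:
    """Three redundant memory banks with voted reads and a refresh counter."""

    def __init__(self, size: int):
        self.banks = [[0] * size for _ in range(3)]
        self.size = size
        self.refresh_addr = 0          # counter used by the refresh mechanism

    def write(self, addr: int, value: int) -> None:
        for bank in self.banks:
            bank[addr] = value

    def read(self, addr: int) -> int:
        return maj(*(bank[addr] for bank in self.banks))

    def refresh_step(self) -> None:
        """Write the voted value back to all banks at the next address."""
        addr = self.refresh_addr
        self.write(addr, self.read(addr))
        self.refresh_addr = (addr + 1) % self.size


mem = TmrMemory(4)
mem.write(2, 0xAB)
mem.banks[1][2] ^= 0x04               # SEU in one copy
print(hex(mem.read(2)))               # voted read still returns the stored value
for _ in range(mem.size):
    mem.refresh_step()                # one full refresh pass repairs the bad copy
print(hex(mem.banks[1][2]))
```

Without the refresh, the upset copy would stay wrong and a later upset in a second copy at the same address would defeat the vote.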

104 Verifying the Mitigated Design
[Figure: design flow. Hardware Description Language with TMR by hand; ISE tool (synthesis optimizations, logic mapping, placement, routing); configuration bitstream; scrubbing (full or partial reconfiguration) and fault injection (fault tolerance verification), followed by output checking.]

105 Flash-based: Actel ProASIC3

106 Flash-based FPGA: CLB tile

107 Summary
Antifuse FPGAs:
- Fault tolerance techniques applied in VHDL/Verilog
- protect against SET (SEU is protected by the vendor)
SRAM FPGAs:
- Scrubbing to clean persistent faults
- protect against SET and SEU
- New FPGA protected by the vendor is coming out!
Flash FPGAs:
- protect against SEU and SET
- Flash transistor sensitivity to SEE is low, still under investigation

108 Outline Radiation Effects on Digital ICs
Radiation Hardening by Design: Strategies for ASICs Radiation Effects on FPGAs Radiation Hardening by Design: Strategies for FPGAs Final Remarks

109 Final Remarks
Mitigation techniques for ASICs and FPGAs must take into account SEUs and SETs, considering single and multiple effects. ASICs: integrated systems fabricated in nanometer technologies should have mitigation techniques at different levels to ensure robustness: charge dissipation (transistor resizing, capacitors, resistors); sensors (bulk-BICS); hardware and time redundancy; error-correction codes (ECCs); self-checking and recomputation.

110 Final Remarks
FPGAs: new FPGA generations bring more flexibility and design capabilities, but also more reliability challenges for the design. The design can always be protected by high-level techniques (VHDL, Verilog) such as TMR. In order to reduce the cost of TMR, solutions at the FPGA architectural level are needed in: CLB logic (combinational blocks, sequential blocks, programmable switches); routing (programmable switches) … to mitigate against SEU and SET!

111 Conferences
NSREC: IEEE Nuclear and Space Radiation Effects Conference. RADECS: European Conference on Radiation Effects on Components and Systems. 2011: RADECS in Sevilla, Spain.

112 Schools SERESSA 2011 - Brazil First: 2006 - Manaus - Brazil
Second: Sevilla - Spain Third: Buenos Aires - Argentina Fourth: Florida, USA France Brazil

114 SEE Mitigation Strategies for Digital Circuit Design Applicable to ASIC and FPGAs
Fernanda Lima Kastensmidt, Ph.D.

