
1 SEE Mitigation Strategies for Digital Circuit Design Applicable to ASIC and FPGAs Prof. Fernanda Lima Kastensmidt, Ph.D. Instituto de Informatica Universidade Federal do Rio Grande do Sul Porto Alegre – RS – Brazil

2 Prof. Fernanda Lima Kastensmidt Motivation A large set of electronic devices used in avionic, space and ground-level applications can be upset by ionizing particles. [Figure: a general system (memory, processors, analog electronics, FPGA, ASIC) can be built from hardened components (high cost, high reliability) or from COTS components (low cost, low reliability).]

3 Prof. Fernanda Lima Kastensmidt Motivation Solution I: If hardened components (high cost, high reliability) are too expensive, one solution may be to design your own hardened device. – Which fault tolerance techniques should be used? – How much fault tolerance is enough? It is then necessary to qualify your hardened design.

4 Prof. Fernanda Lima Kastensmidt Motivation Solution II: Use COTS components (low cost, low reliability) and qualify the device to analyze its robustness for the application. – Is it possible to apply some fault tolerance technique, at the software level or at the component replication level?

5 Prof. Fernanda Lima Kastensmidt Types of SEE Single event phenomena can be classified into three effects (in order of permanency): – Single event upset and single event transient (soft error) – Single event latchup (soft or hard error) – Single event burnout (hard failure). Hard errors such as Single Event Latchup (SEL) are due to shorts between ground and power, and cause permanent functional damage.

6 Prof. Fernanda Lima Kastensmidt Collected Charge Depending on the circuit, the transistor size and the charge energy, current pulses of different amplitudes, durations and shapes will appear.

7 Prof. Fernanda Lima Kastensmidt Charge Collection Mechanism $I_C(t) = I_{CRITICAL}(t) = I_P(t) - I_{ON}(t)$ [Figure: circuit node annotated with the currents $I_P$, $I_{ON}$ and $I_C$.] A soft error occurs when $Q_{collected} > Q_{critical}$.
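
The soft-error condition on this slide can be illustrated numerically. The sketch below assumes a double-exponential current pulse, a common textbook model that is not taken from the slides. The amplitude, time constants and critical charge are hypothetical values chosen only for illustration.

```python
import math

def pulse_current(t, i0, tau_fall, tau_rise):
    """Double-exponential current pulse (assumed model, not from the slides)."""
    return i0 * (math.exp(-t / tau_fall) - math.exp(-t / tau_rise))

def collected_charge(i0, tau_fall, tau_rise, t_end, steps=20_000):
    """Integrate the pulse over [0, t_end] with the trapezoidal rule."""
    dt = t_end / steps
    total = 0.0
    for k in range(steps):
        a = pulse_current(k * dt, i0, tau_fall, tau_rise)
        b = pulse_current((k + 1) * dt, i0, tau_fall, tau_rise)
        total += 0.5 * (a + b) * dt
    return total

def is_soft_error(q_collected, q_critical):
    """A soft error occurs when the collected charge exceeds the critical charge."""
    return q_collected > q_critical

# Hypothetical numbers: 1 mA scale factor, 200 ps fall, 50 ps rise, 100 fC critical charge.
q = collected_charge(i0=1e-3, tau_fall=200e-12, tau_rise=50e-12, t_end=5e-9)
print(f"collected charge = {q * 1e15:.1f} fC, upset = {is_soft_error(q, 100e-15)}")
```

For this pulse shape the integral has the closed form i0 * (tau_fall - tau_rise), which gives a quick sanity check on the numerical result.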

8 Prof. Fernanda Lima Kastensmidt Fault Tolerance Fault masking: any technique that prevents faults from introducing errors at the output (failure). [Figure: ionization in the device leading to a FAILURE.]

9 Prof. Fernanda Lima Kastensmidt Fault Tolerance [Figure: chain from ionization to failure. FAULT: transient current (injected or extracted from the junction) and transient voltage pulse (capacitor node). ERROR: bit-flip captured at the clk edge. FAULT EFFECT, then FAILURE. The stages are separated by the fault latency and the error latency; sensors provide detection.] Fault masking (hardening by design): hardware and time redundancy; hardened memory cells; error-correction codes; self-checking mechanisms with recovery; shielding.

10 Prof. Fernanda Lima Kastensmidt Fault Tolerance [Figure: chain from ionization to failure. FAULT: transient current (injected or extracted from the junction) and transient voltage pulse (capacitor node). ERROR: bit-flip captured at the clk edge. FAULT EFFECT, then FAILURE. The stages are separated by the fault latency and the error latency; sensors provide detection.] Fault masking (hardening by design): hardware and time redundancy; hardened memory cells; error-correction codes; self-checking mechanisms with recovery. Redundant spare components. A failure occurs when the number of faults overcomes the mitigation technique.

11 Prof. Fernanda Lima Kastensmidt Outline Radiation Effects on Digital ICs Radiation Hardening by Design: Strategies for ASICs Radiation Effects on FPGAs Radiation Hardening by Design: Strategies for FPGAs Final Remarks

12 Prof. Fernanda Lima Kastensmidt Outline Radiation Effects on Digital ICs Radiation Hardening by Design: Strategies for ASICs Radiation Effects on FPGAs Radiation Hardening by Design: Strategies for FPGAs Final Remarks

13 Prof. Fernanda Lima Kastensmidt Single Event Effects (SEEs) Single Event Upset (SEU): bit-flip in a sequential logic element. Digital Single Event Transient (DSET): transient voltage pulse in the combinational logic. [Figure: combinational logic feeding sequential logic, showing a transient effect.]

14 Prof. Fernanda Lima Kastensmidt SEU in Sequential Logic [Figure: memory cell with word line (WL); ionization at an OFF transistor causes a BIT-FLIP of the stored value.]

15 Prof. Fernanda Lima Kastensmidt Hardened Memories Approach 1: use decoupling resistors to slow the regenerative feedback response of the cell and so avoid the bit-flip [Rockett, R., IEEE TNS, 1992]

16 Prof. Fernanda Lima Kastensmidt Hardened Memory Approach 2: add transistors to create an appropriate feedback path devoted to restoring the corrupted data. IBM Memory Cell [Rockett cell, 88]; HIT Memory Cell [Velazco, 92]

17 Prof. Fernanda Lima Kastensmidt Hardened Memories The principle is to store the data in two different locations within the cell in such a way that the corrupted part can be restored. Whitaker/Liu Memory Cell [Liu, 92]; DICE Memory Cell [Calin, 96]

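18 Prof. Fernanda Lima Kastensmidt Dual Interlocked storage Cell (DICE) [Figure: DICE latch with clock clk and outputs Qa and Qb, annotated with node values and an OFF transistor.]

19 Prof. Fernanda Lima Kastensmidt Dual Interlocked storage Cell (DICE) [Figure: the same DICE latch with one node value changed.]

20 Prof. Fernanda Lima Kastensmidt Dual Interlocked storage Cell (DICE) The original value is restored. [Figure: the same DICE latch with the node values back in their initial state.]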
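
The restoring behavior of the DICE cell can be sketched with a very coarse logic-level model. The topology used here is an assumption based on how the cell is commonly described: four storage nodes in a ring, each pulled up by a PMOS gated by the previous node and pulled down by an NMOS gated by the next node. Treating a floating or contended node as simply holding its value is a further simplification, so this is an illustration, not a circuit simulation.

```python
def dice_step(nodes):
    """One synchronous update of the four DICE storage nodes.

    Assumed topology: node i is pulled up by a PMOS gated by node i-1 and
    pulled down by an NMOS gated by node i+1. When neither or both
    transistors conduct, this model keeps the previous value.
    """
    n = len(nodes)
    nxt = []
    for i, value in enumerate(nodes):
        pmos_on = nodes[(i - 1) % n] == 0
        nmos_on = nodes[(i + 1) % n] == 1
        if pmos_on and not nmos_on:
            nxt.append(1)
        elif nmos_on and not pmos_on:
            nxt.append(0)
        else:
            nxt.append(value)  # floating or contended node: hold (assumption)
    return nxt

def settle(nodes, max_steps=8):
    """Iterate until the node values stop changing."""
    for _ in range(max_steps):
        nxt = dice_step(nodes)
        if nxt == nodes:
            break
        nodes = nxt
    return nodes

def upset(nodes, index):
    """Return a copy of the state with one node flipped."""
    flipped = list(nodes)
    flipped[index] ^= 1
    return flipped

stored = [0, 1, 0, 1]
for i in range(4):
    print("node", i, "restored:", settle(upset(stored, i)) == stored)
```

The model is only meant for single-node upsets. Upsets on two nodes at once are the case the later slides on multiple-node charge collection warn about, and this simplified model does not represent them faithfully.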

21 Prof. Fernanda Lima Kastensmidt Challenges in Sequential Logic Factors: particle incidence angle; transistor dimensions; voltage supply; memory array density. MULTIPLE BIT UPSETS: [Figure: a strike affecting a single memory cell versus a strike affecting multiple memory cells.]

22 Prof. Fernanda Lima Kastensmidt Charge Sharing (NMOS transistor) [Figure: snapshots at T=0, T=50ps, T=100ps, T=250ps, T=800ps and T=2ns.] [Reed, et al., New Electronic Technologies Insertion into Flight Programs Workshop, 2007]

23 Prof. Fernanda Lima Kastensmidt Limitations of Hardened Memory Multiple nodes collecting charge are able to upset hardened memory cells. Solutions: Shallow Trench Isolation (STI) structures; suitable transistor placement and routing; hardened memory cells combined with hardware redundancy.

24 Prof. Fernanda Lima Kastensmidt Triple Modular Redundancy [Figure: triplicated combinational and sequential logic clocked by clk, followed by a majority voter (MAJ); an upset (X) in one copy is voted out (OK).] MAJ truth table (inputs -> output): 000 -> 0, 001 -> 0, 010 -> 0, 011 -> 1, 100 -> 0, 101 -> 1, 110 -> 1, 111 -> 1. Each master-slave flip-flop can be composed of: standard latches, robust to multiple-node charge collection in the same latch; or hardened latches, robust to multiple-node charge collection in latches of crossing domains too.
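
The MAJ truth table on this slide corresponds to a 2-out-of-3 vote. A minimal sketch follows; the function name and the word-wide example values are illustrative and not taken from the slides.

```python
def majority(a, b, c):
    """Bitwise 2-out-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)

# Reproduce the truth table for single-bit inputs 000 .. 111.
for bits in range(8):
    a, b, c = (bits >> 2) & 1, (bits >> 1) & 1, bits & 1
    print(a, b, c, "->", majority(a, b, c))

# The same expression votes whole words bit by bit.
good = 0b1010
one_copy_upset = majority(good, good, 0b0110)     # masked
two_copies_upset = majority(good, 0b0110, 0b0110)  # the wrong value wins
print(bin(one_copy_upset), bin(two_copies_upset))
```

The last line shows both sides of TMR: one corrupted copy is outvoted, while two copies corrupted in the same way defeat the voter.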

25 Prof. Fernanda Lima Kastensmidt Triple Modular Redundancy The voter output can show a transient wrong value that may be captured by the next memory cell. [Figure: TMR scheme with combinational logic, sequential logic and a MAJ voter (same truth table as on the previous slide); a transient (X) appears at the voter output.]

26 Prof. Fernanda Lima Kastensmidt Triple Modular Redundancy Triple MAJ voter: increases the current drive (current strength), helping to keep the node at its original value. [Figure: TMR scheme with combinational logic, sequential logic and three MAJ voters; the outputs remain OK.]

27 Prof. Fernanda Lima Kastensmidt Triple Modular Redundancy Triple MAJ voter. Catastrophic effect: the system votes on three wrong values out of three and the result is assumed to be correct. [Figure: TMR scheme with upsets (X) in all three copies; MAJ truth table as on the earlier TMR slides.]

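28 Prof. Fernanda Lima Kastensmidt SET in Combinational Logic Each node has an associated capacitance and resistance. [Figure: current versus time; collected charge $Q_i$ with drift and diffusion components ($Q_{drift}$, $Q_{diffusion}$), compared with the critical charge $Q_{CRIT}$.] SET pulse: amplitude x width.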

29 Prof. Fernanda Lima Kastensmidt SET in Combinational Logic Not all SETs are captured by a memory cell. They can be: logically masked; electrically masked; latch-window masked. [Figure: circuit with signals e0, e1, e2, a3 and output Q, illustrating logical masking.]

30 Prof. Fernanda Lima Kastensmidt SET in Combinational Logic Not all SETs are captured by a memory cell. They can be: logically masked; electrically masked; latch-window masked. [Figure: the same circuit illustrating electrical masking: the pulse becomes negligible before reaching Q.]

31 Prof. Fernanda Lima Kastensmidt SET in Combinational Logic Not all SETs are captured by a memory cell. They can be: logically masked; electrically masked; latch-window masked. [Figure: the same circuit illustrating latch-window masking: the pulse does not coincide with the clk edge.]

32 Prof. Fernanda Lima Kastensmidt Electrical Masking [Bruguier, G., et al., IEEE TNS, 1996] Heavy Ion Radiation Results: 180nm CMOS Pulse too narrow!!!

33 Prof. Fernanda Lima Kastensmidt SET vs. Frequency Radiation Results: DSET for 180nm vs. Freq Freq. clk [Benedetto et al, IEEE TNS, 2004]

34 Prof. Fernanda Lima Kastensmidt Challenges in Combinational Logic SET Transient Width (TW) may vary from a few hundred picoseconds to a few nanoseconds according to the LET. [Figure: critical transient width (ps) versus process technology (nm), annotated with clock frequencies; a pulse of width TW shown against clk.] [Dodd, P., IEEE TNS 2004]
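
The dependence of SET capture on clock frequency can be illustrated with a first-order estimate. The model below is an assumption and not the analysis from the cited papers: a pulse of width TW arrives at a uniformly random time and is counted as captured when it overlaps a clock edge, ignoring setup and hold times and any electrical or logical masking.

```python
def capture_probability(pulse_width_s, clock_hz):
    """Chance that a pulse arriving at a random time covers a clock edge (first-order model)."""
    period = 1.0 / clock_hz
    return min(1.0, pulse_width_s / period)

# Hypothetical 400 ps transient at several clock frequencies.
for f in (100e6, 500e6, 1e9, 2.5e9):
    print(f"{f / 1e6:7.0f} MHz -> {capture_probability(400e-12, f):.2f}")
```

Under this assumption the estimate grows linearly with frequency until the pulse is as wide as the clock period, which matches the qualitative trend the slides report for SET errors versus frequency.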

35 Prof. Fernanda Lima Kastensmidt SET vs. SEU Error Rate

36 Prof. Fernanda Lima Kastensmidt Challenges in Combinational Logic Depending on the fan-out of the logic topology, a single SET may originate multiple SETs. [Figure: logic with nodes a0 to a5 and y0, y1 driving flip-flops Q0 and Q1; both flip-flops are affected (X X).]

37 Prof. Fernanda Lima Kastensmidt Identifying the most sensitive nodes Fault injection performed by electrical (SPICE) and logic simulations can identify the most sensitive nodes, namely those with: lower critical charge ($Q_{CRIT}$); lower probability of SET logical masking. [Figure: circuit with inputs A to F and output Z, with the most sensitive nodes highlighted.]
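
Logic-level fault injection of this kind can be sketched as an exhaustive simulation. The circuit on the slide is not recoverable from the transcript, so the netlist below, z = (a AND b) OR (c AND d), and its node names are hypothetical. For each node the sketch counts the input vectors for which a flip at that node reaches the output; nodes with a high fraction have a low logical masking probability.

```python
from itertools import product

def circuit(a, b, c, d, flip=None):
    """Hypothetical netlist z = (a AND b) OR (c AND d) with optional fault injection."""
    def node(name, value):
        # Invert the value of the node selected for fault injection.
        return value ^ 1 if name == flip else value
    n1 = node("n1", a & b)
    n2 = node("n2", c & d)
    return node("z", n1 | n2)

def propagation_probability(node_name):
    """Fraction of input vectors for which a flip at node_name changes the output."""
    vectors = list(product((0, 1), repeat=4))
    hits = sum(circuit(*v) != circuit(*v, flip=node_name) for v in vectors)
    return hits / len(vectors)

for name in ("n1", "n2", "z"):
    print(name, propagation_probability(name))
```

An electrical (SPICE-level) campaign would additionally be needed to rank nodes by critical charge, which a logic-level model like this cannot capture.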

38 Prof. Fernanda Lima Kastensmidt Transistor Resizing [Figure: circuit with inputs A to F and output Z; the most sensitive nodes are resized to act on Q CRITICAL.] [Zhou et al., IRPS 2004] [Cazeaux et al., IOLTS 2005] [Dhillon et al., IEEE Transaction on ISVLSI 2006]

39 Prof. Fernanda Lima Kastensmidt Gate Replication Increases the current drive (current strength), helping to keep the node at its original value. [Figure: circuit with inputs A to F and output Z, with the gates at the most sensitive nodes replicated.] [Lisboa, C., et al., SBCCI 2005] [Nieuwland et al., IOLTS 2006]

40 Prof. Fernanda Lima Kastensmidt Temporal Filtering Votes the SET out by time redundancy. The time redundancy is implemented by delays on the clock lines or at the latch/flip-flop inputs. [Figure: two variants, each with combinational logic, sequential logic and a triple or single MAJ voter. In the first, the flip-flops are clocked by clk, clk+ΔT and clk+2ΔT. In the second, a single clk is used and the data is delayed by ΔT and 2ΔT. In both, the transient (X) is voted out (OK).] [Nicolaidis, VTS 1999], [Anghel et al., DATE 2000]

41 Prof. Fernanda Lima Kastensmidt Full time redundancy ΔT is directly proportional to the SET Transient Width (TW). [Figure: flip-flops ffp0, ffp1 and ffp2 clocked by clk, clk+ΔT and clk+2ΔT, followed by a triple or single MAJ voter; timing diagram showing an SET of width TW, the combinational delay and the MAJ delay.] [Nicolaidis, VTS 1999] [Anghel et al., DATE 2000]

42 Prof. Fernanda Lima Kastensmidt Full time redundancy [Figure: the same scheme with clocks clk, clk+2ΔT and clk+4ΔT, for an SET of width 2TW; timing diagram showing the clk period (T), the combinational delay and the MAJ delay.]

43 Prof. Fernanda Lima Kastensmidt Temporal Latching to Trigger SETs Error cross-section decreases as ΔT increases. [Benedetto et al., IEEE TNS 2004]
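
The time-redundancy scheme on these slides (three samples taken at clk, clk plus one delay and clk plus two delays, then voted) can be sketched as follows. Times are integers in picoseconds, and the pulse widths, delays and function names are hypothetical.

```python
def signal(t, correct, set_start, set_width):
    """Combinational output: inverted while the transient pulse is present."""
    in_pulse = set_start <= t < set_start + set_width
    return correct ^ 1 if in_pulse else correct

def majority(a, b, c):
    return (a & b) | (a & c) | (b & c)

def temporal_vote(t_clk, delta, correct, set_start, set_width):
    """Sample at clk, clk+delta and clk+2*delta, then vote."""
    samples = [signal(t_clk + k * delta, correct, set_start, set_width)
               for k in range(3)]
    return majority(*samples)

# A 200 ps pulse with a 300 ps delay can corrupt at most one sample.
print(temporal_vote(t_clk=1000, delta=300, correct=1, set_start=950, set_width=200))
# A 400 ps pulse can cover two samples and defeat the vote.
print(temporal_vote(t_clk=1000, delta=300, correct=1, set_start=950, set_width=400))
```

In this model the delay has to exceed the widest transient to be filtered, which is why the delay is chosen according to TW and why the performance penalty grows with it.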

44 Prof. Fernanda Lima Kastensmidt Triple Sample Memory Robust to Multiple Bit Upsets and SET [MAVIS, IRPS 2002] combinational logic Shifted clocks

45 Prof. Fernanda Lima Kastensmidt Triple Sample Memory Robust to Multiple Bit Upsets and SET [MAVIS, IRPS 2002] combinational logic Shifted clocks X OK

46 Prof. Fernanda Lima Kastensmidt Triple Sample Memory Robust to Multiple Bit Upsets and SET [MAVIS, IRPS 2002] combinational logic Shifted clocks Multiple nodes collected charge OK X

47 Prof. Fernanda Lima Kastensmidt Triple Sample Memory Robust to Multiple Bit Upsets and SET [MAVIS, IRPS 2002] combinational logic Shifted clocks OK Multiple nodes collected charge

48 Prof. Fernanda Lima Kastensmidt Full Triple Modular Redundancy (TMR) with self-recovery [Figure: triplicated combinational logic in domains TR0, TR1 and TR2, clocked by clk0, clk1 and clk2, with signals labelled E0 to E2 and D0 to D2 and voters TRV0, TRV1 and TRV2; a fault (X) in one domain is voted out (OK).]

49 Prof. Fernanda Lima Kastensmidt Full Triple Modular Redundancy (TMR) with self-recovery [Figure: the same full TMR scheme with the fault (X) at a different location, again voted out (OK).]

50 Prof. Fernanda Lima Kastensmidt Full Triple Modular Redundancy (TMR) with self-recovery [Figure: the same full TMR scheme, with the three outputs leaving through separate output pads and combined by a wired voter at the output pad.]

51 Prof. Fernanda Lima Kastensmidt How much mitigation is enough? Circuits are becoming more and more complex. Hardware and time redundancy techniques can provide a certain level of protection against: – Single Event Upsets (SEU) – Single Event Transients (SET) – Multiple bit or node upsets. Problem: in some cases multiple faults can overcome the mitigation techniques, provoking a system failure.

52 Prof. Fernanda Lima Kastensmidt Multiple Faults in the Full TMR [Figure: full TMR scheme (domains TR0, TR1, TR2; voters TRV0, TRV1, TRV2; clocks clk0, clk1, clk2) with faults (X) in two places, producing a WRONG VALUE.]

53 Prof. Fernanda Lima Kastensmidt How much mitigation is enough? How is it possible to know that the mitigation technique is working properly for a certain Soft Error Rate (SER)? It is necessary to have a mechanism that informs the system when the number of multiple faults has passed a certain level. Built-in Self Test (BIST) mechanism: – sensors working as watchdogs – each time an ionization occurs, the system is informed

54 Prof. Fernanda Lima Kastensmidt How about sensors working as watchdogs? Full TMR with sensors [Figure: full TMR scheme (domains TR0, TR1, TR2; voters TRV0, TRV1, TRV2; clocks clk0, clk1, clk2) with sensors attached to the combinational logic.]

55 Prof. Fernanda Lima Kastensmidt How about sensors working as watchdogs? Full TMR with sensors If the sensors detect one upset at a time: the technique is working!

56 Prof. Fernanda Lima Kastensmidt How about sensors working as watchdogs? Full TMR with sensors If the sensors detect two or more upsets in distinct redundant modules at the same time: the technique is not working!
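
The decision rule on the last two slides can be written down directly. This is a minimal sketch with hypothetical names: one flag per redundant module, all sampled in the same time window, and status strings invented for illustration.

```python
def mitigation_status(sensor_flags):
    """Classify one time window from the per-module sensor flags (tr0, tr1, tr2)."""
    hit_modules = sum(bool(flag) for flag in sensor_flags)
    if hit_modules == 0:
        return "idle"         # no ionization sensed
    if hit_modules == 1:
        return "masked"       # one upset at a time: TMR is working
    return "overwhelmed"      # two or more modules hit: TMR cannot be trusted

print(mitigation_status([False, True, False]))
print(mitigation_status([True, False, True]))
```

What counts as "the same time window" is a design parameter the slides do not specify; it would presumably be tied to the clock or to the scrubbing interval.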

57 Prof. Fernanda Lima Kastensmidt Bulk Built-in Current Sensors (Bulk-BICS) During normal operation, the current in the bulk is approximately zero. When an energetic particle generates an ionization, it creates a current that flows through the struck node and V dd or gnd. The bulk-BICS senses the current generated by ionization at the bulk terminal. [Henes Neto et al., IEEE MICRO, 2006]

58 Prof. Fernanda Lima Kastensmidt Bulk Built-in Current Sensors Circuit Design [Figure: BICS-N and BICS-P sensor schematics (transistors n1 to n6 and p1 to p6, reset signals RST and nRST, supplies Vdd and Gnd) attached to the monitored N and P transistors; an ionization flips the BICS latch.]

59 Prof. Fernanda Lima Kastensmidt Trade-offs There is always some penalty to be paid when protecting circuits against upsets. Each technique may present a combination of: – area overhead, – performance penalty, – power dissipation increase. The challenge is to select the most cost-effective techniques for the target circuit application.

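60 Prof. Fernanda Lima Kastensmidt CASE-STUDY: Adder Detection of SET and SEU: – Duplication with Comparison (DWC): two adders and a comparator (=) – Bulk-BICS attached to the adder – Recomputing with Shifted Operands: S = A + B, then 2S = 2A + 2B with the operands shifted (<<), the result shifted back (>>) and compared (=)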
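
Recomputing with shifted operands can be illustrated with a small fault model. The sketch assumes a stuck-at fault on one result bit of the adder, a hypothetical 16-bit width, and operands small enough that the shifted sum still fits. None of these details come from the slides.

```python
def faulty_add(a, b, stuck_bit=None, stuck_value=0, width=16):
    """Adder whose result bit `stuck_bit` is forced to `stuck_value` (assumed fault model)."""
    result = (a + b) & ((1 << width) - 1)
    if stuck_bit is not None:
        if stuck_value:
            result |= 1 << stuck_bit
        else:
            result &= ~(1 << stuck_bit)
    return result

def reso_check(a, b, **fault):
    """Return (sum, error_detected) using recomputing with shifted operands."""
    first = faulty_add(a, b, **fault)                 # S = A + B
    second = faulty_add(a << 1, b << 1, **fault) >> 1  # 2S = 2A + 2B, shifted back
    return first, first != second

print(reso_check(5, 6))                               # fault-free adder
print(reso_check(5, 6, stuck_bit=1, stuck_value=0))   # stuck-at-0 on result bit 1
```

Because the second pass uses the faulty bit position for a different bit of the sum, a fault that corrupts either pass makes the two results disagree. The technique detects the error but does not say which result is right, and it costs a second addition in time instead of a second adder in area.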

61 Prof. Fernanda Lima Kastensmidt CASE-STUDY: Adder SEU correction: – Hardened flip-flops around the adder – Error-Correction Code (Hamming): an encoder (enc) and a decoder (dec) around each register
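
The slide names a Hamming code but not its parameters. As an illustration only, the sketch below uses the classic Hamming(7,4) code, with the encoder placed before a register and the decoder after it, so that a single bit-flip in the stored word is corrected on read.

```python
def hamming74_encode(d):
    """d: list of 4 data bits -> list of 7 code bits (positions 1..7)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_decode(c):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s4  # 0 means no error, otherwise the 1-based position
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

word = [1, 0, 1, 1]
stored = hamming74_encode(word)
stored[4] ^= 1  # single event upset in the register
print(hamming74_decode(stored) == word)
```

A code of this kind corrects one upset per word, so two upsets in the same word, for example from a multiple-bit upset, would be miscorrected.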

62 Prof. Fernanda Lima Kastensmidt CASE-STUDY: Adder SEU and SET correction: – TMR with single voter – TMR with triple voter

63 Prof. Fernanda Lima Kastensmidt CASE-STUDY: Adder SEU and SET correction: time redundancy (delays ΔT and 2ΔT) with TMR in the registers, followed by a voter

64 Prof. Fernanda Lima Kastensmidt AREA vs. PERFORMANCE SEU and SET detection SEU correction SEU and SET correction Less than 50% More than 200% Less than 50%

65 Prof. Fernanda Lima Kastensmidt How about Qualifying for SEE? Testing by fault injection: – Model the SEU and SET effect at: Spice level Logic level or RTL level Testing in a Laser Facility Testing at ground-level facilities – (in front of a beam of Protons, heavy ions, neutrons) Testing in space (actual environment) accuracy cost

66 Prof. Fernanda Lima Kastensmidt When testing in a ground-level facility for SEE: Static testing: – No application is running during the test. – The register files are read during or after the test to check for SEU and/or SET and compared to a gold file. – Tests memories, microprocessors and ASICs in general. Dynamic testing: – Applications are running during the test. – Outputs are analyzed and compared to a gold design. – SEU and SET can be checked during the test. – Tests memories, microprocessors, ASICs in general, analog circuits, etc.

67 Prof. Fernanda Lima Kastensmidt General System memory processors Analog logic FPGA ASIC

68 Prof. Fernanda Lima Kastensmidt Outline Radiation Effects on Digital ICs Radiation Hardening by Design: Strategies for ASICs Radiation Effects on FPGAs Radiation Hardening by Design: Strategies for FPGAs Final Remarks

69 Prof. Fernanda Lima Kastensmidt Field-Programmable Gate Arrays An array of logic blocks and interconnections customizable by programmable switches. High logic density Customizable by the end user to realize different designs Configurable logic blocks (CLBs) interconnections Switches for customization

70 Prof. Fernanda Lima Kastensmidt Programmable Technologies Programmable switches can be based on: Antifuse (antifuse-based FPGAs): an electrically programmable switch forms a low-resistance path between two metal layers; one-time configurable. SRAM (SRAM-based FPGAs): the state of a static latch controls pass transistors or multiplexers connected to pre-defined metal layers; re-configurable. Flash (Flash-based FPGAs): a floating gate controls the switches; re-configurable.

71 Prof. Fernanda Lima Kastensmidt Antifuse-based FPGAs Non-volatile: they hold the customized content even when not connected to the power supply. They can be programmed just once. FPGA products for space: – ACTEL – AEROFLEX (based on QuickLogic)

72 Prof. Fernanda Lima Kastensmidt ACTEL: RTAX-S device [Figure: RTAX-S architecture showing RAM blocks and a Super Cluster (labels C, R, RX, TX and B, among others).] [Actel, RTAX-S RadTolerant FPGAs 2007]

73 Prof. Fernanda Lima Kastensmidt ACTEL: RTAX-S device R-CELL: robust to SEU. C-CELL: susceptible to SET, which can lead to an ERROR. [Figure: C-cell and R-cell schematics.] [Actel, RTAX-S RadTolerant FPGAs 2007]

74 Prof. Fernanda Lima Kastensmidt Effects of Frequency Response Circuit: Shift Register with 8 levels of C-cell between R-cells Error cross-section increases when frequency increases. # ERROR clk edge [Berg, M. et al., IEEE TNS 2006]

75 Prof. Fernanda Lima Kastensmidt RadHard Eclipse FPGA from Aeroflex Hardened flip-flops: robust to SEU. ViaLink connections. [Figure: device structure with an upset (X) leading to an ERROR.]

76 Prof. Fernanda Lima Kastensmidt Antifuse FPGAs: summary Customized routing is not sensitive to SEU. Flip-flops are not sensitive to SEU: – Actel and Aeroflex provide solutions in which all flip-flops are hardened. Logic is susceptible to DSETs: – The user may protect the logic by using high-level mitigation techniques in the VHDL/Verilog description of the design (TMR, duplication and others).

77 Prof. Fernanda Lima Kastensmidt SRAM-based FPGAs Volatile: they lose their contents when the memories are not connected to the power supply. They can be reprogrammed as many times as necessary at the work site. They are programmed by loading a bitstream. FPGA products for space: – XILINX – ATMEL – HONEYWELL

78 Prof. Fernanda Lima Kastensmidt SRAM-based FPGAs A basic board must be composed of: FPGA; oscillator; I/O interface; power supply (core and I/O); EEPROM; FPGA loader and memory; programming interface. The original design bitstream must be stored in a memory outside the FPGA. Memory size needed: the bitstream may range from Kbytes to several Mbytes.

79 Prof. Fernanda Lima Kastensmidt Reconfigurability Can offer benefits for space and remote applications by: saving space in the system, since the same circuitry can be used with different configurations at different stages of a mission, reducing weight and power requirements; allowing in-orbit design changes; reducing the mission cost by correcting errors. If part of an FPGA fails, the circuitry can be reprogrammed to make use of the remaining functional portions of the chip.

80 Prof. Fernanda Lima Kastensmidt FPGA Design Flow Hardware Description Language Synthesis optimizations Logic mapping Placement Routing configuration bitstream … 101001110100000111…

81 Prof. Fernanda Lima Kastensmidt Technology Scaling in Xilinx FPGAs Nanometer technologies Embedded Hard microprocessor Embedded memories (BRAM)

82 Prof. Fernanda Lima Kastensmidt SRAM-based FPGA Architecture Configurable logic block (CLB) GRM slices BRAM Boolean Function F(A,B,C,D) Xilinx FPGA

83 Prof. Fernanda Lima Kastensmidt SEU in SRAM-based FPGAs: configuration memory bits [Figure: CLB slice with a LUT (inputs I1 to I4, contents such as 0 0 0 1 0 1 1 1), routing and a flip-flop.] Upsets in the configuration memory bits (LUT, routing) have a persistent effect, corrected by scrubbing. Upsets in the flip-flops have a transient effect, corrected at the next flip-flop load.

84 Prof. Fernanda Lima Kastensmidt SET in SRAM-based FPGAs: configuration memory bits [Figure: the same CLB slice with a transient (X) in the logic.] The SET may be captured by the flip-flop.

85 Prof. Fernanda Lima Kastensmidt Direct connections Hex connections General Routing Matrix (GRM) Direct lines Double lines CLB Long lines Hex lines CLB Fast connect CLB

86 Prof. Fernanda Lima Kastensmidt SEU in SRAM-based FPGAs: Routing [Figure: a bit-flip (0 to 1 or 1 to 0) in a routing configuration bit creates an open or a short, shown for direct connections and for hex connections.]

87 Prof. Fernanda Lima Kastensmidt Other sensitive structures Single-Event Functional Interrupts (SEFI). Power-on Reset (POR): low probability of occurrence; signature: DONE pin transitions low, I/O becomes tri-stated, no user functionality available; solution: reconfigure the device. SelectMAP and JTAG controllers: low probability of occurrence; signature: loss of communication, read access to configuration memory returns a constant value; solution: reconfigure the device. Other structures: Digital Clock Manager (DCM); Input and Output Blocks (IOB); PowerPC hard IP; Multi-Gigabit Transceivers (MGT).

88 Prof. Fernanda Lima Kastensmidt SEE Characterization – Heavy Ion: Static Testing in Virtex4 BRAMs present higher error cross-section compared to CLBs Error cross-section of POR in Virtex4 has improved compared to Virtex-II. [George, et al. IEEE Radiation Effects Data Workshop, 2006]

89 Prof. Fernanda Lima Kastensmidt Scrubbing (full or partial reconfiguration) Scrubbing Hardware Description Language configuration bitstream … 101001110100000111… TMR by hand ISE tool Synthesis optimizations Logic mapping Placement Routing ISE tool Placement Routing Fault Injection (fault tolerance verification) 10101011.. output

90 Prof. Fernanda Lima Kastensmidt Scrubbing: continuous configuration No application interruption. It does not correct upsets in: – Embedded memory (BRAM) – CLB flip-flops. [Figure: SRAM-based FPGA with an oscillator and two XQR18V04 PROMs (BOOT and SCRUB) under a scrub controller; the configuration bits, one of which is upset, are continuously rewritten from the original bitstream.]

91 Prof. Fernanda Lima Kastensmidt Configuration Scrubbing Example: to correct persistent effect faults [Figure: a scrub column sweeping the configuration memory, which contains a configuration upset (x).]

92 Prof. Fernanda Lima Kastensmidt Configuration Scrubbing Example: to correct persistent effect faults [Figure: the scrub column reaches the upset; configuration upset repaired.] The scrubbing rate is important to reduce the probability of multiple upsets. Scrubbing can be performed: – from outside the FPGA, by another FPGA controller; – from inside the FPGA, through the Hardware Internal Configuration Access Port (HWICAP).
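
The scrubbing loop can be sketched at a very abstract level. The configuration memory is modelled here as a plain list of frames and the golden bitstream as a second list; the names and frame values are hypothetical, and real devices access configuration frames through their configuration port rather than through a list.

```python
def scrub(config_frames, golden_frames):
    """Rewrite every frame from the golden copy and report which frames were repaired."""
    repaired = []
    for index, golden in enumerate(golden_frames):
        if config_frames[index] != golden:
            repaired.append(index)
        config_frames[index] = golden  # continuous scrubbing rewrites regardless
    return repaired

golden = [0b10101010100, 0b10101010101, 0b11111111101]
config = list(golden)
config[1] ^= 0b00001000000        # single upset in one configuration frame
print(scrub(config, golden))      # indices of repaired frames
print(config == golden)
```

Between two passes of this loop upsets accumulate, so the time for one full pass bounds how long a persistent fault can stay in the design. That is the reason the scrubbing rate matters for the probability of multiple upsets.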

93 Prof. Fernanda Lima Kastensmidt Scrubbing (full or partial reconfiguration) Mitigation Techniques Hardware Description Language configuration bitstream … 101001110100000111… TMR by hand ISE tool Synthesis optimizations Logic mapping Placement Routing ISE tool Placement Routing Fault Injection (fault tolerance verification) 10101011.. output

94 Prof. Fernanda Lima Kastensmidt X-TMR Full TMR in: combinational logic; sequential logic; input/output pads. [Figure: inside the FPGA, the INPUT package pins feed redundant logic tr0, tr1 and tr2, with TMR flip-flops between logic stages and a TMR output voter driving the OUTPUT package pin.] Why do we need full TMR? To guarantee the correct output in the presence of persistent-effect errors, which are corrected only by loading the correct bitstream.

95 Prof. Fernanda Lima Kastensmidt TMR flip-flop [Figure: X-TMR data path (INPUT package pin, redundant logic tr0, tr1 and tr2, TMR flip-flops, TMR output voter, OUTPUT); each TMR flip-flop has registers R0, R1 and R2 clocked by clk0, clk1 and clk2, followed by MAJ voters feeding tr0, tr1 and tr2.] MAJ LUT: 00010111_00010111 (truth table over R0 R1 R2 = 000, 001, 010, 011, 100, 101, 110, 111). The recovery path is mandatory to correct the state of the flip-flops, especially in FSMs.

96 Prof. Fernanda Lima Kastensmidt TMR output voter [Figure: X-TMR data path (INPUT package pin, redundant logic tr0, tr1 and tr2, TMR flip-flops, TMR output voter, OUTPUT package pin); at the output, R0, R1 and R2 each drive the pad through a 3-state buffer (3-state_0, 3-state_1, 3-state_2) controlled by O_voter logic.] O_voter LUT: 00011000_00011000 (truth table over inputs 000, 001, 010, 011, 100, 101, 110, 111). 3-state control: 0 allows the data to pass to the output pad; 1 blocks the data.
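
The two LUT strings quoted on these slides (00010111 for the MAJ voter and 00011000 for the output voter) can be decoded as truth tables. The sketch assumes that character k of the string is the output for input index k, with the first input as the most significant bit, and that the 16-character strings are the 8-entry table repeated for an unused fourth input. Reading the first input of the output voter as the module's own value is also an assumption.

```python
def lut_from_string(bits):
    """Interpret a truth-table string; entry k is the output for input index k."""
    return [int(ch) for ch in bits]

def lookup(lut, a, b, c):
    """Read the LUT with a as the most significant index bit."""
    return lut[(a << 2) | (b << 1) | c]

MAJ_LUT = lut_from_string("00010111")
TRI_LUT = lut_from_string("00011000")

for index in range(8):
    a, b, c = (index >> 2) & 1, (index >> 1) & 1, index & 1
    print(a, b, c, "MAJ:", lookup(MAJ_LUT, a, b, c), "3-state:", lookup(TRI_LUT, a, b, c))
```

Under these assumptions the first table is the 2-out-of-3 majority, and the second is 1 only when the first input disagrees with both of the others, which is consistent with the slide's rule that 1 blocks the data from reaching the pad.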

97 Prof. Fernanda Lima Kastensmidt Evaluating TMR I/O pads Inputs at 66 MHz [Swift et al, IEEE TNS 2004]

98 Prof. Fernanda Lima Kastensmidt Heavy Ion [Swift et al., IEEE TNS 2004] Evaluating TMR I/O pads

99 Prof. Fernanda Lima Kastensmidt Evaluating Multiple Bit Upsets 220nm CMOS 130nm CMOS Heavy ion radiation static test: [Quinn, et al., IEEE TNS, 2005] Virtex Family Virtex II Family

100 Prof. Fernanda Lima Kastensmidt Domain Crossing Events Bit-flips in the routing can generate short-circuit connections among different blocks of the TMR (tr0, tr1 and tr2). Bit-flip a: affects only the redundant logic tr0; consequently, the majority voter chooses the correct result (two out of three outputs). [Figure: INPUT package pin, redundant logic tr0, tr1 and tr2, TMR register with voters and refresh, TMR output majority voter and OUTPUT package pin inside the FPGA; bit-flip a corrupts one domain (X) and the output is OK.]

101 Prof. Fernanda Lima Kastensmidt Domain Crossing Events Bit-flips in the routing can generate short-circuit connections among different blocks of the TMR (tr0, tr1 and tr2). Bit-flip b: affects two redundant logic parts; consequently, the majority voter will not choose the correct result (two out of three outputs are wrong). [Figure: the same TMR scheme; bit-flip b corrupts two domains (X X) and the voted output is wrong (X).]

102 Prof. Fernanda Lima Kastensmidt Solution to Reduce Domain Crossing Events Voter insertion: a barrier of voters can reduce the probability of a bit-flip in the routing causing a short-circuit connection between two or more redundant blocks. [Figure: the redundant logic tr0, tr1 and tr2 split into logic partitions separated by TMR majority voters, followed by the TMR register with voters and refresh and the TMR output majority voter; bit-flip b corrupts two domains inside one partition (X X) but the following stages are OK.] [Kastensmidt, et al., DATE 2005]

103 Prof. Fernanda Lima Kastensmidt TMR BRAM (Embedded memory) Upsets in BRAMs are not corrected by scrubbing. TMR with refreshing must be used to mitigate upsets. Dual-port BRAMs are needed. Mechanism to refresh the memory contents: – Counter – Voters
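
The refresh mechanism (a counter plus voters) can be sketched in software. Each BRAM copy is modelled as a list of words, and the counter is a simple loop over addresses. In hardware the refresh would presumably run on the second port of the dual-port memories so that the application keeps using the first, but that arrangement is an assumption here.

```python
def majority(a, b, c):
    return (a & b) | (a & c) | (b & c)

def refresh(brams, address):
    """Vote one address across the three memory copies and write the result back."""
    voted = majority(brams[0][address], brams[1][address], brams[2][address])
    for bram in brams:
        bram[address] = voted
    return voted

def refresh_sweep(brams):
    """The counter walks through every address once."""
    for address in range(len(brams[0])):
        refresh(brams, address)

copies = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
copies[1][2] = 7          # upset in one copy
refresh_sweep(copies)
print(copies)
```

As with scrubbing, the sweep period determines how long an upset can sit in one copy before a second upset at the same address in another copy defeats the vote.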

104 Prof. Fernanda Lima Kastensmidt Scrubbing (full or partial reconfiguration) Verifying the Mitigated Design Hardware Description Language configuration bitstream … 101001110100000111… TMR by hand ISE tool Synthesis optimizations Logic mapping Placement Routing ISE tool Placement Routing Fault Injection (fault tolerance verification) 10101011.. output checking

105 Prof. Fernanda Lima Kastensmidt Flash-based: Actel ProASIC3

106 Prof. Fernanda Lima Kastensmidt Flash-based FPGA: CLB tile

107 Prof. Fernanda Lima Kastensmidt Summary Antifuse FPGAs: fault tolerance techniques applied in VHDL/Verilog; protect against SET (SEU is protected by the vendor). SRAM FPGAs: fault tolerance techniques applied in VHDL/Verilog; scrubbing to clean persistent faults; protect against SET and SEU; new FPGAs protected by the vendor are coming out. Flash FPGAs: fault tolerance techniques applied in VHDL/Verilog; protect against SEU and SET; flash transistor sensitivity to SEE is low, still under investigation.

108 Prof. Fernanda Lima Kastensmidt Outline Radiation Effects on Digital ICs Radiation Hardening by Design: Strategies for ASICs Radiation Effects on FPGAs Radiation Hardening by Design: Strategies for FPGAs Final Remarks

109 Prof. Fernanda Lima Kastensmidt Final Remarks Mitigation techniques for ASICs and FPGAs must take into account SEUs and SETs considering single and multiple effects. ASICs: Integrated systems fabricated at nanometer technologies should have mitigation techniques at different levels to ensure robustness: – charge dissipation (transistor resizing, capacitors, resistors) – Sensors (bulk-BICS) – hardware and time redundancy – Error-correction codes (ECCs) – Self-checking and recomputation

110 Prof. Fernanda Lima Kastensmidt Final Remarks FPGAs: new FPGA generations bring more flexibility and design capabilities, but also more reliability challenges for the design. The design can always be protected by high-level techniques (VHDL, Verilog) such as TMR. In order to reduce the cost of TMR, solutions at the FPGA architectural level are needed in: – CLB logic: combinational blocks, sequential blocks, programmable switches – Routing programmable switches … to mitigate SEU and SET!

111 Prof. Fernanda Lima Kastensmidt Conferences NSREC – IEEE Nuclear and Space Radiation Effects Conference www.nsrec.com RADECS European Conference on Radiation Effects on Components and Systems www.radecs.org 2011- RADECS in Sevilla, SPAIN

112 Prof. Fernanda Lima Kastensmidt Schools SERESSA First: 2006 - Manaus - Brazil Second: 2007 - Sevilla - Spain Third: 2008 - Buenos Aires - Argentina Fourth: 2009 - Florida, USA 2010 - France 2011 - Brazil

114 SEE Mitigation Strategies for Digital Circuit Design Applicable to ASIC and FPGAs Fernanda Lima Kastensmidt, Ph.D. fglima@inf.ufrgs.br

