Prof. Fernanda Lima Kastensmidt, Ph.D. Instituto de Informatica

Presentation transcript:

SEE Mitigation Strategies for Digital Circuit Design Applicable to ASIC and FPGAs Prof. Fernanda Lima Kastensmidt, Ph.D. Instituto de Informatica Universidade Federal do Rio Grande do Sul Porto Alegre – RS – Brazil

Motivation A large set of electronic devices used in avionic, space and ground-level applications can be upset by ionizing particles. General system: FPGA, memory, processors, ASIC, analog electronics. Hardened components (high reliability, high cost) versus COTS components (low reliability, low cost).

Motivation Solution I (hardened components: high reliability, high cost): If a hardened component is too expensive, the solution may be to design your own hardened device. Which fault tolerance techniques should be used? How much fault tolerance is enough? The hardened design must be qualified.

Motivation Solution II (COTS components: low reliability, low cost): The device must be qualified to analyze its robustness for the application. Is it possible to apply some fault tolerance technique, at the software level or at the component replication level?

Types of SEE Single event phenomena can be classified into three effects (in order of permanency): single event upset and single event transient (soft errors); single event latchup (soft or hard error); single event burnout (hard failure). Hard errors, or Single Event Latchup (SEL), are due to shorts between ground and power and cause permanent functional damage.

Collected Charge Depending on the circuit, the transistor size and the charge energy, current pulses with different amplitudes, durations and shapes will appear.

Charge Collection Mechanism $I_C(t) = I_{CRITICAL}(t) = I_P(t) - I_{ON}(t)$. A soft error occurs when $Q_{collected} > Q_{critical}$.
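
The soft-error condition above can be illustrated with a small numeric sketch. The slides do not give a pulse shape, so the double-exponential current model, the parameter names (`i0`, `tau_fall`, `tau_rise`) and the numeric values below are assumptions chosen only for illustration.

```python
def collected_charge(i0, tau_fall, tau_rise):
    """Charge (C) of the assumed pulse I(t) = i0 * (exp(-t/tau_fall) - exp(-t/tau_rise)).

    Integrating from 0 to infinity gives i0 * (tau_fall - tau_rise).
    """
    return i0 * (tau_fall - tau_rise)


def causes_soft_error(q_collected, q_critical):
    """Soft-error condition from the slide: Qcollected > Qcritical."""
    return q_collected > q_critical


# Hypothetical values: 1 mA scale factor, 200 ps fall and 50 ps rise time constants.
q = collected_charge(i0=1e-3, tau_fall=200e-12, tau_rise=50e-12)
print(f"collected charge: {q * 1e15:.0f} fC")
print("upset with Qcrit = 100 fC:", causes_soft_error(q, 100e-15))
print("upset with Qcrit = 200 fC:", causes_soft_error(q, 200e-15))
```

With these made-up numbers the pulse deposits about 150 fC, so a node with a 100 fC critical charge would flip while a node with 200 fC would not.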

Fault Tolerance Fault masking: any technique that prevents faults from introducing errors at the output (failure). [Figure: ionization leading to a FAILURE]

Fault Tolerance [Figure: ionization produces a FAULT (transient current injected into or extracted from the junction); the FAULT EFFECT is a transient voltage pulse at the capacitor node; a BIT-FLIP captured at the clk edge is an ERROR, which can lead to a FAILURE. Other labels: fault latency, error latency, shielding.] Fault masking (hardening by design): hardware and time redundancy; hardened memory cells; error-correction codes; self-checking mechanisms with recovery; sensors (detection).

Fault Tolerance [Figure: ionization, FAULT (transient current injected into or extracted from the junction), FAULT EFFECT (transient voltage pulse at the capacitor node), BIT-FLIP at the clk edge (ERROR), FAILURE, with fault latency and error latency; redundant spare components are added at the ERROR and FAILURE stages.] Fault masking (hardening by design): hardware and time redundancy; hardened memory cells; error-correction codes; self-checking mechanisms with recovery; sensors (detection). Redundant spare components: for when the number of faults overcomes the mitigation technique.

Outline Radiation Effects on Digital ICs Radiation Hardening by Design: Strategies for ASICs Radiation Effects on FPGAs Radiation Hardening by Design: Strategies for FPGAs Final Remarks


Single Event Effects (SEEs) Transient effects. Single Event Upset (SEU): bit-flip in a sequential logic element. Digital Single Event Transient (DSET): transient voltage pulse in the combinational logic. [Figure: combinational logic between two sequential logic stages]

SEU in Sequential Logic [Figure: memory cell with word lines (WL) and N and P transistors; ionization at an OFF transistor causes a BIT-FLIP]

Hardened Memories Approach 1: use decoupling resistors to slow the cell's regenerative feedback response, avoiding the bit-flip [Rockett, R., IEEE TNS, 1992]

Hardened Memory Approach 2: add transistors to create an appropriate feedback path that restores corrupted data. IBM Memory Cell [Rockett cell, 88] HIT Memory Cell [Velazco, 92]

Hardened Memories The principle is to store the data in two different locations within the cell in such a way that the corrupted part can be restored. Whitaker/Liu Memory Cell [Liu, 92] DICE Memory Cell [Calin, 96]

Dual Interlocked storage Cell (DICE) [Figure sequence: cell with nodes Qa and Qb, clocked by clk; after an upset at one node, the original value is restored.]

Challenges in Sequential Logic Multiple bit upsets: a particle can affect a single memory cell or multiple memory cells. Factors: particle incidence angle, transistor dimensions, supply voltage, memory array density.

Charge Sharing (NMOS transistor) T=50ps T=100ps T=250ps T=800ps T=2ns [Reed, et al., New Electronic Technologies Insertion into Flight Programs Workshop, 2007]

Limitations of Hardened Memory Multiple nodes collecting charge are able to upset hardened memory cells. Solutions: Shallow Trench Isolation (STI) structures; suitable transistor placement and routing; hardened memory cells combined with hardware redundancy.

Triple Modular Redundancy [Figure: the combinational logic feeds three copies of the sequential logic (clk), voted by MAJ; with one copy upset (X) the output is still OK.] MAJ truth table (inputs → output): 000→0, 001→0, 010→0, 011→1, 100→0, 101→1, 110→1, 111→1. Each master-slave flip-flop can be composed of: standard latches, robust to charge collected at multiple nodes in the same latch; or hardened latches, robust to charge collected at multiple nodes in latches of different domains as well.
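
The majority (MAJ) truth table on this slide can be reproduced with a few lines of code. This is a minimal sketch of a bitwise 2-out-of-3 vote; the function name `maj` is only illustrative.

```python
def maj(a, b, c):
    """2-out-of-3 majority vote on single bits (also works bitwise on integers)."""
    return (a & b) | (a & c) | (b & c)


# Rebuild the truth table in the same input order as the slide (000 ... 111).
truth_table = {
    f"{a}{b}{c}": maj(a, b, c)
    for a in (0, 1) for b in (0, 1) for c in (0, 1)
}

for inputs, output in truth_table.items():
    print(inputs, output)

# A single upset copy is masked: two correct 1s outvote one flipped bit.
print("maj(1, 1, 0) =", maj(1, 1, 0))
```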

Triple Modular Redundancy [Figure: TMR sequential logic (clk) with a MAJ voter feeding the combinational logic; MAJ truth table: 000→0, 001→0, 010→0, 011→1, 100→0, 101→1, 110→1, 111→1.] The voter's output can show a transient wrong value (X) that may be captured by the next memory cell.

Triple Modular Redundancy Triple MAJ voter [Figure: three MAJ voters between the sequential logic (clk) and the combinational logic, all outputs OK.] Current strength: the increased current drive helps keep the node at its original value.

Triple Modular Redundancy [Figure: TMR sequential logic (clk) with triple MAJ voter and combinational logic, with multiple upsets (X); MAJ truth table: 000→0, 001→0, 010→0, 011→1, 100→0, 101→1, 110→1, 111→1.] Catastrophic effect: the system votes three wrong values out of three and the result is assumed to be correct.

SET in Combinational Logic Each node has an associated capacitance, resistance, SET pulse (amplitude × width) and critical charge (QCRIT). [Figure: current versus time, with drift charge (Qdrift) and diffusion charge (Qdiffusion) components of the collected charge]

SET in Combinational Logic Not all SETs are captured by a memory cell. They can be: logically masked (the pulse is blocked by the logic values on the other gate inputs); electrically masked (the pulse is attenuated along the path to a negligible pulse); latch-window masked (the pulse does not coincide with the clk edge at the memory cell). [Figure sequence: path e0, e1, e2, a3 to the output Q, one slide per masking type]
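
Latch-window masking can be given a rough number. The slides do not state a formula, so the first-order model below is an assumption: a pulse of width `tw` that arrives at a random time is captured if it overlaps a sensitive window `t_window` (setup plus hold, a hypothetical parameter) around the clock edge.

```python
def capture_probability(tw, t_window, t_clk):
    """First-order estimate of the chance that an SET is latched.

    Assumed model (not from the slides): P = (tw + t_window) / t_clk, capped at 1.
    All arguments are in seconds.
    """
    return min(1.0, (tw + t_window) / t_clk)


# Hypothetical 200 ps pulse and 50 ps latching window at two clock rates.
for freq_hz in (100e6, 1e9):
    p = capture_probability(tw=200e-12, t_window=50e-12, t_clk=1.0 / freq_hz)
    print(f"{freq_hz / 1e6:.0f} MHz: P(capture) = {p:.3f}")
```

Under this model the capture probability grows linearly with clock frequency until the pulse is wider than the clock period, at which point it is always latched.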

Electrical Masking Heavy Ion Radiation Results: 180nm CMOS Pulse too narrow!!! [Bruguier, G., et al., IEEE TNS, 1996]

SET vs. Frequency Radiation Results: DSET for 180nm vs. Freq Freq. clk [Benedetto et al, IEEE TNS, 2004]

Challenges in Combinational Logic The SET Transient Width (TW) may vary from a few hundred picoseconds to a few nanoseconds, depending on the LET. [Figure: critical transient width (ps) versus process technology (nm), with clk frequency markers 500 MHz, 1 GHz, 2.5 GHz, 5 GHz and 100 GHz, comparing TW with the clk period] [Dodd, P., IEEE TNS 2004]

SET vs. SEU Error Rate

Challenges in Combinational Logic Depending on the fan-out of the logic topology, a single SET may give rise to multiple SETs. [Figure: circuit with inputs a0 to a5 and outputs y0/Q0 and y1/Q1; one SET reaches both outputs (X)]

Identifying the most sensitive nodes Fault injection performed by electrical (SPICE) and logic simulations can identify the most sensitive nodes: those with lower critical charge (QCRIT) and lower SET logical masking probability. [Figure: circuit with inputs A to F and output Z, most sensitive nodes highlighted]

Transistor Resizing Acts on QCRITICAL at the most sensitive nodes. [Figure: circuit with inputs A to F and output Z, most sensitive nodes highlighted] [Zhou et al., IRPS 2004] [Cazeaux et al., IOLTS 2005] [Dhillon et al., IEEE Transaction on ISVLSI 2006]

Gate Replication Current strength: replicating gates at the most sensitive nodes increases the current drive, helping to keep the node at its original value. [Figure: circuit with inputs A to F and output Z, most sensitive nodes highlighted] [Lisboa, C., et al., SBCCI 2005] [Nieuwland et al., IOLTS 2006]

Temporal Filtering Votes the SET out by time redundancy. The time redundancy is implemented by delays on the clock lines or at the latch/flip-flop inputs. [Figure: combinational logic output sampled by sequential logic at clk, clk+T and clk+2.T and voted by a triple or single MAJ voter; one sample is wrong (X), the vote is OK] [Nicolaidis, VTS 1999], [Anghel et al., DATE 2000]

Full time redundancy [Figure: the combinational logic (comb) output with an SET of width TW is sampled by flip-flops ffp0, ffp1 and ffp2 at clk, clk+T and clk+2.T and voted by a triple or single MAJ voter; one sample is wrong (X), the vote is OK. Timing annotations: T, MAJ + comb delays.] The delay T is directly proportional to the SET Transient Width (TW). [Nicolaidis, VTS 1999] [Anghel et al., DATE 2000]
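
The time-redundancy scheme can be sketched as a small simulation: the same data line is sampled at the clock edge, one delay later and two delays later, and the three samples are voted. The signal model, the function names and the picosecond values are assumptions made for illustration only.

```python
def maj(a, b, c):
    """2-out-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)


def line_value(t, correct, set_start, tw):
    """Value on the data line at time t (ps): inverted while the SET pulse is present."""
    return correct ^ 1 if set_start <= t < set_start + tw else correct


def temporal_vote(t_edge, delay, correct, set_start, tw):
    """Sample at clk, clk+delay and clk+2*delay, then vote the three samples."""
    samples = [line_value(t_edge + k * delay, correct, set_start, tw) for k in range(3)]
    return maj(*samples)


# Pulse narrower than the delay: only one sample is corrupted, the vote is correct.
print(temporal_vote(t_edge=1000, delay=300, correct=1, set_start=900, tw=200))
# Pulse wider than the delay: two samples are corrupted, the vote is wrong.
print(temporal_vote(t_edge=1000, delay=300, correct=1, set_start=900, tw=500))
```

The second call shows why the delay has to scale with the transient width: once the pulse spans two sampling instants, the vote no longer filters it.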

Full time redundancy [Figure: variant relating TW to the clk period (T), with the combinational logic (comb) output sampled by ffp0, ffp1 and ffp2 at clk, clk+2.T and clk+4.T and voted by a triple or single MAJ voter; one sample is wrong (X), the vote is OK. Timing annotations: T, MAJ + comb delays.]

Temporal Latching to Trigger SETs The error cross-section decreases as the delay T increases. [Benedetto et al., IEEE TNS 2004]

Triple Sample Memory Robust to Multiple Bit Upsets and SET [MAVIS, IRPS 2002] [Figure sequence: combinational logic output sampled with shifted clocks; a transient in one sample (X) is voted out (OK), and charge collected at multiple nodes still gives a correct output (OK).]

Full Triple Modular Redundancy (TMR) with self-recovery [Figure: three domains clocked by clk0, clk1 and clk2, each with its own combinational logic, register (D0/TR0, D1/TR1, D2/TR2; labels E0, E1, E2) and voter (outputs TRV0, TRV1, TRV2) fed by TR0, TR1 and TR2; an upset in one domain (X) is voted out (OK).]

Full Triple Modular Redundancy (TMR) with self-recovery [Figure: voters (TR0, TR1, TR2 to TRV0, TRV1, TRV2), registers (D0, D1, D2; E0, E1, E2; clk0, clk1, clk2), combinational logic, output pads and output pad wired voter]

How much mitigation is enough? Circuits are becoming more and more complex. Hardware and time redundancy techniques can provide a certain level of protection against: Single Event Upsets (SEU); Single Event Transients (SET); multiple bit or node upsets. Problem: in some cases multiple faults can overcome the mitigation techniques, causing a system failure.

Multiple Faults in the Full TMR [Figure: full TMR with domains clk0, clk1 and clk2 (registers D0 to D2, voters TRV0 to TRV2); upsets in two domains (X, X) make the voters output a WRONG VALUE]

How much mitigation is enough? How is it possible to know that the mitigation technique is working properly for a certain Soft Error Rate (SER)? A mechanism is needed to inform the system when the number of multiple faults has passed a certain level. Built-in Self Test (BIST) mechanism: sensors working as watchdogs; each time an ionization occurs, the system is informed.

How about sensors working as watchdogs? Full TMR with sensors [Figure: full TMR (voters, TR0 to TR2, TRV0 to TRV2, D0 to D2, clk0 to clk2, combinational logic) with sensors attached]. If the sensors detect one upset at a time: the technique is working. If the sensors detect two or more upsets in distinct redundant modules at the same time: the technique is not working.
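
The decision rule on these slides can be written down directly. This is a minimal sketch that assumes one sensor flag per redundant module, collected over a single observation window; the function name and return strings are illustrative.

```python
def mitigation_status(sensor_flags):
    """Classify one observation window from per-module sensor flags.

    sensor_flags: three booleans, one per redundant module (tr0, tr1, tr2),
    True if that module's sensor saw an ionization event in the window.
    """
    modules_hit = sum(bool(flag) for flag in sensor_flags)
    if modules_hit == 0:
        return "no upset"
    if modules_hit == 1:
        return "technique is working"       # a single upset is voted out by TMR
    return "technique is not working"       # two or more modules hit together


print(mitigation_status([False, True, False]))
print(mitigation_status([True, True, False]))
```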

Bulk Built-in Current Sensors During normal operation, the current in the bulk is approximately zero. When an energetic particle generates an ionization, it creates a current that flows through the struck node and Vdd or gnd. The bulk-BICS senses the current generated by ionization at the bulk terminal. [Henes Neto et al. IEEE MICRO, 2006]

Bulk Built-in Current Sensors [Figure: circuit design of BICS-P (transistors p1 to p6, Vdd, Vdd', RST) and BICS-N (transistors n1 to n6, Gnd', nRST); the ionization current flips the BICS latch]

Trade-offs There is always some penalty to be paid when protecting circuits against upsets. Each technique may present a combination of: area overhead, performance penalty, power dissipation increase. The challenge is to select the most cost-effective techniques for the target circuit application.

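CASE-STUDY: Adder SET and SEU detection. Recomputing with shifted operands: S = A + B is compared (=) with 2.S = 2.A + 2.B, computed on the same ADDER with the operands shifted (<<) and the result shifted back (>>). Duplication with Comparison (DWC): two ADDERs with compared outputs (=). Bulk-BICS.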
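
Recomputing with shifted operands can be sketched as follows. The fault model is an assumption made for the example: the adder flips the same output bit position on every pass (`faulty_bit`), so a plain recomputation would repeat the error, while the shifted pass places the error at a different bit of the result.

```python
def faulty_add(a, b, faulty_bit=None):
    """Adder model: correct sum, optionally with one output bit position inverted."""
    s = a + b
    if faulty_bit is not None:
        s ^= 1 << faulty_bit
    return s


def reso_check(a, b, faulty_bit=None):
    """Return (first result, results agree?) using S = A + B and 2.S = 2.A + 2.B."""
    first = faulty_add(a, b, faulty_bit)
    second = faulty_add(a << 1, b << 1, faulty_bit) >> 1   # shift operands, shift result back
    return first, first == second


print(reso_check(5, 6))                 # fault-free: both passes agree
print(reso_check(5, 6, faulty_bit=1))   # faulty bit 1: the two passes disagree
```

The comparison only detects the error; it does not say which of the two results is correct, which is why the slides list this technique under detection.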

CASE-STUDY: Adder SEU correction. Error-Correction Code (Hamming): encoder (enc) and decoder (dec) with the ADDER. Hardened flip-flops.
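
The slide names a Hamming code without giving its parameters, so the sketch below assumes the textbook Hamming(7,4) code as an example of single-bit-upset correction in stored data. The bit layout and function names are illustrative.

```python
def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4      # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4      # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4      # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3    # 1-based index of the flipped bit, 0 if none
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = [1, 0, 1, 1]
stored = hamming74_encode(word)
stored[5] ^= 1                          # emulate an SEU in one stored bit
print(hamming74_decode(stored) == word)
```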

CASE-STUDY: Adder SEU and SET correction. TMR with single voter: three ADDERs and one voter. TMR with triple voter: three ADDERs and three voters.

CASE-STUDY: Adder SEU and SET correction. Time redundancy with TMR in the registers: one ADDER, registers sampled with delays T and 2.T, and voters.

AREA vs. PERFORMANCE [Figure: area and performance overheads of the adder case study for SEU and SET detection, SEU correction, and SEU and SET correction; annotations: "More than 200%", "Less than 50%", "Less than 50%"]

How about Qualifying for SEE? Testing by fault injection: model the SEU and SET effects at SPICE level, logic level or RTL level. Testing in a laser facility. Testing at ground-level facilities (in front of a beam of protons, heavy ions or neutrons). Testing in space (actual environment). [Arrows alongside the list: accuracy, cost]

When testing in a ground-level facility for SEE: Static testing: no application is running during the test. The register files are read during or after the test to check for SEU and/or SET and compared to a gold file. Used to test memories, microprocessors and ASICs in general. Dynamic testing: applications are running during the test. Outputs are analyzed and compared to a gold design. SEU and SET can be checked during the test. Used to test memories, microprocessors, ASICs in general, analog circuits, etc.

General System FPGA memory processors ASIC Analog logic

Outline Radiation Effects on Digital ICs Radiation Hardening by Design: Strategies for ASICs Radiation Effects on FPGAs Radiation Hardening by Design: Strategies for FPGAs Final Remarks

Field-Programmable Gate Arrays An array of logic blocks and interconnections customizable by programmable switches. High logic density Customizable by the end user to realize different designs Configurable logic blocks (CLBs) interconnections Switches for customization

Programmable Technologies Programmable switches can be based on: Antifuse (antifuse-based FPGAs): an electrically programmable switch forms a low-resistance path between two metal layers. One-time configurable. SRAM (SRAM-based FPGAs): the state of a static latch controls pass transistors or multiplexers connected to pre-defined metal layers. Reconfigurable. Flash (Flash-based FPGAs): a floating gate controls the switches.

Antifuse-based FPGAs Non-volatile: they hold the customized content even when not connected to the power supply. They can be programmed just once. FPGA products for space: ACTEL, AEROFLEX (based on Quicklogic)

ACTEL: RTAX-S device [Figure: device architecture with Super Clusters (C, R, RX, TX and B modules), RAM blocks and the labels CT, SC, RAMC, RD, HD] [Actel, RTAX-S RadTolerant FPGAs 2007]

ACTEL: RTAX-S device C-CELL: susceptible to SET. R-CELL: robust to SEU. [Figure: C-CELL with inputs D0 to D3, DB, A0, A1, B0, B1, FCI and CFN and outputs Y and FCO; a transient (X) in the C-CELL appears as an ERROR] [Actel, RTAX-S RadTolerant FPGAs 2007]

Effects of Frequency Response Circuit: shift register with 8 levels of C-cells between R-cells. The error cross-section increases when the frequency increases. [Figure labels: # ERROR, clk edge] [Berg, M. et al., IEEE TNS 2006]

RadHard Eclipse FPGA from Aeroflex Hardened flip-flops: robust to SEU. ViaLink connections. [Figure: a transient (X) in the logic appears as an ERROR]

Antifuse FPGAs: summary Customized routing is not sensitive to SEU. Flip-flops are not sensitive to SEU: Actel and Aeroflex provide a solution where all flip-flops are hardened. Logic is susceptible to DSETs. The user may protect the logic by using high-level mitigation techniques in the VHDL/Verilog description of the design (TMR, duplication and others).

SRAM-based FPGAs Volatile: they lose their contents when the memories are not connected to the power supply. They can be reprogrammed as many times as necessary at the work site. They are programmed by loading a bitstream. FPGA products for space: XILINX, ATMEL, HONEYWELL

SRAM-based FPGAs A basic board must be composed of: the FPGA; a memory outside the FPGA (EEPROM) that stores the original design bitstream, with its loader; power supply (core and I/O); oscillator; programming interface; I/O interface. Memory size needed: the bitstream may range from Kbytes to several Mbytes.

Reconfigurability Can offer benefits for space and remote applications by: saving space in the system: the same circuitry can be used with different configurations at different stages of a mission, reducing weight and power requirements. allowing in-orbit design changes reducing the mission cost by correcting errors If part of an FPGA fails, then circuitry can be reprogrammed to make use of remaining functional portions of the chips.

FPGA Design Flow Hardware Description Language → synthesis optimizations → logic mapping → placement → routing → configuration bitstream (… 101001110100000111…)

Technology Scaling in Xilinx FPGAs Nanometer technologies Embedded Hard microprocessor Embedded memories (BRAM)

SRAM-based FPGA Architecture [Figure: Xilinx FPGA with configurable logic blocks (CLBs) made of slices, the GRM and BRAM; a lookup table (LUT) with inputs A, B, C, D implements the Boolean function F(A,B,C,D)] The configuration memory is spread throughout the device in a large array. Each logic block contains its own configuration cells locally. A data frame is a one-bit vertical slice through the array. The configuration logic identifies each frame with a unique “minor” address as well as a unique address for the column that it lies in. Configuration data is loaded serially or in byte-parallel into the interface, where it is assembled into a frame in a shift register. The entire frame is loaded into memory all at once.

SEU in SRAM-based FPGAs: CLB slice An FPGA implements logic with two basic elements, both located in the CLB slices: the lookup table (LUT), which implements logic functions as truth tables, and the flip-flops. These elements are connected through programmable routing made of multiplexers or pass transistors controlled by static memory cells. When an SEU flips one of these memory cells, the effect depends on its location: in the configuration memory bits (LUTs and routing) the fault has a persistent effect, corrected by scrubbing; in a CLB flip-flop the fault has a transient effect, corrected at the next flip-flop load.
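
The persistent effect of an upset in a LUT can be shown with a toy model in which a 4-input LUT is simply a list of 16 configuration bits indexed by its inputs. The bit ordering and the AND function used here are assumptions for illustration and do not reflect any vendor's actual bitstream layout.

```python
def lut_eval(config_bits, a, b, c, d):
    """Evaluate a 4-input LUT: the inputs select one of the 16 configuration bits."""
    return config_bits[(a << 3) | (b << 2) | (c << 1) | d]


# Configure the LUT as a 4-input AND: only entry 1111 holds a 1.
and4 = [0] * 15 + [1]

# Emulate an SEU in the configuration memory: flip the entry for inputs 0111.
upset = and4.copy()
upset[0b0111] ^= 1

print(lut_eval(and4, 0, 1, 1, 1))    # original function
print(lut_eval(upset, 0, 1, 1, 1))   # function changed until the bit is rewritten
```

Unlike a flipped flip-flop, the modified entry is not overwritten by normal operation: the LUT keeps computing the wrong function for that input combination until the configuration bit is reloaded.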

SET in SRAM-based FPGAs: CLB slice The transistors that implement the CLB lookup table (LUT) logic and routing are also susceptible to transient faults known as SETs. This type of fault is transient and generates an error in the design only if the SET pulse is captured by the CLB flip-flop. [Figure: CLB slice with LUT, routing and configuration memory bits; an SET (X) may be captured by the flip-flop]

General Routing Matrix (GRM) [Figure: array of CLBs connected by direct lines and direct connections, double lines, hex lines and hex connections, long lines and fast connect]

SEU in SRAM-based FPGAs: Routing [Figure: in direct connections and hex connections, a bit-flip in a routing configuration bit can produce an open or a short]

Other sensitive structures Power-on Reset (POR): low probability of occurrence. Signature: DONE pin transitions low, I/O becomes tri-stated, no user functionality available. Solution: reconfigure the device. Single-Event Functional Interrupts (SEFI) in the SelectMAP and JTAG controllers: low probability of occurrence. Signature: loss of communication, read access to the configuration memory returns a constant value. Solution: reconfigure the device. Others: Digital Clock Manager (DCM), PowerPC hard IP, Input and Output Blocks (IOB), Multi-Gigabit Transceivers (MGT).

SEE Characterization – Heavy Ion: Static Testing in Virtex4 It is common to evaluate the SEU sensitivity of the configuration memory bits, BRAM bits, CLB flip-flops and the Power-on Reset (POR) by static testing. But what can be done to measure the SET cross-section of the FPGA logic circuitry? BRAMs present a higher error cross-section than CLBs. The error cross-section of POR in Virtex4 has improved compared to Virtex-II. [George, et al. IEEE Radiation Effects Data Workshop, 2006]

Scrubbing [Design flow: Hardware Description Language (TMR by hand) → ISE tool (synthesis optimizations, logic mapping, placement, routing) → configuration bitstream (… 101001110100000111…) → scrubbing (full or partial reconfiguration) → fault injection (fault tolerance verification) → output]

Scrubbing: continuous configuration The original bitstream, stored in a PROM (XQR18V04), is continuously written into the configuration bits of the SRAM-based FPGA under a SCRUB controller. No application interruption. It does not correct upsets in: embedded memory (BRAM); CLB flip-flops. [Figure: PROM signals DATA[7:0], OE/RESET, CE, CLK; FPGA signals DATA[7:0], INIT, DONE, CS, WR, CCLK; oscillator (OSC)]

Configuration Scrubbing Example: correcting persistent-effect faults [Figure: a configuration upset in column x is repaired when the scrub reaches that column] The scrubbing rate is important to reduce the probability of multiple upsets accumulating. Scrubbing can be performed: from outside the FPGA, by another FPGA acting as controller; from inside the FPGA, through the Hardware Internal Configuration Access Port (HWICAP).
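
A scrub pass over the configuration memory can be sketched as a loop over frames. This is a toy model, not a real configuration interface: frames are plain integers, and the readback-and-compare step is an assumption (a blind scrubber would simply rewrite every frame from the golden bitstream without comparing).

```python
def scrub(config_frames, golden_frames):
    """Rewrite every frame that differs from the golden bitstream.

    Returns the number of frames repaired in this pass.
    """
    repaired = 0
    for index, golden in enumerate(golden_frames):
        if config_frames[index] != golden:   # readback shows a configuration upset
            config_frames[index] = golden    # restore the frame from the golden copy
            repaired += 1
    return repaired


golden = [0b1010, 0b1111, 0b0001]
config = list(golden)
config[1] ^= 0b0100                          # emulate an SEU in frame 1

print("frames repaired:", scrub(config, golden))
print("configuration restored:", config == golden)
```

The shorter the interval between two passes, the lower the chance that a second upset accumulates before the first one is repaired.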

Mitigation Techniques [Design flow: Hardware Description Language (TMR by hand) → ISE tool (synthesis optimizations, logic mapping, placement, routing) → configuration bitstream (… 101001110100000111…) → scrubbing (full or partial reconfiguration) → fault injection (fault tolerance verification) → output]

X-TMR Full TMR in: combinational logic; sequential logic; input/output pads. Why do we need full TMR? To guarantee the correct output in the presence of persistent-effect errors, which are corrected only by loading the correct bitstream. [Figure: FPGA with triplicated INPUT package pins, redundant logic domains tr0, tr1 and tr2, TMR flip-flops, and a TMR output voter driving the OUTPUT package pins]

TMR flip-flop [Figure: X-TMR FPGA with redundant logic domains tr0, tr1 and tr2, TMR flip-flops and TMR output voter] The recovery path is mandatory to correct the state of the flip-flops, especially in FSMs. Each domain (tr0/clk0, tr1/clk1, tr2/clk2) has its own MAJ voter. MAJ truth table (R0 R1 R2 → MAJ): 000→0, 001→0, 010→0, 011→1, 100→0, 101→1, 110→1, 111→1. LUT: 00010111_00010111
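
The LUT string on this slide can be cross-checked by enumerating the majority function. The sketch below lists the output for R0 R1 R2 = 000 to 111; repeating the 8-bit pattern to fill a 16-entry LUT assumes that the fourth LUT input is unused, which the slide does not state explicitly.

```python
def maj(r0, r1, r2):
    """2-out-of-3 majority vote."""
    return (r0 & r1) | (r0 & r2) | (r1 & r2)


def lut_string(fn):
    """Truth table of a 3-input function for inputs 000..111, repeated for a 16-entry LUT."""
    bits = "".join(
        str(fn(r0, r1, r2))
        for r0 in (0, 1) for r1 in (0, 1) for r2 in (0, 1)
    )
    return bits + "_" + bits   # assumption: the 4th LUT input is a don't-care


print(lut_string(maj))
```

The printed pattern can be compared character by character with the LUT contents given on the slide.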

TMR Output Voter [Figure: X-TMR FPGA with redundant logic domains tr0, tr1 and tr2, TMR flip-flops and TMR output voter driving the OUTPUT package pins] Each output (R0, R1, R2) reaches its pad through a 3-state buffer (3-state_0, 3-state_1, 3-state_2) controlled by an output voter (O_voter) that compares the reference (REF) with the other two domains. Control value 0: it allows the data to pass to the output pad. 1: it blocks the data. Truth table inputs: REF, R0 R1 R2 = 000 to 111. LUT: 00011000_00011000
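
The output-voter LUT string on this slide is consistent with a function that blocks a pad only when its own domain disagrees with both other domains. That reading is an inference from the LUT contents (with R0 taken as the reference and inputs ordered R0 R1 R2), not something the slide states, so the sketch below should be taken as one plausible interpretation.

```python
def block_output(ref, other_a, other_b):
    """3-state control: 1 blocks the pad, 0 lets the data pass.

    Assumed rule: block only when the reference disagrees with both other domains.
    """
    return int(ref != other_a and ref != other_b)


def lut_string(fn):
    """Truth table for inputs 000..111 (R0 R1 R2), repeated for a 16-entry LUT."""
    bits = "".join(
        str(fn(r0, r1, r2))
        for r0 in (0, 1) for r1 in (0, 1) for r2 in (0, 1)
    )
    return bits + "_" + bits   # assumption: the 4th LUT input is a don't-care


print(lut_string(block_output))
```

Under this rule a single faulty domain is disconnected from the pad while the two agreeing domains keep driving it.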

Evaluating TMR I/O pads [Swift et al, IEEE TNS 2004] Inputs at 66 MHz

Evaluating TMR I/O pads [Swift et al., IEEE TNS 2004] Heavy Ion

Evaluating Multiple Bit Upsets: heavy ion radiation static test on the Virtex family (220 nm CMOS) and the Virtex-II family (130 nm CMOS) [Quinn et al., IEEE TNS, 2005]

Domain Crossing Events: Bit-flips in the routing can create short-circuit connections among different blocks of the TMR (tr0, tr1 and tr2). Bit-flip a affects only the redundant logic tr0; consequently, the majority voter chooses the correct result (two out of three outputs). [Figure: FPGA with input package pins, redundant logic tr0, tr1 and tr2 with voters and refresh, TMR registers, majority voters and TMR output; bit-flip a is confined to tr0.]

Domain Crossing Events: Bit-flips in the routing can create short-circuit connections among different blocks of the TMR (tr0, tr1 and tr2). Bit-flip b affects two redundant logic parts; consequently, the majority voter cannot be relied on to choose the correct result, because two of the three outputs may be wrong. [Figure: same FPGA structure; bit-flip b connects tr0 and tr1.]

Solution to Reduce Domain Crossing Events: Voter insertion: a barrier of voters can reduce the probability that a bit-flip in the routing causes a short-circuit connection between two or more redundant blocks [Kastensmidt et al., DATE 2005]. [Figure: the redundant logic tr0, tr1 and tr2 is divided into logic partitions separated by TMR majority voters, ahead of the TMR registers and the TMR output.]
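
A toy counting model gives a feel for why voter barriers help. It assumes that a routing bit-flip shorts two nets chosen uniformly at random, and that the short defeats TMR only when the two nets belong to different domains inside the same logic partition (a short across partitions leaves one faulty domain per partition, which each barrier votes out). The model ignores the routing added by the voters themselves; all names and sizes are hypothetical.

```python
from itertools import combinations


def harmful_short_fraction(nets_per_domain: int, partitions: int) -> float:
    """Fraction of random two-net shorts that hit two domains in one partition.

    Toy model only: every pair of nets is equally likely to be shorted,
    and voter logic and its routing are not counted.
    """
    # Label each net with (domain, partition); partitions are equally sized.
    nets = [
        (domain, i * partitions // nets_per_domain)
        for domain in range(3)
        for i in range(nets_per_domain)
    ]
    pairs = list(combinations(nets, 2))
    harmful = sum(
        1 for (d1, p1), (d2, p2) in pairs if d1 != d2 and p1 == p2
    )
    return harmful / len(pairs)


if __name__ == "__main__":
    for parts in (1, 3, 12):
        frac = harmful_short_fraction(12, parts)
        print(f"{parts:2d} partition(s): harmful fraction = {frac:.3f}")
```

In this simplified model the harmful fraction falls in proportion to the number of partitions. In a real design each extra voter adds routing of its own, so the number of barriers is a trade-off rather than "more is always better".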

TMR BRAM (embedded memory): Upsets in BRAMs are not corrected by scrubbing. TMR with refreshing must be used to mitigate upsets. Dual-port BRAMs are needed. Mechanism to refresh the memory contents: a counter and voters.
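
The refresh mechanism can be sketched as a small software model: three memory copies, a counter that walks through the addresses, and a bitwise voter whose result is written back to all copies. The class and method names are hypothetical; in hardware the refresh would presumably use the second BRAM port while the user logic uses the first.

```python
def majority_word(a: int, b: int, c: int) -> int:
    """Bitwise two-out-of-three vote over three data words."""
    return (a & b) | (a & c) | (b & c)


class TmrMemory:
    """Toy model of a triplicated memory with counter-driven refresh."""

    def __init__(self, depth: int) -> None:
        self.copies = [[0] * depth for _ in range(3)]
        self.refresh_addr = 0  # refresh counter

    def write(self, addr: int, value: int) -> None:
        for copy in self.copies:
            copy[addr] = value

    def read(self, addr: int) -> int:
        return majority_word(*(copy[addr] for copy in self.copies))

    def inject_upset(self, copy_index: int, addr: int, bit: int) -> None:
        self.copies[copy_index][addr] ^= 1 << bit  # flip one stored bit

    def refresh_step(self) -> None:
        """Vote the word at the counter address, write it back, advance."""
        addr = self.refresh_addr
        voted = self.read(addr)
        for copy in self.copies:
            copy[addr] = voted
        self.refresh_addr = (addr + 1) % len(self.copies[0])


if __name__ == "__main__":
    mem = TmrMemory(depth=4)
    mem.write(2, 0b1010)
    mem.inject_upset(copy_index=0, addr=2, bit=0)
    print(mem.read(2) == 0b1010)    # the voter masks the upset on read
    for _ in range(4):              # one full pass of the refresh counter
        mem.refresh_step()
    print(mem.copies[0][2] == 0b1010)  # the upset copy has been repaired
```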

Verifying the Mitigated Design [Design-flow figure: Hardware Description Language (TMR by hand); ISE tool: synthesis optimizations, logic mapping, placement, routing; configuration bitstream (… 101001110100000111…); scrubbing (full or partial reconfiguration); fault injection (fault tolerance verification) with output checking.]
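
Fault injection with output checking can be illustrated on a toy design. The "bitstream" below is a hypothetical 20-bit list (three copies of a 2-input XOR LUT plus a single majority-voter LUT), not a real FPGA configuration format; the campaign flips one bit at a time and compares the outputs with the fault-free ones.

```python
LOGIC_LUT = [0, 1, 1, 0]              # 2-input XOR, hypothetical user logic
VOTER_LUT = [0, 0, 0, 1, 0, 1, 1, 1]  # majority voter, LUT 00010111


def golden_bitstream() -> list:
    """12 logic bits (tr0, tr1, tr2) followed by 8 voter bits."""
    return LOGIC_LUT * 3 + VOTER_LUT


def evaluate(bits: list, a: int, b: int) -> int:
    """Output of the toy TMR design for inputs a and b."""
    idx = (a << 1) | b
    r0, r1, r2 = (bits[4 * domain + idx] for domain in range(3))
    return bits[12 + ((r0 << 2) | (r1 << 1) | r2)]


def fault_injection_campaign() -> list:
    """Indices of configuration bits whose flip changes some output."""
    golden = golden_bitstream()
    vectors = [(a, b) for a in (0, 1) for b in (0, 1)]
    expected = [evaluate(golden, a, b) for a, b in vectors]
    sensitive = []
    for i in range(len(golden)):
        faulty = list(golden)
        faulty[i] ^= 1  # inject a single bit-flip
        observed = [evaluate(faulty, a, b) for a, b in vectors]
        if observed != expected:  # output checking against the golden run
            sensitive.append(i)
    return sensitive


if __name__ == "__main__":
    print(fault_injection_campaign())  # [12, 19]
```

In this toy, every flip in the triplicated logic is masked, and the only sensitive bits are the two entries of the single, unprotected voter that fault-free operation actually addresses (all zeros and all ones). Exposing such residual single points of failure is the purpose of a fault-injection campaign.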

Flash-based: Actel ProASIC3

Flash-based FPGA: CLB tile

Summary: Antifuse FPGAs: fault tolerance techniques applied in VHDL/Verilog; protect against SETs (SEUs are protected by the vendor). SRAM FPGAs: scrubbing to clean persistent faults; protect against SETs and SEUs; new FPGAs protected by the vendor are coming out. Flash FPGAs: protect against SEUs and SETs; flash transistor sensitivity to SEEs is low and still under investigation.

Outline: Radiation Effects on Digital ICs; Radiation Hardening by Design: Strategies for ASICs; Radiation Effects on FPGAs; Radiation Hardening by Design: Strategies for FPGAs; Final Remarks

Final Remarks: Mitigation techniques for ASICs and FPGAs must take into account SEUs and SETs, considering single and multiple effects. ASICs: integrated systems fabricated in nanometer technologies should have mitigation techniques at different levels to ensure robustness: charge dissipation (transistor resizing, capacitors, resistors); sensors (bulk-BICS); hardware and time redundancy; error-correction codes (ECCs); self-checking and recomputation.

Final Remarks: FPGAs: new FPGA generations bring more flexibility and design capabilities, but also more challenges for reliable design. The design can always be protected by high-level techniques (VHDL, Verilog) such as TMR. To reduce the cost of TMR, solutions at the FPGA architectural level are needed in: CLB logic (combinational blocks, sequential blocks, programmable switches); routing (programmable switches); … to mitigate SEUs and SETs.

Conferences: NSREC, IEEE Nuclear and Space Radiation Effects Conference (www.nsrec.com); RADECS, European Conference on Radiation Effects on Components and Systems (www.radecs.org); RADECS 2011 in Sevilla, Spain.

Schools: SERESSA 2011 - Brazil. First: 2006 - Manaus, Brazil; Second: 2007 - Sevilla, Spain; Third: 2008 - Buenos Aires, Argentina; Fourth: 2009 - Florida, USA; 2010 - France; 2011 - Brazil

SEE Mitigation Strategies for Digital Circuit Design Applicable to ASIC and FPGAs Fernanda Lima Kastensmidt, Ph.D. fglima@inf.ufrgs.br