Survey of Detection, Diagnosis, and Fault Tolerance Methods in FPGAs


1 Survey of Detection, Diagnosis, and Fault Tolerance Methods in FPGAs
Dan Fisher, Addison Floyd

2 Outline
Introduction
Fault Detection - Motivation, Methods, etc.
Fault Diagnosis - Motivation, Methods, etc.
Fault Tolerance
  Single FPGA
  Multiple FPGAs
  Single Faults
  Multiple Faults
Conclusion

3 Introduction
FPGA Background
Importance
Applications
Motivation for Fault Tolerance

4 Fault Detection - Motivation
Main Causes of Faults:
  Degradation
  Manufacturing Defects
  Single Event Upsets (SEUs)

5 Fault Detection - Judgement Criteria
Detection Methods are judged on:
  Speed of Detection
  Coverage
  Resource Overhead
  Performance Overhead
  Detection Granularity

6 Fault Detection - Criteria In-Depth
Detection Granularity - how precisely the location of a detected error is identified.
An FPGA is made up of Tiles containing:
  Logic Blocks
  Connection Blocks - connect tiles
  Switch Blocks - connect tiles, allow for direction change

7 Fault Detection - Comparison

8 Fault Detection - SEDC Method
The Method Explained:
  Partition data and encode with SEDC codes
  Calculate and store check bits
  Generate check bits as circuit operates
  Compare calculated and generated values
Better than Berger codes and TMR
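The partition, store, regenerate, and compare flow listed on this slide can be sketched in Python. This is only an illustration of the flow: the SEDC encoding itself is not given in the slides, so a single even-parity bit per partition is swapped in as a stand-in. The function names, the 8-bit word, and the 2-bit partition size are all assumptions made for the example.

```python
def partition(word, width=8, group=2):
    """Split a width-bit word into group-bit slices, least significant first."""
    mask = (1 << group) - 1
    return [(word >> shift) & mask for shift in range(0, width, group)]


def check_bits(word, width=8, group=2):
    """One even-parity bit per slice (a stand-in, NOT the real SEDC code)."""
    return [bin(part).count("1") & 1 for part in partition(word, width, group)]


def detect(stored, observed_word, width=8, group=2):
    """Regenerate check bits from observed data; return indices of mismatching slices."""
    regenerated = check_bits(observed_word, width, group)
    return [i for i, (s, r) in enumerate(zip(stored, regenerated)) if s != r]


expected_word = 0b10110100
stored = check_bits(expected_word)       # calculated ahead of time and stored
upset_word = expected_word ^ (1 << 5)    # a single bit flips during operation

print(detect(stored, expected_word))     # no slice is flagged
print(detect(stored, upset_word))        # the slice holding bit 5 is flagged
```

With one parity bit per slice, two flips inside the same slice would go unnoticed; a real code would be chosen to match the fault model.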

9 Fault Detection - Nazar Method
CED method providing single error detection
Takes advantage of properties of LUTs
Major Drawback - LUT insertion
Area Improvement over DWC

10 Nazar Method - LUT Properties Explained*
1st Advantage: A LUT can be viewed as a combinational circuit independent of the others. Area overhead is avoided since you don’t need to replicate the sub-expressions that form circuit outputs.
2nd Advantage: A K-input LUT can compute any function of up to K inputs. So as long as the selected group has no more than K different inputs, the parity can be calculated using just one LUT. If the selected group also has no more than K-1 different outputs, then the checker can be made of just one LUT (with the last input being the parity bit).
The picture shows upside-down triangles as LUTs, with one parity LUT for each K-1 outputs. Also shown is the checker, which would be composed of just one LUT. Separate LUTs in the same checker group can’t overlap (otherwise they wouldn’t be independent), but in order to provide coverage, LUTs in different checker groups can overlap.
*Note: This slide wasn’t in the original presentation but was added to better explain the method, since some mentioned wanting to know more.
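The two LUT properties can be made concrete with a small Python model. It is a sketch under stated assumptions, not the method's actual implementation: K = 4, the three example functions, and the helper names (`make_lut`, `read_lut`, `error_flag`, `flip_bit`) are all invented for illustration. A group of K-1 functional LUTs shares K inputs, one parity LUT predicts the XOR of their outputs, and one checker LUT compares the prediction with the actual outputs.

```python
from itertools import product

K = 4  # assumed LUT size


def make_lut(func, k=K):
    """Build a k-input LUT as a truth table; the first input is the MSB of the index."""
    return tuple(func(*bits) & 1 for bits in product((0, 1), repeat=k))


def read_lut(table, inputs):
    """Look up the LUT output for a tuple of input bits."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return table[index]


# K-1 = 3 functional LUTs that together use no more than K distinct inputs.
group = [
    make_lut(lambda a, b, c, d: a & b),
    make_lut(lambda a, b, c, d: c | d),
    make_lut(lambda a, b, c, d: a ^ d),
]

# One parity LUT predicts the XOR of the group's outputs from the same K inputs.
parity_lut = make_lut(lambda a, b, c, d: (a & b) ^ (c | d) ^ (a ^ d))

# One checker LUT: K-1 group outputs plus the parity bit as the last input.
checker_lut = make_lut(lambda o0, o1, o2, p: o0 ^ o1 ^ o2 ^ p)


def error_flag(luts, inputs):
    """Return 1 when the group outputs disagree with the predicted parity."""
    outs = tuple(read_lut(table, inputs) for table in luts)
    return read_lut(checker_lut, outs + (read_lut(parity_lut, inputs),))


def flip_bit(table, index):
    """Model a single upset in one truth-table entry."""
    cells = list(table)
    cells[index] ^= 1
    return tuple(cells)


all_inputs = list(product((0, 1), repeat=K))
faulty = [group[0], flip_bit(group[1], 5), group[2]]  # entry 5 = inputs (0, 1, 0, 1)
flagged = [x for x in all_inputs if error_flag(faulty, x)]
print(flagged)
```

In this model the fault is flagged only when the corrupted truth-table entry is actually addressed, which matches the concurrent (during normal operation) nature of the check.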

11 Fault Detection - Roving STARs
New method for online detection
Detected faults do not affect working logic
STARs and BISTERs
Better than other methods
*Picture added after the presentation to help clear up any confusion.

12 Fault Detection - Injection Topic 1
Which modules are most sensitive to SEUs
1.4% sensitive (83% routing / 16% logic)
Density matrix
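A fault-injection campaign of this kind can be illustrated with a toy model: flip one configuration bit at a time, rerun the test vectors, and call the bit sensitive if any output differs from the fault-free reference. The sketch below is an assumption-laden illustration, not the setup used in the cited work. The 16-bit configuration, the two-LUT design, and the names `evaluate` and `is_sensitive` are invented, and the resulting fraction has nothing to do with the 1.4% figure on the slide.

```python
from itertools import product

# Configuration memory of a toy design: four 2-input LUTs, 4 bits each.
# LUT0 = AND and LUT1 = OR are used; LUT2 and LUT3 are configured but unused.
GOLDEN = [0, 0, 0, 1,   0, 1, 1, 1,   0, 1, 1, 0,   1, 0, 0, 1]


def evaluate(config, a, b, c):
    """Output of the toy design: LUT1(LUT0(a, b), c)."""
    def lut(n, x, y):
        return config[4 * n + 2 * x + y]
    return lut(1, lut(0, a, b), c)


VECTORS = list(product((0, 1), repeat=3))
REFERENCE = [evaluate(GOLDEN, *v) for v in VECTORS]


def is_sensitive(bit):
    """Flip one configuration bit and compare every output with the reference."""
    upset = list(GOLDEN)
    upset[bit] ^= 1
    return [evaluate(upset, *v) for v in VECTORS] != REFERENCE


sensitive = [bit for bit in range(len(GOLDEN)) if is_sensitive(bit)]
fraction = len(sensitive) / len(GOLDEN)
print(sensitive, fraction)
```

Only bits belonging to resources the design actually uses show up as sensitive, which is why the sensitive fraction of a real device can be small.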

13 Fault Detection - Injection Topic 2
HW module to test efficiency of SEU mitigation schemes
How to emulate SEUs - 2 step process
Example Results
Scrubbing Rate

14 Fault Diagnosis - Roving STARs
Diagnose both interconnect & PLB faults
Partial Reuse
Future - Do we allow for retest of fault?

15 Fault Diagnosis - More Abramovici
BIST-based method in 2000
2004 paper further extending Roving STARs

16 Fault Diagnosis - Niamat - MATS++
Diagnose multiple stuck-at faults
Use of MATS++ algorithm
Goal of speeding up diagnosis
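MATS++ is a march test with three elements: write 0 to every cell in any order; then, in ascending address order, read 0 and write 1; then, in descending order, read 1, write 0, and read 0. The Python sketch below runs it against a small bit-addressable memory with injected stuck-at faults. How the cited work maps FPGA resources onto such a memory is not stated in the slides, so the `FaultyMemory` model, the function name, and the fault dictionary are assumptions for illustration only.

```python
class FaultyMemory:
    """Bit-addressable memory where selected cells are stuck at 0 or 1."""

    def __init__(self, size, stuck=None):
        self.cells = [0] * size
        self.stuck = dict(stuck or {})  # address -> stuck value

    def write(self, addr, value):
        self.cells[addr] = self.stuck.get(addr, value)

    def read(self, addr):
        return self.stuck.get(addr, self.cells[addr])


def mats_plus_plus(mem, size):
    """Run MATS++ and return {address: fault type} for every failing cell."""
    diagnosis = {}
    for addr in range(size):              # M0: (w0), any order
        mem.write(addr, 0)
    for addr in range(size):              # M1: ascending (r0, w1)
        if mem.read(addr) != 0:
            diagnosis[addr] = "stuck-at-1"
        mem.write(addr, 1)
    for addr in reversed(range(size)):    # M2: descending (r1, w0, r0)
        if mem.read(addr) != 1:
            diagnosis[addr] = "stuck-at-0"
        mem.write(addr, 0)
        if mem.read(addr) != 0:
            diagnosis.setdefault(addr, "stuck-at-1")
    return diagnosis


memory = FaultyMemory(8, stuck={2: 1, 5: 0})
result = mats_plus_plus(memory, 8)
print(result)
```

Because every cell is read while expecting 0 and while expecting 1, several stuck-at faults can be located in a single pass, each with its polarity.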

17 Fault Diagnosis - Tahoori’s Method
Diagnose a single fault in interconnect or logic
Application Dependent
Basic Idea

18 Fault Tolerance
Single FPGA platform
Multi-FPGA platform
Single Fault
Multiple Faults

19 Fault Tolerance - Single FPGA
Dynamic Fault Tolerance via Partial Reconfiguration
online - handles faulty PLBs without stopping the system
uses spare logic cells
Stroud et al.

20 Fault Tolerance - Single FPGA
Online Fault Tolerance for FPGA Logic Blocks
reuse defective blocks to increase the number of spares and extend mission life
uses commercial CAD tools to implement
Stroud et al.

21 Fault Tolerance - Single FPGA
Using Relocatable Bitstreams for Fault Tolerance
combines passive and active techniques
standardized relocatable modules, which are copied and stored
Montminy et al.

22 Fault Tolerance - Multi FPGA
A Reliable Reconfiguration Controller for Fault-Tolerant Embedded Systems on Multi-FPGA platforms
multiple FPGAs in a mesh topology
hardening achieved by TMR
distributed solution
Bolchini et al.

23 Fault Tolerance - Single Fault
Designing Fault Tolerant Systems into SRAM-based FPGAs
for use in space
Duplication with Comparison and Concurrent Error Detection
Lima et al.
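Duplication with Comparison (DWC) runs two copies of a module on the same inputs and raises an error flag when their outputs differ. The Python sketch below shows only that general idea. The 4-bit adder, the injected fault, and the function names are assumptions for illustration and are not taken from the cited work.

```python
def dwc(copy_a, copy_b, inputs):
    """Run both copies; return copy A's output and a mismatch flag."""
    out_a = copy_a(*inputs)
    out_b = copy_b(*inputs)
    return out_a, out_a != out_b


def adder(a, b):
    """Fault-free 4-bit adder."""
    return (a + b) & 0xF


def adder_bit0_stuck_at_0(a, b):
    """Same adder with output bit 0 stuck at 0 (hypothetical fault)."""
    return adder(a, b) & ~1 & 0xF


print(dwc(adder, adder_bit0_stuck_at_0, (1, 2)))  # outputs differ, error flagged
print(dwc(adder, adder_bit0_stuck_at_0, (2, 2)))  # fault is masked for these inputs
```

The comparison reveals that an error occurred but not which copy is wrong, so DWC on its own detects faults rather than masking them.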

24 Fault Tolerance - Single Fault
TMR and Partial Dynamic Reconfiguration to Mitigate SEU Faults in FPGAs
passive Triple Modular Redundancy
Bolchini et al.
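Triple Modular Redundancy masks a fault in one replica by voting on the outputs of three. A minimal Python sketch of a bitwise majority voter follows. The replica functions and the returned list of disagreeing replicas are assumptions added for illustration. In a scheme that pairs TMR with partial reconfiguration, such a list could indicate which region to reconfigure, but the slides do not describe that mechanism.

```python
def majority(a, b, c):
    """Bitwise 2-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)


def tmr(replicas, inputs):
    """Run three replicas, vote, and report which replicas disagree with the vote."""
    outs = [replica(*inputs) for replica in replicas]
    voted = majority(*outs)
    disagreeing = [i for i, out in enumerate(outs) if out != voted]
    return voted, disagreeing


def adder(a, b):
    """Fault-free 4-bit adder."""
    return (a + b) & 0xF


def upset_adder(a, b):
    """Replica whose output bit 2 is inverted (hypothetical SEU effect)."""
    return adder(a, b) ^ 0b0100


print(tmr([adder, upset_adder, adder], (3, 4)))
```

The voted output stays correct as long as at most one replica is faulty for any given output bit.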

25 Fault Tolerance - Single Fault
IPR: In-Place Reconfiguration for FPGA Fault Tolerance
preserves function and topology of LUT-based logic network
algorithm applied post-layout
Zhe et al.

26 Fault Tolerance - Single Fault
A Novel SRAM-Based FPGA Architecture for Efficient TMR Fault Tolerance Support
Architectural level
augments LUTs with TMR
minimize number of reconfigurations
Kyriakoulakos et al.

27 Fault Tolerance - Multiple Faults
Placement of Repair Circuits for In-Field FPGA Repair
utilize unused FPGA resources
repair circuits identified before faults occur
alternate repair circuits cached locally or remotely
Wirthlin et al.

28 Fault Tolerance - Multiple Faults
Reconfigurable Fault Tolerance: A Comprehensive Framework for Reliable and Adaptive FPGA-Based Space Computing
dynamic self-adaptation
high reliability vs. high performance
Jacobs et al.

29 Fault Tolerance - Multiple Faults
Exploiting Partially Defective LUTs: Why You Don’t Need Perfect Fabrication
because of shrinking feature size, transistor variability and failure rates are going up
identifies partially defective LUTs for reuse
DeHon et al.

30 Conclusion
Importance of FPGAs
FPGA applications
Future of FPGA fault tolerance

31 Questions?

