Resource Reduced Triple Modular Redundancy for Built-In Self-Repair in VLIW-Processors (Computer Engineering Group, Brandenburg University of Technology at Cottbus)

1 Resource Reduced Triple Modular Redundancy for Built-In Self-Repair in VLIW-Processors
Mario Schölzel, Computer Engineering Group, Brandenburg University of Technology at Cottbus

2 Outline
Computer Engineering Group, Brandenburg University of Technology at Cottbus; Mario Schölzel; SPA 2007
– Why Built-In Self-Repair?
– Base Architecture
– Resource Reduced TMR
– Program Modifications
– Architecture Modifications
– Conclusions and Limitations

3 Why Built-In Self-Repair?
Hardware is becoming unreliable (permanent faults due to small feature sizes).
The ITRS Roadmap 2005 for Design predicts a requirement for reliable systems due to:
– Infeasibility of a full functional test at manufacturing exit
– Relaxing the 100% correctness requirement (reduces functional test complexity and cost)
Consequence: redundancy in the system is required for robustness!

4 Simple TMR Approach
[Figure: the input is fed to Processor 1, Processor 2 and Processor 3; a voter combines their results into the output.]
We consider the following application domain:
– High-performance signal processing applications (e.g. image and audio processing)
– Real-time demands
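The voter in the simple TMR scheme can be sketched as a word-level majority vote. This is a minimal behavioural sketch, assuming integer results; the function name `tmr_vote` is hypothetical and not taken from the slides.

```python
from collections import Counter
from typing import Optional, Sequence


def tmr_vote(results: Sequence[int]) -> Optional[int]:
    """Majority vote over the outputs of three redundant processors.

    Returns the value produced by at least two processors, or None if
    all three results differ (the voter cannot decide).
    """
    if len(results) != 3:
        raise ValueError("TMR expects exactly three results")
    value, count = Counter(results).most_common(1)[0]
    return value if count >= 2 else None


# Processor 3 delivers a wrong result; the voter masks it.
print(tmr_vote([42, 42, 17]))  # 42
```

Note that every operation is computed three times here, which is the cost that RR-TMR tries to reduce.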

5 Basic Processor Architecture
[Figure: block diagram with the labels Data Path, Register File, Branch, FU 1, …, FU n, Extern, Control Path, Control Logic, Program Memory, Data Memory, Instruction Pointer.]

6 Idea of Resource Reduced TMR
– Redundant operators are naturally available in a VLIW data path.
– In TMR, a third result is only necessary if two results mismatch.
– Idea of RR-TMR: perform every operation on only two operators and, in the fault-free case, use the third operator to execute regular operations.
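A simplified behavioural model of the RR-TMR idea: the third operator is only consulted after the two primary results disagree. This is a sketch under assumptions, not the hardware: operators are modelled as Python functions, timing and the TRF are ignored, and the names `rr_tmr_execute` and `faulty_add` are hypothetical.

```python
from typing import Callable, Tuple

# An operator is modelled as a function of two integer operands.
Operator = Callable[[int, int], int]


def rr_tmr_execute(op1: Operator, op2: Operator, op3: Operator,
                   a: int, b: int) -> Tuple[int, int]:
    """Execute one operation in RR-TMR style.

    Returns (result, operators_used). In the fault-free case only op1 and
    op2 are used, so op3 remains free for regular operations.
    """
    r1 = op1(a, b)
    r2 = op2(a, b)
    if r1 == r2:
        return r1, 2
    # Mismatch: only now is a third result needed to decide.
    r3 = op3(a, b)
    if r3 == r1:
        return r1, 3
    if r3 == r2:
        return r2, 3
    raise RuntimeError("all three results differ")


def add(a: int, b: int) -> int:
    return a + b


def faulty_add(a: int, b: int) -> int:
    return a + b + 1  # hypothetical faulty adder: result off by one


print(rr_tmr_execute(add, add, add, 2, 3))         # (5, 2)
print(rr_tmr_execute(faulty_add, add, add, 2, 3))  # (5, 3)
```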

7 Modified VLIW Data Path
Limitation: every operator must be available at least three times.
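The availability limitation can be checked mechanically for a given data path. A minimal sketch, assuming the data path is described as a mapping from FU number to the set of opcodes that FU implements; this description format and the function name are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def operators_below_three(fu_operators: Dict[int, Set[str]]) -> List[str]:
    """Return the operators that are implemented in fewer than three FUs."""
    counts = Counter(op for ops in fu_operators.values() for op in ops)
    return sorted(op for op, n in counts.items() if n < 3)


# Hypothetical data path: '+' in three FUs, '*' only in two, '&' in one.
datapath = {1: {"+", "*"}, 2: {"+", "*"}, 3: {"+"}, 4: {"&"}}
print(operators_below_three(datapath))  # ['&', '*']
```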

8 Program Transformation
[Figure: the + and * operations of the original program are duplicated ("Duplicated Operations"); each original operation and its duplicate form a "Pair of Reference Operations".]
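The transformation can be sketched as emitting each operation twice under a shared pair id, plus a check that the scheduler placed both copies on different FUs. The placement check reflects a necessary property rather than a stated algorithm: if both copies ran on the same faulty operator they would produce the same wrong result and the comparison could not reveal a permanent fault. All names here (`Operation`, `duplicate_operations`, `pairs_on_distinct_fus`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Operation:
    opcode: str
    src1: str
    src2: str
    dst: str


def duplicate_operations(program: List[Operation]) -> List[Tuple[int, Operation]]:
    """Emit every operation twice, tagged with the id of its reference pair."""
    transformed = []
    for pair_id, op in enumerate(program):
        transformed.append((pair_id, op))  # original operation
        transformed.append((pair_id, op))  # duplicate with identical operands
    return transformed


def pairs_on_distinct_fus(placement: Dict[Tuple[int, int], int]) -> bool:
    """Check that both copies of every pair were placed on different FUs.

    placement maps (pair_id, copy) -> FU number, with copy in {0, 1};
    both copies of every pair are assumed to be present.
    """
    pair_ids = {pair_id for pair_id, _ in placement}
    return all(placement[(p, 0)] != placement[(p, 1)] for p in pair_ids)


program = [Operation("+", "R3", "R6", "R0"), Operation("*", "R1", "R2", "R4")]
print(len(duplicate_operations(program)))             # 4
print(pairs_on_distinct_fus({(0, 0): 2, (0, 1): 3}))  # True
```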

9 Modified Part of Instruction Word
Instruction word fields per FU: opcode, src1, src2, dst, RefReg, RefFU, mod
– RefFU: number of the FU that executes the reference operation
– mod = 0: RefReg is the target register in the TRF
– mod = 1: RefReg delivers the reference value from the TRF
These fields must be set correctly for every operation and its duplicate after all operations have been scheduled (original and duplicate operations may be scheduled at different times).

10 Example: Instruction Word
[Figure: result of scheduling. The + operation runs on FU 2 in time step 8 and its duplicate on FU 3 in time step 10; time step 9 lies in between.]
Corresponding instruction words (fields OpC Src1 Src2 Dst mod RReg RFU):
Instr.   Part of FU 2               Part of FU 3
8        +  R3  R6  R0  0  R6  3    …
9        …                          …
10       …                          +  R3  R6  R0  1  R6  2
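Setting the mod, RefReg and RefFU fields after scheduling can be sketched as follows, reproducing the example instruction words. Assumptions: the copy scheduled earlier writes its result into the TRF (mod = 0) and the later copy reads and compares it (mod = 1), which matches the example but is our reading of it; the TRF register is assumed to be chosen beforehand; the same-time-step case is not covered. `ScheduledOp` and `set_reference_fields` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ScheduledOp:
    opcode: str
    src1: str
    src2: str
    dst: str
    fu: int        # FU chosen by the scheduler
    step: int      # time step chosen by the scheduler
    mod: int = 0
    ref_reg: str = ""
    ref_fu: int = 0


def set_reference_fields(a: ScheduledOp, b: ScheduledOp, trf_reg: str) -> None:
    """Fill mod/RefReg/RefFU for an operation and its duplicate after scheduling.

    The copy scheduled first writes its result to trf_reg in the TRF (mod=0);
    the later copy reads that reference value and compares (mod=1).
    """
    if a.fu == b.fu:
        raise ValueError("original and duplicate must run on different FUs")
    if a.step == b.step:
        raise ValueError("this sketch only handles different time steps")
    first, second = (a, b) if a.step < b.step else (b, a)
    first.mod, first.ref_reg, first.ref_fu = 0, trf_reg, second.fu
    second.mod, second.ref_reg, second.ref_fu = 1, trf_reg, first.fu


# The example: + R3, R6 -> R0 on FU 2 (step 8) and on FU 3 (step 10).
orig = ScheduledOp("+", "R3", "R6", "R0", fu=2, step=8)
dup = ScheduledOp("+", "R3", "R6", "R0", fu=3, step=10)
set_reference_fields(orig, dup, trf_reg="R6")
print(orig.mod, orig.ref_reg, orig.ref_fu)  # 0 R6 3
print(dup.mod, dup.ref_reg, dup.ref_fu)     # 1 R6 2
```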

11 FD&C Logic Details
[Figure: FD&C logic, annotated as follows:]
– Every bit represents the fault status of the corresponding operator
– Opcode of the operation currently executed in the corresponding FU
– Compares the current result with the reference value from register RefReg in the TRF
– Decides whether an error occurs for the first time or not and signals the Voting Logic
– Detects whether the current result is faulty
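The detection part of the FD&C logic can be modelled behaviourally: compare the result with its reference and use the fault status of the operators involved to decide whether the mismatch is new. This is a sketch, not the circuit; the set of (FU, opcode) pairs stands in for the fault status bits, the returned strings are hypothetical labels, and treating a mismatch with a known faulty operator as needing no voting is an assumption based on the conclusion that known faults cause no delay.

```python
from typing import Set, Tuple

OperatorId = Tuple[int, str]  # (FU number, opcode), e.g. (2, "+")


class FaultDetector:
    """Behavioural sketch of the detection part of the FD&C logic."""

    def __init__(self) -> None:
        # Known faulty operators; plays the role of the fault status bits.
        self.faulty: Set[OperatorId] = set()

    def check(self, fu: int, ref_fu: int, opcode: str,
              result: int, reference: int) -> str:
        """Compare a result with its reference value from the TRF."""
        if result == reference:
            return "ok"
        if (fu, opcode) in self.faulty or (ref_fu, opcode) in self.faulty:
            # Assumed: a mismatch involving a known faulty operator is
            # handled without voting.
            return "known fault"
        # First occurrence: signal the voting logic.
        return "new fault"


det = FaultDetector()
print(det.check(3, 2, "+", result=9, reference=9))   # ok
print(det.check(3, 2, "+", result=10, reference=9))  # new fault
det.faulty.add((3, "+"))
print(det.check(3, 2, "+", result=10, reference=9))  # known fault
```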

12 Example: Correct Execution
[Figure: execution of the instruction words from slide 10 (FU 2: + R3 R6 R0, mod 0, RReg R6, RFU 3; FU 3: + R3 R6 R0, mod 1, RReg R6, RFU 2); bit values shown in the figure: 0 00 0.]

13 Example: FU 2 is Faulty
[Figure: same instruction words as in slide 10; bit values shown in the figure: 1 11 0.]

14 Example: FU 3 is Faulty
[Figure: same instruction words as in slide 10; bit values shown in the figure: 0 00 1.]

15 Example: Fault Detection (1)
[Figure: same instruction words as in slide 10; bit values shown in the figure: 0 00 0.]

16 Example: Fault Detection (2)
The mismatch-causing operation of FU 3 (+ R3 R6 R0, mod 1, RReg R6, RFU 2) is executed again in another FU. One of the following two cases applies:
– 0: No mismatch is discovered. FU 2 and FU 4 computed the correct result; the write-back of FU 3 is suppressed.
– 1: A mismatch is discovered again. It is assumed that FU 3 computed the correct result, which is written to the register file.
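The two cases of the re-execution can be written as a small decision function. A sketch under assumptions: `resolve_mismatch` and `VotingDecision` are hypothetical names, and marking the reference FU (FU 2 in the example) as faulty in case 1 is our reading of "it is assumed that FU 3 computed the correct result"; the slide does not state it explicitly.

```python
from typing import NamedTuple


class VotingDecision(NamedTuple):
    faulty_fu: int            # FU whose operator is marked faulty
    write_back_checked: bool  # write the checked FU's result to the register file?


def resolve_mismatch(reference: int, rerun_result: int,
                     checked_fu: int, reference_fu: int) -> VotingDecision:
    """Decide after re-executing the mismatch-causing operation on another FU.

    reference:    value from the TRF, produced by reference_fu (FU 2 in the example)
    rerun_result: result of the re-execution (FU 4 in the example)
    checked_fu:   FU whose result caused the mismatch (FU 3 in the example)
    """
    if rerun_result == reference:
        # Case 0: reference FU and re-executing FU agree, so the checked FU
        # is faulty and its write-back is suppressed.
        return VotingDecision(faulty_fu=checked_fu, write_back_checked=False)
    # Case 1: mismatch again; the checked FU is assumed correct and its
    # result is written back.
    return VotingDecision(faulty_fu=reference_fu, write_back_checked=True)


print(resolve_mismatch(9, 9, checked_fu=3, reference_fu=2))
# VotingDecision(faulty_fu=3, write_back_checked=False)
```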

17 Details FD&C-Logic
[Figure: FD&C logic, annotated as follows:]
– Select a certain control word (normal: cs1)
– Current operation mode (normal, voting, resume)
– Select the control signals of the fault-causing operation
– Redirect the selected signals to a working FU
– Remember faulty operators
– Control of the (de-)multiplexers
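Redirecting the control signals of the fault-causing operation requires choosing a working FU. The slides do not give a selection policy; the sketch below assumes the lowest-numbered FU that implements the opcode, is not involved in the mismatch, and has no known fault for that operator. The function name and its parameters are hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple


def select_working_fu(opcode: str, candidates: Iterable[int],
                      exclude: Set[int],
                      faulty: Set[Tuple[int, str]]) -> Optional[int]:
    """Pick an FU to which the fault-causing operation is redirected.

    candidates: FUs that implement `opcode`
    exclude:    FUs involved in the mismatch (checked FU and reference FU)
    faulty:     known faulty operators as (FU, opcode) pairs
    Returns the lowest-numbered suitable FU, or None if none is left.
    """
    for fu in sorted(candidates):
        if fu not in exclude and (fu, opcode) not in faulty:
            return fu
    return None


print(select_working_fu("+", [1, 2, 3, 4], exclude={2, 3}, faulty={(1, "+")}))  # 4
```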

18 Example: FD&C-Logic
[Figure: example schedule with operations *, +, &, - and the situation of the FD&C logic.]
Instruction 1 is in EX and Instruction 2 is being fetched; Instruction 3 follows. A fault is reported while the FD&C logic is in normal mode.

19 Example: FD&C-Logic
[Figure: same example schedule.]
Instruction 1 is in WB; Instruction 2 (in EX) and Instruction 3 (fetched) are stopped. The FD&C logic is in voting mode.

20 Example: FD&C-Logic
Resume starts here; the FD&C logic is in resume mode.
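The mode sequence of the example (normal, then voting after a new fault, then resume, then back to normal) can be sketched as a tiny state machine. Assumptions: one transition per call, and the number of cycles spent in voting or resume is not modelled; `Mode` and `next_mode` are hypothetical names.

```python
from enum import Enum
from typing import List


class Mode(Enum):
    NORMAL = "normal"
    VOTING = "voting"
    RESUME = "resume"


def next_mode(mode: Mode, new_fault: bool) -> Mode:
    """One transition of the FD&C operation mode (one step per call, assumed)."""
    if mode is Mode.NORMAL:
        return Mode.VOTING if new_fault else Mode.NORMAL
    if mode is Mode.VOTING:
        return Mode.RESUME   # re-execution done; restart stopped instructions
    return Mode.NORMAL       # resume finished


trace: List[Mode] = [Mode.NORMAL]
for fault in [True, False, False, False]:
    trace.append(next_mode(trace[-1], fault))
print([m.value for m in trace])  # ['normal', 'voting', 'resume', 'normal', 'normal']
```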

21 Limitations in Error Detection
[Figure: + operators in FU 1, FU 2, FU 3, …]
Assumption: operator + in FU 1 is faulty.
Problem: the correctness of operator + in FU 2, whose reference operation runs on FU 1, can no longer be checked!
Solution: check the correctness of FU 2 with a reference operation in FU 3.
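The limitation and its solution can be sketched as two helpers: one finds pairs whose reference operator is known faulty (so the other result goes unchecked), the other picks a fault-free FU to host a new reference operation. The slide does not say whether this re-pairing happens at run time or by rescheduling; the pair representation and both function names are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Pair = Tuple[int, int]  # (FU of the operation, FU of its reference operation)


def unchecked_pairs(pairs: List[Pair], opcode: str,
                    faulty: Set[Tuple[int, str]]) -> List[Pair]:
    """Pairs whose reference operator is known faulty: the result of the
    other FU is no longer checked."""
    return [(fu, ref) for fu, ref in pairs
            if (ref, opcode) in faulty and (fu, opcode) not in faulty]


def new_reference_fu(fu: int, opcode: str, fus_with_op: List[int],
                     faulty: Set[Tuple[int, str]]) -> Optional[int]:
    """Choose another fault-free FU to provide the reference operation."""
    for cand in fus_with_op:
        if cand != fu and (cand, opcode) not in faulty:
            return cand
    return None


faulty = {(1, "+")}                                  # operator + in FU 1 is faulty
print(unchecked_pairs([(2, 1)], "+", faulty))        # [(2, 1)]
print(new_reference_fu(2, "+", [1, 2, 3], faulty))   # 3
```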

22 Preliminary Results

23 Preliminary Results

24 Conclusion
– The method can detect and repair permanent and transient faults.
– Known faults do not cause a delay; new faults cause a delay of at most 2·maxLat+1.
– Multiple known faults can be repaired (as long as at least one operation of every pair is executed by a non-faulty FU).
– Overhead of operators and register file ports: approximately 100%.
– Overhead of the control logic is unknown so far (a VHDL model is missing).
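Two statements of the conclusion can be made concrete: the delay bound for a new fault and the condition under which multiple known faults remain repairable. In this sketch maxLat = 3 is a made-up example value (the slides give no number or unit), the repair check is simplified to a single opcode, and both function names are hypothetical.

```python
from typing import List, Set, Tuple


def max_new_fault_delay(max_lat: int) -> int:
    """Upper bound on the delay caused by a newly occurring fault: 2*maxLat + 1."""
    return 2 * max_lat + 1


def repairable(pairs: List[Tuple[int, int]], opcode: str,
               faulty: Set[Tuple[int, str]]) -> bool:
    """True if at least one operation of every pair runs on a non-faulty operator."""
    return all((a, opcode) not in faulty or (b, opcode) not in faulty
               for a, b in pairs)


print(max_new_fault_delay(3))                                   # 7
print(repairable([(1, 2), (3, 4)], "+", {(1, "+"), (4, "+")}))  # True
print(repairable([(1, 2)], "+", {(1, "+"), (2, "+")}))          # False
```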

25 Open Problems
– Handling of multiple faults that first occur at the same time is possible but difficult
– Faults in wires, registers, the control path and the FD&C logic
– Hardware implementation for better area and performance estimation

26 Thank You!

