Resource-Reduced Triple Modular Redundancy for Built-In Self-Repair in VLIW Processors


Mario Schölzel, Computer Engineering Group, Brandenburg University of Technology at Cottbus

Mario Schölzel, SPA 2007

Outline
- Why Built-In Self-Repair?
- Base Architecture
- Resource-Reduced TMR
- Program Modifications
- Architecture Modifications
- Conclusions and Limitations

Why Built-In Self-Repair?
- Hardware is becoming unreliable: small feature sizes cause permanent faults.
- The ITRS Roadmap 2005 (Design) predicts that reliable systems will be required because of:
  - the infeasibility of a full functional test at manufacturing exit, and
  - relaxing the 100% correctness requirement (which reduces functional-test complexity and cost).
- Consequence: the system needs redundancy to be robust.

Simple TMR Approach
[Figure: the input is fed to Processor 1, Processor 2 and Processor 3; a voter combines their three results into the output.]
We consider the following application domain:
- high-performance signal-processing applications (e.g., image and audio processing)
- real-time demands
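The voter in the simple TMR approach passes on the value that at least two of the three processors agree on. The following is a minimal software sketch of such a majority voter. The function name `tmr_vote` is hypothetical, and this is a generic illustration of TMR voting rather than the hardware voter from the slides.

```python
from collections import Counter
from typing import Optional, Sequence


def tmr_vote(results: Sequence[int]) -> Optional[int]:
    """Return the majority value of three redundant results.

    Returns None when all three results differ, i.e. no majority exists.
    """
    if len(results) != 3:
        raise ValueError("TMR expects exactly three results")
    value, count = Counter(results).most_common(1)[0]
    return value if count >= 2 else None


# Processor 2 delivers a wrong result; the voter masks it.
print(tmr_vote([42, 17, 42]))  # 42
print(tmr_vote([1, 2, 3]))     # None
```

This scheme triples the processors for every computation. The resource-reduced variant starts from exactly that cost.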

Basic Processor Architecture
[Figure: the data path contains the register file, the functional units FU 1 … FU n, a branch unit, and an external interface; the control path contains the control logic, the instruction pointer, the program memory, and the data memory.]
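In a VLIW data path, one instruction word carries one operation slot per FU, and all slots are issued in the same cycle. The Python model below makes some assumptions. Each slot uses a hypothetical `(opcode, src1, src2, dst)` tuple, which is not the encoding from the slides. All operands are read before any result is written back.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical slot encoding: (opcode, src1, src2, dst), or None for a NOP.
Slot = Optional[Tuple[str, int, int, int]]

OPS: Dict[str, Callable[[int, int], int]] = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "&": lambda a, b: a & b,
}


def execute_bundle(regs: List[int], bundle: List[Slot]) -> None:
    """Execute one VLIW instruction word; slot i is issued to FU i.

    All FUs read their operands before any result is written back,
    so the operations of one bundle behave as if executed in parallel.
    """
    results = []
    for slot in bundle:
        if slot is None:
            continue
        opcode, src1, src2, dst = slot
        results.append((dst, OPS[opcode](regs[src1], regs[src2])))
    for dst, value in results:  # common write-back phase
        regs[dst] = value


regs = [0, 3, 4, 5]
execute_bundle(regs, [("+", 1, 2, 0), ("*", 1, 2, 3), None])
print(regs)  # [7, 3, 4, 12]
```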

Idea of Resource-Reduced TMR
- Redundant operators are naturally available in a VLIW data path.
- In TMR, a third result is only needed when the other two results mismatch.
- Idea of RR-TMR: execute every operation on only two operators, and in the fault-free case use the third operator to execute regular operations.
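The RR-TMR idea can be sketched in software as follows. Each operation runs on two FUs, and a third FU is consulted only when their results disagree. This is a minimal model, not the hardware mechanism.
- FUs are modeled as plain Python functions, and a faulty FU simply returns a wrong value.
- `rr_tmr_execute`, `good_add`, and `bad_add` are hypothetical names.

```python
from typing import Callable, List, Tuple

# Hypothetical FU model: a function (a, b) -> result. A faulty FU simply
# returns a wrong value.
FU = Callable[[int, int], int]


def rr_tmr_execute(fus: List[FU], pair: Tuple[int, int], spare: int,
                   a: int, b: int) -> Tuple[int, int]:
    """Run an operation on two FUs and call a third FU only on a mismatch.

    Returns (result, number of FU executions used).
    """
    r1 = fus[pair[0]](a, b)
    r2 = fus[pair[1]](a, b)
    if r1 == r2:
        return r1, 2  # fault-free case: the spare FU remains free for other work
    r3 = fus[spare](a, b)
    # Same rule as in the "Fault Detection (2)" example: if the third result
    # agrees with r1, the second FU is wrong; otherwise the second FU is trusted.
    return (r1 if r3 == r1 else r2), 3


def good_add(x: int, y: int) -> int:
    return x + y


def bad_add(x: int, y: int) -> int:
    return x + y + 1  # hypothetical permanent fault


print(rr_tmr_execute([good_add, good_add, good_add], (0, 1), 2, 3, 6))  # (9, 2)
print(rr_tmr_execute([good_add, bad_add, good_add], (0, 1), 2, 3, 6))   # (9, 3)
```

In the common fault-free case only two executions are spent per operation, which is what frees the third operator for regular work.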

Modified VLIW Data Path
[Figure: modified VLIW data path]
Limitation: every operator must be available at least three times.
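The three-times limitation can be checked mechanically for a given data path configuration. In the sketch below, each FU is described by the (hypothetical) set of operators it implements. The function, also hypothetical, reports every operator that is provided fewer than three times.

```python
from collections import Counter
from typing import Dict, List, Set


def operators_below_three(fu_ops: List[Set[str]]) -> Dict[str, int]:
    """Return each operator that at least one but fewer than three FUs provide.

    fu_ops[i] is the set of operators implemented by FU i. RR-TMR needs
    every operator at least three times: two FUs for an operation and its
    duplicate, plus one more FU for voting.
    """
    counts = Counter(op for ops in fu_ops for op in ops)
    return {op: n for op, n in counts.items() if n < 3}


datapath = [{"+", "-", "*"}, {"+", "-"}, {"+", "*"}, {"+", "-", "&"}]
print(sorted(operators_below_three(datapath).items()))  # [('&', 1), ('*', 2)]
```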

Program Transformation
[Figure: every operation of the program (e.g., * and +) is duplicated; an operation and its duplicate form a pair of reference operations.]
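The program transformation can be pictured as a pass over the operation list that emits a duplicate for each operation. This is a minimal sketch under some assumptions.
- An operation is modeled as a small record with an id, an opcode, and operand names.
- The duplicate is named with a trailing `'`.
- All names are hypothetical, and this is not the author's compiler pass.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Op:
    name: str               # unique operation id, e.g. "v1"
    opcode: str             # "+", "*", ...
    srcs: Tuple[str, ...]   # operand names (program inputs or earlier ops)


def duplicate_operations(ops: List[Op]) -> List[Tuple[Op, Op]]:
    """Pair every operation with a duplicate that reads the same operands.

    Each (original, duplicate) tuple is a pair of reference operations;
    the scheduler may place the two copies in different time steps.
    """
    return [(op, Op(op.name + "'", op.opcode, op.srcs)) for op in ops]


program = [Op("v1", "*", ("a", "b")), Op("v2", "+", ("v1", "c"))]
for original, dup in duplicate_operations(program):
    print(original.name, "->", dup.name, dup.opcode, dup.srcs)
```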

Modified Part of the Instruction Word
Each FU's part of the instruction word has the fields: opcode | src1 | src2 | dst | RefReg | RefFU | mod
- RefFU: number of the FU that executes the reference operation
- mod = 0: RefReg is the target register in the TRF
- mod = 1: RefReg delivers the reference value from the TRF
These fields must be set correctly for every operation and its duplicate after all operations have been scheduled (original and duplicate operations may be scheduled at different times).

Example: Instruction Word
Result of scheduling: the + operation is executed by FU 2 in time step 8; its duplicate is executed by FU 3 in time step 10.
Corresponding instruction words (fields: OpC Src1 Src2 Dst mod RReg RFU, where RReg = RefReg and RFU = RefFU):
- Instr. 8, part of FU 2:  +  R3  R6  R0  0  R6  3
- Instr. 9: …
- Instr. 10, part of FU 3: +  R3  R6  R0  1  R6  2
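Setting mod, RefReg, and RefFU after scheduling can be sketched as a small post-pass over each pair. This is a minimal sketch with a few assumptions.
- The earlier copy writes its result as the reference (mod = 0), and the later copy compares against it (mod = 1). This matches the example above, where instruction 8 on FU 2 has mod 0 and instruction 10 on FU 3 has mod 1.
- Each copy's RefFU names its partner's FU.
- The TRF register is supplied by the caller.
- `SlotFields` and `set_reference_fields` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SlotFields:
    """One FU's part of the instruction word (field names are hypothetical)."""
    opcode: str
    src1: str
    src2: str
    dst: str
    mod: int      # 0: RefReg is the TRF target; 1: RefReg delivers the reference
    ref_reg: str  # TRF register used for the reference value
    ref_fu: int   # FU that executes the partner operation


def set_reference_fields(opcode: str, src1: str, src2: str, dst: str,
                         copy_a: Tuple[int, int], copy_b: Tuple[int, int],
                         trf_reg: str) -> Tuple[SlotFields, SlotFields]:
    """Fill mod, RefReg and RefFU for an operation and its duplicate.

    copy_a and copy_b are the (time_step, fu) positions chosen by the
    scheduler. Assumption: the earlier copy stores its result as reference
    (mod = 0) and the later copy compares against it (mod = 1); equal time
    steps are ordered by FU number. Returns (earlier copy, later copy).
    """
    first, second = sorted([copy_a, copy_b])
    earlier = SlotFields(opcode, src1, src2, dst, 0, trf_reg, ref_fu=second[1])
    later = SlotFields(opcode, src1, src2, dst, 1, trf_reg, ref_fu=first[1])
    return earlier, later


# Reproduces the example: time step 8 on FU 2, time step 10 on FU 3.
earlier, later = set_reference_fields("+", "R3", "R6", "R0",
                                      copy_a=(8, 2), copy_b=(10, 3), trf_reg="R6")
print(earlier)  # instruction 8, FU 2 part: mod=0, ref_fu=3
print(later)    # instruction 10, FU 3 part: mod=1, ref_fu=2
```

How equal time steps are handled by the real design is not stated on the slides; the ordering by FU number is an arbitrary choice of this sketch.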

FD&C Logic Details
[Figure annotations:]
- Fault-status register: every bit represents the fault status of the corresponding operator.
- Opcode of the operation currently executed in the corresponding FU.
- Comparator: compares the current result with the reference value from register RefReg in the TRF.
- Decides whether an error occurs for the first time and, if so, signals the voting logic.
- Detects whether the current result is faulty.
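The per-FU check described by these annotations can be modeled in a few lines. This is a minimal sketch with a few assumptions.
- Fault-status bits are kept as a set of known-faulty (FU, opcode) operators.
- The TRF is a dictionary.
- A mismatch involving an already known-faulty operator is reported separately from a first-time error.
- `FDCLogic`, `store_reference`, and `check` are hypothetical names, not the author's hardware interface.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class FDCLogic:
    """Hypothetical software model of the FD&C checks."""
    # Fault-status bits: one entry per known-faulty operator (FU, opcode).
    faulty: Set[Tuple[int, str]] = field(default_factory=set)
    # Reference values kept in the TRF, indexed by register name.
    trf: Dict[str, int] = field(default_factory=dict)

    def store_reference(self, ref_reg: str, result: int) -> None:
        """mod = 0: save the result as reference value in RefReg."""
        self.trf[ref_reg] = result

    def check(self, fu: int, ref_fu: int, opcode: str,
              result: int, ref_reg: str) -> str:
        """mod = 1: compare the result with the reference value in RefReg.

        Returns "ok" on a match, "known" if the mismatch involves an operator
        already marked faulty, and "vote" if the error occurs for the first
        time and the voting logic must be signalled.
        """
        if result == self.trf[ref_reg]:
            return "ok"
        if (fu, opcode) in self.faulty or (ref_fu, opcode) in self.faulty:
            return "known"
        return "vote"


fdc = FDCLogic()
fdc.store_reference("R6", 9)             # instruction 8, FU 2 (mod = 0)
print(fdc.check(3, 2, "+", 9, "R6"))     # ok
print(fdc.check(3, 2, "+", 10, "R6"))    # vote
fdc.faulty.add((3, "+"))                 # FU 3's adder now marked faulty
print(fdc.check(3, 2, "+", 10, "R6"))    # known
```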

Animated example slides (animations not preserved in the transcript), each stepping through the instruction words 8 to 10 of the instruction word example (+ R3 R6 R0 on FU 2 with mod 0, RReg R6, RFU 3, and on FU 3 with mod 1, RReg R6, RFU 2):
- Example: Correct Execution
- Example: FU 2 is Faulty
- Example: FU 3 is Faulty
- Example: Fault Detection (1)

Example: Fault Detection (2)
The mismatch-causing operation of FU 3 (+ R3 R6 R0, mod 1, RReg R6, RFU 2) is executed again on another FU (FU 4). One of the following two cases applies:
- No mismatch is discovered: FU 2 and FU 4 computed the correct result, so the write-back of FU 3 is suppressed.
- A mismatch is discovered again: FU 3 is assumed to have computed the correct result, and this result is written to the register file.
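The two cases of the voting step can be written as one decision function. This is a minimal sketch with a few assumptions.
- In the second case, the operator of the reference FU is recorded as faulty. The slide only states that FU 3 is assumed correct.
- Operators are identified by (FU, opcode).
- `resolve_vote` is a hypothetical name.

```python
from typing import Set, Tuple


def resolve_vote(ref_value: int, ref_fu: int,
                 checked_value: int, checked_fu: int,
                 rerun_value: int, opcode: str,
                 faulty: Set[Tuple[int, str]]) -> Tuple[int, bool]:
    """Decide a first mismatch after re-running the checked operation.

    ref_value:     reference result (FU 2 in the example)
    checked_value: mismatching result (FU 3 in the example)
    rerun_value:   result of re-executing the operation on another FU (FU 4)
    Returns (value written to the register file,
             True if checked_fu's write-back is suppressed)
    and records the operator blamed for the fault in `faulty`.
    """
    if rerun_value == ref_value:
        # Case 1: FU 2 and FU 4 agree, so FU 3 is faulty.
        faulty.add((checked_fu, opcode))
        return ref_value, True
    # Case 2: mismatch again; FU 3 is assumed correct. Blaming the
    # reference FU is an assumption of this sketch.
    faulty.add((ref_fu, opcode))
    return checked_value, False


faulty: Set[Tuple[int, str]] = set()
print(resolve_vote(9, 2, 10, 3, 9, "+", faulty), faulty)  # (9, True) {(3, '+')}
```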

Details of the FD&C Logic
- Selects a control word (normal mode: cs1)
- Current operation mode: normal, voting, or resume
- Selects the control signals of the fault-causing operation
- Redirects the selected signals to a working FU
- Remembers faulty operators
- Controls the (de)multiplexers

Example: FD&C Logic
Example schedule [operation-level figure with *, +, & and - operations not preserved]: Instruction 1 is in EX, Instruction 2 is being fetched, Instruction 3 follows.
Situation of the FD&C logic: mode normal; a fault is reported.

Example: FD&C Logic
Example schedule: Instruction 1 is in WB, Instruction 2 is in EX (stopped), Instruction 3 has been fetched (stopped).
Situation of the FD&C logic: mode voting.

Example: FD&C Logic
Resume starts here. Situation of the FD&C logic: mode resume.
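The three operation modes of the FD&C logic (normal, voting, resume) behave like a small state machine. The trace below follows the example: a fault reported in normal mode, a voting phase with stopped instructions, then resume. This is a minimal sketch with a few assumptions.
- The transition conditions and the one-cycle resume phase are assumptions.
- The pipeline is treated as stopped only while voting.
- `Mode`, `next_mode`, and `trace` are hypothetical names.

```python
from enum import Enum
from typing import List, Tuple


class Mode(Enum):
    NORMAL = "normal"
    VOTING = "voting"
    RESUME = "resume"


def next_mode(mode: Mode, fault_reported: bool, vote_done: bool) -> Mode:
    """Hypothetical transition function for the FD&C operation mode."""
    if mode is Mode.NORMAL:
        return Mode.VOTING if fault_reported else Mode.NORMAL
    if mode is Mode.VOTING:
        return Mode.RESUME if vote_done else Mode.VOTING
    return Mode.NORMAL  # assumption: RESUME lasts a single cycle


def trace(events: List[Tuple[bool, bool]]) -> List[Tuple[str, bool]]:
    """Return (mode, later instructions stopped?) per cycle.

    events holds one (fault_reported, vote_done) pair per cycle.
    """
    mode, out = Mode.NORMAL, []
    for fault_reported, vote_done in events:
        # While voting, the instructions in EX and fetch are stopped.
        out.append((mode.value, mode is Mode.VOTING))
        mode = next_mode(mode, fault_reported, vote_done)
    return out


# Cycle 0: fault reported during EX of Instruction 1; cycle 1: voting.
print(trace([(True, False), (False, True), (False, False), (False, False)]))
```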

Limitations in Error Detection
[Figure: + operations executed on FU 1, FU 2, FU 3, …]
Assumption: operator + in FU 1 is faulty.
Problem: the correctness of operator + in FU 2 can no longer be checked, because its reference operation runs on the faulty FU 1.
Solution: check the correctness of FU 2 with a reference operation in FU 3.
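The solution amounts to choosing a reference FU whose operator is not known to be faulty. The helper below is hypothetical, and the slides do not say whether the compiler or the hardware makes this choice. It picks the first usable candidate and returns `None` when the operation can no longer be checked.

```python
from typing import List, Optional, Set, Tuple


def pick_reference_fu(fu: int, opcode: str, candidates: List[int],
                      faulty: Set[Tuple[int, str]]) -> Optional[int]:
    """Choose a non-faulty FU (other than fu) to run the reference operation."""
    for other in candidates:
        if other != fu and (other, opcode) not in faulty:
            return other
    return None  # the operation on `fu` can no longer be checked


# Operator + in FU 1 is faulty: FU 2 must be checked by FU 3 instead.
print(pick_reference_fu(2, "+", [1, 2, 3], {(1, "+")}))  # 3
```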

Preliminary Results
[Two slides of result charts; not preserved in the transcript.]

Conclusion
- The method can detect and repair permanent and transient faults.
- Known faults cause no delay; a new fault causes a delay of at most 2 · maxLat + 1.
- Multiple known faults can be repaired, as long as at least one operation of every pair is executed by a non-faulty FU.
- The overhead in operators and register-file ports is approximately 100%.
- The overhead of the control logic is not yet known (a VHDL model is still missing).
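Two statements of the conclusion can be written down directly: the repairability condition for multiple known faults and the delay bound for a new fault. The sketch below makes a few assumptions.
- Operators are identified by (FU, opcode).
- The delay is expressed in the same unit as maxLat, presumably clock cycles.
- `repairable` and `max_new_fault_delay` are hypothetical names.

```python
from typing import Iterable, Set, Tuple

Copy = Tuple[int, str]  # (FU number, opcode) executing one copy of an operation


def repairable(pairs: Iterable[Tuple[Copy, Copy]], faulty: Set[Copy]) -> bool:
    """True if every operation pair has at least one copy on a non-faulty operator."""
    return all(a not in faulty or b not in faulty for a, b in pairs)


def max_new_fault_delay(max_lat: int) -> int:
    """Upper bound on the delay caused by a newly occurring fault: 2 * maxLat + 1."""
    return 2 * max_lat + 1


pairs = [((2, "+"), (3, "+")), ((1, "*"), (4, "*"))]
print(repairable(pairs, {(3, "+")}))            # True
print(repairable(pairs, {(3, "+"), (2, "+")}))  # False
print(max_new_fault_delay(3))                   # 7
```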

Open Problems
- Handling multiple faults that first occur at the same time is possible but difficult.
- Faults in wires, registers, the control path, and the FD&C logic.
- A hardware implementation for better area and performance estimates.

Thank You!