NC STATE UNIVERSITY 1 Assertion-Based Microarchitecture Design for Improved Fault Tolerance Vimal K. Reddy Ahmed S. Al-Zawawi, Eric Rotenberg Center for.

Slides:



Advertisements
Similar presentations
NC STATE UNIVERSITY Transparent Control Independence (TCI) Ahmed S. Al-Zawawi Vimal K. Reddy Eric Rotenberg Haitham H. Akkary* *Dept. of Electrical & Computer.
Advertisements

1 Lecture 11: Modern Superscalar Processor Models Generic Superscalar Models, Issue Queue-based Pipeline, Multiple-Issue Design.
CS 7810 Lecture 4 Overview of Steering Algorithms, based on Dynamic Code Partitioning for Clustered Architectures R. Canal, J-M. Parcerisa, A. Gonzalez.
Federation: Repurposing Scalar Cores for Out- of-Order Instruction Issue David Tarjan*, Michael Boyer, and Kevin Skadron* University of Virginia Department.
UPC Microarchitectural Techniques to Exploit Repetitive Computations and Values Carlos Molina Clemente LECTURA DE TESIS, (Barcelona,14 de Diciembre de.
THE MIPS R10000 SUPERSCALAR MICROPROCESSOR Kenneth C. Yeager IEEE Micro in April 1996 Presented by Nitin Gupta.
Sim-alpha: A Validated, Execution-Driven Alpha Simulator Rajagopalan Desikan, Doug Burger, Stephen Keckler, Todd Austin.
School of Computing Exploiting Eager Register Release in a Redundantly Multi-threaded Processor Niti Madan Rajeev Balasubramonian University of Utah.
Microprocessor Microarchitecture Dependency and OOO Execution Lynn Choi Dept. Of Computer and Electronics Engineering.
DATAFLOW ARHITEKTURE. Dataflow Processors - Motivation In basic processor pipelining hazards limit performance –Structural hazards –Data hazards due to.
NC STATE UNIVERSITY ASPLOS-XII Understanding Prediction-Based Partial Redundant Threading for Low-Overhead, High-Coverage Fault Tolerance Vimal Reddy Sailashri.
National & Kapodistrian University of Athens Dep.of Informatics & Telecommunications MSc. In Computer Systems Technology Advanced Computer Architecture.
NC STATE UNIVERSITY DSN 2007 © Vimal Reddy Inherent Time Redundancy (ITR): Using Program Repetition for Low-Overhead Fault Tolerance Vimal Reddy.
CS 7810 Lecture 25 DIVA: A Reliable Substrate for Deep Submicron Microarchitecture Design T. Austin Proceedings of MICRO-32 November 1999.
Microarchitectural Approaches to Exceeding the Complexity Barrier © Eric Rotenberg 1 Microarchitectural Approaches to Exceeding the Complexity Barrier.
Transient Fault Tolerance via Dynamic Process-Level Redundancy Alex Shye, Vijay Janapa Reddi, Tipp Moseley and Daniel A. Connors University of Colorado.
1 Lecture 18: Core Design Today: basics of implementing a correct ooo core: register renaming, commit, LSQ, issue queue.
UPC Reducing Misspeculation Penalty in Trace-Level Speculative Multithreaded Architectures Carlos Molina ψ, ф Jordi Tubella ф Antonio González λ,ф ISHPC-VI,
ISLPED’03 1 Reducing Reorder Buffer Complexity Through Selective Operand Caching *supported in part by DARPA through the PAC-C program and NSF Gurhan Kucuk,
Cost-Efficient Soft Error Protection for Embedded Microprocessors
Slipstream Processors by Pujan Joshi1 Pujan Joshi May 6 th, 2008 Slipstream Processors Improving both Performance and Fault Tolerance.
1 Multi-Level Error Detection Scheme based on Conditional DIVA-Style Verification Kevin Lacker and Huifang Qin CS252 Project Presentation 12/10/2003.
University of Michigan Electrical Engineering and Computer Science 1 A Microarchitectural Analysis of Soft Error Propagation in a Production-Level Embedded.
Transient Fault Detection via Simultaneous Multithreading Shubhendu S. Mukherjee VSSAD, Alpha Technology Compaq Computer Corporation.
1 Sixth Lecture: Chapter 3: CISC Processors (Tomasulo Scheduling and IBM System 360/91) Please recall:  Multicycle instructions lead to the requirement.
Computer Engineering Group Brandenburg University of Technology at Cottbus 1 Ressource Reduced Triple Modular Redundancy for Built-In Self-Repair in VLIW-Processors.
1 Presented By Şahin DELİPINAR Simon Moore,Peter Robinson,Steve Wilcox Computer Labaratory,University Of Cambridge December 15, 1995 Rotary Pipeline Processors.
Complexity-Effective Superscalar Processors S. Palacharla, N. P. Jouppi, and J. E. Smith Presented by: Jason Zebchuk.
Error Detection in Hardware VO Hardware-Software-Codesign Philipp Jahn.
NC STATE UNIVERSITY Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g North Carolina State University Ali El-Haj-Mahmoud,
CPS3340 COMPUTER ARCHITECTURE Fall Semester, /3/2013 Lecture 9: Memory Unit Instructor: Ashraf Yaseen DEPARTMENT OF MATH & COMPUTER SCIENCE CENTRAL.
1/25 June 28 th, 2006 BranchTap: Improving Performance With Very Few Checkpoints Through Adaptive Speculation Control BranchTap Improving Performance With.
Redundant Multithreading Techniques for Transient Fault Detection Shubu Mukherjee Michael Kontz Steve Reinhardt Intel HP (current) Intel Consultant, U.
Implementing Precise Interrupts in Pipelined Processors James E. Smith Andrew R.Pleszkun Presented By: Shrikant G.
The life of an instruction in EV6 pipeline Constantinos Kourouyiannis.
Methodology to Compute Architectural Vulnerability Factors Chris Weaver 1, 2 Shubhendu S. Mukherjee 1 Joel Emer 1 Steven K. Reinhardt 1, 2 Todd Austin.
Low-cost Program-level Detectors for Reducing Silent Data Corruptions Siva Hari †, Sarita Adve †, and Helia Naeimi ‡ † University of Illinois at Urbana-Champaign,
Exploiting Value Locality in Physical Register Files Saisanthosh Balakrishnan Guri Sohi University of Wisconsin-Madison 36 th Annual International Symposium.
OOO Pipelines - III Smruti R. Sarangi Computer Science and Engineering, IIT Delhi.
Computer Structure 2015 – Intel ® Core TM μArch 1 Computer Structure Multi-Threading Lihu Rappoport and Adi Yoaz.
CS717 1 Hardware Fault Tolerance Through Simultaneous Multithreading (part 2) Jonathan Winter.
15-740/ Computer Architecture Lecture 12: Issues in OoO Execution Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 10/7/2011.
1/25 HIPEAC 2008 TurboROB TurboROB A Low Cost Checkpoint/Restore Accelerator Patrick Akl 1 and Andreas Moshovos AENAO Research Group Department of Electrical.
1 Lecture 10: Memory Dependence Detection and Speculation Memory correctness, dynamic memory disambiguation, speculative disambiguation, Alpha Example.
Ch2. Instruction-Level Parallelism & Its Exploitation 2. Dynamic Scheduling ECE562/468 Advanced Computer Architecture Prof. Honggang Wang ECE Department.
Dynamic Scheduling Why go out of style?
nZDC: A compiler technique for near-Zero silent Data Corruption
CS203 – Advanced Computer Architecture
Smruti R. Sarangi Computer Science and Engineering, IIT Delhi
Microprocessor Microarchitecture Dynamic Pipeline
UnSync: A Soft Error Resilient Redundant Multicore Architecture
Lecture 16: Core Design Today: basics of implementing a correct ooo core: register renaming, commit, LSQ, issue queue.
Hwisoo So. , Moslem Didehban#, Yohan Ko
Lecture 18: Core Design Today: basics of implementing a correct ooo core: register renaming, commit, LSQ, issue queue.
Lecture 8: ILP and Speculation Contd. Chapter 2, Sections 2. 6, 2
Lecture 11: Memory Data Flow Techniques
Smruti R. Sarangi Computer Science and Engineering, IIT Delhi
15-740/ Computer Architecture Lecture 5: Precise Exceptions
Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 9/30/2011
Instruction-Level Parallelism (ILP)
15-740/ Computer Architecture Lecture 10: Out-of-Order Execution
Overview Prof. Eric Rotenberg
Patrick Akl and Andreas Moshovos AENAO Research Group
Conceptual execution on a processor which exploits ILP
Spring 2019 Prof. Eric Rotenberg
ECE 721, Spring 2019 Prof. Eric Rotenberg.
Transparent Control Independence (TCI)
ECE 721 Alternatives to ROB-based Retirement
Sizing Structures Fixed relations Empirical (simulation-based)
ECE 721 Modern Superscalar Microarchitecture
Presentation transcript:

NC STATE UNIVERSITY 1 Assertion-Based Microarchitecture Design for Improved Fault Tolerance Vimal K. Reddy Ahmed S. Al-Zawawi, Eric Rotenberg Center for Embedded Systems Research (CESR) Department of Electrical & Computer Engineering North Carolina State University

NC STATE UNIVERSITY 2 Motivation Technology scaling –Smaller, faster transistors –Transistors more susceptible to transient faults How to build reliable processors using unreliable transistors?

NC STATE UNIVERSITY 3 Motivation Need efficient fault tolerance solutions Technology Scaling Performance No fault tolerance, erroneous operation Susceptible to transient faults Inefficient fault tolerance, correct operation No fault tolerance, correct operation Due to high overheads of inefficient fault tolerance

NC STATE UNIVERSITY 4 Redundant Multithreading (RMT) Duplicate a program and compare outcomes to detect transient faults Positives –Simple and straightforward –General solution –Complete fault coverage Negatives –High overhead

NC STATE UNIVERSITY 5 RMT using an extra processor Processor == Program Power Cost

NC STATE UNIVERSITY 6 RMT using simultaneous multithreading PROGRAM Processor == Performance Power Pipeline stages

NC STATE UNIVERSITY 7 Alternate solution: Targeted fault checks Add regimen of fault checks Specific to logic block Arbitrary latches: Robust latch design Arbitrary gates: Self-checking logic FSM: Self-checking FSM designs ALU: Self-checking ALUs, RESO Storage and buses: Parity, ECC Positives –No overhead of duplicating the program Negatives –Not general, i.e., many types of checks needed

NC STATE UNIVERSITY 8 Our contribution: Microarchitectural Assertions Novel class of fault checks Key Idea: Confirm µarch. “truths” within processor pipeline Catch-all checks for microarchitecture mechanisms Positives –Broad coverage (catch-all checks) –Very low-overhead solution (no redundant execution)

NC STATE UNIVERSITY 9 Spectrum of fault checks (RMT) Architecture Level == Processor == Processor

NC STATE UNIVERSITY 10 Spectrum of fault checks (RMT) Architecture Level Processor == Processor parity/ECC self-check FSM == robust flip-flop == self-check ALU == Logic Circuit Level

NC STATE UNIVERSITY 11 Spectrum of fault checks (RMT) Architecture Level == Processor Logic Circuit Level Processor == Processor

NC STATE UNIVERSITY 12 Spectrum of fault checks (RMT) Architecture Level == Processor Logic Circuit Level Processor == Microarchitecture Level == Processor μarch assert == μarch assert ==

NC STATE UNIVERSITY 13 Examples of Microarchitectural Assertions Register Name Authentication (RNA) –Aims to detect faults in renaming unit –Asserts consistencies in renaming unit Exploits inherent redundancy in renaming structures Asserts expected physical register states Timestamp-based Assertion Checking (TAC) –Aims to detect faults in issue unit –Asserts sequential order among data dependent instructions –Uses timestamps

NC STATE UNIVERSITY 14 p1 Rename Logic p2p3p4 Reorder Buffer r1 r2 r3 Architecture Map Table I1 (r1  ) Freelist I1 (r1:p1  ) old_r1:p5 Program I2 (r1  ) I3 (r1  ) I1 (r1  ) p1 I1 (r1:p1  ) I1 (r1:p1  ) old_r1:p5 p1 p5 p6 p7 p5 p6 p7 r1 r2 r3 Rename Map Table p1 I1 retires p2p3p4

NC STATE UNIVERSITY 15 Rename Logic p2p3p4 Reorder Buffer r1 r2 r3 Architecture Map Table Freelist Program I2 (r1  ) I3 (r1  ) I2 (r1  ) I2 (r1:p2  ) I2 (r1:p2  ) old_r1:p1 p1 p5 p6 p7 p6 p7 p2 I2 (r1:p2  ) old_r1:p1 r1 r2 r3 Rename Map Table p5 p1

NC STATE UNIVERSITY 16 Rename Logic Reorder Buffer r1 r2 r3 Architecture Map Table Freelist Program I3 (r1  ) p1 p5 p6 p7 p1 p6 p7 p2 I2 (r1:p2  ) old_r1:p1 Observation r1’s old == r1’s arch mapping Rename Map Table r1 r2 r3 p3p4p5

NC STATE UNIVERSITY 17 Rename Logic Reorder Buffer r1 r2 r3 Architecture Map Table Freelist r1 r2 r3 Rename Map Table Program I3 (r1  ) p1 p5 p6 p7 p6 p7 p2 I2 (r1:p2  ) old_r1:p1 RNA prev mapping check old_r1 == arch_r1? p1 == p1? p6 p1 p3p4p5

NC STATE UNIVERSITY 18 Rename Logic Reorder Buffer r1 r2 r3 Architecture Map Table Freelist Program I3 (r1  ) p1 p6 p7 I2 (r1:p2  ) old_r1:p1 p3 r1 r2 r3 Rename Map Table p1 p5 p6 p7 p6 I3 (r1:p3  ) I3 (r1:p3  ) old_r1:p6 p3 I3 (r1:p3  ) old_r1:p6 p4p5

NC STATE UNIVERSITY 19 Rename Logic Reorder Buffer r1 r2 r3 Architecture Map Table Freelist Program p1 p6 p7 I2 (r1:p2  ) old_r1:p1 RNA prev mapping check old_r1 == arch_r1? p1 == p1? r1 r2 r3 Rename Map Table p1 p5 p6 p7 p6p3 I3 (r1:p3  ) old_r1:p6 p4p5 p2 p4p5 I2 retires

NC STATE UNIVERSITY 20 Rename Logic Reorder Buffer r1 r2 r3 Architecture Map Table Freelist Program p6 p7 RNA prev mapping check old_r1 == arch_r1? p6 == p2? r1 r2 r3 Rename Map Table p1 p5 p6 p7 p6p3 I3 (r1:p3  ) old_r1:p6 p5p1 p2 FAULT DETECTED p4 At I3 retirement

NC STATE UNIVERSITY 21 p1p2p3 Rename Logic p4 Reorder Buffer r1 r2 r3 Architecture Map Table Freelist r1 r2 r3 Rename Map Table Program I1 (r1  ) I2 (r1  ) I1 (r1  ) p1 I1 (r1:p6  ) I1 (r1:p6  old_r1:p5) p6 p7 p5 p6 p7 p5 p6 I1 (r1:p6  ) old_r1:p5

NC STATE UNIVERSITY 22 p2p3 Rename Logic p4 Reorder Buffer r1 r2 r3 Architecture Map Table Freelist r1 r2 r3 Rename Map Table Program I2 (r1  ) p2 I2 (r1:p2  ) I2 (r1:p2  old_r1:p6) p6 p7 p5 p6 p7 p5 p6 I1 (r1:p6  ) old_r1:p5 p2 I2 (r1:p2  ) old_r1:p6

NC STATE UNIVERSITY 23 Rename Logic p3p4 Reorder Buffer r1 r2 r3 Architecture Map Table Freelist r1 r2 r3 Rename Map Table Program p6 p7 p5 p6 p7 p5 p6 I1 (r1:p6  ) old_r1:p5 p2 I2 (r1:p2  ) old_r1:p6 p3p4 I1 retires p6 RNA prev mapping check old_r1 == arch_r1? p5 == p5?

NC STATE UNIVERSITY 24 Rename Logic p3p4 Reorder Buffer r1 r2 r3 Architecture Map Table Freelist r1 r2 r3 Rename Map Table Program p6 p7 p6 p7 p5 p6 p2 I2 (r1:p2  ) old_r1:p6 p3p4 At I2 retirement p6 RNA prev mapping check old_r1 == arch_r1? p6 == p6? FAULT NOT DETECTED

NC STATE UNIVERSITY 25 p1 Rename Logic p2p3p4 Rdy/Free Bit Array Register File Freelist I2 (r2  ) r1 r2 r3 Rename Map Table p1 I1 (r1  ) FU p1 p2 p3 p4 RdyFree p1 p2 p3 p4 Issue Queue issue writeback Free bit unset I3 (r1  ) Program I1 (r1  ) p1 I1 (r1:p1  ) I1 (r1:p1  ) I1 (r1:p1  )

NC STATE UNIVERSITY 26 p2 Rename Logic p3p4 Rdy/Free Bit Array Register File r1 r2 r3 Rename Map Table p2 I2 (r2  ) FU p1 p2 p3 p4 RdyFree p1 p2 p3 p4 Issue Queue issue writeback I3 (r1  ) Program I2 (r2  ) p2 I2 (r2:p2  ) I2 (r2:p2  ) I1 (r1:p1  ) I2 (r2:p2  ) p1 0 Freelist

NC STATE UNIVERSITY 27 Rename Logic p3p4 Rdy/Free Bit Array Register File Freelist r1 r2 r3 Rename Map Table p1 FU p1 p2 p3 p4 Rdy Free p1 p2 p3 p4 Issue Queue issue writeback XXX Ready bit set Program I1 (r1:p1  ) I2 (r2:p2  ) Observation Ready bit == 0 (not executed) Free bit == 0 (not in freelist) 1 p2 XXX I3 (r1  )

NC STATE UNIVERSITY 28 Rename Logic p3p4 Rdy/Free Bit Array Register File Freelist r1 r2 r3 Rename Map Table p1 FU p1 p2 p3 p4 Rdy Free p1 p2 p3 p4 Issue Queue issue writeback XXX I3 (r1  ) Program p2 I3 (r1  ) p3 I3 (r1:p2  ) I3 (r1:p2  ) p2 I3 (r1:p2  ) 0 Ready of p2 == 0? FAULT DETECTED Free of p2 == 0? RNA writeback check XXX

NC STATE UNIVERSITY 29 RNA summary RNA previous mapping check –Detects faults in renaming structures Rename map table, Architecture map table Branch checkpoint tables Active list (renaming state) RNA writeback state check –Detects faults in renaming logic and freelist

NC STATE UNIVERSITY 30 Source renaming Pure source renaming faults undetected –E.g., fault in source renaming logic –Researching solutions similar to RNA However, faults causing deadlock are detectable –Faulty source name causing cyclic dependency –Faulty source name points to unpopped freelist entry –Other faults that cause phantom producers Use watchdog timer to detect deadlocks

NC STATE UNIVERSITY 31 Timestamp-based Assertion Checking (TAC) Confirm data dependent instructions issued sequentially –Assign timestamps to instructions at issue –At retirement, confirm instruction timestamp greater than producers’ timestamps TAC check Instruction timestamp >= Producer’s timestamp + latency Faults on checking logic can only cause false alarms

NC STATE UNIVERSITY 32 Scheduler FU H T I1 (r1  ) I3 (r1 ,  r1) I4 (r1 ,  r1) I2 (r3  ) I5 (r3 ,  r3) I6 (r3 ,  r3) I1I2I3I4I5I6 issue writeback Dataflow r1 r2 r3 Reorder Buffer Issue Queue retire I1I2I3I4I5I6 Timestamp Counter Lat. TS Lat. TAC check Instr TS >= Prod TS + Prod Lat Architectural State

NC STATE UNIVERSITY 33 Scheduler FU H T I1 (r1  ) I3 (r1 ,  r1) I4 (r1 ,  r1) I2 (r3  ) I5 (r3 ,  r3) I6 (r3 ,  r3) I1I2I3I4I5I6 issue writeback Dataflow Reorder Buffer Issue Queue HHHHHH retire Lat. TS r1 r2 r3 TSLat TAC check I1: 1 >= I2: 0 >= I3: 4 >= I4: 3 >= FAULT DETECTED Architectural State

NC STATE UNIVERSITY 34 Experiments Randomly injected faults in timing simulator –1000 faults per benchmark –Faults target issue and rename state –Simulation ends 1 million cycles following fault injection Observations –Fault detected by an assert (Assert) or not (Undet) –Fault corrupts architectural state (SDC) or not (Masked) Possible outcomes –Assert+SDC –Undet+SDC –Assert+Masked –Undet+Masked

NC STATE UNIVERSITY 35 TAC - Fault Injection Experiments Type of faults injected –Ready bits prematurely set –Speculatively issued cache-missing load dependents not reissued

NC STATE UNIVERSITY 36 TAC - Fault Injection Experiments 80% of faults cause SDC 20% of faults are masked All SDC faults are detected Zero undetected SDC faults

NC STATE UNIVERSITY 37 RNA - Fault Injection Experiments Faults injected –Flip bits of an entry in architecture map table –Flip bits of an entry in rename map table –Flip bits of an entry in freelist –Flip destination register bits at dispatch Watchdog timer included to detect deadlocks Additional possible outcomes –Assert+Wdog : RNA detected a future deadlock –Undet+Wdog : Deadlock undetected by RNA (possibly would have been caught by RNA in future)

NC STATE UNIVERSITY 38 RNA - Fault Injection Experiments 68% SDC faults 24% deadlocks 8% masked faults 88% of SDC faults detected by RNA 12% of SDC faults undetected 75% of deadlocks not detected by RNA (but detected by watchdog) 25% of deadlocks detected by RNA beforehand

NC STATE UNIVERSITY 39 RNA - Fault outcome distribution “dest” faults cause deadlocks because of phantom producers Deadlock blocks retirement Thus, RNA check can’t complete “arch_map” faults cause most undetected SDC Long live register range System trap before RNA check can complete

NC STATE UNIVERSITY 40 Conclusions (RMT) Architecture Level == Processor Logic Circuit Level Processor == Microarchitecture Level == Processor μarch assert == μarch assert ==