Download presentation
Presentation is loading. Please wait.
Published byShayla Goss Modified over 9 years ago
1
NC STATE UNIVERSITY 1 Assertion-Based Microarchitecture Design for Improved Fault Tolerance Vimal K. Reddy Ahmed S. Al-Zawawi, Eric Rotenberg Center for Embedded Systems Research (CESR) Department of Electrical & Computer Engineering North Carolina State University
2
NC STATE UNIVERSITY 2 Motivation Technology scaling –Smaller, faster transistors –Transistors more susceptible to transient faults How to build reliable processors using unreliable transistors?
3
NC STATE UNIVERSITY 3 Motivation Need efficient fault tolerance solutions Technology Scaling Performance No fault tolerance, erroneous operation Susceptible to transient faults Inefficient fault tolerance, correct operation No fault tolerance, correct operation Due to high overheads of inefficient fault tolerance
4
NC STATE UNIVERSITY 4 Redundant Multithreading (RMT) Duplicate a program and compare outcomes to detect transient faults Positives –Simple and straightforward –General solution –Complete fault coverage Negatives –High overhead
5
NC STATE UNIVERSITY 5 RMT using an extra processor Processor == Program Power Cost
6
NC STATE UNIVERSITY 6 RMT using simultaneous multithreading PROGRAM Processor == Performance Power Pipeline stages
7
NC STATE UNIVERSITY 7 Alternate solution: Targeted fault checks Add regimen of fault checks Specific to logic block Arbitrary latches: Robust latch design Arbitrary gates: Self-checking logic FSM: Self-checking FSM designs ALU: Self-checking ALUs, RESO Storage and buses: Parity, ECC Positives –No overhead of duplicating the program Negatives –Not general, i.e., many types of checks needed
8
NC STATE UNIVERSITY 8 Our contribution: Microarchitectural Assertions Novel class of fault checks Key Idea: Confirm µarch. “truths” within processor pipeline Catch-all checks for microarchitecture mechanisms Positives –Broad coverage (catch-all checks) –Very low-overhead solution (no redundant execution)
9
NC STATE UNIVERSITY 9 Spectrum of fault checks (RMT) Architecture Level == Processor == Processor
10
NC STATE UNIVERSITY 10 Spectrum of fault checks (RMT) Architecture Level Processor == Processor parity/ECC self-check FSM == robust flip-flop == self-check ALU == Logic Circuit Level
11
NC STATE UNIVERSITY 11 Spectrum of fault checks (RMT) Architecture Level == Processor Logic Circuit Level Processor == Processor
12
NC STATE UNIVERSITY 12 Spectrum of fault checks (RMT) Architecture Level == Processor Logic Circuit Level Processor == Microarchitecture Level == Processor μarch assert == μarch assert ==
13
NC STATE UNIVERSITY 13 Examples of Microarchitectural Assertions Register Name Authentication (RNA) –Aims to detect faults in renaming unit –Asserts consistencies in renaming unit Exploits inherent redundancy in renaming structures Asserts expected physical register states Timestamp-based Assertion Checking (TAC) –Aims to detect faults in issue unit –Asserts sequential order among data dependent instructions –Uses timestamps
14
NC STATE UNIVERSITY 14 p1 Rename Logic p2p3p4 Reorder Buffer r1 r2 r3 Architecture Map Table I1 (r1 ) Freelist I1 (r1:p1 ) old_r1:p5 Program I2 (r1 ) I3 (r1 ) I1 (r1 ) p1 I1 (r1:p1 ) I1 (r1:p1 ) old_r1:p5 p1 p5 p6 p7 p5 p6 p7 r1 r2 r3 Rename Map Table p1 I1 retires p2p3p4
15
NC STATE UNIVERSITY 15 Rename Logic p2p3p4 Reorder Buffer r1 r2 r3 Architecture Map Table Freelist Program I2 (r1 ) I3 (r1 ) I2 (r1 ) I2 (r1:p2 ) I2 (r1:p2 ) old_r1:p1 p1 p5 p6 p7 p6 p7 p2 I2 (r1:p2 ) old_r1:p1 r1 r2 r3 Rename Map Table p5 p1
16
NC STATE UNIVERSITY 16 Rename Logic Reorder Buffer r1 r2 r3 Architecture Map Table Freelist Program I3 (r1 ) p1 p5 p6 p7 p1 p6 p7 p2 I2 (r1:p2 ) old_r1:p1 Observation r1’s old == r1’s arch mapping Rename Map Table r1 r2 r3 p3p4p5
17
NC STATE UNIVERSITY 17 Rename Logic Reorder Buffer r1 r2 r3 Architecture Map Table Freelist r1 r2 r3 Rename Map Table Program I3 (r1 ) p1 p5 p6 p7 p6 p7 p2 I2 (r1:p2 ) old_r1:p1 RNA prev mapping check old_r1 == arch_r1? p1 == p1? p6 p1 p3p4p5
18
NC STATE UNIVERSITY 18 Rename Logic Reorder Buffer r1 r2 r3 Architecture Map Table Freelist Program I3 (r1 ) p1 p6 p7 I2 (r1:p2 ) old_r1:p1 p3 r1 r2 r3 Rename Map Table p1 p5 p6 p7 p6 I3 (r1:p3 ) I3 (r1:p3 ) old_r1:p6 p3 I3 (r1:p3 ) old_r1:p6 p4p5
19
NC STATE UNIVERSITY 19 Rename Logic Reorder Buffer r1 r2 r3 Architecture Map Table Freelist Program p1 p6 p7 I2 (r1:p2 ) old_r1:p1 RNA prev mapping check old_r1 == arch_r1? p1 == p1? r1 r2 r3 Rename Map Table p1 p5 p6 p7 p6p3 I3 (r1:p3 ) old_r1:p6 p4p5 p2 p4p5 I2 retires
20
NC STATE UNIVERSITY 20 Rename Logic Reorder Buffer r1 r2 r3 Architecture Map Table Freelist Program p6 p7 RNA prev mapping check old_r1 == arch_r1? p6 == p2? r1 r2 r3 Rename Map Table p1 p5 p6 p7 p6p3 I3 (r1:p3 ) old_r1:p6 p5p1 p2 FAULT DETECTED p4 At I3 retirement
21
NC STATE UNIVERSITY 21 p1p2p3 Rename Logic p4 Reorder Buffer r1 r2 r3 Architecture Map Table Freelist r1 r2 r3 Rename Map Table Program I1 (r1 ) I2 (r1 ) I1 (r1 ) p1 I1 (r1:p6 ) I1 (r1:p6 old_r1:p5) p6 p7 p5 p6 p7 p5 p6 I1 (r1:p6 ) old_r1:p5
22
NC STATE UNIVERSITY 22 p2p3 Rename Logic p4 Reorder Buffer r1 r2 r3 Architecture Map Table Freelist r1 r2 r3 Rename Map Table Program I2 (r1 ) p2 I2 (r1:p2 ) I2 (r1:p2 old_r1:p6) p6 p7 p5 p6 p7 p5 p6 I1 (r1:p6 ) old_r1:p5 p2 I2 (r1:p2 ) old_r1:p6
23
NC STATE UNIVERSITY 23 Rename Logic p3p4 Reorder Buffer r1 r2 r3 Architecture Map Table Freelist r1 r2 r3 Rename Map Table Program p6 p7 p5 p6 p7 p5 p6 I1 (r1:p6 ) old_r1:p5 p2 I2 (r1:p2 ) old_r1:p6 p3p4 I1 retires p6 RNA prev mapping check old_r1 == arch_r1? p5 == p5?
24
NC STATE UNIVERSITY 24 Rename Logic p3p4 Reorder Buffer r1 r2 r3 Architecture Map Table Freelist r1 r2 r3 Rename Map Table Program p6 p7 p6 p7 p5 p6 p2 I2 (r1:p2 ) old_r1:p6 p3p4 At I2 retirement p6 RNA prev mapping check old_r1 == arch_r1? p6 == p6? FAULT NOT DETECTED
25
NC STATE UNIVERSITY 25 p1 Rename Logic p2p3p4 Rdy/Free Bit Array Register File Freelist I2 (r2 ) r1 r2 r3 Rename Map Table p1 I1 (r1 ) FU p1 p2 p3 p4 RdyFree p1 p2 p3 p4 Issue Queue issue writeback 1 1 1 10 0 0 00 Free bit unset I3 (r1 ) Program I1 (r1 ) p1 I1 (r1:p1 ) I1 (r1:p1 ) I1 (r1:p1 )
26
NC STATE UNIVERSITY 26 p2 Rename Logic p3p4 Rdy/Free Bit Array Register File r1 r2 r3 Rename Map Table p2 I2 (r2 ) FU p1 p2 p3 p4 RdyFree p1 p2 p3 p4 Issue Queue issue writeback 1 1 10 0 0 0 0 I3 (r1 ) Program I2 (r2 ) p2 I2 (r2:p2 ) I2 (r2:p2 ) I1 (r1:p1 ) I2 (r2:p2 ) p1 0 Freelist
27
NC STATE UNIVERSITY 27 Rename Logic p3p4 Rdy/Free Bit Array Register File Freelist r1 r2 r3 Rename Map Table p1 FU p1 p2 p3 p4 Rdy Free p1 p2 p3 p4 Issue Queue issue writeback 0 0 1 10 0 0 01 XXX Ready bit set Program I1 (r1:p1 ) I2 (r2:p2 ) Observation Ready bit == 0 (not executed) Free bit == 0 (not in freelist) 1 p2 XXX I3 (r1 )
28
NC STATE UNIVERSITY 28 Rename Logic p3p4 Rdy/Free Bit Array Register File Freelist r1 r2 r3 Rename Map Table p1 FU p1 p2 p3 p4 Rdy Free p1 p2 p3 p4 Issue Queue issue writeback 0 0 1 10 0 1 1 XXX I3 (r1 ) Program p2 I3 (r1 ) p3 I3 (r1:p2 ) I3 (r1:p2 ) p2 I3 (r1:p2 ) 0 Ready of p2 == 0? FAULT DETECTED Free of p2 == 0? RNA writeback check XXX
29
NC STATE UNIVERSITY 29 RNA summary RNA previous mapping check –Detects faults in renaming structures Rename map table, Architecture map table Branch checkpoint tables Active list (renaming state) RNA writeback state check –Detects faults in renaming logic and freelist
30
NC STATE UNIVERSITY 30 Source renaming Pure source renaming faults undetected –E.g., fault in source renaming logic –Researching solutions similar to RNA However, faults causing deadlock are detectable –Faulty source name causing cyclic dependency –Faulty source name points to unpopped freelist entry –Other faults that cause phantom producers Use watchdog timer to detect deadlocks
31
NC STATE UNIVERSITY 31 Timestamp-based Assertion Checking (TAC) Confirm data dependent instructions issued sequentially –Assign timestamps to instructions at issue –At retirement, confirm instruction timestamp greater than producers’ timestamps TAC check Instruction timestamp >= Producer’s timestamp + latency Faults on checking logic can only cause false alarms
32
NC STATE UNIVERSITY 32 Scheduler FU H T I1 (r1 ) I3 (r1 , r1) I4 (r1 , r1) I2 (r3 ) I5 (r3 , r3) I6 (r3 , r3) I1I2I3I4I5I6 issue writeback Dataflow r1 r2 r3 Reorder Buffer Issue Queue retire I1I2I3I4I5I6 Timestamp Counter 1 014 3 25 111 1 11Lat. TS 234560 Lat. TAC check Instr TS >= Prod TS + Prod Lat Architectural State 00 00 00
33
NC STATE UNIVERSITY 33 Scheduler FU H T I1 (r1 ) I3 (r1 , r1) I4 (r1 , r1) I2 (r3 ) I5 (r3 , r3) I6 (r3 , r3) I1I2I3I4I5I6 issue writeback Dataflow Reorder Buffer Issue Queue HHHHHH retire 250 1 1 1 4 1 3 1 11Lat. TS r1 r2 r3 TSLat. 11 01 41 TAC check I1: 1 >= 0 + 0 I2: 0 >= 0 + 0 I3: 4 >= 1 + 1 I4: 3 >= 4 + 1 FAULT DETECTED Architectural State 00 00 00
34
NC STATE UNIVERSITY 34 Experiments Randomly injected faults in timing simulator –1000 faults per benchmark –Faults target issue and rename state –Simulation ends 1 million cycles following fault injection Observations –Fault detected by an assert (Assert) or not (Undet) –Fault corrupts architectural state (SDC) or not (Masked) Possible outcomes –Assert+SDC –Undet+SDC –Assert+Masked –Undet+Masked
35
NC STATE UNIVERSITY 35 TAC - Fault Injection Experiments Type of faults injected –Ready bits prematurely set –Speculatively issued cache-missing load dependents not reissued
36
NC STATE UNIVERSITY 36 TAC - Fault Injection Experiments 80% of faults cause SDC 20% of faults are masked All SDC faults are detected Zero undetected SDC faults
37
NC STATE UNIVERSITY 37 RNA - Fault Injection Experiments Faults injected –Flip bits of an entry in architecture map table –Flip bits of an entry in rename map table –Flip bits of an entry in freelist –Flip destination register bits at dispatch Watchdog timer included to detect deadlocks Additional possible outcomes –Assert+Wdog : RNA detected a future deadlock –Undet+Wdog : Deadlock undetected by RNA (possibly would have been caught by RNA in future)
38
NC STATE UNIVERSITY 38 RNA - Fault Injection Experiments 68% SDC faults 24% deadlocks 8% masked faults 88% of SDC faults detected by RNA 12% of SDC faults undetected 75% of deadlocks not detected by RNA (but detected by watchdog) 25% of deadlocks detected by RNA beforehand
39
NC STATE UNIVERSITY 39 RNA - Fault outcome distribution “dest” faults cause deadlocks because of phantom producers Deadlock blocks retirement Thus, RNA check can’t complete “arch_map” faults cause most undetected SDC Long live register range System trap before RNA check can complete
40
NC STATE UNIVERSITY 40 Conclusions (RMT) Architecture Level == Processor Logic Circuit Level Processor == Microarchitecture Level == Processor μarch assert == μarch assert ==
Similar presentations
© 2024 SlidePlayer.com Inc.
All rights reserved.