
1 NC STATE UNIVERSITY
Assertion-Based Microarchitecture Design for Improved Fault Tolerance
Vimal K. Reddy, Ahmed S. Al-Zawawi, Eric Rotenberg
Center for Embedded Systems Research (CESR)
Department of Electrical & Computer Engineering
North Carolina State University

2 Motivation
Technology scaling
– Smaller, faster transistors
– Transistors more susceptible to transient faults
How to build reliable processors using unreliable transistors?

3 Motivation
[Chart labels: Technology Scaling, Performance. Curves: "No fault tolerance, correct operation"; "No fault tolerance, erroneous operation" (susceptible to transient faults); "Inefficient fault tolerance, correct operation" (annotated: due to high overheads of inefficient fault tolerance).]
Need efficient fault tolerance solutions

4 Redundant Multithreading (RMT)
Duplicate a program and compare outcomes to detect transient faults
Positives
– Simple and straightforward
– General solution
– Complete fault coverage
Negatives
– High overhead

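5 RMT using an extra processor
[Diagram: the program runs on two processors and the outcomes are compared (==). Labels: Power, Cost.]

6 RMT using simultaneous multithreading
[Diagram: the program runs redundantly on one processor and the outcomes are compared (==). Labels: Performance, Power, Pipeline stages.]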

7 Alternate solution: Targeted fault checks
Add regimen of fault checks
Specific to logic block
– Arbitrary latches: Robust latch design
– Arbitrary gates: Self-checking logic
– FSM: Self-checking FSM designs
– ALU: Self-checking ALUs, RESO
– Storage and buses: Parity, ECC
Positives
– No overhead of duplicating the program
Negatives
– Not general, i.e., many types of checks needed

8 Our contribution: Microarchitectural Assertions
Novel class of fault checks
Key Idea: Confirm µarch. “truths” within processor pipeline
Catch-all checks for microarchitecture mechanisms
Positives
– Broad coverage (catch-all checks)
– Very low-overhead solution (no redundant execution)

9 Spectrum of fault checks
[Diagram: Architecture Level: RMT, two processors compared (==).]

10 Spectrum of fault checks
[Diagram: adds the Logic Circuit Level inside a processor: parity/ECC, self-check FSM, robust flip-flop, self-check ALU.]

11 Spectrum of fault checks
[Diagram: Architecture Level (RMT) and Logic Circuit Level shown together.]

12 Spectrum of fault checks
[Diagram: adds the Microarchitecture Level between the Architecture Level (RMT) and the Logic Circuit Level: μarch asserts inside the processor.]

13 Examples of Microarchitectural Assertions
Register Name Authentication (RNA)
– Aims to detect faults in renaming unit
– Asserts consistencies in renaming unit
    Exploits inherent redundancy in renaming structures
    Asserts expected physical register states
Timestamp-based Assertion Checking (TAC)
– Aims to detect faults in issue unit
– Asserts sequential order among data dependent instructions
– Uses timestamps

14 [Diagram: rename logic, rename map table and architecture map table (r1, r2, r3), freelist, reorder buffer. Program: I1, I2, I3, each with destination r1. I1 is renamed to r1:p1 and its reorder buffer entry records old_r1:p5. I1 retires.]

15 [Diagram: I2 is renamed to r1:p2 and its reorder buffer entry records old_r1:p1.]

16 [Diagram: Observation: r1’s old mapping == r1’s arch mapping.]

17 [Diagram: RNA prev mapping check for I2: old_r1 == arch_r1? p1 == p1?]

18 [Diagram: I3 is renamed to r1:p3 and its reorder buffer entry records old_r1:p6.]

19 [Diagram: I2 retires. RNA prev mapping check: old_r1 == arch_r1? p1 == p1?]

20 [Diagram: At I3 retirement. RNA prev mapping check: old_r1 == arch_r1? p6 == p2? FAULT DETECTED]
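
The previous mapping check walked through in slides 14 to 20 can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the authors' hardware design: the `Renamer` class, its field names, the initial identity mapping, and the register counts are all hypothetical, and the fault is injected by overwriting one rename map entry.

```python
from collections import deque


class Renamer:
    """Toy renaming unit with an RNA-style previous mapping check (illustrative only)."""

    def __init__(self, num_arch, num_phys):
        # Assumption: architectural register i starts out mapped to physical register i.
        self.rename_map = list(range(num_arch))
        self.arch_map = list(range(num_arch))
        self.freelist = deque(range(num_arch, num_phys))
        self.rob = deque()  # entries: (arch_dest, new_phys, old_phys)

    def rename(self, arch_dest):
        new_phys = self.freelist.popleft()
        old_phys = self.rename_map[arch_dest]  # previous mapping, kept until retirement
        self.rename_map[arch_dest] = new_phys
        self.rob.append((arch_dest, new_phys, old_phys))

    def retire(self):
        """Retire the oldest instruction; return False if the check fails."""
        arch_dest, new_phys, old_phys = self.rob.popleft()
        # The check: the recorded old mapping must equal the architectural mapping.
        if old_phys != self.arch_map[arch_dest]:
            return False
        self.freelist.append(old_phys)
        self.arch_map[arch_dest] = new_phys
        return True


r = Renamer(num_arch=3, num_phys=8)
r.rename(0)            # I1 writes register 0
i1_ok = r.retire()
r.rename(0)            # I2 writes register 0
r.rename_map[0] = 6    # injected fault: rename map entry corrupted
r.rename(0)            # I3 picks up the corrupted entry as its old mapping
i2_ok = r.retire()     # I2's old mapping still matches the architectural mapping
i3_ok = r.retire()     # I3's old mapping does not
print(i1_ok, i2_ok, i3_ok)
```

As in the slides, the corrupted entry is only exposed when the instruction that read it as its old mapping retires.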

21 [Diagram: second example. Freelist holds p1, p2, p3, p4. Program: I1, I2, each with destination r1. I1 is renamed to r1:p6 and its reorder buffer entry records old_r1:p5.]

22 [Diagram: I2 is renamed to r1:p2 and its reorder buffer entry records old_r1:p6.]

23 [Diagram: I1 retires. RNA prev mapping check: old_r1 == arch_r1? p5 == p5?]

24 [Diagram: At I2 retirement. RNA prev mapping check: old_r1 == arch_r1? p6 == p6? FAULT NOT DETECTED]

25 [Diagram: rename logic, rename map table, freelist, issue queue, FU, register file, and a Rdy/Free bit array for p1 to p4. Program: I1 (destination r1), I2 (destination r2), I3 (destination r1). I1 is renamed to r1:p1; the Free bit of p1 is unset.]

26 [Diagram: I2 is renamed to r2:p2.]

27 [Diagram: Ready bit set at writeback. Observation: Ready bit == 0 (not executed), Free bit == 0 (not in freelist).]

28 [Diagram: p3 is next in the freelist, but I3 is renamed to r1:p2. RNA writeback check: Ready of p2 == 0? Free of p2 == 0? FAULT DETECTED]
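
The writeback check in slides 25 to 28 can be sketched as follows. This is a minimal model, assuming one Ready bit and one Free bit per physical register; the `RdyFreeBits` class and its method names are hypothetical, and clearing the Ready bit at allocation is an assumption the slides do not spell out.

```python
class RdyFreeBits:
    """Toy Rdy/Free bit array with an RNA-style writeback check (illustrative only)."""

    def __init__(self, num_phys, free_regs):
        self.ready = [False] * num_phys
        self.free = [p in free_regs for p in range(num_phys)]

    def allocate(self, p):
        # Register popped from the freelist: Free bit unset.
        # Assumption: the Ready bit is cleared at the same time.
        self.free[p] = False
        self.ready[p] = False

    def release(self, p):
        # Register returned to the freelist.
        self.free[p] = True

    def writeback(self, p):
        """Return True if the destination is in the expected state before the write."""
        ok = (not self.ready[p]) and (not self.free[p])
        self.ready[p] = True
        return ok


bits = RdyFreeBits(num_phys=5, free_regs={1, 2, 3, 4})

bits.allocate(1)
i1_ok = bits.writeback(1)   # I1 writes p1: not ready, not free

bits.allocate(2)
i2_ok = bits.writeback(2)   # I2 writes p2: not ready, not free

bits.allocate(3)            # I3 should write p3 ...
i3_ok = bits.writeback(2)   # ... but a faulty destination name points at p2

stray_ok = bits.writeback(4)  # a destination still sitting in the freelist
print(i1_ok, i2_ok, i3_ok, stray_ok)
```

The two bits cover two different failure modes: a destination that has already been written (Ready set) and a destination that was never popped from the freelist (Free set).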

29 RNA summary
RNA previous mapping check
– Detects faults in renaming structures
    Rename map table, Architecture map table
    Branch checkpoint tables
    Active list (renaming state)
RNA writeback state check
– Detects faults in renaming logic and freelist

30 Source renaming
Pure source renaming faults undetected
– E.g., fault in source renaming logic
– Researching solutions similar to RNA
However, faults causing deadlock are detectable
– Faulty source name causing cyclic dependency
– Faulty source name points to unpopped freelist entry
– Other faults that cause phantom producers
Use watchdog timer to detect deadlocks
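
A watchdog timer of the kind mentioned above can be as simple as a counter of cycles since the last retirement. The sketch below is one plausible form, not the one used in the talk; the `Watchdog` class and the `limit` threshold are hypothetical.

```python
class Watchdog:
    """Counts consecutive cycles without a retirement (illustrative only)."""

    def __init__(self, limit):
        self.limit = limit  # assumed threshold, in cycles
        self.idle = 0

    def tick(self, retired):
        """Advance one cycle; return True once the pipeline looks deadlocked."""
        self.idle = 0 if retired else self.idle + 1
        return self.idle >= self.limit


wd = Watchdog(limit=3)
fired = [wd.tick(retired) for retired in (True, False, False, False)]
print(fired)
```

In practice the threshold would have to exceed the longest legitimate stall (for example, a long cache miss) to avoid false alarms.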

31 Timestamp-based Assertion Checking (TAC)
Confirm data dependent instructions issued sequentially
– Assign timestamps to instructions at issue
– At retirement, confirm instruction timestamp greater than producers’ timestamps
TAC check
Instruction timestamp >= Producer’s timestamp + latency
Faults on checking logic can only cause false alarms
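
The TAC check can be sketched as a retirement-time comparison against a per-register record of the last retired producer. This is a minimal model under assumptions: the `TacChecker` class, the per-architectural-register table, the zero initial state, and the latencies in the example are hypothetical, and timestamp counter wraparound is ignored.

```python
class TacChecker:
    """Toy retirement-time TAC check (illustrative only)."""

    def __init__(self, num_arch):
        # Timestamp and latency of the last retired producer of each register.
        self.prod_ts = [0] * num_arch
        self.prod_lat = [0] * num_arch

    def retire(self, ts, latency, srcs, dest=None):
        """Return True if: instr TS >= producer TS + producer latency, for every source."""
        ok = all(ts >= self.prod_ts[s] + self.prod_lat[s] for s in srcs)
        if dest is not None:
            self.prod_ts[dest] = ts
            self.prod_lat[dest] = latency
        return ok


tac = TacChecker(num_arch=4)
R1 = 1

a_ok = tac.retire(ts=1, latency=1, srcs=[], dest=R1)    # producer of r1
b_ok = tac.retire(ts=4, latency=1, srcs=[R1], dest=R1)  # issued after its producer finished
c_ok = tac.retire(ts=3, latency=1, srcs=[R1], dest=R1)  # issued before its producer finished
print(a_ok, b_ok, c_ok)
```

Because the comparison happens at retirement, in program order, a dependent instruction that issued too early is caught even though issue itself was out of order.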

32 [Diagram: scheduler with issue queue, FU, reorder buffer (head H, tail T) and a timestamp counter. Program dataflow: I1, I3, I4 operate on r1; I2, I5, I6 operate on r3. Reorder buffer entries and the architectural state (r1, r2, r3) each hold a timestamp (TS) and a latency (Lat.). TAC check: Instr TS >= Prod TS + Prod Lat.]

33 [Diagram: TAC checks at retirement. I1: 1 >= ..., I2: 0 >= ..., I3: 4 >= ..., I4: 3 >= ... FAULT DETECTED]

34 Experiments
Randomly injected faults in timing simulator
– 1000 faults per benchmark
– Faults target issue and rename state
– Simulation ends 1 million cycles following fault injection
Observations
– Fault detected by an assert (Assert) or not (Undet)
– Fault corrupts architectural state (SDC) or not (Masked)
Possible outcomes
– Assert+SDC
– Undet+SDC
– Assert+Masked
– Undet+Masked
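
The four outcome labels follow directly from the two observations. A small helper makes the mapping explicit; the function name `classify` and its boolean parameters are hypothetical, not taken from the authors' simulator.

```python
def classify(detected, corrupted):
    """Combine the two observations into one outcome label."""
    detection = "Assert" if detected else "Undet"
    effect = "SDC" if corrupted else "Masked"
    return detection + "+" + effect


outcomes = [classify(d, c) for d in (True, False) for c in (True, False)]
print(outcomes)
```

Of the four, Undet+SDC is the outcome the assertions aim to eliminate, while Assert+Masked amounts to a detection of a fault that would not have corrupted architectural state.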

35 TAC - Fault Injection Experiments
Type of faults injected
– Ready bits prematurely set
– Speculatively issued cache-missing load dependents not reissued

36 TAC - Fault Injection Experiments
80% of faults cause SDC
20% of faults are masked
All SDC faults are detected
Zero undetected SDC faults

37 RNA - Fault Injection Experiments
Faults injected
– Flip bits of an entry in architecture map table
– Flip bits of an entry in rename map table
– Flip bits of an entry in freelist
– Flip destination register bits at dispatch
Watchdog timer included to detect deadlocks
Additional possible outcomes
– Assert+Wdog: RNA detected a future deadlock
– Undet+Wdog: Deadlock undetected by RNA (possibly would have been caught by RNA in future)

38 RNA - Fault Injection Experiments
68% SDC faults
24% deadlocks
8% masked faults
88% of SDC faults detected by RNA
12% of SDC faults undetected
75% of deadlocks not detected by RNA (but detected by watchdog)
25% of deadlocks detected by RNA beforehand
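
The nested percentages above can be combined into shares of all injected faults. The arithmetic below is a reading aid, assuming the second-level figures (88%, 12%, 75%, 25%) are fractions of the SDC and deadlock categories exactly as the slide words them; the category labels in the dictionary are descriptive, not the authors' names.

```python
# First-level split of all injected faults, in percent (from the slide).
sdc, deadlock, masked = 68, 24, 8

# Second-level split, expressed as a share of all injected faults.
shares = {
    "SDC, detected by RNA": sdc * 88 / 100,
    "SDC, undetected": sdc * 12 / 100,
    "Deadlock, detected by RNA beforehand": deadlock * 25 / 100,
    "Deadlock, detected only by watchdog": deadlock * 75 / 100,
    "Masked": float(masked),
}
total = sum(shares.values())

for name, pct in shares.items():
    print(f"{name}: {pct:.2f}%")
```

Under that reading, undetected SDC is roughly 8% of all injected RNA faults, and the five shares sum to 100%.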

39 RNA - Fault outcome distribution
“dest” faults cause deadlocks because of phantom producers
– Deadlock blocks retirement
– Thus, RNA check can’t complete
“arch_map” faults cause most undetected SDC
– Long live register range
– System trap before RNA check can complete

40 Conclusions
[Diagram: spectrum of fault checks. Architecture Level (RMT, processors compared), Microarchitecture Level (μarch asserts inside the processor), Logic Circuit Level.]

