Presentation is loading. Please wait.

Presentation is loading. Please wait.

Layali Rashid, Karthik Pattabiraman and Sathish Gopalakrishnan D EPARTMENT OF E LECTRICAL AND C OMPUTER E NGINEERING T HE U NIVERSITY OF B RITISH C OLUMBIA.

Similar presentations


Presentation on theme: "Layali Rashid, Karthik Pattabiraman and Sathish Gopalakrishnan D EPARTMENT OF E LECTRICAL AND C OMPUTER E NGINEERING T HE U NIVERSITY OF B RITISH C OLUMBIA."— Presentation transcript:

1 Layali Rashid, Karthik Pattabiraman and Sathish Gopalakrishnan D EPARTMENT OF E LECTRICAL AND C OMPUTER E NGINEERING T HE U NIVERSITY OF B RITISH C OLUMBIA M ODELING T HE P ROPAGATION OF I NTERMITTENT H ARDWARE F AULTS IN P ROGRAMS

2 Motivation: Why Intermittent Faults? Intermittent faults are likely to be a significant concern in future processors [Cons08]. Transient FaultIntermittent Faults

3 Motivation

4 mov R1, #5 mov R2, #6 mov R3, #7 ld R4, R1, Array_Addr ld R5, R2, Array_Addr ld R6, R3, Array_Addr mult R7, R5, R4 Intermittent Fault

5 mov R1, #5 mov R2, #6 mov R3, #7 ld R4, R1, Array_Addr ld R5, R2, Array_Addr ld R6, R3, Array_Addr mult R7, R5, R4 Motivation Intermittent Fault Failure Program Execution time Intermittent Fault

6 Primary Research Questions Do all intermittent faults lead to program crash? How many instructions are executed before the program crashes? How many dynamic values are corrupted by the fault before the program crashes?

7 Contributions Understand intermittent fault propagation through fault-injection experiments in programs. Model the propagation of intermittent faults in programs at the instruction level. Validate the model using fault injections.

8 Motivation: Why Model Error Propagation? Fault injection experiments are prohibitively expensive. Intermittent faults vary in location and duration. More than two orders of magnitude slower than modeling. Modeling error propagation provides more insights that may help in tolerating faults.

9 Approach Fault Model SimpleScalar simulator Actual Results Fault Injections

10 Approach Fault Model SimpleScalar simulator Actual Results Fault Injections

11 Fault Model Assumption: An intermittent fault affects two or more consecutive instructions in the program. This fault model captures many faults including those in: ALU Load/Store Unit We specify an intermittent fault by: Fault start Fault length

12 Experimental Setup (1) Computation of intermittent fault propagation Intermittent fault injections in instruction-level simulator (SimpleScalar). Evaluate the Siemens benchmark suite.

13 Major Findings Failure categories: Of the intermittent faults that are non-benign, 95% result in a program crash. 91 to 95% of the faults cause program to crash within 300 instructions of the fault’s start. SimpleScalar simulator took 6 to 70 hours per program.

14 Approach Crash Model Dynamic Dependency Graph Fault Model Expected Results Fault Injection Actual Results

15 Approach Memory address Branch/jump address Function call address Crash Model Dynamic Dependency Graph Fault Model Expected Results Actual Results Fault Injection

16 Approach A directed acyclic graph that models the dynamic dependencies between instructions [Agrawal90]....... # ### Crash Model Dynamic Dependency Graph Fault Model Expected Results Actual Results Fault Injection

17 DDG Metrics Crash Distance (CD): number of instructions that execute from the time an intermittent fault occurs until the program crashes (due to fault). Intermittent Propagation Set (IPS): set of program values to which an intermittent fault propagates.

18 Example Code FragmentNode mov R1, #51 mov R2, #62 mov R3, #73 ld R4, R1, Array_Addr4 ld R5, R2, Array_Addr5 ld R6, R3, Array_Addr6 mult R7, R5, R47 Intermittent Error 1 2 5 7 Array_Addr #5#5 #6#6 3 6 #7#7...... Intermittent Propagation Set (1,2) = {?} Crash Distance (1, 2) = ? 4 Crash Node

19 Example Code FragmentNode mov R1, #51 mov R2, #62 mov R3, #73 ld R4, R1, Array_Addr4 ld R5, R2, Array_Addr5 ld R6, R3, Array_Addr6 mult R7, R5, R47 1 2 5 7 Array_Addr #5#5 #6#6 3 6 #7#7...... Intermittent Propagation Set (1,2) = {1, 2, 4} 4 Intermittent Error Crash Node 4

20 Example Code FragmentNode mov R1, #51 mov R2, #62 mov R3, #73 ld R4, R1, Array_Addr4 ld R5, R2, Array_Addr5 ld R6, R3, Array_Addr6 mult R7, R5, R47 Crash Distance (1, 2) = 4 4 Intermittent Error Crash Node 1 2 5 7 Array_Addr #5#5 #6#6 3 6 #7#7...... 4

21 Experimental Setup (2) Evaluating the model’s accuracy Construct the DDG of each program. Measure the difference between the predicted and the actual CD and IPS for crashes.

22 Performance Comparison Crash Model Dynamic Dependency Graph Fault Model Expected IPS and CD SimpleScalar simulator Actual IPS and CD 4 to 89 sec 6 to 70 hrs

23 75 to 82% of the predicted crash distances are within 10 nodes, 89 to 93% are within 100 nodes from actual crash distances. DDG Model Vs. SimpleScalar (CD)

24 DDG Model Vs. SimpleScalar (IPS) 88 to 94% of the difference sets’ cardinalities are within 10 nodes and that 93 to 97% are within 100 nodes.

25 Contributions Understand intermittent fault propagation through fault injection experiments in programs. Model the propagation of intermittent faults in programs at the instruction-level. Validate the model using fault injections.

26 Conclusions We enhanced Dynamic Dependency Graph to accurately model intermittent fault propagation in programs. Of the intermittent faults that are non-benign, 95% result in a program crash. 91 to 95% of the faults cause program to crash within 300 instructions of the fault’s start.

27 Discussion Detecting intermittent faults that affect programs using software-based techniques can be efficient. Diagnosing intermittent faults using software- based techniques can be accurate. Recovery using check-pointing techniques on the order of thousands of instructions will be effective.

28 T HE E ND

29 B ACKUP S LIDES

30 Effect of Intermittent Fault Length The average actual CD is 7 to 8 nodes. Hence, most crashes occur within 8 nodes from the start of the intermittent fault. 30


Download ppt "Layali Rashid, Karthik Pattabiraman and Sathish Gopalakrishnan D EPARTMENT OF E LECTRICAL AND C OMPUTER E NGINEERING T HE U NIVERSITY OF B RITISH C OLUMBIA."

Similar presentations


Ads by Google