1 Multi-Level Error Detection Scheme based on Conditional DIVA-Style Verification Kevin Lacker and Huifang Qin CS252 Project Presentation 12/10/2003.

2 Outline
– Motivation
– Verification system architecture
– Error statistics
– Performance results
– Conclusion and future ideas

3 DIVA and Others: Existing Processor Dynamic Verification Schemes
Watchdog μP
– Approaches and algorithms used: control-flow analysis (signature comparison, frame construction); data reasonableness check (Gaussian elimination); memory access validation (capability-based addressing)
– Weaknesses: complexity in algorithm development; sometimes requires pre-compiling; effective for special-purpose programs
SW multithreaded execution
– Algorithm used: dynamically scheduled execution of program copies
– Weaknesses: complexity in designing dynamic scheduling; cannot detect permanent faults
Dynamic Implementation Verification Architecture (DIVA)
– T. M. Austin, “DIVA: a reliable substrate for deep submicron microarchitecture design,” ACM/IEEE international symposium on microarchitecture, 1999
– Simple scheme
– High error coverage
– Exploits the abundant computation power modern technology provides

4 What’s the Price DIVA is Paying?
High degree of redundancy in re-executing every single instruction
– Performance overhead with limited ROB size and checker speed
– High power consumption
Especially inefficient given the small error rate in most situations
– How often does an error hit a running processor? For example, cosmic rays cause 4000 FIT (failures in 10^9 hours) for a modern processor with on-chip caches, i.e. one soft error every 28.5 years.
– When an error happens, how does it affect execution correctness? Speculation, ineffectual computation, uninvolved logic, stall cycles and dead values all help mask errors.
[Figure: error outcomes, labeled "State mismatch", "Time out" and "Error not manifested". Source: Sanjay J. Patel, “Assertion/recovery: a microarchitecture for error-tolerant computing systems,” C2S2 workshop, 2003]
Can we reduce DIVA activity efficiently to save these costs?
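The 28.5-year figure follows directly from the definition of FIT. A minimal sketch of the arithmetic, assuming a 365-day year (the function name is illustrative, not from the slides):

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, leap years ignored (assumption)

def fit_to_mtbf_years(fit: float) -> float:
    """Convert a FIT rate (failures per 10**9 device-hours) into the
    mean time between failures, expressed in years."""
    hours_between_failures = 1e9 / fit
    return hours_between_failures / HOURS_PER_YEAR

# 4000 FIT is the cosmic-ray figure quoted on the slide.
print(f"{fit_to_mtbf_years(4000):.1f} years")  # 28.5 years
```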

5 Conditional DIVA-Style Verification: Go Faster with Less Power
[Diagram: the core μP output passes through Level 1 error detection ("Indicating a possible error?"). On N, the instruction goes to instruction commit. On Y, it goes to the DIVA checker (Level 2 error detection), which drives error recovery control. Levels 1 and 2 together form the enhanced DIVA checker.]
Idea: DIVA only checks when a possible error is indicated
– The core processor runs faster with less interference from the DIVA checker.
– The DIVA checker burns less power with a reduced workload.
– The advantages of the DIVA scheme, such as simplicity and high error coverage, are inherited.
Penalty: error coverage will not be 100%, because the error indicator can miss.
Questions:
– What are the effective error indicators?
– What is the optimum point of the design tradeoff?

6 Conditional DIVA Scheme: System Implementation
[Diagram: core μP, ROB with a per-entry possible-error marker bit (shown as 1, 1, 0), DIVA checker, instruction commit, error recovery control]
Rules:
– The DIVA checker only checks instructions marked as possible victims of error.
– In case of ROB congestion, the oldest finished instructions are retired directly, so there is no performance hit; otherwise, marked instructions are checked.
Error recovery model:
– Flush the core processor when the DIVA checker finds an error.
– Fatal errors are recovered by re-executing the 10 instructions before the crash point.
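The commit rules on this slide can be read as a small decision function. The sketch below is one possible interpretation, not the authors' implementation: the function name, the action labels, and the "wait" case for a busy checker without ROB congestion are all assumptions.

```python
def commit_action(marked: bool, checker_free: bool, rob_congested: bool) -> str:
    """Decide what happens to the oldest finished instruction in the ROB.

    marked        -- the Level 1 indicator flagged a possible error
    checker_free  -- the DIVA checker can accept an instruction this cycle
    rob_congested -- the ROB is full and would stall the core
    """
    if not marked:
        return "retire"            # unmarked instructions never visit the checker
    if rob_congested:
        return "retire_unchecked"  # favor performance; an error here is missed
    if checker_free:
        return "check"             # marked instruction is verified by the checker
    return "wait"                  # assumption: hold in the ROB until a slot frees

# Walk through a few hypothetical situations.
cases = [
    (False, True, False),
    (True, True, False),
    (True, False, False),
    (True, False, True),
]
for marked, free, congested in cases:
    print(marked, free, congested, "->", commit_action(marked, free, congested))
```

Putting the congestion test ahead of the checker test mirrors the slide's stated priority: the core is never stalled on behalf of the checker, at the price of some unchecked marked instructions.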

7 Conditional DIVA Scheme: Design for Optimum Tradeoffs
Design tradeoffs: error coverage, hardware/power costs, performance overhead
Design choices: effectiveness of error indicators (Level 1), ROB overflow handling scheme, DIVA checker latency (Level 2)
Goal of this design: find the most effective error indicators to maximize error coverage and minimize costs.
– We assume a performance-favorable ROB overflow handling scheme. The DIVA checker never interferes with core processor execution, so the performance overhead is minimized.
– DIVA checker latency will be chosen to balance error coverage against hardware/power costs.

8 Simulation Setup & Error Model
SimpleScalar/PISA 3.0 tool set, instruction bandwidth 4
SPEC2000 benchmarks (gzip, vpr, gcc, mcf, parser, vortex, bzip2, twolf, mesa, art, equake)
All hardware and transient errors covered by DIVA (source of real errors, whether tested, error injection model):
– Crosstalk (electrical disturbances in logic values held in circuits and wires). Tested: yes. Injection model: random bit flips in register files storing the ALU calculation results.
– Radiation interference (gamma rays and alpha particles). Tested: yes. Injection model: random bit flips in register files.
– Transmission errors during communication between two levels of memory or between cache and processor. Tested: yes. Injection model: random bit flips in register files storing the memory access results.
– Circuit flaws caused by process defects and variations in deep sub-micron technologies. Tested: no. Injection model: N/A.
– Computer architecture bugs caused by increased design complexity. Tested: no. Injection model: N/A.
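The injection model for the tested error sources is a random bit flip in a register value. A minimal sketch of such a flip, assuming 32-bit registers and a uniformly chosen bit position (neither detail is stated on the slide, and the helper name is hypothetical):

```python
import random
from typing import Optional

def flip_random_bit(value: int, width: int = 32,
                    rng: Optional[random.Random] = None) -> int:
    """Return `value` with exactly one randomly chosen bit inverted.

    `width` is the assumed register width in bits; the result is masked
    so it always fits in that many bits.
    """
    rng = rng or random.Random()
    bit = rng.randrange(width)          # uniform choice of bit position
    mask = (1 << width) - 1
    return (value ^ (1 << bit)) & mask  # XOR inverts only the chosen bit

# Seeded generator so that a simulation run is repeatable.
rng = random.Random(2003)
clean = 0x000000FF
faulty = flip_random_bit(clean, rng=rng)
print(f"{clean:#010x} -> {faulty:#010x}")
```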

9 Error Statistics: Effective Indicators of Possible Error
Data prediction is effective
– High rate of correctly predicting a correct instruction; very low miss rate.
– Multiple data predictors can be combined: if any of them is correct, the instruction is marked as non-error.
Good data predictors
– Constant stride (s): 2, 4, 6, 8, …
– Repeat pattern (r_n): 6, 4, 6, 4, …
– Incremental (p_n): 1, 2, 3, 4, …
The combination of s and r_1 to r_4 is selected.
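The selected combination (s plus r_1 to r_4) can be sketched as a Level 1 indicator. The code below assumes that r_n predicts the value produced n results earlier and that history is kept per static instruction; both are readings of the slide rather than stated facts, and all names are illustrative.

```python
from typing import Optional, Sequence

def stride_prediction(history: Sequence[int]) -> Optional[int]:
    """Constant stride (s): the next value continues the last difference."""
    if len(history) < 2:
        return None
    return history[-1] + (history[-1] - history[-2])

def repeat_prediction(history: Sequence[int], n: int) -> Optional[int]:
    """Repeat pattern (r_n), assumed to predict the value seen n results ago."""
    if len(history) < n:
        return None
    return history[-n]

def possible_error(history: Sequence[int], result: int, max_period: int = 4) -> bool:
    """Mark the result as a possible error only if no predictor matches it."""
    predictions = [stride_prediction(history)]
    predictions += [repeat_prediction(history, n) for n in range(1, max_period + 1)]
    # A predictor without enough history returns None, which never matches.
    return all(p != result for p in predictions)

print(possible_error([2, 4, 6], 8))  # False: the stride predictor matches
print(possible_error([6, 4, 6], 4))  # False: r_2 matches
print(possible_error([2, 4, 6], 9))  # True: no predictor matches
```

With an empty history every result is marked, so a cold predictor errs on the side of sending work to the checker.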

10 To Improve Error Coverage: Queueing Theory
Assume the checker queue before modifications has utilization u > 1.
If our only tool is dropping instructions indiscriminately:
– Dropping each instruction with probability d leads to u’ = (1-d)u
– To prevent overflow, u’ ≤ 1, so d ≥ 1 - 1/u
– We miss errors at a rate of at least 1 - 1/u
If we have an indicator that marks non-errors at a rate of a and marks errors at a rate of e:
– Drop all marked instructions and drop the unmarked ones at a rate of d
– u’ = (1-a)(1-d)u
– To prevent overflow, u’ ≤ 1, so d ≥ 1 - 1/((1-a)u)
– We miss errors at a rate of e + (1-e)d, which is at least 1 - (1/u)(1-e)/(1-a)
Conclusion: if a > e, conditional verification is a net gain.
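The two bounds on this slide can be compared numerically. The sketch below implements them as written; the sample values of u, a and e are hypothetical and chosen only to illustrate the a > e condition.

```python
def min_miss_rate_blind(u: float) -> float:
    """Lowest miss rate when instructions are dropped indiscriminately.

    The drop probability d must bring utilization (1-d)u down to 1.
    """
    if u <= 1:
        return 0.0          # checker keeps up, nothing needs to be dropped
    return 1 - 1 / u

def min_miss_rate_conditional(u: float, a: float, e: float) -> float:
    """Lowest miss rate with an indicator that marks non-errors at rate a
    and (wrongly) marks errors at rate e. Marked instructions are dropped."""
    reduced = (1 - a) * u                # load left after dropping marked ones
    d = max(0.0, 1 - 1 / reduced)        # extra random drops still required
    return e + (1 - e) * d               # marked errors + dropped unmarked errors

u, a, e = 2.0, 0.4, 0.1                  # hypothetical values with a > e
print(min_miss_rate_blind(u))            # 0.5
print(round(min_miss_rate_conditional(u, a, e), 4))  # 0.25
```

With these numbers the conditional scheme halves the miss rate, which agrees with the closed form 1 - (1/u)(1-e)/(1-a) = 1 - 0.9/1.2.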

11 Error Coverage and Check Activity Reduction: Performance with Bandwidth-1 Checker

12 Error Coverage and Check Activity Reduction: Performance with Bandwidth-2 Checker
Check bandwidth higher than 2 provides only limited gain in error coverage.
At a bandwidth of 2, up to 25% of DIVA activity can be saved, with a maximum loss of only 0.8% in error coverage.

13 Conclusions
The proposed scheme achieves
– Zero performance overhead in core processor execution
– For bandwidth 2, average checker workload reduced by 10%
– For bandwidth 1, average checker workload reduced by 45%
The penalty
– For bandwidth 2, average error coverage is 99.2%
– For bandwidth 1, average error coverage is 83.9%
Not perfect, but a factor of 6 improvement in mean time between uncaught errors
Future ideas
– Program-specific indicators
– Correlate error properties with program correctness
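One way to read the "factor of 6" figure is as the bandwidth-1 coverage compared against running with no verification at all. That baseline is an assumption, since the slide does not state it. Under that reading, the improvement in mean time between uncaught errors is the reciprocal of the miss rate:

```python
def uncaught_error_improvement(coverage: float) -> float:
    """Factor by which the mean time between uncaught errors grows,
    relative to an unchecked processor, when a fraction `coverage`
    of errors is caught."""
    return 1.0 / (1.0 - coverage)

# Average coverage figures quoted on the slide.
print(f"bandwidth 1: {uncaught_error_improvement(0.839):.1f}x")
print(f"bandwidth 2: {uncaught_error_improvement(0.992):.1f}x")
```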
