University of Michigan Electrical Engineering and Computer Science 1 A Microarchitectural Analysis of Soft Error Propagation in a Production-Level Embedded.

1 University of Michigan Electrical Engineering and Computer Science
A Microarchitectural Analysis of Soft Error Propagation in a Production-Level Embedded Microprocessor
Jason Blome, Scott Mahlke, Daryl Bradley*, Krisztián Flautner*
Advanced Computer Architecture Lab, University of Michigan
*ARM Ltd.

2 Embedded Everywhere (Patterson and Hennessy 2005)
Not just cellphones
Safety-critical applications:
► Automotive
► Healthcare

3 Embedded Domain Constraints
Power-efficient performance
► Longer clock cycle times
► Increased logic depth between stages
► Higher area ratio of combinational logic to state elements
Less speculative state
► Potentially less masking
Limited real estate
All of these high-level constraints affect the behavior of faults and the potential of fault tolerance techniques

4 Objectives
Understand the effects of transient faults on a typical embedded design
► Architectural contributions to soft error effects
► Production-grade core
  Reference synthesis flow
  Design for test methodologies
Simulate faults in both combinational and sequential logic

5 Soft Error Rate Contributions
[Figures: Soft Error Rate Contributions (Shivakumar 2002); Soft Error Rate Contributions (Mitra 2005)]
Increasing contribution of faults in combinational logic to the overall soft error rate

6 Processor Model
ARM926EJ-S
[Block diagram labels: Instruction Fetch, Instruction Decode, Register Bank, Multiply, ALU, Shift, Instruction Address Logic, Data Address Logic, Data Interface, Mux Array, Instruction cache, Data cache, MMU, Bus Interface, Write Buffer/Bus Interface]
Cell library characterized for 130 nm
5 ns clock cycle time

7 Analysis Infrastructure
[Diagram labels: testbench, reference design, test design, benchmark, fault injection/error analysis framework, fault injection scheduler, error checking and logging, report generation]
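The components named on this slide (a reference design, a test design, a fault injection scheduler, and error checking and logging) suggest a lockstep comparison. The following is a minimal sketch of that idea only. The `ToyPipeline` model, its `inject` hook, and `run_trial` are hypothetical stand-ins, not the authors' framework, which the slides describe as operating on the synthesized ARM926EJ-S design.

```python
class ToyPipeline:
    """Hypothetical 3-register pipeline (an assumption for illustration).

    Each cycle: r2 <- r1 & 0x0F, r1 <- r0 + 1, r0 <- input.
    """

    def __init__(self):
        self.regs = [0, 0, 0]

    def step(self, value):
        r0, r1, _ = self.regs
        self.regs = [value & 0xFF, (r0 + 1) & 0xFF, r1 & 0x0F]

    def inject(self, reg, bit):
        # Model a single-bit upset in one state element.
        self.regs[reg] ^= 1 << bit


def run_trial(inputs, fault_cycle, reg, bit):
    """Run reference and test designs in lockstep.

    Returns the number of mismatching state bits after every cycle,
    which plays the role of the error checking and logging step.
    """
    ref, test = ToyPipeline(), ToyPipeline()
    log = []
    for cycle, value in enumerate(inputs):
        ref.step(value)
        test.step(value)
        if cycle == fault_cycle:
            test.inject(reg, bit)
        diff = sum(bin(a ^ b).count("1") for a, b in zip(ref.regs, test.regs))
        log.append(diff)
    return log


if __name__ == "__main__":
    stimulus = [1, 2, 3, 4, 5, 6]
    print(run_trial(stimulus, fault_cycle=1, reg=0, bit=7))
    print(run_trial(stimulus, fault_cycle=1, reg=0, bit=0))
```

In this toy model a flip of bit 7 disappears once it reaches the `& 0x0F` stage, while a flip of bit 0 grows into a multi-bit mismatch before being overwritten, which loosely mirrors the masking and propagation behavior the later slides measure.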

8 Fault Masking
Logical: the faulted value does not affect the logical operation of the circuit
Latching-window: the fault pulse does not reach a state element within the latching window (the setup and hold interval around the clock edge)
Electrical: the fault pulse is electrically attenuated by subsequent gates in the circuit
Architectural/Software: incorrect state is overwritten before it is read
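Two of these masking mechanisms can be illustrated with a few lines of Python. This is a simplified sketch under stated assumptions: the 2-input AND gate, the function names, and the timing values (in nanoseconds) are all hypothetical, and the latching-window check treats any overlap between the pulse and the setup/hold interval as a captured fault.

```python
def and_gate(a, b):
    return a & b


def logically_masked(a, b, faulted_input):
    """True if flipping one input of a 2-input AND leaves the output unchanged."""
    clean = and_gate(a, b)
    if faulted_input == "a":
        faulty = and_gate(a ^ 1, b)
    else:
        faulty = and_gate(a, b ^ 1)
    return clean == faulty


def latching_window_masked(pulse_start, pulse_width, clk_edge, t_setup, t_hold):
    """True if the pulse misses the interval [clk_edge - t_setup, clk_edge + t_hold]."""
    pulse_end = pulse_start + pulse_width
    window_start = clk_edge - t_setup
    window_end = clk_edge + t_hold
    return pulse_end < window_start or pulse_start > window_end


if __name__ == "__main__":
    # With the other input at 0, the AND output stays 0 whatever happens on "a".
    print(logically_masked(0, 0, "a"))
    # With both inputs at 1, a flip on "a" changes the output.
    print(logically_masked(1, 1, "a"))
    # A pulse early in a 5 ns cycle never reaches the latching window.
    print(latching_window_masked(1.0, 0.2, 5.0, 0.1, 0.05))
    # A pulse arriving just before the clock edge overlaps the window.
    print(latching_window_masked(4.85, 0.1, 5.0, 0.1, 0.05))
```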

9 Observed Error Rates
Faults Occurring in Registers
Error Site                 Error Rate   Masking Rate
Microarchitectural State   94%          6%
Architectural State        7%           93%
Top-level Ports            4%           96%
Faults Occurring in Combinational Logic
Error Site                 Error Rate   Masking Rate
Microarchitectural State   16%          84%
Architectural State        4%           96%
Top-level Ports            3%           97%
At the software interface, error rates within 3%
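The arithmetic behind this slide is simple enough to check directly. In the sketch below, the percentages are transcribed from the slide. Which table belongs to registers and which to combinational logic is an assumption based on the order of the captions, and the helper names are hypothetical.

```python
# Error rates in percent, transcribed from the slide. The assignment of each
# table to "registers" or "combinational" is assumed from the caption order.
error_rate = {
    "registers": {"microarchitectural": 94, "architectural": 7, "ports": 4},
    "combinational": {"microarchitectural": 16, "architectural": 4, "ports": 3},
}


def masking_rate(rate_percent):
    """Masking rate is the complement of the error rate."""
    return 100 - rate_percent


def gap(site):
    """Absolute difference in error rate between the two fault classes."""
    return abs(error_rate["registers"][site] - error_rate["combinational"][site])


if __name__ == "__main__":
    for site in ("microarchitectural", "architectural", "ports"):
        print(site, gap(site))
```

The large gap at the microarchitectural level shrinks to a few percentage points at the architectural state and the top-level ports, which is the point the slide's closing remark makes.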

10 Observed Error Rates
Faults Occurring in Registers
Cycle   Average Bit Errors
1       1.26
2       3.19
3       3.06
4       5.52
Faults Occurring in Combinational Logic
Cycle   Average Bit Errors
1       41.49
2       45.33
3       47.76
4       49.54
Faults in combinational logic have a much more dramatic effect on system state

11 Architectural Errors per Cycle
[Plots: Faults Occurring in Registers; Faults Occurring in Combinational Logic]

12 Architectural Corruption Characteristics
[Plots: Bits per Architectural Register Corrupted; Number of Architectural Registers Corrupted]

13 Results Summary
Faults occurring in logic:
► Will likely be much more frequent in embedded designs
► Tend to have a more dramatic effect on system state
► Multi-bit/multi-register architectural errors are common
Design for test methodologies can greatly impact soft error characteristics
Error rates at the software interface are consistent with those observed in high-performance microprocessors

14 Traditional Error Detection/Protection
Reliable Encoding
► ECC/Parity
  Limited use for faults in logic
  Unclear where/how much to protect
Redundant Computation
► In space
  Area/energy overhead
► In time
  Energy overhead
  Requires performance slack
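The three traditional techniques listed here can be sketched in software terms. This is an illustrative sketch only: the function names are hypothetical, the parity scheme shown is plain even parity over one word, and real designs implement these mechanisms in hardware.

```python
def parity_bit(word):
    """Even-parity bit over the bits of a non-negative integer."""
    return bin(word).count("1") % 2


def parity_ok(word, stored_parity):
    """Reliable encoding: compare recomputed parity with the stored bit."""
    return parity_bit(word) == stored_parity


def tmr_vote(a, b, c):
    """Redundancy in space: bitwise majority of three redundant copies."""
    return (a & b) | (a & c) | (b & c)


def run_twice(fn, x):
    """Redundancy in time: execute twice and compare.

    Returns (first_result, mismatch_detected).
    """
    first, second = fn(x), fn(x)
    return first, first != second


if __name__ == "__main__":
    word = 0b10110010
    stored = parity_bit(word)
    print(parity_ok(word ^ 0b00000001, stored))  # single-bit flip is detected
    print(parity_ok(word ^ 0b00000011, stored))  # double-bit flip goes unnoticed
    print(bin(tmr_vote(0b1010, 0b1010, 0b0010)))  # the corrupted copy is outvoted
```

The double-bit case is one way to read the slide's remark that parity has limited use for faults in logic, given the multi-bit errors reported on the results slides.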

15 Case Study I
Fault site (from the processor block diagram): IRoute
Cycle 1: 51 errors
► instr_reg_ID[0, 16, 22, 31]
► ID_decode_info[0, 16, 31]
► stored_instr[29, 30]
Cycle 2: 51 errors
► instr_reg_EX[0, 16, 22, 31]
► EX_decode_info[0, 16, 31]
Cycle 3: 17 errors
► ALU_out[0, 1, 2, 3, 4, 5, 6]
Cycle 4: 18 errors
► ALU_result_wb[0, 1, 2, 3, 4, 5, 6]
Cycle 5: 29 errors
► Reg0_reg[0, 1, 2, 3, 4, 5, 6]

16 Case Study II
Fault site (from the processor block diagram): IPipe
Cycle 1: 9 errors
► instr_reg_ID[3, 12, 17, 18, 24, 26, 29, 30, 31]
Cycle 2: 62 errors
► instr_reg_EX
► shifter_data_opEx_reg
► Shifter_data_reg
► alu_cc_reg
Cycle 3: 49 errors
► Shifter_data_EX
► alu_out_reg
Cycle 4: 183 errors
► writeback and forwarding state
► register bank

17 Fault Characteristics
Case Study I: uCORE.uIRoute.U600
► First cycle error sites: 51 errors
  uIRoute.INSTRHeld_reg[0]
  uIRoute.INSTRHeld_reg[16]
  uIRoute.INSTRHeld_reg[22]
  uIRoute.INSTRHeld_reg[31]
  u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[0]
  u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[16]
  u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[31]
  u9EJ.uARM9.uCORECTL.uIPIPE.StoredInstrInt_reg[29]
  u9EJ.uARM9.uCORECTL.uIPIPE.StoredInstrInt_reg[30]
Case Study II: uCORE.u9EJ.uARM9.uCORECTL.uIPIPE.U3626
► First cycle error sites: 9 errors
  u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[3]
  u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[12]
  u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[17]
  u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[18]
  u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[24]
  u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[26]
  u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[29]
  u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[30]
  u9EJ.uARM9.uCORECTL.uIPIPE.IDarmDeint_reg[31]

18 Embedded Design Space Potential
Leverage significant signal fanout
Determine that a fault has occurred during the cycle in which it occurs
► Transition detection circuits
Selectively deploy fault detection units
► Intersection of high fanout fault targets
► No roll-back necessary: simply flush the pipeline
► Low cost/area overhead critical for embedded designs
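One way to read "intersection of high fanout fault targets" is as a set computation over the state elements each fault reaches. The sketch below applies that reading to the first-cycle error sites listed for the two case studies on slide 17. It is an assumption about how such a selection could be made, not the authors' placement method. The helper names are hypothetical, and only the register bits the slide actually lists are included (Case Study I reports 51 errors but itemizes nine sites).

```python
def bits(prefix, indices):
    """Expand a register name and bit indices into individual bit names."""
    return {f"{prefix}[{i}]" for i in indices}


IPIPE = "u9EJ.uARM9.uCORECTL.uIPIPE"

# Fault site -> first-cycle error sites, as listed on slide 17.
fanout = {
    "uCORE.uIRoute.U600": (
        bits("uIRoute.INSTRHeld_reg", [0, 16, 22, 31])
        | bits(f"{IPIPE}.IDarmDeint_reg", [0, 16, 31])
        | bits(f"{IPIPE}.StoredInstrInt_reg", [29, 30])
    ),
    "uCORE.u9EJ.uARM9.uCORECTL.uIPIPE.U3626": bits(
        f"{IPIPE}.IDarmDeint_reg", [3, 12, 17, 18, 24, 26, 29, 30, 31]
    ),
}


def shared_targets(fanout_sets):
    """State elements reached by every fault site."""
    return set.intersection(*fanout_sets.values())


def coverage(fanout_sets):
    """List (state element, fault sites reaching it), most widely reached first."""
    reached_by = {}
    for site, targets in fanout_sets.items():
        for target in targets:
            reached_by.setdefault(target, set()).add(site)
    return sorted(reached_by.items(), key=lambda kv: (-len(kv[1]), kv[0]))


if __name__ == "__main__":
    print(shared_targets(fanout))
    best_target, sites = coverage(fanout)[0]
    print(best_target, len(sites))
```

For these two faults the only shared first-cycle target is `IDarmDeint_reg[31]`, so a single detector there would observe both. A real selection would need fanout data for many more fault sites.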

19 Conclusion
Design domain critical:
► Affects fault behavior
► Limits applicable tolerance techniques
Key observations:
► Faults in combinational logic are much more likely in embedded designs
► Faults in combinational logic behave dramatically differently from those in state elements
► Fault fanout offers potential for low-overhead detection

20 Soft Error Terminology
[Diagram labels: transient fault, soft error, transistor]

21 Dependence on Fault Duration

22 Pulse Detection
[Circuit diagram labels: D, CLK, Q, ~Q, error, flip-flop, shadow latch]
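The slide names only the components, so the behavior sketched below is an assumption: the main flip-flop samples D at the clock edge, the shadow latch samples the same D a little later, and a mismatch raises the error signal. The waveform representation, the `sample` and `detect_pulse` helpers, and the timing values (in nanoseconds) are all hypothetical.

```python
def sample(waveform, t):
    """Value of a piecewise-constant signal at time t.

    waveform is a list of (start_time, value) pairs sorted by start_time.
    """
    value = waveform[0][1]
    for start, v in waveform:
        if start <= t:
            value = v
    return value


def detect_pulse(waveform, clk_edge, shadow_delay):
    """Return (flip_flop_value, shadow_value, error_flag)."""
    main = sample(waveform, clk_edge)                   # captured at the clock edge
    shadow = sample(waveform, clk_edge + shadow_delay)  # captured slightly later
    return main, shadow, main != shadow


if __name__ == "__main__":
    # D is 0 except for a transient pulse that straddles the 5.0 ns clock edge.
    glitchy = [(0.0, 0), (4.95, 1), (5.05, 0)]
    clean = [(0.0, 0)]
    print(detect_pulse(glitchy, clk_edge=5.0, shadow_delay=0.2))
    print(detect_pulse(clean, clk_edge=5.0, shadow_delay=0.2))
```

A transient pulse that has died out by the time the shadow sample is taken shows up as a disagreement between the two storage elements. A pulse longer than `shadow_delay` would escape this simple check.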

23 Microarchitectural Errors per Cycle
[Plots: Faults Occurring in Registers; Faults Occurring in Combinational Logic]
Multi-bit errors are common for faults in combinational logic

