Departments of Electrical Eng. & Computer Sc.


1 IFRA: Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization
Sung-Boem Park, Subhasish Mitra
Robust Systems Group, Departments of Electrical Eng. & Computer Sc., Stanford University

2 Key Message
Post-silicon bug localization is a major bottleneck: starting from a system failure, pinpoint the bug location and the exposing stimulus.
Existing schemes are expensive and not scalable.
IFRA is a new localization technique for processors that eliminates the limitations of existing techniques: 96% accuracy, 1% area impact, ~0% performance impact.

3 Outline
Motivation, IFRA Overview, Simulation Results, Conclusion

4 Microprocessor Development Flow
Pre-silicon: design → pre-silicon verification. Post-silicon: post-silicon validation → manufacturing test.
Post-silicon validation costs: 35% of development time and 25% of design resources.
"Post-silicon cost & complexity is rising faster than design cost." (S. Yerramilli, VP, Intel, ITC06 invited address)

5 Post-Silicon Validation Steps
Detect: run test content in the system, e.g., OS, games, functional tests.
Localize: from a system failure (e.g., a crash), pinpoint the bug location (e.g., ALU, decoder, scheduler) and the exposing stimulus (e.g., an instruction sequence). This step dominates cost [Josephson DAC06].
Root cause & fix: optical probing; patch, circuit edit, or respin.

6 Post-Silicon Bug Types [Josephson DAC06]
Functional bugs: incorrect logic implementation, e.g., design errors. Short localization time, e.g., hours to days.
Electrical bugs / circuit marginalities: e.g., speed paths, noise, races, hold time; they appear only at some voltage / temperature / frequency corners. LONG localization time, e.g., days to weeks. These are our focus.

7 Existing Post-Silicon Bug Localization Flows
Tester-based: detect in system, reproduce the failure on a tester (2 days; not always possible), then localize on the tester (3 days).
System-based: detect in system, then localize the failure in the system (1 to 4 weeks).
Major problems: failure reproduction and system-level simulation.

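8 IFRA vs. Existing Techniques
Techniques compared: trace buffers, clock manipulation, checkpoint + replay, scan techniques, and IFRA.
Criteria: Intrusive? Failure reproduction required? System-level simulation required? Area impact?
IFRA: non-intrusive; no failure reproduction; no system-level simulation; 1% area impact.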

9 Instruction Footprint Recording & Analysis
Design phase: insert recorders inside the chip design.
Post-silicon validation: run tests while the recorders capture special information. Recording is non-intrusive and needs no failure reproduction; a single test run is sufficient.
When a failure is detected, scan out the recorder contents and post-analyze them offline. The analysis checks self-consistency against the test program binary and needs no system simulation.
Result: the localized bug as a (location, stimulus) pair.

10 Outline
Motivation
IFRA Overview: Hardware Support; Automated Post-Analysis Techniques
Simulation Results
Conclusion

11 IFRA Hardware in Superscalar Processor
[Figure: Alpha 21264 pipeline (FETCH, DECODE, DISPATCH, ISSUE, EXECUTE, COMMIT) with branch predictor, I-cache, I-TLB, fetch queue, pipeline registers, decoders, register rename, physical register file, instruction window, 2×Br, 2×ALU, MUL, 2×LSU, FPU, D-cache, D-TLB, reorder buffer, register map, and register free list.]
IFRA adds ID assignment at fetch, recorders along the pipeline, and a post-trigger generator. The recorders are part of the scan chain and are connected by slow wires, so no at-speed routing is needed.

12 Recording Operation Example
At fetch, ID assignment (following a special ID assignment rule) tags each instruction: INST1 receives ID1 and INST2 receives ID2. Recorder 1 stores the instruction footprints (ID1, PC1) and (ID2, PC2), with the PC as auxiliary information.
The IDs travel with the instructions through the pipeline registers. At decode, Recorder 2 stores (ID1, decoded bits1) and (ID2, decoded bits2).

13 Special Rule for Instruction ID Assignment
Simplistic ID assignment is inadequate because of speculation with flushes and out-of-order execution. The PC does not work as an ID for loops, where the same PC recurs.
Special ID assignment rule (formal proof in the paper):
ID width: $\log_2 4n$ bits, where n is the maximum number of instructions in flight; e.g., 8 bits for an Alpha-like processor (n = 64).
No timestamp or global synchronization is required.
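As a quick check of the ID width formula, the sketch below computes the number of bits needed for 4n distinct IDs. Rounding up when 4n is not a power of two is our assumption; the slide only gives $\log_2 4n$, which is exact for n = 64.

```python
def id_width_bits(n: int) -> int:
    """Bits needed for 4*n distinct instruction IDs, i.e. ceil(log2(4*n)).

    n is the maximum number of instructions in flight. Rounding up for
    non-power-of-two 4*n is an assumption, not taken from the slides.
    """
    if n < 1:
        raise ValueError("n must be positive")
    # (k - 1).bit_length() == ceil(log2(k)) for k >= 2, with no float rounding
    return (4 * n - 1).bit_length()


print(id_width_bits(64))  # Alpha-like processor from the slide: prints 8
```

Integer `bit_length` is used instead of `math.log2` to avoid floating-point edge cases at exact powers of two.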

14 Instruction Footprint Recorder Design
Each entry holds an instruction ID plus auxiliary information. The recorder is dominated by memory (a circular buffer) and needs only simple control logic: idle cycle compaction, circular buffer control, serialization, and stop / start recording driven by the post-trigger signal.
No high-speed global routing is required: the contents are scanned out over the slow scan chain after failure detection.
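A behavioral sketch of one recorder, to make the control functions above concrete. It is not the authors' hardware design; the class and method names are hypothetical, and we assume that idle cycle compaction simply means an idle cycle consumes no buffer entry and that scan-out returns entries oldest first.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass(frozen=True)
class Footprint:
    inst_id: int  # instruction ID assigned at fetch
    aux: int      # stage-specific auxiliary info (e.g., PC or a residue)


class FootprintRecorder:
    """Hypothetical model of one recorder: a circular buffer of footprints."""

    def __init__(self, depth: int) -> None:
        # deque(maxlen=...) drops the oldest entry when full, like a
        # hardware circular buffer overwriting its oldest slot
        self._buf: Deque[Footprint] = deque(maxlen=depth)
        self._recording = True

    def clock(self, footprint: Optional[Footprint]) -> None:
        """Called once per cycle; None means the stage was idle."""
        if not self._recording or footprint is None:
            return  # recording stopped, or idle cycle compacted away
        self._buf.append(footprint)

    def post_trigger(self) -> None:
        """Stop recording so the history around the error is preserved."""
        self._recording = False

    def start(self) -> None:
        self._recording = True

    def scan_out(self) -> List[Footprint]:
        """Serialize the contents, oldest first (stands in for the scan chain)."""
        return list(self._buf)


# Usage: a 3-entry recorder at fetch, storing PCs as auxiliary info.
# inst_id=cycle is a toy ID, not the paper's ID assignment rule.
rec = FootprintRecorder(depth=3)
for cycle, pc in enumerate([0x100, None, 0x104, 0x108, None, 0x10C]):
    rec.clock(None if pc is None else Footprint(inst_id=cycle, aux=pc))
rec.post_trigger()
rec.clock(Footprint(inst_id=99, aux=0x200))  # ignored after the post-trigger
print([hex(f.aux) for f in rec.scan_out()])  # ['0x104', '0x108', '0x10c']
```

The buffer keeps only the most recent footprints, which is why the post-trigger has to stop recording soon after the error.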

15 What to Record?
Pipeline stage: auxiliary information recorded
Fetch: PC; 32 bits per recorder; 4 recorders
Decode: decoding results
Dispatch: 2-bit residue of register name; 6 bits per recorder
Issue: 3-bit residue of operands
Execution (ALU, MUL): 3-bit residue of result; 3 bits per recorder
Execution (Branch): none; 2 recorders
Execution (Load/Store unit): 32-bit memory address; 35 bits per recorder
Commit: exceptions; ~0 bits per recorder
Total required storage for all recorders: 60 KBytes

16 Post-Trigger Generation
Along the code execution timeline from t = 0, an error occurs after about a billion cycles (e.g., a speed path), but the failure (e.g., a crash) appears only after 2 billion cycles.
Storing a billion cycles of history would be far too much storage overhead.
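A back-of-the-envelope check of the storage argument. Even at a hypothetical one bit per cycle (real footprints are much wider), a billion cycles dwarfs the 60 KBytes of total recorder storage; taking 1 KByte = 1024 bytes is an assumption.

```python
CYCLES = 1_000_000_000       # error-to-failure window from the slide
BITS_PER_CYCLE = 1           # hypothetical lower bound on recorded bits per cycle
RECORDER_BYTES = 60 * 1024   # 60 KBytes of recorder storage (1 KByte = 1024 B assumed)

needed_bytes = CYCLES * BITS_PER_CYCLE // 8
print(f"needed: {needed_bytes / 1e6:.0f} MB, "
      f"{needed_bytes / RECORDER_BYTES:.0f}x the recorder storage")
# needed: 125 MB, 2035x the recorder storage
```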

17 Post-Trigger Generation
The error must be captured in recorder storage, so early failure detection is necessary.
Early failure detection techniques (post-triggers): classical error detection (residue, parity); deadlock and segfault detection; special early warnings to pause recording. Details in the paper.
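A minimal sketch of residue-based error detection on an adder, one of the classical post-trigger sources listed above. The modulus 7 (a 3-bit residue) is our choice for illustration, the function names are hypothetical, and carry-out handling of a fixed-width adder is not modeled because Python integers do not overflow.

```python
from typing import Tuple

MODULUS = 7  # hypothetical choice: a mod-7 residue fits in 3 bits


def residue(x: int) -> int:
    return x % MODULUS


def add_with_residue_check(a: int, b: int, fault_mask: int = 0) -> Tuple[int, bool]:
    """Add a and b; fault_mask XORs bits into the sum to model a bit-flip.

    Returns (sum, error_detected): the residue of the (possibly faulty)
    sum is compared with the residue predicted from the operands.
    """
    s = (a + b) ^ fault_mask
    predicted = (residue(a) + residue(b)) % MODULUS
    return s, residue(s) != predicted


print(add_with_residue_check(5, 9))                     # (14, False)
print(add_with_residue_check(5, 9, fault_mask=1 << 4))  # (30, True)
```

With modulus 7, every single bit-flip in the sum is caught: flipping bit i changes the value by ±2^i, and 2^i mod 7 is always 1, 2, or 4, never 0.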

18 IFRA Area Impact
1% chip-level area impact, from Synopsys Design Compiler synthesis of an Alpha-like processor with a 2 MB L2 cache in TSMC 130 nm technology.
No global at-speed routing is needed; the area is dominated by the circular buffers in the recorders (total recorder storage: 60 KBytes).

19 Outline
Motivation
IFRA Overview: Hardware Support; Post-Analysis Techniques
Simulation Results
Conclusion

20 Post-Analysis Overview
Inputs: the test program binary and the footprints from the recorders.
Link footprints (not covered today; details in the paper).
Run high-level analysis, then low-level analysis. The analyses include control-flow analysis, data-dependency analysis, decoding analysis, load/store analysis, and residue consistency checks.
Output: a list of bug location-stimulus pairs.

21 Linking Footprints from Recorder Contents
[Figure: footprints (ID, auxiliary information) in the fetch-stage, execution-stage, and commit-stage recorders, ordered in time, linked to instructions (PC, INST) of the test program binary.]
The special ID assignment rule ensures that uncommitted instructions are uniquely identified and that the relative order of identical IDs is maintained, even under flushes and out-of-order execution.

22 Debug Example
Link footprints → high-level analysis → low-level analysis → bug locations + exposing stimulus

23 Debug Example – Decision 1
The test program binary and the fetch-stage recorder yield the serial execution trace:
R0 ← R1 + R2
R0 ← R3 + R6
R5 ← R0 + R6
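The serial execution trace is what lets the analysis name the producer, previous producer, and consumer of R0 in the questions that follow. A minimal sketch, assuming the trace is already available as a list of (destination, sources) tuples; that representation and the name `find_producer` are hypothetical.

```python
from typing import List, Optional, Tuple

# (destination register, source registers) per instruction, in program order
Instr = Tuple[str, Tuple[str, ...]]

trace: List[Instr] = [
    ("R0", ("R1", "R2")),  # R0 <- R1 + R2
    ("R0", ("R3", "R6")),  # R0 <- R3 + R6
    ("R5", ("R0", "R6")),  # R5 <- R0 + R6
]


def find_producer(trace: List[Instr], before: int, reg: str) -> Optional[int]:
    """Index of the last instruction before index `before` that writes `reg`."""
    for i in range(before - 1, -1, -1):
        if trace[i][0] == reg:
            return i
    return None


consumer = 2
producer = find_producer(trace, consumer, "R0")  # RAW hazard on R0
assert producer is not None
previous_producer = find_producer(trace, producer, "R0")
print(producer, previous_producer)  # 1 0
```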

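24 Debug Example – Question 1
Do the residues of the values mismatch? Issue-stage recorder (consumer's operand): R0 = 3; execution-stage recorder (producer's result): R0 = 5.
Serial execution trace: R0 ← R1 + R2; R0 ← R3 + R6 (producer of R0); R5 ← R0 + R6 (consumer of R0). Producer and consumer form a RAW hazard.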
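A sketch of the residue consistency check behind this question: for every RAW pair, the residue of the producer's result (execution-stage recorder) should equal the residue of the consumer's matching source operand (issue-stage recorder). We assume footprints have already been linked to trace positions; the dictionary layout, the function name, and the residue 1 for R6 are hypothetical, while the R0 residues 5 and 3 come from the slide.

```python
from typing import Dict, List, Tuple


def residue_mismatches(
    exec_result_residue: Dict[int, int],
    issue_operand_residues: Dict[int, Tuple[int, ...]],
    raw_pairs: List[Tuple[int, int, int]],
) -> List[Tuple[int, int]]:
    """Return (producer, consumer) trace positions whose residues disagree.

    raw_pairs holds (producer index, consumer index, consumer operand slot).
    """
    mismatches = []
    for prod, cons, slot in raw_pairs:
        if prod not in exec_result_residue or cons not in issue_operand_residues:
            continue  # footprint no longer in a recorder: no evidence either way
        if exec_result_residue[prod] != issue_operand_residues[cons][slot]:
            mismatches.append((prod, cons))
    return mismatches


exec_rec = {1: 5}          # R0 <- R3 + R6: residue of result R0 is 5
issue_rec = {2: (3, 1)}    # R5 <- R0 + R6: residues of operands R0, R6
pairs = [(1, 2, 0)]        # RAW on R0: producer 1, consumer 2, slot 0
print(residue_mismatches(exec_rec, issue_rec, pairs))  # [(1, 2)]
```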

25 Debug Example – Question 2
Do the residues of the physical register names mismatch? Dispatch-stage recorder: R0 = P5 vs. R0 = P2.
Serial execution trace: R0 ← R1 + R2; R0 ← R3 + R6 (producer of R0); R5 ← R0 + R6 (consumer of R0). Producer and consumer form a RAW hazard.

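26 Debug Example – Question 3
Does the residue of the producer's physical register name match that of the previous producer? Dispatch-stage recorder: R0 = P5 for the previous producer (R0 ← R1 + R2) and R0 = P5 for the producer (R0 ← R3 + R6).
Serial execution trace: R0 ← R1 + R2; R0 ← R3 + R6 (producer of R0); R5 ← R0 + R6 (consumer of R0). Producer and consumer form a RAW hazard.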

27 Debug Example – Result
Exposing stimulus: R0 ← R1 + R2; R0 ← R3 + R6; R5 ← R0 + R6.
Bug location: pipeline register.
[Figure: decoder → pipeline register (arch. dest. reg, rest of pipeline reg.) → register mapping (read circuit, write circuit) and the rest of the modules in the dispatch stage. The stimulus stimulates the bug, and the error propagates to the failure.]

28 Outline
Motivation, IFRA Overview, Simulation Results, Conclusion

29 Experimental Setup
SimpleScalar architectural simulator in Alpha configuration, augmented with ~1K error injection points.
Error model: single bit-flips, modeling hard-to-repeat electrical bugs, in both flip-flops and combinational logic.
Stimulus: SPECint 2000 benchmarks.
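A minimal sketch of the single bit-flip error model: pick one injection point and flip one of its bits. The injection-point names, their widths, and the seed are hypothetical; the real experiments used ~1K points inside the simulator, including combinational logic, which this flat register-state model does not capture.

```python
import random
from typing import Dict, Tuple


def inject_single_bit_flip(
    state: Dict[str, int], widths: Dict[str, int], rng: random.Random
) -> Tuple[str, int]:
    """Flip one random bit of one random injection point, in place.

    state maps hypothetical injection-point names to current values;
    widths gives each point's bit width. Returns (point, bit) for logging.
    """
    point = rng.choice(sorted(state))  # sorted for reproducibility with a seed
    bit = rng.randrange(widths[point])
    state[point] ^= 1 << bit
    return point, bit


state = {"alu0_result": 0x1234, "rob_head_ptr": 17, "decode_dest_reg": 0}
widths = {"alu0_result": 64, "rob_head_ptr": 6, "decode_dest_reg": 5}
before = dict(state)
point, bit = inject_single_bit_flip(state, widths, random.Random(2008))
print(point, bit, hex(before[point]), "->", hex(state[point]))
```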

30 Experimental Flow
Warm up for a million cycles, then inject an error.
Any failure detected? No: masked / silent error. Yes: check the error latency.
Short error latency? Yes: post-analyze.
Post-analysis outcome: exact localization, localization with candidates, or complete miss.
Totals: 100K simulation runs and 800 post-analysis runs.

31 IFRA Bug Localization Results
Correct localization: 96%; complete miss: 4%.
Exact localization: 78%; localization with avg. 6 candidates: 22%.
Localization resolution: the bug-exposing stimulus, plus a bug location given as one of 200 design blocks (avg. block size: 10K 2-input NAND gates).

32 Outline
Motivation, IFRA Overview, Simulation Results, Conclusion

33 Conclusion
IFRA is inexpensive: 1% area, no expensive logic analyzers, and no failure reproduction or system simulation.
Effective: 96% accuracy.
Practical: Alpha processor demonstration.

34 Acknowledgement
Bob Gottlieb, Intel; Nagib Hakim, Intel; Ted Hong, Stanford University; Doug Josephson, Intel; Onur Mutlu, Microsoft Research; Priyadarshan Patra, Intel; Eric Rentschler, AMD; Jason Stinson, Intel

35 Debug Example – Decision 4
Did the producer and consumer coexist in the reorder buffer? If more than n instructions lie between them, they could not have.
Serial execution trace: R0 ← R1 + R2; R0 ← R3 + R6 (producer of R0); R5 ← R0 + R6 (consumer of R0). Producer and consumer form a RAW hazard.
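A sketch of the coexistence test, assuming (hypothetically) that the reorder buffer holds at most n instructions, so two trace positions can be in flight together only if the span from producer to consumer, inclusive, fits in n entries. The slide's case of more than n instructions in between always fails this test.

```python
def could_coexist_in_rob(producer_idx: int, consumer_idx: int, n: int) -> bool:
    """True if both instructions could be in the reorder buffer at once.

    Assumes at most n consecutive trace instructions are in flight.
    The pair spans consumer_idx - producer_idx + 1 instructions.
    """
    if consumer_idx <= producer_idx:
        raise ValueError("consumer must come after producer")
    return consumer_idx - producer_idx + 1 <= n


print(could_coexist_in_rob(1, 2, n=64))    # True: adjacent instructions
print(could_coexist_in_rob(1, 100, n=64))  # False: 98 instructions in between
```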

36 Debug Example – Low Level Analysis
Stimulus: R0 ← R1 + R2; R0 ← R3 + R6; R5 ← R0 + R6 (architectural registers R0, R1, R2; R0, R3, R6; R5, R0, R6).
[Figure: dispatch stage during code execution. The decoder fills the pipeline register (arch. src. reg, arch. dest. reg, rest); register mapping (read circuit, write circuit) and the register free list assign physical registers, with map entries R0: P5 and R4: P2; the dispatch-stage recorder stores residues of physical register names (5, 2).]
Bug location: pipeline register.
