IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization Sung-Boem Park Subhasish Mitra Robust Systems Group Departments of Electrical.


1 IFRA: Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization
Sung-Boem Park, Subhasish Mitra
Robust Systems Group, Departments of Electrical Eng. & Computer Sc., Stanford University

2 Key Message
Post-silicon bug localization – Major bottleneck
  - Pinpoint from system failure
  - Bug location, exposing stimulus
  - Existing schemes – Expensive & not scalable
IFRA – New technique for processors
  - Eliminates limitations of existing techniques
  - 96% accuracy
  - 1% area, ~0% performance impact

3 Outline
  - Motivation
  - IFRA Overview
  - Simulation Results
  - Conclusion

4 Microprocessor Development Flow
“Post-silicon cost & complexity is rising faster than design cost” (S. Yerramilli, VP, Intel, ITC06 Invited Address)
[Figure: development flow spanning pre-silicon and post-silicon phases; labels: Design, Pre-Silicon Verification, Manufacturing Test, Post-Silicon Validation]
Post-Silicon Validation Costs:
  - 35% of Development Time
  - 25% of Design Resources

5 Post-Silicon Validation Steps
Detect – Run test content in system
  - e.g., OS, games, functional tests
Localize – Pinpoint from system failure (e.g., crash)
  - Bug location – e.g., ALU, decoder, scheduler
  - Exposing stimulus – e.g., instruction sequence
  - Dominates cost [Josephson DAC06]
Root cause & Fix
  - Optical probing, patch / circuit edit / respin

6 Post-Silicon Bug Types [Josephson DAC06]
Functional bugs – Incorrect logic implementation
  - e.g., design errors
  - Short localization time – e.g., hours to days
Electrical bugs / circuit marginalities
  - e.g., speed-path, noise, races, hold time
  - Some voltage / temp / frequency corners
  - LONG localization time – e.g., days to weeks
  - Our focus

7 Existing Post-Silicon Bug Localization Flows
Tester-based: Detect in system; Reproduce failure on tester (2 days); Localize on tester (3 days). Not always possible.
System-based: Detect in system; Localize failure in system (1 to 4 weeks).
Major Problems:
  - Failure Reproduction
  - System-level simulation

8 IFRA vs. Existing Techniques
[Table comparing techniques (Trace buffers, Clock manipulation, Checkpoint + replay, Scan techniques, IFRA) on: Intrusive?, Failure reproduction?, System-level simulation?, Area impact? IFRA area impact: 1%]

9 Instruction Footprint Recording & Analysis
Design Phase: Insert recorders inside chip design
Post-Si Validation: Run tests / Record special info. in recorders; Failure detected? (Yes / No); Scan out recorder contents; Post-analyze offline; Localized Bug: (location, stimulus)
  - Non-intrusive
  - No failure reproduction – Single test run sufficient
  - No system simulation – Self-consistency against test program binary

10 Outline
  - Motivation
  - IFRA Overview: Hardware Support, Automated Post-Analysis Techniques
  - Simulation Results
  - Conclusion

11 IFRA Hardware in Superscalar Processor
[Figure: superscalar (Alpha) pipeline with stages FETCH, DECODE, DISPATCH, ISSUE, EXECUTE, COMMIT separated by pipeline registers. Blocks shown: Branch Predictor, I-Cache, I-TLB, Fetch Queue, Decoders, Reg Rename, Reg Map, Reg Free, Phys Regfile, Instruction Window, 2xBr, 2xALU, MUL, 2xLSU, FPU, D-Cache, D-TLB, Reorder Buffer. IFRA additions: ID assignment, Recorders, Post-Trigger Generator, Scan chain (slow wire, no at-speed routing; part of scan chain).]

12 Recording Operation Example
[Figure: INST1 and INST2 pass through FETCH (Branch Predictor, I-Cache, I-TLB, Fetch Queue), ID Assignment (special ID assignment rule), and DECODE (Decoder), carrying their IDs through the pipeline registers. Recorder 1 stores instruction footprints (ID1, Auxiliary Info: PC1) and (ID2, Auxiliary Info: PC2); Recorder 2 stores (ID1, Auxiliary Info: Decoded bits1) and (ID2, Auxiliary Info: Decoded bits2).]

13 Special Rule for Instruction ID Assignment
Simplistic ID assignment inadequate
  - Speculation + flushes, out-of-order execution
  - PC does not work for loops
Special ID assignment rule – formal proof in paper
  - ID width: log2(4n) bits
  - n = max. instructions in flight
  - e.g., 8 bits for Alpha-like processor (n=64)
No timestamp or global synchronization required
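The ID-width figure on this slide can be checked with a few lines of Python. This is only a worked calculation of log2(4n); the function name `id_width_bits` is made up for illustration, and the sketch does not implement the special ID assignment rule itself, which the slide defers to the paper.

```python
def id_width_bits(max_in_flight: int) -> int:
    """Bits needed to encode 4n distinct IDs, i.e. ceil(log2(4n)).

    max_in_flight is n, the maximum number of instructions in flight.
    Integer arithmetic is used so the result is exact for any n.
    """
    id_space = 4 * max_in_flight          # 4n distinct ID values
    return (id_space - 1).bit_length()    # ceil(log2(id_space)) for id_space >= 1


if __name__ == "__main__":
    # Slide example: Alpha-like processor with n = 64 gives 8-bit IDs.
    print(id_width_bits(64))
```

For n = 64 the ID space is 256 values, which fits exactly in 8 bits, matching the slide.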

14 Instruction Footprint Recorder Design
Dominated by memory
Simple control logic
  - Idle cycle compaction
  - Circular buffer control
  - Serialization
  - Stop / Start recording
No high-speed global routing
  - Contents scanned out after failure detection
[Figure: recorder block with inputs "Instruction ID + Auxiliary info." and "Post-trigger signal", a Circular Buffer with Control Logic, and output "To slow scan chain"]
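As a rough software analogy for the recorder described on this slide, the sketch below models a circular buffer that skips idle cycles, stops on a post-trigger, and is read out afterwards. The class and method names (`FootprintRecorder`, `clock`, `post_trigger`, `scan_out`) are hypothetical, and stopping immediately on the trigger is an assumption; the actual hardware behavior is described in the paper, not here.

```python
from collections import deque
from typing import List, Optional, Tuple

Footprint = Tuple[int, int]  # (instruction ID, auxiliary info)


class FootprintRecorder:
    """Toy model of one recorder: fixed-size circular buffer plus control."""

    def __init__(self, capacity: int) -> None:
        # deque with maxlen drops the oldest entry once full (circular buffer).
        self.buffer: deque = deque(maxlen=capacity)
        self.recording = True

    def clock(self, footprint: Optional[Footprint]) -> None:
        """Called once per cycle; None models a cycle with no instruction."""
        if not self.recording:
            return
        if footprint is None:
            return  # idle cycle compaction: nothing is stored
        self.buffer.append(footprint)

    def post_trigger(self) -> None:
        """Stop recording (assumed immediate in this sketch)."""
        self.recording = False

    def scan_out(self) -> List[Footprint]:
        """Return contents oldest-first, as read out after failure detection."""
        return list(self.buffer)


if __name__ == "__main__":
    rec = FootprintRecorder(capacity=4)
    for fp in [(0, 100), None, (1, 104), (2, 108), (3, 112), (4, 116)]:
        rec.clock(fp)
    rec.post_trigger()
    rec.clock((5, 120))  # ignored: recording has stopped
    print(rec.scan_out())
```

With a capacity of 4, the oldest footprint (0, 100) is overwritten and the entry arriving after the post-trigger is not stored.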

15 What to Record?
Pipeline stage | Auxiliary information | Bits per recorder | Number of recorders
Fetch | PC | 32 | 4
Decode | Decoding results | 4 | 4
Dispatch | 2-bit residue of reg. name | 6 | 4
Issue | 3-bit residue of operands | 6 | 4
Execution (ALU, MUL) | 3-bit residue of result | 3 | 4
Execution (Branch) | None | 0 | 2
Execution (Load/Store unit) | 3-bit residue of result, 32-bit memory address | 35 | 2
Commit | Exceptions | ~0 | 4
Total required storage for all recorders: 60 KBytes

16 Post-Trigger Generation
[Timeline of code execution: t=0; error after a billion cycles (e.g., speedpath); failure after 2 billion cycles (e.g., crash)]
Too much storage overhead to store 1 billion cycles

17 Post-Trigger Generation
[Timeline of code execution: t=0; error after a billion cycles (e.g., speedpath); failure after 2 billion cycles (e.g., crash)]
Need to capture in recorder storage; early failure detection necessary
Early failure detection techniques (post-triggers)
  - Classical error detection – residue, parity
  - Deadlock & segfault detection
  - Special early warnings to pause recording
  - Details in paper

18 IFRA Area Impact
1% chip-level area impact
  - Synopsys Design Compiler synthesis
  - Alpha-like processor: 2MB L2 cache
  - TSMC 130nm technology
  - No global at-speed routing
  - Area dominated by circular buffers in recorders
  - Total recorder storage: 60 KBytes

19 Outline
  - Motivation
  - IFRA Overview: Hardware Support, Post-Analysis Techniques
  - Simulation Results
  - Conclusion

20 Post-Analysis Overview
Inputs: Test program binary, Footprints from recorders
Steps: Link footprints; Run high-level analysis; Run low-level analysis
Output: List of bug location-stimulus pairs
Analyses listed:
  - Control-flow analysis
  - Data-dependency analysis
  - Decoding analysis
  - Load/Store analysis
  - Residue consistency check
(Not covered today – Details in paper)

21 Linking Footprints from Recorder Contents
[Figure: footprints ordered in time from the fetch-stage recorder (ID, PC), the execution-stage recorder (ID, AUX), and the commit-stage recorder (ID, AUX) are linked to one another and to instructions INST0 to INST7 (PC0 to PC7) in the test program binary. IDs repeat within each recorder, e.g. ID: 0 and ID: 4 to ID: 7.]
Special ID assignment rule ensures:
  - Uncommitted instructions uniquely identified
  - Relative orders of identical IDs maintained
  - Even under flushes & out-of-order execution
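The property "relative orders of identical IDs maintained" suggests a simple way to think about linking: the k-th footprint with a given ID in one recorder corresponds to the k-th footprint with that ID in another. The sketch below illustrates only that idea. It is a simplification, not the paper's linking procedure: it assumes both recorder streams cover the same instructions (no flushed instructions and no entries lost to buffer wraparound), and the name `link_by_id` is invented for this example.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Tuple

Footprint = Tuple[int, Hashable]  # (instruction ID, auxiliary info)


def link_by_id(
    stream_a: List[Footprint], stream_b: List[Footprint]
) -> List[Tuple[int, Hashable, Optional[Hashable]]]:
    """Pair the k-th use of each ID in stream_a with the k-th use in stream_b.

    Both streams are oldest-first. Entries in stream_a with no partner
    in stream_b are linked to None.
    """
    # Group stream_b's auxiliary info by ID, preserving time order per ID.
    by_id: Dict[int, List[Hashable]] = defaultdict(list)
    for inst_id, aux in stream_b:
        by_id[inst_id].append(aux)

    seen: Dict[int, int] = defaultdict(int)  # occurrences of each ID so far
    links = []
    for inst_id, aux_a in stream_a:
        k = seen[inst_id]
        candidates = by_id.get(inst_id, [])
        aux_b = candidates[k] if k < len(candidates) else None
        links.append((inst_id, aux_a, aux_b))
        seen[inst_id] += 1
    return links


if __name__ == "__main__":
    # Fetch order is program order; execution may complete out of order.
    fetch = [(5, "PC1"), (6, "PC2"), (7, "PC3"), (5, "PC4")]
    execute = [(6, "AUX_a"), (5, "AUX_b"), (7, "AUX_c"), (5, "AUX_d")]
    for link in link_by_id(fetch, execute):
        print(link)
```

In the example, ID 5 is reused; the first ID-5 fetch footprint pairs with the first ID-5 execution footprint even though a different instruction finished executing earlier.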

22 Debug Example
Link footprints; High-level analysis; Low-level analysis; Bug locations + exposing stimulus
[Figure: diagram of "?" nodes spanning the high-level and low-level analysis]

23 Debug Example – Decision 1
Test Program Binary; Fetch-stage recorder
Serial execution trace:
  R0 ← R1 + R2
  …
  R0 ← R3 + R6
  R5 ← R0 + R6

24 Debug Example – Question 1
Serial execution trace:
  R0 ← R1 + R2
  …
  R0 ← R3 + R6   (Producer of R0)
  R5 ← R0 + R6   (Consumer of R0; RAW hazard)
Residue of values mismatch? R0=3 (Issue-stage recorder) vs. R0=5 (Execute-stage recorder)
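The residue comparison in this question can be illustrated with a small sketch. The slides say only that 3-bit residues are recorded; the modulus 7 used here is an assumption (a common choice for 3-bit residue codes), and the helper names `residue` and `raw_residue_mismatch` are hypothetical.

```python
MODULUS = 7  # assumption: 3-bit residue taken mod 7; the slides do not state the modulus


def residue(value: int) -> int:
    """Residue of a non-negative integer value."""
    return value % MODULUS


def raw_residue_mismatch(producer_result_residue: int,
                         consumer_operand_residue: int) -> bool:
    """True if the residue the producer wrote differs from what the consumer read.

    For a RAW dependency, the consumer's operand should equal the producer's
    result, so their recorded residues should be equal too.
    """
    return producer_result_residue != consumer_operand_residue


if __name__ == "__main__":
    # Residue codes are preserved by addition, which is what makes them
    # usable for checking ALU results:
    a, b = 1234, 5678
    print(residue(a + b) == (residue(a) + residue(b)) % MODULUS)

    # Slide values: execute-stage recorder holds 5, issue-stage recorder holds 3.
    print(raw_residue_mismatch(producer_result_residue=5,
                               consumer_operand_residue=3))
```

A mismatch between the two recorded residues indicates the value was corrupted somewhere between the producer's execution and the consumer's issue; equal residues do not prove correctness, since distinct values can share a residue.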

25 Debug Example – Question 2
Serial execution trace:
  R0 ← R1 + R2
  …
  R0 ← R3 + R6   (Producer of R0)
  R5 ← R0 + R6   (Consumer of R0; RAW hazard)
Residue of phys. reg. names mismatch? Dispatch-stage recorder: R0=P5 vs. R0=P2

26 Debug Example – Question 3
Serial execution trace:
  R0 ← R1 + R2
  …
  R0 ← R3 + R6   (Producer of R0)
  R5 ← R0 + R6   (Consumer of R0; RAW hazard)
Residue of phys. reg. name match with previous producer? R0=P5 (Dispatch-stage recorder) vs. R0=P5 (Previous producer)

27 Debug Example – Result
[Figure: dispatch-stage blocks (Arch. Dest. Reg, Pipeline Register, Decoder, Read Circuit, Write Circuit, Reg. Mapping, Rest of pipeline reg., Rest of modules in dispatch stage) with the Bug Location marked]
Trace: R0 ← R1 + R2; R0 ← R3 + R6; R5 ← R0 + R6; …
Labels on trace: Stimulates Bug; Propagates to failure

28 Outline
  - Motivation
  - IFRA Overview
  - Simulation Results
  - Conclusion

29 Experimental Setup
Simplescalar architectural simulator
  - Alpha configuration
  - Augmented with ~1K error injection points
Error model – single bit-flips
  - Hard-to-repeat electrical bugs
  - Both flip-flops & combinational logic
Stimulus
  - SpecInt 2000 benchmarks

30 Experimental Flow
[Flowchart: Warm up for a million cycles; Inject error; Any failure detected? (Yes / No; Masked/silent error); Short error latency? (Yes / No); Post-analyze; outcomes: Exact localization, Localization with candidates, Complete miss]
100K simulation runs; 800 post-analysis runs

31 IFRA Bug Localization Results
Localization resolution
  - Bug exposing stimulus
  - One of 200 erroneous design blocks
  - Avg. block size: 10K 2-input NAND gates
Correct localization (96%); Complete miss (4%)
Exact localization (78%); Localization with avg. 6 candidates (22%)

32 Outline
  - Motivation
  - IFRA Overview
  - Simulation Results
  - Conclusion

33 Conclusion
IFRA
  - Inexpensive: 1% area, no expensive logic analyzers; no failure reproduction or system simulation
  - Effective: 96% accuracy
  - Practical: Alpha processor demonstration

34 Acknowledgement
  - Bob Gottlieb, Intel
  - Nagib Hakim, Intel
  - Ted Hong, Stanford University
  - Doug Josephson, Intel
  - Onur Mutlu, Microsoft Research
  - Priyadarshan Patra, Intel
  - Eric Rentschler, AMD
  - Jason Stinson, Intel

35 Debug Example – Decision 4
Serial execution trace:
  R0 ← R1 + R2
  …
  R0 ← R3 + R6   (Producer of R0)
  R5 ← R0 + R6   (Consumer of R0; RAW hazard)
Did they coexist in reorder buffer? More than n instructions in between

36 Debug Example – Low Level Analysis
[Figure: dispatch-stage blocks (Arch. Dest. Reg, Arch. Src. Reg, Reg. Free List, Pipeline Register, Decoder, Read Circuit, Write Circuit, Reg. Mapping, Rest) processing register lists (R0, R1, R2), (R0, R3, R6), (R5, R0, R6), with physical register labels P5 and P2 shown against R0 and R4; Dispatch Stage Recorder (stores residue of phys. reg.) holding 5, 2, 5; Bug Location and Stimulus marked]
Code Execution: … R0 ← R1 + R2; R0 ← R3 + R6; R5 ← R0 + R6
