Preeti Ranjan Panda, Anant Vishnoi, and M. Balakrishnan Proceedings of the IEEE 18th VLSI System on Chip Conference (VLSI-SoC 2010) Sept. 2010 Presenter:

1 Preeti Ranjan Panda, Anant Vishnoi, and M. Balakrishnan Proceedings of the IEEE 18th VLSI System on Chip Conference (VLSI-SoC 2010) Sept. 2010 Presenter: Chun-Hung Lai 2016/5/31

2 Abstract
During post-silicon validation/debug of processors, it is common to alternate between two phases: processor execution and state dump. The state dump, where the entire processor state is dumped off-chip to a logic analyzer for further processing, is a major bottleneck. We present a technique for improving debug efficiency by reducing the volume of cache data dumped off-chip, while still capturing the complete state. The reduction is achieved by introducing hardware mechanisms to transmit only the portion of the cache that was updated since the last dump. We propose two design alternatives based on whether or not the processor is permitted to continue execution during the dump: Blocking Incremental Cache Dumping (BICD) and Non-blocking Incremental Cache Dumping (NICD). We observe a 64% reduction in overall cache lines dumped, and the dump time reduces to an average of 16.8% and 0.0002% of the original for BICD and NICD respectively.

3 What's the Problem
The state dump is a major bottleneck during post-silicon debug of processors
  - The processor state is dumped off-chip
    - The last-level cache forms the majority of the processor state
To improve debug efficiency
  - Reduce the volume of cache data dumped
    - While still capturing the complete state
Figure: large cache -> large cache dump size -> huge dump duration

4 Related Works
Design for debug: collection of selected signal traces
  - Trace compression [6][10][18]: expand a few traced signals to restore untraced signals
  - Scan-based debug for physical / logic probing [17][20]: halts real-time execution
  - Trace signal selection [9][11][13][15]: reduce area overhead and dump time
  - Iterative silicon debug with signature [2]: capture only error data and zoom in on the interval of the error signature; works only when the error is repeatable
Compression of specific memory/cache data
  - For performance / energy: conservative compression [1][4][12][21]; decompression must not impact μp execution, so compression is limited
  - For debug: aggressive compression [14][18]; decompression is done off-line
Online cache dump for μp debug [19]: dump simultaneously with μp execution
This paper: incremental cache state dumping to reduce the dump size

5 Incremental Cache Dumping
Goal: reduce the total amount of cache data to be transferred off-chip
  - Dump only the cache lines that have been updated since the last dump
Use an Update History Table (UHT) to track all cache updates between two consecutive dumps
Figure: timeline in which the first dump transfers all lines and later dumps transfer only updated lines; μp execution between dumps updates the cache
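
The idea on this slide can be sketched in software. The following is a minimal Python model, not the paper's hardware: it assumes one UHT bit per cache line, and the names `UpdateHistoryTable`, `record_update`, and `incremental_dump` are hypothetical, introduced only for illustration.

```python
class UpdateHistoryTable:
    """One bit per cache line; a set bit means 'updated since the last dump'."""

    def __init__(self, num_lines):
        self.bits = [False] * num_lines

    def record_update(self, line):
        self.bits[line] = True

    def lines_to_dump(self):
        return [i for i, bit in enumerate(self.bits) if bit]

    def clear(self):
        self.bits = [False] * len(self.bits)


def incremental_dump(cache, uht):
    """Return {line index: data} for updated lines only, then clear the table."""
    dumped = {i: cache[i] for i in uht.lines_to_dump()}
    uht.clear()
    return dumped


cache = ["a0", "b0", "c0", "d0"]
uht = UpdateHistoryTable(len(cache))
first_dump = dict(enumerate(cache))  # the first dump transfers every line

# Processor execution between two dumps updates lines 1 and 3.
cache[1] = "b1"
uht.record_update(1)
cache[3] = "d1"
uht.record_update(3)

second_dump = incremental_dump(cache, uht)  # contains lines 1 and 3 only

# The receiving side can rebuild the full state by overlaying the increment.
reconstructed = {**first_dump, **second_dump}
```

Overlaying each incremental dump on the previous full state is what lets the complete cache state be recovered even though only updated lines are transferred.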

6 Two Methodologies for Incremental Cache Dumping: 1st, BICD
Blocking Incremental Cache Dumping (BICD)
  - The processor is halted during the cache dump
  - Dump the lines whose UHT entry is set
Cost vs. dump time trade-off
  - Each UHT bit may represent more than one cache line
    - May lead to extra dumping: lines that were not updated are dumped anyway
Figure: BICD reduces the dump size by 56%
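
The cost of a coarser UHT can be shown with a small sketch. This Python model assumes that consecutive cache lines share a UHT bit (a "window"); the function name `bicd_dump` and its parameters are hypothetical and not taken from the paper.

```python
def bicd_dump(num_lines, updated_lines, window_size):
    """Return the indices of the lines a blocking incremental dump would send.

    One UHT bit covers `window_size` consecutive lines, so a single update
    marks the whole window for dumping.
    """
    num_windows = (num_lines + window_size - 1) // window_size
    uht = [False] * num_windows
    for line in updated_lines:
        uht[line // window_size] = True

    dumped = []
    for window, is_set in enumerate(uht):
        if is_set:
            start = window * window_size
            dumped.extend(range(start, min(start + window_size, num_lines)))
    return dumped


updates = [1, 6]
fine = bicd_dump(8, updates, window_size=1)    # one bit per line
coarse = bicd_dump(8, updates, window_size=2)  # one bit per two lines
```

With window size 1 only lines 1 and 6 are sent. With window size 2 the table is half as large, but lines 0 and 7 are sent as well even though they were never updated.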

7 Two Methodologies for Incremental Cache Dumping: 2nd, NICD
Non-Blocking Incremental Cache Dumping (NICD)
  - The cache dump is performed simultaneously with μp execution
Two challenges with NICD
  - (1) The cache state is corrupted by the executing processor
    - Solution: dump a cache line before the cache attempts to update it
    - Reset the corresponding UHT entry after dumping
Figure: UHT bits for cache lines 0 to 3, with lines marked as updated, being dumped, dumped, and not yet dumped

8 Two Methodologies for Incremental Cache Dumping: 2nd, NICD (Cont.)
Two challenges with NICD
  - (2) Maintenance of the Update History Table (UHT)
    - The UHT gets incorrectly updated when both the cache dump and the executing μp modify it
  - Solution: use two UHTs
    - UHT-P (previous): cache updates since the last dump (indicates the lines to dump)
    - UHT-C (current): cache updates during the dump interval
    - The two tables swap roles at the start of the next dump
Figure: UHT contents at times T, T+1, and T+2; a line still marked for dumping is dumped before it is updated, and other updates do not affect the current dump
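
The two solutions (dump before update, and two UHTs that swap roles) can be combined in one small model. This is a sketch under stated assumptions rather than the authors' design: it uses one UHT bit per line, models the dump as explicit steps, and the names `NICDCache`, `start_dump`, `dump_step`, and `write` are hypothetical.

```python
class NICDCache:
    """Toy model of non-blocking incremental dumping with two UHTs."""

    def __init__(self, data):
        self.data = list(data)
        n = len(self.data)
        self.uht_p = [False] * n  # lines still to be sent by the dump in progress
        self.uht_c = [False] * n  # updates made since the current dump started
        self.log = []             # (line, value) pairs sent off-chip

    def start_dump(self):
        # Swap roles: updates collected so far become the list of lines to dump.
        self.uht_p, self.uht_c = self.uht_c, self.uht_p

    def _dump_line(self, line):
        if self.uht_p[line]:
            self.log.append((line, self.data[line]))
            self.uht_p[line] = False  # reset the entry after dumping

    def dump_step(self, line):
        """The dump logic visits one line; it is sent only if still marked."""
        self._dump_line(line)

    def write(self, line, value):
        """Processor write: dump the old content first if it is still pending."""
        self._dump_line(line)
        self.data[line] = value
        self.uht_c[line] = True


cache = NICDCache(["A0", "B0", "C0", "D0"])
cache.write(1, "B1")
cache.write(3, "D1")

cache.start_dump()    # lines 1 and 3 are now marked in UHT-P
cache.dump_step(0)    # not marked, nothing sent
cache.write(3, "D2")  # line 3 is pending: D1 is sent first, then overwritten
cache.dump_step(1)    # B1 is sent
cache.write(0, "A1")  # not pending: recorded in UHT-C only
cache.dump_step(2)
cache.dump_step(3)    # already sent ahead of its update, so skipped
```

The log ends up holding the values the cache contained when the dump started (`D1`, not `D2`), while the writes made during the dump wait in UHT-C for the next dump.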

9 Illustration of Non-Blocking Incremental Cache Dumping (NICD): 1
  - UHT-P indicates the lines to be dumped; each line is dumped, then its UHT entry is reset
  - Line F is updated while line B is being dumped:
    1. Dump F, then reset its UHT-P entry
    2. Update F, then set its UHT-C entry

10 Illustration of Non-Blocking Incremental Cache Dumping (NICD): 2
  - Line F: its UHT-P entry is already '0' because F was dumped ahead of its update, so it is not dumped again
  - Line C is updated, but this does not affect the current dump: UHT-P = '0', so C is not dumped
  - Line H is updated, but this does not affect the current dump: UHT-P = '0', so H is not dumped
Ready for the next dump; the two tables swap roles:
  - UHT-P: captures further updates
  - UHT-C: indicates the lines to be dumped

11 Hardware Implementation: NICD Architecture
  - Counter: index of the line being dumped
  - Mask: address of the updated window
  - Signals exported from the cache
    - W_sel: updated way
    - Write: cache update
    - Dump: ready for dump
  - UHTs: track cache updates; one table is used for recording updates, the other for dumping

12 Hardware Implementation: Operation Flow
  1. Dump_S: start the dump
  2. Sense Valid and Dump -> move the data lines to the buffer (dump the line; Valid comes from the UHT)
  3. Cache updates (Write): if the UHT entry = '1', dump the line in advance

13 Experimental Results: Lines Dumped at Various Dump Intervals / Window Sizes
For CHESS
  - Lines dumped increase with window size and dump interval
For HMMER
  - The difference with respect to window size is minimal
For window size 1, only 36% of the total lines are dumped on average
Figure annotation: lines dumped increase with the dump interval

14 Experimental Results: Processor Stalls with NICD
A cache update during the dumping of a window causes a stall
For CHESS
  - Average 0.0005% stall overhead for window size 2
For HMMER
  - Average 0.0001% stall overhead for window size 2
    - Memory requests are spread over time, with infrequent updates
Figure annotation: stalls increase with window size

15 Experimental Results: Dump Time Overhead for NICD
Total dump time overhead
  - Processor stall overhead + dumping overhead (bus busy during dump)
For CHESS
  - 0.0002% dump time for all dump intervals (window size 1)
    - As a percentage of the original dump time
For HMMER
  - 0.0009% ~ 0.003% dump time for all dump intervals (window size 1)
Figure annotation: overall dump time follows the trend of processor stalls (it increases with window size)
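
The overhead metric on this slide is a sum of two components expressed relative to the original dump time. The helper below only restates that arithmetic; the function name, the use of cycles as the unit, and the input numbers are hypothetical and are not measurements from the paper.

```python
def dump_time_overhead_pct(stall_cycles, bus_busy_cycles, original_dump_cycles):
    """Total NICD dump time overhead as a percentage of the original dump time.

    Overhead = processor stall overhead + dumping overhead (bus busy during dump).
    """
    if original_dump_cycles <= 0:
        raise ValueError("original_dump_cycles must be positive")
    return 100.0 * (stall_cycles + bus_busy_cycles) / original_dump_cycles


# Made-up inputs: 20 stall cycles and 80 bus-busy cycles against a
# 1,000,000-cycle full blocking dump.
overhead = dump_time_overhead_pct(20, 80, 1_000_000)
```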

16 Experimental Results: Area / Access Time
Additional area / timing overhead
  - For BICD
    - Requires a UHT (window size varied between 1 and 16)
      - Area: 0.24 ~ 0.03 mm²
      - Timing: no overhead (UHT access time is smaller than cache access time)
  - For NICD
    - Dump logic (requires 2 UHTs)
      - Area: twice that of BICD (no extra timing overhead)
    - Cache modification for online dumping
      - Area difference is 0.0002 mm² (no extra timing overhead)

              Original cache controller   BICD              NICD
Area (mm²)    38.9                        + (0.24 ~ 0.03)   + (0.48 ~ 0.06) dump logic, + (0.0002) cache modification
Timing (ns)   2.63                        + (0)             + (0)

Synthesis technology: 180 nm

17 Conclusions
This paper proposed incremental cache dumping
  - Goal: reduce the transfer time and the logic analyzer space requirement
    - Two hardware mechanisms
      - Blocking Incremental Cache Dumping (BICD)
      - Non-blocking Incremental Cache Dumping (NICD)
The results show that
  - Incremental dumping reduces the lines dumped by 64%
  - BICD reduces the dump time to 16.2% of the original dump time
  - NICD reduces the dump time to 0.0002% of the original dump time

18 Comments for This Paper
Good points
  - Helped me understand how to use cache dumping for debug
    - Signature-based debugging approach
      - Maps a sequence of events into a cache state dump
      - Factors that contribute to the dump time overhead
Things that can be improved
  - Why is the "dump line's index" not fed into the UHT?
  - From the architecture, the cache seems to use single-port SRAM
    - How are a cache line dump and a normal cache access carried out simultaneously?
  - The environment for transferring data from the dump logic to the logic analyzer is not clear

