Presenter: Shao-Jay Hou. In the multicore era, capturing execution traces of processors is indispensable to debugging complex software. The inability.

1 Presenter: Shao-Jay Hou

2 In the multicore era, capturing execution traces of processors is indispensable to debugging complex software. The inability to transfer vast amounts of trace data off-chip without significant slow-down has impeded the debugging of such software, in both pre-silicon emulation and in real designs. We consider on-chip trace compression performed in hardware to reduce data volume, using techniques that exploit inherent higher-order redundancy in address trace data. While hardware trace compression is often restricted to poor or moderate performance due to area and memory constraints, we present a parameterizable scheme that leverages the resources already found on existing platforms. Harnessing resources such as existing trace buffers on CPUs, and unused embedded memory on FPGA emulation platforms, our trace compression scheme requires only a small additional hardware area to achieve superior compression ratios.

3 MPSoCs run multi-threaded programs.
- Traditional debug methods cannot be used.
- Non-invasive methods (on-chip emulation) are a good alternative, but they produce an immense amount of data that must be either stored on-chip or transferred off-chip in real time.
- Tracing a 32-bit processor at 1 clock per instruction and 100 MHz produces 400 MB/s of data.
- The data needs to be compressed.
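
The 400 MB/s figure follows from one 4-byte address per instruction at 100 million instructions per second. A minimal sketch of that arithmetic (the function name and parameters are illustrative, not from the slides):

```python
def trace_bandwidth_bytes_per_s(address_bits: int,
                                clock_hz: float,
                                clocks_per_instruction: float = 1.0) -> float:
    """Raw address-trace bandwidth: one full address is emitted per instruction."""
    bytes_per_address = address_bits / 8
    instructions_per_s = clock_hz / clocks_per_instruction
    return bytes_per_address * instructions_per_s


# Slide 3 numbers: 32-bit processor, 1 clock per instruction, 100 MHz.
rate = trace_bandwidth_bytes_per_s(32, 100e6)
print(f"{rate / 1e6:.0f} MB/s")  # prints "400 MB/s"
```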

4 Diagram with three groups: trace compression schemes, compression methods, and some example tools. Items shown: compression algorithms [5]; combined MTF and LZ [1]; DMTF [17]; multi-stage compression [11]; Lempel-Ziv (LZ) [18]; MCDS [12]; ARM ETM [2]; this paper.

5

6

7 Why?
- Instructions execute consecutively until a branch is reached.
- Only the branch target address then needs to be recorded.
How?
- Each entry is divided into two parts: an address and a length.
Example:
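
One way to picture the address/length split is as run-length encoding of sequential instruction addresses. The sketch below is an illustration only: the 4-byte instruction size is an assumption based on the fixed 32-bit instruction length mentioned on slide 18, and the function names and sample trace are hypothetical, not the paper's hardware design.

```python
from typing import Iterable, List, Tuple

INSTRUCTION_BYTES = 4  # assumption: fixed 32-bit instructions


def encode_runs(addresses: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive instruction addresses into (start address, length) pairs."""
    runs: List[Tuple[int, int]] = []
    start = None
    length = 0
    for addr in addresses:
        if start is not None and addr == start + length * INSTRUCTION_BYTES:
            length += 1  # still executing sequentially
        else:
            if start is not None:
                runs.append((start, length))
            start, length = addr, 1  # a branch was taken: start a new run
    if start is not None:
        runs.append((start, length))
    return runs


def decode_runs(runs: Iterable[Tuple[int, int]]) -> List[int]:
    """Expand (start address, length) pairs back into the full address trace."""
    return [start + i * INSTRUCTION_BYTES
            for start, length in runs
            for i in range(length)]


# Hypothetical trace: a 3-instruction block, a 2-instruction block, then the first block again.
trace = [0x1000, 0x1004, 0x1008, 0x2000, 0x2004, 0x1000, 0x1004, 0x1008]
runs = encode_runs(trace)
print([(hex(a), n) for a, n in runs])
```

Eight addresses shrink to three pairs here, and `decode_runs` restores the original trace exactly.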

8

9 Why?
- A branch is either taken or not taken.
- Instruction streams show sequential locality.
How?
- The stage works similarly to a cache:
  - a miss occurs the first time a set of instructions is encountered;
  - a hit occurs for every subsequent encounter that matches the prediction.
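
The cache-like behaviour can be sketched as a small prediction table that remembers which run followed the previous one. This is a simplified illustration under stated assumptions, not the paper's microarchitecture: the direct-mapped table, its size, the index function, and the `("hit",)` / `("miss", run)` symbols are all hypothetical.

```python
from typing import List, Optional, Tuple

Run = Tuple[int, int]  # (start address, length)


def _index(run: Run, table_size: int) -> int:
    # Hypothetical index function: word address modulo the table size.
    return (run[0] // 4) % table_size


def predict_stream(runs: List[Run], table_size: int = 256) -> List[tuple]:
    """Emit ("hit",) when the table predicted this run, else ("miss", run)."""
    table: List[Optional[Run]] = [None] * table_size
    out: List[tuple] = []
    prev: Optional[int] = None
    for run in runs:
        if prev is not None and table[prev] == run:
            out.append(("hit",))
        else:
            out.append(("miss", run))
            if prev is not None:
                table[prev] = run  # learn what follows the previous run
        prev = _index(run, table_size)
    return out


def unpredict_stream(symbols: List[tuple], table_size: int = 256) -> List[Run]:
    """Mirror the encoder's table to recover the original runs."""
    table: List[Optional[Run]] = [None] * table_size
    runs: List[Run] = []
    prev: Optional[int] = None
    for sym in symbols:
        if sym[0] == "hit":
            run = table[prev]
        else:
            run = sym[1]
            if prev is not None:
                table[prev] = run
        runs.append(run)
        prev = _index(run, table_size)
    return runs


# A loop alternating between two blocks: misses while learning, hits afterwards.
loop = [(0x1000, 3), (0x2040, 2)] * 3
symbols = predict_stream(loop)
print([s[0] for s in symbols])
```

Because the decoder updates an identical table in the same order, a hit needs no address payload at all.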

10

11 Why?
- MTF: increases the relevance.
- Prefix: assists differential compression.
How?
- Take the input address and the predicted address.
- Apply differential compression.
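
The two ideas named on this slide can be illustrated separately. The move-to-front (MTF) transform below is the standard textbook version. The differential step is a guess at the general shape only: XOR as the difference operator and a prefix byte whose bits flag the non-zero difference bytes are assumptions made for this sketch, not details confirmed by the slides.

```python
from typing import Iterable, List, Sequence


def mtf_encode(symbols: Iterable, alphabet: Sequence) -> List[int]:
    """Standard move-to-front: recently seen symbols get small indices."""
    table = list(alphabet)
    out = []
    for s in symbols:
        i = table.index(s)
        out.append(i)
        table.insert(0, table.pop(i))  # move the symbol to the front
    return out


def diff_with_prefix(actual: int, predicted: int) -> bytes:
    """Hypothetical differential encoding of a 32-bit address.

    Assumption: the difference is an XOR, and a leading prefix byte marks
    which of the four difference bytes are non-zero and therefore sent.
    """
    delta = (actual ^ predicted) & 0xFFFFFFFF
    prefix = 0
    payload = bytearray()
    for i in range(4):
        b = (delta >> (8 * i)) & 0xFF
        if b:
            prefix |= 1 << i
            payload.append(b)
    return bytes([prefix]) + bytes(payload)


print(mtf_encode("aabbb", "ab"))
print(diff_with_prefix(0x1008, 0x1008).hex())  # perfect prediction: prefix byte only
print(diff_with_prefix(0x1010, 0x1008).hex())  # one differing byte: prefix plus one byte
```

Under these assumptions an encoded address occupies between 1 and 5 bytes, and a good prediction costs only the prefix byte.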

12

13 Why?
- The prefix byte can itself be compressed.
- Prefix values occur with different probabilities.
How?
- Use Huffman encoding.
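
Huffman coding assigns shorter codes to more frequent symbols, which pays off when a few prefix values dominate. The sketch below builds a code table in software from sample counts; the prefix values and their frequencies are made up for illustration, and the slides do not say how the hardware code table is constructed.

```python
import heapq
from collections import Counter
from typing import Dict, Iterable


def huffman_code(symbols: Iterable[int]) -> Dict[int, str]:
    """Build a Huffman code (symbol -> bit string) from observed frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (count, tie-breaker, partial code table for that subtree).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, codes1 = heapq.heappop(heap)
        c2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


# Hypothetical prefix-byte sample: value 0 (perfect prediction) dominates.
prefixes = [0] * 8 + [1] * 4 + [3] * 2 + [15] * 1
code = huffman_code(prefixes)
for value, bits in sorted(code.items(), key=lambda kv: len(kv[1])):
    print(f"prefix {value:2d} -> {bits}")
```

With these counts the most common prefix gets a 1-bit code and the two rarest get 3-bit codes, instead of 8 bits each.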

14

15 Why?
- The data coming from the MTF/AE stage is 5 bytes wide.
- The output to the LZ stage is only 1 byte wide.
How?
- Use a small buffer to hold the data in between.
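
The width mismatch can be modelled in software as a FIFO that accepts a multi-byte packet and releases one byte per request. This is a behavioural sketch only; the class name and its methods are hypothetical, and a hardware buffer would also need a bounded depth and back-pressure, which are omitted here.

```python
from collections import deque
from typing import Optional


class ByteSerializer:
    """Accepts packets of several bytes and hands them out one byte at a time."""

    def __init__(self) -> None:
        self._fifo = deque()

    def push(self, packet: bytes) -> None:
        # Iterating over bytes yields ints, so each byte becomes one FIFO entry.
        self._fifo.extend(packet)

    def pop(self) -> Optional[int]:
        """Return the next byte, or None when the buffer is empty."""
        return self._fifo.popleft() if self._fifo else None

    def __len__(self) -> int:
        return len(self._fifo)


buf = ByteSerializer()
buf.push(bytes([0x0F, 0x10, 0x20, 0x30, 0x40]))  # one 5-byte packet
first = buf.pop()
print(hex(first), len(buf))
```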

16

17 Why?
- The input data is highly repetitive.
How?
- Use LZ compression:
  - build a dictionary that stores the repeated parts;
  - do not output the dictionary;
  - during decompression, rebuild an identical dictionary.
- Output is not produced every cycle.
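
The slide does not name the LZ variant. The sketch below uses LZW, a well-known member of the family, purely to illustrate the points listed: the dictionary is never transmitted, the decompressor rebuilds the same dictionary from the codes alone, and a code is emitted only when the current phrase can no longer be extended rather than on every input byte.

```python
from typing import List


def lzw_compress(data: bytes) -> List[int]:
    """LZW: emit a dictionary index each time the current phrase stops matching."""
    dictionary = {bytes([i]): i for i in range(256)}
    current = b""
    codes: List[int] = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in dictionary:
            current = candidate  # keep extending the phrase, no output yet
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = len(dictionary)  # learn the new phrase
            current = bytes([value])
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes: List[int]) -> bytes:
    """Rebuild the same dictionary on the fly; it is never sent by the encoder."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    previous = dictionary[codes[0]]
    out = bytearray(previous)
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            entry = previous + previous[:1]  # phrase defined by this very step
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out += entry
        dictionary[len(dictionary)] = previous + entry[:1]
        previous = entry
    return bytes(out)


# Hypothetical repetitive byte stream, such as a tight loop might produce.
stream = bytes([0x01, 0x18, 0x00]) * 20
codes = lzw_compress(stream)
print(len(stream), "bytes in,", len(codes), "codes out")
```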

18 Benchmark: MiBench. CPU: Apple PowerMac G4 with a 1.25 GHz PowerPC 7455, a 32-bit fixed instruction-length processor, running Linux SMP kernel 2.6.32-24. Simulation software: ModelSim SE-64 v6.5c.

19 Logic utilization. Usage scenario: JTAG; software fault 10 -3

20 This paper presented a parameterizable microarchitecture for address trace compression, suited to implementation on ASICs and modern FPGAs. It achieves a better compression ratio than other schemes.

21 The paper uses a dictionary-based, multi-stage compression method that can be used to improve our tracer. The paper gives inspiration for future work on our tracer. (Diagram labels: CPU, GPU, Bus, B.T., P.T., T.M.)

