Detailed look at the TigerSHARC pipeline: Cycle counting for the IALU version of the DC_Removal algorithm


1 Detailed look at the TigerSHARC pipeline: Cycle counting for the IALU version of the DC_Removal algorithm

2 DC_Removal algorithm performance 2 / 28 To be tackled today
- Expected and actual cycle count for the J-IALU version of the DC_Removal algorithm
- Understanding why the stalls occur and how to fix them
- Differences between the first time into a function (cache empty) and the second time into the function

3 Set up time
- In principle 1 cycle / instruction
- 2 + 4 instructions

4 First key element: Sum Loop, order (N)
- 4 instructions + N * 5 instructions
Second key element: Shift Loop, order (log2 N)
- 1 + 2 * log2 N instructions

5 Third key element: FIFO circular buffer, order (N)
- Update outgoing parameters: 6 instructions
- Update FIFO: 3 + 6 * N instructions
- Function return: 2 instructions
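The three key elements can be pictured with a minimal C sketch. The slides do not include the source code, so everything here is an assumption made for illustration: the function name `dc_removal_sketch`, the buffer names, N = 128, and the FIFO update written as a plain move-down loop rather than hardware circular addressing. It is not the IALU assembly whose cycles are counted in this deck.

```c
#include <assert.h>

#define N 128                /* FIFO length; assumed to be a power of two */

static int left_buffer[N];   /* hypothetical FIFO for the left channel  */
static int right_buffer[N];  /* hypothetical FIFO for the right channel */

/* Removes the running average (the DC level) from one left/right sample pair. */
void dc_removal_sketch(int *left, int *right)
{
    int left_sum = 0, right_sum = 0;
    int i, count;

    assert(left != 0 && right != 0);
    assert((N & (N - 1)) == 0);   /* the shift loop needs a power of two */

    /* Insert the new values into the buffers. */
    left_buffer[N - 1] = *left;
    right_buffer[N - 1] = *right;

    /* SUM LOOP: order N. */
    for (i = 0; i < N; i++) {
        left_sum += left_buffer[i];
        right_sum += right_buffer[i];
    }

    /* SHIFT LOOP: order log2 N; one right shift per pass divides by N overall.
       Right-shifting a negative int is implementation-defined in C. */
    for (count = N; count > 1; count >>= 1) {
        left_sum >>= 1;
        right_sum >>= 1;
    }

    /* Update the outgoing parameters. */
    *left -= left_sum;
    *right -= right_sum;

    /* Update the FIFO: order N; move every entry down one slot. */
    for (i = 0; i < N - 1; i++) {
        left_buffer[i] = left_buffer[i + 1];
        right_buffer[i] = right_buffer[i + 1];
    }
}
```

With a constant input, the FIFO fills after N calls and the output settles to zero, which is the expected behaviour of a DC-removal filter.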

6 TigerSHARC pipeline

7 Using the “Pipeline Viewer”
- Available with the TigerSHARC simulator ONLY
- VIEW | Debug Windows | Pipeline viewer
- F1 to F4: instruction fetch unit pipeline
- PD, D, I: Integer ALU pipeline
- A, EX1, EX2: Compute Block pipeline

8 Pipeline symbols (Control-click)
- A: Abort
- B: Bubble
- H: BTB Hit (Jumps)
- S: Stall
- W: Wait
- X: Illegal fetch (F1 to F4)
- X: Illegal instruction (PD to EX2)

9 Time in theory
Set up pointers to buffers     2
Insert values into buffers     4
SUM LOOP                       4 + N * 5
SHIFT LOOP                     1 + 2 * log2 N
Update outgoing parameters     6
Update FIFO                    3 + 6 * N
Function return                2
---------------------------
Total                          22 + 11 N + 2 log2 N
- N = 128: instructions = 1444
- 1444 cycles + 1100 delay cycles
- C++ debug mode: 9500 cycles???????
- Note: other tests were executed before this test, which means “cache filled”
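The theoretical count can be checked with a few lines of C. The per-stage numbers are the ones on the slide; the function names `ilog2` and `theoretical_instructions` are hypothetical helpers introduced only for this check.

```c
#include <assert.h>

/* Integer log2 of a power-of-two n (hypothetical helper). */
unsigned ilog2(unsigned n)
{
    unsigned k = 0;
    assert(n != 0 && (n & (n - 1)) == 0);  /* n must be a power of two */
    while (n > 1) {
        n >>= 1;
        k++;
    }
    return k;
}

/* Sums the per-stage instruction counts listed on the slide. */
unsigned theoretical_instructions(unsigned n)
{
    unsigned setup      = 2 + 4;             /* pointers + buffer inserts  */
    unsigned sum_loop   = 4 + 5 * n;         /* SUM LOOP                   */
    unsigned shift_loop = 1 + 2 * ilog2(n);  /* SHIFT LOOP                 */
    unsigned update_out = 6;                 /* outgoing parameters        */
    unsigned fifo       = 3 + 6 * n;         /* FIFO update                */
    unsigned ret        = 2;                 /* function return            */

    return setup + sum_loop + shift_loop + update_out + fifo + ret;
}
```

For N = 128 the stages add up to 22 + 11 * 128 + 2 * 7 = 1444, matching the slide.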

10 Test environment
- Examine the pipeline the 2nd time around the loop
- “Cache’s filled”?

11 Set up time
- Expected: 2 + 4 instructions
- Actual: 2 + 4 instructions + 2 stalls
- Why not 4 stalls?

12 First time round sum loop
- Expected: 9 instructions
- LC0 load: 3 stalls
- Each memory fetch: 4 stalls
- Actual: 9 + 11 stalls

13 Other times around the loop
- Expected: 5 instructions
- Each memory fetch: 4 stalls
- Actual: 5 + 8 stalls
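The observed sum-loop numbers can be turned into a small cycle model. This is a sketch under one inference that the slides do not state outright: that each pass makes two memory fetches, since 11 = 3 + 2 * 4 and 8 = 2 * 4. The constant and function names are hypothetical.

```c
#include <assert.h>

enum {
    LC0_LOAD_STALLS  = 3,  /* seen once, on the first pass               */
    STALLS_PER_FETCH = 4,  /* per memory fetch, cache filled             */
    FETCHES_PER_PASS = 2   /* inferred from 11 = 3 + 2*4 and 8 = 2*4     */
};

/* Cycles for the whole sum loop: first pass 9 + 11, later passes 5 + 8. */
unsigned sum_loop_cycles(unsigned n)
{
    unsigned first, later;

    assert(n >= 1);
    first = 9 + LC0_LOAD_STALLS + FETCHES_PER_PASS * STALLS_PER_FETCH;
    later = 5 + FETCHES_PER_PASS * STALLS_PER_FETCH;
    return first + (n - 1) * later;
}
```

Under this model the N = 128 sum loop costs 20 + 127 * 13 = 1671 cycles, of which 644 are instructions and 1027 are stalls.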

14 Shift Loop, 1st time around
- Expected: 3 instructions
- No stalls on LC0 load?
- 4 stalls on ASHIFTR
- BTB hit followed by 5 aborts

15 Shift loop, 2nd and later times around
- Expect 2
- Get 2

16 Store back of &left, &right
- Expect 6
- Actual 6 + 3 stalls

17 Exercise 1
- Based on knowledge to this point, determine the expected stalls during the last piece of code: the FIFO buffer operation

18 Third key element: FIFO circular buffer, order (N)
- Update outgoing parameters: 6 instructions
- Update FIFO: 3 + 6 * N instructions
- Function return: 2 instructions

19 Answer


23 Second time into function

24 What happens if cache not full (first time function called)?
- Was 2 + 2 stalls in loop
- Now 11 + 12 stalls in loop

25 First time function called, 2nd time around the loop
- Ditto 3rd, 4th, 5th, 6th, 7th, 8th times

26 9th time around the loop
- Ditto 17th, 25th, 33rd, 41st, 49th

27 What is happening?
- With cache filled: memory read accesses require 4 cycles
- Unfilled: first one requires “12 cycles”
- Then next 7 require 4 cycles
- Total guess: is the extra time associated with doing extra reads to fill the cache?
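The guess above can be written as a toy cost model: with the cache unfilled, every eighth read (the 1st, 9th, 17th, ...) pays 12 cycles and the rest pay 4. This only restates the observed pattern; it is not a description of how the TigerSHARC memory system works, and the names below are hypothetical.

```c
#include <assert.h>

enum {
    HIT_CYCLES     = 4,   /* read with the cache filled                  */
    MISS_CYCLES    = 12,  /* first read of a group, cache unfilled       */
    READS_PER_FILL = 8    /* one slow read is followed by 7 fast ones    */
};

/* Total read cycles for sequential reads under the observed pattern. */
unsigned read_cycles(unsigned reads, int cache_filled)
{
    unsigned total = 0;
    unsigned i;

    for (i = 0; i < reads; i++) {
        int miss = !cache_filled && (i % READS_PER_FILL == 0);
        total += miss ? MISS_CYCLES : HIT_CYCLES;
    }
    return total;
}
```

For 128 sequential reads the model gives 512 cycles with the cache filled and 640 cycles unfilled, an extra 8 cycles per group of eight reads.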

28 Tackled today
- Expected and actual cycle count for the J-IALU version of the DC_Removal algorithm
- Understanding why the stalls occur and how to fix them
- Differences between the first time into a function (cache empty) and the second time into the function
- Further unknowns: how memory operations really work

