Understanding the TigerSHARC ALU pipeline


1 Understanding the TigerSHARC ALU pipeline
Determining the speed of one stage of IIR filter – Part 2 Understanding the pipeline

2 Understanding the TigerSHARC ALU pipeline
TigerSHARC has many pipelines
If these pipelines stall, then the processor speed goes down
Need to understand how the ALU pipeline works
Learn to use the pipeline viewer
Understanding what the pipeline viewer tells us in detail
Avoiding having to use the pipeline viewer
Improving code efficiency
Excel and Project (Gantt charts) are useful tools
1/2/2019 Speed IIR -- stage 1, M. Smith, ECE, University of Calgary, Canada

3 Register File and COMPUTE Units

4 Simple Example IIR -- Biquad
For (Stages = 0 to 3) Do
    S0 = Xin * H5 + S2 * H3 + S1 * H4
    Yout = S0 * H0 + S1 * H1 + S2 * H2
    S2 = S1
    S1 = S0
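The update equations for a single stage can be tried out in a few lines. This is a minimal Python sketch, not the TigerSHARC code from the slides: the function name `biquad_stage`, the coefficient list `h` (holding H0..H5 in that order) and the `(S1, S2)` state tuple are assumptions made for illustration.

```python
def biquad_stage(x_in, h, state):
    """One biquad stage using the slide's update equations.

    x_in  : input sample (Xin)
    h     : six coefficients, h[0]..h[5] standing for H0..H5 (assumed layout)
    state : tuple (S1, S2) holding the two delayed values
    Returns (Yout, new_state).
    """
    s1, s2 = state
    s0 = x_in * h[5] + s2 * h[3] + s1 * h[4]
    y_out = s0 * h[0] + s1 * h[1] + s2 * h[2]
    # S2 = S1, then S1 = S0
    return y_out, (s0, s1)


# Usage: made-up coefficients, chosen only so the arithmetic is easy to follow.
coeffs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y, new_state = biquad_stage(1.0, coeffs, (1.0, 1.0))
print(y, new_state)
```

Keeping the state outside the function makes it easy to compare this reference against an assembly version one sample at a time.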

5 Code returns float when using XR8 register (NOTE: NOT XFR8)

6 Step 2 – Using C++ code as comments set up the coefficients
XFR0 = 0.0;;    Does not exist
XR0 = 0.0;;     DOES EXIST
Bit-patterns require integer registers
Leave what you wanted to do behind as comments

7 Expect to take 8 cycles to execute

8 PIPELINE STAGES See page 8-34 of Processor manual
10 pipeline stages, but may be completely desynchronized (happen semi-independently)
Instruction fetch: F1, F2, F3 and F4
Integer ALU: PreDecode, Decode, Integer, Access
Compute Block: EX1 and EX2

9 Pipeline Viewer Result
XR0 = enters PD 39025, enters E2 stage at cycle is stored into XR0 at cycle cycles execution time

10 Pipeline Viewer Result
XR6 = enters PD stage at cycle 39032 enters E2 stage at cycle is stored into XR6 at cycle cycles execution time
Each instruction takes 7 cycles, but one new result is produced each cycle
Result: once the pipeline is filled, 8 cycles = 8 register transfer operations
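The difference between the 7-cycle latency of one instruction and the one-result-per-cycle throughput of a full pipeline can be checked with a small model. This is a sketch in Python, not TigerSHARC code. The function name `store_cycles` and the convention that an instruction entering the pipeline at cycle c is stored at cycle c + latency are assumptions made for illustration, and the model ignores stalls.

```python
def store_cycles(first_pd_cycle, n_instructions, latency=7):
    """Cycle at which each result is stored.

    Assumes one instruction enters the pipeline per cycle, nothing stalls,
    and an instruction entering at cycle c is stored at cycle c + latency.
    """
    return [first_pd_cycle + i + latency for i in range(n_instructions)]


# Eight independent register loads, starting at an illustrative cycle number.
stores = store_cycles(first_pd_cycle=39025, n_instructions=8)

# Gap between consecutive stores: the throughput once the pipeline is full.
gaps = [later - earlier for earlier, later in zip(stores, stores[1:])]

# Number of cycles spanned from the first store to the last store, inclusive.
cycles_for_all_results = stores[-1] - stores[0] + 1
print(gaps, cycles_for_all_results)
```

Each instruction still takes `latency` cycles on its own; only the spacing between results shrinks to one cycle.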

11 Doing filter operations generates different results
XR8 = XR enters PD at 39833, enters EX2 at 39838, stored -- 7 cycles
XFR23 = R9 * R4 enters PD at 39834, enters EX2 at 39839, stored -- 7 cycles
XFR0 = R0 + R23 enters PD at 39835, enters EX2 at 39841, stored -- 8 cycles
WHY? FIND OUT WITH MOUSE CLICK ON S MARKER THEN CONTROL

12 Instruction 0x17e, XFR8 = R8 + R23, is STALLED (waiting) for 0x17d, XFR23 = R8 * R4, to complete
Bubble B means that the pipeline is doing “nothing”
Meaning that the instruction shown is a “place holder” (garbage)

13 Information on Window Event Icons

14 Result of Analysis
Can’t use a float result immediately after calculation
Writing
    XFR23 = R8 * R4;;
    XFR8 = R8 + R23;;    // MUST WAIT FOR XFR
                         // calculation to be completed
Is the same as coding
    XFR23 = R8 * R4;;
    NOP;;                // Note DOUBLE ;; -- extra cycle because of stall
    XFR8 = R8 + R23;;
Proof: write the code with the stalls shown in it
Writing this way means we don’t have to use the pipeline viewer all the time
Pipeline viewer is only available with the (slow) simulator
#define SHOW_ALU_STALL nop

15 Code with stalls shown
8 code lines
5 expected stalls
Expect 13 cycles to complete if theory is correct

16 Analysis approach IS correct

17 Process for coding for improved speed – code re-organization
Make a copy of the code so you can test iirASM( ) and iirASM_Optimized( ) to make sure you get the correct result
Make a table of code showing ALU resource usage (paper, EXCEL, Project (Gantt chart))
Identify data dependencies
Make all “temp operations” use different registers
Move instructions “forward” to fill delay slots, BUT don’t break data dependencies
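The "move instructions forward without breaking data dependencies" step can be sketched as a small greedy scheduler. This is a Python model under stated assumptions, not the procedure used in the slides: it issues at most one instruction per cycle, assumes a result can be read `latency` cycles after issue (2 means one stall if the consumer follows directly), and the name `list_schedule` and the `(dest, sources)` tuple format are made up for illustration.

```python
def list_schedule(program, latency=2):
    """Greedy scheduler: each cycle, issue the earliest instruction (in
    program order) whose dependencies are satisfied.

    program: list of (dest, sources) pairs, e.g. ("R23", {"R8", "R4"}).
    Returns the issue cycle of each instruction, in program order.
    """
    n = len(program)
    issue = [None] * n

    def ready(j, cycle):
        dest_j, srcs_j = program[j]
        for i in range(j):
            dest_i, srcs_i = program[i]
            raw = dest_i in srcs_j                      # j reads what i writes
            war_or_waw = dest_j in srcs_i or dest_j == dest_i
            if raw or war_or_waw:
                if issue[i] is None:                    # i must be issued first
                    return False
                if raw and issue[i] + latency > cycle:  # result not ready yet
                    return False
        return True

    cycle = 0
    while None in issue:
        for j in range(n):
            if issue[j] is None and ready(j, cycle):
                issue[j] = cycle
                break
        cycle += 1
    return issue


# Two multiply-then-add pairs; in program order each add would stall.
program = [
    ("R23", {"R8", "R4"}),   # R23 = R8 * R4
    ("R8", {"R8", "R23"}),   # R8  = R8 + R23
    ("R24", {"R9", "R5"}),   # R24 = R9 * R5
    ("R9", {"R9", "R24"}),   # R9  = R9 + R24
]
issue_cycles = list_schedule(program)
total_cycles = max(issue_cycles) + 1
print(issue_cycles, total_cycles)
```

In this example the second multiplication moves up into the slot where the first addition would otherwise stall. As with the hand-moved assembly, the reordered result still needs to be tested against the original.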

18 Copy and paste to make IIRASM_Optimized( )

19 Need to re-order instructions to fill delay slots with useful instructions
After refactoring code to fill delay slots, must run tests to ensure that we still have the correct result
Change, and check
NOT EASY
MUST HAVE A PLAN
I USE EXCEL

20 Show resource usage and data dependencies

21 Change all temporary registers to use different register names
Then check that the code produces the correct answer

22 Move instructions forward, without breaking data dependencies
What appears possible!
DO one thing at a time and then check that code still works

23 Check that code still operates: 1 cycle saved

24 Move next multiplication up
NOTE: certain stalls remain, although the reason for the STALL changes

25 Move up the R10 and R9 assignment operations -- check
4 cycle improvement?

26 CHECK THE PIPELINE AFTER TESTING

27 Are there still more improvements possible? (I can see 4 more moves)

28 Problems with approach
Identifying all the data dependencies
Keeping track of how the data dependencies change as you move the code around
Handling all of this “automatically”
I started the following design tool as something that might work, but it actually turned out to be very useful.
M. R. Smith and J. Miller, "Microprocessor Scheduling -- the irony of using Microsoft Project", "Don’t say “CAN’T do it - Say “Gantt it”! The irony of organizing microprocessors with a big business tool", Circuit Cellar magazine, Vol. 184, pp , November 2005.
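The first problem in the list, identifying the data dependencies, is the easiest part to automate. This Python sketch lists read-after-write pairs in an instruction listing. It is not the Microsoft Project approach from the article: the function name `find_raw_dependencies` is hypothetical, registers are matched on the `R<number>` part only (so `XFR23` and `R23` count as the same register), and only the most recent writer of each register is tracked.

```python
import re


def find_raw_dependencies(program):
    """Return (producer_index, consumer_index, register) for every
    read-after-write pair, using the most recent writer of each register."""
    deps = []
    last_writer = {}  # register name -> index of the instruction that wrote it
    for idx, instr in enumerate(program):
        dest, expr = (part.strip() for part in instr.split("=", 1))
        # Source registers, de-duplicated but kept in order of appearance.
        for reg in dict.fromkeys(re.findall(r"R\d+", expr)):
            if reg in last_writer:
                deps.append((last_writer[reg], idx, reg))
        last_writer["R" + re.search(r"\d+", dest).group()] = idx
    return deps


code = [
    "XFR23 = R8 * R4",
    "XFR8 = R8 + R23",
    "XFR24 = R8 * R5",
]
dependencies = find_raw_dependencies(code)
print(dependencies)
```

Re-running the function after each code move gives an updated dependency list, which addresses the second problem in the list (keeping track of how dependencies change).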

29 Using Microsoft Project – Step 1

30 Add dependencies and resource usage – then activate level

31 Microsoft Project as a microprocessor design tool
Will look at this in more detail when we start using memory operations to fill the coefficient and state arrays

32 Understanding the TigerSHARC ALU pipeline
TigerSHARC has many pipelines
If these pipelines stall, then the processor speed goes down
Need to understand how the ALU pipeline works
Learn to use the pipeline viewer
Understanding what the pipeline viewer tells us in detail
Avoiding having to use the pipeline viewer
Improving code efficiency
Excel and Project (Gantt charts) are useful tools

