Understanding the TigerSHARC ALU pipeline


1 Understanding the TigerSHARC ALU pipeline
Determining the speed of one stage of an IIR filter – Part 1: Getting code to work

2 Understanding the TigerSHARC ALU pipeline
TigerSHARC has many pipelines. If these pipelines stall, the processor speed goes down. We need to understand how the ALU pipeline works and learn to use the pipeline viewer. The answer may be different for floating point and integer operations.
12/2/2018 Speed IIR -- stage 1, M. Smith, ECE, University of Calgary, Canada

3 Register File and COMPUTE Units

4 Simple Example IIR -- Biquad
For (Stages = 0 to 3) Do
    S0 = Xin * H5 + S2 * H3 + S1 * H4
    Yout = S0 * H0 + S1 * H1 + S2 * H2
    S2 = S1
    S1 = S0
Not a great bit of IIR code, as it can’t be used in a loop on an array of values, which is what is really necessary.
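The body of the biquad pseudocode above can be sketched in C++ as a reference model to compare against the assembly version as it is developed. This is only a minimal sketch: the names `BiquadState` and `biquadStage`, and the choice to pass the six coefficients H0 to H5 as an array, are assumptions made for illustration and are not taken from the slides. The function performs one pass of the loop body; the surrounding stage loop is left to the caller.

```cpp
#include <array>

// Hypothetical container for the two delayed values S1 and S2.
struct BiquadState {
    float s1 = 0.0f;
    float s2 = 0.0f;
};

// One pass of the loop body from the slide.
// h[0]..h[5] correspond to H0..H5 in the pseudocode.
float biquadStage(float xin, const std::array<float, 6>& h, BiquadState& st) {
    // S0 = Xin * H5 + S2 * H3 + S1 * H4
    const float s0 = xin * h[5] + st.s2 * h[3] + st.s1 * h[4];
    // Yout = S0 * H0 + S1 * H1 + S2 * H2
    const float yout = s0 * h[0] + st.s1 * h[1] + st.s2 * h[2];
    // Shift the delay line: S2 = S1, then S1 = S0
    st.s2 = st.s1;
    st.s1 = s0;
    return yout;
}
```

Calling `biquadStage` repeatedly with the same `BiquadState` object carries S1 and S2 from one call to the next, which is the part the pseudocode leaves implicit.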

5 Set up the tests. Want to make sure the answer stays correct as the code changes
#include <EmbeddedUnit/EmbeddedUnit.h>
#include <EmbeddedUnit/CommonTests.h>
#include <EmbeddedUnit/EmbeddedTests.h>

6 Step 1 – Stub plus return value
Build an assembly language stub for float iirASM(void); Make it return a floating point value of 40.5 to show that we can return a value. J8 is an INTEGER register, so how can we return 40.5? ANSWER: WE DON’T. We return the “bit pattern” for 40.5, which is held in a register in the same way as an “INTEGER” bit pattern.
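To see what "returning the bit pattern" means, the pattern for 40.5 can be inspected from C++. The sketch below assumes `float` is a 32-bit IEEE 754 single-precision value (checked with a `static_assert`); the helper names `floatBits` and `bitsToFloat` are made up for this example.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Return the raw bit pattern of a float as an unsigned integer.
std::uint32_t floatBits(float value) {
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);  // copy bytes, no numeric conversion
    return bits;
}

// Reinterpret a raw bit pattern as a float.
float bitsToFloat(std::uint32_t bits) {
    float value = 0.0f;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

Under that assumption, `floatBits(40.5f)` gives `0x42220000`: sign 0, biased exponent 132 (2^5), fraction 0.265625. An assembly stub that loads this integer constant into the return register hands the caller a value it will read as 40.5.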

7 Code does not work when passing back floats with J8 register
We are passing back 40.5 in the normal return register, but that is obviously NOT what the C++ compiler was expecting. Wrong coding convention.

8 Code does work when using XR8 register – NOTE NOT XFR8

9 Step 2 – Using C++ code as comments -- set up the coefficients
XFR0 = 0.0;;    DOES NOT EXIST as a float instruction
XR0 = 0.0;;     DOES EXIST
Bit patterns require integer X registers. Leave what you wanted to do behind as comments.

10 “ARCHITECTURAL ISSUES”: don’t need special float = constant instructions
Initialize X registers to float values via “integer” operations (XR = ...), then use XFR “float” operations. What I want to do is left behind as comments for the stranger reading my code next week (ME).

11 Modify C++ code so that it can be translated into assembly code
Can only have 1 instruction per line. Code must execute sequentially, so remember the ;;

12 Start with S0 = Xin instruction
Can’t use XFR8 = XFR6 to copy a register

13 Since XFR8 = XFR6 is not allowed, try XR8 = R6;
New TigerSHARC architecture issue: SIMD versus SISD.
SIMD: Single Instruction, Multiple Data. SISD: Single Instruction, Single Data.
R6 means move XR6 and YR6 (a multiple-data move described in 1 instruction).
Try XR8 = XR6 (an integer, bit-pattern move).

14 Some operations are FLOAT operations, some are SISD operations
Some operations are FLOAT operations and must have XFR on the left side of the equation BUT only R on the right. Some operations are SISD operations and must have XR on both sides of the equation (or just R on both sides of the equation, making them SIMD on X and Y, with garbage happening on Y). Personally, I think all these problems are “assembler” issues and could be made consistent.

15 What we have learnt
TigerSHARC has both SISD (single data) and SIMD (multiple data) ability.
XFR4 = R4 * R5;
The answer (left) is single data, so the SISD choice is taken on the right: read XR4 and XR5 (bit patterns), treat them as floats when doing the multiplication (F on left), and store the bit pattern of the answer in XR4.
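The reading of `XFR4 = R4 * R5` described above can be mimicked in C++ as a small model: registers hold raw bit patterns, and the "F" decides how those bits are interpreted for the operation. This is an illustrative sketch only, not a description of the hardware; `ComputeReg`, `asFloat`, `asBits`, and `xfrMultiply` are hypothetical names, and a 32-bit IEEE 754 `float` is assumed.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Hypothetical model of one register number seen in both compute blocks.
struct ComputeReg {
    std::uint32_t x;  // X compute block part, e.g. XR4
    std::uint32_t y;  // Y compute block part, e.g. YR4
};

static float asFloat(std::uint32_t bits) {
    float v = 0.0f;
    std::memcpy(&v, &bits, sizeof v);
    return v;
}

static std::uint32_t asBits(float v) {
    std::uint32_t bits = 0;
    std::memcpy(&bits, &v, sizeof bits);
    return bits;
}

// Models "XFRd = Ra * Rb": only the X parts take part, the bit patterns are
// interpreted as floats, and the bit pattern of the product goes to the X part
// of the destination. The Y part of the destination is left untouched.
void xfrMultiply(ComputeReg& dst, const ComputeReg& a, const ComputeReg& b) {
    dst.x = asBits(asFloat(a.x) * asFloat(b.x));
}
```

Passing the same object as `dst` and `a` mirrors the slide's example, where R4 is both a source and the destination.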

16 What we have learnt
TigerSHARC has both SISD (single data) and SIMD (multiple data) ability.
SISD
    XR4 = XR5;;    Move X part of R5 register into X part of R4 register
    XR4 = YR5;;    Move Y part of R5 register into X part of R4 register
SIMD
    XYR4 = R5;;    Move X part of R5 register into X part of R4 register and Y part of R5 register into Y part of R4 register
    R4 = R5;;      Shorthand version of XYR4 = R5, to confuse you
Does YXR4 = R5 also exist? It would move the X part of the R5 register into the Y part of the R4 register and the Y part of the R5 register into the X part of the R4 register.
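The SISD and SIMD moves listed above can be summarized with a small C++ model in which each register number has an X part and a Y part. It is a sketch for keeping the naming straight, not a model of the real register file; the struct and function names are invented here.

```cpp
#include <cstdint>

// Hypothetical model: one register number, seen in both compute blocks.
struct ComputeReg {
    std::uint32_t x;  // e.g. XR4
    std::uint32_t y;  // e.g. YR4
};

// SISD: XR4 = XR5;;  X part of source into X part of destination.
void moveXFromX(ComputeReg& dst, const ComputeReg& src) { dst.x = src.x; }

// SISD: XR4 = YR5;;  Y part of source into X part of destination.
void moveXFromY(ComputeReg& dst, const ComputeReg& src) { dst.x = src.y; }

// SIMD: XYR4 = R5;; (shorthand R4 = R5;;)  both parts move, X to X and Y to Y.
void moveBoth(ComputeReg& dst, const ComputeReg& src) {
    dst.x = src.x;
    dst.y = src.y;
}
```

In this model the SISD moves never touch `dst.y`, which is why using plain `R` on both sides when only X data is meaningful ends up moving garbage in the Y part.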

17 Disconnect from target and go to simulator

18 Activate Simulator

19 Rebuild the project and set breakpoints at start and end of ASM code

20 Activate the pipeline viewer

21 Adjust the pipeline window
Adjust the pipeline window so that all the instruction pipeline stages are visible. I have just located an arrow icon which causes the pipeline window to fill the screen all the way across.

22 PIPELINE STAGES See page 8-34 of Processor manual
10 pipeline stages, but they may be completely desynchronized (happen semi-independently).
Instruction fetch: F1, F2, F3 and F4
Integer ALU: PreDecode, Decode, Integer, Access
Compute Block: EX1 and EX2

23 PIPELINE STAGES See page 8-34 of Processor manual
Instruction fetch: F1, F2, F3 and F4.
The Fetch Unit pipe is memory driven, not instruction driven. 128 bits are fetched, which may make up 1, 2, 3, or 4 instruction lines (or parts of a couple of instruction lines). Instructions are fetched into the IAB, the instruction alignment buffer.

24 PIPELINE STAGES See page 8-34 of Processor manual
Integer ALU pipe: PD, D, I and A.
PreDecode: the next COMPLETE instruction line (1, 2, 3 or 4 instructions) is fetched from the IAB.
Decode: different instructions are dispatched to different execution units (J-IALU, K-IALU, Compute Blocks).
Data memory access starts in the Integer stage. A stands for the Access stage.
Results are not available until the EX2 stage, but (by register forwarding) can sometimes be accessed earlier.

25 PIPELINE STAGES See page 8-34 of Processor manual
Compute Block: EX1 and EX2.
The result is always written to the target register on the rising edge of CCLK after stage EX2. The following multiple use of a register (read and store) in one line is guaranteed to pipeline correctly:
    R2 = R0 + R1; R6 = R2 * R3;;
The first instruction sets the R2 value at the end of the instruction line; the second instruction uses the R2 value from the beginning of the instruction line.
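The rule for `R2 = R0 + R1; R6 = R2 * R3;;` can be pictured with a tiny C++ model in which every instruction in a line reads the register values from before the line and all results land afterwards. This is a sketch of the stated rule only, using integer arithmetic for simplicity; `RegisterFile` and `executeLine` are hypothetical names, and nothing here models the actual EX1/EX2 timing.

```cpp
#include <array>

// Hypothetical register file with eight registers R0..R7.
struct RegisterFile {
    std::array<int, 8> r{};
};

// Models the single instruction line:  R2 = R0 + R1; R6 = R2 * R3;;
// Every read uses the values held before the line started.
RegisterFile executeLine(const RegisterFile& before) {
    RegisterFile after = before;                  // unchanged registers carry over
    after.r[2] = before.r[0] + before.r[1];       // new R2, visible after the line
    after.r[6] = before.r[2] * before.r[3];       // uses the OLD R2, not the new one
    return after;
}
```

With R0 = 1, R1 = 2, R2 = 10 and R3 = 3 before the line, the model gives R2 = 3 and R6 = 30 afterwards; R6 would be 9 if the two instructions were on separate lines and ran one after the other.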

26 Only interested in later stages of the pipeline. Adjust properties

27 Run the code till the first ASM breakpoint: note down the cycle number (39830)
Then run again till the second ASM breakpoint is reached, and calculate the execution time. Note that an instruction is in the pipeline for a long time before the simulator stops.

28 Pipeline during code execution

29 Pipeline viewer says 26 cycles but what do we expect to get from our code?
The instructions in this part of the code are numbered 1 to 8. We expect 8 cycles in this part of the code, at 1 instruction per clock cycle.

30 Pipeline viewer says 26 cycles but what do we expect -- 21
Roughly 20% error in timing. Too much. Where are the extra cycles coming from? How easy is it to code in such a way that the extra cycles can be removed? ANSWER: fairly straightforward to fix in principle, but it can be difficult in practice.
The instructions in this part of the code are numbered 1 to 13. Again, 1 instruction per cycle is expected: 13 cycles expected + 8 from before = 21.
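The cycle bookkeeping above is simple enough to check by hand, but writing it down makes the comparison explicit. The sketch below just repeats the slide's arithmetic; `CycleReport` and `compareCycles` are invented names, and expressing the extra cycles as a fraction of the measured count is an assumption about how the slide's "20%" was obtained.

```cpp
// Hypothetical summary of expected versus measured cycle counts.
struct CycleReport {
    int expected;                    // sum of the per-section instruction counts
    int extra;                       // measured minus expected
    double extraFractionOfMeasured;  // extra cycles relative to the measured count
};

// Assumes one instruction line per cycle when nothing stalls.
CycleReport compareCycles(int measured, int firstPart, int secondPart) {
    CycleReport report{};
    report.expected = firstPart + secondPart;
    report.extra = measured - report.expected;
    report.extraFractionOfMeasured =
        static_cast<double>(report.extra) / static_cast<double>(measured);
    return report;
}
```

For the numbers on the slides, `compareCycles(26, 8, 13)` reports 21 expected cycles and 5 extra cycles. That is about 19% of the 26 measured cycles (or about 24% of the 21 expected), in line with the "20%" quoted above.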

31 Understanding the TigerSHARC ALU pipeline
TigerSHARC has many pipelines. If these pipelines stall, the processor speed goes down. We need to understand how the ALU pipeline works and learn to use the pipeline viewer. The answer may be different for floating point and integer operations.

