Understanding the TigerSHARC ALU pipeline

Determining the speed of one stage of an IIR filter – Part 1: Getting code to work

TigerSHARC has many pipelines.
If these pipelines stall, then the processor speed goes down.
Need to understand how the ALU pipeline works.
Learn to use the pipeline viewer.
The answer may be different for floating point and integer operations.
12/2/2018 Speed IIR -- stage 1, M. Smith, ECE, University of Calgary, Canada

Register File and COMPUTE Units

Simple Example IIR -- Biquad
For (Stages = 0 to 3) Do
    S0 = Xin * H5 + S2 * H3 + S1 * H4
    Yout = S0 * H0 + S1 * H1 + S2 * H2
    S2 = S1
    S1 = S0
Not a great bit of IIR code, as it can’t be used in a loop on an array of values as is really necessary.
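The update equations above can be sketched in C++ as a reference for checking the assembly version later. This is only a minimal sketch of a single stage: the names BiquadState and iirStage are hypothetical, the coefficient array h[0..5] is assumed to hold H0..H5, and the outer loop over the four stages is left out because the slide does not show how the stages are chained.

```cpp
#include <array>

// Hypothetical container for the two delayed values S1 and S2 of one stage.
struct BiquadState {
    float s1 = 0.0f;
    float s2 = 0.0f;
};

// One pass through the slide's equations for a single stage.
// h[0]..h[5] are assumed to correspond to H0..H5.
inline float iirStage(float xin, const std::array<float, 6>& h, BiquadState& st) {
    const float s0   = xin * h[5] + st.s2 * h[3] + st.s1 * h[4];
    const float yout = s0 * h[0] + st.s1 * h[1] + st.s2 * h[2];
    st.s2 = st.s1;  // S2 = S1
    st.s1 = s0;     // S1 = S0
    return yout;
}
```

Keeping the state in a separate structure is one way to let the same routine be called repeatedly on an array of input values.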

Set up the tests. Want to make sure correct answer as code changes
#include <EmbeddedUnit/EmbeddedUnit.h>
#include <EmbeddedUnit/CommonTests.h>
#include <EmbeddedUnit/EmbeddedTests.h>

Step 1 – Stub plus return value
Build an assembly language stub for float iirASM(void);
Make it return a floating point value of 40.5 to show that we can return a value.
J8 is an INTEGER register, so how can we return 40.5?
ANSWER: WE DON’T. We return the “bit pattern” for 40.5, which is handled the same as an “INTEGER” bit pattern.
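To see what "returning the bit pattern" means, the following host-side C++ sketch extracts the 32-bit pattern of a float. The helper names floatToBits and bitsToFloat are hypothetical, and the sketch assumes the host uses 32-bit IEEE-754 single precision floats.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");

// Hypothetical helper: copy the raw bits of a float into an integer.
inline std::uint32_t floatToBits(float value) {
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Hypothetical helper: reinterpret an integer bit pattern as a float.
inline float bitsToFloat(std::uint32_t bits) {
    float value = 0.0f;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

Under that assumption, floatToBits(40.5f) gives 0x42220000, which is the integer value a register would hold when it "contains" 40.5.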

Code does not work when passing back floats with J8 register
We are passing back 40.5 in the normal return register, but that is obviously NOT what the C++ compiler was expecting.
Wrong code convention.

Code does work when using XR8 register – NOTE NOT XFR8

Step 2 – Using C++ code as comments -- set up the coefficients
XFR0 = 0.0;; DOES NOT EXIST as a float instruction.
XR0 = 0.0;; DOES EXIST.
Bit patterns require integer X registers.
Leave what you wanted to do behind as comments.

“ARCHITECTURAL ISSUES”: DON’T NEED SPECIAL FLOAT = CONSTANT INSTRUCTIONS
Initialize X registers to float values via “integer” operations (XR =).
Then use XFR “float” operations.
What I want to do is left behind as comments for the stranger reading my code next week (ME).

Modify C++ code so that it can be translated into assembly code
Can only have 1 instruction per line.
Code must execute sequentially, so remember the ;;

Start with S0 = Xin instruction
Can’t use XFR8 = XFR6 to copy a register.

Since XFR8 = XFR6 is not allowed, try XR8 = R6;
SIMD: Single Instruction, Multiple Data.
SISD: Single Instruction, Single Data.
R6 means move XR6 and YR6 (a multiple data move described in 1 instruction).
Try XR8 = XR6 instead (an integer, bit-pattern, move).
New TigerSHARC architecture issue: SIMD versus SISD.

Some operations are FLOAT operations and must have XFR on the left side of the equation BUT only R on the right.
Some operations are SISD operations and must have XR on both sides of the equation (or just R on both sides, making them SIMD X and Y operations with garbage happening on Y).
Personally, I think all these problems are “assembler” issues and could be made consistent.

What we have learnt
TigerSHARC has both SISD (single data) and SIMD (multiple data) ability.
XFR4 = R4 * R5;
The answer (left) is single data, so the SISD choice is taken on the right: read XR4 and XR5 (bit patterns), treat them as floats when doing the multiplication (F on left), and store the bit pattern of the answer in XR4.
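The "read bit patterns, treat as floats, store a bit pattern" description of XFR4 = R4 * R5 can be modeled on a host machine. This is only an illustrative sketch, not the processor's implementation: Word, asFloat, asBits and floatMultiply are hypothetical names, and 32-bit IEEE-754 floats are assumed.

```cpp
#include <cstdint>
#include <cstring>

using Word = std::uint32_t;  // a register holds only a 32-bit pattern

static_assert(sizeof(float) == sizeof(Word), "sketch assumes a 32-bit float");

// Hypothetical helpers to move between a bit pattern and its float meaning.
inline float asFloat(Word bits) {
    float f = 0.0f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

inline Word asBits(float f) {
    Word bits = 0;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Model of XFR4 = R4 * R5: the F on the left selects a float multiply,
// the X selects the X compute block copies of R4 and R5.
inline Word floatMultiply(Word xr4, Word xr5) {
    return asBits(asFloat(xr4) * asFloat(xr5));
}
```

For example, the patterns 0x40000000 (2.0) and 0x40400000 (3.0) multiply to 0x40C00000 (6.0); the register file never stores anything other than the patterns.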

What we have learnt
TigerSHARC has both SISD (single data) and SIMD (multiple data) ability.
SISD
XR4 = XR5;; Move X part of R5 register into X part of R4 register
XR4 = YR5;; Move Y part of R5 register into X part of R4 register
SIMD
XYR4 = R5;; Move X part of R5 register into X part of R4 register and Y part of R5 register into Y part of R4 register
R4 = R5;; Shorthand version of XYR4 = R5, to confuse you
Does YXR4 = R5 also exist? Would it move the X part of the R5 register into the Y part of the R4 register and the Y part of the R5 register into the X part of the R4 register?
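The SISD and SIMD moves listed above can be pictured with a small host-side model in which each register number has one copy per compute block. The type DualReg and the three functions are hypothetical names used only for this sketch; they mirror the slide's descriptions rather than any documented behavior beyond them.

```cpp
#include <cstdint>

// Hypothetical model: register Rn exists once in compute block X and once in Y.
struct DualReg {
    std::uint32_t x;  // XRn
    std::uint32_t y;  // YRn
};

// XR4 = XR5;;  SISD move, only the X copy of the destination changes.
inline void moveXFromX(DualReg& dst, const DualReg& src) { dst.x = src.x; }

// XR4 = YR5;;  SISD move as described on the slide: Y copy of source into X copy of destination.
inline void moveXFromY(DualReg& dst, const DualReg& src) { dst.x = src.y; }

// XYR4 = R5;; (or R4 = R5;;)  SIMD move, both copies change in one instruction.
inline void moveBoth(DualReg& dst, const DualReg& src) {
    dst.x = src.x;
    dst.y = src.y;
}
```

The model makes the "garbage happening on Y" remark concrete: a SIMD form always touches the Y copy as well, whether or not the Y data is meaningful.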

Disconnect from target and go to simulator

Activate Simulator

Rebuild the project and set breakpoints at start and end of ASM code

Activate the pipeline viewer

Adjust the pipeline window so that all the instruction pipeline stages can be seen.
Have just located an arrow icon which causes the pipeline window to fill the screen all the way across.

PIPELINE STAGES See page 8-34 of Processor manual
10 pipeline stages, but they may be completely desynchronized (happen semi-independently).
Instruction fetch: F1, F2, F3 and F4
Integer ALU: PreDecode, Decode, Integer, Access
Compute Block: EX1 and EX2

Instruction fetch -- F1, F2, F3 and F4
Fetch Unit Pipe
Memory driven, not instruction driven.
128 bits fetched, which may make up 1, 2, 3, or 4 instruction lines (or parts of a couple of instruction lines).
Instructions are fetched into the IAB, the instruction alignment buffer.

Integer ALU pipe – PD, D, I and A
PreDecode: the next COMPLETE instruction line (1, 2, 3 or 4 instructions) is fetched from the IAB.
Decode: different instructions are dispatched to different execution units (J-IALU, K-IALU, Compute Blocks).
Data memory access starts in the Integer stage.
A stands for the Access stage.
Results are not available until the EX2 stage, but (by register forwarding) can sometimes be accessed earlier.

Compute Block EX1 and EX2
Result is always written to the target register on the rising edge of CCLK after stage EX2.
The following multiple use of a register (read and store) in one line is guaranteed to pipeline correctly:
R2 = R0 + R1; R6 = R2 * R3;;
R2 is written at the end of the instruction line; the R2 value from the beginning of the instruction line is used in the multiplication.
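The read-at-start, write-at-end behavior of that instruction line can be contrasted with two separate lines using a small host-side model. This is a sketch of the semantics as stated above, not of the hardware: RegFile, executeParallelLine and executeTwoLines are hypothetical names, and integer registers are used because the example has no F prefix.

```cpp
#include <array>
#include <cstdint>

// Hypothetical register file: R0..R7 of one compute block.
using RegFile = std::array<std::int32_t, 8>;

// Models  R2 = R0 + R1; R6 = R2 * R3;;  (one instruction line).
// Both operations read the register values held at the start of the line.
inline void executeParallelLine(RegFile& r) {
    const RegFile atStart = r;
    r[2] = atStart[0] + atStart[1];
    r[6] = atStart[2] * atStart[3];  // uses the OLD value of R2
}

// Models  R2 = R0 + R1;;  R6 = R2 * R3;;  (two instruction lines).
inline void executeTwoLines(RegFile& r) {
    r[2] = r[0] + r[1];
    r[6] = r[2] * r[3];  // uses the NEW value of R2
}
```

Starting from R0 = 1, R1 = 2, R2 = 10, R3 = 3, the single line leaves R6 = 30 while the two-line version leaves R6 = 9, so the choice between ; and ;; changes the answer as well as the timing.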

Only interested in later stages of the pipeline. Adjust properties

Run the code till first ASM break point: Note down cycle Number 39830
Then run again till the second ASM breakpoint is reached.
Calculate the execution time.
The instruction is in the pipeline for a long time before the simulator stops.

Pipeline during code execution

Pipeline viewer says 26 cycles, but what do we expect to get from our code?
Counting the instructions (1, 2, 3, 4, 5, 6, 7, 8) gives 8 cycles in this part of the code, as we expect 1 instruction per clock cycle.

Pipeline viewer says 26 cycles but what do we expect -- 21
Roughly 20% error in timing (26 cycles measured against 21 expected). Too much.
Where are the extra cycles coming from?
How easy is it to code in such a way that the extra cycles can be removed?
ANSWER: Fairly straightforward to fix in principle, can be difficult in practice.
Counting the instructions (1 to 13) in this part of the code: again 1 instruction / cycle expected, so 13 cycles expected + 8 from before = 21.
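The cycle bookkeeping above is simple enough to capture in a few lines, which can help when the code is changed and the counts need to be compared again. The names CycleReport and compareCycles are hypothetical, and the percentage is computed relative to the measured count, which is one of several reasonable ways to express the error.

```cpp
// Hypothetical summary of expected versus measured cycle counts.
struct CycleReport {
    int expected;              // cycles predicted at 1 instruction per cycle
    int measured;              // cycles reported by the pipeline viewer
    int extra;                 // measured - expected, the cycles to explain
    double percentOfMeasured;  // extra cycles as a share of the measured count
};

inline CycleReport compareCycles(int firstPart, int secondPart, int measured) {
    const int expected = firstPart + secondPart;
    const int extra = measured - expected;
    return CycleReport{expected, measured, extra, 100.0 * extra / measured};
}
```

With the numbers from the slides, compareCycles(8, 13, 26) reports 21 expected cycles and 5 extra cycles, a little over 19% of the measured count.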

TigerSHARC has many pipelines.
If these pipelines stall, then the processor speed goes down.
Need to understand how the ALU pipeline works.
Learn to use the pipeline viewer.
The answer may be different for floating point and integer operations.
