Understanding the TigerSHARC ALU pipeline: Determining the speed of one stage of IIR filter


Speed IIR -- stage 1, M. Smith, ECE, University of Calgary, Canada 2 / 30 6/18/2015

Understanding the TigerSHARC ALU pipeline
- TigerSHARC has many pipelines.
- If these pipelines stall, then the processor speed goes down.
- Need to understand how the ALU pipeline works.
  - Learn to use the pipeline viewer.
- The answer may be different for floating point and integer operations.

Register File and COMPUTE Units

Simple Example IIR -- Biquad
For (Stages = 0 to 3) Do
    S0 = Xin * H5 + S2 * H3 + S1 * H4
    Yout = S0 * H0 + S1 * H1 + S2 * H2
    S2 = S1
    S1 = S0
[Figure: S0, S1, S2]
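The biquad update on this slide can be sketched in C++ as a reference to test the assembly version against. This is a minimal sketch, not the author's code: the names BiquadState and biquadStage are hypothetical, and the coefficient indexing h[0]..h[5] simply mirrors H0..H5 on the slide.

```cpp
#include <array>

// Hypothetical state holder; the slide only names S0, S1 and S2.
struct BiquadState {
    float s1 = 0.0f;  // S1: state delayed by one sample
    float s2 = 0.0f;  // S2: state delayed by two samples
};

// One biquad stage, following the slide's equations term by term.
// h[0]..h[5] stand for H0..H5.
inline float biquadStage(float xin, const std::array<float, 6>& h, BiquadState& st) {
    const float s0 = xin * h[5] + st.s2 * h[3] + st.s1 * h[4];
    const float yout = s0 * h[0] + st.s1 * h[1] + st.s2 * h[2];
    st.s2 = st.s1;  // S2 = S1
    st.s1 = s0;     // S1 = S0
    return yout;
}
```

Calling biquadStage once per stage, with each stage's own coefficients and state, would reproduce the "For (Stages = 0 to 3)" loop.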

Set up the tests. We want to make sure the answer stays correct as the code changes.

Step 1: Stub plus return value
- Build an assembly language stub for float iirASM(void);
- Make it return a floating point value of 40.5 to show that we can return a value of 40.5.
- J8 is an INTEGER register, so how can we return 40.5?
- ANSWER: WE DON'T. We return the “bit pattern” for 40.5, which is an “INTEGER”.
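To see what "returning the bit pattern" means, the following C++ sketch reinterprets a float as the integer carrying the same 32 bits. The helper name floatBits is hypothetical, and the sketch assumes a 32-bit IEEE 754 float on the machine where it is run.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: return the integer that carries the same 32 bits
// as the given float, i.e. what an integer register would hold.
inline std::uint32_t floatBits(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // copy the bits, no numeric conversion
    return bits;
}
```

Under the IEEE 754 assumption, floatBits(40.5f) is 0x42220000: sign 0, exponent 132 (127 + 5), and mantissa bits 010001 followed by zeros, since 40.5 is 1.010001 (binary) times 2^5.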

Code does not work when passing back floats with the J8 register

Code does work when using the XR8 register. NOTE: NOT XFR8

Step 2: Using C++ code as comments, set up the coefficients
    XFR0 = 0.0;;    Does not exist
    XR0 = 0.0;;     DOES EXIST
- Bit-patterns require integer registers.
- Leave what you wanted to do behind as comments.

Modify C++ code so that it can be translated into assembly code
- Can only have 1 instruction per line.
- Code must execute sequentially, so remember the ;;

Start with S0 = Xin instruction
- Can’t use XFR8 = XFR6 to copy a register.

Since XFR8 = XFR6 is not allowed
- Try XR8 = R6;
  - SIMD: Single Instruction, Multiple Data
  - R6 means move XR6 and YR6 (multiple data move described in 1 instruction)
- Try XR8 = XR6

- Some operations are FLOAT operations and must have XFR on the left side of the equation BUT only R on the right.
- Some operations are SISD operations and must have XR on both sides of the equation (or just R on both sides of the equation, making them SIMD X and Y with garbage happening on Y).
- Personally, I think all these problems are “assembler” issues and could be made consistent.

Disconnect from target and go to simulator

Activate Simulator

Rebuild the project and set breakpoints at start and end of ASM code

Activate the pipeline viewer

Adjust the pipeline window so you can see all the instruction pipeline stages

PIPELINE STAGES
See page 8-34 of Processor manual
- Instruction fetch: F1, F2, F3 and F4
  - Fetch Unit Pipe, memory driven
  - 128 bits fetched; may make up 1, 2, 3, or 4 instructions (or parts of a couple of instructions)
  - Instructions go into the IAB, the instruction alignment buffer
- Integer ALU pipe: PD, D, I and A

PIPELINE STAGES
See page 8-34 of Processor manual
- 10 pipeline stages, but they may be completely desynchronized (happen semi-independently)
- Instruction fetch: F1, F2, F3 and F4
- Integer ALU: PreDecode, Decode, Integer, Access
- Compute Block: EX1 and EX2

PIPELINE STAGES
See page 8-34 of Processor manual
- Instruction fetch: F1, F2, F3 and F4
  - Fetch Unit Pipe
  - Memory driven, not instruction driven
  - 128 bits fetched; may make up 1, 2, 3, or 4 instruction lines (or parts of a couple of instruction lines)
  - Instructions are fetched into the IAB, the instruction alignment buffer

PIPELINE STAGES
See page 8-34 of Processor manual
- Integer ALU pipe: PD, D, I and A
  - PreDecode: the next COMPLETE instruction line (1, 2, 3 or 4 instructions) is fetched from the IAB
  - Decode: different instructions are dispatched to different execution units (J-IALU, K-IALU, Compute Blocks)
  - Data memory access starts in the Integer stage
  - A stands for Access stage
  - Results are not available until the EX2 stage, but (by register forwarding) can sometimes be accessed earlier

PIPELINE STAGES
See page 8-34 of Processor manual
- Compute Block: EX1 and EX2
  - The result is always written to the target register on the rising edge of CCLK after stage EX2
  - The following is guaranteed:
        R2 = R0 + R1; R6 = R2 * R3;;
    - The R2 written by the add is the R2 value at the end of the instruction.
    - The R2 read by the multiply is the R2 value at the beginning of the instruction.
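The guarantee for R2 = R0 + R1; R6 = R2 * R3;; can be modelled in C++ by reading every source register before writing any result. This is only an illustrative model of the behaviour described on the slide, not a simulator: the Regs type, the eight-register file and the name executeLine are assumptions.

```cpp
#include <array>

// Hypothetical model: a register file of 8 floats, R0..R7.
using Regs = std::array<float, 8>;

// Models the instruction line  R2 = R0 + R1; R6 = R2 * R3;;
// Every operation on the line reads register values as they were
// before the line started; results become visible afterwards.
inline void executeLine(Regs& r) {
    const Regs before = r;         // values at the beginning of the line
    r[2] = before[0] + before[1];  // new R2, visible after the line
    r[6] = before[2] * before[3];  // uses the OLD R2, not the sum above
}
```

For example, starting from R0 = 1, R1 = 2, R2 = 10 and R3 = 3, the model leaves R2 = 3 and R6 = 30. R6 is not 9, because the multiply saw the R2 value from before the line.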

Only interested in later stages of the pipeline. Adjust properties

- Run the code until the first ASM breakpoint and note the cycle number.
- Then run again until you reach the second ASM breakpoint.
- Calculate the execution time.

Pipeline viewer says 26 cycles, but what do we expect? 8 cycles.
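The cycle arithmetic behind this comparison is small enough to write down. In the sketch below the helper names are hypothetical, and the breakpoint cycle numbers used in the usage note are made-up placeholders. Only the 26 measured and 8 expected cycles come from the slide.

```cpp
#include <cstdint>

// Hypothetical helper: cycles elapsed between two breakpoints, given the
// cycle numbers noted in the pipeline viewer at each breakpoint.
inline std::uint64_t elapsedCycles(std::uint64_t startCycle, std::uint64_t endCycle) {
    return endCycle - startCycle;
}

// Cycles not accounted for by the expected count (stalls and other overhead).
inline std::int64_t extraCycles(std::uint64_t measured, std::uint64_t expected) {
    return static_cast<std::int64_t>(measured) - static_cast<std::int64_t>(expected);
}
```

If the first breakpoint were hit at cycle 100 and the second at cycle 126, elapsedCycles(100, 126) gives the 26 cycles reported, and extraCycles(26, 8) gives 18 cycles to explain.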

Pipeline viewer says 26 cycles, but how many cycles do we expect?
- Where are the extra cycles coming from, and how easy is it to code in such a way that the extra cycles can be removed?
- ANSWER: Fairly straightforward in idea, can be difficult in practice.

Understanding the TigerSHARC ALU pipeline
- TigerSHARC has many pipelines.
- If these pipelines stall, then the processor speed goes down.
- Need to understand how the ALU pipeline works.
  - Learn to use the pipeline viewer.
- The answer may be different for floating point and integer operations.