Getting serious about “going fast” on the TigerSHARC

Presentation transcript:

Getting serious about “going fast” on the TigerSHARC: What are the characteristics of DSP algorithms?

DC removal Lecture 1, M. Smith, ECE, University of Calgary, Canada, 2/19/2019

Tackled today: What are the basic characteristics of a DSP algorithm? A near-perfect “starting” example: DCRemoval( ) has many of the features of the FIR filters used in all the Labs. Testing the performance of the CPP version. First assembly version, using I-ALU operations: testing and timing. The code will be examined in more detail in the next lecture.

IEEE Micro magazine article: Smith, M. R., “How RISCy is DSP?”, IEEE Micro, vol. 12, no. 6, Dec. 1992, pp. 10-23. Available online via the library “Electronic web links”.

Characteristics of an FIR algorithm: It is one of the three basic types of DSP algorithm: FIR (Type 1), IIR (Type 2) and FFT (Type 3). It is representative of the DSP equations found in filtering, convolution and modeling. It is multiplication / addition intensive, with a simple format within a (long) loop. It makes many memory fetches of fixed and changing data. It must handle an “infinite amount of input data”, so a FIFO buffer is needed when handling ON-LINE data. All calculations “MUST” be completed in the time interval between samples.
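The characteristics listed above (a multiply / add loop, fixed coefficients, changing sample data, and a FIFO of recent input) can be sketched in C++ as follows. This is only an illustration: the names FirFilter and kTaps and the three-tap length are assumptions and do not come from the lecture or the Labs.

```cpp
#include <array>
#include <cstddef>

// Hypothetical filter length, chosen only to keep the example small.
constexpr std::size_t kTaps = 3;

struct FirFilter {
    std::array<int, kTaps> coeffs{};  // fixed data: the filter coefficients
    std::array<int, kTaps> fifo{};    // changing data: the most recent samples

    // Process one input sample and return one output sample.
    int process(int sample) {
        // Update the FIFO: shift older samples along, newest goes to index 0.
        for (std::size_t i = kTaps - 1; i > 0; --i) {
            fifo[i] = fifo[i - 1];
        }
        fifo[0] = sample;

        // Multiplication / addition intensive loop with a simple format.
        int acc = 0;
        for (std::size_t i = 0; i < kTaps; ++i) {
            acc += coeffs[i] * fifo[i];
        }
        return acc;
    }
};
```

Feeding a single 1 followed by zeros into a filter whose coefficients are 1, 2, 3 returns the coefficients one at a time, which is a convenient first check of the loop.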

Comparing IIR and FIR filters: Infinite Impulse Response filters need few operations to produce an output from an input for each IIR stage. Finite Impulse Response filters need many operations to produce an output from an input, plus a long FIFO buffer which may require as many operations as the FIR calculation itself. Easy to optimize.

DCRemoval( ), part of the SDR (my version): Memory intensive. Addition intensive. Loops for main code. FIFO implemented as a circular buffer.

Memory intensive. Addition intensive. Loops for main code. FIFO implemented as a circular buffer. Not as complex as FIR, but with many of the same requirements, so it is easier to handle. You will use the same ideas when optimizing the FIR over Labs 2 and 3. There are two issues: speed and accuracy. Develop suitable tests for the CPP code and check that the various assembly language versions satisfy the same tests.

E-TDD format of DCRemoval( ), perhaps a little unsophisticated: Clear the internal buffer. Put in one known value with a known result (based on MY implementation). If the algorithm runs for long enough, it then gives the correct answer.
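The test recipe above can be sketched in plain C++ without the E-TDD macros. Everything named here is an assumption made for illustration: the class DcRemover, the 8-entry FIFO, the shift-based average, and the choice to return "sample minus running average" are not taken from the lecture's implementation.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kLength = 8;  // hypothetical FIFO length (power of two)
constexpr int kShift = 3;           // log2(kLength), so a shift can replace a divide

class DcRemover {
public:
    // Step 1 of the test: clear the internal buffer.
    void clear() {
        fifo_.fill(0);
        next_ = 0;
    }

    // Store the sample in a circular buffer and return sample minus average.
    int process(int sample) {
        fifo_[next_] = sample;                // overwrite the oldest slot
        next_ = (next_ + 1) & (kLength - 1);  // wrap the index
        int sum = 0;
        for (int value : fifo_) {
            sum += value;
        }
        return sample - (sum >> kShift);
    }

private:
    std::array<int, kLength> fifo_{};
    std::size_t next_ = 0;
};

// Step 2: one known value with a known result. Step 3: run long enough.
bool knownValueTest() {
    DcRemover dc;
    dc.clear();
    // One sample of 80 in a cleared buffer: average 80 / 8 = 10, result 70.
    if (dc.process(80) != 70) {
        return false;
    }
    // Once the buffer is full of the same value, the DC level is removed.
    int last = 0;
    for (std::size_t i = 1; i < kLength; ++i) {
        last = dc.process(80);
    }
    return last == 0;
}
```

The "known result" of 70 depends entirely on this sketch's buffer length and averaging rule, which is the sense in which such a test is tied to one particular implementation.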

First attempt, the “ENCM415 approach”: Use the integer ALU (I-ALU) operations. Why? It looks less complex than the other options, we learn one thing at a time, and it can be done using “direct translation” of the (working) C++ code. Tests: 1) Can we call and return from the assembly code routine? (Understanding the C++ calling conventions.) 2) Does the assembly code routine give the same result as the C++ version? 3) How does the assembly code routine’s performance compare to the C++ version? There is also a hidden (IMPLICIT) test: are there any errors in jumping backwards and forwards between C++ and many assembly code routines? (Detailed understanding of the C++ calling conventions.)

Call and return test: Basically, if the code gets here, then we probably did not crash the system. I use a cut-and-paste approach to develop code variants. This test is (embarrassingly) useful.

Initially we expect the code to fail to work correctly. If the code works initially, then it is doing so by accident. Use XF_CHECK_EQUAL( ). Expected to fail. NOTE: This test is just a “cut-and-paste” version of the C++ test with three changes of function name.

Timing test: Initially, since all we do is “call and return”, we expect (trivially) fast code. Issues: Some algorithms may optimize better when called many times (cache and coding issues). Does it matter whether the function is called once, 10 times or 100 times? Call the function under test within a loop, and make sure that the loop overhead for calling the function does not compromise the timing of the test. This may be important if we develop very optimized code and every last cycle (between interrupts) counts.

Timing test: Once, 10 times, 100 times. Normalize the timing tests to “process the function once”. We need to develop various other routines to make the tests work: a DoNothing loop, and loops that run the C++ and assembly code routines. This may not be performing the timing correctly, but it gives the initial concepts.

Other functions needed to run the test: Do Nothing (careful: it may be optimized to “nothing”), the C++ function loop, and the J-ALU function loop.
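The normalization idea from the timing slides can be sketched as below: time a loop that calls the routine under test, time the same loop calling a do-nothing routine, subtract, and divide by the number of calls. All names (TickSource, Routine, ticksForLoop, ticksPerCall, doNothingSink) are assumptions for illustration. The tick source is passed in so that it could be a hardware cycle counter on the target or a host clock on a PC; neither is shown here.

```cpp
#include <cstdint>
#include <functional>

using TickSource = std::function<std::uint64_t()>;  // returns the current tick count
using Routine = std::function<void()>;              // routine called inside the loop

// A volatile store gives DoNothing a visible side effect, so an optimizer
// is not free to reduce the call to "nothing".
volatile int doNothingSink = 0;
void DoNothing() { doNothingSink = 0; }

// Ticks taken to call body() 'repeats' times, loop overhead included.
std::uint64_t ticksForLoop(const Routine& body, int repeats, const TickSource& now) {
    const std::uint64_t start = now();
    for (int i = 0; i < repeats; ++i) {
        body();
    }
    return now() - start;
}

// Cost normalized to "process the function once", with the cost of the
// baseline (do-nothing) loop subtracted first.
double ticksPerCall(const Routine& underTest, const Routine& baseline,
                    int repeats, const TickSource& now) {
    if (repeats <= 0) {
        return 0.0;
    }
    const std::uint64_t withWork = ticksForLoop(underTest, repeats, now);
    const std::uint64_t overhead = ticksForLoop(baseline, repeats, now);
    const std::uint64_t net = withWork > overhead ? withWork - overhead : 0;
    return static_cast<double>(net) / repeats;
}
```

Calling ticksPerCall with repeats of 1, 10 and 100 gives the three normalized figures the slides compare. Cache effects mean the three values need not agree on real hardware.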

Steps (manual technique): Add tests to the project. Build the connect file so that the tests will be activated. Note the file name and directory name.

Steps (E-TDD GUI technique): Add tests to the project. Build the connect file so that the tests will be activated.

Build and run the code: Build and run manually with BUILD PROJECT and DEBUG | RUN, or build and run using the E-TDD GUI.

Use the build failure information to determine the assembly code function name. Required name for void DCremovalASM_JALU(int *, int *): _DCremoval_JALU__FPiT1

Write the ASM “call-and-return”, then run the test: Simple ASM stub. GHOST BREAKPOINT: a breakpoint that is set in the code “somehow”. It appears completely random, but seems to occur after making big changes in a project. There are a number of ways of handling them.

Proper test run and exit (lib_prog_term): Yellow indicates that there are NO failures but some expected failures. All successes and failures are shown in the console window.

Quick look at the code (it will be examined in more detail in the next class): void DCremovalASM(int *, int *). Setting up the static arrays. Defining and then setting pointers. Moving the incoming parameters into the FIFO. Summing the FIFO values. Performing (FAST) division. Returning the correct values. Updating the FIFO in preparation for the next time this function is called: discarding the oldest value, and “rippling” the FIFO to make the “newest” FIFO slot empty.
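The sequence of steps listed above can be followed in a C++ sketch with the same (int *, int *) shape. This is not the lecture's code: the function name DCremovalSketch, the 128-entry FIFO length, the use of a right shift as the "fast" division, and the choice to return "input minus average" are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kFifoLength = 128;  // hypothetical length, a power of two
constexpr int kLog2Length = 7;            // log2(kFifoLength)

// Static arrays keep their contents between calls.
static int leftFifo[kFifoLength];
static int rightFifo[kFifoLength];

void DCremovalSketch(int* left, int* right) {
    assert(left != nullptr && right != nullptr);

    // Move the incoming parameters into the "newest" FIFO slot.
    leftFifo[kFifoLength - 1] = *left;
    rightFifo[kFifoLength - 1] = *right;

    // Sum the FIFO values.
    int leftSum = 0;
    int rightSum = 0;
    for (std::size_t i = 0; i < kFifoLength; ++i) {
        leftSum += leftFifo[i];
        rightSum += rightFifo[i];
    }

    // Fast division: a shift replaces the divide because the length is 2^7.
    // Return the corrected values through the pointers.
    *left -= leftSum >> kLog2Length;
    *right -= rightSum >> kLog2Length;

    // Ripple the FIFO: discard the oldest value so that the newest slot
    // is free to be overwritten on the next call.
    for (std::size_t i = 0; i + 1 < kFifoLength; ++i) {
        leftFifo[i] = leftFifo[i + 1];
        rightFifo[i] = rightFifo[i + 1];
    }
}
```

The ripple loop touches every FIFO entry on every call, which is why a circular buffer (moving an index instead of the data) is the usual faster alternative.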

Developing the assembly code static arrays: “section data1”.

Define the (250) register names for code maintainability (and marking) ease. Actual static array declaration. DEFINE pointers into arrays. DEFINE temps. DEFINE Inpars. SET pointers into arrays.

Key and common error, the same as in C++: There is a difference between defining / declaring the pointer register and placing (setting) a value in the pointer register so that it actually points somewhere. Register names: what they are and where they are stored. In an exam, use of a register in this format is “required”, but it is preferred, rather than required, that you include the define statements. Watch for the question wording.
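The C++ form of the same error can be shown in a few lines. The array contents and the names fifo, kSlots and readNewest are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>

constexpr std::size_t kSlots = 4;          // hypothetical FIFO size
static int fifo[kSlots] = {10, 20, 30, 40};

int readNewest() {
    int* newest = nullptr;       // DEFINE: the pointer exists but points nowhere useful
    newest = &fifo[kSlots - 1];  // SET: the pointer now refers to the last FIFO slot
    return *newest;              // safe only because the SET step was done
}
```

Dereferencing newest between the two statements would be the C++ equivalent of using a pointer register that has been named but never loaded with an address.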

Value into FIFO buffer: RISC processor LOAD and STORE architecture, MIPS-like (ENCM369) rather than CISC (ENCM415). Read from memory → register → store to memory.

Perform sum: Hardware loops, some 64-bit and some 32-bit instructions. Sum. Division.

Correcting INPARS and then updating the FIFO buffer: Adjust the INPARS (remember int *). Update the FIFO memory using the load / store approach (SLOW).

Adjust tests for expected success.

Tackled today: What are the basic characteristics of a DSP algorithm? A near-perfect “starting” example: DCRemoval( ) has many of the features of the FIR filters used in all the Labs. Testing the performance of the CPP version. First assembly version, using I-ALU operations: testing and timing. The code will be examined in more detail in the next lecture.