Explaining issues with DCremoval( )

Presentation transcript:

Explaining issues with DCremoval( ): common problems to avoid

DC removal Lecture 1, M. Smith, ECE, University of Calgary, Canada, 2/23/2019

Tackled today: testing the performance of the CPP version; the first assembly version, using I-ALU operations, with testing and timing; details of the code.

DCremoval( ) is memory intensive and addition intensive, uses loops for the main code, and has a FIFO implemented as a circular buffer. It is not as complex as FIR but has many of the same requirements, so it is easier to handle. You use the same ideas in optimizing FIR over Labs 2 and 3. Two issues: speed and accuracy. Develop suitable tests for the CPP code and check that the various assembly language versions satisfy the same tests.

Call and return test: basically, if the code gets here, we probably did not crash the system. I use a cut-and-paste approach to develop code variants. This test is (embarrassingly) useful.

Initially we expect the code to fail to work correctly. If the code works initially, then it is doing so by accident. Use XF_CHECK_EQUAL( ) (expected to fail). NOTE: This test is just a “cut-and-paste” version of the C++ test with three changes of function name.

Timing test: run the function once, 10 times and 100 times. The timing tests are normalized to “process the function once”. Various other routines need to be developed to make the tests work: a DoNothing loop, and loops that run the C++ and assembly code routines. This may not be performing the timing correctly, but it gives the initial concepts.

Other functions needed to run the test: a DoNothing loop (careful: it may be optimized to “nothing”), a C++ function loop, and a J-ALU function loop.
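The normalization idea can be sketched in portable C++. This is only an illustration under stated assumptions: `TimeLoop`, `NormalizedTime`, `Routine` and `keepAlive` are hypothetical names, and `std::chrono` stands in for whatever cycle counter the lecture's test framework uses on the TigerSHARC. The `volatile` write is one way to stop the compiler from optimizing the DoNothing loop to “nothing”.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;
using Routine = void (*)(int *left, int *right);   // hypothetical signature

// volatile: every write must really happen, so the empty routine survives optimization
volatile int keepAlive = 0;

void DoNothing(int *, int *) { keepAlive = keepAlive + 1; }

// Time `repeats` back-to-back calls of `routine`, in nanoseconds.
long long TimeLoop(Routine routine, int repeats) {
    int left = 0, right = 0;
    auto start = Clock::now();
    for (int i = 0; i < repeats; ++i) routine(&left, &right);
    auto stop = Clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

// Cost of "processing the function once": subtract the loop overhead
// measured with DoNothing, then divide by the number of repeats.
double NormalizedTime(Routine routine, int repeats) {
    assert(repeats > 0);
    long long overhead = TimeLoop(DoNothing, repeats);
    long long total = TimeLoop(routine, repeats);
    return static_cast<double>(total - overhead) / repeats;
}
```

Calling `NormalizedTime` with 1, 10 and 100 repeats mirrors the once / 10 times / 100 times runs; small measurements can come out noisy or even negative, which fits the warning that the timing may not yet be performed correctly.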

Use build failure information to determine the assembly code function name. Required name for void DCremovalASM_JALU(int *, int *): _DCremoval_JALU__FPiT1

Proper test run and exit (lib_prog_term). Yellow indicates that there are NO failures but some expected failures. All successes and failures are shown in the console window.

Quick look at the code of void DCremovalASM(int *, int *): setting up the static arrays; defining and then setting pointers; moving the incoming parameters into the FIFO; summing the FIFO values; performing (FAST) division; returning the correct values; updating the FIFO in preparation for the next time this function is called, by discarding the oldest value and “rippling” the FIFO to make the “newest” FIFO slot empty.
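The steps listed above can be sketched in C++ as a reference against which assembly versions might be tested. This is not the lecture's code: the names `DCremovalSketch`, `RemoveDC`, `leftFIFO` and `rightFIFO` are hypothetical, and it assumes that “returning the correct values” means returning each input minus the average of its FIFO.

```cpp
#include <cassert>

constexpr int BUFFERSIZE = 128;           // FIFO length, matching the 128-word arrays

// "static": known in this file only, because the names are not made global
static int leftFIFO[BUFFERSIZE]  = {0};
static int rightFIFO[BUFFERSIZE] = {0};

// Process one channel. ASSUMPTION: output = newest input - FIFO average.
static int RemoveDC(int fifo[], int newest) {
    fifo[BUFFERSIZE - 1] = newest;        // BUFFERSIZE - 1 is a build-time constant

    int sum = 0;                          // sum the FIFO values
    for (int i = 0; i < BUFFERSIZE; ++i) sum += fifo[i];

    int average = sum / BUFFERSIZE;       // the assembly version replaces this by shifts

    // Discard the oldest value and "ripple" the rest down one slot,
    // so the last slot is free for the next call.
    for (int i = 0; i < BUFFERSIZE - 1; ++i) fifo[i] = fifo[i + 1];

    return newest - average;
}

// Both values are passed by reference (int *) and updated in place.
void DCremovalSketch(int *left, int *right) {
    assert(left != nullptr && right != nullptr);
    *left  = RemoveDC(leftFIFO,  *left);
    *right = RemoveDC(rightFIFO, *right);
}
```

With a constant input, the output of this sketch settles to zero once the FIFO has filled, which makes a convenient accuracy test.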

Developing the assembly code static arrays: 1) “section data1”. In later algorithms we will show that using multiple data sections in different parts of TigerSHARC memory allows us to bring in 256 bits of data per cycle.

Developing the assembly code static arrays: 2) .align 4; Later we will use the ability to bring in 4 words (32 bits each) of data at the same time. This works best when the array starts on a 4-word boundary.

Developing the assembly code static arrays: 3) .var array[128]; The .var syntax allows declaring “word” arrays. There is other syntax for short int and byte arrays. NOTE: .align 4 is reused before the next array.

Developing the assembly code static arrays: 4) .var array[128]; The array is “static” (known in this file only) as we don’t globalize the name. TRUE or FALSE? KEY: the switch between data and program memory is “really key”.

Define the (250) register names for ease of code maintainability (and marking). Parts of the code: actual static array declaration; DEFINE pointers into arrays; DEFINE temps; DEFINE Inpars; SET pointers into arrays.

Value into FIFO buffer. This is a RISC processor with a LOAD and STORE architecture (MIPS-like rather than CISC): use the pointer value (which came in J4) to read the “left value”, passed in by reference, into a register. Now place this value into the last element of the FIFO array (make sure that you are not one element out). NOTE: BUFFERSIZE - 1 is evaluated BY THE ASSEMBLER and does not happen at run time. The store uses an index with a pre-modify offset, so J2 is not changed.

Perform sum, hardware loop (1): set up an index i_J8 to be used as an offset into the array (note how this syntax follows C++); set up LOOP COUNTER 0; perform the test and jump.

Perform sum, hardware loop (2): set up LOOP COUNTER 0. Division by 128 is performed by shift (what did C++ do?). Note that with the I-ALU you can only shift by 1 bit at a time (it is not a barrel shifter). Perform the test and jump.
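The question of what C++ does matters for accuracy tests, because shifting and dividing are not the same operation for negative sums. The sketch below uses hypothetical function names to compare the two. It relies on right shift of a negative `int` being arithmetic, which is guaranteed from C++20 and is implementation-defined (though usual) in earlier standards.

```cpp
#include <cassert>

// C++ integer division truncates toward zero.
int DivideBy128(int sum) { return sum / 128; }

// Shift right one bit at a time, as a unit without a barrel shifter would.
// An arithmetic right shift rounds toward minus infinity.
int ShiftRightOneBitAtATime(int value, int shifts) {
    assert(shifts >= 0 && shifts < 32);
    for (int count = 0; count < shifts; ++count) value = value >> 1;
    return value;
}

// 128 == 2^7, so dividing by 128 corresponds to exactly seven 1-bit shifts.
int ShiftDivideBy128(int sum) { return ShiftRightOneBitAtATime(sum, 7); }
```

For non-negative sums the two functions agree. For a sum of -1, `DivideBy128` gives 0 while `ShiftDivideBy128` gives -1, so a test written against the C++ version can fail on an assembly version that is otherwise correct. The loop count is also where a “one shift too many” error would show up as a division by 256.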

Some obvious multiple instructions. Can they go wrong? Note that the add occurs whether or not the jump occurs. Should this be a predicted or non-predicted jump? One shift too many?

Correcting INPARS and then updating the FIFO buffer: adjust the INPARS (remember they are int *). Update the FIFO memory using a load / store approach, which is SLOW.
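The ripple update is slow because every call moves all but one of the FIFO values. Since the FIFO is described as being implemented as a circular buffer, one possible alternative is to leave the data in place and move an index instead. The sketch below is an illustration only, with the hypothetical names `CircularFIFO` and `PushAndAverage`; it is not taken from the lecture code.

```cpp
#include <cassert>

constexpr int BUFFERSIZE = 128;

// Hypothetical circular FIFO: the data never moves, only the index does.
struct CircularFIFO {
    int data[BUFFERSIZE] = {0};
    int oldest = 0;                 // slot holding the oldest value
};

// Overwrite the oldest value with the new one and return the FIFO average.
int PushAndAverage(CircularFIFO &fifo, int value) {
    assert(fifo.oldest >= 0 && fifo.oldest < BUFFERSIZE);

    fifo.data[fifo.oldest] = value;                  // one store replaces the ripple loop
    fifo.oldest = (fifo.oldest + 1) % BUFFERSIZE;    // wrap around at the end

    int sum = 0;                                     // order does not matter for a sum
    for (int i = 0; i < BUFFERSIZE; ++i) sum += fifo.data[i];
    return sum / BUFFERSIZE;
}
```

The summing loop is unchanged, so the saving is the ripple loop: a single store and an index update per call instead of a load and a store for each FIFO slot.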

Adjust tests for expected success

Run the tests

Examine the timing: in “debug” mode, we are already “beating” the compiler. Questions: Why is C++ slower? Is it doing something that we (in ignorance) don’t know we need to do? What happens with “release mode”?

Can you explain this 10% change in the results depending on how many tests are run? The two cases compared are timing with all the tests, and the timing test only.

Tackled today: What are the basic characteristics of a DSP algorithm? A near perfect “starting” example: DCRemoval( ) has many of the features of the FIR filters used in all the Labs. Testing the performance of the CPP version. First assembly version, using I-ALU operations, with testing and timing. The code will be examined in more detail in the next lecture.