Working with the Compute Block

1 Working with the Compute Block
M. R. Smith, ECE, University of Calgary, Canada

2 Tackled today
Problems with using the I-ALU as an “integer” processor
TigerSHARC processor architecture
What features are available for DSP optimization, and what “do we have to worry about” when using these features?
Moving DCremoval( ) over to the X Compute block
Using test macros: useful to know, but a real time waster for the labs in this class
5/26/2019 Working with the COMPUTE Block, M. Smith, ECE, University of Calgary, Canada

3 DCremoval( )
Not as complex as FIR, but many of the same requirements:
  Memory intensive
  Addition intensive
  Loops for main code
  FIFO implemented as circular buffer
Easier to handle
You use the same ideas in optimizing FIR over Labs 2 and 3
Two issues: speed and accuracy
Develop suitable tests for the CPP code and check that the various assembly language versions satisfy the same tests

4 Set up time
In principle 1 cycle / instruction
2 + 4 instructions

5 First key element: Sum Loop, Order (N)
4 + N * 5 instructions
Second key element: Shift Loop, Order (log2N)
1 + 2 * log2N instructions

6 Third key element – FIFO circular buffer -- Order (N)
Instruction counts annotated on the code: 6, 3, 6 * N, 2

7 Time in theory
Set up pointers to buffers      2
Insert values into buffers      4
SUM LOOP                        4 + N * 5
SHIFT LOOP                      1 + 2 * log2N
Update outgoing parameters      6
Update FIFO                     3 + 6 * N
Function return                 2
Total                           22 + 11 * N + 2 * log2N
N = 128: instructions = 1444
1444 cycles plus delay cycles
C++ debug mode: 9500 cycles?
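The per-stage counts above can be added up mechanically. The following is a minimal sketch, assuming the counts pair with the stage labels in the order listed; the function names are hypothetical and not part of the lab code.

```cpp
#include <cassert>

// Integer log2 for a power-of-two N (one shift-loop pass per bit).
int log2_int(int n) {
    assert(n > 0);
    int bits = 0;
    while (n > 1) { n >>= 1; ++bits; }
    return bits;
}

// Theoretical instruction count of the I-ALU version, stage by stage.
// The pairing of counts with stages is an assumption taken from the listing order.
int ialu_instruction_count(int n) {
    const int setup_pointers  = 2;
    const int insert_values   = 4;
    const int sum_loop        = 4 + n * 5;
    const int shift_loop      = 1 + 2 * log2_int(n);
    const int update_outputs  = 6;
    const int update_fifo     = 3 + 6 * n;
    const int function_return = 2;
    return setup_pointers + insert_values + sum_loop + shift_loop
         + update_outputs + update_fifo + function_return;
}
```

Calling ialu_instruction_count(128) reproduces the 1444 instructions quoted for N = 128; the two loops of order N (sum and FIFO update) account for nearly all of it.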

8 Is the code too slow?
Code is too slow IFF (if and only if) you don’t have 2,500 cycles available to perform this part of the software defined radio (SDR) algorithm.
Other components of the SDR, plus other components of the complete system, must complete within the time between 2 samples at 48 kHz
48,000 interrupts per second
500,000,000 cycles available every second
About 10,400 cycles available per interrupt (500,000,000 / 48,000)
My ball-park: never design code that at the design stage takes more than 50% of the available cycles.
From take-home quiz 1: DCremoval( ) is 17% of code time, so need 6 * 2,500 cycles = 15,000 for the SDR component alone
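The budget arithmetic on this slide can be checked with a few lines of C++. This is a sketch only; the function names are hypothetical, and the 50% threshold is the ball-park rule quoted above rather than a hard limit.

```cpp
#include <cassert>

// Cycles available between two consecutive sample interrupts.
long cycles_per_interrupt(long core_hz, long sample_rate_hz) {
    assert(sample_rate_hz > 0);
    return core_hz / sample_rate_hz;   // integer division, rounds down
}

// True if an estimated cost stays within the 50%-of-budget rule of thumb.
bool fits_design_budget(long estimated_cycles, long available_cycles) {
    return 2 * estimated_cycles <= available_cycles;
}
```

With a 500 MHz core and 48 kHz sampling, cycles_per_interrupt returns 10416. The 15,000 cycles estimated for the SDR component exceed the whole budget, not just half of it, which is why the code has to get faster.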

9 The code is too slow because we are not taking advantage of the available resources
Bring in up to 128 bits (4 instructions) per cycle
Ability to bring in 4 32-bit values along the J data bus (data1) and 4 along the K bus (data2)
Perform address calculations in the J and K ALUs: single-cycle hardware circular buffers
Perform math operations on both X and Y compute blocks
Background DMA activity
Off-load some of the processing to the second processor

10 Version 2 – Move the algorithm component from I-ALU over to Compute Block

11 Steps for faster code development
Cut and paste old code, changing the name only
_DCremovalASM_JALU__FPiT1 becomes _DCremovalASM_Compute__FPiT1
Run test to confirm

12 Add timing and execution tests

13 Element we want to change
void DCremovalASM(int *, int *)
Setting up the static arrays
Defining and then setting pointers
Moving incoming parameters into the FIFO
Summing the FIFO values
Performing (FAST) division
Returning the correct values
Updating the FIFO in preparation for the next time this function is called: discarding the oldest value, and “rippling” the FIFO to make the “newest” FIFO slot empty
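The steps listed above can be sketched in C++ as a reference against which assembly versions could be tested. This is an illustration under stated assumptions, not the course's actual code: the name DCremovalCPP, the helper remove_dc, the left/right interpretation of the two pointers, and the "subtract the running average" output are all assumptions.

```cpp
#include <cassert>

const int N = 128;        // FIFO length; a power of two so the divide is a shift
const int LOG2_N = 7;

static int left_fifo[N];  // static arrays persist between calls (zero-initialized)
static int right_fifo[N];

// One channel: returns the sample with the running average (DC level) removed.
static int remove_dc(int *fifo, int sample) {
    fifo[N - 1] = sample;                          // newest slot was freed last call
    int sum = 0;
    for (int i = 0; i < N; ++i) sum += fifo[i];    // sum loop, order N
    // Fast division by N; >> on a negative int is an arithmetic shift on
    // mainstream compilers (guaranteed from C++20).
    int average = sum >> LOG2_N;
    for (int i = 0; i < N - 1; ++i) fifo[i] = fifo[i + 1];  // ripple, discard oldest
    return sample - average;
}

// Hypothetical C++ counterpart of void DCremovalASM(int *, int *).
void DCremovalCPP(int *left, int *right) {
    assert(left && right);
    *left  = remove_dc(left_fifo, *left);
    *right = remove_dc(right_fifo, *right);
}
```

Feeding a constant input for N calls fills the FIFO, after which the output settles at 0. That makes a simple accuracy test that every assembly version should also pass.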

14 Perform sum – using I-ALU

15 Perform sum – using Compute Block
#define left_sum_XR6 XR
left_sum_XR6 = 0;;
#define left_XR2 XR2
left_XR2 = [left_buffpt_J0 + i_J8];;
left_sum_XR6 = R6 + R2;;
NOTE SYNTAX
left_sum_XR6 = ASHIFT R6 BY -7;;
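The ASHIFT ... BY -7 line replaces the whole shift loop of the I-ALU version with one instruction. The sketch below contrasts the two in C++. It assumes the I-ALU loop shifted one bit per pass (which would match its 1 + 2 * log2N cost); both function names are hypothetical.

```cpp
#include <cassert>

// I-ALU style: divide by N = 2^log2n with one single-bit shift per loop pass.
int divide_by_shift_loop(int sum, int log2n) {
    assert(log2n >= 0);
    for (int i = 0; i < log2n; ++i) sum >>= 1;
    return sum;
}

// Compute-block style: a single arithmetic shift by log2n bits.
int divide_by_single_shift(int sum, int log2n) {
    assert(log2n >= 0 && log2n < 31);
    return sum >> log2n;
}
```

Both give the same result for a given sum, so the change affects speed but not accuracy. Note that a right shift rounds negative sums toward minus infinity, unlike C++ integer division, which truncates toward zero.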

16 Final sum code
Don’t use XR6 = J31
J31 is NOT A ZERO if used with the COMPUTE block: it is the condition code reg.

17 Other necessary changes

18 Time in theory
Set up pointers to buffers      2
Insert values into buffers      4
SUM LOOP                        4 + N * 5
SHIFT LOOP                      1 (was 1 + 2 * log2N)
Update outgoing parameters      6
Update FIFO                     3 + 6 * N
Function return                 2
Total                           22 + 11 * N (was 22 + 11 * N + 2 * log2N)
N = 128: instructions = 1430
Was 1444 cycles plus delay cycles

19 Time in Practice
Set up pointers to buffers      2
Insert values into buffers      4
SUM LOOP                        4 + N * 5
SHIFT LOOP                      1 (was 1 + 2 * log2N)
Update outgoing parameters      6
Update FIFO                     3 + 6 * N
Function return                 2
Total                           22 + 11 * N (was 22 + 11 * N + 2 * log2N)
N = 128: instructions = 1430; with delay cycles = 1730 cycles
Was 1444 instructions; with delay cycles = 2,500 cycles
Improved more than expected, as we are accidentally making better use of the available resources

20 Possible explanation of speed improvement
Must wait for value to arrive from memory
Must wait for I-ALU to become available so it can calculate the address or do the add
Remember: working in a loop
Wait for I-ALU
Savings 2 * N = 256
Actual: about 700, roughly 6 * N
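The gap between theory and practice is easier to see once the delay cycles are separated out. The sketch below uses only the instruction counts and measured cycle totals quoted on the earlier slides; the Timing struct and function name are hypothetical.

```cpp
#include <cassert>

// Theoretical instruction count and measured cycle count for one version.
struct Timing {
    int instructions;
    int measured_cycles;
};

// Cycles spent stalled rather than issuing instructions.
int delay_cycles(const Timing &t) {
    assert(t.measured_cycles >= t.instructions);
    return t.measured_cycles - t.instructions;
}
```

For the I-ALU version (1444 instructions, 2,500 cycles) this gives 1056 delay cycles; for the Compute block version (1430 instructions, 1730 cycles) it gives 300. The instruction count dropped by only 14, so almost all of the measured gain comes from fewer stalls, consistent with the "wait for I-ALU" explanation.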

21 Next stage in improving code speed
Software and hardware circular buffers
Set up pointers to buffers      2
Insert values into buffers      4
SUM LOOP                        4 + N * 5
SHIFT LOOP                      1 (was 1 + 2 * log2N)
Update outgoing parameters      6
Update FIFO                     3 + 6 * N
Function return                 2
Total                           22 + 11 * N (was 22 + 11 * N + 2 * log2N)
N = 128: instructions = 1430; with delay cycles = 1730 cycles
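The 3 + 6 * N FIFO update is the obvious next target. A software circular buffer replaces the ripple with a single overwrite. The following is a minimal C++ sketch of that idea, with hypothetical names; the hardware circular buffers in the J and K ALUs would do the wrap-around as part of the address calculation instead of the explicit mask used here.

```cpp
#include <cassert>

const int FIFO_LEN = 128;   // power of two, so wrap-around is a bit mask

struct CircularFifo {
    int data[FIFO_LEN];
    int newest;             // index where the next sample is written
};

void fifo_init(CircularFifo &f) {
    for (int i = 0; i < FIFO_LEN; ++i) f.data[i] = 0;
    f.newest = 0;
}

// Overwrites the oldest sample in place: constant cost, no order-N ripple.
void fifo_push(CircularFifo &f, int sample) {
    assert(f.newest >= 0 && f.newest < FIFO_LEN);
    f.data[f.newest] = sample;
    f.newest = (f.newest + 1) & (FIFO_LEN - 1);   // software wrap-around
}

// The sum does not depend on element order, so no unwrapping is needed.
int fifo_sum(const CircularFifo &f) {
    int sum = 0;
    for (int i = 0; i < FIFO_LEN; ++i) sum += f.data[i];
    return sum;
}
```

Because the DC average only needs the sum of the stored samples, the order in which they sit in memory does not matter, which is what makes the circular buffer a drop-in replacement here.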

22 Making the tests quicker to develop
Is there an alternative to cut-and-paste?
Do you want to bother to learn and then use it?

23 Develop Call-RETURN test macro

24 Develop – Validate operation test macro
In practice: not as trivial an exercise as it looks
Acts as “1 long C++ line”, so any error message is unspecific
My favourite error: tabs and / or spaces after the final \ on each line
Solution: use the “Home / End” keys to check that \ is at the end of the line

25 Timing test macro – not trivial
Need a new special loop control function generated for each test
Name must change
Print statement contents must change

26 Some standard “C++” macro issues
A #define must be one line “by definition”
So cheat: use a final \, which says that the newline following the \ is not a “new-line character”
#define FOO_MACRO(FEE, FUM) \
    /* Must have C like comments */ \
    /* # character means: turn parameter into string array */ \
    puts(#FEE); \
    /* ## character means: concatenate parameter */ \
    DoLoop##FUM( ); \
    /* Watch out for trailing ; and }: may be required / definitely not wanted */ \
    THIS BREAK OVER 2 LINES -- ILLEGAL ;
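A compilable version of the same idea is sketched below. The macro names RUN_LOOP and NAME_OF and the two DoLoop functions are hypothetical stand-ins, not the lab's test macros. The do { ... } while (0) wrapper is one common way around the trailing ; and } problem mentioned above.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

/* Hypothetical targets that the pasted names resolve to. */
static int DoLoopJALU(void)    { return 1; }
static int DoLoopCompute(void) { return 2; }

/* #LABEL turns the argument into a string; DoLoop##SUFFIX pastes a name. */
/* The do/while(0) wrapper lets the macro be used as one statement.       */
#define RUN_LOOP(LABEL, SUFFIX, RESULT)      \
    do {                                     \
        std::puts(#LABEL);                   \
        (RESULT) = DoLoop##SUFFIX();         \
    } while (0)

/* Stringizing on its own, usable inside expressions. */
#define NAME_OF(X) #X
```

Usage: with int r declared, RUN_LOOP(Compute version, Compute, r); prints "Compute version" and calls DoLoopCompute(), leaving 2 in r. Every continuation line must end in \ with nothing after it, which is exactly the tabs-and-spaces trap described earlier.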

28 Using macros
Learning how to do the concatenation and print formatting macros took me about 10 times as long as just cut-and-pasting
In the labs: you use test macros at your own risk; the T.A.s and I will not help you debug them
In the exams: you can’t use macros
Please note, I have defined macros and am now using them
Exam macro: PLEASE_ANSWER_EXAMQUESTION_FOR_ME( ) causes the marker macro ZERO_OUT_OF_100( ) to be activated
Personal opinion: learn the concept for use at a later time; don’t worry about them in the labs

29 Tackled today
Problems with using the I-ALU as an “integer” processor
TigerSHARC processor architecture
What features are available for DSP optimization, and what “do we have to worry about” when using these features?
Moving DCremoval( ) over to the X Compute block
Using test macros: useful to know, but a real time waster for the labs in this class