Process for changing “C-based” design to SHARC assembler ADDITIONAL EXAMPLE M. R. Smith, Electrical and Computer Engineering University of Calgary, Canada.

Presentation transcript:

Process for changing “C-based” design to SHARC assembler
ADDITIONAL EXAMPLE
M. R. Smith, Electrical and Computer Engineering, University of Calgary, Canada
ucalgary.ca

6/2/2015 ENEL Translating “C-based” design to code Copyright 2 / 20
To be tackled today
  Set up a review process to look for, and remove, common errors when writing assembly code
  Process to translate a “C” program involving arrays into SHARC code
  Comparison of timings for non-optimized code, optimized code, hardware loops, and super-scalar architecture

Code review sheet -- PSP
Need to identify common errors -- CODE REVIEW
Constructs to link to “C”
  Are all declarations at the start of the subroutine -- #define etc.
  CONSTANTS, variables, FunctionNames, EXPORT leading underscores, .segment declarations
Assembly syntax
  Self-documenting code, clanguage_register_defines.I
  Missing semicolons -- CODE REVIEW
  Conditional delayed branching properly handled -- DESIGN REVIEW
Load/Store architecture -- DESIGN REVIEW
  Can’t do R1 = R
  Becomes temp = 4; R1 = R2 + temp;
Register operations, volatile, order of I and M registers -- CODE REVIEW
What is your favourite time-wasting error?

Simpler example of array handling
  void MakeRamp( float re_array[ ], int num ) {
      int count;
      for (count = 0; count < num; count++) {
          re_array[count] = count;
      }
  }
THINGS TO WORRY ABOUT DURING TRANSLATION
  Prologue, Epilogue -- REVIEW
  How to handle the LOAD/STORE architecture
  How to handle the for-loop
  How to handle the "= count" operation (int to float conversion)
  How to handle stepping through the array -- post modify
  How to handle parameter passing

Step 1 -- int to float conversion
Int to float conversion must be handled by YOU
  void MakeRamp( float re_array[ ], int num ) {
      int count;
      for (count = 0; count < num; count++) {
          re_array[count] = (float) count;
      }
  }
THINGS TO WORRY ABOUT DURING TRANSLATION
  Prologue, Epilogue -- REVIEW
  How to handle the LOAD/STORE architecture
  How to handle the for-loop
  How to handle the "= count" operation (int to float)
  How to handle stepping through the array -- post modify
  How to handle parameter passing

Watch for SHARC assembler nastiness
The code F2 = dm(I1,1) disassembles as R2 = dm(I1,1)
  MEANING there is no special instruction needed, as F2 and R2 are the same register. Translation is handled by the assembler.
F2 = 1.0 is translated as R2 = bit pattern for 1.0
  MEANING there is no special instruction needed, as F2 and R2 are the same register. Translation is handled by the assembler.
NASTY SIDE EFFECT
  F2 = 1 is translated as R2 = bit pattern for 1 and is NOT TRANSLATED as R2 = bit pattern for (float) 1, so F2 ends up holding the integer bit pattern for 1 rather than 1.0, which is not what you intended.
  Make sure that you always add the decimal point (.0).
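The side effect can be reproduced on a host machine. The following is a minimal sketch in standard C, assuming a 32-bit IEEE-754 float; the helper names bits_to_float and float_to_bits are hypothetical and are not part of the course material. It only illustrates why the integer bit pattern for 1 is not the float 1.0.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 32-bit pattern as a float
   (assumes float is a 32-bit IEEE-754 single). */
float bits_to_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical helper: return the 32-bit pattern that stores a float. */
uint32_t float_to_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Calling float_to_bits(1.0f) gives the pattern the assembler loads for F2 = 1.0, while bits_to_float(1) shows what the register holds, read as a float, after F2 = 1.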

Step 2 -- Convert to use local pointers (in scope)
Use local pointer set to pointer value passed on the stack
  void MakeRamp( float *re_array, int num ) {
      int count;
      // re_array itself is NOT A USEABLE POINTER
      dm float *arraypt = re_array;
      for (count = 0; count < num; count++) {
          *arraypt = (float) count;
          arraypt++;
      }
  }
THINGS TO WORRY ABOUT DURING TRANSLATION
  Prologue, Epilogue -- REVIEW
  How to handle the LOAD/STORE architecture
  How to handle the for-loop
  How to handle stepping through the array -- post modify
  How to handle parameter passing

Step 3 -- load-store architecture
Use register variables and a scratch register
  void MakeRamp( register float *re_array, register int num ) {
      register int count = GARBAGE;
      register float scratch = GARBAGE;
      register dm float *arraypt = re_array;
      for (count = 0; count < num; count++) {
          scratch = (float) count;    // *arraypt = (float) count
          *arraypt = scratch;
          arraypt++;
      }
  }
THINGS TO WORRY ABOUT DURING TRANSLATION
  Prologue, Epilogue -- REVIEW
  How to handle the LOAD/STORE architecture
  How to handle the for-loop
  How to handle parameter passing

Step 4 -- convert the for-loop
  void MakeRamp( register float *re_array, register int num ) {
      register int count = GARBAGE;
      register float scratch = GARBAGE;
      register dm float *arraypt = re_array;
      count = 0;
      while (count < num) {
          scratch = (float) count;
          *arraypt = scratch;
          arraypt++;
          count = count + 1;
      }
  }
THINGS TO WORRY ABOUT DURING TRANSLATION
  Prologue, Epilogue -- REVIEW
  How to handle the for-loop -- 68K like -- NOT OPTIMIZED
  How to handle parameter passing
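Steps 1 to 4 should not change what the routine computes. The sketch below, in portable C, places the original for-loop next to the rewritten while-loop so the two can be compared on a host compiler. The function names make_ramp_for, make_ramp_while and ramps_match are hypothetical, and the SHARC-specific dm qualifier and the GARBAGE initializers are left out so the code compiles anywhere.

```c
/* Original form: array-indexed for-loop with an explicit cast. */
void make_ramp_for(float re_array[], int num) {
    int count;
    for (count = 0; count < num; count++) {
        re_array[count] = (float) count;
    }
}

/* Rewritten form: local pointer, scratch variable, while-loop. */
void make_ramp_while(float *re_array, int num) {
    int count;
    float scratch;
    float *arraypt = re_array;   /* local pointer copy of the parameter */

    count = 0;
    while (count < num) {
        scratch = (float) count; /* compute into a "register" first */
        *arraypt = scratch;      /* then store to memory */
        arraypt++;               /* step through the array */
        count = count + 1;
    }
}

/* Returns 1 when both versions fill their arrays identically (num <= 64). */
int ramps_match(int num) {
    float a[64] = {0}, b[64] = {0};
    int i;
    if (num > 64) return 0;
    make_ramp_for(a, num);
    make_ramp_while(b, num);
    for (i = 0; i < 64; i++) {
        if (a[i] != b[i]) return 0;
    }
    return 1;
}
```

Running ramps_match for a few sizes before starting the assembly translation is a cheap way to confirm that the restructured "C" is still the routine you meant to write.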

Step 5 -- Prologue -- which registers?
  void MakeRamp( register float *re_array, register int num ) {
      // re_array -> INPAR1 (R4), num -> INPAR2 (R8)
      // NOW SEE WHY INPAR1 NOT POINTER
      register int count = GARBAGE;             // scratchR1
      register float scratch = GARBAGE;         // scratchF2 (not F1)
      register dm float *arraypt = re_array;    // scratchDMpt
      count = 0;
      while (count < num) {
          scratch = (float) count;
          *arraypt = scratch;
          arraypt++;
          count = count + 1;
      }
  }
Prologue -- leaf routine -- no stack changes
Epilogue -- since leaf routine -- standard 5 lines
How to handle parameter passing

Step 6 -- Handle loop -- Part 1
  // void MakeRamp( register float *re_array, register int num ) {
  #define numINPAR2 INPAR2
  #define countR1 scratchR1            // register int count = GARBAGE;
      countR1 = 0;                     // count = 0;
  _MR_WHILE:                           // while (count < num) {
      ????                             //     Loop body
      countR1 = countR1 + 1;           //     count = count + 1;
      JUMP(PC, _MR_WHILE) (DB);        // }
      nop;
      nop;
                                       // } end MakeRamp()

Step 7 -- Handle loop -- Part 2
  // void MakeRamp( register float *re_array, register int num ) {
  #define numINPAR2 INPAR2
  #define countR1 scratchR1              // register int count;
      countR1 = 0;                       // count = 0;
  MR_WHILE:
      COMP(countR1, numINPAR2);          // while (count < num) {
      if GE JUMP(PC, MR_ENDLOOP) (DB);   //     exit when count >= num
      nop;
      nop;
      ????                               //     Loop body
      countR1 = countR1 + 1;             //     count = count + 1;
      JUMP(PC, MR_WHILE) (DB);           // }
      nop;
      nop;
  MR_ENDLOOP:
      5 magic lines of code for “C” return   // }

Reminder of what we are trying to do!
  void MakeRamp( register float *re_array, register int num ) {
      register int count;
      register float scratch, *arraypt = re_array;
      for (count = 0; count < num; count++) {
          scratch = (float) count;
          *arraypt = scratch;
          arraypt++;
      }
  }

Step 8 -- handle loop body
  // void MakeRamp( register float *re_array, register int num ) {
  .segment seg_pmco;
  .global _MakeRamp;
  _MakeRamp:
  #define re_arrayINPAR1 INPAR1
                                       // register int count;
  #define tempF2 scratchF2             // register float temp = GARBAGE;
  #define arraypt scratchDMpt          // *arraypt = GARBAGE;
      arraypt = re_arrayINPAR1;        // *arraypt = re_array;
                                       // for (count = 0; count < num; count++) {
      tempF2 = FLOAT countR1;          //     temp = (float) count;
      dm(arraypt, 1) = tempF2;         //     *arraypt = temp;
                                       //     arraypt++;
                                       // }
                                       // }

Final “C” Code Translation
Code as directly translated
Possible optimization
  Decide if it is worth the effort of optimizing?
Optimized
  Don’t do it unless asked for in this course in quizzes and labs
  Very easy to get it wrong

Code as directly translated
  #define re_arrayINPAR1 INPAR1
  #define numINPAR2 INPAR2
  .global _MakeRamp;
  _MakeRamp:
  #define countR1 scratchR1
  #define arraypt scratchDMpt
      countR1 = 0;
      arraypt = re_arrayINPAR1;
  MR_WHILE:
      COMP(countR1, numINPAR2);
      if GE JUMP(PC, MR_ENDLOOP) (DB);   // exit when count >= num
      nop;
      nop;
  #define tempF2 scratchF2
      tempF2 = FLOAT countR1;
      dm(arraypt, 1) = tempF2;
      countR1 = countR1 + 1;
      JUMP(PC, MR_WHILE) (DB);
      nop;
      nop;
  MR_ENDLOOP:
      5 magic lines of code for “C” return
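The control flow of the direct translation can be checked on a host machine by writing the same compare-and-branch structure in C with goto. This is only an illustrative sketch: make_ramp_goto and its lower-case labels are hypothetical names, and delay slots and register allocation are not modelled. Note that the exit test matching while (count < num) is count >= num.

```c
/* C mirror of the compare / conditional-jump / unconditional-jump loop. */
void make_ramp_goto(float *re_array, int num) {
    int count = 0;               /* countR1 = 0 */
    float *arraypt = re_array;   /* arraypt = re_arrayINPAR1 */
    float temp;

mr_while:
    if (count >= num) goto mr_endloop;   /* compare, then leave the loop */
    temp = (float) count;                /* tempF2 = FLOAT countR1 */
    *arraypt++ = temp;                   /* dm(arraypt, 1) = tempF2, post-modify */
    count = count + 1;                   /* countR1 = countR1 + 1 */
    goto mr_while;                       /* jump back to the loop test */
mr_endloop:
    return;
}
```

Filling a five-element array with num = 4 and confirming that the fifth element is untouched is a quick check that the loop does not run one iteration too many.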

Final “C” Code Translation
Code as directly translated (7 + num * 10 instructions)
Possible optimization -- Worth the effort?
  Best case would be (7 + num * 6 instructions)
Optimized
  Don’t do it unless asked for in this course in quizzes and labs
  Very easy to get it wrong
Improved algorithm using DSP architecture
  Hardware loop capability (8 + num * 2 instructions)
  Activate super-scalar capability (7 + num * 1 instructions)

Optimized code -- delay slots filled
  #define re_arrayINPAR1 INPAR1
  #define numINPAR2 INPAR2
  .global _MakeRamp;
  _MakeRamp:
  #define countR1 scratchR1
  #define arraypt scratchDMpt
      countR1 = 0;                       // CAN’T BE MOVED
      arraypt = re_arrayINPAR1;          // CAN’T BE MOVED
  MR_WHILE:
      COMP(countR1, numINPAR2);
      if GE JUMP(PC, MR_ENDLOOP) (DB);   // exit when count >= num
      nop;
  #define tempF2 scratchF2
      tempF2 = FLOAT countR1;
      JUMP(PC, MR_WHILE) (DB);
      dm(arraypt, 1) = tempF2;
      countR1 = countR1 + 1;
  MR_ENDLOOP:
      5 magic lines of code for “C” return

Final “C” Code Translation
Code as directly translated (7 + num * 10 instructions)
Possible optimization -- Worth the effort?
  Best case would be (7 + num * 6 instructions)
  Actual optimized was (7 + num * 7 instructions)
Optimized
  Don’t do it unless asked for in this course in quizzes and labs
  Very easy to get it wrong
Improved algorithm using DSP architecture
  Hardware loop capability (8 + num * 2 instructions)
  Activate super-scalar capability (7 + num * 1 instructions)
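To see how quickly the variants diverge, the instruction-count estimates quoted on the slide can be evaluated for a concrete array length. The sketch below simply encodes those formulas in C; the function names are hypothetical, and the counts are the slide's estimates, not measured cycle times.

```c
/* Instruction-count estimates from the slide, as functions of num. */
long direct_translation(long num)  { return 7 + num * 10; }
long best_case_optimized(long num) { return 7 + num * 6; }
long actual_optimized(long num)    { return 7 + num * 7; }
long hardware_loop(long num)       { return 8 + num * 2; }
long super_scalar(long num)        { return 7 + num * 1; }
```

For num = 100 these give 1007, 607, 707, 208 and 107 instructions respectively, so the hand-optimized software loop saves about 30% while the hardware loop and super-scalar forms save roughly 80% and 90%.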

Tackled today
  Set up a review process to look for, and remove, common errors when writing assembly code
  Process to translate a “C” program involving arrays into SHARC code
  Comparison of timings for non-optimized code, optimized code, hardware loops, and super-scalar architecture