Moving Arrays -- 1 Completion of ideas needed for a general and complete program Final concepts needed for Final Review for Final – Loop efficiency.

Presentation transcript:

Moving Arrays -- 1 Completion of ideas needed for a general and complete program Final concepts needed for Final Review for Final – Loop efficiency

Tackled today
Declaring and initializing arrays off the stack (review and a little bit of new)
  Useful for background DMA tasks
  Useful for minimizing total memory used in a non-general program
Declaring arrays and variables on the stack (review and a little bit of new)
  Re-entrant and thread-safe code
Demonstrating memory-to-memory DMA
DMA , Copyright M. Smith, ECE, University of Calgary, Canada 1/1/2019

Declaring fixed arrays in memory, not on the stack

    short foo_startarray[40];
    short far_finalarray[40];

    void HalfWaveRectifyASM( ) {
        // Take the signal from foo_startarray[ ] and rectify it
        // Half wave rectify: if > 0 keep the same; if < 0 make zero
        // Full wave rectify: if > 0 keep the same; if < 0 take the absolute value
        // Rectify foo_startarray[ ] and place the result in far_finalarray[ ]
        for (int count = 0; count < 40; count++) {
            if (foo_startarray[count] < 0) far_finalarray[count] = 0;
            else far_finalarray[count] = foo_startarray[count];
        }
    }

The program code is the same, but the data part is not.
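The comments on this slide describe both half-wave and full-wave rectification, but only the half-wave loop is shown. A minimal C++ sketch of both follows. The names `Signal`, `halfWaveRectify` and `fullWaveRectify` are hypothetical, and saturating -32768 to 32767 is an assumption, since the slide does not say how that value should be handled.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper names; the slide only shows the half-wave loop.
constexpr std::size_t kSamples = 40;        // array length used on the slide
using Signal = std::array<short, kSamples>;

// Half-wave rectify: keep samples >= 0, replace negative samples with zero.
void halfWaveRectify(const Signal& in, Signal& out) {
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] < 0) out[i] = 0;
        else out[i] = in[i];
    }
}

// Full-wave rectify: keep samples >= 0, replace negative samples with
// their absolute value.
// Assumption: -32768 has no positive 16-bit counterpart, so it is
// saturated to 32767 here.
void fullWaveRectify(const Signal& in, Signal& out) {
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] >= 0) out[i] = in[i];
        else if (in[i] == -32768) out[i] = 32767;
        else out[i] = static_cast<short>(-in[i]);
    }
}
```

Passing the arrays as parameters, instead of using the fixed global arrays of the slide, is only a convenience for testing the sketch on a desktop compiler.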

Attempt 1
.section data1
  Tells the linker to place this stuff in memory map location data1
.align 4
  We know the processor works best when we start things on a boundary between groups of 4 bytes
[N * 2]
  We need N short ints
  We know the processor works with addresses in bytes
  Therefore we need N * 2 bytes

We said “wrong approach”: look at memory
20 bytes (16 bits) for N short values
In C++ = N * 2 bytes

Said “Correct approach NOT what I expected”
ASM array with space for N long ints
  .var arrayASM[N];
ASM array with space for N short ints
  .var arrayASM[N / 2];
ASM array with space for N chars
  .var arrayASM[N / 4];
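The N, N / 2 and N / 4 sizes above imply that one `.var` element occupies a 32-bit word. Under that assumption, the sizing arithmetic can be sketched in C++ as below. `wordsNeeded` is a hypothetical helper, and it rounds up so that an odd number of shorts still gets a whole word, a detail the slide does not cover.

```cpp
#include <cstddef>

// Assumption: one .var element is a 32-bit (4-byte) word, as the
// N / 2 and N / 4 sizes on the slide imply.
constexpr std::size_t kBytesPerWord = 4;

// Number of 32-bit words needed to hold `count` elements of
// `elementBytes` bytes each, rounded up to a whole word.
constexpr std::size_t wordsNeeded(std::size_t count, std::size_t elementBytes) {
    return (count * elementBytes + kBytesPerWord - 1) / kBytesPerWord;
}

static_assert(wordsNeeded(40, 4) == 40, "40 long ints need 40 words");
static_assert(wordsNeeded(40, 2) == 20, "40 short ints need 20 words");
static_assert(wordsNeeded(40, 1) == 10, "40 chars need 10 words");
static_assert(wordsNeeded(41, 2) == 21, "odd counts round up");
```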

Better answer is “Look at the assembler manual”

Improving what we did before
Big warning: external array initialization occurs on “reload” and NOT on “restart”. Understanding why this is true, and why it is a problem, will solve many issues when programming.

When DMA might be useful: video manipulation
Program
  Wait for picture 1 to come in (video in)
  Process picture 1 (lots of mathematics perhaps)
  Wait for picture 1 to be transmitted (video out)
Spending a lot of time waiting rather than doing

When DMA might be useful: double buffering
Program
  Wait for picture 2 memory to fill (video in)
  Picture 3 comes into memory (background DMA)
  Process picture 2, place into picture 0 location
  Picture 4 comes into memory (background DMA)
  Process picture 3, place into picture 1 location
  Transmit picture 0 (background DMA)
  Picture 0 comes into memory (background DMA)
  Process picture 4, place into picture 2 location
  Transmit picture 1 (background DMA)
  Picture 1 comes into memory (background DMA)
  Process picture 0, place into picture 3 location
  Transmit picture 2 (background DMA)
  Picture 2 comes into memory (background DMA)
  Process picture 1, place into picture 4 location
  Transmit picture 3 (background DMA)
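The rotation listed on this slide follows a fixed pattern over five picture buffers: while buffer p is processed, buffer p + 1 fills, the result goes to buffer p + 3, and buffer p + 2 (the previous result) is transmitted, all modulo 5. The sketch below generates that schedule. `Step` and `schedule` are hypothetical names, and the modulo formulas are inferred from the slide's list, not stated in it.

```cpp
#include <vector>

// Hypothetical model of the five-buffer rotation listed on the slide.
constexpr int kBuffers = 5;

struct Step {
    int process;   // buffer the core is processing
    int incoming;  // buffer being filled by background DMA (video in)
    int result;    // buffer that receives the processed picture
    int transmit;  // buffer being sent by background DMA (video out), -1 if none yet
};

// Build `steps` entries of the schedule, starting with `firstPicture`.
std::vector<Step> schedule(int firstPicture, int steps) {
    std::vector<Step> plan;
    for (int k = 0; k < steps; ++k) {
        const int p = (firstPicture + k) % kBuffers;
        Step s;
        s.process = p;
        s.incoming = (p + 1) % kBuffers;
        s.result = (p + 3) % kBuffers;
        // Nothing has been processed before the first step, so nothing is sent.
        s.transmit = (k == 0) ? -1 : (p + 2) % kBuffers;
        plan.push_back(s);
    }
    return plan;
}
```

Calling `schedule(2, 5)` reproduces the order on the slide: process 2 into 0, process 3 into 1 while 0 is transmitted, and so on.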

We are only going to look at a simple DMA task: normal code

    P0 <- address of start_array[0];
    P1 <- address of final_array[0];
    R0 <- max-value needed to transfer
    R1 <- how many values already transferred

        R1 = 0;
    LOOP:
        CC = R0 <= R1;
        IF CC JUMP DONE;
        R2 = [P0++];
        [P1++] = R2;
        R1 += 1;
        JUMP LOOP;
    DONE:
        Do something else

VERY BIG PIPELINE LATENCY ISSUES: MANY INTERNAL PROCESSOR STALLS WHILE WAITING FOR R2 TO BE READ, STORED and then TRANSMITTED
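For comparison, here is a C++ sketch of the same software copy, with the registers of the normal code shown as comments. `copyWords` is a hypothetical name, and 32-bit elements are an assumption based on the word-sized `R2 = [P0++]` access.

```cpp
#include <cstddef>
#include <cstdint>

// Software copy: the core itself reads each value and writes it back out.
// Returns the number of values transferred.
std::size_t copyWords(const std::int32_t* start_array,
                      std::int32_t* final_array,
                      std::size_t maxCount) {
    const std::int32_t* p0 = start_array;   // P0
    std::int32_t* p1 = final_array;         // P1
    std::size_t transferred = 0;            // R1 = 0
    while (transferred < maxCount) {        // CC = R0 <= R1; IF CC JUMP DONE
        const std::int32_t r2 = *p0++;      // R2 = [P0++]
        *p1++ = r2;                         // [P1++] = R2
        ++transferred;                      // count one more value
    }
    return transferred;
}
```

Every value passes through the core twice, once as a read and once as a write, which is where the stalls mentioned on the slide arise.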

We are only going to look at a simple DMA task: DMA version

    DMA_source_address_register <- address of start_array[0];
    DMA_destination_address_register <- address of final_array[0];
    DMA_max_count_register <- max-value needed to transfer
    DMA_count_register <- how many values already transferred

    DMA_enable = true
    Do something else
    Do something else

The software loop of the normal code (R1 = 0; LOOP: ... R2 = [P0++]; [P1++] = R2; JUMP LOOP; DONE:) is replaced by setting the DMA registers and enabling the DMA.
DMA transfers happen in the background
Minimized pipeline issues
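The idea of "set up the registers, enable, then do something else" can be modelled in plain C++. This is only a software stand-in: `DmaChannel`, `dmaTick` and `copyWithDma` are hypothetical names, the register set mirrors the four generic registers named on the slide, and real Blackfin DMA registers have different names and layouts.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical software model of the DMA registers named on the slide.
struct DmaChannel {
    const std::int32_t* source = nullptr;   // DMA_source_address_register
    std::int32_t* destination = nullptr;    // DMA_destination_address_register
    std::size_t maxCount = 0;               // DMA_max_count_register
    std::size_t count = 0;                  // DMA_count_register
    bool enable = false;                    // DMA_enable
};

// Stand-in for the hardware: move one value per call while enabled.
// Returns true while the transfer is still in progress.
bool dmaTick(DmaChannel& ch) {
    if (!ch.enable) return false;
    if (ch.count < ch.maxCount) {
        ch.destination[ch.count] = ch.source[ch.count];
        ++ch.count;
    }
    if (ch.count >= ch.maxCount) ch.enable = false;  // transfer complete
    return ch.enable;
}

// The core sets up the registers, enables the channel, then does other work.
// Returns how many units of "other work" fitted in while the transfer ran.
std::size_t copyWithDma(const std::int32_t* from, std::int32_t* to, std::size_t n) {
    DmaChannel ch;
    ch.source = from;
    ch.destination = to;
    ch.maxCount = n;
    ch.enable = (n > 0);
    std::size_t otherWork = 0;
    while (ch.enable) {
        ++otherWork;   // "do something else"
        dmaTick(ch);   // in hardware this happens on its own, in the background
    }
    return otherWork;
}
```

In real hardware there is no `dmaTick` call: the transfer proceeds without any core instructions, and the core typically polls a status bit or takes an interrupt when the count is reached.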

Write some tests so we know how to proceed: Test 1
Internal memory test: arrays on the stack

Write some tests so we know how to proceed: Test 2
External memory test: arrays in external SDRAM
SDRAM: MANY MEGS AVAILABLE
Addresses hard-coded

Write some tests so we know how to proceed: Test 3
Most probable way to use DMA: store in SLOW external memory, move to FAST internal memory to process, put back into external SDRAM
Addresses hard-coded

Some results (code details later)

Transfer                            Debug mode   Release mode
                                      (cycles)       (cycles)
L1 -> L1                                  8748            625
L1 -> L1 DMA                              6579           6477
SDRAM -> SDRAM                           39132          28200
SDRAM -> SDRAM DMA                       12175          12090
SDRAM -> L1 DMA                           5265           4836
SDRAM -> L1 DMA, L1 -> SDRAM DMA          9792           9276

Memory to memory move: Debug Code

Review for final
A) What happened here?
B) What happened here?
C) What happened here?
D) Why did this happen?
E) What happened here?
F) Determine loop efficiency
   in terms of instructions
   in terms of cycles / read_write op

Answer questions A B C D E

Review for final: Worksheet
F) Determine loop efficiency in terms of cycles / read_write op
internal memory -> internal memory
size was ?
Useful reads ?
Useful writes ?
Cycles as measured ?
cycles / useful mem op ? Why not an exact number?
Instructions in loop? ?
Total # of reads / writes: ? / loop
? reads / writes, around ? cycles each

Review for final
F) Determine loop efficiency in terms of cycles / read_write op
internal memory -> internal memory
size was 300
Useful reads 300
Useful writes 300
Cycles as measured: 8748
8748 / 600 = 14.58 cycles / useful mem op. Why not an exact number?
Instructions in loop? 19
Total # of reads / writes: 9 / loop
2700 reads / writes, around 3 cycles each

Review for final: Worksheet
F) Determine loop efficiency in terms of cycles / read_write op
SDRAM -> SDRAM
size was ?
Useful reads ?
Useful writes ?
Cycles as measured ?
cycles / useful mem op ? Why not an exact number?
Instructions in loop? ?
Total # of reads / writes: ? / loop
? reads / writes, around ? cycles each

Review for final
F) Determine loop efficiency in terms of cycles / read_write op
SDRAM (external) -> SDRAM
Useful reads / writes: 300 each
Cycles as measured: 39132
39132 / 600 = 65.22. Why not an exact number?
Instructions in loop? 19
Total # of reads / writes: 9 / loop
7 * 300 reads / writes internal
2 * 300 reads / writes external
Time for external reads / writes = 39132 - 2100 * 3, about 33000
33000 / 600 = 55 cycles per external read / write
About 18 times slower than an internal access (around 3 cycles)

Memory to memory move: Release Mode

Review for final
A) What happened here?
B) What happened here?
C) What happened here?
D) Why did this happen inside the loop?
E) What happened here?
F) Determine loop efficiency
   in terms of instructions
   in terms of cycles / read_write op

Answer questions A B C D E

F) Determine loop efficiency in terms of cycles / read_write op
internal memory -> internal memory
size was 300
Useful reads 300
Useful writes 300
Cycles as measured: 625
625 / 600 = 1.04. Why not an exact number?
Instructions in loop? 4
WE WOULD EXPECT 1200 cycles!!!! Where did the difference go?

Worksheet
F) Determine loop efficiency in terms of cycles / read_write op
SDRAM -> SDRAM
size was ?
Useful reads ?
Useful writes ?
Cycles as measured ?
? / ? = ?
SDRAM access ? cycles
L1 memory 1 cycle
Would it make sense to process in L1 memory?

F) Determine loop efficiency in terms of cycles / read_write op
SDRAM -> SDRAM
size was 300
Useful reads 300
Useful writes 300
Cycles as measured: 28200
28200 / 600 = 47
SDRAM access 47 cycles
L1 memory 1 cycle
It would make sense to process in L1 memory, so move SDRAM to L1 to process

Worksheet
F) Determine loop efficiency in terms of cycles / read_write op
SDRAM -> internal memory
size was ?
Useful reads ?
Useful writes ?
Cycles as measured ?
300 of those are L1 writes, leaving ?
? / ? = ?
SDRAM read before ? cycles
SDRAM read now ? cycles
L1 -> L1 ? cycle
It would make sense to process in L1 memory, so move SDRAM to L1 to process
Loads of overhead in SDRAM to SDRAM

F) Determine loop efficiency in terms of cycles / read_write op
SDRAM -> internal memory
size was 300
Useful reads 300
Useful writes 300
Cycles as measured: 4836
300 of those are L1 writes, leaving about 4500
4500 / 300 = 15
SDRAM read before 47 cycles
SDRAM read now 15 cycles
L1 -> L1 1 cycle
It would make sense to process in L1 memory, so move SDRAM to L1 to process
Loads of overhead in SDRAM to SDRAM

Tackled today
Review of handling external arrays from assembly code
  Arrays declared in another file
  Arrays declared in this file (NEW)
    Needed for arrays used by ISRs
Arrays declared on the stack
  Pointers passed as parameters to a subroutine
  Can’t use arrays on the stack when used by an ISR

Information taken from Analog Devices On-line Manuals with permission
http://www.analog.com/processors/resources/technicalLibrary/manuals/
Information furnished by Analog Devices is believed to be accurate and reliable. However, Analog Devices assumes no responsibility for its use or for any infringement of any patent or other rights of any third party which may result from its use. No license is granted by implication or otherwise under any patent or patent right of Analog Devices.
Copyright © Analog Devices, Inc. All rights reserved.