Daddy! -- Where do instructions come from?
Program Sequencer controls program flow and provides the next instruction to be executed: straight line code, jumps and loops

6/1/2015 Program sequencer, Copyright M. Smith, ECE, University of Calgary, Canada

Tackled today
- Program sequencer
- Linear flow of instructions
  - Why not discuss the IDLE instruction here?
- Jumps
- Software loops: normal and more efficient “down-counting” loops
- Special Motorola MC68XXX software loop instructions
- Loops: hardware loops
- Subroutines: next lecture
- Interrupts and exceptions: next lecture
- IDLE: next lecture

Example code
Look at moving elements from array fooHere[ ] to farAway[ ] using various instruction modes:
- Straight line coding
- In a loop (please make sure that you understand the terminology: exam question)
  - Software loop
  - Hardware loop
- In a subroutine
- Via an interrupt

Linear program flow
Program flow on the chip is mainly linear: the processor fetches and executes program instructions sequentially.
Non-sequential structures (instructions and supporting registers) direct the processor to execute an instruction that is not at the next sequential address.

Array movement
    .extern _fooHere, _farAway;            // extern long fooHere[5], farAway[5];
    P0.H = _fooHere; P0.L = _fooHere;
    P1.H = _farAway; P1.L = _farAway;
    R0 = [P0];       [P1] = R0;            // farAway[0] = fooHere[0];
    R0 = [P0 + ?];   [P1 + ?] = R0;        // farAway[1] = fooHere[1];
    R0 = [P0 + ??];  [P1 + ??] = R0;       // farAway[2] = fooHere[2];
                                           // farAway[3] = fooHere[3];
                                           // farAway[4] = fooHere[4];
Question: what goes in place of the ? and ?? here (or when doing a loop), and when doing [P1 + ?] = R0; W[P1 + ?] = R1; B[P1 + ?] = R2;?
ANSWER: Find out the correct answer, and make sure you do it correctly all the time.
ANSWER: Why worry? DO THE CODE a different way and don’t worry.
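A quick way to check the answer: Blackfin [Preg + offset] addressing takes a byte offset, so the offset for element k is k times the element size. The C++ sketch below assumes the arrays hold 32-bit elements (the Blackfin `long`) for [ ] accesses, 16-bit elements for W[ ] and 8-bit elements for B[ ]; the helper `byteOffset` and the size constants are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Byte offset of element `index` in an array of `elementSize`-byte elements.
// Blackfin [Preg + offset] addressing uses byte offsets, so this is the value
// that replaces ? (index 1) and ?? (index 2).
constexpr std::size_t byteOffset(std::size_t index, std::size_t elementSize) {
    return index * elementSize;
}

// Assumed element sizes for the three access widths in the question.
constexpr std::size_t kWord32 = sizeof(std::int32_t);  // [P1 + ?]  : 32-bit (long on Blackfin)
constexpr std::size_t kWord16 = sizeof(std::int16_t);  // W[P1 + ?] : 16-bit
constexpr std::size_t kByte8  = sizeof(std::int8_t);   // B[P1 + ?] : 8-bit

// Compile-time checks of the offsets for element 1 and element 2.
static_assert(byteOffset(1, kWord32) == 4, "? for 32-bit accesses");
static_assert(byteOffset(2, kWord32) == 8, "?? for 32-bit accesses");
static_assert(byteOffset(1, kWord16) == 2, "? for 16-bit accesses");
static_assert(byteOffset(1, kByte8) == 1, "? for 8-bit accesses");
```

With 32-bit elements, ? is 4 and ?? is 8; with W[ ] accesses to 16-bit elements ? would be 2, and with B[ ] accesses to 8-bit elements it would be 1.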

Better solution: let the processor worry about getting the indexing correct!
    .extern _fooHere;
    .extern _farAway;                      // extern long fooHere[5], farAway[5];
    P0.H = _fooHere; P0.L = _fooHere;
    P1.H = _farAway; P1.L = _farAway;
    R0 = [P0++]; [P1++] = R0;              // was R0 = [P0];      [P1] = R0;       farAway[0] = fooHere[0];
    R0 = [P0++]; [P1++] = R0;              // was R0 = [P0 + ?];  [P1 + ?] = R0;   farAway[1] = fooHere[1];
    R0 = [P0++]; [P1++] = R0;              // was R0 = [P0 + ??]; [P1 + ??] = R0;  farAway[2] = fooHere[2];
                                           // ... and likewise for farAway[3] and farAway[4]
Remember: P0 will end up pointing PAST the end of the array.

The C++ code we actually developed
    .extern _fooHere;                      // extern long fooHere[5];
    .extern _farAway;                      // extern long farAway[5];   (or: extern long fooHere[5], farAway[5];)
    P0.H = _fooHere; P0.L = _fooHere;      // long *pt0 = fooHere;      (actually pt0 = &fooHere[0];)
    P1.H = _farAway; P1.L = _farAway;      // long *pt1 = farAway;      (actually pt1 = &farAway[0];)
    R0 = [P0++]; [P1++] = R0;              // *pt1++ = *pt0++;          i.e. farAway[0] = fooHere[0];
    R0 = [P0++]; [P1++] = R0;              // *pt1++ = *pt0++;          i.e. farAway[1] = fooHere[1];
    R0 = [P0++]; [P1++] = R0;              // *pt1++ = *pt0++;          i.e. farAway[2] = fooHere[2];
                                           //                           ... and so on for farAway[3]
Remember: P0 (like pt0) will end up pointing PAST the end of the array.
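The post-increment pattern, and where the pointer ends up, can be checked on a host machine. This C++ sketch uses `std::int32_t` in place of the Blackfin's 32-bit `long` (on many desktop compilers `long` is 64 bits), and the function name `copyPostIncrement` is hypothetical. It returns the final source pointer to show that it ends one element past the end of the array.

```cpp
#include <cstddef>
#include <cstdint>

// Copies n 32-bit values with the same pattern as
//   R0 = [P0++]; [P1++] = R0;   (C++: *pt1++ = *pt0++;)
// and returns the final source pointer so the caller can see where pt0 ended up.
const std::int32_t* copyPostIncrement(const std::int32_t* pt0, std::int32_t* pt1, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        *pt1++ = *pt0++;  // read, write, then both pointers advance one element (4 bytes)
    }
    return pt0;  // one element PAST the last element read
}
```

For a 5-element array the returned pointer equals fooHere + 5, i.e. 20 bytes beyond the start. C++ allows forming this one-past-the-end pointer, but not dereferencing it.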

IDLE: seems the next simplest!
The IDLE instruction is part of a sequence of instructions that places the processor in a quiescent state so that something can happen:
- The external system can change clock frequencies (power saving: a high clock frequency can mean high power consumption).
- An SSYNC instruction MUST immediately follow the IDLE instruction.
- Getting out of the idle instruction sequence needs an understanding of interrupts.
We will discuss IDLE more later. More information in the instruction reference manual, p. 11.3.

Jump instruction
Both JUMP and CALL instructions transfer program flow to another memory location.
The difference between JUMP and CALL is that CALL automatically loads the return address into the RETS register. The return address is the next sequential address after the CALL instruction.
JUMPs can be conditional (they depend on the CC bit in the ASTAT register).
Conditional JUMP instructions use static branch prediction to reduce the branch latency caused by the length of the Blackfin instruction pipeline.
- What does “static” branch prediction mean? What is “dynamic” branch prediction?
When possible, the assembler will use the short relative jump. The target instruction must then be within -4096 to +4094 bytes of the current instruction.
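The quoted range follows from the encoding, assuming the short jump stores a signed 12-bit count of 16-bit half-words: -2048 × 2 = -4096 and +2047 × 2 = +4094 bytes, and the displacement is always even. A C++ sketch of the range test an assembler might make when choosing the short form (the function name `fitsShortJump` is hypothetical):

```cpp
#include <cstdint>

// True if a branch displacement (target address minus the address of the jump,
// in bytes) fits the short relative jump: -4096 to +4094 bytes.
// Blackfin instructions are 16-bit aligned, so the displacement must be even.
constexpr bool fitsShortJump(std::int32_t displacementBytes) {
    return displacementBytes >= -4096 && displacementBytes <= 4094 &&
           displacementBytes % 2 == 0;
}
```

A displacement of +4096 bytes therefore does not fit, even though -4096 does: the two's-complement range is one step larger on the negative side.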

Array movement
    .extern _fooHere, _farAway;            // extern long fooHere[5], farAway[5];
    P0.H = _fooHere; P0.L = _fooHere;
    P1.H = _farAway; P1.L = _farAway;
    R0 = [P0];       [P1] = R0;            // for (int num = 0; num < 5; num++) {
    R0 = [P0 + ?];   [P1 + ?] = R0;        //     farAway[num] = fooHere[num];
    R0 = [P0 + ??];  [P1 + ??] = R0;       // }
    ... and so on ...
Linear code (straight line coding) is STILL a viable way of implementing a loop:
- You don’t waste any time incrementing a loop counter.
- You don’t waste time checking a loop counter.
- You don’t waste time upsetting the processor instruction pipeline by jumping back and throwing away all prefetched instructions.
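A C++ sketch contrasting the two forms of the same 5-element copy, assuming 32-bit elements (`std::int32_t`); both function names are hypothetical. The straight-line version has no counter, no test and no backward jump; the price is code size that grows with the number of elements, so it suits short, fixed-length copies.

```cpp
#include <cstdint>

// Straight-line version: every index is a constant, as in the
// [P0 + ?] / [P1 + ?] assembly. No counter, no test, no jump.
void copyFiveStraightLine(const std::int32_t* fooHere, std::int32_t* farAway) {
    farAway[0] = fooHere[0];
    farAway[1] = fooHere[1];
    farAway[2] = fooHere[2];
    farAway[3] = fooHere[3];
    farAway[4] = fooHere[4];
}

// Loop version of the same copy, for comparison.
void copyFiveLoop(const std::int32_t* fooHere, std::int32_t* farAway) {
    for (int num = 0; num < 5; num++) {
        farAway[num] = fooHere[num];
    }
}
```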

Standard software loop
Assembly:
    .extern _fooHere;
    .extern _farAway;
    P0.H = _fooHere; P0.L = _fooHere;
    P1.H = _farAway; P1.L = _farAway;
    R1 = 0;
    R2 = 5;
LOOP:
    CC = R2 <= R1;
    IF CC JUMP LOOP_END;       // PREDICTED NOT TAKEN
    R0 = [P0++];
    [P1++] = R0;
    R1 += 1;
    JUMP LOOP;
LOOP_END:                      // outside loop

The C++ code we actually developed:
    extern long fooHere[5], farAway[5];
    long *pt0 = fooHere;
    long *pt1 = farAway;
    int num = 0;
    for ( /* empty */ ; num < 5; num++) {
        *pt1++ = *pt0++;
    }

or, with array indexing:
    for (int num = 0; num < 5; num++) {
        farAway[num] = fooHere[num];
    }
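To see exactly which instructions the standard loop executes, the assembly can be transcribed into C++ one statement per instruction, with `goto` standing in for JUMP. This is a hypothetical sketch (the names `LoopStats` and `standardLoop` are illustrative, and `std::int32_t` stands in for the 32-bit registers) that counts useful and loop-control instructions as it goes.

```cpp
#include <cstdint>

// Tally of executed instructions, by category.
struct LoopStats {
    int useful = 0;   // R0 = [P0++];  [P1++] = R0;
    int control = 0;  // R1 = 0; R2 = 5; CC = R2 <= R1; IF CC JUMP; R1 += 1; JUMP LOOP;
};

// One statement per Blackfin instruction of the test-at-top software loop.
LoopStats standardLoop(const std::int32_t* P0, std::int32_t* P1, std::int32_t count) {
    LoopStats s;
    std::int32_t R0, R1, R2;
    bool CC;
    R1 = 0;           s.control++;            // R1 = 0;
    R2 = count;       s.control++;            // R2 = 5;
LOOP:
    CC = (R2 <= R1);  s.control++;            // CC = R2 <= R1;
    if (CC) { s.control++; goto LOOP_END; }   // IF CC JUMP LOOP_END; (taken on exit)
    s.control++;                              // ... the not-taken jump still executes
    R0 = *P0++;       s.useful++;             // R0 = [P0++];
    *P1++ = R0;       s.useful++;             // [P1++] = R0;
    R1 += 1;          s.control++;            // R1 += 1;
    s.control++;      goto LOOP;              // JUMP LOOP;
LOOP_END:
    return s;
}
```

For 5 iterations it reports 10 useful and 24 control instructions. The exact count is 2 + 4N + 2: the final CC test and the taken IF CC JUMP LOOP_END on exit add 2 to the 2 + 4N used when estimating efficiency, which does not change the large-N limit.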

Program loops
Most programs have one or two loops embedded inside each other, occasionally three or more. For example:
- For all images in a list
  - For each row in each image
    - For each column (pixel) in each row
      - For each colour in each pixel
It is important to get the maximum efficiency from the instructions that are executed most often!
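A small C++ sketch of why the innermost loop dominates: its body runs images × rows × columns × colours times. The `ImageListShape` struct, the function name and the dimensions used below are hypothetical, for illustration only.

```cpp
#include <cstddef>

// Hypothetical dimensions of a list of images.
struct ImageListShape {
    std::size_t images;
    std::size_t rows;
    std::size_t columns;
    std::size_t colours;
};

// Counts how often the body of the innermost loop runs.
std::size_t innermostExecutions(const ImageListShape& s) {
    std::size_t count = 0;
    for (std::size_t image = 0; image < s.images; ++image)                 // all images in a list
        for (std::size_t row = 0; row < s.rows; ++row)                     // each row in each image
            for (std::size_t col = 0; col < s.columns; ++col)              // each column (pixel)
                for (std::size_t colour = 0; colour < s.colours; ++colour) // each colour
                    ++count;  // work placed here runs images*rows*columns*colours times
    return count;
}
```

For two 640 × 480 images with 3 colours each, the innermost body runs 1,843,200 times, so removing one instruction there saves 1,843,200 instruction executions, while removing one instruction from the outermost loop saves only 2.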

Efficiency of standard software loop
Suppose we go round the loop N times:
- 2 loop control instructions outside the loop + 4 * N loop control instructions inside the loop
- 2 * N “useful” instructions inside the loop + 4 useful set-up instructions

Loop efficiency = $\frac{4 + 2N}{(4 + 2N) + (2 + 4N)} \times 100\%$
If N is large: $\frac{2N}{6N} \times 100\% \approx 33\%$

    .extern _fooHere;
    .extern _farAway;                      // extern long fooHere[5], farAway[5];
    P0.H = _fooHere; P0.L = _fooHere;      // useful set-up (4)      long *pt0 = fooHere;
    P1.H = _farAway; P1.L = _farAway;      //                        long *pt1 = farAway;
    R1 = 0; R2 = 5;                        // control, outside (2)   int num = 0;
LOOP:
    CC = R2 <= R1;                         // control, inside        for ( /* empty */ ; num < 5; num++) {
    IF CC JUMP LOOP_END;                   // control, inside
    R0 = [P0++];                           // useful, inside             *pt1++ = *pt0++;
    [P1++] = R0;                           // useful, inside
    R1 += 1;                               // control, inside
    JUMP LOOP;                             // control, inside        }
LOOP_END:                                  // outside loop
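The efficiency formula can also be evaluated for small N, where the set-up instructions still matter. A minimal C++ sketch, assuming the definition used on these slides (useful instructions divided by all instructions executed); the function name and parameter order are hypothetical.

```cpp
// Loop efficiency as defined on the slides:
//   useful / (useful + loop control) * 100%
// All arguments are instruction counts; n is the number of iterations.
double loopEfficiencyPercent(double usefulSetup, double controlOutside,
                             double usefulPerIter, double controlPerIter, double n) {
    const double useful = usefulSetup + usefulPerIter * n;
    const double control = controlOutside + controlPerIter * n;
    return 100.0 * useful / (useful + control);
}
```

For the standard loop with N = 5, `loopEfficiencyPercent(4, 2, 2, 4, 5)` gives 14 / 36 ≈ 38.9%; as N grows it falls towards the 33% limit. The same function with (4, 3, 2, 3, N) reproduces the down-counting loop's 40% limit.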

Down-counting software loop
Assembly:
    .extern _fooHere;
    .extern _farAway;
    P0.H = _fooHere; P0.L = _fooHere;
    P1.H = _farAway; P1.L = _farAway;
    R1 = 5;
    CC = R1 <= 0;                 // test needed if the exact value of num is not known
    IF CC JUMP DO_WHILE_END;
DO_WHILE:
    R0 = [P0++];
    [P1++] = R0;
    R1 += -1;
    CC = R1 <= 0;
    IF !CC JUMP DO_WHILE (BP);    // (BP): branch predicted taken
DO_WHILE_END:                     // outside loop

C++:
    extern long fooHere[5], farAway[5];
    long *pt0 = fooHere;
    long *pt1 = farAway;
    int num = 5;
    if (num > 0)                  // test needed if the exact value of num is not known
        do {
            *pt1++ = *pt0++;
        } while ((--num) > 0);

which replaces
    for (int num = 0; num < 5; num++) {
        farAway[num] = fooHere[num];
    }
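A C++ transcription of the down-counting loop, one statement per assembly instruction with `goto` standing in for the jumps. It is a sketch assuming `std::int32_t` for the 32-bit values, and the function name `downCountingCopy` is hypothetical; it shows that the guard skips the loop entirely when the count is zero or negative.

```cpp
#include <cstdint>

// Test-at-bottom loop with a guard; returns the number of elements copied.
int downCountingCopy(const std::int32_t* P0, std::int32_t* P1, std::int32_t count) {
    int copied = 0;
    std::int32_t R0;
    std::int32_t R1 = count;              // R1 = 5;
    bool CC = (R1 <= 0);                  // CC = R1 <= 0;
    if (CC) goto DO_WHILE_END;            // IF CC JUMP DO_WHILE_END;
DO_WHILE:
    R0 = *P0++;                           // R0 = [P0++];
    *P1++ = R0;                           // [P1++] = R0;
    ++copied;                             // (bookkeeping for the sketch only)
    R1 += -1;                             // R1 += -1;
    CC = (R1 <= 0);                       // CC = R1 <= 0;
    if (!CC) goto DO_WHILE;               // IF !CC JUMP DO_WHILE (BP);
DO_WHILE_END:
    return copied;
}
```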

Efficiency of down-counting software loop
Suppose we go round the loop N times:
- 3 loop control instructions outside the loop + 3 * N loop control instructions inside the loop
- 2 * N “useful” instructions inside the loop + 4 useful set-up instructions

Loop efficiency = $\frac{4 + 2N}{(4 + 2N) + (3 + 3N)} \times 100\%$
If N is large: $\frac{2N}{5N} \times 100\% = 40\%$

    .extern _fooHere;
    .extern _farAway;                      // extern long fooHere[5], farAway[5];
    P0.H = _fooHere; P0.L = _fooHere;      // useful set-up (4)      long *pt0 = fooHere;
    P1.H = _farAway; P1.L = _farAway;      //                        long *pt1 = farAway;
    R1 = 5;                                // control, outside (3)   int num = 5;
    CC = R1 <= 0;                          // control, outside       if (num > 0)   // test needed if exact value of num not known
    IF CC JUMP DO_WHILE_END;               // control, outside
DO_WHILE:                                  //                        do {
    R0 = [P0++];                           // useful, inside             *pt1++ = *pt0++;
    [P1++] = R0;                           // useful, inside
    R1 += -1;                              // control, inside
    CC = R1 <= 0;                          // control, inside
    IF !CC JUMP DO_WHILE (BP);             // control, inside        } while ((--num) > 0);
DO_WHILE_END:                              // outside loop

Efficient loops
The Motorola MC68XXX has a specialized loop instruction (the DBcc family, e.g. DBRA) which essentially:
- Decrements the counter (a data register) and starts the jump occurring.
- While the decrement is occurring, tests whether the counter has just become -1 (i.e. the OLD COUNTER WAS ZERO). If so, the jump is stopped.
The Motorola also has specialized memory operations, WHICH TAKE MANY PROCESSOR CYCLES.
For example, the Motorola has an instruction equivalent to [P1++] = [P0++], which has all of the following steps, each taking 4 clock cycles:
- Fetch instruction
- internReg.L = W[P0];
- internReg.H = W[P0 + 2];
- W[P1] = internReg.L;
- W[P1 + 2] = internReg.H;
- P0 += 4; P1 += 4;
TOTAL OF 24 cycles at 8 MHz (24 / 8 MHz = 3 µs per element moved)

Efficiency of a “Motorola-style” down-counting software loop with specialized branch instructions
Suppose we go round the loop N times:
- 3 loop control instructions outside the loop + 1 * N loop control instructions inside the loop
- 1 * N “useful” instructions inside the loop + 2 useful set-up instructions

Loop efficiency = $\frac{6 + 5N}{(6 + 5N) + (4 + N)} \times 100\%$
If N is large: $\frac{5N}{6N} \times 100\% \approx 83\%$

    P0 = _fooHere;                         // long *pt0 = fooHere;
    P1 = _farAway;                         // long *pt1 = farAway;
    R1 = (5 - 1);                          // int num = 5;
    CC = R1 < 0;                           // if (num > 0)   // test needed if exact value of num not known
    IF CC JUMP DO_WHILE_END;
DO_WHILE:                                  // do {
    [P1++] = [P0++];                       //     *pt1++ = *pt0++;
    DECREMENT R1 AND, IF R1 != -1, JUMP DO_WHILE (BP);   // } while ((--num) > 0);   one instruction
DO_WHILE_END:                              // outside loop

NOTE: NOT AVAILABLE ON BLACKFIN
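A C++ model of the decrement-and-branch behaviour, showing why the counter is loaded with 5 - 1: the branch is taken until the decremented counter reaches -1, so a counter of N - 1 runs the body N times. This is a simplified sketch (the real 68000 DBRA decrements only the low 16 bits of the data register), and `dbraIterations` is a hypothetical name.

```cpp
#include <cstdint>

// Returns how many times the loop body runs for n requested iterations,
// using a DBRA-style decrement-and-branch at the loop bottom.
int dbraIterations(std::int32_t n) {
    int executed = 0;
    std::int32_t R1 = n - 1;              // R1 = (5 - 1);
    if (R1 < 0) return executed;          // CC = R1 < 0; IF CC JUMP DO_WHILE_END;
    do {
        ++executed;                       // [P1++] = [P0++];
        R1 -= 1;                          // decrement the counter ...
    } while (R1 != -1);                   // ... and branch back unless it is now -1
    return executed;
}
```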

Blackfin hardware loops
- Blackfin supports a mechanism for zero-overhead looping.
- Common design decision: the two innermost loops are the most often executed, so make those the most efficient.
- The program sequencer contains TWO loop units, each containing three registers:
  - Loop Top registers: LT0, LT1
  - Loop Bottom registers: LB0, LB1
  - Loop Count registers: LC0, LC1

Blackfin hardware loops
The program sequencer contains TWO loop units, each containing three registers:
- Loop Top registers: LT0, LT1
- Loop Bottom registers: LB0, LB1
- Loop Count registers: LC0, LC1
When an instruction at address X is executed (meaning PC == X), and the address X matches the contents of LBn (meaning PC == LBn), and the loop count register is greater than or equal to 2 (LCn >= 2), THEN the next instruction will be taken from address LTn.
Note that if two loops end on the same instruction, then loop 1 has the higher priority.
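The rule above can be modelled in a few lines of C++. This sketch models a single loop unit (so loop 1's priority is not shown), uses instruction indices instead of byte addresses, and assumes that LC is decremented each time the loop bottom is reached, so the loop falls through once LC reaches 0; the `LoopUnit` and `runWithHardwareLoop` names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One loop unit: top, bottom (instruction indices) and count.
struct LoopUnit {
    int LT;
    int LB;
    std::uint32_t LC;
};

// Runs a "program" of numInstructions instructions starting at index 0 and
// returns how many times each instruction index was executed.
std::vector<int> runWithHardwareLoop(int numInstructions, LoopUnit loop) {
    std::vector<int> executed(static_cast<std::size_t>(numInstructions), 0);
    int pc = 0;
    while (pc < numInstructions) {
        ++executed[static_cast<std::size_t>(pc)];  // "execute" the instruction at PC
        if (pc == loop.LB && loop.LC >= 2) {       // PC == LB and LC >= 2
            --loop.LC;
            pc = loop.LT;                          // next instruction comes from LT
        } else {
            if (pc == loop.LB && loop.LC == 1) {
                --loop.LC;                         // last pass: LC becomes 0
            }
            ++pc;                                  // otherwise: next sequential instruction
        }
    }
    return executed;
}
```

With the 5-iteration copy laid out as 0: LSETUP, 1: LOOP_START, 2: LOOP_END, 3: OUTSIDE_LOOP, `runWithHardwareLoop(4, LoopUnit{1, 2, 5})` executes indices 1 and 2 five times each and the others once, with no loop-control instruction inside the loop.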

Pseudo code example
    Set LT0 = first instruction in loop    // LOOP_START
    Set LB0 = last instruction in loop     // LOOP_END
    Set LC0 = 5;
LOOP_START:
    R0 = [P0++];
LOOP_END:
    [P1++] = R0;
The manual (p. 4-16) says: “Each loop register can be loaded individually with a register transfer, but this incurs a significant overhead if the loop count is non-zero (the loop is active) at the time of the transfer.”
That sounds unpleasant, so let’s find an easier way.
The manual (p. 4-16) also says: “The LSETUP instruction can be used to load all three registers of a loop unit at the same time.”

Efficiency of standard software loop
Suppose we go round the loop N times:
- 2 loop control instructions outside the loop + 4 * N loop control instructions inside the loop
- 2 * N “useful” instructions inside the loop + 4 useful set-up instructions

Loop efficiency = $\frac{4 + 2N}{(4 + 2N) + (2 + 4N)} \times 100\%$
If N is large: $\frac{2N}{6N} \times 100\% \approx 33\%$

    P0.H = _fooHere; P0.L = _fooHere;      // long *pt0 = fooHere;
    P1.H = _farAway; P1.L = _farAway;      // long *pt1 = farAway;
    R1 = 0; R2 = 5;                        // int num = 0;
LOOP:
    CC = R2 <= R1;                         // for ( /* empty */ ; num < 5; num++) {
    IF CC JUMP LOOP_END;
    R0 = [P0++];                           //     *pt1++ = *pt0++;
    [P1++] = R0;
    R1 += 1;
    JUMP LOOP;                             // }
LOOP_END:                                  // outside loop

WARNING: LOOP_END is an instruction that IS NOT EXECUTED INSIDE THE SOFTWARE LOOP

Efficiency of hardware loop
Suppose we go round the loop N times:
- 2 loop control instructions outside the loop + 0 loop control instructions inside the loop (there are some pipeline overhead issues on leaving the loop)
- 2 * N “useful” instructions inside the loop + 4 useful set-up instructions

Loop efficiency = $\frac{4 + 2N}{(4 + 2N) + 2} \times 100\%$
If N is large: $\frac{2N}{2N} \times 100\% = 100\%$

    .extern _fooHere;
    .extern _farAway;                          // extern long fooHere[5], farAway[5];
    P0.H = _fooHere; P0.L = _fooHere;          // useful set-up (4)      long *pt0 = fooHere;
    P1.H = _farAway; P1.L = _farAway;          //                        long *pt1 = farAway;
    P2 = 5;                                    // control, outside (2)   int num = 0;
    LSETUP (LOOP_START, LOOP_END) LC1 = P2;    // control, outside       for ( /* empty */ ; num < 5; num++) {
LOOP_START:
    R0 = [P0++];                               // useful, inside             *pt1++ = *pt0++;
LOOP_END:
    [P1++] = R0;                               // useful, inside         }
OUTSIDE_LOOP:

WARNING: LOOP_END is an instruction that IS EXECUTED INSIDE THE HARDWARE LOOP
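Totalling the instructions each Blackfin loop style executes makes the comparison concrete. A C++17 sketch using the counts from the efficiency slides (pipeline overhead on leaving the hardware loop is ignored, as on the slide); the `LoopStyle` table and `totalInstructions` are hypothetical names.

```cpp
#include <array>
#include <string_view>

// Instruction counts per loop style, taken from the efficiency slides.
struct LoopStyle {
    std::string_view name;
    int usefulSetup;     // useful set-up instructions
    int controlOutside;  // loop-control instructions outside the loop
    int usefulPerIter;   // useful instructions per iteration
    int controlPerIter;  // loop-control instructions per iteration
};

constexpr std::array<LoopStyle, 3> kBlackfinStyles{{
    {"standard software loop", 4, 2, 2, 4},
    {"down-counting software loop", 4, 3, 2, 3},
    {"hardware loop (LSETUP)", 4, 2, 2, 0},
}};

// Total instructions executed for n iterations, using the slides' counting.
constexpr int totalInstructions(const LoopStyle& s, int n) {
    return s.usefulSetup + s.controlOutside + (s.usefulPerIter + s.controlPerIter) * n;
}
```

For N = 5 this gives 36 instructions for the standard loop, 32 for the down-counting loop and 16 for the hardware loop.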

Big warning
SOFTWARE LOOP:
    R1 = 0;
    R2 = 5;
LOOP:
    CC = R2 <= R1;
    IF CC JUMP LOOP_END;
    R0 = [P0++];
    [P1++] = R0;
    R1 += 1;
    JUMP LOOP;
LOOP_END:          // outside loop

HARDWARE LOOP:
LOOP_START:
    R0 = [P0++];
LOOP_END:
    [P1++] = R0;   // LOOP_END is ALWAYS executed in a hardware loop
OUTSIDE_LOOP:

Warning and speed issues
- The distance between the LSETUP instruction and the LOOP_START instruction MUST NOT BE MORE THAN 30 bytes (otherwise the offset will not fit into the instruction encoding).
  - There is a 4-clock-cycle advantage if LSETUP is the instruction immediately before the LOOP_START instruction.
- The distance between the LSETUP instruction and the LOOP_END instruction MUST NOT BE MORE THAN 2046 bytes (otherwise the offset will not fit into the instruction encoding).
- The processor supports a four-location instruction loop buffer. If the loop code contains four or fewer instructions, then no fetches from instruction memory are necessary for any number of loop iterations, because the instructions are stored locally. This eliminates instruction fetch time (especially important when accessing external memory).
- Really efficient loops are therefore no more than 4 instructions long. I have requested information on whether this means 4 instructions, or 4 instructions which can be highly parallel (like 16 instructions in non-parallel mode).
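The two offset limits and the loop-buffer size can be checked mechanically. A hypothetical C++ sketch, assuming distances are measured forward in bytes from the LSETUP instruction and must be even because Blackfin instructions are 16-bit aligned; it counts the loop-buffer limit in instructions, leaving open the parallel-instruction question raised above.

```cpp
#include <cstdint>

// Hypothetical description of an LSETUP-based loop layout.
struct LsetupLayout {
    std::int32_t toLoopStart;   // bytes from LSETUP to LOOP_START
    std::int32_t toLoopEnd;     // bytes from LSETUP to LOOP_END
    int instructionsInLoop;     // instructions from LOOP_START to LOOP_END
};

constexpr bool isEvenInRange(std::int32_t d, std::int32_t maxBytes) {
    return d >= 0 && d <= maxBytes && d % 2 == 0;
}

// True if both offsets fit the instruction encoding (30 and 2046 byte limits).
constexpr bool lsetupEncodable(const LsetupLayout& l) {
    return isEvenInRange(l.toLoopStart, 30) && isEvenInRange(l.toLoopEnd, 2046);
}

// True if the loop body fits the four-location loop buffer.
constexpr bool fitsLoopBuffer(const LsetupLayout& l) {
    return l.instructionsInLoop >= 1 && l.instructionsInLoop <= 4;
}
```

Assuming LSETUP is a 32-bit instruction and the load at LOOP_START is 16 bits, the copy loop with LOOP_START immediately after LSETUP has `toLoopStart = 4` and `toLoopEnd = 6`: both encodable, and the 2-instruction body fits the loop buffer.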

Tackled today
- Program sequencer
- Linear flow of instructions
  - Why not discuss the IDLE instruction here?
- Jumps
- Software loops: normal and more efficient “down-counting” loops
- Special Motorola MC68XXX software loop instructions
- Loops: hardware loops
- Subroutines: next lecture
- Interrupts and exceptions: next lecture
- IDLE: next lecture

Information taken from Analog Devices on-line manuals with permission:
http://www.analog.com/processors/resources/technicalLibrary/manuals/
Information furnished by Analog Devices is believed to be accurate and reliable. However, Analog Devices assumes no responsibility for its use or for any infringement of any patent or other rights of any third party which may result from its use. No license is granted by implication or otherwise under any patent or patent right of Analog Devices. Copyright © Analog Devices, Inc. All rights reserved.
