Presentation is loading. Please wait.

Presentation is loading. Please wait.

Chapter 2 TMS320C6000 Architectural Overview. Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 2 Describe C6000 CPU architecture.

Similar presentations


Presentation on theme: "Chapter 2 TMS320C6000 Architectural Overview. Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 2 Describe C6000 CPU architecture."— Presentation transcript:

1 Chapter 2 TMS320C6000 Architectural Overview

2 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 2 Describe C6000 CPU architecture. Describe C6000 CPU architecture. Introduce some basic instructions. Introduce some basic instructions. Describe the C6000 memory map. Describe the C6000 memory map. Provide an overview of the peripherals. Provide an overview of the peripherals. Learning Objectives

3 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 3 General DSP System Block Diagram PERIPHERALSPERIPHERALSPERIPHERALSPERIPHERALS CentralProcessing Unit Unit Internal Memory Internal Buses ExternalMemory

4 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 4 Implementation of Sum of Products (SOP) Implementation of Sum of Products (SOP) It has been shown in Chapter 1 that SOP is the key element for most DSP algorithms. So lets write the code for this algorithm and at the same time discover the C6000 architecture. Two basic operations are required for this algorithm. (1) Multiplication (2) Addition Therefore two basic instructions are required Y = N a n x n a n x n n = 1 * = a 1 * x 1 + a 2 * x 2 +... + a N * x N

5 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 5 Two basic operations are required for this algorithm. (1) Multiplication (2) Addition Therefore two basic instructions are required Implementation of Sum of Products (SOP) Implementation of Sum of Products (SOP) Y = N a n x n a n x n n = 1 * So lets implement the SOP algorithm! The implementation in this module will be done in assembly. = a 1 * x 1 + a 2 * x 2 +... + a N * x N

6 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 6 Multiply (MPY) The multiplication of a 1 by x 1 is done in assembly by the following instruction: MPYa1, x1, Y This instruction is performed by a multiplier unit that is called.M Y = N a n x n a n x n n = 1 * = a 1 * x 1 + a 2 * x 2 +... + a N * x N

7 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 7 Multiply (.M unit).M.M Y = 40 a n x n a n x n n = 1 * The. M unit performs multiplications in hardware MPY.Ma1, x1, Y Note: 16-bit by 16-bit multiplier provides a 32-bit result. 32-bit by 32-bit multiplier provides a 64-bit result.

8 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 8 Addition (.?).M.M.?.? Y = 40 a n x n a n x n n = 1 * MPY.Ma1, x1, prod ADD.?Y, prod, Y

9 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 9 Add (.L unit).M.M.L.L Y = 40 a n x n a n x n n = 1 * MPY.Ma1, x1, prod ADD.LY, prod, Y RISC processors such as the C6000 use registers to hold the operands, so lets change this code.

10 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 10 Register File - A Y = 40 a n x n a n x n n = 1 * MPY.Ma1, x1, prod ADD.LY, prod, Y Let us correct this by replacing a, x, prod and Y by the registers as shown above..M.M.L.L A0A1A2A3A15 Register File A............ a1 x1 prod 32-bits Y

11 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 11 Specifying Register Names Y = 40 a n x n a n x n n = 1 * MPY.MA0, A1, A3 ADD.LA4, A3, A4 The registers A0, A1, A3 and A4 contain the values to be used by the instructions..M.M.L.L A0A1A2A3A15 Register File A............ a1 x1 prod 32-bits Y

12 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 12 Specifying Register Names Y = 40 a n x n a n x n n = 1 * MPY.MA0, A1, A3 ADD.LA4, A3, A4 Register File A contains 16 registers (A0 -A15) which are 32-bits wide..M.M.L.L A0A1A2A3A15 Register File A............ a1 x1 prod 32-bits Y

13 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 13 Data loading Q: How do we load the operands into the registers?.M.M.L.L A0A1A2A3A15 Register File A............ a1 x1 prod 32-bits Y

14 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 14 Load Unit.D A: The operands are loaded into the registers by loading them from the memory using the.D unit..M.M.L.L A0A1A2A3A15 Register File A............ a1 x1 prod 32-bits Y.D.D Data Memory Q: How do we load the operands into the registers?

15 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 15 Load Unit.D It is worth noting at this stage that the only way to access memory is through the.D unit..M.M.L.L A0A1A2A3A15 Register File A............ a1 x1 prod 32-bits Y.D.D Data Memory

16 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 16 Load Instruction Q: Which instruction(s) can be used for loading operands from the memory to the registers?.M.M.L.L A0A1A2A3A15 Register File A............ a1 x1 prod 32-bits Y.D.D Data Memory

17 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 17 Load Instructions (LDB, LDH,LDW,LDDW) A: The load instructions..M.M.L.L A0A1A2A3A15 Register File A............ a1 x1 prod 32-bits Y.D.D Data Memory Q: Which instruction(s) can be used for loading operands from the memory to the registers?

18 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 18 Using the Load Instructions 00000000 00000002 00000004 00000006 00000008 Data 16-bits Before using the load unit you have to be aware that this processor is byte addressable, which means that each byte is represented by a unique address. Also the addresses are 32-bit wide. address FFFFFFFF

19 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 19 The syntax for the load instruction is: Where: Rn is a register that contains the address of the operand to be loaded and Rm is the destination register. Using the Load Instructions 00000000 00000002 00000004 00000006 00000008 Data a1 x1 prod 16-bits Y address FFFFFFFF LD *Rn,Rm

20 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 20 The syntax for the load instruction is: The question now is how many bytes are going to be loaded into the destination register? Using the Load Instructions 00000000 00000002 00000004 00000006 00000008 Data a1 x1 prod 16-bits Y address FFFFFFFF LD *Rn,Rm

21 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 21 The syntax for the load instruction is: LD *Rn,Rm Using the Load Instructions 00000000 00000002 00000004 00000006 00000008 Data a1 x1 prod 16-bits Y address FFFFFFFF The answer, is that it depends on the instruction you choose: LDB: loads one byte (8-bit) LDB: loads one byte (8-bit) LDH: loads half word (16-bit) LDH: loads half word (16-bit) LDW: loads a word (32-bit) LDW: loads a word (32-bit) LDDW: loads a double word (64-bit) LDDW: loads a double word (64-bit) Note: LD on its own does not exist.

22 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 22 Using the Load Instructions 00000000 00000002 00000004 00000006 00000008 Data 16-bits address FFFFFFFF 0xB0xA 0xD0xC Example: If we assume that A5 = 0x4 then: (1) LDB *A5, A7 ; gives A7 = 0x00000001 (2) LDH *A5,A7; gives A7 = 0x00000201 (3) LDW *A5,A7; gives A7 = 0x04030201 (4) LDDW *A5,A7:A6; gives A7:A6 = 0x0807060504030201 0x10x2 0x30x4 0x50x6 0x70x8 The syntax for the load instruction is: LD *Rn,Rm 01

23 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 23 Using the Load Instructions 00000000 00000002 00000004 00000006 00000008 Data 16-bits address FFFFFFFF 0xB0xA 0xD0xC Question: If data can only be accessed by the load instruction and the.D unit, how can we load the register pointer Rn in the first place? 0x10x2 0x30x4 0x50x6 0x70x8 The syntax for the load instruction is: LD *Rn,Rm

24 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 24 The instruction MVKL will allow a move of a 16-bit constant into a register as shown below: The instruction MVKL will allow a move of a 16-bit constant into a register as shown below: MVKL.?a, A5 (a is a constant or label) How many bits represent a full address? How many bits represent a full address? 32 bits So why does the instruction not allow a 32-bit move? So why does the instruction not allow a 32-bit move? All instructions are 32-bit wide (see instruction opcode). Loading the Pointer Rn

25 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 25 To solve this problem another instruction is available: To solve this problem another instruction is available:MVKH Loading the Pointer Rn eg. MVKH.?a, A5 (a is a constant or label) ah ahx al a A5 MVKL a, A5 MVKH a, A5 Finally, to move the 32-bit address to a register we can use: Finally, to move the 32-bit address to a register we can use:

26 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 26 Loading the Pointer Rn MVKL0x1234FABC, A5 A5 = 0xFFFFFABC ; Wrong Example 1 A5 = 0x87654321 MVKL0x1234FABC, A5 A5 = 0xFFFFFABC (sign extension) MVKH0x1234FABC, A5 A5 = 0x1234FABC ; OK Example 2 MVKH0x1234FABC, A5 A5 = 0x12344321 Always use MVKL then MVKH, look at the following examples: Always use MVKL then MVKH, look at the following examples:

27 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 27 LDH, MVKL and MVKH.M.M.L.L A0A1A2A3A4A15 Register File A............ a x prod 32-bits Y.D.D Data Memory MVKL pt1, A5 MVKL pt1, A5 MVKH pt1, A5 MVKH pt1, A5 MVKL pt2, A6 MVKL pt2, A6 MVKH pt2, A6 MVKH pt2, A6 LDH.D*A5, A0 LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 pt1 and pt2 point to some locations in the data memory.

28 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 28 Creating a loop MVKL pt1, A5 MVKL pt1, A5 MVKH pt1, A5 MVKH pt1, A5 MVKL pt2, A6 MVKL pt2, A6 MVKH pt2, A6 MVKH pt2, A6 LDH.D*A5, A0 LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 So far we have only implemented the SOP for one tap only, i.e. Y= a 1 * x 1 So lets create a loop so that we can implement the SOP for N Taps.

29 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 29 Creating a loop With the C6000 processors there are no dedicated instructions such as block repeat. The loop is created using the B instruction. So far we have only implemented the SOP for one tap only, i.e. Y= a 1 * x 1 So lets create a loop so that we can implement the SOP for N Taps.

30 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 30 What are the steps for creating a loop 1. Create a label to branch to. 2. Add a branch instruction, B. 3.Create a loop counter. 4. Add an instruction to decrement the loop counter. 5. Make the branch conditional based on the value in the loop counter.

31 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 31 1. Create a label to branch to MVKL pt1, A5 MVKL pt1, A5 MVKH pt1, A5 MVKH pt1, A5 MVKL pt2, A6 MVKL pt2, A6 MVKH pt2, A6 MVKH pt2, A6 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4

32 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 32 MVKL pt1, A5 MVKL pt1, A5 MVKH pt1, A5 MVKH pt1, A5 MVKL pt2, A6 MVKL pt2, A6 MVKH pt2, A6 MVKH pt2, A6 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 B.?loop 2. Add a branch instruction, B.

33 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 33 Which unit is used by the B instruction?.M.M.L.L A0A1A2A3A15 Register File A............ a x prod 32-bits Y.D.D.M.M.L.L A0A1A2A3A15............ a x prod 32-bits Y.D.D Data Memory.S.S MVKLpt1, A5 MVKLpt1, A5 MVKH pt1, A5 MVKH pt1, A5 MVKL pt2, A6 MVKL pt2, A6 MVKH pt2, A6 MVKH pt2, A6 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 B.?loop

34 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 34 Data Memory Which unit is used by the B instruction?.M.M.L.L A0A1A2A3A15 Register File A............ a x prod 32-bits Y.D.D.M.M.L.L A0A1A2A3A15............ a x prod 32-bits Y.D.D.S.S MVKL.S pt1, A5 MVKL.S pt1, A5 MVKH.S pt1, A5 MVKH.S pt1, A5 MVKL.S pt2, A6 MVKL.S pt2, A6 MVKH.S pt2, A6 MVKH.S pt2, A6 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 B.Sloop

35 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 35 Data Memory 3. Create a loop counter..M.M.L.L A0A1A2A3A15 Register File A............ a x prod 32-bits Y.D.D.M.M.L.L A0A1A2A3A15............ a x prod 32-bits Y.D.D.S.S MVKL.S pt1, A5 MVKL.S pt1, A5 MVKH.S pt1, A5 MVKH.S pt1, A5 MVKL.S pt2, A6 MVKL.S pt2, A6 MVKH.S pt2, A6 MVKH.S pt2, A6 MVKL.Scount, B0 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 B.Sloop B registers will be introduced later

36 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 36 4. Decrement the loop counter.M.M.L.L A0A1A2A3A15 Register File A............ a x prod 32-bits Y.D.D Data Memory.M.M.L.L A0A1A2A3A15 Register File A............ a x prod 32-bits Y.D.D.S.S MVKL.S pt1, A5 MVKL.S pt1, A5 MVKH.S pt1, A5 MVKH.S pt1, A5 MVKL.S pt2, A6 MVKL.S pt2, A6 MVKH.S pt2, A6 MVKH.S pt2, A6 MVKL.Scount, B0 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 B.Sloop

37 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 37 What is the syntax for making instruction conditional? What is the syntax for making instruction conditional? [condition] InstructionLabel e.g. [B1]Bloop (1) The condition can be one of the following registers: A1, A2, B0, B1, B2. (2) Any instruction can be conditional. 5. Make the branch conditional based on the value in the loop counter

38 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 38 The condition can be inverted by adding the exclamation symbol ! as follows: The condition can be inverted by adding the exclamation symbol ! as follows: [!condition] InstructionLabel e.g. [!B0]Bloop ;branch if B0 = 0 [B0]Bloop;branch if B0 != 0 5. Make the branch conditional based on the value in the loop counter

39 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 39 Data Memory.M.M.L.L A0A1A2A3A15 Register File A............ a x prod 32-bits Y.D.D.M.M.L.L A0A1A2A3A15............ a x prod 32-bits Y.D.D.S.S MVKL.S2 pt1, A5 MVKL.S2 pt1, A5 MVKH.S2 pt1, A5 MVKH.S2 pt1, A5 MVKL.S2 pt2, A6 MVKL.S2 pt2, A6 MVKH.S2 pt2, A6 MVKH.S2 pt2, A6 MVKL.S2count, B0 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 [B0]B.Sloop [B0]B.Sloop 5. Make the branch conditional

40 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 40 Case 1: B.S1 label Case 1: B.S1 label Relative branch. Relative branch. Label limited to +/- 2 20 offset. Label limited to +/- 2 20 offset. More on the Branch Instruction (1) With this processor all the instructions are encoded in a 32-bit. With this processor all the instructions are encoded in a 32-bit. Therefore the label must have a dynamic range of less than 32-bit as the instruction B has to be coded. Therefore the label must have a dynamic range of less than 32-bit as the instruction B has to be coded. 21-bit relative address B32-bit

41 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 41 More on the Branch Instruction (2) By specifying a register as an operand instead of a label, it is possible to have an absolute branch. By specifying a register as an operand instead of a label, it is possible to have an absolute branch. This will allow a dynamic range of 2 32. This will allow a dynamic range of 2 32. Case 2: B.S2register Case 2: B.S2register Absolute branch. Absolute branch. Operates on.S2 ONLY! Operates on.S2 ONLY! 5-bit register code B32-bit

42 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 42 Testing the code This code performs the following operations: operations: a 0 *x 0 + a 0 *x 0 + a 0 *x 0 + … + a 0 *x 0 However, we would like to perform: a 0 *x 0 + a 1 *x 1 + a 2 *x 2 + … + a N *x N MVKL.S2 pt1, A5 MVKL.S2 pt1, A5 MVKH.S2 pt1, A5 MVKH.S2 pt1, A5 MVKL.S2 pt2, A6 MVKL.S2 pt2, A6 MVKH.S2 pt2, A6 MVKH.S2 pt2, A6 MVKL.S2count, B0 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 [B0]B.Sloop [B0]B.Sloop

43 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 43 Modifying the pointers The solution is to modify the pointers A5 and A6. MVKL.S2 pt1, A5 MVKL.S2 pt1, A5 MVKH.S2 pt1, A5 MVKH.S2 pt1, A5 MVKL.S2 pt2, A6 MVKL.S2 pt2, A6 MVKH.S2 pt2, A6 MVKH.S2 pt2, A6 MVKL.S2count, B0 loop LDH.D*A5, A0 LDH.D*A6, A1 LDH.D*A6, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 [B0]B.Sloop [B0]B.Sloop

44 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 44 Indexing Pointers Description Pointer Syntax Pointer Modified *R*R*R*RNo R can be any register In this case the pointers are used but not modified.

45 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 45 Indexing Pointers Description Pointer + Pre-offset - Pre-offset Syntax Pointer Modified *R *+R[ disp ] *-R[ disp ] NoNoNo [disp] specifies the number of elements size in DW (64-bit), W (32-bit), H (16-bit), or B (8-bit). [disp] specifies the number of elements size in DW (64-bit), W (32-bit), H (16-bit), or B (8-bit). disp = R or 5-bit constant. disp = R or 5-bit constant. R can be any register. R can be any register. In this case the pointers are modified BEFORE being used and RESTORED to their previous values.

46 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 46 Indexing Pointers Description Pointer + Pre-offset - Pre-offset Pre-incrementPre-decrement Syntax Pointer Modified *R *+R[ disp ] *-R[ disp ] *++R[ disp ] *--R[ disp ] NoNoNoYesYes In this case the pointers are modified BEFORE being used and NOT RESTORED to their Previous Values.

47 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 47 Indexing Pointers Description Pointer + Pre-offset - Pre-offset Pre-incrementPre-decrementPost-incrementPost-decrement Syntax Pointer Modified *R *+R[ disp ] *-R[ disp ] *++R[ disp ] *--R[ disp ] *R++[ disp ] *R--[ disp ] NoNoNoYesYesYesYes In this case the pointers are modified AFTER being used and NOT RESTORED to their Previous Values.

48 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 48 Indexing Pointers Description Pointer + Pre-offset - Pre-offset Pre-incrementPre-decrementPost-incrementPost-decrement Syntax Pointer Modified *R *+R[ disp ] *-R[ disp ] *++R[ disp ] *--R[ disp ] *R++[ disp ] *R--[ disp ] NoNoNoYesYesYesYes [disp] specifies # elements - size in DW, W, H, or B. [disp] specifies # elements - size in DW, W, H, or B. disp = R or 5-bit constant. disp = R or 5-bit constant. R can be any register. R can be any register.

49 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 49 Modify and testing the code This code now performs the following operations: operations: a 0 *x 0 + a 1 *x 1 + a 2 *x 2 +... + a N *x N MVKL.S2 pt1, A5 MVKL.S2 pt1, A5 MVKH.S2 pt1, A5 MVKH.S2 pt1, A5 MVKL.S2 pt2, A6 MVKL.S2 pt2, A6 MVKH.S2 pt2, A6 MVKH.S2 pt2, A6 MVKL.S2count, B0 loop LDH.D*A5++, A0 LDH.D*A6++, A1 LDH.D*A6++, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 [B0]B.Sloop [B0]B.Sloop

50 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 50 Store the final result This code now performs the following operations: operations: a 0 *x 0 + a 1 *x 1 + a 2 *x 2 +... + a N *x N MVKL.S2 pt1, A5 MVKL.S2 pt1, A5 MVKH.S2 pt1, A5 MVKH.S2 pt1, A5 MVKL.S2 pt2, A6 MVKL.S2 pt2, A6 MVKH.S2 pt2, A6 MVKH.S2 pt2, A6 MVKL.S2count, B0 loop LDH.D*A5++, A0 LDH.D*A6++, A1 LDH.D*A6++, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 [B0]B.Sloop [B0]B.Sloop STH.DA4, *A7

51 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 51 Store the final result The Pointer A7 has not been initialised. MVKL.S2 pt1, A5 MVKL.S2 pt1, A5 MVKH.S2 pt1, A5 MVKH.S2 pt1, A5 MVKL.S2 pt2, A6 MVKL.S2 pt2, A6 MVKH.S2 pt2, A6 MVKH.S2 pt2, A6 MVKL.S2count, B0 loop LDH.D*A5++, A0 LDH.D*A6++, A1 LDH.D*A6++, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 [B0]B.Sloop [B0]B.Sloop STH.DA4, *A7

52 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 52 Store the final result The Pointer A7 is now initialised. MVKL.S2 pt1, A5 MVKL.S2 pt1, A5 MVKH.S2 pt1, A5 MVKH.S2 pt1, A5 MVKL.S2 pt2, A6 MVKL.S2 pt2, A6 MVKH.S2 pt2, A6 MVKH.S2 pt2, A6 MVKL.S2pt3, A7 MVKH.S2pt3, A7 MVKL.S2count, B0 loop LDH.D*A5++, A0 LDH.D*A6++, A1 LDH.D*A6++, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 [B0]B.Sloop [B0]B.Sloop STH.DA4, *A7

53 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 53 What is the initial value of A4? A4 is used as an accumulator, so it needs to be reset to zero. so it needs to be reset to zero. MVKL.S2 pt1, A5 MVKL.S2 pt1, A5 MVKH.S2 pt1, A5 MVKH.S2 pt1, A5 MVKL.S2 pt2, A6 MVKL.S2 pt2, A6 MVKH.S2 pt2, A6 MVKH.S2 pt2, A6 MVKL.S2pt3, A7 MVKH.S2pt3, A7 MVKL.S2count, B0 ZERO.LA4 loop LDH.D*A5++, A0 LDH.D*A6++, A1 LDH.D*A6++, A1 MPY.MA0, A1, A3 MPY.MA0, A1, A3 ADD.LA4, A3, A4 ADD.LA4, A3, A4 SUB.SB0, 1, B0 [B0]B.Sloop [B0]B.Sloop STH.DA4, *A7

54 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 54 How can we add more processing power to this processor?.S1.S1.M1.M1.L1.L1.D1.D1 A0 A1 A2 A3 A4 Register File A............ Data Memory A15 32-bits Increasing the processing power!

55 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 55 (1)Increase the clock frequency..S1.S1.M1.M1.L1.L1.D1.D1 A0 A1 A2 A3 A4 Register File A............ Data Memory A15 32-bits Increasing the processing power! (2)Increase the number of Processing units.

56 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 56 To increase the Processing Power, this processor has two sides (A and B or 1 and 2) Data Memory.S1.S1.M1.M1.L1.L1.D1.D1 A0 A1 A2 A3 A4 Register File A............ A15 32-bits.S2.S2.M2.M2.L2.L2.D2.D2 B0 B1 B2 B3 B4 Register File B............ B15 32-bits

57 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 57 Can the two sides exchange operands in order to increase performance? Data Memory.S1.S1.M1.M1.L1.L1.D1.D1 A0 A1 A2 A3 A4 Register File A............ A15 32-bits B15.S2.S2.M2.M2.L2.L2.D2.D2 B0 B1 B2 B3 B4 Register File B............ 32-bits

58 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 58 The answer is YES but there are limitations. To exchange operands between the two sides, some cross paths or links are required. To exchange operands between the two sides, some cross paths or links are required. What is a cross path? A cross path links one side of the CPU to the other. A cross path links one side of the CPU to the other. There are two types of cross paths: There are two types of cross paths: Data cross paths. Data cross paths. Address cross paths. Address cross paths.

59 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 59 Data Cross Paths Data cross paths can also be referred to as register file cross paths. Data cross paths can also be referred to as register file cross paths. These cross paths allow operands from one side to be used by the other side. These cross paths allow operands from one side to be used by the other side. There are only two cross paths: There are only two cross paths: one path which conveys data from side B to side A, 1X. one path which conveys data from side B to side A, 1X. one path which conveys data from side A to side B, 2X. one path which conveys data from side A to side B, 2X.

60 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 60 TMS320C67x Data-Path

61 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 61 Data Cross Paths Data cross paths only apply to the.L,.S and.M units. Data cross paths only apply to the.L,.S and.M units. The data cross paths are very useful, however there are some limitations in their use. The data cross paths are very useful, however there are some limitations in their use.

62 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 62 Data Cross Path Limitations A 2x.L1.M1.S1 B 1x (1) The destination register must be on same side as unit. (2) Source registers - up to one cross path per execute packet per side. Execute packet: group of instructions that execute simultaneously.

63 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 63 Cross Path Limitations Data Cross Path LimitationsA 2x.L1.M1.S1 B 1x eg: ADD.L1x A0,A1,B2 MPY.M1x A0,B6,A9 SUB.S1x A8,B2,A8 || ADD.L1x A0,B0,A2 || Means that the SUB and ADD belong to the same fetch packet, therefore execute simultaneously.

64 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 64 Data Cross Path Limitations A 2x.L1.M1.S1 B 1x eg: ADD.L1x A0,A1,B2 MPY.M1x A0,B6,A9 SUB.S1x A8,B2,A8 || ADD.L1x A0,B0,A2 NOT VALID!

65 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 65 Data Cross Paths for both sides A 2x.L1.M1.S1 B 1x .L2.M2.S2

66 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 66 Address cross paths.D1 A Addr Data LDW.D1T1 *A0,A5 STW.D1T1 A5,*A0 LDW.D1T1 *A0,A5 STW.D1T1 A5,*A0 (1) The pointer must be on the same side of the unit.

67 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 67 Load or store to either side.D1 A *A0 B Data1A5 Data2B5 DA1 = T1 DA2 = T2 LDW.D1T1 *A0,A5 LDW.D1T2 *A0,B5 LDW.D1T1 *A0,A5 LDW.D1T2 *A0,B5

68 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 68 Standard Parallel Loads.D1 A A5 *A0 B B5.D2 Data1 *B0 LDW.D1T1 *A0,A5 LDW.D1T1 *A0,A5 || LDW.D2T2 *B0,B5 LDW.D1T1 *A0,A5 LDW.D1T1 *A0,A5 || LDW.D2T2 *B0,B5 DA1 = T1 DA2 = T2

69 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 69 Parallel Load/Store using address cross paths.D1 A A5 *A0 B B5.D2 Data1 *B0 LDW.D1T2 *A0,B5 LDW.D1T2 *A0,B5 || STW.D2T1 A5,*B0 LDW.D1T2 *A0,B5 LDW.D1T2 *A0,B5 || STW.D2T1 A5,*B0 DA1 = T1 DA2 = T2

70 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 70 Fill the blanks... Does this work?.D1 A *A0 B.D2 Data1 *B0 LDW.D1__ *A0,B5 LDW.D1__ *A0,B5 || STW.D2__ B6,*B0 LDW.D1__ *A0,B5 LDW.D1__ *A0,B5 || STW.D2__ B6,*B0 DA1 = T1 DA2 = T2

71 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 71 Not Allowed! Parallel accesses: both cross or neither cross.D1 A *A0 B B5B6.D2 Data1 *B0 LDW.D1T2 *A0,B5 LDW.D1T2 *A0,B5 || STW.D2T2 B6,*B0 LDW.D1T2 *A0,B5 LDW.D1T2 *A0,B5 || STW.D2T2 B6,*B0 DA2 = T2

72 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 72 Conditions Dont Use Cross Paths If a conditional register comes from the opposite side, it does NOT use a data or address cross-path. If a conditional register comes from the opposite side, it does NOT use a data or address cross-path. Examples: [B2] ADD.L1 A2,A0,A4 [A1] LDW.D2 *B0,B5 Examples: [B2] ADD.L1 A2,A0,A4 [A1] LDW.D2 *B0,B5

73 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 73 CPU Ref Guide Full CPU Datapath (Pg 2-2) C62x Data-Path Summary

74 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 74 C67x Data-Path Summary C67x

75 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 75 Cross Paths - Summary Data Data Destination register on same side as unit. Destination register on same side as unit. Source registers - up to one cross path per execute packet per side. Source registers - up to one cross path per execute packet per side. Use x to indicate cross-path. Use x to indicate cross-path. Address Address Pointer must be on same side as unit. Pointer must be on same side as unit. Data can be transferred to/from either side. Data can be transferred to/from either side. Parallel accesses: both cross or neither cross. Parallel accesses: both cross or neither cross. Conditionals Dont Use Cross Paths. Conditionals Dont Use Cross Paths.

76 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 76 Code Review (using side A only) MVK.S140, A2; A2 = 40, loop count loop:LDH.D1*A5++, A0; A0 = a(n) LDH.D1*A6++, A1; A1 = x(n) MPY.M1A0, A1, A3; A3 = a(n) * x(n) ADD.L1A3, A4, A4; Y = Y + A3 SUB.L1A2, 1, A2; decrement loop count [A2]B.S1loop; if A2 0, branch [A2]B.S1loop; if A2 0, branch STH.D1A4, *A7; *A7 = Y Y = 40 a n x n a n x n n = 1 * Note: Assume that A4 was previously cleared and the pointers are initialised.

77 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 77 Let us have a look at the final details concerning the functional units. Consider first the case of the.L and.S units.

78 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 78 So where do the 40-bit registers come from? Operands - 32/40-bit Register, 5-bit Constant Operands can be: Operands can be: 5-bit constants (or 16-bit for MVKL and MVKH). 5-bit constants (or 16-bit for MVKL and MVKH). 32-bit registers. 32-bit registers. 40-bit Registers. 40-bit Registers. However, we have seen that registers are only 32-bit. However, we have seen that registers are only 32-bit.

79 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 79 A 40-bit register can be obtained by concatenating two registers. A 40-bit register can be obtained by concatenating two registers. However, there are 3 conditions that need to be respected: However, there are 3 conditions that need to be respected: The registers must be from the same side. The registers must be from the same side. The first register must be even and the second odd. The first register must be even and the second odd. The registers must be consecutive. The registers must be consecutive. Operands - 32/40-bit Register, 5-bit Constant

80 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 80 A1:A0A3:A2A5:A4A7:A6A9:A8A11:A10A13:A12A15:A14 odd even : 32 8 40-bit Reg 40-bit RegB1:B0B3:B2B5:B4B7:B6B9:B8B11:B10B13:B12B15:B14 odd even : 32 8 Operands - 32/40-bit Register, 5-bit Constant All combinations of 40-bit registers are shown below: All combinations of 40-bit registers are shown below:

81 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 81 32-bit Reg 40-bit Reg 32-bit Reg 5-bit Const 32-bit Reg 40-bit Reg.L or.S Operands - 32/40-bit Register, 5-bit Constant instr.unit,, instr.unit,,

82 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 82 Operands - 32/40-bit Register, 5-bit Constant instr.unit,, instr.unit,, 32-bit Reg 40-bit Reg 32-bit Reg 5-bit Const 32-bit Reg 40-bit Reg.L or.S

83 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 83 Operands - 32/40-bit Register, 5-bit Constant OR.L1 A0, A1, A2 instr.unit,, instr.unit,, 32-bit Reg 40-bit Reg 32-bit Reg 5-bit Const 32-bit Reg 40-bit Reg.L or.S

84 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 84 Operands - 32/40-bit Register, 5-bit Constant OR.L1 A0, A1, A2 ADD.L2 -5, B3, B4 instr.unit,, instr.unit,, 32-bit Reg 40-bit Reg 32-bit Reg 5-bit Const 32-bit Reg 40-bit Reg.L or.S

85 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 85 Operands - 32/40-bit Register, 5-bit Constant OR.L1 A0, A1, A2 ADD.L2 -5, B3, B4 ADD.L1 A2, A3, A5:A4 instr.unit,, instr.unit,, 32-bit Reg 40-bit Reg 32-bit Reg 5-bit Const 32-bit Reg 40-bit Reg.L or.S

86 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 86 Operands - 32/40-bit Register, 5-bit Constant OR.L1 A0, A1, A2 ADD.L2 -5, B3, B4 ADD.L1 A2, A3, A5:A4 SUB.L1 A2, A5:A4, A5:A4 instr.unit,, instr.unit,, 32-bit Reg 40-bit Reg 32-bit Reg 5-bit Const 32-bit Reg 40-bit Reg.L or.S

87 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 87 Operands - 32/40-bit Register, 5-bit Constant OR.L1 A0, A1, A2 ADD.L2 -5, B3, B4 ADD.L1 A2, A3, A5:A4 SUB.L1 A2, A5:A4, A5:A4 ADD.L2 3, B9:B8, B9:B8 instr.unit,, instr.unit,, 32-bit Reg 40-bit Reg 32-bit Reg 5-bit Const 32-bit Reg 40-bit Reg.L or.S

88 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 88 To move the content of a register (A or B) to another register (B or A) use the moveMV Instruction, e.g.: To move the content of a register (A or B) to another register (B or A) use the moveMV Instruction, e.g.: MV A0, B0 MV B6, B7 To move the content of a control register to another register (A or B) or vice-versa use the MVC instruction, e.g.: To move the content of a control register to another register (A or B) or vice-versa use the MVC instruction, e.g.: MVC IFR, A0 MVC A0, IRP Register to register data transfer

89 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 89 TMS320C6000 Instruction Set

90 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 90 'C62x Instruction Set (by category) Arithmetic ABS ADD ADDA ADDK ADD2 MPY MPYH NEG SMPY SMPYH SADD SAT SSUB SUB SUBA SUBC SUB2 ZERO Program Ctrl B IDLE NOP Logical AND CMPEQ CMPGT CMPLT NOT OR SHL SHR SSHL XOR Data Mgmt LDB/H/W MV MVC MVK MVKL MVKH MVKLH STB/H/W Bit Mgmt CLR EXT LMBD NORM SET Note: Refer to the 'C6000 CPU Reference Guide for more details.

91 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 91 'C62x Instruction Set (by unit).S Unit MVKLH NEG NOT OR SET SHL SHR SSHL SUB SUB2 XOR ZERO ADD ADDK ADD2 AND B CLR EXT MV MVC MVK MVKL MVKH.M Unit SMPY SMPYH MPY MPYH.L Unit NOT OR SADD SAT SSUB SUB SUBC XOR ZERO ABS ADD AND CMPEQ CMPGT CMPLT LMBD MV NEG NORM.D Unit STB/H/W SUB SUBA ZERO ADD ADDA LDB/H/W MV NEG Other IDLENOP Note: Refer to the 'C6000 CPU Reference Guide for more details.

92 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 92 ' C6700: Superset of Fixed-Point ( by unit ).L.D.S.M No Unit Used IDLENOP.S Unit NEG NOT OR SET SHL SHR SSHL SUB SUB2 XOR ZERO ADD ADDK ADD2 AND B CLR EXT MV MVC MVK MVKL MVKH ABSSP ABSDP CMPGTSP CMPEQSP CMPLTSP CMPGTDP CMPEQDP CMPLTDP RCPSP RCPDP RSQRSP RSQRDP SPDP.L Unit NOT OR SADD SAT SSUB SUB SUBC XOR ZERO ABS ADD AND CMPEQ CMPGT CMPLT LMBD MV NEG NORM ADDSP ADDDP SUBSP SUBDP INTSP INTDP SPINT DPINT SPRTUNC DPTRUNC DPSP.M Unit SMPY SMPYH MPY MPYH MPYLH MPYHL MPYSP MPYDP MPYI MPYID.D Unit NEG STB(B/H/W) SUB SUBAB (B/H/W) ZERO ADD ADDAB(B/H/W) ADDAD LDB(B/H/W) LDDW MV Note: Refer to the 'C6000 CPU Reference Guide for more details. C67x

93 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 93 C62x: Dual 32-Bit Load/Store C64x: Dual 64-Bit Load/Store C67x: Dual 64-Bit Load/32-Bit Store Instruction Decode Instruction Dispatch Instruction Fetch Control Registers Interrupt Control Emulation Registers (B0 - B15) D2 + M2 X S2 + + L2 + Registers (A0 - A15) D1 + M1 X S1 + + L1 + Advanced Instruction Packing Advanced Emulation Registers (B16 - B31)Registers (A16 - A31) + x x x x X + + + + + + x x x x X + + + + + Superset of Fixed-Point

94 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 94 'C64x: Superset of C62x Data Pack/Un PACK2 PACKH2 PACKLH2 PACKHL2 PACKH4 PACKL4 UNPKHU4 UNPKLU4 SWAP2/4 Dual/Quad Arith ABS2 ADD2 ADD4 MAX MIN SUB2 SUB4 SUBABS4 Bitwise Logical ANDN Shift & Merge SHLMB SHRMB Load Constant MVK (5-bit).L.D.S.M Bit Operations BITC4 BITR DEAL SHFL Move MVD Average AVG2 AVG4 Shifts ROTL SSHVL SSHVR Multiplies MPYHI MPYLI MPYHIR MPYLIR MPY2 SMPY2 DOTP2 DOTPN2 DOTPRSU2 DOTPNRSU2 DOTPU4 DOTPSU4 GMPY4 XPND2/4 Mem Access LDDW LDNW LDNDW STDW STNW STNDW Load Constant MVK (5-bit) Dual Arithmetic ADD2 SUB2 Bitwise Logical AND ANDN OR XOR Address Calc. ADDAD Data Pack/Un PACK2 PACKH2 PACKLH2 PACKHL2 UNPKHU4 UNPKLU4 SWAP2 SPACK2 SPACKU4 Dual/Quad Arith SADD2 SADDUS2 SADD4 Bitwise Logical ANDN Shifts & Merge SHR2 SHRU2 SHLMB SHRMB Compares CMPEQ2 CMPEQ4 CMPGT2 CMPGT4 Branches/PC BDEC BPOS BNOP ADDKPC

95 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 95 TMS320C6000 Memory

96 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 96 Memory size per device DevicesInternalEMIFAEMIFB C6201,C6701 C6204,C6205 P = 64 kB D = 64 kB 52M Bytes 52M Bytes (32-bits wide) (32-bits wide) N/A N/A C6202 P = 256 kB D = 128 kB C6203 P = 384 kB D = 512 kB C6211C6711 L1P= 4 kB L1D= 4 kB L2= 64 kB 128M Bytes (32-bits wide) (32-bits wide) N/A N/A C6712 64M Bytes (16-bits wide) (16-bits wide) C6713 L1P= 4 kB L1D= 4 kB L2= 256 kB 128M Bytes (32-bits wide) (32-bits wide)N/A C6411DM642 L1P= 16 kB L1D= 16 kB L2= 256 kB 128M Bytes (32-bits wide) (32-bits wide)N/A C6414C6415C6416 L1P=16 kB L1D=16 kB L2=1 MB 256M Bytes (64-bits wide) (64-bits wide) 64M Bytes 64M Bytes (16-bits wide) (16-bits wide)

97 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 97 Internal Memory Summary DevicesInternal(L2)External C6211C6711C6713 64 kB 512M (32-bit wide) C6712 256 kB 512M (16-bit wide) LINK: TMS320C6000 DSP Generation DevicesInternal(L2)ExternalC6414C6415C6416 1 MB A:1GB (64-bit) B:256kB (16-bit) DM642 256 kB 1GB (64-bit) C6411 256 kB 256MB (32-bit)

98 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 98 TMS320C6000 Peripherals

99 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 99 'C6x System Block Diagram PERIPHERALSPERIPHERALSPERIPHERALSPERIPHERALS Memory Internal Buses ExternalMemory CPU.D1.M1.L1.S1.D2.M2.L2.S2 Regs (B0-B15) Regs (A0-A15) Control Regs

100 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 100 C6x Internal Buses A D Internal Memory x32 A D External Interface A D x32 Peripherals can perform 64-bit data loads. C67x Data Addr- T1 x32 Data Data- T1 x32/64 Data Addr- T2x32 Data Data- T2 x32/64 Aregs Bregs Program Addrx32 Program Datax256 PC DMA Addr- Readx32 DMA Data- Readx32 DMA Addr- Writex32 DMA Data- Writex32 DMA

101 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 101 PERIPHERALSPERIPHERALSPERIPHERALSPERIPHERALS Memory 'C6x System Block Diagram Internal Buses CPU.D1.M1.L1.S1.D2.M2.L2.S2 Regs (B0-B15) Regs (A0-A15) Control Regs EMIF ExtlMemory

102 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 102 PERIPHERALSPERIPHERALSPERIPHERALSPERIPHERALS 'C6x System Block Diagram Internal Buses CPU.D1.M1.L1.S1.D2.M2.L2.S2 Regs (B0-B15) Regs (A0-A15) Control Regs EMIF ExtlMemory - Sync - Async Program RAM Data Ram Addr D (32)

103 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 103 'C6000 Peripherals External Memory.D1.M1.L1.S1.D2.M2.L2.S2 Register Set B Register Set A CPU Internal Buses Internal Memory Serial GPIO VCP / TCP DMA, EDMA (Boot) Timers PLL Parallel Comm EMIF Ethernet Video Ports

104 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 104EMIF.D1.M1.L1.S1.D2.M2.L2.S2 Register Set B Register Set A CPU Internal Buses Internal Memory SDRAM External Memory Interface ( EMIF ) Glueless access to async/sync memory Works with PC100/133 SDRAM (cheap, fast, and easy!) Byte-wide data access 16, 32, or 64-bit bus widths External Memory Interface ( EMIF ) Glueless access to async/sync memory Works with PC100/133 SDRAM (cheap, fast, and easy!) Byte-wide data access 16, 32, or 64-bit bus widths Async SBSRAM EMIF

105 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 105 HPI / XBUS / PCI External Memory.D1.M1.L1.S1.D2.M2.L2.S2 Register Set B Register Set A CPU Internal Buses Internal Memory Parallel Communication Interfaces HPI:Dedicated, slave-only, async 16/32-bit bus allows host- P access to C6000 memory XBUS:Similar to HPI but provides … Master/slave and sync modes Glueless i/f to FIFOs (up to single-cycle xfer rate) PCI:Standard 32-bit, 33MHz/66MHz PCI interface These interfaces provide means to bootstrap the C6000 Parallel Communication Interfaces HPI:Dedicated, slave-only, async 16/32-bit bus allows host- P access to C6000 memory XBUS:Similar to HPI but provides … Master/slave and sync modes Glueless i/f to FIFOs (up to single-cycle xfer rate) PCI:Standard 32-bit, 33MHz/66MHz PCI interface These interfaces provide means to bootstrap the C6000 EMIF Parallel Comm

106 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 106GPIO External Memory.D1.M1.L1.S1.D2.M2.L2.S2 Register Set B Register Set A CPU Internal Buses Internal Memory GPIO EMIF General Purpose Input/Output (GPIO) C64x and C6713 provide 8-16 bits of general purpose bit I/O Use to observe or control the signal of a single-pin General Purpose Input/Output (GPIO) C64x and C6713 provide 8-16 bits of general purpose bit I/O Use to observe or control the signal of a single-pin Parallel Comm

107 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 107 McBSP and Utopia External Memory.D1.M1.L1.S1.D2.M2.L2.S2 Register Set B Register Set A CPU Internal Buses Internal Memory GPIO EMIF Multi-Channel Audio Serial Port ( McASP ) McBSP features plus more … Up to 8 stereo lines (16 channels) IIC support On DM642, C6713 Utopia ( C64x ) ATM connection 50 MHz wide area network connectivity Multi-Channel Audio Serial Port ( McASP ) McBSP features plus more … Up to 8 stereo lines (16 channels) IIC support On DM642, C6713 Utopia ( C64x ) ATM connection 50 MHz wide area network connectivity Parallel Comm Multi-Channel Buffered Serial Port ( McBSP ) 2 (or 3) full-duplex, synchronous serial-ports Up to 100 Mb/sec performance Supports multi-channel operation (T1, E1, MVIP, …) Multi-Channel Buffered Serial Port ( McBSP ) 2 (or 3) full-duplex, synchronous serial-ports Up to 100 Mb/sec performance Supports multi-channel operation (T1, E1, MVIP, …) Serial

108 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 108 DMA / EDMA External Memory.D1.M1.L1.S1.D2.M2.L2.S2 Register Set B Register Set A CPU Internal Buses Internal Memory GPIO EMIF Direct Memory Access (DMA / EDMA) Transfers any set of memory locations to another 4 / 16 / 64 channels (transfer parameter sets) Transfers can be triggered by any interrupt (sync) Operates independent of CPU On reset, provides bootstrap from memory Direct Memory Access (DMA / EDMA) Transfers any set of memory locations to another 4 / 16 / 64 channels (transfer parameter sets) Transfers can be triggered by any interrupt (sync) Operates independent of CPU On reset, provides bootstrap from memory Parallel Comm Serial DMA, EDMA (Boot)

109 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 109Timer/Counter External Memory.D1.M1.L1.S1.D2.M2.L2.S2 Register Set B Register Set A CPU Internal Buses Internal Memory GPIO EMIF Timer / Counter Two (or three) 32-bit timer/counters Can generate interrupts Both input and output pins Timer / Counter Two (or three) 32-bit timer/counters Can generate interrupts Both input and output pins Parallel Comm Serial DMA, EDMA (Boot) Timers

110 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 110 Ethernet MAC External Memory.D1.M1.L1.S1.D2.M2.L2.S2 Register Set B Register Set A CPU Internal Buses Internal Memory Serial GPIO VCP / TCP DMA, EDMA (Boot) Timers PLL Parallel Comm EMIF Ethernet Video Ports Ethernet (DM642 only) 10/100 Ethernet MAC Pins are muxed with PCI TCP/IP stack available from TI Ethernet (DM642 only) 10/100 Ethernet MAC Pins are muxed with PCI TCP/IP stack available from TI

111 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 111 Video Ports External Memory.D1.M1.L1.S1.D2.M2.L2.S2 Register Set B Register Set A CPU Internal Buses Internal Memory Serial GPIO VCP / TCP DMA, EDMA (Boot) Timers PLL Parallel Comm EMIF Ethernet Video Ports Video Ports (DM642 only) Each configurable for Capture or Display Dual 8/10-bit BT656 or raw modes 16/20-bit raw modes and 20-bit Y/C for high definition Horz Scaling and Chroma Resampling Support for 8-bit modes Supports transport interface mode Video Ports (DM642 only) Each configurable for Capture or Display Dual 8/10-bit BT656 or raw modes 16/20-bit raw modes and 20-bit Y/C for high definition Horz Scaling and Chroma Resampling Support for 8-bit modes Supports transport interface mode

112 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 112 VCP / TCP -- 3G Wireless External Memory.D1.M1.L1.S1.D2.M2.L2.S2 Register Set B Register Set A CPU Internal Buses Internal Memory McBSPs Utopia GPIO DMA, EDMA (Boot) Timers EMIF Parallel Comm VCP / TCP PLL Video Ports Turbo Coprocessor (TCP) (C6416 only) Supports 35 data channels at 384 kbps 3GPP / IS2000 Turbo coder Programmable parameters include mode, rate and frame length Viterbi Coprocessor (VCP) (C6416 only) Supports > 500 voice channels at 8 kbps Programmable decoder parameters include constraint length, code rate, and frame length Turbo Coprocessor (TCP) (C6416 only) Supports 35 data channels at 384 kbps 3GPP / IS2000 Turbo coder Programmable parameters include mode, rate and frame length Viterbi Coprocessor (VCP) (C6416 only) Supports > 500 voice channels at 8 kbps Programmable decoder parameters include constraint length, code rate, and frame length

113 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 113 External Memory.D1.M1.L1.S1.D2.M2.L2.S2 Register Set B Register Set A CPU Internal Buses Internal Memory GPIO DMA, EDMA (Boot) Timers EMIF Phase Locked Loop (PLL) Input CLKIN Output CLKOUT1 CLKOUT2 (reduced rate clkout) Input CLKIN Output CLKOUT1 CLKOUT2 (reduced rate clkout) Parallel Comm Serial VCP / TCP Timers PLL Ethernet Video Ports DMA, EDMA (Boot) PLL Clock multiplier Reduces EMI and cost Rate is Pin selectable PLL Clock multiplier Reduces EMI and cost Rate is Pin selectable

114 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 114 Clock Cycle What is a clock cycle? C6000 CLKIN CLKOUT2 (½, ¼, or 1/6 CLKOUT1) CLKOUT1 (C6000 clock cycle) PLL When we talk about cycles... The time between successive instructions CLKIN (MHz) PLL Rate CPU Clock Frequency CPU Clock Cycle Time MIPs (max) 60x12 720 MHz 1.39 ns 5760 30x10 300 MHz 3.33 ns 2400 50x4 200 MHz 5 ns 1600 25x4 100 MHz 10 ns 800

115 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 115 'C6000 Peripherals Summary.D1.M1.L1.S1.D2.M2.L2.S2 Register Set B Register Set A CPU Internal Buses Internal Memory External Memory GPIO EMIF Parallel Comm Serial Timers VCP / TCP Timers PLL Ethernet Video Ports DMA, EDMA (Boot)

116 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 116 C6x Family Part Numbering Example = TMS320LC6201PKGA200 Example = TMS320LC6201PKGA200 TMS320= TI DSP TMS320= TI DSP L = Place holder for voltage levels L = Place holder for voltage levels C6 = C6x family C6 = C6x family 2 = Fixed-point core 2 = Fixed-point core 01 = Memory/peripheral configuration 01 = Memory/peripheral configuration PKG= Pkg designator (actual letters TBD) PKG= Pkg designator (actual letters TBD) A = -40 to 85C (blank for 0 to 70C) A = -40 to 85C (blank for 0 to 70C) 200 = Core CPU speed in Mhz 200 = Core CPU speed in Mhz

117 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 117 Module 1 Exam 1. Functional Units a. How many can perform an ADD? Name them. a. How many can perform an ADD? Name them. b. Which support memory loads/stores? b. Which support memory loads/stores?.M.S.D.L 2. Memory Map a. How many external ranges exist on C6201?

118 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 118 3. Conditional Code a.Which registers can be used as condl registers? b.Which instructions can be conditional? 4. Performance a.What is the 'C6711 instruction cycle time? b.How can the 'C6711 execute 1200 MIPs?

119 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 119 5. Coding Problems a. Move contents of A0-->A1 a. Move contents of A0-->A1

120 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 120 5. Coding Problems a. Move contents of A0-->A1 a. Move contents of A0-->A1 b. Move contents of CSR-->A1 b. Move contents of CSR-->A1 c. Clear register A5

121 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 121 5. Coding Problems (contd) d. A2 = A0 2 + A1 d. A2 = A0 2 + A1 e. If (B1 0) then B2 = B5 * B6 e. If (B1 0) then B2 = B5 * B6 f. A2 = A0 * A1 + 10 g. Load an unsigned constant (19ABCh) into register A6.

122 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 122 5. Coding Problems (contd) h.Load A7 with contents of mem1 and post- increment the selected pointer. h.Load A7 with contents of mem1 and post- increment the selected pointer.

123 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 123 Module 1 Exam (solution) 1. Functional Units a. How many can perform an ADD? Name them. a. How many can perform an ADD? Name them. b. Which support memory loads/stores? b. Which support memory loads/stores?.M.S.D.L six;.L1,.L2,.D1,.D2,.S1,.S2 2. Memory Map a. How many external ranges exist on C6201? Four

124 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 124 3. Conditional Code a.Which registers can be used as condl registers? b.Which instructions can be conditional? 4. Performance a.What is the 'C6711 instruction cycle time? b.How can the 'C6711 execute 1200 MIPs? A1, A2, B0, B1, B2 All of them CLKOUT1 1200 MIPs = 8 instructions (units) x 150 MHz

125 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 125 5. Coding Problems a. Move contents of A0-->A1 a. Move contents of A0-->A1 MV.L1A0, A1 orADD.S1A0, 0, A1 orMPY.M1A0, 1, A1 (whats the problem with this?)

126 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 126 5. Coding Problems a. Move contents of A0-->A1 a. Move contents of A0-->A1 b. Move contents of CSR-->A1 b. Move contents of CSR-->A1 c. Clear register A5 MV.L1A0, A1 orADD.S1A0, 0, A1 orMPY.M1A0, 1, A1 ZERO.S1A5 orSUB.L1A5, A5, A5 orMPY.M1A5, 0, A5 orCLR.S1A5, 0, 31, A5 orMVK.S10, A5 orXOR.L1A5,A5,A5 (A0 can only be a (A0 can only be a 16-bit value) MVCCSR, A1

127 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 127 5. Coding Problems (contd) d. A2 = A0 2 + A1 d. A2 = A0 2 + A1 e. If (B1 0) then B2 = B5 * B6 e. If (B1 0) then B2 = B5 * B6 f. A2 = A0 * A1 + 10 g. Load an unsigned constant (19ABCh) into register A6. MPY.M1A0, A0, A2 ADD.L1A2, A1, A2 [B1] MPY.M2B5, B6, B2 mvkl.s10x00019abc,a6 mvkh.s10x00019abc,a6 value.equ0x00019abc mvkl.s1value,a6 mvkh.s1value,a6 MPYA0, A1, A2 ADD10, A2, A2

128 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 128 5. Coding Problems (contd) h.Load A7 with contents of mem1 and post- increment the selected pointer. h.Load A7 with contents of mem1 and post- increment the selected pointer. x16 mem mem1 10h A7 load_mem1:MVKL.S1mem1,A6 load_mem1:MVKL.S1mem1,A6 MVKH.S1mem1,A6 LDH.D1*A6++,A7

129 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 129Architecture Links: Links: C6711 data sheet: tms320c6711.pdf C6711 data sheet: tms320c6711.pdftms320c6711.pdf C6713 data sheet: tms320c6713.pdf C6713 data sheet: tms320c6713.pdftms320c6713.pdf tms320c6416.pdf C6416 data sheet: tms320c6416.pdftms320c6416.pdf User guide: spru189f.pdf User guide: spru189f.pdfspru189f.pdf Errata: sprz173c.pdf Errata: sprz173c.pdfsprz173c.pdf

130 Chapter 2 TMS320C6000 Architectural Overview - End -


Download ppt "Chapter 2 TMS320C6000 Architectural Overview. Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004 Chapter 2, Slide 2 Describe C6000 CPU architecture."

Similar presentations


Ads by Google