Lecture 1 Introduction.

1 Lecture 1 Introduction

2 Learning Objectives Why process signals digitally?
Definition of a real-time application. Why use Digital Signal Processing processors? What are the typical DSP algorithms? Parameters to consider when choosing a DSP processor. Programmable vs ASIC DSP. Texas Instruments’ TMS320 family.

3 Why go digital? Digital signal processing techniques are now so powerful that sometimes it is extremely difficult, if not impossible, for analogue signal processing to achieve similar performance. Examples: FIR filter with linear phase. Adaptive filters.

4 Why go digital? Analogue signal processing is achieved by using analogue components such as resistors, capacitors and inductors. The inherent tolerances of these components, together with temperature changes, voltage changes and mechanical vibration, can dramatically affect the effectiveness of the analogue circuitry.

5 Why go digital?
With DSP it is easy to: Change applications. Correct applications. Update applications.
Additionally DSP reduces: Noise susceptibility. Chip count. Development time. Cost. Power consumption.

6 Why NOT go digital? Very high-frequency signals may not be processable digitally, for two reasons: analogue-to-digital converters (ADCs) may not work fast enough, and the application may be too complex to be performed in real time.

7 Real-time processing DSP processors have to perform tasks in real time, so how do we define real time? The definition of real time depends on the application. Example: a 100-tap FIR filter is performed in real time if the DSP can perform and complete the following operation between two samples:
$y(n) = \sum_{k=0}^{99} a_k\, x(n-k)$

8 Real-time processing
[Figure: timeline between samples n and n+1; the sample time is divided into processing time followed by waiting time.]
We can say that we have a real-time application if: Waiting Time ≥ 0
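The condition becomes concrete when sample time, processing time and waiting time are all measured in CPU cycles. The following is a minimal C sketch under stated assumptions. The clock rate, the sample rate and the estimate that a 100-tap FIR costs about 100 cycles (one tap per cycle) are hypothetical figures chosen for illustration, and the function names are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Sample time expressed in cycles: how many clock cycles fit between two samples. */
double sample_time_cycles(double clock_hz, double sample_rate_hz) {
    assert(clock_hz > 0.0 && sample_rate_hz > 0.0);
    return clock_hz / sample_rate_hz;
}

/* Waiting time = sample time - processing time, all measured in cycles. */
double waiting_time_cycles(double clock_hz, double sample_rate_hz,
                           double processing_cycles) {
    return sample_time_cycles(clock_hz, sample_rate_hz) - processing_cycles;
}

/* Real-time condition from the slide: waiting time >= 0. */
bool is_real_time(double clock_hz, double sample_rate_hz,
                  double processing_cycles) {
    return waiting_time_cycles(clock_hz, sample_rate_hz, processing_cycles) >= 0.0;
}
```

Take a hypothetical 150 MHz clock with 48 kHz sampling. The sample time is then 3125 cycles, so a 100-cycle filter leaves 3025 cycles of waiting time and meets the real-time condition.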

9 Why do we need DSP processors?
Why not use a General Purpose Processor (GPP) such as a Pentium instead of a DSP processor? What is the power consumption of a Pentium and a DSP processor? What is the cost of a Pentium and a DSP processor?

10 Why do we need DSP processors?
Use a DSP processor when the following are required: cost saving, smaller size, low power consumption, and processing of many “high” frequency signals in real time. Use a GPP when the following are required: large memory, advanced operating systems.

11 What are the typical DSP algorithms?
The Sum of Products (SOP) is the key element in most DSP algorithms:
$Y = \sum_{n=1}^{N} a_n x_n = a_1 x_1 + a_2 x_2 + \dots + a_N x_N$
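A plain C version of the SOP gives a reference point for the assembly developed later. It shows that each term costs one multiplication and one addition. This is a sketch only: the function name is hypothetical, double-precision arithmetic is an arbitrary choice, and C arrays start at index 0, so a[0] corresponds to a_1.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of products: Y = a[0]*x[0] + a[1]*x[1] + ... + a[n-1]*x[n-1].
   Each term needs one multiplication and one addition. */
double sum_of_products(const double *a, const double *x, size_t n) {
    assert(n == 0 || (a != NULL && x != NULL));
    double y = 0.0;
    for (size_t i = 0; i < n; i++) {
        y += a[i] * x[i]; /* multiply, then accumulate */
    }
    return y;
}
```

The FIR output of slide 7 is one such SOP, with x holding the most recent input samples.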

12 Hardware vs. Microcode multiplication
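DSP processors are optimised to perform multiplication and addition operations. Multiplication and addition are done in hardware and in one cycle.
Example: 4-bit unsigned multiply, 1011 x 1110 (11 x 14 = 154 = 10011010 in binary).
Hardware: the product 10011010 is produced in one cycle.
Microcode (shift and add):
Cycle 1: 0000 (multiplier bit 0 is 0)
Cycle 2: 1011. (multiplier bit 1 is 1, shifted by one place)
Cycle 3: 1011.. (multiplier bit 2 is 1, shifted by two places)
Cycle 4: 1011... (multiplier bit 3 is 1, shifted by three places)
Cycle 5: sum of the partial products = 10011010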
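A shift-and-add loop in C mimics the microcode sequence. It forms one partial product per multiplier bit and then adds them, which gives five steps when the final sum counts as one step, as on the slide. This is an illustrative sketch with a hypothetical function name. A hardware multiplier produces the same product with a single `a * b` in one cycle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift-and-add ("microcode") multiply of two 4-bit unsigned values.
   Steps 1-4 form one partial product per multiplier bit; step 5 adds them.
   If steps is not NULL, it receives the number of steps used. */
uint8_t shift_add_mul4(uint8_t multiplicand, uint8_t multiplier, int *steps) {
    assert(multiplicand < 16 && multiplier < 16);
    uint8_t partial[4];
    int count = 0;
    for (int bit = 0; bit < 4; bit++) {
        /* Partial product: multiplicand shifted by 'bit' if that multiplier bit is 1. */
        partial[bit] = ((multiplier >> bit) & 1u) ? (uint8_t)(multiplicand << bit) : 0u;
        count++;
    }
    uint8_t product = 0;
    for (int bit = 0; bit < 4; bit++) {
        product = (uint8_t)(product + partial[bit]); /* max 15 * 15 = 225 fits in 8 bits */
    }
    count++; /* final accumulation step */
    if (steps != NULL) {
        *steps = count;
    }
    return product;
}
```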

13 Parameters to consider when choosing a DSP processor
Parameter: TMS320C6211 / TMS320C6711
Arithmetic format: 32-bit / 32-bit
Extended floating point: N/A / 64-bit
Extended arithmetic: 40-bit / 40-bit
Performance (peak): 1200 MIPS / 1200 MFLOPS
Number of hardware multipliers: 2 (16 x 16-bit) with 32-bit result / 2 (32 x 32-bit) with 32- or 64-bit result
Number of registers: 32 / 32
Internal L1 program memory cache: 32 Kbit (4 kB) / 32 Kbit (4 kB)
Internal L1 data memory cache: 32 Kbit (4 kB) / 32 Kbit (4 kB)
Internal L2 cache: 512 Kbit (64 kB) / 512 Kbit (64 kB)
C6711 Datasheet: \Links\TMS320C6711.pdf
C6211 Datasheet: \Links\TMS320C6211.pdf

14 Parameters to consider when choosing a DSP processor
TMS320C6211 / TMS320C6711:
I/O bandwidth, serial ports (number/speed): 2 x 75 Mbps
DMA channels: 16
Multiprocessor support: Not inherent
Supply voltage: 3.3 V I/O, 1.8 V core
Power management: Yes
On-chip timers (number/width): 2 x 32-bit
Cost: US$ 21.54
Package: 256-pin BGA
Also listed: external memory interface controller, JTAG.

15 Floating vs. Fixed point processors
Applications that require high precision, wide dynamic range, a high signal-to-noise ratio or ease of use need a floating-point processor. Drawbacks of floating-point processors: higher power consumption; they can be more expensive; they can be slower than their fixed-point counterparts and larger in size.

16 Floating vs. Fixed point processors
It is the application that dictates which device and platform to use in order to achieve optimum performance at a low cost. For educational purposes, use the floating-point device (C6711) as it can support both fixed and floating point operations.

17 General Purpose DSP vs. DSP in ASIC
Application Specific Integrated Circuits (ASICs) are semiconductor devices designed for dedicated functions. The advantages and disadvantages of using ASICs are listed below:
Advantages: high throughput; lower silicon area; lower power consumption; improved reliability; reduction in system noise; low overall system cost.
Disadvantages: high investment cost; less flexibility; long time from design to market.

18 Texas Instruments’ TMS320 family
Different families and sub-families exist to support different markets:
C2000 (lowest cost): control systems such as motor control, storage and digital control systems.
C5000 (efficiency, best MIPS per watt/dollar/size): wireless phones, Internet audio players, digital still cameras, modems, telephony and VoIP.
C6000 (performance and best ease of use): multi-channel and multi-function applications such as communications infrastructure, wireless base stations, DSL, imaging, multimedia servers and video.

19 TMS320C64x: The C64x fixed-point DSPs offer the industry's highest level of performance to address the demands of the digital age. At clock rates of up to 1 GHz, C64x DSPs can process information at rates up to 8000 MIPS with costs as low as $… In addition to a high clock rate, C64x DSPs can do more work each cycle with built-in extensions. These extensions include new instructions to accelerate performance in key application areas such as digital communications infrastructure and video and image processing.
TMS320C62x: These first-generation fixed-point DSPs represent breakthrough technology that enables new equipment and energizes existing implementations for multi-channel, multi-function applications, such as wireless base stations, remote access servers (RAS), digital subscriber loop (xDSL) systems, personalized home security systems, advanced imaging/biometrics, industrial scanners, precision instrumentation and multi-channel telephony systems.
TMS320C67x: For designers of high-precision applications, C67x floating-point DSPs offer the speed, precision, power savings and dynamic range to meet a wide variety of design needs. These DSPs suit demanding applications such as audio, medical imaging, instrumentation and automotive.

20 C6000 Roadmap
[Figure: C6000 roadmap of performance against time. All devices share object-code software compatibility. Fixed point, 1st generation (C62x): C6201, C6202, C6203, C6204, C6205, C6211. Fixed point, 2nd generation (C64x DSP, up to 1.1 GHz): C6411, C6412, C6414, C6415, C6416, DM642. Floating point (C67x): C6701, C6711, C6712, C6713. Further labels: highest performance, floating point, multi-core.]
C62x/C64x/DM642: fixed point. C67x: floating point.

21 Useful Links
DSP Selection Guide: \Links\DSP Selection Guide.pdf (3Q 2004); \Links\DSP Selection Guide.pdf (4Q 2004)

22 Learning Objectives Describe C6000 CPU architecture.
Introduce some basic instructions. Describe the C6000 memory map. Provide an overview of the peripherals.

23 General DSP System Block Diagram
[Figure: the central processing unit connected through internal buses to internal memory, external memory and peripherals.]

24 Implementation of Sum of Products (SOP)
Two basic operations are required for this algorithm: (1) multiplication and (2) addition. Therefore two basic instructions are required.
$Y = \sum_{n=1}^{N} a_n x_n = a_1 x_1 + a_2 x_2 + \dots + a_N x_N$
It has been shown in Chapter 1 that SOP is the key element for most DSP algorithms. So let's write the code for this algorithm and at the same time discover the C6000 architecture.

25 Implementation of Sum of Products (SOP)
With the two basic operations identified, (1) multiplication and (2) addition, let's implement the SOP algorithm! The implementation in this module will be done in assembly.

26 Multiply (MPY)
The multiplication of a1 by x1 is done in assembly by the following instruction:
MPY a1, x1, Y
This instruction is performed by a multiplier unit called “.M”.

27 Multiply (.M unit)
$Y = \sum_{n=1}^{40} a_n x_n$
The .M unit performs multiplications in hardware:
MPY .M a1, x1, Y
Note: a 16-bit by 16-bit multiplier provides a 32-bit result; a 32-bit by 32-bit multiplier provides a 64-bit result.

28 Addition (.?)
MPY .M a1, x1, prod
ADD .? Y, prod, Y

29 Add (.L unit)
MPY .M a1, x1, prod
ADD .L Y, prod, Y
RISC processors such as the C6000 use registers to hold the operands, so let's change this code.

30 Register File - A
[Figure: register file A, 32 bits wide, connected to the .M and .L units; a1, x1, prod and Y are assigned to registers.]
MPY .M a1, x1, prod
ADD .L Y, prod, Y
Let us correct this by replacing a, x, prod and Y with the registers shown in the figure.

31 Specifying Register Names
MPY .M A0, A1, A3
ADD .L A4, A3, A4
The registers A0, A1, A3 and A4 contain the values to be used by the instructions.

32 Specifying Register Names
Register File A contains 16 registers (A0 to A15) on the 'C62x/'C67x and 32 registers (A0 to A31) on the 'C64x; each register is 32 bits wide.

33 Data loading
Q: How do we load the operands into the registers?

34 Load Unit “.D”
Q: How do we load the operands into the registers?
A: The operands are loaded into the registers from data memory using the .D unit.

35 Load Unit “.D”
It is worth noting at this stage that the only way to access memory is through the .D unit.

36 Load Instruction
Q: Which instruction(s) can be used for loading operands from the memory to the registers?

37 Load Instructions (LDB, LDH, LDW, LDDW)
Q: Which instruction(s) can be used for loading operands from the memory to the registers?
A: The load instructions.

38 Using the Load Instructions
Before using the load unit, you have to be aware that this processor is byte addressable, which means that each byte has a unique address. Also, addresses are 32 bits wide.
[Figure: data memory drawn 16 bits wide, with addresses up to 0xFFFFFFFF.]

39 Using the Load Instructions
The syntax for the load instruction is:
LD *Rn, Rm
where Rn is a register that contains the address of the operand to be loaded and Rm is the destination register.

40 Using the Load Instructions
The question now is: how many bytes are going to be loaded into the destination register?

41 Using the Load Instructions
The answer is that it depends on the instruction you choose:
LDB: loads one byte (8-bit)
LDH: loads a half word (16-bit)
LDW: loads a word (32-bit)
LDDW: loads a double word (64-bit)
Note: LD on its own does not exist.

42 Using the Load Instructions
The syntax for the load instruction is: LD *Rn, Rm
[Figure: byte-addressable data memory drawn 16 bits wide, addresses up to 0xFFFFFFFF, holding the example byte values 0x1 to 0x8 and 0xA to 0xD.]
Example: if we assume that A5 = 0x4, then:
(1) LDB *A5, A7 ; gives A7 = 0x
(2) LDH *A5, A7 ; gives A7 = 0x
(3) LDW *A5, A7 ; gives A7 = 0x
(4) LDDW *A5, A7:A6 ; gives A7:A6 = 0x
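A C model of the load instructions shows which bytes end up in the destination register. The sketch rests on three assumptions. The device runs in little-endian mode (the C6000 can be configured for either byte order). Addresses are aligned to the access size. Memory holds the hypothetical values byte[n] = n + 1. The helper names only echo the mnemonics and are not a TI API. LDB and LDH sign-extend their result to 32 bits, and the model reproduces this. The model also requires a double-word-aligned address for LDDW, so LDDW is exercised at address 0.

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian model: the byte at the lowest address is the least significant. */

/* LDB: load one byte and sign-extend it to 32 bits. */
int32_t ldb(const uint8_t *mem, uint32_t addr) {
    return (int32_t)(int8_t)mem[addr];
}

/* LDH: load a 2-byte-aligned half word and sign-extend it to 32 bits. */
int32_t ldh(const uint8_t *mem, uint32_t addr) {
    assert(addr % 2u == 0u);
    uint16_t h = (uint16_t)(mem[addr] | (mem[addr + 1] << 8));
    return (int32_t)(int16_t)h;
}

/* LDW: load a 4-byte-aligned word. */
uint32_t ldw(const uint8_t *mem, uint32_t addr) {
    assert(addr % 4u == 0u);
    return (uint32_t)mem[addr]
         | ((uint32_t)mem[addr + 1] << 8)
         | ((uint32_t)mem[addr + 2] << 16)
         | ((uint32_t)mem[addr + 3] << 24);
}

/* LDDW: load an 8-byte-aligned double word into a register pair
   (odd register = word at addr + 4, even register = word at addr). */
void lddw(const uint8_t *mem, uint32_t addr, uint32_t *odd, uint32_t *even) {
    assert(addr % 8u == 0u);
    *even = ldw(mem, addr);
    *odd = ldw(mem, addr + 4u);
}
```

With these hypothetical contents and A5 = 0x4, the model gives 0x05 for LDB, 0x0605 for LDH and 0x08070605 for LDW.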

43 Using the Load Instructions
Question: If data can only be accessed by the load instruction and the .D unit, how can we load the register pointer Rn in the first place?

44 Loading the Pointer Rn
The instruction MVKL moves a 16-bit constant into a register, as shown below:
MVKL .? a, A5 (‘a’ is a constant or label)
Q: How many bits represent a full address? A: 32 bits.
Q: So why does the instruction not allow a 32-bit move? A: All instructions are 32 bits wide (see the instruction opcode), so a 32-bit constant cannot fit in the instruction word together with the opcode and the destination register.

45 Loading the Pointer Rn
To solve this problem another instruction is available, MVKH. It copies the upper 16 bits of the constant (ah) into the upper half of the register and leaves the lower half unchanged:
MVKH .? a, A5 (‘a’ is a constant or label)
Finally, to move a 32-bit address into a register we can use:
MVKL a, A5
MVKH a, A5

46 Loading the Pointer Rn
Always use MVKL then MVKH, as the following examples show:
Example 1:
A5 = 0x
MVKL 0x1234FABC, A5 ; A5 = 0xFFFFFABC (sign extension)
MVKH 0x1234FABC, A5 ; A5 = 0x1234FABC ; OK
Example 2:
MVKH 0x1234FABC, A5 ; A5 = 0x
MVKL 0x1234FABC, A5 ; A5 = 0xFFFFFABC ; Wrong
In Example 2, MVKL overwrites the whole register with the sign-extended low half, so the upper half written by MVKH is lost.
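A small C model of the two instructions lets us replay both examples. In it, MVKL sign-extends the low 16 bits of the constant into the whole register, while MVKH replaces only the upper half. The function names are hypothetical helpers that mirror the mnemonics and are not part of any TI library.

```c
#include <assert.h>
#include <stdint.h>

/* MVKL: the low 16 bits of the constant are sign-extended to 32 bits;
   the previous register contents are discarded. */
uint32_t mvkl(uint32_t constant) {
    return (uint32_t)(int32_t)(int16_t)(constant & 0xFFFFu);
}

/* MVKH: the upper 16 bits of the constant replace the upper half of the
   register; the lower half of the register is kept. */
uint32_t mvkh(uint32_t constant, uint32_t reg) {
    return (constant & 0xFFFF0000u) | (reg & 0x0000FFFFu);
}
```

Example 1 corresponds to `a5 = mvkh(c, mvkl(c))`. Example 2 corresponds to `a5 = mvkl(c)` after an earlier `mvkh`, so the upper half is lost.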

47 LDH, MVKL and MVKH
MVKL pt1, A5
MVKH pt1, A5
MVKL pt2, A6
MVKH pt2, A6
LDH .D *A5, A0
LDH .D *A6, A1
MPY .M A0, A1, A3
ADD .L A4, A3, A4
pt1 and pt2 point to some locations in the data memory.

48 Creating a loop
So far we have implemented the SOP for one tap only, i.e. Y = a1 * x1. So let's create a loop so that we can implement the SOP for N taps.

49 Creating a loop
The C6000 processors have no dedicated looping instructions such as block repeat; the loop is created using the branch instruction, B.

50 What are the steps for creating a loop
1. Create a label to branch to.
2. Add a branch instruction, B.
3. Create a loop counter.
4. Add an instruction to decrement the loop counter.
5. Make the branch conditional based on the value in the loop counter.
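The five steps map onto a familiar C loop. This sketch, with a hypothetical function name, only counts iterations. Because the counter is decremented before the conditional branch, a count of N runs the body N times. A count of 0 would wrap around instead of skipping the loop, so the sketch rejects it.

```c
#include <assert.h>

/* Mirrors:  loop: body; SUB counter, 1, counter; [counter] B loop
   and returns how many times the body ran. */
unsigned run_counted_loop(unsigned count) {
    assert(count > 0u);          /* a zero count would wrap around */
    unsigned iterations = 0;
    unsigned counter = count;    /* step 3: create a loop counter */
    do {                         /* step 1: the label to branch to */
        iterations++;            /* loop body, e.g. one SOP tap */
        counter--;               /* step 4: decrement the counter */
    } while (counter != 0u);     /* steps 2 and 5: branch back while non-zero */
    return iterations;
}
```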

51 1. Create a label to branch to
MVKL pt1, A5
MVKH pt1, A5
MVKL pt2, A6
MVKH pt2, A6
loop LDH .D *A5, A0
LDH .D *A6, A1
MPY .M A0, A1, A3
ADD .L A4, A3, A4

52 2. Add a branch instruction, B.
MVKL pt1, A5
MVKH pt1, A5
MVKL pt2, A6
MVKH pt2, A6
loop LDH .D *A5, A0
LDH .D *A6, A1
MPY .M A0, A1, A3
ADD .L A4, A3, A4
B .? loop

53 Which unit is used by the B instruction?
[Figure: register file A with the .S, .M, .L and .D units and data memory.]

54 Which unit is used by the B instruction?
MVKL .S pt1, A5
MVKH .S pt1, A5
MVKL .S pt2, A6
MVKH .S pt2, A6
loop LDH .D *A5, A0
LDH .D *A6, A1
MPY .M A0, A1, A3
ADD .L A4, A3, A4
B .S loop
The B instruction (like MVKL and MVKH) uses the .S unit.

55 3. Create a loop counter.
MVKL .S pt1, A5
MVKH .S pt1, A5
MVKL .S pt2, A6
MVKH .S pt2, A6
MVKL .S count, B0
loop LDH .D *A5, A0
LDH .D *A6, A1
MPY .M A0, A1, A3
ADD .L A4, A3, A4
B .S loop
B registers will be introduced later.

56 4. Decrement the loop counter
MVKL .S pt1, A5
MVKH .S pt1, A5
MVKL .S pt2, A6
MVKH .S pt2, A6
MVKL .S count, B0
loop LDH .D *A5, A0
LDH .D *A6, A1
MPY .M A0, A1, A3
ADD .L A4, A3, A4
SUB .S B0, 1, B0
B .S loop

57 5. Make the branch conditional based on the value in the loop counter
What is the syntax for making an instruction conditional?
[condition] Instruction Label
e.g. [B1] B loop
The instruction executes only if the condition register is non-zero.
(1) The condition can be one of the following registers: A1, A2, B0, B1, B2.
(2) Any instruction can be conditional.

58 5. Make the branch conditional based on the value in the loop counter
The condition can be inverted by adding the exclamation symbol “!” as follows:
[!condition] Instruction Label
e.g.
[!B0] B loop ; branch if B0 = 0
[B0] B loop ; branch if B0 != 0

59 5. Make the branch conditional
MVKL .S1 pt1, A5
MVKH .S1 pt1, A5
MVKL .S1 pt2, A6
MVKH .S1 pt2, A6
MVKL .S2 count, B0
loop LDH .D *A5, A0
LDH .D *A6, A1
MPY .M A0, A1, A3
ADD .L A4, A3, A4
SUB .S B0, 1, B0
[B0] B .S loop

60 More on the Branch Instruction (1)
With this processor all instructions are encoded in 32 bits. Therefore the label must be encoded in fewer than 32 bits, because the rest of the B instruction word is needed for the opcode.
[Figure: the 32-bit B instruction word contains a 21-bit relative address.]
Case 1: B .S1 label
Relative branch. The label is limited to an offset of +/- 2^20, counted in 32-bit instruction words.

61 More on the Branch Instruction (2)
By specifying a register as an operand instead of a label, it is possible to have an absolute branch. This allows a dynamic range of 2^32.
[Figure: the 32-bit B instruction word contains a 5-bit register code.]
Case 2: B .S2 register
Absolute branch. Operates on .S2 ONLY!
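The 21-bit limit of Case 1 is an ordinary signed-range check. Below is a minimal sketch with hypothetical function names. It assumes the relative offset is counted in 32-bit instruction words, which is worth checking against the CPU Reference Guide. Strictly speaking, a 21-bit signed field covers -2^20 to 2^20 - 1 words.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if 'value' fits in a two's-complement field of 'bits' bits. */
bool fits_signed(int64_t value, unsigned bits) {
    assert(bits > 0u && bits < 63u);
    int64_t min = -((int64_t)1 << (bits - 1));
    int64_t max = ((int64_t)1 << (bits - 1)) - 1;
    return value >= min && value <= max;
}

/* Case 1: a relative branch stores its target as a 21-bit signed word offset. */
bool relative_branch_reachable(int64_t offset_words) {
    return fits_signed(offset_words, 21u);
}
```

Targets that fail this check need Case 2: load the full 32-bit address into a register with MVKL/MVKH, then branch through .S2.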

62 Testing the code
This code performs the following operations: a0*x0 + a0*x0 + a0*x0 + … + a0*x0
MVKL .S1 pt1, A5
MVKH .S1 pt1, A5
MVKL .S1 pt2, A6
MVKH .S1 pt2, A6
MVKL .S2 count, B0
loop LDH .D *A5, A0
LDH .D *A6, A1
MPY .M A0, A1, A3
ADD .L A4, A3, A4
SUB .S B0, 1, B0
[B0] B .S loop
However, we would like to perform: a0*x0 + a1*x1 + a2*x2 + … + aN*xN

63 Modifying the pointers
The solution is to modify the pointers A5 and A6.

64 Indexing Pointers
*R (pointer); pointer modified: No.
In this case the pointers are used but not modified. R can be any register.

65 Indexing Pointers
*+R[disp] (+ pre-offset) and *-R[disp] (- pre-offset); pointer modified: No.
In this case the pointers are modified BEFORE being used and RESTORED to their previous values. [disp] specifies the number of elements, of size DW (64-bit), W (32-bit), H (16-bit) or B (8-bit). disp = R or 5-bit constant. R can be any register.

66 Indexing Pointers
*++R[disp] (pre-increment) and *--R[disp] (pre-decrement); pointer modified: Yes.
In this case the pointers are modified BEFORE being used and NOT RESTORED to their previous values.

67 Indexing Pointers
*R++[disp] (post-increment) and *R--[disp] (post-decrement); pointer modified: Yes.
In this case the pointers are modified AFTER being used and NOT RESTORED to their previous values.

68 Indexing Pointers
Syntax | Description | Pointer modified
*R | Pointer | No
*+R[disp] | + Pre-offset | No
*-R[disp] | - Pre-offset | No
*++R[disp] | Pre-increment | Yes
*--R[disp] | Pre-decrement | Yes
*R++[disp] | Post-increment | Yes
*R--[disp] | Post-decrement | Yes
[disp] specifies the number of elements, of size DW, W, H or B. disp = R or 5-bit constant. R can be any register.
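The [disp] scaling behaves like C pointer arithmetic. Because disp counts elements, each step moves the address by 2 bytes with 16-bit data (as used with LDH). The sketch below models three of the modes on an int16_t array. The function names are hypothetical, and the minus forms behave the same way with a negative displacement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* *+R[disp]: read at R + disp elements; R itself is not modified. */
int16_t load_pre_offset(const int16_t *r, ptrdiff_t disp) {
    return r[disp];
}

/* *++R[disp]: R is advanced by disp elements first, then used. */
int16_t load_pre_increment(const int16_t **r, ptrdiff_t disp) {
    assert(r != NULL && *r != NULL);
    *r += disp;
    return **r;
}

/* *R++[disp]: R is used first, then advanced by disp elements. */
int16_t load_post_increment(const int16_t **r, ptrdiff_t disp) {
    assert(r != NULL && *r != NULL);
    int16_t value = **r;
    *r += disp;
    return value;
}
```

The `*A5++` used in the next listing corresponds to `load_post_increment(&a5, 1)`.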

69 Modifying and testing the code
This code now performs the following operations: a0*x0 + a1*x1 + a2*x2 + … + aN*xN
MVKL .S1 pt1, A5
MVKH .S1 pt1, A5
MVKL .S1 pt2, A6
MVKH .S1 pt2, A6
MVKL .S2 count, B0
loop LDH .D *A5++, A0
LDH .D *A6++, A1
MPY .M A0, A1, A3
ADD .L A4, A3, A4
SUB .S B0, 1, B0
[B0] B .S loop

70 Store the final result
MVKL .S1 pt1, A5
MVKH .S1 pt1, A5
MVKL .S1 pt2, A6
MVKH .S1 pt2, A6
MVKL .S2 count, B0
loop LDH .D *A5++, A0
LDH .D *A6++, A1
MPY .M A0, A1, A3
ADD .L A4, A3, A4
SUB .S B0, 1, B0
[B0] B .S loop
STH .D A4, *A7

71 Store the final result
The pointer A7 has not been initialised.

72 Store the final result
The pointer A7 is now initialised.
MVKL .S1 pt1, A5
MVKH .S1 pt1, A5
MVKL .S1 pt2, A6
MVKH .S1 pt2, A6
MVKL .S1 pt3, A7
MVKH .S1 pt3, A7
MVKL .S2 count, B0
loop LDH .D *A5++, A0
LDH .D *A6++, A1
MPY .M A0, A1, A3
ADD .L A4, A3, A4
SUB .S B0, 1, B0
[B0] B .S loop
STH .D A4, *A7

73 What is the initial value of A4?
A4 is used as an accumulator, so it needs to be reset to zero.
MVKL .S1 pt1, A5
MVKH .S1 pt1, A5
MVKL .S1 pt2, A6
MVKH .S1 pt2, A6
MVKL .S1 pt3, A7
MVKH .S1 pt3, A7
MVKL .S2 count, B0
ZERO .L A4
loop LDH .D *A5++, A0
LDH .D *A6++, A1
MPY .M A0, A1, A3
ADD .L A4, A3, A4
SUB .S B0, 1, B0
[B0] B .S loop
STH .D A4, *A7

74 Increasing the processing power!
How can we add more processing power to this processor?
[Figure: register file A with the .S1, .M1, .L1 and .D1 units and data memory.]

75 Increasing the processing power!
(1) Increase the clock frequency.
(2) Increase the number of processing units.

76 To increase the Processing Power, this processor has two sides (A and B or 1 and 2)
[Figure: side A (register file A with .S1, .M1, .L1, .D1) and side B (register file B with .S2, .M2, .L2, .D2), both connected to data memory.]

77 Can the two sides exchange operands in order to increase performance?

78 The answer is YES but there are limitations.
To exchange operands between the two sides, some cross paths or links are required. What is a cross path? A cross path links one side of the CPU to the other. There are two types of cross paths: Data cross paths. Address cross paths.

79 Data Cross Paths Data cross paths can also be referred to as register file cross paths. These cross paths allow operands from one side to be used by the other side. There are only two cross paths: 1X, which conveys data from side B to side A, and 2X, which conveys data from side A to side B.

80 TMS320C67x Data-Path

81 Data Cross Paths Data cross paths only apply to the .L, .S and .M units. The data cross paths are very useful, however there are some limitations in their use.

82 Data Cross Path Limitations
(1) The destination register must be on the same side as the unit.
(2) Source registers: up to one cross path per execute packet per side.
Execute packet: a group of instructions that execute simultaneously.

83 Data Cross Path Limitations
e.g.:
ADD .L1x A0,A1,B2
MPY .M1x A0,B6,A9
SUB .S1x A8,B2,A8
|| ADD .L1x A0,B0,A2
|| means that the SUB and ADD belong to the same execute packet and therefore execute simultaneously.

84 Data Cross Path Limitations
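ADD .L1x A0,A1,B2 ; NOT VALID: the destination B2 is not on the same side as .L1
MPY .M1x A0,B6,A9
SUB .S1x A8,B2,A8
|| ADD .L1x A0,B0,A2 ; NOT VALID: two side-A instructions use the 1X cross path in the same execute packet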
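The two rules can be checked mechanically for one execute packet. The sketch below uses a hypothetical and heavily simplified instruction description: the unit side, the destination side, and whether one source uses the cross path. It ignores every other encoding constraint, so treat it as an illustration of rules (1) and (2) only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified description of one instruction in an execute packet. */
typedef struct {
    char unit_side;   /* 'A' for .L1/.M1/.S1, 'B' for .L2/.M2/.S2 */
    char dst_side;    /* side of the destination register */
    bool uses_cross;  /* true if one source comes from the other side ("x") */
} Instr;

/* Rule (1): destination on the same side as the unit.
   Rule (2): at most one cross-path source per side in the packet. */
bool packet_is_valid(const Instr *packet, size_t n) {
    assert(packet != NULL || n == 0);
    int cross_into_a = 0; /* uses of 1X (side B -> side A) */
    int cross_into_b = 0; /* uses of 2X (side A -> side B) */
    for (size_t i = 0; i < n; i++) {
        if (packet[i].dst_side != packet[i].unit_side) {
            return false; /* rule (1) violated */
        }
        if (packet[i].uses_cross) {
            if (packet[i].unit_side == 'A') cross_into_a++;
            else cross_into_b++;
        }
    }
    return cross_into_a <= 1 && cross_into_b <= 1; /* rule (2) */
}
```

In this model, SUB .S1x || ADD .L1x becomes two side-A entries that both use the cross path, so the check rejects the packet.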

85 Data Cross Paths for both sides
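[Figure: .L1, .M1 and .S1 can take one source from side B through 1X; .L2, .M2 and .S2 can take one source from side A through 2X; each destination stays on its unit's side.]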

86 Address cross paths
LDW.D1T1 *A0,A5
STW.D1T1 A5,*A0
(1) The pointer must be on the same side as the unit.

87 Load or store to either side
[Figure: .D1 computes the address from *A0; data path T1 connects to register file A (A5) and T2 to register file B (B5).]
LDW.D1T1 *A0,A5
LDW.D1T2 *A0,B5

88 Standard Parallel Loads
LDW.D1T1 *A0,A5
|| LDW.D2T2 *B0,B5

89 Parallel Load/Store using address cross paths
LDW.D1T2 *A0,B5
|| STW.D2T1 A5,*B0

90 Fill the blanks ... Does this work?
LDW.D1__ *A0,B5
|| STW.D2__ B6,*B0

91 Not Allowed! Parallel accesses: both cross or neither cross
LDW.D1T2 *A0,B5
|| STW.D2T2 B6,*B0

92 Conditions Don’t Use Cross Paths
If a conditional register comes from the opposite side, it does NOT use a data or address cross path. Examples:
[B2] ADD .L1 A2,A0,A4
[A1] LDW .D2 *B0,B5

93 ‘C62x Data-Path Summary
Full CPU datapath: see the CPU Reference Guide, page 2-2.

94 ‘C67x Data-Path Summary

95 Cross Paths - Summary
Data: Destination register on same side as unit. Source registers: up to one cross path per execute packet per side. Use “x” to indicate a cross path.
Address: Pointer must be on same side as unit. Data can be transferred to/from either side. Parallel accesses: both cross or neither cross.
Conditionals: Conditionals don’t use cross paths.

96 Code Review (using side A only)
$Y = \sum_{n=1}^{40} a_n x_n$
MVK .S1 40, A2 ; A2 = 40, loop count
loop: LDH .D1 *A5++, A0 ; A0 = a(n)
LDH .D1 *A6++, A1 ; A1 = x(n)
MPY .M1 A0, A1, A3 ; A3 = a(n) * x(n)
ADD .L1 A3, A4, A4 ; Y = Y + A3
SUB .L1 A2, 1, A2 ; decrement loop count
[A2] B .S1 loop ; if A2 != 0, branch
STH .D1 A4, *A7 ; *A7 = Y
Note: assume that A4 was previously cleared and the pointers are initialised.
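A C model of this listing makes the data widths explicit: 16-bit loads, a 32-bit product, a 32-bit accumulator and a 16-bit store. It is a sketch with a hypothetical function name. It models only the arithmetic, not instruction timing, and it assumes A4 starts at zero, as the note says. STH writes only the low 16 bits of A4, so a sum that does not fit in 16 bits is truncated when it is stored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_TAPS 40 /* loop count used in the listing (MVK .S1 40, A2) */

/* C model of the side-A SOP listing above. */
void sop_side_a(const int16_t *a, const int16_t *x, int16_t *y_out) {
    assert(a != NULL && x != NULL && y_out != NULL);
    int32_t y = 0;          /* A4, assumed cleared beforehand */
    int32_t count = N_TAPS; /* A2 */
    do {
        int32_t prod = (int32_t)(*a++) * (int32_t)(*x++); /* LDH, LDH, MPY: 16 x 16 -> 32 bits */
        y = (int32_t)((uint32_t)y + (uint32_t)prod);      /* ADD wraps modulo 2^32 */
        count--;                                          /* SUB .L1 A2, 1, A2 */
    } while (count != 0);                                 /* [A2] B .S1 loop */
    *y_out = (int16_t)(uint16_t)((uint32_t)y & 0xFFFFu);  /* STH stores the low 16 bits */
}
```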

97 Let us have a look at the final details concerning the functional units. Consider first the case of the .L and .S units.

98 Operands - 32/40-bit Register, 5-bit Constant
Operands can be: 5-bit constants (or 16-bit for MVKL and MVKH). 32-bit registers. 40-bit Registers. However, we have seen that registers are only 32-bit. So where do the 40-bit registers come from?

99 Operands - 32/40-bit Register, 5-bit Constant
A 40-bit register can be obtained by concatenating two registers. However, there are 3 conditions that need to be respected:
The registers must be from the same side.
The lower-numbered register must be even and the other odd (the pair is written odd:even, e.g. A1:A0).
The registers must be consecutive.

100 Operands - 32/40-bit Register, 5-bit Constant
All combinations of 40-bit registers are shown below:
A1:A0, A3:A2, A5:A4, A7:A6, A9:A8, A11:A10, A13:A12, A15:A14
B1:B0, B3:B2, B5:B4, B7:B6, B9:B8, B11:B10, B13:B12, B15:B14
In each pair the even register holds the lower 32 bits and the odd register holds the upper 8 bits of the 40-bit value.
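The split of a 40-bit value across a register pair can be sketched in C. The struct and function names are hypothetical. The sketch assumes the lower 32 bits sit in the even register and the upper 8 bits in the low byte of the odd register. It treats the value as signed and sign-extends from bit 39.

```c
#include <assert.h>
#include <stdint.h>

/* A 40-bit value held in a register pair such as A5:A4. */
typedef struct {
    uint32_t odd;  /* e.g. A5: bits 32-39 in its low 8 bits */
    uint32_t even; /* e.g. A4: bits 0-31 */
} RegPair;

/* Splits a signed value that fits in 40 bits into a register pair. */
RegPair split40(int64_t value) {
    assert(value >= -((int64_t)1 << 39) && value < ((int64_t)1 << 39));
    uint64_t bits = (uint64_t)value & 0xFFFFFFFFFFull; /* keep 40 bits */
    RegPair p = { (uint32_t)(bits >> 32), (uint32_t)(bits & 0xFFFFFFFFu) };
    return p;
}

/* Rebuilds the signed 40-bit value, sign-extending from bit 39. */
int64_t join40(RegPair p) {
    uint64_t bits = ((uint64_t)(p.odd & 0xFFu) << 32) | p.even;
    if (bits & ((uint64_t)1 << 39)) {
        bits |= ~(uint64_t)0xFFFFFFFFFFull; /* copy the sign bit upwards */
    }
    return (int64_t)bits;
}
```

This extra headroom explains why an instruction such as ADD.L1 A2, A3, A5:A4 can hold a sum that would overflow a single 32-bit register.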

101 Operands - 32/40-bit Register, 5-bit Constant
instr .unit <src>, <src>, <dst>
For the .L and .S units, the <src> operands can be 32-bit registers, 40-bit registers or 5-bit constants, and <dst> can be a 32-bit or a 40-bit register.

102 Operands - 32/40-bit Register, 5-bit Constant
instr .unit <src>, <src>, <dst> (.L or .S unit)

103 Operands - 32/40-bit Register, 5-bit Constant
OR.L1 A0, A1, A2

104 Operands - 32/40-bit Register, 5-bit Constant
ADD.L2 -5, B3, B4

105 Operands - 32/40-bit Register, 5-bit Constant
ADD.L1 A2, A3, A5:A4

106 Operands - 32/40-bit Register, 5-bit Constant
SUB.L1 A2, A5:A4, A5:A4

107 Operands - 32/40-bit Register, 5-bit Constant
OR.L1 A0, A1, A2
ADD.L2 -5, B3, B4
ADD.L1 A2, A3, A5:A4
SUB.L1 A2, A5:A4, A5:A4
ADD.L2 3, B9:B8, B9:B8

108 Register to register data transfer
To move the content of a register (A or B) to another register (B or A), use the move instruction MV, e.g.:
MV A0, B0
MV B6, B7
To move the content of a control register to another register (A or B), or vice versa, use the MVC instruction, e.g.:
MVC IFR, A0
MVC A0, IRP

109 TMS320C6000 Instruction Set

110 'C62x Instruction Set (by category)
Arithmetic: ABS, ADD, ADDA, ADDK, ADD2, MPY, MPYH, NEG, SMPY, SMPYH, SADD, SAT, SSUB, SUB, SUBA, SUBC, SUB2, ZERO
Logical: AND, CMPEQ, CMPGT, CMPLT, NOT, OR, SHL, SHR, SSHL, XOR
Data management: LDB/H/W, MV, MVC, MVK, MVKL, MVKH, MVKLH, STB/H/W
Program control: B, IDLE, NOP
Bit management: CLR, EXT, LMBD, NORM, SET
Note: Refer to the 'C6000 CPU Reference Guide for more details.

111 'C62x Instruction Set (by unit)
.S unit: ADD, ADDK, ADD2, AND, B, CLR, EXT, MV, MVC, MVK, MVKL, MVKH, MVKLH, NEG, NOT, OR, SET, SHL, SHR, SSHL, SUB, SUB2, XOR, ZERO
.L unit: ABS, ADD, AND, CMPEQ, CMPGT, CMPLT, LMBD, MV, NEG, NORM, NOT, OR, SADD, SAT, SSUB, SUB, SUBC, XOR, ZERO
.D unit: ADD, ADDA, LDB/H/W, MV, NEG, STB/H/W, SUB, SUBA, ZERO
.M unit: MPY, MPYH, SMPY, SMPYH
Other: IDLE, NOP
Note: Refer to the 'C6000 CPU Reference Guide for more details.

112 ' C6700: Superset of Fixed-Point (by unit)
.S unit: ADD, ADDK, ADD2, AND, B, CLR, EXT, MV, MVC, MVK, MVKL, MVKH, NEG, NOT, OR, SET, SHL, SHR, SSHL, SUB, SUB2, XOR, ZERO; ‘C67x additions: ABSSP, ABSDP, CMPGTSP, CMPEQSP, CMPLTSP, CMPGTDP, CMPEQDP, CMPLTDP, RCPSP, RCPDP, RSQRSP, RSQRDP, SPDP
.L unit: ABS, ADD, AND, CMPEQ, CMPGT, CMPLT, LMBD, MV, NEG, NORM, NOT, OR, SADD, SAT, SSUB, SUB, SUBC, XOR, ZERO; ‘C67x additions: ADDSP, ADDDP, SUBSP, SUBDP, INTSP, INTDP, SPINT, DPINT, SPTRUNC, DPTRUNC, DPSP
.M unit: MPY, MPYH, MPYLH, MPYHL, SMPY, SMPYH; ‘C67x additions: MPYSP, MPYDP, MPYI, MPYID
.D unit: ADD, ADDA (B/H/W), LD (B/H/W), MV, NEG, ST (B/H/W), SUB, SUBA (B/H/W), ZERO; ‘C67x additions: ADDAD, LDDW
No unit used: IDLE, NOP
The ‘C67x includes all the ‘C62x instructions along with these additional instructions. Those that end in SP are for single-precision floating point; those that end in DP are for double-precision floating point. The ‘C67x provides hardware support for double-precision floating point.
Note: Refer to the 'C6000 CPU Reference Guide for more details.

113 Superset of Fixed-Point
[Figure: C64x CPU block diagram: instruction fetch, instruction dispatch with advanced instruction packing, instruction decode, control registers, interrupt control, emulation with advanced emulation, register files A0 to A15 plus A16 to A31 and B0 to B15 plus B16 to B31, and functional units L1, S1, M1, D1, D2, M2, S2, L2.]
VelociTI.2 CPU enhancements: 64 registers in total; 64-bit load/store datapath; dual 16-bit arithmetic on 6 functional units; quad 8-bit arithmetic on 4 functional units; instruction set extensions for communications and imaging; increased orthogonality; increased code density.
C64x CPU: 1.3 ns cycle time; 6000 MIPS at 750 MHz; 16-bit MACs: 4 per cycle; 8-bit MACs: 8 per cycle; 100% object-code compatible with C62x.
Load/store datapaths: ‘C62x: dual 32-bit load/store; ‘C64x: dual 64-bit load/store; ‘C67x: dual 64-bit load/32-bit store.

114 'C64x: Superset of ‘C62x
.S unit: data pack/unpack: PACK2, PACKH2, PACKLH2, PACKHL2, UNPKHU4, UNPKLU4, SWAP2, SPACK2, SPACKU4; dual/quad arithmetic: SADD2, SADDUS2, SADD4; bitwise logical: ANDN; shifts and merge: SHR2, SHRU2, SHLMB, SHRMB; compares: CMPEQ2, CMPEQ4, CMPGT2, CMPGT4; branches/PC: BDEC, BPOS, BNOP, ADDKPC
.L unit: data pack/unpack: PACK2, PACKH2, PACKLH2, PACKHL2, PACKH4, PACKL4, UNPKHU4, UNPKLU4, SWAP2/4; dual/quad arithmetic: ABS2, ADD2, ADD4, MAX, MIN, SUB2, SUB4, SUBABS4; bitwise logical: ANDN; shift and merge: SHLMB, SHRMB; load constant: MVK (5-bit)
.M unit: bit operations: BITC4, BITR, DEAL, SHFL; move: MVD; average: AVG2, AVG4; shifts: ROTL, SSHVL, SSHVR; multiplies: MPYHI, MPYLI, MPYHIR, MPYLIR, MPY2, SMPY2, DOTP2, DOTPN2, DOTPRSU2, DOTPNRSU2, DOTPU4, DOTPSU4, GMPY4, XPND2/4
.D unit: memory access: LDDW, LDNW, LDNDW, STDW, STNW, STNDW; load constant: MVK (5-bit); dual arithmetic: ADD2, SUB2; bitwise logical: AND, ANDN, OR, XOR; address calculation: ADDAD

115 TMS320C6000 Memory

116 Memory size per device
Devices: internal memory; EMIFA; EMIFB
C6201, C6701: P = kB, D = kB; 52M Bytes (32-bits wide); N/A
C6202: P = kB, D = kB
C6203: P = kB, D = kB
C6211, C6711: L1P = 4 kB, L1D = 4 kB, L2 = 64 kB; 128M Bytes
C6712: 64M Bytes (16-bits wide)
C6713: L2 = kB
C6411, DM642: L1P = 16 kB, L1D = 16 kB
C6414, C6415, C6416: L1P = 16 kB, L1D = 16 kB, L2 = 1 MB; 256M Bytes (64-bits wide)

117 Internal Memory Summary
Device: internal (L2); external
C6211, C6711: 64 kB; 512 MB (32-bit wide)
C6712: 64 kB; 16-bit wide external bus
C6713: 256 kB; 512 MB (32-bit wide)
C6414, C6415, C6416: 1 MB; EMIFA 1 GB (64-bit), EMIFB 256 kB (16-bit)
DM642: 256 kB; 1 GB (64-bit)
C6411: 256 kB; 256 MB (32-bit)
Link: TMS320C6000 DSP Generation

118 TMS320C6000 Peripherals

119 'C6x System Block Diagram
[Block diagram: CPU (register files A0-A15 and B0-B15, control registers; functional units .L1 .S1 .M1 .D1 and .L2 .S2 .M2 .D2), internal buses, memory (internal and external), peripherals]

120 ‘C6x Internal Buses
[Bus diagram: CPU (PC, A and B register files), internal memory, external interface, peripherals]
Program address: 32 bits
Program data: 256 bits
Data address T1: 32 bits; data T1: 32/64 bits
Data address T2: 32 bits; data T2: 32/64 bits
DMA read address: 32 bits; DMA read data: 32 bits
DMA write address: 32 bits; DMA write data: 32 bits
‘C67x: can perform 64-bit data loads (hence the 32/64-bit data buses).
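The bus widths explain how the CPU is kept fed. C6000 instructions are 32 bits wide, so one 256-bit program fetch delivers 8 instructions, one for each functional unit. The two data paths (T1 and T2) bound how many bytes the CPU can load per cycle. This C sketch simply encodes that arithmetic; the constant and function names are illustrative assumptions.

```c
/* Bus widths read from the internal-bus diagram (bits). */
enum {
    PROGRAM_DATA_BUS_BITS = 256,
    INSTRUCTION_BITS      = 32,
    FUNCTIONAL_UNITS      = 8   /* .L1 .S1 .M1 .D1 .L2 .S2 .M2 .D2 */
};

/* One 256-bit fetch carries 256 / 32 = 8 instructions: enough to give
 * every functional unit one instruction in the same cycle. */
_Static_assert(PROGRAM_DATA_BUS_BITS / INSTRUCTION_BITS == FUNCTIONAL_UNITS,
               "one program fetch should supply one instruction per unit");

/* Peak bytes the CPU can load per cycle over its data buses. */
int data_bytes_per_cycle(int data_buses, int bits_per_bus)
{
    return data_buses * bits_per_bus / 8;
}
```

With two 32-bit data paths this gives 8 bytes per cycle; with the 64-bit loads noted for the ‘C67x it doubles to 16.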

121 'C6x System Block Diagram
[Block diagram: CPU (register files A0-A15 and B0-B15, control registers; functional units .L1 .S1 .M1 .D1 and .L2 .S2 .M2 .D2), internal buses, memory, peripherals, and the EMIF connecting to external memory]

122 'C6x System Block Diagram
[Block diagram: CPU (register files A0-A15 and B0-B15, control registers; functional units .L1 .S1 .M1 .D1 and .L2 .S2 .M2 .D2), internal buses (32-bit data), program RAM and data RAM, peripherals, and the EMIF connecting to synchronous and asynchronous external memory]

123 'C6000 Peripherals
[Block diagram: CPU (register sets A and B; units .L1 .S1 .M1 .D1 and .L2 .S2 .M2 .D2), internal buses, internal memory, and peripherals: EMIF (external memory), parallel communication, GPIO, serial ports, DMA/EDMA (boot), timers, Ethernet, video ports, VCP/TCP, PLL]
Speaker notes: Each of these peripherals has a module dedicated to it. (I don't discuss this, but we don't really have material on the timers; they are easy enough to figure out on their own from the specs.) The main point here is simply to say that each of these can exist on the C6x, with a one-sentence description of its capability. I sometimes note that the EMIF is considered a peripheral, outside of the core CPU. Depending on the exact device (C6201, for example), the peripheral mix may change. Don't go into too much detail on any one peripheral unless the question is simple and quick to answer; we will have time to explore each of these later.

124 EMIF
[Block diagram: CPU and internal memory connected through the EMIF to external async memory, SDRAM, and SBSRAM]
External Memory Interface (EMIF)
Glueless access to async/sync memory
Works with PC100/133 SDRAM (cheap, fast, and easy!)
Byte-wide data access
16-, 32-, or 64-bit bus widths
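A quick way to compare the bus-width options is theoretical peak bandwidth: bytes per transfer times memory clock. The sketch below assumes one bus-width transfer per memory clock (for example 100 MHz for PC100 SDRAM) and ignores row activation, refresh and other overheads, so real throughput is lower. The function name is hypothetical.

```c
/* Theoretical peak EMIF bandwidth in MB/s, assuming one bus-width transfer
 * per memory clock. Overheads such as SDRAM row activation and refresh are
 * ignored. Returns -1 for widths not listed on the slide. */
double emif_peak_mbytes_per_s(int bus_width_bits, double mem_clock_mhz)
{
    if (bus_width_bits != 16 && bus_width_bits != 32 && bus_width_bits != 64)
        return -1.0;
    return (bus_width_bits / 8) * mem_clock_mhz;
}
```

Under these assumptions, a 32-bit bus at 100 MHz peaks at 400 MB/s, and a 64-bit bus at 133 MHz at 1064 MB/s.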

125 HPI / XBUS / PCI
[Block diagram: CPU, internal memory, EMIF, and the parallel communication interfaces]
Parallel Communication Interfaces
HPI: dedicated, slave-only, async 16/32-bit bus that allows host-µP access to C6000 memory
XBUS: similar to HPI but provides master/slave and sync modes, and a glueless interface to FIFOs (up to single-cycle transfer rate)
PCI: standard 32-bit, 33 MHz/66 MHz PCI interface
These interfaces provide a means to bootstrap the C6000.

126 GPIO
[Block diagram: CPU, internal memory, and peripherals, highlighting GPIO]
General Purpose Input/Output (GPIO)
C64x and C6713 provide 8-16 bits of general-purpose bit I/O
Use it to observe or control the signal on a single pin
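Controlling or observing a single pin usually comes down to setting, clearing or testing one bit in a data register. The sketch below assumes a hypothetical layout in which bit n of a 32-bit GPIO register corresponds to pin n; on real hardware `reg` would point at a memory-mapped address taken from the device datasheet, while here it can point at an ordinary variable for testing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: bit n of a 32-bit GPIO data register is pin n
 * (valid for pin < 32). `volatile` prevents the compiler from caching
 * or eliding accesses, as required for a real memory-mapped register. */
static inline void gpio_set(volatile uint32_t *reg, unsigned pin)
{
    *reg |= (UINT32_C(1) << pin);   /* drive the pin high */
}

static inline void gpio_clear(volatile uint32_t *reg, unsigned pin)
{
    *reg &= ~(UINT32_C(1) << pin);  /* drive the pin low */
}

static inline bool gpio_read(const volatile uint32_t *reg, unsigned pin)
{
    return (*reg >> pin) & 1u;      /* observe the pin's current level */
}
```

Note that a read-modify-write like `*reg |= mask` is not atomic; if an interrupt handler also touches the same register, the access would need protecting.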

127 McBSP and Utopia
Multi-Channel Buffered Serial Port (McBSP)
2 (or 3) full-duplex, synchronous serial ports
Up to 100 Mb/s performance
Supports multi-channel operation (T1, E1, MVIP, …)
Multi-Channel Audio Serial Port (McASP), on DM642 and C6713
McBSP features plus more …
Up to 8 stereo lines (16 channels)
IIC support
Utopia (C64x)
ATM connection
50 MHz wide area network connectivity
[Block diagram: CPU, internal memory, and peripherals, highlighting the serial ports]
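Whether a multi-channel stream fits on a serial port is a bit-rate calculation: time slots (channels) times bits per slot times frame rate. As a check, a T1 payload of 24 channels of 8 bits at 8000 frames/s is 1.536 Mb/s (the 1.544 Mb/s T1 line rate adds one framing bit per frame). The sketch below compares such a figure against the slide's 100 Mb/s McBSP ceiling; the audio configuration used in the tests (16 channels, 32-bit slots, 48 kHz) is a hypothetical example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Serial bit clock needed for a frame-synchronous (TDM) stream:
 * channels (time slots) x bits per slot x frame rate. */
uint64_t serial_bit_rate(unsigned channels, unsigned bits_per_slot,
                         unsigned frame_rate_hz)
{
    return (uint64_t)channels * bits_per_slot * frame_rate_hz;
}

/* True if the stream fits within a port's maximum bit rate. */
bool fits_serial_port(uint64_t required_bps, uint64_t port_max_bps)
{
    return required_bps <= port_max_bps;
}
```

Sixteen 32-bit slots at 48 kHz need 24.576 Mb/s, comfortably under 100 Mb/s.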

128 DMA / EDMA
[Block diagram: CPU, internal memory, and peripherals, highlighting DMA/EDMA (boot)]
Direct Memory Access (DMA / EDMA)
Transfers any set of memory locations to another
4 / 16 / 64 channels (transfer parameter sets)
Transfers can be triggered by any interrupt (sync)
Operates independently of the CPU
On reset, provides bootstrap from memory
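A "transfer parameter set" describes a copy: where to read, where to write, how much, and how to step through memory. The C sketch below models that idea in software so the strided "any set of memory locations" case is concrete. The `xfer_params` struct and `run_transfer` function are hypothetical and do not reproduce the real EDMA register layout; on the device the copy is done by hardware, independently of the CPU.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified transfer parameter set (NOT the real EDMA
 * layout): copy `count` elements of `elem_size` bytes, stepping source
 * and destination by independent byte strides. */
typedef struct {
    const uint8_t *src;
    uint8_t       *dst;
    size_t         elem_size;  /* bytes per element */
    size_t         count;      /* number of elements to move */
    ptrdiff_t      src_stride; /* bytes between consecutive source elements */
    ptrdiff_t      dst_stride; /* bytes between consecutive destination elements */
} xfer_params;

/* Software stand-in for the transfer engine: a DMA/EDMA channel would
 * perform these copies in hardware once its trigger event occurs. */
void run_transfer(const xfer_params *p)
{
    for (size_t i = 0; i < p->count; ++i) {
        memcpy(p->dst + (ptrdiff_t)i * p->dst_stride,
               p->src + (ptrdiff_t)i * p->src_stride,
               p->elem_size);
    }
}
```

For example, a source stride of two elements and a destination stride of one gathers every other 16-bit sample into a contiguous buffer.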

129 Timer/Counter
[Block diagram: CPU, internal memory, and peripherals, highlighting the timers]
Timer / Counter
Two (or three) 32-bit timer/counters
Can generate interrupts
Both input and output pins
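A common use of a timer is a periodic interrupt, for example one per audio sample. The generic sketch below assumes a 32-bit counter driven by a clock of `timer_clock_hz` that interrupts every `period` counts. Which clock drives the counter (and any fixed divider), and whether the hardware counts `period` or `period + 1` ticks, are device-specific details to check in the datasheet; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Generic sketch: period = timer_clock_hz / interrupt_hz, truncated.
 * Returns false if the requested rate cannot be represented in a
 * 32-bit period register. */
bool timer_period_for(uint64_t timer_clock_hz, uint64_t interrupt_hz,
                      uint32_t *period_out)
{
    if (interrupt_hz == 0)
        return false;
    uint64_t period = timer_clock_hz / interrupt_hz;
    if (period == 0 || period > UINT32_MAX)
        return false; /* rate too high for this clock, or too low for 32 bits */
    *period_out = (uint32_t)period;
    return true;
}
```

With an assumed 50 MHz timer clock, an 8 kHz sample interrupt needs a period of 6250 counts.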

130 Ethernet MAC
[Block diagram: CPU, internal memory, and peripherals, highlighting Ethernet]
Ethernet (DM642 only)
10/100 Ethernet MAC
Pins are muxed with PCI
TCP/IP stack available from TI

131 Video Ports
[Block diagram: CPU, internal memory, and peripherals, highlighting the video ports]
Video Ports (DM642 only)
Each configurable for capture or display
Dual 8/10-bit BT656 or raw modes
16/20-bit raw modes and 20-bit Y/C for high definition
Horizontal scaling and chroma resampling support for 8-bit modes
Supports transport interface mode

132 VCP / TCP: 3G Wireless
[Block diagram: CPU, internal memory, and peripherals including McBSPs, Utopia, DMA/EDMA, timers, video ports, VCP/TCP, PLL]
Turbo Coprocessor (TCP) (C6416 only)
Supports 35 data channels at 384 kbps
3GPP / IS2000 turbo coder
Programmable parameters include mode, rate, and frame length
Viterbi Coprocessor (VCP) (C6416 only)
Supports > 500 voice channels at 8 kbps
Programmable decoder parameters include constraint length, code rate, and frame length

133 Phase Locked Loop (PLL)
[Block diagram: CPU and peripherals, with the PLL taking input CLKIN and producing outputs CLKOUT1 and CLKOUT2 (reduced-rate clock out)]
PLL
Clock multiplier
Reduces EMI and cost
Rate is pin-selectable

134 When we talk about cycles ...
What is a clock cycle? The time between successive instructions.
[Diagram: CLKIN → PLL → CLKOUT1 (the C6000 clock cycle); CLKOUT2 = ½, ¼, or 1/6 of CLKOUT1]
CLKIN (MHz) | PLL rate | CPU clock frequency | CPU clock cycle time | MIPS (max)
60 | x12 | 720 MHz | 1.39 ns | 5760
30 | x10 | 300 MHz | 3.33 ns | 2400
50 | x4 | 200 MHz | 5 ns | 1600
25 | x4 | 100 MHz | 10 ns | 800
HIDDEN FOIL (speaker notes): If you have customers who are curious about what we mean by "clock cycles" or MIPS rate, you may want to use this additional foil. Better yet, go over this material on the whiteboard. Many people ask what we mean by "clock cycle"; this is the definition. It is the machine rate of the processor, and this explains how we get the number. Depending on your system, you can choose a different rate by using a different CLKIN and PLL option.
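Every row of the table follows the same chain: CPU clock = CLKIN × PLL rate, cycle time = 1 / CPU clock, and peak MIPS = 8 instructions per cycle × CPU clock. The C sketch below reproduces that chain, plus the CLKOUT2 divisors from the diagram; the struct and function names are illustrative.

```c
/* Derived clock figures for a C6000 clocked from CLKIN through the PLL. */
typedef struct {
    double cpu_mhz;   /* CLKOUT1: CPU clock frequency */
    double cycle_ns;  /* CPU clock cycle time */
    double peak_mips; /* at most 8 instructions per cycle */
} clock_figures;

clock_figures clock_from_pll(double clkin_mhz, int pll_multiplier)
{
    clock_figures f;
    f.cpu_mhz   = clkin_mhz * pll_multiplier;
    f.cycle_ns  = 1000.0 / f.cpu_mhz;
    f.peak_mips = 8.0 * f.cpu_mhz;
    return f;
}

/* CLKOUT2 runs at a fraction of CLKOUT1 (divisor 2, 4 or 6 per the slide). */
double clkout2_mhz(double clkout1_mhz, int divisor)
{
    return clkout1_mhz / divisor;
}
```

For example, `clock_from_pll(60.0, 12)` gives 720 MHz, about 1.39 ns and 5760 MIPS, matching the first row.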

135 'C6000 Peripherals Summary
[Block diagram: CPU (register sets A and B; units .L1 .S1 .M1 .D1 and .L2 .S2 .M2 .D2), internal buses, internal memory, and peripherals: EMIF (external memory), parallel communication, GPIO, serial ports, DMA/EDMA (boot), timers, Ethernet, video ports, VCP/TCP, PLL]

136 ‘C6x Family Part Numbering
Example: TMS320LC6201PKGA200
TMS320 = TI DSP
L = placeholder for voltage levels
C6 = C6x family
2 = fixed-point core
01 = memory/peripheral configuration
PKG = package designator (actual letters TBD)
A = -40 to 85 °C (blank for 0 to 70 °C)
200 = core CPU speed in MHz

137 Homework
1. Functional Units
a. How many can perform an ADD? Name them.
b. Which support memory loads/stores? (.M, .S, .D, .L)
2. Memory Map
a. How many external ranges exist on ‘C6201?

138 Homework (cont'd)
3. Conditional Code
a. Which registers can be used as conditional registers?
b. Which instructions can be conditional?
4. Performance
a. What is the 'C6711 instruction cycle time?
b. How can the 'C6711 execute 1200 MIPS?

140 5. Coding Problems
a. Move the contents of A0 into A1.
b. Move the contents of CSR into A1.
c. Clear register A5.

141 5. Coding Problems (cont’d)
d. A2 = A0² + A1
e. If (B1 ≠ 0) then B2 = B5 * B6
f. A2 = A0 * A1 + 10
g. Load an unsigned constant (19ABCh) into register A6.

142 5. Coding Problems (cont’d)
h. Load A7 with contents of mem1 and post-increment the selected pointer.

