Details .L and .S units. TMS320C6000. Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004.

2 Details .L and .S units
Let us have a look at the final details concerning the functional units. Consider first the case of the .L and .S units.

3 Operands: 32/40-bit registers, 5-bit constants
Operands can be:
- 5-bit constants (or 16-bit for MVKL and MVKH).
- 32-bit registers.
- 40-bit registers.
However, we have seen that registers are only 32 bits wide. So where do the 40-bit registers come from?

4 Operands: 32/40-bit registers, 5-bit constants
A 40-bit register can be obtained by concatenating two registers. However, there are 3 conditions that need to be respected:
- The registers must be from the same side.
- The first register must be even and the second odd.
- The registers must be consecutive.

5 Operands: 32/40-bit registers, 5-bit constants
All combinations of 40-bit registers are shown below. Each pair is written odd:even; the odd register supplies 8 bits and the even register 32 bits.
Side A: A1:A0  A3:A2  A5:A4  A7:A6  A9:A8  A11:A10  A13:A12  A15:A14
Side B: B1:B0  B3:B2  B5:B4  B7:B6  B9:B8  B11:B10  B13:B12  B15:B14
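The three pairing conditions can be sketched as a small check. This is only an illustration in Python, not part of the slides or of any TI tool: the function names `is_valid_pair` and `all_pairs` are hypothetical, and the 16-registers-per-side limit is taken from the table above.

```python
import re

def is_valid_pair(pair):
    """Return True if `pair` (written odd:even, e.g. "A5:A4") is a legal 40-bit pair."""
    m = re.fullmatch(r"([AB])(\d+):([AB])(\d+)", pair)
    if m is None:
        return False
    side_hi, hi = m.group(1), int(m.group(2))
    side_lo, lo = m.group(3), int(m.group(4))
    same_side = side_hi == side_lo      # both registers from the same side
    even_low = lo % 2 == 0              # the register written second is even
    consecutive = hi == lo + 1          # the odd register directly follows it
    in_range = hi <= 15                 # assumption: registers 0..15, as in the table
    return same_side and even_low and consecutive and in_range

def all_pairs():
    """Enumerate every legal pair for sides A and B."""
    return [f"{side}{n + 1}:{side}{n}" for side in "AB" for n in range(0, 16, 2)]
```

For example, `is_valid_pair("A5:A4")` holds, while `"A4:A3"` (low register is odd) and `"B1:A0"` (mixed sides) are rejected.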

6-12 Operands: 32/40-bit registers, 5-bit constants
Instruction format for the .L or .S units:
    instr .unit <src>, <src>, <dst>
One source can be a 32-bit register or a 5-bit constant, the other source a 32-bit or 40-bit register, and the destination a 32-bit or 40-bit register. Examples:
    OR  .L1 A0, A1, A2
    ADD .L2 -5, B3, B4
    ADD .L1 A2, A3, A5:A4
    SUB .L1 A2, A5:A4, A5:A4
    ADD .L2 3, B9:B8, B9:B8
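A quick way to see which constants fit in a 5-bit or 16-bit field is a two's-complement range check. The sketch below assumes the constants are signed, which the `-5` example suggests but the slides do not state; some instructions may use unsigned constant forms instead. The helper name `fits_signed` is hypothetical.

```python
def fits_signed(value, bits):
    """True if `value` fits in a signed (two's-complement) field of `bits` bits."""
    low = -(1 << (bits - 1))
    high = (1 << (bits - 1)) - 1
    return low <= value <= high
```

Under that assumption `-5` and `3` fit in 5 bits, while a value such as `40` needs the wider 16-bit constant form.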

13 Register to register data transfer
To move the content of a register (A or B) to another register (B or A), use the move (MV) instruction, e.g.:
    MV A0, B0
    MV B6, B7
To move the content of a control register to another register (A or B), or vice versa, use the MVC instruction, e.g.:
    MVC IFR, A0
    MVC A0, IRP

Increasing the processing power

15 Code Review (using side A only)
$Y = \sum_{n=1}^{40} a_n \times x_n$
        MVK .S1 40, A2       ; A2 = 40, loop count
loop:   LDH .D1 *A5++, A0    ; A0 = a(n)
        LDH .D1 *A6++, A1    ; A1 = x(n)
        MPY .M1 A0, A1, A3   ; A3 = a(n) * x(n)
        ADD .L1 A4, A3, A4   ; Y = Y + A3
        SUB .L1 A2, 1, A2    ; decrement loop count
 [A2]   B   .S1 loop         ; if A2 ≠ 0, branch
        STH .D1 A4, *A7      ; *A7 = Y
Note: Assume that A4 was previously cleared and the pointers are initialised. Assume that A2 is B0.
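For readers who want to check the arithmetic off-target, the loop can be mirrored in a few lines of Python. This is a behavioural sketch only: it ignores pipeline delay slots and register widths, and the function name `sum_of_products` and its parameters are hypothetical.

```python
def sum_of_products(a, x, count=40):
    """Mirror the slide's loop: Y = sum of a[n] * x[n] over `count` terms."""
    y = 0                    # A4, assumed cleared beforehand
    i = 0                    # stands in for the post-incremented pointers A5 and A6
    remaining = count        # A2, the loop count
    while True:
        y += a[i] * x[i]     # LDH, LDH, MPY, ADD
        i += 1
        remaining -= 1       # SUB
        if remaining == 0:   # [A2] B loop: branch back only while A2 != 0
            break
    return y                 # the value STH would write (STH keeps only the low 16 bits)
```

Like the assembly, the body always runs at least once because the loop count is tested after the decrement.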

16 Increasing the processing power!
How can we add more processing power to this processor?
[Diagram: units .S1, .M1, .L1 and .D1 with register file A (A0 to A15, 32 bits) and data memory.]

17 Increasing the processing power!
(1) Increase the clock frequency.
(2) Increase the number of processing units.
[Diagram: units .S1, .M1, .L1 and .D1 with register file A (A0 to A15, 32 bits) and data memory.]

18 To increase the processing power, this processor has two sides (A and B, or 1 and 2).
[Diagram: data memory; units .S1, .M1, .L1 and .D1 with register file A (A0 to A15, 32 bits); units .S2, .M2, .L2 and .D2 with register file B (B0 to B15, 32 bits).]

19 Can the two sides exchange operands in order to increase performance?
[Diagram: data memory; units .S1, .M1, .L1 and .D1 with register file A (A0 to A15, 32 bits); units .S2, .M2, .L2 and .D2 with register file B (B0 to B15, 32 bits).]

20 The answer is YES, but there are limitations.
- To exchange operands between the two sides, some cross paths or links are required.
What is a cross path?
- A cross path links one side of the CPU to the other.
- There are two types of cross paths:
  - Data cross paths.
  - Address cross paths.

21 Data Cross Paths
- Data cross paths can also be referred to as register file cross paths.
- These cross paths allow operands from one side to be used by the other side.
- There are only two cross paths:
  - 1X, which conveys data from side B to side A.
  - 2X, which conveys data from side A to side B.

22 TMS320C67x Data-Path
- Data cross paths only apply to the .L, .S and .M units.
- The data cross paths are very useful; however, there are some limitations in their use.

23 Data Cross Path - Limitations
(1) The destination register must be on the same side as the unit.
(2) Source registers: up to one cross path per execute packet per side.
Execute packet: a group of instructions that execute simultaneously.
[Diagram: units .L1, .M1 and .S1 with operands <src>, <src>, <dst>, register files A and B, and the cross paths 1X and 2X.]

24 Data Cross Path - Limitations
e.g.:
       ADD .L2x A0,A1,B2
       MPY .M1x A0,B6,A9
       SUB .S1x A8,B2,A8
    || ADD .L1x A0,B0,A2
|| means that the SUB and ADD belong to the same execute packet and therefore execute simultaneously.
[Diagram: units .L1, .M1 and .S1 with operands <src>, <src>, <dst>, register files A and B, and the cross paths 1X and 2X.]

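25 Data Cross Path - Limitations
e.g.:
       ADD .L2x A0,A1,B2
       MPY .M1x A0,B6,A9
       SUB .S1x A8,B2,A8
    || ADD .L1x A0,B0,A2
NOT VALID! The SUB and the ADD run in parallel and both read through the 1X cross path, but only one cross-path read per side is allowed in an execute packet.
[Diagram: units .L1, .M1 and .S1 with operands <src>, <src>, <dst>, register files A and B, and the cross paths 1X and 2X.]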
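The two data cross path limitations can be expressed as a small packet checker. This is a sketch of the rules exactly as stated on the slides, not a model of the real assembler: it only looks at register operands, and the names `side_of_register`, `side_of_unit` and `check_execute_packet`, as well as the tuple layout, are hypothetical.

```python
def side_of_register(reg):
    """'A' registers belong to side 1, 'B' registers to side 2."""
    return 1 if reg[0] == "A" else 2

def side_of_unit(unit):
    """'.L1' -> 1, '.S2x' -> 2 (the trailing 'x' only marks cross-path use)."""
    return int(unit.rstrip("xX")[-1])

def check_execute_packet(packet):
    """packet: list of (unit, source_registers, destination) issued in parallel.

    Returns a list of violations; an empty list means both limitations hold.
    """
    problems = []
    cross_reads = {1: 0, 2: 0}
    for unit, sources, dst in packet:
        side = side_of_unit(unit)
        # Limitation (1): destination on the same side as the unit.
        if side_of_register(dst) != side:
            problems.append(f"{unit}: destination {dst} is not on side {side}")
        # An instruction with a source on the other side uses that side's cross path.
        if any(side_of_register(s) != side for s in sources):
            cross_reads[side] += 1
    # Limitation (2): at most one cross path per execute packet per side.
    for side, count in cross_reads.items():
        if count > 1:
            problems.append(f"side {side}: {count} cross-path reads in one execute packet")
    return problems
```

Passing the parallel pair `SUB .S1x A8,B2,A8 || ADD .L1x A0,B0,A2` as one packet reports a violation on side 1, whereas `MPY .M1x A0,B6,A9` on its own passes.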

26 Data Cross Paths for both sides
[Diagram: units .L1, .M1 and .S1 on side A and .L2, .M2 and .S2 on side B, each with operands <src>, <src>, <dst>, linked by the cross paths 1X and 2X.]

27 Address cross paths
    LDW .D1T1 *A0,A5
    STW .D1T1 A5,*A0
(1) The pointer must be on the same side as the unit.
[Diagram: .D1 with register file A, address and data paths.]

28 Load or store to either side
    LDW .D1T1 *A0,A5
    LDW .D1T2 *A0,B5
DA1 = T1, DA2 = T2
[Diagram: .D1 with pointer *A0; Data1 goes to A5 and Data2 goes to B5.]

29 Standard Parallel Loads
       LDW .D1T1 *A0,A5
    || LDW .D2T2 *B0,B5
DA1 = T1, DA2 = T2
[Diagram: .D1 with pointer *A0 loading A5; .D2 with pointer *B0 loading B5.]

30 Parallel Load/Store using address cross paths
       LDW .D1T2 *A0,B5
    || STW .D2T1 A5,*B0
DA1 = T1, DA2 = T2
[Diagram: .D1 with pointer *A0 loading B5; .D2 with pointer *B0 storing A5.]

31 Fill the blanks... Does this work?
       LDW .D1__ *A0,B5
    || STW .D2__ B6,*B0
DA1 = T1, DA2 = T2
[Diagram: .D1 with pointer *A0; .D2 with pointer *B0.]

32 Not Allowed! Parallel accesses: both cross or neither cross.
       LDW .D1T2 *A0,B5
    || STW .D2T2 B6,*B0
DA2 = T2
[Diagram: .D1 with pointer *A0 and .D2 with pointer *B0; B5 and B6 are both on side B.]
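The address cross path rules from the last few slides can be collected into one check. The sketch below is an assumption-laden illustration: the function name `check_parallel_accesses` and the tuple layout are hypothetical, and the rule that T1 serves side A data and T2 serves side B data is inferred from the examples (DA1 = T1, DA2 = T2).

```python
def check_parallel_accesses(accesses):
    """accesses: list of (unit, data_path, pointer, data_register) tuples for
    loads/stores issued in parallel, e.g. (".D1", "T2", "A0", "B5") for
    LDW .D1T2 *A0,B5. Returns a list of violations (empty if none).
    """
    def side(reg):
        return 1 if reg[0] == "A" else 2

    problems = []
    crossing = []
    for unit, path, pointer, data_reg in accesses:
        unit_side = int(unit[-1])
        # The pointer must be on the same side as the .D unit.
        if side(pointer) != unit_side:
            problems.append(f"{unit}: pointer {pointer} is not on side {unit_side}")
        # Assumption: T1 carries data for side A registers, T2 for side B.
        if int(path[-1]) != side(data_reg):
            problems.append(f"{unit}{path}: data path does not match {data_reg}")
        crossing.append(side(data_reg) != unit_side)
    # Parallel accesses: both cross or neither cross.
    if len(set(crossing)) > 1:
        problems.append("parallel accesses: one crosses and the other does not")
    return problems
```

The standard parallel loads (slide 29) and the crossed load/store pair (slide 30) both pass, while the pair on slide 32 is flagged because only the load crosses.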

33 Conditions Don’t Use Cross Paths
If a conditional register comes from the opposite side, it does NOT use a data or address cross path.
Examples:
    [B2] ADD .L1 A2,A0,A4
    [A1] LDW .D2 *B0,B5

34 ‘C6x Data-Path - Summary
[Diagram: full ‘C67x CPU datapath; see the CPU Reference Guide, p. 2-2.]

35 Cross Paths - Summary
Data:
- Destination register on same side as unit.
- Source registers: up to one cross path per execute packet per side.
- Use “x” to indicate cross path.
Address:
- Pointer must be on same side as unit.
- Data can be transferred to/from either side.
- Parallel accesses: both cross or neither cross.
Conditionals don’t use cross paths.