Details .L and .S units TMS320C6000.

Details of the .L and .S units
[Figure: side A of the CPU with data memory, register file A (A0 to A15, 32 bits) and the .D, .M, .L and .S units; the .L and .S units are general-purpose arithmetic and logic units that take their operands from the register file. Instruction form: instr .unit <src>, <src>, <dst>]

Operands: 32/40-bit registers, 5-bit constants
Operands can be:
- 5-bit constants (or 16-bit constants for MVKL and MVKH)
- 32-bit registers
- 40-bit registers
However, we have seen that the registers are only 32 bits wide. So where do the 40-bit registers come from?

Operands: 40-bit registers
A 40-bit register is obtained by concatenating two registers. Three conditions must be respected:
1. The registers must be from the same side.
2. The lower register must be even-numbered and the upper register odd-numbered.
3. The registers must be consecutive.

Operands: 40-bit registers
All combinations of 40-bit registers (written odd:even):
    A1:A0  A3:A2  A5:A4  A7:A6  A9:A8  A11:A10  A13:A12  A15:A14
    B1:B0  B3:B2  B5:B4  B7:B6  B9:B8  B11:B10  B13:B12  B15:B14
In each pair the even register holds the low 32 bits and the odd register supplies the upper 8 bits, giving a 40-bit register.
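The pairing rules and the 32 + 8 bit split can be captured in a few lines. The following Python sketch is a hypothetical model, not a TI tool: `is_valid_pair`, `all_pairs`, `read_pair` and `write_pair` are invented names. It assumes, as the slide's annotation indicates, that the even register holds bits 31..0 and the low 8 bits of the odd register hold bits 39..32; it also assumes that writing a 40-bit result leaves the upper 24 bits of the odd register zero.

```python
# Hypothetical model of C6000 40-bit register pairs (odd:even, same side).

def is_valid_pair(hi: str, lo: str) -> bool:
    """True if hi:lo (e.g. "A1", "A0") names a legal 40-bit register pair."""
    if hi[0] != lo[0] or hi[0] not in "AB":   # both registers on the same side
        return False
    n_hi, n_lo = int(hi[1:]), int(lo[1:])
    return (
        0 <= n_lo <= 14
        and n_lo % 2 == 0                     # lower register is even
        and n_hi == n_lo + 1                  # upper register is the next (odd) one
    )


def all_pairs():
    """List every legal pair in odd:even notation, side A first."""
    return [f"{s}{n + 1}:{s}{n}" for s in "AB" for n in range(0, 16, 2)]


def read_pair(odd_word: int, even_word: int) -> int:
    """Combine the two 32-bit words into a signed 40-bit value."""
    raw = ((odd_word & 0xFF) << 32) | (even_word & 0xFFFF_FFFF)  # 8 + 32 bits
    return raw - (1 << 40) if raw & (1 << 39) else raw          # sign bit is bit 39


def write_pair(value: int):
    """Split a 40-bit value into (odd_word, even_word); odd word's top 24 bits are 0."""
    raw = value & ((1 << 40) - 1)             # keep 40 bits, two's complement
    return raw >> 32, raw & 0xFFFF_FFFF


print(all_pairs()[:2])                                      # ['A1:A0', 'A3:A2']
print(is_valid_pair("A5", "A4"), is_valid_pair("A2", "A1"))  # True False
print(read_pair(*write_pair(-1)))                           # -1
```

`read_pair` deliberately ignores the upper 24 bits of the odd word, so only the 8 bits that belong to the 40-bit value affect the result.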

Operands: 32/40-bit registers, 5-bit constants
General form: instr .unit <src>, <src>, <dst>
On the .L and .S units, a source operand can be a 32-bit register, a 40-bit register or a 5-bit constant; the destination can be a 32-bit or a 40-bit register.

Operands: 32/40-bit registers, 5-bit constants (.L unit examples)
Form: instr .L <src>, <src>, <dst>, where a source can be a 32-bit register, a 40-bit register or a 5-bit constant, and the destination can be a 32-bit or a 40-bit register.

        OR   .L1  A0, A1, A2
        ADD  .L2  -5, B3, B4
        ADD  .L1  A2, A3, A5:A4
        SUB  .L1  A2, A5:A4, A5:A4
        ADD  .L2  3, B9:B8, B9:B8
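To see what the 40-bit forms buy, the five .L examples can be replayed on a toy register model. This is a hypothetical Python sketch, not TI's simulator: `RegFile`, `add`, `sub` and `or_` are invented names, unit sides (.L1/.L2) are not modelled, and results are assumed to wrap modulo 2^32 or 2^40 (ADD and SUB; saturating forms are not modelled). Constants are checked against the signed 5-bit range -16 to 15.

```python
# Hypothetical toy model of .L-unit ADD/SUB/OR with 32- and 40-bit operands.
MASK32 = (1 << 32) - 1


def signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


class RegFile:
    """Toy A/B register file: sixteen 32-bit registers per side."""

    def __init__(self):
        self.r = {f"{s}{i}": 0 for s in "AB" for i in range(16)}

    def get(self, name: str) -> int:
        if ":" in name:                        # 40-bit pair, e.g. "A5:A4"
            hi, lo = name.split(":")
            return signed(((self.r[hi] & 0xFF) << 32) | self.r[lo], 40)
        return signed(self.r[name], 32)

    def set(self, name: str, value: int) -> None:
        if ":" in name:
            hi, lo = name.split(":")
            raw = value & ((1 << 40) - 1)      # wrap to 40 bits
            self.r[lo], self.r[hi] = raw & MASK32, raw >> 32
        else:
            self.r[name] = value & MASK32      # wrap to 32 bits


def operand(rf: RegFile, src):
    """Read a register, or check that an int fits a signed 5-bit constant."""
    if isinstance(src, int):
        if not -16 <= src <= 15:
            raise ValueError(f"{src} does not fit in a 5-bit signed constant")
        return src
    return rf.get(src)


def add(rf, src1, src2, dst):
    rf.set(dst, operand(rf, src1) + operand(rf, src2))


def sub(rf, src1, src2, dst):                  # dst = src1 - src2
    rf.set(dst, operand(rf, src1) - operand(rf, src2))


def or_(rf, src1, src2, dst):
    rf.set(dst, operand(rf, src1) | operand(rf, src2))


rf = RegFile()
rf.set("A2", 0x7FFF_FFFF)
rf.set("A3", 1)
add(rf, "A2", "A3", "A5:A4")   # ADD .L1 A2, A3, A5:A4
print(rf.get("A5:A4"))         # 2147483648: no overflow in 40 bits
add(rf, "A2", "A3", "A6")      # the same sum into a 32-bit register
print(rf.get("A6"))            # -2147483648: wrapped
```

The last two lines show the point of the 40-bit destination: the same sum that overflows a 32-bit register is represented exactly in a register pair.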

Register to Register Data Transfer
To move the contents of one register (A or B) to another register (A or B), use the move instruction MV, e.g.:
    MV A0, B0
    MV B6, B7
To move the contents of a control register to a general register (A or B), or vice versa, use the MVC instruction, e.g.:
    MVC IFR, A0
    MVC A0, IRP

Increasing the processing power TMS320C6000

Code Review (using side A only)
$$Y = \sum_{n=1}^{40} a_n x_n$$

            MVK   .S1   40, A2        ; A2 = 40, loop count
    loop:   LDH   .D1   *A5++, A0     ; A0 = a(n)
            LDH   .D1   *A6++, A1     ; A1 = x(n)
            MPY   .M1   A0, A1, A3    ; A3 = a(n) * x(n)
            ADD   .L1   A4, A3, A4    ; Y = Y + A3
            SUB   .L1   A2, 1, A2     ; decrement loop count
    [A2]    B     .S1   loop          ; if A2 ≠ 0, branch
            STH   .D1   A4, *A7       ; *A7 = Y

Note: Assume that A4 was previously cleared and the pointers are initialised. Assume that A2 is B0.
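As a cross-check of what the loop computes, here is a hypothetical Python reference model of it (the name `dot_product_side_a` is invented). It assumes 16-bit signed samples (LDH), a 16 x 16 multiply giving 32 bits (MPY), and a 32-bit accumulator in A4 that wraps on overflow. Because STH stores only the low 16 bits of A4, the model returns that halfword separately. Pipeline timing and delay slots are not modelled.

```python
# Hypothetical reference model of the side-A dot-product loop (timing ignored).

def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def dot_product_side_a(a, x, count=40):
    """Return (A4, stored halfword) for Y = sum of a(n) * x(n), n = 1..count."""
    if len(a) < count or len(x) < count:
        raise ValueError("need at least `count` samples in each array")
    y = 0                                  # A4, assumed cleared beforehand
    for n in range(count):                 # A2 counts down from count to 0
        an = to_signed(a[n], 16)           # LDH: sign-extended halfword
        xn = to_signed(x[n], 16)
        prod = an * xn                     # MPY: 16 x 16 -> 32 bits
        y = to_signed(y + prod, 32)        # ADD: 32-bit accumulate
    stored = to_signed(y, 16)              # STH writes only bits 15..0 of A4
    return y, stored


print(dot_product_side_a([1000] * 40, [1000] * 40))   # (40000000, 23040)
```

The example shows why the store width matters: the 32-bit sum is correct, but the halfword written by STH keeps only its low 16 bits.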

Increasing the processing power!
[Figure: side A of the CPU with data memory, register file A (A0 to A15, 32 bits) and the .D, .M, .L and .S units]
How can we add more processing power to this processor?
- Increase the clock frequency.
- Increase the number of processing units.

Increasing the processing power!
To increase the processing power, this processor has two sides.
[Figure: register file A (A0 to A15, 32 bits) with units .S1, .M1, .L1 and .D1; register file B (B0 to B15, 32 bits) with units .S2, .M2, .L2 and .D2; operand exchange between the two sides; shared data memory]

Increasing the processing power!
To exchange operands between the two sides, some cross paths (links) are required.
What is a cross path? A cross path links one side of the CPU to the other. There are two types of cross paths:
- data cross paths
- address cross paths

Data Cross Paths
Data cross paths can also be referred to as register file cross paths. They allow operands from one side to be used by the other side. There are only two data cross paths:
- 1X: conveys data from side B to side A
- 2X: conveys data from side A to side B

TMS320C67x Data-Path
Data cross paths apply only to the .L, .S and .M units. They are very useful, but there are some limitations on their use.

Data Cross Path - Limitations
[Figure: side A units .L1, .M1 and .S1 with their <src> and <dst> operands, and the 1X and 2X cross paths to and from register file B]
(1) The destination register must be on the same side as the unit.
(2) Source registers: up to ONE data cross path per execute packet per side.
Execute packet: a group of instructions that execute simultaneously.

Data Cross Path - Limitations

        ADD  .L2x  A0, A1, B2
        MPY  .M1x  A0, B6, A9
        SUB  .S1x  A8, B2, A8
    ||  ADD  .L1x  A0, B0, A2

The || means that the SUB and the ADD belong to the same execute packet and therefore execute simultaneously. Not valid! Both are side A instructions that read from side B, so they would need the single 1X cross path at the same time.

Data Cross Path - Limitations

        SUB  .S1x  A8, B2, A8
    ||  ADD  .L2x  A0, A0, B5

Here the SUB on side A uses the 1X path and the ADD on side B uses the 2X path, so each cross path is used only once in the execute packet.
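The two limitations can be checked mechanically. The Python sketch below is a hypothetical checker, not part of TI's tools; `parse` and `check_packet` are invented names. It handles only three-operand .L/.S/.M instructions written as on these slides, treats any source register on the other side as a cross-path read, and counts, per side, how many instructions in the execute packet need that side's cross path (1X for side A units, 2X for side B units).

```python
# Hypothetical checker for the data cross path rules (.L/.S/.M only).
import re


def parse(instr: str):
    """Split e.g. 'SUB .S1x A8, B2, A8' into (unit side, source regs, destination)."""
    m = re.match(r"\s*\w+\s+\.[LSM]([12])[xX]?\s+(.+)", instr)
    if m is None:
        raise ValueError(f"unsupported instruction: {instr!r}")
    side = "A" if m.group(1) == "1" else "B"
    ops = [o.strip() for o in m.group(2).split(",")]
    sources = [o for o in ops[:-1] if re.match(r"[AB]\d+", o)]  # skip constants
    return side, sources, ops[-1]


def check_packet(packet):
    """Return the rule violations found in one execute packet (empty list = OK)."""
    errors = []
    cross_users = {"A": 0, "B": 0}   # side A units share 1X, side B units share 2X
    for instr in packet:
        side, sources, dst = parse(instr)
        if dst[0] != side:                                   # rule (1)
            errors.append(f"{instr}: destination {dst} is not on side {side}")
        if any(src[0] != side for src in sources):
            cross_users[side] += 1
    for side, n in cross_users.items():                      # rule (2)
        if n > 1:
            errors.append(f"side {side}: {n} instructions need the same cross path")
    return errors


print(check_packet(["SUB .S1x A8, B2, A8", "ADD .L1x A0, B0, A2"]))  # side A conflict
print(check_packet(["SUB .S1x A8, B2, A8", "ADD .L2x A0, A0, B5"]))  # []
```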

Address paths
The pointer must be on the same side as the unit.
[Figure: .D1 unit connected to register file A by its address and data paths]

        LDW  .D1T1  *A0, A5
        STW  .D1T1  A5, *A0

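Address Cross Paths
[Figure: .D1 and .D2 units with their address paths (DA1 = T1, DA2 = T2) and data paths to register file A (A5) and register file B (B5); .D1 addresses memory through *A0]

        LDW  .D1T1  *A0, A5
        LDW  .D1T2  *A0, B5

In the second load the address still comes from side A (.D1, *A0), but the T2 path delivers the loaded word to B5 on side B.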

Standard Parallel Loads
[Figure: .D1 loads A5 through *A0 on path T1; .D2 loads B5 through *B0 on path T2]

        LDW  .D1T1  *A0, A5
    ||  LDW  .D2T2  *B0, B5

Parallel Load/Store using Address Cross Paths
[Figure: .D1 loads B5 through *A0 on path T2; .D2 stores A5 through *B0 on path T1]

        LDW  .D1T2  *A0, B5
    ||  STW  .D2T1  A5, *B0

Fill in the blanks... Does this work?

        LDW  .D1__  *A0, B5
    ||  STW  .D2__  B6, *B0

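Not Allowed! Parallel accesses: both cross or neither cross.

        LDW  .D1T2  *A0, B5
    ||  STW  .D2T2  B6, *B0

B5 and B6 are both on side B, so both accesses need the T2 path: the load crosses (.D1 with T2) while the store does not (.D2 with T2).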
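The address path rules can be sketched the same way. This is again a hypothetical Python checker with invented names (`parse_mem`, `mem_packet_ok`): it accepts only LD*/ST* instructions written with an explicit .DnTm suffix, as on these slides, and checks that the pointer is on the unit's side, that the data register is on the side of the chosen T path, and that parallel accesses either all cross or all stay on their own side.

```python
# Hypothetical checker for the address path / address cross path rules.
import re

SIDE = {"1": "A", "2": "B"}


def parse_mem(instr: str):
    """'LDW .D1T2 *A0,B5' -> (unit side, data-path side, pointer reg, data reg)."""
    m = re.match(r"\s*(LD|ST)\w*\s+\.D([12])T([12])\s+(.+)", instr)
    if m is None:
        raise ValueError(f"unsupported instruction: {instr!r}")
    _, unit, path, operands = m.groups()
    pointer = data = None
    for op in operands.split(","):
        reg = re.search(r"[AB]\d+", op).group()
        if op.strip().startswith("*"):
            pointer = reg        # address operand, e.g. *A0 or *B0
        else:
            data = reg           # register loaded or stored
    return SIDE[unit], SIDE[path], pointer, data


def mem_packet_ok(packet) -> bool:
    """Check the address path rules for one execute packet."""
    crosses = []
    for instr in packet:
        unit, path, pointer, data = parse_mem(instr)
        if pointer[0] != unit or data[0] != path:
            return False         # pointer on unit's side; data on T path's side
        crosses.append(unit != path)       # T path on the other side = crossing
    return all(crosses) or not any(crosses)   # both cross or neither cross


print(mem_packet_ok(["LDW .D1T2 *A0,B5", "STW .D2T1 A5,*B0"]))  # True
print(mem_packet_ok(["LDW .D1T2 *A0,B5", "STW .D2T2 B6,*B0"]))  # False
```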

Conditions Don’t Use Cross Paths
If a condition register comes from the opposite side, it does NOT use a data or address cross path. Examples:

    [B2]  ADD  .L1  A2, A0, A4
    [A1]  LDW  .D2  *B0, B5
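A tool that counts cross-path reads therefore has to skip the condition register. This hypothetical Python helper (`cross_path_reads` is an invented name) strips a leading predicate such as [B2] or [!A1] before looking for source registers on the opposite side. It is a simplification: it treats the last operand as the destination and is not meant for stores.

```python
# Hypothetical helper: cross-path source reads, ignoring the condition register.
import re


def cross_path_reads(instr: str):
    """Return the source registers an instruction reads from the other side.

    The condition register in [..] is removed first, because conditions do
    not use a data or address cross path.
    """
    body = re.sub(r"^\s*\[!?[AB]\d+\]\s*", "", instr)   # drop [B2], [!A1], ...
    m = re.match(r"\w+\s+\.[LSMD]([12])\w*\s+(.+)", body)
    if m is None:
        raise ValueError(f"unsupported instruction: {instr!r}")
    unit_side = "A" if m.group(1) == "1" else "B"
    operands = [o.strip() for o in m.group(2).split(",")]
    reads = []
    for src in operands[:-1]:                # last operand is the destination
        reg = re.search(r"[AB]\d+", src)
        if reg and reg.group()[0] != unit_side:
            reads.append(reg.group())
    return reads


print(cross_path_reads("[B2] ADD .L1 A2,A0,A4"))   # []
print(cross_path_reads("[A1] LDW .D2 *B0,B5"))     # []
print(cross_path_reads("ADD .L1x A2,B0,A4"))       # ['B0']
```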

‘C6x Data-Path - Summary
[Figure: full ‘C67x CPU data path, from the CPU Reference Guide, page 2-2]

Cross Paths - Summary
Data:
- Destination register on same side as unit.
- Source registers: up to one cross path per execute packet per side.
- Use "x" to indicate a cross path.
Address:
- Pointer must be on same side as unit.
- Data can be transferred to/from either side.
- Parallel accesses: both cross or neither cross.
Conditionals:
- Conditionals don't use cross paths.