Microcomputer Systems 1

Digital Systems: Hardware Organization and Design, 9/18/2018
Microcomputer Systems 1: Blackfin BF533/2/1 DSP Architecture
Architecture of a Representative 32-Bit Processor

Blackfin ADSP-BF533/2/1 Architecture Overview
- Core Registers
- ALU, MAC, Shifter
- Data Addressing Modes
- Program Sequencer
- Event Controller
- Peripherals
- Instruction Set Overview
- Memory Architecture
- Cache
Veton Këpuska, 18 September 2018

Blackfin Architecture Details
Blackfin Core Details
- Registers
- ALU, MAC, Shifter
- Sequencer, Pipeline, Event Controller
Blackfin Memory
- Memory Architecture
- Cache
Peripherals
- General Peripherals:
  - Parallel Peripheral Interface (PPI)
  - Serial Ports (SPORTs)
  - Serial Peripheral Interface (SPI)
  - General Purpose Timers
  - Universal Asynchronous Receiver Transmitter (UART)
  - Twin-Wire Interface (TWI)
  - Real Time Clock (RTC)
  - Watchdog Timer (WDT)
  - Ethernet, CAN
- DMA
Peripherals listed in blue on the slide are included in the BF531/532/533 family.

Features
- Integrated instruction set architecture: a single instruction set for signal processing and control
- Programmable interrupt levels: real-time tasks get the highest priority level
- Memory protection with an MMU: regions of memory can be protected from access
- Networked peripherals in addition to high-speed connectivity to ADC, DAC, and video peripherals
- Unified, byte-addressable address space
- Support for User and Supervisor modes
- Robust ALU including both signal processing functions and traditional MCU/MPU functions

ADSP-BF533 Processor
[Figure: ADSP-BF533 Processor]

Processor Core Architecture
Data Arithmetic Unit
- 2 16-bit multipliers
- 2 40-bit accumulators
- 2 40-bit Arithmetic Logic Units (ALUs)
- 1 40-bit shifter
- 4 8-bit video ALUs
- Performs operations on 8-bit, 16-bit, and 32-bit data from the register file (eight 32-bit registers, R0-R7)

Computational Units
- 2 ALUs
- 2 MACs
- Shifter
- Set of video ALUs

Computational Units
[Figure: Data Arithmetic Unit. Two 32-bit load buses (LD0, LD1) and one 32-bit store bus (SD); data registers R0-R7, each split into halves Rn.H and Rn.L; barrel shifter; 40-bit accumulators A0 and A1.]

Data Formats
ADSP-BF53x processors are primarily 16-bit fixed-point machines.
- Most operations use two's complement number representation; others use unsigned numbers or binary strings.
- Additional instructions support 32-bit integer arithmetic, 8-bit arithmetic, and block floating point.
For further details on number formats, see Appendix D, "Numeric Formats", of the Hardware Reference Manual for the BF53x family.

Binary String
- 16 bits treated as a bit pattern
- Computations using this format: NOT, AND, OR, XOR
- ALU operations on this data type make no provision for sign bit or binary point placement.

Binary Numbers
Numbers are represented in 16 or 32 bits: 2^16 = 65,536 or 2^32 = 4,294,967,296 bit patterns.
- Unsigned integer format
  - Stored value, 16-bit: 0..65,535
  - Stored value, 32-bit: 0..4,294,967,295
- Signed integer format (two's complement)
  - Stored value, 16-bit: -32,768..32,767
  - Stored value, 32-bit: -2,147,483,648..2,147,483,647
Most ADSP-BF53x processor operations presume or support two's complement arithmetic.

Binary Numbers
Fractional representation: 1.15
- 1 sign bit (MSB)
- 15 fractional bits
- Represents values from -1 to 1 - 2^-15 (about 0.999969)

Blackfin Fixed-Point Representation
ADI DSPs, including Blackfin, use a fractional format representation. 16-bit example in 1.15 format:

Bit weights, MSB to LSB:
-2^0  2^-1  2^-2  2^-3  2^-4  2^-5  2^-6  2^-7  2^-8  2^-9  2^-10  2^-11  2^-12  2^-13  2^-14  2^-15

HEX    BINARY                 DECIMAL
7FFF   0111 1111 1111 1111     0.999969
0001   0000 0000 0000 0001     0.000031
FFFF   1111 1111 1111 1111    -0.000031
8000   1000 0000 0000 0000    -1.000000
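The 1.15 interpretation can be checked with a few lines of Python. This is only an illustrative sketch: the function names q15_to_float and float_to_q15 are made up for this example and are not part of any Blackfin toolchain.

```python
def q15_to_float(word: int) -> float:
    """Interpret a 16-bit pattern as a signed 1.15 fraction."""
    word &= 0xFFFF
    if word & 0x8000:          # MSB set: the sign bit carries weight -2^0
        word -= 0x10000
    return word / 32768.0      # 15 fractional bits


def float_to_q15(value: float) -> int:
    """Quantize a real value to 1.15, saturating at the format limits."""
    scaled = round(value * 32768)
    scaled = max(-32768, min(32767, scaled))
    return scaled & 0xFFFF


for pattern in (0x7FFF, 0x0001, 0xFFFF, 0x8000):
    print(f"{pattern:04X} -> {q15_to_float(pattern):+.6f}")
```

Note that +1.0 is not representable in 1.15, so float_to_q15(1.0) saturates to 0x7FFF.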

Register Files
Blackfin processors are register-intensive devices:
- All computations are performed on data contained in registers.
- All peripherals are set up using registers.
- Memory is accessed using pointers in address registers.
There are two types of Blackfin processor registers:
- Core registers
- Memory-Mapped Registers (MMRs)

Blackfin Core Registers
[Figure: core register map. Data Register File: R0-R7 (32 bits each, with halves Rn.H and Rn.L). Accumulators: A0 and A1 (40 bits each: extension AnX, An.H, An.L). Pointer Register File: P0-P5, FP, SP, USP. Address registers (32 bits each): I0-I3, L0-L3, B0-B3, M0-M3.]

Register Files of Computational Unit
Data Register File
- Receives operands from the data buses
- Stores computational unit results
Pointer Register File
- Contains pointers for addressing operations
Data Address Generator (DAG) registers
- Dedicated registers that manage zero-overhead circular buffers for DSP operations

Blackfin Core Registers
Data Registers: R0-R7
- Each 32-bit register (for example R0) may be used as two independent 16-bit halves (R0.L and R0.H).
- Three 32-bit buses connect the register file to the L1 data memory: 2 read buses and 1 write bus.
- Transfers between the Data Register File and data memory can move up to four 16-bit words in each cycle.
Accumulator Registers: A0, A1
- Two 40-bit accumulator registers, accessible as:
  - An.L, An.H: 16 bits each
  - An.W: lower 32 bits
  - An.X: 8-bit extension
  - An: full 40-bit register

Blackfin Core Registers
Pointer Registers (32-bit):
- 6-entry P register file, P[5:0]
- Frame Pointer (FP): points to the current procedure's activation record
- Stack Pointer (SP): points to the last used location on the runtime stack
Note: P-registers are 32 bits wide. Although P-registers are primarily used for address calculations, they may also be used for general integer arithmetic with a limited set of arithmetic operations, for instance to maintain counters. However, unlike the Data registers, P-register arithmetic does not affect the Arithmetic Status (ASTAT) register status flags.

Blackfin Core Registers
DSP instructions primarily use the 32-bit Data Address Generator (DAG) register set for addressing. The DAG register set consists of:
- I[3:0], Index addresses: each contains the address of an 8-bit byte location in memory, used as the effective address.
- M[3:0], Modify values: each contains an offset that is added to or subtracted from one of the Index registers.
- B[3:0], Base addresses: each contains the address of an 8-bit byte location in memory.
- L[3:0], Length values: together with a B register, each defines a circular buffer (B is the starting address, L is the buffer length in bytes).
The I, B, and L registers have fixed associations (for example, I0 goes with B0 and L0). An M register may be associated with any I register.
Core registers are accessed directly by name.
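The interaction of I, M, B, and L in a circular buffer can be modeled roughly as follows. This is a behavioral sketch, not the hardware's exact logic: the function name dag_post_modify is hypothetical, the model assumes |M| <= L so a single correction is enough, and it assumes that L = 0 means circular addressing is disabled.

```python
def dag_post_modify(i: int, m: int, b: int, l: int) -> int:
    """Return the new Index value after I += M with circular wrap-around.

    Assumptions of this sketch: |m| <= l, and l == 0 selects plain
    linear addressing (no wrap).
    """
    i += m
    if l == 0:
        return i
    if i >= b + l:      # stepped past the end of the buffer
        i -= l
    elif i < b:         # stepped before the start of the buffer
        i += l
    return i


# Walk an 8-byte buffer at base 0x1000 in 4-byte steps.
index = 0x1000
for _ in range(4):
    print(hex(index))
    index = dag_post_modify(index, 4, 0x1000, 8)
```

The wrap happens as part of the address update itself, which is why the slides call these buffers zero-overhead: no separate compare-and-branch is needed in the loop.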

Register File Instruction Summary
Notation used in the instruction summary:
- Allreg denotes R[7:0], P[5:0], SP, FP, I[3:0], M[3:0], B[3:0], L[3:0], A0.X, A0.W, A1.X, A1.W, ASTAT, RETS, RETI, RETX, RETN, RETE, LC[1:0], LT[1:0], LB[1:0], USP, SEQSTAT, SYSCFG, CYCLES, and CYCLES2.
- An denotes either ALU result register A0 or A1.
- Dreg denotes any Data Register File register.
- Sysreg denotes the system registers: ASTAT, SEQSTAT, SYSCFG, RETI, RETX, RETN, RETE, RETS, LC[1:0], LT[1:0], LB[1:0], CYCLES, and CYCLES2.
- Preg denotes any Pointer register, FP, or SP register.
- Dreg_even denotes R0, R2, R4, or R6.

Register File Instruction Summary
Notation used in the instruction summary (continued):
- Dreg_odd denotes R1, R3, R5, or R7.
- DPreg denotes any Data Register File register or any Pointer register, FP, or SP register.
- Dreg_lo denotes the lower 16 bits of any Data Register File register.
- Dreg_hi denotes the upper 16 bits of any Data Register File register.
- An.L denotes the lower 16 bits of Accumulator A0.W or A1.W.
- An.H denotes the upper 16 bits of Accumulator A0.W or A1.W.
- Dreg_byte denotes the low-order 8 bits of each Data register.
- Option (X) denotes sign extended.
- Option (Z) denotes zero extended.
- * indicates the flag may be set or cleared, depending on the result of the instruction.
- ** indicates the flag is cleared.
- – indicates no effect.

Data Types
The ADSP-BF53x processor supports:
- 32-bit words
- 16-bit half words
- 8-bit bytes
32-bit and 16-bit data can be integer or fractional. Bytes are integers only.

Data Types
Some instructions manipulate data in the registers by:
- Zero-extending unsigned data
- Sign-extending signed 16-bit and 8-bit data
Other instructions manipulate data as 32-bit numbers: two 16-bit or four 8-bit values can be manipulated as one 32-bit value.
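As an illustration of the two extension modes, the sketch below widens an 8-bit or 16-bit value to a 32-bit register image. The helper names zero_extend and sign_extend are invented for this example.

```python
def zero_extend(value: int, bits: int) -> int:
    """Widen an unsigned `bits`-wide value to 32 bits (upper bits are 0)."""
    return value & ((1 << bits) - 1)


def sign_extend(value: int, bits: int) -> int:
    """Widen a two's complement `bits`-wide value to a 32-bit pattern."""
    mask = (1 << bits) - 1
    value &= mask
    if value & (1 << (bits - 1)):          # sign bit set: fill upper bits with 1s
        value |= 0xFFFFFFFF & ~mask
    return value


print(f"{zero_extend(0x80, 8):08X}")   # byte 0x80 as unsigned
print(f"{sign_extend(0x80, 8):08X}")   # byte 0x80 as signed (-128)
```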

Supported Data Formats
Each format is shown as its representation in memory and its representation in a 32-bit register (d = data bit, s = sign bit, 0 = zero fill).

32.0 Unsigned Word
  Memory:   dddd dddd dddd dddd dddd dddd dddd dddd
  Register: dddd dddd dddd dddd dddd dddd dddd dddd
32.0 Signed Word
  Memory:   sddd dddd dddd dddd dddd dddd dddd dddd
  Register: sddd dddd dddd dddd dddd dddd dddd dddd
16.0 Unsigned Half Word
  Memory:   dddd dddd dddd dddd
  Register: 0000 0000 0000 0000 dddd dddd dddd dddd
16.0 Signed Half Word
  Memory:   sddd dddd dddd dddd
  Register: ssss ssss ssss ssss sddd dddd dddd dddd
8.0 Unsigned Byte
  Memory:   dddd dddd
  Register: 0000 0000 0000 0000 0000 0000 dddd dddd
8.0 Signed Byte
  Memory:   sddd dddd
  Register: ssss ssss ssss ssss ssss ssss sddd dddd
0.16 Unsigned Fraction
  Memory:   .dddd dddd dddd dddd
  Register: 0000 0000 0000 0000 .dddd dddd dddd dddd
1.15 Signed Fraction
  Memory:   s.ddd dddd dddd dddd
  Register: ssss ssss ssss ssss s.ddd dddd dddd dddd
0.32 Unsigned Fraction
  Memory:   .dddd dddd dddd dddd dddd dddd dddd dddd
  Register: .dddd dddd dddd dddd dddd dddd dddd dddd
1.31 Signed Fraction
  Memory:   s.ddd dddd dddd dddd dddd dddd dddd dddd
  Register: s.ddd dddd dddd dddd dddd dddd dddd dddd

Supported Data Formats
Packed formats (the representation in memory and in a 32-bit register is the same):

Packed 8.0 Unsigned Byte       dddd dddd dddd dddd dddd dddd dddd dddd
Packed 0.16 Unsigned Fraction  .dddd dddd dddd dddd .dddd dddd dddd dddd
Packed 1.15 Signed Fraction    s.ddd dddd dddd dddd s.ddd dddd dddd dddd

Endianness
Internal and external memory is accessed in little-endian byte order (as on Intel chips).
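Little-endian order means the least significant byte sits at the lowest address. The short sketch below uses Python's standard struct module to show the byte layout of a 32-bit word and how the same four bytes read back as two packed 16-bit half words; the value 0x12345678 is just an example.

```python
import struct

word = 0x12345678

# "<I" packs an unsigned 32-bit integer in little-endian byte order.
little = struct.pack("<I", word)
print(little.hex())            # bytes in increasing address order

# The same four bytes read as two little-endian 16-bit half words:
# the low half comes first in memory.
halves = struct.unpack("<HH", little)
print([hex(h) for h in halves])
```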

ALU Data Types
- ALU operations treat operands and results as either 16-bit or 32-bit binary strings, except for the signed division primitive (DIVS).
- ALU result status bits treat the results as signed, indicating status with the overflow flags (AV0 and AV1) and the negative flag (AN).
- Each ALU has its own sticky overflow flag (AV0S and AV1S). Once set, these bits can be cleared only by writing directly to the ASTAT register.
- The logic of the overflow bits (V, VS, AV0, AV0S, AV1, AV1S) is based on two's complement arithmetic.

ALU Data Types
- Example: adding two positive numbers must generate a positive result, and adding two negative numbers must generate a negative result. A change in the sign bit of the result indicates an overflow and sets the AVn overflow flag.
- The logic of the carry bits (AC0 and AC1) is based on unsigned magnitude arithmetic. The bit is set if a carry is generated from bit 16 (the MSB).
- The carry bits (AC0, AC1) are most useful for the lower word portions of a multiword operation.
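The difference between the overflow and carry conditions can be seen in a small model of a 16-bit addition. This is a sketch of the flag logic described above, not a cycle-accurate model; add16_flags is a hypothetical helper and the returned tuple order is a choice made for this example.

```python
def add16_flags(a: int, b: int):
    """Add two 16-bit patterns; return (result, carry, overflow, negative)."""
    a &= 0xFFFF
    b &= 0xFFFF
    total = a + b
    result = total & 0xFFFF
    carry = total > 0xFFFF                       # unsigned view: carry out of the MSB
    # Signed view: overflow when both operands share a sign
    # and the result's sign differs from it.
    overflow = bool(~(a ^ b) & (a ^ result) & 0x8000)
    negative = bool(result & 0x8000)
    return result, carry, overflow, negative


print(add16_flags(0x7FFF, 0x0001))   # positive + positive wraps negative
print(add16_flags(0xFFFF, 0x0001))   # -1 + 1: carry but no overflow
```

The second call shows why the carry bit matters for multiword arithmetic: the carry out of the low word must be added into the next word even though no signed overflow occurred.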

Multiplier Data Types
- Multipliers produce results that are binary strings.
- Inputs are interpreted according to information provided in the instruction itself:
  - Signed x Signed
  - Unsigned x Unsigned
  - Signed x Unsigned, and vice versa
  - Rounding operation
- The output is always treated as a 32-bit signed value, sign-extended across the full 40-bit width of the A0 and A1 registers.

Multiplier Data Types
Supported formats:
- Fraction mode, for 1.15 fractional operands (1 sign bit and 15 fractional bits)
- Integer mode, for integer operands (16.0 format)
Fraction mode:
- Input: 1.15 operands
- Output: 2.30 format (2 sign bits and 30 fractional bits)
- The multiplier shifts the resulting product left by one bit before transferring it to the multiplier result register (A0 or A1).
- The shift puts the multiplier result in 1.31 format.
- The 1.31 result can be rounded to 1.15 format.
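A rough model of the two multiplier modes is sketched below. The function names are invented for this example, and the saturation of the single overflowing case (-1 x -1 in fraction mode) is an assumption of the sketch rather than something stated on the slide.

```python
def to_signed16(word: int) -> int:
    """Reinterpret a 16-bit pattern as a two's complement integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def mult_fractional(x: int, y: int) -> int:
    """1.15 x 1.15 -> 2.30 product, shifted left once to give 1.31."""
    product = to_signed16(x) * to_signed16(y)    # 2.30: two sign bits
    shifted = product << 1                       # drop the redundant sign bit
    if shifted > 0x7FFFFFFF:                     # only -1 * -1 (assumed to saturate)
        shifted = 0x7FFFFFFF
    return shifted & 0xFFFFFFFF


def mult_integer(x: int, y: int) -> int:
    """16.0 x 16.0 -> 32.0 product, no shift."""
    return (to_signed16(x) * to_signed16(y)) & 0xFFFFFFFF


print(f"{mult_fractional(0x4000, 0x4000):08X}")  # 0.5 * 0.5 in 1.31
print(f"{mult_integer(3, 0xFFFE):08X}")          # 3 * -2 in 32.0
```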

Multiplier Data Types
Integer mode:
- Inputs: 16.0 format
- Output: 32.0 format
- No left shift occurs: a shift would change the numerical value of an integer result, so it is not performed.

Shifter Data Types
Shifter operations are explicitly geared to:
- Signed two's complement values (arithmetic shifts)
- Unsigned values (logical shifts)
Exponent logic assumes two's complement numbers and supports block floating point for:
- Two's complement values
- Two's complement fractional values

Arithmetic Formats Summary
ALU Arithmetic Formats

Operation     Operand Format                   Result Format
Addition      Signed or unsigned               Interpret flags
Subtraction   Signed or unsigned               Interpret flags
Logical       Binary string                    Same as operand
Division      Explicitly signed or unsigned    Same as operand

Arithmetic Formats Summary
Multiplier Fractional Mode Formats

Operation                    Operand Format                        Result Format
Multiplication               1.15, explicitly signed or unsigned   2.30 shifted to 1.31
Multiplication/Addition      1.15, explicitly signed or unsigned   2.30 shifted to 1.31
Multiplication/Subtraction   1.15, explicitly signed or unsigned   2.30 shifted to 1.31

Arithmetic Formats Summary
Multiplier Integer Mode Formats

Operation                    Operand Format                        Result Format
Multiplication               16.0, explicitly signed or unsigned   32.0, not shifted
Multiplication/Addition      16.0, explicitly signed or unsigned   32.0, not shifted
Multiplication/Subtraction   16.0, explicitly signed or unsigned   32.0, not shifted

Arithmetic Formats Summary
Shifter Arithmetic Formats

Operation          Operand Format           Result Format
Logical Shift      Unsigned binary string   Same as operand
Arithmetic Shift   Signed                   Same as operand
Exponent Detect    Signed                   Same as operand

Multiplier Integer & Fractional Format Usage
Multiply-Add Computation (MAC)
- The output of the multiplier's fractional or integer operations is fed into a 40-bit adder/subtracter.
- The product is added to or subtracted from the current contents of the A0 or A1 register to produce the final 40-bit result.
Rounding Multiplier Results
- Result rounding is supported through the RND option.
- The RND_MOD bit in the ASTAT register determines whether the RND option provides biased or unbiased rounding:
  - Biased: RND_MOD = 1
  - Unbiased: RND_MOD = 0 (preferred for most operations)

Multiplier Integer & Fractional Format Usage – Unbiased Rounding
Unbiased Rounding
- The convergent rounding method returns the number closest to the original.
- When the original lies exactly halfway between two numbers, it returns the nearest even number.
- Rounding can be specified as part of the instruction code.
- When rounding is selected, the output register contains the rounded result. The accumulator itself is never rounded.

Multiplier Integer & Fractional Format Usage
Net Bias Removal
- The conventional method of biased rounding adds a 1 into bit position 15 of the adder chain.
- This causes a net positive bias, because the midway value is always rounded upward.
- The bias is removed by forcing bit 16 of the result output to 0 when the midway point is detected.
- Forcing bit 16 to 0 has the effect of rounding odd A0.L or A1.L values upward and even values downward.

Multiplier Integer & Fractional Format Usage
[Figure: Typical Unbiased Multiplier Rounding]

Multiplier Integer & Fractional Format Usage
[Figure: Avoiding Net Bias in Unbiased Multiplier Rounding]

Multiplier Integer & Fractional Format Usage – Biased Rounding
Biased Rounding
- The round-to-nearest method returns the number closest to the original.
- Halfway values always round up to the larger of the two neighbors.
- When A0.L or A1.L has the value 0x8000, biased rounding rounds all values up, rather than rounding up only the odd values.
- Because the net effect is to round up, this method is called biased rounding.
- RND_MOD = 1 in the ASTAT register enables biased rounding.
The following table compares the results of biased and unbiased rounding.

Biased Rounding in Multiplier Operation
[Table: A0/A1 before RND, biased RND result, unbiased RND result. The hexadecimal entries are not recoverable from this copy.]
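The two rounding modes can be compared with a small model that rounds an accumulator value to its upper half word (bits 31:16). This is an illustrative sketch: the function name round_high_half is invented here, and the values in the loop are chosen for this example, not taken from the table.

```python
def round_high_half(acc: int, biased: bool) -> int:
    """Round `acc` at bit 16 and return the value of the bits above bit 15.

    biased=True  : add 0x8000, halfway cases always go up.
    biased=False : same, but when the low half is exactly 0x8000 the
                   lowest kept bit is forced to 0 (round half to even).
    """
    low = acc & 0xFFFF
    rounded = (acc + 0x8000) >> 16
    if not biased and low == 0x8000:
        rounded &= ~1                  # force bit 16 of the result to 0
    return rounded


for value in (0x00008000, 0x00018000, 0x00008001, 0x00007FFF):
    print(f"{value:08X}: biased={round_high_half(value, True)} "
          f"unbiased={round_high_half(value, False)}")
```

Only the exact halfway pattern (low half equal to 0x8000) distinguishes the two modes; every other input rounds identically.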

Biased Rounding
- Biased rounding is useful in Global System for Mobile Communications (GSM) applications for implementing speech compression algorithms.
- These algorithms require biased rounding, so hardware support for it makes them more efficient to implement.

Multiplier Integer & Fractional Format Usage – Truncation
Truncation
- Truncation is another method of reducing the number of significant bits: an N-bit number is reduced to M bits by masking off the lower (N-M) bits.
- Truncation results in a relatively large bias.
- Instructions that do not support rounding revert to truncation.
- The RND_MOD bit in the ASTAT register has no effect on truncation.

Special Rounding Instructions
The ALU can round arithmetic results directly into data registers:
- With biased or unbiased rounding
- On different bit boundaries
Options RND12, RND, and RND20:
- Extract the resulting 16-bit value from bit 12, bit 16, and bit 20, respectively.
- Perform biased rounding regardless of the state of the RND_MOD bit in the ASTAT register.

Special Rounding Instructions
Examples:
    R3.L = R4 (RND);
        Biased rounding at bit 16, storing the result in the low half word of data register R3.
    R3.L = R4 + R5 (RND12);
        Addition of two 32-bit numbers, with biased rounding at bit 12, depositing the result in a half word.
    R3.L = R4 + R5 (RND20);
        Addition of two 32-bit numbers, with biased rounding at bit 20, depositing the result in a half word.

Using Computational Status
- The multiplier, ALU, and shifter update status flags in the processor's Arithmetic Status (ASTAT) register.
- Conditional instructions test the CC flag in the ASTAT register after an instruction executes, to check for status conditions.
- This makes it possible to monitor the outcome of each instruction.

ASTAT Register
[Figure: ASTAT Register]

Arithmetic Logic Units

Arithmetic Logic Unit (ALU)
[Figure: Data Arithmetic Unit. Two 32-bit load buses (LD0, LD1) and one 32-bit store bus (SD); data registers R0-R7, each split into halves Rn.H and Rn.L; barrel shifter; 40-bit accumulators A0 and A1.]

Arithmetic Logic Unit (ALU)
Two 40-bit ALUs perform arithmetic and logical operations on fixed-point data.
- Input data: 16-bit, 32-bit, and 40-bit
- Output data: results up to 40 bits wide
Functions:
- Fixed-point addition and subtraction from registers
- Addition and subtraction of immediate values
- Accumulation and subtraction of multiplier results
- Logical AND, OR, NOT, XOR, bitwise XOR (LFSR), Negate
- ABS, MAX, MIN, Round, division primitives
- Support for conditional instructions
Four 8-bit video ALUs are also provided.

Arithmetic Logic Unit (ALU)
Each 40-bit ALU operation supports the following inputs and outputs:

Input                                  Output
Two or four 16-bit operands            One or two 16-bit results
Two 32-bit operands                    One 32-bit result
32-bit result from the multiplier      Combination of the 32-bit multiplier result with a 40-bit accumulation result

Combining operations in both ALUs can produce, in a single instruction:
- Four 16-bit results
- Two 32-bit results
- Two 40-bit results

Single 16-Bit Operations
In a single 16-bit operation:
- Any two 16-bit register halves may be used as inputs to the ALU.
- An addition, subtraction, or logical operation produces a 16-bit result.
- The result is stored in an arbitrary destination register half.
- Because ALU0 is the primary ALU, it is used for these operations.
Example:
    R3.H = R1.H + R2.L (NS);
        Adds the 16-bit contents of R1.H to the contents of R2.L and stores the result in R3.H with no saturation.

Dual 16-Bit Operations
In dual 16-bit operations:
- Any two 32-bit registers may be used as inputs to the ALU, each treated as a pair of 16-bit operands.
- An addition, subtraction, or logical operation produces two 16-bit results.
- The results are stored in an arbitrary 32-bit destination register.
- Because ALU0 is the primary ALU, it is used for these operations.
Example:
    R3 = R1 +|- R2 (S);
        Adds the 16-bit contents of R2.H to the contents of R1.H and stores the result in R3.H with saturation, and subtracts the 16-bit contents of R2.L from the contents of R1.L and stores the result in R3.L with saturation.
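The dual 16-bit behavior can be modeled on packed 32-bit values as sketched below. The function name dual16 and its string arguments for the per-half operators are conventions of this example only; they loosely mirror the +|- notation but are not real assembler syntax.

```python
def to_signed16(word: int) -> int:
    """Reinterpret the low 16 bits as a two's complement integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def dual16(rm: int, rn: int, high_op: str, low_op: str, saturate: bool) -> int:
    """Apply one add/subtract to the high halves and one to the low halves."""
    def half(a: int, b: int, op: str) -> int:
        a, b = to_signed16(a), to_signed16(b)
        r = a + b if op == "+" else a - b
        if saturate:                           # (S): clamp to the 16-bit signed range
            r = max(-32768, min(32767, r))
        return r & 0xFFFF                      # (NS): simply wrap

    high = half(rm >> 16, rn >> 16, high_op)
    low = half(rm, rn, low_op)
    return (high << 16) | low


# Model of R3 = R1 +|- R2 with and without saturation.
r1, r2 = 0x7FFF0005, 0x00010003
print(f"{dual16(r1, r2, '+', '-', True):08X}")
print(f"{dual16(r1, r2, '+', '-', False):08X}")
```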

Dual 16-Bit Operations
[Figure: the four dual 16-bit operation variants on 32-bit registers Rm and Rn: H + H, L + L; H + H, L - L; H - H, L + L; H - H, L - L.]

Dual 16-Bit Cross Options
For dual 16-bit operations, the results may be crossed. This is particularly useful in FFT calculations, which use a basic "butterfly" configuration.
[Figure: crossed dual 16-bit results between registers Rm and Rn.]

Quad 16-Bit Operations
In quad 16-bit operations:
- Any two 32-bit registers may be used as inputs to ALU0 and ALU1, each treated as a pair of 16-bit operands.
- Addition, subtraction, or logical operations produce four 16-bit results.
- The results are stored in two arbitrary 32-bit destination registers.
- Because there are only two 32-bit data paths from the Data Register File to the ALUs, the same two pairs of 16-bit inputs are presented to both ALU0 and ALU1.

Quad 16-Bit Operations
Example:
    R3 = R0 +|+ R1, R2 = R0 -|- R1 (S);
- Adds the 16-bit contents of R1.H to the 16-bit contents of R0.H and stores the result in R3.H with saturation.
- Adds R1.L to R0.L and stores the result in R3.L with saturation.
- Subtracts R1.H from R0.H and stores the result in R2.H with saturation.
- Subtracts R1.L from R0.L and stores the result in R2.L with saturation.

Quad 16-Bit Operations
The four equivalent explicit instructions are:
    R3.H = R0.H + R1.H (S);
    R3.L = R0.L + R1.L (S);
    R2.H = R0.H - R1.H (S);
    R2.L = R0.L - R1.L (S);

Single 32-Bit Operations
In a single 32-bit operation:
- Any two 32-bit registers may be used as inputs to the ALU, treated as 32-bit operands.
- An addition, subtraction, or logical operation produces a 32-bit result.
- The result is stored in an arbitrary 32-bit destination register.
- Because ALU0 is the primary ALU, it is used for these operations.
Examples:
    R3 = R1 + R2 (NS);
        Adds the 32-bit contents of R2 to the 32-bit contents of R1 and stores the result in R3 with no saturation.
    R3 = R1 + R2 (S);
        Adds the 32-bit contents of R2 to the 32-bit contents of R1 and stores the result in R3 with saturation.
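The difference between the (NS) and (S) options shows up only when the sum leaves the signed 32-bit range. The sketch below models both; add32 is a hypothetical helper written for this example.

```python
def to_signed32(word: int) -> int:
    """Reinterpret a 32-bit pattern as a two's complement integer."""
    word &= 0xFFFFFFFF
    return word - 0x100000000 if word & 0x80000000 else word


def add32(a: int, b: int, saturate: bool) -> int:
    """32-bit signed add; wraps when saturate is False, clamps when True."""
    total = to_signed32(a) + to_signed32(b)
    if saturate:
        total = max(-0x80000000, min(0x7FFFFFFF, total))
    return total & 0xFFFFFFFF


print(f"{add32(0x7FFFFFFF, 1, False):08X}")   # wraps to the most negative value
print(f"{add32(0x7FFFFFFF, 1, True):08X}")    # stays at the most positive value
```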

Dual 32-Bit Operations
In a dual 32-bit operation:
- Any two 32-bit registers may be used as inputs to the ALUs, treated as 32-bit operands.
- Addition, subtraction, or logical operations produce two 32-bit results.
- The results are stored in two arbitrary 32-bit destination registers.
- Because there are only two 32-bit data paths from the Data Register File to the ALUs, the same two 32-bit inputs are presented to both ALU0 and ALU1.
Example:
    R3 = R1 + R2, R4 = R1 - R2 (NS);
        Adds the 32-bit contents of R2 to the 32-bit contents of R1 and stores the result in R3 with no saturation; also subtracts R2 from R1 and stores the result in R4 with no saturation.

Dual 32-Bit Operations
A special form of the previous instruction uses the 40-bit ALU result registers as input operands, creating the sum and difference of the A0 and A1 registers.
    R3 = A0 + A1, R4 = A0 - A1 (S);
        Transfers to the destination registers the two 32-bit, saturated sum and difference values of the ALU result registers.

ALU Data Flow Details
[Figure: ALU Data Flow Details]

ALU Division Support Features
The ALU supports division with two special divide primitives:
- DIVS
- DIVQ
Division can be signed or unsigned. The dividend and divisor must both be of the same type.

Special SIMD Video ALU Operations
Four 8-bit video ALUs enable the processor to process video information with high efficiency.
- Each video ALU instruction may take from one to four pairs of 8-bit inputs and return one to four 8-bit results.
- The inputs are presented to the video ALUs in two 32-bit words from the Data Register File.
The possible operations include:
- Quad 8-bit add or subtract
- Quad 8-bit average
- Quad 8-bit pack or unpack
- Quad 8-bit subtract-absolute-accumulate
- Byte align
For more information about these instructions, see the ADSP-BF53x Blackfin Processor Instruction Set Reference.
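To give a feel for what "quad 8-bit" means, the sketch below treats two 32-bit words as four byte lanes each and computes a lane-wise average and a sum of absolute differences. It is a conceptual model only: the helper names are invented, the average truncates (an assumption of this sketch), and the real instructions' operand selection, rounding options, and accumulator usage are defined in the Instruction Set Reference.

```python
def byte_lanes(word: int):
    """Split a 32-bit word into four unsigned bytes, lowest byte first."""
    return [(word >> shift) & 0xFF for shift in (0, 8, 16, 24)]


def quad_average(r0: int, r1: int) -> int:
    """Lane-wise truncating average of four byte pairs, repacked into 32 bits."""
    out = 0
    for lane, (a, b) in enumerate(zip(byte_lanes(r0), byte_lanes(r1))):
        out |= ((a + b) >> 1) << (8 * lane)
    return out


def quad_sum_abs_diff(r0: int, r1: int) -> int:
    """Sum of |a - b| over the four byte pairs (subtract, absolute, accumulate)."""
    return sum(abs(a - b) for a, b in zip(byte_lanes(r0), byte_lanes(r1)))


print(f"{quad_average(0x10203040, 0x30203040):08X}")
print(quad_sum_abs_diff(0x10203040, 0x11203050))
```

Sums of absolute differences over pixel blocks are a common building block of video motion estimation, which is the kind of workload these lanes target.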

Multiply Accumulators (MAC)

Multiply Accumulators (MAC)
[Figure: Data Arithmetic Unit. Two 32-bit load buses (LD0, LD1) and one 32-bit store bus (SD); data registers R0-R7, each split into halves Rn.H and Rn.L; barrel shifter; 40-bit accumulators A0 and A1.]

Multiply-Accumulate (MAC)
Two identical MACs:
- Each performs fixed-point multiplication and multiply-accumulate operations on 16-bit fixed-point input data and produces 32-bit results.
- The 32-bit result can be added to or subtracted from a 40-bit accumulator.
Inputs:
- Fractional data format, or
- Integer data format, unsigned or two's complement

Multiply-Accumulate (MAC)
Functions:
- Multiplication
- Multiply-accumulate with addition, rounding optional
- Multiply-accumulate with subtraction, rounding optional
- Dual versions of the above
Features:
- Saturation of accumulator results
- Optional rounding of multiplier results

Multiply-Accumulate (MAC)
Inputs:
- Both multipliers receive two 32-bit inputs; each 32-bit input provides two 16-bit operands.
- Operands come from any register in the Data Register File.
- Each multiplier instruction determines whether its inputs are both in integer format or both in fractional format.
- The output format is the same as the input format.
- In MAC0, both inputs are treated as signed or both as unsigned.
- MAC1 supports mixed-mode operation.

Multiply-Accumulate (MAC): Outputs
Each multiplier can accumulate results in its accumulator register (A0 or A1). Accumulator results can be saturated to 32 bits or 40 bits. If both inputs are fractional and signed, the multiplier automatically shifts the result left by one bit to remove the redundant sign bit. For unsigned fractional, integer (signed or unsigned), and mixed-mode operations, no shift is performed.
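The left-shift correction for signed fractions can be illustrated with a small Python model (Python is used here only to model the arithmetic; it is not Blackfin code). The function names are hypothetical, and operands are assumed to be 16-bit values in 1.15 format:

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def mult_signed_fraction(a, b):
    """Multiply two signed 1.15 operands and return the product in 1.31 alignment."""
    product = to_signed16(a) * to_signed16(b)  # raw product has two sign bits (2.30)
    return product << 1                        # shift left once to drop the redundant one


def mult_signed_integer(a, b):
    """Signed integer multiply: no shift correction is applied."""
    return to_signed16(a) * to_signed16(b)
```

For example, 0x4000 (0.5) times 0x4000 (0.5) gives 0x2000 0000, which is 0.25 in 1.31 format.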

Placing Multiplier Results in Multiplier Accumulator Registers
MAC0 ➨ A0 (39:0): A0.L ➨ A0 (15:0), A0.H ➨ A0 (31:16), A0.X ➨ A0 (39:32).
MAC1 ➨ A1 (39:0): A1.L ➨ A1 (15:0), A1.H ➨ A1 (31:16), A1.X ➨ A1 (39:32).
The sign bit (MSB) of the multiplication result is sign-extended into the upper 8 bits of the register (A0.X/A1.X). Multiplier results can also be deposited as 16-bit or 32-bit values in registers of the Data Register File.
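As a minimal sketch of this layout, the Python model below wraps a signed result into a 40-bit pattern and splits it into the X, H, and L fields. The helper names are hypothetical and chosen only for illustration:

```python
def to_accumulator(value):
    """Wrap a signed Python integer into a 40-bit two's-complement pattern."""
    return value & 0xFFFFFFFFFF


def accumulator_fields(acc40):
    """Split a 40-bit accumulator pattern into its X (39:32), H (31:16), and L (15:0) fields."""
    return {
        "X": (acc40 >> 32) & 0xFF,
        "H": (acc40 >> 16) & 0xFFFF,
        "L": acc40 & 0xFFFF,
    }
```

A negative product fills the X field with ones (0xFF), and a positive product fills it with zeros, which is the sign extension described above.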

Rounding or Saturating Multiplier Results
On multiply-and-accumulate operations, the accumulator data can be saturated and, optionally, rounded for extraction to a register or register half. When a multiply operation deposits its result directly into the Data Register File, saturation and rounding work the same way:
Rounding is applied only to fractional results. Exception: rounding is also applied when high-half extraction of an integer result is requested (indicated by the IH option).
Rounding is performed by adding 0x8000 to the accumulator (for MAC) or to the multiply result (for multiply) and then saturating to 32 bits.
If overflow or underflow occurs during saturation, the specified result register is set to the maximum positive or negative value.
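The round-then-saturate sequence can be sketched in Python as follows. This is a simplified model under the assumption that the accumulator is held as a signed Python integer and that the high half (bits 31:16) is extracted; the function names are hypothetical:

```python
INT32_MAX = 0x7FFFFFFF
INT32_MIN = -0x80000000


def saturate32(value):
    """Clamp value to the signed 32-bit range; also report whether clamping occurred."""
    if value > INT32_MAX:
        return INT32_MAX, True
    if value < INT32_MIN:
        return INT32_MIN, True
    return value, False


def round_to_high_half(acc):
    """Add 0x8000, saturate to 32 bits, then return bits 31:16 and the overflow indication."""
    rounded, overflowed = saturate32(acc + 0x8000)
    return (rounded >> 16) & 0xFFFF, overflowed
```

With this model, 0x1234 8000 rounds up to 0x1235, while 0x7FFF FFFF overflows when 0x8000 is added and is clamped, so the extracted half is the maximum positive value 0x7FFF.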

Saturating Multiplier Results on Overflow
The following bits in ASTAT indicate multiplier overflow status:
Bit 16 (AV0, for A0) and bit 18 (AV1, for A1) record whether the result has overflowed 32 bits. AV0/AV1 = 0 ➨ no overflow or underflow; AV0/AV1 = 1 ➨ overflow or underflow has occurred. AV0S/AV1S are sticky bits.
Bit 24 (V) and bit 25 (VS) are set if overflow occurs when extracting the accumulator result to a register.
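A short Python sketch of how these flags could be read out of a 32-bit ASTAT value. Only the bit positions named above are decoded; the dictionary and function names are hypothetical:

```python
# Bit positions as listed on the slide.
ASTAT_BITS = {"AV0": 16, "AV1": 18, "V": 24, "VS": 25}


def multiplier_overflow_flags(astat):
    """Return the multiplier-related overflow flags of an ASTAT value as booleans."""
    return {name: bool((astat >> bit) & 1) for name, bit in ASTAT_BITS.items()}
```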

Multiplier Instruction Summary
Instructions and the ASTAT status flags they affect (first flag group: AV0, AV0S, AV1, AV1S; second flag group: V, V_COPY, VS). Flag entries are shown where the slide gives them.
Dreg_lo = Dreg_lo_hi * Dreg_lo_hi ;   first group: -   second group: *
Dreg_hi = Dreg_lo_hi * Dreg_lo_hi ;
Dreg = Dreg_lo_hi * Dreg_lo_hi ;
An = Dreg_lo_hi * Dreg_lo_hi ;
An += Dreg_lo_hi * Dreg_lo_hi ;
An -= Dreg_lo_hi * Dreg_lo_hi ;
Dreg_lo = ( A0 = Dreg_lo_hi * Dreg_lo_hi ) ;
Dreg_lo = ( A0 += Dreg_lo_hi * Dreg_lo_hi ) ;
Dreg_lo = ( A0 -= Dreg_lo_hi * Dreg_lo_hi ) ;
Dreg_hi = ( A0 = Dreg_lo_hi * Dreg_lo_hi ) ;
Dreg_hi = ( A0 += Dreg_lo_hi * Dreg_lo_hi ) ;
Dreg_hi = ( A0 -= Dreg_lo_hi * Dreg_lo_hi ) ;

Multiplier Instruction Summary
Instructions and the ASTAT status flags they affect (AV0, AV0S, AV1, AV1S; V, V_COPY, VS), continued. Flag entries are shown where the slide gives them.
Dreg = ( An += Dreg_lo_hi * Dreg_lo_hi ) ;   *
Dreg = ( An -= Dreg_lo_hi * Dreg_lo_hi ) ;
Dreg = ( An = Dreg_lo_hi * Dreg_lo_hi ) ;
Dreg *= Dreg ;   -

Notation Clarification
Dreg denotes any Data Register File register: R(0:7).
Dreg_lo_hi denotes any 16-bit register half in any Data Register File register: R(0:7).L or R(0:7).H.
Dreg_lo denotes the lower 16 bits of any Data Register File register: R(0:7).L.
Dreg_hi denotes the upper 16 bits of any Data Register File register: R(0:7).H.
An denotes either MAC accumulator register, A0 or A1.
* indicates the flag may be set or cleared, depending on the results of the instruction.
- indicates no effect.

Multiplier Instruction Options
For details on how to use these options, consult the ADSP-BF53x Blackfin Processor Instruction Set Reference.
Default (no option): input data operands are signed fractions.
(IS): input data operands are signed integers. No shift correction is made.
(FU): input data operands are unsigned fractions. No shift correction is made.
(IU): input data operands are unsigned integers. No shift correction is made.
(T): input data operands are signed fractions. When copying to the destination half register, the lower 16 bits of the accumulator contents are truncated.
(TFU): input data operands are unsigned fractions. When copying to the destination half register, the lower 16 bits of the accumulator contents are truncated.

Multiplier Instruction Options
(ISS2): input data operands are signed integers.
If multiplying and accumulating to a register: when copying to the destination register, the accumulator contents are scaled (multiplied by 2 with a one-place left shift). If scaling produces a signed value larger than 32 bits, the number is saturated to its maximum positive or negative value.
If multiplying and accumulating to a half register: when copying the lower 16 bits to the destination half register, the accumulator contents are scaled. If scaling produces a signed value larger than 16 bits, the number is saturated to its maximum positive or negative value.
(IH): integer multiplication with high-half word extraction. The accumulator is saturated at 32 bits, and bits [31:16] of the accumulator are rounded and then copied into the destination half register.

Multiplier Instruction Options
(W32): input data operands are signed fractions with no extension bits in the accumulators. Left-shift correction of the product is performed, as required. This option is used for legacy GSM speech vocoder algorithms written for 32-bit accumulators. For this option only, a special case applies: 0x8000 x 0x8000 = 0x7FFF.
(M): the operation uses mixed-multiply mode, in which operand one is signed and operand two is unsigned. It is valid only for MAC1 versions of the instruction and multiplies a signed fraction by an unsigned fraction with no left-shift correction. MAC0 performs an unmixed multiply on signed fractions by default, or on another format as specified; that is, MAC0 executes the specified signed/signed or unsigned/unsigned multiplication. The (M) option can be used alone or in conjunction with one other format option.

Multiplier Data Flow Details
Each multiplier has two 16-bit inputs, performs a 16-bit multiplication, and produces a 32-bit product. The result is stored in a 40-bit accumulator or extracted to a 16-bit or 32-bit register. Each input operand is selected from the low half or the high half of one 32-bit word from a data register.

Multiplier Data Flow Details
[Figure: multiplier data flow: 32-bit registers Rm and Rn feed MAC0, followed by an ADD/SUB stage and the 40-bit accumulator A0.]

Example Instructions
A0 = R3.L * R4.H ;
A1 += R3.H * R4.H ;

Multiply Without Accumulate
Results are stored directly in a destination register of the Data Register File if the operation does not include an accumulation function. The destination register can be:
16 bits: MAC0 is used for a low-half register, and MAC1 is used for a high-half register.
32 bits: either MAC0 or MAC1 is used.
The format of the output depends on the format of the input:
Fractional operands or the (IH) option: the output is extracted from the high half of the result and stored in the 16-bit destination register.
Integer operands: the low half of the result is extracted and stored in the 16-bit destination register.

Multiply Without Accumulate
[Figure: multiplication of fractional operands.]

Multiply Without Accumulate
[Figure: multiplication of integer operands.]

Multiply Without Accumulate Examples
R0.L = R1.L * R2.L (FU) ;
Deposits the upper 16 bits of the multiply result, with rounding and saturation, into the low half of R0 using MAC0.
R0.H = R2.H * R3.H (IU) ;
Deposits the lower 16 bits of the multiply result, with any required saturation, into the high half of R0 using MAC1.
R0 = R1.L * R2.L ;
Regardless of operand type, deposits 32 bits of the multiplier result, with saturation, into R0 using MAC0.
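The first two examples can be modelled with the Python sketch below. It assumes that the unsigned options saturate at the unsigned 32-bit limit (FU) and the unsigned 16-bit limit (IU); those limits and the function names are assumptions made for illustration rather than details given on the slide:

```python
def mult_fu_to_half(a, b):
    """(FU): unsigned fractional multiply; keep the upper 16 bits after rounding."""
    product = (a & 0xFFFF) * (b & 0xFFFF)         # unsigned 32-bit product
    rounded = min(product + 0x8000, 0xFFFFFFFF)   # round, then saturate (assumed unsigned limit)
    return rounded >> 16


def mult_iu_to_half(a, b):
    """(IU): unsigned integer multiply; keep the lower 16 bits with saturation."""
    product = (a & 0xFFFF) * (b & 0xFFFF)
    return min(product, 0xFFFF)                   # saturate to 16 bits (assumed unsigned limit)
```

In this model the fractional form keeps the most significant half of the product, while the integer form keeps the least significant half and clamps products that do not fit in 16 bits.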

Special 32-bit Integer MAC Instruction
The processor supports a multicycle 32-bit MAC instruction:
Dreg *= Dreg ;
This single instruction takes two 32-bit integer operands and produces a 32-bit integer result. One of the input operands is destroyed (overwritten by the result). The instruction takes multiple cycles to execute. This macro function is interruptible and does not modify the data in either accumulator register, A0 or A1.

Dual MAC Operations
The two 16-bit MACs can be used in the same instruction to double the MAC throughput. The restriction is that both MAC units must take their inputs from the same two 32-bit registers. Each MAC can use any one of the four possible combinations of 16-bit input operands. Example of two multiply-accumulate operations executing concurrently in one cycle:
A1 += R1.H * R2.L, A0 += R1.L * R2.H ;
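A behavioural sketch of this dual MAC example in Python, assuming the default signed fractional format (product shifted left by one bit) and 40-bit saturation of the accumulators. All function names are hypothetical:

```python
ACC_MAX = (1 << 39) - 1
ACC_MIN = -(1 << 39)


def to_signed16(x):
    """Interpret the low 16 bits of x as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def halves(reg):
    """Return the (high, low) signed 16-bit halves of a 32-bit register value."""
    return to_signed16(reg >> 16), to_signed16(reg)


def saturate40(value):
    """Clamp value to the signed 40-bit accumulator range."""
    return max(ACC_MIN, min(ACC_MAX, value))


def dual_mac(a1, a0, r1, r2):
    """Model of: A1 += R1.H * R2.L, A0 += R1.L * R2.H ; (signed fractions)."""
    r1_h, r1_l = halves(r1)
    r2_h, r2_l = halves(r2)
    a1 = saturate40(a1 + ((r1_h * r2_l) << 1))   # MAC1 updates A1
    a0 = saturate40(a0 + ((r1_l * r2_h) << 1))   # MAC0 updates A0
    return a1, a0
```

Both updates read the same two source registers, which mirrors the restriction stated above.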

Dual MAC Operations
The results of MAC operations can be written to registers in several ways: as a pair of 16-bit halves, as a pair of 32-bit registers, or as an independent 16-bit half register or 32-bit register. Example:
R3.H = (A1 += R1.H * R2.L), R3.L = (A0 += R1.L * R2.L) ;
The first operation uses MAC1 and the second uses MAC0. The operand type determines which bits are extracted from the accumulator and deposited in the 16-bit destination register.

Dual MAC Operations
Packing the result into two 32-bit registers: the 40-bit accumulators are packed into two 32-bit registers, which must form a register pair: R[1:0], R[3:2], R[5:4], or R[7:6].
R3 = (A1 += R1.H * R2.L), R2 = (A0 += R1.L * R2.L) ;

R3 = (A1 += R1.H * R2.L), R2 = (A0 += R1.L * R2.L) ;
Example of an instruction that transfers one accumulator result but not the other. Either a 16-bit or a 32-bit register can be specified as the destination register.
R3.H = (A1 += R1.H * R2.L), A0 += R1.L * R2.L ;

Barrel Shifter
[Figure: Data Arithmetic Unit block diagram: data registers R0 to R7 (each with .H and .L 16-bit halves), the 40-bit accumulators A0 and A1, the barrel shifter, and the 32-bit buses LD0, LD1, and SD.]

Barrel Shifter Functions
The barrel shifter performs bitwise shifting on 16-bit, 32-bit, or 40-bit inputs and yields 16-bit, 32-bit, or 40-bit outputs.
Shift functions:
Arithmetic shifts preserve the sign of the original number. The sign bit value back-fills the left-most bit positions vacated by an arithmetic right shift.
Logical shifts discard any bits shifted out of the register and back-fill vacated bits with zeros.
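The difference between the two right shifts can be sketched in Python on 32-bit values. The function names are hypothetical, and the model covers only the 32-bit case:

```python
MASK32 = 0xFFFFFFFF


def logical_shift_right(value, count):
    """Shift right and back-fill the vacated upper bits with zeros."""
    return (value & MASK32) >> count


def arithmetic_shift_right(value, count):
    """Shift right and back-fill the vacated upper bits with the sign bit."""
    value &= MASK32
    signed = value - (1 << 32) if value & 0x80000000 else value
    return (signed >> count) & MASK32
```

For a value with the sign bit set, such as 0x8000 0000, the logical shift by 4 gives 0x0800 0000 while the arithmetic shift gives 0xF800 0000; for positive values the two agree.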

Barrel Shifter Functions
Additional functions:
Rotate: rotates a register value through the CC bit by a specified distance and direction.
Bit operations: set, clear, toggle, test.
Field extract and deposit (pack and unpack).
Exponent detection.
Note: shift functions can be combined to implement numerical format control, including floating-point representation in software.

Shifter Operations
The shifter instructions:
ASHIFT and ">>>": arithmetic shift.
LSHIFT and "<<", ">>": logical shift.
ROT: rotate.
Arithmetic and logical shift instructions are further broken down into: 16-bit single and 16-bit paired instructions; three-operand instructions using ASHIFT and LSHIFT; instructions operating on 32-bit register values; two-operand instructions using ">>>" and ">>".
The shift argument can be obtained from a register or directly from an immediate value in the instruction.

Two-Operand Shifts
Two-operand shift instructions shift an input register and deposit the result in the same register.
Immediate shifts: shift right (downshift) or left (upshift) by a given number of bits. Example, with R0 containing 0000 B6A3 in each case:
R0 >>= 0x04 ; results in R0 contains 0000 0B6A
R0 <<= 0x04 ; results in R0 contains 000B 6A30

Two-Operand Shifts
Register shifts: shift right (downshift) or left (upshift) by the number of bits specified by the contents of a register. A 32-bit register can hold shift values greater than or equal to 32; in such cases the result is either 0 or -1. Example:
R0 contains 0000 B6A3 ;
R2 contains 0000 0004 ;
R0 <<= R2 ; results in R0 contains 000B 6A30 ;

Three-Operand Shifts
Three-operand shifter instructions shift an input register and deposit the result in a specified destination register.
Immediate shifts, example:
R0 contains 0000 B6A3 ;
R1 = R0 >> 0x04 ; results in R1 contains 0000 0B6A ;
R0.L contains B6A3 ;
R1.H = R0.L << 0x04 ; results in R1.H contains 6A30 ;

Three-Operand Shifts
Register shifts: a register holds the shift value. The shift value is always taken from the low half of a register (Rn.L). The bottom six bits of Rn.L are masked off and used as the shift value. Example:
R0 contains 0000 B6A3 ;
R2.L contains 0004 ;
R1 = R0 ASHIFT by R2.L ; results in R1 contains 000B 6A30 ;
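A sketch of how the six-bit shift value might be decoded, written in Python. It assumes the six bits form a signed two's-complement magnitude in which positive values shift left (as in the example above) and negative values shift right arithmetically; the right-shift interpretation, the truncation to 32 bits, and the function names are assumptions of this model:

```python
MASK32 = 0xFFFFFFFF


def shift_magnitude(rn_l):
    """Keep the bottom six bits of Rn.L and interpret them as a signed value (-32..31)."""
    magnitude = rn_l & 0x3F
    return magnitude - 64 if magnitude & 0x20 else magnitude


def ashift32(value, rn_l):
    """Model of: Rd = Rs ASHIFT by Rn.L ; on a 32-bit value (no saturation modelled)."""
    count = shift_magnitude(rn_l)
    value &= MASK32
    if count >= 0:
        return (value << count) & MASK32          # positive magnitude: shift left
    signed = value - (1 << 32) if value & 0x80000000 else value
    return (signed >> -count) & MASK32            # negative magnitude: arithmetic shift right
```

With R2.L = 0x0004 the model reproduces the result 000B 6A30; bits of Rn.L above bit 5 are ignored.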

Three-Operand Shifts
Example (cont.): the following example shows the input value rotated. Assume the Condition Code (CC) bit is set to 0.
R0 contains ABCD EF12 ;
R2.L contains 0004 ;
R1 = R0 ROT by R2.L ; results in R1 contains BCDE F125 ;
Note that the CC bit is included in the result, at bit 3.

Three-Operand Shifts
[Figure: rotate 4 bits including CC, showing the CC bit together with R0.H (ABCD) and R0.L (EF12).]
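The rotate example can be reproduced with a small Python model that treats CC and the 32-bit register as one 33-bit ring. Only left rotation is modelled, and the function name is hypothetical:

```python
def rot_left_through_cc(value, cc, count):
    """Rotate a 32-bit value left through the CC bit, one position per step."""
    value &= 0xFFFFFFFF
    for _ in range(count):
        carry_out = (value >> 31) & 1              # bit leaving the top of the register
        value = ((value << 1) | cc) & 0xFFFFFFFF   # old CC enters at bit 0
        cc = carry_out                             # the departing bit becomes the new CC
    return value, cc
```

Starting from ABCD EF12 with CC = 0 and a count of 4, the old CC bit is the first bit shifted in and therefore ends up at bit 3, giving BCDE F125.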

Bit Test, Set, Clear, Toggle
The shifter provides the means to test, set, clear, and toggle specific bits of a data register. All of these instructions have two arguments: the source register and the bit field value.

Bit Test, Set, Clear, Toggle
The test instruction does not change the source register; the result of the test resides in the CC bit. Examples:
BITCLR ( R0, 6 ) ; // Bit Clear
BITSET ( R2, 9 ) ; // Bit Set
BITTGL ( R3, 2 ) ; // Bit Toggle
CC = BITTST ( R3, 0 ) ; // Bit Test
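The effect of the four bit instructions on a 32-bit register value can be sketched in Python. The lower-case function names are hypothetical stand-ins for the instructions:

```python
MASK32 = 0xFFFFFFFF


def bitclr(reg, bit):
    """Clear one bit of a 32-bit register value."""
    return reg & ~(1 << bit) & MASK32


def bitset(reg, bit):
    """Set one bit of a 32-bit register value."""
    return (reg | (1 << bit)) & MASK32


def bittgl(reg, bit):
    """Toggle one bit of a 32-bit register value."""
    return (reg ^ (1 << bit)) & MASK32


def bittst(reg, bit):
    """Return the value CC would take: True if the bit is set. The register is unchanged."""
    return bool((reg >> bit) & 1)
```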

Field Extract and Field Deposit
With the shifter, a source field can be deposited anywhere in a 32-bit destination field. The source field is 1 to 16 bits long, and a field can be extracted from anywhere within a 32-bit source field. The instructions take two register arguments: one is the 32-bit destination or source, and the other holds the extract/deposit value, its length, and its position within the source.
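A generic Python sketch of the two operations. Position and length are passed as separate arguments here for clarity, which is an assumption of this model and not the register encoding used by the actual instructions; the function names are hypothetical:

```python
MASK32 = 0xFFFFFFFF


def extract_field(source, position, length):
    """Return `length` bits of a 32-bit source starting at bit `position`, zero-extended."""
    return (source >> position) & ((1 << length) - 1)


def deposit_field(background, field, position, length):
    """Overwrite `length` bits of a 32-bit background at `position` with the field value."""
    mask = ((1 << length) - 1) << position
    return (background & ~mask & MASK32) | ((field << position) & mask)
```

Depositing the 8-bit field 0xAB at bit position 4 of 0xFFFF 0000 gives 0xFFFF 0AB0, and extracting 8 bits at position 8 of 0xABCD EF12 gives 0xEF.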

Shifter Instruction Summary
Notational conventions:
Dreg denotes any Data Register File register.
Dreg_lo denotes the lower 16 bits of any Data Register File register.
Dreg_hi denotes the upper 16 bits of any Data Register File register.
* indicates the flag may be set or cleared, depending on the results of the instruction.
*0 indicates versions of the instruction that send results to accumulator A0 set or clear AV0.
*1 indicates versions of the instruction that send results to accumulator A1 set or clear AV1.
** indicates the flag is cleared.
*** indicates CC contains the latest value shifted into it.
- indicates no effect.

See Table 2-11, page 2-56 of the HRM.

Operating Modes and States
The processor supports three processor modes: User mode, Supervisor mode, and Emulation mode.
User mode is considered the domain of application programs.
Supervisor and Emulation modes are reserved for the kernel code of an operating system and have unrestricted access to the core resources.
The processor mode is determined by the Event Controller:
When servicing an interrupt, a nonmaskable interrupt (NMI), or an exception ➨ Supervisor mode.
When servicing an emulation event ➨ Emulation mode.
When not servicing any events ➨ User mode.

Operating Modes and States
The current processor mode can be identified by interrogating the IPEND memory-mapped register (MMR), as shown in the following table. Note: MMRs cannot be read while the processor is in User mode.

Identification of Processor Mode
Event | Mode | IPEND
Interrupt | Supervisor | ≥ 0x10, with IPEND[0], IPEND[1], IPEND[2], and IPEND[3] = 0.
Exception | Supervisor | ≥ 0x08. The core is processing an exception event if IPEND[0] = 0, IPEND[1] = 0, IPEND[2] = 0, IPEND[3] = 1, and IPEND[15:4] are 0's or 1's.
NMI | Supervisor | ≥ 0x04. The core is processing an NMI event if IPEND[0] = 0, IPEND[1] = 0, IPEND[2] = 1, and IPEND[15:3] are 0's or 1's.
Reset | Supervisor | = 0x02. As the reset state is exited, IPEND is set to 0x02, and the reset vector runs in Supervisor mode.
Emulation | Emulator | = 0x01. The processor is in Emulation mode if IPEND[0] = 1, regardless of the state of the remaining bits IPEND[15:1].
None | User | = 0x00
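The table can be read as a lowest-set-bit priority decode of IPEND. The Python sketch below is one way to express that reading; the function name is hypothetical, and the code models only the table above, not the Event Controller itself (on the processor, IPEND can only be read outside User mode):

```python
def classify_ipend(ipend):
    """Return (event, mode) for a 16-bit IPEND value, following the table row by row."""
    ipend &= 0xFFFF
    if ipend & 0x01:
        return ("Emulation", "Emulator")   # IPEND[0] = 1, other bits ignored
    if ipend == 0x00:
        return ("None", "User")
    if ipend & 0x02:
        return ("Reset", "Supervisor")
    if ipend & 0x04:
        return ("NMI", "Supervisor")       # higher bits may be 0 or 1
    if ipend & 0x08:
        return ("Exception", "Supervisor")
    return ("Interrupt", "Supervisor")     # value >= 0x10 with bits 3:0 clear
```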

Operating Modes and States
Non-processing states: Idle state and Reset state.

User Mode
The processor is in User mode when it is not in the Reset state or the Idle state and is not servicing an interrupt, NMI, exception, or emulation event. Explicit access to system registers is not allowed in User mode; any attempt to access restricted system registers causes an exception event.

Registers Accessible in User Mode
Processor Registers | Register Names
Data Registers | R[7:0], A[1:0]
Pointer Registers | P[5:0], SP, FP, I[3:0], M[3:0], L[3:0], B[3:0]
Sequencer and Status Registers | RETS, LC[1:0], LT[1:0], LB[1:0], ASTAT, CYCLES, CYCLES2

Protected Resources and Instructions
System resources consist of a subset of processor registers, all MMRs, and a subset of protected instructions. The system and core MMRs are located starting at address 0xFFC; this region of memory is protected from User mode access. The protected instructions are listed in the table on the next slide.

Protected Instructions
Instruction | Description
RTI | Return from Interrupt
RTX | Return from Exception
RTN | Return from NMI
CLI | Disable Interrupts
STI | Enable Interrupts
RAISE | Force Interrupt/Reset
RTE | Return from Emulation. Causes an exception only if executed outside Emulation mode.

Protected Memory
Additional memory locations can be protected from User mode access by creating and enabling a Cacheability Protection Lookaside Buffer (CPLB) entry. See "Memory Management Unit" for further information.

Entering User Mode
When coming out of reset, the processor is in Supervisor mode, servicing a reset event. To enter User mode after reset, two steps are necessary: a return address must be loaded into the RETI register, and an RTI instruction must be issued.

Example Code to Enter User Mode Upon Reset
P1.L = START ;   /* Point to start of user code */
P1.H = START ;
RETI = P1 ;
RTI ;            /* Return from Reset Event */
START :          /* Place user code here */

Return Instructions That Invoke User Mode
The return instructions that can be used to invoke User mode from the various processor event service routines are summarized in the next table. When these instructions are used in service routines, the return address must first be stored in the appropriate event RETx register. For an interrupt routine that is interruptible, the return address is stored on the stack; it is recovered by popping the value from the stack into RETI. Once RETI has been loaded, the RTI instruction can be used.
Note: the stack pop is optional. If the RETI register is not pushed and popped, the interrupt service routine becomes non-interruptible, because the return address is not saved on the stack.

Return Instructions That Can Invoke User Mode
Current Process Activity | Return Instruction to Use | Execution Resumes at Address in This Register
Interrupt Service Routine | RTI | RETI
Exception Service Routine | RTX | RETX
Nonmaskable Interrupt Service Routine | RTN | RETN
Emulation Service Routine | RTE | RETE

Supervisor Mode
Supervisor mode has full, unrestricted access to all processor system resources and emulation resources, unless a CPLB has been configured and enabled. The USP register, which refers to the User Stack Pointer in memory, can be accessed only in Supervisor mode. Note: in Supervisor mode, SP refers to the kernel stack pointer rather than to the user stack pointer.
From the Reset state, processing normally begins in Supervisor mode. The processor remains in Supervisor mode until an emulation event or a return instruction changes the mode. Before the return instruction is issued, the RETI register must be loaded with a valid return address.

Non-OS Environment
In non-OS environments, application code should remain in Supervisor mode so that it can access all core and system resources. One way to keep the processor in Supervisor mode while still allowing lower-priority events to be processed is to set up and force the lowest-priority interrupt (IVG15). Events and interrupts are covered in more detail in the "Events and Sequencing" section.

Example Code for Supervisor Mode Coming Out of Reset
P0.L = LO(EVT15) ;          /* Point to IVG15 in Event Vector Table */
P0.H = HI(EVT15) ;
P1.L = START ;              /* Point to start of User code */
P1.H = START ;
[P0] = P1 ;                 /* Place the address of start code in IVG15 of EVT */
P0.L = LO(IMASK) ;
R0 = [P0] ;
R1.L = EVT_IVG15 & 0xFFFF ;
R0 = R0 | R1 ;
[P0] = R0 ;                 /* Set (enable) IVG15 bit in Interrupt Mask Register */
RAISE 15 ;                  /* Invoke IVG15 interrupt */
P0.L = WAIT_HERE ;
P0.H = WAIT_HERE ;
RETI = P0 ;                 /* RETI loaded with return address */
RTI ;                       /* Return from Reset Event */
WAIT_HERE :                 /* Wait here till IVG15 interrupt is serviced */
JUMP WAIT_HERE ;
START:                      /* IVG15 vectors here */
[--SP] = RETI ;             /* Enables interrupts and saves return address to stack */

Emulation Mode
The processor enters Emulation mode if Emulation mode is enabled and either an external emulation event occurs or the EMUEXCPT instruction is issued. The processor remains in Emulation mode until the emulation service routine executes an RTE instruction. If no interrupts are pending when the RTE instruction executes, the processor switches to User mode; otherwise, it switches to Supervisor mode to service the interrupt.
Note: Emulation mode is the highest-priority mode, and the processor has unrestricted access to all system resources.

Idle State
The Idle state stops all processor activity at the user's discretion, usually to conserve power during lulls in activity. No processing occurs in the Idle state.
The Idle state is invoked by a sequential IDLE instruction, which notifies the processor hardware that the Idle state is requested. The SSYNC instruction purges all speculative and transient states in the core and external system.
The processor remains in the Idle state until a peripheral or external device, such as a SPORT or the Real-Time Clock (RTC), generates an interrupt that requires servicing. If core interrupts are disabled, the Idle state can be terminated by asserting a WAKEUP signal. When WAKEUP is asserted, the processor wakes up and the STI instruction enables interrupts again.

Example Code for Transition to Idle State
CLI R0 ;   /* disable interrupts */
IDLE ;     /* drain pipeline and send core into IDLE state */
STI R0 ;   /* re-enable interrupts after wakeup */

Reset State
The Reset state initializes the processor logic. During the Reset state, application programs and the operating system do not execute, and clocks are stopped. The processor remains in the Reset state as long as external logic asserts the external RESET signal. Upon de-assertion, the processor switches to Supervisor mode and executes the code found at the reset event vector.

Reset State
The Reset state can also be invoked without the external RESET signal: in Supervisor mode or Emulation mode, by issuing the reset version of the RAISE instruction. Application programs in User mode cannot invoke the Reset state, except through a system call provided by an operating system kernel.

Booting Methods
For more information about booting methods, consult the HRM.

Program Sequencer
The program sequencer controls all program flow and contains a 10-stage instruction pipeline.
It maintains in-program branching: subroutines, jumps, interrupts, and exceptions.
It maintains loops and includes zero-overhead loop registers, so there is no cost for wrapping from loop bottom to loop top.
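The idea of a zero-overhead loop can be illustrated with a simplified Python model of how a sequencer might pick the next instruction address from loop top (LT), loop bottom (LB), and loop count (LC) values. Instruction addresses are modelled as consecutive integers, and the exact point at which the counter is decremented is an assumption of this sketch, not a statement about the hardware:

```python
def run_zero_overhead_loop(loop_top, loop_bottom, loop_count, end):
    """Return the sequence of instruction addresses executed up to (not including) `end`."""
    trace = []
    pc = loop_top
    while pc < end:
        trace.append(pc)
        if pc == loop_bottom and loop_count > 0:
            loop_count -= 1
            if loop_count != 0:
                pc = loop_top      # wrap to loop top without a branch instruction
                continue
        pc += 1                    # otherwise continue with the next sequential address
    return trace
```

The trace contains only loop-body addresses between iterations: no extra compare or branch instruction appears in the instruction stream.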

Program Flow Variations
[Figure: program flow variations.]

Program Sequencer
The sequencer manages execution of these program structures by selecting the address of the next instruction to execute. The fetched address of the next instruction enters the 10-stage super-pipeline. The sequencer ensures that the pipeline is fully interlocked and that all data hazards are hidden from the programmer. If an instruction requires data to be fetched, the pipeline stalls until that data is available. See the EE-197 application note for a complete list of stalls and multi-cycle instructions.

Program Sequencer (cont.)
- The pipeline contains the 32-bit addresses of the instructions currently being fetched, decoded, and executed, ending with the program counter (PC).
- The PC couples with the RETn registers, which store return addresses.
- All addresses generated by the sequencer are 32-bit memory instruction addresses.

Program Sequencer (cont.)
- Handles events together with the Event Controller (interrupt and event processing):
  - Determines whether an interrupt is masked
  - Generates the appropriate event vector address
- In addition to providing data addresses, the data address generators (DAGs) can provide instruction addresses for the sequencer's indirect branches.
- The sequencer evaluates conditional instructions and loop termination conditions.
- The loop registers support nested loops.
- The memory-mapped registers (MMRs) store information used to implement interrupt service routines.

Sequencer Related Registers
- PC and SEQSTAT are the only registers that are not directly readable and writable.
- Registers are pushed to and popped from the stack explicitly with these instructions:
    [--SP] = Rn ;   // Push
    Rn = [SP++] ;   // Pop
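The push and pop forms above can be sketched as a toy model in Python. This is only an illustration: the `Stack` class, its starting address, and the dictionary used as memory are assumptions made for the sketch, not part of the Blackfin toolchain. It shows that a push pre-decrements SP by 4 bytes before storing, and a pop loads first and then post-increments SP.

```python
class Stack:
    """Toy model of a descending stack addressed through SP (hypothetical helper)."""

    def __init__(self, sp=0x1000):
        self.sp = sp      # stack pointer, in bytes
        self.mem = {}     # sparse word-addressed memory: address -> 32-bit value

    def push(self, value):
        """Models [--SP] = Rn : pre-decrement SP, then store."""
        self.sp -= 4
        self.mem[self.sp] = value & 0xFFFFFFFF

    def pop(self):
        """Models Rn = [SP++] : load, then post-increment SP."""
        value = self.mem[self.sp]
        self.sp += 4
        return value


stack = Stack()
stack.push(0x11111111)
stack.push(0x22222222)
last = stack.pop()   # values come back in reverse order of the pushes
```

Because the stack is last-in, first-out, registers must be popped in the reverse of the order in which they were pushed.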

Sequencer-Related Registers
Register Name        Description
SEQSTAT              Sequencer Status Register
RETX                 Exception Return
RETN                 NMI Return
RETI                 Interrupt Return
RETE                 Emulation Return
RETS                 Subroutine Return
LC0, LC1             Zero-overhead loop registers: Loop Counters
LT0, LT1             Zero-overhead loop registers: Loop Tops
LB0, LB1             Zero-overhead loop registers: Loop Bottoms
FP, SP               Frame Pointer and Stack Pointer
SYSCFG               System Configuration Register
CYCLES, CYCLES2      Cycle Counters
PC                   Program Counter

SEQSTAT Register
- The Sequencer Status Register (SEQSTAT) contains:
  - Information about the current state of the sequencer
  - Diagnostic information from the last event
- SEQSTAT is a read-only register, accessible only in Supervisor Mode.

Sequencer Status Register Details

Zero-Overhead Registers (LC, LT & LB)
Two sets of registers implement zero-overhead loops, using hardware to set up the counters and evaluate the loop conditions.

Registers    Description     Function
LC0, LC1     Loop Counters   Maintains a count of the remaining iterations of the loop.
LT0, LT1     Loop Tops       Holds the address of the first instruction within a loop.
LB0, LB1     Loop Bottoms    Holds the address of the last instruction of the loop.

SYSCFG Register
- The System Configuration Register (SYSCFG) controls the configuration of the processor.
- This register is accessible only from Supervisor Mode.

System Configuration Register Details

Instruction Pipeline
- The program sequencer determines the next instruction address by examining both:
  - The current instruction being executed
  - The current state of the processor
- If no conditions require otherwise, the processor executes instructions from memory in sequential order by incrementing the lookahead address.

Stages of Instruction Pipeline
Pipeline Stage               Description
Instruction Fetch 1 (IF1)    Issue instruction address to IAB bus; start tag compare of instruction cache
Instruction Fetch 2 (IF2)    Wait for instruction data
Instruction Fetch 3 (IF3)    Read from IDB bus and align instruction
Instruction Decode (DEC)     Decode instructions
Address Calculation (AC)     Calculation of data addresses and branch target address
Data Fetch 1 (DF1)           Issue data address to DA0 and DA1 bus; start tag compare of data cache
Data Fetch 2 (DF2)           Read register files
Execute 1 (EX1)              Read data from LD0 and LD1 bus; start multiply and video instructions
Execute 2 (EX2)              Execute/complete instructions (shift, add, logic, etc.)
Write Back (WB)              Writes back to register files, SD bus, and pointer updates (also referred to as the "commit" stage)
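To make the ten stages concrete, here is a minimal Python sketch of which stage an instruction occupies on a given cycle. It assumes an idealized pipeline with one instruction entering IF1 per cycle and no stalls or kills. The function name `stage_at` and the cycle numbering are assumptions for illustration only.

```python
STAGES = ["IF1", "IF2", "IF3", "DEC", "AC", "DF1", "DF2", "EX1", "EX2", "WB"]


def stage_at(instr_index, cycle):
    """Return the stage occupied by instruction `instr_index` at `cycle`.

    Assumes instruction i enters IF1 at cycle i and advances one stage per
    cycle (no stalls). Returns None if the instruction is not in the pipeline.
    """
    position = cycle - instr_index
    if 0 <= position < len(STAGES):
        return STAGES[position]
    return None


# Snapshot of the pipeline at cycle 9: instruction 0 commits while 9 is fetched.
snapshot = {i: stage_at(i, 9) for i in range(10)}
```

In this idealized view an instruction needs ten cycles to travel from IF1 to WB, but one instruction can still complete every cycle once the pipeline is full.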

Diagram of the Processor Pipeline

Program Sequencer (cont.)
- Most common numeric operations have no instruction latency.
- The VisualDSP++ Pipeline Viewer highlights Stall and Kill conditions.

Branching and Sequencing
A branch occurs when a JUMP or CALL instruction begins execution at a new location other than the next sequential address.
JUMP: the sequencer supports both
- Short jumps (13-bit immediate PC-relative offset value)
- Long jumps (25-bit immediate PC-relative offset value)
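The practical meaning of the two offset widths can be sketched in Python. The helper names are hypothetical, and the sketch assumes that the quoted widths describe signed byte offsets whose bit 0 is always zero (so a 13-bit field spans -4096 to +4094). The mnemonics JUMP.S and JUMP.L are used here as labels for the short and long forms.

```python
def fits_pcrel(offset, bits):
    """True if a byte offset fits a signed `bits`-bit field with bit 0 forced to 0."""
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 2   # largest even value in range
    return offset % 2 == 0 and lowest <= offset <= highest


def pick_jump(offset):
    """Choose the shortest jump form that can reach `offset`, or None if out of reach."""
    if fits_pcrel(offset, 13):
        return "JUMP.S"
    if fits_pcrel(offset, 25):
        return "JUMP.L"
    return None
```

For example, `pick_jump(4094)` selects the short form, while an offset of 4096 bytes already needs the long form; an odd offset is rejected because instruction addresses are always even.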

CALL:
- Direct call: the instruction carries a 25-bit PC-relative offset that must be a multiple of two (bit zero must be 0).
Indirect Call and Branch:
- The destination address is obtained from a Data Address Generator (DAG) P-register.
- For CALL, the RETS register is loaded with the address of the instruction that would have executed next.
Example:
    JUMP (P3) ;
    CALL (P0) ;

PC-Relative Indirect Branch and Call
- PC-relative indirect JUMP and CALL instructions use the contents of a P-register as an offset to the branch target.
- For the CALL instruction, the RETS register is loaded with the address of the instruction that would have executed next.
Example:
    JUMP (PC+P3) ;
    CALL (PC+P0) ;

Condition Code (CC) Flag
The CC flag is used to resolve the direction of a branch. It may be accessed in 8 ways:
1. A conditional branch is resolved by the value in CC.
2. A data register value may be copied into CC, and the value in CC may be copied to a data register.
3. The BITTST instruction accesses the CC flag.
4. A status flag may be copied into CC, and the value in CC may be copied to a status flag.
5. The CC flag bit may be set to the result of a Pointer register comparison.
6. The CC flag bit may be set to the result of a Data register comparison.
7. Some shifter instructions (rotate or BXOR) use CC as a portion of the shift operand/result.
8. Test-and-set instructions can set and clear the CC bit.

Condition Code (CC) Flag (cont.)
- These eight ways of accessing the CC bit are used to control program flow.
- A single bit in the instruction encoding specifies how the value of CC is interpreted: "branch on true" or "branch on false."
- The comparison operations have the form CC = expr, where expr involves either:
  - A pair of registers of the same type (Data registers or Pointer registers), or
  - A single register and a small immediate constant.
- The small immediate constant is a 3-bit signed number (-4 through 3) for signed comparisons and a 3-bit unsigned number (0 through 7) for unsigned comparisons.

Condition Code (CC) Flag (cont.)
- The actual test condition is determined by the operators:
    ==          // Equal
    <, >        // Less than, greater than
    <=, >=      // Less than or equal, greater than or equal
- Additional bit test operations test whether a bit in a 32-bit R-register is set.

Conditional Branches
- Conditional JUMP instructions either branch on the value of the CC bit or continue with the next sequential instruction.
- Branch target: a PC-relative address, computed from the location of the instruction plus an offset.
- Offset: 11-bit signed immediate value that must be a multiple of two (-1024 to +1022 bytes).
Example:
    IF CC JUMP dest_address ;
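The interaction between a comparison that sets CC and a conditional jump that consumes it can be sketched in Python. This is a simplified model, not the processor's actual logic: the function names are hypothetical, only the ==, < and <= operators are modelled, and the fall-through step of 2 bytes assumes a 16-bit conditional jump encoding.

```python
def imm3_in_range(value, signed=True):
    """Check the 3-bit immediate range quoted for CC comparisons."""
    return -4 <= value <= 3 if signed else 0 <= value <= 7


def cc_compare(op, a, b):
    """Model of CC = a <op> b for a subset of the comparison operators."""
    results = {"==": a == b, "<": a < b, "<=": a <= b}
    return results[op]


def next_pc(pc, cc, offset, branch_on_true=True, instr_len=2):
    """Resolve a conditional jump: take the branch if CC matches the encoded sense.

    `instr_len` (bytes of the jump instruction itself) is an assumption.
    """
    taken = (cc == branch_on_true)
    return pc + offset if taken else pc + instr_len


cc = cc_compare("<", 2, 3)          # CC becomes true
target = next_pc(0x100, cc, 0x20)   # models IF CC JUMP with a +0x20 byte offset
```

Passing `branch_on_true=False` models the "branch on false" sense selected by the single interpretation bit in the instruction encoding.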

Conditional Register Move
- Register moves can be conditioned on the value of the CC flag.
- Using this instruction when possible eliminates the cycles associated with a conditional branch.
- Conditional moves can be done between R and P registers (including SP and FP).
Example:
    IF CC R0 = P0 ;

Branch Prediction
- The sequencer supports static branch prediction to accelerate execution of conditional branches.
- The sequencer compares the actual CC bit value to the predicted value in the EX2 stage of the pipeline.
- If the value is mispredicted:
  - The branch is corrected
  - The correct address is made available for the WB stage of the pipeline

Branch Prediction (cont.)
Branch latency:
- Predicted "not to take branch," branch actually not taken: 0 CCLK cycles
- Predicted "not to take branch," branch actually taken: 8 CCLK cycles
- Predicted "to take branch," branch actually taken: 4 CCLK cycles
- Predicted "to take branch," branch actually not taken: 8 CCLK cycles
Summary:
- On erroneous prediction: 8 CCLK
- On correct prediction:
  - Not to take branch: 0 CCLK
  - To take branch: 4 CCLK
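The four latency cases can be captured in a few lines of Python, which also makes it easy to estimate the average cost of a branch for a given prediction choice. The function names and the probability-based average are illustrative assumptions; only the cycle counts come from the list above.

```python
def branch_latency(predicted_taken, actually_taken):
    """CCLK cycles of latency for a conditional branch, per the cases listed above."""
    if predicted_taken != actually_taken:
        return 8                      # any misprediction
    return 4 if actually_taken else 0  # correct prediction


def average_latency(predicted_taken, p_taken):
    """Expected latency when the branch is taken with probability `p_taken`."""
    return (p_taken * branch_latency(predicted_taken, True)
            + (1.0 - p_taken) * branch_latency(predicted_taken, False))
```

Under this model, a loop-closing branch that is taken 90% of the time costs about 4.4 cycles on average when predicted taken, versus about 7.2 cycles when predicted not taken, which is why the prediction hint matters for frequently taken branches.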

Unconditional Branches
- The branch target address is computed in the AC stage of the pipeline.
- The target address is sent to the Instruction Fetch Address bus at the beginning of the DF1 stage.
- All unconditional branches have 4 CCLK cycles of latency.

Instruction: IF CC JUMP dest (bp) ;
Description: This instruction tests the CC flag and, if it is set, jumps to the location identified by the label dest. If the CC flag is set, the branch is correctly predicted and the branch latency is reduced; otherwise, the branch is incorrectly predicted and the branch latency increases.

Loops and Sequencing
- The sequencer supports zero-overhead looping with two loop units of three registers each:
  - Loop Top register: LT0, LT1
  - Loop Bottom register: LB0, LB1
  - Loop Counter register: LC0, LC1
- When an instruction at address X is executed and X matches the contents of LB0 (PC == LB0), the next instruction executed comes from the address in LT0.
- Loop unit 1 is assigned higher priority so it can be used as the inner loop in a nested loop structure: loop unit 0 is allowed to loop back only after loop count 1 is exhausted.
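The loop-back rule for a single loop unit can be sketched as a small Python trace. This is a simplified model: instruction addresses are replaced by list indices, the function name `trace_loop` is hypothetical, and the exact point at which the hardware decrements the counter is an assumption. A counter of zero is treated as a disabled loop, so the body executes once as straight-line code.

```python
def trace_loop(program_len, lt, lb, lc):
    """Return (executed instruction indices, final loop count) for one loop unit.

    lt / lb are the indices of the first and last loop instructions,
    lc is the initial loop count.
    """
    executed = []
    pc = 0
    while pc < program_len:
        executed.append(pc)
        if pc == lb and lc > 0:       # bottom of an active loop reached
            lc -= 1                   # one iteration finished
            if lc > 0:                # iterations remain: wrap to the top
                pc = lt
                continue
        pc += 1                       # otherwise fall through sequentially
    return executed, lc


# Four instructions; instructions 1..2 form the loop body, executed 3 times.
order, remaining = trace_loop(program_len=4, lt=1, lb=2, lc=3)
```

The wrap from the bottom to the top happens inside the fetch-address selection in this model, which mirrors why the hardware loop costs no extra branch instruction per iteration.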

Loops and Sequencing (cont.)
- The LSETUP instruction is used to load all three registers of a loop unit at once.
- Each loop register can also be loaded individually with a register transfer.
- An individual transfer incurs a significant overhead if the loop count is nonzero (the loop is active) at the time of the transfer.

Address Registers
- Pointer register file (32 bits each): P0, P1, P2, P3, P4, P5, FP, SP, USP
- DAG registers (32 bits each): I0-I3 (Index), L0-L3 (Length), B0-B3 (Base), M0-M3 (Modify)

Loop Example
    P5 = 0x20 ;                              // 32 in decimal
    LSETUP ( lp_start, lp_end ) LC0 = P5 ;   // Loads LT0 = lp_start, LB0 = lp_end, LC0 = P5
    lp_start: R5 = R0 + R1 (ns) || R2 = [P2++] || R3 = [I1++] ;   // (ns): no saturation
    lp_end:   R5 = R5 + R2 ;
Note: When LCx = 0 the loop is disabled and a single pass of the code executes.

Loop Registers
Top/First address of the loop:
- PC-relative offset used to compute the loop start address: 5-bit signed immediate; must be a multiple of 2
- Effective range of the loop start instruction: 0 to 30 bytes away from the LSETUP instruction
Bottom/Last address of the loop:
- PC-relative offset: 11-bit signed immediate; must be a multiple of 2
- Effective range: 0 to 2046 bytes away from the LSETUP instruction (the defined loop can be 2046 bytes long)

Loop Instruction Fetch
- A four-location instruction loop buffer is used to reduce instruction fetches while in loops.
- If the loop code contains four or fewer instructions, no fetches to instruction memory are necessary for any number of loop iterations.
- A four-cycle latency occurs on the first loopback when the LSETUP specifies a nonzero start offset (lp_start). Zero start offsets are therefore preferred.

Events and Sequencing

Event Controller
The Event Controller manages 5 types of events:
- Emulation (via JTAG interface)
- Reset (via software or external pin)
- Non-Maskable Interrupt (NMI): for events that require immediate processor attention (via software, external pin, or Watchdog)
- Exception: synchronous to program flow. The exception is taken before the instruction is allowed to complete.
- Interrupts: asynchronous to program flow; caused by input pins, timers, and other peripherals:
  - Hardware Error
  - Core Timer
  - 9 general-purpose interrupts for servicing peripherals, which can be custom prioritized for optimal system performance
All events can be serviced by Interrupt Service Routines (ISRs).

Interrupts vs. Exceptions
EXCEPTIONS
- A software-initiated event whose effects are synchronous to program flow.
- Service exception:
  - The return address (RETX) is the address following the excepting instruction
  - The excepting instruction is never re-executed
  - The EXCPT instruction is in this category
- Error condition exception:
  - The return address (RETX) is the address of the excepting instruction
  - The excepting instruction is re-executed
INTERRUPTS
- An interrupt is an event that changes the normal processor instruction flow and is asynchronous to program flow.
- Hardware-generated:
  - Asynchronous to program flow
  - Requested by a peripheral
- Software-generated:
  - Synchronous to program flow
  - Generated by the RAISE instruction
- All instructions preceding the interrupt in the pipeline are killed.
The Blackfin is always in Supervisor Mode while executing event handler software; it can be in User Mode only while executing application tasks.
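The difference between the two exception categories comes down to which address is saved as the return address. A minimal Python sketch follows; the function name, the string labels for the two kinds, and the explicit instruction length parameter are assumptions for illustration.

```python
def exception_return_address(pc, instr_len, kind):
    """Return address saved when the instruction at `pc` raises an exception.

    kind == "service": resume after the excepting instruction (never re-executed).
    kind == "error":   resume at the excepting instruction (it is re-executed).
    """
    if kind == "service":
        return pc + instr_len
    if kind == "error":
        return pc
    raise ValueError(f"unknown exception kind: {kind!r}")
```

For a 2-byte instruction at address 0x2000, a service exception resumes at 0x2002, whereas an error exception resumes at 0x2000 so that the instruction can run again once the handler has corrected the cause.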

Processor Event Controller
The processor event controller consists of 2 stages:
- The Core Event Controller (CEC)
- The System Interrupt Controller (SIC)
Conceptually:
- Interrupts from the peripherals arrive at the SIC.
- The SIC works with the CEC to prioritize and control all system interrupts.
- The SIC provides the mapping between the many peripheral interrupt sources and the prioritized general-purpose interrupt inputs of the core.
- This mapping is programmable, and individual interrupt sources can be masked in the SIC.

System Interrupt Controller (SIC)
- The SIC provides mapping and routing of events:
  - From: peripheral interrupt sources
  - To: prioritized general-purpose interrupt inputs of the CEC
- The processor's default mapping can be altered by the user via the Interrupt Assignment Registers (IAR).

Core Event Controller (CEC)
- The CEC supports 9 general-purpose interrupts: IVG7-15
  - IVG14-15: the 2 lowest-priority interrupts, for software handlers
  - IVG7-13: the 7 highest-priority interrupts, to support peripherals
- Additional dedicated interrupt and exception events are described in the table on the next slide.

Core Event Mapping
Core Events:
Event Source                    Core Event Name
Emulation (highest priority)    EMU
Reset                           RST
NMI                             NMI
Exception                       EVX
Reserved                        -
Hardware Error                  IVHW
Core Timer                      IVTMR

System Event Mapping
System Interrupts:
Event Source                          Core Event Name
PLL Wakeup Interrupt                  IVG7
DMA Error (generic)                   IVG7
PPI Error Interrupt                   IVG7
SPORT0 & SPORT1 Error Interrupt       IVG7
SPI Error Interrupt                   IVG7
UART Error Interrupt                  IVG7
Real-Time Clock Interrupt             IVG8
DMA0 Interrupt (PPI)                  IVG8
DMA1 Interrupt (SPORT0 RX)            IVG9
DMA2 Interrupt (SPORT0 TX)            IVG9
DMA3 Interrupt (SPORT1 RX)            IVG9
DMA4 Interrupt (SPORT1 TX)            IVG9
DMA5 Interrupt (SPI)                  IVG10
DMA6 Interrupt (UART RX)              IVG10
DMA7 Interrupt (UART TX)              IVG10
Timer0, Timer1, Timer2 Interrupts     IVG11
Programmable Flags Interrupt A/B      IVG12

System Event Mapping (cont.)
System Interrupts:
Event Source                                 Core Event Name
DMA8/9 Interrupt (Memory DMA Stream 0)       IVG13
DMA10/11 Interrupt (Memory DMA Stream 1)     IVG13
Software Watchdog Timer                      IVG13
Software Interrupt 1                         IVG14
Software Interrupt 2 (lowest priority)       IVG15

Event Processing Flow

Interrupt Service Routine
- The ISR address is stored in the Event Vector Table.
  - It is used as the next fetch address when the event occurs.
  - The Program Counter (PC) address is saved to a register (RETI, RETX, RETN, or RETE), based on the event.
- An ISR always concludes with a "return" instruction: RTI, RTX, RTN, or RTE (respectively).
  - When it executes, the PC is loaded with the address stored in RETI, RETX, RETN, or RETE to continue the application code.
- Optional nesting of higher-priority interrupts is possible.
- See application note EE-192, which covers writing interrupt routines in C.

System Interrupt Processing
1. SIC_ISR logs the request and keeps track of system interrupts that are asserted but not yet serviced (that is, an interrupt service routine has not yet cleared the interrupt).
2. SIC_IWR checks whether it should wake up the core from an idled state based on this interrupt request.
3. SIC_IMASK masks off or enables interrupts from peripherals at the system level. If Interrupt A is not masked, the request proceeds to Step 4.
4. The SIC_IARx registers, which map the peripheral interrupts to a smaller set of general-purpose core interrupts (IVG7 to IVG15), determine the core priority of Interrupt A.

System Interrupt Processing (cont.)
5. ILAT adds Interrupt A to its log of interrupts latched by the core but not yet actively being serviced.
6. IMASK masks off or enables events of different core priorities. If the IVGx event corresponding to Interrupt A is not masked, the process proceeds to Step 7.
7. The Event Vector Table (EVT) is accessed to look up the appropriate vector for Interrupt A's interrupt service routine (ISR).
8. When the event vector for Interrupt A has entered the core pipeline, the appropriate IPEND bit is set, which clears the respective ILAT bit. Thus, IPEND tracks all pending interrupts, as well as those being presently serviced.

System Interrupt Processing (cont.)
9. When the interrupt service routine (ISR) for Interrupt A has been executed, the RTI instruction clears the appropriate IPEND bit. However, the relevant SIC_ISR bit is not cleared unless the interrupt service routine clears the mechanism that generated Interrupt A, or the process of servicing the interrupt clears this bit.
Note: Emulation, reset, NMI, and exception events, as well as hardware error (IVHW) and core timer (IVTMR) interrupt requests, enter the interrupt processing chain at the ILAT level and are not affected by the system-level interrupt registers (SIC_IWR, SIC_ISR, SIC_IMASK, SIC_IARx).
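The path of a peripheral interrupt through the system-level and core-level registers can be sketched as a small Python model. It is a deliberately simplified illustration: registers are represented as Python sets rather than bit fields, wake-up (SIC_IWR) and interrupt nesting are not modelled, and the class and method names are hypothetical. The two mapping entries used in the usage lines (peripheral ID 7 to IVG8, peripheral ID 16 to IVG11) follow the reset assignments listed in this deck.

```python
class InterruptModel:
    """Toy model of the SIC -> CEC interrupt path (not a register-accurate model)."""

    def __init__(self, iar):
        self.iar = dict(iar)     # SIC_IARx: peripheral ID -> IVG number
        self.sic_isr = set()     # SIC_ISR: asserted, not yet cleared peripheral IDs
        self.sic_imask = set()   # SIC_IMASK: peripheral IDs enabled at system level
        self.ilat = set()        # ILAT: IVG levels latched, not yet serviced
        self.imask = set()       # IMASK: IVG levels enabled at core level
        self.ipend = set()       # IPEND: IVG levels pending or being serviced

    def request(self, peripheral_id):
        """A peripheral asserts its interrupt."""
        self.sic_isr.add(peripheral_id)             # logged in SIC_ISR
        if peripheral_id in self.sic_imask:         # enabled in SIC_IMASK?
            self.ilat.add(self.iar[peripheral_id])  # mapped by SIC_IARx, latched in ILAT

    def dispatch(self):
        """Vector to the highest-priority latched and unmasked IVG, if any."""
        ready = self.ilat & self.imask
        if not ready:
            return None
        ivg = min(ready)            # lower IVG number means higher priority
        self.ilat.discard(ivg)      # setting IPEND clears the ILAT bit
        self.ipend.add(ivg)
        return ivg

    def rti(self, cleared_peripheral=None):
        """Return from interrupt: clears IPEND; SIC_ISR clears only if the ISR
        removed the source of the request."""
        ivg = min(self.ipend)
        self.ipend.discard(ivg)
        if cleared_peripheral is not None:
            self.sic_isr.discard(cleared_peripheral)
        return ivg


model = InterruptModel({7: 8, 16: 11})
model.sic_imask = {7, 16}
model.imask = {8, 11}
model.request(16)
model.request(7)
first = model.dispatch()   # IVG8 is serviced before IVG11
```

The model makes one point of the flow easy to see: returning from the handler clears only the core-level IPEND state, so a handler that does not clear the peripheral's request leaves it logged at the system level.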

System Peripheral Interrupts
The processor system has numerous peripherals, which therefore require many supporting interrupts. The table on the next slides lists:
- The peripheral interrupt source
- The peripheral interrupt ID used in the System Interrupt Assignment registers (SIC_IARx). See "System Interrupt Assignment Registers (SIC_IARx)" in the HWR Manual.
- The general-purpose interrupt of the core to which the interrupt maps at reset
- The core interrupt ID used in the System Interrupt Assignment registers (SIC_IARx). See "System Interrupt Assignment Registers (SIC_IARx)" in the HWR Manual.

Peripheral Interrupt Source Reset State
Peripheral Interrupt Source                    Peripheral     General-purpose Interrupt    Core
                                               Interrupt ID   (Assignment at Reset)        Interrupt ID
PLL Wakeup Interrupt                           0              IVG7                         0
DMA Error (generic)                            1              IVG7                         0
PPI Error Interrupt                            2              IVG7                         0
SPORT0 Error Interrupt                         3              IVG7                         0
SPORT1 Error Interrupt                         4              IVG7                         0
SPI Error Interrupt                            5              IVG7                         0
UART Error Interrupt                           6              IVG7                         0
Real-Time Clock Interrupts                     7              IVG8                         1
  (alarm, second, minute, hour, countdown)
DMA 0 Interrupt (PPI)                          8              IVG8                         1
DMA 1 Interrupt (SPORT0 RX)                    9              IVG9                         2

Peripheral Interrupt Source Reset State (cont.)
Peripheral Interrupt Source                    Peripheral     General-purpose Interrupt    Core
                                               Interrupt ID   (Assignment at Reset)        Interrupt ID
DMA 2 Interrupt (SPORT0 TX)                    10             IVG9                         2
DMA 3 Interrupt (SPORT1 RX)                    11             IVG9                         2
DMA 4 Interrupt (SPORT1 TX)                    12             IVG9                         2
DMA 5 Interrupt (SPI)                          13             IVG10                        3
DMA 6 Interrupt (UART RX)                      14             IVG10                        3
DMA 7 Interrupt (UART TX)                      15             IVG10                        3
Timer0 Interrupt                               16             IVG11                        4
Timer1 Interrupt                               17             IVG11                        4
Timer2 Interrupt                               18             IVG11                        4
PF Interrupt A                                 19             IVG12                        5
PF Interrupt B                                 20             IVG12                        5

Peripheral Interrupt Source Reset State (cont.)
Peripheral Interrupt Source                    Peripheral     General-purpose Interrupt    Core
                                               Interrupt ID   (Assignment at Reset)        Interrupt ID
DMA 8/9 Interrupt (Memory DMA Stream 0)        21             IVG13                        6
DMA 10/11 Interrupt (Memory DMA Stream 1)      22             IVG13                        6
Software Watchdog Timer Interrupt              23             IVG13                        6
Reserved                                       24-31          -                            -

System Peripheral Interrupts
- An interrupt service routine that supports multiple interrupt sources must interrogate the appropriate system memory-mapped registers (MMRs) to determine which peripheral generated the interrupt.
- If the default assignments shown in Table 4-7 are acceptable, then interrupt initialization involves only:
  - Initialization of the core Event Vector Table (EVT) vector address entries
  - Initialization of the IMASK register
  - Unmasking the specific peripheral interrupts in SIC_IMASK that the system requires

BF533 System & Core Interrupt Controllers
System Interrupt Controller (default mapping):
System Interrupt Source             IVG #
PLL Wakeup interrupt                IVG7
DMA error (generic)                 IVG7
PPI error interrupt                 IVG7
SPORT0 error interrupt              IVG7
SPORT1 error interrupt              IVG7
SPI error interrupt                 IVG7
UART error interrupt                IVG7
RTC interrupt                       IVG8
DMA 0 interrupt (PPI)               IVG8
DMA 1 interrupt (SPORT0 RX)         IVG9
DMA 2 interrupt (SPORT0 TX)         IVG9
DMA 3 interrupt (SPORT1 RX)         IVG9
DMA 4 interrupt (SPORT1 TX)         IVG9
DMA 5 interrupt (SPI)               IVG10
DMA 6 interrupt (UART RX)           IVG10
DMA 7 interrupt (UART TX)           IVG10
Timer0 interrupt                    IVG11
Timer1 interrupt                    IVG11
Timer2 interrupt                    IVG11
PF interrupt A                      IVG12
PF interrupt B                      IVG12
DMA 8/9 interrupt (MemDMA0)         IVG13
DMA 10/11 interrupt (MemDMA1)       IVG13
Watchdog Timer Interrupt            IVG13

Core Event Controller (priority runs from highest at the top to lowest at the bottom):
Event Source              IVG #    Core Event Name
Emulator                  0        EMU
Reset                     1        RST
Non Maskable Interrupt    2        NMI
Exceptions                3        EVSW
Reserved                  4        -
Hardware Error            5        IVHW
Core Timer                6        IVTMR
General Purpose 7         7        IVG7
General Purpose 8         8        IVG8
General Purpose 9         9        IVG9
General Purpose 10        10       IVG10
General Purpose 11        11       IVG11
General Purpose 12        12       IVG12
General Purpose 13        13       IVG13
General Purpose 14        14       IVG14
General Purpose 15        15       IVG15

Exception Handling
- Interrupts and exceptions treat instructions in the pipeline differently.
- When an interrupt occurs, all instructions in the pipeline are aborted.
- When an exception occurs, all instructions in the pipeline after the excepting instruction are aborted.
- For error exceptions, the excepting instruction is also aborted.

Deferring Exception Processing
- Exception handlers are usually long routines, because they must discriminate among several exception causes and take corrective action accordingly.
- To avoid lengthy suspension of interrupts, an exception handler should be written to identify the exception cause but defer the processing to a low-priority interrupt.
- To set up the low-priority interrupt handler, use the Force Interrupt / Reset instruction (RAISE).
- For the following example, consult the Event Exception Table, which defines the EXCAUSE field (bits [5:0]) of the SEQSTAT register.

Example Code for an Exception Handler
    /* Determine exception cause by examining EXCAUSE field in SEQSTAT
       (first save contents of R0, P0, P1 and ASTAT in Supervisor SP) */
    [--SP] = R0 ;             // Push R0 onto the stack
    [--SP] = P0 ;             // Push P0 onto the stack
    [--SP] = P1 ;             // Push P1 onto the stack
    [--SP] = ASTAT ;          // Push ASTAT onto the stack
    R0 = SEQSTAT ;            // Read the Sequencer Status Register
    /* Mask the contents of SEQSTAT, and leave only EXCAUSE in R0 */
    R0 <<= 26 ;               // Shift left by 26 bit positions
    R0 >>= 26 ;               // Shift right by 26 bit positions,
                              // leaving the 32-26 = 6 lower bits [5:0]
    /* Using jump table EVTABLE, jump to the event pointed to by R0 */
    P0 = R0 ;                 // Load pointer register P0 with the event code
    P1 = _EVTABLE ;           // Point P1 to EVTABLE
    P0 = P1 + ( P0 << 1 ) ;   // P0 = address of the 16-bit table entry for this event
    R0 = W [ P0 ] (Z) ;       // Read the 16-bit entry at P0, zero-extended
    P1 = R0 ;                 // Event address offset stored in P1
    JUMP (PC + P1) ;          // Unconditional relative jump by P1 from the current PC

Example Code for an Exception Handler (cont.)
    /* The entry point for an event is as follows. Here, processing is
       deferred to low priority interrupt IVG15. Also, parameter passing
       would typically be done here. */
    _EVENT1:                  // EVENT1 handling
        RAISE 15 ;
        JUMP.S _EXIT ;
    /* Entry for event at IVG14 */
    _EVENT2:                  // EVENT2 handling
        RAISE 14 ;
        JUMP.S _EXIT ;        // Without this jump, control would fall through
    /* Comments for other events */
    …
    /* At the end of handler, restore R0, P0, P1 and ASTAT, and return. */
    _EXIT:
        ASTAT = [SP++] ;
        P1 = [SP++] ;
        P0 = [SP++] ;
        R0 = [SP++] ;
        RTX ;

Example Code for an Exception Handler (cont.)
    _EVTABLE:
        .byte2 addr_event1;
        .byte2 addr_event2;
        ...
        .byte2 addr_eventN;

Example Code for an Exception Handler (cont.)
    /* The jump table EVTABLE holds 16-bit address offsets for each event.
       With offsets, this code is position independent and the table is small.

       | addr_event1 |   _EVTABLE
       | addr_event2 |   _EVTABLE + 2
       |     ...     |
       | addr_eventN |   _EVTABLE + 2N
    */
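The two arithmetic steps in the handler, isolating the 6-bit EXCAUSE field with a pair of shifts and turning the event code into the address of a 16-bit table entry, can be checked with a short Python sketch. The function names and the sample register values are assumptions for illustration; Python integers are unbounded, so the 32-bit register width is applied explicitly.

```python
WORD_MASK = 0xFFFFFFFF   # 32-bit register width


def excause(seqstat):
    """Isolate SEQSTAT bits [5:0] the same way the handler does."""
    r0 = (seqstat << 26) & WORD_MASK   # R0 <<= 26 : discard the upper 26 bits
    return r0 >> 26                    # R0 >>= 26 : logical shift back down


def table_entry_address(evtable, code):
    """Address of the 16-bit table entry for `code`: P0 = P1 + (P0 << 1)."""
    return evtable + (code << 1)


code = excause(0x0000E021)                       # sample SEQSTAT value (hypothetical)
entry = table_entry_address(0xFFA00000, code)    # sample table address (hypothetical)
```

The shift pair is equivalent to masking with 0x3F, and the left shift by one in the address computation reflects that each table entry occupies two bytes.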

Example Code for an Exception Routine
The following code provides an example framework for an interrupt routine jumped to from an exception handler such as the one described above.
    [--SP] = RETI ;   // Push return address on stack.
    // Put body of routine here.
    RETI = [SP++] ;   // To return, pop return address and jump.
    RTI ;             // Return from interrupt.

Example Code for Using Hardware Loops in an ISR
The following code shows the optimal method of saving and restoring when using hardware loops in an interrupt service routine.
    lhandler:
        <Save other registers here>
        [--SP] = LC0 ;   /* save loop 0 */
        [--SP] = LB0 ;
        [--SP] = LT0 ;
        <Handler code here>
        /* If the handler uses loop 0, it is a good idea to have it leave LC0
           equal to zero at the end. Normally, this will happen naturally as a
           loop is fully executed. If LC0 == 0, then LT0 and LB0 restores will
           not incur additional cycles. If LC0 != 0 when the following pops
           happen, each pop will incur a ten-cycle "replay" penalty. Popping or
           writing LC0 always incurs the penalty. */
        LT0 = [SP++] ;
        LB0 = [SP++] ;
        LC0 = [SP++] ;   /* This will cause a "replay," that is, a ten-cycle refetch. */
        <Restore other registers here>
        RTI ;

Data Address Generators

The Data Address Generators (DAGs) generate addresses for data moves to and from memory. By generating addresses, the DAGs let programs refer to addresses indirectly, using a DAG register instead of an absolute address.

Data Address Generator Registers
Figure: DAG register set. Index registers I0-I3, Length registers L0-L3, Base registers B0-B3, and Modify registers M0-M3 (32 bits each); Pointer register file P0-P5, FP, SP, and USP (32 bits each).

Data Address Generator functions:
  Supply address: provides an address during a data access.
  Supply address and post-modify: provides an address during a data move and auto-increments/decrements the stored address for the next move.
  Supply address with offset: provides an address from a base with an offset, without incrementing the original address pointer.
  Modify address: increments or decrements the stored address without performing a data move.
  Bit-reversed carry address: provides a bit-reversed carry address during a data move without reversing the stored address.

DAG Subsystem
  2 DAG arithmetic units
  9 Pointer registers
  4 Index registers
  4 complete sets of inter-related registers: Modify, Base, Length

Register Types: Index registers, I[3:0]
Unsigned 32-bit Index registers hold an address pointer to memory. For example, the instruction R3 = [I0] ; loads the data value found at the memory location pointed to by register I0. Index registers can be used for 16- and 32-bit memory accesses.

Register Types: Modify registers, M[3:0]
Signed 32-bit Modify registers provide the increment or step size by which an Index register is post-modified during a register move. For example, the instruction R0 = [I0 ++ M1] ; directs the DAG to:
  Output the address in register I0.
  Load the contents of the memory location pointed to by I0 into R0.
  Modify the contents of I0 by the value contained in the M1 register.
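The three steps can be mimicked in a few lines of Python. This is a toy model rather than Blackfin code: the `memory` and `regs` dictionaries, the sample addresses and data, and the helper `load_post_modify` are hypothetical names introduced only for illustration, and circular buffering is assumed to be disabled (L = 0).

```python
def load_post_modify(memory, regs, dest, index, modify):
    """Model of `dest = [index ++ modify]` with linear addressing (L = 0)."""
    address = regs[index]               # 1. output the address held in the I register
    regs[dest] = memory[address]        # 2. load the addressed word into the destination
    # 3. post-modify the I register by the M value, wrapping to 32 bits
    regs[index] = (address + regs[modify]) & 0xFFFFFFFF


memory = {0x1000: 0xAAAA0001, 0x1008: 0xBBBB0002}   # hypothetical 32-bit words
regs = {"R0": 0, "I0": 0x1000, "M1": 8}

load_post_modify(memory, regs, "R0", "I0", "M1")
first = regs["R0"]
load_post_modify(memory, regs, "R0", "I0", "M1")
second = regs["R0"]
```

After the two loads, `first` and `second` hold the words at 0x1000 and 0x1008, and I0 has advanced by two steps of M1.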

Register Types: Base and Length registers, B[3:0] and L[3:0]
Unsigned 32-bit Base and Length registers set up the starting address and the range of addresses of a circular buffer. Each B, L pair is always coupled with a corresponding I register, for example I3, B3, L3. For more information on circular buffers, see “Addressing Circular Buffers” on page 5-6 of the ADSP-BF533 Blackfin Processor Hardware Reference.

Register Types: Pointer registers, P[5:0], FP, USP, and SP
32-bit Pointer registers hold an address pointer to memory. The P[5:0] registers, FP (Frame Pointer), and SP/USP (Stack Pointer/User Stack Pointer) can be manipulated and used in various instructions. For example, the instruction R3 = [P0] ; loads register R3 with the data value found at the memory location pointed to by register P0.

Pointer Registers (cont.)
Pointer registers have no effect on circular buffer addressing. Pointer registers can be used for 8-, 16-, and 32-bit memory accesses.
Important note: the I, M, L, and B registers contain random values after reset.
  Initialize the L registers to zero for linear addressing or to the buffer length for circular buffer addressing.
  All DAG registers must be initialized individually.

Data Addressing Modes

Address Registers
  One set of 32-bit general-purpose Pointer registers: P0-P5, SP, and FP.
  One set of 32-bit DSP buffer addressing registers: I0-I3, B0-B3, L0-L3, M0-M3.
  All addresses are byte addresses into a 4 GB address space.

Address Registers
Figure: 32-bit address registers P0-P5, FP, SP, and USP, alongside the DAG register sets I0-I3, L0-L3, B0-B3, and M0-M3.
  SP points to the supervisor stack in Supervisor mode and to the user stack in User mode.
  USP is accessible in Supervisor mode only; it allows access to the user stack location while in Supervisor mode.

Addressing Modes
Register indirect addressing:
  Index registers (32-bit and 16-bit accesses)
  Pointer registers P0-P5 (32-bit, 16-bit, and 8-bit accesses)
  Stack and Frame Pointer registers (32-bit accesses)
Types of address pointer modification:
  Modify/post-modify
    Linear addressing
    Circular buffering (modulo addressing), which automatically keeps pointers within the bounds of a circular buffer
    Bit-reversal (modify only)
  Pre-modify with update (using the Stack Pointer)
  Pre-modify without update

Linear vs. Circular Buffering
Linear buffer access:
  Index (I[3:0]) registers hold the address sent out on the address bus.
  Length (L[3:0]) registers are set to 0, which disables circular buffering. This is the default for the C compiler, which also has provisions to allow circular buffers.
  Modify (M[3:0]) registers contain the value (positive or negative) that is added to the I register at the end of each memory access.
Circular buffer access:
  Base (B[3:0]) registers contain the circular buffer's start address.
  Length (L[3:0]) registers are set to the length of the circular buffer.
  The Modify (M[3:0]) value must be less than or equal to the length of the circular buffer.
  Indexing wraps around (the buffer length is subtracted) when the modified Index reaches or passes Base + Length.

Circular Buffering Example
Figure: memory addresses 0x00 through 0x28 in 4-byte steps, with the 1st access at 0x00, the 2nd at 0x10, the 3rd at 0x20, the 4th at 0x04, and the 5th at 0x14.
  Base address and starting Index address: B0 = 0; I0 = 0;
  Buffer length is 44 (L0 = 44;): there are 11 data elements and each data element is 4 bytes.
  Modify value is 16 (M0 = 16;): 4 elements * 4 bytes/element.
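The access order in this example can be reproduced with a short Python sketch. It only models the address arithmetic for a positive modify value; the function name `circular_accesses` and its parameters are hypothetical and are not part of any Blackfin toolchain.

```python
def circular_accesses(base, length, modify, start, count):
    """Return the addresses of `count` post-modified circular-buffer accesses."""
    addresses = []
    index = start
    for _ in range(count):
        addresses.append(index)      # address output for this access
        index += modify              # post-modify by the (positive) M value
        if index >= base + length:   # stepped past the end: wrap by the buffer length
            index -= length
    return addresses


# Values from the slide: B0 = 0, I0 = 0, L0 = 44, M0 = 16
trace = circular_accesses(base=0, length=44, modify=16, start=0, count=5)
print([hex(a) for a in trace])       # ['0x0', '0x10', '0x20', '0x4', '0x14']
```

The fourth access lands at 0x04 because 0x20 + 16 = 48 passes the end of the 44-byte buffer, so the length is subtracted.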

Addressing With DAGs
DAGs generate addresses that are incremented by either an immediate value or the content of a register.
Post-modify addressing:
  The unchanged I-register value is output as the address.
  The M-register or immediate value is then added to the I register.

Addressing With DAGs
Indexed addressing:
  The P-register value is offset by a small value to form the address.
  The P register is not updated with the new value.
  The P-register value thus serves as the base for accesses to a particular region of memory.
Memory space is byte addressed. DAG registers are incremented/decremented by 1, 2, or 4 to match 8-, 16-, or 32-bit accesses, respectively.

Addressing With DAGs
Example: R0 = [ P3++ ] ;
  Accesses the 32-bit memory location pointed to by P3. The access is 32 bits wide because no W (16-bit) or B (8-bit) designator is given.
  Places the result in R0.
  Post-increments P3 by 4 to maintain alignment with the 32-bit access.

Addressing With DAGs
Example: R0.L = W[ I3++ ] ;
  This instruction fetches the 16-bit word pointed to by the content of I3.
  The result is placed in the lower half of the destination register, R0.L.
  The content of I3 is post-incremented by 2 to maintain alignment with the 16-bit access.
  Note: W refers to a 16-bit memory word.

Addressing With DAGs
Example: R0 = B[ P3++ ] (Z) ;
  This instruction fetches the 8-bit byte pointed to by the content of register P3.
  The result is stored in the 32-bit destination register R0.
  P3 is post-incremented by 1 to maintain alignment with the 8-bit access.
  The byte value is zero-extended, (Z), in the destination register. Alternatively, the byte value could have been sign-extended with the (X) option.
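The difference between zero extension and sign extension of a loaded byte is easy to see numerically. The following Python sketch models only the value placed in the 32-bit destination register; the function names are hypothetical.

```python
def zero_extend_byte(value):
    """Model of an 8-bit load with zero extension: upper 24 bits are cleared."""
    return value & 0xFF


def sign_extend_byte(value):
    """Model of an 8-bit load with sign extension: bit 7 is copied into bits 31..8."""
    value &= 0xFF
    if value & 0x80:
        return value | 0xFFFFFF00
    return value


byte_in_memory = 0x80                       # -128 as a signed byte, 128 as unsigned
print(hex(zero_extend_byte(byte_in_memory)))  # 0x80
print(hex(sign_extend_byte(byte_in_memory)))  # 0xffffff80
```

Bytes below 0x80 give the same register value under either form of extension.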

Addressing With DAGs
  Instructions using Index (I) registers use either an M register or a small immediate value (+/- 2 or 4) as the modifier.
  Instructions using Pointer (P) registers use either a small immediate value or another P register as the modifier.

Frame and Stack Pointers
The Frame Pointer (FP) and Stack Pointer (SP) registers behave like the other P registers, P[5:0].
FP and SP act as general pointers in any of the load/store instructions, for example R1 = B[SP] (Z) ;, and provide additional functionality.
The Stack Pointer registers include:
  A User Stack Pointer (USP in Supervisor mode, SP in User mode)
  A Supervisor Stack Pointer (SP in Supervisor mode)

User mode: any reference to SP implicitly uses the USP as the effective address.
    R0 = [ SP++ ] ;
Supervisor mode: any reference to SP implicitly uses the Supervisor Stack Pointer as the effective address.
To manipulate the User Stack Pointer from code running in Supervisor mode, use the register alias USP. The alias USP can only be used in Supervisor mode.
    R0 = USP ;

Implicit use of FP and SP:
  FP-indexed load/store: extends the addressing range for 16-bit encoded load/stores.
  Stack push/pop instructions, including those for pushing and popping multiple registers.
  Link/unlink instructions: control stack frame space and manage the Frame Pointer register (FP) for that space.

Addressing Circular Buffers
A circular buffer is addressed by repeatedly stepping over an address range. For circular buffering the registers operate as follows:
Index (I) register: contains the value that the DAG outputs on the address bus.
Modify (M) register: contains the post-modify amount (positive or negative) that the DAG adds to the I register at the end of each memory access.
  Any M register can be used with any I register.
  The modify value can be an immediate value instead of an M register.
  The magnitude of the modify value must be less than or equal to the length (L register) of the circular buffer.

Length (L) register: sets the size of the circular buffer and the address range through which the DAG circulates the I register.
  L is a positive value no greater than 2^32 - 1.
  If an L register's value is zero, its circular buffer operation is disabled.
Base (B) register: the B register, or the B register plus the L register, is the value with which the DAG compares the modified I-register value after each access.

Figure: a circular buffer in memory, showing the Base (B), Index (I), Length (L), and Modify (M) values.

Specific Example of Circular Addressing with DAGs

Positive & Negative Modification Values
If M is positive:
  Inew = Iold + M        if Iold + M < buffer base + length (end of buffer)
  Inew = Iold + M - L    if Iold + M ≥ buffer base + length (end of buffer)
If M is negative:
  Inew = Iold + M        if Iold + M ≥ buffer base (start of buffer)
  Inew = Iold + M + L    if Iold + M < buffer base (start of buffer)
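These update rules translate directly into a small function. The sketch below is a Python model of the arithmetic only, assuming the magnitude of the modify value does not exceed the buffer length and the index starts inside the buffer; `circular_update` is a hypothetical name.

```python
def circular_update(index, modify, base, length):
    """Return the new index after one modify step in a circular buffer.

    Assumes |modify| <= length and base <= index < base + length.
    """
    new_index = index + modify
    if modify >= 0 and new_index >= base + length:
        new_index -= length          # ran off the end: subtract the length
    elif modify < 0 and new_index < base:
        new_index += length          # ran off the start: add the length
    return new_index


# A 44-byte buffer at base 0, stepped by +16 and then by -16
forward = circular_update(32, 16, base=0, length=44)    # 48 wraps to 4
backward = circular_update(4, -16, base=0, length=44)   # -12 wraps to 32
```

Stepping forward and then backward by the same amount returns to the starting index, which is what makes the modulo behaviour convenient for delay lines and filters.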

Addressing With Bit-reversed Addresses
Some algorithms (e.g., the FFT, Fast Fourier Transform) require bit-reversed order addressing. The DAGs also feature bit-reversed addressing to satisfy the requirements of such algorithms.
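To illustrate what bit-reversed order means, the following Python sketch performs a reverse-carry addition, in which the carry propagates from the most significant bit toward the least significant bit. It works on a small, explicitly chosen bit width and is a conceptual model only; it is not claimed to match the exact hardware operation, and the function names are hypothetical.

```python
def reverse_bits(value, width):
    """Reverse the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def reverse_carry_add(index, modify, width):
    """Add `modify` to `index` with the carry propagating from MSB toward LSB."""
    total = reverse_bits(index, width) + reverse_bits(modify, width)
    return reverse_bits(total & ((1 << width) - 1), width)


# 8-point FFT: 3 address bits, modifier N/2 = 4
order = [0]
for _ in range(7):
    order.append(reverse_carry_add(order[-1], 4, 3))
print(order)   # [0, 4, 2, 6, 1, 5, 3, 7]
```

Repeatedly adding N/2 with reversed carry visits the samples in the order a radix-2 FFT needs, without a separate reordering pass.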

Indexed Addressing With Index and Pointer Registers
Indexed addressing uses the value in the Index (I) or Pointer (P) register as an effective address. Instructions can load or store 16- or 32-bit values. The default is a 32-bit transfer; 16-bit transfers require the W designator in front of the memory operand. Pointer registers can also be used for 8-bit loads and stores.
    R0 = [ I2 ] ;       // 32-bit value load
    R0.H = W[ I2 ] ;    // 16-bit value load
    [ P1 ] = R0 ;       // 32-bit value store
    B[ P1++ ] = R0 ;    // 8-bit value store

Auto-increment and Auto-decrement Addressing
Auto-increment/decrement addressing updates the Pointer (P) or Index (I) register after the access. The size of the update depends on the word size:
  A 32-bit access updates the pointer by 4.
  A 16-bit access updates the pointer by 2.
  An 8-bit access updates the pointer by 1.
    R0 = W[ P1++ ] (Z) ;
    R0 = [ I2-- ] ;

Pre-modify Stack Pointer Addressing
The only instructions that use pre-modify addressing are those that use the SP register. The SP register is decremented by 4 and then used as the effective address for the store. This form is used for stack push operations and supports only 32-bit word transfers.
    [ --SP ] = R0 ;
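The pairing of a pre-decrement store with a post-increment load gives a stack that grows toward lower addresses. The Python class below is a toy model of that behaviour; the class name, the dictionary used as memory, and the starting address are hypothetical.

```python
class Stack:
    """Toy model of a descending stack addressed through SP."""

    def __init__(self, top):
        self.sp = top          # SP starts at the (hypothetical) top of the stack
        self.memory = {}

    def push(self, value):
        """Model of `[--SP] = Rn;`: decrement SP by 4, then store 32 bits."""
        self.sp -= 4
        self.memory[self.sp] = value & 0xFFFFFFFF

    def pop(self):
        """Model of `Rn = [SP++];`: load 32 bits, then increment SP by 4."""
        value = self.memory[self.sp]
        self.sp += 4
        return value


stack = Stack(top=0x2000)      # hypothetical initial stack pointer
stack.push(0x11111111)
stack.push(0x22222222)
sp_after_pushes = stack.sp     # two words below the top
last_in = stack.pop()          # values come back in reverse order
first_in = stack.pop()
```

Because SP always moves in steps of 4, it stays aligned for 32-bit accesses as long as the initial value is aligned.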

Indexed Addressing With Immediate Offset
Indexed addressing allows programs to obtain values from data tables with reference to the base of that table. The Pointer register is modified by the immediate field and the result is used as the effective address. The value of the Pointer register itself is not updated.
    P1 = 0x13 ;
    [ P1 + 0x11 ] would effectively be equal to [ 0x24 ]

Post-modify Addressing
The Index or Pointer register value is used as the effective address, and the register is then modified by the content of another register. Pointer registers are modified by other Pointer registers; Index registers are modified by Modify registers.
Post-modify addressing does not support:
  Pointer (P) registers as destination registers
  Byte addressing
    R5 = [ P1++P2 ] ;
    R2 = W[ P4++P5 ] (Z) ;
    R2 = [ I2++M1 ] ;

Modifying DAG and Pointer Registers
The DAGs support operations that modify an address value in an Index (I[3:0]) or Pointer (P[5:0], FP, SP) register without outputting an address. These address-modify operations are useful for maintaining pointers. If the B and L registers are set up for circular buffering, the address-modify operation performs the specified buffer wraparound when needed.
For Index registers, an M register is used as the modifier. For Pointer registers, another P register is used as the modifier.
    I1 += M2 ;

Memory Address Alignment
The processor requires proper memory alignment to be maintained for the data size being accessed. Violations of memory alignment cause an alignment exception (unless exceptions are disabled).
  Many of the Video ALU instructions automatically disable alignment exceptions, because the data may not be properly aligned when stored in memory.
  Alignment exceptions may be disabled by issuing the DISALGNEXCPT instruction in parallel with the load/store operation.
The memory system normally requires two address alignments:
  32-bit word loads/stores are accessed on four-byte boundaries, meaning the two least significant bits of the address are b#00.
  16-bit word loads/stores are accessed on two-byte boundaries, meaning the least significant bit of the address must be b#0.
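The alignment rule amounts to checking the low address bits against the access size. A minimal Python sketch of that check follows; `is_aligned` is a hypothetical helper, not a processor or toolchain function.

```python
def is_aligned(address, size_bytes):
    """True when `address` is a multiple of the access size (1, 2, or 4 bytes)."""
    if size_bytes not in (1, 2, 4):
        raise ValueError("access size must be 1, 2, or 4 bytes")
    # For a power-of-two size, the low bits (size - 1) must all be zero.
    return address & (size_bytes - 1) == 0


print(is_aligned(0x1004, 4))   # True:  low two bits are 00
print(is_aligned(0x1002, 4))   # False: low two bits are 10
print(is_aligned(0x1002, 2))   # True:  low bit is 0
```

Byte accesses are always aligned, which is why only the 16- and 32-bit cases are listed as requirements.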

Types of Transfers Supported and Transfer Sizes

Summary of Supported Addressing Modes

DAG Instruction Summary
See the ADSP-BF533 Blackfin Processor Hardware Reference, pages 5-17 to 5-22.

Blackfin Memory

Blackfin Memory Hierarchy
As processor speeds increase (300 MHz to 1 GHz), it becomes increasingly difficult to have large memories running at full speed.
The BF5xx uses a memory hierarchy with a primary goal of achieving memory performance similar to that of the fastest memory (i.e., L1) with an overall cost close to that of the least expensive memory (i.e., L2).
Figure: memory hierarchy.
  On-chip: core (registers) and L1 memory (internal, smallest capacity, single-cycle access).
  L2 memory: external, larger capacity, higher latency.
  L3 memory: external, largest capacity, highest latency.

Memory Architecture
Unified 4G byte address range, spanning a combination of on-chip and off-chip memory and memory-mapped I/O resources.
Part of this address range is dedicated to internal, on-chip resources:
  L1 Static Random Access Memories (SRAM)
  A set of Memory-Mapped Registers (MMRs)
  A boot Read-Only Memory (ROM)
A portion of the internal L1 SRAM can also be configured to run as cache.
The remaining part of the address range supports external memory:
  Asynchronous memory
  Synchronous DRAM (SDRAM)

Basics of Blackfin Memory Architecture
Figure: Blackfin memory hierarchy.
  Core (>600 MHz) with L1 instruction memory and L1 data memory: >600 MHz, single cycle to access, 10s of Kbytes.
  Unified L2 (on-chip): >300 MHz, several cycles to access, 100s of Kbytes.
  Unified L3 external memory (off-chip): <133 MHz, several system cycles to access, 100s of Mbytes.
  The figure also shows a DMA block between the memory levels.

Memory Architecture
Memory is addressed as a unified 4G byte space using 32-bit addresses, covering internal memory, external memory, and I/O control registers.

Hierarchical Memory
L1 memory system: the highest-performance memory available to the core processor.
Off-chip memory: accessible via the External Bus Interface Unit (EBIU).
  SDRAM, flash memory, SRAM
  Up to 132M bytes

Internal Memory
Three blocks of on-chip memory:
  L1 instruction memory: SRAM and 4-way set-associative cache; accessed at full processor speed.
  L1 data memory: SRAM and/or 2-way set-associative cache; accessed at full processor speed.
  L1 scratchpad RAM: operates as L1 memory, is accessible as data SRAM, and cannot be configured as cache memory.

External Memory
Off-chip memory is accessed via the External Bus Interface Unit (EBIU), a 16-bit interface.
  One bank of synchronous DRAM (SDRAM)
  Four banks of asynchronous memory devices: flash memory, EPROM, ROM, SRAM, memory-mapped I/O

External Memory
  The SDRAM controller (PC133 compliant) can be programmed to interface to up to 128M bytes of SDRAM.
  The asynchronous memory controller supports 4 banks, each a 1M byte segment.

I/O Memory Space
Blackfin processors do not define a separate I/O space. Control registers for on-chip I/O devices are mapped into Memory-Mapped Registers (MMRs) within the 4G byte address space. The MMRs are separated into two smaller blocks:
  Control MMRs (accessible only in Supervisor mode) for all core functions
  Registers needed for setup and control of the on-chip peripherals outside of the core

Blackfin Memory Overview
Figure: block diagram of a Blackfin processor with L1 instruction memory, L1 data memory, DMA, a PLL, an external bus to SDRAM, and an Ethernet PHY. Annotations recovered from the figure:
  Clocking: 25 MHz crystal shared with the Ethernet PHY (no need for a second crystal); PLL VCO 1:64X; programmable frequency and voltage control; 525 MHz.
  L1 instruction memory (64-bit): large enough to run application code; cache available if operation from SDRAM is desired.
  L1 data memory: 2 core fetches, or 1 fetch and 1 store.
  External bus with direction control: 131 MHz; max bandwidth 266 MB/sec.
  SDRAM: rows are “open” in 4 SDRAM banks, which reduces page activation and makes best use of SDRAM.
  Bus widths of 32 and 16 bits also appear in the figure.

ADSP-BF533 Processor System Memory Map

Configuration of Memory
Best system performance is achieved when executing code or fetching data out of L1 memory.
Two methods can be used to fill L1 memory, caching and dynamic downloading; the Blackfin processor supports both.
  General-purpose processors have typically used the caching method, as they often have large programs residing in external memory and determinism is not as important.
  DSPs have typically used dynamic downloading, as they need direct control over which code runs in the fastest memory.
Blackfin processors allow the programmer to choose one or both methods to optimize system performance.

Memory Configurations
Internal memory configuration:
  The upper portion of internal memory space is allocated to the core and system MMRs. Access to this area is restricted to the processor when in Supervisor or Emulation mode.
  The lowest 1K byte of internal memory space is occupied by the boot ROM. See “Booting Methods” on page 3-18 of the ADSP-BF533 Blackfin Processor Hardware Reference manual.
External memory configuration:
  4 banks of asynchronous memory (1M byte each)
  1 bank of SDRAM (up to 128M bytes)

Processor Memory Architecture

Overview of Internal Memory
Figure: the core, clocked at the core clock (CCLK), connects to the L1 memory system through one 64-bit bus to L1 instruction memory and three 32-bit buses to L1 data memory.

L1 Memory System
The L1 memory system provides:
  A modified Harvard architecture with up to 4 core memory accesses per clock cycle:
    One 64-bit instruction fetch
    Two 32-bit data loads
    One pipelined 32-bit data store
  Simultaneous system DMA, cache maintenance, and core accesses
  SRAM access at the processor clock rate (CCLK) for critical DSP algorithms and fast context switching
  Instruction and data cache options for microcontroller code and High Level Language (HLL) support
  Ease of programming through cache control instructions such as PREFETCH and FLUSH
  Memory protection

L1 Instruction Memory
L1 instruction memory consists of a combination of dedicated SRAM and banks that can be configured as SRAM or cache.
Control bits in the IMEM_CONTROL register can be used to organize all four sub-banks of the L1 instruction memory as:
  A simple SRAM
  A 4-way set-associative instruction cache
  A cache with as many as four locked ways

L1 Instruction Memory: 16kB Configurable Memory Bank
16 KB SRAM:
  Four 4 KB single-ported sub-banks
  Allows simultaneous core and DMA accesses to different banks
16 KB cache:
  4-way set associative with arbitrary locking of ways and lines
  LRU replacement
  No DMA access
Figure: four 4 KB sub-banks connected to the instruction fetch path, the EAB (cache line fill), and the DCB (DMA).

What is Cache?
In a hierarchical memory system, cache is the first level of memory reached once the address leaves the core (i.e., L1).
If the instruction/data word (8, 16, 32, or 64 bits) that corresponds to the address is in the cache, there is a cache hit and the word is forwarded to the core from the cache.
If the word that corresponds to the address is not in the cache, there is a cache miss. This causes a fetch of a fixed-size block (which contains the requested word) from main memory.
The Blackfin allows the user to specify which regions (i.e., pages) of main memory are cacheable and which are not through the use of CPLBs (Cacheability Protection Lookaside Buffers).
  If a page is cacheable, the block (i.e., a cache line containing 32 bytes) is stored in the cache after the requested word is forwarded to the core.
  If a page is non-cacheable, the requested word is simply forwarded to the core.

Cache Hits & Misses
A cache hit occurs when the address for an instruction fetch request from the core matches a valid entry in the cache.
A cache hit is determined by comparing the upper 18 bits, and bits 11 and 10, of the instruction fetch address to the address tags of valid lines currently stored in a cache set. Only valid cache lines (i.e., cache lines with their valid bits set) are included in the address tag compare operation.
When a cache hit occurs, the target 64-bit instruction word is sent to the instruction alignment unit, where it is stored in one of two 64-bit instruction buffers.
When a cache miss occurs, the instruction memory unit generates a cache line-fill access to retrieve the missing cache line from memory that is external to the core.
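The tag comparison can be pictured by splitting a fetch address into fields. In the Python sketch below, only the tag composition (upper 18 bits plus bits 11 and 10) and the 32-byte line size come from the text; the remaining field positions (bits 13:12 selecting a 4 KB sub-bank and bits 9:5 selecting a line within it) are an assumption that happens to add up to a 16 KB, 4-way cache and should be checked against the Hardware Reference. The function names and the way the two tag pieces are concatenated are illustrative choices.

```python
def split_fetch_address(address):
    """Split a 32-bit instruction fetch address into cache lookup fields."""
    upper18 = (address >> 14) & 0x3FFFF      # bits 31..14
    bits_11_10 = (address >> 10) & 0x3       # bits 11..10
    return {
        "tag": (upper18 << 2) | bits_11_10,  # 20-bit value compared with stored tags
        "sub_bank": (address >> 12) & 0x3,   # assumed: bits 13..12
        "set": (address >> 5) & 0x1F,        # assumed: bits 9..5
        "offset": address & 0x1F,            # byte within the 32-byte line
    }


def same_line(a, b):
    """True when two addresses fall in the same cache line (all fields but offset match)."""
    fa, fb = split_fetch_address(a), split_fetch_address(b)
    return all(fa[k] == fb[k] for k in ("tag", "sub_bank", "set"))


fields = split_fetch_address(0x20001234)     # hypothetical fetch address
```

Two fetches that differ only in the low five address bits hit the same line, so the second one is served from the cache once the line has been filled.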

L1 Data Memory: 16kB Configurable Memory Bank
The block is multi-ported when:
  Accessing different sub-banks, or
  Making one odd and one even access (address bit 2 different) within the same sub-bank.
When used as SRAM:
  Allows simultaneous dual DAG and DMA access
When used as cache:
  Each bank is 2-way set-associative
  No DMA access
  Allows simultaneous dual DAG access
Figure: four 4 KB sub-banks connected to the Data 0 and Data 1 buses, the EAB (cache line fill), and the DCB (DMA).

IMEM_CONTROL Register
The Instruction Memory Control register (IMEM_CONTROL) contains control bits that govern the operation of the instruction cache. For details, see the following sections of the ADSP-BF533 Blackfin Processor Hardware Reference manual:
  “L1 Instruction Cache” on page 6-12
  “Instruction Cache Locking by Line” on page 6-19
  “Instruction Cache Locking by Way” on page 6-20

L1 Data Memory
L1 data memory (SRAM/cache) is constructed from single-ported subsections, but organized to reduce the likelihood of access collisions.
When there are no collisions, the following L1 data traffic can occur in a single clock cycle:
  Two 32-bit DAG loads
  One pipelined 32-bit DAG store
  One 64-bit DMA I/O
  One 64-bit cache fill/victim buffer access
The Data Memory Control register (DMEM_CONTROL) contains control bits that configure L1 data memory as SRAM or cache.

L1 Data Memory
For additional information, refer to “L1 Data Memory”, section 6-26, in the ADSP-BF533 Blackfin Processor Hardware Reference manual.

External Memory
The external memory space is shown in Slide 249.
The processor supports an SDRAM bank whose size is programmable from 16M bytes up to 128M bytes. The start address of the bank is 0x.
Each of the next four banks contains 1M byte and is dedicated to supporting asynchronous memories. The start address of the asynchronous memory bank is 0x.

External Memory
For details on external memory, including the Memory Management Unit (MMU), memory pages, Cacheability Protection Lookaside Buffers (CPLBs), MMU management, CPLB registers, etc., consult the ADSP-BF533 Blackfin Processor Hardware Reference manual, page 6-46.

Blackfin Peripherals

Blackfin Peripherals & Power Management
Common peripherals (all Blackfins): SPI, UART, SPORT, WD, RTC, PPI
BF534/BF536/BF537 peripherals: TWI, CAN
BF536/BF537 peripheral: Ethernet
DMA and handshake DMA
Power manager

Serial Communication Peripherals
SPI (Serial Peripheral Interface)
- High-speed SPI port (up to SCLK/4, max MHz)
- Master/slave compatible, with control of up to 7 slave selects
- Single-duplex DMA (either TX or RX)
- Typically used to interface with serial EPROMs, CPUs, converters, and displays
UART (Universal Asynchronous Receiver/Transmitter)
- PC-style UART port (baud rate up to SCLK/16, max MHz)
- Supports half-duplex IrDA SIR (9.6/115.2 Kbps rate)
- Autobaud detection support through the use of the Timers
- Separate TX and RX DMA support
- Typically used as a maintenance port or for interfacing with slow serial peripherals
SPORTs (Synchronous Serial Ports)
- High-speed serial port (up to SCLK/2, max 66.5 MHz)
- Variable word length support ( bits)
- I2S-compatible
- 128 channels out of a 1024-channel window for TDM support
- Primary and secondary data channels
- Typically used for interfacing with CODECs and TDM data streams
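The SCLK/16 ceiling on the UART baud rate suggests a divide-by-16 bit clock. The sketch below assumes the relationship baud = SCLK / (16 × divisor); that formula, the function names, and the 100 MHz SCLK value are illustrative assumptions, not values taken from the slide.

```python
# Sketch only: assumes baud = SCLK / (16 * divisor).
# SCLK_HZ is a hypothetical system clock, not a figure from the slides.

SCLK_HZ = 100_000_000


def uart_divisor(sclk_hz: int, baud: int) -> int:
    """Return the integer divisor whose baud rate is closest to the request."""
    divisor = round(sclk_hz / (16 * baud))
    return max(divisor, 1)  # a divisor of 1 gives the SCLK/16 maximum


def actual_baud(sclk_hz: int, divisor: int) -> float:
    """Baud rate actually produced by a given divisor."""
    return sclk_hz / (16 * divisor)


if __name__ == "__main__":
    d = uart_divisor(SCLK_HZ, 115_200)
    achieved = actual_baud(SCLK_HZ, d)
    error_pct = 100 * (achieved - 115_200) / 115_200
    print(f"divisor={d}, achieved={achieved:.1f} baud, error={error_pct:.2f}%")
```

Because the divisor must be an integer, the achieved rate generally differs slightly from the requested one; the error is worth checking against the tolerance of the device on the other end of the link.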

Real-Time Clock Features
- Used to implement a real-time watch or “life counter”
  - Time of day, alarm, stopwatch count-down, and elapsed time since last system reset
- Uses four counters: seconds, minutes, hours, days
- Equipped with two alarm features
  - Daily and Day-And-Time
- Uses dedicated kHz crystal to RTXI / RTXO
  - Can be pre-scaled to 1 Hz to count in real-time seconds
- Uses dedicated power supply pins
  - Independent of any reset
- Can take the processor out of all low-power states
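As a rough illustration of the four-counter scheme and the two alarm types, the sketch below splits a count of 1 Hz ticks into days, hours, minutes, and seconds, then compares it against an alarm. The function names and the alarm representation are hypothetical; this models the arithmetic only, not the RTC registers.

```python
# Sketch only: models the counter arithmetic, not the hardware registers.

def rtc_counters(elapsed_seconds: int) -> tuple[int, int, int, int]:
    """Split a 1 Hz tick count into (days, hours, minutes, seconds)."""
    days, rem = divmod(elapsed_seconds, 86_400)
    hours, rem = divmod(rem, 3_600)
    minutes, seconds = divmod(rem, 60)
    return days, hours, minutes, seconds


def daily_alarm_hit(now: tuple[int, int, int, int], alarm_hms: tuple[int, int, int]) -> bool:
    """Daily alarm: fires whenever hours, minutes, and seconds match."""
    return now[1:] == alarm_hms


def day_and_time_alarm_hit(now: tuple[int, int, int, int], alarm_dhms: tuple[int, int, int, int]) -> bool:
    """Day-and-time alarm: fires only when the day also matches."""
    return now == alarm_dhms


if __name__ == "__main__":
    now = rtc_counters(90_061)
    print(now, daily_alarm_hit(now, (1, 1, 1)))
```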

Parallel Peripheral Interface - PPI
Parallel Peripheral Interface
- Programmable bus width (from 8 to 16 bits in 1-bit steps)
- Bidirectional (half-duplex) parallel interface
Synchronous Interface
- Interface is driven by an external clock (“PPI_CLK”)
- Up to 66 MHz rate (SCLK/2)
- Asynchronous to SCLK
- Includes three frame syncs to control the interface timing
Applications
- Driving LCD interface
- General-purpose interface to the outside world
- High-speed data converters
- Video CODECs

Two Wire Interface - TWI
- Fully compliant with the Philips I2C bus protocol
  - See Philips I2C Bus Specification version 2.1
- 7-bit addressing
- 100 Kb/s (normal mode) and 400 Kb/s (fast mode) data rates
- General call address support
- Supports master and slave operation
- Separate receive and transmit FIFOs
- SCCB (Serial Camera Control Bus) support
  - Only in master mode
  - Slave mode cannot be used because the TWI controller always issues an Acknowledge in slave mode
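With 7-bit addressing on an I2C-style bus, the first byte of a transfer carries the 7-bit slave address in its upper bits and the read/write flag in bit 0. A minimal sketch of that packing follows; the function name and the example address 0x50 are hypothetical.

```python
# Sketch only: builds the first byte of an I2C-style transfer.

def twi_address_byte(addr7: int, read: bool) -> int:
    """Pack a 7-bit slave address and the R/W flag (1 = read) into one byte."""
    if not 0 <= addr7 <= 0x7F:
        raise ValueError("address must fit in 7 bits")
    return (addr7 << 1) | int(read)


if __name__ == "__main__":
    # 0x50 is an arbitrary example address.
    print(hex(twi_address_byte(0x50, read=False)))
    print(hex(twi_address_byte(0x50, read=True)))
```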

Controller Area Network (CAN)
- Adheres fully to the CAN V2.0B standard
  - Supports both standard (11-bit) and extended (29-bit) identifiers
  - Data rates up to 1 Mbit/second
- 32 configurable mailboxes
  - 8 dedicated transmitters and 8 dedicated receivers
  - 16 configurable (transmit or receive)
- Dedicated acceptance mask for each mailbox
  - Data filtering (first two bytes) can be used for acceptance filtering
- CAN wakeup from Hibernation (lowest static power consumption) mode
- CAN protocol stacks
  - Automotive: CAN drivers and protocol stacks through Vector CANtech
  - Industrial: Leading third parties will provide a full industrial suite for CANOpen, DeviceNet, etc.
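The idea behind a per-mailbox acceptance mask can be sketched as a masked comparison of identifiers. The convention used here (a mask bit of 1 means that identifier bit must match) is an assumption chosen for readability; a real controller may use the opposite polarity, so check the hardware reference before relying on it.

```python
# Sketch only: the mask polarity (1 = bit must match) is an assumption.

STD_ID_MASK = 0x7FF  # standard identifiers are 11 bits wide


def accepts(rx_id: int, mailbox_id: int, must_match_mask: int) -> bool:
    """True if rx_id equals mailbox_id on every bit selected by the mask."""
    differing_bits = (rx_id ^ mailbox_id) & STD_ID_MASK
    return (differing_bits & must_match_mask) == 0


if __name__ == "__main__":
    # Accept the whole 0x120..0x12F group by ignoring the low four bits.
    print(accepts(0x123, 0x120, 0x7F0))
    print(accepts(0x133, 0x120, 0x7F0))
```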

Ethernet MAC Features
The ADSP-BF536/537 Ethernet MAC has advanced features beyond IEEE 802.3.
For improved performance:
- Automatic checksum computation for IP header and payload on RX frames
- Programmable RX data alignment mode for 32-bit alignment
- Independent RX and TX DMA channels with delivery of frame status to memory
- System wakeup on Magic Packet for 4 user-definable wakeup frame filters
For lower overall system cost:
- No PHY XTAL required: buffered XTAL output from the processor feeds the PHY
- Connection to either MII or RMII PHY
The ADSP-BF536/537 enhances throughput and dataflow via these features:
- Enhanced DMA channels allow for processor core independence
- Direction control to exploit SDRAM physics
- Four SDRAM rows can be ‘open’ at any given time
ADSP-BF536/537 overall networking bandwidth:
- Full 100 Mbps wire speed on 1400-bit payload with an optimized networking stack
  - UDP: ~44% processor core loading
  - TCP/IP: ~75% processor core loading
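To show what automatic checksum computation saves the core from doing, here is a software sketch of the standard Internet checksum (16-bit ones' complement sum) used by the IP header. This is the generic algorithm, not a description of how the MAC hardware implements it; the sample header bytes are an arbitrary example.

```python
# Sketch only: generic Internet checksum in software, for illustration.

def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


if __name__ == "__main__":
    # Example IPv4 header with the checksum field (bytes 10-11) zeroed.
    header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
    print(hex(internet_checksum(header)))
```

A receiver can verify a header by running the same function over the header with the checksum field left in place; a result of zero indicates the checksum is consistent.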

DMA Support
Multiple independent DMA controllers.
DMA transfers can occur between:
- Internal memories and any of the DMA-capable peripherals, and
- DMA peripherals and external devices connected to the external memory interfaces, including the SDRAM controller and the Asynchronous Memory Controller.
DMA-capable peripherals:
- SPORTs
- SPI port
- UART and PPI
Each individual DMA peripheral has at least one dedicated DMA channel.

DMA Support (cont)
The DMA controller supports 1-dimensional (1D) and 2-dimensional (2D) DMA transfers.
2D DMA supports arbitrary:
- Row and column sizes up to 64K x 64K elements.
- Row and column step sizes up to +/- 32K elements.
- Column size < row size ⇒ interleaved data. Useful in video applications.
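The effect of 2D DMA with independent step sizes can be sketched as a nested address generator. The model below is deliberately simplified (one step between elements within a row, another between row starts) and does not reproduce the controller's actual register semantics; all names and the example frame layout are hypothetical.

```python
# Sketch only: simplified 2D address model, not the DMA register semantics.

from typing import Iterator


def dma_2d_addresses(start: int, x_count: int, x_step: int,
                     y_count: int, y_step: int) -> Iterator[int]:
    """Yield element addresses row by row.

    x_step is the distance between consecutive elements in a row;
    y_step is the distance between the starts of consecutive rows.
    """
    for row in range(y_count):
        row_start = start + row * y_step
        for col in range(x_count):
            yield row_start + col * x_step


if __name__ == "__main__":
    # Hypothetical interleaved frame: 4 pixels per row, 3 bytes per pixel,
    # so each row occupies 12 bytes. Pick the first byte of every pixel.
    print(list(dma_2d_addresses(start=0, x_count=4, x_step=3,
                                y_count=2, y_step=12)))
```

Changing only `start` to 1 or 2 would select the second or third byte of each pixel, which is the kind of de-interleaving that makes 2D transfers useful for video data.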

DMA Enhancements
- 4 additional DMA channels
- All 12 peripheral DMA channels can be assigned to any of the peripherals
  - Provides the MAC further control over the assigned DMA channels
  - Can reload DMA registers if an incorrect checksum is detected
- Two external handshaking memory DMA controllers
  - Good for asynchronous FIFOs or off-chip interface controllers between Blackfin memory and hardware buffers

Dynamic Power Management
- Full On Mode (maximum performance)
- Active Mode (moderate power savings)
- Sleep Mode (high power savings)
- Deep Sleep Mode (maximum power savings)
- Hibernate State

Dynamic Power Management
[Figure: power (mW) for audio and video processing, comparing frequency-only scaling with voltage-and-frequency scaling. Labeled points: 600 MHz, 1.2V, 264 mW; 200 MHz, 1.2V, 156 mW; 500 MHz, 1.2V; 500 MHz, 1.0V; 200 MHz, 0.8V, 90 mW.]
Variable Frequency
- Clock dividers (1x to 63x) enable low-latency changes in system performance
Variable Voltage
- On-chip voltage regulator generates accurate voltage from 2.25 to 3.6V input
- Core voltage programmable from 0.8V to 1.2V (50 mV increments)
- Maximum 40 usec latency for PLL to relock (frequency or voltage changes)
- System cost reduction
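Taking the three fully labeled operating points on this slide at face value, the savings from each scaling step can be worked out directly. The sketch below does only that arithmetic and also enumerates the programmable core voltage steps; the dictionary keys and function names are hypothetical.

```python
# Sketch only: arithmetic on the operating points quoted on the slide.

# (frequency in MHz, core voltage in V, power in mW)
OPERATING_POINTS = {
    "full_speed": (600, 1.2, 264),
    "frequency_only": (200, 1.2, 156),
    "voltage_and_frequency": (200, 0.8, 90),
}


def saving_percent(before_mw: float, after_mw: float) -> float:
    """Power saved going from before_mw to after_mw, as a percentage."""
    return 100 * (before_mw - after_mw) / before_mw


def core_voltage_steps_mv() -> list[int]:
    """Core voltages from 0.8 V to 1.2 V in 50 mV increments, in millivolts."""
    return list(range(800, 1201, 50))


if __name__ == "__main__":
    full = OPERATING_POINTS["full_speed"][2]
    freq = OPERATING_POINTS["frequency_only"][2]
    both = OPERATING_POINTS["voltage_and_frequency"][2]
    print(f"frequency only:        {saving_percent(full, freq):.1f}% saved")
    print(f"then lowering voltage: {saving_percent(freq, both):.1f}% saved")
    print(f"combined:              {saving_percent(full, both):.1f}% saved")
    print(core_voltage_steps_mv())
```

On these numbers, lowering the voltage at a fixed 200 MHz saves about as much again as the frequency reduction did, which is the point of scaling both together.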

Instruction Set Overview

Instruction Set Description
- Full-featured, flexible multifunction instructions
- Employs an algebraic-style syntax
- Optimized to allow access to many of the processor core resources within a single instruction
- Compiled C and C++ source code makes optimal use of instructions
- Format designed for ease of coding and readability
- Tuned to generate dense code (small memory footprint)

Blackfin Assembly Language Features
- Multi-issue load/store modified-Harvard architecture supports two 16-bit MAC or four 8-bit ALU + two load/store + two pointer updates per cycle.
- Unified 4 Gbyte memory space
  - All registers, I/O, and memory are mapped to a unified 4 Gbyte memory space
  - Provides a simplified programming model
- Microcontroller features:
  - Arbitrary bit and bit-field manipulation, insertion, and extraction
  - Integer operations on 8-, 16-, and 32-bit data types
  - Separate user and supervisor stack pointers
- Code density enhancements
  - Intermixing of 16- and 32-bit instructions (no mode switching, no code segregation)
  - Frequently used instructions are encoded in 16 bits.

Blackfin Code Density
Instruction Set Tuned for Compact Code
- Multi-length instructions: 16- and 32-bit opcodes
- Limited multi-issue
- Compact call/return
No Memory Alignment Restrictions for Code
- Transparent alignment hardware
Blackfin Supports 16- and 32-bit Memory Systems
[Figure: instruction formats (16-bit OP, 32-bit OP, 64-bit multi-OP packet) shown in 16-bit wide and 32-bit wide memory.]
No memory alignment restrictions: maximum code density and minimum system memory cost

Blackfin Code Density Features
- Free intermixing of 16/32-bit instructions: no mode switching, no code segregation
- Frequently used instructions encoded in 16 bits
- 3-bit register fields
- Conditional moves
- Push/pop multiple registers
- Three-operand instructions
- Single condition bit and evaluation

Blackfin Dual Operational Model
A DSP with a RISC instruction set and an MMU, an event controller, and a wide range of peripherals.
Data Movement
- LD, ST, 8,16,32 bits
- Unsigned, sign-extend
- Register moves, P-D-DAG, Push, Pop, Push/Popmult
- CC to dreg, etc.
Addressing Modes
- Auto incr, auto decr, pre-decr store on SP, indirect
- Indexed w/immed offset
- Post-incr w/ nonunity stride
- Byte addressable
Program Control
- BRCC, UJUMP, Call, RETS, Loop Setup
Arithmetic
- +,-,*,/,>>>, Negate
- 2 and 3 operand instructs
Logical
- AND, OR, XOR, NOT
- BITtst,set,tgl,clr, CC ops
- <<,>>
Video
- SAA, Byteops: residual calc, spatial interpolation, spatial filter
Cache Control
- Prefetch, Flush
Memory management
Event control
Supervisor/user modes
Wide range of peripherals
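The video group lists SAA. Assuming this denotes a subtract-absolute-accumulate operation on bytes (the building block of sum-of-absolute-differences block matching in motion estimation), the computation can be sketched in software as follows. The function name is hypothetical, and the sketch says nothing about how many bytes the hardware handles per instruction.

```python
# Sketch only: software model of a sum of absolute differences on bytes.

def sum_abs_diff(block_a: bytes, block_b: bytes) -> int:
    """Accumulate |a - b| over corresponding bytes of two equal-length blocks."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same length")
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


if __name__ == "__main__":
    reference = bytes([10, 20, 30, 40])
    candidate = bytes([12, 18, 30, 45])
    print(sum_abs_diff(reference, candidate))
```

A lower result means the candidate block is a closer match to the reference, which is why an encoder evaluates this sum for many candidate positions.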

Blackfin Microcontroller Features
- Arbitrary bit and bit-field manipulation, insertion, and extraction
- Integer operations on 8/16/32-bit data types
- Memory protection and separate user and supervisor stack pointers
- Scratch SRAM for context switching
- Population and leading digit counting
- Byte-addressing DAGs
- Compact code density
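The bit-field and counting features can be illustrated with plain integer arithmetic on a 32-bit word. The sketch below shows what such operations compute; it is not a model of specific Blackfin instructions, the function names are hypothetical, and leading-zero counting is used as one simple form of leading digit counting.

```python
# Sketch only: what bit-field and counting operations compute on 32-bit words.

MASK32 = 0xFFFFFFFF


def extract_field(word: int, pos: int, length: int) -> int:
    """Return the `length`-bit field of `word` starting at bit `pos`."""
    return (word >> pos) & ((1 << length) - 1)


def insert_field(word: int, pos: int, length: int, value: int) -> int:
    """Return `word` with the field at `pos` replaced by `value`."""
    field_mask = ((1 << length) - 1) << pos
    return ((word & ~field_mask) | ((value << pos) & field_mask)) & MASK32


def population_count(word: int) -> int:
    """Number of 1 bits in the 32-bit word."""
    return bin(word & MASK32).count("1")


def leading_zeros(word: int) -> int:
    """Number of 0 bits above the most significant 1 in the 32-bit word."""
    return 32 - (word & MASK32).bit_length()


if __name__ == "__main__":
    w = 0xABCD1234
    print(hex(extract_field(w, 8, 8)))
    print(hex(insert_field(w, 8, 8, 0xFF)))
    print(population_count(0xF0F0), leading_zeros(1))
```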
