Datapath and Control Unit Design

Presentation on theme: "Datapath and Control Unit Design"— Presentation transcript:

1 Datapath and Control Unit Design
Simple Processor! ( th ed)

2 Datapath vs Control
[Figure labels: Datapath, Controller, signals, Control Points]
Datapath: storage, functional units (FU), and interconnect sufficient to perform the desired functions. It gets its control inputs from the controller.
Controller: controls the operations on the datapath.

3 CPU Performance
[Figure labels: CPI, Inst. Count, Cycle Time]
Performance is determined by instruction count (which depends on the code), cycle time, and cycles per instruction (CPI).
Processor design impacts cycle time and clock cycles per instruction.
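The three factors on this slide combine as a product (CPU time = instruction count x CPI x cycle time). A minimal sketch, where the function name `cpu_time` and the sample numbers are assumptions chosen for illustration:

```python
def cpu_time(instruction_count, cpi, cycle_time_s):
    """CPU time in seconds = instruction count x CPI x cycle time."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical workload: one million instructions on a single-cycle
# machine (CPI = 1) with a 1 ns clock period.
single_cycle = cpu_time(1_000_000, 1.0, 1e-9)

# Same instruction count, but a design with CPI = 4 and a clock that is
# five times faster (0.2 ns). Both CPI and cycle time are set by the
# processor design.
multi_cycle = cpu_time(1_000_000, 4.0, 0.2e-9)

print(single_cycle, multi_cycle)
```

The comparison shows why a lower CPI alone does not decide performance: the cycle time has to be considered together with it.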

4 MIPS Format (Review)
All MIPS instructions are 32 bits. Three formats:
R-type: op [31:26, 6 bits] | rs [25:21, 5 bits] | rt [20:16, 5 bits] | rd [15:11, 5 bits] | shamt [10:6, 5 bits] | funct [5:0, 6 bits]
I-type: op [31:26, 6 bits] | rs [25:21, 5 bits] | rt [20:16, 5 bits] | immediate [15:0, 16 bits]
J-type: op [31:26, 6 bits] | target address [25:0, 26 bits]
One of the most important things you need to know before you start designing a processor is what the instructions look like. In more technical terms, you need to know the instruction format. One good thing about the MIPS instruction set is that it is very simple. First of all, all MIPS instructions are 32 bits long and there are only three instruction formats: (a) R-type, (b) I-type, and (c) J-type. The different fields of the R-type instructions are: (a) Op specifies the operation of the instruction. (b) Rs, Rt, and Rd are the source and destination register specifiers. (c) Shamt specifies the shift amount for the shift instructions. (d) Funct selects the variant of the operation specified in the “op” field. For the I-type instruction, bits 0 to 15 are used as an immediate field. I will show you how this immediate field is used differently by different instructions. Finally, for the J-type instruction, bits 0 to 25 become the target address of the jump.
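The field boundaries above translate directly into shifts and masks. The following is a small sketch of a field extractor; the function name `decode` and the dictionary layout are assumptions for illustration, and the two sample encodings are standard MIPS encodings of `add $1, $2, $3` and `lw $1, 4($2)`:

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into every field of the
    R, I, and J formats. Which fields are meaningful depends on op."""
    return {
        "op": (word >> 26) & 0x3F,       # bits 31..26
        "rs": (word >> 21) & 0x1F,       # bits 25..21
        "rt": (word >> 16) & 0x1F,       # bits 20..16
        "rd": (word >> 11) & 0x1F,       # bits 15..11 (R-type)
        "shamt": (word >> 6) & 0x1F,     # bits 10..6  (R-type)
        "funct": word & 0x3F,            # bits 5..0   (R-type)
        "immediate": word & 0xFFFF,      # bits 15..0  (I-type)
        "target": word & 0x03FFFFFF,     # bits 25..0  (J-type)
    }


add_fields = decode(0x00430820)  # add $1, $2, $3
lw_fields = decode(0x8C410004)   # lw $1, 4($2)
print(add_fields["rs"], add_fields["rt"], add_fields["rd"])
print(lw_fields["op"], lw_fields["immediate"])
```

All fields are extracted unconditionally because the hardware does the same thing: the wires for every field exist, and the control unit decides which ones are used.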

5 Instructions executed in steps
R-type: fetch instruction; select registers (rs, rt) [operand fetch]; ALU operation; write back to the register file.
lw/sw: fetch instruction; select a register (rs); calculate address (needs ALU); access memory (read/write); write register file (lw only).
Branch: fetch instruction; select registers (for beq); test condition and calculate target address (needs ALU).
The first two steps are common to all instructions.

6 Functional Units - to build datapath review

7 Review: How Registers work
Similar to a D flip-flop, with an N-bit input and output and a Write Enable input.
Write Enable negated (0): Data Out will not change.
Write Enable asserted (1): Data Out will become Data In after the clock edge.
[Figure labels: Write Enable, Data In (N bits), Data Out (N bits), Clk]
As far as storage elements are concerned, we will need an N-bit register that is similar to the D flip-flop I showed you in class. The significant difference here is that the register has a Write Enable input. That is, the content of the register will NOT be updated if Write Enable is not asserted (0). The content is updated at the clock tick ONLY if the Write Enable signal is asserted (1).
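The write-enable behavior can be sketched as a tiny behavioral model. This is an illustration only, not a hardware description; the class name `Register` and method name `clock_edge` are assumptions:

```python
class Register:
    """Behavioral model of an N-bit register with a Write Enable input."""

    def __init__(self, width=32):
        self.mask = (1 << width) - 1  # keep values within N bits
        self.data_out = 0

    def clock_edge(self, data_in, write_enable):
        """Model one clock edge: capture data_in only if write_enable is 1."""
        if write_enable:
            self.data_out = data_in & self.mask
        return self.data_out


reg = Register(width=8)
reg.clock_edge(0xAB, write_enable=0)  # negated: Data Out keeps its old value
reg.clock_edge(0xAB, write_enable=1)  # asserted: Data Out becomes Data In
print(hex(reg.data_out))
```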

8 MIPS Register File
The register file consists of 32 registers:
Two 32-bit outputs: Read data 1 and Read data 2.
A 32-bit input bus: Write data.
Register selection: R1 (read register 1) selects the register to put on Read data 1. R2 (read register 2) selects the register to put on Read data 2. RW (write register) selects the register to be written (via Write data) when Write Enable (RegWrite) is 1.
Clock input (CLK): the CLK input is a factor ONLY during a write operation. During a read operation, the register file behaves as a combinational logic block: Read data 1 and Read data 2 are valid after the “access time.”
[Figure labels: RW, R1, R2 (5 bits each); Write Enable; Write data (32 bits); 32 32-bit Registers; Read data 1, Read data 2 (32 bits each); Clk]
We will also need a register file that consists of 32 32-bit registers with two output busses (busA and busB) and one input bus (busW). In these notes, Ra, Rb, and Rw correspond to R1, R2, and RW on the slide. The register specifiers Ra and Rb select the registers to put on busA and busB respectively. When Write Enable is 1, the register specifier Rw selects the register to be written via busW. In our simplified version of the register file, the write operation occurs at the clock tick. Keep in mind that the clock input is a factor ONLY during the write operation. During a read operation, the register file behaves as a combinational logic block. That is, if you put a valid value on Ra, then busA will become valid after the register file’s access time. Similarly, if you put a valid value on Rb, busB will become valid after the register file’s access time. In both cases (Ra and Rb), the clock input is not a factor.
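A behavioral sketch of this register file may help: reads are modeled as plain function calls (combinational), and writes happen only in a method that represents the clock edge. The class and method names are assumptions. Keeping register 0 at zero is standard MIPS behavior, although the slide does not mention it:

```python
class RegisterFile:
    """Behavioral model: 32 registers of 32 bits, two read ports, one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, r1, r2):
        """Combinational read: no clock involved."""
        return self.regs[r1], self.regs[r2]

    def clock_edge(self, rw, write_data, reg_write):
        """Model one clock edge: write only when RegWrite is asserted."""
        # Register 0 stays zero (MIPS convention, not stated on the slide).
        if reg_write and rw != 0:
            self.regs[rw] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(rw=3, write_data=42, reg_write=1)
rf.clock_edge(rw=4, write_data=99, reg_write=0)  # ignored: RegWrite is 0
print(rf.read(3, 4))
```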

9 Memory review
Memory (Data):
Input: Data In (Write data). Output: Data Out (Read data).
Memory word selection: Address selects the word. When Write Enable (MemWrite) = 1, the address selects the memory word to be written via Data In.
Clock input (CLK), omitted from the book diagram for simplicity: the CLK input is a factor ONLY during a write operation. During a read operation, memory behaves as a combinational logic block: Address valid => Data Out valid after the “access time.”
Instruction memory is not shown (it is similar).
[Figure labels: Write Enable, Address, Data In / Write data (32 bits), Data Out / Read data (32 bits), Clk]
The last storage element you will need for the datapath is the idealized memory to store your data and instructions. This idealized memory block has just one input bus (DataIn) and one output bus (DataOut). When Write Enable is 0, the address selects the memory word to put on the DataOut bus. When Write Enable is 1, the address selects the memory word to be written via the DataIn bus at the next clock tick. Once again, the clock input is a factor ONLY during the write operation. During a read operation, it behaves as a combinational logic block. That is, if you put a valid value on the address lines, the output bus DataOut will become valid after the access time of the memory.

10 Clocking - Review
[Figure labels: Clk, Setup, Hold, Don’t Care]
All storage elements are clocked by the same clock edge.
Cycle Time = CLK-to-Q + Longest Delay Path + Setup + Clock Skew
(CLK-to-Q + Shortest Delay Path - Clock Skew) > Hold Time
Remember, we will be using a clocking methodology where all storage elements are clocked by the same clock edge. Consequently, our cycle time will be the sum of: (a) the Clock-to-Q time of the input registers, (b) the longest delay path through the combinational logic block, (c) the setup time of the output register, and (d) the clock skew. In order to avoid a hold time violation, you have to make sure the inequality above is fulfilled.
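The two timing constraints on this slide can be written as two small functions. This is a sketch; the function names and the sample delay values (in arbitrary time units) are assumptions:

```python
def min_cycle_time(clk_to_q, longest_path, setup, skew):
    """Cycle Time = CLK-to-Q + Longest Delay Path + Setup + Clock Skew."""
    return clk_to_q + longest_path + setup + skew


def hold_ok(clk_to_q, shortest_path, skew, hold):
    """Hold check: (CLK-to-Q + Shortest Delay Path - Clock Skew) > Hold Time."""
    return clk_to_q + shortest_path - skew > hold


# Hypothetical delays in arbitrary time units.
print(min_cycle_time(clk_to_q=1, longest_path=10, setup=2, skew=1))
print(hold_ok(clk_to_q=1, shortest_path=2, skew=1, hold=1))
print(hold_ok(clk_to_q=1, shortest_path=0, skew=1, hold=1))
```

Note that clock skew hurts both constraints: it lengthens the minimum cycle time and shrinks the hold margin.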

11 Single-Cycle: Instruction Fetch Datapath
Instruction fetch: the instruction is in instruction memory; the program counter (PC) points to the current instruction; an adder increments the PC to point to the next instruction.
For a branch instruction, the incremented address may not be the valid next instruction address.
[Figure labels: Next Address Logic, Read address, Instruction, Inst memory]

12 R-type Datapath
R[rd] <- R[rs] op R[rt]. Example: add rd, rs, rt
Ra, Rb, and Rw come from the instruction’s rs, rt, and rd fields.
ALUctr and RegWr come from the control logic after decoding the instruction.
Instruction fields: op [31:26, 6 bits] | rs [25:21, 5 bits] | rt [20:16, 5 bits] | rd [15:11, 5 bits] | shamt [10:6, 5 bits] | funct [5:0, 6 bits]
[Figure labels: Rd, Rs, Rt feed Rw, R1, R2 (5 bits each); Write; Write data (32 bits); 32 32-bit Registers; Read data 1, Read data 2 (32 bits each); ALU control; ALU; Result (32 bits); Clk]

13 Complete R-type Datapath
[Figure labels: Next Address Logic; Inst memory (Read address, Instruction); register file (Read register 1, Read register 2, Write register, write data, read data 1, read data 2, Write); ALU (ALU control, zero, result)]

14 Timing: One complete cycle
[Timing diagram, each signal going from Old Value to New Value:
Clk
PC: new value after Clk-to-Q
Rs, Rt, Rd, Op, Func: new value after Instruction Memory Access Time
ALUctr, RegWr: new value after Delay through Control Logic
Read data 1 & 2: new value after Register File Access Time
Write data: new value after ALU Delay
Register Write Occurs Here (at the next clock edge)]
[Figure labels: Rd, Rs, Rt feed Rw, Ra, Rb (5 bits each); RegWr; ALUctr; Write data; 32 32-bit Registers; Read data 1, Read data 2 (32 bits each); ALU; Result (32 bits); Clk]
Let’s take a more quantitative picture of what is happening. At each clock tick, the Program Counter will present its latest value to the instruction memory after the Clk-to-Q time. After a delay of the Instruction Memory Access Time, the Opcode, Rd, Rs, Rt, and Function fields will become valid on the instruction bus. Once we have the new instruction, that is, the Add or Subtract instruction, on the instruction bus, two things happen in parallel. First of all, the control unit will decode the Opcode and Func fields and set the control signals ALUctr and RegWr accordingly. We will cover this in the next lecture. While this is happening (Control Delay), we will also be reading the register file (Register File Access Time). Once the data is valid on busA and busB, the ALU will perform the Add or Subtract operation based on the ALUctr signal. Hopefully, the ALU is fast enough that it will finish the operation (ALU Delay) before the next clock tick. At the next clock tick, the output of the ALU will be written into the register file because the RegWr signal will be equal to 1.

15 Load/Store Datapath
lw $1, offset-value($2) ; sw $1, offset-value($2)
Instruction fetch is the same as for R-type.
Register file: get the base register.
Sign extension: extend the 16-bit offset to 32 bits.
ALU: calculate the memory address.
Data memory: read OR write.
[Figure labels: Register file (Read rg 1, Read rg 2, Write reg, write data, Read data 1, read data 2); sign ext. (16 to 32 bits); data memory (address, write data, read data)]
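The address calculation for lw/sw (base register plus sign-extended 16-bit offset) can be sketched as follows. The function names are assumptions for illustration; the arithmetic mirrors the sign-extension unit and the ALU add:

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def effective_address(base, offset16):
    """Memory address = base register + sign-extended offset (mod 2**32)."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF


# lw $1, 8($2) with $2 = 0x1000 reads from address 0x1008.
print(hex(effective_address(0x1000, 8)))
# A negative offset: the 16-bit field 0xFFFC encodes -4.
print(hex(effective_address(0x1000, 0xFFFC)))
```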

16 Branch Inst. Datapath
beq $1, $2, offset: if ($1 == $2) goto PC + offset*4, where PC here is the already-incremented value (PC + 4) in standard MIPS.
ALU: evaluates the branch condition.
Adder: computes the branch target address.
Shift left 2: multiplies the offset by 4, which increases the range of the offset by a factor of 4.
Zero: goes to the control logic to decide whether to branch.
[Figure labels: Inst.; Registers (Read Reg 1, Read Reg 2, Data1, Data2); sign ext. (16 to 32 bits); shift left 2; Add (target); ALU; ALU control; zero, to branch control logic]

17 Complete Datapath for: R, LD/ST, BEQ
[Figure: complete single-cycle datapath; its labels are not recoverable from the transcript]
Executes basic instructions in a single clock cycle.
Any resource can be used only once during a single cycle.

18 Datapath controlled by control unit
Identify your controls

19 Single-Cycle: Control Signals
Main control: the input is the 6-bit opcode; the output is 9 control lines.
ALU control: the input is ALUop plus the 6-bit function field; the output is 3 lines.
For I- and J-type instructions, ALU control depends only on ALUop.
[Figure labels: Main, op, func]

20 ALU Control, Truth Table
ALUop: output of the main control. R-type: ALUop = 10; lw/sw: ALUop = 00.
ALU Control: combinational logic with 8 inputs and 3 outputs.
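The truth table itself did not survive in the transcript. The sketch below shows what such an ALU control block computes, under stated assumptions: the 3-bit output codes (AND 000, OR 001, add 010, subtract 110, slt 111), the ALUop value 01 for beq, and the funct codes are taken from the standard textbook single-cycle MIPS design, not from this slide. The function name `alu_control` is hypothetical.

```python
# funct field -> 3-bit ALU control code (assumed textbook encoding)
FUNCT_TO_ALU = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt
}


def alu_control(alu_op, funct):
    """Combinational logic: 2-bit ALUop + 6-bit funct -> 3 control lines."""
    if alu_op == 0b00:   # lw/sw: add base and offset; funct is ignored
        return 0b010
    if alu_op == 0b01:   # beq (assumed encoding): subtract to compare
        return 0b110
    if alu_op == 0b10:   # R-type: operation selected by the funct field
        return FUNCT_TO_ALU[funct]
    raise ValueError("unsupported ALUop")


print(format(alu_control(0b10, 0b100010), "03b"))  # R-type sub
print(format(alu_control(0b00, 0b000000), "03b"))  # lw/sw
```

The two-level split keeps the main control small: it only has to classify the opcode, and the funct decoding is confined to this block.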

21 Datapath with Control unit

22 Datapath with Control unit

23 Datapath timings
[Figure: datapath annotated with component delays 100, 100, 400, 120, 200, 350, 30, 30; units are not given in the transcript]
R-format timing = 400 + 200 + 30 + 120 + 30 = 780 (IF – WB)
OR = (IF – cntl – Pcmux)

24 Control Unit -- Control Signal Definitions
PCsrc = branch AND zero
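The PCsrc definition can be sketched as a next-PC selection. This assumes the standard MIPS convention that the branch offset is added to the already-incremented PC (PC + 4); the function name `next_pc` and its parameters are hypothetical:

```python
def next_pc(pc, branch, zero, offset16):
    """Select the next PC: PCsrc = branch AND zero picks the branch target."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF

    # Sign-extend the 16-bit offset, then shift left by 2 (multiply by 4).
    offset = offset16 & 0xFFFF
    if offset & 0x8000:
        offset -= 0x10000
    branch_target = (pc_plus_4 + (offset << 2)) & 0xFFFFFFFF

    pc_src = bool(branch) and bool(zero)
    return branch_target if pc_src else pc_plus_4


print(hex(next_pc(0x100, branch=1, zero=1, offset16=3)))  # branch taken
print(hex(next_pc(0x100, branch=1, zero=0, offset16=3)))  # not taken
```

In hardware both candidate addresses are computed every cycle, and PCsrc only drives the multiplexer that chooses between them.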

25 Example 1: Execution flow for add $1, $1, $3 (4 steps + bypass)
1. IF
2. D
3. EX, ALU func.
4. Bypass
5. WB write back result

26 Example 2: LW S0, OFF(S1)
Memory address = OFF + S1
1. IF
2. D
3. EX, calc address
4. Mem rd
5. WB write back result

27 Example 3: BEQ S1, S0, cs330
Target address = PC + offset x 4 (in standard MIPS, PC here is the already-incremented PC + 4)
1. IF
2. D
3. EX, compare s1:s0
Update PC with the target address if the comparison succeeds.

28 Single-Cycle: J-type
So far, the datapath can handle R-type, lw/sw, and beq. How about J-type?
J-type: j L1, jal L1
Exercise: address = , current PC = , actual address of L1 = [values not preserved in the transcript]
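For reference, the standard MIPS rule for forming a jump address is to shift the 26-bit target field left by 2 and fill the top 4 bits from PC + 4. The exercise values are missing from the transcript, so the numbers below are hypothetical, as is the function name `jump_address`:

```python
def jump_address(pc, target26):
    """Jump address = upper 4 bits of (PC + 4) combined with target field << 2."""
    upper = (pc + 4) & 0xF0000000
    return upper | ((target26 & 0x03FFFFFF) << 2)


# Hypothetical example: a jump at PC = 0x00400010 with target field 0x0100008
# lands at word address 0x0100008, i.e. byte address 0x00400020.
print(hex(jump_address(0x00400010, 0x0100008)))
```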

29 Single-Cycle: Datapath + Control including jump inst

30 What’s wrong with Single cycle CPI=1 processor?
Long cycle time: all instructions take as much time as the slowest.
Real memory is slow.
[Figure: critical path for each instruction class (Arithmetic & Logical, Load, Store, Branch) through Inst Memory, Reg File, ALU or cmp, Data Mem, and RegW]

31 Single Cycle Timing Diagram
[Figure labels: Clk; Single Cycle Implementation: Load, Store, Waste]
Here are the timing diagrams showing the differences between the single cycle, multiple cycle, and pipeline implementations. For example, in the pipeline implementation, we can finish executing the Load, Store, and R-type instruction sequence in seven cycles. In the multiple clock cycle implementation, however, we cannot start executing the store until Cycle 6 because we must wait for the load instruction to complete. Similarly, we cannot start the execution of the R-type instruction until the store instruction has completed its execution in Cycle 9. In the single cycle implementation, the cycle time is set to accommodate the longest instruction, the Load instruction. Consequently, the cycle time for the single cycle implementation can be five times longer than that of the multiple cycle implementation. But maybe more importantly, since the cycle time has to be long enough for the load instruction, it is too long for the store instruction, so the last part of the cycle is wasted.

32 CPU VS Microcontroller
Microcontroller = CPU + Flash (ROM) + RAM + popular I/O peripherals.
8051 Microcontroller Block Diagram: used in the lab project.
Used to implement low-cost applications and embedded systems, e.g. automotive, appliances, elevators.

33 Microcontroller Block Diagram: PIC

