Datapath and Control Unit Design


Datapath and Control Unit Design: Simple Processor (Sections 4.1-4.4, 4th ed.)

Datapath vs Control
Datapath: storage, functional units (FU), and interconnect sufficient to perform the desired functions. It gets its control inputs from the controller.
Controller: controls operations on the datapath by driving signals to its control points.

CPU Performance
Performance is determined by:
- Instruction count (code)
- Cycles per instruction (CPI)
- Cycle time
Processor design impacts: cycle time and clock cycles per instruction.
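The three factors above combine as a simple product. The following is a minimal sketch of that relationship; the function name and the sample numbers are illustrative assumptions, not values from the slides.

```python
def cpu_time_ns(instruction_count, cpi, cycle_time_ns):
    """Execution time = instruction count x cycles per instruction x cycle time."""
    return instruction_count * cpi * cycle_time_ns

# Hypothetical program: 1,000,000 instructions on a single-cycle design
# (CPI = 1) with an assumed 8 ns clock period.
single_cycle_ns = cpu_time_ns(1_000_000, 1, 8)
```

Processor design cannot change the instruction count of a given program, but it sets both of the other two factors.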

MIPS Format (Review)
All MIPS instructions are 32 bits long. There are three formats:
R-type: op [31:26] (6 bits) | rs [25:21] (5 bits) | rt [20:16] (5 bits) | rd [15:11] (5 bits) | shamt [10:6] (5 bits) | funct [5:0] (6 bits)
I-type: op [31:26] (6 bits) | rs [25:21] (5 bits) | rt [20:16] (5 bits) | immediate [15:0] (16 bits)
J-type: op [31:26] (6 bits) | target address [25:0] (26 bits)
One of the most important things you need to know before you start designing a processor is what the instructions look like. In more technical terms, you need to know the instruction format. One good thing about the MIPS instruction set is that it is very simple. First of all, all MIPS instructions are 32 bits long, and there are only three instruction formats: (a) R-type, (b) I-type, and (c) J-type. The different fields of the R-type instructions are: (a) Op specifies the operation of the instruction. (b) Rs, Rt, and Rd are the source and destination register specifiers. (c) Shamt specifies the shift amount for the shift instructions. (d) Funct selects the variant of the operation specified in the "op" field. For the I-type instruction, bits 0 to 15 are used as an immediate field. I will show you how this immediate field is used differently by different instructions. Finally, for the J-type instruction, bits 0 to 25 become the target address of the jump.
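The field layout can be made concrete with a small decoder. This is only a sketch: the function name `decode` and the dictionary keys are assumptions chosen for illustration. It extracts every field of all three formats from one 32-bit word, and the caller picks the ones that apply to the format at hand.

```python
def decode(instr):
    """Split a 32-bit MIPS instruction word into its bit fields."""
    return {
        "op":     (instr >> 26) & 0x3F,   # bits 31..26, all formats
        "rs":     (instr >> 21) & 0x1F,   # bits 25..21, R and I
        "rt":     (instr >> 16) & 0x1F,   # bits 20..16, R and I
        "rd":     (instr >> 11) & 0x1F,   # bits 15..11, R only
        "shamt":  (instr >> 6) & 0x1F,    # bits 10..6,  R only
        "funct":  instr & 0x3F,           # bits 5..0,   R only
        "imm":    instr & 0xFFFF,         # bits 15..0,  I only
        "target": instr & 0x3FFFFFF,      # bits 25..0,  J only
    }

# add $1, $2, $3 is an R-type word: op=0, rs=2, rt=3, rd=1, funct=0x20
add_word = (0 << 26) | (2 << 21) | (3 << 16) | (1 << 11) | (0 << 6) | 0x20
fields = decode(add_word)
```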

Instructions executed in steps
R-type: fetch instruction; select registers (rs, rt) [operand fetch]; ALU operation; write back to the register file.
lw/sw: fetch instruction; select a register (rs); calculate address (needs ALU); access memory (read/write); write register file (lw only).
Branch: fetch instruction; select registers (for beq); test condition and calculate target address (needs ALU).
The first two steps are common to all instructions.

Functional Units to Build the Datapath (Review)

Review: How Registers Work
Similar to a D flip-flop, with an N-bit input (Data In), an N-bit output (Data Out), a Write Enable input, and a clock (Clk).
Write Enable negated (0): Data Out will not change.
Write Enable asserted (1): Data Out will become Data In after the clock edge.
As far as storage elements are concerned, we will need an N-bit register that is similar to the D flip-flop I showed you in class. The significant difference here is that the register has a Write Enable input. That is, the content of the register will NOT be updated if Write Enable is negated (0). The content is updated at the clock tick ONLY if the Write Enable signal is asserted (1).
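The write-enable behavior can be sketched as a small behavioral model. The class and method names here are hypothetical, and the model ignores setup, hold, and clock-to-Q delays.

```python
class Register:
    """Behavioral model of an N-bit register with a Write Enable input."""

    def __init__(self, n_bits=32):
        self.mask = (1 << n_bits) - 1
        self.data_out = 0

    def clock_edge(self, data_in, write_enable):
        # Data Out takes the value of Data In only when Write Enable is 1;
        # otherwise the stored value is held unchanged.
        if write_enable:
            self.data_out = data_in & self.mask
        return self.data_out
```

For example, after `r = Register(8)` and `r.clock_edge(0x1FF, 1)`, a later `r.clock_edge(5, 0)` leaves the stored value untouched.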

MIPS Register File
The register file consists of 32 registers, each 32 bits wide:
- Two 32-bit outputs: Read data 1 and Read data 2.
- One 32-bit input bus: Write data.
- Register selection (5 bits each): R1 (read register 1) selects the register to put on Read data 1; R2 (read register 2) selects the register to put on Read data 2; RW (write register) selects the register to be written with Write data when Write Enable (RegWrite) is 1.
- Clock input (CLK): the CLK input is a factor ONLY during a write operation. During a read operation, the register file behaves as a combinational logic block: Read data 1 and Read data 2 are valid after the "access time."
We will also need a register file that consists of 32 32-bit registers with two output busses (busA and busB, labeled Read data 1 and Read data 2 above) and one input bus. The register specifiers Ra and Rb select the registers to put on busA and busB respectively. When Write Enable is 1, the register specifier Rw selects the register to be written via busW. In our simplified version of the register file, the write operation occurs at the clock tick. Keep in mind that the clock input is a factor ONLY during the write operation. During a read operation, the register file behaves as a combinational logic block. That is, if you put a valid value on Ra, then busA will become valid after the register file's access time. Similarly, if you put a valid value on Rb, busB will become valid after the register file's access time. In both cases (Ra and Rb), the clock input is not a factor.
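A behavioral sketch of this register file follows. The class and method names are assumptions for illustration. Reads are modeled as plain function calls (combinational), while writes happen only in `clock_edge` and only when `reg_write` is 1. Access time is not modeled.

```python
class RegisterFile:
    """32 x 32-bit register file: two read ports, one clocked write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, r1, r2):
        # Combinational read: the clock plays no role here.
        return self.regs[r1 & 0x1F], self.regs[r2 & 0x1F]

    def clock_edge(self, rw, write_data, reg_write):
        # The write takes effect at the clock edge, and only if RegWrite is 1.
        if reg_write:
            self.regs[rw & 0x1F] = write_data & 0xFFFFFFFF
```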

Memory (Review)
Data memory:
- Input: Data In (Write data), 32 bits.
- Output: Data Out (Read data), 32 bits.
- Memory word selection: Address selects the word.
- Write Enable (MemWrite) = 1: Address selects the memory word to be written via Data In.
- Clock input (CLK), omitted from the book's diagram for simplicity: the CLK input is a factor ONLY during a write operation. During a read operation, the memory behaves as a combinational logic block: Address valid => Data Out valid after the "access time."
The instruction memory is not shown (it is similar).
The last storage element you will need for the datapath is the idealized memory to store your data and instructions. This idealized memory block has just one input bus (DataIn) and one output bus (DataOut). When Write Enable is 0, the address selects the memory word to put on the DataOut bus. When Write Enable is 1, the address selects the memory word to be written via the DataIn bus at the next clock tick. Once again, the clock input is a factor ONLY during the write operation. During a read operation, it behaves as a combinational logic block. That is, if you put a valid value on the address lines, the output bus DataOut will become valid after the access time of the memory.

Clocking (Review)
All storage elements are clocked by the same clock edge.
Cycle Time = CLK-to-Q + Longest Delay Path + Setup + Clock Skew
(CLK-to-Q + Shortest Delay Path - Clock Skew) > Hold Time
Remember, we will be using a clocking methodology where all storage elements are clocked by the same clock edge. Consequently, our cycle time will be the sum of: (a) the Clock-to-Q time of the input registers, (b) the longest delay path through the combinational logic block, (c) the setup time of the output register, and (d) the clock skew. In order to avoid a hold time violation, you have to make sure the inequality above is fulfilled.
[Figure: Clk waveform with setup and hold windows around each clock edge; data is "don't care" outside those windows]
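The two timing constraints translate directly into code. This is a sketch with hypothetical function names, and the numbers in the example are made up purely to exercise the formulas (all in the same time unit).

```python
def min_cycle_time(clk_to_q, longest_delay, setup, skew):
    """Cycle Time = CLK-to-Q + Longest Delay Path + Setup + Clock Skew."""
    return clk_to_q + longest_delay + setup + skew

def hold_ok(clk_to_q, shortest_delay, skew, hold):
    """Hold constraint: (CLK-to-Q + Shortest Delay Path - Clock Skew) > Hold Time."""
    return (clk_to_q + shortest_delay - skew) > hold

# Made-up delays for illustration only.
cycle = min_cycle_time(clk_to_q=2, longest_delay=40, setup=3, skew=1)
safe = hold_ok(clk_to_q=2, shortest_delay=1, skew=1, hold=1)
```

Note that clock skew hurts both checks: it lengthens the minimum cycle time and shrinks the hold margin.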

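Single-Cycle: Instruction Fetch Datapath
Instruction fetch: the instruction is read from the instruction memory.
The program counter (PC) points to the current instruction.
An adder increments the PC to point to the next instruction.
For a branch instruction, the incremented next-instruction address may not be valid.
[Figure: next-address logic and PC driving the Read address input of the instruction memory, which outputs the Instruction]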

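R-type Datapath
R[rd] <- R[rs] op R[rt]
Example: add rd, rs, rt
Ra, Rb, and Rw come from the instruction's rs, rt, and rd fields.
ALUctr and RegWr come from the control logic after decoding the instruction.
Instruction fields: op [31:26] (6 bits), rs [25:21], rt [20:16], rd [15:11], shamt [10:6] (5 bits each), funct [5:0] (6 bits).
[Figure: register file of 32 32-bit registers with 5-bit inputs R1 (Rs), R2 (Rt), and Rw (Rd), a 32-bit Write data input, Write enable, and Clk; Read data 1 and Read data 2 feed the ALU, whose 32-bit Result returns to Write data]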

Complete R-type Datapath
[Figure: next-address logic and instruction memory (Read address, Instruction); register file (Read register 1, Read register 2, Write register, Write data, Read data 1, Read data 2, Write); ALU with ALU control, producing zero and result]

Timing: One Complete Cycle
Let's take a more quantitative look at what is happening. At each clock tick, the Program Counter presents its latest value to the instruction memory after the Clk-to-Q time. After a delay of the instruction memory access time, the Opcode, Rd, Rs, Rt, and Function fields become valid on the instruction bus. Once we have the new instruction, that is, the Add or Subtract instruction, on the instruction bus, two things happen in parallel. First, the control unit decodes the Opcode and Func fields and sets the control signals ALUctr and RegWr accordingly. We will cover this in the next lecture. While this is happening (the delay through control logic), we are also reading the register file (register file access time). Once the data is valid on busA and busB, the ALU performs the Add or Subtract operation based on the ALUctr signal. Hopefully, the ALU is fast enough to finish the operation (ALU delay) before the next clock tick. At the next clock tick, the output of the ALU is written into the register file because the RegWr signal is equal to 1.
[Figure: timing diagram showing old and new values of PC; Rs, Rt, Rd, Op, Func; ALUctr; RegWr; Read data 1 and 2; and Write data across one clock cycle, with the register write occurring at the next clock edge]

Load/Store Datapath
lw $1, offset-value($2) ; sw $1, offset-value($2)
Fetch: same as for R-type.
Register file: get the base register.
Sign extension: extend the 16-bit offset to 32 bits.
ALU: calculate the memory address.
Data memory: read OR write.
[Figure: register file (Read reg 1, Read reg 2, Write reg, Write data, Read data 1, Read data 2), 16-to-32-bit sign extension, ALU, and data memory (Address, Write data, Read data)]
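The address calculation for lw and sw can be sketched as follows. The function names are hypothetical. The sketch sign-extends the 16-bit offset and adds it to the base register value, wrapping to 32 bits as the ALU would.

```python
def sign_extend16(imm16):
    """Interpret a 16-bit field as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def effective_address(base, imm16):
    """Memory address = base register + sign-extended offset (32-bit wrap)."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF

# lw $1, -4($2) with $2 = 0x1000: the offset field holds 0xFFFC.
addr = effective_address(0x1000, 0xFFFC)
```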

Branch Instruction Datapath
beq $1, $2, offset: if ($1 == $2) goto target = PC + 4 + offset*4
ALU: evaluates the branch condition.
Adder: computes the branch target address.
Shift left 2: increases the range of the offset by a factor of 4.
Zero: goes to the branch control logic, which decides whether to branch.
[Figure: instruction fields select Read Reg 1 and Read Reg 2 in the register file; Data1 and Data2 feed the ALU (with ALU control), whose zero output goes to the branch control logic; the 16-bit offset is sign-extended to 32 bits, shifted left by 2, and added to form the target]
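The next-PC selection for beq can be sketched as below. The function names are hypothetical, and the sketch assumes the standard MIPS convention that the shifted offset is added to the already incremented PC (PC + 4).

```python
def sign_extend16(imm16):
    """Interpret a 16-bit field as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def next_pc(pc, imm16, branch, zero):
    """Return the branch target if the branch is taken, else PC + 4."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    # Shift left 2 converts a word offset into a byte offset.
    target = (pc_plus_4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF
    taken = branch and zero   # zero = 1 when the ALU finds the operands equal
    return target if taken else pc_plus_4
```

With `pc = 0x100` and an offset field of 3, a taken branch yields `0x110`; an offset field of `0xFFFF` (that is, -1) branches back to `0x100` itself.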

Complete Datapath for R-type, LD/ST, and BEQ
Executes basic instructions in a single clock cycle.
Any resource can be used only once during a single cycle.
[Figure: complete single-cycle datapath]

Datapath controlled by control unit: identify your controls

Single-Cycle: Control Signals
Main control: input is the 6-bit opcode (op); output is 9 control lines.
ALU control: input is ALUop plus the 6-bit function field (func); output is 3 lines.
For I- and J-type instructions, ALU control depends only on ALUop.

ALU Control, Truth Table
ALUop: output of the main control. R-type: ALUop = 10; lw/sw: ALUop = 00.
ALU control: combinational logic with 8 inputs and 3 outputs.
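The truth table itself did not survive in the transcript. The sketch below shows what such a table could look like as code. The ALUop values for R-type and lw/sw come from the slide; the ALUop value for beq (01), the funct codes, and the 3-bit output encodings are assumptions following the common textbook convention, and the function name is hypothetical.

```python
# funct field -> 3-bit ALU control (assumed textbook encoding)
FUNCT_TO_ALU = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # subtract
    0b100100: 0b000,  # AND
    0b100101: 0b001,  # OR
    0b101010: 0b111,  # set on less than
}

def alu_control(alu_op, funct):
    """8 inputs (2-bit ALUop + 6-bit funct) -> 3-bit ALU control."""
    if alu_op == 0b00:        # lw/sw: add base and offset, funct is ignored
        return 0b010
    if alu_op == 0b01:        # beq (assumed ALUop): subtract to compare
        return 0b110
    return FUNCT_TO_ALU[funct & 0x3F]   # R-type: the funct field decides
```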

Datapath with Control unit

Datapath timings
R-format timing = 400 + 200 + 30 + 120 + 30 (IF - WB)
OR = 400 + 100 (IF - cntl - PC mux)
[Figure: datapath annotated with component delays 100, 100, 400, 120, 200, 350, 30, 30]

Control Unit: Control Signal Definitions
PCsrc = branch AND zero
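A possible encoding of the main control as a lookup table is sketched below. Only PCsrc = branch AND zero and the ALUop values for R-type and lw/sw come from the slides. The signal names, the opcode values, and the remaining table entries are assumptions following the common textbook single-cycle design (seven 1-bit signals plus a 2-bit ALUOp, nine lines in total). `None` marks a don't-care.

```python
R_TYPE, LW, SW, BEQ = 0b000000, 0b100011, 0b101011, 0b000100  # MIPS opcodes

CONTROL = {
    R_TYPE: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                 MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    LW:     dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                 MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    SW:     dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
                 MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    BEQ:    dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
                 MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}

def main_control(opcode):
    """Map the 6-bit opcode to the control lines."""
    return CONTROL[opcode]

def pc_src(branch, zero):
    """PCsrc = branch AND zero: take the branch target only for a taken beq."""
    return branch & zero
```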

Example 1: Execution flow for add $1, $1, $3 (4 steps + bypass)
1. IF
2. D
3. EX, ALU function
4. Bypass
5. WB, write back result

Example 2: LW S0, OFF(S1)
Memory address = OFF + S1
1. IF
2. D
3. EX, calculate address
4. Mem read
5. WB, write back result

Example 3: BEQ S1, S0, cs330
Target address = PC + 4 + offset x 4
1. IF
2. D
3. EX, compare S1 and S0
Update the PC with the target address if the comparison succeeds.

Single-Cycle: J-type
So far, the datapath can handle R-type, lw/sw, and beq. How about J-type?
J-type: j L1 P.372 jal L1 Exercise 5.6
address= current PC = Actual address L1 =
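For reference, the standard MIPS jump address formation can be sketched as follows. This is not taken from the slide, whose worked values are blank in the transcript. The function name is hypothetical, and the sketch assumes the usual rule: the upper 4 bits of PC + 4 are concatenated with the 26-bit target field shifted left by 2.

```python
def jump_target(pc, target26):
    """Jump address = { (PC + 4)[31:28], 26-bit target field, 00 }."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    return (pc_plus_4 & 0xF0000000) | ((target26 & 0x3FFFFFF) << 2)

# A jump located at 0x00400000 whose target field is 0x0100000.
addr = jump_target(0x00400000, 0x0100000)
```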

Single-Cycle: Datapath + Control including jump inst

What's wrong with a single-cycle (CPI = 1) processor?
Long cycle time.
All instructions take as much time as the slowest one.
Real memory is slow.
[Figure: critical paths through Inst Memory, Reg File, ALU (or cmp), Data Mem, and register write for arithmetic & logical, load, store, and branch instructions]

Single Cycle Timing Diagram
[Figure: Clk waveform for the single-cycle implementation, showing a Load cycle followed by a Store cycle with wasted time at the end]
Here are the timing diagrams showing the differences between the single cycle, multiple cycle, and pipeline implementations. For example, in the pipeline implementation, we can finish executing the Load, Store, and R-type instruction sequence in seven cycles. In the multiple clock cycle implementation, however, we cannot start executing the store until Cycle 6 because we must wait for the load instruction to complete. Similarly, we cannot start the execution of the R-type instruction until the store instruction has completed its execution in Cycle 9. In the single cycle implementation, the cycle time is set to accommodate the longest instruction, the Load instruction. Consequently, the cycle time for the single cycle implementation can be five times longer than that of the multiple cycle implementation. Perhaps more importantly, since the cycle time has to be long enough for the load instruction, it is too long for the store instruction, so the last part of the store's cycle is wasted.

CPU vs Microcontroller
Microcontroller = CPU + Flash (ROM) + RAM + popular I/O peripherals.
8051 Microcontroller Block Diagram: used in the lab project.
Used to implement low-cost applications and embedded systems, e.g., automotive, appliances, elevators.

Microcontroller Block Diagram: PIC