EECC550 - Shaaban #1 Lec # 5 Winter 2009 1-5-2010 Major CPU Design Steps 1. Analyze instruction set operations using independent RTN ISA => RTN => datapath.

Presentation transcript:

Major CPU Design Steps
1. Analyze instruction set operations using independent RTN: ISA => RTN => datapath requirements.
   - This provides the required datapath components and how they are connected to meet ISA requirements.
   - Determine the number of cycles per instruction and the operations in each cycle.
2. Select the required datapath components and connections, and establish the clock methodology (e.g. clock edge-triggered).
3. Assemble a datapath that meets the requirements.
4. Identify and define the function of all control points or signals needed by the datapath.
   - Analyze the implementation of each instruction to determine the setting of the control points that affect its operations and register transfers.
5. Design and assemble the control logic.
   - Hard-wired: finite-state machine implementation.
   - Microprogrammed, i.e. using a control program (3rd Edition Chapter 5.5; see handout; not in 4th Edition).
Steps 1 to 3 cover the datapath; steps 4 and 5 cover control.

Single Cycle MIPS Datapath: CPI = 1, Long Clock Cycle
(Jump not included.)
T = I x CPI x C

Single Cycle MIPS Datapath Extended To Handle Jump with Control Unit Added (Figure 5.24 page 314; the book figure may have an error)
In this book version, ORI is not supported, so no zero extension of the immediate is needed.
ALUOp (2 bits): 00 = add, 01 = subtract, 10 = R-Type.
[Figure labels: Opcode, Function Field, rs, rt, rd, imm16, R[rs], R[rt], PC, PC + 4, Branch Target]

Drawbacks of Single-Cycle Processor (CPI = 1)
1. Long cycle time:
   - All instructions must take as much time as the slowest: the cycle time needed for load is longer than needed for all other instructions.
   - Real memory is not as well-behaved as idealized memory: it cannot always complete a data access in one (short) cycle.
2. Impossible to implement complex, variable-length instructions and complex addressing modes in a single cycle, e.g. indirect memory addressing.
3. High and duplicate hardware resource requirements:
   - No hardware functional unit (e.g. an ALU) can be used more than once in a single cycle.
4. Cannot pipeline (overlap) the processing of one instruction with the previous instructions (instruction pipelining, chapter 6).

Abstract View of Single Cycle CPU
[Figure: PC and Next PC logic, Instruction Fetch, Register Fetch, Ext, ALU, Mem Access (Data Mem), Result Store (Reg. Wrt). Main Control and ALU control take op and fun and drive ALUctr, RegDst, ALUSrc, ExtOp, MemWr, MemRd, RegWr, Branch and Jump; Equal is returned to control.]
One CPU clock cycle duration: C = 8 ns. One instruction per cycle: CPI = 1.
Assuming the following datapath/control hardware component delays:
- Memory units: 2 ns
- ALU and adders: 2 ns
- Register file: 1 ns
- Control unit: < 1 ns

Single Cycle Instruction Timing
- Arithmetic & Logical: PC -> Inst Memory -> Reg File -> mux -> ALU -> mux -> setup
- Load: PC -> Inst Memory -> Reg File -> mux -> ALU -> Data Mem -> mux -> setup
- Store: PC -> Inst Memory -> Reg File -> mux -> ALU -> Data Mem
- Branch: PC -> Inst Memory -> Reg File -> cmp -> mux
Critical path: Load (e.g. 8 ns). The critical path determines the CPU clock cycle, C.

Clock Cycle Time & Critical Path
- Critical path: the slowest path (i.e. longest delay) between any two storage devices.
- Clock cycle time is a function of the critical path, and must be greater than:
  Clock-to-Q + longest delay path through the combinational logic + setup + clock skew
- The critical path here is LW, so one CPU clock cycle duration is C = 8 ns.
Assuming the following datapath/control hardware component delays:
- Memory units: 2 ns
- ALU and adders: 2 ns
- Register file: 1 ns
- Control unit: < 1 ns
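The 8 ns figure can be reproduced by summing the component delays along each instruction class's path. The sketch below is only an illustration: the `PATHS` table is a hypothetical encoding of which units each class uses, and mux, setup, clock-to-Q and skew delays are ignored, as on the slides.

```python
# Component delays in ns, as assumed on the slide.
DELAY_NS = {"mem": 2, "alu": 2, "reg": 1}

# Hypothetical encoding of the units each instruction class passes through.
PATHS = {
    "R-type": ["mem", "reg", "alu", "reg"],         # fetch, read regs, ALU, write reg
    "load":   ["mem", "reg", "alu", "mem", "reg"],  # adds a data memory read
    "store":  ["mem", "reg", "alu", "mem"],         # data memory write, no reg write
    "branch": ["mem", "reg", "alu"],                # compare only
}

def path_delay(units):
    """Total delay in ns along one path."""
    return sum(DELAY_NS[u] for u in units)

delays = {name: path_delay(units) for name, units in PATHS.items()}
critical = max(delays, key=delays.get)  # slowest instruction class
cycle_ns = delays[critical]             # single-cycle clock period
print(delays, critical, cycle_ns)
```

Under these assumptions the load path is the longest, which is why it sets the single-cycle clock period.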

Reducing Cycle Time: Multi-Cycle Design
- Cut the combinational dependency graph by inserting registers / latches.
- The same work is done in two or more shorter cycles, rather than one long cycle.
One long cycle (e.g. CPI = 1): storage element -> acyclic combinational logic -> storage element
Two shorter cycles (e.g. CPI = 2): storage element -> acyclic combinational logic (A) -> storage element -> acyclic combinational logic (B) -> storage element
Place registers to:
- Get a balanced clock cycle length
- Save any results needed for the remaining cycles
Storage element: register or memory.

Basic MIPS Instruction Processing Steps
- Instruction Fetch: obtain the instruction from program storage (instruction memory). Instruction ← Mem[PC]
- Instruction Decode: determine the instruction type (done by the control unit) and obtain operands from registers.
- Execute: compute the result value or status.
- Result Store: store the result in register/memory if needed (usually called Write Back).
- Next Instruction: update the program counter to the address of the next instruction. PC ← PC + 4
Instruction Fetch and Next Instruction are common steps for all instructions.

Partitioning The Single Cycle Datapath
Add registers between steps to break the datapath into cycles:
- Instruction Fetch Cycle (IF)
- Instruction Decode Cycle (ID)
- Execution Cycle (EX)
- Data Memory Access Cycle (MEM)
- Write Back Cycle (WB)
Place registers to:
- Get a balanced clock cycle length
- Save any results needed for the remaining cycles
[Figure: PC, Next PC, Instruction Fetch, Operand Fetch (Reg. File), Exec, Mem Access (Data Mem), Result Store, with control signals ALUctr, RegDst, ALUSrc, ExtOp, MemWr, MemRd, RegWr, Branch and Jump; the instruction goes to the control unit.]

Example Multi-cycle Datapath
[Figure: the partitioned datapath (PC, Next PC, Reg File, Ext, ALU, Mem Access / Data Mem, Reg File) with registers IR, A, B, R and M inserted between stages. Control signals: ALUctr, RegDst, ALUSrc, ExtOp, Branch, Jump, RegWr, MemWr, MemRd, MemToReg; Equal and the instruction go to the control unit.]
Registers added (all clock-edge triggered; register write enable control lines not shown):
- IR: instruction register
- A, B: two registers to hold operands read from the register file
- R (or ALUOut): holds the output of the main ALU
- M (or memory data register, MDR): holds data read from data memory
Cycle delays: Instruction Fetch (IF) 2 ns, Instruction Decode (ID) 1 ns, Execution (EX) 2 ns, Memory (MEM) 2 ns, Write Back (WB) 1 ns.
CPU clock cycle time: worst cycle delay = C = 2 ns (ignoring MUX and CLK-Q delays). Thus clock rate: f = 1 / 2 ns = 500 MHz.
Assuming the following datapath/control hardware component delays:
- Memory units: 2 ns
- ALU and adders: 2 ns
- Register file: 1 ns
- Control unit: < 1 ns
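As a quick check of the numbers on this slide, the multi-cycle clock period is simply the slowest stage. This is a minimal sketch using the stage delays listed above; the names `STAGE_NS`, `cycle_ns` and `clock_mhz` are illustrative only.

```python
# Stage delays in ns, taken from the slide (MUX and CLK-Q delays ignored).
STAGE_NS = {"IF": 2, "ID": 1, "EX": 2, "MEM": 2, "WB": 1}

cycle_ns = max(STAGE_NS.values())  # the clock must fit the slowest stage
clock_mhz = 1000 / cycle_ns        # 1 / (cycle in ns) expressed in MHz
print(cycle_ns, clock_mhz)
```

The ID and WB stages use only half of each 2 ns cycle, which is the cost of an unbalanced partition.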

Operations (Dependent RTN) for Each Cycle
Instruction Fetch (IF) and Instruction Decode (ID) cycles are common for all instructions.
R-Type:
  IF: IR ← Mem[PC]
  ID: A ← R[rs], B ← R[rt]
  EX: R ← A funct B
  WB: R[rd] ← R, PC ← PC + 4
Logic Immediate:
  IF: IR ← Mem[PC]
  ID: A ← R[rs], B ← R[rt]
  EX: R ← A OR ZeroExt[imm16]
  WB: R[rt] ← R, PC ← PC + 4
Load:
  IF: IR ← Mem[PC]
  ID: A ← R[rs], B ← R[rt]
  EX: R ← A + SignEx(Im16)
  MEM: M ← Mem[R]
  WB: R[rt] ← M, PC ← PC + 4
Store:
  IF: IR ← Mem[PC]
  ID: A ← R[rs], B ← R[rt]
  EX: R ← A + SignEx(Im16)
  MEM: Mem[R] ← B, PC ← PC + 4
Branch:
  IF: IR ← Mem[PC]
  ID: A ← R[rs], B ← R[rt]
  EX: Zero ← A - B
      If Zero = 1: PC ← PC + 4 + (SignExt(imm16) x4)
      else (i.e. Zero = 0): PC ← PC + 4

MIPS Multi-Cycle Datapath: Five Cycles of Load (CPI = 5)
Cycle 1: IF, Cycle 2: ID, Cycle 3: EX, Cycle 4: MEM, Cycle 5: WB
1. Instruction Fetch (IF): Fetch the instruction from instruction memory.
2. Instruction Decode (ID): Operand register fetch and instruction decode.
3. Execute (EX): Calculate the effective memory address.
4. Memory (MEM): Read the data from the data memory.
5. Write Back (WB): Write the loaded data to the register file. Update PC.

Multi-cycle Datapath Instruction CPI
- R-Type/Immediate: require four cycles, CPI = 4 (IF, ID, EX, WB)
- Loads: require five cycles, CPI = 5 (IF, ID, EX, MEM, WB)
- Stores: require four cycles, CPI = 4 (IF, ID, EX, MEM)
- Branches/Jumps: require three cycles, CPI = 3 (IF, ID, EX)
Average or effective program CPI: 3 ≤ CPI ≤ 5, depending on program profile (instruction mix).

Single Cycle Vs. Multi-Cycle CPU
T = I x CPI x C
Single-Cycle CPU: CPI = 1, C = 8 ns (f = 125 MHz)
  One million instructions take I x CPI x C = 10^6 x 1 x 8x10^-9 = 8 msec
Multi-Cycle CPU: CPI = 3 to 5, C = 2 ns (f = 500 MHz)
  One million instructions take from 10^6 x 3 x 2x10^-9 = 6 msec to 10^6 x 5 x 2x10^-9 = 10 msec, depending on the instruction mix used.
Assuming the following datapath/control hardware component delays:
- Memory units: 2 ns
- ALU and adders: 2 ns
- Register file: 1 ns
- Control unit: < 1 ns
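The comparison follows directly from T = I x CPI x C. The sketch below recomputes the three execution times and the break-even CPI at which the two designs tie; the function name `exec_time_ms` is an illustrative choice, not something from the lecture.

```python
def exec_time_ms(instructions, cpi, cycle_ns):
    """T = I x CPI x C, with C given in ns and the result in milliseconds."""
    return instructions * cpi * cycle_ns / 1e6

I = 10**6  # one million instructions

single_cycle = exec_time_ms(I, cpi=1, cycle_ns=8)  # CPI = 1, C = 8 ns
multi_best = exec_time_ms(I, cpi=3, cycle_ns=2)    # all 3-cycle instructions
multi_worst = exec_time_ms(I, cpi=5, cycle_ns=2)   # all 5-cycle instructions

# The multi-cycle design wins only while CPI x 2 ns < 1 x 8 ns.
break_even_cpi = 8 / 2
print(single_cycle, multi_best, multi_worst, break_even_cpi)
```

With these delays the multi-cycle CPU is faster only for instruction mixes whose average CPI is below 4.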

Finite State Machine (FSM) Control Model
- The state specifies the control points (outputs) for register transfer.
- Control points (outputs) are assumed to depend only on the current state and not on inputs (i.e. Moore finite state machine).
- Transfer (register/memory writes) and state transition occur upon exiting the state, on the falling edge of the clock.
- The state transition (next state) depends on the inputs.
Control unit design: Moore finite state machine
[Figure: the current state is held in the control state register (e.g. flip-flops). Next-state logic takes the inputs (opcode, conditions) and the current state and produces the next state. Output logic takes the current state and produces the outputs (control points) sent to the datapath.]

Control Specification For Multi-cycle CPU: Finite State Machine (FSM) State Transition Diagram
- Instruction fetch (start state): IR ← MEM[PC]
- Decode / operand fetch: A ← R[rs], B ← R[rt]
- R-type: Execute: R ← A fun B; Write-back: R[rd] ← R, PC ← PC + 4
- ORi: Execute: R ← A or ZX; Write-back: R[rt] ← R, PC ← PC + 4
- LW: Execute: R ← A + SX; Memory: M ← MEM[R]; Write-back: R[rt] ← M, PC ← PC + 4
- SW: Execute: R ← A + SX; Memory: MEM[R] ← B, PC ← PC + 4
- BEQ & ~Zero: PC ← PC + 4
- BEQ & Zero: PC ← PC + 4 + SX || 00
Every path returns to instruction fetch.
13 states: 4 state flip-flops needed.
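The state diagram can be sketched as a next-state function. The state names below (`RTYPE_EX`, `LW_MEM`, and so on) are hypothetical labels chosen for this illustration; only the structure of the transitions comes from the diagram.

```python
# Hypothetical state names; one entry per box in the state diagram.
STATES = [
    "IF", "ID",
    "RTYPE_EX", "RTYPE_WB",
    "ORI_EX", "ORI_WB",
    "LW_EX", "LW_MEM", "LW_WB",
    "SW_EX", "SW_MEM",
    "BEQ_TAKEN", "BEQ_NOT_TAKEN",
]

# First state after decode, and the fixed successor of each non-final state.
FIRST_EX = {"RTYPE": "RTYPE_EX", "ORI": "ORI_EX", "LW": "LW_EX", "SW": "SW_EX"}
FOLLOW = {
    "RTYPE_EX": "RTYPE_WB",
    "ORI_EX": "ORI_WB",
    "LW_EX": "LW_MEM",
    "LW_MEM": "LW_WB",
    "SW_EX": "SW_MEM",
}

def next_state(state, op, zero=False):
    """Next-state logic: depends on the current state and the inputs (op, zero)."""
    if state == "IF":
        return "ID"
    if state == "ID":
        if op == "BEQ":
            return "BEQ_TAKEN" if zero else "BEQ_NOT_TAKEN"
        return FIRST_EX[op]
    return FOLLOW.get(state, "IF")  # final states return to instruction fetch

def cycles(op, zero=False):
    """Number of states visited before control returns to IF."""
    state, count = "IF", 1
    while True:
        state = next_state(state, op, zero)
        if state == "IF":
            return count
        count += 1

# Flip-flops needed to encode all states in binary.
state_bits = (len(STATES) - 1).bit_length()
print({op: cycles(op) for op in ("RTYPE", "ORI", "LW", "SW", "BEQ")}, state_bits)
```

Walking the machine this way reproduces the per-instruction cycle counts and the four flip-flops quoted for 13 states.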

Traditional FSM Controller
[Figure: a state register (4 flip-flops) holds the current state. Next-state logic takes the inputs (opcode, Equal condition) and the current state and produces the next state. Output logic produces the outputs (control points) sent to the datapath.]
State transition table columns: state, op, cond, next state, control points.

Traditional FSM Controller
datapath + state diagram => control
- Translate RTN statements into control points.
- Assign states.
- Implement the controller.
More on FSM controller implementation in Appendix C.

Mapping RTNs To Control Points: Examples & State Assignments
- 0000, instruction fetch: IR ← MEM[PC] (control points: imem_rd, IRen)
- 0001, decode / operand fetch: A ← R[rs], B ← R[rt] (control points: Aen, Ben)
- 0100, R-type execute: R ← A fun B (control points: ALUfun, Sen)
- R-type write-back: R[rd] ← R, PC ← PC + 4 (control points: RegDst, RegWr, PCen)
- 0110, ORi execute: R ← A or ZX
- ORi write-back: R[rt] ← R, PC ← PC + 4
- 1000, LW execute: R ← A + SX
- 1001, LW memory: M ← MEM[R]
- LW write-back: R[rt] ← M, PC ← PC + 4
- 1011, SW execute: R ← A + SX
- SW memory: MEM[R] ← B, PC ← PC + 4
- BEQ & ~Zero: PC ← PC + 4
- BEQ & Zero: PC ← PC + 4 + SX || 00
Every path returns to instruction fetch (state 0000).
13 states: 4 state flip-flops needed.

Detailed Control Specification - State Transition Table
[Table: one row per state and input combination. Rows: IF (state 0000); ID, with one row per opcode (BEQ with each value of Z, R-type, orI, LW, SW); then the R, ORI, LW, SW and BEQ states. Column groups: Current State, Op field, Z, Next State, IR, PC, Ops, Exec, Mem, Write-Back. Column sub-headers: en, sel, A, B, Ex, Sr, ALU, S, R, W, M, M-R, Wr, Dst. ALU operations listed: fun, or, add. Slide note: can be combined in one state.]
More on FSM controller implementation in Appendix C.

Alternative Multiple Cycle Datapath (In Textbook)
Minimizes hardware: 1 memory, 1 ALU
[Figure: PC (PCWr, PCWrCond, PCSrc mux); ideal memory (Address, Din, Dout; MemRd, MemWr) addressed through the IorD mux; Instruction Reg (IRWr); Mem Data Reg; Reg File (Ra, Rb, Rw, busA, busB, busW; RegWr) with the RegDst mux selecting Rt or Rd and the MemtoReg mux on the write data; registers A and B; Extend and << 2 on Imm; ALUSrcA and ALUSrcB muxes; ALU with ALU Control (ALUOp) and Zero output; ALU Out. Data paths are 32 bits wide.]

Alternative Multiple Cycle Datapath (In Textbook) (Figure 5.27 page 322)
- Shared instruction/data memory unit
- A single ALU shared among instructions
- Shared units require additional or widened multiplexors
- Temporary registers hold data between clock cycles of the instruction. Additional registers: Instruction Register (IR), Memory Data Register (MDR), A, B, ALUOut
[Figure labels: rs, rt, rd, imm16, MDR]

Alternative Multiple Cycle Datapath With Control Lines (Figure 5.28 page 323 in textbook)
(ORI not supported, Jump supported)
[Figure labels: PC, PC + 4, Branch Target, rs, rt, rd, imm16]

The Effect of The 1-bit Control Signals (Figure 5.29 page 324)
RegDst
  Deasserted (=0): The register destination number for the write register comes from the rt field (instruction bits 20:16).
  Asserted (=1): The register destination number for the write register comes from the rd field (instruction bits 15:11).
RegWrite
  Deasserted (=0): None.
  Asserted (=1): The register on the write register input is written with the value on the Write data input.
ALUSrcA
  Deasserted (=0): The first ALU operand is the PC.
  Asserted (=1): The first ALU operand is register A (i.e. R[rs]).
MemRead
  Deasserted (=0): None.
  Asserted (=1): The contents of memory at the address input are put on the memory data output.
MemWrite
  Deasserted (=0): None.
  Asserted (=1): The memory contents at the address input are replaced by the value on the Write data input.
MemtoReg
  Deasserted (=0): The value fed to the register write data input comes from the ALUOut register.
  Asserted (=1): The value fed to the register write data input comes from the memory data register (MDR).
IorD
  Deasserted (=0): The PC is used to supply the address to the memory unit.
  Asserted (=1): The ALUOut register is used to supply the address to the memory unit.
IRWrite
  Deasserted (=0): None.
  Asserted (=1): The output of the memory is written into the Instruction Register (IR).
PCWrite
  Deasserted (=0): None.
  Asserted (=1): The PC is written; the source is controlled by PCSource.
PCWriteCond
  Deasserted (=0): None.
  Asserted (=1): The PC is written if the Zero output of the ALU is also active.

The Effect of The 2-bit Control Signals (Figure 5.29 page 324)
ALUOp
  00: The ALU performs an add operation.
  01: The ALU performs a subtract operation.
  10: The funct field of the instruction determines the ALU operation (R-Type).
ALUSrcB
  00: The second input of the ALU comes from register B.
  01: The second input of the ALU is the constant 4.
  10: The second input of the ALU is the sign-extended 16-bit immediate (imm16) field of the instruction in IR.
  11: The second input of the ALU is the sign-extended 16-bit immediate field of IR shifted left 2 bits (for branches).
PCSource
  00: The output of the ALU (PC + 4) is sent to the PC for writing.
  01: The content of ALUOut (the branch target address) is sent to the PC for writing.
  10: The jump target address (i.e. IR[25:0] shifted left 2 bits and concatenated with PC+4[31:28]) is sent to the PC for writing.
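The two PC sources that need arithmetic can be illustrated with plain bit operations. The helper names `sign_extend16`, `branch_target` and `jump_target` are hypothetical and exist only for this sketch; the formulas follow the descriptions of ALUSrcB and PCSource above.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(value):
    """Interpret a 16-bit field as a signed two's-complement integer."""
    return value - 0x10000 if value & 0x8000 else value

def branch_target(pc_plus_4, imm16):
    """PC + 4 plus the sign-extended immediate shifted left 2 bits."""
    return (pc_plus_4 + (sign_extend16(imm16) << 2)) & MASK32

def jump_target(pc_plus_4, instruction):
    """IR[25:0] shifted left 2 bits, concatenated with PC+4[31:28]."""
    return (pc_plus_4 & 0xF0000000) | ((instruction & 0x03FFFFFF) << 2)

# A branch with offset -1 word jumps back to the branch instruction itself.
print(hex(branch_target(0x00400004, 0xFFFF)))
# A jump whose 26-bit field is 0x100000 lands at byte address 0x00400000.
print(hex(jump_target(0x00400004, 0x08100000)))
```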

Operations (Dependent RTN) for Each Cycle
Instruction Fetch (IF) and Instruction Decode (ID) cycles are common for all instructions:
  IF: IR ← Mem[PC], PC ← PC + 4
  ID: A ← R[rs], B ← R[rt], ALUout ← PC + (SignExt(imm16) x4)
R-Type:
  EX: ALUout ← A funct B
  WB: R[rd] ← ALUout
Load:
  EX: ALUout ← A + SignEx(Imm16)
  MEM: MDR ← Mem[ALUout]
  WB: R[rt] ← MDR
Store:
  EX: ALUout ← A + SignEx(Imm16)
  MEM: Mem[ALUout] ← B
Branch:
  EX: Zero ← A - B; Zero: PC ← ALUout
Jump:
  EX: PC ← Jump Address
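To make the register transfers concrete, the following sketch steps a single lw through the five cycles. It is a simplified software model, not the hardware: memory is a Python dict keyed by byte address, the register file is a list, and the function name `run_load` is an assumption for illustration. The lw opcode value 0x23 is the standard MIPS encoding.

```python
def sign_extend16(value):
    """Interpret a 16-bit field as a signed two's-complement integer."""
    return value - 0x10000 if value & 0x8000 else value

def run_load(mem, regs, pc):
    """Apply the lw register transfers cycle by cycle; return (new_pc, cycles)."""
    # IF: IR <- Mem[PC], PC <- PC + 4
    ir = mem[pc]
    pc = pc + 4
    rs = (ir >> 21) & 0x1F
    rt = (ir >> 16) & 0x1F
    imm16 = ir & 0xFFFF
    # ID: A <- R[rs], B <- R[rt], ALUout <- PC + (SignExt(imm16) x 4)
    a, b = regs[rs], regs[rt]                   # B is read but unused by lw
    alu_out = pc + (sign_extend16(imm16) << 2)  # branch target, unused by lw
    # EX: ALUout <- A + SignExt(imm16)
    alu_out = a + sign_extend16(imm16)
    # MEM: MDR <- Mem[ALUout]
    mdr = mem[alu_out]
    # WB: R[rt] <- MDR
    regs[rt] = mdr
    return pc, 5

LW_OPCODE = 0x23
instr = (LW_OPCODE << 26) | (9 << 21) | (8 << 16) | 8  # lw $8, 8($9)
mem = {0x0: instr, 0x108: 1234}
regs = [0] * 32
regs[9] = 0x100

new_pc, cycles = run_load(mem, regs, pc=0)
print(regs[8], new_pc, cycles)
```

Note that the ID cycle computes a branch target for every instruction, even though only branches use it; this is what lets branches finish in three cycles.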

High-Level View of Finite State Machine Control (Figure 5.31 page 332)
- The first steps are independent of the instruction class (Figure 5.32).
- Then comes a series of sequences that depend on the instruction opcode (Figures 5.33, 5.34, 5.35 and 5.36).
- Then control returns to fetch a new instruction.
Each box in the figure represents one or several states.

FSM State Transition Diagram (From Book) (Figure 5.38 page 339)
IF: IR ← Mem[PC], PC ← PC + 4
ID: A ← R[rs], B ← R[rt], ALUout ← PC + (SignExt(imm16) x4)
Load/Store EX: ALUout ← A + SignEx(Imm16)
Load MEM: MDR ← Mem[ALUout]
Load WB: R[rt] ← MDR
Store MEM: Mem[ALUout] ← B
R-Type EX: ALUout ← A func B
R-Type WB: R[rd] ← ALUout
Branch EX: Zero ← A - B; Zero: PC ← ALUout
Jump EX: PC ← Jump Address
Total: 10 states.
More on FSM controller implementation in Appendix C.
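The next-state logic of this ten-state machine is small enough to write out. The sketch below uses the state numbers 0 to 9 given on the per-state slides; the opcode labels (`"lw"`, `"rtype"`, and so on) and function names are illustrative assumptions.

```python
# State reached from decode (state 1) for each instruction class.
DISPATCH = {"lw": 2, "sw": 2, "rtype": 6, "beq": 8, "j": 9}

def next_state(state, op):
    """Next-state logic for the ten-state multicycle control FSM."""
    if state == 0:   # instruction fetch
        return 1
    if state == 1:   # decode: dispatch on opcode
        return DISPATCH[op]
    if state == 2:   # address calculation: load or store memory access
        return 3 if op == "lw" else 5
    if state == 3:   # load memory read, then load write back
        return 4
    if state == 6:   # R-type execute, then R-type write back
        return 7
    return 0         # states 4, 5, 7, 8, 9 return to instruction fetch

def trace(op):
    """States visited by one instruction, starting from fetch."""
    states = [0]
    while True:
        nxt = next_state(states[-1], op)
        if nxt == 0:
            return states
        states.append(nxt)

for op in DISPATCH:
    print(op, trace(op))
```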

Instruction Fetch (IF) and Decode (ID) FSM States (Figure 5.32 page 333)
IF: IR ← Mem[PC], PC ← PC + 4
ID: A ← R[rs], B ← R[rt], ALUout ← PC + (SignExt(imm16) x4)
From ID, control continues to the instruction-specific states (Figures 5.33, 5.34, 5.35 and 5.36).

Instruction Fetch (IF) Cycle (State 0) (Figure 5.28 page 323; ORI not supported, Jump supported)
IR ← Mem[PC]
PC ← PC + 4
Control signals: MemRead = 1, ALUSrcA = 0, IorD = 0, IRWrite = 1, ALUSrcB = 01, ALUOp = 00 (add), PCWrite = 1, PCSource = 00

Instruction Decode (ID) Cycle (State 1) (Figure 5.28 page 323; ORI not supported, Jump supported)
A ← R[rs]
B ← R[rt]
ALUout ← PC + (SignExt(imm16) x4) (calculate branch target)
Control signals: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00 (add)

Load/Store Instructions FSM States (Figure 5.33 page 334)
From Instruction Decode:
EX: ALUout ← A + SignEx(Imm16) (i.e. effective address calculation)
MEM: load: MDR ← Mem[ALUout]; store: Mem[ALUout] ← B
WB (load only): R[rt] ← MDR
Then to Instruction Fetch (Figure 5.32).

Load/Store Execution (EX) Cycle (State 2) (Figure 5.28 page 323; ORI not supported, Jump supported)
ALUout ← A + SignEx(Imm16) (effective address calculation)
Control signals: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00 (add)

Load Memory (MEM) Cycle (State 3) (Figure 5.28 page 323; ORI not supported, Jump supported)
MDR ← Mem[ALUout]
Control signals: MemRead = 1, IorD = 1

Load Write Back (WB) Cycle (State 4) (Figure 5.28 page 323; ORI not supported, Jump supported)
R[rt] ← MDR
Control signals: RegWrite = 1, MemtoReg = 1, RegDst = 0

Store Memory (MEM) Cycle (State 5) (Figure 5.28 page 323; ORI not supported, Jump supported)
Mem[ALUout] ← B
Control signals: MemWrite = 1, IorD = 1

R-Type Instructions FSM States (Figure 5.34 page 335)
From Instruction Decode:
EX: ALUout ← A funct B
WB: R[rd] ← ALUout
Then to State 0 (Instruction Fetch) (Figure 5.32).

R-Type Execution (EX) Cycle (State 6) (Figure 5.28 page 323; ORI not supported, Jump supported)
ALUout ← A funct B
Control signals: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10 (R-Type)

R-Type Write Back (WB) Cycle (State 7) (Figure 5.28 page 323; ORI not supported, Jump supported)
R[rd] ← ALUout
Control signals: RegWrite = 1, MemtoReg = 0, RegDst = 1

Jump Instruction Single EX State, Branch Instruction Single EX State (Figures 5.35, 5.36 page 337)
Branch EX (from Instruction Decode): Zero ← A - B; Zero: PC ← ALUout
Jump EX (from Instruction Decode): PC ← Jump Address
Both return to State 0 (Instruction Fetch) (Figure 5.32).

Branch Execution (EX) Cycle (State 8) (Figure 5.28 page 323; ORI not supported, Jump supported)
Zero ← A - B
Zero: PC ← ALUout
Control signals: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01 (subtract), PCWriteCond = 1, PCSource = 01

Jump Execution (EX) Cycle (State 9) (Figure 5.28 page 323; ORI not supported, Jump supported)
PC ← Jump Address
Control signals: PCWrite = 1, PCSource = 10
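Because this is a Moore machine, the output logic is just a lookup from state number to control word. The table below collects the per-state settings from the preceding slides as a sketch. Signals not listed for a state are treated as 0 here, which is a simplification (some are really don't-cares), and the PCSource and RegDst values are filled in from the signal definitions in Figure 5.29.

```python
ALL_SIGNALS = (
    "MemRead", "MemWrite", "IRWrite", "RegWrite", "PCWrite", "PCWriteCond",
    "IorD", "MemtoReg", "RegDst", "ALUSrcA", "ALUSrcB", "ALUOp", "PCSource",
)

# Control settings per FSM state (states 0 to 9).
CONTROL = {
    0: dict(MemRead=1, IorD=0, IRWrite=1, ALUSrcA=0, ALUSrcB=0b01,
            ALUOp=0b00, PCWrite=1, PCSource=0b00),           # fetch
    1: dict(ALUSrcA=0, ALUSrcB=0b11, ALUOp=0b00),            # decode
    2: dict(ALUSrcA=1, ALUSrcB=0b10, ALUOp=0b00),            # address calc
    3: dict(MemRead=1, IorD=1),                              # load MEM
    4: dict(RegWrite=1, MemtoReg=1, RegDst=0),               # load WB
    5: dict(MemWrite=1, IorD=1),                             # store MEM
    6: dict(ALUSrcA=1, ALUSrcB=0b00, ALUOp=0b10),            # R-type EX
    7: dict(RegWrite=1, MemtoReg=0, RegDst=1),               # R-type WB
    8: dict(ALUSrcA=1, ALUSrcB=0b00, ALUOp=0b01,
            PCWriteCond=1, PCSource=0b01),                   # branch EX
    9: dict(PCWrite=1, PCSource=0b10),                       # jump EX
}

def control_word(state):
    """Full control word for a state; unlisted signals default to 0."""
    word = dict.fromkeys(ALL_SIGNALS, 0)
    word.update(CONTROL[state])
    return word

def states_asserting(signal):
    """States in which a 1-bit signal is asserted."""
    return [s for s in sorted(CONTROL) if control_word(s)[signal] == 1]

print("register file written in states", states_asserting("RegWrite"))
print("memory read in states", states_asserting("MemRead"))
```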

MIPS Multi-cycle Datapath Performance Evaluation
What is the average CPI?
- The state diagram gives the CPI for each instruction type.
- The workload (program) below gives the frequency of each type.

Type          CPIi for type   Frequency   CPIi x freqIi
Arith/Logic   4               40%         1.6
Load          5               30%         1.5
Store         4               10%         0.4
Branch        3               20%         0.6
Average CPI:                              4.1

Better than CPI = 5 if all instructions took the same number of clock cycles (5).
T = I x CPI x C
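The average CPI is a frequency-weighted sum, which the short sketch below recomputes from the table. The `WORKLOAD` structure and the function name are illustrative; frequencies are kept as integer percentages to keep the arithmetic simple.

```python
# (cycles per instruction, frequency in percent) for each instruction type.
WORKLOAD = {
    "arith/logic": (4, 40),
    "load": (5, 30),
    "store": (4, 10),
    "branch": (3, 20),
}

def average_cpi(workload):
    """Frequency-weighted average of the per-type CPI values."""
    total_percent = sum(percent for _, percent in workload.values())
    weighted = sum(cpi * percent for cpi, percent in workload.values())
    return weighted / total_percent

cpi = average_cpi(WORKLOAD)
time_ms = 10**6 * cpi * 2 / 1e6  # one million instructions at C = 2 ns
print(cpi, time_ms)
```

At C = 2 ns this mix needs about 8.2 msec per million instructions, slightly more than the 8 msec of the single-cycle design with the delays assumed earlier.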

Adding Support for swap to Multi Cycle Datapath
You are to add support for a new instruction, swap, that exchanges the values of two registers, to the MIPS multicycle datapath of Figure 5.28 on page 323:
  swap $rs, $rt
  R[rt] ← R[rs]
  R[rs] ← R[rt]
Swap uses the R-Type format with: the value of field rs = the value of field rd.
- Add any necessary datapaths and control signals to the multicycle datapath. Find a solution that minimizes the number of clock cycles required for the new instruction without modifying the register file (i.e. no additional register write ports). Justify the need for the modifications, if any.
- Show the necessary modifications to the multicycle control finite state machine of Figure 5.38 on page 339 when adding the swap instruction. For each new state added, provide the dependent RTN and active control signal values.

Adding swap Instruction Support to Multi Cycle Datapath
swap $rs, $rt
  R[rt] ← R[rs]
  R[rs] ← R[rt]
Instruction fields: op [31-26], rs [25-21], rt [20-16], rd [15-11]
We assume here that rs = rd in the instruction encoding: one of the two fields (rs and rd) names one of the registers being swapped, and the other register is specified by rt.
The outputs of A and B should be connected to the multiplexor controlled by MemtoReg, so that either can be written to the register file. The MemtoReg control signal becomes two bits.
[Figure labels: rs, rt, rd, imm16, PC + 4, Branch Target, R[rs], R[rt]]

Adding swap Instruction Support to Multi Cycle Datapath
Modified FSM (new states: WB1 and WB2):
IF: IR ← Mem[PC], PC ← PC + 4
ID: A ← R[rs], B ← R[rt], ALUout ← PC + (SignExt(imm16) x4)
swap WB1: R[rd] ← B
swap WB2: R[rt] ← A
Swap takes 4 cycles.
Existing states shown for reference: R-Type EX: ALUout ← A func B; R-Type WB: R[rd] ← ALUout; Load/Store EX: ALUout ← A + SignEx(Imm16); Branch EX: Zero ← A - B, Zero: PC ← ALUout.
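The two write-back states work because A and B already hold both operands after decode, so neither write can clobber a value that is still needed. A minimal software sketch of the four cycles, with the hypothetical function name `run_swap` and a plain list as the register file:

```python
def run_swap(regs, rs, rt):
    """Apply the swap register transfers; return the number of cycles used."""
    rd = rs  # encoding assumption from the slide: field rd equals field rs
    cycles = 0

    # IF: IR <- Mem[PC], PC <- PC + 4 (instruction fetch is not modelled here)
    cycles += 1
    # ID: A <- R[rs], B <- R[rt]
    a, b = regs[rs], regs[rt]
    cycles += 1
    # WB1: R[rd] <- B  (one register write per cycle: single write port)
    regs[rd] = b
    cycles += 1
    # WB2: R[rt] <- A
    regs[rt] = a
    cycles += 1
    return cycles

regs = [0] * 32
regs[4], regs[5] = 11, 22
used = run_swap(regs, rs=4, rt=5)
print(regs[4], regs[5], used)
```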

Adding Support for add3 to Multi Cycle Datapath
You are to add support for a new instruction, add3, that adds the values of three registers, to the MIPS multicycle datapath of Figure 5.28 on page 323.
For example: add3 $s0, $s1, $s2, $s3
Register $s0 gets the sum of $s1, $s2 and $s3.
The instruction encoding uses a modified R-format, with an additional register specifier rx replacing the five low bits of the "funct" field:
  OP (add3): 6 bits [31-26]
  rs ($s1): 5 bits [25-21]
  rt ($s2): 5 bits [20-16]
  rd ($s0): 5 bits [15-11]
  Not used: 6 bits [10-5]
  rx ($s3): 5 bits [4-0]
- Add necessary datapath components, connections, and control signals to the multicycle datapath without modifying the register bank or adding additional ALUs. Find a solution that minimizes the number of clock cycles required for the new instruction. Justify the need for the modifications, if any.
- Show the necessary modifications to the multicycle control finite state machine of Figure 5.38 on page 339 when adding the add3 instruction. For each new state added, provide the dependent RTN and active control signal values.

add3 Instruction Support to Multi Cycle Datapath
add3 $rd, $rs, $rt, $rx
  R[rd] ← R[rs] + R[rt] + R[rx]
Modified R-format: op [31-26], rs [25-21], rt [20-16], rd [15-11], rx [4-0]
rx is a new register specifier in field [4-0] of the instruction. No additional register read ports or ALUs are allowed.
1. ALUout is added as an extra input to the first ALU operand MUX, so that the previous ALU result can be used as an input for the second addition.
2. A multiplexor should be added to select between rt and the new field rx (bits 4-0 of the instruction, containing the register number of the third operand) as the input for Read Register 2. This multiplexor is controlled by a new one-bit control signal called ReadSrc.
3. A WriteB control line is added to enable writing R[rx] to B.
[Figure labels: PC + 4, Branch Target, imm16, rx, rd, rs, rt, WriteB]

add3 Instruction Support to Multi Cycle Datapath
Modified FSM (new states: EX1 and EX2, followed by the write back of ALUout):
IF: IR ← Mem[PC], PC ← PC + 4
ID: A ← R[rs], B ← R[rt], ALUout ← PC + (SignExt(imm16) x4)
add3 EX1: ALUout ← A + B, B ← R[rx] (WriteB asserted)
add3 EX2: ALUout ← ALUout + B
add3 WB: R[rd] ← ALUout
Add3 takes 5 cycles.
Existing states shown for reference: R-Type EX: ALUout ← A func B; Load/Store EX: ALUout ← A + SignEx(Im16); Branch EX: Zero ← A - B, Zero: PC ← ALUout.
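The key point in EX1 is that both transfers happen on the same clock edge: the ALU adds the old value of B while B is reloaded with R[rx]. The sketch below models this with a simultaneous tuple assignment. The function name `run_add3` and the 32-bit wraparound mask are assumptions made for this illustration.

```python
MASK32 = 0xFFFFFFFF

def run_add3(regs, rd, rs, rt, rx):
    """Apply the add3 register transfers; return the number of cycles used."""
    cycles = 1  # IF: IR <- Mem[PC], PC <- PC + 4 (fetch is not modelled here)

    # ID: A <- R[rs], B <- R[rt]
    a, b = regs[rs], regs[rt]
    cycles += 1
    # EX1: ALUout <- A + B and B <- R[rx] in the same cycle.
    # The right-hand side is evaluated first, so the sum uses the old B.
    alu_out, b = (a + b) & MASK32, regs[rx]
    cycles += 1
    # EX2: ALUout <- ALUout + B (ALUout fed back to the first ALU operand mux)
    alu_out = (alu_out + b) & MASK32
    cycles += 1
    # WB: R[rd] <- ALUout
    regs[rd] = alu_out
    cycles += 1
    return cycles

regs = [0] * 32
regs[17], regs[18], regs[19] = 1, 2, 3  # $s1, $s2, $s3
used = run_add3(regs, rd=16, rs=17, rt=18, rx=19)
print(regs[16], used)
```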