Presentation is loading. Please wait.

Presentation is loading. Please wait.

CSE 308 – Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Multicycle Implementation.

Similar presentations


Presentation on theme: "CSE 308 – Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Multicycle Implementation."— Presentation transcript:

1 CSE 308 – Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Multicycle Implementation

2 Multicycle Implementation© Muhamed Mudawar, COE 308 – KFUPMSlide 2 Drawbacks of Single Cycle Processor  Long cycle time  All instructions take as much time as the slowest  Functional units are duplicated raising cost  Each functional unit can be used once per clock cycle Instruction Fetch Store ALUMemory Write Instruction Fetch Arithmetic & Logical Register ReadALUReg Write Instruction Fetch Branch Load Memory Read Instruction Fetch longest delay ALURegister Read ALU Reg Write

3 Multicycle Implementation© Muhamed Mudawar, COE 308 – KFUPMSlide 3 Solution = Multicycle Implementation  Break instruction execution into five steps  Instruction fetch  Instruction decode and register read  Execution, memory address calculation, or branch completion  Memory access or ALU instruction completion  Load instruction completion  One step = One clock cycle (clock cycle is reduced)  First 2 steps are the same for all instructions Instruction # cyclesInstruction# cycles ALU4Branch3 Load5Store4

4 Multicycle Implementation© Muhamed Mudawar, COE 308 – KFUPMSlide 4 MIPS Multicycle Datapath ALU is used to increment upper 30 bits of PC, to compute branch target and load/store address, and to execute ALU instructions Registers are used to store values at the end of each clock cycle for use during next cycle Same memory is used for instructions and data 4 Imm16 32 ALUALU RA RB BusA BusB RW BusW Address Instruction or data Memory 30 PC 00 30 5 Rs Extender 5 Rt muxmux 0 1 muxmux 0 1 mux 0 1 Rd 5 muxmux 0 1 zero Data_in MDR IR Registers A B muxmux 0 1 32 ALUout 32 PC[31:28], Imm26 30 muxmux 2 0 1 32 30

5 Multicycle Implementation© Muhamed Mudawar, COE 308 – KFUPMSlide 5 Multicycle Datapath Changes  Eliminating some of the components  Single memory unit for both instructions and data  Single ALU eliminating branch address adder and PC adder Note: modern CPUs maintain separate instruction and data memories as well as separate address adders, but we reduce them here because the same component can be used for different purposes in different cycles  Adding temporary registers  Instruction Register: IR  Memory Data Register: MDR  Register file output data registers: A and B  ALU output register: ALUout  Required to store major unit output values for use in next cycle

6 Multicycle Implementation© Muhamed Mudawar, COE 308 – KFUPMSlide 6 Multicycle Datapath Changes – cont’d  This multicycle design can accommodate  One memory access per cycle  IR register saves fetched instruction  MDR register saves the read memory data  One register file access per cycle  Two registers can be read concurrently into A and B registers  One ALU operation per cycle  ALUout register saves the ALU output  Additional multiplexers are also needed  Mux before the memory address to select PC or ALUout address  Mux before 1 st ALU input to select PC to increment or A register  Extended mux before PC to increment PC, branch, or jump

7 Multicycle Implementation© Muhamed Mudawar, COE 308 – KFUPMSlide 7 Multicycle Datapath + Control Signals 4 Imm16 32 ALUALU RA RB BusA BusB RW BusW Address MemData Memory 30 PC 00 30 Rs Extender Rt muxmux 0 1 muxmux 0 1 mux 0 1 Rd muxmux 0 1 zero Data_in MDR IR Registers A B muxmux 0 1 32 ALUout 32 PC[31:28], Imm26 30 32 30 MemReadMemWrite ExtOp RegDst MemtoReg IorDIRWriteRegWriteALUSrcA ALUSrcB ALUCtrl PCWrite PCSource muxmux 2 0 1 PCWrite and IRWrite to enable the writing of PC and IR registers IorD to select memory address as either PC for instruction or ALUout for data address PCSource to select PC input ALUSrcA and ALUSrcB to select ALU inputs More control signals than single-cycle CPU NextPC

8 Multicycle Implementation© Muhamed Mudawar, COE 308 – KFUPMSlide 8 SignalEffect when ‘0’Effect when ‘1’ RegDstDestination register = RtDestination register = Rd RegWriteNoneRegister(RW) ← BusW ExtOp16-bit immediate is zero-extended16-bit immediate is sign-extended ALUSrcA1 st ALU operand is PC (upper 30-bit)1 st ALU operand is the A register ALUSrcB2 nd ALU operand is the B register2 nd ALU input is extended-imm16 MemReadNoneMemData ← Memory[address] MemWriteNoneMemory[address] ← Data_in MemtoRegBusW = ALUoutBusW = MDR IorDMemory Address = PCMemory Address = ALUout IRWriteNoneIR ← MemData PCWriteNonePC ← NextPC Control Signals SignalValueEffect PCSource 00NextPC = PC[31:2] + 1 (increment upper 30 bits of PC) 01NextPC = ALUout = PC[31:2] + 1 + sign-extend(imm16) (for branch) 10NextPC = PC[31:28], imm26 (for jump)

9 Multicycle Implementation© Muhamed Mudawar, COE 308 – KFUPMSlide 9 ALUALU PC 00 muxmux 0 1 muxmux 2 0 1 32 30 1. Instruction Fetch Cycle 4 Imm16 32 RA RB BusA BusB RW BusW Rs Extender Rt muxmux 0 1 muxmux 0 1 Rd zero MDR Registers A B 32 ALUout 32 PC[31:28], Imm26 30 32 30 IorD = 0 MemRead = 1 MemWrite = 0 IRWrite = 1 RegWrite = 0 ALUSrcA = 0 ALUSrcB = x ALUCtrl = INC RegDst = x PCWrite = 1 PCSource = 0 mux 0 1 MemtoReg = x IR←Memory[PC] PC← PC + 4 muxmux 0 1 Address MemData Memory Data_in IR ExtOp = x IorD = 0 MemRead = 1 IRWrite = 1 ALUSrcA = 0,ALU = INC PCSource = 0,PCWrite = 1 MemWrite = 0, RegWrite = 0 Don’t care about rest 30 NextPC 1 st ALU input = 00, PC[31:2]

10 Multicycle Implementation© Muhamed Mudawar, COE 308 – KFUPMSlide 10 2. Decode and Register Fetch Address MemData Memory muxmux 0 1 Rd muxmux 0 1 zero Data_in MDR IR Imm16 32 ALUALU 30 Extender muxmux 0 1 32 30 ALUout 32 muxmux 0 1 RA RB BusA BusB RW BusW Registers A B Rs Rt 30 32 ALUSrcB = 1 ALUCtrl = ADD mux 0 1 MemtoReg = x A ← Reg[Rs], B ← Reg[Rt] A, B are written on every cycle ExtOp = 1 ALUout ← PC[31:2] + sign-ext(Im16) (branch address) ALUSrcA = 0, ALUSrcB = 1, ExtOp = 1, ALU = ADD MemRead = MemWrite = IRWrite = RegWrite = 0 PCSource = 10, PCWrite = J (Jump Completion)Compute branch address in advance During this cycle, the instruction is decoded to determine the control signals for the next cycles 1 st ALU input = 00, PC[31:2] PC[31:28], Imm26 30 4 muxmux 2 0 1 NextPC 30 PC 00 IorD = x MemRead = 0 MemWrite = 0 IRWrite = 0 RegWrite = 0 ALUSrcA = 0 RegDst = x PCWrite = J PCSource = 10

11 Multicycle Implementation© Muhamed Mudawar, COE 308 – KFUPMSlide 11 Rs Rt A B 3a. Execute Cycle for R-type 4 Imm16 RA RB BusA BusB RW BusW Address MemData Memory 30 Extender muxmux 0 1 Rd muxmux 0 1 zero Data_in MDR IR Registers 32 PC[31:28], Imm26 30 32 30 IorD = x MemRead = 0 MemWrite = 0 IRWrite = 0 RegWrite = 0 ALUSrcA = 1 ALUSrcB = 0 ALUCtrl = funct RegDst = x PCWrite = 0 PCSource = x muxmux 2 0 1 mux 0 1 MemtoReg = x NextPC For R-type ALU instructions: ALUout ← A funct B ALUCtrl depends on the function field ExtOp = x ALUSrcA = 1, ALUSrcB = 0, ALU = funct PCWrite = MemRead = MemWrite = IRWrite = RegWrite = 0 Don’t care about rest PC 00 During this cycle, the ALU can perform different operations depending on the instruction class 30 32 ALUALU muxmux 0 1 muxmux 0 1 ALUout 32

12 Multicycle Implementation© Muhamed Mudawar, COE 308 – KFUPMSlide 12 Rs Rt 3b. Compute Address for Load/Store 4 Address MemData Memory 30 muxmux 0 1 Rd muxmux 0 1 zero Data_in MDR 32 PC[31:28], Imm26 30 32 30 IorD = x MemRead = 0 MemWrite = 0 IRWrite = 0 RegWrite = 0 ALUSrcA = 1 ALUSrcB = 1 ALUCtrl = ADD RegDst = x PCWrite = 0 PCSource = x muxmux 2 0 1 mux 0 1 MemtoReg = x NextPC ALUout ← A + sign-extend(Immediate16) ExtOp = sign ALUSrcA = 1,ALUSrcB = 1 ExtOp = sign,ALU = ADD PCWrite = MemRead = MemWrite = IRWrite = RegWrite = 0 PC 00 For load/store instructions, ALU computes memory address during 3 rd cycle 30 32 IR A B RA RB BusA BusB RW BusW Registers Imm16 Extender ALUALU 32 muxmux 0 1 muxmux 0 1 ALUout 32 Same control signals can be used with I-type ALU

13 Multicycle Implementation© Muhamed Mudawar, COE 308 – KFUPMSlide 13 ALUout Rs Rt A B 3c. Branch Completion 4 Imm16 RA RB BusA BusB RW BusW Address MemData Memory 30 Extender muxmux 0 1 Rd muxmux 0 1 Data_in MDR IR Registers 32 PC[31:28], Imm26 30 32 30 ALUSrcB = 0 ALUCtrl = SUB mux 0 1 MemtoReg = x if (branch) PC ← ALUout ALUout is branch target address computed during the second cycle ExtOp = x ALUSrcA = 1, ALUSrcB = 0,PCSource = 01 ALUCtrl = SUB,PCWrite = Branch MemRead = MemWrite = IRWrite = RegWrite = 0 For a branch, ALU compares A with B and if the branch is taken then PC becomes ALUout zero ALUALU 32 muxmux 0 1 muxmux 0 1 Branch depends on the zero condition Branch = beq. zero + bne. zero 32 30 muxmux 2 0 1 NextPC PC 00 30 IorD = x MemRead = 0 MemWrite = 0 IRWrite = 0 RegWrite = 0 ALUSrcA = 1 RegDst = x PCWrite = Branch PCSource = 01

14 Multicycle Implementation© Muhamed Mudawar, COE 308 – KFUPMSlide 14 32 Rs A B 4a. ALU Instruction Completion 4 Imm16 Address MemData Memory 30 Extender muxmux 0 1 Data_in MDR IR 32 PC[31:28], Imm26 30 IorD = x MemRead = 0 MemWrite = 0 IRWrite = 0 RegWrite = 1 ALUSrcA = x ALUSrcB = x ALUCtrl = x RegDst = 1 or 0 PCWrite = 0 PCSource = x MemtoReg = 0 Reg[Rd]←ALUout (for R-type) Reg[Rt]←ALUout (for I-type ALU instruction) ExtOp = x RegDst = 1 (for R-type and 0 for I-type) MemtoReg = 0,RegWrite = 1 PCWrite = MemRead = MemWrite = IRWrite = 0, and don’t care about rest During 4 th cycle, ALU instruction completes writing its result into the destination register zero ALUALU 32 muxmux 0 1 muxmux 0 1 30 muxmux 2 0 1 NextPC PC 00 30 ALUout mux 0 1 32 RA RB BusA BusB RW BusW Registers Rt muxmux 0 1 Rd

15 Multicycle Implementation© Muhamed Mudawar, COE 308 – KFUPMSlide 15 32 Rs 4b. Memory Access for Load & Store 4 Imm16 30 Extender PC[31:28], Imm26 30 IorD = 1 MemRead = 1 MemWrite = 0 IRWrite = 0 RegWrite = 0 ALUSrcA = x ALUSrcB = x ALUCtrl = x RegDst = x PCWrite = 0 PCSource = x MemtoReg = x MDR ← Memory[ALUout](for load) Memory[ALUout] ← B(for store) ExtOp = x IorD = 1,MemRead = 1 (load) MemWrite = 1 (store) PCWrite = IRWrite = RegWrite = 0 Load & store access memory during 4 th cycle zero ALUALU 32 muxmux 0 1 muxmux 0 1 30 muxmux 2 0 1 NextPC PC 00 30 ALUout mux 0 1 Rt muxmux 0 1 Rd IR Address MemData Memory Data_in MDR muxmux 0 1 32 A B RA RB BusA BusB RW BusW Registers

16 Multicycle Implementation© Muhamed Mudawar, COE 308 – KFUPMSlide 16 32 Rs A B 5. Load Instruction Completion 4 Imm16 Address MemData Memory 30 Extender muxmux 0 1 Data_in MDR IR 32 PC[31:28], Imm26 30 IorD = x MemRead = 0 MemWrite = 0 IRWrite = 0 RegWrite = 1 ALUSrcA = x ALUSrcB = x ALUCtrl = x RegDst = 0 PCWrite = 0 PCSource = x MemtoReg = 1 Reg[Rt]←MDR ExtOp = x RegDst = 0 (Rt) MemtoReg = 1 RegWrite = 1 PCWrite = IRWrite = 0, MemRead = MemWrite = 0 Don’t care about rest During the 5 th cycle, the load instruction completes writing its result into register Rt zero ALUALU 32 muxmux 0 1 muxmux 0 1 30 muxmux 2 0 1 NextPC PC 00 30 ALUout 32 Rt Rd mux 0 1 RA RB BusA BusB RW BusW Registers muxmux 0 1

17 Multicycle Implementation© Muhamed Mudawar, COE 308 – KFUPMSlide 17 Instruction Execution Summary CycleActionRegister Transfers 1Fetch instructionIR ← Memory[PC], PC ← PC + 4 2 Decode instruction Fetch registers Compute branch address in advance Jump completion (case of a jump) Generate control signals A ← Reg[Rs], B ← Reg[Rt] ALUout ← PC[31:2] + sign-extend(Imm16) PC ← PC[31:28], Im26 3 Case 1: Execute R-type ALU Case 2: Execute I-type ALU Case 3: Compute load/store address Case 4: Branch completion ALUout ← A funct B ALUout ← A op extend(Imm16) ALUout ← A + sign-extend(Imm16) if (Branch) PC ← ALUout 4 Case 1: Write ALU result for R-type Case 2: Write ALU result for I-type Case 3: Access memory for load Case 4: Access memory for store Reg[Rd] ← ALUout Reg[Rt] ← ALUout MDR ← Memory[ALUout] Memory[ALUout] ← B 5Load instruction completionReg[Rt] ← MDR

18 Multicycle Implementation© Muhamed Mudawar, COE 308 – KFUPMSlide 18 Defining the Control  Control for multicycle datapath is more complex  Because instruction is executed as a sequence of steps  Values of control signals depend upon:  What instruction is being executed  Which cycle is being performed  Multicycle control is a Finite State Machine (FSM)  While single-cycle control is a combinational logic  Two implementation techniques for multicycle control  Set of states and transitions implemented directly in logic  Microprogramming: a programming representation for control

19 Multicycle Implementation© Muhamed Mudawar, COE 308 – KFUPMSlide 19 Multicycle Datapath + Control 4 Imm16 32 ALUALU RA RB BusA BusB RW BusW Address MemData Memory 30 PC 00 30 Rs Extender Rt muxmux 0 1 muxmux 0 1 mux 0 1 Rd muxmux 0 1 Data_in MDR IR Registers A B muxmux 0 1 32 ALUout 32 PC[31:28], Im26 30 32 30 ExtOp MemtoReg ALUSrcB ALUCtrl muxmux 2 0 1 ALU Control NextPC Op 6 PCSourceALUSrcARegWriteRegDst PCWrite IorDMemReadMemWrite zero funct 6 ALUop Main Control IRWrite

20 Multicycle Implementation© Muhamed Mudawar, COE 308 – KFUPMSlide 20 High Level View of FSM Control Instruction fetch R-typeLoad/Store Branch Start Instruction decode, register fetch, and branch address computation (op = R-type) I-type ALU (op = LW) or (op = SW) (op = J) (op = BEQ) or (op = BNE) (op = ANDI) or (op = ORI) or …

21 Multicycle Implementation© Muhamed Mudawar, COE 308 – KFUPMSlide 21 Branch Completion State Diagram for Multicycle Control Start BEQ or BNE Control signal values default to zero when they are not specified IorD = 0 MemRead = 1 IRWrite = 1 ALUSrcA = 0 ALUop = INC PCSource = 00 PCWrite = 1 0 Instruction fetch ALUSrcA = 0 ALUSrcB = 1 ALUop = ADD Extop = 1 PCSource = 10 PCWrite = J 1 ALUSrcA = 1 ALUSrcB = 0 ALUop = SUB PCSource = 01 PCWrite = Branch 8 67 ALUSrcA = 1 ALUSrcB = 0 ALUop = R-type R-type ALU RegDst = 1 (Rd) MemtoReg = 0 RegWrite = 1 R-type Completion IorD = 1 MemWrite = 1 5 Store Access RegDst = 0 (Rt) MemtoReg = 1 RegWrite = 1 4 Load Completion ALUSrcA = 1 ALUSrcB = 1 Extop = sign ALUop = ADD 2 Address Computation IorD = 1 MemRead = 1 3 Load Access LW SW R-type LW SW Decode, Register fetch, Jump completion J 910 ALUSrcA = 1 ALUSrcB = 1 Extop = zero ALUop = op I-type ALU RegDst = 0 (Rt) MemtoReg = 0 RegWrite = 1 I-type Completion ORI, ANDI, …

22 Multicycle Implementation© Muhamed Mudawar, COE 308 – KFUPMSlide 22 Finite State Machine Controller  Implemented as …  Comb control logic  State register RegDst RegWrite ExtOp ALUSrcA ALUSrcB MemRead MemWrite MemtoReg IorD IRWrite PCWrite PCSource State register next state Op current state zero clk ALUop Combinational Control logic 0xx1xx00x10x01100INC 1lw, swx2x100100xx0010ADD 1Rtypex6x100100xx0010ADD 1 beq bne x8x100100xx0010ADD 1jx0x100100xx0110ADD 1ori, …x9x100100xx0010ADD 2lwx3x101100xx00xADD 2swx5x101100xx00xADD 3xx4xx0xx10x100xx 4xx00x1xx001x00xx 5xx0xx0xx01x100xx 6xx7xx01000xx00xRtype 7xx01x1xx000x00xx 8 bne beq 0101 0xx01000xx0Br01SUB 9xx10x001100xx00xOp 10xx00x1xx000x00xx current statenext state RegDst zero RegWriteALUSrcAALUSrcB MemRead MemWriteMemtoRegIorDIRWrite PCWrite PCSource ALUop Op ExtOp State Transition and Output Table

23 Multicycle Implementation© Muhamed Mudawar, COE 308 – KFUPMSlide 23  Reduces hardware  One unified memory for instruction and data, and one ALU  Reduces clock cycle and time  When compared to single-cycle implementation  Breaks instruction execution into steps (step = 1 cycle)  Internal registers in datapath  Save intermediate data for later cycles  Finite State Machine (FSM) specification of control  Implementation of control  Hardwired control  a sequential machine  Microprogramming (covered in textbook) Multicycle Implementation Summary


Download ppt "CSE 308 – Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Multicycle Implementation."

Similar presentations


Ads by Google