Prof. John Nestor
ECE Department, Lafayette College, Easton, Pennsylvania
ECE Computer Organization
Lecture 15 - Multi-Cycle Processor Design 2
Fall 2006
Reading: , C.4 - C.5, Verilog Handout Section 6-10
Homework: 5.32, 5.34, 5.35, 5.36, 5.49, 5.55
Portions of these slides are derived from:
  Textbook figures © 1998 Morgan Kaufmann Publishers, all rights reserved
  Tod Amon's COD2e Slides © 1998 Morgan Kaufmann Publishers, all rights reserved
  Dave Patterson’s CS 152 Slides - Fall 1997 © UCB
  Rob Rutenbar’s Slides - Fall 1999 CMU
  other sources as noted

ECE 313 Fall 2006, Lecture 15 - Multicycle Design 2

Outline - Multicycle Design
  Overview
  Datapath Design
  Controller Design
  Aside: FSM Design in Verilog
  Performance Considerations
  Extending the Design: An Example
  Microprogramming
  Exceptions

Review: State Machine Design
  Traditional Approach:
    Create State Diagram
    Create State Transition Table
    Assign State Codes
    Write Excitation Equations & Minimize
  HDL-Based State Machine Design:
    Create State Diagram (optional)
    Write HDL description of state machine
    Synthesize

Review - State Transition Table / Diagram
  Transition List - lists edges in STD
    PS     Condition      NS     Output
    IDLE   ARM' + DOOR'   IDLE   0
    IDLE   ARM*DOOR       BEEP   0
    BEEP   ARM            WAIT   1
    BEEP   ARM'           IDLE   1
    WAIT   ARM            BEEP   0
    WAIT   ARM'           IDLE   0
  [State diagram: states IDLE, BEEP (Honk=1), WAIT; edges labeled ARM*DOOR, ARM, ARM']
  Note: ARM' + ARM*DOOR' = ARM' + DOOR'

Coding FSMs in Verilog
  Clocked always block: state register
  Combinational always block: next state logic, output logic

Coding FSMs in Verilog - Code Skeleton
Part 1 - Declarations
  module fsm(inputs, outputs);
    input ...;
    reg ...;
    // State codes
    parameter [NBITS-1:0] S0 = 2'b00,
                          S1 = 2'b01,
                          S2 = 2'b10,
                          S3 = 2'b11;
    // State variable
    reg [NBITS-1:0] CURRENT_STATE;
    reg [NBITS-1:0] NEXT_STATE;

Coding FSMs in Verilog - Code Skeleton
Part 2 - State Register, Logic Specification
    // State register
    always @(posedge clk)
      begin
        CURRENT_STATE <= NEXT_STATE;
      end

    // Next state and output logic
    always @(CURRENT_STATE or xin)
      begin
        case (CURRENT_STATE)
          S0: ... determine NEXT_STATE, outputs
          S1: ... determine NEXT_STATE, outputs
        endcase
      end // always
  endmodule

FSM Example - Car Alarm
Part 1 - Declarations, State Register
  module car_alarm (arm, door, reset, clk, honk);
    input arm, door, reset, clk;
    output honk;
    reg honk;
    parameter IDLE=0, BEEP=1, HWAIT=2;
    reg [1:0] current_state, next_state;

    // State register with asynchronous reset
    always @(posedge reset or posedge clk)
      if (reset) current_state <= IDLE;
      else current_state <= next_state;

FSM Example - Car Alarm
Part 2 - Logic Specification
    always @(current_state or arm or door)
      case (current_state)
        IDLE : begin
          honk = 0;
          if (arm && door) next_state = BEEP;
          else next_state = IDLE;
        end
        BEEP : begin
          honk = 1;
          if (arm) next_state = HWAIT;
          else next_state = IDLE;
        end

FSM Example - Car Alarm
Part 3 - Logic Specification (cont’d)
        HWAIT : begin
          honk = 0;
          if (arm) next_state = BEEP;
          else next_state = IDLE;
        end
        default : begin
          honk = 0;
          next_state = IDLE;
        end
      endcase
  endmodule

FSM Example - Verilog Handout
Divide-by-Three Counter
  [State diagram: reset -> S0 (out=0) -> S1 (out=0) -> S2 (out=1) -> S0]

Verilog Code - Divide by Three Counter
Part 1
  module divideby3FSM(clk, reset, out);
    input  clk;
    input  reset;
    output out;

    reg [1:0] state;
    reg [1:0] nextstate;

    parameter S0 = 2'b00;
    parameter S1 = 2'b01;
    parameter S2 = 2'b10;

    // State Register
    always @(posedge clk or posedge reset)
      if (reset) state <= S0;
      else state <= nextstate;

Verilog Code - Divide by Three Counter
Part 2
    // Next State Logic
    always @(state)
      case (state)
        S0: nextstate = S1;
        S1: nextstate = S2;
        S2: nextstate = S0;
        default: nextstate = S0;
      endcase

    // Output Logic
    assign out = (state == S2);
  endmodule

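To try the counter in a simulator, the two parts above can be assembled into one file, with the sensitivity lists written out in full, and driven by a small testbench. This is a minimal sketch: the testbench module name, the 10-time-unit clock period, the reset timing, and the `highs` counter are assumptions for illustration, not part of the handout.

```verilog
// Divide-by-three FSM, assembled from Parts 1 and 2 above.
module divideby3FSM(clk, reset, out);
  input  clk;
  input  reset;
  output out;

  reg [1:0] state;
  reg [1:0] nextstate;

  parameter S0 = 2'b00;
  parameter S1 = 2'b01;
  parameter S2 = 2'b10;

  // State register with asynchronous reset
  always @(posedge clk or posedge reset)
    if (reset) state <= S0;
    else       state <= nextstate;

  // Next-state logic
  always @(state)
    case (state)
      S0:      nextstate = S1;
      S1:      nextstate = S2;
      S2:      nextstate = S0;
      default: nextstate = S0;
    endcase

  // Output logic: out depends only on the current state
  assign out = (state == S2);
endmodule

// Hypothetical testbench (not from the handout).
module divideby3FSM_tb;
  reg  clk, reset;
  wire out;
  integer i, highs;

  divideby3FSM dut(clk, reset, out);

  always #5 clk = ~clk;            // assumed clock period: 10 time units

  initial begin
    clk = 0; reset = 1; highs = 0;
    #12 reset = 0;                 // release reset between clock edges
    for (i = 0; i < 9; i = i + 1) begin
      @(negedge clk);              // sample out away from the active edge
      if (out) highs = highs + 1;
    end
    $display("out was high in %0d of 9 sampled cycles", highs);
    $finish;
  end
endmodule
```

Since `out` is asserted only in S2, a working counter should report that `out` was high in 3 of the 9 sampled cycles.
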
Verilog Example: MIPS Control Unit
Review: Full Multicycle Implementation

MIPS Control Unit "Skeleton" - Part 1
  module mips_control( clk, reset, Op, PCWrite, PCWriteCond, IorD, MemRead,
                       MemWrite, MemtoReg, IRWrite, PCSource, ALUOp,
                       ALUSrcB, ALUSrcA, RegWrite, RegDst );
    // port declarations
    input clk;
    input reset;
    input [5:0] Op;
    output PCWrite;
    output PCWriteCond;
    output IorD;
    output MemRead;
    output MemWrite;
    output MemtoReg;
    output IRWrite;
    output [1:0] PCSource;
    output [1:0] ALUOp;
    output ALUSrcA;
    output [1:0] ALUSrcB;
    output RegWrite;
    output RegDst;

MIPS Control Unit "Skeleton" - Part 2
    // reg declarations for output ports
    reg PCWrite;
    reg PCWriteCond;
    reg IorD;
    reg MemRead;
    reg MemWrite;
    reg MemtoReg;
    reg IRWrite;
    reg [1:0] PCSource;
    reg [1:0] ALUOp;
    reg ALUSrcA;
    reg [1:0] ALUSrcB;
    reg RegWrite;
    reg RegDst;

    // Symbolic constants - opcodes
    parameter R_FORMAT = 6'd0;
    parameter LW = 6'd35;
    parameter SW = 6'd43;
    parameter BEQ = 6'd4;
    parameter J = 6'd2;

    // Symbolic constants - state codes
    parameter S0=4'd0, S1=4'd1, S2=4'd2, S3=4'd3, S4=4'd4,
              S5=4'd5, S6=4'd6, S7=4'd7, S8=4'd8, S9=4'd9;

MIPS Control Unit "Skeleton" - Part 3
    reg [3:0] current_state, next_state;

    // State register with synchronous reset
    always @(posedge clk)
      begin
        if (reset) current_state <= S0;
        else current_state <= next_state;
      end

    always @(current_state or Op)
      begin
        // default values
        PCWrite = 1'b0;
        PCWriteCond = 1'b0;
        IorD = 1'bx;
        MemRead = 1'b0;
        MemWrite = 1'b0;
        MemtoReg = 1'bx;
        IRWrite = 1'b0;
        PCSource = 2'bxx;
        ALUOp = 2'bxx;
        ALUSrcA = 1'bx;
        ALUSrcB = 2'bxx;
        RegWrite = 1'b0;
        RegDst = 1'bx;
        case (current_state)
          S0: begin
            MemRead = 1'b1;
            ALUSrcA = 1'b0;
            IorD = 1'b0;
            IRWrite = 1'b1;
            ALUSrcB = 2'b01;
            ALUOp = 2'b00;
            PCWrite = 1'b1;
            PCSource = 2'b00;
            next_state = S1;
          end
          // … Add code here!
        endcase
      end
  endmodule

Controller Implementation
  Typical Implementation: Figure 5-37, p. 338
  Variations:
    Random logic
    PLA
    ROM
      address lines = inputs
      data lines = outputs
      contents = "truth table"
  [Block diagram: Combinational Control Logic; inputs are the opcode from the Instr. Reg and the State; outputs are the datapath control outputs and the Next State]

Performance of a Multicycle Implementation
  What is the CPI of the Multicycle Implementation?
  Using measured instruction mix from SPECINT2000:
    lw       5 cycles   25%
    sw       4 cycles   10%
    R-type   4 cycles   52%
    branch   3 cycles   11%
    jump     3 cycles    2%
  What is the CPI?
    CPI = (5 cycles * 0.25) + (4 cycles * 0.10) + (4 cycles * 0.52) + (3 cycles * 0.11) + (3 cycles * 0.02)
    CPI = 4.12 cycles per instruction

Performance Continued
  Assuming a 200ps clock, what is average execution time/instruction?
    Sec/Instr = 4.12 CPI * 200ps/cycle = 824ps/instr
  How does this compare to the Single-Cycle Case?
    Sec/Instr = 1 CPI * 600ps/cycle = 600ps/instr
    Single-Cycle is 824/600 = 1.37 times faster than Multicycle
  Why is Single-Cycle faster than Multicycle?
    Branch & jump are the same speed (600ps vs 600ps)
    R-type & store are faster (600ps vs 800ps)
    Load word is faster (600ps vs 1000ps)

Multicycle Example Problem
  Extend the design to implement the "jr" (jump register) instruction:
    jr rs      PC = Reg[rs]
  Format: [instruction format figure; surviving field labels: 0, rs; surviving field widths: 5 bits, 6 bits]
  Steps:
    1. Review instruction requirements (register transfer)
    2. Modify datapath
    3. Modify control logic

Example Problem: Datapath
  What needs to be changed?
  [Datapath figure; annotation: Reg[rs]]

Example Problem: Control
  What needs to be changed?
  [State diagram figure; added state asserts PCWrite with PCSource = 11, entered when Op = 'JR']

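One way to picture the datapath change is as a PC source multiplexer widened to four inputs, so that PCSource = 11 selects Reg[rs]. The sketch below is illustrative only: the module and port names are hypothetical, and the meanings of encodings 00, 01 and 10 are assumed to follow the textbook's multicycle datapath (PC+4 from the ALU, branch target from ALUOut, jump address).

```verilog
// Sketch only: module and port names are hypothetical.
module pc_source_mux(PCSource, alu_result, alu_out, jump_addr, reg_rs, next_pc);
  input  [1:0]  PCSource;
  input  [31:0] alu_result;  // ALU output (PC + 4 during fetch), assumed encoding 00
  input  [31:0] alu_out;     // ALUOut register (branch target), assumed encoding 01
  input  [31:0] jump_addr;   // jump target address, assumed encoding 10
  input  [31:0] reg_rs;      // Reg[rs]: new input added for jr, encoding 11
  output [31:0] next_pc;
  reg    [31:0] next_pc;

  always @(PCSource or alu_result or alu_out or jump_addr or reg_rs)
    case (PCSource)
      2'b00:   next_pc = alu_result;
      2'b01:   next_pc = alu_out;
      2'b10:   next_pc = jump_addr;
      2'b11:   next_pc = reg_rs;     // jr: PC = Reg[rs]
      default: next_pc = 32'bx;      // unknown select propagates as unknown
    endcase
endmodule
```

On the control side, the matching change is a new state that asserts PCWrite with PCSource = 11, as noted on the slide.
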
Control Implementation - Another View
  Separate Logic into two pieces:
    Output Logic (this is a Moore Machine - why?)
    Next-State Logic

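The split can be sketched by writing the next-state logic as its own combinational module. In a Moore machine the outputs depend only on the current state, so this is the only piece that needs the opcode. The module name is hypothetical, and the state numbering and transitions are assumed to follow the textbook's ten-state multicycle FSM; the opcode and state constants are the ones declared in the control unit skeleton.

```verilog
// Sketch only: transitions are assumed from the textbook's multicycle FSM.
module mips_next_state(current_state, Op, next_state);
  input  [3:0] current_state;
  input  [5:0] Op;
  output [3:0] next_state;
  reg    [3:0] next_state;

  parameter R_FORMAT = 6'd0, LW = 6'd35, SW = 6'd43, BEQ = 6'd4, J = 6'd2;
  parameter S0=4'd0, S1=4'd1, S2=4'd2, S3=4'd3, S4=4'd4,
            S5=4'd5, S6=4'd6, S7=4'd7, S8=4'd8, S9=4'd9;

  always @(current_state or Op)
    case (current_state)
      S0: next_state = S1;                    // fetch -> decode
      S1: case (Op)                           // decode: dispatch on opcode
            LW, SW:   next_state = S2;        // memory address computation
            R_FORMAT: next_state = S6;        // ALU execution
            BEQ:      next_state = S8;        // branch completion
            J:        next_state = S9;        // jump completion
            default:  next_state = S0;        // unimplemented opcode (assumed behavior)
          endcase
      S2: next_state = (Op == LW) ? S3 : S5;  // load reads memory, store writes it
      S3: next_state = S4;                    // memory read -> register write-back
      S6: next_state = S7;                    // ALU execution -> R-type completion
      default: next_state = S0;               // S4, S5, S7, S8, S9 return to fetch
    endcase
endmodule
```

Only S1 and S2 look at `Op`; every other state has a single successor, which is the observation the microprogramming slides build on.
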
Microprogramming - Motivation
  Problems with graphical approach to FSM Design:
    Unwieldy for large number of states (real processors may have hundreds of instructions -> hundreds of states)
    Unwieldy if instruction types vary radically (can you say… x86?)
    Most states are sequential (state 4 follows state 3; state 3 follows state 2; state 7 follows state 6; etc.)
  Idea: expand on ROM implementation of control

Consider Output Logic in ROM
  ROM Characteristics - "lookup table"
    State code for each state is a ROM address
    Control outputs for each state are a ROM word

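A minimal sketch of the lookup-table view: the state code addresses the ROM and the 16 control bits come back as one word. The module name and the bit layout of the word are assumptions made for this sketch. Only the word for state 0 is filled in, using the S0 values from the control unit skeleton with its two don't-care outputs set to 0; the remaining words are placeholders.

```verilog
// Sketch only: module name and word layout are assumptions.
// Assumed word layout, most significant bit first:
//   PCWrite, PCWriteCond, IorD, MemRead, MemWrite, MemtoReg, IRWrite,
//   PCSource[1:0], ALUOp[1:0], ALUSrcA, ALUSrcB[1:0], RegWrite, RegDst
module control_rom(state, word);
  input  [3:0]  state;   // ROM address = state code
  output [15:0] word;    // ROM word = control outputs for that state
  reg    [15:0] word;

  always @(state)
    case (state)
      // State 0 (instruction fetch), values from the skeleton's S0 case:
      4'd0: word = { 1'b1,    // PCWrite
                     1'b0,    // PCWriteCond
                     1'b0,    // IorD
                     1'b1,    // MemRead
                     1'b0,    // MemWrite
                     1'b0,    // MemtoReg (don't care in S0, stored as 0)
                     1'b1,    // IRWrite
                     2'b00,   // PCSource
                     2'b00,   // ALUOp
                     1'b0,    // ALUSrcA
                     2'b01,   // ALUSrcB
                     1'b0,    // RegWrite
                     1'b0 };  // RegDst (don't care in S0, stored as 0)
      // Placeholder: words for the other states must be filled in from the
      // state diagram. All zeros asserts no write enables.
      default: word = 16'b0;
    endcase
endmodule
```

Written this way, adding a state means adding one more ROM word rather than re-minimizing logic equations.
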
Microprogramming - Basic Idea
  Idea: expand on ROM control implementation
    One state = one ROM word = one microinstruction
    State sequences form a microprogram
    Each state code becomes a microinstruction address

Microprogramming - Sequencer Design

Describing Microcode
  Each microinstruction is lots of 1's and 0's
  To ease understanding:
    Break into fields related to different datapath functions
    Use mnemonics to describe different field values
  Field             Values                            Control signals
  Label             string                            (none)
  ALU control       Add, Subt, Func Code              ALUOp
  SRC1              PC, A                             ALUSrcA
  SRC2              B, 4, Extend, Extshft             ALUSrcB
  Reg. control      Read, Write ALU, Write MDR        RegWrite, RegDst, MemtoReg
  Memory            Read PC, Read ALU, Write ALU      MemRead, MemWrite, IRWrite, IorD
  PCWrite control   ALU, ALUOut-cond, Jump address    PCWrite, PCWriteCond, PCSource
  Sequencing        Seq, Fetch, Dispatch i            AddrCtl
  AddrCtl is the sequencer control signal; all the others are datapath control signals.
  See also: Figure C.5.1, p. C-28

Microcode for Multicycle Implementation
Sequencer Implementation Details

Microcoding Tradeoffs
  + Makes design easier
  + Flexible
      Easy to adapt to changes in organization, timing, technology
      Can make changes late in design cycle
      Can add more instructions just by adding microcode
  - Costly to implement
  - Slow - "extra level" of interpretation

Microcoding Perspective
  Not used in modern RISC processors
    simple instructions -> simple control
    hardwired control -> faster execution
    pipelining used to enhance performance
  Used heavily in CISC processors
    Traditional CISC: all instructions microcoded
      multiple dispatch ROMs to handle different instruction classes, addressing modes, etc.
    Current CISC (see Section 5.9)
      Microinstructions pipelined like RISC instructions!
      Simple instructions translate to one microinstruction
      Complex instructions translate to multiple microinstructions

Instruction Decoding in the Pentium 4
  Source: “The Microarchitecture of the Pentium® 4 Processor”, Intel Technology Journal, First Quarter

Coming Up
  Implementing Exceptions
  Pipelined Design