Presentation is loading. Please wait.

Presentation is loading. Please wait.

Multiple Cycle Implementation of MIPS-Lite CPU

Similar presentations


Presentation on theme: "Multiple Cycle Implementation of MIPS-Lite CPU"— Presentation transcript:

1 Multiple Cycle Implementation of MIPS-Lite CPU
CS365 Lecture 8

2 Review on Single Cycle Datapath
Subset of the core MIPS ISA Arithmetic/Logic instructions: AND, OR, ADD, SUB, SLT Data flow instructions: LW, SW Branch instructions: BEQ, J Five steps in processor design Analyze the instruction Determine the datapath components Assemble the components Determine the control Design the control unit CS465 Multi-cycle CPU

3 Complete Single Cycle Datapath
How lw, sw, R-Type, beq, j instructions work? CS465 Multi-cycle CPU

4 Delays in Single Cycle Datapath
What are the delays for lw, sw, R-Type, beq, j instructions? CS465 Multi-cycle CPU

5 Single Cycle Implementation
Calculate cycle time assuming negligible delays except: memory (2ns), ALU and adders (2ns), register file access (1ns) Instruction class Instruction Fetch Register Access ALU Register/ Memory Access R-Type X R 6 Load M 8 Store 7 Branch 5 Jump 2 CS465 Multi-cycle CPU

6 Remarks on Single Cycle Datapath
Single cycle datapath ensures the execution of any instruction within one clock cycle Functional units must be duplicated if used multiple times by one instruction, e.g. ALU Why? Functional units can be shared if used by different instructions Single cycle datapath is not efficient in time Clock cycle time is determined by the instruction taking the longest time, eg. lw in MIPS Variable clock cycle time is too complicated Alternative design/implementation approaches Multiple clock cycles per instruction – this lecture Pipelining – Chapter 6 CS465 Multi-cycle CPU

7 Multi-cycle Approach Single cycle problems: One solution:
What if we had a more complicated instruction like FP? Wasteful of area One solution: Use a “smaller” cycle time Have different instructions take different numbers of cycles A “multicycle” datapath P C M e m o r y A d s I n t u c i a D g R # L U B O CS465 Multi-cycle CPU

8 Multistage Execution of Instructions
We have informally described instructions as executing in several steps Instruction fetch and PC update Instruction decode and reading from registers Performing an ALU computation Reading/writing data memory Storing data back to register file What if we made these stages explicit in the hardware design? CS465 Multi-cycle CPU

9 Benefits Performance benefits Cost benefits
Each instruction can execute only the stages that are necessary E.g. branch instruction beq Instructions can complete as soon as possible, instead of being limited by the slowest instruction Cost benefits As an added bonus, we can eliminate some of the extra hardware from the single-cycle datapath We will restrict to using each functional unit once per cycle We could reuse some units in a different cycle during the execution of a single instruction E.g. beq can compare operand and compute target in different cycles CS465 Multi-cycle CPU

10 Multicycle Approach We will be reusing functional units
ALU used to compute address and to increment PC Memory used for instruction and data Unlike the single cycle implementation, our control signals will not be determined solely by instruction E.g., what should the ALU do for a “subtract” instruction? Need to specify not only instruction but also which cycle in instruction’s execution We will use a finite state machine for control CS465 Multi-cycle CPU

11 Review: Finite State Machines
A set of states and Next state function (determined by current state and the input) Output function (determined by current state and possibly input) There are two types of FSM according to the output function N e x t - s a f u n c i o C r l k O p I CS465 Multi-cycle CPU

12 Moore Machine vs. Mealy Machine
The output function depends on the current state Mealy machine The output function depends on both the current state and the current input We will use a Moore machine Refer to Appendix B.10 (page B-67) N e x t - s a f u n c i o C r l k O p I CS465 Multi-cycle CPU

13 Multicycle Approach Break up the instructions into steps, each step takes a cycle Balance the amount of work to be done Restrict each cycle to use only one major functional unit At the end of a cycle Store values for use in later cycles Introduce additional “internal” registers S h i f t l e 2 P C M m o r y D a W d u x 1 R g s 4 I n c [ 5 ] 3 6 A L U Z B O CS465 Multi-cycle CPU

14 Changes to Datapath A single memory unit is used to store both instructions and data Memory address specified by PC or ALUout Need a new multiplexor (IorD) to control whether the memory is being accessed to read an instruction or read/write data (for lw/sw) P C M e m o r y A d s I n t u c i a D g R # L U B O CS465 Multi-cycle CPU

15 Changes to Datapath New internal registers added to support multi-cycle operation Store data output by functional elements for use in subsequent cycles Memory access: Instruction register (IR), memory data register (MDR); register access:A, B; ALU op: ALUOut All except IR hold data only between adjacent cycles Only IR needs to hold the instruction until the end of execution of that instruction, hence need a write control P C M e m o r y A d s I n t u c i a D g R # L U B O CS465 Multi-cycle CPU

16 Changes to Datapath ALU used for all arithmetic (including computing branch target address and PC+4) Need a new multiplexor for top input to ALU (ALUSrcA): choose between PC and register A Bottom input multiplexor (ALUSrcB) to ALU extended to have four inputs (instead of two in single cycle datapath): register B, sign-extended immediate, address offset (sign-extended and shifted), constant 4 P C M e m o r y A d s I n t u c i a D g R # L U B O CS465 Multi-cycle CPU

17 Multicycle Datapath with Control
Figure 5.27 CS465 Multi-cycle CPU

18 Multicycle Datapath & Control
S h i f t l e 2 P C M u x 1 R g s r W d a I n c o [ 5 ] 4 3 6 A L U Z m y B D O p - J 8 With support to jump and branch Figure 5.28 CS465 Multi-cycle CPU

19 Five Execution Steps Instruction fetch
Instruction decode and register fetch Execution, memory address computation, or branch completion Memory access or R-type instruction completion Write-back step INSTRUCTIONS TAKE FROM CYCLES! CS465 Multi-cycle CPU

20 High Level View of FSM Control
Figure 5.31 CS465 Multi-cycle CPU

21 Step 1: Instruction Fetch
Operations Use PC to get instruction and put it in the Instruction Register Increment the PC by 4 and put the result back in PC Described in RTL (Register-Transfer Language) IR = Memory[PC]; PC = PC + 4; Control signals Assert MemRead, IRwrite Set IorD to 0  select PC as the source of the address Increment PC by 4: setting ALUSrcA to 0 (sending PC to ALU), ALUSrcB to 01 (sending 4), ALUOp  00 (for add), store PC+4 to PC  PCSource 00, and PCWrite  1 What is the advantage of updating the PC now? succintly CS465 Multi-cycle CPU

22 Step 2: Instr. Decode & Reg. Fetch
Operations Read registers rs and rt in case we need them Compute the branch address in case the instruction is a branch RTL: A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; ALUOut = PC + (sign-extend(IR[15-0]) << 2); Control signals We are not setting any control lines based on the instruction type yet (we are busy "decoding" it in our control logic) Set ALUSrcA to 0 so that PC is sent to ALU, ALUSrcB to 11 so that sign-extended and shifted offset to ALU CS465 Multi-cycle CPU

23 Instruction Fetch & Decode
A L U S r c = B 1 O p M e m R a d I o D W i t P C u n s f h / g ( ' ) - y E Q J F 5 . 3 4 35 36 CS465 Multi-cycle CPU

24 Step 3: Execution ALU is performing one of three functions, based on instruction type Need to set ALUSrcA, ALUSrcB, ALUOp Memory reference (LW/SW) ALUOut = A + sign-extend(IR[15-0]); R-type: ALUOut = A op B; Branch: if (A==B) PC = ALUOut; Need to set PCWriteCond, PCSource, use Zero Jump PC = PC[31:28]|(IR[25:0], 2’b00) Need to set PCWrite, PCSource CS465 Multi-cycle CPU

25 Step 4: R-type or Memory-access
Loads and stores access memory MDR = Memory[ALUOut]; or Memory[ALUOut] = B; Need to set MemRead/MemWrite, IorD R-type instructions finish Reg[IR[15-11]] = ALUOut; Need to set RegDst, RegWrite, MemtoReg The write actually takes place at the end of the cycle on the edge CS465 Multi-cycle CPU

26 Step 5: Write-back For LW instruction:
Reg[IR[20-16]]= MDR; Need to set RegWrite, MemtoReg, RegDst What about all the other instructions? CS465 Multi-cycle CPU

27 FSM for Memory Access Instr.
W r i t I o D = 1 R a d A L U S c B O p g s y u n ( ' ) - b k 4 2 5 3 F T . CS465 Multi-cycle CPU

28 FSM for R-format Instructions
L U S r c = 1 B O p R e g D s t W i M m o E x u n - y l 6 7 ( ) F a T 5 . 3 2 CS465 Multi-cycle CPU

29 FSM for Branch Instruction
p l e t i 8 ( O = ' E Q ) F s 1 T g u 5 . 3 2 A L U S P C W d CS465 Multi-cycle CPU

30 FSM for Jump J u m p c o l e t i n 9 ( O = ' ) F r s a 1 T g 5 . 3 2 P
g 5 . 3 2 P C W S CS465 Multi-cycle CPU

31 Summary of Steps CS465 Multi-cycle CPU

32 Complete Finite State Machine
W r i t e S o u c = 1 A L U B O p n d R g D s M m I a f h / J l E x y - b k ( ' ) Q 4 9 8 6 2 7 5 3 CS465 Multi-cycle CPU

33 Implementing the Control
Value of control signals is dependent upon: What instruction is being executed Which step is being performed Use the information we’ve accumulated to specify a finite state machine Specify the finite state machine graphically, or Use microprogramming Implementation can be derived from specification CS465 Multi-cycle CPU

34 Finite State Machine for Control
Implementation: P C W r i t e o n d I D M m R g S u c A L U O p B s N 3 2 1 5 4 a f l CS465 Multi-cycle CPU

35 Simple Questions How many cycles will it take to execute this code? lw $t2, 0($t3) lw $t3, 4($t3) beq $t2, $t3, Label #assume not taken add $t5, $t2, $t3 sw $t5, 8($t3) Label: ... What is going on during the 8th cycle of execution? In what cycle does the actual addition of $t2 and $t3 takes place? CS465 Multi-cycle CPU

36 Exceptions Hardest part of control is to implement exceptions & interrupts Type of event From where? MIPS terminology I/O device request External Interrupt Invoke the operating system from user program Internal Exception Arithmetic overflow Using an undefined instruction Hardware malfunctions Internal or External Exception or interrupt CS465 Multi-cycle CPU

37 How Are Exceptions Handled?
In our design, we will consider two types of exceptions Arithmetic overflow Execution of an undefined instruction Actions on exception Save address of offending instruction in the Exception Program Counter (EPC) Transfer control to the operating system at a pre-specified address (exception handler) OS then takes appropriate action CS465 Multi-cycle CPU

38 Exception Handling For the OS to take appropriate action, it must know the reason for the exception Two ways to communicate reason to OS Have a Status register which holds a field that indicates the reason for the exception Vectored interrupts Address to which control is transferred depends upon the cause of the exception MIPS uses first method above; it has a register called Cause (in addition to the EPC register) Undefined instruction =0 Arithmetic overflow =1 CS465 Multi-cycle CPU

39 CPU w/ Exception Support
h i f t l e 2 M m o r y D a W d u x 1 I n s c [ 5 ] 4 g 3 6 A L U Z B R P C O p - J 8 E CS465 Multi-cycle CPU

40 Exception Handling Datapath additions Control signals Three steps
EPC, Cause (for undefined instruction, Cause = 0, arithmetic overflow Cause = 1) Control signals EPCWrite, CauseWrite IntCause (sets the Cause register) PCSrc has to be modified so that one of its sources is the OS entry point Three steps Write Cause EPC = PC – 4 (Have to use ALU, so need to expand multiplexors for ALUSrcA and ALUSrcB Write PC CS465 Multi-cycle CPU

41 CPU w/ Exception Support
h i f t l e 2 M m o r y D a W d u x 1 I n s c [ 5 ] 4 g 3 6 A L U Z B R P C O p - J 8 E CS465 Multi-cycle CPU

42 States for Handling Exceptions
1 T o s t a e b g i n x r u c S = A L U B O p E P C W I PC CS465 Multi-cycle CPU

43 FSM including Exception Support
A L U S r c = 1 B O p P C W i t e o n d u R g D s M m I a f h / J l E x y - b k ( ' ) Q 4 9 8 6 2 7 5 3 v w CS465 Multi-cycle CPU

44 Exercise Given the following instruction mix, what is the CPI for the multi-cycle CPU, assuming each step requires 1 clock cycle? Loads: 20%, stores: 10%, branches: 13%, jumps: 2%, ALU: 55% Idea Get the cycles needed for every instruction type: these are their individual CPIs Calculate the average CPI considering the instruction mix (instruction frequency) CS465 Multi-cycle CPU

45 Cycles for Individual Instructions
Loads: 5, stores: 4 ALU instructions: 4 Branches: 3 Jumps: 3 CS465 Multi-cycle CPU

46 Exercise Given the following instruction mix, what is the CPI for the multi-cycle CPU, assuming each step requires 1 clock cycle? Loads: 20%, stores: 10%, branches: 13%, jumps: 2%, ALU: 55% CPI for each of them: loads: 5, stores: 4, branches: 3, jumps:3, ALU: 4 Average CPI = 0.20    4 = 4.05 Compare with the worst-case CPI of 5.0 if all instructions take the same number of cycles CS465 Multi-cycle CPU

47 Summary Multiple-cycle datapath and control Typical steps
Break up the instructions into steps, each step takes a cycle Reduced clock cycle time; different instructions take varied cycles to implement Functional units can be reused Typical steps Instruction fetch Instruction decode and register fetch Execution: R-type, memory address computation, condition check Memory access or R-type write Write back from memory CS465 Multi-cycle CPU

48 Next Lecture Topic Reading Pipelining Ch6 Patterson and Hennessy CS465
Multi-cycle CPU


Download ppt "Multiple Cycle Implementation of MIPS-Lite CPU"

Similar presentations


Ads by Google