CpE 442 Designing a Multiple Cycle Controller

CpE 442 Designing a Multiple Cycle Controller
Start X:40

Review of a Multiple Cycle Implementation
The root of the single cycle processor’s problems: The cycle time has to be long enough for the slowest instruction Solution: Break the instruction into smaller steps Execute each step (instead of the entire instruction) in one cycle Cycle time: time it takes to execute the longest step Keep all the steps to have similar length This is the essence of the multiple cycle processor The advantages of the multiple cycle processor: Cycle time is much shorter Different instructions take different number of cycles to complete Load takes five cycles Jump only takes three cycles Allows a functional unit to be used more than once per instruction Well, the root of these problems of course is that fact that the Single Cycle Processor’s cycle time has to be long enough for the slowest instruction. The solution is simple. Just break the instruction into smaller steps and instead of executing an entire instruction in one cycle, we will execute each of these steps in one cycle. Since the cycle time in this case will then be the time it takes to execute the longest step, our goal should be keeping all the steps to have similar length when we break up the instruction. Well the last two bullets pretty much summarize what a multiple cycle processor is all about. The first advantage of the multiple cycle processor is of course shorter cycle time than the single cycle processor. The cycle time now only has to be long enough to execute part of the instruction (point to “breaking into steps). But may be more importantly, now different instructions can take different number of cycles to complete. For example: (1) The load instruction will take five cycles to complete. (2) But the Jump instruction will only take three cycles. This feature greatly reduce the idle time inside the processor. Finally, the multiple cycle implementation allows a functional unit to be used more than once per instruction as long as it is used on different clock cycles. For example, as I will show you later in today’s lecture, we can use the ALU to increment the Program Counter as well as doing address calculation. +3 = 11 min. (X:51)

Review: Instruction Fetch Cycle, In the Beginning
Every cycle begins right AFTER the clock tick: mem[PC] PC<31:0> + 4 Clk One “Logic” Clock Cycle You are here! PCWr=? ALU PC 32 MemWr=? IRWr=? 32 32 As far as LOGIC is concerned, I think the easiest way to think about a clock cycle is that a clock cycle begins right AFTER a clock tick and ends at the next clock tick. Notice that I have intentionally shown the L time to be much longer than the H time because I want to emphasis the H and L time does not affect your design as long as you use the simple clocking methodology where all storage elements are triggered at the same clock tick. The only important thing here is the time between the two clock ticks, the cycle time. Most of the time, however, the high and low time are the same because it is much easier to generate a clock that has the same length of high and same length of low time. Well enough about clock ticks. Let’s see what happens at the beginning (You are Here) of the Ifetch cycle: (a) We need to fetch the instruction from Memory so we sent the address to the memory. (b) We also needs to update the PC so we better send the address to the ALU as well. ***** What values do you think the control signals PCwr and ALUop have at this point? Well since we are only at the beginning of the cycle (Your are Here), these two signals will still have the old values from the last cycle of the previous value. +2 = 27 min. (Y:07) Clk RAdr 4 32 32 Ideal Memory Instruction Reg ALU Control 32 WrAdr 32 Dout Din 32 ALUop=? 32 Clk

Review: Instruction Fetch Cycle, The End
Every cycle ends AT the next clock tick (storage element updates): IR <-- mem[PC] PC<31:0> <-- PC<31:0> + 4 Clk One “Logic” Clock Cycle You are here! PCWr=1 ALU PC 32 MemWr=0 IRWr=1 32 00 32 As time goes by, the output of the memory will become valid and the ALU, with ALUOp sets to Add, will finish the 32-bit add. Hopefully, we are smart enough to set the cycle time so the time between the clock tick is long enough to allow these (output of Memory and ALU) to stabilize. So at the end of the cycle, the clock tick will trigger the Instruction Register to save the current instruction word (output of Instruction Memory). Similarly, the Program Counter register is triggered (point to the clock input) to save the next instruction’s address (output of ALU). Unlike the single cycle datapath where we use a 30-b it PC to save hardware, here we use a 32-bit PC for the multiple cycle datapath. We are using the 32-bit ALU to do the PC update anyway so unlike the single cycle datapath where we can reduce the length of two adders by two bits, here the only thing we save are two bits in the PC register. Why bother? The Memory Unit here is also used to store data and the ALU here is also use for instruction execution. Therefore, we know we will need some MUXes in front of them. +2 = 29 min. (Y:09) Clk RAdr 4 32 32 Ideal Memory Instruction Reg ALU Control 32 32 WrAdr Dout Din 32 ALUOp = Add 32 Clk

Putting it all together: Multiple Cycle Datapath
PCWr PCWrCond PCSrc BrWr Zero IorD MemWr IRWr RegDst RegWr ALUSelA 1 Target 32 32 Mux PC Mux 1 32 Zero Rs ALU Mux 1 Ra 32 RAdr 5 32 Rt Rb busA 32 32 Ideal Memory Instruction Reg 5 Reg File 32 Mux 1 Rt 4 Rw 32 WrAdr 32 1 32 32 Rd Din Dout busW busB 32 2 32 ALU Control Putting it all together, here it is: the multiple cycle datapath we set out to built. +1 = 47 min. (Y:47) Mux 1 3 << 2 Extend Imm 16 32 ALUOp ExtOp MemtoReg ALUSelB

Instruction Fetch Cycle: Overall Picture
1: PCWr, IRWr ALUOp=Add Others: 0s x: PCWrCond RegDst, Mem2R Ifetch PCWr=1 PCWrCond=x PCSrc=0 BrWr=0 Zero IorD=0 MemWr=0 IRWr=1 ALUSelA=0 1 Target 32 32 Mux PC Mux 1 32 Zero ALU Mux 1 32 RAdr 32 busA 32 Ideal Memory For example, the Memory can get its read address from the PC for instruction fetch but it can also get the read address from other part of the datapath for data fetch. Similarly, the ALU can get its operands from the PC and a constant 4 as we showed on the last slide, but we know the ALU can also gets its operands from the register file. We will fill in the details later but for now, we know we need to set the control signals IorD, MemWr, ALUSelA, ALUSelB to zeros and IRWr and PCWr to 1s. Notice that I have added a MUX in the PC feedback path because we know for the branch instruction, the next PC will have a value OTHER than PC plus 4 (ALU inputs). We will worry about how we get this “other” value (Target) later. For now, let’s just say we have to set the MUX control (PCSrc) to zero to select the PC plus 4 value. The settings of all the control signals are summarized in this circle. Due to space limitation, I have only shown the signals that have values other than zeros. I want to emphasis that this is the picture at the END of the Instruction Fetch Cycle. From now on, all the pictures I show refer to the END where all the evaluation is completed and control signals are settled. The beginning of a cycle is very simple and I will not show it. +2 = 31 min. (Y:11) Instruction Reg 32 4 32 WrAdr 32 1 32 32 Din Dout busB 32 2 32 ALU Control 3 ALUSelB=00 ALUOp=Add

Register Fetch / Instruction Decode (Continue)
1: BrWr, ExtOp ALUOp=Add Others: 0s x: RegDst, PCSrc ALUSelB=10 IorD, MemtoReg Rfetch/Decode busA <- Reg[rs] ; busB <- Reg[rt] ; Target <- PC + SignExt(Imm16)*4 PCWr=0 PCWrCond=0 PCSrc=x BrWr=1 Zero IorD=x MemWr=0 IRWr=0 RegDst=x RegWr=0 ALUSelA=0 1 Target 32 32 Mux PC Mux 1 32 Zero Rs ALU Mux 1 Ra 32 RAdr 5 32 Rt Rb busA 32 32 Ideal Memory Instruction Reg 5 Reg File 32 Mux 1 Rt 4 What we can do is to use this ALU to calculate the branch address in advance. (1) We will set ALUSelA to 0 such that the PC is fed to the ALU input. (2) The other ALU input will come from (ALUSelB=10) the Sign Extended (ExtOp=1) version of the 16-bit immediate filed. Once we added (ALUOp = Add) these two numbers together, we will save the result in the Target register (BrWr = 1). We cannot write the ALU output to the PC yet (PCWr = 0) because once again, we are jumping the gun here. We do NOT know whether we have a branch instruction yet. The OP and Func is still being decode in this cycle (Control) so we cannot update the PC to this value unless we are SURE we have a branch and the branch condition is met (AND-OR). Once again, I have summarized all the control signals settings inside this circle. So far the Instruction Fetch and Register Read (Slash) Instruction Decode cycles are shared by all instructions. But by the end of the end of this cycle, we will know exactly what instruction we have (Control output). Let’s say, by God, we do have a branch, what do we do? +2 = 35 min. (Y:15) Rw 32 WrAdr 32 1 32 Rd 32 Din Dout busW busB 32 2 32 ALU Control 3 << 2 Beq Control Op Rtype Imm 6 ALUSelB=10 Ori Func Extend Memory 6 16 32 ALUOp=Add : ExtOp=1

R-type Execution ALU Output <- busA op busB RExec PCWr=0 PCWrCond=0
1: RegDst ALUOp=Rtype ALUSelB=01 x: PCSrc, IorD MemtoReg ALUSelA ExtOp RExec ALU Output <- busA op busB PCWr=0 PCWrCond=0 PCSrc=x BrWr=0 Zero IorD=x MemWr=0 IRWr=0 RegDst=1 RegWr=0 ALUSelA=1 1 Target 32 32 Mux PC Mux 1 32 Zero Rs ALU Mux 1 Ra 32 RAdr 5 32 Rt Rb busA 32 32 Ideal Memory Instruction Reg 5 Reg File 32 Mux 1 Rt 4 Rw 32 WrAdr 32 1 32 Rd 32 Din Dout busW busB 32 Once again, fetching the registers Rs and Rt in the previous cycle pays off. We need these two registers now and they are already on busA and busB, respectively. So all we need is set the ALUSelA and ALUSelB to feed busA and busB into the ALU and tell the ALU local control we have a R-type instruction (ALUOp). The ALU will then do the right thing and generate the correct result (ALU output) at the end of this cycle. Notice that I have set RegDst to 1 here even though we are not writing the register file (RegWr is zero). Register file is not written until the next cycle. You would think RegDst should be don’t care at this point. ****** Anybody want to guess why I set RegDst to 1 at this point? Remember what I said about the address (Rw) MUST be stable BEFORE we set Write Enable (RegWr) to one for the Real memory and register file that do not have a clock input. Here by setting RegDst to one, I can guarantee the Rw specifier will be stable by next cycle where I will perform the write by setting RegWr to 1. +2 = 39 min. (Y:19) 2 32 ALU Control Mux 1 3 << 2 Extend Imm 16 32 ALUOp=Rtype ExtOp=x MemtoReg=x ALUSelB=01

R-type Completion R[rd] <- ALU Output Rfinish PCWr=0 PCWrCond=0
1: RegDst, RegWr ALUOp=Rtype ALUselA x: IorD, PCSrc ALUSelB=01 ExtOp Rfinish R[rd] <- ALU Output PCWr=0 PCWrCond=0 PCSrc=x BrWr=0 Zero IorD=x MemWr=0 IRWr=0 RegDst=1 RegWr=1 ALUSelA=1 1 Target 32 32 Mux PC Mux 1 32 Zero Rs ALU Mux 1 Ra 32 RAdr 5 32 Rt Rb busA 32 32 Ideal Memory Instruction Reg 5 Reg File 32 Mux 1 Rt 4 Rw 32 WrAdr 32 1 32 32 Rd Din Dout busW busB 32 So here is the picture where we finish off the R-type instruction by writing the ALU output back to the register file (MemtoReg=0 and RegWr = 1). Notice that in order to keep the ALU output from changing, the ALUSelA, ALUSelB, and ALUOp control signals must remain the same as the previous cycle. This brings us to a side topic I want to cover. +1 = 40 min. (Y:20) 2 32 ALU Control Mux 1 3 << 2 Extend Imm 16 32 ALUOp=Rtype ExtOp=x MemtoReg=0 ALUSelB=01

Outline of Today’s Lecture
Recap (5 minutes) Review of FSM control (15 minutes) From Finite State Diagrams to Microprogramming (25 minutes) ABCs of microprogramming (25 minutes) In the next fifteen minutes or so, I will give you an introduction to the concept of multiple cycle processor implementation. Then I will show you how to build a datapath for a multiple cycle processor by looking at what needs to be done for the different MIPS instructions. I will also show you a phenomenon called multiple cycle delay path and how to avoid them. Finally, I will summarize what we learned. So let’s get on with today’s lecture. +1 = 6 min. (X:46)

Overview of Next Two Lectures
Control may be designed using one of several initial representations. The choice of sequence control, and how logic is represented, can then be determined independently; the control can then be implemented with one of several methods using a structured logic technique. Initial Representation Finite State Diagram Microprogram Sequencing Control Explicit Next State Microprogram counter Function + Dispatch ROMs Logic Representation Logic Equations Truth Tables Implementation Technique PLA ROM “hardwired control” “microprogrammed control”

Initial Representation: Finite State Diagram
1: PCWr, IRWr ALUOp=Add Others: 0s x: PCWrCond RegDst, Mem2R Ifetch 1 1: BrWr, ExtOp ALUOp=Add Others: 0s x: RegDst, PCSrc ALUSelB=10 IorD, MemtoReg Rfetch/Decode 8 1: PCWrCond ALUOp=Sub x: IorD, Mem2Reg ALUSelB=01 RegDst, ExtOp ALUSelA BrComplete PCSrc 2 beq AdrCal 1: ExtOp ALUSelA ALUSelB=11 ALUOp=Add lw or sw x: MemtoReg Ori PCSrc Rtype 10 lw sw OriExec 3 6 1: RegDst ALUOp=Rtype ALUSelB=01 x: PCSrc, IorD MemtoReg ALUSelA ExtOp RExec 5 ALUOp=Or IorD, PCSrc 1: ALUSelA ALUSelB=11 x: MemtoReg SWMem LWmem 1: ExtOp ALUSelA, IorD 1: ExtOp MemWr ALUSelB=11 ALUSelA ALUOp=Add ALUSelB=11 x: MemtoReg ALUOp=Add PCSrc Well, we pretty much concentrate on the multiple cycle datapath today. But if you think about it, by grouping all the control signals in circles along the way, we have pretty much specified the control in a state diagram. All instructions start out at the Instruction Fetch cycle and continue to the Instruction Decode slash Register Fetch cycle. Once the instruction is decoded, we will either go to the Branch Complete cycle to complete the branch or go to one of the following: (1) R-type execution or OR immediate execution of R-type or Or immediate instructions. (2) Or we will go to the memory address calculation cycle for load and store instruction. The rest is pretty much straight forward. Before I go any further, I like to tell you a small story ... So if you ask me how to implement this state diagram in hardware, I will say: “It is so simple even ...” +5 = 75 min. (Y:55) x: PCSrc,RegDst 11 MemtoReg OriFinish 7 1: RegDst, RegWr ALUOp=Rtype ALUselA x: IorD, PCSrc ALUSelB=01 ExtOp Rfinish ALUOp=Add x: PCSrc 1: ALUSelA ALUSelB=11 MemtoReg RegWr, ExtOp IorD 1: ALUSelA ALUOp=Or x: IorD, PCSrc RegWr ALUSelB=11 4 LWwr

Sequencing Control: Explicit Next State Function
Control Logic Multicycle Datapath Inputs Opcode State Reg Next state number is encoded just like datapath controls

Logic Representative: Logic Equations
Alternatively, prior state & condition S4, S5, S7, S8, S9, S11 -> State0 _________________ -> State 1 _________________ -> State 2 _________________ -> State 3 _________________ -> State 4 State2 & op = sw -> State 5 _________________ -> State 6 State 6 -> State 7 _________________ -> State 8 State2 & op = jmp -> State 9 _________________ -> State 10 State 10 -> State 11 Next state from current state State 0 -> State1 State 1 -> S2, S6, S8, S10 State 2 ->__________ State 3 ->__________ State 4 ->State 0 State 5 -> State 0 State 6 -> State 7 State 7 -> State 0 State 8 -> State 0 State 9-> State 0 State 10 -> State 11 State 11 -> State 0 State 2 ->State3 & State4 State 3 ->State 4 RHS _____________State0 -> State 1 State1 & op = lw | op= sw -> State 2 State2 & op = lw -> State 3 State 3 -> State 4 State2 & op = sw -> State 5 State2 & op = Rtype -> State 6 State 6 -> State 7 State2 & op = beq -> State 8 State2 & op = jmp -> State 9 State2 & op = ori-> State 10

Implementation Technique: Programmed Logic Arrays
Each output line the logical OR of logical AND of input lines or their complement: AND minterms specified in top AND plane, OR sums specified in bottom OR plane Op5 R = beq = lw = sw = ori = jmp = Op4 Op3 Op2 Op1 Op0 S3 S2 S1 S0 0 = 0000 1 = 0001 2 = 0010 3 = 0011 4 = 0100 5 = 0101 6 = 0110 7 = 0111 8 = 1000 9 = 1001 10 = 1010 11 = 1011 NS3 NS2 NS1 NS0

Implementation Technique: Programmed Logic Arrays
Each output line the logical OR of logical AND of input lines or their complement: AND minterms specified in top AND plane, OR sums specified in bottom OR plane Op5 lw = sw = R = ori = beq = jmp = Op4 Op3 Op2 Op1 Op0 S3 S2 S1 S0 0 = 0000 1 = 0001 2 = 0010 3 = 0011 4 = 0100 5 = 0101 6 = 0110 7 = 0111 8 = 1000 9 = 1001 10 = 1010 11 = 1011 NS3 NS2 NS1 NS0

Multicycle Control Given numbers of FSM, can turn determine next state as function of inputs, including current state Turn these into Boolean equations for each bit of the next state lines Can implement easily using PLA What if many more states, many more conditions? What if need to add a state?

Next Iteration: Using Sequencer for Next State
Before Explicit Next State: Next try variation 1 step from right hand side Few sequential states in small FSM: suppose added floating point? Still need to go to non-sequential states: e.g., state 1 => 2, 6, 8, 10 Initial Representation Finite State Diagram Microprogram Sequencing Control Explicit Next State Microprogram counter Function + Dispatch ROMs Logic Representation Logic Equations Truth Tables Implementation Technique PLA ROM “hardwired control” “microprogrammed control”

Sequencer-based control unit
Control Logic Multicycle Datapath Outputs Inputs Types of “branching” • Set state to 0 • Dispatch (state 1 & 2) • Use incremented state number 1 State Reg MicroPC Adder Address Select Logic Opcode

Sequencer-based control unit details
Control Logic Inputs Dispatch ROM 1 Op Name State Rtype jmp beq ori lw sw 0010 Dispatch ROM 2 lw sw 0101 1 State Reg Adder Mux 3 2 1 Address Select Logic ROM2 ROM1 Opcode

Next Iteration: Using a ROM for implementation
Initial Representation Finite State Diagram Microprogram Sequencing Control Explicit Next State Microprogram counter Function + Dispatch ROMs Logic Representation Logic Equations Truth Tables Implementation Technique PLA ROM “hardwired control” “microprogrammed control”

Implementing Control with a ROM
Instead of a PLA, use a ROM with one word per state (“Control word”) State number Control Word Bits Control Word Bits 1-0 … … 00

“microprogrammed control”
Next Iteration: Using Microprogram for Representation Initial Representation Finite State Diagram Microprogram Sequencing Control Explicit Next State Microprogram counter Function + Dispatch ROMs Logic Representation Logic Equations Truth Tables Implementation Technique PLA ROM ROM can be thought of as a sequence of control words Control word can be thought of as instruction: “microinstruction” Rather than program in binary, use assembly language “hardwired control” “microprogrammed control”

Microprogramming: General Concepts
Control is the hard part of processor design ° Datapath is fairly regular and well-organized ° Memory is highly regular ° Control is irregular and global Microprogramming: -- A Particular Strategy for Implementing the Control Unit of a processor by "programming" at the level of register transfer operations Microarchitecture: -- Logical structure and functional capabilities of the hardware as seen by the microprogrammer

Macroinstruction Interpretation
User program plus Data this can change! Main Memory ADD SUB AND . one of these is mapped into one of these DATA execution unit CPU control memory AND microsequence e.g., Fetch Calc Operand Addr Fetch Operand(s) Calculate Save Answer(s)

Variations on Microprogramming
° Horizontal Microcode – control field for each control point in the machine ° Vertical Microcode – compact microinstruction format for each class of microoperation branch: µseq-op µadd execute: ALU-op A,B,R memory: mem-op S, D µseq µaddr A-mux B-mux bus enables register enables Horizontal Vertical

Extreme Horizontal 3 1 input select N3 N2 N1 N0 1 bit for each loadable register enbMAR enbAC Incr PC ALU control Depending on bus organization, many potential control combinations simply not wrong, i.e., implies transfers that can never happen at the same time. Makes sense to encode fields to save ROM space Example: gate Rx and gate Ry to same bus should never happen encoded in single bit which is decoded rather than two separate bits NOTE: encoding should be just sufficient that parallel actions that the datapath supports should still be specifiable in a single microinstruction

Horizontal vs. Vertical Microprogramming
NOTE: previous organization is not TRUE horizontal microprogramming; register decoders give flavor of encoded microoperations Most microprogramming-based controllers vary between: horizontal organization (1 control bit per control point) vertical organization (fields encoded in the control memory and must be decoded to control something) Vertical + easier to program, not very different from programming a RISC machine in assembly language - extra level of decoding may slow the machine down Horizontal + more control over the potential parallelism of operations in the datapath - uses up lots of control store

Vax Microinstructions
VAX Microarchitecture: 96 bit control store, 30 fields, 4096 µinstructions for VAX ISA encodes concurrently executable "microoperations" 95 87 84 68 65 63 11 USHF UALU USUB UJMP 001 = left 010 = right . 101 = left3 010 = A-B-1 100 = A+B+1 00 = Nop 01 = CALL 10 = RTN Jump Address ALU Control Subroutine Control ALU Shifter Control

Microprogramming Pros and Cons
Ease of design Flexibility Easy to adapt to changes in organization, timing, technology Can make changes late in design cycle, or even in the field Can implement very powerful instruction sets (just more control memory) Generality Can implement multiple instruction sets on same machine. Can tailor instruction set to application. Compatibility Many organizations, same instruction set Costly to implement Slow

Microprogramming one inspiration for RISC
If simple instruction could execute at very high clock rate… If you could even write compilers to produce microinstructions… If most programs use simple instructions and addressing modes… If microcode is kept in RAM instead of ROM so as to fix bugs … If same memory used for control memory could be used instead as cache for “macroinstructions”… Then why not skip instruction interpretation by a microprogram and simply compile directly into lowest language of machine?

Summary: Multicycle Control
Microprogramming and hardwired control have many similarities, perhaps biggest difference is initial representation and ease of change of implementation, with ROM generally being easier than PLA Initial Representation Finite State Diagram Microprogram Sequencing Control Explicit Next State Microprogram counter Function + Dispatch ROMs Logic Representation Logic Equations Truth Tables Implementation Technique PLA ROM “hardwired control” “microprogrammed control”

CpE 442 Designing a Multiple Cycle Controller

Similar presentations

Presentation on theme: "CpE 442 Designing a Multiple Cycle Controller"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

CpE 442 Designing a Multiple Cycle Controller

Similar presentations

Presentation on theme: "CpE 442 Designing a Multiple Cycle Controller"— Presentation transcript:

Similar presentations

About project

Feedback