Presentation is loading. Please wait.

Presentation is loading. Please wait.

CPSC 321 Computer Architecture and Engineering Lecture 8 Designing a Multicycle Processor Instructor: Rabi Mahapatra & Hank Walker Adapted from the lecture.

Similar presentations


Presentation on theme: "CPSC 321 Computer Architecture and Engineering Lecture 8 Designing a Multicycle Processor Instructor: Rabi Mahapatra & Hank Walker Adapted from the lecture."— Presentation transcript:

1 CPSC 321 Computer Architecture and Engineering Lecture 8 Designing a Multicycle Processor Instructor: Rabi Mahapatra & Hank Walker Adapted from the lecture notes of John Kubiatowicz (UCB)

2 Recap: A Single Cycle Datapath 32 ALUctr Clk busW RegWr 32 busA 32 busB 555 RwRaRb 32 32-bit Registers Rs Rt Rd RegDst Extender Mux 32 16 imm16 ALUSrc ExtOp Mux MemtoReg Clk Data In WrEn 32 Adr Data Memory 32 MemWr ALU Instruction Fetch Unit Clk Equal Instruction 0 1 0 1 01 Imm16RdRtRs nPC_sel

3 Recap: The “Truth Table” for the Main Control Main Control op 6 ALU Control (Local) func 3 6 ALUop ALUctr 3 RegDst ALUSrc :

4 Recap: PLA Implementation of the Main Control op.... op.. op.. op.. op.. R-typeorilwswbeqjump RegWrite ALUSrc MemtoReg MemWrite Branch Jump RegDst ExtOp ALUop

5 Recap: Systematic Generation of Control °In our single-cycle processor, each instruction is realized by exactly one control command or “microinstruction” in general, the controller is a finite state machine microinstruction can also control sequencing (see later) Control Logic / Store (PLA, ROM) OPcode Datapath Instruction Decode Conditions Control Points microinstruction

6 The Big Picture: Where are We Now? °The Five Classic Components of a Computer °Today’s Topic: Designing the Datapath for the Multiple Clock Cycle Datapath Control Datapath Memory Processor Input Output

7 Abstract View of our single cycle processor °looks like a FSM with PC as state PC Next PC Register Fetch ALU Reg. Wrt Mem Access Data Mem Instruction Fetch Result Store ALUctr RegDst ALUSrc ExtOp MemWr Equal nPC_sel RegWr MemWr MemRd Main Control ALU control op fun Ext

8 What’s wrong with our CPI=1 processor? °Long Cycle Time °All instructions take as much time as the slowest °Real memory is not as nice as our idealized memory cannot always get the job done in one (short) cycle PCInst Memory mux ALUData Mem mux PCReg FileInst Memory mux ALU mux PCInst Memory mux ALUData Mem PCInst Memorycmp mux Reg File Arithmetic & Logical Load Store Branch Critical Path setup

9 Reducing Cycle Time °Cut combinational dependency graph and insert register / latch °Do same work in two fast cycles, rather than one slow one °May be able to short-circuit path and remove some components for some instructions! storage element Acyclic Combinational Logic storage element Acyclic Combinational Logic (A) storage element Acyclic Combinational Logic (B) 

10 Basic Limits on Cycle Time °Next address logic PC <= branch ? PC + offset : PC + 4 °Instruction Fetch InstructionReg <= Mem[PC] °Register Access A <= R[rs] °ALU operation R <= A + B PC Next PC Operand Fetch Exec Reg. File Mem Access Data Mem Instruction Fetch Result Store ALUctr RegDst ALUSrc ExtOp MemWr nPC_sel RegWr MemWr MemRd Control

11 Partitioning the CPI=1 Datapath °Add registers between smallest steps °Place enables on all registers PC Next PC Operand Fetch Exec Reg. File Mem Access Data Mem Instruction Fetch Result Store ALUctr RegDst ALUSrc ExtOp MemWr nPC_sel RegWr MemWr MemRd Equal

12 Example Multicycle Datapath °Critical Path ? PC Next PC Operand Fetch Instruction Fetch nPC_sel IR Reg File Ext ALU Reg. File Mem Acces s Data Mem Result Store RegDst RegWr MemWr MemRd S M MemToReg Equal ALUctr ALUSrc ExtOp A B E

13 Recall: Step-by-step Processor Design Step 1: ISA => Logical Register Transfers Step 2: Components of the Datapath Step 3: RTL + Components => Datapath Step 4: Datapath + Logical RTs => Physical RTs Step 5: Physical RTs => Control

14 Step 4: R-type (add, sub,...) °Logical Register Transfer °Physical Register Transfers inst Logical Register Transfers ADDUR[rd] <– R[rs] + R[rt]; PC <– PC + 4 inst Physical Register Transfers IR <– MEM[pc] ADDUA<– R[rs]; B <– R[rt] S <– A + B R[rd] <– S; PC <– PC + 4 Exec Reg. File Mem Acces s Data Mem SM Reg File PC Next PC IR Inst. Mem Time A B E

15 Step 4: Logical immed °Logical Register Transfer °Physical Register Transfers inst Logical Register Transfers ORIR[rt] <– R[rs] OR ZExt(Im16); PC <– PC + 4 inst Physical Register Transfers IR <– MEM[pc] ORIA<– R[rs]; B <– R[rt] S <– A or ZExt(Im16) R[rt] <– S; PC <– PC + 4 Exec Reg. File Mem Acces s Data Mem SM Reg File PC Next PC IR Inst. Mem Time A B E

16 Step 4 : Load °Logical Register Transfer °Physical Register Transfers inst Logical Register Transfers LWR[rt] <– MEM[R[rs] + SExt(Im16)]; PC <– PC + 4 inst Physical Register Transfers IR <– MEM[pc] LWA<– R[rs]; B <– R[rt] S <– A + SExt(Im16) M <– MEM[S] R[rd] <– M; PC <– PC + 4 Exec Reg. File Mem Acces s Data Mem SM Reg File PC Next PC IR Inst. Mem A B E Time

17 Step 4 : Store °Logical Register Transfer °Physical Register Transfers inst Logical Register Transfers SWMEM[R[rs] + SExt(Im16)] <– R[rt]; PC <– PC + 4 inst Physical Register Transfers IR <– MEM[pc] SWA<– R[rs]; B <– R[rt] S <– A + SExt(Im16); MEM[S] <– BPC <– PC + 4 Exec Reg. File Mem Acces s Data Mem SM Reg File PC Next PC IR Inst. Mem A B E Time

18 Step 4 : Branch °Logical Register Transfer °Physical Register Transfers inst Logical Register Transfers BEQif R[rs] == R[rt] then PC <= PC + 4+SExt(Im16) || 00 else PC <= PC + 4 Exec Reg. File Mem Acces s Data Mem SM Reg File PC Next PC IR Inst. Mem inst Physical Register Transfers IR <– MEM[pc] BEQE<– (R[rs] = R[rt]) if !E then PC <– PC + 4 else PC <– PC+4+SExt(Im16)||00 A B E Time

19 Alternative datapath (book): Multiple Cycle Datapath °Miminizes Hardware: 1 memory, 1 adder Ideal Memory WrAdr Din RAdr 32 Dout MemWr 32 ALU 32 ALUOp ALU Control Instruction Reg 32 IRWr 32 Reg File Ra Rw busW Rb 5 5 32 busA 32busB RegWr Rs Rt Mux 0 1 Rt Rd PCWr ALUSelA Mux 01 RegDst Mux 0 1 32 PC MemtoReg Extend ExtOp Mux 0 1 32 0 1 2 3 4 16 Imm 32 << 2 ALUSelB Mux 1 0 Target 32 Zero PCWrCondPCSrcBrWr 32 IorD ALU Out

20 Our Control Model °State specifies control points for Register Transfer °Transfer occurs upon exiting state (same falling edge) Control State Next State Logic Output Logic inputs (conditions) outputs (control points) State X Register Transfer Control Points Depends on Input

21 Step 4  Control Specification for multicycle proc IR <= MEM[PC] R-type A <= R[rs] B <= R[rt] S <= A fun B R[rd] <= S PC <= PC + 4 S <= A or ZX R[rt] <= S PC <= PC + 4 ORi S <= A + SX R[rt] <= M PC <= PC + 4 M <= MEM[S] LW S <= A + SX MEM[S] <= B PC <= PC + 4 BEQ PC <= Next(PC,Equal) SW “instruction fetch” “decode / operand fetch” Execute Memory Write-back

22 Traditional FSM Controller State 6 4 11 next State op Equal control points stateopcond next state control points Truth Table datapath State

23 Step 5  (datapath + state diagram  control) °Translate RTs into control points °Assign states °Then go build the controller

24 Mapping RTs to Control Points IR <= MEM[PC] R-type A <= R[rs] B <= R[rt] S <= A fun B R[rd] <= S PC <= PC + 4 S <= A or ZX R[rt] <= S PC <= PC + 4 ORi S <= A + SX R[rt] <= M PC <= PC + 4 M <= MEM[S] LW S <= A + SX MEM[S] <= B PC <= PC + 4 BEQ PC <= Next(PC,Equal) SW “instruction fetch” “decode” imem_rd, IRen ALUfun, Sen RegDst, RegWr, PCen Aen, Ben, Een Execute Memory Write-back

25 Assigning States IR <= MEM[PC] R-type A <= R[rs] B <= R[rt] S <= A fun B R[rd] <= S PC <= PC + 4 S <= A or ZX R[rt] <= S PC <= PC + 4 ORi S <= A + SX R[rt] <= M PC <= PC + 4 M <= MEM[S] LW S <= A + SX MEM[S] <= B PC <= PC + 4 BEQ PC <= Next(PC) SW “instruction fetch” “decode” 0000 0001 0100 0101 0110 0111 1000 1001 1010 00111011 1100 Execute Memory Write-back

26 (Mostly) Detailed Control Specification (missing  0) 0000???????00011 0001BEQx00111 1 1 0001R-typex01001 1 1 0001ORIx01101 1 1 0001LWx10001 1 1 0001SWx10111 1 1 0011xxxxxx000001 0 x 0 x 0011xxxxxx100001 1 x 0 x 0100xxxxxxx01010 1 fun 1 0101xxxxxxx00001 0 0 1 1 0110xxxxxxx01110 0 or 1 0111xxxxxxx00001 0 0 1 0 1000xxxxxxx10011 0 add 1 1001xxxxxxx10101 0 1 1010 xxxxxxx00001 0 1 1 0 1011xxxxxxx11001 0 add 1 1100xxxxxxx0000 1 00 1 0 StateOp fieldEqNext IRPCOpsExecMemWrite-Back en selA B EEx Sr ALU S R W MM-R Wr Dst R: ORi: LW: SW: -all same in Moore machine BEQ:

27 Performance Evaluation °What is the average CPI? state diagram gives CPI for each instruction type workload gives frequency of each type TypeCPI i for typeFrequency CPI i x freqI i Arith/Logic440%1.6 Load530%1.5 Store410%0.4 branch320%0.6 Average CPI:4.1

28 Controller Design °The state digrams that arise define the controller for an instruction set processor are highly structured °Use this structure to construct a simple “microsequencer” °Control reduces to programming this very simple device  microprogramming sequencer control datapath control micro-PC sequencer microinstruction

29 Example: Jump-Counter op-code Map ROM Counter zero inc load 0000 i i+1 i None of above: Do nothing (for wait states)

30 Using a Jump Counter IR <= MEM[PC] R-type A <= R[rs] B <= R[rt] S <= A fun B R[rd] <= S PC <= PC + 4 S <= A or ZX R[rt] <= S PC <= PC + 4 ORi S <= A + SX R[rt] <= M PC <= PC + 4 M <= MEM[S] LW S <= A + SX MEM[S] <= B PC <= PC + 4 BEQ PC <= Next(PC) SW “instruction fetch” “decode” 0000 0001 0100 0101 0110 0111 1000 1001 1010 00111011 1100 inc load zero inc Execute Memory Write-back

31 Our Microsequencer op-code Map ROM Micro-PC Z I L datapath control taken

32 Microprogram Control Specification 0000?inc1 00010load1 1 00110zero1 0 00111zero1 1 0100xinc0 1 fun 1 0101xzero1 00 1 1 0110xinc0 0 or 1 0111xzero1 00 1 0 1000xinc1 0 add 1 1001xinc1 0 1 1010 xzero1 01 1 0 1011xinc1 0 add 1 1100xzero 1 00 1 0 µPC TakenNext IRPCOpsExecMemWrite-Back en selA B Ex Sr ALU S R W MM-R Wr Dst R: ORi: LW: SW: BEQ

33 Adding the Dispatch ROM °Sequencer-based control unit from last lecture Called “microPC” or “µPC” vs. state register Control ValueEffect 00 Next µaddress = 0 01 Next µaddress = dispatch ROM 10 Next µaddress = µaddress + 1 ROM: Opcode microPC 1 µAddress Select Logic Adder ROM Mux 0 012 R-type0000000100 BEQ0001000011 ori0011010110 LW1000111000 SW1010111011

34 Example: Controlling Memory PC Instruction Memory Inst. Reg addr data IR_en InstMem_rd IM_wait

35 Controller handles non-ideal memory IR <= MEM[PC] R-type A <= R[rs] B <= R[rt] S <= A fun B R[rd] <= S PC <= PC + 4 S <= A or ZX R[rt] <= S PC <= PC + 4 ORi S <= A + SX R[rt] <= M PC <= PC + 4 M <= MEM[S] LW S <= A + SX MEM[S] <= B BEQ PC <= Next(PC) SW “instruction fetch” “decode / operand fetch” Execute Memory Write-back ~wait wait ~waitwait PC <= PC + 4 ~wait wait

36 Overview of Control °Control may be designed using one of several initial representations. The choice of sequence control, and how logic is represented, can then be determined independently; the control can then be implemented with one of several methods using a structured logic technique. Initial Representation Finite State Diagram Microprogram Sequencing ControlExplicit Next State Microprogram counter Function + Dispatch ROMs Logic RepresentationLogic EquationsTruth Tables Implementation PLAROM Technique “hardwired control”“microprogrammed control”

37 Microprogramming (Maurice Wilkes) °Control is the hard part of processor design ° Datapath is fairly regular and well-organized ° Memory is highly regular ° Control is irregular and global Microprogramming: -- A Particular Strategy for Implementing the Control Unit of a processor by "programming" at the level of register transfer operations Microarchitecture: -- Logical structure and functional capabilities of the hardware as seen by the microprogrammer Historical Note: IBM 360 Series first to distinguish between architecture & organization Same instruction set across wide range of implementations, each with different cost/performance

38 “Macroinstruction” Interpretation Main Memory execution unit control memory CPU ADD SUB AND DATA...... User program plus Data this can change! AND microsequence e.g., Fetch Calc Operand Addr Fetch Operand(s) Calculate Save Answer(s) one of these is mapped into one of these

39 New Finite State Machine (FSM) Spec IR <= MEM[PC] PC <= PC + 4 R-type ALUout <= A fun B R[rd] <= ALUout ALUout <= A or ZX R[rt] <= ALUout ORi ALUout <= A + SX R[rt] <= M M <= MEM[ALUout] LW ALUout <= A + SX MEM[ALUout] <= B SW “instruction fetch” “decode” Execute Memory Write-back 0000 0001 0100 0101 0110 0111 1000 1001 1010 1011 1100 BEQ 0010 0011 If A = B then PC <= ALUout ALUout <= PC +SX Q: How improve to do something in state 0001?

40 Finite State Machine (FSM) Spec IR <= MEM[PC] PC <= PC + 4 R-type ALUout <= A fun B R[rd] <= ALUout ALUout <= A or ZX R[rt] <= ALUout ORi ALUout <= A + SX R[rt] <= M M <= MEM[ALUout] LW ALUout <= A + SX MEM[ALUout] <= B SW “instruction fetch” “decode” 0000 0001 0100 0101 0110 0111 1000 1001 1010 1011 1100 BEQ 0010 If A = B then PC <= ALUout ALUout <= PC +SX Execute Memory Write-back

41 sequencer control micro-PC  -sequencer: fetch,dispatch, sequential Dispatch ROM Opcode Inputs Microprogramming °Microprogramming is a fundamental concept implement an instruction set by building a very simple processor and interpreting the instructions essential for very complex instructions and when few register transfers are possible overkill when ISA matches datapath 1-1  -Code ROM To DataPath Decode datapath control microinstruction (  )

42 Designing a Microinstruction Set 1) Start with list of control signals 2) Group signals together that make sense (vs. random): called “fields” 3) Place fields in some logical order (e.g., ALU operation & ALU operands first and microinstruction sequencing last) 4) To minimize the width, encode operations that will never be used at the same time 5) Create a symbolic legend for the microinstruction format, showing name of field values and how they set the control signals Use computers to design computers

43 Again: Alternative multicycle datapath (book) °Miminizes Hardware: 1 memory, 1 adder Ideal Memory WrAdr Din RAdr 32 Dout MemWr 32 ALU 32 ALUOp ALU Control 32 IRWr Instruction Reg 32 Reg File Ra Rw busW Rb 5 5 32 busA 32 busB RegWr Rs Rt Mux 0 1 Rt Rd PCWr ALUSelA Mux 01 RegDst Mux 0 1 32 PC MemtoReg Extend ExtOp Mux 0 1 32 0 1 2 3 4 16 Imm 32 << 2 ALUSelB Mux 1 0 32 Zero PCWrCondPCSrc 32 IorD Mem Data Reg ALU Out B A

44 1&2) Start with list of control signals, grouped into fields Signal nameEffect when deassertedEffect when asserted ALUSelA1st ALU operand = PC1st ALU operand = Reg[rs] RegWriteNoneReg. is written MemtoRegReg. write data input = ALUReg. write data input = memory RegDstReg. dest. no. = rtReg. dest. no. = rd MemReadNoneMemory at address is read, MDR <= Mem[addr] MemWriteNoneMemory at address is written IorDMemory address = PCMemory address = S IRWriteNoneIR <= Memory PCWriteNonePC <= PCSource PCWriteCond NoneIF ALUzero then PC <= PCSource PCSource PCSource = ALU PCSource = ALUout ExtOpZero ExtendedSign Extended Single Bit Control Signal nameValueEffect ALUOp00ALU adds 01ALU subtracts 10ALU does function code 11ALU does logical OR ALUSelB002nd ALU input = 4 012nd ALU input = Reg[rt] 102nd ALU input = extended,shift left 2 112nd ALU input = extended Multiple Bit Control

45 3&4) Microinstruction Format: unencoded vs. encoded fields Field NameWidthControl Signals Set wide narrow ALU Control42ALUOp SRC121ALUSelA SRC253ALUSelB, ExtOp ALU Destination32RegWrite, MemtoReg, RegDst Memory32MemRead, MemWrite, IorD Memory Register11IRWrite PCWrite Control32PCWrite, PCWriteCond, PCSource Sequencing32AddrCtl Total width2415bits

46 5) Legend of Fields and Symbolic Names Field NameValues for FieldFunction of Field with Specific Value ALUAddALU adds Subt. ALU subtracts Func codeALU does function code OrALU does logical OR SRC1PC1st ALU input = PC rs1st ALU input = Reg[rs] SRC242nd ALU input = 4 Extend2nd ALU input = sign ext. IR[15-0] Extend02nd ALU input = zero ext. IR[15-0] Extshft2nd ALU input = sign ex., sl IR[15-0] rt2nd ALU input = Reg[rt] destinationrd ALUReg[rd] = ALUout rt ALUReg[rt] = ALUout rt Mem Reg[rt] = Mem MemoryRead PCRead memory using PC Read ALURead memory using ALUout for addr Write ALUWrite memory using ALUout for addr Memory registerIRIR = Mem PC writeALUPC = ALU ALUoutCondIF ALU Zero then PC = ALUout SequencingSeqGo to sequential µinstruction FetchGo to the first microinstruction DispatchDispatch using ROM.

47 Quick check: what do these fieldnames mean? CodeNameRegWriteMemToRegRegDest 00---0XX 01rd ALU101 10rt ALU100 11rt MEM110 CodeNameALUSelBExtOp 000---XX 001400X 010rt01X 011ExtShft101 100Extend111 111Extend0110 Destination: SRC2:

48 Specific Sequencer Sequencer-based control unit Called “microPC” or “µPC” vs. state register Code NameEffect 00 fetchNext µaddress = 0 01dispatchNext µaddress = dispatch ROM 10 seqNext µaddress = µaddress + 1 ROM: Opcode microPC 1 µAddress Select Logic Adder ROM Mux 0 012 R-type0000000100 BEQ0001000011 ori0011010110 LW1000111000 SW1010111011

49 Microprogram it yourself! LabelALU SRC1SRC2Dest.MemoryMem. Reg. PC Write Sequencing Fetch:AddPC4Read PCIRALUSeq

50 Microprogram it yourself! LabelALU SRC1SRC2Dest.MemoryMem. Reg. PC Write Sequencing Fetch:AddPC4Read PCIRALUSeq AddPCExtshftDispatch Rtype:FuncrsrtSeq rd ALUFetch Lw:AddrsExtend Seq Read ALU Seq rt MEM Fetch Sw:AddrsExtendSeq Write ALUFetch Ori:Orrs Extend0 Seq rt ALUFetch Beq:Subt.rsrt ALUoutCond.Fetch

51 Microprogramming Pros and Cons °Ease of design °Flexibility Easy to adapt to changes in organization, timing, technology Can make changes late in design cycle, or even in the field °Can implement very powerful instruction sets (just more control memory) °Generality Can implement multiple instruction sets on same machine. Can tailor instruction set to application. °Compatibility Many organizations, same instruction set °Costly to implement °Slow

52 Summary °Microprogramming is a fundamental concept implement an instruction set by building a very simple processor and interpreting the instructions essential for very complex instructions and when few register transfers are possible Control design reduces to Microprogramming °Design of a Microprogramming language 1.Start with list of control signals 2.Group signals together that make sense (vs. random): called “fields” 3.Place fields in some logical order (e.g., ALU operation & ALU operands first and microinstruction sequencing last) 4.To minimize the width, encode operations that will never be used at the same time 5.Create a symbolic legend for the microinstruction format, showing name of field values and how they set the control signals


Download ppt "CPSC 321 Computer Architecture and Engineering Lecture 8 Designing a Multicycle Processor Instructor: Rabi Mahapatra & Hank Walker Adapted from the lecture."

Similar presentations


Ads by Google