Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides.


1 Designing a Multicycle Processor Adapted from Dave Patterson (http.cs.berkeley.edu/~patterson) lecture slides

2 Dave Patterson serving as ACM president

3 Recap: Processor Design is a Process
°Bottom-up: assemble components in target technology to establish critical timing
°Top-down: specify component behavior from high-level requirements
°Iterative refinement: establish partial solution, expand and improve
[Figure: Instruction Set Architecture => processor => datapath and control => Reg. File, Mux, ALU, Reg, Mem, Decoder, Sequencer => Cells, Gates]

4 Recap: A Single Cycle Datapath °We have everything except control signals (underlined) How to generate the control signals?

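5 The Truth Table for the Main Control
[Table: one column per instruction class selected by op (R-type, ori, lw, sw, beq, jump); one row per control signal (RegDst, ALUSrc, MemtoReg, RegWrite, MemWrite, Branch, Jump, ExtOp, ALUop (Symbolic)). Entries are 0, 1 or x (don't care); the symbolic ALUop values are R-type, Or, Add, Subtract and xxx.]
[Figure: Main Control takes the 6-bit op field and produces RegDst, ALUSrc, ... and the 3-bit ALUop; ALU Control (Local) takes the 6-bit func field and ALUop and produces the 3-bit ALUctr.]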
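One way to picture the main control is as a lookup table indexed by instruction class. The Python sketch below is only an illustration: the 0/1/don't-care entries are assumptions derived from what each instruction needs from the single-cycle datapath, not values copied from the slide, and the names `MAIN_CONTROL` and `decode` are hypothetical.

```python
# Hypothetical main-control table. None stands for "don't care" (x).
# The entries are assumptions derived from the datapath, not slide data.
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemWrite", "Branch", "Jump", "ExtOp", "ALUop")

MAIN_CONTROL = {
    #          RegDst ALUSrc MemtoReg RegWrite MemWrite Branch Jump ExtOp ALUop
    "R-type": (1,     0,     0,       1,       0,       0,     0,   None, "R-type"),
    "ori":    (0,     1,     0,       1,       0,       0,     0,   0,    "Or"),
    "lw":     (0,     1,     1,       1,       0,       0,     0,   1,    "Add"),
    "sw":     (None,  1,     None,    0,       1,       0,     0,   1,    "Add"),
    "beq":    (None,  0,     None,    0,       0,       1,     0,   None, "Subtract"),
    "jump":   (None,  None,  None,    0,       0,       0,     1,   None, None),
}


def decode(op):
    """Return the control word for one instruction class as a dict."""
    return dict(zip(SIGNALS, MAIN_CONTROL[op]))
```

Signals that change machine state (RegWrite, MemWrite, Branch, Jump) are never left as don't care in this sketch, since a stray 1 there would corrupt registers, memory or the PC.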

6 PLA Implementation of the Main Control
[Figure: PLA with outputs RegWrite, ALUSrc, MemtoReg, MemWrite, Branch, Jump, RegDst, ExtOp, ALUop]

7 Systematic Generation of Control
°In our single-cycle processor, each instruction is realized by exactly one control command or microinstruction
  in general, the controller is a finite state machine
  microinstruction can also control sequencing
[Figure: Control Logic / Store (PLA, ROM) takes the OPcode from the Instruction and the Conditions from the Datapath; the microinstruction it produces is decoded into the Control Points of the Datapath]

8 Behavioral models of Datapath Components
entity adder16 is
  generic (ccOut_delay : TIME := 12 ns; adderOut_delay: TIME := 12 ns);
  port(A, B: in vlbit_1d(15 downto 0);
       DOUT: out vlbit_1d(15 downto 0);
       CIN: in vlbit;
       COUT: out vlbit);
end adder16;
architecture behavior of adder16 is
begin
  adder16_process: process(A, B, CIN)
    variable tmp : vlbit_1d(18 downto 0);
    variable adder_out : vlbit_1d(15 downto 0);
    variable carry: vlbit;
  begin
    tmp := addum (addum (A, B), CIN);
    adder_out := tmp(15 downto 0);
    carry := tmp(16);
    COUT <= carry after ccOut_delay;
    DOUT <= adder_out after adderOut_delay;
  end process;
end behavior;
[Figure: 16-bit adder block with inputs A, B, Cin and outputs DOUT, Cout]
addum adds two n-bit vectors to produce an n+1 bit vector

9 Note on VHDL variables
°Can only be used in processes
°Are declared before the begin keyword of a process statement
°Assignment operator is :=
°Can be assigned an initial value by adding the := and some constant expression after the type part of the declaration
°Do not have or cause events and are modified differently than signals
°In the following code, if x is a bit signal, then the process will count the number of rising and falling edges that occur on the signal x
count: process (x)
  variable changes : integer := -1;
begin
  changes := changes + 1;
end process;

10 Behavioral Specification of Control Logic
°Decode / Control-store address modeled by Case statement
°Each arm drives control signals for that operation
  just like the microinstruction
  either can be symbolic
entity maincontrol is
  port(opcode: in vlbit_1d(5 downto 0);
       equal_cond: in vlbit;
       extop: out vlbit;
       ALUsrc: out vlbit;
       ALUop: out vlbit_1d(2 downto 0);
       MEMwr: out vlbit;
       MemtoReg: out vlbit;
       RegWr: out vlbit;
       RegDst: out vlbit;
       nPC: out vlbit);
end maincontrol;

11 Abstract View of our single cycle processor
°looks like a FSM with PC as state
[Figure: PC and Next PC, Instruction Fetch, Register Fetch, Ext, ALU, Mem Access with Data Mem, Reg. Wrt and Result Store. Main Control (input op) and ALU control (input fun) drive ALUctr, RegDst, ALUSrc, ExtOp, MemWr, MemRd, RegWr and nPC_sel; Equal is fed back from the datapath.]

12 What's wrong with our CPI=1 processor?
°Long Cycle Time
°All instructions take as much time as the slowest
°Real memory is not so nice as our idealized memory
  cannot always get the job done in one (short) cycle
[Figure: per-class timing paths.
  Arithmetic & Logical: PC, Inst Memory, Reg File, mux, ALU, mux, setup
  Load (Critical Path): PC, Inst Memory, Reg File, mux, ALU, Data Mem, mux, setup
  Store: PC, Inst Memory, Reg File, mux, ALU, Data Mem
  Branch: PC, Inst Memory, Reg File, cmp, mux]
It seems the timing of Branch has been deliberately shortened with the addition of a highly efficient comparator.

13 Memory Access Time
°Physics => fast memories are small (large memories are slow)
  question: register file vs. memory
°=> Use a hierarchy of memories
[Figure: storage array with address decoder, selected word line, storage cell, bit line and sense amps. Hierarchy: Processor, Cache (1 cycle), proc. bus, L2 Cache (2-3 cycles), mem. bus, memory]

14 Reducing Cycle Time
°Cut combinational dependency graph and insert register / latch
°Do same work in two fast cycles, rather than one slow one
[Figure: storage element, Acyclic Combinational Logic, storage element => storage element, Acyclic Combinational Logic (A), storage element, Acyclic Combinational Logic (B), storage element]
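The effect of cutting the logic can be illustrated with a small calculation. This is a sketch under stated assumptions: the delays and the register overhead below are made-up numbers, and `cycle_time` is a hypothetical helper, not something from the slides.

```python
def cycle_time(stage_delays_ns, register_overhead_ns):
    """Clock period is set by the slowest stage plus the register overhead
    (setup time and clock-to-output delay, lumped together here)."""
    return max(stage_delays_ns) + register_overhead_ns


# Hypothetical numbers: one 10 ns block, or the same logic cut into A and B.
uncut = cycle_time([10.0], 0.5)           # one slow cycle
uneven_cut = cycle_time([6.0, 4.0], 0.5)  # two faster cycles, cut off-centre
even_cut = cycle_time([5.0, 5.0], 0.5)    # two faster cycles, balanced cut
```

With these numbers the balanced cut gives the shortest period, which is why the datapath is later partitioned into roughly equal-size chunks. The total latency of the two cycles is slightly longer than the single slow cycle because the register overhead is paid twice.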

15 Basic Limits on Cycle Time
°Next address logic
  PC = (branch == 1) ? PC + offset : PC + 4
°Instruction Fetch
  InstructionReg <= Mem[PC]
°Register Access
  A <= R[rs]
°ALU operation
  S <= A + B
[Figure: PC, Next PC, Instruction Fetch, Operand Fetch, Reg. File, Exec, Mem Access, Data Mem, Result Store; Control drives ALUctr, RegDst, ALUSrc, ExtOp, MemWr, MemRd, RegWr, nPC_sel]

16 Partitioning the CPI=1 Datapath
°Add registers between smallest steps
[Figure: PC, Next PC, Instruction Fetch, Operand Fetch, Reg. File, Exec, Mem Access, Data Mem, Result Store; control signals ALUctr, RegDst, ALUSrc, ExtOp, MemWr, MemRd, RegWr, nPC_sel]

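17 Example Multicycle Datapath
[Figure: the partitioned datapath with registers IR, A, B, S and M inserted between Instruction Fetch, Operand Fetch (Reg. File), Ext, ALU, Mem Access (Data Mem) and Result Store (Reg File); control signals ALUctr, RegDst, ALUSrc, ExtOp, nPC_sel, RegWr, MemWr, MemRd, MemToReg; status signal Equal]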

18 Recall: Step-by-step Processor Design
Step 1: ISA => Logical Register Transfers
Step 2: Components of the Datapath
Step 3: RTL + Components => Datapath
Step 4: Datapath + Logical RTs => Physical RTs
Step 5: Physical RTs => Control

19 Step 4: R-type (add, sub, ...)
°Logical Register Transfer
  ADDU: R[rd] <- R[rs] + R[rt]; PC <- PC + 4
°Physical Register Transfers
  IR <- MEM[PC]
  ADDU: A <- R[rs]; B <- R[rt]
        S <- A + B
        R[rd] <- S; PC <- PC + 4
[Figure: multicycle datapath: PC, Next PC, Inst. Mem, IR, Reg. File, A, B, Exec, S, Mem Access, Data Mem, M, Reg File, Equal]

20 Step 4: Logical immed
°Logical Register Transfer
  ORi: R[rt] <- R[rs] OR zx(Im16); PC <- PC + 4
°Physical Register Transfers
  IR <- MEM[PC]
  ORi: A <- R[rs]; B <- R[rt]
       S <- A OR ZeroExt(Im16)
       R[rt] <- S; PC <- PC + 4
[Figure: multicycle datapath: PC, Next PC, Inst. Mem, IR, Reg. File, A, B, Exec, S, Mem Access, Data Mem, M, Reg File, Equal]

21 Step 4 : Load
°Logical Register Transfer
  LW: R[rt] <- MEM(R[rs] + sx(Im16)); PC <- PC + 4
°Physical Register Transfers
  IR <- MEM[PC]
  LW: A <- R[rs]; B <- R[rt]
      S <- A + SignEx(Im16)
      M <- MEM[S]
      R[rt] <- M; PC <- PC + 4
[Figure: multicycle datapath: PC, Next PC, Inst. Mem, IR, Reg. File, A, B, Exec, S, Mem Access, Data Mem, M, Reg File, Equal]

22 Step 4 : Store
°Logical Register Transfer
  SW: MEM(R[rs] + sx(Im16)) <- R[rt]; PC <- PC + 4
°Physical Register Transfers
  IR <- MEM[PC]
  SW: A <- R[rs]; B <- R[rt]
      S <- A + SignEx(Im16)
      MEM[S] <- B; PC <- PC + 4
[Figure: multicycle datapath: PC, Next PC, Inst. Mem, IR, Reg. File, A, B, Exec, S, Mem Access, Data Mem, M, Reg File, Equal]

23 Step 4 : Branch
°Logical Register Transfer
  BEQ: if R[rs] == R[rt] then PC <= PC + sx(Im16) << 2 else PC <= PC + 4
°Physical Register Transfers, operands not equal
  IR <- MEM[PC]
  A <- R[rs]; B <- R[rt]
  BEQ & ~Eq: PC <- PC + 4
°Physical Register Transfers, operands equal
  IR <- MEM[PC]
  A <- R[rs]; B <- R[rt]
  BEQ & Eq: PC <- PC + sx(Im16) << 2
[Figure: multicycle datapath: PC, Next PC, Inst. Mem, IR, Reg. File, A, B, Exec, S, Mem Access, Data Mem, M, Reg File, Equal]
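The physical register transfers of Step 4 can be traced in software. The Python sketch below is an illustration, not the lecture's model: `run_instruction` and `sign_extend16` are hypothetical names, memory is a plain dict keyed by address, the fetch IR <- MEM[PC] is implied by passing the decoded fields as arguments, and the taken-branch target is assumed to be PC + 4 + (sx(Im16) << 2), the form used on the control specification slide.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a signed value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def run_instruction(op, rs, rt, rd, imm, regs, mem, pc):
    """Apply one instruction's register transfers; return (new_pc, cycles)."""
    cycles = 1                                # IR <- MEM[PC] (implied)
    a, b = regs[rs], regs[rt]                 # A <- R[rs]; B <- R[rt]
    cycles += 1
    if op == "BEQ":                           # PC update is the only transfer
        cycles += 1
        if a == b:
            return pc + 4 + (sign_extend16(imm) << 2), cycles
        return pc + 4, cycles
    if op == "ADDU":
        s = (a + b) & MASK32                  # S <- A + B
        regs[rd] = s                          # R[rd] <- S; PC <- PC + 4
        return pc + 4, cycles + 2
    if op == "ORI":
        s = a | (imm & 0xFFFF)                # S <- A OR ZeroExt(Im16)
        regs[rt] = s                          # R[rt] <- S; PC <- PC + 4
        return pc + 4, cycles + 2
    if op == "LW":
        s = (a + sign_extend16(imm)) & MASK32 # S <- A + SignEx(Im16)
        m = mem.get(s, 0)                     # M <- MEM[S]
        regs[rt] = m                          # R[rt] <- M; PC <- PC + 4
        return pc + 4, cycles + 3
    if op == "SW":
        s = (a + sign_extend16(imm)) & MASK32 # S <- A + SignEx(Im16)
        mem[s] = b                            # MEM[S] <- B; PC <- PC + 4
        return pc + 4, cycles + 2
    raise ValueError(f"unsupported op: {op}")
```

Counting one cycle per transfer line gives 4 cycles for ADDU, ORi and SW, 5 for LW and 3 for BEQ.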

24 Alternative datapath: Multiple Cycle Datapath
°Minimizes Hardware: 1 memory (Princeton organization), 1 adder (ALU)
[Figure: multiple cycle datapath. Components: PC, Ideal Memory (RAdr, WrAdr, Din, Dout), Instruction Reg, Reg File (Ra, Rb, Rw, busA, busB, busW), Extend, << 2, ALU, ALU Control, ALU Out, Target, and muxes. Control signals: PCWr, PCWrCond, PCSel, BrWr, IorD, MemWr, IRWr, RegDst, RegWr, ALUSelA, ALUSelB, ALUOp, ExtOp, MemtoReg. Status signal: Zero.]
Note: On p. 323 of COD ed. 3, RAdr & WrAdr are combined into Address.

25 Seminal works of Mealy and Moore
°G.H. Mealy, "A method for synthesizing sequential circuits," Bell Technical Journal, vol. 34, pp. 1045~1079,
°E.F. Moore, "Gedanken-experiments on sequential machines," Automata Studies (C.E. Shannon & J. McCarthy, eds), pp.29~153, Princeton University Press,
Note: Gedanken = thought, mind
Aphorisms:
°"Anything on earth you want to do is play. Anything on earth you have to do is work. Play will never kill you, work will. I never worked a day in my life." - Dr. Leila Denmark, 105 (in 2004), USA's oldest practicing physician.
°"The purpose of computing is insight, not numbers" and "Anything a faculty member can learn, a student can easily." - Richard Hamming, inventor of Hamming code.

26 Control Model
°State specifies control points for Register Transfer
°Transfer occurs upon exiting state (same falling edge)
[Figure: a Control State register feeds Next State Logic and Output Logic; inputs (conditions) enter the logic and outputs (control points) leave it. State X asserts the Control Points for its Register Transfer; the next state Depends on Input.]

27 Step 4 => Control Specification for multicycle proc
instruction fetch:
  IR <= MEM[PC]
decode / operand fetch:
  A <= R[rs]; B <= R[rt]
Execute:
  R-type: S <= A fun B
  ORi: S <= A or ZX
  LW: S <= A + SX
  SW: S <= A + SX
  BEQ & ~Equal: PC <= PC + 4
  BEQ & Equal: PC <= PC + 4 + SX << 2
Memory read/write:
  LW: M <= MEM[S]
  SW: MEM[S] <= B; PC <= PC + 4
Write-back:
  R-type: R[rd] <= S; PC <= PC + 4
  ORi: R[rt] <= S; PC <= PC + 4
  LW: R[rt] <= M; PC <= PC + 4

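28 Traditional FSM Controller
[Figure: a state register holds the (current) state; a Truth Table maps (state, op, cond) to (next state, control signals). op and the Equal condition come from the datapath state, and the control signals drive the datapath.]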
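The truth table of such a controller is a next-state function over (state, op, cond). A minimal Python sketch follows; the state names are hypothetical labels for the states of the multicycle state diagram, not encodings taken from the slides.

```python
# Dispatch out of the decode state, keyed by instruction class.
DISPATCH = {"R-type": "r_exec", "ORi": "ori_exec",
            "LW": "lw_exec", "SW": "sw_exec"}

# Every other state has a single successor, independent of op and Equal.
SUCCESSOR = {
    "fetch": "decode",
    "r_exec": "r_wb", "ori_exec": "ori_wb",
    "lw_exec": "lw_mem", "lw_mem": "lw_wb",
    "sw_exec": "sw_mem",
    "r_wb": "fetch", "ori_wb": "fetch", "lw_wb": "fetch", "sw_mem": "fetch",
    "beq_taken": "fetch", "beq_not_taken": "fetch",
}


def next_state(state, op, equal):
    """One row lookup of the (state, op, cond) -> next state truth table."""
    if state == "decode":
        if op == "BEQ":
            return "beq_taken" if equal else "beq_not_taken"
        return DISPATCH[op]
    return SUCCESSOR[state]


def state_sequence(op, equal=False):
    """List the states one instruction visits, starting at fetch."""
    seq = ["fetch"]
    while True:
        nxt = next_state(seq[-1], op, equal)
        if nxt == "fetch":
            return seq
        seq.append(nxt)
```

Only the decode state looks at op and Equal in this sketch; everywhere else the successor is fixed, which is the regularity a microsequencer exploits.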

29 Step 5: datapath + state diagram => control °Translate RTs (register transfers) into control points °Assign states °Then go build the controller

30 Mapping RTs to Control Signals
[Figure: the state diagram of slide 27 with the control signals for the R-type path written next to each state.]
instruction fetch: IR <= MEM[PC]                  imem_rd, IRen
decode / operand fetch: A <= R[rs]; B <= R[rt]    Aen, Ben
Execute (R-type): S <= A fun B                    ALUfun, Sen
Write-back (R-type): R[rd] <= S; PC <= PC + 4     RegDst, RegWr, PCen

31 Assigning States (State Assignment)
[Figure: the state diagram of slide 27 (instruction fetch, decode / operand fetch, Execute, Memory read/write, Write-back), with a state code assigned to each state]

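32 Detailed Control Specification
[Table: one row per state, starting with fetch at state 0000, followed by decode dispatch rows for BEQ, R-type, orI, LW and SW, and then the states of R:, ORi:, LW: and SW:. Input columns: State, Op field, Eq. Output columns: Next; IR en; PC en, sel; Ops A en, B en; Exec Ex, Sr, ALU, S en; Mem R, W, M en; Write-Back M-R, Wr, Dst. The ALU column holds fun, or, add, add for R-type, ORi, LW, SW; unused inputs are x.]
-all same in Moore machine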

33 Performance Evaluation
°What is the average CPI?
  state diagram gives CPI for each instruction type
  workload gives frequency of each type
Type          CPI_i for type   Frequency   CPI_i x freq_i
Arith/Logic   4                40%         1.6
Load          5                30%         1.5
Store         4                10%         0.4
branch        3                20%         0.6
Average CPI: 4.1
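The average is the frequency-weighted sum of the per-type CPIs. A short Python check of the slide's arithmetic, using the table's numbers (the names `WORKLOAD` and `average_cpi` are just for illustration):

```python
# (CPI for the type, fraction of the instruction mix), taken from the table.
WORKLOAD = {
    "Arith/Logic": (4, 0.40),
    "Load":        (5, 0.30),
    "Store":       (4, 0.10),
    "Branch":      (3, 0.20),
}


def average_cpi(workload):
    """Sum of CPI_i * freq_i over all instruction types."""
    return sum(cpi * freq for cpi, freq in workload.values())


if __name__ == "__main__":
    print(round(average_cpi(WORKLOAD), 2))
```

A different instruction mix changes the result even though the state diagram, and therefore each CPI_i, stays the same.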

34 Controller Design
°The state diagrams that define the controller for an instruction set processor are highly structured
°Use this structure to construct a simple microsequencer
°Control reduces to programming this very simple device
  microprogramming
[Figure: a sequencer with a micro-PC selects the microinstruction, which has a sequencer control field and a datapath control field]

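35 Example: Jump-Counter
[Figure: a Counter with control inputs zero, inc and load. From count i, zero gives 0000, inc gives i+1, and load takes the output of the Mapping ROM, which is addressed by the op-code.]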

36 Using a Jump Counter
[Figure: the state diagram of slide 27, with columns instruction fetch, decode, Execute, Memory and Write-back, and each transition labeled with a counter operation: inc, load or zero. The BEQ states are PC <= PC + 4 for BEQ & ~Equal and PC <= PC + SX || 00 for BEQ & Equal.]

37 Our Microsequencer op-code Mapping ROM Micro-PC Z I L datapath control taken Traditional FSM Controller 5

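38 Microprogram Control Specification
[Table: one row per microinstruction, addressed by µPC starting at 0000, covering fetch, decode, BEQ and the states of R:, ORi:, LW: and SW:. Sequencing columns: µPC, Taken, Next (inc, load or zero). Datapath columns: IR en; PC en, sel; Ops A, B; Exec Ex, Sr, ALU, S; Mem R, W, M; Write-Back M-R, Wr, Dst. The en bits cover the registers IR, A, B, S and M. The ALU column holds fun, or, add, add for R-type, ORi, LW, SW.]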

39 Mapping ROM: mapping opcode to µPC
[Table: one row per instruction class (R-type, BEQ, ORi, LW, SW); input column opcode, output column µPC. The Mapping ROM output is what a load places in the µPC.]
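The jump counter and mapping ROM together can be sketched as a tiny interpreter. In the Python below, the microprogram addresses and the ROM contents are hypothetical (the slide's actual values are not recoverable here); only the three sequencing operations zero, inc and load come from the slides.

```python
# (label, sequencing op) per micro-address; the layout is an assumption.
MICROPROGRAM = [
    ("fetch",    "inc"),   # 0
    ("decode",   "load"),  # 1: dispatch through the mapping ROM
    ("beq",      "zero"),  # 2
    ("r_exec",   "inc"),   # 3
    ("r_wb",     "zero"),  # 4
    ("ori_exec", "inc"),   # 5
    ("ori_wb",   "zero"),  # 6
    ("lw_exec",  "inc"),   # 7
    ("lw_mem",   "inc"),   # 8
    ("lw_wb",    "zero"),  # 9
    ("sw_exec",  "inc"),   # 10
    ("sw_mem",   "zero"),  # 11
]

# Hypothetical mapping ROM: instruction class -> first micro-address.
MAPPING_ROM = {"BEQ": 2, "R-type": 3, "ORi": 5, "LW": 7, "SW": 10}


def step(upc, op):
    """Compute the next micro-PC from the current one."""
    _, seq = MICROPROGRAM[upc]
    if seq == "inc":
        return upc + 1
    if seq == "load":
        return MAPPING_ROM[op]
    return 0  # "zero": back to fetch


def trace(op):
    """Labels of the microinstructions executed for one instruction."""
    upc, labels = 0, []
    while True:
        labels.append(MICROPROGRAM[upc][0])
        upc = step(upc, op)
        if upc == 0:
            return labels
```

For example, `trace("LW")` walks fetch, decode, then three consecutive LW microinstructions before zero returns the counter to fetch.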

40 Example: Controlling Memory
[Figure: PC supplies addr to the Instruction Memory, which returns Data (instruction) to the Inst. Reg. Control signals: InstMem_rd, IR_en. Status signal: IM_wait.]

41 Controller handles non-ideal memory
[Figure: the state diagram of slide 27, with columns instruction fetch, decode / operand fetch, Execute, Memory and Write-back. Each state that accesses memory (IR <= MEM[PC], M <= MEM[S], MEM[S] <= B) loops to itself on wait and advances on ~wait. For SW, PC <= PC + 4 follows the write. The BEQ states are PC <= PC + 4 for BEQ & ~Equal and PC <= PC + SX || 00 for BEQ & Equal.]
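The self-loop on wait is what lets the same controller work with slow memory. A minimal Python sketch of one memory state, assuming the memory's wait line is sampled once per cycle (`cycles_in_memory_state` is a hypothetical name):

```python
def cycles_in_memory_state(wait_samples):
    """Count cycles spent in a memory-access state.

    wait_samples yields the value of the memory's wait line on successive
    cycles; the state loops while it is True and exits on the first False.
    """
    cycles = 0
    for wait in wait_samples:
        cycles += 1
        if not wait:
            return cycles
    raise RuntimeError("memory never deasserted wait")


# Ideal memory answers immediately; a slower one stalls the controller.
ideal = cycles_in_memory_state([False])
slow = cycles_in_memory_state([True, True, False])
```

With ideal memory the state costs one cycle, as in the earlier CPI figures; every extra cycle of wait adds directly to the CPI of the instruction being fetched or executed.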

42 Really Simple Time-State Control
[Figure: the register transfers of slide 27 arranged by time state: instruction fetch, decode, Execute, Memory, write-back. The memory-access states hold on wait and advance on ~wait. The BEQ states are PC <= PC + 4 for BEQ & ~Equal and PC <= PC + SX || 00 for BEQ & Equal.]

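43 Time-state Control Path
°Local decode and control at each stage
[Figure: the multicycle datapath (PC, Next PC, Inst. Mem, IR, Reg. File, A, B, Exec, S, Mem Access, Data Mem, M, Reg File, Equal) with the instruction carried along in IR, IRex, IRmem and IRwb, each with a Valid bit, and local Dcd Ctrl, Ex Ctrl, Mem Ctrl and WB Ctrl at the stages: a pipeline]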

44 Overview of Control
°Control may be designed using one of several initial representations. The choice of sequence control, and how logic is represented, can then be determined independently; the control can then be implemented with one of several methods using a structured logic technique.
Initial Representation:    Finite State Diagram           | Microprogram
Sequencing Control:        Explicit Next State Function   | Microprogram counter + Dispatch ROMs (Mapping ROMs)
Logic Representation:      Logic Equations                | Truth Tables
Implementation Technique:  PLA                            | ROM
                           hardwired control              | microprogrammed control

45 Summary
°Disadvantages of the Single Cycle Processor
  Long cycle time
  Cycle time is too long for all instructions except the Load
°Multiple Cycle Processor:
  Divide the instructions into smaller steps
  Execute each step (instead of the entire instruction) in one cycle
°Partition datapath into equal-size chunks to minimize cycle time
  ~10 levels of logic between latches
°Follow same 5-step method for designing real processors

46 Summary (contd)
°Control is specified by finite state diagram
°Specialized state diagrams are easily captured by a microsequencer
  simple increment & branch fields
  datapath control fields
°Control design reduces to Microprogramming
°Control is more complicated with:
  complex instruction sets
  restricted datapaths (see the book)
°Simple instruction set and powerful datapath => simple control
  could try to reduce hardware (see the book)
  rather go for speed => many instructions at once (Pipeline)!

