Presentation is loading. Please wait.

Presentation is loading. Please wait.

John Kubiatowicz (http.cs.berkeley.edu/~kubitron)

Similar presentations


Presentation on theme: "John Kubiatowicz (http.cs.berkeley.edu/~kubitron)"— Presentation transcript:

1 John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
CS 152 Computer Architecture and Engineering Lecture 9 Designing a Multicycle Processor Feb 24, 1999 John Kubiatowicz (http.cs.berkeley.edu/~kubitron) lecture slides: 2/24/99 ©UCB Spring 1999

2 Recap: Processor Design is a Process
Bottom-up assemble components in target technology to establish critical timing Top-down specify component behavior from high-level requirements Iterative refinement establish partial solution, expand and improve Instruction Set Architecture processor datapath control Reg. File Mux ALU Reg Mem Decoder Sequencer Cells Gates 2/24/99 ©UCB Spring 1999

3 Recap: A Single Cycle Datapath
Instruction<31:0> nPC_sel Instruction Fetch Unit Rd Rt <21:25> <16:20> <11:15> <0:15> RegDst Clk 1 Mux Rs Rt Rt Rs Rd Imm16 RegWr ALUctr 5 5 5 MemtoReg busA Zero MemWr Rw Ra Rb busW The result of the last lecture is this single-cycle datapath. +1 = 6 min. (X:46) 32 32 32-bit Registers ALU 32 busB 32 Clk 32 Mux Mux 32 WrEn Adr 1 1 Data In 32 Data Memory imm16 Extender 32 16 Clk ALUSrc ExtOp 2/24/99 ©UCB Spring 1999

4 Recap: The “Truth Table” for the Main Control
op 6 ALU (Local) func 3 ALUop ALUctr RegDst ALUSrc : R-type ori lw sw beq jump RegDst ALUSrc MemtoReg RegWrite MemWrite Branch Jump ExtOp ALUop (Symbolic) 1 x “R-type” Or Add Subtract xxx op ALUop <2> ALUop <1> ALUop <0> Now that we have taken care of the Local Control (ALU Control), let’s refocus our attention to the Main Controller. The job of the Main Control is to look at the Opcode field of the instruction and generate these control signals for the datapath (RegDst, ... ExtOp) as well as the 3-bit ALUop field for the ALU Control. Here, I have shown you the symbolic value of the ALUop field as well as the actual bit assignment. For example here (2nd column), the R-type ALUop is encode as 100 and the Add operation (3rd column) is encoded as 000.. This is call a quote “Truth Table” unquote because if you think about it, this is like having the truth table rotates 90 degrees. Let me show you what I mean by that. +3 = 65 min. (Y:45) 2/24/99 ©UCB Spring 1999

5 Recap: PLA Implementation of the Main Control
op<0> op<5> . <0> R-type ori lw sw beq jump RegWrite ALUSrc MemtoReg MemWrite Branch Jump RegDst ExtOp ALUop<2> ALUop<1> ALUop<0> Similarly, for ALUSrc, we need to OR the ori, load, and store terms together because we need to assert the ALUSrc signals whenever we have the Ori, load, or store instructions. The RegDst, MemtoReg, MemWrite, Branch, and Jump signals are very simple. They don’t need to OR any product terms together because each is asserted for only one instruction. For example, RegDst is asserted ONLY for R-type instruction and MemtoReg is asserted ONLY for load instruction. ExtOp, on the other hand, needs to be set to 1 for both the load and store instructions so the immediate field is sign extended properly. Therefore, we need to OR the load and store terms together to form the signal ExtOp. Finally, we have the ALUop signals. But clever encoding of the ALUop field, we are able to keep them simple so that no OR gates is needed. If you don’t already know, this regular structure with an array of AND gates followed by another array of OR gates is called a Programmable Logic Array, or PLA for short. It is one of the most common ways to implement logic function and there are a lot of CAD tools available to simplify them. +3 = 70 min. (Y:50) 2/24/99 ©UCB Spring 1999

6 Recap: Systematic Generation of Control
OPcode Control Logic / Store (PLA, ROM) Decode microinstruction Conditions Instruction Control Points Datapath In our single-cycle processor, each instruction is realized by exactly one control command or “microinstruction” in general, the controller is a finite state machine microinstruction can also control sequencing (see later) 2/24/99 ©UCB Spring 1999

7 The Big Picture: Where are We Now?
The Five Classic Components of a Computer Today’s Topic: Designing the Datapath for the Multiple Clock Cycle Datapath Processor Input Control Memory So where are in in the overall scheme of things. Well, we just finished designing the processor’s datapath. Now I am going to show you how to design the control for the datapath. +1 = 7 min. (X:47) Datapath Output 2/24/99 ©UCB Spring 1999

8 Outline of Today’s Lecture
Recap: single cycle processor VHDL versions Faster designs Multicycle Datapath Performance Analysis Multicycle Control 2/24/99 ©UCB Spring 1999

9 Behavioral models of Datapath Components
entity adder16 is generic (ccOut_delay : TIME := 12 ns; adderOut_delay: TIME := 12 ns); port(A, B: in vlbit_1d(15 downto 0); DOUT: out vlbit_1d(15 downto 0); CIN: in vlbit; COUT: out vlbit); end adder16; architecture behavior of adder32 is begin adder16_process: process(A, B, CIN) variable tmp : vlbit_1d(18 downto 0); variable adder_out : vlbit_1d(31 downto 0); variable carry : vlbit; tmp := addum (addum (A, B), CIN); adder_out := tmp(15 downto 0); carry :=tmp(16); COUT <= carry after ccOut_delay; DOUT <= adder_out after adderOut_delay; end process; end behavior; 16 A B DOUT Cin Cout 2/24/99 ©UCB Spring 1999

10 Behavioral Specification of Control Logic
entity maincontrol is port(opcode: in vlbit_1d(5 downto 0); equal_cond: in vlbit; extop out vlbit; ALUsrc out vlbit; ALUop out vlbit_1d(1 downto 0); MEMwr out vlbit; MemtoReg out vlbit; RegWr out vlbit; RegDst out vlbit; nPC out vlbit; end maincontrol; architecture behavior of maincontrol is begin control: process(opcode,equal_cond) constant ORIop: vlbit_ld(5 downto 0) := “001101”; -- extop only 0 (no extend) for ORI inst case opcode is when ORIop => extop <= 0; when others => extop <= 1; end case; end process; end behavior; Decode / Control-store address modeled by Case statement Each arm drives control signals for that operation just like the microinstruction either can be symbolic 2/24/99 ©UCB Spring 1999

11 Abstract View of our single cycle processor
Main Control op ALU control fun ALUSrc Equal ExtOp MemRd MemWr RegDst RegWr MemWr nPC_sel ALUctr Reg. Wrt Register Fetch ALU Ext Mem Access PC Next PC Instruction Fetch Result Store Data Mem looks like a FSM with PC as state 2/24/99 ©UCB Spring 1999

12 What’s wrong with our CPI=1 processor?
Arithmetic & Logical PC Inst Memory Reg File ALU mux mux setup Load PC Inst Memory Reg File ALU Data Mem mux mux setup Critical Path Store PC Inst Memory Reg File ALU Data Mem mux Branch PC Inst Memory Reg File cmp mux Long Cycle Time All instructions take as much time as the slowest Real memory is not as nice as our idealized memory cannot always get the job done in one (short) cycle 2/24/99 ©UCB Spring 1999

13 Memory Access Time Physics => fast memories are small (large memories are slow) question: register file vs. memory => Use a hierarchy of memories Storage Array selected word line storage cell address bit line address decoder sense amps mem. bus proc. bus memory L2 Cache Cache Processor 1 cycle 2-3 cycles cycles 2/24/99 ©UCB Spring 1999

14 Reducing Cycle Time Cut combinational dependency graph and insert register / latch Do same work in two fast cycles, rather than one slow one May be able to short-circuit path and remove some components for some instructions! storage element storage element Acyclic Combinational Logic Acyclic Combinational Logic (A) storage element Acyclic Combinational Logic (B) storage element 2/24/99 ©UCB Spring 1999

15 Basic Limits on Cycle Time
Next address logic PC <= branch ? PC + offset : PC + 4 Instruction Fetch InstructionReg <= Mem[PC] Register Access A <= R[rs] ALU operation R <= A + B Control MemRd MemWr nPC_sel RegDst RegWr MemWr ALUSrc ALUctr ExtOp Reg. File Operand Fetch Exec Instruction Fetch Mem Access PC Next PC Result Store Data Mem 2/24/99 ©UCB Spring 1999

16 Partitioning the CPI=1 Datapath
Add registers between smallest steps Equal MemRd MemWr nPC_sel RegDst RegWr MemWr ExtOp ALUSrc ALUctr Reg. File Operand Fetch Exec Instruction Fetch Mem Access PC Next PC Result Store Data Mem 2/24/99 ©UCB Spring 1999

17 Example Multicycle Datapath
Ext ALU Reg. File Mem Access Data Result Store RegDst RegWr MemWr MemRd S M MemToReg Equal ALUctr ALUSrc ExtOp nPC_sel E Reg File A PC IR Next PC B Instruction Fetch Operand Fetch Critical Path ? 2/24/99 ©UCB Spring 1999

18 Administrative Issues
Read Chapter 5 Tapes now available in 205 McLaughlin Hall This lecture and next one slightly different from the book Midterm next Wednesday 3/3/99: 5:30pm to 8:30pm, 277 Cory No class on that day Midterm reminders: Pencil, calculator, one 8.5” x 11” (both sides) of handwritten notes Sit in every other chair, every other row (odd row & odd seat) Meet at LaVal’s pizza after the midterm 2/24/99 ©UCB Spring 1999

19 Recall: Step-by-step Processor Design
Step 1: ISA => Logical Register Transfers Step 2: Components of the Datapath Step 3: RTL + Components => Datapath Step 4: Datapath + Logical RTs => Physical RTs Step 5: Physical RTs => Control 2/24/99 ©UCB Spring 1999

20 Step 4: R-rtype (add, sub, . . .)
Logical Register Transfer Physical Register Transfers inst Logical Register Transfers ADDU R[rd] <– R[rs] + R[rt]; PC <– PC + 4 inst Physical Register Transfers IR <– MEM[pc] ADDU A<– R[rs]; B <– R[rt] S <– A + B R[rd] <– S; PC <– PC + 4 Time E Reg. File Reg File A S Exec PC IR Next PC Inst. Mem B Mem Access M Data Mem 2/24/99 ©UCB Spring 1999

21 Step 4: Logical immed Logical Register Transfer
Physical Register Transfers inst Logical Register Transfers ORI R[rt] <– R[rs] OR ZExt(Im16); PC <– PC + 4 inst Physical Register Transfers IR <– MEM[pc] ORI A<– R[rs]; B <– R[rt] S <– A or ZExt(Im16) R[rt] <– S; PC <– PC + 4 Time E Reg. File Reg File A S Exec PC IR Next PC Inst. Mem B Mem Access M Data Mem 2/24/99 ©UCB Spring 1999

22 Step 4 : Load Logical Register Transfer Physical Register Transfers
inst Logical Register Transfers LW R[rt] <– MEM[R[rs] + SExt(Im16)]; PC <– PC + 4 Logical Register Transfer Physical Register Transfers inst Physical Register Transfers IR <– MEM[pc] LW A<– R[rs]; B <– R[rt] S <– A + SExt(Im16) M <– MEM[S] R[rd] <– M; PC <– PC + 4 Time E Reg. File Reg File A S Exec PC IR Next PC Inst. Mem B Mem Access M Data Mem 2/24/99 ©UCB Spring 1999

23 Step 4 : Store Logical Register Transfer Physical Register Transfers
inst Logical Register Transfers SW MEM[R[rs] + SExt(Im16)] <– R[rt]; PC <– PC + 4 Logical Register Transfer Physical Register Transfers inst Physical Register Transfers IR <– MEM[pc] SW A<– R[rs]; B <– R[rt] S <– A + SExt(Im16); MEM[S] <– B PC <– PC + 4 Time E Reg. File Reg File A S Exec PC IR Next PC Inst. Mem B Mem Access M Data Mem 2/24/99 ©UCB Spring 1999

24 Step 4 : Branch Logical Register Transfer Physical Register Transfers
inst Logical Register Transfers BEQ if R[rs] == R[rt] then PC <= PC + SExt(Im16) || 00 else PC <= PC + 4 inst Physical Register Transfers IR <– MEM[pc] BEQ E<– (R[rs] == R[rt]) if E then PC <– PC else PC <– PC + SExt(Im16) || 00 Time E Reg. File Reg File S A Exec PC IR Next PC Inst. Mem B Mem Access M Data Mem 2/24/99 ©UCB Spring 1999

25 Alternative datapath (book): Multiple Cycle Datapath
Miminizes Hardware: 1 memory, 1 adder PCWr PCWrCond PCSrc BrWr Zero IorD MemWr IRWr RegDst RegWr ALUSelA 1 Target 32 32 Mux PC Mux 1 32 Zero Rs Mux 1 Ra 32 RAdr 5 32 Rt Rb busA 32 32 Ideal Memory ALU Putting it all together, here it is: the multiple cycle datapath we set out to built. +1 = 47 min. (Y:47) Instruction Reg 5 Reg File 32 ALU Out Mux 1 Rt 4 Rw 32 WrAdr 32 32 32 Rd 1 Din Dout busW busB 32 32 2 ALU Control Mux 1 3 << 2 Extend Imm 16 32 ALUOp ExtOp MemtoReg ALUSelB 2/24/99 ©UCB Spring 1999

26 Our Control Model State specifies control points for Register Transfer
Transfer occurs upon exiting state (same falling edge) inputs (conditions) Next State Logic State X Register Transfer Control Points Control State Depends on Input Output Logic outputs (control points) 2/24/99 ©UCB Spring 1999

27 Step 4 => Control Specification for multicycle proc
“instruction fetch” IR <= MEM[PC] A <= R[rs] B <= R[rt] “decode / operand fetch” LW R-type ORi SW BEQ Execute Memory Write-back S <= A fun B S <= A or ZX S <= A + SX S <= A + SX PC <= Next(PC,Equal) M <= MEM[S] MEM[S] <= B PC <= PC + 4 R[rd] <= S PC <= PC + 4 R[rt] <= S PC <= PC + 4 R[rt] <= M PC <= PC + 4 2/24/99 ©UCB Spring 1999

28 Traditional FSM Controller
next state state op cond control points Truth Table next State control points 11 Equal 6 State 4 op 2/24/99 datapath State ©UCB Spring 1999

29 Step 5: datapath + state diagram control
Translate RTs into control points Assign states Then go build the controller 2/24/99 ©UCB Spring 1999

30 Mapping RTs to Control Points
IR <= MEM[PC] “instruction fetch” imem_rd, IRen A <= R[rs] B <= R[rt] “decode” Aen, Ben, Een LW R-type ORi SW BEQ Execute Memory Write-back S <= A fun B PC <= Next(PC,Equal) ALUfun, Sen S <= A or ZX S <= A + SX S <= A + SX M <= MEM[S] MEM[S] <= B PC <= PC + 4 R[rd] <= S PC <= PC + 4 RegDst, RegWr, PCen R[rt] <= S PC <= PC + 4 R[rt] <= M PC <= PC + 4 2/24/99 ©UCB Spring 1999

31 Assigning States “instruction fetch” IR <= MEM[PC] 0000 “decode”
A <= R[rs] B <= R[rt] “decode” 0001 LW R-type ORi SW BEQ Execute Memory Write-back S <= A fun B S <= A or ZX S <= A + SX S <= A + SX PC <= Next(PC) 0100 0110 1000 1011 0011 M <= MEM[S] MEM[S] <= B PC <= PC + 4 1001 1100 R[rd] <= S PC <= PC + 4 R[rt] <= S PC <= PC + 4 R[rt] <= M PC <= PC + 4 0101 0111 1010 2/24/99 ©UCB Spring 1999

32 Detailed Control Specification
State Op field Eq Next IR PC Ops Exec Mem Write-Back en sel A B Ex Sr ALU S R W M M-R Wr Dst 0000 ?????? ? 0001 BEQ x 0001 R-type x 0001 orI x 0001 LW x 0001 SW x 0011 xxxxxx 0011 xxxxxx 0100 xxxxxx x fun 1 0101 xxxxxx x 0110 xxxxxx x or 1 0111 xxxxxx x 1000 xxxxxx x add 1 1001 xxxxxx x 1010 xxxxxx x 1011 xxxxxx x add 1 1100 xxxxxx x -all same in Moore machine BEQ: R: ORi: LW: SW: 2/24/99 ©UCB Spring 1999

33 Performance Evaluation
What is the average CPI? state diagram gives CPI for each instruction type workload gives frequency of each type Type CPIi for type Frequency CPIi x freqIi Arith/Logic 4 40% 1.6 Load 5 30% 1.5 Store 4 10% 0.4 branch 3 20% 0.6 Average CPI: 4.1 2/24/99 ©UCB Spring 1999

34 Controller Design The state digrams that arise define the controller for an instruction set processor are highly structured Use this structure to construct a simple “microsequencer” Control reduces to programming this very simple device microprogramming sequencer control datapath control microinstruction micro-PC sequencer 2/24/99 ©UCB Spring 1999

35 Example: Jump-Counter
i i 0000 i+1 Map ROM None of above: Do nothing (for wait states) op-code zero inc load Counter 2/24/99 ©UCB Spring 1999

36 Using a Jump Counter “instruction fetch” IR <= MEM[PC] 0000 inc
A <= R[rs] B <= R[rt] “decode” 0001 load LW R-type ORi SW BEQ Execute Memory Write-back S <= A fun B S <= A or ZX S <= A + SX S <= A + SX PC <= Next(PC) 0100 0110 1000 1011 0011 inc inc inc inc zero M <= MEM[S] MEM[S] <= B PC <= PC + 4 1001 1100 inc R[rd] <= S PC <= PC + 4 R[rt] <= S PC <= PC + 4 R[rt] <= M PC <= PC + 4 zero 0101 0111 1010 zero zero 2/24/99 ©UCB Spring 1999 zero

37 Our Microsequencer taken datapath control Z I L Micro-PC op-code
Map ROM 2/24/99 ©UCB Spring 1999

38 Microprogram Control Specification
µPC Taken Next IR PC Ops Exec Mem Write-Back en sel A B Ex Sr ALU S R W M M-R Wr Dst 0000 ? inc 1 load 1 1 zero zero 0100 x inc fun 1 0101 x zero 0110 x inc or 1 0111 x zero 1000 x inc add 1 1001 x inc x zero 1011 x inc add 1 1100 x zero BEQ R: ORi: LW: SW: 2/24/99 ©UCB Spring 1999

39 Mapping ROM R-type 000000 0100 BEQ 000100 0011 ori 001101 0110
LW SW 2/24/99 ©UCB Spring 1999

40 Example: Controlling Memory
PC addr InstMem_rd Instruction Memory IM_wait data Inst. Reg IR_en 2/24/99 ©UCB Spring 1999

41 Controller handles non-ideal memory
“instruction fetch” IR <= MEM[PC] wait ~wait A <= R[rs] B <= R[rt] “decode / operand fetch” LW R-type ORi SW BEQ Execute Memory Write-back PC <= Next(PC) S <= A fun B S <= A or ZX S <= A + SX S <= A + SX M <= MEM[S] MEM[S] <= B ~wait wait wait ~wait R[rd] <= S PC <= PC + 4 R[rt] <= S PC <= PC + 4 R[rt] <= M PC <= PC + 4 PC <= PC + 4 2/24/99 ©UCB Spring 1999

42 Really Simple Time-State Control
instruction fetch IR <= MEM[PC] wait ~wait A <= R[rs] B <= R[rt] decode LW R-type ORi SW BEQ Execute S <= A fun B S <= A or ZX S <= A + SX S <= A + SX Memory M <= MEM[S] MEM[S] <= B wait wait R[rd] <= S PC <= PC + 4 R[rt] <= S PC <= PC + 4 R[rt] <= M PC <= PC + 4 PC <= Next(PC) write-back PC <= PC + 4 2/24/99 ©UCB Spring 1999

43 Time-state Control Path
Local decode and control at each stage Valid IRex IR IRwb Inst. Mem IRmem WB Ctrl Dcd Ctrl Ex Ctrl Mem Ctrl Equal Reg. File Reg File A S Exec PC Next PC B Mem Access M Data Mem 2/24/99 ©UCB Spring 1999

44 “microprogrammed control”
Overview of Control Control may be designed using one of several initial representations. The choice of sequence control, and how logic is represented, can then be determined independently; the control can then be implemented with one of several methods using a structured logic technique. Initial Representation Finite State Diagram Microprogram Sequencing Control Explicit Next State Microprogram counter Function + Dispatch ROMs Logic Representation Logic Equations Truth Tables Implementation PLA ROM Technique “hardwired control” “microprogrammed control” 2/24/99 ©UCB Spring 1999

45 Summary Disadvantages of the Single Cycle Processor Long cycle time
Cycle time is too long for all instructions except the Load Multiple Cycle Processor: Divide the instructions into smaller steps Execute each step (instead of the entire instruction) in one cycle Partition datapath into equal size chunks to minimize cycle time ~10 levels of logic between latches Follow same 5-step method for designing “real” processor 2/24/99 ©UCB Spring 1999

46 Summary (cont’d) Control is specified by finite state digram
Specialize state-diagrams easily captured by microsequencer simple increment & “branch” fields datapath control fields Control design reduces to Microprogramming Control is more complicated with: complex instruction sets restricted datapaths (see the book) Simple Instruction set and powerful datapath => simple control could try to reduce hardware (see the book) rather go for speed => many instructions at once! 2/24/99 ©UCB Spring 1999

47 Where to get more information?
Next two lectures: Multiple Cycle Controller: Appendix C of your text book. Microprogramming: Section 5.5 of your text book. D. Patterson, “Microprograming,” Scientific America, March D. Patterson and D. Ditzel, “The Case for the Reduced Instruction Set Computer,” Computer Architecture News 8, 6 (October 15, 1980) ******* Keep the old slide: ******** So in the next town, the driver pretended to be the professor and started giving the lecture. Everything was going so well that both the audience and the real professor were impressed. That is until someone in the audience asked a technical question. But just when the professor was about to say “Got You.” His driver replied calmly: “Well this question is easily to answer. In fact, it is SO simple that even my driver sitting at the back can answer it.” So if you ask me how to implement this state diagram in hardware, I will say: “It is so simple even ... ********** New Slide ********** Well all jokes aside, Professor Patterson is probably one on the most qualified person to cover the next two topics. If you don’t already know, Professor Patterson’s PhD thesis was on microprogramming. He also wrote a very good article on microprogramming in Scientific America. But after spending early part of his career showing how great microprograming was, he then spend the next few years convining people what a bad idea microprogramming has become. One of the main feature of RISC (points to the reference) is that its control is hardwired instead of microprogrammed, a big difference from all other computers at that time. It is not uncommon to see people change their minds about something but the amazing part here is that Professor Patterson was right both times. So I am sure you guys will learn a lot in the next two lectures. I will see you guys in a week. +3 = 78 min. (Y:58) 2/24/99 ©UCB Spring 1999


Download ppt "John Kubiatowicz (http.cs.berkeley.edu/~kubitron)"

Similar presentations


Ads by Google