
1 ECE 232 Hardware Organization and Design, Lecture 14: Multi-cycle Processor Design
Maciej Ciesielski

Speaker notes: Give qualifications of instructors.
DAP: teaching computer architecture at Berkeley since 1977; co-author of the textbook used in class; best known for being one of the pioneers of RISC; currently author of an article on the future of microprocessors in SciAm, Sept 1995.
RY: took 152 as a student, TAed 152, instructor in 152; undergrad and grad work at Berkeley; joined NextGen to design fast 80x86 microprocessors; one of the architects of UltraSPARC, the fastest SPARC microprocessor, shipping this Fall.

2 Outline
Review
  Single-cycle processor design
  VHDL models of datapath
Why single-cycle is not good enough
Design of a multi-cycle processor
  Multi-cycle Datapath
  Multi-cycle Control
Performance analysis

Speaker notes: Credential: bring a computer die photo wafer. This can be a hidden slide. I just want to use this to do my own planning. I have rearranged Culler’s lecture slides slightly and added more slides. This covers everything he covers in his first lecture (and more) but may
We will save the fun part, “Levels of Organization,” for the end (so students can stay awake): I will show the internal structure of the SS10/20.
Notes to Patterson: You may want to edit the slides in your section or add extra slides to tailor them to your needs.

3 Recap: Processor Design is a Process
Bottom-up: assemble components in target technology to establish critical timing
Top-down: specify component behavior from high-level requirements
Iterative refinement: establish partial solution, expand and improve
[Figure: design hierarchy, from Instruction Set Architecture to processor (datapath, control), to Reg. File, Mux, ALU, Reg, Mem, Decoder, Sequencer, down to Cells and Gates]

4 Recap: A Single Cycle Datapath
Datapath with control signals (underlined)
[Figure: single-cycle datapath. Instruction Fetch Unit (Clk, nPC_sel) delivers Instruction<31:0> with fields Rs <21:25>, Rt <16:20>, Rd <11:15>, Imm16 <0:15>. 32 32-bit Registers (Rw, Ra, Rb, busW, busA, busB, RegWr, Clk) with RegDst mux selecting Rd or Rt. Extender (imm16, ExtOp), ALUSrc mux, ALU (ALUctr, Zero), Data Memory (Data In, WrEn, Adr, MemWr), MemtoReg mux.]
The result of the last lecture is this single-cycle datapath.

5 Recap: The “Truth Table” for the Main Control
[Figure: Main Control takes op (6 bits) and produces RegDst, ALUSrc, ... and ALUop; ALU Control (Local) takes func and ALUop and produces ALUctr (3 bits)]
[Table: columns R-type, ori, lw, sw, beq, jump; rows RegDst, ALUSrc, MemtoReg, RegWrite, MemWrite, Branch, Jump, ExtOp, ALUop (Symbolic): “R-type”, Or, Add, Subtract, xxx; followed by the bit rows ALUop <2>, ALUop <1>, ALUop <0>]

Now that we have taken care of the Local Control (ALU Control), let’s refocus our attention on the Main Controller. The job of the Main Control is to look at the Opcode field of the instruction and generate the control signals for the datapath (RegDst, ... ExtOp) as well as the 3-bit ALUop field for the ALU Control. Here, I have shown you the symbolic value of the ALUop field as well as the actual bit assignment. For example, here (2nd column), the R-type ALUop is encoded as 100 and the Add operation (3rd column) is encoded as 000. This is called a “Truth Table” in quotes because, if you think about it, it is like having the truth table rotated 90 degrees. Let me show you what I mean by that.

6 Recap: PLA Implementation of the Main Control
[Figure: PLA. Inputs op<5> ... op<0> feed an AND plane producing one term each for R-type, ori, lw, sw, beq, jump; an OR plane produces RegWrite, ALUSrc, MemtoReg, MemWrite, Branch, Jump, RegDst, ExtOp, ALUop<2>, ALUop<1>, ALUop<0>]

Similarly, for ALUSrc, we need to OR the ori, load, and store terms together because we need to assert the ALUSrc signal whenever we have the ori, load, or store instruction. The RegDst, MemtoReg, MemWrite, Branch, and Jump signals are very simple. They don’t need to OR any product terms together because each is asserted for only one instruction. For example, RegDst is asserted ONLY for the R-type instruction and MemtoReg is asserted ONLY for the load instruction. ExtOp, on the other hand, needs to be set to 1 for both the load and store instructions so the immediate field is sign-extended properly. Therefore, we need to OR the load and store terms together to form the signal ExtOp. Finally, we have the ALUop signals. By clever encoding of the ALUop field, we are able to keep them simple so that no OR gates are needed.
If you don’t already know, this regular structure with an array of AND gates followed by another array of OR gates is called a Programmable Logic Array, or PLA for short. It is one of the most common ways to implement logic functions, and there are a lot of CAD tools available to simplify them.
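
The AND/OR structure described in these notes can be mimicked in software. The following Python model is a minimal sketch, not the lecture's implementation: `and_plane` decodes the opcode into one product term per instruction, and `or_plane` combines those terms into control signals. The opcode values are the standard MIPS encodings. The RegWrite equation and the ALUop codes for ori (010) and beq (001) are assumptions, since the notes only state R-type = 100 and Add = 000. Don't-care outputs are simply driven to 0 here.

```python
# Standard MIPS 6-bit opcodes for the six instructions in the table.
OPCODES = {
    "rtype": 0b000000,
    "ori":   0b001101,
    "lw":    0b100011,
    "sw":    0b101011,
    "beq":   0b000100,
    "jump":  0b000010,
}


def and_plane(op):
    """One product term per instruction: true when all six opcode bits match."""
    return {name: op == code for name, code in OPCODES.items()}


def or_plane(t):
    """Each control signal is an OR of product terms (or a single term)."""
    return {
        "RegWrite": t["rtype"] or t["ori"] or t["lw"],  # assumed equation
        "ALUSrc":   t["ori"] or t["lw"] or t["sw"],
        "ExtOp":    t["lw"] or t["sw"],
        "RegDst":   t["rtype"],
        "MemtoReg": t["lw"],
        "MemWrite": t["sw"],
        "Branch":   t["beq"],
        "Jump":     t["jump"],
        # ALUop<2:0>: each bit comes from a single term, so no OR gate is needed.
        # Bits <1> (ori) and <0> (beq) are assumed encodings.
        "ALUop":    (int(t["rtype"]) << 2) | (int(t["ori"]) << 1) | int(t["beq"]),
    }


def main_control(op):
    """Full PLA: AND plane followed by OR plane."""
    return or_plane(and_plane(op))


lw_signals = main_control(0b100011)
print(lw_signals["ALUSrc"], lw_signals["MemtoReg"], lw_signals["ALUop"])
```

For lw the model asserts ALUSrc and MemtoReg and produces ALUop = 000 (Add), matching the description above.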

7 Recap: Systematic Generation of Control
[Figure labels: Instruction, Decode, OPcode, Control Logic / Store (PLA, ROM), microinstruction, Control Points, Datapath, Conditions]
In our single-cycle processor, each instruction is realized by exactly one control command or “microinstruction”
In general, the controller is a Finite State Machine
A microinstruction can also control sequencing (see later)

8 The Big Picture: Where are We Now?
The Five Classic Components of a Computer: Processor (Control, Datapath), Memory, Input, Output
Today’s topic: designing the multiple clock cycle datapath

So where are we in the overall scheme of things? Well, we just finished designing the processor’s datapath. Now I am going to show you how to design the control for the datapath.

9 Behavioral models of Datapath Components
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity adder16 is
      generic (ccOut_delay    : TIME := 12 ns;
               adderOut_delay : TIME := 12 ns);
      port (A, B : in  std_logic_vector(15 downto 0);
            DOUT : out std_logic_vector(15 downto 0);
            CIN  : in  bit;
            COUT : out bit);
    end adder16;

    architecture behavior of adder16 is
    begin
      adder16_process: process(A, B, CIN)
        -- 17 bits: 16-bit sum plus carry out
        variable tmp : unsigned(16 downto 0);
      begin
        -- numeric_std addition stands in for the slide's addum helper,
        -- which is not defined in the transcript
        tmp := resize(unsigned(A), 17) + resize(unsigned(B), 17);
        if CIN = '1' then
          tmp := tmp + 1;
        end if;
        COUT <= to_bit(tmp(16)) after ccOut_delay;
        DOUT <= std_logic_vector(tmp(15 downto 0)) after adderOut_delay;
      end process;
    end behavior;

Attention: Altera VHDL simulation software does not support delay.
[Figure: 16-bit adder block with inputs A, B, Cin and outputs DOUT, Cout]

10 Behavioral Specification of Control Logic
    entity maincontrol is
      port (opcode     : in  std_logic_vector(5 downto 0);
            equal_cond : in  bit;
            extop      : out bit;
            ALUsrc     : out bit;
            ALUop      : out std_logic_vector(1 downto 0);
            MEMwr      : out bit;
            MemtoReg   : out bit;
            RegWr      : out bit;
            RegDst     : out bit;
            nPC        : out bit);
    end maincontrol;

Decode / Control-store address is modeled by a Case statement
Each arm drives the control signals for that operation, just like the microinstruction
Either can be symbolic

11 Abstract View of our Single Cycle Processor
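[Figure: abstract single-cycle processor. Stages: PC / Next PC, Instruction Fetch, Register Fetch, Ext, ALU, Mem Access (Data Mem), Reg. Wrt / Result Store. Main Control takes op and fun from the Instruction and drives ALUctr, RegDst, ALUSrc, ExtOp, MemWr, MemRd, RegWr, nPC_sel; Equal is fed back as a condition.]
Looks like an FSM with PC as state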

12 What’s wrong with our CPI=1 processor?
Arithmetic & Logical: PC, Inst Memory, Reg File, mux, ALU, setup
Load (Critical Path): PC, Inst Memory, Reg File, mux, ALU, Data Mem, setup
Store: PC, Inst Memory, Reg File, mux, ALU, Data Mem
Branch: PC, Inst Memory, Reg File, cmp, mux
Long cycle time
All instructions take as much time as the slowest
Real memory is not as nice as our idealized memory: it cannot always get the job done in one (short) cycle
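
The point that every instruction pays for the slowest path can be made concrete with a small calculation. The sketch below sums component delays along each instruction's path and takes the maximum as the single-cycle clock period. All delay values are hypothetical and chosen only for illustration; the path lists follow the component names on the slide.

```python
# Hypothetical component delays in ns (illustrative values, not from the lecture).
DELAY_NS = {
    "PC": 1, "Inst Memory": 10, "Reg File": 5, "mux": 1,
    "ALU": 8, "cmp": 4, "Data Mem": 10, "setup": 1,
}

# Components each instruction class passes through in one cycle.
PATHS = {
    "Arithmetic & Logical": ["PC", "Inst Memory", "Reg File", "mux", "ALU", "setup"],
    "Load": ["PC", "Inst Memory", "Reg File", "mux", "ALU", "Data Mem", "setup"],
    "Store": ["PC", "Inst Memory", "Reg File", "mux", "ALU", "Data Mem"],
    "Branch": ["PC", "Inst Memory", "Reg File", "cmp", "mux"],
}


def path_delay(path):
    """Total delay of one instruction's path through the datapath."""
    return sum(DELAY_NS[c] for c in path)


delays = {name: path_delay(path) for name, path in PATHS.items()}
critical = max(delays, key=delays.get)   # slowest instruction class
cycle_time = delays[critical]            # single-cycle clock period

for name, d in delays.items():
    print(f"{name}: {d} ns")
print(f"cycle time = {cycle_time} ns, set by {critical}")
```

With these assumed numbers the load path takes 36 ns, so a branch that needs only 21 ns still occupies a 36 ns cycle.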

13 Memory Access Time
Physics => fast memories are small (large memories are slow)
  question: register file vs. memory
=> Use a hierarchy of memories
[Figure: Storage Array with address, address decoder, selected word line, storage cell, bit line, sense amps]
[Figure: hierarchy of Processor, Cache, L2 Cache, memory, connected by proc. bus and mem. bus; latency labels: 1 cycle, 2-3 cycles, cycles]

14 Reducing Cycle Time
Cut combinational dependency graph and insert register / latch
Do same work in two fast cycles, rather than one slow one
[Figure: storage element, Acyclic Combinational Logic, storage element => storage element, Acyclic Combinational Logic (A), storage element, Acyclic Combinational Logic (B), storage element]
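
As a worked illustration of the cut (a sketch with assumed symbols, none of which appear on the slide), let $t_A$ and $t_B$ be the delays of Logic (A) and Logic (B), and $t_{\text{reg}}$ the overhead of one storage element:

```latex
\begin{align*}
T_{\text{before}} &= t_A + t_B + t_{\text{reg}} \\
T_{\text{after}}  &= \max(t_A, t_B) + t_{\text{reg}}
\end{align*}
```

With assumed values $t_A = 6$ ns, $t_B = 4$ ns, and $t_{\text{reg}} = 1$ ns, the clock period drops from 11 ns to 7 ns. Work that needs both halves now takes $2 \times 7 = 14$ ns, but work that needs only one half finishes in 7 ns instead of 11 ns.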

15 Basic Limits on Cycle Time
Next address logic: PC <= branch ? PC + offset : PC + 4
Instruction Fetch: InstructionReg <= Mem[PC]
Register Access: A <= R[rs]
ALU operation: R <= A + B
[Figure: PC / Next PC, Instruction Fetch, Operand Fetch (Reg. File), Exec, Mem Access (Data Mem), Result Store; Control drives ALUctr, RegDst, ALUSrc, ExtOp, MemWr, MemRd, RegWr, nPC_sel]

16 Partitioning the CPI=1 Datapath
Add registers between smallest steps
[Figure: PC / Next PC, Instruction Fetch, Operand Fetch (Reg. File), Exec, Mem Access (Data Mem), Result Store; control signals ALUctr, RegDst, ALUSrc, ExtOp, MemWr, MemRd, RegWr, nPC_sel]

17 Example Multicycle Datapath
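[Figure: multicycle datapath. Steps: Instruction Fetch, Operand Fetch, Execute (comp. mem address), Memory access, Result Store. Registers between steps: PC, IR, A, B, R, M. Blocks: Next PC, Reg. File, Ext, ALU, Mem Access (Data Mem), Reg File. Control signals: MemToReg, MemRd, MemWr, RegDst, RegWr, nPC_sel, ALUSrc, ALUctr, ExtOp; condition: Equal.]
Critical Path ?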
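
To see how the inter-stage registers are used, the following Python sketch steps a single load through the five steps named in the figure, latching IR, A, R, and M one cycle at a time. It is an illustrative model under stated assumptions, not the lecture's design: the instruction is a pre-decoded dictionary (a hypothetical representation), `imm16` is already sign-extended, and the B register (used by stores) is omitted.

```python
def run_lw(pc, imem, regs, dmem):
    """Step one lw through five cycles; return (next_pc, trace of latched values)."""
    trace = []

    # Cycle 1, Instruction Fetch: IR <= Mem[PC]
    ir = imem[pc]
    trace.append(("Instruction Fetch", "IR", ir))

    # Cycle 2, Operand Fetch: A <= R[rs]
    a = regs[ir["rs"]]
    trace.append(("Operand Fetch", "A", a))

    # Cycle 3, Execute: R <= A + imm16 (computes the memory address)
    r = a + ir["imm16"]
    trace.append(("Execute", "R", r))

    # Cycle 4, Memory access: M <= Mem[R]
    m = dmem[r]
    trace.append(("Memory access", "M", m))

    # Cycle 5, Result Store: R[rt] <= M, and the PC advances by 4
    regs[ir["rt"]] = m
    trace.append(("Result Store", "R[rt]", m))
    return pc + 4, trace


# Hypothetical state: lw $2, 8($1) at address 0, with $1 = 100 and Mem[108] = 42.
imem = {0: {"rs": 1, "rt": 2, "imm16": 8}}
regs = [0, 100, 0]
dmem = {108: 42}

next_pc, trace = run_lw(0, imem, regs, dmem)
for step in trace:
    print(step)
```

Each cycle touches only one major unit, which is why the clock period can be set by the slowest step and not by the whole instruction.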

18 Summary
Disadvantages of the Single Cycle Processor
  Long cycle time
  Cycle time is too long for all instructions except the Load
Multiple Cycle Processor:
  Divide the instructions into smaller steps
  Execute each step (instead of the entire instruction) in one cycle
Partition datapath into equal size chunks to minimize cycle time
  ~10 levels of logic between latches
Follow same 5-step method for designing “real” processor
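
The trade-off in this summary can be estimated with average CPI. The sketch below compares time per instruction for the two designs. Every number in it is hypothetical: the instruction mix, the cycles per instruction class, and both clock periods are assumed values chosen for illustration, not measurements from the lecture.

```python
# Hypothetical instruction mix (percent) and cycles per instruction class.
MIX_PERCENT = {"arith": 50, "load": 20, "store": 10, "branch": 20}
CYCLES = {"arith": 4, "load": 5, "store": 4, "branch": 3}

SINGLE_CYCLE_NS = 40  # assumed clock period of the single-cycle design
MULTI_CYCLE_NS = 8    # assumed clock period after partitioning into steps


def average_cpi(mix_percent, cycles):
    """Weighted average of cycles per instruction over the mix."""
    return sum(mix_percent[k] * cycles[k] for k in mix_percent) / 100


cpi = average_cpi(MIX_PERCENT, CYCLES)
single_ns = 1 * SINGLE_CYCLE_NS   # CPI = 1 by construction
multi_ns = cpi * MULTI_CYCLE_NS   # average time per instruction
print(cpi, single_ns, multi_ns, single_ns / multi_ns)
# prints: 4.0 40 32.0 1.25
```

The multi-cycle design wins here only because the assumed steps are well balanced. If one step were much slower than the others, the shorter clock would not offset the higher CPI, which is why the summary stresses equal-size chunks.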

