Presentation is loading. Please wait.

Presentation is loading. Please wait.

Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania 18042 ECE 313 - Computer Organization Lecture 11 - Processor.

Similar presentations


Presentation on theme: "Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania 18042 ECE 313 - Computer Organization Lecture 11 - Processor."— Presentation transcript:

1 Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania 18042 nestorj@lafayette.edu ECE 313 - Computer Organization Lecture 11 - Processor Implementation: Overview, Single-Cycle Design Fall 2004 Reading: 5.1-5.4 Homework Due 10/27: 4.1, 4.2, 4.3, 4.6, 4.19 - 4.22 Portions of these slides are derived from: Textbook figures © 1998 Morgan Kaufmann Publishers all rights reserved Tod Amon's COD2e Slides © 1998 Morgan Kaufmann Publishers all rights reserved Dave Patterson’s CS 152 Slides - Fall 1997 © UCB Rob Rutenbar’s 18-347 Slides - Fall 1999 CMU other sources as noted

2 ECE 313 Fall 2004Lecture 11 - Processor Design2 Roadmap for the Term: Major Topics  Computer Systems Overview  Technology Trends  Performance  Instruction Sets (and Software)  Logic and Arithmetic  Processor Implementation   Memory Systems  Input/Output

3 ECE 313 Fall 2004Lecture 11 - Processor Design3 Outline - Processor Implementation  Overview   Review of Processor Operation  Steps in Processor Design  Implementation Styles  The “MIPS Lite” Instruction Subset  Single-Cycle Implementation  Multi-Cycle Implementation  Pipelined Implementation

4 ECE 313 Fall 2004Lecture 11 - Processor Design4 Review: The “Five Classic Components”  Processor  Datapath  Control  Memory  Input  Output Input Processor Control Datapath Output Memory 1001010010110000 0010100101010001 1111011101100110 1001010010110000

5 ECE 313 Fall 2004Lecture 11 - Processor Design5 Review: Processor Operation  Executing Programs - the “fetch/execute” cycle  Processor fetches instruction from memory  Processor executes “machine language” instruction Perform calculation Read/write data  Repeat with “next” instruction Processor Control Datapath 1001010010110000 0010100101010001 1111011101100110 1001010010110000 Memory 1111011101100110 1001010010110000 PC Address Instruction

6 ECE 313 Fall 2004Lecture 11 - Processor Design6 Processor Design Goals  Design hardware that:  Fetches instructions from memory  Executes instructions as specified by ISA  Design considerations  Cost  Speed  Power

7 ECE 313 Fall 2004Lecture 11 - Processor Design7 Steps in Processor Design 1.Analyze instruction set; get datapath requirements 2.Select datapath components and establish clocking methodology 3.Assemble datapath that meets requirements 4.Determine control signal values for each instruction 5.Assemble control logic to generate control signals

8 ECE 313 Fall 2004Lecture 11 - Processor Design8 Processor Implementation Styles  Single Cycle  Perform each instruction in 1 clock cycle  Disadvantage: only as fast as “slowest” instruction  Multi-Cycle  Break fetch/execute cycle into multple steps  Perform 1 step in each clock cycle  Pipelined  Execute each instruction in multiple steps  Perform 1 step / instruction in each clock cycle  Process multiple instructions in parallel - “assembly line”

9 ECE 313 Fall 2004Lecture 11 - Processor Design9 “MIPS Lite” - A Pedagogical Example  Use a MIPS to illustrate processor design  Limit initial design to a subset of instructions:  Memory access: lw, sw  Arithmetic/Logical: add, sub, and, or, slt  Branch/Jump: beq, j  Add instructions as we go along (e.g., addi )

10 ECE 313 Fall 2004Lecture 11 - Processor Design10 Review - MIPS Instruction Formats  Field definitions:  op: instruction opcode  rs, rt, rd: source (2) and destination (1) register numbers  shamt: shift amount  funct: function code (works with opcode to specify op)  offset/immediate: address offset or immediate value  address: target address for jumps oprsrtoffset 6 bits5 bits 16 bits oprsrtrdfunctshamt 6 bits5 bits 6 bits R-Format I-Format opaddress 6 bits26 bits J-Format

11 ECE 313 Fall 2004Lecture 11 - Processor Design11 MIPS Instruction Subset  Arithmetic & Logical Instructions add $s0, $s1, $s2 sub $s0, $s1, $s2 and $s0, $s1, $s2 or $s0, $s1, $s2  Data Transfer Instructions lw $s1, offset($s0) sw $s2, offset($s3)  Branch beq $s0, offset j address

12 ECE 313 Fall 2004Lecture 11 - Processor Design12 MIPS Instruction Execution  General Procedure 1.Fetch Instruction from memory 2.Decode Instruction, read register values 3.If necessary, perform an ALU operation 4.If load or store, do memory access 5.Write results back to register file and increment PC  Register Transfers provide a concise description

13 ECE 313 Fall 2004Lecture 11 - Processor Design13 Register Transfers for the MIPS Subset  Instruction Fetch Instruction <- MEM[PC]  Instruction Execution Instr.Register Transfers add R[rd] <- R[rs] + R[rt];PC <- PC + 4 sub R[rd] <- R[rs] - R[rt];PC <- PC + 4 and R[rd] <- R[rs] & R[rt];PC <- PC + 4 or R[rd] <- R[rs] | R[rt];PC <- PC + 4 lw R[rt] <- MEM[R[rs] + s_extend(offset)]; PC<- PC + 4 sw MEM[R[rs] + sign_extend(offset)] <- R[rt];PC <- PC + 4 beq if (R[rs] == R[rt]) then PC <- PC+4 + s_extend(offset<<2) else PC <- PC + 4 j PC <- upper(PC)@(address << 2)

14 ECE 313 Fall 2004Lecture 11 - Processor Design14 Outline - Processor Implementation  Overview  Single-Cycle Implementation 1.Analyze instruction set; get datapath requirements  2.Select datapath components and establish clocking methodology 3.Assemble datapath that meets requirements 4.Determine control signal values for each instruction 5.Assemble control logic to generate control signals  Multi-Cycle Implementation  Pipelined Implementation

15 ECE 313 Fall 2004Lecture 11 - Processor Design15 1. Instruction Set Requirements  Memory  Read Instructions  Read and Write Data  Registers - 32  read (from rs field in instruction)  read (from rt field in instruction)  write (from rd or rt field in instruction)  PC  Sign Extender  Add and Subtract (register values)  Add 4 or extended immediate to PC Review register transfers for details!

16 ECE 313 Fall 2004Lecture 11 - Processor Design16 Outline - Processor Implementation  Overview  Single-Cycle Implementation 1.Analyze instruction set; get datapath requirements 2.Select datapath components and  establish clocking methodology 3.Assemble datapath that meets requirements 4.Determine control signal values for each instruction 5.Assemble control logic to generate control signals  Multi-Cycle Implementation  Pipelined Implementation

17 ECE 313 Fall 2004Lecture 11 - Processor Design17 2. (a) Choose Datapath Components  Combinational Components  Adder  ALU  Multiplexer  Sign Extender  Storage Components  Registers  Register File  Memory

18 ECE 313 Fall 2004Lecture 11 - Processor Design18 Datapath Combinational Components NOTES: - Blue-green inputs are control lines - Blue lines often hidden to suppress detail AdderALU Multiplexer Sign Extender

19 ECE 313 Fall 2004Lecture 11 - Processor Design19 Datapath Storage - Registers  Registers store multiple bit values  New value loaded on clock edge when EN asserted

20 ECE 313 Fall 2004Lecture 11 - Processor Design20 Datapath Storage: Idealized Memory  Data Read  Place Address on ADDR  Assert MemRead  Data Available on RD after memory “access time”  Data Write  Place address on ADDR  Place data input on WD  Assert MemWrite  Data written on clock edge

21 ECE 313 Fall 2004Lecture 11 - Processor Design21 Datapath Storage: Register File  Register File - 32 registers (including $zero )  Two data outputs RD1, RD2  Assert register number RN1/RN2  Read output RD1/RD2 after “access time” (propagation delay)  One data input WD  Assert register number WN  Assert value on WD  Assert RegWrite  Value loaded on clock edge  Implemented as a small multiport memory

22 ECE 313 Fall 2004Lecture 11 - Processor Design22 2. (b) Choose Clocking Methodology  Clocking methodology defines  When signals can be read from storage elements  When signals can be written to storage elements  Typical clocking methodologies  Single-Phase Edge Triggered  Single-Phase Level Triggered  Multiple-Phase Level Triggered  Authors’ choice: Single-Phase Edge Triggered  All registers updated on one edge of clock cycle  Simplest to work with

23 ECE 313 Fall 2004Lecture 11 - Processor Design23 Review: Edge-Triggered Clocking  Controls sequential circuit operation  Register outputs change after first clock edge  Combinational logic determines “next state”  Storage elements store new state on next clock edge Adder Mux Combinational LogicRegister Output Register Input Clock

24 ECE 313 Fall 2004Lecture 11 - Processor Design24 Review: Edge-Triggered Clocking  Propagation delay - t prop Logic (including register outputs) Interconnect  Register setup time - t setup Clock Adder Mux Combinational LogicRegister Output Register Input t prop t setup t clock > t prop + t setup t clock = t prop + t setup + t slack

25 ECE 313 Fall 2004Lecture 11 - Processor Design25 Outline - Processor Implementation  Overview  Single-Cycle Implementation 1.Analyze instruction set; get datapath requirements 2.Select datapath components and establish clocking methodology 3.Assemble datapath that meets requirements  4.Determine control signal values for each instruction 5.Assemble control logic to generate control signals  Multi-Cycle Implementation  Pipelined Implementation

26 ECE 313 Fall 2004Lecture 11 - Processor Design26 3. Assemble Datapath  Tasks processor must implement 1.Fetch Instruction from memory 2.Decode Instruction, read register values 3.If necessary, perform an ALU operation 4.If memory address, perform load/store 5.Write results back to register file and increment PC  How can we do this with the datapath hardware?

27 ECE 313 Fall 2004Lecture 11 - Processor Design27 Datapath for Instruction Fetch Instruction <- MEM[PC] PC <- PC + 4

28 ECE 313 Fall 2004Lecture 11 - Processor Design28 Datapath for R-Type Instructions add rd, rs, rt R[rd] <- R[rs] + R[rt];

29 ECE 313 Fall 2004Lecture 11 - Processor Design29 Datapath for Load/Store Instructions lw rt, offset(rs) R[rt] <- MEM[R[rs] + s_extend(offset)];

30 ECE 313 Fall 2004Lecture 11 - Processor Design30 Datapath for Load/Store Instructions sw rt, offset(rs) MEM[R[rs] + sign_extend(offset)] <- R[rt]

31 ECE 313 Fall 2004Lecture 11 - Processor Design31 Datapath for Branch Instructions beq rs, rt, offset if (R[rs] == R[rt]) then PC <- PC+4 + s_extend(offset<<2)

32 ECE 313 Fall 2004Lecture 11 - Processor Design32 Putting it all together…  Goal: merge datapaths for each function  Instruction Fetch  R-Type Instructions  Load/Store Instructions  Branch instructions  Add multiplexers to steer data as needed

33 ECE 313 Fall 2004Lecture 11 - Processor Design33 Example: combine R-Type and Load/Store Datapaths  Select an ALU input from either Register File output RD2 (for R-Type) Sign-extender output (for LW/SW)  Select Register File input WD1 from either ALU output (for R-Type) Memory output RD (for LW)

34 ECE 313 Fall 2004Lecture 11 - Processor Design34 Combined Datapath: R-Type and Load/Store Instructions

35 ECE 313 Fall 2004Lecture 11 - Processor Design35 Combined Datapath: Executing a R-Type Instruction add rd,rs,rt

36 ECE 313 Fall 2004Lecture 11 - Processor Design36 Combined Datapath: Executing a load instruction lw rt,offset(rs)

37 ECE 313 Fall 2004Lecture 11 - Processor Design37 Combined Datapath: Executing a store instruction sw rt,offset(rs)

38 ECE 313 Fall 2004Lecture 11 - Processor Design38 Complete Single-Cycle Datapath

39 ECE 313 Fall 2004Lecture 11 - Processor Design39 Complete Datapath Executing add add rd, rs, rt

40 ECE 313 Fall 2004Lecture 11 - Processor Design40 Complete Datapath Executing load lw rt,offset(rs)

41 ECE 313 Fall 2004Lecture 11 - Processor Design41 Complete Datapath Executing store sw rt,offset(rs)

42 ECE 313 Fall 2004Lecture 11 - Processor Design42 Complete Datapath Executing branch beq r1,r2,offset

43 ECE 313 Fall 2004Lecture 11 - Processor Design43 Refining the Complete Datapath  Depending on the instruction, register file input WN is fed by different fields of the instruction  R-Type Instructions: rd field (bits 15:11)  Load Instructin: rt field (bits 21:16)  Result: need an additional multiplexer on WN input oprsrtoffset 6 bits5 bits 16 bits oprsrtrdfunctshamt 6 bits5 bits 6 bits R-Format I-Format

44 ECE 313 Fall 2004Lecture 11 - Processor Design44 Complete Datapath (Refined)

45 ECE 313 Fall 2004Lecture 11 - Processor Design45 Complete Single-Cycle Datapath Control signals shown in blue

46 ECE 313 Fall 2004Lecture 11 - Processor Design46 Outline - Processor Implementation  Overview  Single-Cycle Implementation 1.Analyze instruction set; get datapath requirements 2.Select datapath components and establish clocking methodology 3.Assemble datapath that meets requirements 4.Determine control signal values for each instruction  5.Assemble control logic to generate control signals  Multi-Cycle Implementation  Pipelined Implementation

47 ECE 313 Fall 2004Lecture 11 - Processor Design47 Control Unit Design  Desired function:  Given an instruction word….  Generate control signals needed to execute instruction  Implemented as a combinational logic function:  Inputs Instruction word - op and funct fields ALU status output - Zero  Outputs - processor control points ALU control signals Multiplexer control signals Register File & memory control signal

48 ECE 313 Fall 2004Lecture 11 - Processor Design48 Determining Control Points  For each instruction type, determine proper value for each control point (control signal) 00 11  X ( don’t care - either 1 or 0 )  Ultimately … use these values to build a truth table

49 ECE 313 Fall 2004Lecture 11 - Processor Design49 Review: ALU Control Signals  Functions: Fig B.5.13 (also in Ch. 5 - p. 301) ALU control inputFunction 000AND 001OR 010add 110subtract 111set on less than

50 ECE 313 Fall 2004Lecture 11 - Processor Design50 Control Signals - R-Type Instruction Control signals shown in blue 1 0 0 0 1 ??? Value depends on funct 0 0

51 ECE 313 Fall 2004Lecture 11 - Processor Design51 0 Control Signals - lw Instruction Control signals shown in blue 0 010 1 1 1 0 1

52 ECE 313 Fall 2004Lecture 11 - Processor Design52 0 Control Signals - sw Instruction Control signals shown in blue X 010 1 X 0 1 0

53 ECE 313 Fall 2004Lecture 11 - Processor Design53 Control Signals - beq Instruction Control signals shown in blue X 110 0 X 0 0 0 1 if Zero=1

54 ECE 313 Fall 2004Lecture 11 - Processor Design54 Outline - Processor Implementation  Overview  Single-Cycle Implementation 1.Analyze instruction set; get datapath requirements 2.Select datapath components and establish clocking methodology 3.Assemble datapath that meets requirements 4.Determine control signal values for each instruction 5.Assemble control logic to generate control signals   Multi-Cycle Implementation  Pipelined Implementation

55 ECE 313 Fall 2004Lecture 11 - Processor Design55 Control Unit Structure

56 ECE 313 Fall 2004Lecture 11 - Processor Design56 More notes about Control Unit Structure  Control unit as shown: one huge logic block  Idea: decompose into smaller logic blocks  Smaller blocks can be faster  Smaller blocks are easier to work with  Observation (rephrased):  The only control signal that depends on the funct field is the ALU Operation signal  Idea: separate logic for ALU control

57 ECE 313 Fall 2004Lecture 11 - Processor Design57 Modified Control Unit Structure This is called “derived control” or “Local decoding”

58 ECE 313 Fall 2004Lecture 11 - Processor Design58 Datapath with Modified Control Unit

59 ECE 313 Fall 2004Lecture 11 - Processor Design59 Review from Ch. 4: ALU Function  Functions: Fig B.5.13 (also in Ch. 5 - p. 301) ALU control inputFunction 000AND 001OR 010add 110subtract 111set on less than

60 ECE 313 Fall 2004Lecture 11 - Processor Design60 ALU Usage in Processor Design  Usage depends on instruction type  Instruction type (specified by opcode)  funct field (r-type instructions only)  Encode instruction type in ALUOp signal OperationDesired Action lwadd swadd beqsubtract add subsubtract and or slt and or set on less than ALU Ctl. 010 110 010 110 000 001 111 funct XXXXXX 100000 100010 100100 100101 101010 Instr. type data transfer branch r-type ALUOp 00 01 10 XXXXXX means “don’t care”

61 ECE 313 Fall 2004Lecture 11 - Processor Design61 ALU Control - Truth Table (Fig. 5-13)  Use don’t care values to minimize length  Ignore F5, F4 (they are always “10”)  Assume ALUOp never equals “11” Operation 010 110 010 110 000 001 111 ALUOp1 0 X 1 1 1 1 1 ALUOp0 0 1 X X X X X F5 X X F4 X F3 X 0 0 0 0 1 F2 X 0 0 1 1 0 F1 X 0 1 0 0 1 F0 X 0 0 0 1 0 XXXXX XX XX XX XX XX

62 ECE 313 Fall 2004Lecture 11 - Processor Design62 ALU Control - Implementation  Figure C.2.3, page C-6

63 ECE 313 Fall 2004Lecture 11 - Processor Design63 One More Modification - for Branch  BEQ instruction depends on Zero output of ALU  No other instruction uses Zero output  Local decoding  Implement with new "Branch" control signal  Add AND gate to generate PCSelect

64 ECE 313 Fall 2004Lecture 11 - Processor Design64 Processor Design - Branch Modification

65 ECE 313 Fall 2004Lecture 11 - Processor Design65 Control Unit Implementation  Review: Opcodes for key instructions  Control Unit Truth Table: Fill in the blanks (or see Fig. 5-18, p. 308)  Implementation: Decoder + 2 Gates (Fig. C.2.5) Op5Op4Op3Op2Op1Op0 RegDstALUSrcMemtoRegRegWriteMemReadMemWriteBranchALUOp1ALUOp0 000000 100011 101011 000100 OP RT lw sw beq InputOutput

66 ECE 313 Fall 2004Lecture 11 - Processor Design66 Control Unit Implementation Source: Tod Amon's COD2e Slides ©Morgan Kaufmann Publishers

67 ECE 313 Fall 2004Lecture 11 - Processor Design67 Final Extension: Implementing j (jump)  Instruction Format  Register Transfer: PC <- (PC + 4)[31:28] @ ( I[25:0] << 2 )  Remember, it’s unconditional 000010address 6 bits26 bits J-Format

68 ECE 313 Fall 2004Lecture 11 - Processor Design68 Final Extension: Implementing jump

69 ECE 313 Fall 2004Lecture 11 - Processor Design69 The Problem with Single-Cycle Processor Implementation: Performance  Performance is limited by the slowest instruction  Example: suppose we have the following delays  Memory read/write200ps  ALU and adders100ps  Register File read/write50ps  What is the critical path for each instruction?  R-format200 + 50 + 100 + 0 + 50400ps  Load word200 + 50 + 100 + 200 + 50600ps  Store word200 + 50 + 100 + 200550ps  Branch200 + 50 + 100350ps  Jump200200ps

70 ECE 313 Fall 2004Lecture 11 - Processor Design70 Alternatives to Single-Cycle  Multicycle Processor Implementation  Shorter clock cycle  Multiple clock cycles per instruction  Some instructions take more cycles then others  Less hardware required  Pipelined Implementation  Overlap execution of instructions  Try to get short cycle times and low CPI  More hardware required … but also more performance!

71 ECE 313 Fall 2004Lecture 11 - Processor Design71 Outline - Processor Implementation  Overview  Single-Cycle Implementation  Multi-Cycle Implementation  1.Analyze instruction set; get datapath requirements 2.Select datapath components and establish clocking methodology 3.Assemble datapath that meets requirements 4.Determine control signal values for each instruction 5.Assemble control logic to generate control signals  Pipelined Implementation


Download ppt "Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania 18042 ECE 313 - Computer Organization Lecture 11 - Processor."

Similar presentations


Ads by Google