Chapter 5 Processor Design. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides We're ready to look at an implementation of the MIPS Simplified.

Chapter 5 Processor Design

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides We're ready to look at an implementation of the MIPS Simplified to contain only: memory-reference instructions: lw, sw arithmetic-logical instructions: add, sub, and, or, slt control flow instructions: beq, j Generic Implementation: use the program counter (PC) to supply instruction address get the instruction from memory read registers use the instruction to decide exactly what to do All instructions use the ALU after reading the registers Why? memory-reference? arithmetic? control flow? The Processor: Datapath & Control

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Abstract / Simplified View: Two types of functional units: elements that operate on data values (combinational) elements that contain state (sequential) More Implementation Details

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Our Implementation An edge triggered methodology Typical execution: read contents of some state elements, send values through some combinational logic write results to one or more state elements

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Built using D flip-flops Register File Do you understand? What is the “Mux” above?

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Register File Note: we still use the real clock to determine when to write

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Simple Implementation Include the functional units we need for each instruction Why do we need this stuff?

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Information flow – add instruction PC Address bus Data bus Instruction memory Read register 1 Read register 2 Write register Write data Read data 1 Read data 2 ALU rs rt rd Register operands Rd = Rs + Rt Instruction address Instruction code Register file 4 + Write register Write PC Function select 000000 rs rt rd shamt 100000 opcode function

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Information flow – lw instruction PC Address bus Data bus Instruction memory Read register 1 Read register 2 Write register Write data Read data 1 Read data 2 ALU rs rt rd Register file 4 + Write register Write PC Function select Address bus Data out Data memory Sign extend Read memory 16-bit displacement 32-bit displacement 100011 rs rt displacement opcode Rt = Mem[Rs+disp] operand address

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Information flow – sw instruction PC Address bus Data bus Instruction memory Read register 1 Read register 2 Write register Write data Read data 1 Read data 2 ALU rs Register file 4 + Write register Write PC Function select Address bus Data out Data memory Sign extend Read memory 16-bit constant 32-bit displacement Write memory Data in 101011 rs rt displacement opcode Mem[Rs+disp] = Rt rt rd

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Information flow – beq instruction PC Address bus Data bus Instruction memory Read register 1 Read register 2 Write register Write data Read data 1 Read data 2 ALU rs Register file 4 + Write register Write PC Function select Address bus Data out Data memory Sign extend Read memory Write memory Data in 000100 rs rt displacement Shift x 2 + branch address zero pc + 4 pc offset (shifted) rt rd

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Control Unit Selecting the operations to perform (ALU, read/write, etc.) Controlling the flow of data (multiplexor inputs) Information comes from the 32 bits of the instruction Example: add $8, $17, $18 Instruction Format: 000000 10001 10010 01000 00000 100000 op rs rtrd shamtfunct ALU's operation based on instruction type & function code

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Ex., what should the ALU do with this instruction lw $1, 100($2) 35 2 1 100 op rs rt 16 bit offset ALU control input 0000 AND 0001OR 0010add 0110subtract 0111set-on-less-than 1100NOR Refer to previous mux-based ALU design for control coding ALU Function Control

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Must describe hardware to compute 4-bit ALU control input given instruction type 00 = lw, sw 01 = beq, 10 = arithmetic derive function code for arithmetic Describe it using a truth table (can turn into gates): ALUOp computed from instruction type ALU Function Control

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Control Unit Design – Identify Control Points Text – Fig. 5.24

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Control Logic

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Performance of Single Cycle Implementation Cycle time determined by length of the longest path Calculate cycle time assuming negligible delays except: memory (200ps), ALU/adders (100ps), register file access (50ps) Instruction Class Instruction Memory Register Read ALU Operation Data Memory Register WriteTotal R-type20050100050400 ps LW2005010020050600 ps SW20050100200550 ps BEQ20050100350 ps J200200 ps

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Muticycle Approach Single Cycle Problems: what if we had a more complicated instruction like floating point? wasteful of area (ex. multiple adder circuits) clock cycle must accommodate longest delay path (wasted time for many instructions) One Solution: partition hardware into multiple stages use a “smaller” cycle time – to accommodate longest “stage” have different instructions take different numbers of cycles reuse components for different functions in different cycles use one ALU for instruction operands and next PC value use one memory for instructions and data use a finite state machine for control

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Break up the instructions into steps, each step takes a cycle balance the amount of work to be done restrict each cycle to use only one major functional unit At the end of a cycle store values for use in later cycles (easiest thing to do) introduce additional “internal” registers Multicycle Approach

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Instructions from ISA perspective Consider each instruction from perspective of ISA. Example: The add instruction changes a register. Register specified by bits 15:11 of instruction. Instruction specified by the PC. New value is the sum (“op”) of two registers. Registers specified by bits 25:21 and 20:16 of the instruction Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] op Reg[Memory[PC][20:16]] In order to accomplish this we must break up the instruction. (kind of like introducing variables when programming)

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Breaking down an instruction ISA definition of arithmetic: Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] op Reg[Memory[PC][20:16]] Could break down to: IR <= Memory[PC] A <= Reg[IR[25:21]] B <= Reg[IR[20:16]] ALUOut <= A op B Reg[IR[15:11]] <= ALUOut We forgot an important part of the instruction execution! PC <= PC + 4

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Idea behind multicycle approach We define each instruction from the ISA perspective (do this!) Break it down into steps following our rule that data flows through at most one major functional unit (e.g., balance work across steps) Introduce new registers as needed (e.g, A, B, ALUOut, MDR, etc.) Finally try and pack as much work into each step (avoid unnecessary cycles) while also trying to share steps where possible (minimizes control, helps to simplify solution) Result: Our book’s multicycle Implementation!

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Instruction Fetch Instruction Decode and Register Fetch Execution, Memory Address Computation, or Branch Completion Memory Access or R-type instruction completion Write-back step INSTRUCTIONS TAKE FROM 3 - 5 CYCLES! Five Execution Steps

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Use PC to get instruction and put it in the Instruction Register. Increment the PC by 4 and put the result back in the PC. Can be described succinctly using RTL "Register-Transfer Language" IR <= Memory[PC]; PC <= PC + 4; Can we figure out the values of the control signals? What is the advantage of updating the PC now? Step 1: Instruction Fetch

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Read registers rs and rt in case we need them Compute the branch address in case the instruction is a branch RTL: A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]]; ALUOut <= PC + (sign-extend(IR[15:0]) << 2); We aren't setting any control lines based on the instruction type (we are busy "decoding" it in our control logic) Step 2: Instruction Decode and Register Fetch

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides ALU is performing one of three functions, based on instruction type Memory Reference: ALUOut <= A + sign-extend(IR[15:0]); R-type: ALUOut <= A op B; Branch: if (A==B) PC <= ALUOut; Step 3 (instruction dependent)

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Loads and stores access memory MDR <= Memory[ALUOut]; or Memory[ALUOut] <= B; R-type instructions finish Reg[IR[15:11]] <= ALUOut; The write actually takes place at the end of the cycle on the edge Step 4 (R-type or memory-access)

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Reg[IR[20:16]] <= MDR; Which instruction needs this? Step 5 (Write-back step)

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Summary:

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Complete Multi-Cycle Datapath Text – Fig. 5.28

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Finite state machines: a set of states and next state function (determined by current state and the input) output function (determined by current state and possibly input) Review: finite state machines We’ll use a Moore machine (output based only on current state) Combinational Logic Memory Outputs (control Signals) Next State Present State Inputs (opcode)

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Value of control signals is dependent upon: what instruction is being executed which step is being performed Use the information we’ve accumulated to specify a finite state machine specify the finite state machine graphically, or use microprogramming Implementation can be derived from specification Implementing the Control

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides State transitions next state is function of current state and inputs Timing control signals load registers/memory asserted if name only otherwise not asserted Path setup control signals control mux’s and ALU exact value listed don’t care if not listed Graphical Specification of FSM Text – Fig. 5.38

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Fetch & decode cycles

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Execute LW/SW

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Execute R-type

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Execute BEQ & J

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Implementation: Finite State Machine for Control

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides PLA Implementation

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides ROM = "Read Only Memory" values of memory locations are fixed ahead of time 2 10 x 20 = 20K bits ROM can be used to implement the controller truth table m = 10 address inputs: 6 opcode + 4 state bits n = 20 outputs: 16 datapath-control signals + 4 state bits Rather wasteful, since for lots of the entries, the outputs are the same — i.e., opcode is often ignored ROM Implementation mn 0 0 0 0 0 1 1 0 0 1 1 1 0 0 0 1 0 1 1 0 0 0 1 1 1 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 1 1 1 0 0 1 1 0 1 1 1 0 1 1 1

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Complex instructions: the "next state" is often current state + 1 Another Implementation Style

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Details Dispatch ROM 1Dispatch ROM 2 OpOpcode nameValueOpOpcode nameValue 000000R-format0110100011 lw 0011 000010 jmp 1001101011 sw 0101 000100 beq 1000 100011 lw 0010 101011 sw 0010

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Microprogramming

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides A specification methodology appropriate if hundreds of opcodes, modes, cycles, etc. signals specified symbolically using microinstructions Will two implementations of the same architecture have the same microcode? What would a microassembler do? Microprogramming

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Microinstruction format

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides No encoding: 1 bit for each datapath operation faster, requires more memory (logic) used for Vax 780 — an astonishing 400K of memory! Lots of encoding: send the microinstructions through logic to get control signals uses less memory, slower Historical context of CISC: Too much logic to put on a single chip with everything else Use a ROM (or even RAM) to hold the microcode It’s easy to add new instructions Maximally vs. Minimally Encoded

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Microcode: Trade-offs Distinction between specification and implementation is sometimes blurred Specification Advantages: Easy to design and write Design architecture and microcode in parallel Implementation (off-chip ROM) Advantages Easy to change since values are in memory Can emulate other architectures Can make use of internal registers Implementation Disadvantages, SLOWER now that: Control is implemented on same chip as processor ROM is no longer faster than RAM No need to go back and make changes

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Historical Perspective In the ‘60s and ‘70s microprogramming was very important for implementing machines This led to more sophisticated ISAs and the VAX In the ‘80s RISC processors based on pipelining became popular Pipelining the microinstructions is also possible! Implementations of IA-32 architecture processors since 486 use: “hardwired control” for simpler instructions (few cycles, FSM control implemented using PLA or random logic) “microcoded control” for more complex instructions (large numbers of cycles, central control store) The IA-64 architecture uses a RISC-style ISA and can be implemented without a large central control store

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Chapter 5 Summary If we understand the instructions… We can build a simple processor! If instructions take different amounts of time, multicycle is better Datapath implemented using: Combinational logic for arithmetic State holding elements to remember bits Control implemented using: Combinational logic for single-cycle implementation Finite state machine for multi-cycle implementation

Chapter 5 Processor Design. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides We're ready to look at an implementation of the MIPS Simplified.

Similar presentations

Presentation on theme: "Chapter 5 Processor Design. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides We're ready to look at an implementation of the MIPS Simplified."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Chapter 5 Processor Design. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides We're ready to look at an implementation of the MIPS Simplified.

Similar presentations

Presentation on theme: "Chapter 5 Processor Design. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides We're ready to look at an implementation of the MIPS Simplified."— Presentation transcript:

Similar presentations

About project

Feedback