Lecture 14: Processors CS 2011 Fall 2014, Dr. Rozier
BOMB LAB STATUS
MP2
Lab Phases: Recursive
– Phase 1: Factorial
– Phase 2: Fibonacci
Lab Phases: Arrays
– Phase 4: Sum Array
– Phase 5: Find Item
– Phase 6: Bubble Sort
Lab Phases: Trees
– Array representation: [1,2,3,4,5,6,7,0,0,0,0,0,0,0,0]
– Phase 7: Tree Height
– Phase 8: Tree Traversal [1,2,5,0,0,4,0,0,3,6,0,0,7,0,0]
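A minimal sketch of how the tree arrays can be read, under two assumptions that the slide does not state: the first array is a level-order (heap-style) layout where node i has children at 2i+1 and 2i+2, and the second is a preorder listing in which 0 marks an empty child. The function names are hypothetical. Under these readings both arrays encode 7-node trees of height 3.

```c
/* Assumed layout 1: level-order array, children of index i at 2i+1 and 2i+2,
   and a 0 entry means "no node here". Returns the number of levels. */
static int level_order_height(const int *a, int n, int i) {
    if (i >= n || a[i] == 0) return 0;
    int l = level_order_height(a, n, 2 * i + 1);
    int r = level_order_height(a, n, 2 * i + 2);
    return 1 + (l > r ? l : r);
}

/* Assumed layout 2: preorder listing (node, left subtree, right subtree)
   with 0 marking an empty child. *pos is the next array slot to consume. */
static int preorder_height(const int *a, int n, int *pos) {
    if (*pos >= n) return 0;
    int v = a[(*pos)++];
    if (v == 0) return 0;
    int l = preorder_height(a, n, pos);
    int r = preorder_height(a, n, pos);
    return 1 + (l > r ? l : r);
}
```

Both versions are recursive, which mirrors how the tree height and traversal phases are naturally expressed.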
PROCESSORS
What needs to be done to “Process” an Instruction?
– Check the PC
– Fetch the instruction from memory
– Decode the instruction and set control lines appropriately
– Execute the instruction
  – Use ALU
  – Access Memory
  – Branch
– Store Results
– PC = PC + 4, or PC = branch target
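The steps listed above can be sketched as a software loop over a tiny MIPS subset (add, lw, sw, beq). This is an illustration under stated assumptions, not a real simulator: memories are small word arrays, an all-zero word is treated as "halt" (in real MIPS it is a nop), out-of-range data addresses wrap, and the helpers `enc_add` and `enc_i` are hypothetical assemblers for building test programs.

```c
#include <stdint.h>

/* Hypothetical toy machine sizes (in 32-bit words). */
#define IMEM_WORDS 64
#define DMEM_WORDS 64

typedef struct {
    uint32_t pc;                /* program counter (byte address) */
    uint32_t reg[32];           /* register file; reg[0] is $zero */
    uint32_t imem[IMEM_WORDS];  /* instruction memory */
    uint32_t dmem[DMEM_WORDS];  /* data memory */
} Cpu;

/* Sign-extend the low 16 bits of a word. */
static int32_t sext16(uint32_t word) { return (int32_t)(int16_t)(word & 0xFFFFu); }

/* Runs one instruction. Returns 0 (leaving the PC unchanged) on halt. */
static int step(Cpu *c) {
    /* Check the PC: word-aligned and inside instruction memory. */
    if (c->pc % 4 != 0 || c->pc / 4 >= IMEM_WORDS) return 0;

    /* Fetch. All-zero word = halt (assumption for this sketch). */
    uint32_t ir = c->imem[c->pc / 4];
    if (ir == 0) return 0;

    /* Decode into the MIPS R-format / I-format fields. */
    uint32_t op = ir >> 26, rs = (ir >> 21) & 31u, rt = (ir >> 16) & 31u;
    uint32_t rd = (ir >> 11) & 31u, funct = ir & 63u;
    int32_t imm = sext16(ir);
    uint32_t next_pc = c->pc + 4;              /* default: PC = PC + 4 */

    /* Execute and store results. */
    switch (op) {
    case 0x00:                                 /* R-format: only add modeled */
        if (funct != 0x20) return 0;
        c->reg[rd] = c->reg[rs] + c->reg[rt];
        break;
    case 0x23:                                 /* lw rt, imm(rs); address wraps (assumption) */
        c->reg[rt] = c->dmem[((c->reg[rs] + (uint32_t)imm) / 4) % DMEM_WORDS];
        break;
    case 0x2B:                                 /* sw rt, imm(rs); address wraps (assumption) */
        c->dmem[((c->reg[rs] + (uint32_t)imm) / 4) % DMEM_WORDS] = c->reg[rt];
        break;
    case 0x04:                                 /* beq: target = PC + 4 + (imm << 2) */
        if (c->reg[rs] == c->reg[rt]) next_pc += (uint32_t)imm << 2;
        break;
    default:
        return 0;                              /* unmodeled opcode: stop */
    }
    c->reg[0] = 0;                             /* $zero always reads as 0 */
    c->pc = next_pc;                           /* PC = PC + 4, or branch target */
    return 1;
}

/* Hypothetical helpers that assemble instruction words for testing. */
static uint32_t enc_add(uint32_t rd, uint32_t rs, uint32_t rt) {
    return (rs << 21) | (rt << 16) | (rd << 11) | 0x20u;
}
static uint32_t enc_i(uint32_t op, uint32_t rs, uint32_t rt, int32_t imm) {
    return (op << 26) | (rs << 21) | (rt << 16) | ((uint32_t)imm & 0xFFFFu);
}
```

Calling `step` in a loop until it returns 0 runs a program; each call passes through every step of the list once, and the final PC update is where "PC + 4 or branch target" is chosen.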
CPU Overview
Can’t just join wires together; use multiplexers (Chapter 4, The Processor)
CPU + Control
Logic Design Basics
– Information encoded in binary
  – Low voltage = 0, High voltage = 1
  – One wire per bit
  – Multi-bit data encoded on multi-wire buses
– Combinational element
  – Operates on data
  – Output is a function of input
– State (sequential) elements
  – Store information
Combinational Elements
– AND-gate: Y = A & B
– Multiplexer: Y = S ? I1 : I0
– Adder: Y = A + B
– Arithmetic/Logic Unit: Y = F(A, B)
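Each combinational element can be modeled as a pure function: the output depends only on the current inputs, with no stored state. A minimal sketch on 32-bit values; the ALU operation names and their encoding in `AluOp` are hypothetical, chosen only to show that F selects the function applied to A and B.

```c
#include <stdint.h>

/* AND-gate, applied bitwise across a 32-bit bus. */
static uint32_t and_gate(uint32_t a, uint32_t b) { return a & b; }

/* 2-to-1 multiplexer: Y = S ? I1 : I0 */
static uint32_t mux2(uint32_t s, uint32_t i0, uint32_t i1) { return s ? i1 : i0; }

/* Adder: Y = A + B, wrapping modulo 2^32 like a 32-bit hardware adder. */
static uint32_t adder(uint32_t a, uint32_t b) { return a + b; }

/* ALU: Y = F(A, B). The operation set and encoding are hypothetical. */
typedef enum { ALU_AND, ALU_OR, ALU_ADD, ALU_SUB, ALU_SLT } AluOp;

static uint32_t alu(AluOp f, uint32_t a, uint32_t b) {
    switch (f) {
    case ALU_AND: return a & b;
    case ALU_OR:  return a | b;
    case ALU_ADD: return a + b;
    case ALU_SUB: return a - b;
    case ALU_SLT: return (int32_t)a < (int32_t)b; /* signed set-on-less-than */
    }
    return 0;
}
```

Because none of these functions keeps state between calls, calling one twice with the same inputs always gives the same output, which is exactly what separates combinational elements from the state elements that follow.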
Storing Data?
S-R Latch
– S: set
– R: reset
– Feedback keeps the bit “trapped”.
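The "trapped" bit can be illustrated in software by modeling the latch as two cross-coupled NOR gates and re-evaluating them until the feedback loop stops changing. This is a sketch under a simplifying assumption: the gates are evaluated one after the other, whereas real gates switch concurrently, so the S = R = 1 case (not allowed) is not modeled faithfully.

```c
#include <stdbool.h>

typedef struct { bool q, qbar; } SrLatch;

static bool nor_gate(bool a, bool b) { return !(a || b); }

/* Apply inputs S and R, then re-evaluate Q = NOR(R, ~Q) and
   ~Q = NOR(S, Q) until the outputs stop changing. */
static void sr_apply(SrLatch *l, bool s, bool r) {
    for (int i = 0; i < 4; i++) {
        bool q = nor_gate(r, l->qbar);
        bool qbar = nor_gate(s, q);
        if (q == l->q && qbar == l->qbar) break; /* feedback loop is stable */
        l->q = q;
        l->qbar = qbar;
    }
}
```

With S = R = 0 the loop settles immediately on whatever Q already held, which is the "hold" behavior: the value survives only because each gate's output feeds the other's input.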
S-R Latch
Characteristic Table:
S R | Q_next | Action
0 0 | Q      | hold
0 1 | 0      | reset
1 0 | 1      | set
1 1 | X      | N/A
Excitation Table:
Q Q_next | S R
0 0      | 0 X
0 1      | 1 0
1 0      | 0 1
1 1      | X 0
D Flip-Flop: In the S-R latch, every state change (set or reset) has S equal to the complement of R.
D Flip-Flop: Feed D and ~D to a gated S-R latch to create a one-input synchronous S-R latch. We’ll call it a D flip-flop, just to be difficult.
D Flip-Flop
– D: input signal
– E: enable signal, sometimes called clock or control
E/C | D | Q      | ~Q      | Notes
0   | X | Q_prev | ~Q_prev | No change
1   | 0 | 0      | 1       | Reset
1   | 1 | 1      | 0       | Set
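A minimal sketch of the construction: the enable gates D and ~D into the S and R inputs (S = E & D, R = E & ~D) of a NOR S-R latch, so S and R can never both be 1. As with any sketch of this kind, the gates are evaluated sequentially rather than concurrently; note also that this model is level-sensitive (Q follows D for as long as E is 1), which is why the clocked, edge-triggered register comes next.

```c
#include <stdbool.h>

typedef struct { bool q, qbar; } Latch;

static bool nor_gate(bool a, bool b) { return !(a || b); }

/* Gated D latch: E = 0 holds, E = 1 copies D into Q. */
static void d_latch(Latch *l, bool e, bool d) {
    bool s = e && d;   /* set only when enabled and D = 1 */
    bool r = e && !d;  /* reset only when enabled and D = 0 */
    for (int i = 0; i < 4; i++) {
        bool q = nor_gate(r, l->qbar);
        bool qbar = nor_gate(s, q);
        if (q == l->q && qbar == l->qbar) break; /* outputs settled */
        l->q = q;
        l->qbar = qbar;
    }
}
```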
Adding the Clock
More Realistic
Register File
Sequential Elements
– Register: stores data in a circuit
  – Uses a clock signal to determine when to update the stored value
  – Edge-triggered: update when Clk changes from 0 to 1
Sequential Elements
– Register with write control
  – Only updates on clock edge when write control input is 1
  – Used when stored value is required later
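A minimal sketch of an edge-triggered register with write control, assuming the model is told the clock level at each sample point. The register remembers the previous clock level so it can detect the 0 to 1 transition; D is captured only on that rising edge and only when Write is 1. The type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t q;     /* stored value, visible on output Q */
    bool prev_clk;  /* clock level at the previous sample */
} Register;

/* Sample the inputs. Q changes only on a rising clock edge with write = 1. */
static void reg_tick(Register *r, bool clk, bool write, uint32_t d) {
    bool rising_edge = !r->prev_clk && clk;
    if (rising_edge && write) r->q = d;
    r->prev_clk = clk;
}
```

Unlike the gated latch, holding the clock at 1 does not let later changes on D through: only the instant of the edge matters.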
Clocking Methodology
– Combinational logic transforms data during clock cycles
  – Between clock edges
  – Input from state elements, output to state element
  – Longest delay determines clock period
Building a Datapath
– Datapath
  – Elements that process data and addresses in the CPU
  – Registers, ALUs, muxes, memories, …
– We will build a MIPS datapath incrementally
  – Refining the overview design
Pipeline [figure stages: Fetch, Decode, Issue, Integer, Multiply, Floating Point, Load Store, Write Back]
Instruction Fetch
– PC: 32-bit register
– Increment by 4 for next instruction
ALU
– Read two register operands
– Perform arithmetic/logical operation
– Write register result
Load/Store Instructions
– Read register operands
– Calculate address
– Load: Read memory and update register
– Store: Write register value to memory
Branch Instructions?
Datapath With Control
ALU Instruction
Load Instruction
Branch-on-Equal Instruction
Performance Issues
– Longest delay determines clock period
  – Critical path: load instruction
  – Instruction memory → register file → ALU → data memory → register file
– Not feasible to vary period for different instructions
– Violates design principle
  – Making the common case fast
– We will improve performance by pipelining
Pipelining Analogy (§4.5 An Overview of Pipelining)
– Pipelined laundry: overlapping execution
  – Parallelism improves performance
– Four loads: Speedup = 8/3.5 ≈ 2.3
– Non-stop: Speedup = 2n/0.5n ≈ 4 = number of stages
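The laundry figures follow from one assumption implied by the numbers 8 and 0.5n: each load takes 2 hours, split into four stages of 0.5 hours. Without pipelining, n loads take 2n hours; with pipelining, the first load takes 2 hours and each later load finishes 0.5 hours after the previous one.

$$
\begin{aligned}
T_{\text{sequential}}(n) &= 2n\ \text{h}, \qquad T_{\text{pipelined}}(n) = 2 + 0.5(n-1) = 0.5n + 1.5\ \text{h}\\
n = 4:&\quad \text{Speedup} = \frac{8}{3.5} \approx 2.3\\
n \to \infty:&\quad \text{Speedup} = \frac{2n}{0.5n + 1.5} \to 4
\end{aligned}
$$

The fill time (1.5 h) is why four loads only reach 2.3; for a long non-stop run it becomes negligible and the speedup approaches the number of stages.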
MIPS Pipeline
– Five stages, one step per stage
  1. IF: Instruction fetch from memory
  2. ID: Instruction decode & register read
  3. EX: Execute operation or calculate address
  4. MEM: Access memory operand
  5. WB: Write result back to register
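Assuming one instruction enters the pipeline per cycle and nothing stalls (no hazards are modeled), instruction i (counting from 0) is in stage number (cycle − i) during a given cycle. A minimal sketch that uses this to print the familiar staircase diagram for the five stages listed above; the function names are hypothetical.

```c
#include <stdio.h>

static const char *const STAGES[5] = {"IF", "ID", "EX", "MEM", "WB"};

/* Stage occupied by instruction `instr` during `cycle`, or "" if the
   instruction has not entered or has already left the pipeline. */
static const char *stage_of(int instr, int cycle) {
    int s = cycle - instr;
    return (s >= 0 && s < 5) ? STAGES[s] : "";
}

/* Print one row per instruction; n instructions need n + 4 cycles. */
static void print_diagram(int n_instr) {
    int cycles = n_instr + 4;
    for (int i = 0; i < n_instr; i++) {
        printf("i%d:", i);
        for (int c = 0; c < cycles; c++) printf(" %-3s", stage_of(i, c));
        printf("\n");
    }
}
```

Reading any column of the output shows up to five different instructions in flight at once, one per stage.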
Pipeline Performance
– Assume time for stages is
  – 100 ps for register read or write
  – 200 ps for other stages
– Compare pipelined datapath with single-cycle datapath
Instr    | Instr fetch | Register read | ALU op | Memory access | Register write | Total time
lw       | 200 ps      | 100 ps        | 200 ps | 200 ps        | 100 ps         | 800 ps
sw       | 200 ps      | 100 ps        | 200 ps | 200 ps        |                | 700 ps
R-format | 200 ps      | 100 ps        | 200 ps |               | 100 ps         | 600 ps
beq      | 200 ps      | 100 ps        | 200 ps |               |                | 500 ps
Pipeline Performance
– Single-cycle ($T_c$ = 800 ps)
– Pipelined ($T_c$ = 200 ps)
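The two clock periods can be derived directly from the stage latencies in the timing table: a single-cycle design must fit the slowest whole instruction (lw, the critical path), while a pipelined design must fit only the slowest single stage. A minimal sketch; the run-time formulas assume an ideal five-stage pipeline with no stalls or hazards, and all names are hypothetical.

```c
/* Stage latencies in ps per instruction class (0 = stage unused). */
enum { S_IF, S_REG_READ, S_ALU, S_MEM, S_REG_WRITE, N_STAGES };
enum { LW, SW, RFMT, BEQ, N_CLASSES };

static const int DELAY[N_CLASSES][N_STAGES] = {
    [LW]   = {200, 100, 200, 200, 100},
    [SW]   = {200, 100, 200, 200,   0},
    [RFMT] = {200, 100, 200,   0, 100},
    [BEQ]  = {200, 100, 200,   0,   0},
};

/* Latency of one instruction class when its stages run back to back. */
static int total_ps(int cls) {
    int t = 0;
    for (int s = 0; s < N_STAGES; s++) t += DELAY[cls][s];
    return t;
}

/* Single-cycle clock period: the slowest instruction. */
static int single_cycle_tc(void) {
    int tc = 0;
    for (int c = 0; c < N_CLASSES; c++)
        if (total_ps(c) > tc) tc = total_ps(c);
    return tc;
}

/* Pipelined clock period: the slowest individual stage. */
static int pipelined_tc(void) {
    int tc = 0;
    for (int c = 0; c < N_CLASSES; c++)
        for (int s = 0; s < N_STAGES; s++)
            if (DELAY[c][s] > tc) tc = DELAY[c][s];
    return tc;
}

/* Time for n instructions (ideal pipeline: n + 4 cycles to drain). */
static long single_cycle_time(long n) { return n * single_cycle_tc(); }
static long pipelined_time(long n) { return (n + N_STAGES - 1) * pipelined_tc(); }
```

For three instructions this gives 2400 ps single-cycle versus 1400 ps pipelined; the register stages still occupy a full 200 ps slot in the pipeline even though they need only 100 ps.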
Pipeline Speedup
– If all stages are balanced
  – i.e., all take the same time
  – $\text{Time between instructions}_{\text{pipelined}} = \dfrac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of stages}}$
– If not balanced, speedup is less
– Speedup due to increased throughput
  – Latency (time for each instruction) does not decrease
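A worked example with the stage times from the pipeline performance table (800 ps single-cycle period, five stages), ignoring pipeline register overhead and fill time. If the stages were perfectly balanced, each would take one fifth of 800 ps; because the stages are not balanced, the clock must fit the slowest 200 ps stage, so the speedup falls short of 5:

$$
\begin{aligned}
\text{Balanced:}&\quad \frac{800\ \text{ps}}{5} = 160\ \text{ps}, && \text{Speedup} = \frac{800}{160} = 5\\
\text{Actual:}&\quad T_c = \max(200, 100, 200, 200, 100)\ \text{ps} = 200\ \text{ps}, && \text{Speedup} = \frac{800}{200} = 4\\
\text{Latency:}&\quad 5 \times 200\ \text{ps} = 1000\ \text{ps} > 800\ \text{ps}
\end{aligned}
$$

The last line shows the throughput/latency distinction: each individual instruction actually takes longer to complete, yet one instruction finishes every 200 ps.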
WRAP UP
For next time
– Homework Exercises: 3.4.2,
  – Due Tuesday 11/4
– Read Chapter