Lecture 14: Processors
CS 2011, Fall 2014, Dr. Rozier
BOMB LAB STATUS
MP2
Lab Phases: Recursive
Phase 1 – Factorial
Phase 2 – Fibonacci
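The two recursive phases implement the standard recursive definitions. A minimal high-level sketch (Python here purely for illustration; the lab itself is written at a lower level, and its exact base-case conventions may differ):

```python
def factorial(n):
    # n! = n * (n-1)!, with base case 0! = 1! = 1
    return 1 if n <= 1 else n * factorial(n - 1)

def fibonacci(n):
    # fib(0) = 0, fib(1) = 1, fib(n) = fib(n-1) + fib(n-2)
    # (this base-case convention is an assumption)
    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)
```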
Lab Phases: Arrays
Phase 4 – Sum Array
Phase 5 – Find Item
Phase 6 – Bubble Sort
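Reference versions of the three array phases, as a sketch (the lab's actual calling conventions and return values are not shown here and may differ):

```python
def sum_array(a):
    # Phase 4: accumulate every element
    total = 0
    for x in a:
        total += x
    return total

def find_item(a, target):
    # Phase 5: linear search; returning -1 on failure is an assumption
    for i, x in enumerate(a):
        if x == target:
            return i
    return -1

def bubble_sort(a):
    # Phase 6: repeatedly swap adjacent out-of-order pairs
    n = len(a)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a
```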
Lab Phases: Trees
Array representation: [1,2,3,4,5,6,7,0,0,0,0,0,0,0,0]
Phase 7 – Tree Height
Phase 8 – Tree Traversal
[1,2,5,0,0,4,0,0,3,6,0,0,7,0,0]
(diagram of a tree with nodes 1–7 omitted)
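Assuming the first array is a level-order (heap-style) representation, where the children of index i live at 2i+1 and 2i+2 and 0 marks an absent node, tree height can be sketched as below. Both the indexing scheme and the "height of a single node is 1" convention are assumptions; the lab may define them differently.

```python
def tree_height(tree, i=0):
    # Level-order array: children of index i are at 2i+1 and 2i+2;
    # a 0 entry means "no node here".
    if i >= len(tree) or tree[i] == 0:
        return 0
    return 1 + max(tree_height(tree, 2 * i + 1),
                   tree_height(tree, 2 * i + 2))
```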
PROCESSORS
What needs to be done to “process” an instruction?
1. Check the PC
2. Fetch the instruction from memory
3. Decode the instruction and set control lines appropriately
4. Execute the instruction: use the ALU, access memory, or branch
5. Store results
6. PC = PC + 4, or PC = branch target
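The steps above can be sketched as a toy fetch/decode/execute loop. Everything below (the tuple "encoding", the opcode names, the register and memory dictionaries) is invented purely for illustration; only the byte-addressed PC stepping by 4 mirrors the slide.

```python
def run(imem, regs, dmem):
    pc = 0
    while pc // 4 < len(imem):
        op, a, b, dst = imem[pc // 4]        # fetch + decode
        if op == "add":
            regs[dst] = regs[a] + regs[b]    # execute: use ALU
            pc += 4                          # PC = PC + 4
        elif op == "lw":
            regs[dst] = dmem[regs[a] + b]    # execute: access memory
            pc += 4
        elif op == "beq":
            # branch: PC = branch target if equal, else PC + 4
            pc = dst if regs[a] == regs[b] else pc + 4
        else:
            break
    return regs
```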
CPU Overview
Can’t just join wires together; use multiplexers.
CPU + Control
Logic Design Basics
Information encoded in binary:
– Low voltage = 0, high voltage = 1
– One wire per bit
– Multi-bit data encoded on multi-wire buses
Combinational elements:
– Operate on data
– Output is a function of input
State (sequential) elements:
– Store information
Combinational Elements
AND gate: Y = A & B
Multiplexer: Y = S ? I1 : I0
Adder: Y = A + B
Arithmetic/Logic Unit (ALU): Y = F(A, B)
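Each combinational element is a pure function of its inputs, which makes them easy to model directly. A sketch (the ALU's small operation set is illustrative; a real ALU decodes control lines):

```python
def and_gate(a, b):
    # Y = A & B
    return a & b

def mux(s, i0, i1):
    # Y = S ? I1 : I0
    return i1 if s else i0

def adder(a, b):
    # Y = A + B
    return a + b

def alu(f, a, b):
    # Y = F(A, B): the function select chooses the operation
    ops = {"and": a & b, "or": a | b, "add": a + b, "sub": a - b}
    return ops[f]
```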
Storing Data?
S-R Latch
S – set
R – reset
Feedback keeps the bit “trapped”.
S-R Latch

Characteristic table:
S R | Q_next | Action
0 0 | Q      | hold
0 1 | 0      | reset
1 0 | 1      | set
1 1 | X      | N/A

Excitation table:
Q Q_next | S R
0 0      | 0 X
0 1      | 1 0
1 0      | 0 1
1 1      | X 0
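The characteristic table can be checked by simulating the usual cross-coupled NOR implementation of the latch until the feedback settles (a sketch; the fixed iteration count is an assumption, but a few rounds is plenty for two gates):

```python
def sr_latch(s, r, q):
    # Cross-coupled NOR gates; q is the currently stored bit.
    # S = R = 1 is the forbidden input combination (N/A above).
    qn = 1 - q
    for _ in range(4):                      # iterate until feedback settles
        q_new  = 0 if (r or qn) else 1      # Q  = NOR(R, ~Q)
        qn_new = 0 if (s or q_new) else 1   # ~Q = NOR(S, Q)
        q, qn = q_new, qn_new
    return q
```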
D Flip-Flop
Note in the S-R latch excitation table that, whenever the state changes, S is the complement of R.
D Flip-Flop
Feed D and ~D to a gated S-R latch to create a one-input synchronous S-R latch.
We’ll call it a D flip-flop, just to be difficult.
D Flip-Flop
D – input signal
E – enable signal, sometimes called clock or control

E/C D | Q      ~Q      | Notes
0   X | Q_prev ~Q_prev | No change
1   0 | 0      1       | Reset
1   1 | 1      0       | Set
Adding the Clock
More Realistic
Register File
Sequential Elements
Register: stores data in a circuit
– Uses a clock signal to determine when to update the stored value
– Edge-triggered: update when Clk changes from 0 to 1
Sequential Elements
Register with write control:
– Only updates on clock edge when write control input is 1
– Used when stored value is required later
Clocking Methodology
Combinational logic transforms data during clock cycles
– Between clock edges
– Input from state elements, output to state elements
– Longest delay determines clock period
Building a Datapath
Datapath: the elements that process data and addresses in the CPU
– Registers, ALUs, muxes, memories, …
We will build a MIPS datapath incrementally, refining the overview design
Pipeline
Fetch → Decode → Issue → (Integer | Multiply | Floating Point | Load | Store) → Write Back
Instruction Fetch
The PC is a 32-bit register
Increment by 4 for the next instruction
ALU
Read two register operands
Perform arithmetic/logical operation
Write register result
Load/Store Instructions
Read register operands
Calculate address
Load: read memory and update register
Store: write register value to memory
Branch Instructions?
Datapath With Control
ALU Instruction
Load Instruction
Branch-on-Equal Instruction
Performance Issues
Longest delay determines clock period
– Critical path: the load instruction
– Instruction memory → register file → ALU → data memory → register file
Not feasible to vary the clock period for different instructions
Violates the design principle of making the common case fast
We will improve performance by pipelining
Pipelining Analogy
Pipelined laundry: overlapping execution
– Parallelism improves performance
Four loads: Speedup = 8/3.5 ≈ 2.3
Non-stop: Speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages
MIPS Pipeline
Five stages, one step per stage:
1. IF: Instruction fetch from memory
2. ID: Instruction decode & register read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result back to register
Pipeline Performance
Assume time for stages is:
– 100 ps for register read or write
– 200 ps for other stages
Compare pipelined datapath with single-cycle datapath:

Instr    | Instr fetch | Register read | ALU op | Memory access | Register write | Total time
lw       | 200 ps      | 100 ps        | 200 ps | 200 ps        | 100 ps         | 800 ps
sw       | 200 ps      | 100 ps        | 200 ps | 200 ps        |                | 700 ps
R-format | 200 ps      | 100 ps        | 200 ps |               | 100 ps         | 600 ps
beq      | 200 ps      | 100 ps        | 200 ps |               |                | 500 ps
Pipeline Performance
Single-cycle (Tc = 800 ps)
Pipelined (Tc = 200 ps)
Pipeline Speedup
If all stages are balanced (i.e., all take the same time):
Time between instructions (pipelined) = Time between instructions (nonpipelined) / Number of stages
If not balanced, speedup is less
Speedup comes from increased throughput
– Latency (time for each instruction) does not decrease
WRAP UP
For next time
Homework Exercises: 3.4.2, 3.4.4, 3.10.1 – 3.10.5
Due Tuesday 11/4
Read Chapter 4.1–4.4