Lecture 14: Processors
CS 2011, Fall 2014, Dr. Rozier
BOMB LAB STATUS
MP2
Lab Phases: Recursive
Phase 1 – Factorial
Phase 2 – Fibonacci
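The two recursive phases implement the standard recursive definitions. A minimal high-level sketch (Python here purely for illustration; the lab itself is written at a lower level, and its exact base-case conventions may differ):

```python
def factorial(n):
    # n! = n * (n-1)!, with base case 0! = 1! = 1
    return 1 if n <= 1 else n * factorial(n - 1)

def fibonacci(n):
    # fib(0) = 0, fib(1) = 1, fib(n) = fib(n-1) + fib(n-2)
    # (this base-case convention is an assumption)
    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)
```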
Lab Phases: Arrays
Phase 4 – Sum Array
Phase 5 – Find Item
Phase 6 – Bubble Sort
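Reference versions of the three array phases, as a sketch (the lab's actual calling conventions and return values are not shown here and may differ):

```python
def sum_array(a):
    # Phase 4: accumulate every element
    total = 0
    for x in a:
        total += x
    return total

def find_item(a, target):
    # Phase 5: linear search; returning -1 on failure is an assumption
    for i, x in enumerate(a):
        if x == target:
            return i
    return -1

def bubble_sort(a):
    # Phase 6: repeatedly swap adjacent out-of-order pairs
    n = len(a)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a
```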
Lab Phases: Trees
Array representation: [1,2,3,4,5,6,7,0,0,0,0,0,0,0,0]
Phase 7 – Tree Height
Phase 8 – Tree Traversal
[1,2,5,0,0,4,0,0,3,6,0,0,7,0,0]
(diagram of a tree with nodes 1–7 omitted)
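Assuming the first array is a level-order (heap-style) representation, where the children of index i live at 2i+1 and 2i+2 and 0 marks an absent node, tree height can be sketched as below. Both the indexing scheme and the "height of a single node is 1" convention are assumptions; the lab may define them differently.

```python
def tree_height(tree, i=0):
    # Level-order array: children of index i are at 2i+1 and 2i+2;
    # a 0 entry means "no node here".
    if i >= len(tree) or tree[i] == 0:
        return 0
    return 1 + max(tree_height(tree, 2 * i + 1),
                   tree_height(tree, 2 * i + 2))
```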
PROCESSORS
What needs to be done to “process” an instruction?
1. Check the PC
2. Fetch the instruction from memory
3. Decode the instruction and set control lines appropriately
4. Execute the instruction: use the ALU, access memory, or branch
5. Store results
6. PC = PC + 4, or PC = branch target
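The steps above can be sketched as a toy fetch/decode/execute loop. Everything below (the tuple "encoding", the opcode names, the register and memory dictionaries) is invented purely for illustration; only the byte-addressed PC stepping by 4 mirrors the slide.

```python
def run(imem, regs, dmem):
    pc = 0
    while pc // 4 < len(imem):
        op, a, b, dst = imem[pc // 4]        # fetch + decode
        if op == "add":
            regs[dst] = regs[a] + regs[b]    # execute: use ALU
            pc += 4                          # PC = PC + 4
        elif op == "lw":
            regs[dst] = dmem[regs[a] + b]    # execute: access memory
            pc += 4
        elif op == "beq":
            # branch: PC = branch target if equal, else PC + 4
            pc = dst if regs[a] == regs[b] else pc + 4
        else:
            break
    return regs
```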
CPU Overview
Can’t just join wires together; use multiplexers.
CPU + Control
Logic Design Basics
Information encoded in binary:
– Low voltage = 0, high voltage = 1
– One wire per bit
– Multi-bit data encoded on multi-wire buses
Combinational elements:
– Operate on data
– Output is a function of input
State (sequential) elements:
– Store information
Combinational Elements
AND gate: Y = A & B
Multiplexer: Y = S ? I1 : I0
Adder: Y = A + B
Arithmetic/Logic Unit (ALU): Y = F(A, B)
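Each combinational element is a pure function of its inputs, which makes them easy to model directly. A sketch (the ALU's small operation set is illustrative; a real ALU decodes control lines):

```python
def and_gate(a, b):
    # Y = A & B
    return a & b

def mux(s, i0, i1):
    # Y = S ? I1 : I0
    return i1 if s else i0

def adder(a, b):
    # Y = A + B
    return a + b

def alu(f, a, b):
    # Y = F(A, B): the function select chooses the operation
    ops = {"and": a & b, "or": a | b, "add": a + b, "sub": a - b}
    return ops[f]
```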
Storing Data?
S-R Latch
S – set
R – reset
Feedback keeps the bit “trapped”.
S-R Latch

Characteristic table:
S R | Q_next | Action
0 0 | Q      | hold
0 1 | 0      | reset
1 0 | 1      | set
1 1 | X      | N/A

Excitation table:
Q Q_next | S R
0 0      | 0 X
0 1      | 1 0
1 0      | 0 1
1 1      | X 0
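The characteristic table can be checked by simulating the usual cross-coupled NOR implementation of the latch until the feedback settles (a sketch; the fixed iteration count is an assumption, but a few rounds is plenty for two gates):

```python
def sr_latch(s, r, q):
    # Cross-coupled NOR gates; q is the currently stored bit.
    # S = R = 1 is the forbidden input combination (N/A above).
    qn = 1 - q
    for _ in range(4):                      # iterate until feedback settles
        q_new  = 0 if (r or qn) else 1      # Q  = NOR(R, ~Q)
        qn_new = 0 if (s or q_new) else 1   # ~Q = NOR(S, Q)
        q, qn = q_new, qn_new
    return q
```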
D Flip-Flop
Note in the S-R latch excitation table that, whenever the state changes, S is the complement of R.
D Flip-Flop
Feed D and ~D to a gated S-R latch to create a one-input synchronous S-R latch.
We’ll call it a D flip-flop, just to be difficult.
D Flip-Flop
D – input signal
E – enable signal, sometimes called clock or control

E/C D | Q      ~Q      | Notes
0   X | Q_prev ~Q_prev | No change
1   0 | 0      1       | Reset
1   1 | 1      0       | Set
Adding the Clock
More Realistic
Register File
Sequential Elements
Register: stores data in a circuit
– Uses a clock signal to determine when to update the stored value
– Edge-triggered: update when Clk changes from 0 to 1
Sequential Elements
Register with write control:
– Only updates on clock edge when write control input is 1
– Used when stored value is required later
Clocking Methodology
Combinational logic transforms data during clock cycles
– Between clock edges
– Input from state elements, output to state elements
– Longest delay determines clock period
Building a Datapath
Datapath: the elements that process data and addresses in the CPU
– Registers, ALUs, muxes, memories, …
We will build a MIPS datapath incrementally, refining the overview design
Pipeline
Fetch → Decode → Issue → (Integer | Multiply | Floating Point | Load | Store) → Write Back
Instruction Fetch
The PC is a 32-bit register
Increment by 4 for the next instruction
ALU
Read two register operands
Perform arithmetic/logical operation
Write register result
Load/Store Instructions
Read register operands
Calculate address
Load: read memory and update register
Store: write register value to memory
Branch Instructions?
Datapath With Control
ALU Instruction
Load Instruction
Branch-on-Equal Instruction
Performance Issues
Longest delay determines clock period
– Critical path: the load instruction
– Instruction memory → register file → ALU → data memory → register file
Not feasible to vary the clock period for different instructions
Violates the design principle of making the common case fast
We will improve performance by pipelining
Pipelining Analogy
Pipelined laundry: overlapping execution
– Parallelism improves performance
Four loads: Speedup = 8/3.5 ≈ 2.3
Non-stop: Speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages
MIPS Pipeline
Five stages, one step per stage:
1. IF: Instruction fetch from memory
2. ID: Instruction decode & register read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result back to register
Pipeline Performance
Assume time for stages is:
– 100 ps for register read or write
– 200 ps for other stages
Compare pipelined datapath with single-cycle datapath:

Instr    | Instr fetch | Register read | ALU op | Memory access | Register write | Total time
lw       | 200 ps      | 100 ps        | 200 ps | 200 ps        | 100 ps         | 800 ps
sw       | 200 ps      | 100 ps        | 200 ps | 200 ps        |                | 700 ps
R-format | 200 ps      | 100 ps        | 200 ps |               | 100 ps         | 600 ps
beq      | 200 ps      | 100 ps        | 200 ps |               |                | 500 ps
Pipeline Performance
Single-cycle (Tc = 800 ps)
Pipelined (Tc = 200 ps)
Pipeline Speedup
If all stages are balanced (i.e., all take the same time):
Time between instructions (pipelined) = Time between instructions (nonpipelined) / Number of stages
If not balanced, speedup is less
Speedup comes from increased throughput
– Latency (time for each instruction) does not decrease
WRAP UP
For next time
Homework Exercises: 3.4.2, 3.4.4, 3.10.1 – 3.10.5
Due Tuesday 11/4
Read Chapter 4.1–4.4