Published by Lester Boone. Modified over 8 years ago.
Gary Marsden, University of Cape Town

Slide 1: Pipelining
Technique where multiple instructions are overlapped in execution (key for speed)

Slide 2: Analogy
Each step is called a pipe stage or pipe segment
Pipelining improves throughput rather than the speed of a given instruction
  – Concorde vs 747
Only possible in multi-cycle datapaths
All stages must be ready to proceed at the same time
Clock cycle determined by slowest stage
Goal: balance length of each stage
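
The throughput-versus-latency point can be checked with a little arithmetic. The sketch below is illustrative only: the stage latencies are assumed numbers, not taken from the slides, and the function names are hypothetical.

```python
def unpipelined_time(stage_ns, n_instructions):
    """Each instruction runs all stages back to back before the next starts."""
    return sum(stage_ns) * n_instructions


def pipelined_time(stage_ns, n_instructions):
    """One instruction enters per cycle; the cycle is set by the slowest stage."""
    cycle = max(stage_ns)
    cycles = len(stage_ns) + n_instructions - 1  # fill the pipe, then one per cycle
    return cycle * cycles


# Assumed stage latencies in ns (hypothetical values).
stages = [2, 1, 2, 2, 1]

print(unpipelined_time(stages, 1000))  # 8000
print(pipelined_time(stages, 1000))    # 2008
print(pipelined_time(stages, 1))       # 10
```

With unbalanced stages a single instruction takes longer through the pipe (10 ns against 8 ns here), yet a long run of instructions finishes about four times sooner, which is why balancing the stage lengths is the goal.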

Slide 3: Pipelined datapath
Having 5 steps in an instruction means a 5-stage pipeline
  – 5 instructions being executed at a given time
1. IF: Instruction fetch
2. ID: Instruction decode
3. EX: Execute and effective address calculation
4. MEM: Memory access
5. WB: Write back
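
One way to picture five instructions in flight at once is to print which stage each instruction occupies in each cycle. This is a minimal sketch that assumes an ideal pipeline with no stalls; `pipeline_diagram` is a hypothetical helper name.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(n_instructions):
    """Return one row per instruction; row[c] is the stage occupied in cycle c (or '')."""
    total_cycles = len(STAGES) + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name  # instruction i reaches stage s in cycle i + s
        rows.append(row)
    return rows


for i, row in enumerate(pipeline_diagram(5)):
    print(f"i{i}: " + " ".join(f"{cell:>3}" for cell in row))
```

Reading down the fifth column of the printed grid shows WB, MEM, EX, ID and IF all busy in the same cycle.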

Slide 4: Comparative timing
[Table; only two columns survive: Reg Write (1, 1) and Total time (8, 7, 6, 5)]

Slide 5: View of datapath

Slide 6: Progression in pipe
General left-right progression
  – Like a car assembly line
Two exceptions
  – Write back stage places the result back into the register file, which is in the middle of the datapath
  – Selection of PC value: could be a branch
Right-left flow may affect subsequent instructions
Like the multi-cycle datapath, we need registers to hold values between stages

Slide 7: Symbolic view

Slide 8: Extra registers

Slide 9: Execution of load instruction

Slide 10: Execution of store

Slide 11: Ooops
When doing a write back for ‘lw’ we don’t know where to write!

Slide 12: A note on notations

Slide 13: Pipeline control
Just like we did for the single-cycle datapath machine, but with a twist
  – Label control lines on existing datapath
  – Assume PC written on each cycle (no PCWrite)
  – To control a pipeline stage, need only the control values for that stage
  – Usual five stages for control: IF, ID, EX, MEM, WB

Slide 14: Pipeline control diagram

Slide 15: Buffering pipeline control

Slide 16: Another scary picture

Slide 17: Observations
Although a new instruction starts every clock cycle, each still needs 5 cycles to complete
Takes four cycles before we are up to full efficiency
When a stage is inactive, its control lines are deasserted
Control sequencing is implicit in pipeline stages
  – No microinstructions like before

Slide 18: Data hazard
Sequences of instructions with dependencies make high-performance pipelines hard to design
  – sub $2, $1, $3
  – and $12, $2, $5
  – Oopsie!
Resolving
  – Forbid the compiler from doing this
      Interleave only independent instructions
      Use a no-op (wasteful)
  – Stall
  – Forward
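
The dependency in the `sub`/`and` pair can be found mechanically by comparing each instruction's source registers with the destinations of earlier instructions. The sketch below is a simplified illustration: it assumes R-format instructions written as `op dest, src1, src2`, ignores any intervening overwrite of the register, and uses hypothetical helper names.

```python
def parse(instr):
    """Split 'sub $2, $1, $3' into (op, dest, [sources]) for R-format instructions."""
    op, operands = instr.split(None, 1)
    regs = [r.strip() for r in operands.split(",")]
    return op.lower(), regs[0], regs[1:]


def raw_dependencies(program):
    """Yield (producer_index, consumer_index, register) for read-after-write pairs."""
    parsed = [parse(i) for i in program]
    for j, (_, _, sources) in enumerate(parsed):
        for i in range(j):
            dest = parsed[i][1]
            if dest in sources and dest != "$0":  # $0 is hard-wired to zero on MIPS
                yield i, j, dest


program = ["sub $2, $1, $3", "and $12, $2, $5"]
print(list(raw_dependencies(program)))  # [(0, 1, '$2')]
```

A compiler that interleaves only independent instructions is, in effect, reordering the program until no such pair sits too close together.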

Slide 19: Data hazard diagram

Slide 20: Overcoming data hazards
The ‘add’ problem we can overcome with hardware design
  – Write register file in first half of clock cycle, read in second
Doesn’t help with ‘and’ and ‘or’
  – Need to detect hazard and forward correct value
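
The write-then-read trick can be modelled in a few lines. This is a behavioural sketch only, not a hardware description; the class name, register number and value are assumptions made for illustration.

```python
class RegisterFile:
    """32 registers; within one cycle, the write happens before the reads."""

    def __init__(self):
        self.regs = [0] * 32

    def cycle(self, write_reg=None, write_value=0, read_regs=()):
        # First half of the clock cycle: commit the write-back, if any.
        if write_reg is not None and write_reg != 0:  # $0 stays zero on MIPS
            self.regs[write_reg] = write_value
        # Second half: reads now observe the freshly written value.
        return [self.regs[r] for r in read_regs]


rf = RegisterFile()
# An instruction in WB writes $2 in the same cycle that a later one reads $2 in ID.
print(rf.cycle(write_reg=2, write_value=-20, read_regs=(2, 2)))  # [-20, -20]
```

Instructions that need the value before the producer reaches WB get no help from this ordering, which is why forwarding is still required.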

Slide 21: Detecting hazards
We can’t get the computer to draw a diagram; instead we use the following notation
  – 1(a) EX/MEM.WriteReg = IF/ID.ReadReg1
  – 1(b) EX/MEM.WriteReg = IF/ID.ReadReg2
  – 2(a) MEM/WB.WriteReg = IF/ID.ReadReg1
  – 2(b) MEM/WB.WriteReg = IF/ID.ReadReg2
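
The four conditions translate directly into comparisons between pipeline-register fields. The sketch below takes the slide's notation at face value; the function name is hypothetical, and using `None` for "this stage writes no register" is an added assumption.

```python
def detect_hazards(ex_mem_write_reg, mem_wb_write_reg, read_reg1, read_reg2):
    """Return the labels of the slide's conditions that hold.

    Arguments are register numbers, or None when a stage writes no register.
    """
    checks = {
        "1a": (ex_mem_write_reg, read_reg1),
        "1b": (ex_mem_write_reg, read_reg2),
        "2a": (mem_wb_write_reg, read_reg1),
        "2b": (mem_wb_write_reg, read_reg2),
    }
    return [label for label, (w, r) in checks.items() if w is not None and w == r]


# 'sub $2, $1, $3' is ahead in the pipe; 'and $12, $2, $5' wants to read $2 and $5.
print(detect_hazards(2, None, 2, 5))  # ['1a']
```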

Slide 22: Forwarding
If we can detect a hazard, we can forward the correct value as soon as it is available
  – We will see how to do this soon
By ‘forwarding’ we can pull the value from the appropriate pipeline register rather than waiting for it to be written back at the end of an instruction
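
Pulling the value from a pipeline register amounts to a multiplexer in front of each ALU input. The following is a hedged sketch of that selection, not the deck's actual control logic: the function name and the tuple representation are assumptions, and preferring EX/MEM over MEM/WB reflects the usual rule that the most recent result wins.

```python
def select_operand(reg, regfile_value, ex_mem, mem_wb):
    """Pick the freshest value for source register `reg`.

    ex_mem / mem_wb are (write_reg, value) tuples, or None if that stage
    is not writing a register.
    """
    if ex_mem is not None and ex_mem[0] == reg and reg != 0:
        return ex_mem[1]      # newest result: forward from EX/MEM
    if mem_wb is not None and mem_wb[0] == reg and reg != 0:
        return mem_wb[1]      # older result: forward from MEM/WB
    return regfile_value      # no hazard: use the value read from the register file


print(select_operand(2, 10, (2, -20), (2, 99)))  # -20
print(select_operand(5, 7, (2, -20), None))      # 7
```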

Slide 23: Forwarding to resolve hazards

Slide 24: Achieving control for forwarding

Slide 25: Until…

Slide 26: Stalls
Forwarding is an efficient way to solve data hazards, but not all can be solved this way

Slide 27: Load Word problems
We cannot forward when an instruction tries to read a register immediately after a lw that writes that register
We need to detect this
  – Hazard detection unit in addition to the forwarding unit
Conditions
  – If (ID/EX.MemRead AND
      ((ID/EX.WriteReg = IF/ID.ReadReg1) OR
       (ID/EX.WriteReg = IF/ID.ReadReg2)))
lw is the only instruction that sets the MemRead line
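
The load-use condition above can be written as a single boolean function. This is a sketch under the slide's stated condition; the function and argument names are hypothetical renderings of the pipeline-register fields.

```python
def must_stall(id_ex_mem_read, id_ex_write_reg, if_id_read_reg1, if_id_read_reg2):
    """True when the instruction in ID needs a register that a lw in EX will load."""
    return bool(
        id_ex_mem_read
        and id_ex_write_reg in (if_id_read_reg1, if_id_read_reg2)
    )


# lw $2, 20($1) followed directly by and $4, $2, $5
print(must_stall(True, 2, 2, 5))   # True
# The same registers, but the instruction in EX is not a load
print(must_stall(False, 2, 2, 5))  # False
```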

Slide 28: Stalls
Once detected, we have to stall execution until the value is available (whereupon it is forwarded)
Sometimes called a ‘bubble’, the idea being that we send an air bubble up the pipe, not data
Not strictly true
  – The control unit just gets the stalled stages of the pipeline to repeat what they were doing until the value is available
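
The "repeat what they were doing" description can be sketched as a stage sequence in which the waiting instruction occupies ID for extra cycles. This is a deliberately minimal model: it assumes the stalled instruction waits in ID, that every stall delays all later instructions by the same amount, and the function names are hypothetical.

```python
def stage_sequence(stall_cycles=0):
    """Stages one instruction occupies, repeating ID while it waits."""
    return ["IF"] + ["ID"] * (1 + stall_cycles) + ["EX", "MEM", "WB"]


def total_cycles(n_instructions, stalls):
    """Cycles to run the program: pipeline fill, one per instruction, plus stalls."""
    return 5 + (n_instructions - 1) + sum(stalls)


print(stage_sequence(1))              # ['IF', 'ID', 'ID', 'EX', 'MEM', 'WB']
print(total_cycles(3, [0, 1, 0]))     # 8
```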

Slide 29: Bubbles in the pipe

Slide 30: Adding hazard detection

Slide 31: Branch hazards
Another type of hazard involves branches: an instruction must be fetched every cycle to keep the pipeline full… but the decision about a branch does not come until the MEM stage
Called a ‘control’ or a ‘branch’ hazard
  – Occur less frequently than data hazards
  – Are simple to understand
  – Not much we can do really

Slide 32: Effect of a branch

Slide 33: Coping with branching
Stall subsequent instructions on ‘beq’
  – This increases the cost of a branch from one cycle to four cycles
Assume branch not taken
  – Carry on as before
  – Only penalty will be if the branch is taken
  – We can then ‘flush’ buffers
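
The difference between the two strategies shows up in the average cycles per instruction. The sketch below uses the three extra cycles implied by "one cycle to four cycles"; the branch frequency (20%) and taken rate (60%) are assumed figures for illustration, and `effective_cpi` is a hypothetical name.

```python
def effective_cpi(branch_fraction, penalty_cycles, paying_fraction=1.0):
    """Base CPI of 1 plus the average branch penalty per instruction.

    paying_fraction is the share of branches that actually incur the penalty.
    """
    return 1.0 + branch_fraction * paying_fraction * penalty_cycles


# Always stall: every branch pays the 3 extra cycles.
print(round(effective_cpi(0.2, 3), 2))        # 1.6
# Assume not taken: only taken branches (assumed 60%) pay.
print(round(effective_cpi(0.2, 3, 0.6), 2))   # 1.36
```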

Slide 34: Lessening the impact
Currently branch decisions are made at stage 4
We could save one stage by getting the value from the buffer at stage 3 (like forwarding)
Can even calculate the branch in the ID stage!
  – Move branch adder from MEM to ID stage
  – Add a bunch of XOR gates to do comparison of register values (do not use the ALU)
  – Need to alter forwarding unit to cope with this
Impact down to one lost cycle
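
The XOR comparison works because two words are equal exactly when every bitwise XOR output is zero. A small sketch of that check follows; the 32-bit width and the function name are assumptions.

```python
def registers_equal(a, b, width=32):
    """Equality test built from XOR: any differing bit leaves a 1 in the result."""
    mask = (1 << width) - 1
    return ((a ^ b) & mask) == 0


print(registers_equal(0x1234, 0x1234))  # True
print(registers_equal(0x1234, 0x1235))  # False
```

In hardware the final "is it all zeros" step is a wide NOR over the XOR outputs, which is far cheaper than routing the operands through the ALU.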

Slide 35: Datapath to lessen branch impact

Slide 36: Branch prediction
‘Assume branch not taken’ is a very primitive form of branch prediction
We can use a ‘branch prediction buffer’ or ‘branch history table’ to see what happened the last time the branch was executed
  – Think about loops
Buffers are usually 2-bit
  – One-bit buffers can flip-flop
  – 2-bit buffers need two wrong guesses before they change
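
The loop case can be simulated to see why the second bit helps. The sketch below uses a 2-bit saturating counter, which is one common way of realising the "two wrong guesses" idea and is not necessarily the exact state machine the slides had in mind; the loop shape and initial states are assumptions.

```python
def one_bit_mispredictions(outcomes, state=True):
    """Predict whatever the branch did last time (True = taken)."""
    misses = 0
    for taken in outcomes:
        if state != taken:
            misses += 1
        state = taken
    return misses


def two_bit_mispredictions(outcomes, counter=3):
    """2-bit saturating counter: 0-1 predict not taken, 2-3 predict taken."""
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


# A loop branch taken 9 times then not taken once, with the loop run 3 times.
loop = ([True] * 9 + [False]) * 3
print(one_bit_mispredictions(loop))  # 5
print(two_bit_mispredictions(loop))  # 3
```

The one-bit scheme misses twice per loop after the first pass (once on exit, once on re-entry), while the two-bit scheme misses only on exit.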

Slide 37: It doesn’t stop there
Some processors support ‘superpipelining’
  – These are simply pipelines with more stages
Others have ‘superscalar’ pipelines
  – Basically the entire pipeline is replicated
  – Big overhead in control
  – Usually between 2 and 9 datapaths
4 superscalar pipelines give an ideal CPI of 0.25!
Final wrinkle is dynamic pipeline scheduling
  – Copes with stalls by stalling the next instruction but allowing subsequent, non-dependent instructions to go

Slide 38: Pipelining for real
Both the Pentium and PPC 604 use dynamically scheduled pipelines
  – Have a 512-entry branch prediction table

Slide 39: Pipelines in reality
30% of Pentium is legacy

Slide 40: Pentium fetch/execute
1. Prefetch/Fetch: Instructions are fetched from the instruction cache and aligned in prefetch buffers for decoding.
2. Decode1: Instructions are decoded into the Pentium's internal instruction format. Branch prediction also takes place at this stage.
3. Decode2: Same as above, and microcode ROM kicks in here, if necessary. Also, address computations take place at this stage.
4. Execute: The integer hardware executes the instruction.
5. Write-back: The results of the computation are written back to the register file.

Slide 41: Pentium branch prediction
3 types of prediction
  – Only 20% miss

Slide 42: P4 pipeline
20 stages deep

Slide 43: PowerPC processor
Scary!

Slide 44: Beware
Pipelining is not as easy as it looks
  – Subtle and complex interplay
Instruction set has a huge impact on pipeline efficiency
  – Variable instruction lengths and addressing modes are problematic
Increasing depth of pipe does not always improve performance

Slide 45: Performance trade off

Slide 46: Comparisons

Slide 47: Summary
Pipelines improve throughput
Pipeline has stages corresponding to execution steps of multi-cycle instructions
Requires buffers and special-purpose components to be added
Problems with data hazards
  – Forwarding and stalling
Problems with branches
  – Do nothing, assume not taken, move comparison early, use branch prediction table