Pipelining: technique where multiple instructions are overlapped in execution (key for speed)
Gary Marsden, University of Cape Town

Slide 1: Pipelining
• Technique where multiple instructions are overlapped in execution (key for speed)

Slide 2: Analogy
• Each step is called a pipe stage or pipe segment
• Pipelining improves throughput rather than the speed of a given instruction
  – Concorde vs 747
• Only possible in multi-cycle datapaths
• All stages must be ready to proceed at the same time
• Clock cycle determined by slowest stage
• Goal: balance length of each stage
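The point about balancing stages can be checked with a little arithmetic. The following is a minimal sketch, not part of the original slides: the stage latencies are made-up numbers, and the unpipelined machine is assumed to spend the sum of all stage latencies on every instruction.

```python
def pipeline_timing(stage_ps, n_instructions):
    """Return (unpipelined_ps, pipelined_ps) for a run of n_instructions."""
    k = len(stage_ps)
    cycle = max(stage_ps)                      # clock period = slowest stage
    unpipelined = n_instructions * sum(stage_ps)
    # The first instruction needs k cycles; each later one finishes a cycle after.
    pipelined = (k + n_instructions - 1) * cycle
    return unpipelined, pipelined


# Hypothetical latencies in picoseconds; both designs do 800 ps of work.
unbalanced = [200, 100, 200, 200, 100]
balanced = [160, 160, 160, 160, 160]

for name, stages in (("unbalanced", unbalanced), ("balanced", balanced)):
    seq, pipe = pipeline_timing(stages, 1000)
    print(f"{name}: cycle = {max(stages)} ps, speedup = {seq / pipe:.2f}")
```

With the same total work, the balanced design gets the shorter clock and so the larger speedup. The latency of a single instruction (five cycles) does not improve in either case.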

Slide 3: Pipelined datapath
• Having 5 steps in an instruction means a 5-stage pipeline
  – Up to 5 instructions are in execution at any given time
1. IF: Instruction fetch
2. ID: Instruction decode
3. EX: Execute and effective address calculation
4. MEM: Memory access
5. WB: Write back

Slide 4: Comparative timing
[Timing table: only the column headings "Reg write" and "Total time" survive; the values are not recoverable from the transcript]

Slide 5: View of datapath

Slide 6: Progression in pipe
• General left-to-right progression
  – Like a car assembly line
• Two exceptions
  – Write-back stage places the result back into the register file, which is in the middle of the datapath
  – Selection of PC value: could be a branch
• Right-to-left flow may affect subsequent instructions
• As in the multi-cycle datapath, we need registers to hold values between stages

Slide 7: Symbolic view

Slide 8: Extra registers

Slide 9: Execution of load instruction

Slide 10: Execution of store

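Slide 11: Ooops
• When doing a write back for ‘lw’ we don’t know where to write!
  – By the time ‘lw’ reaches WB, IF/ID holds a later instruction, so the destination register number has to travel down the pipeline registers with the load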

Slide 12: A note on notations

Slide 13: Pipeline control
• Just like we did for the single-cycle datapath, but with a twist
  – Label control lines on existing datapath
  – Assume PC written on each cycle (no PCWrite)
  – To control a pipeline stage, need only the control values for that stage
  – Usual five stages for control: IF, ID, EX, MEM, WB

Slide 14: Pipeline control diagram

Slide 15: Buffering pipeline control

Slide 16: Another scary picture

Slide 17: Observations
• Although a new instruction starts every clock cycle, each one still needs 5 cycles to complete
• Takes four cycles before we are up to full efficiency
• When a stage is inactive, its control lines are deasserted
• Control sequencing is implicit in pipeline stages
  – No microinstructions like before
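The fill behaviour described above can be made concrete by tabulating which instruction occupies which stage in each cycle. This is an illustrative sketch only: it assumes an ideal five-stage pipe with no stalls, and the function name and instruction labels are invented for the example.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(n_instructions):
    """Return one row per cycle; row[s] is the instruction index in stage s, or None."""
    k = len(STAGES)
    rows = []
    for cycle in range(n_instructions + k - 1):
        row = []
        for s in range(k):
            i = cycle - s                      # instruction i enters IF in cycle i
            row.append(i if 0 <= i < n_instructions else None)
        rows.append(row)
    return rows


rows = pipeline_diagram(6)
print("          " + " ".join(f"{name:>3}" for name in STAGES))
for cycle, row in enumerate(rows, start=1):
    labels = ["--" if i is None else "i" + str(i) for i in row]
    print(f"cycle {cycle:2}: " + " ".join(f"{label:>3}" for label in labels))
```

The first row with no empty stage is the fifth, which matches the four fill cycles noted on the slide; six instructions take 6 + 5 - 1 = 10 cycles in total.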

Slide 18: Data hazard
• Sequences of instructions with dependencies make high-performance pipelines hard to design
  – sub $2, $1, $3
  – and $12, $2, $5
  – Oopsie! The ‘and’ reads $2 before the ‘sub’ has written it back
• Resolving
  – Forbid the compiler to do this
      – Interleave only independent instructions
      – Use a no-op (wasteful)
  – Stall
  – Forward
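A compiler or assembler that wants to avoid such sequences first has to find them. The sketch below scans a program for read-after-write dependences between nearby instructions. The `Instr` tuple, the function name, and the default `window` of 2 following instructions are assumptions made for this example, not something the slides specify.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[int]       # destination register number, None if no write
    srcs: Tuple[int, ...]     # source register numbers


def raw_dependences(program, window=2):
    """Return (producer_index, consumer_index, register) for each nearby RAW pair."""
    found = []
    for j, consumer in enumerate(program):
        for reg in sorted(set(consumer.srcs)):
            if reg == 0:                       # MIPS $0 is hardwired to zero
                continue
            # Walk backwards so only the most recent writer of reg is reported.
            for i in range(j - 1, max(-1, j - window - 1), -1):
                if program[i].dest == reg:
                    found.append((i, j, reg))
                    break
    return found


program = [
    Instr("sub", 2, (1, 3)),     # sub $2, $1, $3
    Instr("and", 12, (2, 5)),    # and $12, $2, $5
]
print(raw_dependences(program))
```

For the two-instruction example from the slide this reports a single dependence on register 2 between instruction 0 and instruction 1.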

Slide 19: Data hazard diagram

Slide 20: Overcoming data hazards
• The ‘add’ problem we can overcome with hardware design
  – Write register file in first half of clock cycle, read in second
• Doesn’t help with ‘and’ and ‘or’
  – Need to detect hazard and forward correct value

Slide 21: Detecting hazards
• We can’t get the computer to draw a diagram; instead we test the following conditions (‘=’ means ‘is the same register as’)
  – 1(a) EX/MEM.WriteReg = IF/ID.ReadReg1
  – 1(b) EX/MEM.WriteReg = IF/ID.ReadReg2
  – 2(a) MEM/WB.WriteReg = IF/ID.ReadReg1
  – 2(b) MEM/WB.WriteReg = IF/ID.ReadReg2
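The four conditions translate almost directly into code. This is a rough sketch that follows the slide's register naming. The `PipeReg` class is hypothetical, and the two guards (the producer must actually write a register, and $0 is ignored) are common textbook refinements that the slide does not list.

```python
from dataclasses import dataclass


@dataclass
class PipeReg:
    """The pipeline-register fields needed for hazard checks (names assumed)."""
    write_reg: int = 0        # destination register number
    reg_write: bool = False   # assumed control bit: instruction writes a register
    read_reg1: int = 0        # first source register number
    read_reg2: int = 0        # second source register number


def detect_hazards(if_id, ex_mem, mem_wb):
    """Return the labels (1a, 1b, 2a, 2b) of the conditions that hold."""
    hazards = []
    for label, producer in (("1", ex_mem), ("2", mem_wb)):
        if not producer.reg_write or producer.write_reg == 0:
            continue
        if producer.write_reg == if_id.read_reg1:
            hazards.append(label + "a")
        if producer.write_reg == if_id.read_reg2:
            hazards.append(label + "b")
    return hazards


# sub $2, $1, $3 is further down the pipe; and $12, $2, $5 is being decoded.
sub_latch = PipeReg(write_reg=2, reg_write=True)
and_latch = PipeReg(read_reg1=2, read_reg2=5)
print(detect_hazards(and_latch, ex_mem=sub_latch, mem_wb=PipeReg()))
```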

Slide 22: Forwarding
• If we can detect a hazard, we can forward the correct value as soon as it is available
  – We will see how to do this soon
• By ‘forwarding’ we can pull the value from the appropriate pipeline register rather than waiting for it to be written back at the end of an instruction
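One way to picture forwarding is as a multiplexer in front of each ALU input that prefers the newest in-flight result over the register file. The sketch below is an assumption-laden illustration rather than the datapath from the slides: the `Latch` class and its field names are invented, and it applies the usual rule that the EX/MEM result wins over MEM/WB because it is more recent.

```python
from dataclasses import dataclass


@dataclass
class Latch:
    """Fields of a pipeline register that matter for forwarding (names assumed)."""
    reg_write: bool   # the instruction will write a register
    write_reg: int    # destination register number
    value: int        # result held in this pipeline register


def select_operand(read_reg, regfile_value, ex_mem, mem_wb):
    """Return (value, source) for one ALU operand."""
    # Check the younger result first so the most recent write wins.
    for name, latch in (("EX/MEM", ex_mem), ("MEM/WB", mem_wb)):
        if latch.reg_write and latch.write_reg != 0 and latch.write_reg == read_reg:
            return latch.value, name
    return regfile_value, "register file"


ex_mem = Latch(reg_write=True, write_reg=2, value=99)   # newest write to $2
mem_wb = Latch(reg_write=True, write_reg=2, value=11)   # older write to $2
print(select_operand(2, regfile_value=0, ex_mem=ex_mem, mem_wb=mem_wb))
print(select_operand(5, regfile_value=7, ex_mem=ex_mem, mem_wb=mem_wb))
```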

Slide 23: Forwarding to resolve hazards

Slide 24: Achieving control for forwarding

Slide 25: Until…

Slide 26: Stalls
• Forwarding is an efficient way to solve data hazards, but not all can be solved this way

Slide 27: Load Word problems
• We cannot forward when an instruction tries to read a register immediately following a ‘lw’ that is writing to that register
• We need to detect this
  – Hazard detection unit in addition to the forwarding unit
• Conditions
  – If (ID/EX.MemRead AND
  – ((ID/EX.WriteReg = IF/ID.ReadReg1) OR
  – (ID/EX.WriteReg = IF/ID.ReadReg2)))
  – ‘lw’ is the only instruction that asserts the MemRead line
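The stall condition above is a single Boolean expression, which can be sketched as follows. The function and parameter names are made up for the example; they simply mirror the pipeline-register fields named on the slide.

```python
def must_stall(id_ex_mem_read, id_ex_write_reg, if_id_read_reg1, if_id_read_reg2):
    """True when the instruction in ID needs the result of a load still in EX."""
    return bool(id_ex_mem_read) and id_ex_write_reg in (if_id_read_reg1, if_id_read_reg2)


# lw $2, 20($1) followed immediately by and $4, $2, $5: stall one cycle.
print(must_stall(True, 2, 2, 5))
# sub $2, $1, $3 followed by the same 'and': MemRead is off, forwarding suffices.
print(must_stall(False, 2, 2, 5))
```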

Slide 28: Stalls
• Once detected, we have to stall execution until the value is available (whereupon it is forwarded)
• Sometimes called a ‘bubble’, the idea being that we send an air bubble up the pipe, not data
• Not strictly true
  – The control unit just gets the stalled stages of the pipeline to repeat what they were doing until the value is available

Slide 29: Bubbles in the pipe

Slide 30: Adding hazard detection

Slide 31: Branch hazards
• Another type of hazard involves branches: an instruction must be fetched every cycle to keep the pipeline full… but the decision about a branch is not made until the MEM stage
• Called a ‘control’ or a ‘branch’ hazard
  – Occur less frequently than data hazards
  – Are simple to understand
  – Not much we can do really

Slide 32: Effect of a branch

Slide 33: Coping with branching
• Stall subsequent instructions on ‘beq’
  – This increases the cost of a branch from one cycle to four cycles
• Assume branch not taken
  – Carry on as before
  – Only penalty will be if the branch is taken
  – We can then ‘flush’ buffers
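The difference between the two strategies shows up in the average cycles per instruction. The sketch below is a back-of-the-envelope model, not a measurement: the 20% branch frequency and 60% taken rate are invented for illustration, and the three-cycle penalty comes from the slide's "one cycle to four cycles".

```python
def average_cpi(branch_fraction, taken_fraction, penalty_cycles, assume_not_taken):
    """Average CPI of an otherwise ideal (CPI = 1) pipeline with branch penalties."""
    if assume_not_taken:
        # Only taken branches pay: the wrongly fetched instructions are flushed.
        lost_per_branch = taken_fraction * penalty_cycles
    else:
        # Always stalling pays the full penalty on every branch.
        lost_per_branch = penalty_cycles
    return 1.0 + branch_fraction * lost_per_branch


# Hypothetical workload: 20% branches, 60% of them taken, 3 lost cycles each.
stall_cpi = average_cpi(0.20, 0.60, 3, assume_not_taken=False)
guess_cpi = average_cpi(0.20, 0.60, 3, assume_not_taken=True)
print(f"always stall:      CPI = {stall_cpi:.2f}")
print(f"assume not taken:  CPI = {guess_cpi:.2f}")
```

Under these assumed numbers, guessing "not taken" recovers the cycles of every branch that falls through, so it can never do worse than always stalling.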

Slide 34: Lessening the impact
• Currently branch decisions are made at stage 4
• We could save one stage by getting the value from the buffer at stage 3 (like forwarding)
• Can even resolve the branch in the second (ID) stage!
  – Move branch adder from EX to ID stage
  – Add a bunch of XOR gates to do comparison of register values (do not use the ALU)
  – Need to alter forwarding unit to cope with this
• Impact down to one lost cycle
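The XOR comparison mentioned above works because XOR produces a 1 exactly where two bits differ, so two registers are equal when every XOR output is 0. A small software sketch of that idea follows; the function name and the 32-bit default width are assumptions for the example.

```python
def registers_equal(a, b, width=32):
    """Equality test built from XOR, the way the early-branch hardware would do it."""
    mask = (1 << width) - 1
    diff = (a ^ b) & mask     # one XOR gate per bit: 1 wherever the bits differ
    return diff == 0          # hardware would OR the bits together and invert


print(registers_equal(5, 5))
print(registers_equal(5, 7))
```

This needs far less logic than a full ALU subtraction, which is why it fits in the ID stage.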

Slide 35: Datapath to lessen branch impact

Slide 36: Branch prediction
• ‘Assume branch not taken’ is a very primitive form of branch prediction
• We can use a ‘branch prediction buffer’ or ‘branch history table’ to see what happened the last time the branch was executed
  – Think about loops
• Buffers are usually 2-bit
  – One-bit buffers can flip-flop
  – 2-bit buffers need two wrong guesses before they change
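The loop case is easy to simulate. The sketch below models a single prediction-buffer entry as a saturating counter, which is one common way to build a 2-bit predictor and is an assumption here rather than the exact state machine from the slides. The starting state (strongly taken) and the loop shape are also made up for the example.

```python
def mispredictions(outcomes, bits):
    """Count wrong guesses made by one saturating-counter entry of the given width."""
    top = (1 << bits) - 1
    threshold = 1 << (bits - 1)      # counter >= threshold means "predict taken"
    counter = top                    # assumed initial state: strongly taken
    wrong = 0
    for taken in outcomes:
        predicted_taken = counter >= threshold
        if predicted_taken != taken:
            wrong += 1
        # Move one step towards the actual outcome, saturating at both ends.
        counter = min(top, counter + 1) if taken else max(0, counter - 1)
    return wrong


# A loop-closing branch: taken 9 times, then falls through; the loop runs 3 times.
loop = ([True] * 9 + [False]) * 3
print("1-bit mispredictions:", mispredictions(loop, bits=1))
print("2-bit mispredictions:", mispredictions(loop, bits=2))
```

The 1-bit entry is wrong twice per loop after the first run (once on exit, once on re-entry), while the 2-bit entry is wrong only on each exit.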

Slide 37: It doesn’t stop there
• Some processors support ‘superpipelining’
  – These are simply pipelines with more stages
• Others have ‘superscalar’ pipelines
  – Basically the entire pipeline is replicated
  – Big overhead in control
  – Usually between 2 and 9 datapaths
      – 4 superscalar pipelines give an ideal CPI of 0.25!
• Final wrinkle is dynamic pipeline scheduling
  – Copes with stalls by holding back the stalled instruction but allowing subsequent, non-dependent instructions to go ahead

Slide 38: Pipelining for real
• Both the Pentium Pro and PPC 604 use dynamically scheduled pipelines
  – Have a 512-entry branch prediction table

Slide 39: Pipelines in reality
• 30% of Pentium is legacy

Slide 40: Pentium fetch/execute
1. Prefetch/Fetch: Instructions are fetched from the instruction cache and aligned in prefetch buffers for decoding.
2. Decode1: Instructions are decoded into the Pentium's internal instruction format. Branch prediction also takes place at this stage.
3. Decode2: Same as above, and microcode ROM kicks in here, if necessary. Also, address computations take place at this stage.
4. Execute: The integer hardware executes the instruction.
5. Write-back: The results of the computation are written back to the register file.

Slide 41: Pentium branch prediction
• 3 types of prediction
  – Only 20% miss

Slide 42: P4 pipeline
• 20 stages deep

Slide 43: PowerPC processor
• Scary!

Slide 44: Beware
• Pipelining is not as easy as it looks
  – Subtle and complex interplay
• Instruction set has a huge impact on pipeline efficiency
  – Variable instruction lengths and addressing modes are problematic
• Increasing depth of pipe does not always improve performance

Slide 45: Performance trade off

Slide 46: Comparisons

Slide 47: Summary
• Pipelines improve throughput
• Pipeline has stages corresponding to execution steps of multi-cycle instructions
• Requires buffers and special-purpose components to be added
• Problems with data hazards
  – Forwarding and stalling
• Problems with branches (control hazards)
  – Do nothing, assume not taken, move comparison early, use branch prediction table

