Presentation is loading. Please wait.

Presentation is loading. Please wait.

PipelinedImplementation Part II PipelinedImplementation.

Similar presentations


Presentation on theme: "PipelinedImplementation Part II PipelinedImplementation."— Presentation transcript:

1 PipelinedImplementation Part II PipelinedImplementation

2 – 2 – Processor Overview Make the pipelined processor work! Data Hazards Instruction having register R as source follows shortly after instruction having register R as destination Common condition, don’t want to slow down pipeline Control Hazards Mispredict conditional branch Our design predicts all branches as being taken Naïve pipeline executes two extra instructions Getting return address for ret instruction PIPE- executes three extra instructions Making Sure It Really Works What if multiple special cases happen simultaneously?

3 – 3 – Processor Suggested Reading - Chap 4.5

4 – 4 – Processor Branch Misprediction Example Should only execute first 7 instructions 0x000: xorl %eax,%eax 0x002: jne t # Not taken 0x007: irmovl $1, %eax # Fall through 0x00d: nop 0x00e: nop 0x00f: nop 0x010: halt 0x011: t: irmovl $3, %edx # Target (Should not execute) 0x017: irmovl $4, %ecx # Should not execute 0x01d: irmovl $5, %edx # Should not execute demo-j.ys

5 – 5 – Processor Branch Misprediction Trace Incorrectly execute two instructions at branch target

6 – 6 – Processor 0x000: irmovl Stack,%esp # Intialize stack pointer 0x006: nop # Avoid hazard on %esp 0x007: nop 0x008: nop 0x009: call p # Procedure call 0x00e: irmovl $5,%esi # Return point 0x014: halt 0x020:.pos 0x20 0x020: p: nop# procedure 0x021: nop 0x022: nop 0x023: ret 0x024: irmovl $1,%eax # Should not be executed 0x02a: irmovl $2,%ecx # Should not be executed 0x030: irmovl $3,%edx # Should not be executed 0x036: irmovl $4,%ebx # Should not be executed 0x100:.pos 0x100 0x100: Stack: # Stack: Stack pointer Return Example Require lots of nops to avoid data hazards demo-ret.ys

7 – 7 – Processor Incorrect Return Example Incorrectly execute 3 instructions following ret

8 – 8 – Processor Handling Misprediction Predict branch as taken Fetch 2 instructions at target Cancel when mispredicted Detect branch not-taken in execute stage On following cycle, replace instructions in execute and decode by bubbles No side effects have occurred yet Figure 4.63 P346

9 – 9 – Processor Detecting Mispredicted Branch ConditionTrigger Mispredicted Branch E_icode = IJXX & !e_Bch Figure 4.64 P347

10 – 10 – Processor Control for Misprediction 0x000:xorl%eax,%eax 123456789 FDEMWFDEMW 0x002:jne t # Not taken FDEMWFDEMW EMW 10 # demo-j.ys 0x011: t:irmovl$2,%edx# Target bubble 0x017:irmovl$3,%ebx# Target+1 FD EMW D F bubble 0x007:irmovl$1,%eax# Fall through 0x00d:nop FDEMWFDEMW FDEMWFDEMW ConditionFDEMW Mispredicted Branch normalbubblebubblenormalnormal Figure 4.63 P346 Figure 4.66 P348

11 – 11 – Processor 0x000: irmovl Stack,%esp # Initialize stack pointer 0x006: call p # Procedure call 0x00b: irmovl $5,%esi # Return point 0x011: halt 0x020:.pos 0x20 0x020: p: irmovl $-1,%edi # procedure 0x026: ret 0x027: irmovl $1,%eax # Should not be executed 0x02d: irmovl $2,%ecx # Should not be executed 0x033: irmovl $3,%edx # Should not be executed 0x039: irmovl $4,%ebx # Should not be executed 0x100:.pos 0x100 0x100: Stack: # Stack: Stack pointer Return Example Previously executed three additional instructions demo-retb.ys

12 – 12 – Processor 0x026: ret FDEM W bubble FDEM W FDEMW FDEMW 0x00b:irmovl$5,%esi# Return FDEMW # demo-retb FDEMW F valC  5 rB  %esi F valC  5 rB  %esi W valM= 0x0b W valM= 0x0b Correct Return Example As ret passes through pipeline, stall at fetch stage While in decode, execute, and memory stage fetch the same instruction after ret 3 times. Inject bubble into decode stage Release stall when reach write-back stage Figure 4.61 P344

13 – 13 – Processor Detecting Return ConditionTrigger Processing ret IRET in { D_icode, E_icode, M_icode } Figure 4.64 P347

14 – 14 – Processor 0x026: ret FDEM W bubble FDEM W FDEMW FDEMW 0x00b:irmovl$5,%esi# Return FDEMW # demo-retb FDEMW Control for Return ConditionFDEMW Processing ret stallbubblenormalnormalnormal Figure 4.66 P348

15 – 15 – Processor Special Control Cases DetectionAction ConditionTrigger Processing ret IRET in { D_icode, E_icode, M_icode } Load/Use Hazard E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB } Mispredicted Branch E_icode = IJXX & !e_Bch ConditionFDEMW Processing ret stallbubblenormalnormalnormal Load/Use Hazard stallstallbubblenormalnormal Mispredicted Branch normalbubblebubblenormalnormal Figure 4.64 P347 Figure 4.66 P348

16 – 16 – Processor Implementing Pipeline Control Combinational logic generates pipeline control signals Action occurs at start of following cycle Figure 4.68 P351

17 – 17 – Processor Initial Version of Pipeline Control bool F_stall = # Conditions for a load/use hazard E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB } || # Stalling at fetch while ret passes through pipeline IRET in { D_icode, E_icode, M_icode }; bool D_stall = # Conditions for a load/use hazard E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB }; bool D_bubble = # Mispredicted branch (E_icode == IJXX && !e_Bch) || # Stalling at fetch while ret passes through pipeline IRET in { D_icode, E_icode, M_icode }; bool E_bubble = # Mispredicted branch (E_icode == IJXX && !e_Bch) || # Load/use hazard E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB};

18 – 18 – Processor Control Combinations Special cases that can arise on same clock cycle Combination A Not-taken branch ret instruction at branch target Combination B Instruction that reads from memory to %esp Followed by ret instruction Figure 4.67 P349

19 – 19 – Processor Control Combination A Should handle as mispredicted branch Stalls F pipeline register But PC selection logic will be using M_valA anyhow JXX E D M Mispredict JXX E D M Mispredict E ret D M 1 E D M 1 E D M 1 Combination A ConditionFDEMW Processing ret stallbubblenormalnormalnormal Mispredicted Branch normalbubblebubblenormalnormal Combinationstallbubblebubblenormalnormal

20 – 20 – Processor Control Combination B Would attempt to bubble and stall pipeline register D Signaled by processor as pipeline error Load E Use D M Load/use E ret D M 1 E D M 1 E D M 1 Combination B ConditionFDEMW Processing ret stallbubblenormalnormalnormal Load/Use Hazard stallstallbubblenormalnormal Combinationstall bubble + stall bubblenormalnormal

21 – 21 – Processor Handling Control Combination B Load/use hazard should get priority ret instruction should be held in decode stage for additional cycle Load E Use D M Load/use E ret D M 1 E D M 1 E D M 1 Combination B ConditionFDEMW Processing ret stallbubblenormalnormalnormal Load/Use Hazard stallstallbubblenormalnormal Combinationstallstallbubblenormalnormal

22 – 22 – Processor Corrected Pipeline Control Logic Load/use hazard should get priority ret instruction should be held in decode stage for additional cycle ConditionFDEMW Processing ret stallbubblenormalnormalnormal Load/Use Hazard stallstallbubblenormalnormal Combinationstallstallbubblenormalnormal bool D_bubble = # Mispredicted branch (E_icode == IJXX && !e_Bch) || # Stalling at fetch while ret passes through pipeline IRET in { D_icode, E_icode, M_icode } # but not condition for a load/use hazard && !(E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB });

23 – 23 – Processor Pipeline Summary Data Hazards Most handled by forwarding No performance penalty Load/use hazard requires one cycle stall Control Hazards Cancel instructions when detect mispredicted branch Two clock cycles wasted Stall fetch stage while ret passes through pipeline Three clock cycles wasted Control Combinations Must analyze carefully First version had subtle bug Only arises with unusual instruction combination

24 – 24 – Processor Performance Metrics Clock rate Measured in Megahertz or Gigahertz Function of stage partitioning and circuit design Keep amount of work per stage small Rate at which instructions executed CPI: cycles per instruction On average, how many clock cycles does each instruction require? Function of pipeline design and benchmark programs E.g., how frequently are branches mispredicted? 4.5.10

25 – 25 – Processor CPI for PIPE CPI  1.0 Fetch instruction each clock cycle Effectively process new instruction almost every cycle Although each individual instruction has latency of 5 cycles CPI > 1.0 Sometimes must stall or cancel branches Computing CPI C clock cycles I instructions executed to completion B bubbles injected (C = I + B) CPI = C/I = (I+B)/I = 1.0 + B/I Factor B/I represents average penalty due to bubbles

26 – 26 – Processor CPI for PIPE (Cont.) B/I = LP + MP + RP LP: Penalty due to load/use hazard stalling Fraction of instructions that are loads0.25 Fraction of load instructions requiring stall0.20 Number of bubbles injected each time1  LP = 0.25 * 0.20 * 1 = 0.05 MP: Penalty due to mispredicted branches Fraction of instructions that are cond. jumps 0.20 Fraction of cond. jumps mispredicted0.40 Number of bubbles injected each time 2  MP = 0.20 * 0.40 * 2 = 0.16 RP: Penalty due to ret instructions Fraction of instructions that are returns0.02 Number of bubbles injected each time 3  RP = 0.02 * 3 = 0.06 Net effect of penalties 0.05 + 0.16 + 0.06 = 0.27  CPI = 1.27 (Not bad!) Typical Values

27 – 27 – Processor Fetch Logic Revisited During Fetch Cycle  Select PC  Read bytes from instruction memory  Examine icode to determine instruction length  Increment PCTiming Steps 2 & 4 require significant amount of time

28 – 28 – Processor Standard Fetch Timing Must Perform Everything in Sequence Can’t compute incremented PC until know how much to increment it by Select PC Mem. ReadIncrement need_regids, need_valC 1 clock cycle

29 – 29 – Processor A Fast PC Increment Circuit 3-bit adder need_ValC need_regids 0 29-bit incre- menter MUX High-order 29 bits Low-order 3 bits High-order 29 bitsLow-order 3 bits 01 PC incrPC SlowFast carry

30 – 30 – Processor Modified Fetch Timing 29-Bit Incrementer Acts as soon as PC selected Output not needed until final MUX Works in parallel with memory read Select PC Mem. Read Incrementer need_regids, need_valC 3-bit add MUX 1 clock cycle Standard cycle

31 – 31 – Processor More Realistic Fetch Logic Fetch Box Integrated into instruction cache Fetches entire cache block (16 or 32 bytes) Selects current instruction from current block Works ahead to fetch next block As reaches end of current block At branch target

32 – 32 – Processor Exception Condition: Instruction encounter an error condition Deal flow: Break the program flow Invoke the exception hander provided by OS. (Maybe) continue the program flow E.g. page fault exception

33 – 33 – Processor Exceptions Conditions under which pipeline cannot continue normal operationCauses Halt instruction(Current) Bad address for instruction or data(Previous) Invalid instruction(Previous) Pipeline control error(Previous) Desired Action Complete some instructions Either current or previous (depends on exception type) Discard others Call exception handler Like an unexpected procedure call

34 – 34 – Processor Exception Examples Detect in Fetch Stage irmovl $100,%eax rmmovl %eax,0x10000(%eax) # invalid address jmp $-1 # Invalid jump target.byte 0xFF # Invalid instruction code halt # Halt instruction Detect in Memory Stage

35 – 35 – Processor Exceptions in Pipeline Processor #1 Desired Behavior rmmovl should cause exception # demo-exc1.ys irmovl $100,%eax rmmovl %eax,0x10000(%eax) # Invalid address nop.byte 0xFF # Invalid instruction code 0x000: irmovl $100,%eax 1234 FDEM FDE 0x006: rmmovl %eax,0x10000(%eax) 0x00c: nop 0x00d:.byte 0xFF FD F W 5 M E D Exception detected

36 – 36 – Processor Exceptions in Pipeline Processor #2 Desired Behavior No exception should occur # demo-exc2.ys 0x000: xorl %eax,%eax # Set condition codes 0x002: jne t # Not taken 0x007: irmovl $1,%eax 0x00d: irmovl $2,%edx 0x013: halt 0x014: t:.byte 0xFF # Target 0x000: xorl %eax,%eax 123 FDE FD 0x002: jne t 0x014: t:.byte 0xFF 0x???: (I’m lost!) F Exception detected 0x007: irmovl $1,%eax 4 M E F D W 5 M D F E E D M 6 M E W 7 W M 8 W 9

37 – 37 – Processor Maintaining Exception Ordering Add exception status field to pipeline registers Fetch stage sets to either “AOK,” “ADR” (when bad fetch address), or “INS” (illegal instruction) Decode & execute pass values through Memory either passes through or sets to “ADR” Exception triggered only when instruction hits write back F predPC W icodevalEvalMdstEdstMexc M BchicodevalEvalAdstEdstMexc E icodeifunvalCvalAvalBdstEdstMsrcAsrcBexc D rBvalCvalPicodeifunrAexc

38 – 38 – Processor Side Effects in Pipeline Processor Desired Behavior rmmovl should cause exception No following instruction should have any effect # demo-exc3.ys irmovl $100,%eax rmmovl %eax,0x10000(%eax) # invalid address addl %eax,%eax # Sets condition codes 0x000: irmovl $100,%eax 1234 FDEM FDE 0x006: rmmovl %eax,0x10000(%eax) 0x00c: addl %eax,%eax FD W 5 M E Exception detected Condition code set

39 – 39 – Processor Avoiding Side Effects Presence of Exception Should Disable State Update When detect exception in memory stage Disable condition code setting in execute Must happen in same clock cycle When exception passes to write-back stage Disable memory write in memory stage Disable condition code setting in execute stageImplementation Hardwired into the design of the PIPE simulator You have no control over this

40 – 40 – Processor Rest of Exception Handling Calling Exception Handler Push PC onto stack Either PC of faulting instruction or of next instruction Usually pass through pipeline along with exception status Jump to handler address Usually fixed address Defined as part of ISAImplementation Haven’t tried it yet!

41 – 41 – Processor Modern CPU Design

42 – 42 – Processor Processor Summary Design Technique Create uniform framework for all instructions Want to share hardware among instructions Connect standard logic blocks with bits of control logicOperation State held in memories and clocked registers Computation done by combinational logic Clocking of registers/memories sufficient to control overall behavior Enhancing Performance Pipelining increases throughput and improves resource utilization Must make sure maintains ISA behavior


Download ppt "PipelinedImplementation Part II PipelinedImplementation."

Similar presentations


Ads by Google