EECS 470: ILP and Exceptions, Lecture 7 (Coverage: Chapter 3)

Optimizing CPU Performance
Golden Rule: $t_{CPU} = N_{inst} \times CPI \times t_{CLK}$
Given this, what are our options?
– Reduce the number of instructions executed
– Reduce the cycles to execute an instruction
– Reduce the clock period
Our first focus: reducing CPI
– Approach: Instruction Level Parallelism (ILP)
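Evaluating the golden rule directly makes the trade-off concrete. The sketch below uses hypothetical numbers (10^9 instructions, a CPI of 1.5, a 1 ns clock period); it only restates the formula and is not a performance model. Halving CPI with the other two terms held fixed halves run time.

```python
def cpu_time(n_inst: float, cpi: float, t_clk: float) -> float:
    """Golden rule: t_CPU = N_inst * CPI * t_CLK (t_clk in seconds)."""
    return n_inst * cpi * t_clk

# Hypothetical workload: 1e9 instructions, CPI = 1.5, 1 ns clock period (1 GHz)
base = cpu_time(1e9, 1.5, 1e-9)   # about 1.5 s
# Same instruction count and clock, CPI halved (e.g., by exploiting ILP)
ilp = cpu_time(1e9, 0.75, 1e-9)   # about 0.75 s
print(f"{base:.3f} s -> {ilp:.3f} s")
```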

Why ILP?
Requirements
– Parallelism
– Large window
– Limited control deps
– Eliminate "false" deps
– Find run-time deps

How Much ILP is There?

How Large Must the “Window” Be?

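ALU Operation GOOD, Branch BAD
Expected number of branches between mispredicts: $E(X) \approx 1/(1-p)$, where $p$ is the branch prediction accuracy
E.g., $p = 95\%$: $E(X) \approx 1/0.05 = 20$ branches, or roughly 100 instructions at about one branch every five instructions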
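The $1/(1-p)$ figure follows from treating each branch as an independent trial that mispredicts with probability $1 - p$, so the number of branches up to and including the next mispredict is geometric with mean $1/(1-p)$. The sketch assumes that independence plus roughly one branch every five instructions, the ratio implied by 20 branches ≈ 100 instructions; both are simplifications, and the function names are illustrative.

```python
def expected_branches_between_mispredicts(p: float) -> float:
    """Mean of a geometric distribution: each branch mispredicts with prob. 1 - p."""
    if not 0.0 <= p < 1.0:
        raise ValueError("prediction accuracy p must be in [0, 1)")
    return 1.0 / (1.0 - p)

def expected_insts_between_mispredicts(p: float, insts_per_branch: float = 5.0) -> float:
    # insts_per_branch = 5 is an assumption (the ratio implied by 20 branches ~ 100 insts)
    return expected_branches_between_mispredicts(p) * insts_per_branch

for p in (0.90, 0.95, 0.99):
    print(f"p = {p:.2f}: ~{expected_branches_between_mispredicts(p):.0f} branches, "
          f"~{expected_insts_between_mispredicts(p):.0f} instructions")
```

Because the mean grows as $1/(1-p)$, raising accuracy from 95% to 99% raises it from about 20 to about 100 branches.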

How Accurate are Branch Predictors?

Impact of Physical Storage Limitations
Each instruction "in flight" must have storage for its result
– Really worse than this because of misspeculation…

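Registers GOOD, Memory BAD
Benefits of registers
– Well-described deps
– Fast access
– Finite resource
Memory gives up these benefits in exchange for flexibility:
*p = …
*q = …
… = *p   ? Does this load depend on the store to *q? Only if p and q point to the same location.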
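To see why the final load is hard to schedule, model memory as a Python list and the pointers p and q as indices into it. The stored values 1 and 2 and the helper names are hypothetical stand-ins for the slide's "…". Which store the load of *p observes depends only on whether p == q, which in general is known only at run time.

```python
from typing import List

def store_store_load(mem: List[int], p: int, q: int) -> int:
    """*p = 1; *q = 2; return *p   (p and q are indices standing in for pointers)."""
    mem[p] = 1     # store through p
    mem[q] = 2     # store through q: overwrites *p only if p and q alias
    return mem[p]  # load of *p: depends on the *q store iff p == q

def may_hoist_load_above_store(p: int, q: int) -> bool:
    """The load of *p may move above the *q store only if the addresses differ."""
    return p != q

print(store_store_load([0] * 4, 0, 1))  # 1: no alias, value comes from the *p store
print(store_store_load([0] * 4, 2, 2))  # 2: p and q alias, value comes from the *q store
```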

“Bottom Line” for an Ambitious Design

First Optimization: Out-of-Order Writeback

Playing by the Rules: In-order Writeback
DIV.D  IF ID D1 D2 D3 D4 D5 MEM WB
ADD       IF ID EX MEM WB
What's wrong with this picture? Divide by zero! ADD has already written back its result when DIV.D detects the fault, so the handler sees ADD's result even though DIV.D comes first in program order.
Fix: stall ADD so that it writes back only after DIV.D
DIV.D  IF ID D1 D2 D3 D4 D5 MEM WB
ADD       IF ID EX MEM WB, stalled until DIV.D has written back
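The stall length falls out of simple cycle counting. A sketch under stated assumptions: one instruction fetched per cycle starting at cycle 1, no other hazards, DIV.D passing through 9 stages (IF, ID, D1 to D5, MEM, WB) and ADD through 5 (IF, ID, EX, MEM, WB). Enforcing in-order writeback means each WB must land at least one cycle after its predecessor's; the `wb_cycles` helper is hypothetical.

```python
from typing import List, Tuple

def wb_cycles(insts: List[Tuple[str, int]], in_order: bool) -> List[int]:
    """Return the WB cycle of each instruction.

    insts: (name, number of pipeline stages including IF and WB), in program order,
    fetched one per cycle starting at cycle 1.
    """
    wbs: List[int] = []
    for i, (_name, stages) in enumerate(insts):
        wb = (i + 1) + stages - 1          # WB cycle with no stalls
        if in_order and wbs:
            wb = max(wb, wbs[-1] + 1)      # stall until the predecessor has written back
        wbs.append(wb)
    return wbs

prog = [("DIV.D", 9), ("ADD", 5)]
print(wb_cycles(prog, in_order=False))  # [9, 6]: ADD writes back first
print(wb_cycles(prog, in_order=True))   # [9, 10]: ADD stalls 4 cycles
```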

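Another Way to Get in the Same Mess
Many systems use microcode
– Simplifies mapping of complex instructions to CPU resources
IA-32 add-with-carry
– ADC (EAX),EBX is executed as:
  tmp = MEM[EAX]
  tmp = tmp + EBX + CF, update CF   Side effect!
  MEM[EAX] = tmp   Potential fault!
If the store faults, CF has already been updated, so the machine state no longer matches "ADC never started".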
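A minimal sketch of why the micro-op ordering matters, assuming a 32-bit model in which memory is a Python dict, a page is writable only if its address is in a set, and `PageFault` is a hypothetical exception. `adc_naive` mirrors the sequence above (CF updated before the store that can fault); `adc_buffered` holds the new CF in a temporary until the store succeeds. This models the problem only; it is not how any particular IA-32 implementation sequences its micro-ops.

```python
from typing import Callable, Dict, Set

class PageFault(Exception):
    """Hypothetical fault raised by the modeled store."""

MASK32 = 0xFFFFFFFF

def adc_naive(mem: Dict[int, int], regs: Dict[str, int], flags: Dict[str, int],
              writable: Set[int]) -> None:
    addr = regs["EAX"]
    tmp = mem[addr]                          # uop 1: tmp = MEM[EAX]
    total = tmp + regs["EBX"] + flags["CF"]  # uop 2: tmp = tmp + EBX + CF ...
    flags["CF"] = total >> 32                # ... and update CF (side effect, now visible)
    if addr not in writable:                 # uop 3: MEM[EAX] = tmp may fault
        raise PageFault(hex(addr))
    mem[addr] = total & MASK32

def adc_buffered(mem: Dict[int, int], regs: Dict[str, int], flags: Dict[str, int],
                 writable: Set[int]) -> None:
    addr = regs["EAX"]
    total = mem[addr] + regs["EBX"] + flags["CF"]
    new_cf = total >> 32                     # kept in a temporary, not yet visible
    if addr not in writable:
        raise PageFault(hex(addr))
    mem[addr] = total & MASK32
    flags["CF"] = new_cf                     # side effect committed only after the store succeeds

def cf_after_fault(adc: Callable[..., None]) -> int:
    mem = {0x1000: 0xFFFFFFFF}
    regs = {"EAX": 0x1000, "EBX": 1}
    flags = {"CF": 0}
    try:
        adc(mem, regs, flags, writable=set())  # hypothetical read-only page
    except PageFault:
        pass                                   # the fault handler would run here
    return flags["CF"]

print(cf_after_fault(adc_naive))     # 1: CF changed even though ADC faulted
print(cf_after_fault(adc_buffered))  # 0: state as if ADC never started
```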

Exceptions and Interrupts
Exception type | Sync/Async | Maskable? | Restartable?
I/O request | Async | Yes
System call | Sync | No | Yes
Breakpoint | Sync | Yes
Overflow | Sync | Yes
Page fault | Sync | No | Yes
Misaligned access | Sync | No | Yes
Memory protect | Sync | No | Yes
Machine check | Async/Sync | No
Power failure | Async | No

Solution: Precise Interrupts
Implementation approaches
– Don't (provide precise interrupts), e.g., Cray-1
– Force in-order WB, e.g., ARM SA-1
– Force in-order checks, e.g., Alpha 21064
– Buffer speculative results, e.g., P4, Alpha 21264
  History buffer
  Future file/Reorder buffer
[Figure labels: PC; instructions completely finished / no instruction has executed at all; precise state vs. speculative state]

Precise Interrupts via the Reorder Buffer
Reorder Buffer (ROB)
– Circular queue of speculative state
– May contain multiple definitions of the same register
– Each entry: PC, Dst regID, Dst value, Except?
@ Alloc
– Allocate result storage at Tail
@ Sched
– Get inputs (search the ROB from Tail to Head, so the youngest definition wins, then the ARF)
– Wait until all inputs are ready
@ WB
– Write results/fault to ROB
– Indicate result is ready
@ CT
– Wait until inst @ Head is done
– If fault, initiate handler
– Else, write results to ARF
– Deallocate entry from ROB
[Figure: pipeline IF, ID, Alloc, Sched, EX, MEM, CT around the ROB (Head, Tail) and the ARF; instructions are allocated in order, execute in any order, and commit (CT) in order]
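The four steps above can be sketched as a small circular queue. This is a hypothetical model, not a hardware description: the class and method names are illustrative, the ARF is a dict, and each instruction reads its operands just before its own entry is allocated (so `r3 = r2 + r3` does not find itself). The usage replays the code sequence from the example that follows.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Entry:
    pc: int
    dst: str                       # Dst regID
    value: Optional[float] = None  # Dst value
    done: bool = False             # result (or fault) has been written back
    fault: bool = False            # Except?

class ReorderBuffer:
    """Circular queue of speculative state, committed in order to the ARF."""

    def __init__(self, size: int, arf: Dict[str, float]) -> None:
        self.slots: List[Optional[Entry]] = [None] * size
        self.head = 0   # oldest in-flight instruction
        self.tail = 0   # next slot to allocate
        self.count = 0
        self.arf = arf

    def alloc(self, pc: int, dst: str) -> int:
        """@ Alloc: allocate result storage at the tail."""
        if self.count == len(self.slots):
            raise RuntimeError("ROB full: allocation must stall")
        idx = self.tail
        self.slots[idx] = Entry(pc, dst)
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return idx

    def read(self, reg: str) -> Optional[float]:
        """@ Sched: search tail-to-head (youngest first), then the ARF; None = not ready."""
        for k in range(self.count):
            entry = self.slots[(self.tail - 1 - k) % len(self.slots)]
            if entry.dst == reg:
                return entry.value if entry.done else None
        return self.arf[reg]

    def writeback(self, idx: int, value: Optional[float], fault: bool = False) -> None:
        """@ WB: write result/fault into the entry and mark it ready."""
        entry = self.slots[idx]
        entry.value, entry.fault, entry.done = value, fault, True

    def commit(self) -> Optional[int]:
        """@ CT: retire done entries from the head; on a fault, flush and return its PC."""
        while self.count and self.slots[self.head].done:
            entry = self.slots[self.head]
            if entry.fault:
                self.slots = [None] * len(self.slots)
                self.tail = self.head   # Head == Tail: ROB empty
                self.count = 0
                return entry.pc
            self.arf[entry.dst] = entry.value
            self.slots[self.head] = None
            self.head = (self.head + 1) % len(self.slots)
            self.count -= 1
        return None

arf = {"f2": 3.0, "f3": 2.0, "r2": 6, "r3": 5}
rob = ReorderBuffer(8, arf)
i1 = rob.alloc(pc=0, dst="f1")        # f1 = f2 / f3 (long latency: result arrives later)
a, b = rob.read("r2"), rob.read("r3")  # r3 = r2 + r3: operands from the ARF
i2 = rob.alloc(pc=1, dst="r3")
rob.writeback(i2, a + b)              # 11
a, b = rob.read("r3"), rob.read("r2")  # r4 = r3 - r2: r3 now comes from the ROB
i3 = rob.alloc(pc=2, dst="r4")
rob.writeback(i3, a - b)              # 5
r4_value = rob.slots[i3].value
assert rob.commit() is None           # f1 at the Head is not done: nothing retires
rob.writeback(i1, None, fault=True)   # the divide reports an exception
faulting_pc = rob.commit()            # flush: r3 and r4 never reach the ARF
```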

Reorder Buffer Example
Code sequence
  f1 = f2 / f3
  r3 = r2 + r3
  r4 = r3 - r2
Initial conditions: reorder buffer empty; f2 = 3.0, f3 = 2.0, r2 = 6, r3 = 5
ROB contents over time, Head to Tail (each entry: regID, result, Except?):
1. {f1, ?, ?}
2. {f1, ?, ?} {r3, ?, ?}
3. {f1, ?, ?} {r3, 11, n} {r4, ?, ?}
4. {f1, ?, ?} {r3, 11, n} {r4, 5, n}   (r4 = r3 - r2 reads r3 = 11 from the ROB: 11 - 6 = 5)
5. {f1, ?, y} {r3, 11, n} {r4, 5, n}   (the divide reports an exception)
6. f1 reaches the Head with Except = y: the ROB is flushed (Head = Tail) and fetch resumes at the first instruction of the fault handler. The r3 and r4 results never reach the ARF, so r3 is still 5.
Slots outside Head to Tail still show a stale entry {r8, 2, n}, which is ignored.
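To connect this trace back to the divide-by-zero picture, the sketch below replays the example's three instructions twice: once writing results straight to the ARF as they complete (out-of-order writeback), and once retiring strictly in program order as the ROB does. The completion order (both ALU ops before the divide), the dict-based register file, and the function name are simplifying assumptions, not part of the slides.

```python
from typing import Dict, List, Optional, Tuple

INITIAL = {"f2": 3.0, "f3": 2.0, "r2": 6, "r3": 5}
# Program order from the example: (dst, result, raises exception?)
PROGRAM: List[Tuple[str, Optional[float], bool]] = [
    ("f1", None, True),   # f1 = f2 / f3 (faults in the example)
    ("r3", 11, False),    # r3 = r2 + r3
    ("r4", 5, False),     # r4 = r3 - r2
]
COMPLETION_ORDER = [1, 2, 0]  # assumed: the short ALU ops finish before the divide

def state_seen_by_handler(use_rob: bool) -> Dict[str, float]:
    arf = dict(INITIAL)
    finished: Dict[int, bool] = {}  # program index -> faulted?
    head = 0                        # oldest unretired instruction
    for i in COMPLETION_ORDER:
        dst, value, faults = PROGRAM[i]
        if not use_rob:
            if faults:
                return arf          # handler sees whatever was already written
            arf[dst] = value        # out-of-order writeback straight to the ARF
            continue
        finished[i] = faults
        while head in finished:     # retire strictly in program order
            if finished[head]:
                return arf          # faulting instruction at the Head: precise state
            d, v, _ = PROGRAM[head]
            arf[d] = v
            head += 1
    return arf

print(state_seen_by_handler(use_rob=False))  # r3 = 11 and r4 = 5 already visible: imprecise
print(state_seen_by_handler(use_rob=True))   # r3 = 5, no r4: state before the divide
```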
