CS 7810 Lecture 10 Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors O. Mutlu, J. Stark, C. Wilkerson, Y.N.

1 CS 7810 Lecture 10
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors
O. Mutlu, J. Stark, C. Wilkerson, Y.N. Patt
Proceedings of HPCA-9, February 2003

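2 In-Flight Windows
[Figure: a reorder buffer holding instructions #1 to #96, which map to physical registers p33 to p128; the physical register file is shown split into p1-32 and p33-128. Instruction #1 is a load that misses in the cache for 300 cycles; the later entries are marked "c".]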

3 In-Flight Windows
[Figure: the same reorder buffer (#1 to #96, p33 to p128) and physical register file, now with a further instruction #97, also a load with a 300-cycle cache miss, in addition to the missing load at #1.]
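
The stall that the two In-Flight Windows slides illustrate can be estimated with a rough back-of-the-envelope sketch. The 96 reorder buffer entries and the 300-cycle miss come from the figure; the dispatch width of 4 and the function names are assumptions made for this example, not numbers from the lecture.

```python
import math

def stall_cycles(rob_entries=96, miss_latency=300, dispatch_width=4):
    """Cycles the front end sits idle once the window is full.

    Assumes the missing load is at the head of the reorder buffer and
    that the window fills at a steady dispatch_width (hypothetical).
    """
    fill_time = math.ceil(rob_entries / dispatch_width)
    return max(0, miss_latency - fill_time)

def entries_to_cover(miss_latency=300, dispatch_width=4):
    """Window size needed to keep dispatching for the whole miss."""
    return miss_latency * dispatch_width

print(stall_cycles())       # idle cycles with a 96-entry window
print(entries_to_cover())   # entries needed to hide the miss entirely
```

Under these assumptions the window fills long before the miss returns, so a second miss such as #97 cannot even enter the window to overlap with the first.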

4 Memory Bottlenecks
- 128-entry window, real L2: 0.77 IPC
- 128-entry window, perfect L2: 1.69 IPC
- 2048-entry window, real L2: 1.15 IPC
- 2048-entry window, perfect L2: 2.02 IPC
- 128-entry window, real L2, runahead: 0.94 IPC
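
To put the Memory Bottlenecks numbers side by side, the short sketch below computes each configuration's speedup over the 128-entry, real-L2 baseline. The IPC values are copied from the slide; the dictionary keys and function names are labels invented for this example.

```python
# IPC figures from the slide; the keys are made-up labels for this sketch.
ipc = {
    "128_real": 0.77,
    "128_perfect": 1.69,
    "2048_real": 1.15,
    "2048_perfect": 2.02,
    "128_real_runahead": 0.94,
}

def speedup(config, baseline="128_real"):
    """IPC of `config` relative to `baseline`."""
    return ipc[config] / ipc[baseline]

def gap_closed(config, small="128_real", large="2048_real"):
    """Fraction of the small-to-large window IPC gap that `config` recovers."""
    return (ipc[config] - ipc[small]) / (ipc[large] - ipc[small])

for name in ipc:
    print(f"{name:>18}: {speedup(name):.2f}x")
print(f"gap closed by runahead: {gap_closed('128_real_runahead'):.2f}")
```

The runahead ratio, 0.94 / 0.77, is about 1.22, which lines up with the 22% improvement quoted on the Results slide.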

5 Runahead
[Figure: pipeline diagram with Trace Cache, Current Rename, IssueQ, Regfile (128), Checkpointed Regfile (32), Retired Rename, ROB, FUs, L1 D, and Runahead Cache.]
When the oldest instruction is a cache miss, behave as if it causes a context switch:
- checkpoint the committed registers, rename table, return address stack, and branch history register
- assume a bogus value and start a new thread
- this thread cannot modify program state, but can prefetch

6 Runahead
[Figure: same pipeline diagram as slide 5.]
When the cache miss returns, copy the registers and the mapping and start executing from that ld/st instruction
- cost of copying back and forth is not trivial
- many instructions get executed twice
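
The enter and exit steps described on the two Runahead slides above can be sketched as a toy model. This is only an illustration under stated assumptions: the `ArchState` and `Core` classes and their field names are hypothetical, and real hardware checkpoints register files and tables rather than deep-copying Python objects.

```python
import copy
from dataclasses import dataclass, field

@dataclass
class ArchState:
    """State the slides say is checkpointed (field names are hypothetical)."""
    regs: dict = field(default_factory=dict)
    rename_table: dict = field(default_factory=dict)
    return_stack: list = field(default_factory=list)
    branch_history: int = 0

class Core:
    def __init__(self, state):
        self.state = state
        self.checkpoint = None
        self.resume_pc = None
        self.mode = "normal"

    def enter_runahead(self, miss_pc):
        # Oldest instruction missed in the cache: save state, then speculate.
        self.checkpoint = copy.deepcopy(self.state)
        self.resume_pc = miss_pc
        self.mode = "runahead"

    def exit_runahead(self):
        # Miss returned: discard runahead results and restart at the miss.
        self.state = self.checkpoint
        self.checkpoint = None
        self.mode = "normal"
        return self.resume_pc

core = Core(ArchState(regs={"r1": 5}))
core.enter_runahead(miss_pc=0x400)
core.state.regs["r1"] = 99          # runahead work, thrown away on exit
restart_pc = core.exit_runahead()
print(core.state.regs["r1"], hex(restart_pc))
```

Everything executed between `enter_runahead` and `exit_runahead` runs again after the restart, which is the "executed twice" cost the slide mentions.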

7 Runahead
[Figure: same pipeline diagram as slide 5.]
Note that some values are missing:
- Do not bother to execute instrs that have invalid inputs
- Accelerates the thread and generates accurate prefetches
- Unknown store addresses are ignored
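
A minimal sketch of how invalid values might propagate during runahead, following the bullets above: an instruction with an invalid input is skipped and marks its destination invalid, while a load with a known address that misses becomes a prefetch. The instruction encoding, the `run_ahead` function, and the 64-byte line size are assumptions for this example; stores are left out.

```python
INV = None  # marker for an invalid (unknown) register value

def run_ahead(instrs, regs, cached_lines, line_size=64):
    """Run instrs in runahead mode; return (prefetched line addresses, registers).

    Each instruction is (op, dst, src, imm) with op in {"add", "load"}.
    """
    regs = dict(regs)
    prefetches = []
    for op, dst, src, imm in instrs:
        val = regs.get(src, INV)
        if val is INV:
            regs[dst] = INV          # invalid input: skip, poison the result
            continue
        if op == "add":
            regs[dst] = val + imm
        elif op == "load":
            line_addr = (val + imm) // line_size * line_size
            if line_addr in cached_lines:
                regs[dst] = 0        # placeholder; memory contents not modeled
            else:
                prefetches.append(line_addr)  # miss with a valid address
                regs[dst] = INV
        else:
            raise ValueError(f"unsupported op: {op}")
    return prefetches, regs

# r1 is the destination of the load that triggered runahead, so it is invalid.
program = [
    ("add", "r2", "r1", 4),    # depends on the miss: skipped
    ("add", "r4", "r3", 64),   # independent: executes
    ("load", "r5", "r4", 0),   # valid address, misses: prefetch
    ("load", "r6", "r2", 0),   # invalid address: skipped
    ("load", "r7", "r5", 8),   # depends on a runahead miss: skipped
]
prefetches, final = run_ahead(program, {"r1": INV, "r3": 1000}, cached_lines=set())
print(prefetches)
```

The last load shows a limit of the scheme: a miss whose address depends on another miss, as in pointer chasing, is never prefetched.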

8 Runahead
[Figure: same pipeline diagram as slide 5.]
Runahead instrs write to registers (as before), but runahead stores write to the runahead cache:
- Runahead cache and L1D are accessed in parallel
- If a block gets evicted out of runahead cache, data is lost
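
The store handling above can be illustrated with a toy runahead cache. This sketch assumes a tiny fully associative structure with oldest-first eviction and models the parallel lookup by letting a runahead-cache hit take priority over L1D; the class name, capacity, and replacement policy are assumptions for the example, not details from the slides.

```python
from collections import OrderedDict

class RunaheadCache:
    """Holds runahead store data so the L1D is never modified."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.lines = OrderedDict()

    def store(self, addr, value):
        if addr in self.lines:
            self.lines.move_to_end(addr)
        elif len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)   # evicted store data is lost
        self.lines[addr] = value

    def load(self, addr, l1d):
        # Both structures are probed; a runahead-cache hit wins.
        if addr in self.lines:
            return self.lines[addr]
        return l1d.get(addr)

    def flush(self):
        """Drop all runahead store data when runahead mode ends."""
        self.lines.clear()

l1d = {0x10: 1}
rc = RunaheadCache(capacity=2)
rc.store(0x10, 7)
forwarded = rc.load(0x10, l1d)    # sees the runahead store
rc.store(0x20, 8)
rc.store(0x30, 9)                 # evicts the entry for 0x10
stale = rc.load(0x10, l1d)        # falls back to the old L1D value
print(forwarded, stale, l1d[0x10])
```

After the eviction, a later runahead load reads the stale L1D value, which can make subsequent runahead addresses less accurate but never corrupts program state.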

9 Runahead
[Figure: same pipeline diagram as slide 5.]
- The branch predictor gets accessed/updated twice
- Cannot resolve branch mispredicts if the branch has an invalid input

10 Another Form of Runahead
[Figure: a primary thread and a runahead thread, with occasional state copy and re-start.]

11 Methodology
- 80 benchmarks: 147 code sequences (that are memory-bound), each 30M instructions; SPEC, Web, Media, Server, workstation, productivity
- Pentium 4 hardware prefetcher: eight stream buffers that stay 256 bytes ahead
- Also evaluate a "future baseline" with twice as many resources
- Perfect memory disambiguation, 500-cycle memory access

12 Methodology

13 Results
- Runahead improves performance by 22%
- Synergistic interaction between prefetch & runahead: is the stream buffer not keeping up?

14 Other Results
- Runahead with a 128-entry window does as well as a 384-entry window
- A better front-end improves benefits from runahead
- On average, 431 useful instructions per runahead and 280 after a mispredict
- Without the runahead cache, only half the improvement is observed

15 Unanswered Questions
- How many re-execs?
- How many invalid instrs?
- How much wasted power? (re-execs, double writes to checkpoints)
- How many accesses to hash tables, pointers, and branch-dependent data?

16 Alternative Approaches
- Does runahead lead to excessive power and verification complexity?
- Better stride prefetchers or stream buffers?
- Is this the best way to support a large in-flight window (register file, issueq, ROB)?

17 Next Week’s Paper “Delaying Physical Register Allocation Through Virtual-Physical Registers”, T. Monreal, A. Gonzalez, M. Valero, J. Gonzalez, V. Vinals, Proceedings of MICRO-32, November 1999
