TurboROB: A Low Cost Checkpoint/Restore Accelerator (HIPEAC 2008). Patrick Akl and Andreas Moshovos, AENAO Research Group.


1 TurboROB: A Low Cost Checkpoint/Restore Accelerator
Patrick Akl and Andreas Moshovos
AENAO Research Group, Department of Electrical and Computer Engineering, University of Toronto
{pakl, moshovos}@eecg.toronto.edu
HIPEAC 2008

2 Recovering From Control Flow Mispredictions
Accelerate recovery to improve performance.
[Figure: execution timeline. Predict a branch outcome, follow the predicted path, misprediction discovered, recover processor state, redirect fetch, resume execution on the correct path.]

3 State-of-the-Art Recovery
–Log of changes (ROB): records what changed and the old value
–State snapshot
Scalability and/or performance issues
[Figure labels: Predict a Branch Outcome, Misprediction Discovered]

4 Turbo-ROB
Make common case fast:
–Recover only at branches
Store only as much as needed:
–Partial log
[Figure labels: Predict a Branch Outcome, Misprediction Discovered, Log of Changes (ROB), Partial Log of Changes (Turbo-ROB)]

5 Outline
–Control Flow Mispeculation Recovery
–TurboROB
–Methodology and Results
–Summary

6 State Recovery Example: Register Alias Table (RAT)
The RAT maps each architectural register to a physical register.
[Figure labels: # arch. regs, Lg(# arch. regs)]
Original code:
A  add r1, r2, 100
B  breq r1, E
C  sub r1, r2, r2
Renamed code:
A  add p4, p2, 100
B  breq p4, E
C  sub p5, p2, p2
[Figure labels: RAT contents p1, p2, p3, p4, p5, p4]
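The renaming example on this slide can be reproduced with a small sketch. It is illustrative only: the function name `rename`, the tuple layout of an instruction, the free list, and the initial mapping r1→p1, r2→p2, r3→p3 are assumptions chosen to match the register names on the slide, not details taken from the talk.

```python
def rename(instrs, rat, free_regs):
    """Rename instructions through a register alias table (RAT).

    instrs: list of (op, dest, srcs); dest is None for instructions
            that write no register (e.g. a branch).
    rat: dict mapping architectural register -> physical register.
    free_regs: list of free physical registers (hypothetical free list).
    """
    renamed = []
    for op, dest, srcs in instrs:
        # Sources read the current mapping; immediates and labels pass through.
        new_srcs = [rat.get(s, s) for s in srcs]
        new_dest = None
        if dest is not None:
            # The destination gets a fresh physical register and the RAT is updated.
            new_dest = free_regs.pop(0)
            rat[dest] = new_dest
        renamed.append((op, new_dest, new_srcs))
    return renamed


# Assumed initial state: r1 -> p1, r2 -> p2, r3 -> p3; p4 and p5 are free.
rat = {"r1": "p1", "r2": "p2", "r3": "p3"}
code = [
    ("add", "r1", ["r2", 100]),
    ("breq", None, ["r1", "E"]),
    ("sub", "r1", ["r2", "r2"]),
]
renamed = rename(code, rat, ["p4", "p5"])
```

After renaming, the RAT maps r1 to p5. If branch B turns out to be mispredicted, that entry has to go back to p4, which is the state-recovery problem the following slides address.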

7 ROB: Slow, Fine-Grain Recovery
Each Reorder Buffer entry contains:
1. Architectural destination register
2. Its previous RAT map
Recovery:
1. Misprediction discovered
2. Locate newest instruction
3. Undo RAT updates in reverse order
Too slow: recovery latency proportional to number of instructions to squash
[Figure labels: Reorder Buffer with branch entries (B), Program Order, RAT, INVALID]
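The ROB walk described on this slide can be sketched as follows. This is a minimal model under stated assumptions: the class and method names (`RobCore`, `dispatch`, `recover`) are hypothetical, only the RAT is modelled, and one undo step stands in for one unit of recovery latency.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    is_branch: bool
    dest: Optional[str]      # architectural destination register, if any
    prev_map: Optional[str]  # RAT mapping of dest before this instruction


class RobCore:
    def __init__(self, rat):
        self.rat = dict(rat)
        self.rob = []  # entries in program order, oldest first

    def dispatch(self, dest, new_phys, is_branch=False):
        """Rename one instruction and log the overwritten RAT mapping."""
        prev = None
        if dest is not None:
            prev = self.rat[dest]
            self.rat[dest] = new_phys
        self.rob.append(RobEntry(is_branch, dest, prev))
        return len(self.rob) - 1  # ROB index of this instruction

    def recover(self, branch_index):
        """Squash everything younger than the branch, newest first."""
        steps = 0
        while len(self.rob) - 1 > branch_index:
            entry = self.rob.pop()
            if entry.dest is not None:
                self.rat[entry.dest] = entry.prev_map
            steps += 1
        return steps


core = RobCore({"r1": "p1", "r2": "p2"})
core.dispatch("r1", "p4")
b = core.dispatch(None, None, is_branch=True)
core.dispatch("r1", "p5")
core.dispatch("r2", "p6")
core.dispatch("r1", "p7")
steps = core.recover(b)
```

In this model the number of steps equals the number of squashed instructions, which is the proportional latency the slide calls too slow.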

8 Global Checkpoints: Fast, Coarse-Grain Recovery
1. Misprediction discovered
Branch w/ GC: Recovery is “Instantaneous”
[Figure labels: Reorder Buffer with branch entries (B), Program Order, RAT, INVALID, checkpoint]
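A global checkpoint can be modelled as a full copy of the RAT taken when a branch is renamed. The sketch below is an assumption-laden illustration: `CheckpointedRat`, its method names, and the `max_checkpoints` limit are hypothetical, and copying a Python dict only stands in for the hardware snapshot.

```python
class CheckpointedRat:
    def __init__(self, rat, max_checkpoints):
        self.rat = dict(rat)
        self.max_checkpoints = max_checkpoints
        self.checkpoints = {}  # branch id -> RAT snapshot, oldest first

    def take_checkpoint(self, branch_id):
        """Snapshot the whole RAT at a branch if a checkpoint slot is free."""
        if len(self.checkpoints) >= self.max_checkpoints:
            return False  # this branch is not covered by a checkpoint
        self.checkpoints[branch_id] = dict(self.rat)
        return True

    def rename(self, dest, new_phys):
        self.rat[dest] = new_phys

    def recover(self, branch_id):
        """Restore the RAT in one step if the branch has a checkpoint."""
        if branch_id not in self.checkpoints:
            return False  # would need a slower mechanism instead
        self.rat = dict(self.checkpoints[branch_id])
        # Release this checkpoint and those of younger branches.
        ids = list(self.checkpoints)
        for b in ids[ids.index(branch_id):]:
            del self.checkpoints[b]
        return True


gc = CheckpointedRat({"r1": "p1", "r2": "p2"}, max_checkpoints=1)
covered_b1 = gc.take_checkpoint("B1")
gc.rename("r1", "p4")
covered_b2 = gc.take_checkpoint("B2")  # no free slot left
gc.rename("r2", "p5")
recovered_b2 = gc.recover("B2")
recovered_b1 = gc.recover("B1")
```

Recovery cost here does not depend on how many instructions followed the branch, but a branch without a checkpoint cannot use this path at all.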

9 Impact of More Checkpoints
More checkpoints?
–Power hungry structure
–Increased delay
Only a few checkpoints can practically be implemented
–Cannot always cover all branches
[Figure: RAT concept (architectural register to physical register) vs. actual implementation (working copy plus checkpoints)]

10 Intelligent Checkpointing & BranchTap
Intelligent checkpointing:
–Use few checkpoints effectively
BranchTap:
–Throttle speculation
[Figure labels: branches (B), checkpoint]

11 Conventional Mechanisms: Recovery Scenarios
[Figure labels: branches (B), checkpoint, Re-Execution]

12 Outline
–Background
–Turbo-ROB
–Methodology and Results
–Summary

13 Turbo-ROB
We only need to reverse the first subsequent change for every RAT entry
[Figure: ROB recovery from branch B. The first change to R1 and to R2 after the branch is useful; the later change to R1 is redundant. Recovery cost ~ number of useful entries.]
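The observation on this slide suggests a partial log: after each branch, record only the first overwrite of every RAT entry. The sketch below illustrates that idea under assumptions; the names (`TurboRob`, `branch`, `rename`, `recover`) and the use of a per-branch set to filter redundant updates are hypothetical and are not presented as the authors' hardware design.

```python
class TurboRob:
    def __init__(self, rat):
        self.rat = dict(rat)
        self.log = []          # partial log of (dest, previous mapping)
        self.branch_pos = {}   # branch id -> log length when it was renamed
        self.logged = set()    # registers already logged since the latest branch

    def branch(self, branch_id):
        """Mark a recovery point and restart first-change tracking."""
        self.branch_pos[branch_id] = len(self.log)
        self.logged.clear()

    def rename(self, dest, new_phys):
        """Update the RAT, logging only the first change per register."""
        if dest not in self.logged:
            self.log.append((dest, self.rat[dest]))
            self.logged.add(dest)
        self.rat[dest] = new_phys

    def recover(self, branch_id):
        """Undo logged changes back to the branch, newest first."""
        pos = self.branch_pos[branch_id]
        steps = 0
        while len(self.log) > pos:
            dest, prev = self.log.pop()
            self.rat[dest] = prev
            steps += 1
        # Forget this branch and any younger ones; restart tracking.
        ids = list(self.branch_pos)
        for b in ids[ids.index(branch_id):]:
            del self.branch_pos[b]
        self.logged.clear()
        return steps


trob = TurboRob({"r1": "p1", "r2": "p2"})
trob.branch("B")
trob.rename("r1", "p4")  # first change to r1 after B: logged
trob.rename("r2", "p5")  # first change to r2 after B: logged
trob.rename("r1", "p6")  # redundant for recovery at B: not logged
trob.rename("r1", "p7")  # redundant for recovery at B: not logged
log_entries = len(trob.log)
steps = trob.recover("B")
```

In this run four RAT updates follow the branch, but only two log entries are needed to restore the mapping. The trade-off is that recovery is possible only at the marked branches.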

14 Turbo-ROB Replacing the ROB
[Figure labels: branches (B), TROB, Re-Execution]

15 Selective Turbo-ROB w/ ROB
[Figure labels: branches (B), TROB]
Selective Turbo-ROB w/ GCs
[Figure labels: branches (B), TROB, checkpoint]

16 Outline
–Background
–TurboROB
–Methodology and Results
–Summary

17 Results Overview
TROB as an ROB replacement
–BranchTap offers better performance than ROB
–Fewer resources
–Even for smaller windows
Selective TROB as a GC reduction mechanism
–TROB reduces pressure for GCs
–Offload a critical structure: RAT
In the paper:
–Selective TROB as an ROB accelerator
–Even the smallest TROB accelerates recovery

18 Methodology
Simulator based on SimpleScalar
–Alpha/OSF
24 SPEC CPU 2000 benchmarks
–Reference inputs
Processor configurations
–4-way OoO core
–128/256/512 in-flight instructions
–1K-entry confidence table for low confidence branch identification / similar results with Anyweak
1B committed instructions after skipping 2B

19 “Perfect Checkpointing” Configuration
A checkpoint is auto-magically taken at all mispredicted branches
–All recoveries are fast
We report the “deterioration relative to perfect checkpointing”

20 TROB Replacing the ROB/512-Entry Window
–64-entry TROB == ROB on the average
–Pathological cases exist → 256-entry needed
–512-entry TROB better than ROB
[Figure: per-benchmark results chart, with an arrow marking the "better" direction]

21 TROB Replacing the ROB/128-Entry Window
–64-entry TROB 50% better than ROB
–Fewer pathological cases
–128-entry TROB better than ROB
[Figure: per-benchmark results chart, with an arrow marking the "better" direction]

22 sTROB and Global Checkpoints/128-Entry Window
–TROB + 1 GC better than 4 GCs
[Figure: per-benchmark results chart, with an arrow marking the "better" direction]

23 Summary
TROB vs. ROB
–Replacement
  Same resources → better performance
  Fewer resources → often better performance
  –Except when accuracy is high
–Acceleration: ¼ resources → 35% improvement
TROB vs. GCs
–Reduce pressure from the critical path
–With just 1 GC match the performance of four GCs
One more alternative for designers
–Allows different area/performance/power tradeoffs
