NC STATE UNIVERSITY Transparent Control Independence (TCI) Ahmed S. Al-Zawawi Vimal K. Reddy Eric Rotenberg Haitham H. Akkary* *Dept. of Electrical & Computer.

Presentation transcript:

1 NC STATE UNIVERSITY Transparent Control Independence (TCI). Ahmed S. Al-Zawawi, Vimal K. Reddy, Eric Rotenberg, Haitham H. Akkary*. Dept. of Electrical & Computer Engineering, North Carolina State University, Raleigh, NC. *Digital Enterprise Group, Intel Corporation, Hillsboro, OR

2 Effect of branch mispredictions. A branch misprediction rate of 5%-10% is still a problem. Each misprediction squashes 100s of inst. Reduces performance: limits window size. Increases power: useless speculative work. © 2007 Ahmed S. Al-Zawawi, ISCA 34

3 Control independence basics

4 Control independence basics

5 Control independence basics

6 Control independence basics

7 Four steps for exploiting CI

8 Four steps for exploiting CI: 1. Identify reconv. point

9 Four steps for exploiting CI: 1. Identify reconv. point 2. Remove/insert CD inst.

10 Four steps for exploiting CI: 1. Identify reconv. point 2. Remove/insert CD inst. 3. Identify CIDD inst.

11 Four steps for exploiting CI: 1. Identify reconv. point 2. Remove/insert CD inst. 3. Identify CIDD inst. 4. Repair CIDD inst.: a) fix data dependencies, b) re-execute CIDD inst.
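Step 3 above can be illustrated with a small software sketch. It assumes the usual expansion of the abbreviations (CD = control-dependent, CI = control-independent, CIDI = control-independent data-independent, CIDD = control-independent data-dependent) and a toy instruction window in which the reconvergence point is already known. The `Inst` record, the `classify` function, and the register names are hypothetical and only model the idea; they are not the hardware mechanism from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inst:
    name: str
    dst: str      # destination register
    srcs: tuple   # source registers
    region: str   # "CD" = between branch and reconv. point, "CI" = after it


def classify(window, correct_cd_dsts=()):
    """Label every instruction in program order as CD, CIDI or CIDD.

    A register is treated as 'influenced' by the branch if a CD instruction
    (on either path) or a CIDD instruction was its most recent writer.
    """
    influenced = set(correct_cd_dsts)
    for inst in window:
        if inst.region == "CD":
            influenced.add(inst.dst)

    labels = {}
    for inst in window:
        if inst.region == "CD":
            labels[inst.name] = "CD"
        elif any(src in influenced for src in inst.srcs):
            labels[inst.name] = "CIDD"
            influenced.add(inst.dst)       # the dependence propagates
        else:
            labels[inst.name] = "CIDI"
            influenced.discard(inst.dst)   # a clean overwrite ends the influence
    return labels


window = [
    Inst("i1", "r1", ("r8",), "CD"),        # wrong-path write to r1
    Inst("i2", "r2", ("r3", "r4"), "CI"),
    Inst("i3", "r5", ("r1", "r2"), "CI"),   # reads r1
    Inst("i4", "r6", ("r5",), "CI"),        # reads r5, written by i3
    Inst("i5", "r1", ("r2",), "CI"),        # overwrites r1 with a clean value
    Inst("i6", "r7", ("r1",), "CI"),
]
labels = classify(window)
print(labels)
```

In this toy window only `i3` and `i4` would need repair after a misprediction; `i6` reads `r1` after the clean overwrite by `i5` and is therefore data-independent of the branch.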

12 Conventional CI misprediction recovery. Identify wrong CD inst. and CIDD inst. Squash wrong CD instructions. Insert correct CD instructions in middle of the window: repair program order. Re-execute CIDD instructions: re-reference values from CIDI instructions. (Figure labels: R, CI inst., CD inst., wrong CD instructions, CIDD instructions, CIDI-supplied source value.)

13 Conventional CI limitations. 1. Program order between CD & CI inst.: fine-grain retirement using ROB requires reordering the correct CD inst. with the CI inst. 2. Dependence order between CIDD & CIDI inst.: re-executing CIDD instructions requires preserving referenced CIDI instructions. Goal of selective misprediction recovery: fully decouple CIDI instructions from CD & CIDD instructions.

14 TCI misprediction recovery. No need to identify wrong CD and CIDD instructions. Insert correct CD instructions like any new instructions. Insert duplicate CIDD instructions like any new instructions. Repair program state using self-sufficient recovery program while relaxing program order. (Figure labels: R, CI inst., CD inst., correct CD inst., duplicate CIDD inst., recovery program.)

15 TCI misprediction recovery. In-order retirement is not possible when instructions are out of program order. Exploit coarse-grain checkpoint-based retirement to relax ordering constraints. Checkpoint-based retirement enables aggressive register reclamation (e.g., CPR): completed instructions free their resources. Leverage checkpointed source values to mimic the effect of program order. Leverage branch checkpoint for correct CD instructions. (Figure labels: R, checkpoint 1, checkpoint 2, branch checkpoint, recovery program, correct CD inst., duplicate CIDD inst., CIDD instructions, CIDI-supplied source values.)

16 Transparent Control Independence. TCI repairs program state, not program order. TCI pipeline is recovery-free: transparent recovery by fetching additional instructions with checkpointed source values. TCI pipeline is free-flowing: leverage conventional speculation to execute correct and incorrect instructions quickly and efficiently; completed instructions free their resources.

17 TCI microarchitecture. Add repair rename map. Add selective re-execution buffer (RXB).

18 Predict the branch. Instructions execute and leave the pipeline when done.

19 Construct recovery program. Copy duplicates of CIDD inst. with their source values into RXB.

20 Insert correct CD instructions. Load branch checkpoint into repair rename map, then fetch correct CD inst.

21 Repair & re-execute CIDD instructions. Inject duplicate CIDD inst. with their checkpointed source values.

22 Merge repair & spec. rename maps. Copy corrected register mappings from repair map to spec. map.
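The walkthrough on slides 18 to 22 can be mimicked at the value level with a few lines of Python. This is only a sketch under strong assumptions: registers are plain dictionary entries rather than renamed physical registers, instructions are hypothetical `(dst, op, src1, src2)` tuples, and the set of registers written on the correct CD path (`correct_cd_dsts`) is assumed to be known when the branch is predicted. It shows why checkpointing CIDI-supplied source values lets the recovery program run on its own, without the CIDI instructions that originally produced those values.

```python
from operator import add, mul


def run(insts, regs):
    """Reference model: execute (dst, op, src1, src2) tuples in program order."""
    for dst, op, a, b in insts:
        regs[dst] = op(regs[a], regs[b])
    return regs


def speculate(cd_wrong, ci, regs, correct_cd_dsts):
    """Execute the predicted path and build the RXB (slides 18 and 19)."""
    influenced = set(correct_cd_dsts)
    for dst, op, a, b in cd_wrong:
        regs[dst] = op(regs[a], regs[b])
        influenced.add(dst)
    rxb = []
    for dst, op, a, b in ci:
        if a in influenced or b in influenced:
            # CIDD: keep the name of an influenced source, but checkpoint the
            # value of a CIDI-supplied source (it may be overwritten later).
            srcs = tuple(s if s in influenced else regs[s] for s in (a, b))
            rxb.append((dst, op, srcs))
            influenced.add(dst)
        else:
            influenced.discard(dst)
        regs[dst] = op(regs[a], regs[b])
    return rxb, influenced


def recover(checkpoint, cd_correct, rxb, spec, influenced):
    repair = dict(checkpoint)                  # slide 20: load branch checkpoint
    run(cd_correct, repair)                    # slide 20: fetch correct CD inst.
    for dst, op, srcs in rxb:                  # slide 21: inject duplicate CIDD inst.
        vals = [repair[s] if isinstance(s, str) else s for s in srcs]
        repair[dst] = op(*vals)
    for reg in influenced:                     # slide 22: merge corrected mappings
        spec[reg] = repair[reg]
    return spec


checkpoint = {"r1": 1, "r2": 2, "r3": 3, "r4": 0, "r5": 0, "r6": 0}
cd_wrong = [("r4", add, "r1", "r1")]
cd_correct = [("r4", mul, "r3", "r3")]
ci = [
    ("r5", add, "r1", "r2"),   # CIDI
    ("r6", add, "r4", "r5"),   # CIDD: r5 is a CIDI-supplied source
    ("r5", mul, "r2", "r2"),   # CIDI overwrites r5
    ("r1", add, "r6", "r5"),   # CIDD
]

spec = dict(checkpoint)
rxb, influenced = speculate(cd_wrong, ci, spec, correct_cd_dsts={"r4"})
recover(checkpoint, cd_correct, rxb, spec, influenced)
reference = run(cd_correct + ci, dict(checkpoint))
print(spec == reference)
```

The final comparison checks the repaired state against a plain in-order execution of the correct path. Only the two CIDD duplicates and the correct CD instruction are executed during recovery; the CIDI instructions are never revisited.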

23 TCI implementation details. 1. Identifying CIDD instructions: control-flow stack (CFS) detects nested reconv. points; influenced register set (IRS) and branch-sets. 2. RXB reconstruction: CIDD inst. of multiple branches are co-mingled; a misprediction may require repairing the RXB. 3. Renaming partial programs: re-rename recovery program despite its CIDI gaps. 4. Merging repair/speculative rename maps.

24 Example: construct the RXB. B1 & B2 are branches. R1 & R2 are reconvergent points. Rectangular inst. are CIDD on B1. Oval inst. are CIDD on B2.

25 Example: reconstructing the RXB. Objective of this example: inject recovery program for B2; reconstruct RXB for B1. Rollback RXB tail, like complete squash. Initiate RXB pre-read pointer. Start fetching correct CD. Fetch correct CD: 11 and 12; meanwhile pre-read 16 to Temp Buffer. Dispatch 11; don't insert 11 into the RXB: CIDI w.r.t. B1 & B2. Dispatch 12; insert 12 into the RXB: CIDD w.r.t. B1.

26 Example: reconstructing the RXB. Fetch correct CD: 13 and 14; meanwhile pre-read 18 to Temp Buffer. Dispatch 13; don't insert 13 into the RXB: CIDI w.r.t. B1 & B2. Dispatch 14; insert 14 into the RXB: CIDD w.r.t. B1. Reconvergence point detected: correct CD complete.

27 Example: reconstructing the RXB. Begin renaming CIDD instructions from Temp Buffer; meanwhile pre-read 20 into Temp Buffer. Don't dispatch 16: not CIDD w.r.t. B2; insert 16 into the RXB: CIDD w.r.t. B1. Dispatch 18: CIDD w.r.t. B2; don't insert 18 into the RXB: not CIDD w.r.t. B1. Dispatch 20: CIDD w.r.t. B2; insert 20 into the RXB: CIDD w.r.t. B1. B2 recovery program injection complete. B1 recovery program is maintained and compressed.
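The per-instruction decisions in the reconstruction example reduce to two independent filters, which the following sketch replays. The instruction numbers and their CIDD membership are taken from the slides; the function `reconstruct_rxb` and its argument names are hypothetical, and the Temp Buffer, the pre-read pointer, and renaming are not modeled.

```python
def reconstruct_rxb(correct_cd, pre_read, cidd_b1, cidd_b2):
    """Replay B2's recovery while rebuilding the RXB kept for the older branch B1.

    correct_cd: newly fetched correct CD inst. of B2 (always dispatched)
    pre_read:   inst. pre-read from the old RXB, in program order
    cidd_b1:    inst. that are CIDD w.r.t. B1 (must stay in the RXB)
    cidd_b2:    inst. that are CIDD w.r.t. B2 (must be re-dispatched now)
    """
    dispatched, new_rxb = [], []
    for inst in correct_cd:
        dispatched.append(inst)
        if inst in cidd_b1:
            new_rxb.append(inst)
    for inst in pre_read:
        if inst in cidd_b2:        # part of B2's recovery program
            dispatched.append(inst)
        if inst in cidd_b1:        # still needed if B1 mispredicts later
            new_rxb.append(inst)
    return dispatched, new_rxb


dispatched, new_rxb = reconstruct_rxb(
    correct_cd=[11, 12, 13, 14],
    pre_read=[16, 18, 20],
    cidd_b1={12, 14, 16, 20},
    cidd_b2={18, 20},
)
print("dispatched:", dispatched)
print("RXB for B1:", new_rxb)
```

Instruction 18 leaves the RXB because it does not depend on B1, which is one way to read the slide's remark that the B1 recovery program is "maintained and compressed".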

28 Simulation methodology. Baseline: checkpoint-based superscalar processor. Issue width: 4. Perceptron branch predictor. Register file: 256 registers. Branch checkpoints: 16. Load/store queue: 512 entries. L1 I & L1 D: 64KB, 4-way (hit: 1 cycle). L2: 2MB, 8-way (hit: 10 cycles, miss: 200 cycles). Benchmarks: 11 SPEC2000 INT + 4 SPEC95 INT. SimPoint: 10M inst. warm-up + 100M inst. simulated.

29 CIDD inst. re-renaming models. Seq CIDD (TCI): only CIDD inst. are re-renamed and re-executed. Seq CI [Akkary et al.] [Chou et al.] [Rotenberg et al.]: all CI inst. are re-renamed, but only CIDD inst. re-execute. Proxy [Cher et al.] [Gandhi et al.]: uses proxy move instructions to insulate CIDD inst. from source name changes; only proxies are re-renamed; both proxies and CIDD inst. re-execute by holding issue queue entries. All models have relaxed order through checkpoint-based substrate.
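The three models differ only in which instructions are re-renamed, which re-execute, and which keep issue queue entries while waiting. A rough tally, using made-up instruction counts, makes the trade-off concrete. The function name `recovery_work`, the dictionary keys, and the assumption that the two sequencing models hold no issue queue entries for recovery are illustrative choices, not figures from the talk.

```python
def recovery_work(model, n_ci, n_cidd, n_proxy):
    """Count per-misprediction work for one re-renaming model.

    n_ci:    CI instructions in the window (CIDI + CIDD)
    n_cidd:  CIDD instructions among them
    n_proxy: proxy move instructions (only used by the Proxy model)
    """
    if model == "seq_cidd":    # TCI
        return {"re_renamed": n_cidd, "re_executed": n_cidd, "iq_held": 0}
    if model == "seq_ci":
        return {"re_renamed": n_ci, "re_executed": n_cidd, "iq_held": 0}
    if model == "proxy":
        held = n_proxy + n_cidd
        return {"re_renamed": n_proxy, "re_executed": held, "iq_held": held}
    raise ValueError(f"unknown model: {model}")


# Hypothetical window: 100 CI instructions, 20 of them CIDD, 10 proxies.
for model in ("seq_cidd", "seq_ci", "proxy"):
    print(model, recovery_work(model, n_ci=100, n_cidd=20, n_proxy=10))
```

With these hypothetical counts, Seq CI spends rename bandwidth on all 100 CI instructions, Proxy renames little but ties up 30 issue queue entries, and Seq CIDD does neither.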

30 Results for 32- & 64-entry issue queues. TCI average %IPC improvement is 16% (16%). TCI maximum %IPC improvement is 61% (64%). Proxy average %IPC improvement is 6% (11%). Proxy can degrade performance. Seq CI can degrade performance.

31 Varying the issue queue size. TCI is both bandwidth and resource efficient. Seq CI is bandwidth inefficient, but resource efficient. Proxy is bandwidth efficient, but resource inefficient.

32 Varying the RXB size. In Seq CI, the RXB limits the window size. TCI overcomes this problem by buffering only CIDD inst.

33 Conclusion. Recover program state, not program order. Transparent branch misprediction recovery using fully decoupled recovery program. Resource efficient: all instructions execute, drain, and free resources quickly based on conventional speculation. Bandwidth efficient: TCI only re-sequences CIDD instructions.

34 Questions

