Reducing the Scheduling Critical Cycle using Wakeup Prediction (HPCA-10). Todd Ehrhart and Sanjay Patel, Center for Reliable and High-Performance Computing.

Slide 1: Reducing the Scheduling Critical Cycle using Wakeup Prediction
HPCA-10, Feb. 18, 2004
Todd Ehrhart and Sanjay Patel
Center for Reliable and High-Performance Computing, University of Illinois at Urbana-Champaign

Slide 2: Outline
- Overview
- Analysis
- Architecture
- Experiments
- Conclusions

Slide 3: Intuition
- Loops and other execution patterns may cause a steady state in machine delays (e.g., a repeating instruction sequence ABCD).
- After a few iterations, the machine may reach a steady state with (near-)constant delays.

Slide 4: Basic Observation
- Wakeup delay is highly invariant.
- Deviations are biased toward positive values.

Slide 5: So...
- Wakeup times can be estimated from the static IP alone.
- Idea: ignore dependencies, estimate wakeup times, and wake an instruction when its timer expires.
- This breaks the scheduling critical cycle, so cycle time can be reduced.
- But there are problems.
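The idea on this slide can be sketched as a timer-driven scheduler that issues instructions when their predicted wakeup time expires, never consulting the dependency graph. This is a minimal illustration; the class and table layout are assumptions, not the paper's design.

```python
import heapq

class SelfScheduler:
    """Timer-based scheduler: wakeup is predicted from the static IP,
    and instructions issue when the timer expires (no dependency check)."""

    def __init__(self):
        self.table = {}   # static IP -> predicted wakeup delay in cycles (assumed layout)
        self.queue = []   # min-heap of (wakeup_cycle, instruction IP)

    def dispatch(self, ip, now):
        delay = self.table.get(ip, 1)          # default estimate on predictor miss
        heapq.heappush(self.queue, (now + delay, ip))

    def issue_ready(self, now):
        # Issue every instruction whose predicted wakeup time has expired.
        ready = []
        while self.queue and self.queue[0][0] <= now:
            ready.append(heapq.heappop(self.queue)[1])
        return ready

s = SelfScheduler()
s.table[0x40] = 3          # learned: this static IP wakes up after 3 cycles
s.dispatch(0x40, now=0)
assert s.issue_ready(2) == []       # timer not yet expired
assert s.issue_ready(3) == [0x40]   # wakes up without checking dependencies
```

If the estimate is too small the instruction issues before its operands are ready, which is exactly the replay problem the following slides address.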

Slide 6: Outline
- Overview
- Analysis
- Architecture
- Experiments
- Conclusions

Slides 7-8: Architectural Flow
- Predict Wakeup Time → Wait for Wakeup Timeout → Execute Instructions → Determine Actual Wakeup Time → Check → Feedback.
- If the wakeup time was wrong, the instruction must replay: mis-speculated instructions are re-predicted and re-enter the flow.

Slides 9-11: Fixing the Problems
- Replays: cost-adjust the wakeup estimate using the probability of a replay and the cost of a replay.
- The replay cost is unknown and unmeasurable, and depends on machine load, so make it an adjustable parameter: use load as the feedback value, with the goal of maximizing retire bandwidth.
- Re-prediction: exponential backoff.

Slide 12: Cost-adjusted Wakeup Estimate
- Being close counts.

Slide 13: Cost-adjusted Wakeup Estimate, II
- After some assumptions and math, the minimum cost occurs where F(d) = R f(d) and f(d) > R f'(d), where R is the replay cost estimate.
- f(d) is unknown, so use a gradient-descent technique; the resulting update looks like a running average.
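The "looks like a running average" update can be illustrated with a stochastic-gradient step on the allowance d: each observed delay nudges d up (weighted by the replay cost) when it would have caused a replay, and slowly down otherwise. The exact step rule and step size here are assumptions, not the paper's formula.

```python
def update_allowance(d, actual_delay, r_cost, step=0.125):
    """One stochastic-gradient step on the wakeup allowance d.
    Minimizes (approximately) waiting cost + r_cost * P(replay);
    the fixed point balances the two, as in F(d) = R f(d)."""
    if actual_delay > d:
        # Prediction was too small: a replay occurred.
        # Push the estimate up, weighted by the replay cost.
        return d + step * r_cost
    # Prediction was safe: slowly relax the estimate downward.
    return d - step

d = 5.0
d = update_allowance(d, actual_delay=8, r_cost=4.0)   # replay: d rises to 5.5
d = update_allowance(d, actual_delay=3, r_cost=4.0)   # safe: d relaxes to 5.375
assert d == 5.375
```

Because each observation moves d by a small fixed fraction, the estimate behaves like a running average of recent delays, skewed upward in proportion to the replay cost R.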

Slide 14: Feedback-adjusted Replay Cost
- The cost of a replay changes during execution (program phases, etc.).
- Add a second feedback layer: observe the load on each class of functional unit and adjust the replay cost accordingly. To prevent wild oscillations, adjust only once every 1000 cycles.
- Cheap: needs a few accumulators, and is off the critical path.
- r is the estimated cost of a single replay; R = r * count.
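The second feedback layer can be sketched as an accumulator that samples functional-unit occupancy and revises r once per 1000-cycle interval. The 1000-cycle interval is from the slide; the specific adjustment rule (multiplicative step around a load threshold) is an assumption for illustration.

```python
class ReplayCostFeedback:
    """Adjusts the estimated per-replay cost r from observed load,
    once per interval, to avoid wild oscillations."""

    def __init__(self, r=1.0, interval=1000):
        self.r = r
        self.interval = interval
        self.busy_sum = 0      # accumulated busy functional units
        self.cycles = 0

    def observe(self, units_busy, units_total):
        self.busy_sum += units_busy
        self.cycles += 1
        if self.cycles == self.interval:
            load = self.busy_sum / (units_total * self.interval)
            # Heavier load means replays waste scarcer issue slots,
            # so raise r; light load means replays are cheaper.
            self.r *= 1.1 if load > 0.5 else 0.9
            self.busy_sum = 0
            self.cycles = 0

fb = ReplayCostFeedback(r=1.0, interval=2)   # tiny interval just for the demo
fb.observe(units_busy=4, units_total=4)
fb.observe(units_busy=4, units_total=4)      # fully loaded: r rises to 1.1
assert abs(fb.r - 1.1) < 1e-9
```

In hardware this needs only a few accumulators and runs off the critical path, matching the slide's cost claim.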

Slide 15: Re-prediction
- An observation (covers 99% of instances): see figure (slope = 2).
- Return the instruction to the Self-Schedule Array, but with twice its previous wakeup-time estimate.
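The backoff rule itself is a one-liner: on a mis-speculation, the instruction re-enters the self-schedule array with double its previous estimate, so repeated misses grow the estimate exponentially.

```python
def repredict(prev_estimate):
    """Exponential backoff: each re-prediction doubles the previous
    wakeup-time estimate."""
    return 2 * prev_estimate

est = 3
est = repredict(est)   # first replay: estimate becomes 6
est = repredict(est)   # second replay: estimate becomes 12
assert est == 12
```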

Slide 16: Outline
- Overview
- Analysis
- Architecture
- Experiments
- Conclusions

Slide 17: High-Level Architecture (figure)

Slide 18: Scheduler Architecture (figure)

Slide 19: Predictor Architectures
- Local allowance (figure).

Slide 20: Predictor Architectures
- Global allowance (figure).

Slide 21: Predictor Architectures
- Problem: on a predictor miss, we cannot fall back on dependency-based wakeup, because of cycle-time constraints.
- Default predictor: used on a miss in the main predictor; updated the same way as the global-allowance predictor.
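The two-level lookup can be sketched as a main IP-indexed table backed by a single shared default estimate that stands in on a miss. Table organization and the default's running-average update are assumptions here; the slide says only that the default is updated like the global-allowance predictor.

```python
class WakeupPredictor:
    """Main predictor indexed by static IP; a default predictor
    supplies the estimate on a miss (never dependency-based wakeup)."""

    def __init__(self):
        self.main = {}           # static IP -> predicted delay (assumed layout)
        self.default_delay = 1.0 # shared fallback estimate

    def predict(self, ip):
        # On a miss in the main predictor, use the default predictor.
        return self.main.get(ip, self.default_delay)

    def update(self, ip, actual_delay):
        self.main[ip] = actual_delay
        # Default updated like a global-allowance predictor: a running
        # average over all instructions (assumed update rule).
        self.default_delay += (actual_delay - self.default_delay) * 0.25

p = WakeupPredictor()
p.update(0x40, 5)
assert p.predict(0x40) == 5     # hit in the main predictor
assert p.predict(0x99) == 2.0   # miss: default has moved from 1.0 toward 5
```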

Slide 22: Finding the Actual Wakeup Time
- Done in the register file. Conceptually, each register entry carries a valid bit, register info, and a cycle count that is set to zero when the register is written and counts up each cycle.
- The instruction's source register numbers index the register file; the valid bits are ANDed to answer "ready to execute?", and MIN(source cycle counts) minus the wait time gives the actual wakeup time.
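A behavioral model of this measurement: per-register counters cleared on write and incremented each cycle, with the youngest source operand (minimum counter) determining when the instruction truly became ready. The `MIN - wait time` combination follows the slide's diagram; how the result is normalized is an assumption.

```python
class RegFileTimer:
    """Register file with a per-register cycle counter, cleared on
    write, used to recover the actual wakeup time of an instruction."""

    def __init__(self, nregs):
        self.age = [0] * nregs   # cycles since each register was last written

    def tick(self):
        self.age = [a + 1 for a in self.age]   # counters count up each cycle

    def write(self, reg):
        self.age[reg] = 0                      # set to zero when written

    def actual_wakeup(self, srcs, wait_time):
        # Per the slide's diagram: MIN(source counters) - wait time.
        # The youngest source (smallest counter) was written last.
        return min(self.age[r] for r in srcs) - wait_time

rf = RegFileTimer(4)
rf.write(1); rf.tick(); rf.tick()   # reg 1 written 2 cycles ago
rf.write(2); rf.tick()              # reg 2 written 1 cycle ago
assert rf.actual_wakeup(srcs=[1, 2], wait_time=1) == 0
```

The measured value feeds the predictor-update path shown in the architectural flow, closing the feedback loop.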

Slide 23: Outline
- Overview
- Analysis
- Architecture
- Experiments
- Conclusions

Slide 24: Setup
- x86 trace-driven simulator; fetch timing effects simulated.
- 7 traces from SPECint, 26M to 100M consecutive instructions each.
- 8-wide, 18-deep pipeline.
- Configurations:
  - Baseline: 1-cycle wakeup/select.
  - BasePipeSched: 2-cycle wakeup/select.
  - WPLocal: local wakeup prediction (128x4 predictor), r = 1.
  - WPLocalAdj: WPLocal + feedback-adjusted r.
  - WPGlobal: global wakeup prediction (128x4 predictor), r = 1.
  - WPGlobalAdj: WPGlobal + feedback-adjusted r.

Slide 25: Results
- 7% IPC drop (figure).

Slide 26: Ideal Fetch
- Approximates high-bandwidth fetch (trace cache, etc.).
- Otherwise, same as before.

Slide 27: Results: Ideal Fetch
- 7% IPC drop (figure).

Slide 28: Resource-constrained
- Half the number of functional units in each class.
- Uses i-cache fetch (like the first experiment).
- Otherwise, same as the others.

Slide 29: Results: Resource-constrained
- 9% IPC drop (figure).

Slide 30: Other Results
- Some leeway in prediction accuracy: doubling all predictions results in a 27% IPC drop.
- Works consistently in deep pipelines, both without and with pipelined wakeup (figures).

Slide 31: Outline
- Overview
- Analysis
- Architecture
- Experiments
- Conclusions

Slide 32: Conclusions
- Likely to increase performance: the IPC drop is about 7%, and the performance gain from the reduced cycle time could exceed the loss from the reduced IPC.
- Feedback paths are not timing-critical, which simplifies the design process.

