Complexity-Effective Issue Queue Design Under Load-Hit Speculation
Tali Moreshet and R. Iris Bahar
Brown University, Division of Engineering
Brown University, WCED 2002

Motivation
  - Pipelines are getting deeper
    - Higher clock frequencies
    - Increased architectural complexity
  - Speculatively issued instructions are particularly sensitive to pipeline depth
    - Branch prediction
    - Load hit prediction
Pipeline
[Figure: pipeline block diagram. Stages: Fetch, Decode, Issue, Execute. Blocks: Instruction Cache, Register Rename Unit, Issue Queue, Register File, Functional Units, Data Cache. Annotations: forwarding, Load Resolution Loop.]
Load Hit Prediction
  - Issue instructions dependent on a load as soon as possible
  - Assume the load hits in DL1
  - BUT: load hit status is known only after dependent instructions may already have issued
Example
[Figure: pipeline timing diagram, shown in three animation steps. Rows: LOAD, MULT, SUB, ADD. Horizontal axis: Cycle. Each row marks the instruction's Issue and Exec cycles; a bracket marks the speculative window.]
What Happens On a Load Miss?
  - Re-issue instructions in the speculative window after a load miss
  - Keep post-issue instructions in the issue queue long enough to ensure re-issuing will not be necessary
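The speculative window referred to above can be sketched as follows. This is only an illustration: the fixed `resolve_latency`, the function name, and the issue cycles in the usage example are all hypothetical and are not taken from the slides.

```python
def speculative_window(issue_cycle, load, resolve_latency):
    """Return the instructions issued while `load` is still unresolved.

    issue_cycle: dict mapping instruction name -> cycle in which it issued.
    resolve_latency: assumed number of cycles between the load's issue and
    the cycle in which its hit/miss status becomes known (hypothetical).
    An instruction is treated as speculative if it issued in cycles
    load_issue + 1 .. load_issue + resolve_latency, inclusive.
    """
    start = issue_cycle[load]
    end = start + resolve_latency
    in_window = [name for name, cycle in issue_cycle.items() if start < cycle <= end]
    # Report the window in issue order.
    return sorted(in_window, key=lambda name: issue_cycle[name])


# Hypothetical issue cycles for a LOAD followed by three instructions.
cycles = {"LOAD": 1, "MULT": 2, "SUB": 3, "ADD": 5}
window = speculative_window(cycles, "LOAD", resolve_latency=3)
```

With these made-up numbers the window covers cycles 2 through 4, so `window` holds MULT and SUB, while ADD issues after the load has resolved. A deeper pipeline corresponds to a larger `resolve_latency` and therefore a larger window.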
Complexity-Effective Load Hit Speculation
As pipeline depth increases:
  - Retain performance benefit
  - Consider complexity of re-issue and prediction policies
  - Consider impact on issue queue design
Re-Issue Policies
4 different load hit speculation policies:
  1) No load hit speculation
  2) Perfect load hit speculation
  3) Replay only instructions dependent on load that missed
  4) Replay all instructions in speculative window
Load hit/miss predictor to limit re-issuing
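The difference between policies 3 and 4 can be sketched as a choice of which instructions in the speculative window get replayed. This is a minimal sketch, not the authors' implementation: the policy names, the function names, and the dependence graph in the usage example are hypothetical.

```python
def dependents_of(load, producers):
    """Return every instruction that depends, directly or transitively, on `load`.

    producers: dict mapping instruction name -> set of instructions whose
    results it reads (a hypothetical dependence graph).
    """
    tainted = {load}
    changed = True
    while changed:
        changed = False
        for instr, sources in producers.items():
            if instr not in tainted and sources & tainted:
                tainted.add(instr)
                changed = True
    tainted.discard(load)
    return tainted


def instructions_to_replay(policy, window, load, producers):
    """Pick the instructions to re-issue after `load` misses."""
    if policy == "dependent_only":      # policy 3 on the slide
        dependents = dependents_of(load, producers)
        return [instr for instr in window if instr in dependents]
    if policy == "all_in_window":       # policy 4 on the slide
        return list(window)
    raise ValueError(f"unknown policy: {policy}")


# Hypothetical dependences: MULT reads the load, SUB reads MULT, ADD is independent.
producers = {"MULT": {"LOAD"}, "SUB": {"MULT"}, "ADD": set()}
window = ["MULT", "SUB", "ADD"]
selective = instructions_to_replay("dependent_only", window, "LOAD", producers)
squash_all = instructions_to_replay("all_in_window", window, "LOAD", producers)
```

Here `selective` contains MULT and SUB, while `squash_all` also replays the independent ADD. Policy 4 avoids tracking dependence chains at the cost of replaying work that was actually correct.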
Performance Impact
Impact on Issue Queue Occupancy
As pipeline depth increases:
  - Issue queue gets cluttered with post-issue instructions (average 55%)
    - Limits the available ILP
    - Inefficient use of complexity in instruction bid/grant arbitration logic
The Bid / Grant Loop
[Figure: bid/grant loop between the Issue Queue and the Prioritize & Select logic. Labels: M entries, N-wide, req, grant, "Bid for issue slot", "Broadcast grant..."]
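One way to picture the loop is as a select step that grants at most N of the bidding entries each cycle. The sketch below assumes M is the number of queue entries, N is the issue width, and priority is oldest-first; the slide does not state the priority scheme, and the `Entry` fields are hypothetical.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    name: str
    seq: int      # program-order sequence number; lower means older
    ready: bool   # source operands available (possibly speculatively)
    issued: bool  # post-issue entry, held only in case a replay is needed


def select(queue, issue_width):
    """Grant up to `issue_width` bidding entries, oldest first (assumed priority)."""
    # Only ready entries that have not issued yet bid for an issue slot.
    bidders = [entry for entry in queue if entry.ready and not entry.issued]
    bidders.sort(key=lambda entry: entry.seq)
    return [entry.name for entry in bidders[:issue_width]]


queue = [
    Entry("i4", 4, ready=True, issued=False),
    Entry("i1", 1, ready=True, issued=True),    # post-issue: occupies a slot, never bids
    Entry("i2", 2, ready=False, issued=False),  # still waiting on an operand
    Entry("i3", 3, ready=True, issued=False),
    Entry("i5", 5, ready=True, issued=False),
]
granted = select(queue, issue_width=2)
```

In this made-up queue, `granted` is i3 and i4. The post-issue entry i1 still takes up one of the M entries the arbitration logic must span, even though it contributes nothing to the selection.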
Issue Queue Utilization Problem
  - Complexity of bid/grant arbitration logic increases with size of the IQ
  - IQ consists largely of post-issue instructions
    - Limiting the available ILP that a large IQ is supposed to provide
  - Not a complexity-effective design
IQ Design Options
  - Increase the IQ size
    - Improve performance – increase available ILP
    - Increase complexity
  - Simplify arbitration logic – use slower circuitry
    - Reduce complexity
    - Hurt performance
  - Reduce IQ size
    - Reduce complexity
    - Hurt performance
Double Latency of Issue Queue
Smaller IQ (48 Entry)
Complexity-Effective Issue Queue
Goal
  - Reduce complexity
  - Do not degrade performance
Solution: The Dual Issue Queue
  - Move post-issue instructions from main queue to separate replay queue
  - Increase available ILP
  - Reduce size of main IQ
Dual Issue Queue
[Figure: dual issue queue block diagram. Labels: from Fetch unit, Register Rename Unit, Main Issue Queue (MIQ), Replay Issue Queue (RIQ), Replay_req, Register File, Functional Units, Data Cache.]
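The movement of instructions between the two queues might look roughly like the sketch below. It assumes an instruction moves to the replay queue at the moment it issues and that replays are requested from the replay queue (the Replay_req path in the figure). The class and its method names are hypothetical, and the slides do not give the actual hardware policy.

```python
class DualIssueQueue:
    """Toy model of a main issue queue (MIQ) plus a replay issue queue (RIQ)."""

    def __init__(self, main_capacity):
        self.main_capacity = main_capacity
        self.main = []    # MIQ: instructions waiting for their first issue
        self.replay = []  # RIQ: post-issue instructions awaiting load resolution

    def dispatch(self, instr):
        """Insert a renamed instruction into the MIQ; return False if it is full."""
        if len(self.main) >= self.main_capacity:
            return False
        self.main.append(instr)
        return True

    def issue(self, instr):
        """Issue from the MIQ and park the instruction in the RIQ."""
        self.main.remove(instr)
        self.replay.append(instr)

    def resolve(self, instrs, load_hit):
        """Handle a load resolving for the speculative instructions in `instrs`.

        On a hit the instructions are released from the RIQ. On a miss they
        stay in the RIQ and are returned as the set of replay requests.
        """
        if load_hit:
            self.replay = [i for i in self.replay if i not in instrs]
            return []
        return [i for i in self.replay if i in instrs]


q = DualIssueQueue(main_capacity=2)
q.dispatch("A")
q.dispatch("B")
blocked = q.dispatch("C")   # MIQ is full, so C cannot enter yet
q.issue("A")                # A moves to the RIQ and frees an MIQ slot
accepted = q.dispatch("C")
replay_requests = q.resolve({"A"}, load_hit=False)
```

In this run `blocked` is False and `accepted` is True: issuing A frees a main-queue slot immediately instead of holding it until the load resolves. That is the effect the slide describes as increasing available ILP with a smaller main IQ.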
Dual Issue Queue Performance
Conclusion
  - Load hit speculation is critical for high performance in deeper pipelines
    - Larger percentage of post-issue instructions in issue queue
  - Complexity-effective issue queue scheme addresses utilization problem
    - For deepest pipelines, overall performance improves while reducing complexity of IQ