CS 7960-4 Lecture 14 Delaying Physical Register Allocation Through Virtual-Physical Registers T. Monreal, A. Gonzalez, M. Valero, J. Gonzalez, V. Vinals.


1 CS 7960-4 Lecture 14: Delaying Physical Register Allocation Through Virtual-Physical Registers
T. Monreal, A. Gonzalez, M. Valero, J. Gonzalez, V. Vinals
Proceedings of MICRO-32, November 1999

2 Register File Design Considerations
- Number of ports = 3 x issue width
- Number of entries = window size + logical registers
- Multiple threads → more registers (more power)
- Wire delays, clock speeds → multiple-cycle access
- Pipelining a RAM structure is hard
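The two sizing rules on this slide can be turned into a quick calculation. The sketch below is only an illustration: the function name and the example machine (4-wide issue, 64-entry window, 32 logical registers) are assumptions, not figures from the lecture. The factor of 3 is read here as two source reads plus one result write per issued instruction.

```python
def regfile_size(issue_width, window_size, logical_regs):
    """Apply the slide's rules of thumb for a conventional register file."""
    ports = 3 * issue_width               # assumed: 2 reads + 1 write per issued instruction
    entries = window_size + logical_regs  # in-flight results plus architectural state
    return ports, entries


# Hypothetical machine, chosen only to make the arithmetic concrete.
ports, entries = regfile_size(issue_width=4, window_size=64, logical_regs=32)
print(f"ports={ports}, entries={entries}")
```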

3 Register Allocation
Pipeline stages: Fetch, Rename, Issue, Complete, Wake-up, Commit
Timeline for pr7: assign pr7 (cycle 4), cycle 15, write pr7 (cycle 30), read pr7 (cycle 50), release pr7 (cycle 80)
- No result: 26 cycles
- Useful time: 20 cycles
- No activity: 30 cycles
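The three intervals on this slide follow directly from the cycle numbers in the timeline. As a small worked check (the function and key names are made up for illustration), the breakdown can be computed as:

```python
def lifetime_breakdown(assign, write, last_read, release):
    """Split a physical register's allocated lifetime into three phases."""
    return {
        "no_result": write - assign,         # allocated, but value not produced yet
        "useful": last_read - write,         # holds a value that is still read
        "no_activity": release - last_read,  # value is dead, register not yet freed
    }


# Cycle numbers taken from the slide's pr7 example.
phases = lifetime_breakdown(assign=4, write=30, last_read=50, release=80)
total = sum(phases.values())
print(phases, f"useful fraction = {phases['useful'] / total:.2f}")
```

With the slide's numbers, the register is held for 76 cycles but is useful for only 20 of them, which is the motivation for delaying allocation until the result is actually produced.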

4 Two-Level Register File
(Figure: base regfile vs. two-level regfile)

5 Virtual-Physical Registers
Register map table: lr3 → vr7
Virtual map table: vr7 → (no physical register yet)

6 Virtual-Physical Registers
Instruction issues
Register map table: lr3 → vr7
Virtual map table: vr7 → (no physical register yet)

7 Virtual-Physical Registers
Instruction completes, is assigned pr9
Register map table: lr3 → vr7, pr9
Virtual map table: vr7 → pr9

8 Virtual-Physical Registers
Register map table: lr3 → vr7, pr9
Virtual map table: vr7 → pr9
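The sequence on slides 5 to 8 can be sketched as a tiny model of the two tables. This is a minimal sketch under stated assumptions, not the paper's hardware design: the class name `VPRenamer`, its methods, and the string tags are hypothetical, and the model ignores commit, recovery, and freeing of registers.

```python
class VPRenamer:
    """Toy model: rename hands out a virtual tag, completion hands out a physical register."""

    def __init__(self, free_phys, first_vr=0):
        self.reg_map = {}    # register map table: logical reg -> (VP tag, physical reg or None)
        self.virt_map = {}   # virtual map table: VP tag -> physical reg or None
        self.free_phys = list(free_phys)
        self.next_vr = first_vr

    def rename(self, logical):
        """At rename, the destination gets only a virtual-physical tag."""
        vr = f"vr{self.next_vr}"
        self.next_vr += 1
        self.reg_map[logical] = (vr, None)
        self.virt_map[vr] = None
        return vr

    def complete(self, vr):
        """At completion, try to bind the tag to a real physical register."""
        if not self.free_phys:
            return None  # no register available; the instruction would have to re-execute
        pr = self.free_phys.pop(0)
        self.virt_map[vr] = pr
        for logical, (tag, _) in list(self.reg_map.items()):
            if tag == vr:
                self.reg_map[logical] = (vr, pr)
        return pr


# Seeded so the tags match the slides' example (lr3, vr7, pr9).
renamer = VPRenamer(free_phys=["pr9"], first_vr=7)
tag = renamer.rename("lr3")
before = renamer.virt_map[tag]
assigned = renamer.complete(tag)
print(tag, before, assigned, renamer.reg_map["lr3"])
```

The point of the split is visible in `rename`: no physical register is consumed between rename and completion, only a tag.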

9 Lack of Registers
(Figure: in-flight window; legend: has physical register / has no physical register)
- Finishes, has no register, keeps re-executing

10 Lack of Registers
(Figure: in-flight window; legend: has physical register / has no physical register)
- Finishes, has no register, keeps re-executing
- Cycle t: an instruction commits; cycle t+1: the waiting instruction gets a register

11 Deadlock
(Figure: in-flight window; legend: has physical register / has no physical register)
- Finishes, has no register, keeps re-executing
- Who will generate a register for this instr?
- Solution: reserve a register for the oldest instruction
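The reservation rule can be expressed as a small allocation check performed when an instruction finishes. This is a simplified sketch: the function name and its arguments are hypothetical, and it assumes exactly one register is held back for the oldest instruction until that instruction has obtained one.

```python
def may_allocate(free_regs, is_oldest, oldest_has_reg):
    """Decide whether a finishing instruction may take a free physical register."""
    if free_regs == 0:
        return False
    if is_oldest or oldest_has_reg:
        return True       # nothing needs to be held back
    return free_regs > 1  # keep the last free register for the oldest instruction


# A younger instruction cannot take the last register while the oldest still needs one.
print(may_allocate(free_regs=1, is_oldest=False, oldest_has_reg=False))
print(may_allocate(free_regs=1, is_oldest=True, oldest_has_reg=False))
```

Because the oldest instruction can always obtain a register, it can always complete and commit, which in turn releases a register and guarantees forward progress.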

12 Sequential Execution
(Figure: in-flight window; legend: has physical register / has no physical register)
- Oldest instr has reserved register

13 Sequential Execution
(Figure: in-flight window; legend: has physical register / has no physical register)
- Instr commits, releases another reg, that is then reserved for the new oldest instr

14 Sequential Execution
(Figure: in-flight window; legend: has physical register / has no physical register)
- Instr commits, releases another reg, that is then reserved for the new oldest instr
- Behaves like an in-order processor

15 Reserving All Registers
(Figure legend: has physical register / has no physical register)
- Allows quick progress, but almost behaves like a conventional processor

16 Register Stealing
(Figure: in-flight window; legend: has physical register / has no physical register)
- Instr finishes; steals register from the youngest finished instr
- No reservation of regs
- The younger instrs may have to execute twice
- Note the pre-execution effect

17 Implementation
- Finished instructions have to remain in the issue queue in case they have to re-execute
- Issued dependents of the victim instruction need not re-execute
- The VP tag of the victim has to be broadcast so that unissued dependents can reset the ready bit
- Can benefit from an instruction reuse buffer?
- Pre-execution without explicitly attempting it
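The stealing policy from the previous two slides can be sketched in a few lines. This is an illustration under assumptions, not the paper's mechanism: the `Instr` record and `allocate_or_steal` are hypothetical names, victims are assumed to be restricted to instructions younger than the stealer, and the tag broadcast to unissued dependents is reduced to a comment.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instr:
    seq: int                    # program order; smaller means older
    finished: bool = False
    preg: Optional[int] = None  # physical register held, if any


def allocate_or_steal(instr: Instr, window: List[Instr], free_regs: List[int]) -> Optional[Instr]:
    """Give a finishing instruction a register; return the victim if one was robbed."""
    if free_regs:
        instr.preg = free_regs.pop()
        instr.finished = True
        return None
    # Assumed rule: only younger, finished instructions holding a register are candidates.
    candidates = [i for i in window if i.finished and i.preg is not None and i.seq > instr.seq]
    if not candidates:
        return None  # instr.preg stays None: it stays in the issue queue and re-executes
    victim = max(candidates, key=lambda i: i.seq)  # youngest finished instruction
    instr.preg, victim.preg = victim.preg, None
    instr.finished = True
    victim.finished = False  # victim must execute again; its VP tag would be broadcast here
    return victim


window = [Instr(0), Instr(1, finished=True, preg=5), Instr(2, finished=True, preg=6)]
victim = allocate_or_steal(window[0], window, free_regs=[])
print(window[0], victim)
```

The victim's first execution is not wasted entirely: any loads it performed have already warmed the caches, which is the pre-execution effect the slides mention.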

18 Results
- Improves the base case by 5% (Int programs) and 24% (FP programs)
- FP programs have more ILP, better branch prediction, and are more limited by cache misses
- Re-executions: 10% (int), 58% (fp)
- Steals: 5% (int), 12% (fp)
- For the same IPC, VP registers employ 25% fewer registers

19 Next Week’s Paper “Pipeline Gating: Speculation Control for Energy Reduction”, S. Manne, A. Klauser, D. Grunwald, Proceedings of ISCA-25, June 1998

20 Harmonic and Arithmetic Means
\[
\mathrm{HM}(\mathrm{IPC}) = \frac{N}{\frac{1}{\mathrm{IPC}_a} + \frac{1}{\mathrm{IPC}_b} + \frac{1}{\mathrm{IPC}_c}} = \frac{N}{\mathrm{CPI}_a + \mathrm{CPI}_b + \mathrm{CPI}_c} = \frac{1}{\mathrm{AM}(\mathrm{CPI})}
\]
- Weights each benchmark as if they all execute one instruction
- If you want to assume each benchmark executes for the same time, HM of CPI or AM of IPC is appropriate
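A short numeric check of the two identities on this slide. The three IPC values are invented purely for illustration and the helper names are not from the lecture.

```python
def harmonic_mean(xs):
    return len(xs) / sum(1.0 / x for x in xs)


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


ipcs = [1.0, 2.0, 4.0]           # hypothetical benchmarks a, b, c
cpis = [1.0 / x for x in ipcs]

hm_ipc = harmonic_mean(ipcs)     # equal instruction counts per benchmark
am_ipc = arithmetic_mean(ipcs)   # equal execution time per benchmark

print(f"HM of IPC = {hm_ipc:.4f}, 1 / AM of CPI = {1.0 / arithmetic_mean(cpis):.4f}")
print(f"AM of IPC = {am_ipc:.4f}, 1 / HM of CPI = {1.0 / harmonic_mean(cpis):.4f}")
```

The harmonic mean of IPC is always at most the arithmetic mean, so the choice of weighting changes the reported number, not just its interpretation.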


