ICCD’03 1 Distributed Reorder Buffer Schemes for Low Power*
*Supported in part by DARPA through the PAC-C program and NSF
Gurhan Kucuk, Oguz Ergin, Dmitry Ponomarev, Kanad Ghose
Department of Computer Science, State University of New York, Binghamton, NY 13902-6000
http://www.cs.binghamton.edu/~lowpower
21st International Conference on Computer Design (ICCD’03), October 14th, 2003

ICCD’03 2 Outline
– Reorder Buffer (ROB) complexities
– Motivation for the low-complexity ROB
– Low-complexity ROB designs
    Fully Distributed ROB
    Retention Latches (RLs) revisited (ICS’02)
    Combined Scheme
– Results
– Concluding remarks

ICCD’03 3 P6-style Superscalar Datapath [Diagram: Fetch (F1, F2), Decode/Dispatch (D1, D2), instruction dispatch into the IQ, instruction issue to the function units FU1, FU2, ..., FUm (EX), the ROB, the Architectural Register File (ARF), and the result/status forwarding buses]

ICCD’03 4 PPC 620-style Superscalar Datapath [Diagram: Fetch (F1, F2), Decode/Dispatch (D1, D2), instruction dispatch into the IQ, instruction issue to the function units FU1, FU2, ..., FUm (EX), the ROB with a separate RB, the Architectural Register File (ARF), and the result/status forwarding buses]

ICCD’03 5 ROB Port Requirements for a W-way CPU
– Writeback: W write ports to write results
– Dispatch/Issue: 2W read ports to read the source operands
– Decode/Dispatch: W write ports to set up entries
– Commit: W read ports for instruction commitment
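The port counts on this slide can be tallied with a short Python sketch. The function name `rob_port_requirements` and the dictionary keys are illustrative and do not come from the talk; the counts themselves are the ones listed above.

```python
def rob_port_requirements(width: int) -> dict:
    """Port counts of a centralized ROB in a W-way machine, as listed on the slide."""
    ports = {
        "writeback_write": width,   # W write ports to write results
        "source_read": 2 * width,   # 2W read ports to read the source operands
        "dispatch_write": width,    # W write ports to set up entries
        "commit_read": width,       # W read ports for instruction commitment
    }
    ports["total"] = sum(ports.values())  # sum is taken before "total" is added
    return ports


four_way = rob_port_requirements(4)
print(four_way["source_read"], four_way["total"])  # prints "8 20"
```

For the 4-way machine simulated later in the talk, the source-operand read ports alone account for 8 of the 20 ports in this tally.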

ICCD’03 6 What This Work is All About
– ROB complexity reduction is important for reducing power and improving performance
    ROB dissipates a non-trivial fraction of the total chip power
    ROB accesses stretch over several cycles
– Goal of this work: reduce the complexity and power dissipation of the ROB without sacrificing performance

ICCD’03 7 Comparison of ROB Bitcells (0.18µ, TSMC) [Figure: layout of a 32-ported SRAM bitcell next to the layout of a 16-ported SRAM bitcell] Area reduction: 71%, with shorter bitlines and wordlines

ICCD’03 8 P6-style Superscalar Datapath [Diagram: Fetch (F1, F2), Decode/Dispatch (D1, D2), instruction dispatch into the IQ, instruction issue to the function units FU1, FU2, ..., FUm (EX), the ROB, the Architectural Register File (ARF), and the result/status forwarding buses]

ICCD’03 9 Reorder Buffer Distribution [Diagram: the same datapath with ROB Components (ROBCs) ROBC 1, ROBC 2, ..., ROBC m added; the ROB holds pointers to entries within the ROBCs]

ICCD’03 10 Impact of Distributing the ROB
– Each ROBC is effectively a small Rename Buffer
    Smaller read/write access energy
    Faster access time
– Distributing physical storage in this manner allows FUs to use shorter buses to write their respective ROBCs
    Lower energy dissipation on the wires (we have NOT accounted for energy savings from using shorter wires)
– Fits in naturally with a multi-clustered datapath design

ICCD’03 11-14 Problems with the earlier Multi-banked RF Schemes, and some good news!
– Port conflicts result in performance penalty
    Totally avoid write port conflicts
    Minimize read port conflicts at commitment
    Totally avoid source read port conflicts
– Interconnection network is more complex
    Completely remove source read ports

ICCD’03 15 ROBCs Assigned to Each Function Unit [Diagram: a centralized ROB with entries 1 to n, each holding an FU_id and an offset, next to the distributed ROBCs #1, #2, ..., #m, one for each of FU #1, FU #2, ..., FU #m]

ICCD’03 16 Good News: Write port conflicts are avoided [Diagram: FU #1, FU #2, ..., FU #m each write their own ROBC through 1 write port; the centralized ROB holds the FU_id and offset pointers]

ICCD’03 17-26 Round Robin Scheduling at Dispatch Time
[Animation: a centralized ROB with entries 1 to n, each holding an instruction, an FU_id and an offset, next to four distributed Int ADD ROBCs (#1 to #4), each drawn with entries 1 and 2]
– ADD is dispatched: an entry is reserved in Int ADD ROBC #1, and FU_id 1, offset 1 is recorded in the centralized ROB
– SUB is dispatched: an entry is reserved in Int ADD ROBC #2, and FU_id 2, offset 1 is recorded
– AND is dispatched: an entry is reserved in Int ADD ROBC #3, and FU_id 3, offset 1 is recorded

ICCD’03 27 Good News: Avoiding Read Port Conflicts [Diagram: ADD (FU_id 1, offset 1), SUB (FU_id 2, offset 1) and AND (FU_id 3, offset 1) hold reserved entries in different ROBCs; each ROBC has 1 read port to commitment]

ICCD’03 28-33 Round Robin Scheduling at Dispatch Time
[Animation, continued: ADD, SUB and AND are already recorded in the centralized ROB; the Int MUL/DIV ROBC #5 is drawn with entries 1 and 2]
– MUL is dispatched: an entry is reserved in ROBC #5, and FU_id 5, offset 1 is recorded in the centralized ROB
– DIV is dispatched: the next entry is reserved in ROBC #5, and FU_id 5, offset 2 is recorded

ICCD’03 34 Read Port Conflicts at Commitment [Diagram: MUL (FU_id 5, offset 1) and DIV (FU_id 5, offset 2) both hold entries in the Int MUL/DIV ROBC #5, which has 1 read port to commitment] CONFLICT: if MUL and DIV want to commit in the same cycle
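The dispatch and commitment behaviour walked through on slides 17 to 34 can be sketched as a toy Python model. Everything here is illustrative: the class and method names are invented, dispatch is assumed to block when the ROBC chosen by round robin is full (the slides do not say whether another FU is tried), and commitment ignores whether results have actually been produced.

```python
from collections import deque


class DistributedROB:
    """Toy model: a centralized FIFO of (fu_id, offset) pointers plus per-FU ROBCs."""

    def __init__(self, robc_sizes, fu_groups, rob_size):
        # robc_sizes: {fu_id: number of entries in that FU's ROBC}
        # fu_groups:  {op_class: [fu_id, ...]} FUs able to execute that class
        self.free = {fu: deque(range(1, n + 1)) for fu, n in robc_sizes.items()}
        self.groups = fu_groups
        self.turn = {cls: 0 for cls in fu_groups}  # round-robin cursor per class
        self.rob = deque()                         # centralized pointer FIFO
        self.rob_size = rob_size

    def dispatch(self, op_class):
        """Reserve a ROBC entry; return (fu_id, offset), or None if dispatch blocks."""
        fus = self.groups[op_class]
        fu = fus[self.turn[op_class]]
        if len(self.rob) >= self.rob_size or not self.free[fu]:
            return None  # assumption: block instead of trying another FU
        self.turn[op_class] = (self.turn[op_class] + 1) % len(fus)
        pointer = (fu, self.free[fu].popleft())
        self.rob.append(pointer)
        return pointer

    def commit(self, width):
        """Commit up to `width` oldest entries, at most one read per ROBC per cycle."""
        ports_used, committed = set(), []
        while self.rob and len(committed) < width:
            fu, offset = self.rob[0]
            if fu in ports_used:
                break  # read-port conflict: this ROBC was already read this cycle
            ports_used.add(fu)
            self.rob.popleft()
            self.free[fu].append(offset)  # the ROBC entry becomes free again
            committed.append((fu, offset))
        return committed


sizes = {1: 2, 2: 2, 3: 2, 4: 2, 5: 2}
groups = {"int_add": [1, 2, 3, 4], "int_muldiv": [5]}

# ADD, SUB, AND, MUL, DIV in program order, as in the animation.
rob = DistributedROB(sizes, groups, rob_size=8)
pointers = [rob.dispatch(c) for c in
            ("int_add", "int_add", "int_add", "int_muldiv", "int_muldiv")]
print(pointers)  # prints [(1, 1), (2, 1), (3, 1), (5, 1), (5, 2)]

# MUL and DIV alone at the head of the ROB: only MUL commits in the first cycle.
conflict = DistributedROB(sizes, groups, rob_size=8)
conflict.dispatch("int_muldiv")
conflict.dispatch("int_muldiv")
first_cycle = conflict.commit(width=4)
second_cycle = conflict.commit(width=4)
print(first_cycle, second_cycle)  # prints [(5, 1)] [(5, 2)]
```

Because each FU writes only its own ROBC, the model never needs more than one write port per ROBC; the only conflict it can produce is the commit-time case shown in the second example.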

ICCD’03 35-37 Distributed ROB Design 1: with source read ports
Ports on each ROBC:
– Writeback: 1 write port to write results
– Dispatch/Issue: 1 read port to read the source operands
– Commit: 1 read port for instruction commitment

ICCD’03 38 Experimental Setup: the AccuPower (DATE’02)
[Flow diagram]
– Compiled SPEC benchmarks and datapath specs feed the Microarchitectural Simulator (rooted in SimpleScalar), which produces performance stats
– VLSI layout data yields a SPICE deck; SPICE produces measures of energy per transition
– The Energy/Power Estimator takes the SPICE measures of energy per transition together with transition counts and context information, and produces power/energy stats

ICCD’03 39 Configuration of the Simulated System
Machine width: 4-way
Issue Queue: 32 entries
Reorder Buffer: 96 entries
Load/Store Queue: 32 entries
Simulated the execution of SPEC2000 benchmarks

ICCD’03 40-42 Peak/Average demands on the number of ROBC entries (each cell: peak / avg.)
ROBC type                   Int ADD #1-#4   Int MUL/DIV   FP ADD #1-#4   FP MUL/DIV   Load
SPEC 2000 Integer Average   16.9 / 4.4      4.1 / 0.1     1.6 / 0.04     3.8 / 0.04   28.6 / 9.3
SPEC 2000 FP Average        14.2 / 4.9      3.2 / 0.8     3.8 / 0.6      6.7 / 1.1    23.5 / 7.5
SPEC 2000 Average           15.7 / 4.6      3.7 / 0.4     2.6 / 0.3      5.0 / 0.5    26.4 / 8.5
Entries assigned per ROBC   8 each          4             4 each         4            16
Number of entries assigned to each ROBC: 8+8+8+8+4+4+4+4+4+4+16 = 72 entries (8_4_4_4_16 configuration)

ICCD’03 43-44 Percentage of cycles when dispatch blocks for 8_4_4_4_16
ROBC type                   Int ADD #1-#4   Int MUL/DIV   FP ADD #1-#4   FP MUL/DIV   Load
SPEC 2000 Integer Average   0.9             0.1           0              0            5.2
SPEC 2000 FP Average        1.5             1.0           0.1            0.8          1.9
SPEC 2000 Average           1.2             0.5           0              0.4          3.8
Number of entries assigned to each ROBC: 8+8+8+8+4+4+4+4+4+4+16 = 72 entries
Average IPC drop with the 8_4_4_4_16 configuration = 4.8%

ICCD’03 45 Reducing performance penalty: 12_6_4_6_20 Configuration
ROBC type                   Int ADD #1-#4   Int MUL/DIV   FP ADD #1-#4   FP MUL/DIV   Load
SPEC 2000 Integer Average   0.9             0.1           0              0            5.2
SPEC 2000 FP Average        1.5             1.0           0.1            0.8          1.9
SPEC 2000 Average           1.2             0.5           0              0.4          3.8
Entries assigned per ROBC   12 each         6             4 each         6            20
Number of entries assigned to each ROBC: 12+12+12+12+6+4+4+4+4+6+20 = 96 entries (12_6_4_6_20 configuration)
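The entry totals behind the two configuration names can be checked with a few lines of Python. The helper is illustrative; it assumes four Int ADD ROBCs, four FP ADD ROBCs, and one ROBC each for Int MUL/DIV, FP MUL/DIV and Load, as the column headings indicate.

```python
def total_robc_entries(int_add, int_muldiv, fp_add, fp_muldiv, load,
                       n_int_add=4, n_fp_add=4):
    """Sum the entries over all ROBCs for a configuration such as 8_4_4_4_16."""
    return n_int_add * int_add + int_muldiv + n_fp_add * fp_add + fp_muldiv + load


small = total_robc_entries(8, 4, 4, 4, 16)    # 8_4_4_4_16
large = total_robc_entries(12, 6, 4, 6, 20)   # 12_6_4_6_20
print(small, large)  # prints "72 96"
```

The larger configuration matches the 96 entries of the simulated centralized ROB, while the smaller one uses 24 fewer entries in total.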

ICCD’03 46 Performance Results for 12_6_4_6_20 Configuration [Bar chart: IPC per benchmark. Integer: gap, gcc, gzip, parser, perl, twolf, vortex, vpr, Int Avg. Floating point: applu, art, mesa, mgrid, swim, wupwise, FP Avg.] Average IPC drop with the 12_6_4_6_20 configuration = 2.4%

ICCD’03 47-49 Eliminating All Source Read Ports
Starting from Distributed ROB Design 1 (with source read ports), the Dispatch/Issue read port for source operands is removed. Ports left on each ROBC:
– Writeback: 1 write port to write results
– Commit: 1 read port for instruction commitment

ICCD’03 50 Where are the Source Values Coming From? [Diagram: the P6-style datapath with three operand sources numbered 1, 2 and 3]

ICCD’03 51 Where are the Source Values Coming From? [Chart: 96-entry ROB, 4-way processor, SPEC2K benchmarks; the three shares shown are 62%, 32% and 6%]

ICCD’03 52 How Efficiently are the Ports Used?
– Writeback: W write ports to write results
– Dispatch/Issue: 2W read ports to read the source operands
– Decode/Dispatch: W write ports to set up entries
– Commit: W read ports for instruction commitment
[Slide annotation: 6%]

ICCD’03 53-55 Our Solution: Elimination of Read Ports [Animation: the P6-style datapath with operand sources numbered 1, 2 and 3; in the final frame only sources 1 and 3 remain]

ICCD’03 56 Distributed Reorder Buffer Scheme [Diagram: the datapath with ROBC 1, ROBC 2, ..., ROBC m; the ROB holds pointers to entries within the ROBCs]

ICCD’03 57-58 Elimination of Source Read Ports [Diagram: the same distributed datapath with ROBC 1, ROBC 2, ..., ROBC m; the ROB holds pointers to entries within the ROBCs]

ICCD’03 59 Completely Eliminating the Source Read Ports on the ROBCs
– The problem: the issue of instructions that require a value stored in a ROBC will stall
– Solution: forward the value to the waiting instruction at the time of committing the value (LATE FORWARDING)

ICCD’03 60-61 Late Forwarding: Use the Normal Forwarding Buses! [Diagram: the distributed datapath with ROBC 1, ROBC 2, ..., ROBC m and a path labeled Late Forwarding; the ROB holds pointers to entries within the ROBCs]

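ICCD’03 62 Performance Drop of Simplified ROBC Design [Bar chart: performance drop % per benchmark. Integer: bzip2, gap, gcc, gzip, mcf, parser, perl, twolf, vortex, vpr, Int Avg. Floating point: applu, apsi, art, equake, mesa, mgrid, swim, wupwise, FP Avg. Two bars are labeled 37% and 17%] Average IPC drop: 9.6%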

ICCD’03 63 IPC Penalty: Source Value Not Accessible within the ROBC [Timeline: lifetime of a result value. Forwarding at result generation time, then the value sits within a ROBC, then late forwarding at commitment, then the value is within the ARF]
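The penalty on this timeline can be illustrated with a deliberately simple Python model. The function `operand_ready_cycle`, its three cycle arguments and the boundary cases are assumptions made for the sketch, not the simulator used in the talk: a consumer already waiting at writeback catches the value on the forwarding buses, a consumer dispatched after commitment reads the ARF, and a consumer dispatched in between has to wait for late forwarding.

```python
def operand_ready_cycle(consumer_dispatch, producer_writeback, producer_commit):
    """Cycle at which a consumer can obtain one source value when the ROBCs
    have no source read ports (simplified model, see the assumptions above)."""
    if consumer_dispatch <= producer_writeback:
        return producer_writeback   # caught on the normal forwarding buses
    if consumer_dispatch > producer_commit:
        return consumer_dispatch    # value is already within the ARF
    return producer_commit          # value is within a ROBC: wait for late forwarding


# Producer writes back at cycle 10 and commits at cycle 20.
print(operand_ready_cycle(5, 10, 20))   # prints 10
print(operand_ready_cycle(12, 10, 20))  # prints 20
print(operand_ready_cycle(25, 10, 20))  # prints 25
```

Only the middle case pays a penalty, and the penalty grows with the time a result spends in its ROBC before commitment.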

ICCD’03 64 Improving IPC with No Read Ports
– Cache recently generated values in a set of RETENTION LATCHES (RLs)
– Retention Latches are SMALL and FAST
    Only 8 to 16 latches needed in the set
    Entire set has 1 or 2 read ports

ICCD’03 65-66 Adding Retention Latches into the Picture [Diagram: the distributed datapath with ROBC 1, ROBC 2, ..., ROBC m and Late Forwarding, with a block labeled RETENTION LATCHES added; the ROB holds pointers to entries within the ROBCs]

ICCD’03 67-68 Distributed ROB Design 2: with Retention Latches
With all source read ports eliminated, the ports on each ROBC are:
– Writeback: 1 write port to write results
– Commit: 1 read port for instruction commitment
Eight 2-ported FIFO RLs
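A minimal Python sketch of how a small FIFO set of retention latches could stand in for the removed source read ports. The class `RetentionLatches`, the helper `locate_source` and the use of a tag to identify a result are all hypothetical; the slides give only the size (eight latches) and the FIFO policy, so port limits and any invalidation at commitment are not modeled.

```python
from collections import OrderedDict


class RetentionLatches:
    """FIFO set of latches keyed by a hypothetical result tag."""

    def __init__(self, size=8):
        self.size = size
        self.latches = OrderedDict()  # insertion order = age, oldest first

    def write(self, tag, value):
        """At writeback: keep the new result, evicting the oldest one if full."""
        self.latches.pop(tag, None)           # a reused tag replaces its stale copy
        if len(self.latches) >= self.size:
            self.latches.popitem(last=False)  # FIFO eviction of the oldest entry
        self.latches[tag] = value

    def __contains__(self, tag):
        return tag in self.latches


def locate_source(tag, latches, committed_tags):
    """Where a dispatching instruction finds an operand that was already produced."""
    if tag in committed_tags:
        return "ARF"
    if tag in latches:
        return "retention latch"
    return "wait for late forwarding"


rl = RetentionLatches(size=8)
for tag in range(9):          # nine results into eight latches: tag 0 is evicted
    rl.write(tag, tag * 10)

print(locate_source(8, rl, committed_tags=set()))  # prints "retention latch"
print(locate_source(0, rl, committed_tags=set()))  # prints "wait for late forwarding"
print(locate_source(0, rl, committed_tags={0}))    # prints "ARF"
```

In this model only results that are neither committed nor still held in the latches fall back to late forwarding.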

ICCD’03 72 Power Results for 12_6_4_6_20 Configuration [Bar chart: power savings % per benchmark. Integer: gap, gcc, gzip, parser, perl, twolf, vortex, vpr, Int Avg. Floating point: applu, art, mesa, mgrid, swim, wupwise, FP Avg.] Power savings shown: 49%, 47%, 23%

ICCD’03 73 Power Results for 12_6_4_6_20 Configuration (compared to the baseline case with 64-entry Rename Buffers) [Bar chart: power savings % per benchmark. Integer: gap, gcc, gzip, parser, perl, twolf, vortex, vpr, Int Avg. Floating point: applu, art, mesa, mgrid, swim, wupwise, FP Avg.] Power savings shown: 39%, 37%, 20%

ICCD’03 74 Summary of Results
– Low performance degradation:
    1.7% IPC drop on average (compared to a 2-cycle ROB)
    3.8% IPC drop on average (compared to a 1-cycle ROB)
– ROB power savings:
    as high as 49% (compared to the P6-style datapath: 96-entry ROB)
    as high as 39% (compared to the Rename Buffer design: 96-entry ROB, 64-entry RB)

ICCD’03 75 Conclusions
– We introduced a conflict-free distributed Reorder Buffer design
– ROB power savings as high as 49% are realized with only a small (1.7%) performance penalty
– ROB complexity is drastically reduced by
    distributing the ROB into multiple banks
    reducing the port requirements to no more than 2 ports for each ROB component

ICCD’03 76 ~ Thank You~

ICCD’03 78 Related Work
– Replicated (Kessler, IEEE Micro) and distributed (Canal et al., HPCA’00 and Farkas et al., MICRO’97) RFs in a clustered organization
– Multiple Register Banks (Cruz et al., ISCA’00 and Balasubramonian et al., MICRO’01)
– Multiple Register Banks with an additional pipeline stage to avoid complex arbitration logic (Tseng et al., ISCA’03)
– Multiple Register Banks without write port conflicts (Wallace et al., PACT’96)

ICCD’03 79-80 ROB Port Requirements for a W-way CPU
– Writeback: W write ports to write results
– Dispatch/Issue: 2W read ports to read the source operands
– Decode/Dispatch: W write ports to set up entries, shown in the second frame as 1 W-wide write port
– Commit: W read ports for instruction commitment, shown in the second frame as 1 W-wide read port

ICCD’03 81 Reducing ROB Power and Complexity [Diagram: the ROB and its physical registers]

ICCD’03 82 Distribution [Diagram: the physical registers of the centralized ROB are split into components for LOAD, Int MUL, Int ADD 1 to 4, FP MUL and FP ADD 1 to 4] Smaller structures: shorter bitlines, lower capacitive loading, etc. LESS POWER DISSIPATION!

ICCD’03 83 Dedicate FUs to ROBCs [Diagram: centralized ROB with components for LOAD, Int MUL, Int ADD 1 to 4, FP MUL and FP ADD 1 to 4] Fewer ports: much smaller structures. LESS POWER DISSIPATION! + LESS COMPLEXITY!

ICCD’03 84-85 Fully Distributed Reorder Buffer Scheme [Diagram: centralized ROB with ROBCs for LOAD, Int MUL, Int ADD 1 to 4, FP MUL and FP ADD 1 to 4] Fewer ports: much smaller structures. LESS POWER DISSIPATION! + LESS COMPLEXITY!

ICCD’03 86 Fully Distributed Reorder Buffer Scheme
– Distributed ROB Components (ROBCs) are assigned to each Function Unit
    No write port conflicts at the writeback stage, and minimal read port conflicts at commitment: negligible performance penalty
    Each ROBC can be tailored to the needs of its FU: no overcommitment of resources, less complexity
– The FIFO structure that maintains pointers to the ROBCs remains centralized

ICCD’03 87-88 Fully Distributed Reorder Buffer Scheme [Diagram: a centralized ROB with entries 1 to n, each holding an FU_id and an offset, pointing into the distributed ROBCs #1, #2, ..., #m]

ICCD’03 90 Results for the Scheme with Retention Latches [Bar chart: power savings % per benchmark. Integer: gap, gcc, gzip, parser, perl, twolf, vortex, vpr, Int Avg. Floating point: applu, art, mesa, mgrid, swim, wupwise, FP Avg.] Power savings: 23%
