1 12/03/2001 MICRO’01
Reducing Power Requirements of Instruction Scheduling Through Dynamic Allocation of Multiple Datapath Resources*
Dmitry Ponomarev, Gurhan Kucuk, Kanad Ghose
Department of Computer Science, State University of New York, Binghamton, NY 13902-6000
http://www.cs.binghamton.edu/~lowpower
34th International Symposium on Microarchitecture (MICRO-34), December 3rd, 2001
*Supported in part by DARPA through the PAC-C program and by NSF.

2 Presentation Outline
- Motivation
- Resource usage in superscalar datapaths
- Resource allocation strategy
- Performance results
- Concluding remarks

3 Motivation
- High-end superscalar CPUs employ a substantial amount of datapath resources.
- Consequences: high overall power dissipation, and areal energy/power density at dangerous levels.
- Thus, energy dissipation should preferably be controlled through technology-independent techniques.

4 What This Work Is All About
- Power-hungry resources are allocated on a “one-size-fits-all” basis, so overcommitted resources cause unnecessary dissipation.
- Examples of such resources: the Issue Queue (IQ), Reorder Buffer (ROB), Load/Store Queue (LSQ), caches and function units. This work considers the IQ, ROB and LSQ.
- Main idea: control resource allocation and deallocation dynamically to track the demands of the application.
- Goals: the technique must limit any impact on performance, must allow easy retrofitting into existing datapaths, and must use a stable, low-overhead control strategy.

5 Dynamic Resizing of IQ, ROB and LSQ
(Figure: superscalar datapath with fetch stages F1 and F2, decode/rename stages Dec/RN1 and RN2/Dis, instruction dispatch into the IQ, ROB and LSQ, instruction issue from the IQ to function units FU1 to FUm (EX stage), the Architectural Register File (ARF), and result/status forwarding buses. The IQ, ROB and LSQ are marked as resized resources.)

6 Main Issues
- How are resource needs measured or estimated? Continuous measurement vs. periodic sampling
- What is the control strategy? Centralized vs. distributed
- How is the performance impact limited? Periodic upsizing vs. asynchronous upsizing
- Which circuit techniques are relevant? Overall redesign vs. simple changes

7 Resource Usage in Superscalar Datapath: Example (fpppp)

8 Resource Usage in Superscalar Datapath: Example (apsi)

9 Incremental Resource Allocation/Deallocation
- The ROB, IQ and LSQ are each implemented as a set of independent partitions.
- Each partition is a register file, complete with its own sensing and precharge/write logic, multiple ports and through buses.
- All partitions have associative addressing logic.

10 Partitioned Organization
(Figure: three partitions, each with a precharger array, input/output drivers, a bypass switch array, an associative part and a non-associative part. Bitlines or forwarding lines run within each partition; through lines with bypass switches connect the partitions.)

11 Incremental Resource Allocation/Deallocation
- Allocation is increased by adding a free partition.
- Deallocation is performed by powering down a partition once its contents have been used up.
- This is easy to do for the IQ, but more challenging for the ROB and the LSQ because of their FIFO nature.
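The allocation and deallocation steps above can be modeled for the IQ in a few lines of Python. This is a minimal sketch, not the authors' implementation: it assumes equal-sized partitions, that an IQ entry may occupy any free slot in an active partition (which is what makes the IQ the easy case), and that the highest-numbered active partition is the one released. The class `PartitionedQueue`, its method names and the three partition states are hypothetical.

```python
class PartitionedQueue:
    """Hypothetical model of an IQ built from equal-sized partitions.

    Each partition is "on" (powered, accepts dispatches), "draining"
    (marked for deallocation: no new dispatches, existing entries may
    still issue) or "off" (powered down).
    """

    def __init__(self, num_partitions: int, partition_size: int, initial_active: int) -> None:
        self.partition_size = partition_size
        self.occupancy = [0] * num_partitions  # valid entries per partition
        self.state = ["on" if i < initial_active else "off"
                      for i in range(num_partitions)]

    def upsize(self) -> bool:
        """Add one partition; returns False if every partition is already on."""
        for wanted in ("draining", "off"):  # prefer a still-powered partition
            for i, s in enumerate(self.state):
                if s == wanted:
                    self.state[i] = "on"
                    return True
        return False

    def request_downsize(self) -> bool:
        """Mark the highest-numbered active partition for deallocation."""
        active = [i for i, s in enumerate(self.state) if s == "on"]
        if len(active) <= 1:
            return False  # assumption: keep at least one partition on
        self.state[active[-1]] = "draining"
        self._power_off_drained()
        return True

    def dispatch(self) -> bool:
        """Place one instruction in any active partition with a free slot."""
        for i, s in enumerate(self.state):
            if s == "on" and self.occupancy[i] < self.partition_size:
                self.occupancy[i] += 1
                return True
        return False  # no free slot: dispatch blocks this cycle

    def issue(self, partition: int) -> None:
        """Remove one issued instruction from the given partition."""
        if self.occupancy[partition] == 0:
            raise ValueError("partition holds no instructions")
        self.occupancy[partition] -= 1
        self._power_off_drained()

    def _power_off_drained(self) -> None:
        # A draining partition is powered down once its last entry leaves.
        for i, s in enumerate(self.state):
            if s == "draining" and self.occupancy[i] == 0:
                self.state[i] = "off"

    def allocated_entries(self) -> int:
        """Entries in partitions that are still powered."""
        return sum(self.partition_size for s in self.state if s != "off")
```

Because any free slot can take a new instruction, draining an IQ partition only requires that its instructions issue; the ROB and LSQ are harder because FIFO order fixes which slot is filled next.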

12 Sampling and Downsizing Strategies
- Downsizing decisions are made at the end of each update period.
- Update periods have a fixed duration of UP cycles.
- Within an update period, the resource occupancy is sampled at regular intervals of SP cycles.
(Figure: timeline in cycles marking the sample points SP within each update period UP.)
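A small Python sketch of the sampling scheme, assuming UP is a multiple of SP. The slides show that the occupancy samples of an update period are averaged, but they do not spell out how the average is turned into a downsizing amount, so the rule below (release every whole partition of slack between the allocated size and the average occupancy, keeping at least one partition) is an assumption, as are the function names.

```python
import math
from typing import Sequence


def samples_per_period(up: int, sp: int) -> int:
    """Number of occupancy samples taken during one update period."""
    if up % sp != 0:
        raise ValueError("this sketch assumes UP is a multiple of SP")
    return up // sp


def partitions_to_release(samples: Sequence[int], allocated: int,
                          partition_size: int) -> int:
    """Downsizing decision made at the end of an update period.

    samples:   occupancies sampled every SP cycles during the period
    allocated: entries currently allocated (a multiple of partition_size)

    Assumed rule (the slides only show that the samples are averaged):
    release every whole partition of slack between the allocated size
    and the average occupancy, but keep at least one partition.
    """
    average = sum(samples) / len(samples)
    wanted = math.floor((allocated - average) / partition_size)
    max_release = allocated // partition_size - 1
    return max(0, min(wanted, max_release))


# Example with the slides' SP=4, UP=16 and 8-entry IQ partitions
# (the sample values are made up for illustration).
print(samples_per_period(up=16, sp=4))                # 4
print(partitions_to_release([10, 12, 9, 13], 32, 8))  # average 11 -> 2
```

Only one comparison per update period is needed, plus one sample every SP cycles, which matches the goal of avoiding cycle-by-cycle monitoring.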

13-17 A Resizing Example (SP=4, UP=16)
(Animated figure: actual occupancy and allocated entries, each on a 0-32 scale, plotted over cycles 0-34. Occupancy is sampled every SP cycles, and the update-period boundaries are marked SP / UP. The last frame labels the four samples of an update period 1-4 together with their average.)

18 Upsizing Strategy
- Count the number of cycles in which dispatch blocks because the resource is full.
- If this overflow counter exceeds OT (Overflow Threshold), add one partition. Upsizing is thus more aggressive than downsizing, which reduces the performance hit.
- Reset the overflow counter to 0 at the beginning of each new UP (Update Period).
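The upsizing rule above fits in a small counter object. This Python sketch follows the slide (count blocked-dispatch cycles, add a partition once the count exceeds OT, clear the count at each new update period); the class name `OverflowMonitor` is hypothetical, and restarting the count immediately after an upsize is an assumption the slide does not state.

```python
class OverflowMonitor:
    """Hypothetical upsizing logic for one resource (IQ, ROB or LSQ).

    Counts cycles in which dispatch blocks because the resource is full
    and requests one more partition once the count exceeds OT.
    """

    def __init__(self, ot: int) -> None:
        self.ot = ot
        self.count = 0

    def new_update_period(self) -> None:
        """Called at the start of every UP-cycle update period."""
        self.count = 0

    def dispatch_blocked(self) -> bool:
        """Record one blocked-dispatch cycle; True means add a partition now."""
        self.count += 1
        if self.count > self.ot:
            self.count = 0  # assumption: restart counting after upsizing
            return True
        return False


monitor = OverflowMonitor(ot=4)
print([monitor.dispatch_blocked() for _ in range(5)])
# [False, False, False, False, True]
```

Upsizing fires as soon as enough blocked cycles accumulate instead of waiting for the end of the update period, which is what makes it more aggressive than downsizing.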

19-30 A Resizing Example (SP=4, UP=16, OT=4)
(Animated figure on the same axes: actual occupancy and allocated entries over cycles 0-34, with sample points SP and update-period boundaries SP / UP. Later frames count 1, 2, 3, 4 toward the overflow threshold and mark the point where the count reaches OT = 4.)

31 Summary of the Control Strategy
- Only three parameters are used for control: OT (Overflow Threshold), UP (Update Period) and SP (Sample Period).
- The control logic adds less than 1% power overhead.
- Advantages:
  - A desired power/performance tradeoff is easily achieved by adjusting OT and UP.
  - Cycle-by-cycle monitoring is avoided: occupancy is sampled only once every SP cycles.

32 General Considerations for Deallocations
- All information within the partition to be deallocated must be consumed:
  - for the IQ, instructions in the partition must have issued;
  - for the ROB, entries in the partition must have committed;
  - for the LSQ, entries in the partition must have started their D-cache accesses.
- No new instruction may be dispatched to this partition. For the ROB, this can block dispatch for a longer time because of the ROB's circular nature.
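The ROB case can be made concrete with a pointer check. This is a minimal Python sketch, assuming a circular ROB indexed 0 to size-1 whose highest-numbered partition is the one being released, with `head` pointing at the oldest entry and `count` valid entries; the function name and this choice of partition are hypothetical. While the check fails, no new entries may be written into the partition, so if the next free slot lies inside it, dispatch stalls until the older entries commit.

```python
def can_release_last_rob_partition(head: int, count: int, size: int,
                                   partition_size: int) -> bool:
    """Hypothetical check for powering down the highest-numbered ROB partition.

    head:  index of the oldest entry in a circular ROB of `size` slots
    count: number of valid (dispatched, not yet committed) entries
    The partition occupies slots [size - partition_size, size).
    """
    if not (0 <= head < size and 0 <= count <= size):
        raise ValueError("inconsistent ROB pointers")
    limit = size - partition_size
    if count == 0:
        return True  # empty ROB: the pointers can simply be reset to 0
    # Valid entries occupy head, head+1, ..., head+count-1 (mod size).
    # They avoid the partition only as one non-wrapping run ending at or
    # before `limit`; any wrap-around has to pass through slot size-1.
    return head + count <= limit


# 96-entry ROB with 6 partitions of 16 entries, as in the simulated system.
print(can_release_last_rob_partition(head=0, count=10, size=96, partition_size=16))   # True
print(can_release_last_rob_partition(head=75, count=10, size=96, partition_size=16))  # False
print(can_release_last_rob_partition(head=90, count=10, size=96, partition_size=16))  # False (wraps)
```

For the IQ, by contrast, only the partition's own occupancy matters, because its entries are not tied to a circular order.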

33 Experimental Setup: the Accupower Toolkit
(Figure: toolflow. A microarchitectural simulator runs compiled SPEC benchmarks on the specified datapath and produces performance stats, along with transition counts and context information. SPICE runs on a SPICE deck derived from VLSI layout data to measure energy per transition. An energy/power estimator combines the two to produce power/energy stats.)

34 Configuration of the Simulated System
Machine width        4-way
Issue Queue          32 entries, 4 partitions
Reorder Buffer       96 entries, 6 partitions
Load/Store Queue     32 entries, 4 partitions
The execution of SPEC2000 benchmarks was simulated.

35 Experimental Results: Effect on Performance
(Chart: IPC for OT = 128, 512 and 2048.)
OT            128     512     2048
IPC drop      0.9%    4.9%    19.3%

36 Experimental Results: Average Active Size (IQ)
OT            128     512     2048
Savings       14%     27%     51%

37 Experimental Results: Average Active Size (ROB)
OT            128     512     2048
Savings       19%     34%     58%

38 Experimental Results: Average Active Size (LSQ)
OT            128     512     2048
Savings       7%      20%     47%

39 Experimental Results (OT=512, UP=2048, SP=32)

40 Experimental Results: Power Reduction
(Chart: power in mW for OT = 128, 512 and 2048.)
OT               128     512     2048
Power savings    40%     48%     65%
IPC drop         0.9%    4.9%    19.3%

41 Other Matters
- Dispatch rate modulation on top of resizing does not yield substantial additional power savings and results in larger IPC drops (WCED’01).
- This work also reduces leakage dissipation, since deallocated partitions are powered down.
- We are in the process of extending this work to caches, FUs, TLBs, …, and dynamic threshold variation.
- Work is in progress on exposing resizing hooks to the compiler.

42 Related Work
- Adaptive Issue Queue (Buyuktosunoglu et al., PACS’00): a multi-partitioned issue queue whose number of active partitions is adjusted dynamically based on the number of ready flags set in entries within the active partition; an IPC drop triggers growth.
- Resizable Issue Queue (Folegnani and Gonzalez, ISCA’01): a multi-partitioned FIFO issue queue; downsizing is based on the number of instructions committed from the “youngest” partition.
- Pipeline Balancing (Bahar and Manne, ISCA’01): targets multi-clustered datapath organizations, with dynamic resizing of the issue queue and dynamic cluster activation. IPC is monitored so that clusters and issue-queue partitions can be turned off with minimal impact on performance.
- Others: IPC monitoring and resource control by the OS, dynamic profiling.

43 Concluding Remarks
- Dynamically resizing multiple datapath resources achieves significant power savings with minimal impact on performance: 48% power savings with only a 4.9% IPC drop.
- A simple control strategy is used that avoids cycle-by-cycle resource monitoring.
- The basic techniques are orthogonal to other power-reduction strategies such as selective bit-slice activation, frequency and voltage scaling, and additional circuit techniques.

