Presentation is loading. Please wait.

Presentation is loading. Please wait.

Power-Optimal Pipelining in Deep Submicron Technology

Similar presentations


Presentation on theme: "Power-Optimal Pipelining in Deep Submicron Technology"— Presentation transcript:

1 Power-Optimal Pipelining in Deep Submicron Technology
ISLPED 2004 8/10/2004 Power-Optimal Pipelining in Deep Submicron Technology Seongmoo Heo and Krste Asanović Computer Architecture Group, MIT CSAIL

2 Traditional Pipelining
Goal: Maximum performance Vdd Clk-Q Setup Propagation Delay Clk Traditionally pipelining was used to increase clock frequency for performance. But the time slack obtained from pipelining can also be used to reduce power consumption. Supply voltage scaling is one of the most effective delay-power tradeoff techniques. Our target digital system has highly parallel computation and we assume that it is pipelinable without CPI loss. And the system has fixed throughput requirement and clock frequency is fixed. In this work, we show how power-optimal pipelining varies for different operating regimes in deep submicron technology. We explore the effects of Vdd, Vth, activity factor, and clock gating. Clk Clk

3 Pipelining as a Low-Power Tool
Goal: Low-Power, Fixed Throughput Vdd Clk-Q Setup Propagation Delay Clk Time Slack Traditionally pipelining was used to increase clock frequency for performance. But the time slack obtained from pipelining can also be used to reduce power consumption. Supply voltage scaling is one of the most effective delay-power tradeoff techniques. Our target digital system has highly parallel computation and we assume that it is pipelinable without CPI loss. And the system has fixed throughput requirement and clock frequency is fixed. In this work, we show how power-optimal pipelining varies for different operating regimes in deep submicron technology. We explore the effects of Vdd, Vth, activity factor, and clock gating. Clk Time Slack Clk

4 Pipelining as a Low-Power Tool
Goal: Low-Power, Fixed Throughput Vdd Clk-Q Setup Propagation Delay Clk Time Slack Traditionally pipelining was used to increase clock frequency for performance. But the time slack obtained from pipelining can also be used to reduce power consumption. Supply voltage scaling is one of the most effective delay-power tradeoff techniques. Our target digital system has highly parallel computation and we assume that it is pipelinable without CPI loss. And the system has fixed throughput requirement and clock frequency is fixed. In this work, we show how power-optimal pipelining varies for different operating regimes in deep submicron technology. We explore the effects of Vdd, Vth, activity factor, and clock gating. Clk Traded for Power (supply voltage scaling) Time Slack Clk

5 Pipelining as a Low-Power Tool
* Clock frequency fixed Flip-flop Power Overhead Pipelining Time slack Traditionally pipelining was used to increase clock frequency for performance. But the time slack obtained from pipelining can also be used to reduce power consumption. Supply voltage scaling is one of the most effective delay-power tradeoff techniques. Our target digital system has highly parallel computation and we assume that it is pipelinable without CPI loss. And the system has fixed throughput requirement and clock frequency is fixed. In this work, we show how power-optimal pipelining varies for different operating regimes in deep submicron technology. We explore the effects of Vdd, Vth, activity factor, and clock gating. Delay

6 Pipelining as a Low-Power Tool
* Clock frequency fixed Traditionally pipelining was used to increase clock frequency for performance. But the time slack obtained from pipelining can also be used to reduce power consumption. Supply voltage scaling is one of the most effective delay-power tradeoff techniques. Our target digital system has highly parallel computation and we assume that it is pipelinable without CPI loss. And the system has fixed throughput requirement and clock frequency is fixed. In this work, we show how power-optimal pipelining varies for different operating regimes in deep submicron technology. We explore the effects of Vdd, Vth, activity factor, and clock gating. Power Saving Supply voltage scaling Delay

7 Power-Optimal Pipelining
Power reduction from pipelining limited by power overhead of increased number of flip-flops  Power-Optimal Pipelining Traditionally pipelining was used to increase clock frequency for performance. But the time slack obtained from pipelining can also be used to reduce power consumption. Supply voltage scaling is one of the most effective delay-power tradeoff techniques. Our target digital system has highly parallel computation and we assume that it is pipelinable without CPI loss. And the system has fixed throughput requirement and clock frequency is fixed. In this work, we show how power-optimal pipelining varies for different operating regimes in deep submicron technology. We explore the effects of Vdd, Vth, activity factor, and clock gating.

8 Power-Optimal Pipelining
Power reduction from pipelining limited by power overhead of increased number of flip-flops  Power-Optimal Pipelining Power Traditionally pipelining was used to increase clock frequency for performance. But the time slack obtained from pipelining can also be used to reduce power consumption. Supply voltage scaling is one of the most effective delay-power tradeoff techniques. Our target digital system has highly parallel computation and we assume that it is pipelinable without CPI loss. And the system has fixed throughput requirement and clock frequency is fixed. In this work, we show how power-optimal pipelining varies for different operating regimes in deep submicron technology. We explore the effects of Vdd, Vth, activity factor, and clock gating. Too shallow pipelining Delay

9 Power-Optimal Pipelining
Power reduction from pipelining limited by power overhead of increased number of flip-flops  Power-Optimal Pipelining Power Too deep pipelining Traditionally pipelining was used to increase clock frequency for performance. But the time slack obtained from pipelining can also be used to reduce power consumption. Supply voltage scaling is one of the most effective delay-power tradeoff techniques. Our target digital system has highly parallel computation and we assume that it is pipelinable without CPI loss. And the system has fixed throughput requirement and clock frequency is fixed. In this work, we show how power-optimal pipelining varies for different operating regimes in deep submicron technology. We explore the effects of Vdd, Vth, activity factor, and clock gating. Too shallow pipelining Delay

10 Power-Optimal Pipelining
Power reduction from pipelining limited by power overhead of increased number of flip-flops  Power-Optimal Pipelining Power Too deep pipelining Traditionally pipelining was used to increase clock frequency for performance. But the time slack obtained from pipelining can also be used to reduce power consumption. Supply voltage scaling is one of the most effective delay-power tradeoff techniques. Our target digital system has highly parallel computation and we assume that it is pipelinable without CPI loss. And the system has fixed throughput requirement and clock frequency is fixed. In this work, we show how power-optimal pipelining varies for different operating regimes in deep submicron technology. We explore the effects of Vdd, Vth, activity factor, and clock gating. Too shallow pipelining Optimal pipelining Optimal Power Saving Delay

11 Contribution Pipelining is an old idea.
Research focus has been on performance impact of pipelining. Idea of using pipelining [Chandrakasan ’92] to lower power has not been fully explored in deep submicron technology. Analysis and circuit-level simulation of Power-Optimal Pipelining for different regimes of Vth, activity factor, clock gating Pipelining is an old idea. Research focus has been on performance impact of pipelining. The idea of using pipelining to lower power was proposed long time ago, however it has not been fully explored especially in deep submicron technology. Analysis and circuit-level simulation of power-optimal pipelining for different regimes of threshold voltages, activity factor, and clock-gating in deep submicron technology.

12 Bottom-to-Top Approach
Impact of pipelining on power component Impact of pipelining on total power (with/without clock-gating) Power Total Power (clock-gated) active inactive active Time The figure shows the approach. Impact of pipelining on each power component is studied first and then, Impact of pipelining on total power is explored. Switching Power Component Leakage Power Component Idle Power Component

13 Bottom-to-Top Approach
Impact of pipelining on power component Impact of pipelining on total power (with/without clock-gating) Power Total Power (not clock-gated) active inactive active Time The figure shows the approach. Impact of pipelining on each power component is studied first and then, Impact of pipelining on total power is explored. *Idle power = power consumed when circuit is idle and not clock-gated Switching Power Component Leakage Power Component Idle Power Component

14 Methodology Target digital system: Fixed throughput, Highly parallel computation, Logic-dominant Test bench BPTM (Berkeley Predictive Technology Model) 70nm process: LVT(0.17/-0.2), MVT(0.19/-0.22), HVT(0.21/-0.24) Hspice simulation at 100°C, Clock = 2 GHz Baseline Clock frequency was set at 2 GHz and no CPI loss was assumed. Logic-dominant stage was assumed rather than wire-dominant one. BPTM 70nm process was used. Test bench consists of PowerPC flip-flops and N stages of FO4 inverters. 24 FO4 inverters were used for the baseline. Simulation was done with Hspice simulation at 100 degree. N FO4 inverters (N = 2 ~ 24) TG flip-flops TG flip-flops One Pipeline Stage

15 Pipelining and Switching Power: Analytical Trend
Optimal Saving O(N2) Flip-flop overhead Switching Power Quadratic reduction of logic switching power O(1/N) Pipelining and Switching Power As N decreases, the number of flip-flops increases and so power overhead does too. In the equation, the ratio of b1 to b0 is approximately the capacitance ratio of flip-flop and FO4 inverter. If N is greater than the ratio, the quadratic term dominates. On the other hand, if N is small, switching power becomes inversely proportional to 1 over N. The graph shows the imaginative switching power scaling curve and power-optimal point.  Vdd2  N2 Optimal FO4 Number of FO4 per stage, N

16 Pipelining and Leakage Power: Analytical Trend
Optimal Saving O(1/N) O(N ) (1<< 2) Flip-flop overhead Superlinear reduction of logic leakage power Leakage Power Optimal FO4 Pipelining and Leakage Power In the equation, the flip-flop overhead is represented by c1 over N. Similar to the switching power case, the ratio of c1 and c0 is approximately the leakage ratio of flip-flop and FO4 inverter. The dependence of leakage on drain voltage is included by the DIBL coefficient. When N is large, leakage power is proportional to N times an exponential of N. The exponential term represents the dependence of leakage current on the drain voltage (from DIBL). In modern deep sub-micron technology, for an appropriate supply voltage range, this term is larger than O(1) but smaller than O(N), therefore leakage power is reduced in a super-linear fashion as N decreases, though less than the quadratic reduction for switching power.  Vdd * e(Vdd)  N DIBL effect Number of FO4 per stage, N

17 Pipelining and Idle Power: Analytical Trend
Clock-gating is not always possible Increased control complexity insufficient setup time of clock enable signal Leakage Power + Flip-flop Switching Power Between leakage power scaling and flip-flop switching power scaling depending on leakage level Pipelining and Idle Power without clock-gating Clock-gating is not always possible due to increased control complexity or insufficient setup time of clock enable signal. Therefore, we look at the power consumption when the stage is idle but not clock-gated. The idle power consists of flip-flop switching power overhead and leakage power.

18 Pipelining and Idle Power: Analytical Trend
Leakage Power Scale Flip-flop Switching Power Scale Idle Power Optimal Saving Optimal Saving O(N) Optimal FO4 Linear reduction of Flip-flop switching power Relative Power O(N ) (1<< 2) O(1/N) Pipelining and Idle Power without clock-gating Clock-gating is not always possible due to increased control complexity or insufficient setup time of clock enable signal. Therefore, we look at the power consumption when the stage is idle but not clock-gated. The idle power consists of flip-flop switching power overhead and leakage power. Optimal FO4  1/N * Vdd2  N O(1/N) Number of FO4 per stage, N Number of FO4 per stage, N

19 Simulation Results: Power Components
Fixed 2 GHz Power Components Switching Power Leakage Power Idle Power Right hand side curve O(N2) O(N ) (1<< 2) O(N) or O(N ) (1<< 2) Saving* 79(HVT)~ 82(LVT)% 70(LVT)~ 75(HVT)% 55(HVT)~ 70(LVT)% N* 6 8 Clock frequency was set at 2 GHz and no CPI loss was assumed. Logic-dominant stage was assumed rather than wire-dominant one. BPTM 70nm process was used. Test bench consists of PowerPC flip-flops and N stages of FO4 inverters. 24 FO4 inverters were used for the baseline. Simulation was done with Hspice simulation at 100 degree. N = Number of FO4 inverters per stage (Not including flip-flop delay) N* = Optimal N Saving* = Optimal power saving by pipelining

20 Optimal Power Saving Optimal FO4 = 6 Optimal FO4 = 6~8 No Clock Gating
*2 GHz *Flip-flop delay not included in optimal FO4 relative power relative power Figure shows the simulated total power when a clock-gating mechanism is present for different activity factors. With a low activity factor, total power curves follow leakage power curves and high Vth leads to more power saving by pipelining. As the activity factor increases, total power curves follow switching power curves and high Vth leads to less power saving by pipelining. activity factor activity factor

21 Optimal Power Saving Optimal FO4 = 6 Optimal FO4 = 6~8 No Clock Gating
Idle Power Leakage Power relative power relative power Switching Power Figure shows the simulated total power when a clock-gating mechanism is present for different activity factors. With a low activity factor, total power curves follow leakage power curves and high Vth leads to more power saving by pipelining. As the activity factor increases, total power curves follow switching power curves and high Vth leads to less power saving by pipelining. Switching Power activity factor activity factor

22 Optimal Power Saving Optimal FO4 = 6 Optimal FO4 = 6~8 No Clock Gating
relative power relative power Figure shows the simulated total power when a clock-gating mechanism is present for different activity factors. With a low activity factor, total power curves follow leakage power curves and high Vth leads to more power saving by pipelining. As the activity factor increases, total power curves follow switching power curves and high Vth leads to less power saving by pipelining. LVT activity factor activity factor

23 Discussion LVT can be fast and power-efficient
enables lower Vdd Flip-flop delay more important than flip-flop power for power-optimal pipelining

24 Limitation of This Work
Effect on optimal logic depth Effect on optimal power saving Super-linear growth of flip-flops Additional memory Reduced glitches Parasitic wire capacitance

25 Conclusion Pipelining is an effective low-power tool when used to support voltage scaling in digital system implementing highly parallel computation. Optimal Logic Depth: 6-8 FO4 ~ 8-10 FO4 including flip-flop delay Optimal Power Saving: 55 – 80% It depends on Vth, AF, Clock-Gating Insights: Pipelining is more effective with High AF Pipelining is most effective at saving switching power Pipelining is more effective with lower Vth Except for when leakage power is dominant. Pipelining is more effective with clock-gating reduced flip-flop overhead.

26 Acknowledgments Thanks to SCALE group members and anonymous reviewers
Funded by NSF CAREER award CCR , NSF ITR award CCR , and a donation from Intel Corporation.

27 BACKUP SLIDES


Download ppt "Power-Optimal Pipelining in Deep Submicron Technology"

Similar presentations


Ads by Google