Presentation is loading. Please wait.

Presentation is loading. Please wait.

CSE477 L26 System Power.1Irwin&Vijay, PSU, 2002 Low Power Design in Microarchitectures and Memories [Adapted from Mary Jane Irwin ( www.cse.psu.edu/~mji.

Similar presentations


Presentation on theme: "CSE477 L26 System Power.1Irwin&Vijay, PSU, 2002 Low Power Design in Microarchitectures and Memories [Adapted from Mary Jane Irwin ( www.cse.psu.edu/~mji."— Presentation transcript:

1 CSE477 L26 System Power.1Irwin&Vijay, PSU, 2002 Low Power Design in Microarchitectures and Memories [Adapted from Mary Jane Irwin ( www.cse.psu.edu/~mji ) www.cse.psu.edu/~cg477www.cse.psu.edu/~mjiwww.cse.psu.edu/~cg477

2 CSE477 L26 System Power.2Irwin&Vijay, PSU, 2002 Review: Energy & Power Equations E = C L V DD 2 P 0  1 + t sc V DD I peak P 0  1 + V DD I leakage P = C L V DD 2 f 0  1 + t sc V DD I peak f 0  1 + V DD I leakage Dynamic power (~90% today and decreasing relatively) Short-circuit power (~8% today and decreasing absolutely) Leakage power (~2% today and increasing) f 0  1 = P 0  1 * f clock

3 CSE477 L26 System Power.3Irwin&Vijay, PSU, 2002 Power and Energy Design Space Constant Throughput/Latency Variable Throughput/Latency EnergyDesign TimeNon-active ModulesRun Time Active Logic Design Reduced V dd Sizing Multi-V dd Clock Gating DFS, DVS (Dynamic Freq, Voltage Scaling) Leakage+ Multi-V T Sleep Transistors Multi-V dd Variable V T + Variable V T

4 CSE477 L26 System Power.4Irwin&Vijay, PSU, 2002 Bus Multiplexing  Share long data buses with time multiplexing (S 1 uses even cycles, S 2 odd) S2S2 S1S1 D1D1 D2D2 S1S1 S2S2 D2D2 D1D1  Buses are a significant source of power dissipation due to high switching activities and large capacitive loading l 15% of total power in Alpha 21064 l 30% of total power in Intel 80386  But what if data samples are correlated (e.g., sign bits)?

5 CSE477 L26 System Power.5Irwin&Vijay, PSU, 2002 Correlated Data Streams Bit position MSB LSB Bit switching probabilities  For a shared (multiplexed) bus advantages of data correlation are lost (bus carries samples from two uncorrelated data streams) l Bus sharing should not be used for positively correlated data streams l Bus sharing may prove advantageous in a negatively correlated data stream (where successive samples switch sign bits) - more random switching

6 CSE477 L26 System Power.6Irwin&Vijay, PSU, 2002 Glitch Reduction by Pipelining  Glitches depend on the logic depth of the circuit - gates deeper in the logic network are more prone to glitching l arrival times of the gate inputs are more spread due to delay imbalances l usually affected more by primary input switching  Reduce logic depth by adding pipeline registers l additional energy used by the clock and pipeline registers PC FetchDecodeExecuteMemoryWriteBack Instruction MAR MDR I$D$ clk pipeline stage isolation register

7 CSE477 L26 System Power.7Irwin&Vijay, PSU, 2002 Power and Energy Design Space Constant Throughput/Latency Variable Throughput/Latency EnergyDesign TimeNon-active ModulesRun Time Active Logic Design Reduced V dd Sizing Multi-V dd Clock Gating DFS, DVS (Dynamic Freq, Voltage Scaling) Leakage+ Multi-V T Sleep Transistors Multi-V dd Variable V T + Variable V T

8 CSE477 L26 System Power.8Irwin&Vijay, PSU, 2002 Clock Gating  Gate off clock to idle functional units l e.g., floating point units l need logic to generate disable signal -increases complexity of control logic -consumes power -timing critical to avoid clock glitches at OR gate output l additional gate delay on clock signal -gating OR gate can replace a buffer in the clock distribution tree  Most popular method for power reduction of clock signals and functional units RegReg clock disable Functional unit

9 CSE477 L26 System Power.9Irwin&Vijay, PSU, 2002 Clock Gating in a Pipelined Datapath  For idle units (e.g., floating point units in Exec stage, WB stage for instructions with no write back operation) PC FetchDecodeExecuteMemoryWriteBack Instruction MAR MDR I$D$ clk No FPNo WB

10 CSE477 L26 System Power.10Irwin&Vijay, PSU, 2002 Power and Energy Design Space Constant Throughput/Latency Variable Throughput/Latency EnergyDesign TimeNon-active ModulesRun Time Active Logic Design Reduced V dd Sizing Multi-V dd Clock Gating DFS, DVS (Dynamic Freq, Voltage Scaling) Leakage+ Multi-V T Sleep Transistors Multi-V dd Variable V T + Variable V T

11 CSE477 L26 System Power.11Irwin&Vijay, PSU, 2002 Dynamic Frequency and Voltage Scaling  Intel’s SpeedStep l Hardware that steps down the clock frequency (dynamic frequency scaling – DFS) when the user unplugs from AC power -PLL from 650MHz  500MHz l CPU stalls during SpeedStep adjustment  Transmeta LongRun l Hardware that applies both DFS and DVS (dynamic supply voltage scaling) -32 levels of V DD from 1.1V to 1.6V -PLL from 200MHz  700MHz in increments of 33MHz l Triggered when CPU load change is detected by software -heavier load  ramp up V DD, when stable speed up clock -lighter load  slow down clock, when PLL locks onto new rate, ramp down V DD l CPU stalls only during PLL relock (< 20 microsec)

12 CSE477 L26 System Power.12Irwin&Vijay, PSU, 2002 Dynamic Thermal Management (DTM) Initiation Mechanism: How do we enable technique? Response Mechanism: What technique do we enable? Trigger Mechanism: When do we enable DTM techniques?

13 CSE477 L26 System Power.13Irwin&Vijay, PSU, 2002 DTM Trigger Mechanisms  Mechanism: How to deduce temperature?  Direct approach: on-chip temperature sensors l Based on differential voltage change across 2 diodes of different sizes l May require >1 sensor l Hysteresis and delay are problems  Policy: When to begin responding? l Trigger level set too high means higher packaging costs l Trigger level set too low means frequent triggering and loss in performance  Choose trigger level to exploit difference between average and worst case power

14 CSE477 L26 System Power.14Irwin&Vijay, PSU, 2002 DTM Initiation and Response Mechanisms  Operating system or micro architectural control? l Hardware support can reduce performance penalty by 20-30%  Initiation of policy incurs some delay l When using DVS and/or DFS, much of the performance penalty can be attributed to enabling/disabling overhead l Increasing policy delay reduces overhead; smarter initiation techniques would help as well  Thermal window (100Kcycles+) l Larger thermal windows “smooth” short thermal spikes

15 CSE477 L26 System Power.15Irwin&Vijay, PSU, 2002 DTM Activation and Deactivation Cycle Trigger Reached  Response Delay – Invocation time (e.g., adjust clock) Response Delay  Policy Delay – Number of cycles engaged Policy Delay Check Temp Check Temp Turn Response On Initiation Delay  Initiation Delay – OS interrupt/handler Shutoff Delay Turn Response Off  Shutoff Delay – Disabling time (e.g., re-adjust clock)

16 CSE477 L26 System Power.16Irwin&Vijay, PSU, 2002 DTM Savings Benefits Time Temperature DTM Disabled DTM/Response Engaged Designed for cooling capacity without DTM DTM trigger level Designed for cooling capacity with DTM System Cost Savings

17 CSE477 L26 System Power.17Irwin&Vijay, PSU, 2002 Power and Energy Design Space Constant Throughput/Latency Variable Throughput/Latency EnergyDesign TimeNon-active ModulesRun Time Active Logic Design Reduced V dd Sizing Multi-V dd Clock Gating DFS, DVS (Dynamic Freq, Voltage Scaling) Leakage+ Multi-V T Sleep Transistors Multi-V dd Variable V T + Variable V T

18 CSE477 L26 System Power.18Irwin&Vijay, PSU, 2002 Speculated Power of a 15mm  P

19 CSE477 L26 System Power.19Irwin&Vijay, PSU, 2002 Review: Variable V T (ABB) at Run Time  V T = V T0 +  (  |-2  F + V SB | -  |-2  F |) where V T0 is the threshold voltage at V SB = 0 V SB is the source-bulk (substrate) voltage  is the body-effect coefficient V SB (V) V T (V)  For an n-channel device, the substrate is normally tied to ground  A negative bias causes V T to increase from 0.45V to 0.85V  Adjusting the substrate bias at run time is called adaptive body-biasing (ABB)


Download ppt "CSE477 L26 System Power.1Irwin&Vijay, PSU, 2002 Low Power Design in Microarchitectures and Memories [Adapted from Mary Jane Irwin ( www.cse.psu.edu/~mji."

Similar presentations


Ads by Google