Lecture 7: Power.

1 Lecture 7: Power

2 Outline
- Power and Energy
- Dynamic Power
- Static Power
7: Power

3 Power and Energy
- Power is drawn from a voltage source attached to the VDD pin(s) of a chip.
- Instantaneous power: $P(t) = I(t)\,V(t)$
- Energy: $E = \int_0^T P(t)\,dt$
- Average power: $P_{avg} = \frac{E}{T} = \frac{1}{T}\int_0^T P(t)\,dt$
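The relationship between instantaneous power, energy, and average power can be sketched numerically. The example below assumes a hypothetical sampled supply-current waveform (the sample values and the function name `energy_and_average_power` are illustrative, not from the lecture) and integrates P(t) = I(t) * VDD with the trapezoidal rule.

```python
def energy_and_average_power(times, currents, vdd):
    """Integrate P(t) = I(t) * vdd over the samples (trapezoidal rule).

    Returns (energy in joules, average power in watts).
    """
    energy = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        p_prev = currents[k - 1] * vdd
        p_curr = currents[k] * vdd
        energy += 0.5 * (p_prev + p_curr) * dt
    duration = times[-1] - times[0]
    return energy, energy / duration


# Hypothetical triangular current pulse: 0 -> 1 mA -> 0 over 2 ns at VDD = 1.0 V.
times = [0.0, 1e-9, 2e-9]
currents = [0.0, 1e-3, 0.0]
energy, p_avg = energy_and_average_power(times, currents, 1.0)
print(f"E = {energy:.3e} J, Pavg = {p_avg:.3e} W")
```

Average power is simply the integrated energy divided by the observation window, so a short current spike in a long window contributes little to the average.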

4 Power in Circuit Elements

5 Charging a Capacitor
- When the gate output rises:
  - Energy stored in the capacitor is $E_C = \frac{1}{2} C_L V_{DD}^2$
  - But energy drawn from the supply is $E_{VDD} = C_L V_{DD}^2$
  - Half the energy from VDD is dissipated in the pMOS transistor as heat; the other half is stored in the capacitor
- When the gate output falls:
  - Energy in the capacitor is dumped to GND
  - Dissipated as heat in the nMOS transistor

6 Switching Waveforms
- Example: VDD = 1.0 V, CL = 150 fF, f = 1 GHz
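As a rough check of the numbers in this example, the sketch below works through the energy bookkeeping for one rising and one falling output transition. It uses the standard capacitor relations (C·VDD² drawn from the supply, half of it stored) and assumes the output completes one rise and one fall per clock period; that assumption and the variable names are ours.

```python
# Values from the slide's example.
VDD = 1.0        # supply voltage, volts
CL = 150e-15     # load capacitance, farads
F = 1e9          # switching frequency, hertz

energy_from_supply = CL * VDD ** 2                       # drawn from VDD on the rising edge
energy_stored = 0.5 * CL * VDD ** 2                      # left on the capacitor after the rise
energy_lost_pmos = energy_from_supply - energy_stored    # heat in the pMOS during the rise
energy_lost_nmos = energy_stored                         # heat in the nMOS during the fall

# Assumption: one rise and one fall per period, so one C*VDD^2 packet per cycle.
average_power = energy_from_supply * F

print(f"supply energy per cycle: {energy_from_supply * 1e15:.0f} fJ")
print(f"average power: {average_power * 1e6:.0f} uW")
```

Under these assumptions the node draws 150 fJ per cycle and dissipates about 150 µW on average.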

7 Switching Power

8 Activity Factor
- Suppose the system clock frequency = f
- Let $f_{sw} = \alpha f$, where $\alpha$ = activity factor
  - If the signal is a clock, $\alpha = 1$
  - If the signal switches once per cycle, $\alpha = \frac{1}{2}$
- Dynamic power: $P_{switching} = \alpha C V_{DD}^2 f$
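A minimal sketch of how the activity factor scales switching power, assuming the usual relation P = α·C·VDD²·f. The function name `switching_power` and the 150 fF node are illustrative choices, not part of the lecture.

```python
def switching_power(alpha, capacitance, vdd, freq):
    """Switching power in watts for a node with activity factor alpha."""
    return alpha * capacitance * vdd ** 2 * freq


# Hypothetical 150 fF node at 1.0 V and a 1 GHz clock.
p_clock = switching_power(1.0, 150e-15, 1.0, 1e9)   # clock-like signal, alpha = 1
p_data = switching_power(0.5, 150e-15, 1.0, 1e9)    # toggles once per cycle, alpha = 1/2
p_typical = switching_power(0.1, 150e-15, 1.0, 1e9) # typical logic node, alpha ~ 0.1

print(p_clock, p_data, p_typical)
```

The same capacitance costs ten times less power on a typical logic node than on a clock net, which is why clock distribution tends to dominate the dynamic power budget.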

9 Short Circuit Current
- When transistors switch, both nMOS and pMOS networks may be momentarily ON at once
- This leads to a blip of “short circuit” current
- It is < 10% of dynamic power if rise/fall times are comparable for input and output
- We will generally ignore this component

10 Power Dissipation Sources
- $P_{total} = P_{dynamic} + P_{static}$
- Dynamic power: $P_{dynamic} = P_{switching} + P_{shortcircuit}$
  - Switching load capacitances
  - Short-circuit current
- Static power: $P_{static} = (I_{sub} + I_{gate} + I_{junct} + I_{contention}) V_{DD}$
  - Subthreshold leakage
  - Gate leakage
  - Junction leakage
  - Contention current

11 Power Dissipation
- Power dissipation breakdown in the Niagara 2 processor (Sun, 8 cores, 84 W)

12 Dynamic Power Example
- 1 billion transistor chip
  - 50M logic transistors
    - Average width: 12 λ
    - Activity factor = 0.1
  - 950M memory transistors
    - Average width: 4 λ
    - Activity factor = 0.02
- 1.0 V 65 nm process
- C = 1 fF/µm (gate) + … fF/µm (diffusion)
- Estimate dynamic power consumption at 1 GHz. Neglect wire capacitance and short-circuit current.

13 Solution
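The solution itself did not survive in this transcript. The sketch below shows one way the estimate could be worked, under two assumptions that are ours rather than the slide's: λ = 0.025 µm for the 65 nm process, and a diffusion capacitance of 0.8 fF/µm (the transcript lost that value). All names are illustrative.

```python
LAMBDA_UM = 0.025            # assumed: lambda in micrometres for a 65 nm process
C_GATE_PER_UM = 1.0e-15      # from the slide: 1 fF/um gate capacitance
C_DIFF_PER_UM = 0.8e-15      # assumed: diffusion capacitance, value missing in transcript
VDD = 1.0                    # volts
FREQ = 1e9                   # hertz

c_per_um = C_GATE_PER_UM + C_DIFF_PER_UM

# Total switched capacitance of each transistor population.
c_logic = 50e6 * 12 * LAMBDA_UM * c_per_um     # 50M logic transistors, 12 lambda wide
c_memory = 950e6 * 4 * LAMBDA_UM * c_per_um    # 950M memory transistors, 4 lambda wide

# Weight each population by its activity factor.
p_dynamic = (0.1 * c_logic + 0.02 * c_memory) * VDD ** 2 * FREQ

print(f"C_logic = {c_logic * 1e9:.0f} nF, C_mem = {c_memory * 1e9:.0f} nF")
print(f"P_dynamic = {p_dynamic:.2f} W")
```

With these assumptions the logic contributes about 27 nF, the memory about 171 nF, and the dynamic power comes to roughly 6.1 W. A different diffusion capacitance or λ would shift the result proportionally.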

14 Dynamic Power Reduction
- Try to minimize:
  - Activity factor
  - Capacitance
  - Supply voltage
  - Frequency

15 Activity Factor Estimation
- Let $P_i = \mathrm{Prob}(\text{node } i = 1)$
  - $\bar{P}_i = 1 - P_i$
- $\alpha_i = \bar{P}_i P_i$
- Completely random data has P = 0.5 and α = 0.25
- Data is often not completely random
  - e.g. upper bits of 64-bit words representing bank account balances are usually 0
- Data propagating through ANDs and ORs has lower activity factor
  - Depends on design, but typically α ≈ 0.1
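A small sketch of the estimate above, assuming the activity factor is the probability that a node is 0 in one cycle and 1 in the next, with the two cycles independent. The function name and the 5% example are illustrative.

```python
def activity_factor(p_one):
    """Activity factor of a node that is 1 with probability p_one.

    Assumes consecutive cycles are independent, so alpha = (1 - P) * P.
    """
    return (1.0 - p_one) * p_one


alpha_random = activity_factor(0.5)    # completely random data
alpha_upper_bit = activity_factor(0.05)  # hypothetical upper bit that is 1 only 5% of the time

print(alpha_random, alpha_upper_bit)
```

The product peaks at P = 0.5, so any bias toward 0 or toward 1 lowers the activity factor.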

16 Switching Probability

17 Example
- A 4-input AND is built out of two levels of gates
- Estimate the activity factor at each node if the inputs have P = 0.5
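The circuit diagram is not in the transcript, so the sketch below assumes one common two-level realization: two NAND2 gates feeding a NOR2. It also assumes the four inputs are independent. Exact fractions are used so the results can be read off directly.

```python
from fractions import Fraction


def activity(p_one):
    """Activity factor of a node that is 1 with probability p_one."""
    return (1 - p_one) * p_one


def nand2_prob(pa, pb):
    """P(output = 1) for a 2-input NAND with independent inputs."""
    return 1 - pa * pb


def nor2_prob(pa, pb):
    """P(output = 1) for a 2-input NOR with independent inputs."""
    return (1 - pa) * (1 - pb)


p_in = Fraction(1, 2)
p_n1 = nand2_prob(p_in, p_in)   # n1 = NAND(A, B)
p_n2 = nand2_prob(p_in, p_in)   # n2 = NAND(C, D)
p_y = nor2_prob(p_n1, p_n2)     # Y = NOR(n1, n2) = A AND B AND C AND D

for name, p in [("inputs", p_in), ("n1", p_n1), ("n2", p_n2), ("Y", p_y)]:
    print(name, "P =", p, "alpha =", activity(p))
```

Under these assumptions the internal nodes have P = 3/4 and α = 3/16, and the output has P = 1/16 and α = 15/256, illustrating how activity drops as data passes through AND-type logic.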

18 Example
- Compare the two cases below:

19 Example

20 Example

21 Clock Gating
- The best way to reduce the activity is to turn off the clock to registers in unused blocks
  - Saves clock activity (α = 1)
  - Eliminates all switching activity in the block
  - Requires determining if block will be used

22 Capacitance
- Gate capacitance
  - Fewer stages of logic
  - Small gate sizes
- Wire capacitance
  - Good floorplanning to keep communicating blocks close to each other
  - Drive long wires with inverters or buffers rather than complex gates

23 Voltage / Frequency
- Run each block at the lowest possible voltage and frequency that meets performance requirements
- Voltage Domains
  - Provide separate supplies to different blocks
  - Level converters required when crossing from low to high VDD domains
- Dynamic Voltage Scaling
  - Adjust VDD and f according to workload
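To see why lowering voltage and frequency together pays off, the sketch below scales the switching-power relation P ∝ VDD²·f. The operating points are hypothetical, the function names are ours, and leakage is ignored.

```python
def relative_dynamic_power(vdd_scale, freq_scale):
    """Dynamic power relative to nominal when VDD and f are scaled."""
    return vdd_scale ** 2 * freq_scale


def relative_energy_per_operation(vdd_scale):
    """Switching energy per operation relative to nominal (independent of f)."""
    return vdd_scale ** 2


# Hypothetical low-power point: 80% of nominal VDD, 50% of nominal frequency.
power_ratio = relative_dynamic_power(0.8, 0.5)
energy_ratio = relative_energy_per_operation(0.8)

print(f"power: {power_ratio:.2f}x nominal, energy/op: {energy_ratio:.2f}x nominal")
```

Halving the frequency alone would halve power but leave energy per operation unchanged. The voltage reduction is what lowers the energy spent on each operation.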

24 Voltage Domains

25 Voltage Domains

26 Voltage Domains
- The easiest approach is to associate each block in a floorplan with a voltage
- You can also perform clustered voltage scaling

27 Voltage Domains
- Dynamic voltage scaling

28 Voltage Domains

29 Short Circuit Currents

30 How to keep Short-Circuit Currents Low?
Short circuit current goes to zero if tfall >> trise, but this cannot be done for cascaded logic, so ...

31 Minimizing Short-Circuit Power
[Figure: curves for Vdd = 3.3 V, Vdd = 2.5 V, and Vdd = 1.5 V]

32 Resonant Circuits
- Especially useful in clocking.
- IBM has demonstrated resonant clocking for a practical processor.

33 Static Power
- Static power is consumed even when the chip is quiescent.
  - Leakage draws power from nominally OFF devices
  - Ratioed circuits burn power in the fight between ON transistors

34 Static Power Example
- Revisit power estimation for the 1 billion transistor chip
- Estimate static power consumption
  - Subthreshold leakage
    - Normal Vt: … nA/µm
    - High Vt: 10 nA/µm
    - High Vt used in all memories and in 95% of logic gates
  - Gate leakage: 5 nA/µm
  - Junction leakage negligible

35 Solution
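The worked solution is missing from the transcript. The sketch below is one plausible way to carry out the estimate, and it rests on three assumptions of ours: λ = 0.025 µm, a normal-Vt subthreshold leakage of 100 nA/µm (the slide's value was lost), and half of all transistors being OFF, and therefore leaking, at any given time. Variable names are illustrative.

```python
LAMBDA_UM = 0.025          # assumed: lambda in micrometres for a 65 nm process
VDD = 1.0                  # volts
I_SUB_NORMAL = 100e-9      # assumed: normal-Vt subthreshold leakage, A per um
I_SUB_HIGH = 10e-9         # from the slide: high-Vt subthreshold leakage, A per um
I_GATE = 5e-9              # from the slide: gate leakage, A per um
OFF_FRACTION = 0.5         # assumed: half the transistors leak at any time

# Total transistor widths in micrometres.
w_logic = 50e6 * 12 * LAMBDA_UM
w_memory = 950e6 * 4 * LAMBDA_UM

w_normal_vt = 0.05 * w_logic               # 5% of logic uses normal Vt
w_high_vt = 0.95 * w_logic + w_memory      # remaining logic plus all memory

i_sub = OFF_FRACTION * (w_normal_vt * I_SUB_NORMAL + w_high_vt * I_SUB_HIGH)
i_gate = OFF_FRACTION * (w_normal_vt + w_high_vt) * I_GATE
p_static = (i_sub + i_gate) * VDD

print(f"I_sub = {i_sub * 1e3:.0f} mA, I_gate = {i_gate * 1e3:.0f} mA")
print(f"P_static = {p_static * 1e3:.0f} mW")
```

Under these assumptions the static power is a little under 0.9 W. The result is sensitive to the assumed normal-Vt leakage and OFF fraction, so treat it as an order-of-magnitude estimate.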

36 Subthreshold Leakage
- For Vds > 50 mV
- Ioff = leakage at Vgs = 0, Vds = VDD
- Typical values in 65 nm:
  - Ioff = 100 nA/µm @ Vt = 0.3 V
  - Ioff = 10 nA/µm @ Vt = 0.4 V
  - Ioff = 1 nA/µm @ Vt = 0.5 V
  - η = 0.1
  - kγ = 0.1
  - S = 100 mV/decade
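The leakage equation itself is missing from the transcript. The sketch below assumes a commonly used simplified form, Isub = Ioff · 10^((Vgs + η(Vds − VDD) − kγ·Vsb)/S), which applies when Vds is well above about 50 mV. Here η models DIBL, kγ the body effect, and S the subthreshold slope. The function name and example bias points are ours.

```python
def subthreshold_current(i_off, vgs, vds, vsb, vdd, eta=0.1, k_gamma=0.1, s=0.1):
    """Simplified subthreshold leakage for vds well above ~50 mV.

    i_off is the leakage at vgs = 0, vds = vdd, vsb = 0.
    s is the subthreshold slope in volts per decade.
    """
    exponent = (vgs + eta * (vds - vdd) - k_gamma * vsb) / s
    return i_off * 10 ** exponent


I_OFF = 100e-9  # A per um, slide's typical value at Vt = 0.3 V

nominal = subthreshold_current(I_OFF, vgs=0.0, vds=1.0, vsb=0.0, vdd=1.0)
negative_gate = subthreshold_current(I_OFF, vgs=-0.1, vds=1.0, vsb=0.0, vdd=1.0)
reverse_body = subthreshold_current(I_OFF, vgs=0.0, vds=1.0, vsb=0.5, vdd=1.0)

print(nominal, negative_gate, reverse_body)
```

With S = 100 mV/decade, every 100 mV of negative gate drive cuts leakage tenfold, which matches the tenfold steps between the Vt = 0.3, 0.4, and 0.5 V entries in the list.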

37 Stack Effect
- Series OFF transistors have less leakage
  - Vx > 0, so N2 has negative Vgs
  - Leakage through a 2-stack reduces ~10x
  - Leakage through a 3-stack reduces further
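The size of the stack effect can be estimated with a short calculation. The sketch below assumes the simplified leakage model Isub = Ioff · 10^((Vgs + η(Vds − VDD) − kγ·Vsb)/S), places N1 at the bottom of the stack and N2 on top with the intermediate node at Vx, and uses η = 0.1, kγ = 0.1, S = 100 mV/decade, and VDD = 1.0 V. The transistor placement and the closed form for Vx are our assumptions.

```python
ETA = 0.1        # DIBL coefficient
K_GAMMA = 0.1    # body-effect coefficient
S = 0.1          # subthreshold slope, volts per decade
VDD = 1.0        # volts


def relative_leakage(vgs, vds, vsb):
    """Leakage relative to Ioff under the simplified subthreshold model."""
    return 10 ** ((vgs + ETA * (vds - VDD) - K_GAMMA * vsb) / S)


# Equating the currents of the two series OFF transistors gives Vx in closed form.
vx = ETA * VDD / (1 + 2 * ETA + K_GAMMA)

# N1 (bottom): gate and source at 0, drain at Vx.
i_n1 = relative_leakage(vgs=0.0, vds=vx, vsb=0.0)
# N2 (top): gate at 0, source at Vx, drain at VDD, body at 0.
i_n2 = relative_leakage(vgs=-vx, vds=VDD - vx, vsb=vx)

reduction = 1.0 / i_n1
print(f"Vx = {vx * 1e3:.0f} mV, leakage reduced about {reduction:.1f}x")
```

With these parameters Vx settles near 77 mV and the stack leaks roughly 8 times less than a single OFF transistor, in line with the "~10x" rule of thumb.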

38 Threshold Effect

39 Leakage Control
- Leakage and delay trade off
  - Aim for low leakage in sleep and low delay in active mode
- To reduce leakage:
  - Increase Vt: multiple Vt
    - Use low Vt only in critical circuits
  - Increase Vs: stack effect
    - Input vector control in sleep
  - Decrease Vb
    - Reverse body bias in sleep
    - Or forward body bias in active mode

40 Gate Leakage
- Extremely strong function of tox and Vgs
  - Negligible for older processes
  - Approaches subthreshold leakage at 65 nm and below in some processes
- An order of magnitude less for pMOS than nMOS
- Control leakage in the process using tox > 10.5 Å
  - High-k gate dielectrics help
  - Some processes provide multiple tox
    - e.g. thicker oxide for 3.3 V I/O transistors
- Control leakage in circuits by limiting VDD

41 NAND3 Leakage Example
- 100 nm process
  - Ign = 6.3 nA
  - Igp = 0
  - Ioffn = 5.63 nA
  - Ioffp = 9.3 nA
- Data from [Lee03]

42 Junction Leakage
- From reverse-biased p-n junctions
  - Between diffusion and substrate or well
- Ordinary diode leakage is negligible
- Band-to-band tunneling (BTBT) can be significant
  - Especially in high-Vt transistors where other leakage is small
  - Worst at Vdb = VDD
- Gate-induced drain leakage (GIDL) exacerbates
  - Worst for Vgd = -VDD (or more negative)

43 Power Gating
- Turn OFF power to blocks when they are idle to save leakage
  - Use virtual VDD (VDDV)
  - Gate outputs to prevent invalid logic levels to next block
- Voltage drop across sleep transistor degrades performance during normal operation
  - Size the transistor wide enough to minimize impact
- Switching wide sleep transistor costs dynamic power
  - Only justified when circuit sleeps long enough

44 Power Gating
When a block is gated, its state must either be saved or reset upon power-up: either use registers with a second VDD, or save everything to memory. Power gating may be done externally, with a disable input to a voltage regulator, or internally, with high-Vt header or footer switches. External power gating eliminates leakage altogether, but it takes a long time and significant energy. The power transistor actually consists of many transistors in parallel, which should be controlled individually to combat L di/dt and IR drops. Also, the best Ion/Ioff is obtained for specific L and W values.

45 Multiple Thresholds
Selective application of multiple threshold voltages can maintain performance on critical paths with low-Vt transistors while reducing leakage on other paths with high-Vt transistors. Using multiple thresholds adds to the cost of the process. One can alternatively use non-minimum-L transistors for non-critical paths, thus raising the threshold voltages via the short-channel effect. For example, in Intel’s 65 nm process, making transistors 10% longer reduces Ion by 10% but reduces Ioff by a factor of 3.

46 Variable Thresholds
Using body bias, one can dynamically adjust threshold voltages. This is called variable threshold CMOS (VTCMOS). Use low-Vt devices and reverse body bias during sleep. Alternatively, use high-Vt devices and forward body bias during operation. Too much reverse body bias (e.g. < -1.2 V) leads to greater junction leakage due to BTBT. Too much forward body bias (e.g. > 0.4 V) leads to large current through the body-to-source diodes.

47 Variable Thresholds
Below is an n-well process with body bias. Normally, triple-well processes should be used.

48 Input Vector Control
Applying the input pattern that consumes the least power during sleep can minimize the leakage power in that block. Be careful: applying this pattern itself causes power dissipation.

49 Energy-Delay Optimization
What is the best choice of VDD and Vt for a given technology and application? What does “best” mean? Let us start with minimum energy. Energy per operation corresponds to the power-delay product (PDP). The minimum occurs in the subthreshold region, where VDD < Vt. Von Neumann argued from thermodynamics that the minimum switching energy is kT ln 2. Meindl found the minimum voltage at which an inverter can operate by setting the slope of the transfer characteristic at the switching point to -1.

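50 Energy-Delay Optimization
He took n = 1 for subthreshold operation. The minimum voltage turns out to be $V_{min} = 2\frac{kT}{q}\ln 2$. The energy stored on the gate capacitance of a MOSFET is $E = \frac{1}{2} Q V_{DD}$. The minimum charge is q, so $E_{min} = \frac{1}{2} q V_{min} = kT \ln 2 = 2.9 \times 10^{-21}$ J. For comparison: 0.5 µm 5 V process, $1.5 \times 10^{-13}$ J; 65 nm 1 V process, $3 \times 10^{\ldots}$ J.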
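The kT ln 2 figure is easy to check numerically. The sketch below assumes room temperature (300 K) and the commonly quoted form of Meindl's result for n = 1, Vmin = 2(kT/q) ln 2. Both the temperature and that closed form are assumptions on our part, since the slide's equation did not survive.

```python
import math

K_BOLTZMANN = 1.380649e-23      # J/K (exact SI value)
Q_ELECTRON = 1.602176634e-19    # C (exact SI value)
TEMPERATURE = 300.0             # K, assumed room temperature

thermal_voltage = K_BOLTZMANN * TEMPERATURE / Q_ELECTRON

# Assumed closed form for the minimum inverter supply voltage with n = 1.
v_min = 2 * thermal_voltage * math.log(2)

# Energy of a single electron charge on the gate at that voltage: (1/2) q V.
e_min = 0.5 * Q_ELECTRON * v_min

print(f"kT/q = {thermal_voltage * 1e3:.1f} mV")
print(f"V_min = {v_min * 1e3:.1f} mV")
print(f"E_min = {e_min:.2e} J")
```

The two results are consistent with each other: half an electron charge times Vmin reproduces kT ln 2, about 2.9 × 10⁻²¹ J, with Vmin near 36 mV.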

51 Energy-Delay Optimization
However, this situation does not really minimize energy, because the circuits run so slowly that leakage energy dominates. The true minimum energy is at a point where switching and leakage energies are balanced. In subthreshold operation, current drops exponentially as VDD - Vt decreases, while switching energy improves only quadratically with VDD. Ignoring DIBL, gate and junction leakage, and short-circuit power, one can find the minimum energy point easily.

52 Energy-Delay Optimization

53 Minimum Energy
- The delay of N gates operating in the subthreshold region is given by
- The energy consumed in one cycle is

54 Minimum Energy
Note that this equation depends on switching activity. Also, only inverters were used in the analysis; other gates can also be considered. Temperature affects the behavior strongly. Even in this simplified case, taking the derivative and setting it to zero yields messy equations. Contour plots are more informative.

55 Minimum Energy
[Figure panels: α = 1 and α = 0.1]

56 Minimum Energy
The minimum energy points are not practical: the energy is decreased about 10 times, but the frequency is decreased … times. A better metric that takes both energy and speed into account is the energy-delay product (EDP).

57 Minimum EDP
First, ignore leakage. Use the alpha-power law to include velocity saturation. EDP is given by $EDP = k\frac{C^2 V_{DD}^3}{(V_{DD} - V_t)^{\alpha}}$. Differentiating with respect to VDD and setting to zero gives $V_{DD,opt} = \frac{3 V_t}{3 - \alpha}$.

58 Minimum EDP
α is typically between 1 and 2. Hence, the optimum VDD is around 2Vt. Differentiating with respect to Vt gives an optimum Vt of zero. This is because leakage was neglected. Leakage should also be introduced and the equation solved again. The results are messy, but can be described in terms of contour plots. The dashed lines represent speed normalized to the minimum EDP point.
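The "around 2Vt" claim can be checked with a short calculation. The sketch below assumes the leakage-free alpha-power-law form EDP ∝ VDD³/(VDD − Vt)^α, whose derivative vanishes at VDD = 3Vt/(3 − α), and compares that closed form against a brute-force search. The values Vt = 0.3 V and α = 1.5 are hypothetical.

```python
def edp(vdd, vt, alpha):
    """Normalized energy-delay product, leakage ignored (alpha-power law)."""
    return vdd ** 3 / (vdd - vt) ** alpha


def optimal_vdd(vt, alpha):
    """Closed-form minimizer of edp() with respect to vdd, for alpha < 3."""
    return 3 * vt / (3 - alpha)


vt, alpha = 0.3, 1.5                      # hypothetical process values
vdd_opt = optimal_vdd(vt, alpha)

# Brute-force check on a 1 mV grid from just above Vt up to 2.0 V.
candidates = [vt + 0.001 * k for k in range(1, 1701)]
vdd_numeric = min(candidates, key=lambda v: edp(v, vt, alpha))

print(f"closed form: {vdd_opt:.3f} V, grid search: {vdd_numeric:.3f} V")
print(f"range for alpha in [1, 2]: {optimal_vdd(vt, 1.0):.2f} V to {optimal_vdd(vt, 2.0):.2f} V")
```

For α between 1 and 2 the optimum spans 1.5Vt to 3Vt, and α = 1.5 gives exactly 2Vt.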

59 Minimum EDP

60 Minimum Energy under a Delay Constraint

61 Low Power Architectures

62 Power Management Modes

63 Power Management Modes
- Intel Atom Processor
  - HFM: 2 GHz, 1 V, 2 W
  - LFM: 600 MHz, 0.75 V
  - Sleep modes: C1-C6
- For a typical workload, the chip spends 80% - 90% of its time in C6 mode. The average power drops to 220 mW.
- Chips are usually designed for average power.
- Software designed to draw maximum power and burn chips is called a thermal virus.

64 Pitfalls and Fallacies
- Oversizing gates
- Designing for speed regardless of power
- Reporting power at a given frequency rather than energy per operation
- Reporting PDP where EDP should actually be used
- Failing to account for leakage

