August 9, 2006. Agrawal: VDAT'06 Tutorial II. Low-Power Electronics and Systems. Vishwani D. Agrawal, James J. Danaher Professor, Department of Electrical and Computer Engineering, Auburn University, Auburn, AL 36849, USA. http://www.eng.auburn.edu/~vagrawal vagrawal@eng.auburn.edu

August 9, 2006Agrawal: VDAT'06 Tutorial II2 Contents Introduction Dynamic power –Short circuit power –Reduced supply voltage operation –Glitch elimination Static (leakage) power reduction Low power systems –State encoding –Processor and multi-core design Books on low-power design

August 9, 2006Agrawal: VDAT'06 Tutorial II3 Introduction Why is it a concern? Power Consumption of VLSI Chips

August 9, 2006Agrawal: VDAT'06 Tutorial II4 ISSCC, Feb. 2001, Keynote “Ten years from now, microprocessors will run at 10GHz to 30GHz and be capable of processing 1 trillion operations per second -- about the same number of calculations that the world's fastest supercomputer can perform now. “Unfortunately, if nothing changes these chips will produce as much heat, for their proportional size, as a nuclear reactor....” Patrick P. Gelsinger Senior Vice President General Manager Digital Enterprise Group INTEL CORP.

VLSI Chip Power Density (plot: power density in W/cm², log scale from 1 to 10000, versus year, 1970 to 2010, for the Intel 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® and P6, with reference levels marked for a hot plate, a nuclear reactor, a rocket nozzle and the Sun's surface). Source: Intel

August 9, 2006Agrawal: VDAT'06 Tutorial II6 Meaning of Low-Power Design Design practices that reduce power consumption at least by one order of magnitude; in practice 50% reduction is often acceptable. General considerations in low-power design –Algorithms and architectures –High-level and software techniques –Gate and circuit-level methods –Power estimation techniques –Test power

August 9, 2006Agrawal: VDAT'06 Tutorial II7 Topics in Low-Power Power dissipation in CMOS circuits Device technology –Low-power CMOS technologies –Energy recovery methods Circuit and gate level methods –Logic synthesis –Dynamic power reduction techniques –Leakage power reduction System level methods –Microprocessors –Arithmetic circuits –Low power memory technology Test power Power estimation methods and tools

Power in a CMOS Gate (figure: a CMOS gate connected between V_DD and ground, drawing supply current i_DD(t))

Power Dissipation in CMOS Logic (0.25µ)
$P_{total}(0 \to 1) = C_L V_{DD}^2 + t_{sc} V_{DD} I_{peak} + V_{DD} I_{leakage}$
The three terms contribute roughly 75%, 5% and 20% of the total, respectively.

Power and Energy
Instantaneous power (Watts): $P(t) = i_{DD}(t)\,V_{DD}$
Peak power (Watts): $P_{peak} = \max\{P(t)\}$
Average power (Watts): $P_{av} = \frac{1}{T}\int_0^T P(t)\,dt$
Energy (Joules): $E = \int_0^T P(t)\,dt$

August 9, 2006Agrawal: VDAT'06 Tutorial II11 Low-Power Design Techniques Circuit and gate level methods – Reduced supply voltage – Adiabatic switching and charge recovery – Logic design for reduced activity – Reduced Glitches – Transistor sizing – Pass-transistor logic – Pseudo-nMOS logic – Multi-threshold gates

August 9, 2006Agrawal: VDAT'06 Tutorial II12 Low-Power Design Techniques Functional and architectural methods –Clock suppression –Clock frequency reduction –Supply voltage reduction –Power down –Algorithmic and Software methods

August 9, 2006Agrawal: VDAT'06 Tutorial II13 Test Power Power grid on a VLSI chip is designed for certain current capacity during functional operation: –Average current → heat dissipation –Peak current → noise, ground bounce Problem – Tests like scan or BIST are nonfunctional and may cause higher than the functional circuit activity; a functionally good chip can fail the test.

August 9, 2006Agrawal: VDAT'06 Tutorial II14 Power Estimation Methods Spice: Accurate but expensive Logic-level –Event-driven simulation –Statistical –Probabilistic High-level: Hierarchical

August 9, 2006Agrawal: VDAT'06 Tutorial II15 Components of Power Dynamic –Signal transitions Logic activity Glitches –Short-circuit Static –Leakage P total =P dyn + P stat =P tran + P sc + P stat

Power of a Transition: P_tran (figure: CMOS gate between V_DD and ground with input v_i(t) and output v_o(t); the conducting transistor is modeled as R_on, the other as R = large, and the current i_c(t) charges the load C_L)

Charging of a Capacitor (figure: source V charges capacitor C through resistor R, starting at t = 0, with current i(t) and capacitor voltage v(t))
Charge on capacitor: $q(t) = C\,v(t)$
Current: $i(t) = \frac{dq(t)}{dt} = C\,\frac{dv(t)}{dt}$

$i(t) = C\,\frac{dv(t)}{dt} = \frac{V - v(t)}{R}$
$\frac{dv(t)}{dt} = \frac{V - v(t)}{RC}$
$\int \frac{dv(t)}{V - v(t)} = \int \frac{dt}{RC}$
$\ln[V - v(t)] = -\frac{t}{RC} + A$
Initial condition: $t = 0$, $v(t) = 0$ → $A = \ln V$
$v(t) = V\left[1 - \exp\left(-\frac{t}{RC}\right)\right]$

$v(t) = V\left[1 - \exp\left(-\frac{t}{RC}\right)\right]$
$i(t) = C\,\frac{dv(t)}{dt} = \frac{V}{R}\exp\left(-\frac{t}{RC}\right)$

Total Energy Per Charging Transition from Power Supply
$E_{trans} = \int_0^\infty V\,i(t)\,dt = \int_0^\infty \frac{V^2}{R}\exp\left(-\frac{t}{RC}\right)dt = CV^2$

Energy Dissipated per Transition in Resistance (R) of “On” Transistors
$R\int_0^\infty i^2(t)\,dt = R\,\frac{V^2}{R^2}\int_0^\infty \exp\left(-\frac{2t}{RC}\right)dt = \frac{1}{2}CV^2$

Energy Stored in Charged Capacitor
$\int_0^\infty v(t)\,i(t)\,dt = \int_0^\infty V\left[1 - \exp\left(-\frac{t}{RC}\right)\right]\frac{V}{R}\exp\left(-\frac{t}{RC}\right)dt = \frac{1}{2}CV^2$

Transition Power
Gate output rising transition
- Energy dissipated in pMOS transistor = $CV^2/2$
- Energy stored in capacitor = $CV^2/2$
Gate output falling transition
- Energy dissipated in nMOS transistor = $CV^2/2$ (the energy stored in the capacitor)
Energy dissipated per transition = $CV^2/2$
Power dissipation: $P_{trans} = \alpha f_{ck}\,CV^2/2$, where α = activity factor (average number of transitions per clock cycle) and $f_{ck}$ = clock frequency
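The three energy integrals above can be checked numerically. The following is a minimal sketch, not part of the original slides: the function name `charging_energies` and the component values (2.5 V, 10 kΩ, 15 fF) are assumptions chosen only for illustration.

```python
import math

def charging_energies(V, R, C, steps=200_000, t_end_in_tau=40.0):
    """Integrate the RC charging waveforms with the midpoint rule.

    Returns (energy drawn from supply, energy dissipated in R,
    energy stored in C), all in joules.
    """
    tau = R * C
    dt = t_end_in_tau * tau / steps
    e_supply = e_resistor = e_capacitor = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        i = (V / R) * math.exp(-t / tau)      # charging current i(t)
        v = V * (1.0 - math.exp(-t / tau))    # capacitor voltage v(t)
        e_supply += V * i * dt
        e_resistor += R * i * i * dt
        e_capacitor += v * i * dt
    return e_supply, e_resistor, e_capacitor

# Illustrative (assumed) values: 2.5 V supply, 10 kOhm on-resistance, 15 fF load.
V, R, C = 2.5, 10e3, 15e-15
e_sup, e_res, e_cap = charging_energies(V, R, C)
print(e_sup / (C * V * V), e_res / (C * V * V), e_cap / (C * V * V))
```

The printed ratios should come out close to 1, 1/2 and 1/2, independent of the value chosen for R.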

Short Circuit Current, i_sc(t) (figure: inverter between V_DD and GND with input V_i(t) and output V_o(t); waveforms in volts and short-circuit current in amps versus time in ns. The current i_sc(t) flows from t_B to t_E, while V_i(t) is between V_Tn and V_DD - V_Tp, and peaks at I_scmaxf)

Short-Circuit Energy per Transition
$E_{scf} = \int_{t_B}^{t_E} V_{DD}\,i_{sc}(t)\,dt = (t_E - t_B)\,I_{scmaxf}\,V_{DD}/2$
$E_{scf} = t_f\,(V_{DD} - |V_{Tp}| - V_{Tn})\,I_{scmaxf}/2$
$E_{scr} = t_r\,(V_{DD} - |V_{Tp}| - V_{Tn})\,I_{scmaxr}/2$
$E_{scf} = 0$, when $V_{DD} = |V_{Tp}| + V_{Tn}$
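The triangular approximation above translates directly into a small helper. This is a sketch only: the function name and the sample numbers (1 ns transition, 0.5 V thresholds, 100 µA peak current) are assumptions, not values from the slides.

```python
def short_circuit_energy(t_transition, vdd, vtn, vtp_abs, i_sc_max):
    """Short-circuit energy per transition, E = t (VDD - |VTp| - VTn) I_max / 2.

    Returns 0 when VDD <= |VTp| + VTn, since both transistors can
    then never conduct at the same time.
    """
    window = vdd - vtp_abs - vtn
    if window <= 0.0:
        return 0.0
    return t_transition * window * i_sc_max / 2.0

# Assumed illustrative values.
e_sc = short_circuit_energy(1e-9, 2.5, 0.5, 0.5, 1e-4)
e_sc_low_vdd = short_circuit_energy(1e-9, 1.0, 0.5, 0.5, 1e-4)
print(e_sc, e_sc_low_vdd)
```

The second call illustrates the limiting case on the slide: with the supply equal to the sum of the thresholds, the short-circuit energy vanishes.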

August 9, 2006Agrawal: VDAT'06 Tutorial II26 Short-Circuit Power and Voltage Scaling Decreases and eventually becomes zero when V DD is scaled down but the threshold voltages are not scaled down. References: –M. A. Ortega and J. Figueras, “Short Circuit Power Modeling in Submicron CMOS,” PATMOS’96, Aug. 1996, pp. 147-166. –T. Sakurai and A. Newton, “Alpha-power Law MOSFET model and Its Application to a CMOS Inverter,” IEEE J. Solid State Circuits, vol. 25, April 1990, pp. 584-594.

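P_sc and Output Capacitance (figure: CMOS gate between V_DD and ground driving load C_L, with R_on for the conducting transistor and R = large for the other; the supply current is i_c(t) + i_sc(t), shown with input fall time t_f, output rise time t_r and output waveform v_o(t))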

i_sc and Output Capacitance
$I_{sc}(t) = \frac{v_o(t)}{R_{\uparrow t_f}(t)} = \frac{V_{DD}\left[1 - \exp\left(-\frac{t}{R_{\downarrow t_f}(t)\,C}\right)\right]}{R_{\uparrow t_f}(t)}$

i_scmax and Output Capacitance (plot: output voltage v_o(t), the factor $1/R_{\uparrow t_f}(t)$ and the resulting current i versus time t over the input fall time t_f, compared for small C and large C, with i_scmax marked)

P_sc, Output Rise Times, Capacitance
For given input rise and fall times, short-circuit power decreases as output capacitance increases.
Short-circuit power increases with increase of input rise and fall times.
Short-circuit power is reduced if output rise and fall times are larger than the input rise and fall times, that is, if the output does not switch faster than the input.

August 9, 2006Agrawal: VDAT'06 Tutorial II31 Effects of Scaling Down 1-16% short-circuit power at 0.7 micron 4-37% at 0.35 micron 12-60% at 0.17 micron Reference: S. R. Vemuru and N. Steinberg, “Short Circuit Power Dissipation Estimation for CMOS Logic Gates,” IEEE Trans. on Circuits and Systems I, vol. 41, Nov. 1994, pp. 762-765.

August 9, 2006Agrawal: VDAT'06 Tutorial II32 Summary: Short-Circuit Power Short-circuit power is consumed by each transition (increases with input transition time). Reduction requires that gate output transition should not be faster than the input transition (faster gates can consume more short-circuit power). Increasing the output load capacitance reduces short-circuit power. Scaling down of supply voltage with respect to threshold voltages reduces short-circuit power.

Dynamic Power (figure: gate between V_DD and ground with input V_i, output V_o, load C_L and short-circuit current i_sc)
Dynamic Power = $C_L V_{DD}^2/2 + P_{sc}$

August 9, 2006Agrawal: VDAT'06 Tutorial II34 Dynamic Power Reduction Reduce power per transition –Reduced voltage operation – voltage scaling –Capacitance minimization – device sizing Reduce number of transitions –Glitch elimination

CMOS Dynamic Power
Dynamic Power $= \sum_{\text{all gates } i} 0.5\,\alpha_i\,f_{clk}\,C_{Li}\,V_{DD}^2 \approx 0.5\,\alpha\,f_{clk}\,C_L\,V_{DD}^2 \approx \alpha_{01}\,f_{clk}\,C_L\,V_{DD}^2$
where
α = average gate activity factor
$\alpha_{01} = 0.5\alpha$, average 0→1 transitions
$f_{clk}$ = clock frequency
$C_L$ = total load capacitance
$V_{DD}$ = supply voltage

Example: 0.25μm CMOS Chip
f = 500MHz
Average capacitance = 15fF/gate
V_DD = 2.5V
10^6 gates
Power $= \alpha_{01}\,f\,C_L\,V_{DD}^2 = \alpha_{01} \times 500 \times 10^6 \times (15 \times 10^{-15} \times 10^6) \times 2.5^2 = 46.9$W, for $\alpha_{01} = 1.0$
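The arithmetic of this example can be reproduced with a few lines. The sketch below uses the numbers given on the slide; the function name `dynamic_power` and the second call with an activity of 0.1 are illustrative assumptions.

```python
def dynamic_power(alpha01, f_clk, c_total, vdd):
    """Dynamic power in watts: alpha01 * f_clk * C_L * VDD^2."""
    return alpha01 * f_clk * c_total * vdd ** 2

gates = 10**6
c_total = 15e-15 * gates            # 15 fF per gate, one million gates
p_worst = dynamic_power(1.0, 500e6, c_total, 2.5)
p_tenth = dynamic_power(0.1, 500e6, c_total, 2.5)   # assumed activity of 0.1
print(p_worst, p_tenth)
```

Because the expression is linear in the activity factor, a chip whose gates make a 0→1 transition in only one cycle out of ten would dissipate one tenth of the worst-case figure.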

Signal Activity, α (figure: waveforms over clock period T = 1/f, showing the clock with α_01 = 1.0, a signal with α_01 = 0.5, and combinational signals)

August 9, 2006Agrawal: VDAT'06 Tutorial II38 Reducing Dynamic Power Dynamic power reduction is –Quadratic with reduction of supply voltage –Linear with reduction of capacitance

0.25μm CMOS Inverter, V_DD = 2.5V (plots: V_out versus V_in, both from 0 to 2.5V, and gain, from 0 to -20, versus V_in)

0.25μm CMOS Inverter, V_DD < 2.5V (plots: V_out versus V_in for reduced supply voltages, with a close-up of the 0 to 0.2V range; points of gain = -1 are marked)

Lower Bound on V_DD
For proper operation of a gate, the magnitude of the maximum gain (at V_in = V_DD/2) should be greater than 1.
$Gain_{max} = -\frac{1}{n}\left[\exp\left(\frac{V_{DD}}{2\Phi_T}\right) - 1\right] = -1$
With n = 1.5 and $\Phi_T = kT/q = 26$mV, this gives V_DD = 48mV.
V_DDmin > 2 to 4 times kT/q, or ~100mV at room temperature (27°C)
Ref.: J. M. Rabaey, A. Chandrakasan and B. Nikolić, Digital Integrated Circuits, Upper Saddle River, New Jersey: Pearson Education, 2003.
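Solving the unity-gain condition for the supply voltage gives $V_{DD} = 2\Phi_T \ln(1 + n)$. A short sketch of that calculation follows; the function name is an assumption, and the default values are the ones quoted on the slide.

```python
import math

def vdd_for_unity_gain(n=1.5, phi_t=0.026):
    """Supply voltage (volts) at which the maximum inverter gain is -1.

    Obtained by solving -(1/n) * (exp(VDD / (2 * phi_t)) - 1) = -1.
    """
    return 2.0 * phi_t * math.log(1.0 + n)

v_min = vdd_for_unity_gain()
print(round(v_min * 1000, 1), "mV")
```

The result is in the tens of millivolts, which is why the practical bound on the slide is expressed as a small multiple of kT/q.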

Impact of V_DD on Performance
Inverter delay $= K\,\frac{C_L V_{DD}}{(V_{DD} - V_t)^{\alpha}}$
(Plot: delay, 0 to 40 ns, and power on a log scale versus V_DD from 0.6V to 3.0V; the delay grows without bound as V_DD approaches V_t.)

Optimum Power × Delay
Power × Delay, $PD = \text{constant} \times \frac{V_{DD}^3}{(V_{DD} - V_t)^{\alpha}}$
For minimum power-delay product, $d(PD)/dV_{DD} = 0$, which gives
$V_{DD} = \frac{3V_t}{3 - \alpha}$
For long channel devices, α = 2, $V_{DD} = 3V_t$
For very short channel devices, α = 1, $V_{DD} = 1.5V_t$
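The closed-form optimum can be cross-checked against a brute-force search over the power-delay expression. This is a sketch under assumed values: the threshold of 0.5 V and the 1 mV search grid are chosen only for illustration.

```python
def power_delay(vdd, vt, alpha):
    """Power-delay product up to a constant factor: VDD^3 / (VDD - Vt)^alpha."""
    return vdd ** 3 / (vdd - vt) ** alpha

def optimum_vdd(vt, alpha):
    """Closed-form minimizer from d(PD)/dVDD = 0."""
    return 3.0 * vt / (3.0 - alpha)

def grid_search_vdd(vt, alpha, v_max=5.0, step=0.001):
    """Scan supply voltages above Vt and return the one with the smallest PD."""
    count = int((v_max - vt) / step)
    candidates = [vt + k * step for k in range(1, count + 1)]
    return min(candidates, key=lambda v: power_delay(v, vt, alpha))

vt = 0.5  # assumed threshold voltage in volts
for alpha in (2.0, 1.0):
    print(alpha, optimum_vdd(vt, alpha), round(grid_search_vdd(vt, alpha), 3))
```

For both exponents the scanned minimum should agree with the formula to within the grid spacing.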

Transistor Sizing for Performance
Problem: If we increase W/L to make the charging or discharging of the load capacitance C_L faster, then the increased W increases the input capacitance C_in, which is the load for the driving gate.

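Fixed-Taper Buffer (figure: chain of n inverters from V_in to V_out with relative sizes 1, α, α², ..., α^{i-1}, ..., α^{n-1}, driving load C_L; the first stage has input capacitance C_in)
$C_i = \alpha^{i-1} C_{in}$
$C_L = \alpha^n C_{in}$
Delay of a unit inverter driving an identical inverter = $t_0$
Ref.: J. Segura and C. F. Hawkins, CMOS Electronics, How It Works, How It Fails, Piscataway, New Jersey: IEEE Press, 2004.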

Buffer (Cont.)
$\alpha^n = C_L/C_{in}$
$n = \frac{\ln(C_L/C_{in})}{\ln \alpha}$
ith stage delay, $t_i = \alpha t_0$, i = 1, ..., n, because each stage drives a stage α times bigger than itself.

Buffer (Cont.)
Total delay $= \sum_{i=1}^{n} t_i = n\,\alpha\,t_0 = \frac{\ln(C_L/C_{in})\,\alpha\,t_0}{\ln \alpha}$

August 9, 2006Agrawal: VDAT'06 Tutorial II48 Buffer (Cont.) Differentiating total delay with respect to α and equating to 0, we get α opt = e ≈ 2.7 The optimum number of stages is n opt = ln(C L /C in )
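The sizing rule can be turned into a small design helper. This is a sketch, not the reference's procedure: rounding the ideal stage count to an integer and then recomputing the taper so that the chain still reaches the load exactly are assumptions added here, as is the capacitance ratio of 1000.

```python
import math

def tapered_buffer(c_load, c_in, t0=1.0):
    """Choose stage count and taper for a fixed-taper buffer.

    Starts from n_opt = ln(C_L / C_in), rounds to a whole number of stages,
    and sets alpha so that alpha**n == C_L / C_in.
    Returns (n, alpha, total_delay) with total_delay = n * alpha * t0.
    """
    ratio = c_load / c_in
    n = max(1, round(math.log(ratio)))
    alpha = ratio ** (1.0 / n)
    return n, alpha, n * alpha * t0

n, alpha, delay = tapered_buffer(c_load=1000.0, c_in=1.0)
sizes = [alpha ** i for i in range(n)]   # relative size of each stage
print(n, round(alpha, 3), round(delay, 2))
```

With a whole number of stages the taper lands close to, but not exactly at, e; the total delay is fairly flat around the optimum, so the rounding costs little.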

August 9, 2006Agrawal: VDAT'06 Tutorial II49 Further Reading B. S. Cherkauer and E. G. Friedman, “A Unified Design Methodology for CMOS Tapered Buffers,” IEEE Trans. VLSI Systems, vol. 3, no. 1, pp. 99-111, March 1995.

Logic Activity and Glitches (figure: example circuit with signals numbered 1 to 7 and gate delays d = 2 and d = 1, illustrating a glitch caused by unequal path delays)

August 9, 2006Agrawal: VDAT'06 Tutorial II51 Glitch Power Reduction Design a digital circuit for minimum transient energy consumption by eliminating hazards

August 9, 2006Agrawal: VDAT'06 Tutorial II52 Theorem 1 For correct operation with minimum energy consumption, a Boolean gate must produce no more than one event per transition. Output logic state changes One transition is necessary Output logic state unchanged No transition is necessary

Inertial Delay of a Gate (Inverter) (figure: V_in and V_out versus time, with high-to-low delay d_HL and low-to-high delay d_LH)
$d = \frac{d_{HL} + d_{LH}}{2}$

Theorem 2
Given that events occur at the input of a gate with inertial delay d at times $t_1 \le \dots \le t_n$, the number of events at the gate output cannot exceed
$\min\left(n,\; 1 + \left\lfloor \frac{t_n - t_1}{d} \right\rfloor\right)$
(figure: input events at t_1, t_2, t_3, ..., t_n on a time axis, spanning the interval t_n - t_1)

Minimum Transient Design
Minimum transient energy condition for a Boolean gate:
$|t_i - t_j| < d$
where $t_i$ and $t_j$ are arrival times of input events and d is the inertial delay of the gate.
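Both the event bound of Theorem 2 and the minimum transient condition are easy to evaluate for a given set of arrival times. The sketch below assumes the quotient in the bound is rounded down, since event counts are whole numbers; the function names and sample arrival times are hypothetical.

```python
import math

def max_output_events(arrival_times, d):
    """Upper bound on output events: min(n, 1 + floor((t_n - t_1) / d))."""
    if not arrival_times:
        return 0
    spread = max(arrival_times) - min(arrival_times)
    return min(len(arrival_times), 1 + math.floor(spread / d))

def is_minimum_transient(arrival_times, d):
    """True when every pair of input events arrives less than d apart."""
    return max(arrival_times) - min(arrival_times) < d

# Four input events spread over 3 time units, gate inertial delay 2.
bound_glitchy = max_output_events([0.0, 1.0, 2.0, 3.0], d=2.0)
# Two input events only 0.5 apart, gate inertial delay 1.
bound_clean = max_output_events([0.0, 0.5], d=1.0)
print(bound_glitchy, bound_clean, is_minimum_transient([0.0, 0.5], 1.0))
```

When the condition holds, the bound collapses to one output event, which matches the requirement of Theorem 1.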

Balanced Delay Method
All input events arrive simultaneously
Overall circuit delay not increased
Delay buffers may have to be inserted
(figure: example circuit annotated with unit gate delays and inserted delay buffers)

Hazard Filter Method
Gate delay is made greater than maximum input path delay difference
No delay buffers needed (least transient energy)
Overall circuit delay may increase
(figure: the same circuit with selected gate delays increased to 3)

August 9, 2006Agrawal: VDAT'06 Tutorial II58 Glitch-Free Design by Linear Programming Variables: gate and buffer delays Objective: minimize number of buffers Subject to: overall circuit delay Subject to: minimum transient condition for multi-input gate

August 9, 2006Agrawal: VDAT'06 Tutorial II59 Variables for Full-Adder Gate delay variables d 4... d 12 Buffer delay variables d 15... d 29 Delay variables are located at the checkpoints of the circuit. Delay variables

August 9, 2006Agrawal: VDAT'06 Tutorial II60 Objective Function Ideal: minimize the number of non-zero delay buffers Actual: minimize sum of buffer delays

Specify Critical Path Delay
Sum of delays on critical path ≤ maxdel
(figure: original design, with gate delays of 1 and buffer delays of 0)

Multi-Input Gate Condition (figure: gate with delay d whose two inputs are reached through path delays d1 and d2)
$d_1 - d_2 \le d$ and $d_2 - d_1 \le d$, which together are equivalent to $|d_1 - d_2| \le d$

August 9, 2006Agrawal: VDAT'06 Tutorial II63 Results: 1-Bit Adder R. Fourer, D. M. Gay and B. W. Kernighan, AMPL: A Modeling Language for Mathematical Programming, South San Francisco: The Scientific Press, 1993.

AMPL Solution: maxdel = 6 (figure: circuit annotated with the resulting delay values)

AMPL Solution: maxdel = 7 (figure: circuit annotated with the resulting delay values)

AMPL Solution: maxdel ≥ 11 (figure: circuit annotated with the resulting delay values)

August 9, 2006Agrawal: VDAT'06 Tutorial II67 Removing a Limitation Constraints are written by path enumeration. Since number of paths in a circuit can be exponential in circuit size, the formulation is infeasible for large circuits. Example: c880 has 6.96M constraints. Solution: A linear complexity method. See, –T. Raja, Master’s Thesis, Rutgers University, 2002. –T. Raja, V. D. Agrawal and M. L. Bushnell, “Minimum Dynamic Power CMOS Circuit Design by a Reduced Constraint Set Linear Program,” Proc. 16 th International Conf. VLSI Design, 2003, pp. 527-532.

Comparison of Constraints (plot: number of constraints versus number of gates in circuit)

Benchmark Circuits
Circuit | Maxdel. (gates) | No. of Buffers
C432 | 17 | 95
C432 | 34 | 66
C880 | 24 | 62
C880 | 48 | 34
C6288 | 47 | 294
C6288 | 94 | 120
c7552 | 43 | 366
c7552 | 86 | 111
Normalized power, average: 0.72 0.62 0.68 0.40 0.36 0.38 0.36
Normalized power, peak: 0.67 0.60 0.54 0.52 0.36 0.34 0.32

c7552: 3,500-gate CMOS Circuit (plot: instantaneous energy, in units of 10^-10 joules, versus clock cycles)

August 9, 2006Agrawal: VDAT'06 Tutorial II71 References R. Fourer, D. M. Gay and B. W. Kernighan, AMPL: A Modeling Language for Mathematical Programming, South San Francisco: The Scientific Press, 1993. M. Berkelaar and E. Jacobs, “Using Gate Sizing to Reduce Glitch Power,” Proc. ProRISC Workshop, Mierlo, The Netherlands, Nov. 1996, pp. 183-188. V. D. Agrawal, “Low Power Design by Hazard Filtering,” Proc. 10 th Int’l Conf. VLSI Design, Jan. 1997, pp. 193-197. V. D. Agrawal, M. L. Bushnell, G. Parthasarathy and R. Ramadoss, “Digital Circuit Design for Minimum Transient Energy and Linear Programming Method,” Proc. 12 th Int’l Conf. VLSI Design, Jan. 1999, pp. 434-439. M. Hsiao, E. M. Rudnick and J. H. Patel, “Effects of Delay Model in Peak Power Estimation of VLSI Circuits,” Proc. ICCAD, Nov. 1997, pp. 45-51. T. Raja, A Reduced Constraint Set Linear Program for Low Power Design of Digital Circuits, Master’s Thesis, Rutgers Univ., New Jersey, 2002. T. Raja, V. D. Agrawal and M. L. Bushnell, “Transistor Sizing of Logic gates to Maximize Input Delay Variability,” J. of Low Power Electronics (JOLPE), vol. 2, pp. 121-128, 2006.

August 9, 2006Agrawal: VDAT'06 Tutorial II72 Static (Leakage) Power Dynamic –Signal transitions Logic activity Glitches –Short-circuit Static –Leakage

Leakage Power (figure: cross-section of a transistor with n+ regions, connected through R between V_DD and ground, showing the leakage currents I_G, I_D, I_sub, I_PT and I_GIDL)

August 9, 2006Agrawal: VDAT'06 Tutorial II74 Leakage Current Components Subthreshold conduction, I sub Reverse bias pn junction conduction, I D Gate induced drain leakage, I GIDL due to tunneling at the gate-drain overlap Drain source punchthrough, I PT due to short channel and high drain-source voltage Gate tunneling, I G through thin oxide

Subthreshold Current
$I_{sub} = \mu_0\,C_{ox}\,(W/L)\,V_t^2\,\exp\{(V_{GS} - V_{TH})/nV_t\}$
$\mu_0$: carrier surface mobility
$C_{ox}$: gate oxide capacitance per unit area
L: channel length
W: gate width
$V_t = kT/q$: thermal voltage
n: a technology parameter

I_DS for Short Channel Device
$I_{sub} = \mu_0\,C_{ox}\,(W/L)\,V_t^2\,\exp\{(V_{GS} - V_{TH} + \eta V_{DS})/nV_t\}$
$V_{DS}$ = drain to source voltage
η: a proportionality factor
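The exponential dependence on threshold voltage is the key point of these expressions. The following sketch evaluates the formula with the device-dependent factor $\mu_0 C_{ox} (W/L)$ lumped into a single `prefactor` argument; the function name, the prefactor of 1 and the threshold values of 0.4 V and 0.3 V are illustrative assumptions.

```python
import math

BOLTZMANN_OVER_Q = 8.617333e-5   # k/q in volts per kelvin

def subthreshold_current(prefactor, vgs, vth, vds=0.0, eta=0.0,
                         n=1.5, temp_k=300.0):
    """I_sub = prefactor * Vt^2 * exp((VGS - VTH + eta*VDS) / (n*Vt)).

    prefactor stands for mu0 * Cox * (W/L); Vt = kT/q is the thermal voltage.
    """
    vt = BOLTZMANN_OVER_Q * temp_k
    return prefactor * vt ** 2 * math.exp((vgs - vth + eta * vds) / (n * vt))

# Off-state leakage (VGS = 0) for two assumed threshold voltages.
i_high_vth = subthreshold_current(1.0, vgs=0.0, vth=0.4)
i_low_vth = subthreshold_current(1.0, vgs=0.0, vth=0.3)
ratio = i_low_vth / i_high_vth
print(round(ratio, 1))
```

Under these assumptions, lowering the threshold by 100 mV raises the off-state current by roughly an order of magnitude, independent of the prefactor.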

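Increased Subthreshold Leakage (plot: log I_sub versus gate voltage; the curve of the scaled device, with lower threshold V_TH', lies above that of the original device with threshold V_TH, giving a higher current at zero gate voltage)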

Reducing Leakage Power
Leakage power as a fraction of the total power increases as clock frequency drops. Turning the supply off in unused parts can save power.
For a single gate, leakage is a small fraction of the total power; it can be significant for very large circuits.
Scaling down features requires lowering the threshold voltage, which increases leakage power; leakage roughly doubles with each technology shrink.
Multiple-threshold devices are used to reduce leakage power.

August 9, 2006Agrawal: VDAT'06 Tutorial II79 Problem Statement Problem: To Design a CMOS Circuit, –using dual-threshold devices to globally minimize subthreshold leakage –using delay elements to eliminate all glitches –maintaining specified performance –allowing performance-power tradeoff Reference: Y. Lu and V. D. Agrawal, “Leakage and Dynamic Glitch Power Minimization Using Integer Linear Programming for Vth Assignment and Path Balancing,” Proc. PATMOS, 2005, pp. 217-226.

MILP: Mixed Integer Linear Program
Minimize $\left\{ \sum_{\text{all gates } i} \left[ X_i I_{Li} + (1 - X_i) I_{Hi} \right] + \sum_{\text{all gates } i} \sum_{j} \Delta d_{ij} \right\}$
where
$X_i = 1$: gate i has low V_th; its leakage is $I_{Li}$
$X_i = 0$: gate i has high V_th; its leakage is $I_{Hi}$
$\Delta d_{ij}$ = delay inserted between gates i and j for glitch suppression
$X_i \in \{0, 1\}$ is an integer variable; $\Delta d_{ij}$ is a real variable
$I_{Li}$ and $I_{Hi}$ are constants for gate i obtained by SPICE simulation

MILP - Constraints
- Circuit delay constraint for each PO i: T_max can be the delay of the critical path or the clock period specified by the circuit designer.
- Glitch suppression constraints for each gate i: constraints (1), (2) and (3) make sure that $T_i - t_i < d_i$ for each gate, so glitches are eliminated.
T_i is the latest signal arrival time at the output of gate i.
t_i is the earliest signal arrival time at the output of gate i.

Power-Delay Tradeoff Example: 14-Gate Full Adder (Unoptimized, T_max = T_c) (figure: full adder with inputs A, B, C and outputs S, C0; all gates are low V_th; critical path marked)
I_leak = 161 pA

August 9, 2006Agrawal: VDAT'06 Tutorial II83 Power-Delay Tradeoff Example 14-Gate Full Adder (Optimized, T max = T c ) A B C S C0 Low V th High V th Delay buffer (high V th ) Critical path I leak = 73 pA

August 9, 2006Agrawal: VDAT'06 Tutorial II84 Power-Delay Tradeoff Example 14-Gate Full Adder (Optimized, T max = 1.25T c ) A B C S C0 Low V th High V th Delay buffer (high V th ) Critical path I leak = 16 pA

Leakage Reduction and Performance Tradeoff @ 27℃, 70nm
Columns: Circuit | # gates | Critical path delay T_c (ns) | Unoptimized I_leak (μA) | Optimized I_leak (μA), T_max = T_c | Leakage reduction | Sun OS 5.7 CPU secs. | Optimized I_leak (μA), T_max = 1.25T_c | Leakage reduction | Sun OS 5.7 CPU secs.
C432 | 160 | 0.751 | 2.620 | 1.022 | 61.0% | 0.42 | 0.132 | 95.0% | 0.3
C499 | 182 | 0.391 | 4.293 | 3.464 | 19.3% | 0.08 | 0.225 | 94.8% | 1.8
C880 | 328 | 0.672 | 4.406 | 0.524 | 88.1% | 0.24 | 0.153 | 96.5% | 0.3
C1355 | 214 | 0.403 | 4.388 | 3.290 | 25.0% | 0.1 | 0.294 | 93.3% | 2.1
C1908 | 319 | 0.573 | 6.023 | 2.023 | 66.4% | 59 | 0.204 | 96.6% | 1.3
C2670 | 362 | 1.263 | 5.925 | 0.659 | 90.4% | 0.38 | 0.125 | 97.9% | 0.16
C3540 | 1097 | 1.748 | 15.622 | 0.972 | 93.8% | 3.9 | 0.319 | 98.0% | 0.74
C5315 | 1165 | 1.589 | 19.332 | 2.505 | 87.1% | 140 | 0.395 | 98.0% | 0.71
C6288 | 1189 | 2.177 | 23.142 | 6.075 | 73.8% | 277 | 0.678 | 97.1% | 7.48
C7552 | 1046 | 1.915 | 22.043 | 0.872 | 96.0% | 1.1 | 0.445 | 98.0% | 0.58

Leakage, Dynamic and Total Power Comparison @ 90℃, 70nm
Columns: Circuit | # gates | P_leak1 (uW) | P_leak2 (uW) | Leakage reduction | P_dyn1 (uW) | P_dyn2 (uW) | Dynamic reduction | P_total1 (uW) | P_total2 (uW) | Total reduction
C432 | 160 | 35.77 | 11.87 | 66.8% | 101.0 | 73.3 | 27.4% | 136.8 | 85.2 | 37.7%
C499 | 182 | 50.36 | 39.94 | 20.7% | 225.7 | 160.3 | 29.0% | 276.1 | 200.2 | 27.5%
C880 | 328 | 85.21 | 11.05 | 87.0% | 177.3 | 128.0 | 27.8% | 262.5 | 139.1 | 47.0%
C1355 | 214 | 54.12 | 39.96 | 26.3% | 293.3 | 165.7 | 43.5% | 347.4 | 205.7 | 40.8%
C1908 | 319 | 92.17 | 29.69 | 67.8% | 254.9 | 197.7 | 22.4% | 347.1 | 227.4 | 34.5%
C2670 | 362 | 115.4 | 11.32 | 90.2% | 128.6 | 100.8 | 21.6% | 244.0 | 112.1 | 54.1%
C3540 | 1097 | 302.8 | 17.98 | 94.1% | 333.2 | 228.1 | 31.5% | 636.0 | 246.1 | 61.3%
C5315 | 1165 | 421.1 | 49.79 | 88.2% | 465.5 | 304.3 | 34.6% | 886.6 | 354.1 | 60.1%
C6288 | 1189 | 388.5 | 97.17 | 75.0% | 1691.2 | 405.6 | 76.0% | 2079.7 | 502.8 | 75.8%
C7552 | 1046 | 444.4 | 18.75 | 95.8% | 380.9 | 227.8 | 40.2% | 825.3 | 246.6 | 70.1%
Subscript 1: unoptimized circuits; subscript 2: optimized circuits.

August 9, 2006Agrawal: VDAT'06 Tutorial II87 Low-Power System Design State encoding –Bus encoding –Finite state machine Clock gating –Flip-flop –Shift register Microprocessors –Single processor –Multi-core processor

Bus Encoding
Example: Four bit bus
0000→1110 has three transitions.
If bits of the second pattern are inverted, then 0000→0001 will have only one transition.
Bit-inversion encoding for N-bit bus (plot: number of bit transitions after inversion encoding versus number of bit transitions before encoding, 0 to N; the encoded count never exceeds N/2)

August 9, 2006Agrawal: VDAT'06 Tutorial II89 Bus-Inversion Encoding Logic Polarity decision logic Sent data Received data Bus register Polarity bit M. Stan and W. Burleson, “Bus-Invert Coding for Low Power I/O,” IEEE Trans. VLSI Systems, vol. 3, no. 1, pp. 49-58, March 1995.
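The polarity decision can be sketched in a few lines. This is a simplified illustration rather than the circuit from the cited paper: it assumes the bus starts at all zeros, compares only the data lines (not the polarity line itself), and uses hypothetical function names.

```python
def bus_invert_encode(words, width):
    """Encode words for a bus of the given width.

    Each word is sent inverted, with polarity bit 1, when sending it
    unchanged would toggle more than half of the bus lines.
    Returns a list of (value_on_bus, polarity_bit) pairs.
    """
    mask = (1 << width) - 1
    bus = 0                      # assumed initial bus state: all zeros
    encoded = []
    for word in words:
        toggles = bin((bus ^ word) & mask).count("1")
        if toggles > width / 2:
            value, polarity = (~word) & mask, 1
        else:
            value, polarity = word & mask, 0
        encoded.append((value, polarity))
        bus = value
    return encoded

def bus_invert_decode(encoded, width):
    """Recover the original words from (value, polarity) pairs."""
    mask = (1 << width) - 1
    return [(~value) & mask if polarity else value
            for value, polarity in encoded]

data = [0b1110, 0b0001, 0b1111, 0b0000]
sent = bus_invert_encode(data, width=4)
print([(format(v, "04b"), p) for v, p in sent])
```

The first word reproduces the slide's example: 1110 is sent as 0001 with the polarity bit set, so only one data line toggles.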

FSM State Encoding (figure: the same three-state transition graph under two encodings, with states coded 11, 01, 00 in the first and 01, 11, 00 in the second; edges carry transition probabilities 0.1, 0.4, 0.3, 0.6, 0.9, 0.6)
Expected number of state-bit transitions:
First encoding: 2(0.3+0.4) + 1(0.1+0.1) = 1.6
Second encoding: 1(0.3+0.4+0.1) + 2(0.1) = 1.0
Transition probability based on PI statistics
State encoding can be selected using a power-based cost function.
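The cost function is the sum, over all state transitions, of the transition probability times the Hamming distance between the two state codes. The sketch below uses a hypothetical edge list: the figure's exact topology is not recoverable from the text, so the edges were chosen only to be consistent with the probabilities and the two totals quoted on the slide. Self-loops are omitted because they toggle no bits.

```python
def expected_bit_transitions(edges, encoding):
    """Sum of probability * Hamming distance over all state transitions."""
    return sum(prob * bin(encoding[src] ^ encoding[dst]).count("1")
               for src, dst, prob in edges)

# Hypothetical reconstruction of the three-state graph (assumption).
edges = [
    ("S1", "S2", 0.4),
    ("S2", "S1", 0.3),
    ("S2", "S3", 0.1),
    ("S3", "S1", 0.1),
]
encoding_a = {"S1": 0b11, "S3": 0b01, "S2": 0b00}
encoding_b = {"S1": 0b01, "S3": 0b11, "S2": 0b00}

cost_a = expected_bit_transitions(edges, encoding_a)
cost_b = expected_bit_transitions(edges, encoding_b)
print(round(cost_a, 3), round(cost_b, 3))
```

The second assignment places the two most frequently exchanged states one bit apart, which is what lowers the expected switching.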

August 9, 2006Agrawal: VDAT'06 Tutorial II91 FSM: Clock-Gating Moore machine: Outputs depend only on the state variables. –If a state has a self-loop in the state transition graph (STG), then clock can be stopped whenever a self-loop is to be executed. Sj Si Sk Xi/Zk Xk/Zk Xj/Zk Clock can be stopped when (Xk, Sk) combination occurs.

August 9, 2006Agrawal: VDAT'06 Tutorial II92 Clock-Gating in Moore FSM Combinational logic Latch Clock activation logic Flip-flops PI CK PO L. Benini and G. De Micheli, Dynamic Power Management, Boston: Springer, 1998.

August 9, 2006Agrawal: VDAT'06 Tutorial II93 Clock-Gating in Low-Power Flip-Flop D Q D CK C. Piguet, “Circuit and Logic Level Design,” pages 103-133 in W. Nebel and J. Mermet (ed.), Low Power Design in Deep Submicron Electronics, Boston: Kluwer Academic Publishers, 1997.

August 9, 2006Agrawal: VDAT'06 Tutorial II94 Reduced-Power Shift Register D Q D CK(f/2) multiplexer Output Flip-flops are operated at full voltage and half the clock frequency.

August 9, 2006Agrawal: VDAT'06 Tutorial II95 Power Reduction in Processors Just about everything is used. Hardware methods: Voltage reduction for dynamic power Dual-threshold devices for leakage reduction Clock gating, frequency reduction Sleep mode Architecture: Instruction set hardware organization Software methods

SIA Roadmap for Processors (1999)
Year | 1999 | 2002 | 2005 | 2008 | 2011 | 2014
Feature size (nm) | 180 | 130 | 100 | 70 | 50 | 35
Logic transistors/cm² | 6.2M | 18M | 39M | 84M | 180M | 390M
Clock (GHz) | 1.25 | 2.1 | 3.5 | 6.0 | 10.0 | 16.9
Chip size (mm²) | 340 | 430 | 520 | 620 | 750 | 900
Power supply (V) | 1.8 | 1.5 | 1.2 | 0.9 | 0.6 | 0.5
High-perf. power (W) | 90 | 130 | 160 | 170 | 175 | 183
Source: http://www.semichips.org

Power Reduction Example
Alpha 21064: 200MHz @ 3.45V, power dissipation = 26W
Reduce voltage to 1.5V, power (5.3x) = 4.9W
Eliminate FP, power (3x) = 1.6W
Scale 0.75→0.35μ, power (2x) = 0.8W
Reduce clock load, power (1.3x) = 0.6W
Reduce frequency 200→160MHz, power (1.25x) = 0.5W
J. Montanaro et al., “A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor,” IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1703-1714, Nov. 1996.
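The chain of reductions is a sequence of divisions, with the first factor following from the quadratic dependence of power on supply voltage. A short sketch using the slide's numbers (the variable names are assumptions):

```python
steps = [
    ("Reduce voltage 3.45 V to 1.5 V", (3.45 / 1.5) ** 2),  # quadratic in VDD
    ("Eliminate FP", 3.0),
    ("Scale 0.75 um to 0.35 um", 2.0),
    ("Reduce clock load", 1.3),
    ("Reduce frequency 200 MHz to 160 MHz", 200.0 / 160.0),
]

power = 26.0                     # Alpha 21064 starting point, in watts
history = []
for label, factor in steps:
    power /= factor
    history.append((label, round(factor, 2), round(power, 2)))

for row in history:
    print(row)
```

The voltage step alone accounts for a factor of about 5.3, more than any other single change in the list.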

August 9, 2006Agrawal: VDAT'06 Tutorial II98 Low-Power Datapath Architecture Lower supply voltage –This slows down circuit speed –Use parallel computing to gain the speed back Works well when threshold voltage is also lowered. About 60% reduction in power obtainable. Reference: A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Boston: Kluwer Academic Publishers (Now Springer), 1995.

A Reference Datapath (figure: input and output registers around a block of combinational logic, clocked by CK; the switched capacitance is C_ref)
Supply voltage = V_ref
Total capacitance switched per cycle = C_ref
Clock frequency = f
Power consumption: $P_{ref} = C_{ref} V_{ref}^2 f$

A Parallel Architecture (figure: the input register, clocked at f, feeds N copies of the combinational logic, Copy 1 to Copy N, each clocked at f/N; an N-to-1 multiplexer, driven by a multiphase clock generator and mux control, delivers the output at rate f)
A copy processes every Nth input and operates at reduced voltage.
Supply voltage: $V_N \le V_1 = V_{ref}$
N = degree of parallelism

Control Signals, N = 4 (figure: timing diagram of CK and the four clock phases, Phase 1 to Phase 4)

Power
$P_N = P_{proc} + P_{overhead}$
$P_{proc} = N(C_{inreg} + C_{comb})V_N^2 f/N + C_{outreg} V_N^2 f = (C_{inreg} + C_{comb} + C_{outreg})V_N^2 f = C_{ref} V_N^2 f$
$P_{overhead} = C_{overhead} V_N^2 f \approx \delta\,C_{ref}(N - 1)V_N^2 f$
$P_N = [1 + \delta(N - 1)]\,C_{ref} V_N^2 f$
$\frac{P_N}{P_1} = [1 + \delta(N - 1)]\,\frac{V_N^2}{V_{ref}^2}$

Voltage vs. Speed
Delay of a gate, $T \approx \frac{C_L V_{ref}}{I} = \frac{C_L V_{ref}}{k\,(W/L)\,(V_{ref} - V_t)^2}$
where
I is saturation current
k is a technology parameter
W/L is width to length ratio of transistor
V_t is threshold voltage
(Plot: normalized gate delay T, 0 to 4.0, versus supply voltage for 1.2μ CMOS, marking V_ref = 5V for N = 1, V_2 = 2.9V for N = 2, and V_3 for N = 3.)
Voltage reduction slows down as we get closer to V_t.

Increasing Multiprocessing (plot: P_N/P_1, 0 to 1.0, versus N from 1 to 12, for 1.2μ CMOS with V_ref = 5V; curves for V_t = 0V (extreme case), V_t = 0.4V and V_t = 0.8V)

Extreme Cases: V_t = 0
Delay, $T \propto 1/V_{ref}$
For N processing elements, delay = NT → $V_N = V_{ref}/N$
$\frac{P_N}{P_1} = [1 + \delta(N - 1)]\,\frac{1}{N^2}$, which falls off as 1/N for large N
For negligible overhead, δ → 0: $\frac{P_N}{P_1} \approx \frac{1}{N^2}$
For V_t > 0, power reduction is less and there will be an optimum value of N.
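The existence of an optimum N for V_t > 0 can be explored numerically. The sketch below assumes the gate delay model $T \propto V/(V - V_t)^2$ from the voltage-speed slide with constants dropped, finds the reduced supply by bisection, and applies the power ratio formula; the function names and the overhead value δ = 0.1 are assumptions.

```python
def gate_delay(v, vt):
    """Relative gate delay, V / (V - Vt)^2 (constant factors dropped)."""
    return v / (v - vt) ** 2

def reduced_supply(n, v_ref, vt):
    """Lowest supply at which a gate is no more than n times slower than at v_ref."""
    target = n * gate_delay(v_ref, vt)
    lo, hi = vt + 1e-9, v_ref
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gate_delay(mid, vt) > target:
            lo = mid             # too slow at mid: need a higher voltage
        else:
            hi = mid             # fast enough: try a lower voltage
    return hi

def power_ratio(n, v_ref, vt, delta):
    """P_N / P_1 = [1 + delta (N - 1)] * (V_N / V_ref)^2."""
    v_n = reduced_supply(n, v_ref, vt)
    return (1.0 + delta * (n - 1)) * (v_n / v_ref) ** 2

# Assumed overhead factor delta = 0.1, V_ref = 5 V, V_t = 0.8 V.
ratios = {n: power_ratio(n, 5.0, 0.8, 0.1) for n in range(1, 13)}
best_n = min(ratios, key=ratios.get)
print(best_n, round(ratios[best_n], 3))
```

With V_t = 0 and δ = 0 the same functions reduce to the 1/N² case on the slide.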

Example: Multiplier Core
Specification:
- 200MHz clock
- 15W dissipation @ 5V
- Low voltage operation, V_DD ≥ 1.5 volts
- Relative clock rate $= \frac{(V_{DD} - 0.5)^2}{20.25}$
Problem:
- Integrate multiplier core on a SOC
- Power budget for multiplier ~ 5W

A Multicore Design (figure: the input, clocked at 200MHz, feeds Multiplier Core 1, Multiplier Core 2, ..., Multiplier Core 5, each clocked at 40MHz; a 5-to-1 mux, driven by a multiphase clock generator and mux control, feeds the output register clocked at 200MHz)
Core clock frequency = 200/N MHz; N should divide 200.

How Many Cores?
For N cores: clock frequency = 200/N MHz
Supply voltage, $V_{DDN} = 0.5 + (20.25/N)^{1/2}$ volts
Assuming 10% overhead per core,
Power dissipation $= 15\,[1 + 0.1(N - 1)]\left(\frac{V_{DDN}}{5}\right)^2$ watts

Design Tradeoffs
Number of cores N | Clock (MHz) | Core supply V_DDN (Volts) | Total power (Watts)
1 | 200 | 5.00 | 15.0
2 | 100 | 3.68 | 8.94
4 | 50 | 2.75 | 5.90
5 | 40 | 2.51 | 5.29
8 | 25 | 2.10 | 4.50
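The tradeoff table follows from the two expressions on the preceding slide. The sketch below recomputes it; the function names are assumptions, and because the slide rounds the supply voltage before squaring, the last digit of some computed powers may differ slightly from the table.

```python
import math

def core_supply(n):
    """Supply voltage (volts) at which each core runs at 200/n MHz."""
    return 0.5 + math.sqrt(20.25 / n)

def total_power(n, overhead=0.1, p_ref=15.0, v_ref=5.0):
    """Total power (watts) for n cores with the given per-core overhead."""
    return p_ref * (1.0 + overhead * (n - 1)) * (core_supply(n) / v_ref) ** 2

options = [1, 2, 4, 5, 8]            # divisors of 200 considered on the slide
for n in options:
    print(n, 200 // n, round(core_supply(n), 2), round(total_power(n), 2))

# Smallest core count among the options that meets the ~5 W budget.
within_budget = [n for n in options if total_power(n) <= 5.0]
print(within_budget[0] if within_budget else None)
```

Among these options only the eight-core design falls under the 5 W budget, and its supply voltage stays above the 1.5 V floor in the specification.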

Pipeline Architecture (figure: a single processor between input and output registers, clocked at f, compared with a two-stage pipeline of two half-processors separated by an extra register, also clocked at f)
Single processor: capacitance = C, voltage = V, frequency = f, power = $CV^2 f$
Two-stage pipeline: capacitance = 1.2C, voltage = 0.6V, frequency = f, power = $1.2C \times (0.6V)^2 f = 0.432\,CV^2 f$

Approximate Trend
Quantity | n-parallel proc. | n-stage pipeline proc.
Capacitance | nC | C
Voltage | V/n | V/n
Frequency | f/n | f
Power | $CV^2 f/n^2$ | $CV^2 f/n^2$
Chip area | n times | 10-20% increase
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.

Multicore Processors (plot: performance of multicore versus single-core processors, 2000 to 2008, based on SPECint2000 and SPECfp2000 benchmarks). Source: Computer, May 2005, p. 12

Multicore Processors
D. Geer, “Chip Makers Turn to Multicore Processors,” Computer, vol. 38, no. 5, pp. 11-13, May 2005.
A. Jerraya, H. Tenhunen and W. Wolf, “Multiprocessor Systems-on-Chips,” Computer, vol. 38, no. 7, pp. 36-40, July 2005; this special issue contains three more articles on multicore processors.
S. K. Moore, “Winner Multimedia Monster – Cell’s Nine Processors Make It a Supercomputer on a Chip,” IEEE Spectrum, vol. 43, no. 1, pp. 20-23, January 2006.

August 9, 2006Agrawal: VDAT'06 Tutorial II114 Cell - Cell Broadband Engine Architecture L to R Atsushi Kameyama, Toshiba James Kahle, IBM Masakazu Suzoki, Sony © IEEE Spectrum, January 2006 Nine-processor chip: 192 Gflops

August 9, 2006Agrawal: VDAT'06 Tutorial II115 Cell’s Nine-Processor Chip © IEEE Spectrum, January 2006 Eight Identical Processors f = 5.6GHz (max) 44.8 Gflops

August 9, 2006Agrawal: VDAT'06 Tutorial II116 Books on Low-Power Design (1) L. Benini and G. De Micheli, Dynamic Power Management Design Techniques and CAD Tools, Boston: Springer, 1998. T. D. Burd and R. A. Brodersen, Energy Efficient Microprocessor Design, Boston: Springer, 2002. A. Chandrakasan and R. Brodersen, Low-Power Digital CMOS Design, Boston: Springer, 1995. A. Chandrakasan and R. Brodersen, Low-Power CMOS Design, New York: IEEE Press, 1998. J.-M. Chang and M. Pedram, Power Optimization and Synthesis at Behavioral and System Levels using Formal Methods, Boston: Springer, 1999. M. S. Elrabaa, I. S. Abu-Khater and M. I. Elmasry, Advanced Low-Power Digital Circuit Techniques, Boston: Springer, 1997. R. Graybill and R. Melhem, Power Aware Computing, New York: Plenum Publishers, 2002. S. Iman and M. Pedram, Logic Synthesis for Low Power VLSI Designs, Boston: Springer, 1998. J. B. Kuo and J.-H. Lou, Low-Voltage CMOS VLSI Circuits, New York: Wiley- Interscience, 1999. J. Monteiro and S. Devadas, Computer-Aided Design Techniques for Low Power Sequential Logic Circuits, Boston: Springer, 1997. S. G. Narendra and A. Chandrakasan, Leakage in Nanometer CMOS Technologies, Boston: Springer, 2005. W. Nebel and J. Mermet, Low Power Design in Deep Submicron Electronics, Boston: Springer, 1997.

Books on Low-Power Design (2)
N. Nicolici and B. M. Al-Hashimi, Power-Constrained Testing of VLSI Circuits, Boston: Springer, 2003.
V. G. Oklobdzija, V. M. Stojanovic, D. M. Markovic and N. Nedovic, Digital System Clocking: High Performance and Low-Power Aspects, Wiley-IEEE, 2005.
M. Pedram and J. M. Rabaey, Power Aware Design Methodologies, Boston: Springer, 2002.
C. Piguet, Low-Power Electronics Design, Boca Raton, Florida: CRC Press, 2005.
J. M. Rabaey and M. Pedram, Low Power Design Methodologies, Boston: Springer, 1996.
S. Roundy, P. K. Wright and J. M. Rabaey, Energy Scavenging for Wireless Sensor Networks, Boston: Springer, 2003.
K. Roy and S. C. Prasad, Low-Power CMOS VLSI Circuit Design, New York: Wiley-Interscience, 2000.
E. Sánchez-Sinencio and A. G. Andreou, Low-Voltage/Low-Power Integrated Circuits and Systems – Low-Voltage Mixed-Signal Circuits, New York: IEEE Press, 1999.
W. A. Serdijn, Low-Voltage Low-Power Analog Integrated Circuits, Boston: Springer, 1995.
S. Sheng and R. W. Brodersen, Low-Power Wireless Communications: A Wideband CDMA System Design, Boston: Springer, 1998.
G. Verghese and J. M. Rabaey, Low-Energy FPGAs, Boston: Springer, 2001.
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Springer, 1998.
K.-S. Yeo and K. Roy, Low-Voltage Low-Power Subsystems, McGraw Hill, 2004.

August 9, 2006Agrawal: VDAT'06 Tutorial II118 Other Books Useful in Low-Power Design A. Chandrakasan, W. J. Bowhill and F. Fox, Design of High- Performance Microprocessor Circuits, New York: IEEE Press, 2001. N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Reading, Massachusetts, Addison-Wesley, 2005. S. M. Kang and Y. Leblebici, CMOS Digital Integrated Circuits, New York: McGraw-Hill, 1996. E. Larsson, Introduction to Advanced System-on-Chip Test Design and Optimization, Springer, 2005. J. M. Rabaey, A. Chandrakasan and B. Nikolić, Digital Integrated Circuits, Second Edition, Upper Saddle River, New Jersey: Prentice-Hall, 2003. J. Segura and C. F. Hawkins, CMOS Electronics, How It Works, How It Fails, New York: IEEE Press, 2004.
