Jan M. Rabaey, Low Power Design Essentials ©2008. Chapter 4: Optimizing Power @ Design Time, Circuits. Dejan Marković, Borivoje Nikolić.

1 Jan M. Rabaey, Low Power Design Essentials ©2008. Chapter 4: Optimizing Power @ Design Time, Circuits. Dejan Marković, Borivoje Nikolić

2 Chapter Outline
• Optimization framework for energy-delay trade-off
• Dynamic power optimization
  – Multiple supply voltages
  – Transistor sizing
  – Technology mapping
• Static power optimization
  – Multiple thresholds
  – Transistor stacking

3 Energy/Power Optimization Strategy
• For given function and activity, an optimal operation point can be derived in the energy-performance space
• Time of optimization depends upon activity profile
• Different optimizations apply to active and static power
Activity profile          Time of optimization (active and static power)
Fixed activity            Design time
Variable activity         Run time
No activity (standby)     Sleep

4 Energy-Delay Optimization and Trade-off
• Maximize throughput for given energy, or
• Minimize energy for given throughput
[Figure: energy/op versus delay, showing the unoptimized design and the trade-off space bounded by D_min, D_max, E_min, and E_max]
Other important metrics: area, reliability, reusability

5 The Design Abstraction Stack
A very rich set of design parameters to consider! It helps to consider options in relation to their abstraction layer.
• System/Application: choice of algorithm
• Software: amount of concurrency
• (Micro-)Architecture: parallel versus pipelined, general purpose versus application specific
• Logic/RT: logic family, standard cell versus custom
• Circuit (this chapter): sizing, supply, thresholds
• Device: bulk versus SOI

6 Optimization Can/Must Span Multiple Levels
• Levels: Architecture, Micro-Architecture, Circuit (Logic & FFs)
• Design optimization combines top-down and bottom-up: “meet-in-the-middle”

7 Energy-Delay Optimization
• Globally optimal energy-delay curve for a given function
[Figure: energy/op versus delay curves for topology A and topology B, and the globally optimal curve formed from them]

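8 Some Optimization Observations
Energy-Delay Sensitivities [Ref: V. Stojanovic, ESSCIRC’02]
$S_A = \left. \dfrac{\partial E / \partial A}{\partial D / \partial A} \right|_{A = A_0}$
[Figure: energy versus delay; the curves f(A_0, B) and f(A, B_0) pass through the design point (A_0, B_0) at delay D_0, with slopes S_B and S_A respectively]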

9 Finding the Optimal Energy-Delay Curve
• On the optimal curve, all sensitivities must be equal
• $\Delta E = S_A \cdot (-\Delta D) + S_B \cdot \Delta D$
• Pareto-optimal: the best that can be achieved without disadvantaging at least one metric.
[Figure: energy versus delay with curves f(A_0, B), f(A, B_0), and f(A_1, B); a step ∆D away from the design point (A_0, B_0) at delay D_0]
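The balance expressed by ∆E = S_A·(−∆D) + S_B·∆D can be checked numerically. The sketch below is only an illustration: the sensitivity values are hypothetical, and the function name `energy_change` is not from the slides.

```python
def energy_change(s_a, s_b, delta_d):
    """Net energy change when variable A speeds the design up by delta_d
    and variable B gives the same delay back (total delay unchanged)."""
    return s_a * (-delta_d) + s_b * delta_d


# Hypothetical sensitivities (energy per unit delay). Both are negative
# because accepting more delay saves energy.
unbalanced = energy_change(s_a=-1.0, s_b=-3.0, delta_d=0.1)
balanced = energy_change(s_a=-2.0, s_b=-2.0, delta_d=0.1)

print(unbalanced)  # negative: energy drops at constant delay
print(balanced)    # zero: nothing left to gain by trading A against B
```

As long as the two sensitivities differ, some exchange of delay between A and B lowers energy at the same total delay, which is why the optimal curve requires them to be equal.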

10 Reducing Active Energy @ Design Time
• Reducing voltages
  – Lowering the supply voltage (V_DD) at the expense of clock speed
  – Lowering the logic swing (V_swing)
• Reducing transistor sizes (C_L)
  – Slows down logic
• Reducing activity (α)
  – Reducing switching activity through transformations
  – Reducing glitching by balancing logic
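The three levers listed here correspond to the factors of the usual first-order expression for active energy per operation. The slide's own equation did not survive extraction, so the form below is an assumption built from the symbols the bullets name (α, C_L, V_swing, V_DD), with f denoting the clock frequency:

```latex
E_{\text{active}} \propto \alpha \, C_L \, V_{\text{swing}} \, V_{DD},
\qquad
P_{\text{active}} \propto \alpha \, C_L \, V_{\text{swing}} \, V_{DD} \, f
```

For full-swing logic, V_swing = V_DD, which gives the familiar α·C_L·V_DD² dependence.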

11 Observation
• Downsizing and/or lowering the supply on the critical path lowers the operating frequency
• Downsizing non-critical paths reduces energy for free, but
  – Narrows down the path delay distribution
  – Increases impact of variations, impacts robustness
[Figures: two histograms of # of paths versus t_p (path), each with the target delay marked]

12 Circuit Optimization Framework [Ref: V. Stojanovic, ESSCIRC’02]
minimize Energy(V_DD, V_TH, W)
subject to Delay(V_DD, V_TH, W) ≤ D_con
Constraints:
  V_DD^min < V_DD < V_DD^max
  V_TH^min < V_TH < V_TH^max
  W^min < W
• Reference case: D_min at V_DD^max, V_TH^ref
[Figure: energy/op versus delay curves for topology A and topology B]
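The constrained problem above can be sketched as a brute-force search over a grid. Everything in this example is an assumption made for illustration: a single gate of width `w` driving a fixed load, a toy alpha-power delay model, a C·V² energy model, and made-up parameter values. V_TH is held fixed to keep the sketch short.

```python
import itertools

# Hypothetical normalized model parameters (not taken from the slides).
V_ON = 0.35     # effective turn-on voltage, V
ALPHA = 1.5     # velocity-saturation index
C_LOAD = 16.0   # fixed load, in units of a minimum-gate input capacitance
GAMMA = 1.0     # parasitic-to-input capacitance ratio


def delay(vdd, w):
    """Alpha-power style delay of one gate of width w driving C_LOAD."""
    return vdd / (vdd - V_ON) ** ALPHA * (C_LOAD / w + GAMMA)


def energy(vdd, w):
    """Switching energy of the gate's own capacitance plus the load."""
    return (w * (1.0 + GAMMA) + C_LOAD) * vdd ** 2


def minimize_energy(d_con, vdd_grid, w_grid):
    """Return (energy, vdd, w) of the cheapest grid point with delay <= d_con."""
    feasible = [(energy(v, w), v, w)
                for v, w in itertools.product(vdd_grid, w_grid)
                if delay(v, w) <= d_con]
    return min(feasible) if feasible else None


# Reference design point, then allow a 20% delay increase.
reference = (1.2, 8)
d_con = 1.2 * delay(*reference)
vdd_grid = [round(0.6 + 0.05 * i, 2) for i in range(13)]  # 0.60 V .. 1.20 V
w_grid = range(1, 17)

best = minimize_energy(d_con, vdd_grid, w_grid)
print("reference energy:", energy(*reference))
print("best (energy, vdd, w):", best)
```

A real flow would use a convex or gradient-based solver over every gate in the netlist; exhaustive search is used here only because it makes the objective and the delay constraint explicit.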

13 Optimization Framework: Generic Network
• Gate in stage i loaded by fanout (stage i+1)
[Figure: stage i (input capacitance C_i, supply V_DD,i) drives wire capacitance C_w and stage i+1 (input capacitance C_i+1, supply V_DD,i+1)]

14 Alpha-power based Delay Model
• Fit parameters: V_on, α_d, K_d, γ
• V_DD^ref = 1.2 V, technology 90 nm
[Delay equation and fitted curve not recovered]
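Since the delay equation itself was lost, the sketch below assumes a common alpha-power form, t_p = K_d·V_DD / (V_DD − V_on)^α_d · (fanout + γ), purely to show how the four fit parameters interact. The numeric values are hypothetical placeholders, not the fitted 90 nm values from the slide.

```python
# Hypothetical fit parameters; the slide's fitted values were not recovered.
V_ON = 0.37      # effective turn-on voltage, V
ALPHA_D = 1.53   # velocity-saturation index
K_D = 1.0        # delay-scaling constant (sets the time unit)
GAMMA = 1.0      # parasitic-to-input capacitance ratio


def tau_nom(vdd):
    """Voltage-dependent part of the assumed alpha-power delay."""
    if vdd <= V_ON:
        raise ValueError("V_DD must exceed V_on in this model")
    return K_D * vdd / (vdd - V_ON) ** ALPHA_D


def gate_delay(vdd, fanout):
    """Delay of a gate with the given electrical fanout plus its parasitic."""
    return tau_nom(vdd) * (fanout + GAMMA)


V_DD_REF = 1.2
for vdd in (1.2, 1.0, 0.8, 0.6):
    ratio = gate_delay(vdd, fanout=4) / gate_delay(V_DD_REF, fanout=4)
    print(f"V_DD = {vdd:.1f} V -> delay relative to reference: {ratio:.2f}")
```

The model is only meaningful for V_DD above V_on, which is why the function rejects lower supplies instead of returning a number.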

15 Combined with Logical Effort Formulation [Ref: I. Sutherland, Morgan-Kaufman’99]
For complex gates:
• Parasitic delay p_i: depends upon gate topology
• Electrical effort f_i ≈ S_i+1 / S_i
• Logical effort g_i: depends upon gate topology
• Effective fanout h_i = f_i g_i
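The quantities defined on this slide can be combined into a path delay as sketched below. The delay is expressed in units of the technology time constant, each stage is assumed to contribute p_i + h_i, and the (g, p) values are the textbook logical-effort numbers for static CMOS gates. The path, the sizes, and the function name `path_delay` are hypothetical.

```python
# (logical effort g, parasitic delay p) for a few static CMOS gates,
# using the usual textbook values.
GATES = {
    "inv": (1.0, 1.0),
    "nand2": (4.0 / 3.0, 2.0),
    "nor2": (5.0 / 3.0, 2.0),
}


def path_delay(gate_types, sizes, load_size):
    """Sum of p_i + g_i * f_i along a path, with f_i = S_(i+1) / S_i.

    sizes[i] is the size of stage i; load_size plays the role of the
    stage after the last one.
    """
    next_sizes = list(sizes[1:]) + [load_size]
    total = 0.0
    for gate, s_i, s_next in zip(gate_types, sizes, next_sizes):
        g, p = GATES[gate]
        f = s_next / s_i          # electrical effort of this stage
        total += p + g * f        # parasitic delay plus effective fanout
    return total


# Hypothetical three-stage path in which each stage is 3x larger than the last.
d = path_delay(["inv", "nand2", "inv"], [1, 3, 9], load_size=27)
print(d)
```

With equal electrical effort per stage, the NAND2 stage is the slowest because both its logical effort and its parasitic delay exceed those of an inverter.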

16 Dynamic Energy
[Equation not recovered] = energy consumed by logic gate i
[Figure: stage i (C_i, V_DD,i) driving wire capacitance C_w and stage i+1 (C_i+1, V_DD,i+1)]

17 Optimizing Return on Investment (ROI)
Depends on sensitivity (∂E/∂D)
• Gate sizing: ∞ for equal h (D_min)
• Supply voltage: max at V_DD(max) (D_min)
[Sensitivity equations not recovered]

18 Example: Inverter Chain
• Properties of inverter chain
  – Single path topology
  – Energy increases geometrically from input to output
• Goal
  – Find optimal sizing S = [S_1, S_2, …, S_N], supply voltage, and buffering strategy to achieve the best energy-delay tradeoff
[Figure: chain of N inverters with sizes S_1 = 1, S_2, S_3, …, S_N]
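As a starting point for the sizing problem, the minimum-delay solution of an inverter chain uses the same effective fanout in every stage, which makes the sizes (and therefore the switched capacitance) grow geometrically. The sketch below assumes that classic result; the function names and the load value are hypothetical.

```python
def min_delay_sizing(total_fanout, n_stages):
    """Sizes S_1..S_N of a chain with S_1 = 1 driving a load of total_fanout,
    using the same effective fanout h in every stage."""
    h = total_fanout ** (1.0 / n_stages)
    return [h ** i for i in range(n_stages)]


def relative_switched_capacitance(sizes, load):
    """Proxy for switching energy at a fixed supply: gate sizes plus the load."""
    return sum(sizes) + load


sizes = min_delay_sizing(total_fanout=64, n_stages=3)
print(sizes)  # approximately [1, 4, 16]
print(relative_switched_capacitance(sizes, load=64))
```

Because the last stages dominate the total, backing off from minimum delay pays off most where the sizes are largest.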

19 Inverter Chain: Gate Sizing [Ref: Ma, JSSC’94]
• Variable taper achieves minimum energy
• Reduce number of stages at large d_inc
[Figure: effective fanout h versus stage, for d_inc = 0%, 1%, 10%, 30%, 50%]

20 Inverter Chain: V_DD Optimization
• V_DD reduces energy of the final load first
• Variable taper achieved by voltage scaling
[Figure: V_DD / V_DD^nom versus stage, for d_inc = 0%, 1%, 10%, 30%, 50%]

21 Inverter Chain: Optimization Results
• Parameter with the largest sensitivity has the largest potential for energy reduction
• Two discrete supplies mimic per-stage V_DD
[Figures: energy reduction (%) versus d_inc (%), and normalized sensitivity versus d_inc (%), for cV_DD, S, gV_DD, and 2V_DD]

22 Example: Kogge-Stone Tree Adder [Ref: P. Kogge, Trans. Comp’73]
• Tree adder
  – Long wires
  – Re-convergent paths
  – Multiple active outputs

23 Tree Adder: Sizing vs. Dual-V_DD Optimization
• Reference design: all paths are critical
• Internal energy ⇒ S more effective than V_DD
  – S: E (-54%), 2V_DD: E (-27%) at d_inc = 10%
[Figure panels: reference (D = D_min); sizing: E (-54%), d_inc = 10%; 2V_DD: E (-27%), d_inc = 10%]

24 Tree Adder: Multi-dimensional Search
• Can get pretty close to optimum with only 2 variables
• Reaching the minimum delay is very expensive in energy
[Figure: Energy / E_ref versus Delay / D_min for the reference design and for optimization over (S, V_DD), (V_DD, V_TH), (S, V_TH), and (S, V_DD, V_TH)]

25 Multiple Supply Voltages
• Block-level supply assignment
  – Higher throughput/lower latency functions are implemented in higher V_DD
  – Slower functions are implemented with lower V_DD
  – This leads to so-called “voltage islands” with separate supply grids
  – Level conversion performed at block boundaries
• Multiple supplies inside a block
  – Non-critical paths moved to lower supply voltage
  – Level conversion within the block
  – Physical design challenging

26 Using Three V_DD’s [Ref: T. Kuroda, ICCAD’02] © IEEE 2002
• V_1 = 1.5 V, V_TH = 0.3 V
[Figure: power reduction ratio as a function of V_2 (V) and V_3 (V)]

27 Optimum Number of V_DD’s [Ref: M. Hamada, CICC’01] © IEEE 2001
• The more V_DD’s the less power, but the effect saturates
• Power reduction effect decreases with scaling of V_DD
• Optimum V_2/V_1 is around 0.7
[Figure: V_DD ratios (V_2/V_1, V_3/V_1, V_4/V_1) and power ratios (P_2/P_1, P_3/P_1, P_4/P_1) versus V_1 (V), for the supply sets {V_1, V_2}, {V_1, V_2, V_3}, and {V_1, V_2, V_3, V_4}]

28 Lessons: Multiple Supply Voltages
• Two supply voltages per block are optimal
• Optimal ratio between the supply voltages is 0.7
• Level conversion is performed on the voltage boundary, using a level-converting flip-flop (LCFF)
• An option is to use an asynchronous level converter
  – More sensitive to coupling and supply noise
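To get a feel for what a 0.7 supply ratio buys, the sketch below estimates the dynamic power of a dual-supply block relative to a single-supply one. It assumes full-swing switching power proportional to C·V² at a fixed clock rate and ignores level-converter overhead and leakage. The fraction of capacitance that can move to the lower supply is a hypothetical input, not a figure from the slides.

```python
def dual_supply_power_ratio(v_ratio, low_vdd_fraction):
    """Dynamic power of a dual-V_DD block relative to all-V_DDH.

    v_ratio          -- V_DDL / V_DDH (the slides suggest about 0.7)
    low_vdd_fraction -- share of switched capacitance moved to V_DDL
    """
    return (1.0 - low_vdd_fraction) + low_vdd_fraction * v_ratio ** 2


# Hypothetical case: half of the switched capacitance sits on non-critical paths.
print(dual_supply_power_ratio(0.7, 0.5))
```

The saving is bounded by how much of the design has timing slack, which is one reason a third or fourth supply adds little.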

29 Distributing Multiple Supply Voltages
[Figure: two layout styles for a V_DDH circuit next to a V_DDL circuit, each with V_DDH, V_DDL, and V_SS rails and signals i1, o1, i2, o2: conventional, and shared N-well]

30 Conventional
[Figure: V_DDH circuit and V_DDL circuit with V_DDH, V_DDL, and V_SS rails and N-well isolation. (a) Dedicated row: V_DDH rows and V_DDL rows. (b) Dedicated region: V_DDH region and V_DDL region]

31 Shared N-Well [Shimazaki et al, ISSCC’03]
[Figure: V_DDH circuit and V_DDL circuit sharing one N-well, with V_DDH, V_DDL, and V_SS rails. (a) Floor plan image showing V_DDL and V_DDH circuits]

32 Example: Multiple Supplies in a Block
• “Clustered voltage scaling” (CVS) [Ref: M. Takahashi, ISSCC’98] © IEEE 1998
• Lower V_DD portion is shared
[Figure: conventional design versus CVS structure, showing the flip-flops (FF), the critical path, and the level-shifting F/F]

33 Level Converting Flip-Flops (LCFFs) [Ref: F. Ishihara, ISLPED’03] © IEEE 2003
Pulsed Half-Latch versus Master-Slave LCFFs
• Smaller # of MOSFETs / clock loading
• Faster level conversion using half-latch structure
• Shorter D-Q path from pulsed circuit
[Figure: master-slave and pulsed half-latch LCFF schematics]

34 Dynamic Realization of Pulsed LCFF [Ref: F. Ishihara, ISLPED’03] © IEEE 2003
• Pulsed precharge LCFF (PPR)
  – Fast level conversion by precharge mechanism
  – Suppressed charge/discharge toggle by conditional capture
  – Short D-Q path
[Figure: pulsed precharge latch schematic]

35 Case Study: ALU for 64-bit μProcessor [Ref: Y. Shimazaki, ISSCC’03] © IEEE 2003
[Figure: ALU block diagram. Blocks: clock gen., carry gen., gp gen., partial sum, logical unit, sum sel., two 9:1 MUXes, 5:1 MUX, 2:1 MUX, INV1, INV2. Signals: clk, ain, ain0, bin, carry, s0/s1, sum, sumb (long loop-back bus, 0.5 pF). The legend marks each block as a V_DDH circuit or a V_DDL circuit]

36 Low-Swing Bus and Level Converter [Ref: Y. Shimazaki, ISSCC’03] © IEEE 2003
• INV2 is placed near 9:1 MUX to increase noise immunity
• Level conversion is done by a domino 9:1 MUX
[Figure: INV1 and INV2 on the sum/sumb bus between the V_DDH and V_DDL domains, feeding a domino level converter (9:1 MUX) with inputs ain0 and sel (V_DDH), precharge signal pc, and a keeper]

37 Measured Results: Energy and Delay [Ref: Y. Shimazaki, ISSCC’03] © IEEE 2003
• V_DDL = 1.4 V: energy -25.3%, delay +2.8%
• V_DDL = 1.2 V: energy -33.3%, delay +8.3%
[Figure: energy (pJ) versus T_CYCLE (ns) at room temperature, single-supply versus shared well (V_DDH = 1.8 V); a frequency annotation in GHz lost its value in extraction]
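One way to compare the two measured operating points is the energy-delay product relative to the single-supply design. The slides report only the energy and delay changes; using the product as a figure of merit is a choice made for this sketch, and the function name is hypothetical.

```python
def relative_edp(energy_change_pct, delay_change_pct):
    """Energy-delay product relative to the baseline (1.0 = unchanged)."""
    return (1 + energy_change_pct / 100) * (1 + delay_change_pct / 100)


# Measured changes quoted on the slide, relative to the single-supply design.
edp_1v4 = relative_edp(-25.3, 2.8)   # V_DDL = 1.4 V
edp_1v2 = relative_edp(-33.3, 8.3)   # V_DDL = 1.2 V
print(f"V_DDL = 1.4 V: {edp_1v4:.3f}")
print(f"V_DDL = 1.2 V: {edp_1v2:.3f}")
```

Both points improve the product, and the lower V_DDL still comes out ahead despite its larger delay penalty.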

38 Practical Transistor Sizing
• Continuous sizing of transistors only an option in custom design
• In ASIC design flows, options set by available library
• Discrete sizing options made possible in standard-cell design methodology by providing multiple options for the same cell
  – Leads to larger libraries (> 800 cells)
  – Easily integrated into technology mapping

39 Technology Mapping
• Larger gates reduce capacitance, but are slower
[Figure: gate network with inputs a, b, c, internal node d, and output f; one path is annotated slack = 1]

40 Technology Mapping
Example: 4-input AND
• (a) Implemented using 4-input NAND + INV
• (b) Implemented using 2-input NAND + 2-input NOR
[Table: gate type (INV, NAND, NAND, NOR) versus area (cell unit), input cap. (fF), and average delay (ps) as a function of C_L, for Library 1 (high-speed) and Library 2 (low-power); delay formula with C_L in fF; numbers calibrated for 90 nm. The fan-in digits and numeric entries were not recovered]

41 Technology Mapping – Example
4-input AND: (a) NAND4 + INV versus (b) NAND2 + NOR2
[Table: area is 8 for (a) and 11 for (b); HS delay (ps), LP delay (ps), and Sw energy (fF) are given as functions of C_L, but their coefficients were not recovered]
• Area
  – 4-input more compact than 2-input (2 gates vs. 3 gates)
• Timing
  – Both implementations are 2-stage realizations
  – 2nd stage INV (a) is better driver than NOR2 (b)
  – For more complex blocks, simpler gates will show better performance
• Energy
  – Internal switching increases energy in the 2-input case
  – Low-power library has worse delay, but lower leakage (see later)

42 Gate-Level Tradeoffs for Power
• Technology mapping
  – Gate selection
  – Sizing
  – Pin assignment
• Logical optimizations
  – Factoring
  – Restructuring
  – Buffer insertion/deletion
  – Don’t care optimization

43 Logic Restructuring
• Logic restructuring to minimize spurious transitions
• Buffer insertion for path balancing

44 Algebraic Transformations
• Idea: Modify network to reduce capacitance
• Caveat: This may increase activity!
• Input probabilities: p_a = 0.1; p_b = 0.5; p_c = 0.5
[Figure: two implementations of f from inputs a, b, c, with internal node probabilities p_1 = 0.05, p_2 = 0.05, p_3 = 0.075 in one network and p_4 = 0.75, p_5 = 0.075 in the other]
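The node probabilities quoted on this slide are consistent with f = a·b + a·c in one network and the factored form f = a·(b + c) in the other. That reading of the lost figure is an assumption. The sketch below reproduces the numbers by enumerating all input combinations, assuming independent inputs, and uses p·(1 − p) as a simple estimate of the 0→1 transition probability under temporal independence.

```python
from itertools import product

P_ONE = {"a": 0.1, "b": 0.5, "c": 0.5}   # probability that each input is 1


def signal_probability(fn):
    """Probability that fn(a, b, c) is 1, assuming independent inputs."""
    total = 0.0
    for a, b, c in product((0, 1), repeat=3):
        weight = 1.0
        for name, bit in zip("abc", (a, b, c)):
            weight *= P_ONE[name] if bit else 1.0 - P_ONE[name]
        if fn(a, b, c):
            total += weight
    return total


def transition_probability(p):
    """0->1 transition probability of a node with signal probability p."""
    return p * (1.0 - p)


# Assumed structure: f = a*b + a*c versus the factored form f = a*(b + c).
p1 = signal_probability(lambda a, b, c: a and b)
p2 = signal_probability(lambda a, b, c: a and c)
p3 = signal_probability(lambda a, b, c: (a and b) or (a and c))
p4 = signal_probability(lambda a, b, c: b or c)
p5 = signal_probability(lambda a, b, c: a and (b or c))

activity_sum_of_products = sum(map(transition_probability, (p1, p2, p3)))
activity_factored = sum(map(transition_probability, (p4, p5)))
print(p1, p2, p3, p4, p5)
print(activity_sum_of_products, activity_factored)
```

The factored form has fewer gates, yet its internal OR node toggles far more often than any node in the sum-of-products form, which is exactly the caveat the slide raises.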

45 Lessons from Circuit Optimization
• Joint optimization over multiple design parameters possible using sensitivity-based optimization framework
  – Equal marginal costs ⇔ Energy-efficient design
• Peak performance is VERY power inefficient
  – About 70% energy reduction for 20% delay penalty
  – Additional variables for higher energy-efficiency
• Two supply voltages in general sufficient; 3 or more supply voltages only offer small advantage
• Choice between sizing and supply voltage parameters depends upon circuit topology
• But … leakage not considered so far

46 Considering Leakage @ Design Time
• Considering leakage as well as dynamic power is essential in sub-100 nm technologies
• Leakage is not necessarily a bad thing
  – Increased leakage leads to improved performance, allowing for lower supply voltages
  – Again a trade-off issue …

47 Leakage – Not Necessarily a Bad Thing [Ref: D. Markovic, JSSC’04] © IEEE 2004
• Optimal designs have high leakage (E_Lk/E_Sw ≈ 0.5)
• Must adapt to process and activity variations
[Table: (E_Lk/E_Sw)_opt for the topologies Inv, Add, and Dec; values not recovered]
[Figure: E_norm versus E_static/E_dynamic. Version 1: V_th^ref - 180 mV, 0.81 V_DD^max. Version 2: V_th^ref - 140 mV, 0.52 V_DD^max]

48 Refining the Optimization Model
• Switching energy [equation not recovered]
• Leakage energy [equation not recovered]
  with I_0(Ψ): normalized leakage current with inputs in state Ψ

49 Reducing Leakage @ Design Time
• Using longer transistors
  – Limited benefit
  – Increase in active current
• Using higher thresholds
  – Channel doping
  – Stacked devices
  – Body biasing
• Reducing the voltage!!

50 Longer Channels
• 10% longer gates reduce leakage by 50%
• Increases switching power by 18% with W/L = const.
• Doubling L reduces leakage by 5x
• Impacts performance
  – Attractive when W does not have to increase (e.g. memory)
[Figure: normalized switching energy and normalized leakage power versus transistor length (nm)]

51 Using Multiple Thresholds
• There is no need for level conversion
• Dual thresholds can be added to standard design flows
  – High-V_Th and low-V_Th libraries are standard in sub-0.18 µm processes
  – For example: synthesize using only high-V_Th cells, then swap in low-V_Th cells in place to improve timing
  – Second V_Th insertion can be combined with resizing
• Only two thresholds are needed per block
  – Using more than two yields small improvements

52 Three V_TH’s [Ref: T. Kuroda, ICCAD’02] © IEEE 2002
• V_DD = 1.5 V, V_TH.1 = 0.3 V
• Impact of third threshold very limited
[Figure: leakage reduction ratio as a function of V_TH.2 (V) and V_TH.3 (V)]

53 Using Multiple Thresholds [Ref: S. Date, SLPE’94]
• Cell-by-cell V_TH assignment (not at block level)
• Achieves all-low-V_TH performance with substantial reduction in leakage
[Figure: logic between flip-flops (FF) with cells marked low V_TH or high V_TH]

54 Dual-V_T Domino
• Low-threshold transistors used only in critical paths
[Figure: domino stages D_n and D_n+1 with clocks Clk_n and Clk_n+1, transistor P_1, and inverters Inv 1, Inv 2, Inv 3; shaded transistors are low threshold]

55 Multiple Thresholds and Design Methodology
• Easily introduced in standard cell design methodology by extending cell libraries with cells with different thresholds
  – Selection of cells during technology mapping
  – No impact on dynamic power
  – No interface issues (as was the case with multiple V_DD’s)
• Impact: Can reduce leakage power substantially

56 Dual-V_TH Design for High-Performance Design
                 High-V_TH Only   Low-V_TH Only   Dual V_TH
Total Slack      -53 psec         0 psec
Dynamic Power    3.2 mW           3.3 mW          3.2 mW
Static Power     914 nW           3873 nW         1519 nW
All designs synthesized automatically using Synopsys flows [Courtesy: Synopsys, Toshiba, 2004]
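The static-power row of this table can be turned into leakage savings relative to the all-low-V_TH design, as in the short sketch below. Only the numbers come from the slide; the dictionary keys and the function name are hypothetical.

```python
# Static power from the table above, in nW.
STATIC_NW = {"high_vth": 914, "low_vth": 3873, "dual_vth": 1519}


def leakage_saving_pct(design, baseline="low_vth"):
    """Static-power saving of `design` relative to `baseline`, in percent."""
    return 100.0 * (1.0 - STATIC_NW[design] / STATIC_NW[baseline])


print(f"dual-V_TH saves {leakage_saving_pct('dual_vth'):.1f}% of the static power")
print(f"high-V_TH saves {leakage_saving_pct('high_vth'):.1f}% but has negative slack")
```

The dual-V_TH design recovers most of the leakage saving of the all-high-V_TH design, which according to the table misses timing by 53 psec.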

57 Example: High- vs. Low-Threshold Libraries [Courtesy: Synopsys 2004]
[Figure: leakage power (nW) for selected combinational tests, 130 nm CMOS]

58 Complex Gates Increase I_on/I_off Ratio
• I_on and I_off of single NMOS versus stack of 10 NMOS transistors
• Transistors in stack are sized up to give similar drive
[Figures (90 nm technology): I_off (nA) versus V_DD (V) and I_on (µA) versus V_DD (V), each for stack and no stack]

59 Complex Gates Increase I_on/I_off Ratio
• Stacking transistors suppresses submicron effects
  – Reduced velocity saturation
  – Reduced DIBL effect
• Allows for operation at lower thresholds
[Figure (90 nm technology): I_on/I_off ratio (×10^5) versus V_DD (V) for stack and no stack; the annotated gap is a factor of 10]

60 Complex Gates Increase I_on/I_off Ratio
• Example: 4-input NAND, fan-in (2) versus fan-in (4)
• With transistors sized for similar performance: leakage of fan-in (2) = 3 × leakage of fan-in (4) (averaged over all possible input patterns)

61 Example: 32 bit Kogge-Stone Adder [Ref: S. Narendra, ISLPED’01] © Springer 2001
• Reducing the threshold by 150 mV increases leakage of single NMOS transistor by factor 60
[Figure: % of input vectors versus standby leakage current (µA); annotation: factor 18]
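The "factor 60 for 150 mV" figure can be related to a subthreshold swing if the leakage is assumed to follow the usual exponential model I ∝ 10^(−V_TH / S). That model and the function names are assumptions made for this sketch, not something stated on the slide.

```python
import math


def implied_subthreshold_swing(delta_vth_mv, factor):
    """Swing S (mV/decade) implied by a leakage change of `factor`
    for a threshold shift of delta_vth_mv."""
    return delta_vth_mv / math.log10(factor)


def leakage_factor(delta_vth_mv, swing_mv_per_decade):
    """Leakage increase for a threshold reduction of delta_vth_mv."""
    return 10.0 ** (delta_vth_mv / swing_mv_per_decade)


swing = implied_subthreshold_swing(150, 60)
print(f"implied swing: {swing:.1f} mV/decade")
```

Under this model the quoted numbers correspond to a swing in the mid-80s of mV per decade. The smaller factor annotated for the full adder (18) is in line with the stacking effect described on the preceding slides.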

62 Summary
• Circuit optimization can lead to substantial energy reduction at limited performance loss
• Energy-delay plots are the perfect mechanism for analyzing energy-delay trade-offs
• Well-defined optimization problem over W, V_DD and V_TH parameters
• Increasingly better support by today’s CAD flows
• Observe: leakage is not necessarily bad, if appropriately managed

63 References
Books:
• A. Bellaouar, M.I. Elmasry, Low-Power Digital VLSI Design Circuits and Systems, Kluwer Academic Publishers, 1st Ed,
• D. Chinnery, K. Keutzer, Closing the Gap Between ASIC and Custom, Springer,
• D. Chinnery, K. Keutzer, Closing the Power Gap Between ASIC and Custom, Springer,
• J. Rabaey, A. Chandrakasan, B. Nikolic, Digital Integrated Circuits: A Design Perspective, 2nd ed, Prentice Hall
• I. Sutherland, B. Sproull, D. Harris, Logical Effort: Designing Fast CMOS Circuits, Morgan-Kaufmann, 1st Ed,
Articles:
• R.W. Brodersen, M.A. Horowitz, D. Markovic, B. Nikolic, V. Stojanovic, “Methods for True Power Minimization,” Int. Conf. on Computer-Aided Design (ICCAD), pp , Nov
• S. Date, N. Shibata, S. Mutoh, and J. Yamada, “1V 30MHz Memory-Macrocell-Circuit Technology with a 0.5 µm Multi-Threshold CMOS,” Proceedings of the 1994 Symposium on Low Power Electronics, San Diego, CA, pp , Oct
• M. Hamada, Y. Ootaguro, T. Kuroda, “Utilizing Surplus Timing for Power Reduction,” IEEE Custom Integrated Circuits Conf. (CICC), pp , Sept
• F. Ishihara, F. Sheikh, B. Nikolic, “Level conversion for dual-supply systems,” Int. Conf. Low Power Electronics and Design (ISLPED), pp , Aug
• P.M. Kogge and H.S. Stone, “A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations,” IEEE Trans. Comput., vol. C-22, no. 8, pp , Aug
• T. Kuroda, “Optimization and control of V_DD and V_TH for low-power, high-speed CMOS design,” Proceedings ICCAD 2002, pp., San Jose, Nov

64 References
Articles (cont.):
• H.C. Lin and L.W. Linholm, “An Optimized Output Stage for MOS Integrated Circuits,” IEEE J. Solid-State Circuits, vol. SC-10, no. 2, pp , Apr
• S. Ma and P. Franzon, “Energy Control and Accurate Delay Estimation in the Design of CMOS Buffers,” IEEE J. Solid-State Circuits, vol. 29, no. 9, pp , Sept
• D. Markovic, V. Stojanovic, B. Nikolic, M.A. Horowitz, R.W. Brodersen, “Methods for True Energy-Performance Optimization,” IEEE Journal of Solid-State Circuits, vol. 39, no. 8, pp , Aug
• MathWorks,
• S. Narendra, S. Borkar, V. De, D. Antoniadis, A. Chandrakasan, “Scaling of stack effect and its applications for leakage reduction,” Int. Conf. Low Power Electronics and Design (ISLPED), pp , Aug
• T. Sakurai and R. Newton, “Alpha-Power Law MOSFET Model and its Applications to CMOS Inverter Delay and Other Formulas,” IEEE J. Solid-State Circuits, vol. 25, no. 2, pp , Apr
• Y. Shimazaki, R. Zlatanovici, B. Nikolic, “A shared-well dual-supply-voltage 64-bit ALU,” Int. Conf. Solid-State Circuits (ISSCC), pp , Feb
• V. Stojanovic, D. Markovic, B. Nikolic, M.A. Horowitz, R.W. Brodersen, “Energy-Delay Tradeoffs in Combinational Logic using Gate Sizing and Supply Voltage Optimization,” European Solid-State Circuits Conf. (ESSCIRC), pp , Sept
• M. Takahashi et al., “A 60mW MPEG video codec using clustered voltage scaling with variable supply-voltage scheme,” IEEE Int. Solid-State Circuits Conf. (ISSCC), pp , Feb
