
1 An Algorithm to Minimize Leakage through Simultaneous Input Vector Control and Circuit Modification
Nikhil Jayakumar, Sunil P. Khatri
Presented by Ayodeji Coker
Texas A&M University, College Station, TX, USA

2 Contribution of Leakage Power
Leakage is a major contributor to total power consumption.
“Standby / Sleep” leakage reduction is crucial for portable electronics.
Some popular techniques are:
- MTCMOS / sleep transistor
- Body biasing
- Input Vector Control (IVC)

3 Intuition Behind Input Vector Control
Stack effect: leakage falls as more series-connected transistors are cut off.
- Minimum leakage can be about 2 orders of magnitude lower than the maximum.
Not all gates can be set to their minimum leakage state, because of logical interdependencies.
- NAND3: min leakage state = 000
- NOR3: min leakage state = 111
Leakage of a NAND3 gate:
Input | Leakage (A)
000 | 1.37E-10
001 | 2.70E-10
010 | 2.70E-10
011 | 4.96E-10
100 | 2.62E-10
101 | 2.68E-10
110 | 2.51E-09
111 | 1.01E-08
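The NAND3 table above can be queried directly. The following is a minimal sketch that loads the table values from the slide and reports the minimum leakage state and the ratio between the worst and best states; the helper names are illustrative, not from the presentation.

```python
# Leakage of a NAND3 gate per input state, in amperes (values from the slide's table).
NAND3_LEAKAGE = {
    "000": 1.37e-10,
    "001": 2.70e-10,
    "010": 2.70e-10,
    "011": 4.96e-10,
    "100": 2.62e-10,
    "101": 2.68e-10,
    "110": 2.51e-09,
    "111": 1.01e-08,
}


def min_leakage_state(table):
    """Return the input state with the lowest leakage."""
    return min(table, key=table.get)


def max_to_min_ratio(table):
    """Return how many times larger the worst-case leakage is than the best case."""
    return max(table.values()) / min(table.values())


best_state = min_leakage_state(NAND3_LEAKAGE)
ratio = max_to_min_ratio(NAND3_LEAKAGE)
print(best_state, f"{ratio:.1f}x")
```

For this table the best state is 000, where all three series NMOS transistors are off, which is the stack effect the slide describes.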

4 Traditional Input Vector Control
Find the Minimum Leakage Vector (MLV) at the primary inputs.
- NP-hard problem.
- Several heuristics exist to find a near-optimal MLV.
Apply the MLV through the scan chain or through MUXes at the primary inputs (flip-flop outputs) during standby / sleep.
Can we do more?
- Why restrict ourselves to only primary inputs?
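To make the MLV search concrete, here is a small sketch. The three-gate NAND3 circuit is hypothetical (it is not from the presentation); the per-state leakage values are the NAND3 numbers from slide 3. Exhaustive search is only feasible for tiny circuits, which is why heuristics such as the random sampling shown here are used in practice.

```python
import itertools
import random

# NAND3 leakage per input state in amperes (from the slide 3 table).
NAND3_LEAKAGE = {
    "000": 1.37e-10, "001": 2.70e-10, "010": 2.70e-10, "011": 4.96e-10,
    "100": 2.62e-10, "101": 2.68e-10, "110": 2.51e-09, "111": 1.01e-08,
}


def nand(*bits):
    """Logical NAND of any number of 0/1 inputs."""
    return 0 if all(bits) else 1


def total_leakage(vec):
    """Total leakage of a hypothetical circuit: G1=NAND3(a,b,c), G2=NAND3(c,d,e), G3=NAND3(G1,G2,a)."""
    a, b, c, d, e = vec
    g1 = nand(a, b, c)
    g2 = nand(c, d, e)
    gate_states = [(a, b, c), (c, d, e), (g1, g2, a)]
    return sum(NAND3_LEAKAGE["".join(map(str, s))] for s in gate_states)


def exhaustive_mlv(n_inputs=5):
    """Try all 2**n input vectors; exponential, so only usable for small n."""
    return min(itertools.product((0, 1), repeat=n_inputs), key=total_leakage)


def random_mlv(n_inputs=5, samples=8, seed=0):
    """Simple heuristic: keep the best of a few random vectors (no optimality guarantee)."""
    rng = random.Random(seed)
    candidates = [tuple(rng.randint(0, 1) for _ in range(n_inputs)) for _ in range(samples)]
    return min(candidates, key=total_leakage)


mlv = exhaustive_mlv()
print(mlv, total_leakage(mlv))
```

In this toy circuit the first-level gates reach their 000 state, but G3 is then stuck in state 110 because both of its gate-driven inputs are 1. That is the kind of logical interdependency that controlling only primary inputs cannot remove.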

5 Previous Approaches
“Leakage Current Reduction in CMOS VLSI Circuits by Input Vector Control” (TVLSI ‘04), Abdollahi et al.
- Similar to our approach: uses control points and IVC.
- Our choice of gate variants allows greater flexibility at control points.
“Enhanced Leakage Reduction by Gate Replacement” (DAC ‘05), Yuan et al.
“A Fast Simultaneous Input Vector Generation and Gate Replacement Algorithm for Leakage Power Reduction” (DAC ’06), Cheng et al.
- These use gate replacement as we do, but a gate G is replaced by a gate G’ to reduce the leakage of gate G itself, not to control internal nodes.
Previous approaches incur a delay penalty to get a reasonable leakage reduction.
- We get a significant leakage reduction with no expected delay penalty.

6 Our Approach - Overview
Modify the circuit so that we control internal nodes of the circuit.
Create variants of each gate that can replace the original.
Traverse the circuit from inputs to outputs and replace gates in the circuit.
- Reduce leakage through the stack effect for the gates in the fanout of a replaced gate.
- Do not necessarily reduce leakage of the gate being replaced.
Perform gate replacement such that leakage is reduced but circuit delay is not increased.

7 Variants of a Gate
Regular NAND2

8 Variants of a Gate
sngl1out 0: Used when the output of the gate is 1 in standby, but all the fanout gates require an output of 0.

9 Variants of a Gate
sngl1out 1: Used when the output of the gate is 0 in standby, but all the fanout gates require an output of 1.

10 Variants of a Gate
snglmx 0: Used when the output of the gate is 1 in standby, but some fanout gates require an output of 0.

11 Variants of a Gate
snglmx 1: Used when the output of the gate is 0 in standby, but some fanout gates require an output of 1.

12 Variants of a Gate
dbl variants: Larger counterparts of the sngl variants (devices sized < 2X).
- Add more flexibility to the choices for replacement.
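The variant definitions on slides 8 to 11 amount to a small decision rule. The sketch below encodes that rule; the function name and the string labels are illustrative, and the dbl variants (chosen on timing grounds) are deliberately left out.

```python
def choose_variant(standby_output, required_by_fanouts):
    """Pick a sngl variant for a gate.

    standby_output: 0 or 1, the value the regular gate produces in standby.
    required_by_fanouts: iterable of 0/1 values, one per fanout gate,
        giving the value that fanout gate wants in order to leak less.
    """
    required = set(required_by_fanouts)
    if required <= {standby_output}:
        return "regular"  # every fanout already gets the value it wants
    wanted = 1 - standby_output
    if required == {wanted}:
        return f"sngl1out {wanted}"  # all fanouts want the opposite value
    return f"snglmx {wanted}"  # only some fanouts want the opposite value


print(choose_variant(1, [0, 0]))  # sngl1out 0
print(choose_variant(0, [1, 0]))  # snglmx 1
```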

13 The Gate Replacement Algorithm
Assume the inputs of gates at the first level can be set independently.
- Gates at the first level can all be set to their minimum leakage state.
Pick a gate G from the first level. Let g be its output signal.
Find what value of g the gates in the fanout of G require.
Try to replace the gate if there is a net savings in leakage and there is no timing violation.

14 Example
[Figure: gates G and H feeding gate J, annotated with signal values 0 0 1]
First set gate G to its lowest leakage state, 00.
Next look at the fanout of gate G: gate J is in its fanout.
- If the output of G stays at 1 (the current value), the best possible state at J is 10 (choose from 10, 11).
- If the output of G can be changed, the best possible state for J is 00 (choose from 00, 01, 10, 11).
- Leakage improvement possible = (Leakage of J at state 10 - Leakage of J at state 00 - Leakage cost of replacing gate G with a sngl1out 0 variant).

15 Example (continued)
[Figure: gates G, H, and J, annotated with signal values 0 0 1 0]

16 Example
[Figure: gates G, H, and J, annotated with signal values 0 0 0 0 1 1 0 0]
Next set gate H to its lowest leakage state, 00.
Then look at the fanout of gate H: gate J is in its fanout.
- If the output of H stays at 1 (the current value), the best possible state at J is 01 (01 is the only choice).
- If the output of H can be changed, the best possible state for J is 00 (choose from 00, 01).
- Leakage improvement possible = (Leakage of J at state 01 - Leakage of J at state 00 - Leakage cost of replacing gate H with a sngl1out 0 variant).
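The example's bookkeeping can be sketched numerically. Here the gain of a replacement is taken as the leakage saved at the fanout gate J minus the leakage cost of the variant. The NAND2 leakage values and the variant cost below are hypothetical placeholders chosen for illustration; the presentation does not give them.

```python
# Hypothetical NAND2 leakage per input state, in nA (illustrative only, not from the slides).
NAND2_LEAKAGE_NA = {"00": 10.0, "01": 25.0, "10": 20.0, "11": 60.0}

# Hypothetical extra leakage of a sngl1out 0 variant over the regular gate, in nA.
VARIANT_COST_NA = 4.0


def replacement_gain(state_without, state_with, table=NAND2_LEAKAGE_NA, cost=VARIANT_COST_NA):
    """Leakage saved at the fanout gate, minus the cost of the replacement variant."""
    return table[state_without] - table[state_with] - cost


# Replacing G moves J from its best reachable state 10 to state 00.
gain_g = replacement_gain("10", "00")
# With G's output now 0, replacing H moves J from state 01 to state 00.
gain_h = replacement_gain("01", "00")
print(gain_g, gain_h)
```

A replacement is worth considering only when this gain is positive, and it must also pass the timing check.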

17 …Replacement Algorithm
If both logic 0 and logic 1 are required at some node, try snglmx variants.
If sngl variants cause timing violations, try dbl variants.
- Use dbl variants only if the leakage improvement is positive.
Traverse the circuit from inputs to outputs in levelized order.
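The traversal described on slides 13 and 17 can be sketched as follows. This is a simplified outline, not the authors' Perl implementation: `gain_of` and `violates_timing` are assumed callbacks standing in for the leakage lookup and the timing analysis, the netlist format is hypothetical, and the sketch does not update downstream gains after each replacement as a full implementation would.

```python
def levelize(netlist):
    """Order gates from inputs to outputs.

    netlist maps each gate name to the list of names driving its inputs;
    any name that is not a gate is treated as a primary input (level 0).
    Every gate is assumed to have at least one input.
    """
    levels = {}

    def level(name):
        if name not in netlist:
            return 0  # primary input
        if name not in levels:
            levels[name] = 1 + max(level(src) for src in netlist[name])
        return levels[name]

    for gate in netlist:
        level(gate)
    return sorted(netlist, key=lambda g: (levels[g], g))


def replace_gates(netlist, gain_of, violates_timing):
    """Visit gates in levelized order and pick a sngl or dbl variant where it pays off."""
    chosen = {}
    for gate in levelize(netlist):
        if gain_of(gate, "sngl") <= 0:
            continue  # no net leakage savings, keep the regular gate
        if not violates_timing(gate, "sngl"):
            chosen[gate] = "sngl"
        elif gain_of(gate, "dbl") > 0 and not violates_timing(gate, "dbl"):
            chosen[gate] = "dbl"  # larger variant, used only if it still saves leakage
    return chosen


# Toy data for the G, H, J example; the gains and the timing result are made up.
netlist = {"J": ["G", "H"], "G": ["a", "b"], "H": ["c", "d"]}
gains = {("G", "sngl"): 6.0, ("H", "sngl"): 11.0, ("H", "dbl"): 5.0}
too_slow = {("H", "sngl")}
plan = replace_gates(
    netlist,
    lambda gate, variant: gains.get((gate, variant), 0.0),
    lambda gate, variant: (gate, variant) in too_slow,
)
print(levelize(netlist), plan)
```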

18 Experimental Results
Cell library characterization done in SPICE.
- bsim100 Berkeley Predictive Technology Model (BPTM) cards, 1.2V VDD
Algorithm implemented in Perl.
- Run on a 3GHz Pentium 4, 2GB RAM, Fedora Core 3.

19 Experimental Results
On average 30% improvement in leakage over applying MLV at primary inputs alone.
Existing approaches that use IVC and control points to get a similar leakage improvement have a delay penalty of 10 to 15%.
Ckt. | Original min. leakage (nA) | New min. leakage (nA) | Leakage decrease (%)
alu2 | 1251.72 | 1022.44 | 18.32
alu4 | 2598.14 | 2094.99 | 19.37
apex6 | 2743.08 | 1753.82 | 36.06
apex7 | 812.72 | 592.88 | 27.05
C1355 | 2003.61 | 1697.87 | 15.26
C432 | 584.46 | 449.93 | 23.02
C880 | 1375.73 | 977.07 | 28.98
C1908 | 1909.95 | 1548.12 | 18.94
C3540 | 4079.92 | 3126 | 23.38
C6288 | 13020.1 | 12011.39 | 7.75
dalu | 3293.89 | 2378.24 | 27.8
des | 15218.02 | 12013.16 | 21.06
i10 | 8738.32 | 6318.98 | 27.69
i1 | 158.38 | 102.96 | 35.00
i2 | 372.66 | 98.72 | 73.51
i3 | 323.05 | 60.13 | 81.39
i6 | 1907.06 | 1650.16 | 13.47
i7 | 2499.2 | 1973.08 | 21.05
i8 | 3805.49 | 2321.63 | 38.99
i9 | 2552.2 | 1440.26 | 43.57
t481 | 2915.54 | 2409.63 | 17.35
too_large | 1034.72 | 796.34 | 23.04
Avg | | | 29.18
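The percentage column can be reproduced from the two leakage columns, and the reported average is the plain mean of the per-circuit percentages. A short check, using three rows and the full percentage column from the slide:

```python
def percent_decrease(original, new):
    """Leakage decrease as a percentage of the original leakage."""
    return (original - new) / original * 100.0


# (original nA, new nA) for three circuits from the slide.
rows = {"alu2": (1251.72, 1022.44), "C6288": (13020.1, 12011.39), "i3": (323.05, 60.13)}
decreases = {name: round(percent_decrease(o, n), 2) for name, (o, n) in rows.items()}

# Percentage column for all 22 circuits, as listed on the slide.
reported = [18.32, 19.37, 36.06, 27.05, 15.26, 23.02, 28.98, 18.94, 23.38, 7.75, 27.8,
            21.06, 27.69, 35.00, 73.51, 81.39, 13.47, 21.05, 38.99, 43.57, 17.35, 23.04]
average = sum(reported) / len(reported)
print(decreases, round(average, 2))
```

The spread is wide, from under 8% for C6288 to over 80% for i3, so the average of about 29% hides large per-circuit differences.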

20 Experimental Results
There is never a delay increase.
Delay decreases in some instances:
- due to use of dbl variants.
- sngl1out variants improve delay in one transition.
Runtime is low.
- The current implementation is in Perl; it is expected to speed up when implemented in C/C++.
Ckt. | Original delay (ps) | New delay (ps) | Delay improvement (%) | Runtime (s)
alu2 | 1460.7 | 1422.16 | 2.64 | 5.53
alu4 | 1755.99 | 1753.09 | 0.17 | 21.16
apex6 | 739.94 | 739.93 | 0 | 20.03
apex7 | 704.11 | - | 0 | 2.89
C1355 | 930.41 | 930.23 | 0.02 | 7.8
C432 | 1110.89 | - | 0 | 1.03
C880 | 1803.93 | 1718.75 | 4.72 | 6.12
C1908 | 1489.95 | 1488.61 | 0.09 | 10.1
C3540 | 1870.95 | 1870.63 | 0.02 | 51.89
C6288 | 5651.08 | 5637.02 | 0.25 | 695.85
dalu | 1506.29 | 1504.32 | 0.13 | 42.75
des | 3021.52 | 2470.33 | 18.24 | 655.38
i10 | 2549.68 | 2499.43 | 1.97 | 238.13
i1 | 353.61 | 353.21 | 0.11 | -
i2 | 392.98 | - | 0 | 0.51
i3 | 182.46 | - | 0 | 0.98
i6 | 1080.1 | - | 0 | 5.5
i7 | 1088.31 | - | 0 | 10.38
i8 | 1591.76 | 1297.01 | 18.52 | 38.62
i9 | 1651.78 | 1618.21 | 2.03 | 15.87
t481 | 901.69 | 838.36 | 7.02 | 28.21
too_large | 680.24 | 677.89 | 0.35 | 4.09
Avg | | | 2.56 | 84.68

21 Experimental Results
Total active area overhead on average = 24%.
- Real area overhead would be lower after layout, place and route.
A lot of the area is used by sleep cut-off transistors.
- These can be shared, which would reduce area, delay and leakage.
Ckt. | Original active area (μm²) | New active area (μm²) | Active area overhead (%) | Sleep cut-off transistor active area (μm²) | Active area excluding sleep cut-off transistors (μm²) | Active area overhead excluding sleep cut-off transistors (%)
alu2 | 78.52 | 96.2 | 22.52 | 14.08 | 82.12 | 4.58
alu4 | 155.42 | 187.94 | 20.92 | 24.87 | 163.07 | 4.92
apex6 | 157.36 | 197.15 | 25.29 | 34.71 | 162.44 | 3.23
apex7 | 49.04 | 66.32 | 35.24 | 15.05 | 51.27 | 4.55
C1355 | 108.2 | 133.74 | 23.6 | 22.34 | 111.4 | 2.96
C432 | 37.92 | 46.01 | 21.33 | 7.29 | 38.72 | 2.11
C880 | 83.94 | 107.56 | 28.14 | 20.52 | 87.04 | 3.69
C1908 | 104.21 | 134.74 | 29.3 | 26.95 | 107.79 | 3.44
C3540 | 246.42 | 305.13 | 23.83 | 48.84 | 256.29 | 4.01
C6288 | 672.99 | 970.35 | 44.18 | 260.06 | 710.29 | 5.54
dalu | 211.55 | 259.04 | 22.45 | 38.5 | 220.54 | 4.25
des | 812.09 | 1054.8 | 29.89 | 209.27 | 845.53 | 4.12
i10 | 490.08 | 621.4 | 26.8 | 109.84 | 511.56 | 4.38
i1 | 11.9 | 13.99 | 17.56 | 1.85 | 12.14 | 2.02
i2 | 50.84 | 53.99 | 6.2 | 2.81 | 51.18 | 0.67
i3 | 32.28 | 40.36 | 25.03 | 5 | 35.36 | 9.54
i6 | 109.22 | 124.21 | 13.72 | 13.49 | 110.72 | 1.37
i7 | 147.63 | 170.96 | 15.8 | 21.11 | 149.85 | 1.5
i8 | 234.59 | 273.09 | 16.41 | 32.37 | 240.72 | 2.61
i9 | 151.56 | 179.53 | 18.45 | 24.13 | 155.4 | 2.53
t481 | 166.08 | 213.81 | 28.74 | 40.15 | 173.66 | 4.56
too_large | 62.51 | 80.85 | 29.34 | 15.4 | 65.45 | 4.7
Avg | | | 23.85 | | | 3.69
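The columns of the area table are related by simple arithmetic: the area excluding sleep cut-off transistors is the new area minus the sleep transistor area, and both overheads are measured against the original area. A brief sketch checking two rows from the slide (the function name is illustrative):

```python
def area_breakdown(original, new, sleep):
    """Return (overhead %, area excluding sleep transistors, overhead % excluding them)."""
    excluding_sleep = new - sleep
    overhead = (new - original) / original * 100.0
    overhead_excluding = (excluding_sleep - original) / original * 100.0
    return overhead, excluding_sleep, overhead_excluding


# Active areas (original, new, sleep cut-off transistors) taken from the slide.
alu2 = area_breakdown(78.52, 96.2, 14.08)
c6288 = area_breakdown(672.99, 970.35, 260.06)
print([round(v, 2) for v in alu2])
print([round(v, 2) for v in c6288])
```

For alu2, roughly 14 of the 17.7 units of added area are sleep cut-off transistors, which is why sharing them is suggested as a way to cut the overhead.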

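22 Experimental Results
dblmx variants did not get used.
sngl1out variants were used the most.
Ckt. | #sngl1out | #dbl1out | #snglmx | Total # replacements | Total # gates
alu2 | 91 | 0 | 30 | 106 | 374
alu4 | 183 | 2 | 66 | 218 | 713
apex6 | 204 | 0 | 18 | 213 | 779
apex7 | 94 | 0 | 6 | 97 | 255
C1355 | 91 | 16 | 0 | 107 | 582
C432 | 40 | 0 | 0 | - | 170
C880 | 119 | 0 | 12 | 125 | 404
C1908 | 150 | 3 | 6 | 156 | 548
C3540 | 327 | 0 | 58 | 356 | 1174
C6288 | 1649 | 2 | 70 | 1686 | 3578
dalu | 342 | 0 | 36 | 360 | 946
des | 1171 | 0 | 170 | 1256 | 4169
i10 | 736 | 2 | 112 | 794 | 2421
i1 | 12 | 0 | 0 | - | 52
i2 | 17 | 0 | 0 | - | 171
i3 | 4 | 60 | 0 | 64 | 114
i6 | 75 | 0 | 0 | - | 586
i7 | 111 | 0 | 0 | - | 719
i8 | 266 | 0 | 14 | 273 | 1102
i9 | 167 | 2 | 4 | 171 | 735
t481 | 237 | 0 | 48 | 261 | 803
too_large | 89 | 0 | 20 | 99 | 304
Avg | 280.68 | 3.95 | 30.45 | 299.86 | 940.86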

23 Conclusion
We extended input vector control to control internal nodes, not just primary inputs.
30% leakage decrease with no delay penalty.
- Leakage decrease is over MLV at primary inputs alone.
- Delay improvement in many cases.
Active area increase = 24%, but this is mostly sleep cut-off transistor area.
- Placed and routed area is expected to be much lower.
Dynamic power is estimated to increase by 1.5% on average.

24 Thank you
Contact info of authors:
nikhil_AT_ece_DOT_tamu_DOT_edu
sunilkhatri_AT_tamu_DOT_edu

