Low-Power Design Techniques in Digital Systems Prof. Vojin G. Oklobdzija University of California November 19, 2003.

2 Outline of the Talk
– Power trends in VLSI
– Scaling theory and predictions
– Research efforts in power reduction
– Efficiency measures and design guidelines
– Latches and Flip-Flops for Low-Power
   – Dual-Edge FFs
   – SOI
– Conclusion: Low-Power perspective

3 Power trends in VLSI

4 “CMOS circuits dissipate little power by nature. So believed circuit designers.” (Kuroda-Sakurai, 95)
“By the year 2000, the power dissipation of high-end ICs will exceed the practical limits of ceramic packages, even if the supply voltage can be feasibly reduced.”
[Figure: chip power (W) versus year, growing about 4x every 3 years] (* Taken from Sakurai’s ISSCC 2001 presentation)

5 Gloom and Doom predictions (Source: Shekhar Borkar, Intel)

7 Power versus year (data taken from ISSCC, uP Report, Hot-Chips)
[Figure: high-end power growing at 25% per year, consumer (low-end) at 13% per year; further trend labels: 15%/yr, 12%/yr]

8 VDD, Power and Current Trend
[Figure: voltage (V), power per chip (W) and VDD current (A) versus year]
International Technology Roadmap for Semiconductors, 1999 update, sponsored by the Semiconductor Industry Association in cooperation with the European Electronic Component Association (EECA), Electronic Industries Association of Japan (EIAJ), Korea Semiconductor Industry Association (KSIA), and Taiwan Semiconductor Industry Association (TSIA) (* Taken from Sakurai’s ISSCC 2001 presentation)

9 Power Delivery Problem (not just in California)
[Figure annotation: “Your car starter!”] (Source: Shekhar Borkar, Intel)

10 Trend in L di/dt
di/dt is roughly proportional to I·f, where I is the chip’s current and f is the clock frequency. Since I = P/VDD, where P is the chip’s power, I·f = P·f/VDD.
The trend is: P increases, f increases, and VDD slightly decreases; on-chip L and package L slightly decrease.
Therefore, the L di/dt fluctuation increases significantly. (* Taken from Norman Chang, HP)
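The proportionality argument on this slide can be checked with ratios alone. The sketch below assumes, as the slide does, that di/dt ∝ I·f with I = P/VDD, so the noise term L·di/dt scales as L·P·f/VDD. The function name and the generation-to-generation multipliers in the example are hypothetical, not data from the talk.

```python
def ldidt_ratio(p_ratio: float, f_ratio: float, vdd_ratio: float, l_ratio: float) -> float:
    """Relative change in the L*di/dt supply noise between two chip generations.

    Assumes di/dt is proportional to I*f and I = P/Vdd, so
    L*di/dt ~ L * P * f / Vdd. Every argument is a new/old ratio,
    so the unknown proportionality constant cancels.
    """
    return l_ratio * p_ratio * f_ratio / vdd_ratio


# Hypothetical generation step: power and clock frequency double,
# while Vdd and the effective supply inductance each drop by 10%.
ratio = ldidt_ratio(p_ratio=2.0, f_ratio=2.0, vdd_ratio=0.9, l_ratio=0.9)
print(f"L*di/dt changes by a factor of {ratio:.2f}")
```

With these made-up multipliers the 10% drops in L and VDD cancel each other, so the doubling of both P and f passes straight through to the noise term.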

11 Saving grace! The energy-delay product is improving by more than 2x per generation.

12 x86 efficiency is improving dramatically: 4x per generation, with the average improving 3x per generation. High-end processor efficiency is not improving.

13 Scaling theory and predictions

14 The power dissipation has increased 1000 times over the past 15 years and now exceeds 70 W.
Scaling principles:
1. “Constant-field scaling” theory [Dennard] assumes that device voltages as well as device dimensions are scaled down by a scaling factor x (> 1), resulting in a constant electric field in the device:
– power density remains constant
– circuit performance improves in terms of: density x², speed x, power (per device) 1/x², power-delay product 1/x³
Limitless progress in CMOS is promised with this scaling scenario.

15 In practice, neither the supply voltage nor the threshold voltage was scaled until 1990, which led to the theory of “constant-voltage scaling”, which assumes a constant voltage. This assumption yields:
– a speed improvement of x²
– a power density that increases rapidly, by x³

16 Constant-field scaling is not realistic; a factor of $x^{0.5}$ is satisfactory. However, even with that, the power dissipation would exceed that of ECL by 2001: a new philosophy is required! (* Taken from Sakurai and Kuroda, IEICE 95 paper)
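The factors quoted on slides 14 and 15 can be reproduced with a first-order device model. The sketch below assumes a long-channel (square-law) drain current, I ∝ C_ox·(W/L)·(V − VTH)², shrinks all dimensions by 1/x and both voltages by 1/x^a, and reports new/old ratios: a = 1 is constant-field scaling and a = 0 is constant-voltage scaling. The intermediate case a = 0.5 is included only as one possible reading of the x^0.5 factor on slide 16; the model and all names are illustrative assumptions, not taken from the cited papers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scaling:
    """New/old ratios after one scaling step."""
    density: float
    speed: float
    power_per_device: float
    power_density: float
    power_delay_product: float


def scale(x: float, a: float) -> Scaling:
    """Shrink dimensions (W, L, tox) by 1/x and voltages (VDD, VTH) by 1/x**a.

    a = 1 gives constant-field scaling, a = 0 gives constant-voltage scaling.
    Square-law model: I ~ Cox * (W/L) * (V - VTH)^2, and (V - VTH) scales
    like the voltages because VDD and VTH shrink by the same factor.
    """
    voltage = x ** -a
    current = x * voltage ** 2      # Cox grows by x, W/L is unchanged
    capacitance = x * x ** -2       # C ~ Cox * W * L
    delay = capacitance * voltage / current   # t ~ C * V / I
    power = voltage * current       # per device, switching at its own speed
    density = x ** 2
    return Scaling(
        density=density,
        speed=1 / delay,
        power_per_device=power,
        power_density=power * density,
        power_delay_product=power * delay,
    )


for name, a in [("constant field", 1.0), ("constant voltage", 0.0), ("x^0.5", 0.5)]:
    print(name, scale(4.0, a))
```

With x = 4, a = 1 gives density 16, speed 4, power density 1 and power-delay product 1/64 (x², x, 1 and 1/x³), while a = 0 gives speed 16 and power density 64 (x² and x³), matching the two slides.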

17 High-Performance View Point on Power (*taken from Ron Preston, DEC Alpha)
$P = k\,C\,V^2 f$
Shrinking to the new technology (30% reduction in feature size):
– C decreases by 30%
– f increases by 43% (1/0.7 ≈ 1.43)
– $P_{new} = 0.7 \cdot (1/0.7)\,P_{old} = P_{old}$ (no change in power!)
New design:
– Double the number of devices
– $P_{new} = 2 \cdot 0.7 \cdot (1/0.7)\,P_{old} = 2\,P_{old}$ (power doubles!)
Scale VDD by 30% in the new design:
– $P_{new} = 2 \cdot 0.7 \cdot (1/0.7) \cdot 0.7^2\,P_{old} \approx P_{old}$ (power stays constant!)
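The arithmetic on this slide is a product of ratios. A minimal sketch, assuming only P = k·C·V²·f with every factor expressed as a new/old ratio (the helper name is hypothetical):

```python
def power_ratio(c: float = 1.0, vdd: float = 1.0, f: float = 1.0, k: float = 1.0) -> float:
    """P_new / P_old for P = k * C * Vdd^2 * f, given the new/old ratio of each factor."""
    return k * c * vdd ** 2 * f


S = 0.7  # 30% linear shrink per technology generation

same_design = power_ratio(c=S, f=1 / S)                     # shrink only
twice_the_devices = power_ratio(c=2 * S, f=1 / S)           # double the device count
twice_and_lower_vdd = power_ratio(c=2 * S, vdd=S, f=1 / S)  # also scale Vdd by 30%

print(f"{same_design:.2f} {twice_the_devices:.2f} {twice_and_lower_vdd:.2f}")
```

The last case evaluates to 2 × 0.7² = 0.98, so “power stays constant” holds to within about 2%.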

18 High-Performance View Point on Power (*taken from Ron Preston, DEC Alpha)
Reality: paradigm changes: more aggressive circuits, increasing toggle rate, out-of-order and speculative execution.
What to expect: power will be limited by the package and cooling techniques; frequency will be determined by the power, as high as the package can take!

Chip (feature size)   VDD     Freq.     Power
[value lost]          3.3 V   300 MHz   50 W
[value lost]          2.0 V   600 MHz   72 W
Change                -30%    -39%      +100%     +44%
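The Change row of the table can be recomputed from the two surviving rows, and the P = k·C·V²·f relation from the previous slide then shows how much of the power increase is left for k·C (activity times switched capacitance) to explain. This is a back-of-the-envelope sketch; the roughly 2x growth in k·C it implies is an inference from the table, consistent with the paradigm changes listed on the slide, not a figure from the talk.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


# The two rows of the table (the chip / feature-size column is not reproduced)
old = {"vdd_v": 3.3, "freq_mhz": 300.0, "power_w": 50.0}
new = {"vdd_v": 2.0, "freq_mhz": 600.0, "power_w": 72.0}

for key in old:
    print(f"{key}: {pct_change(old[key], new[key]):+.0f}%")

# With P = k*C*V^2*f, whatever V and f do not explain must come from k*C.
implied_kc = (new["power_w"] / old["power_w"]) / (
    (new["vdd_v"] / old["vdd_v"]) ** 2 * (new["freq_mhz"] / old["freq_mhz"])
)
print(f"implied change in k*C: x{implied_kc:.2f}")
```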

19 Research Efforts in Low-Power Design
$P_{sw} = k\,C_L\,V_{CC}^2\,f_{CLK}$
– Reduce switching activity: conditional clock; conditional precharge; switching off inactive blocks; conditional execution
– Run it slower: use parallelism; fewer pipeline stages; use double-edge flip-flops
– Technology scaling: the highest win; thresholds should scale; leakage starts to bite; dynamic voltage scaling
– Reduce the active load: minimize the circuits; use more efficient design; charge recycling; more efficient layout

20 Reducing the Power Dissipation
The power dissipation can be minimized by reducing the supply voltage, the load capacitance, and the switching activity.
– Reducing the supply voltage brings a quadratic improvement.
– Reducing the load capacitance improves both the power dissipation and the circuit speed.
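The two claims on this slide can be illustrated with the switching-power formula from slide 19 and the alpha-power-law delay model that appears on slide 24. In the sketch below, VTH = 0.4 V and the baseline operating point are hypothetical values chosen for illustration:

```python
ALPHA = 1.3  # velocity-saturation index quoted on slide 24
VTH = 0.4    # hypothetical threshold voltage [V]


def dynamic_power(c: float, vdd: float, f: float = 1.0, k: float = 1.0) -> float:
    """Switching power P = k * C * VDD^2 * f (normalized units)."""
    return k * c * vdd ** 2 * f


def gate_delay(c: float, vdd: float) -> float:
    """Alpha-power-law delay t ~ C * VDD / (VDD - VTH)^alpha, constant dropped."""
    return c * vdd / (vdd - VTH) ** ALPHA


BASE_C, BASE_V = 1.0, 2.0  # hypothetical baseline: normalized load, 2.0 V supply

for label, c, v in [("load capacitance -20%", 0.8, BASE_V), ("supply voltage -20%", BASE_C, 1.6)]:
    p = dynamic_power(c, v) / dynamic_power(BASE_C, BASE_V)
    t = gate_delay(c, v) / gate_delay(BASE_C, BASE_V)
    print(f"{label}: power x{p:.2f}, delay x{t:.2f}")
```

A 20% capacitance cut lowers both power and delay by 20%, while a 20% supply cut lowers power to 64% but makes the gate slower, which is why voltage scaling needs one of the throughput-preserving measures on the next slide.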

21 Voltage Scaling
There are three means to maintain the throughput:
– Reduce VTH to improve circuit speed.
– Introduce parallel and pipelined architectures while using slower devices (this assumes a limitless number of transistors; in reality, transistor density increases by only 60% per year).
– Prepare multiple supply voltages and, for each cluster of circuits, choose the lowest supply voltage that satisfies the speed requirement. (A good level converter is necessary: one with small delay, low power consumption, and small area.)
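The third option, multiple supply voltages, amounts to a small selection problem: for each cluster, walk the available supplies from lowest to highest and keep the first one that still meets timing. The sketch below assumes the alpha-power-law delay from slide 24 with a hypothetical VTH of 0.4 V, treats each cluster as a single path, and ignores the level-converter delay and power that the slide warns about; the cluster names and budgets are made up.

```python
ALPHA = 1.3  # velocity-saturation index (alpha-power law)
VTH = 0.4    # hypothetical threshold voltage [V]; every supply level must exceed it


def gate_delay(vdd: float) -> float:
    """Alpha-power-law delay t ~ VDD / (VDD - VTH)^alpha, constant factor dropped."""
    return vdd / (vdd - VTH) ** ALPHA


def assign_supplies(clusters: dict[str, tuple[float, float]], levels: list[float]) -> dict[str, float]:
    """Pick the lowest available supply that still meets each cluster's timing budget.

    clusters maps a cluster name to (path delay at the highest supply, allowed delay).
    A cluster that cannot meet its budget even at the highest supply keeps that supply.
    """
    v_high = max(levels)
    chosen = {}
    for name, (delay_at_vhigh, budget) in clusters.items():
        chosen[name] = v_high
        for v in sorted(levels):
            if delay_at_vhigh * gate_delay(v) / gate_delay(v_high) <= budget:
                chosen[name] = v
                break
    return chosen


clusters = {
    "critical": (1.0, 1.0),  # no slack at all
    "medium": (1.0, 1.3),    # may be up to 30% slower
    "relaxed": (1.0, 2.0),   # may be twice as slow
}
print(assign_supplies(clusters, levels=[1.2, 1.5, 2.0]))
```

With these numbers the cluster with no slack stays at 2.0 V, the one allowed 30% extra delay drops to 1.5 V, and the relaxed one drops to 1.2 V.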

23 Is there an optimal design point?

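24 Power Dissipation and Circuit Delay
Power: $P = p_t\, f_{CLK}\, C_L\, V_{DD}^2 + I_0\, 10^{-V_{th}/S}\, V_{DD}$
Delay: $t_d = \frac{kQ}{I} = \frac{k\, C_L\, V_{DD}}{(V_{DD} - V_{th})^{\alpha}}$, with $\alpha = 1.3$
[Figure: power (W) versus VTH (V) and VDD (V), with operating points A and B] (* Taken from T. Sakurai)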
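The two expressions on this slide are enough to explore slide 23’s question numerically. The sketch below fixes a delay target, scans VTH, solves for the lowest VDD that meets the target at each VTH, and keeps the pair with the lowest total power. All parameter values (S, p_t, f_CLK, C_L, I_0 and the reference point) are hypothetical; only the form of the two equations and α = 1.3 come from the slide.

```python
from typing import Optional

ALPHA = 1.3    # velocity-saturation index from the slide
S = 0.1        # hypothetical subthreshold swing [V/decade]
P_T = 0.1      # hypothetical switching probability p_t
F_CLK = 100e6  # hypothetical clock frequency [Hz]
C_L = 1e-9     # hypothetical total switched capacitance [F]
I_0 = 1.0      # hypothetical leakage current at VTH = 0 [A]


def power(vdd: float, vth: float) -> float:
    """P = p_t * f_CLK * C_L * VDD^2 + I_0 * 10^(-VTH/S) * VDD."""
    return P_T * F_CLK * C_L * vdd ** 2 + I_0 * 10 ** (-vth / S) * vdd


def delay(vdd: float, vth: float) -> float:
    """Delay ~ VDD / (VDD - VTH)^alpha; k * C_L is folded into the units."""
    return vdd / (vdd - vth) ** ALPHA


def vdd_for_delay(vth: float, target: float, v_max: float = 5.0) -> Optional[float]:
    """Lowest VDD meeting the target, by bisection (delay falls as VDD rises when alpha > 1)."""
    lo, hi = vth + 1e-6, v_max
    if delay(hi, vth) > target:
        return None  # even v_max is too slow for this VTH
    for _ in range(100):
        mid = (lo + hi) / 2
        if delay(mid, vth) > target:
            lo = mid  # too slow: need a higher supply
        else:
            hi = mid  # fast enough: try a lower supply
    return hi


def best_design_point(target: float, vth_grid: list[float]) -> Optional[tuple[float, float, float]]:
    """Scan VTH, solve for the VDD that just meets target, keep the lowest-power pair."""
    best = None
    for vth in vth_grid:
        vdd = vdd_for_delay(vth, target)
        if vdd is None:
            continue
        p = power(vdd, vth)
        if best is None or p < best[2]:
            best = (vdd, vth, p)
    return best


target = delay(1.5, 0.4)  # speed of a reference design at VDD = 1.5 V, VTH = 0.4 V
vdd, vth, p = best_design_point(target, [i / 100 for i in range(5, 61)])
print(f"VDD = {vdd:.3f} V, VTH = {vth:.2f} V, P = {p * 1e3:.2f} mW")
```

Lowering VTH lets VDD (and the quadratic switching term) fall, but the leakage term grows tenfold for every S volts of VTH reduction, so with these made-up parameters the scan settles on an interior VTH rather than either end of the grid.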

25 Power-Delay Product, Energy-Delay Product
Lowest voltage, highest threshold: no optimum. The power-delay product is a misleading measure; it will always favor a processor that operates at a lower frequency. Energy-delay is more adequate, but energy-delay**2 should be used. (*from Sakurai, Kuroda, IEICE 95 paper)
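The ranking behaviour described here can be seen with the same alpha-power delay model, holding VTH fixed. The sketch below assumes energy per operation E ∝ VDD² and delay D ∝ VDD/(VDD − VTH)^α, with a hypothetical VTH = 0.4 V, and minimizes E·D^m over VDD. If the circuit runs at the rate its delay allows, power is E/D, so m = 0 is the power-delay product, m = 1 the energy-delay product and m = 2 energy-delay**2. Setting the derivative of ln(E·D^m) to zero gives VDD* = (2 + m)·VTH / (2 + m − α·m) for this model; that closed form is derived here, not quoted from the cited paper.

```python
ALPHA = 1.3  # velocity-saturation index (slide 24)
VTH = 0.4    # hypothetical fixed threshold voltage [V]


def energy(vdd: float) -> float:
    """Switching energy per operation, E ~ C * VDD^2 with C normalized to 1."""
    return vdd ** 2


def delay(vdd: float) -> float:
    """Alpha-power-law delay, D ~ VDD / (VDD - VTH)^alpha."""
    return vdd / (vdd - VTH) ** ALPHA


def metric(vdd: float, m: int) -> float:
    """E * D^m: m = 0 is the power-delay product, m = 1 EDP, m = 2 ED**2."""
    return energy(vdd) * delay(vdd) ** m


def optimal_vdd(m: int) -> float:
    """Closed-form minimizer of E * D^m for this model (valid while 2 + m - alpha*m > 0)."""
    return (2 + m) * VTH / (2 + m - ALPHA * m)


grid = [VTH + 0.001 * i for i in range(1, 2601)]  # VDD from just above VTH up to 3.0 V
for m, name in [(0, "power-delay"), (1, "energy-delay"), (2, "energy-delay**2")]:
    best = min(grid, key=lambda v: metric(v, m))
    print(f"{name}: grid optimum {best:.3f} V, closed form {optimal_vdd(m):.3f} V")
```

For m = 0 the minimum is pushed to the lower edge of the grid (VDD approaching VTH, where the delay diverges), which is the “no optimum” case on the slide; EDP settles near 0.71 V and ED**2 near 1.14 V.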

26 Power-Delay Product, Energy-Delay Product: Horowitz, Indermaur, and Gonzalez argue against Power-Delay (SLPE’94)

27 Energy-Delay**2 (*courtesy of Prof. T. Sakurai)

28 Energy-Delay Product vs. Energy-Delay**2: Nowka, Hofstee, and Carpenter of IBM argue against Energy-Delay as a design efficiency measure (private communication)

29 Energy-Delay Product vs. Energy-Delay**2: Nowka, Hofstee, and Carpenter of IBM argue against Energy-Delay as a design efficiency measure (private communication). The same design should have roughly the same efficiency. Optimal point: (due to VTH being fixed?)

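30 Example: PowerPC (three implementations compared; the column headings and several cells were lost in extraction; “Diff.” gives the change of the second and third implementations relative to the first)
– Frequency (MHz): (100); Diff.: same
– CMOS process: 0.5u 5-metal; 0.5u 4-metal; Diff.: ~same
– Cache: 32KB total; 16K+16K; 64K; Diff.: ~same
– Load/store unit: No; Yes
– Dual integer unit: No; Yes
– Register renaming: No; Yes
– Peak issue: 2 + branch; 4 instructions; Diff.: ~double
– Transistors: 2.8 million; 3.6 million; 6.9 million; Diff.: +30% / +146%
– SPECint: (169); Diff.: +50% / +61%
– SPECfp: (225); Diff.: +30% / +80%
– Power: 4 W; 13 W; 30 W (22.5 W); Diff.: +225% / +463%
– Spec/Watt: 26.5 … 10 (other values lost); Diff.: -115% / -252%
– PF = Watt/Freq**3: 4.0E-6; 13.0E-6; 12.8E-6
– (PF/Trans)*E, IPC, PE*IPC**3 (*E6): values lost
– PE = Watt/Spec**3: 3.46E-6; 3.17E-6; 2.63E-6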

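31 Feature comparison: Digital, MIPS 10000, PowerPC 620, HP 8000, Sun Ultra-Sparc (several cells were lost in extraction; surviving values are listed in column order)
– Freq.: 500 MHz; 200 MHz; 180 MHz; 250 MHz (only four values survive)
– Pipeline stages: values lost
– Issue rate: 4; 4; 4; 4; 4
– Out-of-order execution: 6 loads; 32; 16; 56; none
– Register renaming (int/FP): none/8; 32/32; 8/8; 56; none
– Transistors / logic transistors: 9.3M/1.8M; 5.9M/2.3M; 6.9M/2.2M; 3.9M*/3.9M; 3.8M/2.0M
– SPEC95 (Int/FP): partially lost (surviving fragments: 12.6, 17.2, 9/9, 10.8, 15)
– Power: 25 W; 30 W; 40 W; 20 W (only four values survive)
– SpecInt/Watt, Energy*Delay: values lost
– Watt/Freq**3: partially lost (surviving fragments: 0.2E-6, 3.75E…, …E-6, 1.28E-6)
– (PF/Trans)*E, (PF/LTrans)*E: values lost
– Watt/Spec**3: 12.5E-3; 42.5E-3; 41.5E-3; 31.7E-3; 32.5E-3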

32 Sensitivity to VTH fluctuation
[Figure: VTH fluctuation of ±0.15 V versus ±0.05 V at VDD = 1.0 V; annotation ΔVTH = 0.5] (* Taken from T. Sakurai)

33 Use of Different Circuit Families

34 Capacitance Reduction
The load capacitance is the sum of the gate capacitance, the diffusion capacitance, and the routing capacitance. Using a small number of transistors, or small transistors, reduces the gate capacitance and the diffusion capacitance. Pass-transistor logic may have an advantage because it comprises fewer transistors and exhibits smaller stray capacitance than conventional static CMOS logic.

35 Pass-Transistor Logic

36 Pass-Transistor Logic: CVSL, CPL, SRPL, DSL, DPL, DCVSPG

37 SAPL: Sense-Amplifying Pass-transistor Logic. All nodes are first discharged and then evaluated by the inputs. Outputs are 100 mV above GND.

38 Where does the power go?

39 Power use differs from chip to chip:
– MPU1 is a low-end microprocessor
– MPU2 is a high-end CPU with a large cache
– ASSP1 is an MPEG-2 decoder
– ASSP2 is an ATM switch
(*from Sakurai, Kuroda, IEICE 95 paper)

40 Design Example: StrongARM 110
Two power modes: idle and sleep.
Power: 0.5 W using a 1.1 V internal PS; [value lost] W using a 2 V internal PS; [value lost] MHz
Power breakdown: I-Cache 27%, D-Cache 16%, I-Unit 18%, Exec-Unit 8%, I-MMU 9%, D-MMU 8%, Clock 10%, Others 4% (PLL < 1%)
(*from D. Dobberpuhl)

41 Design Example: StrongARM 110 (*from D. Dobberpuhl)

42 Design Example: StrongARM 110. However, leakage current starts to affect standby power. (*from D. Dobberpuhl)

43 Controlling both VDD and VTH for low power

44 Controlling VDD and VTH for low power
Low power → low VDD → low speed → low VTH → high leakage → VDD-VTH control
– MTCMOS: Multi-Threshold CMOS
– VTCMOS: Variable Threshold CMOS
Multiple: spatial assignment. Variable: temporal assignment.
Software-hardware cooperation; technology-circuit cooperation (* from Prof. T. Sakurai)

45 Dual-VTH concept: low-VTH circuits (high leakage) on critical paths; high-VTH circuits (low leakage) on non-critical paths (* from Prof. T. Sakurai)
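The dual-VTH idea can be sketched as a path-level assignment: any path that misses the clock period with high-VTH gates is moved to low VTH, and everything else keeps the low-leakage devices. This is a deliberately coarse illustration, assuming gate delays that add along a path and a hypothetical 20% speedup for low-VTH gates; practical flows typically assign VTH gate by gate and re-check timing after each swap.

```python
def assign_vth(paths, hvt_delay, period):
    """Path-level dual-VTH assignment.

    paths:     list of paths, each a list of gate names
    hvt_delay: gate name -> delay when the gate uses high-VTH devices
    period:    clock period every path must meet
    Gates on any path that misses timing with all-high-VTH gates become
    low-VTH (fast, leaky); every other gate stays high-VTH (slower, low leakage).
    """
    low = set()
    for path in paths:
        if sum(hvt_delay[g] for g in path) > period:
            low.update(path)
    return {g: ("low" if g in low else "high") for g in hvt_delay}


def path_delay(path, hvt_delay, assignment, lvt_speedup=0.8):
    """Path delay after assignment; lvt_speedup is a hypothetical low-VTH delay factor."""
    return sum(hvt_delay[g] * (lvt_speedup if assignment[g] == "low" else 1.0) for g in path)


hvt_delay = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 0.5, "e": 0.5}
paths = [["a", "b", "c"], ["d", "e"], ["a", "e"]]
assignment = assign_vth(paths, hvt_delay, period=2.8)
print(assignment)
print([round(path_delay(p, hvt_delay, assignment), 2) for p in paths])
```

Only the gates on the 3.0-unit path (a, b, c) become low-VTH; afterwards that path takes 2.4 units and every path meets the 2.8-unit period.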

46 Clustered Voltage Scaling for Multiple VDD’s (* from Prof. T. Sakurai)
[Figure: CVS structure versus conventional design, with the lower-VDD portion shaded; labels: critical path, level-shifting F/F, FF]
Once VL is applied to a logic gate, VL is applied to the subsequent logic gates up to the F/Fs, to eliminate DC current paths. The F/Fs restore VH.
M. Takahashi et al., “A 60mW MPEG4 Video Codec Using Clustered Voltage Scaling with Variable Supply-Voltage Scheme,” ISSCC, pp. 36-37, Feb.
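The CVS rule on this slide (a VL gate may drive only VL gates or flip-flops) suggests a backward traversal from the flip-flops. In the sketch below, each gate is visited only after every gate it drives, and it moves to VL when it has enough slack and all its fanouts are already VL or flip-flops. The per-gate slack values are hypothetical and are not updated after each assignment, which a real timing-driven implementation would do; `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def clustered_voltage_scaling(fanout, slack, penalty, flip_flops):
    """Return the set of gates that can run from the low supply VL.

    fanout:     gate -> list of driven gates or flip-flops
    slack:      gate -> timing slack at VH (hypothetical, treated as independent per gate)
    penalty:    gate -> extra delay the gate incurs at VL
    flip_flops: names of level-shifting flip-flops that restore VH

    A gate may move to VL only if it has enough slack and every gate it drives
    is already at VL (or is a flip-flop), so no VL gate ever drives a VH gate.
    """
    # Passing fanout gates as "predecessors" makes static_order() visit
    # every fanout before the gate that drives it (a backward traversal).
    graph = {g: [s for s in fanout.get(g, []) if s not in flip_flops] for g in slack}
    low = set()
    for g in TopologicalSorter(graph).static_order():
        drives_only_low = all(s in flip_flops or s in low for s in fanout.get(g, []))
        if drives_only_low and slack[g] >= penalty[g]:
            low.add(g)
    return low


fanout = {"g1": ["g2", "g3"], "g2": ["FF1"], "g3": ["FF2"], "g4": ["g3"], "g5": ["g2"]}
slack = {"g1": 0.5, "g2": 0.3, "g3": 0.05, "g4": 0.4, "g5": 0.5}
penalty = {g: 0.1 for g in slack}
low_gates = clustered_voltage_scaling(fanout, slack, penalty, {"FF1", "FF2"})
print(sorted(low_gates))
```

In the example, g3 lacks slack and stays at VH, which in turn keeps g1 and g4 at VH even though they have slack, while g2 and g5 move to VL.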

47 Energy consumption is proportional to the square of VDD, so VDD should be lowered to the minimum level that still ensures real-time operation. If you don’t need to hustle, VDD should be as low as possible.
[Figure: normalized power versus normalized workload for variable VDD and fixed VDD] (* from Prof. T. Sakurai)

48 Measured voltage waveforms
[Figure: VDD waveform between VDDmin and VDDmax over one sync frame (200 ms), with the sleep signal; VDDmax about 8% of the time and sleep about 6% of the time on average] (* from Prof. T. Sakurai)

49 Measured power characteristics
[Figure: power P (W) versus supply voltage VDD (V) at f = 100 MHz and f = 200 MHz]
– Time at VDDmax: 8%, at 0.8 W
– Time at VDDmin: 86%, at 0.16 W (down to 1/5)
– Time in sleep: 6%, at 0.07 W
Total power = 0.8 W × 0.08 + 0.16 W × 0.86 + 0.07 W × 0.06 ≈ 0.2 W
VDD hopping can cut power consumption down to 1/4. (* from Prof. T. Sakurai)
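The total-power figure on this slide is a time-weighted average. The sketch below recomputes it from the three operating points listed on the slide (0.8 W for 8% of the time, 0.16 W for 86%, 0.07 W for 6%); the mode names are labels chosen here.

```python
# (power [W], fraction of time) for each operating mode, as read from the slide
MODES = {
    "VDDmax": (0.8, 0.08),
    "VDDmin": (0.16, 0.86),
    "sleep": (0.07, 0.06),
}


def average_power(modes: dict[str, tuple[float, float]]) -> float:
    """Time-weighted average power over a set of operating modes."""
    total_time = sum(frac for _, frac in modes.values())
    if abs(total_time - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    return sum(p * frac for p, frac in modes.values())


avg = average_power(MODES)
fixed = MODES["VDDmax"][0]  # power if the chip always ran at VDDmax
print(f"average {avg:.3f} W, i.e. {avg / fixed:.2f} of the fixed-VDD power")
```

The result is about 0.206 W, roughly a quarter of the 0.8 W drawn when running at VDDmax all the time, which is the “1/4” on the slide.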

50 Simulation results
[Figure: normalized power P/P_FIX versus transition delay T_TD (ms) for MPEG-2 video decoding and VSELP speech encoding, with curves for RPC at 2 levels (f, f/2), 3 levels (f, f/2, f/3), 4 levels (f, f/2, f/3, f/4), and infinite levels (post-simulation analysis)] (* from Prof. T. Sakurai)

51 Aggressive Voltage Scaling: if we can dynamically scale VDD and VTH, the advantage is obvious. (*Taken from Kuroda)

52 Example

53 Transmeta Example (*Taken from Doug Laird’s presentation, January 19th, 2000)

54 Transmeta Example (*Taken from Doug Laird’s presentation, January 19th, 2000)

55 Transmeta Example: “Code Morphing” is another contributor to power reduction, since it eliminates unnecessary external memory accesses. (*Taken from Doug Laird’s presentation, January 19th, 2000)

56 Transmeta Example

57 Latches and Flip-Flops for Low-Power

58 Simulation Conditions and Testbench
Timing:
– Total FF overhead is setup + clock-to-output time
– Circuit optimization towards t_D-Q
– Clock skew robustness obtained from observing the D-Q curve
Power-Delay Product:
– Overall performance parameter at fixed frequency

59 Flip-Flop Performance Comparison
Total power consumed:
– internal power
– data power
– clock power
Measured for four cases:
– no activity (0000… and 1111…)
– maximum activity (…0101…)
– average activity (random sequence)
Test bench delay (minimum D-Q) = Clk-Q + setup time

60 Old vs. new test bench
OLD TEST BENCH:
– Total power = drivers’ power + test unit power
– PDP-optimized = equal trade-off on power and delay
– Improper load on drivers
NEW TEST BENCH:
– Drivers: fixed gain, driving the test unit only
– Data-to-output delay
– PD²P-optimized = best for constant-field scaling

61 Comparison in terms of speed and EDP_tot
Technology: 0.2u, VDD = 2 V, T = 20 °C, 100 MHz
Delay:
– below 200 ps: SDFF 187 ps, HLFF 199 ps, K-6 ETL 200 ps
– [range lost]: PowerPC latch 266 ps, Alpha FF 272 ps, StrongARM FF 275 ps, mC²MOS latch 292 ps
– above 500 ps: SSTC latch 592 ps, DSTC latch 629 ps, SSTC* latch 898 ps, DSTC* latch 1060 ps
PDP:
– below 30 fJ: PowerPC latch 28 fJ
– [range lost]: HLFF 29 fJ, SDFF 39 fJ, mC²MOS latch 40 fJ, Alpha FF 43 fJ, StrongARM FF 45 fJ
– [range lost]: K-6 ETL 70 fJ
– above 70 fJ: SSTC latch 95 fJ, DSTC latch 125 fJ
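Slide 60 describes the new test bench as PD²P-optimized. Because each entry above lists a delay and a PDP (power × delay), multiplying the two gives power × delay², so the listed values can be re-ranked by that figure of merit. This is a recomputation from the numbers on this slide, not a ranking reported in the talk; SSTC* and DSTC* are left out because no PDP is listed for them.

```python
# (delay [ps], PDP [fJ]) as listed on this slide
FLIP_FLOPS = {
    "SDFF": (187, 39),
    "HLFF": (199, 29),
    "K-6 ETL": (200, 70),
    "PowerPC latch": (266, 28),
    "Alpha FF": (272, 43),
    "StrongARM FF": (275, 45),
    "mC2MOS latch": (292, 40),
    "SSTC latch": (592, 95),
    "DSTC latch": (629, 125),
}


def pd2p(delay_ps: int, pdp_fj: int) -> int:
    """Power * delay^2, computed as PDP * delay (units: fJ*ps)."""
    return pdp_fj * delay_ps


ranking = sorted(FLIP_FLOPS, key=lambda name: pd2p(*FLIP_FLOPS[name]))
for name in ranking:
    print(f"{name:14s} {pd2p(*FLIP_FLOPS[name]):>7d} fJ*ps")
```

By this measure HLFF, SDFF and the PowerPC latch come out ahead, and the SSTC and DSTC latches trail.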

62 Delay comparison: flip-flop designs yield the fastest structures.

63 Delay comparison: flip-flop designs yield the fastest structures.

64 Overall ranking: EDP_tot is accepted as the overall cost function. The proposed “low-power” latches from Yuan & Svensson do not show an advantage compared with the other presented structures (the optimization was not properly done; optimization is yet to be repeated under different

65 Overall ranking, zoomed: real signals have an activity between 0 and 1.0 (α). Precharged hybrid structures are the fastest, but their power consumption strongly depends on the probability of “ones”. More “ones” above the [symbol lost] point.

66 Overall performance: real signals have an activity between 0 and 1.0 (α). Precharged hybrid structures are the fastest, but their power consumption strongly depends on the probability of “ones”. More “ones” above the [symbol lost] point.

67 Conventional Clk-Q vs. minimum D-Q: hidden positive setup time; degradation of Clk-Q

68 Internal power distribution
Four sequences characterize the boundaries for internal power consumption:
– …010101…: maximum
– random, equal transition probability: average
– …111111…: precharge activity
– …000000…: leakage + internal clock processing

69 Comparison of Clock power consumption

70 Using Dual-Edge Flip-Flops (run the clock at ½ of the frequency to save on the power consumed in the clock distribution tree)
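The saving comes from the clock network itself: with a dual-edge flip-flop, data is captured on both clock edges, so the same data rate needs only half the clock frequency. The sketch below assumes P = C·VDD²·f for the clock tree with activity 1, uses the 1.8 V and 500 MHz conditions quoted for the dual-edge comparison, and takes a hypothetical 100 pF of clock capacitance that is the same for both designs; the flip-flops’ own internal and data power are not modelled.

```python
def clock_tree_power(c_clk: float, vdd: float, f_clk: float) -> float:
    """Clock-network switching power; the clock toggles every cycle, so P = C * VDD^2 * f."""
    return c_clk * vdd ** 2 * f_clk


C_CLK = 100e-12    # hypothetical total clock-network capacitance [F]
VDD = 1.8          # supply used in the dual-edge comparison [V]
DATA_RATE = 500e6  # captures per second the flip-flops must provide

single_edge = clock_tree_power(C_CLK, VDD, DATA_RATE)    # one capture per clock cycle
dual_edge = clock_tree_power(C_CLK, VDD, DATA_RATE / 2)  # captures on both clock edges
print(f"single-edge {single_edge * 1e3:.0f} mW, dual-edge {dual_edge * 1e3:.0f} mW")
```

Halving f halves the clock-tree power; whether the total improves depends on the dual-edge flip-flop’s own overhead, which is what the following comparison slides examine.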

71 Dual-Edge vs. Single-Edge Flip-Flops Comparison
[Figure: delay (ps) versus total power (µW)]
Fujitsu 0.18u process; clock frequency 500 MHz (250 MHz for dual-edge FFs); data activity ratio α = 0.5; VDD = 1.8 V; Temp = 25 °C

72 Dual-Edge vs. Single-Edge Flip-Flops Comparison
[Figure panels: internal power (µW), clock power (µW), data power (µW)]
Fujitsu 0.18u process; clock frequency 500 MHz (250 MHz for dual-edge FFs); data activity ratio α = 0.5; VDD = 1.8 V; Temp = 25 °C

73 Silicon-on-Insulator (SOI) Technology

74 SOI Comparison: F = 1 GHz, α = 0.5, Le = 0.08 µm, VDD = 1.3 V, T = 25 °C

75 In conclusion… what can we expect low power to bring us?

76 Wearable Computer

77 Wearable Computer

78 Wearable Computer

79 Digital Ink

80 Implantable Computer

81 Bluetooth

82 Mosquito
– Brain: ultra-small volume; small number of neuron cells; extremely low power; real-time image processing; (artificial) intelligence; 3D flight control
– Sensors: infrared, humidity, CO2
Year 2010: extrapolation of the trend with some saturation; many important and interesting applications (home, entertainment, office, translation, health care)
Year 2020???: more assembly techniques (3D); combination of bio and semiconductor
Year 2110: long lifetime by DNA manipulation; bio-computer

