1 VLSI Design: Power
Frank Sill Torres, Department of Electronic Engineering, Federal University of Minas Gerais, Av. Antônio Carlos 6627, CEP: 31270-010, Belo Horizonte (MG), Brazil

2 Trends

3 Trend: Performance Source: Moore, ISSCC 2003

4 Trends – Power Dissipation
SoC Consumer Portable Power Trend [Source: ITRS, 2010 Update]

5 Trends - Power Density
[Figure: power density trend, with the levels of a hot plate and a nuclear reactor marked for comparison]

6 Problems of High Power Dissipation
Continuously increasing performance demands
Increasing power dissipation of technical devices
Today: power dissipation is a major problem
High power dissipation leads to:
  Reduced time of operation
  Higher weight (batteries)
  Reduced mobility
  High cooling effort
  Increasing operational costs
  Reduced reliability

7 Chip Power Density Distribution
[Figure panels: Power Map, On-Die Temperature]
Power density is not uniformly distributed across the chip
Silicon is not a good heat conductor
Max junction temperature is determined by hot spots
Impact on packaging, cooling

8 "The Internet is an Electricity Hog"
Badische Zeitung, 2003
Energy for the internet in 2001 in Germany: 6.8 Bn. kWh = 1.4 % of total energy consumption
  2.35 Bn. kWh for 17.3 million internet PCs
  1.91 Bn. kWh for servers
  1.67 Bn. kWh for the network
  0.87 Bn. kWh for UPS (uninterruptible power supplies)
Rate of growth (at the moment): 36 % per year
Prognosis: Bn. kWh > 6 % of total energy consumption > 3 medium nuclear power plants
World: 400 million PCs → 0.16 PW (P = Peta = 10^15)

9 Dissipation in a Notebook
Peripherals: Disk, Display
Processing: ASICs, programmable µPs or DSPs, Memory
Power supply: Battery, DC-DC converter
Communication: WLAN, Ethernet

10 Examples for Energy Dissipation
Energy dissipation in a notebook
Energy dissipation in a PDA

11 Intel beats Varta
Generalized Moore's Law versus battery capacity
Capacity of batteries: 2% - 6% increase per year (up to year 2000)
Source: Timmernann, 2007

12 Current Progress: Batteries
[Figure label: 20 kg]
Factor of 4 in the last 10 years → still far too little

13 Power Consumption in CMOS

14 Metrics: Energy and Power
Energy
  Measured in Joules or kWh
  “Measure of the ability of a system to do work or produce a change”
  “No activity is possible without energy.”
Power
  Measured in Watts or kW
  “Amount of energy required for a given unit of time.”
Average power
  Average amount of energy consumed per unit time
  Simplified to "power" in clear contexts
Instantaneous power
  Energy consumed per unit time as the time interval goes to zero

15 Metrics: Energy and Power cont’d
Instantaneous electrical power P(t)
  P(t) = v(t) * i(t)
  v(t): potential difference (or voltage drop) across component
  i(t): current through component
Electrical energy
  E = P(t) * t = v(t) * i(t) * t (for constant power over the interval t)
Electrical energy in CMOS circuits
  Energy = Power * Delay
  Why?

16 Consumption in CMOS
Water analogy:
  Voltage (Volt, V) ↔ Water pressure (bar)
  Current (Ampere, A) ↔ Water quantity per second (liter/s)
  Energy ↔ Amount of water
Energy consumption is proportional to the capacitive load CL!

17 Consumption in CMOS cont’d
Water analogy:
  Voltage (Volt, V) ↔ Water pressure (bar)
  Current (Ampere, A) ↔ Water quantity per second (liter/s)
  Energy ↔ Amount of water
Energy for a calculation is only consumed at a 0→1 transition at the output (load CL)

18 Energy and Instantaneous Power
INV1: high instantaneous power (bigger width), delay td1
INV2: low instantaneous power, delay td2
Both drive the same load CL → same energy (Cin ignored)
INV1 is faster

19 Metrics: Energy and Power cont’d
Power is height of curve
Energy is area under curve
[Figure, both panels: power in Watts over time for Approach 1 and Approach 2]
Energy = Power * time for calculation = Power * Delay
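The "area under the curve" view can be checked numerically. The following is a minimal sketch, assuming two made-up power profiles (2 W for 1 s and 1 W for 2 s); the function name `energy_joules` and the sample values are illustrative and not taken from the slides.

```python
def energy_joules(times_s, power_w):
    """Approximate the area under a sampled power curve (trapezoidal rule)."""
    if len(times_s) != len(power_w):
        raise ValueError("times and power samples must have the same length")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


# Hypothetical profiles: high power for a short time vs. low power for a long time.
approach1 = energy_joules([0.0, 1.0], [2.0, 2.0])  # 2 W for 1 s
approach2 = energy_joules([0.0, 2.0], [1.0, 1.0])  # 1 W for 2 s
print(approach1, approach2)
```

Both profiles have different heights (power) but enclose the same area (energy), which is the point the two panels make.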

20 Metrics: Energy and Power cont’d
Energy dissipation
  Determines battery life in hours
  Sets packaging limits
Peak power
  Determines power ground wiring designs
  Impacts signal noise margin and reliability analysis

21 Metrics: PDP and EDP
Power-Delay Product
  Power P, delay tp
  Quality criterion: PDP = P * tp [J]
  P and tp have the same weight
  Two designs can have the same PDP, even if tp = 1 year
Energy-Delay Product
  EDP = PDP * tp = P * tp^2
  Delay tp has higher weight
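A small worked example may help show why the EDP separates designs that the PDP cannot. This is a sketch with hypothetical numbers (power in µW, delay in ns, so PDP comes out in fJ); the design values are assumptions chosen only for illustration.

```python
def pdp(power_uw, delay_ns):
    """Power-delay product: P * tp (µW * ns = fJ)."""
    return power_uw * delay_ns


def edp(power_uw, delay_ns):
    """Energy-delay product: PDP * tp = P * tp^2 (fJ * ns)."""
    return pdp(power_uw, delay_ns) * delay_ns


# Two hypothetical designs with identical PDP but different delay.
fast = (2000, 1)  # 2000 µW, 1 ns
slow = (1000, 2)  # 1000 µW, 2 ns

print(pdp(*fast), pdp(*slow))  # same energy per operation
print(edp(*fast), edp(*slow))  # the slower design is penalized
```

Because the delay enters squared, the EDP ranks the faster design better even though both spend the same energy per operation.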

22 Energy and Power
Average power is directly proportional to energy
→ In the following: power means average power

23 Where Does Power Go in CMOS?
Dynamic power consumption
  Charging and discharging capacitors
Short-circuit currents
  Short-circuit path between supply rails during switching
Leakage
  Leaking diodes and transistors

24 Dynamic Power Consumption
[Circuit figure labels: VDD, Vin, Vout, CL]
f0→1 = α * f
Pdyn = CL * VDD^2 * P0→1 * f
  P0→1: probability for a 0-to-1 switch of the output
  f: clock frequency
  α: activity
Data dependent - a function of switching activity!
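The dynamic power expression can be evaluated directly. The sketch below uses the slide's symbols (CL, VDD, P0→1, f) as parameters; the numeric values (10 fF, 1.0 V, 3/16, 1 GHz) are assumptions for illustration only.

```python
def dynamic_power(c_load_f, vdd_v, p01, f_hz):
    """Pdyn = CL * VDD^2 * P0->1 * f, in Watts."""
    return c_load_f * vdd_v ** 2 * p01 * f_hz


# Hypothetical operating point: 10 fF load, 1.0 V supply, P0->1 = 3/16, 1 GHz clock.
p_nominal = dynamic_power(10e-15, 1.0, 3 / 16, 1e9)
p_half_vdd = dynamic_power(10e-15, 0.5, 3 / 16, 1e9)

print(p_nominal)               # roughly 1.9 µW
print(p_nominal / p_half_vdd)  # halving VDD divides Pdyn by about 4
```

The ratio illustrates the quadratic dependence on VDD; load capacitance, activity and frequency each enter only linearly.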

25 Dynamic Power Consumption

26 Transition Probabilities for CMOS Cells
Example: Static 2-Input NOR Cell
If A and B have the same input signal probability:
  PA=1 = 1/2
  PB=1 = 1/2
Truth table of NOR2 cell:
  A B | Out
  0 0 | 1
  0 1 | 0
  1 0 | 0
  1 1 | 0
Then:
  POut=0 = 3/4
  POut=1 = 1/4
  P0→1 = POut=0 * POut=1 = 3/4 * 1/4 = 3/16
  Ceff = P0→1 * CL = 3/16 * CL

27 Transition Probabilities cont’d
A and B with different input signal probabilities:
  PA and PB: probability that the input is 1
  P1: probability that the output is 1
Switching activity in CMOS circuits: P0→1 = P0 * P1
For 2-input NOR: P1 = (1-PA)(1-PB)
Thus: P0→1 = (1-P1) * P1 = [1-(1-PA)(1-PB)] * (1-PA)(1-PB) (see next slide)
P0→1 = Pout=0 * Pout=1 for common cells:
  NOR   (1 - (1 - PA)(1 - PB)) * (1 - PA)(1 - PB)
  OR    (1 - PA)(1 - PB) * (1 - (1 - PA)(1 - PB))
  NAND  PAPB * (1 - PAPB)
  AND   (1 - PAPB) * PAPB
  XOR   (1 - (PA + PB - 2PAPB)) * (PA + PB - 2PAPB)
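The closed-form entries can be cross-checked by enumerating the truth table of each cell. This is a minimal sketch, assuming statistically independent inputs (the same assumption the formulas make); the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Truth functions of the 2-input cells listed above.
GATES = {
    "NOR": lambda a, b: int(not (a or b)),
    "OR": lambda a, b: int(a or b),
    "NAND": lambda a, b: int(not (a and b)),
    "AND": lambda a, b: int(a and b),
    "XOR": lambda a, b: a ^ b,
}


def prob_output_one(gate, pa, pb):
    """P(out = 1): sum the probabilities of all input rows that yield 1."""
    total = Fraction(0)
    for a, b in product((0, 1), repeat=2):
        weight = (pa if a else 1 - pa) * (pb if b else 1 - pb)
        if gate(a, b):
            total += weight
    return total


def transition_probability(gate, pa, pb):
    """P0->1 = P(out = 0) * P(out = 1)."""
    p1 = prob_output_one(gate, pa, pb)
    return (1 - p1) * p1


half = Fraction(1, 2)
for name, gate in GATES.items():
    print(name, transition_probability(gate, half, half))
```

For equiprobable inputs this reproduces the 3/16 of the NOR2 example; XOR comes out at 1/4, the maximum possible value of P0→1.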

28 Transition Probabilities cont’d
Transition Probability of NOR2 Cell as a Function of Input Probabilities Probability of input signals → high influence on P01 Source: Timmernann, 2007

29 Short Circuit Power Consumption
[Figure labels: VDD, Vin, Isc, Vout, CL, tsc, GND]
Finite slope of input signal
During switching: NMOS and PMOS transistors are both conducting for a short period of time (tsc)
Direct current path between VDD and GND
Psc = VDD * Isc * (P0→1 + P1→0)

30 Leakage Power Consumption
[Figure labels: SiO2, Source, Drain, Gate, Igate, Isub, L, VDD, CL, GND]
Most important leakage currents:
  Subthreshold leakage Isub
  Gate oxide leakage Igate
Pleak = Ileak * VDD ≈ (Isub + Igate) * VDD

31 Power Equations in CMOS
P = α f CL VDD^2 + VDD Ipeak (P0→1 + P1→0) + VDD Ileak
  Dynamic power (≈ % today and decreasing relatively)
  Short-circuit power (≈ 10 % today and decreasing absolutely)
  Leakage power (≈ 20 – 50 % today and increasing)
f0→1 represents the energy-consuming transition
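The three terms of the power equation can be evaluated side by side to see how the shares shift. The sketch below follows the equation as written on the slide; all numeric values are hypothetical and only serve to exercise the formula.

```python
def power_breakdown(alpha, f_hz, c_load_f, vdd_v, i_peak_a, p01, p10, i_leak_a):
    """Evaluate P = alpha*f*CL*VDD^2 + VDD*Ipeak*(P0->1 + P1->0) + VDD*Ileak."""
    dynamic = alpha * f_hz * c_load_f * vdd_v ** 2
    short_circuit = vdd_v * i_peak_a * (p01 + p10)
    leakage = vdd_v * i_leak_a
    total = dynamic + short_circuit + leakage
    return {
        "dynamic": dynamic,
        "short_circuit": short_circuit,
        "leakage": leakage,
        "total": total,
    }


# Hypothetical values, not taken from the slides.
result = power_breakdown(
    alpha=0.1, f_hz=1e9, c_load_f=1e-12, vdd_v=1.0,
    i_peak_a=1e-5, p01=0.1, p10=0.1, i_leak_a=5e-5,
)
for name in ("dynamic", "short_circuit", "leakage"):
    print(name, f"{100 * result[name] / result['total']:.1f} %")
```

Only the dynamic term scales with VDD^2 and with frequency; the leakage term is paid even when nothing switches.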

32 Leakage

33 Technology Generation
Trends
[Figure: technology roadmap by stage (Manufacturing, Development, Research). Technology generations: 90 nm, 65 nm, 45 nm, 32 nm. Transistor dimensions shown: 50 nm, 35 nm, 30 nm, 20 nm, 10 nm, 5 nm. Device options shown: SiGe S/D strained silicon, high-k metal gate, tri-gate, nanowire, III-V, carbon nanotube FET]

34 Trends cont'd
[Figure: dynamic power dissipation and power dissipation by leakage currents over technology generations]
Speaker notes: If we now look at the predictions for energy consumption in current and future technologies, we can see that, on the one hand, dynamic energy consumption continues to rise sharply with every technology shrink. At the same time …
Source: S. Borkar (Intel), '05

35 Recap: Transistor Geometry
Gate width W
Gate length L
Polysilicon gate
SiO2 gate oxide (good insulator, εox = 3.9)
tox: thickness of oxide layer
n+ source and drain regions in p-type body
Source: Rabaey, "Digital Integrated Circuits", 1995

36 Subthreshold Leakage
Transistor characteristic
  If the gate-source voltage Vgs is higher than the threshold voltage Vth: channel under the gate, current between drain and source
  If Vgs is lower than Vth: (ideally) no current
Subthreshold leakage Isub
  Leakage between drain and source when Vgs < Vth
  Based on: short channels, diffusion, thermionic emission
[Figure labels: Source, Drain, Gate, Isub]
Speaker notes: In the following, I would like to introduce the two most important components of the leakage currents. I will start with "subthreshold leakage", the current that flows below the threshold voltage. It depends on the so-called threshold voltage of a transistor, which I would like to explain briefly.

37 Subthreshold Leakage cont’d
[Figure: log(drain current) versus gate voltage for an NMOS transistor, showing Isub, the region where the transistor is conducting, and the threshold Vth' of a short-channel device next to Vth]
Source: Agarwal, 2007

38 Drain Induced Barrier Lowering (DIBL)
[Figure: potential barrier along the channel for Vgs < Vth and Vgs > Vth]
Height of curve = potential barrier
  Changed by gate voltage
  Electrons have to overcome the potential barrier to enter the channel
Ideal: potential barrier is controlled only by the gate voltage

39 Drain Induced Barrier Lowering cont’d
[Figure: long-channel transistor (L > 2 µm) versus short-channel transistor (L < 180 nm), showing the lowering of the potential barrier]
In short-channel transistors, the potential barrier is also affected by the drain voltage
→ If Vds = VDD, transistors can start to conduct even if Vgs < Vth

40 Temperature dependence
Source: Chatterjee, Intel Labs
IOFF at 110 °C compared with Isub at 25 °C: 70 nm → 16x, 130 nm → 6x
Based on thermionic emission: subthreshold leakage Isub increases with temperature

41 Gate Oxide Leakage
Tunneling effect
  An electromagnetic wave strikes a barrier: reflection + penetration into the barrier
  If the barrier is thin enough, the wave partially passes through it (electrons tunnel through the barrier)
Gate oxide leakage Igate
  In nanometer transistors, where Tox < 2 nm, electrons tunnel through the gate oxide
  → Leakage current Igate
Speaker notes: Next, I would like to focus on "gate oxide leakage". It is based on the so-called tunneling effect, which I would like to explain briefly.

42 Gate Oxide Thickness at 45 nm

43 Gate Oxide Leakage cont’d
Components of gate oxide leakage:
  Tunneling currents through overlap regions (gate-drain Igdo, gate-source Igso)
  Tunneling currents into channel (gate-drain Igcd, gate-source Igcs)
  Tunneling currents between gate and bulk (Igb)

44 Further Leakage Components
Reverse-bias pn junction conduction Ipn
Gate-induced drain leakage IGIDL
Drain-source punchthrough IPT
Hot carrier injection IHCI

45 Leakage Dependencies
Leakage depends on:
  Gate width (Isub, Igate)
  Gate length (Isub, Igate)
  Gate oxide thickness (Igate)
  Threshold voltage (Isub)
  Temperature (Isub)
  Input state (Igate)

46 Low power techniques

47 Lowering Dynamic Power
Reducing VDD
  Has a quadratic effect!
  Has a negative effect on performance, especially as VDD approaches 2VT
Lowering CL
  Improves performance as well
  Keep transistors minimum size
Reducing the switching activity, f0→1 = P0→1 * f
  A function of signal statistics and clock rate
  Impacted by logic and architecture design decisions

48 Power & Delay Dependence of Vth
[Figure: power and delay as functions of Vth, without gate leakage]
Source: Sakurai, '01

49 Transistor Sizing for Power Minimization
Small W's: lower capacitance, but higher voltage needed to keep performance
Large W's: higher capacitance, but lower voltage
Larger sized devices: useful only when interconnects dominate
Minimum sized devices: usually optimal for low power
Source: Timmernann, 2007

50 Logic Style and Power Consumption
Voltage decreases: power-delay product improves
Best logic style minimizes power-delay for a given delay constraint
A new logic style can reduce power dissipation (if possible / available!)
Source: Jan M. Rabaey

51 Logic Restructuring
Logic restructuring: changing the topology of a logic network to reduce transitions
AND: P0→1 = P0 * P1 = (1 - PAPB) * PAPB
[Figure: F = A·B·C·D with all input probabilities equal to 0.5]
  Chain implementation: W = A·B (3/16 = 0.188), X = W·C (7/64 = 0.109), F = X·D (15/256)
  Tree implementation: Y = A·B ((1-0.25)*0.25 = 3/16), Z = C·D (3/16), F = Y·Z (15/256)
Chain implementation has a lower overall switching activity than tree implementation for random inputs
BUT: ignores glitching effects
Speaker notes: Look at designing for speed – 8-input AND gate. Which implementation is lower energy? Which is lower delay? So which is better overall? Also look at slide speed.19, Design Technique 3 – when deciding which configuration consumes less power and has the best performance
Source: Jan M. Rabaey
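The chain-versus-tree comparison can be reproduced with exact fractions. This is a sketch under the slide's assumptions (independent inputs, each 1 with probability 0.5, no glitching); the node names W, X, Y, Z, F follow the figure labels.

```python
from fractions import Fraction


def and_gate(p_a, p_b):
    """Return (P(out = 1), P0->1) for a 2-input AND with independent inputs."""
    p1 = p_a * p_b
    return p1, (1 - p1) * p1


half = Fraction(1, 2)

# Chain: F = ((A AND B) AND C) AND D
w, act_w = and_gate(half, half)
x, act_x = and_gate(w, half)
_, act_f_chain = and_gate(x, half)
chain_total = act_w + act_x + act_f_chain

# Tree: F = (A AND B) AND (C AND D)
y, act_y = and_gate(half, half)
z, act_z = and_gate(half, half)
_, act_f_tree = and_gate(y, z)
tree_total = act_y + act_z + act_f_tree

print("chain:", act_w, act_x, act_f_chain, "total", chain_total)
print("tree: ", act_y, act_z, act_f_tree, "total", tree_total)
```

The output node F switches equally often in both topologies; the difference comes entirely from the internal nodes, where the chain's second stage already has a low probability of being 1.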

52 Input Ordering
AND: P0→1 = (1 - PAPB) * PAPB
[Figure: F = A·B·C with PA = 0.5, PB = 0.2, PC = 0.1]
  X = A·B first: activity at X is (1 - 0.5 x 0.2) * (0.5 x 0.2) = 0.09
  X = B·C first: activity at X is (1 - 0.2 x 0.1) * (0.2 x 0.1) = 0.0196
Activity at output node F is equal in both cases
Beneficial: postponing the introduction of signals with a high transition rate (signals with signal probability close to 0.5)
Source: Jan M. Rabaey
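The ordering rule can be checked by trying every pair of inputs for the first AND stage. The sketch below uses the probabilities from the figure (0.5, 0.2, 0.1) and assumes independent inputs; the helper name is illustrative.

```python
from fractions import Fraction
from itertools import combinations


def internal_activity(p_first, p_second):
    """P0->1 at the internal node X of a two-stage AND."""
    p1 = p_first * p_second
    return (1 - p1) * p1


probs = {"A": Fraction(1, 2), "B": Fraction(1, 5), "C": Fraction(1, 10)}

# Activity at X for each choice of the two inputs feeding the first gate.
activity = {
    pair: internal_activity(probs[pair[0]], probs[pair[1]])
    for pair in combinations(sorted(probs), 2)
}
best_pair = min(activity, key=activity.get)

for pair, value in activity.items():
    print(pair, float(value))
print("lowest internal activity:", best_pair)
```

Pairing B and C first leaves A, the signal closest to probability 0.5, for the last stage, which matches the recommendation to postpone high-activity signals.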

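53 Glitching
[Figure: gate network with inputs A, B, C, internal node X and output Z; the input ABC changes from 101 to 000 and the waveforms at X and Z are shown under a unit-delay model]
Source: Jan M. Rabaey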

54 Example 1: Chain of NAND Cells
VDD / 2 Source: Jan M. Rabaey

55 Example 2: Adder Circuit
VDD / 2 Source: Jan M. Rabaey

56 How to Cope with Glitching?
[Figure: networks built from blocks F1, F2, F3 with different path lengths]
Equalize lengths of timing paths through design
Source: Jan M. Rabaey

57 Clock Gating
Power is reduced by two mechanisms
  Clock net toggles less frequently, reducing feff
  Registers' internal clock buffering switches less often
[Figure: local gating (register with din, dout, d, q, qn, clk and enable en) and global gating (FSM, Execution Unit and Memory Control with enables enF, enE, enM)]
Source: Jan M. Rabaey

58 Clock Gating Insertion
Local clock gating: 3 methods
  Logic synthesizer finds and implements local gating opportunities
  RTL code explicitly specifies clock gating
  Clock gating cell explicitly instantiated in RTL
Global clock gating: 2 methods
Source: Jan M. Rabaey

59 Clock Gating VHDL Code
Conventional RTL Code
-- always clock the register
process (clk)
begin
  if rising_edge(clk) then -- form the flip-flop
    if enable = '1' then
      q <= din;
    end if;
  end if;
end process;
Low Power Clock Gated RTL Code
-- only clock the register when enable is true
gclk <= enable and clk; -- gate the clock
process (gclk)
begin
  if rising_edge(gclk) then -- form the flip-flop
    q <= din;
  end if;
end process;
Instantiated Clock Gating Cell
-- instantiate a clock gating cell from the target library
I1: clkgx1 port map (en => enable, cp => clk, gclk_out => gclk);
process (gclk)
begin
  if rising_edge(gclk) then -- form the flip-flop
    q <= din;
  end if;
end process;
Source: Jan M. Rabaey

60 Clock Gating: Example
MPEG4 decoder (blocks: DSP/HIF, DEU, MIF, VDE, 896 Kb SRAM)
90% of flip-flops clock-gated
Power: 30.6 mW without clock gating, 8.5 mW with clock gating
70% power reduction by clock gating
Source: M. Ohashi, Matsushita, 2002

61 Data Gating
Objective
  Reduce wasted operations => reduce feff
Example
  Multiplier whose inputs change every cycle, whose output conditionally feeds an ALU
Low Power Version
  Inputs are prevented from rippling through the multiplier if the multiplier output is not selected
Source: Jan M. Rabaey

62 Data Gating Insertion
Two insertion methods
  Logic synthesizer finds and implements data gating opportunities
  RTL code explicitly specifies data gating
    Some opportunities cannot be found by synthesizers
Issues
  Extra logic in data path slows timing
  Additional area due to gating cells
Source: Jan M. Rabaey

63 Data Gating Verilog Code: Operand Isolation
Conventional Code
assign muxout = sel ? A : A*B ; // build mux
Low Power Code
assign multinA = sel ? 0 : A ; // hold operand at 0 while the multiplier result is not selected (sel = 1)
assign multinB = sel ? 0 : B ; // same isolation for the second operand
assign muxout = sel ? A : multinA*multinB ;
Source: Jan M. Rabaey

64 Influence of Threshold Voltage Vth
Influence on subthreshold leakage Isub
Influence on delay of logic cells
[Figure: Isub and delay as functions of Vth]
Speaker notes: For the logic gates, which are built from transistors, it follows that the threshold voltage influences the energy consumption through leakage currents and also affects the delay.

65 Influence of Gate Oxide Thickness Tox
Influence on gate oxide leakage Igate
Influence on delay
[Figure: Igate and delay as functions of Tox]
Speaker notes: Just as with the threshold voltage, the thickness of the transistors' gate oxide affects both the delay and the leakage current of the logic gates.

66 Recap: Data Paths
Data propagate through different data paths between registers (flip-flops, FF)
Paths mostly differ in propagation delay times
Frequency of clock signal (CLK) depends on path with longest delay → critical path

67 Recap: Slack
[Figure: timing diagram for gate G1 (inputs A, B; output Y) feeding gate G2 (inputs Y, C). After the delay of G1, G1 is ready with evaluation; the slack for G1 is the time from that point until all inputs of G2 have arrived]

68 Dual-Vth / Dual-Tox
Two different cell types:
"LVT / LTO" cells
  Cells consist of low-Vth or low-Tox transistors
  Low threshold voltage or thin gate oxide layer
  For critical paths
  High leakage / short delay
"HVT / HTO" cells
  Cells consist of high-Vth or high-Tox transistors
  High threshold voltage or thick gate oxide layer
  For uncritical paths
  Low leakage / long delay
Leakage reduction at constant performance (no level converter necessary)

69 Performance at different Dual-Vth
Measured at NAND2 BPTM 65nm Technology

70 Leakage Isub at different Dual-Vth
Measured at NAND2 BPTM 65nm Technology

71 Dual-Vth / Dual-Tox Example
[Figure legend: HVT and/or HTO cells; LVT and/or LTO cells; critical path]

72 Stack Effect
Transistor stack: at least two transistors of the same type (NMOS or PMOS) in series
Based on behavior of internal nodes
→ The more transistors are non-conducting (off), the lower the leakage
Source: K. Roy

73 Sleep Transistors
Idea: insertion of additional transistors between logic block and supply lines
These transistors are controlled by a SLEEP signal
If the circuit has nothing to do, the SLEEP signal is active: stack effect (an additional off transistor in series with the others)
If the sleep transistors are high-Vth: approach also called Multi-Threshold CMOS (MTCMOS)
Usually only one transistor is inserted
[Figure: low-Vth logic cells between Virtual Vdd and Virtual Vss, with sleep transistors connecting them to Vdd and Vss]
Source: Kaijian Shi, Synopsys

74 Sleep Transistors: Realization
Ring style sleep transistor implementation
[Figure: global VDD around the VVDD1 and VVDD2 domains]
Sleep transistors are placed around each VVDD island
Source: Kaijian Shi, Synopsys

75 Sleep Transistors: Realization cont’d
Grid style sleep transistor implementation
[Figure: global VDD grid interleaved with VVDD1 and VVDD2 rails]
VDD network crosses the chip; VVDD networks in each gating domain
Sleep transistors are placed in a grid connecting VDD and VVDDs
Source: Kaijian Shi, Synopsys

76 Sleep Transistors: Problems
Current I is not a leakage current!
  I is a discharging current of load capacitances
Sleep transistor can be modeled as resistor R
In active mode (cell is working):
  Current I flows through the sleep transistor
  Voltage Vx drops across the resistor
  Output voltage reduced to VDD-Vx
  Increased delay (of following blocks)
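The resistor model gives a quick estimate of the voltage lost across the sleep transistor. This is a minimal sketch using Ohm's law; the supply, current and resistance values are hypothetical and chosen only to show the calculation.

```python
def sleep_transistor_drop(current_a, r_sleep_ohm):
    """Vx = I * R across the sleep transistor modeled as a resistor."""
    return current_a * r_sleep_ohm


def reduced_output_voltage(vdd_v, current_a, r_sleep_ohm):
    """Output voltage in active mode: VDD - Vx."""
    return vdd_v - sleep_transistor_drop(current_a, r_sleep_ohm)


# Hypothetical values: 1.0 V supply, 1 mA switching current, 50 Ohm on-resistance.
vx = sleep_transistor_drop(1e-3, 50.0)
vout = reduced_output_voltage(1.0, 1e-3, 50.0)
print(vx, vout)
```

A wider sleep transistor lowers R and therefore Vx, at the cost of area and of more leakage through the sleep transistor itself when it is off.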

77 Stackforcing
Simple method of using the stack effect
Increasing the stack by splitting transistors
  Cin stays constant
  Only one technology is needed
  Area is (almost) the same
  Drive strength (drain-source current) is reduced → delay goes up

78 Stackforcing cont'd
[Figure: normalized delay versus normalized Isub, with and without stackforcing]
Source: Narendra, et al., ISLPED01

79 Input Vector Control (IVC)
Leakage of cell depends on input vector

80 Input Vector Control cont’d
Every circuit has an input vector with minimum leakage
Idea: if the design is in passive mode
  SLEEP signal becomes active
  Sleep vector is applied

81 Pin Reordering
BPTM, 65 nm technology
Gate leakage in a stack depends on the input vector
Logically equivalent input vectors (same number of '0's and '1's) can result in different leakage
If the input probability is known → reorder pins so that the most probable state has minimum gate leakage

82 Delay and Power versus VDD
[Figure: Pdyn and td versus VDD]
Dynamic power (and leakage) can be traded off against delay

83 Adaptive Dynamic Voltage/Frequency Scaling (DVS/DFS)
Slow down processor to fill idle time
More delay → lower operational voltage
Runtime scheduler determines processor speed and selects appropriate voltage
Transition delay for frequency changes < 150 µs
Potential to realize 10x energy savings
E.g.: Intel SpeedStep, AMD PowerNow, Transmeta LongRun
[Figure: active/idle timeline at 3.3 V compared with continuous activity at 2.4 V]
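The scheduling idea can be sketched as picking the slowest operating point that still meets a deadline. The two voltages below are the ones shown in the figure (3.3 V and 2.4 V); the frequencies, the cycle count, the deadlines and the assumption that energy scales with cycles * VDD^2 (constant switched capacitance, leakage ignored) are all hypothetical.

```python
# (frequency in Hz, supply voltage in V); frequencies are made up for illustration.
OPERATING_POINTS = [(100_000_000, 2.4), (200_000_000, 3.3)]


def pick_operating_point(cycles, deadline_s, points=OPERATING_POINTS):
    """Return the lowest-frequency point that finishes `cycles` before the deadline."""
    feasible = [(f, v) for f, v in points if cycles / f <= deadline_s]
    if not feasible:
        raise ValueError("no operating point meets the deadline")
    return min(feasible, key=lambda point: point[0])


def relative_energy(cycles, vdd_v):
    """Energy up to a constant factor: cycles * VDD^2 (capacitance dropped)."""
    return cycles * vdd_v ** 2


cycles = 1_000_000
relaxed = pick_operating_point(cycles, deadline_s=0.020)  # slack available
tight = pick_operating_point(cycles, deadline_s=0.006)    # no slack
saving = 1 - relative_energy(cycles, relaxed[1]) / relative_energy(cycles, tight[1])
print(relaxed, tight, f"{100 * saving:.0f} % less energy with the relaxed deadline")
```

Running fast and then idling spends the same number of cycles at the higher voltage, so filling the idle time at 2.4 V saves energy in this model even though the task takes longer.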

84 DVS/DFS with Transmeta LongRun
Source: Transmeta

85 Multi-VDD
Objective
  Reduce dynamic power by reducing the VDD^2 term
  Higher supply voltage used for speed-critical logic
  Lower supply voltage used for non speed-critical logic
Example
  Memory VDD = 1.2 V
  Logic VDD = 1.0 V
  Logic dynamic power savings = 30%
Source: Jan M. Rabaey
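The 30% figure follows from the quadratic VDD term. The sketch below assumes that capacitance, frequency and switching activity of the logic stay unchanged when its supply drops from 1.2 V to 1.0 V; the function name is illustrative.

```python
def dynamic_power_saving(vdd_high_v, vdd_low_v):
    """Fractional dynamic power saving when only VDD changes: 1 - (Vlow / Vhigh)^2."""
    return 1 - (vdd_low_v / vdd_high_v) ** 2


saving = dynamic_power_saving(1.2, 1.0)
print(f"{100 * saving:.1f} %")  # about 30.6 %
```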

86 Multi-VDD Issues
Partitioning
  Which blocks and modules should use which voltages?
  Physical and logical hierarchies should match as much as possible
Voltages
  Voltages should be as low as possible to minimize C * VDD^2 * f
  Voltages must be high enough to meet timing specs
Level shifters
  Needed (generally) to buffer signals crossing islands
  Added delays must be considered
Physical design
  Multiple VDD rails must be considered during floorplanning
Timing verification
  Timing verification must be performed for all corner cases across voltage islands
Source: Jan M. Rabaey

87 Multi-VDD Flow
1. Determine which blocks run at which Vdd
2. Multi-voltage synthesis
3. Determine floor plan
4. Multi-voltage placement
5. Clock tree synthesis
6. Route
7. Verify timing
Source: Jan M. Rabaey

88 Power-orientated Programming
[Figure: switched capacitance (nF), axis from 2000 to 14000, for bubble.c, heap.c and quick.c, broken down into Functional Unit, Pipeline Registers, Register File and Others]
→ Algorithms can differ in power dissipation
Source: Irwin, 2000
