ISCA 2004 Tutorial: Thermal Issues for Temperature-Aware Computer Systems. Saturday, June 19th, 8:00am - 5:00pm. © Mircea Stan, Kevin Skadron, David Brooks

1 ISCA 2004 Tutorial: Thermal Issues for Temperature-Aware Computer Systems. Saturday, June 19th, 8:00am - 5:00pm. © Mircea Stan, Kevin Skadron, David Brooks, 2002

2 Presenters
- Kevin Skadron (skadron@cs.virginia.edu), CS Department, University of Virginia
- Mircea Stan (mircea@virginia.edu), ECE Department, University of Virginia
- David Brooks (dbrooks@eecs.harvard.edu), CS Department, Harvard University
- Antonio Gonzalez (antonio@ac.upc.es), UPC-Barcelona, and Intel Barcelona Research Center
- Lev Finkelstein (lev.finkelstein@intel.com), Intel Haifa

3 Overview
1. Motivation (Kevin)
2. Thermal issues (Kevin)
3. Power modeling (David)
4. Thermal management (David)
5. Optimal DTM (Lev)
6. Clustering (Antonio)
7. Power distribution (David)
8. What current chips do (Lev)
9. HotSpot (Kevin)

5 Motivation
Power consumption: first-order design constraint
- unconstrained power is a theoretical max
- peak (inst.) power limits power delivery (dI/dt)
- sustained power limits thermal design/packaging
  - max sustained power: thermal “virus”
  - same as thermal design power
- average active power and idle power limit mobile battery life, etc.
- Common fallacy: equating instantaneous power with temperature
Power density is increasing even faster:
- thermal effects become more problematic
- Moore’s Law: exponential increase
Need Power/Temperature-aware computing!

6 Power density
(Figure: power density trend. From PACT 2000 keynote; source: Intel website.)
But this curve is flattening.

7 Power-aware figures of merit
- Power (P): battery time (mobile), packaging (high-performance)
- Energy (PD): battery life (mobile), fundamental limits (kT)
- Energy-delay (PD^2): performance and low power
- Energy-delay^2 (PD^3): emphasis on performance
- Power-aware is not the same as low power
- Similar to “old” VLSI complexity (A, AD, AD^2)
- None of these are appropriate for thermal
Refs:
- R. Gonzalez et al., “Supply and threshold voltage scaling for low power CMOS”, JSSC, Aug. 1997
- A. Martin et al., “Design of an Asynchronous MIPS R3000”, ARVLSI’97
- J. Ullman, “Computational aspects of VLSI”, CS Press, 1984
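
As a rough illustration of how the choice of metric changes which design looks better, the sketch below evaluates P, PD, PD^2, and PD^3 for two design points. The design names and their power and delay values are hypothetical, chosen only to show the ranking flip; they are not taken from the tutorial.

```python
def figures_of_merit(power_w, delay_s):
    """Return the four metrics from the slide for one design point."""
    energy_j = power_w * delay_s  # PD: energy for the task
    return {
        "P": power_w,
        "PD": energy_j,
        "PD^2": energy_j * delay_s,       # energy-delay
        "PD^3": energy_j * delay_s ** 2,  # energy-delay^2
    }


# Hypothetical design points: (power in W, delay in s).
designs = {
    "slow_low_power": (10.0, 1.0),
    "fast_high_power": (20.0, 0.6),
}

for name, (power_w, delay_s) in designs.items():
    print(name, figures_of_merit(power_w, delay_s))
```

With these made-up numbers the slower design wins on P and PD, while the faster design wins on PD^2 and PD^3, which is the sense in which the higher-order metrics emphasize performance.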

8 Cooking-aware computing
- Boiling water will come soon

9 Power and temperature are BAD and can be EVIL
Source: Tom’s Hardware Guide, http://www6.tomshardware.com/cpu/01q3/010917/heatvideo-01.html

11 Thermal issues
Temperature affects:
- Circuit performance
- Circuit power (leakage)
- IC reliability
- IC and system packaging cost
- Environment

12 Performance and leakage
Temperature affects:
- Transistor threshold and mobility
- Subthreshold leakage, gate leakage
- Ion, Ioff, Igate, delay
ITRS: 85°C for high-performance, 110°C for embedded!
(Figure labels: Ion, NMOS, Ioff)

13 Temperature-aware circuits
- Robustness constraint: sets Ion/Ioff ratio
- Robustness and reliability: Ion/Igate ratio
- Idea: keep ratios constant with T: trade leakage for performance!
Refs:
- Ghoshal et al., “Refrigeration Technologies…”, ISSCC 2000
- Garrett et al., “T3…”, ISCAS 2001

14 Resulting performance
- 25% - 30% extra performance (110°C to 0°C)
(Figure legend: regular, TAC)

15 Reliability
The Arrhenius Equation: $\mathrm{MTF} = A \cdot \exp\left(\frac{E_a}{K \cdot T}\right)$
- MTF: mean time to failure at T
- A: empirical constant
- $E_a$: activation energy
- K: Boltzmann’s constant
- T: absolute temperature
Failure mechanisms:
- Die metallization (corrosion, electromigration, contact spiking)
- Oxide (charge trapping, gate oxide breakdown, hot electrons)
- Device (ionic contamination, second breakdown, surface-charge)
- Die attach (fracture, thermal breakdown, adhesion fatigue)
- Interconnect (wirebond failure, flip-chip joint failure)
- Package (cracking, whisker and dendritic growth, lid seal failure)
Most of the above increase with T (Arrhenius).
Notable exception: hot electrons are worse at low temperatures.
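
To get a feel for how strongly the Arrhenius form depends on temperature, the sketch below compares MTF at two temperatures. The empirical constant A cancels in the ratio. The activation energy of 0.7 eV is an assumed illustrative value, not one given in the tutorial, and the two temperatures are simply the ITRS limits quoted earlier.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann's constant in eV/K


def mtf(a, ea_ev, temp_k):
    """Arrhenius mean time to failure: MTF = A * exp(Ea / (K * T))."""
    return a * math.exp(ea_ev / (BOLTZMANN_EV_PER_K * temp_k))


def mtf_ratio(ea_ev, cool_k, hot_k):
    """MTF at the cooler temperature divided by MTF at the hotter one.

    The empirical constant A cancels, so it is set to 1.0 here.
    """
    return mtf(1.0, ea_ev, cool_k) / mtf(1.0, ea_ev, hot_k)


EA_ASSUMED_EV = 0.7  # assumption for illustration only

ratio = mtf_ratio(EA_ASSUMED_EV, 85.0 + 273.15, 110.0 + 273.15)
print(f"MTF at 85 C is about {ratio:.1f}x the MTF at 110 C")
```

The ratio depends exponentially on the assumed activation energy, so the printed number is only as meaningful as that assumption, and the model itself is disputed in the thermal community.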

16 Arrhenius or Erroneous?
- “Hot” issue in thermal community: is the Arrhenius equation correct/relevant?
- C. Lasance (Philips): “Erroneous” equation
- Claim: what really matters are thermal gradients in space and time, thermal cycling
- Will not solve the dispute here!
- Agreement: thermal issues are key for reliability, whether static or dynamic
- Another famous quote: “We have a headache with Arrhenius” (T. Okada, Sony, when asked about reliability prediction methods)

17 Packaging cost
From Cray (local power generator and refrigeration)…
Source: Gordon Bell, “A Seymour Cray perspective”, http://www.research.microsoft.com/users/gbell/craytalk/

18 Packaging cost
To today…
- Grid computing: power plants co-located near compute farms
- IBM S/390: refrigeration
Source: R. R. Schmidt, B. D. Notohardjono, “High-end server low temperature cooling”, IBM Journal of R&D

19 IBM S/390 refrigeration
- Complex and expensive
Source: R. R. Schmidt, B. D. Notohardjono, “High-end server low temperature cooling”, IBM Journal of R&D

20 IBM S/390 processor packaging
- Processor subassembly: complex!
- C4: Controlled Collapse Chip Connection (flip-chip)
Source: R. R. Schmidt, B. D. Notohardjono, “High-end server low temperature cooling”, IBM Journal of R&D

21 Intel Itanium packaging
- Complex and expensive (note heatpipe)
Source: H. Xie et al., “Packaging the Itanium Microprocessor”, Electronic Components and Technology Conference 2002

22 P4 packaging
- Simpler, but still…
Source: Intel web site

23 Environment
- Environmental Protection Agency (EPA): computers account for 10% of commercial electricity consumption
  - This incl. peripherals, possibly also manufacturing
  - A DOE report suggested this percentage is much lower
  - No consensus, but it’s still a lot
- Equivalent power (with only 30% efficiency) for AC
- CFCs used for refrigeration
- Lap burn
- Fan noise

24 Heat mechanisms
- Conduction
- Convection
- Radiation
- Phase change
- Heat storage

25 Conduction
- Similar to electrical conduction (e.g. metals are good conductors)
- Heat flow from high energy to low energy
- Microscopic (vibration, adjacent molecules, electron transport)
- No major displacement of molecules
- Need a material: typically in solids (fluids: distance between molecules)
- Typical example: thermal “slug”, spreader, heatsink
Source: CRC Press, R. Remsburg Ed., “Thermal Design of Electronic Equipment”, 2001

26 Conduction
- Different materials (not a strong function of temperature)
- Si: more variation
Source: CRC Press, R. Remsburg Ed., “Thermal Design of Electronic Equipment”, 2001

27 Convection
- Macroscopic (bulk transport, mix of hot and cold, energy storage)
- Need material (typically in fluids: liquid, gas)
- Natural vs. forced (gas or liquid)
- Typical example: heatsink (fan), liquid cooling
Source: CRC Press, R. Remsburg Ed., “Thermal Design of Electronic Equipment”, 2001

28 Radiation
- Electromagnetic waves (can occur in vacuum)
- Negligible in typical applications
- Sometimes the only mechanism (e.g. in space)
Source: CRC Press, R. Remsburg Ed., “Thermal Design of Electronic Equipment”, 2001

29 Surface-to-surface contacts
- Not negligible, heat crowding
- Thermal greases (can “pump-out”)
- Phase Change Films (undergo a transition from solid to semi-solid with the application of heat)
Source: CRC Press, R. Remsburg Ed., “Thermal Design of Electronic Equipment”, 2001

30 Phase-change
Thermal solutions evolution:
- Natural air cooling
- Forced-air cooling
- Liquid cooling
- Phase change (e.g. heat pipe)
- Refrigeration
Phase change:
a. Solid changing to a liquid: fusion, or melting
b. Liquid changing to a vapor: evaporation, also boiling
c. Vapor changing to a liquid: condensation
d. Liquid changing to a solid: crystallization, or freezing
e. Solid changing to a vapor: sublimation
f. Vapor changing to a solid: deposition

31 Thermal capacitance
Example:
- $\rho$ (Aluminum) = 2,710 kg/m^3
- $C_p$ (Aluminum) = 875 J/(kg·°C)
- $V = t \cdot A$ = 0.000025 m^3
- $C_{bulk} = V \cdot C_p \cdot \rho$ = 59.28 J/°C
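
The arithmetic in this example can be checked with a few lines of Python. This is only a sketch: the function and parameter names are made up for illustration, and the inputs are the aluminum values quoted on the slide.

```python
def bulk_thermal_capacitance(volume_m3, specific_heat_j_per_kg_c, density_kg_per_m3):
    """C_bulk = V * C_p * rho, in J/degC."""
    return volume_m3 * specific_heat_j_per_kg_c * density_kg_per_m3


# Aluminum values from the slide.
c_bulk = bulk_thermal_capacitance(
    volume_m3=0.000025,
    specific_heat_j_per_kg_c=875.0,
    density_kg_per_m3=2710.0,
)
print(f"C_bulk = {c_bulk:.2f} J/degC")  # prints C_bulk = 59.28 J/degC
```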

32 Refrigeration
- “Conventional” vs. thermo-electric (TEC)
- Can get T < T_amb (“negative” Rth!)
- TEC: Peltier effect (can use for local cooling)

33 TEC electro-thermal model

34 Simplistic steady-state model
- All thermal transfer: R = k/A
- Power density matters!
- Ohm’s law for thermals (steady-state): $\Delta V = I \cdot R \rightarrow \Delta T = P \cdot R$
- $T_{hot} = P \cdot R_{th} + T_{amb}$
Ways to reduce T_hot:
- reduce P (power-aware)
- reduce Rth (packaging)
- reduce T_amb (Alaska?)
- maybe also take advantage of transients (Cth)
(Figure labels: T_hot, T_amb)
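
The steady-state relation is simple enough to sketch directly. The function names and the numeric values below (50 W, 0.8 °C/W, 45 °C ambient, 85 °C limit) are hypothetical, picked only to make the arithmetic easy to follow.

```python
def steady_state_temperature(power_w, rth_c_per_w, t_amb_c):
    """T_hot = P * Rth + T_amb."""
    return power_w * rth_c_per_w + t_amb_c


def max_power_for_limit(t_limit_c, rth_c_per_w, t_amb_c):
    """Largest sustained power that keeps T_hot at or below t_limit_c."""
    return (t_limit_c - t_amb_c) / rth_c_per_w


# Hypothetical operating point.
t_hot = steady_state_temperature(power_w=50.0, rth_c_per_w=0.8, t_amb_c=45.0)
print(f"T_hot = {t_hot:.1f} degC")

# Each lever from the slide, applied to an 85 degC limit.
print(max_power_for_limit(85.0, 0.8, 45.0))  # baseline power budget
print(max_power_for_limit(85.0, 0.6, 45.0))  # better packaging (lower Rth)
print(max_power_for_limit(85.0, 0.8, 25.0))  # cooler ambient
```

Lowering Rth or T_amb raises the sustainable power budget, while lowering P directly lowers T_hot; the sketch deliberately ignores transients (Cth).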

35 Simplistic dynamic thermal model
Electrical-thermal duality:
- V ↔ temp (T)
- I ↔ power (P)
- R ↔ thermal resistance (Rth)
- C ↔ thermal capacitance (Cth)
- RC ↔ time constant
KCL differential eq.: $I = C \cdot \frac{dV}{dt} + \frac{V}{R}$
Difference eq.: $\Delta V = \frac{I}{C} \cdot \Delta t - \frac{V}{RC} \cdot \Delta t$
Thermal domain: $\Delta T = \frac{P}{C} \cdot \Delta t - \frac{T}{RC} \cdot \Delta t$, where $T = T_{hot} - T_{amb}$
One can compute stepwise changes in temperature at any granularity at which P, T, R, and C are available.
(Figure labels: T_hot, T_amb)
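
A minimal sketch of the stepwise update, assuming the decay term enters with a minus sign, as follows from solving the KCL equation I = C · dV/dt + V/R for dV/dt. The update is a plain forward-Euler step. The function name and the values for P, Rth, Cth, and the time step are hypothetical.

```python
def simulate_temperature_rise(power_w, rth, cth, dt_s, steps, t_rise0=0.0):
    """Step T = T_hot - T_amb forward in time with a forward-Euler update.

    Each step applies: delta_T = (P / C) * dt - (T / (R * C)) * dt
    Returns the list of T values, starting with the initial value.
    """
    t_rise = t_rise0
    trace = [t_rise]
    for _ in range(steps):
        t_rise += (power_w / cth - t_rise / (rth * cth)) * dt_s
        trace.append(t_rise)
    return trace


# Hypothetical values: 50 W into Rth = 0.8 degC/W and Cth = 60 J/degC,
# giving a time constant RC = 48 s. Simulate 500 s in 0.1 s steps.
trace = simulate_temperature_rise(power_w=50.0, rth=0.8, cth=60.0, dt_s=0.1, steps=5000)

print(f"rise after one time constant: {trace[480]:.1f} degC")
print(f"rise after 500 s:             {trace[-1]:.1f} degC")
print(f"steady-state P * Rth:         {50.0 * 0.8:.1f} degC")
```

The time step must stay well below RC for forward Euler to remain accurate; after many time constants the rise settles toward the steady-state value P · Rth.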

36 Combined package model
Steady-state:
- Tj: junction temperature
- Tc: case temperature
- Ts: heatsink temperature
- Ta: ambient temperature
Source: CRC Press, R. Remsburg Ed., “Thermal Design of Electronic Equipment”, 2001
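
One common way to read these four temperatures is as nodes on a series chain of thermal resistances from junction to ambient. The sketch below assumes exactly that: a single heat path with three resistances named r_jc, r_cs, and r_sa. Those names, the series topology, and the numeric values are assumptions for illustration; the figure on the original slide is not available here.

```python
def package_temperatures(power_w, t_ambient_c, r_jc, r_cs, r_sa):
    """Steady-state node temperatures for an assumed series resistance chain.

    r_jc: junction-to-case, r_cs: case-to-heatsink, r_sa: heatsink-to-ambient,
    all in degC/W. All of the power is assumed to flow through every stage.
    """
    t_sink = t_ambient_c + power_w * r_sa
    t_case = t_sink + power_w * r_cs
    t_junction = t_case + power_w * r_jc
    return {"Ta": t_ambient_c, "Ts": t_sink, "Tc": t_case, "Tj": t_junction}


# Hypothetical values: 50 W, 45 degC ambient.
temps = package_temperatures(power_w=50.0, t_ambient_c=45.0, r_jc=0.3, r_cs=0.1, r_sa=0.4)
for name in ("Ta", "Ts", "Tc", "Tj"):
    print(f"{name} = {temps[name]:.1f} degC")
```

Under this assumption the junction temperature reduces to Tj = Ta + P · (r_jc + r_cs + r_sa), the same form as the single-resistance steady-state model.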

37 Itanium package model
- Example: processor + 4 cache modules
Source: H. Xie et al., “Packaging the Itanium Microprocessor”, Electronic Components and Technology Conference 2002

38 Thermal issues summary
- Performance, power, reliability
- Architecture-level: conduction only
  - Convection: too complicated
  - Radiation: can be ignored
- Use compact models for package
- Power density is key

