GREEN COMPUTING Power Consumption Basics in ICT Products Maziar Goudarzi
Outline Metrics Energy consumption in ICT products Some common energy optimization techniques Acknowledgements: Some slides/parts from http://www.ida.liu.se/~TDDD50/
Electrical Units
Power Metrics
Performance related energy metrics Energy-per-instruction (EPI) Energy spent to execute an instruction Used to compare micro-architectural traits Sometimes used to model software energy consumption Not all instructions consume the same energy Application energy consumption Power vs. Time
Comparing CPU energies Example: Same program, AMD CPU, 2GHz, 150W, 10s Intel CPU, 2.5GHz, 200W, 8s Which one is better? Another (perhaps better) example Same program Atom processor, 1.5GHz, 10W, 20s Core i7 processor, 2GHz, 55W, 5s
Performance related energy metrics Energy delay product (EDP) Encourages low consumption and fast runtime Energy or delay increase → EDP increases EDP = Watts * runtime^2 Energy = Watts * runtime Delay = runtime
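The two comparisons from the "Comparing CPU energies" slide can be worked through with these definitions. A minimal sketch using the figures given on that slide; the function and variable names are illustrative only:

```python
def energy_joules(power_watts, runtime_s):
    """Energy = Watts * runtime."""
    return power_watts * runtime_s


def edp(power_watts, runtime_s):
    """Energy delay product = Energy * Delay = Watts * runtime^2."""
    return energy_joules(power_watts, runtime_s) * runtime_s


# (power in W, runtime in s), taken from the slide's two examples
cpus = {
    "AMD 2GHz": (150, 10),
    "Intel 2.5GHz": (200, 8),
    "Atom 1.5GHz": (10, 20),
    "Core i7 2GHz": (55, 5),
}

results = {name: (energy_joules(p, t), edp(p, t)) for name, (p, t) in cpus.items()}
for name, (e, d) in results.items():
    print(f"{name}: energy = {e} J, EDP = {d} J*s")
```

With these numbers the AMD and Atom parts consume less energy (1500 J vs. 1600 J, 200 J vs. 275 J), while the Intel and Core i7 parts have the lower EDP (12800 vs. 15000, 1375 vs. 4000), so which CPU is "better" depends on the metric chosen.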
Outline Metrics Energy consumption in ICT products Some common energy optimization techniques
Power Consumption Fundamentals Most widely used technology today CMOS (Complementary Metal Oxide Semiconductor) technology Technology name Minimum feature size: 65nm, 45nm, … Latest technology?
Power Consumption Fundamentals Elements of power consumption Dynamic power Dissipated when charging/discharging capacitors Inevitable! Static power Leakage Total waste! Was negligible until recently Increased with technology scaling (<180nm) 20 to 40% in today's processors AMD Opteron X2: 300mm wafer, 117 chips, 90nm technology Opteron X4: 45nm technology
CMOS Leakage Transistor is not a perfect digital switch! Subthreshold leakage Gate leakage -> high-k dielectric Junction leakage
Subthreshold Leakage Subthreshold leakage depends on
Outline Metrics Energy consumption in ICT products Some common energy optimization techniques Static power reduction Dynamic power reduction
Leakage reduction techniques Subthreshold leakage depends on Architectural techniques to reduce leakage Stacking effect and gated Vdd Drowsy effect Threshold voltage manipulation
Stacking effect and gated Vdd Connection of transistors in series source to drain Reduces the Vds of each transistor Popular stacking technique: Gated Vdd Sleep transistor gates the ground (disconnects power supply)
Gated Vdd for SRAM Dynamically Resized Instruction Cache Cache decay Disable individual lines Managed with counters to estimate dead lines Disabled lines lose the state Expensive management Stefanos Kaxiras, Zhigang Hu, Margaret Martonosi, Cache Decay: Exploiting Generational Behavior to Reduce Cache Leakage Power, ISCA, 2001.
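The counter-based management of cache decay can be illustrated with a toy model. This is a sketch under simplifying assumptions (direct-mapped cache, one idle counter per line, a fixed decay interval in ticks); the class and parameter names are hypothetical and not taken from the cited paper:

```python
class DecayLine:
    def __init__(self):
        self.tag = None        # no contents while powered off
        self.idle_ticks = 0    # ticks since the last access
        self.powered = False   # False means the line is gated off


class DecayCache:
    """Toy direct-mapped cache whose idle lines are switched off."""

    def __init__(self, num_lines, decay_interval):
        self.lines = [DecayLine() for _ in range(num_lines)]
        self.decay_interval = decay_interval

    def access(self, address):
        """Return True on a hit; a miss (re)fills and powers the line."""
        index = address % len(self.lines)
        tag = address // len(self.lines)
        line = self.lines[index]
        hit = line.powered and line.tag == tag
        line.tag = tag
        line.powered = True
        line.idle_ticks = 0
        return hit

    def tick(self):
        """Advance time; lines idle for decay_interval ticks lose their state."""
        for line in self.lines:
            if not line.powered:
                continue
            line.idle_ticks += 1
            if line.idle_ticks >= self.decay_interval:
                line.powered = False
                line.tag = None

    def powered_lines(self):
        return sum(line.powered for line in self.lines)
```

For example, with `DecayCache(4, 3)`, accessing address 0 twice gives a miss and then a hit; after three ticks without access the line is switched off, and the next access to address 0 misses again because the state was lost.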
Drowsy effect Voltage scaling of idle memory cells Two levels of supply voltage (Vdd and VddLow) Transistors leak much less than with full Vdd No loss of memory state High level policies for drowsy caches No need for complex management mechanisms Reading delay (cell voltage scaled back to Vdd) Worst case is a few cycles of delay Examples Simple: whole cache periodically put in drowsy mode Petit et al.: Simple with heuristics, such as avoiding setting the Most Recently Used (MRU) line to drowsy mode
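The "simple" policy and the MRU heuristic can be sketched as a small cycle-level model. The window length, the one-cycle wake-up penalty, and all names below are assumptions made for illustration:

```python
class DrowsyCache:
    """Toy model: every `window` cycles all lines are put in drowsy mode."""

    def __init__(self, num_lines, window, wake_penalty=1, keep_mru=False):
        self.drowsy = [False] * num_lines
        self.window = window              # cycles between drowsy sweeps
        self.wake_penalty = wake_penalty  # extra cycles to restore full Vdd
        self.keep_mru = keep_mru          # heuristic: leave the MRU line awake
        self.mru = None
        self.cycle = 0

    def step(self, accessed_line=None):
        """Advance one cycle, optionally accessing a line; return extra delay."""
        self.cycle += 1
        delay = 0
        if accessed_line is not None:
            if self.drowsy[accessed_line]:
                # State is preserved; only the wake-up latency is paid.
                self.drowsy[accessed_line] = False
                delay = self.wake_penalty
            self.mru = accessed_line
        if self.cycle % self.window == 0:
            for i in range(len(self.drowsy)):
                self.drowsy[i] = not (self.keep_mru and i == self.mru)
        return delay
```

With `window=2`, a line accessed right after a sweep pays the wake-up penalty under the simple policy, whereas with `keep_mru=True` a repeated access to the most recently used line pays nothing.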
Threshold voltage manipulation The lower the VT, the higher the leakage Technology scaling enforces Reduce Vdd to reduce power consumption and temperature Reduce VT to reduce delay Architectural level techniques Combination of high-VT and low-VT devices High-VT : low leakage, long latency Low-VT : high leakage, short latency Gated-Vdd using a high-VT device
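How strongly leakage reacts to VT can be illustrated with the common textbook approximation in which subthreshold current scales as exp(-VT / (n * kT/q)). The model itself, the slope factor n = 1.5, and the example voltages are assumptions brought in for illustration; they are not given on the slides:

```python
import math

BOLTZMANN = 1.380649e-23           # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_kelvin):
    """Thermal voltage kT/q in volts."""
    return BOLTZMANN * temp_kelvin / ELECTRON_CHARGE


def leakage_ratio(vt_high, vt_low, temp_kelvin=300.0, n=1.5):
    """Leakage at vt_low relative to leakage at vt_high (assumed model)."""
    return math.exp((vt_high - vt_low) / (n * thermal_voltage(temp_kelvin)))


# Lowering VT from 0.4 V to 0.3 V at 300 K (hypothetical values)
ratio = leakage_ratio(0.4, 0.3)
print(f"leakage increases by a factor of about {ratio:.1f}")
```

Under these assumptions a 100 mV reduction in VT raises subthreshold leakage by roughly an order of magnitude, which is why low-VT devices are reserved for paths where latency matters.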
Variable Threshold CMOS Body Biasing Body effect to change device Vth Standby leakage reduction with maximum reverse bias Triple well structure http://mtlweb.mit.edu/researchgroups/icsystems/pubs/tutorials/jkao_2002_iccad_I.pdf
Outline Metrics Energy consumption in ICT products Some common energy optimization techniques Static power reduction Dynamic power reduction
Capacitance and switching activity Capacitance and switching factor intertwined P = C⋅V^2⋅A⋅f Capacitance (C) Fixed at design time Dependent on number of transistors Interconnections Switching activity or factor (A) Fraction between 0 and 1 Fraction of capacitance charged/discharged each CPU cycle
Capacitance Description of capacitance (Burd and Brodersen) C_L = C_W + C_fixed C_W: Product of technology constant and device width Optimized at circuit level C_fixed: Capacitance of the interconnections Optimized at architectural level Reduction of wire length Effective placement and routing (locality) Break up large memory banks into smaller chunks
Excess switching activity Avoidable charge/discharge activity Types Idle-unit Idle-width Idle-capacity Parallel-speculative Cacheable Speculative
Idle-unit switching activity Triggered by clock activity in unused units
Idle-width switching activity Processor structures wider than needed Example Units with support for 64 bit operands Most common operations use 16 bit operands Solutions Adapt width of machine according to operands Pack multiple narrow-width operations
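Width adaptation starts from detecting that both operands of an operation are narrow. A minimal software sketch of that check, assuming signed two's-complement operands and a 16-bit narrow width; the function names and sample operands are hypothetical:

```python
def bits_needed(value):
    """Bits needed to hold a signed integer in two's complement."""
    if value >= 0:
        return value.bit_length() + 1   # +1 for the sign bit
    return (~value).bit_length() + 1


def is_narrow(a, b, narrow_bits=16):
    """True if both operands fit in narrow_bits, so upper bits could be gated."""
    return bits_needed(a) <= narrow_bits and bits_needed(b) <= narrow_bits


def narrow_fraction(operand_pairs, narrow_bits=16):
    """Fraction of operations whose operands are both narrow."""
    pairs = list(operand_pairs)
    if not pairs:
        return 0.0
    return sum(is_narrow(a, b, narrow_bits) for a, b in pairs) / len(pairs)


sample = [(1, 2), (70000, 1), (-5, 300), (2**40, 3)]
print(narrow_fraction(sample))
```

In hardware the same test is typically a check on the upper bits of each operand; the fraction reported here estimates how often the upper part of a 64-bit unit would switch unnecessarily.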
Width adaptation
Idle-capacity switching activity Over-provisioned processor resources Resource partitioning or re-sizing Grounds Wire delay increases as technology scales down Long wires imply Unaffordable delay High capacitance and consumption Buffered wires reduce circuit delay
Complexity-adaptive structures Complexity-adaptive structures (Albonesi) Trade latency & consumption with capacity Structures become faster as they become smaller Solution Partitions with tri-state buffers When structures are reduced Faster processing Less energy consumed Suitable for SRAM
Parallel speculative switching activity Parallel activity is spent for performance Associative caches All but one associative ways fail to produce a hit All ways are accessed in parallel for speed Solution: Smart way access approaches
Phased Cache
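A rough count of array reads shows why phased access saves energy compared with reading all ways in parallel. The sketch assumes the usual description of a phased cache (all tags are compared first, then only the matching data way is read); the function names and the hit pattern are illustrative:

```python
def parallel_reads(ways, hit):
    """Conventional access: every tag way and every data way is read at once."""
    return {"tag": ways, "data": ways}


def phased_reads(ways, hit):
    """Phased access: all tags first, then only the hitting data way (if any)."""
    return {"tag": ways, "data": 1 if hit else 0}


def total_reads(policy, ways, hits):
    """Sum tag and data array reads over a sequence of hit/miss outcomes."""
    totals = {"tag": 0, "data": 0}
    for hit in hits:
        reads = policy(ways, hit)
        totals["tag"] += reads["tag"]
        totals["data"] += reads["data"]
    return totals


hits = [True, True, False, True]   # hypothetical access outcomes
parallel = total_reads(parallel_reads, 4, hits)
phased = total_reads(phased_reads, 4, hits)
print(parallel, phased)
```

For this 4-way example the data-array reads drop from 16 to 3 while the tag reads stay at 16. The price is latency: a hit needs two sequential phases instead of one.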
Sequential cache
Cache Way Memoization Upon failure
Voltage-Frequency Scaling Basic dynamic power equation: P = C⋅V^2⋅A⋅f Reducing voltage decreases power quadratically Maximum frequency is limited by voltage Potential cubic reduction in power dissipation Considering f and V Performance decreases linearly
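The cubic reduction can be checked numerically. The sketch below makes the idealized assumption that voltage can be scaled by the same factor as frequency; the capacitance, voltage, activity, and frequency values are hypothetical:

```python
def dynamic_power(capacitance, voltage, activity, frequency):
    """Dynamic power P = C * V^2 * A * f, in watts."""
    return capacitance * voltage ** 2 * activity * frequency


# Hypothetical baseline operating point
C, V, A, F = 1e-9, 1.2, 0.5, 2e9
scale = 0.8   # scale both V and f to 80% (idealized assumption)

p_base = dynamic_power(C, V, A, F)
p_scaled = dynamic_power(C, V * scale, A, F * scale)

power_ratio = p_scaled / p_base        # equals scale^3
runtime_ratio = 1 / scale              # fixed cycle count, lower frequency
energy_ratio = power_ratio * runtime_ratio

print(f"power x{power_ratio:.3f}, runtime x{runtime_ratio:.2f}, "
      f"energy x{energy_ratio:.2f}")
```

Scaling both to 80% cuts power to about 51% (0.8^3) while runtime grows by 25%, so the energy for a fixed amount of work falls to about 64% (0.8^2).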
Dynamic voltage/frequency scaling (DVFS) Dynamic adjustment of voltage/frequency Tradeoff power dissipation / performance DVFS decision level Hardware level Exploits different timings of hardware components Program level Program behavior drives decision E.g. scale down when the program knows that it has to wait System level (OS) Idleness of the system drives decision Voltage/frequency scaled to eliminate idle periods
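A system-level policy that scales frequency to eliminate idle periods can be sketched as follows. This is a simplified illustration, not the algorithm of any particular operating system; the function name, the frequency table, and the assumption that work scales linearly with frequency are all hypothetical:

```python
def pick_frequency(utilization, current_freq, available_freqs):
    """Pick the lowest frequency that still covers the observed demand.

    utilization: fraction of the last interval the CPU was busy (0..1)
    current_freq: frequency during that interval (same unit as the table)
    """
    demand = utilization * current_freq   # cycles per second actually needed
    for freq in sorted(available_freqs):
        if freq >= demand:
            return freq
    return max(available_freqs)           # demand exceeds the table: run flat out


freqs_mhz = [800, 1200, 1600, 2000]
print(pick_frequency(0.5, 1600, freqs_mhz))    # half idle at 1600 MHz
print(pick_frequency(0.75, 2000, freqs_mhz))   # a quarter idle at 2000 MHz
```

A CPU that was busy half the time at 1600 MHz needs only 800 MHz to do the same work with no idle time, so the policy steps down to the lowest table entry that covers that demand.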
Dynamic voltage/frequency scaling (DVFS) Examples of commercial systems Intel SpeedStep AMD PowerNow! (for laptops) Cool'n'Quiet (for desktops and servers) Decision taken at system level Changes through specific CPU register Enhanced Intel® SpeedStep® Technology for the Intel® Pentium® M Processor (White Paper) http://download.intel.com/design/network/papers/30117401.pdf
Additional exercise On your personal computer, apply DVFS to the processor and measure its power consumption under different applications. Report the power consumption of the processor separately from the power consumed by other components. What effect do you observe?
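On a Linux machine, one possible starting point for this exercise is the kernel's cpufreq and powercap (RAPL) sysfs interfaces. The sketch below assumes that the paths shown exist on the machine, which depends on the CPU, kernel, and drivers, and that the energy counter is readable (this may require root). It reports processor package power only, not whole-system power:

```python
import time
from pathlib import Path

# Assumed locations; adjust or verify on the target machine.
CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def read_int(path):
    return int(Path(path).read_text().strip())


def average_power_watts(start_uj, end_uj, seconds, max_range_uj=None):
    """Average power from two energy counter readings in microjoules."""
    delta = end_uj - start_uj
    if delta < 0 and max_range_uj is not None:
        delta += max_range_uj   # the counter wrapped around between readings
    return delta / 1e6 / seconds


def current_dvfs_state():
    """Current governor and frequency (kHz) of CPU 0."""
    return {
        "governor": (CPUFREQ_DIR / "scaling_governor").read_text().strip(),
        "freq_khz": read_int(CPUFREQ_DIR / "scaling_cur_freq"),
    }


def measure_package_power(seconds=1.0):
    """Sample the package energy counter over an interval."""
    start = read_int(RAPL_DIR / "energy_uj")
    time.sleep(seconds)
    end = read_int(RAPL_DIR / "energy_uj")
    max_range = read_int(RAPL_DIR / "max_energy_range_uj")
    return average_power_watts(start, end, seconds, max_range)
```

Calling `current_dvfs_state()` and `measure_package_power()` while running different workloads under different governors gives the kind of comparison the exercise asks for.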
Coming Next Power Aware Computing Higher-level power reduction techniques