Managing Static (Leakage) Power. S. Kaxiras, M. Martonosi, “Computer Architecture Techniques for Power-Efficiency”, Chapter 5.
Static Power. Remember: static power has increased to a significant percentage of total power consumption. It was seen in older technologies, but CMOS prevents open paths from Vdd to Gnd. Leakage increases exponentially as VT is lowered. Currently, 20-40% of power consumption is leakage. Leakage power: Pleak = Vdd × Ileak, where Ileak is predominantly sub-threshold leakage and gate-oxide leakage.
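As a small worked example of Pleak = Vdd × Ileak, the sketch below computes leakage power and its share of total power. The function names and all numbers (1.0 V supply, 5 A total leakage current, 15 W dynamic power) are illustrative assumptions, not figures from the chapter.

```python
def leakage_power(vdd, i_leak):
    """Static power in watts: supply voltage (V) times total leakage current (A)."""
    return vdd * i_leak


def leakage_fraction(p_leak, p_dynamic):
    """Share of total power that is leakage."""
    return p_leak / (p_leak + p_dynamic)


# Hypothetical chip: all three inputs are made-up illustrative values.
p_leak = leakage_power(vdd=1.0, i_leak=5.0)
share = leakage_fraction(p_leak, p_dynamic=15.0)
print(f"leakage = {p_leak:.1f} W, {share:.0%} of total power")
```

With these assumed inputs the leakage share lands inside the 20-40% range quoted on the slide.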
Sub-threshold Leakage. Transistors are not perfect digital switches: current still flows when the gate voltage is below threshold (the transistor is “off”). Leakage increases as Vdd and VT are scaled down.
Sub-threshold Leakage - Current. Exponentially dependent on Vds, Vgs, VT, and T. Vds: voltage differential between drain and source; transistor stacking and drowsy (or DVS) techniques aim to reduce it. Vgs: gate-to-source voltage; can be set to 0 when considering sub-threshold leakage. VT: threshold voltage; can be raised to reduce leakage (trades off with performance). T: temperature; leakage grows with temperature, which can lead to thermal runaway.
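To make these dependences concrete, here is a minimal sketch of a commonly used simplified textbook form of the sub-threshold current. It is not necessarily the exact equation used in the chapter. The prefactor `i0`, the slope factor `n`, and the drain-induced barrier lowering coefficient `eta` are assumed, illustrative values, and the temperature dependence of VT itself is ignored.

```python
import math

K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C


def thermal_voltage(temp_k):
    """Thermal voltage kT/q in volts (about 26 mV at 300 K)."""
    return K_BOLTZMANN * temp_k / Q_ELECTRON


def subthreshold_current(vgs, vds, vt, temp_k, i0=1e-6, n=1.5, eta=0.1):
    """Simplified sub-threshold current in amperes.

    i0, n and eta are hypothetical device parameters chosen for illustration.
    """
    v_th = thermal_voltage(temp_k)
    # Exponential in (Vgs - VT); eta * Vds models drain-induced barrier lowering.
    gate_term = math.exp((vgs - vt + eta * vds) / (n * v_th))
    # Drain term: close to 1 once Vds is a few thermal voltages.
    drain_term = 1.0 - math.exp(-vds / v_th)
    return i0 * gate_term * drain_term


base = subthreshold_current(vgs=0.0, vds=1.0, vt=0.3, temp_k=300.0)
higher_vt = subthreshold_current(vgs=0.0, vds=1.0, vt=0.4, temp_k=300.0)
lower_vds = subthreshold_current(vgs=0.0, vds=0.3, vt=0.3, temp_k=300.0)
hotter = subthreshold_current(vgs=0.0, vds=1.0, vt=0.3, temp_k=360.0)
print(base, higher_vt, lower_vds, hotter)
```

Under this model, raising VT or lowering Vds reduces the current, while a higher temperature increases it, which matches the qualitative trends listed on the slide.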
Gate-oxide Leakage. Caused by “tunneling”: electrons escape through the gate insulator. A thicker layer of high-k material can be used, which reduces tunneling without compromising performance. (*from Wikipedia)
Leakage Reduction Techniques. Stacking effect and gated Vdd: dynamically resized cache (DRI), cache decay, adaptive mode control (AMC), and functional unit decay. Drowsy effect: drowsy caches, hybrid approaches, temperature-adaptive approaches, and compiler approaches. Threshold voltage manipulation: dynamic (DVFS and adaptive body biasing) and static (MTCMOS functional units, asymmetric memory cells).
Stacking. Non-state-preserving. Significant leakage reduction, but with a power-up latency (tens of cycles). Two stacked transistors that are “off” leak less than one. Gated Vdd is a common implementation.
Dynamically Resized Cache (DRI). Resizes the instruction cache to fit the current working set of code. The rest of the cache is turned off using gated Vdd. A direct-mapped DRI cache is shown to the right.
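One way such resizing could be driven is by comparing the miss count over a fixed interval against a bound. The sketch below is an assumption for illustration only: the function name, the `miss_bound` threshold, the size limits, and the resize factor of 2 are hypothetical and are not taken from the slides.

```python
def next_cache_size(current_size, misses, miss_bound, min_size, max_size, factor=2):
    """Pick the cache size (in bytes) for the next interval.

    Too many misses: the working set does not fit, so grow.
    Few misses: the cache is larger than needed, so shrink and gate off the rest.
    """
    if misses > miss_bound:
        return min(current_size * factor, max_size)
    if misses < miss_bound:
        return max(current_size // factor, min_size)
    return current_size


# Hypothetical 32 KB cache that saw only 10 misses against a bound of 100.
print(next_cache_size(32 * 1024, misses=10, miss_bound=100,
                      min_size=1024, max_size=64 * 1024))
```

The minimum size keeps the cache from shrinking so far that misses dominate, and the portion above the chosen size is what gated Vdd would switch off.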
Cache Decay. Turns off a cache line after a number of cycles if the line has not been accessed. It is essential to accurately predict when a cache line is no longer useful. L1 cache lines are generational: when a line is loaded, several accesses occur immediately after, followed by a long dead period.
Cache Decay - Implementation. A counter on each cache line advances every few hundred cycles. The decay interval is defined globally. The first access to a decayed line is a miss, which resets the counter and restores power.
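The counter mechanism can be sketched as a small simulation. This is a simplified model under stated assumptions: tags and data are omitted, lines are addressed directly by index, and the class name, `tick_cycles`, and `max_count` are hypothetical parameters (the decay interval here is `tick_cycles * max_count`).

```python
class DecayCache:
    """Toy model of per-line decay counters driven by a coarse global tick."""

    def __init__(self, num_lines, tick_cycles=512, max_count=4):
        self.tick_cycles = tick_cycles  # global tick period, in cycles
        self.max_count = max_count      # ticks of inactivity before a line decays
        self.cycle = 0
        self.counters = [0] * num_lines
        self.powered = [True] * num_lines

    def advance(self, cycles):
        """Let time pass; every tick_cycles cycles, bump the per-line counters."""
        for _ in range(cycles):
            self.cycle += 1
            if self.cycle % self.tick_cycles == 0:
                self._global_tick()

    def _global_tick(self):
        for i in range(len(self.counters)):
            if self.powered[i]:
                self.counters[i] += 1
                if self.counters[i] >= self.max_count:
                    self.powered[i] = False  # gated Vdd: the line loses its contents

    def access(self, index):
        """Return True on a hit, False on a decay-induced miss.

        Either way the line ends up powered with its counter reset.
        """
        hit = self.powered[index]
        self.powered[index] = True
        self.counters[index] = 0
        return hit


cache = DecayCache(num_lines=2, tick_cycles=4, max_count=2)
cache.advance(4)        # first global tick
cache.access(0)         # line 0 is still in use, so its counter is reset
cache.advance(4)        # second tick: line 1 reaches max_count and decays
print(cache.powered)    # line 0 stays on, line 1 is off
```

Using a coarse global tick keeps the per-line counters very small, which matters because the counters themselves consume power.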
Adaptive Cache Decay and Adaptive Mode Control. Small decay intervals can cause more decay-induced misses. Large intervals can leave cache lines active during their dead time. Mechanisms to adjust the decay interval dynamically: per-cache-line adaptive decay, global decay-interval adaptive techniques, and control-theoretic techniques.
Drowsy Effect - Data Caches. State-preserving. Medium leakage reduction, with a small (<10 cycles) power-up latency. Instead of cutting power off, the supply voltage is scaled down to VddLow. A line must be switched back to Vdd before it is accessed. Policies: “simple”, MRO, TMRO.
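A minimal sketch of a periodic drowsy policy follows, assuming that all lines are put into drowsy mode at the end of every fixed window and that a drowsy line costs a small wake-up penalty on its next access. The class name, the 4000-cycle window, and the 1-cycle penalty are illustrative assumptions rather than values given on the slide.

```python
class DrowsyCache:
    """Toy model: lines keep their data in drowsy mode but must wake before use."""

    def __init__(self, num_lines, window_cycles=4000, wake_penalty=1):
        self.window_cycles = window_cycles  # period after which all lines go drowsy
        self.wake_penalty = wake_penalty    # extra cycles to return to full Vdd
        self.cycle = 0
        self.data = [None] * num_lines
        self.drowsy = [False] * num_lines

    def advance(self, cycles):
        for _ in range(cycles):
            self.cycle += 1
            if self.cycle % self.window_cycles == 0:
                # Drop every line to the low retention voltage; contents are kept.
                self.drowsy = [True] * len(self.drowsy)

    def _wake(self, index):
        penalty = self.wake_penalty if self.drowsy[index] else 0
        self.drowsy[index] = False
        return penalty

    def write(self, index, value):
        """Store a value; returns the extra cycles spent waking the line."""
        penalty = self._wake(index)
        self.data[index] = value
        return penalty

    def read(self, index):
        """Returns (value, extra cycles spent waking the line)."""
        penalty = self._wake(index)
        return self.data[index], penalty


cache = DrowsyCache(num_lines=2, window_cycles=10)
cache.write(0, 42)
cache.advance(10)        # window expires: both lines become drowsy
print(cache.read(0))     # data survives, but the access pays the wake-up penalty
print(cache.read(0))     # the line is awake now, so there is no penalty
```

Unlike decay, a wrong guess here costs only the wake-up latency and not a miss, which is why an aggressive policy is affordable.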
Drowsy Effect - Instruction Caches. The simple policy does not work well due to locality, and a delay in fetch affects performance. The implementation is moved to the cache-bank level: only the bank that is actively accessed is kept awake. This is improved by next sub-bank prediction based on code behavior. Other behavior-based techniques: program hotspots, code sequentiality.
Mixed State-Preserving and Non-State-Preserving. Drowsy mode addresses the disadvantage of gated Vdd (loss of state) but does not save as much leakage. Performance varies between the two methods. With a fast L2, cache decay in the L1 performs better. In the L2 itself, non-state-preserving policies do not perform as well as drowsy policies because the decay-induced miss penalty is too high. Hybrid scheme: put a cache line in drowsy mode for a time before turning it off completely.
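The hybrid scheme can be viewed as a three-state machine driven by how long a line has been idle. The sketch below assumes two idle thresholds and illustrative access penalties; the names `drowsy_after`, `off_after`, `wake_penalty`, and `miss_penalty`, and their default values, are hypothetical.

```python
from enum import Enum


class LineState(Enum):
    ACTIVE = "active"   # full Vdd, no access penalty
    DROWSY = "drowsy"   # low voltage, state preserved
    OFF = "off"         # gated Vdd, state lost


def line_state(idle_cycles, drowsy_after=4000, off_after=64000):
    """State of a line that has not been accessed for idle_cycles cycles."""
    if idle_cycles >= off_after:
        return LineState.OFF
    if idle_cycles >= drowsy_after:
        return LineState.DROWSY
    return LineState.ACTIVE


def access_penalty(state, wake_penalty=1, miss_penalty=20):
    """Extra cycles paid by the next access to a line in the given state."""
    if state is LineState.OFF:
        return miss_penalty   # decay-induced miss: refetch from the next level
    if state is LineState.DROWSY:
        return wake_penalty   # only the voltage has to be restored
    return 0


for idle in (100, 10_000, 100_000):
    state = line_state(idle)
    print(idle, state.value, access_penalty(state))
```

A line that is reused shortly after going idle pays only the small wake-up cost, while a line that stays idle long enough is eventually turned off for the larger leakage saving.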
Reliability and Temperature. High temperatures: decay gives the larger benefit. Low temperatures: minimizing the dynamic energy penalty is most important, which argues in favor of drowsy mode. Soft-error reliability: a soft error occurs when a particle strike causes a bit to flip. Cache decay improves reliability by turning off a large portion of the data. In-Cache Replication.