CSE477 L26 System Power.1Irwin&Vijay, PSU, 2002 Low Power Design in Microarchitectures and Memories [Adapted from Mary Jane Irwin ( www.cse.psu.edu/~mji.

Slides:



Advertisements
Similar presentations
Reducing Leakage Power in Peripheral Circuits of L2 Caches Houman Homayoun and Alex Veidenbaum Dept. of Computer Science, UC Irvine {hhomayou,
Advertisements

Keeping Hot Chips Cool Ruchir Puri, Leon Stok, Subhrajit Bhattacharya IBM T.J. Watson Research Center Yorktown Heights, NY Circuits R-US.
Power Reduction Techniques For Microprocessor Systems
Oscilloscope Watch Teardown. Agenda History and General overview Hardware design: – Block diagram and general overview – Choice of the microcontroller.
CSE477 L19 Timing Issues; Datapaths.1Irwin&Vijay, PSU, 2002 CSE477 VLSI Digital Circuits Fall 2002 Lecture 19: Timing Issues; Introduction to Datapath.
A 16-Bit Kogge Stone PS-CMOS adder with Signal Completion Seng-Oon Toh, Daniel Huang, Jan Rabaey May 9, 2005 EE241 Final Project.
Introduction to CMOS VLSI Design Lecture 18: Design for Low Power David Harris Harvey Mudd College Spring 2004.
EE141 © Digital Integrated Circuits 2nd Arithmetic Circuits 1 Low Power Design in CMOS [Adapted from Rabaey’s Digital Integrated Circuits, ©2002, J. Rabaey.
S. Reda EN160 SP’08 Design and Implementation of VLSI Systems (EN1600) Lecture 14: Power Dissipation Prof. Sherief Reda Division of Engineering, Brown.
On-Line Adjustable Buffering for Runtime Power Reduction Andrew B. Kahng Ψ Sherief Reda † Puneet Sharma Ψ Ψ University of California, San Diego † Brown.
S. Reda EN160 SP’07 Design and Implementation of VLSI Systems (EN0160) Lecture 13: Power Dissipation Prof. Sherief Reda Division of Engineering, Brown.
Designing for Low Power
Lecture 5 – Power Prof. Luke Theogarajan
CS 7810 Lecture 13 Pipeline Gating: Speculation Control For Energy Reduction S. Manne, A. Klauser, D. Grunwald Proceedings of ISCA-25 June 1998.
Lecture 7: Power.
Power-aware Computing n Dramatic increases in computer power consumption: » Some processors now draw more than 100 watts » Memory power consumption is.
Low Power Design of Integrated Systems Assoc. Prof. Dimitrios Soudris
EE 587 SoC Design & Test Partha Pande School of EECS Washington State University
The CMOS Inverter Slides adapted from:
1 EE 587 SoC Design & Test Partha Pande School of EECS Washington State University
1 EE 587 SoC Design & Test Partha Pande School of EECS Washington State University
CS 423 – Operating Systems Design Lecture 22 – Power Management Klara Nahrstedt and Raoul Rivas Spring 2013 CS Spring 2013.
6.893: Advanced VLSI Computer Architecture, September 28, 2000, Lecture 4, Slide 1. © Krste Asanovic Krste Asanovic
EE466: VLSI Design Power Dissipation. Outline Motivation to estimate power dissipation Sources of power dissipation Dynamic power dissipation Static power.
CSE477 L12&13 Low Power.1Irwin&Vijay, PSU, 2002 TKT-1527 Digital System Design Issues Designing for Low Power Mary Jane Irwin ( )
CSE477 L26 System Power.1Irwin&Vijay, PSU, 2002 TKT-1527 Digital System Design Issues Low Power Techniques in Microarchitectures and Memories Mary Jane.
© Digital Integrated Circuits 2nd Sequential Circuits Digital Integrated Circuits A Design Perspective Designing Sequential Logic Circuits Jan M. Rabaey.
ENGG 6090 Topic Review1 How to reduce the power dissipation? Switching Activity Switched Capacitance Voltage Scaling.
17 Sep 2002Embedded Seminar2 Outline The Big Picture Who’s got the Power? What’s in the bag of tricks?
Low Power Techniques in Processor Design
1 VLSI Design SMD154 LOW-POWER DESIGN Magnus Eriksson & Simon Olsson.
CPE 626 Advanced VLSI Design Lecture 8: Power and Designing for Low Power Aleksandar Milenkovic
Lecture#14. Last Lecture Summary Memory Address, size What memory stores OS, Application programs, Data, Instructions Types of Memory Non Volatile and.
Review: Basic Building Blocks  Datapath l Execution units -Adder, multiplier, divider, shifter, etc. l Register file and pipeline registers l Multiplexers,
1 Overview 1.Motivation (Kevin) 1.5 hrs 2.Thermal issues (Kevin) 3.Power modeling (David) Thermal management (David) hrs 5.Optimal DTM (Lev).5 hrs.
1 Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction Rakesh Kumar, Keith I. Farkas, Norman P. Jouppi, Parthasarathy.
Basics of Energy & Power Dissipation Lecture notes S. Yalamanchili, S. Mukhopadhyay. A. Chowdhary.
1 EE 587 SoC Design & Test Partha Pande School of EECS Washington State University
Washington State University
CSE477 L24 RAM Cores.1Irwin&Vijay, PSU, 2002 CSE477 VLSI Digital Circuits Fall 2002 Lecture 24: RAM Cores Mary Jane Irwin ( )
Why Low Power Testing? 台大電子所 李建模.
Guy Lemieux, Mehdi Alimadadi, Samad Sheikhaei, Shahriar Mirabbasi University of British Columbia, Canada Patrick Palmer University of Cambridge, UK SoC.
1 VLSI/SOC Design Methodologies and Challenges Dr. Chia-Jiu Wang University of Colorado at Colorado Springs Department of Electrical and Computer Engineering.
Why Power Matters Packaging costs Power supply rail design
CSE477 L23 Memories.1Irwin&Vijay, PSU, 2002 CSE477 VLSI Digital Circuits Fall 2002 Lecture 23: Semiconductor Memories Mary Jane Irwin (
Thermal-aware Issues in Computers IMPACT Lab. Part A Overview of Thermal-related Technologies.
경종민 Low-Power Design for Embedded Processor.
CSE477 L12&13 Low Power.1Irwin&Vijay, PSU, 2002 CSE477 VLSI Digital Circuits Fall 2002 Lecture 12&13: Designing for Low Power Mary Jane Irwin (
Basics of Energy & Power Dissipation
Lev Finkelstein ISCA/Thermal Workshop 6/ Overview 1.Motivation (Kevin) 2.Thermal issues (Kevin) 3.Power modeling (David) 4.Thermal management (David)
By Nasir Mahmood.  The NoC solution brings a networking method to on-chip communication.
Sp09 CMPEN 411 L14 S.1 CMPEN 411 VLSI Digital Circuits Spring 2009 Lecture 14: Designing for Low Power [Adapted from Rabaey’s Digital Integrated Circuits,
FPGA-Based System Design: Chapter 6 Copyright  2004 Prentice Hall PTR Topics n Low power design. n Pipelining.
Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis 6.1 EE4800 CMOS Digital IC Design & Analysis Lecture 6 Power Zhuo Feng.
11/15/05ELEC / Lecture 191 ELEC / (Fall 2005) Special Topics in Electrical Engineering Low-Power Design of Electronic Circuits.
CS203 – Advanced Computer Architecture
LOW POWER DESIGN METHODS
CSE477 L26 System Power.1Irwin&Vijay, PSU, 2003 CSE477 VLSI Digital Circuits Fall 2003 Lecture 26: Low Power Techniques in Microarchitectures and Memories.
PipeliningPipelining Computer Architecture (Fall 2006)
Overview Motivation (Kevin) Thermal issues (Kevin)
CS203 – Advanced Computer Architecture
Temperature and Power Management
LOW POWER DESIGN METHODS V.ANANDI ASST.PROF,E&C MSRIT,BANGALORE.
SECTIONS 1-7 By Astha Chawla
Mary Jane Irwin ( ) CSE477 VLSI Digital Circuits Fall 2002 Lecture 26: Low Power Techniques in Microarchitectures.
Architecture & Organization 1
Architecture & Organization 1
Lecture 7: Power.
Lecture 7: Power.
Presentation transcript:

CSE477 L26 System Power.1Irwin&Vijay, PSU, 2002 Low Power Design in Microarchitectures and Memories [Adapted from Mary Jane Irwin ( )

CSE477 L26 System Power.2Irwin&Vijay, PSU, 2002 Review: Energy & Power Equations E = C L V DD 2 P 0  1 + t sc V DD I peak P 0  1 + V DD I leakage P = C L V DD 2 f 0  1 + t sc V DD I peak f 0  1 + V DD I leakage Dynamic power (~90% today and decreasing relatively) Short-circuit power (~8% today and decreasing absolutely) Leakage power (~2% today and increasing) f 0  1 = P 0  1 * f clock

CSE477 L26 System Power.3Irwin&Vijay, PSU, 2002 Power and Energy Design Space Constant Throughput/Latency Variable Throughput/Latency EnergyDesign TimeNon-active ModulesRun Time Active Logic Design Reduced V dd Sizing Multi-V dd Clock Gating DFS, DVS (Dynamic Freq, Voltage Scaling) Leakage+ Multi-V T Sleep Transistors Multi-V dd Variable V T + Variable V T

CSE477 L26 System Power.4Irwin&Vijay, PSU, 2002 Bus Multiplexing  Share long data buses with time multiplexing (S 1 uses even cycles, S 2 odd) S2S2 S1S1 D1D1 D2D2 S1S1 S2S2 D2D2 D1D1  Buses are a significant source of power dissipation due to high switching activities and large capacitive loading l 15% of total power in Alpha l 30% of total power in Intel  But what if data samples are correlated (e.g., sign bits)?

CSE477 L26 System Power.5Irwin&Vijay, PSU, 2002 Correlated Data Streams Bit position MSB LSB Bit switching probabilities  For a shared (multiplexed) bus advantages of data correlation are lost (bus carries samples from two uncorrelated data streams) l Bus sharing should not be used for positively correlated data streams l Bus sharing may prove advantageous in a negatively correlated data stream (where successive samples switch sign bits) - more random switching

CSE477 L26 System Power.6Irwin&Vijay, PSU, 2002 Glitch Reduction by Pipelining  Glitches depend on the logic depth of the circuit - gates deeper in the logic network are more prone to glitching l arrival times of the gate inputs are more spread due to delay imbalances l usually affected more by primary input switching  Reduce logic depth by adding pipeline registers l additional energy used by the clock and pipeline registers PC FetchDecodeExecuteMemoryWriteBack Instruction MAR MDR I$D$ clk pipeline stage isolation register

CSE477 L26 System Power.7Irwin&Vijay, PSU, 2002 Power and Energy Design Space Constant Throughput/Latency Variable Throughput/Latency EnergyDesign TimeNon-active ModulesRun Time Active Logic Design Reduced V dd Sizing Multi-V dd Clock Gating DFS, DVS (Dynamic Freq, Voltage Scaling) Leakage+ Multi-V T Sleep Transistors Multi-V dd Variable V T + Variable V T

CSE477 L26 System Power.8Irwin&Vijay, PSU, 2002 Clock Gating  Gate off clock to idle functional units l e.g., floating point units l need logic to generate disable signal -increases complexity of control logic -consumes power -timing critical to avoid clock glitches at OR gate output l additional gate delay on clock signal -gating OR gate can replace a buffer in the clock distribution tree  Most popular method for power reduction of clock signals and functional units RegReg clock disable Functional unit

CSE477 L26 System Power.9Irwin&Vijay, PSU, 2002 Clock Gating in a Pipelined Datapath  For idle units (e.g., floating point units in Exec stage, WB stage for instructions with no write back operation) PC FetchDecodeExecuteMemoryWriteBack Instruction MAR MDR I$D$ clk No FPNo WB

CSE477 L26 System Power.10Irwin&Vijay, PSU, 2002 Power and Energy Design Space Constant Throughput/Latency Variable Throughput/Latency EnergyDesign TimeNon-active ModulesRun Time Active Logic Design Reduced V dd Sizing Multi-V dd Clock Gating DFS, DVS (Dynamic Freq, Voltage Scaling) Leakage+ Multi-V T Sleep Transistors Multi-V dd Variable V T + Variable V T

CSE477 L26 System Power.11Irwin&Vijay, PSU, 2002 Dynamic Frequency and Voltage Scaling  Intel’s SpeedStep l Hardware that steps down the clock frequency (dynamic frequency scaling – DFS) when the user unplugs from AC power -PLL from 650MHz  500MHz l CPU stalls during SpeedStep adjustment  Transmeta LongRun l Hardware that applies both DFS and DVS (dynamic supply voltage scaling) -32 levels of V DD from 1.1V to 1.6V -PLL from 200MHz  700MHz in increments of 33MHz l Triggered when CPU load change is detected by software -heavier load  ramp up V DD, when stable speed up clock -lighter load  slow down clock, when PLL locks onto new rate, ramp down V DD l CPU stalls only during PLL relock (< 20 microsec)

CSE477 L26 System Power.12Irwin&Vijay, PSU, 2002 Dynamic Thermal Management (DTM) Initiation Mechanism: How do we enable technique? Response Mechanism: What technique do we enable? Trigger Mechanism: When do we enable DTM techniques?

CSE477 L26 System Power.13Irwin&Vijay, PSU, 2002 DTM Trigger Mechanisms  Mechanism: How to deduce temperature?  Direct approach: on-chip temperature sensors l Based on differential voltage change across 2 diodes of different sizes l May require >1 sensor l Hysteresis and delay are problems  Policy: When to begin responding? l Trigger level set too high means higher packaging costs l Trigger level set too low means frequent triggering and loss in performance  Choose trigger level to exploit difference between average and worst case power

CSE477 L26 System Power.14Irwin&Vijay, PSU, 2002 DTM Initiation and Response Mechanisms  Operating system or micro architectural control? l Hardware support can reduce performance penalty by 20-30%  Initiation of policy incurs some delay l When using DVS and/or DFS, much of the performance penalty can be attributed to enabling/disabling overhead l Increasing policy delay reduces overhead; smarter initiation techniques would help as well  Thermal window (100Kcycles+) l Larger thermal windows “smooth” short thermal spikes

CSE477 L26 System Power.15Irwin&Vijay, PSU, 2002 DTM Activation and Deactivation Cycle Trigger Reached  Response Delay – Invocation time (e.g., adjust clock) Response Delay  Policy Delay – Number of cycles engaged Policy Delay Check Temp Check Temp Turn Response On Initiation Delay  Initiation Delay – OS interrupt/handler Shutoff Delay Turn Response Off  Shutoff Delay – Disabling time (e.g., re-adjust clock)

CSE477 L26 System Power.16Irwin&Vijay, PSU, 2002 DTM Savings Benefits Time Temperature DTM Disabled DTM/Response Engaged Designed for cooling capacity without DTM DTM trigger level Designed for cooling capacity with DTM System Cost Savings

CSE477 L26 System Power.17Irwin&Vijay, PSU, 2002 Power and Energy Design Space Constant Throughput/Latency Variable Throughput/Latency EnergyDesign TimeNon-active ModulesRun Time Active Logic Design Reduced V dd Sizing Multi-V dd Clock Gating DFS, DVS (Dynamic Freq, Voltage Scaling) Leakage+ Multi-V T Sleep Transistors Multi-V dd Variable V T + Variable V T

CSE477 L26 System Power.18Irwin&Vijay, PSU, 2002 Speculated Power of a 15mm  P

CSE477 L26 System Power.19Irwin&Vijay, PSU, 2002 Review: Variable V T (ABB) at Run Time  V T = V T0 +  (  |-2  F + V SB | -  |-2  F |) where V T0 is the threshold voltage at V SB = 0 V SB is the source-bulk (substrate) voltage  is the body-effect coefficient V SB (V) V T (V)  For an n-channel device, the substrate is normally tied to ground  A negative bias causes V T to increase from 0.45V to 0.85V  Adjusting the substrate bias at run time is called adaptive body-biasing (ABB)