Vishwani D. Agrawal James J. Danaher Professor


ELEC 5270/6270 Spring 2009
Low-Power Design of Electronic Circuits
Power Aware Microprocessors
Vishwani D. Agrawal, James J. Danaher Professor
Dept. of Electrical and Computer Engineering, Auburn University, Auburn, AL 36849
vagrawal@eng.auburn.edu
http://www.eng.auburn.edu/~vagrawal/COURSE/E6270_Spr09/course.html
Copyright Agrawal, 2007 ELEC6270 Spring 09, Lecture 12

SIA Roadmap for Processors (1999)

| Year | 1999 | 2002 | 2005 | 2008 | 2011 | 2014 |
|---|---|---|---|---|---|---|
| Feature size (nm) | 180 | 130 | 100 | 70 | 50 | 35 |
| Logic transistors/cm² | 6.2M | 18M | 39M | 84M | 180M | 390M |
| Clock (GHz) | 1.25 | 2.1 | 3.5 | 6.0 | 10.0 | 16.9 |
| Chip size (mm²) | 340 | 430 | 520 | 620 | 750 | 900 |
| Power supply (V) | 1.8 | 1.5 | 1.2 | 0.9 | 0.6 | 0.5 |
| High-perf. power (W) | 90 | 130 | 160 | 170 | 175 | 183 |

Source: http://www.semichips.org

Power Reduction in Processors
Just about everything is used.
- Hardware methods:
  - Voltage reduction for dynamic power
  - Dual-threshold devices for leakage reduction
  - Clock gating, frequency reduction
  - Sleep mode
- Architecture: instruction set, hardware organization
- Software methods

SPEC CPU2000 Benchmarks
- Twelve integer and 14 floating point programs, CINT2000 and CFP2000.
- The run time of each program is normalized to obtain a SPEC ratio with respect to the run time on a Sun Ultra 5_10 with a 300 MHz processor.
- CINT2000 and CFP2000 summary measurements are the geometric means of SPEC ratios.
- LINPACK is a numerically intensive floating point linear system (Ax = b) program used for benchmarking supercomputers.
- SPECpower_ssj2008 measures power and performance of a computer system.

Reference CPU: Sun Ultra 5_10, 300 MHz processor

CINT2000: 3.4 GHz Pentium 4, HT Technology (D850MD Motherboard)
SPECint2000_base = 1341, SPECint2000 = 1389
Source: www.spec.org

Two Benchmark Results
- Baseline: A uniform configuration not optimized for a specific program:
  - Same compiler with the same settings and flags used for all benchmarks
  - Other restrictions
- Peak: The run is optimized to obtain the peak performance for each benchmark program.

CFP2000: 3.6 GHz Pentium 4, HT Technology (D925XCV/AA-400 Motherboard)
SPECfp2000_base = 1627, SPECfp2000 = 1630
Source: www.spec.org

CINT2000: 1.7 GHz Pentium 4 (D850MD Motherboard)
SPECint2000_base = 579, SPECint2000 = 588
Source: www.spec.org

CFP2000: 1.7 GHz Pentium 4 (D850MD Motherboard)
SPECfp2000_base = 648, SPECfp2000 = 659
Source: www.spec.org

Energy SPEC Benchmarks
Energy efficiency mode: Besides the execution time, the energy efficiency of SPEC benchmark programs is also measured. The energy efficiency of a benchmark program is given by:

$$\text{Energy efficiency} = \frac{1/(\text{Execution time})}{\text{joules consumed}}$$

Energy Efficiency
Efficiency averaged on n benchmark programs:

$$\text{Efficiency} = \left( \prod_{i=1}^{n} \text{Efficiency}_i \right)^{1/n}$$

where $\text{Efficiency}_i$ is the efficiency for program i.

Relative efficiency:

$$\text{Relative efficiency} = \frac{\text{Efficiency of a computer}}{\text{Efficiency of reference computer}}$$
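The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the run times and energies below are hypothetical, not SPEC measurements.

```python
import math

def energy_efficiency(execution_time_s, energy_joules):
    """Efficiency of one benchmark: (1 / execution time) / joules consumed."""
    return (1.0 / execution_time_s) / energy_joules

def mean_efficiency(efficiencies):
    """Geometric mean of the per-program efficiencies."""
    n = len(efficiencies)
    return math.exp(sum(math.log(e) for e in efficiencies) / n)

def relative_efficiency(machine_efficiencies, reference_efficiencies):
    """Mean efficiency of a computer divided by that of the reference computer."""
    return mean_efficiency(machine_efficiencies) / mean_efficiency(reference_efficiencies)

# Hypothetical (execution time in s, energy in J) pairs for two programs.
machine_runs = [(10.0, 200.0), (20.0, 500.0)]
reference_runs = [(40.0, 400.0), (80.0, 1000.0)]

machine_eff = [energy_efficiency(t, e) for t, e in machine_runs]
reference_eff = [energy_efficiency(t, e) for t, e in reference_runs]
relative = relative_efficiency(machine_eff, reference_eff)
print(f"relative efficiency = {relative:.2f}")
```

The geometric mean is computed through logarithms so that a long list of small efficiencies does not underflow in the product.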

SPEC2000 Relative Energy Efficiency
[Figure: relative energy efficiency for three operating modes: always max. clock; laptop adaptive clock; min. power, min. clock.]

Voltage Scaling
- Dynamic: Reduce voltage and frequency during idle or low activity periods.
- Static: Clustered voltage scaling
  - Logic on non-critical paths is given a lower voltage.
  - 47% power reduction with 10% area increase reported.
M. Igarashi et al., “Clustered Voltage Scaling Techniques for Low-Power Design,” Proc. IEEE Symp. Low Power Design, 1997.

Processor Utilization
Throughput = Operations / second
[Figure: throughput versus time, showing compute-intensive processes at maximum throughput, low throughput (background) processes, and system idle periods.]

Examples of Processes
- Compute-intensive: spreadsheet, spelling check, video decoding, scientific computing.
- Low throughput: data entry, screen updates, low bandwidth I/O data transfer.
- Idle: no computation, no expected output.

Effects of Voltage Reduction
- Voltage reduction increases delay and decreases throughput:
  - Slow reduction in throughput at first
  - Rapid reduction in throughput for VDD ≤ Vth
  - Time per operation (TPO) increases
- Voltage reduction continues to reduce power consumption:
  - Energy per operation (EPO) = Power × TPO

Energy per Operation (EPO)
[Figure: EPO, power and TPO (vertical axis 0.0 to 1.0) plotted against VDD / Vth (1 to 5).]

Dynamic Voltage and Clock

| Throughput | Time in fast mode | Time in slow mode | Time in idle mode | Battery life |
|---|---|---|---|---|
| Always full speed | 10% | 0% | 90% | 1 hr |
| Sometimes full speed | 1% | 9% | 90% | 5.3 hrs |
| Rarely full speed | 0.1% | 99% | 0.9% | 9.2 hrs |

T. D. Burd and R. W. Brodersen, Energy Efficient Microprocessor Design, Springer, 2002, pp. 35-36.

Example: Find Minimum Energy Mode
Processor data (rated operation):
- 2 GHz clock
- 1.5 volt supply voltage
- 0.5 volt threshold voltage
- Power consumption: 50 watts dynamic power, 50 watts static power
Maximum clock frequency for a V volt supply: $f \propto (V - V_{TH})/V$

Example Cont.
Dynamic power:

$$P_d = C V^2 f = C (1.5)^2 \times 2 \times 10^9 = 50\ \text{W}$$

C = 11.11 nF, the capacitance switching per cycle, so $P_d = 11.11\, V^2 f$ (in watts, with f in GHz).

Dynamic energy per cycle: $E_d = P_d / f = 11.11\, V^2$ (in nJ).

Example Cont.
Clock frequency:

$$f = k\,(V - V_{TH})/V = k\,(1.5 - 0.5)/1.5 = 2\ \text{GHz}$$

k = 3 GHz, a proportionality constant, so $f = 3\,(V - 0.5)/V$ GHz.

Example Cont.
Static power:

$$P_s = k' V^2 = k' (1.5)^2 = 50\ \text{W}$$

k' = 22.22 mho, the total leakage conductance, so $P_s = 22.22\, V^2$ (in watts).

Static energy per cycle:

$$E_s = P_s / f = \frac{22.22\, V^3}{3\,(V - 0.5)} = \frac{7.41\, V^3}{V - 0.5}\ \text{(nJ)}$$

Example Cont.
Total energy per cycle:

$$E = E_d + E_s = 11.11\, V^2 + \frac{7.41\, V^3}{V - 0.5}$$

To minimize E, set $\partial E / \partial V = 0$, which reduces to

$$5V^2 - 4.5V + 0.75 = 0$$

Solutions of the quadratic equation: V = 0.679 volt, 0.221 volt. Discard the second solution, which is lower than the threshold voltage of 0.5 volt.

Example: Result

| | Rated mode | Low energy mode | Reduction (%) |
|---|---|---|---|
| Voltage | 1.5 V | 0.679 V | 54.7% |
| Clock frequency | 2 GHz | 791 MHz | 60% |
| Dynamic energy/cycle | 25.00 nJ | 5.12 nJ | 79.52% |
| Static energy/cycle | 25.00 nJ | 12.96 nJ | 48.16% |
| Total energy/cycle | 50.0 nJ | 18.08 nJ | 63.84% |
| Dynamic power | 50.0 W | 4.05 W | 91.90% |
| Static power | 50.0 W | 10.25 W | 79.50% |
| Total power | 100.0 W | 14.30 W | 85.70% |
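The minimum-energy example can be checked numerically. The sketch below rebuilds the constants from the rated operating point and scans the supply voltage for the lowest energy per cycle; the function names and the scan step are illustrative choices, not part of the lecture.

```python
# Rated operating point taken from the example.
V_RATED = 1.5           # volts
V_TH = 0.5              # volts
F_RATED_GHZ = 2.0       # GHz
P_DYN_RATED_W = 50.0    # watts
P_STAT_RATED_W = 50.0   # watts

# Constants derived as in the example (W / (V^2 * GHz) gives nF).
C_NF = P_DYN_RATED_W / (V_RATED ** 2 * F_RATED_GHZ)   # about 11.11 nF
K_GHZ = F_RATED_GHZ * V_RATED / (V_RATED - V_TH)      # 3 GHz
K_LEAK_MHO = P_STAT_RATED_W / V_RATED ** 2            # about 22.22 mho

def clock_ghz(v):
    """Maximum clock frequency f = k (V - Vth) / V, in GHz."""
    return K_GHZ * (v - V_TH) / v

def dynamic_energy_nj(v):
    """Dynamic energy per cycle, C V^2, in nJ."""
    return C_NF * v ** 2

def static_energy_nj(v):
    """Static energy per cycle, Ps / f, in nJ (watts divided by GHz)."""
    return K_LEAK_MHO * v ** 2 / clock_ghz(v)

def total_energy_nj(v):
    return dynamic_energy_nj(v) + static_energy_nj(v)

def minimum_energy_voltage(step=1e-4):
    """Scan supply voltages strictly above Vth, up to the rated voltage."""
    count = int(round((V_RATED - V_TH) / step))
    candidates = [V_TH + i * step for i in range(1, count + 1)]
    return min(candidates, key=total_energy_nj)

v_opt = minimum_energy_voltage()
print(f"V = {v_opt:.3f} V, f = {clock_ghz(v_opt) * 1000:.0f} MHz, "
      f"E = {total_energy_nj(v_opt):.2f} nJ/cycle")
```

A brute-force scan is used instead of the closed-form quadratic so that the same code still works if the frequency or leakage model is replaced by a different one.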

Problem of Process Variation in Nanometer Technologies
[Figure: distribution of the number of chips versus threshold voltage (lower Vth, Vth, higher Vth), shown against the clock specification and the power specification. Labels: nominal voltage, lower voltage operation, higher voltage operation, yield loss due to high leakage, yield loss due to slow speed.]
From a presentation: Power Reduction using LongRun2 in Transmeta’s Efficeon Processor, by D. Ditzel, May 17, 2006.

Pipeline Gating
A pipelined processor uses speculative execution. Incorrect branch prediction results in pipeline stalls and wasted energy.
Idea: Stop fetching instructions if a branch hazard is expected. If the count (M) of incorrect predictions exceeds a pre-specified number (N), then suspend fetching instructions for some k cycles.
Ref.: S. Manne, A. Klauser and D. Grunwald, “Pipeline Gating: Speculation Control for Energy Reduction,” Proc. 25th Annual International Symp. Computer Architecture, June 1998.
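The gating rule stated above can be sketched as a small per-cycle model. This is an illustration under assumptions, not the mechanism from the cited paper: the class and method names are hypothetical, and resetting the count M when gating starts is an assumption the slide does not specify.

```python
class PipelineGate:
    """Toy model: gate instruction fetch for k cycles once M exceeds N."""

    def __init__(self, threshold_n, stall_cycles_k):
        assert stall_cycles_k >= 1
        self.threshold_n = threshold_n        # N, pre-specified limit
        self.stall_cycles_k = stall_cycles_k  # k, cycles to suspend fetch
        self.mispredict_count = 0             # M, incorrect predictions seen
        self.stall_remaining = 0

    def record_misprediction(self):
        self.mispredict_count += 1

    def fetch_allowed(self):
        """Call once per cycle; returns True if fetch may proceed."""
        if self.stall_remaining > 0:
            self.stall_remaining -= 1
            return False
        if self.mispredict_count > self.threshold_n:
            # Assumption: the count restarts once gating begins.
            self.mispredict_count = 0
            self.stall_remaining = self.stall_cycles_k - 1
            return False
        return True

gate = PipelineGate(threshold_n=2, stall_cycles_k=3)
for _ in range(3):
    gate.record_misprediction()
trace = [gate.fetch_allowed() for _ in range(5)]
print(trace)
```

With N = 2 and k = 3, three recorded mispredictions gate fetch for exactly three cycles before it resumes.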

Slack Scheduling
Application: Superscalar, out-of-order execution:
- An instruction is executed as soon as the required data and resources become available.
- A commit unit reorders the results.
Delay the completion of instructions whose result is not immediately needed.
Example of RISC instructions:
    add r0, r1, r2; (A)
    sub r3, r4, r5; (B)
    and r9, x1, r9; (C)
    or r5, r9, r10; (D)
    xor r2, r10, r11; (E)
J. Casmira and D. Grunwald, “Dynamic Instruction Scheduling Slack,” Proc. ACM Kool Chips Workshop, Dec. 2000.

Slack Scheduling Example
[Figure: timing of instructions A, B, C, D, E under standard scheduling and under slack scheduling.]
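Which of the five instructions has slack can be worked out from their data dependencies. The sketch below rests on assumptions the slides do not state: the last operand of each instruction is taken to be the destination register, every instruction takes one cycle, and only read-after-write dependencies count. All names are illustrative.

```python
# (label, source registers, destination register); destination-last is assumed.
program = [
    ("A", ("r0", "r1"), "r2"),
    ("B", ("r3", "r4"), "r5"),
    ("C", ("r9", "x1"), "r9"),
    ("D", ("r5", "r9"), "r10"),
    ("E", ("r2", "r10"), "r11"),
]

def dependencies(program):
    """Map each instruction to the earlier instructions that produce its sources."""
    last_writer = {}
    deps = {}
    for label, sources, dest in program:
        deps[label] = {last_writer[r] for r in sources if r in last_writer}
        last_writer[dest] = label
    return deps

def slack(program):
    """Slack = latest start cycle minus earliest start cycle, unit latency assumed."""
    deps = dependencies(program)
    order = [label for label, _, _ in program]
    earliest = {}
    for label in order:
        earliest[label] = max((earliest[d] + 1 for d in deps[label]), default=0)
    last_cycle = max(earliest.values())
    latest = {}
    for label in reversed(order):
        consumers = [c for c in order if label in deps[c]]
        latest[label] = min((latest[c] - 1 for c in consumers), default=last_cycle)
    return {label: latest[label] - earliest[label] for label in order}

slack_cycles = slack(program)
print(slack_cycles)
```

Under these assumptions only A has slack: its result is consumed by E, which must also wait for D, so A could complete a cycle later (for example on a slower, low-power unit) without delaying the sequence.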

Slack Scheduling
[Figure: block diagram with a re-order buffer, scheduling logic, slack bits, and low-power execution units.]

Clock Distribution
[Figure: clock distribution network driven from a single clock source.]

Clock Power

$$P_{clk} = C_L V_{DD}^2 f + \frac{C_L V_{DD}^2 f}{\lambda} + \frac{C_L V_{DD}^2 f}{\lambda^2} + \cdots = C_L V_{DD}^2 f \sum_{n=0}^{\text{stages}-1} \frac{1}{\lambda^n}$$

where
$C_L$ = total load capacitance
$\lambda$ = constant fanout at each stage in the distribution network

The clock consumes about 40% of total processor power.
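The clock power sum is easy to evaluate directly. In the sketch below the function name is hypothetical, and the 3.3 V supply, fanout of 4 and four stages in the usage example are assumed values chosen only for illustration.

```python
def clock_power_watts(load_farads, vdd_volts, freq_hz, fanout, stages):
    """P_clk = C_L * VDD^2 * f * sum over n = 0 .. stages-1 of 1 / fanout^n."""
    base = load_farads * vdd_volts ** 2 * freq_hz
    return base * sum(1.0 / fanout ** n for n in range(stages))

# Illustrative values: 3.75 nF load at 300 MHz; VDD, fanout and stages are assumed.
p_clk = clock_power_watts(3.75e-9, 3.3, 300e6, fanout=4, stages=4)
print(f"clock power = {p_clk:.2f} W")
```

Because the terms form a geometric series, the driver stages add at most a factor of λ/(λ - 1) on top of the power spent in the final load.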

Clock Network Examples

| | Alpha 21064 | Alpha 21164 | Alpha 21264 |
|---|---|---|---|
| Technology | 0.75μ CMOS | 0.5μ CMOS | 0.35μ CMOS |
| Frequency (MHz) | 200 | 300 | 600 |

Other entries, in the order listed on the slide:
- Total capacitance: 12.5 nF
- Clock gating used.
- Total power: 80-110 W
- Clock load: 3.25 nF, 3.75 nF
- Clock power: 40%, 40% (20 W)
- Max. clock skew: 200 ps (<10%), 90 ps

D. W. Bailey and B. J. Benschneider, “Clocking Design and Analysis for a 600-MHz Alpha Microprocessor,” IEEE J. Solid-State Circuits, vol. 33, no. 11, pp. 1627-1633, Nov. 1998.

Power Reduction Example
Alpha 21064: 200 MHz @ 3.45 V, power dissipation = 26 W
- Reduce voltage to 1.5 V, power (5.3x) = 4.9 W
- Eliminate FP, power (3x) = 1.6 W
- Scale 0.75→0.35μ, power (2x) = 0.8 W
- Reduce clock load, power (1.3x) = 0.6 W
- Reduce frequency 200→160 MHz, power (1.25x) = 0.5 W
J. Montanaro et al., “A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor,” IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1703-1714, Nov. 1996.
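The chain of reductions is plain arithmetic and can be replayed step by step. In the sketch below the voltage factor is computed as (3.45/1.5)², assuming power scales with the square of the supply voltage, which reproduces the quoted 5.3x; the other factors are taken as quoted, and the variable names are illustrative.

```python
start_power_w = 26.0  # Alpha 21064 at 200 MHz and 3.45 V

# (step description, factor by which power is divided)
steps = [
    ("Reduce voltage 3.45 V -> 1.5 V", (3.45 / 1.5) ** 2),  # assumes P ~ V^2
    ("Eliminate FP", 3.0),
    ("Scale 0.75 -> 0.35 micron", 2.0),
    ("Reduce clock load", 1.3),
    ("Reduce frequency 200 -> 160 MHz", 200.0 / 160.0),
]

power_w = start_power_w
history = []
for description, factor in steps:
    power_w /= factor
    history.append((description, factor, power_w))
    print(f"{description}: /{factor:.2f} -> {power_w:.2f} W")

final_power_w = power_w
```

The five factors multiply to roughly 52x overall, taking the 26 W starting point to about 0.5 W.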

For More on Microprocessors
- T. D. Burd and R. W. Brodersen, Energy Efficient Microprocessor Design, Springer, 2002.
- R. Graybill and R. Melhem, Power Aware Computing, New York: Plenum Publishers, 2002.