Spring 07, Feb 22 ELEC 7770: Advanced VLSI Design (Agrawal)
ELEC 7770 Advanced VLSI Design, Spring 2007
Power Aware Microprocessors
Vishwani D. Agrawal, James J. Danaher Professor
ECE Department, Auburn University, Auburn, AL

SIA Roadmap for Processors (1999)
  Year
  Feature size (nm)
  Logic transistors/cm^2: 6.2M, 18M, 39M, 84M, 180M, 390M
  Clock (GHz)
  Chip size (mm^2)
  Power supply (V)
  High-perf. power (W)
Source:

Power Reduction in Processors
Just about everything is used.
Hardware methods:
  Voltage reduction for dynamic power
  Dual-threshold devices for leakage reduction
  Clock gating, frequency reduction
  Sleep mode
Architecture:
  Instruction set
  Hardware organization
Software methods

SPEC CPU2000 Benchmarks
Twelve integer and 14 floating-point programs, CINT2000 and CFP2000.
The run time of each program is normalized to obtain a SPEC ratio with respect to its run time on a Sun Ultra 5_10 with a 300MHz processor.
CINT2000 and CFP2000 summary measurements are the geometric means of the SPEC ratios.
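
The normalization and geometric-mean summary described above can be sketched as follows. This is a minimal illustration, not SPEC's own tooling: the function names and the run times are hypothetical, and the scale factor of 100 (so that the reference machine scores 100) is an assumption that the slide does not state.

```python
import math

def spec_ratio(reference_seconds, measured_seconds, scale=100.0):
    """Normalize a run time against the reference machine's run time.

    scale=100.0 is an assumption (reference machine scores 100).
    """
    return scale * reference_seconds / measured_seconds

def geometric_mean(values):
    """Geometric mean computed through logarithms to avoid overflow."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical (reference, measured) run times in seconds for three programs.
runs = [(1400.0, 140.0), (1800.0, 150.0), (1100.0, 100.0)]
ratios = [spec_ratio(ref, run) for ref, run in runs]
summary = geometric_mean(ratios)
print("SPEC ratios:", ratios)
print("Summary (geometric mean):", round(summary, 1))
```

The geometric mean is used rather than the arithmetic mean so that the summary does not depend on which machine is chosen as the reference.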

Reference CPU s: Sun Ultra 5_10 300MHz Processor

CINT2000: 3.4GHz Pentium 4, HT Technology (D850MD Motherboard)
  SPECint2000_base = 1341
  SPECint2000 = 1389
Source:

Two Benchmark Results
Baseline: A uniform configuration not optimized for any specific program:
  Same compiler with the same settings and flags used for all benchmarks
  Other restrictions
Peak: The run is optimized to obtain the peak performance for each benchmark program.

CFP2000: 3.6GHz Pentium 4, HT Technology (D925XCV/AA-400 Motherboard)
  SPECfp2000_base = 1627
  SPECfp2000 = 1630
Source:

CINT2000: 1.7GHz Pentium 4 (D850MD Motherboard)
  SPECint2000_base = 579
  SPECint2000 = 588
Source:

CFP2000: 1.7GHz Pentium 4 (D850MD Motherboard)
  SPECfp2000_base = 648
  SPECfp2000 = 659
Source:

Energy SPEC Benchmarks
Energy efficiency mode: Besides the execution time, the energy efficiency of SPEC benchmark programs is also measured.
The energy efficiency of a benchmark program is given by:
\[ \text{Energy efficiency} = \frac{1/(\text{Execution time})}{\text{joules consumed}} \]

Energy Efficiency
Efficiency averaged over n benchmark programs:
\[ \text{Efficiency} = \left( \prod_{i=1}^{n} \text{Efficiency}_i \right)^{1/n} \]
where Efficiency_i is the efficiency for program i.
Relative efficiency:
\[ \text{Relative efficiency} = \frac{\text{Efficiency of a computer}}{\text{Eff. of reference computer}} \]
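
A small sketch of how these efficiency figures combine, assuming each benchmark run reports an execution time in seconds and an energy in joules. All function names and measurements below are hypothetical and chosen only for illustration.

```python
import math

def energy_efficiency(execution_seconds, joules):
    """Per-program efficiency: (1 / execution time) / joules consumed."""
    return (1.0 / execution_seconds) / joules

def mean_efficiency(efficiencies):
    """Geometric mean of the per-program efficiencies."""
    efficiencies = list(efficiencies)
    return math.prod(efficiencies) ** (1.0 / len(efficiencies))

def relative_efficiency(machine_runs, reference_runs):
    """Averaged efficiency of a machine divided by that of the reference."""
    machine = mean_efficiency(energy_efficiency(t, e) for t, e in machine_runs)
    reference = mean_efficiency(energy_efficiency(t, e) for t, e in reference_runs)
    return machine / reference

# Hypothetical (seconds, joules) pairs for the same two programs on two machines.
reference_runs = [(100.0, 2000.0), (200.0, 5000.0)]
machine_runs = [(50.0, 1500.0), (80.0, 3000.0)]
print(relative_efficiency(machine_runs, reference_runs))
```

Because the efficiency divides by both time and energy, a machine that halves the run time at the same energy doubles its efficiency, as does one that halves the energy at the same run time.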

SPEC2000 Relative Energy Efficiency
[Figure: relative energy efficiency for three configurations: always max. clock; laptop adaptive clock; min. power, min. clock]

Voltage Scaling
Dynamic: Reduce voltage and frequency during idle or low-activity periods.
Static: Clustered voltage scaling
  Logic on non-critical paths is given a lower voltage.
  47% power reduction with 10% area increase reported.
M. Igarashi et al., “Clustered Voltage Scaling Techniques for Low-Power Design,” Proc. IEEE Symp. Low Power Design, 1997.

Pipeline Gating
A pipeline processor uses speculative execution.
Incorrect branch prediction results in pipeline stalls and wasted energy.
Idea: Stop fetching instructions if a branch hazard is expected:
  If the count (M) of incorrect predictions exceeds a pre-specified number (N), then suspend instruction fetching for some k cycles.
Ref.: S. Manne, A. Klauser and D. Grunwald, “Pipeline Gating: Speculation Control for Energy Reduction,” Proc. 25th Annual International Symp. Computer Architecture, June 1998.
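
The gating rule above can be sketched as a cycle-by-cycle filter. This is a toy model under stated assumptions, not the mechanism from the cited paper: the per-cycle misprediction flags are hypothetical inputs, the counter M is assumed to reset when fetch is gated, and events arriving during gated cycles are assumed to be ignored.

```python
def gate_fetch(mispredict_flags, threshold_n, stall_k):
    """Return one boolean per cycle: True when instruction fetch is enabled.

    mispredict_flags: per-cycle flags, truthy when a misprediction is counted.
    threshold_n:      N, the count that M must exceed to trigger gating.
    stall_k:          k, the number of cycles for which fetch is suspended.
    """
    fetch_enabled = []
    count_m = 0      # M: mispredictions counted since the last gating event
    stall_left = 0   # remaining gated cycles
    for mispredicted in mispredict_flags:
        if stall_left > 0:
            # Fetch is suspended; this cycle's event is ignored (assumption).
            fetch_enabled.append(False)
            stall_left -= 1
            continue
        fetch_enabled.append(True)
        if mispredicted:
            count_m += 1
        if count_m > threshold_n:
            stall_left = stall_k
            count_m = 0  # reset after gating (assumption)
    return fetch_enabled

# Hypothetical trace: gating starts after the third counted misprediction.
trace = [1, 0, 1, 1, 0, 0, 0, 1]
print(gate_fetch(trace, threshold_n=2, stall_k=3))
```

Choosing N and k trades energy against performance: a small N or large k saves more fetch energy but suspends useful work more often.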

Slack Scheduling
Application: Superscalar, out-of-order execution:
  An instruction is executed as soon as the data and resources it needs become available.
  A commit unit reorders the results.
Delay the execution of instructions whose results are not immediately needed.
Example of RISC instructions:
  add r0, r1, r2   ;(A)
  sub r3, r4, r5   ;(B)
  and r9, x1, r9   ;(C)
  or  r5, r9, r10  ;(D)
  xor r2, r10, r11 ;(E)
J. Casmira and D. Grunwald, “Dynamic Instruction Scheduling Slack,” Proc. ACM Kool Chips Workshop, Dec
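
One way to see which of the five instructions can be delayed is to compare the earliest and latest cycle in which each may issue. The sketch below makes several assumptions that the slide does not state: the last operand is the destination register, every instruction takes one cycle, execution units are unlimited, and only true (read-after-write) dependences matter. The helper names are hypothetical.

```python
# (label, source operands, destination); destination-last order is an assumption.
program = [
    ("A", ("r0", "r1"), "r2"),    # add r0, r1, r2
    ("B", ("r3", "r4"), "r5"),    # sub r3, r4, r5
    ("C", ("r9", "x1"), "r9"),    # and r9, x1, r9
    ("D", ("r5", "r9"), "r10"),   # or  r5, r9, r10
    ("E", ("r2", "r10"), "r11"),  # xor r2, r10, r11
]

def find_producers(program):
    """Map each instruction to the earlier instructions whose results it reads."""
    last_writer = {}
    producers = {}
    for label, sources, dest in program:
        producers[label] = {last_writer[s] for s in sources if s in last_writer}
        last_writer[dest] = label
    return producers

def compute_slack(program):
    """Slack = latest issue cycle minus earliest issue cycle (unit latency)."""
    producers = find_producers(program)
    earliest = {}
    for label, _, _ in program:
        earliest[label] = 1 + max((earliest[p] for p in producers[label]), default=0)
    last_cycle = max(earliest.values())
    latest = {label: last_cycle for label, _, _ in program}
    # Walk consumers before producers so each producer sees its tightest deadline.
    for label, _, _ in reversed(program):
        for p in producers[label]:
            latest[p] = min(latest[p], latest[label] - 1)
    return {label: latest[label] - earliest[label] for label, _, _ in program}

print(compute_slack(program))
```

Under these assumptions only A has nonzero slack: its result is first read by E, so it can issue one cycle later, for example on a slower low-power unit, without lengthening the schedule.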

Slack Scheduling Example
[Figure: issue order of instructions A through E under slack scheduling and under standard scheduling]

Slack Scheduling
[Figure: block diagram showing slack bit, low-power execution units, re-order buffer, and scheduling logic]

Clock Distribution
[Figure: clock distribution network]

Clock Power
\[ P_{clk} = C_L V_{DD}^2 f + \frac{C_L V_{DD}^2 f}{\lambda} + \frac{C_L V_{DD}^2 f}{\lambda^2} + \cdots = C_L V_{DD}^2 f \sum_{n=0}^{\text{stages}-1} \frac{1}{\lambda^n} \]
where
  C_L = total load capacitance
  λ = constant fanout at each stage in the distribution network
Clock consumes about 40% of total processor power.
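
The sum above is a finite geometric series and is easy to evaluate numerically. The sketch below is illustrative only: the function name and the numeric values are hypothetical and are not taken from any processor discussed in these slides.

```python
def clock_power(c_load, v_dd, freq, fanout, stages):
    """Clock power: C_L * V_DD^2 * f * sum_{n=0}^{stages-1} 1 / fanout^n.

    c_load in farads, v_dd in volts, freq in hertz; returns watts.
    """
    base = c_load * v_dd ** 2 * freq  # power spent driving the final load
    return base * sum(1.0 / fanout ** n for n in range(stages))

# Hypothetical values: 3 nF load, 2 V supply, 500 MHz, fanout 4, 3 stages.
load_only = clock_power(3.0e-9, 2.0, 500e6, fanout=4, stages=1)
with_tree = clock_power(3.0e-9, 2.0, 500e6, fanout=4, stages=3)
print(load_only, with_tree)
```

With a fanout of 4 the upstream buffer stages add roughly a third to the power of the final load, which shows that the load term dominates and that a larger fanout shrinks the overhead of the distribution network.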

Clock Network Examples
  Processor: Alpha, Alpha, Alpha
  Technology: 0.75μ CMOS, 0.5μ CMOS, 0.35μ CMOS
  Frequency (MHz)
  Total capacitance: 12.5nF
  Clock load: 3.25nF, 3.75nF
  Clock power: 40%, 40% (20W)
  Max. clock skew: 200ps (<10%), 90ps
D. W. Bailey and B. J. Benschneider, “Clocking Design and Analysis for a 600-MHz Alpha Microprocessor,” IEEE J. Solid-State Circuits, vol. 33, no. 11, pp , Nov

Power Reduction Example
Alpha 21064: 3.45V, power dissipation = 26W
  Reduce voltage to 1.5V, power (5.3x) = 4.9W
  Eliminate FP, power (3x) = 1.6W
  Scale 0.75→0.35μ, power (2x) = 0.8W
  Reduce clock load, power (1.3x) = 0.6W
  Reduce frequency 200→160MHz, power (1.25x) = 0.5W
J. Montanaro et al., “A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor,” IEEE J. Solid-State Circuits, vol. 31, no. 11, pp , Nov
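
The chain of reductions above can be checked with a few lines of arithmetic. In this sketch the voltage factor is computed as the square of the voltage ratio and the frequency factor as the frequency ratio, which is an assumption consistent with dynamic power scaling as V^2 f; the other factors are copied from the slide. The function name is hypothetical.

```python
def apply_reductions(start_watts, steps):
    """Divide the power by each reduction factor in turn, recording each stage."""
    power = start_watts
    history = []
    for label, factor in steps:
        power /= factor
        history.append((label, factor, power))
    return history

steps = [
    ("Reduce voltage 3.45V -> 1.5V", (3.45 / 1.5) ** 2),  # about 5.3x (V^2 scaling)
    ("Eliminate FP", 3.0),
    ("Scale 0.75 -> 0.35 micron", 2.0),
    ("Reduce clock load", 1.3),
    ("Reduce frequency 200 -> 160 MHz", 200.0 / 160.0),   # 1.25x
]

history = apply_reductions(26.0, steps)
for label, factor, watts in history:
    print(f"{label}: {factor:.2f}x -> {watts:.2f} W")
```

The intermediate values differ slightly from the slide because the slide rounds after each step, but the final figure still comes out at about 0.5 W.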

Parallel Architecture
Single processor (input and output at rate f):
  Capacitance = C
  Voltage = V
  Frequency = f
  Power = CV^2 f
Two parallel processors (each running at f/2, input and output at rate f):
  Capacitance = 2.2C
  Voltage = 0.6V
  Frequency = 0.5f
  Power = 0.396CV^2 f

Pipeline Architecture
Single processor (input and output at rate f):
  Capacitance = C
  Voltage = V
  Frequency = f
  Power = CV^2 f
Pipelined processor (two ½ Proc. stages separated by registers, running at f):
  Capacitance = 1.2C
  Voltage = 0.6V
  Frequency = f
  Power = 0.432CV^2 f
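
The power figures quoted for the parallel and pipelined designs follow directly from P = CV^2 f. A minimal check, with a hypothetical helper that works in units of the original processor's CV^2 f:

```python
def relative_power(cap_scale, volt_scale, freq_scale):
    """Dynamic power relative to the baseline CV^2 f, given scale factors."""
    return cap_scale * volt_scale ** 2 * freq_scale

# Parallel: 2.2C, 0.6V, 0.5f.  Pipeline: 1.2C, 0.6V, f.
parallel = relative_power(2.2, 0.6, 0.5)
pipeline = relative_power(1.2, 0.6, 1.0)
print(f"parallel: {parallel:.3f} CV^2f, pipeline: {pipeline:.3f} CV^2f")
```

In both cases the saving comes from the quadratic voltage term: the supply can drop to 0.6V because each unit has more time per operation, and this outweighs the added capacitance.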

Approximate Trend
               n-parallel proc.    n-stage pipeline proc.
  Capacitance  nC                  C
  Voltage      V/n                 V/n
  Frequency    f/n                 f
  Power        CV^2 f/n^2          CV^2 f/n^2
  Chip area    n times             10-20% increase
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.
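
The power entry of the trend can be worked out by substituting the scaled capacitance, voltage, and frequency into P = CV^2 f. The subscripts "par" and "pipe" below are labels introduced here for the n-parallel and n-stage pipeline cases; the idealized V/n scaling is taken from the trend itself.

```latex
\begin{align*}
P_{\text{par}}  &= (nC)\left(\frac{V}{n}\right)^{2}\left(\frac{f}{n}\right)
                 = \frac{n\,C V^{2} f}{n^{2}\cdot n}
                 = \frac{C V^{2} f}{n^{2}} \\
P_{\text{pipe}} &= C\left(\frac{V}{n}\right)^{2} f
                 = \frac{C V^{2} f}{n^{2}}
\end{align*}
```

Both approaches give the same approximate power, so the deciding difference in this idealized trend is chip area: n times for the parallel design against a 10-20% increase for the pipeline.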

For More on Microprocessors
T. D. Burd and R. W. Brodersen, Energy Efficient Microprocessor Design, Springer,
R. Graybill and R. Melhem, Power Aware Computing, New York: Plenum Publishers, 2002.