
ELEC 5970-001/6970-001 (Fall 2005) Special Topics in Electrical Engineering: Low-Power Design of Electronic Circuits. Lecture 19, 11/15/05.


1 Power Aware Microprocessors
Vishwani D. Agrawal
James J. Danaher Professor
Department of Electrical and Computer Engineering
Auburn University
http://www.eng.auburn.edu/~vagrawal
vagrawal@eng.auburn.edu

2 SIA Roadmap for Processors (1999)
Year                        1999   2002   2005   2008   2011   2014
Feature size (nm)            180    130    100     70     50     35
Logic transistors/cm²       6.2M    18M    39M    84M   180M   390M
Clock (GHz)                 1.25    2.1    3.5    6.0   10.0   16.9
Chip size (mm²)              340    430    520    620    750    900
Power supply (V)             1.8    1.5    1.2    0.9    0.6    0.5
High-perf. power (W)          90    130    160    170    175    183
Source: http://www.semichips.org
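The roadmap rows can be combined into derived quantities that the slide does not list. The short Python sketch below is an added illustration, not part of the roadmap: it divides power by supply voltage to get the implied average supply current (I = P/V) and divides power by chip area to get power density. The variable and function names are hypothetical.

```python
# Values copied from the 1999 SIA roadmap table above.
years = [1999, 2002, 2005, 2008, 2011, 2014]
supply_v = [1.8, 1.5, 1.2, 0.9, 0.6, 0.5]
power_w = [90, 130, 160, 170, 175, 183]
chip_mm2 = [340, 430, 520, 620, 750, 900]


def supply_current(power, voltage):
    """Average supply current in amperes, I = P / V."""
    return power / voltage


def power_density(power, area_mm2):
    """Power density in W/cm^2 (100 mm^2 = 1 cm^2)."""
    return power / (area_mm2 / 100.0)


for y, v, p, a in zip(years, supply_v, power_w, chip_mm2):
    print(f"{y}: I = {supply_current(p, v):6.1f} A, "
          f"density = {power_density(p, a):5.1f} W/cm^2")
```

Under these numbers, power roughly doubles (90 W to 183 W) while the implied supply current grows about sevenfold (50 A to 366 A), because the supply voltage falls from 1.8 V to 0.5 V.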

3 Power Reduction in Processors
Just about every technique is used.
Hardware methods:
  Voltage reduction for dynamic power
  Dual-threshold devices for leakage reduction
  Clock gating, frequency reduction
  Sleep mode
Architecture:
  Instruction set
  Hardware organization
Software methods

4 SPEC CPU2000 Benchmarks
Twelve integer and 14 floating-point programs, forming the CINT2000 and CFP2000 suites.
The run time of each program is normalized to obtain a SPEC ratio with respect to the run time of a Sun Ultra 5_10 with a 300 MHz processor.
The CINT2000 and CFP2000 summary measurements are the geometric means of the SPEC ratios.
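The scoring rule can be sketched in a few lines of Python. This is a minimal illustration assuming the CPU2000 convention that each ratio (reference time / measured time) is scaled by 100, so the reference machine scores 100; the run times are hypothetical.

```python
from statistics import geometric_mean


def spec_ratio(reference_seconds, measured_seconds):
    """SPEC CPU2000 ratio: reference run time / measured run time, scaled by 100."""
    return 100.0 * reference_seconds / measured_seconds


def spec_summary(reference_times, measured_times):
    """Summary metric: geometric mean of the per-program SPEC ratios."""
    ratios = [spec_ratio(r, m) for r, m in zip(reference_times, measured_times)]
    return geometric_mean(ratios)


# Hypothetical run times (seconds) for a two-program suite.
ref = [1400.0, 1800.0]
run = [140.0, 90.0]
print(spec_summary(ref, run))  # geometric mean of 1000 and 2000, about 1414
```

The geometric mean has a useful property here: the ratio of two machines' summary scores equals the geometric mean of their per-program speed ratios, so the comparison does not depend on which reference machine is chosen.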

5 Reference CPU: Sun Ultra 5_10, 300 MHz Processor

6 CINT2000: 3.4 GHz Pentium 4, HT Technology (D850MD Motherboard)
SPECint2000_base = 1341
SPECint2000 = 1389
Source: www.spec.org

7 Two Benchmark Results
Baseline: a uniform configuration not optimized for any specific program:
  the same compiler with the same settings and flags is used for all benchmarks;
  other restrictions also apply.
Peak: each run is optimized to obtain the peak performance for each benchmark program.

8 CFP2000: 3.6 GHz Pentium 4, HT Technology (D925XCV/AA-400 Motherboard)
SPECfp2000_base = 1627
SPECfp2000 = 1630
Source: www.spec.org

9 CINT2000: 1.7 GHz Pentium 4 (D850MD Motherboard)
SPECint2000_base = 579
SPECint2000 = 588
Source: www.spec.org

10 CFP2000: 1.7 GHz Pentium 4 (D850MD Motherboard)
SPECfp2000_base = 648
SPECfp2000 = 659
Source: www.spec.org

11 Energy SPEC Benchmarks
Energy efficiency mode: besides the execution time, the energy efficiency of SPEC benchmark programs is also measured.
The energy efficiency of a benchmark program is given by:
$$\text{Energy efficiency} = \frac{1/(\text{execution time})}{\text{joules consumed}}$$

12 Energy Efficiency
Efficiency averaged over n benchmark programs:
$$\text{Efficiency} = \left(\prod_{i=1}^{n} \text{Efficiency}_i\right)^{1/n}$$
where Efficiency_i is the efficiency for program i.
Relative efficiency:
$$\text{Relative efficiency} = \frac{\text{Efficiency of a computer}}{\text{Efficiency of reference computer}}$$
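The two energy-efficiency slides combine into a short calculation. The Python sketch below follows the slide definitions directly (per-program efficiency, geometric mean over programs, ratio to a reference computer); the function names and the (time, energy) measurements are hypothetical.

```python
from statistics import geometric_mean


def energy_efficiency(exec_seconds, joules):
    """Per-program efficiency: (1 / execution time) / energy consumed."""
    return (1.0 / exec_seconds) / joules


def suite_efficiency(runs):
    """Geometric mean of per-program efficiencies; runs = [(seconds, joules), ...]."""
    return geometric_mean([energy_efficiency(t, e) for t, e in runs])


def relative_efficiency(machine_runs, reference_runs):
    """Efficiency of a computer divided by that of the reference computer."""
    return suite_efficiency(machine_runs) / suite_efficiency(reference_runs)


# Hypothetical (execution time in s, energy in J) for two programs.
machine = [(10.0, 100.0), (20.0, 50.0)]
reference = [(20.0, 200.0), (40.0, 100.0)]
print(relative_efficiency(machine, reference))  # about 4.0
```

Because the metric is 1/(time × energy), the hypothetical machine, which is twice as fast and uses half the energy on every program, scores about four times the reference.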

13 SPEC2000 Relative Energy Efficiency
[Figure: relative energy efficiency for three operating modes: always max. clock; laptop adaptive clock; min. power, min. clock]

14 Voltage Scaling
Dynamic: reduce voltage and frequency during idle or low-activity periods.
Static: clustered voltage scaling.
  Logic on non-critical paths is given a lower supply voltage.
  A 47% power reduction with a 10% area increase has been reported.
M. Igarashi et al., “Clustered Voltage Scaling Techniques for Low-Power Design,” Proc. IEEE Symp. Low Power Design, 1997.
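The static technique can be illustrated with a simplified greedy assignment in the spirit of clustered voltage scaling. This Python sketch is not the cited paper's algorithm; it assumes a hypothetical netlist (GATES), a fixed hypothetical delay penalty (`slowdown`) at the lower supply, longest-path timing, and the clustering rule that a low-voltage gate may drive only low-voltage gates. Gates are visited from the outputs backward and kept at low VDD only if the original critical delay is not exceeded.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical netlist: gate -> (delay at high VDD, list of fanout gates).
GATES = {
    "a": (2.0, ["c"]),
    "b": (1.0, ["d"]),
    "e": (1.0, ["c", "d"]),
    "c": (2.0, []),  # primary output on the critical path a -> c
    "d": (1.0, []),  # primary output with slack
}


def circuit_delay(gates, low, slowdown):
    """Longest-path delay when the gates in `low` run at the lower supply."""
    fanins = {g: [] for g in gates}
    for g, (_, fanouts) in gates.items():
        for out in fanouts:
            fanins[out].append(g)
    arrival = {}
    for g in TopologicalSorter(fanins).static_order():  # inputs first
        delay = gates[g][0] * (slowdown if g in low else 1.0)
        arrival[g] = delay + max((arrival[p] for p in fanins[g]), default=0.0)
    return max(arrival.values())


def clustered_voltage_scaling(gates, slowdown=1.5):
    """Assign low VDD from the outputs backward without exceeding the original delay."""
    target = circuit_delay(gates, set(), slowdown)
    fanout_graph = {g: fo for g, (_, fo) in gates.items()}
    low = set()
    for g in TopologicalSorter(fanout_graph).static_order():  # outputs first
        # Cluster rule: a low-VDD gate may only drive low-VDD gates, so no
        # level converter is needed inside the logic.
        if not all(out in low for out in gates[g][1]):
            continue
        low.add(g)
        if circuit_delay(gates, low, slowdown) > target + 1e-9:
            low.discard(g)  # would lengthen the critical path: keep high VDD
    return low


print(sorted(clustered_voltage_scaling(GATES)))
```

In this example only b and d move to the lower supply. Gate e has timing slack but stays at high VDD because it drives c, which is on the critical path; that is the clustering constraint at work.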

15 Pipeline Gating
A pipelined processor uses speculative execution.
An incorrect branch prediction results in pipeline flushes and wasted energy.
Idea: stop fetching instructions when a branch hazard is expected:
  if the count (M) of likely incorrect predictions exceeds a pre-specified number (N), suspend instruction fetch for some k cycles.
Ref.: S. Manne, A. Klauser and D. Grunwald, “Pipeline Gating: Speculation Control for Energy Reduction,” Proc. 25th Annual International Symp. Computer Architecture, June 1998.
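The gating rule on this slide can be modeled as a small per-cycle controller. This Python sketch is a toy model of the rule as stated, not the cited paper's hardware: the input trace of "suspect" branches (standing in for a confidence estimator's output) and the choice to reset M after each gating event are assumptions.

```python
def gate_fetch(suspect_branch_per_cycle, n_threshold, k_cycles):
    """Return, for each cycle, whether instruction fetch is enabled.

    suspect_branch_per_cycle[i] is True if a branch fetched in cycle i is
    predicted to be mispredicted (a hypothetical confidence-estimator output).
    """
    m = 0            # count M of likely incorrect predictions
    gated_left = 0   # remaining cycles with fetch suspended
    fetch_enabled = []
    for suspect in suspect_branch_per_cycle:
        if gated_left > 0:
            # Fetch is suspended: nothing is fetched, so no new branches arrive.
            gated_left -= 1
            fetch_enabled.append(False)
            continue
        fetch_enabled.append(True)
        if suspect:
            m += 1
            if m > n_threshold:   # M exceeds N: gate fetch for the next k cycles
                gated_left = k_cycles
                m = 0
    return fetch_enabled


trace = [True, False, True, False, False, False, True]
print(gate_fetch(trace, n_threshold=1, k_cycles=2))
# [True, True, True, False, False, True, True]
```

With N = 1 and k = 2, the second suspect branch (cycle 2) pushes M past the threshold, so fetch is suspended in cycles 3 and 4; the energy those cycles would have spent on likely wrong-path instructions is saved.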

16 Slack Scheduling
Application: superscalar, out-of-order execution:
  an instruction is executed as soon as the data and resources it needs become available;
  a commit unit reorders the results.
Delay the execution of instructions whose results are not immediately needed.
Example of RISC instructions:
  add r0, r1, r2    (A)
  sub r3, r4, r5    (B)
  and r9, x1, r9    (C)
  or  r5, r9, r10   (D)
  xor r2, r10, r11  (E)
J. Casmira and D. Grunwald, “Dynamic Instruction Scheduling Slack,” Proc. ACM Kool Chips Workshop, Dec. 2000.
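One way to see which of instructions A to E can be delayed is to compute each instruction's scheduling slack: the difference between its latest and earliest issue cycles. This Python sketch uses an ASAP/ALAP formulation, which is a simplification rather than the cited paper's hardware mechanism; it assumes unit latency, unlimited execution units, destination-first operand order, and register renaming, so only read-after-write dependences count. The helper names are hypothetical.

```python
def parse(instr):
    """Split 'op dest, src1, src2' into (op, dest, [sources])."""
    op, operands = instr.split(maxsplit=1)
    regs = [r.strip() for r in operands.split(",")]
    return op, regs[0], regs[1:]


def instruction_slack(program):
    """Cycles each instruction can be delayed without delaying the block.

    Assumes unit latency, unlimited execution units, and register renaming,
    so only read-after-write dependences constrain the schedule.
    """
    parsed = [parse(i) for i in program]
    n = len(parsed)
    preds = []
    last_writer = {}
    for i, (_, dest, srcs) in enumerate(parsed):
        preds.append([last_writer[s] for s in srcs if s in last_writer])
        last_writer[dest] = i
    succs = [[] for _ in range(n)]
    for i, ps in enumerate(preds):
        for p in ps:
            succs[p].append(i)
    # Earliest (as-soon-as-possible) issue cycle.
    asap = [0] * n
    for i in range(n):
        asap[i] = max((asap[p] + 1 for p in preds[i]), default=0)
    horizon = max(asap)  # last cycle of the standard schedule
    # Latest (as-late-as-possible) issue cycle that still meets the horizon.
    alap = [horizon] * n
    for i in reversed(range(n)):
        alap[i] = min((alap[s] - 1 for s in succs[i]), default=horizon)
    return [late - early for early, late in zip(asap, alap)]


program = [
    "add r0, r1, r2",    # A
    "sub r3, r4, r5",    # B
    "and r9, x1, r9",    # C
    "or r5, r9, r10",    # D (reads r9 written by C)
    "xor r2, r10, r11",  # E
]
print(instruction_slack(program))  # [1, 1, 0, 0, 1]
```

Under this model C and D form the only dependence chain, so they have zero slack, while A, B and E can wait a cycle; those are the candidates for slower, low-power execution units.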

17 Slack Scheduling Example
[Figure: issue schedules of instructions A to E under slack scheduling and under standard scheduling]

18 Slack Scheduling
[Figure: hardware organization with slack bit, low-power execution units, re-order buffer, and scheduling logic]

19 Parallel Architecture
[Figure: left, one processor at clock f; right, two processors at f/2 sharing the same input and output at f]
One processor: capacitance = C, voltage = V, frequency = f, power = CV²f
Two parallel processors: capacitance = 2.2C, voltage = 0.6V, frequency = 0.5f, power = 2.2 × 0.6² × 0.5 CV²f = 0.396CV²f

20 Pipeline Architecture
[Figure: left, one processor at clock f; right, the processor split into two pipeline stages (½ Proc. each) separated by registers, still clocked at f]
One processor: capacitance = C, voltage = V, frequency = f, power = CV²f
Two-stage pipeline: capacitance = 1.2C, voltage = 0.6V, frequency = f, power = 1.2 × 0.6² CV²f = 0.432CV²f

21 Approximate Trend
                 n-parallel proc.    n-stage pipeline proc.
Capacitance      nC                  C
Voltage          V/n                 V/n
Frequency        f/n                 f
Power            CV²f/n²             CV²f/n²
Chip area        n times             10-20% increase
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.
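All three of the preceding slides apply the same rule: with P = CV²f, the relative power is the capacitance scale times the square of the voltage scale times the frequency scale. The Python sketch below (function name hypothetical) reproduces the 0.396 and 0.432 figures and the idealized 1/n² trend from the table.

```python
def relative_power(cap_scale, volt_scale, freq_scale):
    """Dynamic power relative to CV^2 f when C, V, f are scaled."""
    return cap_scale * volt_scale ** 2 * freq_scale


# Worked examples from the parallel (slide 19) and pipeline (slide 20) slides.
parallel = relative_power(2.2, 0.6, 0.5)
pipeline = relative_power(1.2, 0.6, 1.0)
print(f"parallel: {parallel:.3f}, pipeline: {pipeline:.3f}")

# Idealized trend from the table: both organizations give 1/n^2.
for n in (2, 3, 4):
    par_n = relative_power(n, 1 / n, 1 / n)
    pipe_n = relative_power(1, 1 / n, 1)
    print(n, round(par_n, 4), round(pipe_n, 4), round(1 / n**2, 4))
```

The table's 1/n² entries assume the supply can scale all the way to V/n and ignore the extra capacitance seen in the worked examples (2.2C rather than 2C, 1.2C rather than C), which is why that slide calls the trend approximate.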

22 Clock Distribution
[Figure: clock distribution network]

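23 Clock Power
Each buffer stage closer to the clock source drives 1/λ of the capacitance of the stage it feeds, so the stage powers form a geometric series:
$$P_{clk} = C_L V_{DD}^2 f + \frac{C_L V_{DD}^2 f}{\lambda} + \frac{C_L V_{DD}^2 f}{\lambda^2} + \cdots = C_L V_{DD}^2 f \sum_{n=0}^{\text{stages}-1} \frac{1}{\lambda^n}$$
where C_L = total load capacitance and λ = constant fanout at each stage of the distribution network.
The clock consumes about 40% of total processor power.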
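The series can be evaluated directly, and as the number of stages grows it approaches the closed-form limit λ/(λ − 1). This Python sketch implements the formula above; the tree parameters (1 nF load, 1.2 V, 1 GHz, fanout 4, four stages) are hypothetical values chosen only for illustration.

```python
def clock_power(c_load, vdd, freq, fanout, stages):
    """Clock-tree power: C_L * VDD^2 * f * sum_{n=0}^{stages-1} fanout**(-n)."""
    series = sum(fanout ** -n for n in range(stages))
    return c_load * vdd ** 2 * freq * series


def clock_power_limit(c_load, vdd, freq, fanout):
    """Upper bound as stages -> infinity: the series tends to fanout / (fanout - 1)."""
    return c_load * vdd ** 2 * freq * fanout / (fanout - 1)


# Hypothetical tree: 1 nF total load, 1.2 V, 1 GHz, fanout 4, 4 stages.
p = clock_power(1e-9, 1.2, 1e9, fanout=4, stages=4)
print(f"{p:.4f} W (bound {clock_power_limit(1e-9, 1.2, 1e9, 4):.4f} W)")
```

For fanout 4 the upstream buffer stages add at most 1/(λ − 1) = 1/3 on top of the power needed to drive the leaf load C_L, so the leaf load dominates clock power.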

24 Clock Network Examples
                   Alpha 21064    Alpha 21164    Alpha 21264
Technology         0.75μ CMOS     0.5μ CMOS      0.35μ CMOS
Frequency (MHz)    200            300            600
Other values on the slide: total capacitance 12.5 nF; clock load 3.25 nF and 3.75 nF; clock power 20 W; max. clock skew 200 ps (<10%) and 90 ps.
D. W. Bailey and B. J. Benschneider, “Clocking Design and Analysis for a 600-MHz Alpha Microprocessor,” IEEE J. Solid-State Circuits, vol. 33, no. 11, pp. 1627-1633, Nov. 1998.

25 Power Reduction Example
Alpha 21064: 200 MHz at 3.45 V, power dissipation = 26 W
Reduce voltage to 1.5 V: power reduced 5.3x, since (3.45/1.5)² ≈ 5.3, to 4.9 W
Eliminate FP: power reduced 3x, to 1.6 W
Scale 0.75 → 0.35μ: power reduced 2x, to 0.8 W
Reduce clock load: power reduced 1.3x, to 0.6 W
Reduce frequency 200 → 160 MHz: power reduced 1.25x, to 0.5 W
J. Montanaro et al., “A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor,” IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1703-1714, Nov. 1996.
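The cumulative arithmetic can be checked by dividing the starting power by each factor in turn. In this Python sketch the voltage and frequency factors are computed from P ∝ V²f, while the other three factors are taken from the slide as given.

```python
BASE_POWER_W = 26.0  # Alpha 21064 at 200 MHz, 3.45 V

# Successive reductions from the slide: (step, division factor).
steps = [
    ("reduce VDD 3.45 V -> 1.5 V", (3.45 / 1.5) ** 2),  # P ~ V^2, about 5.3x
    ("eliminate floating point", 3.0),
    ("scale 0.75 um -> 0.35 um", 2.0),
    ("reduce clock load", 1.3),
    ("reduce f 200 MHz -> 160 MHz", 200 / 160),          # P ~ f, 1.25x
]

power = BASE_POWER_W
for name, factor in steps:
    power /= factor
    print(f"{name:30s} /{factor:4.2f} -> {power:5.2f} W")
```

The running values (about 4.9, 1.6, 0.82, 0.63 and 0.50 W) match the slide, and the overall reduction is the product of the factors, roughly 52x.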

