Spring 07, Feb 20 ELEC 7770: Advanced VLSI Design (Agrawal) 1 ELEC 7770 Advanced VLSI Design Spring 2007 Reducing Power through Multicore Parallelism Vishwani.


1 Spring 07, Feb 20 ELEC 7770: Advanced VLSI Design (Agrawal)
ELEC 7770 Advanced VLSI Design, Spring 2007
Reducing Power through Multicore Parallelism
Vishwani D. Agrawal, James J. Danaher Professor
ECE Department, Auburn University, Auburn, AL 36849
vagrawal@eng.auburn.edu
http://www.eng.auburn.edu/~vagrawal/COURSE/E7770_Spr07

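2 Power Dissipation in CMOS Logic (0.25µ)
$P_{total}(0 \to 1) = C_L V_{DD}^2 + t_{sc} V_{DD} I_{peak} + V_{DD} I_{leakage}$
Approximate share of each term, in order: 75% (dynamic switching), 5% (short circuit), 20% (leakage).
[Figure: circuit schematic labeled with $V_{DD}$ and $C_L$.]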

3 Low-Power Datapath Architecture
• Lower supply voltage
• This slows down circuit speed
• Use parallel computing to gain the speed back
• Works well when threshold voltage is also lowered.
• About 60% reduction in power obtainable.
• Reference: A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Boston: Kluwer Academic Publishers (Now Springer), 1995.

4 A Reference Datapath
[Figure: datapath with Input, Register (clocked by CK), Combinational logic, and Output; total switched capacitance labeled $C_{ref}$.]
Supply voltage = $V_{ref}$
Total capacitance switched per cycle = $C_{ref}$
Clock frequency = $f$
Power consumption: $P_{ref} = C_{ref} V_{ref}^2 f$

5 A Parallel Architecture
[Figure: block diagram with Input, Register, Comb. Logic Copy 1 through Copy N, N to 1 multiplexer, Output, and a multiphase clock generator with mux control; CK runs at $f$, the copies at $f/N$.]
Each copy processes every Nth input and operates at reduced voltage.
Supply voltage: $V_N \le V_1 = V_{ref}$
$N$ = degree of parallelism

6 Level Converter: L to H
[Figure: level-converter circuit with input Vin_L, output Vout_H, and supplies VDDH and VDDL.]
Transistors with thicker oxide and longer channels.
N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Section 12.4.3, Addison-Wesley, 2005.

7 Level Converter: H to L
[Figure: level-converter circuit with input Vin_H, output Vout_L, and supply VDDL.]
Transistors with thicker oxide and longer channels.
N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Section 12.4.3, Addison-Wesley, 2005.

8 Control Signals, N = 4
[Figure: timing diagram showing CK and the four clock phases, Phase 1 through Phase 4.]

9 Power
$P_N = P_{proc} + P_{overhead}$
$P_{proc} = N (C_{inreg} + C_{comb}) V_N^2 f/N + C_{outreg} V_N^2 f = (C_{inreg} + C_{comb} + C_{outreg}) V_N^2 f = C_{ref} V_N^2 f$
$P_{overhead} = C_{overhead} V_N^2 f \approx \delta C_{ref} (N - 1) V_N^2 f$
$P_N = [1 + \delta (N - 1)] C_{ref} V_N^2 f$
$\dfrac{P_N}{P_1} = [1 + \delta (N - 1)] \dfrac{V_N^2}{V_{ref}^2}$
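The power ratio above is easy to evaluate numerically. The following is a minimal sketch, not part of the original slides: the function name `parallel_power_ratio` is hypothetical, the overhead factor δ = 0.1 is an assumed illustrative value, and the 2.9 V supply for N = 2 is taken from the Voltage vs. Speed slide.

```python
def parallel_power_ratio(n, v_n, v_ref, delta):
    """Return P_N / P_1 = [1 + delta*(N - 1)] * (V_N / V_ref)**2."""
    if n < 1:
        raise ValueError("degree of parallelism N must be at least 1")
    overhead_factor = 1.0 + delta * (n - 1)   # multiplexer, clocking, extra registers
    return overhead_factor * (v_n / v_ref) ** 2


# Illustrative values: N = 2 copies at 2.9 V against a 5 V reference,
# with an assumed overhead factor delta = 0.1.
ratio = parallel_power_ratio(2, 2.9, 5.0, 0.1)
print(f"P_2 / P_1 = {ratio:.3f}")   # prints P_2 / P_1 = 0.370
```

With N = 1 and V_N = V_ref the function returns 1, as expected for the reference datapath.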

10 Voltage vs. Speed
Delay of a gate: $T \approx \dfrac{C_L V_{ref}}{I} = \dfrac{C_L V_{ref}}{k (W/L) (V_{ref} - V_t)^2}$
where
$I$ is saturation current
$k$ is a technology parameter
$W/L$ is width to length ratio of transistor
$V_t$ is threshold voltage
[Figure: normalized gate delay T (0.0 to 4.0) versus supply voltage for 1.2μ CMOS, marking $V_t$, $V_{ref}$ = 5V (N = 1), $V_2$ = 2.9V (N = 2), and $V_3$ (N = 3).]
Voltage reduction slows down as we get closer to $V_t$.
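The delay expression can be inverted to find the supply voltage at which a gate becomes N times slower, which is the voltage each of N parallel copies could run at. The sketch below assumes the simple square-law model from this slide; the function names and the threshold value 0.8 V are illustrative assumptions. The plotted 1.2μ CMOS curve is measured data, so its values (such as $V_2$ = 2.9V) need not match this idealized model.

```python
import math


def gate_delay(v, v_t):
    """Relative gate delay T ~ V / (V - V_t)^2; C_L and k(W/L) cancel in ratios."""
    if v <= v_t:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return v / (v - v_t) ** 2


def scaled_supply(n, v_ref, v_t):
    """Supply voltage V_N at which the gate delay is n times the delay at v_ref."""
    k = n * gate_delay(v_ref, v_t)
    # Solve k*(V - v_t)^2 = V, i.e. k*V^2 - (2*k*v_t + 1)*V + k*v_t^2 = 0.
    # The product of the roots is v_t^2, so only the larger root lies above v_t.
    return (2 * k * v_t + 1 + math.sqrt(4 * k * v_t + 1)) / (2 * k)


# Assumed example: V_ref = 5 V and V_t = 0.8 V.
for n in (1, 2, 3):
    v_n = scaled_supply(n, 5.0, 0.8)
    slowdown = gate_delay(v_n, 0.8) / gate_delay(5.0, 0.8)
    print(f"N = {n}: V_N = {v_n:.2f} V, delay ratio = {slowdown:.2f}")
```

Each step in N buys a smaller voltage reduction than the previous one, which is the saturation effect the slide describes near $V_t$.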

11 Increasing Multiprocessing
[Figure: $P_N / P_1$ (0.0 to 1.0) versus N (1 to 12) for 1.2μ CMOS with $V_{ref}$ = 5V; curves for $V_t$ = 0V (extreme case), $V_t$ = 0.4V, and $V_t$ = 0.8V.]

12 Extreme Cases: $V_t = 0$
Delay: $T \propto 1/V_{ref}$
For N processing elements, delay = $NT$ → $V_N = V_{ref}/N$
$\dfrac{P_N}{P_1} = [1 + \delta (N - 1)] \dfrac{1}{N^2} \to 1/N$
For negligible overhead, $\delta \to 0$:
$\dfrac{P_N}{P_1} \approx \dfrac{1}{N^2}$
For $V_t > 0$, power reduction is less and there will be an optimum value of N.
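The steps behind this extreme case can be written out as follows. This is a sketch that combines the gate-delay model and the power ratio from the earlier slides. The remark on the $1/N$ limit is one possible reading of the slide, not something the slide states.

```latex
\begin{align*}
T(V) &\propto \frac{V}{(V - V_t)^2}
      \quad\Longrightarrow\quad
      T(V) \propto \frac{1}{V} \quad \text{when } V_t = 0, \\[4pt]
T(V_N) = N\,T(V_{ref})
     &\quad\Longrightarrow\quad
      \frac{1}{V_N} = \frac{N}{V_{ref}}
      \quad\Longrightarrow\quad
      V_N = \frac{V_{ref}}{N}, \\[4pt]
\frac{P_N}{P_1}
     &= \bigl[1 + \delta (N - 1)\bigr] \frac{V_N^2}{V_{ref}^2}
      = \frac{1 + \delta (N - 1)}{N^2}.
\end{align*}
```

Setting $\delta = 0$ gives $1/N^2$, while $\delta = 1$ gives exactly $1/N$. For fixed $\delta > 0$ and large $N$ the ratio behaves like $\delta/N$, so the overhead term eventually dominates the savings.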

13 Example: Multiplier Core
• Specification:
  • 200MHz clock
  • 15W dissipation @ 5V
  • Low voltage operation, $V_{DD} \ge 1.5$ volts
  • Relative clock rate = $\dfrac{(V_{DD} - 0.5)^2}{20.25}$
• Problem:
  • Integrate multiplier core on a SOC
  • Power budget for multiplier ~ 5W

14 A Multicore Design
[Figure: block diagram with Input, Reg, Multiplier Core 1, Multiplier Core 2, through Multiplier Core 5, 5 to 1 mux, Output, and a multiphase clock generator with mux control; CK and the input/output run at 200MHz, each core at 40MHz.]
Core clock frequency = 200/N MHz; N should divide 200.

15 How Many Cores?
• For N cores:
  • Clock frequency = 200/N MHz
  • Supply voltage, $V_{DDN} = 0.5 + (20.25/N)^{1/2}$ volts
  • Assuming 10% overhead per core,
    Power dissipation = $15\,[1 + 0.1 (N - 1)] \left( \dfrac{V_{DDN}}{5} \right)^2$ watts
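The supply-voltage expression follows from the relative clock rate given in the multiplier specification. The sketch below assumes each of the N cores runs at 1/N of the 200MHz rate. The upper bound on N is derived here from the $V_{DD} \ge 1.5$ V limit and is not stated on the slides.

```latex
\begin{align*}
\frac{(V_{DDN} - 0.5)^2}{20.25} = \frac{1}{N}
  &\quad\Longrightarrow\quad
   V_{DDN} = 0.5 + \sqrt{\frac{20.25}{N}}, \\[4pt]
N = 1 &\quad\Longrightarrow\quad V_{DD1} = 0.5 + 4.5 = 5\ \text{V}, \\[4pt]
V_{DDN} \ge 1.5
  &\quad\Longrightarrow\quad
   \sqrt{\frac{20.25}{N}} \ge 1
   \quad\Longrightarrow\quad
   N \le 20.25 .
\end{align*}
```

The power expression is then the general ratio $P_N/P_1 = [1 + \delta (N - 1)]\,V_N^2/V_{ref}^2$ with $\delta = 0.1$, $V_{ref} = 5$ V, and $P_1 = 15$ W.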

16 Design Tradeoffs
Number of cores N   Clock (MHz)   Core supply VDDN (Volts)   Total Power (Watts)
1                   200           5.00                       15.0
2                   100           3.68                       8.94
4                   50            2.75                       5.90
5                   40            2.51                       5.29
8                   25            2.10                       4.50
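The trade-off table can be regenerated from the formulas on the "How Many Cores?" slide. The following is a minimal sketch; the function names are hypothetical, and the default arguments encode the slide's numbers (15W at 5V, 10% overhead per core). Unrounded arithmetic can differ from the table in the second decimal place, which would be consistent with the table having been computed from voltages rounded to two decimals.

```python
import math


def core_supply(n):
    """V_DDN = 0.5 + sqrt(20.25 / N), in volts."""
    return 0.5 + math.sqrt(20.25 / n)


def total_power(n, overhead=0.1, p_ref=15.0, v_ref=5.0):
    """Total power in watts: P_ref * [1 + overhead*(N - 1)] * (V_DDN / V_ref)^2."""
    return p_ref * (1.0 + overhead * (n - 1)) * (core_supply(n) / v_ref) ** 2


print(f"{'N':>2} {'Clock (MHz)':>12} {'VDDN (V)':>9} {'Power (W)':>10}")
for n in (1, 2, 4, 5, 8):
    print(f"{n:>2} {200 / n:>12.0f} {core_supply(n):>9.2f} {total_power(n):>10.2f}")

# Smallest core count among the tabulated options that meets the 5 W budget.
within_budget = [n for n in (1, 2, 4, 5, 8) if total_power(n) <= 5.0]
print("Smallest N within 5 W:", min(within_budget))
```

Of the tabulated options, only N = 8 falls under the 5W budget, at a core supply of about 2.1V.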

17 Power Reduction in Processors
• Just about everything is used.
• Hardware methods:
  • Voltage reduction for dynamic power
  • Dual-threshold devices for leakage reduction
  • Clock gating, frequency reduction
  • Sleep mode
• Architecture:
  • Instruction set
  • Hardware organization
• Software methods

18 Parallel Architecture
[Figure: a single processor clocked at f, compared with two parallel processors each clocked at f/2 whose input and output run at f.]
Single processor: Capacitance = $C$, Voltage = $V$, Frequency = $f$, Power = $CV^2 f$
Two parallel processors: Capacitance = $2.2C$, Voltage = $0.6V$, Frequency = $0.5f$, Power = $0.396\,CV^2 f$

19 Pipeline Architecture
[Figure: a single processor with an input register clocked at f, compared with a two-stage pipeline of half-processors (½ Proc.) separated by registers, also clocked at f.]
Single processor: Capacitance = $C$, Voltage = $V$, Frequency = $f$, Power = $CV^2 f$
Two-stage pipeline: Capacitance = $1.2C$, Voltage = $0.6V$, Frequency = $f$, Power = $0.432\,CV^2 f$
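Both power figures follow from scaling each factor of $CV^2 f$. A minimal sketch of the arithmetic, with a hypothetical helper name and the scaling factors taken from the parallel and pipeline slides:

```python
def relative_power(cap_factor, volt_factor, freq_factor):
    """Power relative to C*V^2*f when C, V and f are each scaled by the given factor."""
    return cap_factor * volt_factor ** 2 * freq_factor


# Two parallel processors: 2.2x capacitance, 0.6x voltage, half the clock frequency.
parallel = relative_power(2.2, 0.6, 0.5)
# Two-stage pipeline: 1.2x capacitance, 0.6x voltage, unchanged clock frequency.
pipeline = relative_power(1.2, 0.6, 1.0)

print(f"parallel: {parallel:.3f} CV^2f")   # prints parallel: 0.396 CV^2f
print(f"pipeline: {pipeline:.3f} CV^2f")   # prints pipeline: 0.432 CV^2f
```

With these numbers the parallel version saves slightly more power, while the pipeline adds far less capacitance (and hence area).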

20 Approximate Trend
              n-parallel proc.   n-stage pipeline proc.
Capacitance   $nC$               $C$
Voltage       $V/n$              $V/n$
Frequency     $f/n$              $f$
Power         $CV^2 f/n^2$       $CV^2 f/n^2$
Chip area     n times            10-20% increase
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.

21 Multicore Processors
[Figure: performance of multicore versus single-core processors over 2000, 2004, and 2008, based on SPECint2000 and SPECfp2000 benchmarks.]
Source: Computer, May 2005, p. 12

22 Multicore Processors
• D. Geer, “Chip Makers Turn to Multicore Processors,” Computer, vol. 38, no. 5, pp. 11-13, May 2005.
• A. Jerraya, H. Tenhunen and W. Wolf, “Multiprocessor Systems-on-Chips,” Computer, vol. 38, no. 7, pp. 36-40, July 2005; this special issue contains three more articles on multicore processors.
• S. K. Moore, “Winner: Multimedia Monster – Cell’s Nine Processors Make It a Supercomputer on a Chip,” IEEE Spectrum, vol. 43, no. 1, pp. 20-23, January 2006.

23 Cell - Cell Broadband Engine Architecture
[Photo, left to right: Atsushi Kameyama, Toshiba; James Kahle, IBM; Masakazu Suzuoki, Sony. © IEEE Spectrum, January 2006]
Nine-processor chip: 192 Gflops

24 Cell’s Nine-Processor Chip
[Figure: die photo of the Cell chip. © IEEE Spectrum, January 2006]
Eight identical processors
f = 5.6GHz (max)
44.8 Gflops

