1 ELEC 5270/6270 Fall 2007
Low-Power Design of Electronic Circuits
Memory and Multicore Design
Vishwani D. Agrawal, James J. Danaher Professor
Dept. of Electrical and Computer Engineering
Auburn University, Auburn, AL 36849
vagrawal@eng.auburn.edu
http://www.eng.auburn.edu/~vagrawal/COURSE/E6270_Fall07/course.html
Copyright Agrawal, 2007. ELEC6270 Fall 07, Lecture 10

2 Memory Architecture
[Figure: an array of N words (Word 0 to Word N-1), each M bits wide and built from storage cells, with an M-bit input-output port. In the first version each word has its own select line, S_0 to S_{N-1}. In the second version a decoder generates the select lines from K address lines, A_0 to A_{K-1}.]
K = log2 N, N = 2^K
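The relation K = log2 N can be checked with a short sketch. The function name `address_lines` is hypothetical, and rounding up for word counts that are not powers of two is an assumption that goes beyond the slide.

```python
def address_lines(n_words: int) -> int:
    """Return K, the number of address lines a decoder needs to select one of n_words words."""
    if n_words < 1:
        raise ValueError("a memory needs at least one word")
    # For n_words >= 1, (n_words - 1).bit_length() equals ceil(log2(n_words)).
    return (n_words - 1).bit_length()


for n in (256, 1024, 65536):
    print(f"N = {n:6d} words -> K = {address_lines(n)} address lines")
```

Using integer bit arithmetic avoids floating-point rounding in `math.log2` for large N.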

3 Memory Organization
[Figure: N = 2^K words of M bits each, arranged as a cell array with 2^(K-L) rows and M·2^L columns. The row decoder takes the (K-L)-bit row address, A_L to A_{K-1}, and drives the word lines. The column decoder takes the L-bit column address, A_0 to A_{L-1}, and connects the selected bit lines through the sense amplifiers/drivers to the M-bit input-output port.]
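The row/column split can be illustrated with a small sketch. It assumes, as the figure labels suggest, that the low L address bits (A_0 to A_{L-1}) form the column address and the remaining K-L bits form the row address. The function names are hypothetical.

```python
def split_address(addr: int, k: int, l: int) -> tuple[int, int]:
    """Split a K-bit word address into (row, column) for a 2^(K-L)-row array."""
    if not 0 <= addr < (1 << k):
        raise ValueError("address does not fit in K bits")
    column = addr & ((1 << l) - 1)  # low L bits: A_0 .. A_{L-1}
    row = addr >> l                 # high K-L bits: A_L .. A_{K-1}
    return row, column


def array_shape(k: int, l: int, m: int) -> tuple[int, int]:
    """Return (rows, columns) of the cell array: 2^(K-L) rows by M * 2^L bit columns."""
    return 1 << (k - l), m * (1 << l)


row, col = split_address(0b101101, k=6, l=2)
print(f"row = {row}, column = {col}")
print("cell array shape for K=10, L=3, M=8:", array_shape(10, 3, 8))
```

The total number of cells, rows times columns, always equals N·M, so L only changes the aspect ratio of the array.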

4 An SRAM Cell
[Figure: SRAM cell schematic. Labels: bit, VDD, WL (word line), BL (bit line).]

5 Read Operation
[Figure: SRAM cell schematic. Labels: bit, VDD, WL, BL.]
1. Precharge the bit lines to VDD.
2. Set WL = logic 1.
3. The sense amplifier converts the BL swing to a logic level.

6 Precharge Circuit
[Figure: SRAM cell with precharge circuit and differential sense amplifier. Labels: bit, VDD, WL, BL, PC (precharge), Diff. sense ampl.]

7 Reading 1 from Cell
[Timing diagram: Precharge, WL, BL, and sense amplifier output versus time. Annotation: pulsed to save bit line charge.]

8 Write Operation, 1 → 0
[Figure: SRAM cell schematic. Labels: bit, VDD, WL, BL, with the bit lines driven to 0 and 1.]
1. Set BL = 0 and the complementary bit line = 1.
2. Set WL = 1.

9 Cell Array Power Management
Smaller transistors
Low supply voltage
Lower voltage swing (0.1 V to 0.3 V for SRAM)
  Sense amplifier restores the full voltage swing for outside use.
Power-down and sleep modes

10 Sense Amplifier
[Figure: sense amplifier schematic connected to the bit lines and VDD, with a full voltage swing output.]
SE or CLK: sense amplifier enable. Low when bit lines are precharged and equalized.

11 Sense Amplifier: Precharge
[Figure: sense amplifier with bit lines at 1 and SE = 0. Marked node and transistor states: VDD, 0, OFF, ON.]

12 Sense Amplifier: Reading 0
[Figure: sense amplifier with SE = 1. One bit line has dropped to 1 – ∆ while the other stays at 1. Marked node and transistor states: VDD, 1, 0, ON, OFF, ON.]

13 Sense Amplifier: Reading 1
[Figure: sense amplifier with SE = 1. The bit lines are at 1 and 1 – ∆, the opposite of the reading-0 case. Marked node and transistor states: VDD, 0, 1, ON, OFF, ON.]

14 Block-Oriented Architecture
A single cell array may contain 64 Kbits to 256 Kbits.
Larger arrays become slow and consume more power.
Larger memories are block oriented.

15 Hierarchical Organization
[Figure: Block 0 to Block P-1 connected to a global data bus and a global amplifier/driver leading to I/O. Control circuitry and a block selector are driven by the block address; row and column addresses go to the blocks.]

16 Power Saving
Block-oriented memory
  Lengths of local word and bit lines are kept small.
  Block address is used to activate the addressed block.
  Unaddressed blocks are put in power-saving mode:
    Sense amplifier and row/column decoders are disabled.
    Cell array is put in power-saving mode.

17 Static Power
[Plot: leakage current (amperes, 100n to 1.3μ) versus supply voltage (0.0 V to 1.8 V) for an 8-kbit SRAM in 0.18μ CMOS and 0.13μ CMOS. Annotation: 7x increase.]

18 Power Saving Modes
Power-down: Disconnect supply. Data is not retained and must be refreshed before use. Example: caches.
Increasing thresholds by body biasing: Negative bias on nonactive cells reduces leakage.
Sleep mode:
  Insert resistance in leakage path; retain data.
  Lower supply voltage.

19 Adding Resistance in Leakage Path
[Figure: a row of SRAM cells on internal rails VDD.int and VSS.int. Labels: VDD, GND, sleep, low-threshold transistor.]

20 Lowering Supply Voltage
[Figure: a row of SRAM cells. Labels: VDD, VDDL, GND, sleep.]
VDDL ≥ 100 mV for 0.13μ CMOS
Sleep = 1: data retention mode

21 Parallelization of Memories
[Figure: Mem 1 holds instr. A, instr. C, instr. E, ... and Mem 2 holds instr. B, instr. D, instr. F, ... Each memory is clocked at f/2, and a MUX with a 0/1 select alternates between them.]
Power = $C' \cdot (f/2) \cdot V_{DD}^2$
C. Piguet, “Circuit and Logic Level Design,” pp. 124-125 in W. Nebel and J. Mermet (Eds.), Low Power Design in Deep Submicron Electronics, Springer, 1997.
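A minimal sketch of the interleaving in the figure, assuming instructions are stored alternately in two memories and the multiplexer select toggles on every fetch. The function names and the list-based memories are illustrative assumptions, not part of the cited design.

```python
def split_interleaved(instructions):
    """Distribute a program over two memories: even positions to Mem 1, odd to Mem 2."""
    mem1 = instructions[0::2]
    mem2 = instructions[1::2]
    return mem1, mem2


def fetch_sequence(mem1, mem2):
    """Rebuild the original order by alternating the MUX select between the memories."""
    fetched = []
    for i in range(len(mem1) + len(mem2)):
        select = i % 2                   # MUX select toggles on every fetch
        bank = mem2 if select else mem1
        fetched.append(bank[i // 2])     # each memory is read on every other fetch
    return fetched


program = ["A", "B", "C", "D", "E", "F"]
mem1, mem2 = split_interleaved(program)
print("Mem 1:", mem1)
print("Mem 2:", mem2)
print("Fetched:", fetch_sequence(mem1, mem2))
```

Each memory is accessed on every other fetch, which is why it can run at f/2.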

22 References
K. Itoh, VLSI Memory Chip Design, Springer-Verlag, 2001.
J. M. Rabaey, A. Chandrakasan and B. Nikolić, Digital Integrated Circuits, Upper Saddle River, New Jersey: Pearson Education, Inc., 2003, Chapter 12.
S.-M. Kang and Y. Leblebici, CMOS Digital Integrated Circuits Analysis and Design, New York: McGraw-Hill, 1996, Chapter 10.

23 Low-Power Datapath Architecture
Lower supply voltage
  This slows down circuit speed.
  Use parallel computing to gain the speed back.
Works well when threshold voltage is also lowered.
About 60% reduction in power obtainable.
Reference: A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Boston: Kluwer Academic Publishers (Now Springer), 1995.

24 A Reference Datapath
[Figure: Input, register, combinational logic, register, Output, with the registers clocked by CK. The switched capacitance is labeled C_ref.]
Supply voltage = V_ref
Total capacitance switched per cycle = C_ref
Clock frequency = f
Power consumption: $P_{ref} = C_{ref} V_{ref}^2 f$

25 A Parallel Architecture
[Figure: Input, register, N copies of the combinational logic (Copy 1, Copy 2, ..., Copy N), N to 1 multiplexer, register, Output. A multiphase clock generator and mux control is driven by CK at frequency f; each copy is clocked at f/N.]
Each copy processes every Nth input and operates at reduced voltage.
Supply voltage: V_N ≤ V_1 = V_ref
N = degree of parallelism

26 Level Converter: L to H
[Figure: level converter schematic. Labels: Vin_L, Vout_H, VDDH, VDDL. Annotation: transistors with thicker oxide and longer channels.]
N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Section 12.4.3, Addison-Wesley, 2005.

27 Level Converter: H to L
[Figure: level converter schematic. Labels: Vin_H, Vout_L, VDDL. Annotation: transistors with thicker oxide and longer channels.]
N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Section 12.4.3, Addison-Wesley, 2005.

28 Control Signals, N = 4
[Timing diagram: CK and the four derived clock phases, Phase 1 to Phase 4.]

29 Power
$P_N = P_{proc} + P_{overhead}$
$P_{proc} = N (C_{inreg} + C_{comb}) V_N^2 \frac{f}{N} + C_{outreg} V_N^2 f = (C_{inreg} + C_{comb} + C_{outreg}) V_N^2 f = C_{ref} V_N^2 f$
$P_{overhead} = C_{overhead} V_N^2 f \approx \delta C_{ref} (N - 1) V_N^2 f$
$P_N = [1 + \delta (N - 1)] C_{ref} V_N^2 f$
$\frac{P_N}{P_1} = [1 + \delta (N - 1)] \frac{V_N^2}{V_{ref}^2}$
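The final ratio on this slide is easy to evaluate numerically. The sketch below is a direct transcription of P_N/P_1; the function name `power_ratio` and the sample values of V_N and δ are assumptions chosen only for illustration.

```python
def power_ratio(n: int, v_n: float, v_ref: float, delta: float) -> float:
    """P_N / P_1 = [1 + delta * (N - 1)] * (V_N / V_ref)^2."""
    if n < 1:
        raise ValueError("degree of parallelism N must be at least 1")
    overhead_factor = 1.0 + delta * (n - 1)   # mux and clocking overhead grows with N
    voltage_factor = (v_n / v_ref) ** 2       # dynamic power scales with V^2
    return overhead_factor * voltage_factor


# Illustrative values only: 4 copies at 2.75 V against a 5 V reference, 10% overhead.
print(f"P_4 / P_1 = {power_ratio(4, 2.75, 5.0, 0.1):.4f}")
```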

30 Voltage vs. Speed
Delay of a gate: $T \approx \frac{C_L V_{ref}}{I} = \frac{C_L V_{ref}}{k (W/L) (V_{ref} - V_t)^2}$
where
  I is saturation current
  k is a technology parameter
  W/L is width to length ratio of transistor
  V_t is threshold voltage
[Plot: normalized gate delay T (0.0 to 4.0) versus supply voltage for 1.2μ CMOS. Marked points: V_ref = 5 V (N = 1), V_2 = 2.9 V (N = 2), V_3 (N = 3), and V_t on the voltage axis.]
Voltage reduction slows down as we get closer to V_t.
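A minimal sketch of the delay model above, normalized so that the delay at V_ref is 1. It uses only the analytic expression T ∝ V/(V − V_t)²; the plotted curve on the slide may come from measured or simulated data, so values from this sketch need not match the marked points. The function name and the sample V_t are assumptions.

```python
def normalized_delay(v: float, v_ref: float, v_t: float) -> float:
    """Gate delay at supply v relative to the delay at v_ref, using T ~ V / (V - V_t)^2."""
    if v <= v_t or v_ref <= v_t:
        raise ValueError("supply voltage must exceed the threshold voltage")
    delay = v / (v - v_t) ** 2
    delay_ref = v_ref / (v_ref - v_t) ** 2
    return delay / delay_ref


# Illustrative sweep with an assumed threshold of 0.8 V and the 5 V reference.
for v in (5.0, 4.0, 3.0, 2.0, 1.5):
    print(f"V = {v:.1f} V -> T / T_ref = {normalized_delay(v, 5.0, 0.8):.2f}")
```

The delay grows steeply as the supply approaches V_t, which is why each extra copy buys a smaller voltage reduction.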

31 Increasing Multiprocessing
[Plot: P_N/P_1 (0.0 to 1.0) versus N (1 to 12) for V_t = 0 V (extreme case), V_t = 0.4 V, and V_t = 0.8 V. 1.2μ CMOS, V_ref = 5 V.]

32 Extreme Cases: V_t = 0
Delay: $T \propto 1/V_{ref}$
For N processing elements, each element may take a delay of NT, so $V_N = V_{ref}/N$.
$\frac{P_N}{P_1} = [1 + \delta (N - 1)] \frac{1}{N^2}$, which scales as 1/N for large N.
For negligible overhead, δ → 0: $\frac{P_N}{P_1} \approx \frac{1}{N^2}$
For V_t > 0, power reduction is less and there will be an optimum value of N.
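The existence of an optimum N for V_t > 0 can be explored with a short sketch. It assumes the analytic delay model T ∝ V/(V − V_t)², solves it for the supply at which the delay is N times the reference delay, and then evaluates P_N/P_1 = [1 + δ(N − 1)](V_N/V_ref)². The function names and the values δ = 0.1 and V_t = 0.8 V are assumptions for illustration.

```python
import math


def scaled_voltage(n: int, v_ref: float, v_t: float) -> float:
    """Supply voltage at which the gate delay is n times the delay at v_ref."""
    a = n * v_ref / (v_ref - v_t) ** 2
    # Solve a * (v - v_t)^2 = v, i.e. a*v^2 - (2*a*v_t + 1)*v + a*v_t^2 = 0,
    # and keep the larger root, which is the one above v_t.
    return (2 * a * v_t + 1 + math.sqrt(4 * a * v_t + 1)) / (2 * a)


def power_ratio(n: int, v_ref: float, v_t: float, delta: float) -> float:
    """P_N / P_1 with the supply scaled so that throughput is unchanged."""
    v_n = scaled_voltage(n, v_ref, v_t)
    return (1 + delta * (n - 1)) * (v_n / v_ref) ** 2


def best_n(candidates, v_ref: float, v_t: float, delta: float):
    """Return (N, P_N / P_1) for the candidate N with the lowest power ratio."""
    return min(((n, power_ratio(n, v_ref, v_t, delta)) for n in candidates),
               key=lambda pair: pair[1])


n_opt, ratio = best_n(range(1, 41), v_ref=5.0, v_t=0.8, delta=0.1)
print(f"assumed V_t = 0.8 V, delta = 0.1: best N = {n_opt}, P_N / P_1 = {ratio:.3f}")
```

With V_t = 0 the solver reduces to V_N = V_ref/N, matching the extreme case on this slide.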

33 Example: Multiplier Core
Specification:
  200 MHz clock
  15 W dissipation @ 5 V
  Low voltage operation, V_DD ≥ 1.5 volts
  Relative clock rate = $\frac{(V_{DD} - 0.5)^2}{20.25}$
Problem:
  Integrate multiplier core on a SOC
  Power budget for multiplier ~ 5 W

34 A Multicore Design
[Figure: Input, register, Multiplier Core 1 to Multiplier Core 5, 5 to 1 mux, register, Output. A multiphase clock generator and mux control is driven by the 200 MHz CK; the registers run at 200 MHz and each core at 40 MHz.]
Core clock frequency = 200/N MHz; N should divide 200.

35 How Many Cores?
For N cores:
  Clock frequency = 200/N MHz
  Each core needs a relative clock rate of 1/N, so the supply voltage is $V_{DDN} = 0.5 + (20.25/N)^{1/2}$ volts
  Assuming 10% overhead per core,
  Power dissipation = $15 \, [1 + 0.1 (N - 1)] \left( \frac{V_{DDN}}{5} \right)^2$ watts

36 Design Tradeoffs
Number of cores, N   Clock (MHz)   Core supply VDDN (Volts)   Total Power (Watts)
1                    200           5.00                       15.0
2                    100           3.68                       8.94
4                    50            2.75                       5.90
5                    40            2.51                       5.29
8                    25            2.10                       4.50
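The tradeoff table can be recomputed from the two formulas on the previous slide. The sketch below is an illustration under the slide's stated assumptions (10% overhead per core, 15 W at 5 V and 200 MHz); the constant and function names are hypothetical. Values agree with the table to within rounding.

```python
import math

P_REF_W = 15.0     # single-core dissipation at 5 V, 200 MHz (from the specification)
V_REF = 5.0        # reference supply in volts
F_REF_MHZ = 200.0  # required throughput clock
V_MIN = 1.5        # lowest allowed supply in volts
OVERHEAD = 0.1     # 10% overhead per additional core
BUDGET_W = 5.0     # power budget for the multiplier


def core_voltage(n: int) -> float:
    """Supply at which one core runs at 1/n of the full clock rate."""
    return 0.5 + math.sqrt(20.25 / n)


def total_power(n: int) -> float:
    """Total dissipation of n cores, including multiplexing overhead."""
    return P_REF_W * (1 + OVERHEAD * (n - 1)) * (core_voltage(n) / V_REF) ** 2


def smallest_n_within_budget(candidates=(1, 2, 4, 5, 8)):
    """First candidate N that meets both the voltage floor and the power budget."""
    for n in candidates:
        if core_voltage(n) >= V_MIN and total_power(n) <= BUDGET_W:
            return n
    return None


for n in (1, 2, 4, 5, 8):
    print(f"N = {n}: clock = {F_REF_MHZ / n:5.1f} MHz, "
          f"VDD = {core_voltage(n):.2f} V, power = {total_power(n):.2f} W")
print("smallest N within budget:", smallest_n_within_budget())
```

Among the tabulated candidates, only N = 8 falls under the 5 W budget, and its supply stays above the 1.5 V limit.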

37 Power Reduction in Processors
Just about everything is used.
Hardware methods:
  Voltage reduction for dynamic power
  Dual-threshold devices for leakage reduction
  Clock gating, frequency reduction
  Sleep mode
Architecture:
  Instruction set
  Hardware organization
Software methods

38 Parallel Architecture
[Figure: a single processor clocked at f, and two processors each clocked at f/2 between Input and Output stages running at f.]
Single processor: Capacitance = C, Voltage = V, Frequency = f, Power = CV²f
Two parallel processors: Capacitance = 2.2C, Voltage = 0.6V, Frequency = 0.5f, Power = 0.396CV²f

39 Pipeline Architecture
[Figure: a single processor between Input and Output registers clocked at f, and the same processor split into two half-processor stages separated by registers, also clocked at f.]
Single processor: Capacitance = C, Voltage = V, Frequency = f, Power = CV²f
Two-stage pipeline: Capacitance = 1.2C, Voltage = 0.6V, Frequency = f, Power = 0.432CV²f
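The two power figures follow from multiplying the scaling factors for capacitance, voltage squared, and frequency. A minimal sketch, with a hypothetical helper name, that reproduces both numbers from the factors quoted on the slides:

```python
def relative_power(cap_factor: float, volt_factor: float, freq_factor: float) -> float:
    """Power relative to C * V^2 * f when C, V and f are scaled by the given factors."""
    return cap_factor * volt_factor ** 2 * freq_factor


# Two parallel processors: 2.2C, 0.6V, 0.5f.
parallel = relative_power(2.2, 0.6, 0.5)
# Two-stage pipeline: 1.2C, 0.6V, f.
pipeline = relative_power(1.2, 0.6, 1.0)

print(f"parallel: {parallel:.3f} x CV^2f")
print(f"pipeline: {pipeline:.3f} x CV^2f")
```

Both options save power mainly through the squared voltage term; they differ in whether the extra cost appears as duplicated logic or as added pipeline registers.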

40 Approximate Trend
              n-parallel proc.   n-stage pipeline proc.
Capacitance   nC                 C
Voltage       V/n                V/n
Frequency     f/n                f
Power         CV²f/n²            CV²f/n²
Chip area     n times            10-20% increase
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Springer, 1998.
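The shared power entry in the table follows from substituting each column's capacitance, voltage, and frequency into P = CV²f. As a worked check (the subscripts are labels introduced here for clarity):

```latex
\begin{align*}
P_{\text{parallel}} &= (nC)\left(\frac{V}{n}\right)^{2}\frac{f}{n} = \frac{C V^{2} f}{n^{2}} \\
P_{\text{pipeline}} &= C\left(\frac{V}{n}\right)^{2} f = \frac{C V^{2} f}{n^{2}}
\end{align*}
```

Both approaches reach the same idealized power; the table distinguishes them by chip area.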

41 Multicore Processors
[Plot: performance of multicore versus single-core processors over the years 2000, 2004, and 2008, based on SPECint2000 and SPECfp2000 benchmarks.]
Computer, May 2005, p. 12

42 Multicore Processors
D. Geer, “Chip Makers Turn to Multicore Processors,” Computer, vol. 38, no. 5, pp. 11-13, May 2005.
A. Jerraya, H. Tenhunen and W. Wolf, “Multiprocessor Systems-on-Chips,” Computer, vol. 38, no. 7, pp. 36-40, July 2005; this special issue contains three more articles on multicore processors.
S. K. Moore, “Winner Multimedia Monster – Cell’s Nine Processors Make It a Supercomputer on a Chip,” IEEE Spectrum, vol. 43, no. 1, pp. 20-23, January 2006.

43 Cell - Cell Broadband Engine Architecture
[Photo, L to R: Atsushi Kameyama, Toshiba; James Kahle, IBM; Masakazu Suzoki, Sony. © IEEE Spectrum, January 2006]
Nine-processor chip: 192 Gflops

44 Cell’s Nine-Processor Chip
[Figure: chip photograph. © IEEE Spectrum, January 2006]
Eight identical processors, f = 5.6 GHz (max), 44.8 Gflops

