Vishwani D. Agrawal James J. Danaher Professor


1 ELEC 5270/6270 Spring 2011 Low-Power Design of Electronic Circuits: Memory and Multicore Design
Vishwani D. Agrawal, James J. Danaher Professor, Dept. of Electrical and Computer Engineering, Auburn University, Auburn, AL 36849. Copyright Agrawal, 2007 ELEC6270 Spring 09, Lecture 13

2 Memory Architecture
[Figure: a memory of N words (Word 0 to Word N-1), each M bits wide and built from storage cells, with an M-bit input-output port. Each word is selected by one of the signals S0 to SN-1; a decoder generates these select signals from K address lines A0 to AK-1.]
$K = \log_2 N$, $N = 2^K$

3 Memory Organization
[Figure: a cell array of $2^{K-L}$ rows by $M \cdot 2^L$ columns holding $N = 2^K$ M-bit words. The K – L bit row address (AL to AK–1) drives the row decoder, which selects a word line. The L bit column address (A0 to AL–1) drives the column decoder. Bit lines connect the storage cells to the sense amplifiers/drivers, which feed the M-bit input-output port.]
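The row/column split on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `split_address` and `array_shape` are hypothetical, and the low-order L bits are taken as the column address because the slide labels them A0 to AL–1.

```python
def split_address(addr: int, k: int, l: int) -> tuple[int, int]:
    """Split a K-bit word address into (row, column) fields.

    Assumes the low L bits (A0..A(L-1)) form the column address and the
    remaining K-L bits (AL..A(K-1)) form the row address.
    """
    if not 0 <= l <= k:
        raise ValueError("need 0 <= L <= K")
    if not 0 <= addr < (1 << k):
        raise ValueError("address does not fit in K bits")
    column = addr & ((1 << l) - 1)  # keep the low L bits
    row = addr >> l                 # drop them to get the high K-L bits
    return row, column


def array_shape(k: int, l: int, m: int) -> tuple[int, int]:
    """Return (rows, bit columns) of the cell array: 2^(K-L) by M * 2^L."""
    return 1 << (k - l), m << l


if __name__ == "__main__":
    print(split_address(0b101101, k=6, l=2))
    print(array_shape(k=10, l=3, m=8))
```

The total number of cells, rows times columns, stays at $M \cdot 2^K$ for any choice of L; L only changes the aspect ratio of the array.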

4 An SRAM Cell
[Figure: an SRAM cell powered from VDD, with word line WL, bit line BL and its complement, and internal nodes bit and its complement.]

5 Read Operation
1. Precharge the bit lines (BL and its complement) to VDD.
2. Set WL = logic 1.
3. The sense amplifier converts the bit line swing to a logic level.

6 Precharge Circuit
[Figure: precharge circuit. Devices controlled by PC connect the bit line BL and its complement to VDD, and an equalization device connects the two bit lines. The cell is selected by WL, and the bit lines feed a differential sense amplifier.]

7 Reading 1 from Cell
[Figure: waveforms versus time for precharge, WL, BL and its complement, and the sense amplifier output.]
WL is pulsed to save bit line charge.

8 Write Operation, 1 → 0
1. Set BL = 0 and its complement = 1.
2. Set WL = 1.

9 Cell Array Power Management
Smaller transistors
Low supply voltage
Lower voltage swing (0.1V to 0.3V for SRAM). The sense amplifier restores the full voltage swing for outside use.
Power-down and sleep modes

10 Sense Amplifier
[Figure: sense amplifier powered from VDD, with inputs bit and its complement and a full voltage swing output.]
Sense amplifier enable (SE or CLK): low when bit lines are precharged and equalized.

11 Sense Amplifier: Precharge
[Figure: sense amplifier with bit = 1, complement of bit = 1 and SE = 0; two transistors are labeled ON and one is labeled OFF.]

12 Sense Amplifier: Reading 0
[Figure: sense amplifier with bit = 1 – ∆, complement of bit = 1 and SE = 1; two transistors are labeled ON and one is labeled OFF.]

13 Sense Amplifier: Reading 1
[Figure: sense amplifier with bit = 1, complement of bit = 1 – ∆ and SE = 1; two transistors are labeled ON and one is labeled OFF.]

14 Block-Oriented Architecture
A single cell array may contain 64 Kbits to 256 Kbits.
Larger arrays become slow and consume more power.
Larger memories are block oriented.

15 Hierarchical Organization
[Figure: P memory blocks, up to Block P-1, each receiving the row and column addresses. The block address drives a block selector. A global data bus connects the blocks to a global amplifier/driver and the I/O, alongside the control circuitry.]

16 Power Saving
Block-oriented memory:
Lengths of local word and bit lines are kept small.
Block address is used to activate the addressed block.
Unaddressed blocks are put in power-saving mode: sense amplifier and row/column decoders are disabled.
Cell array is put in power-saving mode.

17 Static Power
[Figure: leakage current (amperes, axis marked from 700n to 1.3μ) versus supply voltage for an 8-kbit SRAM in 0.18μ CMOS and 0.13μ CMOS, annotated "7x increase".]

18 Power Saving Modes
Power-down: Disconnect supply. Data is not retained and must be refreshed before use. Example: caches.
Increasing thresholds by body biasing: Negative bias on nonactive cells reduces leakage.
Sleep mode: Insert resistance in leakage path; retain data. Lower supply voltage.

19 Adding Resistance in Leakage Path
[Figure: a row of SRAM cells connected between internal rails VDD.int and VSS.int. Transistors controlled by the sleep signals connect VDD.int to VDD and VSS.int to GND; the figure carries the label "Low-threshold transistor".]

20 Lowering Supply Voltage
[Figure: a row of SRAM cells between a supply rail and GND; a sleep signal selects between VDD and the lower supply VDDL.]
Sleep = 1: data retention mode.
VDDL ≥ 100mV for 0.13μ CMOS.

21 Parallelization of Memories
[Figure: two memories feeding a MUX. Mem 1 holds instr. A, instr. C, instr. E, ... and Mem 2 holds instr. B, instr. D, instr. F, ...; each memory is accessed at f/2.]
Power = $C' \cdot (f/2) \cdot V_{DD}^2$
C. Piguet, “Circuit and Logic Level Design,” pp in W. Nebel and J. Mermet (Eds.), Low Power Design in Deep Submicron Electronics, Springer, 1997.

22 References
K. Itoh, VLSI Memory Chip Design, Springer-Verlag, 2001.
J. M. Rabaey, A. Chandrakasan and B. Nikolić, Digital Integrated Circuits, Upper Saddle River, New Jersey: Pearson Education, Inc., 2003, Chapter 12.
S.-M. Kang and Y. Leblebici, CMOS Digital Integrated Circuits Analysis and Design, New York: McGraw-Hill, 1996, Chapter 10.

23 Low-Power Datapath Architecture
Lower supply voltage.
This slows down circuit speed.
Use parallel computing to gain the speed back.
Works well when threshold voltage is also lowered.
About 60% reduction in power obtainable.
Reference: A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Boston: Kluwer Academic Publishers (Now Springer), 1995.

24 A Reference Datapath
[Figure: input, register, combinational logic, register, output; the registers are clocked by CK and the switched capacitance is Cref.]
Supply voltage = $V_{ref}$
Total capacitance switched per cycle = $C_{ref}$
Clock frequency = $f$
Power consumption: $P_{ref} = C_{ref} V_{ref}^2 f$

25 A Parallel Architecture
[Figure: the input feeds N copies (Copy 1 to Copy N) of a register plus combinational logic, each clocked at f/N. An N to 1 multiplexer selects among the copies and drives an output register clocked at f. A multiphase clock generator and mux control block is driven by CK.]
Each copy processes every Nth input and operates at reduced voltage.
Supply voltage: $V_N \le V_1 = V_{ref}$
N = degree of parallelism

26 Level Converter: L to H
[Figure: level converter circuit with input Vin_L, output Vout_H and supplies VDDL and VDDH, annotated "Transistors with thicker oxide and longer channels".]
N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Section , Addison-Wesley, 2005.

27 Level Converter: H to L
[Figure: level converter circuit with input Vin_H, output Vout_L and supply VDDL, annotated "Transistors with thicker oxide and longer channels".]
N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Section , Addison-Wesley, 2005.

28 Control Signals, N = 4
[Figure: waveforms of the clock CK and the four clock phases, Phase 1 to Phase 4.]

29 Power
$P_N = P_{proc} + P_{overhead}$
$P_{proc} = N (C_{inreg} + C_{comb}) V_N^2 f/N + C_{outreg} V_N^2 f = (C_{inreg} + C_{comb} + C_{outreg}) V_N^2 f = C_{ref} V_N^2 f$
$P_{overhead} = C_{overhead} V_N^2 f \approx \delta C_{ref} (N - 1) V_N^2 f$
$P_N = [1 + \delta (N - 1)] C_{ref} V_N^2 f$
$\frac{P_N}{P_1} = [1 + \delta (N - 1)] \frac{V_N^2}{V_{ref}^2}$

30 Voltage vs. Speed
Delay of a gate, $T \approx \frac{C_L V_{ref}}{I} = \frac{C_L V_{ref}}{k (W/L) (V_{ref} - V_t)^2}$
where I is saturation current, k is a technology parameter, W/L is width to length ratio of transistor, and $V_t$ is threshold voltage.
[Figure: normalized gate delay T (axis 0.0 to 4.0) versus supply voltage for 1.2μ CMOS, with points marked N = 1 at Vref = 5V, N = 2 at V2 = 2.9V, and N = 3 at V3; Vt is marked on the voltage axis.]
Voltage reduction slows down as we get closer to $V_t$.
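The delay model on this slide can be turned into a small solver for the supply voltage at which a gate is N times slower than at the reference voltage. This is a sketch under stated assumptions: the names `gate_delay`, `normalized_delay` and `supply_for_slowdown` are hypothetical, the constants $C_L$ and $k(W/L)$ cancel in the normalized delay and are set to 1, and the threshold voltage passed in the usage lines is an assumed example value.

```python
def gate_delay(v: float, v_t: float, c_l: float = 1.0, k_wl: float = 1.0) -> float:
    """Gate delay T = C_L * V / (k (W/L) (V - V_t)^2), valid for V > V_t."""
    if v <= v_t:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return c_l * v / (k_wl * (v - v_t) ** 2)


def normalized_delay(v: float, v_ref: float, v_t: float) -> float:
    """Delay at supply v relative to the delay at v_ref."""
    return gate_delay(v, v_t) / gate_delay(v_ref, v_t)


def supply_for_slowdown(n: float, v_ref: float, v_t: float, tol: float = 1e-9) -> float:
    """Find V_N in (V_t, V_ref] whose normalized delay equals n, by bisection.

    The model's delay decreases monotonically with V for V > V_t >= 0,
    so a simple bisection on the interval is sufficient.
    """
    if n < 1:
        raise ValueError("slowdown factor must be at least 1")
    lo = v_t + 1e-9 * v_ref  # just above threshold: delay is very large
    hi = v_ref               # reference voltage: normalized delay is 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if normalized_delay(mid, v_ref, v_t) > n:
            lo = mid  # still too slow, raise the voltage
        else:
            hi = mid  # fast enough, try a lower voltage
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # V_t = 0.8 V is an assumed threshold used only for illustration.
    for n in (1, 2, 3):
        print(n, round(supply_for_slowdown(n, v_ref=5.0, v_t=0.8), 3))
```

Because the allowed voltage reduction shrinks as $V_N$ approaches $V_t$, each additional copy buys a smaller drop in supply voltage than the previous one.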

31 Increasing Multiprocessing
[Figure: $P_N/P_1$ (axis 0.0 to 1.0) versus N for 1.2μ CMOS, Vref = 5V, with curves for Vt = 0.8V, Vt = 0.4V and Vt = 0V (extreme case).]

32 Extreme Cases: Vt = 0
Delay, $T \propto 1/V_{ref}$
For N processing elements, delay = NT → $V_N = V_{ref}/N$
$\frac{P_N}{P_1} = [1 + \delta (N - 1)] \frac{1}{N^2}$, which falls off as 1/N for large N.
For negligible overhead, δ → 0:
$\frac{P_N}{P_1} \approx \frac{1}{N^2}$
For $V_t > 0$, power reduction is less and there will be an optimum value of N.
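The Vt = 0 case has a closed form, so it is easy to tabulate. The sketch below uses a hypothetical function name, `power_ratio_zero_vt`, and an assumed overhead factor δ = 0.1 in the usage lines; neither comes from the slides.

```python
def power_ratio_zero_vt(n: int, delta: float = 0.0) -> float:
    """P_N / P_1 for V_t = 0, where V_N = V_ref / N.

    Substituting V_N = V_ref / N into [1 + delta (N - 1)] V_N^2 / V_ref^2
    gives [1 + delta (N - 1)] / N^2.
    """
    if n < 1:
        raise ValueError("N must be at least 1")
    return (1.0 + delta * (n - 1)) / n ** 2


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        # Compare no overhead against an assumed 10% overhead per added copy.
        print(n, power_ratio_zero_vt(n), power_ratio_zero_vt(n, delta=0.1))
```

With Vt = 0 the ratio keeps decreasing as N grows, even with overhead; the optimum mentioned on the slide only appears once Vt > 0 limits how far the voltage can be lowered.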

33 Example: Multiplier Core
Specification: 200MHz clock, 15W at 5V
Low voltage operation, VDD ≥ 1.5 volts
Relative clock rate = $\frac{(V_{DD} - 0.5)^2}{20.25}$
Problem: Integrate multiplier core on a SOC
Power budget for multiplier ~ 5W

34 A Multicore Design
[Figure: the input feeds five multiplier cores (Multiplier Core 1 to Multiplier Core 5), each with a register clocked at 40MHz. A 5 to 1 mux selects among the cores and drives an output register clocked at 200MHz. A multiphase clock generator and mux control block is driven by the 200MHz clock CK.]
Core clock frequency = 200/N MHz; N should divide 200.

35 How Many Cores?
For N cores: clock frequency = 200/N MHz
Supply voltage, $V_{DDN} = 0.5 + (20.25/N)^{1/2}$ volts, obtained by setting the relative clock rate $(V_{DD} - 0.5)^2/20.25$ equal to 1/N.
Assuming 10% overhead per core,
Power dissipation = $15\,[1 + 0.1 (N - 1)] \left(\frac{V_{DDN}}{5}\right)^2$ watts

36 Design Tradeoffs
Number of cores, N    Clock (MHz)    Core supply VDDN (Volts)    Total Power (Watts)
1                     200            5.00                        15.0
2                     100            3.68                        8.94
4                     50             2.75                        5.90
5                     40             2.51                        5.29
8                     25             2.10                        4.50
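The tradeoff table can be approximately regenerated from the multiplier core specification. The sketch below assumes the relative clock rate $(V_{DD} - 0.5)^2/20.25$ is set equal to 1/N and that each added core contributes 10% overhead; the function names are hypothetical, and treating the "~ 5W" budget as a strict 5.0 W limit is an assumption. Unrounded voltages are used, so the last digits for N = 5 and N = 8 come out slightly different from the table.

```python
import math

F_REF_MHZ = 200.0  # single-core clock from the specification
P_REF_W = 15.0     # single-core power at the reference voltage
V_REF = 5.0        # reference supply voltage
V_MIN = 1.5        # lowest supply allowed by the specification


def core_supply(n: int) -> float:
    """Solve (V_DD - 0.5)^2 / 20.25 = 1/N for V_DD."""
    return 0.5 + math.sqrt(20.25 / n)


def total_power(n: int, overhead: float = 0.1) -> float:
    """Total power in watts: 15 [1 + overhead (N - 1)] (V_DDN / 5)^2."""
    return P_REF_W * (1.0 + overhead * (n - 1)) * (core_supply(n) / V_REF) ** 2


def min_cores_for_budget(budget_w: float = 5.0):
    """Smallest divisor N of 200 that meets the power budget and V_DD >= 1.5."""
    for n in (d for d in range(1, 201) if 200 % d == 0):
        if core_supply(n) >= V_MIN and total_power(n) <= budget_w:
            return n
    return None


if __name__ == "__main__":
    for n in (1, 2, 4, 5, 8):
        print(n, F_REF_MHZ / n, round(core_supply(n), 2), round(total_power(n), 2))
    print("cores needed for a strict 5 W budget:", min_cores_for_budget(5.0))
```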

37 Power Reduction in Processors
Just about everything is used.
Hardware methods: voltage reduction for dynamic power; dual-threshold devices for leakage reduction; clock gating, frequency reduction; sleep mode.
Architecture: instruction set, hardware organization.
Software methods.

38 Parallel Architecture
[Figure: a single processor with input and output at f, compared with two processors each running at f/2 between an input and an output at f.]
Single processor: Capacitance = C, Voltage = V, Frequency = f, Power = $CV^2f$
Two parallel processors: Capacitance = 2.2C, Voltage = 0.6V, Frequency = 0.5f, Power = $0.396\,CV^2f$

39 Pipeline Architecture
[Figure: a single processor between input and output registers at f, compared with two processor stages separated by a register, also clocked at f.]
Single processor: Capacitance = C, Voltage = V, Frequency = f, Power = $CV^2f$
Two-stage pipeline: Capacitance = 1.2C, Voltage = 0.6V, Frequency = f, Power = $0.432\,CV^2f$
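The power figures quoted for the parallel and pipelined processors follow directly from scaling each factor in $CV^2f$. A minimal Python sketch, with a hypothetical helper name `relative_power`:

```python
def relative_power(cap_factor: float, volt_factor: float, freq_factor: float) -> float:
    """Power relative to C V^2 f when C, V and f are each scaled by a factor."""
    return cap_factor * volt_factor ** 2 * freq_factor


if __name__ == "__main__":
    # Two parallel processors: 2.2C, 0.6V, 0.5f
    print("parallel:", round(relative_power(2.2, 0.6, 0.5), 3))
    # Two-stage pipeline: 1.2C, 0.6V, f
    print("pipeline:", round(relative_power(1.2, 0.6, 1.0), 3))
```

In both cases the saving comes from the squared voltage term; the parallel version pays more in capacitance but halves the frequency, while the pipeline keeps the frequency and adds only register capacitance.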

40 Approximate Trend
Comparison of an n-parallel processor with an n-stage pipeline processor:
Capacitance:
Voltage: V/n
Frequency: f/n (n-parallel), f (n-stage pipeline)
Power: $CV^2f/n^2$
Chip area: n times (n-parallel), 10-20% increase (n-stage pipeline)
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Springer, 1998.

41 Multicore Processors
[Figure: multicore performance based on SPECint2000 and SPECfp2000 benchmarks, compared with single core. Source: Computer, May 2005, p. 12.]

42 Multicore Processors
D. Geer, “Chip Makers Turn to Multicore Processors,” Computer, vol. 38, no. 5, pp , May 2005.
A. Jerraya, H. Tenhunen and W. Wolf, “Multiprocessor Systems-on-Chips,” Computer, vol. 38, no. 7, pp , July 2005; this special issue contains three more articles on multicore processors.
S. K. Moore, “Winner Multimedia Monster – Cell’s Nine Processors Make It a Supercomputer on a Chip,” IEEE Spectrum, vol. 43, no. 1, pp , January 2006.

43 Cell - Cell Broadband Engine Architecture
Nine-processor chip: 192 Gflops
[Photo, left to right: Atsushi Kameyama, Toshiba; James Kahle, IBM; Masakazu Suzoki, Sony. © IEEE Spectrum, January 2006]

44 Cell’s Nine-Processor Chip
[Figure: chip layout. © IEEE Spectrum, January 2006]
Eight identical processors, f = 5.6GHz (max), 44.8 Gflops

