
1 Low-Power Design and Test, Lecture 6: Memory and Multicore Design
Vishwani D. Agrawal, Auburn University, USA, vagrawal@eng.auburn.edu
Srivaths Ravi, Texas Instruments India, Srivaths.ravi@ti.com
Hyderabad, July 30-31, 2007
http://www.eng.auburn.edu/~vagrawal/hyd.html
Copyright Agrawal & Srivaths, 2007

2 Memory Architecture
[Figure: an array of N words (Word 0 to Word N-1), each M bits wide and built from storage cells, with select signals $S_0$ to $S_{N-1}$ and an M-bit input-output port. A second version adds a decoder so that k address lines, $A_0$ to $A_{k-1}$, generate the select signals, where $k = \log_2 N$.]

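3 Memory Organization
[Figure: two-dimensional array of storage cells with $2^{L-K}$ word lines (rows) and $M \cdot 2^K$ bit lines (columns). Address bits $A_K$ to $A_{L-1}$ select the word line; address bits $A_0$ to $A_{K-1}$ drive the column decoder, which works with the sense amplifiers/drivers to deliver the M-bit input-output word.]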
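The address split in the Memory Organization figure can be sketched in a few lines of Python. This is only an illustration: the function name `split_address` and its parameters are hypothetical, and the assignment of the low K bits to the column decoder and the remaining L-K bits to the row is read off the figure labels rather than stated in the slide text.

```python
from typing import Tuple


def split_address(addr: int, total_bits: int, column_bits: int) -> Tuple[int, int]:
    """Split an L-bit address into (row, column) indices.

    Assumption: bits A_0..A_{K-1} select the column and
    bits A_K..A_{L-1} select the row (word line).
    """
    if not 0 <= column_bits <= total_bits:
        raise ValueError("column_bits must lie between 0 and total_bits")
    if not 0 <= addr < (1 << total_bits):
        raise ValueError("address does not fit in total_bits")
    column = addr & ((1 << column_bits) - 1)  # low K bits
    row = addr >> column_bits                 # remaining L-K bits
    return row, column


# Example: L = 6 address bits, K = 2 column bits.
row, column = split_address(0b101101, total_bits=6, column_bits=2)
print(row, column)
```

With L = 6 and K = 2 the array has $2^{4} = 16$ rows, and each row holds $2^{2} = 4$ words of M bits.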

4 An SRAM Cell
[Figure: SRAM cell storing "bit", powered from VDD and accessed through word line WL and a complementary pair of bit lines BL.]

5 Read Operation
[Figure: SRAM cell with VDD, WL, and bit lines BL.]
1. Precharge the bit lines to VDD.
2. Set WL = logic 1.
3. The sense amplifier converts the BL swing to a logic level.

6 Precharge Circuit
[Figure: SRAM cell (bit, VDD, WL, BL) with a precharge circuit controlled by PC that connects the bit lines to VDD, and a differential sense amplifier on the bit lines.]

7 Reading 1 from Cell
[Timing diagram: precharge, WL, BL, and sense amplifier output versus time. Annotation: pulsed to save bit line charge.]

8 Write Operation, bit = 1 → 0
[Figure: SRAM cell with VDD and WL, with the two bit lines driven to 0 and 1.]
1. Drive the complementary bit lines to opposite values, 0 and 1.
2. Set WL = 1.

9 Cell Array Power Management
• Smaller transistors
• Low supply voltage
• Lower voltage swing (0.1V to 0.3V for SRAM)
• Sense amplifier restores the full voltage swing for outside use.

10 Sense Amplifier
[Figure: sense amplifier powered from VDD, with bit-line inputs, enable input SE, and a full voltage swing output.]
SE is the sense amplifier enable: low when the bit lines are precharged and equalized.

11 Block-Oriented Architecture
• A single cell array may contain 64 Kbits to 256 Kbits.
• Larger arrays become slow and consume more power.
• Larger memories are block oriented.

12 Hierarchical Organization
[Figure: Block 0, Block 1, ..., Block P-1, each receiving the row address and column address; a block selector driven by the block address; control circuitry; and a global data bus connecting the blocks to the global amplifier/driver and I/O.]

13 Power Saving
• Block-oriented memory
  • Lengths of local word and bit lines are kept small.
  • Block address is used to activate the addressed block.
• Unaddressed blocks are put in power-saving mode:
  • Sense amplifier and row/column decoders are disabled.
  • Power is maintained for data retention in cells.

14 Static Power
[Plot: leakage current (amperes, 100n to 1.3μ) versus supply voltage (0.0 to 1.8 V) for an 8-kbit SRAM in 0.18μ CMOS and 0.13μ CMOS, annotated with a 7x increase.]

15 Adding Resistance in Leakage Path
[Figure: SRAM cell arrays powered from internal rails VDD.int and VSS.int, which connect to VDD and GND through low-threshold transistors controlled by a sleep signal.]

16 Lowering Supply Voltage
[Figure: SRAM cell arrays between VDD and GND, with a sleep-controlled switch to a lower supply VDDL.]
VDDL = 100mV for 0.13μ CMOS.
Sleep = 1: data retention mode.

17 Parallelization of Memories
[Figure: Mem 1 holds instr. A, instr. C, instr. E, ... and Mem 2 holds instr. B, instr. D, instr. F, ...; each memory is accessed at f/2, and a MUX with inputs 0 and 1, switched at f/2, interleaves their outputs.]
Power = $C' (f/2) V_{DD}^2$
C. Piguet, “Circuit and Logic Level Design,” pp. 124-125 in W. Nebel and J. Mermet (Eds.), Low Power Design in Deep Submicron Electronics, Springer, 1997.

18 References
• K. Itoh, VLSI Memory Chip Design, Springer-Verlag, 2001.
• J. M. Rabaey, A. Chandrakasan and B. Nikolić, Digital Integrated Circuits, Upper Saddle River, New Jersey: Pearson Education, Inc., 2003.

19 Low-Power Datapath Architecture
• Lower supply voltage
  • This slows down circuit speed.
  • Use parallel computing to gain the speed back.
• Works well when threshold voltage is also lowered.
• About 60% reduction in power obtainable.
• Reference: A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Boston: Kluwer Academic Publishers (now Springer), 1995.

20 A Reference Datapath
[Figure: input, register, combinational logic, register, output; the registers are clocked by CK.]
Supply voltage = $V_{ref}$
Total capacitance switched per cycle = $C_{ref}$
Clock frequency = $f$
Power consumption: $P_{ref} = C_{ref} V_{ref}^2 f$

21 A Parallel Architecture
[Figure: the input feeds N copies of the combinational logic (Copy 1 to Copy N) through registers clocked at f/N; an N-to-1 multiplexer and a register clocked at f produce the output; a multiphase clock generator and mux control block is driven by CK.]
Each copy processes every Nth input and operates at reduced voltage.
Supply voltage: $V_N \le V_1 = V_{ref}$
N = degree of parallelism

22 Level Converter: L to H
[Figure: level converter with input Vin_L, output Vout_H, and supplies VDDL and VDDH. Annotation: transistors with thicker oxide and longer channels.]
N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Section 12.4.3, Addison-Wesley, 2005.

23 Level Converter: H to L
[Figure: level converter with input Vin_H, output Vout_L, and supply VDDL. Annotation: transistors with thicker oxide and longer channels.]
N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Section 12.4.3, Addison-Wesley, 2005.

24 Control Signals, N = 4
[Timing diagram: CK and the four clock phases, Phase 1 to Phase 4.]

25 Power
$P_N = P_{proc} + P_{overhead}$
$P_{proc} = N (C_{inreg} + C_{comb}) V_N^2 f/N + C_{outreg} V_N^2 f = (C_{inreg} + C_{comb} + C_{outreg}) V_N^2 f = C_{ref} V_N^2 f$
$P_{overhead} = C_{overhead} V_N^2 f \approx \delta C_{ref} (N - 1) V_N^2 f$
$P_N = [1 + \delta (N - 1)] C_{ref} V_N^2 f$
$\frac{P_N}{P_1} = [1 + \delta (N - 1)] \frac{V_N^2}{V_{ref}^2}$
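As a quick numerical check of the last ratio, the following Python sketch evaluates $P_N/P_1$ for one operating point. The function name `power_ratio` is hypothetical, and the overhead fraction δ = 0.1 is an assumed value chosen only for illustration; the voltages are the 5V reference and the 2.9V point for N = 2 marked on the Voltage vs. Speed slide.

```python
def power_ratio(n: int, delta: float, v_n: float, v_ref: float) -> float:
    """P_N / P_1 = [1 + delta * (N - 1)] * (V_N / V_ref)^2."""
    if n < 1:
        raise ValueError("degree of parallelism must be at least 1")
    return (1.0 + delta * (n - 1)) * (v_n / v_ref) ** 2


# Assumed overhead fraction delta = 0.1 (not given on this slide).
ratio = power_ratio(n=2, delta=0.1, v_n=2.9, v_ref=5.0)
print(f"P_2/P_1 = {ratio:.3f}")
```

Under these assumptions the ratio comes out at roughly 0.37, which is in line with the "about 60% reduction" quoted on the Low-Power Datapath Architecture slide.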

26 Voltage vs. Speed
Delay of a gate: $T \approx \frac{C_L V_{ref}}{I} = \frac{C_L V_{ref}}{k (W/L) (V_{ref} - V_t)^2}$
where $I$ is the saturation current, $k$ is a technology parameter, $W/L$ is the width to length ratio of the transistor, and $V_t$ is the threshold voltage.
[Plot: normalized gate delay T (0.0 to 4.0) versus supply voltage for 1.2μ CMOS, marking $V_t$, $V_3$ (N = 3), $V_2$ = 2.9V (N = 2), and $V_{ref}$ = 5V (N = 1).]
Voltage reduction slows down as we get closer to $V_t$.
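The delay expression can be inverted numerically to estimate the supply voltage at which a gate becomes N times slower than at $V_{ref}$. The Python sketch below does this by bisection. The function names are hypothetical, the constants $C_L$, $k$ and $W/L$ cancel in the ratio and are dropped, and $V_t$ = 0.8V is an assumed threshold for illustration.

```python
def gate_delay(v: float, v_t: float) -> float:
    """Relative gate delay, T ~ V / (V - V_t)^2, with constants dropped."""
    return v / (v - v_t) ** 2


def supply_for_parallelism(n: int, v_ref: float, v_t: float) -> float:
    """Find V_N in (V_t, V_ref] whose delay is n times the delay at V_ref."""
    if n < 1 or not 0.0 <= v_t < v_ref:
        raise ValueError("need n >= 1 and 0 <= v_t < v_ref")
    target = n * gate_delay(v_ref, v_t)
    lo, hi = v_t, v_ref  # delay falls monotonically as V rises above V_t
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gate_delay(mid, v_t) > target:
            lo = mid  # too slow at this voltage, so raise it
        else:
            hi = mid  # fast enough, so try a lower voltage
    return hi


# Assumed threshold voltage of 0.8 V, reference supply of 5 V.
for n in (1, 2, 3):
    print(n, round(supply_for_parallelism(n, v_ref=5.0, v_t=0.8), 2))
```

With these assumptions the first-order model gives about 3.16V for N = 2, somewhat above the 2.9V marked on the plot, so it should be read as an approximation of the plotted curve rather than a reproduction of it.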

27 Increasing Multiprocessing
[Plot: $P_N/P_1$ (0.0 to 1.0) versus N (1 to 12) for $V_t$ = 0V (extreme case), $V_t$ = 0.4V, and $V_t$ = 0.8V; 1.2μ CMOS, $V_{ref}$ = 5V.]

28 Extreme Cases: $V_t = 0$
Delay, $T \propto 1/V_{ref}$
For N processing elements, delay = $NT$ → $V_N = V_{ref}/N$
$\frac{P_N}{P_1} = [1 + \delta (N - 1)] \frac{1}{N^2} \to 1/N$
For negligible overhead, $\delta \to 0$: $\frac{P_N}{P_1} \approx \frac{1}{N^2}$
For $V_t > 0$, power reduction is less and there will be an optimum value of N.
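The steps behind this extreme case can be written out as a short derivation, using the gate delay expression and the power ratio from the earlier slides. The last line is one reading of the "→ 1/N" trend: it is an interpretation for large N, not something the slide states explicitly.

```latex
\begin{align*}
T &\propto \frac{V}{(V - V_t)^2}
  \quad\Rightarrow\quad T \propto \frac{1}{V} \quad \text{when } V_t = 0, \\
\frac{T_N}{T_{ref}} &= N = \frac{V_{ref}}{V_N}
  \quad\Rightarrow\quad V_N = \frac{V_{ref}}{N}, \\
\frac{P_N}{P_1} &= [1 + \delta (N - 1)] \frac{V_N^2}{V_{ref}^2}
  = \frac{1 + \delta (N - 1)}{N^2}, \\
\frac{P_N}{P_1} &\to \frac{1}{N^2} \quad \text{as } \delta \to 0,
  \qquad
\frac{P_N}{P_1} \approx \frac{\delta}{N} \quad \text{when } \delta (N - 1) \gg 1 .
\end{align*}
```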

29 Example: Multiplier Core
• Specification:
  • 200MHz clock
  • 15W dissipation @ 5V
  • Low voltage operation, $V_{DD} \ge 1.5$ volts
  • Relative clock rate = $\frac{(V_{DD} - 0.5)^2}{20.25}$
• Problem:
  • Integrate multiplier core on a SOC
  • Power budget for multiplier ~ 5W

30 A Multicore Design
[Figure: the 200MHz input feeds Multiplier Core 1, Multiplier Core 2, ..., Multiplier Core 5, each clocked at 40MHz; a 5-to-1 mux and a register clocked at 200MHz produce the output; a multiphase clock generator and mux control block is driven by the 200MHz CK.]
Core clock frequency = 200/N MHz; N should divide 200.

31 How Many Cores?
• For N cores:
  • Clock frequency = 200/N MHz
  • Supply voltage, $V_{DDN} = 0.5 + (20.25/N)^{1/2}$ volts
  • Assuming 10% overhead per core, power dissipation = $15 [1 + 0.1 (N - 1)] \left(\frac{V_{DDN}}{5}\right)^2$ watts

32 Design Tradeoffs
Number of cores, N   Clock (MHz)   Core supply VDDN (Volts)   Total power (Watts)
1                    200           5.00                       15.0
2                    100           3.68                       8.94
4                    50            2.75                       5.90
5                    40            2.51                       5.29
8                    25            2.10                       4.50
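The tradeoff table follows from the supply voltage and power formulas on the "How Many Cores?" slide. The Python sketch below recomputes it; the function names are hypothetical, and the only inputs are the numbers given in the example (15W at 5V, 200MHz, a 0.5V offset, and 10% overhead per core).

```python
import math


def core_supply(n: int) -> float:
    """V_DDN = 0.5 + sqrt(20.25 / N) volts."""
    return 0.5 + math.sqrt(20.25 / n)


def total_power(n: int, overhead: float = 0.1) -> float:
    """Power = 15 * [1 + overhead * (N - 1)] * (V_DDN / 5)^2 watts."""
    return 15.0 * (1.0 + overhead * (n - 1)) * (core_supply(n) / 5.0) ** 2


def smallest_n_within_budget(budget_watts: float, candidates) -> "int | None":
    """Return the smallest candidate N whose total power meets the budget."""
    for n in sorted(candidates):
        if total_power(n) <= budget_watts:
            return n
    return None


for n in (1, 2, 4, 5, 8):
    print(f"N={n}  clock={200 // n} MHz  "
          f"VDDN={core_supply(n):.2f} V  power={total_power(n):.2f} W")
print("smallest N within 5 W:", smallest_n_within_budget(5.0, (1, 2, 4, 5, 8)))
```

The computed values agree with the table to within a few hundredths; the small differences come from rounding the supply voltage before squaring it. Among the listed options, only N = 8 falls under the 5W budget.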

33 Power Reduction in Processors
• Just about everything is used.
• Hardware methods:
  • Voltage reduction for dynamic power
  • Dual-threshold devices for leakage reduction
  • Clock gating, frequency reduction
  • Sleep mode
• Architecture:
  • Instruction set
  • Hardware organization
• Software methods

34 Parallel Architecture
[Figure: a single processor clocked at f, compared with two processors each clocked at f/2 whose input and output run at f.]
Single processor: capacitance = C, voltage = V, frequency = f, power = $C V^2 f$
Parallel version: capacitance = 2.2C, voltage = 0.6V, frequency = 0.5f, power = $0.396\, C V^2 f$

35 Pipeline Architecture
[Figure: a single processor between input and output registers clocked at f, compared with a pipelined version in which two half-processor stages are separated by registers, all clocked at f.]
Single processor: capacitance = C, voltage = V, frequency = f, power = $C V^2 f$
Pipelined version: capacitance = 1.2C, voltage = 0.6V, frequency = f, power = $0.432\, C V^2 f$
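The power figures on the Parallel Architecture and Pipeline Architecture slides follow from scaling each factor of $C V^2 f$. A minimal Python check, with a hypothetical helper `relative_power` that takes the three scale factors relative to the single-processor design:

```python
def relative_power(cap_scale: float, volt_scale: float, freq_scale: float) -> float:
    """Dynamic power relative to C * V^2 * f after scaling C, V and f."""
    return cap_scale * volt_scale ** 2 * freq_scale


# Two-way parallel design: 2.2C, 0.6V, 0.5f.
parallel = relative_power(2.2, 0.6, 0.5)
# Two-stage pipelined design: 1.2C, 0.6V, f.
pipelined = relative_power(1.2, 0.6, 1.0)

print(f"parallel:  {parallel:.3f} CV^2f")
print(f"pipelined: {pipelined:.3f} CV^2f")
```

Both designs get their saving from the squared voltage term (0.6² = 0.36); the parallel design pays with more capacitance but halves the frequency, while the pipeline keeps the frequency and adds only register capacitance.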

36 Approximate Trend
              n-parallel proc.   n-stage pipeline proc.
Capacitance   nC                 C
Voltage       V/n                V/n
Frequency     f/n                f
Power         CV^2 f/n^2         CV^2 f/n^2
Chip area     n times            10-20% increase
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.

37 Multicore Processors
[Plot: performance of single-core and multicore processors versus year (2000, 2004, 2008), based on SPECint2000 and SPECfp2000 benchmarks. Source: Computer, May 2005, p. 12.]

38 Multicore Processors
• D. Geer, “Chip Makers Turn to Multicore Processors,” Computer, vol. 38, no. 5, pp. 11-13, May 2005.
• A. Jerraya, H. Tenhunen and W. Wolf, “Multiprocessor Systems-on-Chips,” Computer, vol. 38, no. 7, pp. 36-40, July 2005; this special issue contains three more articles on multicore processors.
• S. K. Moore, “Winner Multimedia Monster – Cell’s Nine Processors Make It a Supercomputer on a Chip,” IEEE Spectrum, vol. 43, no. 1, pp. 20-23, January 2006.

39 Cell - Cell Broadband Engine Architecture
[Photo, left to right: Atsushi Kameyama, Toshiba; James Kahle, IBM; Masakazu Suzoki, Sony. © IEEE Spectrum, January 2006]
Nine-processor chip: 192 Gflops

40 Cell’s Nine-Processor Chip
[Chip photo. © IEEE Spectrum, January 2006]
Eight identical processors
f = 5.6GHz (max)
44.8 Gflops

