
1 Low Power Architecture and Implementation of Multicore Design
Khushboo Sheth, Kyungseok Kim, Fan Wang, Siddharth Dantu
ELEC6270 Low Power Design of Electronic Circuits, Team Project
VLSI D&T Seminar, Nov. 8, 2006
Advisor: Dr. V. Agrawal

2 Project Objectives
- Design and verify 16-bit ALU with synchronous clocked inputs and outputs.
- Study low-voltage power and delay characteristics of the design.
- Redesign ALU for minimum power and highest speed.

3 Components of Power Dissipation
- Dynamic
  - Power due to signal transitions
    - Logic power (due to logic transitions)
    - Glitch power (due to glitches)
  - Short-circuit power
- Static
  - Leakage power (due to leakage currents)

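4 Power components in CMOS circuit
[Figure: CMOS circuit model between V_DD and ground, with on-resistance R_on, a large off-resistance (R = large), load capacitance C_L, input v_i(t), and output v_o(t); labels mark dynamic power, short-circuit power, and leakage power.]
Power = C V_DD^2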

5 1-bit ALU Design
[Figure: block diagram of the 1-bit ALU core with registers Reg A, Reg B, and Reg C.]

6 1-bit ALU Core Simulation Specification
- Technology: TSMC 0.25 um
- Application voltage: 2.5 V
- NMOS Vth: 0.365 V
- PMOS Vth: -0.5625 V
- Temperature: 90 °C
- SPICE simulator: Eldo ver. 6.3.1.1
- Supply voltage sweep (6 points): 0, 0.5, 1.0, 1.5, 2.0, 2.5 V

7 1-bit ALU Core Timing (Vdd = 2.5 V)
[Figure: combinational logic followed by DFFs; signals A, B, CYIN, opcode[3:0], CLK, C, CY, Z, and COMPOUT; internal nodes NX156, NX80, NX16, NX60.]
Longest path in combinational logic: c <= a+b (opcode 0000)
Opcodes:
- 0000: a+b
- 0001: a-b
- 0010: equal
- 0011: not equal
- 0100: xor
- 0101: nor
- 0110: or
- 0111: and
- 1000: c<=a
- 1001: c<=b
- 1010: nand
- others: all-zero output
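The opcode list on this slide can be read as a small behavioral model. The Python sketch below is only an illustration, not the authors' VHDL or netlist: the function name `alu`, the 0/1 encoding of the equal and not-equal results, the wrap-around subtraction, and the choice to report a carry only for the add opcode are all assumptions, and the Z output is left out.

```python
def alu(opcode: int, a: int, b: int, cyin: int = 0, width: int = 16) -> tuple[int, int]:
    """Behavioral model of the opcode table; returns (c, cy)."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    cy = 0  # assumption: carry-out is only produced by the add opcode
    if opcode == 0b0000:      # a + b (carry-in included, an assumption)
        total = a + b + (cyin & 1)
        c, cy = total & mask, total >> width
    elif opcode == 0b0001:    # a - b, wrapped to `width` bits (assumption)
        c = (a - b) & mask
    elif opcode == 0b0010:    # equal: result 1 when a == b (assumed encoding)
        c = int(a == b)
    elif opcode == 0b0011:    # not equal
        c = int(a != b)
    elif opcode == 0b0100:    # xor
        c = a ^ b
    elif opcode == 0b0101:    # nor
        c = ~(a | b) & mask
    elif opcode == 0b0110:    # or
        c = a | b
    elif opcode == 0b0111:    # and
        c = a & b
    elif opcode == 0b1000:    # c <= a
        c = a
    elif opcode == 0b1001:    # c <= b
        c = b
    elif opcode == 0b1010:    # nand
        c = ~(a & b) & mask
    else:                     # all other opcodes: all-zero output
        c = 0
    return c, cy
```

Under these assumptions, `alu(0b0000, 0xFFFF, 0x0001)` gives a zero result with carry-out 1, which agrees with the note on the test-vector slide that vector 4 produces carryout = 1.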

8 1-bit ALU Core: Sweep Vdd from 2.5 V to 0 V
[Figure: analog-mode waveforms of output C (NX156) for Vdd = 2.5, 2.0, 1.5, 1.0, 0.5, and 0.0 V; the Vdd = 2.5 V and Vdd = 0.5 V traces are labeled.]

9 1-bit ALU Core Logic Operation Voltage @ 200 MHz
Supply voltage sweep near PMOS Vth = -0.5625 V (vs. NMOS Vth = 0.365 V)
Sweep from Vsupply = 0.50 V to 1.00 V (linear increment 0.05 V, 11 points), opcode 1000 (c<=a)
- Vsupply = 0.85 V: correct operation
- Vsupply = 0.80 V: wrong operation
[Figure: input and output waveforms at Vsupply = 0.85 V and 0.80 V, each also shown in the analog domain; labels mark overshoot and ripples.]

10 1-bit ALU Average Power vs. Delay @ 200 MHz
[Figure: 1-bit ALU core average power, 1-bit ALU block average power, and 1-bit ALU core delay; axis ticks 0.0, 0.5, 1.0, 1.5, 2.0, 2.5. Data labels in extraction order: 31.0283, 0.5427, 82.8828, 354.563, 179.9153, 2.2493, 1.4203, 0.4955, 0.7204, 0.4123.]
Power = C V_DD^2

11 16-Bit ALU (Single Core) Design
[Figure: input, combinational logic (16-bit ALU), register clocked by CK, output; switched capacitance C_ref.]
- Supply voltage = V_ref
- Total capacitance switched per cycle = C_ref
- Clock frequency = f
- Power consumption: P_ref = C_ref V_ref^2 f
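The relation P_ref = C_ref V_ref^2 f can be written out as a few helper functions. This is a minimal sketch: the function names are hypothetical, the 1 pF capacitance is a made-up illustration value, and treating the simulated average power (391.16 uW at 2.5 V and 10 MHz, from the simulation slides) as purely dynamic is an assumption.

```python
def dynamic_power(c_switched: float, vdd: float, freq: float) -> float:
    """P = C * V^2 * f, in watts for farads, volts, and hertz."""
    return c_switched * vdd ** 2 * freq


def effective_capacitance(power: float, vdd: float, freq: float) -> float:
    """Invert P = C * V^2 * f to estimate the capacitance switched per cycle."""
    return power / (vdd ** 2 * freq)


def ideal_power_ratio(v_ref: float, v_new: float) -> float:
    """Factor by which power falls when only the supply voltage is lowered."""
    return (v_ref / v_new) ** 2


if __name__ == "__main__":
    c_hypothetical = 1e-12  # 1 pF, illustration only (not from the slides)
    print(dynamic_power(c_hypothetical, 2.5, 10e6))
    # Effective C implied by the simulated 2.5 V operating point,
    # assuming all of the average power is dynamic.
    print(effective_capacitance(391.16e-6, 2.5, 10e6))
    print(ideal_power_ratio(2.5, 1.25))
```

With only the voltage changing, halving V_ref gives an ideal fourfold power reduction; the simulated tables can be compared against that baseline.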

12 16-Bit ALU Vectors
| Vector | a | b | Opcode | cyin |
|---|---|---|---|---|
| Vector1 | 1010101010101010 | 0001010101010101 | 0001 (sub) | 0 |
| Vector2 | 0101010101010101 | 1010101010101010 | 0011 (comp) | 0 |
| Vector3 | 0101010101010101 | 1010101010101010 | 0100 (xor) | 0 |
| Vector4* | 1111111111111111 | 0000000000000001 | 0000 (add) | 0 |
| Vector5 | 0110011001100110 | 0000000000000000 | 1010 (nand) | 0 |
| Vector6 | 0001011001101101 | 0101010010101010 | 0001 (sub) | 0 |
* Vector4 activates the critical path; carryout = 1.

13 16-Bit ALU Simulation Result
- Circuit information: 694 gates
- Clock frequency applied: 10 MHz
- Temperature: 27 °C
- Vectors applied: 6 vectors
- TSMC025 technology: Vthn = 0.365 V, Vthp = -0.562 V
- Simulation time: 700 ns (ELDO, SPICE simulation)
| Voltage (V) | 2.5 | 1.25 | 0.85 | 0.625 | 0.45 |
|---|---|---|---|---|---|
| Static Power (nW) | 24.55 | 6.02 | 3.05 | 1.84 | 1.71 |
| Average Power (uW) | 391.16 | 62.62 | 26.66 | 14.57 | 3.56 |
| Delay (ns) | 2.83 | 7.14 | 18.88 | 73.21 | Ckt failed |
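The simulation results on this slide can be turned into power ratios, percentage savings, and delay multiples relative to the 2.5 V operating point. The sketch below only copies the working rows of the table (the 0.45 V point is omitted because the circuit failed); the names `RESULTS` and `relative_to` are hypothetical.

```python
# (voltage in V, average power in uW, delay in ns), copied from the table above
RESULTS = [
    (2.5, 391.16, 2.83),
    (1.25, 62.62, 7.14),
    (0.85, 26.66, 18.88),
    (0.625, 14.57, 73.21),
]


def relative_to(reference, rows):
    """Express each row as power ratio, saving, and delay factor vs. the reference."""
    _, p_ref, d_ref = reference
    summary = []
    for voltage, power, delay in rows:
        summary.append({
            "voltage": voltage,
            "power_ratio": p_ref / power,            # P_ref / P
            "saving_pct": 100.0 * (1.0 - power / p_ref),
            "delay_factor": delay / d_ref,           # D / D_ref
        })
    return summary


if __name__ == "__main__":
    for row in relative_to(RESULTS[0], RESULTS[1:]):
        print(f"{row['voltage']:>6} V  P_ref/{row['power_ratio']:.2f}  "
              f"{row['saving_pct']:.0f}% saving  {row['delay_factor']:.2f} x delay")
```

Passing `RESULTS[1]` as the reference instead gives the same comparison against the 1.25 V operating point.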

14 16-Bit ALU: Functionally Correct Operation at 2.5 V, 1.25 V, 0.85 V, and 0.625 V for 6 Vectors

15 Circuit fail @ 0.45 V (< Vth)
Simulated single vector pair

16 16-Bit ALU Power Savings and Delay Increase with Reference @ 2.5 Volts
| Voltage (V) | 2.5 (Reference, VDD) | 1.25 (VDD/2) | 0.85 (VDD/3) | 0.625 (VDD/4) |
|---|---|---|---|---|
| Average Power (uW) | 391.16 | 62.22 (P2.5/6.24, 84% saving) | 26.22 (P2.5/14.67, 93% saving) | 14.67 (P2.5/26.66, 96% saving) |
| Delay (ns) | 2.83 | 7.14 (2.57*D2.5) | 18.87 (6.67*D2.5) | 73.21 (25.87*D2.5) |

17 16-Bit ALU Power Savings and Delay Increase with Reference @ 1.25 Volts
| Voltage (V) | 1.25 (Reference) | 0.85 (VDD/1.5) | 0.625 (VDD/2) |
|---|---|---|---|
| Average Power (uW) | 62.22 | 26.66 (P1.25/2.35, 57% saving) | 14.67 (P1.25/4.27, 77% saving) |
| Delay (ns) | 7.14 | 18.87 (2.63*D1.25) | 73.21 (10.25*D1.25) |

18 Different Technology Impact On Power Saving
16-Bit ALU simulation setup:
- Supply voltage: 2.5 V
- Simulation transient time: 700 ns
- 6 vectors
- Temperature: 27 °C
| Technology | TSMC035 | TSMC025 |
|---|---|---|
| Gates after synthesis | 734 | 694 |
| Voltage | 2.5 V | 2.5 V |
| Static Power | 24.555 nW | 24.550 nW |
| Average Power | 381.60 uW | 391.16 uW |
| Delay | 3.12 ns | 2.83 ns |

19 Temperature Influence On Power
- Circuit information: 734 gates
- Clock frequency applied: 10 MHz; Vdd = 2.5 V
- Vectors applied: 6 vectors
- Simulation time: 700 ns
- TSMC035 technology
Temperature (°C): 0, 27, 60, 90, 120, 900
Static power (nW): 12.7, 24.5, 75.51357.364803.3 (the 60, 90, and 120 °C values are fused in the source), 3.38 mW
Average power (uW): 404.23, 381.60, 378.15, 367.48, 363.15, 70.43 W
Delay (ns): 2.58, 3.12, 3.18, 3.53, 3.91, Ckt fail

20 Multicore Design Methodology
- Lower supply voltage
  - This slows down circuit speed
  - Use parallel computing to gain the speed back
- Multi-core means placing two or more complete cores within a single module.
- This architecture is a "divide and conquer" strategy. By splitting the work between multiple execution cores, a multi-core design can perform more work within a given clock cycle.
- More than 60% reduction in power is observed.
Source: http://www.eng.auburn.edu/~vagrawal/D&TSEMINAR_SPR06/SLIDES/Agrawal_DTSem06.ppt
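The trade described on this slide (lower the supply voltage, then recover throughput with parallel copies) can be sketched with the same C V^2 f relation used for the single-core design. This is an idealized model, not the authors' analysis: the function names are hypothetical, the `overhead` parameter for multiplexer and register power is an assumption, and the 1 pF capacitance is an illustration value.

```python
def parallel_power(c_ref: float, vdd: float, freq: float,
                   n_cores: int, overhead: float = 0.0) -> float:
    """Dynamic power of n_cores copies, each clocked at freq / n_cores.

    `overhead` is a hypothetical fractional cost for the extra mux,
    registers, and clock phases (0.0 means an ideal, overhead-free design).
    """
    per_core = c_ref * vdd ** 2 * (freq / n_cores)
    return n_cores * per_core * (1.0 + overhead)


def core_time_budget(freq: float, n_cores: int) -> float:
    """Time each copy has to finish one operation while throughput stays at freq."""
    return n_cores / freq


if __name__ == "__main__":
    c_ref, f = 1e-12, 10e6  # illustration values only
    single = parallel_power(c_ref, 2.5, f, n_cores=1)
    quad_low_v = parallel_power(c_ref, 1.25, f, n_cores=4)
    print(single, quad_low_v, core_time_budget(f, 4))
```

In this idealized form, parallelism alone saves nothing at a fixed voltage; the saving comes from the lower supply voltage that the longer per-core time budget makes tolerable.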

21 Parallel Architecture (16-Bit ALU)
[Figure: input register clocked at f; combinational logic copies 1 to 4, each with a register clocked at f/4 (Ck0 to Ck3); a 4-to-1 multiplexer driven by the mux control; output.]

22 Control Signals, N = 4
[Figure: timing diagram of CK with Phase 1, Phase 2, Phase 3, and Phase 4.]
Mux control: 00, 01, 10, 11, 00, 01, 10, 11, ...
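The repeating mux control pattern for N = 4 can be generated programmatically. The sketch below is an illustration under two assumptions that the slides do not state explicitly: inputs are handed to the copies in round-robin order (input k goes to copy k mod N), and the select code is simply the copy index in binary. The names `mux_control` and `dispatch` are hypothetical.

```python
from itertools import islice


def mux_control(n_cores: int = 4):
    """Yield the mux select code for each fast-clock cycle as a binary string."""
    bits = max(1, (n_cores - 1).bit_length())
    cycle = 0
    while True:
        yield format(cycle % n_cores, f"0{bits}b")
        cycle += 1


def dispatch(inputs, n_cores: int = 4):
    """Assign input k to copy k mod n_cores (assumed round-robin order)."""
    lanes = [[] for _ in range(n_cores)]
    for k, value in enumerate(inputs):
        lanes[k % n_cores].append(value)
    return lanes


if __name__ == "__main__":
    print(list(islice(mux_control(4), 8)))
    print(dispatch(range(8), 4))
```

Each copy sees a new input only every fourth cycle, which is why its register can run at f/4 while the multiplexer still delivers one result per cycle of the fast clock.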

23 16-Bit ALU Multi-core Power Savings and Delay Increase with Reference @ 2.5 Volts
- Circuit information: 2617 gates
- Clock frequency applied: 10 MHz
- Temperature: 27 °C
- Vectors applied: 6 vectors
- TSMC025 technology: Vthn = 0.365 V, Vthp = -0.562 V
- Simulation setup: simulation time 700 ns; simulator ELDO (SPICE)
| Voltage (V) | 2.5 (Reference) | 1.25 (VDD/2) | 0.85 (VDD/3) | 0.625 (VDD/4) | 0.45 |
|---|---|---|---|---|---|
| Static Power (nW) | 96.35 | 23.56 | 11.94 | 7.21 | 6.37 |
| Average Power (uW) | 687.86 | 95.64 (P2.5/7.19, 86%) | 40.93 (P2.5/16.8, 94%) | 21.13 (P2.5/32.55, 94.75%) | 7.26 |
| Delay (ns) | 0.11 | 0.57 (5.18*D2.5) | 1.52 (13.8*D2.5) | 30.70 (279.1*D2.5) | Ckt failed |

24 16-Bit ALU Multicore Power Savings and Delay Increase with Reference @ 1.25 Volts
| Voltage (V) | 1.25 (Reference, VDD) | 0.85 (VDD/1.5) | 0.625 (VDD/2) |
|---|---|---|---|
| Average Power (uW) | 95.64 | 40.93 (P1.25/2.33, 57%) | 21.13 (P1.25/4.52, 78%) |
| Delay (ns) | 0.57 | 1.52 (2.67*D1.25) | 30.7 (53.86*D1.25) |

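25 Power and Delay Comparison: 2.5 V Reference Design vs. Multicore Design at Different Voltages
| Design | Voltage (V) | Average Power (uW) | Power vs. reference | Saving | Delay (ns) | Delay vs. reference |
|---|---|---|---|---|---|---|
| Reference design | 2.5 (VDD) | 391.16 | | | 2.83 | |
| Multicore design | 1.25 (VDD/2) | 95.64 | P2.5/4.09 | 76% | 0.57 | D2.5/4.96 |
| Multicore design | 0.85 (VDD/3) | 40.93 | P2.5/9.56 | 89.5% | 1.52 | D2.5/1.86 |
| Multicore design | 0.725 (VDD/3.5) | 25.6 | P2.5/15.23 | 93.45% | 2.61 | D2.5/1.08 |
| Multicore design | 0.7 (VDD/3.6) | 22.35 | P2.5/17.5 | 94.3% | 3.04 | D2.5/0.93 |
| Multicore design | 0.625 (VDD/4) | 21.14 | P2.5/18.5 | 94.6% | 30.7 | D2.5/0.09 |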
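One way to read this comparison is to look for the multicore operating point with the lowest power whose delay still does not exceed the 2.83 ns of the 2.5 V reference design. The sketch below copies the figures from this slide; the selection rule and the names `MULTICORE` and `best_point` are illustrative assumptions rather than a procedure stated by the authors.

```python
REFERENCE_POWER_UW = 391.16
REFERENCE_DELAY_NS = 2.83

# (voltage in V, average power in uW, delay in ns) for the multicore design
MULTICORE = [
    (1.25, 95.64, 0.57),
    (0.85, 40.93, 1.52),
    (0.725, 25.6, 2.61),
    (0.7, 22.35, 3.04),
    (0.625, 21.14, 30.7),
]


def best_point(points, max_delay_ns):
    """Lowest-power point whose delay does not exceed max_delay_ns, or None."""
    feasible = [p for p in points if p[2] <= max_delay_ns]
    return min(feasible, key=lambda p: p[1]) if feasible else None


if __name__ == "__main__":
    voltage, power, delay = best_point(MULTICORE, REFERENCE_DELAY_NS)
    saving = 100.0 * (1.0 - power / REFERENCE_POWER_UW)
    print(f"{voltage} V: {power} uW, {delay} ns, {saving:.2f}% saving")
```

With the values on this slide the rule selects the 0.725 V point, the lowest voltage listed at which the multicore design is still at least as fast as the reference.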

26 Summary
- For the single-core ALU design we get more than 60% power savings at reduced voltage, but at the cost of performance.
- With a reference of 2.5 V, power drops faster than V^2 scaling predicts (for example P2.5/6.24 at VDD/2, against an ideal factor of 4).
- With a reference of 1.25 V, the power drop is almost equal to V^2 scaling (P1.25/4.27 at VDD/2, against an ideal factor of 4).
- The multi-core design helps to gain the speed back at reduced voltage and consumes less power.
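The two scaling observations in the summary can be checked against the ideal quadratic dependence of dynamic power on supply voltage. The sketch below uses the single-core power ratios as stated on the two single-core savings slides; the names `STATED`, `ideal_ratio`, and `excess_over_ideal` are hypothetical, and the comparison assumes a fixed clock frequency and switched capacitance.

```python
# (reference voltage, reduced voltage, power ratio stated on the slides)
STATED = [
    (2.5, 1.25, 6.24),
    (2.5, 0.85, 14.67),
    (2.5, 0.625, 26.66),
    (1.25, 0.85, 2.35),
    (1.25, 0.625, 4.27),
]


def ideal_ratio(v_ref: float, v_new: float) -> float:
    """Power reduction factor expected from V^2 scaling alone."""
    return (v_ref / v_new) ** 2


def excess_over_ideal(v_ref: float, v_new: float, measured: float) -> float:
    """Values above 1 mean power fell faster than V^2 scaling predicts."""
    return measured / ideal_ratio(v_ref, v_new)


if __name__ == "__main__":
    for v_ref, v_new, measured in STATED:
        print(f"{v_ref} V -> {v_new} V: ideal {ideal_ratio(v_ref, v_new):.2f}, "
              f"stated {measured}, excess {excess_over_ideal(v_ref, v_new, measured):.2f}")
```

The excess factor stays well above 1 for the 2.5 V reference and close to 1 for the 1.25 V reference, which is the pattern the summary describes.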

27 References
- ELEC6270 Low Power Design Electronics class slides from Dr. Agrawal.
- Dr. Agrawal's presentation at the VLSI D&T Seminar, Spring 06: "Multi-Core Parallelism for Low-Power Design."
- www.tomshardware.com
- N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Reading, Massachusetts: Addison-Wesley, 2005.
- L. Shang and R. P. Dick, "Thermal crisis: challenges and potential solutions," IEEE Potentials, vol. 25, issue 5, 2006.
- International Technology Roadmap for Semiconductors. http://public.itrs.net
- Alokik Kanwal, "A review of Carbon Nanotube Field Effect Transistors," Version 2.0, 2003.
- K. K. Likharev, "Single Electron Devices and their applications," Proc. IEEE, vol. 87, no. 4, pp. 606-632, Apr. 1999.
- A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Boston: Kluwer Academic Publishers (now Springer), 1995.
- Alexander Wolfe, "Quad-core processor forecas," TechWeb.

28 Thank You !!!

