Design of Power Efficient VLSI Arithmetic: Speed and Power Trade-offs Vojin G. Oklobdzija, Ram Krishnamurthy Intel AMR / ACSEL Laboratory Intel Corp/ University.


1 Design of Power Efficient VLSI Arithmetic: Speed and Power Trade-offs. Vojin G. Oklobdzija, Ram Krishnamurthy. Intel AMR / ACSEL Laboratory, Intel Corp / University of California Davis. Tutorial Presentation, 16th International Symposium on Computer Arithmetic, Santiago de Compostela, SPAIN, June 18, 2003

2 Issues to be addressed. How do we compare different topologies for their efficiency? How do we estimate the speed and efficiency of our algorithm? What criteria should we use when developing a new algorithm? How does power enter into this equation?

3 Additional Issues. Determine which topology is best for a given power or delay budget. Determine which topology can stretch the furthest in terms of speed or power.

4 Metric

5 Previously used estimates. Counting the number of gates (logic levels): not accurate

6 Critical path in Motorola's 64-bit CLA

7 Motorola's 64-bit CLA: Modified PG Block. Intermediate propagate signals $P_{i:0}$ are generated to speed up $C_3$

8 Fan-In and Fan-Out Dependency (Oklobdzija, Barnes: IBM 1985)

9 Delay Comparison: Variable Block Adder (Oklobdzija, Barnes: IBM 1985). Plot labels: Delay, Complexity

10 Design Objective. Design takes time: finding results afterward is not of much value. There is a disconnect between the measures used in computer arithmetic when developing an algorithm and what is obtained after implementation: we want estimates as close as possible to the measured results. A simple tool that can evaluate different design trade-offs for a given technology is needed. The power trade-off is the most important: speed and power are tradable.

11 Logical Effort Theory. Back-of-the-envelope complexity: good for estimating speed. Gate delay is a linear function of load: the slope is the logical effort (gate driving characteristics) and the intercept is the parasitic delay (gate internal load). Logical Effort accuracy is not sufficient: we needed to extend and refine the method; however, that becomes more than back-of-the-envelope. Logical Effort does not account for possible power-delay trade-offs.

12 Logical Effort Theory. Excel is the platform of choice (ARITH-16): simple enough, can provide computation quickly, easy to enter a given design. Technology characterization is needed: this needs to be done only once and is available for every design afterwards. Domino gate = 2 stages, dynamic and static, with different driving characteristics. Multi-output gates (carry-look-ahead, Ling/conditional sum). Energy model needs to be included.

13 Energy Motivation. AGUs: performance and peak-current limiters. High activity: thermal hotspot. Goal: high-performance, energy-efficient design. Figure: processor thermal map (courtesy of Intel Corp.), labels: Execution core 120 °C, Cache, AGU, Temp (°C).

14 Kogge-Stone Adder. Critical path = PG + 5 + XOR = 7 gate stages. Generate, Propagate fanout of 2, 3. Maximum interconnect spans 16b. Energy inefficient. Figure labels: PG, carry-merge gates, XOR.

15 Sparse-tree Adder Architecture. Generate every 4th carry in parallel. Side-path: 4-bit conditional sum generator. 73% fewer carry-merge gates: energy-efficient.

16 Kogge-Stone adder (8-stage): $D = 8\,(GBH)^{1/8} + P$

17 MXA2 – Architecture & Result. Multiplexer-based. Generate carries using radix-2 (P,G). 4-bit conditional sum selected by carries. 4-b cell width = 17 µm. 9-stage critical path: per-stage effort = 3.7, total effort delay = 33.3, total parasitic = 22.5, total delay = 55.8.
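The MXA2 figures can be cross-checked with a few lines of arithmetic. This is only a sanity check of the numbers quoted on the slide; the variable names are illustrative, and the units are whatever the slide uses.

```python
# Numbers quoted on the MXA2 slide (units as on the slide).
stages = 9
per_stage_effort = 3.7
total_parasitic = 22.5

# Effort delay of an N-stage path with equal stage effort f is N * f.
total_effort_delay = stages * per_stage_effort

# Total delay is effort delay plus the summed parasitic delay.
total_delay = total_effort_delay + total_parasitic

print(round(total_effort_delay, 1))
print(round(total_delay, 1))
```

Both values agree with the slide's totals after rounding to one decimal place.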

18 HC2 – Architecture. Generate even carries using radix-2 (P,G). Generate odd carries from even carries. CMOS adder for sum. 1-b cell width 4 µm. 10-stage critical path.

19 HC2 – Circuits & Results

20 KS2 – Architecture & Results. Generate carries using radix-2 (P,G). CMOS adder for sum. Similar circuits as HC2. 1-b cell width 4 µm. 9-stage critical path.

21 KS4 – Architecture. Generate carries using redundant radix-4 (P,G). Dynamic circuit. 1-b cell width 4 µm. 6-stage critical path.

22 KS4 – Circuits & Result

23 CLA4 – Architecture. Generate carries using radix-4 (P,G,C). 1-b cell width 4 µm. 15-stage critical path. Figure labels: (P,G,C) Network, G-Path, P-Path.

24 CLA4 – Circuits & Result

25 LNG4 – Architecture. Generate carries using Ling pseudo-carries. Conditional sums selected by local & long carries. 1-b cell width 5.1 µm; 9-stage critical path.

26 LNG4 – Circuits & Result

27 Results from Simulation. Fairly consistent with logical effort analysis. Per-stage delay: 1.4 FO4 (static), 0.8 FO4 (dynamic).

28 Delay of Representative 64-b Adders

29 What happens when power is considered?

30 Energy-Delay Space. Figure: energy vs. delay plot of different adders, with labels E min, D min, speed barrier, power limit.
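Choosing a topology in energy-delay space can be sketched as a Pareto filter followed by a budget lookup. The sketch below is illustrative only: the `DesignPoint` type, the function names, and every number are hypothetical placeholders, not results from this presentation.

```python
from typing import List, NamedTuple, Optional, Sequence


class DesignPoint(NamedTuple):
    name: str
    delay: float   # any consistent delay unit
    energy: float  # any consistent energy unit


def pareto_front(points: Sequence[DesignPoint]) -> List[DesignPoint]:
    """Keep the points that no other point beats in both delay and energy."""
    front = []
    for p in points:
        dominated = any(
            q.delay <= p.delay and q.energy <= p.energy
            and (q.delay < p.delay or q.energy < p.energy)
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(front, key=lambda p: p.delay)


def best_under_delay_budget(points: Sequence[DesignPoint],
                            delay_budget: float) -> Optional[DesignPoint]:
    """Lowest-energy design that still meets the delay budget, if any."""
    feasible = [p for p in points if p.delay <= delay_budget]
    return min(feasible, key=lambda p: p.energy) if feasible else None


# Hypothetical design points, made up purely for illustration.
designs = [
    DesignPoint("design_A", 10.0, 50.0),
    DesignPoint("design_B", 12.0, 30.0),
    DesignPoint("design_C", 15.0, 20.0),
    DesignPoint("design_D", 14.0, 35.0),  # slower and costlier than design_B
]

front_names = [p.name for p in pareto_front(designs)]
choice = best_under_delay_budget(designs, delay_budget=13.0)
print(front_names, choice)
```

The same lookup with the roles of delay and energy swapped would answer the power-budget version of the question.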

31 Logical Effort

32 Delay in a Logic Gate. The delay of a logic gate has two components: $d = f + p$, where $p$ is the parasitic delay and $f$ is the effort delay (stage effort), $f = gh$. The logical effort $g$ describes the relative ability of the gate topology to deliver current (defined to be 1 for an inverter). The electrical effort $h = C_{out}/C_{in}$ is the ratio of output to input capacitance; it is also called fanout. *from Mathew Sanu / D. Harris

33 Logical Effort Parameters: Inverter. Delay $d = gh + p$, with fanout $h = C_{out}/C_{in}$. Delay increases linearly with fanout. More complex gates have greater $g$ and $p$. Inverter values: $g = 2.2$ (logical effort), $p = 3.8$ ps (parasitic delay). *from Mathew Sanu / D. Harris

34 Normalized Logical Effort: Inverter. Define the delay of an unloaded inverter = 1. Define the logical effort $g$ of the inverter = 1. The delay of complex gates can be defined w.r.t. $d = 1$. Fanout: $h = C_{out}/C_{in}$. Normalized inverter delay: with $g = 1$ and $p = 1$, $d = gh + p = h + 1$, where $gh$ is the effort delay and $p$ the parasitic delay. *from Mathew Sanu / D. Harris
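The normalized delay model on this slide fits in a one-line function. A minimal sketch, assuming the normalization stated above (inverter g = 1 and p = 1); the function names are illustrative.

```python
def fanout(c_out: float, c_in: float) -> float:
    """Electrical effort h = C_out / C_in."""
    return c_out / c_in


def gate_delay(g: float, h: float, p: float) -> float:
    """Normalized stage delay d = g*h + p."""
    return g * h + p


# Inverter driving four copies of itself (fanout-of-4), with g = p = 1.
inv_fo4 = gate_delay(g=1.0, h=fanout(c_out=4.0, c_in=1.0), p=1.0)
print(inv_fo4)  # 5.0 normalized delay units, i.e. h + 1 with h = 4
```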

35 Computing Logical Effort. DEF: Logical effort is the ratio of the input capacitance of a gate to the input capacitance of an inverter delivering the same output current. It is measured from delay vs. fanout plots of simulated gates, or estimated by counting capacitance in units of transistor W. *from Mathew Sanu / D. Harris

36 L.E. for Adder Gates. Logical effort parameters obtained from simulation for standard cells. Define logical effort $g$ of inverter = 1. Delay of complex gates can be defined w.r.t. $d = 1$. *from Mathew Sanu / D. Harris

37 Normalized L.E. Logical effort & parasitic delay normalized to that of the inverter. Table columns: Gate type, Logical Eff. (g), Parasitics (Pinv). Rows: Inverter (1, 1), Dyn. Nand, Dyn. CM, Dyn. CM-4N (13.71), Static CM, Mux, XOR. *from Mathew Sanu

38 Delay of a string of gates. Delay of a path: $D = \sum d_i = \sum g_i h_i + \sum p_i$, where $g_i$ and $p_i$ are constants. To minimize path delay, optimal values of $h_i$ are to be determined. $D$ is minimized when each stage bears the same effort, i.e. $g_i h_i = g_{i+1} h_{i+1}$. *from Mathew Sanu / D. Harris
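The equal-effort condition can be checked numerically on a two-stage path. In this sketch the gate values (inverter g = 1, p = 1; 2-input NAND g = 4/3, p = 2) are the usual textbook logical-effort numbers rather than values from this presentation's tables, and the capacitances are arbitrary illustration values.

```python
import math


def two_stage_delay(c_mid, g1, g2, p1, p2, c_in, c_out):
    """D = g1*h1 + g2*h2 + p1 + p2 for a two-stage path.

    c_mid is the input capacitance of the second stage, so
    h1 = c_mid / c_in and h2 = c_out / c_mid.
    """
    h1 = c_mid / c_in
    h2 = c_out / c_mid
    return g1 * h1 + g2 * h2 + p1 + p2


# Assumed example: inverter driving a 2-input NAND (textbook g and p).
g1, p1 = 1.0, 1.0
g2, p2 = 4.0 / 3.0, 2.0
c_in, c_out = 1.0, 20.0

# Brute-force sweep of the only free sizing variable.
candidates = [0.01 * k for k in range(50, 2001)]
best_c_mid = min(
    candidates,
    key=lambda c: two_stage_delay(c, g1, g2, p1, p2, c_in, c_out),
)

# Closed form from g1*h1 = g2*h2.
analytic_c_mid = math.sqrt(g2 * c_out * c_in / g1)

f1 = g1 * best_c_mid / c_in
f2 = g2 * c_out / best_c_mid
print(best_c_mid, analytic_c_mid, f1, f2)
```

The sweep lands next to the closed-form size, and the two stage efforts at that point are nearly equal, as the slide states.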

39 Minimizing path delay. Logical effort of a string of gates: $G = \prod g_i$. Path electrical effort: $H = C_{out(path)} / C_{in(path)}$. Branching effort: $b = (C_{\text{on-path}} + C_{\text{off-path}}) / C_{\text{on-path}}$. Path branching effort: $B = \prod b_i$. Path effort: $F = GBH$. Delay is minimized when each stage bears the same effort: $f = g_i h_i = F^{1/N}$. The minimum delay of an N-stage path is $D = N F^{1/N} + P$. *from Mathew Sanu / D. Harris
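The path-effort recipe translates directly into a few lines of code. A minimal sketch: the function name and the three-stage example values are assumptions chosen so that the result is a round number.

```python
import math
from typing import Sequence, Tuple


def min_path_delay(g: Sequence[float], b: Sequence[float],
                   p: Sequence[float], c_in: float,
                   c_out: float) -> Tuple[float, float, float]:
    """Return (F, f_opt, D_min) for an N-stage path.

    F = G*B*H, f_opt = F**(1/N), D_min = N*f_opt + P.
    """
    n = len(g)
    G = math.prod(g)   # path logical effort
    B = math.prod(b)   # path branching effort
    H = c_out / c_in   # path electrical effort
    F = G * B * H
    f_opt = F ** (1.0 / n)
    return F, f_opt, n * f_opt + sum(p)


# Hypothetical 3-stage path: inverter, 2-input NAND (branching 2), inverter.
F, f_opt, d_min = min_path_delay(
    g=[1.0, 4.0 / 3.0, 1.0],
    b=[1.0, 2.0, 1.0],
    p=[1.0, 2.0, 1.0],
    c_in=1.0,
    c_out=24.0,
)
print(F, f_opt, d_min)  # approximately 64, 4 and 16
```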

40 Inclusion of Wire Delay into Logical Effort

41 Wiring Load. Wiring in hand analysis: only lumped capacitance included. Wiring in HSPICE: short wire, 1-segment π-model RC network; long wire, 4-segment π-model RC network; using worst-case wire capacitance. Wire length: estimated from most critical 1-bit pitch.

42 Modeling interconnect cap. Include interconnect cap in the branching factor. Without interconnect: $b = (C_{\text{on-path}} + C_{\text{off-path}}) / C_{\text{on-path}} = 2$. With interconnect capacitance $C_{int}$: $b = (C_{\text{on-path}} + C_{\text{off-path}} + C_{int}) / C_{\text{on-path}} = 2 + C_{int}/C_{\text{on-path}} = 2 + I$, where $I$ is the % of interconnect cap to gate cap in 1 adder bitpitch. Figure labels: PG, CM0, adder bitpitch.
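The effect of interconnect capacitance on the branching factor is easy to tabulate. A minimal sketch of the formula on this slide; the function name and the capacitance values are illustrative assumptions.

```python
def branching_factor(c_on_path: float, c_off_path: float,
                     c_int: float = 0.0) -> float:
    """b = (C_on-path + C_off-path + C_int) / C_on-path."""
    return (c_on_path + c_off_path + c_int) / c_on_path


# Equal on-path and off-path gate loads, no wire: b = 2.
b_no_wire = branching_factor(c_on_path=1.0, c_off_path=1.0)

# Same loads plus wire capacitance equal to 50% of the gate cap: b = 2 + I.
b_with_wire = branching_factor(c_on_path=1.0, c_off_path=1.0, c_int=0.5)

print(b_no_wire, b_with_wire)  # 2.0 2.5
```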

43 Branching. Logical Effort assumes the branching factor of this circuit to be 2. This is incorrect and can create inaccuracies. Figure labels: g0, g1, g2, g3.

44 Correction on Branching. $f_0 = f_1$, $f_2 = f_3$. $T_{d1} = f_0 + f_1 + \text{parasitics}$, $T_{d2} = f_2 + f_3 + \text{parasitics}$. Minimum delay occurs when $T_{d1} = T_{d2}$. Figure labels: g0, g1, g2, g3.
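One way to see why the branching factor departs from 2 is to solve the balance condition $T_{d1} = T_{d2}$ numerically. The sketch below is a simplified model, not the authors' spreadsheet: it assumes two two-stage branches with equal stage efforts inside each branch, sharing a fixed total input capacitance presented to the driver. All names and numbers are hypothetical.

```python
import math


def branch_delay(c_in: float, path_g: float, c_load: float,
                 parasitic: float) -> float:
    """Two-stage branch with equal stage efforts: 2*sqrt(G*C_load/c_in) + P."""
    stage_effort = math.sqrt(path_g * c_load / c_in)
    return 2.0 * stage_effort + parasitic


def balance_branches(c_total: float, branch1, branch2,
                     iterations: int = 100) -> float:
    """Bisect on branch 1's input capacitance until both delays match.

    Each branch is a tuple (path_g, c_load, parasitic). Branch 2 gets
    whatever input capacitance is left: c_total - c1.
    """
    lo, hi = 1e-9 * c_total, c_total * (1.0 - 1e-9)
    for _ in range(iterations):
        c1 = 0.5 * (lo + hi)
        t1 = branch_delay(c1, *branch1)
        t2 = branch_delay(c_total - c1, *branch2)
        if t1 > t2:
            lo = c1  # branch 1 is slower: give it more input capacitance
        else:
            hi = c1
    return 0.5 * (lo + hi)


c_total = 10.0

# Identical branches: the split is even and branching is 2.
c1_sym = balance_branches(c_total, (1.0, 10.0, 2.0), (1.0, 10.0, 2.0))
b_sym = c_total / c1_sym

# Branch 1 has 4x the path effort of branch 2: the split is uneven.
c1_asym = balance_branches(c_total, (4.0, 10.0, 2.0), (1.0, 10.0, 2.0))
b_asym = c_total / c1_asym

print(b_sym, b_asym)  # approximately 2.0 and 1.25
```

Under this model the branching seen by a branch equals 2 only when the two branches are identical; otherwise it has to be solved for, which is what makes a hand calculation awkward.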

45 Real Branching Calculation. Branching only equals 2 when: This explains why we had to resort to Excel!

46 Technology Characterization

47 Characterization Setup. Logical Effort requirement: equalize input and output transitions. Logical Effort is characterized by varying the $h$ ($C_{out}/C_{in}$) of a gate. By using a variable load of inverters, each gate can be characterized over the same range of loads. The Logical Effort of each gate is characterized for each input. Energy is characterized for each output transition of the gate caused by each input transition; e.g., for an inverter, energy is measured for $t_{LH}$ and $t_{HL}$.
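Extracting the two parameters from a delay-versus-fanout sweep amounts to a straight-line fit. The sketch below uses an ordinary least-squares fit in plain Python; the sample points are synthetic, generated from the inverter figures quoted earlier in the deck (slope 2.2, intercept 3.8 ps), and stand in for simulator measurements.

```python
from typing import Sequence, Tuple


def fit_delay_line(fanouts: Sequence[float],
                   delays: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of delay = slope * h + intercept.

    The slope corresponds to the (unnormalized) logical effort term and
    the intercept to the parasitic delay.
    """
    n = len(fanouts)
    mean_h = sum(fanouts) / n
    mean_d = sum(delays) / n
    sxx = sum((h - mean_h) ** 2 for h in fanouts)
    sxy = sum((h - mean_h) * (d - mean_d) for h, d in zip(fanouts, delays))
    slope = sxy / sxx
    intercept = mean_d - slope * mean_h
    return slope, intercept


# Synthetic sweep (not measured data): delay = 2.2*h + 3.8 at four loads.
fanouts = [1.0, 2.0, 4.0, 8.0]
delays = [2.2 * h + 3.8 for h in fanouts]

slope, intercept = fit_delay_line(fanouts, delays)
print(slope, intercept)
```

With real simulation data the points will not sit exactly on a line, and the fit residual gives a rough indication of how well the linear model holds for that gate.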

48 LE Characterization Setup for Static Gates. Figure labels: In, Gate, Variable Load; measured quantities: $t_{LH}$, $t_{HL}$, average energy.

49 LE Characterization Setup for Dynamic Gates. Figure labels: In, Gate, Variable Load; measured quantities: $t_{HL}$, energy.

50 LE Table (Static CMOS). Technology: P/N Ratio = 2 INV = 3.67, p INV = 4.29. Measured on worst-case single-input switching.

51 Static CMOS Gates: Delay Graphs

52 Static Gates: Pull-up Delay Graph

53 LE Table (Dynamic CMOS). Technology: Minimum-sized keeper included. Measured on all-input switching of worst path.

54 Dynamic CMOS: Delay Graphs

55 Dynamic CMOS: Delay Graphs

56 Energy Calculation

57 Energy Calculation. 8X Minimal Size Dyn-NAND, 16X Minimal Size Dyn-NAND

58 Energy Calculation

59 Energy Calculation

60 Energy Calculation: NAND-2

61 Examples

62 64-Bit Adders. Han-Carlson (prefix-2, HC2): Static and Dynamic. Han-Carlson (prefix-2, HC2-2): Dynamic-Static. Kogge-Stone (prefix-2, KS2): Static and Dynamic. Kogge-Stone (prefix-2, KS2-2): Dynamic-Static. Quaternary-Tree (prefix-2, QT2): Static and Dynamic. Included wire delay, $t_{delay} = 0.7\,R_{wire} C_{wire}$. Included wire energy, $E_w = C_{wire} V^2$.
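The two wire terms on this slide are simple enough to evaluate directly. A minimal sketch: the function names are illustrative, and the resistance and capacitance values are hypothetical (only the 1.2 V supply appears elsewhere in the deck).

```python
def wire_delay(r_wire: float, c_wire: float) -> float:
    """Wire delay t = 0.7 * R_wire * C_wire (seconds for ohms and farads)."""
    return 0.7 * r_wire * c_wire


def wire_energy(c_wire: float, vdd: float) -> float:
    """Wire energy E_w = C_wire * V^2 (joules for farads and volts)."""
    return c_wire * vdd ** 2


# Hypothetical wire: 100 ohm, 200 fF, switched at 1.2 V.
r_wire = 100.0
c_wire = 200e-15
vdd = 1.2

t_wire = wire_delay(r_wire, c_wire)   # about 14 ps
e_wire = wire_energy(c_wire, vdd)     # about 288 fJ
print(t_wire, e_wire)
```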

63 Test Setup. Figure labels: Adder, A0, A63, S0, S63, C wire, 1 mm wire. $H = (C_{in} + C_{wire}) / C_{in}$

64 Energy-Delay Estimates

65 Adders: Energy. Plot labels: Dynamic: KS, HC; Static; Dynamic-Static; QT; KS; HC.

66 Dynamic Static Implementation of Carry-Merge stage. Figure panels: Regular Domino Implementation, Compound-Domino Implementation. Annotation: inverters to be eliminated.

67 Energy-Delay comparison of 64-bit KS, HC and QT adders

68 Adders: Critical Path Energy. Plot labels: QT dynamic-static, HC dynamic-static, QT static, KS dynamic-static, HC-dynamic, KS dynamic, HC-static, KS-static.

69 Intel 32-bit Adder, 0.13 µm, 1.2 V [VLSI-2002]. Plot labels: QT, KS, KS estimated, QT estimated.

70 Energy-Delay comparison of 32-bit QT and KS adders: estimated vs. simulation in 0.10 µm technology

71 Est. Results: All Adders w/o Wires

72 Est. Results: All Adders w/ Wires

73 Conclusion. Using realistic measures for comparing various designs leads to better design choices. Power is as important as speed. Making comparisons in Energy-Delay space is necessary: power can always be traded for speed and vice versa. Wire effects are significant. Leakage currents?

