Published by Esteban Swindell. Modified about 1 year ago.

1
Speed and Power Trade-offs: Applied to Adder Design
Vojin G. Oklobdzija, Ram Krishnamurthy
Intel AMR / ACSEL Laboratory
Intel Corp / University of California Davis
From: Tutorial Presentation, 16th International Symposium on Computer Arithmetic, Santiago de Compostela, SPAIN, June 18, 2003

2
Issues to be addressed
How do we compare different topologies for their efficiency?
How do we estimate the speed and efficiency of our algorithm?
What criteria should we use when developing a new algorithm?
How does power enter into this equation?

3
Additional Issues
Determine which topology is best for a given power or delay budget
Determine which topology can stretch the furthest in terms of speed or power

4
Metric

5
Previously used estimates
Counting the number of gates (logic levels): not accurate

6
Critical path in Motorola's 64-bit CLA

7
Motorola's 64-bit CLA
Modified PG Block
Intermediate propagate signals $P_{i:0}$ are generated to speed up $C_3$

8
Fan-In and Fan-Out Dependency (Oklobdzija, Barnes: IBM 1985)

9
Delay Comparison: Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
[Figure panels: Delay, Complexity]

10
Design Objective
Design takes time:
–finding results afterward is not of much value
There is a disconnect between the measures used in computer arithmetic when developing an algorithm and what is obtained after implementation
–we want estimates that are as close as possible to the measured results
A simple tool that can evaluate different design trade-offs for a given technology is needed
Power trade-off is the most important
–speed and power are tradable

11
Logical Effort Theory
“Back of the Envelope” complexity: good for estimating speed
Gate delay = linear function of load
–Slope: logical effort (gate driving characteristics)
–Intercept: parasitic delay (gate internal load)
“Logical Effort” accuracy is not sufficient
–We needed to extend and refine the method
–However, that becomes more than “Back of the Envelope”
Logical Effort does not account for possible power-delay trade-offs

12
Logical Effort Theory
Excel
–a platform of choice (ARITH-16)
–Simple enough
–Can provide computation quickly
–Easy to enter a given design
Technology characterization is needed:
–This needs to be done only once: available for every design afterwards
–Domino gate = 2 stages, dynamic and static
Different driving characteristics of these stages
Multi-output gate (carry-look-ahead, Ling/conditional sum)
Energy model needs to be included

13
Energy Motivation
AGUs: performance and peak-current limiters
High activity: thermal hotspot
Goal: high-performance, energy-efficient design
[Figure: processor thermal map; labels: Execution core, 120 °C, Cache, AGU; scale: Temp (°C)]
*courtesy of Intel Corp.

14
Critical Paths of Representative 64-bit Adders

15
Kogge-Stone Adder
Critical path = PG + 5 + XOR = 7 gate stages
Generate, Propagate fanout of 2, 3
Maximum interconnect spans 16b
Energy inefficient
[Figure labels: PG, Carry-merge gates, XOR]

16
Sparse-tree Adder Architecture
Generate every 4th carry in parallel
Side-path: 4-bit conditional sum generator
73% fewer carry-merge gates: energy-efficient

17
Kogge-Stone adder (8-stage)
$D = 8\,(GBH)^{1/8} + P$

18
MXA2 – Architecture & Result
Multiplexer-based
Generate carries using radix-2 (P,G)
4-bit conditional sum selected by carries
4-b cell width = 17 μm
9-stage critical path
–Per-stage effort = 3.7
–Total effort delay = 33.3
–Total parasitic = 22.5
–Total delay = 55.8
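The MXA2 figures on slide 18 can be cross-checked with a few lines of arithmetic. This is a minimal sketch, assuming the total is computed as stages × per-stage effort plus total parasitic, which is the form used later in the logical effort slides; the function name `path_delay` is hypothetical.

```python
def path_delay(stages, stage_effort, total_parasitic):
    """Return (effort delay, total delay) for a path of equal-effort stages."""
    effort_delay = stages * stage_effort
    return effort_delay, effort_delay + total_parasitic


# Values quoted on the MXA2 slide: 9 stages, per-stage effort 3.7, parasitic 22.5
effort, total = path_delay(stages=9, stage_effort=3.7, total_parasitic=22.5)
print(round(effort, 1), round(total, 1))
```

The rounded results agree with the slide's 33.3 and 55.8.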

19
HC2 – Architecture
Generate even carries using radix-2 (P,G)
Generate odd carries from even carries
CMOS adder for sum
1-b cell width: 4 μm
10-stage critical path

20
HC2 – Circuits & Results

21
KS2 – Architecture & Results
Generate carries using radix-2 (P,G)
CMOS adder for sum
Similar circuits as HC2
1-b cell width: 4 μm
9-stage critical path

22
KS4 – Architecture
Generate carries using redundant radix-4 (P,G)
Dynamic circuit
1-b cell width: 4 μm
6-stage critical path

23
KS4 – Circuits & Result

24
CLA4 – Architecture
Generate carries using radix-4 (P,G,C)
1-b cell width: 4 μm
15-stage critical path
[Figure labels: (P,G,C) Network, G-Path, P-Path]

25
CLA4 – Circuits & Result

26
LNG4 – Architecture
Generate carries using Ling pseudo-carries
Conditional sums selected by local & long carries
1-b cell width: 5.1 μm; 9-stage critical path

27
LNG4 – Circuits & Result

28
Results from Simulation
Fairly consistent with logical effort analysis
Per-stage delay
–1.4 FO4 (static)
–0.8 FO4 (dynamic)

29
Delay of Representative 64-b Adders

30
What happens when power is considered?

31
What happens when power is considered?
Must look at Energy-Delay Space of designs

32
Energy-Delay Space
[Figure: energy vs. delay curves for different adders; labels: $E_{min}$, $D_{min}$, speed barrier, power limit]
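Comparing designs in the energy-delay space amounts to discarding any design that is beaten on both axes and then choosing along what remains according to the budget. The sketch below illustrates that idea only. The design names and (delay, energy) numbers are hypothetical placeholders, not results from the tutorial, and the helper names are assumptions.

```python
def pareto_front(designs):
    """Keep designs not dominated in both delay and energy (lower is better)."""
    front = []
    for name, d, e in designs:
        dominated = any(
            d2 <= d and e2 <= e and (d2 < d or e2 < e)
            for _, d2, e2 in designs
        )
        if not dominated:
            front.append((name, d, e))
    return sorted(front, key=lambda x: x[1])


def best_under_delay_budget(designs, max_delay):
    """Lowest-energy design meeting the delay budget, or None if none fits."""
    feasible = [x for x in designs if x[1] <= max_delay]
    return min(feasible, key=lambda x: x[2]) if feasible else None


# Hypothetical (name, delay, energy) points in arbitrary units
designs = [("A", 10, 8), ("B", 12, 5), ("C", 13, 6), ("D", 16, 3)]
front = pareto_front(designs)
choice = best_under_delay_budget(designs, max_delay=12)
print([name for name, _, _ in front], choice[0])
```

With these placeholder points, design C is dropped because B is both faster and lower in energy, and B is the lowest-energy choice for a delay budget of 12.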

33
Logical Effort in Energy-Delay Space
Is it possible to lower energy by trading delay? Or …
Most design approaches focus here

34
Logical Effort

35
Delay in a Logic Gate
Delay of a logic gate has two components: $d = f + p$
–$p$: parasitic delay
–$f$: effort delay (stage effort), $f = gh$
–$g$: logical effort, describes the relative ability of the gate topology to deliver current (defined to be 1 for an inverter)
–$h$: electrical effort, the ratio of output to input capacitance, $h = C_{out}/C_{in}$; electrical effort is also called “fanout”
*from Mathew Sanu / D. Harris

36
Logical Effort Parameters: Inverter
$d = gh + p$
Delay increases linearly with fanout
More complex gates have greater $g$ and $p$
[Plot: delay $d = gh + p$ vs. fanout $h = C_{out}/C_{in}$, with $g = 2.2$ (logical effort) and $p = 3.8$ ps (parasitic delay)]
*from Mathew Sanu / D. Harris

37
Normalized Logical Effort: Inverter
Define delay of unloaded inverter = 1
Define logical effort ‘g’ of inverter = 1
Delay of complex gates can be defined w.r.t. d = 1
[Plot: normalized delay vs. fanout $h = C_{out}/C_{in}$ for the inverter: $g = 1$, $p = 1$, $d = gh + p = h + 1$ (effort delay plus parasitic delay)]
*from Mathew Sanu / D. Harris
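The normalized delay model on slides 35 to 37 can be written out directly. This is a small sketch of $d = gh + p$ in normalized units; the function names are hypothetical, and the fanout-of-4 case is included only as a familiar reference point.

```python
def gate_delay(g, h, p):
    """Normalized gate delay: effort delay g*h plus parasitic delay p."""
    return g * h + p


def inverter_delay(h):
    """Normalized inverter (g = 1, p = 1), so the delay is h + 1."""
    return gate_delay(g=1.0, h=h, p=1.0)


# An inverter driving four copies of itself (fanout h = 4)
fo4 = inverter_delay(4.0)
print(fo4)
```

In these units an unloaded inverter has delay 1 and a fanout-of-4 inverter has delay 5.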

38
Computing Logical Effort
DEF: Logical effort is the ratio of the input capacitance of a gate to the input capacitance of an inverter delivering the same output current
Measured from delay vs. fanout plots of simulated gates
Or estimated by counting capacitance in units of transistor width W
*from Mathew Sanu / D. Harris
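Extracting the two parameters from a delay vs. fanout plot is a straight-line fit: the slope gives the effort term and the intercept gives the parasitic term. The sketch below uses an ordinary least-squares fit. The data points are synthetic, generated from the inverter values quoted on slide 36 (slope 2.2, intercept 3.8 ps), and stand in for simulated delays; `fit_line` is a hypothetical helper.

```python
def fit_line(fanouts, delays):
    """Least-squares fit of delay = slope * fanout + intercept."""
    n = len(fanouts)
    mean_h = sum(fanouts) / n
    mean_d = sum(delays) / n
    sxx = sum((h - mean_h) ** 2 for h in fanouts)
    sxy = sum((h - mean_h) * (d - mean_d) for h, d in zip(fanouts, delays))
    slope = sxy / sxx
    intercept = mean_d - slope * mean_h
    return slope, intercept


# Synthetic stand-in for simulated delays (assumption: d = 2.2*h + 3.8)
fanouts = [1, 2, 3, 4, 6, 8]
delays = [2.2 * h + 3.8 for h in fanouts]

slope, intercept = fit_line(fanouts, delays)
print(round(slope, 3), round(intercept, 3))
```

Dividing a gate's fitted slope by the inverter's slope would give its logical effort relative to the inverter, in line with the definition above.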

39
L.E. for Adder Gates
Logical effort parameters obtained from simulation for std cells
Define logical effort ‘g’ of inverter = 1
Delay of complex gates can be defined w.r.t. d = 1
*from Mathew Sanu / D. Harris

40
Normalized L.E.
Logical effort & parasitic delay normalized to that of the inverter
Table columns: Gate type, Logical Eff. (g), Parasitics (Pinv)
Gate types: Inverter (g = 1, Pinv = 1), Dyn. Nand, Dyn. CM, Dyn. CM-4N (g = 1, Pinv = 3.71), Static CM, Mux, XOR
*from Mathew Sanu

41
Delay of a string of gates
Delay of a path: $D = \sum_i d_i = \sum_i g_i h_i + \sum_i p_i$
$g_i$ and $p_i$ are constants
To minimize path delay, the optimal values of $h_i$ are to be determined
$D$ is minimized when each stage bears the same effort, i.e. $g_i h_i = g_{i+1} h_{i+1}$
*from Mathew Sanu / D. Harris

42
Minimizing path delay
Logical effort of a string of gates: $G = \prod_i g_i$
Path electrical effort: $H = C_{out(path)} / C_{in(path)}$
Branching effort: $b = (C_{\text{on-path}} + C_{\text{off-path}}) / C_{\text{on-path}}$
Path branching effort: $B = \prod_i b_i$
Path effort: $F = GBH = \prod_i g_i h_i$
Delay is minimized when each stage bears the same effort: $f = g_i h_i = F^{1/N}$
The minimum delay of an N-stage path is: $N F^{1/N} + P$
*from Mathew Sanu / D. Harris
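The path-level procedure on slide 42 can be sketched as a short routine: form $F = GBH$, take the equal stage effort $f = F^{1/N}$, add the parasitics, and then size the stages backward from the load. This is a minimal sketch under the textbook logical effort assumptions; the function name `optimize_path` and the three-inverter example are illustrative, not taken from the tutorial's spreadsheet.

```python
import math


def optimize_path(g, b, c_in, c_out, p):
    """Return path effort F, stage effort f, minimum delay D and stage input caps."""
    n = len(g)
    G = math.prod(g)            # path logical effort
    B = math.prod(b)            # path branching effort
    H = c_out / c_in            # path electrical effort
    F = G * B * H               # path effort
    f = F ** (1.0 / n)          # equal effort per stage
    D = n * f + sum(p)          # minimum delay N*F^(1/N) + P

    # Work backward from the load: each stage must satisfy g_i * h_i = f,
    # where its load is b_i times the on-path capacitance it drives.
    sizes = []
    c_load = c_out
    for g_i, b_i in zip(reversed(g), reversed(b)):
        c_stage = g_i * b_i * c_load / f
        sizes.append(c_stage)
        c_load = c_stage
    sizes.reverse()
    return F, f, D, sizes


# Three normalized inverters driving a load 64 times the path input capacitance
F, f, D, sizes = optimize_path(
    g=[1.0, 1.0, 1.0], b=[1.0, 1.0, 1.0], c_in=1.0, c_out=64.0, p=[1.0, 1.0, 1.0]
)
print(F, round(f, 3), round(D, 3), [round(c, 3) for c in sizes])
```

A useful sanity check is that the first entry of `sizes` comes back equal to `c_in`, which confirms that the chosen stage effort is consistent with the path constraints.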

43
Inclusion of Wire Delay into Logical Effort

44
Wiring Load
Wiring in hand analysis
–Only lumped capacitance included
Wiring in HSPICE
–Short wire: 1-segment π-model RC network
–Long wire: 4-segment π-model RC network
–Using worst-case wire capacitance
Wire length
–Estimated from most critical 1-bit pitch

45
Modeling interconnect cap.
Include interconnect cap in branching factor
Without interconnect (PG driving CM0 across one adder bitpitch, with $C_{\text{off-path}} = C_{\text{on-path}}$): $b = (C_{\text{on-path}} + C_{\text{off-path}}) / C_{\text{on-path}} = 2$
With interconnect capacitance $C_{int}$: $b = (C_{\text{on-path}} + C_{\text{off-path}} + C_{int}) / C_{\text{on-path}} = 2 + C_{int}/C_{\text{on-path}} = 2 + I$
$I$: % int. cap to gate cap in 1 adder bitpitch
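The interconnect-aware branching factor on slide 45 is a one-line formula. A minimal sketch follows, assuming capacitances are given in any consistent unit; the function name and the sample values are hypothetical.

```python
def branching_factor(c_on_path, c_off_path, c_int=0.0):
    """Branching effort including lumped interconnect capacitance."""
    return (c_on_path + c_off_path + c_int) / c_on_path


# Equal on-path and off-path gate loads, no wire: b = 2
b_no_wire = branching_factor(1.0, 1.0)

# Same loads with wire capacitance equal to 50% of the on-path gate cap: b = 2 + I
b_with_wire = branching_factor(1.0, 1.0, c_int=0.5)
print(b_no_wire, b_with_wire)
```

Folding the wire into the branching term this way keeps the rest of the path-effort calculation unchanged.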

46
Branching
[Figure: circuit with gates g0, g1 on one branch and g2, g3 on the other]
Logical Effort assumes the “branching” factor of this circuit to be 2. This is incorrect and can create inaccuracies

47
Correction on Branching
[Figure: circuit with gates g0, g1 on one branch and g2, g3 on the other]
$f_0 = f_1$, $f_2 = f_3$
$T_{d1} = f_0 + f_1 + \text{parasitics}$
$T_{d2} = f_2 + f_3 + \text{parasitics}$
Minimum delay occurs when $T_{d1} = T_{d2}$

48
“Real” Branching Calculation
Branching only equals 2 when:
This explains why we had to resort to Excel!

49
Technology Characterization

50
Characterization Setup
Logical Effort Requirements:
–Equalize input and output transitions.
Logical Effort is characterized by varying the h (Cout/Cin) of a gate.
By using a variable load of inverters, each gate can be characterized over the same range of loads.
The Logical Effort of each gate is characterized for each input.
Energy is characterized for each output transition of the gate caused by each input transition, i.e. for an inverter, energy is measured for $t_{LH}$ and $t_{HL}$

51
LE Characterization Setup for Static Gates
[Figure labels: In, Gate, Variable Load; measured: $t_{LH}$, $t_{HL}$, Average Energy]

52
LE Characterization Setup for Dynamic Gates
[Figure labels: In, Gate, Variable Load; measured: $t_{HL}$, Energy]

53
LE Table (Static CMOS)
Technology: P/N Ratio = 2, $\tau_{INV}$ = 3.67, $p_{INV}$ = 4.29
Measured on worst-case single-input switching

54
Static CMOS Gates: Delay Graphs

55
Static Gates: Pull-up Delay Graph

56
LE Table (Dynamic CMOS)
Technology: Minimum-sized keeper included
Measured on all-input switching of worst path

57
Dynamic CMOS: Delay Graphs

58
Dynamic CMOS: Delay Graphs

59
Energy Calculation

60
Energy Calculation
8X Minimal Size Dyn-NAND
16X Minimal Size Dyn-NAND

61
Energy Calculation

62
Energy Calculation

63
Energy Calculation
NAND-2

64
Examples

65
64-Bit Adders
Han-Carlson (prefix-2, HC2): Static and Dynamic
Han-Carlson (prefix-2, HC2-2): Dynamic-Static
Kogge-Stone (prefix-2, KS2): Static and Dynamic
Kogge-Stone (prefix-2, KS2-2): Dynamic-Static
Quaternary-Tree (prefix-2, QT2): Static and Dynamic
Included wire delay, $t_{delay} = 0.7 R_{wire} C_{wire}$
Included wire energy, $E_w = C_{wire} V^2$
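The two wire terms on slide 65 are simple enough to evaluate by hand, and a short sketch makes the units explicit. The resistance, capacitance and supply values below are hypothetical placeholders chosen only to exercise the formulas, not values from the tutorial.

```python
import math


def wire_delay(r_wire, c_wire):
    """Lumped RC wire delay estimate: 0.7 * R * C (seconds for ohms and farads)."""
    return 0.7 * r_wire * c_wire


def wire_energy(c_wire, vdd):
    """Wire switching energy as given on the slide: C * V^2 (joules)."""
    return c_wire * vdd ** 2


# Hypothetical wire: 100 ohm, 200 fF, 1.2 V supply
t = wire_delay(r_wire=100.0, c_wire=200e-15)
e = wire_energy(c_wire=200e-15, vdd=1.2)
print(t, e)
```

For these placeholder values the delay is about 14 ps and the energy about 0.29 pJ.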

66
Test Setup
[Figure: adder with inputs A0 to A63 and outputs S0 to S63, driving a 1 mm wire with capacitance $C_{wire}$]
$H = (C_{in} + C_{wire}) / C_{in}$

67
Energy-Delay Estimates

68
Adders: Energy
[Figure labels: Dynamic: KS, HC; Static; Dynamic-Static; QT, KS, HC]

69
Dynamic-Static Implementation of Carry-Merge Stage
[Figure panels: Regular Domino Implementation; Compound-Domino Implementation; label: inverters to be eliminated]

70
Energy-Delay comparison of 64-bit KS, HC and QT adders

71
Adders: Critical Path Energy
[Figure labels: QT dynamic-static, HC dynamic-static, QT static, KS dynamic-static, HC-dynamic, KS dynamic, HC-static, KS-static]

72
Intel 32-bit Adder, 0.13u, 1.2V [VLSI-2002]
[Figure labels: QT, KS, KS estimated, QT estimated]

73
Energy-Delay comparison of 32-bit QT and KS adders: estimated vs. simulation in 0.10 μm technology

74
Est. Results: All Adders w/o Wires

75
Est. Results: All Adders w/ Wires

76
Energy-Delay Trade-offs
[Figure labels: Initial Design, Optimized Design, Energy Saving, Delay Saving]
Worst Case Energy Vector With 100% Input Activity
90nm technology
Collaboration with Intel AMR

77
Conclusion
Using realistic measures for comparing various designs leads to better design choices
Power is as important as speed
Making comparison in Energy-Delay space is necessary:
–power can always be traded for speed and vice versa
Wire effects are significant
Leakage currents?
