VLSI Arithmetic Adders & Multipliers Prof. Vojin G. Oklobdzija University of California


1 VLSI Arithmetic Adders & Multipliers Prof. Vojin G. Oklobdzija University of California http://www.ece.ucdavis.edu/acsel

2 Prof. V.G. Oklobdzija, VLSI Arithmetic. Introduction: Digital computer arithmetic belongs to computer architecture, but it is also an aspect of logic design. Its objective is to develop algorithms that use the available hardware as efficiently as possible. Speed, power and chip area are the most commonly used measures, which ties the algorithms closely to the technology of implementation.

3 Basic Operations
- Addition
- Multiplication
- Multiply-Add
- Division
- Evaluation of Functions
- Multi-Media

4 Addition of Binary Numbers

5 Addition of Binary Numbers: Full Adder. The full adder is the fundamental building block of most arithmetic circuits. Its inputs are $a_i$, $b_i$ and $C_{in}$ ($c_i$); its outputs are $s_i$ and $C_{out}$ ($c_{i+1}$). The sum and carry outputs are described as: $s_i = a_i \oplus b_i \oplus c_i$ and $c_{i+1} = a_i b_i + (a_i \oplus b_i)\, c_i$.

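6 Addition of Binary Numbers (full-adder truth table)

Inputs           Outputs
c_i  a_i  b_i    s_i  c_{i+1}
 0    0    0      0     0
 0    0    1      1     0
 0    1    0      1     0
 0    1    1      0     1
 1    0    0      1     0
 1    0    1      0     1
 1    1    0      0     1
 1    1    1      1     1

Figure annotations: Propagate (rows with $a_i \ne b_i$, where $c_{i+1} = c_i$) and Generate (rows with $a_i = b_i = 1$).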

7 Full-Adder Implementation. Full-adder operation is defined by the equations $s_i = p_i \oplus c_i$ and $c_{i+1} = g_i + p_i c_i$, with carry-propagate $p_i = a_i \oplus b_i$ and carry-generate $g_i = a_i b_i$. A one-bit adder could be implemented as shown in the slide figure.
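A minimal sketch of the one-bit full adder written in terms of propagate and generate. The function name `full_adder` is illustrative and not from the slides; the loop checks the result against ordinary integer addition for all eight input rows.

```python
from itertools import product
from typing import Tuple


def full_adder(a: int, b: int, c_in: int) -> Tuple[int, int]:
    """One-bit full adder expressed through propagate and generate."""
    p = a ^ b               # carry-propagate
    g = a & b               # carry-generate
    s = p ^ c_in            # sum bit
    c_out = g | (p & c_in)  # carry-out
    return s, c_out


# Exhaustive check: the two output bits must encode a + b + c_in.
for c_in, a, b in product((0, 1), repeat=3):
    s, c_out = full_adder(a, b, c_in)
    assert a + b + c_in == s + 2 * c_out
```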

8 High-Speed Addition. A one-bit adder can be implemented more efficiently with a multiplexer, because the MUX is faster.

9 The Ripple-Carry Adder

10 The Ripple-Carry Adder (from Rabaey)

11 Inversion Property (from Rabaey)

12 Minimize Critical Path by Reducing Inverting Stages (from Rabaey)

13 Ripple-Carry Adder. Carry chain of an RCA implemented using a multiplexer from the standard cell library. Figure label: Critical Path. Oklobdzija, ISCAS’88

14 Manchester Carry-Chain Realization of the Carry Path. A simple and very popular scheme for implementing the carry signal path.

15 Original Design. T. Kilburn, D. B. G. Edwards, D. Aspinall, "Parallel Addition in Digital Computers: A New Fast "Carry" Circuit", Proceedings of IEE, Vol. 106, pt. B, p. 464, September 1959.

16 Manchester Carry Chain (CMOS). Kilburn et al., IEE Proc., 1959.
- Implement P with pass-transistors
- Implement G with pull-up, kill (delete) with pull-down
- Use dynamic logic to reduce the complexity and speed up

17 Pass-Transistor Realization in DPL

18 Carry-Skip Adder. MacSorley, Proc. IRE, 1/61; Lehman and Burla, IRE Trans. on Comp., 12/61

19 Carry-Skip Adder: Bypass (from Rabaey)

20 Carry-Skip Adder: N bits, k bits per group, r = N/k groups
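A behavioural sketch of a carry-skip adder with fixed k-bit groups. The function name, argument names and default values are illustrative and not from the slides. It models only the logic of the bypass, not its timing: when every bit of a group propagates, the group carry-out is taken directly from the group carry-in.

```python
from typing import Tuple


def carry_skip_add(a: int, b: int, n: int = 32, k: int = 4,
                   c_in: int = 0) -> Tuple[int, int]:
    """Add two n-bit numbers using k-bit ripple groups with a bypass mux."""
    carry = c_in
    total = 0
    for base in range(0, n, k):
        group_in = carry   # carry entering this group
        group_p = 1        # AND of all propagate bits in the group
        for i in range(base, min(base + k, n)):
            ai, bi = (a >> i) & 1, (b >> i) & 1
            p, g = ai ^ bi, ai & bi
            total |= (p ^ carry) << i
            carry = g | (p & carry)   # ripple inside the group
            group_p &= p
        # Bypass multiplexer: skip the group if all of its bits propagate.
        carry = group_in if group_p else carry
    return total, carry
```

With N = 32 and k = 4 this gives r = 8 groups. In hardware the benefit comes from the bypass path being shorter than the ripple path through the group, which a functional model like this does not show.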

21 Carry-Skip Adder (figure; group size k)

22 Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

23 Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

24 Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985). Block-size labels in the figure: 1 1 3 3 4 4 5 5 6; Δ = 9. Any-point-to-any-point delay = 9Δ, as compared to 12Δ for CSKA.

25 Carry-chain block size determination for a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

26 Delay Calculation for Variable Block Adder (Oklobdzija, Barnes: IBM 1985). Delay model:

27 Variable Block Adder (Oklobdzija, Barnes: IBM 1985). Variable Group Length. Oklobdzija, Barnes, Arith’85

28 Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985). Variable block lengths: there is no closed-form solution for the delay; it is a dynamic programming problem.

29 Delay Comparison: Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

30 Delay Comparison: Variable Block Adder. Figure labels: VBA, Multi-Level CLA.

31 Fan-Out Dependency

32 Fan-In Dependency

33 Delay Comparison: Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

34 (figure only)

35 Carry-Lookahead Adder (Weinberger and Smith). A. Weinberger and J. L. Smith, “A Logic for High-Speed Addition”, National Bureau of Standards, Circ. 591, p. 3-12, 1958.

36 Carry-Lookahead Adder (Weinberger and Smith)

37 Carry-Lookahead Adder
- One gate delay Δ to calculate p, g
- One Δ to calculate P and two for G
- Three gate delays to calculate $C_{4(j+1)}$
- Compare that to 8Δ in RCA!

38 Carry-Lookahead Adder (Weinberger and Smith). Additional two gate delays: $C_{16}$ will take a total of 5Δ vs. 32Δ for RCA!
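A sketch of one 4-bit lookahead group, assuming the usual definitions $g_i = a_i b_i$ and $p_i = a_i \oplus b_i$ (the slide figures are not available, so the exact gate arrangement is an assumption). Every carry is a flattened two-level expression of g, p and the group carry-in, which is where the short gate-delay counts above come from. The name `cla4` is illustrative.

```python
from typing import Tuple


def cla4(a: int, b: int, c0: int) -> Tuple[int, int, int, int]:
    """4-bit carry-lookahead group: returns (sum, c4, group G, group P)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]

    # Carries as flattened AND-OR expressions; none waits on another carry.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))

    # Group generate and propagate, used by the next lookahead level.
    big_g = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
             | (p[3] & p[2] & p[1] & g[0]))
    big_p = p[3] & p[2] & p[1] & p[0]
    c4 = big_g | (big_p & c0)

    total = sum((p[i] ^ c) << i for i, c in enumerate((c0, c1, c2, c3)))
    return total, c4, big_g, big_p
```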

39 32-bit Carry-Lookahead Adder

40 Carry-Lookahead Adder (Weinberger and Smith: original derivation)

41 Carry-Lookahead Adder (Weinberger and Smith: original derivation)

42 Carry-Lookahead Adder (Weinberger and Smith). Please notice the similarity with Parallel-Prefix Adders!

43 Carry-Lookahead Adder (Weinberger and Smith). Please notice the similarity with Parallel-Prefix Adders!

44 Delay Optimized CLA. B. Lee, V. G. Oklobdzija, Journal of VLSI Signal Processing, Vol. 3, No. 4, October 1991

45 Delay Optimized CLA: Lee-Oklobdzija ‘91
(a) fixed groups and levels
(b) variable-sized groups, fixed levels
(c) variable-sized groups and fixed levels
(d) variable-sized groups and levels

46 Two-Levels of Logic Implementation of the Carry Block

47 Two-Levels of Logic Implementation of the Carry-Lookahead Block

48 Three-Levels of Logic Implementation of the Carry Block (restricted fan-in)

49 Three-Levels of Logic Implementation of the Carry Lookahead (restricted fan-in)

50 Delay Optimized CLA: Lee-Oklobdzija ‘91. Delay: Two-level BCLA. Delay: Three-level BCLA.

51 Delay Optimized CLA: Lee-Oklobdzija ‘91. (a) 2-level BCLA, Δ = 8.5 ns; (b) 3-level BCLA, Δ = 8.9 ns

52 Motorola: CLA Implementation Example. A. Naini, D. Bearden and W. Anderson, “A 4.5nS 96b CMOS Adder Design”, Proceedings of the IEEE Custom Integrated Circuits Conference, May 3-6, 1992.

53 Critical path in Motorola's 64-bit CLA

54 Motorola's 64-bit CLA: conventional PG Block

55 Motorola's 64-bit CLA: Modified PG Block. Intermediate propagate signals $P_{i:0}$ are generated to speed up $C_3$.

56 Ling’s Adder. Huey Ling, “High-Speed Binary Adder”, IBM Journal of Research and Development, Vol. 25, No. 3, 1981.

57 Ling Adder. Variation of CLA: Ling, IBM J. Res. Dev., 5/81. Ling’s equations:

58 Ling Adder. Ling’s equation (Doran, Trans. on Comp., 9/88): propagates information on two bits.

59 Ling Adder. Conventional: Ling:
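The equations on these slides did not survive in the transcript. The sketch below uses the commonly cited form of Ling's recurrence, stated here as an assumption: with $g_i = a_i b_i$ and $t_i = a_i + b_i$, the pseudo-carry $H_{i+1} = c_{i+1} + c_i$ (information on two bits) satisfies $H_{i+1} = g_i + t_{i-1} H_i$, and the true carry is recovered as $c_{i+1} = t_i H_{i+1}$. The code is a bit-serial functional model with illustrative names; it checks the identities and does not model the lookahead hardware.

```python
from typing import Tuple


def ling_add(a: int, b: int, n: int, c_in: int = 0) -> Tuple[int, int]:
    """n-bit addition driven by Ling pseudo-carries instead of true carries."""
    h = c_in      # H_0, chosen so that c_0 = t_{-1} & H_0 with t_{-1} = 1
    t_prev = 1    # t_{-1}
    total = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g, t, p = ai & bi, ai | bi, ai ^ bi
        c = t_prev & h          # true carry into bit i, from the pseudo-carry
        total |= (p ^ c) << i   # sum bit s_i = p_i xor c_i
        h = g | c               # H_{i+1} = g_i + t_{i-1} H_i
        t_prev = t
    return total, t_prev & h    # carry-out c_n = t_{n-1} H_n
```

The conventional recurrence $c_{i+1} = g_i + t_i c_i$ has one more literal per term than the pseudo-carry recurrence, which is what makes the flattened Ling lookahead gates smaller.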

60-70 S. Naffziger, ISSCC’96 (figures only)

71 Results: S. Naffziger, “A Subnanosecond 64-b Adder”, ISSCC ‘96
- 0.5 µm technology
- Speed: 0.930 ns
- Nominal process, 80 °C, V = 3.3 V

72 Conditional-Sum Adder. J. Sklansky, “Conditional-Sum Addition Logic”, IRE Transactions on Electronic Computers, EC-9, p. 226-231, 1960.

73 Conditional-Sum Adder

74 Conditional-Sum Adder

75 Carry-Select Adder. O. J. Bedrij, “Carry-Select Adder”, IRE Transactions on Electronic Computers, June 1962, p.340-34

76 Carry-Select Adder. O. J. Bedrij, IBM Poughkeepsie, 1962

77 Carry-Select Adder. Addition under the assumptions $C_{in} = 0$ and $C_{in} = 1$.
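A behavioural sketch of carry-select addition. Each k-bit block computes both speculative results, one for carry-in 0 and one for carry-in 1, and the actual carry then drives a multiplexer. The block adders are modelled with Python integer addition; the function name and defaults are illustrative and not from the slides.

```python
from typing import Tuple


def carry_select_add(a: int, b: int, n: int = 64, k: int = 32,
                     c_in: int = 0) -> Tuple[int, int]:
    """Add two n-bit numbers in k-bit blocks using carry-select."""
    if n % k != 0:
        raise ValueError("this sketch assumes n is a multiple of k")
    mask = (1 << k) - 1
    carry, total = c_in, 0
    for base in range(0, n, k):
        a_blk, b_blk = (a >> base) & mask, (b >> base) & mask
        sum0 = a_blk + b_blk        # speculative result for carry-in = 0
        sum1 = a_blk + b_blk + 1    # speculative result for carry-in = 1
        chosen = sum1 if carry else sum0   # the select multiplexer
        total |= (chosen & mask) << base
        carry = chosen >> k
    return total, carry
```

In hardware both speculative sums are ready before the carry arrives, so the block contributes only one multiplexer delay to the carry path.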

78 Carry-Select Adder: combining two 32-b VBAs in select mode. Delay = $\Delta_{VBA32} + \Delta_{MUX}$

79 Addition Under Non-equal Signal Arrival Profile Assumption. P. Stelling, V. G. Oklobdzija, "Design Strategies for Optimal Hybrid Final Adders in a Parallel Multiplier", special issue on VLSI Arithmetic, Journal of VLSI Signal Processing, Kluwer Academic Publishers, Vol. 14, No. 3, December 1996

80 Signal Arrival Profile from the Parallel Multiplier Partial-Product Reduction Tree

81 Oklobdzija and Villeger, IEEE Transactions on VLSI Systems, June 1995

82 Oklobdzija and Villeger, IEEE Transactions on VLSI Systems, June 1995

83-90 (figures only)

91 Performing Multiply-Add Operation in the Multiply Time. P. Stelling, V. G. Oklobdzija, "Achieving Multiply-Accumulate Operation in the Multiply Time", Thirteenth International Symposium on Computer Arithmetic, Pacific Grove, California, July 5-9, 1997.

92 (figure only)

93 Final Adder: Implementation

94 Final Adder: Implementation

95 Final Adder: Implementation

96 Final Adder: Implementation

97 Recurrence Solver Based Adders. Kogge and Stone, IEEE Trans. on Computers, August 1973; Bilgory and Gajski, 18th DAC, 1981; Brent and Kung, IEEE Trans. on Computers, March 1982

98 Recurrence Solver Based Adders
- 1973: Kogge and Stone published a general recurrence scheme for parallel computation
- 1979: Brent and Kung published a Tech. Report on regular layout for parallel adders
- 1980: Guibas and Vuillemin developed a layout scheme based on the recurrence equation for addition
- 1980: Ladner and Fischer published “Parallel Prefix Computation”, Journal of the ACM
- 1981: Bilgory and Gajski published a paper on recurrence structures for automatic cell generation

99 Recurrence Solver Based Adders. They are based on the recurrence equation for P, G (what is new there since Weinberger?!): $(g, p) \circ (g', p') = (g + p\,g',\; p\,p')$. Or: $G_{i:j} = G_{i:k} + P_{i:k}\, G_{k-1:j}$ and $P_{i:j} = P_{i:k}\, P_{k-1:j}$

100 Recurrence Solver Based Adders
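A sketch of a recurrence-solver (parallel-prefix) adder using the Kogge-Stone arrangement, in which the combining distance doubles at every level. It assumes the standard associative operator on (G, P) pairs, $(G, P) \circ (G', P') = (G + P\,G',\; P\,P')$. Names are illustrative, and the carry-in is folded into bit 0 as a modelling convenience.

```python
from typing import Tuple


def prefix_add(a: int, b: int, n: int, c_in: int = 0) -> Tuple[int, int]:
    """n-bit addition with a Kogge-Stone style prefix network."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]

    big_g, big_p = g[:], p[:]
    big_g[0] = g[0] | (p[0] & c_in)   # fold the carry-in into bit 0

    d = 1
    while d < n:   # about log2(n) levels
        new_g, new_p = big_g[:], big_p[:]
        for i in range(d, n):
            # Combine the span ending at i with the span ending at i - d.
            new_g[i] = big_g[i] | (big_p[i] & big_g[i - d])
            new_p[i] = big_p[i] & big_p[i - d]
        big_g, big_p = new_g, new_p
        d *= 2

    # big_g[i] is now the carry out of bit i, so shift by one for the sums.
    carries = [c_in] + big_g[:-1]
    total = sum((p[i] ^ carries[i]) << i for i in range(n))
    return total, big_g[n - 1]
```

All positions within a level are independent of each other, so in hardware each level costs one cell delay regardless of n.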

101 Carry-Lookahead Adder (Weinberger and Smith). Just to remind you: please notice the similarity with Parallel-Prefix Adders!

102 Multiplexer Based Adder. Farooqui and Oklobdzija, 1999 Int’l Symp. on VLSI Technology, Taipei, Taiwan, June 8-10, 1999

103 Multiplexer Based Adder. Based on the realization that a MUX circuit is faster than a logic gate due to its transmission-gate implementation. Based on the Carry-Lookahead method (W-S), or recurrence solver.

104 Multiplexer Based Adder. A. A. Farooqui, V. G. Oklobdzija, F. Chechrazi, 1999 Int’l Symp. on VLSI Technology, Taipei, Taiwan, June 8-10, 1999.

105 Multiplexer Based Adder. A. A. Farooqui, V. G. Oklobdzija, F. Chechrazi, 1999 Int’l Symp. on VLSI Technology, Taipei, Taiwan, June 8-10, 1999.

106 Multiplexer Based Adder. A. A. Farooqui, V. G. Oklobdzija, F. Chechrazi, 1999 Int’l Symp. on VLSI Technology, Taipei, Taiwan, June 8-10, 1999.

107 Multiplexer Based Adder. A. A. Farooqui, V. G. Oklobdzija, F. Chechrazi, 1999 Int’l Symp. on VLSI Technology, Taipei, Taiwan, June 8-10, 1999. Results in a very fast structure: 7 MUX delays for a 64-b adder. Delay using standard cells, 0.25 µm, 2.5 V, 25 °C:

Adder Size (bits)   Delay (ps)
 8                  625
16                  665
32                  710
64                  903

108 DEC "Alpha" 21064 Adder. Combination:
- 8-bit tapered pre-discharged Manchester Carry Chains, with $C_{in} = 0$ and $C_{in} = 1$
- 32-bit LSB Carry-Lookahead Adder
- 32-bit MSB Conditional-Sum Adder
- Carry-Select on the most significant 32 bits
- Latches in the middle: pipelined addition

109 DEC "Alpha" 21064 Adder

110 DEC "Alpha" 21064 Adder: Results
- The first 200 MHz processor
- Built using 0.75 µm technology
- V = 3.3 V, 30 W
- Pipelined (two latches), allowing 5 ns throughput and 10 ns latency

111 Conclusion: VLSI Implementation of Addition

112 Conclusion: VLSI Implementation of Addition
- Currently, implementation parameters are not reflected in the algorithms used for development
- Layout and wire-delay effects are largely neglected, and this is becoming intolerable in the next generation of technology
- Transistor sizing has a large effect, which can outweigh the choice of algorithm
- There is a great disconnect between algorithm and implementation
- New rules and measures of goodness are needed

113 Multiplication: Parallel Multiplier Implementation

114 Multiplication. Algorithm: a recurrence on the partial product, applied for j = 0, ..., n-1 from an initial value, so that $p^{(n)} = XY$ after n steps.
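The recurrence itself is missing from the transcript. The sketch below assumes the textbook right-shift form, $p^{(0)} = 0$ and $p^{(j+1)} = (p^{(j)} + x_j\, Y\, 2^n)/2$, which yields $p^{(n)} = XY$ after n steps; the slide may use a different but equivalent formulation. The function name is illustrative.

```python
def shift_add_multiply(x: int, y: int, n: int) -> int:
    """Multiply two unsigned n-bit integers by n add-and-shift steps."""
    p = 0                       # p(0) = 0
    for j in range(n):
        x_j = (x >> j) & 1      # multiplier bit examined in step j
        # Add Y aligned to the upper half, then shift right by one position.
        p = (p + x_j * (y << n)) >> 1
    return p                    # p(n) = X * Y
```

The right shift never discards a 1 here: before step j the partial product is a multiple of $2^{n-j}$, which is at least 2 for every j up to n-1.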

115 Parallel Multipliers

116 4:2 Compressor

117 Re-designed 4:2 Compressor with 3 XOR Delay. Figure labels: inputs $C_{in}$, I1, I2, I3, I4; multiplexer inputs 0 and 1; outputs S, C, $C_{out}$.
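A sketch of one common XOR/multiplexer arrangement of a 4:2 compressor whose longest path is three XOR gates (I1 to S). The slide figure is not available, so the exact wiring here is an assumption. The defining property is $I_1 + I_2 + I_3 + I_4 + C_{in} = S + 2\,(C + C_{out})$, with $C_{out}$ independent of $C_{in}$ so that carries do not ripple along a row of compressors.

```python
from typing import Tuple


def compressor_4_2(i1: int, i2: int, i3: int, i4: int,
                   c_in: int) -> Tuple[int, int, int]:
    """4:2 compressor built from three XOR levels and two multiplexers.

    Returns (S, C, C_out).
    """
    x1 = i1 ^ i2                 # first XOR level
    x2 = i3 ^ i4                 # first XOR level, in parallel with x1
    x = x1 ^ x2                  # second XOR level
    s = x ^ c_in                 # third XOR level
    c_out = i3 if x1 else i1     # mux selected by x1: majority(I1, I2, I3)
    c = c_in if x else i4        # mux selected by x
    return s, c, c_out
```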

118 A Method for Generation of Fast Parallel Multipliers, by Vojin G. Oklobdzija, David Villeger, Simon S. Liu. Electrical and Computer Engineering, University of California, Davis

119 (figure only)

120 Idea!

121 (figure only)

122 Three-Dimensional optimization Method: TDM (Oklobdzija, Villeger, Liu, 1996)

123-124 (figures only)

125 Method

126-128 (figures only)

129 Computer Tools

130 Algorithm for Automatic Generation of Partial Product Array.
Initialize: Form 2N-1 lists L_i (i = 0, ..., 2N-2), each consisting of p_i elements, where
  p_i = i+1 for i ≤ N-1 and p_i = 2N-1-i for i ≥ N.
An element j of a list L_i (j = 0, ..., p_i-1) is a pair <n_j, δ_j>, where:
  n_j is a unique node-identifying name;
  δ_j is the delay associated with that node, i.e. the delay of the signal arriving at node n_j with respect to some reference point.
For i = 0, 1 and 2N-2: connect the nodes from the corresponding lists L_i directly to the CPA.

131 For i = 2 to i = 2N-3 {Partial Product Array Generation}
Begin For
  If length of L_i is even Then
  Begin If
    sort the elements of L_i in ascending order by the values of delay δ_j
    connect an HA to the first 2 elements of L_i, starting with the slowest input
    D_s = max{δ_A + δ_{A-s}, δ_B + δ_{B-s}}
    D_c = max{δ_A + δ_{A-c}, δ_B + δ_{B-c}}
    remove 2 elements from L_i
    insert the pair <n_s, D_s> (the HA sum output) into L_i
    insert the pair <n_c, D_c> (the HA carry output) into L_{i+1}
    decrement the length of L_i
    increment the length of L_{i+1}
  End If;

132 While length of L_i > 3
  Begin While
    sort the elements of L_i in ascending order by the values of delay δ_j
    connect an FA to the first 3 elements of L_i, starting with the slowest input of the FA
    D_s = max{δ_A + δ_{A-s}, δ_B + δ_{B-s}, δ_Ci + δ_{Ci-s}}
    D_c = max{δ_A + δ_{A-c}, δ_B + δ_{B-c}, δ_Ci + δ_{Ci-c}}
    remove 3 elements from L_i
    insert the pair <n_s, D_s> (the FA sum output) into L_i
    insert the pair <n_c, D_c> (the FA carry output) into L_{i+1}
    subtract 2 from the length of L_i
    increment the length of L_{i+1}
  End While;
  sort the elements of L_i
  connect an FA to the last 3 nodes of L_i
  connect the S and C outputs to bits i and i+1 of the CPA
End For;
End Method;
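A sketch of the column-by-column procedure above, tracking only signal arrival times (the node names n_j are omitted). The pin-to-output delays are hypothetical values in XOR-gate units, all partial-product bits are assumed to arrive at time 0, and the sketch is restricted to N ≥ 4, where every processed column ends with exactly three signals. The function and constant names are illustrative.

```python
from typing import List, Sequence, Tuple

# Hypothetical pin-to-output delays (not from the slides), slowest input
# first: (A, B, Ci) for the full adder and (A, B) for the half adder.
FA_TO_SUM, FA_TO_CARRY = (2.0, 2.0, 1.0), (2.0, 2.0, 1.0)
HA_TO_SUM, HA_TO_CARRY = (1.0, 1.0), (1.0, 1.0)


def apply_cell(col: List[float], to_sum: Sequence[float],
               to_carry: Sequence[float]) -> Tuple[float, float]:
    """Remove the earliest signals from col, connect them to one adder cell
    (earliest signal on the slowest pin) and return (D_s, D_c)."""
    col.sort()
    k = len(to_sum)
    picked = col[:k]
    del col[:k]
    d_s = max(t + d for t, d in zip(picked, to_sum))
    d_c = max(t + d for t, d in zip(picked, to_carry))
    return d_s, d_c


def tdm_arrival_profile(n: int) -> List[List[float]]:
    """Arrival times of the signals delivered to each CPA bit position."""
    if n < 4:
        raise ValueError("this sketch assumes n >= 4")
    # Column i of the partial-product array holds p_i bits, all at time 0.
    cols = [[0.0] * (i + 1 if i <= n - 1 else 2 * n - 1 - i)
            for i in range(2 * n - 1)]
    cpa: List[List[float]] = [[] for _ in range(2 * n - 1)]
    cpa[0].extend(cols[0])
    cpa[1].extend(cols[1])
    for i in range(2, 2 * n - 2):
        col = cols[i]
        if len(col) % 2 == 0:          # an even column gets one HA first
            d_s, d_c = apply_cell(col, HA_TO_SUM, HA_TO_CARRY)
            col.append(d_s)
            cols[i + 1].append(d_c)
        while len(col) > 3:            # FAs on the earliest three signals
            d_s, d_c = apply_cell(col, FA_TO_SUM, FA_TO_CARRY)
            col.append(d_s)
            cols[i + 1].append(d_c)
        if len(col) != 3:
            raise RuntimeError("column did not reduce to three signals")
        d_s, d_c = apply_cell(col, FA_TO_SUM, FA_TO_CARRY)
        cpa[i].append(d_s)             # last FA feeds the CPA directly
        cpa[i + 1].append(d_c)
    cpa[2 * n - 2].extend(cols[2 * n - 2])
    return cpa
```

Calling `tdm_arrival_profile(8)` returns one list per CPA bit. The spread of those arrival times is the non-uniform signal arrival profile that the final adder is then designed around.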

133-136 (figures only)

137 Competing Approaches

138 Organization of Hitachi's DPL multiplier

139 Hitachi's 4:2 compressor structure

140 DPL multiplexer circuit

141 RECOMMENDATIONS

142 Conclusion
1. The key to improving multiplier speed was in optimizing the interconnections, not the compressor circuit (as was believed for so long).
2. With the increase in wire delay, it is important to connect layout topology and algorithm for optimal interconnection of the PPRT.
3. Using one of the “fast adders” (CLA) as a final adder was actually counterproductive. A simple final adder that is optimized for the signal arrival profile yields better results with less hardware.
4. It is possible to further optimize the PPRT and FA so that a fused Multiply-Add operation can be performed in multiply time.
5. For larger multipliers and adders (as used in cryptography), the optimization procedures described yield even better results.
See: http://www.ece.ucdavis.edu/acsel/Publications.html

143 Read This!
1. E. Swartzlander, "Computer Arithmetic", Vol. 1 & 2, IEEE Computer Society Press, 1990.
2. K. Hwang, "Computer Arithmetic: Principles, Architecture and Design", John Wiley and Sons, 1979.
3. M. Ercegovac, “Digital Systems and Hardware/Firmware Algorithms”, Chapter 12: Arithmetic Algorithms and Processors, John Wiley & Sons, 1985.
4. A. Chandrakasan, W. Bowhill, F. Fox, Editors, "Design of High Performance Microprocessors Circuits", IEEE Press, July 2000.
5. V. G. Oklobdzija, “High-Performance System Design: Circuits and Logic”, IEEE Press, July 1999.
Also: http://www.ece.ucdavis.edu/acsel/Publications.html

144 THE END

145 Hollywood

