Published by Alexis Willis. Modified over 8 years ago.
1
VLSI Arithmetic: Adders & Multipliers
Prof. Vojin G. Oklobdzija
University of California
http://www.ece.ucdavis.edu/acsel
2
Prof. V.G. Oklobdzija, VLSI Arithmetic
Introduction
Digital computer arithmetic belongs to computer architecture; however, it is also an aspect of logic design. The objective of computer arithmetic is to develop algorithms that use the available hardware in the most efficient way. Ultimately, speed, power, and chip area are the most commonly used measures, which creates a strong link between the algorithms and the technology of implementation.
3
Basic Operations
– Addition
– Multiplication
– Multiply-Add
– Division
– Evaluation of Functions
– Multi-Media
4
Addition of Binary Numbers
5
Addition of Binary Numbers
Full Adder. The full adder is the fundamental building block of most arithmetic circuits. The sum and carry outputs are described as:
$s_i = a_i \oplus b_i \oplus c_i$
$c_{i+1} = a_i b_i + c_i (a_i + b_i)$
[Figure: full-adder block with inputs a_i, b_i, C_in and outputs s_i, C_out]
6
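Addition of Binary Numbers
Full-adder truth table (inputs c_i, a_i, b_i; outputs s_i, c_{i+1}):
c_i a_i b_i | s_i c_{i+1}
0   0   0   | 0   0
0   0   1   | 1   0
0   1   0   | 1   0
0   1   1   | 0   1
1   0   0   | 1   0
1   0   1   | 0   1
1   1   0   | 0   1
1   1   1   | 1   1
Rows with a_i ≠ b_i propagate the incoming carry (c_{i+1} = c_i); rows with a_i = b_i = 1 generate a carry regardless of c_i.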
7
Full-Adder Implementation
Full-adder operation is defined by the equations:
$s_i = p_i \oplus c_i$
$c_{i+1} = g_i + p_i c_i$
A one-bit adder can be implemented as shown, with Carry-Propagate $p_i = a_i \oplus b_i$ and Carry-Generate $g_i = a_i b_i$.
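A quick way to check the p/g form of the full adder is to compare it with integer addition for all eight input combinations. A minimal Python sketch (the name `full_adder` is ours, for illustration):

```python
from itertools import product

def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """One-bit full adder written with carry-propagate p and carry-generate g."""
    p = a ^ b              # carry-propagate
    g = a & b              # carry-generate
    s = p ^ c              # sum
    c_out = g | (p & c)    # carry-out
    return s, c_out

# Print the truth table in the column order c_i a_i b_i | s_i c_i+1
for c, a, b in product((0, 1), repeat=3):
    s, c_out = full_adder(a, b, c)
    assert a + b + c == s + 2 * c_out   # the two outputs encode the 2-bit sum
    print(c, a, b, "|", s, c_out)
```

The printed rows come out in the same order as the truth table on the previous slide.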
8
High-Speed Addition
The one-bit adder can be implemented more efficiently with a multiplexer (MUX), because a MUX is faster than a logic gate.
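In the multiplexer form, $p_i = a_i \oplus b_i$ drives the select input: when $p_i = 1$ the incoming carry is passed through, and when $p_i = 0$ the two operand bits are equal, so either of them (here $a_i$) is the carry-out. A sketch that checks this MUX carry against the AND-OR form (helper names are hypothetical):

```python
from itertools import product

def mux(sel: int, in0: int, in1: int) -> int:
    """2:1 multiplexer: returns in1 when sel is 1, otherwise in0."""
    return in1 if sel else in0

def mux_full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    p = a ^ b
    s = p ^ c                  # the sum is computed as before
    c_out = mux(p, a, c)       # p=1: pass c through; p=0: a == b, so a is the carry
    return s, c_out

for a, b, c in product((0, 1), repeat=3):
    s, c_out = mux_full_adder(a, b, c)
    assert c_out == (a & b) | ((a ^ b) & c)   # agrees with g + p*c
    assert a + b + c == s + 2 * c_out
```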
9
The Ripple-Carry Adder
10
The Ripple-Carry Adder (from Rabaey)
11
Inversion Property (from Rabaey)
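The inversion property says that complementing all three inputs of a full adder complements both outputs: $\overline{s}(a,b,c) = s(\overline{a},\overline{b},\overline{c})$ and $\overline{c_{out}}(a,b,c) = c_{out}(\overline{a},\overline{b},\overline{c})$. This is what allows alternate stages of a ripple-carry chain to work on inverted signals and drop inverters from the carry path. An exhaustive check in Python (the helper `fa` is illustrative):

```python
from itertools import product

def fa(a: int, b: int, c: int) -> tuple[int, int]:
    """Full adder: returns (sum, carry-out) as parity and majority of the inputs."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)

for a, b, c in product((0, 1), repeat=3):
    s, co = fa(a, b, c)
    s_inv, co_inv = fa(1 - a, 1 - b, 1 - c)
    assert (s_inv, co_inv) == (1 - s, 1 - co)   # both outputs are complemented
print("inversion property holds for all 8 input combinations")
```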
12
Minimize Critical Path by Reducing Inverting Stages (from Rabaey)
13
Ripple-Carry Adder
Carry chain of an RCA implemented using multiplexers from the standard-cell library. [Figure: critical path marked along the carry chain] (Oklobdzija, ISCAS'88)
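A behavioural sketch of an n-bit ripple-carry adder whose carry chain is a string of 2:1 multiplexers. Besides the sum, it reports the longest run of consecutive MUX stages a carry travels through for the given operands; counting one unit per MUX stage is an illustrative assumption, and the function name is hypothetical.

```python
def ripple_carry_add(a: int, b: int, n: int, c_in: int = 0) -> tuple[int, int, int]:
    """n-bit ripple-carry addition with a MUX carry chain.

    Returns (sum, carry_out, longest_chain), where longest_chain is the largest
    number of consecutive MUX stages a carry passes through for these operands.
    """
    total, c = 0, c_in
    run = longest = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        p = ai ^ bi                  # carry-propagate drives the MUX select
        total |= (p ^ c) << i        # sum bit s_i = p_i XOR c_i
        c = c if p else ai           # MUX: pass c_i, or take a_i (= b_i) when p_i = 0
        run = run + 1 if p else 1    # a stage with p_i = 0 starts a new carry chain
        longest = max(longest, run)
    return total, c, longest
```

For example, `ripple_carry_add(0b1111, 0b0000, 4, c_in=1)` returns `(0, 1, 4)`: every stage propagates, which is the worst-case path the critical-path figure refers to.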
14
Manchester Carry-Chain Realization of the Carry Path
A simple and very popular scheme for implementing the carry signal path.
15
Original Design
T. Kilburn, D. B. G. Edwards, D. Aspinall, "Parallel Addition in Digital Computers: A New Fast 'Carry' Circuit", Proceedings of the IEE, Vol. 106, Pt. B, p. 464, September 1959.
16
Manchester Carry Chain (CMOS)
Kilburn et al., IEE Proc., 1959.
– Implement P with pass transistors
– Implement G with a pull-up, and kill (delete) with a pull-down
– Use dynamic logic to reduce complexity and increase speed
17
Pass-Transistor Realization in DPL
18
Carry-Skip Adder
MacSorley, Proc. IRE, 1/61; Lehman, Burla, IRE Trans. on Comp., 12/61
19
Carry-Skip Adder: Bypass (from Rabaey)
20
Carry-Skip Adder: N bits, k bits/group, r = N/k groups
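Under a simple unit-delay model (our assumption for illustration: one delay per bit of ripple and one per skipped group), the worst-case carry in an N-bit carry-skip adder with r = N/k groups ripples through k-1 bits of the first group, skips the r-2 middle groups, and ripples through k-1 bits of the last group:

$T(k) = 2(k-1) + (N/k - 2)$, and $dT/dk = 2 - N/k^2 = 0$ gives $k = \sqrt{N/2}$.

A short sketch that tabulates T(k) over group sizes dividing N (the function name is hypothetical):

```python
import math

def cska_delay(n: int, k: int) -> int:
    """Worst-case carry delay (unit-delay model) of an n-bit carry-skip adder with k-bit groups."""
    r = n // k                      # number of groups (k is assumed to divide n)
    return 2 * (k - 1) + (r - 2)    # ripple out of the first group, skip r-2, ripple into the last

n = 32
ks = [k for k in range(2, n // 2 + 1) if n % k == 0]   # group sizes giving at least 2 groups
for k in ks:
    print(f"k={k:2d}  r={n // k:2d}  delay={cska_delay(n, k)}")
best = min(ks, key=lambda k: cska_delay(n, k))
print("best k =", best, "| analytic optimum sqrt(N/2) =", math.sqrt(n / 2))
```

For N = 32 the minimum is 12, at k = 4, which matches the CSKA figure quoted for comparison with the Variable Block Adder.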
21
Carry-Skip Adder
[Figure: carry-skip adder with k-bit groups]
22
Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
23
Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
24
Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
[Figure: carry chain with labels 1, 1, 3, 3, 4, 4, 5, 5, 6 and delay = 9]
Any-point-to-any-point delay = 9, compared to 12 for the CSKA.
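With the same illustrative unit-delay model, a carry generated in block i and absorbed in block j > i costs (s_i - 1) + (j - i - 1) + (s_j - 1), where s_i is the size of block i. The hypothetical helper below evaluates the any-point-to-any-point delay of a one-level skip adder for a list of block sizes; it is not the IBM sizing procedure. The symmetric 32-bit split 2, 3, 4, 5, 6, 5, 4, 3 is a made-up example, not the figure's partition.

```python
from itertools import combinations

def worst_carry_delay(sizes: list[int]) -> int:
    """Any-point-to-any-point carry delay (unit-delay model) of a one-level skip adder."""
    worst = max(s - 1 for s in sizes)                  # ripple inside a single block
    for i, j in combinations(range(len(sizes)), 2):    # generated in block i, absorbed in block j
        worst = max(worst, (sizes[i] - 1) + (j - i - 1) + (sizes[j] - 1))
    return worst

cska = [4] * 8                      # uniform carry-skip adder, 32 bits
vba = [2, 3, 4, 5, 6, 5, 4, 3]      # hypothetical variable block sizes, 32 bits
assert sum(cska) == sum(vba) == 32
print(worst_carry_delay(cska), worst_carry_delay(vba))
```

Under this model the uniform split gives 12 and the variable split gives 9: small blocks at the ends shorten the ripple into and out of the chain, while larger middle blocks are mostly skipped.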
25
Carry-chain block size determination for a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
26
Delay Calculation for Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
Delay model:
27
Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
Variable Group Length (Oklobdzija, Barnes, Arith'85)
28
Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
With variable block lengths there is no closed-form solution for the delay; finding the optimal block sizes is a dynamic programming problem.
29
Delay Comparison: Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
30
Delay Comparison: Variable Block Adder
[Figure legend: VBA, Multi-Level CLA]
31
Fan-Out Dependency
32
Fan-In Dependency
33
Delay Comparison: Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
35
Carry-Lookahead Adder (Weinberger and Smith)
A. Weinberger and J. L. Smith, "A Logic for High-Speed Addition", National Bureau of Standards, Circ. 591, pp. 3-12, 1958.
36
Carry-Lookahead Adder (Weinberger and Smith)
37
Carry-Lookahead Adder
One gate delay to calculate p and g; one to calculate P and two for G.
Three gate delays to calculate C_{4(j+1)}. Compare that to 8 in an RCA!
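For a 4-bit group, the lookahead signals are $P = p_3 p_2 p_1 p_0$ and $G = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0$, and the group carry-out is $C_{4(j+1)} = G + P\,C_{4j}$. P is a single AND; G is an AND-OR, i.e. two gate levels. The sketch below (helper name hypothetical, with $p_i = a_i \oplus b_i$) checks the group carry against plain addition for every pair of 4-bit operands and both carry-in values:

```python
from itertools import product

def cla4_group(a: int, b: int) -> tuple[int, int]:
    """Group propagate P and group generate G of a 4-bit block."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(4)]
    P = p[3] & p[2] & p[1] & p[0]
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    return P, G

# The lookahead carry C_4(j+1) = G + P * C_4j must match the carry out of a + b + c.
for a, b, c in product(range(16), range(16), (0, 1)):
    P, G = cla4_group(a, b)
    assert (G | (P & c)) == (a + b + c) >> 4
```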
38
Carry-Lookahead Adder (Weinberger and Smith)
With two additional gate delays, C_16 takes a total of 5 gate delays vs. 32 for the RCA!
39
32-bit Carry-Lookahead Adder
40
Carry-Lookahead Adder (Weinberger and Smith: original derivation)
41
Carry-Lookahead Adder (Weinberger and Smith: original derivation)
42
Carry-Lookahead Adder (Weinberger and Smith)
Please notice the similarity with parallel-prefix adders!
43
Carry-Lookahead Adder (Weinberger and Smith)
Please notice the similarity with parallel-prefix adders!
44
Delay-Optimized CLA
B. Lee, V. G. Oklobdzija, Journal of VLSI Signal Processing, Vol. 3, No. 4, October 1991
45
Delay-Optimized CLA: Lee-Oklobdzija '91
(a.) fixed groups and levels
(b.) variable-sized groups, fixed levels
(c.) variable-sized groups and fixed levels
(d.) variable-sized groups and levels
46
Two-Levels of Logic Implementation of the Carry Block
47
Two-Levels of Logic Implementation of the Carry-Lookahead Block
48
Three-Levels of Logic Implementation of the Carry Block (restricted fan-in)
49
Three-Levels of Logic Implementation of the Carry Lookahead (restricted fan-in)
50
Delay-Optimized CLA: Lee-Oklobdzija '91
Delay: two-level BCLA
Delay: three-level BCLA
51
Delay-Optimized CLA: Lee-Oklobdzija '91
(a.) 2-level BCLA = 8.5 ns
(b.) 3-level BCLA = 8.9 ns
52
Motorola: CLA Implementation Example
A. Naini, D. Bearden and W. Anderson, "A 4.5nS 96b CMOS Adder Design", Proceedings of the IEEE Custom Integrated Circuits Conference, May 3-6, 1992.
53
Critical path in Motorola's 64-bit CLA
54
Motorola's 64-bit CLA: conventional PG block
55
Motorola's 64-bit CLA: modified PG block
Intermediate propagate signals P_{i:0} are generated to speed up C_3.
56
Ling's Adder
Huey Ling, "High-Speed Binary Adder", IBM Journal of Research and Development, Vol. 25, No. 3, 1981.
57
Ling Adder
Variation of CLA: Ling, IBM J. Res. Dev., 5/81
Ling's equations:
58
Ling Adder
Ling's equation (Doran, Trans. on Comp., 9/88): propagates information on two bits
59
Ling Adder
Conventional:
Ling:
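A common textbook statement of Ling's scheme (assumed here; the slide's exact notation may differ) uses transmit $t_i = a_i + b_i$ and generate $g_i = a_i b_i$. The pseudo-carry $h_{i+1} = g_i + c_i$ obeys $h_{i+1} = g_i + t_{i-1} h_i$, and the real carry is recovered as $c_{i+1} = t_i h_{i+1}$. For a 4-bit group, $H_4 = g_3 + g_2 + t_2 g_1 + t_2 t_1 g_0$ has one fewer literal per product term than $G = g_3 + t_3 g_2 + t_3 t_2 g_1 + t_3 t_2 t_1 g_0$. A bit-serial sketch of these equations (function name hypothetical), checked against ordinary addition:

```python
def ling_add(a: int, b: int, n: int) -> int:
    """n-bit addition (carry-in 0) computed through Ling's pseudo-carry h."""
    s, h, t_prev = 0, 0, 0           # h holds h_i, t_prev holds t_{i-1}; h_0 = t_{-1} = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g, t = ai & bi, ai | bi      # generate and transmit (OR-propagate)
        c = t_prev & h               # real carry c_i = t_{i-1} * h_i
        s |= ((ai ^ bi) ^ c) << i    # sum bit s_i
        h = g | (t_prev & h)         # h_{i+1} = g_i + t_{i-1} * h_i  (= g_i + c_i)
        t_prev = t
    return s | ((t_prev & h) << n)   # carry-out c_n = t_{n-1} * h_n
```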
60
S. Naffziger, ISSCC'96
71
Results: S. Naffziger, "A Subnanosecond 64-b Adder", ISSCC '96
0.5 µm technology
Speed: 0.930 ns (nominal process, 80 °C, V = 3.3 V)
72
Conditional-Sum Adder
J. Sklansky, "Conditional-Sum Addition Logic", IRE Transactions on Electronic Computers, EC-9, pp. 226-231, 1960.
73
Conditional-Sum Adder
74
Conditional-Sum Adder
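Conditional-sum addition computes, for every block, both possible results (carry-in 0 and carry-in 1) and merges neighbouring blocks with multiplexers, doubling the block width at each level, so n bits need about log2 n merge levels. A recursive sketch, assuming n is a power of two (the function name is illustrative):

```python
def cond_sum(a: int, b: int, n: int) -> dict[int, tuple[int, int]]:
    """Return {c_in: (sum, carry_out)} for an n-bit block; n must be a power of two."""
    if n == 1:
        # leaf: a one-bit full adder evaluated for both possible carry-ins
        return {c: ((a ^ b ^ c) & 1, (a & b | a & c | b & c) & 1) for c in (0, 1)}
    half = n // 2
    mask = (1 << half) - 1
    lo = cond_sum(a & mask, b & mask, half)    # lower half, both carry-ins
    hi = cond_sum(a >> half, b >> half, half)  # upper half, both carry-ins
    result = {}
    for c in (0, 1):
        lo_sum, lo_carry = lo[c]
        hi_sum, hi_carry = hi[lo_carry]        # lower carry selects one precomputed upper result
        result[c] = (hi_sum << half | lo_sum, hi_carry)
    return result
```

In hardware the two dictionary entries are two sets of wires, and `hi[lo_carry]` is a row of multiplexers.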
75
Carry-Select Adder
O. J. Bedrij, "Carry-Select Adder", IRE Transactions on Electronic Computers, June 1962, p.340-34
76
Carry-Select Adder
O. J. Bedrij, IBM Poughkeepsie, 1962
77
Carry-Select Adder
Addition is performed under the assumption of C_in = 0 and of C_in = 1.
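A carry-select sketch with fixed-size blocks (the block size is a free parameter of this hypothetical helper; Bedrij's own partitioning is not reproduced). Each block is added under both carry-in assumptions, so the real carry into a block only has to drive the selection:

```python
def carry_select_add(a: int, b: int, n: int, block: int) -> int:
    """n-bit carry-select addition (carry-in 0) using equal blocks of `block` bits."""
    assert n % block == 0, "this sketch assumes the block size divides n"
    mask = (1 << block) - 1
    result, carry = 0, 0
    for pos in range(0, n, block):
        x, y = (a >> pos) & mask, (b >> pos) & mask
        sum0, sum1 = x + y, x + y + 1        # both candidates, computed in parallel in hardware
        chosen = sum1 if carry else sum0     # the incoming carry drives the MUX
        result |= (chosen & mask) << pos
        carry = chosen >> block              # carry out of this block
    return result | (carry << n)
```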
78
Carry-Select Adder: combining two 32-bit VBAs in select mode
Delay = VBA32 + MUX
79
Addition Under Non-equal Signal Arrival Profile Assumption
P. Stelling, V. G. Oklobdzija, "Design Strategies for Optimal Hybrid Final Adders in a Parallel Multiplier", special issue on VLSI Arithmetic, Journal of VLSI Signal Processing, Kluwer Academic Publishers, Vol. 14, No. 3, December 1996
80
Signal Arrival Profile from the Parallel Multiplier Partial-Product Reduction Tree
81
Oklobdzija, Villeger, IEEE Transactions on VLSI Systems, June 1995
82
Oklobdzija and Villeger, IEEE Transactions on VLSI Systems, June 1995
91
Performing Multiply-Add Operation in the Multiply Time
P. Stelling, V. G. Oklobdzija, "Achieving Multiply-Accumulate Operation in the Multiply Time", Thirteenth International Symposium on Computer Arithmetic, Pacific Grove, California, July 5-9, 1997.
93
Final Adder: Implementation
94
Final Adder: Implementation
95
Final Adder: Implementation
96
Final Adder: Implementation
97
Recurrence Solver Based Adders
Kogge and Stone, IEEE Trans. on Computers, August 1973
Bilgory and Gajski, 18th DAC, 1981
Brent and Kung, IEEE Trans. on Computers, March 1982
98
Recurrence Solver Based Adders
– 1973: Kogge and Stone published a general recurrence scheme for parallel computation
– 1979: Brent and Kung published a technical report on a regular layout for parallel adders
– 1980: Guibas and Vuillemin developed a layout scheme based on the recurrence equation for addition
– 1980: Ladner and Fischer published "Parallel Prefix Computation", Journal of the ACM
– 1981: Bilgory and Gajski published a paper on recurrence structures for automatic cell generation
99
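Recurrence Solver Based Adders
They are based on the recurrence equations for P and G (what is new there since Weinberger?!):
$G_{i:k} = G_{i:j} + P_{i:j} G_{j-1:k}$ and $P_{i:k} = P_{i:j} P_{j-1:k}$
Or: $(G, P)_{i:k} = (G, P)_{i:j} \circ (G, P)_{j-1:k}$, where $(G, P) \circ (G', P') = (G + P G', P P')$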
100
Recurrence Solver Based Adders
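The P, G recurrence maps directly onto a parallel-prefix network built from the operator $(G, P) \circ (G', P') = (G + P G', P P')$. The sketch below uses the Kogge-Stone arrangement, chosen here only for brevity (Brent-Kung uses more levels but fewer cells); after ⌈log2 n⌉ levels every position i holds $G_{i:0}$, which is the carry into bit i+1. The function name is hypothetical.

```python
def kogge_stone_add(a: int, b: int, n: int) -> tuple[int, int]:
    """n-bit parallel-prefix addition (carry-in 0); returns (sum, number of prefix levels)."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    G, P = g[:], p[:]                 # after the loop: G[i] = G_{i:0}, P[i] = P_{i:0}
    d, levels = 1, 0
    while d < n:
        # each position combines with the position d places lower, all in parallel:
        # (G, P) o (G', P') = (G + P G', P P')
        G, P = ([G[i] | (P[i] & G[i - d]) if i >= d else G[i] for i in range(n)],
                [P[i] & P[i - d] if i >= d else P[i] for i in range(n)])
        d, levels = 2 * d, levels + 1
    carries = [0] + G                 # c_0 = 0 and c_{i+1} = G_{i:0}
    total = sum((p[i] ^ carries[i]) << i for i in range(n))
    return total | (carries[n] << n), levels
```

For 64 bits the loop runs six levels, one per doubling of the span d.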
101
Carry-Lookahead Adder (Weinberger and Smith)
Just to remind you! Please notice the similarity with parallel-prefix adders!
102
Multiplexer Based Adder
Farooqui and Oklobdzija, 1999 Int'l Sym. on VLSI Technology, Taipei, Taiwan, June 8-10, 1999
103
Multiplexer Based Adder
– Based on the realization that a MUX circuit is faster than a logic gate, due to its transmission-gate implementation.
– Based on the carry-lookahead method (W-S), or on the recurrence solver.
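One way to see why a multiplexer can replace the AND-OR prefix cell: with $p_i = a_i \oplus b_i$, a group's G and P are never both 1 (if every bit propagates, none generates), so $G + P\,G'$ equals a 2:1 MUX selected by P that outputs $G'$ when $P = 1$ and $G$ when $P = 0$. The sketch below checks this identity exhaustively on 8-bit operands split into two 4-bit groups; the split and the helper names are our choices, and this is not claimed to be the exact cell of the Farooqui-Oklobdzija design.

```python
from itertools import product

def group_gp(a: int, b: int, width: int) -> tuple[int, int]:
    """Group generate/propagate of a width-bit block, using XOR propagate."""
    G, P = 0, 1
    for i in range(width):                  # fold bits from LSB to MSB
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g, p = ai & bi, ai ^ bi
        G, P = g | (p & G), p & P
    return G, P

def mux(sel: int, in0: int, in1: int) -> int:
    return in1 if sel else in0

for a, b in product(range(256), repeat=2):
    G_hi, P_hi = group_gp(a >> 4, b >> 4, 4)
    G_lo, P_lo = group_gp(a & 15, b & 15, 4)
    assert not (G_hi and P_hi)                  # G and P are mutually exclusive
    and_or = G_hi | (P_hi & G_lo)
    assert and_or == mux(P_hi, G_hi, G_lo)      # the prefix operator as a 2:1 MUX
    assert and_or == (a + b) >> 8               # it is the carry out of all 8 bits
```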
104
Multiplexer Based Adder
A. A. Farooqui, V. G. Oklobdzija, F. Chechrazi, 1999 Int'l Sym. on VLSI Technology, Taipei, Taiwan, June 8-10, 1999.
107
Multiplexer Based Adder
Results in a very fast structure: 7 MUX delays for a 64-b adder.
Delay using standard cells (0.25 µm, 2.5 V, 25 °C):
Adder size (bits) | Delay (ps)
8                 | 625
16                | 665
32                | 710
64                | 903
108
DEC "Alpha" 21064 Adder
Combination:
– 8-bit tapered pre-discharged Manchester carry chains, with C_in = 0 and C_in = 1
– 32-bit LSB carry-lookahead adder
– 32-bit MSB conditional-sum adder
– Carry-select on the most significant 32 bits
– Latches in the middle: pipelined addition
109
DEC "Alpha" 21064 Adder
110
DEC "Alpha" 21064 Adder: Results
– The first 200 MHz processor
– Built using 0.75 µm technology
– V = 3.3 V, 30 W
– Pipelined (two latches), allowing 5 ns throughput and 10 ns latency
111
Conclusion: VLSI Implementation of Addition
112
Conclusion: VLSI Implementation of Addition
– Currently, implementation parameters are not reflected in the algorithms used for development.
– Layout and wire-delay effects are largely neglected, and this is becoming intolerable in the next generation of technology.
– Transistor sizing has a large effect, which can outweigh the choice of algorithm.
– There is a great disconnect between algorithm and implementation.
– New rules and measures of goodness are needed.
113
Multiplication: Parallel Multiplier Implementation
114
Multiplication Algorithm:
$p(j+1) = (p(j) + x_j Y 2^n)\,2^{-1}$ for $j = 0, \ldots, n-1$
initially $p(0) = 0$; $p(n) = XY$ after n steps
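The add-and-shift recurrence $p(j+1) = (p(j) + x_j Y 2^n)\,2^{-1}$ with $p(0) = 0$ can be run directly. In this sketch (assuming unsigned n-bit operands; the function name is hypothetical) the sum $p(j) + x_j Y 2^n$ is always even for $j < n$, so multiplying by $2^{-1}$ is an exact right shift and $p(n)$ is the full 2n-bit product:

```python
def shift_add_multiply(x: int, y: int, n: int) -> int:
    """Radix-2 add-and-shift multiplication of n-bit unsigned x and y."""
    p = 0                                  # p(0) = 0
    for j in range(n):
        x_j = (x >> j) & 1                 # multiplier bits are consumed LSB first
        acc = p + x_j * (y << n)           # p(j) + x_j * Y * 2^n
        assert acc % 2 == 0                # always even for j < n, so the shift is exact
        p = acc >> 1                       # multiply by 2^-1
    return p                               # p(n) = X * Y
```

A parallel multiplier performs all n of these additions at once in the partial-product reduction tree instead of one per step.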
115
Parallel Multipliers
116
4:2 Compressor
117
Re-designed 4:2 Compressor with 3-XOR Delay
[Figure: inputs I1, I2, I3, I4 and C_in; MUX data inputs 0/1; outputs S, C and C_out]
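A 4:2 compressor takes four bits of equal weight plus a carry-in from its neighbour and produces S (same weight) and two double-weight outputs C and C_out, with $I_1 + I_2 + I_3 + I_4 + C_{in} = S + 2(C + C_{out})$. Because C_out does not depend on C_in, no carry ripples along a row of compressors. The sketch builds one from two cascaded full adders; this is the textbook construction, not the re-designed circuit on the slide, and the names are illustrative.

```python
from itertools import product

def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Returns (sum, carry-out)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)

def compressor_4_2(i1: int, i2: int, i3: int, i4: int, c_in: int) -> tuple[int, int, int]:
    """4:2 compressor from two full adders; returns (S, C, C_out)."""
    t, c_out = full_adder(i1, i2, i3)   # C_out depends only on I1..I3, never on C_in
    s, c = full_adder(t, i4, c_in)
    return s, c, c_out

for bits in product((0, 1), repeat=5):
    s, c, c_out = compressor_4_2(*bits)
    assert sum(bits) == s + 2 * (c + c_out)   # weight is preserved
```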
118
A Method for Generation of Fast Parallel Multipliers
by Vojin G. Oklobdzija, David Villeger, Simon S. Liu
Electrical and Computer Engineering
University of California, Davis
120
Idea !!!!!
122
Three-Dimensional Optimization Method: TDM (Oklobdzija, Villeger, Liu, 1996)
125
Method
129
Computer Tools
130
Algorithm for Automatic Generation of the Partial Product Array
Initialize: form 2N-1 lists L_i (i = 0, ..., 2N-2), each consisting of p_i elements, where p_i = i+1 for i ≤ N-1 and p_i = 2N-1-i for i ≥ N.
An element of a list L_i (j = 0, ..., p_i-1) is a pair (n_j, δ_j), where:
  n_j is a unique node-identifying name;
  δ_j is the delay associated with that node, representing the delay of a signal arriving at node n_j with respect to some reference point.
For i = 0, 1 and 2N-2: connect the nodes from the corresponding lists L_i directly to the CPA.
131
For i = 2 to i = 2N-3 {Partial Product Array Generation}
Begin For
  If length of L_i is even Then
  Begin If
    sort the elements of L_i in ascending order by the values of delay δ_j
    connect an HA to the first 2 elements of L_i, starting with the slowest input:
      δ_S = max{δ_A + Δ_{A-S}, δ_B + Δ_{B-S}}
      δ_C = max{δ_A + Δ_{A-C}, δ_B + Δ_{B-C}}
    remove the 2 elements from L_i
    insert the pair (n_S, δ_S) into L_i
    insert the pair (n_C, δ_C) into L_{i+1}
    decrement the length of L_i
    increment the length of L_{i+1}
  End If;
132
  While length of L_i > 3
  Begin While
    sort the elements of L_i in ascending order by the values of delay δ_j
    connect an FA to the first 3 elements of L_i, starting with the slowest input of the FA:
      δ_S = max{δ_A + Δ_{A-S}, δ_B + Δ_{B-S}, δ_Ci + Δ_{Ci-S}}
      δ_C = max{δ_A + Δ_{A-C}, δ_B + Δ_{B-C}, δ_Ci + Δ_{Ci-C}}
    remove the 3 elements from L_i
    insert the pair (n_S, δ_S) into L_i
    insert the pair (n_C, δ_C) into L_{i+1}
    subtract 2 from the length of L_i
    increment the length of L_{i+1}
  End While;
  sort the elements of L_i
  connect an FA to the last 3 nodes of L_i
  connect S and C to bits i and i+1 of the CPA
End For;
End Method;
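The column-by-column procedure can be sketched in Python. Assumptions (ours, for illustration only): every partial-product bit arrives at time 0, the half- and full-adder pin-to-output delays are made-up constants, and each signal also carries its logic value so the sketch can confirm that the bits delivered to the CPA still sum to X·Y. Unlike the slide, the last column is reduced with the same rule instead of being wired straight to the CPA, which keeps small N well defined. All names are hypothetical.

```python
from itertools import product

# Hypothetical pin-to-output delays: (delay to S, delay to C) per input pin,
# listed slowest pin first, so the FA carry-in (last entry) is its fastest input.
FA_PINS = [(2.0, 1.5), (2.0, 1.5), (1.0, 1.0)]   # A, B, Ci
HA_PINS = [(1.0, 0.5), (1.0, 0.5)]               # A, B

def add_cell(pins, signals):
    """Attach the earliest signals to the slowest pins; return the (S, C) signals.

    A signal is a (delay, bit) pair; the bit is kept only to check the arithmetic.
    """
    signals = sorted(signals)
    d_s = max(sig[0] + pin[0] for sig, pin in zip(signals, pins))
    d_c = max(sig[0] + pin[1] for sig, pin in zip(signals, pins))
    total = sum(bit for _, bit in signals)
    return (d_s, total & 1), (d_c, total >> 1)

def tdm_reduce(n, x, y):
    """Reduce the n x n partial-product array of x*y; return the CPA inputs per bit."""
    width = 4 * n                                  # generous: carries never get this far
    cols = [[] for _ in range(width)]
    for j, k in product(range(n), repeat=2):
        cols[j + k].append((0.0, (x >> j) & (y >> k) & 1))   # all PPs arrive at t = 0
    cpa = [[] for _ in range(width + 1)]
    for i in range(width):
        col = sorted(cols[i])
        if i < 2 or len(col) <= 1:                 # columns 0, 1 and single signals go to the CPA
            cpa[i].extend(col)
            continue
        if len(col) % 2 == 0:                      # even length: one HA makes it odd
            s, c = add_cell(HA_PINS, col[:2])
            col = sorted(col[2:] + [s])
            cols[i + 1].append(c)
        while len(col) > 3:                        # FA on the three earliest signals
            s, c = add_cell(FA_PINS, col[:3])
            col = sorted(col[3:] + [s])
            cols[i + 1].append(c)
        if len(col) == 3:                          # last FA feeds CPA bits i and i+1
            s, c = add_cell(FA_PINS, col)
            cpa[i].append(s)
            cpa[i + 1].append(c)
        else:                                      # the HA left a single signal
            cpa[i].extend(col)
    return cpa
```

The delays stored in `cpa` form the per-bit signal arrival profile that a final adder optimized for non-equal arrival times would be designed around.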
137
Competing Approaches
138
Organization of Hitachi's DPL multiplier
139
Hitachi's 4:2 compressor structure
140
DPL multiplexer circuit
141
RECOMMENDATIONS
142
Conclusion
1. The key to improving multiplier speed was in optimizing the interconnections, not the compressor circuit (as had been believed for so long).
2. With the increase in wire delay, it is important to connect layout topology and algorithm for optimal interconnection of the PPRT.
3. Using one of the "fast adders" (CLA) as the final adder was actually counterproductive. A simple final adder, optimized for the signal arrival profile, yields better results with less hardware.
4. It is possible to further optimize the PPRT and the final adder so that the (fused) Multiply-Add operation can be performed in the multiply time.
5. For larger multipliers and adders (as used in cryptography), the optimization procedures described here yield even better results.
See: http://www.ece.ucdavis.edu/acsel/Publications.html
143
Read This!
1. E. Swartzlander, "Computer Arithmetic", Vols. 1 & 2, IEEE Computer Society Press, 1990.
2. K. Hwang, "Computer Arithmetic: Principles, Architecture and Design", John Wiley and Sons, 1979.
3. M. Ercegovac, "Digital Systems and Hardware/Firmware Algorithms", Chapter 12: Arithmetic Algorithms and Processors, John Wiley & Sons, 1985.
4. A. Chandrakasan, W. Bowhill, F. Fox, Editors, "Design of High-Performance Microprocessor Circuits", IEEE Press, July 2000.
5. V. G. Oklobdzija, "High-Performance System Design: Circuits and Logic", IEEE Press, July 1999.
Also: http://www.ece.ucdavis.edu/acsel/Publications.html
144
THE END