
Slide 1: Design of a Parallel-Prefix Adder Architecture with Efficient Timing-Area Tradeoff Characteristic
Sabyasachi Das, University of Colorado, Boulder
Sunil P. Khatri, Texas A&M University

Slide 2: What is an Adder?
- An IC block that performs the addition of two data signals (operands)
- Many well-known logic architectures exist
- Often part of other arithmetic components, such as sum-of-products units and multipliers
- Computationally intensive and can occupy a large area
- Widely used in almost all digital designs

Slide 3: Overview of an Adder
Adding two 8-bit operands $a_7 a_6 \dots a_0$ and $b_7 b_6 \dots b_0$ produces the 9-bit result $S_8 S_7 \dots S_0$, where $S_8$ is the final carry out.
For each bit $i = 0, \dots, n-1$ (with $\mathit{Carry}_0 = 0$ when there is no carry-in):
$$S_i = a_i \oplus b_i \oplus \mathit{Carry}_i$$
$$\mathit{Carry}_{i+1} = (a_i \wedge b_i) \vee (b_i \wedge \mathit{Carry}_i) \vee (\mathit{Carry}_i \wedge a_i)$$
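The two per-bit equations above already define a complete adder. As an illustrative sketch (plain Python, not taken from the talk; the function name `ripple_carry_add` is hypothetical), applying them from bit 0 upward gives a ripple-carry adder. Its carry chain is n steps long, which is exactly the delay that parallel-prefix adders try to shorten.

```python
def ripple_carry_add(a: int, b: int, n: int = 8) -> int:
    """Add two n-bit unsigned operands using the per-bit sum and carry equations."""
    carry = 0                                                # Carry_0: no carry-in
    result = 0
    for i in range(n):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        s_i = a_i ^ b_i ^ carry                              # S_i
        carry = (a_i & b_i) | (b_i & carry) | (carry & a_i)  # Carry_{i+1}
        result |= s_i << i
    return result | (carry << n)                             # S_n = Carry_n


print(ripple_carry_add(0b10110101, 0b01101110))  # 181 + 110 -> 291
```

Each iteration needs the carry from the previous one, so the loop models a serial dependency chain rather than a tree.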

Slide 4: Introduction to Parallel-Prefix Adders
- A family of fast adders
- Computes $\mathit{Carry}_i$ for each bit $i$ in a tree structure
- Several different flavors are available
- Brent-Kung and Kogge-Stone are very popular

Slide 5: Generate and Propagate for a Bit
- For each bit $i$ of the adder, Generate ($G_i$) indicates whether a carry is generated at that bit:
  $$G_i = a_i \wedge b_i$$
- For each bit $i$ of the adder, Propagate ($P_i$) indicates whether an incoming carry is propagated through that bit:
  $$P_i = a_i \oplus b_i$$
- The Generate and Propagate concept extends to blocks comprising multiple bits

Slide 6: Generate and Propagate for Blocks
If two adjacent blocks (each comprising one or more bits) have the GP value pairs $(G_{left}, P_{left})$ and $(G_{right}, P_{right})$, where the left block is the more significant one, then the combined block has the GP values:
$$G_{left,right} = G_{left} \vee (P_{left} \wedge G_{right})$$
$$P_{left,right} = P_{left} \wedge P_{right}$$
This operation is performed by a carry-operator, or o-operator.
[Figure: an o-operator cell taking $(G_{left}, P_{left})$ and $(G_{right}, P_{right})$ as inputs and producing $(G_{left,right}, P_{left,right})$]
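A minimal Python model of the per-bit GP values and the o-operator, assuming the convention above that the left operand is the more significant block. The names `gp_bit`, `carry_op` and `block_gp` are hypothetical. The exhaustive check shows that the operator is associative (which is what lets a prefix tree regroup blocks freely) but not commutative, and the final call shows that folding the GP pairs of a whole block yields that block's carry out when there is no carry-in.

```python
from itertools import product


def gp_bit(a_i: int, b_i: int) -> tuple:
    """Per-bit Generate and Propagate: G_i = a_i AND b_i, P_i = a_i XOR b_i."""
    return a_i & b_i, a_i ^ b_i


def carry_op(left: tuple, right: tuple) -> tuple:
    """o-operator: left is the more significant block, right the less significant one."""
    g_l, p_l = left
    g_r, p_r = right
    return g_l | (p_l & g_r), p_l & p_r


def block_gp(a: int, b: int, n: int) -> tuple:
    """Fold the GP pairs of bits n-1..0 into the GP pair of the whole block."""
    acc = gp_bit(a & 1, b & 1)
    for i in range(1, n):
        acc = carry_op(gp_bit((a >> i) & 1, (b >> i) & 1), acc)
    return acc


# Associative (a prefix tree may regroup blocks) but not commutative.
pairs = list(product((0, 1), repeat=2))
assert all(carry_op(x, carry_op(y, z)) == carry_op(carry_op(x, y), z)
           for x in pairs for y in pairs for z in pairs)
assert carry_op((1, 0), (0, 0)) != carry_op((0, 0), (1, 0))

# With no carry-in, the block's G is its carry out: 0b1011 + 0b0110 = 0b10001.
print(block_gp(0b1011, 0b0110, 4))  # (1, 0)
```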

Slide 7: Kogge-Stone (KS) Adder
- Parallel-prefix, fast architecture: $\log_2 n$ levels
- Requires a large area: $n \log_2 n - n + 1$ cells
[Figure: 8-bit Kogge-Stone prefix tree with inputs $GP_0$ to $GP_7$ and carry outputs $C_1$ to $C_8$]
Kogge and Stone, "A parallel algorithm for the efficient solution of a general class of recurrence equations", IEEE Transactions on Computers, 1973
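The sketch below models a Kogge-Stone prefix tree in Python as an illustration, not a circuit description; `carry_op`, `gp_vector` and `kogge_stone` are hypothetical names. At each level every bit combines with the block $d$ positions below it, with $d$ doubling from level to level, and the function counts cells and levels so they can be compared with the formulas above.

```python
import math


def carry_op(left, right):
    """o-operator on (G, P) pairs; left is the more significant block."""
    g_l, p_l = left
    g_r, p_r = right
    return g_l | (p_l & g_r), p_l & p_r


def gp_vector(a, b, n):
    """[(G_i, P_i) for i = 0..n-1], index 0 = LSB."""
    return [((a >> i) & (b >> i) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]


def kogge_stone(gp):
    """Return (prefixes, cells, levels); prefixes[i] = (G_{i,0}, P_{i,0})."""
    x = list(gp)
    n = len(x)
    cells = levels = 0
    d = 1
    while d < n:
        # Every bit i >= d combines with the block d positions below it.
        # The comprehension reads only the previous level's values.
        x = [carry_op(x[i], x[i - d]) if i >= d else x[i] for i in range(n)]
        cells += n - d
        levels += 1
        d *= 2
    return x, cells, levels


n = 8
_, cells, levels = kogge_stone(gp_vector(0, 0, n))
print(cells, levels, n * int(math.log2(n)) - n + 1)  # 17 3 17
```

For n = 8 the model uses 17 cells in 3 levels, matching $n \log_2 n - n + 1$ and $\log_2 n$. The large cell count comes from every bit doing work at every level.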

Slide 8: Brent-Kung (BK) Adder
- Parallel-prefix architecture: $2 \log_2 n - 2$ levels
- Optimized for area: $2n - 2 - \log_2 n$ cells
[Figure: 8-bit Brent-Kung prefix tree with inputs $GP_0$ to $GP_7$ and carry outputs $C_1$ to $C_8$]
Brent and Kung, "A regular layout for parallel adders", IEEE Transactions on Computers, 1982
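A comparable Python sketch of a Brent-Kung tree, with hypothetical names and n assumed to be a power of two. An up-sweep builds spans of 2, 4, 8, ... bits, and a down-sweep fills in the remaining prefixes. Depth is measured here as the longest chain of o-cells when every cell fires as soon as its inputs are ready. Under that assumption, the first down-sweep cell runs in parallel with the last up-sweep cell, which is how the count matches $2 \log_2 n - 2$.

```python
import math


def carry_op(left, right):
    """o-operator on (G, P) pairs; left is the more significant block."""
    g_l, p_l = left
    g_r, p_r = right
    return g_l | (p_l & g_r), p_l & p_r


def gp_vector(a, b, n):
    """[(G_i, P_i) for i = 0..n-1], index 0 = LSB."""
    return [((a >> i) & (b >> i) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]


def brent_kung(gp):
    """n must be a power of two. Return (prefixes, cells, depth)."""
    n = len(gp)
    x = list(gp)
    depth = [0] * n          # o-cell depth at which x[i] becomes available
    cells = 0

    def cell(i, j):
        nonlocal cells
        x[i] = carry_op(x[i], x[j])             # bit i absorbs the block ending at bit j
        depth[i] = max(depth[i], depth[j]) + 1
        cells += 1

    d = 1
    while d < n:                                # up-sweep: spans 2, 4, 8, ...
        for i in range(2 * d - 1, n, 2 * d):
            cell(i, i - d)
        d *= 2
    d = n // 4
    while d >= 1:                               # down-sweep: fill in the remaining bits
        for i in range(3 * d - 1, n, 2 * d):
            cell(i, i - d)
        d //= 2
    return x, cells, max(depth)


n = 8
_, cells, depth = brent_kung(gp_vector(0, 0, n))
print(cells, depth)                                               # 11 4
print(2 * n - 2 - int(math.log2(n)), 2 * int(math.log2(n)) - 2)   # 11 4
```

Compared with the Kogge-Stone sketch, roughly half as many cells are used in exchange for the extra down-sweep levels.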

Slide 9: Our Proposed Approach
- Use 2-input XOR and AND gates to compute the $G_i$ and $P_i$ values
- Use the triple-carry operator in the parallel-prefix tree to compute the $\mathit{Carry}_i$ values
- Use $P_i$ and $\mathit{Carry}_i$ to compute the final $\mathit{Sum}_i$ values
[Figure: block diagram in which the two inputs feed a G and P generator (for each bit), followed by a parallel-prefix tree using the triple-carry operator, followed by the computation of the final Sum values, which produces the outputs]

Slide 10: Generate and Propagate for a Bit
- In our approach, we use the traditional way of computing Generate ($G_i$) and Propagate ($P_i$) for each bit:
  $$G_i = a_i \wedge b_i$$
  $$P_i = a_i \oplus b_i$$
- If $G_i = 1$, bit $i$ generates a $\mathit{Carry}_{i+1}$ equal to 1'b1 (logic 1)
- If $P_i = 1$, $\mathit{Carry}_i$ is passed on to $\mathit{Carry}_{i+1}$
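The two rules above amount to the recurrence $\mathit{Carry}_{i+1} = G_i \vee (P_i \wedge \mathit{Carry}_i)$. As a small sanity check (a sketch with a hypothetical function name), the recurrence agrees with the majority form of the carry from Slide 3 on all eight input combinations. It also confirms that, with an XOR propagate, $G_i$ and $P_i$ are never both 1.

```python
from itertools import product


def gp_recurrence_holds() -> bool:
    """Check Carry_{i+1} = G_i OR (P_i AND Carry_i) against the majority form of the carry."""
    for a_i, b_i, carry_i in product((0, 1), repeat=3):
        g_i = a_i & b_i            # bit i generates a carry by itself
        p_i = a_i ^ b_i            # bit i passes Carry_i through
        via_gp = g_i | (p_i & carry_i)
        majority = (a_i & b_i) | (b_i & carry_i) | (carry_i & a_i)
        if via_gp != majority:
            return False
        if g_i and p_i:            # with an XOR propagate, G_i and P_i are mutually exclusive
            return False
    return True


print(gp_recurrence_holds())  # True
```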

Slide 11: Triple-Carry Operator
If three adjacent blocks (or bits) have the GP value pairs $(G_{left}, P_{left})$, $(G_{mid}, P_{mid})$ and $(G_{right}, P_{right})$, then the combined block generates a carry only if:
- the left block generates a carry, OR
- the middle block generates a carry and the left block propagates it, OR
- the right block generates a carry and both the middle and left blocks propagate it.
The combined block propagates only if each of the three blocks propagates the incoming carry.

Slide 12: Triple-Carry Operator
If three blocks (each consisting of one or more bits) have the GP value pairs $(G_{left}, P_{left})$, $(G_{mid}, P_{mid})$ and $(G_{right}, P_{right})$, then the combined block has the GP values:
$$G_{left,right} = G_{left} \vee (P_{left} \wedge G_{mid}) \vee (P_{left} \wedge P_{mid} \wedge G_{right})$$
$$P_{left,right} = P_{left} \wedge P_{mid} \wedge P_{right}$$
This operation is performed by a triple-carry operator, or o3-operator.
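The o3 equations can be checked against two chained o-operators. The Python sketch below uses hypothetical names and the convention that the left argument is the most significant block. It verifies exhaustively that one o3 cell produces the same GP pair as $o(left, o(mid, right))$, so replacing two o levels with one o3 level does not change the result.

```python
from itertools import product


def carry_op(left, right):
    """o-operator: (G, P) of left (more significant) combined with right."""
    g_l, p_l = left
    g_r, p_r = right
    return g_l | (p_l & g_r), p_l & p_r


def triple_carry_op(left, mid, right):
    """o3-operator: left is the most significant of the three blocks."""
    g_l, p_l = left
    g_m, p_m = mid
    g_r, p_r = right
    g = g_l | (p_l & g_m) | (p_l & p_m & g_r)
    p = p_l & p_m & p_r
    return g, p


# One o3 cell computes the same GP pair as a chain of two o cells.
pairs = list(product((0, 1), repeat=2))
assert all(triple_carry_op(x, y, z) == carry_op(x, carry_op(y, z))
           for x in pairs for y in pairs for z in pairs)

print(triple_carry_op((0, 1), (1, 0), (0, 0)))  # (1, 0): mid generates, left propagates
```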

Slide 13: Triple-Carry Operator
- Typically, the delay of a triple-carry operator is about 110% to 130% of the delay of a traditional carry-operator.
- Typically, the area of a triple-carry operator is about 150% to 180% of the area of a traditional carry-operator.
[Figure: an o3-operator cell taking $(G_{left}, P_{left})$, $(G_{mid}, P_{mid})$ and $(G_{right}, P_{right})$ as inputs and producing $(G_{left,right}, P_{left,right})$]

Slide 14: Proposed Parallel-Prefix Network
- In the 1st (topmost) level of the parallel-prefix tree, we use the maximum number of triple-carry operators to combine groups of three: $GP_{3k}$, $GP_{3k+1}$ and $GP_{3k+2}$ ($k$ starts from zero).
- In the quadrant closest to the LSB, we use the traditional carry-operator exclusively.
- In the quadrant closest to the MSB, we use our proposed triple-carry operator extensively.
- In the middle two quadrants, we use both the carry-operator and the triple-carry operator in a timing-driven fashion.
- We restrict the fanout of each operator to 5.
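Only the first level is fully specified on this slide, so the sketch below models just that level. `first_level` and `gp_vector` are hypothetical names. One o3 cell is placed per complete group $GP_{3k+2}, GP_{3k+1}, GP_{3k}$, with $GP_{3k+2}$ as the most significant member. Leftover bits ($n \bmod 3$) are passed through untouched purely as an assumption, since the slide does not say how they are handled. Each group's G output equals the carry out of adding that 3-bit slice alone.

```python
def triple_carry_op(left, mid, right):
    """o3-operator on (G, P) pairs; left is the most significant block."""
    g_l, p_l = left
    g_m, p_m = mid
    g_r, p_r = right
    return g_l | (p_l & g_m) | (p_l & p_m & g_r), p_l & p_m & p_r


def gp_vector(a, b, n):
    """[(G_i, P_i) for i = 0..n-1], index 0 = LSB."""
    return [((a >> i) & (b >> i) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]


def first_level(gp):
    """Hypothetical model of the topmost level: one o3 cell per complete group of
    three bits. Leftover bits are returned untouched (an assumption)."""
    n_groups = len(gp) // 3
    groups = [triple_carry_op(gp[3 * k + 2], gp[3 * k + 1], gp[3 * k])
              for k in range(n_groups)]
    return groups, gp[3 * n_groups:]


groups, leftover = first_level(gp_vector(0xABCDEF, 0x123456, 24))
print(len(groups), len(leftover))  # 8 0: a 24-bit adder uses 8 o3 cells at level 1
```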

Slide 15: Proposed Parallel-Prefix Network
- The critical path primarily goes through the bits near the MSB.
  - We instantiate more triple-carry operators along the critical path and in the bits near the MSB.
  - This reduces the depth of the parallel-prefix computation tree along the critical path.
  - The delay of an o3-operator is about 110% to 130% of that of an o-operator, which is less than the 200% of the two o-operator levels it replaces.
- Bits near the LSB are typically less critical and have less depth.
  - We instantiate more traditional carry-operators in the bits near the LSB.
  - This saves area in the parallel-prefix computation tree.
  - The area of an o3-operator is about 150% to 180% of the area of an o-operator.
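To see why o3 cells shorten the tree, the sketch below builds a hypothetical Kogge-Stone-style network that uses o3 cells wherever three blocks are available. This is not the paper's network, which mixes operators by quadrant and limits fanout. The span triples at each level, so the tree has $\lceil \log_3 n \rceil$ levels instead of $\log_2 n$. The closing delay estimate is a crude model that assumes every o3 level costs 1.2 o-delays (inside the 110% to 130% range above) and ignores wiring and fanout.

```python
import math


def carry_op(left, right):
    g_l, p_l = left
    g_r, p_r = right
    return g_l | (p_l & g_r), p_l & p_r


def triple_carry_op(left, mid, right):
    g_l, p_l = left
    g_m, p_m = mid
    g_r, p_r = right
    return g_l | (p_l & g_m) | (p_l & p_m & g_r), p_l & p_m & p_r


def radix3_prefix(gp):
    """Hypothetical tree built mostly from o3 cells (NOT the paper's network).
    Returns (prefixes, levels, o_cells, o3_cells)."""
    x = list(gp)
    n = len(x)
    levels = o_cells = o3_cells = 0
    span = 1                      # x[i] currently covers bits max(0, i-span+1)..i
    while span < n:
        nxt = []
        for i in range(n):
            if i >= 2 * span:     # three blocks available: one o3 cell
                nxt.append(triple_carry_op(x[i], x[i - span], x[i - 2 * span]))
                o3_cells += 1
            elif i >= span:       # only two blocks left: an ordinary o cell suffices
                nxt.append(carry_op(x[i], x[i - span]))
                o_cells += 1
            else:                 # x[i] already covers bits i..0
                nxt.append(x[i])
        x = nxt
        levels += 1
        span *= 3
    return x, levels, o_cells, o3_cells


def prefix_carries(a, b, n):
    """Carry_1..Carry_n of a + b, read off as G_{i,0} from the tree."""
    gp = [((a >> i) & (b >> i) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]
    return [g for g, _ in radix3_prefix(gp)[0]]


n = 64
levels3 = radix3_prefix([(0, 0)] * n)[1]
levels2 = int(math.log2(n))
# Crude model: an o level costs 1.0, a level containing o3 cells costs 1.2.
print(levels2, levels3, levels2 * 1.0, round(levels3 * 1.2, 2))  # 6 4 6.0 4.8
```

The cell counts it returns (40 o cells and 176 o3 cells for n = 64) can be weighed against the 150% to 180% area figure, which illustrates why the slides reserve o3 cells for the timing-critical bits near the MSB.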

Slide 16: Proposed Parallel-Prefix Network
For an example of a 24-bit adder, please refer to the paper.
[Figure: proposed parallel-prefix network for a 16-bit adder, with inputs $GP_0$ to $GP_{15}$ and carry outputs $C_1$ to $C_{16}$]

Slide 17: Computation of Final Sum Values
At the output of the parallel-prefix computation tree, the group values $G_{i,0}$ and $P_{i,0}$ (covering bits $i$ down to 0) are produced for each bit $i$. By definition, if $G_{i,0}$ equals 1'b1 (logic 1), then a carry is fed into bit $i+1$. Hence,
$$\mathit{Carry}_{i+1} = G_{i,0}$$
and $\mathit{Sum}_{i+1}$ is computed as
$$\mathit{Sum}_{i+1} = P_{i+1} \oplus \mathit{Carry}_{i+1} = P_{i+1} \oplus G_{i,0}$$
with $\mathit{Sum}_0 = P_0$ when there is no carry-in.
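An end-to-end Python sketch of this last stage, with hypothetical names. The prefix tree is replaced by a plain serial scan that produces the same $(G_{i,0}, P_{i,0})$ values any correct prefix tree would. Only the sum equations are the point here: $\mathit{Sum}_0 = P_0$, $\mathit{Sum}_{i+1} = P_{i+1} \oplus G_{i,0}$, and the carry out $G_{n-1,0}$ becomes the top result bit.

```python
def carry_op(left, right):
    """o-operator on (G, P) pairs; left is the more significant block."""
    g_l, p_l = left
    g_r, p_r = right
    return g_l | (p_l & g_r), p_l & p_r


def prefix_add(a: int, b: int, n: int) -> int:
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]      # G_i
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]    # P_i
    # Stand-in for the prefix tree: a serial scan giving (G_{i,0}, P_{i,0}).
    # Any correct prefix tree (KS, BK, or the proposed one) yields the same values.
    prefix = [(g[0], p[0])]
    for i in range(1, n):
        prefix.append(carry_op((g[i], p[i]), prefix[-1]))
    total = p[0]                                         # Sum_0 = P_0 (no carry-in)
    for i in range(n - 1):
        total |= (p[i + 1] ^ prefix[i][0]) << (i + 1)    # Sum_{i+1} = P_{i+1} XOR G_{i,0}
    return total | (prefix[n - 1][0] << n)               # Sum_n = Carry_n = G_{n-1,0}


print(prefix_add(181, 110, 8))  # 291
```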

Slide 18: Delay Results
On average, our approach produces an adder that is about 23% faster than the BK adder and about 0.5% faster than the KS adder.

Slide 19: Area Results
On average, our approach produces an adder that is about 9% larger than the BK adder and about 30% smaller than the KS adder.

Slide 20: Summary
- The triple-carry operator combines the GP values of 3 blocks.
- Use the triple-carry operator in the parallel-prefix computation tree to reduce the delay of the critical path.
- Use the traditional carry-operator on non-timing-critical paths to reduce the overall area.
- Our approach is 0.5% faster than KS and 23% faster than BK.
- Our approach is 29% smaller than KS and 9% larger than BK.

Slide 21: Thank you

