# The Art of Digital Design and Fast Adder Circuits

Lecture Notes #4. Shantanu Dutt, Electrical & Computer Eng., University of Illinois at Chicago.



## Outline

- Different dependency aspects in divide-and-conquer (D&C)
- Techniques for tackling dependency aspects in D&C
- Application to adder designs: ripple-carry, tree-based carry-lookahead, carry-select

## Dependency Aspects in D&C

[Figure: D&C tree in which root problem A splits into subproblems A1 and A2, which split further into A1,1, A1,2, A2,1 and A2,2; the solutions to A1 and A2 are stitched up to form the complete solution to A. Legend: D&C tree arcs and data flow arcs.]

Q: Is there a data dependency between A1 and A2, i.e., does the solution of A2 depend on some o/p generated by A1, or vice versa?

If there is no dependency, then A1 and A2 can be solved independently, and some stitch-up logic is used to combine the o/ps of A1 and A2 to obtain the o/p of A. Example design problems are n-bit comparison and sorting of n numbers.

If there is a dependency between A1 and A2, there are a few strategies that can be used to design such circuits. Note that stitch-up logic may still be needed when a design problem is partitioned by D&C with a dependency.
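As a minimal sketch of the no-dependency case, the n-bit comparison mentioned above can be split into two halves that are solved independently and then stitched up. The names `compare_bits` and `to_bits`, the MSB-first bit-list representation and the -1/0/1 result encoding are assumptions made for illustration, not part of the notes.

```python
def compare_bits(a, b):
    """Compare two equal-length, MSB-first bit lists.

    Returns 1 if a > b, -1 if a < b, and 0 if they are equal.
    (Representation and return encoding are illustrative assumptions.)
    """
    n = len(a)
    if n == 1:
        # Base case: 1-bit comparison.
        return (a[0] > b[0]) - (a[0] < b[0])
    mid = n // 2
    # A1 and A2 have no data dependency, so hardware could evaluate them concurrently.
    hi = compare_bits(a[:mid], b[:mid])   # subproblem A1: M.S. half
    lo = compare_bits(a[mid:], b[mid:])   # subproblem A2: L.S. half
    # Stitch-up: the M.S. half decides the result unless it is a tie.
    return hi if hi != 0 else lo


def to_bits(x, n):
    """n-bit, MSB-first bit list of a non-negative integer x."""
    return [(x >> i) & 1 for i in reversed(range(n))]


print(compare_bits(to_bits(5, 4), to_bits(3, 4)))  # prints 1, since 5 > 3
```

The stitch-up here is a 2-way selection on the result of the M.S. half, so the recursion depth, and hence the delay of a corresponding circuit, grows with log n rather than n.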

## Dependency Aspects in D&C: The Wait Strategy

Strategy 1: Wait for the required o/p of A1 and then perform A2, e.g., as in a ripple-carry adder: A = n-bit addition, A1 = (n/2)-bit addition of the L.S. n/2 bits, A2 = (n/2)-bit addition of the M.S. n/2 bits.

There is no concurrency between A1 and A2:

t(A) = t(A1) + t(A2) + t(stitch-up) = 2*t(A1) + t(stitch-up)

where the second equality holds if A1 and A2 are the same problem of the same size (w/ different i/ps).

[Figure: root problem A with subproblems A1 and A2; a data flow arc runs from A1 to A2.]
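The wait strategy can be sketched as a recursive adder in which the M.S. half cannot start until the L.S. half has produced its carry. This is only a behavioural model under assumed names (`add_wait`, integer operands holding n-bit values); it shows the order of dependency, not gate-level timing.

```python
def add_wait(a, b, n, cin=0):
    """Add two n-bit integers by D&C with the wait strategy.

    Returns (n-bit sum, carry-out). Names and interface are illustrative.
    """
    if n == 1:
        # Base case: a 1-bit full adder.
        total = a + b + cin
        return total & 1, total >> 1
    half = n // 2
    mask = (1 << half) - 1
    # A1: L.S. half; its carry-out is the 1-bit i/p that A2 depends on.
    lo_sum, c_mid = add_wait(a & mask, b & mask, half, cin)
    # A2: M.S. half; it must wait for c_mid, so there is no concurrency.
    hi_sum, cout = add_wait(a >> half, b >> half, n - half, c_mid)
    # Stitch-up: concatenate the two partial sums.
    return (hi_sum << half) | lo_sum, cout


print(add_wait(11, 6, 4))  # prints (1, 1): 11 + 6 = 17 = 16 + 1
```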

## Dependency Aspects in D&C: The “Design-for-all-cases and Select” Strategy

Strategy 2: For a k-bit i/p from A1 to A2, design 2**k copies of A2, each with a different hardwired k-bit i/p replacing the one from A1. Select the correct o/p from among all the copies of A2 via a (2**k)-to-1 Mux whose select i/p is the k-bit o/p of A1, once it becomes available. E.g., carry-select adder.

t(A) = max(t(A1), t(A2)) + t(Mux) + t(stitch-up) = t(A1) + t(Mux) + t(stitch-up)

where the second equality holds if A1 and A2 are the same problem.

[Figure: root problem A with subproblem A1 and four copies of A2 with hardwired i/ps 00, 01, 10 and 11 feeding a 4-to-1 Mux; the o/p of A1 drives the Mux select i/p.]

Other variations, the “Predict Strategy”: Have a single copy of A2, choose a highly likely value of the k-bit i/p, and perform A1 and A2 concurrently. If the choice turns out to be incorrect once the k-bit i/p from A1 is available, redo A2 w/ the correct value. On average,

t(A) = p(correct-choice)*max(t(A1), t(A2)) + [1 - p(correct-choice)]*[t(A1) + t(A2)] + t(Mux) + t(stitch-up)

where p(correct-choice) is the probability that our choice of the k-bit i/p for A2 is correct. An incorrect choice costs t(A1) + t(A2) because A2 is redone only after A1 finishes.

A completion signal is needed to indicate when the final o/p of A is available; assuming the worst-case time (when the choice is incorrect) is meaningless in such designs.
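A minimal behavioural sketch of Strategy 2 for the carry-select adder follows. Here k = 1 (the carry from the L.S. half), so there are 2 copies of the M.S. half and a 2-to-1 Mux. The names `block_add` and `carry_select_add` are assumptions, and `block_add` uses Python integer addition as a stand-in for whatever adder block is actually used.

```python
def block_add(a, b, n, cin):
    """Stand-in for an n-bit adder block: returns (n-bit sum, carry-out)."""
    total = a + b + cin
    return total & ((1 << n) - 1), total >> n


def carry_select_add(a, b, n, cin=0):
    """n-bit addition (n >= 2) by 'design-for-all-cases and select'."""
    half = n // 2
    mask = (1 << half) - 1
    # A1: L.S. half, computed with the real carry-in.
    lo_sum, c_mid = block_add(a & mask, b & mask, half, cin)
    # 2**k = 2 copies of A2 with hardwired carry-in 0 and 1; in hardware
    # these run concurrently with A1 because they do not depend on c_mid.
    hi_copies = [block_add(a >> half, b >> half, n - half, c) for c in (0, 1)]
    # 2-to-1 Mux: the carry-out of A1 selects the correct copy.
    hi_sum, cout = hi_copies[c_mid]
    # Stitch-up: concatenate the two partial sums.
    return (hi_sum << half) | lo_sum, cout


print(carry_select_add(11, 6, 4))  # prints (1, 1): 11 + 6 = 17 = 16 + 1
```

The cost of the extra concurrency is hardware: the number of A2 copies doubles with every extra bit of dependency, which is why the strategy suits a 1-bit carry well.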

## Dependency Aspects in D&C: The “Lookahead” Strategy

Strategy 3: Redo the design of A2 so that it does as much processing as possible that is independent of the i/p from A1 (A2_indep = A2_lookahd). This is the “lookahead” computation that prepares for the final computation of A2 (A2_dep), which can start once A2_indep and A1 are done.

t(A) = max(t(A1), t(A2_indep)) + t(A2_dep) + t(stitch-up)

E.g., carry-lookahead adder: it does lookahead computation; the lookahead computation is also associative, so it is doable in O(log n) time. The overall computation is also doable in O(log n) time.

A less structured example: Let a1 be the i/p from A1 to A2, and let A2 have the logic a2 = v'x' + uvx + w'xy + wz'a1 + u'xa1. If this is implemented using 2-i/p AND/OR gates, the delay will be 8 delay units (1 unit = delay for 1 i/p) after a1 is available. If the logic is restructured as a2 = (v'x' + uvx + w'xy) + (wz' + u'x)a1, and the logic in the two brackets is evaluated before a1 is available (it constitutes A2_indep), then the delay is only 4 delay units after a1 is available.

[Figure: concept diagram of root problem A, in which subproblem A1 feeds A2_dep while A2_indep (A2_lookahd) is computed independently of A1. Example of an unstructured logic for A2: the original circuit, with a critical path of 8 delay units after a1 is available, and the restructured circuit split into A2_indep and A2_dep, with a critical path of 4 delay units after a1 is available.]
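The restructuring in the less structured example changes only the timing, not the function. A quick exhaustive check over all 2**7 input combinations illustrates this; the function names are hypothetical, and complements are modelled as `1 - x` on 0/1 integers.

```python
from itertools import product


def a2_original(u, v, w, x, y, z, a1):
    """a2 = v'x' + uvx + w'xy + wz'a1 + u'xa1, as a flat sum of products."""
    nu, nv, nw, nx, nz = 1 - u, 1 - v, 1 - w, 1 - x, 1 - z
    return (nv & nx) | (u & v & x) | (nw & x & y) | (w & nz & a1) | (nu & x & a1)


def a2_restructured(u, v, w, x, y, z, a1):
    """a2 = (v'x' + uvx + w'xy) + (wz' + u'x)a1."""
    nu, nv, nw, nx, nz = 1 - u, 1 - v, 1 - w, 1 - x, 1 - z
    # A2_indep: both bracketed terms, computable before a1 arrives.
    indep_sum = (nv & nx) | (u & v & x) | (nw & x & y)
    indep_factor = (w & nz) | (nu & x)
    # A2_dep: the only logic left on the path from a1 to a2.
    return indep_sum | (indep_factor & a1)


equivalent = all(
    a2_original(*bits) == a2_restructured(*bits)
    for bits in product((0, 1), repeat=7)
)
print(equivalent)  # prints True
```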

## Tree CLA Adders

First, can we generate multi-bit P, G signals from single-bit ones? Second, can we generate them fast, say, in O(log n) time using a tree-structured circuit? The answer is “Yes” to both Qs. For the 2nd Q, the answer is “Yes” because the P, G operations are associative!

Concept of the propagate Pk for k bits: Pk is 1 when the carry into the least-significant of the k bits is passed through to become the carry-out of the most-significant of the k bits. In terms of the 1-bit p_i's, this happens if and only if all k bits are in “propagate mode”, i.e., p_i = 1 for all i, 0 <= i <= k-1. Thus Pk = p_{k-1} p_{k-2} ... p_0. Since “and” is associative, the propagate is an associative operation and can thus be generated using a tree circuit in O(log n) time.
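Because AND is associative, the k single-bit propagates can be grouped as a balanced tree. The sketch below, with the assumed name `group_propagate`, returns both Pk and the tree depth, which models the number of 2-i/p AND levels.

```python
def group_propagate(p):
    """AND-reduce a non-empty list of 1-bit propagates as a balanced tree.

    Returns (Pk, depth), where depth is the number of 2-i/p AND levels.
    """
    if len(p) == 1:
        return p[0], 0
    mid = len(p) // 2
    # The two halves are independent, so hardware evaluates them concurrently.
    left, d_left = group_propagate(p[:mid])
    right, d_right = group_propagate(p[mid:])
    return left & right, 1 + max(d_left, d_right)


print(group_propagate([1] * 8))       # prints (1, 3): 8 bits need 3 levels
print(group_propagate([1, 1, 0, 1]))  # prints (0, 2): one bit is not propagating
```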

## Tree CLA Adders (contd)

Concept of the generate Gk for k bits: Gk is 1 when the carry-out of the k bits is 1 irrespective of the carry-in to the k bits.

For k=2, this happens whenever g1=1 or (g0=1 and p1=1):

G2 = g1 + p1g0

Now consider k=3. Conceptually speaking, G3=1 iff g2=1 or (G2(bits 1-0)=1 and p2=1). This operates on the 1-bit g and 1-bit p for bit 2 and the 2-bit G for bits 1 & 0:

G3 = g2 + p2 G2(1-0) = g2 + [p2 (g1 + p1g0)] = g2 + p2g1 + p2p1g0

Alternatively, G3=1 iff G2(bits 2-1)=1 or (g0=1 and P2(bits 2-1)=1). This operates on the 2-bit G and P for bits 2 & 1 and the 1-bit g and 1-bit p for bit 0:

G3 = G2(2-1) + P2(2-1)g0 = [g2 + p2g1] + [p2p1g0] = g2 + p2g1 + p2p1g0 (same as above!)

In other words, (g2,p2) gen [(g1,p1) gen (g0,p0)] = [(g2,p2) gen (g1,p1)] gen (g0,p0); you can come to the same conclusion using a truth table (TT). Hence generate (gen) is also an associative operation and can thus be generated using a tree circuit in O(log n) time.

[Figure: two equivalent gen trees for G3, one forming G2(2-1) first and the other forming G2(1-0) first; and two gen trees for G4, a chain through G2(1-0) and G3(2-0), and a balanced tree that combines G2(3-2) with G2(1-0).]
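The truth-table argument for associativity can be run mechanically. In this sketch the `gen` operator takes and returns (g, p) pairs; returning the group propagate as the AND of the two propagates is an assumption that follows the definition of Pk, and the function name is illustrative.

```python
from itertools import product


def gen(hi, lo):
    """Combine the (g, p) pair of a more-significant group with a less-significant one."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    # Group generate: hi generates, or hi propagates what lo generates.
    # Group propagate: both groups propagate.
    return g_hi | (p_hi & g_lo), p_hi & p_lo


pairs = list(product((0, 1), repeat=2))  # all four (g, p) combinations

# Truth-table check over all 4**3 = 64 input combinations.
associative = all(
    gen(x2, gen(x1, x0)) == gen(gen(x2, x1), x0)
    for x2, x1, x0 in product(pairs, repeat=3)
)
print(associative)  # prints True
```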

## Tree CLA Adders (contd)

In practice, instead of generating generates and propagates in a binary tree using 2-bit prop, gen operations, 4-bit prop, gen operations are used as basic modules, and the higher-level generates and propagates are generated using a 4-ary tree.

The 4-bit gen is G4 = g3 + p3g2 + p3p2g1 + p3p2p1g0. Similarly for 4-bit propagates: P4 = p3p2p1p0.

We thus have the following 4-ary prop, gen (P, G) tree using 4-bit (P,G) generation logic as the basic module.

[Figure: (a) 4-bit G generation using 2-bit G-operations; (b) basic 4-bit (P,G)-module; (c) 4-ary (P,G)-tree.]
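The 4-bit (P,G) module can be checked against ordinary addition. This sketch assumes the standard single-bit definitions g_i = a_i AND b_i and p_i = a_i XOR b_i, and the standard relation carry-out = G4 + P4·carry-in; neither is spelled out on this slide, and the function names are hypothetical.

```python
def pg_module_4bit(p, g):
    """Basic 4-bit (P,G) module; p and g are lists indexed by bit position 0..3."""
    P4 = p[3] & p[2] & p[1] & p[0]
    G4 = (g[3]
          | (p[3] & g[2])
          | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]))
    return P4, G4


def carry_out_4bit(a, b, cin):
    """Carry-out of a 4-bit addition computed from the group P and G."""
    # Assumed single-bit definitions: generate = AND, propagate = XOR.
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]
    P4, G4 = pg_module_4bit(p, g)
    # The group either generates a carry or propagates the incoming one.
    return G4 | (P4 & cin)


print(carry_out_4bit(11, 6, 0))  # prints 1, since 11 + 6 = 17 overflows 4 bits
```

In a 4-ary tree, the (P4, G4) o/ps of four such modules become the (p, g) i/ps of one module at the next level, so an n-bit adder needs about log4(n) levels.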