Lecture 17: Adders.


1 Lecture 17: Adders

2 Outline Datapath Computer Arithmetic Principles Single-bit Addition
Carry-Ripple Adder Carry-Skip Adder Carry-Lookahead Adder Carry-Select Adder Carry-Increment Adder Tree Adder 17: Adders

3 A Generic Digital Processor

4 Building Blocks for Digital Architectures
Arithmetic unit: bit-sliced datapath (adder, multiplier, shifter, comparator, etc.). Memory: RAM, ROM, buffers, shift registers. Control: finite state machine (PLA, random logic), counters. Interconnect: switches, arbiters, bus.

5 An Intel Microprocessor

6 Bit-Sliced Design

7 Bit-Sliced Datapath

8 Itanium Integer Datapath

9 Motivation Arithmetic units are, among other things, at the core of every datapath and addressing unit. The datapath is at the core of microprocessors (CPU), signal processors (DSP), data-processing application-specific ICs (ASIC), and programmable ICs (FPGA). Standard arithmetic units are available from libraries. Designing arithmetic units is necessary for non-standard operations, high-performance components, and library development.

10 Naming Conventions Signal busses: A (1-D), Ai (2-D), ai:k (sub-bus, 1-D). Signals: a, ai (1-D), ai,k (2-D), Ai:k (group signal). Circuit complexity measures: A (area), T (cycle time, delay), AT (area-time product), L (latency, number of cycles). Arithmetic operators: +, -, •, /, log (= log2). Logic operators: OR, AND, XOR, NOT, …

11 Circuit Complexity Measures
Unit-gate model: inverter, buffer: A = 0, T = 0. Simple monotonic 2-input gates (AND, OR, NAND, NOR): A = 1, T = 1. Simple non-monotonic 2-input gates (XOR, XNOR): A = 2, T = 2. Simple m-input gates: $A = m - 1$, $T = \lceil \log m \rceil$. Wiring is not considered. For estimation purposes only.

12 Recursive Function Evaluation
Given: inputs ai, outputs zi, function f (graph symbol •). Non-recursive functions (n.): output zi is a function of input ai => parallel structure.

13 Recursive Function Evaluation
Recursive functions (r.): output zi is a function of all inputs ak, k ≤ i. With a single output z = zn-1 (r.s.): if f is non-associative (r.s.n), serial structure; if f is associative (r.s.a), serial or single-tree structure.

14 Recursive Function Evaluation
Output zi is a function of all inputs ak, k ≤ i. With multiple outputs zi (r.m.) (=> prefix problem): if f is non-associative (r.m.n), serial structure; if f is associative (r.m.a), serial or multi-tree structure, or shared-tree structure.

15 Arithmetic Operations
Overview

16 Overview of Arithmetic Operations
Direct implementation of dedicated units: always: 1 – 5; in most cases: 6; sometimes: 7, 8. Sequential implementation using simpler units and several clock cycles (decomposition): sometimes: 6; in most cases: 7, 8, 9. Table look-up techniques using ROMs: universal (simple application to all operations), but efficient only for single-operand operations of high complexity (8 – 12) and small word length.

17 Overview of Arithmetic Operations
Approximation using simpler units: 7 – 12. Taylor series expansion; polynomial and rational approximations; convergence of recursive equation systems; CORDIC (COordinate Rotation DIgital Computer).

18 Binary Number Systems Radix-2, binary number system (BNS): irredundant, weighted, positional, monotonic. An n-bit number is an ordered sequence of bits (binary digits). Simple and efficient implementation in digital circuits. MSB/LSB (most/least significant bit): an-1/a0. Represents an integer or fixed-point number exactly. Fixed-point numbers: m-bit integer part, (n − m)-bit fraction.

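19 Binary Number Systems Unsigned: positive or natural numbers. Value: $A = \sum_{i=0}^{n-1} a_i 2^i$.
Range: $[0,\ 2^n - 1]$. Two's (2's) complement: standard representation of signed or integer numbers. Value: $A = -a_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} a_i 2^i$. Range: $[-2^{n-1},\ 2^{n-1} - 1]$.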

20 Binary Number Systems Complement: $-A = \overline{A} + 1$. Sign: $a_{n-1}$.
Properties: asymmetric range; compatible with unsigned numbers in many arithmetic operations (same treatment of positive and negative numbers). One's (1's) complement: similar to 2's complement. Value: $A = -a_{n-1}(2^{n-1} - 1) + \sum_{i=0}^{n-2} a_i 2^i$. Range: $[-(2^{n-1} - 1),\ 2^{n-1} - 1]$.

21 Binary Number Systems Complement: $-A = \overline{A}$. Sign: $a_{n-1}$.
Properties: double representation of zero, symmetric range, modulo $(2^n - 1)$ number system. Sign-magnitude: alternative representation of signed numbers. Value: $A = (-1)^{a_{n-1}} \sum_{i=0}^{n-2} a_i 2^i$. Range: $[-(2^{n-1} - 1),\ 2^{n-1} - 1]$.

22 Binary Number Systems Sign: $a_{n-1}$
Properties: double representation of zero, symmetric range, different treatment of positive and negative numbers in arithmetic operations, no MSB toggles at sign changes around 0 (=> low power).
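The encodings from slides 19 to 22 can be compared side by side in a small Python sketch. The helper names (`unsigned`, `twos_complement`, `ones_complement`, `sign_magnitude`, `negate_twos`, `negate_ones`) are hypothetical and exist only for illustration. Each helper takes the raw n-bit pattern as a non-negative integer.

```python
def unsigned(bits: int, n: int) -> int:
    """Value of an n-bit pattern read as an unsigned number."""
    return bits & ((1 << n) - 1)


def twos_complement(bits: int, n: int) -> int:
    """The MSB a_{n-1} carries weight -2^(n-1)."""
    a = unsigned(bits, n)
    return a - (1 << n) if a >> (n - 1) else a


def ones_complement(bits: int, n: int) -> int:
    """The MSB a_{n-1} carries weight -(2^(n-1) - 1)."""
    a = unsigned(bits, n)
    return a - ((1 << n) - 1) if a >> (n - 1) else a


def sign_magnitude(bits: int, n: int) -> int:
    """The MSB is the sign; the lower n-1 bits are the magnitude."""
    a = unsigned(bits, n)
    magnitude = a & ((1 << (n - 1)) - 1)
    return -magnitude if a >> (n - 1) else magnitude


def negate_twos(bits: int, n: int) -> int:
    """2's complement negation: invert all bits, then add 1 (modulo 2^n)."""
    return (~bits + 1) & ((1 << n) - 1)


def negate_ones(bits: int, n: int) -> int:
    """1's complement negation: invert all bits."""
    return ~bits & ((1 << n) - 1)


if __name__ == "__main__":
    n = 4
    for bits in range(1 << n):
        print(f"{bits:04b}  unsigned={unsigned(bits, n):3d}  "
              f"2's={twos_complement(bits, n):3d}  1's={ones_complement(bits, n):3d}  "
              f"s-m={sign_magnitude(bits, n):3d}")
```

For n = 4 the printed table shows three things:
- the asymmetric 2's-complement range [−8, 7];
- the two 1's-complement zeros, 0000 and 1111;
- the two sign-magnitude zeros, 0000 and 1000.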

23 Gray Numbers Gray numbers (code): binary, irredundant, non-weighted, non-monotonic. Property: unit-distance coding; exactly one bit toggles between adjacent numbers. Applications: counters with low output toggle rate (low-power busses), representation of continuous signals for low-error sampling (no false numbers due to different bits switching at different times). Non-monotonic numbers make arithmetic operations (addition, comparison) difficult.

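24 Gray Numbers Binary – Gray conversion: $g_{n-1} = a_{n-1}$, $g_i = a_{i+1} \oplus a_i$ for $i < n-1$.
Gray – binary conversion: $a_{n-1} = g_{n-1}$, $a_i = a_{i+1} \oplus g_i$ for $i < n-1$.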
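The two conversions can be sketched on plain Python integers. The function names are hypothetical. The Gray → binary direction is inherently serial from the MSB down. It is itself a prefix problem, with XOR as the associative operator.

```python
def binary_to_gray(a: int) -> int:
    """g_i = a_{i+1} XOR a_i; the MSB is copied unchanged."""
    return a ^ (a >> 1)


def gray_to_binary(g: int) -> int:
    """a_i = a_{i+1} XOR g_i, i.e. a prefix XOR evaluated from the MSB down."""
    a = 0
    while g:
        a ^= g      # after the loop, a = g ^ (g >> 1) ^ (g >> 2) ^ ...
        g >>= 1
    return a


if __name__ == "__main__":
    codes = [binary_to_gray(i) for i in range(8)]
    print([f"{c:03b}" for c in codes])
    # adjacent codes differ in exactly one bit (unit-distance property)
    print(all(bin(x ^ y).count("1") == 1 for x, y in zip(codes, codes[1:])))
```

The script prints the 3-bit Gray sequence `['000', '001', '011', '010', '110', '111', '101', '100']` followed by `True`.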

25 Redundant Number Systems
Non-binary, redundant, weighted number systems. Digit set larger than the radix (typically radix 2) => multiple representations of the same number => redundancy. No carry propagation in adders => more efficient implementation of adder-based units (multipliers, dividers, etc.). Redundancy => no direct implementation of relational operators => conversion to irredundant numbers. Several bits are used to represent one digit => higher storage requirements. Conversion to irredundant numbers is expensive, but it is not necessary if redundant input operands are allowed.

26 Delayed-Carry Representation
Delayed-carry or half adder representation 1 digit holds the sum of 2 bits (no carry out) Example: = (0,0) (1,0) = 2

27 Carry-Save Representation
One digit holds the sum of 3 bits, or of 1 digit and 1 bit. No carry-out digit; the carry is saved. Standard redundant number system for fast addition.
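The following word-level sketch of carry-save accumulation assumes non-negative integer operands. Each `csa` call models a row of independent full adders, so no carry travels between bit positions. A single carry-propagate addition at the end converts the redundant (sum, carry) pair back to binary. The function names are hypothetical.

```python
def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """Row of full adders: returns the sum vector and the carry vector.

    No carry propagates between bit positions; x + y + z == s + c.
    """
    s = x ^ y ^ z                      # sum bits, weight 2^i
    c = (x & y) | (x & z) | (y & z)    # majority = carry bits
    return s, c << 1                   # carries have weight 2^(i+1)


def carry_save_sum(values: list[int]) -> int:
    """Accumulate many operands in carry-save form, then do one CPA."""
    s, c = 0, 0
    for v in values:
        s, c = csa(s, c, v)   # stays redundant: the pair (s, c) represents s + c
    return s + c              # final carry-propagate addition


if __name__ == "__main__":
    print(csa(5, 6, 7))                     # (4, 14), and 4 + 14 == 18
    print(carry_save_sum([1, 2, 3, 4, 5]))  # 15
```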

28 Signed-Digit Representation
Signed-digit (SD) or redundant-digit (RD) number representation. No carry propagation in S = R + T. One digit holds the sum of two digits. No carry-out.

29 Signed-Digit Representation
Minimal SD representation: minimal number of non-zero digits. Applications: sequential multiplication (fewer cycles), filters with constant coefficients (less hardware). Example: minimal

30 Signed-Digit Representation
Canonical SD (CSD) representation: a minimal SD representation in which no two non-zero digits are adjacent. SD -> binary: carry propagation necessary => adder. Other applications: high-speed multipliers. Similar to carry-save; simple to use for signed numbers.
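One standard way to obtain the canonical SD form is the non-adjacent form (NAF) recoding, sketched below. This algorithm is an illustration chosen here; the lecture may use a different one. Digits are in {−1, 0, 1}, least significant first, and the function names are hypothetical.

```python
def to_csd(x: int) -> list[int]:
    """Canonical signed-digit (non-adjacent form) digits of x, LSB first."""
    digits = []
    while x != 0:
        if x & 1:
            d = 2 - (x & 3)   # x mod 4 == 1 -> digit +1, x mod 4 == 3 -> digit -1
            x -= d            # now x is divisible by 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        x >>= 1               # exact division by 2 (x is even here)
    return digits


def csd_value(digits: list[int]) -> int:
    """Weighted sum of the signed digits."""
    return sum(d * 2**i for i, d in enumerate(digits))


if __name__ == "__main__":
    print(to_csd(7))   # [-1, 0, 0, 1]: 7 = 8 - 1, two non-zero digits instead of three
```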

31 Residue Number Systems
Non-binary, irredundant, non-weighted number system. Carry-free and fast additions and multiplications. Other arithmetic operations (e.g., comparison, sign and overflow detection) are complex and slow because the digits are not weighted. Conversion to a weighted mixed-radix or binary system is required. Codes for error correction and detection. Possible applications (but hardly used): digital filters, error detection and correction.

32 Residue Number Systems
The base is an n-tuple of integers (mn-1, mn-2, …, m0), the moduli. The mi are pairwise relatively prime. Arithmetic operations: each digit (residue) is computed separately.

33 Residue Number Systems
The best moduli mi are $2^k$ and $2^k - 1$. They give high storage efficiency with k bits and simple modular addition. Modulo $2^k$, this is a k-bit adder without cout. Modulo $2^k - 1$, it is a k-bit adder with end-around carry.
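The following sketch works in an RNS with an assumed example base (8, 7, 3). These moduli are $2^3$, $2^3 - 1$ and $2^2 - 1$, which are pairwise relatively prime. Conversion back to binary uses the Chinese remainder theorem, and a separate helper shows end-around-carry addition modulo $2^k - 1$. All names here are hypothetical.

```python
from math import prod

# Assumed example base: 2^3, 2^3 - 1 and 2^2 - 1 (pairwise relatively prime).
MODULI = (8, 7, 3)
RANGE = prod(MODULI)  # 168 distinct numbers are representable


def to_rns(x: int) -> tuple[int, ...]:
    return tuple(x % m for m in MODULI)


def rns_add(a, b):
    # each residue digit is computed separately: no carries between digits
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))


def rns_mul(a, b):
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))


def from_rns(r) -> int:
    """Conversion back to binary via the Chinese remainder theorem."""
    x = 0
    for ri, m in zip(r, MODULI):
        mi = RANGE // m
        x += ri * mi * pow(mi, -1, m)   # pow(., -1, m) is the modular inverse (Python 3.8+)
    return x % RANGE


def add_mod_2k_minus_1(x: int, y: int, k: int) -> int:
    """k-bit addition modulo 2^k - 1 with end-around carry (x, y < 2^k - 1)."""
    mask = (1 << k) - 1
    s = x + y
    s = (s & mask) + (s >> k)       # carry-out is fed back in as carry-in
    return 0 if s == mask else s    # all-ones is a second code for zero


if __name__ == "__main__":
    print(to_rns(100))                                # (4, 2, 1)
    print(from_rns(rns_mul(to_rns(12), to_rns(13))))  # 156
```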

34 Residue Number Systems
Example:

35 Floating-Point Numbers
Larger range, smaller precision than fixed-point representation; inexact; real numbers. Double-number form => discontinuous precision. Format: S | biased exponent E | unsigned normalized mantissa M. Basic arithmetic operations

36 Floating-Point Numbers
Basic arithmetic operations are based on fixed-point add, multiply, and shift operations. Post-normalization is required. Applications: Processors: real floating-point formats (e.g., IEEE standard), large range due to universal use. ASICs: usually simplified floating-point formats with small exponents and a smaller range, used for range extension of normal fixed-point numbers. IEEE floating-point format:

37 Logarithmic Number System
Alternative representation to floating point (mantissa + integer exponent -> only a fixed-point exponent). Single-number form => continuous precision => higher accuracy, more reliable. Basic arithmetic operations: (A < B) = (EA < EB), additionally considering the sign; A + B by approximation, or by addition in a conventional number system with double conversion.

38 Logarithmic Number System
Basic arithmetic operations: simpler multiplication and exponentiation; more complex addition. Expensive conversion: (anti)logarithms, probably by table look-up. Applications: real-time digital filters.

39 Antitetrational Number System
Tetration (t.x) and antitetration (a.t.x): larger range, but smaller precision than the logarithmic representation; otherwise analogous. Note that all these systems can be mixed in composite arithmetic. The choice of number representation should be hidden from the user; the compiler should handle it. Rational numbers can also be represented in floating-slash notation.

40 Round-Off Schemes Intermediate results are kept with d additional lower bits, which gives higher accuracy. Rounding keeps the error e small during the final word-length reduction. Trade-off: numerical accuracy vs. implementation cost. Truncation: the d lower bits are simply dropped, so the error is never positive and the average error (bias) is nonzero. Round to nearest (normal rounding)

41 Round-Off Schemes Round to nearest: the error is nearly symmetric.
The added half-LSB can often be included in a previous operation. Round to nearest even/odd: bias = 0 (symmetric); round to nearest even is mandatory in the IEEE floating-point standard. Three extra bits are used for rounding after floating-point operations: guard bit G (post-normalization), round bit R (round to nearest), sticky bit S (round to nearest even).
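The following sketch compares the three round-off schemes for integers that carry d extra low bits. It measures the total error over a full range of inputs. The function names are hypothetical. The "first dropped bit" and "sticky" names are an illustrative reading of the round/sticky idea, not the exact floating-point G/R/S datapath.

```python
def truncate(x: int, d: int) -> int:
    """Drop d LSBs (rounds toward -infinity for 2's complement numbers)."""
    return x >> d


def round_nearest(x: int, d: int) -> int:
    """Add half an LSB of the result, then truncate (ties round up)."""
    return (x + (1 << (d - 1))) >> d


def round_nearest_even(x: int, d: int) -> int:
    """Ties go to the even result, so the average error is zero."""
    q = x >> d
    dropped = x & ((1 << d) - 1)
    half = 1 << (d - 1)
    first = dropped >> (d - 1)             # first dropped bit
    sticky = (dropped & (half - 1)) != 0   # OR of all remaining dropped bits
    if first and (sticky or (q & 1)):
        q += 1
    return q


def total_error(rounder, d: int, count: int) -> int:
    """Sum of (rounded value - exact value), in units of the input LSB."""
    return sum(rounder(x, d) * (1 << d) - x for x in range(count))


if __name__ == "__main__":
    for f in (truncate, round_nearest, round_nearest_even):
        print(f.__name__, total_error(f, 2, 64))   # -96, 32, 0
```

With d = 2 over the inputs 0 to 63, the schemes behave as follows:
- truncation is biased negative (total error −96);
- round-half-up is biased positive (total error 32);
- round to nearest even cancels out exactly (total error 0).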

42 Addition

43 Single-Bit Addition
Half adder: $S = A \oplus B$, $C_{out} = A \cdot B$. Full adder: $S = A \oplus B \oplus C$, $C_{out} = \mathrm{MAJ}(A, B, C) = AB + AC + BC$.

44 1-Bit Adders Add up m bits of the same magnitude (weight)
and output the sum as a k-bit number ($k = \lceil \log(m+1) \rceil$). Equivalently, count the 1's at the inputs => (m,k) counter, a combinational counter. A half adder is a (2,2) counter.

45 1-Bit Adders

46 1-Bit Adders A full-adder is a (3,2) counter.

47 PGK For a full adder, define what happens to carries
(in terms of A and B). Generate: Cout = 1 independent of C, $G = A \cdot B$. Propagate: Cout = C, $P = A \oplus B$. Kill: Cout = 0 independent of C, $K = \overline{A} \cdot \overline{B}$.

48 Full Adder Design I Brute-force implementation from the equations

49 Full Adder Design II Factor S in terms of Cout:
$S = ABC + (A + B + C)\,\overline{C_{out}}$. The critical path is usually C to Cout in a ripple adder.
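The following sketch checks the full-adder relations from slides 43, 47 and 49 over all eight input combinations. Exactly one of G, P, K holds for each input pair. Cout = G + P·C and S = P ⊕ C. The factored form of S in terms of Cout matches the direct XOR. The function names are hypothetical.

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    s = a ^ b ^ c
    cout = (a & b) | (a & c) | (b & c)   # majority function
    return s, cout


def pgk(a: int, b: int) -> tuple[int, int, int]:
    g = a & b              # generate: Cout = 1 regardless of C
    p = a ^ b              # propagate: Cout = C
    k = (1 - a) & (1 - b)  # kill: Cout = 0 regardless of C
    return g, p, k


def check_full_adder() -> bool:
    for a, b, c in product((0, 1), repeat=3):
        s, cout = full_adder(a, b, c)
        g, p, k = pgk(a, b)
        assert g + p + k == 1                  # exactly one case applies
        assert cout == g | (p & c)             # Cout = G + P*C
        assert s == p ^ c                      # S = P xor C
        # factored form from slide 49: S = ABC + (A + B + C) * ~Cout
        assert s == (a & b & c) | ((a | b | c) & (1 - cout))
    return True


if __name__ == "__main__":
    print(check_full_adder())   # True
```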

50 Full Adder Design II Same circuit with sized transistors

51 Layout Clever layout circumvents the usual line of diffusion.
Use wide transistors on the critical path. Eliminate output inverters.

52 Full Adder Design III Complementary Pass Transistor Logic (CPL)
Slightly faster, but more area

53 Full Adder Design III Transmission gates

54 Full Adder Design IV Dual-rail domino
Very fast, but large and power hungry Used in very fast multipliers

55 (m,k) Counters Usually built from full adders.
Associativity of addition allows conversion from a linear to a tree structure => faster with the same number of FAs.

56 (7,3) Counter Example
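The (7,3) counter figure is not reproduced in this transcript, so the sketch below shows one possible tree of four full adders. This arrangement is an assumption, not a copy of the slide. Two full adders compress six inputs in parallel, and a third combines their sum bits with the seventh input. A fourth full adder adds the three weight-2 carries. The function names are hypothetical.

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def counter_7_3(bits) -> tuple[int, int, int]:
    """(7,3) counter from four full adders; returns the 3-bit count, MSB first."""
    a0, a1, a2, a3, a4, a5, a6 = bits
    s1, c1 = full_adder(a0, a1, a2)   # level 1, in parallel with the next FA
    s2, c2 = full_adder(a3, a4, a5)
    s3, c3 = full_adder(s1, s2, a6)   # level 2: s3 has weight 1, c3 weight 2
    s4, c4 = full_adder(c1, c2, c3)   # level 3: adds the weight-2 carries
    return c4, s4, s3                 # weights 4, 2, 1


def check_counter() -> bool:
    """Exhaustively compare the counter output with the number of 1's."""
    return all(4 * c4 + 2 * s4 + s3 == sum(bits)
               for bits in product((0, 1), repeat=7)
               for c4, s4, s3 in [counter_7_3(bits)])


if __name__ == "__main__":
    print(counter_7_3((1, 0, 1, 1, 0, 1, 1)), check_counter())   # (1, 0, 1) True
```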

57 Carry Propagate Adders
Add two n-bit operands A and B and an optional carry-in cin by performing carry propagation. The sum (cout, S) is an irredundant (n+1)-bit number.

58 Carry Propagate Adders
An N-bit adder is called a CPA. Each sum bit depends on all previous carries. How do we compute all these carries quickly?

59 Ripple-Carry Adder (RCA)
Serial arrangement of n full adders. Simplest, smallest, and slowest CPA structure.

60 Carry-Ripple Adder Simplest design: cascade full adders.
The critical path goes from Cin to Cout. Design the full adder to have a fast carry delay.

61 Carry-Ripple Adder Note that the worst-case delay is linear in the number of bits. Goal: make the fastest possible carry-path circuit.
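Below is a bit-level sketch of the ripple-carry adder. The loop-carried dependency on `c` is the software counterpart of the linear Cin-to-Cout critical path. The function name is hypothetical.

```python
def ripple_carry_add(a: int, b: int, n: int, cin: int = 0) -> tuple[int, int]:
    """Bit-level model of an n-bit ripple-carry adder (serial chain of full adders)."""
    s, c = 0, cin
    for i in range(n):                       # the carry ripples from bit 0 to bit n-1
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s |= (ai ^ bi ^ c) << i              # sum bit of full adder i
        c = (ai & bi) | (ai & c) | (bi & c)  # carry into full adder i + 1
    return s, c                              # n-bit sum and carry-out


if __name__ == "__main__":
    print(ripple_carry_add(200, 100, 8))     # (44, 1): 300 = 256 + 44
```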

62 A Full Adder Circuit

63 Inversion Property
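The inversion property says that complementing all three inputs of a full adder complements both outputs. The sketch below checks this exhaustively and then applies it to the idea on slide 64. Every stage uses an inverting full adder, and odd-numbered bits receive inverted A and B. The carry path then needs no inverters. The helper `inverting_fa` and the other names are hypothetical.

```python
from itertools import product


def full_adder(a, b, c):
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def inverting_fa(a, b, c):
    """Full adder without output inverters: returns (~S, ~Cout)."""
    s, cout = full_adder(a, b, c)
    return 1 - s, 1 - cout


def check_inversion_property() -> bool:
    for a, b, c in product((0, 1), repeat=3):
        s, cout = full_adder(a, b, c)
        assert full_adder(1 - a, 1 - b, 1 - c) == (1 - s, 1 - cout)
    return True


def ripple_with_inverting_cells(a_bits, b_bits, cin):
    """a_bits, b_bits: operand bits, LSB first. Carry polarity alternates per stage."""
    sums = []
    c = cin                                   # true polarity into bit 0
    for i, (a, b) in enumerate(zip(a_bits, b_bits)):
        if i % 2 == 0:
            s_n, c = inverting_fa(a, b, c)    # carry leaves inverted
            sums.append(1 - s_n)              # inverter only on the sum output
        else:
            # carry arrives inverted: invert A and B instead (off the carry path);
            # by the inversion property the outputs come out in true polarity
            s, c = inverting_fa(1 - a, 1 - b, c)
            sums.append(s)
    cout = c if len(a_bits) % 2 == 0 else 1 - c  # fix polarity of the final carry
    return sums, cout
```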

64 Inversions The critical path passes through the majority gate,
which is built from a minority gate + inverter. Eliminate the inverter and use an inverting full adder.

65 Mirror Adder

66 Mirror Adder

67 Mirror Adder The NMOS and PMOS chains are completely symmetrical. A maximum of two series transistors can be observed in the carry generation circuit. When laying out the cell, the most critical issue is the minimization of the capacitance at node Co. The reduction of the diffusion capacitances is particularly important. The capacitance at node Co is composed of four diffusion capacitances, two internal gate capacitances, and six gate capacitances in the connecting adder cell.

68 Mirror Adder The transistors connected to Ci are placed closest to the output. Only the transistors in the carry stage have to be optimized for speed. All transistors in the sum stage can be minimal size.

69 Transmission Gate FA

70 Carry Propagation Speed-up
Concatenation of partial CPAs with fast cin -> cout. Fast carry look-ahead logic for the entire range of bits.

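71 Generate / Propagate Equations are often factored into G and P.
Generate and propagate for groups spanning i:j: $G_{i:j} = G_{i:k} + P_{i:k} \cdot G_{k-1:j}$, $P_{i:j} = P_{i:k} \cdot P_{k-1:j}$ (for $i \ge k > j$). Base case: $G_{i:i} = G_i = A_i \cdot B_i$, $P_{i:i} = P_i = A_i \oplus B_i$; bit 0 holds the carry-in, $G_{0:0} = G_0 = C_{in}$, $P_{0:0} = P_0 = 0$. Sum: $S_i = P_i \oplus G_{i-1:0}$.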
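The group equations translate into a two-input "black cell" operator. The sketch below uses it to add two numbers. It follows the base-case convention above: index 0 holds the carry-in, and operand bit m sits at index m + 1. The serial loop corresponds to the carry-ripple PG diagram. Because the operator is associative, which `pg_is_associative` checks, faster adders can evaluate the same G/P prefixes with trees. All names are hypothetical.

```python
from itertools import product


def pg_combine(upper, lower):
    """Black cell: (G_{i:k}, P_{i:k}), (G_{k-1:j}, P_{k-1:j}) -> (G_{i:j}, P_{i:j})."""
    g_hi, p_hi = upper
    g_lo, p_lo = lower
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def add_via_group_pg(a: int, b: int, n: int, cin: int = 0) -> tuple[int, int]:
    """n-bit addition from G/P signals; index 0 holds the carry-in."""
    g = [cin] + [((a >> m) & 1) & ((b >> m) & 1) for m in range(n)]
    p = [0] + [((a >> m) & 1) ^ ((b >> m) & 1) for m in range(n)]
    prefix = [(g[0], p[0])]                          # (G_{0:0}, P_{0:0})
    for i in range(1, n + 1):                        # serial evaluation of (G_{i:0}, P_{i:0})
        prefix.append(pg_combine((g[i], p[i]), prefix[i - 1]))
    s = 0
    for i in range(1, n + 1):
        s |= (p[i] ^ prefix[i - 1][0]) << (i - 1)    # S_i = P_i xor G_{i-1:0}
    return s, prefix[n][0]                           # carry-out = G_{n:0}


def pg_is_associative() -> bool:
    pairs = list(product((0, 1), repeat=2))
    return all(pg_combine(x, pg_combine(y, z)) == pg_combine(pg_combine(x, y), z)
               for x in pairs for y in pairs for z in pairs)
```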

72 PG Logic

73 PG Logic

74 Carry-Ripple Revisited

75 Carry-Ripple PG Diagram

76 PG Diagram Notation

77 Manchester Carry Chain

78 Manchester Carry Chain

79 Manchester Carry Chain

80 Carry-Skip Adder Carry-ripple is slow through all N stages.
Carry-skip allows the carry to skip over groups of n bits. The decision is based on the n-bit group propagate signal.

81 Carry-Skip Adder

82 Carry-Skip Adder

83 Carry-Skip Adder

84 Carry-Skip PG Diagram For k n-bit groups (N = nk)

85 Variable Group Size Delay grows as O(sqrt(N))

86 Carry-Skip Adder Partial CPA with fast ck -> ci
If Pi-1:k = 0: ck does not propagate to c'i, and c'i is selected, becoming ci. If Pi-1:k = 1: ck propagates to c'i, but c'i is skipped and ck is selected as ci. The path ck -> c'i -> ci is never sensitized => fast ck -> ci. False path => inherent logic redundancy => problems in circuit optimization, timing analysis, and testing.

87 Carry-Skip Adder Variable group sizes are faster.
Use larger groups in the middle to minimize the delays a0 -> ck -> si-1 and ak -> ci -> sn-1. Partial CPA type is RCA or CSKA (multilevel CSKA). Medium speed-up at small hardware overhead (+ AND/bit + MUX/group).
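A functional model cannot show the speed-up, but it can show the skip multiplexer and the false path from slide 86. When a group propagates, the rippled carry c'_i always equals c_k, which the `assert` inside the model checks. Groups are listed LSB first. Sizes such as (1, 2, 2, 1) follow the "larger groups in the middle" rule only as an illustrative choice. The function name is hypothetical.

```python
def carry_skip_add(a: int, b: int, group_sizes, cin: int = 0) -> tuple[int, int]:
    """Functional model of a carry-skip adder with ripple-carry groups (LSB group first)."""
    s, c_k, lo = 0, cin, 0
    for size in group_sizes:
        ripple, group_p = c_k, 1
        for i in range(lo, lo + size):
            ai, bi = (a >> i) & 1, (b >> i) & 1
            p = ai ^ bi
            group_p &= p                       # P_{i-1:k}: AND of the bit propagates
            s |= (p ^ ripple) << i
            ripple = (ai & bi) | (p & ripple)  # c'_i, the rippled group carry-out
        # skip MUX: forward c_k if the whole group propagates, else take c'_i
        if group_p:
            assert ripple == c_k               # logically redundant: the false path
        else:
            c_k = ripple
        lo += size
    return s, c_k


if __name__ == "__main__":
    print(carry_skip_add(200, 100, (2, 3, 3)))   # (44, 1)
```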

88 CSKA + Manchester

89 Carry-Select Adder Trick for critical paths dependent on a late input X:
precompute two possible outputs for X = 0, 1, and select the proper output when X arrives. The carry-select adder precomputes n-bit sums for both possible carries into each n-bit group.

90 Carry-Select Adder Partial CPA with fast ck -> ci and ck -> si-1:k. Two CPAs compute two possible results (cin = 0/1); the group carry-in ck selects the correct one afterwards. Variable group sizes are faster; use larger groups at the end (MSB). Balance the delays a0 -> ck and ak -> ci0. Partial CPA type is RCA, CSLA (multilevel CSLA), or CLA.

91 Carry-Select Adder High speed-up at high hardware overhead.
+ MUX/bit + (CPA + MUX)/group
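In the following sketch of the carry-select scheme, each group is added twice, once for carry-in 0 and once for carry-in 1. The group carry-in then selects one result, through one multiplexer per sum bit plus one carry multiplexer per group. Group sizes are listed LSB first. A choice like (1, 2, 3) follows the "larger groups at the MSB end" rule only as an example. The function names are hypothetical.

```python
def ripple_group(a: int, b: int, lo: int, hi: int, cin: int) -> tuple[int, int]:
    """Ripple-carry addition of bit positions lo..hi-1; sum bits stay in place."""
    s, c = 0, cin
    for i in range(lo, hi):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s |= (ai ^ bi ^ c) << i
        c = (ai & bi) | (ai & c) | (bi & c)
    return s, c


def carry_select_add(a: int, b: int, group_sizes, cin: int = 0) -> tuple[int, int]:
    """Carry-select adder model: each group is precomputed for carry-in 0 and 1."""
    s, c_k, lo = 0, cin, 0
    for size in group_sizes:
        hi = lo + size
        s0, c0 = ripple_group(a, b, lo, hi, 0)   # result assuming c_k = 0
        s1, c1 = ripple_group(a, b, lo, hi, 1)   # result assuming c_k = 1
        s |= s1 if c_k else s0                   # sum MUXes, one per bit
        c_k = c1 if c_k else c0                  # carry MUX, one per group
        lo = hi
    return s, c_k


if __name__ == "__main__":
    print(carry_select_add(200, 100, (2, 2, 4)))   # (44, 1)
```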

92 Carry-Select Adder

93 Carry-Select Adder

94 Linear Carry-Select

95 Square-Root Carry-Select

96 Delay Comparison

97 Carry-Increment Adder
Partial CPA with fast ck -> ci and ck -> si-1:k. The result is incremented after addition if ck = 1. Variable group sizes are faster; use larger groups at the end (MSB). Balance the delays a0 -> ck and ak -> c'i. The partial CPA could be RCA, CIA (multilevel CIA), or CLA. High speed-up at medium hardware overhead (+ AND/bit + (incrementer + AND/OR)/group). The logic of the CPA and the incrementer could be merged.
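In the following sketch of the carry-increment scheme, each group is first added with carry-in 0. The late group carry-in c_k is then added by an incrementer, and the group carry-out is formed as G + P·c_k. For brevity, the group CPA is abstracted as Python integer addition (an assumption of this model, not a gate-level design). The function name is hypothetical.

```python
def carry_increment_add(a: int, b: int, group_sizes, cin: int = 0) -> tuple[int, int]:
    """Carry-increment adder model: add each group with carry-in 0, then increment if c_k = 1."""
    s, c_k, lo = 0, cin, 0
    for size in group_sizes:
        mask = (1 << size) - 1
        ga, gb = (a >> lo) & mask, (b >> lo) & mask
        partial = ga + gb                             # group CPA with carry-in 0 (abstracted)
        s0, g_group = partial & mask, partial >> size  # group sum and group generate
        p_group = int((ga ^ gb) == mask)              # group propagate: every bit propagates
        s |= ((s0 + c_k) & mask) << lo                # incrementer adds the late carry c_k
        c_k = g_group | (p_group & c_k)               # c_i = G + P * c_k (AND/OR per group)
        lo += size
    return s, c_k


if __name__ == "__main__":
    print(carry_increment_add(200, 100, (2, 2, 4)))   # (44, 1)
```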

98 Carry-Increment Adder

99 Carry-Increment Adder
Example: gate-level schematic of carry-increment adder (CIA) Only two different logic cells (bit-slices): IHA and IFA

100 Carry-Increment Adder
Factor initial PG and final XOR out of carry-select

101 Variable Group Size Also buffer noncritical signals

102 Conditional-Sum Adder
Optimized multilevel CSLA with log n levels. The correct sum bits ($s_i^0$ or $s_i^1$) are conditionally selected through log n levels of multiplexers. Bit groups of size $2^l$ at level l. Higher parallelism, more balanced signal paths. Highest speed-up at highest hardware overhead (2 RCA + more than log n MUX/bit).

103 Conditional-Sum Adder

104 Conditional-Sum Adder

105 Conditional-Sum Adder

106 Carry-Lookahead Adder
Carries look ahead before sum bits are computed. Hierarchical arrangement using levels: group generate/propagate signals are passed up and carries c'0 are passed down between levels. High speed-up at medium hardware overhead.

107 Carry-Lookahead Adder

108 Carry-Lookahead Adder

109 Carry-Lookahead Adder
Carry-lookahead adder computes Gi:0 for many bits in parallel. Uses higher-valency cells with more than two inputs.

110 CLA PG Diagram

111 Carry-Lookahead

112 Lookahead Tree

113 Lookahead Tree

114 Higher-Valency Cells

115 Higher Valency PG Diagram

116 Tree Adder If lookahead is good, lookahead across lookahead!
Recursive lookahead gives O(log N) delay Many variations on tree adders

117 Parallel Prefix Adders
Universal adder architecture comprising RCA, CIA, CLA, and more (the entire range of area-delay trade-offs, from the slowest RCA to the fastest CLA). Preprocessing, carry-lookahead, and postprocessing steps. Carries are calculated using parallel-prefix algorithms. High regularity: suitable for synthesis and layout. High flexibility: special adders, other arithmetic operations, exchangeable prefix algorithms. High performance: smallest and fastest adders.

118 Parallel Prefix Adders

119 Prefix Problem Inputs (xn-1,…,x0), outputs (yn-1,…,y0), associative binary operator •: $y_0 = x_0$, $y_i = x_i \bullet y_{i-1} = x_i \bullet x_{i-1} \bullet \cdots \bullet x_0$. Associativity of • => tree structures for evaluation.

120 Prefix Problem Group variables cover bits (xk,…,xi) at level l.
Carry propagation is a prefix problem: $(G_{i:0}, P_{i:0}) = (G_i, P_i) \bullet (G_{i-1:0}, P_{i-1:0})$ with $(g, p) \bullet (g', p') = (g + p \cdot g',\ p \cdot p')$. Parallel-prefix algorithms: multi-tree structures T = O(n) -> O(log n); sharing subtrees A = O(n^2) -> O(n log n). Different algorithms trade area vs. delay; also consider wiring and fanout.

121 Prefix Algorithms Algorithms visualized by directed acyclic graphs (DAG) with array structure (n bits x m levels). Graph vertex symbols Performance measures: A• : graph size (number of black nodes) T• : graph depth (number of black nodes on critical path)

122 Prefix Algorithms Serial prefix algorithm (RCA)

123 Prefix Algorithms Sklansky parallel-prefix algorithm (PPA-SK)
Tree-like collection, parallel redistribution of carries
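The following sketch implements Sklansky's divide-and-conquer prefix structure generically over any associative operator. It then uses the structure with the G/P black-cell operator to build an adder. The function also counts logic levels and black cells. The last node of each lower half feeds every node in the upper half, which is the source of Sklansky's large fanout. The index convention (index 0 = carry-in) and all names are assumptions of this sketch.

```python
def pg_combine(upper, lower):
    """Black cell: (G, P) of an upper span combined with the adjacent lower span."""
    g_hi, p_hi = upper
    g_lo, p_lo = lower
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def sklansky_prefix(x, op):
    """Sklansky prefix: y[i] = x[i] op x[i-1] op ... op x[0]; returns (y, levels, cells)."""
    y = list(x)
    span, levels, cells = 1, 0, 0
    while span < len(y):
        for i in range(len(y)):
            if i & span:                      # i lies in the upper half of its 2*span block
                j = (i // span) * span - 1    # last node of the lower half (high fanout)
                y[i] = op(y[i], y[j])
                cells += 1
        span *= 2
        levels += 1
    return y, levels, cells


def prefix_add(a, b, n, cin=0, prefix=sklansky_prefix):
    """n-bit adder: index 0 holds the carry-in, operand bit m sits at index m + 1."""
    x = [(cin, 0)] + [(((a >> m) & 1) & ((b >> m) & 1),
                       ((a >> m) & 1) ^ ((b >> m) & 1)) for m in range(n)]
    G, _, _ = prefix(x, pg_combine)
    s = 0
    for m in range(n):
        s |= (x[m + 1][1] ^ G[m][0]) << m     # S = P xor G of all lower positions
    return s, G[n][0]


if __name__ == "__main__":
    print(prefix_add(200, 100, 8))                         # (44, 1)
    print(sklansky_prefix([(0, 0)] * 16, pg_combine)[1:])  # (4, 32): log2 N levels, N/2 cells per level
```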

124 Sklansky

125 Prefix Algorithms Brent-Kung parallel-prefix algorithm (PPA-BK)
Traditional CLA is PPA-BK with 4-bit groups Tree-like redistribution of carries (fan-out tree)

126 Brent-Kung

127 Prefix Algorithms Kogge-Stone parallel-prefix algorithm (PPA-KS)
very high wiring requirements
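Kogge-Stone can be sketched with the same interface. At every level, each node combines with the node `span` positions below it, so the fanout stays at 2. The price is many long wires and more black cells than Sklansky. The index convention (index 0 = carry-in) and all names are assumptions of this sketch.

```python
def pg_combine(upper, lower):
    """Black cell: (G, P) of an upper span combined with the adjacent lower span."""
    g_hi, p_hi = upper
    g_lo, p_lo = lower
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def kogge_stone_prefix(x, op):
    """Kogge-Stone prefix: at each level, node i combines with node i - span."""
    y = list(x)
    span, levels, cells = 1, 0, 0
    while span < len(y):
        # the next level is built from the previous one, as in the parallel hardware
        y = [y[i] if i < span else op(y[i], y[i - span]) for i in range(len(y))]
        cells += len(y) - span
        span *= 2
        levels += 1
    return y, levels, cells


def ks_add(a, b, n, cin=0):
    """n-bit adder: index 0 holds the carry-in, operand bit m sits at index m + 1."""
    x = [(cin, 0)] + [(((a >> m) & 1) & ((b >> m) & 1),
                       ((a >> m) & 1) ^ ((b >> m) & 1)) for m in range(n)]
    G, _, _ = kogge_stone_prefix(x, pg_combine)
    s = 0
    for m in range(n):
        s |= (x[m + 1][1] ^ G[m][0]) << m     # S = P xor G of all lower positions
    return s, G[n][0]


if __name__ == "__main__":
    print(ks_add(200, 100, 8))                                 # (44, 1)
    print(kogge_stone_prefix([(0, 0)] * 16, pg_combine)[1:])   # (4, 49)
```

For 16 nodes the sketch uses 49 black cells in 4 levels, close to the N log2 N estimate. The Sklansky structure of slide 123 uses 32 cells in the same 4 levels.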

128 Kogge-Stone

129 Prefix Algorithms Carry-increment parallel-prefix algorithm

130 Prefix Algorithms Mixed serial/parallel-prefix algorithm (RCA+PPA)
Linear size-depth trade-off using parameter k: k = 0 : serial prefix graph : Brent-Kung parallel-prefix graph Fills the gap between RCA and PPA-BK (CLA) in steps of single •-operations.

131 Prefix Algorithms

132 Prefix Algorithms Example: 4-bit PPA-SK
Efficient AND-OR-prefix circuit for the generate signals and AND-prefix circuit for the propagate signals. Optimization: alternating AOI/OAI resp. NAND/NOR gates (inverting gates are smaller and faster). Can also be realized using two MUX-prefix circuits.

133 Prefix Algorithms

134 Prefix Algorithms Prefix adders can be synthesized by human or computer as well. Starting from a serial structure, one can use compression rules and expansion rules to obtain new graphs. Can generate all previous graphs except PPA-KS. Universal adder synthesis approach.

135 Tree Adder Taxonomy An ideal N-bit tree adder would have
L = log N logic levels, fanout never exceeding 2, and no more than one wiring track between levels. Describe an adder with a 3-D taxonomy (l, f, t): logic levels: L + l; fanout: $2^f + 1$; wiring tracks: $2^t$. Known tree adders sit on the plane defined by l + f + t = L − 1.
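The taxonomy formulas can be evaluated directly. The sketch below computes the three metrics for a point (l, f, t) and reports whether the point lies on the plane l + f + t = L − 1. It reproduces the Brent-Kung, Sklansky and Kogge-Stone entries of the summary table (slide 157) for N = 16. The function name is hypothetical.

```python
import math


def tree_adder_metrics(n_bits: int, l: int, f: int, t: int) -> dict:
    """Logic levels, max fanout and wiring tracks for taxonomy point (l, f, t)."""
    L = int(math.log2(n_bits))
    if 2 ** L != n_bits:
        raise ValueError("n_bits must be a power of two")
    return {
        "logic_levels": L + l,
        "max_fanout": 2 ** f + 1,
        "wiring_tracks": 2 ** t,
        "on_known_plane": l + f + t == L - 1,
    }


if __name__ == "__main__":
    N = 16  # L = 4, so the plane is l + f + t = 3
    for name, point in [("Brent-Kung", (3, 0, 0)), ("Sklansky", (0, 3, 0)),
                        ("Kogge-Stone", (0, 0, 3))]:
        print(name, tree_adder_metrics(N, *point))
```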

136 Tree Adder Taxonomy

137 Han-Carlson

138 Knowles [2, 1, 1, 1]

139 Ladner-Fischer

140 Taxonomy Revisited

141 More Adder Issues Multilevel adders:
multilevel versions of adders are possible (CSKA, CSLA, CIA). Hybrid adders: arbitrary combinations of speed-up techniques are possible; an often-used combination is CLA – CSLA. Transistor-level adders: influence of logic styles (dynamic logic, pass-transistor logic); efficient transistor-level implementation of ripple-carry chains (Manchester chain). Combinations of speed-up techniques make sense, but they require much higher design effort. Many efficient implementations exist in the literature. Higher valency (radix) is also possible.

142 More Adder Issues Higher valency is a poor choice in static CMOS logic since each stage has higher delay. However, if the stages are built using domino logic, it could prove to be an advantage. Nodes with large fanouts or long wires could use buffers. The prefix trees can also be internally pipelined.

143 Transistor Level

144 Transistor Level

145 Transistor Level

146 Higher Valency Adders

147 Sparse Trees Building a prefix tree to compute carries in every bit is expensive in terms of power. An alternative is to compute carries into short groups such as s = 2,3,8, or 16 bits. Meanwhile, pairs of s-bit adders precompute the sums assuming both carries-in of 0 and 1 to each group. It is a hybrid between a prefix adder and carry select adder.

148 Valency-3 BK Adder Sparse tree adder with s = 3

149 Carry-Select Implementation

150 Sparse Tree Adders Intel Valency-2 Sklansky sparse tree adder with s=4

151 Sparse Tree Adders Valency-3 Kogge-Stone sparse tree adder with s=3

152 Ling Adders Ling discovered a technique to remove one series transistor from the critical group generate path at the expense of another XOR gate in the sum precomputation. Define a pseudo-generate $H_{i:j} = G_i + G_{i-1:j}$; this is a simpler computation. Define a pseudo-propagate signal I that is a shifted version of propagate.
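The pseudo-generate can be checked exhaustively. The sketch assumes an OR-type propagate $T_i = A_i + B_i$. This is an assumption made here, because the slide does not say which propagate is meant. It then verifies the factorization $G_{i:j} = T_i \cdot H_{i:j}$, so the true group generate can be recovered by ANDing H with the propagate of bit i. The serial group-generate helper gives the same result with OR-type or XOR-type propagate. All names are hypothetical.

```python
from itertools import product


def group_generate(g, t, hi, lo):
    """G_{hi:lo} evaluated serially: G = g_i + t_i * G_{i-1:lo} (0 for an empty range)."""
    G = 0
    for i in range(lo, hi + 1):
        G = g[i] | (t[i] & G)
    return G


def check_ling(width: int) -> bool:
    """Exhaustively verify G_{i:j} = T_i * H_{i:j} for one group of `width` bits."""
    for bits in product((0, 1), repeat=2 * width):
        a, b = bits[:width], bits[width:]
        g = [x & y for x, y in zip(a, b)]   # bit generate G_k
        t = [x | y for x, y in zip(a, b)]   # OR-type propagate T_k (assumed)
        i, j = width - 1, 0
        G = group_generate(g, t, i, j)
        H = g[i] | group_generate(g, t, i - 1, j)   # pseudo-generate H_{i:j}
        assert G == t[i] & H
    return True


if __name__ == "__main__":
    print(all(check_ling(w) for w in range(1, 5)))   # True
```

For a 2-bit group, $H_{1:0} = G_1 + G_0$ is a plain OR. The true $G_{1:0} = G_1 + T_1 G_0$ needs an extra series term, which is where the simplification comes from.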

153 Ling Adders Finally, the sums are computed by

154 Ling Adders

155 Comparison Standard-cell implementation, 0.8 µm technology

156 Comparison

157 Summary Adder architectures offer area / power / delay trade-offs.
Choose the best one for your application. Columns: architecture (classification): logic levels; max fanout; tracks; cells
Carry-Ripple: N − 1; 1; 1; N
Carry-Skip (n = 4): N/4 + 5; 2; 1; 1.25N
Carry-Inc. (n = 4): N/4 + 2; 4; 1; 2N
Brent-Kung (L − 1, 0, 0): 2 log2 N − 1; 2; 1; 2N
Sklansky (0, L − 1, 0): log2 N; N/2 + 1; 1; 0.5 N log2 N
Kogge-Stone (0, 0, L − 1): log2 N; 2; N/2; N log2 N

158 E vs Delay Trade-off

159 E vs Delay Tradeoff 90nm 64 bit domino KS Ling adder with various valency and s

160 Area vs Delay Synthesized Adders

