Lecture 17: Adders.

Name: Lecture 17: Adders.
Uploaded: 2017-07-08T10:24:01+00:00
Duration: PTM38S50
Channel: Dina Beasley
Description: Lecture 17: Adders.

Lecture 17: Adders

Outline Datapath Computer Arithmetic Principles Single-bit Addition
Carry-Ripple Adder Carry-Skip Adder Carry-Lookahead Adder Carry-Select Adder Carry-Increment Adder Tree Adder 17: Adders

A Generic Digital Processor

Building Blocks for Digital Architectures
Arithmetic unit: bit-sliced datapath (adder, multiplier, shifter, comparator, etc.). Memory: RAM, ROM, buffers, shift registers. Control: finite state machine (PLA, random logic), counters. Interconnect: switches, arbiters, bus.

An Intel Microprocessor

Bit-Sliced Design

Bit-Sliced Datapath

Itanium Integer Datapath

Motivation
Arithmetic units are, among other things, the core of every datapath and addressing unit. The datapath is at the core of microprocessors (CPU), signal processors (DSP), data-processing application-specific ICs (ASIC), and programmable ICs (FPGA). Standard arithmetic units are available from libraries. Designing arithmetic units is necessary for non-standard operations, high-performance components, and library development.

Naming Conventions
Signal buses: A (1-D), A_i (2-D), a_{i:k} (sub-bus, 1-D). Signals: a, a_i (1-D), a_{i,k} (2-D), A_{i:k} (group signal). Circuit complexity measures: A (area), T (cycle time, delay), AT (area-time product), L (latency, number of cycles). Arithmetic operators: +, -, •, /, log (= log2). Logic operators: OR, AND, XOR, NOT, …

Circuit Complexity Measures
Unit-gate model: inverter, buffer: A = 0, T = 0. Simple monotonic 2-input gates (AND, OR, NAND, NOR): A = 1, T = 1. Simple non-monotonic 2-input gates (XOR, XNOR): A = 2, T = 2. Simple m-input gates: A = m – 1, T = ⌈log2 m⌉. Wiring is not considered; the model is for estimation purposes only.

Recursive Function Evaluation
Given: inputs a_i, outputs z_i, function f (graph symbol •).
Non-recursive functions (n.): output z_i is a function of input a_i only; parallel structure.

Recursive functions (r.): output z_i is a function of all inputs a_k, k ≤ i.
With a single output z = z_{n-1} (r.s.): if f is non-associative (r.s.n), serial structure; if f is associative (r.s.a), serial or single-tree structure.

Output z_i is a function of all inputs a_k, k ≤ i, with multiple outputs z_i (r.m.) (=> prefix problem): if f is non-associative (r.m.n), serial structure; if f is associative (r.m.a), serial, multi-tree, or shared-tree structure.

Arithmetic Operations
Overview

Overview of Arithmetic Operations
Direct implementation of dedicated units: always for 1–5, in most cases for 6, sometimes for 7 and 8. Sequential implementation using simpler units and several clock cycles (decomposition): sometimes for 6, in most cases for 7, 8, and 9. Table look-up techniques using ROMs: universal (simple to apply to all operations), but efficient only for single-operand operations of high complexity (8–12) and small word length.

Overview of Arithmetic Operations
Approximation using simpler units (7–12): Taylor series expansion, polynomial and rational approximations, convergence of recursive equation systems, CORDIC (COordinate Rotation DIgital Computer).

Binary Number Systems
Radix-2 binary number system (BNS): irredundant, weighted, positional, monotonic. An n-bit number is an ordered sequence of bits (binary digits), which allows simple and efficient implementation in digital circuits. MSB/LSB (most/least significant bit): a_{n-1}/a_0. Represents an integer or fixed-point number exactly. Fixed-point numbers: m-bit integer part, (n - m)-bit fraction.

Binary Number Systems
Unsigned: positive or natural numbers.
Value: $A = \sum_{i=0}^{n-1} a_i 2^i$
Range: $[0, 2^n - 1]$
Two's (2's) complement: the standard representation of signed (integer) numbers.
Value: $A = -a_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} a_i 2^i$
Range: $[-2^{n-1}, 2^{n-1} - 1]$

Binary Number Systems
Complement: $-A = \bar{A} + 1$. Sign: $a_{n-1}$.
Properties: asymmetric range; compatible with unsigned numbers in many arithmetic operations (same treatment of positive and negative numbers).
One's (1's) complement: similar to 2's complement.
Value: $A = -a_{n-1} (2^{n-1} - 1) + \sum_{i=0}^{n-2} a_i 2^i$
Range: $[-(2^{n-1} - 1), 2^{n-1} - 1]$

Binary Number Systems
Complement: $-A = \bar{A}$. Sign: $a_{n-1}$.
Properties: double representation of zero, symmetric range, modulo $(2^n - 1)$ number system.
Sign-magnitude: an alternative representation of signed numbers.
Value: $A = (-1)^{a_{n-1}} \sum_{i=0}^{n-2} a_i 2^i$
Range: $[-(2^{n-1} - 1), 2^{n-1} - 1]$

Binary Number Systems
Sign: $a_{n-1}$.
Properties: double representation of zero, symmetric range, different treatment of positive and negative numbers in arithmetic operations, no MSB toggles at sign changes around 0 (=> low power).
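The four encodings can be cross-checked with a minimal Python sketch (not part of the lecture; the function names are hypothetical). Bits are passed LSB first, so bits[i] is a_i, and each function applies one of the value definitions to the same bit pattern.

```python
def unsigned_value(bits):
    """bits[i] is a_i (LSB first): A = sum of a_i * 2^i."""
    return sum(b << i for i, b in enumerate(bits))

def twos_complement_value(bits):
    """The sign bit a_{n-1} carries weight -2^(n-1)."""
    n = len(bits)
    return unsigned_value(bits[:-1]) - bits[-1] * (1 << (n - 1))

def ones_complement_value(bits):
    """The sign bit a_{n-1} carries weight -(2^(n-1) - 1)."""
    n = len(bits)
    return unsigned_value(bits[:-1]) - bits[-1] * ((1 << (n - 1)) - 1)

def sign_magnitude_value(bits):
    """The sign bit only flips the sign of the magnitude a_{n-2} ... a_0."""
    magnitude = unsigned_value(bits[:-1])
    return -magnitude if bits[-1] else magnitude

# All 16 patterns of a 4-bit word, LSB first
patterns = [[(k >> i) & 1 for i in range(4)] for k in range(16)]
for name, f in [("unsigned", unsigned_value), ("2's", twos_complement_value),
                ("1's", ones_complement_value), ("sign-mag", sign_magnitude_value)]:
    values = [f(p) for p in patterns]
    print(f"{name:9} range [{min(values)}, {max(values)}], zeros: {values.count(0)}")
```

For n = 4 it reports [-8, 7] with a single zero for 2's complement, and [-7, 7] with two representations of zero for both 1's complement and sign-magnitude.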

Gray Numbers
Gray numbers (code): binary, irredundant, non-weighted, non-monotonic. Property: unit-distance coding; exactly one bit toggles between adjacent numbers. Applications: counters with low output toggle rate (low-power buses), representation of continuous signals for low-error sampling (no false numbers due to different bits switching at different times). Non-monotonic numbers make arithmetic operations (addition, comparison) difficult.

Gray Numbers
Binary-to-Gray conversion: $g_{n-1} = a_{n-1}$, $g_i = a_i \oplus a_{i+1}$
Gray-to-binary conversion: $a_{n-1} = g_{n-1}$, $a_i = g_i \oplus a_{i+1}$
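Both conversions are bitwise XOR operations; the Python sketch below (hypothetical function names) works on unsigned integers. Binary-to-Gray is fully parallel, whereas Gray-to-binary is a serial XOR chain from the MSB downward, i.e. itself a prefix computation with the associative operator XOR.

```python
def binary_to_gray(a: int) -> int:
    # g_i = a_i XOR a_{i+1}; the MSB is copied unchanged
    return a ^ (a >> 1)

def gray_to_binary(g: int) -> int:
    # a_i = g_i XOR a_{i+1}: an XOR prefix running from the MSB down
    a = 0
    while g:
        a ^= g
        g >>= 1
    return a

for k in range(8):
    print(k, format(binary_to_gray(k), "03b"))
```

Adjacent codes printed by the loop differ in exactly one bit, which is the unit-distance property.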

Redundant Number Systems
Non-binary, redundant, weighted number systems. The digit set is larger than the radix (typically radix 2), so a number has multiple representations (redundancy). There is no carry propagation in adders, so adder-based units (multipliers, dividers, etc.) can be implemented more efficiently. Because of the redundancy, relational operators cannot be implemented directly; conversion to irredundant numbers is needed. Several bits are used to represent one digit, so storage requirements are higher. Conversion to irredundant numbers is expensive, but it is not necessary if redundant input operands are allowed.

Delayed-Carry Representation
Delayed-carry or half-adder representation: one digit holds the sum of 2 bits (no carry-out). Example: = (0,0) (1,0) = 2

Carry-Save Representation
One digit holds the sum of 3 bits, or of 1 digit and 1 bit. There is no carry-out digit; the carry is saved. This is the standard redundant number system for fast addition.
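A behavioral Python sketch of carry-save addition (function names are hypothetical): one row of full adders reduces three operands to a sum word and a carry word, and a multi-operand sum only needs a single carry-propagate addition at the very end. Python's + stands in for that final CPA.

```python
def carry_save_add(x: int, y: int, z: int):
    """One row of full adders (3:2 compression) applied to all bit columns at once.

    Returns (sum_word, carry_word) with x + y + z == sum_word + carry_word.
    No carry moves more than one position, so the delay is that of one full adder.
    """
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1  # majority, weighted one bit higher
    return sum_word, carry_word

def multi_operand_add(operands):
    """Accumulate in carry-save form; only the final step needs a carry-propagate adder."""
    s, c = 0, 0
    for x in operands:
        s, c = carry_save_add(s, c, x)
    return s + c  # final CPA, modeled here with Python's +

print(multi_operand_add([13, 7, 21, 2, 9]))  # 52
```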

Signed-Digit Representation
Signed-digit (SD) or redundant-digit (RD) number representation. No carry propagation in S = R + T: one digit holds the sum of two digits, with no carry-out.

Minimal SD representation: minimal number of non-zero digits. Applications: sequential multiplication (fewer cycles), filters with constant coefficients (less hardware). Example: minimal

Canonical SD (CSD) representation: a minimal SD representation with no two non-zero digits in sequence. SD -> binary conversion requires carry propagation (an adder). Other applications: high-speed multipliers. Similar to carry-save, but simple to use for signed numbers.
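A canonical SD form can be produced with the usual right-to-left recoding, sketched below in Python (the function names are hypothetical). Whenever the remaining value is odd, the digit is chosen as +1 or -1 so that the remainder becomes divisible by 4, which forces the next digit to be 0; this is what guarantees that no two non-zero digits are adjacent.

```python
def to_csd(n: int):
    """Canonical signed-digit digits of the integer n, LSB first, each in {-1, 0, +1}."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)   # n mod 4 == 1 -> +1, n mod 4 == 3 -> -1
            n -= d            # leaves n divisible by 4, forcing a following 0 digit
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits

def csd_value(digits):
    return sum(d << i for i, d in enumerate(digits))

print(to_csd(7))  # [-1, 0, 0, 1], i.e. 7 = 8 - 1
```

For a constant-coefficient multiplier, 7·x then costs one shift and one subtraction (8x - x) instead of two additions.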

Residue Number Systems
Non-binary, irredundant, non-weighted number system. Carry-free and fast addition and multiplication. Other arithmetic operations (e.g., comparison, sign and overflow detection) are complex and slow because the digits are not weighted; conversion to a weighted mixed-radix or binary system is required. Codes for error correction and detection. Possible (but hardly used) applications: digital filters, error detection and correction.

The base is an n-tuple of integers (m_{n-1}, m_{n-2}, …, m_0), the moduli; each digit is the residue of the number modulo m_i. The m_i are pairwise coprime. Arithmetic operations: each digit is computed separately.

The best moduli m_i are 2^k and 2^k – 1: high storage efficiency with k bits, and simple modular addition (a k-bit adder without c_out).
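A minimal RNS example in Python, assuming the hypothetical moduli (8, 7, 3), i.e. 2^3, 2^3 - 1 and 2^2 - 1, which are pairwise coprime and give a dynamic range of 8 · 7 · 3 = 168. Addition and multiplication act on each residue independently, with no carries between digits; converting back uses the Chinese remainder theorem, which shows why the conversion is the expensive part.

```python
from math import prod

MODULI = (8, 7, 3)   # hypothetical choice: 2^3, 2^3 - 1, 2^2 - 1 (pairwise coprime)
M = prod(MODULI)     # dynamic range: numbers 0 .. 167

def to_rns(x):
    return tuple(x % m for m in MODULI)

def rns_add(a, b):
    # each digit independently, no carry between digits
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))

def rns_mul(a, b):
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))

def from_rns(r):
    # Chinese remainder theorem: the costly conversion back to binary
    x = 0
    for ri, m in zip(r, MODULI):
        mi = M // m
        x += ri * mi * pow(mi, -1, m)   # modular inverse (Python 3.8+)
    return x % M

print(to_rns(23), to_rns(45))                     # (7, 2, 2) (5, 3, 0)
print(from_rns(rns_add(to_rns(23), to_rns(45))))  # 68
```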

Floating-Point Numbers
Larger range but smaller precision than fixed-point representation; inexact; real numbers. Double-number form (mantissa and exponent) => discontinuous precision. Format: S | biased exponent E | unsigned normalized mantissa M.

Floating-Point Numbers
Basic arithmetic operations are based on fixed-point add, multiply, and shift operations; post-normalization is required. Applications: processors use real floating-point formats (e.g., the IEEE standard) with a large range due to universal use. ASICs usually use simplified floating-point formats with small exponents and a smaller range, used for range extension of normal fixed-point numbers. IEEE floating-point format:
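For IEEE 754 single precision (binary32), a normalized value with 0 < E < 255 equals (-1)^S · 1.M · 2^(E - 127). The Python sketch below (hypothetical function name) uses the standard struct module to extract the three fields from a float and rebuild its value; zeros, subnormals, infinities and NaNs are deliberately not handled.

```python
import struct

def decode_binary32(x: float):
    """Split x (rounded to IEEE 754 single precision) into (S, E, M) and rebuild its value."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = bits >> 31
    e = (bits >> 23) & 0xFF      # biased exponent
    m = bits & 0x7FFFFF          # stored mantissa fraction; the leading 1 is hidden
    if e in (0, 0xFF):
        raise ValueError("zero, subnormal, infinity and NaN are not handled in this sketch")
    value = (-1) ** s * (1 + m / 2**23) * 2.0 ** (e - 127)
    return s, e, m, value

print(decode_binary32(-6.5))  # (1, 129, 5242880, -6.5)
```

Here -6.5 = -1.101b · 2^2, so the stored exponent is 2 + 127 = 129 and the fraction bits are 101 followed by zeros.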

Logarithmic Number System
Alternative representation to floating point (mantissa + integer exponent -> fixed-point exponent only). Single-number form => continuous precision => higher accuracy, more reliable. Basic arithmetic operations: (A < B) = (E_A < E_B), additionally considering the sign; A + B by approximation, or by addition in a conventional number system with double conversion.

Logarithmic Number System
Basic arithmetic operations: simpler multiplication and exponentiation, more complex addition. Expensive conversion: (anti)logarithms, probably by table look-up. Applications: real-time digital filters.
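In a logarithmic number system a positive value A is stored as E_A = log2 A. The Python sketch below is only illustrative: a floating-point log2 stands in for the fixed-point exponent, and only positive values are handled. Comparison and multiplication reduce to comparing and adding exponents, while addition needs the nonlinear correction log2(1 + 2^d). Using that correction function is one common approach to "A + B by approximation" and is an assumption here; hardware would typically tabulate it.

```python
import math

def to_lns(x: float) -> float:
    return math.log2(x)           # positive values only in this sketch

def from_lns(e: float) -> float:
    return 2.0 ** e

def lns_mul(ea: float, eb: float) -> float:
    return ea + eb                # multiplication becomes an addition

def lns_add(ea: float, eb: float) -> float:
    hi, lo = max(ea, eb), min(ea, eb)
    # log2(A + B) = hi + log2(1 + 2^(lo - hi)); typically a table look-up in hardware
    return hi + math.log2(1.0 + 2.0 ** (lo - hi))

a, b = to_lns(3.0), to_lns(5.0)
print(a < b, from_lns(lns_mul(a, b)), from_lns(lns_add(a, b)))
```

The printed product and sum are 15 and 8 up to floating-point rounding.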

Antitetrational Number System
Tetration (t.x = and antitetration (a.t.x). Larger range but smaller precision than the logarithmic representation; otherwise analogous. Note that all these systems can be mixed in composite arithmetic. The choice of number representation should be hidden from the user; the compiler should handle it. Rational numbers can also be represented in floating-slash notation.

Round-Off Schemes
Intermediate results carry d additional lower bits, which gives higher accuracy. Rounding keeps the error e small during the final word-length reduction. Trade-off: numerical accuracy vs. implementation cost. Truncation: biased, i.e., non-zero average error e. Round to nearest (normal rounding).

Round-Off Schemes
Round to nearest: the error is nearly symmetric, and the rounding increment can often be included in a previous operation. Round to nearest even/odd: bias = 0 (symmetric); mandatory in the IEEE floating-point standard. Three extra bits are used for rounding after floating-point operations: guard bit G (post-normalization), round bit R (round to nearest), sticky bit S (round to nearest even).
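A Python sketch contrasting truncation with round-to-nearest-even when d low bits are dropped from a non-negative integer (function names hypothetical). Post-normalization, and therefore the G bit, is not modeled: the sketch assumes the value is already normalized, so only the round bit R and the sticky bit S decide.

```python
def truncate(x: int, d: int) -> int:
    """Drop the d low bits of a non-negative integer (always rounds down)."""
    return x >> d

def round_nearest_even(x: int, d: int) -> int:
    """Drop d >= 1 low bits of a non-negative integer, rounding to nearest, ties to even."""
    kept = x >> d
    round_bit = (x >> (d - 1)) & 1                    # R: first dropped bit (half an ulp)
    sticky = int((x & ((1 << (d - 1)) - 1)) != 0)     # S: OR of all remaining dropped bits
    if round_bit and (sticky or (kept & 1)):          # above half, or a tie with odd LSB
        kept += 1
    return kept

xs = range(4096)   # every 12-bit value, reduced to 9 bits (d = 3)
print(sum(truncate(x, 3) * 8 - x for x in xs))            # -14336: truncation is biased
print(sum(round_nearest_even(x, 3) * 8 - x for x in xs))  # 0: ties to even are unbiased
```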

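Addition

Single-Bit Addition
Half adder: S = A ⊕ B, Cout = A • B

A B | Cout S
0 0 |  0   0
0 1 |  0   1
1 0 |  0   1
1 1 |  1   0

Full adder: S = A ⊕ B ⊕ C, Cout = MAJ(A, B, C)

A B C | Cout S
0 0 0 |  0   0
0 0 1 |  0   1
0 1 0 |  0   1
0 1 1 |  1   0
1 0 0 |  0   1
1 0 1 |  1   0
1 1 0 |  1   0
1 1 1 |  1   1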
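The half-adder and full-adder equations can be written directly as bit operations. The Python sketch below (hypothetical function names) returns (cout, s), the 2-bit count of ones at the inputs, and prints the full-adder truth table.

```python
def half_adder(a: int, b: int):
    """Returns (cout, s): cout = A AND B, s = A XOR B."""
    return a & b, a ^ b

def full_adder(a: int, b: int, c: int):
    """Returns (cout, s): cout = MAJ(A, B, C), s = A XOR B XOR C."""
    s = a ^ b ^ c
    cout = (a & b) | (a & c) | (b & c)
    return cout, s

# Print the full-adder truth table: A B C Cout S
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            print(a, b, c, *full_adder(a, b, c))
```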

1-Bit Adders
Add up m bits of the same magnitude and output the sum as a k-bit number (k = ⌈log2(m + 1)⌉), i.e., count the 1s at the inputs => (m,k) counter (combinational counter). A half adder is a (2,2) counter.

1-Bit Adders

1-Bit Adders
A full adder is a (3,2) counter.

PGK
For a full adder, define what happens to carries (in terms of A and B):
Generate: Cout = 1 independent of C; G = A • B
Propagate: Cout = C; P = A ⊕ B
Kill: Cout = 0 independent of C; K = ~A • ~B
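A small Python check of the PGK classification (hypothetical function names): for every (A, B) exactly one of G, P, K is true, and the carry-out follows as Cout = G + P · C.

```python
def pgk(a: int, b: int):
    """Classify a bit position by what it does to an incoming carry."""
    g = a & b              # generate: cout = 1 regardless of C
    p = a ^ b              # propagate: cout = C
    k = (1 - a) & (1 - b)  # kill: cout = 0 regardless of C
    return g, p, k

def carry_out(a: int, b: int, c: int) -> int:
    g, p, _ = pgk(a, b)
    return g | (p & c)     # K needs no term: if neither G nor P holds, cout = 0

for a in (0, 1):
    for b in (0, 1):
        print(a, b, pgk(a, b))
```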

Full Adder Design I
Brute-force implementation from the equations.

Full Adder Design II
Factor S in terms of Cout: S = ABC + (A + B + C)(~Cout). The critical path is usually C to Cout in a ripple adder.
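The factored sum can be checked exhaustively in Python (hypothetical function name). ~Cout is 1 exactly when at most one input is 1, so (A + B + C) · ~Cout covers the single-one cases, and ABC covers the all-ones case.

```python
def factored_full_adder(a: int, b: int, c: int):
    """Full adder whose sum reuses the carry, as in the factored equation."""
    cout = (a & b) | (c & (a | b))                  # majority written as AB + C(A + B)
    s = (a & b & c) | ((a | b | c) & (1 - cout))    # S = ABC + (A + B + C) * ~Cout
    return cout, s

for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            cout, s = factored_full_adder(a, b, c)
            assert 2 * cout + s == a + b + c
print("factored sum matches A + B + C for all 8 input combinations")
```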

Full Adder Design II
Same circuit with sized transistors.

Layout
A clever layout circumvents the usual line of diffusion. Use wide transistors on the critical path. Eliminate output inverters.

Full Adder Design III
Complementary Pass Transistor Logic (CPL): slightly faster, but more area.

Full Adder Design III
Transmission gates.

Full Adder Design IV
Dual-rail domino: very fast, but large and power hungry. Used in very fast multipliers.

(m,k) Counters
Usually built from full adders. Associativity of addition allows conversion from a linear to a tree structure, which is faster with the same number of FAs.

(7,3) Counter Example
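A (7,3) counter can be built from four full adders; the Python sketch below shows one possible arrangement (an assumption, since the slide's figure is not available; names are hypothetical). Two full adders reduce six of the weight-1 inputs, a third combines their sums with the seventh input, and a fourth adds the three weight-2 carries, so the longest path passes through three full adders. The output width k = ⌈log2(m + 1)⌉ equals m.bit_length() in Python.

```python
def full_adder(a: int, b: int, c: int):
    return (a & b) | (a & c) | (b & c), a ^ b ^ c   # (carry, sum)

def counter_7_3(x):
    """(7,3) counter from four full adders: returns (y2, y1, y0) with 4*y2 + 2*y1 + y0 = sum(x)."""
    assert len(x) == 7
    c1, s1 = full_adder(x[0], x[1], x[2])   # weight-1 column, first level
    c2, s2 = full_adder(x[3], x[4], x[5])
    c3, y0 = full_adder(s1, s2, x[6])       # weight-1 column, second level
    y2, y1 = full_adder(c1, c2, c3)         # the three weight-2 carries
    return y2, y1, y0

print(counter_7_3([1, 1, 0, 1, 1, 1, 0]))          # five ones -> (1, 0, 1)
print([(m, m.bit_length()) for m in (2, 3, 7, 15)])  # output widths k of (m,k) counters
```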

Carry Propagate Adders
Add two n-bit operands A and B and an optional carry-in c_in by performing carry propagation. The sum (c_out, S) is an irredundant (n+1)-bit number.

Carry Propagate Adders
An N-bit adder is called a CPA. Each sum bit depends on all previous carries. How do we compute all these carries quickly?

Ripple-Carry Adder (RCA)
Serial arrangement of n full adders. The simplest, smallest, and slowest CPA structure.

Carry-Ripple Adder
Simplest design: cascade full adders. The critical path goes from Cin to Cout, so design the full adder to have a fast carry delay.

Carry-Ripple Adder
Note that the worst-case delay is linear in the number of bits. Goal: make the fastest possible carry-path circuit.
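A behavioral Python sketch of the n-bit ripple-carry adder (hypothetical function name): each loop iteration is one full adder, and the carry produced by bit i is needed before bit i + 1 can finish, which is why the worst-case delay grows linearly with n.

```python
def ripple_carry_add(a: int, b: int, n: int, cin: int = 0):
    """n-bit ripple-carry adder: one full adder per bit, carry passed serially LSB -> MSB."""
    s, c = 0, cin
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s |= (ai ^ bi ^ c) << i
        c = (ai & bi) | (ai & c) | (bi & c)   # c_{i+1}: one full-adder carry delay per bit
    return s, c   # (n-bit sum, carry-out)

print(ripple_carry_add(0b1011, 0b0110, 4))  # (1, 1): 11 + 6 = 17 = 0b1_0001
```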

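A Full Adder Circuit

Inversion Property
Inverting all inputs of a full adder inverts all of its outputs: $\overline{S}(A, B, C) = S(\bar{A}, \bar{B}, \bar{C})$ and $\overline{C_{out}}(A, B, C) = C_{out}(\bar{A}, \bar{B}, \bar{C})$.

Inversions
The critical path passes through the majority gate, which is built from a minority gate plus an inverter. Eliminate the inverter and use an inverting full adder.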
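The Python sketch below (hypothetical names) models a ripple adder made of inverting full adders, which deliver ~Cout and ~S. Even stages receive true inputs and pass on a complemented carry; odd stages receive complemented A and B, and by the inversion property the inverting full adder then delivers the true carry and sum again. No inverter is needed on the carry path; only the sum bits of even stages need one.

```python
def full_adder(a: int, b: int, c: int):
    return (a & b) | (a & c) | (b & c), a ^ b ^ c   # (cout, s)

def inverting_full_adder(a: int, b: int, c: int):
    """Full adder without output inverters: returns (~cout, ~s)."""
    cout, s = full_adder(a, b, c)
    return 1 - cout, 1 - s

def inverting_ripple_add(a: int, b: int, n: int, cin: int = 0):
    """Ripple adder of inverting full adders; odd stages get complemented A and B inputs."""
    s, c = 0, cin                   # c is true at even stages, complemented at odd stages
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if i % 2 == 0:
            c, s_n = inverting_full_adder(ai, bi, c)      # c now holds ~c_{i+1}
            s |= (1 - s_n) << i                           # sum inverter, off the carry path
        else:
            # FA(~a, ~b, ~c) = (~cout, ~s), so the inverting FA restores true polarity
            c, s_i = inverting_full_adder(1 - ai, 1 - bi, c)
            s |= s_i << i
    return s, ((1 - c) if n % 2 else c)

print(inverting_ripple_add(0b1011, 0b0110, 4))  # (1, 1): same result as a plain ripple adder
```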

Mirror Adder

Mirror Adder
The NMOS and PMOS chains are completely symmetrical. A maximum of two series transistors can be observed in the carry-generation circuit. When laying out the cell, the most critical issue is minimizing the capacitance at node Co; reducing the diffusion capacitances is particularly important. The capacitance at node Co is composed of four diffusion capacitances, two internal gate capacitances, and six gate capacitances in the connecting adder cell.

Mirror Adder
The transistors connected to Ci are placed closest to the output. Only the transistors in the carry stage have to be optimized for speed; all transistors in the sum stage can be minimal size.

Transmission Gate FA

Carry Propagation Speed-up
Concatenation of partial CPAs with fast c_in -> c_out. Fast carry-lookahead logic for the entire range of bits.

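Generate / Propagate
Equations are often factored into G and P.
Generate and propagate for groups spanning bits i:j (i ≥ k > j):
$G_{i:j} = G_{i:k} + P_{i:k} \cdot G_{k-1:j}$
$P_{i:j} = P_{i:k} \cdot P_{k-1:j}$
Base case:
$G_{i:i} \equiv G_i = A_i \cdot B_i$, $P_{i:i} \equiv P_i = A_i \oplus B_i$
$G_{0:0} = G_0 = C_{in}$, $P_{0:0} = P_0 = 0$
Sum:
$S_i = P_i \oplus G_{i-1:0}$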
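The group generate/propagate algebra can be exercised in Python. The sketch below (all names hypothetical) uses the convention that position 0 holds the carry-in with (G, P) = (C_in, 0) and positions 1..n hold operand bits 0..n-1. It combines (G, P) pairs with the group-combining operator, evaluates every G_{i:0} serially (the carry-ripple case), and forms S_i = P_i XOR G_{i-1:0}. Any other prefix network can replace serial_prefix without changing the rest.

```python
def dot(hi, lo):
    """(G, P) of bits i:k combined with (G, P) of bits k-1:j gives (G, P) of bits i:j."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo

def bitwise_pg(a: int, b: int, n: int, cin: int = 0):
    """Position 0 holds the carry-in (G = cin, P = 0); positions 1..n hold operand bits 0..n-1."""
    pg = [(cin, 0)]
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        pg.append((ai & bi, ai ^ bi))
    return pg

def serial_prefix(pg):
    """Carry-ripple: G_{i:0} = G_i + P_i * G_{i-1:0}, one dot operation after another."""
    out = [pg[0]]
    for x in pg[1:]:
        out.append(dot(x, out[-1]))
    return out

def sum_from_prefix(pg, prefix):
    """S_i = P_i XOR G_{i-1:0}; the carry-out is G_{n:0}."""
    n = len(pg) - 1
    s = sum((pg[i][1] ^ prefix[i - 1][0]) << (i - 1) for i in range(1, n + 1))
    return s, prefix[n][0]

pg = bitwise_pg(0b1011, 0b0110, 4)
print(sum_from_prefix(pg, serial_prefix(pg)))  # (1, 1): 11 + 6 = 17
```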

PG Logic

Carry-Ripple Revisited
$G_{i:0} = G_i + P_i \cdot G_{i-1:0}$

Carry-Ripple PG Diagram

PG Diagram Notation

Manchester Carry Chain

Carry-Skip Adder
Carry-ripple is slow through all N stages. Carry-skip allows the carry to skip over groups of n bits; the decision is based on the n-bit group propagate signal.

Carry-Skip Adder

Carry-Skip PG Diagram
For k n-bit groups (N = nk).

Variable Group Size
Delay grows as O(sqrt(N)).

Carry-Skip Adder
Partial CPA with fast c_k -> c_i.
If P_{i-1:k} = 0: c_k does not propagate to c'_i, and c'_i is selected, becoming c_i.
If P_{i-1:k} = 1: c_k propagates to c'_i, but c'_i is skipped and c_k is selected directly.
The path c_k -> c'_i -> c_i is therefore never sensitized, which makes c_k -> c_i fast. This false path implies inherent logic redundancy, which causes problems in circuit optimization, timing analysis, and testing.

Carry-Skip Adder
Variable group sizes are faster: use larger groups in the middle to minimize the delays a_0 -> c_k -> s_{i-1} and a_k -> c_i -> s_{n-1}. The partial CPA type is RCA or CSKA (multilevel CSKA). Medium speed-up at small hardware overhead (+ AND/bit + MUX/group).
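A behavioral Python sketch of a carry-skip adder with ripple-carry groups (hypothetical function name and parameters). The skip multiplexer forwards the group carry-in c_k whenever the whole group propagates.

```python
def carry_skip_add(a: int, b: int, n: int, group: int, cin: int = 0):
    """Behavioral n-bit carry-skip adder with ripple-carry groups of `group` bits."""
    s, c = 0, cin
    for base in range(0, n, group):
        c_k = c                           # group carry-in
        group_p = 1                       # P_{i-1:k}: AND of the bit propagates
        for i in range(base, min(base + group, n)):
            ai, bi = (a >> i) & 1, (b >> i) & 1
            p = ai ^ bi
            s |= (p ^ c) << i
            c = (ai & bi) | (p & c)       # ripple carry; after the loop this is c'_i
            group_p &= p
        c = c_k if group_p else c         # skip multiplexer: c_i = P ? c_k : c'_i
    return s, c

print(carry_skip_add(0b101101, 0b010011, 6, 3))  # (0, 1): 45 + 19 = 64
```

The multiplexer never changes the logical value (when every bit propagates, the ripple carry c'_i already equals c_k). This is exactly the logic redundancy noted above: the skip only helps timing.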

CSKA + Manchester

Carry-Select Adder
Trick for critical paths that depend on a late input X: precompute the two possible outputs for X = 0 and X = 1, then select the proper output when X arrives. The carry-select adder precomputes the n-bit sums for both possible carries into each n-bit group.

Carry-Select Adder
Partial CPA with fast c_k -> c_i and c_k -> s_{i-1:k}. Two CPAs compute the two possible results (c_in = 0/1); the group carry-in c_k selects the correct one afterwards. Variable group sizes are faster; use larger groups at the end (MSB). Balance the delays a_0 -> c_k and a_k -> c^0_i. The partial CPA type is RCA, CSLA (multilevel CSLA), or CLA.

Carry-Select Adder
High speed-up at high hardware overhead (+ MUX/bit + (CPA + MUX)/group).
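A behavioral Python sketch of a carry-select adder with variable group sizes (function names hypothetical). Each group's partial CPA is modeled with Python's +. Both results are computed before the group carry-in is known, and c_k only drives the selection.

```python
def partial_cpa(a: int, b: int, cin: int, width: int):
    """Behavioral model of one group's carry-propagate adder: (width-bit sum, carry-out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width

def carry_select_add(a: int, b: int, group_sizes, cin: int = 0):
    """Carry-select adder; group_sizes lists the group widths from LSB to MSB."""
    s, c, base = 0, cin, 0
    for w in group_sizes:
        mask = (1 << w) - 1
        ga, gb = (a >> base) & mask, (b >> base) & mask
        s0, c0 = partial_cpa(ga, gb, 0, w)    # precomputed for carry-in 0
        s1, c1 = partial_cpa(ga, gb, 1, w)    # precomputed for carry-in 1
        gs, c = (s1, c1) if c else (s0, c0)   # c_k arrives late and only drives the muxes
        s |= gs << base
        base += w
    return s, c

print(carry_select_add(0x1234, 0x0FCD, [2, 3, 4, 5]))  # (8705, 0)
```

The group sizes [2, 3, 4, 5] grow toward the MSB, following the advice above to use larger groups at the end.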

Carry-Select Adder

Linear Carry-Select

Square-Root Carry-Select

Delay Comparison

Carry-Increment Adder
Partial CPA with fast c_k -> c_i and c_k -> s_{i-1:k}. The result is incremented after the addition if c_k = 1. Variable group sizes are faster; use larger groups at the end (MSB). Balance the delays a_0 -> c_k and a_k -> c'_i. The partial CPA could be RCA, CIA (multilevel CIA), or CLA. High speed-up at medium hardware overhead (+ AND/bit + (incrementer + AND-OR)/group). The logic of the CPA and the incrementer can be merged.
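The carry-increment idea, sketched behaviorally in Python (hypothetical function name). Each group uses one partial CPA with carry-in 0 instead of two, and an incrementer adds the late group carry-in afterwards. The group carry-out is c0 OR (group sum all ones AND c_k); the two cases can never occur together.

```python
def carry_increment_add(a: int, b: int, group_sizes, cin: int = 0):
    """Carry-increment adder: each group adds with carry-in 0, then increments if c_k = 1."""
    s, c, base = 0, cin, 0
    for w in group_sizes:
        mask = (1 << w) - 1
        ga, gb = (a >> base) & mask, (b >> base) & mask
        total = ga + gb                  # partial CPA with carry-in 0 (behavioral)
        s0, c0 = total & mask, total >> w
        all_ones = int(s0 == mask)       # the incrementer's carry ripples through s0
        s |= ((s0 + c) & mask) << base   # incrementer: add the late group carry-in c_k
        c = c0 | (all_ones & c)          # c0 and an increment overflow never occur together
        base += w
    return s, c

print(carry_increment_add(0x1234, 0x0FCD, [2, 3, 4, 5]))  # (8705, 0)
```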

Example: gate-level schematic of a carry-increment adder (CIA). Only two different logic cells (bit slices) are needed: IHA and IFA.

Factor the initial PG and final XOR out of the carry-select.

Variable Group Size
Also buffer noncritical signals.

Conditional-Sum Adder
Optimized multilevel CSLA with log n levels. The correct sum bits ($s_i^0$ or $s_i^1$) are conditionally selected through log n levels of multiplexers, with bit groups of size 2^l at level l. Higher parallelism and more balanced signal paths. Highest speed-up at highest hardware overhead (2 RCA + more than log n MUX/bit).
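A behavioral Python sketch of the conditional-sum adder (hypothetical function name), assuming n is a power of two. Every group stores its (sum, carry-out) for both possible carry-ins. At each of the log2 n levels, pairs of groups are merged: the lower group's conditional carry-out selects which of the upper group's two results to keep.

```python
def conditional_sum_add(a: int, b: int, n: int, cin: int = 0):
    """n must be a power of two. Each group stores {carry_in: (sum, carry_out)}."""
    assert n > 0 and (n & (n - 1)) == 0
    groups = []
    for i in range(n):                   # level 0: single bits, both carry-in cases
        ai, bi = (a >> i) & 1, (b >> i) & 1
        groups.append({c: (ai ^ bi ^ c, (ai & bi) | ((ai ^ bi) & c)) for c in (0, 1)})
    width = 1
    while len(groups) > 1:               # one multiplexer level per iteration
        merged = []
        for lo, hi in zip(groups[0::2], groups[1::2]):
            entry = {}
            for c in (0, 1):
                s_lo, c_lo = lo[c]
                s_hi, c_hi = hi[c_lo]    # lower carry-out selects the upper result
                entry[c] = ((s_hi << width) | s_lo, c_hi)
            merged.append(entry)
        groups, width = merged, 2 * width
    return groups[0][cin]

print(conditional_sum_add(0xFF, 0x01, 8))  # (0, 1)
```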

Conditional-Sum Adder

Carry-Lookahead Adder
Carries look ahead before the sum bits are computed. Hierarchical arrangement using levels: group generate/propagate signals (G_{i:k}, P_{i:k}) are passed up and carries (c'_0) are passed down between levels. High speed-up at medium hardware overhead.

The carry-lookahead adder computes G_{i:0} for many bits in parallel, using higher-valency cells with more than two inputs.
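A Python sketch of a two-level 16-bit carry-lookahead adder built from 4-bit groups (function names hypothetical). The function lookahead evaluates the flat AND-OR expression of a higher-valency cell, for example G_{3:0} = G3 + P3 G2 + P3 P2 G1 + P3 P2 P1 G0. The Python loop inside it only evaluates that expression; a hardware cell forms all terms in parallel.

```python
def lookahead(pg):
    """Flat AND-OR lookahead over (G, P) pairs given MSB first: returns the group (G, P).

    For four pairs: G = G3 + P3 G2 + P3 P2 G1 + P3 P2 P1 G0 and P = P3 P2 P1 P0.
    """
    g_out, p_run = 0, 1
    for g, p in pg:
        g_out |= p_run & g
        p_run &= p
    return g_out, p_run

def cla16(a: int, b: int, cin: int = 0):
    """Two-level 16-bit carry-lookahead adder built from 4-bit groups."""
    bits = [((a >> i) & (b >> i) & 1, ((a ^ b) >> i) & 1) for i in range(16)]
    # Level 1: valency-4 group (G, P) for each 4-bit group, all groups in parallel
    groups = [lookahead(bits[4 * k:4 * k + 4][::-1]) for k in range(4)]
    # Level 2: carry into each group, looked ahead directly from cin (G = cin, P = 0)
    c_grp = [lookahead(groups[:k][::-1] + [(cin, 0)])[0] for k in range(5)]
    # Level 3: carries inside each group from its group carry-in, then the sum bits
    s = 0
    for k in range(4):
        for j in range(4):
            i = 4 * k + j
            c_i = lookahead(bits[4 * k:i][::-1] + [(c_grp[k], 0)])[0]
            s |= (bits[i][1] ^ c_i) << i
    return s, c_grp[4]

print(cla16(0xFFFF, 0x0001))  # (0, 1)
```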

CLA PG Diagram

Carry-Lookahead

Lookahead Tree

Higher-Valency Cells

Higher Valency PG Diagram

Tree Adder
If lookahead is good, lookahead across lookahead! Recursive lookahead gives O(log N) delay. Many variations on tree adders exist.

Parallel Prefix Adders
Universal adder architecture comprising RCA, CIA, CLA, and more (the entire range of area-delay trade-offs from the slowest RCA to the fastest CLA). It consists of a preprocessing, a carry-lookahead, and a postprocessing step; the carries are calculated using parallel-prefix algorithms. High regularity: suitable for synthesis and layout. High flexibility: special adders, other arithmetic operations, exchangeable prefix algorithms. High performance: smallest and fastest adders.

Parallel Prefix Adders

Prefix Problem
Inputs (x_{n-1}, …, x_0), outputs (y_{n-1}, …, y_0), associative binary operator •, with $y_i = x_i \bullet x_{i-1} \bullet \dots \bullet x_0$. Associativity of • => tree structures for evaluation.

Prefix Problem
Group variables $Y^l_{i:k}$ cover bits $x_i, \dots, x_k$ at level l.
Carry propagation is a prefix problem with $x_i = (G_i, P_i)$, $y_i = (G_{i:0}, P_{i:0})$ and $(g, p) \bullet (g', p') = (g + p \cdot g', \; p \cdot p')$.
Parallel-prefix algorithms: multi-tree structures reduce T = O(n) to O(log n); sharing subtrees reduces A = O(n^2) to O(n log n). Different algorithms trade area against delay; wiring and fan-out must also be considered.

Prefix Algorithms
Algorithms are visualized by directed acyclic graphs (DAGs) with an array structure (n bits x m levels). Graph vertex symbols. Performance measures: A• = graph size (number of black nodes); T• = graph depth (number of black nodes on the critical path).

Prefix Algorithms
Serial prefix algorithm (RCA).

Prefix Algorithms
Sklansky parallel-prefix algorithm (PPA-SK): tree-like collection, parallel redistribution of carries.
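All prefix algorithms compute y_i = x_i • x_{i-1} • … • x_0 for an associative operator; when the operator combines (G, P) pairs, the y_i are the carries G_{i:0}. The Python sketch below implements the Sklansky graph for any associative operator passed as op(hi, lo) (a hypothetical interface) and counts the black nodes (A•) and the depth (T•). String concatenation is used in the example because it is associative but not commutative, so it checks that every node combines its operands in the right order.

```python
def sklansky_prefix(xs, op):
    """Sklansky prefix graph: y[i] = xs[i] op xs[i-1] op ... op xs[0].

    op(hi, lo) must be associative; returns (y, node_count, depth).
    """
    n = len(xs)
    y, depth = list(xs), [0] * n
    nodes, span = 0, 1
    while span < n:
        for i in range(n):
            if i & span:                       # i lies in the upper half of a 2*span block
                j = (i // span) * span - 1     # top of the lower half: high fan-out source
                y[i] = op(y[i], y[j])
                depth[i] = max(depth[i], depth[j]) + 1
                nodes += 1
        span *= 2
    return y, nodes, max(depth, default=0)

y, nodes, depth = sklansky_prefix(list("abcdefgh"), lambda hi, lo: hi + lo)
print(y[-1], nodes, depth)  # hgfedcba 12 3
```

For n a power of two the sketch uses (n/2) log2 n nodes at depth log2 n; the price is the large fan-out of the node at the top of each lower half.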

Sklansky

Prefix Algorithms
Brent-Kung parallel-prefix algorithm (PPA-BK): tree-like redistribution of carries (fan-out tree). The traditional CLA is PPA-BK with 4-bit groups.
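A Python sketch of the Brent-Kung graph with the same hypothetical op(hi, lo) interface. An up-sweep builds a binary reduction tree and a down-sweep fans the partial results back out, so every node has fan-out 2 and the node count stays near 2n, at the cost of roughly twice the depth of Sklansky.

```python
def brent_kung_prefix(xs, op):
    """Brent-Kung prefix graph; op(hi, lo) must be associative; returns (y, node_count, depth)."""
    n = len(xs)
    y, depth = list(xs), [0] * n
    nodes = 0

    def node(i, j):
        nonlocal nodes
        y[i] = op(y[i], y[j])
        depth[i] = max(depth[i], depth[j]) + 1
        nodes += 1

    span = 1
    while span < n:                                  # up-sweep: binary reduction tree
        for i in range(2 * span - 1, n, 2 * span):
            node(i, i - span)
        span *= 2
    span //= 2
    while span >= 1:                                 # down-sweep: redistribute to the gaps
        for i in range(3 * span - 1, n, 2 * span):
            node(i, i - span)
        span //= 2
    return y, nodes, max(depth, default=0)

y, nodes, depth = brent_kung_prefix(list("abcdefghijklmnop"), lambda hi, lo: hi + lo)
print(y[-1], nodes, depth)  # ponmlkjihgfedcba 26 6
```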

Brent-Kung

Prefix Algorithms
Kogge-Stone parallel-prefix algorithm (PPA-KS): very high wiring requirements.
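A Python sketch of the Kogge-Stone graph, again with the hypothetical op(hi, lo) interface. At level l every position i ≥ 2^l combines with position i - 2^l of the previous level. This gives minimum depth log2 n and fan-out 2, but many nodes, and in a layout many long wires crossing each level.

```python
def kogge_stone_prefix(xs, op):
    """Kogge-Stone prefix graph; op(hi, lo) must be associative; returns (y, node_count, depth)."""
    n = len(xs)
    y, depth = list(xs), [0] * n
    nodes, span = 0, 1
    while span < n:
        prev, prev_depth = list(y), list(depth)   # all nodes of a level read the previous level
        for i in range(span, n):
            y[i] = op(prev[i], prev[i - span])
            depth[i] = max(prev_depth[i], prev_depth[i - span]) + 1
            nodes += 1
        span *= 2
    return y, nodes, max(depth, default=0)

y, nodes, depth = kogge_stone_prefix(list("abcdefghijklmnop"), lambda hi, lo: hi + lo)
print(y[-1], nodes, depth)  # ponmlkjihgfedcba 49 4
```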

Kogge-Stone

Prefix Algorithms
Carry-increment parallel-prefix algorithm.

Prefix Algorithms
Mixed serial/parallel-prefix algorithm (RCA + PPA): linear size-depth trade-off using a parameter k. k = 0: serial prefix graph; largest k: Brent-Kung parallel-prefix graph. Fills the gap between RCA and PPA-BK (CLA) in steps of single •-operations.

Prefix Algorithms

Prefix Algorithms
Example: 4-bit PPA-SK.
Efficient AND-OR-prefix circuit for the generate signals and AND-prefix circuit for the propagate signals. Optimization: alternating AOI/OAI gates (generate) and NAND/NOR gates (propagate), since inverting gates are smaller and faster. Can also be realized using two MUX-prefix circuits.

Prefix Algorithms

Prefix Algorithms
Prefix adders can be synthesized by hand or by computer. Starting from a serial structure, compression and expansion rules can be applied to obtain new graphs. This generates all previous graphs except PPA-KS: a universal adder synthesis approach.

Tree Adder Taxonomy
An ideal N-bit tree adder would have L = log2 N logic levels, fan-out never exceeding 2, and no more than one wiring track between levels. Describe an adder with a 3-D taxonomy (l, f, t): logic levels L + l, fan-out 2^f + 1, wiring tracks 2^t. Known tree adders sit on the plane defined by l + f + t = L - 1.
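The taxonomy formulas can be turned into a tiny Python calculator (hypothetical function name). It assumes a point (l, f, t) on the plane l + f + t = L - 1 and returns logic levels L + l, maximum fan-out 2^f + 1, and wiring tracks 2^t. For N = 16 (L = 4), the three corner points reproduce the Brent-Kung, Sklansky, and Kogge-Stone characteristics.

```python
import math

def tree_adder_cost(n_bits: int, l: int, f: int, t: int):
    """Logic levels, max fan-out and wiring tracks for taxonomy point (l, f, t)."""
    L = int(math.log2(n_bits))
    if l + f + t != L - 1:
        raise ValueError("this sketch only accepts points on the plane l + f + t = L - 1")
    return L + l, 2 ** f + 1, 2 ** t

N = 16   # L = 4
for name, point in [("Brent-Kung", (3, 0, 0)), ("Sklansky", (0, 3, 0)), ("Kogge-Stone", (0, 0, 3))]:
    print(name, tree_adder_cost(N, *point))
# Brent-Kung (7, 2, 1), Sklansky (4, 9, 1), Kogge-Stone (4, 2, 8)
```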

Tree Adder Taxonomy

Han-Carlson

Knowles [2, 1, 1, 1]

Ladner-Fischer

Taxonomy Revisited

More Adder Issues
Multilevel adders: multilevel versions of CSKA, CSLA, and CIA are possible.
Hybrid adders: arbitrary combinations of speed-up techniques are possible; an often-used combination is CLA – CSLA.
Transistor-level adders: influence of logic styles (dynamic logic, pass-transistor logic); efficient transistor-level implementation of ripple-carry chains (Manchester chain).
Combinations of speed-up techniques make sense but require much higher design effort. Many efficient implementations exist in the literature. Higher valency (radix) is also possible.

More Adder Issues
Higher valency is a poor choice in static CMOS logic, since each stage has a higher delay. However, if the stages are built using domino logic, it can be an advantage. Nodes with large fan-out or long wires can use buffers. The prefix trees can also be internally pipelined.

Transistor Level

Higher Valency Adders

Sparse Trees
Building a prefix tree to compute the carry into every bit is expensive in terms of power. An alternative is to compute carries only into short groups of s bits (e.g., s = 2, 3, 8, or 16). Meanwhile, pairs of s-bit adders precompute the sums assuming both carry-in 0 and carry-in 1 for each group. The result is a hybrid between a prefix adder and a carry-select adder.
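A behavioral Python sketch of a sparse-tree adder (hypothetical function name). Carries are produced only into bit positions 0, s, 2s, … and each s-bit group selects between sums precomputed for carry-in 0 and 1. The group carries are evaluated serially here for brevity; in the hardware the slide describes they come from a sparse prefix tree.

```python
def sparse_tree_add(a: int, b: int, n: int, s: int, cin: int = 0):
    """Behavioral sparse-tree adder: carries only into every s-th bit, carry-select sums."""
    # Group (G, P) for each s-bit group
    group_pg = []
    for base in range(0, n, s):
        G, P = 0, 1
        for i in range(base, min(base + s, n)):
            ai, bi = (a >> i) & 1, (b >> i) & 1
            G, P = (ai & bi) | ((ai ^ bi) & G), P & (ai ^ bi)
        group_pg.append((G, P))
    # Carries into the groups (the sparse prefix tree, evaluated serially here)
    carries = [cin]
    for G, P in group_pg:
        carries.append(G | (P & carries[-1]))
    # Each group's sum is precomputed for carry-in 0 and 1, then selected
    mask = (1 << s) - 1
    result = 0
    for k, base in enumerate(range(0, n, s)):
        ga, gb = (a >> base) & mask, (b >> base) & mask
        sum0, sum1 = (ga + gb) & mask, (ga + gb + 1) & mask
        result |= (sum1 if carries[k] else sum0) << base
    return result & ((1 << n) - 1), carries[-1]

print(sparse_tree_add(0xABCD, 0x1234, 16, 4))  # (52737, 0)
```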

Valency-3 BK Adder
Sparse tree adder with s = 3.

Carry-Select Implementation

Sparse Tree Adders Intel Valency-2 Sklansky sparse tree adder with s=4

Sparse Tree Adders Valency-3 Kogge-Stone sparse tree adder with s=3

Ling Adders
Ling discovered a technique to remove one series transistor from the critical group-generate path, at the expense of another XOR gate in the sum precomputation. Define a pseudo-generate H_{i:j} = G_i + G_{i-1:j}, which is simpler to compute. Define a pseudo-propagate signal I that is a shifted version of the propagate.

Ling Adders
Finally, the sums are computed by
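A Python sketch of a Ling-style adder under assumptions that the slides leave implicit: no carry-in, T_i = A_i + B_i (OR-propagate), and X_i = A_i XOR B_i. Because G_i already implies T_i, the group generate factors as G_{i:0} = T_i · H_{i:0}. The pseudo-generate therefore obeys H_{i:0} = G_i + T_{i-1} · H_{i-1:0}, with the shifted propagate T_{i-1} playing the role of I. Each sum bit is then selected by H_{i-1:0} between X_i and X_i XOR T_{i-1}, which is the extra XOR in the sum precomputation. H is evaluated serially here; a real design would compute it with a prefix tree.

```python
def ling_add(a: int, b: int, n: int):
    """n-bit Ling-style adder without carry-in; returns (sum, carry-out)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]   # generate G_i
    t = [((a | b) >> i) & 1 for i in range(n)]        # OR-propagate T_i
    x = [((a ^ b) >> i) & 1 for i in range(n)]        # half sum X_i
    h = []                                            # h[i] = H_{i:0}
    for i in range(n):
        h.append(g[i] | (t[i - 1] & h[i - 1] if i else 0))
    s = x[0]                                          # no carry into bit 0
    for i in range(1, n):
        # precomputed candidates X_i and X_i ^ T_{i-1}, selected by H_{i-1:0}
        s |= ((x[i] ^ t[i - 1]) if h[i - 1] else x[i]) << i
    return s, t[n - 1] & h[n - 1]                     # c_out = G_{n-1:0} = T_{n-1} H_{n-1:0}

print(ling_add(3, 1, 2))  # (0, 1): 3 + 1 = 4 overflows 2 bits
```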

Ling Adders

Comparison
Standard-cell implementation, 0.8 µm technology.

Comparison

Summary
Adder architectures offer area / power / delay tradeoffs. Choose the best one for your application.

Architecture       Classification   Logic Levels   Max Fanout   Tracks   Cells
Carry-Ripple                        N - 1          1            1        N
Carry-Skip (n=4)                    N/4 + 5        2            1        1.25N
Carry-Inc. (n=4)                    N/4 + 2        4            1        2N
Brent-Kung         (L-1, 0, 0)      2log2N - 1     2            1        2N
Sklansky           (0, L-1, 0)      log2N          N/2 + 1      1        0.5 Nlog2N
Kogge-Stone        (0, 0, L-1)      log2N          2            N/2      Nlog2N

E vs Delay Trade-off

E vs Delay Tradeoff
90 nm, 64-bit domino Kogge-Stone Ling adder with various valency and s.

Area vs Delay
Synthesized adders.
