Published by Bryanna Roderick. Modified over 3 years ago.

1
Recent Developments in Theory and Implementation of Parallel Prefix Adders
Neil Burgess
Division of Electronics, Cardiff School of Engineering, Cardiff University

2
Motivation
Parallel Prefix Adders (e.g. Kogge-Stone) mostly ignored for deep submicron VLSI:
– large fan-out points
– wide wiring channels
Recent insights: can remove both and do...
– absolute difference
– late increment
– media processing

3
Structure of Presentation
Parallel Prefix Adder theory
– Kogge-Stone, Ladner-Fisher
New log-depth prefix trees
– Knowles’ “family of adders”
New applications of prefix adders
– late operations, media adder

4
I. Parallel Prefix Adder theory

5
Prefix adder structure
[Figure: A(0:w-1) and B(0:w-1) enter the bit propagate and generate cells, giving g(0:w-1) and p(0:w-1); the prefix carry tree produces c(1:w); the sum cells (XOR gates) produce s(0:w)]

6
Prefix Equations - 1
$g(i) = a(i) \cdot b(i)$ “carry generate”
$p(i) = a(i) \oplus b(i)$ “carry propagate”
$k(i) = \overline{a(i) + b(i)}$ “carry kill”
g(i), p(i), & k(i) are mutually exclusive
– Use any two: g(i) & k(i) (NAND & NOR gates)
– p(i) needed as well: $s(i) = p(i) \oplus c(i)$
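The bit-level cells above can be sketched in Python. This is a minimal illustration, assuming w-bit unsigned operands held in Python integers with bit 0 as the least significant bit; the function names are hypothetical. It evaluates g(i), p(i) and k(i) straight from their Boolean definitions and checks the mutual-exclusivity claim exhaustively for 4-bit operands.

```python
def bit_signals(a: int, b: int, w: int):
    """Per-bit (g, p, k) lists for w-bit operands; index 0 is the LSB."""
    g, p, k = [], [], []
    for i in range(w):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g.append(ai & bi)        # carry generate: AND
        p.append(ai ^ bi)        # carry propagate: XOR
        k.append(1 - (ai | bi))  # carry kill: NOR
    return g, p, k


def exactly_one_active(a: int, b: int, w: int) -> bool:
    """True when exactly one of g(i), p(i), k(i) is 1 at every bit."""
    g, p, k = bit_signals(a, b, w)
    return all(gi + pi + ki == 1 for gi, pi, ki in zip(g, p, k))


# Exhaustive check of mutual exclusivity over all pairs of 4-bit operands.
assert all(exactly_one_active(a, b, 4) for a in range(16) for b in range(16))
print(bit_signals(0b0110, 0b0011, 4))  # ([0, 1, 0, 0], [1, 0, 1, 0], [0, 0, 0, 1])
```

Each (a(i), b(i)) pair falls into exactly one class: (0,0) kills, (0,1) and (1,0) propagate, (1,1) generates.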

7
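Prefix Equations - 2
Generate and Not Kill signals are combined to form “Group Signals” $G_x^z$, $\bar{K}_x^z$:
G   Not Kill   Interpretation
0   0          c(x+1) = 0
0   1          c(x+1) = c(z)
1   0          Don’t care
1   1          c(x+1) = 1
Adjacent groups combine as $(G_x^z, \bar{K}_x^z) = (G_x^y, \bar{K}_x^y) \bullet (G_{y-1}^z, \bar{K}_{y-1}^z)$, with $G_x^z = G_x^y + \bar{K}_x^y \cdot G_{y-1}^z$ and $\bar{K}_x^z = \bar{K}_x^y \cdot \bar{K}_{y-1}^z$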
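A small Python model of the group signals may help. It is a sketch under the assumption that a pair (G, K̄) means carry-out = G OR (K̄ AND carry-in), which reproduces every row of the table (the don't-care state (1, 0) simply behaves as carry-out = 1). `combine` is a hypothetical name for the prefix operator that merges an upper group x..y with the adjacent lower group y-1..z.

```python
from itertools import product


def carry_out(group, c_in):
    """Carry out of a group (G, Kbar) for a given carry in."""
    g, kbar = group
    return g | (kbar & c_in)


def combine(hi, lo):
    """Prefix operator: merge upper group x..y with lower group y-1..z."""
    g_hi, kbar_hi = hi
    g_lo, kbar_lo = lo
    return (g_hi | (kbar_hi & g_lo), kbar_hi & kbar_lo)


# Rows of the interpretation table.
assert carry_out((0, 0), 1) == 0                          # c(x+1) = 0
assert [carry_out((0, 1), c) for c in (0, 1)] == [0, 1]   # c(x+1) = c(z)
assert carry_out((1, 1), 0) == 1                          # c(x+1) = 1

# Merging two groups must agree with applying them one after the other,
# for every input state (don't-care included) and every carry in.
states = list(product((0, 1), repeat=2))
for hi, lo, c in product(states, states, (0, 1)):
    assert carry_out(combine(hi, lo), c) == carry_out(hi, carry_out(lo, c))
```

Using "not kill" rather than propagate as the second signal is what lets the NAND/NOR outputs feed the tree directly; the exhaustive loop shows the merge rule is still exact.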

8
Prefix Equations - Interpretation
Group signals yield carry signals:
Tree outputs: $c(i+1) = G_i^0$
Tree inputs: $G_i^i = g(i)$; $\bar{K}_i^i = \bar{k}(i)$
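The interpretation can be checked end to end with a deliberately serial evaluation: fold the prefix operator from bit 0 upwards, read $c(i+1) = G_i^0$, and form $s(i) = p(i) \oplus c(i)$. This sketch assumes no carry-in (c(0) = 0) and uses a hypothetical `combine` helper for the (G, K̄) operator; a real prefix tree computes the same $G_i^0$ values in logarithmic depth rather than as a ripple.

```python
def combine(hi, lo):
    """Prefix operator on (G, Kbar) pairs: upper group first, lower second."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])


def prefix_add(a: int, b: int, w: int):
    """Return (sum mod 2**w, carry out) using c(i+1) = G_i^0 and c(0) = 0."""
    c = [0]                                # c(0): no carry in
    group = None
    for i in range(w):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        leaf = (ai & bi, ai | bi)          # tree inputs: (g(i), kbar(i))
        group = leaf if group is None else combine(leaf, group)  # G_i^0, Kbar_i^0
        c.append(group[0])                 # c(i+1) = G_i^0
    s = 0
    for i in range(w):
        p = ((a >> i) ^ (b >> i)) & 1
        s |= (p ^ c[i]) << i               # s(i) = p(i) XOR c(i)
    return s, c[w]


for a in range(32):
    for b in range(32):
        assert prefix_add(a, b, 5) == ((a + b) % 32, (a + b) >> 5)
```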

9
Prefix Equations - characteristics
Associative
– sub-terms may be pre-computed in parallel
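Associativity can be confirmed exhaustively on the four (G, K̄) states. The sketch below (hypothetical `combine` helper for the prefix operator) also shows the practical consequence for a 4-bit group: the sub-terms for bits 3..2 and 1..0 can be formed independently, in parallel, and then merged, giving the same result as a left-to-right fold.

```python
from functools import reduce
from itertools import product


def combine(hi, lo):
    """Prefix operator on (G, Kbar) pairs: upper group first, lower second."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])


states = list(product((0, 1), repeat=2))

# (x . y) . z == x . (y . z) for every combination of states.
assert all(combine(combine(x, y), z) == combine(x, combine(y, z))
           for x, y, z in product(states, repeat=3))

# Consequence: a 4-bit group can be built from two pre-computed halves.
for x3, x2, x1, x0 in product(states, repeat=4):
    serial = reduce(combine, [x3, x2, x1, x0])            # ((x3.x2).x1).x0
    parallel = combine(combine(x3, x2), combine(x1, x0))  # (x3.x2).(x1.x0)
    assert serial == parallel
```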

10
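Prefix equations - characteristics
Idempotent
– sub-terms may be “overlapped”
[Figure: inputs (g(0), $\bar{k}(0)$), (g(1), $\bar{k}(1)$), (g(2), $\bar{k}(2)$); $(G_2^0, \bar{K}_2^0) = (G_2^1, \bar{K}_2^1) \bullet (G_1^0, \bar{K}_1^0)$, with bit 1 included in both sub-terms; outputs c(1), c(2), c(3)]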
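Idempotency (x • x = x) is what makes overlapping sub-terms harmless. A minimal check, with a hypothetical `combine` helper for the (G, K̄) operator, confirms it for all four states and verifies the 3-bit case in which bit 1 is counted in both $(G_2^1, \bar{K}_2^1)$ and $(G_1^0, \bar{K}_1^0)$.

```python
from itertools import product


def combine(hi, lo):
    """Prefix operator on (G, Kbar) pairs: upper group first, lower second."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])


states = list(product((0, 1), repeat=2))

# Idempotency: x . x == x, including the don't-care state (1, 0).
assert all(combine(x, x) == x for x in states)

# Overlapped sub-terms: (x2 . x1) . (x1 . x0) == x2 . x1 . x0
for x2, x1, x0 in product(states, repeat=3):
    g21 = combine(x2, x1)        # (G_2^1, Kbar_2^1)
    g10 = combine(x1, x0)        # (G_1^0, Kbar_1^0)
    assert combine(g21, g10) == combine(combine(x2, x1), x0)
```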

11
4-bit Ladner-Fisher prefix tree
1 sub-term pre-computed
Logarithmic depth
Fan-out = 2 in 2nd row (laterally)

12
8-bit Ladner-Fisher prefix tree
Log depth; lateral fan-out = 4 in 3rd row
No exploitation of idempotency

13
16-bit Ladner-Fisher prefix tree
Log depth with large fan-out in final row
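A Ladner-Fisher tree can be written down as a list of rows, each row a list of (node, source) cells. The sketch below is one way to generate it, assuming w is a power of two: at level l, every node whose bit l is set combines with the last node of the preceding 2^l-bit block. The function names are hypothetical. It checks the tree against ordinary addition and reports the cell count (32, as listed for {1,2,4,8} in Knowles’ 16-bit table) and the worst lateral fan-out in each row.

```python
import random
from collections import Counter


def combine(hi, lo):
    """Prefix operator on (G, Kbar) pairs: upper group first, lower second."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])


def ladner_fisher(w):
    """Rows of (node, source) cells; w is assumed to be a power of two."""
    rows, l = [], 0
    while (1 << l) < w:
        rows.append([(i, ((i >> l) << l) - 1) for i in range(w) if (i >> l) & 1])
        l += 1
    return rows


def run_tree(rows, a, b, w):
    """Evaluate a prefix tree and return (sum mod 2**w, carry out)."""
    node = [((a >> i) & (b >> i) & 1, ((a >> i) | (b >> i)) & 1) for i in range(w)]
    for row in rows:
        prev = list(node)                      # every cell reads the row above
        for i, src in row:
            node[i] = combine(prev[i], prev[src])
    c = [0] + [g for g, _ in node]             # c(i+1) = G_i^0, c(0) = 0
    s = sum((((a >> i) ^ (b >> i) ^ c[i]) & 1) << i for i in range(w))
    return s, c[w]


rows = ladner_fisher(16)
print(sum(len(r) for r in rows))                                   # 32
print([max(Counter(src for _, src in r).values()) for r in rows])  # [1, 2, 4, 8]

rng = random.Random(1)
for _ in range(1000):
    a, b = rng.randrange(1 << 16), rng.randrange(1 << 16)
    assert run_tree(rows, a, b, 16) == ((a + b) & 0xFFFF, (a + b) >> 16)
```

The fan-out list {1, 2, 4, 8} is exactly the buffering problem the slides point to: one node in the last row drives eight cells.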

14
4-bit Kogge-Stone prefix graph
Fan-out = 1 (laterally)
1 extra cell
Parallel wires in 2nd row

15
8-bit Kogge-Stone prefix graph
More cells & wiring than Ladner-Fisher

16
16-bit Kogge-Stone prefix graph
Low fan-out but wider wiring channels
No exploitation of idempotency
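The Kogge-Stone graph can be described in the same (node, source) style: at level l, node i combines with node i - 2^l. The sketch below uses hypothetical helper names, and `channel_width` is an illustrative measure (the most lateral wires passing over any one bit boundary in a row), not a layout tool. It reports 49 cells, as listed for {1,1,1,1} in Knowles’ 16-bit table, unit fan-out in every row, and channel widths that double from row to row.

```python
from collections import Counter


def combine(hi, lo):
    """Prefix operator on (G, Kbar) pairs: upper group first, lower second."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])


def kogge_stone(w):
    """Rows of (node, source) cells: at level l, node i combines with i - 2**l."""
    rows, l = [], 0
    while (1 << l) < w:
        rows.append([(i, i - (1 << l)) for i in range(1 << l, w)])
        l += 1
    return rows


def channel_width(row):
    """Most lateral wires passing over any single bit boundary in one row."""
    far = {}
    for i, src in row:
        far[src] = max(far.get(src, src), i)   # each source drives one wire
    boundaries = range(1, max(far.values()) + 1)
    return max(sum(s < b <= f for s, f in far.items()) for b in boundaries)


def run_tree(rows, a, b, w):
    """Evaluate a prefix tree and return (sum mod 2**w, carry out)."""
    node = [((a >> i) & (b >> i) & 1, ((a >> i) | (b >> i)) & 1) for i in range(w)]
    for row in rows:
        prev = list(node)                      # every cell reads the row above
        for i, src in row:
            node[i] = combine(prev[i], prev[src])
    c = [0] + [g for g, _ in node]             # c(i+1) = G_i^0, c(0) = 0
    s = sum((((a >> i) ^ (b >> i) ^ c[i]) & 1) << i for i in range(w))
    return s, c[w]


rows = kogge_stone(16)
print(sum(len(r) for r in rows))                                   # 49
print([max(Counter(src for _, src in r).values()) for r in rows])  # [1, 1, 1, 1]
print([channel_width(r) for r in rows])                            # [1, 2, 4, 8]
for a in range(0, 1 << 16, 257):
    for b in range(0, 1 << 16, 263):
        assert run_tree(rows, a, b, 16) == ((a + b) & 0xFFFF, (a + b) >> 16)
```

The trade-off is visible in the numbers: fan-out stays at 1, but the wires over any boundary grow as 1, 2, 4, 8.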

17
Black cells and grey cells
Carries, $c(i) = G_{i-1}^0$; $\bar{K}_{i-1}^0$ terms not needed
G-only cells are called “grey” cells and coloured grey

18
The story so far…
Parallel prefix adders available in VLSI
Log-depth adders possible:
– high fan-outs {1,2,4,8…} & low cell count
– low fan-outs {1,1,1,1…} & high cell count
Problematic in VLSI (buffering, area)
Idempotency of ‘•’ operator not exploited

19
II. Knowles’ “Family of Adders”

20
Log-depth prefix trees
In VLSI:
– L-F trees require too much buffering delay
– K-S trees require too much area (wire flux)
Fan-outs characterised as:
– {1,2,4,8…} Ladner-Fisher
– {1,1,1,1…} Kogge-Stone

21
Knowles’ insight
Use other fan-out schemes
5 possible 8-bit log-depth prefix trees:
– {1,1,1} 17 cells, Kogge-Stone
– {1,1,2} 17 cells, uses idempotency
– {1,1,4} 14 cells, no idempotency
– {1,2,2} 14 cells, no idempotency
– {1,2,4} 12 cells, Ladner-Fisher

22
Knowles’ 8-bit prefix trees
All trees are log-depth

23
Tree construction rules
Levels are labelled 0, 1, 2, ...
Fan-out at jth level, $2^k$, satisfies $2^k \le 2^j$
Fan-out at jth level $\le$ fan-out at (j+1)th level
Lateral wire length at jth level is $2^j$
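The first two construction rules are enough to enumerate the candidate fan-out vectors. This sketch (hypothetical function name) assumes a w = 2^L-bit adder with levels 0..L-1 and fan-outs restricted to powers of two; it reproduces the five 8-bit schemes listed under “Knowles’ insight” and the fourteen 16-bit schemes in Knowles’ 16-bit table. The wire-length rule is a property of the resulting layout and is not checked here.

```python
def knowles_fanouts(levels: int):
    """Fan-out vectors obeying: fanout(j) = 2**k <= 2**j, non-decreasing in j."""
    results = []

    def extend(prefix):
        j = len(prefix)
        if j == levels:
            results.append(prefix)
            return
        f = prefix[-1] if prefix else 1     # no smaller than the previous level
        while f <= (1 << j):                # 2**k <= 2**j
            extend(prefix + [f])
            f *= 2

    extend([])
    return results


print(knowles_fanouts(3))
# [[1, 1, 1], [1, 1, 2], [1, 1, 4], [1, 2, 2], [1, 2, 4]]
print(len(knowles_fanouts(4)))  # 14
```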

24
Knowles’ 16-bit trees - I
Fan-outs    Cells    Fan-outs    Cells
{1,1,1,1}   49       {1,1,1,8}   42
{1,1,1,2}   49       {1,2,2,2}   42
{1,1,1,4}   49       {1,1,4,4}   40
{1,1,2,2}   49       {1,1,4,8}   36
{1,1,2,4}   49       {1,2,2,8}   36
{1,1,2,8}   42       {1,2,4,4}   36
{1,2,2,4}   42       {1,2,4,8}   32

25
Knowles’ 16-bit trees - II
{1,1,1,1}                {1,1,1,8}
{1,1,1,2}  Idempotent    {1,2,2,2}
{1,1,1,4}  Idempotent    {1,1,4,4}
{1,1,2,2}  Idempotent    {1,1,4,8}
{1,1,2,4}  Idempotent    {1,2,2,8}
{1,1,2,8}  Idempotent    {1,2,4,4}
{1,2,2,4}  Idempotent    {1,2,4,8}

26
Knowles’ 16-bit trees - III
{1,1,1,1}              {1,1,1,8}  R
{1,1,1,2}  I           {1,2,2,2}  R
{1,1,1,4}  I           {1,1,4,4}  R
{1,1,2,2}  I           {1,1,4,8}  R
{1,1,2,4}  I           {1,2,2,8}  R
{1,1,2,8}  R, I        {1,2,4,4}  R
{1,2,2,4}  R, I        {1,2,4,8}  R

27
Quick way of spotting R, I
Define span(l) as the distance from the start of a wire to the first cell in the lth level:
$\mathrm{span}(l) = 2^l - \mathrm{fanout}(l) + 1$
Tree characteristics:
– R if $\mathrm{span}(j) \ge \mathrm{span}(k)$ for j < k
– I if span(i) + span(j) = span(k) for i < j < k

28
Examples of R & I spotting
fanout(l)    span(l)      Characteristic
[1,1,1,1]    [1,2,4,8]    neither R nor I
[1,1,2,2]    [1,2,3,7]    I only
[1,2,2,2]    [1,1,3,7]    R only
[1,2,2,4]    [1,1,3,5]    R & I
Are R & I adders “best”?
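The span formula and the R test are mechanical enough to script. The sketch below (hypothetical names) computes $\mathrm{span}(l) = 2^l - \mathrm{fanout}(l) + 1$ and flags R when an earlier level's span is at least as large as a later one. It reproduces the span column and the R labels of the examples above, as well as the R labels in Knowles’ 16-bit table; the I rule is left aside.

```python
def span(fanouts):
    """span(l) = 2**l - fanout(l) + 1 for each level l."""
    return [(1 << l) - f + 1 for l, f in enumerate(fanouts)]


def is_R(fanouts):
    """R if span(j) >= span(k) for some pair of levels j < k."""
    s = span(fanouts)
    return any(s[j] >= s[k] for j in range(len(s)) for k in range(j + 1, len(s)))


examples = [[1, 1, 1, 1], [1, 1, 2, 2], [1, 2, 2, 2], [1, 2, 2, 4]]
for f in examples:
    print(f, span(f), "R" if is_R(f) else "-")
```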

29
VLSI design of prefix adders
Adders laid out as a rectangular array of prefix cells (and gaps)
Assume cells measure 10 µm × 4 µm
– 2 cells per significance: 20 µm/bit
Key design parameters:
– buffering (area & delay)
– wiring channels (area)

30
16-bit adder example
Assumptions:
Maximum fan-out without buffering:
– 3 cells + 80 µm wire (4 cell widths)
Maximum fan-out with buffering:
– 9 cells + 240 µm wire (12 cell widths)
Employ {1,2,2,4} architecture

31
{1,2,2,4} prefix adder layout

32
Area vs Time for 32-bit adders
[Plot: area against delay for K-S {1,1,1,1,1}, {1,1,2,2,2}, L-F {1,2,4,8,16} and {1,2,2,4,4} (span [1,1,3,5,13])]

33
32-bit prefix tree adders
Exploitable trade-off between adder’s delay and area:
– Kogge-Stone adder 16% faster than Ladner-Fisher but 66% larger
– {1,2,2,4,4} adder 8% faster than Ladner-Fisher but only 3% larger
– buffering also trades off speed for area

34
III. New applications of prefix adders

35
Other addition operations
Late increment:
– mod $2^w - 1$ addition for Reed-Solomon coding
– floating-point rounding
Late complement:
– absolute difference for video motion estimation
– sign-magnitude addition
Typically use 2 adders and a MUX

36
Increments in prefix trees
Row of prefix cells = ‘late +1’ operation
Ladner-Fisher comprises many late +1’s:
– 1 8-bit, 2 4-bit, 4 2-bit, & 8 1-bit

37
Late increment tree
Adder returns A+B if inc = 0
Adder returns A+B+1 if inc = 1

38
Late increment logic
“Late Carry” lc(i) set high if:
– c(i) = 1, or
– inc = 1 and $(a(n), b(n)) \ne (0,0)$ for all n with $0 \le n < i$
$lc(i) = c(i) + inc \cdot \bar{K}_{i-1}^0$, with $c(i) = G_{i-1}^0$ and $s(i) = p(i) \oplus lc(i)$
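The late-carry rule can be sketched behaviourally in Python. This illustration assumes unsigned w-bit operands and evaluates the prefix groups serially for clarity (a real adder builds them with one of the trees above); the function names are hypothetical. Below bit 0 the group is empty, so $lc(0) = inc$. As a usage example, feeding the adder’s own carry out back in as inc gives end-around-carry addition modulo $2^w - 1$, one of the applications listed under “Other addition operations”; the sketch calls the adder twice for simplicity, whereas in hardware one would expect c(w) to drive the late row directly.

```python
def combine(hi, lo):
    """Prefix operator on (G, Kbar) pairs: upper group first, lower second."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])


def prefix_groups(a, b, w):
    """[(G_i^0, Kbar_i^0) for i in 0..w-1], evaluated serially for clarity."""
    groups, group = [], None
    for i in range(w):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        leaf = (ai & bi, ai | bi)
        group = leaf if group is None else combine(leaf, group)
        groups.append(group)
    return groups


def late_increment_add(a, b, inc, w):
    """Return ((A + B + inc) mod 2**w, carry out), applying inc late."""
    groups = prefix_groups(a, b, w)
    s = 0
    for i in range(w):
        # Below bit 0 there is an empty group: no carry, but it transmits inc.
        c, kbar = (0, 1) if i == 0 else groups[i - 1]   # c(i) = G_{i-1}^0
        lc = c | (inc & kbar)                           # late carry lc(i)
        p = ((a >> i) ^ (b >> i)) & 1
        s |= (p ^ lc) << i                              # s(i) = p(i) XOR lc(i)
    g, kbar = groups[w - 1]
    return s, g | (inc & kbar)


def add_mod_2w_minus_1(a, b, w):
    """End-around carry: feed the adder's own carry out back in as inc."""
    _, c_w = late_increment_add(a, b, 0, w)
    s, _ = late_increment_add(a, b, c_w, w)
    return s


for a in range(32):
    for b in range(32):
        for inc in (0, 1):
            t = a + b + inc
            assert late_increment_add(a, b, inc, 5) == (t % 32, t >> 5)
        assert add_mod_2w_minus_1(a, b, 5) % 31 == (a + b) % 31
```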

39
Late complement theory
In 2’s complement, $\bar{N} = -(N+1)$
$A + \bar{B} = A - B - 1$
– late increment then yields $A - B$
$\overline{A + \bar{B}} = -(A - B - 1 + 1) = B - A$
Absolute difference readily available

40
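Absolute difference logic
The adder forms $A + \bar{B}$; if c(w) = 0, the result is negative:
– if c(w) = 0, invert all the bits
– otherwise, perform the late increment using $\bar{K}_{i-1}^0$
$s(i) = \overline{p(i) \oplus c(i)}$ if $c(w) = 0$; $s(i) = p(i) \oplus (c(i) + \bar{K}_{i-1}^0)$ if $c(w) = 1$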
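The selection logic can be sketched in Python for unsigned w-bit operands. This is an illustration under that assumption, with hypothetical function names and the prefix groups again folded serially rather than built as a tree. The adder forms $A + \bar{B}$, and its carry out c(w) chooses between a late increment (A > B, giving A - B) and a bitwise inversion of the sum (A ≤ B, giving B - A).

```python
def combine(hi, lo):
    """Prefix operator on (G, Kbar) pairs: upper group first, lower second."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])


def prefix_groups(a, b, w):
    """[(G_i^0, Kbar_i^0) for i in 0..w-1], evaluated serially for clarity."""
    groups, group = [], None
    for i in range(w):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        leaf = (ai & bi, ai | bi)
        group = leaf if group is None else combine(leaf, group)
        groups.append(group)
    return groups


def abs_diff(a, b, w):
    """|A - B| for w-bit unsigned A, B, via A + ~B and a late fix-up."""
    nb = ~b & ((1 << w) - 1)          # one's complement of B
    groups = prefix_groups(a, nb, w)
    c_w = groups[w - 1][0]            # carry out of A + ~B
    s = 0
    for i in range(w):
        p = ((a >> i) ^ (nb >> i)) & 1
        c, kbar = (0, 1) if i == 0 else groups[i - 1]
        if c_w:                       # A > B: late increment gives A - B
            bit = p ^ (c | kbar)
        else:                         # A <= B: invert all bits to get B - A
            bit = 1 - (p ^ c)
        s |= bit << i
    return s


for a in range(32):
    for b in range(32):
        assert abs_diff(a, b, 5) == abs(a - b)
```

Only one carry tree is used; the two candidate results differ just in the final row, which is why the extra cost is roughly one gate delay plus buffering for c(w).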

41
Summary of “late” ops
Available on all prefix adders
Extra delay: 1 gate’s delay + buffering
Extra hardware: w black cells
This technique used in floating-point units:
– late increment for rounding
– late complement for true subtraction

42
Media (“packed”) arithmetic
Fundamental strategy: use full-wordlength hardware for multiple sub-wordlength computations
Examples:
– 32-bit adder → four 8-bit adders
– 32-bit multiplier → two 16-bit multipliers

43
Partitioning an adder
Criteria:
– support carries propagating within sub-adders
– prevent carries propagating between sub-adders
Solutions:
– put AND gates on carry chains → slower adder
– put dummy 0’s on operand bits → larger adder
Use prefix adder!!

44
Packed prefix adder - 1
Force $\bar{k}(n) = 0$ at partition points
– prevents carries propagating across bit n
– exploits don’t-care condition $(g, \bar{k}) = (1,0)$
Implementation:
– change $\bar{k}(n)$ gate to (2,1) OR-AND gate
– delay-neutral modification

45
Packed prefix adder - 2
Force $c(n) = G_{n-1}^0 = 0$ at partition points
– prevents c(n) → s(n) errors
Implementation:
– insert AND gates (off critical path), or
– change $G_{n-1}^0$ gate to ({2,1},1) complex gate
– BUT need $G_{n-1}^0$ signal for sub-adder overflows
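Both modifications can be modelled together in Python. This is a behavioural sketch under stated assumptions, not a gate-level design: `starts` is a hypothetical parameter listing the bit positions n where a new sub-adder begins. At those positions the not-kill input $\bar{k}(n)$ is forced to 0 (the OR-AND change) and c(n) is forced to 0 (the AND gates), while the unmodified $G_{n-1}^0$ values are kept as the sub-adder overflow flags. The usage check treats a 32-bit adder as four 8-bit adders.

```python
import random


def combine(hi, lo):
    """Prefix operator on (G, Kbar) pairs: upper group first, lower second."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])


def packed_add(a, b, w, starts):
    """Add w-bit words as independent sub-adders beginning at bits in `starts`.

    Returns (packed sum, sub-adder carry outs from least significant up).
    """
    groups, group = [], None
    for i in range(w):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        kbar = 0 if i in starts else (ai | bi)    # force kbar(n) = 0 at partitions
        leaf = (ai & bi, kbar)
        group = leaf if group is None else combine(leaf, group)
        groups.append(group)
    s = 0
    for i in range(w):
        c = 0 if i == 0 or i in starts else groups[i - 1][0]  # force c(n) = 0
        s |= ((((a >> i) ^ (b >> i)) & 1) ^ c) << i
    # G_{n-1}^0 at each partition (and at the top) is that sub-adder's overflow.
    overflows = [groups[n - 1][0] for n in sorted(starts)] + [groups[w - 1][0]]
    return s, overflows


rng = random.Random(7)
for _ in range(1000):
    a, b = rng.randrange(1 << 32), rng.randrange(1 << 32)
    s, ovf = packed_add(a, b, 32, {8, 16, 24})
    for lane in range(4):
        x, y = (a >> 8 * lane) & 0xFF, (b >> 8 * lane) & 0xFF
        assert (s >> 8 * lane) & 0xFF == (x + y) & 0xFF
        assert ovf[lane] == (x + y) >> 8
```

Because a forced $\bar{k}(n) = 0$ zeroes $\bar{K}$ for every group spanning bit n, no tree cell needs to change; the cells that would have carried across a partition simply pass through the sub-adder’s own result.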

46
Packed prefix adder - 3
Sub-adder carries complete early
Extraneous cells automatically do nothing

47
Last Slide
Recent developments in prefix adders:
– new “family” of log-depth trees
– late operations
– packed arithmetic for media processing
Future possibilities:
– systematic exploitation of idempotency
– trees with reduced buffering
– combine packed arithmetic/late ops

48
ANY QUESTIONS OR COMMENTS?
