CALTECH CS137 Winter2006 -- DeHon 1 CS137: Electronic Design Automation Day 9: January 30, 2006 Parallel Prefix.


1 CALTECH CS137 Winter2006 -- DeHon 1 CS137: Electronic Design Automation Day 9: January 30, 2006 Parallel Prefix

2 Today
- Bit-Level
  - Addition
  - LUT Cascades
- For Sums
  - Applications
- FSMs
- SATADD
- Data Forwarding
- Pointer Jumping
  - Applications

3 Introduction / Reminder: Addition in Log Time

4 Ripple Carry Addition
- Simple “definition” of addition
- Serially resolve carry at each bit

5 CLA
- Think about each adder bit as computing a function on the carry-in
  - c[i] = g(c[i-1])
  - The particular function g depends on a[i], b[i]
  - g = f(a[i], b[i])

6 Functions
- What functions can g(c[i-1]) be?
  - g(x) = 1 when a[i] = b[i] = 1
  - g(x) = x when a[i] xor b[i] = 1
  - g(x) = 0 when a[i] = b[i] = 0

7 Functions
- What functions can g(c[i-1]) be?
  - g(x) = 1: Generate (a[i] = b[i] = 1)
  - g(x) = x: Propagate (a[i] xor b[i] = 1)
  - g(x) = 0: Squash (a[i] = b[i] = 0)

8 Combining
- Want to combine functions
  - Compute c[i] = g_i(g_{i-1}(c[i-2]))
  - Compute the composition of two functions
- What functions can the composition of two of these functions be?
  - Same as before: propagate, generate, squash

9 Compose Rules (LSB MSB)
- Pairs to compose: GG, GP, GS, PG, PP, PS, SG, SP, SS
- What is the result of each?

10 Compose Rules (LSB MSB)
LSB  MSB  Result
G    G    G
G    P    G
G    S    S
P    G    G
P    P    P
P    S    S
S    G    G
S    P    S
S    S    S
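The compose rules follow directly from treating G, P, and S as functions of the carry-in. The sketch below regenerates the table from that definition; the names `bit_function`, `compose`, and `apply_carry` are illustrative choices, not names from the lecture.

```python
# Carry transforms: G = generate (always 1), P = propagate (pass carry-in),
# S = squash (always 0).

def bit_function(a: int, b: int) -> str:
    """Classify one adder bit by what it does to its carry-in."""
    if a == 1 and b == 1:
        return "G"
    if a == 0 and b == 0:
        return "S"
    return "P"  # a xor b == 1


def apply_carry(fn: str, carry_in: int) -> int:
    """Evaluate a transform on a concrete carry-in value."""
    return {"G": 1, "S": 0, "P": carry_in}[fn]


def compose(lsb: str, msb: str) -> str:
    """Transform of the two-bit group: apply lsb first, then msb.

    The MSB transform decides the result unless it propagates,
    in which case the LSB transform shows through.
    """
    return lsb if msb == "P" else msb


# Regenerate the nine-row table, keyed by (LSB, MSB).
table = {(l, m): compose(l, m) for l in "GPS" for m in "GPS"}
```

Because `compose` returns another of the same three symbols, the same rule can be reused at every level of a reduce tree.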

11 Combining
- Do it again…
- Combine g[i-3,i-2] and g[i-1,i]
- What do we get?

12 Reduce Tree

13 Associative Reduce → Prefix
- Shows us how to compute the Nth value in O(log(N)) time
- Can actually produce all intermediate values in this time
  - with only a constant factor more hardware

14 Prefix Tree

15 Parallel Prefix
- Important pattern
- Applicable any time the operation is associative
- Function composition is always associative
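As a minimal sketch of the pattern, the function below computes all prefixes of an associative operator in ceil(log2(N)) rounds. It uses the simple doubling form, which performs O(N log N) operator applications rather than the O(N)-hardware tree described in the slides; it is meant only to show where associativity is used. The name `parallel_prefix` is an assumption for illustration.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def parallel_prefix(xs: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Inclusive scan in ceil(log2(N)) rounds; op must be associative.

    Within a round every update reads only the previous round's values,
    so all updates in a round could be performed at the same time.
    """
    out = list(xs)
    stride = 1
    while stride < len(out):
        prev = out[:]  # snapshot of the previous round
        for i in range(stride, len(out)):
            # prev[i - stride] covers the earlier span, prev[i] the later one,
            # so operand order is preserved for non-commutative operators.
            out[i] = op(prev[i - stride], prev[i])
        stride *= 2
    return out
```

For example, `parallel_prefix([1, 2, 3, 4, 5], lambda a, b: a + b)` gives the running sums, and passing string concatenation shows that commutativity is not required.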

16 Generalizing LUT Cascade

17 Cascaded LUT Delay Model
- Tcascade = T(3LUT) + T(mux)
- Don’t pay for:
  - General interconnect
  - Full 4-LUT delay

18 Parallel Prefix LUT Cascade?
- Can we do better than N×Tmux?
- Can we compute the LUT cascade in O(log(N)) time?
- Can we compute the mux cascade using parallel prefix?
- Can we make the mux cascade associative?

19 Parallel Prefix Mux cascade
- How can a mux transform S → mux-out?
  - A=0, B=0 → mux-out=0
  - A=1, B=1 → mux-out=1
  - A=0, B=1 → mux-out=S
  - A=1, B=0 → mux-out=/S

20 Parallel Prefix Mux cascade
- How can a mux transform S → mux-out?
  - A=0, B=0 → mux-out=0   Stop = S
  - A=1, B=1 → mux-out=1   Generate = G
  - A=0, B=1 → mux-out=S   Buffer = B
  - A=1, B=0 → mux-out=/S  Invert = I

21 Parallel Prefix Mux cascade
- How can 2 muxes transform the input?
- Can I compute 2-mux transforms from 1-mux transforms?

22 Two-mux transforms
SS → S   SG → G   SB → S   SI → G
GS → S   GG → G   GB → G   GI → S
BS → S   BG → G   BB → B   BI → I
IS → S   IG → G   IB → I   II → B

23 Generalizing mux-cascade
- How can N muxes transform the input?
- Is mux transform composition associative?
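One way to check the two-mux table and the associativity question is to write each one-mux transform as a truth table on S and compose the tables. In this sketch, each two-letter key is read as "first mux, then second mux", which is the reading consistent with the table above; the names `TRANSFORMS` and `compose` are illustrative.

```python
# One-mux transforms of the cascade input s, as truth tables (f(0), f(1)).
TRANSFORMS = {
    "S": (0, 0),  # stop:     mux-out = 0
    "G": (1, 1),  # generate: mux-out = 1
    "B": (0, 1),  # buffer:   mux-out = s
    "I": (1, 0),  # invert:   mux-out = not s
}
NAME = {table: name for name, table in TRANSFORMS.items()}


def compose(first: str, second: str) -> str:
    """Name of the transform obtained by applying first, then second."""
    f, g = TRANSFORMS[first], TRANSFORMS[second]
    return NAME[(g[f[0]], g[f[1]])]


# All sixteen two-mux transforms, keyed like the slide ("SI" = S then I).
two_mux = {a + b: compose(a, b) for a in "SGBI" for b in "SGBI"}

# Composition of functions is associative, so this holds for every triple.
associative = all(
    compose(compose(a, b), c) == compose(a, compose(b, c))
    for a in "SGBI" for b in "SGBI" for c in "SGBI"
)
```

Since the four transforms are closed under composition and the composition is associative, an N-mux cascade can be collapsed with the same reduce tree used for carries.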

24 Associative Reduce Mux-Cascade
- Can be hardwired, no general interconnect

25 For Sums

26 Prefix Sum
- Common operation:
  - Want B[x] such that B[x] = A[0] + A[1] + … + A[x]
  - For i = 0 to x: B[i] = B[i-1] + A[i] (taking B[-1] = 0)

27 Prefix Sum
- Compute in tree fashion
  - A[I]+A[I+1]
  - A[I]+A[I+1]+A[I+2]+A[I+3]
  - …
- Combine partial sums back down the tree
  - S(0:7)+S(8:9)+S(10)=S(0:10)
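The up-and-down tree can be sketched recursively: sum adjacent pairs going up, take prefix sums of the pair sums, then fill in the remaining positions going down. This is one possible formulation, written serially; the function name `tree_prefix_sum` is an assumption for illustration.

```python
def tree_prefix_sum(a):
    """Inclusive prefix sum using a pairwise tree (O(N) additions in total)."""
    n = len(a)
    if n <= 1:
        return list(a)
    # Up the tree: one level of pairwise sums.
    pairs = [a[i] + a[i + 1] for i in range(0, n - 1, 2)]
    if n % 2:
        pairs.append(a[-1])  # odd element out rides along unchanged
    sums = tree_prefix_sum(pairs)  # prefix sums over the pairs
    # Down the tree: odd positions are exactly a pair prefix; even positions
    # add their own element to the prefix of everything before them.
    out = []
    for i in range(n):
        if i % 2 == 1:
            out.append(sums[i // 2])
        elif i == 0:
            out.append(a[0])
        else:
            out.append(sums[i // 2 - 1] + a[i])
    return out
```

With eleven inputs, the last output corresponds to S(0:10) from the slide, built from the partial sums produced on the way up.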

28 Other simple operators
- Prefix-OR
- Prefix-AND
- Prefix-MAX
- Prefix-MIN

29 Find-First One
- Useful for arbitration
  - Finds first (highest-priority) requester
  - Also magnitude finding in numbers
- How:
  - Prefix-OR
  - Locally compute X[I-1]^X[I]
  - Flags the first one
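A small sketch of find-first-one: after a prefix-OR, the output steps from 0 to 1 exactly at the first set bit, so XOR-ing each position with its predecessor leaves a one-hot flag. Here `itertools.accumulate` stands in serially for the parallel prefix-OR, and `find_first_one` is an illustrative name.

```python
from itertools import accumulate
from operator import or_


def find_first_one(x):
    """Return a one-hot list flagging the first 1 in x (all zeros if none)."""
    p = list(accumulate(x, or_))  # prefix-OR: 0...0 1...1
    # The only position where p differs from its left neighbor is the first 1.
    return [p[i] ^ (p[i - 1] if i > 0 else 0) for i in range(len(x))]
```

For a request vector such as `[0, 0, 1, 0, 1, 1]`, only index 2 is flagged.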

30 Arbitration
- Often want to find first M requesters
  - E.g. assign unique memory ports to the first M processors requesting
- Prefix-sum across all potential requesters
- Counts requesters, giving a unique number to each
- Know if one of first M
  - Perhaps which resource assigned
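As a sketch of this use, a prefix sum over the request bits numbers the requesters in order; a requester is granted when its count does not exceed M, and the count minus one can serve as the assigned resource index. The function name `arbitrate` and the use of `None` for "not granted" are assumptions for illustration, and `itertools.accumulate` stands in for the parallel prefix sum.

```python
from itertools import accumulate


def arbitrate(requests, m):
    """Assign resources 0..m-1 to the first m requesters, None to everyone else."""
    counts = list(accumulate(requests))  # running count of requesters so far
    return [
        c - 1 if r and c <= m else None
        for r, c in zip(requests, counts)
    ]
```

With `requests = [0, 1, 1, 0, 1, 1]` and `m = 2`, positions 1 and 2 receive ports 0 and 1, and the later requesters are left waiting.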

31 Partitioning
- Use something to order
  - E.g. spectral linear ordering
  - …or 1D cellular swap to produce linear order
- Parallel prefix on area of units
  - If not all same area
- Know where the midpoint is

32 Channel Width
- Prefix sum on delta wires at each node
  - To compute net channel widths at all points along channel
  - E.g. 1D ordered
- Maybe use with cellular placement scheme

33 Rank Finding
- Looking for I’th ordered element
- Do a prefix-sum on high-bit only
  - Know m = number of things > 01111111…
- High-low search on result
  - I.e. if number > I, recurse on half with leading zero
  - If number < I, search for (I-m)’th element in half with high-bit true
- Find median in log 2 (N) time

34 FA/FSM Evaluation (regular expression recognition)

35 Finite Automata
- Machine has finite state: S
- On each cycle
  - Input I
  - Compute output and new state
    - Based on inputs and current state
    - O_i, S_{i+1} = f(S_i, I_i)
- Intuitively, a sequential process
  - Must know previous state to compute next
  - Must know state to compute output

36 Function Specialization
- But, this is just functions
  - …and function composition is associative
- Given that we know the input sequence:
  - I_0, I_1, I_2, …
- Can compute specialized functions:
  - f_i(s) = f(s, I_i)
- What is f_i(s)?
  - Worst-case, a translation table: S=0 → NS0, S=1 → NS1, …

37 Function Composition
- Now: O_{i+m}, S_{i+m+1} = f_{i+m}(f_{i+m-1}(f_{i+m-2}(… f_i(S_i))))
- Can we compute the function composition?
  - f_{i+1,i}(s) = f_{i+1}(f_i(s))
  - What is f_{i+1,i}(s)?
    - A translation table just like f_i(s) and f_{i+1}(s)
    - Table of size |S|, can fill in in O(|S|) time

38 Recursive Function Composition
- Now: O_{i+m}, S_{i+m+1} = f_{i+m}(f_{i+m-1}(f_{i+m-2}(… f_i(S_i))))
- We can compute the composition
  - f_{i+1,i}(s) = f_{i+1}(f_i(s))
- Repeat to compute
  - f_{i+3,i}(s) = f_{i+3,i+2}(f_{i+1,i}(s))
  - Etc. until we have computed f_{i+m,i}(s) in O(log(m)) steps
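The translation-table view can be sketched concretely. The three-state machine below (it recognizes inputs containing the substring "ab") is a hypothetical example, not one from the lecture; `specialize`, `compose`, `reduce_tree`, `run_serial`, and `run_prefix` are likewise illustrative names. Each level of `reduce_tree` contains independent compositions, so the levels correspond to the O(log(m)) steps.

```python
# Hypothetical recognizer for "input contains the substring ab".
# States: 0 = start, 1 = just saw 'a', 2 = saw "ab" (absorbing).
DELTA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 2, (2, "b"): 2,
}
NUM_STATES = 3


def specialize(symbol):
    """f_i(s) = f(s, I_i), stored as a translation table indexed by state."""
    return tuple(DELTA[(s, symbol)] for s in range(NUM_STATES))


def compose(first, second):
    """Table for second(first(s)); filled in O(|S|) time."""
    return tuple(second[s] for s in first)


def reduce_tree(tables):
    """Combine adjacent tables level by level; tables must be non-empty."""
    while len(tables) > 1:
        nxt = [compose(tables[i], tables[i + 1])
               for i in range(0, len(tables) - 1, 2)]
        if len(tables) % 2:
            nxt.append(tables[-1])  # unpaired table moves up unchanged
        tables = nxt
    return tables[0]


def run_serial(text, state=0):
    """Reference: step the machine one input at a time."""
    for symbol in text:
        state = DELTA[(state, symbol)]
    return state


def run_prefix(text, state=0):
    """Same final state, computed from the composed translation table."""
    return reduce_tree([specialize(c) for c in text])[state]
```

The composed table gives the final state for every possible starting state at once, which is what lets the start state be supplied after the reduction.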

39 Implications
- If we can get the input stream,
  - Any FA can be evaluated in O(log(N)) time
  - Regular expression recognition in O(log(N))
- Any streaming operator with finite state
  - Where the input stream is independent of the output stream
  - Can be run arbitrarily fast by using parallel-prefix on FSM evaluation

40 Saturated Addition
- S_{i+1} = max(min(I_i + S_i, maxval), minval)
- Could model as FSM with:
  - |S| = maxval - minval
- So, in theory, FSM result applies
- …but |S| might be 2^16, 2^24

41 SATADD Composition
- Can compute composition efficiently [Papadantonakis et al. FPT2005]
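One way to see why a compact composition exists: every saturating-add step has the form s → clamp(s + a, lo, hi), and composing two such functions yields another function of the same form. The sketch below represents each function as a triple `(a, lo, hi)`. This triple encoding, the names `step`, `apply_sat`, and `compose`, and the 4-bit range are assumptions made for illustration and are not necessarily the representation used in the cited paper.

```python
from functools import reduce

MINVAL, MAXVAL = -8, 7  # hypothetical 4-bit signed range


def clamp(x, lo, hi):
    return max(min(x, hi), lo)


def step(i):
    """Function for one input I_i: s -> max(min(I_i + s, maxval), minval)."""
    return (i, MINVAL, MAXVAL)


def apply_sat(t, s):
    """Evaluate the function encoded by triple t on state s."""
    a, lo, hi = t
    return clamp(s + a, lo, hi)


def compose(first, second):
    """Triple for second(first(s)); constant work regardless of |S|."""
    a1, lo1, hi1 = first
    a2, lo2, hi2 = second
    # Shift the first function's bounds by a2, then clamp them by the second's.
    return (a1 + a2, clamp(lo1 + a2, lo2, hi2), clamp(hi1 + a2, lo2, hi2))


inputs = [5, 5, -20, 3, 9]
total = reduce(compose, [step(i) for i in inputs])  # serial stand-in for the tree
```

Because `compose` is associative and costs O(1), the reduction can be arranged as a tree without building a table of size |S|.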

42 SATADD Composition

43 SATADD Reduce Tree

44 Data Forwarding
- UltraScalar
  - From Henry, Kuszmaul, et al. ARVLSI’99, SPAA’99, ISCA’00

45 Consider Machine
- Each FU has a full RF
  - FU = Functional Unit
  - RF = Register File
- Build network between FUs
  - use network to connect produce/consume
  - use register names to configure interconnect
- Signal data ready along network

46 Ultrascalar: concept model

47 Ultrascalar Concept
- Linear delay
- O(1) register cost / FU
- Complete renaming at each FU
  - different set of registers
  - so when we say complete RF at each FU, that’s only the logical registers

48 Ultrascalar: cyclic prefix

49 Parallel Prefix
- Basic idea is one we saw with adders
- An FU will either
  - produce a register (generate)
  - or transmit a register (propagate)
  - can do tree combining
    - pair of FUs will either both propagate or will generate
    - compute function by pair in one stage
    - recurse to next stage
    - get log-depth tree network connecting producer and consumer

50 Ultrascalar: cyclic prefix

51 Pointer Jumping

52 Pointer Jumping Motivation
- Have a tree
  - E.g. is-a relationship tree in NETL
- Want to know if a node is of a particular type (is-a mammal)
- How long to find out?
  - Naïve: O(distance)
    - Spread one level per timestep

53 Following Pointer Chain
- Naïve: spread/color from target node
  - On each step push down to children
- Most nodes idle
  - Only active on the step something arrives
- Can the idle nodes do something to accelerate?

54 Jumping Intermediates
- Add notion of transitive parent
- Initially: transitive-parent = parent
- On each step:
  - If my transitive-parent is marked
    - Mark self
  - else
    - transitive-parent = transitive-parent(transitive-parent)

55 How Much Jumping?
- On each step:
  - If my transitive-parent is marked
    - Mark self
  - else
    - transitive-parent = transitive-parent(transitive-parent)
- How many such steps?
  - O(log(distance))
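The marking rule can be simulated with synchronous rounds, where every node reads only the previous round's values. In this sketch the tree is a `parent` list in which the root points to itself; that encoding, the name `pointer_jump_mark`, and the stop-when-nothing-changes test are assumptions for illustration.

```python
def pointer_jump_mark(parent, target):
    """Mark every node that is target or has target as an ancestor.

    parent[v] is v's parent; the root is its own parent.
    Returns (marked, rounds), where rounds includes the final
    round that detects that nothing changed.
    """
    n = len(parent)
    tp = list(parent)  # transitive-parent, initially the parent
    marked = [v == target for v in range(n)]
    rounds = 0
    while True:
        # All nodes update together from the previous round's state.
        new_marked = [marked[v] or marked[tp[v]] for v in range(n)]
        new_tp = [tp[v] if new_marked[v] else tp[tp[v]] for v in range(n)]
        rounds += 1
        if new_marked == marked and new_tp == tp:
            return marked, rounds
        marked, tp = new_marked, new_tp
```

On a 16-node chain with the root as target, the marked region doubles each round (distance 1, 3, 7, 15), so four rounds mark every node and a fifth confirms that nothing changes.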

56 Pointer Jumping
- Same basic idea as data forwarding
- Can find length of a list in O(log(length)) time

57 Variations

58 Segmented Parallel Prefix
- f_i() can ignore its input
  - …or the function can let special I’s tell it to reset the state
- E.g. build huge/hardwired carry chain hardware and configurably break into separate adders (LUT cascades)
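A segmented prefix sum illustrates the reset idea: pair each value with a flag marking the start of a segment, and use a combining operator that discards the left operand whenever the right one carries a reset. The operator stays associative, so the same prefix machinery applies. The names `seg_op` and `segmented_prefix_sum` are illustrative, and `itertools.accumulate` stands in serially for the parallel prefix network.

```python
from itertools import accumulate


def seg_op(left, right):
    """Associative combine on (reset_flag, value) pairs."""
    lf, lv = left
    rf, rv = right
    # A reset on the right ignores everything accumulated on the left.
    return (lf or rf, rv if rf else lv + rv)


def segmented_prefix_sum(values, starts):
    """Prefix sums that restart wherever starts[i] is True."""
    pairs = list(zip(starts, values))
    return [v for _, v in accumulate(pairs, seg_op)]
```

With `values = [1, 2, 3, 4, 5, 6]` and segment starts at positions 0, 3, and 5, the sums restart at each marked position, much like breaking one long carry chain into separate adders.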

59 Cyclic Segmented Parallel Prefix
- Wrap output back to input
- Configurable segmentation defines the starting/stopping point
- E.g.
  - In Ultrascalar data forwarding
    - Leave data in place and use FUs in FIFO fashion, redefining the “head” at each cycle
  - Priority allocation scheme
    - Mark priority item as start of segment
    - Perhaps choose randomly (e.g. hardware router)

60 Admin
- Class Wed.
- Baseline due Friday

61 Big Ideas
- Any associative operation can be made parallel
  - Performed in log(N) time with O(N) hardware
- Any finite automaton computation can be accelerated with parallelism
  - (FA evaluation ∈ NC)
- Function composition is associative
  - → all functional operations can be associative

