
1 Parallel Prefix and Data Parallel Operations
Motivation: these basic parallel operations occur repeatedly. Let $\otimes$ be an associative operation, i.e. $(a_1 \otimes a_2) \otimes a_3 = a_1 \otimes (a_2 \otimes a_3)$. How can we compute $a_1 \otimes a_2 \otimes \cdots \otimes a_n$, together with all of its prefixes, in parallel in $O(\log n)$ time?

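2 Approach 1
Input a0 a1 a2 a3 a4 a5 a6 a7, where $[i:j] = a_i + a_{i+1} + \cdots + a_j$ and + stands for the associative operation. Contents of x after each step (d = 2^i is the distance being combined):
d=1: [0:0] [0:1] [1:2] [2:3] [3:4] [4:5] [5:6] [6:7]
d=2: [0:0] [0:1] [0:2] [0:3] [1:4] [2:5] [3:6] [4:7]
d=4: [0:0] [0:1] [0:2] [0:3] [0:4] [0:5] [0:6] [0:7]
Assume that n = 2^k.
for i = 0 to k-1
  for j = 0 to n-1-2^i do in parallel
    x[j+2^i] = x[j] + x[j+2^i]
In each step, every processor reads its operands before any processor writes.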
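The loop above can be simulated sequentially by copying the array at the start of each step, which models the "read before write" behavior of a synchronous parallel step. This is a minimal sketch; the function name `hillis_steele_scan` is our own label (the scheme is often called the Hillis-Steele scan), and `op` may be any associative function. Because the earlier block is always passed as the left argument, `op` does not need to be commutative.

```python
from operator import add
from typing import Callable, List, TypeVar

T = TypeVar("T")


def hillis_steele_scan(xs: List[T], op: Callable[[T, T], T] = add) -> List[T]:
    """Inclusive prefix of xs under an associative op, in ceil(log2 n) steps."""
    x = list(xs)
    n = len(x)
    d = 1  # d = 2^i in the slide's loop
    while d < n:
        old = x[:]  # synchronous step: every read sees the values from before the step
        for j in range(n - d):  # "for j = 0 to n-1-2^i do in parallel"
            x[j + d] = op(old[j], old[j + d])  # earlier block on the left
        d *= 2
    return x


print(hillis_steele_scan([1, 2, 3, 4, 5, 6, 7, 8]))  # [1, 3, 6, 10, 15, 21, 28, 36]
print(hillis_steele_scan(["a", "b", "c"]))           # ['a', 'ab', 'abc']
```

Passing `max` or `min` as `op` gives running maxima or minima with the same log-depth structure.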

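3 How to do it on a tree architecture?
Each leaf holds one input value X and sends it to its parent. For each node:
  if there is a signal from both the left and the right child, S_t <- S_l + S_r and send S_t to the parent
  if there is a signal R from the parent, send R to the left child and R + S_l to the right child (the root starts with R = 0)
  if the node is a leaf and there is a signal R, X <- X + R
[Figure: internal node with left subtree sum S_l, right subtree sum S_r, total S_t, and downward signal R]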
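A sketch of the two passes on a complete binary tree stored in heap order (node v has children 2v and 2v+1, leaves at n..2n-1). The heap layout, the power-of-two restriction, and the name `tree_prefix` are assumptions made for this illustration; the loops run sequentially, but all nodes on one level are independent and would fire in parallel on a tree machine.

```python
from typing import List


def tree_prefix(a: List[int]) -> List[int]:
    """Inclusive prefix sums via an upward pass and a downward pass on a binary tree."""
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "this sketch assumes n is a power of two"
    S = [0] * (2 * n)
    S[n:] = a  # leaves: each leaf sends its X upward
    # Upward pass: S_t <- S_l + S_r. Visiting v = n-1 .. 1 finishes children before parents.
    for v in range(n - 1, 0, -1):
        S[v] = S[2 * v] + S[2 * v + 1]
    # Downward pass: root receives R = 0; left child gets R, right child gets R + S_l.
    R = [0] * (2 * n)
    for v in range(1, n):
        R[2 * v] = R[v]
        R[2 * v + 1] = R[v] + S[2 * v]
    # Each leaf: X <- X + R, where R is the sum of all leaves to its left.
    return [a[i] + R[n + i] for i in range(n)]


print(tree_prefix([1, 2, 3, 4, 5, 6, 7, 8]))  # [1, 3, 6, 10, 15, 21, 28, 36]
```

Both passes take one step per tree level, so the total time is O(log n).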

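4 How to do it on a hypercube?
A complete binary tree can be embedded into a hypercube, so the tree algorithm can be used directly. A simpler solution: each node j computes both its prefix x[j] and the total sum[j] of the subcube it currently belongs to. Initially x[j] = sum[j] = a_j.
for i = 0 to k-1
  for j = 0 to n-1 do in parallel
    x[j] = x[j] + sum[j^(i)]   if the i-th bit of j is 1
    sum[j] = sum[j] + sum[j^(i)]
where j^(i) is the neighbor of j across dimension i: its binary representation is the same as that of j except that the i-th bit is complemented. Both updates use the values of sum from before step i.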
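The hypercube loop can be checked with a short simulation in which node j exchanges `sum` with its neighbor `j ^ (1 << i)` in step i. This is a sketch under the slide's assumptions (n = 2^k nodes, + commutative); the variable `total` plays the role of the slide's `sum[j]` (renamed only to avoid shadowing Python's built-in `sum`), and `hypercube_prefix` is a hypothetical name.

```python
from typing import List, Tuple


def hypercube_prefix(a: List[int]) -> Tuple[List[int], List[int]]:
    """Return (prefix at each node, total at each node) after k exchange steps."""
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "assumes n = 2^k nodes"
    k = n.bit_length() - 1
    x = list(a)      # prefix held by node j
    total = list(a)  # slide's sum[j]: total of j's current subcube
    for i in range(k):
        old = total[:]               # all nodes exchange along dimension i at once
        for j in range(n):           # conceptually in parallel
            partner = j ^ (1 << i)   # j^(i): j with its i-th bit complemented
            if j & (1 << i):         # partner covers lower indices, so add its total
                x[j] = old[partner] + x[j]
            total[j] = old[j] + old[partner]
    return x, total


x, total = hypercube_prefix([1, 2, 3, 4, 5, 6, 7, 8])
print(x)      # [1, 3, 6, 10, 15, 21, 28, 36]
print(total)  # [36, 36, 36, 36, 36, 36, 36, 36]
```

Each step uses only links along one dimension, so k = log n steps suffice, and every node also ends up holding the overall total.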

5 Prefix on hypercube
Example with n = 8 inputs a0 ... a7, running the loop: for i = 0 to k-1, for j = 0 to n-1 in parallel, x[j] = x[j] + sum[j^(i)] if the i-th bit of j is 1, and sum[j] = sum[j] + sum[j^(i)]. Values at nodes 0..7 after each step ([i:j] = a_i + ... + a_j):
d=1: X   = [0:0] [0:1] [2:2] [2:3] [4:4] [4:5] [6:6] [6:7]
     SUM = [0:1] [0:1] [2:3] [2:3] [4:5] [4:5] [6:7] [6:7]
d=2: X   = [0:0] [0:1] [0:2] [0:3] [4:4] [4:5] [4:6] [4:7]
     SUM = [0:3] [0:3] [0:3] [0:3] [4:7] [4:7] [4:7] [4:7]
d=4: X   = [0:0] [0:1] [0:2] [0:3] [0:4] [0:5] [0:6] [0:7]
     SUM = [0:7] [0:7] [0:7] [0:7] [0:7] [0:7] [0:7] [0:7]

6 Applications of data parallel operations
Any associative operation works. Examples:
– min, max, add
– adding two binary numbers
– finite state automata
– radix sort
– segmented prefix sum
– routing: packing, unpacking, broadcast (copy-scan)
– solving recurrence equations
– straight-line computation (parallel arithmetic evaluation)

7 Adding two n-bit numbers as a parallel prefix
Let a = a_{n-1} … a_0, b = b_{n-1} … b_0 and s = a + b. Note that $s_i = a_i \oplus b_i \oplus c_{i-1}$ (⊕ = XOR, $c_{-1} = 0$), so it suffices to compute the carry bits $c_i$. Define the generate and propagate bits
$g_i = a_i \land b_i, \qquad p_i = a_i \oplus b_i$
and the operator ∘ as
$(g, p) \circ (g', p') = (g \lor (p \land g'),\ p \land p')$.
The operator ∘ is associative, and the carry bit $c_i$ can be computed by
$(G_i, P_i) = (g_i, p_i) \circ (g_{i-1}, p_{i-1}) \circ \cdots \circ (g_0, p_0), \qquad c_i = G_i$.
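A sketch of the carry computation as a log-depth prefix over (g_i, p_i) pairs, checked against ordinary integer addition. Here the propagate bit is taken as p_i = a_i XOR b_i (a_i OR b_i would give the same carries), and `carry_op` / `add_via_prefix` are hypothetical names. The combining operator is (g, p) ∘ (g', p') = (g OR (p AND g'), p AND p'), with the more significant block on the left.

```python
from typing import List, Tuple

GP = Tuple[int, int]  # (generate, propagate)


def carry_op(hi: GP, lo: GP) -> GP:
    """(g, p) o (g', p') = (g | (p & g'), p & p'); hi is the more significant block."""
    g, p = hi
    g2, p2 = lo
    return (g | (p & g2), p & p2)


def add_via_prefix(a: int, b: int, n: int) -> int:
    """Add two n-bit numbers (n >= 1) using a parallel prefix over (g_i, p_i)."""
    abits = [(a >> i) & 1 for i in range(n)]
    bbits = [(b >> i) & 1 for i in range(n)]
    x: List[GP] = [(ai & bi, ai ^ bi) for ai, bi in zip(abits, bbits)]
    d = 1
    while d < n:  # log-depth scan; afterwards x[j] == (G_j, P_j)
        old = x[:]
        for j in range(d, n):
            x[j] = carry_op(old[j], old[j - d])
        d *= 2
    carries = [G for G, _ in x]  # c_i = G_i
    s = 0
    for i in range(n):  # s_i = a_i xor b_i xor c_{i-1}, with c_{-1} = 0
        c_prev = carries[i - 1] if i > 0 else 0
        s |= (abits[i] ^ bbits[i] ^ c_prev) << i
    return s | (carries[n - 1] << n)  # the final carry becomes bit n


print(add_via_prefix(200, 100, 8))  # 300
```

This is the idea behind carry-lookahead adders: the carry chain, which is sequential in a ripple-carry adder, becomes an O(log n)-depth prefix.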

8 Hardware circuit of recursive look-ahead adder

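9 Parsing a regular language
A DFA over {b, c} with states q0, q1, q2 and the reject state qr:
δ(q0,b) = q2, δ(q0,c) = q1, δ(q1,b) = q0, δ(q1,c) = qr, δ(q2,b) = qr, δ(q2,c) = q0 (qr stays in qr on every symbol).
Each input symbol is a mapping of states: b maps q0->q2, q1->q0, q2->qr; c maps q0->q1, q1->qr, q2->q0. Composition of such mappings is associative, so a parallel prefix over the input string gives the state reached after every prefix.
[Figure: DFA diagram and the prefix composition of the state mappings for a sample input string]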
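A sketch of the idea: each symbol becomes a tuple whose entry s is δ(s, symbol), and a log-depth scan composes these tuples. The state encoding (q0..q2 as 0..2, qr as 3) and the names `DELTA`, `compose`, `states_after_prefixes` are assumptions for illustration; the transitions themselves are the ones listed on the slide.

```python
from typing import Dict, List, Tuple

Q0, Q1, Q2, QR = 0, 1, 2, 3  # QR is the reject state
StateMap = Tuple[int, ...]

# Each symbol as a state mapping: entry s is delta(s, symbol).
DELTA: Dict[str, StateMap] = {
    "b": (Q2, Q0, QR, QR),  # q0->q2, q1->q0, q2->qr, qr->qr
    "c": (Q1, QR, Q0, QR),  # q0->q1, q1->qr, q2->q0, qr->qr
}


def compose(f: StateMap, g: StateMap) -> StateMap:
    """Apply f, then g. Function composition is associative."""
    return tuple(g[f[s]] for s in range(len(f)))


def states_after_prefixes(word: str, start: int = Q0) -> List[int]:
    x: List[StateMap] = [DELTA[ch] for ch in word]
    n, d = len(x), 1
    while d < n:  # log-depth prefix over state mappings
        old = x[:]
        for j in range(d, n):
            x[j] = compose(old[j - d], old[j])  # earlier symbols are applied first
        d *= 2
    return [m[start] for m in x]


print(states_after_prefixes("bccbc"))  # [2, 0, 1, 0, 1], i.e. q2 q0 q1 q0 q1
```

The mappings have a fixed size (one entry per state), so each composition costs O(|Q|) and the whole parse takes O(|Q| log n) parallel time.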

10 Segmented prefix operation
before: 1 2 | 3 4 5 6 | 7 8    (| marks a segment boundary)
after:  1 3 | 3 7 12 18 | 7 15

11 Segmented prefix computation
Let ⊗ be any associative operation. For the segmented version of ⊗, write |a for an element that starts a segment and define ⊗' as follows:
  a ⊗' b = a ⊗ b
  a ⊗' |b = |b
  |a ⊗' b = |(a ⊗ b)
  |a ⊗' |b = |b
Then ⊗' is associative, so any O(log n) parallel prefix algorithm computes the segmented operation in O(log n) time.
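A sketch that encodes each element as a (flag, value) pair, with flag 1 meaning "starts a segment", and runs the same log-depth scan as Approach 1 with the segmented operator: a ⊗' b = a ⊗ b, a ⊗' |b = |b, |a ⊗' b = |(a ⊗ b), |a ⊗' |b = |b. The pair encoding and the names `segmented_scan` / `seg_op` are our own choices; the example reproduces the before/after rows of the segmented prefix slide.

```python
from operator import add
from typing import Callable, List, Tuple


def segmented_scan(values: List[int], flags: List[int],
                   op: Callable[[int, int], int] = add) -> List[int]:
    """Segmented inclusive scan; flags[i] = 1 marks the start of a segment."""

    def seg_op(left: Tuple[int, int], right: Tuple[int, int]) -> Tuple[int, int]:
        fa, a = left
        fb, b = right
        if fb:                  # a (x)' |b = |b   and   |a (x)' |b = |b
            return (1, b)
        return (fa, op(a, b))   # a (x)' b = a (x) b   and   |a (x)' b = |(a (x) b)

    x = list(zip(flags, values))
    n, d = len(x), 1
    while d < n:  # ordinary log-depth scan, using the segmented operator
        old = x[:]
        for j in range(d, n):
            x[j] = seg_op(old[j - d], old[j])
        d *= 2
    return [v for _, v in x]


print(segmented_scan([1, 2, 3, 4, 5, 6, 7, 8], [1, 0, 1, 0, 0, 0, 1, 0]))
# [1, 3, 3, 7, 12, 18, 7, 15]
```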

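12 Enumerating
data         = [5 6 3 1 8 3 7 5 9 2]
active procs = [1 0 1 1 0 0 1 0 1 0]
enumerated   = [0 x 1 2 x x 3 x 4 x]
Each active processor receives the number of active processors before it, i.e. an exclusive prefix sum of the active flags (x marks an inactive processor).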
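Enumeration is just a prefix sum of the 0/1 active flags, shifted by one. The sketch below uses Python's `itertools.accumulate` as a stand-in for any of the parallel scans above, and represents the slide's "x" as `None`; `enumerate_active` is a hypothetical name.

```python
from itertools import accumulate
from typing import List, Optional


def enumerate_active(active: List[int]) -> List[Optional[int]]:
    """Number the active processors 0, 1, 2, ... in order; None for inactive ones."""
    inclusive = list(accumulate(active))  # stand-in for a parallel prefix sum
    return [inclusive[i] - 1 if active[i] else None for i in range(len(active))]


print(enumerate_active([1, 0, 1, 1, 0, 0, 1, 0, 1, 0]))
# [0, None, 1, 2, None, None, 3, None, 4, None]
```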

13 Packing
data         = [5 6 3 1 8 3 7 5 9 2]
active procs = [1 0 1 1 0 0 1 0 1 0]
enumerated   = [0 x 1 2 x x 3 x 4 x]
packed data  = [5 3 1 7 9 x x x x x]
Each active processor sends its data to the address given by its enumeration number.
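Packing combines the enumeration with one parallel write: active element i goes to address `enumerated[i]`. A minimal sketch, again with `accumulate` standing in for a parallel prefix sum and `None` for "x"; the name `pack` is hypothetical.

```python
from itertools import accumulate
from typing import List, Optional


def pack(data: List[int], active: List[int]) -> List[Optional[int]]:
    """Move the active elements to the front, keeping their order (None = 'x')."""
    ranks = list(accumulate(active))  # enumerated[i] == ranks[i] - 1 for active i
    packed: List[Optional[int]] = [None] * len(data)
    for i, is_active in enumerate(active):  # conceptually in parallel
        if is_active:
            packed[ranks[i] - 1] = data[i]  # send data[i] to address enumerated[i]
    return packed


print(pack([5, 6, 3, 1, 8, 3, 7, 5, 9, 2], [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]))
# [5, 3, 1, 7, 9, None, None, None, None, None]
```

Because the destinations are distinct, the write step has no conflicts, so packing costs one prefix sum plus one permutation.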

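14 Packing and unpacking on a hypercube
In each step, a packet crosses dimension i when bit i of its current address differs from bit i of its destination ("adjust bit i").
Packing: adjust bit 0, adjust bit 1, adjust bit 2, ..., adjust bit k-1
Unpacking: adjust bit k-1, adjust bit k-2, ..., adjust bit 1, adjust bit 0
How about in the order of adjust bit 0, 1, ..., k-1 for packing?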
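A small simulation makes the dimension order concrete: every packet fixes one address bit per step, and we check whether two packets ever land on the same node. Treating a shared node as a conflict (one packet per node) is an assumption of this sketch, as is the name `route_by_dimension`. The example uses the active addresses 0, 2, 3, 6, 8 from the enumerating slide on a 16-node hypercube (k = 4).

```python
from typing import Iterable, List


def route_by_dimension(sources: List[int], dests: List[int], bit_order: Iterable[int]) -> bool:
    """Fix one address bit per step in bit_order; False if two packets ever share a node."""
    pos = list(sources)
    for i in bit_order:
        for p in range(len(pos)):
            if (pos[p] ^ dests[p]) & (1 << i):  # "adjust bit i": cross dimension i
                pos[p] ^= 1 << i
        if len(set(pos)) != len(pos):           # conflict in this one-packet-per-node model
            return False
    return pos == list(dests)


active_addrs = [0, 2, 3, 6, 8]
ranks = [0, 1, 2, 3, 4]
print(route_by_dimension(active_addrs, ranks, [0, 1, 2, 3]))  # packing, bit 0 first: True
print(route_by_dimension(ranks, active_addrs, [3, 2, 1, 0]))  # unpacking, bit k-1 first: True
print(route_by_dimension(active_addrs, ranks, [3, 2, 1, 0]))  # packing, bit k-1 first: False
```

On this example, adjusting bit 3 first sends the packet from address 8 to node 0, which is already occupied by the packet from address 0.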

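15 Unpacking
address       =  0 1 2 3 4 5 6 7 8 9
data          = [6 2 3 5 9 x x x x x]
active procs  = [1 0 1 1 0 0 1 0 1 0]
enumerated    = [0 x 1 2 x x 3 x 4 x]
destination   = [0 2 3 6 8 x x x x x]
unpacked data = [6 x 2 3 x x 5 x 9 x]
destination[r] is the address of the active processor enumerated r, so packed element r is sent there.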
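Unpacking reverses packing: first pack the addresses of the active processors to obtain `destination`, then send packed element r to `destination[r]`. A sketch with `None` for "x"; the name `unpack` is hypothetical and `accumulate` again stands in for a parallel prefix sum.

```python
from itertools import accumulate
from typing import List, Optional


def unpack(packed: List[Optional[int]], active: List[int]) -> List[Optional[int]]:
    """Spread packed[0], packed[1], ... to the active addresses, in order."""
    n = len(active)
    ranks = list(accumulate(active))
    destination: List[Optional[int]] = [None] * n
    for i in range(n):  # pack the addresses of the active processors
        if active[i]:
            destination[ranks[i] - 1] = i
    out: List[Optional[int]] = [None] * n
    for r, d in enumerate(destination):  # packed element r goes to address destination[r]
        if d is not None:
            out[d] = packed[r]
    return out


print(unpack([6, 2, 3, 5, 9, None, None, None, None, None], [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]))
# [6, None, 2, 3, None, None, 5, None, 9, None]
```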

16 Copy scan (broadcast)
address       =  0 1 2 3 4 5 6 7 8 9
data          = [6 2 3 5 9 4 1 7 8 10]
segmented bit = [1 0 1 1 0 0 1 0 1 0]
result        = [6 6 3 5 5 5 1 1 8 8]
Each element receives the value at the start of its segment.
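Copy scan is a segmented prefix whose underlying operation keeps its left argument, op(a, b) = a, which is associative. A sketch using the same (flag, value) encoding as the segmented scan; the name `copy_scan` is hypothetical.

```python
from typing import List, Tuple


def copy_scan(data: List[int], seg: List[int]) -> List[int]:
    """Broadcast the first value of each segment to the whole segment."""

    def seg_first(left: Tuple[int, int], right: Tuple[int, int]) -> Tuple[int, int]:
        fa, a = left
        fb, b = right
        return (1, b) if fb else (fa, a)  # new segment: restart; otherwise keep the left value

    x = list(zip(seg, data))
    n, d = len(x), 1
    while d < n:  # log-depth scan with the segmented "first" operator
        old = x[:]
        for j in range(d, n):
            x[j] = seg_first(old[j - d], old[j])
        d *= 2
    return [v for _, v in x]


print(copy_scan([6, 2, 3, 5, 9, 4, 1, 7, 8, 10], [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]))
# [6, 6, 3, 5, 5, 5, 1, 1, 8, 8]
```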

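17 Radix sort
for j = 0 to k-1   // x has k bits; the least significant bit is processed first
  for all i in [0..n-1] do in parallel {
    if the j-th bit of x[i] is 0 {
      y[i] = enumerate        // rank among the elements whose j-th bit is 0
      c = count               // number of elements whose j-th bit is 0
    }
    if the j-th bit of x[i] is 1
      y[i] = enumerate + c    // rank among the elements whose j-th bit is 1, offset by c
    x[y[i]] = x[i]
  }
Radix sort, another version:
for j = 0 to k-1   // x has k bits
  for all i in [0..n-1] do in parallel {
    pack left x[i] if the j-th bit of x[i] is 0
    pack right x[i] if the j-th bit of x[i] is 1
  }
Each pass must be stable and must start from the least significant bit; processing bit k-1 first would leave the keys ordered by bit 0.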
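A sketch of the first version: per bit, enumerate the 0s and the 1s separately, offset the 1s by the count c of 0s, and permute. It assumes non-negative integers smaller than 2^k; `split_radix_sort` is a hypothetical name, and `accumulate` stands in for the parallel enumerate step.

```python
from itertools import accumulate
from typing import List


def split_radix_sort(xs: List[int], k: int) -> List[int]:
    """LSD radix sort of non-negative k-bit integers using one stable split per bit."""
    x = list(xs)
    n = len(x)
    for j in range(k):  # least significant bit first
        bits = [(v >> j) & 1 for v in x]
        zeros = [1 - b for b in bits]
        zero_rank = list(accumulate(zeros))  # enumerate the elements whose bit j is 0
        one_rank = list(accumulate(bits))    # enumerate the elements whose bit j is 1
        c = sum(zeros)                       # count of elements whose bit j is 0
        y = [zero_rank[i] - 1 if bits[i] == 0 else c + one_rank[i] - 1 for i in range(n)]
        moved = [0] * n
        for i in range(n):                   # x[y[i]] = x[i], one parallel permutation
            moved[y[i]] = x[i]
        x = moved
    return x


print(split_radix_sort([5, 6, 3, 1, 8, 3, 7, 5, 9, 2], 4))  # [1, 2, 3, 3, 5, 5, 6, 7, 8, 9]
```

Each pass is O(log n) time for the scans plus one permutation, so the whole sort is O(k log n).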

18 Quick sort
1. Pick a pivot p.
2. Broadcast p.
3. For every PE i, compare A[i] with p:
   if A[i] < p, pack A[i] left within its segment;
   if A[i] >= p, pack A[i] right within its segment.
4. Mark the new segment boundaries.
5. Quick sort each segment recursively.
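A sequential simulation of the round structure, where all current segments are split in the same round. Two choices here are assumptions rather than the slide's: the pivot is the first element of each segment, and elements equal to the pivot get their own segment (a three-way split). Without the three-way split, a segment whose pivot is its minimum would never shrink under the "< p / >= p" rule. The name `parallel_quicksort` is hypothetical.

```python
from typing import List


def parallel_quicksort(a: List[int]) -> List[int]:
    """Split every unsorted segment once per round until all segments are sorted."""
    segments: List[List[int]] = [list(a)] if a else []
    while any(len(set(s)) > 1 for s in segments):
        next_segments: List[List[int]] = []
        for seg in segments:  # conceptually, all segments at once
            if len(set(seg)) <= 1:  # singleton or all-equal: already sorted
                next_segments.append(seg)
                continue
            p = seg[0]  # step 1: pivot (first element, an assumption); step 2: broadcast it
            left = [v for v in seg if v < p]     # step 3: pack left within the segment
            middle = [v for v in seg if v == p]  # three-way variant: pivot copies in the middle
            right = [v for v in seg if v > p]    # pack right within the segment
            next_segments.extend(s for s in (left, middle, right) if s)  # step 4: new boundaries
        segments = next_segments
    return [v for seg in segments for v in seg]


print(parallel_quicksort([5, 6, 3, 1, 8, 3, 7, 5, 9, 2]))  # [1, 2, 3, 3, 5, 5, 6, 7, 8, 9]
```

With a segmented machine, step 3 is a segmented pack and step 2 a copy scan, so each round takes O(log n) time; the number of rounds depends on pivot quality.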

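19 Solving linear recurrence equations
$f_n = a_{n-1} f_{n-1} + a_{n-2} f_{n-2}$
In matrix form,
$\begin{pmatrix} f_n \\ f_{n-1} \end{pmatrix} = \begin{pmatrix} a_{n-1} & a_{n-2} \\ 1 & 0 \end{pmatrix} \begin{pmatrix} f_{n-1} \\ f_{n-2} \end{pmatrix}$
Since matrix multiplication is associative, all values $f_n$ follow from the prefix products of these 2×2 matrices.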
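A sketch of the matrix formulation: with $M_n = \begin{pmatrix} a_{n-1} & a_{n-2} \\ 1 & 0 \end{pmatrix}$, the prefix products $P_n = M_n M_{n-1} \cdots M_2$ satisfy $(f_n, f_{n-1})^T = P_n (f_1, f_0)^T$. Matrix products do not commute, so the scan combines an earlier prefix with a later matrix as `later @ earlier`. The names `mat_mul` and `solve_recurrence`, and reading the coefficients from a single sequence `a`, are assumptions of this illustration.

```python
from typing import Callable, List, Tuple

Mat = Tuple[Tuple[int, int], Tuple[int, int]]


def mat_mul(A: Mat, B: Mat) -> Mat:
    return ((A[0][0] * B[0][0] + A[0][1] * B[1][0], A[0][0] * B[0][1] + A[0][1] * B[1][1]),
            (A[1][0] * B[0][0] + A[1][1] * B[1][0], A[1][0] * B[0][1] + A[1][1] * B[1][1]))


def scan(xs: List[Mat], op: Callable[[Mat, Mat], Mat]) -> List[Mat]:
    x = list(xs)
    n, d = len(x), 1
    while d < n:  # log-depth scan; op(earlier, later)
        old = x[:]
        for j in range(d, n):
            x[j] = op(old[j - d], old[j])
        d *= 2
    return x


def solve_recurrence(a: List[int], f0: int, f1: int, n_max: int) -> List[int]:
    """f_0 .. f_{n_max} for f_n = a_{n-1} f_{n-1} + a_{n-2} f_{n-2}."""
    mats: List[Mat] = [((a[n - 1], a[n - 2]), (1, 0)) for n in range(2, n_max + 1)]
    prefix = scan(mats, lambda earlier, later: mat_mul(later, earlier))  # P_n = M_n ... M_2
    return [f0, f1] + [P[0][0] * f1 + P[0][1] * f0 for P in prefix]


print(solve_recurrence([1] * 10, 0, 1, 10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```

With all coefficients equal to 1 the recurrence is the Fibonacci sequence, which makes a convenient check.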

20 Pointer jumping and tree computation
How do we compute a prefix on a linked list?
List: 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7
If NEXT[i] != NIL then
  X[i] <- X[i] + X[NEXT[i]]
  NEXT[i] <- NEXT[NEXT[i]]
After step 1: 3 5 7 9 11 13 7
After step 2: 10 14 18 22 18 13 7
After step 3: 28 27 25 22 18 13 7
How can we obtain the order 1 3 6 10 15 21 28 instead?
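A sketch of pointer jumping with synchronous rounds (both arrays are copied before each round, so every processor reads old values). `None` plays the role of NIL, and the name `pointer_jump` is hypothetical. On the slide's 7-element list it reproduces the three rows of the trace and stops after ⌈log2 7⌉ = 3 rounds.

```python
from typing import List, Optional, Tuple


def pointer_jump(values: List[int], nxt: List[Optional[int]]) -> Tuple[List[int], int]:
    """Each node ends with the sum from itself to the end of the list (suffix sums)."""
    x = list(values)
    nxt = list(nxt)
    rounds = 0
    while any(p is not None for p in nxt):
        old_x, old_next = x[:], nxt[:]
        for i in range(len(x)):  # conceptually in parallel
            if old_next[i] is not None:
                x[i] = old_x[i] + old_x[old_next[i]]  # X[i] <- X[i] + X[NEXT[i]]
                nxt[i] = old_next[old_next[i]]        # NEXT[i] <- NEXT[NEXT[i]]
        rounds += 1
    return x, rounds


print(pointer_jump([1, 2, 3, 4, 5, 6, 7], [1, 2, 3, 4, 5, 6, None]))
# ([28, 27, 25, 22, 18, 13, 7], 3)
```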

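21 Application: tree computation
Pre-order numbering
[Figure: example tree with labels "each node" and "leaf node" and weights of 1]
The same technique can be applied to in-order and post-order numbering, the number of children, depth, etc., and also to biconnected components.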
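The slide's figure is not recoverable, so the sketch below uses one common way to get pre-order numbers from a prefix sum, the Euler tour technique, named here as our own choice: walk the tree as a list of edges, give each downward edge weight 1 and each upward edge weight 0, and take a prefix sum; a node's pre-order number is the prefix value at the edge that enters it. For brevity the tour is built sequentially by a DFS, whereas a parallel version would build it with a successor function and use pointer jumping for the scan. The name `preorder_via_euler_tour` and the dictionary tree encoding are hypothetical.

```python
from itertools import accumulate
from typing import Dict, List, Tuple


def preorder_via_euler_tour(children: Dict[int, List[int]], root: int) -> Dict[int, int]:
    """Pre-order numbers (root = 0) from a prefix sum over the Euler tour."""
    tour: List[Tuple[int, int]] = []  # (node entered, weight): 1 going down, 0 going up
    stack = [(root, iter(children.get(root, [])))]
    while stack:
        node, it = stack[-1]
        child = next(it, None)
        if child is None:
            stack.pop()
            if stack:
                tour.append((stack[-1][0], 0))  # upward edge back to the parent
        else:
            tour.append((child, 1))             # downward edge into the child
            stack.append((child, iter(children.get(child, []))))
    sums = list(accumulate(w for _, w in tour))
    pre = {root: 0}
    for (node, w), s in zip(tour, sums):
        if w == 1:
            pre[node] = s
    return pre


print(preorder_via_euler_tour({0: [1, 2], 1: [3, 4], 2: [5]}, 0))
# {0: 0, 1: 1, 3: 2, 4: 3, 2: 4, 5: 5}
```

Changing the edge weights gives other quantities: for example, +1 going down and -1 going up yields each node's depth.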

22 Recurrence Equation Example: LU decomposition on a triangular matrix

