Published by Morgan Perkins. Modified over 8 years ago.
1
PRAM ALGORITHMS-3
Computer Engg, IIT (BHU), 3/12/2013
2
Euler Tours
● Technique for fast optimal processing of tree data
● Euler circuit of a directed graph: directed cycle that traverses each edge exactly once
● Represent a (rooted) tree by the Euler circuit of its directed version
3
Trees (Balanced Parentheses)
● Key property: the parenthesis subsequence corresponding to a subtree is balanced.
( ( ( ) ( ) ) ( ) ( ( ) ( ) ( ) ) )
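The key property can be checked with a small sketch. It assumes a tree is given as a nested Python list of children (an illustrative encoding, not one used in the slides); the helper names `to_parens` and `is_balanced` are hypothetical.

```python
def to_parens(children):
    """Emit '(' when entering a node and ')' when leaving it."""
    return "(" + "".join(to_parens(c) for c in children) + ")"


def is_balanced(s):
    """True if every prefix has at least as many '(' as ')' and totals match."""
    open_count = 0
    for ch in s:
        open_count += 1 if ch == "(" else -1
        if open_count < 0:
            return False
    return open_count == 0


# Root with three children: one with 2 leaves, one leaf, one with 3 leaves.
tree = [[[], []], [], [[], [], []]]
sequence = to_parens(tree)
subtree_sequence = to_parens(tree[2])  # parentheses of the third subtree
```

Here `sequence` reproduces the string on the slide, and `subtree_sequence` appears inside it as a contiguous balanced run.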
4
Computing the Depth
● Problem definition
➢ Given a binary tree with n nodes, compute the depth of each node
● Serial algorithm takes O(n) time
● A simple parallel algorithm
➢ Starting from the root, compute the depths level by level
➢ Still O(n) because the height of the tree could be as high as n
● Euler tour algorithm
➢ Uses parallel prefix computation
5
Computing the Depth
● Euler tour: a cycle that traverses each edge exactly once in a graph
➢ Take the directed version of the tree: regard each undirected edge as two directed edges
➢ Any directed version of a tree has an Euler tour, obtained by traversing the tree in DFS order; the tour forms a linked list
● Employ 3*n processors
➢ Each node i has fields i.parent, i.left, i.right
➢ Each node i has three processors, i.A, i.B, and i.C
6
Computing the Depth
● The three processors in each node of the tree are linked as follows
➢ i.A = i.left.A if i.left != nil
        i.B if i.left = nil
➢ i.B = i.right.A if i.right != nil
        i.C if i.right = nil
➢ i.C = i.parent.B if i is the left child
        i.parent.C if i is the right child
        nil if i.parent = nil
7
Computing the Depth
● Algorithm
➢ Construct the Euler tour for the tree – O(1) time
➢ Assign 1 to all A processors, 0 to B processors, -1 to C processors
➢ Perform a parallel prefix computation along the tour
➢ The depth of each node resides in its C processor (taking the root's depth as 0)
● O(log n) time
➢ Actually log 3n, since the tour has 3n elements
● EREW because no concurrent read or write
● Speedup
➢ S = n/log n
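The steps above can be sketched as a sequential simulation. This is only an illustration under stated assumptions: nodes are stored in a list and referenced by index, the `Node` class and `euler_tour_depths` name are hypothetical, and the prefix computation is done by pointer jumping over predecessor links, with each loop iteration standing in for one synchronous parallel step.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    parent: Optional[int] = None
    left: Optional[int] = None
    right: Optional[int] = None


def euler_tour_depths(nodes: List[Node]) -> List[int]:
    # Step 1: link the 3n "processors" (i, 'A'), (i, 'B'), (i, 'C') into a list.
    succ = {}
    for i, nd in enumerate(nodes):
        succ[(i, "A")] = (nd.left, "A") if nd.left is not None else (i, "B")
        succ[(i, "B")] = (nd.right, "A") if nd.right is not None else (i, "C")
        if nd.parent is None:
            succ[(i, "C")] = None
        elif nodes[nd.parent].left == i:
            succ[(i, "C")] = (nd.parent, "B")
        else:
            succ[(i, "C")] = (nd.parent, "C")

    # Step 2: weights 1 / 0 / -1 for A / B / C.
    weight = {"A": 1, "B": 0, "C": -1}
    val = {x: weight[x[1]] for x in succ}

    # Step 3: prefix sums by pointer jumping on predecessor links.
    pred = {x: None for x in succ}
    for x, nxt in succ.items():
        if nxt is not None:
            pred[nxt] = x
    while any(p is not None for p in pred.values()):
        new_val, new_pred = dict(val), dict(pred)  # snapshot = synchronous step
        for x, p in pred.items():
            if p is not None:
                new_val[x] = val[x] + val[p]
                new_pred[x] = pred[p]
        val, pred = new_val, new_pred

    # Step 4: the depth of node i is the prefix sum at (i, 'C'); root depth is 0.
    return [val[(i, "C")] for i in range(len(nodes))]


# Node 0 is the root; 1 and 2 are its children; 3 is 1's left child; 4 is 3's right child.
example = [Node(None, 1, 2), Node(0, 3, None), Node(0), Node(1, None, 4), Node(3)]
depths = euler_tour_depths(example)
```

The `while` loop runs about log2(3n) times, which is where the "log 3n" on the slide comes from.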
8
Computing the depth
9
Broadcasting on a PRAM
● "Broadcast" can be done on a CREW PRAM in O(1) steps:
➢ Broadcaster sends value to shared memory
➢ Processors read from shared memory
● Requires lg(P) steps on an EREW PRAM.
(Figure: shared memory M, processors P, broadcaster B.)
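One common way to reach the lg(P) bound on an EREW PRAM is recursive doubling: in each step every processor that already holds the value copies it to one new, distinct cell. The sketch below simulates this sequentially; the function name `erew_broadcast` and the per-processor `cells` array are assumptions made for illustration.

```python
def erew_broadcast(value, p):
    """Simulate broadcasting `value` to p cells with exclusive reads and writes."""
    cells = [None] * p
    cells[0] = value          # the broadcaster writes its own cell
    filled = 1                # cells[0:filled] already hold the value
    steps = 0
    while filled < p:
        # One parallel step: processor i reads only cells[i] and writes only
        # cells[i + filled], so no cell is read or written twice.
        for i in range(min(filled, p - filled)):
            cells[i + filled] = cells[i]
        filled = min(2 * filled, p)
        steps += 1
    return cells, steps


cells, steps = erew_broadcast(42, 8)
```

For P = 8 the simulation takes 3 steps, matching lg(8).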
10
Concurrent Write - Finding Max
● Finding max problem
➢ Given an array of n elements, find the maximum(s)
➢ Sequential algorithm is O(n)
● Data structure for parallel algorithm
➢ Array A[1..n]
➢ Array m[1..n]. m[i] is true if A[i] is the maximum
➢ Use n² processors
11
Concurrent Write - Finding Max
● Fast_max(A, n)
    for i = 1 to n do, in parallel
        m[i] = true    // A[i] is potentially maximum
    for i = 1 to n, j = 1 to n do, in parallel
        if A[i] < A[j] then
            m[i] = false
    for i = 1 to n do, in parallel
        if m[i] = true then max = A[i]
    return max
● Time complexity: O(1)
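A sequential simulation of Fast_max may make the three parallel phases easier to follow. It is a sketch only: the nested loops stand in for the n² processors, 0-based indexing replaces the slide's 1-based arrays, and a non-empty input is assumed.

```python
def fast_max(a):
    """Simulate the CRCW Fast_max; returns (max, m) for a non-empty list a."""
    n = len(a)
    # Phase 1: every element is a candidate.
    m = [True] * n
    # Phase 2: one "processor" per pair (i, j); all writers to m[i] write False.
    for i in range(n):
        for j in range(n):
            if a[i] < a[j]:
                m[i] = False
    # Phase 3: every surviving candidate writes the same value into `result`.
    result = None
    for i in range(n):
        if m[i]:
            result = a[i]
    return result, m


maximum, flags = fast_max([3, 9, 2, 9, 5])
```

With ties, several entries of `flags` stay true, and all of them write the same maximum.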
12
Concurrent Write - Finding Max
● Concurrent-write
➢ In the comparison loop (if A[i] < A[j] then m[i] = false), all processors with A[i] < A[j] write the same value 'false' into the same location m[i]
➢ This actually implements m[i] = (A[i] ≥ A[1]) ∧ … ∧ (A[i] ≥ A[n])
● Is this work efficient?
➢ No, it uses n² processors for O(1) time
➢ O(n²) work vs. O(n) for the sequential algorithm
13
Concurrent Write - Finding Max
● What is the time complexity with exclusive write?
➢ Initially all n elements "think" that they might be the maximum
➢ First iteration: compare n/2 pairs; n/2 elements might still be the maximum
➢ Second iteration: n/4 elements might be the maximum
➢ (log n)-th iteration: one element is the maximum
➢ So finding the maximum with exclusive write takes O(log n)
● O(1) (CRCW) vs. O(log n) (EREW)
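The halving argument can be sketched as a pairwise tournament. Each pass of the loop below stands in for one parallel iteration in which disjoint pairs are compared, so no cell is touched by two processors; the name `tournament_max` is hypothetical and a non-empty input is assumed.

```python
def tournament_max(a):
    """Return (max, rounds) using pairwise comparisons on disjoint pairs."""
    candidates = list(a)
    rounds = 0
    while len(candidates) > 1:
        # One parallel iteration: processor k compares its own pair only.
        survivors = [
            max(candidates[k], candidates[k + 1])
            for k in range(0, len(candidates) - 1, 2)
        ]
        if len(candidates) % 2:          # an unpaired element advances as is
            survivors.append(candidates[-1])
        candidates = survivors
        rounds += 1
    return candidates[0], rounds


best, rounds = tournament_max([3, 9, 2, 9, 5, 1, 7, 4])
```

For n = 8 the candidate set shrinks 8, 4, 2, 1, so three rounds are needed.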
14
Simulating CRCW with EREW
● CRCW algorithms are faster than EREW algorithms
➢ How much faster?
● Theorem
➢ A p-processor CRCW algorithm can be no more than O(log p) times faster than the best p-processor EREW algorithm
15
Simulating CRCW with EREW
● Proof by simulating CRCW steps with EREW steps
➢ Assumption: parallel sorting of n elements takes O(log n) time with n processors
➢ When CRCW processor p_i writes a datum x_i into a location l_i, EREW processor p_i writes the pair (l_i, x_i) into a separate location A[i]
– Note the EREW write is exclusive, while the CRCW write may be concurrent
➢ Sort A by l_i
– O(log p) time by assumption
➢ Compare adjacent elements in A
➢ For each group of elements with the same l_i, only one processor, say the first, writes x_i into global memory location l_i
– Note this write is also exclusive
➢ Total time complexity: O(log p) per simulated CRCW step
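The write simulation can be sketched as follows. This is an illustration under assumptions: Python's built-in sort stands in for the assumed O(log p) parallel sort, `memory` is a plain list, and the function name `simulate_concurrent_write` is hypothetical.

```python
def simulate_concurrent_write(memory, requests):
    """Apply one CRCW write step using only exclusive writes.

    requests[i] = (l_i, x_i): processor i wants to write x_i to memory[l_i].
    """
    # Step 1: processor i writes its pair into its own cell A[i].
    a = list(requests)
    # Step 2: sort A by location (stand-in for the parallel sort).
    a.sort(key=lambda pair: pair[0])
    # Step 3: processor k compares A[k] with A[k-1]; only the first element
    # of each group with equal location performs the write.
    for k in range(len(a)):
        if k == 0 or a[k][0] != a[k - 1][0]:
            location, datum = a[k]
            memory[location] = datum
    return memory


result = simulate_concurrent_write([0, 0, 0, 0], [(2, 7), (0, 5), (2, 7), (3, 1)])
```

Two processors request location 2 here, but only one write to it is carried out.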
16
Simulating CRCW with EREW
17
CRCW vs. EREW
● CRCW
➢ Hardware implementations are expensive
➢ Used infrequently
➢ Easier to program; algorithms take fewer steps; more powerful model
➢ Implemented hardware is slower than that of EREW
➢ In reality one cannot find the maximum in O(1) time
● EREW
➢ Programming model is too restrictive
➢ Cannot implement powerful algorithms