1 Parallel Algorithms (chap. 30, 1 st edition) Parallel: perform more than one operation at a time. PRAM model: Parallel Random Access Model. p0p0 p1p1.

Slides:



Advertisements
Similar presentations
Introduction to Algorithms
Advertisements

and 6.855J Cycle Canceling Algorithm. 2 A minimum cost flow problem , $4 20, $1 20, $2 25, $2 25, $5 20, $6 30, $
October 17, 2005 Copyright© Erik D. Demaine and Charles E. Leiserson L2.1 Introduction to Algorithms 6.046J/18.401J LECTURE9 Randomly built binary.
Introduction to Algorithms
© 2001 by Charles E. Leiserson Introduction to AlgorithmsDay 17 L9.1 Introduction to Algorithms 6.046J/18.401J/SMA5503 Lecture 9 Prof. Charles E. Leiserson.
0 - 0.
Addition Facts
1 Functional Programming Lecture 8 - Binary Search Trees.
CS16: Introduction to Data Structures & Algorithms
General algorithmic techniques: Balanced binary tree technique Doubling technique: List Ranking Problem Divide and concur Lecture 6.
Chapter 17 Linked Lists.
Parallel List Ranking Advanced Algorithms & Data Structures Lecture Theme 17 Prof. Dr. Th. Ottmann Summer Semester 2006.
Linked List Ranking Parallel Algorithms 1. Work Analysis – Pointer Jumping Number of Steps: Tp = O(Log N) Number of Processors: N Work = O(N log N) T1.
Parallel Algorithms.
Comp 122, Spring 2004 Graph Algorithms – 2. graphs Lin / Devi Comp 122, Fall 2004 Identification of Edges Edge type for edge (u, v) can be identified.
David Luebke 1 8/25/2014 CS 332: Algorithms Red-Black Trees.
Definitions and Bottom-Up Insertion
Divide and Conquer (Merge Sort)
Addition 1’s to 20.
Section 5: More Parallel Algorithms
2 x0 0 12/13/2014 Know Your Facts!. 2 x1 2 12/13/2014 Know Your Facts!
Chapter 12 Binary Search Trees
Foundations of Data Structures Practical Session #7 AVL Trees 2.
Datorteknik F1 bild 1 Higher Level Parallelism The PRAM Model Vector Processors Flynn Classification Connection Machine CM-2 (SIMD) Communication Networks.
Examples Concepts & Definitions Analysis of Algorithms
PRAM Algorithms Sathish Vadhiyar. PRAM Model - Introduction Parallel Random Access Machine Allows parallel-algorithm designers to treat processing power.
1 Lecture 5 PRAM Algorithm: Parallel Prefix Parallel Computing Fall 2008.
5 x4. 10 x2 9 x3 10 x9 10 x4 10 x8 9 x2 9 x4.
Parallel algorithms for expression evaluation Part1. Simultaneous substitution method (SimSub) Part2. A parallel pebble game.
Foldr and Foldl CS 5010 Program Design Paradigms “Bootcamp” Lesson 7.5 TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.:
Optimal PRAM algorithms: Efficiency of concurrent writing “Computer science is no more about computers than astronomy is about telescopes.” Edsger Dijkstra.
Lecture 3: Parallel Algorithm Design
Comp 122, Spring 2004 Binary Search Trees. btrees - 2 Comp 122, Spring 2004 Binary Trees  Recursive definition 1.An empty tree is a binary tree 2.A node.
Algorithms Analysis Lecture 6 Quicksort. Quick Sort Divide and Conquer.
Super computers Parallel Processing By: Lecturer \ Aisha Dawood.
PRAM (Parallel Random Access Machine)
Efficient Parallel Algorithms COMP308
Advanced Topics in Algorithms and Data Structures Classification of the PRAM model In the PRAM model, processors communicate by reading from and writing.
PRAM Models Advanced Algorithms & Data Structures Lecture Theme 13 Prof. Dr. Th. Ottmann Summer Semester 2006.
Parallel Prefix Computation Advanced Algorithms & Data Structures Lecture Theme 14 Prof. Dr. Th. Ottmann Summer Semester 2006.
Simulating a CRCW algorithm with an EREW algorithm Efficient Parallel Algorithms COMP308.
Uzi Vishkin.  Introduction  Objective  Model of Parallel Computation ▪ Work Depth Model ( ~ PRAM) ▪ Informal Work Depth Model  PRAM Model  Technique:
Analysis of Algorithms CS 477/677 Instructor: Monica Nicolescu.
Advanced Topics in Algorithms and Data Structures 1 Lecture 4 : Accelerated Cascading and Parallel List Ranking We will first discuss a technique called.
Accelerated Cascading Advanced Algorithms & Data Structures Lecture Theme 16 Prof. Dr. Th. Ottmann Summer Semester 2006.
UMass Lowell Computer Science Analysis of Algorithms Prof. Karen Daniels Fall, 2001 Lecture 9 Tuesday, 11/20/01 Parallel Algorithms Chapters 28,
Advanced Topics in Algorithms and Data Structures Page 1 An overview of lecture 3 A simple parallel algorithm for computing parallel prefix. A parallel.
The Euler-tour technique
1 Lecture 3 PRAM Algorithms Parallel Computing Fall 2008.
Fall 2008Paradigms for Parallel Algorithms1 Paradigms for Parallel Algorithms.
Advanced Topics in Algorithms and Data Structures 1 Two parallel list ranking algorithms An O (log n ) time and O ( n log n ) work list ranking algorithm.
Simulating a CRCW algorithm with an EREW algorithm Lecture 4 Efficient Parallel Algorithms COMP308.
1 Lecture 2: Parallel computational models. 2  Turing machine  RAM (Figure )  Logic circuit model RAM (Random Access Machine) Operations supposed to.
Computer Science and Engineering Parallel and Distributed Processing CSE 8380 February 8, 2005 Session 8.
Parallel Algorithms Sorting and more. Keep hardware in mind When considering ‘parallel’ algorithms, – We have to have an understanding of the hardware.
Parallel and Distributed Algorithms Eric Vidal Reference: R. Johnsonbaugh and M. Schaefer, Algorithms (International Edition) Pearson Education.
Parallel Algorithms. Parallel Models u Hypercube u Butterfly u Fully Connected u Other Networks u Shared Memory v.s. Distributed Memory u SIMD v.s. MIMD.
3/12/2013Computer Engg, IIT(BHU)1 PRAM ALGORITHMS-3.
3/12/2013Computer Engg, IIT(BHU)1 PRAM ALGORITHMS-1.
PRAM and Parallel Computing
Lecture 3: Parallel Algorithm Design
PRAM Model for Parallel Computation
Parallel Algorithms (chap. 30, 1st edition)
Lecture 2: Parallel computational models
PRAM Algorithms.
PRAM Model for Parallel Computation
CHAPTER 30 (in old edition) Parallel Algorithms
Parallel and Distributed Algorithms
Linked List and Selection Sort
Unit –VIII PRAM Algorithms.
Presentation transcript:

1 Parallel Algorithms (chap. 30, 1 st edition) Parallel: perform more than one operation at a time. PRAM model: Parallel Random Access Model. p0p0 p1p1 p n-1 Shared memory Multiple processors connected to a shared memory. Each processor access any location in unit time. All processors can access memory in parallel. All processors can perform operations in parallel.

2 Concurrent vs. Exclusive Access Four models –EREW: exclusive read and exclusive write –CREW: concurrent read and exclusive write –ERCW: exclusive read and concurrent write –CRCW: concurrent read and concurrent write Handling write conflicts –Common-write model: only if they write the same value. –Arbitrary-write model: an arbitrary one succeeds. –Priority-write model: the one with smallest index succeeds. EREW and CRCW are most popular.

3 Synchronization and Control Synchronization: –A most important and complicated issue –Suppose all processors are inherently tightly synchronized: All processors execute the same statements at the same time No race among processors, i.e, same pace. Termination control of a parallel loop: –Depend on the state of all processors –Can be tested in O(1) time.

4 Pointer Jumping –list ranking Given a single linked list L with n objects, compute, for each object in L, its distance from the end of the list. Formally: suppose next is the pointer field –d[i]= 0 if next[i]=nil – d[next[i]]+1 if next[i] nil Serial algorithm: (n).

5 List ranking –EREW algorithm LIST-RANK(L) (in O(lg n) time) 1.for each processor i, in parallel 2. do if next[i]=nil 3. then d[i] 0 4. else d[i] 1 5.while there exists an object i such that next[i] nil 6. do for each processor i, in parallel 7. do if next[i] nil 8. then d[i] d[i]+ d[next[i]] 9. next[i] next[next[i]]

6 List-ranking –EREW algorithm (a) (b) (c) (d)

7 List ranking –correctness of EREW algorithm Loop invariant: for each i, the sum of d values in the sublist headed by i is the correct distance from i to the end of the original list L. Parallel memory must be synchronized: the reads on the right must occur before the wirtes on the left. Moreover, read d[i] and then read d[next[i]]. An EREW algorithm: every read and write is exclusive. For an object i, its processor reads d[i], and then its precedent processor reads its d[i]. Writes are all in distinct locations.

8 LIST ranking EREW algorithm running time O(lg n): –The initialization for loop runs in O(1). –Each iteration of while loop runs in O(1). –There are exactly lg n iterations: Each iteration transforms each list into two interleaved lists: one consisting of objects in even positions, and the other odd positions. Thus, each iteration double the number of lists but halves their lengths. –The termination test in line 5 runs in O(1). –Define work =#processors running time. O(n lg n).

9 Parallel prefix on a list A prefix computation is defined as: –Input: –Binary associative operation –Output: –Such that: y 1 = x 1 y k = y k-1 x k for k=2,3, …,n, i.e, y k = x 1 x 2 … x k. –Suppose are stored orderly in a list. –Define notation: [i,j]= x i x i+1 … x j

10 Prefix computation LIST-PREFIX(L) 1.for each processor i, in parallel 2. do y[i] x[i] 3.while there exists an object i such that next[i] nil 4. do for each processor i, in parallel 5. do if next[i] nil 6. then y[next[i]] y[i] y[next[i]] 7. next[i] next[next[i]]

11 Prefix computation –EREW algorithm [1,1] x1x1 [2,2] x2x2 [3,3][4,4] x4x4 [5,5] x5x5 [6,6] x6x6 (a) x3x3 x4x4 (b) x1x1 x2x2 x5x5 x6x6 x3x3 [1,1] [1,2][2,3][3,4][4,5] [5,6] x1x1 x2x2 x5x5 x6x6 x3x3 x1x1 x2x2 x5x5 x6x6 x3x3 (c) (d) [1,1] [1,2][1,3][1,4][2,5] [3,6] [1,1] [1,2][1,3][1,4][1,5] [1,6]

12 Find root –CREW algorithm Suppose a forest of binary trees, each node i has a pointer parent[i]. Find the identity of the tree of each node. Assume that each node is associated a processor. Assume that each node i has a field root[i].

13 Find-roots –CREW algorithm FIND-ROOTS(F) 1.for each processor i, in parallel 2. do if parent[i] = nil 3. then root[i] i 4.while there exist a node i such that parent[i] nil 5. do for each processor i, in parallel 6. do if parent[i] nil 7. then root[i] root[parent[i]] 8. parent[i] parent[parent[i]]

14 Find root –CREW algorithm Running time: O(lg d), where d is the height of maximum-depth tree in the forest. All the writes are exclusive But the read in line 7 is concurrent, since several nodes may have same node as parent. See figure 30.5.

15 Find roots –CREW vs. EREW How fast can n nodes in a forest determine their roots using only exclusive read? (lg n) Argument: when exclusive read, a given peace of information can only be copied to one other memory location in each step, thus the number of locations containing a given piece of information at most doubles at each step. Looking at a forest with one tree of n nodes, the root identity is stored in one place initially. After the first step, it is stored in at most two places; after the second step, it is Stored in at most four places, …, so need lg n steps for it to be stored at n places. So CREW: O(lg d) and EREW: (lg n). If d=2 (lg n), CREW outperforms any EREW algorithm. If d= (lg n), then CREW runs in O(lg lg n), and EREW is much slower.

16 Find maximum – CRCW algorithm Given n elements A[0,n-1], find the maximum. Suppose n 2 processors, each processor (i,j) compare A[i] and A[j], for 0 i, j n-1. FAST-MAX(A) 1.n length[A] 2.for i 0 to n-1, in parallel 3. do m[i] true 4.for i 0 to n-1 and j 0 to n-1, in parallel 5. do if A[i] < A[j] 6. then m[i] false 7.for i 0 to n-1, in parallel 8. do if m[i] =true 9. then max A[i] 10.return max The running time is O(1). Note: there may be multiple maximum values, so their processors Will write to max concurrently. Its work = n 2 O(1) =O(n 2 ) m 5 F T T F T F 6 F F T F T F 9 F F F F F T 2 T T T F T F 9 F F F F F T A[j]A[j] A[i]A[i] max=9

17 Find maximum –CRCW vs. EREW If find maximum using EREW, then (lg n). Argument: consider how many elements think that they might be the maximum. –First, n, –After first step, n/2, –After second step n/4. …, each step, halve. Moreover, CREW takes (lg n).

18 Stimulating CRCW with EREW Theorem: –A p-processor CRCW algorithm can be no more than O(lg p) times faster than a best p-processor EREW algorithm for the same problem. Proof: each step of CRCW can be simulated by O(lg p) computations of EREW. –Suppose concurrent write: CRCW p i write data x i to location l i, (l i may be same for multiple p i s). Corresponding EREW p i write (l i, x i ) to a location A[i], (different A[i]s) so exclusive write. Sort all (l i, x i )s by l i s, same locations are brought together. in O(lg p). Each EREW p i compares A[i]= (l j, x j ), and A[i-1]= (l k, x k ). If l j l k or i=0, then EREW p i writes x j to l j. (exclusive write). See figure 30.7.

19 CRCW vs. EREW CRCW: –Some says: easier to program and more faster. –Others say: The hardware to CRCW is slower than EREW. And One can not find maximum in O(1). –Still others say: either EREW or CRCW is wrong. Processors must be connected by a network, and only be able to communicate with other via the network, so network should be part of the model.