Slide 5.1: Pipelined Computations (Chapter 5)

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers, 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.

Outline:
Introduction to Pipelined Computations
Computing Platform for Pipelined Computations
Example Applications:
  Adding numbers
  Sorting numbers
  Prime number generation
  Systems of linear equations

Slide 5.2: Introduction to Pipelined Computations

Chapter 4 discussed partitioning techniques common to a range of problems. We now turn to another parallel programming technique, pipelining, which is also applicable to a wide range of problems.
Pipelining suits problems that are partially sequential in nature, i.e., sequential on the basis of data dependencies; it can therefore be used to parallelize sequential code.
The problem is divided into a series of tasks that must be completed one after the other, each task executed by a separate process or processor.
This form of parallelism can be viewed as functional decomposition: the functions are performed in succession.

Slide 5.3: Pipelined Computations

Each task/function is executed by a separate process or processor, much like the stations of an assembly line in a manufacturing plant.
How big or small should the task in each stage of the pipeline be? What are the trade-offs?

Slide 5.4: Example 1

Add all the elements of array a to an accumulating sum:

for (i = 0; i < n; i++)
    sum = sum + a[i];

The loop can be "unfolded" (formulated as a pipeline) to yield:

sum = sum + a[0];
sum = sum + a[1];
sum = sum + a[2];
sum = sum + a[3];
sum = sum + a[4];
...

Slide 5.5: Pipeline for an Unfolded Loop

The statements are modeled as a chain of producers and consumers, with a separate pipeline stage for each statement.
Each process is viewed as a consumer of data items from the process preceding it and as a producer of data for the process following it.
Each stage accepts the accumulating sum on its input s_in and one element of the array on its input a, and produces the new accumulating sum on its output s_out.
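
The s_in/a/s_out stage maps directly onto message passing. A minimal sketch in MPI C (my rendering, not the book's code; the function name, tag, and the choice of MPI are assumptions), showing one stage acting as consumer and producer for a stream of m data items:

#include <mpi.h>

/* One pipeline stage in the consumer/producer chain: per cycle it
   consumes the accumulating sum from the preceding stage (s_in),
   applies its own operation (here, adding its array element a_i),
   and produces the new sum for the following stage (s_out). */
void stage(int rank, int nprocs, int m, int a_i) {
    for (int k = 0; k < m; k++) {      /* one data item per pipeline cycle */
        int s = 0;
        if (rank > 0)                  /* s_in from stage rank-1 */
            MPI_Recv(&s, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        s += a_i;                      /* this stage's contribution */
        if (rank < nprocs - 1)         /* s_out to stage rank+1 */
            MPI_Send(&s, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
    }
}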

Slide 5.6: Example 2

Frequency filter: the objective is to remove specific frequencies (f0, f1, f2, f3, etc.) from a digitized signal f(t). The signal enters the pipeline from the left, and each stage removes one frequency.

Slide 5.7: Where Pipelining Can Be Used to Good Effect

Assuming the problem can be divided into a series of sequential tasks, the pipelined approach can provide increased execution speed under the following three types of computation:
1. If more than one instance of the complete problem is to be executed
2. If a series of data items must be processed, each requiring multiple operations
3. If information to start the next process can be passed forward before the process has completed all its internal operations

Slide 5.8: "Type 1" Pipeline Space-Time Diagram

Assumption: each process is given the same time to complete its task.
Each time period is one pipeline cycle.

Slide 5.9: "Type 1" Pipeline: Execution Time

With p processes constituting the pipeline and m instances of the problem to execute:
The number of pipeline cycles to execute all instances is m + p - 1.
The average number of cycles is (m + p - 1)/m, which tends to one cycle per problem instance for large m.
One instance of the problem is completed in each pipeline cycle after the first p - 1 cycles (the pipeline latency).
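
A quick worked instance (illustrative numbers, not from the slides): with p = 6 stages and m = 100 problem instances, all instances complete in 100 + 6 - 1 = 105 pipeline cycles, an average of 105/100 = 1.05 cycles per instance. Executing the same 100 instances on one processor would take 100 * 6 = 600 stage-times, so the pipeline approaches the ideal speedup of p = 6 (600/105 ≈ 5.7).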

Slide 5.10: Alternative Space-Time Diagram

Slide 5.11: "Type 2" Pipeline Space-Time Diagram

Slide 5.12: "Type 3" Pipeline Space-Time Diagram

Pipeline processing where information passes to the next stage before the current stage has completed all its internal operations.
Utilized in parallel programs where there is only one instance of the problem to execute.

Slide 5.13: Grouping Pipelines

If the number of stages is larger than the number of processors in any pipeline, a group of stages can be assigned to each processor.

Slide 5.14: Computing Platform for Pipelined Applications

Pipelining on clusters requires an interconnect that provides simultaneous transfers between adjacent processors; most clusters employ a switched interconnection structure that allows such transfers.
The key requirement is the ability to send messages between adjacent processes in the pipeline, which suggests direct communication links.
The ideal interconnection structure is a multiprocessor system with a line (linear array) configuration.

Slide 5.15: Example Pipelined Solutions

(Examples of each type of computation.)

Slide 5.16: Pipeline Program Examples: Adding Numbers

A Type 1 pipeline computation.

Slide 5.17: Pipeline Example: Adding Numbers (cont'd)

Basic code for process P_i:

recv(&accumulation, P_i-1);
accumulation = accumulation + number;
send(&accumulation, P_i+1);

except for the first process, P_0, which only sends:

send(&number, P_1);

and the last process, P_n-1, which only receives and adds:

recv(&accumulation, P_n-2);
accumulation = accumulation + number;

Slide 5.18: SPMD Program

if (process > 0) {
    recv(&accumulation, P_i-1);
    accumulation = accumulation + number;
} else
    accumulation = number;    /* first process starts the running sum */
if (process < n-1)
    send(&accumulation, P_i+1);

The final result is in the last process.
Instead of addition, other arithmetic operations could be done.
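
The SPMD pseudocode maps almost line for line onto MPI. A minimal, self-contained sketch (my rendering, not the book's program: each process's number, the tag, and the final print are illustrative assumptions):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int number = rank + 1;        /* this stage's own value: sums 1..nprocs */
    int accumulation = number;    /* first process starts the running sum */

    if (rank > 0) {               /* receive the running sum from the left */
        MPI_Recv(&accumulation, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        accumulation += number;
    }
    if (rank < nprocs - 1)        /* pass the new sum to the right */
        MPI_Send(&accumulation, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
    else                          /* last process holds the final result */
        printf("sum = %d\n", accumulation);

    MPI_Finalize();
    return 0;
}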

Slide 5.19: Pipelined addition of numbers with a master process and ring configuration

Slide 5.20: Analysis

Analyses in previous chapters assumed simultaneous computation and communication phases among processes. This may not be appropriate for pipelining, because each instance starts at a different time and ends at a different time.
Assumption: each process performs similar actions in each pipeline cycle. We then work out the communication and computation required in one pipeline cycle.
With a p-stage pipeline and m instances of the problem, the total execution time is:

t_total = (pipeline cycle time)(m + p - 1) = t_cycle (m + p - 1)

and the average time per problem instance is:

t_a = t_total / m

Slide 5.21: Analysis: Single Instance Problem

Consider the case where a single number is added in each stage, i.e., n = p.
The period of one pipeline cycle is dictated by the time of one addition and one communication (a receive and a send). Counting one addition as one computational step, each pipeline cycle, t_cycle, requires:

t_cycle = t_comp + t_comm = 1 + 2(t_startup + t_data)

With one group of numbers (m = 1), the total execution takes n pipeline cycles:

t_total = (2(t_startup + t_data) + 1) n

Slide 5.22: Analysis: Multiple Instances Problem

With m groups of numbers to add, each resulting in a separate answer, there are m + n - 1 cycles:

t_total = (2(t_startup + t_data) + 1)(m + n - 1)

For large m, the average execution time per group, t_a, is approximately:

t_a = t_total / m ≈ 2(t_startup + t_data) + 1

that is, one pipeline cycle per group of numbers.

Slide 5.23: Data Partitioning with Multiple Instances Problem

Consider the case where each stage processes a group of d numbers, so the number of processes is p = n/d.
Each communication still transfers one result, but the computation now requires the d local numbers to be accumulated (d - 1 additions) plus the addition of the incoming value, so:

t_total = (2(t_startup + t_data) + d)(m + n/d - 1)

What is the impact of the partition size d on performance? (See the cost sketch below.)
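
The trade-off is easiest to see numerically. A small C sketch (the cost constants and problem size are illustrative assumptions, not measurements) that evaluates the formula for several partition sizes d: larger d amortizes the communication overhead over more additions, at the cost of a longer pipeline cycle.

#include <stdio.h>

int main(void) {
    /* illustrative unit costs and problem size, not measurements */
    double t_startup = 50.0, t_data = 1.0;
    double n = 1024.0;            /* numbers per group */
    double m = 64.0;              /* number of groups */
    for (double d = 1.0; d <= 64.0; d *= 2.0) {
        double t_cycle = 2.0 * (t_startup + t_data) + d;  /* comm + d adds */
        double t_total = t_cycle * (m + n / d - 1.0);     /* (m + n/d - 1) cycles */
        printf("d = %4.0f  t_total = %10.1f\n", d, t_total);
    }
    return 0;
}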

Slide 5.24: Example 2: Sorting Numbers

A pipeline solution for sorting is to have the first process, P_0, accept the series of numbers one at a time, store the largest number received so far, and pass all smaller numbers onward. Each subsequent process performs the same algorithm.
When no more numbers remain to be processed, P_0 holds the largest number, P_1 the next largest, and so on.
The basic algorithm for process P_i, 0 < i < p-1, is:

recv(&number, P_i-1);
if (number > x) {
    send(&x, P_i+1);       /* pass on the previous maximum */
    x = number;            /* keep the larger number */
} else
    send(&number, P_i+1);  /* pass on the smaller number */

Slide 5.25: Sorting Numbers

A parallel version of insertion sort.

Slide 5.26: Sorting Numbers (cont'd)

With n numbers, the i-th process accepts n - i numbers and passes onward n - i - 1 numbers, so a simple loop can be used:

right_procNum = n - i - 1;    /* number of values to pass onward */
recv(&x, P_i-1);              /* first number received is kept */
for (j = 0; j < right_procNum; j++) {
    recv(&number, P_i-1);
    if (number > x) {
        send(&x, P_i+1);
        x = number;
    } else
        send(&number, P_i+1);
}
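
The loop version maps directly onto MPI. A hedged, self-contained sketch (my rendering, not the book's program: rank 0 doubles as the source of an illustrative input series, distinct for up to 23 processes, and tag 0 is assumed):

/* Pipelined insertion sort: run with n processes to sort n numbers.
   Each process keeps the largest number it has seen and passes smaller
   ones to the right, so P0 ends with the largest value, P1 the next, ... */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, n;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &n);   /* one process per number */

    int x, number;                        /* x = largest seen so far */
    if (rank == 0) {
        x = 5;                            /* first of the generated series */
        for (int i = 1; i < n; i++) {
            number = (17 * i + 5) % 23;   /* illustrative unsorted input */
            if (number > x) {             /* keep larger, pass smaller on */
                MPI_Send(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
                x = number;
            } else
                MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        }
    } else {
        MPI_Recv(&x, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);      /* first of n-rank numbers */
        for (int j = 0; j < n - rank - 1; j++) {  /* pass n-rank-1 onward */
            MPI_Recv(&number, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            if (number > x) {
                MPI_Send(&x, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
                x = number;
            } else
                MPI_Send(&number, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
        }
    }
    printf("P%d holds %d\n", rank, x);    /* descending values by rank */
    MPI_Finalize();
    return 0;
}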

Slide 5.27: Pipeline for Sorting Using Insertion Sort

A series of operations is performed on a series of data items.
There is no opportunity to continue useful work after passing the smaller numbers onward.
A Type 2 pipeline computation.

Slide 5.28: Extracting the Sorted Numbers

Results of the sorting algorithm can be extracted from the pipeline using:
the ring configuration of Slide 5.19, or
the bi-directional line configuration shown below.
Advantage of the bi-directional line: a process can pass its result back as soon as it receives its last input number.
More numbers pass through the processes nearer the master.

Slide 5.29: Extracting the Sorted Numbers (cont'd)

Incorporating the results being returned, process i, 0 < i < p-1, could have the form:

right_procNum = n - i - 1;
recv(&x, P_i-1);
for (j = 0; j < right_procNum; j++) {   /* sorting phase */
    recv(&number, P_i-1);
    if (number > x) {
        send(&x, P_i+1);
        x = number;
    } else
        send(&number, P_i+1);
}
send(&x, P_i-1);                        /* return own result leftward */
for (j = 0; j < right_procNum; j++) {   /* relay results from the right */
    recv(&number, P_i+1);
    send(&number, P_i-1);
}

Slide 5.30: Analysis

Regarding the compare-and-exchange operation as one computational step, the sequential time is approximately n^2/2 steps, which makes the algorithm unsuitable except for very small n.
With n pipeline processes and n numbers to sort, the parallel implementation has n + n - 1 = 2n - 1 pipeline cycles. Each cycle has one compare-and-exchange operation, one recv(), and one send(), so each pipeline cycle requires (see the figure on Slide 5.19):

t_cycle = 1 + 2(t_startup + t_data)

The total execution time is:

t_total = (1 + 2(t_startup + t_data))(2n - 1)

Slide 5.31: Example 3: Prime Number Generation

The Sieve of Eratosthenes is a classical way of extracting prime numbers from a series of all integers starting from 2.
The first number, 2, is prime and is kept. All multiples of this number are deleted, as they cannot be prime.
The process is repeated with each remaining number. The algorithm removes non-primes, leaving only primes.

Slide 5.32: Sieve of Eratosthenes: Sequential Code

A sequential program usually employs an array with all elements initialized to true (1); each element whose index is not a prime number is later reset to false (0):

for (i = 2; i <= n; i++)
    prime[i] = 1;             /* initialize array */
for (i = 2; i*i <= n; i++)    /* for each number up to sqrt(n) */
    if (prime[i] == 1)        /* if i is still marked prime */
        for (j = i + i; j <= n; j = j + i)
            prime[j] = 0;     /* strike its multiples */

Slide 5.33: Sieve of Eratosthenes: Sequential Code (cont'd)

There are multiples of 2, multiples of 3, and so on to strike out, so the sequential time is roughly:

t_s ≈ n/2 + n/3 + n/5 + n/7 + ...   (one strike per multiple of each prime up to sqrt(n))

The algorithm can be improved so that striking starts at i^2 rather than 2i for a prime i.
Notice that the early terms in the sum above dominate the overall time: there are more multiples of 2 than of 3, more multiples of 3 than of 5, and so on.

Slide 5.34: Pipelined Sieve of Eratosthenes

A parallel implementation based on partitioning, where each process strikes out the multiples of one number, will not be effective. Why?
A pipeline implementation can be quite effective:
First, a series of consecutive numbers is generated and fed into the first pipeline stage.
This stage extracts all multiples of 2 and passes the remaining numbers to stage 2.
The second stage extracts all multiples of 3 and passes the remaining numbers to stage 3, etc.

Slide 5.35: Sieve of Eratosthenes: Parallel Code

The code for a process, P_i, could be based upon:

recv(&x, P_i-1);
/* repeat the following for each number received: */
recv(&number, P_i-1);
if ((number % x) != 0)
    send(&number, P_i+1);

Each process will not receive the same quantity of numbers, and the quantity is not known beforehand. Use a "terminator" message, sent at the end of the sequence:

recv(&x, P_i-1);
for (i = 0; i < n; i++) {
    recv(&number, P_i-1);
    if (number == terminator) break;
    if ((number % x) != 0)
        send(&number, P_i+1);
}
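
A hedged MPI sketch of the terminator scheme (my rendering, not the book's program: the value 0 serves as the terminator, rank 0 acts as the number generator, and the range n = 50 is illustrative; it needs one filter process per prime up to n, i.e., at least 16 processes here, with any extra stages simply forwarding the terminator):

#include <mpi.h>
#include <stdio.h>

#define TERMINATOR 0   /* sentinel marking the end of the series */

int main(int argc, char *argv[]) {
    int rank, nprocs, n = 50;     /* sieve the range 2..n */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int x = TERMINATOR, number, t = TERMINATOR;
    if (rank == 0) {              /* generator stage feeds the pipeline */
        for (number = 2; number <= n; number++)
            MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Send(&t, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else {
        MPI_Recv(&x, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);   /* first number received is prime */
        if (x != TERMINATOR) {
            printf("P%d: %d is prime\n", rank, x);
            for (;;) {
                MPI_Recv(&number, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                if (number == TERMINATOR)
                    break;             /* end of series reached */
                if (number % x != 0 && rank < nprocs - 1)
                    MPI_Send(&number, 1, MPI_INT, rank + 1, 0,
                             MPI_COMM_WORLD);
            }
        }
        if (rank < nprocs - 1)         /* pass the terminator along */
            MPI_Send(&t, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}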

Slide 5.36: Example 4: Solving a System of Linear Equations

The final example is Type 3, in which a process can continue with useful work after passing on information.
This is demonstrated by solving a system of linear equations of upper-triangular form:

a_{n-1,0} x_0 + a_{n-1,1} x_1 + ... + a_{n-1,n-1} x_{n-1} = b_{n-1}
...
a_{2,0} x_0 + a_{2,1} x_1 + a_{2,2} x_2 = b_2
a_{1,0} x_0 + a_{1,1} x_1 = b_1
a_{0,0} x_0 = b_0

We need to solve for x_0, x_1, ..., x_{n-1}, where the a's and the b's are constants.

Slide 5.37: Back Substitution

First, the unknown x_0 is found from the last equation:

x_0 = b_0 / a_{0,0}

The value obtained for x_0 is substituted into the next equation to obtain x_1:

x_1 = (b_1 - a_{1,0} x_0) / a_{1,1}

The values obtained for x_1 and x_0 are substituted into the next equation to obtain x_2:

x_2 = (b_2 - a_{2,0} x_0 - a_{2,1} x_1) / a_{2,2}

and so on, until all the unknowns are found.
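
A small worked instance (illustrative coefficients, not from the slides) makes the substitution order concrete. Take the three equations 2x_0 = 4, x_0 + 3x_1 = 11, and 2x_0 + x_1 + 2x_2 = 17. Then x_0 = 4/2 = 2, x_1 = (11 - 1*2)/3 = 3, and x_2 = (17 - 2*2 - 1*3)/2 = 5.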

Slide 5.38: Pipeline Solution

This algorithm can be implemented as a pipeline:
The first pipeline stage computes x_0 and passes x_0 on to the second stage.
The second stage computes x_1 from x_0 and passes both x_0 and x_1 on to the third stage.
The third stage computes x_2 from x_0 and x_1, and so on.
A Type 3 pipeline computation.

Slide 5.39: Pipeline Solution (cont'd)

Each pipeline stage can be implemented with one process, giving p = n processes for n equations.
The i-th process (0 < i < p) receives the values x_0, x_1, x_2, ..., x_{i-1} and computes x_i from the equation:

x_i = (b_i - (a_{i,0} x_0 + a_{i,1} x_1 + ... + a_{i,i-1} x_{i-1})) / a_{i,i}

Slide 5.40: Sequential Code

Given the constants a_{i,j} and b_k stored in arrays a[][] and b[], respectively, with the values for the unknowns to be stored in array x[], the sequential code could be:

x[0] = b[0] / a[0][0];           /* computed separately */
for (i = 1; i < n; i++) {        /* for the remaining unknowns */
    sum = 0;
    for (j = 0; j < i; j++)
        sum = sum + a[i][j] * x[j];
    x[i] = (b[i] - sum) / a[i][i];
}

Slide 5.41: Parallel Code

Pseudocode for process P_i (i < p-1) could be:

sum = 0;
for (j = 0; j < i; j++) {
    recv(&x[j], P_i-1);
    send(&x[j], P_i+1);          /* forward x[j] immediately */
    sum = sum + a[i][j] * x[j];
}
x[i] = (b[i] - sum) / a[i][i];
send(&x[i], P_i+1);

Now there are additional computations to do after receiving and resending values.
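
A hedged MPI rendering of this per-process body (my sketch, not the book's code: the array bound MAXN, tag 0, replication of a[][] and b[] on every process, and rank i acting as process P_i are all assumptions). Note how each x[j] is forwarded before the local multiply-add finishes; that overlap is exactly the Type 3 behavior:

#include <mpi.h>

#define MAXN 64   /* illustrative bound on n */

/* Per-process body for pipelined back substitution: process i receives
   x[0..i-1] from the left, forwards each value to the right immediately,
   accumulates its sum term by term, then computes and sends x[i]. */
void back_substitution_stage(int i, int p,
                             double a[MAXN][MAXN], double b[MAXN],
                             double x[MAXN]) {
    double sum = 0.0;
    for (int j = 0; j < i; j++) {
        MPI_Recv(&x[j], 1, MPI_DOUBLE, i - 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        if (i < p - 1)                 /* forward before using the value */
            MPI_Send(&x[j], 1, MPI_DOUBLE, i + 1, 0, MPI_COMM_WORLD);
        sum += a[i][j] * x[j];
    }
    x[i] = (b[i] - sum) / a[i][i];     /* this stage's unknown */
    if (i < p - 1)
        MPI_Send(&x[i], 1, MPI_DOUBLE, i + 1, 0, MPI_COMM_WORLD);
}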

Slide 5.42: Pipeline processing using back substitution

