Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers, 2nd ed., by B. Wilkinson & M. Allen, 2004 Pearson Education Inc. All rights reserved.

Presentation transcript:

5.1 Pipelined Computations (Chapter 5)
- Introduction to Pipelined Computations
- Computing Platform for Pipelined Computations
- Example Applications:
  - Adding numbers
  - Sorting numbers
  - Prime number generation
  - Systems of linear equations

5.2 Introduction to Pipelined Computations
- Chapter 4 discussed partitioning techniques common to a range of problems.
- We now discuss another parallel programming technique, pipelining, that is applicable to a wide range of problems.
- Pipelining suits problems that are partially sequential in nature (sequential on the basis of data dependencies, etc.) and can therefore be used to parallelize sequential code.
- The problem is divided into a series of tasks that have to be completed one after the other.
- Each task is executed by a separate process or processor.
- This form of parallelism can be viewed as functional decomposition: the functions are performed in succession.

5.3 Pipelined Computations
- Each task/function is executed by a separate process or processor.
- Imagine what happens in a manufacturing plant.
- How big or small should the task in each stage of the pipeline be? What are the trade-offs?

5.4 Example 1
Add all the elements of array a to an accumulating sum:

    for (i = 0; i < n; i++)
        sum = sum + a[i];

The loop could be "unfolded" (formulated as a pipeline) to yield:

    sum = sum + a[0];
    sum = sum + a[1];
    sum = sum + a[2];
    sum = sum + a[3];
    sum = sum + a[4];
    ...

5.5 Pipeline for an Unfolded Loop
- The statements are modeled as a chain of producers and consumers, with a separate pipeline stage for each statement.
- Each process is viewed as a consumer of the data items produced by the process preceding it, and as a producer of data for the process following it.
- Each stage accepts the accumulating sum on its input s_in and one element of the array on its input a, and produces the new accumulating sum on its output s_out.

5.6 Example 2
- Frequency filter: the objective is to remove specific frequencies (f0, f1, f2, f3, etc.) from a digitized signal, f(t).
- The signal enters the pipeline from the left.

5.7 Where Pipelining Can Be Used to Good Effect
Assuming the problem can be divided into a series of sequential tasks, a pipelined approach can provide increased execution speed for the following three types of computations:
1. If more than one instance of the complete problem is to be executed.
2. If a series of data items must be processed, each requiring multiple operations.
3. If information to start the next process can be passed forward before the process has completed all its internal operations.

5.8 "Type 1" Pipeline Space-Time Diagram
- Assumption: each process is given the same time to complete its task.
- Each time period is one pipeline cycle.

5.9 "Type 1" Pipeline: Execution Time
With p processes constituting the pipeline and m instances of the problem to execute:
- The number of pipeline cycles to execute all instances is m + p - 1.
- The average number of cycles is (m + p - 1)/m, which tends to one cycle per problem instance for large m.
- One instance of the problem is completed in each pipeline cycle after the first p - 1 cycles (the pipeline latency).
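As a quick worked example (the numbers are illustrative, ours rather than the slides'): with p = 6 stages and m = 100 instances,

    number of cycles = m + p - 1 = 100 + 6 - 1 = 105
    average cycles/instance = (m + p - 1)/m = 105/100 = 1.05

so after the initial 5-cycle latency, one finished instance emerges every cycle.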

5.10 Alternative space-time diagram

5.11 "Type 2" Pipeline Space-Time Diagram

5.12 "Type 3" Pipeline Space-Time Diagram
- Pipeline processing where information passes to the next stage before the present stage has completed.
- Utilized in parallel programs where there is only one instance of the problem to execute.

5.13 Grouping Pipelines
If the number of stages is larger than the number of processors in any pipeline, a group of stages can be assigned to each processor:

5.14 Computing Platform for Pipelined Applications
- Pipelining on clusters requires an interconnect that provides simultaneous transfers between adjacent processors.
- Most clusters employ a switched interconnection structure that allows such transfers.
- Key requirement: the ability to send messages between adjacent processes in the pipeline, which suggests direct communication links.
- The ideal interconnection structure is a multiprocessor system with a line configuration:

5.15 Example Pipelined Solutions
(Examples of each type of computation)

5.16 Pipeline Program Examples: Adding Numbers
A Type 1 pipeline computation.

5.17 Pipeline Example: Adding Numbers (cont'd)
Basic code for process P_i:

    recv(&accumulation, P_{i-1});
    accumulation = accumulation + number;
    send(&accumulation, P_{i+1});

except for the first process, P_0, which is

    send(&number, P_1);

and the last process, P_{n-1}, which is

    recv(&number, P_{n-2});
    accumulation = accumulation + number;

5.18 SPMD Program

    if (process > 0) {
        recv(&accumulation, P_{i-1});
        accumulation = accumulation + number;
    }
    if (process < n-1) send(&accumulation, P_{i+1});

The final result is in the last process.
Instead of addition, other arithmetic operations could be done.
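As a concrete illustration, here is a minimal MPI sketch of this SPMD addition pipeline. It is a sketch under assumptions, not code from the slides: each rank holds one illustrative number, and MPI_COMM_WORLD ranks serve as the pipeline positions.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        int number = rank + 1;   /* illustrative local value; sum = nprocs(nprocs+1)/2 */
        int accumulation = 0;

        if (rank > 0)            /* receive the running sum from the left neighbor */
            MPI_Recv(&accumulation, 1, MPI_INT, rank - 1, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        accumulation = accumulation + number;
        if (rank < nprocs - 1)   /* pass the new running sum to the right neighbor */
            MPI_Send(&accumulation, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
        else
            printf("sum = %d\n", accumulation);  /* final result is in the last process */

        MPI_Finalize();
        return 0;
    }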

5.19 Pipelined addition of numbers with a master process and ring configuration

5.20 Analysis
- Analyses in previous chapters assumed simultaneous computation and communication phases among processes.
- That may not be appropriate for pipelining, because each instance starts at a different time and ends at a different time.
- Assumption: each process performs similar actions in each pipeline cycle.
- We then work out the communication and computation required in each pipeline cycle.
- With a p-stage pipeline and m instances of the problem, the total execution time is

      t_total = (m + p - 1) t_cycle

  and the average time for a computation is

      t_a = t_total / m

5.21 Analysis: Single-Instance Problem
- Consider the case where a single number is added in each stage, i.e., n = p.
- The period of one pipeline cycle is dictated by the time of one addition and one communication. Each pipeline cycle, t_cycle, requires:

      t_cycle = t_comp + t_comm = 1 + 2(t_startup + t_data)

- With one group of numbers (m = 1), execution takes n pipeline cycles, so

      t_total = (2(t_startup + t_data) + 1) n

5.22 Analysis: Multiple-Instances Problem
- With m groups of numbers to add, each resulting in a separate answer, there are m + n - 1 cycles:

      t_total = (2(t_startup + t_data) + 1)(m + n - 1)

- For large m, the average execution time per group, t_a, is approximately:

      t_a = t_total / m ≈ 2(t_startup + t_data) + 1

5.23 Data Partitioning with Multiple Instances
- Consider the case when each stage processes a group of d numbers.
- The number of processes is p = n/d.
- Each communication still transfers one result, but the computation now requires the d numbers held in the stage to be accumulated (d - 1 steps) plus the addition of the incoming number, so:

      t_total = (2(t_startup + t_data) + d)(m + n/d - 1)

- What is the impact of the group size d of the data partitioning on performance?
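To see the trade-off concretely, take illustrative values (ours, not the slides'): n = 1024 numbers, m = 100 groups, t_startup = 10, t_data = 1, so each cycle costs 22 + d. Then:

    t_total(d = 1)  = (22 + 1)(100 + 1024 - 1) = 23 x 1123 = 25829
    t_total(d = 16) = (22 + 16)(100 + 64 - 1)  = 38 x 163  = 6194
    t_total(d = 64) = (22 + 64)(100 + 16 - 1)  = 86 x 115  = 9890

A small d lets the per-cycle communication overhead dominate, a large d lengthens every cycle, and an intermediate group size minimizes the total.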

5.24 Example 2: Sorting Numbers
- A pipeline solution for sorting is to have the first process, P_0, accept the series of numbers one at a time, store the largest received so far, and pass all smaller numbers onward.
- Each subsequent process performs the same algorithm.
- When no more numbers are to be processed, P_0 holds the largest number, P_1 the next largest, and so on.
- The basic algorithm for process P_i, 0 < i < p-1, is:

    recv(&number, P_{i-1});
    if (number > x) {
        send(&x, P_{i+1});
        x = number;
    } else send(&number, P_{i+1});

5.25 Sorting Numbers
A parallel version of insertion sort.

5.26 Sorting Numbers (cont'd)
With n numbers, the i-th process will accept n - i numbers and pass onward n - i - 1 numbers. Hence, a simple loop can be used:

    right_procNum = n - i - 1;       /* number of items to pass onward */
    recv(&x, P_{i-1});
    for (j = 0; j < right_procNum; j++) {
        recv(&number, P_{i-1});
        if (number > x) {
            send(&x, P_{i+1});
            x = number;
        } else send(&number, P_{i+1});
    }
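The loop above can be turned into a runnable MPI sketch, shown below. Everything beyond the slides' pseudocode is an assumption: rank 0 plays P_0 and already holds an illustrative 8-number series, so the program must be run with exactly 8 processes (e.g., mpirun -np 8).

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, n;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &n);        /* n processes sort n numbers */

        int series[8] = {4, 2, 7, 8, 5, 1, 3, 6}; /* illustrative input; n must be 8 */
        int x, number, j;
        if (n != 8) MPI_Abort(MPI_COMM_WORLD, 1);

        if (rank == 0) {                          /* P0: keep largest, pass the rest */
            x = series[0];
            for (j = 1; j < n; j++) {
                number = series[j];
                if (number > x) { int t = x; x = number; number = t; }
                MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            }
        } else {
            MPI_Recv(&x, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            for (j = 0; j < n - rank - 1; j++) {  /* pass onward n - i - 1 numbers */
                MPI_Recv(&number, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                if (number > x) { int t = x; x = number; number = t; }
                MPI_Send(&number, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
            }
        }
        printf("process %d holds %d\n", rank, x); /* P0 largest, P1 next, ... */
        MPI_Finalize();
        return 0;
    }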

5.27 Pipeline for Sorting Using Insertion Sort
- A series of operations is performed on a series of data items.
- There is no opportunity to continue useful work after passing the smaller numbers onward.
- A Type 2 pipeline computation.

5.28 Extracting the Sorted Numbers
Results of the sorting algorithm can be extracted from the pipeline using:
- the ring configuration of Slide 5.19, or
- the bidirectional line configuration shown below.
Advantage of the bidirectional line: a process can pass back its result as soon as it receives its last input number. More numbers pass through the processes nearer the master.

5.29 Extracting the Sorted Numbers (cont'd)
Incorporating the results being returned, process i, 0 < i < p-1, could have the form:

    right_procNum = n - i - 1;
    recv(&x, P_{i-1});
    for (j = 0; j < right_procNum; j++) {
        recv(&number, P_{i-1});
        if (number > x) {
            send(&x, P_{i+1});
            x = number;
        } else send(&number, P_{i+1});
    }
    send(&x, P_{i-1});                     /* return own result */
    for (j = 0; j < right_procNum; j++) {  /* relay results from the right */
        recv(&number, P_{i+1});
        send(&number, P_{i-1});
    }

5.30 Analysis
- Regarding the compare-and-exchange operation as one computational step, the sequential time is approximately n^2/2 steps, which is unsuitable except for very small n.
- With n pipeline processes and n numbers to sort, the parallel implementation takes n + n - 1 = 2n - 1 pipeline cycles.
- Each cycle has one compare-and-exchange operation and one send(), so each pipeline cycle requires (see the figure on Slide 5.19):

      t_cycle = t_comp + t_comm = 1 + 2(t_startup + t_data)

- The total execution time is:

      t_total = (1 + 2(t_startup + t_data))(2n - 1)

5.31 Example 3: Prime Number Generation
- The Sieve of Eratosthenes is a classical way of extracting prime numbers from a series of all integers starting from 2.
- The first number, 2, is prime and kept. All multiples of this number are deleted, as they cannot be prime.
- The process is repeated with each remaining number.
- The algorithm removes nonprimes, leaving only primes.

5.32 Sieve of Eratosthenes: Sequential Code
A sequential program usually employs an array with all elements initialized to true (1); each element whose index is not a prime number is later reset to false (0):

    for (i = 2; i <= n; i++)
        prime[i] = 1;                 /* initialize array */
    for (i = 2; i <= sqrt_n; i++)     /* for each candidate prime */
        if (prime[i] == 1)
            for (j = i + i; j <= n; j = j + i)
                prime[j] = 0;         /* strike its multiples */
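Filled out into a self-contained, runnable program (a minimal sketch; the limit N and the output loop are our additions):

    #include <stdio.h>
    #include <math.h>

    #define N 50

    int main(void) {                     /* compile with -lm */
        int prime[N + 1], i, j;
        int sqrt_n = (int)sqrt((double)N);

        for (i = 2; i <= N; i++)
            prime[i] = 1;                /* initialize array */
        for (i = 2; i <= sqrt_n; i++)    /* for each candidate prime */
            if (prime[i] == 1)
                for (j = i + i; j <= N; j = j + i)
                    prime[j] = 0;        /* strike its multiples */

        for (i = 2; i <= N; i++)
            if (prime[i] == 1)
                printf("%d ", i);        /* prints 2 3 5 7 11 ... 47 */
        printf("\n");
        return 0;
    }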

5.33 Sieve of Eratosthenes: Sequential Code (cont'd)
- There are n/2 multiples of 2, n/3 multiples of 3, and so on. Hence the total striking time is proportional to:

      n/2 + n/3 + n/5 + n/7 + ...

- The algorithm can be improved so that striking starts at i^2 rather than 2i, for a prime i.
- Notice that the early terms in the above sum dominate the overall time: there are more multiples of 2 than of 3, more multiples of 3 than of 5, etc.

5.34 Pipelined Sieve of Eratosthenes
- A parallel implementation based on partitioning, where each process strikes out the multiples of one number, will not be effective. Why?
- A pipeline implementation can be quite effective:
  - First, a series of consecutive numbers is generated and fed into the first pipeline stage.
  - This stage extracts all multiples of 2 and passes the other numbers to stage 2.
  - The second stage extracts all multiples of 3 and passes the other numbers to stage 3, etc.

5.35 Sieve of Eratosthenes: Parallel Code
The code for a process, P_i, could be based upon:

    recv(&x, P_{i-1});
    /* repeat following for each number */
    recv(&number, P_{i-1});
    if ((number % x) != 0) send(&number, P_{i+1});

Each process will not receive the same amount of numbers, and the amount is not known beforehand. Use a "terminator" message, which is sent at the end of the sequence:

    recv(&x, P_{i-1});
    for (i = 0; i < n; i++) {
        recv(&number, P_{i-1});
        if (number == terminator) break;
        if ((number % x) != 0) send(&number, P_{i+1});
    }
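Below is a minimal MPI sketch of this pipeline. The names N and TERM and the choice of rank 0 as the number generator are our assumptions; note also that with a fixed number of stages, numbers surviving past the last stage are simply dropped, so this illustrates the pattern rather than computing a complete sieve.

    #include <mpi.h>
    #include <stdio.h>

    #define N    50
    #define TERM -1               /* terminator message */

    int main(int argc, char *argv[]) {
        int rank, nprocs, x, number;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        if (nprocs < 2) MPI_Abort(MPI_COMM_WORLD, 1);

        if (rank == 0) {          /* generator: feed 2..N, then the terminator */
            for (number = 2; number <= N; number++)
                MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            number = TERM;
            MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else {
            MPI_Recv(&x, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            if (x != TERM) {
                printf("process %d holds prime %d\n", rank, x);
                for (;;) {        /* filter multiples of x, pass the rest onward */
                    MPI_Recv(&number, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                    if (number == TERM) break;
                    if ((number % x) != 0 && rank < nprocs - 1)
                        MPI_Send(&number, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
                }
            }
            if (rank < nprocs - 1) {   /* forward the terminator */
                number = TERM;
                MPI_Send(&number, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
            }
        }
        MPI_Finalize();
        return 0;
    }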

5.36 Example 4: Solving a System of Linear Equations
- The final example is of Type 3, where a process can continue with useful work after passing on information.
- It is demonstrated by solving a system of linear equations of (so-called) upper-triangular form:

      a_{n-1,0} x_0 + a_{n-1,1} x_1 + ... + a_{n-1,n-1} x_{n-1} = b_{n-1}
          ...
      a_{2,0} x_0 + a_{2,1} x_1 + a_{2,2} x_2 = b_2
      a_{1,0} x_0 + a_{1,1} x_1 = b_1
      a_{0,0} x_0 = b_0

- We need to solve for x_0, x_1, ..., x_{n-1}, where the a's and the b's are constants.

5.37 Back Substitution
- First, the unknown x_0 is found from the last equation:

      x_0 = b_0 / a_{0,0}

- The value obtained for x_0 is substituted into the next equation to obtain x_1:

      x_1 = (b_1 - a_{1,0} x_0) / a_{1,1}

- The values obtained for x_0 and x_1 are substituted into the next equation to obtain x_2:

      x_2 = (b_2 - a_{2,0} x_0 - a_{2,1} x_1) / a_{2,2}

- and so on, until all the unknowns are found.

5.38 Pipeline Solution
This algorithm can be implemented as a pipeline:
- The first pipeline stage computes x_0 and passes x_0 on to the second stage.
- The second stage computes x_1 from x_0 and passes both x_0 and x_1 on to the third stage.
- The third stage computes x_2 from x_0 and x_1, and so on.
A Type 3 pipeline computation.

5.39 Pipeline Solution (cont'd)
- Each pipeline stage can be implemented with one process; there are p = n processes for n equations.
- The i-th process (0 < i < p) receives the values x_0, x_1, x_2, ..., x_{i-1} and computes x_i from the equation:

      x_i = (b_i - sum_{j=0..i-1} a_{i,j} x_j) / a_{i,i}

5.40 Sequential Code
Given the constants a_{i,j} and b_k stored in arrays a[][] and b[], respectively, and the values for the unknowns to be stored in an array x[], the sequential code could be:

    x[0] = b[0]/a[0][0];          /* computed separately */
    for (i = 1; i < n; i++) {     /* for remaining unknowns */
        sum = 0;
        for (j = 0; j < i; j++)
            sum = sum + a[i][j]*x[j];
        x[i] = (b[i] - sum)/a[i][i];
    }
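Filled out into a runnable program with an illustrative 3-equation system (the sample coefficients are ours):

    #include <stdio.h>

    int main(void) {
        /* 2*x0 = 4;  x0 + 3*x1 = 11;  x0 + 2*x1 + 4*x2 = 20 */
        double a[3][3] = {{2, 0, 0}, {1, 3, 0}, {1, 2, 4}};
        double b[3]    = {4, 11, 20};
        double x[3], sum;
        int i, j, n = 3;

        x[0] = b[0] / a[0][0];             /* computed separately */
        for (i = 1; i < n; i++) {          /* for remaining unknowns */
            sum = 0;
            for (j = 0; j < i; j++)
                sum = sum + a[i][j] * x[j];
            x[i] = (b[i] - sum) / a[i][i];
        }
        for (i = 0; i < n; i++)
            printf("x%d = %g\n", i, x[i]); /* x0 = 2, x1 = 3, x2 = 3 */
        return 0;
    }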

5.41 Parallel Code
Pseudocode for process P_i (i < p-1) could be:

    sum = 0;
    for (j = 0; j < i; j++) {
        recv(&x[j], P_{i-1});
        send(&x[j], P_{i+1});
        sum = sum + a[i][j]*x[j];
    }
    x[i] = (b[i] - sum)/a[i][i];
    send(&x[i], P_{i+1});

Now we have additional computations to do after receiving and resending values.
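A minimal MPI rendering of this Type 3 pipeline, one process per unknown; the 4-equation sample system is our assumption (run with mpirun -np 4):

    #include <mpi.h>
    #include <stdio.h>

    #define N 4

    int main(int argc, char *argv[]) {
        double a[N][N] = {{2,0,0,0}, {1,3,0,0}, {1,2,4,0}, {1,1,1,5}};
        double b[N]    = {4, 11, 20, 15};
        double x[N], sum = 0.0;
        int i, j, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &i);   /* rank i computes unknown x_i */
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        if (nprocs != N) MPI_Abort(MPI_COMM_WORLD, 1);

        for (j = 0; j < i; j++) {            /* receive x_0..x_{i-1}; forward each
                                                value immediately, then use it */
            MPI_Recv(&x[j], 1, MPI_DOUBLE, i - 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            if (i < N - 1)
                MPI_Send(&x[j], 1, MPI_DOUBLE, i + 1, 0, MPI_COMM_WORLD);
            sum = sum + a[i][j] * x[j];
        }
        x[i] = (b[i] - sum) / a[i][i];
        if (i < N - 1)
            MPI_Send(&x[i], 1, MPI_DOUBLE, i + 1, 0, MPI_COMM_WORLD);
        printf("x%d = %g\n", i, x[i]);       /* x0=2, x1=3, x2=3, x3=1.4 */
        MPI_Finalize();
        return 0;
    }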

5.42 Pipeline processing using back substitution