Pipelined Computations


1 Pipelined Computations
– Divide a problem into a series of tasks.
– Each processor completes its task sequentially and pipes the results to the next processor.
– Pipelining illustrates functional decomposition: the application is partitioned into different functions.
[Figure: Example of summing groups of numbers. A zero enters a pipeline of processors P0 to P5. Processor Pk adds its group sum ∑A[i_k], and the total leaves P5.]
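The summing example can be sketched as follows. This is a minimal version that assumes Python threads stand in for processors and `queue.Queue` objects stand in for the links between them. The names `stage` and `pipeline_sum` are hypothetical.

```python
import queue
import threading
from typing import List, Sequence


def stage(group: Sequence[int], inbox: queue.Queue, outbox: queue.Queue) -> None:
    """One pipeline stage: receive the running total, add this stage's group sum, pass it on."""
    running_total = inbox.get()
    outbox.put(running_total + sum(group))


def pipeline_sum(groups: List[Sequence[int]]) -> int:
    # links[k] carries the running total into stage k; links[-1] receives the final total.
    links = [queue.Queue() for _ in range(len(groups) + 1)]
    workers = [
        threading.Thread(target=stage, args=(g, links[k], links[k + 1]))
        for k, g in enumerate(groups)
    ]
    for w in workers:
        w.start()
    links[0].put(0)  # the "zero" that enters the first stage
    total = links[-1].get()
    for w in workers:
        w.join()
    return total


if __name__ == "__main__":
    print(pipeline_sum([[1, 2], [3], [4, 5, 6]]))  # 21
```

A single sum gains nothing from the pipeline. The arrangement only pays off when many running totals stream through the stages one after another.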

2 Where is Pipelining Applicable?
Type 1
– More than one instance of a problem
– Example: multiple simulations with different parameter settings
Type 2
– A series of data items, each needing multiple operations
– Example: a signal filter or the Sieve of Eratosthenes
Type 3
– Partial results are passed on while processing continues
– Example: solving sets of linear equations
Considerations
– Is there a series of sequential tasks?
– Is the processing time of each task approximately equal?
– Can items be grouped to minimize communication cost?
– If the stages outnumber the processors:
  o Group stages
  o Wrap the last stage back to the first
– What happens to the final result?
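When there are more stages than processors, the two options listed can be sketched as stage-to-processor mappings. Both functions are hypothetical illustrations:

- `group_stages` gives each processor a contiguous block of stages.
- `wrap_stages` deals the stages out round-robin, so data leaving the last processor wraps back to the first.

```python
from typing import List


def group_stages(num_stages: int, p: int) -> List[List[int]]:
    """Give each processor a contiguous block of stages; block sizes differ by at most one."""
    base, extra = divmod(num_stages, p)
    groups, start = [], 0
    for rank in range(p):
        size = base + (1 if rank < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def wrap_stages(num_stages: int, p: int) -> List[List[int]]:
    """Run stage s on processor s % p, so the last processor feeds back into the first."""
    return [list(range(rank, num_stages, p)) for rank in range(p)]


if __name__ == "__main__":
    print(group_stages(10, 4))  # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
    print(wrap_stages(10, 4))   # [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
```

Grouping keeps communication between neighbouring stages inside one processor. Wrapping keeps every processor busy with roughly the same number of stages.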

3 Type 1: Multiple Instances
Assumptions:
– m instances pass through p stages, one stage per processor.
– t_m is the sequential time to process one instance, so each stage takes t_m/p.
– Each pipeline cycle sends one message of n data items.
Sequential execution: $t_1 = m\,t_m$
Parallel processing: $(m + p - 1)\,t_m/p$
Parallel communication: $(m + p - 1)(t_{start} + n\,t_{data})$
Speed-up: $S = \dfrac{m\,t_m}{(m + p - 1)\left(t_m/p + t_{start} + n\,t_{data}\right)}$
[Figure: Space-time diagram. Instances 0 to 4 each pass through processors P0 to P5, and each instance enters one time step after the previous one.]
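The timing model can be evaluated numerically with the sketch below. It assumes:
– each of the p stages does t_m/p of the work;
– every one of the m + p - 1 pipeline cycles also sends one message costing t_start + n·t_data.

The function name `type1_times` is hypothetical.

```python
from typing import Tuple


def type1_times(m: int, p: int, t_m: float, n: int,
                t_start: float, t_data: float) -> Tuple[float, float, float]:
    """Return (sequential time, parallel time, speed-up) for m instances on a p-stage pipeline.

    Assumes each stage does t_m / p of the work and each pipeline cycle sends one
    message of n data items costing t_start + n * t_data.
    """
    cycles = m + p - 1                      # p - 1 cycles to fill the pipeline, then one per instance
    t_seq = m * t_m
    t_comp = cycles * t_m / p
    t_comm = cycles * (t_start + n * t_data)
    t_par = t_comp + t_comm
    return t_seq, t_par, t_seq / t_par


if __name__ == "__main__":
    # With free communication the speed-up approaches p = 6 as m grows.
    for m in (10, 100, 10_000):
        print(m, round(type1_times(m, p=6, t_m=6.0, n=1, t_start=0.0, t_data=0.0)[2], 3))
```

The fill term p - 1 is what keeps the speed-up below p when only a few instances are processed.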

4 Type 2: Multiple Data Elements
Example: Signal filter. Each process removes one or more frequencies from a digitized signal.
[Figure: The unfiltered signal, data items d0 to d9, enters P0. Processors P0 to P5 apply filters f0 to f5 in turn, and the filtered signal leaves P5.]
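A minimal sketch of the filter chain appears below. It assumes each data item is a hypothetical "frame" of frequency components, written as `{frequency: amplitude}`, and that stage k removes a given set of frequencies. Python generators model the stage-to-stage data flow: each item moves down the chain as soon as it is produced. However, everything runs in one thread, so the sketch shows the structure, not real parallelism. The names `filter_stage` and `build_filter_pipeline` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Set

Frame = Dict[int, float]  # hypothetical data item: {frequency in Hz: amplitude}


def filter_stage(frames: Iterable[Frame], remove: Set[int]) -> Iterator[Frame]:
    """One pipeline stage: drop this stage's frequencies from every frame that passes through."""
    for frame in frames:
        yield {freq: amp for freq, amp in frame.items() if freq not in remove}


def build_filter_pipeline(signal: Iterable[Frame], stage_sets: List[Set[int]]) -> Iterator[Frame]:
    """Chain one filter_stage per processor; stage k removes the frequencies in stage_sets[k]."""
    stream: Iterator[Frame] = iter(signal)
    for remove in stage_sets:
        stream = filter_stage(stream, remove)
    return stream


if __name__ == "__main__":
    signal = [{50: 1.0, 60: 0.4, 120: 0.1}, {50: 0.9, 60: 0.5, 180: 0.2}]
    stages = [{60}, {120}, {180}]
    for frame in build_filter_pipeline(signal, stages):
        print(frame)  # {50: 1.0} then {50: 0.9}
```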

5 Type 3: Partial Processing
– The next stage receives information to continue processing.
– Additional processing continues at the source processor.
– Question: How do we determine speed-up?
[Figure: Space-time diagrams for processors P0 to P5, marking idle and executing periods. One panel shows the linear-equations pipeline; the other shows a more balanced load.]

6 Operation at Each Processor (Types 1 and 2)
Processor with rank r = 0
– Generate the instance (type 1) or the data (type 2) to process
– Process appropriately
– Send a message to the processor with rank 1
Processors with rank r = 1, 2, …, p-2
– Receive a message from the processor with rank r-1
– Process appropriately
– Send a message to the processor with rank r+1
Processor with rank r = p-1
– Receive a message from the processor with rank r-1
– Process appropriately
– Output the final results
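The three rank roles can be sketched with one thread per rank and one `queue.Queue` inbox per rank. This is a simulation under stated assumptions, not MPI code:
– `run_pipeline`, `process(r, item)` and the `STOP` termination message are hypothetical.
– Data items are assumed never to be `None`.

```python
import queue
import threading
from typing import Callable, List

STOP = None  # hypothetical termination message; data items must never be None


def run_pipeline(p: int, items: List[int], process: Callable[[int, int], int]) -> List[int]:
    """Simulate ranks 0..p-1 as threads; rank r applies process(r, item) and passes the result on."""
    links = [queue.Queue() for _ in range(p)]  # links[r] is the inbox of rank r
    results: List[int] = []

    def rank_main(r: int) -> None:
        # Rank 0 generates the data; every other rank receives from rank r-1 until STOP arrives.
        incoming = iter(items) if r == 0 else iter(links[r].get, STOP)
        for item in incoming:
            out = process(r, item)              # "process appropriately"
            if r < p - 1:
                links[r + 1].put(out)           # send to rank r+1
            else:
                results.append(out)             # rank p-1 outputs the final results
        if r < p - 1:
            links[r + 1].put(STOP)              # forward the termination message

    threads = [threading.Thread(target=rank_main, args=(r,)) for r in range(p)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(4, [1, 2, 3], lambda r, x: x + 1))  # [5, 6, 7]
```

In an MPI program, each queue operation would become a send to, or receive from, the neighbouring rank.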

7 Parallel Pipeline Sort
[Figure: Step-by-step trace of the numbers 4, 3, 1, 2, 5 being fed into P0 and moving through P0 to P4 over steps 1 to 10, with the largest value kept in P0.]
Pseudo code (each processor; the first number received becomes x_max):
  For each number x_i received:
    IF x_i < x_max
      Send x_i
    ELSE
      Send x_max
      x_max = x_i
Note: Processors can hold blocks of numbers for better efficiency.
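The compare-and-pass rule can be checked with the sequential emulation below. Each number is pushed all the way down the line before the next one enters, so several numbers are never in flight at once. Because every processor still sees the same values in the same order, the final contents match the pipelined run. The name `pipeline_sort` is hypothetical.

```python
from typing import List, Optional


def pipeline_sort(numbers: List[int]) -> List[int]:
    """Sequential emulation of the pipeline sort with one x_max slot per processor."""
    held: List[Optional[int]] = [None] * len(numbers)  # held[r] is x_max on processor r
    for x in numbers:                  # numbers enter P0 one at a time
        for r in range(len(held)):
            if held[r] is None:        # the first number received becomes x_max
                held[r] = x
                break
            if not x < held[r]:        # ELSE branch: send the old x_max onward, keep x
                held[r], x = x, held[r]
            # IF branch (x < x_max): x is sent on unchanged to processor r+1
    return [v for v in held if v is not None]  # descending order, largest in P0


if __name__ == "__main__":
    print(pipeline_sort([4, 3, 1, 2, 5]))  # [5, 4, 3, 2, 1]
```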

8 Bi-Directional Pipeline
– Use the pipeline to return results to the master
– Useful for line topologies
[Figure: Processors P0 to P5 in a line, showing a sorting phase followed over time by a gather phase for P0 to P4.]
Shown for n = 5: the sort phase requires 2n-1 cycles, and the results phase requires another n cycles.

9 Prime Number Generation
Sieve of Eratosthenes (Type 2 pipeline)
Concept
– Each processor filters blocks of non-primes from the flow of data
– The "potential" prime numbers pass through to the next processor
Pseudo-code
– The master processor generates an array of n odd numbers
– In a loop, after receiving a group of numbers, each processor filters the group and passes the unfiltered numbers down the pipeline
– Gather all of the primes
Notes
– Wrapping the pipeline in a ring could help maintain load balance
– A termination message determines when the pipeline empties
Question: What range of numbers should each processor get?
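The sketch below shows a common per-number variant of the pipelined sieve, not the block-per-processor scheme described on the slide:
– Each stage is a generator that owns one prime and drops its multiples.
– The first number to survive all existing stages is prime, and it becomes the next stage.
– Exhausting the input stream plays the role of the termination message.

Stages are created only for primes q with q² ≤ n, because any composite number up to n has a prime factor no larger than √n. The names `drop_multiples` and `pipeline_primes` are hypothetical.

```python
from typing import Iterator, List


def drop_multiples(stream: Iterator[int], prime: int) -> Iterator[int]:
    """One pipeline stage: filter out multiples of this stage's prime, pass the rest on."""
    for candidate in stream:
        if candidate % prime != 0:
            yield candidate


def pipeline_primes(n: int) -> List[int]:
    """Return the primes <= n using a chain of filter stages."""
    if n < 2:
        return []
    primes = [2]
    stream: Iterator[int] = iter(range(3, n + 1, 2))  # the master generates the odd numbers
    while True:
        head = next(stream, None)      # first number to survive every existing stage
        if head is None:               # stream exhausted: acts as the termination message
            return primes
        primes.append(head)            # it is prime
        if head * head <= n:           # only primes up to sqrt(n) need their own stage
            stream = drop_multiples(stream, head)


if __name__ == "__main__":
    print(pipeline_primes(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```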

10 Solving Sets of Linear Equations
Upper triangular form:
$$\begin{aligned}
a_{n-1,0}x_0 + a_{n-1,1}x_1 + \cdots + a_{n-1,n-1}x_{n-1} &= b_{n-1}\\
a_{n-2,0}x_0 + a_{n-2,1}x_1 + \cdots + a_{n-2,n-2}x_{n-2} &= b_{n-2}\\
\vdots\\
a_{1,0}x_0 + a_{1,1}x_1 &= b_1\\
a_{0,0}x_0 &= b_0
\end{aligned}$$
Back substitution:
$$x_0 = \frac{b_0}{a_{0,0}},\qquad x_1 = \frac{b_1 - a_{1,0}x_0}{a_{1,1}},\qquad x_2 = \frac{b_2 - a_{2,0}x_0 - a_{2,1}x_1}{a_{2,2}}$$
General solution for x_i:
$$x_i = \frac{b_i - \sum_{j=0}^{i-1} a_{i,j}\,x_j}{a_{i,i}}$$
Sequential code:
  x[0] = b[0] / a[0][0];
  for (i = 1; i < n; i++) {
      sum = 0;
      for (j = 0; j < i; j++)
          sum += a[i][j] * x[j];
      x[i] = (b[i] - sum) / a[i][i];
  }
This is a type 3 pipeline example.
Note: a_{i,j} and b_i are constants.
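The substitution formula for x_i can be run directly with the short Python version below. It assumes `a` is a square list of rows with `a[i][i] != 0`; only the entries with j ≤ i are read. The name `substitute` is hypothetical.

```python
from typing import List, Sequence


def substitute(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a_{i,0} x_0 + ... + a_{i,i} x_i = b_i for i = 0 .. n-1, in order."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        total = sum(a[i][j] * x[j] for j in range(i))  # uses only x_0 .. x_{i-1}
        x[i] = (b[i] - total) / a[i][i]
    return x


if __name__ == "__main__":
    a = [[2, 0, 0],
         [1, 1, 0],
         [1, 2, 4]]
    print(substitute(a, [2, 3, 17]))  # [1.0, 2.0, 3.0]
```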

11 Pipeline Solution (processor P_i computes x_i)
  sum = 0
  FOR j = 0 TO i-1
    IF P_i ≠ master, receive x_j from the previous processor
    IF P_i ≠ P_{p-1}, send x_j to the next processor
    sum += a_{i,j} x_j    (back-substitute x_j)
  x_i = (b_i - sum) / a_{i,i}
  IF P_i ≠ P_{p-1}, send x_i to the next processor
Notes:
1. Processing continues after sending values down the pipeline.
2. Is the load imbalanced? How can we improve it? See next slide!
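The pipelined scheme can be sketched with one Python thread per equation. The threads stand in for processors P_0 to P_{n-1}, and `queue.Queue` objects stand in for the links. This is an assumption for illustration; because of the interpreter lock it shows the message pattern rather than a real speed-up. Each processor forwards x_j before using it, so the next processor is not held up. The name `pipelined_substitution` is hypothetical.

```python
import queue
import threading
from typing import List, Sequence


def pipelined_substitution(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Processor P_i (a thread here) computes x_i; x values flow P_0 -> P_1 -> ... -> P_{n-1}."""
    n = len(b)
    links = [queue.Queue() for _ in range(n)]  # links[i] is P_i's inbox from P_{i-1}
    x = [0.0] * n

    def processor(i: int) -> None:
        total = 0.0
        for j in range(i):
            xj = links[i].get()          # receive x_j from the previous processor
            if i < n - 1:
                links[i + 1].put(xj)     # forward x_j first so the next stage can proceed
            total += a[i][j] * xj        # then back-substitute it locally
        x[i] = (b[i] - total) / a[i][i]
        if i < n - 1:
            links[i + 1].put(x[i])       # send the newly computed x_i down the pipeline

    threads = [threading.Thread(target=processor, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return x


if __name__ == "__main__":
    print(pipelined_substitution([[2, 0, 0], [1, 1, 0], [1, 2, 4]], [2, 3, 17]))  # [1.0, 2.0, 3.0]
```

Processor P_i performs i multiply-adds and one division, so later processors carry more work than earlier ones.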

12 Illustration of Type 3 Solution
[Figure: Time diagram for processors P0 to P3. P0 computes x0; P1 receives x0 and computes x1; P2 receives x0 and x1 and computes x2; P3 receives x0, x1 and x2 and computes x3. A second panel shows processors P0 to P5.]
How balanced is this load?

