Decomposition Data Decomposition – Dividing the data into subgroups and assigning each piece to different processors – Example: Embarrassingly parallel.

1 Decomposition
Data Decomposition
– Dividing the data into subgroups and assigning each piece to different processors
– Example: Embarrassingly parallel applications
Functional Decomposition
– Dividing an algorithm into its functional pieces and executing the pieces in separate processors
– Example: Pipelining

2 Pipelined Computations
Divide a problem into a series of tasks
A processor completes a task sequentially and pipes the results to the next processor
Example of Summing Groups of Numbers
[Figure: pipeline of processors P0 to P5; a zero enters the first stage, stage k adds its group sum ∑A[i_k], and the total leaves the last stage]
Question: Is this data or is it functional decomposition?

3 Where is Pipelining Applicable?
Type 1
– More than one instance of a problem
– Example: Multiple simulations with different parameter settings
Type 2
– Series of data items with multiple operations
– Example: Signal Filter or Eratosthenes Sieve
Type 3
– Partial results passed on while processing continues
– Example: Solving sets of linear equations
Considerations
– Is there a series of sequential tasks?
– Is the processing time of each task approximately equal?
– Can items be grouped to minimize communication cost?
– If stages exceed processors
  o Group stages
  o Wrap last stage back to the first
– Determine where the result will be at the end of the process

4 Summing Numbers Example
Process P_0
  send(&number, P_1);
Process P_i, 0 < i < N-1
  recv(&sum, P_{i-1});
  sum += number;
  send(&sum, P_{i+1});
Process P_{N-1}
  recv(&sum, P_{N-2});
  sum += number;
  Save or display result
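The summing pipeline can be tried out on one machine. The following is a minimal sketch, assuming Python threads stand in for processors and `queue.Queue` objects stand in for send/recv; the names `pipeline_sum`, `links`, and `stage` are illustrative and do not come from the slides.

```python
from queue import Queue
from threading import Thread

def pipeline_sum(numbers):
    """Sum `numbers` with one thread per pipeline stage.

    Stage r holds numbers[r]. links[r] carries the running sum from
    stage r to stage r + 1 (the last link is read by the caller).
    """
    n = len(numbers)
    if n == 0:
        return 0
    links = [Queue() for _ in range(n)]

    def stage(rank):
        # Stage 0 starts the sum; every other stage blocks on its predecessor.
        partial = 0 if rank == 0 else links[rank - 1].get()
        links[rank].put(partial + numbers[rank])

    threads = [Thread(target=stage, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return links[n - 1].get()
```

Because each stage blocks until its predecessor has sent, the additions happen strictly in rank order, which is the sequential dependency the slide describes.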

5 Application
Remove frequencies from a signal
– Sequential Algorithm: Fourier Analysis (O(N lg N))
– Parallel: Apply filters to the signal with convolution (O(N*FilterLength))
– Filter Examples: Chebyshev, Butterworth, etc.
– Derive filter: Set Z-domain poles and zeroes, perform inverse transformation
– Filters can be useful to manipulate signals, detect patterns, etc.
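The O(N*FilterLength) cost comes from direct convolution: every output sample is a weighted sum over the filter coefficients. A minimal sketch, assuming a finite list of coefficients; the two-tap averaging filter in the usage note is a hypothetical stand-in, not a Chebyshev or Butterworth design.

```python
def fir_filter(signal, coefficients):
    """Direct convolution: y[n] = sum over k of coefficients[k] * signal[n - k]."""
    output = []
    for n in range(len(signal)):          # N output samples
        acc = 0.0
        for k, c in enumerate(coefficients):  # FilterLength terms each
            if n - k >= 0:                # samples before the start count as zero
                acc += c * signal[n - k]
        output.append(acc)
    return output
```

For example, `fir_filter([1, 2, 3, 4], [0.5, 0.5])` averages each sample with its predecessor. In a pipeline, each processor would apply one such filter and pass its output samples to the next.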

6 Chebyshev Filter Design
[Figures: Chebyshev in the z-domain; Chebyshev Frequency Response]
Note: Depending on the placement of the poles (+) and zeroes (0), the filter will affect a signal differently

7 Type 1: Multiple Instances
Sequential execution: t_1 = m*t_m
Parallel processing: (m + p - 1)*t_m/p
Parallel communication: (m + p - 1)*(t_start + n*t_data)
Speed-up: t_1/t_p = m*t_m / ((m + p - 1)*(t_m/p + t_start + n*t_data))
[Figure: space-time diagram, processors P0 to P5 (space) against time, with instances 1 to 5 moving through the pipeline]
Notation
1. m = instances, p = processors
2. t_start = latency, t_data = bandwidth (transmission time per data item)
3. n = data transmitted per instance
4. t_m = total time to process an instance
5. Total pipeline cycles = m + p - 1
6. Assume: Equal processing per stage
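The Type 1 timing model is easy to evaluate numerically. A small sketch, using the slide's symbols as parameter names; the function name `pipeline_times` is an assumption made for illustration.

```python
def pipeline_times(m, p, t_m, t_start, t_data, n):
    """Return (sequential time, pipelined time, speed-up) for the Type 1 model.

    m instances, p processors, t_m time per instance, t_start latency,
    t_data transmission time per item, n items sent per instance.
    """
    t_seq = m * t_m
    cycles = m + p - 1                      # fill the pipeline, then one result per cycle
    t_par = cycles * (t_m / p + t_start + n * t_data)
    return t_seq, t_par, t_seq / t_par
```

With communication costs set to zero, the speed-up is m*p/(m + p - 1), which approaches p as the number of instances m grows; any non-zero t_start or t_data pulls it below that.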

8 Type 2: Multiple Data Elements
Example: Signal Filter
Each process removes one or more frequencies from a digitized signal
[Figure: unfiltered signal samples d9 ... d0 enter the pipeline; processors P0 to P5 apply filters f0 to f5 in turn; the filtered signal leaves the last processor]

9 Type 2 Timing Diagram

10 Type 3: Partial Processing
The next stage receives information to continue processing
Additional processing continues at the source processor
Question: How do we determine speed-up?
[Figure: two space-time diagrams for processors P0 to P5, titled "Linear Equations" and "A More Balanced Load"; legend: idle, executing]

11 Operation at each processor
Types 1 and 2
Processor with rank r = 0
– Generate the instance (type 1) or the data (type 2) to process
– Process appropriately
– Send message to the processor with rank 1
Processors with rank r = 1, 2, ..., p-2
– Receive message from the processor with rank r-1
– Process appropriately
– Send message to the processor with rank r+1
Processor with rank r = p-1
– Receive message from the processor with rank r-1
– Process appropriately
– Output final results
Examples
1) Adding Numbers: n1 -> n1+n2 -> n1+n2+n3 -> ...
2) Frequency removal: f(t) -> f0; f(t-f0) -> f1; f(t-f0-f1) -> ...
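The receive, process, send pattern above can be simulated cycle by cycle on one machine. This is a sketch under assumptions: every stage takes exactly one cycle, there is at least one stage, and `run_pipeline` and `_EMPTY` are illustrative names rather than anything from the slides.

```python
_EMPTY = object()  # marks a rank that has nothing to process in a cycle

def run_pipeline(stages, items):
    """Push `items` through `stages` one cycle at a time.

    stages[r] is the function applied by the processor with rank r.
    Returns (outputs of the last rank, number of cycles used).
    """
    p = len(stages)
    inbox = [_EMPTY] * p                   # inbox[r]: value waiting at rank r
    feed = iter(items)
    outputs, cycles = [], 0
    while True:
        inbox[0] = next(feed, _EMPTY)      # rank 0 generates the next item, if any
        if all(v is _EMPTY for v in inbox):
            break                          # pipeline has drained
        # every busy rank processes its value during this cycle
        results = [_EMPTY if v is _EMPTY else stages[r](v)
                   for r, v in enumerate(inbox)]
        if results[-1] is not _EMPTY:
            outputs.append(results[-1])    # rank p-1 outputs the final result
        inbox = [_EMPTY] + results[:-1]    # each rank sends to rank r+1
        cycles += 1
    return outputs, cycles
```

Running m items through p stages this way takes m + p - 1 cycles: p - 1 cycles to fill the pipeline, then one finished item per cycle.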

12 Parallel Pipeline Sort
[Figure: step-by-step trace (steps 1 to 10) of the numbers 4, 3, 1, 2, 5 moving through processors P0 to P4; each processor keeps the largest number it has seen and passes the smaller ones on]
Pseudo code
  Receive x_i
  IF x_i < x_max
    Send x_i
  ELSE
    Send x_max
    x_max = x_i
Note: Processors can hold blocks of numbers for better efficiency
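The pseudo code above can be followed number by number. A minimal sketch that traces the data flow sequentially (it does not model the overlapping time steps), assuming one number per processor; `pipeline_sort` and `held` are illustrative names.

```python
def pipeline_sort(numbers):
    """Trace each number down the pipeline; rank r ends up holding held[r]."""
    p = len(numbers)
    held = [None] * p                 # held[r]: x_max at the processor with rank r
    for x in numbers:                 # rank 0 receives the numbers one at a time
        for r in range(p):
            if held[r] is None:       # first number to reach this rank: keep it
                held[r] = x
                break
            if x > held[r]:           # keep the larger value, send the smaller on
                held[r], x = x, held[r]
    return held
```

With the input 4, 3, 1, 2, 5 the ranks end up holding 5, 4, 3, 2, 1, so rank 0 has the largest value and the result reads in descending order along the pipeline.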

13 Bi-Directional Pipeline
Use the pipeline to return results to the master
– Useful for line topologies, ring, or hypercube
Example: Sorting
[Figure: pipeline of processors P0 to P5; space-time diagram for P0 to P4 showing the sorting phase followed by the gather phase]
Phases: N (generate steps) + N-1 (propagate steps) + N-1 (return steps) = 3N-2
Sort phase
  If (myid == 0) generate number
  Else receive(&number, p_{myid-1})
  If (number > max) { if (myid < P-1) send(max, p_{myid+1}); max = number; }
  Else if (myid < P-1) send(number, p_{myid+1})
Gather phase
  If (myid < P-1) receive sorted numbers from p_{myid+1}
  If (myid > 0) send sorted numbers to p_{myid-1}

14 Sieve of Eratosthenes

15 Prime Number Generation
Sieve of Eratosthenes (Type 2 pipeline)
Concept
– Each processor filters blocks of non-primes from the flow of data
– The "potential" prime numbers pass through to the next processor
Pseudo-code
  The master processor generates an array of n odd numbers
  In a loop, after receiving a group of numbers:
    Filter the group of numbers; pass unfiltered numbers down the pipeline
  Gather all of the primes
Notes
– Wrapping the pipeline in a ring could help maintain load balance
– A termination message determines when the pipeline empties
Question: What range of numbers should each processor get?

16 Implementation
Sequential code
  for (i = 2; i < n; i++) prime[i] = 1;
  for (i = 2; i <= sqrt_n; i++)
    if (prime[i] == 1)
      for (j = i + i; j < n; j = j + i) prime[j] = 0;
Sequential time: O(n^2)
Parallel code, processor p_i, i > 0
  Recv(number, rank-1);
  PRIME = TRUE;
  FOR (int x=MIN; x<MAX; x+=MIN)
    IF ((number % x) == 0) PRIME = FALSE and BREAK
  IF (PRIME) send(number, rank+1);
Termination
  recv(number, rank-1);
  send(number, rank+1);
  IF (number == terminator) break;
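One common way to picture the pipelined sieve is to give each stage a single prime and let it drop the multiples of that prime. The sketch below follows that simplified variant, which is an assumption: the slides have each processor filter a block of numbers and have the master send only odd numbers. It traces the flow sequentially; `pipelined_sieve` and `stage_primes` are illustrative names.

```python
def pipelined_sieve(limit):
    """Return the primes <= limit by passing candidates through filter stages."""
    stage_primes = []                     # stage_primes[r]: prime owned by stage r
    for number in range(2, limit + 1):    # the master feeds candidates in order
        for prime in stage_primes:        # the candidate flows down the pipeline
            if number % prime == 0:
                break                     # filtered out at this stage
        else:
            # survived every existing stage, so it is prime and gets a new stage
            stage_primes.append(number)
    return stage_primes
```

Early stages see every candidate while later stages see very few, which is the load-balance concern raised in the notes on the previous slide.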

17 Upper Triangular Matrix
All entries below the diagonal are zero
Useful for solving N equations in N unknowns

18 Solving Sets of Linear Equations
Upper Triangular Form
  a[n-1][0]*x[0] + a[n-1][1]*x[1] + … + a[n-1][n-1]*x[n-1] = b[n-1]
  a[n-2][0]*x[0] + a[n-2][1]*x[1] + … + a[n-2][n-2]*x[n-2] = b[n-2]
  …
  a[1][0]*x[0] + a[1][1]*x[1] = b[1]
  a[0][0]*x[0] = b[0]
Back Substitution
  x[0] = b[0]/a[0][0]
  x[1] = (b[1] - a[1][0]*x[0])/a[1][1]
  x[2] = (b[2] - a[2][0]*x[0] - a[2][1]*x[1])/a[2][2]
General solution for x[i]
  x[i] = (b[i] - ∑_{j=0}^{i-1} a[i][j]*x[j])/a[i][i]
Sequential code
  x[0] = b[0]/a[0][0];
  FOR (i=1; i<n; i++) {
    sum = 0;
    FOR (j=0; j<i; j++) sum += a[i][j]*x[j];
    x[i] = (b[i] - sum)/a[i][i];
  }
Parallel code for p_i where 1<=i<n
  sum = 0
  For (j=0; j<i; j++) { receive(&x[j], p_{i-1}); sum += a[i][j]*x[j]; send(x[j], p_{i+1}) }
  x[i] = (b[i] - sum)/a[i][i]
Parallel Pseudo code
  for (j = 0; j < i; j++) { recv(x[j], p-1); send(x[j], p+1); }
  sum = 0;
  for (j = 0; j < i; j++) sum = sum + a[i][j]*x[j];
  x[i] = (b[i] - sum)/a[i][i];
  send(x[i], p+1);
This is a type 3 pipeline example
Note: a[i][j] and b[i] are constants
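The back-substitution pipeline can be traced sequentially to check the arithmetic and to see how much work each processor does. A minimal sketch, assuming processor i owns row i of the system in the form shown on the slide (a[i][j] used only for j <= i) and non-zero diagonal entries; `pipeline_back_substitution`, `passed`, and `work` are illustrative names.

```python
def pipeline_back_substitution(a, b):
    """Solve the triangular system row by row, as the pipeline would.

    Returns (x, work) where work[i] counts the multiply-adds done by
    processor i before it can compute x[i].
    """
    n = len(b)
    x = [0.0] * n
    work = [0] * n
    passed = []                          # x[0..i-1], as received by processor i
    for i in range(n):
        total = 0.0
        for j, xj in enumerate(passed):  # recv x[j], use it, forward it
            total += a[i][j] * xj
            work[i] += 1
        x[i] = (b[i] - total) / a[i][i]
        passed.append(x[i])              # send x[i] down the pipeline
    return x, work
```

The `work` list grows as 0, 1, 2, ..., n-1, so later processors do more arithmetic than earlier ones, which is one way to look at the load-balance question the slides raise for this example.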

19 Pipeline Solution
DO
  IF p ≠ master, receive x_j from previous processor
  IF p ≠ P-1, send x_j to next processor
  back substitute x_j
UNTIL x_i evaluated
IF p ≠ P-1, send x_i to the next processor
Notes:
1. Processing continues after sending values down the pipeline
2. Is the load imbalanced?

20 Illustration of Type 3 Solution
[Figure: P0 computes x0 and passes it on; P1 computes x1 and passes x0, x1; P2 computes x2 and passes x0, x1, x2; P3 computes x3. A space-time diagram shows processors P0 to P5 against time.]
How balanced is this load?

