Presentation on theme: "1/26 Design of parallel algorithms Linear equations Jari Porras."— Presentation transcript:

1 1/26 Design of parallel algorithms Linear equations Jari Porras

2 2/26 Linear equations
a_{0,0} x_0 + ... + a_{0,n-1} x_{n-1} = b_0
...
a_{n-1,0} x_0 + ... + a_{n-1,n-1} x_{n-1} = b_{n-1}
i.e. Ax = b
Usually solved in 2 stages
– reduce into an upper triangular system Ux = y
– back-substitution for x_{n-1}, ..., x_0
Gaussian elimination

3 3/26 Gaussian elimination

4 4/26 Gaussian elimination

5 5/26 Gaussian elimination Gaussian elimination requires – n^2/2 divisions (line 6) – (n^3/3) – (n^2/2) subtractions and multiplications (line 12) Sequential run time approximately 2n^3/3 How is the Gaussian elimination performed in parallel ?
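The operation counts above can be checked with a short sketch of the elimination stage. This is an illustrative implementation, not the slide's own listing; the division loop and the update loop below correspond roughly to the "line 6" and "line 12" referenced above, and no zero pivots are assumed.

```python
# Sequential Gaussian elimination (no pivoting), counting operations:
# ~n^2/2 divisions and ~n^3/3 - n^2/2 multiply-subtract pairs.
def gaussian_eliminate(A, b):
    """Reduce Ax = b in place to a unit-diagonal upper triangular Ux = y."""
    n = len(A)
    divisions = 0
    mult_subs = 0
    for k in range(n):
        for j in range(k + 1, n):        # normalize pivot row k ("line 6")
            A[k][j] /= A[k][k]
            divisions += 1
        b[k] /= A[k][k]
        A[k][k] = 1.0
        for i in range(k + 1, n):        # eliminate column k below the pivot
            for j in range(k + 1, n):    # update step ("line 12")
                A[i][j] -= A[i][k] * A[k][j]
                mult_subs += 1
            b[i] -= A[i][k] * b[k]
            A[i][k] = 0.0
    return divisions, mult_subs
```

For n rows the counters come out as n(n-1)/2 divisions and (n-1)n(2n-1)/6 multiply-subtract pairs, matching the n^2/2 and n^3/3 leading terms quoted on the slide.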

6 6/26 Parallel Gaussian elimination Row/column striping vs. checkerboarding ? Block vs. cyclic striping ? Number of processors p = n Active processors ? Required steps ?

7 7/26

8 8/26 Analysis 1st step – kth iteration requires n – k – 1 divisions at processor P_k 2nd step – (t_s + t_w(n – k – 1)) log n time on a hypercube 3rd step – kth iteration requires n – k – 1 multiplications and subtractions at all processors P_i T_p = 3/2 n(n – 1) + t_s n log n + 1/2 t_w n(n – 1) log n

9 9/26 Analysis Not cost-optimal since pT_p = Θ(n^3 log n) What is the main reason ? – Inefficient parallelization ? – What could be done ?

10 10/26

11 11/26

12 12/26 Analysis Pipelined operation – all n steps are executed in parallel – the last step starts in the nth step and is completed in constant time (it changes only the bottom right corner element) – Θ(n) steps – each step takes O(n) time – thus the parallel run time is O(n^2) and the cost O(n^3) – cost-optimal !!

13 13/26 p < n ? Block striping – several rows per processor Does the activity change ? – Block vs. cyclic striping

14 14/26

15 15/26

16 16/26 Analysis With block striping – the processor holding all rows belonging to the active part performs (n – k – 1)n/p multiplications and subtractions – if the pipelined version is used, the number of arithmetic operations (2(n – k – 1)n/p) is higher than the number of words communicated (n – k – 1) – computation dominates – parallel run time approximately n^3/p
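The dominance claim can be checked with a line of arithmetic using the per-iteration counts stated above (the function name is illustrative):

```python
# Computation-to-communication ratio in iteration k of pipelined
# block-striped Gaussian elimination: 2(n-k-1)*n/p arithmetic operations
# versus n-k-1 communicated words.
def comp_to_comm_ratio(n, p, k):
    ops = 2 * (n - k - 1) * n / p    # multiplications + subtractions
    words = n - k - 1                # pivot-row words communicated
    return ops / words
```

The ratio simplifies to 2n/p in every iteration, so for fixed p it grows with n and computation dominates the communication cost, as the slide states.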

17 17/26 Checkerboard partitioning Use an n x n mesh Same approach as before, but – requires two broadcasts (rowwise and columnwise) – analyse the cost-optimality How about the pipelining ?

18 18/26

19 19/26 Pipelined checkerboard

20 20/26 Pipelined checkerboard

21 21/26 p < n^2 Map the matrix onto a √p x √p mesh using block checkerboard partitioning Remember the effect of active processors !! Number of multiplications and subtractions is n^2/p, with n/√p words of communication – computation dominates !

22 22/26

23 23/26

24 24/26 Partial pivoting The basic algorithm fails if any element on the diagonal is zero Partial pivoting helps – select the row that has the largest element in the wanted column and exchange rows What is the effect on the partitioning strategy ? How about pipelining ?
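The pivoting step described above can be sketched as follows; this is a hypothetical helper for illustration (not the slides' listing), and it leaves the pivot rows unnormalized:

```python
# Gaussian elimination with partial pivoting: before eliminating column k,
# swap in the row with the largest absolute value in that column, so a zero
# on the diagonal no longer breaks the algorithm.
def pivot_and_eliminate(A, b):
    """Reduce Ax = b in place to an upper triangular system."""
    n = len(A)
    for k in range(n):
        # select the row with the largest |A[i][k]| among rows k..n-1
        pivot_row = max(range(k, n), key=lambda i: abs(A[i][k]))
        if pivot_row != k:
            A[k], A[pivot_row] = A[pivot_row], A[k]
            b[k], b[pivot_row] = b[pivot_row], b[k]
        for i in range(k + 1, n):        # eliminate below the pivot
            factor = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= factor * A[k][j]
            b[i] -= factor * b[k]
    return A, b
```

Note the row exchange is what interacts with the partitioning strategy: with striping the two rows may live on different processors, which is the question the slide raises.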

25 25/26 Back-substitution The second stage of solving linear equations Back-substitution is used to determine the vector x Complexity n^2 – use a partitioning scheme that is suitable for Gaussian elimination
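A minimal sketch of this stage, assuming an upper triangular system Ux = y with nonzero diagonal (names are illustrative):

```python
# Back-substitution: solve Ux = y bottom row first, ~n^2/2 operations total.
def back_substitute(U, y):
    n = len(U)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        # subtract the already-known unknowns, then divide by the diagonal
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x
```

Each x[i] depends on all later unknowns x[i+1..n-1], which is why the same row-oriented partitioning used for the elimination stage carries over naturally.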

26 26/26 Back substitution

