Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Parallel Programming in C with MPI and OpenMP Michael J. Quinn.


1 Parallel Programming in C with MPI and OpenMP, Michael J. Quinn

2 Chapter 11: Matrix Multiplication

3 Iterative, Row-oriented Algorithm
[Figure: matrix product diagram]
- Series of inner product (dot product) operations
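The row-oriented algorithm can be sketched in C as three nested loops, where the innermost loop is one inner product. This is a minimal sketch, not code from the slides: the function name `matmul_rows` and the flat row-major storage are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Row-oriented matrix multiplication: c = a * b.
 * a is l x m, b is m x n, c is l x n.
 * Assumption of this sketch: matrices are flat arrays in row-major order. */
void matmul_rows(const double *a, const double *b, double *c,
                 size_t l, size_t m, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < l; i++) {        /* each row of a */
        for (size_t j = 0; j < n; j++) {    /* each column of b */
            double dot = 0.0;               /* inner product of row i, column j */
            for (size_t k = 0; k < m; k++)
                dot += a[i * m + k] * b[k * n + j];
            c[i * n + j] = dot;
        }
    }
}
```

Calling `matmul_rows(a, b, c, 2, 3, 2)` multiplies a 2 x 3 matrix by a 3 x 2 matrix and writes the 2 x 2 result into `c`.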

4 Block Matrix Multiplication
[Figure: block matrix product diagram]
- Replace scalar multiplication with matrix multiplication
- Replace scalar addition with matrix addition

5 Recurse Until B Small Enough
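One way to picture the recursion is to split each matrix into four quadrants and apply the block formula until the blocks are small. The sketch below is illustrative only: `matmul_recursive` and the cutoff `BLOCK_THRESHOLD` are hypothetical names, the cutoff value is arbitrary, and the code assumes square matrices stored row-major with a row stride `ld`.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_THRESHOLD 2  /* hypothetical cutoff; in practice sized to the cache */

/* c += a * b for n x n submatrices embedded in row-major arrays
 * whose consecutive rows are ld elements apart. */
void matmul_recursive(const double *a, const double *b, double *c,
                      size_t n, size_t ld)
{
    assert(n <= ld);
    if (n <= BLOCK_THRESHOLD || n % 2 != 0) {
        /* Base case: block is small enough (or cannot be halved evenly). */
        for (size_t i = 0; i < n; i++)
            for (size_t j = 0; j < n; j++)
                for (size_t k = 0; k < n; k++)
                    c[i * ld + j] += a[i * ld + k] * b[k * ld + j];
        return;
    }
    size_t h = n / 2;
    /* Quadrant pointers: 00 = top left, 01 = top right, 10 = bottom left. */
    const double *a00 = a, *a01 = a + h;
    const double *a10 = a + h * ld, *a11 = a + h * ld + h;
    const double *b00 = b, *b01 = b + h;
    const double *b10 = b + h * ld, *b11 = b + h * ld + h;
    double *c00 = c, *c01 = c + h;
    double *c10 = c + h * ld, *c11 = c + h * ld + h;
    /* Scalar multiply becomes a recursive block multiply;
     * scalar add becomes accumulation into the same block of c. */
    matmul_recursive(a00, b00, c00, h, ld);
    matmul_recursive(a01, b10, c00, h, ld);
    matmul_recursive(a00, b01, c01, h, ld);
    matmul_recursive(a01, b11, c01, h, ld);
    matmul_recursive(a10, b00, c10, h, ld);
    matmul_recursive(a11, b10, c10, h, ld);
    matmul_recursive(a10, b01, c11, h, ld);
    matmul_recursive(a11, b11, c11, h, ld);
}
```

The caller is expected to zero `c` first, since the function accumulates into it.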

6 First Parallel Algorithm
Partitioning
- Divide matrices into rows
- Each primitive task has corresponding rows of three matrices
Communication
- Each task must eventually see every row of B
- Organize tasks into a ring

7 First Parallel Algorithm (cont.)
Agglomeration and mapping
- Fixed number of tasks, each requiring same amount of computation
- Regular communication among tasks
- Strategy: Assign each process a contiguous group of rows

8-11 Communication of B
[Figure sequence, four frames: four processes, each labeled A, B, C, as the blocks of B are passed from process to process]

12 Complexity Analysis
- Algorithm has $p$ iterations
- During each iteration a process multiplies an $(n/p) \times (n/p)$ block of A by an $(n/p) \times n$ block of B: $\Theta(n^3/p^2)$
- Total computation time: $\Theta(n^3/p)$
- Each process ends up passing $(p-1)n^2/p = \Theta(n^2)$ elements of B
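The iteration structure of the first parallel algorithm can be illustrated with a serial simulation in which a loop over `r` stands in for the p processes. This is a sketch under stated assumptions, not the book's MPI code: `ring_matmul_sim` is a hypothetical name, the ring shift is modeled by index arithmetic instead of message passing, and p is assumed to divide n.

```c
#include <assert.h>
#include <stddef.h>

/* Serial simulation of the row-wise ring algorithm: c = a * b, all n x n,
 * row-major. "Process" r owns rows [r*nb, (r+1)*nb) of a, b, and c,
 * where nb = n / p. */
void ring_matmul_sim(const double *a, const double *b, double *c,
                     size_t n, size_t p)
{
    assert(p > 0 && n % p == 0);
    size_t nb = n / p;
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;
    for (size_t step = 0; step < p; step++) {   /* p iterations */
        for (size_t r = 0; r < p; r++) {        /* every process in turn */
            /* After `step` shifts around the ring, process r holds the
             * row block of b that started on process (r + step) % p. */
            size_t owner = (r + step) % p;
            /* Multiply an (n/p) x (n/p) block of a by an (n/p) x n block of b. */
            for (size_t i = r * nb; i < (r + 1) * nb; i++)
                for (size_t k = owner * nb; k < (owner + 1) * nb; k++)
                    for (size_t j = 0; j < n; j++)
                        c[i * n + j] += a[i * n + k] * b[k * n + j];
        }
    }
}
```

In an actual MPI program each process would run only its own share of the inner loops and then exchange its block of B with its ring neighbors, for example with `MPI_Sendrecv_replace`; that call is omitted here so the sketch runs without an MPI installation.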

13 Weakness of Algorithm 1
- Blocks of B being manipulated have $p$ times more columns than rows
- Each process must access every element of matrix B
- Ratio of computations per communication is poor: only $2n/p$

14 Parallel Algorithm 2 (Cannon's Algorithm)
- Associate a primitive task with each matrix element
- Agglomerate tasks responsible for a square (or nearly square) block of C
- Computation-to-communication ratio rises to $n/\sqrt{p}$

15 Elements of A and B Needed to Compute a Process's Portion of C
[Figure, two panels: Algorithm 1; Cannon's Algorithm]

16 Blocks Must Be Aligned
[Figure, two panels: Before; After]

17 Blocks Need to Be Aligned
A00 B00   A01 B01   A02 B02   A03 B03
A10 B10   A11 B11   A12 B12   A13 B13
A20 B20   A21 B21   A22 B22   A23 B23
A30 B30   A31 B31   A32 B32   A33 B33
- Each triangle represents a matrix block
- Only same-color triangles should be multiplied

18 Rearrange Blocks
A00 B00   A01 B01   A02 B02   A03 B03
A10 B10   A11 B11   A12 B12   A13 B13
A20 B20   A21 B21   A22 B22   A23 B23
A30 B30   A31 B31   A32 B32   A33 B33
- Block Aij cycles left i positions
- Block Bij cycles up j positions

19-22 Consider Process P1,2 (Steps 1 to 4)
[Figure sequence: the blocks of row 1 of A and column 2 of B, listed in the order shown on each slide; both rotate by one block per step]
Step 1: A10 A11 A12 A13 | B02 B12 B22 B32
Step 2: A11 A12 A13 A10 | B12 B22 B32 B02
Step 3: A12 A13 A10 A11 | B22 B32 B02 B12
Step 4: A13 A10 A11 A12 | B32 B02 B12 B22
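The alignment rule and the step-by-step rotation can be combined into a serial simulation of Cannon's algorithm. This is a hedged sketch and not the book's MPI implementation: `cannon_inner_index` and `cannon_matmul_sim` are hypothetical names, the block shifts are modeled by index arithmetic instead of messages, and the process grid is assumed to be q x q (p = q * q) with q dividing n.

```c
#include <assert.h>
#include <stddef.h>

/* Inner block index k held by process (pi, pj) at a given step (0-based).
 * Alignment cycles block A(i,j) left i positions and block B(i,j) up j
 * positions, so process (pi, pj) starts with A(pi,k) and B(k,pj) for
 * k = (pi + pj) % q. Each later step is assumed to shift A left and
 * B up by one more position. */
size_t cannon_inner_index(size_t pi, size_t pj, size_t step, size_t q)
{
    assert(q > 0);
    return (pi + pj + step) % q;
}

/* Serial simulation of Cannon's algorithm: c = a * b, all n x n, row-major. */
void cannon_matmul_sim(const double *a, const double *b, double *c,
                       size_t n, size_t q)
{
    assert(q > 0 && n % q == 0);
    size_t nb = n / q;                          /* block side: n / sqrt(p) */
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;
    for (size_t step = 0; step < q; step++) {   /* sqrt(p) iterations */
        for (size_t pi = 0; pi < q; pi++) {
            for (size_t pj = 0; pj < q; pj++) {
                size_t k = cannon_inner_index(pi, pj, step, q);
                /* Local block product: C(pi,pj) += A(pi,k) * B(k,pj). */
                for (size_t i = pi * nb; i < (pi + 1) * nb; i++)
                    for (size_t kk = k * nb; kk < (k + 1) * nb; kk++)
                        for (size_t j = pj * nb; j < (pj + 1) * nb; j++)
                            c[i * n + j] += a[i * n + kk] * b[kk * n + j];
            }
        }
    }
}
```

Because A and B rotate in lockstep, every process always multiplies a pair of blocks that share the same inner index, and over q steps it sees each index exactly once.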

23 Complexity Analysis
- Algorithm has $\sqrt{p}$ iterations
- During each iteration a process multiplies two $(n/\sqrt{p}) \times (n/\sqrt{p})$ matrices: $\Theta(n^3/p^{3/2})$
- Computational complexity: $\Theta(n^3/p)$
- During each iteration a process sends and receives two blocks of size $(n/\sqrt{p}) \times (n/\sqrt{p})$
- Communication complexity: $\Theta(n^2/\sqrt{p})$
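The computation-to-communication ratios quoted for the two algorithms can be checked per iteration from the block sizes given in the two complexity analyses. The worked equations below are a sketch under two assumptions that the slides do not state: each inner-product step counts as one multiplication plus one addition (the factor of 2), and communication is counted in matrix elements sent per iteration.

```latex
\begin{align*}
\text{Algorithm 1:}\quad
  \frac{2\,(n/p)\,(n/p)\,n}{(n/p)\,n}
  &= \frac{2n^3/p^2}{n^2/p} = \frac{2n}{p} \\[4pt]
\text{Cannon's algorithm:}\quad
  \frac{2\,(n/\sqrt{p})^3}{2\,(n/\sqrt{p})^2}
  &= \frac{n}{\sqrt{p}}
\end{align*}
```

For example, with $n = 1024$ and $p = 16$ the ratios are $2n/p = 128$ for Algorithm 1 and $n/\sqrt{p} = 256$ for Cannon's algorithm.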

24 This system is highly scalable!
- Sequential algorithm: $\Theta(n^3)$
- Parallel overhead: $\Theta(\sqrt{p}\,n^2)$
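One way to read the scalability claim is through an isoefficiency-style argument. The derivation below is a sketch that assumes the isoefficiency relation $T(n,1) \ge C\,T_o(n,p)$ and a memory requirement of $M(n) = n^2$; the symbols $C$, $T_o$, and $M$ do not appear on the slide and are introduced here for illustration.

```latex
\begin{align*}
T(n,1) \ge C\,T_o(n,p)
  &\;\Longrightarrow\; n^3 \ge C\sqrt{p}\,n^2
  \;\Longrightarrow\; n \ge C\sqrt{p} \\[4pt]
M(n) = n^2
  &\;\Longrightarrow\; \frac{M(C\sqrt{p})}{p} = \frac{C^2 p}{p} = C^2
\end{align*}
```

Under these assumptions the memory needed per process to hold efficiency constant does not grow with $p$, which is consistent with the slide calling the system highly scalable.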

