Published by Madeline Cherry. Modified over 2 years ago.

1
Course Project On: Strassen's Matrix Multiplication
Presented By: Gaurav Jain Lalchand
Under The Guidance Of: Prof. Subodh Kumar

2
Basic Matrix Multiplication
Suppose we want to multiply two matrices of size N x N, for example A x B = C. For N = 2:
$C_{11} = a_{11} b_{11} + a_{12} b_{21}$
$C_{12} = a_{11} b_{12} + a_{12} b_{22}$
$C_{21} = a_{21} b_{11} + a_{22} b_{21}$
$C_{22} = a_{21} b_{12} + a_{22} b_{22}$
A 2 x 2 matrix multiplication can therefore be accomplished in 8 multiplications ($2^{\log_2 8} = 2^3$).
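To make the count of 8 concrete, here is a minimal sketch of the basic method in plain Python. The function name `matmul_naive` and the multiplication counter are illustrative assumptions, not part of the original slides.

```python
def matmul_naive(a, b):
    """Multiply two N x N matrices (lists of lists) and count scalar multiplications."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    mults = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]  # one scalar multiplication per (i, j, k)
                mults += 1
    return c, mults


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c, mults = matmul_naive(a, b)
print(c, mults)  # [[19, 22], [43, 50]] 8
```

The triple loop performs N^3 scalar multiplications, which gives 8 for N = 2.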

3
Strassen's Matrix Multiplication

4
$P_1 = (A_{11} + A_{22})(B_{11} + B_{22})$
$P_2 = (A_{21} + A_{22}) B_{11}$
$P_3 = A_{11} (B_{12} - B_{22})$
$P_4 = A_{22} (B_{21} - B_{11})$
$P_5 = (A_{11} + A_{12}) B_{22}$
$P_6 = (A_{21} - A_{11})(B_{11} + B_{12})$
$P_7 = (A_{12} - A_{22})(B_{21} + B_{22})$

5
Strassen's Matrix Multiplication
With the seven products $P_1$ to $P_7$ from the previous slide:
$C_{11} = P_1 + P_4 - P_5 + P_7$
$C_{12} = P_3 + P_5$
$C_{21} = P_2 + P_4$
$C_{22} = P_1 + P_3 - P_2 + P_6$
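The seven products and the four C equations can be checked with a short recursive sketch in plain Python. It assumes N is a power of two, and the helper names (`add`, `sub`, `split`, `strassen`) are hypothetical. It is a CPU-only illustration of the formulas, not the MPI + CUDA implementation the slides describe.

```python
def add(x, y):
    return [[p + q for p, q in zip(r, s)] for r, s in zip(x, y)]


def sub(x, y):
    return [[p - q for p, q in zip(r, s)] for r, s in zip(x, y)]


def split(m):
    """Return the quadrants (M11, M12, M21, M22) of a square matrix."""
    h = len(m) // 2
    return ([row[:h] for row in m[:h]], [row[h:] for row in m[:h]],
            [row[:h] for row in m[h:]], [row[h:] for row in m[h:]])


def strassen(a, b):
    """Return (A x B, scalar multiplication count); N must be a power of two."""
    if len(a) == 1:
        return [[a[0][0] * b[0][0]]], 1
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    operands = [
        (add(a11, a22), add(b11, b22)),  # P1
        (add(a21, a22), b11),            # P2
        (a11, sub(b12, b22)),            # P3
        (a22, sub(b21, b11)),            # P4
        (add(a11, a12), b22),            # P5
        (sub(a21, a11), add(b11, b12)),  # P6
        (sub(a12, a22), add(b21, b22)),  # P7
    ]
    results = [strassen(x, y) for x, y in operands]
    p1, p2, p3, p4, p5, p6, p7 = [r[0] for r in results]
    mults = sum(r[1] for r in results)
    c11 = add(sub(add(p1, p4), p5), p7)
    c12 = add(p3, p5)
    c21 = add(p2, p4)
    c22 = add(sub(add(p1, p3), p2), p6)
    top = [r1 + r2 for r1, r2 in zip(c11, c12)]
    bottom = [r1 + r2 for r1, r2 in zip(c21, c22)]
    return top + bottom, mults


print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # ([[19, 22], [43, 50]], 7)
```

Each level of recursion uses 7 sub-multiplications instead of 8, so a 4 x 4 product takes 49 scalar multiplications in this sketch rather than 64.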

6
Strassen's Matrix Multiplication
Ref: Accelerating High Performance Applications with CUDA and MPI

7
Why MPI + CUDA?
➢ The equations are naturally suited to the CUDA environment
➢ Limitation of CUDA: no inter-GPU communication
➢ MPI: data distribution mechanism
➢ CUDA: main execution engine

8
MPI + CUDA

9
Steps Performed
➢ Divide each input matrix into four equal parts
➢ Send the appropriate part to the corresponding process
➢ Each process computes the corresponding equation
Each node contains a GPU and uses kernels on its own GPU to compute the result.

10
Steps Performed (continued)
➢ Each process sends its result to the head process of its equation
➢ All heads collect the data
➢ Each head computes its C equation
➢ All heads send their partial results to the master node
➢ The master combines and displays the result
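As a rough illustration of the first two steps, the sketch below splits a matrix into quadrants and works out which quadrants each product needs, read directly off the P formulas. The names `split_quadrants`, `NEEDS` and `payload_for` are hypothetical, and no MPI calls are made. It only shows what "the appropriate part" could mean for each process.

```python
def split_quadrants(m):
    """Divide a square matrix into four equal parts keyed by block index."""
    h = len(m) // 2
    return {
        "11": [row[:h] for row in m[:h]],
        "12": [row[h:] for row in m[:h]],
        "21": [row[:h] for row in m[h:]],
        "22": [row[h:] for row in m[h:]],
    }


# Quadrants of A and of B that each product reads, taken from the P1..P7 formulas.
NEEDS = {
    "P1": (("11", "22"), ("11", "22")),
    "P2": (("21", "22"), ("11",)),
    "P3": (("11",), ("12", "22")),
    "P4": (("22",), ("21", "11")),
    "P5": (("11", "12"), ("22",)),
    "P6": (("21", "11"), ("11", "12")),
    "P7": (("12", "22"), ("21", "22")),
}


def payload_for(product, a, b):
    """Return only the blocks of A and B that the given product needs."""
    qa, qb = split_quadrants(a), split_quadrants(b)
    a_keys, b_keys = NEEDS[product]
    payload = {"A" + k: qa[k] for k in a_keys}
    payload.update({"B" + k: qb[k] for k in b_keys})
    return payload


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(payload_for("P2", a, b))  # {'A21': [[3]], 'A22': [[4]], 'B11': [[5]]}
```

Sending only the needed blocks keeps each message at three or four quarter-size blocks, compared with eight if both full matrices were sent.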

11
Detailed Description – Step 1
$P_1 = (A_{11} + A_{22})(B_{11} + B_{22})$, $P_5 = (A_{11} + A_{12}) B_{22}$
$P_2 = (A_{21} + A_{22}) B_{11}$, $P_6 = (A_{21} - A_{11})(B_{11} + B_{12})$
$P_3 = A_{11} (B_{12} - B_{22})$, $P_7 = (A_{12} - A_{22})(B_{21} + B_{22})$
$P_4 = A_{22} (B_{21} - B_{11})$

12
Detailed Description – Step 2
[Figure: products grouped as ($P_1$, $P_5$), ($P_2$, $P_6$), ($P_3$, $P_7$), with $P_4$ shown twice]

13
Detailed Description – Step 3
[Figure: products grouped as ($P_1$, $P_5$), ($P_2$, $P_6$), ($P_3$, $P_7$), with $P_4$ shown twice, followed by "Declare Result"]
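The three steps can be simulated on a single machine. In this sketch, worker threads stand in for the MPI processes, a plain-Python `matmul` stands in for the CUDA kernel, and `GROUPS` follows the grouping shown on the Detailed Description slides. All function names are assumptions made for illustration, and no MPI or CUDA calls are made.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul(x, y):
    """Stand-in for the per-GPU kernel: plain block multiplication."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add(x, y):
    return [[p + q for p, q in zip(r, s)] for r, s in zip(x, y)]


def sub(x, y):
    return [[p - q for p, q in zip(r, s)] for r, s in zip(x, y)]


def quad(m):
    h = len(m) // 2
    return ([row[:h] for row in m[:h]], [row[h:] for row in m[:h]],
            [row[:h] for row in m[h:]], [row[h:] for row in m[h:]])


# Grouping of products per worker, as on the Detailed Description slides.
GROUPS = [("P1", "P5"), ("P2", "P6"), ("P3", "P7"), ("P4",)]


def make_tasks(a, b):
    """Step 1: build one deferred computation per product."""
    a11, a12, a21, a22 = quad(a)
    b11, b12, b21, b22 = quad(b)
    return {
        "P1": lambda: matmul(add(a11, a22), add(b11, b22)),
        "P2": lambda: matmul(add(a21, a22), b11),
        "P3": lambda: matmul(a11, sub(b12, b22)),
        "P4": lambda: matmul(a22, sub(b21, b11)),
        "P5": lambda: matmul(add(a11, a12), b22),
        "P6": lambda: matmul(sub(a21, a11), add(b11, b12)),
        "P7": lambda: matmul(sub(a12, a22), add(b21, b22)),
    }


def multiply(a, b):
    tasks = make_tasks(a, b)
    p = {}
    # Step 2: each worker computes the products of its group.
    with ThreadPoolExecutor(max_workers=len(GROUPS)) as pool:
        for partial in pool.map(lambda g: {n: tasks[n]() for n in g}, GROUPS):
            p.update(partial)
    # Step 3: combine the products into the C blocks and assemble the result.
    c11 = add(sub(add(p["P1"], p["P4"]), p["P5"]), p["P7"])
    c12 = add(p["P3"], p["P5"])
    c21 = add(p["P2"], p["P4"])
    c22 = add(sub(add(p["P1"], p["P3"]), p["P2"]), p["P6"])
    top = [r1 + r2 for r1, r2 in zip(c11, c12)]
    bottom = [r1 + r2 for r1, r2 in zip(c21, c22)]
    return top + bottom


print(multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

In a real deployment the thread pool would be replaced by message passing between nodes, and `matmul` by a kernel launch on each node's GPU.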

14
Experimental Result - 1

15
Experimental Result - 2

16
Experimental Result - 3

17
References:
Accelerating High Performance Applications with CUDA and MPI: N. P. Karunadasa & D. N. Ranasinghe
Strassen's Matrix Multiplication on GPUs: Junjie Li, Sanjay Ranka

18
Thanks
