GAMMA: An Efficient Distributed Shared Memory Toolbox for MATLAB
Rajkiran Panuganti (1), Muthu Baskaran (1), Jarek Nieplocha (2), Ashok Krishnamurthy (3), Atanas Rountev (1), P. Sadayappan (1)
(1) The Ohio State University; (2) PNNL; (3) Ohio Supercomputer Center
Overview
- Motivation
- GAMMA Programming Model
- Implementation Overview
- Experimental Evaluation
- Conclusions
11/24/2018
High Productivity Computing
- Programmer productivity is extremely important.
- C/Fortran: good performance but poor productivity; parallel programming in C/Fortran is even harder.
- MATLAB, Python, etc.: good programmer productivity, but poor performance and an inability to run large-scale problems (memory limitations).
MATLAB and High Productivity
Numerous features result in high programmer productivity:
- Array-based semantics
- Copy/value-based semantics
- Debugging and profiling support
- Integrated development environment
- Numerous domain-specific libraries (toolboxes)
- Visualization
- And a lot more
These features need to be retained while addressing the performance issues.
Problem
[Figure: chart annotated "Out-Of-Memory!" and "Performance!", with timings of 199 sec and 10.19 s]
Speaker note: remember the sizes for Class B for Fortran and EP.
ParaM: ‘Parallel MATLAB’
[Figure: ParaM components. Labels: User, DParaM, GAMMA, Specialized Libraries, mexMPI, Library Writers, Compiler, MATLAB, GA + MVAPICH]
Programming Model
- Global shared view of the distributed array
[Figure: physical view (blocks held by processes P0 to P3) and logical view (a single array spanning (1,1) to (1024,1024), with the region from (250,75) to (700,610) marked)]
    A = GA([1024, 1024], distr);
    Block = A(250:700, 75:610);
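The global view can be made concrete with a small sketch of how a request such as A(250:700, 75:610) might be split among owning processes. This is only an illustration in Python, not GAMMA code: the 2 x 2 process grid, the 512 x 512 block size, the row-major rank numbering, and the function name `owners_of_block` are all assumptions, since the slide does not give the contents of `distr`.

```python
def owners_of_block(rows, cols, block_shape, grid_shape):
    """Split a 1-based inclusive (rows, cols) request among owning processes.

    Returns {rank: ((row_lo, row_hi), (col_lo, col_hi))} in global indices.
    """
    (r_lo, r_hi), (c_lo, c_hi) = rows, cols
    b_r, b_c = block_shape
    grid_r, grid_c = grid_shape
    pieces = {}
    for pr in range(grid_r):
        for pc in range(grid_c):
            # Global 1-based inclusive range owned by grid position (pr, pc).
            own_r = (pr * b_r + 1, (pr + 1) * b_r)
            own_c = (pc * b_c + 1, (pc + 1) * b_c)
            # Intersect the request with the owned range.
            lo_r, hi_r = max(r_lo, own_r[0]), min(r_hi, own_r[1])
            lo_c, hi_c = max(c_lo, own_c[0]), min(c_hi, own_c[1])
            if lo_r <= hi_r and lo_c <= hi_c:
                rank = pr * grid_c + pc  # assumed row-major rank numbering
                pieces[rank] = ((lo_r, hi_r), (lo_c, hi_c))
    return pieces


# The request from the slide, under the assumed 2 x 2 grid of 512 x 512 blocks.
pieces = owners_of_block((250, 700), (75, 610), (512, 512), (2, 2))
for rank, ((r0, r1), (c0, c1)) in sorted(pieces.items()):
    print(f"process {rank}: rows {r0}..{r1}, cols {c0}..{c1}")
```

Under these assumptions the request touches all four processes, which is the kind of bookkeeping the global view hides from the user.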
Programming Model (Contd.)
- Get-Compute-Put computation model
[Figure: Process 0 and Process 1 each perform Get(), Compute, and Put() against the shared array]
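A minimal single-process sketch of the Get-Compute-Put pattern follows. It is written in Python and only simulates the idea: the class `GlobalArray`, its `get` and `put` methods, and the `worker` function are hypothetical stand-ins, not the toolbox's interface, and the "processes" here simply run one after another.

```python
class GlobalArray:
    """Toy stand-in for a distributed array: one list visible to every 'process'."""

    def __init__(self, n):
        self.data = [float(i) for i in range(n)]

    def get(self, lo, hi):
        # Return a private copy of the half-open range [lo, hi).
        return self.data[lo:hi]

    def put(self, lo, hi, values):
        # Write a private buffer back into the shared range [lo, hi).
        assert len(values) == hi - lo
        self.data[lo:hi] = values


def worker(ga, rank, nprocs):
    """One process's share of the work: get its chunk, compute, put it back."""
    n = len(ga.data)
    chunk = n // nprocs
    lo = rank * chunk
    hi = n if rank == nprocs - 1 else lo + chunk  # last rank takes the remainder
    local = ga.get(lo, hi)              # Get
    local = [2.0 * x for x in local]    # Compute (purely local)
    ga.put(lo, hi, local)               # Put


ga = GlobalArray(10)
for rank in range(3):  # sequential simulation of three processes
    worker(ga, rank, 3)
print(ga.data)
```

The point of the pattern is that the compute step only ever touches the private buffer, so communication is confined to the explicit get and put calls.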
Other Features in the Programming Model Enabling Efficiency
- Pass-by-reference semantics for distributed arrays
  - Intended for library writers
- Management of data locality (NUMA)
  - Distribution information can be retrieved by the programmer
  - Reference-based access to the local data
- Data replication
  - Support for replicating near-neighbor data
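Near-neighbor replication is commonly used for stencil computations, where each process keeps copies of the cells just outside its own block. The Python sketch below illustrates that general idea on a 1-D array; it is an assumption about the intended use, and the names `with_ghosts` and `local_average` are hypothetical rather than part of the toolbox.

```python
def with_ghosts(global_data, lo, hi, width=1):
    """Return the block [lo, hi) padded with up to `width` neighbor cells per side.

    Also returns the offset of the first owned cell inside the padded block.
    """
    g_lo = max(0, lo - width)
    g_hi = min(len(global_data), hi + width)
    return global_data[g_lo:g_hi], lo - g_lo


def local_average(global_data, lo, hi):
    """3-point average over the owned cells, using only the padded local block."""
    padded, off = with_ghosts(global_data, lo, hi)
    out = []
    for i in range(off, off + (hi - lo)):
        # At the ends of the global array, reuse the cell itself as its neighbor.
        left = padded[i - 1] if i - 1 >= 0 else padded[i]
        right = padded[i + 1] if i + 1 < len(padded) else padded[i]
        out.append((left + padded[i] + right) / 3.0)
    return out


data = [0.0, 3.0, 6.0, 9.0, 12.0, 15.0, 18.0, 21.0]
blocks = [(0, 4), (4, 8)]  # two 'processes', each owning four cells
result = []
for lo, hi in blocks:
    result.extend(local_average(data, lo, hi))
print(result)
```

Because each block carries its replicated neighbor cells, the values at the block boundary match what a single process would compute over the whole array.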
Other Features in the Programming Model Enabling Efficiency (Contd.)
- Asynchronous operations
  - Support for library writers
- Interoperable with message passing
  - Message-passing support using ‘mexMPI’
- Interoperable with some other ‘Parallel MATLAB’ projects
  - Interoperable with pMATLAB and Mathworks DCT
Illustration by Example (FFT2): 2D FFT
    [rank, nprocs] = Begin();
    dims = [N N];
    distr = [N N/nprocs];
    A = GA(dims, distr);
    tmp = local(A);          % GET()
    tmp = fft(tmp);          % Compute()
    Put(A, tmp);             % PUT()
    Sync();
    ATmp = GA(A);
    Transpose(A, ATmp);      % Collective Ops
    Tmp = local(ATmp);
    Put(ATmp, fft(Tmp));
    Transpose(ATmp, A);
    GA_End();
[Figure: Transpose]
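The example relies on the fact that a 2-D transform can be built from 1-D column transforms separated by transposes. The Python sketch below checks that identity in a single process. It is not the distributed code: it uses a naive O(n^2) DFT in place of an FFT to stay within the standard library, and all function names are hypothetical.

```python
import cmath


def dft(x):
    """Naive 1-D discrete Fourier transform (stand-in for a real FFT)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def fft_columns(m):
    """Transform every column, as MATLAB's fft does when given a matrix."""
    return transpose([dft(col) for col in transpose(m)])


def fft2_by_transpose(m):
    step1 = fft_columns(m)      # 1-D transforms along the columns
    step2 = transpose(step1)    # former rows become columns
    step3 = fft_columns(step2)  # 1-D transforms along the other dimension
    return transpose(step3)     # restore the original orientation


def dft2_direct(m):
    """2-D DFT straight from the definition, used only as a reference."""
    rows, cols = len(m), len(m[0])
    return [[sum(m[r][c] * cmath.exp(-2j * cmath.pi * (u * r / rows + v * c / cols))
                 for r in range(rows) for c in range(cols))
             for v in range(cols)]
            for u in range(rows)]


matrix = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
spectrum = fft2_by_transpose(matrix)
reference = dft2_direct(matrix)
max_error = max(abs(spectrum[u][v] - reference[u][v])
                for u in range(4) for v in range(4))
print(max_error)
```

In the distributed version each process would apply the column transforms only to the columns it owns, so the transposes are the only steps that require communication.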
Software Architecture
[Figure: layered software architecture. Labels: User, MATLAB Front-End, GAMMA, mexMPI, MATLAB Computation Engine, GA, MPI, SCALAPACK]
Evaluation
- OSC Pentium 4 cluster
  - Two 2.4 GHz Intel P4 processors per node, Linux kernel 2.6.6, 4 GB RAM, MVAPICH 0.9.4, InfiniBand
- MATLAB version 7.01
- Fully distributed environment
- Evaluation using NAS Benchmarks
Programmability
[Figure: source lines of code (SLOC) comparison, annotated "Slight Increase in SLOC" and "Moderate Increase in SLOC"]
Performance Analysis
Speedup on Large Problem Sizes
Related Work
- Early ’90s: MPI and cluster programming
- 1995: ‘Why there isn’t a Parallel MATLAB?’, Cleve Moler
- Embarrassingly parallel: Paralize (’98); Multi (’00); PLab (’00); Parmatlab (’01)
- Message passing: MultiMatlab (’96); PT (’96); DPToolbox (’99); MATmarks (’99); PMI (’99); MPITB/PVMTB (’00); CMTM (’01)
- Compilation based: Conlab (’93); Falcon (’95); ParAL (’95); Otter (’98); Menhir (’98); MaJIC (’98); MATCH (’00); RTExpress (’00)
- Backend support: Matpar (’98); DLab (’99); Netsolve (’01); Paramat (’01)
Related Work (Currently Active)
- Star-P (’97): MIT
- MatlabMPI (’98); pMATLAB (’02): MIT-LL
  - File-based message-passing communication
- MATLAB_D (’00): Rice
  - Telescoping compilation + HPF + JIT compilation
- ParaM (’04): OSU & OSC
- Mathworks (’04): MDCE/MDCT
Conclusions
- Discussed an efficient distributed shared memory toolbox for MATLAB
- Presented the programming model and the efficiency features of the toolbox
- Demonstrated efficiency using NAS Benchmarks
- Download available upon request
Questions?
Contact: panugant@cse.ohio-state.edu
Backup
- NAS FT: Class A
- NAS EP: Class A
- Implementation Issues
Performance Analysis (Contd.)
Implementation Issues
- Different memory managers
- Automated bookkeeping
- Data layout inconsistencies
- In-place operations
- Data movement between different workspaces
- Out-of-order and irregular accesses
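The slide does not say which layout inconsistencies are meant. One plausible instance, assumed here, is that MATLAB stores matrices in column-major order while C code typically expects row-major order, so a buffer handed across that boundary has to be reindexed. The Python sketch below shows the two offset formulas and a conversion; the function names are hypothetical.

```python
def col_major_offset(i, j, nrows, ncols):
    """0-based linear offset of element (i, j) in column-major storage."""
    return i + j * nrows


def row_major_offset(i, j, nrows, ncols):
    """0-based linear offset of element (i, j) in row-major storage."""
    return i * ncols + j


def col_to_row_major(buf, nrows, ncols):
    """Reorder a flat column-major buffer into row-major order."""
    return [buf[col_major_offset(i, j, nrows, ncols)]
            for i in range(nrows) for j in range(ncols)]


# The 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] as a column-major buffer.
column_major = [1, 4, 2, 5, 3, 6]
row_major = col_to_row_major(column_major, 2, 3)
print(row_major)
```

A reordering like this costs a full copy, which is one reason in-place operation and data movement appear alongside layout on the same list of issues.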