
1 Parallel Computing Through MPI Technologies Author: Nyameko Lisa Supervisors: Prof. Elena Zemlyanaya, Prof. Alexandr P. Sapozhnikov and Tatiana F. Sapozhnikov

2 Outline – Parallel Computing through MPI Technologies: Introduction; Overview of MPI; General Implementation; Examples; Application to Physics Problems; Concluding Remarks

3 Introduction – Need for Parallelism There are more stars in the sky than grains of sand on all the beaches of the world.

4 Introduction – Need for Parallelism It requires approximately 204 billion atoms to encode the human genome sequence. A vast number of problems from a wide range of fields have significant computational requirements.

5 Introduction – Aim of Parallelism Attempt to divide a single problem into multiple parts; distribute the segments of the problem amongst various processes or nodes; provide a platform layer to manage data exchange between the multiple processes that solve a common problem simultaneously.

6 Introduction – Serial Computation The problem is divided into a discrete, serial sequence of instructions, each executed individually on a single CPU.

7 Introduction – Parallel Computation Same problem distributed amongst several processes (program and allocated data)

8 Introduction – Implementation The main goal is to save time and hence money. Furthermore, parallelism can solve larger problems that would deplete the resources of a single machine, and overcome intrinsic limitations of serial computation. Distributed systems also provide redundancy, concurrency and access to non-local resources, e.g. SETI, Facebook, etc. There are 3 methodologies for the implementation of parallelism – physical architecture, framework and algorithm – and in practice it will almost always be a combination of the above. The greatest hurdle is managing the distribution of information and data exchange, i.e. overhead.

9 Introduction – Top 500 Japan’s K Computer (Kei = 10 quadrillion) is currently the fastest supercomputer cluster in the world, at 8.162 petaflops (~8 x 10^15 calculations per second).

10 Overview – What is MPI? Message Passing Interface: one of many frameworks and technologies for implementing parallelization. It is a library of subroutines (FORTRAN), classes (C/C++) and bindings for Python packages that mediate communication (via messages) between single-threaded processes executing independently and in parallel.

11 Overview – What is needed? Common user accounts with the same password; administrator/root privileges for all accounts; a common directory structure and paths; MPICH2 installed on all machines. MPICH2 is a combination of the MPI-1 and MPI-2 standards; CH, the Chameleon portability layer, provides backward compatibility to existing MPI frameworks.

12 Overview – What is needed? MPICC & MPIF77 provide the options and special libraries needed to compile and link MPI programs. MPIEXEC initializes parallel jobs and spawns copies of the executable to all of the processes. Each process executes its own copy of the code; by convention the root process (rank 0) is chosen to serve as the master process.

13 General Implementation Hello World - C++
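The original slide shows the program only as a screenshot. Below is a minimal sketch of what an MPI "Hello World" in C++ typically looks like (written against the standard C API of MPI); the compile and run commands in the comments assume the MPICH2 wrappers mentioned on the previous slide, and the process count of 4 is an arbitrary choice.

// hello.cpp -- minimal MPI "Hello World" sketch (not the original slide's code).
// Compile:  mpicxx hello.cpp -o hello
// Run:      mpiexec -n 4 ./hello
#include <mpi.h>
#include <iostream>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);               // start the MPI environment

    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); // this process's rank (0 .. size-1)
    MPI_Comm_size(MPI_COMM_WORLD, &size); // total number of processes

    std::cout << "Hello World from process " << rank
              << " of " << size << std::endl;

    MPI_Finalize();                       // shut down MPI
    return 0;
}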

14 General Implementation Hello World - FORTRAN

15 General Implementation Hello World - Output

16 Example - Broadcast Routine Point-to-point (send & recv) and collective (bcast) routines are provided by the MPI library. The source (root) node mediates the distribution of data to/from all other nodes.
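As an illustration of the collective routine named above, here is a hedged C++ sketch of calling MPI_Bcast; the root rank 0, the buffer size and its contents are illustrative assumptions, not taken from the slides.

// bcast.cpp -- sketch of the built-in collective broadcast (MPI_Bcast).
#include <mpi.h>
#include <iostream>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double data[3] = {0.0, 0.0, 0.0};
    if (rank == 0) {                 // root (source) node fills the buffer
        data[0] = 1.0; data[1] = 2.0; data[2] = 3.0;
    }

    // Every rank calls MPI_Bcast; afterwards all ranks hold the root's data.
    MPI_Bcast(data, 3, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    std::cout << "rank " << rank << " received " << data[0] << ", "
              << data[1] << ", " << data[2] << std::endl;

    MPI_Finalize();
    return 0;
}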

17 Example - Broadcast Routine Linear Case Apart from the root and last nodes, each node receives from the previous node and sends to the next node, using the point-to-point routines to build a custom collective routine: MPI_RECV(myProc - 1); MPI_SEND(myProc + 1).
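A hedged C++ sketch of the linear-chain broadcast described on this slide, built from the point-to-point routines; the message value and tag are illustrative assumptions.

// linear_bcast.cpp -- linear-chain broadcast: ranks 0..numProcs-1 form a chain,
// each rank receives from the previous one and forwards to the next.
#include <mpi.h>
#include <iostream>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int myProc = 0, numProcs = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &myProc);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);

    int value = 0;
    const int tag = 0;

    if (myProc == 0) {
        value = 42;                                   // root owns the data
    } else {                                          // receive from the previous node
        MPI_Recv(&value, 1, MPI_INT, myProc - 1, tag,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    if (myProc < numProcs - 1) {                      // all but the last node pass it on
        MPI_Send(&value, 1, MPI_INT, myProc + 1, tag, MPI_COMM_WORLD);
    }

    std::cout << "rank " << myProc << " has value " << value << std::endl;

    MPI_Finalize();
    return 0;
}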

18 Example - Broadcast Routine Binary Tree Each parent node sends the message to its two child nodes: MPI_SEND(2 * myProc); MPI_SEND(2 * myProc + 1). Each non-root node receives from its parent: IF( MOD(myProc,2) == 0 ) MPI_RECV( myProc/2 ) ELSE MPI_RECV( (myProc-1)/2 ).
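A hedged C++ sketch of the binary-tree broadcast. The slide's pseudocode numbers the nodes from 1 (root = node 1, children 2p and 2p+1, parent p/2 for even p and (p-1)/2 for odd p); the sketch keeps those formulas by setting node = rank + 1, which is an assumption about the slide's indexing rather than something stated explicitly.

// tree_bcast.cpp -- binary-tree broadcast built from point-to-point routines.
#include <mpi.h>
#include <iostream>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int myProc = 0, numProcs = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &myProc);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);

    const int node = myProc + 1;   // 1-based node number as on the slide
    const int tag  = 0;
    int value = 0;

    if (node == 1) {
        value = 42;                // root owns the data
    } else {                       // receive from the parent node
        int parent = (node % 2 == 0) ? node / 2 : (node - 1) / 2;
        MPI_Recv(&value, 1, MPI_INT, parent - 1, tag,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    // Forward to the two children (2*node and 2*node + 1), if they exist.
    int left  = 2 * node;
    int right = 2 * node + 1;
    if (left  <= numProcs) MPI_Send(&value, 1, MPI_INT, left  - 1, tag, MPI_COMM_WORLD);
    if (right <= numProcs) MPI_Send(&value, 1, MPI_INT, right - 1, tag, MPI_COMM_WORLD);

    std::cout << "rank " << myProc << " has value " << value << std::endl;

    MPI_Finalize();
    return 0;
}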

19 Example – Broadcast Routine Output

20 Applications to Physics Problems Quadrature – discretize the interval [a,b] into N steps and divide them amongst the processes with a loop running from 1+myProc to N in increments of numProcs. E.g. with N = 10 and numProcs = 3: process 0 handles iterations 1, 4, 7, 10; process 1 handles 2, 5, 8; process 2 handles 3, 6, 9. Finite-difference problems – similarly divide the mesh/grid amongst the processes. There are many applications, limited only by our ingenuity.
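A hedged C++ sketch of the cyclic loop distribution described on this slide, applied to a simple midpoint-rule quadrature and combined with MPI_Reduce to collect the partial sums; the integrand f(x) = x*x, the interval [0,1] and N = 1,000,000 are illustrative assumptions.

// quadrature.cpp -- round-robin (cyclic) work distribution from the slide:
// process myProc handles iterations 1+myProc, 1+myProc+numProcs, ...
#include <mpi.h>
#include <iostream>

static double f(double x) { return x * x; }   // illustrative integrand

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int myProc = 0, numProcs = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &myProc);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);

    const double a = 0.0, b = 1.0;
    const int    N = 1000000;
    const double h = (b - a) / N;

    // Cyclic distribution: with N = 10 and numProcs = 3, rank 0 takes
    // i = 1,4,7,10; rank 1 takes 2,5,8; rank 2 takes 3,6,9 (as on the slide).
    double localSum = 0.0;
    for (int i = 1 + myProc; i <= N; i += numProcs) {
        double x = a + (i - 0.5) * h;         // midpoint of the i-th step
        localSum += f(x) * h;
    }

    // Combine the partial sums on the master process (rank 0).
    double total = 0.0;
    MPI_Reduce(&localSum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (myProc == 0)
        std::cout << "integral ~= " << total << std::endl;

    MPI_Finalize();
    return 0;
}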

21 Closing Remarks In the 1970s, Intel co-founder Gordon Moore correctly predicted that the "number of transistors that can be inexpensively placed on an integrated circuit doubles approximately every 2 years". 10-core Xeon E7 processor family chips are currently commercially available. MPI is easy to implement and well suited to the many independent operations that can be executed simultaneously. The only limitations are the overhead incurred by inter-process communications, our ingenuity and the strictly sequential segments of a program.

22 Acknowledgements and Thanks NRF and the South African Department of Science and Technology; JINR, University Center; Dr. Jacobs and Prof. Lekala; Prof. Elena Zemlyanaya, Prof. Alexandr P. Sapozhnikov and Tatiana F. Sapozhnikov; and, last but not least, my fellow colleagues.

