
Slide 1: MPI Virtual Topologies
Kadin Tseng, Scientific Computing and Visualization Group, Boston University
Linux Clusters and Tiled Display Walls, July 30 - August 1, 2002

Slide 2: Virtual Topologies
Realistic science and engineering problems are frequently two- or higher-dimensional. For parallel computing, these higher physical dimensions often lead to domain decompositions with a comparable number of dimensions in the computational domain. Ordinarily, the process ranks of the decomposition are expressed in linear order (i.e., 0, 1, 2, ...). To let a code refer to these ranks in multi-dimensional coordinates matching the domain decomposition, the MPI library provides a set of virtual topology routines (a minimal sketch follows). There are two kinds of virtual topologies: Cartesian and general (graph). Only the Cartesian topology routines are discussed in the following.
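As a preview, here is a minimal C sketch of creating a Cartesian virtual topology. It is not taken from the slides; the 2-D shape and the use of MPI_Dims_create to factor the process count are illustrative choices.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int p, dims[2] = {0, 0}, periods[2] = {0, 0};
    MPI_Comm cart2d;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    MPI_Dims_create(p, 2, dims);   /* factor p into a balanced 2-D grid */
    /* re-address the linear ranks of MPI_COMM_WORLD as (row, col) coords */
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &cart2d);
    printf("process grid: %d x %d\n", dims[0], dims[1]);
    MPI_Finalize();
    return 0;
}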

Slide 3: Example: A 9 x 4 Array
Consider calculations associated with a 9x4 matrix. The parenthesized number pairs (i, j) denote array row and column indices, respectively. Assume six processes are available for parallel computation. One domain decomposition leads to six 3x2 submatrices, as shown. The number in each cell denotes the process rank to which array element (i, j) belongs:

        j=0  j=1  j=2  j=3
  i=0    0    0    1    1
  i=1    0    0    1    1
  i=2    0    0    1    1
  i=3    2    2    3    3
  i=4    2    2    3    3
  i=5    2    2    3    3
  i=6    4    4    5    5
  i=7    4    4    5    5
  i=8    4    4    5    5

Slide 4: Domain Decomposition
Domain decomposition yields a linear-order representation of the process topology. The number in each cell denotes the process (or thread) number. Each cell represents a 3x2 block of the 9x4 array; the work within each cell is performed by a single process.

  0  1
  2  3
  4  5

Slide 5: 2D Cartesian Topology
The linear rank order can be mapped into a 2D Cartesian topology (i, j) via an MPI function call (see the sketch below). For example, linear rank 3 can be addressed by the 2D Cartesian coordinates (1, 1). Each cell represents a 3x2 matrix block whose work is performed by the indicated process rank. MPI indices start from 0, and MPI ranks follow the row-major convention.

  (0,0) 0   (0,1) 1
  (1,0) 2   (1,1) 3
  (2,0) 4   (2,1) 5
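A minimal sketch of the rank-to-coordinates round trip, assuming the 3x2 grid of this example; run it with exactly six processes, since MPI_Cart_create hands excess ranks MPI_COMM_NULL.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, rank_back, coords[2];
    int dims[2] = {3, 2}, periods[2] = {0, 0};
    MPI_Comm cart2d;

    MPI_Init(&argc, &argv);
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart2d);
    MPI_Comm_rank(cart2d, &rank);
    MPI_Cart_coords(cart2d, rank, 2, coords);   /* rank 3 -> (1,1) */
    MPI_Cart_rank(cart2d, coords, &rank_back);  /* (1,1) -> rank 3 */
    printf("rank %d <-> (%d,%d)\n", rank_back, coords[0], coords[1]);
    MPI_Finalize();
    return 0;
}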

Slide 6: Cartesian Topology Solver Example
Laplace Equation
Laplace Equation Discretized
Computational Domain
  – Five-point finite-difference stencil
Jacobi Scheme
Parallel Jacobi Scheme
  – Unknowns at Border Cells
  – Message Passing for Boundary Cells
  – For Individual Threads ...
  – MPI Functions Needed for Job
Successive Over-Relaxation (SOR) Scheme
Parallel SOR Scheme
Scalability

Slide 7: Laplace Equation
Laplace equation:

  \nabla^2 u = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0, \qquad 0 \le x \le 1, \; 0 \le y \le 1    (1)

Boundary conditions:

  u(x,0) = \sin(\pi x), \quad u(x,1) = \sin(\pi x)\,e^{-\pi}, \quad u(0,y) = u(1,y) = 0    (2)

Analytical solution:

  u(x,y) = \sin(\pi x)\,e^{-\pi y}    (3)

Slide 8: Laplace Equation Discretized
Discretizing Equation (1) by centered differences yields:

  u_{i,j}^{n+1} = \frac{1}{4}\left( u_{i+1,j}^{n} + u_{i-1,j}^{n} + u_{i,j+1}^{n} + u_{i,j-1}^{n} \right)    (4)

where n and n+1 denote the current and next time step, respectively, while

  u_{i,j}^{n} = u^{n}(x_i, y_j), \qquad x_i = i\,\Delta x, \; y_j = j\,\Delta y    (5)

For simplicity, we take \Delta x = \Delta y = \frac{1}{m+1}, where m is the number of interior points in each direction.

Slide 9: Computational Domain
[Figure: the unit-square computational domain; x (index i) runs horizontally, y (index j) runs vertically.]

Slide 10: Five-point Finite-Difference Stencil
Interior (or solution) cells: cells (i, j) where the solution of the Laplace equation is sought.
Exterior (or boundary) cells: blue cells denote cells where non-homogeneous boundary conditions are imposed, while homogeneous boundary conditions are shown as green cells.
[Figure: five-point stencils, each a center cell 'o' surrounded by four neighbor cells 'x'.]

Slide 11: Jacobi Scheme
1. Make an initial guess for u at all interior points (i, j) at time n.
2. Use the 5-pt stencil to compute u_{i,j}^{n+1} at all interior points (i, j).
3. Stop if the prescribed convergence threshold is reached; otherwise continue to the next step.
4. Update: u_{i,j}^{n} = u_{i,j}^{n+1} for all interior i and j.
5. Go to step 2.
This simple iterative scheme lends itself to intuitive instruction, but its slow convergence renders it impractical for real applications.

Slide 12: Solution Contour Plot
[Figure: contour plot of the converged solution u(x,y) = sin(pi x) e^{-pi y}.]

Slide 13: Jacobi Iterative Solver – F77

      PROGRAM Jacobi
! Solve Laplace equation using Jacobi iteration method
      IMPLICIT NONE
      INTEGER m, mp, mp2, iter, i, j, K, P
      REAL*8 TOL, gdel, PI
      REAL start_time, end_time
      CHARACTER*40 header, outfile
      INTEGER MAX_M, MAXSTEPS, INCREMENT
      PARAMETER (P=1, K=0, TOL=1.d-6, PI=3.14159265)
      PARAMETER (MAX_M=512, MAXSTEPS=50000, INCREMENT=100)
      REAL*8 u((MAX_M+2)*(MAX_M+2))
      REAL*8 unew((MAX_M+2)*(MAX_M+2))

      m = MAX_M
      mp = m/P                  ! interior columns per process (serial: P=1)
      gdel = 1.0d0              ! initialize global error
      iter = 0                  ! initialize iteration counter

Slide 14: Jacobi ... F77 (continued)

      call CPUtime(start_time)          ! start timer, measured in seconds
      CALL bc(m, mp, u, K, P)           ! initialize and define B.C. for u
      DO WHILE (gdel .gt. TOL)          ! iterate until error below threshold
        iter = iter + 1                 ! increment iteration counter
        IF (iter .gt. MAXSTEPS) THEN
          header = 'Number of iterations exceeds limit'
          CALL error(header, -1)
        ENDIF
        CALL update_jacobi(m, mp, u, unew, gdel)
        IF (MOD(iter,INCREMENT) .eq. 0) THEN
          WRITE(*,"('iter,gdel:',i6,e12.4)") iter, gdel
        ENDIF
      ENDDO
      call CPUtime(end_time)            ! stop timer

Slide 15: Jacobi ... F77 (continued)

      WRITE(*,"(40('#'))")
      WRITE(*,"(' Wall clock time =',f10.2,' x ',i3)")
     &  end_time - start_time, P
      WRITE(*,"(' Stopped at iteration =',i5)") iter
      WRITE(*,"(' The maximum error =',f12.6)") gdel
      WRITE(*,"(40('#'))")
      outfile = 'plots'
      OPEN(unit=40, file=outfile, form='formatted', status='unknown')
      WRITE(40,"(3i5)") m+2, mp+2, P
      CLOSE(40)
      WRITE(outfile, "('plots.',i1)") K
      OPEN(unit=41, file=outfile, form='formatted', status='unknown')
      WRITE(41,"(6e13.4)") (u(i), i=1, (m+2)*(mp+2))
      CLOSE(41)
      END

Slide 16: bc.f

      SUBROUTINE bc(m, n, v, k, p)
      IMPLICIT NONE
      INTEGER p, m, n, k, i
      CHARACTER*40 header
      REAL*8 v(0:m+1, 0:n+1), PI

      PI = ACOS(-1.0d0)                    ! define pi
      CALL init_array(m, n, v)             ! initialize v
      IF (p .eq. 1) THEN                   ! serial
        DO i = 0, m+1
          v(i,0)   = sin(PI*i/(m+1))       ! at y = 0
          v(i,n+1) = v(i,0)*exp(-PI)       ! at y = 1
        ENDDO
      ELSE IF (p .gt. 1) THEN              ! multiple threads
        IF (k .eq. 0) THEN                 ! first thread
          DO i = 0, m+1
            v(i,0) = sin(PI*i/(m+1))       ! at y = 0
          ENDDO
        ENDIF
        IF (k .eq. p-1) THEN               ! last thread
          DO i = 0, m+1                    ! at y = 1
            v(i,n+1) = sin(PI*i/(m+1))*exp(-PI)
          ENDDO
        ENDIF
      ELSE                                 ! invalid p
        header = 'p invalid in subroutine BC'
        CALL error(header, -1)
      ENDIF
      END

Slide 17: update_jacobi.f

      SUBROUTINE update_jacobi(m, n, v, vnew, del)
      IMPLICIT NONE
      INTEGER i, j, m, n
      REAL*8 v(0:m+1,0:n+1), vnew(0:m+1,0:n+1), del

      del = 0.0d0
      DO j = 1, n
        DO i = 1, m
          vnew(i,j) = ( v(i,j+1) + v(i+1,j)
     &                + v(i-1,j) + v(i,j-1) )*0.25d0
          del = del + DABS(vnew(i,j) - v(i,j))   ! accumulate local error
        ENDDO
      ENDDO
      DO j = 1, n
        DO i = 1, m
          v(i,j) = vnew(i,j)                     ! update v
        ENDDO
      ENDDO
      END

Slide 18: Jacobi Iterative Solver – C

#include "solvers.h"

INT main()
{
/********** MAIN PROGRAM ***********************************
 * Solve Laplace equation using Jacobi iterative method    *
 * Kadin Tseng, Boston University, August, 2000            *
 ***********************************************************/
    INT iter, m, mp;
    REAL gdel, **u, **un;
    CHAR line[10];

    fprintf(OUTPUT, "Enter size of interior points, m :");
    (void) fgets(line, sizeof(line), stdin);
    (void) sscanf(line, "%d", &m);
    fprintf(OUTPUT, "m = %d\n", m);
    mp = m/P;
    u  = allocate_2D(m, mp);   /* allocate mem for 2D array */
    un = allocate_2D(m, mp);

Slide 19: Jacobi ... Solver – C (Cont'd)

    gdel = 1.0;                /* initialize global error to 1 */
    iter = 0;                  /* initialize iteration counter */
    bc(m, mp, u, K, P);        /* initialize and define B.C. for u */
    replicate(m, mp, u, un);   /* copy u to un */

    while (gdel > TOL) {       /* iterate until error below threshold */
        iter++;                /* increment iteration counter */
        if (iter > MAXSTEPS) {
            fprintf(OUTPUT, "Iteration terminated (exceeds %6d", MAXSTEPS);
            fprintf(OUTPUT, " )\n");
            return (1);        /* nonconvergent solution */
        }
        update_jacobi(m, mp, u, un, &gdel);   /* update u by Jacobi scheme */
        if (iter%INCREMENT == 0) {
            fprintf(OUTPUT, "iter,gdel: %6d, %lf\n", iter, gdel);
        }
    }

Slide 20: Jacobi ... Solver – C (Cont'd)

    fprintf(OUTPUT, "Stopped at iteration %d\n", iter);
    fprintf(OUTPUT, "The maximum error = %f\n", gdel);

    /* write u to file for use in MATLAB plots */
    write_file(m, mp, u, K, P);
    return (0);
}

Slide 21: bc.c

void bc(INT m, INT n, REAL **u, INT k, INT p)
{
/*********** Boundary Conditions *************************************
 * PDE: Laplacian u = 0;  0<=x<=1;  0<=y<=1                          *
 * B.C.: u(x,0)=sin(pi*x); u(x,1)=sin(pi*x)*exp(-pi); u(0,y)=u(1,y)=0*
 * SOLUTION: u(x,y)=sin(pi*x)*exp(-pi*y)                             *
 *********************************************************************/
    INT i;

    init_array(m, n, u);                     /* initialize u to 0 */

    if (p == 1) {                            /* serial */
        for (i = 0; i <= m+1; i++) {
            u[i][0]   = sin(PI*i/(m+1));     /* at y = 0; all x */
            u[i][n+1] = u[i][0]*exp(-PI);    /* at y = 1; all x */
        }
    } else if (p < 1) {
        printf("p is invalid\n");
    }

Slide 22: bc.c (continued)

    if (p > 1) {                             /* parallel processing */
        if (k == 0) {                        /* the 1st process */
            for (i = 0; i <= m+1; i++) {
                u[i][0] = sin(PI*i/(m+1));              /* at y = 0; all x */
            }
        }
        if (k == p-1) {                      /* the last process */
            for (i = 0; i <= m+1; i++) {
                u[i][n+1] = sin(PI*i/(m+1))*exp(-PI);   /* at y = 1; all x */
            }
        }
    }
}

Slide 23: update_jacobi.c

INT update_jacobi(INT m, INT n, REAL **u, REAL **unew, REAL *del)
{
    INT i, j;

    *del = 0.0;
    for (i = 1; i <= m; i++) {
        for (j = 1; j <= n; j++) {
            unew[i][j] = ( u[i  ][j+1] + u[i+1][j  ]
                         + u[i-1][j  ] + u[i  ][j-1] )*0.25;
            *del += fabs(unew[i][j] - u[i][j]);   /* accumulate local error */
        }
    }
    for (i = 1; i <= m; i++) {
        for (j = 1; j <= n; j++) {
            u[i][j] = unew[i][j];                 /* update u */
        }
    }
    return (0);
}

Slide 24: Domain Decompositions
[Figure, left: 1D domain decomposition; threads 0 through 3 as stacked slabs.
Figure, right: 2D domain decomposition; threads 0 through 3 as 2x2 blocks.]
A brief communication-cost comparison follows below.
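An aside on why the choice matters, using a standard estimate rather than anything stated on the slides: for p processes on an m x m grid, a 1D slab decomposition exchanges on the order of 2m boundary points per process per iteration regardless of p, whereas a 2D decomposition into \sqrt{p} \times \sqrt{p} blocks exchanges about 4m/\sqrt{p}. The 2D layout therefore communicates less per process as p grows, at the cost of more neighbors per process.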

Slide 25: Unknowns at Border Cells – 1D
The five-point finite-difference stencil, applied at a thread's domain border cells, requires cells from neighboring threads and/or boundary cells. Message passing is required.
[Figure: the domains of threads 0, 1, and 2 stacked vertically, with stencils straddling the borders between them.]

Slide 26: Message Passing to Fill Boundary Cells
[Figure: the current thread k receives a row of cells from thread k+1 above and from thread k-1 below to fill its local boundary (ghost) cells.]

Slide 27: For Individual Threads ...
For thread k of p threads:
  – Recast the 5-pt finite-difference stencil in terms of each thread's local array.
  – Boundary conditions: for simplicity, assume m is divisible by p.
  – B.C. at internal thread borders change from iteration to iteration and are obtained by message passing.
  – Additional boundary conditions are given on the next page.

Slide 28: Relationship Between u and v
Physical boundary conditions, and the relationship between the global solution u and the thread-local solution v (sketched below).
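A sketch of that relationship, inferred from the column-block decomposition used in the F77 code (v is dimensioned v(0:m+1, 0:m'+1) on each thread, with m' = m/p); the exact index mapping is an assumption, not copied from the slide:

  v_k(i, j) = u(i,\; k\,m' + j), \qquad 0 \le i \le m+1, \; 0 \le j \le m'+1, \; m' = m/p

so on thread k, column j = 0 holds either the physical boundary (k = 0) or a copy of thread k-1's last interior column, and column j = m'+1 holds either the physical boundary (k = p-1) or a copy of thread k+1's first interior column, both filled by message passing.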

Slide 29: MPI Functions Used for Job
MPI_Sendrecv (= MPI_Send + MPI_Recv) – to set boundary conditions for individual threads
MPI_Allreduce – to find the global error and determine whether convergence has been reached
MPI_Cart_create – to create a Cartesian topology
MPI_Cart_coords – to find the Cartesian coordinates equivalent to a given rank
MPI_Cart_rank – to find the rank equivalent to given Cartesian coordinates
MPI_Cart_shift – to find the current thread's adjoining neighbor threads (see the sketch below)
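A minimal sketch of how MPI_Cart_shift and MPI_Sendrecv combine for the ghost-cell exchange, modeled on the update_bc_2 routines later in these slides; the function name and array layout are assumptions for illustration:

#include <mpi.h>

/* one ghost-column exchange on a 1-D Cartesian communicator grid_comm;
   v is the local (mp+2) x (m+2) array, as in the C SOR code below */
void exchange_ghosts(MPI_Comm grid_comm, int m, int mp, double **v)
{
    int below, above;
    MPI_Status status;

    /* neighbors along dimension 0, displacement 1; at the ends of a
       non-periodic grid, below/above come back as MPI_PROC_NULL, which
       makes the corresponding half of each Sendrecv a no-op */
    MPI_Cart_shift(grid_comm, 0, 1, &below, &above);
    MPI_Sendrecv(v[mp]+1,   m, MPI_DOUBLE, above, 0,  /* send top interior  */
                 v[0]+1,    m, MPI_DOUBLE, below, 0,  /* recv bottom ghost  */
                 grid_comm, &status);
    MPI_Sendrecv(v[1]+1,    m, MPI_DOUBLE, below, 1,  /* send bottom interior */
                 v[mp+1]+1, m, MPI_DOUBLE, above, 1,  /* recv top ghost       */
                 grid_comm, &status);
}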

Slide 30: Successive Over Relaxation
1. Make an initial guess for u at all interior points (i, j).
2. Define a scalar \omega, 0 < \omega < 2.
3. Use the 5-pt stencil to compute u'_{i,j} at all interior points (i, j).
4. Compute u_{i,j}^{n+1} = (1 - \omega)\,u_{i,j}^{n} + \omega\,u'_{i,j}.
5. Stop if the prescribed convergence threshold is reached.
6. Update: u_{i,j}^{n} = u_{i,j}^{n+1}.
7. Go to step 2.
In step 3, computing u' with u at time n+1 wherever possible accelerates convergence, but this inhibits parallelism. The \omega schedule used by the parallel code is written out below.
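The parallel SOR code below drives \omega with Chebyshev acceleration; written out from the rhoj, rhojsq, and omega statements in that source:

  \rho_J = 1 - \frac{\pi^2}{2\,(m+2)^2}, \qquad
  \omega^{(0)} = 1, \qquad
  \omega^{(1/2)} = \frac{1}{1 - \rho_J^2/2}, \qquad
  \omega^{(t+1/2)} = \frac{1}{1 - \rho_J^2\,\omega^{(t)}/4}

with one \omega update per red or black half-sweep.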

Slide 31: Red-Black SOR Scheme
By virtue of the 5-pt stencil, the solution at a black cell depends only on its 4 adjacent red cells; conversely, a red cell depends only on its 4 adjoining black cells. This suggests the following parallel algorithm:
1. Compute v at black cells at time n+1 in parallel, using v at red cells at time n.
2. Compute v at red cells at time n+1 in parallel, using v at black cells at time n+1.
3. Repeat steps 1 and 2 until v is fully converged.
The order of steps 1 and 2 can be alternated. The coloring rule used by update_sor is written out below.
[Figure: checkerboard coloring of the grid, with 5-pt stencils centered on red and black cells.]
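In the update_sor routines below, the coloring is realized by loop strides of 2. Matching those loop bounds, a cell (i, j) is red when i + j is even (i and j both odd, or both even) and black when i + j is odd:

  \text{red} = \{(i,j) : i+j \text{ even}\}, \qquad \text{black} = \{(i,j) : i+j \text{ odd}\}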

Slide 32: Parallel SOR F77 Program

      PROGRAM sor_parallel_f77
      IMPLICIT NONE
      INTEGER m, mp, iter, p, i, j, k
      REAL start_time, end_time
      CHARACTER*40 header, outfile
      REAL*8 TOL, omega, rhoj, rhojsq, PI, del, delr, delb, gdel
      INTEGER grid_comm, me, iv, coord, dims, ndim, ierr
      INTEGER below, above
      LOGICAL periods, reorder
      INTEGER MAX_M, MAXSTEP, INCREMENT
      PARAMETER (MAX_M = 512, MAXSTEP = 50000, INCREMENT = 100)
      PARAMETER (TOL = 1.d-6, PI = 3.14159265)
      PARAMETER (periods = .false., ndim = 1, reorder = .false.)
      REAL*8 v((MAX_M+1+2)*(MAX_M+2))
      INCLUDE 'mpif.h'
      outfile = 'plots'

Slide 33: Parallel SOR F77 Program (cont'd)

      CALL MPI_Init(ierr)                          ! start MPI
      CALL MPI_Comm_rank(MPI_COMM_WORLD, k, ierr)
      CALL MPI_Comm_size(MPI_COMM_WORLD, p, ierr)
      start_time = MPI_Wtime()                     ! start timer, in seconds
      m = MAX_M
      mp = m/p                                     ! local size m' = m/p
      gdel = 1.0d0                                 ! initialize global error to 1
      iter = 0                                     ! initialize iteration counter
      rhoj = 1.0d0 - PI*PI*0.5/(m+2)/(m+2)         ! Chebyshev coefficient
      rhojsq = rhoj*rhoj

Slide 34: Parallel SOR F77 Program (cont'd)

! create Cartesian topology for the matrix
      dims = p
      CALL MPI_Cart_create(MPI_COMM_WORLD, ndim, dims,
     &     periods, reorder, grid_comm, ierr)
      CALL MPI_Comm_rank(grid_comm, me, ierr)
      CALL MPI_Cart_coords(grid_comm, me, ndim, coord, ierr)
      iv = coord
      CALL bc(m, mp, v, iv, p)                     ! set up boundary conditions
      CALL MPI_Cart_shift(grid_comm, 0, 1, below, above, ierr)

      omega = 1.0d0
      CALL update_sor(m, mp, v, omega, delr, 'r')
      CALL update_bc_2(m, mp, v, iv, below, above)
      omega = 1.0d0/(1.0d0 - 0.50d0*rhojsq)
      CALL update_sor(m, mp, v, omega, delb, 'b')
      CALL update_bc_2(m, mp, v, iv, below, above)

Slide 35: Parallel SOR F77 Program (cont'd)

      DO WHILE (gdel .gt. TOL)                     ! loop until gdel <= TOL
        iter = iter + 1                            ! increment iteration counter
        omega = 1.0d0/(1.0d0 - 0.25d0*rhojsq*omega)
        CALL update_sor(m, mp, v, omega, delr, 'r')
        CALL update_bc_2(m, mp, v, iv, below, above)
        omega = 1.0d0/(1.0d0 - 0.25d0*rhojsq*omega)
        CALL update_sor(m, mp, v, omega, delb, 'b')
        CALL update_bc_2(m, mp, v, iv, below, above)
        IF (MOD(iter,INCREMENT) .eq. 0) THEN       ! check error once in a while
          del = (delr + delb)*4.d0
          CALL MPI_Allreduce(del, gdel, 1, MPI_DOUBLE_PRECISION,
     &         MPI_MAX, MPI_COMM_WORLD, ierr)      ! find global max error
          IF (iter .ge. MAXSTEP) THEN
            header = 'Number of iterations exceeded limit'
            CALL error(header, -1)
          ENDIF
          IF (k .eq. 0) WRITE(*,'(i5,3d13.5)') iter, del, gdel, omega
        ENDIF
      ENDDO

Slide 36: Parallel SOR F77 Program (cont'd)

      end_time = MPI_Wtime()                       ! stop timer
      IF (k .eq. 0) THEN
        WRITE(*,"(40('#'))")
        WRITE(*,"(' Wall clock time =',f10.2,' x ',i3)")
     &    end_time - start_time, p
        WRITE(*,"(' Stopped at iteration ',i5)") iter
        WRITE(*,"(' The maximum error =',f12.6)") gdel
        WRITE(*,"(40('#'))")
        OPEN(unit=40, file=outfile, form='formatted', status='unknown')
        WRITE(40,"(3i5)") m+2, mp+2, p
        CLOSE(40)
      ENDIF
      WRITE(outfile, "('plots.',i1)") k
      OPEN(unit=41, file=outfile, form='formatted', status='unknown')
      WRITE(41,"(6e13.4)") (v(i), i=1, (m+2)*(mp+2))
      CLOSE(41)
      CALL MPI_Finalize(ierr)
      END                                          ! PROGRAM sor_parallel_f77

Slide 37: update_sor.f

      SUBROUTINE update_sor(m, n, u, omega, del, redblack)
!***************************************************************
!* Updates Laplace solution u based on red-black iterative scheme
!* m        - (INPUT) 1st dimension size
!* n        - (INPUT) 2nd dimension size
!* u        - (INPUT) solution array; u = u(0:m+1,0:n+1)
!* omega    - (INPUT) iterative parameter
!* del      - (OUTPUT) error between two iterates of u
!* redblack - (INPUT) which color to update; 'r' for red
!***************************************************************
      IMPLICIT NONE
      CHARACTER redblack
      INTEGER m, n, i, ib, ie, j, jb, je
      REAL*8 omega, del, up
      REAL*8 u(0:m+1,0:n+1)

      del = 0.0d0

Slide 38: update_sor.f (cont'd)

      IF (redblack .eq. 'r') THEN
! process RED odd points ...
        jb = 1
        je = n
        ib = 1
        ie = m
        DO j = jb, je, 2
          DO i = ib, ie, 2
            up = ( u(i,j+1) + u(i+1,j)
     &           + u(i-1,j) + u(i,j-1) )*0.25d0
            u(i,j) = (1.0d0 - omega)*u(i,j) + omega*up
            del = del + dabs(up - u(i,j))
          ENDDO
        ENDDO
! process RED even points ...
        jb = 2
        je = n
        ib = 2
        ie = m
        DO j = jb, je, 2
          DO i = ib, ie, 2
            up = ( u(i,j+1) + u(i+1,j)
     &           + u(i-1,j) + u(i,j-1) )*0.25d0
            u(i,j) = (1.0d0 - omega)*u(i,j) + omega*up
            del = del + dabs(up - u(i,j))
          ENDDO
        ENDDO

Slide 39: update_sor.f (cont'd)

      ELSE IF (redblack .eq. 'b') THEN
! process BLACK odd points ...
        jb = 2
        je = n
        ib = 1
        ie = m
        DO j = jb, je, 2
          DO i = ib, ie, 2
            up = ( u(i,j+1) + u(i+1,j)
     &           + u(i-1,j) + u(i,j-1) )*0.25d0
            u(i,j) = (1.0d0 - omega)*u(i,j) + omega*up
            del = del + dabs(up - u(i,j))
          ENDDO
        ENDDO
! process BLACK even points ...
        jb = 1
        je = n
        ib = 2
        ie = m
        DO j = jb, je, 2
          DO i = ib, ie, 2
            up = ( u(i,j+1) + u(i+1,j)
     &           + u(i-1,j) + u(i,j-1) )*0.25d0
            u(i,j) = (1.0d0 - omega)*u(i,j) + omega*up
            del = del + dabs(up - u(i,j))
          ENDDO
        ENDDO
      ELSE
        STOP
      ENDIF
      END

Slide 40: Subroutine update_bc_2

      SUBROUTINE update_bc_2(m, mp, v, k, below, above)
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INTEGER m, mp, k, below, above, ierr
      REAL*8 v(0:m+1,0:mp+1)
      INTEGER status(MPI_STATUS_SIZE)

! send last interior column to "above", receive ghost column from "below"
      CALL MPI_Sendrecv(
     &     v(1,mp), m, MPI_DOUBLE_PRECISION, above, 0,
     &     v(1,0),  m, MPI_DOUBLE_PRECISION, below, 0,
     &     MPI_COMM_WORLD, status, ierr)
! send first interior column to "below", receive ghost column from "above"
      CALL MPI_Sendrecv(
     &     v(1,1),    m, MPI_DOUBLE_PRECISION, below, 1,
     &     v(1,mp+1), m, MPI_DOUBLE_PRECISION, above, 1,
     &     MPI_COMM_WORLD, status, ierr)
      RETURN
      END                                ! SUBROUTINE update_bc_2

Slide 41: Parallel SOR C Program

#include "solvers.h"
#include "mpi.h"

INT main(INT argc, CHAR *argv[])
{
/*************** MAIN PROGRAM ********************************
 * Solve Laplace equation using Successive Over Relaxation   *
 * and Chebyshev Acceleration (see Numerical Recipes)        *
 * Kadin Tseng, Boston University, August, 2000              *
 *************************************************************/
    INT iter, m, mp, p, k, below, above;
    REAL omega, rhoj, rhojsq, del, delr, delb, gdel;
    CHAR line[80], red, black;
    MPI_Comm grid_comm;
    INT me, iv, coord[1], dims, periods, ndim, reorder;
    REAL **v, **vt;

    MPI_Init(&argc, &argv);              /* start MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &k);   /* get current process id */
    MPI_Comm_size(MPI_COMM_WORLD, &p);   /* get number of processes */

Slide 42: Parallel SOR C Program (cont'd)

    periods = 0;
    ndim = 1;
    reorder = 0;
    red = 'r';
    black = 'b';

    if (k == 0) {
        fprintf(OUTPUT, "Enter size of interior points, m :\n");
        (void) fgets(line, sizeof(line), stdin);
        (void) sscanf(line, "%d", &m);
        fprintf(OUTPUT, "m = %d\n", m);
    }
    MPI_Bcast(&m, 1, MPI_INT, 0, MPI_COMM_WORLD);
    mp = m/p;
    v  = allocate_2D(m, mp);    /* allocate mem for 2D array */
    vt = allocate_2D(mp, m);    /* allocate mem for transposed array */
    gdel = 1.0;                 /* initialize global error to 1 */
    iter = 0;                   /* initialize iteration counter */
    rhoj = 1.0 - PI*PI*0.5/(m+2)/(m+2);   /* Chebyshev coefficient */
    rhojsq = rhoj*rhoj;

    /* create Cartesian topology for the matrix */
    dims = p;                   /* 1-D Cartesian grid spans all processes */

Slide 43: Parallel SOR C Program (cont'd)

    MPI_Cart_create(MPI_COMM_WORLD, ndim, &dims, &periods,
                    reorder, &grid_comm);
    MPI_Comm_rank(grid_comm, &me);                /* current proc rank   */
    MPI_Cart_coords(grid_comm, me, ndim, coord);  /* current proc coord. */
    iv = coord[0];
    bc(m, mp, v, iv, p);        /* set up boundary conditions */
    transpose(m, mp, v, vt);    /* transpose v into vt */
    replicate(mp, m, vt, v);
    MPI_Cart_shift(grid_comm, 0, 1, &below, &above);

    omega = 1.0;
    update_sor( mp, m, vt, omega, &delr, red);
    update_bc_2(mp, m, vt, iv, below, above);
    omega = 1.0/(1.0 - 0.50*rhojsq);
    update_sor( mp, m, vt, omega, &delb, black);
    update_bc_2(mp, m, vt, iv, below, above);

Slide 44: Parallel SOR C Program (cont'd)

    while (gdel > TOL) {
        iter++;                 /* increment iteration counter */
        omega = 1.0/(1.0 - 0.25*rhojsq*omega);
        update_sor( mp, m, vt, omega, &delr, red);
        update_bc_2(mp, m, vt, iv, below, above);
        omega = 1.0/(1.0 - 0.25*rhojsq*omega);
        update_sor( mp, m, vt, omega, &delb, black);
        update_bc_2(mp, m, vt, iv, below, above);
        if (iter%INCREMENT == 0) {
            del = (delr + delb)*4.0;
            MPI_Allreduce(&del, &gdel, 1, MPI_DOUBLE, MPI_MAX,
                          MPI_COMM_WORLD);   /* find global max error */
            if (k == 0) {
                fprintf(OUTPUT, "iter gdel omega: %5d %13.5f %13.5f\n",
                        iter, gdel, omega);
            }
        }
        if (iter > MAXSTEPS) {
            fprintf(OUTPUT, "Iteration terminated (exceeds %6d", MAXSTEPS);
            fprintf(OUTPUT, " )\n");
            return (1);         /* nonconvergent solution */
        }
    }

Slide 45: Parallel SOR C Program (cont'd)

    if (k == 0) {
        fprintf(OUTPUT, "Stopped at iteration %d\n", iter);
        fprintf(OUTPUT, "The maximum error = %f\n", gdel);
    }

    /* write v to file for use in MATLAB plots */
    transpose(mp, m, vt, v);    /* transpose vt back into v */
    write_file(m, mp, v, k, p);

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return (0);
}

Slide 46: Parallel SOR C Program (cont'd)

INT update_sor(INT m, INT n, REAL **u, REAL omega, REAL *del, CHAR redblack)
{
/*************************************************************
 * Updates u according to successive over relaxation method  *
 * m        - (INPUT) size of interior rows                  *
 * n        - (INPUT) size of interior columns               *
 * u        - (INPUT) array                                  *
 * omega    - (INPUT) adjustable constant used to speed up   *
 *            convergence of SOR                             *
 * del      - (OUTPUT) error norm between 2 solution steps   *
 * redblack - (INPUT) either 'r' for red or 'b' for black    *
 *************************************************************/
    INT i, ib, ie, j, jb, je;
    REAL up;

    *del = 0.0;

Slide 47: Function update_bc_2

#include "solvers.h"
#include "mpi.h"

INT update_bc_2(INT mp, INT m, REAL **vt, INT k, INT below, INT above)
{
    MPI_Status status;

    /* send last interior row to "above", receive ghost row from "below" */
    MPI_Sendrecv(vt[mp]+1,   m, MPI_DOUBLE, above, 0,
                 vt[0]+1,    m, MPI_DOUBLE, below, 0,
                 MPI_COMM_WORLD, &status);
    /* send first interior row to "below", receive ghost row from "above" */
    MPI_Sendrecv(vt[1]+1,    m, MPI_DOUBLE, below, 1,
                 vt[mp+1]+1, m, MPI_DOUBLE, above, 1,
                 MPI_COMM_WORLD, &status);
    return (0);
}

Slide 48: Parallel SOR C Program (cont'd)

    if (redblack == 'r') {
        /* process RED odd points ... */
        jb = 1; je = n; ib = 1; ie = m;
        for (j = jb; j <= je; j += 2) {
            for (i = ib; i <= ie; i += 2) {
                up = ( u[i  ][j+1] + u[i+1][j  ]
                     + u[i-1][j  ] + u[i  ][j-1] )*0.25;
                u[i][j] = (1.0 - omega)*u[i][j] + omega*up;
                *del += fabs(up - u[i][j]);
            }
        }
        /* process RED even points ... */
        jb = 2; je = n; ib = 2; ie = m;
        for (j = jb; j <= je; j += 2) {
            for (i = ib; i <= ie; i += 2) {
                up = ( u[i  ][j+1] + u[i+1][j  ]
                     + u[i-1][j  ] + u[i  ][j-1] )*0.25;
                u[i][j] = (1.0 - omega)*u[i][j] + omega*up;
                *del += fabs(up - u[i][j]);
            }
        }
        return (0);

Slide 49: Parallel SOR C Program (cont'd)

    } else if (redblack == 'b') {
        /* process BLACK odd points ... */
        jb = 2; je = n; ib = 1; ie = m;
        for (j = jb; j <= je; j += 2) {
            for (i = ib; i <= ie; i += 2) {
                up = ( u[i  ][j+1] + u[i+1][j  ]
                     + u[i-1][j  ] + u[i  ][j-1] )*0.25;
                u[i][j] = (1.0 - omega)*u[i][j] + omega*up;
                *del += fabs(up - u[i][j]);
            }
        }
        /* process BLACK even points ... */
        jb = 1; je = n; ib = 2; ie = m;
        for (j = jb; j <= je; j += 2) {
            for (i = ib; i <= ie; i += 2) {
                up = ( u[i  ][j+1] + u[i+1][j  ]
                     + u[i-1][j  ] + u[i  ][j-1] )*0.25;
                u[i][j] = (1.0 - omega)*u[i][j] + omega*up;
                *del += fabs(up - u[i][j]);
            }
        }
        return (0);
    } else {
        return (1);
    }
}

Slide 50: SOR Timings on Origin2000
[Figure: timing plot for the SOR solver on the SGI Origin2000.]

Slide 51: Timings for an Iterative Solver
Wall clock time (seconds):

  Processors   SGI Origin2000   IBM Regatta   Intel Pentium 3 Linux cluster
       1            48.35          12.91              43.94
       2            23.63           5.32              18.75
       4            12.0            2.47               4.47
       8             6.54           1.13               1.95
      16             3.80           0.60               1.20
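Read as a quick scalability check (ratios derived from the table above, not stated on the slide): at 16 processors the Origin2000 runs 48.35/3.80 ≈ 12.7x faster than at 1, the Regatta 12.91/0.60 ≈ 21.5x, and the Linux cluster 43.94/1.20 ≈ 36.6x. The superlinear ratios on the latter two machines plausibly reflect cache effects as the per-process working set shrinks.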

Slide 52: Timings ... (continued)
[Figure: plot of the timings in the preceding table.]

