MPI Virtual Topologies
Kadin Tseng, Scientific Computing and Visualization Group, Boston University
Linux Clusters and Tiled Display Walls, July 30 - August 1, 2002

Virtual Topologies

Realistic science and engineering problems are frequently two- or higher-dimensional. In parallel computing, these physical dimensions often lead to domain decompositions with a comparable number of dimensions in the computational domain. Ordinarily, the process ranks of such a decomposition are expressed in linear order (0, 1, 2, ...). To let a code refer to these ranks with multi-dimensional coordinates that match the decomposition, the MPI library provides a set of virtual topology routines. There are two kinds of virtual topologies: Cartesian and general (graph). Only the Cartesian topology routines are discussed in what follows.

Example: A 9 x 4 Array

Consider calculations on a 9x4 matrix. The parenthesized number pairs (i, j) denote array row and column indexes, respectively. Assume six processes are available for parallel computation. One domain decomposition partitions the matrix into six 3x2 submatrices; the process rank that owns element (i, j) is:

               columns 0-1    columns 2-3
  rows 0-2       rank 0         rank 1
  rows 3-5       rank 2         rank 3
  rows 6-8       rank 4         rank 5

Domain Decomposition

Domain decomposition yields a linear-order representation of the process topology. The number in each cell denotes the process (or thread) number. Each cell represents a 3x2 block of the 9x4 array, and the work within each cell is performed by a single process.

2D Cartesian Topology

The linear rank order can be mapped into a 2D Cartesian topology (i, j) via an MPI function call. For example, linear rank 3 can be addressed by the 2D Cartesian coordinates (1, 1). Each cell represents a 3x2 matrix block whose work is performed by the indicated process rank. MPI coordinate indexing starts from 0, and ranks follow the row-major convention:

  (0,0) -> rank 0    (0,1) -> rank 1
  (1,0) -> rank 2    (1,1) -> rank 3
  (2,0) -> rank 4    (2,1) -> rank 5
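As a concrete illustration of this mapping (not part of the original slides), the minimal sketch below creates a 3x2 Cartesian communicator for six processes and converts between linear ranks and Cartesian coordinates; the variable names are arbitrary.

  #include <stdio.h>
  #include "mpi.h"

  int main(int argc, char *argv[]) {
      int rank, rank_back, coords[2];
      int dims[2]    = {3, 2};     /* 3 rows x 2 columns of processes   */
      int periods[2] = {0, 0};     /* non-periodic in both directions   */
      MPI_Comm grid_comm;

      MPI_Init(&argc, &argv);
      /* create the 2D Cartesian topology (reorder = 0 keeps original ranks) */
      MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &grid_comm);
      MPI_Comm_rank(grid_comm, &rank);

      /* rank -> (i, j), then (i, j) -> rank */
      MPI_Cart_coords(grid_comm, rank, 2, coords);
      MPI_Cart_rank(grid_comm, coords, &rank_back);
      printf("rank %d <-> coords (%d,%d)\n", rank_back, coords[0], coords[1]);

      MPI_Finalize();
      return 0;
  }

Run with six processes, rank 3 reports coordinates (1, 1), consistent with the table above; with reorder = 0 the Cartesian ranks coincide with the MPI_COMM_WORLD ranks.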

Cartesian Topology Solver Example

Laplace Equation
Laplace Equation Discretized
Computational Domain
  - Five-point finite-difference stencil
Jacobi Scheme
Parallel Jacobi Scheme
  - Unknowns At Border Cells
  - Message Passing For Boundary Cells
  - For Individual Threads ...
  - MPI Functions Needed For Job
Successive Over Relaxation (SOR) Scheme
Parallel SOR Scheme
Scalability

Laplace Equation

Laplace Equation:
  u_xx + u_yy = 0,   0 <= x <= 1,  0 <= y <= 1                               (1)

Boundary Conditions:
  u(x,0) = sin(pi*x);  u(x,1) = sin(pi*x)*exp(-pi);  u(0,y) = u(1,y) = 0     (2)

Analytical solution:
  u(x,y) = sin(pi*x)*exp(-pi*y)                                              (3)

Laplace Equation Discretized

Discretizing Equation (1) by centered differences yields

  u(i,j)^(n+1) = ( u(i+1,j)^n + u(i-1,j)^n + u(i,j+1)^n + u(i,j-1)^n ) / 4   (4)

where n and n+1 denote the current and next iteration ("time step"), respectively, while i and j index the grid in x and y. For simplicity, we take equal mesh spacings,

  delta_x = delta_y,                                                          (5)

so that the spacing drops out of (4).

Computational Domain

(Figure: the square computational grid, with x indexed by i and y indexed by j.)

Five-Point Finite-Difference Stencil

Interior (or solution) cells: where the solution of the Laplace equation is sought at (i, j).
Exterior (or boundary) cells: blue cells denote cells where non-homogeneous boundary conditions are imposed, while green cells carry homogeneous boundary conditions.

Jacobi Scheme

1. Make an initial guess for u at all interior points (i, j) at time n.
2. Use the 5-point stencil (4) to compute u at time n+1 at all interior points (i, j).
3. Stop if the prescribed convergence threshold is reached; otherwise continue to the next step.
4. Update: replace u at time n with u at time n+1 for all interior i and j.
5. Go to step 2.

This simple iterative scheme makes an intuitive instructional example, but its slow convergence renders it impractical for real applications.

Solution Contour Plot

Jacobi Iterative Solver - F77

      PROGRAM Jacobi
! Solve Laplace equation using Jacobi iteration method
      IMPLICIT NONE
      INTEGER m, mp, mp2, iter, i, j, K, P
      REAL*8 TOL, gdel, PI
      REAL start_time, end_time
      CHARACTER*40 header, outfile
      INTEGER MAX_M, MAXSTEPS, INCREMENT
      PARAMETER (P=1, K=0, TOL=1.d-6, PI=3.141592653589793d0)
      PARAMETER (MAX_M=512, MAXSTEPS=50000, INCREMENT=100)
      REAL*8 u((MAX_M+2)*(MAX_M+2))
      REAL*8 unew((MAX_M+2)*(MAX_M+2))

      m = MAX_M
      mp = m/P
      gdel = 1.0d0
      iter = 0

Jacobi ... F77 (cont'd)

      call CPUtime(start_time)          ! start timer, measured in seconds
      CALL bc(m, mp, u, K, P)           ! initialize and define B.C. for u

      DO WHILE (gdel .gt. TOL)          ! iterate until error below threshold
        iter = iter + 1                 ! increment iteration counter
        IF (iter .gt. MAXSTEPS) THEN
          header = 'Number of iterations exceeds limit'
          CALL error(header, -1)
        ENDIF
        CALL update_jacobi(m, mp, u, unew, gdel)
        IF (MOD(iter,INCREMENT) .eq. 0) THEN
          WRITE(*,"('iter,gdel:',i6,e12.4)") iter, gdel
        ENDIF
      ENDDO

      call CPUtime(end_time)            ! stop timer

Jacobi ... F77 (cont'd)

      WRITE(*,"(40('#'))")
      WRITE(*,"(' Wall clock time =',f10.2,' x ',i3)")
     &      end_time - start_time, P
      WRITE(*,"(' Stopped at iteration =',i5)") iter
      WRITE(*,"(' The maximum error =',f12.6)") gdel
      WRITE(*,"(40('#'))")

      outfile = 'plots'
      OPEN(unit=40, file=outfile, form='formatted', status='unknown')
      WRITE(40,"(3i5)") m+2, mp+2, P
      CLOSE(40)

      WRITE(outfile, "('plots.',i1)") K
      OPEN(unit=41, file=outfile, form='formatted', status='unknown')
      WRITE(41,"(6e13.4)") (u(i), i=1,(m+2)*(mp+2))
      CLOSE(41)
      END

bc.f

      SUBROUTINE bc(m, n, v, k, p)
      IMPLICIT NONE
      INTEGER p, m, n, k, i
      CHARACTER*40 header
      REAL*8 v(0:m+1, 0:n+1), PI
      PARAMETER (PI = ACOS(-1.0d0))

      CALL init_array(m, n, v)              ! Init. v

      IF (p .eq. 1) THEN                    ! serial
        DO i=0,m+1
          v(i,0)   = sin(PI*i/(m+1))        ! Y=0
          v(i,n+1) = v(i,0)*exp(-PI)        ! Y=1
        ENDDO
      ELSE IF (p .gt. 1) THEN               ! multithreads
        IF (k .eq. 0) THEN                  ! 1st thread
          DO i=0,m+1
            v(i,0) = sin(PI*i/(m+1))        ! at y=0
          ENDDO
        ENDIF
        IF (k .eq. p-1) THEN                ! last thread
          DO i=0,m+1                        ! at y=1
            v(i,n+1) = sin(PI*i/(m+1))*exp(-PI)
          ENDDO
        ENDIF
      ELSE                                  ! invalid p
        header = 'p invalid in subroutine BC'
        CALL error(header, -1)
      ENDIF
      END

update_jacobi.f

      SUBROUTINE update_jacobi(m, n, v, vnew, del)
      IMPLICIT NONE
      INTEGER i, j, m, n
      REAL*8 v(0:m+1,0:n+1), vnew(0:m+1,0:n+1), del

      del = 0.0d0
      DO j=1,n
        DO i=1,m
          vnew(i,j) = ( v(i,j+1) + v(i+1,j)
     &                + v(i-1,j) + v(i,j-1) )*0.25
          del = del + DABS(vnew(i,j)-v(i,j))   ! local error sum
        ENDDO
      ENDDO

      DO j=1,n
        DO i=1,m
          v(i,j) = vnew(i,j)                   ! update v
        ENDDO
      ENDDO
      END

Jacobi Iterative Solver - C

#include "solvers.h"

INT main() {
/********** MAIN PROGRAM **********************************
 * Solve Laplace equation using Jacobi iterative method    *
 * Kadin Tseng, Boston University, August, 2000            *
 ***********************************************************/
  INT iter, m, mp;
  REAL gdel, **u, **un;
  CHAR line[10];

  fprintf(OUTPUT,"Enter size of interior points, m :");
  (void) fgets(line, sizeof(line), stdin);
  (void) sscanf(line, "%d", &m);
  fprintf(OUTPUT,"m = %d\n", m);
  mp = m/P;

  u  = allocate_2D(m, mp);      /* allocate mem for 2D array */
  un = allocate_2D(m, mp);

Jacobi ... Solver - C (cont'd)

  gdel = 1.0;                   /* initialize global error to 1     */
  iter = 0;                     /* initialize iteration counter     */
  bc(m, mp, u, K, P);           /* initialize and define B.C. for u */
  replicate(m, mp, u, un);      /* copy u to un                     */

  while (gdel > TOL) {          /* iterate until error below threshold */
    iter++;                     /* increment iteration counter */
    if (iter > MAXSTEPS) {
      fprintf(OUTPUT,"Iteration terminated (exceeds %6d", MAXSTEPS);
      fprintf(OUTPUT," )\n");
      return (1);               /* nonconvergent solution */
    }
    update_jacobi(m, mp, u, un, &gdel);   /* update u by Jacobi scheme */
    if (iter%INCREMENT == 0) {
      fprintf(OUTPUT,"iter,gdel: %6d, %lf\n", iter, gdel);
    }
  }

Jacobi ... Solver - C (cont'd)

  fprintf(OUTPUT,"Stopped at iteration %d\n", iter);
  fprintf(OUTPUT,"The maximum error = %f\n", gdel);

  /* write u to file for use in MATLAB plots */
  write_file( m, mp, u, K, P );

  return (0);
}

bc.c

void bc(INT m, INT n, REAL **u, INT k, INT p) {
/*********** Boundary Conditions **************************************
 * PDE: Laplacian u = 0;  0<=x<=1;  0<=y<=1                            *
 * B.C.: u(x,0)=sin(pi*x); u(x,1)=sin(pi*x)*exp(-pi); u(0,y)=u(1,y)=0  *
 * SOLUTION: u(x,y)=sin(pi*x)*exp(-pi*y)                               *
 ***********************************************************************/
  INT i;

  init_array( m, n, u);                   /* initialize u to 0 */

  if (p == 1) {                           /* serial */
    for (i = 0; i <= m+1; i++) {
      u[i][  0] = sin(PI*i/(m+1));        /* at y = 0; all x */
      u[i][n+1] = u[i][0]*exp(-PI);       /* at y = 1; all x */
    }
  } else if (p < 1) {
    printf("p is invalid\n");
  }

bc.c (continued)

  if (p > 1) {                            /* parallel processing */
    if (k == 0) {                         /* the 1st process */
      for (i = 0; i <= m+1; i++) {
        u[i][0] = sin(PI*i/(m+1));        /* at y = 0; all x */
      }
    }
    if (k == p-1) {                       /* the last process */
      for (i = 0; i <= m+1; i++) {
        u[i][n+1] = sin(PI*i/(m+1))*exp(-PI);   /* at y = 1; all x */
      }
    }
  }
}

update_jacobi.c

INT update_jacobi( INT m, INT n, REAL **u, REAL **unew, REAL *del) {
  INT i, j;

  *del = 0.0;
  for (i = 1; i <= m; i++) {
    for (j = 1; j <= n; j++) {
      unew[i][j] = ( u[i  ][j+1] + u[i+1][j  ]
                   + u[i-1][j  ] + u[i  ][j-1] )*0.25;
      *del += fabs(unew[i][j] - u[i][j]);   /* accumulate local error */
    }
  }
  for (i = 1; i <= m; i++) {
    for (j = 1; j <= n; j++) {
      u[i][j] = unew[i][j];                 /* update u */
    }
  }
  return (0);
}

Domain Decompositions

(Figure: 1D domain decomposition, in which threads 0-3 partition the domain along one direction, versus 2D domain decomposition, in which threads 0-3 form a 2x2 grid of subdomains.)

Unknowns At Border Cells - 1D

The five-point finite-difference stencil applied at a thread's border cells requires cells from neighboring threads and/or boundary cells, so message passing is required.

Message Passing to Fill Boundary Cells

(Figure: the current thread k exchanges the cells along its subdomain edges with its neighbors, threads k-1 and k+1, to fill its boundary cells.)

For Individual Threads ...

For thread k of p threads, recast the 5-point finite-difference stencil in terms of the thread-local array. Boundary conditions: for simplicity, assume m is divisible by p. The physical B.C. are imposed at the outer edges, while the B.C. along interior subdomain edges change from iteration to iteration and are obtained by message passing. Additional boundary conditions are given on the next page.

Relationship Between u and v

The physical boundary conditions are applied to the global solution u, while each thread stores a local solution array v. With the 1D decomposition along j and m' = m/p columns per thread, thread k's local element v(i, j) corresponds to the global element u(i, k*m' + j).
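As a small illustration of this correspondence (not part of the original slides; the helper name global_j is hypothetical, and the 1D column decomposition it assumes is the one used in the code later in these slides):

  /* Map a thread-local column index to the global column index, assuming
     thread k owns mp = m/p interior columns (local j = 1..mp). */
  int global_j(int k, int mp, int j_local)
  {
      return k * mp + j_local;
  }

With this mapping, thread 0's column j = 0 holds the physical y = 0 boundary and thread p-1's column j = mp+1 holds the y = 1 boundary, consistent with subroutine bc.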

MPI Functions Used For the Job

MPI_Sendrecv (= MPI_Send + MPI_Recv) - to exchange boundary conditions between individual threads
MPI_Allreduce - to compute the global error and determine whether convergence has been reached
MPI_Cart_create - to create the Cartesian topology
MPI_Cart_coords - to find the Cartesian coordinates equivalent to a given rank
MPI_Cart_rank - to find the rank equivalent to given Cartesian coordinates
MPI_Cart_shift - to find the current thread's adjoining neighbor threads
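To show how the last of these routines fits together with MPI_Sendrecv (the deck's own update_bc_2 routines appear later in these slides), here is a minimal 1D halo-exchange sketch; the function name halo_exchange, the buffer names, the count n, and the tag values are illustrative, not taken from the original code.

  #include "mpi.h"

  /* Exchange one edge layer with the neighbors found by MPI_Cart_shift.
     grid_comm is assumed to be a 1D Cartesian communicator. */
  void halo_exchange(MPI_Comm grid_comm,
                     double *lower_edge, double *upper_edge,
                     double *ghost_below, double *ghost_above, int n)
  {
      int below, above;
      MPI_Status status;

      /* ranks one step below and above along dimension 0; at the ends of a
         non-periodic grid MPI_PROC_NULL is returned and the matching
         send/receive becomes a no-op */
      MPI_Cart_shift(grid_comm, 0, 1, &below, &above);

      /* send my upper edge up, receive my lower ghost layer from below */
      MPI_Sendrecv(upper_edge,  n, MPI_DOUBLE, above, 0,
                   ghost_below, n, MPI_DOUBLE, below, 0, grid_comm, &status);

      /* send my lower edge down, receive my upper ghost layer from above */
      MPI_Sendrecv(lower_edge,  n, MPI_DOUBLE, below, 1,
                   ghost_above, n, MPI_DOUBLE, above, 1, grid_comm, &status);
  }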

Successive Over Relaxation (SOR)

1. Make an initial guess for u at all interior points (i, j).
2. Define a relaxation scalar omega.
3. Use the 5-point stencil to compute an intermediate value u' at all interior points (i, j).
4. Compute the new iterate u(n+1) = u(n) + omega*(u' - u(n)) at all interior points.
5. Stop if the prescribed convergence threshold is reached.
6. Update: replace u(n) with u(n+1) (and, with Chebyshev acceleration, update omega).
7. Go to step 2.

In Step 3, u' is computed with values from time n+1 wherever they are already available, which accelerates convergence but inhibits parallelism.
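The relaxation-factor expressions in the parallel SOR listings below are partly garbled in this transcript; the self-contained sketch that follows spells out the Chebyshev acceleration schedule they appear to use (cf. Numerical Recipes), with rhoj computed exactly as in the Fortran listing. The grid size m and the number of sweeps printed are illustrative.

  #include <math.h>
  #include <stdio.h>

  /* Chebyshev acceleration schedule for red-black SOR: omega starts at 1,
     then is updated before each half-sweep; it converges to the optimal
     SOR relaxation factor.  The actual sweeps live in update_sor below. */
  int main(void)
  {
      const double PI = acos(-1.0);
      int    m       = 512;                                /* illustrative grid size  */
      double rhoj    = 1.0 - 0.5*PI*PI/((m+2.0)*(m+2.0));  /* Jacobi spectral radius  */
      double rhojsq  = rhoj*rhoj;
      double omega;
      int    sweep;

      omega = 1.0;                                  /* first red half-sweep   */
      printf("omega (red,   sweep 0) = %f\n", omega);

      omega = 1.0/(1.0 - 0.5*rhojsq);               /* first black half-sweep */
      printf("omega (black, sweep 0) = %f\n", omega);

      for (sweep = 1; sweep <= 3; sweep++) {        /* subsequent half-sweeps */
          omega = 1.0/(1.0 - 0.25*rhojsq*omega);
          printf("omega (red,   sweep %d) = %f\n", sweep, omega);
          omega = 1.0/(1.0 - 0.25*rhojsq*omega);
          printf("omega (black, sweep %d) = %f\n", sweep, omega);
      }
      return 0;
  }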

Red-Black SOR Scheme

By virtue of the 5-point stencil, the solution at each black cell depends only on its 4 adjacent red cells; conversely, each red cell depends only on its 4 adjoining black cells. This suggests the following parallel algorithm:

1. Compute v at the black cells at time n+1 in parallel, using v at the red cells at time n.
2. Compute v at the red cells at time n+1 in parallel, using v at the black cells at time n+1.
3. Repeat steps 1 and 2 until v is fully converged.

The order of steps 1 and 2 can be alternated.

Parallel SOR F77 Program

      PROGRAM sor_parallel_f77
      IMPLICIT NONE
      INTEGER m, mp, iter, p, i, j, k
      REAL start_time, end_time
      CHARACTER*40 header, outfile
      REAL*8 TOL, omega, rhoj, rhojsq, PI, del, delr, delb, gdel
      INTEGER grid_comm, me, iv, coord, dims, periods, ndim, ierr
      INTEGER below, above
      LOGICAL reorder
      INTEGER MAX_M, MAXSTEP, INCREMENT
      PARAMETER (MAX_M = 512, MAXSTEP = 50000, INCREMENT=100)
      PARAMETER (TOL = 1.d-6, PI = 3.141592653589793d0)
      PARAMETER (periods = 0, ndim = 1, reorder = .false.)
      REAL*8 v((MAX_M+1+2)*(MAX_M+2))
      INCLUDE 'mpif.h'

      outfile = 'plots'

Parallel SOR F77 Program (cont'd)

      CALL MPI_Init(ierr)                           ! starts MPI
      CALL MPI_Comm_rank(MPI_COMM_WORLD, k, ierr)
      CALL MPI_Comm_size(MPI_COMM_WORLD, p, ierr)

      start_time = MPI_Wtime()        ! start timer, in seconds

      m = MAX_M
      mp = m/p                        ! m'
      gdel = 1.0d0                    ! initialize global error to 1
      iter = 0                        ! initialize iteration counter

      rhoj = 1.0d0 - PI*PI*0.5/(m+2)/(m+2)   ! Chebyshev coefficients
      rhojsq = rhoj*rhoj

Parallel SOR F77 Program (cont'd)

! create cartesian topology for matrix
      dims = p
      CALL MPI_Cart_create(MPI_COMM_WORLD, ndim, dims,
     &                     periods, reorder, grid_comm, ierr)
      CALL MPI_Comm_rank(grid_comm, me, ierr)
      CALL MPI_Cart_coords(grid_comm, me, ndim, coord, ierr)
      iv = coord

      CALL bc(m, mp, v, iv, p)        ! set up boundary conditions
      CALL MPI_Cart_shift(grid_comm, 0, 1, below, above, ierr)

      omega = 1.0d0
      CALL update_sor( m, mp, v, omega, delr, 'r')
      CALL update_bc_2( m, mp, v, iv, below, above)

      omega = 1.0d0/(1.0d0 - 0.5d0*rhojsq)
      CALL update_sor( m, mp, v, omega, delb, 'b')
      CALL update_bc_2( m, mp, v, iv, below, above)

Parallel SOR F77 Program (cont'd)

      DO WHILE (gdel .gt. TOL)        ! loop until gdel <= TOL
        iter = iter + 1               ! increment iteration counter

        omega = 1.0d0/(1.0d0 - 0.25d0*rhojsq*omega)
        CALL update_sor( m, mp, v, omega, delr, 'r')
        CALL update_bc_2( m, mp, v, iv, below, above)

        omega = 1.0d0/(1.0d0 - 0.25d0*rhojsq*omega)
        CALL update_sor( m, mp, v, omega, delb, 'b')
        CALL update_bc_2( m, mp, v, iv, below, above)

        IF (MOD(iter,INCREMENT) .eq. 0) THEN      ! check error once in a while
          del = (delr + delb)*4.d0
          CALL MPI_Allreduce( del, gdel, 1, MPI_DOUBLE_PRECISION,
     &         MPI_MAX, MPI_COMM_WORLD, ierr )    ! find global max error
          IF (iter .ge. MAXSTEP) THEN
            header = 'Number of iterations exceeded limit'
            CALL error(header, -1)
          ENDIF
          IF (k .eq. 0) WRITE(*,'(i5,3d13.5)') iter, del, gdel, omega
        ENDIF
      ENDDO

Parallel SOR F77 Program (cont'd)

      end_time = MPI_Wtime()          ! stop timer

      IF (k .eq. 0) THEN
        WRITE(*,"(40('#'))")
        WRITE(*,"(' Wall clock time =',f10.2,' x ',i3)")
     &        end_time - start_time, p
        WRITE(*,"(' Stopped at iteration ',i5)") iter
        WRITE(*,"(' The maximum error =',f12.6)") gdel
        WRITE(*,"(40('#'))")
        OPEN(unit=40, file=outfile, form='formatted', status='unknown')
        WRITE(40,"(3i5)") m+2, mp+2, p
        CLOSE(40)
      ENDIF

      WRITE(outfile, "('plots.',i1)") k
      OPEN(unit=41, file=outfile, form='formatted', status='unknown')
      WRITE(41,"(6e13.4)") (v(i), i=1,(m+2)*(mp+2))
      CLOSE(41)

      CALL MPI_Finalize(ierr)
      END                             ! PROGRAM sor_parallel_f77

update_sor.f

      SUBROUTINE update_sor(m, n, u, omega, del, redblack)
!******************************************************************
!* Updates Laplace solution u based on red-black iterative scheme *
!* m        - (INPUT)  1st dimension size                         *
!* n        - (INPUT)  2nd dimension size                         *
!* u        - (INPUT)  solution array; u = u(0:m+1,0:n+1)         *
!* omega    - (INPUT)  iterative parameter                        *
!* del      - (OUTPUT) error between two iterative u              *
!* redblack - (INPUT)  which color to update; 'r' for red         *
!******************************************************************
      IMPLICIT NONE
      CHARACTER redblack
      INTEGER m, n, i, ib, ie, j, jb, je
      REAL*8 omega, del, up
      REAL*8 u(0:m+1,0:n+1)

      del = 0.0d0

update_sor.f (cont'd)

      IF (redblack .eq. 'r') THEN
! process RED odd points ...
        jb = 1
        je = n
        ib = 1
        ie = m
        DO j=jb,je,2
          DO i=ib,ie,2
            up = ( u(i,j+1) + u(i+1,j) +
     &             u(i-1,j) + u(i,j-1) )*0.25d0
            u(i,j) = (1.0d0 - omega)*u(i,j) + omega*up
            del = del + dabs(up-u(i,j))
          ENDDO
        ENDDO
! process RED even points ...
        jb = 2
        je = n
        ib = 2
        ie = m
        DO j=jb,je,2
          DO i=ib,ie,2
            up = ( u(i,j+1) + u(i+1,j) +
     &             u(i-1,j) + u(i,j-1) )*0.25d0
            u(i,j) = (1.0d0 - omega)*u(i,j) + omega*up
            del = del + dabs(up-u(i,j))
          ENDDO
        ENDDO

update_sor.f (cont'd)

      ELSE IF (redblack .eq. 'b') THEN
! process BLACK odd points ...
        jb = 2
        je = n
        ib = 1
        ie = m
        DO j=jb,je,2
          DO i=ib,ie,2
            up = ( u(i,j+1) + u(i+1,j) +
     &             u(i-1,j) + u(i,j-1) )*0.25d0
            u(i,j) = (1.0d0 - omega)*u(i,j) + omega*up
            del = del + dabs(up-u(i,j))
          ENDDO
        ENDDO
! process BLACK even points ...
        jb = 1
        je = n
        ib = 2
        ie = m
        DO j=jb,je,2
          DO i=ib,ie,2
            up = ( u(i,j+1) + u(i+1,j) +
     &             u(i-1,j) + u(i,j-1) )*0.25d0
            u(i,j) = (1.0d0 - omega)*u(i,j) + omega*up
            del = del + dabs(up-u(i,j))
          ENDDO
        ENDDO
      ELSE
        STOP
      ENDIF
      END

Subroutine update_bc_2

      SUBROUTINE update_bc_2( m, mp, v, k, below, above )
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INTEGER m, mp, k, below, above, ierr
      REAL*8 v(0:m+1,0:mp+1)
      INTEGER status(MPI_STATUS_SIZE)

      CALL MPI_Sendrecv(
     &     v(1,mp),   m, MPI_DOUBLE_PRECISION, above, 0,
     &     v(1,0),    m, MPI_DOUBLE_PRECISION, below, 0,
     &     MPI_COMM_WORLD, status, ierr )
      CALL MPI_Sendrecv(
     &     v(1,1),    m, MPI_DOUBLE_PRECISION, below, 1,
     &     v(1,mp+1), m, MPI_DOUBLE_PRECISION, above, 1,
     &     MPI_COMM_WORLD, status, ierr )
      RETURN
      END                             ! SUBROUTINE update_bc_2

Parallel SOR C Program

#include "solvers.h"
#include "mpi.h"

INT main(INT argc, CHAR *argv[]) {
/*************** MAIN PROGRAM **********************************
 * Solve Laplace equation using Successive Over Relaxation      *
 * and Chebyshev Acceleration (see Numerical Recipes for details) *
 * Kadin Tseng, Boston University, August, 2000                 *
 ****************************************************************/
  INT iter, m, mp, p, k, below, above;
  REAL omega, rhoj, rhojsq, del, delr, delb, gdel;
  CHAR line[80], red, black;
  MPI_Comm grid_comm;
  INT me, iv, coord[1], dims, periods, ndim, reorder;
  REAL **v, **vt;

  MPI_Init(&argc, &argv);                /* starts MPI */
  MPI_Comm_rank(MPI_COMM_WORLD, &k);     /* get current process id   */
  MPI_Comm_size(MPI_COMM_WORLD, &p);     /* get number of processes  */

Parallel SOR C Program (cont'd)

  periods = 0; ndim = 1; reorder = 0;
  red = 'r'; black = 'b';

  if (k == 0) {
    fprintf(OUTPUT,"Enter size of interior points, m :\n");
    (void) fgets(line, sizeof(line), stdin);
    (void) sscanf(line, "%d", &m);
    fprintf(OUTPUT,"m = %d\n", m);
  }
  MPI_Bcast(&m, 1, MPI_INT, 0, MPI_COMM_WORLD);
  mp = m/p;

  v  = allocate_2D(m, mp);      /* allocate mem for 2D array */
  vt = allocate_2D(mp, m);      /* allocate mem for 2D array */

  gdel = 1.0;                   /* initialize global error to 1 */
  iter = 0;                     /* initialize iteration counter */

  rhoj = 1.0 - PI*PI*0.5/(m+2)/(m+2);   /* Chebyshev coefficient */
  rhojsq = rhoj*rhoj;

  /* create cartesian topology for matrix */
  dims = p;                     /* cartesian grid has same size as number of procs */

Parallel SOR C Program (cont'd)

  MPI_Cart_create(MPI_COMM_WORLD, ndim, &dims, &periods, reorder, &grid_comm);
  MPI_Comm_rank(grid_comm, &me);                 /* current proc        */
  MPI_Cart_coords(grid_comm, me, ndim, coord);   /* current proc coords */
  iv = coord[0];

  bc( m, mp, v, iv, p);          /* set up boundary conditions */
  transpose(m, mp, v, vt);       /* transpose v into vt        */
  replicate(mp, m, vt, v);
  MPI_Cart_shift(grid_comm, 0, 1, &below, &above);

  omega = 1.0;
  update_sor( mp, m, vt, omega, &delr, red);
  update_bc_2( mp, m, vt, iv, below, above);

  omega = 1.0/(1.0 - 0.5*rhojsq);
  update_sor( mp, m, vt, omega, &delb, black);
  update_bc_2( mp, m, vt, iv, below, above);

Parallel SOR C Program (cont'd)

  while (gdel > TOL) {
    iter++;                              /* increment iteration counter */

    omega = 1.0/(1.0 - 0.25*rhojsq*omega);
    update_sor( mp, m, vt, omega, &delr, red);
    update_bc_2( mp, m, vt, iv, below, above);

    omega = 1.0/(1.0 - 0.25*rhojsq*omega);
    update_sor( mp, m, vt, omega, &delb, black);
    update_bc_2( mp, m, vt, iv, below, above);

    if (iter%INCREMENT == 0) {
      del = (delr + delb)*4.0;
      MPI_Allreduce( &del, &gdel, 1, MPI_DOUBLE, MPI_MAX,
                     MPI_COMM_WORLD);    /* find global max error */
      if (k == 0) {
        fprintf(OUTPUT,"iter gdel omega: %5d %13.5f %13.5f\n", iter, gdel, omega);
      }
    }
    if (iter > MAXSTEPS) {
      fprintf(OUTPUT,"Iteration terminated (exceeds %6d", MAXSTEPS);
      fprintf(OUTPUT," )\n");
      return (1);                        /* nonconvergent solution */
    }
  }

Parallel SOR C Program (cont'd)

  if (k == 0) {
    fprintf(OUTPUT,"Stopped at iteration %d\n", iter);
    fprintf(OUTPUT,"The maximum error = %f\n", gdel);
  }

  /* write v to file for use in MATLAB plots */
  transpose(mp, m, vt, v);       /* transpose vt back into v */
  write_file( m, mp, v, k, p);

  MPI_Barrier(MPI_COMM_WORLD);
  MPI_Finalize();
  return (0);
}

Parallel SOR C Program (cont'd)

INT update_sor( INT m, INT n, REAL **u, REAL omega, REAL *del, CHAR redblack)
{
/****************************************************************
 * Updates u according to the successive over relaxation method *
 * m        - (INPUT)  size of interior rows                    *
 * n        - (INPUT)  size of interior columns                 *
 * u        - (INPUT)  array                                    *
 * omega    - (INPUT)  adjustable constant used to speed up     *
 *                     convergence of SOR                       *
 * del      - (OUTPUT) error norm between 2 solution steps      *
 * redblack - (INPUT)  either 'r' for red or 'b' for black      *
 *****************************************************************/
  INT i, ib, ie, j, jb, je;
  REAL up;

  *del = 0.0;

Function update_bc_2

#include "solvers.h"
#include "mpi.h"

INT update_bc_2( INT mp, INT m, REAL **vt, INT k, INT below, INT above )
{
  MPI_Status status;

  MPI_Sendrecv( vt[mp]+1,   m, MPI_DOUBLE, above, 0,
                vt[0]+1,    m, MPI_DOUBLE, below, 0,
                MPI_COMM_WORLD, &status );
  MPI_Sendrecv( vt[1]+1,    m, MPI_DOUBLE, below, 1,
                vt[mp+1]+1, m, MPI_DOUBLE, above, 1,
                MPI_COMM_WORLD, &status );
  return (0);
}

Parallel SOR C Program (cont'd)

  if (redblack == 'r') {
    /* process RED odd points ... */
    jb = 1; je = n; ib = 1; ie = m;
    for ( j = jb; j <= je; j+=2 ) {
      for ( i = ib; i <= ie; i+=2 ) {
        up = ( u[i  ][j+1] + u[i+1][j  ]
             + u[i-1][j  ] + u[i  ][j-1] )*0.25;
        u[i][j] = (1.0 - omega)*u[i][j] + omega*up;
        *del += fabs(up-u[i][j]);
      }
    }
    /* process RED even points ... */
    jb = 2; je = n; ib = 2; ie = m;
    for ( j = jb; j <= je; j+=2 ) {
      for ( i = ib; i <= ie; i+=2 ) {
        up = ( u[i  ][j+1] + u[i+1][j  ]
             + u[i-1][j  ] + u[i  ][j-1] )*0.25;
        u[i][j] = (1.0 - omega)*u[i][j] + omega*up;
        *del += fabs(up-u[i][j]);
      }
    }

Parallel SOR C Program (cont'd)

  } else if (redblack == 'b') {
    /* process BLACK odd points ... */
    jb = 2; je = n; ib = 1; ie = m;
    for ( j = jb; j <= je; j+=2 ) {
      for ( i = ib; i <= ie; i+=2 ) {
        up = ( u[i  ][j+1] + u[i+1][j  ]
             + u[i-1][j  ] + u[i  ][j-1] )*0.25;
        u[i][j] = (1.0 - omega)*u[i][j] + omega*up;
        *del += fabs(up-u[i][j]);
      }
    }
    /* process BLACK even points ... */
    jb = 1; je = n; ib = 2; ie = m;
    for ( j = jb; j <= je; j+=2 ) {
      for ( i = ib; i <= ie; i+=2 ) {
        up = ( u[i  ][j+1] + u[i+1][j  ]
             + u[i-1][j  ] + u[i  ][j-1] )*0.25;
        u[i][j] = (1.0 - omega)*u[i][j] + omega*up;
        *del += fabs(up-u[i][j]);
      }
    }
  } else {
    return (1);                 /* unknown color flag */
  }
  return (0);
}

SOR Timings On Origin2000

Timings for an Iterative Solver

(Table: wall-clock time versus number of processors on the SGI Origin2000, the IBM Regatta, and an Intel Pentium 3 Linux cluster.)

Timings ... (continued)