
What is required for "standard" distributed parallel programming model? Mitsuhisa Sato Taisuke Boku and Jinpil Lee University of Tsukuba.


1 What is required for "standard" distributed parallel programming model? Mitsuhisa Sato, Taisuke Boku and Jinpil Lee, University of Tsukuba

2 My Background and Position

OpenMP
- A standard parallel programming model and API for shared-memory multiprocessors.
- Extends the base language (Fortran/C/C++) with directives or pragmas.
- Incremental parallel programming: sequential semantics are kept when the directives are ignored, which allows a range of programming styles.
- For scientific applications; support for loop-based parallelism.
- Target: small scale (~16 processors) to medium scale (~64 processors).
- The first draft was published in 1997, and the standard is now gaining acceptance for the multi-core era.

Omni OpenMP compiler project (now inactive)
- A project of the Real World Computing Partnership (RWCP, until 2002).
- Research objectives: a portable implementation of OpenMP for SMPs, and the design and implementation of a cluster-enabled OpenMP for PC/WS/SMP clusters, supporting seamless programming from SMPs to clusters using a page-based software distributed shared memory system.
- Free and open source, released since 1998.

3 Agenda
- OpenMPD: a directive-based programming model for distributed memory
- What is required for "standard" distributed parallel programming model?

4 OpenMPD: directive-based programming model for distributed memory

Objectives
- Provide a simple and "easy-to-understand" programming model for distributed memory. (OpenMP is only for shared memory, not for distributed memory.)
- Support data parallelization and typical parallelization patterns by adding directives similar to OpenMP (inspired by OpenMP).

5 Features of OpenMPD
- Directive-based programming model for distributed-memory systems: the C programming language (and Fortran) plus directives.
- Explicit communication and synchronization: every action is taken by a directive, which keeps performance tuning "easy to understand".
- Support for typical communication patterns: scatter/gather, reduction, neighbor communication, ...
- Directives describe typical data parallelization: array distribution, data synchronization, ...
- Highly portable implementation by translation to MPI: the compiler translates the directives into parallel code using MPI functions.

6 Code Example

    int array[YMAX][XMAX];
    #pragma ompd distvar(var=array; dim=2)

    main(){
        int i, j, res;
        res = 0;
    #pragma ompd for affinity(array) reduction(+:res)
        for(i = 0; i < 10; i++)
            for(j = 0; j < 10; j++){
                array[i][j] = func(i, j);
                res += array[i][j];
            }
    }

Only directives are added to the serial code (incremental parallelization): data distribution (distvar), work sharing and data synchronization (for affinity).

7 The same code written in MPI

    int array[YMAX][XMAX];

    main(int argc, char **argv){
        int i, j, res, temp_res, dx, llimit, ulimit, size, rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        dx = YMAX / size;
        llimit = rank * dx;
        if(rank != (size - 1)) ulimit = llimit + dx;
        else ulimit = YMAX;

        temp_res = 0;
        for(i = llimit; i < ulimit; i++)
            for(j = 0; j < 10; j++){
                array[i][j] = func(i, j);
                temp_res += array[i][j];
            }

        MPI_Allreduce(&temp_res, &res, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
        MPI_Finalize();
    }

8 Array data distribution

(figure: array[] divided into blocks across CPU0-CPU3)

- Each processor computes on a different region of the array: #pragma ompd distvar(var=list; dim=num; sleeve=size)
- References to data assigned to other nodes require data synchronization: either sync on the sleeve area or sync on the whole array. The programmer chooses which sync is required.
- In the current implementation, the whole array is replicated on each node.

9 Data synchronization of array (Gather)

(figure: array[] blocks on CPU0-CPU3 exchanged so that every node holds the whole array)

- The gather operation distributes the data to every node: #pragma ompd gather(var=list)
- Executes communication to fetch the data assigned to other nodes.
- The easiest way to synchronize: afterwards, every node can access correct data by purely local accesses.
- But the communication is expensive!

10 Data synchronization of array (Sleeve)

(figure: only the boundary elements of each block on CPU0-CPU3 are exchanged)

- Exchange data only in the "sleeve" region: if only neighbor data is required, communicating just the sleeve area is enough. Example: b[i] = array[i-1] + array[i+1]
- The programmer specifies the sleeve region explicitly:

    #pragma ompd distvar(var=array; dim=1; sleeve=1)
    #pragma ompd sync_sleeve(var=array)

- Unlike the gather operation, communication on the sleeve is cheap.
- The user has to specify the sleeve region and its size.

11 Parallel execution of a "for" loop

(figure: iterations of the loop mapped onto the blocks of array[] owned by CPU0-CPU3)

- The for loop computes on the distributed array; each node executes only the iterations whose data region it owns.
- Execute the for loop in parallel with affinity to the array distribution: #pragma ompd for affinity(array)
- Example: for(i = 2; i <= 10; i++)

12 Experimental Results
- Constant speed-up with moderate scalability.
- Performance is degraded by the lack of multi-dimensional array distribution.

13 Related Work
- OpenMP: shared memory only.
- Unified Parallel C: a PGAS (Partitioned Global Address Space) language.
- Co-Array Fortran: also PGAS.
- The latter two provide alternative programming models to MPI for distributed memory.
- OpenWP?

14 Future Work and Plan for OpenMPD
- Multi-dimensional array distribution and nested parallel loop execution.
- Integration of PGAS features for more flexible communication patterns and data distribution (the current OpenMPD supports only typical cases): remote memory access (one-sided communication), and allocating on each node only its assigned part of the data, which requires address translation.
- Support for hybrid programming with OpenMP within a node on SMP/multicore clusters, even together with MPI.

15 Agenda
- OpenMPD: a directive-based programming model for distributed memory
- What is required for "standard" distributed parallel programming model?

16 Message Passing Model (MPI)
- Message passing was the dominant programming model in the past. ... Yes.
- Message passing is the dominant programming model today. ... Unfortunately, yes.
- Will OpenMP be a programming model for future systems? ... I hope so, but it is not perfect: OpenMP is only for the shared memory model, and (I think) some features for performance tuning are missing: data mapping, scalability, I/O, ...

17 For application programmers
- Are programmers satisfied with MPI? Yes ... ? Many programmers write MPI.
- Is MPI enough for parallelizing scientific programs?
- An application programmer's concern is to get their answers faster!
- An automatic parallelizing compiler would be best, but ... many problems remain.

18 "Life is too short for MPI" (from the WOMPAT 2001 T-shirt message)

Simple N-body problem:

    for(i = 0; i < n_particles; i++){
        p = &particles[i];
        ax = 0.0; ay = 0.0; az = 0.0;
        for(j = 0; j < n_particles; j++){
            if(i == j) continue;
            q = &particles[j];
            dx = p->x - q->x;
            dy = p->y - q->y;
            dz = p->z - q->z;
            X = dx * dx + dy * dy + dz * dz;
            if (X < r2) {
                f = m * (X - a2) * (X - b2);
                ax += f * dx;
                ay += f * dy;
                az += f * dz;
            }
        }
        p->ax = ax; p->ay = ay; p->az = az;
    }

    for(i = 0; i < n_particles; i++){
        p = &particles[i];
        p->x += p->vx * DT;
        p->y += p->vy * DT;
        p->z += p->vz * DT;
        p->vx += p->ax * DT;
        p->vy += p->ay * DT;
        p->vz += p->az * DT;
    }

- With MPI: data partitioning, scheduling, communication (broadcast, reduction). It takes several hours.
- With OpenMP: just put #pragma omp parallel at the loop!!! It takes just a few tens of minutes!

19 Parallel programming languages

Jade, HPC++, MPC++, HPF, Linda, Mentat, Fortran M, Occam, APL, SAL, pC++, SISAL, NESL, Cilk, pHaskell, Prolog, Orca, mpC, C*, Dataparallel C, Split-C, Fortran D, V, Charm++, CODE, ZPL, Fortran X3H5, ...

- A programming language's design reflects its model.
- So far, many parallel programming languages have been proposed by the computer science community.
- Are they actually used by application users? Where have they gone? What is missing in them?

20 Think about MPI ...

Why was MPI accepted and so successful?
- Portability: most parallel computing platforms can run MPI programs (even SMPs), and much free and portable software exists, such as MPICH.
- Education: the MPI standard allows many programmers to learn MPI parallel programming, in universities and from books.

21 Discussion
- The demand for parallel programming is increasing: low-cost PC clusters, SMPs in PC boxes, on-chip multiprocessors ... multiprocessors even in PDAs, now!
- Of course, a clear and excellent modeling concept, good performance, and many other factors are important.
- Standardization and education are important for widespread use: standardization enables good education, and the standard must be available on many platforms.

22 Discussion: Standardization and Education
- The cost of parallelization is also important for acceptance by application programmers: it must be easy to move from the original sequential program, and what application programmers need to learn must be small.
- We plan to organize a group for a "standard" parallel programming language for petaflops systems: it will be supported by RIKEN, we will try to find funding for development, and it should be international.
- For a standard, the "agreement" process is more important than "advanced" ideas.

