
1 Parallelization Strategies Laxmikant Kale

2 Overview OpenMP Strategies Need for adaptive strategies –Object migration based dynamic load balancing –Minimal modification strategies Thread-based techniques: ROCFLO, ... Some future plans

3 OpenMP Motivation: –Shared memory model often easy to program –Incremental optimization possible

4 ROCFLO via OpenMP Parallelization of ROCFLO using a loop-parallel paradigm via OpenMP –Poor speedup compared with MPI version –Was locality the culprit? Study conducted by Jay Hoeflinger –In collaboration with Fady Najjar

5 ROCFLO with MPI

7 The Methodology Do OpenMP/MPI comparison experiments. Write an OpenMP version of ROCFLO –Start with the MPI version of ROCFLO, –Duplicate the structure of the MPI code exactly (including message passing calls). –This removes locality as a problem. Measure performance –If any parts do not scale well, determine why.

9 Barrier Cost: MPI vs OpenMP (Origin 2000)

15 So locality was not the whole problem! The other problems turned out to be: –I/O, which doesn’t scale –ALLOCATE, which doesn’t scale –our non-scaling reduction implementation –our first-cut messaging infrastructure, which could be improved Conclusion –An efficient loop-parallel version may be feasible, avoiding ALLOCATE calls and using scalable I/O

16 Need for adaptive strategies Computation structure changes over time: –Combustion Adaptive techniques in application codes: –Adaptive refinement in structures or even fluid –Other codes such as crack propagation Can affect the load balance dramatically –One can go from 90% efficiency to less than 25%
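The drop in efficiency quoted on this slide can be illustrated with a little arithmetic. The sketch below assumes a simple model that the slides do not state: each step takes as long as the most heavily loaded processor, so efficiency is the average load divided by the maximum load. The eight per-processor loads are invented for illustration.

```python
def parallel_efficiency(loads):
    """Average load divided by maximum load (1.0 means perfectly balanced)."""
    return sum(loads) / (len(loads) * max(loads))


# Hypothetical per-processor work units before any refinement.
balanced = [10, 9, 9, 9, 9, 9, 9, 8]

# Hypothetical refinement: the work on processor 0 grows sevenfold.
refined = [70] + balanced[1:]

print(f"before refinement: {parallel_efficiency(balanced):.2f}")
print(f"after refinement:  {parallel_efficiency(refined):.2f}")
```

Under this model a single refined region is enough to drag the whole run down, because every other processor idles while the overloaded one finishes.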

17 Multi-partition decompositions Idea: decompose the problem into a number of partitions, –independent of the number of processors –# Partitions > # Processors The system maps partitions to processors –The system should be able to map and re-map objects as needed
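A minimal sketch of the multi-partition idea, assuming a 1-D domain of cells. The function names (decompose, initial_map, remap) and the round-robin starting map are hypothetical choices made for illustration, not the system's actual interface.

```python
def decompose(n_cells, n_partitions):
    """Split cell indices into contiguous partitions.

    The partition count is chosen without reference to the processor count.
    """
    base, extra = divmod(n_cells, n_partitions)
    partitions, start = [], 0
    for p in range(n_partitions):
        size = base + (1 if p < extra else 0)
        partitions.append(range(start, start + size))
        start += size
    return partitions


def initial_map(n_partitions, n_processors):
    """Round-robin starting map from partition id to processor id."""
    return {p: p % n_processors for p in range(n_partitions)}


def remap(mapping, partition, new_processor):
    """Return a new map with one partition moved to another processor."""
    updated = dict(mapping)
    updated[partition] = new_processor
    return updated


partitions = decompose(n_cells=1000, n_partitions=16)
mapping = initial_map(n_partitions=16, n_processors=4)
mapping = remap(mapping, partition=0, new_processor=3)
```

Because there are more partitions than processors, moving a single partition changes the load in small increments, which is what makes later re-mapping practical.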

18 Load Balancing Framework Aimed at handling... –Continuous (slow) load variation –Abrupt load variation (refinement) –Workstation clusters in multi-user mode Measurement based –Exploits temporal persistence of computation and communication structures –Very accurate (compared with estimation) –instrumentation possible via Charm++/Converse
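Measurement-based balancing can be sketched as follows: take the load each object was measured to have in recent steps, assume it will persist, and reassign objects accordingly. The greedy heaviest-first strategy below is one common choice and is an assumption here; the slides do not say which strategy the framework applies. The measured loads are invented.

```python
import heapq


def greedy_rebalance(measured_load, n_processors):
    """Assign objects to processors, heaviest first, always to the least loaded.

    measured_load maps object id -> time measured in recent steps.
    Returns a dict mapping object id -> processor id.
    """
    heap = [(0.0, proc) for proc in range(n_processors)]
    heapq.heapify(heap)
    assignment = {}
    by_weight = sorted(measured_load.items(), key=lambda kv: kv[1], reverse=True)
    for obj, load in by_weight:
        total, proc = heapq.heappop(heap)  # least loaded processor so far
        assignment[obj] = proc
        heapq.heappush(heap, (total + load, proc))
    return assignment


def processor_loads(measured_load, assignment, n_processors):
    """Sum the measured load placed on each processor."""
    totals = [0.0] * n_processors
    for obj, proc in assignment.items():
        totals[proc] += measured_load[obj]
    return totals


# Hypothetical measurements: object 0 was refined and is now much heavier.
measured = {0: 8.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 4.0}
assignment = greedy_rebalance(measured, n_processors=2)
print(processor_loads(measured, assignment, 2))
```

No cost model of the application is needed: the balancer only sees numbers that were measured, which is the point of exploiting temporal persistence.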

19 Charm++ A parallel C++ library –supports data driven objects –many objects per processor, with method execution scheduled with availability of data –system supports automatic instrumentation and object migration –Works with other paradigms: MPI, OpenMP, ...

20 Load balancing framework

21 Load balancing demonstration To test the abilities of the framework –A simple problem: Gauss-Jacobi iterations –Refine selected sub-domains AppSpector: web based tool –Submit parallel jobs –Monitor performance and application behavior –Interact with running jobs via GUI interfaces
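For reference, the kind of Gauss-Jacobi iteration used as the test problem can be sketched sequentially in one dimension. The grid size, boundary values, and tolerance below are illustrative assumptions; the slide does not give the demonstration's actual setup.

```python
def jacobi_step(u):
    """One Jacobi sweep: each interior point becomes the mean of its neighbours."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = 0.5 * (u[i - 1] + u[i + 1])
    return new


def jacobi(u, tol=1e-10, max_iters=10000):
    """Iterate until the largest pointwise change falls below tol."""
    for iteration in range(1, max_iters + 1):
        new = jacobi_step(u)
        change = max(abs(a - b) for a, b in zip(new, u))
        u = new
        if change < tol:
            return u, iteration
    return u, max_iters


# 11 grid points, boundaries held at 0.0 and 1.0, interior starts at 0.0.
initial = [0.0] * 10 + [1.0]
solution, iterations = jacobi(initial)
```

Refining a sub-domain means giving it more grid points, so its sweep costs more per iteration than its neighbours' sweeps; that is the load imbalance the demonstration provokes.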

23 Adaptivity with minimal modification Current code base is parallel (MPI) –But doesn’t support adaptivity directly –Rewrite the code with objects? ... Idea: support adaptivity with minimal changes to F90/MPI codes Work by: –Milind Bhandarkar, Jay Hoeflinger, Eric de Sturler

24 Migratable threads approach Change required: –Encapsulate global variables in modules (dynamically allocatable) Intercept MPI calls –Implement them in a multithreaded layer Run each original MPI process as a thread –User-level thread Migrate threads as needed by load balancing –Trickier problem than object migration
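Why encapsulating global variables matters can be shown with a rough analogy. In the sketch below, everything one former MPI process owned lives in a single object, so it can be packed into bytes and rebuilt elsewhere. The class RankState and the functions advance and migrate are hypothetical, and this is not how the thread layer itself works: a real migratable thread must also move its stack, which is part of what makes the problem trickier than object migration.

```python
import json


class RankState:
    """Hypothetical container for the former global variables of one MPI process."""

    def __init__(self, rank, step=0, field_values=None):
        self.rank = rank
        self.step = step
        self.field_values = list(field_values or [])

    def to_dict(self):
        return {"rank": self.rank, "step": self.step,
                "field_values": self.field_values}


def advance(state):
    """Stand-in for one step of the application's computation."""
    state.field_values = [v + 1.0 for v in state.field_values]
    state.step += 1


def migrate(state):
    """Pack the state into bytes and rebuild it, as if on another processor."""
    packed = json.dumps(state.to_dict()).encode("utf-8")
    return RankState(**json.loads(packed.decode("utf-8")))


state = RankState(rank=3, field_values=[1.0, 2.0])
advance(state)
moved = migrate(state)  # the copy carries everything needed to continue
advance(moved)
```

If any of that state had stayed in true global variables, two threads sharing a process would overwrite each other's values and nothing could be moved independently.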

25 Progress: Test Fortran-90 - C++ interface Encapsulation feasibility: Thread migration mechanics ROCFLO study: Test code implementation ROCFLO implementation

26 Another approach to adaptivity Cleanly separate parallel and sequential code: –All parallel code in C++ –All application code in Fortran 90 sequential subroutines Needs more restructuring of application codes –But is feasible, especially for new codes –Much easier to migrate –Improves modularity

