Hybrid Parallel Implementation of The DG Method Advanced Computing Department/ CAAM 03/03/2016 N. Chaabane, B. Riviere, H. Calandra, M. Sekachev, S. Hamlaoui.

1 Hybrid Parallel Implementation of The DG Method Advanced Computing Department/ CAAM 03/03/2016 N. Chaabane, B. Riviere, H. Calandra, M. Sekachev, S. Hamlaoui

2 Outline Numerical methods Modern programming models DG method: Implementation and scalability

3 Outline Numerical methods Modern programming models DG method: Implementation and scalability

4 Classical approaches Finite difference:

5 Classical approaches Finite volume: oldwww.unibas.it

6 Limitations The finite volume method is a low order method. The approximate solution is piecewise constant. Very fine mesh = High number of degrees of freedom = Large linear system.
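To make the last point concrete, here is a minimal sketch. It assumes a uniform structured mesh and one scalar unknown per cell; neither is stated on the slide, and the function name is illustrative. With a piecewise-constant approximation there is one unknown per cell, so halving the mesh size in 3D multiplies the size of the linear system by 8.

```python
def fv_unknowns(cells_per_direction, dim=3):
    """Unknowns for a piecewise-constant approximation: one per cell.

    Assumes a uniform structured mesh (an illustrative assumption).
    """
    return cells_per_direction ** dim


coarse = fv_unknowns(100)  # 100 cells per direction in 3D
fine = fv_unknowns(200)    # mesh size halved in every direction
print(coarse, fine, fine // coarse)
```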

7 DG-Finite Element Method Allows us to use higher order approximation. Allows the modelling of complex geometries. Modern methods such as the DG method allow hp-refinement to be implemented relatively easily. [Figure: mesh elements labelled p=2, p=1, p=3]

8 DG-Finite Element Method Allows us to use higher order approximation. Allows the modelling of complex geometries. Modern methods such as the DG method allow hp-refinement to be implemented relatively easily.

9 Outline Numerical methods Modern programming models DG method: Implementation and scalability

10 Serial Computers A serial computer has 1 Central Processing Unit (CPU) and 1 Memory Unit.

11 From Serial to Parallel: Step I Idea: Add more cores! => Multi-core processor/CPU. Architecture: Uniform memory access (UMA). [Diagram: UMA node with a memory unit and a multi-core CPU; memory access at Speed A]

12 From Serial to Parallel: Step II Idea: Add more processors => Multi-processor nodes. Architecture: Non-uniform memory access (NUMA). [Diagram: NUMA node with memory units and two multi-core CPUs; access speeds A and B] Speed A > Speed B

13 From Serial to Parallel: Step III Idea: Connect nodes by network (actual wires). Result: The majority of supercomputers around 2010. Architecture: Interconnected NUMA nodes. [Diagram: NUMA nodes connected by a network at Speed C] Speed A > Speed B > Speed C

14 Outline Numerical methods Modern programming models DG method: Implementation and scalability

15 Domain Decomposition and SPMD Single program, multiple data (SPMD) is the most common style of parallel programming. Tasks are split up and run simultaneously on multiple processors with different input in order to obtain results faster. The same program is executed on every processor.
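The SPMD idea can be sketched in plain Python, without MPI. This is a simulation under stated assumptions, not the presentation's code: the names `local_range` and `program` are hypothetical, and the "ranks" are simply loop iterations rather than real processes. Every rank runs the same function, and only the slice of data it works on differs.

```python
def local_range(rank, size, n):
    """Contiguous block [start, stop) of n items owned by `rank` out of `size` ranks."""
    base, extra = divmod(n, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def program(rank, size, data):
    """The single program: each rank sums the squares of its own slice only."""
    start, stop = local_range(rank, size, len(data))
    return sum(x * x for x in data[start:stop])


data = list(range(10))
size = 4  # number of simulated ranks (illustrative)

# In a real SPMD run these calls would execute simultaneously, one per process.
partials = [program(rank, size, data) for rank in range(size)]
total = sum(partials)  # stands in for a reduction across ranks
print(partials, total)
```

In an actual MPI code the final sum would be a collective reduction; here it is an ordinary `sum` over the simulated ranks.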

16 Domain Decomposition [Figure: domain split between Core 1 and Core 2, with a ghost region]
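A ghost region can be illustrated on a 1D chain of elements. The sketch below assumes that each element neighbours only the elements immediately to its left and right, and the helper name `ghost_elements` is hypothetical. A core's ghost elements are the neighbours of its owned elements that belong to another core.

```python
def ghost_elements(owned, n_elements):
    """Elements adjacent to `owned` but not in it, for a 1D chain 0..n_elements-1."""
    owned_set = set(owned)
    ghosts = set()
    for e in owned_set:
        for neighbour in (e - 1, e + 1):
            if 0 <= neighbour < n_elements and neighbour not in owned_set:
                ghosts.add(neighbour)
    return sorted(ghosts)


n_elements = 8
core1 = range(0, 4)  # elements owned by Core 1 (illustrative split)
core2 = range(4, 8)  # elements owned by Core 2

print(ghost_elements(core1, n_elements))  # copies Core 1 needs from Core 2
print(ghost_elements(core2, n_elements))  # copies Core 2 needs from Core 1
```

Each core keeps a copy of its ghost elements and refreshes it from the owning core whenever the neighbouring values change.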

17 Domain Decomposition of The FE Method Core 1

18 Domain Decomposition of The FE Method [Figure: Core 1 and Core 2 subdomains communicating via MPI]

19 Load Balance The domain decomposition is done by elements. Assign weights to the elements to ensure load balance. [Figure: mesh elements labelled p=2, p=1, p=3]
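A minimal sketch of element weighting follows, under assumptions the slide does not state: the weight of an element is taken to be its number of degrees of freedom for a full polynomial space of degree p on a 3D element, (p+1)(p+2)(p+3)/6, and a simple greedy heuristic stands in for a real graph partitioner. The greedy rule ignores element connectivity, so it only illustrates the balancing of weights; all function names are hypothetical.

```python
def dofs_per_element(p):
    """Dimension of the degree-p polynomial space in 3D (assumed weight)."""
    return (p + 1) * (p + 2) * (p + 3) // 6


def greedy_partition(degrees, n_parts):
    """Assign each element to the currently lightest part, heaviest elements first."""
    weights = [dofs_per_element(p) for p in degrees]
    loads = [0] * n_parts
    owner = [None] * len(degrees)
    # sorted() is stable, so equal weights keep their original order.
    for e in sorted(range(len(degrees)), key=lambda i: -weights[i]):
        part = loads.index(min(loads))
        owner[e] = part
        loads[part] += weights[e]
    return owner, loads


degrees = [1, 1, 1, 1, 2, 2, 3]  # polynomial degree per element (illustrative)
owner, loads = greedy_partition(degrees, n_parts=2)
print(owner, loads)
```

Splitting these seven elements by count alone would give 4 and 3 elements, but the weighted split balances the number of unknowns per part instead.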

20 Strong Scalability CRAY machine: 52 nodes with 2 CPUs per node => Total number of cores = 1040. We use Hypre* to solve the linear system. * http://acts.nersc.gov/hypre/

21 Strong Scalability CRAY machine: 52 nodes with 2 CPUs per node => Total number of cores = 1040. We use Hypre* to solve the linear system. * http://acts.nersc.gov/hypre/
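Strong scalability is usually reported as speedup and parallel efficiency for a fixed problem size. The sketch below uses the standard definitions; the timings are hypothetical placeholders, not measurements from the presentation, and the function name is illustrative. The core count per CPU is not stated on the slide but follows from the stated totals.

```python
# Machine size from the slide; cores per CPU is implied by the stated total.
nodes, cpus_per_node, total_cores = 52, 2, 1040
cores_per_cpu = total_cores // (nodes * cpus_per_node)


def strong_scaling(t_ref, n_ref, t_n, n):
    """Speedup and efficiency of a run on n cores relative to a run on n_ref cores.

    The problem size is the same in both runs.
    """
    speedup = t_ref / t_n
    efficiency = speedup / (n / n_ref)
    return speedup, efficiency


# Hypothetical timings in seconds, NOT results from the presentation.
speedup, efficiency = strong_scaling(t_ref=100.0, n_ref=20, t_n=12.5, n=320)
print(cores_per_cpu, speedup, efficiency)
```

An efficiency of 1.0 would mean that the run time drops in exact proportion to the number of cores added.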

22 Weak Scalability
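In a weak scalability study the problem size grows with the core count so that the work per core stays fixed, and the ideal outcome is a constant run time. A minimal sketch of the usual efficiency measure follows; the timings are hypothetical placeholders, not measurements from the presentation, and the function name is illustrative.

```python
def weak_efficiency(t_ref, t_n):
    """Weak-scaling efficiency: reference run time divided by run time on n cores.

    Work per core is held fixed, so 1.0 means perfect weak scaling.
    """
    return t_ref / t_n


# Hypothetical run times in seconds, keyed by core count (NOT measured data).
runs = {20: 50.0, 160: 55.0, 1040: 62.5}
t_ref = runs[20]
efficiencies = {n: weak_efficiency(t_ref, t) for n, t in runs.items()}
print(efficiencies)
```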

23 Evolution of Supercomputers: GPUs Idea: Complement CPUs with accelerators/co-processors. Result: The biggest supercomputers today. Architecture: Hybrid. [Diagram: NUMA nodes, each with a CPU and a GPU, connected by a network at Speed C]

24 Domain Decomposition of The FE Method Node 1

25 Domain Decomposition of The FE Method [Figure: Node 1 and Node 2 subdomains communicating via MPI]

26 Scalability of The Hybrid Implementation I Comparison between HYPRE and AMGX made using 2 CPUs per node for HYPRE and one Tesla K40 GPU per node for AMGX.

27 Drawbacks [Diagram: NUMA node with CPU cores and a GPU; labels: SUBDOMAIN i, Uniform Access, Linear system]

28 Optimized Implementation: OpenMP [Diagram: NUMA node with CPU cores and a GPU; labels: SUBDOMAIN i, Access, Linear system, OpenMP]

29 Scalability of The Hybrid Implementation II

30 Conclusion We developed highly scalable software that takes modern technology into account to simulate geophysical applications. hp-refinement is fairly easy as a result of using the DG method. Load balancing is ensured using ParMETIS.

