OCAPIE project
(Ottimizzazione di CAlcolo Parallelo Intensivo Applicate a problematiche in ambito Energetico: Optimization of Intensive Parallel Computing Applied to Energy-related Problems)
W. Borreani (INFN) & G. Ciaccio (DIBRIS-Unige)
CCR Workshop – May, LNGS

Outline
- Project description and objectives
- Codes to be used (some)
- KNL architecture
- KNL Genova Cluster
- Performance tests and scalability

OCAPIE collaboration: P. Saracco, W. Borreani, A. Brunengo, V. Calvelli, G. Ciaccio (Unige), M. Corosu, S. Farinon, G. Lomonaco (Unige), R. Musenich, M. Osipenko, M. Ripani, M. Taiuti (Unige) + PoliTo, PoliMi, ANN (Ansaldo Nucleare spa), ASG Superconductors
Project description and objectives
Financed in 2016 by "Compagnia di San Paolo" (Torino), following a public call and selection.
Amount of the project: 200.000 € (180.000 € financed), used to buy servers and to fund 2 AdR positions (W. Borreani, V. Calvelli), plus travel.
Objectives: "the study and the development of parallel computing technologies useful to the description of multiscale/multiphysics phenomena in the broad field of energy production and distribution: complex systems modelization usually is grounded on a variety of physical models, because of the wide differences in the scales at which different physical phenomena occur."
"The traditional approach to describe such complex systems relies on the sequential use of simulation tools specifically tailored for each subset of the physical problems involved: (…) Realization of a full and reliable model usually requires many iterations of this computational process; then completion of the calculation is computationally expensive and requires large amount of human work…"
The goal is to develop and optimize an intermediate approach: parallelization and (partial) integration of existing, reliable computational/simulation codes, using the newest hardware and software technologies. This should reduce both the computational demands and the time needed for the full R&D process, thanks to the (at least partial) use of computational methodologies optimized for the different physical processes involved, and to removing the need for a validation phase, which has already been carried out over decades for each of the codes.
Codes to be used
- Transport codes: mainly MCNP6 (OpenMP, MPI), but we also plan to use GEANT4, Tripoli, Fluka and Serpent 2 (which integrates thermohydraulics, at least partially). Typical run times for realistic configurations on a single CPU: ∼ month(s).
- OpenFOAM v. 1612+ (MPI) for 3D thermohydraulics modelling. Typical run times for realistic configurations on a single CPU: steady state ∼ day(s), transients ∼ month(s).
- Transport and thermohydraulics must be iterated until convergence (see the sketch after this list); different solvers and calculation meshes. Use of validated codes for industrial design.
- In-house developed code (OpenMP+MPI), dedicated to quench propagation in superconducting devices (see later on). Typical run times for realistic configurations on a single CPU: days or more.
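The iteration between transport and thermohydraulics is, at its core, a fixed-point loop over exchanged fields. Below is a minimal sketch of such a coupling, assuming hypothetical wrapper functions run_transport() and run_thermohydraulics() (the real codes are external executables; the toy feedback formulas are illustrative only, not project code):

    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Hypothetical stand-ins for the external codes (e.g. MCNP6, OpenFOAM);
    // a real run would wrap code execution and field exchange on a mesh.
    std::vector<double> run_transport(const std::vector<double>& T) {
        std::vector<double> p(T.size());
        for (size_t i = 0; i < T.size(); ++i)
            p[i] = 1.0 / (1.0 + 1e-3 * T[i]);      // toy temperature feedback on power
        return p;
    }
    std::vector<double> run_thermohydraulics(const std::vector<double>& power) {
        std::vector<double> T(power.size());
        for (size_t i = 0; i < power.size(); ++i)
            T[i] = 300.0 + 100.0 * power[i];       // toy heating law
        return T;
    }

    double max_diff(const std::vector<double>& a, const std::vector<double>& b) {
        double m = 0.0;
        for (size_t i = 0; i < a.size(); ++i)
            m = std::max(m, std::fabs(a[i] - b[i]));
        return m;
    }

    // Picard iteration: alternate the two physics until the temperature field
    // stops changing. Each outer iteration costs one full run of each code,
    // which is what makes the whole process so expensive on a single CPU.
    std::vector<double> couple(std::vector<double> T, double tol, int max_iter) {
        for (int it = 0; it < max_iter; ++it) {
            std::vector<double> power = run_transport(T);          // neutronics
            std::vector<double> T_new = run_thermohydraulics(power); // CFD
            if (max_diff(T, T_new) < tol) return T_new;            // converged
            T = T_new;
        }
        return T;   // best effort after max_iter iterations
    }

    int main() {
        std::vector<double> T0(1000, 300.0);        // initial temperature guess
        std::vector<double> T = couple(T0, 1e-6, 50);
        std::printf("converged T[0] = %f\n", T[0]);
        return 0;
    }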
KNL package
OmniPath is available on some models only.
KNL internal mesh
KNL tile
KNL MCDRAM
KNL clustering modes
MKL @ KNL and Xeon
MKL dcsrgemv() @ KNL and Xeon
MKL @ KNL: hyperthreading
Peak performance is reached with #threads = #cores; hyperthreading brings no improvement. Possible reasons:
- BLAS level 3 (CPU-bound): the code is so well optimized that one thread is enough to saturate a core.
- BLAS level 2 and sparse (memory-bound): 64 threads are enough to saturate the DRAM bandwidth.
But we had no time to investigate this (do we need to?). A thread-scaling measurement of this kind is sketched below.
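A minimal sketch of how such a scaling curve can be measured, assuming MKL is available; the matrix size and the thread counts (up to 4-way hyperthreading on a 64-core KNL) are illustrative:

    #include <mkl.h>
    #include <chrono>
    #include <cstdio>
    #include <vector>

    int main() {
        const int n = 4096;                      // assumed problem size
        std::vector<double> A(n * n, 1.0), B(n * n, 1.0), C(n * n, 0.0);
        for (int t : {16, 32, 64, 128, 256}) {
            mkl_set_num_threads(t);              // cap MKL's thread pool
            auto t0 = std::chrono::steady_clock::now();
            cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                        n, n, n, 1.0, A.data(), n, B.data(), n, 0.0, C.data(), n);
            std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
            double gflops = 2.0 * n * n * n / dt.count() / 1e9;
            std::printf("%3d threads: %7.1f GFLOP/s\n", t, gflops);
        }
        return 0;
    }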
KNL: notes
- quad/cache and a2a/cache look like the best modes overall.
- a2a/flat-mcdram is good too, but limited to small jobs (16 GB max), so not quite practical.
- Weird performance of BLAS level 2 and sparse on snc4/cache; maybe because such a NUMA-like configuration requires suitable data placement in memory.
- Sparse matrix × vector: MKL shows a bare 1.3× speedup over a handcrafted kernel written in C (CSR format, Intel compiler) when the size is large (>100M). A sketch of such a handcrafted kernel follows.
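For reference, a handcrafted CSR sparse matrix-vector product of the kind compared against MKL's dcsrgemv() would look roughly like this (a sketch; the OpenMP parallelization and the names are assumptions, not the project's actual code):

    #include <omp.h>

    // y = A*x, with A in CSR format: row_ptr has n+1 entries,
    // col_idx and val have row_ptr[n] entries (the nonzeros).
    void csr_spmv(int n, const int *row_ptr, const int *col_idx,
                  const double *val, const double *x, double *y)
    {
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < n; ++i) {
            double sum = 0.0;
            for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
                sum += val[k] * x[col_idx[k]];   // gather from x, memory-bound
            y[i] = sum;
        }
    }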
superconducting magnet
An electromagnet made of windings of a cable containing a superconducting wire inside.
Zero resistance: a huge current (… Ampere) can flow indefinitely, with no power supply during service and no heat dissipation.
Intense magnetic field (of the order of 10 Tesla), not feasible with ordinary conductors.
A cooling system (liquid He) is necessary to keep the superconducting state.
the cable
applications: magnetic resonance
applications: nuclear physics
quench
Loss of superconductivity in a region of the wire. Various causes: a defective wire, mechanical stress, electric malfunction, cooling failure, a swarm of nuclear particles. More likely during the initial charging of the magnet, but also possible during normal operation.
The resistive region dissipates energy as heat; the heat spreads to nearby regions, triggering further losses of superconductivity; energy dissipation increases, more heat is generated... a thermal runaway (a toy illustration follows).
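A toy 1D sketch of the runaway just described (not QUEPS itself): explicit finite-difference heat diffusion along a wire, with Joule heating switched on wherever the local temperature exceeds the critical one. All material constants are illustrative placeholders:

    #include <cstdio>
    #include <vector>

    int main() {
        const int    N   = 1000;   // cells along the wire
        const double dx  = 1e-3;   // cell size [m] (assumed)
        const double dt  = 1e-5;   // time step [s], small enough for stability
        const double D   = 1e-4;   // thermal diffusivity [m^2/s] (assumed)
        const double Tc  = 9.0;    // critical temperature [K], NbTi-like
        const double Top = 4.2;    // operating temperature [K], liquid He
        const double q   = 5e3;    // Joule heating rate in the normal zone [K/s] (assumed)

        std::vector<double> T(N, Top), Tn(N);
        for (int i = 495; i < 505; ++i) T[i] = 12.0;   // initial hot spot

        for (int step = 0; step < 20000; ++step) {
            for (int i = 1; i < N - 1; ++i) {
                double diff  = D * (T[i - 1] - 2 * T[i] + T[i + 1]) / (dx * dx);
                double joule = (T[i] > Tc) ? q : 0.0;  // resistive only above Tc
                Tn[i] = T[i] + dt * (diff + joule);
            }
            Tn[0] = Top; Tn[N - 1] = Top;              // cold boundaries
            T.swap(Tn);
            if (step % 5000 == 0) {                    // watch the normal zone grow
                int normal = 0;
                for (double Ti : T) if (Ti > Tc) ++normal;
                std::printf("t=%.2f s  normal-zone cells: %d\n", step * dt, normal);
            }
        }
        return 0;
    }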
quench
Superconducting magnets are expensive toys... safety devices must detect and absorb the quench before permanent damage occurs. Correctly dimensioning the safety devices requires a model of the quench behaviour in space and time; hence the possible use of a simulator.
quench simulators
Some quench simulators are available, but:
- simplified model of the cable (single material);
- source code not available, so impossible to customize;
- slow execution; acceptable run times are achieved at the expense of accuracy, rather than through HPC techniques.
So there is room for improvement, especially with parallel computing!
QUEPS: QUEnch Parallel Simulator
A new quench simulator, written from scratch; work in progress. Start simple, but with a look ahead:
- born as a parallel program: C++, MPI+OpenMP
- multiple-material cable
- solenoids only, no iron (for now)
- cooling: adiabatic, convection
- discretization: finite volume
- solver: BiCGSTAB + preconditioner (a minimal solver sketch follows)
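A minimal, unpreconditioned and serial BiCGSTAB sketch, to illustrate the solver named on this slide (the actual QUEPS solver is preconditioned and parallel). It reuses the csr_spmv() kernel sketched earlier, which must be linked in:

    #include <cmath>
    #include <vector>

    void csr_spmv(int, const int*, const int*, const double*,
                  const double*, double*);   // kernel sketched earlier

    static double dot(const std::vector<double>& a, const std::vector<double>& b) {
        double s = 0.0;
        for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
        return s;
    }

    // Solves A x = b (initial guess x = 0), A given in CSR format.
    // Returns the iteration count on convergence, -1 otherwise.
    int bicgstab(int n, const int* row_ptr, const int* col_idx, const double* val,
                 const std::vector<double>& b, std::vector<double>& x,
                 double tol = 1e-8, int max_iter = 1000)
    {
        std::vector<double> r = b, rhat = b, p(n, 0.0), v(n, 0.0), s(n), t(n);
        x.assign(n, 0.0);
        double rho = 1.0, alpha = 1.0, omega = 1.0;
        const double bnorm = std::sqrt(dot(b, b));

        for (int it = 1; it <= max_iter; ++it) {
            double rho_new = dot(rhat, r);
            double beta = (rho_new / rho) * (alpha / omega);
            for (int i = 0; i < n; ++i) p[i] = r[i] + beta * (p[i] - omega * v[i]);
            csr_spmv(n, row_ptr, col_idx, val, p.data(), v.data());
            alpha = rho_new / dot(rhat, v);
            for (int i = 0; i < n; ++i) s[i] = r[i] - alpha * v[i];
            csr_spmv(n, row_ptr, col_idx, val, s.data(), t.data());
            omega = dot(t, s) / dot(t, t);
            for (int i = 0; i < n; ++i) x[i] += alpha * p[i] + omega * s[i];
            for (int i = 0; i < n; ++i) r[i] = s[i] - omega * t[i];
            if (std::sqrt(dot(r, r)) < tol * bnorm) return it;   // converged
            rho = rho_new;
        }
        return -1;   // not converged within max_iter
    }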
QUEPS @ KNL and Xeon
Test with a solenoid from the literature:
- 30 layers, 24 turns per layer
- thin cable: NbTi + Cu + outer insulation
- adiabatic cooling
- simulated time: 1.2 s (quench inception at t = 0)
- mesh: 3.8 million cells
Tests on a single KNL node (OpenMP, MPI, hybrid; a minimal hybrid setup is sketched below), compared with a dual Xeon E5 node, 2 × 8 cores @ 2.6 GHz.
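For context, a minimal sketch of the hybrid MPI+OpenMP setup used in tests like these; the funneled threading level is an illustrative assumption:

    #include <mpi.h>
    #include <omp.h>
    #include <cstdio>

    int main(int argc, char** argv) {
        int provided, rank;
        // Ask MPI for a threading level where only the main thread calls MPI.
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        #pragma omp parallel
        {
            // Each MPI rank spawns its own OpenMP team over its local cells.
            #pragma omp single
            std::printf("rank %d: %d threads\n", rank, omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }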
QUEPS @ KNL and Xeon: MPI
MCDRAM configured in cache mode; execution times in minutes.

KNL:        64 processes   128 processes   256 processes
a2a         125            81              86
quad        134            80.5            85
snc4        125.5          78.5            82

Xeon:       16 processes   32 processes (hyperthreading)
dual Xeon   59.5           77
QUEPS @ KNL and Xeon: OpenMP
MCDRAM configured in cache mode; execution times in minutes.

KNL:        64 threads   128 threads   256 threads
a2a         55.5         39            35.5
quad        53           34            –
snc4        58           32            –

Xeon:       16 threads   32 threads (hyperthreading)
dual Xeon   65           –
OpenFOAM tests
OpenFOAM (MPI) single node tests: icoFoam. (A sketch of the parallel decomposition setup follows.)
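OpenFOAM runs like these are distributed over MPI ranks by decomposing the mesh; the settings live in system/decomposeParDict. A minimal sketch, where the subdomain count (one rank per KNL core) and the choice of the scotch partitioner are assumptions, not the settings actually used in these tests:

    // system/decomposeParDict (illustrative)
    FoamFile
    {
        version     2.0;
        format      ascii;
        class       dictionary;
        object      decomposeParDict;
    }

    numberOfSubdomains  64;      // one MPI rank per KNL core (assumed)
    method              scotch;  // graph partitioner; no manual layout needed

The case is then split with decomposePar and run in parallel, e.g. with mpirun -np 64 icoFoam -parallel.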
OpenFOAM tests
OpenFOAM (MPI) single node tests: pimpleFoam [transient, incompressible, LES], 4M cells.
MCNP tests
MCNP (MPI+OMP) single node tests: OMP only, and MPI only.
MCNP tests
MCNP (MPI+OMP) 2-node tests. With 2× hyperthreading the speed-up is about 1.54. Better performance is obtained with an optimal ratio of MPI processes to OMP threads.