1 Limited area model weather prediction using COSMO with emphasis on the numerics of dynamical cores (including some remarks on the NEC vector supercomputer). Weather and climate prediction on next generation supercomputers, 22-25 Oct. 2012, UK Met Office, Exeter. Michael Baldauf (DWD), Oliver Fuhrer (MeteoCH), Bogdan Rosa, Damian Wojcik (IMGW)

2 Outline:
1. Revision of the current dynamical core ('Runge-Kutta'): redesign of the fast waves solver
2. Computational aspects on the NEC vector computer; a stencil library for COSMO
3. Alternative dynamical core developments in COSMO
4. A modified idealised test case with an analytic solution for the compressible Euler equations

3 Revision of the current dynamical core: redesign of the fast waves solver. Time integration scheme of COSMO (Wicker, Skamarock (2002) MWR): split-explicit 3-stage Runge-Kutta → stable integration of upwind advection with the large time step ΔT; these tendencies are added in each fast-waves step (small time step Δt).
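
As orientation for how such a split-explicit RK3 step is organised, here is a minimal, self-contained sketch applied to a toy scalar ODE dq/dt = f_slow(q) + f_fast(q). The program and tendency functions are hypothetical and only illustrate the splitting structure (stage lengths ΔT/3, ΔT/2, ΔT; the slow tendency frozen per stage and added in every small step); this is not the actual COSMO code.

  program ws2002_rk3_sketch
    ! Minimal sketch of the Wicker-Skamarock (2002) split-explicit RK3
    ! time stepping for dq/dt = f_slow(q) + f_fast(q).  Illustration only.
    implicit none
    real    :: q, q_stage, fs, dT, dtau
    integer :: ns, stage, n, nsmall, step

    q    = 1.0
    dT   = 0.1            ! large (advective) time step
    ns   = 6              ! number of small (acoustic) steps per large step
    dtau = dT / real(ns)  ! small time step

    do step = 1, 100
       q_stage = q
       do stage = 1, 3
          fs      = f_slow(q_stage)     ! slow tendency, frozen over the stage
          nsmall  = ns / (4 - stage)    ! ns/3, ns/2, ns small steps
          q_stage = q                   ! each RK stage restarts from time t
          do n = 1, nsmall
             ! explicit small step: fast tendency plus the frozen slow tendency
             q_stage = q_stage + dtau * (f_fast(q_stage) + fs)
          end do
       end do
       q = q_stage
    end do
    print *, 'q after 100 large steps:', q

  contains
    real function f_slow(x)   ! stands in for advection / physics
      real, intent(in) :: x
      f_slow = -0.5 * x
    end function f_slow
    real function f_fast(x)   ! stands in for sound / gravity-wave terms
      real, intent(in) :: x
      f_fast = -5.0 * x
    end function f_fast
  end program ws2002_rk3_sketch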

4 'Fast waves' processes (p'T'-dynamics), with D = div v; f_u, f_v, ... denote advection, Coriolis force and all physical parameterizations. The terms shown on the slide are sound, buoyancy, and an artificial divergence damping, which stabilizes the whole RK scheme. Spatial discretization: centered differences (2nd order). Temporal discretization: horizontally forward-backward, vertically implicit (Skamarock, Klemp (1992) MWR; Baldauf (2010) MWR).
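
For orientation, a horizontally forward-backward step for the 1D linearized acoustic subsystem looks as follows; this is a textbook sketch under a strong simplification (no buoyancy, no divergence damping, no implicit vertical part), not the exact COSMO update:

  u^{n+1} = u^{n} - \Delta\tau \, \frac{1}{\rho_0} \frac{\partial p'^{\,n}}{\partial x},
  \qquad
  p'^{\,n+1} = p'^{\,n} - \Delta\tau \, \rho_0 c_s^2 \, \frac{\partial u^{n+1}}{\partial x}

i.e. the velocity is updated with the old pressure, and the pressure update then already uses the new velocity; this staggering in time is what gives the forward-backward scheme its favourable stability for sound waves.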

5 Main changes compared to the current solver:
1. improvement of the vertical discretization: use of weighted averaging operators for all vertical operations
2. divergence in strong conservation form
3. optional: complete 3D (= isotropic) divergence damping
4. optional: Mahrer (1984) discretization of the horizontal pressure gradients
Additionally some 'technical' improvements and hopefully a certain increase in code readability. Overall goal: improve the numerical stability of COSMO → the new version fast_waves_sc.f90 is contained in the official COSMO 4.24.

6 1. Improvement of the vertical discretization. Averages from half levels to the main level: centered differences (2nd order when used in this direction). Averages from main levels to the half level: with appropriate weightings (!). G. Zängl could show the advantages of weighted averages in the explicit parts of the fast waves solver. New: application to all vertical operations (also the implicit ones).

7 Divergence on a stretched grid (layer thicknesses 1/s·dz, dz, s·dz): with a weighted average of u (and v) to the half levels, versus with only an arithmetic average of u (and v) to the half levels. Discretization error in stretched grids: with the arithmetic average the divergence is not a consistent discretization if the stretching factor s ≠ 1!
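
To make the effect of the weighting concrete, here is a small self-contained sketch (hypothetical names, not COSMO code): a linear profile is interpolated from two main levels to the half level in between on a stretched grid, once arithmetically and once with distance weights; only the weighted average reproduces the linear profile exactly.

  program weighted_average_sketch
    ! Interpolation of a linear profile f(z) = 2*z + 1 from two main levels
    ! to the half level between them, on a stretched grid (layer thicknesses
    ! dz and s*dz).  Illustrative only; not COSMO code.
    implicit none
    real :: dz, s, z_lower, z_upper, z_half
    real :: f_lower, f_upper, f_exact, f_arith, f_weighted
    real :: w_lower, w_upper

    dz = 100.0        ! thickness of the lower layer
    s  = 3.0          ! stretching factor: upper layer thickness = s*dz

    z_lower = 0.5 * dz              ! main level of the lower layer
    z_half  = dz                    ! half level between the two layers
    z_upper = dz + 0.5 * s * dz     ! main level of the upper layer

    f_lower = 2.0 * z_lower + 1.0
    f_upper = 2.0 * z_upper + 1.0
    f_exact = 2.0 * z_half  + 1.0

    ! arithmetic average (ignores the stretching)
    f_arith = 0.5 * (f_lower + f_upper)

    ! distance-weighted (linear) average: exact for linear profiles
    w_lower    = (z_upper - z_half) / (z_upper - z_lower)
    w_upper    = (z_half - z_lower) / (z_upper - z_lower)
    f_weighted = w_lower * f_lower + w_upper * f_upper

    print *, 'exact     :', f_exact
    print *, 'arithmetic:', f_arith      ! biased on the stretched grid
    print *, 'weighted  :', f_weighted   ! matches the exact value
  end program weighted_average_sketch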

8 Buoyancy term (~ T'/T0) on a stretched grid (layer thicknesses dz, s·dz, 1/s·dz): the slide compares the buoyancy term with a weighted average of T' (T0 exact), with an arithmetic average of T' (T0 exact), and with a weighted average for both T' and T0. Discretization error in stretched grids.

9 Quasi-3D (isotropic) divergence damping in terrain-following coordinates. Stability criterion: in particular near the bottom (where Δx, Δy >> Δz) a strong reduction of the damping coefficient α_div is necessary! This violates the requirement of a not too small α_div in the Runge-Kutta time-splitting scheme (xkd ~ 0.1 in Wicker, Skamarock (2002); in Baldauf (2010) MWR even xkd ~ 0.3 is recommended). On the other hand, the divergence damping is calculated as an additive tendency (no operator splitting) → a certain 'weakening' of the above stability criterion is possible. Additionally necessary for stability: don't apply divergence damping in the first small time step! (J. Dudhia, 1993 (?))
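
The reason for the strong reduction near the bottom can be sketched with a schematic von-Neumann-type estimate (an assumption-laden sketch, not the exact criterion of the talk): divergence damping acts diffusively on the divergent modes, so an explicit small time step Δτ is roughly limited by

  \alpha_{\mathrm{div}} \, \Delta\tau \left( \frac{1}{\Delta x^2} + \frac{1}{\Delta y^2} + \frac{1}{\Delta z^2} \right) \lesssim C, \qquad C = O(1).

For a fully 3D (isotropic) damping the 1/Δz² term dominates near the surface (Δz ≈ 25 m versus Δx, Δy of a few km), which is why α_div would have to be reduced there by orders of magnitude.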

10-14 Flow over a Gaussian mountain in a vertically stretched grid, current fast waves solver. h_max = 2200 m, a = 3000 m, u_0 = 20 m/s, N = 0.01 1/s, Fr_h = 0.9, Fr_a = 0.67, max slope α = 27°, max |h_i - h_i+1| = 500 m. Grid (z_top = 25 km): Δz_bottom ≈ 25 m, Δz_top ≈ 750 m.

15 Flow over a Gaussian mountain in a vertically stretched grid, new fast waves solver. h_max = 3100 m, a = 3000 m, u_0 = 20 m/s, N = 0.01 1/s, Fr_h = 0.65, Fr_a = 0.67, max slope α = 35°, max |h_i - h_i+1| = 710 m. Grid (z_top = 25 km): Δz_bottom ≈ 25 m, Δz_top ≈ 750 m.

16 Flow over a Gaussian mountain in a vertically stretched grid, new fast waves solver + Mahrer (1984) discretization. h_max = 4800 m, a = 3000 m, u_0 = 20 m/s, N = 0.01 1/s, Fr_h = 0.42, Fr_a = 0.67, max slope α = 47°, max |h_i - h_i+1| = 1100 m. Grid (z_top = 25 km): Δz_bottom ≈ 25 m, Δz_top ≈ 750 m.

17 Winter case experiment, COSMO-DE: quality scores, new vs. old fast waves solver.

18 Summer case experiment, COSMO-DE: quality scores, new vs. old fast waves solver.

19 Summer case experiment, COSMO-DE: hit rates, new vs. old fast waves solver.

20 Case study '13.07.2011, 0 UTC run': 21 h precipitation sum. Panels: new FW, radar (RY/EY), current FW.

21 Upper-air verification of COSMO-DE L65 with the new FW solver, compared with the parallel routine COSMO-DE L50. Panels: new fast waves vs. old fast waves, bias and RMSE.

22 Improved stability in real case applications:
- COSMO-DE 28.06.2011, 6 UTC runs (shear instability case): operational deterministic and several COSMO-DE-EPS runs crashed after ~16 h; must be cured by Smagorinsky diffusion
- COSMO-DE 12.07.2011, 6 UTC run ('explosion' of qr in the Alps): only stable with Bott2_Strang
- COSMO-DE L65, 15.07.2011, 12 UTC run: model crash after 215 timesteps (~1.5 h)
- COSMO-2, 16.06.2011, 0 UTC run (reported by O. Fuhrer): model crash after 1190 timesteps (~6.6 h)
- 'COSMO-1, 24.08.2011', resolution 0.01° (~1.1 km), 1700 x 1700 grid points (reported by A. Seifert): model crash after 10 time steps...
COSMO-DE L65 run stable for 2 months ('1.7.-31.8.2012'). One counterexample: COSMO-EU (7 km) crash ('09.09.2012') by horizontal shear instability → must be cured by Smagorinsky diffusion.

23 From Axel Seifert (DWD): simulated radar reflectivity in a COSMO run with a resolution of 0.01° (~1.1 km), 1700 x 1700 grid points. Model crash after 10 time steps with the current fast waves solver; stable simulation with the new FW.

24 Summary:
- The new fast waves solver is more stable in several realistic test cases → a necessary prerequisite to further develop COSMO for smaller-scale applications! This is partly due to the fact that it allows steeper slopes.
- The Mahrer (1984) discretization is available as an option and allows even steeper slopes (but still sometimes problems in real applications).
- Efficiency: the new fast waves solver needs about 30% more computation time than the current one on the NEC SX-9 → about 6% more time in total for COSMO-DE.
- Currently the new fast waves solver is under testing in the parallel routine at DWD and for the COSMO 1-km project at MeteoCH.

25 Computational aspects. Today, computer performance is limited mostly by memory access and less by the compute operations ("memory bandwidth bottleneck"). Computational intensity CI := number of calculations / number of memory accesses; CI ~ 1 would be nice, experience: CI ~ 0.6 is realistic.
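
As a concrete illustration (counting conventions differ in detail; the field names below are hypothetical, not COSMO variables), a typical tendency update already lands near that value:

  program ci_counting_example
    ! Illustration of the computational intensity CI for a simple
    ! tendency update of a 3D field.
    implicit none
    integer, parameter :: ie = 32, je = 32, ke = 10
    real    :: u(ie,je,ke), utens(ie,je,ke), dt
    integer :: i, j, k

    u = 1.0;  utens = 0.1;  dt = 30.0

    do k = 1, ke
      do j = 1, je
        do i = 1, ie
          ! 2 floating-point operations (one multiply, one add) versus
          ! 3 memory accesses (load u, load utens, store u):
          ! CI = 2/3, close to the "CI ~ 0.6 is realistic" experience.
          u(i,j,k) = u(i,j,k) + dt * utens(i,j,k)
        end do
      end do
    end do

    print *, 'u(1,1,1) =', u(1,1,1)
  end program ci_counting_example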

26 Supercomputing environment at DWD (Sept. 2012). One production and one research computer NEC SX-9, each with: 30 nodes with 16 CPUs/node = 480 vector processors; peak vector performance per CPU: 100 GFlops; peak vector performance total: 48 TFlops; main memory per node: 512 GByte; main memory total: 15.4 TByte; internode crossbar switch (NEC IXS): 128 GB/s bidirectional. Login nodes: SUN X4600, 15 nodes with 8 processors (AMD Opteron QuadCore) per node = 120 processors (2 nodes for interactive login); main memory per node: 128 GB. Database server: two SGI Altix 4700. Replacement end of 2013, but only with the same computing power.

27 Principle of a vector computer: a CPU with vector registers and pipelining (Prozessortakt = processor clock) operates on a banked main memory (example: the NEC SX-8 has 4096 memory banks; consecutive elements a(i), a(i+1), ..., a(i+4095) lie in banks 1 to 4096, and a(i+4096) wraps around to bank 1 again). Banked memory → prefer vectorisation along the innermost loop.
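
A minimal sketch of what "vectorise along the innermost loop" means in practice for Fortran's column-major storage (hypothetical field names, not COSMO code):

  program inner_loop_vectorisation_sketch
    ! Fortran stores a(i,j,k) with i varying fastest (column-major), so the
    ! innermost loop should run over i: consecutive iterations then touch
    ! consecutive addresses, which spread over different memory banks and
    ! vectorise well.  Illustrative only.
    implicit none
    integer, parameter :: ie = 256, je = 256, ke = 64
    real    :: a(ie,je,ke), b(ie,je,ke)
    integer :: i, j, k

    b = 2.0

    ! good ordering: k outermost, i innermost (unit stride)
    do k = 1, ke
      do j = 1, je
        do i = 1, ie
          a(i,j,k) = 0.5 * b(i,j,k)
        end do
      end do
    end do

    ! a bad ordering would put the i loop outermost, giving a stride of
    ! ie*je elements between consecutive iterations of the inner loop.

    print *, a(1,1,1)
  end program inner_loop_vectorisation_sketch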

28 Personal experience with optimization on the NEC SX-9:
- Analysis tools (ftrace, compiler analysis directives) are good.
- Vectorisation for dynamical cores on a structured mesh is not a problem.
- 'Memory bandwidth bottleneck' → increase the 'computational intensity' (CI).
- 'Loop fusion' was the most important measure to reduce runtime (if multiple accesses to the same field element can be achieved).
ftrace output for the new fast waves solver (fast_waves_sc): exclusive time 61.331 s (15.8%), 49164.6 MOPS, 22572.2 MFLOPS, vector operation ratio 98.30%, average vector length 207.8, bank conflict CPU port 4.120 / network 22.510 → ~22% of peak performance.
- Other optimizations (loop unrolling, ...) are well done by the compiler (at least on structured grids).
- There exist several compiler directives (!CDIR ...) which can increase the performance, e.g. use of the ADB (a relatively small cache of 256 kB). This is more for the NEC experts...

29 Optimisation strategy: 'loop fusion'. Example: c(x,y,z) = a + b, d(x,y,z) = 5 * a.

Version (1), two separate loops:
do i=1, n
  c(i) = a(i) + b(i)
end do
! (here all sorts of things happen, but without changing a(i))
do i=1, n
  d(i) = 5.0 * a(i)
end do

Version (2), fused loop:
do i=1, n
  c(i) = a(i) + b(i)
  d(i) = 5.0 * a(i)
end do
! (here all sorts of things happen, ...)

Now a multiple access to a(i) is possible → clear increase of performance. ftrace comparison: (1) exclusive time 0.368 s (53.3%), average time 1.841 ms, 11208.3 MFLOPS, vector operation ratio 98.95%, average vector length 255.9, bank conflict CPU port 0.006 / network 0.177; (2) 0.305 s (44.2%), 1.526 ms, 13518.6 MFLOPS, 99.39%, 255.9, 0.010 / 0.128.

30 How will High Performance Computing (HPC) look in the future? Massive parallelisation: O(10^5 ...) processors (with even reduced clock frequencies due to power constraints). Even for HPC, hardware development is mainly driven by the consumer market. Our models must run in a heterogeneous world of hardware architectures (CPUs, Graphics Processing Units (GPUs), ...) → COSMO priority project 'POMPA' (Performance on Massively Parallel Architectures). Project leader: Oliver Fuhrer (MeteoCH), in close collaboration with the Swiss national 'High Performance and High Productivity Computing (HP2C)' initiative, http://www.hp2c.ch/projects/cclm/

31 Scalability of COSMO with 1500 x 1500 x 50 grid points, 2.8 km, 3 h, no output. Source: Fraunhofer Institute/SCAI, U. Schättler (DWD). Additionally: although the scalability of the computations (dynamics, physics) is not bad, the efficiency of the code is not satisfying: NEC SX-9: 13% of peak; IBM pwr6: about 5-6% of peak; Cray XT4: about 3-4% of peak (?).

32-36 O. Fuhrer (MeteoCH): stencil library for COSMO (figure slides; slide 34: → Domain Specific Embedded Language (DSEL)).

37 Pros and cons of a domain-specific embedded language:
- A stencil library can hide all the ugly stuff needed for different computer architectures / compilers (cache-based, GPUs: CUDA for Nvidia, ...).
- Developers are no longer working in 'their' programming language (instead they use the library).
- How will debugging be done?
- If needed features are not (yet) available in the library, how will the development process be affected?
- Do we need a common standard for such a library (like MPI for inter-node parallelisation)?
- Which communities might participate (weather prediction, climate, computational fluid dynamics, ...)?

38 Alternative dynamical core projects in COSMO

39 COSMO priority project 'Conservative Dynamical Core' (2008-2012). Main goals: develop a dynamical core with at least conservation of mass, possibly also of energy and momentum; better performance and stability in steep terrain. Two development branches: (1) assess implicit finite-volume solvers from aerodynamics (Jameson, 1991); (2) assess the dynamical core of EULAG (e.g. Grabowski, Smolarkiewicz, 2002) and implement it into COSMO. M. Ziemianski, M. Kurowski, B. Rosa, D. Wojcik, Z. Piotrowski (IMGW), M. Baldauf (DWD), O. Fuhrer (MeteoCH)

40 EULAG identity card: anelastic approximated equations (Lipps, Hemler, 1982, MWR); FV discretization (A-grid); MPDATA for advection; terrain-following coordinates; non-oscillatory forward-in-time (NFT) integration scheme; GMRES as elliptic solver (other options are possible, but less common). Why EULAG? Conservation of momentum and tracer mass; total mass conservation seems to be accessible; flux-form equation for internal energy; ability to handle steep slopes; good contacts between the main developers (P. Smolarkiewicz, ...) and the IMGW (Poland) group.

41 Exercise: how to implement an existing dynamical core into another model?
- 'Fortran 90 style' (dynamic memory allocation, ...) introduced into the EULAG code
- domain decomposition: the same
- adaptation of EULAG data structures to COSMO: array EULAG(1-halo:ie+halo, ..., 1:ke+1) → array COSMO(1:ie+2*halo, ..., ke:1); A-grid → C-grid; rotated (lat-lon) coordinates in COSMO
- adaptation of boundary conditions: the EULAG elliptic solver needs a globally non-divergent flow → adjust the velocity at the boundaries
- physical tendencies are integrated inside the NFT scheme (to 1st order)
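
A minimal sketch of the index remapping between the two array conventions (hypothetical array names, a single field, and the vertical staggering ignored for simplicity; not the actual coupling code): the EULAG-style array carries an explicit halo in its bounds and counts levels bottom-up, while the COSMO-style array starts at 1 and counts levels top-down.

  program index_remap_sketch
    ! Copy one scalar field from an EULAG-style layout
    !   eul(1-halo:ie+halo, 1-halo:je+halo, 1:ke)    (k = 1 at the bottom)
    ! into a COSMO-style layout
    !   cosmo(1:ie+2*halo, 1:je+2*halo, 1:ke)        (k = 1 at the top).
    implicit none
    integer, parameter :: ie = 8, je = 6, ke = 4, halo = 2
    real    :: eul(1-halo:ie+halo, 1-halo:je+halo, 1:ke)
    real    :: cosmo(1:ie+2*halo, 1:je+2*halo, 1:ke)
    integer :: i, j, k

    ! fill the EULAG-style array with recognizable values
    do k = 1, ke
      do j = 1-halo, je+halo
        do i = 1-halo, ie+halo
          eul(i,j,k) = 100.0*real(k) + real(i) + 0.01*real(j)
        end do
      end do
    end do

    ! shift the horizontal index origin by the halo width and
    ! flip the vertical index (bottom-up -> top-down)
    do k = 1, ke
      do j = 1-halo, je+halo
        do i = 1-halo, ie+halo
          cosmo(i+halo, j+halo, ke+1-k) = eul(i,j,k)
        end do
      end do
    end do

    print *, 'lowest EULAG level, first interior point:', eul(1,1,1)
    print *, 'same value in the COSMO layout          :', cosmo(1+halo,1+halo,ke)
  end program index_remap_sketch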

42 Example: realistic Alpine flow. Simulations have been performed for domains covering the Alpine region. Three computational meshes with different horizontal resolutions have been used:
- a standard domain with 496 x 336 x 61 grid points and a horizontal resolution of 2.2 km (similar to COSMO-2 of MeteoSwiss)
- the same as in COSMO-2 but with 1.1 km horizontal resolution
- a truncated COSMO-2 domain (south-eastern part) with 0.55 km horizontal resolution
Initial and boundary conditions and orography from the COSMO model of MeteoSwiss; the simulation with the finest resolution has separately calculated, unfiltered orography. TKE parameterization of subgrid-scale turbulence and friction (COSMO diffusion-turbulence model); heat diffusion and fluxes turned on; moist processes switched on; radiation switched on (Ritter and Geleyn, MWR 1992). Simulation start at 00:00 UTC (midnight), 12 November 2009. Results are compared with the Runge-Kutta dynamical core.

43 Horizontal velocity at 500 m; panels: CE vs. RK at +12 h and +24 h ('12 Nov. 2009, 0 UTC + ...').

44 Vertical velocity; panels: RK vs. CE ('12 Nov. 2009, 0 UTC + ...').

45 Cloud water, simulations with radiation; panels: RK vs. CE ('12 Nov. 2009, 0 UTC + ...').

46 Summary:
- A new dynamical core is available in COSMO as a prototype. The available cores are: split-explicit HE-VI (Runge-Kutta, leapfrog): finite difference, compressible; semi-implicit: finite difference, compressible; COSMO-EULAG: finite volume, anelastic.
- Realistic tests for Alpine flow with the COSMO parameterizations of friction, turbulence, radiation and surface fluxes.
- For the performed tests no artificial smoothing was required to achieve stable solutions.
- The solutions are generally similar to the Runge-Kutta results and introduce more spatial variability.
- In a large number of tests (idealized, semi-realistic and realistic) we have not found a case in which the anelastic approximation would be a limitation for NWP.
Outlook: follow-up project 'COSMO-EULAG operationalization (CELO)' (2012-2015), project leader: Zbigniew Piotrowski (IMGW).

47 A new dynamical core based on Discontinuous Galerkin methods. Project 'Adaptive numerics for multi-scale flow', DFG priority program 'Metström'; D. Schuster, M. Baldauf (DWD); cooperation between DWD, Univ. Freiburg, Univ. Warwick. Goals of the DWD: a new dynamical core for COSMO; high order; conservation of mass, momentum and energy / potential temperature; scalability to thousands of CPUs ➥ use of discontinuous Galerkin methods, HEVI time integration (horizontally explicit, vertically implicit), terrain-following coordinates, coupling of the physical parameterisations, reuse of the existing parameterisations.

48 Test case: linear mountain overflow. Test case of: Sean L. Gibbons, 'Impacts of sigma coordinates on the Euler and Navier-Stokes equations using continuous Galerkin methods', PhD thesis, Naval Postgraduate School, March 2009. Panels: COSMO-RK vs. COSMO-DG; DG with 3D linear polynomials, RK2 explicit time integration.

49 An analytic solution for linear gravity waves in a channel as a test case for solvers of the non-hydrostatic, compressible Euler equations

50 For the development of dynamical cores (or numerical methods in general) idealized test cases are an important evaluation tool. Existing analytic solutions:
- stationary flow over mountains: linear: Queney (1947, ...), Smith (1979, ...), Baldauf (2008); non-linear: Long (1955) for a Boussinesq-approximated atmosphere
- non-stationary, linear expansion of gravity waves in a channel: Skamarock, Klemp (1994) for a Boussinesq-approximated atmosphere
Most of the other idealized tests only possess 'known solutions' gained by other numerical models. There exist even fewer analytic solutions which use exactly the equations of the numerical model under consideration, i.e. such that the numerical model converges to this solution. One exception is presented here: linear expansion of gravity/sound waves in a channel. M. Baldauf (DWD)

51 Non-hydrostatic, compressible, 2D Euler equations in a flat channel (shallow atmosphere) on an f-plane. For the analytic solution only one further approximation is needed: linearisation (= a controlled approximation) around an isothermal, steady, hydrostatic atmosphere at rest (f ≠ 0 possible) or with a constant basic flow U_0 (and f = 0). Most LAMs using the compressible equations should be able to use exactly these equations in the dynamical core.
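
For reference (a standard fact, not specific to this talk), the isothermal hydrostatic base state used for the linearisation has the form

  \bar T = T_0 = \mathrm{const}, \qquad
  \bar p(z) = p_0 \, e^{-z/H}, \qquad
  \bar\rho(z) = \frac{\bar p(z)}{R\,T_0}, \qquad
  H = \frac{R\,T_0}{g},

so that both the sound speed c_s = \sqrt{\gamma R T_0} and the Brunt-Väisälä frequency N are constant with height, which is what makes a fully analytic treatment possible.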

52 Bretherton, Fourier and Laplace transformation → analytic solution for the Fourier-transformed vertical velocity w. The two frequencies appearing in the solution are the gravity-wave and the acoustic branch, respectively, of the dispersion relation for compressible waves in a channel of height H; the vertical wavenumber is quantized as k_z = (π/H)·m, m = 0, 1, 2, ... Analogous expressions hold for the transformed u_b(k_x, k_z, t), ... (The slide also shows the two dispersion branches plotted against k_x, with axes scaled by c_s²/g and c_s/g.)
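
For orientation only: the textbook dispersion relation for acoustic-gravity waves in an isothermal atmosphere (which the channel solution quantizes in k_z; the exact expression derived in the talk is not reproduced here) reads

  \omega^4 \;-\; \omega^2 \, c_s^2 \left( k_x^2 + k_z^2 + \frac{1}{4H^2} \right) \;+\; c_s^2 N^2 k_x^2 \;=\; 0,

whose two roots in ω²(k_x, k_z) give the acoustic branch (larger root) and the gravity-wave branch (smaller root) referred to on the slide.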

53 Small-scale test with a basic flow U_0 = 20 m/s, f = 0. Black lines: analytic solution; shaded: COSMO. Initialization similar to Skamarock, Klemp (1994).

54 Large-scale test, U_0 = 0 m/s, f = 0.0001 1/s. Initialisation similar to Skamarock, Klemp (1994). Black lines: analytic solution; shaded: COSMO.

55 Convergence properties of COSMO: COSMO has a spatial-temporal convergence rate of about 0.7...1; very fine resolution is necessary to recognize this convergence behaviour. (Panels: L∞ and L2 errors of T' and w', with reference slopes of order 0.7 and order 1.)
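
The observed order quoted here is the standard estimate from errors at two successive resolutions; as a minimal formula (not tied to the specific norms of the plots):

  p \;\approx\; \frac{\ln\!\big( E(h_1)/E(h_2) \big)}{\ln\!\big( h_1/h_2 \big)},

where E(h) is the chosen error norm (e.g. the L2 error of w') at grid spacing h, and the time step is refined together with h so that a combined spatial-temporal rate is measured.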

56 Temporal order conditions for the time-split RK3. Fast processes (e.g. sound, gravity-wave expansion) and slow processes (e.g. advection) are inserted into the time-split RK3WS integration scheme and compared with exp[ΔT (P_s + P_f)] → at most 2nd order, but only for: slow process = Euler forward and n_s → ∞; the fast process can be of 2nd order. Here n_s := ΔT/Δt. (Baldauf (2010) MWR)
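
The comparison behind this statement can be sketched as follows (a schematic expansion, not the full analysis of Baldauf (2010)): the exact propagator over one large step is

  e^{\Delta T (P_s + P_f)}
  = I + \Delta T \,(P_s + P_f)
  + \frac{\Delta T^2}{2}\,\big( P_s^2 + P_s P_f + P_f P_s + P_f^2 \big)
  + O(\Delta T^3),

and a time-split scheme is 2nd order only if it reproduces the mixed terms P_s P_f + P_f P_s with the correct factor 1/2; the way the frozen slow tendency is added in the small steps determines whether this happens, which leads to the conditions quoted above.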

57 Convergence properties of DUNE, small-scale test. DUNE is developed at the Universities of Freiburg, Heidelberg, Stuttgart (Dedner et al., 2007); Discontinuous Galerkin method of 2nd order and 2nd-order time integration (Heun scheme). Simulations performed by Slavko Brdar (Univ. Freiburg, Germany). Slightly better than 2nd-order convergence (evaluation or setup problem on the finest resolution?). (Panels: L∞ and L2 errors of T' and w', with a reference slope of order 2.)

58 Summary:
- An analytic solution of the compressible, non-hydrostatic Euler equations was derived → a reliable solution for a well-known test exists and can be used not only for qualitative comparisons but even as a reference solution for convergence tests.
- 'Standard' approximations used (f-plane, shallow atmosphere) can be easily realised in every atmospheric model; only one further approximation is needed for the derivation: linearisation.
- For fine enough resolutions COSMO has a spatial-temporal convergence rate of about 0.7...1 (discretizations used: spatial 2nd...5th order, temporal: < 2nd order).
M. Baldauf, S. Brdar: 'An analytic solution for linear gravity waves in a channel as a test for numerical models using the non-hydrostatic, compressible Euler equations', submitted to Quart. J. Roy. Met. Soc. (partly financed by the 'Metström' program of DFG).
Recently a similar solution for the compressible equations on the sphere was found → talk at 'PDEs on the sphere'.
