Operational computing environment at EARS Jure Jerman Meteorological Office Environmental Agency of Slovenia (EARS)


1 Operational computing environment at EARS Jure Jerman Meteorological Office Environmental Agency of Slovenia (EARS)

2 Outline
Linux cluster at the Environmental Agency of Slovenia: history and present state
Operational experiences
Future requirements for limited area modelling
Needed ingredients for a future system?

3 History & background
EARS: small service, limited resources for NWP
Small NWP group, research & operations
First research Alpha-Linux cluster (1996): 20 nodes
First operational Linux cluster at EARS (1997)
  5 x Alpha CPU
One of the first operational clusters in Europe in the field of meteorology

4 Tuba – current cluster system
Installed 3 years ago, already outdated
Important for gathering experience
Hardware:
  13 compute nodes
  1 master node, dual Xeon 2.4 GHz
  28 GB memory
  Gigabit Ethernet
Storage: 4 TB IDE2SCSI disk array, xfs filesystem

5 Tuba software
Open source, whenever possible
Cluster management software:
  OS: RH Linux + SCore (5.8.2) (www.pccluster.org)
  Mature parallel environment
  Lower latency MPI implementation
  Transparent to user
  Gang scheduling
  Pre-empting
  Checkpointing
  Parallel shell
  Automatic fault recovery (hardware or SCore)
  FIFO scheduler
  Capability of integration with OpenPBS and SGE
Lahey and Intel compilers

6 Ganglia - Cluster Health monitoring

7 Operational experiences
In production for almost 3 years
Unmonitored suite
Minimal hardware-related problems so far!
Some problems with SCore (mainly related to buffers in MPI)
NFS-related problems
ECMWF's SMS solves the majority of problems

8 Reliability

9 Operational setup
ALADIN model
290x240x37 domain
9.3 km resolution
54 h integration
Target: 1 h
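To make the size of this setup concrete, the arithmetic behind the slide can be sketched as follows. The domain dimensions, forecast length, and wall-clock target come from the slide; the function names are illustrative only, and the "speedup" is simply forecast hours divided by wall-clock hours, not a measured figure.

```python
# Figures from the slide: 290 x 240 x 37 domain, 54 h integration, 1 h target.
NX, NY, NLEV = 290, 240, 37
FORECAST_HOURS = 54
TARGET_WALLCLOCK_HOURS = 1


def grid_points(nx: int, ny: int, nlev: int) -> int:
    """Total number of grid points in the 3D domain."""
    return nx * ny * nlev


def required_speedup(forecast_hours: float, wallclock_hours: float) -> float:
    """How many times faster than real time the model has to run."""
    return forecast_hours / wallclock_hours


total_points = grid_points(NX, NY, NLEV)
speedup = required_speedup(FORECAST_HOURS, TARGET_WALLCLOCK_HOURS)

print(f"grid points: {total_points}")
print(f"must run {speedup:.0f}x faster than real time")
```

In other words, the suite has to advance roughly 2.6 million grid points through 54 forecast hours within one hour of wall-clock time.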

10 Optimizations
Not everything is in the hardware
Code optimizations:
  B-level parallelization (up to 20 % at greater numbers of processors)
  Load balancing of grid point computations (depending on the number of processors)
Parameter tuning:
  NPROMA cache tuning
  MPI message size
Improvement in compilers (Lahey -> Intel 8.1: 20 to 25 %)
Still to work on: OpenMP (better efficiency of memory usage)
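The load balancing and NPROMA items above can be illustrated with a generic sketch. This is not ALADIN's actual decomposition: it assumes a simple contiguous split of horizontal grid points over processors, followed by cutting each processor's share into blocks of at most NPROMA points so that a block fits in cache. The processor count of 26 (13 dual-CPU compute nodes) and the block length of 64 are assumptions chosen for illustration.

```python
def balanced_ranges(n_points: int, n_procs: int) -> list[tuple[int, int]]:
    """Split n_points into n_procs contiguous (start, stop) ranges
    whose sizes differ by at most one point."""
    base, extra = divmod(n_points, n_procs)
    ranges = []
    start = 0
    for rank in range(n_procs):
        # The first `extra` ranks take one additional point each.
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def nproma_blocks(start: int, stop: int, nproma: int) -> list[tuple[int, int]]:
    """Cut one processor's range into blocks of at most nproma points."""
    return [(lo, min(lo + nproma, stop)) for lo in range(start, stop, nproma)]


# Horizontal points of the 290 x 240 domain; 26 processors is an assumption.
ranges = balanced_ranges(290 * 240, 26)
sizes = [stop - start for start, stop in ranges]

# Block length 64 is a hypothetical tuning value, not one from the slides.
blocks = nproma_blocks(*ranges[0], nproma=64)

print(min(sizes), max(sizes), len(blocks))
```

Tuning NPROMA then amounts to timing the grid point computations for several block lengths and keeping the fastest, since the best value depends on the cache size of the processor in use.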

11 Non-operational use
Downscaling of ERA-40 reanalysis with the ALADIN model
Estimation of wind energy potential over Slovenia
Multiple nesting of the target computational domain into ERA-40 data
10-year period, 8 years / month
Major question: how to ensure coexistence with the operational suite

12 Foreseen developments in limited area modeling
Currently ALADIN 9 km
2008-2009 Arome, 2.5 km: ALADIN NH solver + Meso NH physics
3 times more expensive per grid point
Target Arome: ~200x to 300x more expensive (same computational domain, same time range)
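A back-of-the-envelope estimate shows where a factor of this order comes from. The sketch below assumes that the number of horizontal grid points grows with the square of the refinement ratio and that the time step shrinks in proportion to the grid spacing; both are common rules of thumb, not statements from the slides. Only the resolutions (9.3 km and 2.5 km) and the factor of 3 per grid point are taken from the presentation.

```python
def cost_factor(dx_old_km: float, dx_new_km: float, per_point_factor: float) -> float:
    """Rough relative cost of running the same domain and forecast range
    at a finer horizontal resolution.

    Assumptions (not from the slides):
      - horizontal grid points scale with (dx_old / dx_new) ** 2
      - the number of time steps scales with dx_old / dx_new
      - the number of vertical levels is unchanged
    """
    refinement = dx_old_km / dx_new_km
    horizontal = refinement ** 2
    time_steps = refinement
    return horizontal * time_steps * per_point_factor


estimate = cost_factor(dx_old_km=9.3, dx_new_km=2.5, per_point_factor=3.0)
print(f"estimated cost factor: {estimate:.0f}x")
```

Under these assumptions the estimate comes out somewhat below the ~200x to 300x quoted on the slide. The sketch holds the vertical levels fixed and makes no attempt to account for the difference.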

13 How to get there (if?)
Linux commodity cluster at EARS?
First upgrade in mid-2006
5 times the current system (if possible, below 64 processors)
Tests going on with:
  New processors: AMD Opteron, Intel Itanium-2
  Interconnection: InfiniBand, Quadrics?
  Compilers: PathScale (AMD Opteron)
Crucial: parallel file system (TerraGrid), already installed, replacement of NFS

14 How to stay on the open side of the fence?
Linux and other open source projects are evolving
Great number of more and more complex software projects
Specific (operational) requirements in meteorology
Space for system integrators
Price/performance gap between commodity and brand-name systems gets smaller as the size of the system grows
Pioneer time of Beowulf clusters seems to be over
Importance of extensive testing of all cluster components

15 Conclusions
Positive experiences with a small commodity Linux cluster, great price/performance ratio
Our present way of developing a new cluster works for a small cluster, might work for a medium-sized one, and does not work for big systems
The future is probably Linux clusters, but branded

