
Porting MM5 and BOLAM codes to the GRID


1 Porting MM5 and BOLAM codes to the GRID
Earth Science Workshop, January 30, 2009 – Paris, France
The SEE-GRID-SCI initiative is co-funded by the European Commission under the FP7 Research Infrastructures contract no.

2 Goal
- Run the MM5 and BOLAM models on the Grid to perform ensemble weather forecasting
- Develop a generic weather model execution framework
  - Support for deterministic forecasting
  - Easily adapted to various other forecast models (e.g. WRF, RAMS, etc.)

3 Target workflow
Weather models follow a specific workflow of execution:
Retrieval of Initial Conditions → Pre-Processing → Model Run → Post-Processing
Initial conditions are retrieved over HTTP from the N.O.M.A.D.S. NCEP-GFS servers (USA).
- The framework should be able to incorporate different codes for pre/post-processing and model execution.
- Parametric configuration of initial data retrieval (a sketch of this step follows below).
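To illustrate the parametric retrieval step, here is a minimal Python sketch; the NOMADS/GFS URL template, cycle handling and file names are hypothetical placeholders, not the project's actual retrieval script.

# Minimal sketch of parametric initial-data retrieval over HTTP.
# The URL template and file naming below are hypothetical placeholders.
import urllib.request
from datetime import date

BASE_URL = 'https://nomads.ncep.noaa.gov/path/to/gfs'   # hypothetical endpoint

def fetch_initial_conditions(run_date, cycle='00', hours=(0, 6, 12)):
    """Download the GFS fields needed to initialise one forecast run."""
    for h in hours:
        fname = 'gfs.t%sz.pgrbf%02d' % (cycle, h)       # hypothetical file name
        url = '%s/%s/%s' % (BASE_URL, run_date.strftime('%Y%m%d'), fname)
        urllib.request.urlretrieve(url, fname)
        print('retrieved', fname)

fetch_initial_conditions(date.today())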

4 Requirements
- Adopt existing NOA procedures for model execution
  - Hide the Grid as much as possible; give the feeling of local execution
- Simplify existing procedures and improve execution times
- Utilize high-level tools that facilitate better-quality code and overcome low-level interactions with the Grid
- Satisfy specific model requirements
  - Usage of a commercial compiler not available on the Grid
  - Time restrictions for completing application execution

5 Design Approach
- Keep the existing “command-line” look and feel
- Re-use and improve the existing code base (shell scripts)
- Utilize the Python language to replace various parts of the existing workflow
- Exploit the GANGA framework for job management and monitoring

6 Utilized Grid Services
- gLite WMS – job management
- LFC – data management
- MPICH on gLite sites
- Ganga
  - Developed at CERN; endorsed by the EGEE RESPECT program
  - Provides a Python programming library and interpreter for object-oriented job management
  - Facilitates high-level programming abstractions for job management (see the sketch below)
  - More information:
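For illustration only, a minimal Ganga session for submitting one model wrapper to a gLite site might look like the sketch below. It is typed at the Ganga prompt, where Job, Executable, File and LCG are provided by Ganga itself; the wrapper script and sandbox file names are hypothetical.

# Minimal sketch of object-oriented job management with Ganga (run at the Ganga prompt).
# The wrapper script and sandbox file names are hypothetical.
j = Job(name='bolam-member-01')
j.application = Executable(exe=File('run_member.sh'), args=['member01.conf'])
j.backend = LCG()                                  # submit through the gLite WMS
j.inputsandbox  = [File('member01.conf')]
j.outputsandbox = ['model.log', 'results.tar.gz']
j.submit()
print(j.id, j.status)                              # e.g. 42 submitted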

7 Implementation Details
- MM5 and BOLAM codes compiled locally on the UI with PGI Fortran
  - Three different binaries produced for MM5, for 2, 6 and 12 CPUs respectively
  - MPICH also compiled with PGI; the MPICH libraries were used to build the MM5 binaries
- Binaries were packed and stored on the LFC
  - Downloaded to the WNs before execution (see the staging sketch below)
  - Include terrain data
- Models run daily as cron jobs; notifications are sent to users by e-mail
- Log files and statistics are kept for post-mortem analysis
  - Ganga is also useful for identifying problems after execution
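As a rough sketch of the worker-node staging step, the binaries could be pulled from the LFC with the gLite lcg-cp client; the LFN path and VO name below are invented for illustration and are not the project's actual catalogue layout.

# Sketch of staging packed binaries (and terrain data) from the LFC on a worker node.
# The LFN path and VO name are hypothetical placeholders.
import os
import subprocess
import tarfile

LFN     = 'lfn:/grid/see/noa/mm5/mm5_bin_12cpu.tar.gz'   # hypothetical catalogue entry
TARBALL = 'mm5_bin.tar.gz'

# lcg-cp is the gLite data-management client available on the worker nodes
subprocess.check_call(['lcg-cp', '--vo', 'see',
                       LFN, 'file:' + os.path.abspath(TARBALL)])

with tarfile.open(TARBALL) as tar:
    tar.extractall('.')   # unpack binaries and terrain data into the job directory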

8 Implemented Architecture
[Architecture diagram] On the UI, Ganga and the LJM (Python), together with the Lead-In/Out shell scripts and the model config file, submit N jobs through the gLite WMS to the CE/WNs. On the worker nodes, a Workflow Orchestrator (Python) drives the Decode, Pre-process, Model Run and Post-Process shell scripts, retrieves initial data over HTTP from the N.O.M.A.D.S. NCEP-GFS servers (USA), and launches the model across the WNs with mpiexec. Binaries and results are stored on the LFC/SE.
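A stripped-down sketch of what the worker-node orchestrator could look like is given below; the script names, config file and CPU count are illustrative, not the project's actual code.

# Sketch of the Workflow Orchestrator driving the four shell-script stages in order.
# Script names, the config file and the CPU count are illustrative.
import subprocess

def run_stage(script, *args):
    print('--- %s ---' % script)
    subprocess.check_call(['./' + script] + list(args))

run_stage('decode.sh')                        # decode the downloaded GFS fields
run_stage('preprocess.sh', 'member01.conf')   # model-specific pre-processing
run_stage('model_run.sh', '12')               # wraps e.g. "mpiexec -np 12 ./mm5.exe"
run_stage('postprocess.sh')                   # produce outputs for upload to the SE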

9 Ensemble Forecasting
- Each member is executed as a separate job
- 10 members in total, both for the MM5 and BOLAM models
- Each member separately downloads its initial data from the NCEP servers
- The whole ensemble execution is handled by a single compound job
  - Compound job definition, execution and management handled by Ganga constructs (job splitters; see the sketch after this list)
- The final stage of forecast production and graphics preparation is performed locally on the UI
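To show how a single compound job could cover all ten members, the sketch below uses Ganga's ArgSplitter at the Ganga prompt; the wrapper script and the per-member argument layout are hypothetical.

# Sketch of a compound ensemble job using a Ganga job splitter (Ganga prompt).
# The wrapper script and the per-member arguments are hypothetical.
j = Job(name='mm5-ensemble')
j.application = Executable(exe=File('run_member.sh'))
j.backend = LCG()
j.splitter = ArgSplitter(args=[['member%02d' % m] for m in range(1, 11)])  # 10 members
j.submit()

for sj in j.subjobs:          # each ensemble member becomes one subjob
    print(sj.id, sj.status)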

10 Initial Performance Results
- MM5
  - Typical execution time: ~2 hrs (including scheduling overheads)
  - Completion times differ per member depending on the total number of processors used: the 12-process version takes ~40 min per member but takes longer to get scheduled at a Grid site
- BOLAM
  - Typical execution time for a 10-member ensemble forecast: 60–90 min (including scheduling overheads)
  - One member takes ~25 minutes to complete on a local cluster with an optimized binary; the full ensemble would take ~4 hrs locally
- Overall, completion times are non-uniform due to Grid resource (un)availability

11 Adapting the framework to different models
- Models that implement similar workflows should be easy to adapt
- Ultimately the user should only provide (see the sketch after this list):
  - The four workflow hooks: Decode, Pre-process, Model run, Post-process
  - Model configuration file(s)
    - Definition of different initial data sources
    - Forecast region
    - Terrain data
  - Model binaries stored on the LFC
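As a purely illustrative sketch of the information such a model plug-in would need to supply; every key, path and value below is invented for illustration and is not the project's actual configuration format.

# Hypothetical sketch of a model "plug-in" description for the framework;
# every key, path and value here is invented for illustration.
MODEL_CONFIG = {
    'name': 'bolam',
    'hooks': {                       # the four workflow hooks (shell scripts)
        'decode':      'decode.sh',
        'preprocess':  'preprocess.sh',
        'model_run':   'model_run.sh',
        'postprocess': 'postprocess.sh',
    },
    'initial_data': {                # parametric initial-data retrieval
        'source': 'NOMADS NCEP-GFS',
        'cycle':  '00',
    },
    'forecast_region': {'lat': (30.0, 45.0), 'lon': (15.0, 30.0)},
    'terrain_data': 'lfn:/grid/see/noa/bolam/terrain.tar.gz',    # hypothetical LFN
    'binaries':     'lfn:/grid/see/noa/bolam/bolam_bin.tar.gz',  # hypothetical LFN
}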

12 Problems/Pending Issues
- Problems with initial data
  - NCEP servers are sometimes down or cannot generate the requested files
- Grid resource availability impedes timely execution
  - Not all members manage to complete on time; some may still be in the scheduled state when time expires
- Grid robustness and predictability
  - Jobs may be rescheduled to different sites while running, for no apparent reason
  - Central Grid services might be unavailable (WMS, LFC)
- MM5 is sensitive to the execution environment
  - Processes die while the model is in its parallel section
  - MPI is notoriously not well supported by Grid sites (some sites are “better” than others)

13 Future Work
- The application is still in the pilot phase
- “Super-ensemble” runs planned for execution by April
  - Multi-model, multi-analysis ensemble forecasting combining results from MM5, BOLAM, NCEP/ETA and NCEP/NMM
- Presentation to be delivered at UF4 in Catania
- Anticipating more resources and better support from existing ones

14 Questions

