Published by Loreen Perkins. Modified over 8 years ago.
1
RISICO on the GRID architecture: first implementation. Mirko D'Andrea, Stefano Dal Pra
2
Outline of the presentation ➲ Porting features; ➲ Jobs management; ➲ Implementation tests and results; ➲ Conclusions and further development.
3
Porting features ➲ Implemented entirely in Python. ➲ Uses the same executable as the RISICO system (no changes needed). ➲ Easily configurable through a configuration file.
4
The RISICO system ➲ Italy: 310,000 km². ➲ Current system: 300k regular cells, 1 km side. ➲ Grid version (after gridification): 30M regular cells, 0.1 km side.
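The cell counts on this slide follow from the area and the cell side. The following is a minimal sketch of that arithmetic; the function name `cell_count` is hypothetical, and the slide's figures (300k and 30M) are rounded versions of what it computes.

```python
def cell_count(area_km2: float, side_km: float) -> int:
    """Number of square cells of the given side needed to tile an area."""
    return round(area_km2 / (side_km * side_km))


ITALY_AREA_KM2 = 310_000  # area quoted on the slide

current_cells = cell_count(ITALY_AREA_KM2, 1.0)  # current system, 1 km side
grid_cells = cell_count(ITALY_AREA_KM2, 0.1)     # grid version, 0.1 km side

# Reducing the side by a factor of 10 multiplies the cell count by 100.
ratio = grid_cells // current_cells
```

The hundredfold growth in cells is what motivates splitting the domain across many grid jobs.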
5
RISICO vs GRID-RISICO ➲ RISICO: get input from database; run RISICO; write output to database. ➲ GRID-RISICO (after gridification): get input from database; upload input into catalog; create n jobs; collect outputs from catalog; write outputs to database. ➲ Each job i (i = 1 … n): get input from catalog; run RISICO on dataset i; write output i to catalog.
6
Job submission ➲ A RISICO job is fully defined by a JDL (Job Description Language) file and a parameter file. ➲ Each submitted job must terminate successfully within a defined time. Job activity is monitored by a software module called JobMonitor. ➲ The job submission procedure is handled by a JobSubmitter, which creates a set of jobs and associates a JobMonitor with each job.
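As an illustration of what a per-job JDL file could look like, here is a minimal sketch that builds one as text. The attribute names (`Executable`, `Arguments`, `StdOutput`, `StdError`, `InputSandbox`, `OutputSandbox`) are standard JDL attributes; the wrapper script name `run_risico.sh`, the parameter file name, and the function `make_jdl` are assumptions for this example and do not come from the presentation.

```python
def make_jdl(job_id: int, params_file: str, wrapper: str = "run_risico.sh") -> str:
    """Return the text of a JDL description for one job (hypothetical layout)."""
    out_name = f"risico_{job_id:02d}.out"
    err_name = f"risico_{job_id:02d}.err"
    lines = [
        f'Executable = "{wrapper}";',
        f'Arguments = "{params_file}";',
        f'StdOutput = "{out_name}";',
        f'StdError = "{err_name}";',
        # Files shipped to the worker node with the job.
        f'InputSandbox = {{"{wrapper}", "{params_file}"}};',
        # Files returned to the submitter when the job ends.
        f'OutputSandbox = {{"{out_name}", "{err_name}"}};',
    ]
    return "[\n" + "\n".join("  " + line for line in lines) + "\n]\n"


jdl_text = make_jdl(1, "params_01.txt")
```

Generating the JDL from a template keeps the n job descriptions identical except for the job index and the parameter file.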
7
Job Monitoring ➲ Each job is monitored by an instance of a module called JobMonitor. ➲ The JobMonitor: checks the job status during execution; retrieves the job output from the catalog; tries to resubmit the job if it fails; logs the error if the job does not run correctly.
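The monitoring behaviour described here (poll status, enforce a time limit, resubmit on failure, log errors) can be sketched as below. This is not the authors' JobMonitor: the class fields, the status strings, the retry limit, and the `fake_submit` / `fake_status` stand-ins for the grid middleware are all assumptions made so the example runs on its own.

```python
import logging
import time
from dataclasses import dataclass
from typing import Callable

log = logging.getLogger("risico.grid")


@dataclass
class JobMonitor:
    job_name: str
    submit: Callable[[str], str]   # submits the job, returns a job id
    status: Callable[[str], str]   # returns "RUNNING", "DONE" or "FAILED"
    timeout_s: float = 600.0       # assumed time limit per attempt
    max_retries: int = 3           # assumed resubmission limit
    poll_s: float = 0.0            # pause between status checks

    def run(self) -> bool:
        """Submit the job and resubmit it on failure or timeout."""
        for attempt in range(1, self.max_retries + 1):
            job_id = self.submit(self.job_name)
            deadline = time.monotonic() + self.timeout_s
            while time.monotonic() < deadline:
                state = self.status(job_id)
                if state == "DONE":
                    return True
                if state == "FAILED":
                    break
                time.sleep(self.poll_s)
            log.error("job %s: attempt %d did not complete", self.job_name, attempt)
        return False


# Stand-in backend: the first submission fails, the second succeeds.
submissions = []

def fake_submit(name: str) -> str:
    submissions.append(name)
    return f"{name}#{len(submissions)}"

def fake_status(job_id: str) -> str:
    return "FAILED" if job_id.endswith("#1") else "DONE"

succeeded = JobMonitor("job_01", fake_submit, fake_status).run()
```

Passing the submit and status functions in as callables keeps the retry logic independent of the actual grid commands.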
8
Workflow: job creation, submission and data collection ➲ Downloads input from the remote meteo-data database, creates an archive and uploads it to the catalog; ➲ Creates a JDL file and a parameters file for each job; ➲ Submits the jobs; ➲ Waits for the job outputs; ➲ Gets the job outputs from the catalog and aggregates them.
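The archive-handling ends of this workflow can be sketched with Python's standard `tarfile` module. The functions `pack_input` and `collect_outputs`, the file names, and the choice to aggregate outputs into a dictionary are assumptions for illustration; the catalog upload and download steps are omitted because the presentation does not describe their interface.

```python
import tarfile
import tempfile
from pathlib import Path


def pack_input(src_dir: Path, archive: Path) -> Path:
    """Bundle every file in src_dir into a .tar.bz2 archive."""
    with tarfile.open(archive, "w:bz2") as tar:
        for path in sorted(src_dir.iterdir()):
            tar.add(path, arcname=path.name)
    return archive


def collect_outputs(archives) -> dict[str, bytes]:
    """Merge the regular files of several .tar.bz2 archives into one mapping."""
    merged = {}
    for archive in archives:
        with tarfile.open(archive, "r:bz2") as tar:
            for member in tar.getmembers():
                if member.isfile():
                    merged[member.name] = tar.extractfile(member).read()
    return merged


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src = root / "meteo"
    src.mkdir()
    (src / "temperature.txt").write_bytes(b"12.5\n")
    (src / "wind.txt").write_bytes(b"3.1\n")
    archive = pack_input(src, root / "input.tar.bz2")
    # The same archive stands in for a job output here, only to exercise the code.
    contents = collect_outputs([archive])
```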
9
Job definition (1) ➲ Each job (job 1 … job n) works with a specific dataset defining a spatial domain (subset). ➲ These subsets are created off-line and stored in the catalog. ➲ A parameters file states the association between a job and a dataset. ➲ Each job produces an output whose path in the catalog is known in advance.
10
Job definition (2) Job 1: Domain: celle/celle_01.tar.bz2; Status: celle/stato0_01.tar.bz2; Input: input/input_20070119.tar.bz2; Output: output/output_01_20071119.tar.bz2 ➲ Each job has its own domain. ➲ The job's domain, status information and output all refer to the same geographical domain. ➲ All jobs share the same input file.
11
Job definition (3) CATALOG ➲ Job 1: Domain: celle/celle_01.tar.bz2; Status: celle/stato0_01.tar.bz2; Input: input/input_20070119.tar.bz2; Output: output/output_01_20071119.tar.bz2 ➲ Job 2: Domain: celle/celle_02.tar.bz2; Status: celle/stato0_02.tar.bz2; Input: input/input_20070119.tar.bz2; Output: output/output_02_20071119.tar.bz2 ➲ Job n: Domain: celle/celle_nn.tar.bz2; Status: celle/stato0_nn.tar.bz2; Input: input/input_20070119.tar.bz2; Output: output/output_nn_20071119.tar.bz2
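Because the catalog paths follow a fixed pattern, each job's definition can be derived from its index. The sketch below reproduces the naming shown on the slides; the `JobDefinition` class and `job_definition` function are hypothetical, the two-digit index is inferred from the `01` / `02` / `nn` examples, and the input and output date stamps are kept as separate arguments because the slides show different values for them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobDefinition:
    domain: str
    status: str
    input: str
    output: str


def job_definition(index: int, input_date: str, output_date: str) -> JobDefinition:
    """Build the catalog paths for one job from its index."""
    tag = f"{index:02d}"
    return JobDefinition(
        domain=f"celle/celle_{tag}.tar.bz2",
        status=f"celle/stato0_{tag}.tar.bz2",
        input=f"input/input_{input_date}.tar.bz2",   # shared by all jobs
        output=f"output/output_{tag}_{output_date}.tar.bz2",
    )


jobs = [job_definition(i, "20070119", "20071119") for i in (1, 2)]
```

Knowing the output path in advance lets the collector fetch each result from the catalog without querying the job itself.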
12
Final version ➲ Estimated performance on the complete data set (30M cells): total CPU time: about 2 hours and 30 minutes; optimal number of jobs: about 30 (5-10 minutes of CPU time per job); storage: 30 GB/day.
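The relation between total CPU time, per-job CPU time and job count can be checked with a short calculation. This sketch assumes the work splits evenly across jobs, which the presentation does not state; the function name `jobs_needed` is hypothetical.

```python
import math


def jobs_needed(total_cpu_min: float, per_job_min: float) -> int:
    """Jobs required if the total CPU time is split evenly (an assumption)."""
    return math.ceil(total_cpu_min / per_job_min)


TOTAL_CPU_MIN = 150  # about 2 hours and 30 minutes, from the slide

jobs_at_5_min = jobs_needed(TOTAL_CPU_MIN, 5)    # shortest per-job time quoted
jobs_at_10_min = jobs_needed(TOTAL_CPU_MIN, 10)  # longest per-job time quoted
```

Under this even-split assumption, the quoted figure of about 30 jobs corresponds to the 5-minute end of the per-job range.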
13
Test Results ➲ The port was tested on a subset (1M cells) of the final RISICO working set. ➲ 10 parallel jobs were used. ➲ Performance: job CPU time: 30 seconds; grid overhead: 2 minutes.
14
Conclusions ➲ RISICO represents a feasible and significant test case. ➲ The grid architecture provides valuable benefits to operational activities.