Scheduling system for distributed MPD data processing. Gertsenberger K. V., Joint Institute for Nuclear Research, Dubna.

Presentation transcript:

1 Scheduling system for distributed MPD data processing. Gertsenberger K. V., Joint Institute for Nuclear Research, Dubna

2 NICA scheme

3 Multipurpose Detector (MPD)
The MPDRoot software is developed for MPD event simulation, reconstruction of experimental or simulated data, and the subsequent physics analysis of heavy-ion collisions registered by the MultiPurpose Detector at the NICA collider.

4 Development of the NICA cluster
Development and expansion of the distributed cluster for the MPD experiment, based on the LHEP farm. Two main directions of development:
- data storage development for the experiment
- organization of parallel processing of the MPD events

5 Current NICA cluster in LHEP

6 Data storage on the NICA cluster
Distributed file system GlusterFS:
- it aggregates existing file systems into a common distributed file system
- automatic replication runs as a background process
- a background self-checking service restores corrupted files after a hardware or software failure

7 Parallel MPD data processing
Two tools for concurrent data processing:
- PROOF server: parallel data processing in ROOT macros on parallel architectures
- MPD-scheduler: scheduling system that distributes tasks to parallelize data processing on the cluster nodes

8 MPD-scheduler
- Developed in C++ with support for ROOT classes. SVN: mpdroot/macro/mpd_scheduler
- Uses the Sun Grid Engine (SGE) scheduling system (qsub command) for execution in cluster mode.
- SGE combines cluster machines at the LHEP farm (nc10, nc11 and nc13) into a pool of worker nodes with 34 logical processors.
- Jobs for distributed execution on the NICA cluster are described in an XML file and passed to MPD-scheduler:
$ mpd-scheduler my_job.xml

9 Job description. Tag <macro>.
The description starts and ends with the <job> tag. The <macro> tag sets information about the macro executed by MPDRoot:
- name – file path of the ROOT macro to execute, required
- start_event – number of the first event to process for all input files, optional
- count_event – number of events to process for all input files, optional
- add_args – additional arguments of the ROOT macro, if required

10 Job description. Tag <file>.
The <file> tag defines the files to be processed by the macro above:
- input – input file path
- output – result file path
- start_event – number of the first event in the input file, optional
- count_event – number of events to process in the input file, optional
- parallel_mode – number of processors used to process the events of the input file in parallel, optional
- merge – whether to merge the partial result files in parallel_mode, default: "true"
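The parallel_mode attribute implies that the event range of one input file is divided among several processors. The slides do not say how the range is divided, so the following is only a minimal sketch that assumes an even, contiguous split; the function name split_events is hypothetical and is not part of MPD-scheduler.

```python
def split_events(start_event: int, event_total: int, parts: int) -> list[tuple[int, int]]:
    """Split event_total events beginning at start_event into contiguous chunks.

    Returns (first_event, event_count) pairs. Chunk sizes differ by at most one.
    Assumption: an even split; the real scheduler may partition differently.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    if event_total < 0:
        raise ValueError("event_total must be non-negative")
    if event_total == 0:
        return []
    # Never create more chunks than there are events.
    chunks = min(parts, event_total)
    base, extra = divmod(event_total, chunks)
    result = []
    first = start_event
    for i in range(chunks):
        # The first `extra` chunks take one additional event each.
        size = base + (1 if i < extra else 0)
        result.append((first, size))
        first += size
    return result


# 1000 events on 5 processors, as in a file entry with parallel_mode="5".
for first, size in split_events(0, 1000, 5):
    print(f"start_event={first} count_event={size}")
```

Each pair could then drive one macro run that writes a partial result file, with the parts combined afterwards when merge is "true".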

11 Processing event files from MPD simulation database.
… …
db_input – string defining a list of files from the MPD simulation database:
- mpd.jinr.ru – network address of the server with the simulation database
- selection parameters: range of the collision energy, type of the particle generator, colliding particles, description and others
Special variables available in the "output" argument:
- ${counter} = file counter, with start value and step both equal to 1
- ${input} = input file path
- ${file_name} = name of the input file without extension
- ${file_name_with_ext} = name of the input file with extension
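The special variables of the "output" argument can be illustrated with a short substitution routine. This is a sketch of the described behaviour, not the scheduler's own code: the function name expand_output is hypothetical, and it assumes that ${file_name} and ${file_name_with_ext} refer to the last path component only.

```python
import posixpath
import re

# Matches ${name} placeholders such as ${counter} or ${file_name}.
_PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def expand_output(template: str, input_path: str, counter: int) -> str:
    """Expand the special variables listed on the slide in an output path template."""
    name_with_ext = posixpath.basename(input_path)
    values = {
        "counter": str(counter),
        "input": input_path,
        "file_name": posixpath.splitext(name_with_ext)[0],
        "file_name_with_ext": name_with_ext,
    }
    # Unknown placeholders are left untouched rather than guessed.
    return _PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), template)


# The counter starts at 1 and grows by 1 for every input file.
inputs = ["/data/evetest1.root", "/data/evetest2.root"]
for counter, path in enumerate(inputs, start=1):
    print(expand_output("/out/${file_name}_dst_${counter}.root", path, counter))
```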

12 Job description. Tag <run>.
The <run> tag describes the run parameters and the resources allocated for the job:
- mode – execution mode: 'global' – distributed processing on the NICA cluster, 'local' – multithreaded execution on a multicore computer
- count – maximum number of processors allocated for this job
- config – path of a bash file with environment variables (including ROOT environment variables) that is executed before the macro
- logs – log file path for multithreaded mode
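Taken together, the macro, file and run descriptions form one job file. The sketch below assembles such a file with Python's standard xml.etree.ElementTree module. The element names job, macro and run are assumptions, because the tag names did not survive in this transcript (only the file tag appears verbatim in the local-use slide); the attribute names come from the slides. The helper build_job and the macro path reco.C are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_job(macro_path, files, mode="local", count=4, config=None, logs=None):
    """Return a job description as an XML string.

    files is a list of (input_path, output_path) pairs.
    Element names "job", "macro" and "run" are assumed, not confirmed by the slides.
    """
    job = ET.Element("job")
    ET.SubElement(job, "macro", {"name": macro_path})
    for input_path, output_path in files:
        ET.SubElement(job, "file", {"input": input_path, "output": output_path})
    run_attrs = {"mode": mode, "count": str(count)}
    # Optional attributes are written only when a value is supplied.
    if config is not None:
        run_attrs["config"] = config
    if logs is not None:
        run_attrs["logs"] = logs
    ET.SubElement(job, "run", run_attrs)
    return ET.tostring(job, encoding="unicode")


xml_text = build_job(
    "~/mpdroot/macro/mpd/reco.C",  # hypothetical macro path
    [("~/mpdroot/macro/mpd/evetest1.root", "~/mpdroot/macro/mpd/mpddst1.root")],
    mode="local",
    count=5,
)
print(xml_text)
```

The resulting text would be saved to a file and passed on the command line in the same way as my_job.xml.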

13 Job description. Non-ROOT command.
The <command> tag with the argument line is used to run a non-ROOT command on the NICA cluster.

14 Local use
MPD-scheduler can be used in local mode to process events in parallel on a user's multicore machine.
<file input="~/mpdroot/macro/mpd/evetest1.root" output="~/mpdroot/macro/mpd/mpddst1.root" start_event="0" count_event="0"/>
<file input="~/mpdroot/macro/mpd/evetest2.root" output="~/mpdroot/macro/mpd/mpddst2.root" start_event="0" count_event="1000" parallel_mode="5" merge="true"/>
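To show how the two file entries above can be read programmatically, here is a small sketch using Python's standard xml.etree.ElementTree module, with the quotes written as plain ASCII. The enclosing job element is an assumption added so that the snippet is well-formed XML. The default of "true" for merge comes from the slides; the defaults used for the other optional attributes are assumptions, and read_files is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The two entries from the slide, wrapped in an assumed root element.
SNIPPET = """<job>
  <file input="~/mpdroot/macro/mpd/evetest1.root" output="~/mpdroot/macro/mpd/mpddst1.root" start_event="0" count_event="0"/>
  <file input="~/mpdroot/macro/mpd/evetest2.root" output="~/mpdroot/macro/mpd/mpddst2.root" start_event="0" count_event="1000" parallel_mode="5" merge="true"/>
</job>"""


def read_files(xml_text):
    """Return one dictionary per file element, with typed attribute values."""
    entries = []
    for element in ET.fromstring(xml_text).iter("file"):
        entries.append({
            "input": element.get("input"),
            "output": element.get("output"),
            # Assumed defaults for attributes the slides mark as optional.
            "start_event": int(element.get("start_event", "0")),
            "count_event": int(element.get("count_event", "0")),
            "parallel_mode": int(element.get("parallel_mode", "1")),
            # The slides give "true" as the default for merge.
            "merge": element.get("merge", "true").lower() == "true",
        })
    return entries


for entry in read_files(SNIPPET):
    print(entry["input"], "processors:", entry["parallel_mode"])
```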

15 MPD-scheduler on the NICA cluster
[Diagram. Elements shown: job description files job_reco.xml and job_command.xml; MPD-scheduler; qsub; the SGE (Sun Grid Engine) batch system with an SGE server and SGE workers, labelled (10) and (14), some marked "free"; input files evetest1.root, evetest2.root, evetest3.root; output files mpddst1.root, mpddst2.root, mpddst3.root; *.root files on GlusterFS.]

16 The speedup of one reconstruction on the NICA cluster

17 The description of the scheduling system on mpd.jinr.ru

18 Conclusions
- The distributed NICA cluster (128 cores) was deployed on the basis of the LHEP farm for the NICA/MPD experiment (Fairsoft, ROOT/PROOF, MPDRoot, Gluster, Sun Grid Engine).
- The data storage (10 TB) was organized with the GlusterFS distributed file system: /nica/mpd[1-8].
- MPD-scheduler, a system for distributed job execution, was developed to run MPDRoot macros concurrently on the cluster. It is based on the Sun Grid Engine scheduling system.
- The web site mpd.jinr.ru, in the section Computing – NICA cluster – Batch processing, presents the manual for the developed MPD scheduling system.

