DIANE Project, CHEP 03: DIANE, a Distributed Analysis Environment for semi-interactive simulation and analysis in Physics. Jakub T. Moscicki


1 DIANE: Distributed Analysis Environment for semi-interactive simulation and analysis in Physics
Jakub T. Moscicki, CERN/IT (Jakub.Moscicki@cern.ch)
DIANE Project, CHEP 03

2 The need for distribution
- do the analysis/simulation job as parallel tasks to speed up the work
- use powerful, worldwide distributed computational resources
- access data in mass storage systems that would otherwise be too big to fit on your laptop

3 Practical Example
- example: simulation with analysis
  - each task produces a file with histograms
  - job result = sum of histograms produced by tasks
- master-worker model
  - client starts a job
  - workers perform tasks and produce histograms
  - master integrates the results
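The master-worker example above can be sketched in a few lines of Python. This is only an illustration of the idea, not DIANE code: the names `run_task`, `merge` and `run_job` are hypothetical, a thread pool stands in for the workers, and a list of bin counts stands in for a histogram file.

```python
from concurrent.futures import ThreadPoolExecutor
import random

N_BINS = 10  # assumed histogram size, chosen only for illustration


def run_task(task_id, n_events=1000):
    """Worker side: simulate n_events and return one histogram (bin counts)."""
    rng = random.Random(task_id)  # per-task seed keeps each task reproducible
    hist = [0] * N_BINS
    for _ in range(n_events):
        hist[rng.randrange(N_BINS)] += 1
    return hist


def merge(histograms):
    """Master side: job result = bin-wise sum of the task histograms."""
    total = [0] * N_BINS
    for hist in histograms:
        for i, count in enumerate(hist):
            total[i] += count
    return total


def run_job(n_tasks=8, n_workers=4):
    """Client side: start a job, let the workers run tasks, merge the results."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(run_task, range(n_tasks)))
    return merge(results)


if __name__ == "__main__":
    print(run_job())
```

Because each task is seeded by its own id, the merged result does not depend on how many workers are used, which is the property that lets the number of workers vary freely.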

4 Tools at hand: local batch queue
- clusters/farms of PCs running batch queues
- use LSF or PBS to submit parallel analysis tasks producing histograms
- collect and post-process results by hand
  - add all the resulting histogram files

    > foreach i (1 2 3 4 5 6 7 8 9 10)
    > bsub -q 8nh run-worker
    > end
    Job is submitted to queue....
    > ls
    LSFJOB_250973 LSFJOB_250974 LSFJOB_250975

5 Tools at hand: global batch queue
- federation of clusters, also known as a GRID
- use EDG Resource Broker to submit tasks

    > dg-job-submit worker.jdl
    Connecting to host grid014.ct.infn.it, port 7771
    Logging to host grid014.ct.infn.it, port 15830
    ******************************************************************************************
    JOB SUBMIT OUTCOME
    The job has been successfully submitted to the Resource Broker.
    Use dg-job-status command to check job current status.
    Your job identifier (dg_jobId) is:
    - https://grid014.ct.infn.it:7846/137.138.181.249/195456283026315?grid014.ct.infn.it:7771
    ******************************************************************************************

6 Comments
- using middleware directly requires a lot of manual work
  - integration of task results
  - keeping track of failed tasks and resubmitting workers
- not easy to monitor the job progress and cancel jobs
- only one task per worker
  - very inefficient if worker initialization time is long
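The last point can be made concrete with a small calculation. The sketch below assumes a simple cost model that is not taken from the slides: a worker pays a fixed initialization time once, then spends a fixed time on each task. The function name and the numbers are hypothetical.

```python
def cpu_efficiency(t_init, t_task, tasks_per_worker):
    """Fraction of a worker's lifetime spent on tasks rather than initialization."""
    useful = tasks_per_worker * t_task
    return useful / (t_init + useful)


# Hypothetical numbers: 60 s initialization, 30 s per task.
one_task = cpu_efficiency(60, 30, 1)     # a fresh worker for every task
ten_tasks = cpu_efficiency(60, 30, 10)   # the same worker reused for ten tasks
print(f"{one_task:.2f} vs {ten_tasks:.2f}")
```

Under these assumptions a worker that runs a single task spends only a third of its time on useful work, while one reused for ten tasks spends five sixths of it.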

7 User Wishlist
- automatic integration of task results
- monitoring of job progress and individual tasks
- automatic error-recovery policies
- task granularity may change independently of the number of workers
  - natural load balancing and optimization of performance
- performance fine tuning: workers may be mapped to threads, processes or machines depending on the context
- uniform, transparent and easy user interface and API
  - hiding the complexity of underlying middleware mechanisms
  - the same API and UI are used when running local jobs and GRID jobs
- batch, interactive and semi-interactive operation modes
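Two of the wishes above, task granularity independent of the number of workers and an automatic error-recovery policy, can be illustrated with a pull-based task queue. This is a minimal sketch under stated assumptions, not the DIANE implementation: `run_job`, `do_task` and `max_attempts` are hypothetical names, threads stand in for workers, and the recovery policy shown is simply "retry a failed task a fixed number of times".

```python
import queue
import threading


def run_job(tasks, do_task, n_workers=3, max_attempts=3):
    """Run hashable tasks on n_workers threads, retrying each failure."""
    pending = queue.Queue()
    for task in tasks:
        pending.put((task, 1))  # (task, attempt number)
    results, failed = {}, {}
    lock = threading.Lock()

    def worker():
        # Each worker pulls the next task when it is free, so fast workers
        # naturally take more tasks than slow ones.
        while True:
            try:
                task, attempt = pending.get_nowait()
            except queue.Empty:
                return
            try:
                out = do_task(task)
            except Exception as exc:
                if attempt < max_attempts:
                    pending.put((task, attempt + 1))  # resubmit the task
                else:
                    with lock:
                        failed[task] = exc  # give up and report the error
            else:
                with lock:
                    results[task] = out

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, failed


if __name__ == "__main__":
    done, errors = run_job(range(10), lambda n: n * n)
    print(len(done), "tasks ok,", len(errors), "failed")
```

The number of tasks and the number of workers are separate arguments, so the task size can be tuned without touching the worker pool.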

8 Wishlist (cont'd)
- a lightweight "add-on" framework which drives the execution of parallel jobs in the master-worker model over any specific middleware implementation:
  - application oriented: targets common HEP use cases
  - independent of any particular analysis tool
  - with a layered and modular architecture which is easy to adapt to a new environment: important for middleware transition
  - integrated in a modern scripting environment: e.g. Python
  - using standards: e.g. exploiting AIDA for analysis, making it easy to plug in your favourite analysis tool
To address these issues the DIANE Project was set up in CERN/IT

9 DIANE Overview
- DIANE R&D Project
  - started in 2001 in CERN/IT with very limited resources (~1 FTE)
  - collaboration with Geant4 groups at CERN, INFN, ESA
  - successful prototypes running on LSF and EDG

10 Applications of DIANE
- examples of interdisciplinary applications of Geant4 simulation and analysis
  - speed-up factor ~30 times
  - LHC: ntuple analysis and simulation
  - radiotherapy: brachytherapy, IMRT
  - space missions: ESA Bepi Colombo, LISA
- cern.ch/diane

11 DIANE for HEP workgroup clusters
- features
  - many users, many jobs
  - diverse applications:
    - ntuple analysis, simulation, ...
  - interactive... semi-interactive... batch
  - ~100s of machines
  - dynamic environment
  - users may submit their analysis code
  - mixed CPU and I/O intensive
  - some applications may be preconfigured
    - general analysis, e.g. ntuple projections, or experiment-specific apps
  - load balancing important

12 DIANE for Simulation in Medical Apps
- example: brachytherapy
  - optimization of the treatment planning by MC simulation
- features
  - CPU intensive
  - few users, few jobs
  - one preconfigured application
  - interactive: seconds to minutes
  - ~10s of machines
- ongoing joint collaboration with G4 and hospital units in Torino, Italy

13 DIANE for Simulation in Space Science
- LISA: MC simulation for gravitational waves experiment
- Bepi Colombo mission: HERMES experiment
- features
  - CPU intensive
  - big jobs (10 processor-years)
  - preconfigured applications
  - batch: days
  - 1000+ machines
- requirements:
  - error recovery important
  - monitoring and diagnostics

14 DIANE Prototype and Testing
- scalability tests
  - 70 worker nodes
  - 140 million Geant4 events

15 DIANE Screenshot

    Sun Mar 16 14:58:31 2003 : DIANE.JobMaster.workerReady : worker 5 now ready
    Sun Mar 16 14:58:42 2003 : DIANE.JobMaster.run : number of tasks to finish: 1
    len(self.master.job_progress) : 5
    len(self.master.ready_workers) : 9
    len(self.master.busy_workers) : 1
    len(self.master.registered_workers):10
    Sun Mar 16 14:58:45 2003 : DIANE.JobMaster.receiveTaskResult : recieved result, taskid =3 status: ok
    Processing file task-output2.hbk
    Adding histogram 10
    Adding histogram 20
    Scanned all IDs from 0 to 100, other HBOOK ids (if any) were ignored
    Sun Mar 16 14:58:45 2003 : DIANE.JobMaster.run : job completed ok, quitting control loop
    DIANE.JobMaster.notifyJobFinished : starting notification
    DIANE.JobMaster.notifyJobFinished : deactivating master
    DIANE.JobMaster.workerReady : master not activated
    DIANE.JobMaster.sendResultToClient : terminated...
    terminating JobMaster server process
    312.520u 77.250s 15:09.53 42.8% 0+0k 0+0io 5835pf+0w
    [1] Done start_master
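The second-to-last line of the screenshot looks like the timing summary that csh/tcsh prints for a finished job: user CPU seconds, system CPU seconds, elapsed wall-clock time, and CPU utilization. Assuming that reading is right, the percentage can be cross-checked from the other three fields. The helper `cpu_percent` below is a hypothetical name introduced for this check.

```python
def cpu_percent(time_line):
    """Recompute CPU utilization from a csh-style timing line (assumed format)."""
    fields = time_line.split()
    user = float(fields[0].rstrip("u"))    # user CPU seconds, e.g. "312.520u"
    system = float(fields[1].rstrip("s"))  # system CPU seconds, e.g. "77.250s"
    elapsed = 0.0
    for part in fields[2].split(":"):      # "15:09.53" -> 15 min 9.53 s
        elapsed = elapsed * 60 + float(part)
    return 100.0 * (user + system) / elapsed


line = "312.520u 77.250s 15:09.53 42.8% 0+0k 0+0io 5835pf+0w"
print(f"{cpu_percent(line):.1f}%")
```

The recomputed value falls between 42.8% and 42.9%, in line with the 42.8% shown in the screenshot: the master process used about 390 CPU seconds over roughly 15 minutes of wall-clock time.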

16 DIANE Web Interface

17 References
- more information:
  - cern.ch/diane
  - www.ge.infn.it/geant4/techtransf
  - aida.freehep.org

18 The end

