Geant4 based simulation of radiotherapy in CUDA

Slides:



Advertisements
Similar presentations
Computer Simulation for Emission Tomography: Geant4 and GATE Xiao Han Aug
Advertisements

The Vin č a Institute of Nuclear Sciences, Belgrade, Serbia * Universita degli studi di Bologna, DIENCA, Italia Radovan D. Ili}, Milan Pe{i}, Radojko Pavlovi}
Optimization on Kepler Zehuan Wang
GAMOS tutorial Histogram and Scorers Exercises
Development of an Interface for Using EGS4 Physics Processes in Geant4 K.Murakami (KEK) 27/Mar./
Dose Calculations A qualitative overview of Empirical Models and Algorithms Hanne Kooy.
MONTE-CARLO TECHNIQUES APPLIED TO PROTON DOSIMETRY AND RADIATION SAFETY F. Guillaume, G. Rucka, J. Hérault, N. Iborra, P. Chauvel 1 XXXV European Cyclotron.
DCABES 2009 China University Of Geosciences 1 The Parallel Models of Coronal Polarization Brightness Calculation Jiang Wenqian.
Electromagnetic Physics I Joseph Perl SLAC National Accelerator Laboratory (strongly based on Michel Maire's slides) Geant4 v9.3.p01 Standard EM package.
K. LaihemE166 collaboration LCWS06 Bangalore March 12th 2006 The E166 experiment Development of a polarized positron source for the ILC. Karim Laihem on.
J. Tinslay 1, B. Faddegon 2, J. Perl 1 and M. Asai 1 (1) Stanford Linear Accelerator Center, Menlo Park, CA, (2) UC San Francisco, San Francisco, CA Verification.
Development of an Interface for Using EGS4 Physics Processes in Geant4 K.Murakami (KEK) 27/Mar./
Debunking the 100X GPU vs. CPU Myth: An Evaluation of Throughput Computing on CPU and GPU Presented by: Ahmad Lashgar ECE Department, University of Tehran.
GPGPU overview. Graphics Processing Unit (GPU) GPU is the chip in computer video cards, PS3, Xbox, etc – Designed to realize the 3D graphics pipeline.
Shekoofeh Azizi Spring  CUDA is a parallel computing platform and programming model invented by NVIDIA  With CUDA, you can send C, C++ and Fortran.
Accelerating SQL Database Operations on a GPU with CUDA Peter Bakkum & Kevin Skadron The University of Virginia GPGPU-3 Presentation March 14, 2010.
Geant4: Electromagnetic Processes 2 V.Ivanchenko, BINP & CERN
Sergey Ananko Saint-Petersburg State University Department of Physics
R 3 B Gamma Calorimeter Agenda. ● Introduction ● Short presentation on the first ● Task definition for R&D period ( )
St. Petersburg State University. Department of Physics. Division of Computational Physics. COMPUTER SIMULATION OF CURRENT PRODUCED BY PULSE OF HARD RADIATION.
Photons e- e+ Bremsstrahlung Comton- scattered photon Pair production Photoelectric absorption Delta ray There are no analytical solutions to radiation.
Validation and TestEm series Michel Maire for the Standard EM group LAPP (Annecy) July 2006.
IFluka : a C++ interface between Fairroot and Fluka Motivations Design The CBM case: –Geometry implementation –Settings for radiation studies –Global diagnosis.
Offline Coordinators  CMSSW_7_1_0 release: 17 June 2014  Usage:  Generation and Simulation samples for run 2 startup  Limited digitization and reconstruction.
YOU LI SUPERVISOR: DR. CHU XIAOWEN CO-SUPERVISOR: PROF. LIU JIMING THURSDAY, MARCH 11, 2010 Speeding up k-Means by GPUs 1.
Evaluating FERMI features for Data Mining Applications Masters Thesis Presentation Sinduja Muralidharan Advised by: Dr. Gagan Agrawal.
Medical Accelerator F. Foppiano, M.G. Pia, M. Piergentili
Detector Simulation on Modern Processors Vectorization of Physics Models Philippe Canal, Soon Yung Jun (FNAL) John Apostolakis, Mihaly Novak, Sandro Wenzel.
- DHRUVA TIRUMALA BUKKAPATNAM Geant4 Geometry on a GPU.
Calorimeters Chapter 4 Chapter 4 Electromagnetic Showers.
Detector Simulation Presentation # 3 Nafisa Tasneem CHEP,KNU  How to do HEP experiment  What is detector simulation?
"Distributed Computing and Grid-technologies in Science and Education " PROSPECTS OF USING GPU IN DESKTOP-GRID SYSTEMS Klimov Georgy Dubna, 2012.
1 Calorimeter in G4MICE Berkeley 10 Feb 2005 Rikard Sandström Geneva University.
Introduction What is GPU? It is a processor optimized for 2D/3D graphics, video, visual computing, and display. It is highly parallel, highly multithreaded.
Adam Wagner Kevin Forbes. Motivation  Take advantage of GPU architecture for highly parallel data-intensive application  Enhance image segmentation.
QCAdesigner – CUDA HPPS project
1)Leverage raw computational power of GPU  Magnitude performance gains possible.
IRCC & Mauriziano Hospital & INFN & S Croce e Carle Hospital
Bremsstrahlung Splitting Overview Jane Tinslay, SLAC March 2007.
Improvement of the Monte Carlo Simulation Efficiency of a Proton Therapy Treatment Head Based on Proton Tracking Analysis and Geometry Simplifications.
Geant4 Activities in Japan Some news from Takashi Sasaki, Koichi Murakami, Akinori Kimura and colleagues.
STEIN Analysis for CINEMA Using GEANT4 Seongha Park Kyung Hee University KHU/SSR,
University of Michigan Electrical Engineering and Computer Science Adaptive Input-aware Compilation for Graphics Engines Mehrzad Samadi 1, Amir Hormati.
Geant4 on GPU prototype Nicholas Henderson (Stanford Univ. / ICME)
P. Rodrigues, A. Trindade, L.Peralta, J. Varela GEANT4 Medical Applications at LIP GEANT4 Workshop, September – 4 October LIP – Lisbon.
Pedro Arce Introducción a GEANT4 1 GAMOS tutorial RadioTherapy Exercises Pedro Arce Dubois CIEMAT
3/12/2013Computer Engg, IIT(BHU)1 CUDA-3. GPGPU ● General Purpose computation using GPU in applications other than 3D graphics – GPU accelerates critical.
Dae-Hyun Kim Dept. of Biomedical Engineering The Catholic University of Korea Department of Biomedical Engineering Research Institute.
S. Pardi Frascati, 2012 March GPGPU Evaluation – First experiences in Napoli Silvio Pardi.
Validation of GEANT4 versus EGSnrc Yann PERROT LPC, CNRS/IN2P3
Geant4 Simulation for KM3 Georgios Stavropoulos NESTOR Institute WP2 meeting, Paris December 2008.
Koichi MurakamiGeant4 Physics Verification and Validation (17-19/Jul./2006) 1 Results from the recent carbon test beam at HIMAC Koichi Murakami Statoru.
MCS overview in radiation therapy
NVIDIA® TESLA™ GPU Based Super Computer By : Adam Powell Student # For COSC 3P93.
GRID & Parallel Processing Koichi Murakami11 th Geant4 Collaboration Workshop / LIP - Lisboa (10-14/Oct./2006) 1 GRID-related activity in Japan Go Iwai,
Jun Doi IBM Research – Tokyo Early Performance Evaluation of Lattice QCD on POWER+GPU Cluster 17 July 2015.
Exploiting Graphics Processors for High-performance IP Lookup in Software Routers Jin Zhao, Xinya Zhang, Xin Wang, Yangdong Deng, Xiaoming Fu IEEE INFOCOM.
GPU-based iterative CT reconstruction
INTERCOMPARISON P3. Dose distribution of a proton beam
Parallelized JUNO simulation software based on SNiPER
Genomic Data Clustering on FPGAs for Compression
Leiming Yu, Fanny Nina-Paravecino, David Kaeli, Qianqian Fang
Monte Carlo studies of the configuration of the charge identifier
Linchuan Chen, Xin Huo and Gagan Agrawal
NVIDIA Fermi Architecture
The Hadrontherapy Geant4 advanced example
The n-3He Simulation Using Geant4
P. Rodrigues, A. Trindade, L.Peralta, J. Varela
6- General Purpose GPU Programming
Presentation transcript:

Geant4 based simulation of radiotherapy in CUDA Koichi Murakami (KEK / CRC) Stanford ICME, SLAC, G4-Japan Collaboration supported by NVIDIA Sep/26/2013 18th Geant4 Collaboration meeting

Geant4 @ The Collaboration Should we include a KEK logo? Who else to list for g4? Mention ccoe? Special thanks to the CUDA Center of Excellence Program Sep/26/2013 18th Geant4 Collaboration meeting

18th Geant4 Collaboration meeting Contents Project Overview CUDA Parallelization Performance Sep/26/2013 18th Geant4 Collaboration meeting

18th Geant4 Collaboration meeting G4CU Project Overview Dose calculation for radiation therapy GPU-powered parallel processing with CUDA boost-up calculation speed CUDA porting of Geant4 Functions voxel geometry including DICOM interface material : water with variable densities limited Geant4 EM physic processes electron/positron/gamma scoring dose in each voxel DCMTK GPU processing Dose DICOMRT-Dose gMocren Sep/26/2013 18th Geant4 Collaboration meeting

18th Geant4 Collaboration meeting Physics Processes particles : electron, positron, gamma energy range up to 100 MeV material: water with variable densities processes: electron / positron energy loss (ionization, bremsstrahlung) multiple scattering positron annihilation gamma Compton scattering photo electric effect gamma conversion physics tables dE/dx, range, etc are retrieved from Geant4 Physics tables are prepared for "standard" water and rescaled with the density of each voxel. Sep/26/2013 18th Geant4 Collaboration meeting

18th Geant4 Collaboration meeting CUDA Basics “SIMD” architecture : Single Instruction, Multiple Data CUDA is a data parallel language wants to run same instruction on multiple pieces of data Coalesced memory access to maximize memory throughput, we want a single read to satisfy as many threads as possible Memory hierarchy CUDA provides access to several device memory types: global, shared, constant, texture better memory usage for better performance Race conditions arise when multiple CUDA threads attempt to write to same location in global memory may happen in dose accumulation we avoid race conditions or using atomic operations Sep/26/2013 18th Geant4 Collaboration meeting

Parallelization Challenges in Geant4 Programming Methodology CUDA porting of large and complex code large scale of object-oriented design inter-dependencies in many places Branching, look-up tables, single-thread optimizations Software Complexity Sophisticated geometry and tracking management Elaborate physics models Sep/26/2013 18th Geant4 Collaboration meeting

18th Geant4 Collaboration meeting G4CU Basics Each GPU thread processes a single track until it dies or exits GPU runs on 32k CUDA threads (256 threads on128bloks) A track stores data for particle spices, position, direction, energy, etc Each thread has two stacks one for storing secondary particles one for recording the energy dose in a voxel Periodic actions: check termination conditions generate primaries pop secondary particles Sep/26/2013 18th Geant4 Collaboration meeting

Notes on Dose Accumulation 2 critical issues on performance: Race conditions might arise when multiple CUDA threads attempt to write to same location in global memory. That may happen in dose accumulation in each voxel. Two ideas were tried: parallel stack for dose and reduction atomicAdd operation in CUDA -> better performance explicit memory access Double precision variables are used for dose. other variables are single precision. prevent from overflow small energy dep. + large accumulated dose Sep/26/2013 18th Geant4 Collaboration meeting

Performance Profiling nvpof is helpful to identify performance bottlenecks. Process Process total Post step PIL Along step Manag. Init. Step length At-rest Component total (%) 100.00 52.80 20.73 14.16 4.19 3.78 3.46 0.89 Bremsstrahlung 32.81 29.83 2.23 0.75 Ionization 14.83 1.70 3.50 8.84 0.79 Photo-electric effect 10.79 8.80 1.57 0.41 Gamma conversion 10.67 8.72 1.54 Multiple scattering 10.50 3.43 4.58 2.49 Transport 8.67 1.17 5.23 0.74 0.57 0.96 Compton scattering 4.20 2.14 1.58 0.48 Management Pair production 2.56 0.44 1.23 0.36 0.53 Electron deletion 0.43 Sep/26/2013 18th Geant4 Collaboration meeting

18th Geant4 Collaboration meeting Hardware & SDK GPU: Tesla K20 (Kepler) 2496 cores, 706 MHz, 5GB GDDR5 (ECC) SDK: CUDA 5.5, CURAND CPU: Xeon X5680 (3.33GHz) Sep/26/2013 18th Geant4 Collaboration meeting

Phantom Geometry Configurations voxel size: 5x5x2mm 30.5 cm Sep/26/2013 18th Geant4 Collaboration meeting

18th Geant4 Collaboration meeting Particle Sources Mono-energetic source : 20 MeV electron 6 MeV photons Spread energy source generated by medical linac 6 MV & 18 MV photons Sep/26/2013 18th Geant4 Collaboration meeting

Depth Dose Distribution, CPU vs. GPU Depth dose distribution for water phantom 6MeV photon and 20 MeV electron (pencil beam) depth dose along central voxels 100M primaries Sep/26/2013 18th Geant4 Collaboration meeting

Comparisons for Slab Phantoms Lung/Bone as an inner material 6MV photon 100M(CPU) vs 1M(GPU) Sep/26/2013 18th Geant4 Collaboration meeting

Computation Time of Geant4/CPU and G4CU/GPU Primary Phantom Time/History CPU (sec) Time/History GPU (sec) CPU/GPU 20 MeV electron Water 1.06E-03 2.52E-05 42.1 6 MeV photon 4.47E-04 1.12E-05 39.9 CPU: Intel Xeon X5680 (3.33 GHz) GPU : NVIDIA Tesla K20 G4CU achieves about 40 times speedup against CPU-based Geant4 simulation. Sep/26/2013 18th Geant4 Collaboration meeting

18th Geant4 Collaboration meeting Software workflow water phantom Visualization DICOM Data DCMTK G4CU voxel geom. dose output G4 voxel geom. GDD IO GDD file ROOT ROOT file Histogramming Sep/26/2013 18th Geant4 Collaboration meeting

Visualization with gMocren 50M 6MeV photons, pencil beam shot on head voxel size: 1.7 x 1.7 x 1.2 mm segments: 128 x 128 x 64 ≈ 1M voxels Just for Demonstration Sep/26/2013 18th Geant4 Collaboration meeting

18th Geant4 Collaboration meeting Summary Collaborative activity on Geant4-GPU between Stanford ICME, SLAC, and G4-Japan (KEK), supported by NVIDIA Focused on medical application Dose calculation in voxel domain Geant4 EM physics processes CUDA porting of Geant4 Core part of implementation was done. Verification against CPU version of Geant4 well-agreed in 1st order Performance gain about 40 times Several ongoing follow-ups robustness, performance improvements, improved code Sep/26/2013 18th Geant4 Collaboration meeting