1 Application of High Performance Computing to Situation Awareness Simulations (also titled: Application of High Performance Computing to Near-Real Time Simulations). Amit Majumdar, Group Leader, Scientific Computing, San Diego Supercomputer Center; Associate Professor, Dept of Radiation Oncology, University of California San Diego.

2 Outline
- Academic High Performance Computing
- Applications:
  - Event-driven Science
  - Online Adaptive Cancer Radiotherapy
  - Dynamic Data Driven Image-guided Neurosurgery
- Summary

3 Academic High Performance Computing

4 TeraGrid
- NSF, the National Science Foundation, funds TeraGrid
- TeraGrid: NSF-funded supercomputer centers in the US, linked by high-bandwidth connections
- HPC machines in the teraflop (TF, 10^12 floating point operations/sec) to petaflop (PF, 10^15 floating point operations/sec) range
- 11 Resource Providers, One Facility

5 NSF - TeraGrid
TeraGrid is a facility that integrates computational, information, and analysis resources at the San Diego Supercomputer Center, the Texas Advanced Computing Center, the University of Chicago / Argonne National Laboratory, the National Center for Supercomputing Applications, Purdue University, Indiana University, Oak Ridge National Laboratory, the Pittsburgh Supercomputing Center, LSU, and the National Center for Atmospheric Research.

6 Top 5 of the Top500 HPC Machines (lists shown for November 2009 and November 2008)

7 NSF HPC Perspective: Tflop to Pflop
Track2 awards (two plus one, i.e. 3 awards):
- Track2-A/B: $30M for the machine plus ~$8-10M/year operating cost; ~500 TF to 1 PF range (peak)
  - Ranger at TACC, U Texas (579 TF, ~62K cores)
  - Kraken at NICS, ORNL (1 PF, ~99K cores)
- Track2-D: three different machines: data intensive, experimental, grid research
- Other awards for visualization and data systems
Track1 award (one award, ~$200M): a multi-PF system with sustained PF performance on scientific applications

8 Event-driven Science

9 On-demand Earthquake-induced Ground Wave Simulation
Prof. Jeroen Tromp (at Caltech when we collaborated, currently at Princeton). Caltech's near-real-time simulation of southern California seismic events uses the SPECFEM3D software, a parallel MPI code that simulates SoCal seismic wave propagation based on the spectral element method (SEM). The movies illustrate the up (red) and down (blue) velocity of Earth's surface.

10 Events
Every time an earthquake of magnitude > 3.5 occurs in SoCal, 1000s of seismograms are recorded at 100s of seismic stations (capturing epicenter, depth, and intensity). The pipeline:
- Automatically collect these seismic recordings from the SCSN (Southern California Seismic Network) via the internet
- Simulate the seismic waves generated by the earthquake in a 3-D southern California seismic velocity model using the SCSN data
- After the full 3-D wave simulation, collect the surface motion data (displacement, velocity, acceleration) and map it on top of the topography
- Render the data and generate movies
- Earthquake movies are approved by a geophysicist at Caltech
- Movies are published within ~45 mins of the earthquake
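The automated trigger and the pipeline ordering above can be sketched as a simple filter plus a fixed stage list. This is a hypothetical sketch: only the SoCal region and the magnitude > 3.5 threshold come from the slide; the field names, stage names, and functions are illustrative.

```python
# Hypothetical sketch of the shake-movie trigger and pipeline ordering.
MAGNITUDE_THRESHOLD = 3.5

def should_trigger(event):
    """Start the automated pipeline only for SoCal events above magnitude 3.5."""
    return event["region"] == "SoCal" and event["magnitude"] > MAGNITUDE_THRESHOLD

PIPELINE = [
    "collect_scsn_recordings",    # fetch seismograms via the internet
    "run_specfem3d_simulation",   # 3-D wave propagation in the velocity model
    "map_surface_motion",         # disp/vel/accel mapped onto the topography
    "render_movies",
    "await_geophysicist_approval",
    "publish_movies",             # target: within ~45 mins of the event
]

def process(event):
    """Run every stage in order for qualifying events; return the stages executed."""
    if not should_trigger(event):
        return []
    return list(PIPELINE)
```

The key design point from the slide is that every stage after the trigger runs with no human in the loop except the final geophysicist approval.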

11 On-demand HPC
An earthquake can happen at any time, so on-demand HPC resources are needed for fast simulation. The code uses 144 cores (Intel Woodcrest dual-socket dual-core, 2.3 GHz nodes) to complete simulations in about 20 mins. HPC resources were set up at SDSC, called OnDemand HPC, with a special queue into which Caltech shake-movie jobs can arrive automatically at any time. The batch software will kill other jobs to guarantee that this job gets resources. Results are sent back to Caltech, all with no human intervention.
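The special-queue behavior (evicting running jobs so an incoming shake-movie job always fits) can be modeled roughly as follows. This is a toy sketch, not the actual batch system used at SDSC; all class and job names are invented, and only the 144-core figure comes from the slide.

```python
# Toy model of a preemptive on-demand queue: an urgent job evicts
# preemptible running jobs until enough cores are free.
class Job:
    def __init__(self, name, cores, preemptible=True):
        self.name = name
        self.cores = cores
        self.preemptible = preemptible

class OnDemandQueue:
    def __init__(self, total_cores):
        self.total_cores = total_cores
        self.running = []

    def free_cores(self):
        return self.total_cores - sum(j.cores for j in self.running)

    def submit(self, job):
        """Ordinary submission: run only if enough cores are idle."""
        if self.free_cores() >= job.cores:
            self.running.append(job)
            return True
        return False

    def submit_urgent(self, job):
        """Kill preemptible jobs until the urgent job fits, then run it."""
        for victim in [j for j in self.running if j.preemptible]:
            if self.free_cores() >= job.cores:
                break
            self.running.remove(victim)
        if self.free_cores() >= job.cores:
            self.running.append(job)
            return True
        return False
```

In production this policy lives in the batch scheduler's configuration, not in application code; the sketch only illustrates the guarantee the slide describes.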

12 Shake Movies: Implications
- Emergency preparedness/response
- Tsunami warning
- Work is being extended to do global simulation
Example event: Sun Apr 11, 2010, 16:42:07; Lat 32.5285, Long -115.3433

13 Online Adaptive Cancer Therapy

14 Conventional Radiotherapy
- Treatment simulation: build a virtual patient model
- Treatment planning: perform a virtual treatment using a virtual machine on the virtual patient
- Treatment delivery: the same treatment is repeated for many fractions
Basic assumption: the human body is a static system. (Workflow: Simulation, then Planning, then days of repeated Treatment.)

15 The Human Body Is A Dynamic System
(Figure: tumor at week 1 vs. week 3; Van de Bunt et al. '06)
- Tumor volume shrinkage in response to the treatment
- Tumor shape deformation due to filling-state changes of neighboring organs
- Relative position change between tumor and normal organs

16 Consequence of Patient Anatomical Variation
An optimal treatment plan may become less optimal, or not optimal at all. (Figure: dose to tumor vs. tumor control, and dose to normal tissues vs. toxicity, shown relative to the prescribed tumor dose.)

17 Solution
Develop a new treatment plan that is optimal for the patient's new geometry: adaptive radiation therapy (ART).

18 Online ART
(Workflow: Simulation, Planning over days, then a repeated 5-8 min loop of On-board Imaging, Re-planning, and Treatment.)
On-board volumetric imaging has recently become available. Major technical obstacles for the clinical realization of online ART:
- Real-time re-planning
- Imaging dose
- Clinical workflow

19 Our Solution to the Real-time Re-planning Problem
Development of GPU-based computational tools

20 SCORE: Supercomputing On-line Re-planning Environment
Project goal: to develop real-time re-planning tools based on GPUs. Funded by a UC Lab Research Grant; a collaboration with SDSC and Lawrence Livermore National Laboratory.

21 Online Re-planning Process
(Flowchart: the treatment planning system supplies the planning CT with contours, the beam setup, and the initial plan. CBCT reconstruction and deformable image registration produce the deformed pCT and contours; dose calculation produces the dose deposition coefficients and the dose distribution; plan re-optimization produces the new plan.)

22 Development of GPU-based Real-time Deformable Image Registration
Gu et al., Phys Med Biol 55(1): 207-219, 2010

23 Deformable Image Registration
Morphing one image into another with correct correspondence.

24 Deformable Image Registration with Demons
Gu et al., Phys Med Biol 55(1): 207-219, 2010
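As a rough illustration of the demons idea (not the GPU implementation from Gu et al.), a one-dimensional Thirion-style demons loop looks like the following; every parameter value here is an arbitrary placeholder.

```python
import numpy as np

def demons_1d(fixed, moving, iters=200, sigma=2.0, step=1.0):
    """Thirion-style demons in 1-D: estimate a displacement field u so that
    moving(x + u(x)) matches fixed(x). Parameters are illustrative only."""
    x = np.arange(fixed.size, dtype=float)
    u = np.zeros_like(fixed)
    # Gaussian kernel used to regularize (smooth) the field each iteration
    k = np.exp(-0.5 * (np.arange(-8, 9) / sigma) ** 2)
    k /= k.sum()
    for _ in range(iters):
        warped = np.interp(x + u, x, moving)   # resample the moving image
        diff = warped - fixed                  # intensity mismatch
        grad = np.gradient(fixed)              # fixed-image gradient drives the force
        # Demons update: (fixed - warped) * grad / (|grad|^2 + diff^2)
        u += step * (-diff * grad / (grad ** 2 + diff ** 2 + 1e-9))
        u = np.convolve(u, k, mode="same")     # smooth the displacement field
    return u
```

In 3-D each voxel's update is independent, which is what makes the algorithm map so well onto one-thread-per-voxel GPU execution.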

25 Results for GPU-based Demons Algorithms
3D spatial error (mm) / GPU time (s); image size 256×256×100; ~100x speedup compared to an Intel Xeon 2.27 GHz CPU.

Method | Case 1     | Case 2     | Case 3     | Case 4     | Case 5     | Average
PF     | 1.11/6.80  | 1.04/7.18  | 1.36/7.39  | 2.51/6.49  | 1.84/7.24  | 1.57/7.02
ePF    | 1.10/6.82  | 1.00/7.20  | 1.32/7.42  | 2.42/6.56  | 1.82/7.08  | 1.53/7.02
AF     | 1.15/8.29  | 1.05/9.24  | 1.39/8.79  | 2.34/7.75  | 1.81/8.44  | 1.55/8.50
DF     | 1.19/7.71  | 1.16/8.65  | 1.48/8.02  | 2.59/8.30  | 1.91/8.44  | 1.66/8.22
aDF    | 1.11/8.36  | 1.02/8.69  | 1.35/8.97  | 2.27/7.77  | 1.80/8.70  | 1.51/8.50
IC     | 1.24/11.07 | 1.28/11.47 | 1.42/11.54 | 3.27/10.46 | 1.67/10.98 | 1.78/11.10

26 Development of GPU-based Real-time Dose Calculation
Gu et al., Phys Med Biol 54(20): 6287-97, 2009
Jia et al., Phys Med Biol 2010 (in press)

27 Finite-size Pencil Beam (FSPB) Model
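A toy version of the finite-size pencil-beam idea, dose as a weighted superposition of per-beamlet kernels, might look like the following. The exponential-times-Gaussian kernel and every constant in it are placeholders for illustration, not the clinical model of Gu et al.

```python
import numpy as np

def beamlet_dose(depth, lateral, mu=0.05, sigma=0.3):
    """Toy FSPB kernel: exponential depth attenuation times a Gaussian
    lateral profile. mu and sigma are made-up illustrative constants."""
    return np.exp(-mu * depth) * np.exp(-0.5 * (lateral / sigma) ** 2)

def total_dose(depths, laterals, beamlet_centers, weights):
    """Superpose weighted beamlet contributions at every voxel.
    depths and laterals are same-shaped arrays of voxel coordinates.
    On a GPU, one thread would typically own one voxel (or one beamlet)."""
    dose = np.zeros_like(depths, dtype=float)
    for center, w in zip(beamlet_centers, weights):
        dose += w * beamlet_dose(depths, laterals - center)
    return dose
```

The per-voxel independence of the superposition is what the table on the next slide exploits: more voxels and beamlets simply mean more identical, parallel accumulations.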

28 Results for GPU-based FSPB Algorithm

Voxel size (cm³) | Beamlet size (cm²) | # Voxels (×10⁶) | # Beamlets | CPU time (s) | GPU time (s) | Speedup
0.50×0.50×0.50   | 0.20×0.20          | 0.22            | 2500       | 21.22        | 0.06         | 373
0.37×0.37×0.37   | 0.20×0.20          | 0.51            | 2500       | 42.80        | 0.10         | 409
0.30×0.30×0.30   | 0.20×0.20          | 1.00            | 2500       | 78.27        | 0.18         | 419
0.25×0.25×0.25   | 0.20×0.20          | 1.73            | 2500       | 124.54       | 0.30         | 421
0.25×0.25×0.25   | 0.25×0.25          | 1.73            | 1600       | 120.14       | 0.29         | 415
0.25×0.25×0.25   | 0.33×0.33          | 1.73            | 900        | 112.78       | 0.27         | 416
0.25×0.25×0.25   | 0.50×0.50          | 1.73            | 400        | 100.77       | 0.24         | 417

~400x speedup compared to an Intel Xeon 2.27 GHz CPU; < 1 sec for a 9-field prostate IMRT plan.

29 Monte Carlo Dose Calculation on GPU
Directly map the DPM code onto the GPU; treat a GPU card as a CPU cluster.
(Flowchart: Start → transfer data to the GPU, including random-number seeds, cross sections, and pre-generated e- tracks, etc. → on each thread, repeatedly: (a) clean the local counter, (b) simulate one MC history, (c) add the dose to the global counter → once a preset number of histories is reached, transfer data from GPU to CPU → End.)
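The per-thread local-counter scheme in the flowchart can be mimicked in serial Python. This is purely illustrative: the real code is the DPM package running as CUDA threads, and the physics here is reduced to a random 1-D energy deposition that merely conserves each history's energy.

```python
import numpy as np

def simulate_history(rng, n_vox=50, energy=1.0):
    """One MC 'history': deposit random fractions of the particle's energy
    along a 1-D voxel track, dumping the remainder in the final voxel so
    that energy is conserved exactly. (Toy physics, not DPM.)"""
    local = np.zeros(n_vox)              # (a) clean the local counter
    pos, e = 0, energy
    while pos < n_vox - 1 and e > 1e-6:  # (b) simulate the history step by step
        dep = e * rng.uniform(0.05, 0.5)
        local[pos] += dep
        e -= dep
        pos += 1
    local[pos] += e                      # deposit whatever energy remains
    return local

def mc_dose(n_histories=1000, n_vox=50, seed=0):
    rng = np.random.default_rng(seed)
    dose = np.zeros(n_vox)               # global counter
    for _ in range(n_histories):         # each iteration stands in for a GPU thread
        dose += simulate_history(rng, n_vox)  # (c) add to the global counter
    return dose
```

On a real GPU the additions into the global counter must be atomic, since many threads write to the same voxels concurrently; the serial loop hides that detail.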

30 Results for GPU-based MC Dose Calculation

Case | Source type | # Histories | Std Dev CPU (%) | Std Dev GPU (%) | T_CPU (min) | T_GPU (min) | T_CPU/T_GPU
1    | Electron    | 10^7        | 0.66            | 0.65            | 8.3         | 1.8         | 4.5
2    | Photon      | 10^9        | 0.41            |                 | 94          | 17          | 5.5

~5x speedup compared to an Intel Xeon 2.27 GHz CPU; < 3 min for 1% sigma for photon beams.

31 Development of GPU-based Real-time Plan Re-optimization
Men et al., Phys Med Biol 54(21): 6565-6573, 2009
Men et al., Phys Med Biol 2010 (under review)
Men et al., Med Phys 2010 (to be submitted)

32 Results of Real-time Re-planning
We have developed GPU-based computational tools for real-time treatment re-planning. For a typical 9-field prostate case:
- The deformable registration can be done in 7 seconds
- The dose calculation takes less than 2 seconds
- The plan re-optimization takes less than 1 second (FMO), 2 seconds (DAP), or 30 seconds (VMAT)
- A new plan can be developed in about 10-40 seconds
Online ART may substantially improve local tumor control while reducing normal-tissue complications. These tools can be used to solve other radiotherapy problems.

33 Dynamic Data Driven Image-guided Neurosurgery
A. Majumdar¹, A. Birnbaum¹, D. Choi¹, A. Trivedi², S. K. Warfield³, K. Baldridge¹, and Petr Krysl²
¹ San Diego Supercomputer Center, University of California San Diego
² Structural Engineering Dept, University of California San Diego
³ Computational Radiology Lab, Brigham and Women's Hospital, Harvard Medical School
Grants: NSF ITR 0427183, 0426558; NIH P41 RR13218, P01 CA67165, LM0078651; I3 grant (IBM)

34 Neurosurgery Challenge
Challenges:
- Remove as much tumor tissue as possible
- Minimize the removal of healthy tissue
- Avoid the disruption of critical anatomical structures
- Know when to stop the resection process
These are compounded by the intra-operative brain shape deformation that occurs as a result of the surgical process, which diminishes the value of the preoperative plan. It is important to quantify and correct for these deformations while surgery is in progress by dynamically updating pre-operative images in a way that allows surgeons to react to the changing conditions. The simulation pipeline must meet the real-time constraints of neurosurgery: provide updated images approximately once per hour, each within a few minutes, during surgery lasting 6 to 8 hours.

35 Intraoperative MRI Scanner at BWH

36 Brain Shape Deformation (images: before surgery vs. after surgery)

37 Example of visualization: Intra-op Brain Tumor with Pre-op fMRI

38 Overall Process
Before image-guided neurosurgery: preoperative data acquisition → segmentation and visualization → preoperative planning of the surgical trajectory.
During image-guided neurosurgery: intraoperative MRI → segmentation → registration → surface matching → solve the biomechanical model for volumetric deformation → visualization → surgical process, all informed by the preoperative data.

39 Timing During Surgery
(Timeline, 0-40 min: preop segmentation before surgery; then, during surgery, intraop MRI → segmentation → registration → surface displacement → biomechanical simulation → visualization, repeating as the surgery progresses.)

40 Current Prototype DDDAS
Inside the hospital: pre- and intra-op 3D MRI (once/hr) → local computer at BWH → segmentation, registration, and surface matching for boundary conditions → crude linear-elastic FEM solution → merge pre- and intra-op visualization → intra-op surgical decision and steering. Repeated once every hour or two during a 6- to 8-hour surgery.

41 Two Research Aspects
- Grid architecture: grid scheduling, on-demand remote access to multi-teraflop machines, and data transfer. Data transfer from BWH to SDSC, solution of the detailed advanced biomechanical model, and transfer of the results back to BWH for visualization must all be performed within a few minutes.
- Development of a detailed, advanced, non-linear, scalable viscoelastic biomechanical model to capture detailed intraoperative brain deformation.

42 End-to-end Timing of RTBM
Timing of transferring ~20 MB files from BWH to SDSC, running the simulation on 16 nodes (32 processors), and transferring the files back to BWH: 9 s (transfer to SDSC) + (60 s + 7 s simulation) + 50 s (transfer back) = 126 s. This shows that the grid infrastructure can provide biomechanical brain-deformation simulation solutions (using the linear elastic model) to operating rooms at BWH within ~2 minutes using TeraGrid machines, satisfying the tight time constraint set by the neurosurgeons.

43 Current and New Biomechanical Models
- Current: linear elastic material model (RTBM)
- Advanced model under development: FAMULS, based on a conforming adaptive mesh refinement (AMR) method. Inspired by the theory of wavelets, this refinement produces globally compatible meshes by construction.
The first task was to replicate the linear elastic result produced by the RTBM code using FAMULS.

44 Advanced Biomechanical Model
The current solver is based on small-strain isotropic elasticity. The new biomechanical model will be an inhomogeneous, scalable, non-linear viscoelastic model with AMR. We also want to increase the resolution close to the level of MRI voxels, i.e., millions of finite elements. Since this complex model still has to meet the real-time constraint of neurosurgery, it requires fast access to remote multi-teraflop systems.

45 Summary
HPC resources can enable near-real-time simulations for various scientific, engineering, and medical applications. The architecture has to plan:
- what the right HPC resources are
- how to access the HPC resources
- how to deal with data transfer, etc.
Overall this can facilitate:
- Natural or man-made event-driven rapid response and preparedness
- Adaptive simulations to provide new capability
- Dynamic data driven simulations to enhance quality
