Stencil-based Discrete Gradient Transform Using

Paper_97
Stencil-based Discrete Gradient Transform Using GPU Device in Compressed Sensing MRI
Xuelin (Nick) Cui and Hongbin Guo
FMI Medical Systems, Inc., Solon Rd., Solon, OH, U.S.A.

Introduction
Total variation (TV) has been widely incorporated in iterative image reconstruction since the inception of the compressed sensing (CS) technique. Computing TV requires the discrete gradient transform (DGT) of an image, and a high-order approximation is employed here in pursuit of better image quality. High-order gradient computation is time-consuming, however, especially when implemented in central processing unit (CPU) code. Graphics processing units (GPUs) have recently brought major changes to clinical computing because of their massive parallelism: a GPU executes the same set of instructions simultaneously across a very large number of threads (single instruction, multiple data, or SIMD). In this work we demonstrate that a stencil-based implementation of high-order DGT using shared memory achieves high image quality with fast computation in CS MRI.

Results
The reconstruction framework is tested on an MRI head image. The implementation runs on a Microsoft Windows PC equipped with an NVIDIA K2000M GPU with 2 GB of memory. The sampling rate of MRI k-space is 30%. The test compares results from analytic reconstruction, CS reconstruction using regular DGT, and CS reconstruction using stencil-based high-order DGT. The results are shown in Fig. 3: the top left panel shows the original image (ground truth), the top right panel the analytic reconstruction, the bottom left panel the CS MRI reconstruction with regular DGT, and the bottom right panel the high-order stencil-based CS reconstruction. The stencil-based method gives visibly higher fidelity, with fewer artifacts and better contrast and resolution. A zoomed-in view of the region of interest (ROI) is displayed in Fig. 4.
The analytic method shows strong noise in brain tissue. The regular DGT-based method shows less noise, but anatomical detail is significantly lost. The stencil-based method, by contrast, shows a distinct improvement over both the inverse Fourier-based (analytic) method and the regular DGT-based method. Fig. 5 compares profiles taken halfway through the image, where the stencil-based high-order DGT method gives the best approximation to the truth. Fig. 6 shows that the stencil-based GPU implementation of higher-order DGT using shared memory converges significantly faster than the CPU implementation.

Fig. 3. Reconstructed image comparison.
Fig. 4. ROI comparison with details.

Method
The MRI system can be modeled mathematically as a linear system in which the system matrix maps the unknown image to the measured data. The TV-based CS model follows [1] and can be further converted to a Lagrangian-type objective function. In this optimization framework, the TV prior is the key factor governing image quality. Therefore, instead of forward or backward finite differences, the stencil-based DGT is calculated here with an 8th-order approximation for higher precision. In particular, the DGT of an image at a given location is defined by a finite-difference stencil formula (see [2]).

A major bottleneck in GPU computing is repeated, frequent access to global memory. As shown in Fig. 1, global memory sits at the highest level of the memory hierarchy and is slower than any other type of GPU memory. We therefore use shared memory to store a (2M + 1)-point stencil with uniform spacing along the given direction. Fig. 2 illustrates how an image is divided into a group of tiles, with a stencil-sized region of shared memory surrounding each tile. In this configuration, the global memory (the image) is loaded only once during the entire computation. Shared memory is much faster than global memory and almost as fast as local registers. Stencil sizes are optimized experimentally so that coalesced memory access is achieved on the GPU device.
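The tile-plus-halo idea can be sketched on the CPU. The poster's own stencil formula did not survive in this transcript, so the sketch below assumes the standard 8th-order central-difference form for the x-derivative, $\partial_x u_{i,j} \approx \frac{1}{h}\sum_{m=1}^{M} c_m\,(u_{i+m,j} - u_{i-m,j})$ with $M = 4$ and the textbook weights $c = (4/5, -1/5, 4/105, -1/280)$, which is the kind of weight set the algorithm in [2] produces. The names `load_tile_row` and `gradient_x`, the tile width, and the clamp-to-edge boundary rule are all hypothetical choices for illustration, and a plain Python list stands in for GPU shared memory.

```python
# Standard 8th-order central-difference weights for a first derivative
# (M = 4 neighbours per side, i.e. a 9-point stencil). Assumed here; the
# poster's exact coefficients are not present in the transcript.
COEFFS = (4.0 / 5.0, -1.0 / 5.0, 4.0 / 105.0, -1.0 / 280.0)
M = len(COEFFS)


def load_tile_row(image, row, col0, tile_w):
    """Copy one tile row plus M halo pixels per side into a local buffer.

    The buffer plays the role of GPU shared memory. Columns outside the
    image are clamped to the nearest edge pixel (an assumed boundary rule).
    """
    width = len(image[row])
    return [image[row][min(max(c, 0), width - 1)]
            for c in range(col0 - M, col0 + tile_w + M)]


def gradient_x(image, tile_w=16, h=1.0):
    """Approximate du/dx at every pixel, processing one tile at a time."""
    height, width = len(image), len(image[0])
    grad = [[0.0] * width for _ in range(height)]
    for row in range(height):
        for col0 in range(0, width, tile_w):
            w = min(tile_w, width - col0)            # last tile may be narrower
            buf = load_tile_row(image, row, col0, w)  # single read of the image data
            for k in range(w):
                centre = k + M                        # index of this pixel in buf
                acc = 0.0
                for m, c in enumerate(COEFFS, start=1):
                    acc += c * (buf[centre + m] - buf[centre - m])
                grad[row][col0 + k] = acc / h
    return grad


if __name__ == "__main__":
    # u(x, y) = x^2 has exact x-derivative 2x away from the clamped borders.
    img = [[float(x * x) for x in range(40)] for _ in range(2)]
    g = gradient_x(img)
    print(g[0][20])
```

In a GPU version, each tile would typically map to a thread block that fills the buffer cooperatively and synchronizes before the stencil is applied, so every pixel is fetched from global memory once and then reused by up to 2M + 1 neighbouring threads. The sketch only mirrors that data flow and says nothing about performance.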
The optimal stencil size in this work is 4, and it handles typical clinical image sizes.

Fig. 1. Diagram of the CUDA parallel architecture and its memory hierarchy.
Fig. 2. An image is divided into tiles of size N × N with stencil size 2M + 1. Data in the region neighboring each tile are cached in shared memory.
Fig. 5. Halfway profile comparison.
Fig. 6. GPU vs. CPU performance.

Conclusion
High-order DGT can improve image quality in iterative reconstruction from undersampled clinical data. A GPU implementation of high-order DGT using stencil-based shared memory can significantly improve computational performance compared with a conventional implementation.

References
1. X. Cui, H. Yu, G. Wang, and L. Mili, “Total variation minimization-based multimodality medical image reconstruction,” Proc. SPIE 9212, Developments in X-Ray Tomography IX, vol. 9212, p. 11, 2014.
2. B. Fornberg, “Generation of finite difference formulas on arbitrarily spaced grids,” Mathematics of Computation, vol. 51, no. 184, pp. 699–706, 1988.
3. NVIDIA, “NVIDIA CUDA C Programming Guide,” 2016. [Online]. Available:
