Published by Bonnie Moore. Modified about 1 year ago.

1
Improvement of CT Slice Image Reconstruction Speed Using SIMD Technology
Xingxing Wu, Yi Zhang
Instructor: Prof. Yu Hen Hu
Department of Electrical & Computer Engineering, University of Wisconsin, Madison

2
Motivation
- CT slice image reconstruction is a critical step that affects both reconstructed image quality and scanning speed
- CT slice image reconstruction is very time-consuming
- Traditional methods for speedup:
  - Specially designed hardware
  - Parallel algorithms running on supercomputers
- Explore a new method: SIMD implementation

3
Parallel-Beam FBP Image Reconstruction Algorithm
The algorithm consists of three parts:
- Data rebinning
- Data filtering
- Backprojection

4
Parallel-Beam FBP Image Reconstruction Algorithm
[Equations for the following steps are not preserved in this transcript]
- Projection
- Data Rebinning
- Data Filtering
- Data Backprojection

5
CT Slice Image Reconstruction Is Very Time Consuming
- A whole-head spiral scan generates several GB of projection data

6
Function Profiling

7
Can the FBP Algorithm Benefit from SIMD?
The algorithm has the following features:
- Small, highly repetitive loops that operate on sequential arrays of integers and floating-point values
- Frequent multiplies and accumulates
- Computation-intensive algorithms
- Inherently parallel operations
- Wide dynamic range, hence floating-point based
- Regular memory access patterns
- Data-independent control flow

8
Analysis of Data Dynamic Range and Quantization Errors
- Wide Dynamic Range
- Relative Error Metric
- 32-Bit Single-Precision Floating Point and SSE2
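The slide does not define its relative error metric. As a minimal sketch, assuming the metric is |approx - ref| / |ref| with a small floor on the denominator, the C fragment below measures how far single-precision values drift from a double-precision reference. The names `relative_error`, `max_relative_error`, and `floor_mag` are hypothetical and not from the presentation.

```c
#include <assert.h>
#include <stddef.h>

/* Absolute value without libm, so the sketch links without -lm. */
static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* Assumed metric: |approx - ref| / max(|ref|, floor_mag).
   floor_mag guards against division by zero for near-zero references. */
static double relative_error(double ref, double approx, double floor_mag) {
    double denom = abs_d(ref) > floor_mag ? abs_d(ref) : floor_mag;
    return abs_d(approx - ref) / denom;
}

/* Worst-case relative error of a float array against a double reference. */
static double max_relative_error(const double *ref, const float *approx,
                                 size_t n, double floor_mag) {
    assert(floor_mag > 0.0);
    double worst = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double e = relative_error(ref[i], (double)approx[i], floor_mag);
        if (e > worst) worst = e;
    }
    return worst;
}
```

A relative metric is a natural fit for data with a wide dynamic range, because an absolute threshold would be too loose for small values and too strict for large ones.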

9
Updated Algorithm to Fit SIMD
- Update the algorithm to eliminate some conditional branches
- Reduce on-the-fly calculations that are not suitable for SIMD implementation
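One common way to eliminate a conditional branch is to turn the condition into a lane mask and apply it with a bitwise AND. The sketch below is illustrative only: the range test, the function names `gate_scalar` and `gate_sse`, and the use of unaligned loads are assumptions, not the authors' code.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Branching reference: keep v[i] where lo <= t[i] < hi, otherwise 0. */
static void gate_scalar(const float *t, const float *v,
                        float lo, float hi, float *out, int n) {
    for (int i = 0; i < n; ++i)
        out[i] = (t[i] >= lo && t[i] < hi) ? v[i] : 0.0f;
}

/* Branch-free version: four lanes per iteration, n must be a multiple of 4. */
static void gate_sse(const float *t, const float *v,
                     float lo, float hi, float *out, int n) {
    assert(n % 4 == 0);
    const __m128 vlo = _mm_set1_ps(lo);
    const __m128 vhi = _mm_set1_ps(hi);
    for (int i = 0; i < n; i += 4) {
        __m128 tt = _mm_loadu_ps(t + i);
        /* Each comparison yields all-ones bits in lanes where it holds. */
        __m128 mask = _mm_and_ps(_mm_cmpge_ps(tt, vlo), _mm_cmplt_ps(tt, vhi));
        /* AND with the mask keeps v in passing lanes and zeroes the rest. */
        _mm_storeu_ps(out + i, _mm_and_ps(mask, _mm_loadu_ps(v + i)));
    }
}
```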

10
Parallel Implementation of Data Filtering in SIMD
Rebinned data: A0 A1 A2 A3 A4 A5 A6 A7
Weight:        B0 B1 B2 B3 B4 B5 B6 B7
Operation:     element-wise multiply (*), then add the products four lanes apart
Filtered data: A0*B0+A4*B4, A1*B1+A5*B5, A2*B2+A6*B6, A3*B3+A7*B7
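The lane pattern on this slide (A0*B0+A4*B4 and so on) is what a four-lane multiply-accumulate loop produces after two iterations. A minimal sketch with SSE intrinsics follows. The function names `mac4` and `filter_tap_sum` are hypothetical, and the final horizontal sum is an assumption about how the four partial sums would be reduced to a single filtered sample.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Four partial sums of data[i] * weight[i].
   Lane k accumulates indices k, k+4, k+8, ... ; n must be a multiple of 4. */
static __m128 mac4(const float *data, const float *weight, int n) {
    assert(n % 4 == 0);
    __m128 acc = _mm_setzero_ps();
    for (int i = 0; i < n; i += 4) {
        __m128 prod = _mm_mul_ps(_mm_loadu_ps(data + i),
                                 _mm_loadu_ps(weight + i));
        acc = _mm_add_ps(acc, prod);
    }
    return acc;
}

/* Assumed final step: add the four lanes to get one output value. */
static float filter_tap_sum(const float *data, const float *weight, int n) {
    float lanes[4];
    _mm_storeu_ps(lanes, mac4(data, weight, n));
    return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
}
```

With n = 8 the four lanes of `mac4` hold exactly the four expressions listed under "Filtered data" on the slide.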

11
Parallel Implementation of Backprojection in SIMD
Diagram labels: Index Calculation, Index, Ceil(index), Floor(index), Filtered Data, Weight, Reconstructed Image (fetch data)
Register lanes shown: A0 A1 A2 A | B0 B1 B2 B3 | C0 C1 C2 C3 | D0 D1 D2 D3 | E0 E1 E2 E3 | F0 F1 F2 F3 | G0 G1 G2 G3 | H0 H1 H2 H3
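The diagram labels suggest that each pixel's detector index is computed, split into floor and ceil neighbours, and used to fetch and weight two filtered samples. The sketch below shows one way to do that for four pixels at once with SSE2, assuming linear interpolation and non-negative indices. The function name `backproject4` and its parameters are hypothetical. SSE2 has no gather instruction, so the fetch step is done with scalar loads here.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Add one view's contribution to four adjacent pixels.
   idx[k] is the fractional detector index for pixel k.
   Assumes 0 <= idx[k] and floor(idx[k]) + 1 < n_det. */
static void backproject4(const float *filtered, int n_det,
                         const float *idx, float *pixels) {
    __m128 t = _mm_loadu_ps(idx);
    /* Truncation equals floor because the indices are non-negative. */
    __m128i lo = _mm_cvttps_epi32(t);
    /* Fractional part is the weight of the upper neighbour. */
    __m128 frac = _mm_sub_ps(t, _mm_cvtepi32_ps(lo));

    int lo_i[4];
    _mm_storeu_si128((__m128i *)lo_i, lo);

    /* Fetch both neighbours per lane. The upper neighbour is floor + 1;
       for integer indices its weight is 0, so it matches ceil in effect. */
    float f_lo[4], f_hi[4];
    for (int k = 0; k < 4; ++k) {
        assert(lo_i[k] >= 0 && lo_i[k] + 1 < n_det);
        f_lo[k] = filtered[lo_i[k]];
        f_hi[k] = filtered[lo_i[k] + 1];
    }
    __m128 vlo = _mm_loadu_ps(f_lo);
    __m128 vhi = _mm_loadu_ps(f_hi);

    /* Linear interpolation: lo + frac * (hi - lo), then accumulate. */
    __m128 val = _mm_add_ps(vlo, _mm_mul_ps(frac, _mm_sub_ps(vhi, vlo)));
    _mm_storeu_ps(pixels, _mm_add_ps(_mm_loadu_ps(pixels), val));
}
```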

12
Optimization of the Implementation
- Optimize Memory Access
  - Ensure proper alignment to prevent data from being split across cache-line boundaries: data alignment, stack alignment, code alignment
  - Observe store-forwarding constraints
  - Optimize data structure layout and data locality to use the 64-byte cache line efficiently and to reduce the frequency of memory loads and stores
  - Use prefetch and cacheability control instructions appropriately
  - Minimize bus latency by segmenting reads and writes into phases
- Replace Branches with Logic Operations
- Optimize Instruction Scheduling
- Optimize the Parallelism
  - Loop unrolling
  - Break dependence chains
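Loop unrolling and breaking dependence chains often go together: unrolling by two and giving each half its own accumulator lets the two additions proceed independently. The following is a sketch under those assumptions, not the authors' code. The name `dot_unrolled` is hypothetical, and unaligned loads are used so the example works on any buffer; with 16-byte-aligned data, `_mm_load_ps` could be used instead.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Dot product unrolled by two with independent accumulators.
   n must be a multiple of 8. */
static float dot_unrolled(const float *a, const float *b, int n) {
    assert(n % 8 == 0);
    __m128 acc0 = _mm_setzero_ps();
    __m128 acc1 = _mm_setzero_ps();
    for (int i = 0; i < n; i += 8) {
        /* acc0 and acc1 do not depend on each other, so the second
           add need not wait for the first to complete. */
        acc0 = _mm_add_ps(acc0, _mm_mul_ps(_mm_loadu_ps(a + i),
                                           _mm_loadu_ps(b + i)));
        acc1 = _mm_add_ps(acc1, _mm_mul_ps(_mm_loadu_ps(a + i + 4),
                                           _mm_loadu_ps(b + i + 4)));
    }
    /* Combine the two chains once, after the loop. */
    float lanes[4];
    _mm_storeu_ps(lanes, _mm_add_ps(acc0, acc1));
    return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
}
```

Splitting the sum across accumulators changes the order of floating-point additions, so results may differ from a strictly sequential loop in the last bits.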

13
Optimization of the Implementation
- Optimize Instruction Selection
  - Avoid longer-latency instructions
  - Avoid instructions that unnecessarily introduce dependence-related stalls
- Optimize the Floating-Point Performance
  - Avoid exceeding the representable range
  - Avoid changing the floating-point control/status register
  - Enable flush-to-zero (FTZ) and denormals-are-zero (DAZ) modes
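On x86, FTZ and DAZ are bits in the MXCSR register (bit 15 and bit 6 respectively). A minimal sketch of setting them once before the reconstruction loops and restoring the old value afterwards is shown below. The helper names and the constant names `MXCSR_FTZ` and `MXCSR_DAZ` are hypothetical; `_mm_getcsr` and `_mm_setcsr` are the standard SSE intrinsics for reading and writing MXCSR. The sketch assumes a CPU that supports DAZ.

```c
#include <assert.h>
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr */

#define MXCSR_FTZ 0x8000u  /* bit 15: flush denormal results to zero */
#define MXCSR_DAZ 0x0040u  /* bit 6: treat denormal inputs as zero   */

/* Turn on FTZ and DAZ; return the previous MXCSR so it can be restored.
   Intended to be called once, outside the inner loops. */
static unsigned int enable_ftz_daz(void) {
    unsigned int saved = _mm_getcsr();
    _mm_setcsr(saved | MXCSR_FTZ | MXCSR_DAZ);
    return saved;
}

/* Put MXCSR back to the value returned by enable_ftz_daz. */
static void restore_mxcsr(unsigned int saved) {
    _mm_setcsr(saved);
}
```

Setting the register once and leaving it alone is consistent with the slide's advice to avoid changing the control/status register.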

14
Improvement of Performance
- Reconstructed image pixel values from the C implementation and the SIMD implementation differ by less than 0.01
