Richard Sampson* Ming Yang† Siyuan Wei†

1 Sonic Millip3De: Massively Parallel 3D Stacked Accelerator for 3D Ultrasound
Richard Sampson* Ming Yang† Siyuan Wei† Chaitali Chakrabarti† Thomas F. Wenisch* *University of Michigan †Arizona State University

2 Portable Medical Imaging Devices
Medical imaging moving towards portability
  MEDICS (X-Ray CT) [Dasika ’10]
  Handheld 2D Ultrasound [Fuller ’09]
Not just a matter of convenience
  Improved patient health [Gunnarsson ’00, Weinreb ’08]
  Access in developing countries
Why ultrasound?
  Low transmit power [Nelson ’10]
  No dangers or side-effects

3 Handheld 3D Ultrasound
3D has numerous benefits over 2D
  Easier to interpret images
  Greater volumetric accuracy
… as well as many challenges
  12k transducers, 10M image points
    10-20x beyond state of the art
  High raw data bandwidth (6Tb/s)
    Major bottleneck in state of the art
  Tight handheld power budget (5W)

4 Why a Custom Accelerator?
Software algorithms load/store intensive
  von Neumann designs inefficient
Large system would require over 700 DSPs
  General purpose CPUs even less efficient
Architecture         Energy/Scanline (1 fps)   Single Core Time/Scanline
Intel Core i7-2670   25.08 J                   4.46 s
ARM Cortex-A8        33.04 J                   132.18 s
TI C6678 DSP         2.84 J                    2.27 s

5 Contributions
Iterative delay calculation algorithm
  Reduces storage by over 400x
  Enables streaming data flow
Sonic Millip3De design
  Leverages 3D die stacking technology
  Transform-select-reduce accelerator framework
Power and image analysis of Sonic Millip3De
  Negligible change in image quality
  Able to meet 5W power budget by 11nm node

6 Outline
Introduction
Ultrasound background
Algorithm design
System design
  Sonic Millip3De
  Select Sub-Unit
Results and analysis
Conclusions

7 Ultrasound: Transmit and Receive
[Figure labels: Image Space, Focal Points, Transmit Transducer, Receive Transducer, Receive Raw Channel Data, 𝜏]

20 Ultrasound: Transmit and Receive
Each transducer stores array of raw receive data

21 Ultrasound: Image Reconstruction
Image reconstructed from data based on round trip delay

22 Ultrasound: Image Reconstruction
Images from each transducer combined to produce full frame

23 Delay Index Calculation
Iterate through all image points for each transducer and calculate delay index
Often done with lookup tables (LUTs) instead
  50 GB LUT required for target 3D system
[Figure labels: 𝑃, 𝜏]
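
As a rough illustration of the per-point calculation on this slide, the sketch below takes the delay index to be the round-trip path length (transmit source to focal point to receive transducer) divided by the speed of sound and multiplied by the sampling frequency. The 1,540 m/s and 160 MHz values come from the System Parameters slide; the function name, the coordinate convention, and the example geometry are assumptions made only for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 1540.0   # tissue, from the System Parameters slide
SAMPLING_FREQ_HZ = 160e6      # interpolated sampling frequency (fs)


def delay_index(tx, focal_point, rx, fs=SAMPLING_FREQ_HZ, c=SPEED_OF_SOUND_M_S):
    """Return the sample index that matches the round-trip travel time.

    tx, focal_point and rx are (x, y, z) positions in metres. The coordinate
    convention (array in the z = 0 plane, depth along +z) is hypothetical.
    """
    round_trip_m = math.dist(tx, focal_point) + math.dist(focal_point, rx)
    return round(round_trip_m / c * fs)


# Transmit and receive at the origin, focal point 3 cm deep.
origin = (0.0, 0.0, 0.0)
point = (0.0, 0.0, 0.03)
idx_on_axis = delay_index(origin, point, origin)

# A receive transducer 1 cm off axis sees a longer path, so a larger index.
idx_off_axis = delay_index(origin, point, (0.01, 0.0, 0.0))
```

Repeating this for every image point and every transducer is the work that a lookup table would otherwise store.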

24 Challenges of Handheld 3D Ultrasound
Delay index LUT requires too much storage
  New iterative algorithm reduces necessary constant storage by 400x
Peak raw data bandwidth (6Tb/s) infeasible
  Sub-aperture multiplexing reduces peak data rate, but requires more transmits
Handheld power budget very tight (5W)
  3D stacked, highly parallel data streaming design reconstructs images efficiently

25 Iterative Delay Index Calculation
Deltas between adjacent focal points on a scanline form smooth curve
Fit piecewise quadratic approximation to delta function
  Two sections sufficient for negligible error
  One section is off by 40
[Figure labels: Section 1, Section 2]
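
The idea of fitting the delta curve in sections can be sketched as follows. This is not the authors' fitting procedure: it assumes a single receive transducer 1 cm off axis, a central scanline, a plain least-squares fit, and a section boundary at the midpoint of the scanline, none of which the slides specify. It compares the squared fitting error of the deltas for one section against two, and does not attempt to reproduce the error figures quoted on the slide.

```python
import math

C_M_S = 1540.0       # speed of sound in tissue, from the parameters slide
FS_HZ = 160e6        # interpolated sampling frequency, from the parameters slide
N_POINTS = 4096      # focal points per scanline, from the parameters slide
DEPTH_M = 0.06       # image depth, from the parameters slide
RX_OFFSET_M = 0.01   # assumed lateral offset of the receive transducer


def delay_samples(depth_m):
    """Round-trip delay in samples for a point on the central scanline."""
    return (depth_m + math.hypot(depth_m, RX_OFFSET_M)) / C_M_S * FS_HZ


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c via the normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[4], s[3], s[2], t[2]],
         [s[3], s[2], s[1], t[1]],
         [s[2], s[1], s[0], t[0]]]
    for i in range(3):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(i, 3), key=lambda row: abs(m[row][i]))
        m[i], m[p] = m[p], m[i]
        for row in range(3):
            if row != i:
                f = m[row][i] / m[i][i]
                m[row] = [u - f * v for u, v in zip(m[row], m[i])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def squared_error(xs, ys):
    """Sum of squared residuals of the best quadratic through (xs, ys)."""
    a, b, c = fit_quadratic(xs, ys)
    return sum((a * x * x + b * x + c - y) ** 2 for x, y in zip(xs, ys))


depths = [(n + 1) * DEPTH_M / N_POINTS for n in range(N_POINTS)]
delays = [delay_samples(d) for d in depths]
deltas = [later - earlier for earlier, later in zip(delays, delays[1:])]
xs = [n / len(deltas) for n in range(len(deltas))]  # normalised for conditioning

half = len(deltas) // 2  # assumed section boundary
sse_one = squared_error(xs, deltas)
sse_two = (squared_error(xs[:half], deltas[:half])
           + squared_error(xs[half:], deltas[half:]))
```

Because each section is fitted independently, the two-section error can never exceed the one-section error; how much it improves depends on the geometry and on where the boundary is placed.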

26 Sub-aperture Multiplexing
Peak raw data bandwidth (6Tb/s) infeasible
Solution: sub-aperture multiplexing
  Transmit multiple times from same location
  Receive with subset of transducers (sub-aperture)
  Sum images together
Prior work: reduce data rate
Our design: also reduces HW and power requirements

27 System Design

28 System Design
Sonic Millip3De comprises 1,024 parallel pipelines

29 System Design: Transducers
Be sure to mention high power requirement
Interchangeable CMOS transducer layer; can use older process

30 System Design: ADC/Storage
6kB SRAMs
Separate storage layer to reduce wire lengths

31 System Design: Transform-Select-Reduce
Accelerator units in fast, low power process

32 Select Sub-Unit Design
Selects sample closest to each focal point using our algorithm

33 Select Sub-Unit Design
All delays for a scanline estimated using 9 constants
[Figure labels: Section 1, Section 2]

34 Select Sub-Unit Design
$A(n+1)^2 + B(n+1) + C = (An^2 + Bn + C) + 2An + (A + B)$
Adders calculate next iteration of quadratic approximation
[Figure labels: Section 1, Section 2]
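
The identity on this slide means a quadratic can be stepped from n to n+1 with two additions and no multiplications. A minimal sketch, with a hypothetical function name and made-up integer constants (the slides do not describe the number format used by the hardware):

```python
def quadratic_by_adders(a, b, c, count):
    """Yield a*n^2 + b*n + c for n = 0 .. count-1 using only additions."""
    value = c             # q(0)
    step = a + b          # q(1) - q(0), i.e. 2*a*n + (a + b) at n = 0
    second_diff = 2 * a   # constant, computed once up front
    for _ in range(count):
        yield value
        value += step         # first adder: advance to q(n + 1)
        step += second_diff   # second adder: advance the first difference


A, B, C = 3, -2, 7  # illustrative constants only
iterated = list(quadratic_by_adders(A, B, C, 10))
direct = [A * n * n + B * n + C for n in range(10)]
```

Both lists hold the same values; the iterated version never multiplies inside the loop.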

35 Select Sub-Unit Design
Decrementor selects sample for next image focal point
[Figure labels: Section 1, Section 2]

36 Select Sub-Unit Design
Section decrementor indicates when to change constants
[Figure labels: Section 1, Section 2]
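
Putting the last few slides together, the select step can be pictured as a streaming loop: one down-counter skips samples until the next focal point's sample arrives, and a second down-counter tracks how many focal points remain before the next section's constants are loaded. The sketch below is a software analogy under simplifying assumptions, not the hardware design: the `(a, b, c, num_points)` section format, the integer gaps, and the example constants are all hypothetical.

```python
def select_samples(samples, sections):
    """Pick one streamed sample per focal point.

    sections is a list of (a, b, c, num_points) tuples (hypothetical format).
    Within a section, the gap in samples before the n-th selection is
    a*n^2 + b*n + c, advanced incrementally with two additions.
    """
    stream = iter(samples)
    selected = []
    for a, b, c, num_points in sections:
        gap, step = c, a + b                # load this section's constants
        section_counter = num_points        # section decrementor
        while section_counter > 0:
            countdown = max(1, round(gap))  # sample decrementor
            try:
                while countdown > 0:
                    sample = next(stream)
                    countdown -= 1
            except StopIteration:
                return selected             # input stream exhausted
            selected.append(sample)
            gap += step                     # adders advance the quadratic
            step += 2 * a
            section_counter -= 1
    return selected


samples = list(range(100))                 # stand-in for one channel's data
sections = [(0, 0, 2, 3), (0, 1, 3, 3)]    # made-up constants
picked = select_samples(samples, sections)
```

With these constants the first section keeps every second sample and the second section uses gaps of 3, 4 and 5, so the data is consumed strictly in arrival order.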

37 Outline
Introduction
Ultrasound background
Algorithm design
System design
  Sonic Millip3De
  Select Sub-Unit
Results and analysis
Conclusions

38 System Parameters
Parameter                                  Value
Sub-apertures                              12
Transmit Sources                           16
Transmits per Frame                        192
Transducers per Sub-aperture               1,024
Total Transducers                          12,288
Storage per Transducer                     4,096 x 12 bits
Focal Points per Scanline                  4,096
Image Depth                                6 cm
Image Angular Width                        π/4
Sampling Frequency                         40 MHz
Interpolation Factor                       4x
Interpolated Sampling Frequency (fs)       160 MHz
Speed of Sound (tissue)                    1,540 m/s
Target Frame Rate                          1 fps
Standard aperture and parameters for 3D abdominal scans
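
Several figures quoted elsewhere in the deck can be cross-checked from these parameters. The arithmetic below is a sanity check, not something the slides spell out: it assumes the raw data rate is simply transducers × sampling frequency × bits per sample, which lands close to the quoted 6Tb/s.

```python
sub_apertures = 12
transmit_sources = 16
transducers_per_sub_aperture = 1024
samples_per_transducer = 4096
bits_per_sample = 12
sampling_freq_hz = 40_000_000  # pre-interpolation sampling frequency

# Counts listed in the table.
transmits_per_frame = sub_apertures * transmit_sources
total_transducers = sub_apertures * transducers_per_sub_aperture

# Per-transducer storage in bytes (matches the 6kB SRAMs on the storage slide).
sram_bytes = samples_per_transducer * bits_per_sample // 8

# Assumed model of the raw data rate: every transducer streams
# bits_per_sample bits at the sampling frequency.
peak_bits_per_s_all = total_transducers * sampling_freq_hz * bits_per_sample
peak_bits_per_s_sub = (transducers_per_sub_aperture
                       * sampling_freq_hz * bits_per_sample)

print(transmits_per_frame, total_transducers, sram_bytes)
print(peak_bits_per_s_all / 1e12, "Tb/s with all transducers")
print(peak_bits_per_s_sub / 1e9, "Gb/s with one sub-aperture")
```

Under this model, receiving with one 1,024-transducer sub-aperture at a time cuts the peak rate by the number of sub-apertures, at the cost of 12 times as many transmits per frame.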

39 Image Quality Comparison
Simulations using Field II [Jensen ’92, ’95]
[Figure panels: Ideal, Our Design (12 bit), 11 bit]
Explain CNR (standard metric, gives ratio of contrast between cyst and tissue and the noise in those regions)
Bits: Ideal, 14, 13, 12, 11, 10
CNR: 2.972, 2.942, 2.960, 2.536, 2.233
Our design has negligible difference from ideal system

40 Power Analysis and Scaling
Can meet 5W by 11nm node

41 Conclusions
3D die stacked Sonic Millip3De design is able to meet 5W power budget by 11nm
Algorithm/HW co-design enables order-of-magnitude gains
Power and output quality goals often in conflict
  Need guidance from domain experts to balance
Architects have much to offer for application-specific system designs

42 Questions?
Special thanks to: Brian Fowlkes, Oliver Kripfgans, Ron Dreslinski

