1. Customizing Wide-SIMD Architectures for H.264
Sangwon Seo¹, Mark Woh¹, Scott Mahlke¹, Trevor Mudge¹, Vijay Sundaram², Chaitali Chakrabarti²
¹University of Michigan, ²Arizona State University

2. Outline
- Motivation
- H.264 Analysis
- Proposed Architecture
- H.264 Kernel Mappings
- Results
- Conclusion

3. Motivation – Smart Phone
Reference images: http://www.apple.com/iphone/gallery/

4. Motivation – Inside Smart Phone
Reference images: http://idannyb.files.wordpress.com/2008/07/xiuvbfueck3gsdum-large.jpg

5. H.264 Design
- H.264 encoder/decoder reference design
Reference images: I. Richardson, “H.264 and MPEG-4 video compression,” Wiley, 2003.

6. H.264 – Analysis
- H.264 kernel algorithms:
  - Heavy SIMD workload
  - Different natural SIMD widths
  - High and medium thread-level parallelism (TLP)
- Multiple SIMD widths must be supported to maximize SIMD utilization.
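The sketch below illustrates why multiple SIMD widths matter. It is a simplified model that assumes each partition of a wide SIMD unit runs one independent thread and that there is enough TLP to fill every partition. The 64-lane width, the kernel widths, and the function name `lane_utilization` are hypothetical illustration values, not figures taken from the slides.

```python
import math
from typing import Optional

def lane_utilization(simd_width: int, natural_width: int,
                     partition_width: Optional[int] = None) -> float:
    """Fraction of SIMD lanes doing useful work (hypothetical, simplified model).

    natural_width: independent data elements a kernel processes per operation.
    Without partitioning, the whole simd_width machine runs a single thread.
    With partitioning, the machine is split into groups of partition_width
    lanes, each running its own thread; the model assumes enough TLP to keep
    every group busy.
    """
    width = simd_width if partition_width is None else partition_width
    if simd_width % width != 0:
        raise ValueError("partition width must divide the SIMD width")
    chunks = math.ceil(natural_width / width)  # issues needed per operation
    return natural_width / (chunks * width)

# Hypothetical example: a 64-lane machine running a kernel of natural width 16.
print(lane_utilization(64, 16))      # 0.25: one thread, 48 of 64 lanes idle
print(lane_utilization(64, 16, 16))  # 1.0: four 16-lane partitions, four threads
```

Under these assumptions, a narrow kernel leaves most lanes of an unpartitioned wide machine idle. Splitting the machine into narrower groups recovers that utilization, provided enough threads are available.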

7. H.264 – Analysis
- Example: deblocking filter (figure panels: Horizontal Filtering, Vertical Filtering)
- Multimedia algorithms operate on two-dimensional data.
- Row-order or column-order memory access works well for one set of edges, but not for the other.
- A diagonal memory bank system lets blocks along either a row or a column be accessed in parallel.
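A common way to build a diagonal memory organization is skewed storage: element (r, c) of an N×N tile is placed in bank (r + c) mod N. The sketch assumes that mapping and a hypothetical bank count of 4; the slides do not specify the exact mapping used in the proposed design. It checks which access orders hit distinct banks under a plain row-major interleaving (bank = c) and under the diagonal one.

```python
N = 4  # hypothetical number of banks (and tile dimension)

def bank_row_major(r: int, c: int, n: int = N) -> int:
    """Conventional interleaving: the bank is chosen by column index only."""
    return c % n

def bank_diagonal(r: int, c: int, n: int = N) -> int:
    """Skewed (diagonal) interleaving: each row is rotated by one bank."""
    return (r + c) % n

def conflict_free(bank_of, coords) -> bool:
    """True if every address in one parallel access maps to a distinct bank."""
    banks = [bank_of(r, c) for r, c in coords]
    return len(set(banks)) == len(banks)

rows = [[(r, c) for c in range(N)] for r in range(N)]  # row-order accesses
cols = [[(r, c) for r in range(N)] for c in range(N)]  # column-order accesses

for name, mapping in [("row-major", bank_row_major), ("diagonal", bank_diagonal)]:
    row_ok = all(conflict_free(mapping, a) for a in rows)
    col_ok = all(conflict_free(mapping, a) for a in cols)
    print(f"{name}: rows conflict-free={row_ok}, columns conflict-free={col_ok}")
```

With the row-major mapping, a whole column lands in a single bank, so only row accesses are conflict-free. With the diagonal mapping, both rows and columns spread across all banks. This matches the horizontal/vertical filtering requirement above.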

8. H.264 – Analysis
- Subgraphs for the inner loops of two kernel algorithms:
  - Large amount of data locality
  - High register file (RF) power consumption from reads and writes
- Calls for bypass and temporary buffer support
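The RF traffic that bypass and temporary buffers can remove is easiest to see by counting accesses in a small dataflow chain. The model below is hypothetical and is not the paper's compiler policy. A value that is used exactly once, by the immediately following instruction, and that is not live-out is treated as short-lived and forwarded through a buffer; every other value goes through the RF. The instruction format, the names, and the example chain are assumptions for illustration.

```python
from collections import Counter

# Hypothetical straight-line program in SSA form: (destination, [sources]).
# Models one lane of |a - b| accumulated into a running SAD total.
PROGRAM = [
    ("d", ["a", "b"]),        # d = a - b
    ("e", ["d"]),             # e = |d|
    ("acc1", ["acc0", "e"]),  # acc1 = acc0 + e
]
LIVE_OUT = {"acc1"}

def rf_accesses(program, live_out, use_bypass):
    """Return (RF reads, RF writes) for the program.

    Assumed policy: with use_bypass, a value used exactly once, by the
    immediately following instruction, and not live-out, is forwarded
    through a bypass/temporary buffer and never touches the RF.
    """
    uses = Counter(s for _, srcs in program for s in srcs)
    buffered = set()
    if use_bypass:
        for i, (dst, _) in enumerate(program[:-1]):
            if uses[dst] == 1 and dst in program[i + 1][1] and dst not in live_out:
                buffered.add(dst)
    reads = sum(1 for _, srcs in program for s in srcs if s not in buffered)
    writes = sum(1 for dst, _ in program if dst not in buffered)
    return reads, writes

baseline = rf_accesses(PROGRAM, LIVE_OUT, use_bypass=False)
bypassed = rf_accesses(PROGRAM, LIVE_OUT, use_bypass=True)
print(baseline, bypassed)
```

For this three-instruction chain, the model counts 5 RF reads and 3 RF writes without bypassing. With the two short-lived intermediates kept in buffers, the count drops to 3 reads and 1 write.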

9. H.264 – Analysis
- Instruction pairs:
  - Heavy usage of shuffle and arithmetic operations
  - Add-Shift: rounding operation
  - Sub-Abs: SAD (sum of absolute differences) operation
- Frequently used instruction pairs should be fused.
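The per-lane semantics of the two fused pairs can be sketched with Python lists standing in for SIMD registers. The function names and the extra `bias` operand are hypothetical; the slides do not define the operand format of the fused instructions. With `bias=1` and `shift=1`, Add-Shift gives the rounding average (a + b + 1) >> 1, the form H.264 uses, for example, when averaging samples for quarter-sample interpolation.

```python
def add_shift(a, b, shift, bias=0):
    """Hypothetical fused Add-Shift, per lane: (a + b + bias) >> shift."""
    return [(x + y + bias) >> shift for x, y in zip(a, b)]

def sub_abs(a, b):
    """Hypothetical fused Sub-Abs, per lane: |a - b|."""
    return [abs(x - y) for x, y in zip(a, b)]

def sad(block, ref):
    """Sum of absolute differences: fused Sub-Abs followed by a reduction."""
    return sum(sub_abs(block, ref))

print(add_shift([10, 11, 255], [11, 11, 254], shift=1, bias=1))  # rounding average
print(sad([10, 20, 30, 40], [12, 18, 30, 45]))
```

Each pair replaces two dependent instructions with one. The intermediate (the sum before the shift, or the difference before the absolute value) never has to be written back, which ties in with the short-lived values discussed on the previous slide.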

10. H.264 – Analysis
- Permutation patterns for intra prediction:
  - Fixed set of shuffle patterns
- Need for a programmable shuffle network
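A programmable shuffle network can be modeled as a gather driven by an index vector: output lane i takes input lane idx[i]. The sketch builds 16-lane predictions for a 4×4 block using two H.264 intra 4×4 modes: vertical copies the top neighbors down every row, and horizontal copies each left neighbor across its row. The source-vector layout (top neighbors in lanes 0 to 3, left neighbors in lanes 4 to 7) and the pattern table are assumptions for illustration, not the paper's exact patterns.

```python
def shuffle(src, idx):
    """Model of a programmable shuffle: output lane i = src[idx[i]]."""
    return [src[i] for i in idx]

# Hypothetical source layout: lanes 0-3 = top neighbors A..D,
# lanes 4-7 = left neighbors I..L.
TOP, LEFT = 0, 4

PATTERNS = {
    # Vertical mode: row r, column c takes top neighbor c.
    "vertical": [TOP + c for r in range(4) for c in range(4)],
    # Horizontal mode: row r, column c takes left neighbor r.
    "horizontal": [LEFT + r for r in range(4) for c in range(4)],
}

src = [10, 20, 30, 40, 1, 2, 3, 4]  # A, B, C, D, I, J, K, L
pred_v = shuffle(src, PATTERNS["vertical"])
pred_h = shuffle(src, PATTERNS["horizontal"])
print(pred_v)
print(pred_h)
```

Each prediction mode reduces to a fixed index vector. A small table of such vectors is therefore enough to program the network, which is the point of the slide.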

11. Modified SIMD architecture

12. Modified SIMD architecture
- Multiple SIMD widths
- Thread-level parallelism

13. Modified SIMD architecture
- Diagonal memory organization: memory bank system + shuffle network

14. Modified SIMD architecture
- Short-lived values stored in temporary buffers

15. Modified SIMD architecture
- Short-lived values
- Fused operation

16. Modified SIMD architecture
- Shuffle networks are placed at several points to align data

17. Mapping of H.264 Kernels
- Intra prediction

18. Results
- System breakdown
- H.264 CIF video at 30 fps
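The macroblock rate behind the CIF at 30 fps workload follows from the frame geometry. CIF is 352×288 luma samples, and an H.264 macroblock covers 16×16 luma samples. The sketch only does this arithmetic and makes no claim about the cycle counts in the system breakdown. The clock rate passed to `cycles_per_mb` is a hypothetical input, not a figure from the slides.

```python
CIF_WIDTH, CIF_HEIGHT = 352, 288  # luma samples
MB_SIZE = 16                      # H.264 macroblock: 16x16 luma samples
FPS = 30

mbs_per_frame = (CIF_WIDTH // MB_SIZE) * (CIF_HEIGHT // MB_SIZE)  # 22 x 18
mbs_per_second = mbs_per_frame * FPS
luma_samples_per_second = CIF_WIDTH * CIF_HEIGHT * FPS

def cycles_per_mb(clock_hz: float) -> float:
    """Cycle budget per macroblock at a given (hypothetical) clock rate."""
    return clock_hz / mbs_per_second

print(mbs_per_frame, mbs_per_second, luma_samples_per_second)
```

This gives 396 macroblocks per frame and 11,880 macroblocks per second. At a hypothetical 300 MHz clock, that leaves roughly 25,000 cycles per macroblock for the whole encoder or decoder.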

19. Results
- Speedup breakdown
- 2.13x performance increase on average

20. Results
- Energy-delay product comparison
- 29% energy-delay improvement on average
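The energy-delay product (EDP) multiplies energy by execution time, so a design can improve it through lower energy, shorter delay, or both. The slides do not give the exact formula. Assuming the improvement is measured relative to the baseline EDP, a 29% improvement works out as follows:

```latex
\mathrm{EDP} = E \cdot D, \qquad
1 - \frac{E_{\text{new}}\, D_{\text{new}}}{E_{\text{base}}\, D_{\text{base}}} = 0.29
\;\Longrightarrow\;
\frac{\mathrm{EDP}_{\text{new}}}{\mathrm{EDP}_{\text{base}}} = 0.71
```

This is not the same as a 29% energy saving. Many different combinations of energy and delay produce the same EDP ratio.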

21. Results
- Comparison with the latest H.264 encoders
[17] T. C. Chen et al., “2.8 to 62.7 mW low-power and power-aware H.264 encoder for mobile applications,” 2007 IEEE Symposium on VLSI Circuits, pp. 222–223, June 2007.
[18] M. Bhatnagar, “TMS320DM6446/3 Power Consumption Summary,” Texas Instruments Application Reports, http://focus.ti.com/lit/an/spraad6a/spraad6a.pdf, Feb. 2008.

22. Conclusion
- Key architectural enhancements:
  - SIMD partitioning
  - Diagonal memory bank system
  - Bypass and temporary buffer support
  - Fused operation support
  - Programmable crossbar
- Future work:
  - Image processing algorithms on SIMD architectures

23. Backup Slides

24. H.264 – Analysis
- Diagonal memory organization:
  - Multimedia algorithms operate on two-dimensional data.
  - Blocks along a row or a column need to be accessed efficiently.

25. Mapping of H.264 Kernels
- Deblocking filter
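The sketch shows the per-line arithmetic that a deblocking-filter mapping vectorizes, for the samples p1, p0 | q0, q1 on either side of an edge. It follows the H.264 normal-filter (bS < 4) luma equations for p0 and q0 only. The alpha and beta thresholds come from standard lookup tables, and the clipping bound tc is derived from a table value plus neighboring samples. Here all three are passed in directly as plain arguments. The optional p1/q1 updates and the strong (bS = 4) filter are omitted, and the function name is hypothetical.

```python
def clip3(lo: int, hi: int, x: int) -> int:
    """Clamp x to the range [lo, hi]."""
    return max(lo, min(hi, x))

def filter_edge_line(p1, p0, q0, q1, alpha, beta, tc, bit_depth=8):
    """Return filtered (p0, q0) for one line of samples across an edge.

    Simplified sketch of H.264 bS < 4 luma filtering of p0/q0 only;
    alpha, beta and tc are supplied by the caller (table lookups omitted).
    """
    # Filter only when the step across the edge looks like a blocking
    # artifact rather than a genuine image edge.
    if not (abs(p0 - q0) < alpha and abs(p1 - p0) < beta and abs(q1 - q0) < beta):
        return p0, q0
    delta = clip3(-tc, tc, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3)
    max_val = (1 << bit_depth) - 1
    return clip3(0, max_val, p0 + delta), clip3(0, max_val, q0 - delta)

print(filter_edge_line(60, 62, 70, 72, alpha=20, beta=10, tc=4))
```

The same decision and update are applied to every line along an edge. Horizontal and vertical edges differ only in whether those lines are rows or columns of the block, which is where the diagonal memory organization from the analysis slides comes in.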

26. Mapping of H.264 Kernels
- Motion compensation
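In H.264 motion compensation, luma half-sample positions are computed with the 6-tap filter (1, -5, 20, 20, -5, 1). The result is then rounded with (x + 16) >> 5 and clipped to the sample range, which is an instance of the Add-Shift pairing from the analysis slides. The sketch computes one row of horizontal half-sample values for 8-bit video. Border padding and quarter-sample averaging are omitted, and the function name is hypothetical.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # H.264 luma half-sample filter

def half_pel_row(row):
    """Horizontal half-sample values between row[i] and row[i + 1] (8-bit).

    Needs 2 samples of context on the left and 3 on the right, so the
    output has len(row) - 5 entries (no border padding in this sketch).
    """
    out = []
    for i in range(2, len(row) - 3):
        window = row[i - 2:i + 4]  # six integer samples around the half position
        acc = sum(t * s for t, s in zip(TAPS, window))
        out.append(max(0, min(255, (acc + 16) >> 5)))  # round, shift, clip
    return out

print(half_pel_row([0, 0, 0, 255, 255, 255]))
```

A flat row passes through unchanged, a step edge yields the midpoint, and the clip handles the overshoot that the negative taps can produce. All lanes compute the same multiply-accumulate pattern over shifted windows, which suits a SIMD datapath with shuffle support for alignment.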

27. Mapping of H.264 Kernels
- Motion estimation
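Motion estimation is dominated by SAD evaluations over candidate positions. The sketch is an exhaustive (full) search over a small window for one block, meant as a reference for what a mapped kernel computes. Each candidate costs one Sub-Abs pass plus a reduction. The block size, search range, and function names are hypothetical, and practical encoders usually use faster search patterns than full search.

```python
def sad_at(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in cur and the block at (bx+dx, by+dy) in ref."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )

def full_search(cur, ref, bx, by, n, search_range):
    """Return (best_sad, (dx, dy)) over in-bounds displacements within +/- search_range."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose block would fall outside the reference frame.
            if not (0 <= by + dy and by + dy + n <= h and 0 <= bx + dx and bx + dx + n <= w):
                continue
            cost = sad_at(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best

# Hypothetical test frames: cur is ref shifted so the true motion is (2, 1).
ref = [[y * 8 + x for x in range(8)] for y in range(8)]
cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]
print(full_search(cur, ref, bx=2, by=2, n=2, search_range=2))
```

The candidate SADs are independent of one another, so they can be spread across SIMD lanes or threads. This is one source of the thread-level parallelism noted in the analysis slides.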

