Evaluation of Data-Parallel Splitting Approaches for H.264 Decoding

1 Evaluation of Data-Parallel Splitting Approaches for H.264 Decoding
Florian H. Seitner, Michael Bleyer, Ralf M. Schreier, Margrit Gelautz
International Conference on Advances in Mobile Computing & Multimedia (MoMM 2008)

2 Outline
Introduction; Parallel H.264 Decoding; Evaluated Methods; Experimental Results; Conclusions

3 Introduction
The H.264 video standard is currently used in a wide range of video-related areas, such as video content distribution and television broadcasting. Its high coding efficiency comes from tools such as quarter-pixel (qpel) motion estimation, variable block sizes, and multiple reference frames, which significantly increase CPU and memory loads.

4 Introduction
Multi-core systems can be used to increase system performance. This raises the question of how to distribute the H.264 decoding algorithm among multiple processing units. The decoding load should be distributed equally, and the partitioning must deal with data dependencies, inter-core communication, and synchronization.

5 Introduction
The aim of this work is to evaluate the behavior of different parallel decoding approaches in terms of run-time complexity, efficiency of core usage, and data transfers.

6 Parallel H.264 Decoding: Functional and Data-Parallel Splitting
In a functionally partitioned decoding system, decoding tasks are assigned to individual processing cores, so each processing unit can be optimized for a certain task. Its drawbacks are an unequal workload distribution and a high transfer rate for inter-core communication.

7 Parallel H.264 Decoding: Functional and Data-Parallel Splitting
In a data-parallel decoding system, macroblocks (MBs) are distributed among multiple processing units. Data dependencies between different cores must be minimized, and the distribution of MBs onto the processing cores must achieve an equal workload balance.

8 Parallel H.264 Decoding: The H.264 Decoder
[Figure: the H.264 decoding process, split into a parser (stream parsing, entropy decoder) that reads the encoded bitstream and a reconstructor (inverse quantization, inverse DCT, spatial prediction, motion compensation with reference frames, deblocking); the reconstructor is the part marked for data-parallel processing.]

9 Parallel H.264 Decoding: Macroblock Dependencies
Data-parallel splitting of the decoder's reconstructor module is challenging due to spatial dependencies (intra prediction, deblocking) and temporal dependencies (inter prediction).
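To make the spatial dependencies concrete, the sketch below lists the neighboring macroblocks that an MB at column x, row y must wait for. It assumes the usual H.264 neighborhood (intra prediction may read the left, top-left, top, and top-right MBs; deblocking filters the left and top edges) and does not model temporal dependencies on reference frames. The function name `spatial_deps` is hypothetical.

```python
def spatial_deps(x: int, y: int, width: int) -> list[tuple[int, int]]:
    """Return the in-frame neighbors (x, y) that MB (x, y) depends on.

    Assumed neighborhood: left, top-left, top, top-right. Temporal
    dependencies on reference frames are outside this model.
    """
    candidates = [
        (x - 1, y),      # left: intra prediction + vertical-edge deblocking
        (x - 1, y - 1),  # top-left: intra prediction
        (x, y - 1),      # top: intra prediction + horizontal-edge deblocking
        (x + 1, y - 1),  # top-right: intra prediction
    ]
    # Drop neighbors that fall outside the frame.
    return [(cx, cy) for cx, cy in candidates if 0 <= cx < width and cy >= 0]


if __name__ == "__main__":
    print(spatial_deps(0, 0, 11))  # first MB of the frame: []
    print(spatial_deps(5, 2, 11))  # interior MB: [(4, 2), (4, 1), (5, 1), (6, 1)]
```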

10 Evaluated Methods: Overview
The study compares the performance of five approaches to data-parallel splitting of the decoder's reconstructor module: the single row approach, the multi-column approach, the blocking slice-parallel method, the non-blocking slice-parallel method, and the diagonal approach.

11 Evaluated Methods: Single Row Approach
[Figure: assignment of MBs to processors for 2, 4, and 8 cores.]
With N the number of processors, processor i (i = 0, 1, …, N − 1) decodes the y-th row of MBs if (y mod N) = i.
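The single-row rule can be written down directly; the sketch below builds the full MB-to-core map for a frame. The helper names `sr_owner` and `assignment_map` are hypothetical, and the 6 × 5 MB frame in the usage example is an arbitrary toy size.

```python
def sr_owner(x: int, y: int, n_cores: int) -> int:
    """Single-row (SR) assignment: MB row y goes to core y mod N.

    The column x is unused; it is kept so all assignment rules share
    the same (x, y) signature.
    """
    return y % n_cores


def assignment_map(width: int, height: int, n_cores: int) -> list[list[int]]:
    """Core index of every MB, listed row by row."""
    return [[sr_owner(x, y, n_cores) for x in range(width)] for y in range(height)]


if __name__ == "__main__":
    for row in assignment_map(width=6, height=5, n_cores=2):
        print(" ".join(map(str, row)))
```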

12 Evaluated Methods: Single Row Approach
An example of the SR approach with 2 cores, assuming that processing one macroblock takes a constant 1 unit of time.
[Figure: decoding progress at T = 2, 3, 8, 10, and 34.]
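The timing example can be explored with a small unit-time simulator. This is a sketch under assumptions not stated on the slides: each core decodes its MBs in raster order, an MB waits for its left, top-left, top, and top-right neighbors, and a stall cycle is a time step in which a core still has work but its next MB is not ready. It is not meant to reproduce the exact T values of the figure, whose frame size is not given; `simulate` and `spatial_deps` are hypothetical names.

```python
from math import inf
from typing import Callable


def spatial_deps(x: int, y: int, width: int) -> list[tuple[int, int]]:
    """Assumed H.264 spatial neighbors: left, top-left, top, top-right."""
    cands = [(x - 1, y), (x - 1, y - 1), (x, y - 1), (x + 1, y - 1)]
    return [(cx, cy) for cx, cy in cands if 0 <= cx < width and cy >= 0]


def simulate(width: int, height: int,
             owner: Callable[[int, int], int], n_cores: int) -> tuple[int, int]:
    """Unit-time decoding schedule; returns (total_time, stall_cycles)."""
    # Each core receives its MBs in raster order.
    queues: list[list[tuple[int, int]]] = [[] for _ in range(n_cores)]
    for y in range(height):
        for x in range(width):
            queues[owner(x, y)].append((x, y))

    finish: dict[tuple[int, int], int] = {}  # MB -> time at which it is finished
    head = [0] * n_cores                     # index of the next MB per core
    remaining = width * height
    t = stalls = 0
    while remaining:
        for c in range(n_cores):
            if head[c] == len(queues[c]):
                continue  # core has no work left: idle, not stalled
            mb = queues[c][head[c]]
            # Ready only if every neighbor finished by time t. An MB started
            # in this same step finishes at t + 1, so it does not count yet.
            if all(finish.get(d, inf) <= t for d in spatial_deps(*mb, width)):
                finish[mb] = t + 1
                head[c] += 1
                remaining -= 1
            else:
                stalls += 1  # core waits on a data dependency
        t += 1
    return t, stalls


if __name__ == "__main__":
    w, h = 11, 9  # arbitrary toy frame, not the one on the slides
    print("1 core     :", simulate(w, h, lambda x, y: 0, 1))
    print("SR, 2 cores:", simulate(w, h, lambda x, y: y % 2, 2))
```

Because every MB depends only on MBs earlier in raster order, the earliest undecoded MB is always at the head of its core's queue and ready, so the loop makes progress for any assignment passed as `owner`.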

13 Evaluated Methods: Single Row Approach
Advantages: simplicity and only a small start delay. Disadvantage: many dependencies across the borders between processor assignments.

14 Evaluated Methods: Multi-column Approach
[Figure: assignment of MBs to processors for 2, 4, and 8 cores.]
With w the width of a multi-column, processor i (i = 0, 1, …, N − 1) decodes the MBs of the x-th column if iw ≤ x < (i + 1)w.
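The multi-column rule can be sketched as follows, assuming w = ⌈W/N⌉ for a frame W MBs wide (the slides do not say how frame widths that are not multiples of N are handled, so the last column may simply be narrower). `mc_owner` is a hypothetical name.

```python
def mc_owner(x: int, width: int, n_cores: int) -> int:
    """Multi-column (MC) assignment: core i decodes columns iw <= x < (i+1)w.

    Assumes w = ceil(width / N); every x in [0, width) then maps to a
    core index in [0, N).
    """
    w = -(-width // n_cores)  # ceiling division without floats
    return x // w


if __name__ == "__main__":
    # Toy frame 11 MBs wide on 4 cores: columns of width 3, last one of width 2.
    print([mc_owner(x, 11, 4) for x in range(11)])
```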

15 Evaluated Methods: Multi-column Approach
An example of the MC approach with 2 cores.
[Figure: decoding progress at T = 4, 5, 8, and 36.]
Advantage: fewer dependencies across processors; a processor has to wait for another processor's results only at the column boundaries.
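One way to quantify the difference between SR and MC is to count the dependency edges whose source MB is decoded on a different core. The sketch assumes the left, top-left, top, and top-right neighborhood and a toy 8 × 4 MB frame on 2 cores; all helper names are hypothetical.

```python
from typing import Callable

Owner = Callable[[int, int], int]  # maps MB (x, y) to a core index


def spatial_deps(x: int, y: int, width: int) -> list[tuple[int, int]]:
    """Assumed neighbors: left, top-left, top, top-right."""
    cands = [(x - 1, y), (x - 1, y - 1), (x, y - 1), (x + 1, y - 1)]
    return [(cx, cy) for cx, cy in cands if 0 <= cx < width and cy >= 0]


def cross_core_deps(width: int, height: int, owner: Owner) -> int:
    """Count dependencies whose source MB is decoded on a different core."""
    return sum(
        owner(*d) != owner(x, y)
        for y in range(height)
        for x in range(width)
        for d in spatial_deps(x, y, width)
    )


def sr(n: int) -> Owner:
    """Single-row assignment: row y -> core y mod n."""
    return lambda x, y: y % n


def mc(width: int, n: int) -> Owner:
    """Multi-column assignment with columns of width ceil(width / n)."""
    w = -(-width // n)
    return lambda x, y: x // w


if __name__ == "__main__":
    print("SR:", cross_core_deps(8, 4, sr(2)))
    print("MC:", cross_core_deps(8, 4, mc(8, 2)))
```

For this toy frame, SR yields 66 cross-core dependencies and MC only 10, because in MC they occur only along the single column boundary.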

16 Evaluated Methods: Slice-parallel Approach
[Figure: assignment of MBs to processors for 2, 4, and 8 cores.]
With h the height of a slice in MB rows, processor i (i = 0, 1, …, N − 1) decodes the MBs of the y-th row if ih ≤ y < (i + 1)h.
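The slice-parallel rule is the row-wise counterpart of MC. A sketch, assuming slices of h = ⌈H/N⌉ MB rows for a frame H MBs high (`sp_owner` is a hypothetical name):

```python
def sp_owner(y: int, height: int, n_cores: int) -> int:
    """Slice-parallel (SP) assignment: core i decodes rows ih <= y < (i+1)h.

    Assumes h = ceil(height / N) MB rows per slice, so the last slice
    may be shorter when the height is not a multiple of N.
    """
    h = -(-height // n_cores)  # ceiling division
    return y // h


if __name__ == "__main__":
    # Toy frame 9 MB rows high on 2 cores: slices of 5 and 4 rows.
    print([sp_owner(y, 9, 2) for y in range(9)])
```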

17 Evaluated Methods: Slice-parallel Approach
An example of the SP approach in the blocking version with 2 cores.
[Figure: decoding progress at T = 26, 32, and 58.]
Disadvantage: a long start delay, during which CPUs sit idle, resulting in poor core usage.
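Under a unit-time model (1 time unit per MB, dependencies on the left, top-left, top, and top-right neighbors, frame width W ≥ 2, and N slices of exactly h rows each), the start delay of the blocking version can be worked out by hand: core i can only start once core i − 1 has finished the first two MBs of its last row, i.e. (h − 1)W + 2 time units after core i − 1 started, and from then on it runs without stalling. The hypothetical function below is a sketch of that arithmetic, not the authors' measurement.

```python
def blocking_sp_schedule(width: int, h: int, n_cores: int) -> tuple[list[int], int]:
    """Start time of each core and total decoding time for blocking SP.

    Assumptions: 1 time unit per MB, left/top-left/top/top-right
    dependencies, width >= 2, and N slices of exactly h MB rows.
    """
    # Core i waits for the first two MBs of core i-1's last slice row.
    delay = (h - 1) * width + 2
    starts = [i * delay for i in range(n_cores)]
    # The last core starts last and then decodes h * width MBs without stalls.
    total = starts[-1] + h * width
    return starts, total


if __name__ == "__main__":
    starts, total = blocking_sp_schedule(width=10, h=3, n_cores=4)
    print("core start times:", starts, "total:", total)
```

For example, with W = 10, h = 3, and 4 cores, the last core starts at T = 66 and the frame finishes at T = 96, against 120 units on a single core: a speed-up of only 1.25.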

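18 Evaluated Methods: Slice-parallel Approach
An example of the SP approach in the non-blocking version (NBSP) with 2 cores.
[Figure: decoding progress at T = 1 and 32.]
No dependencies are considered across slice boundaries, so the slices are decoded completely independently. However, NBSP requires full control over the encoder, since the bitstream must be encoded with such independent slices.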
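In the non-blocking version, the dependency set of an MB is simply cut at the slice boundary. In H.264, intra prediction does not cross slice boundaries, and the encoder can switch off deblocking across them, which is presumably why NBSP depends on encoder control. A sketch with the hypothetical helper `slice_local_deps` and slices of h MB rows:

```python
def slice_local_deps(x: int, y: int, width: int, h: int) -> list[tuple[int, int]]:
    """Neighbors of MB (x, y) that lie in the same slice of h MB rows.

    Assumed neighborhood: left, top-left, top, top-right. Neighbors
    above the first row of the MB's slice are dropped, modeling a
    bitstream encoded with fully independent slices.
    """
    first_row = (y // h) * h  # first MB row of this MB's slice
    cands = [(x - 1, y), (x - 1, y - 1), (x, y - 1), (x + 1, y - 1)]
    return [(cx, cy) for cx, cy in cands if 0 <= cx < width and cy >= first_row]


if __name__ == "__main__":
    print(slice_local_deps(2, 3, 8, 3))  # first row of slice 1: left neighbor only
    print(slice_local_deps(2, 4, 8, 3))  # second row of slice 1: full neighborhood
```

Because the first row of every slice keeps only its left neighbor, all N cores can start at T = 0 and, with equal slices, finish together after hW time units.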

19 Evaluated Methods: Diagonal Approach
The first line of MBs is divided into equally sized columns, one per processor. The assignment for each subsequent line is derived by left-shifting the MB assignment of the line above.
[Figure: assignment of MBs to processors for 2, 4, and 8 cores.]
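A sketch of the diagonal assignment. The slides only say that each line is left-shifted relative to the line above; this sketch assumes a cyclic shift of one MB per line and columns of width w = ⌈W/N⌉ (`dg_owner` is a hypothetical name).

```python
def dg_owner(x: int, y: int, width: int, n_cores: int) -> int:
    """Diagonal (DG) assignment under assumed parameters.

    Row 0 is split into N columns of width w = ceil(width / N). Each
    following row is the row above cyclically shifted left by one MB,
    so row y uses the row-0 owner of position (x + y) mod width.
    """
    w = -(-width // n_cores)  # ceiling division
    return ((x + y) % width) // w


if __name__ == "__main__":
    for y in range(4):
        print(" ".join(str(dg_owner(x, y, 8, 2)) for x in range(8)))
```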

20 Evaluated Methods: Diagonal Approach
An example of the DG approach.
[Figure: decoding progress at T = 4, 10, 12, 13, 16, 18, 20, 23, 24, and 43.]

21 Evaluated Methods: Diagonal Approach
A comparison of the inter-processor dependencies introduced by the DG and MC approaches. Diagonal approach: dependencies for CPU 2 originate solely from MBs assigned to CPU 1. Multi-column approach: MBs assigned to CPU 2 also depend on CPU 3.
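The dependency structure compared on this slide can be checked by listing, for every core, the other cores it waits for. The sketch assumes the left, top-left, top, and top-right neighborhood, a cyclic one-MB left shift per line for DG, and a toy 8 × 4 MB frame on 4 cores, with cores numbered from 0; all helper names are hypothetical.

```python
from typing import Callable

Owner = Callable[[int, int], int]  # maps MB (x, y) to a core index


def spatial_deps(x: int, y: int, width: int) -> list[tuple[int, int]]:
    """Assumed neighbors: left, top-left, top, top-right."""
    cands = [(x - 1, y), (x - 1, y - 1), (x, y - 1), (x + 1, y - 1)]
    return [(cx, cy) for cx, cy in cands if 0 <= cx < width and cy >= 0]


def core_dependencies(width: int, height: int, n_cores: int,
                      owner: Owner) -> dict[int, set[int]]:
    """For each core, the set of other cores whose MBs it must wait for."""
    waits: dict[int, set[int]] = {c: set() for c in range(n_cores)}
    for y in range(height):
        for x in range(width):
            me = owner(x, y)
            for d in spatial_deps(x, y, width):
                src = owner(*d)
                if src != me:
                    waits[me].add(src)
    return waits


def mc(width: int, n: int) -> Owner:
    """Multi-column assignment with columns of width ceil(width / n)."""
    w = -(-width // n)
    return lambda x, y: x // w


def dg(width: int, n: int) -> Owner:
    """Diagonal assignment: row y is row 0 cyclically shifted left by y MBs."""
    w = -(-width // n)
    return lambda x, y: ((x + y) % width) // w


if __name__ == "__main__":
    for name, own in (("MC", mc(8, 4)), ("DG", dg(8, 4))):
        print(name, core_dependencies(8, 4, 4, own))
```

With this setup, MC makes core 2 depend on cores 1 and 3, whereas DG makes core 2 depend on core 1 only. Because of the assumed wrap-around, the DG sketch also makes core 0 depend on core 3; the slides do not show how the frame edge is handled.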

22 Experimental Results: Overview
[Table: test sequences.]
Encoding parameters: GOP size = 14, search range = ±16 pixels, 5 reference frames.

23 Experimental Results: Run-time Complexity
Two major indicators of the efficiency of a multi-core decoding system are used. The first is the decoder's run-time: a low run-time indicates high decoding performance. The second is the number of data-dependency stalls occurring during decoding, which estimates how efficiently the system's computational resources are used.

24 Experimental Results: Run-time Complexity
[Figure: speed-up in run-time, i.e. the speed increase of each parallelization approach in multiples of the single-core performance.]
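Speed-up here is the usual ratio of single-core to N-core run-time; dividing it by N gives the parallel efficiency, a derived measure that is not reported on the slides:

```latex
S_N = \frac{T_1}{T_N}, \qquad E_N = \frac{S_N}{N}
```

For example, a frame decoded in T_1 = 8 time units on one core and T_2 = 6 on two cores gives S_2 = 8/6 ≈ 1.33 and E_2 ≈ 0.67.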

25 Experimental Results: Run-time Complexity
[Figure: stall cycles caused by data dependencies between the cores.]

26 Experimental Results: Inter-communication
Memory transfers to and from the external DRAM and between the cores' local memories are expensive in terms of power consumption and transfer time. The relevant transfers are core inter-communication and the loading of reference data and deblocking pixels.

27 Experimental Results: Inter-communication
[Figure: data transfer volume for reference data and deblocking information.]

28 Conclusions
In this study, we have evaluated five data-parallel approaches for the H.264 decoder. The run-time of each parallelization approach is influenced by the sizes and shapes of the frame partitions. Large, dependency-minimizing partitions cause less inter-communication between cores.

