1 Software Architecture of High Efficiency Video Coding for Many-Core Systems with Power-Efficient Workload Balancing. Muhammad Usman Karim Khan, Muhammad Shafique, Jörg Henkel. Chair for Embedded Systems (CES), Karlsruhe Institute of Technology (KIT), Germany. Design, Automation and Test in Europe Conference and Exhibition (DATE), 2014

2 Outline • Introduction • Analysis of HEVC Encoder • Proposed method • Experimental Result • Conclusion

4 Introduction (1/4) • HEVC targets a 50% bit-rate reduction at the same subjective video quality as H.264, which makes it a prime candidate to replace H.264 encoders. • This gain in compression efficiency comes at a high cost in computational complexity, because HEVC includes numerous additional encoding tools.

5 Introduction (2/4) • In real-world video encoding systems, the video must be compressed under tight constraints on time budget and output bit-rate. • The additional tools and the timing constraints make it challenging to implement an HEVC system on a hardware platform.

6 Introduction (3/4) • Distributing the workload of the HEVC encoder over multiple cores reduces the total encoding time, which may also improve the overall energy efficiency. • The HEVC standard provides parallel encoding tools, such as slices and tiles, that can be exploited to meet these requirements.

7 Introduction (4/4) • An HEVC video encoder on a many-core system must exploit the changing workload to reduce the total power consumption while still meeting the quality-of-service demands. • The power consumption can be reduced dynamically by a workload-driven scheme that adapts the operating frequency.
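The idea of workload-driven frequency adaptation can be illustrated with a minimal sketch. It is not taken from the slides: the function name `pick_frequency`, the frequency levels, and the cycle counts are all hypothetical, and the sketch simply assumes that a core should run at the lowest available frequency that still finishes its work within the frame deadline.

```python
def pick_frequency(cycles_needed, deadline_s, available_hz):
    """Return the lowest available frequency that finishes the work in time.

    All parameters are illustrative assumptions, not values from the slides.
    """
    required_hz = cycles_needed / deadline_s
    for f in sorted(available_hz):
        if f >= required_hz:
            return f
    # The deadline cannot be met at any level; run as fast as possible.
    return max(available_hz)


levels = [1.0e9, 1.5e9, 2.0e9, 2.5e9]          # hypothetical frequency levels
light = pick_frequency(30e6, 1 / 30, levels)   # needs about 0.9 GHz
heavy = pick_frequency(60e6, 1 / 30, levels)   # needs about 1.8 GHz
```

A lighter workload maps to a lower frequency level, which is where the power saving would come from.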

8 Outline • Introduction • Analysis of HEVC Encoder • Proposed method • Experimental Result • Conclusion

9 Analysis of HEVC Encoder (1/3) • Tiles are at the lowest level of the coding hierarchy. They therefore consume the least system memory and are the fastest of the available parallelization tools. • Unlike slices, tiles have no associated headers. Tiles can therefore provide better output video quality than slices.

10 Analysis of HEVC Encoder (2/3)

11 Analysis of HEVC Encoder (3/3) • A single tile per frame gives the best quality. • For T² tiles per frame, a T×T tile structure gives the best video quality [24]. • The total number of tiles within a slice must be minimized, and the number of tile rows and tile columns within a slice must be equal. [24] C. Chi et al., “Improving the parallelization efficiency of HEVC decoding,” in ICIP, pp. 213–216, 2012.
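The T×T guideline above can be sketched as a small helper that picks the most square tile grid for a given tile count. The function name `tile_grid` is hypothetical, and the handling of tile counts that are not perfect squares (choosing the closest factor pair) is an assumption added for illustration; the slide only covers the T² case.

```python
import math


def tile_grid(num_tiles):
    """Return (rows, cols) with rows * cols == num_tiles, as square as possible."""
    rows = math.isqrt(num_tiles)
    # Walk down from the integer square root to the nearest divisor.
    while num_tiles % rows != 0:
        rows -= 1
    return rows, num_tiles // rows


square = tile_grid(9)      # perfect square: equal tile rows and columns
uneven = tile_grid(6)      # not a perfect square: closest factor pair
```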

12 Outline • Introduction • Analysis of HEVC Encoder • Proposed method • Experimental Result • Conclusion

13 Proposed method (1/2) • Workload Estimation: selects the tile structure and the maximum workload of each core, considering the operating frequency, the total number of cores, and the frames per second. • Workload Allocator: allocates workload to each core, using the user’s tolerance of the output bit-rate. • Workload Manager: manages the workload by adapting the operating frequency of each core in order to reduce power consumption.

14 Proposed method (2/2)

15 Proposed method – Workload Estimation (1/3) • Tile Formation and Maximum Workload Estimation • The number of cores over which the HEVC-Intra application’s workload is distributed is determined. • The number of Intra directions is adjusted to curtail the computational complexity.
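As a rough illustration of how frequency, frame rate, and core count interact, the following sketch estimates a per-core cycle budget and a core count. It is a back-of-the-envelope assumption, not the workload equation from the presentation (which is not reproduced in this transcript); the names `cores_needed` and `max_cycles_per_core` and all numbers are hypothetical.

```python
import math


def max_cycles_per_core(fps, core_hz):
    """Cycles one core can spend per frame while sustaining the frame rate."""
    return core_hz / fps


def cores_needed(cycles_per_frame, fps, core_hz):
    """Smallest number of cores whose combined budget covers one frame.

    Assumes the frame's work splits evenly across cores (an idealization).
    """
    return math.ceil(cycles_per_frame * fps / core_hz)


budget = max_cycles_per_core(25, 2.0e9)       # per-core budget at 25 fps, 2 GHz
cores = cores_needed(450e6, 30, 2.0e9)        # hypothetical 450 Mcycles per frame
```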

16 Proposed method – Workload Estimation (2/3) • Workload is given by:

17 Proposed method – Workload Estimation (3/3)

18 Proposed method – Workload Allocator (1/6) • Workload Allocator • For workload balancing, an adaptation interval is defined.

19 Proposed method – Workload Allocator (2/6) • The starting tile of this interval is always a fully searched tile (θ = θ_init,k) to achieve the best compression. • The workload of the tile in the following frames is gradually adjusted down (θ ≤ θ_init,k) to reduce workload and power consumption. • If the total number of compressed bytes for the current NKT increases beyond a certain threshold, θ is increased, thereby increasing the workload.
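The up/down behaviour of θ described above can be sketched as follows. This is only an illustration: the actual update rule is given by an equation that is not reproduced in this transcript, so the fixed `step`, the lower bound `theta_min`, and the function name `next_theta` are assumptions. The initial value of 35 corresponds to the 35 intra prediction modes of HEVC.

```python
def next_theta(theta, theta_init, bytes_so_far, byte_threshold,
               step=2, theta_min=1):
    """Return the next θ for a tile (hypothetical fixed-step update)."""
    if bytes_so_far > byte_threshold:
        # Compression is degrading: search more directions, up to the full search.
        return min(theta_init, theta + step)
    # Otherwise keep shrinking the search to save cycles and power.
    return max(theta_min, theta - step)


theta_init = 35            # full search over all HEVC intra modes
theta = theta_init
trace = []
for compressed_bytes in [100, 120, 300, 90]:   # hypothetical byte counts
    theta = next_theta(theta, theta_init, compressed_bytes, byte_threshold=200)
    trace.append(theta)
```

In this run θ falls while the byte count stays under the threshold and rises again once the threshold is exceeded.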

20 Proposed method – Workload Allocator (3/6) • The threshold is set statistically using the following equation: B: total number of compressed bytes. μ: the average bit-rate. υ: the variance of B.

21 Proposed method – Workload Allocator (4/6) • If a certain number of frames have been processed or B exceeds a threshold, adaptation and KT insertion are required.
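The trigger condition on this slide is a simple disjunction, sketched below. The function and parameter names are hypothetical, and the byte threshold is passed in as a plain number because the statistical threshold equation is not reproduced in this transcript.

```python
def needs_kt_insertion(frames_done, max_frames, compressed_bytes, byte_threshold):
    """True when the adaptation interval ends and a KT must be inserted."""
    return frames_done >= max_frames or compressed_bytes > byte_threshold


by_frame_count = needs_kt_insertion(8, 8, 1_000, 5_000)    # interval length reached
by_byte_count = needs_kt_insertion(3, 8, 6_000, 5_000)     # B exceeded the threshold
neither = needs_kt_insertion(3, 8, 1_000, 5_000)
```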

22 Proposed method – Workload Allocator (5/6) • For every CTU, if the threshold in equation 4 is satisfied, θ is adjusted as: u: a user-defined parameter.

23 Proposed method – Workload Allocator (6/6) • We can estimate the total cycles consumed per CTU:

24 Proposed method – Workload Manager (1/3) • Workload Manager • Adjusting θ increases or decreases the workload. • The intra prediction mode that is selected corresponds to the direction of texture flow. • The most probable prediction direction is determined, and the θ candidate directions are centered on it.
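Centering θ candidates on the most probable direction can be sketched as a clamped window over the angular modes. The function name `candidate_modes` and the zero-based mode indexing are assumptions for illustration; the default of 33 reflects the 33 angular intra modes of HEVC.

```python
def candidate_modes(center, theta, num_modes=33):
    """Return theta consecutive mode indices centered on `center`.

    The window is shifted inward when it would run past either end.
    """
    theta = min(theta, num_modes)
    low = max(0, min(center - theta // 2, num_modes - theta))
    return list(range(low, low + theta))


middle = candidate_modes(center=16, theta=5)
left_edge = candidate_modes(center=0, theta=5)
right_edge = candidate_modes(center=32, theta=5)
```

A smaller θ yields a narrower window and therefore fewer directions to evaluate per block.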

25 Proposed method – Workload Manager (2/3) • This prediction/direction can be obtained by sorting a histogram built from the gradients of each individual pixel [22][26]. • We propose a much simpler solution, similar to the one presented in [27]. [22] W. Jiang, H. Ma, Y. Chen, “Gradient based fast mode decision algorithm for intra prediction in HEVC,” in CECNet, pp. 1836–1840, 2012. [26] M. Shafique, B. Molkenthin, J. Henkel, “An HVS-based Adaptive Computational Complexity Reduction Scheme for H.264/AVC video encoder using Prognostic Early Mode Exclusion,” in DATE, pp. 1713–1718, 2010. [27] M. U. K. Khan, J. M. Borrmann, L. Bauer, M. Shafique, J. Henkel, “An H.264 Quad-FullHD low-latency intra video encoder,” in DATE, pp. 115–120, 2013.
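For context, the gradient-histogram approach mentioned above (the baseline, not the simpler solution the authors propose) might look roughly like the sketch below. It is a generic illustration rather than the algorithm of [22] or [26]: the forward-difference gradients, the eight orientation bins, and the name `dominant_direction` are all assumptions.

```python
import math


def dominant_direction(block, bins=8):
    """Return the index of the strongest texture-orientation bin over [0, 180)."""
    hist = [0.0] * bins
    for y in range(len(block) - 1):
        for x in range(len(block[0]) - 1):
            gx = block[y][x + 1] - block[y][x]   # horizontal forward difference
            gy = block[y + 1][x] - block[y][x]   # vertical forward difference
            magnitude = abs(gx) + abs(gy)
            if magnitude == 0:
                continue
            # Texture runs perpendicular to the gradient, hence the 90° shift.
            angle = (math.degrees(math.atan2(gy, gx)) + 90.0) % 180.0
            hist[int(angle / (180.0 / bins)) % bins] += magnitude
    return max(range(bins), key=hist.__getitem__)


# Vertical stripes: intensity changes only along x, so the texture runs at 90°.
stripes = [[0, 255, 0, 255] for _ in range(4)]
vertical_bin = dominant_direction(stripes)
```

Computing a gradient for every pixel is comparatively costly, which is consistent with the slide's motivation for a simpler alternative.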

26 Proposed method – Workload Manager (3/3)

27 Outline • Introduction • Analysis of HEVC Encoder • Proposed method • Experimental Result • Conclusion

28 Experimental Result (1/4) • We have developed a C++ based multi-threaded HEVC Intra-encoder in our lab. • In the 1-tile (single-thread) configuration, our software is ~13× faster than the HM-9.2 reference software for full-HD (1920×1080) video sequences. • The hardware platform is simulated with the Sniper many-core simulator [30]. [30] T. E. Carlson, W. Heirman, L. Eeckhout, “Sniper: Exploring the level of abstraction for scalable and accurate parallel multi-core simulation,” in SC, pp. 1–12, 2011.

29 Experimental Result (2/4)

30 Experimental Result (3/4)

31 Experimental Result (4/4)

32 Outline • Introduction • Analysis of HEVC Encoder • Proposed method • Experimental Result • Conclusion

33 Conclusion • A novel software architecture for HEVC-Intra encoding with run-time, power-efficient workload balancing on many-core systems is presented. • The adjusted workload is used to adapt the operating frequency, thereby reducing the power consumption of the many-core system.

