A Highly Parallel Framework for HEVC Coding Unit Partitioning Tree Decision on Many-core Processors Chenggang Yan, Yongdong Zhang, Jizheng Xu, Feng Dai, Liang Li, Qionghai Dai, and Feng Wu. IEEE SIGNAL PROCESSING LETTERS, VOL. 21, NO. 5, MAY 2014

Outline
- Introduction
- Related Work
- Proposed Method
- Experimental Results
- Conclusion

Introduction(1/3)
- In HEVC, each frame is divided into non-overlapping coding tree units (CTUs), which can be recursively split into smaller coding units (CUs).
- For each CTU, the CU partitioning tree (CUPT) determines how the CTU is coded, using CUs of variable block sizes and coding modes.
- The price paid for this higher coding efficiency is higher computational complexity.
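The recursive split decision can be illustrated with a minimal Python sketch. It assumes a caller-supplied `rd_cost(x, y, size)` function (hypothetical, standing in for the encoder's real rate-distortion evaluation) and CU sizes from 64x64 down to 8x8. It ignores the signalling cost of split flags and is not the HM implementation.

```python
from typing import Callable, List, Tuple, Union

# A partition tree is either a leaf (the CU size) or a list of four sub-trees.
Tree = Union[int, List["Tree"]]


def decide_cupt(x: int, y: int, size: int,
                rd_cost: Callable[[int, int, int], float],
                min_size: int = 8) -> Tuple[float, Tree]:
    """Return (best cost, partition tree) for the block whose corner is (x, y)."""
    cost_unsplit = rd_cost(x, y, size)
    if size <= min_size:
        return cost_unsplit, size  # smallest CU: cannot split further

    half = size // 2
    cost_split = 0.0
    children: List[Tree] = []
    # Visit the four quadrants in z-scan order and sum their best costs.
    for dy in (0, half):
        for dx in (0, half):
            cost, tree = decide_cupt(x + dx, y + dy, half, rd_cost, min_size)
            cost_split += cost
            children.append(tree)

    # Keep whichever alternative has the lower total cost.
    if cost_split < cost_unsplit:
        return cost_split, children
    return cost_unsplit, size
```

With a toy cost such as `lambda x, y, s: float(s ** 3)`, splitting is always cheaper, so a 64x64 CTU is divided all the way down to 8x8 CUs. With a constant cost the CTU stays unsplit. The full search visits every branch, which is why the decision is expensive.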

Introduction(2/3)
- To speed up the CUPT decision process, many researchers have tried to reduce the search space by avoiding a search of every branch of the quad-tree [10].
- To preserve coding efficiency, many branches of the quad-tree cannot be skipped, so the speedup is no more than a factor of two.
- Many researchers consider only RD-based intra mode selection, although inter mode selection is much more time-consuming.
[10] L. Shen, Z. Liu, X. Zhang et al., “An effective CU size decision method for HEVC encoders,” IEEE Trans. Multimedia, vol. 15, pp. 465–470, Jan. 2013.

Introduction(3/3)
- Many-core processors are good candidates for speeding up compression algorithms.
- Efficient parallelization of the CUPT decision (CUPTD) on many-core processors is challenging because CUPTD has complicated data dependencies.
- If CUPTD is not extensively parallelizable, cores will be left unused and performance may suffer.

Related Work(1/3)
- HEVC CU Partition Tree Decision (CUPTD)

Related Work(2/3)
- For RD-based intra prediction:
  - Instead of applying intra coding at the PU level, HEVC conducts intra prediction sequentially at the TU level, always using the nearest neighboring reference samples from already reconstructed TUs.
  - To enhance coding efficiency, HEVC provides as many as 35 intra prediction modes.
  - As in H.264/AVC, the left, above, and above-right neighboring reconstructed samples are used for intra prediction.

Related Work(3/3)
- For RD-based inter prediction:
  - The best motion vector predictor is selected from an advanced motion vector prediction candidate list (AMVPCL).
  - The AMVPCL is composed of both spatial candidates and temporal candidates.
  - Spatial candidates need the motion information of the neighboring left, left-down, upper, upper-left, and upper-right PUs.
- Because of RD-based intra/inter prediction, the search of the current CU branch may have data dependencies on its neighboring left, left-down, upper, upper-left, and upper-right CU branches.

Proposed Method A(1/2)
- Problem Formulation

Proposed Method A(2/2)

Proposed Method B(1/3)
- CTU-Level Parallelism
  - The best RD costs of the current CTU's left, upper, upper-left, and upper-right neighboring CTUs are computed first.
  - The current CTU therefore has data dependencies on its left, upper, upper-left, and upper-right neighboring CTUs.
  - We use the same DAG-based order as described in our previous work [14] to parallelize CTUs.
[14] C. Yan et al., “Highly parallel framework for HEVC motion estimation on many-core platform,” in Data Compression Conf., Snowbird, UT, 2013, pp. 63–72.

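Proposed Method B(2/3)
- Generate a directed acyclic graph (DAG) to capture the dependency relationships among CTUs.
- The DAG consists of a set of vertices V and a set of edges E.
- Each data dependency is represented by an edge.
- Once a CTU has been processed, it is removed from the DAG.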
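As an illustration of the dependency graph, the sketch below builds the DAG for a small frame and computes the earliest parallel step at which each CTU could run. It assumes one vertex per CTU, addressed as (row, column), and edges to the left, upper, upper-left, and upper-right neighbors. The names `build_dag` and `earliest_step` are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]  # (row, column) of a CTU in the frame

# Offsets (d_row, d_col) of the CTUs the current CTU depends on:
# left, upper, upper-left, upper-right.
DEP_OFFSETS = [(0, -1), (-1, 0), (-1, -1), (-1, 1)]


def build_dag(rows: int, cols: int) -> Dict[Coord, List[Coord]]:
    """Map each CTU (vertex) to the in-frame CTUs it depends on (edges)."""
    dag: Dict[Coord, List[Coord]] = {}
    for i in range(rows):
        for j in range(cols):
            dag[(i, j)] = [(i + di, j + dj) for di, dj in DEP_OFFSETS
                           if 0 <= i + di < rows and 0 <= j + dj < cols]
    return dag


def earliest_step(dag: Dict[Coord, List[Coord]]) -> Dict[Coord, int]:
    """Longest-path depth of each vertex, i.e. its earliest parallel step."""
    step: Dict[Coord, int] = {}
    # Raster order is a valid topological order: every dependency lies in an
    # earlier row, or earlier in the same row.
    for v in sorted(dag):
        step[v] = 1 + max((step[d] for d in dag[v]), default=-1)
    return step
```

For a frame at least two CTUs wide, the earliest step of the CTU at (row, column) works out to column + 2 * row. This is the familiar wavefront pattern, so the number of CTUs that can run concurrently is bounded by the frame dimensions.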

Proposed Method B(3/3)

Proposed Method B(1/)
- Step 1: Initialize DQ and CM. DQ is a waiting queue. CM records, for each CTU, the number of related CTUs it still depends on.
- Step 2: When a value in CM becomes zero, get the corresponding coordinates and push them into DQ.
- Step 3: Get coordinates from DQ and process the corresponding CTUs in parallel on the many-core platform.
- Step 4: Update CM. When the CTU at coordinate (i, j) has been processed, the values in CM at coordinates (i+1, j), (i+1, j-1), (i, j+1), and (i+1, j+1) are each decremented by one. Here i is the row index and j is the column index, consistent with the left, upper, upper-left, and upper-right dependencies.
- Step 5: Repeat steps 2 to 4 until every CTU in the frame has been processed.
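The five steps can be sketched in Python as follows. This is a minimal single-threaded illustration: the loop over each batch stands in for dispatching CTUs to cores, and `schedule_ctus` and `process` are hypothetical names. It assumes (i, j) means (row, column) and that CM starts with the count of in-frame neighbors each CTU depends on.

```python
from collections import deque
from typing import Callable, List, Tuple

Coord = Tuple[int, int]  # (row i, column j)

DEPENDS_ON = [(0, -1), (-1, 0), (-1, -1), (-1, 1)]  # left, upper, upper-left, upper-right
DEPENDENTS = [(1, 0), (1, -1), (0, 1), (1, 1)]      # the four offsets from Step 4


def schedule_ctus(rows: int, cols: int,
                  process: Callable[[Coord], None] = lambda c: None
                  ) -> List[List[Coord]]:
    """Return the batches of CTUs that could be processed in parallel."""
    def inside(r: int, c: int) -> bool:
        return 0 <= r < rows and 0 <= c < cols

    # Step 1: initialize CM (dependency counts) and DQ (waiting queue).
    cm = [[sum(inside(i + di, j + dj) for di, dj in DEPENDS_ON)
           for j in range(cols)] for i in range(rows)]
    # Step 2 (initial): CTUs whose count is already zero are ready.
    dq = deque((i, j) for i in range(rows) for j in range(cols)
               if cm[i][j] == 0)

    batches: List[List[Coord]] = []
    while dq:  # Step 5: repeat until the frame is finished
        # Step 3: everything currently in DQ is mutually independent.
        batch = list(dq)
        dq.clear()
        for coord in batch:
            process(coord)  # stand-in for parallel execution on the cores
        batches.append(batch)

        # Step 4: decrement the counts of the CTUs that depend on this batch.
        for i, j in batch:
            for di, dj in DEPENDENTS:
                r, c = i + di, j + dj
                if inside(r, c):
                    cm[r][c] -= 1
                    if cm[r][c] == 0:  # Step 2: newly ready CTU
                        dq.append((r, c))
    return batches
```

Calling `schedule_ctus(3, 4)` yields batches that begin with the top-left CTU alone and then widen. A real implementation would hand each ready coordinate to a worker as soon as its count reaches zero instead of waiting for the whole batch.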

Proposed Method C(1/3)

Proposed Method C(2/3)
- CICUs:
  - The CICU's left boundary and the CTU's left boundary overlap.
  - The CICU's upper boundary and the CTU's upper boundary overlap.

Proposed Method C(3/3)
- PICUs:
  - PICUs do not meet the requirements of CICUs.
  - The PICU's left boundary and the CTU's left boundary overlap, or the neighboring left largest-size CU has been computed.
  - The PICU's upper boundary and the CTU's upper boundary overlap, or the neighboring upper and upper-right largest-size CUs have been computed.
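One possible reading of the CICU and PICU conditions is sketched below. It assumes that the listed conditions must all hold together, that a CU is addressed by the offset (x, y) of its top-left corner inside the CTU, and that the caller supplies flags saying whether the relevant neighboring CUs have been computed. The function name and the "WAIT" label are hypothetical and do not appear in the slides.

```python
def classify_cu(x: int, y: int,
                left_done: bool,
                upper_done: bool,
                upper_right_done: bool) -> str:
    """Classify a CU at offset (x, y) inside its CTU as CICU, PICU, or WAIT."""
    on_left_edge = x == 0   # CU's left boundary overlaps the CTU's left boundary
    on_upper_edge = y == 0  # CU's upper boundary overlaps the CTU's upper boundary

    # CICU: both boundaries overlap, so no neighbor inside the CTU is needed.
    if on_left_edge and on_upper_edge:
        return "CICU"

    # PICU: each missing overlap is compensated by an already computed neighbor.
    left_ok = on_left_edge or left_done
    upper_ok = on_upper_edge or (upper_done and upper_right_done)
    if left_ok and upper_ok:
        return "PICU"

    # Otherwise the CU still has unresolved dependencies inside the CTU.
    return "WAIT"
```

For example, `classify_cu(8, 0, left_done=True, upper_done=False, upper_right_done=False)` is a PICU under this reading, because the CU touches the CTU's upper boundary and its left neighbor is finished.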

Experimental Results
- To compare the proposed method with serial execution, we adopt an encoder ported from the HEVC reference software HM 7.0 without any optimization.
- The experimental platform is Tile64, a member of the Tilera many-core processor family that contains 64 processing cores [17].
[17] S. Bell et al., “TILE64-Processor: A 64-core SoC with mesh,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2008, pp. 88–598.

Experimental Results

Experimental Results

Conclusion
- We propose an efficient parallel framework for HEVC CUPTD on many-core processors.
- Experiments conducted on the Tile64 platform demonstrate that our method reduces encoding time relative to the default encoding scheme in HM 7.0.
