1 Adaptive slice-level parallelism for H.264/AVC encoding using pre macroblock mode selection Bongsoo Jung, Byeungwoo Jeon Journal of Visual Communication.

Presentation transcript:

1 Adaptive slice-level parallelism for H.264/AVC encoding using pre macroblock mode selection Bongsoo Jung, Byeungwoo Jeon Journal of Visual Communication and Image Representation 2008

2 Outline
- Introduction
- Complexity Analysis
- Method
  - Pre Macroblock Mode Selection
  - Adaptive Slice-level Parallelism
- Experimental Results
- Conclusions

3 Introduction
- H.264/AVC achieves high coding efficiency
  - Variable block sizes, multiple reference frames, quarter-pel motion vector accuracy, etc.
- High computational complexity
  - Complexity reduction algorithms
  - Parallel processing

4 Introduction
- GOP level: simple, but high latency
- Frame level: keeps coding efficiency, but the dependence among frames limits thread scalability
- Slice level: slices are encoded independently, but coding efficiency is lower
- Macroblock level: high dependency

5 Introduction
- MBs in a slice may not have similar computational complexity.
- This causes unnecessary extra waiting time in some threads.
[Figure: encoding time of slices 0 to 7 on processing units PU0 to PU7]

6 Main Purpose
- Objective
  - Use a parallel algorithm to speed up the H.264/AVC encoder
  - Maximize parallel efficiency by distributing the workload equally
- Method
  - Pre-processing: fast MB mode selection
  - Adaptive slice-level parallelism

7 Complexity Analysis
- Inter prediction modes of MBs in H.264
- Intra prediction modes: 4*4, 16*16

8 Complexity Analysis
- Run-time complexity of the H.264/AVC encoder
  - Pentium IV 2.4 GHz
  - Foreman_CIF with IPPP structure

9 Pre Macroblock Mode Selection Overview
- Why?
  - ME over variable block sizes has high computational complexity
  - Remove unnecessary ME block sizes and RD calculations of intra prediction modes
- This removal leads to
  - Complexity reduction
  - Workload balancing among slices

10 Pre Macroblock Mode Selection Inter MB mode selection
- MC block sizes in a video sequence
  - Foreground region: 8*8 or smaller
  - Non-moving region: 16*16
- High temporal correlation
  - Check the consistency history of the 16*16 block size and the zero MV
- Two measurements
  - Zero motion consistency (ZMC)
  - Large block consistency (LBC)

11 Pre Macroblock Mode Selection Inter MB mode selection
- Zero Motion Consistency (ZMC)
  - Indicates for how many consecutive frames a given block has had a zero MV
  - When a block is encoded in intra mode, ZMC is set to 0
  - t: frame index; ZMC_0 = 0; (n,m;i,j) denotes the 4*4 block at (n,m) within MB (i,j)
  - A high ZMC value implies a high probability of belonging to the background region
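The ZMC update described on this slide can be sketched as a per-block counter. The slide's formula did not survive in the transcript, so the reset on any non-zero MV is an assumption inferred from the word "consecutively"; the function name and the list-based data layout are hypothetical.

```python
def update_zmc(prev_zmc, mvs, intra):
    """Return the new zero-motion counters for one frame.

    prev_zmc: list of counters, one per 4x4 block (from the previous frame)
    mvs:      list of (mvx, mvy) tuples, one per 4x4 block
    intra:    list of bools, True where the block was intra coded
    """
    new_zmc = []
    for count, mv, is_intra in zip(prev_zmc, mvs, intra):
        if not is_intra and mv == (0, 0):
            new_zmc.append(count + 1)  # zero MV again: extend the run
        else:
            new_zmc.append(0)          # motion (assumed) or intra coding: reset
    return new_zmc
```

Starting from all-zero counters matches the slide's initial condition ZMC_0 = 0.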

12 Pre Macroblock Mode Selection Inter MB mode selection
- Zero Motion Consistency Score (ZMCS)
  - Indicates how likely an MB is to be a stationary region
  - T_MOTION: a threshold value
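A minimal sketch of how an MB-level score might be derived from the block-level counters. The transcript does not preserve the actual rule, so two things here are assumptions: the sixteen 4*4 counters are aggregated with `min`, and the comparison is `>=`. The two score values (High, Low) and the default T_MOTION = 4 are taken from later slides in this deck.

```python
def zmc_score(block_zmcs, t_motion=4):
    """Classify one MB as 'High' or 'Low' from the ZMC values of its 4x4 blocks."""
    # Assumption: the MB counts as stationary only if every one of its
    # blocks has kept a zero MV for at least t_motion frames.
    return "High" if min(block_zmcs) >= t_motion else "Low"
```

Aggregating with `min` is the conservative choice: a single moving block keeps the whole MB out of the early SKIP / P16*16 path.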

13 Pre Macroblock Mode Selection Inter MB mode selection
- Large Block Consistency (LBC)
  - Indicates the number of consecutive frames in which the (i,j)th MB has had a 16*16 MC block size
  - When a block is encoded in intra mode, LBC is set to 0
  - bestMode_t(i,j): the best MB mode of the (i,j)th MB in the t-th frame
  - LBC_0 = 0
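The LBC counter follows the same pattern as ZMC, driven by the best mode of the co-located MB. The sketch below is illustrative only: the mode labels are hypothetical strings, and treating SKIP as a 16*16 mode is an assumption exposed as a parameter.

```python
def update_lbc(prev_lbc, best_mode, large_modes=("SKIP", "P16x16")):
    """Return the new large-block counter of one MB after coding a frame."""
    if best_mode in large_modes:
        return prev_lbc + 1  # 16*16 partition again: extend the run
    return 0                 # smaller partition or intra mode: reset


# Track one MB over four frames, starting from LBC_0 = 0.
lbc = 0
history = []
for mode in ["P16x16", "SKIP", "P8x8", "P16x16"]:
    lbc = update_lbc(lbc, mode)
    history.append(lbc)
```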

14 Pre Macroblock Mode Selection Inter MB mode selection
- Large Block Consistency Score (LBCS)
  - Indicates how likely an MB is to be coded with the 16*16 partition
  - T_MODE1, T_MODE2: threshold values used to assess the LBC
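With two thresholds, the LBC can be mapped onto the three score levels (High, Medium, Low) that appear later in the deck. The exact inequalities are not preserved in the transcript, so the `>=` comparisons below are an assumption; the default thresholds are the values T_MODE1 = 1 and T_MODE2 = 4 quoted on a later slide.

```python
def lbc_score(lbc, t_mode1=1, t_mode2=4):
    """Map a large-block counter onto 'High', 'Medium' or 'Low'."""
    if lbc >= t_mode2:
        return "High"    # long history of 16*16 partitions
    if lbc >= t_mode1:
        return "Medium"  # short history of 16*16 partitions
    return "Low"         # last frame used a smaller partition or intra
```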

15 Pre Macroblock Mode Selection Inter MB mode selection
- An illustration of LBCS

16 Pre Macroblock Mode Selection Inter MB mode selection
- Conditional probability of MB modes given ZMCS = High
  - The other block sizes are very unlikely to appear (probability less than about 0.04)
  - SKIP and P16*16 modes can be detected early
  - T_MOTION = 4

17 Pre Macroblock Mode Selection Inter MB mode selection
- Joint conditional probability of MB modes given LBCS, with ZMCS = Low
  - A: LBCS = High, B: LBCS = Medium, C: LBCS = Low
  - T_MODE1 = 1, T_MODE2 = 4

18 Pre Macroblock Mode Selection Pre selective intra mode selection
- Computing the RD costs of intra modes carries a high computational load
- Compare the temporal correlation with the spatial correlation of the current MB prior to frame coding

19 Pre Macroblock Mode Selection Selective intra mode selection
- Mean Absolute Temporal Difference (MATD)
- Mean Absolute Spatial Difference (MASD)
  - c_{x,y}: pixel value at location (x,y) of the MB in the current frame
  - r_{x,y}: pixel value at location (x,y) of the MB in the previous frame
  - X, Y: horizontal and vertical dimensions of an MB
  - MASD_H: the MASD between horizontally neighboring pixels
  - MASD_V: the MASD between vertically neighboring pixels
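The two measures can be sketched directly from the symbol definitions on this slide. The slide's equations are missing from the transcript, so the normalisation is an assumption: MATD is averaged over all X*Y pixels, and each MASD term over the number of neighbouring pixel pairs. How MASD_H and MASD_V are combined into a single MASD is not stated, so the sketch returns them separately. MBs are represented as lists of pixel rows.

```python
def matd(cur, ref):
    """Mean absolute difference between co-located pixels of two MBs."""
    total = sum(abs(c - r)
                for cur_row, ref_row in zip(cur, ref)
                for c, r in zip(cur_row, ref_row))
    return total / (len(cur) * len(cur[0]))


def masd_h(cur):
    """Mean absolute difference between horizontally neighbouring pixels."""
    diffs = [abs(row[x] - row[x + 1])
             for row in cur for x in range(len(row) - 1)]
    return sum(diffs) / len(diffs)


def masd_v(cur):
    """Mean absolute difference between vertically neighbouring pixels."""
    diffs = [abs(cur[y][x] - cur[y + 1][x])
             for y in range(len(cur) - 1) for x in range(len(cur[0]))]
    return sum(diffs) / len(diffs)
```

All three need only the source pixels of the current and previous frames, which is why they can be evaluated prior to frame coding.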

20 Pre Macroblock Mode Selection Selective intra mode selection
- Compare MATD and MASD to determine whether the RD costs of intra modes should be calculated for the current MB
  - The test checks whether the MB is more temporally correlated than spatially correlated
  - A larger w makes skipping the intra mode search easier
  - A smaller QP will incur more intra modes than a larger QP
  - w: weighting factor, currently set to 0.6
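One reading of this test that is consistent with "a larger w makes skipping easier" is to skip the intra search when MATD < w * MASD. This inequality is an assumption: the original equation is not in the transcript, and it evidently also involves QP, which the sketch below leaves out.

```python
def should_check_intra(matd_value, masd_value, w=0.6):
    """Return True if the intra RD costs should be evaluated for this MB."""
    # Assumed test: the MB is treated as more temporally than spatially
    # correlated, and the intra search is skipped, when MATD < w * MASD.
    skip_intra = matd_value < w * masd_value
    return not skip_intra
```

For example, with MATD = 7 and MASD = 10 the intra modes are checked at w = 0.6 but skipped at w = 0.8.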

21 Pre Macroblock Mode Selection MB mode classification
- Decision table of candidate MB modes
- A block diagram of MB mode selection

22 Adaptive Slice-level Parallelism Overview
- Characteristics
  - Easy to implement
  - Low overhead of communication among processor units
  - Good scalability
  - Increases bitrate
- Slice boundaries are defined on the basis of a fixed number of MBs or a fixed number of bits
  - It is hard to decide a slice boundary prior to encoding

23 Adaptive Slice-level Parallelism Fixed MB assignment
- The number of consecutive MBs in each slice
  - L: the number of processor units in a multi-core system
  - M: the total number of MBs in a frame
  - i: slice index
- Example: with L = 8 processing units and CIF resolution (352*288), M = 22*18 = 396
  - About 49 MBs can be assigned to each slice (396 / 8 = 49.5)
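The CIF example can be worked through in a few lines. Since 396 is not divisible by 8, some slices must take one extra MB; giving the extra MBs to the first slices is an assumption, as the slide's per-slice formula is not preserved in the transcript.

```python
def fixed_slice_sizes(total_mbs, num_units):
    """Split total_mbs consecutive MBs into num_units near-equal slices."""
    base, extra = divmod(total_mbs, num_units)
    # The first `extra` slices take one additional MB each (assumed rule).
    return [base + 1 if i < extra else base for i in range(num_units)]


sizes = fixed_slice_sizes(22 * 18, 8)  # CIF frame, eight processor units
```

Here `sizes` is four slices of 50 MBs followed by four slices of 49 MBs, which matches the "about 49 MBs" figure on the slide.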

24 Adaptive Slice-level Parallelism Fixed MB assignment
- The scheduling of slice-level parallelism in eight processor units
[Figure: encoding time of slices 0 to 7 on PU0 to PU7; panels: ideal case, practical case (with bottleneck)]

25 Adaptive Slice-level Parallelism Fixed MB assignment
- The imbalance of computational load distribution
[Figure panels: Exhaustive Search Method; Fast ME / Fast Mode Search]

26 Adaptive Slice-level Parallelism Fixed MB assignment
- Computational load for encoding one frame in slice-level parallelism
- Computational load of the t-th frame on a single-processor system
  - C^t_slice(i): the computational load of the i-th slice in the t-th frame
  - L: number of slices in a frame

27 Adaptive Slice-level Parallelism Fixed MB assignment
- The speedup of a multiprocessor system over a single-processor system
- To achieve the maximum speedup, the computational loads of the slices should be as similar as possible
- This motivates an adaptive slice partition method
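The slide's equations are not in the transcript. A plausible reconstruction from the definitions of C^t_slice(i) and L is given below, assuming slices are indexed from 0, a single processor encodes all slices in sequence, and the parallel encoder finishes a frame when its slowest slice finishes. The subscripts "single" and "parallel" are labels introduced here.

```latex
\[
C^{t}_{\text{single}} = \sum_{i=0}^{L-1} C^{t}_{\text{slice}(i)}, \qquad
C^{t}_{\text{parallel}} = \max_{0 \le i < L} C^{t}_{\text{slice}(i)}, \qquad
\text{Speedup}^{t} = \frac{C^{t}_{\text{single}}}{C^{t}_{\text{parallel}}} \le L
\]
```

Equality holds only when all L slice loads are equal, which is why the slide asks for slice loads that are as similar as possible.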

28 Adaptive Slice-level Parallelism Complexity estimation model
- A simple estimation method that utilizes the result of fast MB mode selection
- Define the group value g corresponding to the candidate MB modes

29 Adaptive Slice-level Parallelism Complexity estimation model
- Complexity model
  - C_{k,CHKIntra}(g): complexity cost of the k-th MB
  - g: group index
  - e_inter: estimated complexity cost of inter mode in g = 1
  - e_intra: complexity cost according to the intra mode check in g = 1
  - α1, α2, α3, β1, β2, β3: weighting values of the complexity cost

30 Adaptive Slice-level Parallelism Complexity estimation model
- Relative computational load, shown for CHK_intra = 0 and CHK_intra = 1
  - Assume e_inter = 1, e_intra = 0
  - α1 = 2.42, α2 = 3.12, α3 = 5.28
  - β1 = 0.82, β2 = 0.83, β3 = 0.84
  - Assume e_inter = 1, e_intra = 3.97

31 Adaptive Slice-level Parallelism Adaptive MB assignment
- The total computational load at the t-th frame
- Ideal computational load of each slice for uniform workload distribution

32 Adaptive Slice-level Parallelism Adaptive MB assignment
- MB assignment of each slice
- Much better than fixed MB assignment in each slice
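The idea of adaptive assignment can be sketched as a greedy partition over estimated per-MB costs. This is an illustration under stated assumptions, not the authors' exact rule, which is not preserved in the transcript: the ideal per-slice load is taken as the total estimated load divided by the number of slices, and a slice is closed as soon as the running total reaches its share. The `mb_costs` input is hypothetical and stands in for the output of the complexity model.

```python
def adaptive_slice_sizes(mb_costs, num_slices):
    """Split consecutive MBs into slices of roughly equal estimated cost.

    Assumes len(mb_costs) >= num_slices. Returns the MB count of each slice.
    """
    target = sum(mb_costs) / num_slices  # ideal load per slice
    sizes = []
    count = 0          # MBs in the slice currently being filled
    cumulative = 0.0   # estimated cost of all MBs assigned so far
    for k, cost in enumerate(mb_costs):
        cumulative += cost
        count += 1
        slices_left = num_slices - len(sizes) - 1
        mbs_left = len(mb_costs) - (k + 1)
        # Close the slice once the running total reaches its share of the
        # load, or when the remaining MBs are only just enough to give one
        # MB to each remaining slice.
        reached_share = cumulative >= target * (len(sizes) + 1)
        if slices_left > 0 and (reached_share or mbs_left == slices_left):
            sizes.append(count)
            count = 0
    sizes.append(count)  # the last slice takes whatever is left
    return sizes


# Two expensive MBs followed by ten cheap ones, split over two slices.
costs = [5, 5] + [1] * 10
adaptive = adaptive_slice_sizes(costs, 2)
```

In this example a fixed split would put 6 MBs in each slice, with estimated loads of 14 and 6, while the adaptive split places the two expensive MBs in their own slice so that both slices carry a load of 10.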

33 Adaptive Slice-level Parallelism Adaptive MB assignment  Entire block diagram

34 Experimental Results Overview
- Performance comparison between the proposed MB mode decision and the conventional method
- Comparison of adaptive slice-level parallelism with fixed slice-level parallelism

35 Experimental Results MB mode selection
- Average encoding time saving, AST [%]
- BDPSNR and BDBR are used to measure performance against FULL_1Slice
  - FULL_1Slice: exhaustive method
  - FMD_1Slice: fast MB mode search method

36 Experimental Results Rate distortion curves

37 Experimental Results  R-D performance compared to one slice per frame (FMD_1Slice)

38 Experimental Results Rate distortion curves

39 Experimental Results Slice-level parallelism
- Comparing adaptive and fixed slice-level parallelism
- Speedup is computed from
  - The encoding time of one slice per frame on a single-processor system
  - The longest encoding time of a slice using the fixed mode
  - The longest encoding time of a slice using the adaptive mode

40 Experimental Results Speedup

41 Conclusions
- Proposed a fast MB mode selection using the consistency history of the block size and the zero MV
- Proposed an intra mode selection by comparing the temporal and spatial correlation
- Using these two schemes, the authors proposed a new adaptive slice-level parallelism to speed up the H.264/AVC encoder
