Time Optimization of HEVC Encoder over X86 Processors using SIMD


Time Optimization of HEVC Encoder over X86 Processors using SIMD Spring 2013 Multimedia Processing EE5359 Advisor: Dr. K. R. Rao Kushal Shah 1000857252 kushal.shah7@mavs.uta.edu

Objective HEVC introduces many enhanced coding tools and is expected to achieve about 50% bit-rate reduction at a similar mean opinion score (MOS) compared with the previous standard, H.264/AVC [1]. However, its computational complexity is much higher, which makes encoding speed a serious problem in practical implementations of HEVC [2].

Overview of HEVC [1] High Efficiency Video Coding (HEVC) is the newest video coding standard of the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. The main goal of the HEVC standardization effort is to enable significantly improved compression performance relative to existing standards—in the range of 50% bit-rate reduction for equal perceptual video quality.

HEVC Encoder Block Diagram [1] Fig.1: HEVC encoder block diagram [1]

Fig. 2: Block partitioning in HEVC (CTU, CU, PU, TU) [5]

Time Analysis of HEVC Encoder [2][3] Fig. 3: Time analysis of HEVC encoder [2][3]

Time Analysis of HEVC Encoder [2][3] HEVC uses a quadtree structure [4] to support large and flexible block sizes. A coding unit (CU) can be 64x64, 32x32, 16x16 or 8x8. Each CU is split into one or more prediction units (PU) and transform units (TU). The width and height of a PU range from 4 to 64, so the blocks processed in motion compensation (MC) can be as large as 64x64.
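
The quadtree partitioning described above can be illustrated with a minimal Python sketch. The function name `split_cu` and the toy split rule `split_top_left` are hypothetical and only serve the illustration; a real encoder decides each split by rate-distortion cost, which is not modelled here.

```python
def split_cu(x, y, size, should_split, min_size=8):
    """Return the leaf CUs of a quadtree as (x, y, size) tuples.

    should_split(x, y, size) is a caller-supplied decision function
    (an assumption for this sketch, not part of any HEVC API).
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        # Visit the four quadrants in raster order.
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(
                    split_cu(x + dx, y + dy, half, should_split, min_size))
        return leaves
    return [(x, y, size)]


def split_top_left(x, y, size):
    """Toy rule: keep splitting only the CU that touches the top-left corner."""
    return x == 0 and y == 0


# Partition one 64x64 block; CU sizes run from 64x64 down to 8x8.
leaves = split_cu(0, 0, 64, split_top_left)
```

With this toy rule the 64x64 block ends up as three 32x32 CUs, three 16x16 CUs and four 8x8 CUs, which together tile the whole block.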

Time Analysis of HEVC Encoder [2][3] In motion estimation (ME), the sum of absolute differences (SAD) and the sum of absolute transformed differences (SATD) are calculated for many block sizes. Because of the flexible block structure, each 4x4 region is covered by several candidate block sizes from 4x4 to 64x64, so it is processed several times during ME, which can be quite time-consuming.
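
As a rough illustration of the SAD cost mentioned above, the following Python sketch computes SAD for a block and shows that, for the same displacement, the SAD of a larger block equals the sum of the SADs of its 4x4 sub-blocks. The helper names are hypothetical, and the slides do not state whether the encoder reuses sub-block results in this way.

```python
def sad(cur, ref):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(c - r)
               for row_c, row_r in zip(cur, ref)
               for c, r in zip(row_c, row_r))


def sub_block(block, y, x, size):
    """Extract a size x size sub-block whose top-left corner is (y, x)."""
    return [row[x:x + size] for row in block[y:y + size]]


def sad_from_4x4(cur, ref):
    """SAD of a square block assembled from the SADs of its 4x4 sub-blocks."""
    n = len(cur)
    return sum(sad(sub_block(cur, y, x, 4), sub_block(ref, y, x, 4))
               for y in range(0, n, 4)
               for x in range(0, n, 4))


# Two synthetic 8x8 blocks of 8-bit samples (made-up data).
cur = [[(3 * y + 5 * x) % 256 for x in range(8)] for y in range(8)]
ref = [[(7 * y + 2 * x + 1) % 256 for x in range(8)] for y in range(8)]
```

The additivity only holds when every sub-block uses the same motion vector as the enclosing block.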

8-Tap and 4-Tap Interpolation[7] 8-Tap Interpolation Filter: Fig. 4: Interpolation filter for fractional pels in motion compensation [7]
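
A minimal Python sketch of 8-tap half-sample luma interpolation along one row is given below. The tap values are the HEVC half-sample luma coefficients; the function name, the edge handling (replicating border samples) and the single-stage rounding for 8-bit video are simplifying assumptions for illustration.

```python
# HEVC 8-tap luma coefficients for the half-sample position (they sum to 64).
HALF_PEL_TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)


def interpolate_half_pel_row(row):
    """Return the half-sample values between row[i] and row[i + 1].

    Samples outside the row are replaced by the nearest edge sample
    (an assumption of this sketch).
    """
    n = len(row)
    out = []
    for i in range(n - 1):
        acc = 0
        for k, tap in enumerate(HALF_PEL_TAPS):
            j = min(max(i - 3 + k, 0), n - 1)  # clamp index to the row
            acc += tap * row[j]
        # Round, normalise by 64 and clip to the 8-bit sample range.
        out.append(min(max((acc + 32) >> 6, 0), 255))
    return out


flat = interpolate_half_pel_row([100] * 10)
ramp = interpolate_half_pel_row([0, 10, 20, 30, 40, 50, 60, 70])
```

A constant row stays constant, and on a linear ramp the interior half-sample value lies midway between its two neighbours.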

Intel SSE Instruction [6] Streaming SIMD Extensions (SSE) is a SIMD instruction set extension to the x86 architecture, later enhanced as SSE2, SSE3, SSSE3 and SSE4. SSE originally provided eight 128-bit registers, XMM0 through XMM7; AMD64 extends this to sixteen. Each 128-bit register can hold two 64-bit integers, four 32-bit integers, eight 16-bit short integers or sixteen 8-bit bytes. A single SSE instruction operates on all the lanes of an XMM register at once, which provides considerable data-level parallelism.
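
The lane layouts listed above can be visualised with a small Python sketch that reinterprets the same 16 bytes in the four integer widths. This is only a model of the data layout using the standard `struct` module, not SIMD code; the variable names are illustrative.

```python
import struct

# 128 bits = 16 bytes, standing in for the contents of one XMM register.
raw = bytes(range(16))

# The same bits viewed with different lane widths (little-endian, as on x86).
as_u8 = struct.unpack("<16B", raw)   # sixteen 8-bit bytes
as_i16 = struct.unpack("<8h", raw)   # eight 16-bit short integers
as_i32 = struct.unpack("<4i", raw)   # four 32-bit integers
as_i64 = struct.unpack("<2q", raw)   # two 64-bit integers
```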

Intel SSE Instruction [6] The PMADDUBSW instruction takes two 128-bit SSE registers as operands, the first containing sixteen unsigned 8-bit integers and the second containing sixteen signed 8-bit integers. It multiplies corresponding bytes and adds each adjacent pair of products with signed saturation, producing eight 16-bit results. With this instruction, it is only necessary to sum the values in the destination register to get the final result. Fig. 5: SSE instruction structure [6]
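
The following Python function is a lane-by-lane reference model of what PMADDUBSW computes, useful for checking an intrinsics implementation against. It is not SIMD code, and using it with sample values and filter taps as operands is an assumption about how the instruction would be applied to interpolation.

```python
def sat16(v):
    """Saturate to the signed 16-bit range, as PMADDUBSW does."""
    return max(-32768, min(32767, v))


def pmaddubsw(a_u8, b_s8):
    """Model of 128-bit PMADDUBSW.

    a_u8: sixteen unsigned bytes (0..255); b_s8: sixteen signed bytes
    (-128..127). Returns eight signed 16-bit lanes, each the saturated
    sum of two adjacent byte products.
    """
    assert len(a_u8) == len(b_s8) == 16
    assert all(0 <= a <= 255 for a in a_u8)
    assert all(-128 <= b <= 127 for b in b_s8)
    return [sat16(a_u8[i] * b_s8[i] + a_u8[i + 1] * b_s8[i + 1])
            for i in range(0, 16, 2)]


# Example: unsigned samples in the first operand, signed 8-tap
# coefficients (repeated twice) in the second.
pixels = [100] * 16
taps = [-1, 4, -11, 40, 40, -11, 4, -1] * 2
lanes = pmaddubsw(pixels, taps)
filtered = sum(lanes[:4])  # horizontal sum of four lanes = one 8-tap output
```

Each group of four lanes holds the partial sums for one 8-tap filter output, so a horizontal add over those lanes finishes the calculation.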

Intel SSE Instruction [6] The PMADDWD instruction takes two registers of packed signed 16-bit integers as operands (four values per 64-bit MMX register, eight per 128-bit XMM register). It multiplies corresponding 16-bit values and adds each adjacent pair of products, producing packed signed 32-bit results. Fig. 6: SSE instruction structure [6]
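
For comparison, here is a lane-by-lane Python model of the multiply-and-add-adjacent-pairs operation on signed 16-bit words, which is the behaviour of the x86 instruction PMADDWD. The function name and the example of squaring differences with it are illustrative assumptions, not taken from the slides.

```python
def wrap32(v):
    """Wrap to the signed 32-bit range (PMADDWD does not saturate)."""
    return (v + 2**31) % 2**32 - 2**31


def pmaddwd(a_s16, b_s16):
    """Model of PMADDWD: multiply signed 16-bit lanes and add adjacent pairs.

    Eight input lanes per operand model a 128-bit register and yield
    four signed 32-bit results.
    """
    assert len(a_s16) == len(b_s16) and len(a_s16) % 2 == 0
    assert all(-32768 <= v <= 32767 for v in list(a_s16) + list(b_s16))
    return [wrap32(a_s16[i] * b_s16[i] + a_s16[i + 1] * b_s16[i + 1])
            for i in range(0, len(a_s16), 2)]


# Feeding the same differences to both operands gives pairwise sums of squares.
diffs = [1, -2, 3, -4, 5, -6, 7, -8]
pair_squares = pmaddwd(diffs, diffs)
ssd = sum(pair_squares)
```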

Calculating Motion Vectors[7] Fig. 7 : Luminance and chrominance row interpolation [7]

Fig. 8 Hadamard transform algorithm

Fig. 9 Instruction structure for hadamard transform calculation
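
A scalar Python sketch of a 4x4 Hadamard-based SATD is shown below as a reference for what the SIMD version has to compute. The function names are hypothetical, the result is left unnormalised, and encoders commonly apply a scaling factor that is omitted here.

```python
# 4x4 Hadamard matrix; it is symmetric, so it equals its own transpose.
H4 = [[1, 1, 1, 1],
      [1, -1, 1, -1],
      [1, 1, -1, -1],
      [1, -1, -1, 1]]


def matmul4(a, b):
    """Plain 4x4 integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def satd_4x4(cur, ref):
    """Sum of absolute values of H4 * (cur - ref) * H4 (unnormalised)."""
    diff = [[c - r for c, r in zip(row_c, row_r)]
            for row_c, row_r in zip(cur, ref)]
    transformed = matmul4(matmul4(H4, diff), H4)
    return sum(abs(v) for row in transformed for v in row)


flat_cur = [[11] * 4 for _ in range(4)]
flat_ref = [[10] * 4 for _ in range(4)]
```

A constant difference of 1 puts all of its energy into the DC coefficient, so the unnormalised SATD is 16.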

SAD/SSD Calculation [2] Fig. 10 Instruction structure for SAD/SSD calculation
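
The slide does not name the instructions used for SAD and SSD. As an assumption, the sketch below models PSADBW, the x86 instruction that sums absolute byte differences, together with a scalar SSD for one row of samples. Both are Python reference models rather than SIMD code.

```python
def psadbw(a_u8, b_u8):
    """Model of 128-bit PSADBW: one SAD per group of eight unsigned bytes.

    Returns two sums, one for bytes 0..7 and one for bytes 8..15.
    """
    assert len(a_u8) == len(b_u8) == 16
    return [sum(abs(x - y) for x, y in zip(a_u8[i:i + 8], b_u8[i:i + 8]))
            for i in (0, 8)]


def ssd_row(a, b):
    """Sum of squared differences between two rows of samples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


row_sad = sum(psadbw(list(range(16)), [0] * 16))  # SAD of a 16-sample row
```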

Experimental Configuration
IntraPeriod : 32 # Period of I-frame
GOPSize : 8 # GOP size
QP : 32 # Quantization parameter
FramesToBeEncoded : 100 # Number of frames to be coded
FrameRate : 60 # Frame rate per second
Number of frames : 100 # Frames used per sequence
Test platform: Intel Core i5, Windows 8, 8 GB RAM

Test sequences [8]
BQSquare_416x240_60.yuv
BQMall_832x480_60.yuv
BQTerrace_1920x1080_60.yuv
Fig. 11: Test sequences

PSNR Fig 12: PSNR comparison
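
For reference, PSNR for 8-bit video is usually computed per frame from the mean squared error as 10·log10(255² / MSE). The Python sketch below assumes that definition; the function name is hypothetical, and the slides do not state how the reported PSNR values were averaged over frames or colour components.

```python
import math


def psnr(orig, recon, peak=255):
    """PSNR in dB between two equally sized 2-D frames of samples."""
    count = 0
    squared_error = 0
    for row_o, row_r in zip(orig, recon):
        for o, r in zip(row_o, row_r):
            squared_error += (o - r) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak * peak / mse)


same = psnr([[10, 20], [30, 40]], [[10, 20], [30, 40]])
off_by_one = psnr([[10, 20], [30, 40]], [[11, 21], [31, 41]])
```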

Bit Rate Fig 13: Bitrate comparison

Time Fig 14: Time comparison

Comparison using BD-PSNR Fig 15: BD-PSNR Comparison

Comparison using BD-Bitrate Fig 16: BD-Rate Comparison

R-D Plot Fig 17: R-D Plot

Conclusion Implementing SIMD in several blocks of the HEVC encoder, as proposed, significantly reduces the encoding time of the test sequences without materially affecting bitrate or video quality. The reduction comes from optimizing motion vector calculation, the Hadamard transform and SAD/SSD calculation, which are the most time-consuming blocks of the encoder. The PSNR comparison shows a reduction of about 0.5 dB, which is tolerable, and the bitrate of the optimized encoder is consistent with that of the original encoder.

Future Work SIMD optimization can be extended to the integer transform and RDOQ calculations. In addition, the HEVC code could be parallelized on a GPU.

Acronyms
AVC: Advanced Video Coding
CABAC: Context-Adaptive Binary Arithmetic Coding
CB: Coding Block
CTB: Coding Tree Block
CTU: Coding Tree Unit
CU: Coding Unit
GPU: Graphics Processing Unit
HEVC: High Efficiency Video Coding
JCT-VC: Joint Collaborative Team on Video Coding
MC: Motion Compensation
ME: Motion Estimation
MOS: Mean Opinion Score
PB: Prediction Block

Acronyms
PU: Prediction Unit
RDOQ: Rate Distortion Optimized Quantization
SAD: Sum of Absolute Differences
SAO: Sample Adaptive Offset
SATD: Sum of Absolute Transformed Differences
SIMD: Single Instruction Multiple Data
SSD: Sum of Squared Differences
SSE: Streaming SIMD Extensions
TB: Transform Block
TU: Transform Unit

References
[1] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, “Overview of the High Efficiency Video Coding (HEVC) standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1648–1667, Dec. 2012.
[2] K. Chen, Y. Duan, L. Yan, J. Sun, and Z. Guo, “Efficient SIMD Optimization of HEVC Encoder over X86 Processors,” Institute of Computer Science and Technology, Peking University, Beijing 100871, China.
[3] JCT-VC, “HM6: High Efficiency Video Coding (HEVC) Test Model 6 Encoder Description,” JCTVC-H1002, Feb. 2012.
[4] D. Marpe et al., “Video compression using nested quadtree structures, leaf merging, and improved techniques for motion representation and entropy coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 12, pp. 1676–1687, Dec. 2010.
[5] Explanation of block partition: http://codesequoia.wordpress.com/2012/10/28/hevc-ctu-cu-ctb-cb-pb-and-tb/

References
[6] Intel Corp., Intel® 64 and IA-32 Architectures Software Developer's Manual: http://download.intel.com/products/processor/manual/325383.pdf
[7] L. Yan, Y. Duan, J. Sun, and Z. Guo, “Implementation of HEVC decoder on x86 processors with SIMD optimization,” VCIP, pp. 1–6, Nov. 2012.
[8] Test sequences: ftp://ftp.tnt.uni-hannover.de/testsequences
[9] HM9.2 software: https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/tags/HM-9.2rc1/
[10] BD-Rate and BD-PSNR calculation: http://wftp3.itu.int/av-arch/video-site
[11] SIMD implementation sample: http://sci.tuomastonteri.fi/programming/sse/example1

THANK YOU