EE 5359 MULTIMEDIA PROCESSING, FINAL PRESENTATION, SPRING 2016
STUDY AND PERFORMANCE ANALYSIS OF HEVC, H.264/AVC AND DIRAC
By ASHRITA MANDALAPU, 1001096980, email@example.com
Under the guidance of Dr. K. R. RAO
ACRONYMS
AVC: Advanced Video Coding
ASO: Arbitrary Slice Order
BBC: British Broadcasting Corporation
BD-BR: Bjontegaard Delta Bit Rate
BD-PSNR: Bjontegaard Delta Peak Signal to Noise Ratio
CABAC: Context Adaptive Binary Arithmetic Coding
CAVLC: Context Adaptive Variable Length Coding
CIF: Common Intermediate Format
CODEC: Coder and Decoder
CTB: Coding Tree Block
CTU: Coding Tree Unit
CU: Coding Unit
DVB: Digital Video Broadcasting
EBU: European Broadcasting Union
FMO: Flexible Macroblock Ordering
fps: Frames per second
HD: High Definition
HDTV: High Definition Television
HEVC: High Efficiency Video Coding
HM: HEVC Test Model
ICME: International Conference on Multimedia and Expo
IEC: International Electrotechnical Commission
ISDB: Integrated Services Digital Broadcasting
ISO: International Organization for Standardization
ITU-T: International Telecommunication Union, Telecommunication Standardization Sector
JCT: Joint Collaborative Team
JCT-VC: Joint Collaborative Team on Video Coding
JM: Joint Model (H.264/AVC reference software)
JPEG: Joint Photographic Experts Group
MBAFF: Macroblock Adaptive Frame-Field
MC: Motion Compensation
ME: Motion Estimation
MPEG: Moving Picture Experts Group
MSE: Mean Square Error
MSU: Moscow State University
PB: Prediction Block
PSNR: Peak Signal to Noise Ratio
QCIF: Quarter Common Intermediate Format
QF: Quality Factor
QP: Quantization Parameter
RTP: Real-time Transport Protocol
SSIM: Structural Similarity Index
TB: Transform Block
TU: Transform Unit
VCEG: Video Coding Experts Group
VQMT: Video Quality Measurement Tool
OBJECTIVE
The objective of this project is to study, implement and compare the HEVC, H.264/AVC and Dirac video codecs. The analysis covers complexity, video quality, bit rate and compression ratio, using the performance metrics MSE, PSNR, BD-BR, BD-PSNR and SSIM. The advantages and disadvantages of the baseline profiles of these codecs are also studied to better understand the standards.
Color Spaces
The common color spaces for digital image and video representation are:
1. RGB color space: each pixel is represented by three numbers indicating the relative proportions of red, green and blue.
2. YCrCb color space: Y is the luminance component, a monochrome version of the color image. Y is a weighted average of R, G and B: $Y = k_r R + k_g G + k_b B$, where $k_r$, $k_g$ and $k_b$ are the weighting factors. The color information is represented as color differences, or chrominance components, where each chrominance component is the difference between R, G or B and the luminance Y.
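The luminance and color-difference definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the conversion used by any of the codecs in this study: the weights are the ITU-R BT.601 values, which are an assumption because the slide only says "weighting factors", and the function name `rgb_to_ycbcr` and the scaling of the chroma components to the range -0.5..0.5 are illustrative choices.

```python
# Assumed ITU-R BT.601 luma weights; the slide only calls them "weighting factors".
KR, KG, KB = 0.299, 0.587, 0.114


def rgb_to_ycbcr(r, g, b):
    """Convert normalized RGB (0..1) to Y (0..1) and Cb, Cr (-0.5..0.5)."""
    # Luminance: weighted average of R, G and B.
    y = KR * r + KG * g + KB * b
    # Chrominance: scaled differences between B (or R) and the luminance.
    cb = (b - y) / (2.0 * (1.0 - KB))
    cr = (r - y) / (2.0 * (1.0 - KR))
    return y, cb, cr


if __name__ == "__main__":
    for name, rgb in [("white", (1, 1, 1)), ("red", (1, 0, 0)), ("blue", (0, 0, 1))]:
        print(name, rgb_to_ycbcr(*rgb))
```

A gray pixel (R = G = B) has zero chrominance, which is why the Y plane alone is a monochrome version of the picture.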
The popular patterns of sub-sampling are:
4:4:4 - The three components Y, Cr and Cb have the same resolution; that is, for every 4 luminance samples there are 4 Cr and 4 Cb samples.
4:2:2 - For every 4 luminance samples in the horizontal direction, there are 2 Cr and 2 Cb samples. This representation is used for high-quality color reproduction.
Figure 1. 4:2:2 and 4:4:4 sub-sampling patterns
4:2:0 - Cr and Cb each have half the horizontal and half the vertical resolution of Y. This pattern is popular in applications such as video conferencing, digital television and DVD storage.
Figure 2. 4:2:0 sub-sampling pattern
CIF and QCIF Formats
1. Common Intermediate Format (CIF) and Quarter Common Intermediate Format (QCIF) define the resolution of the frame. The resolution of CIF is 352x288; QCIF has one quarter of the CIF area, 176x144.
2. In the YCbCr family of color spaces, Y represents the luminance, Cb the blue-difference chroma component and Cr the red-difference chroma component.
3. For both CIF and QCIF, the luminance plane Y has the full frame resolution. With 4:2:0 sampling, the Cb and Cr planes are each 176x144 samples for CIF and 88x72 samples for QCIF.
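The plane sizes quoted above follow directly from the sub-sampling ratios, and the arithmetic can be checked with a short sketch. The helper names `plane_sizes` and `frame_bytes` are hypothetical, and 8 bits per sample is an assumption (it is the usual depth for raw .yuv test sequences, but the slides do not state it).

```python
# Chroma divisors (horizontal, vertical) for each sub-sampling pattern.
CHROMA_DIVISORS = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}


def plane_sizes(width, height, chroma_format="4:2:0"):
    """Return ((luma_w, luma_h), (chroma_w, chroma_h)) for one frame."""
    div_x, div_y = CHROMA_DIVISORS[chroma_format]
    return (width, height), (width // div_x, height // div_y)


def frame_bytes(width, height, chroma_format="4:2:0", bits_per_sample=8):
    """Size of one raw frame: one luma plane plus two chroma planes."""
    (lw, lh), (cw, ch) = plane_sizes(width, height, chroma_format)
    samples = lw * lh + 2 * cw * ch
    return samples * bits_per_sample // 8


if __name__ == "__main__":
    for name, (w, h) in [("CIF", (352, 288)), ("QCIF", (176, 144))]:
        print(name, plane_sizes(w, h), frame_bytes(w, h), "bytes per frame")
```

With 4:2:0 sampling the two chroma planes together add only half as many samples as the luma plane, so a raw frame needs 1.5 bytes per pixel instead of 3.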
HEVC
HEVC is the successor to the H.264/AVC video coding standard. It achieves roughly twice the compression of H.264/AVC at comparable video quality. HEVC is intended to provide a flexible, reliable and robust solution to support the next decade of video. HEVC benefits include:
1. Reduced burden on global networks
2. Easier streaming of HD video to mobile devices
3. Support for advancing screen resolutions (e.g. Ultra-HD)
Encoder
The video encoder performs the following steps:
1. Partitioning each picture into multiple units
2. Predicting each unit using inter or intra prediction, and subtracting the prediction from the unit
3. Transforming and quantizing the residual (the difference between the original picture unit and the prediction)
4. Entropy encoding the transform output, prediction information, mode information and headers
Decoder
The video decoder performs the following steps:
1. Entropy decoding and extracting the elements of the coded sequence
2. Rescaling and inverting the transform stage
3. Predicting each unit and adding the prediction to the output of the inverse transform
4. Reconstructing a decoded video image
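The encoder and decoder steps listed above can be illustrated on a single one-dimensional block. The sketch below is a deliberate simplification and not the HEVC algorithm: it uses a floating-point DCT where HEVC uses integer transforms, a plain uniform quantizer with a hypothetical `step` parameter in place of QP-driven scaling, a prediction supplied by the caller, and no entropy coding. All function names are illustrative.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n)))
    return out


def idct(c):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out


def encode_block(block, prediction, step):
    """Subtract the prediction, transform the residual, quantize."""
    residual = [b - p for b, p in zip(block, prediction)]
    return [round(c / step) for c in dct(residual)]  # levels for the entropy coder


def decode_block(levels, prediction, step):
    """Rescale, inverse transform, add the prediction back."""
    residual = idct([level * step for level in levels])
    return [p + r for p, r in zip(prediction, residual)]


if __name__ == "__main__":
    block, prediction = [52, 55, 61, 66], [50, 50, 60, 60]
    for step in (0.5, 2.0, 8.0):
        levels = encode_block(block, prediction, step)
        print(step, levels, [round(v, 2) for v in decode_block(levels, prediction, step)])
```

Larger steps produce smaller levels (fewer bits after entropy coding) and a larger reconstruction error, which is the same trade-off that QP controls in the real codecs.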
Figure5. Block diagram of HEVC encoder with built-in decoder (gray shaded region) .
H.264/AVC
H.264/AVC is an efficient video compression standard developed through the collaboration between the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG). It is the most widely used video coding standard for streaming video, mobile/handheld applications, HDTV broadcasting, etc. The H.264 standard supports three sampling patterns (4:4:4, 4:2:2 and 4:2:0) for the luminance component (Y), the red-difference chroma component (Cr) and the blue-difference chroma component (Cb).
Encoder and Decoder
An H.264 encoder converts raw video into a compressed bit stream, and the decoder converts the compressed bit stream back into uncompressed video. Because the compression is generally lossy, the decoded video approximates the original rather than reproducing it exactly. The H.264 encoder block diagram is shown in Figure 6. The encoder performs prediction, transform, quantization and entropy encoding to produce the compressed video. The decoder, shown in Figure 7, performs the inverse operations to obtain the uncompressed video.
H.264 Standard Profiles
The standard defines sets of capabilities, referred to as profiles, each targeting a specific class of applications. A stream declares its profile through a profile code (profile_idc) and a set of constraints applied in the encoder, which allows a decoder to recognize the requirements for decoding that stream. The H.264 standard profiles are shown in Figure 8.
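As an illustration of how a decoder can recognize the profile, the sketch below reads profile_idc from the start of a sequence parameter set (SPS) NAL unit. The profile_idc values and the byte layout (NAL header, profile_idc, constraint flags, level_idc) are taken from the H.264 specification as commonly documented rather than from these slides; the function name `parse_sps_profile` and the sample bytes are hypothetical.

```python
# profile_idc values for the profiles discussed in this presentation.
PROFILE_NAMES = {
    66: "Baseline",
    77: "Main",
    88: "Extended",
    100: "High",
    110: "High 10",
    122: "High 4:2:2",
}

SPS_NAL_UNIT_TYPE = 7  # nal_unit_type of a sequence parameter set


def parse_sps_profile(nal):
    """Return (profile name, constraint flags byte, level_idc) from an SPS NAL unit.

    `nal` is the NAL unit without its start code: byte 0 is the NAL header,
    byte 1 is profile_idc, byte 2 holds the constraint_set flags and
    byte 3 is level_idc.
    """
    if len(nal) < 4:
        raise ValueError("SPS NAL unit is too short")
    if nal[0] & 0x1F != SPS_NAL_UNIT_TYPE:
        raise ValueError("not a sequence parameter set")
    profile_idc, constraint_flags, level_idc = nal[1], nal[2], nal[3]
    return PROFILE_NAMES.get(profile_idc, "unknown"), constraint_flags, level_idc


if __name__ == "__main__":
    # Hypothetical SPS header bytes: profile_idc 66, level_idc 30 (level 3.0).
    print(parse_sps_profile(bytes([0x67, 0x42, 0xC0, 0x1E])))
```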
Baseline Profile (BP)
Primarily for low-cost applications that require additional robustness to data loss, this profile is used in some video conferencing and mobile applications. Apart from the common features, the Baseline profile includes error resilience tools such as flexible macroblock ordering (FMO), arbitrary slice order (ASO) and redundant slices. It was designed for low-delay applications, for platforms with low processing power and for environments with high packet loss. Among the three profiles (Baseline, Extended and Main), it offers the lowest coding efficiency.
Extended Profile (XP)
Intended as the streaming video profile, this profile has relatively high compression capability and additional tools for robustness to data losses and for server stream switching. The Extended profile is a superset of the Baseline profile. Besides the tools of the Baseline profile, it includes B-, SP- and SI-slices, data partitioning and interlace coding tools. SP and SI slices are specially coded P and I slices, respectively, which are used for efficient switching between different bit rates in some streaming applications. The Extended profile does not, however, include context adaptive binary arithmetic coding (CABAC). Compared with the Baseline profile, it is more complex but provides better coding efficiency.
Main Profile (MP)
In addition to the common features, the Main profile includes tools such as CABAC for entropy coding and B-slices. It does not include error resilience tools such as FMO. Like the Extended profile, it contains interlaced coding tools. The Main profile is used in broadcast television and in high-resolution video storage and playback. It is not, however, used for high-definition television broadcasts, because its importance faded when the High Profile was developed in 2004 for that application.
The High profiles are supersets of the Main profile. They are used for applications such as content contribution, content distribution, studio editing and post-processing.
High Profile (HiP): The primary profile for broadcast and disc storage applications, particularly high-definition television. It adds tools such as adaptive transform block size and quantization scaling matrices.
High 10 Profile (Hi10P): This profile builds on the High Profile, adding support for up to 10 bits per sample of decoded picture precision.
High 4:2:2 Profile (Hi422P): Primarily targeting professional applications that use interlaced video, this profile builds on the High 10 Profile, adding support for the 4:2:2 chroma sub-sampling format while using up to 10 bits per sample of decoded picture precision.
DIRAC
Dirac is a video compression system developed by the British Broadcasting Corporation (BBC). It uses motion compensation and wavelet transforms to provide high-quality video compression for web streaming and HDTV applications. Dirac can compress pictures of any size, from low-resolution QCIF (176 x 144) to HDTV (1920 x 1080). The Dirac development process has two parts:
(i) a compression specification for the bit stream and decoder, and
(ii) software for compression and decompression.
Encoder and Decoder
In the Dirac codec, image motion is tracked and the motion information is used to predict a later frame. A wavelet transform is applied to the motion-compensated prediction error (the difference between the current frame and its prediction from the previous frame), and the transform coefficients are quantized and entropy coded. Temporal redundancy is removed by motion estimation and motion compensation, and spatial redundancy is removed by the discrete wavelet transform. Dirac uses arithmetic coding, a flexible and efficient form of entropy coding that packs the bits efficiently into the bit stream. The encoder architecture is shown in Figure 9, and the decoder, which performs the inverse operations, is shown in Figure 10.
Wavelet Transform
The 2D discrete wavelet transform gives Dirac the flexibility to operate at a range of resolutions. For two-dimensional images, wavelet filters are normally applied in both the vertical and horizontal directions to each image component, producing four so-called sub-bands termed Low-Low (LL), Low-High (LH), High-Low (HL) and High-High (HH). For a multi-level decomposition, only the LL band is iteratively decomposed further.
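The sub-band structure described above can be sketched with the simplest wavelet, the Haar filter pair (average and difference of neighboring samples). This is an illustrative stand-in: Dirac itself uses other wavelet filters, and the function names as well as the naming convention used here (first letter for the horizontal filter, second for the vertical filter) are assumptions. Image dimensions are assumed to be divisible by 2 at every level.

```python
def haar_1d(values):
    """Split a sequence into low-pass (averages) and high-pass (differences) halves."""
    half = len(values) // 2
    low = [(values[2 * i] + values[2 * i + 1]) / 2.0 for i in range(half)]
    high = [(values[2 * i] - values[2 * i + 1]) / 2.0 for i in range(half)]
    return low, high


def filter_columns(matrix):
    """Apply haar_1d down every column; return (low, high) matrices."""
    lows, highs = [], []
    for column in zip(*matrix):
        low, high = haar_1d(list(column))
        lows.append(low)
        highs.append(high)
    # Transpose back so that the results are lists of rows again.
    return [list(r) for r in zip(*lows)], [list(r) for r in zip(*highs)]


def haar_2d(image):
    """One decomposition level: returns the LL, LH, HL and HH sub-bands."""
    row_low, row_high = [], []
    for row in image:  # horizontal filtering
        low, high = haar_1d(row)
        row_low.append(low)
        row_high.append(high)
    ll, lh = filter_columns(row_low)   # vertical filtering of the low half
    hl, hh = filter_columns(row_high)  # vertical filtering of the high half
    return ll, lh, hl, hh


def decompose(image, levels):
    """Multi-level transform: only the LL band is decomposed again."""
    detail_bands = []
    ll = image
    for _ in range(levels):
        ll, lh, hl, hh = haar_2d(ll)
        detail_bands.append((lh, hl, hh))
    return ll, detail_bands


if __name__ == "__main__":
    image = [[8, 8, 4, 4], [8, 8, 4, 4], [2, 2, 6, 6], [2, 2, 6, 6]]
    ll, details = decompose(image, 2)
    print("final LL:", ll)
    print("level-1 detail bands:", details[0])
```

Each level halves the width and height of the LL band, which is how the transform exposes the picture at a range of resolutions.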
COMPARISON METRICS
Structural Similarity Index Metric (SSIM): This index measures the similarity between two frames. It is a full-reference metric; that is, image quality is measured using an initial uncompressed or distortion-free frame as the reference.
Mean Squared Error (MSE): The MSE is computed by averaging the squared intensity differences between the distorted and reference image/frame pixels. Two distorted images with the same MSE may have very different types of errors, some of which are much more visible than others. Given a noise-free m x n monochrome image I and its noisy approximation K, MSE is defined as:
$$\mathrm{MSE} = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left[ I(i,j) - K(i,j) \right]^2$$
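Peak Signal-to-Noise Ratio (PSNR): The PSNR is most commonly used as a measure of the reconstruction quality of compression codecs. The signal in this case is the original data, and the noise is the error introduced by compression. PSNR is defined as:
$$\mathrm{PSNR} = 10 \log_{10}\left( \frac{MAX_I^2}{\mathrm{MSE}} \right)$$
where $MAX_I$ is the maximum possible pixel value of the image (255 for 8-bit samples). For test sequences in 4:2:0 color format, PSNR is computed as a weighted average of the luminance (PSNR-Y) and chrominance (PSNR-U, PSNR-V) components. A weighting commonly used for this purpose is:
$$\mathrm{PSNR}_{avg} = \frac{6\,\mathrm{PSNR}_Y + \mathrm{PSNR}_U + \mathrm{PSNR}_V}{8}$$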
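The MSE and PSNR definitions can be written out directly. The sketch below is a minimal illustration that treats a frame as a list of rows of sample values; it is not the implementation used by the MSU VQMT tool. The 6:1:1 weighting in `weighted_psnr` is an assumption (it is a weighting commonly used for 4:2:0 material), and all function names are illustrative.

```python
import math


def mse(reference, distorted):
    """Mean of the squared sample differences between two equally sized frames."""
    total = 0.0
    count = 0
    for ref_row, dis_row in zip(reference, distorted):
        for ref, dis in zip(ref_row, dis_row):
            total += (ref - dis) ** 2
            count += 1
    return total / count


def psnr(reference, distorted, max_value=255):
    """PSNR in dB; max_value is the largest possible sample value."""
    error = mse(reference, distorted)
    if error == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / error)


def weighted_psnr(psnr_y, psnr_u, psnr_v):
    """Combined PSNR for 4:2:0 video, assuming 6:1:1 weights for Y, U and V."""
    return (6.0 * psnr_y + psnr_u + psnr_v) / 8.0


if __name__ == "__main__":
    reference = [[0, 0], [0, 0]]
    distorted = [[0, 0], [0, 2]]
    print("MSE:", mse(reference, distorted))
    print("PSNR (dB):", psnr(reference, distorted))
```

Because PSNR is a logarithm of the inverse MSE, the two metrics always move in opposite directions as the bit rate changes.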
Bjontegaard Delta Bitrate (BD-BR) and Bjontegaard Delta PSNR (BD-PSNR): BD-PSNR was proposed to objectively evaluate the coding efficiency of video codecs. Based on rate-distortion (R-D) curve fitting, BD-PSNR provides a good evaluation of R-D performance. The BD metrics give the average gain in PSNR, or the average percent saving in bit rate, between two rate-distortion curves. However, BD-PSNR has a critical drawback: it does not take coding complexity into account.
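A minimal sketch of the BD-PSNR calculation is given below, under stated assumptions: each codec contributes exactly four rate-distortion points, a cubic polynomial is fitted through them with PSNR as a function of log10(bit rate), and the difference between the two fitted curves is averaged over the overlapping bit-rate range. The function names are hypothetical and the sample values are made up for illustration; they are not results from this project.

```python
import math


def solve_linear(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_cubic(xs, ys):
    """Coefficients c0..c3 of the cubic through four (x, y) points."""
    return solve_linear([[x ** p for p in range(4)] for x in xs], ys)


def integrate(coeffs, lo, hi):
    """Definite integral of the polynomial with the given coefficients."""
    def antiderivative(x):
        return sum(c * x ** (p + 1) / (p + 1) for p, c in enumerate(coeffs))
    return antiderivative(hi) - antiderivative(lo)


def bd_psnr(rates_ref, psnr_ref, rates_test, psnr_test):
    """Average PSNR difference (test minus reference) in dB."""
    log_ref = [math.log10(r) for r in rates_ref]
    log_test = [math.log10(r) for r in rates_test]
    lo = max(min(log_ref), min(log_test))
    hi = min(max(log_ref), max(log_test))
    area_ref = integrate(fit_cubic(log_ref, psnr_ref), lo, hi)
    area_test = integrate(fit_cubic(log_test, psnr_test), lo, hi)
    return (area_test - area_ref) / (hi - lo)


if __name__ == "__main__":
    rates = [100.0, 200.0, 400.0, 800.0]   # kbit/s, made-up values
    reference = [30.0, 33.0, 36.0, 38.0]   # dB, made-up values
    test = [31.0, 34.0, 37.0, 39.0]        # dB, made-up values
    print("BD-PSNR (dB):", bd_psnr(rates, reference, rates, test))
```

BD-BR follows the same idea with the axes swapped: log bit rate is fitted as a function of PSNR and the averaged difference is converted back to a percentage.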
PERFORMANCE ANALYSIS
The performance analysis in this project uses the following setup:
Codec software: HM 16.8 (HEVC), JM 19.0 (H.264/AVC) and Dirac 0.2.0
Test sequences: CIF and QCIF formats
Quality metrics: MSE, PSNR, SSIM, BD-BR and BD-PSNR
Measurement tool: MSU Video Quality Measurement Tool (VQMT)
Testing platform: Intel Core i5-4210U processor at 1.7 GHz, 4 GB memory, 64-bit Windows 8.1
The video codecs are analyzed at various QP (Quantization Parameter) values, or QF (Quality Factor) values for Dirac, and the quality metrics are measured.
Test Sequence 1: container_cif.yuv
Width: 352; Height: 288
Total number of frames: 300
Figure 12. Original frame of the container_cif.yuv test sequence
HM 16.8
Number of frames used: 100
Frame rate: 30 frames/second
Figure 13. Decoded frames for (a) QP=0 (b) QP=30 (c) QP=50
Figure 13 shows the decoded frames of the container_cif.yuv test sequence for QP=0, QP=30 and QP=50. Table 1 shows the implementation results for the container_cif.yuv test sequence using HM 16.8. Figure 14 plots PSNR vs. bitrate and MSE vs. bitrate using the values from Table 1.
Figure 14. Graphical representation of (a) PSNR vs. bitrate and (b) MSE vs. bitrate for the container_cif.yuv test sequence
JM 19.0
Number of frames used: 100
Frame rate: 30 frames/second
Figure 15. Decoded frames for (a) QP=0 (b) QP=30 (c) QP=50
Figure 15 shows the decoded frames of the container_cif.yuv test sequence for QP=0, QP=30 and QP=50. Table 2 shows the implementation results for the container_cif.yuv test sequence using JM 19.0. Figure 16 plots PSNR vs. bitrate and MSE vs. bitrate using the values from Table 2.
Figure 16. Graphical representation of (a) PSNR vs. bitrate and (b) MSE vs. bitrate for the container_cif.yuv test sequence
Dirac 0.2.0
Number of frames used: 100
Frame rate: 12.5 frames/second
Figure 17. Decoded frames for (a) QF=0 (b) QF=8 (c) QF=12
Figure 17 shows the decoded frames of the container_cif.yuv test sequence for QF=0, QF=8 and QF=12. Table 3 shows the implementation results for the container_cif.yuv test sequence using Dirac 0.2.0. Figure 18 plots PSNR vs. bitrate and MSE vs. bitrate using the values from Table 3.
Figure 18. Graphical representation of (a) PSNR vs. bitrate and (b) MSE vs. bitrate for the container_cif.yuv test sequence
Results
Computational complexity is not reported in these results. The performance comparison of HEVC, H.264/AVC and Dirac for the container_cif.yuv test sequence is shown in Figure 19. The corresponding comparisons for the container_qcif.yuv, coastguard_cif.yuv and coastguard_qcif.yuv test sequences are shown in Figures 20, 21 and 22 respectively.
Figure 19. Performance comparison of HEVC, H.264/AVC and Dirac for the container_cif.yuv test sequence
Figure20. Performance comparison of HEVC, H.264/AVC and Dirac for container_qcif.yuv test sequence
Figure21. Performance comparison of HEVC, H.264/AVC and Dirac for coastguard_cif.yuv test sequence
Figure22. Performance comparison of HEVC, H.264/AVC and Dirac for coastguard_qcif.yuv test sequence
CONCLUSION
The results indicate that the HEVC standard provides better performance and quality than the previous standard, H.264/AVC. HEVC, Dirac and H.264/AVC maintain near-constant quality at low bit rates, which is beneficial for applications such as video streaming. The syntax and coding structures of the three codecs were explained, and the base encoder optimization was described. PSNR and SSIM increase as the bit rate increases, whereas MSE decreases. The bit rate is varied by changing the QP for HEVC and H.264/AVC and the QF for Dirac.
REFERENCES
G. J. Sullivan et al., "Overview of the high efficiency video coding (HEVC) standard", IEEE Trans. on Circuits and Systems for Video Technology (CSVT), vol. 22, pp. 1649-1668, Dec. 2012.
T. Wiegand, G. Sullivan, G. Bjontegaard and A. Luthra, "Overview of the H.264/AVC video coding standard", IEEE Trans. on Circuits and Systems for Video Technology (CSVT), vol. 13, pp. 560-576, July 2003.
"The Dirac web page": http://www.bbc.co.uk/rd/projects/dirac/intro.shtml
G. J. Sullivan et al., "Standardized Extensions of High Efficiency Video Coding (HEVC)", IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 6, pp. 1001-1016, Dec. 2013.
V. Sze, M. Budagavi and G. J. Sullivan (Editors), "High efficiency video coding: Algorithms and architectures", Springer, 2014.
S. Wenger et al., "RFC 3984: RTP Payload Format for H.264 Video", p. 2.
S. K. Kwon, A. Tamhankar and K. R. Rao, "Overview of H.264/MPEG-4 Part 10", J. VCIR, vol. 17, pp. 186-216, April 2006, Special Issue on "Emerging H.264/AVC video coding standard".
K. R. Rao and D. N. Kim, "Current Video Coding Standards: H.264/AVC, Dirac, AVS China and VC-1", IEEE 42nd Southeastern Symposium on System Theory (SSST), pp. 1-8, March 7-9, 2010.
T. Borer and T. Davies, "Dirac video compression using open technology", BBC EBU Technical Review, July 2005.
A. Ravi and K. R. Rao, "Performance Analysis and Comparison of the Dirac video codec with H.264/MPEG-4 part 10 AVC", International Journal of Wavelets, Multiresolution and Information Processing (accepted), January 2010. Available: http://www-ee.uta.edu/Dip/Courses/EE5359/index.html
BBC Research on Dirac: http://www.bbc.co.uk/rd/projects/dirac/index.shtml
Dirac software: http://sourceforge.net/projects/dirac/
Z. Wang and A. C. Bovik, "A universal image quality index", IEEE Signal Processing Letters, vol. 9, pp. 81-84, March 2002.
Z. Wang et al., "Image Quality Assessment: From Error Visibility to Structural Similarity", IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, April 2004.
HEVC software: https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/tags/HM-16.8
AVC software: http://iphome.hhi.de/suehring/tml/download/jm19.0.zip
"Video coding for low bit rate communications", ITU-T Recommendation H.263, ver. 1, 1995.
T. Wiegand and G. J. Sullivan, "The picturephone is here. Really", IEEE Spectrum, vol. 48, pp. 50-54, Sept. 2011.
YUV video sequences: http://trace.eas.asu.edu/yuv/
I. E. G. Richardson, "The H.264 advanced video coding standard", Second Edition, Wiley, 2010.
MSU tool: http://compression.ru/video/quality_measure/video_measurement_tool_en.html
I. E. G. Richardson, "Video Codec Design: Developing Image and Video Compression Systems", Wiley, 2002.
J. Vanne et al., "Comparative Rate-Distortion-Complexity Analysis of HEVC and AVC Video Codecs", IEEE Transactions on Circuits and Systems for Video Technology (CSVT), vol. 22, no. 12, pp. 1885-1898, Dec. 2012.
X. Li et al., "Rate-complexity-distortion evaluation for hybrid video coding", IEEE International Conference on Multimedia and Expo (ICME), pp. 685-690, July 2010.
G. Bjontegaard, "Calculation of Average PSNR Differences between RD Curves", document VCEG-M33, ITU-T SG 16/Q 6, Austin, TX, Apr. 2001.
HEVC Software Reference Manual: https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/branches/HM-16.8-dev/doc/software-manual.pdf
D. Grois et al., "HEVC/H.265 Video Coding Standard (Version 2) including the Range Extensions, Scalable Extensions, and Multiview Extensions", Tutorial, Sunday 27 Sept. 2015, 9:00 am to 12:30 pm, IEEE ICIP, Quebec City, Canada, 27-30 Sept. 2015. This tutorial is for personal use only. Password: a2FazmgNK https://datacloud.hhi.fraunhofer.de/owncloud/public.php?service=files&t=8edc97d26d46d4458a9c1a17964bf881
M. Wien, "HEVC - coding tools and specifications", Tutorial, IEEE ICME, San Jose, CA, July 2013.