Overview of the H.264/AVC Video Coding Standard

Overview of the H.264/AVC Video Coding Standard
T. Wiegand, G. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 7, pp. 560-576, July 2003.
CMPT 820: Multimedia Systems

Outline Overview Network Abstraction Layer (NAL) Video Coding Layer (VCL) Profiles and Applications Feature Highlights Conclusions

Evolution of Video Compression Standards
ITU-T line: H.261 (video telephony), H.263 (video conferencing). MPEG line: MPEG-1 (Video-CD), MPEG-4 Visual (object-based coding). Joint standards: H.262/MPEG-2 (digital TV/DVD) and now H.264 / MPEG-4 AVC.
MPEG-2: an enabling technology for digital television systems, initially designed to extend MPEG-1 and support interlaced video coding. Used for transmission of SD/HD TV signals and storage of SD video on DVDs.
H.263/+/++: conversational/communication applications. Supports diverse networks and adapts to their characteristics (e.g., PSTN, mobile, LAN/Internet), taking loss and error-robustness requirements into account.
MPEG-4 Visual: object-based video coding; provides shape coding.

H.264/AVC Coding Standard
Various applications:
Broadcast: cable, satellite, terrestrial, and DSL
Storage: DVDs (HD DVD and Blu-ray)
Video conferencing: over different networks
Multimedia streaming: live and on-demand
Multimedia Messaging Services (MMS)
Challenge: how to handle all these applications and networks; the answer is flexibility and customizability.
H.264/AVC considers a variety of applications over heterogeneous networks, such as broadcast over a unidirectional communication channel, optical/magnetic media that support sequential playout as well as random access (jumping to arbitrary points), low-bit-rate video conferencing, possibly high-bit-rate video streaming, and digital video mail. These applications often impose different requirements (e.g., high/low bit rates), while the networks have different characteristics, e.g., low/high latency (satellite) and low/high loss rates (wired/wireless).

Structure of H.264/AVC Codec
Layered design:
Network Abstraction Layer (NAL): formats the video and metadata for a variety of networks.
Video Coding Layer (VCL): represents the video content in an efficient way.
(Figure: scope of the H.264 standard.)
To address the heterogeneity of applications and networks, H.264/AVC adopts a layered design, where the VCL handles the compression itself and the NAL handles adaptation to the transport.

Outline Overview Network Abstraction Layer (NAL) Video Coding Layer (VCL) Profiles and Applications Feature Highlights Conclusions

Network Abstraction Layer
Provides network friendliness for sending video data over various network transports, such as:
RTP/IP for Internet applications
MPEG-2 streams for broadcast services
ISO file formats for storage applications
We present a few NAL concepts next. The main objective of the NAL is to enable simple and effective customization of how video data is sent over different network systems.

NAL Units
Packets that carry the video data, with a short packet header (one byte).
Support two types of transport:
stream-oriented: no built-in unit boundaries, so a 3-byte start code prefix is used
packet-oriented: the start code prefix would be a waste, so it is omitted
Can be classified into:
VCL units: data for video pictures
Non-VCL units: metadata and additional info
NAL units are packets that carry video data; they have a very short header, and most of their capacity is used for payload. We next present non-VCL units.
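
As a rough, self-contained sketch of these two ideas, the Python snippet below splits a stream-oriented (Annex B style) byte string on the 3-byte start code prefix 0x000001 and decodes the one-byte NAL header fields. The demo bytes are made up, and real streams may also use a 4-byte 0x00000001 prefix and emulation-prevention bytes, which this sketch ignores.

```python
def split_annexb(stream: bytes):
    """Split an Annex B byte stream on the 3-byte start code prefix 0x000001."""
    units, i = [], 0
    while True:
        start = stream.find(b"\x00\x00\x01", i)
        if start < 0:
            break
        start += 3                              # skip the prefix itself
        end = stream.find(b"\x00\x00\x01", start)
        payload = stream[start:] if end < 0 else stream[start:end]
        units.append(payload.rstrip(b"\x00"))   # drop trailing zero padding
        if end < 0:
            break
        i = end
    return units

def parse_nal_header(nal: bytes):
    """Decode the one-byte NAL unit header."""
    b = nal[0]
    return {
        "forbidden_zero_bit": b >> 7,
        "nal_ref_idc": (b >> 5) & 0x3,   # 0 means "not used as reference"
        "nal_unit_type": b & 0x1F,       # e.g., 5 = IDR slice, 7 = SPS, 8 = PPS
    }

# Toy example: an SPS-like header byte, then an IDR-slice-like header byte.
demo = b"\x00\x00\x01\x67\x42\x00\x1e" + b"\x00\x00\x01\x65\x88"
for unit in split_annexb(demo):
    print(parse_nal_header(unit))
```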

Non-VCL NAL Units
Two types of non-VCL NAL units:
Parameter sets: headers shared by a large number of VCL NAL units. A VCL NAL unit has a pointer to its picture parameter set, and a picture parameter set points to its sequence parameter set.
Supplemental enhancement information (SEI): optional info for higher-quality reconstruction and/or better application usability.
Sent over in-band or out-of-band channels.
Parameter-set units reduce traffic by exploiting the header redundancy among VCL NAL units. There are two types of parameter sets: a picture parameter set contains headers for one or a few pictures, while a sequence parameter set contains (less frequently changing) headers for even more pictures. SEI units are not mandatory and are used to improve playout quality; for example, rate-distortion info can be carried by SEI units. SEI is also quite extensible, and applications can define custom SEI messages. Out-of-band transmission often gives more flexibility, such as sending parameter sets with stronger FEC, possible retransmission, and higher routing priority.

Access Units
A set of NAL units; decoding an access unit results in one picture.
Structure:
Delimiter: for seeking in the byte-stream format
SEI: timing information and other supplemental data that may enhance application usability
Primary coded picture: the set of VCL NAL units that together compose the picture
Redundant coded picture: redundant slices for error recovery
Now we know the building blocks, but how do we put NAL units together to represent one picture? Redundant coded pictures are usually not decoded; only when a decoder fails to decode the primary coded picture will it try to decode the redundant coded picture.

Video Sequences and IDR Frames
Sequence: an independently decodable NAL unit stream (no NALs from other sequences are needed), uses one sequence parameter set, and starts with an instantaneous decoding refresh (IDR) access unit.
IDR frames: random access points. They are intra-coded frames; no picture following an IDR frame will reference pictures prior to the IDR frame, and decoders mark buffered reference pictures as unusable once they see an IDR frame.
Intuitively, aggregating several access units (pictures) gives us a sequence, but a video sequence carries additional requirements in H.264/AVC.
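
A toy sketch of the key IDR behavior, assuming a greatly simplified reference buffer (the nal_unit_type constants follow the standard numbering, but the bookkeeping here is only illustrative): on seeing an IDR unit, the decoder discards all buffered reference pictures, so decoding can restart cleanly from that point.

```python
# Hypothetical, simplified reference-picture handling around IDR units.
IDR_SLICE = 5            # nal_unit_type 5 = coded slice of an IDR picture
NON_IDR_SLICE = 1        # nal_unit_type 1 = coded slice of a non-IDR picture

reference_pictures = []  # stands in for the decoder's reference picture buffer

def on_nal_unit(nal_unit_type, decoded_picture, used_as_reference):
    """Update the reference buffer; an IDR unit invalidates all prior references."""
    global reference_pictures
    if nal_unit_type == IDR_SLICE:
        reference_pictures = []          # random access point: old references unusable
    if used_as_reference:
        reference_pictures.append(decoded_picture)

# Example: two pictures, then an IDR, then one more picture.
on_nal_unit(NON_IDR_SLICE, "pic0", True)
on_nal_unit(NON_IDR_SLICE, "pic1", True)
on_nal_unit(IDR_SLICE, "pic2_idr", True)
on_nal_unit(NON_IDR_SLICE, "pic3", True)
print(reference_pictures)   # ['pic2_idr', 'pic3'] -- nothing before the IDR remains
```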

Outline Overview Network Abstraction Layer (NAL) Video Coding Layer (VCL) Profiles and Applications Feature Highlights Conclusions

Video Coding Layer (VCL)
Hybrid video coding (like other standards): H.264/AVC represents pictures as macroblocks.
Motion compensation exploits temporal redundancy; the transform exploits spatial redundancy.
Small improvements add up to a large gain by combining many coding tools.
Pictures/frames/fields: the top/bottom field contains the even/odd rows of a frame. Interlaced: the two fields were captured at different times. Progressive: otherwise.
Like prior video coding standards, H.264 is a hybrid coder, which uses motion compensation to exploit temporal redundancy and a transform to exploit spatial redundancy. These coding tools are not totally new; many of them were proposed years ago but were thought impractical due to computational complexity. They are acceptable today because of advances in computer hardware. Each picture contains a frame, and each frame consists of two fields.

Macroblocks and Slices
Fixed-size MBs: 16x16 for luma and 8x8 for each chroma component.
Slice: a set of MBs that can be decoded without using other slices.
I slice: intra-prediction only (I MBs)
P slice: at most one inter-prediction signal per block (I and P MBs)
B slice: up to two inter-prediction signals per block (I and B MBs)
SP slice: efficient switching among streams
SI slice: used in conjunction with SP slices
The only reason adjacent slices may be needed is the deblocking filter. The main feature of SP frames is that identical SP frames can be reconstructed even when different reference frames are used for their prediction; SI frames are used in conjunction with SP frames. This property makes them useful for applications such as random access, adaptation to network conditions, and error recovery/resilience. We next talk about the ordering of MBs.

Flexible Macroblock Ordering (FMO)
Without FMO, the MBs of a slice follow raster order; slice groups are more flexible.
Each slice group contains one or several slices.
Possible usages:
Region-of-interest (ROI): several foreground slice groups and one left-over background group
Checkerboard pattern (e.g., for video conferencing): useful for error concealment

Adaptive Field Coding
The two fields of a frame can be coded as:
A single frame (frame mode)
Two separate fields (field mode)
A single frame with adaptive mode (mixed mode)
Picture-adaptive frame/field (PAFF): the frame/field decision is made at the frame level; 16%-20% bit rate reduction over frame-only coding.
Macroblock-adaptive frame/field (MBAFF): the frame/field decision is made at the MB level; 14%-16% bit rate reduction over PAFF; suitable for interlaced, high-motion content.
Why different modes? In interlaced frames, two adjacent rows may have low correlation because the two fields were captured at different time instances. For regions with more movement, field mode is better; for regions with less movement, frame mode is better. This is why MBAFF is useful.
[Ref] http://scien.stanford.edu/2005projects/ee398/projects/presentations/Guerrero Chan Tsang - Project Presentation - Fast Macroblock Adaptive Coding in H264.ppt

Intra-frame Prediction
Performed in the spatial domain, using samples to the left and/or above to predict the samples in an MB.
Types of intra-frame prediction:
Intra_4x4: detailed luma blocks
Intra_16x16: smooth luma blocks
Chroma_8x8: similar to Intra_16x16, as chroma components are smooth
I_PCM: bypasses prediction/transform and sends the raw samples; useful for anomalous picture content, lossless coding, and a predictable bit rate
Unlike MPEG-4, intra-frame prediction in H.264 is done in the spatial domain. There are four types of intra prediction: 4x4, 16x16, chroma, and I_PCM. Why I_PCM: (1) anomalous content that prediction and transform cannot compress well, (2) lossless representation, and (3) a predictable bit rate.

Intra_4x4 Prediction
Samples in a 4x4 block are predicted using 13 neighboring samples.
9 prediction modes: 1 DC and 8 directional.
Sample D is reused if samples E-H are not available.
DC: one value predicts the whole block. In mode 0 (vertical), for example, the samples above are copied down into the 4x4 block.
E-H may be unavailable because of (1) decoding order or (2) lying outside the current slice. For the first macroblock coded in a picture, DC prediction is used.
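
A minimal sketch of two of these modes (vertical and DC) in Python; the array layout and the unavailable-neighbor handling are simplified assumptions, not the normative process.

```python
import numpy as np

def intra4x4_vertical(above):
    """Mode 0 (vertical): copy the 4 samples above the block down each column."""
    return np.tile(np.asarray(above, dtype=np.int32), (4, 1))

def intra4x4_dc(above=None, left=None):
    """DC mode: predict the whole block with one average value."""
    samples = []
    if above is not None:
        samples += list(above)
    if left is not None:
        samples += list(left)
    dc = (sum(samples) + len(samples) // 2) // len(samples) if samples else 128
    return np.full((4, 4), dc, dtype=np.int32)

above = [100, 104, 108, 112]   # neighboring samples A-D
left = [98, 99, 101, 103]      # neighboring samples I-L
print(intra4x4_vertical(above))
print(intra4x4_dc(above, left))
```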

Sample Intra_4x4 Prediction Interpolation is used in some modes [Ref] Foreman sequence, http://www.vcodex.com/files/h264_intrapred.pdf

Intra_16x16 Prediction
4 modes: vertical, horizontal, DC, and plane.
Uses 33 neighboring samples. Good for smooth/flat areas.

Inter-Prediction in P Slices
Two-level segmentation of MBs:
Luma MBs are divided into at most 4 partitions (16x16, 16x8, 8x16, or 8x8)
Each 8x8 partition can be further divided into at most 4 sub-partitions (8x8, 8x4, 4x8, or 4x4)
Chroma partitions are half the size horizontally and vertically
Maximum of 16 motion vectors per MB.
Why different partition sizes? Motion vectors are expensive: large partitions suit smooth areas, small partitions suit detailed areas. Note: motion vectors are coded using predictive coding. The sketch below shows how the segmentation determines the motion vector count.
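
An illustrative sketch (the partition names are the standard ones, the bookkeeping is my own) that counts how many motion vectors a given macroblock segmentation needs; the worst case of sixteen 4x4 sub-partitions yields the maximum of 16.

```python
# Luma partition sizes in a P macroblock and how many partitions each mode creates.
MB_PARTITIONS = {"16x16": 1, "16x8": 2, "8x16": 2, "8x8": 4}
SUB_PARTITIONS = {"8x8": 1, "8x4": 2, "4x8": 2, "4x4": 4}

def motion_vectors_per_mb(mb_mode, sub_modes=None):
    """Count the motion vectors needed for one macroblock segmentation."""
    if mb_mode != "8x8":
        return MB_PARTITIONS[mb_mode]          # one MV per partition
    # In 8x8 mode, each of the four 8x8 blocks picks its own sub-partitioning.
    return sum(SUB_PARTITIONS[m] for m in sub_modes)

print(motion_vectors_per_mb("16x8"))                               # 2
print(motion_vectors_per_mb("8x8", ["4x4", "4x4", "4x4", "4x4"]))  # 16 (the maximum)
```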

Examples of Segmentation [Ref] http://www.vcodex.com/files/h264_interpred.pdf

Inter-Prediction Accuracy
1/4-pixel accuracy for luma, 1/8-pixel for chroma (the granularity of motion vectors).
Half-pixel samples: 6-tap FIR (finite impulse response) filter.
Quarter-pixel samples: averaging of neighboring integer- and half-pixel samples.
Chroma prediction: bilinear interpolation.
[Figure: integer-, half-, and quarter-sample positions used for luma interpolation.]
Some half-sample positions (e.g., the center position) can be computed in two ways, horizontally or vertically first, with the same result. Actual computations are done with addition, bit shifts, and integer arithmetic.
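
A rough sketch of the luma interpolation along one row, using the (1, -5, 20, 20, -5, 1)/32 half-sample filter and a rounded average for a quarter-sample position; boundary handling and the 2-D cases are omitted, and the sample values are made up.

```python
def half_pel(row, i):
    """Half-sample between row[i] and row[i+1], 6-tap FIR (1,-5,20,20,-5,1)/32."""
    acc = (row[i - 2] - 5 * row[i - 1] + 20 * row[i] +
           20 * row[i + 1] - 5 * row[i + 2] + row[i + 3])
    return min(255, max(0, (acc + 16) >> 5))     # round and clip to 8 bits

def quarter_pel(full, half):
    """Quarter-sample: rounded average of an integer sample and a half sample."""
    return (full + half + 1) >> 1

row = [10, 12, 40, 200, 210, 60, 20, 15]   # toy luma samples along one row
h = half_pel(row, 3)                        # half-pel between samples 200 and 210
print(h, quarter_pel(row[3], h))
```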

Multiframe Inter-Prediction in P Slices
More than one prior reference picture can be used (different MBs may use different ones).
Encoder and decoder buffer the same reference pictures for inter-prediction.
A reference index is transmitted when coding MVs; regions smaller than 8x8 use the same index for all MVs within the 8x8 region.
P_skip mode: no residual signal, no MV, and no reference index are sent; the picture at index 0 in the buffer is used as the reference, and the MV is inferred from the neighbors' MVs. Large areas with no change, or with constant motion such as slow panning, can be represented with very few bits.
Prior standards often couple playout order with reference order: e.g., a B frame uses the two adjacent I or P frames for inter-prediction. This is no longer the case in H.264/AVC; several reference frames can be used, organized as a buffered reference list. Full-, half-, and quarter-sample predictions represent different degrees of low-pass filtering, chosen automatically by motion estimation. P_skip is a mode of a P-predicted (16x16) macroblock.

Multiframe Inter-Prediction in B Slices
Weighted average of two prediction signals; B slices can themselves be used as references.
Two reference picture lists (list 0 and list 1) are used.
One of four prediction methods for each partition: list 0, list 1, bi-predictive, or direct (prediction parameters inferred from previously coded MBs).
An MB can also be coded in B_skip mode (similar to P_skip).
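
A minimal sketch of the bi-predictive case: by default the two prediction blocks are combined with a rounded average, and weighted prediction generalizes this with per-list weights. The weight handling below is a simplified assumption, not the normative formula.

```python
import numpy as np

def biprediction(pred0, pred1, w0=1, w1=1):
    """Combine two prediction blocks; w0 = w1 = 1 gives the default rounded average."""
    p0 = np.asarray(pred0, dtype=np.int32)
    p1 = np.asarray(pred1, dtype=np.int32)
    total = w0 + w1
    out = (w0 * p0 + w1 * p1 + total // 2) // total
    return np.clip(out, 0, 255)

list0_block = np.full((4, 4), 100)   # prediction from a list 0 reference
list1_block = np.full((4, 4), 110)   # prediction from a list 1 reference
print(biprediction(list0_block, list1_block)[0, 0])              # 105 (rounded average)
print(biprediction(list0_block, list1_block, w0=2, w1=1)[0, 0])  # weighted toward list 0
```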

4x4 Integer Transform
Why a smaller, integer transform:
Uses only additions and shifts, so an exact inverse transform is possible (no decoding mismatch)
After good prediction there is not much residual left to code
Less noise around edges (ringing or mosquito noise)
Fewer computations and a shorter data type (16-bit arithmetic)
It is an approximation of the 4x4 DCT: Y = C X C^T, where
C = [ 1  1  1  1
      2  1 -1 -2
      1 -1 -1  1
      1 -2  2 -1 ]
and the scaling is folded into quantization. Ringing is still present, but it is smaller and harder to see.
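
A small sketch of this forward core transform, applied to rows and then columns with the usual add/subtract/shift butterfly; scaling and quantization are deliberately left out, and the residual values are made up.

```python
import numpy as np

def core_transform_1d(x):
    """Forward 4x4 core transform of one row, using only adds and shifts."""
    s0, s1 = x[0] + x[3], x[1] + x[2]
    d0, d1 = x[0] - x[3], x[1] - x[2]
    return [s0 + s1,                 # y0
            (d0 << 1) + d1,          # y1 = 2*d0 + d1
            s0 - s1,                 # y2
            d0 - (d1 << 1)]          # y3 = d0 - 2*d1

def core_transform_4x4(block):
    """Apply the 1-D transform to every row, then to every column (Y = C X C^T)."""
    b = np.asarray(block, dtype=np.int32)
    rows = np.array([core_transform_1d(r) for r in b])
    return np.array([core_transform_1d(c) for c in rows.T]).T

residual = np.array([[ 5, -2,  1,  0],
                     [ 3,  0, -1,  2],
                     [-4,  1,  0,  0],
                     [ 2, -3,  1,  1]])
print(core_transform_4x4(residual))   # top-left coefficient is the (scaled) DC term
```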

Second Transform and Quantization Parameter
Second transform: Intra_16x16 and chroma modes target smooth areas, so the DC coefficients of the 4x4 blocks are transformed again to cover the whole MB.
The quantization step size is an exponential function of the quantization parameter, covering a wide range of step sizes:
QP increases by 6 => the step size doubles
QP increases by 1 => the step size increases by about 12% => bit rate decreases by roughly 12%
The second transform exploits the remaining spatial redundancy in flat areas. Besides covering a wider range, the exponential step size also simplifies rate control.
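
To make the doubling-every-6 relation concrete, here is a small sketch computing an approximate step size from QP; the base values for QP 0-5 are the commonly cited ones, while the normative process folds the exact scaling into integer multipliers.

```python
# Commonly cited quantization step sizes for QP = 0..5; every +6 in QP doubles them.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]

def qstep(qp):
    """Approximate quantization step size for a given QP (0..51)."""
    return BASE_QSTEP[qp % 6] * (2 ** (qp // 6))

for qp in (0, 6, 12, 24, 25, 51):
    print(qp, qstep(qp))
# On average, each +1 in QP scales the step size by about 2**(1/6), i.e. roughly 12%.
```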

Entropy Coding
Non-transform-coefficient syntax elements: a single infinite-extent codeword table (Exp-Golomb codes).
Transform coefficients:
Context-Adaptive Variable Length Coding (CAVLC): several VLC tables are switched depending on previously transmitted data, which is better than a single VLC table.
Context-Adaptive Binary Arithmetic Coding (CABAC): more flexible symbol probability modeling than CAVLC, giving a 5-15% rate reduction; efficient and multiplication-free.
Previous standards usually have separate VLC tables for different syntax elements because they tend to have different probability characteristics. In H.264/AVC a simple and efficient infinite-extent codeword table is shared by all elements; for each element, only a mapping onto this single codebook is required. Arithmetic coding can assign a non-integer number of bits to each symbol.
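
The shared infinite-extent code is the zero-order Exp-Golomb code; a small sketch of encoding and decoding an unsigned value follows (the bit-level packing into bytes is left out).

```python
def exp_golomb_encode(code_num: int) -> str:
    """Zero-order Exp-Golomb: M leading zeros, then the binary form of code_num + 1."""
    value = code_num + 1
    m = value.bit_length() - 1          # number of leading zeros
    return "0" * m + bin(value)[2:]     # bin(value) already starts with a '1'

def exp_golomb_decode(bits: str) -> int:
    """Inverse mapping: count leading zeros, then read that many bits after the '1'."""
    m = bits.index("1")
    return int(bits[m:2 * m + 1], 2) - 1

for n in range(6):
    cw = exp_golomb_encode(n)
    print(n, cw, exp_golomb_decode(cw))
# 0 -> "1", 1 -> "010", 2 -> "011", 3 -> "00100", ...
```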

In-loop Deblocking Filter
Operates within the coding loop: filtered frames are used as reference frames, which improves coding efficiency.
Adaptive deblocking needs to determine whether a discontinuity is a blocking effect or a real object edge, and whether strong or weak filtering should be applied.
Intuition: a large difference near a block edge is likely a blocking artifact, but if the difference is too large to be explained by the quantization coarseness (QP), it is likely a real edge.
In-loop filtering is better than a post-filter because it improves the quality of the reference frames. E.g., the boundary samples p0 and q0 are filtered only if
|p0 - q0| < alpha(QP) and |p1 - p0| < beta(QP) and |q1 - q0| < beta(QP),
where alpha and beta are thresholds that increase with QP.
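
A sketch of that edge decision; the alpha/beta functions here are placeholder stand-ins that merely increase with QP, not the normative tables.

```python
def deblock_decision(p1, p0, q0, q1, qp):
    """Decide whether to filter across a block edge (samples ... p1 p0 | q0 q1 ...)."""
    # Placeholder thresholds: grow with QP, so coarser quantization allows more filtering.
    alpha = max(0, qp - 14)
    beta = max(0, qp // 4 - 2)
    return (abs(p0 - q0) < alpha and
            abs(p1 - p0) < beta and
            abs(q1 - q0) < beta)

# A small step across the edge at high QP: treated as a blocking artifact -> filter.
print(deblock_decision(100, 102, 110, 111, qp=36))   # True
# A large step: too big to be explained by quantization -> likely a real edge -> keep.
print(deblock_decision(100, 102, 180, 181, qp=36))   # False
```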

Hypothetical Reference Decoder (HRD)
A standardized receiver buffer model: encoders must produce bit streams that are decodable by the HRD.
Two buffers:
Coded picture buffer (CPB): models bit arrival and removal times.
Decoded picture buffer (DPB): models when frames are decoded, held in reference lists, and output.
So, if a designer mimics the behavior of the HRD, the resulting decoder will work.

Outline Overview Network Abstraction Layer (NAL) Video Coding Layer (VCL) Profiles and Applications Feature Highlights Conclusions

Profiles and Applications
A profile defines a set of coding tools and algorithms; profiles and levels are the conformance points for interoperability.
3 profiles for different applications:
Baseline: video conferencing (low-latency, real-time)
Main: broadcast, media storage, digital cinema
Extended: streaming over IP (wired/wireless)
15 levels constrain picture size, decoding rate (MB/s), bit rate, and buffer size.
[Ref] J. Ostermann, J. Bormans, P. List, D. Marpe, M. Narroschke, F. Pereira, T. Stockhammer, and T. Wedi, "Video coding with H.264/AVC: tools, performance and complexity," IEEE Circuits and Systems Magazine, 4(1), pp. 7-28, 2004.

Outline Overview Network Abstraction Layer (NAL) Video Coding Layer (VCL) Profiles and Applications Feature Highlights Conclusions

Feature Highlights -- Prediction
Variable block-size motion compensation
Quarter-sample-accurate motion compensation
Motion vectors over picture boundaries
Multiple reference pictures
Weighted and bi-directional prediction
Decoupling of referencing order from display order
Decoupling of prediction mode from reference capability (B frames can be used as references)
Improved skip/direct modes
Intra prediction in the spatial domain
In-loop deblocking filter

Feature Highlights -- Transform
Small block-size transform
2-level block transform (repeated transform of DC coefficients)
Short data-type transform (16-bit)
Exact inverse transform
Context-adaptive entropy coding
Arithmetic entropy coding

Feature Highlights -- Network
Parameter set structure (efficiency)
NAL unit syntax structure (flexibility)
Flexible slice size
Flexible macroblock ordering (FMO)
Arbitrary slice ordering (ASO)
Redundant pictures
Data partitioning (unequal error protection)
SP/SI switching pictures

Outline Overview Network Abstraction Layer (NAL) Video Coding Layer (VCL) Profiles and Applications Feature Highlights Conclusions

Conclusions
Key improvements:
Enhanced prediction (intra and inter)
Small-block-size, exact-match transform
Adaptive in-loop deblocking filter
Enhanced entropy coding methods
[Ref] G. Sullivan and T. Wiegand, "Video Compression—From Concepts to the H.264/AVC Standard," Proceedings of the IEEE, 93(1), Jan. 2005.