Presentation on theme: "Overview of the H.264/AVC Video Coding Standard"— Presentation transcript:
1Overview of the H.264/AVC Video Coding Standard Ahmed HamzaSchool of Computing ScienceSimon Fraser UniversityFebruary, 2009
2Outline Overview Network Abstraction Layer (NAL) Video Coding Layer (VCL)Profiles and ApplicationsPerformance ComparisonConclusions
3Outline Overview Network Abstraction Layer (NAL) Video Coding Layer (VCL)Profiles and ApplicationsPerformance ComparisonConclusions
4History of H.264/AVCInitiated by the Video Coding Experts Group (VCEG) in early 1998Previous name H.26LTarget to double the coding efficiencyFirst draft was adopted in Oct. of 1999In Dec. of 2001, VCEG and the Moving Pictures Experts Group (MPEG) formed a Joint Video Team (JVT)Approved by the ITU-T as H.264 and ISO/IEC as International Standard (MPEG-4 part 10) Advanced Video Codec (AVC) in Mar. 2003MPEG -> A study group that develops standards for International Standards Organization (ISO)VCEG -> A study group of the International Telecommunications Union (ITU)
5Timeline of Video Development Video TelephonyVideo-CDVideo ConferencingDigital TV/DVDMPEG-2: an enabling technology for digital television systemsinitially designed to extend MPEG-1 and support interlaced video codingUsed by transmission of SD/HD TV signal and storage of SD video onto DVDs.H.263/+/++: conversation/communication applicationSupport diversified networks and adapted to their characteristics, e.g. PSTN, mobile, LAN/InternetConsidering loss and error robustness requirementsMPEG-4 Visual: object-based video codingProvide shape coding
6MotivationCreate a standard capable of providing good video quality at substantially lower bit rates than previous standards (e.g. half or less), without increasing the complexity of design so much that it would be impractical or excessively expensive to implement.Provide enough flexibility to allow the standard to be applied to a wide variety of applications on a wide variety of networks and systemsincluding low and high bit rates, low and high resolution video, broadcast, DVD storage, RTP/IP packet networks, and ITU-T multimedia telephony systems.
7H.264/AVC Coding Standard Various Applications Challenge: Broadcast: cable, satellites, terrestrial, and DSLStorage: DVDs (HD DVD and Blu-ray)Video Conferencing: over different networksMultimedia Streaming: live and on-demandMultimedia Messaging Services (MMS)Challenge:How to handle all these applications and networksFlexibility and customizabilityH.264/AVC consider a variety of applications over heterogeneous networks, such as broadcast over unidirectional comm. Channel, optical/magnetic media that supports sequential playouts as well as random access (jump to some points), low bit rate video conferencing, possibly high bit rate video streaming, and digital video mails.Often these applications impose different requirements, e.g., high/low bit rates, while the networks have different characteristics, e.g., low/high latency (satellites), low/high loss rate (wire/wireless) networks
8Structure of H.264/AVC Codec Layered designNetwork Abstraction Layer (NAL)formats video and meta data for variety of networksVideo Coding Layer (VCL)represents video in an efficient wayScope of H.264 standardTo address the heterogeneity of applications and networks, H.264/AVC adopt a layered design, where …
9Outline Overview Network Abstraction Layer (NAL) Video Coding Layer (VCL)Profiles and ApplicationsPerformance ComparisonConclusions
10Network Abstraction Layer The purpose of separately specifying the VCL and NAL is to distinguish between coding- specific features (at the VCL) and transport- specific features (at the NAL)Provide network friendliness for sending video data over various network transports, such as:RTP/IP for Internet applicationsMPEG-2 streams for broadcast servicesISO File formats for storage applicationsThe main objective of NAL is to enable simple and effective customization of sending video data over different network systems.
11NAL Units Packets consist of video data short packet header: one byteSupport two types of transportsstream-oriented: no free unit boundaries use a 3-byte start code prefixpacket-oriented: start code prefix is a wasteCan be classified into:VCL units: data for video picturesNon-VCL units: meta data and additional infoNAL units are packets that carry video data.NAL units have very short packet header, and most of its capacity is used for carrying payloads.… We next present non VCL units
12Sequence of NAL units (Access Unit) Each NAL unit contains a Raw Byte Sequence Payload (RBSP), a set of data corresponding to coded video data or header information.Sequence of NAL units (Access Unit)
13Access Units A set of NAL units Decoding an access unit results in one pictureStructure:Delimiter: for seeking in a streamSEI (Supp. Enhancement Info.): timing and other infoprimary coded picture: VCLredundant coded picture: for error recoveryNow, we know the building blocks. But how to put NAL units together to represent ONE picture?* Delimiter for seeking in byte-stream format* SEI: timing information and other supplemental data that may enhance application usability.* Consist of a set of VCL NAL units which together compose a primary coded picture.* Several redundant slices for error recovery.Redundant coded pictures are usually not decoded. Only when decoders fail to decode the primary coded picture, they will try to decode redundant coded picture.
15Video Sequences and IDR Frames Sequence: an independently decodable NAL unit stream don’t need NALs from other sequenceswith one sequence parameter setstarts with an instantaneous decoding refresh (IDR) access unitIDR frames: random access pointsIntra-coded framesno subsequent picture of an IDR frame will require reference to pictures prior to the IDR framedecoders mark buffered reference pictures unusable once seeing an IDR frameIntuitively, aggregating several access units (pictures) gives us a sequence. But, video sequence carries other requirement in H.264/AVC.
16Outline Overview Network Abstraction Layer (NAL) Video Coding Layer (VCL)Profiles and ApplicationsFeature HighlightsConclusions
17Macroblocks and Slices Fixed size MBs: 16x16 for luma, and 8x8 for chromaSlice: a set of MBs that can be decoded without use of other slicesI slice: intra-prediction (I-MB)P slice: possibly one inter-prediction signal (I- and P-MBs)B slice: up to two inter-prediction signals (I- and B-MBs)SP slice: efficient switch among streamsSI slice: used in conjunction with SP slicesThe only reason that we may require adjacent slices is for deblocking filter.The main feature of SP-frames is that identical SP-frames can be reconstructed even when different reference frames are used for their prediction.SI-frames are used in conjunction with SP-frames.This property make them useful for applications such as random access, adapt to network condition, and error recovery/ resilienceWe next talk about ordering of MBs.
18H.264 Slice Modes Slice Type Description Profile(s) I (Intra) Contains only I macroblocks (each block or MB is predicted from previously coded data within the same slice).AllP (Predicted)Contains P macroblocks (each MB or MB partition is predicted from one list 0 reference picture) and/or I MBs.B (Bi-predictive)Contains B macroblocks (each MB or MB partition is predicted from a list 0 and/or a list 1 reference picture) and/or I macroblocks.Extended and MainSP (Switching P)Facilitates switching between coded streams; contains P and/or I macroblocks.ExtendedSI (Switching I)Facilitates switching between coded streams; contains SI macroblocks (a special type of intra coded MB).An H.264 encoder may optionally insert a picture delimiter RBSP unit at the boundarybetween coded pictures. This indicates the start of a new coded picture and indicates whichslice types are allowed in the following coded picture. If the picture delimiter is not used, thedecoder is expected to detect the occurrence of a new picture based on the header of the firstslice in the new picture.
19Slice Groups Flexible MB Order (FMO) Arbitrary Slice Order (ASO) Multiple slice groups makes it possible to map the sequence of coded MBs to the decoded picture in a number of flexible ways.Arbitrary Slice Order (ASO)sending and receiving the slices of the picture in any order relative to each othercan improve end-to-end delay in real-time applications, particularly when used on networks having out-of-order delivery behavior
20MB to Slice Group Mappings Foreground and BackgroundInterleavedDispersed
21Inter PredictionUnlike earlier standards, H.264 supports a range of block sizes (from 16 × 16 down to 4×4) and fine subsample motion vectors (quarter-sample resolution in the luma component).Partitioning MBs into motion compensated sub- blocks of varying size is known as tree structured motion compensation.
22Inter-Prediction in P Slices Two-level segmentation of MBsLumaMBs are divided into at most 4 partitions (as small as 8x8)8x8 partitions are divided into at most 4 partitionsChroma – half size horizontally and verticallyMaximum of 16 motion vectors for each MBEach chroma block is partitioned in the sameway as the luma component, except that the partition sizes have exactly half the horizontal andvertical resolution.
23Tree-Structured MCWhy do we need different partition sizes? Motion vector are expensiveLarge partition - smooth areaSmall partition: - detailed areaNote: motion vectors are coded using predictive coding
24Why different partition sizes? MVs are expensiveSmooth area large partitionDetailed area small partition
25Inter-Prediction Accuracy Using sub-pel motion estimationGranularity of ¼-pixel for luma, 1/8-pixel for ChromaActual computations are done with addition, bit- shift, and integer arithmeticsThe granularity of motion vectorsfinite impulse response (FIR) filterJ can be computed in two ways…Actual computations are done with addition, bit-shift, and integer arithmetics
26Interpolation of luma half-pel positions Using a 6-tap Finite Impulse Response (FIR) filterWith weights (1/32,−5/32, 5/8, 5/8,−5/32, 1/32).Note that un-rounded versions of h and m are used to generate jb = round((E − 5F + 20G + 20H − 5I + J) /32)
27Interpolation of luma quarter-pel positions Using average of neighborsChroma predictions: bilinear interpolationsa = round((G + b) / 2)
28H.264 EncoderIn the figures, the reference picture is shown as the previous encoded picture Fn−1 but the prediction reference for each macroblock partition (in inter mode) may be chosen from a selection of past or future pictures (in display order) that have already been encoded, reconstructed and filtered.A filter is applied to reduce the effects of blocking distortion and the reconstructed reference picture is created from a series of blocks Fn.
30Intra PredictionUnlike previous standards (e.g. H.263 and MPEG-4 Visual), Intra prediction in H.264 is conducted in the spatial (not the transform) domain.To prevent spatio-temporal error propagation, prediction is restricted to Intra-coded neighboring MBs.
31Intra 4x4 Prediction Modes Samples in 4x4 block are predicted using 13 neighboring sample9 prediction mode: 1 DC and 8 directionalSample D is used if E-H is not availableUsed for luma predicitonWell suited for coding of parts of a picture with significant details.
32Intra 4x4 Prediction Modes DC: use one value to predict the whole block.For example: samples in mode 0 are copied into the 4x4 block.E-H may be unavailable because of (1) decoding order, (2) outside the subject sliceFirst macroblock coded in a picture, we use DC prediction.
33Intra 16x16 Prediction Modes Used for luma predictionMore suited for coding very smooth areas of a picture.
34Deblocking FilterBlock-based coding produces annoying visible block artifacts.H.264 defines an adaptive in-loop deblocking filter that reduces blockiness.The filter smoothes block edges, improving the appearance of decoded frames.Filter is applied after the inverse transform in the encoder (before reconstructing and storing the MB for future predictions) and in the decoder (before reconstructing and displaying the MB).The filtered image is used for motion-compensated prediction of future frames andthis can improve compression performance because the filtered image is often a more faithfulreproduction of the original frame than a blocky, unfiltered image.
35Deblocking FilterBlock edges are typically reconstructed with less accuracy than interior pixels.Basic Idea:Relatively large absolute difference between samples near a block edge is probably a blocking artifact.Very large magnitude difference more likely reflects the actual behaviour of source picture.Whether to filter or not is determined using quantiazation parameter (QP) dependable thresholds.
36Reconstructed, QP=36 (no filter) Reconstructed, QP=36 (with filter) Deblocking FilterOriginal FrameReconstructed, QP=36 (no filter)Reconstructed, QP=36 (with filter)
37Scanning Order of Residual Blocks within an MB If the MB is coded in 16×16 Intra mode, then the block labelled ‘−1’, containing thetransformed DC coefficient of each 4 × 4 luma block, is transmitted first.Next, the luma residual blocks 0–15 are transmitted in the order shown (the DC coefficients in a macroblockcoded in 16×16 Intra mode are not sent).Blocks 16 and 17 are sent, containing a 2×2 array of DC coefficients from the Cb and Cr chroma components respectively and finally, chromaresidual blocks 18–25 (without DC coefficients) are sent.
38TransformH.264 uses three transforms depending on the type of residual data that is to be coded:DCT-based transform for all other 4 × 4 blocks in the residual data.Hadamard transform for the 4×4 array of luma DC coefficients in Intra MBs predicted in 16×16 mode.Hadamard transform for the 2 × 2 array of chroma DC coefficients (in any macroblock).
394x4 Integer Transform Why smaller transform: Only use add and shift, an exact inverse transform is possible no decoding mismatchNot too much residue to codeLess noise around edge (ringing or mosquito noise)Less computations and shorter data type (16-bit)An approximation to 4x4 DCT:It would be desirable if only 16-bit multiplications were required, since many processors cannot perform 32-bit multiplications with a single instructionRinging is still there, just they are smaller, and harder to see.
404x4 Integer Transform Factorizing the matrix multiplication: The symbol ⊗ indicates that each element of (CXCT) is multiplied by the scaling factor in the same position in matrix E (scalar multiplication rather than matrix multiplication)The constants a and b are as before and d is c/b (approximately 0.414).
414x4 Integer TransformTo simplify the implementation of the transform, d is approximated by 0.5.Scale 2nd and 4th rows of C and 2nd and 4th columns of CT by a factor of 2 and scale down E to compensateThus, we avoid multiplications by half in the ‘core’ transform CXCT which could result in loss of accuracyusing integer arithmetic.
422nd Transform and Quantization Parameter 2nd Transform: Intra_16x16 and Chroma modes are for smooth areaDC coefficients are transformed again to cover the whole MBQuantization step is adjusted by an exponential function of quantization parameter to cover a wider range of QSQP increases by 6 => QS doublesQP increases by 1 => QS increases by 12% => bit rate decreases by 12%2nd transform is needed to take advantage of spatial redundancy in lat areas.In addition to cover a wider range, exponential quantization step also simplifies the problem of rate control.
43QuantizationThe mechanisms of the forward and inverse quantisers are complicated by the requirements toavoid division and/or floating point arithmetic, andincorporate the post- and pre-scaling matrices Ef and EiA total of 52 values of Qstep are supported by the standard, indexed by a Quantisation Parameter, QP.Qstep doubles in size for every increment of 6 in QP.The wide range of quantiser step sizes makes it possible for an encoder to control the tradeoff accurately and flexibly between bit rate and quality. The values of QP can be different for luma and chroma.
44Entropy CodingNon-transform coefficients: an infinite-extent codeword table (Exp-Golomb)Transform coefficients:Context-Adaptive Variable Length Coding (CAVLC)several VLC tables are switched dep. on prior transmitted data better than a single VLC tableContext-Adaptive Binary Arithmetic Coding (CABAC)flexible symbol probability than CAVLC 5 – 15% rate reductionefficient: multiplication freePrevious standards usually have SEPARATE VLC tables for different elements, because they tend to have different probability characteristicsIn H.264/AVC a simple and efficient infinite-extent codeword table is shared by all elementsFor different element, only a mapping to this single codeword book is required.Arithmetic coding: non-integer number of bit to each symbol
45Exp-Golomb CodesFor pre-defined code tables (e.g. pre-calculated Huffman-based coding), encoder and decoder must store table in some form.Exponential Golomb (Exp-Golomb) codes use codes that can be generated automatically on- the-fly if input symbol is known.Exp-Golomb codes are VLCs with a regular construction.
47Outline Overview Network Abstraction Layer (NAL) Video Coding Layer (VCL)Profiles and ApplicationsPerformance ComparisonConclusions
48H.264 Profiles The Baseline Profile The Main Profile intra and inter-coding (using I-slices and P-slices)entropy coding with context-adaptive variable-length codes (CAVLC)The Main Profilesupports interlaced videointer-coding using B-slicesinter coding using weighted predictionentropy coding using context-based arithmetic coding (CABAC)The Extended Profiledoes not support interlaced video or CABACbut adds modes to enable efficient switching between coded bitstreams (SP- and SI-slices) and improved error resilience (Data Partitioning).
50Multi-frame Inter-Prediction in B Slices Weighted average of2 predictionsB-slices can be usedas referenceTwo reference picturelists are usedOne out of four pred. methods for each partition:list 0list 1bi-predictivedirect prediction: inferred from prior MBsThe MB can be coded in B_skip mode (similar to P_skip)
51Partition Prediction in B Slices Each MB partition in an inter coded MB in a B slice may be predicted from one or two reference pictures, before or after the current picture in temporal order.
52Potential Applications Baseline (low latency)H.320 conversational video services3GPP conversational H.324/M servicesH.323 with IP/RTP3GPP using IP/RTP and SIP3GPP streaming using IP/RTP and RTSPMain (moderate latency)Modified H.222.0/MPEG-2Broadcast via satellite, cable, terrestrial or DSLDVD and VODExtendedStreaming over wired InternetAny (no requirement on latency)3GPP MMSVideo mail
53Outline Overview Network Abstraction Layer (NAL) Video Coding Layer (VCL)Profiles and ApplicationsPerformance ComparisonConclusions
55Outline Overview Network Abstraction Layer (NAL) Video Coding Layer (VCL)Profiles and ApplicationsPerformance ComparisonConclusions
56ConclusionsH.264 provides mechanisms for coding video that are optimised for compression efficiency and aim to meet the needs of practical multimedia communication applications.The success of a practical implementation of H.264 (or MPEG-4 Visual) depends on careful design of the CODEC and effective choices of coding parameters.
57ReferencesWiegand, T.; Sullivan, G.J.; Bjontegaard, G.; Luthra, A., "Overview of the H.264/AVC video coding standard," Circuits and Systems for Video Technology, IEEE Transactions on , vol.13, no.7, pp , July 2003Sullivan, G.J.; Wiegand, T., "Video Compression - From Concepts to the H.264/AVC Standard," Proceedings of the IEEE , vol.93, no.1, pp.18-31, Jan. 2005Richardson, I.; " H.264 and MPEG-4 Video Compression: Video Coding for Next Generation Multimedia," Wiley, 2003Richardson, I.; "Video Codec Design: Developing Image and Video Compression Systems," Wiley, 2002
64Macroblock Syntax Elements mb_typeWhether the macroblock is coded in intra or inter (P or B) mode; determines macroblock partition size.mb_predDetermines intra prediction modes (intra MBs); determines list 0 and/or list 1 references and differentially coded motion vectors for each macroblock partition (inter MBs, except for inter MBs with 8 × 8 MB partition size).sub_mb_pred(Inter MBs with 8 × 8 MB partition size only) Determines sub-MB partition size for each sub-MB; list 0 and/or list 1 references for each MB partition.coded_block_patternWhich 8 × 8 blocks (luma and chroma) contain coded transform coefficients.mb_qp_deltaChanges the quantiser parameter.residualCoded transform coefficients corresponding to the residual image samples after predictionEach MB contains a series of header elements (see Table) and coded residual data.
65Design Features Highlights Features for enhancement of predictionDirectional spatial prediction for intra codingVariable block-size motion compensation with small block sizeQuarter-sample-accurate motion compensationMotion vectors over picture boundariesMultiple reference picture motion compensationDecoupling of referencing order form display orderDecoupling of picture representation methods from picture referencing capabilityWeighted predictionImproved “skipped” and “direct” motion inferenceIn-the-loop deblocking filtering
66Design Features Highlights Features for improved coding efficiencySmall block-size transformExact-match inverse transformShort word-length transformHierarchical block transformArithmetic entropy codingContext-adaptive entropy coding
67Design Features Highlights Features for robustness to data errors/lossesParameter set structureNAL unit syntax structureFlexible slice sizeFlexible macroblock ordering (FMO)Arbitrary slice ordering (ASO)Redundant picturesData PartitioningSP/SI synchronization/switching pictures