1 Università degli studi Roma Tre – Introduction to H.264/AVC coding. Overview of the H.264/AVC video coding standard. Maiorana Emanuele

2 Agenda. Introduction; Project overview & timeline; Standardization concepts; Codec technical design; New features; Prediction; De-blocking; Entropy coding; Profiles & levels; Comparisons.

3 Introduction. H.264/AVC is the newest video coding standard developed by the ITU-T/ISO/IEC Joint Video Team (JVT), consisting of experts from the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). Its design represents a delicate balance between: coding gain (efficiency improved by a factor of two over MPEG-2); implementation complexity; costs based on the state of VLSI (ASIC and microprocessor) design technology.

4 Terminology. The following terms are used interchangeably: H.26L; the work of the JVT, or JVT codec; JM2.x, JM3.x, JM4.x; the thing beyond H.26L; the AVC, or Advanced Video Codec. Proper terminology going forward: MPEG-4 Part 10 (official MPEG term); ISO/IEC AVC; H.264 (official ITU term).

5 History. The digital representation of television signals created many services for content delivery: satellite; cable TV; terrestrial broadcasting; ADSL and fiber over IP. To optimize these services, there is a need for: high Quality of Service (QoS); low bit rate; low power consumption. These requirements conflict. Source coding is responsible for reducing the bit rate. Example: the complete transmission of the TV signal, as in Recommendation ITU-R BT.601, would require (720 + 360 + 360) × 576 × 25 × 8 ≈ 166 Mbit/s.
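The arithmetic behind the 166 Mbit/s figure can be checked with a short sketch. It assumes the usual 625-line BT.601 layout (720 luma samples and 360 samples for each of the two chroma components per line, 576 active lines, 25 frames per second, 8 bits per sample); the variable names are illustrative only.

```python
# Uncompressed bit rate of a 625-line ITU-R BT.601 signal (4:2:2 sampling).
# The layout below is an assumption taken from the slide's numbers.
luma_per_line = 720
chroma_per_line = 360      # for each of Cb and Cr
active_lines = 576
frames_per_second = 25
bits_per_sample = 8

samples_per_frame = (luma_per_line + 2 * chroma_per_line) * active_lines
bit_rate = samples_per_frame * frames_per_second * bits_per_sample

print(f"{bit_rate / 1e6:.1f} Mbit/s")  # about 166 Mbit/s
```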

6 Video Coding History. Efforts in maximizing coding efficiency while dealing with: the diversification of network types; their characteristic formatting and loss/error robustness requirements. ITU standards for video telephony: H.261, H.263, H.263+. ISO MPEG standards: MPEG-1 (medium quality, physical support); MPEG-2 (medium/high quality, physical and transmission support); MPEG-4 (audio-video objects).

7 Evolution (1/2). [Figures: MPEG-2 introduction; MPEG-4 in comparison]

8 Evolution (2/2). [Figures: H.26L provides focus; MPEG-4 adopts H.26L]

9 Codec Defects: Blocking. [Figure panels: Low, Original, High]

10 Codec Defects: Packet Loss.

11 Video Coding History. Early 1998: started as the ITU-T Q.6/SG16 (VCEG, Video Coding Experts Group) H.26L standardization activity. August 1999: first draft design. July 2001: MPEG open call for AVC technology; H.26L wins. December 2001: formation of the Joint Video Team (JVT) between VCEG and MPEG to finalize H.26L as a joint project, similar to MPEG-2/H.262. July 2002: Final Committee Draft status in MPEG. March 2003: formal approval submission. October 2004: final ITU-T and ISO approval.

12 Versions.

13 JVT Project Technical Objectives. Primary technical objectives: significant improvement in coding efficiency (average bit rate reduction of 50% compared to any other video standard); network-friendly video representation for conversational (video telephony) and non-conversational (storage, broadcast or streaming) applications; error-resilient coding; simple syntax specification, targeting simple and clean solutions. The scope of the standardization is only the central decoder: it imposes restrictions on the bitstream and syntax, and defines the decoding process such that every conforming decoder will produce similar output for a given encoded bitstream. [Diagram: Source → Pre-Processing → Encoding → Decoding → Post-Processing & Error Recovery → Destination; only Decoding falls within the scope of the standard]

14 Applications. The new standard is designed for technical solutions including at least the following application areas: broadcast over cable, satellite, cable modem, DSL, terrestrial, etc.; interactive or serial storage on optical and magnetic devices, DVD, etc.; conversational services over ISDN, Ethernet, LAN, DSL, wireless and mobile networks, modems, etc., or mixtures of these; video-on-demand or multimedia streaming services over ISDN, cable modem, DSL, LAN, wireless networks, etc.; Multimedia Messaging Services (MMS) over ISDN, DSL, Ethernet, LAN, wireless and mobile networks, etc. How can this variety of applications and networks be handled?

15 H.264 Design. To address this need for flexibility and customizability, the H.264/AVC design covers: a Video Coding Layer (VCL), which represents the video content (performing all the classic signal processing tasks); a Network Abstraction Layer (NAL), which adapts VCL representations in a manner appropriate for conveyance by a variety of transport layers or storage media. [Diagram: Video Coding Layer (control data, coded macroblock), data partitioning, coded slice/partition, Network Abstraction Layer, and the transports H.320, MP4FF, H.323/IP, MPEG-2, etc.]

16 Features enhancing coding efficiency (1). Improved picture coding is enabled by better prediction methods: variable block-size motion compensation with small block sizes; quarter-sample-accurate motion compensation; multiple reference picture motion compensation; weighted prediction; directional spatial prediction for intra coding; in-the-loop deblocking filtering.

17 Features enhancing coding efficiency (2). Improved picture coding is also enabled by high-performance tools: small block-size transform; hierarchical block transform; short word-length transform; exact-match inverse transform; arithmetic entropy coding; context-adaptive entropy coding.

18 Features enhancing Robustness. Robustness to data errors and losses, and flexibility for operation over a variety of network environments, are enabled by new design aspects: parameter set structure; NAL unit syntax structure; flexible slice size; Flexible Macroblock Ordering (FMO); redundant pictures; SP/SI synchronization/switching pictures.

19 Network Abstraction Layer. The Network Abstraction Layer (NAL) is designed to provide "network friendliness", making it possible to map H.264/AVC VCL data to transport layers such as: RTP/IP for any kind of real-time wireline and wireless Internet services (conversational and streaming); file formats, e.g. ISO MP4 for storage and MMS; H.32X for wireline and wireless conversational services; MPEG-2 systems for broadcasting services, etc. Some key concepts of the NAL are: NAL units; the use of NAL units in byte-stream format systems and in packet-transport systems; parameter sets; access units.

20 NAL Units. The coded video data is organized into NAL units, each of which is effectively a packet containing an integer number of bytes. NAL units are classified into VCL NAL units, which contain the data associated with the video pictures, and non-VCL NAL units, which contain any associated additional information. Header byte: the first byte of each NAL unit indicates the type of data in the NAL unit; the remaining bytes contain payload data of the type indicated by the header. Emulation prevention bytes: bytes inserted in the payload data to prevent the accidental generation of a particular pattern of data called a start code prefix. NAL unit stream: the series of NAL units generated by an encoder.
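A minimal sketch of the two mechanisms described on this slide, the header byte and the emulation prevention bytes. The slide does not give the bit layout, so the field names and widths used here (a 1-bit forbidden_zero_bit, a 2-bit nal_ref_idc and a 5-bit nal_unit_type), the 0x03 escape value and the rule for when it is inserted are stated as assumptions based on the standard's usual description; the function names are hypothetical.

```python
def parse_nal_header(first_byte: int) -> dict:
    """Split the one-byte NAL header into its three fields (assumed layout)."""
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x1,
        "nal_ref_idc": (first_byte >> 5) & 0x3,
        "nal_unit_type": first_byte & 0x1F,
    }


def add_emulation_prevention(payload: bytes) -> bytes:
    """Insert 0x03 after two zero bytes whenever the next byte is 0x00..0x03,
    so that the payload can never contain a start code prefix."""
    out = bytearray()
    zeros = 0  # number of consecutive zero bytes just written
    for byte in payload:
        if zeros >= 2 and byte <= 0x03:
            out.append(0x03)  # emulation prevention byte
            zeros = 0
        out.append(byte)
        zeros = zeros + 1 if byte == 0 else 0
    return bytes(out)


header = parse_nal_header(0x67)
escaped = add_emulation_prevention(b"\x00\x00\x01")
```

In this sketch a payload that happens to contain the bytes 00 00 01 is rewritten as 00 00 03 01, so a receiver scanning for start code prefixes never finds one inside a NAL unit.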

21 Use of NAL Units. Bitstream-oriented transport systems (H.320, MPEG-2 systems): the entire or partial NAL unit stream is delivered as an ordered stream of bytes or bits, so the locations of NAL unit boundaries need to be identifiable. In the byte stream format, each NAL unit is prefixed by a specific pattern of three bytes called a start code prefix. Packet-oriented transport systems (IP, RTP systems): the coded data is carried in packets that are framed by the system transport protocol, so the boundaries of NAL units within the packets can be established without start code prefix patterns, and the NAL units can be carried in data packets without start code prefixes.
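To illustrate the byte stream format, the following sketch splits a byte stream into NAL units at each three-byte start code prefix. The prefix value 0x000001 and the stripping of zero bytes that precede the next prefix are assumptions based on the standard's byte stream description; `split_byte_stream` is a hypothetical helper, not part of any library.

```python
START_CODE_PREFIX = b"\x00\x00\x01"  # assumed three-byte pattern


def split_byte_stream(stream: bytes) -> list:
    """Return the NAL units found between start code prefixes."""
    units = []
    pos = stream.find(START_CODE_PREFIX)
    while pos != -1:
        start = pos + len(START_CODE_PREFIX)
        nxt = stream.find(START_CODE_PREFIX, start)
        end = len(stream) if nxt == -1 else nxt
        # Zero bytes before the next prefix are padding, not payload.
        units.append(stream[start:end].rstrip(b"\x00"))
        pos = nxt
    return units


demo_stream = b"\x00\x00\x01\x67\xaa\x00\x00\x00\x01\x65\xbb"
demo_units = split_byte_stream(demo_stream)
```

A packet-oriented transport would skip this step entirely, since each packet already delimits its NAL unit.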

22 Parameter Sets. A parameter set contains information that is expected to change rarely. Types of parameter sets: sequence parameter sets, which apply to a series of consecutive coded video pictures (a coded video sequence); picture parameter sets, which apply to one or more individual pictures. Parameter sets can be sent: once (ahead of the VCL NAL units) or many times (to provide robustness); in-band (on the same channel as the VCL NAL units) or out-of-band (on a different channel). [Figure: out-of-band transmission]

23 Access Units. A set of NAL units in a specified form is referred to as an access unit. The decoding of each access unit results in one decoded picture. An access unit can be composed of: an access unit delimiter, to aid in locating the start of the access unit; supplemental enhancement information (SEI), containing data such as picture timing information; the primary coded picture, a set of VCL NAL units that represent the samples of the video picture; redundant coded pictures, for use by a decoder in recovering from loss or corruption; an end of sequence, if the coded picture is the last picture of a coded video sequence; an end of stream, if the coded picture is the last coded picture in the entire NAL unit stream.

24 Video Coding Layer. The VCL design follows the so-called block-based hybrid video coding approach. There is no single coding element in the VCL that provides the majority of the significant improvement in compression efficiency in relation to prior video coding standards. [Block diagram: the input video signal is split into macroblocks of 16x16 pixels; coder control; transform/scaling/quantization; scaling and inverse transform; de-blocking filter; intra-frame prediction; motion compensation; motion estimation; intra/inter selection; entropy coding of control data, quantized transform coefficients and motion data; the embedded decoder produces the output video signal]

25 Video Coding Layer. The picture is split into blocks. Intra coding is used for the first picture or at a random access point; inter coding is used for all remaining pictures, or between random access points. The motion data is transmitted as side information. The residual of the prediction (intra or inter) is transformed, the transform coefficients are quantized, and the quantized transform coefficients are entropy coded and transmitted together with the side information.

26 Pictures, frames, and fields. A coded picture can represent either an entire frame or a single field. A frame of video can be considered to contain two interleaved fields. Interlaced frame: the two fields of a frame were captured at different time instants. Progressive frame: the two fields were captured at the same time instant. The coding representation in H.264/AVC is primarily agnostic with respect to this video characteristic.

27 Adaptive frame/field coding operation. In interlaced frames with regions of moving objects, two adjacent rows tend to show a reduced degree of statistical dependency. The H.264/AVC design allows either of the following decisions for coding a frame: frame mode (the two fields are combined) or field mode (the two fields are coded separately). The choice can be made adaptively for each frame and is referred to as Picture-Adaptive Frame/Field (PAFF) coding. In field mode, motion compensation uses reference fields, and the de-blocking filter is not used for horizontal edges of macroblocks. [Figure: moving regions use field mode; non-moving regions use frame mode]

28 Sampling. YCbCr color space: Y is called luma and represents brightness; Cb and Cr are called chroma and represent the deviation from gray toward blue and red. H.264/AVC uses a sampling structure called 4:2:0 sampling, with 8 bits of precision per sample: each chroma component has one fourth of the number of samples of the luma component (half in both the horizontal and vertical dimensions).

29 Macroblocks and Slices. Each picture is partitioned into fixed-size macroblocks, each covering 16x16 samples of the luma component and 8x8 samples of each of the two chroma components. A slice is a sequence of macroblocks that are processed in raster-scan order when Flexible Macroblock Ordering (FMO) is not used. A picture is a collection of one or more slices in H.264/AVC. Independence: each slice can be correctly decoded without use of data from other slices.

30 Flexible Macroblock Ordering. FMO uses the concept of slice groups. A slice group is a set of macroblocks defined by a macroblock-to-slice-group map, which is specified in the picture parameter set. A slice is a sequence of macroblocks within the same slice group. FMO is useful for concealment in video conferencing applications.

31 Slice Types. I slice: all macroblocks of the slice are coded using intra prediction, exploiting only spatial correlation. P slice: in addition to the coding types of the I slice, some macroblocks can be coded using inter prediction with backward references (I or P slices). B slice: in addition to the coding types available in a P slice, some macroblocks can be coded using inter prediction with forward references (I, P or B slices). The following two slice coding types are new. SP slice: coded such that efficient switching between different pre-coded pictures becomes possible. SI slice: allows an exact match of a macroblock in an SP slice, for random access and error recovery purposes.

32 Motivation for SP and SI slices. The best-effort nature of today's networks causes variations in the effective bandwidth available to a user. For video streaming, the server should adjust the source encoding parameters on the fly, or represent each sequence using multiple independent streams. Prior video coding standards: switching is possible only at I-frames. H.264: identical SP-frames can be obtained even when they are predicted using different reference frames.

33 Intra-frame Prediction. In all slice coding types, the following types of intra coding are supported: Intra_4x4 with chroma prediction, suited to areas of a picture with significant detail; Intra_16x16 with chroma prediction, suited to very smooth areas of a picture; I_PCM, which represents the values of anomalous picture content accurately. Intra prediction in H.264/AVC is always conducted in the spatial domain. Chroma samples use a prediction technique similar to that for the luma component in Intra_16x16 macroblocks. IDR picture: a picture composed of I slices only; it can be decoded without any reference, and no subsequent picture in the stream will require reference to pictures prior to the IDR picture.

34 Intra_4x4 mode. Labelling of prediction samples (4x4): the block samples are a to p (rows abcd, efgh, ijkl, mnop); the neighbouring samples are A to H above, I to L to the left, and M at the top-left corner. [Figure panels: mode 0 (vertical); mode 1 (horizontal); mode 2 (DC), with prediction (A+B+C+D+I+J+K+L)/8; mode 3 (diagonal down-left)]

35 Intra_4x4 mode. Labelling of prediction samples (4x4) as on the previous slide. [Figure panels: mode 4 (diagonal down-right); mode 5 (vertical-right); mode 6 (horizontal-down); mode 7 (vertical-left)]

36 Intra_4x4 mode. When samples E to H are not available, they are replaced by D. Labelling of prediction samples (4x4) as on the previous slides. [Figure panel: mode 8 (horizontal-up)]
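The three simplest Intra_4x4 modes can be sketched in a few lines. This is only an illustration of modes 0 to 2 using the sample labels from the slides (A to D above, I to L to the left); the directional modes 3 to 8 are omitted, the DC value follows the slide's (A+B+C+D+I+J+K+L)/8 with plain integer division (a real codec would typically add a rounding offset), and `predict_intra4x4` is a hypothetical name.

```python
def predict_intra4x4(mode, above, left):
    """Predict a 4x4 block from neighbours.

    above = [A, B, C, D] (row above the block)
    left  = [I, J, K, L] (column left of the block)
    Returns 4 rows of 4 predicted samples.
    """
    if mode == 0:  # vertical: each column copies the sample above it
        return [list(above) for _ in range(4)]
    if mode == 1:  # horizontal: each row copies the sample to its left
        return [[left[row]] * 4 for row in range(4)]
    if mode == 2:  # DC: every sample is the mean of the eight neighbours
        dc = (sum(above) + sum(left)) // 8
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0, 1 and 2 are sketched here")


above = [10, 20, 30, 40]
left = [50, 60, 70, 80]
vertical = predict_intra4x4(0, above, left)
horizontal = predict_intra4x4(1, above, left)
dc_block = predict_intra4x4(2, above, left)
```

An encoder would compute the residual for each candidate mode and keep the mode that is cheapest to code.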

37 Intra_16x16 and I_PCM mode. Intra_16x16: mode 0, vertical; mode 1, horizontal; mode 2, DC; mode 3, plane (a linear plane function is fitted to the upper and left-hand samples, for areas of smoothly varying luminance). I_PCM directly sends the values of the encoded samples, to represent them precisely.

38 Inter-frame Prediction in P Slices. Partitions with luma block sizes of 16x16, 16x8, 8x16 and 8x8 samples are supported by the syntax. When 8x8 partitions are chosen, one additional syntax element is transmitted for each 8x8 partition, specifying whether it is further split into 8x4, 4x8 or 4x4 blocks. The prediction signal is specified by a translational motion vector and a picture reference index. A maximum of sixteen motion vectors may be transmitted for a single P macroblock.

39 Algorithm. The encoder selects the best partition size for each part of the frame, to minimize the coded residual and motion vectors. The macroblock partitions chosen for each area are shown superimposed on the residual frame. Where there is little change between the frames (residual appears grey), a 16x16 partition is chosen; where there is detailed motion (residual appears black or white), smaller partitions are more efficient. [Figure: residual (no motion compensation)]

40 Effects (1/2). [Figure panels: frame F(n); frame F(n-1); residual (no motion compensation); residual (16x16 block size)]

41 Effects (2/2). [Figure panels: residual (8x8 block size); residual (4x4 block size); residual (4x4 block size, half pixel); residual (4x4 block size, quarter pixel)]

42 Example (1/2). [Figure panels: frame F(n); reconstructed reference frame F(n-1); residual F(n) - F(n-1) (no motion compensation); 16x16 motion vector field]

43 Example (2/2). [Figure panels: motion compensation reference frame; motion compensation residual frame]

44 Motion Estimation Accuracy. The accuracy of motion compensation is in units of one quarter of the distance between luma samples. Integer-sample position: the prediction signal consists of the corresponding samples of the reference picture. Non-integer-sample position: the corresponding sample is obtained by interpolation. The prediction values at half-sample positions are obtained by applying a one-dimensional 6-tap FIR Wiener filter horizontally and vertically.

45 Motion Estimation Accuracy. Half-sample positions (aa, bb, b, s, gg, hh and cc, dd, h, m, ee, ff) are derived by first calculating intermediate values, e.g. b1 = (E - 5F + 20G + 20H - 5I + J) and h1 = (A - 5C + 20G + 20M - 5R + T), and then b = (b1 + 16) >> 5 and h = (h1 + 16) >> 5, with the results clipped to the range 0 to 255. Position j: j1 = cc1 - 5 dd1 + 20 h1 + 20 m1 - 5 ee1 + ff1, then j = (j1 + 512) >> 10. Quarter-sample positions (a, c, d, n, f, i, k, q) are derived by averaging, with upward rounding, the two nearest samples at integer and half-sample positions, e.g. a = (G + b + 1) >> 1. Quarter-sample positions (e, g, p, r) are derived by averaging, with upward rounding, the two nearest samples at half-sample positions in the diagonal direction, e.g. e = (b + h + 1) >> 1.
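The interpolation rules above can be tried out with a small sketch. It assumes 8-bit samples (hence the clip to 0..255) and the rounding offsets 16 and 512 that accompany the shifts by 5 and 10; the sample values and helper names are made up for illustration.

```python
def clip255(x):
    """Clip to the 8-bit sample range (assumes 8-bit video)."""
    return max(0, min(255, x))


def six_tap(s):
    """Unscaled 6-tap filter (1, -5, 20, 20, -5, 1) over six samples."""
    return s[0] - 5 * s[1] + 20 * s[2] + 20 * s[3] - 5 * s[4] + s[5]


def half_sample(s):
    """Half-sample value such as b or h, from six integer samples in a row."""
    return clip255((six_tap(s) + 16) >> 5)


def centre_half_sample(intermediates):
    """Position j, from six unscaled intermediate values (cc1 .. ff1)."""
    return clip255((six_tap(intermediates) + 512) >> 10)


def quarter_sample(p, q):
    """Average of the two nearest samples with upward rounding."""
    return (p + q + 1) >> 1


# Hypothetical row of integer samples E, F, G, H, I, J.
row = [10, 20, 30, 40, 50, 60]
b = half_sample(row)            # half-way between G and H
a = quarter_sample(row[2], b)   # quarter-way between G and b
```

On a flat area the filter taps sum to 32, so the interpolated values equal the surrounding samples, which is a quick sanity check on the coefficients.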

46 Motion Estimation Accuracy. The prediction values for the chroma components are always obtained by bilinear interpolation. For chroma the resolution is halved (4:2:0), so quarter-pixel luma accuracy corresponds to one-eighth-pixel chroma accuracy. With A, B, C, D the four neighbouring integer-position chroma samples and (dx, dy) the offset in eighth-sample units: a = round{[(8 - dx)(8 - dy) A + dx (8 - dy) B + (8 - dx) dy C + dx dy D] / 64}. Example (dx = 2, dy = 3): a = round[(30A + 10B + 18C + 6D) / 64].
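A short sketch of the bilinear chroma formula, reproducing the slide's example weights for dx = 2, dy = 3. The rounding is implemented as adding 32 before dividing by 64, which is one common way to round to nearest and is an assumption here; the function names are illustrative.

```python
def chroma_weights(dx, dy):
    """Weights applied to the four neighbours A, B, C, D (they sum to 64)."""
    return ((8 - dx) * (8 - dy), dx * (8 - dy), (8 - dx) * dy, dx * dy)


def chroma_sample(A, B, C, D, dx, dy):
    """Bilinear interpolation at an offset of (dx, dy) eighths of a sample."""
    wa, wb, wc, wd = chroma_weights(dx, dy)
    return (wa * A + wb * B + wc * C + wd * D + 32) >> 6  # round(x / 64)


weights = chroma_weights(2, 3)
flat = chroma_sample(100, 100, 100, 100, 2, 3)
```

With dx = dy = 0 the weights reduce to (64, 0, 0, 0), so integer positions return sample A unchanged.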

47 Multi-picture Prediction. Multi-picture motion compensation uses previously encoded pictures as references, and allows up to 32 reference pictures to be used in some cases. It gives a very significant bit rate reduction for scenes with: rapid repetitive flashing; back-and-forth scene cuts; uncovered background areas.

48 Inter-frame Prediction in B Slices. The concept of B slices is generalized in H.264/AVC. Other pictures can reference pictures containing B slices for motion-compensated prediction. Some macroblocks or blocks may use a weighted average of two distinct motion-compensated prediction values to build the prediction signal. In B slices, four different types of inter-picture prediction are supported: list 0 (backward); list 1 (forward); bi-predictive, a weighted average of the motion-compensated list 0 and list 1 prediction signals; direct prediction, inferred from previously transmitted syntax elements. It is also possible to have both motion predictions from the past, or both from the future.

49 Transform: types. Each residual macroblock is transformed, quantized and coded. H.264 uses a smaller transform size, and three transforms depending on the type of residual data to be coded: a 4x4 transform for the luma DC coefficients in Intra_16x16 macroblocks; a 2x2 transform for the chroma DC coefficients; a 4x4 transform for all other blocks. Adaptive block size transform mode: further transforms can optionally be chosen depending on the motion compensation block size (4x8, 8x4, 8x8, 16x8, etc.).

50 Transform: Order. For a macroblock coded in 16x16 Intra mode: block -1 holds the DC coefficient of each 4x4 luma block; blocks 0-15 are the luma residual blocks; blocks 16-17 hold the DC coefficients from the Cb and Cr components; blocks 18-25 are the chroma residual blocks. [Figure note: coding of smooth areas]

51 4x4 residual transform. This transform operates on 4x4 blocks of residual data (labelled 0-15 and 18-25). Fundamental differences from the DCT: it is an integer transform; the inverse transform is fully specified, so mismatch between encoders and decoders should not occur; the core transform requires only additions and shifts; a scaling multiplication is integrated into the quantizer; transform and quantization can be carried out using 16-bit integer arithmetic. Reasons for the small size: better prediction methods leave less spatial correlation in the residual; visual benefits (less noise around edges); fewer computations and a smaller processing word length.

52 Transform: Order. The 4x4 DCT of an input array X can be written as Y = (C X C^T) ⊗ E, where C X C^T is the core 2-D transform, E is a matrix of scaling factors applied element by element, and d = c/b. [The matrices are not preserved in this transcript]

53 Transform: Order. To simplify the implementation of the transform while keeping it orthogonal, the parameters must be modified. The inverse transform is given by: [equation not preserved in this transcript]
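Since the matrices did not survive in the transcript, the following sketch uses the integer core matrix that is commonly published for the H.264 4x4 forward transform; treat it as an assumption rather than as the slide's own content. It computes only the core part W = Cf X Cf^T, leaving the scaling factors to the quantizer as the slides describe. All names are illustrative.

```python
# Commonly published H.264 forward core matrix (an assumption; not in the transcript).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Plain 4x4 integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def core_transform(x):
    """W = Cf . X . Cf^T using integer arithmetic only."""
    return matmul(matmul(CF, x), transpose(CF))


flat_block = [[1] * 4 for _ in range(4)]
w = core_transform(flat_block)     # all energy ends up in the DC term
gram = matmul(CF, transpose(CF))   # diagonal, so the rows are orthogonal
```

The rows of this matrix are orthogonal but not of equal norm (the Gram matrix is diag(4, 10, 4, 10)), which is why a separate scaling step is needed and why it can be folded into quantization.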

54 Quantization. Requirements: avoid division and/or floating-point arithmetic; incorporate the post- and pre-scaling matrices Ef and Ei described above. A total of 52 values of Qstep are supported by the standard, indexed by a Quantization Parameter, QP. Qstep doubles in size for every increment of 6 in QP, so it increases by about 12% for each increment of 1 in QP. QP may be different for luma and chroma (an offset can be signalled in a picture parameter set).
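The relation between QP and Qstep can be sketched as a six-entry base table that doubles every six steps. The six base values below are the ones commonly tabulated for H.264 (QP 0 to 5) and are an assumption here, since the slide does not list them; `qstep` is a hypothetical helper.

```python
# Commonly tabulated Qstep values for QP = 0..5 (an assumption, not from the slide).
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantizer step size for QP in 0..51: doubles for every increase of 6."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


steps = [qstep(qp) for qp in range(52)]   # the 52 supported values
```

Doubling every six steps means each unit step multiplies Qstep by roughly 2^(1/6), about 1.12, although the tabulated ratios between neighbouring entries are not all identical.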

55 DC Components Transform. Luma DC coefficients (Intra_16x16 macroblocks): 4x4 Hadamard transform. Chroma DC coefficients (any macroblock): 2x2 Hadamard transform. [The transform matrices are not preserved in this transcript]

56 Scanning. The quantized transform coefficients of a block are generally scanned in a zig-zag fashion and transmitted using entropy coding methods. The 2x2 DC coefficients of the chroma component are scanned in raster-scan order and transmitted using entropy coding methods.
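A minimal sketch of the zig-zag scan for a 4x4 block. The scan order below is the usual frame-mode order expressed as raster indices (row * 4 + column) and is an assumption, since the slide shows it only as a figure. The sample block is hypothetical, chosen so that the scan yields the reordered sequence 0,3,0,1,-1,-1,0,1,0,... used in the CAVLC example later in the deck.

```python
# Frame-mode zig-zag order as raster indices (assumed; the slide shows a figure).
ZIGZAG_4X4 = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]


def zigzag_scan(block):
    """Reorder a 4x4 block (list of 4 rows) into a 16-element list."""
    flat = [value for row in block for value in row]
    return [flat[i] for i in ZIGZAG_4X4]


# Hypothetical quantized block.
block = [
    [0, 3, -1, 0],
    [0, -1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
reordered = zigzag_scan(block)
```

The scan groups the low-frequency coefficients at the start of the array, so the trailing part is mostly zeros and is cheap to code.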

57 De-Blocking Filter. Benefits of applying the filter to every decoded macroblock: reduced blocking distortion; improved image appearance; better motion-compensated prediction of further frames. Filtering is applied to the vertical or horizontal edges of 4x4 blocks, in this order: 1) filter the 4 vertical boundaries of the luma component (a, b, c, d); 2) filter the 4 horizontal boundaries of the luma component (e, f, g, h); 3) filter the 2 vertical boundaries of each chroma component (i, j); 4) filter the 2 horizontal boundaries of each chroma component (k, l).

58 Filtering Choice. The filtering outcome depends on the boundary strength (BS) and on the gradient across the boundary. Rules for BS, where p and q are the blocks on either side of the boundary:
BS = 4: p or q is intra coded and the boundary is a macroblock boundary.
BS = 3: p or q is intra coded and the boundary is not a macroblock boundary.
BS = 2: neither p nor q is intra coded; p or q contains coded coefficients.
BS = 1: neither p nor q is intra coded; neither p nor q contains coded coefficients; p and q have different reference frames, a different number of reference frames, or different motion vector values.
BS = 0: neither p nor q is intra coded; neither p nor q contains coded coefficients; p and q have the same reference frame and identical motion vectors (no filtering).
The filter is stronger where there is likely to be significant blocking distortion (high values of BS).
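The rule table above translates directly into a small decision function. This is a sketch of the slide's simplified rules only; the boolean parameters are hypothetical names for the conditions in the table, not fields of any real decoder API.

```python
def boundary_strength(p_intra, q_intra, on_macroblock_edge,
                      p_has_coeffs, q_has_coeffs,
                      same_reference_frames, same_motion_vectors):
    """Return BS (0..4) for the edge between blocks p and q."""
    if p_intra or q_intra:
        return 4 if on_macroblock_edge else 3
    if p_has_coeffs or q_has_coeffs:
        return 2
    if not (same_reference_frames and same_motion_vectors):
        return 1
    return 0  # no filtering


bs_intra_edge = boundary_strength(True, False, True, False, False, True, True)
bs_static = boundary_strength(False, False, False, False, False, True, True)
```

The ordering of the tests mirrors the table: each lower BS value is reached only when the conditions for all higher values have failed.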

59 Filter decision. A group of samples is filtered only if BS > 0 and the following conditions are all satisfied: |p0 - q0| < α(QP), |p1 - p0| < β(QP), |q1 - q0| < β(QP), with β(QP) < α(QP). Small QP: anything other than a very small gradient across the boundary is likely to be due to image features, so α(QP) and β(QP) are low. Large QP: blocking distortion is likely to be more significant, so α(QP) and β(QP) are high and more filtering takes place.
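The decision rule can be sketched as a single predicate. The thresholds α(QP) and β(QP) come from tables in the standard that are not reproduced in the slides, so the caller passes them in; the numeric values in the usage lines are made up for illustration.

```python
def should_filter(bs, p1, p0, q0, q1, alpha, beta):
    """True if the samples across the edge p1 p0 | q0 q1 should be filtered.

    alpha and beta are the QP-dependent thresholds (supplied by the caller).
    """
    return (bs > 0
            and abs(p0 - q0) < alpha
            and abs(p1 - p0) < beta
            and abs(q1 - q0) < beta)


# Hypothetical thresholds for some mid-range QP.
ALPHA, BETA = 20, 5
small_step = should_filter(2, 60, 62, 70, 71, ALPHA, BETA)    # likely blocking
real_edge = should_filter(2, 60, 62, 120, 121, ALPHA, BETA)   # likely image content
```

A small step across the boundary with flat samples on either side is treated as a coding artifact, while a large step is assumed to be a genuine image edge and left alone.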

60 Filter implementation. I. BS = 1, 2, 3: a 4-tap linear filter is applied with inputs p1, p0, q0 and q1, producing filtered outputs P0 and Q0. For luma only: if |p2 - p0| < β(QP), a 4-tap linear filter is applied with inputs p2, p1, p0 and q0, producing filtered output P1; if |q2 - q0| < β(QP), a 4-tap linear filter is applied with inputs q2, q1, q0 and p0, producing filtered output Q1. II. BS = 4: if |p2 - p0| < β(QP) and |p0 - q0| < α(QP)/4, then P0 is produced by 5-tap filtering of p2, p1, p0, q0 and q1; P1 is produced by 4-tap filtering of p2, p1, p0 and q0; for luma only, P2 is produced by 5-tap filtering of p3, p2, p1, p0 and q0. Otherwise, P0 is produced by 3-tap filtering of p1, p0 and q1. The same applies to the q samples.

61 De-blocking: Example. [Figure panels: original frame; reconstructed frame, QP=32, no filter; reconstructed frame, QP=32, with filter; reconstructed frame, QP=36, no filter; reconstructed frame, QP=36, with filter]

62 Entropy Coding. Parameters that need to be encoded and transmitted: macroblock type (prediction method); QP; motion data (reference frame and motion vector); coded block pattern (blocks containing coded coefficients); residual data. H.264/MPEG-4 AVC uses a number of techniques for entropy coding: Exp-Golomb codes, for all syntax elements except the quantized transform coefficients; Context Adaptive Variable Length Coding (CAVLC), for the quantized transform coefficients; Context Adaptive Binary Arithmetic Coding (CABAC).

63 Exp-Golomb Codes. Exp-Golomb codes (Exponential Golomb codes) are variable-length codes with a regular construction. A single infinite-extent codeword table is used; only the mapping to that table is customized according to the data statistics. Each codeword is constructed as [M zeros][1][INFO], where INFO is an M-bit field, giving a codeword length of 2M+1 bits. Encoding: M = floor(log2(code_num + 1)), INFO = code_num + 1 - 2^M. Decoding: code_num = 2^M + INFO - 1.
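The construction above is short enough to write out in full. This sketch represents codewords as strings of '0' and '1' for readability (a real bitstream writer would pack bits); the function names are illustrative.

```python
def exp_golomb_encode(code_num: int) -> str:
    """Build [M zeros][1][INFO] for a non-negative code_num."""
    m = (code_num + 1).bit_length() - 1        # floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits: str) -> int:
    """Decode one codeword found at the start of `bits`."""
    m = bits.index("1")                        # count the leading zeros
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0
    return (1 << m) + info - 1


codewords = [exp_golomb_encode(n) for n in range(5)]
```

Small values get short codewords ("1" for 0, three bits for 1 and 2), and the length grows by two bits each time code_num + 1 crosses a power of two.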

64 Exp-Golomb Codes: Value mapping. Each element v is assigned a type reflecting how the data is to be mapped. Unsigned exponential, ue(v): mapping code_num = v; used for macroblock type and reference frame index. Signed exponential, se(v): mapping code_num = 2|v| for v ≤ 0 and code_num = 2|v| - 1 for v > 0; used for motion vector difference and QP. Mapped exponential, me(v): mapping specified in the standard; used for the coded block patterns. Truncated exponential, te(v). Each mapping is designed to produce short codewords for frequently occurring values and longer codewords for less common parameter values. Table for inter-predicted macroblocks: coded_block_pattern indicates which 8x8 blocks in a macroblock contain non-zero coefficients.
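The signed mapping se(v) and its inverse can be sketched as follows; the function names are hypothetical. The mapping interleaves positive and negative values so that values close to zero receive the smallest code numbers.

```python
def se_to_code_num(v: int) -> int:
    """Signed value -> code_num: 0, 1, -1, 2, -2, ... map to 0, 1, 2, 3, 4, ..."""
    return 2 * abs(v) if v <= 0 else 2 * abs(v) - 1


def code_num_to_se(code_num: int) -> int:
    """Inverse mapping: odd code numbers are positive, even ones are <= 0."""
    if code_num % 2 == 0:
        return -(code_num // 2)
    return (code_num + 1) // 2


mapped = [se_to_code_num(v) for v in (0, 1, -1, 2, -2)]
```

Because motion vector differences and QP changes cluster around zero, this ordering gives the most frequent values the shortest Exp-Golomb codewords.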

65 CAVLC. A method for coding residual, zig-zag ordered blocks of transform coefficients. VLC tables for various syntax elements are switched depending on already transmitted syntax elements. CAVLC takes advantage of several characteristics of quantized 4x4 blocks: blocks are sparse (containing mostly zeros); the highest-frequency non-zero coefficients after the zig-zag scan are often sequences of +/-1 (trailing ones, or T1s); the number of non-zero coefficients in neighbouring blocks is correlated; the level (magnitude) of non-zero coefficients tends to be higher at the start of the reordered array (near the DC coefficient) and lower towards the higher frequencies.

66 CAVLC encoding. CAVLC encoding of a block of transform coefficients proceeds as follows: 1. Encode the number of coefficients and trailing ones in a 4x4 block. 2. Encode the sign of each T1. 3. Encode the levels of the remaining non-zero coefficients. 4. Encode the total number of zeros before the last coefficient. 5. Encode each run of zeros.

67 CAVLC: Steps (1/3). Step 1: encode the number of coefficients (TotalCoeffs) and T1s. TotalCoeffs ranges from 0 to 16 (in a 4x4 block); T1s ranges from 0 to 3 (the maximum value allowed). Coding: there are 4 choices of look-up table for coeff_token: Num_VLC0, biased towards small numbers of coefficients; Num_VLC1, biased towards medium numbers of coefficients; Num_VLC2, biased towards higher numbers of coefficients; Num_FLC, a fixed 6-bit length code. The choice depends on the number of non-zero coefficients in the upper and left-hand previously coded blocks, NU and NL (context adaptivity): if both U and L are available, N = (NU + NL)/2; if only U is available, N = NU; if only L is available, N = NL; if neither is available, N = 0.
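The context rule for choosing the coeff_token table can be sketched as below. Two details are assumptions not stated on the slide: the average is rounded half up, and the table is switched at N = 2, 4 and 8. Unavailable neighbours are represented by None; the function names are hypothetical.

```python
def coeff_token_context(n_upper, n_left):
    """Compute N from the neighbours' non-zero coefficient counts.

    Pass None for a neighbour that is not available.
    """
    if n_upper is not None and n_left is not None:
        return (n_upper + n_left + 1) >> 1   # rounding is an assumption
    if n_upper is not None:
        return n_upper
    if n_left is not None:
        return n_left
    return 0


def coeff_token_table(n):
    """Pick the look-up table for N (thresholds are an assumption)."""
    if n < 2:
        return "Num_VLC0"
    if n < 4:
        return "Num_VLC1"
    if n < 8:
        return "Num_VLC2"
    return "Num_FLC"


table = coeff_token_table(coeff_token_context(1, 4))
```

Busy neighbours push the encoder towards tables that spend fewer bits on large coefficient counts, which is the sense in which the coding is context adaptive.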

68 CAVLC: Steps (2/3)
Step 2: Encode the sign of each T1
- For each T1, up to three, a single bit encodes the sign (0 = +, 1 = -)
- Encoding is in reverse order, starting with the highest-frequency T1
Step 3: Encode the levels of the remaining non-zero coefficients
- The level (sign and magnitude) of each remaining non-zero coefficient in the block is encoded in reverse order
- The choice of VLC table depends on the magnitude of each successively coded level (context adaptivity):
1. Initialise the table to Level_VLC0
2. Encode the highest-frequency coefficient
3. If the magnitude is larger than a threshold, move up to the next VLC table
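The table adaptation in Step 3 can be sketched as below. The slide does not list the thresholds; the values 0, 3, 6, 12, 24, 48 are an assumption based on commonly quoted descriptions of CAVLC, and the sketch ignores the special case in which the standard starts from a higher table. The function name is hypothetical.

```python
# Assumed thresholds for leaving Level_VLC0 .. Level_VLC5; the last table is never left.
THRESHOLDS = [0, 3, 6, 12, 24, 48]


def level_tables(levels_reverse_order):
    """Return the Level_VLC table index used for each level, highest frequency first."""
    table = 0  # start from Level_VLC0
    used = []
    for level in levels_reverse_order:
        used.append(table)
        # Move up one table when the magnitude just coded exceeds the threshold.
        if table < len(THRESHOLDS) and abs(level) > THRESHOLDS[table]:
            table += 1
    return used


if __name__ == "__main__":
    # Remaining levels of the block 0,3,0,1,-1,-1,0,1,... after the three T1s.
    print(level_tables([1, 3]))
```

Because levels tend to grow towards the DC coefficient, the table index only ever increases while scanning in reverse order.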

69 CAVLC: Steps (3/3)
Step 4: Encode the total number of zeros before the last coefficient
- The number of all zeros preceding the highest non-zero coefficient is coded with a VLC
Step 5: Encode each run of zeros
- The number of zeros preceding each non-zero coefficient (run_before) is encoded in reverse order
- The VLC for each run of zeros is chosen depending on the number of zeros that have not yet been encoded (ZerosLeft) and on the value of the run_before parameter

70 CAVLC: example (1/2)
4x4 Block
Reordered Block: 0,3,0,1,-1,-1,0,1,0,0,0,0,0,0,0,0
TotalCoeff = 5; TotalZeros = 3; T1s = 3 (max value)
Encoding
Transmitted bitstream (24 bits)
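The parameters quoted for this block can be derived mechanically from the reordered array. The sketch below computes them without producing any codewords; the function name is hypothetical, and stopping the run_before list once ZerosLeft reaches zero is an assumption about encoder behaviour that the slides do not state.

```python
def cavlc_parameters(block, max_t1s=3):
    """Return (TotalCoeff, TotalZeros, T1s, run_before list in reverse order)."""
    nz = [i for i, c in enumerate(block) if c != 0]  # positions of non-zero coefficients
    total_coeff = len(nz)
    # Zeros that precede the highest-frequency non-zero coefficient.
    total_zeros = (nz[-1] + 1 - total_coeff) if nz else 0

    # Trailing ones: +/-1 values counted from the highest frequency, at most max_t1s.
    t1s = 0
    for i in reversed(nz):
        if abs(block[i]) == 1 and t1s < max_t1s:
            t1s += 1
        else:
            break

    # run_before for each coefficient except the lowest-frequency one, whose run
    # is implied by TotalZeros; stop once no zeros are left to signal.
    runs = []
    zeros_left = total_zeros
    for k in range(len(nz) - 1, 0, -1):
        if zeros_left == 0:
            break
        run = nz[k] - nz[k - 1] - 1
        runs.append(run)
        zeros_left -= run
    return total_coeff, total_zeros, t1s, runs


if __name__ == "__main__":
    block = [0, 3, 0, 1, -1, -1, 0, 1] + [0] * 8
    print(cavlc_parameters(block))
```

For the block on this slide the fourth value of magnitude 1 is not counted as a T1, because the limit of three has already been reached.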

71 CAVLC: example (2/2)
Decoding
Input string:
Values added to the output array at each stage are underlined.
The decoder has inserted two zeros; however, TotalZeros is equal to 3, so one more zero is inserted before the lowest coefficient, making the final output array: 0,3,0,1,-1,-1,0,1,…
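The final reconstruction step can be sketched once the levels, the run_before values and TotalZeros have been decoded. This is an illustration only: it starts from already-decoded values rather than from the bitstream, and the function name and argument layout are hypothetical.

```python
def rebuild_block(levels_rev, runs_rev, total_zeros, size=16):
    """Rebuild the reordered block.

    levels_rev: non-zero coefficients, highest frequency first.
    runs_rev:   decoded run_before values in the same order (may be shorter).
    """
    out = []  # built from the highest frequency downwards, reversed at the end
    zeros_left = total_zeros
    last = len(levels_rev) - 1
    for k, level in enumerate(levels_rev):
        out.append(level)
        if k < len(runs_rev):
            run = runs_rev[k]
        elif k == last:
            run = zeros_left  # remaining zeros go before the lowest coefficient
        else:
            run = 0
        out.extend([0] * run)
        zeros_left -= run
    out.reverse()
    return out + [0] * (size - len(out))


if __name__ == "__main__":
    print(rebuild_block([1, -1, -1, 1, 3], [1, 0, 0, 1], total_zeros=3))
```

With the values of this example, the decoded runs account for two zeros and the third is placed before the coefficient 3, as described on the slide.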

72 CABAC
CABAC uses arithmetic coding:
- Effective use of probability models of the occurrence of symbols
- Particularly beneficial for symbol probabilities greater than 0.5
The use of adaptive codes permits adaptation to non-stationary symbol statistics:
- Easily adapts to changing statistical characteristics of the data to be coded
Context modelling: the statistics of already-coded syntax elements are used to estimate the conditional probabilities.
CABAC provides a bit-rate reduction of between 5% and 15% over CAVLC when coding TV signals at the same quality.
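A short calculation illustrates why probabilities above 0.5 matter. A variable-length code must spend at least one whole bit per symbol, whereas an arithmetic coder can approach the ideal cost of -log2(p) bits. The sketch below only evaluates these information-theoretic quantities; it is not part of CABAC itself and the function names are hypothetical.

```python
import math


def ideal_bits(p):
    """Ideal code length, in bits, of a symbol that occurs with probability p."""
    return -math.log2(p)


def binary_entropy(p):
    """Average ideal cost, in bits per symbol, of a binary source with P(1) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return p * ideal_bits(p) + (1 - p) * ideal_bits(1 - p)


if __name__ == "__main__":
    for p in (0.5, 0.7, 0.9, 0.99):
        print(f"p={p}: likely symbol {ideal_bits(p):.3f} bits, "
              f"entropy {binary_entropy(p):.3f} bits/symbol, VLC >= 1 bit/symbol")
```

At p = 0.5 one bit per symbol is already optimal; as p grows, the gap between one bit and the entropy widens, which is where arithmetic coding pays off.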

73 CABAC basis
Encoding with CABAC consists of three stages:
1. Binarization: needed for syntax elements that are non-binary valued
2. Context modeling: a model is selected; the choice may depend on previously encoded syntax elements or bins
3. Adaptive binary arithmetic coding: the bin sequence is coded and the probabilities are updated

74 Binarization
Requirements for a successful context coding method:
- Fast and accurate estimation of conditional probabilities
- The computational complexity must be kept at a minimum
Binarization is a pre-processing step that reduces the alphabet size of the syntax elements.
The design of binarization schemes in CABAC (mostly) relies on a few basic code trees:
- Unary or Truncated Unary code: a value x > 0 is coded as x "1" bits plus a terminating "0" bit (unary)
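The unary code tree can be sketched directly from that description. The truncated variant is not spelled out on the slide; the version below assumes the usual definition, in which the largest value S drops the terminating "0". Function and parameter names are hypothetical.

```python
def unary(x):
    """Unary bin string: x '1' bits followed by a terminating '0'."""
    if x < 0:
        raise ValueError("x must be non-negative")
    return "1" * x + "0"


def truncated_unary(x, s):
    """Truncated unary with maximum value s (assumed definition):
    like unary, but the terminating '0' is omitted when x == s."""
    if not 0 <= x <= s:
        raise ValueError("x must satisfy 0 <= x <= s")
    return "1" * x if x == s else unary(x)


if __name__ == "__main__":
    for x in range(4):
        print(x, unary(x), truncated_unary(x, 3))
```

Each bin of the resulting string is then coded as a binary decision, so the arithmetic coder only ever deals with a two-symbol alphabet.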

75 Context Modeling
A model probability distribution is assigned to the given symbols, to drive the actual coding engine to generate a sequence of bits as a coded representation of the symbols.
- Define a modeling function F: T → C operating on a set T of past symbols
- For each symbol x to be coded, a conditional probability p(x|F(z)) is estimated according to the already coded neighboring symbols z ∈ T
- After encoding x, the probability model is updated with the value of the encoded symbol x
Context template consisting of two neighboring syntax elements A and B, to the left and on top of the current syntax element C.
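The estimate-then-update loop can be sketched with a deliberately simple model. This is not the CABAC probability state machine: it uses plain occurrence counts per context, and the modeling function F(A, B) = A + B for binary neighbours is a hypothetical choice made only for illustration.

```python
from collections import defaultdict


class ContextModel:
    """Count-based adaptive model for a binary symbol, one model per context."""

    def __init__(self):
        # counts[context] = [occurrences of 0, occurrences of 1], starting at 1 each
        self.counts = defaultdict(lambda: [1, 1])

    @staticmethod
    def context(a, b):
        """Modeling function F: maps the neighbours (A, B) to a context index."""
        return a + b

    def probability(self, x, a, b):
        """Estimate p(x | F(A, B)) from the symbols coded so far."""
        c0, c1 = self.counts[self.context(a, b)]
        return (c1 if x else c0) / (c0 + c1)

    def update(self, x, a, b):
        """After coding x, update the model selected by the same context."""
        self.counts[self.context(a, b)][x] += 1


if __name__ == "__main__":
    model = ContextModel()
    for _ in range(3):
        model.update(1, a=1, b=1)
    print(model.probability(1, a=1, b=1), model.probability(1, a=0, b=0))
```

Symbols coded in one context do not disturb the statistics of another, which is what lets the coder exploit the correlation with the neighbours A and B.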

76 Profiles and Levels
Profile: defines a set of coding tools or algorithms that can be used in generating a conforming bit-stream.
Level: places constraints on certain key parameters of the bitstream.
Decoders conforming to a profile must support all features in that profile.
In H.264/AVC, three profiles are defined (plus extensions):
- Baseline
- Main
- Extended
Fidelity Range Extensions: four High Profile versions (High, High 10, High 4:2:2, and High 4:4:4), for high-quality uses.
Fifteen levels are defined, specifying upper limits for:
- picture size
- decoder-processing rate
- size of the multi-picture buffers
- video bit rate
- video buffer size

77 Profile features
Baseline profile supports all features in H.264/AVC except the following two feature sets:
Set 1
- B slices
- Weighted prediction
- CABAC
- Field coding
- Picture or macroblock adaptive switching between frame and field coding
Set 2
- SP/SI slices
Main profile supports Set 1, and does not support the FMO and redundant pictures features.
Extended profile supports all features except for CABAC.

78 Profile Map

79 Application Areas
Conversational services typically operate below 1 Mbps, with low latency requirements. Baseline profile for the following applications:
- H.320 conversational services with circuit-switched ISDN-based video conferencing
- 3GPP conversational H.324/M services
- H.323 conversational services over the Internet with best-effort IP/RTP protocols
- 3GPP conversational services using IP/RTP for transport and SIP for session set-up
Entertainment video applications operate between 1 and 8 Mbps, with moderate latency (0.5 to 2 seconds). Main profile for the following applications:
- Broadcast via satellite, cable, terrestrial, or DSL
- DVD for standard and high-definition video
- Video on demand via various channels
Streaming services operate at 50 kbps to 1.5 Mbps, with latency of 2 or more seconds. Baseline or Extended profile, depending on whether the environment is wired or wireless:
- 3GPP streaming using IP/RTP for transport and RTSP: Baseline profile
- Streaming over the wired Internet using the IP/RTP protocol and RTSP for session set-up: Extended profile

80 Differences

81 Complexity of Codec Design
Codec design includes relaxation of traditional bounds on complexity (memory and computation). Rough guess: 2-3x decoding power increase relative to MPEG-2, 3-4x for encoding.
Problem areas:
- Smaller block sizes for motion compensation (cache access issues)
- Longer filters for motion compensation (more memory access)
- Multi-frame motion compensation (more memory for reference frame storage)
- More segmentations of the macroblock to choose from (more searching in the encoder)
- More methods of predicting intra data (more searching)
- Arithmetic coding (adaptivity, computation on output bits)

82 Comparison (1/2)
Figure: Foreman QCIF 10Hz. Quality Y-PSNR [dB] versus bit-rate [kbit/s] for MPEG-2, H.263, MPEG-4 and JVT/H.264/AVC.

83 Comparison (2/2)
Figure: Tempete CIF 30Hz. Quality Y-PSNR [dB] versus bit-rate [kbit/s] for MPEG-2, H.263, MPEG-4 and JVT/H.264/AVC.

84 Test Set Results for Perceptual Quality
Informal perceptual tests. At the same PSNR, people generally prefer JVT:
- Small motion compensation block size (breaks up block structure)
- Small transform block size (breaks up block structure, reduces ringing)
- In-loop deblocking filter
By how much?
- Needs further study
- No rigorous testing reported
- 10-15% might be a good guess


