City University of Hong Kong

- Moving Picture Experts Group (MPEG) - established in 1988 by the joint ISO/IEC technical committee on IT
- Mission - to develop standards for the coded representation of moving pictures and audio at bit rates of up to 1.5 Mb/s
- MPEG-1 was issued in 1993; MPEG-2 (1994) offers higher quality (not lower than NTSC and PAL) at bit rates between 2 and 10 Mb/s
- Applications - digital CATV and terrestrial digital broadcasting distribution, video recording and retrieval
- Lossy compression - trades off image quality against bit rate according to objective or subjective criteria
- Video sequences usually contain large statistical redundancies in both the temporal and spatial directions
- Intraframe coding and interframe coding
- Subsampling of chrominance - the human eye is more sensitive to luminance than to chrominance
Encoding of a single picture - similar to JPEG:
- Discrete Cosine Transform (DCT) - converts the spatial domain to the frequency domain
- Quantization of the spectral coefficients
- DPCM to encode the DC terms
- Zigzag scan to group zeros into long sequences, followed by run-length coding
- Lossless Variable Length Coding (VLC) to encode the AC coefficients
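The zigzag-scan/run-length step can be sketched as follows. The (run, level) pair format and the function name are illustrative assumptions, not the exact MPEG bitstream syntax; the real standard couples the pairs with a VLC table and an end-of-block code.

```c
/* Run-length coding of zigzag-ordered, quantized AC coefficients.
   Emits (run, level) pairs: `run` zeros followed by one non-zero
   `level`. Trailing zeros are left for an end-of-block code.
   Illustrative sketch only. */
int run_length_encode(const int *coeffs, int n, int out_run[], int out_level[])
{
    int pairs = 0, run = 0;
    for (int i = 0; i < n; i++) {
        if (coeffs[i] == 0) {
            run++;                       /* extend the current run of zeros */
        } else {
            out_run[pairs] = run;        /* zeros preceding this coefficient */
            out_level[pairs] = coeffs[i];
            pairs++;
            run = 0;
        }
    }
    return pairs;                        /* number of (run, level) pairs */
}
```

For example, the coefficient sequence 5, 0, 0, -3, 0, 0, 0, 2 encodes to the pairs (0, 5), (2, -3), (3, 2).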
- Removes temporal redundancies between frames
- Used extensively in MPEG-1 and MPEG-2
- Based on estimation of the motion between video frames
- Motion vectors describe the displacement of pixels from one frame to the next
- The spatial correlation between motion vectors is high, so one motion vector can represent the motion of a whole block of pixels
Figure 1: Current frame and previous frame. For each image block in the current frame, find its nearest counterpart in the previous frame and record the displacement vector.
Figure 2: A motion vector (mv) describes the displacement between the previous block location in frame N-1 and the current block location in frame N, within a search window. Only the prediction-error (residual) images are transmitted; good prediction reduces the information content of the residual images.
Partition the previous and the current images into non-overlapping square blocks of size N×N (e.g., N=8). Represent each block with a 2D matrix: f(x,y) for the previous frame and g(x,y) for the current frame. The difference between any two blocks is given by, for example, the sum of absolute differences

    D = sum over x, y of |g(x,y) - f(x,y)|,   0 <= x, y <= N-1

The lower the difference, the more similar is the pair of blocks.
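As a sketch, the block difference can be computed as below, assuming the sum of absolute differences (SAD) as the criterion; mean squared error is a common alternative.

```c
#include <stdlib.h>   /* abs */

#define N 8   /* block size, as in the example above */

/* Sum of absolute differences between a block f(x,y) of the
   previous frame and a block g(x,y) of the current frame.
   A lower value means the two blocks are more similar. */
long block_sad(const unsigned char f[N][N], const unsigned char g[N][N])
{
    long d = 0;
    for (int x = 0; x < N; x++)
        for (int y = 0; y < N; y++)
            d += abs((int)g[x][y] - (int)f[x][y]);
    return d;
}
```

Identical blocks give a difference of 0; the exhaustive matching described next simply picks the candidate block minimizing this value.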
A motion vector is computed for EVERY block in the current frame. HOW? Each block in the current frame is matched against all the blocks in the previous frame; the closest one is taken to be its counterpart, and the displacement gives the motion vector, e.g. MV = (-2,-3), MV = (-1,-3) or MV = (-1,-2) for successive blocks.
The method is slow, especially if the image resolution and N are large.
To speed up the search, the motion vector of each block is computed by testing only the blocks in a near neighborhood of its position, for example only the blocks adjacent to the current one. The method is faster, but the search area is restricted to a search window. Assumption: changes between frames are small and are confined within the search window. However, the search time is still long.
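Putting the pieces together, an exhaustive search within a ±R window might look like the following sketch. The function name and the row-major frame layout are our assumptions; real encoders use faster search patterns, as the text goes on to discuss.

```c
#include <stdlib.h>   /* abs */

/* Exhaustive block matching inside a +/-R search window.
   prev and cur are W x H luminance frames stored row-major;
   (bx, by) is the top-left corner of the NxN current block.
   SAD is used as the matching criterion. Minimal sketch. */
void full_search(const unsigned char *prev, const unsigned char *cur,
                 int W, int H, int N, int bx, int by, int R,
                 int *mvx, int *mvy)
{
    long best = -1;
    *mvx = *mvy = 0;
    for (int dy = -R; dy <= R; dy++) {
        for (int dx = -R; dx <= R; dx++) {
            int px = bx + dx, py = by + dy;
            if (px < 0 || py < 0 || px + N > W || py + N > H)
                continue;                 /* candidate outside the frame */
            long sad = 0;
            for (int y = 0; y < N; y++)
                for (int x = 0; x < N; x++)
                    sad += abs((int)cur[(by + y) * W + bx + x]
                             - (int)prev[(py + y) * W + px + x]);
            if (best < 0 || sad < best) {
                best = sad;
                *mvx = dx;                /* displacement of the best match */
                *mvy = dy;
            }
        }
    }
}
```

The cost is O((2R+1)^2 N^2) per block, which is why the restricted 1D searches below were proposed.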
An alternative: given a block in the current frame, first search for the best match in the previous frame along the vertical direction of the search window; then, starting from that best match, search along the horizontal direction. This yields a non-optimal solution, under the assumption of a smooth intensity distribution.
- Applications - multimedia and video transmission
- Based on the JPEG and H.261 standards
- Flexible picture size and frame rate, specified by users
- Video source - non-interlaced video signals
- Minimum requirements on decoders: resolution of 720×576, 30 frames/s, 1.86 Mb/s
Layer structure in the MPEG bitstream:
- Sequence
- Group Of Pictures (GOP)
- Picture
- Slice
- Macroblock
- Block
Hierarchy: a video sequence contains Groups Of Pictures; a GOP contains pictures; a picture contains slices; a slice contains macroblocks; a macroblock contains blocks.
- Partitioning of images into Macroblocks (MB)
- Intraframe coding on one out of every K images
- Motion estimation on MBs
- Generation of (K-1) predicted frames
- Encoding of the residual error images
- Conditional replenishment of macroblocks
- An image is partitioned into macroblocks of size 16×16
- 1 MB = 4 luminance (Y) blocks and 2 chrominance blocks (U, V)
- The sampling ratio between Y, U and V is 4:1:1
Figure 3
Chrominance formats: 4:4:4, 4:2:2, 4:1:1, 4:2:0. Assuming 8 bits for each of the Y, U and V components, per 4 luminance pixels:
- 4:4:4 - 4*8 (Y) + 4*8 (U) + 4*8 (V) = 96 bits; bits per pixel = 96/4 = 24 bpp
- 4:2:2 - 4*8 (Y) + 2*8 (U) + 2*8 (V) = 64 bits; bits per pixel = 64/4 = 16 bpp
- 4:1:1 - 4*8 (Y) + 1*8 (U) + 1*8 (V) = 48 bits; bits per pixel = 48/4 = 12 bpp
- 4:2:0 - 4*8 (Y) + 1*8 (U) + 1*8 (V) = 48 bits; bits per pixel = 48/4 = 12 bpp
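The bits-per-pixel arithmetic above can be expressed as a small helper; a sketch in which y, u and v count the samples each component contributes per 4 luminance pixels (so 4:2:0 is y=4, u=1, v=1, since U and V are subsampled in both directions).

```c
/* Average bits per pixel for a chroma format, assuming 8 bits per
   sample; y, u, v are samples per 4 luminance pixels. */
int bits_per_pixel(int y, int u, int v)
{
    return (y * 8 + u * 8 + v * 8) / 4;   /* total bits / 4 pixels */
}
```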
- DCT
- Weighted (I-frame) / uniform (P-frame) quantization
- DPCM on the DC terms
- Zigzag scan + run-length coding + VLC
Figure 4: I-picture block encoder. Each 8×8 block of a macroblock goes through the DCT and quantization (step size sz); the DC term is DPCM-coded and the AC terms are zigzag-scanned, run-length encoded and VLC-coded, as in JPEG.
- The previous I or P frame is stored in both the encoder and the decoder
- Motion compensation is performed on a macroblock basis
- One motion vector (mv) is generated for each macroblock
- The mvs are coded and transmitted to the receiver
- The motion prediction error of the pixels in each macroblock is calculated
- Error blocks (size 8×8) are encoded in the same manner as those in the I-picture
- A video buffer plus step-size adjustment maintains a constant target bit rate
The current signal x(n) is predicted from the previous sample x(n-1); the predicted value is x_p(n). The prediction error e(n) = x(n) - x_p(n) is compressed (encoded) and transmitted. The encoded error is decoded and added back to x_p(n) to reconstruct the current signal x(n). However, there is loss in the codec, so the reconstructed signal x_r(n) is not identical to x(n). x_r(n) is taken to predict the next sample x(n+1).
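A minimal sketch of this DPCM loop, with a mid-tread quantizer of step size s standing in for the lossy codec (an assumption; the text does not fix the codec). The key point is that the predictor uses the reconstructed sample x_r(n), exactly as the decoder does, so encoder and decoder stay in step.

```c
/* DPCM with a lossy quantizer of step size s. levels[] holds the
   transmitted symbols; x_r[] holds the reconstruction that the
   decoder would also compute from levels[] alone. */
void dpcm_encode(const int *x, int n, int s, int *levels, int *x_r)
{
    int pred = 0;                     /* x_p(0): first sample predicted from 0 */
    for (int i = 0; i < n; i++) {
        int e = x[i] - pred;          /* prediction error e(n) */
        int L = (e >= 0 ? e + s / 2 : e - s / 2) / s;  /* quantize to a level */
        levels[i] = L;                /* transmitted symbol */
        x_r[i] = pred + L * s;        /* reconstruction x_r(n), as in decoder */
        pred = x_r[i];                /* predict next sample from x_r(n) */
    }
}
```

With this structure the reconstruction error stays bounded by s/2 and does not accumulate, which is exactly why the predictor must use x_r rather than x.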
Figure 5: P-picture encoder. The 8×8 block to be encoded is predicted by motion compensation (MC) from the frame store; the residual error is DCT-transformed, quantized (step size sz, adjusted by the video buffer (VB) control), run-length coded and VLC-coded, then transmitted together with the motion vector. An inverse quantizer and inverse DCT feed the reconstructed frame back to the frame store.
- I-pictures are encoded independently
- I-pictures can therefore be used as access points for random access, fast-forward (FF) or fast-reverse (FR)
- P-pictures cannot be decoded alone, hence cannot be used as access points
- B-pictures are constructed from the nearest I or P pictures
- Backward prediction requires the presence of the start and end frames; both can be used as access points
Figure 6a: Order of coding (e.g., I P P B B B B).
Figure 6b:

GOP pattern                   | Compression | Random Access | Coding Delay
I pictures only (IIIIIII)     | Low         | Highest       | Low
I and P pictures (IIIPPPP)    | Medium      | Low           | Medium
I, P and B pictures (IIIPBPB) | High        | Medium        | High
- Only macroblocks that have changed are updated in the decoder
- Three types of MB are classified in the MPEG standard:
- Skipped MB - zero motion vector; the MB is neither encoded nor transmitted
- Inter MB - motion prediction is valid; the MB type and address, the motion vector and the coded DCT coefficients are transmitted
- Intra MB - the encoded DCT coefficients of the MB are transmitted; no motion compensation is used
Macroblock types in P-pictures, classified by a decision tree: MC or no MC; coefficients coded (C) or not coded; quantizer defined (Q) or default (No Q); intra (I) or non-intra:
- Pred-mcq - non-zero motion vector, error coded with defined quantization
- Pred-mc - non-zero motion vector, error coded with default quantization
- Pred-m - non-zero motion vector, error not coded
- Intra-q - macroblock intra-coded with defined quantization
- Intra-d - macroblock intra-coded with default quantization
- Pred-cq - MV = 0 (not predicted), error coded with defined quantization
- Pred-c - MV = 0 (not predicted), error coded with default quantization
- Skipped - macroblock copied from the predictor picture
Macroblock types in B-pictures follow the same decision tree, with the prediction direction added (f = forward, b = backward, i = interpolated): Pred-f/b/i-cq, Pred-f/b/i-c, Pred-f/b/i, Intra-q, Intra-d and Skipped.
Figure 7: The decoder is the reverse process of the encoder.
- A superset of MPEG-1 and backward compatible with it
- Supports interlaced video signals
- Scalable video coding: the bitstream can be decoded by receivers with different capabilities
- Permits partial implementations, defined by Profiles and Levels
- A Profile defines a new set of algorithms added as a superset to the algorithms of the profile below it
- A Level specifies the range of parameters supported by the implementation
Figure 8
- Main Profile: the MPEG-2 non-scalable coding mode
- A straightforward extension of MPEG-1 to accommodate interlaced video signals
- Field/frame macroblocks
- Two types of prediction:
  - Frame prediction: prediction based on one or more previously decoded frames
  - Field prediction: prediction of an individual field based on one or more previously decoded fields
Figure 9a: The four sub-blocks of a frame macroblock for a stationary scene; odd (o) and even (e) field lines interleave.
Figure 9b: The four sub-blocks of a frame macroblock for a moving scene.
- The object shape changes with motion because of the interlacing mechanism
- The same object may therefore appear different in successive frames, so prediction is not accurate
- Simple image patterns may become complicated
- More AC coefficients are required to describe each component in the frame macroblock
Figure 9c: The four sub-blocks of a frame macroblock for a moving scene.
Field/frame DCT decision: compute the field-based variance and the frame-based variance; if the field-based variance is lower than the frame-based variance, the MB is coded with the field-based DCT.
Frame-based variance: differences between vertically adjacent lines, which belong to opposite fields (o and e):

    /* frame-based variance: adjacent-line differences */
    var1 = 0;
    for (m = 0; m < COL; m++)
        for (n = 0; n < ROW - 2; n++) {
            D1 = x(m, n) - x(m, n + 1);
            D2 = x(m, n + 1) - x(m, n + 2);
            var1 += (D1 * D1) + (D2 * D2);
        }
Field-based variance: differences between lines of the same field, i.e. two lines apart. Note the loop bound is ROW - 3, so that the index n + 3 stays within the block:

    /* field-based variance: same-field (same-parity) line differences */
    var2 = 0;
    for (m = 0; m < COL; m++)
        for (n = 0; n < ROW - 3; n++) {
            D1 = x(m, n) - x(m, n + 2);
            D2 = x(m, n + 1) - x(m, n + 3);
            var2 += (D1 * D1) + (D2 * D2);
        }
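The two measures and the decision rule can be combined into one sketch for a 16×16 luminance macroblock. The function names and the x[row][col] indexing are our assumptions; this follows the criterion in the text, not the exact normative procedure.

```c
#define MB 16   /* macroblock height/width in luminance samples */

/* Frame-based variance: adjacent-line (opposite-field) differences. */
long frame_var(const unsigned char x[MB][MB])
{
    long v = 0;
    for (int m = 0; m < MB; m++)
        for (int n = 0; n + 2 < MB; n++) {
            int d1 = x[n][m] - x[n + 1][m];
            int d2 = x[n + 1][m] - x[n + 2][m];
            v += (long)d1 * d1 + (long)d2 * d2;
        }
    return v;
}

/* Field-based variance: same-field (two lines apart) differences. */
long field_var(const unsigned char x[MB][MB])
{
    long v = 0;
    for (int m = 0; m < MB; m++)
        for (int n = 0; n + 3 < MB; n++) {   /* n + 3 must stay in range */
            int d1 = x[n][m] - x[n + 2][m];
            int d2 = x[n + 1][m] - x[n + 3][m];
            v += (long)d1 * d1 + (long)d2 * d2;
        }
    return v;
}

/* Returns 1 when the field-based DCT should be used. */
int use_field_dct(const unsigned char x[MB][MB])
{
    return field_var(x) < frame_var(x);
}
```

For a moving interlaced scene the two fields differ, so adjacent-line differences are large while same-field differences stay small, and the field-based DCT is selected; for a stationary scene neither variance is smaller and the frame-based DCT is kept.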
- A top field is predicted from either a previously coded top or bottom field, with motion compensation (MC)
- Bottom fields are predicted from a previously coded top field, with MC
- Combined frame and field prediction is used in MPEG-2
- Provides interoperability between different services and systems
- Base layer - encodes the downscaled video at a reduced bit rate
- Enhancement layer - encodes the difference between the original signal and the upscaled base-layer video
Figure 10
- A 2-layer DCT, VLC and MC encoder
- Both layers encode the video signal at the same resolution
- Base layer - the DCT coefficients are coarsely quantized, and the layer is protected from transmission errors
- Enhancement layer - the DCT coefficients are finely quantized, and their difference from the base layer is transmitted
A quantizer maps the input x to a quantized level L; the de-quantizer reconstructs x_Q = s * L, where s is the step size, leaving a quantization error. In the two-layer scheme, the base layer uses a coarse step size S, giving x_Q = S * L with quantization error E = x - x_Q; the enhancement layer requantizes E with a fine step size s into a level L_R, giving E_Q = s * L_R. E_Q can be used to compensate the error in x_Q.
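A sketch of this two-layer quantization, with S the coarse base-layer step and s the fine enhancement-layer step (function names are ours). The enhanced reconstruction x_Q + E_Q has an error of at most s/2 instead of S/2.

```c
/* Mid-tread quantizer: round x to the nearest multiple of step. */
static int quant(int x, int step)
{
    return (x >= 0 ? x + step / 2 : x - step / 2) / step;
}

/* Two-layer (SNR-scalable) reconstruction: the base layer sends
   L = quant(x, S); the enhancement layer sends L_R, the finely
   quantized base-layer error. An enhanced decoder adds E_Q to x_Q. */
int two_layer_reconstruct(int x, int S, int s)
{
    int L  = quant(x, S);
    int xQ = L * S;               /* base-layer reconstruction x_Q */
    int E  = x - xQ;              /* base-layer quantization error */
    int LR = quant(E, s);         /* enhancement-layer level L_R */
    int EQ = LR * s;              /* E_Q = s * L_R */
    return xQ + EQ;               /* enhanced reconstruction */
}
```

For example, with x = 37, S = 16 and s = 2, the base layer alone reconstructs 32 (error 5), while the two-layer reconstruction gives 38 (error 1).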
Figure 11
- Temporal prediction from the previous frame
- Estimation of the mv of a lost MB from neighbouring MBs
- Addition of mvs to the MBs of I-frames for error concealment
- 2-layer coding using data partitioning, spatial and frequency scalability