Introduction to JPEG, MPEG 1/2, and H.261/H.263 Chuan-Yu Cho
Outline: Video/Image Compression. Still Image Compression –JPEG / JPEG 2000 (JPEG: Joint Photographic Experts Group). Video Compression –H.261, H.263, H.263+, MPEG-1, MPEG-2, MPEG-4, MPEG-7, MPEG-21.
Still Image Coding JPEG, JPEG2000
Image/Video Redundancy Spatial redundancy A B
Transform coding. Encoder: Image block -> T (transform) -> Transform coefficients -> Q (quantization) -> Zigzag Scan (2D->1D) -> Entropy coding -> Bitstream. Decoder: Bitstream -> Entropy decoding -> Inverse Zigzag Scan (1D->2D) -> Q^-1 -> Reconstructed Transform Coefficients -> T^-1 -> Reconstructed Image block.
Block-Based Coding Why divide to blocks? Image->Blocks
Example of JPEG Coding (Encoder): Transform coding (DCT) -> Quantization (e.g. DC coefficient / 16 = -26) -> Zigzag Scan (2D->1D) -> Entropy Coding (number->binary bit stream). [Figure: 8×8 coefficient matrices; the scanned sequence runs from the DC value -26 to EOB.]
Example of JPEG Coding (Decoder): Inverse Entropy Coding (bit stream, binary->number) -> Inverse Zigzag Scan (1D->2D) -> Inverse Quantization -> Inverse Transform coding (DCT). [Figure: the same sequence, ending in EOB, restored to an 8×8 matrix.]
Transform (0,1) (1,0) (-1,1)(1,1) (0.2,1.8) = 0.2(1,0)+1.8(0,1) = 1(1,1)+0.8(-1,1)
Basis of Transform Basis vectors{v 1,v 2, …,v n } Orthogonal : (v i ) · (v j ) = 0 if i!=j Normalized : (v i ) · (v i ) = 1 Orthonormal : orthogonal and normalized –eg. orthonormal : {(0,1),(1,0)} Orthogonal : {(1,1),(-1,1)}
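The change of basis on the previous two slides can be checked numerically. This is a minimal sketch, not part of the original slides; the helper names `dot` and `coefficients` are made up for illustration. For an orthogonal basis, each coefficient is the projection of the vector onto one basis vector.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def coefficients(x, basis):
    """Coefficients of x in an orthogonal basis: <x, v> / <v, v> for each v."""
    return [dot(x, v) / dot(v, v) for v in basis]

x = (0.2, 1.8)
standard = [(1, 0), (0, 1)]    # orthonormal
rotated = [(1, 1), (-1, 1)]    # orthogonal, but not normalized

print(coefficients(x, standard))  # roughly 0.2 and 1.8
print(coefficients(x, rotated))   # roughly 1.0 and 0.8
```

Dividing by `dot(v, v)` is what lets the same function handle the non-normalized basis {(1,1), (-1,1)}; for an orthonormal basis that divisor is 1.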
Why DCT is used for image compression. KLT (Karhunen-Loeve transform): –Statistically optimal transform: minimal MSE for any specific bandwidth reduction –KLT depends on the signal statistics –No fast algorithm. DCT approaches KLT for highly correlated signals: –Sample values typically vary slowly from point to point across an image => highly correlated signals –A fast algorithm exists (but the transform is not optimal)
DCT-basis
DCT: Discrete Cosine Transform (Spatial Domain -> Frequency Domain)
[8,8,8,8,8,8,8,8] -> [44,0,0,0,0,0,0,0]
[8,8,8,8,8,8,8,9] -> [44,-2,0,-2,0,-2,0,-2]
[8,8,10,9,7,8,8,9] -> [46,-2,-2,-4,-2,2,0,-2]
[8,90,-100,3,4,-10,2,80] -> [48,-56,146,6,74,-148,-158,-136]
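A minimal sketch of a 1-D DCT, assuming the orthonormal DCT-II definition. The slide's numbers appear to use a different scaling and rounding, so this code is not expected to reproduce them digit for digit. It does show the same qualitative behaviour: a flat signal has only a DC term, and a small change adds small AC terms.

```python
import math

def dct_1d(x):
    """Orthonormal DCT-II of a sequence (assumed definition, see lead-in)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i in range(n))
        out.append(scale * total)
    return out

flat = dct_1d([8] * 8)                        # only the DC term is non-zero
bumpy = dct_1d([8, 8, 8, 8, 8, 8, 8, 9])      # small AC terms appear
print([round(c, 2) for c in flat])
print([round(c, 2) for c in bumpy])
```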
DCT Example of JPEG Coding(Encoder)
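Quantization. Purpose: raise the compression ratio. Drawback: the reconstructed values contain errors. Principle: keep the reconstructed value as close as possible to the original value. Table columns: original value | after Q (integer division by 3) | multiplied directly by 3 (ordinary IQ) | after a better IQ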
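A small sketch of the quantization table described on this slide, for non-negative values and a step of 3. The slide does not say what the "better IQ" is; reconstructing to the middle of each quantization interval is an assumption made here for illustration.

```python
STEP = 3

def quantize(value, step=STEP):
    """Q: integer division by the step (non-negative inputs assumed)."""
    return value // step

def inverse_plain(level, step=STEP):
    """Ordinary IQ: multiply directly by the step."""
    return level * step

def inverse_midpoint(level, step=STEP):
    """ASSUMED 'better IQ': reconstruct to the middle of the interval."""
    return level * step + step // 2

for original in range(9):
    level = quantize(original)
    print(original, level, inverse_plain(level), inverse_midpoint(level))
```

With the plain inverse the worst-case error over 0..8 is 2; with the midpoint inverse it drops to 1, which is the sense in which the reconstructed value stays closer to the original.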
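Quantization (cont'd). DC term: uniform quantization. AC terms
Example of JPEG Coding (Encoder): Quantization. Each DCT coefficient is divided by its quantizer step and rounded; in the figure the DC coefficient divided by 16 gives -26.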
Example of JPEG Coding(Encoder)
Zigzag Scan (2D->1D): the DC term comes first, followed by the AC terms.
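The 2D->1D zigzag scan can be sketched as follows, assuming the usual JPEG ordering that starts at the DC term and walks the anti-diagonals in alternating directions. The function names are illustrative only.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zigzag order."""
    def key(pos):
        r, c = pos
        diagonal = r + c
        # Odd diagonals run downwards (row increasing), even ones upwards.
        return (diagonal, r if diagonal % 2 else c)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)

def zigzag_scan(block):
    """2D -> 1D: read the block in zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

def inverse_zigzag_scan(values, n=8):
    """1D -> 2D: put the values back into an n x n block."""
    block = [[0] * n for _ in range(n)]
    for value, (r, c) in zip(values, zigzag_order(n)):
        block[r][c] = value
    return block

print(zigzag_order()[:6])
```

Low-frequency coefficients end up at the front of the 1-D list, so after quantization the zeros tend to cluster at the end, which helps run-length coding.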
Example of JPEG Coding (Encoder): Transform coding (DCT) -> Quantization -> Zigzag Scan (2D->1D) -> Entropy Coding (bit stream). [Figure: the quantized 8×8 matrix read out as a 1-D sequence ending in EOB.]
Entropy Coding (Variable-Length Coding) Huffman coding Run-length coding Arithmetic coding
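Huffman Coding: give the most frequently occurring words the shortest codes.
Example:
Word | Probability | Fixed-length code (bits) | Variable-length code (bits)
'A' | 3/4 | 2 | 1
'B' | 1/6 | 2 | 2
'C' | 1/24 | 2 | 3
'D' | 1/24 | 2 | 3
Average length, variable-length code: 1*(3/4) + 2*(1/6) + 3*(1/24) + 3*(1/24) = 4/3 ≈ 1.33 bits
Average length, fixed-length code: 2*(3/4) + 2*(1/6) + 2*(1/24) + 2*(1/24) = 2 bits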
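The code lengths in the Huffman example can be derived with the standard merge-the-two-least-probable procedure. This is a minimal sketch that computes code lengths only, not the bit patterns; the function name is illustrative.

```python
import heapq
from fractions import Fraction
from itertools import count

def huffman_lengths(probs):
    """Return {symbol: code length} for a dict {symbol: probability}."""
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), [sym]) for sym, p in probs.items()]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in probs}
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        for sym in group1 + group2:
            lengths[sym] += 1  # every merge adds one bit to the merged symbols
        heapq.heappush(heap, (p1 + p2, next(tiebreak), group1 + group2))
    return lengths

probs = {"A": Fraction(3, 4), "B": Fraction(1, 6),
         "C": Fraction(1, 24), "D": Fraction(1, 24)}
lengths = huffman_lengths(probs)
average = sum(probs[s] * lengths[s] for s in probs)
print(lengths, average)
```

Using `Fraction` keeps the average exact, so it can be compared directly with the fixed-length average of 2 bits.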
DPCM: Differential PCM. If repeated or similar words are very likely to occur consecutively, coding the differences works better than coding each word individually. For example, 'AAFFFFFCCC': –PCM => '65,65,70,70,70,70,70,67,67,67' or '0,0,5,5,5,5,5,2,2,2' –DPCM => '0,0,5,0,0,0,0,-3,0,0'
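The DPCM example can be reproduced with a few lines. This sketch assumes the predictor is simply the previous sample, with 0 as the starting value; the function names are illustrative.

```python
def dpcm_encode(samples, start=0):
    """Replace each sample by its difference from the previous one."""
    previous = start
    differences = []
    for sample in samples:
        differences.append(sample - previous)
        previous = sample
    return differences

def dpcm_decode(differences, start=0):
    """Rebuild the samples by accumulating the differences."""
    previous = start
    samples = []
    for difference in differences:
        previous += difference
        samples.append(previous)
    return samples

pcm = [ord(ch) - ord("A") for ch in "AAFFFFFCCC"]
print(pcm)
print(dpcm_encode(pcm))
```

Most of the differences are zero, which is what makes the following entropy coding stage more effective.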
Run-Length Coding. [Figure: a coefficient sequence ending in EOB, with the zero runs marked and coded as the pairs (3,1) (0,6) (1,3).]
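A sketch of run-length coding, assuming each pair means (number of zeros skipped, following non-zero value), which matches the RUN/LEVEL description used later for H.261. The input sequence below is hypothetical, chosen only so that it yields the pairs shown on the slide.

```python
EOB = "EOB"

def run_length_encode(coefficients):
    """Code a 1-D coefficient list as (run, level) pairs followed by EOB."""
    pairs = []
    run = 0
    for value in coefficients:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append(EOB)  # trailing zeros are not transmitted
    return pairs

def run_length_decode(pairs, length):
    """Expand (run, level) pairs back to a list of the given length."""
    out = []
    for item in pairs:
        if item == EOB:
            break
        run, level = item
        out.extend([0] * run + [level])
    return out + [0] * (length - len(out))

sequence = [0, 0, 0, 1, 6, 0, 3, 0, 0, 0]  # hypothetical input
print(run_length_encode(sequence))
```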
Video Coding MPEG I/II, H.261/H.263
Main Ideas of Still Image Coding (Intra Coding) Block-based coding Transform coding (DCT) Quantization Zigzag scan DPCM (Differential PCM) Entropy coding (Variable-length coding) –Huffman coding –Run-length coding –Arithmetic coding
Main Ideas of Video Coding (Inter Coding) Intra coding –Block-based coding, transform coding, quantization, zigzag scan, DPCM, entropy coding Inter coding –Intra coding for residual –Motion estimation/compensation
Image/Video Redundancy Spatial redundancy Temporal redundancy A B A Frame N-1 B Frame N Use A to code B
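Video Compression. Encoder for still image: Image block -> T -> Transform Coefficients -> Q -> Zigzag Scan (2D->1D) -> Entropy coding -> Bitstream. Encoder for video sequence: the same chain, plus a feedback loop (Q^-1 -> Reconstructed Transform Coefficients -> T^-1 -> Reconstructed Image block -> MC) whose motion-compensated prediction is subtracted (-) from the input block.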
Results of DCT Coding (JPEG). PSNR (Peak Signal-to-Noise Ratio), MSE (Mean Square Error)
Temporal Redundancy: Frame #1, Frame #2
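The two quality measures named on this slide can be computed as below. This is a sketch assuming 8-bit images (peak value 255) stored as flat lists of pel values; the function names are illustrative.

```python
import math

def mse(original, reconstructed):
    """Mean square error between two equal-length lists of pel values."""
    total = sum((a - b) ** 2 for a, b in zip(original, reconstructed))
    return total / len(original)

def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    error = mse(original, reconstructed)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)

original = [52, 55, 61, 66]        # made-up pel values
reconstructed = [50, 55, 62, 66]
print(mse(original, reconstructed), psnr(original, reconstructed))
```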
Residual Image Frame #2 – Frame #1 =
Results of Motion Compensation Coding: PSNR = … dB, MSE = 6.50, MAE = 25. Bits for motion vectors = 1002 bits. [Figure panels: Residual Image, Coded Image.] DCT Coding: PSNR = … dB, bit rate = … bits/frame, compression ratio = (256 * 256 * 8) / (bits per frame) = 23.9
ITU-T Recommendation H.261 (previously "CCITT Recommendation"): Video Codec for Audiovisual Services at p × 64 kbit/s. Geneva, 1990; revised at Helsinki, 1993
H.261 v.s. p × 64 The Recommendation H.261 describes the video coding and decoding methods for the moving picture component of audiovisual services(videophone, videoconference, etc.) at the rates of p × 64 kbit/s, where p is in the range 1 to 30. => p × 64 (called p times sixty four) coder
H.261 vs. MPEG: The H.261 specification has already been implemented by several manufacturers. Its target is telecommunications at rates as low as 64 kbit/s. MPEG is defined for higher bit rates (0.9 Mbit/s to 1.5 Mbit/s) and consequently for higher quality.
H.261 Video codec for audiovisual services –ISDN Videophone and video conferencing –Low bit rates, low delay 1984: at m × 384 kbits/s ( m = 1, …, 5) : at p × 64 kbits/s ( p = 1, …, 30)
H.261 Coder [Block diagram blocks: Video in, DCT, Q, Inverse DCT, Motion Compensation, Loop Filter]
Motion Estimation: For each 16×16 superblock (SB), ME searches for the best match in the reference frame and returns a motion vector MV = (X, Y). Both X and Y are integers not exceeding ±15. Only the difference (residual) between the SB and the best match is DCT encoded.
Motion Estimation [Figure: the block at (32,16) in the current frame, displaced by MV (-10,4), matches the block at (22,20) in the reference frame.]
Coding of Motion Vectors: Differential coding; VLC for the MV difference (MVD). [Table: MVD value pairs and their codes, e.g. the row starting at MVD = -7 & …]
Motion Compensation (MC) & Motion Estimation (ME). MC is optional for each MB (MTYPE => MB based). Only one MV for each MB. The ME compares a 16×16 superblock of the luminance (Y) component against a small search area of the previously transmitted image. Both the horizontal and vertical components of these motion vectors are integers not exceeding ±15. The MV is used for all 4 Y blocks. The MV for both Cb and Cr is derived by halving the component values of the MB MV. [NOT in H.261] The displacement with the smallest absolute superblock difference, measured as the sum of the absolute values of the pel-to-pel differences throughout the block, is taken as the MV for that MB.
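The matching criterion described here, the smallest sum of absolute pel-to-pel differences, can be sketched as an exhaustive (full) search. The slide marks the search method as not part of H.261, so the search strategy, the frame layout as lists of rows, and the function names below are all illustrative assumptions.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences for one candidate displacement (dx, dy)."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))

def full_search(cur, ref, bx, by, n=16, search=15):
    """Best integer MV for the n x n block at (bx, by), within +/- search."""
    height, width = len(ref), len(ref[0])
    best_cost, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x0, y0 = bx + dx, by + dy
            # Skip candidates that fall outside the reference frame.
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost
```

A call such as `full_search(current, reference, 32, 16)` returns the displacement and its SAD; the residual to be DCT coded is then the block minus the displaced reference block. Full search is the simplest choice and the slowest, which is why fast motion estimation is listed later as an implementation issue.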
Quantization. The number of quantizers is 1 for the INTRA dc coefficient and 31 for all other coefficients. Within a MB, the same quantizer is used for all coefficients except the INTRA dc one. The equations for the quantizer can be written in terms of the MB quantization factor Q, sometimes termed MQUANT: –C(u,v) = F(u,v) / 2Q if Q is odd –C(u,v) = (F(u,v) ± 1) / 2Q if Q is even (+ when F > 0, - when F < 0). Quantization for the INTRA dc term: –C = (F + 4) / 8, with inverse F = 8C.
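The INTRA dc rule on this slide, C = (F + 4) / 8 with inverse F = 8C, can be checked with a short sketch. It assumes integer division and non-negative dc values; the upper bound used in the loop is only an illustrative range, not a value taken from the slide.

```python
def quantize_intra_dc(f):
    """C = (F + 4) / 8 with integer division (non-negative F assumed)."""
    return (f + 4) // 8

def dequantize_intra_dc(c):
    """Inverse: F = 8C."""
    return 8 * c

worst = max(abs(f - dequantize_intra_dc(quantize_intra_dc(f)))
            for f in range(0, 2041))  # illustrative range of dc values
print(worst)
```

Adding 4 before dividing by 8 turns truncation into rounding, so the reconstruction error never exceeds half a step.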
Loop Filter (FIL). The filter is separable into one-dimensional horizontal and vertical functions. The function is non-recursive with coefficients of ¼, ½, ¼ except at block edges. The function has coefficients of 0, 1, 0 at block edges. The filter is switched on/off for all 6 blocks in a MB according to MTYPE.
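A sketch of the separable loop filter on one block, using taps ¼, ½, ¼ inside the block and 0, 1, 0 at its edges. Rounding to integer pel values is left out here, so results stay as floats; that omission and the function names are choices made for this illustration.

```python
def filter_1d(line):
    """Apply (1/4, 1/2, 1/4) inside the line; edge samples pass through."""
    out = list(line)
    for i in range(1, len(line) - 1):
        out[i] = (line[i - 1] + 2 * line[i] + line[i + 1]) / 4
    return out

def loop_filter(block):
    """Filter every row, then every column, of a square block."""
    rows = [filter_1d(row) for row in block]
    columns = [filter_1d(column) for column in zip(*rows)]
    return [list(row) for row in zip(*columns)]

impulse = [[0] * 8 for _ in range(8)]
impulse[3][3] = 16
for row in loop_filter(impulse):
    print(row)
```

Filtering an impulse shows the equivalent 2-D kernel: the centre keeps ¼ of the value, its horizontal and vertical neighbours get ⅛ each, and the diagonal neighbours get 1/16 each.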
H.261 Decoder Inverse DCT Motion Compensation Loop Filter Intra Inter
Decoder Source format –Pictures are coded as luminance and two colour difference components (Y, Cb, and Cr). CIF (Common Intermediate Format) –Y: 352 × 288 –Cb, Cr: 176 × 144
Decoder QCIF (Quarter-CIF) –Y: 176 × 144 –Cb, Cr: 88 × 72 CIF for NTSC (National Television System Committee) input (MPEG SIF 525) –Y: 352 × 240 –Cb, Cr: 176 × 120 All codecs must be able to operate using QCIF. Some codecs can also operate with CIF.
H.261 Video Formats
Video format | Luminance (Y) pixels/line | Luminance (Y) lines/frame | Chrominance (Cb, Cr) pixels/line | Chrominance (Cb, Cr) lines/frame
CIF | 352 | 288 | 176 | 144
QCIF | 176 | 144 | 88 | 72
Arrangement of H QCIF CIF
Arrangements of data structure in H.261: QCIF picture -> GOB (Group of Blocks) -> MB (Macroblock: Y1, Y2, Y3, Y4, U, V)
Positioning of luminance and chrominance samples [Figure legend: Y pixel; Cb, Cr pixel; block boundary]
Data Structure of Compressed Bitstream in H.261. Picture Layer: Picture Header | GOB data | … GOB Layer: GOB Header | MB data | … MB Layer: MB Header | Block data | … Block Layer: TCOEFF | … [Legend: Fixed Length Code, Variable Length Code]
Structure of picture layer: PSC | TR | PTYPE | PEI | PSPARE | … | PEI | … | GOB data. Picture start code (PSC) (20 bits). Temporal reference (TR) (5 bits): formed by incrementing its value in the previously transmitted picture header by one plus the number of non-transmitted pictures since the last transmitted one (only the five LSBs are used).
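The temporal reference rule amounts to modulo-32 arithmetic. A minimal sketch, with an illustrative function name:

```python
def next_temporal_reference(previous_tr, skipped_pictures):
    """TR = previous TR + 1 + number of non-transmitted pictures, 5 LSBs only."""
    return (previous_tr + 1 + skipped_pictures) & 0x1F

print(next_temporal_reference(4, 0))   # next picture, nothing skipped
print(next_temporal_reference(30, 2))  # wraps around past 31
```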
Structure of picture layer: PSC | TR | PTYPE | PEI | PSPARE | … | PEI | … | GOB data. Type information (PTYPE) (6 bits), where Bit 1 is the MSB: Bit 1 Split screen indicator; Bit 2 Document camera indicator, "0" off, "1" on; Bit 3 Freeze picture release, "0" off, "1" on; Bit 4 Source format, "0" QCIF, "1" CIF; Bit 5 Optional still image mode HI_RES, "0" on, "1" off; Bit 6 Spare. Extra insertion information (PEI) (1 bit): "1" signals the presence of the following optional data field.
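The PTYPE bit assignments can be unpacked as below. This is a sketch that takes the 6-bit field as an integer with Bit 1 as the most significant bit; the dictionary keys are illustrative names. The slide does not give the polarity of the split screen bit, so it is returned as a raw bit.

```python
def parse_ptype(ptype):
    """Decode the 6-bit PTYPE field (Bit 1 is the MSB)."""
    def bit(n):
        return (ptype >> (6 - n)) & 1
    return {
        "split_screen_bit": bit(1),           # polarity not stated on the slide
        "document_camera": bool(bit(2)),      # 1 = on
        "freeze_picture_release": bool(bit(3)),
        "source_format": "CIF" if bit(4) else "QCIF",
        "hi_res": bit(5) == 0,                # 0 = on, 1 = off
        "spare_bit": bit(6),
    }

print(parse_ptype(0b000110))
```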
GOB Layer: GBSC | GN | GQUANT | GEI | GSPARE | … | GEI | … | MB data. Group of blocks start code (GBSC) (16 bits) – (if "0000" follows, it is treated as a PSC). Group number (GN) (4 bits) –GN indicates the position of the group of blocks. 13, 14 and 15 are reserved for future use. 0 (0000) is used in the PSC.
GOB Layer: GBSC | GN | GQUANT | GEI | GSPARE | … | GEI | … | MB data. Quantizer information (GQUANT) (5 bits) –The quantizer to be used in the GOB until overridden by any subsequent MQUANT. Extra insertion information (GEI) (1 bit) –"1" signals the presence of the following optional data field. Spare information (GSPARE) (0/8/16 … bits) –If GEI = "1", then the following 8 bits of data are GSPARE.
MB Layer: MBA | MTYPE | MQUANT | MVD | CBP | Block data. Macroblock address (MBA) (Variable length: TABLE 1) –MBA indicates the position of a MB within a GOB. It is the difference between the absolute addresses of the MB and the last transmitted MB. Type information (MTYPE) (Variable length: TABLE 2)
MB Layer: MBA | MTYPE | MQUANT | MVD | CBP | Block data. Quantizer (MQUANT) (5 bits) –MQUANT is present only if so indicated by MTYPE (1, 3, 6, 9).
MB Layer: MBA | MTYPE | MQUANT | MVD | CBP | Block data. Motion vector data (MVD) (Variable length: TABLE 3) –MVD is obtained from the MV (for the MB) by subtracting the vector of the preceding MB. The vector of the preceding MB is regarded as zero in the following three situations: 1) evaluating MVD for MB 1, 12, 23; 2) evaluating MVD for MBs in which MBA does not represent a difference of 1; 3) MTYPE of the previous MB was not MC. –Each MVD codeword stands for a pair of difference values; only one of the pair will yield a MV falling within the permitted range.
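The MVD rules above can be sketched per vector component. The three reset conditions and the ±15 range come from the slides; that the two values sharing a codeword differ by 32 is an assumption recalled from the H.261 VLC table and is not stated in this deck. All function names are illustrative.

```python
MV_MIN, MV_MAX = -15, 15
ROW_START_MBS = (1, 12, 23)

def predictor(mb_number, mba_difference, previous_was_mc, previous_mv):
    """Vector of the preceding MB, or zero in the three reset situations."""
    if (mb_number in ROW_START_MBS or mba_difference != 1
            or not previous_was_mc):
        return (0, 0)
    return previous_mv

def mvd(mv, pred):
    """Difference actually coded for one MB."""
    return (mv[0] - pred[0], mv[1] - pred[1])

def candidate_pair(difference):
    """ASSUMPTION: the two values sharing a codeword differ by 32."""
    other = difference - 32 if difference > 0 else difference + 32
    return (difference, other)

def decode_component(pred, pair):
    """Pick the one candidate that gives an MV inside the permitted range."""
    valid = [pred + d for d in pair if MV_MIN <= pred + d <= MV_MAX]
    if len(valid) != 1:
        raise ValueError("no unique in-range motion vector")
    return valid[0]

pair = candidate_pair(-12 - 10)       # MV component -12, predictor 10
print(pair, decode_component(10, pair))
```

Because the permitted range spans only 30 values, two candidates that are 32 apart can never both land inside it, which is why the decoder can resolve the pair without extra signalling.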
MB Layer: MBA | MTYPE | MQUANT | MVD | CBP | Block data. Coded block pattern (CBP) (Variable length: TABLE 4) –CBP is present if indicated by MTYPE (2, 3, 5, 6, 8, 9). The codeword gives a pattern number signifying those blocks in the MB for which at least one transform coefficient is transmitted. –CBP = 32P1 + 16P2 + 8P3 + 4P4 + 2P5 + P6, where Pn = 1 if any coefficient is present for block n, else 0. Blocks 1 to 4 are the Y blocks, block 5 is Cb and block 6 is Cr.
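The pattern number is a 6-bit binary number with block 1 as the most significant bit. A minimal sketch with illustrative function names:

```python
def coded_block_pattern(coded):
    """coded: six flags for blocks 1..6 (Y1, Y2, Y3, Y4, Cb, Cr)."""
    value = 0
    for flag in coded:
        value = (value << 1) | int(bool(flag))
    return value

def blocks_from_pattern(cbp):
    """Inverse: which of the six blocks carry coefficients."""
    return [bool((cbp >> (5 - n)) & 1) for n in range(6)]

print(coded_block_pattern([1, 0, 0, 0, 0, 1]))  # 32*1 + 1 = 33
```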
Block Layer Transform coefficients (TCOEFF) (Variable length: TABLE 5) –TCOEFF is always present for all six blocks in a MB when MTYPE indicates INTRA. In other cases MTYPE and CBP signal which blocks have coefficient data transmitted for them. –The most commonly occurring combination of successive zeros (RUN) and the following value (LEVEL) are encoded with variable length codes in TABLE 5. Other combinations of (RUN, LEVEL) are encoded with a 20-bit word consisting of 6 bits ESCAPE, 6 bits RUN and 8 bits LEVEL.
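The 20-bit escape word can be assembled as a bit string. The field widths (6 + 6 + 8) come from the slide; the ESCAPE bit pattern `000001`, the two's-complement coding of LEVEL, and the permitted LEVEL range are assumptions recalled from the H.261 tables rather than facts stated in this deck.

```python
ESCAPE = "000001"  # ASSUMED 6-bit escape pattern

def escape_codeword(run, level):
    """Build ESCAPE + 6-bit RUN + 8-bit LEVEL as a string of '0'/'1'."""
    if not 0 <= run <= 63:
        raise ValueError("RUN must fit in 6 bits")
    if level == 0 or not -127 <= level <= 127:
        raise ValueError("LEVEL assumed to be non-zero and within -127..127")
    # ASSUMPTION: negative levels are stored in 8-bit two's complement.
    return ESCAPE + format(run, "06b") + format(level & 0xFF, "08b")

print(escape_codeword(5, -1))
```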
Block Layer There are two code tables in TABLE 5: –1) Being used for the first transmitted LEVEL in INTER, INTER+MC, and INTER+MC+FIL blocks. (EOB is not included). –2) Being used for all other LEVELs (EOB is included) except the first one in INTRA blocks which is fixed length coded with 8 bits. Coefficients after the last non-zero one are not transmitted. EOB is always the last item in blocks for which coefficients are transmitted.
Structure of H.261 Bitstream
Picture layer: PSC | TR | PTYPE | PEI | PSPARE | … | PEI | … | GOB data
GOB layer: GBSC | GN | GQUANT | GEI | GSPARE | … | GEI | … | MB data
MB layer: MBA | MTYPE | MQUANT | MVD | CBP | Block data | … | …
Coding of H.261 Bitstream. Picture Layer: PSC, TR, PTYPE, PEI, PSPARE -> GOB Layer. GOB Layer: GBSC, GN, GQUANT, GEI, GSPARE -> MB Layer.
Coding of H.261 Bitstream. MB Layer: MBA (with MBA stuffing), MTYPE, MQUANT, MVD, CBP -> Block Layer. Block Layer: TCOEFF, EOB. [Legend: Fixed length, Variable length]
H.263 H.263 = (H.261) + (MPEG-like features) Compared to H.261 –More allowable picture formats –Half-pixel motion estimation, no loop filter –Different VLC tables at macroblock and block levels –Four negotiable options 3~4 dB better PSNR than H.261 at <64 kbps
H.263 Video Formats. Columns: Sub-CIF | QCIF | CIF | 4CIF | 16CIF. Rows: Pels/line, Lines
Four Negotiable Options. Unrestricted Motion Vector: motion vectors can point outside the picture, –31.5 to 31.5 instead of –16 to 15.5. Advanced Prediction Mode: 8×8 motion vectors, overlapped block motion compensation, and motion vectors can point outside the picture. Syntax-based Arithmetic Coding (about 5% decrease in bit rate). PB-frame
H.263+ Optional Modes. Annex D: New Unrestricted Motion Vector (mv range up to +/- 256) Annex I: Advanced Intra Coding Annex J: Deblocking Filter Annex M: Improved PB-Frame Annex O: Temporal, Spatial, and SNR Scalability Annex P: Reference Picture Resampling Annex Q: Reduced Resolution Update
H.263+ Optional Modes. Annex S: Alternative Inter VLC Annex T: Modified Quantization. Error resilience: Annex K: Slice Structured Annex R: Independent Segment Decoding Annex N: Reference Picture Selection
Codec Implementation Issues. Fast algorithm for motion estimation. Fast algorithm for DCT/IDCT. Huffman table implementation. Program design –Program diagram –Memory access (frame stores) –Register assignment –Program redundancy
Supplemental Enhancement Information Enhanced features Picture freeze and release Tagging information Snapshot Video segment start/end Progressive refinement start/end Chroma key Can be discarded by decoders that do not understand
H.263++ and H.263L. H.263++ (year 2000): Backward compatible with H.263 and H.263+. Technical proposals on: error resilience, 4×4 motion compensation and transform, adaptive quantization, long-term/background memory, de-blocking and de-ringing filters, … H.263L (year 2002): Not necessarily backward compatible with H.263-type encoders
Conclusion. Basic ideas of Video Coding. H.261/(H.262)/H.263/H.263+. MPEG1/MPEG2/MPEG4/MPEG7/MPEG21. Key concepts in H.26x –Transform-based coding –Motion Estimation