Presentation is loading. Please wait.

Presentation is loading. Please wait.

Video Compression Lecture 6. Video Basic  Analog TV/Monitor  Progressive  Interlacing  Colours  National Television Standard Committee (NTSC)  30.

Similar presentations


Presentation on theme: "Video Compression Lecture 6. Video Basic  Analog TV/Monitor  Progressive  Interlacing  Colours  National Television Standard Committee (NTSC)  30."— Presentation transcript:

1 Video Compression Lecture 6

2 Video Basic  Analog TV/Monitor  Progressive  Interlacing  Colours  National Television Standard Committee (NTSC)  30 frames per sec ; 525 lines ; Aspect ratio is 4:3 ; Needs colour calibration  Phase Alternating Line (PAL)  25 frames per sec ; 625 lines ; colour phase is reversed on every alternative line  Séquentiel Couler Avec Mémoire (SÉCAM)  25 fps ; 625 lines (like PAL), uses FM modulation of subcarrier

3 Colour space Representations  RGB  YPbPr (Luminance, Blue, Red)  YUV  YCbCr  Digital representation of YPbPr colour space

4 Digitisation (sampling)  As TV signals are natively in luminance and chrominance- difference form, when you capture video its native form is YUV

5 Basic Video Compression based on Motion Compensation

6 Video Compression  A video consists of a time-ordered sequence of frames, i.e., images.  An obvious solution to video compression would be predictive coding based on previous frames. Compression proceeds by subtracting images: subtract in time order and code the residual error.  It can be done even better by searching for just the right parts of the image to subtract from the previous frame.

7 Video Compression with Motion Compensation  Consecutive frames in a video are similar — temporal redundancy exists.  Temporal redundancy is exploited so that not every frame of the video needs to be coded independently as a new image. The difference between the current frame and other frame(s) in the sequence will be coded — small values and low entropy, good for compression.  Steps of Video compression based on Motion Compensation (MC): 1. Motion Estimation (motion vector search). 2. MC-based Prediction. 3. Derivation of the prediction error, i.e., the difference.

8 Motion Compensation  Each image is divided into macroblocks of size N x N.  By default, N = 16 for luminance images. For chrominance images, N = 8 if 4:2:0 chroma subsampling is adopted.  Motion compensation is performed at the macroblock level.  The current image frame is referred to as Target Frame.  A match is sought between the macroblock in the Target Frame and the most similar macroblock in previous and/or future frame(s) (referred to as Reference frame(s)).  The displacement of the reference macroblock to the target macroblock is called a motion vector MV.  Figure 10.1 shows the case of forward prediction in which the Reference frame is taken to be a previous frame.

9  MV search is usually limited to a small immediate neighborhood — both horizontal and vertical displacements in the range [−p, p]. This makes a search window of size (2p + 1) x (2p + 1). Figure 10.1 Macroblocks and Motion Vector in Video Compression

10 Search for Motion Vectors  The difference between two macroblocks can then be measured by their Mean Absolute Difference (MAD): N — size of the macroblock, k and l — indices for pixels in the macroblock, i and j — horizontal and vertical displacements, C ( x + k, y + l ) — pixels in macroblock in Target frame, R ( x + i + k, y + j + l ) — pixels in macroblock in Reference frame.  The goal of the search is to find a vector (i, j) as the motion vector MV = (u, v), such that MAD(i, j) is minimum:

11 H.261 Video

12  H.261 Compression was designed for video telephony and video conferencing applications  Developed by CCITT (now ITU-T) in 1988-1990  Intended for use over ISDN telephone lines, as part of the H.320 protocol suite  Data rate was specified as multiples of 64Kb/s (“p x 64”)  Goals for ISDN video telephony  Low end-to-end delay  Constant bit rate

13 H.261 Structure

14 CIF and QCIF Frame Formats

15 GOB and Resynchronisation  Purpose of Group of Blocks is resynchronisation  GOB starts with a Start Code (GBSC) and Group Number (GN)  Within a GOB, encoded MBs don’t even start on byte boundaries

16 Macroblocks  Macroblock is basic unit for compression  Each macroblock is 16 x16 pixels  Represent as YUV 4:2:0 data  16 x 16 Luminance (Y) and subsampled 8 x 8 Cr, 8 x 8 Cb  Represent this as 6 Blocks of 8 x 8 pixels

17 Macroblock coding  Three ways to code a Macroblock: 1. Don’t code if it hasn’t changed since last frame, do not send it 2. Intra-frame compression  Do DCT, Quantise, Zig-zag, Run-length encoding and Huffman coding. Just like JPEG 3. Inter-frame compression  Calculate difference from previous version of same block  Can use motion estimation to indicate block being differenced can from a slightly different place in previous frame  Same DCT/Quantisation/Huffman coding as Intra, but data is differences rather than absolute values

18 Intra-frame compression  Intra-coding of blocks is very similar to JPEG  DCT  Quantised DCT  Unlike JPEG, H.261 uses the same quantiser value for all coefficients  Feedback loop changes quantiser to achieve target bitrate  Order coefficients in zig-zag order  Run-length encode  Huffman code what remains

19 Inter-frame compression  Basic compression process is the same as intra-frame compression, but the data is the differences from the immediately preceding frame rather than the raw samples themselves.

20 Frame differencing  Often the amount of information in the difference between two frames is a lot less than in the second frame itself

21 Motion  Motion in the scene will increase the differences  If you can figure out the motion (where each block came from in the previous frame)  Encode the motion as a motion vector (MV) (two small integers indicating motion in x and y directions)  Encode the differences from the moved block using DCT+quantisation+RLE+Huffman encoding

22

23 Motion Compensation in H.261  Each inter-coded 16 x 16 pixel macroblock has its own motion vector  Applies to all six 8 x 8 blocks in the macroblock  Encoder must search the image surrounding the marcroblock to discover where it came from  Don’t care whether its really motion or not – only that differencing reduce the data to send  Motion Vector (MV) search can be the most CPU-intensive part of H.261  Standard doesn’t say how to do this – only how to decode the results.

24 Search for Motion Vectors  Brute Force/Sequential  2D Logarithmic  Hierarchical

25 Sequential search  Sequential search: sequentially search the whole (2p + 1) x (2p + 1) window in the Reference frame (also referred to as Full search).  a macroblock centered at each of the positions within the window is compared to the macroblock in the Target frame pixel by pixel and their respective MAD  The vector (i, j) that offers the least MAD is designated as the MV (u, v) for the macroblock in the Target frame.  sequential search method is very costly — assuming each pixel comparison requires three operations (subtraction, absolute value, addition), the cost for obtaining a motion vector for a single macroblock is (2p + 1) (2p + 1) N 2 3 O ( p 2 N 2 ).

26 2D Logarithmic search  Logarithmic search: a cheaper version, that is suboptimal but still usually effective.  The procedure for 2D Logarithmic Search of motion vectors takes several iterations and is akin to a binary search:  As illustrated in Fig.10.2, initially only nine locations in the search window are used as seeds for a MAD-based search; they are marked as ‘1’.  After the one that yields the minimum MAD is located, the center of the new search region is moved to it and the step-size (“offset”) is reduced to half.  In the next iteration, the nine new locations are marked as ‘2’ and so on.

27 Fig. 10.2: 2D Logarithmic Search for Motion Vectors.

28 Hierarchical Search  The search can benefit from a hierarchical (multiresolution) approach in which initial estimation of the motion vector can be obtained from images with a significantly reduced resolution.  A three-level hierarchical search in which the original image is at Level 0, images at Levels 1 and 2 are obtained by down-sampling from the previous levels by a factor of 2, and the initial search is conducted at Level 2.  Since the size of the macroblock is smaller and p can also be proportionally reduced, the number of operations required is greatly reduced.

29 Hierarchical search

30 Intra-block encoding

31 Inter-block encoding

32 Bitstream structure

33 H.261 design goals Intended for videotelephony  Low delay  Each frame coded as it arrives  Only need a small bitstream buffer on output to smooth to CBR  Constant Bit Rate (CBR)  Only send a small number of intra-coded blocks in each frame, so data rate variation is only a function of video content  Adjust the quantisation based on occupancy of the bitstream buffer

34 H.261 Non-design goals  Not intended for recording and playback  No way to seek backwards or forwards because you don’t normally encode any frames with entirely intra-coded blocks  Could do this, but wouldn’t give CBR flow needed for ISDN usage  Limited robustness to bit errors  Error cause corruption (incorrect Huffman decoding of rest of GOB). Possibly detected by hitting a illegal state in decoder  Stop decoding, search for next GOB. Start decoding again.  Intra blocks recover damage slowly over next few seconds

35 H.263  Son of H.261  Standarised in 1996  Replacing H.261 in many applications  Basic design is very similar to H.261 (DCT/Quantisation based, using intra or inter frame coding)  Numerous optional improvements to improve compression, robustness, and flexibility of use.

36 H.263 Improvements  Half-pixel precision in motion vectors (vs full-pixel precision for H.261)  New options:  Unrestricted Motion Vectors  Syntax-based arithmetic coding (replace RLE/Huffman)  Advance prediction (uses four 8 x 8 blocks instead of one 16 x 16: gives better detail)  Five resolutions (H.261 only does QCIF and CIF)

37 Video Compression II Lecture 7

38 MPEG Video

39 MPEG Family  MPEG-1  Similar to H.263 CIF in quality  MPEG-2  Higher quality: DVD, Digital TV, HDTV  MPEG-4/H.264  More modern codec  Aimed at lower bitrates  Works well for HDTV too

40 MPEG-1 Compression  MPEG: Motion Pictures Expert Group  Finalised in 1991  Optimised for video resolutions:  352x240 pixels at 30fps (NTSC)  352x288 pixels at 25fps(PAL/SECAM)  Optimised for bit rates around 1-1.5Mb/s  Syntax allows up to 4095 x 4095 at 60fps, but not commonly used  Progressive scan only (not interlaced)

41 MPEG Frame Types  Unlike H.261, each frame must be of one type  H.261 can mix intra and inter-coded MBs in one frame  Three types of MPEG  I-frame (like H.261 intra-coded frames)  P-frame (“predictive”, like H.261 inter-coded frames)  B-frame (“bidirectional predictive)

42 MPEG I-frames  Similar to JPEG, except:  Luminance and chrominance share quantisation tables  Quantisation is adaptive (table can change) for each macroblock  Unlike H.261, every n frames, a full intra-coded frame is included  Permits skipping. Start decoding at first I-frame following the point you skip to  Permits fast scan. Just play I-frames.  Permits playing backwards (decoding previous I-frame, decode frames that depend on it, play decoded frames in reverse order)  An I frame and the successive frames to the next I frame (n frame) is known as a Group of Pictures

43 MPEG P-frames  Similar to an entire frame of H.261 inter-coded blocks  Half-pixel accuracy in motion vectors (pixels are averaged if needed)  May code from previous I frame or previous P frame

44 Object occlusion  Often an object moves in front of a background  P frames code the object fine, but can’t effectively code the revealed background

45 B-frames  Bidirectional Predictive Frames  Each macroblock contains two sets of motion vectors  Coded from one previous frame, one future frame, or a combination of both 1. Do motion vector search separately in past reference frame 2. Compare 1. Difference from past frame 2. Difference from future frame 3. Difference from average of past and future frame 3. Encode the version with the least differences

46 B-frames: Macroblock averaging

47 Frame ordering  Up to encoder to choose I, P, B frame ordering  E.g. IBBPBBIBBPBBPI…

48 Encoding order  Encode I-frame 1  Store frame 2  Store frame 3  Encode P frame 5  Encode B frame 2  Encode B frame 3  Store frame 5  Store frame 6  Encode I frame 7  Encode B frame 5  Encode B frame 6

49 Transmission order  Frames are encoded out of order  Need to be decoded in the order they are encoded  Common to send out of order  Allows decoder to decode as data arrives, although it still has to hold decoded frames until it has decoded prior B frames before playing them out

50 B-frame disadvantages  Computational complexity  More motion search, need to decide whether or not to average  Increase in memory bandwidth  Extra picture buffer needed  Need to store frames and encode or playback out of order  Delay  Adds several frames delay at encoder waiting for need later frame  Adds several frames delay at decoder holding decoded I/P frame, while decoding and playing prior B-frames that depend on it

51 B-frame advantage  B-frames increase compression  Typically use twice as many B frames as I+P frames Typical MPEG-1 values Really depends on video content.

52 MPEG-2  ISO/IEC standard in 1995  Aimed at higher quality video  Supports interlaced formats  Many features, but has profiles which constrain common subsets of those features:  Main profile (MP): 2-15 Mb/s over broadcast channels (e.g. DVB-t) or storage media (e.g. DVD)  PAL quality: 4-6Mb/s, NTSC quality: 3-5Mb/s

53 MPEG-2 levels

54 MPEG-2 vs MPEG-1  Sequence layer  Progressive vs interlaced  More aspect ratios  Syntax can now signal frames size up to 16383 x 16383  Pictures must be a multiple of 16 pixels

55 MPEG-2 vs MPEG-1 Picture layer:  All MPEG-2 motion vectors are always half-pixel accuracy  MPEG-1 can opt out, and do one-pixel accuracy  DC coefficient can be coded as 8, 9, 10,or 11 bits  MPEG-1 always uses 8 bits  Optimal non-linear macroblock quantisation, giving a more dynamic step size range  0.5 to 56 vs 1 to 1 to 32 in MPEG-1  Good for high-rate high quality video

56 Interlacing  MPEG-2 codes a frame. May include both interlaced fields  Fields may differ, so compression suffers  More high frequencies in vertical dimension  MPEG-2 can use a modified zig-zag for run-length encoding of the coefficients

57 Interlacing  Although MPEG-2 only codes full frame (both fields), it support both field prediction and frame prediction for interlaced sources  The current uncompressed frame has two fields  Can do the motion search independently for each field  Half the lines use one motion vector and half use the other to produce the reference block

58 Typical MPEG-2 frame size  Average sizes for ~4Mb/s video, main profile at main level  Actual frame sizes will vary a lot depending on content

59 MPEG-3  Doesn’t exist  Was aimed at HDTV  Ended up being folded into MPEG-2

60 MPEG-4  ISO/IEC designation ‘ISO/IEC 14496’: 1999  MPEG-4 version 2: 2000  Aimed at low bitrate (10Kb/s)  Can scale very high (1Gb/s)  Based around the concept of the composition of basic video objects into a scene

61 Media objects  Still images (e.g. as a fixed background)  Video objects (e.g. a talking person without the background)  Audio objects (e.g. the voice associated with that person, background music)  Text and graphics  Talking synthetics heads and associated text used to synthesized the speech and animate the head; animated bodies to go with the faces  Synthetic sound  Also 3D objects

62 Composition of Media objects  MPEG-4 provides a standarised way to describe a scene  Place media objects in a coordinate system  Apply transforms to change the geometrical or acoustical appearance of a media object  Group primitive media objects to form compound media objects  Apply streamed data to media objects  E.g. animation parameters driving a synthetic face  Can change, interactively, the user’s viewing and listening points anywhere in the scene  Builds on concepts from the Virtual Reality Modelling Language (VRML)

63

64 MPEG-4 If you can segment foreground motion form the background, MPEG-4 allows you to send it separately as a sprite

65 H.264 (MPEG-4, Part 10)  MPEG-4, Part 10 is also known as H.264  Advanced video coding standard, finalised in 2003

66 H.264 vs MPEG-2  Multi-picture motion compensation  Can use up to 32 different frames to predict a single frame  B-frame in MPEG-2 only code from two  Variable block-size motion compensation  From 4 x 4 to 16 x 16  Allows precise segmentation of edges of moving regions  Quarter-pixel precision for motion compensation  Weighted prediction (can scale or offset predicted block)  Useful in face-to-black or cross-fade between scenes  Spatial prediction from the edges of neighbouring blocks for “intra” coding  Choice of several more advanced context-awareness variable length coding schemes (instead of Huffman)

67 H.264 performance  Typically half the data rate of MPEG-2  HDTV:  MPEG-2: 1920 x 1080 typically 12-20 Mbps  H.264: 1920 x 1080 content at 7-8 Mbps

68 H.264 usage  Pretty new but expanding use  Include din MacOS 10 (Tiger) for iChat video conferencing  Used by Video iPod  Adopted by 3GPP for Mobile Video  Mandatory in both the HD-DVD and Blue-ray specifications for High Definition DVD

69 Thanks! Acknowledgement: Compilation of notes from UCL, CMU, Li & Drew


Download ppt "Video Compression Lecture 6. Video Basic  Analog TV/Monitor  Progressive  Interlacing  Colours  National Television Standard Committee (NTSC)  30."

Similar presentations


Ads by Google