Video Coding TSBK01 Image Coding and Data Compression Lecture 10 Jörgen Ahlberg
Outline I. Colour coding II. Moving images: From 2D to 3D? III. Hybrid coding IV. Video coding standards
Part I: Colour Coding The base colours of colour television are – Red: 700 nm – Green: 546 nm – Blue: 435 nm Three base colours are enough to synthesize any visible colour!
The Colour Vector [Figure: colour vector in a space with axes R, G and B.] In this plane, the luminance Y = R+G+B = 1
The PAL colours
Y = 0.30R + 0.59G + 0.11B
Cr = R - Y = 0.70R - 0.59G - 0.11B
Cb = B - Y = -0.30R - 0.59G + 0.89B
Y: luminance; Cr, Cb: chrominance. [Figure: matrix mapping (R, G, B) to (Y, R-Y, B-Y).]
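The basis change on this slide can be sketched in a few lines. This is only an illustration, assuming the rounded luminance weights 0.30, 0.59 and 0.11; the function names rgb_to_ycrcb and ycrcb_to_rgb are made up for the example and do not come from the lecture.

```python
def rgb_to_ycrcb(r, g, b):
    """Map RGB to luminance Y and the colour differences Cr = R - Y, Cb = B - Y."""
    y = 0.30 * r + 0.59 * g + 0.11 * b  # assumed rounded PAL weights
    return y, r - y, b - y


def ycrcb_to_rgb(y, cr, cb):
    """Invert the mapping: R and B follow directly, G from the luminance equation."""
    r = y + cr
    b = y + cb
    g = (y - 0.30 * r - 0.11 * b) / 0.59
    return r, g, b


# A grey pixel has (almost exactly) zero chrominance.
print(rgb_to_ycrcb(0.5, 0.5, 0.5))
```

Since the weights sum to one, any grey value (R = G = B) ends up with Y equal to that value and both colour differences at zero, up to floating-point rounding.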
Digital Colour Coding Change basis to YUV (almost the same as YCrCb). – For more info on colour spaces, see the colour FAQ. The human visual system perceives luminance at a higher resolution than chrominance, so the colour components can be subsampled. [Figure: sample layouts for Y and UV in 4:2:0 and 4:2:2.]
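A rough sketch of what chroma subsampling does to the sample count. It assumes that 4:2:0 chroma is produced by averaging each 2 × 2 neighbourhood (real systems may filter differently or simply drop samples); subsample_420 and samples_per_frame are illustrative names.

```python
def subsample_420(plane):
    """Average each 2x2 block of a chroma plane (list of rows, even dimensions)."""
    return [
        [
            (plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
            for x in range(0, len(plane[0]), 2)
        ]
        for y in range(0, len(plane), 2)
    ]


def samples_per_frame(width, height, scheme):
    """Count Y + U + V samples per frame for '4:4:4', '4:2:2' or '4:2:0'."""
    # Fraction of the luminance resolution kept for each chroma plane.
    chroma_fraction = {"4:4:4": 1.0, "4:2:2": 0.5, "4:2:0": 0.25}[scheme]
    return int(width * height * (1 + 2 * chroma_fraction))


for scheme in ("4:4:4", "4:2:2", "4:2:0"):
    print(scheme, samples_per_frame(352, 288, scheme))
```

For a 352 × 288 frame, 4:2:0 halves the number of samples relative to 4:4:4 before any actual compression has taken place.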
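Part II: Coding of Moving Images Principle I - Extend known methods to 3D
Coding methods, with performance in bits per pixel (bpp) and complexity:
– PCM: 6 – 8 bpp; complexity low
– VQ: 0.5 – 2 bpp; complexity very high; decoding complexity low
– Predictive: 2 – 5 bpp; complexity low
– Transform: 0.5 – 1.5 bpp; complexity high
– Subband/Wavelet: 0.1 – 1.0 bpp; complexity high
– Fractal: complexity very high; decoding complexity low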
Extending 2D Methods Predictive coding – 3D predictors – Motion compensated predictors Transform coding – 3D transforms Subband coding – 3D subband filters BUT! The properties of the image signal are different in the temporal and the spatial domain!
Frame Prediction Intra-coded I-frame Predictively coded P-frames Better prediction if it can compensate for motion!
Motion Compensated Hybrid Coding [Block diagram with the blocks ME, TQ, TQ^-1, P and VLC.] ME: Motion estimation. TQ: Transform + quantization.
Motion Compensation Typically one motion vector per macroblock (4 transform blocks) Motion estimation is a time consuming process – Hierarchical motion estimation – Maximum length of motion vectors – Clever search strategies Motion vector accuracy: – Integer, half or quarter pixel – Bilinear interpolation
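To make the search problem concrete, here is a minimal sketch of exhaustive block matching with a maximum vector length, plus half-pixel sampling by bilinear interpolation. The cost measure (sum of absolute differences) and the names sad, full_search and half_pel are assumptions made for the example; the slide does not prescribe them, and practical encoders use the faster hierarchical or clever search strategies listed above.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    return sum(
        abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
        for j in range(n)
        for i in range(n)
    )


def full_search(cur, ref, bx, by, n, max_disp):
    """Return the integer vector (dx, dy) with minimum SAD within +/- max_disp."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            # Skip candidates that would read outside the reference frame.
            if bx + dx < 0 or by + dy < 0 or bx + dx + n > w or by + dy + n > h:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]


def half_pel(ref, x2, y2):
    """Bilinear sample at position (x2 / 2, y2 / 2), given in half-pixel units."""
    x0, y0 = x2 // 2, y2 // 2
    x1, y1 = x0 + x2 % 2, y0 + y2 % 2  # same pixel again when the offset is integer
    return (ref[y0][x0] + ref[y0][x1] + ref[y1][x0] + ref[y1][x1]) / 4
```

The full search evaluates (2 · max_disp + 1)² candidates per block, which is why limiting the vector length matters so much for encoder speed.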
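Part IV: Video Coding Standards [Figure: applications and standards placed along a bitrate axis from kbit/s to Mbit/s, divided into very low, low, medium and high bitrate. Applications: mobile videophone, videophone over PSTN, ISDN videophone, Video CD, digital TV, HDTV. Standards: MPEG-4, H.263, H.261, MPEG-1, MPEG-2.]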
Standards H.26x – Standards for real time communication like video telephony and video conferencing. – Standardized by ITU. MPEG – Standards for stored video data like movies on CDs, DVDs, etc. – Standardized by ISO.
H.261 Standard for ISDN picture phones. Motion compensation: – One motion vector per macroblock. – One macroblock = four 8 × 8 luminance blocks + two chrominance blocks (one U and one V). – Motion vectors max 15 pixels long in each direction. Format: – CIF (352 × 288) or QCIF (176 × 144) – 7.5 – 30 frames/s. Bitrate: Multiple of 64 kbit/s (=ISDN) including audio. Quality: Acceptable for small motion at 128 kbit/s.
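A back-of-the-envelope calculation shows how much compression these figures imply. The sketch assumes 8 bits per sample and 4:2:0 sampling (consistent with four luminance blocks plus one U and one V block per macroblock); the helper names are illustrative.

```python
def macroblocks_per_frame(width, height):
    """A macroblock covers 16 x 16 luminance pixels (four 8 x 8 blocks)."""
    return (width // 16) * (height // 16)


def raw_bitrate(width, height, fps, bits_per_sample=8):
    """Uncompressed bit/s, assuming 4:2:0 sampling (1.5 samples per pixel)."""
    samples = width * height * 3 // 2
    return samples * bits_per_sample * fps


cif = raw_bitrate(352, 288, 30)
print("CIF macroblocks per frame:", macroblocks_per_frame(352, 288))
print("Raw CIF at 30 frames/s: %.1f Mbit/s" % (cif / 1e6))
print("Ratio versus 128 kbit/s: about %d:1" % round(cif / 128_000))
```

Under these assumptions, full-rate CIF at 128 kbit/s needs a compression ratio of several hundred to one, and part of the channel also carries audio, which helps explain why quality is only acceptable for small motion.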
H.263 Standard for picture telephones over analog subscriber lines. Format: – CIF, QCIF or Sub-QCIF. – Usually less than 10 frames/s. Bitrate: Typically 20 – 30 kbit/s. Quality: With the new options, as good as H.261 (at half the bitrate).
MPEG Moving Picture Experts Group – a committee under ISO and IEC. Original plan: – MPEG-1 for 1.5 Mbit/s (Video CD) – MPEG-2 for 10 Mbit/s (Digital TV) – MPEG-3 for 40 Mbit/s (HDTV) What happened: – MPEG-1 for 1.5 Mbit/s (Video CD) – MPEG-2 for 2 – 60 Mbit/s (TV and HDTV) – MPEG-4, -7 and -21 for other things.
MPEG-1 ISO/IEC standard. Target bitrate around 1.5 Mbit/s (Video CD). Properties: – Bi-directionally predictively coded frames (B-frames, see next slide). – More flexible than H.261. – Almost JPEG for intra frames. Format: – CIF – No interlace. – 24 – 30 frames/s.
MPEG Frame Types [Figure: a sequence of I-, B- and P-frames forming a group of frames (GOF).] Intra-coded I-frame. Predictively coded P-frames. Bi-directionally predictively coded B-frames.
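One consequence of bi-directional prediction is that frames cannot be transmitted in display order: a B-frame can only be decoded once the following I- or P-frame is available. A minimal sketch of the reordering, where the function name coding_order and the example sequence are illustrative:

```python
def coding_order(display_types):
    """Reorder frame indices so that each B-frame follows both of its references."""
    order, pending_b = [], []
    for idx, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(idx)  # must wait for the next I- or P-frame
        else:
            order.append(idx)  # reference frame goes first ...
            order.extend(pending_b)  # ... then the B-frames that depend on it
            pending_b = []
    order.extend(pending_b)  # trailing B-frames would need the next group's I-frame
    return order


print(coding_order(list("IBBPBBP")))
```

For the display sequence I B B P B B P this yields the coding order 0, 3, 1, 2, 6, 4, 5, so the decoder has to buffer frames and put them back in display order.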
MPEG-coding of I-frames Intracoded 8 × 8 DCT Arbitrary weighting matrix for coefficients Predictive coding of DC-coefficients Uniform quantization Zig-zag, run-level, entropy coding
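The steps listed on this slide (up to, but not including, entropy coding) can be sketched as follows. This is a plain, slow illustration under stated assumptions: an orthonormal DCT, quantization by rounding coefficient / (weight · scale), and a weighting matrix supplied by the caller. All function names are made up for the example.

```python
import math

N = 8


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block; result is indexed [v][u]."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    return [
        [
            c(u) * c(v) * sum(
                block[y][x]
                * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                for y in range(N)
                for x in range(N)
            )
            for u in range(N)
        ]
        for v in range(N)
    ]


def quantize(coefs, weights, scale=1):
    """Uniform quantization with a per-coefficient weight (assumed rounding rule)."""
    return [
        [round(coefs[v][u] / (weights[v][u] * scale)) for u in range(N)]
        for v in range(N)
    ]


def zigzag_order(n=N):
    """Positions (row, col) sorted along anti-diagonals in alternating direction."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )


def run_level(scanned):
    """(run of zeros, nonzero level) pairs for the AC coefficients.
    The DC coefficient (scanned[0]) is coded predictively and skipped here."""
    pairs, run = [], 0
    for level in scanned[1:]:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs  # trailing zeros are implied by an end-of-block symbol


flat = [[100] * N for _ in range(N)]
q = quantize(dct2(flat), [[16] * N for _ in range(N)])
scanned = [q[r][c] for r, c in zigzag_order()]
print(scanned[0], run_level(scanned))
```

A completely flat block leaves only the DC coefficient, so the run-level list is empty and the block costs little more than the DC difference and an end-of-block symbol.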
MPEG-coding of P-frames Motion compensated prediction from I- or P-frame. Half-pixel accuracy of motion vectors, bilinear interpolation. Predictive coding of motion vectors. Prediction error coded as I-frame.
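Predictive coding of motion vectors means that only the difference to a predicted vector is transmitted. The sketch below assumes the simplest possible predictor, the vector of the previously coded macroblock, starting from (0, 0); the exact predictor and reset rules of the standard are not given on the slide, and the function names are illustrative.

```python
def encode_mvs(vectors):
    """Replace each (dx, dy) by its difference from the previous vector."""
    prev = (0, 0)
    diffs = []
    for dx, dy in vectors:
        diffs.append((dx - prev[0], dy - prev[1]))
        prev = (dx, dy)
    return diffs


def decode_mvs(diffs):
    """Rebuild the vectors by accumulating the transmitted differences."""
    prev = (0, 0)
    vectors = []
    for ddx, ddy in diffs:
        prev = (prev[0] + ddx, prev[1] + ddy)
        vectors.append(prev)
    return vectors


mvs = [(3, 1), (3, 1), (4, 1), (4, 2)]
print(encode_mvs(mvs))
```

Neighbouring macroblocks tend to move together, so most differences are small or zero and can be given short codewords by the entropy coder.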
MPEG-coding of B-frames Motion compensated prediction from two consecutive I- or P-frames. – Forward prediction only (1 vector/macroblock). – Backward prediction only (1 vector/macroblock). – Average of fwd and bwd (2 vectors/macroblock). Otherwise as P-frames.
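The three prediction modes can be illustrated with a small sketch. It assumes that fwd and bwd are blocks that have already been motion compensated from the previous and the next reference frame, and that the encoder picks the mode with the smallest sum of absolute differences; that selection rule is an assumption made for the example, since encoders are free to decide differently.

```python
def predict_b_block(fwd, bwd, mode):
    """Build the prediction for one block from already motion-compensated blocks."""
    if mode == "forward":
        return [row[:] for row in fwd]
    if mode == "backward":
        return [row[:] for row in bwd]
    if mode == "average":
        return [[(f + b) / 2 for f, b in zip(fr, br)] for fr, br in zip(fwd, bwd)]
    raise ValueError("unknown mode: " + mode)


def best_mode(cur, fwd, bwd):
    """Pick the mode whose prediction has the smallest absolute error (assumed criterion)."""
    def error(mode):
        pred = predict_b_block(fwd, bwd, mode)
        return sum(abs(c - p) for cr, pr in zip(cur, pred) for c, p in zip(cr, pr))

    return min(("forward", "backward", "average"), key=error)


fwd = [[10, 20], [30, 40]]
bwd = [[20, 40], [50, 60]]
cur = [[15, 30], [40, 50]]  # e.g. a fade halfway between the two references
print(best_mode(cur, fwd, bwd))
```

The averaged mode costs two vectors per macroblock but can give a much smaller prediction error, for instance during fades or when noise in the two references partly cancels.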
MPEG-2 ISO/IEC standard. Properties: – Handles interlace (optimized for TV) – Even more flexible than MPEG-1 Format: – 352 × 288 – 704 × 576 (25 frames/s) or 720 × 480 (30 frames/s) – 1440 × 1152 or 1920 × 1080 (HDTV) Bitrate: – 2 – 60 Mbit/s – ~4 Mbit/s: Image quality similar to PAL / NTSC / SECAM. – 18 – 20 Mbit/s: HDTV.
MPEG-2 (cont.) Profiles: – Simple profile without B-frames. – Scalable profiles. Experience shows that: – At 1.5 – 2 Mbit/s MPEG-2 is not better than MPEG-1. – With manual interaction during coding, good quality can be achieved at 3 – 4 Mbit/s. – Problems with implementing the full standard have caused compatibility problems. – Buffering and rate control are hard problems.
MPEG-4 ISO/IEC standard in 1998, version 2 in 1999 Instead of frames as coding units, MPEG-4 uses audio-visual objects Focus is not primarily on compression, but on content-based functionality Contains definitions of: – Media object types (video, audio, text, graphics,...) – Parameters for describing the objects – Bitstream syntax for the (compressed) parameters – Scene description, file format, streaming, synchronization,... Allows mixing of media objects.
Parts of the MPEG-4 standard Part 1, Systems, contains – The bitstream syntax and the binary language for scene description – Computer graphics object descriptions – Multiplexing, transport,... Part 2, Visual, contains – Video coding – Still image coding – Texture coding,... Part 3, Audio, contains a toolbox of audio coders for different applications...
Structure of an MPEG-4 Decoder [Block diagram: Bitstream, MUX, three A/V object decoders, Compositor, Audio/Video scene.]
MPEG-4 (Natural) Video Instead of frames: Video Object Planes (VOPs). Coded with Shape Adaptive DCT (SA DCT). [Figure: a video frame divided into a background VOP and another VOP, with an alpha map and SA DCT.]
Synthetic/Natural Hybrid Coding Mix traditional video with 2D/3D graphics – Compose virtual environments – Easy to add text, graphs, images, etc. High compression Receive objects from separate sources – Use predefined or locally defined objects Scalability – Progressive decoding – A better terminal gives better quality.
Synthetic Objects 2D/3D graphics – Lines, polygons – Still images – Image/video mapping on polygon meshes VRML scenes and objects Animated people More on animation and virtual characters in Lecture 12! Synthetic audio More on natural and synthetic audio in Lecture 11!
[Figure labels:] Computer graphics generated virtual environment. Natural video object. Natural video object mapped on 2D mesh. Still image or natural video object mapped on animated 3D mesh. All mixed in the decoder!
Virtual Environments Downloaded virtual environment Different environments for different users Simple change between environments Synthetic environments are cheaper than real ones
Tools for Synthetic Objects Wavelet-based still image compression – Scalable quality and resolution – Progressive decoding – Can be mapped on 2D or 3D meshes Compression of 2D and 3D meshes – Mesh geometry and animation – Transmit vertex coordinates and let the receiving terminal calculate the polygons – A moving or still image can be mapped on the mesh (texture mapping).
More Tools for Synthetic Objects Face and Body Animation Text-to-speech (TTS) interface View-dependent scalable texture – Information about the user's view position in a 3D scene is transmitted on a back-channel – Only the necessary texture information is transmitted to the user
View-dependent Scalable Texture [Figure panels:] Original texture. The texture is mapped on a surface. What the user sees.
Other formats Microsoft, RealVideo, QuickTime,... All are variations of the hybrid coder used in MPEG coders, with some extra features.
New Stuff ITU and ISO in cooperation: H.264 = MPEG-4 part 10 Finished in 2003.
H.264 / MPEG-4 part 10 4 × 4 integer transform (approximating DCT). Prediction of blocks of sizes up to 16 × 16. Motion vectors for blocks of sizes 4 × 4 up to 16 × 16. Up to 5 reference images for prediction. Non-uniform quantization. Arithmetic coding of run-level pairs.
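The 4 × 4 integer transform can be illustrated with the forward core matrix commonly given for H.264. The sketch computes Y = C · X · Cᵀ using integer arithmetic only; the scaling that makes the result approximate a DCT is folded into the quantizer in the standard and is left out here. The helper names are illustrative.

```python
# Forward core transform matrix of the H.264 4x4 integer transform.
C = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Plain integer matrix product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward_transform(block):
    """Y = C * X * C^T, exact in integers (no rounding, no drift)."""
    return matmul(matmul(C, block), transpose(C))


flat = [[1] * 4 for _ in range(4)]
print(forward_transform(flat))
```

Because every entry of C is a small integer, the transform needs only additions and shifts, and encoder and decoder compute exactly the same values, unlike a floating-point DCT where implementations can differ by rounding.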
What about the sound? MPEG-1 – Audio layer I, II and III (mp3). MPEG-2 – Four channels, same codec as in MPEG-1. – AAC (Advanced Audio Coding) added later. MPEG-4 – AAC – Two speech coders – Structured audio – And more... More on audio coding in Lecture 11.
Conclusion Colour coding – Change basis from RGB to YUV – Colour components are compressed harder than the luminance Moving image coding – Hybrid coding: Motion compensated predictive coding and transform coding of the prediction error – I-, P-, and B-frames – Object-based coding (MPEG-4) mixing synthetic and natural audio & video
Conclusion (cont.) Standards – MPEG-1: Video CD – MPEG-2: Digital TV – MPEG-4: Multimedia – H.261: ISDN videophone – H.263: PSTN videophone – H.264 / MPEG-4 part 10: Universal video