Download presentation
Published byDiane Webster Modified over 9 years ago
1
Video Compression Professor : Sheau-Ru Tong Student : Chih-Ming Chen
NPUST-MINAR Professor : Sheau-Ru Tong Student : Chih-Ming Chen MINAR
2
Review of basics of image and video compression
Outline Review of basics of image and video compression 1 Scalable video coding 2 Overview of current video compression standards 3 Object-based video coding (MPEG-4) 4 MINAR
3
Review of Image Compression
Coding an image (single frame): RGB to YUV color-space conversion Partition image into 8x8-pixel blocks 2-D DCT of each block Quantize each DCT coefficient Runlength and Huffman code the nonzero quantized DCT coefficients Basis for the JPEG Image Compression Standard JPEG-2000 uses wavelet transform and arithmetic coding Compressed Bitstream Original Signal RGB to YUV Block DCT Quantization Runlength &Huffman Coding MINAR
4
Video Compression Main addition over image compression:
Exploit the temporal redundancy Predict current frame based on previously coded frames Three types of coded frames: I-frame: Intra-coded frame, coded independently of all other frames P-frame: Predicatively coded frame, coded based on previously coded frame B-frame: Bi-directionally predicted frame, coded based on both previous and future coded frames MINAR
5
MC-Prediction and Bi-Directional MC-Prediction (P- and B-frames)
Motion compensated prediction: Predict the current frame based on reference frame(s) while compensating for the motion Examples of block-based motion-compensated prediction (P-frame) and bi-directional prediction (B-frame): Previous Frame P-Frame Previous Frame B-Frame Future Frame MINAR
6
Example Use of I-,P-,B-frames: MPEG Group of Pictures (GOP)
Arrows show prediction dependencies between frames MINAR
7
Summary of Temporal Processing
Use MC-prediction (P and B frames) to reduce temporal redundancy MC-prediction usually performs well; In compression have a second chance to recover when it performs badly MC-prediction yields: Motion vectors MC-prediction error or residual Code error with conventional image coder Sometimes MC-prediction may perform badly Examples: Complex motion, new imagery (occlusions) Approach: Identify blocks where prediction fails Code block without prediction MINAR
8
Basic Video Compression Algorithm
Exploiting the redundancies: Temporal: MC-prediction (P and B frames) Spatial: Block DCT Color: Color space conversion Scalar quantization of DCT coefficients Zigzag scanning, runlength and Huffman coding of the nonzero quantized DCT coefficients MINAR
9
Example Video Encoder MINAR
10
Example Video Decoder MINAR
11
Review of basics of image and video compression
Outline Review of basics of image and video compression 1 Scalable video coding 2 Overview of current video compression standards 3 Object-based video coding (MPEG-4) 4 MINAR
12
Motivation for Scalable Coding
Basic situation: 1. Diverse receivers may request the same video Different bandwidths, spatial resolutions, frame rates, computational capabilities 2. Heterogeneous networks and a priori unknown network conditions Wired and wireless links, time-varying bandwidths When you originally code the video you don’t know which client or network situation will exist in the future Probably have multiple different situations, each requiring a different compressed bitstream Need a different compressed video matched to each situation Possible solutions: Compress & store MANY different versions of the same video Real-time transcoding (e.g. decode/re-encode) Scalable coding MINAR
13
Scalable Video Coding Scalable coding:
Decompose video into multiple layers of prioritized importance Code layers into base and enhancement bitstreams Progressively combine one or more bitstreams to produce different levels of video quality Example of scalable coding with base and two enhancement layers: Can produce three different qualities Base layer Base + Enh1 layers Base + Enh1 + Enh2 layers Scalability with respect to: Spatial or temporal resolution, bit rate, computation, memory Higher quality MINAR
14
Example of Scalable Coding
Encode image/video into three layers: Low-bandwidth receiver: Send only Base layer Medium-bandwidth receiver: Send Base & Enh1 layers High-bandwidth receiver: Send all three layers Can adapt to different clients and network situations MINAR
15
Scalable Video Coding (cont.)
Three basic types of scalability (refine video quality along three different dimensions): Temporal scalability Temporal resolution Spatial scalability Spatial resolution SNR (quality) scalability Amplitude resolution Each type of scalable coding provides scalability of one dimension of the video signal Can combine multiple types of scalability to provide scalability along multiple dimensions MINAR
16
Scalable Coding: Temporal Scalability
Temporal scalability: Based on the use of B-frames to refine the temporal resolution B-frames are dependent on other frames However, no other frame depends on a B-frame Each B-frame may be discarded without affecting other frames MINAR
17
Scalable Coding: Spatial Scalability
Spatial scalability: Based on refining the spatial resolution Base layer is low resolution version of video Enh1 contains coded difference between upsampled base layer and original video Also called: Pyramid coding MINAR
18
Scalable Coding: SNR (Quality) Scalability
SNR (Quality) Scalability: Based on refining the amplitude resolution Base layer uses a coarse quantizer Enh1 applies a finer quantizer to the difference between the original DCT coefficients and the coarsely quantized base layer coefficients MINAR
19
Summary of Scalable Video Coding
Three basic types of scalable coding: Temporal scalability Spatial scalability SNR (quality) scalability Scalable coding produces different layers with prioritized importance Prioritized importance is key for a variety of applications: Adapting to different bandwidths, or client resources such as spatial or temporal resolution or computational power Facilitates error-resilience by explicitly identifying most important and less important bits MINAR
20
Review of basics of image and video compression
Outline Review of basics of image and video compression 1 Scalable video coding 2 Overview of current video compression standards 3 Object-based video coding (MPEG-4) 4 MINAR
21
Motivation for Standards
Goal of standards: Ensuring interoperability: Enabling communication between devices made by different manufacturers Promoting a technology or industry Reducing costs MINAR
22
What do the Standards Specify?
Not the encoder Not the decoder Just the bitstream syntax and the decoding process (e.g. use IDCT, but not how to implement the IDCT) Enables improved encoding & decoding strategies to be employed in a standard-compatible manner Encoder Decoder Bitstream (Decoding Process) Scope of Standardization MINAR
23
Current Image and Video Compression Standards
Application Bit Rate JPEG Continuous-tone still-image compression Variable H.261 Video telephony and teleconferencing over ISDN p x 64 kb/s MPEG-1 Video on digital storage media (CD-ROM) 1.5 Mb/s MPEG-2 Digital Television 2-20 Mb/s H.263 Video telephony over PSTN 33.6-? kb/s MPEG-4 Object-based coding, synthetic content, interactivity JPEG-2000 Improved still image compression H.26L Improved video compression 10’s to 100’s kb/s MINAR
24
Comparing Current Video Compression Standards
Based on the same fundamental building blocks Motion-compensated prediction (I, P, and B frames) 2-D Discrete Cosine Transform (DCT) Color space conversion Scalar quantization, runlengths, Huffman coding Additional tools added for different applications: Progressive or interlaced video Improved compression, error resilience, scalability, etc. MPEG-1/2/4, H.261/3/L: Frame-based coding MPEG-4: Object-based coding and Synthetic video MINAR
25
MPEG-1 and MPEG-2 MPEG-1 (1991) MPEG-2 (1993)
Goal: Compression for digital storage media (e.g. CD-ROM) Achieves VHS quality video and audio at ~1.5 Mb/s MPEG-2 (1993) Goal: Superset of MPEG-1 to support higher bit rates, higher resolutions, and interlaced pictures. Original goal to support interlaced video from conventional television; Eventually extended to support HDTV Provides: Field-based coding and scalability tools MINAR
26
Example Use of I-,P-,B-frames: MPEG Group of Pictures (GOP)
Arrows show prediction dependencies between frames MINAR
27
MPEG Group of Pictures (GOP) Structure
Composed of I, P, and B frames Arrows show prediction dependencies Periodic I-frames enable random access into the coded bitstream Parameters: (1) Spacing between I frames, (2) number of B frames between I and P frames MINAR
28
MPEG Structure MPEG codes video in a hierarchy of layers. The sequence layer is not shown. GOP Layer Picture Layer Macroblock Layer Block Layer Slice Layer MINAR
29
MPEG-2 Profiles and Levels
Goal: To enable more efficient implementations for different applications (interoperability points) Profile: Subset of the tools applicable for a family of applications Level: Bounds on the complexity for any profile Level Profile High Main Low Simple HDTV: Main Profile at High Level DVD & SD Digital TV: Main Profile at Main Level MINAR
30
Goals of MPEG-4 Primary goals: New functionalities (not just better compression) Object-based or content-based representation Separate coding of individual visual objects Content-based access and manipulation Integration of natural and synthetic objects Interactivity Communication over error-prone environments Includes frame-based coding techniques from earlier standards MINAR
31
Comparing MPEG-1/2 and H.261/3 with MPEG-4
MPEG-1/2 and H.261/H.263: Algorithms for compression Basically describe a pipe for storage or transmission Frame-based Emphasis on hardware implementation MPEG-4: Set of tools for a variety of applications Define tools and glue to put them together Object-based and frame-based Emphasis on software Downloadable algorithms (not encoders or decoders) MINAR
32
Review of basics of image and video compression
Outline Review of basics of image and video compression 1 Scalable video coding 2 Overview of current video compression standards 3 Object-based video coding (MPEG-4) 4 MINAR
33
Comments on Object-based Processing
Basic goal: Separate encoding/decoding of separate objects in a scene Separate processing of each object enables: Identification and selective decoding and/or processing of object of interest Facilitates interactivity and manipulation of content Processing of content in the compressed domain Possible w/o decoding or segmentation at decoder Used for many years in authoring/production Video: bluescreening, e.g. weather-news Audio: individual processing of each voice MPEG-4 also enables end-user to have object-based processing MINAR
34
Different Parts of MPEG-4
Video Coding and expression of natural and synthetic video objects Audio Coding and expression of natural and synthetic speech and audio objects Systems Scene Description: Composition of different audio and video objects in the scene BIFS: Binary Format for Scene Description Buffering, multiplexing, timing Interaction Delivery (Delivery of MM Integration Framework, DMIF) Setup of connection (broadcast, interactive) Network is transparent to application MINAR
35
Scene Description Scene description:
Describes the spatio-temporal positioning of the individual audio & video (AV) objects to compose the scene AV Objects: audio, video, natural, synthetic, 2-D, 3-D Transmitted separately from object bitstreams Scene description info is a property of scene’s structure rather than individual objects Enables scene modification without decoding objects Can be dynamically altered MINAR
36
Example of MPEG-4 Scene MINAR [MPEG Committee]
37
Scene Description (cont.)
Hierarchical, tree structure: Leaf nodes: individual AV objects Other nodes: meaningful grouping [MPEG Committee] MINAR
38
Example MPEG-4 Decoding Process
[MPEG Committee] MINAR
39
Object-based Processing in the Compressed Domain
Each video or audio object coded into a separate bitstream Scene description contains all non-coded information Possible operations: Add/delete an object: Add/discard bitstream, e.g. individual instruments in an orchestra Manipulate (e.g. move) object: Alter visual/audio scene composition Many object-based operations can be performed without requiring decoding MINAR
40
MPEG-4 Natural Video MPEG-4 has two primary goals for natural video coding: High compression efficiency coding Rectangular frames High coding efficiency ( kb/s), low latency, low complexity Error resilience against packet loss, burst errors on wireless links Applications include: Video streaming over the Internet, video over 3G cellular systems Object-based coding Content-based functionalities Arbitrarily shaped visual objects Separate encoding & decoding of each object Greatly improved content creation capabilities, as well as interactivity with different objects at the client MINAR
41
MPEG-4 Coding of Natural Video
Classes of video to represent: Rectangular images Shape (rectangle) does not change with time Code motion and amplitude information Use conventional coding methods, e.g. MPEG-1/2 Arbitrarily shaped (non-rectangular) image regions Shape usually changes with time Must code motion, amplitude (texture) and shape Arbitrary & time-varying shape complicates coding Also describe how objects are composed to form scene (scene description) Separate encoding and decode of each object Frame-based coding Object-based coding MINAR
42
MPEG-4 Natural Video Coding
Extension of MPEG-1/2-type algorithms to code arbitrarily shaped objects Frame-based Coding Object-based Coding [MPEG Committee] Basic Idea: Extend Block-DCT and Block-ME/MC-prediction to code arbitrarily shaped objects MINAR
43
Coding of Arbitrarily Shaped Video Objects
Following slides briefly discuss different aspects of coding arbitrarily shaped video objects: Coding of texture (amplitude) information MC-prediction I, P, B coding of objects Coding of shape information Goal: To give brief, conceptual overview (Not covered on problem sets or quiz) Key points to take away: Different attributes to code for arbitrarily shaped video objects Texture, motion, & shape information MPEG-4 extends block-based coding to code arbitrarily shaped objects (Not an elegant solution, but it works) MINAR
44
Example of Arbitrarily Shaped Object
Arbitrarily shaped 2-D object (image region): Video object plane (VOP) in MPEG-4 [MPEG Committee] MINAR
45
Comments on Segmentation
Segmentation of video into objects is not standardized (part of encoder) Different segmentations scenarios: Sometimes segmentation is available, e.g. synthetically generated content Sometimes it is relatively easy, e.g. bluescreening or video-conferencing Usually it is very difficult MINAR
46
Coding the Texture of an Arbitrarily Shaped Object
Texture (amplitude) coded by Block-DCT adapted for arbitrarily shaped support Embed VOP in rectangle Separate processing of each 8x8 block Interior ® Conventional Block-DCT Exterior ® Discard Boundary ® Extrapolate then Block-DCT [MPEG Committee] MINAR
47
MC-Prediction for Texture Coding of Arbitrarily Shaped Object
Block-based ME/MC-P adapted for arbitrarily shaped support: Extrapolate arbitrarily shaped object to fill rectangle Perform conventional block-based ME/MC-P Error metric computed only over object’s support in current frame Also: Parametric motion models (e.g. affine, perspective) [MPEG Committee] MINAR
48
MC-Prediction for Video Object Planes: I, P, and B VOP’s
MC-Prediction for VOP’s: I-VOP: Intra-coded VOP (no prediction) P-VOP: Predicted VOP B-VOP: Bi-directionally predicted VOP MINAR
49
Binary Shape Coding Opaque objects: Each pixel either inside or outside support Shape given by binary alpha map (bitmap or binary mask) Many possible approaches for lossless and lossy shape coding e.g. Describe shape by chain code, polynomials, splines, bitmap MPEG-4: Block-based Context-based Arithmetic Coding (CAE) Embed support in rectangle Separate processing of 16x16 blocks Interior (opaque) blocks (completely within object) Exterior (transparent) blocks (completely outside object) Boundary blocks CAE Also motion compensated CAE MINAR
50
Binary Shape Coding: Block-based Shape Coding
Different 16x16 blocks: Interior, boundary, and exterior MINAR
51
Binary Shape Coding: Block-based CAE (cont.)
Coding of boundary blocks using CAE: Intra-shape coding Context defined by 10-pixel template Inter-shape coding MC-shape using shape motion vector Context defined by 9-pixel template from current and previous frames Previous Frame Current Frame MINAR
52
Sprite Coding (Background Prediction)
Sprite: Large background image Hypothesis: Same background exists for many frames, changes resulting from camera motion and occlusions One possible coding strategy: Code & transmit entire sprite once Only transmit camera motion parameters for each subsequent frame Significant coding gain for some scenes MINAR
53
Sprite Coding Example MINAR Foreground Object Sprite (background)
[MPEG Committee] Reconstructed Frame MINAR
54
Related MPEG Standards (non-compression)
MPEG-7 “Multimedia Content Description Interface” Goal: A method for describing multimedia content to enable efficient searching and management of multimedia. MPEG-21 “Multimedia Framework” Goal: To enable the electronic commerce of digital media content. MINAR
55
References and Further Reading
General Video Compression References: J.G. Apostolopoulos and S.J. Wee, ``Video Compression Standards'‘, Wiley Encyclopedia of Electrical and Electronics Engineering, John Wiley & Sons, Inc., New York, 1999. V. Bhaskaran and K. Konstantinides, Image and Video Compression Standards: Algorithms and Architectures, Boston, Massachusetts: Kluwer Academic Publishers, 1997. J.L. Mitchell, W.B. Pennebaker, C.E. Fogg, and D.J. LeGall, MPEG Video Compression Standard, New York: Chapman & Hall, 1997. B.G. Haskell, A. Puri, A.N. Netravali, Digital Video: An Introduction to MPEG-2, Kluwer Academic Publishers, Boston, 1997. MPEG web site: MINAR
56
References and Further Reading (cont.)
Video Compression Standards Documents Video codec for audiovisual services at px64 kbits/s, ITU-T Recommendation H.261, International Telecommunication Union, 1990. Video coding for low bit rate communication, ITU-T Recommendation H.263, International Telecommunication Union, version 1, 1996; version 2, 1997. ISO/IEC 11172, Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbits/s. International Organization for Standardization (ISO), 1993. ISO/IEC Generic coding of moving pictures and associated audio information. International Organization for Standardization (ISO), 1996. ISO/IEC Coding of audio-visual objects. International Organization for Standardization (ISO), 1999. MINAR
Similar presentations
© 2024 SlidePlayer.com Inc.
All rights reserved.