Video Compression Professor : Sheau-Ru Tong Student : Chih-Ming Chen


1 Video Compression
NPUST-MINAR Professor: Sheau-Ru Tong Student: Chih-Ming Chen

2 Outline
1. Review of basics of image and video compression
2. Scalable video coding
3. Overview of current video compression standards
4. Object-based video coding (MPEG-4)

3 Review of Image Compression
Coding an image (single frame):
- RGB to YUV color-space conversion
- Partition image into 8x8-pixel blocks
- 2-D DCT of each block
- Quantize each DCT coefficient
- Runlength and Huffman code the nonzero quantized DCT coefficients
→ Basis for the JPEG Image Compression Standard
→ JPEG-2000 uses wavelet transform and arithmetic coding
[Figure: Original Signal → RGB to YUV → Block DCT → Quantization → Runlength & Huffman Coding → Compressed Bitstream]
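The block DCT and quantization steps above can be sketched in plain Python. This is only an illustration: the function names and the single uniform step size of 16 are assumptions, whereas JPEG uses a per-coefficient quantization table.

```python
import math

N = 8  # block size, as on the slide


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * s
    return out


def quantize(coeffs, step):
    """Uniform scalar quantization: map each coefficient to an integer level."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


# A flat block has all of its energy in the DC coefficient.
flat = [[100] * N for _ in range(N)]
levels = quantize(dct_2d(flat), step=16)
```

For the flat block, only `levels[0][0]` is nonzero, which is why the later runlength stage pays off on smooth image regions.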

4 Video Compression
Main addition over image compression: exploit the temporal redundancy
- Predict current frame based on previously coded frames
Three types of coded frames:
- I-frame: Intra-coded frame, coded independently of all other frames
- P-frame: Predictively coded frame, coded based on previously coded frame
- B-frame: Bi-directionally predicted frame, coded based on both previous and future coded frames

5 MC-Prediction and Bi-Directional MC-Prediction (P- and B-frames)
Motion-compensated prediction: predict the current frame based on reference frame(s) while compensating for the motion
Examples of block-based motion-compensated prediction (P-frame) and bi-directional prediction (B-frame):
[Figure: Previous Frame → P-Frame; Previous Frame → B-Frame ← Future Frame]
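Block-based motion estimation for a P-frame can be sketched as an exhaustive search that minimizes the sum of absolute differences (SAD). The block size, search range, and function names below are illustrative assumptions; the slides do not prescribe a search strategy.

```python
def sad(cur, ref, bx, by, dx, dy, bs):
    """Sum of absolute differences between a current block and a displaced reference block."""
    total = 0
    for y in range(bs):
        for x in range(bs):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def motion_search(cur, ref, bx, by, bs=4, search=2):
    """Full search over +/- search pixels; returns (dx, dy, cost) of the best match."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference frame.
            if by + dy < 0 or by + dy + bs > h or bx + dx < 0 or bx + dx + bs > w:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, bs)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Synthetic test: the scene content moves one pixel to the right between frames.
ref = [[x * x + 5 * y for x in range(12)] for y in range(12)]
cur = [[ref[y][max(x - 1, 0)] for x in range(12)] for y in range(12)]
vector = motion_search(cur, ref, bx=4, by=4)
```

The returned vector points from the current block to its best match in the reference frame; the residual after subtracting that match is what gets passed to the image coder.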

6 Example Use of I-,P-,B-frames: MPEG Group of Pictures (GOP)
Arrows show prediction dependencies between frames

7 Summary of Temporal Processing
- Use MC-prediction (P and B frames) to reduce temporal redundancy
- MC-prediction usually performs well; in compression there is a second chance to recover when it performs badly
- MC-prediction yields:
  1. Motion vectors
  2. MC-prediction error or residual → code the error with a conventional image coder
- Sometimes MC-prediction may perform badly
  - Examples: complex motion, new imagery (occlusions)
  - Approach: (1) identify blocks where prediction fails, (2) code those blocks without prediction
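One simple way to identify blocks where prediction fails is to compare the size of the prediction residual with the block's own deviation from its mean. The rule below is an assumed heuristic for illustration, not the decision rule of any particular standard or encoder.

```python
def choose_mode(block, prediction):
    """Return 'inter' if the MC-prediction residual is cheaper than coding the block itself."""
    pixels = [p for row in block for p in row]
    pred = [p for row in prediction for p in row]
    mean = sum(pixels) / len(pixels)
    intra_cost = sum(abs(p - mean) for p in pixels)             # activity of the block itself
    inter_cost = sum(abs(p - q) for p, q in zip(pixels, pred))  # size of the residual
    return "inter" if inter_cost <= intra_cost else "intra"


block = [[10, 20], [30, 40]]
good_prediction = [[10, 21], [30, 40]]
bad_prediction = [[200, 200], [200, 200]]   # e.g. newly uncovered imagery
modes = (choose_mode(block, good_prediction), choose_mode(block, bad_prediction))
```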

8 Basic Video Compression Algorithm
Exploiting the redundancies:
- Temporal: MC-prediction (P and B frames)
- Spatial: Block DCT
- Color: Color space conversion
Scalar quantization of DCT coefficients
Zigzag scanning, runlength and Huffman coding of the nonzero quantized DCT coefficients
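The zigzag scan and runlength step can be sketched as follows. The (run, level) pairs and the "EOB" marker are a simplified assumption modeled on JPEG/MPEG-style coding; the Huffman stage that would assign codewords to the pairs is left out, and the DC coefficient is not treated separately here.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in zigzag scan order."""
    def key(rc):
        r, c = rc
        s = r + c
        # Odd anti-diagonals run top to bottom, even ones bottom to top.
        return (s, r if s % 2 else c)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def run_length(levels):
    """Encode quantized levels as (zero_run, level) pairs followed by an end-of-block marker."""
    pairs, run = [], 0
    for r, c in zigzag_order(len(levels)):
        v = levels[r][c]
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied by the marker
    return pairs


levels = [[0] * 8 for _ in range(8)]
levels[0][0], levels[0][1], levels[2][0] = 50, 3, -1
pairs = run_length(levels)
```

Because quantization zeroes most high-frequency coefficients, the zigzag order tends to group the zeros at the end of the scan, where the end-of-block marker covers them all at once.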

9 Example Video Encoder

10 Example Video Decoder

11 Outline
1. Review of basics of image and video compression
2. Scalable video coding
3. Overview of current video compression standards
4. Object-based video coding (MPEG-4)

12 Motivation for Scalable Coding
Basic situation:
1. Diverse receivers may request the same video
   - Different bandwidths, spatial resolutions, frame rates, computational capabilities
2. Heterogeneous networks and a priori unknown network conditions
   - Wired and wireless links, time-varying bandwidths
→ When you originally code the video you don’t know which client or network situation will exist in the future
→ There will probably be multiple different situations, each requiring a different compressed bitstream
Need a different compressed video matched to each situation
Possible solutions:
1. Compress & store MANY different versions of the same video
2. Real-time transcoding (e.g. decode/re-encode)
3. Scalable coding

13 Scalable Video Coding
Scalable coding:
- Decompose video into multiple layers of prioritized importance
- Code layers into base and enhancement bitstreams
- Progressively combine one or more bitstreams to produce different levels of video quality
Example of scalable coding with base and two enhancement layers: can produce three different qualities (listed from lower to higher quality)
1. Base layer
2. Base + Enh1 layers
3. Base + Enh1 + Enh2 layers
Scalability with respect to: spatial or temporal resolution, bit rate, computation, memory

14 Example of Scalable Coding
Encode image/video into three layers:
- Low-bandwidth receiver: Send only Base layer
- Medium-bandwidth receiver: Send Base & Enh1 layers
- High-bandwidth receiver: Send all three layers
Can adapt to different clients and network situations
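The layer selection for each receiver can be sketched as a greedy walk over the layers in priority order. The layer bit rates below are hypothetical numbers chosen only for the example.

```python
# Hypothetical layer rates in kb/s, listed in priority order.
LAYERS = [("base", 200), ("enh1", 300), ("enh2", 500)]


def select_layers(bandwidth_kbps, layers=LAYERS):
    """Send layers in priority order until the next one would exceed the bandwidth."""
    chosen, used = [], 0
    for name, rate in layers:
        if used + rate > bandwidth_kbps:
            break  # an enhancement layer is useless without the layers below it
        chosen.append(name)
        used += rate
    return chosen


low = select_layers(250)
medium = select_layers(600)
high = select_layers(1000)
```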

15 Scalable Video Coding (cont.)
Three basic types of scalability (refine video quality along three different dimensions):
- Temporal scalability → Temporal resolution
- Spatial scalability → Spatial resolution
- SNR (quality) scalability → Amplitude resolution
Each type of scalable coding provides scalability of one dimension of the video signal
Can combine multiple types of scalability to provide scalability along multiple dimensions

16 Scalable Coding: Temporal Scalability
Temporal scalability: based on the use of B-frames to refine the temporal resolution
- B-frames are dependent on other frames
- However, no other frame depends on a B-frame
- Each B-frame may be discarded without affecting other frames
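Since no other frame depends on a B-frame, a sender or network node can lower the frame rate by filtering B-frames out of the stream. A minimal sketch, assuming frames are represented by labels such as "I0" or "B1" (a made-up representation for this example):

```python
def drop_b_frames(frames):
    """Keep only I- and P-frames; the remaining frames stay decodable."""
    return [f for f in frames if not f.startswith("B")]


full_rate = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
reduced_rate = drop_b_frames(full_rate)
```

With two B-frames between anchor frames, dropping them cuts the frame rate to one third.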

17 Scalable Coding: Spatial Scalability
Spatial scalability: based on refining the spatial resolution
- Base layer is low resolution version of video
- Enh1 contains coded difference between upsampled base layer and original video
- Also called: pyramid coding
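The base/enhancement split can be sketched on a 1-D signal. The pair-averaging downsampler and sample-repetition upsampler are assumptions chosen for brevity, and the layers are left unquantized here, whereas a real coder would compress both.

```python
def downsample(x):
    """Halve the resolution by averaging neighboring pairs (even length assumed)."""
    return [(x[i] + x[i + 1]) // 2 for i in range(0, len(x), 2)]


def upsample(x):
    """Double the resolution by repeating each sample."""
    return [v for v in x for _ in range(2)]


def encode(x):
    base = downsample(x)
    enh1 = [a - b for a, b in zip(x, upsample(base))]  # difference to the original
    return base, enh1


def decode(base, enh1=None):
    up = upsample(base)
    return up if enh1 is None else [a + b for a, b in zip(up, enh1)]


signal = [10, 12, 20, 24, 30, 30, 8, 0]
base, enh1 = encode(signal)
```

Decoding `base` alone gives a coarse approximation of the signal; adding `enh1` restores the original resolution.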

18 Scalable Coding: SNR (Quality) Scalability
SNR (quality) scalability: based on refining the amplitude resolution
- Base layer uses a coarse quantizer
- Enh1 applies a finer quantizer to the difference between the original DCT coefficients and the coarsely quantized base layer coefficients
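A two-layer SNR-scalable quantizer can be sketched as below. The step sizes 16 and 4 and the sample coefficients are arbitrary assumptions for the example.

```python
def quant(value, step):
    """Uniform scalar quantizer returning an integer level."""
    return int(round(value / step))


def encode_snr(coeffs, coarse=16, fine=4):
    base = [quant(c, coarse) for c in coeffs]
    # Enh1 quantizes what the coarse quantizer left behind.
    enh1 = [quant(c - b * coarse, fine) for c, b in zip(coeffs, base)]
    return base, enh1


def decode_snr(base, enh1=None, coarse=16, fine=4):
    rec = [b * coarse for b in base]
    if enh1 is not None:
        rec = [r + e * fine for r, e in zip(rec, enh1)]
    return rec


coeffs = [100, -37, 9, 0]
base, enh1 = encode_snr(coeffs)
base_only = decode_snr(base)
refined = decode_snr(base, enh1)
```

In this example the base layer alone is off by up to 7, while base plus Enh1 is off by at most 1.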

19 Summary of Scalable Video Coding
Three basic types of scalable coding:
- Temporal scalability
- Spatial scalability
- SNR (quality) scalability
Scalable coding produces different layers with prioritized importance
Prioritized importance is key for a variety of applications:
- Adapting to different bandwidths, or client resources such as spatial or temporal resolution or computational power
- Facilitates error-resilience by explicitly identifying most important and less important bits

20 Outline
1. Review of basics of image and video compression
2. Scalable video coding
3. Overview of current video compression standards
4. Object-based video coding (MPEG-4)

21 Motivation for Standards
Goal of standards:
- Ensuring interoperability: enabling communication between devices made by different manufacturers
- Promoting a technology or industry
- Reducing costs

22 What do the Standards Specify?
- Not the encoder
- Not the decoder
- Just the bitstream syntax and the decoding process (e.g. use IDCT, but not how to implement the IDCT)
→ Enables improved encoding & decoding strategies to be employed in a standard-compatible manner
[Figure: Encoder → Bitstream → Decoder (Decoding Process); the scope of standardization covers the bitstream and the decoding process]

23 Current Image and Video Compression Standards
Standard | Application | Bit Rate
JPEG | Continuous-tone still-image compression | Variable
H.261 | Video telephony and teleconferencing over ISDN | p x 64 kb/s
MPEG-1 | Video on digital storage media (CD-ROM) | 1.5 Mb/s
MPEG-2 | Digital Television | 2-20 Mb/s
H.263 | Video telephony over PSTN | 33.6-? kb/s
MPEG-4 | Object-based coding, synthetic content, interactivity |
JPEG-2000 | Improved still image compression |
H.26L | Improved video compression | 10s to 100s of kb/s

24 Comparing Current Video Compression Standards
Based on the same fundamental building blocks:
- Motion-compensated prediction (I, P, and B frames)
- 2-D Discrete Cosine Transform (DCT)
- Color space conversion
- Scalar quantization, runlengths, Huffman coding
Additional tools added for different applications:
- Progressive or interlaced video
- Improved compression, error resilience, scalability, etc.
MPEG-1/2/4, H.261/3/L: Frame-based coding
MPEG-4: Object-based coding and synthetic video

25 MPEG-1 and MPEG-2
MPEG-1 (1991)
- Goal: Compression for digital storage media (e.g. CD-ROM)
- Achieves VHS quality video and audio at ~1.5 Mb/s
MPEG-2 (1993)
- Goal: Superset of MPEG-1 to support higher bit rates, higher resolutions, and interlaced pictures
- Original goal was to support interlaced video from conventional television; eventually extended to support HDTV
- Provides: field-based coding and scalability tools

26 Example Use of I-,P-,B-frames: MPEG Group of Pictures (GOP)
Arrows show prediction dependencies between frames

27 MPEG Group of Pictures (GOP) Structure
- Composed of I, P, and B frames
- Arrows show prediction dependencies
- Periodic I-frames enable random access into the coded bitstream
- Parameters: (1) spacing between I frames, (2) number of B frames between I and P frames
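The two GOP parameters determine the frame-type pattern, and the prediction dependencies determine the order in which frames must be coded: a B-frame can only be coded after both of its anchor frames. A small sketch, with made-up function names and frame labels:

```python
def gop_display_order(n, num_b):
    """Frame labels in display order: n frames per GOP, num_b B-frames between anchors."""
    frames = []
    for i in range(n):
        if i == 0:
            kind = "I"
        elif i % (num_b + 1) == 0:
            kind = "P"
        else:
            kind = "B"
        frames.append(f"{kind}{i}")
    return frames


def coding_order(display):
    """Reorder so each anchor (I or P) precedes the B-frames that are predicted from it."""
    out, pending_b = [], []
    for f in display:
        if f.startswith("B"):
            pending_b.append(f)   # must wait for the next anchor frame
        else:
            out.append(f)
            out.extend(pending_b)
            pending_b = []
    # Trailing B-frames would wait for the next GOP's I-frame; this sketch appends them as-is.
    return out + pending_b


display = gop_display_order(7, 2)
coded = coding_order(display)
```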

28 MPEG Structure
MPEG codes video in a hierarchy of layers. The sequence layer is not shown.
[Figure: GOP Layer → Picture Layer → Slice Layer → Macroblock Layer → Block Layer]

29 MPEG-2 Profiles and Levels
Goal: To enable more efficient implementations for different applications (interoperability points)
- Profile: Subset of the tools applicable for a family of applications
- Level: Bounds on the complexity for any profile
[Figure: grid of levels versus profiles, with labels High, Main, Low, Simple]
- HDTV: Main Profile at High Level
- DVD & SD Digital TV: Main Profile at Main Level

30 Goals of MPEG-4
Primary goals: new functionalities (not just better compression)
- Object-based or content-based representation
  - Separate coding of individual visual objects
  - Content-based access and manipulation
- Integration of natural and synthetic objects
- Interactivity
- Communication over error-prone environments
Includes frame-based coding techniques from earlier standards

31 Comparing MPEG-1/2 and H.261/3 with MPEG-4
MPEG-1/2 and H.261/H.263: algorithms for compression
- Basically describe a pipe for storage or transmission
- Frame-based
- Emphasis on hardware implementation
MPEG-4: set of tools for a variety of applications
- Define tools and glue to put them together
- Object-based and frame-based
- Emphasis on software
- Downloadable algorithms (not encoders or decoders)

32 Outline
1. Review of basics of image and video compression
2. Scalable video coding
3. Overview of current video compression standards
4. Object-based video coding (MPEG-4)

33 Comments on Object-based Processing
Basic goal: separate encoding/decoding of separate objects in a scene
Separate processing of each object enables:
- Identification and selective decoding and/or processing of object of interest
- Interactivity and manipulation of content
- Processing of content in the compressed domain
  - Possible w/o decoding or segmentation at decoder
Used for many years in authoring/production:
- Video: bluescreening, e.g. weather-news
- Audio: individual processing of each voice
MPEG-4 also enables the end-user to have object-based processing

34 Different Parts of MPEG-4
Video
- Coding and expression of natural and synthetic video objects
Audio
- Coding and expression of natural and synthetic speech and audio objects
Systems
- Scene description: composition of different audio and video objects in the scene
  - BIFS: Binary Format for Scene Description
- Buffering, multiplexing, timing
- Interaction
Delivery (Delivery Multimedia Integration Framework, DMIF)
- Setup of connection (broadcast, interactive)
- Network is transparent to application

35 Scene Description
Scene description:
- Describes the spatio-temporal positioning of the individual audio & video (AV) objects to compose the scene
  - AV objects: audio, video, natural, synthetic, 2-D, 3-D
- Transmitted separately from object bitstreams
- Scene description info is a property of scene’s structure rather than individual objects
  → Enables scene modification without decoding objects
- Can be dynamically altered

36 Example of MPEG-4 Scene [MPEG Committee]

37 Scene Description (cont.)
Hierarchical, tree structure:
- Leaf nodes: individual AV objects
- Other nodes: meaningful grouping
[MPEG Committee]

38 Example MPEG-4 Decoding Process
[MPEG Committee]

39 Object-based Processing in the Compressed Domain
- Each video or audio object coded into a separate bitstream
- Scene description contains all non-coded information
Possible operations:
- Add/delete an object: add/discard bitstream, e.g. individual instruments in an orchestra
- Manipulate (e.g. move) object: alter visual/audio scene composition
→ Many object-based operations can be performed without requiring decoding
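The idea that objects can be added, removed, or moved without decoding can be sketched with a toy data structure that keeps the coded bitstreams as opaque bytes next to a separate scene description. The `Scene` class and its fields are hypothetical and are not part of MPEG-4 or BIFS.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    """Toy scene: coded object bitstreams plus a separate scene description."""
    streams: dict = field(default_factory=dict)      # object id -> coded bytes (never decoded here)
    description: dict = field(default_factory=dict)  # object id -> (x, y) position

    def add(self, oid, bitstream, pos):
        self.streams[oid] = bitstream
        self.description[oid] = pos

    def delete(self, oid):
        # Discarding an object means dropping its bitstream and its scene entry.
        del self.streams[oid]
        del self.description[oid]

    def move(self, oid, pos):
        # Only the scene description changes; the coded bitstream is untouched.
        self.description[oid] = pos


scene = Scene()
scene.add("violin", b"\x01\x02", (0, 0))
scene.add("cello", b"\x03\x04", (10, 0))
scene.move("violin", (5, 5))
scene.delete("cello")
```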

40 MPEG-4 Natural Video
MPEG-4 has two primary goals for natural video coding:
1. High compression efficiency coding
   - Rectangular frames
   - High coding efficiency ( kb/s), low latency, low complexity
   - Error resilience against packet loss, burst errors on wireless links
   - Applications include: video streaming over the Internet, video over 3G cellular systems
2. Object-based coding
   - Content-based functionalities
   - Arbitrarily shaped visual objects
   - Separate encoding & decoding of each object
   - Greatly improved content creation capabilities, as well as interactivity with different objects at the client

41 MPEG-4 Coding of Natural Video
Classes of video to represent:
- Rectangular images (frame-based coding)
  - Shape (rectangle) does not change with time
  - Code motion and amplitude information
  - Use conventional coding methods, e.g. MPEG-1/2
- Arbitrarily shaped (non-rectangular) image regions (object-based coding)
  - Shape usually changes with time
  - Must code motion, amplitude (texture) and shape
  - Arbitrary & time-varying shape complicates coding
  - Also describe how objects are composed to form scene (scene description)
  - Separate encoding and decoding of each object

42 MPEG-4 Natural Video Coding
Extension of MPEG-1/2-type algorithms to code arbitrarily shaped objects
[Figure: Frame-based Coding versus Object-based Coding] [MPEG Committee]
Basic idea: extend Block-DCT and Block-ME/MC-prediction to code arbitrarily shaped objects

43 Coding of Arbitrarily Shaped Video Objects
Following slides briefly discuss different aspects of coding arbitrarily shaped video objects:
- Coding of texture (amplitude) information
- MC-prediction
- I, P, B coding of objects
- Coding of shape information
Goal: to give a brief, conceptual overview (not covered on problem sets or quiz)
Key points to take away:
- Different attributes to code for arbitrarily shaped video objects → texture, motion, & shape information
- MPEG-4 extends block-based coding to code arbitrarily shaped objects (not an elegant solution, but it works)

44 Example of Arbitrarily Shaped Object
Arbitrarily shaped 2-D object (image region): Video object plane (VOP) in MPEG-4 [MPEG Committee]

45 Comments on Segmentation
- Segmentation of video into objects is not standardized (part of encoder)
- Different segmentation scenarios:
  - Sometimes segmentation is available, e.g. synthetically generated content
  - Sometimes it is relatively easy, e.g. bluescreening or video-conferencing
  - Usually it is very difficult

46 Coding the Texture of an Arbitrarily Shaped Object
Texture (amplitude) coded by Block-DCT adapted for arbitrarily shaped support:
- Embed VOP in rectangle
- Separate processing of each 8x8 block:
  - Interior → Conventional Block-DCT
  - Exterior → Discard
  - Boundary → Extrapolate then Block-DCT
[MPEG Committee]
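For a boundary block, the pixels outside the object must be given some value before the block DCT is applied. The sketch below fills them with the mean of the object pixels; this is one simple extrapolation chosen for illustration and should not be read as the exact padding procedure specified by MPEG-4.

```python
def pad_boundary_block(block, mask):
    """Replace pixels outside the object (mask == 0) with the mean of the object pixels."""
    inside = [p for prow, mrow in zip(block, mask) for p, m in zip(prow, mrow) if m]
    if not inside:
        raise ValueError("exterior block: nothing to extrapolate from")
    mean = round(sum(inside) / len(inside))
    return [[p if m else mean for p, m in zip(prow, mrow)]
            for prow, mrow in zip(block, mask)]


block = [[10, 99], [20, 99]]   # 99 marks don't-care values outside the object
mask = [[1, 0], [1, 0]]
padded = pad_boundary_block(block, mask)
```

A smooth fill keeps the padded block from having an artificial edge at the object boundary, which would otherwise cost bits in the high-frequency DCT coefficients.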

47 MC-Prediction for Texture Coding of Arbitrarily Shaped Object
Block-based ME/MC-P adapted for arbitrarily shaped support:
- Extrapolate arbitrarily shaped object to fill rectangle
- Perform conventional block-based ME/MC-P
- Error metric computed only over object’s support in current frame
Also: parametric motion models (e.g. affine, perspective)
[MPEG Committee]

48 MC-Prediction for Video Object Planes: I, P, and B VOP’s
MC-Prediction for VOP’s:
- I-VOP: Intra-coded VOP (no prediction)
- P-VOP: Predicted VOP
- B-VOP: Bi-directionally predicted VOP

49 Binary Shape Coding
Opaque objects: each pixel is either inside or outside the support
- Shape given by binary alpha map (bitmap or binary mask)
Many possible approaches for lossless and lossy shape coding
- e.g. describe shape by chain code, polynomials, splines, bitmap
MPEG-4: Block-based Context-based Arithmetic Coding (CAE)
- Embed support in rectangle
- Separate processing of 16x16 blocks:
  - Interior (opaque) blocks (completely within object)
  - Exterior (transparent) blocks (completely outside object)
  - Boundary blocks → CAE
- Also motion compensated CAE

50 Binary Shape Coding: Block-based Shape Coding
Different 16x16 blocks: interior, boundary, and exterior
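Classifying the blocks of a binary alpha map can be sketched as below. The function name is made up, and the example uses a block size of 2 instead of 16 only to keep the test data small.

```python
def classify_blocks(alpha, bs=16):
    """Label each bs x bs block of a binary alpha map as interior, exterior, or boundary."""
    h, w = len(alpha), len(alpha[0])
    labels = []
    for by in range(0, h, bs):
        row = []
        for bx in range(0, w, bs):
            vals = [alpha[y][x]
                    for y in range(by, min(by + bs, h))
                    for x in range(bx, min(bx + bs, w))]
            if all(vals):
                row.append("interior")   # completely within the object
            elif not any(vals):
                row.append("exterior")   # completely outside the object
            else:
                row.append("boundary")   # only these blocks need CAE
        labels.append(row)
    return labels


alpha = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
labels = classify_blocks(alpha, bs=2)
```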

51 Binary Shape Coding: Block-based CAE (cont.)
Coding of boundary blocks using CAE:
- Intra-shape coding
  - Context defined by 10-pixel template
- Inter-shape coding
  - MC-shape using shape motion vector
  - Context defined by 9-pixel template from current and previous frames
[Figure: Previous Frame, Current Frame]

52 Sprite Coding (Background Prediction)
Sprite: large background image
- Hypothesis: same background exists for many frames, with changes resulting from camera motion and occlusions
One possible coding strategy:
1. Code & transmit entire sprite once
2. Only transmit camera motion parameters for each subsequent frame
Significant coding gain for some scenes
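The coding strategy above can be sketched for the simplest case, where the camera motion is a pure pan, so the per-frame parameters reduce to an (x, y) offset into the sprite. That restriction, and the helper names, are assumptions made for this example; general camera motion needs a richer parametric model.

```python
def frame_from_sprite(sprite, offset_x, offset_y, width, height):
    """Cut the visible background window out of the sprite for one frame."""
    return [row[offset_x:offset_x + width]
            for row in sprite[offset_y:offset_y + height]]


def composite(background, foreground, mask):
    """Overlay foreground object pixels (mask == 1) on the background."""
    return [[f if m else b for b, f, m in zip(brow, frow, mrow)]
            for brow, frow, mrow in zip(background, foreground, mask)]


# The sprite is transmitted once; each later frame needs only its offset.
sprite = [[10 * y + x for x in range(6)] for y in range(3)]
background = frame_from_sprite(sprite, offset_x=2, offset_y=1, width=3, height=2)

foreground = [[0, 99, 0], [0, 0, 0]]
mask = [[0, 1, 0], [0, 0, 0]]
frame = composite(background, foreground, mask)
```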

53 Sprite Coding Example
[Figure: Sprite (background) + Foreground Object → Reconstructed Frame] [MPEG Committee]

54 Related MPEG Standards (non-compression)
MPEG-7 “Multimedia Content Description Interface”
- Goal: A method for describing multimedia content to enable efficient searching and management of multimedia.
MPEG-21 “Multimedia Framework”
- Goal: To enable the electronic commerce of digital media content.

55 References and Further Reading
General Video Compression References:
- J.G. Apostolopoulos and S.J. Wee, "Video Compression Standards", Wiley Encyclopedia of Electrical and Electronics Engineering, John Wiley & Sons, Inc., New York, 1999.
- V. Bhaskaran and K. Konstantinides, Image and Video Compression Standards: Algorithms and Architectures, Boston, Massachusetts: Kluwer Academic Publishers, 1997.
- J.L. Mitchell, W.B. Pennebaker, C.E. Fogg, and D.J. LeGall, MPEG Video Compression Standard, New York: Chapman & Hall, 1997.
- B.G. Haskell, A. Puri, A.N. Netravali, Digital Video: An Introduction to MPEG-2, Kluwer Academic Publishers, Boston, 1997.
- MPEG web site:

56 References and Further Reading (cont.)
Video Compression Standards Documents:
- Video codec for audiovisual services at px64 kbits/s, ITU-T Recommendation H.261, International Telecommunication Union, 1990.
- Video coding for low bit rate communication, ITU-T Recommendation H.263, International Telecommunication Union, version 1, 1996; version 2, 1997.
- ISO/IEC 11172, Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbits/s. International Organization for Standardization (ISO), 1993.
- ISO/IEC Generic coding of moving pictures and associated audio information. International Organization for Standardization (ISO), 1996.
- ISO/IEC Coding of audio-visual objects. International Organization for Standardization (ISO), 1999.

