Video Concepts and Techniques


1 Video Concepts and Techniques
Wen-Shyang Hwang KUAS EE.

2 Outline
Fundamental Concepts
Basic Video Compression Techniques
MPEG Video Coding I – MPEG-1 and 2
MPEG Video Coding II – MPEG-4, 7, and Beyond

3 Types of Video Signals
Three types: Component Video, Composite Video, and S-Video.
Component Video – 3 signals: uses three separate video signals for the red, green, and blue image planes. Most computer systems use it. It gives the best color reproduction, since there is no crosstalk between channels; however, it requires more bandwidth and good synchronization.
Composite Video – 1 signal: the chrominance and luminance signals are mixed into a single carrier. Chrominance is a composition of two color components (I and Q, or U and V). A color subcarrier puts the chrominance at the high-frequency end of the signal shared with the luminance, so there is some interference between the luminance and chrominance signals.
S-Video – 2 signals: uses two wires, one for luminance and one for the composite chrominance signal, so there is less crosstalk between them.

4 Analog Video
Interlaced scanning: the odd-numbered lines are traced first, then the even-numbered lines.
Horizontal retrace: the jump from Q to R, during which the electron beam in the CRT is blanked.
Vertical retrace: the jump from T to U or from V to P.
NTSC (National Television System Committee): the TV standard used in North America and Japan; 4:3 aspect ratio (ratio of picture width to height); 525 scan lines per frame at 30 frames per second (fps).

5 Digital Video
Advantages:
stored in memory, ready to be processed (noise removal, cut and paste) and integrated into various multimedia applications
repeated recording does not degrade image quality
ease of encryption and better tolerance to channel noise
Chroma Subsampling
Humans see color with much less spatial resolution than black/white, so how many pixel values actually need to be sent?
Scheme 4:4:4: no chroma subsampling is used; each pixel's Y, Cb, and Cr values are sent.
Scheme 4:2:2: horizontal subsampling of the Cb and Cr signals by a factor of 2; all Y values are sent, but only every second Cb and Cr sample.
Scheme 4:1:1: subsamples Cb and Cr horizontally by a factor of 4.
Scheme 4:2:0: subsamples Cb and Cr in both the horizontal and vertical dimensions by a factor of 2 (used in JPEG and MPEG).
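As a small illustration (not part of the original slides), here is a Python/NumPy sketch of the four subsampling schemes using plain decimation; real codecs typically low-pass filter the chroma before subsampling, and the function and plane names are illustrative.

```python
import numpy as np

def chroma_subsample(Y, Cb, Cr, scheme='4:2:0'):
    """Illustrative chroma subsampling by decimation. Y is always kept
    at full resolution; only the chroma planes are reduced."""
    if scheme == '4:4:4':
        return Y, Cb, Cr                       # no chroma subsampling
    if scheme == '4:2:2':
        return Y, Cb[:, ::2], Cr[:, ::2]       # keep every 2nd chroma column
    if scheme == '4:1:1':
        return Y, Cb[:, ::4], Cr[:, ::4]       # keep every 4th chroma column
    if scheme == '4:2:0':
        return Y, Cb[::2, ::2], Cr[::2, ::2]   # halve chroma in both dimensions
    raise ValueError(scheme)

# For a 720x480 frame, 4:2:0 keeps 720*480 Y samples but only 360*240 Cb
# and 360*240 Cr samples, i.e. half the raw data of 4:4:4.
```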

6 Video Compression
A video consists of a time-ordered sequence of frames, i.e., images.
Video compression uses predictive coding based on previous frames, exploiting temporal redundancy: consecutive frames in a video are similar, so we can subtract images in time order and code only the residual error.
Simply deriving a difference image (subtracting one frame from the other) is ineffective when there is object motion.
Steps of video compression based on Motion Compensation (MC):
Motion Estimation (motion vector search)
MC-based Prediction
Derivation of the prediction error, i.e., the difference

7 Motion Compensation
For efficiency, each image is divided into macroblocks of size N x N. The current image frame is referred to as the Target frame.
A match is sought between each macroblock in the Target frame and the most similar macroblock in previous and/or future frame(s), referred to as Reference frame(s).
Motion vector (MV): the displacement of the reference macroblock relative to the target macroblock.
Prediction error: the difference between the two corresponding macroblocks.
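To make the search concrete, here is a minimal exhaustive block-matching sketch in Python/NumPy (an illustration, not taken from the slides); the frame arrays, the 16x16 block size, the SAD matching criterion, and the +/-8 search range are assumptions.

```python
import numpy as np

def motion_estimate(target, reference, x, y, N=16, search_range=8):
    """Exhaustive block matching: find the motion vector for the N x N target
    macroblock with top-left corner (x, y) by minimizing the sum of absolute
    differences (SAD) over a +/- search_range window in the reference frame."""
    target_mb = target[y:y+N, x:x+N].astype(np.int32)
    H, W = reference.shape
    best_sad, best_mv = None, (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ry, rx = y + dy, x + dx
            if ry < 0 or rx < 0 or ry + N > H or rx + N > W:
                continue  # candidate block falls outside the reference frame
            ref_mb = reference[ry:ry+N, rx:rx+N].astype(np.int32)
            sad = int(np.abs(target_mb - ref_mb).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dx, dy)
    return best_mv, best_sad

# The prediction error is then target_mb minus the best-matching reference
# macroblock; that difference is what gets transform-coded.
```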

8 Video Coding Evolution

9 H.261
An early digital video compression standard; its principle of MC-based compression is retained in all later video compression standards.
Designed for videophone, video conferencing, and other audiovisual services over ISDN.
The video codec supports bit-rates of p x 64 kbps, where p ranges from 1 to 30.
The delay of the video encoder must be less than 150 ms so that the video can be used for real-time bidirectional video conferencing.
H.261 Frame Sequence:

10 H.261 Frame Sequence
Two types of image frames are defined: Intra-frames (I-frames) and Inter-frames (P-frames).
I-frames are treated as independent images; a transform coding method similar to JPEG is applied within each I-frame, hence "Intra".
P-frames are not independent: they are coded by a forward predictive coding method (prediction from a previous P-frame is allowed, not just from a previous I-frame).
Temporal redundancy removal is included in P-frame coding, whereas I-frame coding performs only spatial redundancy removal.
The interval between pairs of I-frames is variable; an ordinary digital video usually has a couple of I-frames per second.

11 Intra-frame (I-frame) Coding
Macroblocks are of size 16x16 pixels for the Y frame, and 8x8 for the Cb and Cr frames, since 4:2:0 chroma subsampling is employed. A macroblock thus consists of four Y, one Cb, and one Cr 8x8 blocks.
For each 8x8 block a DCT is applied; the DCT coefficients then go through quantization, zigzag scan, and entropy coding.
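A minimal sketch of this pipeline in Python (SciPy's dctn for the 2-D DCT), stopping before entropy coding; the uniform step size of 16 and the level shift by 128 are illustrative assumptions, not values from the H.261 standard.

```python
import numpy as np
from scipy.fft import dctn

def zigzag_indices(n=8):
    """(row, col) index pairs of an n x n block in zigzag scan order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def code_intra_block(block, step=16):
    """Intra coding of one 8x8 block: level shift, 2-D DCT, uniform
    quantization with a single step size, then zigzag scan."""
    coeffs = dctn(block.astype(np.float64) - 128.0, norm='ortho')
    quantized = np.round(coeffs / step).astype(int)
    return [int(quantized[r, c]) for r, c in zigzag_indices()]

block = np.random.randint(0, 256, (8, 8))   # a dummy 8x8 pixel block
symbols = code_intra_block(block)           # DC coefficient first, then 63 ACs
# Run-length and variable-length (entropy) coding would follow.
```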

12 Inter-frame (P-frame) Predictive Coding
The H.261 P-frame coding scheme is based on motion compensation: for each macroblock in the Target frame, a motion vector is found by a search method.
After the prediction, a difference macroblock is derived to measure the prediction error; each of its 8x8 blocks goes through the DCT, quantization, zigzag scan, and entropy coding procedures.
Sometimes a good match cannot be found; the macroblock is then encoded as an intra macroblock.
The quantization in H.261 uses a constant step size for all DCT coefficients within a macroblock.
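Tying the two earlier sketches together, here is a hedged sketch of P-frame macroblock coding that reuses motion_estimate() and code_intra_block() from above; the SAD threshold that triggers the intra fallback is an illustrative assumption, not a value from the standard.

```python
INTRA_THRESHOLD = 16 * 16 * 24   # heuristic "no good match" cutoff (assumption)

def code_p_macroblock(target, reference, x, y, N=16):
    """Code one 16x16 macroblock of a P-frame: inter coding (MV plus DCT-coded
    residual) when a good match exists, otherwise intra coding.
    Luma only; chroma blocks are omitted for brevity."""
    (dx, dy), sad = motion_estimate(target, reference, x, y, N)
    if sad > INTRA_THRESHOLD:
        # No acceptable match: encode the macroblock as an intra macroblock.
        blocks = [code_intra_block(target[y+r:y+r+8, x+c:x+c+8])
                  for r in (0, 8) for c in (0, 8)]
        return ('INTRA', blocks)
    # Inter coding: send the motion vector plus the coded residual blocks.
    pred = reference[y+dy:y+dy+N, x+dx:x+dx+N].astype(np.int32)
    residual = target[y:y+N, x:x+N].astype(np.int32) - pred
    blocks = [code_intra_block(residual[r:r+8, c:c+8] + 128)  # +128 cancels the
              for r in (0, 8) for c in (0, 8)]                # helper's level shift
    return ('INTER', (dx, dy), blocks)
```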

13 H.261 encoder and decoder

14 A Glance at Syntax of H.261 Video Bitstream
A hierarchy of four layers: Picture, Group of Blocks (GOB), Macroblock, and Block.

15 Syntax of H.261
Picture layer: PSC (Picture Start Code) delineates the boundaries between pictures; TR (Temporal Reference) provides a time-stamp for the picture.
GOB layer: H.261 pictures are divided into regions of 11x3 macroblocks, each of which is called a Group of Blocks (GOB). If a network error causes a bit error or the loss of some bits, the H.261 video can be recovered and resynchronized at the next identifiable GOB.
Macroblock layer: each macroblock (MB) has its own Address indicating its position within the GOB, a Quantizer (MQuant), and six 8x8 image blocks (4 Y, 1 Cb, 1 Cr).
Block layer: for each 8x8 block, the bitstream starts with the DC value, followed by pairs of zero-run length (Run) and the subsequent non-zero value (Level) for the AC coefficients, and finally the End of Block (EOB) code.
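As a rough illustration of the block layer only (the symbol representation is illustrative; H.261 maps these symbols to variable-length codes, which is omitted here):

```python
def run_level_pairs(zigzag_coeffs):
    """Block-layer style symbols for one 8x8 block's zigzag-scanned
    coefficients: DC value first, then (Run, Level) pairs for the
    non-zero AC coefficients, then an end-of-block marker."""
    dc, acs = zigzag_coeffs[0], zigzag_coeffs[1:]
    symbols, run = [('DC', dc)], 0
    for level in acs:
        if level == 0:
            run += 1                      # zeros preceding the next non-zero AC
        else:
            symbols.append(('RUN-LEVEL', run, level))
            run = 0
    symbols.append(('EOB',))
    return symbols

# e.g. run_level_pairs(code_intra_block(block)) using the earlier sketch
```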

16 H.263
An improved video coding standard for video conferencing and other audiovisual services transmitted over Public Switched Telephone Networks (PSTN).
It aims at low bit-rate communications, at bit-rates of less than 64 kbps.
It uses predictive coding for inter-frames to reduce temporal redundancy, and transform coding of the remaining signal to reduce spatial redundancy (for both intra-frames and inter-frame prediction residuals).
A difference from H.261 is that GOBs in H.263 do not have a fixed size, and they always start and end at the left and right borders of the picture.

17 Optional H.263 Coding Modes
H.263 specifies many negotiable coding options (annexes):
Unrestricted motion vector mode
Syntax-based arithmetic coding mode
Advanced prediction mode
PB-frames mode: introduces a B-frame (predicted bidirectionally) to improve the quality of prediction. The PB-frames mode yields satisfactory results for videos with moderate motion; under large motions, PB-frames do not compress as well as B-frames.

18 MPEG
MPEG (Moving Pictures Experts Group) was established in 1988 for the development of digital video standards.
MPEG-1 adopts the CCIR601 digital TV format, also known as SIF (Source Input Format), and supports only non-interlaced video.
Normally, MPEG-1 picture resolution is 352x240 for NTSC video at 30 fps, or 352x288 for PAL video at 25 fps, and it uses 4:2:0 chroma subsampling.
The MPEG-1 standard has five parts: Systems, Video, Audio, Conformance, and Software.

19 Motion Compensation in MPEG-1
Motion Compensation (MC) based video encoding in H.261 works as follows:
In Motion Estimation (ME), each macroblock (MB) of the Target P-frame is assigned a best-matching MB from the previously coded I- or P-frame; this is the prediction.
Prediction error: the difference between the MB and its matching MB; it is sent to the DCT and its subsequent encoding steps.
The prediction comes from a previous frame: forward prediction.
An MB containing part of a ball in the Target frame may not find a good matching MB in the previous frame because half of the ball was occluded by another object; a match, however, can readily be obtained from the next frame.

20 Motion Compensation in MPEG-1 (Cont'd)
MPEG-1 introduces a third frame type, B-frames, and their accompanying bi-directional motion compensation.
Each MB from a B-frame will have up to two motion vectors (MVs), one from the forward and one from the backward prediction.
If matching in both directions is successful, the two MVs are sent and the two corresponding matching MBs are averaged (indicated by '%' in the figure) before being compared to the Target MB to generate the prediction error.
If an acceptable match can be found in only one of the reference frames, then only one MV and its corresponding MB is used, from either the forward or the backward prediction.
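A hedged sketch of that decision logic, reusing the motion_estimate() helper from the earlier sketch; the "acceptable match" SAD threshold is an illustrative assumption rather than anything specified by MPEG-1.

```python
GOOD_MATCH = 16 * 16 * 24   # illustrative SAD threshold for an acceptable match

def predict_b_macroblock(target, prev_ref, next_ref, x, y, N=16):
    """B-frame prediction for one macroblock: average the forward and
    backward matches when both succeed, otherwise use whichever is better."""
    (fdx, fdy), fsad = motion_estimate(target, prev_ref, x, y, N)
    (bdx, bdy), bsad = motion_estimate(target, next_ref, x, y, N)
    fwd = prev_ref[y+fdy:y+fdy+N, x+fdx:x+fdx+N].astype(np.int32)
    bwd = next_ref[y+bdy:y+bdy+N, x+bdx:x+bdx+N].astype(np.int32)
    if fsad <= GOOD_MATCH and bsad <= GOOD_MATCH:
        prediction = (fwd + bwd) // 2                 # average the two matches
        mvs = [('fwd', fdx, fdy), ('bwd', bdx, bdy)]  # two MVs are sent
    elif fsad <= bsad:
        prediction, mvs = fwd, [('fwd', fdx, fdy)]    # forward prediction only
    else:
        prediction, mvs = bwd, [('bwd', bdx, bdy)]    # backward prediction only
    residual = target[y:y+N, x:x+N].astype(np.int32) - prediction
    return mvs, residual    # the residual then goes to DCT-based coding
```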

21 MPEG-1 B-frame Coding Based on Bidirectional Motion Compensation.
MPEG Frame Sequence

22 Other Major Differences from H.261
Instead of GOBs as in H.261, an MPEG-1 picture can be divided into one or more slices.
Slices may contain variable numbers of macroblocks in a single picture.
Slices may start and end anywhere as long as they fill the whole picture.
Each slice is coded independently, which gives flexibility in bit-rate control.
The slice concept is also important for error recovery.

23 Typical Sizes of MPEG-1 Frames
The size of compressed P-frames is significantly smaller than that of I-frames, and B-frames are smaller than P-frames (B-frames have the lowest priority).

24 MPEG-2
MPEG-2: for higher-quality video at a bit-rate of more than 4 Mbps.
It defines 7 profiles aimed at different applications: Simple, Main, SNR scalable, Spatially scalable, High, 4:2:2, and Multiview. Within each profile, up to 4 levels are defined.
The DVD video specification allows only 4 display resolutions: 720x480, 704x480, 352x480, and 352x240 (a restricted form of the MPEG-2 Main profile at the Main and Low levels).
Four Levels in the Main Profile of MPEG-2

25 Supporting Interlaced Video
MPEG-2 supports interlaced video for digital broadcast TV and HDTV. In interlaced video, each frame (picture) consists of two fields. If each field is treated as a separate picture, it is called a Field-picture.
Five modes of prediction are defined (the wide range of applications have varying requirements for the accuracy and speed of motion compensation):
Frame Prediction for Frame-pictures
Field Prediction for Field-pictures
Field Prediction for Frame-pictures
16x8 MC for Field-pictures
Dual-Prime for P-pictures

26 MPEG-2 Scalabilities
Layered coding: a base layer and one or more enhancement layers.
MPEG-2 supports the following scalabilities:
SNR Scalability: the enhancement layer provides higher SNR.
Spatial Scalability: the enhancement layer provides higher spatial resolution.
Temporal Scalability: the enhancement layer facilitates a higher frame rate.
Hybrid Scalability: a combination of any two of the above three scalabilities.
Data Partitioning: quantized DCT coefficients are split into partitions.
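To illustrate the layered-coding idea, here is a toy sketch of spatial scalability (not MPEG-2 syntax); the 2x decimation/replication resampling is an assumption chosen for simplicity.

```python
import numpy as np

def encode_spatial_layers(frame):
    """Base layer: a 2x-downsampled frame. Enhancement layer: the residual
    between the full-resolution frame and the upsampled base layer."""
    base = frame[::2, ::2]                                  # base layer
    up = np.repeat(np.repeat(base, 2, axis=0), 2, axis=1)   # crude upsampling
    up = up[:frame.shape[0], :frame.shape[1]]
    enhancement = frame.astype(np.int32) - up               # residual layer
    return base, enhancement

def decode_spatial_layers(base, enhancement=None):
    """A base-layer-only decoder returns the upsampled base; a full decoder
    adds the enhancement-layer residual back on top."""
    up = np.repeat(np.repeat(base, 2, axis=0), 2, axis=1)
    if enhancement is None:
        return up
    up = up[:enhancement.shape[0], :enhancement.shape[1]]
    return up.astype(np.int32) + enhancement
```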

27 MPEG-4
MPEG-4 adopts a new object-based coding approach (rather than frame-based compression coding).
Object-based coding achieves a higher compression ratio and is good for digital video composition, manipulation, indexing, and retrieval.
Its six parts are Systems, Video, Audio, Conformance, Software, and DMIF (Delivery Multimedia Integration Framework).
The bit-rate range for MPEG-4 video is now between 5 kbps and 10 Mbps.

28 Comparison of interactivities in MPEG standards
The MPEG-4 standard provides for:
Composing media objects to create desirable audiovisual scenes.
Multiplexing and synchronizing the bitstreams so that they can be transmitted with guaranteed Quality of Service (QoS).
Interacting with the audiovisual scene at the receiving end (MPEG-4 provides a toolbox of advanced coding modules and algorithms for audio and video compression).
Reference models in MPEG-1 and 2 (the interaction shown in dashed lines is supported only by MPEG-2)
MPEG-4 reference model

29 Hierarchical structure of MPEG-4 visual bitstreams
Video-object Sequence (VS): delivers the complete MPEG-4 visual scene, which may contain 2-D or 3-D natural or synthetic objects.
Video Object (VO): a particular object in the scene, which can be of arbitrary (non-rectangular) shape corresponding to an object or background of the scene.
Video Object Layer (VOL): facilitates a way to support (multi-layered) scalable coding. A VO can have multiple VOLs under scalable coding, or a single VOL under non-scalable coding.
Group of Video Object Planes (GOV): groups Video Object Planes together (an optional level).
Video Object Plane (VOP): a snapshot of a VO at a particular moment.
Each VS has one or more VOs, each VO has one or more VOLs, and so on.
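For clarity, a toy data-structure sketch of this hierarchy in Python (class and field names are illustrative, not the standard's bitstream syntax):

```python
from dataclasses import dataclass, field
from typing import List, Optional
import numpy as np

@dataclass
class VideoObjectPlane:                # VOP: snapshot of a VO at one instant
    timestamp: float
    coding_type: str                   # 'I-VOP', 'P-VOP', or 'B-VOP'
    shape_mask: Optional[np.ndarray] = None   # arbitrary (non-rectangular) shape
    texture: Optional[np.ndarray] = None

@dataclass
class GroupOfVOPs:                     # GOV: optional grouping of VOPs
    vops: List[VideoObjectPlane] = field(default_factory=list)

@dataclass
class VideoObjectLayer:                # VOL: one layer of (scalable) coding
    scalable: bool = False
    govs: List[GroupOfVOPs] = field(default_factory=list)

@dataclass
class VideoObject:                     # VO: one object or background in the scene
    vols: List[VideoObjectLayer] = field(default_factory=list)

@dataclass
class VideoObjectSequence:             # VS: the complete visual scene
    vos: List[VideoObject] = field(default_factory=list)
```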

30 VOP-based Coding
MPEG-1 and -2 do not support the VOP concept, and hence their coding method is referred to as frame-based (block-based) coding.
MPEG-4 VOP-based coding also employs the motion compensation technique:
An intra-frame coded VOP is called an I-VOP.
Inter-frame coded VOPs are called P-VOPs (forward prediction) or B-VOPs (bi-directional prediction).
(a) A video sequence; (b) MPEG-1 and 2 block-based coding; (c) two potential matches in MPEG-1 and 2; (d) object-based coding in MPEG-4.

31 ISO MPEG-4 Part 10 / ITU-T H.264
Offers up to 50% better compression than MPEG-2, and up to 30% better than H.263+ and the MPEG-4 Advanced Simple Profile.
A leading candidate for carrying High Definition TV (HDTV) video content in many potential applications.
Core features: entropy decoding, motion compensation (P-prediction), intra-prediction (I-prediction), transform, scan, quantization, and in-loop deblocking filters.
Baseline profile features: arbitrary slice order (ASO), flexible macroblock order (FMO), redundant slices.
Main profile features: B slices, context-adaptive binary arithmetic coding (CABAC), weighted prediction.
Extended profile features: B slices, weighted prediction, slice data partitioning, SP and SI slice types.

32 MPEG-7
Serves the need for audiovisual content-based retrieval (or audiovisual object retrieval) in applications such as digital libraries.
Its formal name is Multimedia Content Description Interface.

33 MPEG-7 and Multimedia Content Description
MPEG-7 has developed Descriptors (D), Description Schemes (DS), and a Description Definition Language (DDL). The following are some of the important terms:
Feature: a characteristic of the data.
Description: a set of instantiated Ds and DSs that describes the structural and conceptual information of the content, the storage and usage of the content, etc.
D: the definition (syntax and semantics) of a feature.
DS: the specification of the structure and relationships between Ds and between DSs.
DDL: syntactic rules to express and combine DSs and Ds.
The scope of MPEG-7 is to standardize the Ds, DSs, and DDL for descriptions; the mechanisms and processes of producing and consuming the descriptions are beyond the scope of MPEG-7.

