Presentation on theme: "Video Concepts and Techniques"— Presentation transcript:
1 Video Concepts and Techniques
Wen-Shyang Hwang, KUAS EE.
2 Outline
Fundamental Concepts
Basic Video Compression Techniques
MPEG Video Coding I – MPEG-1 and 2
MPEG Video Coding II – MPEG-4, 7, and Beyond
3 Types of Video Signals
Three types: Component Video, Composite Video, and S-Video.
Component Video (3 signals)
- Uses three separate video signals for the red, green, and blue image planes.
- Most computer systems use it.
- Gives the best color reproduction, since there is no crosstalk between channels.
- However, it requires more bandwidth and good synchronization.
Composite Video (1 signal)
- The chrominance and luminance signals are mixed into a single carrier.
- Chrominance is a composition of two color components (I and Q, or U and V).
- A color subcarrier puts chrominance at the high-frequency end of the signal, shared with the luminance signal.
- Some interference between the luminance and chrominance signals is inevitable.
S-Video (2 signals)
- Uses two wires, one for luminance and one for the composite chrominance signal.
- Less crosstalk between the two.
4 Analog Video
Interlaced scanning: odd-numbered lines are traced first, then the even-numbered lines.
- Horizontal retrace: the jump from Q to R, during which the electron beam in the CRT is blanked.
- Vertical retrace: the jump from T to U, or from V to P.
NTSC (National Television System Committee) TV standard
- Used in North America and Japan.
- 4:3 aspect ratio (ratio of picture width to height).
- 525 scan lines per frame at 30 frames per second (fps).
5 Digital Video
Advantages:
- Stored in memory, ready to be processed (noise removal, cut and paste) and integrated into various multimedia applications.
- Repeated recording does not degrade image quality.
- Ease of encryption and better tolerance to channel noise.
Chroma Subsampling
- Humans perceive color with much less spatial resolution than brightness (black/white), so how many pixel values actually need to be sent?
- Scheme 4:4:4: no chroma subsampling is used; each pixel's Y, Cb, and Cr values are sent.
- Scheme 4:2:2: horizontal subsampling of the Cb and Cr signals by a factor of 2; all Y values are sent, and every second Cb and Cr is sent.
- Scheme 4:1:1: subsamples Cb and Cr horizontally by a factor of 4.
- Scheme 4:2:0: subsamples Cb and Cr in both the horizontal and vertical dimensions by a factor of 2 (used in JPEG and MPEG).
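As an illustration of the 4:2:0 scheme, chroma reduction can be sketched as averaging each 2x2 neighbourhood of a chroma plane. This is a minimal sketch: the function name is made up, and real codecs may site and filter the chroma samples differently.

```python
# Illustrative 4:2:0 chroma subsampling: the Y plane is kept at full
# resolution, while each chroma plane is reduced by averaging 2x2
# neighbourhoods, halving both of its dimensions.

def subsample_420(plane):
    """Average each 2x2 block of a chroma plane (a list of row lists)."""
    h, w = len(plane), len(plane[0])
    return [
        [
            (plane[y][x] + plane[y][x + 1] +
             plane[y + 1][x] + plane[y + 1][x + 1]) // 4
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]

cb = [[100, 102, 98, 96],
      [101, 103, 99, 97],
      [50, 52, 48, 46],
      [51, 53, 49, 47]]
cb_sub = subsample_420(cb)   # a 4x4 plane becomes 2x2
```

With both chroma planes reduced to a quarter of their original size, 4:2:0 transmits 1.5 samples per pixel instead of 3, halving the raw data rate before any compression.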
6 Video Compression
A video consists of a time-ordered sequence of frames, i.e., images.
Video compression uses predictive coding based on previous frames:
- Temporal redundancy: consecutive frames in a video are similar.
- Subtract images in time order, and code the residual error.
- Simply deriving the difference image (subtracting one image from the other) is ineffective because of object motion.
Steps of video compression based on Motion Compensation (MC):
1. Motion Estimation (motion vector search).
2. MC-based prediction.
3. Derivation of the prediction error, i.e., the difference.
7 Motion Compensation
For efficiency, each image is divided into macroblocks of size N x N.
- The current image frame is referred to as the Target frame.
- A match is sought between a macroblock in the Target frame and the most similar macroblock in previous and/or future frame(s), referred to as Reference frame(s).
- Motion vector (MV): the displacement from the reference macroblock to the target macroblock.
- Prediction error: the difference between the two corresponding macroblocks.
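The macroblock matching above can be sketched as an exhaustive (full) search that minimizes the Sum of Absolute Differences (SAD) over a small window. This is an illustrative sketch only: frames are plain lists of luminance values, and the function names, block size, and search range are assumptions, not anything from a standard.

```python
# Full-search motion estimation sketch: for a target macroblock at
# (tx, ty), try every displacement within +/-p pixels in the reference
# frame and keep the one with the smallest SAD.

def sad(ref, tgt, rx, ry, tx, ty, n):
    """SAD between the n x n block at (rx, ry) in ref and (tx, ty) in tgt."""
    return sum(
        abs(ref[ry + j][rx + i] - tgt[ty + j][tx + i])
        for j in range(n) for i in range(n)
    )

def motion_search(ref, tgt, tx, ty, n, p):
    """Return the (dx, dy) motion vector minimizing SAD over the window."""
    h, w = len(ref), len(ref[0])
    best_mv, best_sad = None, float("inf")
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            rx, ry = tx + dx, ty + dy
            if 0 <= rx <= w - n and 0 <= ry <= h - n:   # stay inside frame
                d = sad(ref, tgt, rx, ry, tx, ty, n)
                if d < best_sad:
                    best_mv, best_sad = (dx, dy), d
    return best_mv

ref = [[0, 0, 0, 0],
       [0, 9, 8, 0],
       [0, 7, 6, 0],
       [0, 0, 0, 0]]
tgt = [[0, 0, 0, 0],
       [9, 8, 0, 0],
       [7, 6, 0, 0],
       [0, 0, 0, 0]]
mv = motion_search(ref, tgt, tx=0, ty=1, n=2, p=1)
```

Here the 2x2 object moved one pixel to the left between the reference and the target, so the best match in the reference sits one pixel to the right of the target block and the search returns (1, 0) with zero prediction error.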
9 H.261
An early digital video compression standard; its principle of MC-based compression is retained in all later video compression standards.
- Designed for videophone, video conferencing, and other audiovisual services over ISDN.
- The video codec supports bit-rates of p × 64 kbps, where p ranges from 1 to 30.
- The delay of the video encoder must be less than 150 ms so that the video can be used for real-time bidirectional video conferencing.
H.261 Frame Sequence:
10 H.261 Frame Sequence
Two types of image frames are defined: Intra-frames (I-frames) and Inter-frames (P-frames).
- I-frames are treated as independent images. A transform coding method similar to JPEG is applied within each I-frame, hence "Intra".
- P-frames are not independent: they are coded by a forward predictive coding method (prediction from a previous P-frame is allowed, not just from a previous I-frame).
- Temporal redundancy removal is included in P-frame coding, whereas I-frame coding performs only spatial redundancy removal.
- The interval between pairs of I-frames is variable. An ordinary digital video usually has a couple of I-frames per second.
11 Intra-frame (I-frame) Coding
Macroblocks are of size 16x16 pixels for the Y frame and 8x8 for the Cb and Cr frames, since 4:2:0 chroma subsampling is employed. A macroblock consists of four Y, one Cb, and one Cr 8x8 blocks.
For each 8x8 block, a DCT transform is applied; the DCT coefficients then go through quantization, zigzag scan, and entropy coding.
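The zigzag scan mentioned above reads the 8x8 coefficients along anti-diagonals, so low-frequency coefficients come first and the high-frequency zeros cluster at the end. A compact way to generate that order (an illustrative sketch, not the standard's lookup table) is:

```python
# Generate the zigzag scan order for an n x n coefficient block.
# Coefficients on the same anti-diagonal share r + c; alternate
# diagonals are traversed in opposite directions.

def zigzag_order(n=8):
    """Return (row, col) index pairs in zigzag scan order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (
            rc[0] + rc[1],                              # which diagonal
            rc[0] if (rc[0] + rc[1]) % 2 else rc[1]     # direction on it
        )
    )
```

The DC coefficient (0, 0) comes first and the highest-frequency coefficient (7, 7) last; entropy coding then exploits the long runs of zeros at the tail of the scanned sequence.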
12 Inter-frame (P-frame) Predictive Coding
H.261 P-frame coding is based on motion compensation:
- For each macroblock in the Target frame, a motion vector is found by a search method.
- After the prediction, a difference macroblock is derived to measure the prediction error.
- Each of its 8x8 blocks goes through the DCT, quantization, zigzag scan, and entropy coding procedures.
- Sometimes a good match cannot be found; the macroblock is then encoded as an intra macroblock.
- The quantization in H.261 uses a constant step size for all DCT coefficients within a macroblock.
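The difference-macroblock and constant-step-size quantization steps can be sketched as below. This is a toy illustration under stated assumptions: names are made up, truncation toward zero stands in for the standard's rounding rules, and the real H.261 quantizer has extra cases (e.g. for intra DC coefficients).

```python
# Prediction error and uniform quantization, sketched on tiny 2x2 blocks.

def residual(target, prediction):
    """Element-wise prediction error between two equally sized blocks."""
    return [[t - p for t, p in zip(trow, prow)]
            for trow, prow in zip(target, prediction)]

def quantize(coeffs, step):
    """Uniform quantization: one step size for the whole block."""
    return [[int(c / step) for c in row] for row in coeffs]

err = residual([[10, 12], [8, 9]], [[9, 12], [6, 5]])   # [[1, 0], [2, 4]]
```

In a real encoder the residual would first pass through the DCT, and the quantized coefficients would then be zigzag-scanned and entropy coded.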
14 A Glance at the Syntax of the H.261 Video Bitstream
A hierarchy of four layers: Picture, Group of Blocks (GOB), Macroblock, and Block.
15 Syntax of H.261
Picture layer: PSC (Picture Start Code) delineates the boundaries between pictures; TR (Temporal Reference) provides a time-stamp for the picture.
GOB layer: H.261 pictures are divided into regions of 11x3 macroblocks, each of which is called a Group of Blocks (GOB). If a network error causes a bit error or the loss of some bits, the H.261 video can be recovered and resynchronized at the next identifiable GOB.
Macroblock layer: each macroblock (MB) has its own Address indicating its position within the GOB, a Quantizer (MQuant), and six 8x8 image blocks (4 Y, 1 Cb, 1 Cr).
Block layer: for each 8x8 block, the bitstream starts with the DC value, followed by pairs of the length of a zero-run (Run) and the subsequent non-zero value (Level) for the ACs, and finally the End of Block (EOB) code.
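The Block-layer (Run, Level) coding just described can be sketched as follows. The input is assumed to be an already zigzag-scanned AC coefficient sequence, and the string "EOB" stands in for the real variable-length End of Block code.

```python
# (Run, Level) coding of AC coefficients: each nonzero value is sent as
# the count of zeros preceding it plus the value itself; trailing zeros
# are implied by the EOB marker and never transmitted.

def run_level_encode(ac_coeffs):
    """Encode AC coefficients as (run, level) pairs followed by EOB."""
    pairs, run = [], 0
    for c in ac_coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    pairs.append("EOB")
    return pairs
```

For example, the scanned sequence 5, 0, 0, -2, 0, 1, 0, 0, 0 becomes (0, 5), (2, -2), (1, 1), EOB, which is why the zigzag scan's zero-clustering matters.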
16 H.263
An improved video coding standard for video conferencing and other audiovisual services transmitted over Public Switched Telephone Networks (PSTN).
- Aims at low bit-rate communications, at bit-rates of less than 64 kbps.
- Uses predictive coding for inter-frames to reduce temporal redundancy, and transform coding for the remaining signal to reduce spatial redundancy (for both intra-frames and inter-frame prediction).
- A difference from H.261 is that GOBs in H.263 do not have a fixed size, and they always start and end at the left and right borders of the picture.
17 Optional H.263 Coding Modes
H.263 specifies many negotiable coding options in its annexes:
- Unrestricted motion vector mode
- Syntax-based arithmetic coding mode
- Advanced prediction mode
- PB-frames mode
PB-frames mode:
- Introduces a B-frame (predicted bidirectionally) to improve the quality of prediction.
- Yields satisfactory results for videos with moderate motion.
- Under large motion, PB-frames do not compress as well as B-frames.
18 MPEG
MPEG (Moving Pictures Experts Group) was established in 1988 for the development of digital video standards.
MPEG-1 adopts the SIF (Source Input Format) derived from the CCIR 601 digital TV format, and supports only non-interlaced video.
Normally, the MPEG-1 picture resolution is:
- 352x240 for NTSC video at 30 fps, or
- 352x288 for PAL video at 25 fps.
It uses 4:2:0 chroma subsampling.
The MPEG-1 standard has 5 parts: Systems, Video, Audio, Conformance, and Software.
19 Motion Compensation in MPEG-1
Motion Compensation (MC) based video encoding in H.261 works as follows:
- In Motion Estimation (ME), each macroblock (MB) of the Target P-frame is assigned a best-matching MB from the previously coded I- or P-frame; this is the prediction.
- Prediction error: the difference between the MB and its matching MB, sent to the DCT and the subsequent encoding steps.
- Since the prediction is from a previous frame, it is called forward prediction.
An MB containing part of a ball in the Target frame may find no good matching MB in the previous frame, because half of the ball was occluded by another object. A match, however, can readily be obtained from the next frame.
20 Motion Compensation in MPEG-1 (Cont'd)
MPEG introduces a third frame type, B-frames, and the accompanying bi-directional motion compensation.
- Each MB from a B-frame has up to two motion vectors (MVs), one from the forward and one from the backward prediction.
- If matching in both directions is successful, two MVs are sent, and the two corresponding matching MBs are averaged (indicated by `%' in the figure) before comparison with the Target MB for generating the prediction error.
- If an acceptable match can be found in only one of the reference frames, only one MV and its corresponding MB are used, from either the forward or the backward prediction.
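The averaging step for a bidirectionally matched macroblock can be sketched as below; the names are illustrative, and simple integer averaging stands in for the standard's exact rounding rule.

```python
# B-frame bidirectional prediction sketch: the forward and backward
# matching macroblocks are averaged sample by sample (the `%' in the
# figure), and the result is what the Target MB is compared against.

def bidirectional_predict(fwd_mb, bwd_mb):
    """Average corresponding samples of the forward and backward matches."""
    return [
        [(f + b) // 2 for f, b in zip(frow, brow)]
        for frow, brow in zip(fwd_mb, bwd_mb)
    ]

fwd = [[100, 104], [96, 98]]
bwd = [[102, 106], [98, 100]]
pred = bidirectional_predict(fwd, bwd)
```

The prediction error is then the difference between the Target MB and `pred`, which is typically smaller than the error against either reference alone.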
21 MPEG-1 B-frame Coding Based on Bidirectional Motion Compensation
MPEG Frame Sequence
22 Other Major Differences from H.261
Instead of GOBs as in H.261, an MPEG-1 picture can be divided into one or more slices:
- A slice may contain a variable number of macroblocks in a single picture.
- Slices may start and end anywhere, as long as they fill the whole picture.
- Each slice is coded independently (flexibility in bit-rate control).
- The slice concept is important for error recovery.
23 Typical Sizes of MPEG-1 Frames
The size of compressed P-frames is significantly smaller than that of I-frames.
B-frames are smaller than P-frames (B-frames have the lowest priority).
24 MPEG-2
MPEG-2 targets higher-quality video at bit-rates of more than 4 Mbps.
It defines 7 profiles aimed at different applications: Simple, Main, SNR scalable, Spatially scalable, High, 4:2:2, and Multiview.
Within each profile, up to 4 levels are defined.
The DVD video specification allows only 4 display resolutions: 720x480, 704x480, 352x480, and 352x240 (a restricted form of the MPEG-2 Main profile at the Main and Low levels).
Four Levels in the Main Profile of MPEG-2
25 Supporting Interlaced Video
MPEG-2 supports interlaced video for digital broadcast TV and HDTV.
In interlaced video, each frame (picture) consists of two fields. If each field is treated as a separate picture, it is called a Field-picture.
Five modes of prediction (the requirements of different applications for the accuracy and speed of motion compensation vary widely):
- Frame Prediction for Frame-pictures
- Field Prediction for Field-pictures
- Field Prediction for Frame-pictures
- 16x8 MC for Field-pictures
- Dual-Prime for P-pictures
26 MPEG-2 Scalabilities
Layered coding: a base layer and one or more enhancement layers.
MPEG-2 supports the following scalabilities:
- SNR Scalability: the enhancement layer provides higher SNR.
- Spatial Scalability: the enhancement layer provides higher spatial resolution.
- Temporal Scalability: the enhancement layer facilitates a higher frame rate.
- Hybrid Scalability: a combination of any two of the above three scalabilities.
- Data Partitioning: quantized DCT coefficients are split into partitions.
27 MPEG-4
MPEG-4 adopts a new object-based coding approach (not frame-based compression coding).
Object-based coding achieves a higher compression ratio and is well suited to digital video composition, manipulation, indexing, and retrieval.
Its 6 parts are Systems, Video, Audio, Conformance, Software, and DMIF (Delivery Multimedia Integration Framework).
The bit-rate range for MPEG-4 video is now from 5 kbps to 10 Mbps.
28 Comparison of Interactivities in MPEG Standards
The MPEG-4 standard provides for:
- Composing media objects to create desirable audiovisual scenes.
- Multiplexing and synchronizing the bitstreams so that they can be transmitted with guaranteed Quality of Service (QoS).
- Interacting with the audiovisual scene at the receiving end (it provides a toolbox of advanced coding modules and algorithms for audio and video compression).
Reference models in MPEG-1 and 2 (interaction in dashed lines is supported only by MPEG-2)
MPEG-4 reference model
29 Hierarchical Structure of MPEG-4 Visual Bitstreams
- Video-object Sequence (VS): delivers the complete MPEG-4 visual scene, which may contain 2-D or 3-D, natural or synthetic objects.
- Video Object (VO): a particular object in the scene, which can be of arbitrary (non-rectangular) shape, corresponding to an object or the background of the scene.
- Video Object Layer (VOL): facilitates a way to support (multi-layered) scalable coding. A VO can have multiple VOLs under scalable coding, or a single VOL under non-scalable coding.
- Group of Video Object Planes (GOV): groups Video Object Planes together (an optional level).
- Video Object Plane (VOP): a snapshot of a VO at a particular moment.
Each VS has one or more VOs, each VO has one or more VOLs, and so on.
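The five-level hierarchy can be pictured as nested containers. The following data model is purely illustrative; it sketches the containment relationships, not the normative bitstream syntax.

```python
# Illustrative containment model of the MPEG-4 visual bitstream levels:
# VS contains VOs, a VO contains VOLs, a VOL contains (optional) GOVs,
# and a GOV contains VOPs (snapshots of the object in time).

from dataclasses import dataclass, field
from typing import List

@dataclass
class VOP:                      # snapshot of a VO at one moment
    time: float

@dataclass
class GOV:                      # optional grouping of VOPs
    vops: List[VOP] = field(default_factory=list)

@dataclass
class VOL:                      # one scalability layer of a VO
    govs: List[GOV] = field(default_factory=list)

@dataclass
class VO:                       # an arbitrarily shaped scene object
    vols: List[VOL] = field(default_factory=list)

@dataclass
class VS:                       # the complete visual scene
    vos: List[VO] = field(default_factory=list)

# A minimal scene: one object, one non-scalable layer, one snapshot.
scene = VS(vos=[VO(vols=[VOL(govs=[GOV(vops=[VOP(time=0.0)])])])])
```

Under scalable coding the same VO would simply carry several VOL entries, one per layer.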
30 VOP-based Coding
MPEG-1 and -2 do not support the VOP concept, hence their coding method is referred to as frame-based (block-based) coding.
MPEG-4 VOP-based coding employs the motion compensation technique:
- An intra-frame coded VOP is called an I-VOP.
- Inter-frame coded VOPs are called P-VOPs (forward prediction) or B-VOPs (bi-directional prediction).
(a) A video sequence; (b) MPEG-1 and 2 block-based coding; (c) two potential matches in MPEG-1 and 2; (d) object-based coding in MPEG-4.
31 ISO MPEG-4 Part 10 / ITU-T H.264
Offers up to 50% better compression than MPEG-2, and up to 30% better than H.263+ and the MPEG-4 Advanced Simple profile.
A leading candidate to carry High Definition TV (HDTV) video content in many potential applications.
Core features: entropy decoding, motion compensation (P-prediction), intra-prediction (I-prediction), transform, scan, quantization, and in-loop deblocking filters.
Baseline profile features: arbitrary slice order (ASO), flexible macroblock order (FMO), redundant slices.
Main profile features: B slices, context-adaptive binary arithmetic coding (CABAC), weighted prediction.
Extended profile features: B slices, weighted prediction, slice data partitioning, SP and SI slice types.
32 MPEG-7
Serves the need for audiovisual content-based retrieval (or audiovisual object retrieval) in applications such as digital libraries.
Its formal name is the Multimedia Content Description Interface.
33 MPEG-7 and Multimedia Content Description
MPEG-7 has developed Descriptors (D), Description Schemes (DS), and a Description Definition Language (DDL). Some of the important terms:
- Feature: a characteristic of the data.
- Description: a set of instantiated Ds and DSs that describes the structural and conceptual information of the content, the storage and usage of the content, etc.
- D: the definition (syntax and semantics) of a feature.
- DS: the specification of the structure and relationships between Ds and between DSs.
- DDL: syntactic rules to express and combine DSs and Ds.
The scope of MPEG-7 is to standardize the Ds, DSs, and DDL for descriptions. The mechanisms and processes for producing and consuming descriptions are beyond the scope of MPEG-7.