Scalable Video Coding: Scalable Extension of H.264/AVC


Scalable Video Coding
• Video streaming over the Internet is gaining popularity, driven by video conferencing and video telephony applications.
• The heterogeneous, dynamic, best-effort nature of the Internet motivates a scalability feature that adapts video streams to fluctuations in the available bandwidth.
• Goal: optimize video quality over a wide range of bit rates.
• A video bit stream is called scalable if parts of the stream can be removed in such a way that the resulting sub-stream is still decodable.
• Scalability here implies:
  - a single encoding
  - multiple possibilities to transmit and decode the bit stream

Scalable Video Coding
A video bit stream is called scalable if parts of the stream can be removed in such a way that the resulting bit stream is still decodable. This lets a single stream adapt to the varying needs of end users, to different terminal capabilities, and to changing network conditions.

SVC Standardization

SVC Principle: one encoding

SVC Principle: multiple decoding

H.264/AVC Simulcast vs. SVC
• Typical quality gains from SVC spatial scalability (as opposed to simulcast) are in the range of 0.5 dB to 1.5 dB PSNR,
  - or, equivalently, a 10% to 30% bit-rate reduction.
• The gap grows when there is more than one SNR layer per spatial layer.
[Figure: H.264/AVC simulcast of separate SD and HD streams vs. a single SVC stream carrying HD+SD]
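As a rough worked example of the quoted savings: simulcast sends each resolution as an independent stream, so the rates add up, while SVC is quoted as needing 10% to 30% less for the same qualities. The sketch below just applies that percentage range to the simulcast total; the function names and the example rates (1500 kbit/s SD, 6000 kbit/s HD) are hypothetical, not measurements.

```python
def simulcast_rate(layer_rates_kbps):
    """Simulcast sends every resolution as an independent stream, so rates add up."""
    return sum(layer_rates_kbps)


def svc_rate_range(layer_rates_kbps, min_saving_pct=10, max_saving_pct=30):
    """Estimated SVC rate range, applying the quoted 10-30% saving to the simulcast total.

    Integer percentages keep the arithmetic exact for round input rates.
    """
    total = simulcast_rate(layer_rates_kbps)
    return (total * (100 - max_saving_pct) / 100,
            total * (100 - min_saving_pct) / 100)


if __name__ == "__main__":
    rates = [1500, 6000]  # hypothetical SD and HD rates in kbit/s
    print(simulcast_rate(rates), svc_rate_range(rates))
```

With these numbers the simulcast total is 7500 kbit/s and the SVC estimate falls between 5250 and 6750 kbit/s.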

Functionalities and Applications
• SVC can reconstruct lower-resolution or lower-quality signals from partial bit streams.
• Partial decoding of the bit stream allows:
  - graceful degradation when part of the bit stream is lost
  - bit-rate adaptation
  - format adaptation
  - power adaptation
• Beneficial for transmission services with uncertainties regarding:
  - the resolution required at the terminal
  - channel conditions or device types

SVC Basics
• Straightforward extension of H.264/AVC with very limited added complexity.
• Layered approach:
  - one base layer
  - one or more enhancement layers
• The base layer is H.264/AVC compliant, so an H.264/AVC decoder can decode the base layer of an SVC stream.
• Enhancement layers enable temporal, spatial, or quality (SNR) scalability.

SVC Profiles
The SVC standard defines three profiles:
• Scalable Baseline profile
  - Targeted at conversational and surveillance applications.
  - Spatial scalable coding is restricted to ratios of 1.5 and 2 between successive spatial layers.
  - Interlaced video is not supported.
• Scalable High profile
  - Designed for broadcast, storage, and streaming applications.
  - Spatial scalable coding with arbitrary resolution ratios is supported.
  - Interlaced video is supported.
• Scalable High Intra profile
  - Designed for professional applications.
  - Contains only IDR pictures in all layers.
  - All other coding tools are the same as in the Scalable High profile.

Temporal Scalability (Dyadic prediction structure)
• Group of pictures (GOP)
• Key pictures: typically intra-coded
• Hierarchically predicted B pictures: motion-compensated prediction
[Figure: dyadic hierarchical prediction with a GOP of 8 pictures and GOP borders at the key pictures. Tx is the temporal layer identifier: decoding T0 only gives 3.75 fps, up to T1 gives 7.5 fps, up to T2 gives 15 fps, and up to T3 gives 30 fps. Structural delay = 7 frames.]
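The dyadic structure above can be captured by a simple rule: within a GOP of size 2^L, the picture at position p has temporal layer L minus the number of trailing zero bits of p, and the key picture (p = 0) is T0. The sketch below assumes display-order indices starting at a key picture and a power-of-two GOP size; `temporal_id` and `keep_up_to_layer` are hypothetical helper names, and dropping pictures this way only works because no picture references a higher temporal layer.

```python
def temporal_id(poc: int, gop_size: int) -> int:
    """Temporal layer of the picture at display index `poc` in a dyadic hierarchical GOP.

    Assumes gop_size is a power of two and poc 0 is a key picture.
    """
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    num_levels = gop_size.bit_length() - 1  # log2(gop_size)
    pos = poc % gop_size
    if pos == 0:
        return 0  # key picture at the GOP border
    # Trailing zero bits of pos: odd positions land in the highest layer.
    trailing_zeros = (pos & -pos).bit_length() - 1
    return num_levels - trailing_zeros


def keep_up_to_layer(pocs, gop_size: int, max_tid: int):
    """Sub-sequence obtained by discarding every picture with temporal_id > max_tid."""
    return [p for p in pocs if temporal_id(p, gop_size) <= max_tid]


if __name__ == "__main__":
    print([temporal_id(p, 8) for p in range(9)])  # layer of each picture in one GOP
    print(keep_up_to_layer(range(17), 8, 1))      # key pictures plus T1 pictures
```

For a GOP of 8 this yields the layer pattern T0 T3 T2 T3 T1 T3 T2 T3 T0, and keeping layers up to T1 leaves every fourth picture.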

Hierarchical B-pictures
[Figure: non-dyadic hierarchical prediction structure] This non-dyadic prediction structure provides two independently decodable sub-sequences with 1/9 and 1/3 of the full frame rate. Structural delay = 8 frames.
[Figure: prediction structure with zero structural delay] This structure provides zero structural delay, but lower coding efficiency than the examples above.
The chosen prediction structure does not need to be constant over time; it can be modified arbitrarily, e.g., to improve coding efficiency.
Figures courtesy of H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the Scalable Video Coding Extension of the H.264/AVC Standard,” IEEE Transactions on Circuits and Systems for Video Technology, Sept. 2007.

Group Of Pictures (GOP)
• IPP: GOP size 1
  - No temporal scalability (temporal level 0 only)
• IBP: GOP size 2
  - Temporal levels 0, 1
• GOP size 4
  - Temporal levels 0, 1, 2
• GOP size 8
  - Temporal levels 0, 1, 2, 3
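The pattern in this list generalizes: a dyadic GOP of size N = 2^k has k + 1 temporal levels, and decoding levels 0 through t keeps 2^t pictures per GOP, i.e. a frame rate of full_fps · 2^t / N. A minimal sketch, assuming power-of-two GOP sizes; the function names are hypothetical and `Fraction` is used only to keep rates like 3.75 fps exact.

```python
from fractions import Fraction


def temporal_levels(gop_size: int) -> int:
    """Number of temporal levels of a dyadic GOP: log2(gop_size) + 1."""
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    return gop_size.bit_length()


def frame_rate_up_to(level: int, gop_size: int, full_fps) -> Fraction:
    """Frame rate when temporal levels 0..level are decoded (2**level pictures per GOP)."""
    if not 0 <= level < temporal_levels(gop_size):
        raise ValueError("level out of range for this GOP size")
    return Fraction(full_fps) * 2**level / gop_size


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, temporal_levels(n))
    print([float(frame_rate_up_to(t, 8, 30)) for t in range(4)])
```

For GOP size 8 at 30 fps this reproduces the 3.75, 7.5, 15, and 30 fps operating points of the dyadic example.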

Coding efficiency of Hierarchical Prediction Structures
• Significant coding-efficiency improvement for applications that can tolerate high delay.
• The gain depends on how the QP is chosen for the different temporal layers.
• A larger GOP size gives a larger PSNR improvement.
• A smaller QP is used for lower temporal layers.
Figure courtesy of H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the Scalable Video Coding Extension of the H.264/AVC Standard,” IEEE Transactions on Circuits and Systems for Video Technology, Sept. 2007.
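One way to realize "smaller QP for lower layers" is a QP cascade that raises the QP with the temporal layer. The sketch assumes a hypothetical linear rule QP_T = QP_0 + step · T, clamped to the H.264/AVC 8-bit range 0..51; the offsets used for the cited figure may differ. `qstep` uses the common approximation that the quantizer step size is 0.625 at QP 0 and doubles every 6 QP, which is exact at multiples of 6.

```python
H264_MIN_QP, H264_MAX_QP = 0, 51  # QP range for 8-bit video in H.264/AVC


def cascaded_qp(base_qp: int, tid: int, step: int = 1) -> int:
    """QP for temporal layer `tid` under a hypothetical linear cascade QP_T = QP_0 + step*T."""
    return max(H264_MIN_QP, min(H264_MAX_QP, base_qp + step * tid))


def qstep(qp: int) -> float:
    """Approximate H.264/AVC quantizer step size: 0.625 at QP 0, doubling every 6 QP."""
    return 0.625 * 2 ** (qp / 6)


if __name__ == "__main__":
    for t in range(4):
        qp = cascaded_qp(26, t)
        print(f"T{t}: QP {qp}, step ~{qstep(qp):.2f}")
```

Key pictures (T0) get the finest quantization because every other picture in the GOP is predicted from them, directly or indirectly.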

Spatial Scalability
[Figure: spatial scalable encoder. (1) Sub-sample and encode to form the base layer; (2) decode and up-sample to the original resolution; (3) subtract the prediction from the original; (4) encode the residue to form the enhancement layer.]
The base layer contains a reduced-resolution version of each coded frame. Decoding the base layer alone produces a low-resolution output sequence; decoding the base layer together with the enhancement layer(s) produces a higher-resolution output.
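The four encoder steps in the diagram can be traced on a toy 1-D "frame". This is a minimal sketch under loud assumptions: pair averaging stands in for sub-sampling, sample repetition for up-sampling, and uniform quantization for "encode + decode" of the base layer; the enhancement residual is kept lossless for clarity. Real SVC uses normative up-sampling filters and chooses predictions per macroblock. All function names are hypothetical.

```python
def downsample2(x):
    """Halve resolution by averaging pairs of samples (assumes even length)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]


def upsample2(x):
    """Double resolution by sample repetition (nearest neighbour)."""
    out = []
    for v in x:
        out.extend([v, v])
    return out


def quantize(x, step):
    """Stand-in for lossy base-layer encode + decode: uniform scalar quantization."""
    return [step * round(v / step) for v in x]


def encode_two_layers(frame, base_step=8):
    base = quantize(downsample2(frame), base_step)          # (1) sub-sample and encode
    prediction = upsample2(base)                            # (2) decode and up-sample
    residual = [o - p for o, p in zip(frame, prediction)]   # (3) subtract prediction
    return base, residual                                   # (4) residual = enhancement layer


def decode_enhanced(base, residual):
    """Base + enhancement: up-sample the base layer and add the residual."""
    return [p + r for p, r in zip(upsample2(base), residual)]


if __name__ == "__main__":
    frame = [10, 12, 30, 34, 50, 50, 70, 66]
    base, residual = encode_two_layers(frame)
    print("base layer (low resolution):", base)
    print("enhancement residual:", residual)
    print("enhanced output:", decode_enhanced(base, residual))
```

Decoding `base` alone gives the half-resolution output; adding the residual restores the full-resolution frame in this lossless toy.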

Spatial Scalability
• The prediction signals are formed by:
  - motion-compensated prediction (MCP) inside the enhancement layer (temporal), suited to small motion and high spatial detail
  - up-sampling from the lower layer (spatial)
  - the average of the two predictions above (temporal + spatial)
• Inter-layer prediction comes in three kinds:
  - inter-layer motion prediction
  - inter-layer residual prediction
  - inter-layer intra prediction (when the co-located lower-layer MB is intra-coded)
• Base mode MB:
  - only the residual is transmitted, with no additional side information; the prediction data is inferred from the co-located lower-layer MB
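An encoder has to pick among the three prediction signals listed above for each block. The sketch below is a hypothetical decision rule that simply takes the candidate with the smallest sum of absolute differences (SAD); real encoders typically use a rate-distortion cost that also counts side-information bits. The averaging rounds like H.264/AVC bi-prediction, (a + b + 1) >> 1, on integer samples; the function names are assumptions.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for x, y in zip(a, b))


def choose_prediction(block, temporal_pred, spatial_pred):
    """Pick the prediction (temporal MCP, up-sampled lower layer, or their average) with min SAD."""
    candidates = {
        "temporal": temporal_pred,
        "spatial": spatial_pred,
        "average": [(t + s + 1) // 2 for t, s in zip(temporal_pred, spatial_pred)],
    }
    # Ties resolve to the first candidate in insertion order.
    mode = min(candidates, key=lambda m: sad(block, candidates[m]))
    return mode, candidates[mode]


if __name__ == "__main__":
    block = [5, 5, 6, 6]
    mode, pred = choose_prediction(block, temporal_pred=[0, 0, 2, 2], spatial_pred=[10, 10, 10, 10])
    print(mode, pred, "residual:", [b - p for b, p in zip(block, pred)])
```

Only the residual relative to the chosen prediction, plus the mode decision, would then need to be coded.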

Extended Spatial Scalability (ESS)
• ESS supports arbitrary resolution ratios and cropping between spatial layers.
• It is required in many applications that commonly mix display sizes from broadcasting, communications, and IT environments with different aspect ratios (e.g., 4:3 and 16:9).
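As a worked geometry example of mixed aspect ratios: if a 4:3 base layer shows a full-height, horizontally centred crop of a 16:9 enhancement layer, the crop width is enh_h · 4/3 and the inter-layer scale factor follows from it. The sketch assumes that particular cropping arrangement and hypothetical resolutions (640x480 base, 1280x720 enhancement); it computes only the geometry, not the bit-stream syntax that signals it.

```python
from fractions import Fraction


def centered_crop_for_base(enh_w, enh_h, base_w, base_h):
    """Enhancement-layer region covered by a base layer of a different aspect ratio.

    Assumes the base layer is a full-height, horizontally centred crop.
    Returns (x_offset, crop_width, horizontal_scale, vertical_scale).
    """
    crop_w = enh_h * Fraction(base_w, base_h)  # width with the base layer's aspect ratio
    if crop_w.denominator != 1 or crop_w > enh_w:
        raise ValueError("base aspect ratio does not fit as a full-height crop")
    crop_w = int(crop_w)
    x0 = (enh_w - crop_w) // 2
    return x0, crop_w, Fraction(crop_w, base_w), Fraction(enh_h, base_h)


if __name__ == "__main__":
    x0, crop_w, sx, sy = centered_crop_for_base(1280, 720, 640, 480)
    print(f"crop starts at x={x0}, width={crop_w}, scale {sx} x {sy}")
    print("crop offset is MB-aligned:", x0 % 16 == 0)
```

Here the crop starts at x = 160 with width 960, and the resolution ratio is 3/2 in both directions; the offset also happens to be a multiple of 16, i.e. MB-aligned.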

Quality / Fidelity / SNR Scalability
• Types:
  - Coarse-grain scalability (CGS)
  - Medium-grain scalability (MGS)
  - Fine-grain scalability (FGS): not supported by the SVC standard because of its very poor enhancement-layer coding efficiency
• Bit-rate adaptation at the same spatial/temporal resolution
• SVC supports up to 16 SNR layers for each spatial layer

Coarse-grain quality scalability (CGS)
• A special case of spatial scalability:
  - identical sizes (resolutions) for base and enhancement layers
  - smaller quantization step sizes for higher enhancement residual layers
• Designed for only a few selected bit-rate points:
  - number of supported bit-rate points = number of layers
• Switching can only occur at IDR access units
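The "smaller quantization step per layer" idea can be shown with scalar values: each layer re-quantizes the error left by the layers below it with a finer step, so every extra layer is one more bit-rate point with lower distortion. This is a toy sketch; it ignores transforms, motion compensation, and entropy coding, and the step sizes (16, 8, 4) and function names are hypothetical.

```python
def quantize(values, step):
    """Uniform scalar quantization (reconstruction values only)."""
    return [step * round(v / step) for v in values]


def cgs_layers(signal, steps=(16, 8, 4)):
    """Reconstruction after each quality layer; each layer refines the remaining error."""
    recon = [0] * len(signal)
    outputs = []
    for step in steps:
        residual = [s - r for s, r in zip(signal, recon)]
        refinement = quantize(residual, step)  # what this layer would transmit
        recon = [r + q for r, q in zip(recon, refinement)]
        outputs.append(list(recon))
    return outputs


def max_error(signal, recon):
    return max(abs(s - r) for s, r in zip(signal, recon))


if __name__ == "__main__":
    signal = [3, 17, 29, 45, 61]
    for n, recon in enumerate(cgs_layers(signal), start=1):
        print(f"layers 1..{n}: {recon}, max error {max_error(signal, recon)}")
```

With three steps there are exactly three operating points, and the maximum error never increases as layers are added.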

Medium-grain quality scalability (MGS)
• More enhancement layers are supported
• Refinement quality layers of the residual
• Key pictures for drift control
• Switching can occur at any access unit
• MGS = CGS + key pictures + refinement quality layers
• Drift control
  - Drift: the effect caused by unsynchronized motion-compensated prediction (MCP) loops at the encoder and decoder sides
  - Trade-off for MCP in quality-scalable SVC: coding efficiency vs. drift
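Why key pictures limit drift can be shown with a toy model: assume each frame predicts from the previous one, the encoder's prediction loop uses refinement data the decoder did not receive, and each dropped refinement adds a fixed mismatch that propagates. Key pictures are predicted only from base-quality key pictures, which the decoder has, so the mismatch resets there. The chain-of-predictions assumption, the mismatch values, and the function name are all hypothetical simplifications.

```python
def drift_trace(lost_refinement, key_interval=None):
    """Toy encoder/decoder mismatch per frame when MGS refinement data is dropped.

    lost_refinement[n]: hypothetical mismatch added at frame n. Frame 0 is an IDR
    picture; with key_interval set, every key_interval-th frame is a key picture.
    """
    mismatch = 0
    trace = []
    for n, loss in enumerate(lost_refinement):
        is_key = n == 0 or (key_interval is not None and n % key_interval == 0)
        if is_key:
            mismatch = 0      # prediction loops resynchronized
        else:
            mismatch += loss  # error propagates and accumulates through MCP
        trace.append(mismatch)
    return trace


if __name__ == "__main__":
    losses = [1] * 8
    print("no key pictures:   ", drift_trace(losses))
    print("key picture every 4:", drift_trace(losses, key_interval=4))
```

Without key pictures the mismatch grows until the next IDR picture; with them it is bounded by the key-picture interval, which is the trade-off against coding efficiency noted above.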

SVC Encoder
[Figure: SVC encoder structure, with labels "Dependency layer", "The same motion/prediction information", and "Temporal Decomposition"]

SVC: Combined Scalability
[Figure: spatio-temporal-quality cube]

Combined Scalability
[Figure: dependency and quality refinement layers of a scalable bitstream: dependency layers D = 0, 1, 2, each with quality layers Q = 0, 1, 2]

Combined Scalability
[Figure: example combining temporal layers T0, T1, T2, dependency layers D0, D1, and quality layers Q0, Q1]
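Combined scalability means an operating point is chosen along all three axes at once, by discarding NAL units according to their (T, D, Q) labels. The sketch uses a simplified extraction rule as an assumption: keep NAL units with temporal_id ≤ T that either belong to a lower dependency layer (all quality layers kept) or to the target dependency layer with quality_id ≤ Q. The normative extraction process is more detailed; for example, it can also drop lower-layer quality NAL units that are not needed for inter-layer prediction. `NalUnit` and `extract_operating_point` are hypothetical names.

```python
from typing import NamedTuple


class NalUnit(NamedTuple):
    temporal_id: int
    dependency_id: int
    quality_id: int
    payload: bytes = b""


def extract_operating_point(nal_units, max_t, target_d, max_q):
    """Simplified sub-bitstream extraction for operating point (T, D, Q)."""
    return [
        nal for nal in nal_units
        if nal.temporal_id <= max_t
        and (nal.dependency_id < target_d
             or (nal.dependency_id == target_d and nal.quality_id <= max_q))
    ]


if __name__ == "__main__":
    # One NAL unit per (T, D, Q) combination in the figure: T0..T2, D0..D1, Q0..Q1.
    stream = [NalUnit(t, d, q) for t in range(3) for d in range(2) for q in range(2)]
    sub = extract_operating_point(stream, max_t=1, target_d=1, max_q=0)
    print([(n.temporal_id, n.dependency_id, n.quality_id) for n in sub])
```

Each call yields a smaller stream that a decoder for that operating point can consume, without re-encoding.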

Bit-stream format
• Bit-stream switching
  - Inside a dependency layer: switching everywhere
  - Outside a dependency layer: switching up only at IDR access units; switching down everywhere if using multiple-loop decoding
[Figure: NAL unit header, NAL unit header extension, and NAL unit payload; the header extension carries the P, T, D, and Q fields]
• P (priority_id): indicates the importance of a NAL unit
• T (temporal_id): indicates the temporal level
• D (dependency_id): indicates the spatial/CGS layer
• Q (quality_id): indicates the MGS/FGS layer
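A sketch of reading the P, T, D, and Q fields, assuming the H.264/AVC Annex G layout: NAL unit types 14 (prefix) and 20 (coded slice extension) carry a 3-byte header extension after the 1-byte NAL header, with priority_id (6 bits), dependency_id (3 bits), quality_id (4 bits), and temporal_id (3 bits) among its fields. The dataclass and function names are hypothetical, and the parser does not handle emulation-prevention bytes or the MVC variant of the extension.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SvcNalHeader:
    nal_ref_idc: int
    nal_unit_type: int
    idr_flag: int
    priority_id: int  # P, 6 bits
    no_inter_layer_pred_flag: int
    dependency_id: int  # D, 3 bits
    quality_id: int  # Q, 4 bits
    temporal_id: int  # T, 3 bits
    use_ref_base_pic_flag: int
    discardable_flag: int
    output_flag: int


def parse_svc_nal_header(data: bytes) -> SvcNalHeader:
    """Parse the 1-byte NAL header plus the 3-byte SVC header extension."""
    if len(data) < 4:
        raise ValueError("need 1 NAL header byte + 3 extension bytes")
    b0, b1, b2, b3 = data[0], data[1], data[2], data[3]
    if b0 >> 7:
        raise ValueError("forbidden_zero_bit is set")
    nal_unit_type = b0 & 0x1F
    if nal_unit_type not in (14, 20):
        raise ValueError("not an SVC prefix (14) or slice-extension (20) NAL unit")
    if not (b1 >> 7) & 1:  # first extension bit, expected to be 1 for SVC
        raise ValueError("not an SVC header extension")
    return SvcNalHeader(
        nal_ref_idc=(b0 >> 5) & 0x3,
        nal_unit_type=nal_unit_type,
        idr_flag=(b1 >> 6) & 1,
        priority_id=b1 & 0x3F,
        no_inter_layer_pred_flag=(b2 >> 7) & 1,
        dependency_id=(b2 >> 4) & 0x7,
        quality_id=b2 & 0xF,
        temporal_id=(b3 >> 5) & 0x7,
        use_ref_base_pic_flag=(b3 >> 4) & 1,
        discardable_flag=(b3 >> 3) & 1,
        output_flag=(b3 >> 2) & 1,  # the last 2 bits are reserved
    )


if __name__ == "__main__":
    hdr = parse_svc_nal_header(bytes([0x74, 0x85, 0x12, 0x67]))
    print(f"P={hdr.priority_id} T={hdr.temporal_id} D={hdr.dependency_id} Q={hdr.quality_id}")
```

Because P, T, D, and Q sit in a fixed-position header, a network element can thin the stream by inspecting these four bytes, without parsing the slice payload.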

Profiles of SVC
• Scalable Baseline
  - For conversational and surveillance applications requiring low decoding complexity
  - Spatial scalability: fixed ratios (1, 1.5, or 2) and MB-aligned cropping
  - Temporal and quality scalability: arbitrary
  - No interlaced coding tools
  - B-slices, weighted prediction, CABAC, and the 8x8 luma transform are supported in enhancement layers
  - The base layer conforms to the Baseline profile of H.264/AVC
• Scalable High
  - For broadcast, streaming, and storage
  - Spatial, temporal, and quality scalability: arbitrary
  - The base layer conforms to the High profile of H.264/AVC
• Scalable High Intra
  - Scalable High restricted to IDR pictures only

Conclusions
• Temporal scalability
  - Hierarchical prediction structure
• Spatial and quality scalability
  - Inter-layer prediction of intra, motion, and residual information
  - Single-loop MC decoding
  - CGS: identical size for each spatial layer
  - MGS: CGS + key pictures + quality refinement layers
• Applications
  - Power adaptation: decoding only the needed part of the video stream
  - Graceful degradation: when the “right” parts are lost
  - Format adaptation: backward-compatible extension in mobile TV
• What’s next in SVC?
  - Bit-depth scalability (8-bit 4:2:0 → 10-bit 4:2:0)
  - Color format scalability (4:2:0 → 4:4:4)
2007/8 MC2008, VCLAB

References
• H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the Scalable Video Coding Extension of the H.264/AVC Standard,” IEEE Transactions on Circuits and Systems for Video Technology, 2007.
• T. Wiegand, “Scalable Video Coding,” Joint Video Team, doc. JVT-W132, San Jose, USA, April 2007.
• T. Wiegand, “Scalable Video Coding,” Digital Image Communication, course at Technical University of Berlin, 2006. Available at http://iphome.hhi.de/wiegand/dic.htm
• H. Schwarz, D. Marpe, and T. Wiegand, “Constrained Inter-Layer Prediction for Single-Loop Decoding in Spatial Scalability,” Proc. ICIP 2005.
