Scalable Video Coding: Scalable Extension of H.264/AVC

1 Scalable Video Coding: Scalable Extension of H.264/AVC

2 Scalable Video Coding
• Video streaming over the Internet is growing in popularity, driven by applications such as video conferencing and video telephony.
• The Internet is heterogeneous, dynamic and best-effort, which motivates a scalability feature that adapts video streams to fluctuations in the available bandwidth.
• The goal is to optimize video quality over a large range of bit rates.
• A video bit stream is called scalable if parts of the stream can be removed in such a way that the resulting bit stream is still decodable.
• Scalability here implies:
  • a single encode
  • multiple ways to transmit and decode the bit stream

3 Scalable Video Coding A video bit stream is called scalable if part of the stream can be removed in such a way that the resulting bit stream is still decodable, to adapt to the various needs of end users and to varying terminal capabilities or network conditions.

4 SVC - Standardization

5 SVC Principle: one encoding

6 SVC Principle: multiple decoding

7 H.264/AVC Simulcast vs. SVC
• Compared with simulcast, SVC spatial scalability typically gains about 0.5 dB to 1.5 dB in PSNR,
• or, equivalently, reduces the bit rate by 10% to 30%.
• The gap widens when there is more than one SNR layer per spatial layer.
[Figure: H.264 simulcast (separate SD and HD streams) vs. SVC (one HD+SD stream)]

8 Functionalities and Applications
• SVC can reconstruct lower-resolution or lower-quality signals from partial bit streams.
• Partial decoding of the bit stream allows:
  • graceful degradation when part of the bit stream is lost
  • bit-rate adaptation
  • format adaptation
  • power adaptation
• Beneficial for transmission services with uncertainty about:
  • the resolution required at the terminal
  • channel conditions or device types

9 SVC Basics
• Straightforward extension of H.264/AVC with very limited added complexity
• Layered approach:
  • one base layer
  • one or more enhancement layers
• The base layer is H.264/AVC compliant, so an H.264/AVC decoder can decode the base layer of an SVC stream.
• Enhancement layers enable temporal, spatial or quality (SNR) scalability.

10 SVC Profiles
• The SVC standard defines 3 profiles.
• Scalable Baseline profile
  • Targeted at conversational and surveillance applications.
  • Spatial scalable coding is restricted to ratios of 1.5 and 2 between successive spatial layers.
  • Interlaced video is not supported.
• Scalable High profile
  • Designed for broadcast, storage and streaming applications.
  • Spatial scalable coding with arbitrary resolution ratios is supported.
  • Interlaced video is supported.
• Scalable High Intra profile
  • Designed for professional applications.
  • Contains only IDR pictures in all layers.
  • All other coding tools are the same as in the Scalable High profile.
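
As a rough illustration of the Scalable Baseline restriction above, the following sketch checks whether the resolution ratio between two successive spatial layers is one of the listed values. The function name and the rule that width and height must scale by the same ratio are assumptions made for this example; the actual profile constraints in the standard are more detailed.

```python
from fractions import Fraction

# Ratios named on the slide for the Scalable Baseline profile.
ALLOWED_RATIOS = (Fraction(3, 2), Fraction(2))

def baseline_ratio_ok(lower, upper, allowed=ALLOWED_RATIOS):
    """Simplified check of the spatial ratio between two successive layers.

    lower and upper are (width, height) tuples.
    Assumption: both dimensions must scale by the same allowed ratio.
    """
    ratio_w = Fraction(upper[0], lower[0])
    ratio_h = Fraction(upper[1], lower[1])
    return ratio_w == ratio_h and ratio_w in allowed

print(baseline_ratio_ok((640, 360), (1280, 720)))   # ratio 2
print(baseline_ratio_ok((640, 360), (960, 540)))    # ratio 1.5
print(baseline_ratio_ok((352, 288), (1280, 720)))   # unequal ratios
```

Exact fractions are used instead of floats so that a ratio such as 1.5 is compared without rounding error.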

11 Temporal Scalability (Dyadic prediction structure)
• Group of Pictures (GOP)
• Key picture: typically intra-coded
• Hierarchically predicted B pictures: motion-compensated prediction (MCP)
[Figure: dyadic hierarchical prediction structure for a GOP of 8 pictures, with key pictures at the GOP borders. Temporal layers in display order: T0 T3 T2 T3 T1 T3 T2 T3 T0, where Tx is the temporal layer identifier. Decodable frame rates: 3.75 fps (T0 only), 7.5 fps (up to T1), 15 fps (up to T2), 30 fps (up to T3). Structural delay = 7 frames.]
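
The dyadic layer assignment can be sketched in a few lines. This is a minimal illustration that assumes the GOP size is a power of two and that key pictures sit at multiples of the GOP size; the function name `temporal_id` and the 30 fps full rate are choices made for the example.

```python
def temporal_id(poc, gop_size):
    """Temporal layer of picture `poc` (display order) in a dyadic GOP.

    Assumes gop_size is a power of two and key pictures sit at
    multiples of gop_size.
    """
    levels = gop_size.bit_length() - 1          # log2(gop_size)
    pos = poc % gop_size
    if pos == 0:
        return 0                                # key picture
    # Trailing zero bits of pos: more zeros means a coarser temporal layer.
    trailing_zeros = (pos & -pos).bit_length() - 1
    return levels - trailing_zeros

gop = 8
ids = [temporal_id(n, gop) for n in range(gop + 1)]
print(ids)

full_rate = 30.0
for t in range(4):
    # Each dropped temporal layer halves the frame rate.
    print(f"up to T{t}: {full_rate / 2 ** (3 - t)} fps")
```

Discarding every picture whose temporal layer exceeds a chosen level leaves a decodable sequence at half, a quarter or an eighth of the full frame rate.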

12 Hierarchical B-pictures
[Figure 1: a non-dyadic prediction structure that provides 2 independently decodable subsequences, with 1/9 and 1/3 of the full frame rate. Structural delay = 8 frames.]
[Figure 2: a prediction structure with zero structural delay but lower coding efficiency than the examples above.]
• The chosen prediction structure need not be constant over time. It can be modified arbitrarily, e.g., to improve coding efficiency.
Figures courtesy: H. Schwarz et al., "Overview of the Scalable Video Coding Extension of the H.264/AVC Standard," IEEE Transactions on Circuits and Systems for Video Technology, Sept. 2007.

13 Group of Pictures (GOP)
• IPP: GOP size 1
  • No temporal scalability
  • Only temporal level 0
• IBP: GOP size 2
  • Temporal levels 0, 1
• GOP size 4
  • Temporal levels 0, 1, 2
• GOP size 8
  • Temporal levels 0, 1, 2, 3
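
The relationship between GOP size and temporal levels listed above follows a simple pattern: a dyadic GOP of size 2^k has k + 1 temporal levels. The sketch below reproduces the list; the helper names are hypothetical, and the structural delay of GOP size minus one assumes the classical hierarchical B structure, in which the first B picture in display order can only be decoded after the next key picture.

```python
def temporal_levels(gop_size):
    """List the temporal levels available for a dyadic GOP of the given size."""
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    # bit_length() of 2**k is k + 1, which is the number of levels.
    return list(range(gop_size.bit_length()))

def structural_delay(gop_size):
    """Structural delay in frames, assuming hierarchical B pictures."""
    return gop_size - 1

for size in (1, 2, 4, 8):
    print(size, temporal_levels(size), structural_delay(size))
```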

14 Coding Efficiency of Hierarchical Prediction Structures
• Significant improvement in coding efficiency for high-delay applications
• The gain depends on how the QP is chosen for the different temporal layers:
  • smaller QP for lower temporal layers
• A larger GOP size gives a larger PSNR improvement
Figure courtesy: H. Schwarz et al., "Overview of the Scalable Video Coding Extension of the H.264/AVC Standard," IEEE Transactions on Circuits and Systems for Video Technology, Sept. 2007.
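
One simple way to give lower temporal layers a smaller QP is a QP cascade. The sketch below uses the rule QP_T = QP_0 + 3 + T for T > 0, patterned on the strategy reported in the Schwarz et al. overview paper; the slide itself gives no numbers, so the offset of 3, the base QP of 28 and the function name are assumptions for illustration only.

```python
def cascaded_qp(qp_base, temporal_level, base_offset=3, max_qp=51):
    """QP for a picture at the given temporal level.

    Level 0 (key pictures) uses qp_base; higher levels get a larger QP.
    base_offset is an assumed value; max_qp is the H.264 upper limit
    for 8-bit video.
    """
    if temporal_level == 0:
        return qp_base
    return min(max_qp, qp_base + base_offset + temporal_level)

qps = [cascaded_qp(28, t) for t in range(4)]
print(qps)
```

Pictures in the lowest layer are referenced, directly or indirectly, by every other picture in the GOP, which is why spending more bits on them tends to pay off.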

15 Spatial Scalability
• Steps:
  • Sub-sample and encode to form the base layer.
  • Decode and up-sample to the original resolution.
  • Subtract the prediction from the original.
  • Encode the residual to form the enhancement layer.
• The base layer contains a reduced-resolution version of each coded frame. Decoding the base layer alone produces a low-resolution output sequence; decoding the base layer together with the enhancement layer(s) produces a higher-resolution output.
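
The steps above can be sketched on a one-dimensional signal. This is a toy model under stated assumptions: the pair-averaging down-sampler, the sample-repetition up-sampler and all function names are chosen for the example, the input length is assumed even, and the actual encoding (transform, quantization, entropy coding) is omitted, so the enhancement layer here is lossless.

```python
def downsample(signal):
    """Halve the resolution by averaging neighbouring sample pairs."""
    return [(signal[i] + signal[i + 1]) // 2 for i in range(0, len(signal) - 1, 2)]

def upsample(signal):
    """Double the resolution by sample repetition (nearest neighbour)."""
    return [s for s in signal for _ in range(2)]

def encode_layers(original):
    base = downsample(original)               # base layer (coding itself omitted)
    prediction = upsample(base)               # prediction at original resolution
    residual = [o - p for o, p in zip(original, prediction)]
    return base, residual                     # residual forms the enhancement layer

def decode(base, residual=None):
    if residual is None:
        return base                           # low-resolution output
    return [p + r for p, r in zip(upsample(base), residual)]

original = [10, 12, 20, 24, 30, 30, 7, 9]
base, residual = encode_layers(original)
print(base)
print(residual)
print(decode(base, residual) == original)
```

Calling `decode(base)` alone yields the half-resolution signal, which mirrors decoding only the base layer.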

16 Spatial Scalability
• The prediction signal is formed by one of:
  • MCP inside the enhancement layer (temporal), suited to small motion and high spatial detail
  • up-sampling from the lower layer (spatial)
  • the average of the two predictions above (temporal + spatial)
• Inter-layer prediction: three kinds
  • inter-layer motion prediction
  • inter-layer residual prediction
  • inter-layer intra prediction (when the co-located lower-layer MB is intra-coded)
• Base mode MB
  • Only the residual is transmitted, with no additional side information.
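
Choosing among the three prediction signals listed above can be illustrated with a sum-of-absolute-differences (SAD) comparison. This is only a sketch: a real encoder would normally use a rate-distortion cost rather than plain SAD, and the function names and sample values are invented for the example.

```python
def sad(block, prediction):
    """Sum of absolute differences between a block and its prediction."""
    return sum(abs(b - p) for b, p in zip(block, prediction))

def choose_prediction(block, temporal_pred, spatial_pred):
    """Pick the candidate prediction with the lowest SAD."""
    average = [(t + s + 1) // 2 for t, s in zip(temporal_pred, spatial_pred)]
    candidates = {
        "temporal": temporal_pred,   # MCP inside the enhancement layer
        "spatial": spatial_pred,     # up-sampled lower layer
        "average": average,          # mean of the two, rounded
    }
    best = min(candidates, key=lambda name: sad(block, candidates[name]))
    return best, candidates[best]

block = [100, 102, 104, 106]
mode, pred = choose_prediction(block, [98, 100, 108, 110], [104, 104, 100, 100])
print(mode, pred)
```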

17 Extended Spatial Scalability (ESS)
• ESS is needed in applications that mix displays from broadcasting, communications and IT environments, which differ in size and aspect ratio (e.g., 4:3 or 16:9).

18 Quality / Fidelity / SNR Scalability
• Types:
  • Coarse-Grain Scalability (CGS)
  • Medium-Grain Scalability (MGS)
  • Fine-Grain Scalability (FGS): not supported by the SVC standard because of its very poor enhancement-layer coding efficiency
• Bit-rate adaptation at the same spatial/temporal resolution
• SVC supports up to 16 SNR layers for each spatial layer

19 Coarse-grain quality scalability (CGS)
• A special case of spatial scalability
  • Identical sizes (resolution) for base and enhancement layers
  • Smaller quantization step sizes for higher enhancement residual layers
• Designed for only a few selected bit-rate points
  • Number of supported bit-rate points = number of layers
• Switching can only occur at IDR access units
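
The idea of refining the residual with smaller quantization steps in higher layers can be sketched on a single coefficient. This is a toy model: the step sizes 16, 4 and 1, the coefficient value and the function names are assumptions for illustration, and real CGS operates on transform coefficients with the codec's own quantizer.

```python
def quantize(value, step):
    """Uniform quantizer: nearest integer level for the given step size."""
    return int(round(value / step))

def cgs_layers(coefficient, steps):
    """Code one residual coefficient with successively finer quantization steps.

    Returns the level sent in each layer and the reconstruction after each layer.
    """
    levels, recons = [], []
    recon = 0
    for step in steps:
        level = quantize(coefficient - recon, step)   # code what is still missing
        recon += level * step
        levels.append(level)
        recons.append(recon)
    return levels, recons

levels, recons = cgs_layers(37, steps=[16, 4, 1])
print(levels)
print(recons)
```

Each entry of `recons` corresponds to one decodable bit-rate point, which matches the statement that the number of supported points equals the number of layers.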

20 Medium-grain quality scalability (MGS)
• MGS = CGS + key pictures + refinement quality layers
• More enhancement layers are supported
  • Refinement quality layers of the residual
• Key pictures
  • Used for drift control
• Switching can occur at any access unit
• Drift control
  • Drift: the effect caused by unsynchronized MCP at the encoder and decoder sides
  • Trade-off of MCP in quality-scalable SVC: coding efficiency vs. drift

21 SVC Encoder
[Figure: SVC encoder block diagram. Labels: dependency layer; the same motion/prediction information; temporal decomposition]

22 SVC: Combined Scalability
[Figure: spatio-temporal-quality cube]

23 Combined Scalability
• Dependency and quality refinement layers
[Figure: scalable bit stream organized into dependency layers D = 0, 1, 2, each with quality refinement layers Q = 0, 1, 2]

24 Combined Scalability
[Figure: access units at temporal levels T0, T2, T1, T2, T0, each containing dependency layers D0 and D1 with quality layers Q0 and Q1]

25
• Bit-stream format
  [Figure: NAL unit header | NAL unit header extension | NAL unit payload; the header extension carries the fields P, T, D and Q]
  • P (priority_id): indicates the importance of a NAL unit
  • T (temporal_id): indicates the temporal level
  • D (dependency_id): indicates the spatial/CGS layer
  • Q (quality_id): indicates the MGS/FGS layer
• Bit-stream switching
  • Inside a dependency layer: switching everywhere
  • Outside a dependency layer:
    • switching up only at IDR access units
    • switching down everywhere if using multiple-loop decoding
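
The T, D and Q identifiers are what make sub-stream extraction cheap: an extractor only has to read them and drop NAL units above the target operating point. The sketch below models this with a hypothetical `NalUnit` record instead of parsing real header bits. The rule of keeping a unit when all three identifiers are at or below the target is a simplification; a real extractor also has to account for which lower-layer units are needed for inter-layer prediction.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NalUnit:
    """Hypothetical record holding the identifiers from the header extension."""
    priority_id: int
    temporal_id: int
    dependency_id: int
    quality_id: int
    payload: bytes = b""

def extract_substream(nal_units, max_t, max_d, max_q):
    """Keep NAL units at or below the target (T, D, Q) operating point."""
    return [
        n for n in nal_units
        if n.temporal_id <= max_t
        and n.dependency_id <= max_d
        and n.quality_id <= max_q
    ]

stream = [
    NalUnit(priority_id=0, temporal_id=0, dependency_id=0, quality_id=0),
    NalUnit(priority_id=1, temporal_id=0, dependency_id=0, quality_id=1),
    NalUnit(priority_id=1, temporal_id=1, dependency_id=0, quality_id=0),
    NalUnit(priority_id=2, temporal_id=0, dependency_id=1, quality_id=0),
    NalUnit(priority_id=2, temporal_id=2, dependency_id=1, quality_id=0),
    NalUnit(priority_id=3, temporal_id=1, dependency_id=1, quality_id=1),
]

low = extract_substream(stream, max_t=1, max_d=0, max_q=0)
print(len(low), "of", len(stream), "NAL units kept")
```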

26 Profiles of SVC
• Scalable Baseline
  • For conversational and surveillance applications requiring low decoding complexity
  • Spatial scalability: fixed ratio (1, 1.5, or 2) and MB-aligned cropping
  • Temporal and quality scalability: arbitrary
  • No interlaced coding tools
  • Supported in enhancement layers: B slices, weighted prediction, CABAC, and the 8x8 luma transform
  • The base layer conforms to the Baseline profile of H.264/AVC
• Scalable High
  • For broadcast, streaming, and storage
  • Spatial, temporal, and quality scalability: arbitrary
  • The base layer conforms to the High profile of H.264/AVC
• Scalable High Intra
  • Scalable High + all IDR pictures

27 Conclusions
• Temporal scalability
  • Hierarchical prediction structure
• Spatial and quality scalability
  • Inter-layer prediction of intra, motion, and residual information
  • Single-loop MC decoding
  • CGS: identical size for each spatial layer
  • MGS: CGS + key pictures + quality refinement layers
• Applications
  • Power adaptation: decode only the needed part of the video stream
  • Graceful degradation: when the “right” parts are lost
  • Format adaptation: backwards-compatible extension in mobile TV
• What’s next in SVC?
  • Bit-depth scalability (8-bit 4:2:0 → 10-bit 4:2:0)
  • Color format scalability (4:2:0 → 4:4:4)
2007/8 MC2008, VCLAB

28 References
• H. Schwarz, D. Marpe, and T. Wiegand, "Overview of the Scalable Video Coding Extension of the H.264/AVC Standard," IEEE Transactions on Circuits and Systems for Video Technology, Sept. 2007.
• T. Wiegand, "Scalable Video Coding," Joint Video Team, doc. JVT-W132, San Jose, USA, April 2007.
• T. Wiegand, "Scalable Video Coding," Digital Image Communication, course at Technical University of Berlin, 2006. Available at http://iphome.hhi.de/wiegand/dic.htm
• H. Schwarz, D. Marpe, and T. Wiegand, "Constrained Inter-Layer Prediction for Single-Loop Decoding in Spatial Scalability," Proc. of ICIP'05.

