
1 Spatial Sparsity Induced Temporal Prediction for Hybrid Video Compression. Gang Hua, Rice University, Houston, TX (ganghua@rice.edu) and Onur G. Guleryuz, DoCoMo USA Labs, Palo Alto, CA (guleryuz@docomolabs-usa.com). (Please view in full-screen presentation mode to see the animations.)

2 Outline
– Problem statement:
  – Quick intro to hybrid video compression.
  – Example difficult video.
  – Problems in temporal prediction.
  – Quick results showing what the proposed work can do.
– Our solution: Spatial Sparsity Induced Temporal Prediction for Hybrid Video Compression:
  – Model.
  – What we do, how we do it, and why it works.
– Simulation results showing prediction examples & discussion.
– Compression results.
– Conclusion & future work.
(I will show results on video, but I will also use the classical images peppers/barbara to make intuitive points.)

3 Quick Digression: The Set of "Natural" Images
The set of natural/interesting images is non-convex and star-shaped. A point can be far from both barbara and peppers but still be very useful in compressing barbara or peppers.

4 Setup: Hybrid Video Compression
[Block diagram: reference video frame → predict current frame → prediction error → compress]
We will propose a new prediction technique: Spatial Sparsity Induced Temporal Prediction (SIP).

5 Current State of Prediction
[Block diagram: reference frame → predict current frame → (+/−) → transform coder]
Current prediction techniques only work well when current-frame blocks are simple translations of past-frame blocks (sufficient for most simple video). In this work, we will assume translations are already accounted for.

6 Example Difficult Video
(Please note the differences with traditional sequences like foreman.)
[Video examples: a commercial and a movie trailer]

7 Some Example Problematic Temporal Evolutions for Current Techniques
[Reference frame vs. current frame examples:]
– Temporally decorrelated noise.
– Simple fade from a blend of two scenes.
– Special effects.
These force INTRA (non-differential) encoding (many bits).

8 SIP Inside a Generic Hybrid Coder
[Block diagram: frame to be coded → (−) motion compensation → transform coder → coded differential → transform decoder → (+) → frame delay z⁻¹ → previously decoded frame, to be used as reference; Sparsity Induced Prediction operates on causal information.]
The objective is to generate better motion-compensated predictors.

9 [Examples showing the MC reference frame, the current frame, and the MC reference after SIP:]
– noise (denoised)
– lightning (removed!)
– cross-fade (fading scenes reduced and amplified as needed!)
– clutter (removed)

10 Loose Model (after translations are accounted for)
current = relevant (reference) + structured noise (N×1 vectors, ith pixel), e.g.,
– a blend: current = 0.5 (reference + another scene) + noise,
– a brightness change: current = (smooth light-map) × reference + noise.
Straightforward with today's know-how: use an overcomplete set of transforms, threshold coefficients, ... But the denoising recipe will not work: can we somehow optimize transforms? use index sets of coefficients? ... We must find a common formulation for both of these cases (turns out to be very easy!). I will show more complicated variations as well.
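The "denoising recipe will not work" point can be exercised numerically. A minimal numpy sketch (all signals, sizes, and the threshold are invented here for illustration, and a single 4x4 block-DCT tiling stands in for the overcomplete decomposition): it builds a blend-plus-noise frame and applies plain hard-thresholding denoising; the recipe recovers the clean blend but cannot reject the structured noise s, because a second scene is itself sparse.

```python
import numpy as np

n = 4
k = np.arange(n)
C = np.cos(np.pi * (2*k[None, :] + 1) * k[:, None] / (2*n)) * np.sqrt(2/n)
C[0] /= np.sqrt(2)  # orthonormal 4x4 DCT-II matrix

def block_dct(img):
    H, W = img.shape
    b = img.reshape(H//n, n, W//n, n).transpose(0, 2, 1, 3)
    return np.einsum('ij,abjk,lk->abil', C, b, C)   # C @ block @ C.T

def block_idct(co, shape):
    H, W = shape
    b = np.einsum('ji,abjk,kl->abil', C, co, C)     # C.T @ coeff @ C
    return b.transpose(0, 2, 1, 3).reshape(H, W)

# Toy frames: r is the relevant scene, s a second (structured) scene.
y, x = np.mgrid[0:16, 0:16].astype(float)
r = np.sin(2*np.pi*y/16)
s = np.cos(np.pi*(2*x + 1)/4)
rng = np.random.default_rng(0)
cur = 0.5*(r + s) + 0.01*rng.standard_normal((16, 16))  # blend + white noise

co = block_dct(cur)
co[np.abs(co) < 0.05] = 0.0          # the plain "denoising recipe"
den = block_idct(co, cur.shape)
# den approximates the clean blend 0.5*(r+s), not the relevant scene r.
```

Thresholding removes the white noise but happily keeps s, which is exactly why prediction needs more than a denoiser.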

11 How We Do It
Look at all images in terms of their "frame coefficients": translation-invariant decompositions generated with a 4x4 block DCT (a poor person's frame), M-times expansive with M = 16. Form a causal, least-squares, per-coefficient estimate of the frame coefficients of the current frame.
Mini FAQ:
– Why a frame? Separating r and s becomes straightforward, i.e., easy rejection of s.
– Inverting overcomplete decompositions? Easy.
– Why a 4x4 DCT? Because it is fast.
– Can one use *lets? Subject to some caveats, yes.
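A minimal numpy sketch of the "poor person's frame" above, assuming it means: apply a 4x4 block DCT at all 16 spatial shifts (M = 16 times expansive, translation invariant), and invert by averaging the 16 individual block-DCT inverses. Function names are mine, not the paper's.

```python
import numpy as np

def dct_matrix(n=4):
    # Orthonormal n x n DCT-II matrix.
    k = np.arange(n)
    C = np.cos(np.pi * (2*k[None, :] + 1) * k[:, None] / (2*n)) * np.sqrt(2/n)
    C[0] /= np.sqrt(2)
    return C

def frame_coefficients(img, n=4):
    # All n*n shifted block-DCT tilings: a translation-invariant,
    # (n*n)-times expansive decomposition (M = 16 for n = 4).
    C = dct_matrix(n)
    H, W = img.shape
    coeffs = {}
    for dy in range(n):
        for dx in range(n):
            shifted = np.roll(np.roll(img, -dy, axis=0), -dx, axis=1)
            blocks = shifted.reshape(H//n, n, W//n, n).transpose(0, 2, 1, 3)
            coeffs[(dy, dx)] = np.einsum('ij,abjk,lk->abil', C, blocks, C)
    return coeffs

def invert_frame(coeffs, shape, n=4):
    # Inverting the overcomplete decomposition is easy: average the
    # n*n exact block-DCT inverses.
    C = dct_matrix(n)
    H, W = shape
    out = np.zeros(shape)
    for (dy, dx), c in coeffs.items():
        blocks = np.einsum('ji,abjk,kl->abil', C, c, C)
        shifted = blocks.transpose(0, 2, 1, 3).reshape(H, W)
        out += np.roll(np.roll(shifted, dx, axis=1), dy, axis=0)
    return out / len(coeffs)
```

Each tiling is an orthonormal transform, so each inverse is exact and the average reconstructs the image perfectly; modifying the 16 coefficient sets before inversion is what gives the translation-invariant behavior.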

12 Views of a DCT (4x4) Frame
[Figure: the blend 0.5 (peppers + barbara) and views of its frame coefficients; I rearranged the coefficients to make nice pictures.]

13 Automatic Separation of Relevant and Irrelevant
[Coefficient maps; gray = 0.] r and s are conveniently separated in the frame/overcomplete domain, except for overlaps. (Usually there are few overlaps, and for fast processing in this version we will ignore them.) One can also predict s in other ways and improve its rejection.

14 Real Example
[Significance maps: blue = r is significant, red = s is significant.] Overlaps are few. However, it is clear that the prediction must suppress/amplify the same frequencies in a spatially adaptive fashion. Approaches that use filter dictionaries (i.e., Wiener interpolation filters, etc.) require very big dictionaries.
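The separation claim can be illustrated with a toy example (signals invented; a single DCT tiling stands in for the frame): a structure that varies only along rows and a zero-mean pattern that varies only along columns occupy disjoint sets of block-DCT frequencies, so their significance maps do not overlap at all here. On real images the overlaps are few rather than zero.

```python
import numpy as np

n = 4
k = np.arange(n)
C = np.cos(np.pi * (2*k[None, :] + 1) * k[:, None] / (2*n)) * np.sqrt(2/n)
C[0] /= np.sqrt(2)  # orthonormal 4x4 DCT-II matrix

def block_dct(img):
    H, W = img.shape
    b = img.reshape(H//n, n, W//n, n).transpose(0, 2, 1, 3)
    return np.einsum('ij,abjk,lk->abil', C, b, C)

y, x = np.mgrid[0:16, 0:16].astype(float)
r = np.sin(2*np.pi*y/16)        # "relevant": varies along rows only
s = np.cos(np.pi*(2*x + 1)/4)   # "irrelevant": zero-mean column pattern

T = 1e-6
sig_r = np.abs(block_dct(r)) > T       # significance map of r (blue)
sig_s = np.abs(block_dct(s)) > T       # significance map of s (red)
overlap = (sig_r & sig_s).mean()       # fraction of jointly significant coeffs
```

Here r lands only in the horizontally-DC column of each block's coefficients and s only in one horizontal frequency, so `overlap` is exactly zero.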

15 Causal Prediction of Frame Coefficients
[Figure: previously encoded blocks, the block to be encoded, the available coefficients in the causal neighborhood, and the coefficients associated with the block to be coded.]
Fill all frame coefficients of the orange block and invert (encoder/decoder). Send/receive the residual for the red block, ... (Prediction is less accurate at singularity overlaps.)
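A toy sketch of the causal, least-squares, per-coefficient estimate, under my own simplifications: one DCT tiling instead of the full 16x frame, a frame-wide fade as the temporal evolution, and a causal neighborhood consisting of the already-decoded block rows. For each frequency pair a scalar gain is fit over the causal blocks only and applied to the reference coefficients of the block to be predicted; no side information is sent.

```python
import numpy as np

n = 4
k = np.arange(n)
C = np.cos(np.pi * (2*k[None, :] + 1) * k[:, None] / (2*n)) * np.sqrt(2/n)
C[0] /= np.sqrt(2)  # orthonormal 4x4 DCT-II matrix

def block_dct(img):
    H, W = img.shape
    b = img.reshape(H//n, n, W//n, n).transpose(0, 2, 1, 3)
    return np.einsum('ij,abjk,lk->abil', C, b, C)

rng = np.random.default_rng(1)
ref = rng.standard_normal((16, 16))   # (motion-compensated) reference frame
cur = 0.6 * ref                       # hypothetical frame-wide fade

cr = block_dct(ref)                   # shape (4, 4, 4, 4): block row/col, freqs
cc = block_dct(cur)

# Causal neighborhood: the two already-decoded block rows above block (2, 1).
causal_r = cr[:2].reshape(-1, n, n)
causal_c = cc[:2].reshape(-1, n, n)

# One least-squares gain per frequency pair (i, l), fit on causal data only.
gain = (causal_c * causal_r).sum(0) / ((causal_r**2).sum(0) + 1e-12)

# Predicted frame coefficients of the block to be coded.
pred = gain * cr[2, 1]
```

Because the gains are fit per coefficient, the same machinery suppresses or amplifies individual frequencies as the data demands, which is what a fixed interpolation filter cannot do without a huge dictionary.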

16 Some Prediction Examples
– Simulation results that show the efficacy of causal predictions (compression results come later).
– A showcase of the proposed work using standard test images, to give an idea of the temporal evolutions it can deal with.
– Evolutions are frame-wide for ease of demonstration. Otherwise, the proposed algorithm is local and can easily take advantage of localized evolutions in an adaptive fashion.
– All frames have additive Gaussian noise ( ) for added challenge and to demonstrate noise robustness.
– (The algorithm exploits the underlying non-convexity of the set of natural images.)

17 Problem: noisy video. Current frame = peppers + noise.
Required processing for each predicted block (without looking at the predicted block!): denoise.
Prediction accuracy (PSNR): 36.42 dB (BLS-GSM: 37.12 dB). Completely causal, no side information sent.
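The accuracy numbers on these result slides are PSNRs; a minimal helper, assuming the conventional definition against the clean current frame (the `peak` default is my assumption for 8-bit images):

```python
import numpy as np

def psnr(pred, target, peak=255.0):
    # Peak signal-to-noise ratio (dB) of a prediction vs. the clean frame.
    err = np.asarray(pred, float) - np.asarray(target, float)
    mse = np.mean(err**2)
    return 10.0 * np.log10(peak**2 / mse)
```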

18 Problem: scene transition from a blend of two scenes. Past frame = (peppers + barbara)/2 + noise (SNR = 0 dB! must catch the red fish); current frame = peppers + noise.
Required processing for each predicted block (without looking at the predicted block!): denoise, find peppers (!) out of the blend of peppers & barbara, amplify peppers.
Prediction accuracy (PSNR): 28.954 dB ("de-Barbara-d"; completely causal, no side information sent).

19 Problem: scene transition from a blend of three scenes. Past frame = (peppers + barbara + boat)/3 + noise (must catch the red fish).
Required processing for each predicted block (without looking at the predicted block!): denoise, find peppers out of the blend of peppers, barbara & boat, amplify peppers.
Prediction accuracy (PSNR): 26.874 dB.

20 Problem: scene transition with a cross-fade (one scene fades out, the other fades in). Past frame = .3*peppers + .7*barbara + noise; current frame = .7*peppers + .3*barbara + noise.
Required processing for each predicted block (without looking at the predicted block!): denoise, find barbara, reduce barbara, find peppers, amplify peppers.
Prediction accuracy (PSNR): 34.952 dB.

21 Problem: scene transition from a blend with a brightness change.
Required processing for each predicted block (without looking at the predicted block!): denoise, find the light-map, invert the light-map, find peppers out of the blend of peppers & barbara, amplify peppers.
Prediction accuracy (PSNR): 27.274 dB.

22 Q: Does it work in practice? A: Yes.
Setup: JM 10.2, IPP…, MB-level switch, no other overhead. QCIF video, ¼-pixel motion, adaptive rounding on.
[Rate-distortion plots (a)–(d): gains in rate of ~20%, ~10%, ~25%, and ~10%.]

23 Movie trailer: ~18% gain in rate.
Our gains are reduced at lower bitrates because the compression process tends to remove the effect of some of the problems we can deal with.

24 Properties
Decoder complexity: a translation-invariant decomposition; per pixel, 3*4*4 multiplies, 4*4 divides, and 4*4*4 additions (to compute the prediction). Complexity can be reduced by shrinking the causal neighborhood, using less expansive decompositions, running only on high-error blocks, etc.
Encoder complexity = decoder complexity + motion search (fast search, run only on high-error blocks, etc.).
Other work:
– Brightness-compensation methods: work only for brightness changes.
– Wiener-filter-based sub-pixel interpolation: filters have low-pass characteristics only; many filters are needed in the dictionary (too much overhead).
– Weighted prediction: scene-wide; only works on blends if the blending frames are in the reference frame buffer.
Our work is more of an "all-purpose cleaner" compared to earlier work.

25 Conclusion
– Images depicted in video are sparse, and this can be taken advantage of to generate very interesting prediction results.
– The proposed work goes beyond early prediction solutions and adds new capabilities: many types of temporal evolutions in video can be easily managed, denoising accomplished, lightning removed, complicated fades handled, focus changes deblurred, ...
– A showcase of the power of sparse decompositions and of how the underlying non-convexity can be utilized.
Future work: manage overhead better; improve performance; reduce complexity.

