Prof. Jayanta Mukhopadhyay

Video Processing in Compressed Domain By Prof. Jayanta Mukhopadhyay

Video Resizing

MPEG Introduction
Encoding: INTRA | motion-compensated inter frames | INTRA (DCT, quantization, motion estimation & compensation, VLC)
Decoding: INTRA | motion-compensated inter frames | INTRA (IDCT, inverse quantization, inverse motion compensation, VLC)

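Compressed Domain (DCT) Processing
MPEG video → VLC decoder → inverse quantization → 8 x 8 DCT blocks → processing box → 8 x 8 DCT blocks → quantization → VLC encoder → MPEG video
(Processing in the spatial domain would additionally need an IDCT before, and a DCT after, the processing box.)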

Video Downscaling
Applications: browsing remote video databases, PIP, video conferencing, transcoding, etc.
Approaches:
- Spatial domain technique
- Hybrid (spatial + DCT) technique
- Pure DCT domain technique

Video Resizing: Spatial Domain Technique
Input data → buffer → VLC decoder → IDCT + motion compensation (frame memory) → spatial downscaling → frame memory → (+/-) → DCT → Q → VLC encoder → buffer → output
The encoder loop (Q^-1, IDCT, +, frame memory) feeds motion estimation & compensation.

Video Resizing: Computation Complexity for P Frames, from CIF Resolution to QCIF Resolution
- Inverse quantization + IDCT: 144m, 464a per 8x8 block
- Inverse motion compensation: 256a per 16x16 block
- Downscaling by 2: 3a, 1s per pixel
- Full-search motion estimation (±15 pels): … a per 16x16 block
- Motion compensation: 256a per 16x16 block
- DCT + quantization: 144m, 464a per 8x8 block
The total operation count uses the weights add = 1 op, shift = 1 op, mult. = 3 ops.
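The weighting rule (add = 1 op, shift = 1 op, mult. = 3 ops) can be applied mechanically to the per-unit costs listed for the spatial-domain pipeline. A small Python sketch; the helper name `ops` is hypothetical, and only the itemized costs that survive on the slide are used:

```python
def ops(mults=0, adds=0, shifts=0):
    """Weighted operation count: add = 1 op, shift = 1 op, multiplication = 3 ops."""
    return 3 * mults + adds + shifts


# Per-unit costs listed for the spatial-domain pipeline.
iq_idct_block = ops(mults=144, adds=464)   # inverse quant. + IDCT, one 8x8 block
imc_macroblock = ops(adds=256)             # inverse MC, one 16x16 macroblock
downscale_pixel = ops(adds=3, shifts=1)    # downscale by 2, one pixel

# Normalise the block-based costs to operations per pixel.
print(iq_idct_block, iq_idct_block / 64)   # 896 14.0
print(imc_macroblock / 256)                # 1.0
print(downscale_pixel)                     # 4
```

Since DCT + quantization has the same 144m + 464a cost, the two transform stages alone account for 28 weighted operations per pixel of every 8x8 block.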

Video Resizing: DCT based Downscaling
Intra frames: DCT intra frame → DCT-based downscale → downscaled DCT intra frame
Inter frames: DCT inter frame + motion vectors → DCT-based inverse motion compensation → intra DCT blocks → DCT-based downscale → DCT-based motion estimation & compensation → downscaled DCT intra & inter frames

Video Resizing: DCT Domain based Downsampling of Intra Frames by a Factor of Two
Compressed bitstream → Huffman decoder & dequantizer → four 8x8 DCT blocks x1, x2, x3, x4 → one 8x8 DCT block x → Huffman encoder & quantizer → compressed bitstream (downscaled)

Video Resizing: Downscaling Technique for an Intra Frame (DCT Domain)
Downscaling: 8 samples → 8-point DCT → first 4 coefficients → 4-point IDCT → 4 samples. The slide also shows the reverse (upsampling) path, which produces the upsampled frame.
Computational complexity of downscaling: 1.25m + 1.25a per pixel of the original frame.
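The 8-DCT / 4-IDCT idea can be sketched in NumPy. This is an illustrative sketch, assuming orthonormal DCT-II matrices and a 1/√2 normalisation per dimension so that a flat block keeps its grey level; the function names are hypothetical, and the dense matrix products are for clarity only and do not reflect the operation count quoted above. The last function applies the same idea in 2-D to four 8x8 DCT blocks x1 to x4, producing one 8x8 DCT block x:

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal n-point DCT-II matrix S (rows are basis vectors), S @ S.T = I."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    s = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    s[0, :] = np.sqrt(1.0 / n)
    return s


S8, S4 = dct_matrix(8), dct_matrix(4)


def downscale_1d(b):
    """8 samples -> 8-point DCT -> keep 4 low-frequency coefficients -> 4-point IDCT."""
    B = S8 @ b
    # Orthonormal transforms scale the result by sqrt(2); undo it to keep the mean.
    return S4.T @ B[:4] / np.sqrt(2.0)


def upscale_1d(b_hat):
    """4 samples -> 4-point DCT -> zero-pad to 8 coefficients -> 8-point IDCT."""
    B = np.zeros(8)
    B[:4] = np.sqrt(2.0) * (S4 @ b_hat)
    return S8.T @ B


def downscale_four_dct_blocks(X1, X2, X3, X4):
    """Four 8x8 DCT blocks (2x2 arrangement) -> one 8x8 DCT block at half resolution."""
    # Keep each block's 4x4 low-frequency corner and take a 4x4 IDCT
    # (factor 1/2 = (1/sqrt(2))^2 for the two dimensions).
    small = [S4.T @ X[:4, :4] @ S4 / 2.0 for X in (X1, X2, X3, X4)]
    tile = np.block([[small[0], small[1]], [small[2], small[3]]])
    return S8 @ tile @ S8.T
```

For a signal whose energy lies in the first four DCT coefficients, `upscale_1d(downscale_1d(b))` returns `b` exactly; higher-frequency content is discarded by the truncation.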

Video Resizing: DCT Domain based Inverse Motion Compensation (IMC)
Huffman decoder and dequantizer → 8x8 DCT error blocks → (+) → 8x8 DCT intra blocks → Huffman encoder and quantizer
The prediction added to the error blocks is obtained by DCT domain inverse motion compensation of the previous frame's DCT domain data (8x8 DCT intra blocks).

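Video Resizing: DCT Domain based Inverse Motion Compensation (Neri Merhav's Scheme)
The motion-compensated block $\hat{x}$ straddles four reference (intra) blocks $x_1, x_2, x_3, x_4$, with the split determined by $h$ and $w$; $E$ is the prediction-error (inter) block. The block can be written as

$$\hat{x} = \sum_{i=1}^{4} c_{i1}\, x_i\, c_{i2} \qquad (1)$$

where $c_{ij}$, $i = 1, \dots, 4$, $j = 1, 2$, are sparse 8x8 matrices of zeros and ones. Since $S^T S = I$, where $S$ is the 8-point DCT matrix, expression (1) can be written as

$$S \hat{x} S^T = S \left( \sum_{i=1}^{4} c_{i1}\, S^T S\, x_i\, S^T S\, c_{i2} \right) S^T \qquad (2)$$

$S$ can be factorized as $S = D P B_1 B_2 M A_1 A_2 A_3$. Expression (2) can further be written as

$$\hat{X} = S \left[ J_h B_2^T B_1^T P^T D \left( X_1 D P B_1 B_2 J_w^T + X_2 D P B_1 B_2 K_{8-w}^T \right) + K_{8-h} B_2^T B_1^T P^T D \left( X_3 D P B_1 B_2 J_w^T + X_4 D P B_1 B_2 K_{8-w}^T \right) \right] S^T$$

where $J_i = U_i (M A_1 A_2 A_3)^T$ and $K_i = L_i (M A_1 A_2 A_3)^T$, $i = 1, 2, \dots, 8$.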
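Expressions (1) and (2) can be checked numerically. A minimal NumPy sketch, assuming h and w are the row and column offsets (0 to 7) of the motion-compensated block inside x1 (the slides' own indexing of J and K may differ); the helpers `shift_pair` and `imc_dct` are hypothetical. The sparse U and L matrices play the role of c_i1 and c_i2, and a dense S stands in for the fast factorization, so the sketch shows correctness rather than the operation count:

```python
import numpy as np


def dct_matrix(n=8):
    """Orthonormal n-point DCT-II matrix S, so that S.T @ S = I."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    s = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    s[0, :] = np.sqrt(1.0 / n)
    return s


def shift_pair(h, n=8):
    """Sparse 0/1 matrices for an offset h (0 <= h < n).

    U @ x moves rows h..n-1 of x to rows 0..n-1-h (zeros below);
    L @ x moves rows 0..h-1 of x to rows n-h..n-1 (zeros above).
    """
    U = np.zeros((n, n))
    L = np.zeros((n, n))
    for r in range(n - h):
        U[r, r + h] = 1.0
    for r in range(n - h, n):
        L[r, r - (n - h)] = 1.0
    return U, L


def imc_dct(X1, X2, X3, X4, h, w, S):
    """DCT of the 8x8 block starting at (h, w) inside the 2x2 arrangement of X1..X4."""
    Uh, Lh = shift_pair(h)
    Uw, Lw = shift_pair(w)

    def to_dct(c):
        # Selector matrix moved into the DCT domain: S c S^T (could be precomputed per offset).
        return S @ c @ S.T

    top = to_dct(Uh) @ (X1 @ to_dct(Uw.T) + X2 @ to_dct(Lw.T))
    bottom = to_dct(Lh) @ (X3 @ to_dct(Uw.T) + X4 @ to_dct(Lw.T))
    return top + bottom


S = dct_matrix(8)
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(16, 16)).astype(float)   # four reference blocks
h, w = 3, 5
X = [S @ b @ S.T for b in (ref[:8, :8], ref[:8, 8:], ref[8:, :8], ref[8:, 8:])]
x_hat = S.T @ imc_dct(*X, h, w, S) @ S
print(np.allclose(x_hat, ref[h:h + 8, w:w + 8]))  # True
```

The grouping in `imc_dct` mirrors the bracketed form of the last expression: the row selector is shared by X1 and X2 (and by X3 and X4), so it is applied once per pair.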

Video Resizing: Computation Complexity of Neri Merhav's IMC
Computations per column:
- J1: 3m + 6a; J2: 4m + 10a; J3: 5m + 16a; J4: 5m + 19a; J5: 5m + 20a; J6: 5m + 22a; J7: 5m + 24a; J8: 5m + 28a
- B1/B1^T: 4a; B2/B2^T: 4a; S/S^T: 5m + 29a
Let w = h = 4. Total computations:
- Six multiplications by B1 or B1^T: 6 x 8 x 4 = 192a
- Six multiplications by B2 or B2^T: 6 x 8 x 4 = 192a
- Two multiplications by J_w and K_(8-w), and one by J_h and K_(8-h): 8 x (3 x (5m + 19a + 5m + 19a)) = 240m + 912a
- One 2D DCT operation: 2 x (8 x (5m + 29a)) = 80m + 464a
Total operations = 320m + 1760a (per 8x8 block)
Operations per pixel = 5m + 27.5a

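Video Resizing: Modified IMC Technique (MBIMC)
The 16x16 motion-compensated macroblock M' lies inside the 24x24 region m' formed by the nine reference (intra) blocks x1, ..., x9; E is the prediction-error (inter) macroblock. With row offset r and column offset c (1 ≤ r ≤ 8 and 1 ≤ c ≤ 8),

$$M' = c_r\, m'\, c_c, \qquad m' = \begin{bmatrix} x_1 & x_2 & x_3 \\ x_4 & x_5 & x_6 \\ x_7 & x_8 & x_9 \end{bmatrix}$$

where $c_r$ and $c_c$ are row and column selector matrices of size 16x24 and 24x16. $c_r$ has 16 rows: r-1 zero columns, followed by a 16x16 identity, followed by 8-r+1 zero columns:

$$c_r = \begin{bmatrix} 0_{16\times(r-1)} & I_{16} & 0_{16\times(8-r+1)} \end{bmatrix}$$

$c_c$ has the analogous transposed structure: c-1 zero rows, a 16x16 identity, and 8-c+1 zero rows.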

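Video Resizing: Macroblock-wise IMC in the DCT Domain
With $X_1, \dots, X_9$ the 8x8 DCT blocks of $x_1, \dots, x_9$:

$$M' = \begin{bmatrix} S & 0 \\ 0 & S \end{bmatrix} c_r \begin{bmatrix} S^T & 0 & 0 \\ 0 & S^T & 0 \\ 0 & 0 & S^T \end{bmatrix} \begin{bmatrix} X_1 & X_2 & X_3 \\ X_4 & X_5 & X_6 \\ X_7 & X_8 & X_9 \end{bmatrix} \begin{bmatrix} S & 0 & 0 \\ 0 & S & 0 \\ 0 & 0 & S \end{bmatrix} c_c \begin{bmatrix} S^T & 0 \\ 0 & S^T \end{bmatrix} \qquad (A)$$

where S is the 8-point DCT matrix, which can be factorized as $S = D P B_1 B_2 M A_1 A_2 A_3$. Using this factorization, we can represent

$$\begin{bmatrix} S^T & 0 & 0 \\ 0 & S^T & 0 \\ 0 & 0 & S^T \end{bmatrix} = Q^T B_2^T B_1^T P^T D^T$$

where $Q^T = \mathrm{diag}\left((M A_1 A_2 A_3)^T, (M A_1 A_2 A_3)^T, (M A_1 A_2 A_3)^T\right)$, and $B_2^T$, $B_1^T$, $P^T$, $D^T$ here denote the corresponding 24x24 block-diagonal matrices.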
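Both the selector form M' = c_r m' c_c and expression (A) can be checked with a short NumPy sketch. It assumes orthonormal DCT-II blocks and uses dense block-diagonal matrices built with `np.kron` instead of the fast factorization; `row_selector` is a hypothetical helper that follows the c_r layout described for MBIMC:

```python
import numpy as np


def dct_matrix(n=8):
    """Orthonormal n-point DCT-II matrix S, so that S.T @ S = I."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    s = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    s[0, :] = np.sqrt(1.0 / n)
    return s


def row_selector(r):
    """16x24 selector c_r: r-1 zero columns, a 16x16 identity, then 8-r+1 zero columns."""
    c = np.zeros((16, 24))
    c[:, r - 1:r + 15] = np.eye(16)
    return c


S = dct_matrix(8)
S2 = np.kron(np.eye(2), S)   # diag(S, S)
S3 = np.kron(np.eye(3), S)   # diag(S, S, S)

rng = np.random.default_rng(1)
m = rng.integers(0, 256, size=(24, 24)).astype(float)  # m' = nine 8x8 reference blocks
r, c = 5, 3
c_r = row_selector(r)
c_c = row_selector(c).T      # 24x16 column selector

# Spatial domain: M' = c_r m' c_c.
M_spatial = c_r @ m @ c_c

# DCT domain, expression (A): X is the 3x3 arrangement of 8x8 DCT blocks X1..X9.
X = S3 @ m @ S3.T
M_dct = S2 @ c_r @ S3.T @ X @ S3 @ c_c @ S2.T

print(np.allclose(M_spatial, m[r - 1:r + 15, c - 1:c + 15]))  # True
print(np.allclose(M_dct, S2 @ M_spatial @ S2.T))                 # True: four 8x8 DCT blocks of M'
```

The identity behind (A) is simply diag(S, S, S)^T diag(S, S, S) = I, so the DCT-domain product equals the block-wise DCT of the extracted macroblock.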

Video Resizing
The expression (A) can be written as

$$M' = \begin{bmatrix} S & 0 \\ 0 & S \end{bmatrix} c_r\, Q^T B_2^T B_1^T P^T D^T \begin{bmatrix} X_1 & X_2 & X_3 \\ X_4 & X_5 & X_6 \\ X_7 & X_8 & X_9 \end{bmatrix} D P B_1 B_2 Q\, c_c \begin{bmatrix} S^T & 0 \\ 0 & S^T \end{bmatrix}$$

Let $J_r = c_r Q^T$ and $K_c = Q c_c$, 1 ≤ r ≤ 8 and 1 ≤ c ≤ 8. $J_r$ and $K_c$ have similar complexities because of their similar structure. The $J_r$ and $K_c$ matrix multiplications can also be implemented efficiently by extending Neri Merhav's approach.

Video Resizing: Computation Complexity of the Modified IMC Scheme
Computations per column:
- J1: 10m + 56a; J2: 13m + 58a; J3: 14m + 60a; J4: 15m + 64a; J5: 15m + 66a
- B1/B1^T: 12a; B2/B2^T: 12a; S/S^T: 5m + 29a
Let r = c = 5. Total computations:
- Two multiplications of B1 type: 2 x 24 x 12 = 576a
- Two multiplications of B2 type: 2 x 24 x 12 = 576a
- One multiplication each by J_r and K_c: 2 x 24 x (15m + 66a) = 720m + 3168a
- Four 2D DCT operations: 4 x (8 x (5m + 29a)) = 160m + 928a
Total computations = 880m + 5248a (per 16x16 block)
Operations per pixel = 3.43m + 20.5a
This is a 27% improvement on Neri Merhav's approach.
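As a cross-check, the itemized counts on the Neri Merhav and MBIMC complexity slides can be summed and converted to weighted operations per pixel (mult. = 3 ops, add = 1 op, as on the system-level complexity slides). A short Python sketch; the variable names are hypothetical:

```python
def ops(mults, adds):
    """Weighted operation count: multiplication = 3 ops, addition = 1 op."""
    return 3 * mults + adds


# Neri Merhav's IMC (w = h = 4), per 8x8 block: itemized terms from the slide.
merhav_m = 240 + 80
merhav_a = 192 + 192 + 912 + 464

# MBIMC (r = c = 5), per 16x16 block: itemized terms from the slide.
mbimc_m = 2 * 24 * 15 + 160
mbimc_a = 576 + 576 + 2 * 24 * 66 + 928

merhav_px = ops(merhav_m, merhav_a) / 64    # 64 pixels per 8x8 block
mbimc_px = ops(mbimc_m, mbimc_a) / 256      # 256 pixels per 16x16 block

print(merhav_m, merhav_a, merhav_px)        # 320 1760 42.5
print(mbimc_m, mbimc_a, mbimc_px)           # 880 5248 30.8125
print(f"{1 - mbimc_px / merhav_px:.1%}")    # 27.5%
```

With this weighting MBIMC needs about 27.5% fewer operations per pixel, consistent with the 27% figure quoted.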

Video Resizing: PSNR Difference between the Spatial and MBIMC Techniques
[Figure: PSNR difference plots; videos: Flower, Susi]

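Video Resizing: Integrated Scheme (IMC + Downscaling)
Downsampling filter: if $x_1, x_2, x_3, x_4$ are adjacent 8x8 spatial-domain blocks, the downsampled block $x$ can be computed as

$$x = d \begin{bmatrix} x_1 & x_2 \\ x_3 & x_4 \end{bmatrix} d^T \qquad (B)$$

where $d$ is an 8x16 downsampling filter, $d = 0.5 \times$ (8x16 matrix whose i-th row has ones in columns 2i-1 and 2i and zeros elsewhere), so that each output pixel is the average of a 2x2 neighbourhood.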
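Expression (B) can be illustrated with NumPy. This sketch assumes d is the pairwise-averaging filter (weights 0.5 on samples 2i and 2i+1 of row i); `downsampling_filter` is a hypothetical helper name:

```python
import numpy as np


def downsampling_filter(n_out=8):
    """Assumed 8x16 filter d: row i averages input samples 2i and 2i+1 (weights 0.5)."""
    d = np.zeros((n_out, 2 * n_out))
    for i in range(n_out):
        d[i, 2 * i] = d[i, 2 * i + 1] = 0.5
    return d


d = downsampling_filter()
rng = np.random.default_rng(2)
x16 = rng.integers(0, 256, size=(16, 16)).astype(float)  # [x1 x2; x3 x4]

x = d @ x16 @ d.T                                  # expression (B)
avg = x16.reshape(8, 2, 8, 2).mean(axis=(1, 3))    # plain 2x2 block average
print(np.allclose(x, avg))                         # True
```

Because d is linear, the same filter can be folded into the DCT-domain selector matrices, which is what the integrated scheme exploits.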

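Video Resizing
Using expressions (A) and (B), the 8x8 downscaled DCT block can be written as

$$M' = S\, d\, c_r\, Q^T B_2^T B_1^T P^T D^T \begin{bmatrix} X_1 & X_2 & X_3 \\ X_4 & X_5 & X_6 \\ X_7 & X_8 & X_9 \end{bmatrix} D P B_1 B_2 Q\, c_c\, d^T S^T$$

i.e., the 16x16 motion-compensated macroblock is downsampled by $d$ to 8x8 and then transformed by the 8x8 DCT. Let $J_r = d\, c_r Q^T$ and $K_c = Q c_c\, d^T$, 1 ≤ r ≤ 8, 1 ≤ c ≤ 8. $J_r$ and $K_c$ have similar structures and similar computational complexities, but each has two different structures, depending on whether r (or c) is even or odd.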
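The integrated scheme can be sketched the same way: row selection and downsampling are fused into one 8x24 matrix, and the result is compared with "extract, average, then DCT" in the spatial domain. Assumptions: orthonormal DCT-II, the pairwise-averaging filter d, and dense diag(S, S, S) in place of the factorization, so `J_r` and `K_c` below omit the Q factors used on the slide; all helper names are hypothetical:

```python
import numpy as np


def dct_matrix(n=8):
    """Orthonormal n-point DCT-II matrix S, so that S.T @ S = I."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    s = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    s[0, :] = np.sqrt(1.0 / n)
    return s


def row_selector(r):
    """16x24 selector c_r: r-1 zero columns, a 16x16 identity, then 8-r+1 zero columns."""
    c = np.zeros((16, 24))
    c[:, r - 1:r + 15] = np.eye(16)
    return c


def downsampling_filter():
    """Assumed 8x16 filter d: row i averages samples 2i and 2i+1 (weights 0.5)."""
    d = np.zeros((8, 16))
    for i in range(8):
        d[i, 2 * i] = d[i, 2 * i + 1] = 0.5
    return d


S = dct_matrix(8)
S3 = np.kron(np.eye(3), S)          # diag(S, S, S)
d = downsampling_filter()

rng = np.random.default_rng(3)
m = rng.integers(0, 256, size=(24, 24)).astype(float)   # nine reference blocks
X = S3 @ m @ S3.T                   # their 8x8 DCT blocks, as a 24x24 array
r, c = 6, 6

J_r = d @ row_selector(r)           # 8x24: row selection fused with downsampling
K_c = row_selector(c).T @ d.T       # 24x8: column selection fused with downsampling
M_out = S @ J_r @ S3.T @ X @ S3 @ K_c @ S.T   # 8x8 DCT block of the downscaled MB

expected = S @ (d @ m[r - 1:r + 15, c - 1:c + 15] @ d.T) @ S.T
print(np.allclose(M_out, expected))  # True
```

Fusing d into the selectors is what shrinks the output from four 8x8 DCT blocks to one, which is where the integrated scheme saves its final DCT work.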

Video Resizing: Computation Complexity of the Integrated Scheme (IMC + Downsampling)
Computations per column:
- J1: 10m + 34a; J2: 17m + 44a; J3: 14m + 38a; J4: 16m + 43a; J5: 15m + 41a; J8: 15m + 44a
Let r = c = 6. Total computations:
- Two multiplications of B1/B1^T: 2 x 12 x 24 = 576a
- Two multiplications of B2/B2^T: 2 x 12 x 24 = 576a
- One multiplication by J6: 24 x (17m + 44a) = 408m + 1056a
- One multiplication by K6: 24 x (17m + 44a) = 408m + 1056a
- One 2D DCT operation: 8 x (5m + 29a) = 40m + 232a
Total computations = 856m + 3496a (per 16x16 block)
Operations per pixel = 3.34m + 13.66a
This is a 40% improvement on Neri Merhav's approach.

Video Resizing: PSNR Difference between the Spatial and Integrated Schemes
[Figure: PSNR difference plots; videos: Flower, Mobile]

Video Resizing: Average PSNR (dB) Comparison Chart
(Videos are downscaled from CIF (1.15 Mbps) to QCIF (512 Kbps).)
Schemes compared: Spatial Domain Scheme, Neri Merhav Scheme, MBIMC Scheme, Integrated Scheme (I and P frames)
Susi: 32.32, 32.58, 35.90, 35.85, 33.92
Tennis: 24.04, 23.90, 25.95, 25.53, 23.89
Mobile: 21.03, 22.50, 22.62, 24.30, 22.88
Flower: 21.99, 23.20, 24.07, 25.62, 23.85

Video Resizing Motion Vector Re-estimation

Video Resizing: Algorithms for Motion Vector Re-estimation
- Adaptive Motion Vector Resampling Technique (AMVR)
- Maximum Average Correlation (MAC)
- Median Method
- Non-Linear Motion Vector Resampling Technique (NLMR)
- And many more…

Video Resizing: Comparison of Motion Vector Re-estimation Methods
[Figure: video Coastguard, 300 frames, downscaled from CIF (1.15 Mbps) to QCIF (500 Kbps)]

Video Resizing: Comparison of Motion Vector Re-estimation Methods
[Figure: video Container, 300 frames, downscaled from CIF (1.15 Mbps) to QCIF (500 Kbps)]

Video Resizing: Pure DCT Domain based Proposed System
Input data → buffer → VLC decoder → Q^-1 → DCT blocks → MBIMC scheme → DCT frame → DCT downscaling → (-) → Q → VLC encoder → buffer → output
Motion vectors from the decoder are re-estimated with AMVR, and intra DCT blocks are passed to DCT downscaling. The encoder loop (Q^-1, +, frame memory, DCT based motion compensation) provides the prediction, and MTSS sets the Q step size.

Video Resizing: Computational Complexity of the Proposed System
Conversion of a P frame from CIF to QCIF:
- Inverse quantization: 64m per 8x8 block
- MBIMC: 3.43m, 20.5a per pixel
- DCT downscaling by 2: 1.25m, 1.25a per pixel
- AMVR: 9m, 30a, 1 shift per 16x16 block
- DCT domain MC: 3.43m, 20.5a per pixel
- Quantization: 64m per 8x8 block
With add = 1 op, shift = 1 op and mult. = 3 ops, the total operation count makes the proposed system 16 times faster than the spatial domain method.

Video Resizing: Comparison of Pure DCT and Hybrid Systems
[Figure: average PSNR, hybrid vs. pure DCT]

Video Resizing: Comparison of Pure DCT and Spatial Systems
[Figure: average PSNR, spatial vs. pure DCT]

Video Resizing: Optimization of the Pure DCT based Proposed System (Utilizing the Sparseness of DCT Blocks)
Conversion of a P frame from CIF to QCIF, assuming only 16 non-zero coefficients:
- Inverse quantization: 64m per 8x8 block
- MBIMC: 0.9m, 6.8a per pixel
- DCT downscaling by 2: 1.25m, 1.25a per pixel
- AMVR: 9m, 30a, 1 shift per 16x16 block
- DCT domain MC: 0.9m, 6.8a per pixel
- Quantization: 64m per 8x8 block
With add = 1 op, shift = 1 op and mult. = 3 ops, the total operation count makes the optimized system 36 times faster than the spatial domain method.

Video Resizing: Comparison of Optimized Pure DCT and Hybrid Systems
[Figure: average PSNR, hybrid vs. optimized pure DCT]

Video Resizing: Comparison of Optimized Pure DCT and Spatial Systems
[Figure: average PSNR, spatial vs. optimized pure DCT]

Video Resizing: Average PSNR (dB) Comparison Chart
(Videos are downscaled from CIF (1.15 Mbps) to QCIF (512 Kbps); the DCT domain method assumes 16 non-zero coefficients.)
Methods compared: Spatial Domain Method, Hybrid Domain Method, DCT Domain Method; Y, U and V components for each
Coastguard: 25.17, 32.54, 32.55, 24.26, 32.45, 25.13, 42.22, 43.16
Foreman: 28.61, 32.20, 32.08, 28.29, 32.00, 31.92, 29.42, 40.09, 41.09
Susi: 33.90, 32.94, 32.44, 33.86, 32.93, 32.43, 36.83, 40.92, 40.73
Tennis: 24.98, 32.36, 31.58, 24.97, 31.57, 26.49, 41.60, 41.95

Conclusion
The modified IMC (MBIMC) scheme provides a 27% improvement over the existing IMC technique (Neri Merhav's scheme). The integrated (IMC + downscaling) scheme provides a 40% improvement. Our proposed DCT domain based video downscaling system is 36 times faster than the spatial domain method, and it produces approximately 1.5 dB better output than the hybrid and spatial domain systems.

H.264 Resizing

Relation between Integer DCT and Real DCT

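The 4x4 DCT $Y = A X A^T$, with $a = 1/2$, $b = \sqrt{1/2}\cos(\pi/8)$ and $c = \sqrt{1/2}\cos(3\pi/8)$, can be factorized as

$$Y = \left(C X C^T\right) \otimes E, \qquad C = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & d & -d & -1 \\ 1 & -1 & -1 & 1 \\ d & -1 & 1 & -d \end{bmatrix}, \qquad E = \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}$$

where $\otimes$ denotes element-wise multiplication and $d = c/b \approx 0.414$. To simplify the implementation, d is approximated by 0.5. To ensure that the transform remains orthogonal, b also needs to be modified such that $b = \sqrt{2/5}$.

To avoid fractions, the 2nd and 4th rows of matrix C and the 2nd and 4th columns of matrix $C^T$ are scaled by a factor of 2, and the post-scaling matrix E is scaled down to compensate:

$$Y = \left(C_f X C_f^T\right) \otimes E_f, \qquad C_f = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}, \qquad E_f = \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}$$

This transform is an approximation to the 4x4 DCT but is not identical to it. The forward and inverse transforms are also not the same:

$$X = C_i^T \left(Y \otimes E_i\right) C_i, \qquad C_i = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1/2 & -1/2 & -1 \\ 1 & -1 & -1 & 1 \\ 1/2 & -1 & 1 & -1/2 \end{bmatrix}$$

where $E_i$ has the same entries as E (with the modified b).

The forward and inverse transforms are orthogonal: $T^{-1}(T(X)) = X$. $E_f$ and $E_i$ are scaling matrices that can be incorporated into the quantizer. Hence the real-valued forward transform is obtained by applying the integer forward transform to the input and then scaling by $E_f$, and the real-valued inverse transform by scaling the input with $E_i$ and then applying the integer inverse transform.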
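These relationships can be checked numerically. A sketch assuming the matrices of the common textbook presentation of the H.264 4x4 core transform (d = 1/2, a = 1/2, b = √(2/5)); the helper names are hypothetical, and floating point stands in for the integer arithmetic of a real codec, where the scaling is folded into quantization:

```python
import numpy as np

a = 0.5
b = np.sqrt(2.0 / 5.0)      # modified b that keeps the approximation orthogonal

# Integer forward core transform (2nd and 4th rows scaled by 2, d = 1/2).
Cf = np.array([[1, 1, 1, 1],
               [2, 1, -1, -2],
               [1, -1, -1, 1],
               [1, -2, 2, -1]], dtype=float)
# Inverse core transform (uses +-1/2 instead of the factor 2).
Ci = np.array([[1, 1, 1, 1],
               [1, 0.5, -0.5, -1],
               [1, -1, -1, 1],
               [0.5, -1, 1, -0.5]])

Ef = np.outer([a, b / 2, a, b / 2], [a, b / 2, a, b / 2])   # forward post-scaling
Ei = np.outer([a, b, a, b], [a, b, a, b])                   # inverse pre-scaling


def forward(X):
    """Integer transform followed by element-wise scaling with Ef."""
    return (Cf @ X @ Cf.T) * Ef


def inverse(Y):
    """Element-wise scaling with Ei followed by the inverse integer transform."""
    return Ci.T @ (Y * Ei) @ Ci


def dct_matrix(n=4):
    """Orthonormal n-point DCT-II matrix, for comparison."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    s = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    s[0, :] = np.sqrt(1.0 / n)
    return s


X = np.arange(16, dtype=float).reshape(4, 4)
A = np.diag([a, b / 2, a, b / 2]) @ Cf      # effective scaled forward matrix
S4 = dct_matrix(4)
print(np.allclose(inverse(forward(X)), X))           # True: T^-1(T(X)) = X
print(np.allclose(A @ A.T, np.eye(4)))               # True: orthonormal
print(np.allclose(forward(X), S4 @ X @ S4.T))        # False: only an approximation
```

The largest entry-wise gap between the scaled integer matrix `A` and the true 4-point DCT matrix is below 0.05, which is why the pair is usable as a DCT substitute without being identical to it.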

Conversion of an H.264 P Frame to an I Frame
A macroblock can be partitioned into any of seven types: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4. For each macroblock partition type there may be 10 prediction types:
- Full-pel prediction
- Horizontal only: half-pel or quarter-pel
- Vertical only: half-pel or quarter-pel
- Horizontal and then vertical: half-pel or quarter-pel
- Vertical and then horizontal: half-pel or quarter-pel
- Diagonal prediction: half-pel or quarter-pel

What is Transcoding?
Transcoding: a process in which a coded bit stream is converted into another one with a different bit rate or a different format.
Pre-encoded bit stream → Transcoder → bit stream with a different bit rate or a different format

Pixel Domain Transcoder (PDT)
H.264 video file → H.264 decoder → yuv frames → MPEG-2 encoder → MPEG-2 video file

MPEG-2 encoder vs. PDT
[Figure: frame from the Pixel Domain Transcoder vs. frame from the MPEG-2 encoder]

MPEG-2 encoder vs. PDT (contd.)
[Figure: MPEG-2 encoder vs. Pixel Domain Transcoder]

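DCT Domain Transcoder
VLD → IQ → (+) → Q2 → VLC
Feedback loop: IQ of the requantized coefficients, subtraction, MEMORY and MC-DCT (motion compensation in the DCT domain), driven by the incoming motion vectors and block types.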

Enhancement of PDT
VLD → IQ → IDCT → (+) → DCT → Q2 → VLC, with MEMORY and MC in the reconstruction loop, reusing the incoming motion vectors and block types.

Adaptive Motion Vector Re-estimation (AMVR)
- Weighted average approach
- Align to the best prediction error vector. Criterion: the object boundary blocks have lower prediction error than the background blocks.
- Align to the worst prediction error vector. Criterion: the object boundary blocks have higher prediction error than the background blocks.

AMVR (contd.)
MV_i is the motion vector of block i in the H.264 stream, and A_i denotes the activity measurement of that block.
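The weighting formula itself is not preserved in this transcript. A commonly cited form of AMVR is the activity-weighted average MV = Σ A_i·MV_i / Σ A_i, with an extra factor of 1/2 when the frame is also downscaled by two. A plain-Python sketch under that assumption; `amvr` and its fallback to an unweighted mean when all activities are zero are hypothetical choices:

```python
def amvr(mvs, activities, scale=1.0):
    """Activity-weighted average of the incoming motion vectors (assumed form).

    mvs:        list of (dx, dy) vectors MV_i
    activities: list of activity measurements A_i (same length)
    scale:      e.g. 0.5 when the frame is also downscaled by two
    """
    total = sum(activities)
    if total == 0:
        # No activity information: fall back to an unweighted average (assumption).
        activities = [1] * len(mvs)
        total = len(mvs)
    mx = sum(a * v[0] for v, a in zip(mvs, activities)) / total
    my = sum(a * v[1] for v, a in zip(mvs, activities)) / total
    return (scale * mx, scale * my)


print(amvr([(4, 2), (6, 2), (4, 0), (2, 4)], [3, 1, 0, 0], scale=0.5))  # (2.25, 1.0)
```

Blocks with higher activity pull the result toward their own motion vector; in the example, the two zero-activity blocks have no influence at all.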

Median Method
The median method extracts the motion vector situated in the middle of the remaining motion vectors.
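One common way to make "in the middle of the remaining motion vectors" precise is the vector median: the candidate whose summed distance to all other candidates is smallest. A sketch under that assumption; Euclidean distance is an assumption (some formulations use the L1 norm), and `median_mv` is a hypothetical name:

```python
import math


def median_mv(mvs):
    """Vector median: the candidate with the smallest summed distance to all others."""
    def spread(v):
        return sum(math.hypot(v[0] - u[0], v[1] - u[1]) for u in mvs)
    return min(mvs, key=spread)


print(median_mv([(1, 1), (2, 1), (2, 2), (10, -8)]))  # (2, 1)
```

Unlike an average, the result is always one of the existing motion vectors, and a single outlier such as (10, -8) cannot pull it away from the cluster.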

Non-Linear Motion Vector Re-estimation (NLMR)
The motion vector at minimum distance from the optimal one is taken as the best-matching motion vector. Four parameters are defined for each block:
- A: activity measurement
- C: cluster of the motion vector
- Q: quantization step size
- M: magnitude of the motion vector

NLMR (contd.)
For each referenced block i, a likelihood score L_i is defined: the likelihood that the block's motion vector matches the optimal one. L_i is incremented whenever A_i, C_i or Q_i is the highest, or M_i is the lowest, among the 16 blocks. The motion vector with the highest L is the best-matching motion vector.
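The scoring rule above can be sketched directly. The per-block parameters A, C, Q and M are assumed to be precomputed by the caller (how the cluster measure C is obtained is not specified here), ties go to the first block, and `nlmr` is a hypothetical name; the example uses 4 blocks instead of 16 for brevity:

```python
def nlmr(mvs, A, C, Q, M):
    """Return the motion vector whose block collects the highest likelihood score L_i.

    A, C, Q, M: per-block activity, cluster, quantizer step size and MV magnitude.
    Ties are broken in favour of the first block (an assumption).
    """
    n = len(mvs)
    L = [0] * n
    for values in (A, C, Q):             # highest A_i, C_i and Q_i each earn a point
        L[max(range(n), key=lambda i: values[i])] += 1
    L[min(range(n), key=lambda i: M[i])] += 1   # lowest M_i earns a point
    return mvs[max(range(n), key=lambda i: L[i])]


mvs = [(0, 0), (1, 2), (3, 1), (2, 2)]
print(nlmr(mvs, A=[5, 9, 1, 2], C=[1, 3, 2, 0], Q=[4, 4, 8, 2], M=[2.0, 2.2, 3.2, 2.8]))  # (1, 2)
```

In the example, block 1 wins both the activity and the cluster criteria (L = 2), so its vector is selected even though block 0 has the smallest magnitude.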

Computations Required
Method | Additions and subtractions | Multiplications and divisions | Shifts and comparisons | Total | Saving per frame
Full Search | 9669s + 2733a | 22m | 262s + 836c | 23622 | 0%
AMVR | 50a | 34m + 2d | 1c | 87 | 99.6% of ME time
Median | 242a | 2d | 16c | 260 | 98.89% of ME time
NLMR | 512s + 22a | 24m + 2d | 352c | 912 | 96.13% of ME time
