Published by Jake Whitehead. Modified over 3 years ago.

1
Henrique S. Malvar, Fellow, IEEE, Antti Hallapuro, Marta Karczewicz, and Louis Kerofsky, Member, IEEE

2
Brief recall of the H.264 encode and decode structure
Transform in H.264
  DCT and integer transform
  Low-complexity integer transform (proposed by the authors)
Quantization in H.264

3
Three parts: prediction, transform, quantization.

Input block -> Prediction -> Transform -> Quantization -> Entropy coding -> Transmit

Prediction: generate a block prediction by motion estimation.
Transform: convert the difference between the prediction and the true value into coefficients with an integer transform.
Quantization: quantize the coefficients.

4
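DCT (Discrete Cosine Transform): commonly used in block transform coding of images and video, e.g. JPEG and MPEG. It converts an image block from the spatial domain to the frequency domain. Definition for an 8x8 block (standard form, with X the input samples and Y the coefficients):

$$Y(u,v) = \frac{1}{4}\,C(u)\,C(v) \sum_{i=0}^{7} \sum_{j=0}^{7} X(i,j)\, \cos\frac{(2i+1)u\pi}{16}\, \cos\frac{(2j+1)v\pi}{16}, \qquad C(0) = \frac{1}{\sqrt{2}},\quad C(k) = 1 \text{ for } k > 0$$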
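As a small illustration of the spatial-to-frequency conversion, the following sketch evaluates the standard orthonormal 8x8 DCT directly from its textbook definition. The function names (`c`, `dct2_8x8`) and the flat test block are illustrative choices, not something taken from the slides.

```python
import math

N = 8  # block size


def c(k):
    """Normalization factor of the orthonormal DCT-II."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def dct2_8x8(block):
    """Direct (slow) evaluation of the 2-D DCT of an 8x8 block."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            acc = 0.0
            for i in range(N):
                for j in range(N):
                    acc += (block[i][j]
                            * math.cos((2 * i + 1) * u * math.pi / 16)
                            * math.cos((2 * j + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * acc
    return out


# A flat block has no spatial variation, so all of its energy
# should land in the DC coefficient Y(0,0).
flat = [[10] * N for _ in range(N)]
coeffs = dct2_8x8(flat)
```

For the flat block, only `coeffs[0][0]` is non-zero (up to floating-point error), which is the sense in which the DCT compacts smooth image content into a few low-frequency coefficients.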

5
H.264 adopts a 4x4 block transform. Problem: the DCT coefficients are irrational numbers. On a digital computer they can only be approximated, so applying an inverse transform after the forward transform may not return exactly the original input.
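The mismatch can be made visible with a deliberately coarse approximation. The sketch below builds the 1-D 4-point DCT matrix, optionally rounds its entries to a fixed number of fractional bits, and runs a forward transform followed by an inverse transform. The helper names and the choice of 2 fractional bits are assumptions made only to exaggerate the effect; real implementations use far more precision, but the principle is the same.

```python
import math


def dct4_matrix():
    """Orthonormal 4-point DCT matrix; b and c are irrational."""
    a = 0.5
    b = math.sqrt(0.5) * math.cos(math.pi / 8)
    c = math.sqrt(0.5) * math.cos(3 * math.pi / 8)
    return [[a, a, a, a],
            [b, c, -c, -b],
            [a, -a, -a, a],
            [c, -b, b, -c]]


def round_entries(m, frac_bits):
    """Round every entry to a multiple of 2**-frac_bits (fixed point)."""
    scale = 1 << frac_bits
    return [[round(v * scale) / scale for v in row] for row in m]


def roundtrip(x, frac_bits=None):
    """Forward then inverse (transpose) transform, rounded to integers."""
    h = dct4_matrix()
    if frac_bits is not None:
        h = round_entries(h, frac_bits)
    y = [sum(h[u][i] * x[i] for i in range(4)) for u in range(4)]
    xr = [sum(h[u][i] * y[u] for u in range(4)) for i in range(4)]
    return [round(v) for v in xr]


x = [100, 0, 0, 0]
exact = roundtrip(x)                 # double precision: input recovered
coarse = roundtrip(x, frac_bits=2)   # coarse coefficients: input not recovered
```

With full double precision the input comes back after rounding, but once the coefficients are approximated the transpose is no longer an exact inverse, which is the motivation for a transform defined directly in integers.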

6
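Solution: integer transform, an integer approximation of the DCT with the matrix

$$H = \begin{bmatrix} a & a & a & a \\ b & c & -c & -b \\ a & -a & -a & a \\ c & -b & b & -c \end{bmatrix}$$

Original H.264 design: {a=13, b=17, c=7}. Problem: increase of dynamic range. The largest sum of absolute values in a row is 13 x 4 = 52, and the transform is applied to both rows and columns. So if max(X(i,j)) = A, then max(Y(u,v)) = A x (13 x 4)^2 = 2704 x A. Since log2(2704) = 11.4, storing Y(u,v) needs 12 more bits than X(i,j).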

7
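Choose {a=1, b=2, c=1}, which gives

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$

1. The rows are orthogonal to each other.
2. The dynamic range gain is log2(6^2) = 5.17 bits.

Although the rows have different norms, this is easily compensated in the quantization stage. There is no noticeable performance penalty, while both the dynamic range gain and the complexity are reduced.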
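The two properties, and the comparison with the original design, can be checked numerically. This is a minimal sketch: the matrix layout for the original design (13, 17, 7 placed in the same pattern as 1, 2, 1) is assumed from the usual form of the 4-point integer DCT approximation, and the function names are illustrative.

```python
import math

# Original design (assumed layout) and the low-complexity matrix.
H_ORIG = [[13, 13, 13, 13],
          [17, 7, -7, -17],
          [13, -13, -13, 13],
          [7, -17, 17, -7]]
H_NEW = [[1, 1, 1, 1],
         [2, 1, -1, -2],
         [1, -1, -1, 1],
         [1, -2, 2, -1]]


def rows_orthogonal(h):
    """True if every pair of distinct rows has a zero dot product."""
    n = len(h)
    return all(sum(p * q for p, q in zip(h[r], h[s])) == 0
               for r in range(n) for s in range(r + 1, n))


def row_norms_squared(h):
    """Squared Euclidean norm of each row."""
    return [sum(v * v for v in row) for row in h]


def dynamic_range_gain_bits(h):
    """log2 of the worst-case 2-D gain: (max row absolute sum) squared."""
    worst = max(sum(abs(v) for v in row) for row in h)
    return math.log2(worst * worst)


gain_orig = dynamic_range_gain_bits(H_ORIG)  # about 11.4 bits
gain_new = dynamic_range_gain_bits(H_NEW)    # about 5.17 bits
```

The row norms of `H_NEW` come out unequal (4 and 10 when squared), which is the difference that the slides say is absorbed into quantization.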

8
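Inverse transform: we could simply use the transpose of H. However, to minimize the dynamic range gain, the rows of H that contain the element 2 are scaled by ½ before transposing, so the rows become

$$\tilde{H} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & \tfrac{1}{2} & -\tfrac{1}{2} & -1 \\ 1 & -1 & -1 & 1 \\ \tfrac{1}{2} & -1 & 1 & -\tfrac{1}{2} \end{bmatrix}$$

and the inverse transform uses the transpose of this matrix. Dynamic range gain = log2(4^2) = 4 bits. The factor ½ can be realized by a 1-bit right shift, so no multiplication is needed.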
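To see that the scaled transpose still inverts the forward transform once the row norms are compensated, the following sketch works in exact rational arithmetic. The per-coefficient factors in `SCALE` (1/4 and 1/5) are derived here from the row norms of H and stand in for the compensation that the slides fold into quantization; they and the function names are assumptions of this example.

```python
from fractions import Fraction as F

H = [[1, 1, 1, 1],
     [2, 1, -1, -2],
     [1, -1, -1, 1],
     [1, -2, 2, -1]]

half = F(1, 2)
# Transpose of H with the rows containing 2 scaled by 1/2.
H_INV = [[1, 1, 1, half],
         [1, half, -1, -1],
         [1, -half, -1, 1],
         [1, -1, 1, -half]]

# Norm compensation per coefficient (assumed to live in quantization).
SCALE = [F(1, 4), F(1, 5), F(1, 4), F(1, 5)]


def forward(x):
    return [sum(H[u][i] * x[i] for i in range(4)) for u in range(4)]


def inverse(y):
    return [sum(H_INV[i][u] * y[u] for u in range(4)) for i in range(4)]


def roundtrip(x):
    y = [s * v for s, v in zip(SCALE, forward(x))]
    return inverse(y)


# Worst-case gain of the inverse: largest absolute sum of a basis function,
# i.e. of a column of H_INV.
max_basis_sum = max(sum(abs(H_INV[i][u]) for i in range(4)) for u in range(4))
```

`roundtrip` returns the input exactly for any integer vector, and `max_basis_sum` is 4, matching the 4-bit figure (log2 of 4 squared) quoted for the 2-D inverse.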

9
[Figure: forward transform and inverse transform]
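A hedged sketch of how both 1-D transforms can be computed with only additions, subtractions and 1-bit shifts is given below. The grouping of operations is one common arrangement and is not claimed to match the figure on the slide; the function names are illustrative. A 2-D transform applies the 1-D routine to every row and then to every column.

```python
H = [[1, 1, 1, 1],
     [2, 1, -1, -2],
     [1, -1, -1, 1],
     [1, -2, 2, -1]]


def forward_1d(x):
    """y = H x using adds and left shifts only."""
    s0, s1 = x[0] + x[3], x[1] + x[2]
    d0, d1 = x[0] - x[3], x[1] - x[2]
    return [s0 + s1,
            (d0 << 1) + d1,
            s0 - s1,
            d0 - (d1 << 1)]


def inverse_1d(y):
    """Scaled-transpose inverse; the factor 1/2 is a 1-bit right shift."""
    e0, e1 = y[0] + y[2], y[0] - y[2]
    e2 = (y[1] >> 1) - y[3]
    e3 = y[1] + (y[3] >> 1)
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]


def matmul_vec(m, x):
    """Reference matrix-vector product for checking the fast version."""
    return [sum(m[r][k] * x[k] for k in range(4)) for r in range(4)]


def forward_2d(block):
    """Apply the 1-D transform to rows, then to columns."""
    rows = [forward_1d(r) for r in block]
    cols = [forward_1d([rows[i][j] for i in range(4)]) for j in range(4)]
    return [[cols[j][i] for j in range(4)] for i in range(4)]
```

For example, `forward_1d([1, 2, 3, 4])` agrees with the plain matrix product `matmul_vec(H, [1, 2, 3, 4])`. Note that `>>` on an odd value discards a half, so the inverse is exact only up to that rounding.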

10
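Quantization is the step that introduces signal loss in exchange for better compression. Encoder quantization is given by

$$X_q(i,j) = \operatorname{sign}\{X(i,j)\} \left\lfloor \frac{|X(i,j)| + f(Q)}{Q} \right\rfloor$$

where Q is the quantization step size and f(Q) controls the quantization width near the origin. The decoder produces reverse quantization by

$$X_r(i,j) = Q \cdot X_q(i,j)$$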
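The role of the offset near the origin is easiest to see with numbers. The sketch below assumes the usual dead-zone scalar quantizer, where the magnitude plus an offset `f` is divided by the step `q` with integer division and the sign is restored afterwards; the function names are illustrative.

```python
def quantize(x, q, f):
    """Dead-zone scalar quantizer: sign(x) * floor((|x| + f) / q)."""
    sign = -1 if x < 0 else 1
    return sign * ((abs(x) + f) // q)


def dequantize(level, q):
    """Reverse quantization: scale the level back by the step size."""
    return q * level


# With q = 10, an offset of 5 rounds to nearest; a smaller offset
# widens the zone around zero that maps to level 0.
level = quantize(37, 10, 5)
recon = dequantize(level, 10)
```

With `q = 10`, the input 6 maps to level 1 when `f = 5` but to level 0 when `f = 3`, so a smaller offset sends more small coefficients to zero.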

11
Complexity must be as low as possible. Because H.264 uses predictive coding, any error in reconstruction propagates into every later prediction and tends to drift across the whole sequence. 32-bit operations have high memory requirements, so the arithmetic should stay as close to 16-bit as possible. The design should put no undue stress on the hardware while keeping the prediction free of drift.

12
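The disadvantage of the quantization equation is that it requires division by an integer. In H.264 the quantization instead takes the form

$$X_q(i,j) = \operatorname{sign}\{X(i,j)\} \left[\, |X(i,j)|\, A(Q) + f \cdot 2^{L} \,\right] \gg L$$

The inverse quantization is given by

$$X_r(i,j) = X_q(i,j)\, B(Q)$$

The values A(Q) and B(Q) are obtained from the quantization tables.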
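The idea of trading the division for a table lookup, a multiplication and a shift can be sketched as follows. Here the multiplier is simply the rounded value of 2^L / q for a hypothetical step size `q`; in the codec the tabulated A(Q) also absorbs the transform norms, so this is an illustration of the mechanism only. The rounding offset is fixed at one half, and all names are assumptions of this example.

```python
L = 20  # shift amount; 20 matches the value quoted for the original design


def make_multiplier(q, shift=L):
    """Integer multiplier that approximates division by q after a shift."""
    return round((1 << shift) / q)


def quantize_by_division(x, q):
    """Reference quantizer: integer division with round-to-nearest offset."""
    sign = -1 if x < 0 else 1
    return sign * ((abs(x) + q // 2) // q)


def quantize_by_shift(x, a, shift=L):
    """Same quantizer using multiply, add and right shift only."""
    sign = -1 if x < 0 else 1
    return sign * ((abs(x) * a + (1 << (shift - 1))) >> shift)


a10 = make_multiplier(10)
```

For `q = 10`, the two quantizers agree for every input in a small range around zero; the approximation error of the multiplier only matters for much larger magnitudes.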

13
In the previous equations, Q varies from 0 to Q_max, where 0 is the finest and Q_max the coarsest quantization. Care is needed when shifting right: an arithmetic right shift of a negative value rounds toward negative infinity, not toward 0, so it does not behave like truncating division. In the original H.264 design, L = N = 20, where L and N are the shift amounts used at the encoder and the decoder.
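The rounding behaviour of right shifts on negative values is easy to demonstrate. Python's `>>` on integers is an arithmetic shift, so it shows the same floor-toward-negative-infinity behaviour; the helper `shift_magnitude` is an illustrative name for the workaround of shifting the magnitude and restoring the sign.

```python
def shift_signed(x, n):
    """Arithmetic right shift: rounds toward negative infinity."""
    return x >> n


def shift_magnitude(x, n):
    """Shift the magnitude, then restore the sign: rounds toward zero."""
    sign = -1 if x < 0 else 1
    return sign * (abs(x) >> n)


pairs = [(x, shift_signed(x, 1), shift_magnitude(x, 1)) for x in (-7, -1, 7)]
```

For -7 the plain shift gives -4 while the magnitude-based version gives -3; for positive inputs the two agree.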

14
The values A(Q) and B(Q) must satisfy

$$A(Q)\, B(Q)\, G^{2} = 2^{L+N}$$

where G is the squared norm of the rows of H. The values of L and N are chosen as a compromise: larger values reduce the approximation error in the above equation, while smaller values reduce the dynamic range.
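The compromise between approximation error and dynamic range can be illustrated numerically. The sketch assumes the constraint has the form A·B·G² = 2^(L+N) with G = 676 (the squared row norm 4 x 13² of the original matrix). The value 3881 for B is purely illustrative and is not claimed to be an entry of any real table; the function names are likewise hypothetical.

```python
G = 676  # squared row norm of the original {13, 17, 7} matrix


def matching_a(b, g, total_shift):
    """Integer A that best satisfies A * b * g**2 = 2**total_shift."""
    return round((1 << total_shift) / (b * g * g))


def relative_error(a, b, g, total_shift):
    """How far a * b * g**2 is from the power-of-two target."""
    target = 1 << total_shift
    return abs(a * b * g * g - target) / target


B_EXAMPLE = 3881  # illustrative value only

a_40 = matching_a(B_EXAMPLE, G, 40)          # L + N = 40
err_40 = relative_error(a_40, B_EXAMPLE, G, 40)

a_30 = matching_a(B_EXAMPLE, G, 30)          # L + N = 30
err_30 = relative_error(a_30, B_EXAMPLE, G, 30)
```

With 40 total shift bits the integer A lands very close to the ideal value, whereas with 30 bits A collapses to a tiny integer and the constraint is badly violated. The price of the larger shift is wider intermediate products.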

15
The complexity of the quantization formulae is reduced considerably by restricting them to 16-bit arithmetic. This reduction must not come at the cost of PSNR. It is achieved by reducing the values of B(Q), L and N. B(Q) doubles for every increase of 6 in Q, which gives an approximately linear relationship between PSNR and Q and makes the quantization and reconstruction tables easier to design.

16
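H.264 therefore uses the modified quantization and reconstruction formulae

$$X_q(i,j) = \operatorname{sign}\{X(i,j)\} \left[\, |X(i,j)|\, A(Q_M, i, j) + f \cdot 2^{15+Q_E} \,\right] \gg (15 + Q_E)$$

$$X_r(i,j) = \left[\, X_q(i,j)\, B(Q_M, i, j) \,\right] \ll Q_E$$

where

$$Q_M = Q \bmod 6, \qquad Q_E = \lfloor Q / 6 \rfloor$$

The mod operator makes the quantization factor periodic, so a large range of quantization parameters can be defined without increasing memory requirements.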
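A minimal sketch of the periodic scheme follows, for a single coefficient position. The six-entry tables are the values commonly published for the (0,0) position of the H.264 4x4 transform and should be checked against the standard before being relied on; the offset f = 1/3 and the function names are assumptions of this example.

```python
# Commonly published values for coefficient position (0,0); verify
# against the standard. Only six entries are stored per position class.
A_TABLE = [13107, 11916, 10082, 9362, 8192, 7282]
B_TABLE = [10, 11, 13, 14, 16, 18]


def quantize(x, q):
    """Quantize with period-6 tables: Q_M selects the entry, Q_E the shift."""
    q_m, q_e = q % 6, q // 6
    shift = 15 + q_e
    f = (1 << shift) // 3  # rounding offset of 1/3 (assumed)
    sign = -1 if x < 0 else 1
    return sign * ((abs(x) * A_TABLE[q_m] + f) >> shift)


def dequantize(level, q):
    """Reconstruct: multiply by the table entry, then shift left by Q_E."""
    q_m, q_e = q % 6, q // 6
    return (level * B_TABLE[q_m]) << q_e


levels = [quantize(1000, q) for q in (0, 6, 12)]
recons = [dequantize(lv, q) for lv, q in zip(levels, (0, 6, 12))]
```

Each increase of 6 in Q halves the quantized level for the same input, while the same six table entries are reused. The reconstructed value is on a larger scale than the input because the remaining normalization is left to the final shift after the inverse transform.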

17
The matrices on this slide (not reproduced in this transcript) give the values of A(Q) and B(Q), chosen to maximise dynamic range while ensuring that all results fit within 16 bits.
