Digital Video: efficient use of spectrum; less sensitive to noise and distortion; integration of digital services; data encryption.
Raw image (Figure 2a): 576 lines/frame, 720 pixels/line (5.5 MHz), 50 fields/second, 8 bits per pixel for each of R, G and B. R: 83 Mbit/s, G: 83 Mbit/s, B: 83 Mbit/s. Total: 576 x 720 x 25 x 3 x 8 = 249 Mbit/s.
Spectrum (frequency f in MHz): VSB luminance, chroma at 4.43 MHz, sound at 6 MHz. Chrominance can be represented with a considerably narrower bandwidth (resolution) than luminance.
PAL system, luminance Y (Figure 2b): 576 lines/frame, 720 pixels/line (5.5 MHz), 50 fields/second, 8 bits per pixel. Y: 576 x 720 x 25 x 8 = 83 Mbit/s.
PAL system, chrominance U (Figure 2c): 288 lines/frame, 360 pixels/line (2.75 MHz), 50 fields/second, 8 bits per pixel. U: 288 x 360 x 25 x 8 = 21 Mbit/s.
PAL system, chrominance V (Figure 2d): 288 lines/frame, 360 pixels/line (2.75 MHz), 50 fields/second, 8 bits per pixel. V: 288 x 360 x 25 x 8 = 21 Mbit/s.
Total: Y + U + V = 83 + 21 + 21 = 125 Mbit/s.
Actual size: 249 Mbit/s. With U and V downsampled: 125 Mbit/s. Medium quality: 1.2 Mbit/s. Superior quality: 6 Mbit/s. Result: compression is necessary.
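The bit-rate figures above can be checked with a few lines of arithmetic. This is only a sketch of the slide calculation; the function and variable names are illustrative, and 25 frames/second is taken from the slide totals (50 interlaced fields/second).

```python
# Bit-rate arithmetic for the figures quoted on the slides.
LINES, PIXELS, FRAMES_PER_SEC, BITS = 576, 720, 25, 8

def mbit_per_sec(lines, pixels, frames=FRAMES_PER_SEC, bits=BITS):
    """Raw bit rate of one picture component in Mbit/s (10**6 bits)."""
    return lines * pixels * frames * bits / 1e6

y = mbit_per_sec(LINES, PIXELS)            # full-resolution luminance
u = mbit_per_sec(LINES // 2, PIXELS // 2)  # chrominance at half resolution
v = u
rgb_total = 3 * y                          # R, G and B at full resolution
yuv_total = y + u + v

print(f"Y={y:.1f}  U={u:.1f}  V={v:.1f}  RGB={rgb_total:.1f}  YUV={yuv_total:.1f}")
```

The exact Y + U + V sum is about 124.4 Mbit/s; the 125 Mbit/s on the slide comes from adding the rounded figures 83 + 21 + 21.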
Redundancy in image contents: adjacent pixels are similar; intensity variations can be predicted; sequential frames are similar. Lossy compression: removal of redundant information, resulting in distortion to which human perception is insensitive.
Figure 3a Lenna Pixels within this region have similar but not totally identical intensity.
Figure 3b: intensity versus position
Autocorrelation function Figure 4
Interpolation
1. Pixel intensities usually vary smoothly except at edge (dominant/salient) points.
2. Record pixels at dominant points only.
3. Reconstruct the pixels between dominant points by interpolation.
4. A straightforward method: join the dominant points with straight lines.
5. High compression ratio for smoothly varying intensity profiles.
6. Difficulty: how to identify the dominant points?
Figure 5a (intensity versus position): transmit only selected pixels and predict the rest.
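The interpolation idea can be sketched as follows. The intensity profile and the choice of dominant points are hypothetical and picked by hand; the slides leave open how dominant points are identified.

```python
import numpy as np

# Hypothetical intensity profile along one scan line: two smooth ramps and an edge.
profile = np.array([10, 12, 14, 16, 18, 60, 58, 56, 54, 52], dtype=float)
positions = np.arange(len(profile))

# Assumed dominant points, chosen by hand (the two ends and both sides of the edge).
dominant = np.array([0, 4, 5, 9])

# Reconstruct every pixel by joining the dominant points with straight lines.
reconstructed = np.interp(positions, dominant, profile[dominant])
max_error = np.max(np.abs(reconstructed - profile))

print("samples kept:", len(dominant), "of", len(profile))
print("maximum reconstruction error:", max_error)
```

Because this profile is piecewise linear, four recorded samples are enough to rebuild all ten; a real profile would leave a residual error between dominant points.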
Predictive coding: prediction of the current sample based on previous ones. Block diagram signals: input signal, predicted signal (from the predictor P), error signal, quantized error signal (from the quantizer Q), reconstructed signal. Quantizer: representation of a continuous dynamic range with a finite number of discrete levels (discussed later). Error = quantization error.
Function of predictive coding: data compression. In the diagram, an 8-bit input gives a prediction error that is coded with 3 bits. The better the predictor, the higher the compression ratio.
A simple example: a 6-bit signal coded with a 2-bit quantizer (quantizer table columns: Level, x, y).
Predictive decoder: the inverse quantizer (Q^-1) and the predictor (P) produce the reconstructed signal. Reconstruction error = quantization error. Option: the quantized levels are transmitted instead of the actual errors.
Predictive decoder, example: 6-bit signal, 2-bit quantizer (table columns: Level, x, y) and inverse quantizer Q^-1. Error = quantization error.
Predictive decoder with a 1-bit quantizer: 6-bit signal; level 0 for a positive error (output +S), level 1 for a negative error (output -S); S = fixed step size.
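A minimal sketch of the 1-bit case is given below. It assumes the predictor is simply the previous reconstructed sample and that a zero error is treated as positive; both choices, and the function names, are illustrative rather than taken from the slides.

```python
def delta_encode(samples, step, start=0):
    """1-bit predictive coder: predictor = previous reconstructed sample."""
    bits, recon = [], []
    prev = start
    for x in samples:
        error = x - prev                 # prediction error
        bit = 0 if error >= 0 else 1     # level 0 -> +S, level 1 -> -S
        prev = prev + step if bit == 0 else prev - step
        bits.append(bit)
        recon.append(prev)               # what the decoder will also obtain
    return bits, recon

def delta_decode(bits, step, start=0):
    """Rebuild the signal from the transmitted levels."""
    recon, prev = [], start
    for bit in bits:
        prev = prev + step if bit == 0 else prev - step
        recon.append(prev)
    return recon

signal = [2, 4, 6, 6, 4]
bits, encoder_recon = delta_encode(signal, step=2)
decoder_recon = delta_decode(bits, step=2)
print(bits, decoder_recon)
```

The encoder predicts from its own reconstructed samples, so the decoder, which only sees the levels, tracks exactly the same values.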
Linear prediction: the current sample is predicted as a linear combination of previously reconstructed samples. The optimal predictor is designed by minimizing the mean square prediction error.
Intensity position Figure 5b Y A
e.g. Asin( n/T)+Y Intensity position n Figure 5d A Y
Major Steps
1. Select a basis: a set of fixed functions {f_0(n), f_1(n), f_2(n), f_3(n), ..., f_N(n)}.
2. Assume that all types of signals can be approximated by a linear combination of these functions, i.e. A(n) = a_0 f_0(n) + a_1 f_1(n) + a_2 f_2(n) + ... + a_N f_N(n).
3. Calculate the coefficients a_0, a_1, ..., a_N.
4. Represent the input signal with the coefficients instead of the actual data.
5. Compression: use fewer coefficients, e.g. a_0, a_1, ..., a_K (K < N).
6. For example: the set of sine and cosine waves.
Sinusoidal Waves
1. Adopt the sine and cosine waves as a basis.
2. Calculate the Fourier coefficients (note: a sequence of N points gives N complex coefficients).
3. Encoding (compression): represent the signal with the first K coefficients, where K < N.
4. Decoding (decompression): reconstruct the signal from the K coefficients with the inverse Fourier transform.
5. Other transforms (e.g. the Walsh transform) can be adopted.
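The encode/decode steps with a sinusoidal basis can be sketched with NumPy's FFT routines. The test signal is hypothetical, and the one-sided transform `rfft` is used here as a convenience for real signals, which is an implementation choice rather than something stated on the slides.

```python
import numpy as np

def fourier_compress(signal, k):
    """Keep only the K lowest-frequency Fourier coefficients of a real signal."""
    coeffs = np.fft.rfft(signal)          # one-sided spectrum of a real signal
    return coeffs[:k]

def fourier_decompress(kept, n):
    """Zero-fill the discarded coefficients and apply the inverse transform."""
    full = np.zeros(n // 2 + 1, dtype=complex)
    full[:len(kept)] = kept
    return np.fft.irfft(full, n)

n = 64
t = np.arange(n)
# Hypothetical smooth signal: an offset plus two low-frequency sinusoids.
signal = 100 + 20 * np.sin(2 * np.pi * t / n) + 5 * np.cos(2 * np.pi * 3 * t / n)

approx = fourier_decompress(fourier_compress(signal, 4), n)
print("max error with 4 coefficients:", np.max(np.abs(approx - signal)))
```

This signal only contains frequencies 0, 1 and 3, so four coefficients reproduce it almost exactly; signals with sharp edges need many more coefficients for the same fidelity.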
Set of basis functions
denotes Dot Product between A and B Transform from the “s” domain to the “S” domain
Forward transform: X(k) is formed from the samples x(0), x(1), x(2), ..., x(N-2), x(N-1) weighted by W(0,k), W(1,k), W(2,k), ..., W(N-2,k), W(N-1,k).
A. Orthogonal Property Delta function B. Orthonormal Property
s denotes Dot Product between A and B Inverse Transform from the “S” domain to the “s” domain are complex conjugates
Inverse transform: x(n) is formed from the coefficients X(0), X(1), X(2), ..., X(N-2), X(N-1) weighted by W*(n,0), W*(n,1), W*(n,2), ..., W*(n,N-2), W*(n,N-1).
Note: X(k) is complex
Note: X(k) is real
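Inverse DCT: $x(n) = \sum_{k=0}^{N-1} \frac{C(k)}{2} X(k) W'(n,k)$, where $W'(n,k) = \cos\left[\frac{(2n+1)k\pi}{2N}\right]$, $C(k) = \frac{1}{\sqrt{2}}$ for $k = 0$ and $C(k) = 1$ otherwise (the factor 1/2 is the normalization for N = 8). Note: $W_k$ is real, therefore $W'_k = W_k$.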
A transform that is suitable for compression should exhibit the following properties: a. an inverse transform exists; b. decorrelation; c. good energy compactness. Optimal transform: Karhunen-Loeve Transform (KLT).
Samples x(0), x(1), x(2), ..., x(N-2), x(N-1): a sample can be predicted from its neighbor(s).
DFT coefficients X(0), X(1), ..., X(7) (magnitude of frequency components): after the DFT, a coefficient is less predictable from its neighbor(s).
Samples x(0), x(1), x(2), ..., x(N-2), x(N-1): all samples are important; any missing sample causes large distortion.
The DFT maps the samples x(0), x(1), ..., x(7) to the coefficients X(0), X(1), ..., X(7). The signal can be reconstructed to a good approximation from the first 3 coefficients.
All information is concentrated in a small number of elements in the transformed domain DCT has very good Energy Compactness and Decorrelation Properties
2-D DCT: $X(j,k) = \frac{C(j)}{2}\,\frac{C(k)}{2} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} x(m,n)\, W(m,j)\, W(n,k)$, where $W(m,j) = \cos\left[\frac{(2m+1)j\pi}{2M}\right]$, $W(n,k) = \cos\left[\frac{(2n+1)k\pi}{2N}\right]$, and $C(j)$, $C(k) = \frac{1}{\sqrt{2}}$ for $j = 0$ and $k = 0$, respectively, and 1 otherwise.
The M x N block of samples x(0,0), ..., x(M-1,N-1) is transformed into the M x N block of coefficients X(0,0), ..., X(M-1,N-1).
2-D IDCT: $x(m,n) = \sum_{j=0}^{M-1} \sum_{k=0}^{N-1} \frac{C(j)}{2}\,\frac{C(k)}{2}\, X(j,k)\, W(m,j)\, W(n,k)$, where $W(m,j) = \cos\left[\frac{(2m+1)j\pi}{2M}\right]$, $W(n,k) = \cos\left[\frac{(2n+1)k\pi}{2N}\right]$, and $C(j)$, $C(k) = \frac{1}{\sqrt{2}}$ for $j = 0$ and $k = 0$, respectively, and 1 otherwise.
The M x N block of coefficients X(0,0), ..., X(M-1,N-1) is transformed back into the M x N block of samples x(0,0), ..., x(M-1,N-1).
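The 2-D DCT and IDCT can be sketched as two matrix products. The block size of 8 x 8 is an assumption matching the JPEG case, for which the normalization sqrt(2/N) equals the factor 1/2 in the formulas; the function names are illustrative.

```python
import numpy as np

N = 8  # assumed block size (the JPEG case)

def dct_matrix(n=N):
    """Row k holds sqrt(2/n) * C(k) * cos[(2m+1) k pi / (2n)] for m = 0..n-1."""
    k = np.arange(n).reshape(-1, 1)       # frequency index (rows)
    m = np.arange(n).reshape(1, -1)       # sample index (columns)
    c = np.where(k == 0, 1 / np.sqrt(2), 1.0)
    return np.sqrt(2 / n) * c * np.cos((2 * m + 1) * k * np.pi / (2 * n))

T = dct_matrix()

def dct2(block):
    """Forward 2-D DCT: transform the columns, then the rows."""
    return T @ block @ T.T

def idct2(coeffs):
    """Inverse 2-D DCT (T is orthonormal, so its inverse is its transpose)."""
    return T.T @ coeffs @ T

flat = np.full((N, N), 10.0)              # a perfectly smooth block
coeffs = dct2(flat)
print("DC coefficient:", coeffs[0, 0])
print("largest AC magnitude:", np.abs(coeffs).flatten()[1:].max())
```

For the flat block all the energy lands in the single DC coefficient, which is the extreme case of the energy compactness described above.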
Importance
Given a signal f(n), assume it is wide-sense stationary, i.e. its statistical properties are constant with changes in time. Define the mean and autocorrelation of f as in (O1) and (O2).
Equation O1 can be rewritten as (O3). The covariance of f is given by (O4) and (O5).
The signal is transformed to its spectral coefficients. Comparing the two sequences: in the signal, (a) adjacent terms are related and (b) every term is important; in the transformed sequence, (a) adjacent terms are unrelated and (b) only the first few terms are important.
Similar to f, we can define the mean, autocorrelation and covariance matrix for the transformed sequence.
Property (a): adjacent terms are uncorrelated if every term is correlated only with itself, i.e. all off-diagonal terms of the autocorrelation matrix are zero. Define a measure of correlation between samples: (O6)
We assume that the mean of the signal is zero. This can be achieved simply by subtracting the mean from f if it is non-zero. The covariance and autocorrelation matrices are the same after the mean is removed.
Property (b): every term of the signal is important, but only the first few terms of the transformed sequence are important. Note: if only the first L-1 terms are used to reconstruct the signal, we have (O7).
If only the first L-1 terms are used to reconstruct the signal, the error is (O8). The energy lost is given by (O9), hence (O10).
Eqn. O10 describes the approximation error of a single sequence of signal data f. A more generic description covering a collection of signal sequences is given by (O11). An optimal transform minimizes the error term in Eqn. O11. However, the solution space is enormous and a constraint is required. Note that the basis functions are orthonormal, hence the following objective function is adopted.
(O12) The multiplier term with subscript r is known as the Lagrangian multiplier. The optimal solution can be found by setting the gradient of J to 0 for each value of r, i.e. (O13). Eqn. O13 is based on the orthonormal property of the basis functions.
The solution for each basis function is given by (O14): each basis function is an eigenvector of R_f and its Lagrangian multiplier is the corresponding eigenvalue. Grouping the N basis functions gives an overall equation (O15). The autocorrelation matrix of the transformed sequence (O16) is a diagonal matrix, so the decorrelation criterion is satisfied.
Summary: given a signal, determine the autocorrelation function R_f; the solution for each basis function is given by (O14); the signal is then transformed to its spectral coefficients.
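The procedure can be sketched numerically: estimate the autocorrelation matrix from a collection of sequences, take its eigenvectors as the basis, and check that the transformed coefficients are decorrelated. The signal ensemble below is synthetic and hypothetical, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ensemble: 2000 sequences of length 8 whose neighbours are correlated.
n, count = 8, 2000
white = rng.standard_normal((count, n + 1))
f = white[:, 1:] + 0.9 * white[:, :-1]    # each sample shares noise with its neighbour
f -= f.mean(axis=0)                        # remove the mean, as assumed in the text

R_f = f.T @ f / count                      # estimated autocorrelation matrix
eigvals, eigvecs = np.linalg.eigh(R_f)     # eigenvectors of R_f = basis functions
order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
eigvals, basis = eigvals[order], eigvecs[:, order]

F = f @ basis                              # transform every sequence
R_F = F.T @ F / count                      # autocorrelation in the transformed domain

off_diagonal = R_F - np.diag(np.diag(R_F))
print("largest off-diagonal term:", np.abs(off_diagonal).max())
print("share of energy in first 3 terms:", eigvals[:3].sum() / eigvals.sum())
```

The off-diagonal terms vanish up to rounding error, and sorting by eigenvalue puts the most significant terms first, which is the energy compactness property.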
Redundancy in images: the probability distribution of pixel values is uneven. Assume the pixel intensity (gray scale) ranges from 0 to 255 units (Figure 6a).
Figure 6b (probability of occurrence versus pixel intensity): use fewer bits to represent pixel intensities that occur more often.
A simple example: an image of 720 x 576 pixels at 8 bits per pixel. Image size = 720 x 576 x 8 = 3.3 Mbits.
A variable-length code table (columns: pixel intensity, probability P_r, number of bits, bit string) assigns shorter bit strings to the more probable intensities. Total = 2.1 Mbits.
Sequential frames are similar
Figure 7 (frames P1, P2, P3): only about 5-10% of the content changes between frames.
Still picture - JPEG: Joint Photographic Experts Group; International Organization for Standardization (ISO) standard; based on the Discrete Cosine Transform (DCT). Motion picture - MPEG: Moving Picture Experts Group.
JPEG pipeline: Image -> Digitization -> Image Vectors -> DCT -> Quantization -> Zig-Zag Coding -> Runlength Coding -> Entropy Coding -> JPEG Compressed Format
Digitization Figure 8
Figure 9 Image Digitization
Figure 10a
Figure 10b Image vectors
Figure 10c Image Vector - a magnified view
Figure 10d
Image Vector - a magnified view (Figure 10e)
x(0,0) x(0,1) x(0,2) x(0,3) x(0,4) x(0,5) x(0,6) x(0,7)
x(1,0) x(1,1) x(1,2) x(1,3) x(1,4) x(1,5) x(1,6) x(1,7)
x(2,0) x(2,1) x(2,2) x(2,3) x(2,4) x(2,5) x(2,6) x(2,7)
x(3,0) x(3,1) x(3,2) x(3,3) x(3,4) x(3,5) x(3,6) x(3,7)
x(4,0) x(4,1) x(4,2) x(4,3) x(4,4) x(4,5) x(4,6) x(4,7)
x(5,0) x(5,1) x(5,2) x(5,3) x(5,4) x(5,5) x(5,6) x(5,7)
x(6,0) x(6,1) x(6,2) x(6,3) x(6,4) x(6,5) x(6,6) x(6,7)
x(7,0) x(7,1) x(7,2) x(7,3) x(7,4) x(7,5) x(7,6) x(7,7)
Increasing horizontal frequency Increasing vertical frequency Figure 11a
Increasing horizontal frequency Increasing vertical frequency Figure 11b Because of the energy compactness of DCT, most of the information is concentrated in the low frequency corner
Figure 11c
Before the transform, the pixel intensity range is converted from [0, 255] to [-128, 127]. This process, known as 'zero shift', is performed by subtracting 128 from each pixel intensity. The DCT coefficients are normalised to 11-bit integer values.
Uniform symmetric quantizer (input f, output the representation levels). d_i: decision levels (..., -d_2, -d_1, d_1, d_2, ...). r_i: representation levels (..., -r_2, -r_1, r_1, r_2, ...).
Mean Square Quantization Error (MSQE) Mean Absolute Quantization Error (MAQE) Q1 Q2
Max-Lloyd Quantizer A method to determine the decision and representation levels Suppose Then Q3
Max-Lloyd Quantizer Consider two arbitrary adjacent reconstruction levels r k-1 and r k What will be the optimal value for d k so that error is minimized? d k-1 dkdk d k+1 r k-1 rkrk Q4
Max-Lloyd Quantizer Similarly Q5
Max-Lloyd Quantizer for uniform pdf Consider a uniform probability density function f 0 A/2 -A/2 1/A p(f)p(f)
Max-Lloyd Quantizer for uniform pdf From Q4, From Q5, Hence, Constant Step Size
Max-Lloyd Quantizer for uniform pdf Step size (SS) Q6 Q7 Variance =
Max-Lloyd Quantizer for uniform pdf Q8 For a b bits quantizer, Q9 SNR =
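The following sketch iterates the two textbook Max-Lloyd conditions on a set of samples: each decision level sits midway between its two neighbouring representation levels, and each representation level is the mean (centroid) of the samples in its cell. It is assumed, not confirmed by the text, that Q4 and Q5 express these conditions. For uniformly distributed data the iteration settles on a constant step size.

```python
import numpy as np

def lloyd_max(samples, levels, iterations=50):
    """Iteratively estimate decision levels d and representation levels r."""
    samples = np.sort(np.asarray(samples, dtype=float))
    # Initial guess: evenly spread representation levels inside the data range.
    r = np.linspace(samples[0], samples[-1], levels + 2)[1:-1]
    for _ in range(iterations):
        d = (r[:-1] + r[1:]) / 2               # midpoint condition
        idx = np.searchsorted(d, samples)       # cell index of every sample
        for k in range(levels):
            cell = samples[idx == k]
            if cell.size:
                r[k] = cell.mean()              # centroid condition
    d = (r[:-1] + r[1:]) / 2
    return d, r

# Dense grid standing in for a uniform pdf on [-A/2, A/2] with A = 2.
uniform_samples = np.linspace(-1.0, 1.0, 4001)
d, r = lloyd_max(uniform_samples, levels=4)
print("decision levels:      ", np.round(d, 3))
print("representation levels:", np.round(r, 3))
```

With four levels on this range the representation levels approach -0.75, -0.25, 0.25 and 0.75, i.e. a step size of A/4.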
Assign a different quantization step size to each coefficient (Figure 12).
Consider a range of values from, say, 0 to 255. If a step size of 8 is used, the range is divided into 256/8 = 32 levels, and 5 bits are required to represent each level in this range (table columns: Value, Level, Bit string, Quantized value).
If a step size of 16 is used, the range is divided into 256/16 = 16 levels, and 4 bits are required to represent each level in this range (table columns: Value, Level, Bit string, Quantized value).
The larger the step size, the smaller the number of quantized levels, the smaller the number of bits and the larger the distortion in value, and the other way round.
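The relation between step size, number of levels and number of bits can be sketched as below. Representing each level by the lower edge of its interval is an assumption made for illustration, since the quantized values of the original table are not reproduced here.

```python
def levels_and_bits(value_range, step):
    """Number of quantization levels and bits per level for a given step size."""
    levels = value_range // step
    bits = (levels - 1).bit_length()      # bits needed to index levels 0..levels-1
    return levels, bits

def quantize(value, step):
    """Return (level index, quantized value); the lower interval edge is assumed."""
    level = value // step
    return level, level * step

for step in (8, 16):
    levels, bits = levels_and_bits(256, step)
    level, q = quantize(100, step)
    print(f"step {step}: {levels} levels, {bits} bits; 100 -> level {level} -> {q}")
```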
Human Visual System is more sensitive to low frequency intensity (spatial) variation in an image Increasing horizontal frequency Increasing vertical frequency Figure 13
Human Visual System (HVS) is more sensitive to low frequency intensity (spatial) variation in an image Decreasing sensitivity to HVS Figure 14
Assign a different quantization step size to each coefficient (figure: DCT coefficients and their Q step sizes).
Assign a different quantization step size to each coefficient (figure: DCT coefficients and the resulting quantized DCT coefficients).
After quantization, many high-frequency DCT coefficients are truncated to '0'. The non-zero coefficients carry most of the image content, namely the components to which the HVS is sensitive. The large number of '0'-valued coefficients suggests runlength coding.
For a continuous stream of numbers with identical values, it is only necessary to record: 1. the value of the number; 2. the number of duplications.
A sequence of 8 bytes of raw data: s = [15, 15, 15, 15, 15, 15, 15, 15]
Runlength representation: [15, 8] (value, runlength)
Only 2 bytes are needed to represent 's'.
The longer the string of duplicated numbers, the larger the Compression Ratio (CR).
s = [15, 15, 15, 15]: runlength representation [15, 4] (value, runlength), Compression Ratio = 2
s = [15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15]: runlength representation [15, 16] (value, runlength), Compression Ratio = 8
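A minimal runlength coder that reproduces these compression ratios is sketched below. It assumes one byte for the value and one byte for the runlength, as in the examples; the function names are illustrative.

```python
def runlength_encode(data):
    """Collapse runs of identical values into (value, runlength) pairs."""
    pairs = []
    for value in data:
        if pairs and pairs[-1][0] == value:
            pairs[-1][1] += 1             # extend the current run
        else:
            pairs.append([value, 1])      # start a new run
    return [tuple(p) for p in pairs]

def compression_ratio(data):
    """Raw size over coded size, assuming 1 byte per value and per runlength."""
    return len(data) / (2 * len(runlength_encode(data)))

for length in (4, 8, 16):
    s = [15] * length
    print(runlength_encode(s), "CR =", compression_ratio(s))
```

Data without repeats gives a ratio below 1 under this byte layout, which is why runlength coding pays off only when long runs are expected.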
Runlength of ‘0’ CR Figure 17
With horizontal scanning, the compression ratio is always less than or equal to 4. A better approach is to adopt zig-zag scanning.
Figure 18 (quantized DCT coefficients, zig-zag scanning): runlength of '0' = 47, CR = 23.5
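Zig-zag scanning can be sketched by sorting the block positions along anti-diagonals and alternating the direction on each one. The 8 x 8 block below is hypothetical and is not the block of Figure 18.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zig-zag scanning order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Primary key: anti-diagonal index. Secondary key alternates direction:
        # odd diagonals run top-right to bottom-left, even ones the other way.
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )

def zigzag_scan(block):
    """Read a square block into a 1-D list in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

def trailing_zero_run(sequence):
    """Length of the run of zeros at the end of the scanned sequence."""
    run = 0
    for value in reversed(sequence):
        if value != 0:
            break
        run += 1
    return run

# Hypothetical quantized block: only three low-frequency coefficients survive.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0] = 5, 3, 2

scanned = zigzag_scan(block)
print(scanned[:6], "trailing zeros:", trailing_zero_run(scanned))
```

Because the non-zero coefficients cluster in the low-frequency corner, the zig-zag order gathers the zeros into one long run instead of several short per-row runs.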
Recall: the probability distribution of pixel values is uneven, so fewer bits are used to represent pixel intensities that occur more often. This can be generalised.
If the probability distribution of data values is uneven, fewer bits can be used to represent values that occur more often, and vice versa.
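One common way to realise this principle is a Huffman code, sketched below. The slides do not name the code construction at this point, so Huffman coding is used here as an assumed example, and the symbol probabilities are hypothetical.

```python
import heapq
from itertools import count

def huffman_code_lengths(probabilities):
    """Return {symbol: code length in bits} for a Huffman code."""
    tie = count()  # tie-breaker so the heap never has to compare symbol lists
    heap = [(p, next(tie), [s]) for s, p in probabilities.items()]
    heapq.heapify(heap)
    lengths = {s: 0 for s in probabilities}
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)   # two least probable groups
        p2, _, group2 = heapq.heappop(heap)
        for s in group1 + group2:
            lengths[s] += 1                   # each merge adds one bit
        heapq.heappush(heap, (p1 + p2, next(tie), group1 + group2))
    return lengths

def average_length(probabilities):
    """Expected number of bits per symbol under the Huffman code."""
    lengths = huffman_code_lengths(probabilities)
    return sum(p * lengths[s] for s, p in probabilities.items())

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}   # hypothetical values
print(huffman_code_lengths(probs), "average bits:", average_length(probs))
```

A fixed-length code would need 2 bits for each of these four symbols; the uneven distribution brings the average below that.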
In JPEG, the DC coefficient and the other DCT coefficients are encoded separately (Figure 19). All coefficients other than DC are 'AC' terms.
The DC coefficient represents the average intensity in an image block (8 x 8 pixels). DC coefficients of adjacent image blocks are similar.
Differential Pulse Code Modulation (DPCM) is applied to encode the quantized DC terms. Consider a row of image blocks: the quantized DC coefficients of successive blocks are converted to DPCM values.
As adjacent DC terms are similar, the DPCM values are small in general, i.e. small values occur more often. The DPCM values are divided into 16 classes according to their magnitude. Each class has a different probability of occurrence.
Class  DPCM difference values
0      [0]
1      [-1] [+1]
2      [-3, -2] [+2, +3]
3      [-7, ..., -4] [+4, ..., +7]
4      [-15, ..., -8] [+8, ..., +15]
5      [-31, ..., -16] [+16, ..., +31]
6      [-63, ..., -32] [+32, ..., +63]
7      [-127, ..., -64] [+64, ..., +127]
8      [-255, ..., -128] [+128, ..., +255]
9      [-511, ..., -256] [+256, ..., +511]
10     [-1023, ..., -512] [+512, ..., +1023]
11     [-2047, ..., -1024] [+1024, ..., +2047]
12     [-4095, ..., -2048] [+2048, ..., +4095]
13     [-8191, ..., -4096] [+4096, ..., +8191]
14     [-16383, ..., -8192] [+8192, ..., +16383]
15     [-32767, ..., -16384] [+16384, ..., +32767]
Small values, which occur more often, are grouped into classes that contain fewer members. A class with fewer elements requires fewer bits to identify its members. As a result, small values require fewer bits to represent.
Any DPCM value is addressed by its class and a string of additional bits that identifies its position in the class. For example, class 6 ([-63, ..., -32], [+32, ..., +63]) has 64 members, so 6 additional bits are required.
Representation of DPCM data: Class (4 bits) followed by Additional bits (adaptive length). For most DC coefficients, the DPCM values belong to lower classes that require fewer additional bits.
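The class of a value and its additional bits can be computed as sketched below. The class index follows directly from the table (class k holds magnitudes from 2^(k-1) to 2^k - 1). The mapping of negative values to bit strings follows the usual JPEG convention, which is an assumption here, since the text only says that the additional bits identify the position within the class.

```python
def size_class(value):
    """Class index: 0 for the value 0, otherwise the number of bits in |value|."""
    return abs(value).bit_length()

def additional_bits(value):
    """Bit string locating `value` inside its class (assumed JPEG convention)."""
    size = size_class(value)
    if size == 0:
        return ""                          # class 0 has a single member
    # Positive values are written as-is; negative values are offset so that
    # the most negative member of the class maps to all zeros.
    offset = value if value > 0 else value + (1 << size) - 1
    return format(offset, f"0{size}b")

for dpcm in (0, 1, -3, 5, -40, 63):
    print(dpcm, "-> class", size_class(dpcm), "bits", repr(additional_bits(dpcm)))
```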
Non-zero AC terms are represented in the same way as the DPCM coefficients: Class (4 bits) followed by Additional bits (adaptive length). Zero terms are encoded with zig-zag scanning followed by RLC. How are these two items combined?
Example (quantized DCT coefficients): table of the zig-zag scanning index (I), the coefficient value (V) and the runlength of zeros (RL).
Class  AC coefficient values
0      [0]
1      [-1] [+1]
2      [-3, -2] [+2, +3]
3      [-7, ..., -4] [+4, ..., +7]
4      [-15, ..., -8] [+8, ..., +15]
5      [-31, ..., -16] [+16, ..., +31]
6      [-63, ..., -32] [+32, ..., +63]
7      [-127, ..., -64] [+64, ..., +127]
8      [-255, ..., -128] [+128, ..., +255]
9      [-511, ..., -256] [+256, ..., +511]
10     [-1023, ..., -512] [+512, ..., +1023]
11     [-2047, ..., -1024] [+1024, ..., +2047]
12     [-4095, ..., -2048] [+2048, ..., +4095]
13     [-8191, ..., -4096] [+4096, ..., +8191]
14     [-16383, ..., -8192] [+8192, ..., +16383]
15     [-32767, ..., -16384] [+16384, ..., +32767]
Example (table of I, V, RL and Class): one AC coefficient falls in class 4, AC coefficient values [-15, ..., -8], [+8, ..., +15].
Another AC coefficient falls in class 1, AC coefficient values [-1], [+1].
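RL and Class are grouped into the RUN-SIZE table. Each entry is the byte RRRRSSSS, with RRRR the runlength and SSSS the class. Special entries: 00 = End of Block; F0 = runlength of 16 zeros; the remaining entries with SSSS = 0 are not applicable (N/A).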
A Few Points to Note
1. Each non-zero AC coefficient is represented by an 8-bit value 'RRRRSSSS'.
2. RRRR is the runlength of zeros between the current and the previous non-zero AC coefficients.
3. If the runlength exceeds 15, a term 'F0' is inserted to represent a runlength of 16.
4. If all remaining coefficients are zero, a term '00' (EOB) is inserted.
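Example: the RS values are 44, 31, 00 in hexadecimal, i.e. 68, 49, 00 in decimal, where 00 is EOB (End of Block). Each non-EOB RS value is followed by its additional bits.
Encoded AC format: RS = 68 with 4 additional bits, RS = 49 with 1 additional bit, RS = 00 (EOB). Number of bits: 3 x 8 + 4 + 1 = 29 bits. Number of bits for the 63 AC coefficients without this coding: 63 x 11 = 693 bits.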
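The RUN-SIZE encoding can be sketched as follows. The coefficient values 10 and 1 are hypothetical, chosen only so that the coefficients fall in class 4 after a run of 4 zeros and in class 1 after a run of 3 zeros, which yields the RS values 44 and 31 (hexadecimal) of the example. As in the bit count above, each RS value is counted as a plain 8-bit byte.

```python
def size_class(value):
    """Class index: the number of bits in |value| (0 for the value 0)."""
    return abs(value).bit_length()

def encode_ac(ac_coefficients):
    """Return a list of (RS byte, number of additional bits) for one block."""
    last_nonzero = max((i for i, v in enumerate(ac_coefficients) if v), default=-1)
    symbols, run = [], 0
    for v in ac_coefficients[:last_nonzero + 1]:
        if v == 0:
            run += 1
            if run == 16:
                symbols.append((0xF0, 0))   # F0: a runlength of 16 zeros
                run = 0
            continue
        size = size_class(v)
        symbols.append(((run << 4) | size, size))   # RRRRSSSS
        run = 0
    if last_nonzero < len(ac_coefficients) - 1:
        symbols.append((0x00, 0))           # 00: End of Block
    return symbols

# 63 AC coefficients in zig-zag order (hypothetical values, see lead-in).
ac = [0, 0, 0, 0, 10, 0, 0, 0, 1] + [0] * 54
symbols = encode_ac(ac)
total_bits = sum(8 + extra for _, extra in symbols)
print([f"{rs:02X}" for rs, _ in symbols], "total bits:", total_bits)
```

In an actual JPEG stream the RS values are themselves entropy coded, so the final bit count is typically lower than this byte-based figure.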
The “Baboon” is one of the popular standard images that have been adopted for comparison purposes in image compression research. The difficult part is that its large amount of texture is hard to compress with good fidelity. The easy part is that the distortions are difficult to spot. Hi! I am the famous Baboon, very nice to meet all of you.