1 Chapter 4: Compression (Part 2) Image Compression.


2 Acknowledgement  Some figures and pictures are taken from: The Scientist and Engineer's Guide to Digital Signal Processing by Steven W. Smith

3 Lossy compression  Motivations: Uncompressed image, video and audio data are huge; e.g., in HDTV the bit rate easily exceeds 1 Gbps. Lossless methods (Huffman, LZW) alone are inadequate for images and video because they do not exploit the spatial and/or temporal redundancy of pixel values. Special characteristics of human perception (e.g., greater sensitivity to low spatial frequencies) should be exploited to achieve a higher compression ratio.

4 Spatial sensitivity  A higher spatial frequency requires a larger contrast to be perceived. (Figure axis: Intensity)

5 Image & video compression  JPEG: spatial redundancy removal in intra-frame coding.  H.261 and MPEG: both spatial and temporal redundancy removal in intra-frame and inter-frame coding.

6 Sub-sampling techniques  Some pixel values are discarded. Interpolation techniques are used upon reconstruction of the original data.  Sub-sampling results in information loss. However, the loss is acceptable by virtue of the physiological characteristics of human eyes.  Chromatic sub-sampling: The human vision system is more sensitive to changes in brightness than to changes in color. Very often, RGB values are transformed to Y’CbCr values. The chroma components are then sub-sampled to reduce the data requirement.
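The RGB to Y’CbCr step mentioned above can be sketched as follows. The slides do not give the conversion constants, so the full-range constants commonly used with JPEG files (JFIF style) are assumed here, and `rgb_to_ycbcr` is a hypothetical helper name.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range Y'CbCr.

    Assumption: JFIF-style constants; the slides do not specify them.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b               # luma (brightness)
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b    # blue-difference chroma
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b    # red-difference chroma
    return y, cb, cr


# A gray pixel carries no color information: both chroma values sit at 128.
print(rgb_to_ycbcr(100, 100, 100))
```

Because the chroma planes carry the information the eye is less sensitive to, they are the ones that get sub-sampled.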

7 Chromatic sub-sampling  4:2:2 sub-sample color signals horizontally by a factor of 2 (CCIR 601 standard).  4:1:1 sub-sample horizontally by a factor of 4.  4:2:0 sub-sample in both dimensions by a factor of 2.  4:2:0 is often used in JPEG and MPEG.

8 Chromatic sub-sampling (notation)  In a ratio such as 4:2:2:  the 1st digit is the luma horizontal sampling reference;  the 2nd digit is the chroma horizontal sampling;  the 3rd digit is either the same as the 2nd digit, or 0, indicating that Cb and Cr are also vertically sub-sampled by a factor of 2.

Example: a frame with pixel dimensions of 720 × 480:
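The sample counts for such a frame can be worked out with a short sketch. The function name and the divisor table are illustrative assumptions that follow the definitions of 4:2:2, 4:1:1 and 4:2:0 given on the earlier slide.

```python
def samples_per_frame(width, height, scheme):
    """Return (luma_samples, chroma_samples) for one frame."""
    # (horizontal divisor, vertical divisor) applied to each chroma plane
    divisors = {
        "4:4:4": (1, 1),  # no sub-sampling
        "4:2:2": (2, 1),  # horizontal factor 2
        "4:1:1": (4, 1),  # horizontal factor 4
        "4:2:0": (2, 2),  # factor 2 in both dimensions
    }
    h_div, v_div = divisors[scheme]
    luma = width * height
    chroma = 2 * (width // h_div) * (height // v_div)  # Cb plane + Cr plane
    return luma, chroma


for scheme in ("4:4:4", "4:2:2", "4:1:1", "4:2:0"):
    luma, chroma = samples_per_frame(720, 480, scheme)
    print(scheme, "total samples:", luma + chroma)
```

For 720 × 480, 4:2:2 keeps two thirds of the 4:4:4 samples, while 4:1:1 and 4:2:0 both keep half.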

10 JPEG compression  JPEG stands for “Joint Photographic Experts Group”.  JPEG is commonly used to refer to a standard for compressing and encoding continuous-tone still images.  The compression/quality trade-off is adjustable.  Four modes of operation: sequential (line-by-line; the baseline implementation), progressive (blur-to-clear), lossless (pixel-for-pixel), hierarchical (multiple resolutions).

11 JPEG (steps)  1. Preparation: The image can be separated into Y’CbCr components to facilitate sub-sampling of the chrominance components. Each component is segmented into 8 × 8 blocks.  2. Processing: Transformation from the spatial to the frequency domain using the DCT.  [Pipeline: uncompressed picture → picture preparation → picture processing → quantization → entropy encoding → compressed picture]

13 JPEG (steps)  3. Quantization: Maps the real-number values from the previous step to integers. This process results in loss of precision, but achieves data compression. E.g., value: 54.2, quantizer: 20. Recorded value: round(54.2/20) = 3. Recovered value: 3 × 20 = 60; error = 5.8. The quantizer specifies the granularity of the mapping, allowing control of the precision carried in the compressed data. Different levels of quantization are applied to the luminance and chrominance components, exploiting the sensitivity of human perception.
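The worked numbers on this slide can be checked with a minimal sketch. The helper names are hypothetical, and round-half-up is an assumption, since the slide only says "round".

```python
import math


def quantize(value, step):
    """Map a real value to an integer level (ties round toward +infinity)."""
    return math.floor(value / step + 0.5)


def dequantize(level, step):
    """Recover an approximation of the original value."""
    return level * step


level = quantize(54.2, 20)          # 54.2 / 20 = 2.71, which rounds to 3
recovered = dequantize(level, 20)   # 3 * 20 = 60
print(level, recovered, abs(recovered - 54.2))
```

A larger step stores smaller integers but can leave a larger reconstruction error, which is the quality/size trade-off the quantizer controls.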

14 JPEG (steps)  4. Entropy encoding: Compresses a sequential data stream without loss. A zigzag scan linearizes the data. Predictive encoding and RLE are used to encode the DC and AC components, respectively. Finally, a Huffman scheme encodes the data.

15 JPEG (schematic diagram)  [Figure: schematic of the pipeline applied to the Y’, Cb and Cr components]

16 Image preparation  Each image consists of a number of components (e.g., RGB, Y’CbCr).  Divide each component into 8 × 8 blocks.  Each block is a “data unit” subject to the DCT.  The values in a block are shifted from unsigned integers in the range [0, 2^p - 1] to signed integers in the range [-2^(p-1), 2^(p-1) - 1]; e.g., in 8-bit mode, the range [0, 255] is shifted to [-128, 127].
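The two preparation steps, block splitting and level shifting, might look like this in outline. The function names are hypothetical, and the sketch assumes the component dimensions are multiples of the block size (real encoders pad the edges).

```python
def split_into_blocks(component, size=8):
    """Cut a 2D list into size x size blocks, left to right, top to bottom."""
    height, width = len(component), len(component[0])
    return [
        [row[c:c + size] for row in component[r:r + size]]
        for r in range(0, height, size)
        for c in range(0, width, size)
    ]


def level_shift(block, precision=8):
    """Shift samples from [0, 2^p - 1] to [-2^(p-1), 2^(p-1) - 1]."""
    offset = 1 << (precision - 1)   # 128 when precision is 8
    return [[value - offset for value in row] for row in block]


component = [[(r + c) % 256 for c in range(16)] for r in range(16)]
blocks = [level_shift(b) for b in split_into_blocks(component)]
print(len(blocks), "blocks of", len(blocks[0]), "x", len(blocks[0][0]))
```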

An 8×8 image block

18 DCT (Discrete Cosine Transform)  An 8 × 8 image block is a 2D function f(x,y) (0 ≤ x, y ≤ 7) in the spatial domain.
The eye-brow block (columns x = 0 to 7, rows y = 0 to 7; "…" marks a value missing from the transcript):
y=0: 231 224  …  217  …  203 189 196
y=1: 210 217 203 189 203 224 217 224
y=2: 196 217 210 224 203  …  196 189
y=3: 210 203 196 203 182 203 182 189
y=4: 203 224 203 217 196 175 154 140
y=5: 182 189 168 161 154 126 119 112
y=6: 175 154 126 105 140 105 119  84
y=7: 154  98 105  98 105  63 112  84

19 DCT (Discrete Cosine Transform)  We define 64 basis functions for frequency variables u, v (0 ≤ u, v ≤ 7) in a 2-dimensional space.  Each function represents a basic picture.
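The formula for the basis functions did not survive in this transcript. As an assumption, the standard 8 × 8 DCT basis used by JPEG (without normalization factors) would read:

```latex
b_{u,v}(x,y) \;=\;
\cos\!\left[\frac{(2x+1)\,u\,\pi}{16}\right]
\cos\!\left[\frac{(2y+1)\,v\,\pi}{16}\right],
\qquad 0 \le x, y, u, v \le 7
```

For u = v = 0 both cosines equal 1, so that basis function is flat; larger u or v give more oscillations along x or y respectively.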

20 DCT (Discrete Cosine Transform)  These are wave functions of successively increasing frequencies. (Imagine them as undulating surfaces of increasingly frequent ups and downs.)  Given a 2D function (imagine it as a 2D surface), one can decompose it into a linear combination of these wave functions.  So, DCT is a frequency (uv coordinates) representation of a spatial (xy coordinates) function.

[Figure: Some 2-D basis functions. Panels: u0v0, u1v1, u0v1, u1v0, u2v2, u5v1, u6v3; axes x, y]

[Figure: Some 2-D basis functions with quantized values; axes x, y]

23 DCT  The 64 (8 × 8) DCT basis functions (top view) are:

24 DCT coefficients (example)  [Figure: an 8 × 8 block in x,y co-ordinates and its DCT coefficients after transformation in u,v co-ordinates; corners labelled u0v0, u0v7, u7v0 and u7v7]

26 DCT  From the original spatial function f(x,y) that represents the 8 × 8 image block, the DCT extracts the frequency components by multiplying f(x,y) with these basis functions and summing over the block. (Figure label: down-shifted value)

27 DCT  The result is a function F(u,v) in the frequency domain, giving the 64 (8 × 8) coefficients that represent the 64 frequency components of the original image function. Of the 64 coefficients, F(0,0) is due to the basis function with u = v = 0, a flat wave function. F(0,0) is also known as the DC coefficient. The other coefficients are called the AC coefficients.
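A direct, unoptimized sketch of the forward transform follows. The normalization (1/4)·C(u)·C(v), with C(0) = 1/√2 and C(k) = 1 otherwise, is the usual JPEG convention and is an assumption here, since the slide formula is not in the transcript; `dct_2d` is a hypothetical name.

```python
import math


def dct_2d(block):
    """Forward 2D DCT of an n x n block; block[y][x] in, coeffs[v][u] out."""
    n = len(block)

    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    coeffs = [[0.0] * n for _ in range(n)]
    for v in range(n):
        for u in range(n):
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[v][u] = (2 / n) * c(u) * c(v) * total   # 2/n is 1/4 for n = 8
    return coeffs


# A perfectly flat block has only a DC coefficient: 8 times the sample value.
flat = [[10] * 8 for _ in range(8)]
print(round(dct_2d(flat)[0][0], 6))
```

Production codecs use fast factorizations rather than this four-level loop, but the result is the same set of 64 coefficients.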

28 DCT  The DC component determines the fundamental (or average) gray/color intensity of the 8 × 8 pixels. The AC components add the intensity variation to the pixel values to give the original image function.  A typical image consists of large regions of nearly uniform intensity and color. The DCT thus concentrates most of the signal in the lower spatial frequencies. Many of the high-frequency coefficients have very low values. Entropy encoding applied to the DCT coefficients would normally achieve high data reduction.

29 IDCT  The inverse of the DCT (IDCT) takes the 64 DCT coefficients and reconstructs a 64-point output image by summing the basis signals, each weighted by its coefficient.  The result is a summation of all the frequency components, yielding a reconstruction of the original image. (Imagine adding up the respective undulating surfaces to yield the original surface.)  [Figure labels: 64 coefficients; 2D spatial function that describes the image block]
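The summation of weighted basis signals can be sketched as below. As with the forward transform, the (1/4)·C(u)·C(v) normalization is the usual JPEG convention and is assumed rather than taken from the slides; `idct_2d` is a hypothetical name.

```python
import math


def idct_2d(coeffs):
    """Inverse 2D DCT; coeffs[v][u] in, block[y][x] out."""
    n = len(coeffs)

    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    block = [[0.0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            total = 0.0
            for v in range(n):
                for u in range(n):
                    # each coefficient scales its own basis surface
                    total += (c(u) * c(v) * coeffs[v][u]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            block[y][x] = (2 / n) * total
    return block


# A lone DC coefficient of 80 reconstructs a flat block of value 10.
dc_only = [[0.0] * 8 for _ in range(8)]
dc_only[0][0] = 80.0
print(round(idct_2d(dc_only)[3][5], 6))
```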

the “eye” block

32 Quantization  The 64 DCT coefficients are real numbers (i.e., not integers). These coefficients are quantized to throw away bits, and that is the main source of lossiness.

33 Quantization  Uniform quantization Each DCT coefficient is divided by a constant N and then the result is rounded to the nearest integer. Equal treatment to all DCT coefficients.  Quantization tables Each of the 64 coefficients can be adjusted separately. Specific frequencies can be given more importance than others according to the characteristics of the original image.

34 Quantization  Quantization tables In JPEG, each F(u,v) is divided by a different quantizer step size Q(u,v) given in a quantization table:

35 Quantization  The eye is most sensitive to low frequencies (upper left corner), less sensitive to high frequencies (lower right corner).  The JPEG standard provides two example quantization tables that are widely used as defaults, one for luma (above), one for chroma.  Quality factor: How would scaling the quantization numbers affect the image, say if I double them all? Quantization becomes coarser, so more coefficients round to zero: the file gets smaller and the quality lower. In most implementations, the quality factor controls a scaling factor applied to these tables.
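The effect of scaling the table can be tried out with a small sketch. `LUMA_Q` is the example luminance table from Annex K of the JPEG standard; the helper names and the plain multiplicative `factor` are illustrative assumptions (real encoders map a quality setting to a scale factor in their own ways).

```python
LUMA_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def scale_table(table, factor):
    """Multiply every step size by factor, keeping each step at least 1."""
    return [[max(1, round(q * factor)) for q in row] for row in table]


def quantize_block(coeffs, table):
    """Divide each coefficient by its own step size and round."""
    return [[round(c / q) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, table)]


def count_zeros(block):
    return sum(value == 0 for row in block for value in row)


coeffs = [[30.0] * 8 for _ in range(8)]   # made-up coefficients, for illustration only
print(count_zeros(quantize_block(coeffs, LUMA_Q)),
      count_zeros(quantize_block(coeffs, scale_table(LUMA_Q, 2))))
```

Doubling the step sizes can only turn more coefficients into zeros, never fewer, which is why a larger scale factor means a smaller file.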

36 Zig-Zag scan  This step linearizes the 8 × 8 block of DCT coefficients. It maps an 8 × 8 block to a 64-number stream.  RLE and entropy encoding methods are then applied to the number stream.  Why zig-zag? It orders the coefficients from low to high frequencies, so that zeros in high frequencies are grouped together. Consecutive zeros are compressed effectively using RLE. High frequencies can also be truncated easily.

37 Zig-Zag scan uu vv

38 Entropy encoding  The DC component is encoded using predictive/differential encoding. The DC coefficient determines the average color/intensity of an 8 × 8 block. Between adjacent blocks, the variation is (usually) fairly small. Encode the difference between the current DC coefficient and that of the previous block. (Figure label: adjacent blocks are likely to have very similar DCs.)
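Differential coding of the DC values can be sketched as follows. The function names are hypothetical, and starting the predictor at 0 for the first block is an assumption of this sketch.

```python
def dc_differences(dc_values):
    """Replace each DC value by its difference from the previous block's DC."""
    previous = 0            # assumed predictor for the very first block
    diffs = []
    for dc in dc_values:
        diffs.append(dc - previous)
        previous = dc
    return diffs


def dc_reconstruct(diffs):
    """Undo dc_differences with a running sum."""
    values, running = [], 0
    for d in diffs:
        running += d
        values.append(running)
    return values


dcs = [120, 122, 121, 125]          # made-up DC values of neighbouring blocks
print(dc_differences(dcs))          # small numbers after the first entry
```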

39 Entropy encoding  AC components encoded with RLE The 63-number stream has lots of zeros in it. Encode as (skip,value) pairs, where skip is the number of consecutive zeros and value is the next non-zero component.

40 Entropy encoding  Converts the DCT coefficients after quantization into a compact binary sequence in 2 steps: (1) forming an intermediate symbol sequence; (2) converting the sequence into binary using a Huffman table.  Intermediate symbol sequence: each (non-zero) AC coefficient is represented by a pair of symbols: Symbol-1 (Runlength, Size) and Symbol-2 (Amplitude).

41 AC encoding  Runlength is the number of consecutive 0-valued AC coefficients preceding a nonzero AC coefficient. Runlength is in the range 0 to 15.  Size is the number of bits used to encode Amplitude; it can be up to 10.  Amplitude is the value of the nonzero AC coefficient, in the range [-1024, +1023], hence at most 10 bits.

42 AC encoding  E.g., given the sequence ..., 0, 0, 0, 0, 0, 0, 476, … → (6,9)(476) // 2 symbols  If Runlength > 15, the special Symbol-1 (15,0) stands for a run of 16 zeros.  E.g., what is the sequence represented by (15,0) (15,0) (7,4) (12)?  (0,0) = End-of-Block symbol: all remaining coefficients are 0’s.
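The formation of the intermediate symbols can be sketched as below. This covers only the symbol-forming step, not the Huffman coding; the function names and the tuple layout `((runlength, size), amplitude)` are assumptions of the sketch.

```python
def size_of(value):
    """Number of bits needed for the magnitude of value."""
    return abs(value).bit_length()


def encode_ac(ac):
    """Turn the 63 zig-zag-ordered AC values into intermediate symbols."""
    symbols = []
    # index of the last nonzero value; everything after it collapses into EOB
    last = max((i for i, v in enumerate(ac) if v != 0), default=-1)
    run = 0
    for value in ac[:last + 1]:
        if value == 0:
            run += 1
            continue
        while run > 15:                       # (15,0) stands for 16 zeros
            symbols.append(((15, 0), None))
            run -= 16
        symbols.append(((run, size_of(value)), value))
        run = 0
    if last < len(ac) - 1:
        symbols.append(((0, 0), None))        # End-of-Block
    return symbols


print(encode_ac([0] * 6 + [476] + [0] * 56))
print(encode_ac([0] * 39 + [12] + [0] * 23))
```

The second call reproduces the slide's exercise: (15,0) (15,0) (7,4) (12) stands for 16 + 16 + 7 = 39 zeros followed by the value 12.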

43 DC encoding  A DC value is represented by Size (the number of bits needed to represent it, Symbol-1) and Amplitude (Symbol-2).  If DC is 4, 3 bits are needed. Encode Size as a Huffman symbol, followed by the actual 3 bits.  Since DC values are differentially encoded, the difference lies in the range [-2048, 2047].
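A sketch of how Size and the amplitude bits might be derived for a DC difference follows. The handling of negative values (storing diff + 2^Size - 1) is the usual JPEG convention and is an assumption here, since the slide does not spell it out; the Huffman code for Size is left out, and `dc_symbols` is a hypothetical name.

```python
def dc_symbols(diff):
    """Return (size, amplitude_bits) for one DC difference."""
    size = abs(diff).bit_length()
    if size == 0:
        return 0, ""                 # a zero difference needs no amplitude bits
    # assumed convention: negatives are stored as diff + 2**size - 1
    code = diff if diff > 0 else diff + (1 << size) - 1
    return size, format(code, "0{}b".format(size))


for d in (4, -4, 1, -1, 0):
    print(d, dc_symbols(d))
```

For the slide's example, a DC value of 4 gives Size 3 and the three bits 100.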

Should be 1110

45 JPEG example  The following program has also been suggested to generate the quantization table: for (i = 0; i < n; i++) for (j = 0; j < n; j++) Q[i][j] = 1 + (1 + i + j) * quality;  The JPEG standard proposes Huffman encoding tables. One example (partial):
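Returning to the quantization-table loop on this slide, a Python rendering might look like this. The function name is hypothetical; note that in this formula a larger `quality` value produces larger step sizes, that is, coarser quantization.

```python
def make_quant_table(quality, n=8):
    """Build Q[i][j] = 1 + (1 + i + j) * quality for an n x n table."""
    return [[1 + (1 + i + j) * quality for j in range(n)] for i in range(n)]


for row in make_quant_table(2):
    print(row)
```

Step sizes grow with i + j, so higher-frequency coefficients are quantized more coarsely than low-frequency ones.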

46 Compression measures  Compression Ratio (CR): CR = Original data size / Compressed data size. Higher CR → lower picture quality.  Wallace suggested a measure N_b = number of bits per pixel in the compressed image.  An observation: 0.25-0.5 bits/pixel: moderate to good quality; 0.5-0.75 bits/pixel: good to very good quality; 0.75-1.5 bits/pixel: excellent quality; 1.5-2.0 bits/pixel: usually indistinguishable from the original.
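Both measures are simple ratios, as this sketch shows. The image size and byte counts in the example are made-up illustrative figures, and the function names are hypothetical.

```python
def compression_ratio(original_bytes, compressed_bytes):
    """CR = original data size / compressed data size."""
    return original_bytes / compressed_bytes


def bits_per_pixel(compressed_bytes, width, height):
    """N_b = bits in the compressed image per pixel."""
    return compressed_bytes * 8 / (width * height)


# Hypothetical 640 x 480 image, 24 bits per pixel, compressed to 38,400 bytes
original = 640 * 480 * 3
compressed = 38400
print(compression_ratio(original, compressed), bits_per_pixel(compressed, 640, 480))
```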

47 Compression and picture quality  [Figure panels: Original; DC only, 0.19 bpp; DC + 1-2 AC, 0.43 bpp; DC + 1-9 AC, 0.96 bpp]

48 Lossless mode of JPEG compression  A special case of JPEG where there is no loss in the encoding process.  In this mode, the picture-processing and quantization steps are replaced by a predictive technique instead of transform coding.  Neighboring pixels are taken as predictors, and the difference between the predicted and the actual values is encoded using Huffman methods.

49 Lossless JPEG  [Figure: source data → predictor → entropy encoder (with table specification) → compressed data; histogram labels: high entropy, low entropy]

50 Lossless JPEG  In most cases, pixel values do not vary by much except at intensity/color edges. The differences have small values in most regions of the image. Effective entropy compression is possible.

51 Lossless JPEG  For each pixel, the predictor uses a linear combination of previously encoded neighbors. The typical predictor functions used are:  [Figure: the pixel whose value is to be predicted, and the contribution of the neighboring pixels a, b, c to the prediction]  E.g., with predictor P1, predicted value = 0*a + 1*b + 0*c

52 Lossless JPEG  2D predictors (4-7) usually do better than 1D predictors. (P0 is “no prediction”)  Typical compression ratio achieved is about 2:1.
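To see why prediction helps entropy coding, the sketch below compares the entropy of a row of pixel values with the entropy of its prediction residuals. It assumes the simplest 1D predictor (the previously encoded neighbor in the same row) and a made-up smooth row; the names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy in bits per symbol of a list of values."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def neighbor_residuals(row):
    """Keep the first pixel; replace every other pixel by (actual - previous pixel)."""
    return [row[0]] + [row[i] - row[i - 1] for i in range(1, len(row))]


row = [100, 101, 102, 103, 104, 105, 106, 107]   # smooth gradient, illustration only
residuals = neighbor_residuals(row)
print(residuals)
print(entropy(row), entropy(residuals))
```

The raw row has eight distinct values, while the residuals are almost all identical, so their histogram is sharply peaked and the entropy is much lower.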

53 Sequential encoding  In sequential encoding, the whole image is encoded and decoded in a single run. It allows decoding with immediate presentation, but in a top-to-bottom sequence.

54 Progressive encoding  Progressive mode encodes and reconstructs the image with a very rough representation, and refines it during successive steps. Also known as layered coding.

55 Successive refinement  Two ways to achieve successive refinement:  Spectral selection: send the DC component for entropy encoding, then the first few ACs, then some more ACs, etc.  Successive approximation: send all DCT coefficients in each run, but process the individual bits of a coefficient in different runs. The most-significant bits are encoded first, followed by the less-significant bits.

57 Successive refinement  [Figure panels: Original; 7 MSBs of DC, 0.15 bpp; + 5 MSBs of AC, 0.3 bpp; + 7 MSBs of AC, 0.8 bpp]
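The idea behind successive approximation can be illustrated with a single coefficient. This is a simplified sketch with a hypothetical helper, not the exact procedure of the standard: it shows the decoder's estimate as magnitude bits arrive from most to least significant.

```python
def successive_approximations(coeff, total_bits=10):
    """Estimates of coeff after receiving 1, 2, ..., total_bits magnitude bits."""
    sign = -1 if coeff < 0 else 1
    magnitude = abs(coeff)
    estimates = []
    for received in range(1, total_bits + 1):
        missing = total_bits - received
        # bits not yet received are treated as zeros
        estimates.append(sign * ((magnitude >> missing) << missing))
    return estimates


print(successive_approximations(476))
```

Each additional bit moves the estimate closer to the true value, which is what produces the blur-to-clear effect of progressive display.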

58 Hierarchical mode  Down-sample by factors of 2 in both directions, e.g., reduce 640 × 480 to 320 × 240 to 160 × 120, etc.  Repeat the following process until the full-resolution image is compressed: start by encoding the smallest image; then decode and up-sample it, and encode the difference between the up-sampled image and the original image at the next resolution.

59 Hierarchical mode  [Figure: the 640 × 480 original is down-sampled to 320 × 240 and 160 × 120; each level is JPEG-compressed, uncompressed and up-sampled, and the difference from the next larger level is JPEG-compressed in turn; the decoder sums the up-sampled image and the difference. Labels: down sample, JPEG, uncompress, up sample, diff., sum, JPEG file]
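The layer structure of hierarchical mode can be sketched as follows. To keep the example short, the JPEG encode/decode of each layer is left out (treated as lossless), down-sampling is done by 2 × 2 averaging and up-sampling by pixel replication; these choices and all names are assumptions of the sketch.

```python
def downsample(img):
    """Halve both dimensions by averaging each 2 x 2 group (even sizes assumed)."""
    return [[(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4
             for c in range(0, len(img[0]), 2)]
            for r in range(0, len(img), 2)]


def upsample(img):
    """Double both dimensions by repeating every pixel."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.extend([wide, list(wide)])
    return out


def hierarchical_layers(img, levels):
    """Return [smallest image, difference layer, ..., full-resolution difference]."""
    pyramid = [img]
    for _ in range(levels):
        pyramid.append(downsample(pyramid[-1]))
    layers = [pyramid[-1]]
    for finer, coarser in zip(reversed(pyramid[:-1]), reversed(pyramid[1:])):
        up = upsample(coarser)
        layers.append([[f - u for f, u in zip(f_row, u_row)]
                       for f_row, u_row in zip(finer, up)])
    return layers


def reconstruct(layers):
    """Rebuild the full image: up-sample, add the next difference layer, repeat."""
    current = layers[0]
    for diff in layers[1:]:
        up = upsample(current)
        current = [[u + d for u, d in zip(u_row, d_row)]
                   for u_row, d_row in zip(up, diff)]
    return current


image = [[4 * r + c for c in range(4)] for r in range(4)]
layers = hierarchical_layers(image, 2)
print(layers[0], reconstruct(layers) == image)
```

A decoder that only needs a low resolution can stop after the first layer or two.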

60 Hierarchical mode  Since the original image is encoded at several resolutions, more storage is required. Advantage: the picture is immediately available at different resolutions. Scaling is cheap when the display system works only at a lower resolution.

61 Wavelet coding  used in JPEG 2000  consider a one-dimensional array of values: 101, 102, 103, 104, 105, 106, 107, 108  we can represent these values by averages of sums and differences: pair-wise sums: (101+102)/2; (103+104)/2; (105+106)/2; (107+108)/2 pair-wise diffs: (101-102)/2; (103-104)/2; (105-106)/2; (107-108)/2  put these sums and differences into a sequence: 101 ½, 103 ½, 105 ½, 107 ½, -1/2, -1/2, -1/2, -1/2

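62 Wavelet transform  Note that the original values can be reconstructed from the sums and differences: adding a difference to its average gives the first value of the pair (101 ½ + (-1/2) = 101), and subtracting it gives the second (101 ½ - (-1/2) = 102).  [Figure: 101 ½, 103 ½, 105 ½, 107 ½, -1/2, -1/2, -1/2, -1/2 → 101, 102, 103, 104, 105, 106, 107, 108 by addition and subtraction]

63 Wavelet transform  Note that if we replace the four -1/2’s by 0’s, the recovered sequence is not too far off from the original: 101 ½, 103 ½, 105 ½, 107 ½, 0, 0, 0, 0 → 101 ½, 101 ½, 103 ½, 103 ½, 105 ½, 105 ½, 107 ½, 107 ½.  Hence, quantization and RLE could be applied to effectively reduce the size of the sequence.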
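One level of this averages-and-differences scheme, together with the effect of zeroing the differences, can be sketched as follows (the function names are hypothetical):

```python
def forward_step(values):
    """Pair-wise averages and half-differences of an even-length sequence."""
    pairs = list(zip(values[0::2], values[1::2]))
    averages = [(a + b) / 2 for a, b in pairs]
    differences = [(a - b) / 2 for a, b in pairs]
    return averages, differences


def inverse_step(averages, differences):
    """Rebuild each pair as (average + difference, average - difference)."""
    values = []
    for avg, diff in zip(averages, differences):
        values.extend([avg + diff, avg - diff])
    return values


data = [101, 102, 103, 104, 105, 106, 107, 108]
averages, differences = forward_step(data)
print(inverse_step(averages, differences))           # exact reconstruction
print(inverse_step(averages, [0] * len(averages)))   # differences replaced by zeros
```

With the differences zeroed, every reconstructed value is off by only 1/2, while the second half of the sequence becomes a run of zeros that RLE handles well.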

64 Wavelet transform  Recursively apply the idea to the averages:
101, 102, 103, 104, 105, 106, 107, 108
101 ½, 103 ½, 105 ½, 107 ½, -1/2, -1/2, -1/2, -1/2
102 ½, 106 ½, -1, -1, -1/2, -1/2, -1/2, -1/2
104 ½, -2, -1, -1, -1/2, -1/2, -1/2, -1/2

65 Wavelet transform  In the final sequence, the first value (104 ½) is the overall average. 2nd-level details (-2): recover the average values of the first half and the second half (102 ½, 106 ½).

66 Wavelet transform  3rd-level details (-1, -1): recover the average values of the first pair, second pair, third pair, and final pair (101 ½, 103 ½, 105 ½, 107 ½).

67 Wavelet transform  Full details (-1/2, -1/2, -1/2, -1/2): recover all values (101, 102, 103, 104, 105, 106, 107, 108).
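The recursive decomposition worked through on these slides can be sketched as below. The names are hypothetical, and the input length is assumed to be a power of two.

```python
def wavelet_transform(values):
    """Repeatedly replace the leading averages by their averages and differences."""
    out = [float(v) for v in values]
    n = len(out)
    while n > 1:
        pairs = list(zip(out[0:n:2], out[1:n:2]))
        averages = [(a + b) / 2 for a, b in pairs]
        differences = [(a - b) / 2 for a, b in pairs]
        out[:n] = averages + differences
        n //= 2                      # next pass works on the averages only
    return out


def wavelet_inverse(coeffs):
    """Undo wavelet_transform level by level, coarsest level first."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        merged = []
        for avg, diff in zip(out[:n], out[n:2 * n]):
            merged.extend([avg + diff, avg - diff])
        out[:2 * n] = merged
        n *= 2
    return out


data = [101, 102, 103, 104, 105, 106, 107, 108]
coeffs = wavelet_transform(data)
print(coeffs)
print(wavelet_inverse(coeffs))
```

Stopping the inverse early, or zeroing the later coefficients, yields the coarser approximations described on the slides.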

apply wavelet transform to each row of pixels

averages diff’s

apply wavelet transform to each column of pixels

more important data less important data

74 JPEG vs. JPEG 2000  [Figure panels: original; JPEG 2000 at 0.27 bpp; JPEG at 0.27 bpp. Author: Christopher M. Brislawn]

75 JPEG vs. JPEG 2000  [Figure panels: original; JPEG 2000 at 1 bpp; JPEG at 1 bpp]

76 JPEG vs. JPEG 2000  [Figure panels: original; JPEG 2000 at 0.5 bpp; JPEG at 0.5 bpp]
