MP3 and AAC Trac D. Tran ECE Department The Johns Hopkins University Baltimore MD 21218.

MP3
- MP3 = MPEG-1/MPEG-2 Audio Layer III coding
- Transform: cascade of a 32-channel filter bank and a 6-channel or 18-channel MDCT
- Quantization: uniform scalar quantizer with a psycho-acoustic model
- Entropy coding: run-length + Huffman

Transformation Stage in MP3
[Figure: x[n] feeds a 32-channel, 512-tap CMFB with decimation by 32; each subband output is then split by a 6-channel, 12-tap MLT/MDCT (transients) or an 18-channel, 36-tap MLT/MDCT (steady-state)]

Masking
- Masking was discovered through psycho-acoustic experiments
- The human auditory system is less sensitive to components near a strong tonal signal

Masking: Original Signal

Masking Threshold
- Signal components below the masking threshold are deemed insignificant (can be quantized to zero)
- Components are computed from overlapping 1024-sample Hann windows

Advanced Audio Coding (AAC)
- Successor of MP3
- Better audio quality than MP3 at most bit rates
- Perceptually lossless at 320 kbps for 5-channel surround sound (64 kbps/channel)
- Almost CD quality at 96 kbps (48 kbps/channel)
- AAC is part of the MPEG-4 standard
- Default audio format of Apple's iPhone, iPod, iTunes; Sony PlayStation 3; Nintendo Wii
- MDCT – Scalar Quantization – Huffman Coding

Transformation Stage in AAC
[Figure: x[n] feeds either a 128-channel, 256-tap MDCT (for transient signals) or a 1024-channel, 2048-tap MDCT (for steady-state signals)]
- AAC adaptively switches between
  - 8 blocks of 128-point MDCT with 256-point windows
  - 1 block of 1024-point MDCT with a 2048-point window
- All windows have 50% overlap
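To make the 50% overlap concrete, here is a minimal sketch of a windowed MDCT with overlap-add reconstruction. It is not AAC's implementation: the sine window, the direct O(N^2) sums, and the function names (`sine_window`, `mdct`, `imdct`, `analyze_synthesize`) are assumptions chosen for illustration, and the toy block size stands in for AAC's 128 or 1024 coefficients per block.

```python
import math

def sine_window(length):
    """Sine window of 2N samples; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block, window):
    """Map 2N windowed samples to N coefficients (direct form)."""
    n_coef = len(block) // 2
    shift = 0.5 + n_coef / 2
    return [
        sum(block[n] * window[n]
            * math.cos(math.pi / n_coef * (n + shift) * (k + 0.5))
            for n in range(2 * n_coef))
        for k in range(n_coef)
    ]

def imdct(coeffs, window):
    """Map N coefficients back to 2N windowed, time-aliased samples."""
    n_coef = len(coeffs)
    shift = 0.5 + n_coef / 2
    return [
        window[n] * (2.0 / n_coef)
        * sum(coeffs[k] * math.cos(math.pi / n_coef * (n + shift) * (k + 0.5))
              for k in range(n_coef))
        for n in range(2 * n_coef)
    ]

def analyze_synthesize(signal, n_coef):
    """Transform blocks of 2N samples with hop N, then overlap-add."""
    window = sine_window(2 * n_coef)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_coef + 1, n_coef):
        block = signal[start:start + 2 * n_coef]
        for i, v in enumerate(imdct(mdct(block, window), window)):
            out[start + i] += v
    return out

# Toy size (assumption): N = 8 coefficients per block, 16-sample windows.
signal = [math.sin(0.3 * i) + 0.1 * i for i in range(64)]
rebuilt = analyze_synthesize(signal, 8)
```

Each block of 2N samples yields only N coefficients, so the transform stays critically sampled even though neighboring windows overlap by half; the aliasing in one block is cancelled by its neighbors. Only samples covered by two blocks (here indices 8 to 55) are reconstructed.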

JPEG Still Image Coding Standard Trac D. Tran ECE Department The Johns Hopkins University Baltimore MD 21218

Overall Structure of JPEG
[Figure: block diagram. Color Converter -> Level Offset -> 8x8 DCT -> Uniform Quant., followed by a DC path (DC Pred. -> DC VLC) and an AC path (Zigzag Scan -> Run-Level -> AC VLC)]
- Color converter: RGB to YUV
- Level offset: subtract 2^(N-1), where N is the number of bits per pixel
- Quantization: different step size for different coefficients
- DC: predict from the DC of the previous block
- AC:
  - Zigzag scan to get 1-D data
  - Run-level: joint coding of each non-zero coefficient and the number of zeros before it

JPEG Quantization
- Uniform mid-tread quantizer
- Larger step sizes for chroma components
- Different coefficients have different step sizes
  - Smaller steps for low-frequency coefficients (more bits)
  - Larger steps for high-frequency coefficients (fewer bits)
  - The human visual system is less sensitive to errors at high frequencies
- Actual step size: scale the basic table by a quality factor
Luma Quantization Table:
16 11 10 16 24 40 51 61
12 12 14 19 26 58 60 55
14 13 16 24 40 57 69 56
14 17 22 29 51 87 80 62
18 22 37 56 68 109 103 77
24 35 55 64 81 104 113 92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103 99
Chroma Quantization Table:
17 18 24 47 99 99 99 99
18 21 26 66 99 99 99 99
24 26 56 99 99 99 99 99
47 66 99 99 99 99 99 99
(rows 5 to 8: all entries are 99)

Scaling of Quantization Table
- Actual Q table = scaling x Basic Q table:
  - quality factor ≤ 50: scaling = 50 / quality
  - quality factor > 50: scaling = 2 - quality / 50
Quality Factor   Scaling
----------------------------------
10               5.0
20               2.5
50               1.0
75               0.5
Basic (luma) Q table:
16 11 10 16 24 40 51 61
12 12 14 19 26 58 60 55
14 13 16 24 40 57 69 56
14 17 22 29 51 87 80 62
18 22 37 56 68 109 103 77
24 35 55 64 81 104 113 92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103 99
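The scaling rule can be written out as a short sketch. The function names (`scaling_factor`, `scaled_table`) are hypothetical, and two details are assumptions not stated on the slide: scaled steps are rounded to integers, and they are clamped to at least 1 so that a quality factor of 100 (scaling 0) does not give a zero step. The table in the code is the example luminance table from the JPEG standard, whose first row ends in 61.

```python
BASIC_LUMA_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def scaling_factor(quality):
    """Scaling applied to the basic table for a quality factor in 1..100."""
    if not 1 <= quality <= 100:
        raise ValueError("quality factor must be between 1 and 100")
    if quality <= 50:
        return 50.0 / quality
    return 2.0 - quality / 50.0

def scaled_table(quality, basic=BASIC_LUMA_Q):
    """Scale every step size; rounding and the floor of 1 are assumptions."""
    s = scaling_factor(quality)
    return [[max(1, round(step * s)) for step in row] for row in basic]

for q in (10, 20, 50, 75):
    print(q, scaling_factor(q))
```

A lower quality factor gives a larger scaling and therefore coarser quantization steps.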

DC Prediction
- DC coefficient: proportional to the average of a block
- DC coefficients of neighboring blocks are still similar to each other: redundancy
- The redundancy can be removed by differential coding:
  - e(n) = DC(n) - DC(n-1)
  - Only the prediction error e(n) is encoded
[Figure: 8x8 DC coeffs of Lena]
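A minimal sketch of the differential coding step, with hypothetical function names and an assumed initial prediction of 0 for the first block (the slide does not say how the first block is predicted):

```python
def dc_differences(dc_values, initial_prediction=0):
    """Return e(n) = DC(n) - DC(n-1) for a sequence of block DC values."""
    errors = []
    previous = initial_prediction
    for dc in dc_values:
        errors.append(dc - previous)
        previous = dc
    return errors

def dc_reconstruct(errors, initial_prediction=0):
    """Invert dc_differences by accumulating the prediction errors."""
    values = []
    previous = initial_prediction
    for e in errors:
        previous += e
        values.append(previous)
    return values

errors = dc_differences([5, 8, 7, 7])
```

Because neighboring DC values are similar, the errors cluster near zero and fall into the small, cheaply coded categories.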

Coefficient Category
- Divide coefficients into categories of exponentially increasing sizes
- Use a Huffman code to encode the category ID
- Use a fixed-length code within each category
- Similar to the Exponential-Golomb code
Ranges                              Range Size   DC Cat. ID   AC Cat. ID
--------------------------------------------------------------------------
0                                   1            0            N/A
-1, 1                               2            1            1
-3, -2, 2, 3                        4            2            2
-7, -6, -5, -4, 4, 5, 6, 7          8            3            3
-15, …, -8, 8, …, 15                16           4            4
-31, …, -16, 16, …, 31              32           5            5
-63, …, -32, 32, …, 63              64           6            6
…                                   …            …            …
[-32767, -16384], [16384, 32767]    32768        15
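The category of a value is the number of bits needed for its magnitude, which is why the range sizes double from one category to the next. A small sketch (function names are hypothetical):

```python
def category(value):
    """Category ID: number of bits in the magnitude of the coefficient."""
    return abs(value).bit_length()

def category_values(cat):
    """All values belonging to a category (kept small for illustration)."""
    if cat == 0:
        return [0]
    low, high = 1 << (cat - 1), (1 << cat) - 1
    return list(range(-high, -low + 1)) + list(range(low, high + 1))

for cat in range(4):
    print(cat, category_values(cat))
```

Category `cat` holds 2^cat values, so a fixed-length index of `cat` bits is enough to pick one value within it.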

Coding of DC Coefficients
- Encode e(n) = DC(n) - DC(n-1)
DC Cat.   Prediction Errors              Base Codeword
--------------------------------------------------------
0         0                              010
1         -1, 1                          011
2         -3, -2, 2, 3                   100
3         -7, -6, -5, -4, 4, 5, 6, 7     00
4         -15, …, -8, 8, …, 15           101
5         -31, …, -16, 16, …, 31         110
6         -63, …, -32, 32, …, 63         1110
…         …                              …
- Our example: DC = 8; assume the previous DC = 5
  - e = 8 - 5 = 3, so Cat. = 2 and index = 3
  - Bitstream: base codeword 100 followed by the 2-bit index 11, giving 10011
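The example can be reproduced with a short sketch that uses the base codewords listed on this slide for categories 0 to 6. The function name is hypothetical. The slide only shows a positive error; the index used for negative errors here (e + 2^cat - 1, the usual JPEG convention) is an assumption.

```python
# Base codewords for DC categories 0..6, as listed on the slide.
BASE_CODEWORD = {0: "010", 1: "011", 2: "100", 3: "00",
                 4: "101", 5: "110", 6: "1110"}

def encode_dc_difference(e):
    """Return the bit string for one DC prediction error."""
    cat = abs(e).bit_length()
    base = BASE_CODEWORD[cat]          # KeyError beyond category 6
    if cat == 0:
        return base                    # zero needs no index bits
    # Positive errors use their own value as index; the mapping for
    # negative errors is an assumption (standard JPEG convention).
    index = e if e > 0 else e + (1 << cat) - 1
    return base + format(index, "0{}b".format(cat))

bits = encode_dc_difference(8 - 5)
```

For e = 3 the category is 2, the base codeword is 100, and the index 3 is written in 2 bits as 11.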

Coding of AC Coefficients
- Most non-zero coefficients are in the upper-left corner
- Zigzag scanning
- Example:
    8   24   -2    0    0    0    0    0
  -31   -4    6   -1    0    0    0    0
    0  -12   -1    2    0    0    0    0
    0    0   -2   -1    0    0    0    0
    0    0    0    0    0    0    0    0
  (remaining rows omitted; all later coefficients are zero, as the EOB indicates)
- Zigzag scanning result (DC is coded separately):
  24 -31 0 -4 -2 0 6 -12 0 0 0 -1 -1 0 0 0 2 -2 0 0 0 0 0 -1 EOB
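A sketch of the zigzag scan and the run-level pairing for this example block. The function names are hypothetical, and the rows missing from the transcript are assumed to be all zero, which is consistent with the EOB.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zigzag order."""
    def key(rc):
        r, c = rc
        diagonal = r + c
        # Odd diagonals run top-right to bottom-left, even ones the reverse.
        return (diagonal, r if diagonal % 2 else -r)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)

def run_level(block):
    """(zero run, level) pairs for the AC coefficients; EOB is implied."""
    scan = [block[r][c] for r, c in zigzag_order(len(block))][1:]  # skip DC
    pairs, run = [], 0
    for value in scan:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs  # trailing zeros are dropped and replaced by EOB

BLOCK = [
    [8, 24, -2, 0, 0, 0, 0, 0],
    [-31, -4, 6, -1, 0, 0, 0, 0],
    [0, -12, -1, 2, 0, 0, 0, 0],
    [0, 0, -2, -1, 0, 0, 0, 0],
] + [[0] * 8 for _ in range(4)]  # assumption: remaining rows are zero

ac_scan = [BLOCK[r][c] for r, c in zigzag_order()][1:25]
pairs = run_level(BLOCK)
```

Each pair says how many zeros precede a non-zero coefficient, so long zero runs cost one symbol instead of many.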

A Complete Example
- Original data:
  124 125 122 120 122 119 117 118
  121 121 120 119 119 120 120 118
  126 124 123 122 121 121 120 120
  124 124 125 125 126 125 124 124
  127 127 128 129 130 128 127 125
  143 142 143 142 140 139 139 139
  150 148 152 152 152 152 150 151
  156 159 158 155 158 158 157 156
- 2-D DCT:
    39.8   6.5  -2.2   1.2  -0.3  -1.0   0.7   1.1
  -102.4   4.5   2.2   1.1   0.3  -0.6  -1.0  -0.4
    37.7   1.3   1.7   0.2  -1.5  -2.2  -0.1   0.2
    -5.6   2.2  -1.3  -0.8   1.4   0.2  -0.1   0.1
    -3.3  -0.7  -1.7   0.7  -0.6  -2.6  -1.3   0.7
     5.9  -0.1  -0.4  -0.7   1.9  -0.2   1.4   0.0
     3.9   5.5   2.3  -0.5  -0.1  -0.8  -0.5  -0.1
    -3.4   0.5  -1.0   0.8   0.9   0.0   0.3   0.0
- Quantized by basic table (Q table: 16 11 … / 12 … / 14 …):
   2  1  0  0  0  0  0  0
  -9  0  0  0  0  0  0  0
   3  0  0  0  0  0  0  0
   0  0  0  0  0  0  0  0
  (remaining rows are all zero)
  - floor(39.8/16 + 0.5) = 2
  - floor(6.5/11 + 0.5) = 1
  - -floor(102.4/12 + 0.5) = -9
  - floor(37.7/14 + 0.5) = 3
- Zigzag scanning: 2 1 -9 3 EOB
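The forward path of this example can be checked with a small pure-Python sketch. It assumes an orthonormal 8x8 DCT-II applied after the level offset of 128 (8 bits per pixel), and uses the example luminance table from the JPEG standard as the basic table; the function names are hypothetical and the direct O(N^4) DCT is for clarity, not speed.

```python
import math

N = 8
ORIGINAL = [
    [124, 125, 122, 120, 122, 119, 117, 118],
    [121, 121, 120, 119, 119, 120, 120, 118],
    [126, 124, 123, 122, 121, 121, 120, 120],
    [124, 124, 125, 125, 126, 125, 124, 124],
    [127, 127, 128, 129, 130, 128, 127, 125],
    [143, 142, 143, 142, 140, 139, 139, 139],
    [150, 148, 152, 152, 152, 152, 150, 151],
    [156, 159, 158, 155, 158, 158, 157, 156],
]
BASIC_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def scale(k):
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

def basis(i, k):
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))

def dct2(block):
    """Orthonormal 2-D DCT-II; u indexes rows, v indexes columns."""
    return [[scale(u) * scale(v)
             * sum(block[x][y] * basis(x, u) * basis(y, v)
                   for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def quantize(coeffs, table):
    """Mid-tread quantizer: sign(c) * floor(|c| / step + 0.5)."""
    def level(c, step):
        magnitude = math.floor(abs(c) / step + 0.5)
        return magnitude if c >= 0 else -magnitude
    return [[level(coeffs[u][v], table[u][v]) for v in range(N)]
            for u in range(N)]

shifted = [[pixel - 128 for pixel in row] for row in ORIGINAL]  # level offset
coefficients = dct2(shifted)
quantized = quantize(coefficients, BASIC_Q)
```

Only four coefficients survive quantization, which is why the zigzag scan ends after `2 1 -9 3` with an EOB.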

A Complete Example (continued)
- Zigzag scanning: 2 1 -9 3 EOB
- Inverse quantization:
    32  11   0   0   0   0   0   0
  -108   0   0   0   0   0   0   0
    42   0   0   0   0   0   0   0
     0   0   0   0   0   0   0   0
  (remaining rows are all zero)
- Reconstructed block:
  122 122 121 121 120 119 119 118
  121 121 120 119 119 118 117 117
  120 120 120 119 118 117 117 117
  123 123 122 122 121 120 120 120
  131 130 130 129 128 128 127 127
  142 141 141 140 139 139 138 138
  153 152 152 151 150 150 149 149
  159 159 159 158 157 157 156 156
- MSE: 5.67
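The decoder side of the example can be sketched the same way: multiply each level by its step size, apply the inverse DCT, and add back the level offset. As before, this assumes an orthonormal 8x8 DCT, an offset of 128, and the example luminance table from the JPEG standard; names are hypothetical.

```python
import math

N = 8
BASIC_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

# Quantized levels from the example: 2 1 -9 3 EOB in zigzag order.
QUANTIZED = [[0] * N for _ in range(N)]
QUANTIZED[0][0], QUANTIZED[0][1] = 2, 1
QUANTIZED[1][0], QUANTIZED[2][0] = -9, 3

def scale(k):
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

def basis(i, k):
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))

def dequantize(levels, table):
    """Inverse quantization: level times step size."""
    return [[levels[u][v] * table[u][v] for v in range(N)] for u in range(N)]

def idct2(coeffs):
    """Inverse of the orthonormal 2-D DCT-II."""
    return [[sum(scale(u) * scale(v) * coeffs[u][v]
                 * basis(x, u) * basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

dequantized = dequantize(QUANTIZED, BASIC_Q)
reconstructed = [[int(round(value + 128)) for value in row]  # undo offset
                 for row in idct2(dequantized)]
```

The rounding to the nearest integer at the end is an assumption; it matches the first row of the reconstructed block on the slide.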

Progressive JPEG
- Baseline JPEG encodes the image block by block:
  - The decoder has to wait until the end to decode and display the entire image
- Progressive: code the DCT coefficients in multiple scans
  - The first scan generates a low-quality version of the entire image
  - Subsequent scans refine the entire image gradually
- Two procedures defined in JPEG:
  - Spectral selection:
    - Divide all DCT coefficients into several bands (low, middle, high frequency subbands…)
    - Bands are coded in separate scans
  - Successive approximation:
    - Send the MSBs of all coefficients first
    - Send the less significant bits in subsequent scans
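The two procedures can be illustrated on one block's zigzag-ordered coefficients. This is a simplified sketch, not the JPEG bitstream format: the band boundaries, the number of deferred bits, and the function names are assumptions, and signs are sent up front here for simplicity.

```python
def spectral_selection(zigzag, band_edges):
    """Split zigzag-ordered coefficients into bands, one band per scan."""
    return [zigzag[low:high + 1] for low, high in band_edges]

def split_bit_planes(coeffs, low_bits):
    """First scan: signs and high-order bits; later scans: one bit each."""
    signs = [-1 if c < 0 else 1 for c in coeffs]
    coarse = [abs(c) >> low_bits for c in coeffs]
    planes = [[(abs(c) >> bit) & 1 for c in coeffs]
              for bit in range(low_bits - 1, -1, -1)]
    return signs, coarse, planes

def merge_bit_planes(signs, coarse, planes):
    """Refine the coarse magnitudes with each received bit plane."""
    magnitudes = list(coarse)
    for plane in planes:
        magnitudes = [(m << 1) | b for m, b in zip(magnitudes, plane)]
    return [s * m for s, m in zip(signs, magnitudes)]

# AC coefficients of the earlier example block, in zigzag order.
AC = [24, -31, 0, -4, -2, 0, 6, -12, 0, 0, 0, -1, -1,
      0, 0, 0, 2, -2, 0, 0, 0, 0, 0, -1]

bands = spectral_selection(AC, [(0, 4), (5, 13), (14, 23)])  # assumed bands
signs, coarse, planes = split_bit_planes(AC, low_bits=2)
first_pass = merge_bit_planes(signs, coarse, [])             # low quality
refined = merge_bit_planes(signs, coarse, planes)            # all scans
```

After the first scan the decoder only knows `coarse`, a low-precision version of every coefficient; each further bit plane doubles the precision until the original levels are recovered.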

JPEG Coding Result for Lena
[Figure: Lena coded at quality factors 5, 25, 50, 75, and 90; the QF 25 and QF 5 panels are annotated "Blocking artifact"]

Summary
- Transformation
  - Karhunen-Loeve Transform (KLT): optimal linear transform
  - Discrete Cosine Transform (DCT): for images & video
  - MDCT: overlapped transform with higher frequency resolution, for audio
  - Discrete Wavelet Transform (DWT): multi-resolution representation
- MP3 & AAC
  - Audio coding: FB/MDCT – Quantization – Huffman
- JPEG: first international compression standard for still images
  - DCT – Quantization – Run-length – Huffman
- JPEG2000: latest technology, wavelet-based
  - Scalable, progressive coding with flexible intelligent functionalities
