Published by Jennifer Fisher. Modified over 3 years ago.

1
Rate-distortion Optimization for MP3 and AAC Audio Coding with Complete Decoder Compatibility
Jingming Xu
Multimedia Communications Lab, University of Waterloo

2
September 16th
Outline
- Introduction and motivation
- MP3, AAC, and Two-nested-loop Search
- Rate-distortion optimization for MP3
- Rate-distortion optimization for AAC
- Conclusions and future research

3
Introduction
- Audio coding differs from universal data compression:
  - Long-term correlations
  - Multi-channel correlations
  - Subject to natural noise
  - Subjective perceptual quality judgement
- Audio coding methods, for both lossy and lossless coding:
  - Linear prediction
  - Time-frequency mapping (DCT, FFT, MDCT, etc.)
  - Parameter coding
  - …

4
Introduction (2)
MPEG: the most successful audio coding standard series so far
- MPEG-1 (1992): T/F mapping based, three layers of increasing complexity
- MPEG-2 BC (1994): backward compatible with MPEG-1, with multi-channel and sampling frequency extensions
- MPEG-2 AAC (1997): introduces more coding tools and gives up backward compatibility to improve quality
- MPEG-4 AAC (1999): inherited from MPEG-2 AAC, with TwinVQ and bitrate scalability extensions
MPEG-1 Layer 3 and MPEG-2 BC Layer 3 define the popular MP3.

5
Introduction (3)
Motivations
- MP3 and AAC leave the design of the structured encoding blocks open for performance enhancement.
- The state-of-the-art MP3 and AAC quantization and entropy coding scheme, Two-nested-loop Search (TNLS), is essentially unable to exploit the full standard-constrained flexibility for the best rate-distortion tradeoff.
- MP3 and AAC have been hugely successful in the digital audio industry.

6
Introduction (4)
Quality evaluation of compressed audio
- Most widely used objective measure: noise-to-mask ratio
- Most widely used subjective measure: ITU listening test (ITU-R Recommendation BS.1116)
  - Triple stimulus (A, B, C) with hidden reference, double blind
  - 5-grade impairment scale

7
MP3 and AAC audio coding standards
Encoding process
- Window switching
- Stereo coding
- Pre-processing in AAC: gain control, prediction, noise shaping and substitution, etc.

8
MP3 and AAC audio coding standards (2)
Quantization and entropy coding in MP3
- Scale factor bands and non-uniform quantization
- scale_factor values are encoded with a fixed number of bits in the side information and a variable number of bits in the main_data stream
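As a rough illustration of the non-uniform quantization mentioned on this slide, the sketch below compresses each spectral magnitude with a 3/4 power before rounding and expands it with a 4/3 power on decoding. The function names and the single `step_exp` parameter are assumptions made for this example; in a real MP3 bitstream the effective step size combines the global gain with the per-band scale factors.

```python
import math

def quantize(xr, step_exp):
    """Quantize one spectral value with an MP3-style power-law quantizer.

    step_exp is a hypothetical combined step exponent; the step size is
    2 ** (step_exp / 4).
    """
    scaled = abs(xr) / 2.0 ** (step_exp / 4.0)
    # Compress with the 3/4 power, apply the rounding offset, round to nearest.
    level = int(math.floor(scaled ** 0.75 - 0.0946 + 0.5))
    return level if xr >= 0 else -level

def dequantize(level, step_exp):
    """Inverse mapping: expand the magnitude with the 4/3 power and rescale."""
    magnitude = abs(level) ** (4.0 / 3.0) * 2.0 ** (step_exp / 4.0)
    return magnitude if level >= 0 else -magnitude
```

Because of the power law, large coefficients are quantized more coarsely than small ones for the same step exponent.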

9
MP3 and AAC audio coding standards (3)
Quantization and entropy coding in MP3
- Huffman coding
  - 34 fixed Huffman codebooks
  - Huffman coding region division: each region is coded with a different codebook that best matches the statistics of that region.
  - big_value, count_1, zero, …

10
MP3 and AAC audio coding standards (4)
Quantization and entropy coding in AAC
- Non-uniform quantizer: same as in MP3
- scale_factor values are differentially encoded, relative to the value of the preceding band, with a fixed Huffman codebook
- Huffman coding
  - 12 fixed Huffman codebooks
  - Huffman coding region division: section boundaries can only be at scale factor band boundaries
  - For each section, the length of the section in scale factor bands and the index of the codebook used for that section are transmitted with a fixed number of bits.

11
Two-nested-loop Search algorithm
[Figure: Inner Loop and Outer Loop]

12
Two-nested-loop Search algorithm (2)
Problems in TNLS
- Quantization, scale factor adaptation, and Huffman coding are considered separately.
- It has no convergence guarantee.
- It does not aim to minimize the overall distortion.
- It disregards the inter-band correlations of scale factors and Huffman codebook selection in AAC.
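To make the criticism concrete, here is a minimal sketch of a two-nested-loop search in the usual sense: an inner rate loop that coarsens a global step until the frame fits the bit budget, and an outer distortion loop that amplifies bands whose noise exceeds the mask. Everything in it is an assumption for illustration: `toy_bits` stands in for the real Huffman tables, each scale factor increment is modelled as two quarter-steps of finer quantization, and the stopping heuristics are simplified. It is not the ISO reference implementation.

```python
import math

def toy_bits(level):
    """Hypothetical rate model (not a real Huffman table); zeros are free."""
    return 0 if level == 0 else 2 * ((abs(level) + 1).bit_length() - 1) + 2

def quantize(band, step_exp):
    step = 2.0 ** (step_exp / 4.0)
    return [int(math.copysign(math.floor((abs(x) / step) ** 0.75 + 0.4054), x))
            for x in band]

def band_noise(band, levels, step_exp):
    """Mean squared quantization error of one band."""
    step = 2.0 ** (step_exp / 4.0)
    recon = [math.copysign(abs(q) ** (4.0 / 3.0) * step, q) for q in levels]
    return sum((x - r) ** 2 for x, r in zip(band, recon)) / len(band)

def tnls(bands, masks, bit_budget, max_outer=16):
    """Return (global_step, scale_factors, levels); bit_budget must be >= 0."""
    scalefac = [0] * len(bands)
    for outer in range(max_outer):
        # Inner loop (rate control): coarsen the global step until bits fit.
        global_step = 0
        while True:
            steps = [global_step - 2 * sf for sf in scalefac]
            levels = [quantize(b, s) for b, s in zip(bands, steps)]
            if sum(toy_bits(q) for lv in levels for q in lv) <= bit_budget:
                break
            global_step += 1
        # Outer loop (distortion control): find bands with audible noise.
        noisy = [i for i in range(len(bands))
                 if band_noise(bands[i], levels[i], steps[i]) > masks[i]]
        # Heuristic stops: nothing to fix, everything amplified, or out of tries.
        if not noisy or len(noisy) == len(bands) or outer == max_outer - 1:
            return global_step, scalefac, levels
        for i in noisy:
            scalefac[i] += 1
```

Note that the loop ends on heuristic conditions rather than on any measure of total cost, which is the behaviour the slide objects to.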

13
Rate-distortion optimization for MP3
Problem formulation
Lagrangian RD cost minimization, with the following quantities:
- quantized coefficients
- scale factors
- Huffman coding region division
- Huffman codebook selection
- non-uniform de-quantizer defined in MP3
- noise-to-mask ratio
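The formula on this slide did not survive conversion to text. One plausible way to write the problem, using symbols chosen here purely for illustration ($x$ for the input spectrum, $y$ for the quantized coefficients, $s$ for the scale factors, $P$ for the region division, $H$ for the codebook selection, $Q^{-1}$ for the MP3 de-quantizer, $D$ for the noise-to-mask-ratio distortion, $R$ for the bit rate, and $\lambda$ for the Lagrangian multiplier), is:

```latex
\min_{y,\,s,\,P,\,H} \; J_\lambda(y, s, P, H)
  = D\!\left(x,\; Q^{-1}(y, s)\right) + \lambda\, R(y, s, P, H)
```

Under this reading, a larger $\lambda$ favours lower rate and a smaller $\lambda$ favours lower distortion.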

14
Rate-distortion optimization for MP3 (2)
Problem formulation
Soft-decision quantization
- In conventional hard-decision quantization, the quantized coefficients are determined solely by the given scale factors, i.e., they are the output of the standard quantizer applied to the spectrum.
- In soft-decision quantization, however, the quantized coefficients are treated as a free coding parameter and are selected so that the actual RD cost is minimized. Therefore, they need not equal the hard-decision quantizer output.
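The difference between the two kinds of decision can be sketched for a single coefficient. The code below is an illustration under stated assumptions, not the presented method: `toy_bits` is a made-up rate model in place of the MP3 Huffman tables, distortion is plain squared error rather than noise-to-mask ratio, and all function names are hypothetical.

```python
import math

def toy_bits(level):
    """Hypothetical rate model standing in for real Huffman code lengths."""
    return 0 if level == 0 else 2 * ((abs(level) + 1).bit_length() - 1) + 2

def dequantize(level, step):
    return math.copysign(abs(level) ** (4.0 / 3.0) * step, level)

def rd_cost(x, level, step, lam):
    """Squared error plus lam times the (toy) number of bits."""
    return (x - dequantize(level, step)) ** 2 + lam * toy_bits(level)

def hard_decision(x, step):
    """The level is fixed by x and the step size alone."""
    magnitude = int((abs(x) / step) ** 0.75 + 0.4054)
    return magnitude if x >= 0 else -magnitude

def soft_decision(x, step, lam):
    """The level is chosen among nearby candidates to minimize the RD cost."""
    sign = 1 if x >= 0 else -1
    top = abs(hard_decision(x, step)) + 1
    candidates = [sign * m for m in range(top + 1)]
    return min(candidates, key=lambda q: rd_cost(x, q, step, lam))
```

With a small `lam` the soft decision tends to agree with the hard decision; as `lam` grows it moves toward cheaper levels, eventually zero.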

15
Rate-distortion optimization for MP3 (3)
Fixed-slope graph-based iterative RD optimization
- Step 1: Initialize a set of scale factors from the given frame of spectrum, together with a Huffman codebook (HCB) selection. Set t = 0 and specify a tolerance as the convergence criterion.
- Step 2: Given the current scale factors and HCB selection, for any t ≥ 0, find the optimal quantized spectrum and HCB region division by searching a standard-constrained graph, so that they achieve the minimum Lagrangian RD cost. Record this minimum cost for iteration t.

16
Rate-distortion optimization for MP3 (4)
[Figure: Graph Search for MP3 Quantized Spectrum and Region Division]

17
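Rate-distortion optimization for MP3 (5)
Fixed-slope graph-based iterative RD optimization
- Step 3: Given the quantized spectrum, HCB region division, and HCB selection, update the scale factors so that the new scale factors achieve the minimum RD cost.
- Step 4: Given the quantized spectrum, HCB region division, and updated scale factors, update the HCB selection so that the new selection achieves the minimum RD cost.
- Step 5: Repeat Steps 2, 3 and 4 for t = 0, 1, 2, … until the decrease in RD cost between consecutive iterations is within the tolerance, then output the final quantized spectrum, scale factors, HCB region division, and HCB selection.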
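The alternating structure of these steps can be sketched on a single band. This is a deliberately reduced illustration, not the presented algorithm: it leaves out the graph search, the region division and the codebook selection, uses squared error and a made-up `toy_bits` rate model, and searches a hypothetical range `STEP_EXPS` of step exponents. It shows only the alternation between "fix the step, choose the levels" and "fix the levels, choose the step", with the tolerance-based stopping rule.

```python
import math

STEP_EXPS = range(-8, 25)  # hypothetical search range for the step exponent

def toy_bits(level):
    """Hypothetical rate model standing in for real Huffman code lengths."""
    return 0 if level == 0 else 2 * ((abs(level) + 1).bit_length() - 1) + 2

def dequantize(level, step_exp):
    return math.copysign(abs(level) ** (4.0 / 3.0), level) * 2.0 ** (step_exp / 4.0)

def cost(band, levels, step_exp, lam):
    dist = sum((x - dequantize(q, step_exp)) ** 2 for x, q in zip(band, levels))
    return dist + lam * sum(toy_bits(q) for q in levels)

def best_levels(band, step_exp, lam):
    """Soft decision per coefficient for a fixed step exponent."""
    out = []
    for x in band:
        hard = int((abs(x) / 2.0 ** (step_exp / 4.0)) ** 0.75 + 0.4054)
        sign = -1 if x < 0 else 1
        candidates = [sign * m for m in range(hard + 2)]
        out.append(min(candidates, key=lambda q: (x - dequantize(q, step_exp)) ** 2
                       + lam * toy_bits(q)))
    return out

def optimize_band(band, lam, tol=1e-9, max_iter=50):
    step_exp = 0
    levels = best_levels(band, step_exp, lam)
    history = [cost(band, levels, step_exp, lam)]
    for _ in range(max_iter):
        # Update the step exponent with the levels held fixed.
        step_exp = min(STEP_EXPS, key=lambda s: cost(band, levels, s, lam))
        # Update the levels with the step exponent held fixed.
        levels = best_levels(band, step_exp, lam)
        history.append(cost(band, levels, step_exp, lam))
        if history[-2] - history[-1] <= tol:
            break
    return levels, step_exp, history
```

Each half-step can only keep or lower the cost, so the recorded `history` is non-increasing; as with any alternating scheme, the point it stops at may be a local optimum.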

18
Rate-distortion optimization for MP3 (6)
Simulation results: ANMR (implementation based on the ISO MP3 reference codec)
[Figure: ANMR results for violin.wav and spme50_1.wav]

19
Rate-distortion optimization for MP3 (7)
Simulation results: ANMR (implementation based on LAME best-quality mode)
[Figure: ANMR results for violin.wav and spme50_1.wav]

20
Rate-distortion optimization for MP3 (8)
Simulation results: ITU listening test (80 kb/s)

21
Rate-distortion optimization for MP3 (9)
Remarks
- The iteration may only reach a local optimum, so a well-chosen initial state is preferable when aiming for the best possible RD performance.
- The fixed-slope graph-based iterative algorithm we proposed provides a feasible solution to the problems in TNLS.
- One can adaptively adjust the value of the Lagrangian multiplier (the fixed slope) to meet rate or distortion constraints in real audio compression applications.
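One common way to adjust the multiplier toward a rate target is bisection, since the rate of a Lagrangian-optimal encoding does not increase as the multiplier grows. The slides do not say which adjustment rule was used, so the sketch below is only an assumption. `encode` is a toy per-coefficient soft-decision encoder with a made-up `toy_bits` rate model, and `find_lambda` assumes the initial upper bound `lam_hi` is large enough to meet the budget.

```python
def toy_bits(level):
    """Hypothetical rate model standing in for real Huffman code lengths."""
    return 0 if level == 0 else 2 * ((abs(level) + 1).bit_length() - 1) + 2

def encode(frame, lam, step=1.0):
    """Toy soft-decision encoder: returns (levels, total toy bits)."""
    levels = []
    for x in frame:
        hard = int((abs(x) / step) ** 0.75 + 0.4054)
        sign = -1 if x < 0 else 1

        def cost(m):
            return (abs(x) - m ** (4.0 / 3.0) * step) ** 2 + lam * toy_bits(m)

        levels.append(sign * min(range(hard + 2), key=cost))
    return levels, sum(toy_bits(q) for q in levels)

def find_lambda(frame, bit_budget, lam_hi=1e6, iterations=40):
    """Bisect for a small multiplier whose encoding fits the bit budget.

    Assumes encode(frame, lam_hi) already fits the budget.
    """
    lam_lo = 0.0
    for _ in range(iterations):
        mid = 0.5 * (lam_lo + lam_hi)
        _, bits = encode(frame, mid)
        if bits > bit_budget:
            lam_lo = mid   # too many bits: penalize rate more
        else:
            lam_hi = mid   # fits: try a smaller penalty
    return lam_hi
```

The same loop could target a distortion bound instead by testing distortion rather than bits.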

22
Rate-distortion optimization for AAC
Problem formulation
Lagrangian RD cost minimization, with the following quantities:
- scale factor sequence
- Huffman codebook index sequence
First-order inter-band dependency -> dynamic programming (Viterbi algorithm)

23
Rate-distortion optimization for AAC (2)
Fixed-slope trellis-based RD optimization
- Step 1: Build up the trellis structure. For each state in the trellis, indexed by scale factor band, scale factor, and Huffman codebook (each index running from 0 to one less than its range), find the best quantized coefficients to minimize the state's decomposed RD cost.
- Step 2: Find the optimal path through the trellis with the Viterbi algorithm.
- Step 3: Backtrack the optimal quantized coefficients, scale factor sequence, and Huffman codebook index sequence as the final output.
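Steps 2 and 3 are a standard Viterbi search, which can be sketched generically. In the sketch below, `node_cost[i][s]` is assumed to hold the already-minimized RD cost of state `s` at band `i` (the result of Step 1), and `trans_cost(p, s)` is assumed to hold the rate cost of moving from state `p` to state `s`, for example the bits for a differentially coded scale factor. Both names and the example numbers are hypothetical; in the AAC setting a state would stand for a scale factor and codebook pair.

```python
def viterbi(node_cost, trans_cost):
    """Return (total_cost, path) of the cheapest path through the trellis.

    node_cost[i][s]: cost of being in state s at stage i.
    trans_cost(p, s): cost of moving from state p to state s.
    """
    n_states = len(node_cost[0])
    best = list(node_cost[0])      # best cost of any path ending in each state
    back = []                      # back[i][s]: best predecessor of s
    for stage in node_cost[1:]:
        new_best, pointers = [], []
        for s in range(n_states):
            p = min(range(n_states), key=lambda q: best[q] + trans_cost(q, s))
            new_best.append(best[p] + trans_cost(p, s) + stage[s])
            pointers.append(p)
        best = new_best
        back.append(pointers)
    # Backtrack from the cheapest final state.
    s = min(range(n_states), key=lambda q: best[q])
    total = best[s]
    path = [s]
    for pointers in reversed(back):
        s = pointers[s]
        path.append(s)
    return total, path[::-1]

# Toy example: three bands, three states, transition cost |p - s|.
example_costs = [[4.0, 1.0, 3.0], [2.0, 5.0, 1.0], [3.0, 2.0, 4.0]]
example_total, example_path = viterbi(example_costs, lambda p, s: float(abs(p - s)))
```

Because each transition cost depends only on adjacent states, the search is linear in the number of bands and quadratic in the number of states per band.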

24
Rate-distortion optimization for AAC (3)
[Figure: Trellis Structure for AAC Quantization and Entropy Coding]

25
Rate-distortion optimization for AAC (4)
Simulation results: ANMR
- Implementation based on the ISO AAC reference codec
- Also compared with Aggarwal's approach (Steps 2 and 3 only)
[Figure: ANMR results for violin.wav and spme50_1.wav]

26
Rate-distortion optimization for AAC (5)
Simulation results: ITU listening test (64 kb/s)

27
Rate-distortion optimization for AAC (6)
Remarks
- The fixed-slope trellis-based algorithm we proposed achieves the globally optimal RD performance within the quantization and entropy coding stage under the AAC standard constraints.
- Joint design of the pre-processing decisions with our proposed optimization can theoretically achieve the global optimum over the entire standard-constrained parameter space, but with computational complexity exponential in the number of bands per frame.

28
Conclusions and Future Research
Conclusions
- The fixed-slope approach converts the encoding problem into a search through a constrained space, which permits efficient sequential search algorithms.
- Soft-decision quantization completes our RD optimization frameworks and brings significant performance enhancement.
- Substantial performance improvement over state-of-the-art encoders is achieved with complete decoder compatibility in each case.

29
Conclusions and Future Research (2)
Future research
- Real-time implementations
- Extension to scalable AAC
- Joint pre-processing and optimization for AAC
- Optimal lossy audio compression without syntax constraints
  - Optimal settings for transform (e.g. block lengths), quantization (e.g. step sizes) and prediction
  - Joint design of quantization and entropy coding
- …

30
Questions?
