Philippe Gournay, Bruno Bessette, Roch Lefebvre

Universal Speech and Audio Codec
Linear Prediction Domain processing
Philippe Gournay, Bruno Bessette, Roch Lefebvre
Université de Sherbrooke, Département de Génie Electrique et Informatique
Sherbrooke, Québec, Canada

Outline
The 3GPP AMR-WB+ Standard
  Source of inspiration for LPD processing in USAC
Changes brought to LPD processing
  Forward Aliasing Cancellation
  Frequency-Domain Noise Shaping
  Other changes
Conclusion
  More efficient LPD processing
  Better unification of LPD and non-LPD FD coders

Context
The 3GPP AMR-WB+ Standard
  Hybrid codec
  Time (ACELP) and Frequency (TCX) Domain
  Very efficient on speech and speech-over-music content

The AMR-WB+ Encoder
[Figure: encoder block diagram. Labels: Audio (input), ACELP (1 frame), TCX (1, 2 or 4 frames), Mode Selection, Mode Index, ISF, Packetization, Bitstream (output).]

AMR-WB+ Frame Structure
[Figure: three out of the 26 possible ACELP/TCX coding configurations, panels (a), (b) and (c), built from ACELP, Short TCX, Medium TCX and Long TCX frames. One super-frame = 1024 samples.]
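The count of 26 configurations can be checked by enumeration. The sketch below assumes a super-frame of four 256-sample frames, with Short, Medium and Long TCX covering 1, 2 and 4 frames, and a Medium TCX allowed only on the first or second half of the super-frame. The mode labels and function names are illustrative, not taken from the standard.

```python
from itertools import product

# Assumed frame lengths in samples (1024-sample super-frame, four frames).
LENGTH = {"ACELP": 256, "ShortTCX": 256, "MediumTCX": 512, "LongTCX": 1024}

def half_configs():
    """All ways to code one half super-frame (two consecutive frames)."""
    singles = [("ACELP",), ("ShortTCX",)]
    pairs = [a + b for a, b in product(singles, repeat=2)]  # 4 combinations
    return pairs + [("MediumTCX",)]                         # plus one 2-frame TCX

def superframe_configs():
    """All ways to code one super-frame under the assumptions above."""
    halves = half_configs()
    configs = [a + b for a, b in product(halves, repeat=2)]  # 5 * 5 = 25
    return configs + [("LongTCX",)]                          # plus one 4-frame TCX

if __name__ == "__main__":
    configs = superframe_configs()
    print(len(configs))
    for config in configs[:3]:
        print(config, sum(LENGTH[mode] for mode in config))
```

Each half super-frame has 5 options, giving 25 combinations, and the Long TCX adds one more.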

Transitions from ACELP to TCX
Zero-input response (ZIR) of LPC weighting filter provides pseudo-windowing
[Figure: ACELP frame followed by the decoded TCX window, with 1/8 overlap]

Transitions from TCX to ACELP
Redundant windowed TCX samples are discarded
[Figure: decoded TCX window followed by an ACELP frame, with 1/8 overlap]

Limitations of the AMR-WB+ model
Non-critically sampled transforms
  FFT vs. MDCT
Inefficiencies at transitions between modes
  Sub-optimal windowing (from ACELP to TCX)
  Discarded samples (from TCX to ACELP)
Transform windows not aligned with ACELP grid
  LPC analysis window also shifted to the right
Even worse when switching with AAC
  Time-Domain Aliasing Cancellation (TDAC)
  Transitions between LPD and non-LPD processing

Changes brought to the LPD processing
Replaced FFTs by MDCTs
Introduced Frequency Domain Noise Shaping
Introduced Forward Aliasing Cancellation
Other changes

Frequency Domain Noise Shaping
To unify processing of AAC and TCX frames, the MDCT transform in TCX is applied in the original signal domain
Noise shaping for TCX frames is performed in the MDCT domain based on LPC filters mapped to the MDCT domain
FDNS allows a smooth (sample-by-sample) time-domain noise envelope by applying a 1st-order filtering to the MDCT coefficients (similar in principle to TNS)
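Mapping an LPC filter to the MDCT domain can be pictured as sampling the magnitude response of the weighted LPC filter at a set of band centres. The sketch below is only an illustration: the function name, the weighting W(z) = A(z/gamma) with gamma = 0.92, the 64 bands and the choice of 1/|W| as the gain are all assumptions, not details given on the slides.

```python
import numpy as np

def fdns_gains(lpc, num_bands=64, gamma=0.92):
    """Per-band gains 1/|W(e^jw)| for W(z) = A(z/gamma) (assumed form).

    lpc holds the coefficients of A(z) = sum_i lpc[i] * z^-i, with lpc[0] = 1.
    """
    lpc = np.asarray(lpc, dtype=float)
    order = np.arange(len(lpc))
    weighted = lpc * gamma ** order                  # coefficients of A(z/gamma)
    # Band-centre frequencies in (0, pi), one per band (assumed grid).
    omega = np.pi * (np.arange(num_bands) + 0.5) / num_bands
    response = np.exp(-1j * np.outer(omega, order)) @ weighted
    return 1.0 / np.abs(response)

if __name__ == "__main__":
    gains = fdns_gains([1.0, -0.9], num_bands=8)
    print(np.round(gains, 3))
```

With a low-pass spectral envelope such as A(z) = 1 - 0.9 z^-1, the gains come out larger in the low bands than in the high bands, so the noise follows the envelope of the signal.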

Effect of FDNS on the spectral shape and the time-domain envelope of the noise
[Figure: time axis (n) with positions A, B and C against frequency axis (k or m). Noise gains g1[m] are calculated at time position A and noise gains g2[m] at time position B. The interpolated gains seen in the time domain are shown for each of the M bands.]

Frequency-Domain Noise Shaping
FDNS allows a smooth (sample-by-sample) time-domain noise envelope by applying a 1st-order filtering to the MDCT coefficients (similar in principle to TNS)
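One way to write such a first-order filtering is a recursion that runs along the MDCT coefficients, with coefficients derived from the gains g1[m] and g2[m] of the band each coefficient belongs to. The recursion below, the formulas for a and b, and the equal-width band layout are assumptions made for illustration. Check them against the USAC specification before relying on them.

```python
import numpy as np

def fdns_apply(coefs, g1, g2):
    """Shape MDCT coefficients with a first-order recursion (assumed form).

    out[i] = a * coefs[i] + b * out[i - 1], where for the band m containing i:
        a = 2 * g1[m] * g2[m] / (g1[m] + g2[m])
        b = (g2[m] - g1[m]) / (g1[m] + g2[m])
    """
    coefs = np.asarray(coefs, dtype=float)
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    num_bands = len(g1)
    band_size = max(len(coefs) // num_bands, 1)   # equal-width bands (assumed)
    out = np.empty_like(coefs)
    prev = 0.0
    for i in range(len(coefs)):
        m = min(i // band_size, num_bands - 1)
        a = 2.0 * g1[m] * g2[m] / (g1[m] + g2[m])
        b = (g2[m] - g1[m]) / (g1[m] + g2[m])
        prev = a * coefs[i] + b * prev
        out[i] = prev
    return out

if __name__ == "__main__":
    x = np.ones(8)
    print(fdns_apply(x, g1=[1.0, 1.0], g2=[1.0, 1.0]))  # gains equal: plain scaling
    print(fdns_apply(x, g1=[1.0, 1.0], g2=[3.0, 3.0]))  # gains differ: recursion active
```

When g1 and g2 are equal the recursion term vanishes and the filter reduces to a plain per-band gain. It only departs from that when the gains differ between the two time positions.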

Forward Aliasing Cancellation
Introduced to compensate windowing and time-domain aliasing in MDCT-coded frames when switching to and from ACELP frames
[Figure: ACELP synthesis, TCX frame output and next ACELP frame, with the windowing effect and time-domain aliasing marked with - and + signs]
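The windowing effect and time-domain aliasing can be reproduced with a toy MDCT. The sketch below is not the USAC FAC tool. It uses a textbook MDCT with a sine window (an assumption, as are all function names) to show that an MDCT frame is reconstructed exactly only when its overlapping MDCT neighbour is present. When the neighbour is coded in the time domain instead, as with ACELP, a residual remains that must be supplied by other means.

```python
import numpy as np

def mdct_basis(n):
    """n x 2n cosine basis of a textbook MDCT with n coefficients."""
    k = np.arange(n)[:, None]
    t = np.arange(2 * n)[None, :]
    return np.cos(np.pi / n * (t + 0.5 + n / 2) * (k + 0.5))

def mdct(block):
    return mdct_basis(len(block) // 2) @ block

def imdct(coefs):
    n = len(coefs)
    return (2.0 / n) * (mdct_basis(n).T @ coefs)

def windowed_roundtrip(block, window):
    """Analysis window, MDCT, inverse MDCT, synthesis window."""
    return window * imdct(mdct(window * block))

def demo(n=16, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(3 * n)
    window = np.sin(np.pi * (np.arange(2 * n) + 0.5) / (2 * n))
    frame_a = windowed_roundtrip(x[:2 * n], window)   # covers samples 0 .. 2n-1
    frame_b = windowed_roundtrip(x[n:], window)       # covers samples n .. 3n-1
    target = x[n:2 * n]                               # region shared by both frames
    err_with_neighbour = np.max(np.abs(frame_a[n:] + frame_b[:n] - target))
    err_alone = np.max(np.abs(frame_a[n:] - target))
    return err_with_neighbour, err_alone

if __name__ == "__main__":
    print(demo())
```

The first error is at the level of rounding noise, since overlap-add cancels the aliasing. The second is large, because frame A on its own still carries the window shape and its folded (aliased) samples.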

Forward Aliasing Cancellation
FAC is applied in the original signal domain
FAC is quantized in the LPC weighted domain so that quantization noises of FAC and decoded MDCT are of the same nature
For transition from ACELP to TCX, the ACELP synthesis can be taken into account; this reduces the bitrate needed to encode FAC

Computation of FAC targets for transitions from and to ACELP (encoder)
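[Figure: four aligned signal lines around a TCX frame placed between ACELP synthesis and the next ACELP frame, with LPC1 and LPC2 marked at the two transitions.
Line 1: signal in the original domain
Line 2: TCX frame output (with + and - aliasing signs)
Line 3: ACELP contribution (windowed and folded ACELP synthesis, windowed ACELP ZIR)
Line 4: TCX frame error (including ACELP contribution), ACELP error, FAC target]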

Quantization of FAC targets
[Figure: two block diagrams of FAC target quantization.
Transition from ACELP to TCX: FAC target -> W1(z) (from LPC1; filter memory: ACELP error) -> DCT-IV -> Q (transmit to decoder) -> inverse DCT-IV -> 1/W1(z) (zero memory) -> FAC synthesis, together with the ZIR of 1/W1(z)
Transition from TCX to ACELP: FAC target -> W2(z) (from LPC2; filter memory: TCX frame error) -> DCT-IV -> Q (transmit to decoder) -> inverse DCT-IV -> 1/W2(z) (zero memory) -> FAC synthesis]
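The weighting, DCT-IV, quantizer, inverse DCT-IV and inverse weighting steps can be sketched in a few lines. This is a simplified illustration under stated assumptions: both filters start from zero memory (the slide uses the ACELP or TCX frame error as memory of W(z) and adds a ZIR term, both omitted here), the quantizer Q is a plain uniform rounding with a hypothetical step size, and all function names are invented for the example.

```python
import numpy as np

def dct4_matrix(n):
    """Orthonormal DCT-IV matrix; it is symmetric and its own inverse."""
    idx = np.arange(n) + 0.5
    return np.sqrt(2.0 / n) * np.cos(np.pi / n * np.outer(idx, idx))

def weight(signal, w):
    """FIR weighting filter W(z), zero initial memory (simplification)."""
    return np.convolve(signal, w)[:len(signal)]

def unweight(signal, w):
    """All-pole filter 1/W(z), zero initial memory; requires w[0] == 1."""
    out = np.zeros(len(signal))
    for n in range(len(signal)):
        acc = signal[n]
        for i in range(1, min(n, len(w) - 1) + 1):
            acc -= w[i] * out[n - i]
        out[n] = acc
    return out

def code_fac_target(target, w, step):
    """Quantize a FAC target in the weighted domain; return indices and synthesis."""
    target = np.asarray(target, dtype=float)
    c = dct4_matrix(len(target))
    spectrum = c @ weight(target, w)                   # weighted domain, then DCT-IV
    indices = np.round(spectrum / step).astype(int)    # Q: values sent to the decoder
    synthesis = unweight(c @ (indices * step), w)      # inverse DCT-IV, then 1/W(z)
    return indices, synthesis

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    target = rng.standard_normal(32)
    indices, synthesis = code_fac_target(target, w=[1.0, -0.8], step=0.25)
    print(np.max(np.abs(synthesis - target)))
```

Because the rounding happens after W(z), the quantization error is flat in the weighted domain and is shaped by 1/W(z) on the way back. This is how the FAC noise ends up with the same character as the noise of the decoded MDCT frame.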

Other changes brought to the LPD processing
Critical sampling
  MDCT vs. FFT
  FAC+FDNS
Scalar quantizer + adaptive arithmetic coder for TCX (AMR-WB+ uses AVQ)
Variable bit rate LPC quantizer
Bit reservoir adaptation

Conclusion
USAC makes use of LPD and non-LPD processing
  LPD mode inspired by AMR-WB+
  Non-LPD mode derived from AAC
Substantial changes were brought to the LPD processing, and new tools were introduced to make it more efficient
  Frequency Domain Noise Shaping (FDNS)
  Forward Aliasing Cancellation (FAC)
USAC is a real unification of two coding models

Thank you for your attention!
