Presentation is loading. Please wait.

Presentation is loading. Please wait.

A Novel Frequency Domain BWE with Relaxed Synchronization and Associated BWE Switching Lei Miao, Zexin Liu, Xingtao Zhang, Chen Hu, Jon Gibbs Huawei Technologies.

Similar presentations


Presentation on theme: "A Novel Frequency Domain BWE with Relaxed Synchronization and Associated BWE Switching Lei Miao, Zexin Liu, Xingtao Zhang, Chen Hu, Jon Gibbs Huawei Technologies."— Presentation transcript:

1 A Novel Frequency Domain BWE with Relaxed Synchronization and Associated BWE Switching Lei Miao, Zexin Liu, Xingtao Zhang, Chen Hu, Jon Gibbs Huawei Technologies Co. Ltd Beijing, China Kihyun Choo, Eunmi Oh Samsung Electronics Co., Ltd., Seoul, Korea Václav Eksler VoiceAge Corp., Montreal, QC, Canada

2 Agenda  3GPP EVS codec overview  Prior-art review of BWE techniques  Multi-mode FD BWE  Multi-Mode FD BWE with Relaxed Synchronization  BWE Switching Mechanism  Quality Evaluation  Conclusions

3 3GPP EVS codec overview  3GPP Enhanced Voice Services (EVS) codec standardized in Sep. 2014.  Significantly improves user experience  EVS codec encodes the signal based on the signal content:  Time-Domain LP coding techniques are used: ACELP and GSC GSC (Generic audio Signal Coding) is used to improve the quality of music/mixed segments in LP domain.  Frequency-Domain coding techniques

4 Prior-art Review of BWE techniques  BWE exploits the intrinsic correlation between the low and high frequency parts of a signal’s spectrum in order to reconstruct the high frequency part.  Spectral Band Replication (SBR) in MPEG-4 HE ‑ AAC.  A multi-mode bandwidth extension scheme in Recommendations ITU-T G.711.1 Annex D and G.722 Annex B.  SWB is a key feature of EVS as a new 3GPP codec, important to extend the bandwidth from WB to SWB/FB.  Time domain (TD) or Frequency domain (FD) BWE for different types of input signals.  The paper focuses on FD-BWE on top of either ACELP or GSC.  This switched bandwidth extension (BWE) approach improves the EVS codec LP based coding efficiency.

5 Multi-mode FD BWE  Concept standardized for the first time in Recommendations ITU-T G.711.1 Annex D and G.722 Annex B.  A transient detector identifies rapid variations of the high band signal over time.  4 classes : TRANSIENT (TS), HARMONIC (HM), NORMAL (NM) or NOISE (NS).  A combination of adaptive spectral envelope and time envelope coding, derived from the high band signal.  TRANSIENT frames: four spectral envelopes and four time envelopes.  Non-TRANSIENT frames: fourteen spectral envelopes and no time envelope.  The high frequency band excitation is generated by either normalizing the selected region of the low frequency band with an adaptive normalization length or generated by random noise.

6  The design constraints imposed upon the EVS codec specify that the total algorithmic delay of the codec must not exceed 32 ms.  Transform used by FD-BWE: an Asymmetric Low Delay Optimized (ALDO) window with a time support of 40 ms while the non-zero window length is 28.75 ms.  Still insufficient delay allowance remaining for the FD BWE to achieve the overall 32 ms delay requirement.  Multi-mode BWE in ITU-T G.711.1 Annex D and G.722 Annex B as a baseline. Then the new low delay FD-BWE approach relies on a relaxed synchronization scheme, based on the multi-mode BWE in ITU-T codecs. FD BWE constraints in EVS

7  The relaxed synchronization of multi-mode FD BWE is achieved by utilizing the time difference between the high frequency band excitation and the high frequency band envelope.  Assume D 1 is the delay of the low frequency band coding, D 2 is the delay of high frequency band coding where D 2 is introduced by the windowing prior to the MDCT.  The total algorithmic delay of the prior bandwidth extension algorithms is (D 1 + D 2 ). By the proposed scheme, the delay may be adaptively reduced to the range [max(D 1, D 2 ), (D 1 +D 2 )] with minimal impact on the perceptual quality.  When the spectrum is relatively stable, the phase of the high frequency band signal is of relatively minor perceptual significance when compared to its energy.  By reducing the delay, the time alignment between the energy envelope of the low and high frequency band signals is maintained while a short time misalignment (or a relaxation) of the synchronization between the high frequency band excitation and envelopes is permitted. Relaxed synchronization

8  An asymmetric low delay optimized (ALDO) window is used for FD BWE.  Assumes that the target overall delay is D t, which is in the range [max(D 1, D 2 ), (D 1 +D 2 )]. The input time domain signal may be delayed by (D t −D 2 ). The time misalignment between the high frequency band excitation and the high frequency band envelope is {(D 1 +D 2 )−D t }. Consequently a lower delay than (D 1 +D 2 ) is achieved by the proposed FD BWE. Time alignment

9  The time envelope is calculated on top of the delayed high frequency band time domain signal:  It is then adjusted by an attenuation factor R, which represents the energy attenuation of the low frequency band due to the LP based low band coding:  Finally the time envelope is adjusted:  Spectral envelopes : multi-stage split VQ.  The envelopes at even positions are quantized by Split VQs.  The prediction errors at odd positions are calculated with interpolation and quantized by another stage of the Split VQ. Transient frames

10  Non-Transient frames  Energy control in each sub-band to prevent unpleasant distortion.  The distortion may occur due to the un-matched characteristics of the original and the generated spectra.  Energy control adjusts the energies depending on the comparison of the tonalities of the two spectra to avoid the distortion.  Adaptive normalization length to generate the high frequency band excitation  The more harmonic the high frequency band is, the longer normalization length.  Depends on the number of the sub-bands of the low frequency band whose peak to average ratio is larger than a threshold, n h,  A pre-echo reduction is performed to improve the performance of fricatives for Non-Transient frames. Other techniques

11  The proposed relaxed synchronization FD BWE scheme is applied in the EVS codec for WB at 13.2 kbps and for SWB at 13.2 kbps and 32 kbps.  The BWE bit budget for SWB is 31 bits, while it is 6 bits for WB.  For WB coding, there are fewer spectral envelopes and no time envelopes encoded since the WB FD BWE covers only frequencies from 6~8 kHz.  The delay parameters correspond in the EVS codec to  D t = 12 ms (overall delay constraint minus the frame length),  D 1 = 9.6875 ms (encoder look-ahead plus encoder resampling)  D 2 = 8.75 ms (overlap length).  This results in the time misalignment between the high frequency band excitation and the high frequency band envelope being 6.4375 ms. EVS implementation details

12 BWE Switching Mechanism  In general, TD BWE on top of ACELP performs well when encoding active speech segments and FD BWE on top of GSC performs well when encoding inactive and mixed/music segments. However, some mixed/music segments are better coded with ACELP coding and FD BWE.  If the input signal is classified as a music signal, or the low frequency band signal is classified as inactive, multi-mode FD BWE is used irrespective of whether the low frequency band is coded with ACELP or GSC. Otherwise, if the input signal is classified as a speech signal, TD BWE is used no matter how the low frequency band has been coded.  When the high frequency band signal is judged to contain inactive or mixed/music content signals then FD BWE is used as the high band coding technology.

13 Quality Evaluation  MUSHRA, 95% confidence intervals, 16 expert listeners, 16 mixed content items and 16 music items.  Two variants were evaluated:  Low delay (LD) FD BWE configured to have overall delay D t = 12 ms.  High delay (HD) FD BWE configured to have overall delay of D t = (D 1 +D 2 ) = 18.4375 ms.  EVS SWB at 13.2 kbps.  Low delay FD BWE is statistically equivalent to the high delay FD BWE.

14 Conclusions  A novel multi-mode FD BWE scheme with relaxed synchronization optimized for inactive and mixed/music content signals is presented.  It forms a part of the LP based coding of the 3GPP EVS codec.  High subjective quality and low algorithmic delay are achieved by relaxing the time alignment constraints between the high frequency band excitation and its envelope.  A switching mechanism between two different BWE technologies shows a performance advantage.

15 Thank you!


Download ppt "A Novel Frequency Domain BWE with Relaxed Synchronization and Associated BWE Switching Lei Miao, Zexin Liu, Xingtao Zhang, Chen Hu, Jon Gibbs Huawei Technologies."

Similar presentations


Ads by Google