Presentation is loading. Please wait.

Presentation is loading. Please wait.

Speech Coding Workshop 2000 Jean-Marc Valin, Roch Lefebvre 1 IEEE Speech Coding Workshop Sept 17–20, 2000 Lake Lawn Resort Delavan, WI Jean-Marc Valin,

Similar presentations


Presentation on theme: "Speech Coding Workshop 2000 Jean-Marc Valin, Roch Lefebvre 1 IEEE Speech Coding Workshop Sept 17–20, 2000 Lake Lawn Resort Delavan, WI Jean-Marc Valin,"— Presentation transcript:

1 Speech Coding Workshop 2000 Jean-Marc Valin, Roch Lefebvre 1 IEEE Speech Coding Workshop Sept 17–20, 2000 Lake Lawn Resort Delavan, WI Jean-Marc Valin, Roch Lefebvre University of Sherbrooke Bandwidth Extension of Narrowband Speech for Low Bit-Rate Wideband Coding

2 Speech Coding Workshop 2000 Jean-Marc Valin, Roch Lefebvre 2 Outline Problem statement Proposed solution System performance Discussion

3 Speech Coding Workshop 2000 Jean-Marc Valin, Roch Lefebvre 3 Problem Statement Telephone Band: 300 - 3400 Hz AM Band: 50 - 7000 Hz How to make sound like with 500 bits/sec? We need to recover information from both low and high-frequency bands (G.729)

4 Speech Coding Workshop 2000 Jean-Marc Valin, Roch Lefebvre 4 Proposed Solution 1) Do our best to recover the wideband information from narrowband speech 2) Use coding for the information that cannot be recovered –Recovered information : Low-frequency band High-frequency excitation –Coded information : High-frequency spectral envelope

5 Speech Coding Workshop 2000 Jean-Marc Valin, Roch Lefebvre 5 System Overview Inverse IRM filter is optional –produces a flat response from 200-3500 Hz Inverse IRM Filter Low-frequency regeneration 22 High-frequency regeneration 8 kHz narrowband 16 kHz wideband 50-300 Hz band 300-3400 Hz band 3400-8000 Hz band Side information

6 Speech Coding Workshop 2000 Jean-Marc Valin, Roch Lefebvre 6 Low-Frequency Regeneration (1/2) Assumptions : –Only pitch harmonics need to be recovered In general, no more than two pitch harmonics below 200 Hz –Absolute phase is not perceptually relevant Frequency of harmonics determined from pitch analysis Amplitudes determined from feed-forward multi- layer perceptron (output in log domain)

7 Speech Coding Workshop 2000 Jean-Marc Valin, Roch Lefebvre 7 Low-Frequency Regeneration (2/2) Low-frequency harmonic synthesis Scale and sum 22 1 st harmonic 2 nd harmonic Pitch analysis MFCC calculation Multi-layer Perceptron Pitch delay Cepstral coefficients Scale factors Narrowband speech Low frequencies Pitch gain LP filter (1) (16)

8 Speech Coding Workshop 2000 Jean-Marc Valin, Roch Lefebvre 8 High-Frequency Extension Excitation-filter model (16 ms frames) Problem is separated in two parts –Excitation extension (no side information) –Spectral envelope coding (side information) A(z)A(z) Narrowband speech LPC analysis Excitation extension B(z)B(z) 1 Spectral envelope Extension Side information (High-frequency spectral envelope) High- frequency band High pass

9 Speech Coding Workshop 2000 Jean-Marc Valin, Roch Lefebvre 9 Excitation Extension Narrowband excitation Absolute value Whitening filter wideband excitation 22 High pass

10 Speech Coding Workshop 2000 Jean-Marc Valin, Roch Lefebvre 10 Spectral Envelope Coding Spectral envelope calculated from the wideband LPC coefficients Quantization of the 3000-8000 Hz range (40 points) –Log domain –8-bit Vector Quantization (500 bits/s side information, using 16 ms frames) Concatenation with envelope obtained from LPC analysis on narrowband speech

11 Speech Coding Workshop 2000 Jean-Marc Valin, Roch Lefebvre 11 Objective results Low-frequency band –3 dB RMS error on harmonic amplitude High-frequency band –3.6 dB RMS error on envelope –No objective measure for excitation extension (perceptually very close to original)

12 Speech Coding Workshop 2000 Jean-Marc Valin, Roch Lefebvre 12 Subjective Results femalemale Original wideband Recovered from original IRM-filtered speech Recovered from G.729 coded speech

13 Speech Coding Workshop 2000 Jean-Marc Valin, Roch Lefebvre 13 Discussion Highlights –Expand IRM-filtered telephone-band speech to AM band –Very low side information rate (500 bits/s) Areas of improvement –Use high-band spectral estimation before coding –Use residual low-frequency information (below 300 Hz) –Noise robustness –Post-filtering


Download ppt "Speech Coding Workshop 2000 Jean-Marc Valin, Roch Lefebvre 1 IEEE Speech Coding Workshop Sept 17–20, 2000 Lake Lawn Resort Delavan, WI Jean-Marc Valin,"

Similar presentations


Ads by Google