November 1, 2005, IEEE MMSP 2005, Shanghai, China: Adaptive Multi-Frame-Rate Scheme for Distributed Speech Recognition Based on a Half Frame-Rate Front-End


1 Adaptive Multi-Frame-Rate Scheme for Distributed Speech Recognition Based on a Half Frame-Rate Front-End
Zheng-Hua Tan, Paul Dalsgaard and Børge Lindberg
Aalborg University, Denmark

2 Outline
- Background and motivation
- Half frame-rate front-end
- Experimental evaluation
- Adaptive multi-frame-rate DSR scheme
- Experimental evaluation
- Conclusions

3 Background and motivation
Distributed speech recognition (DSR): automatic speech recognition (ASR) over mobile networks.
Challenges introduced by networking:
- Bandwidth limitations
- Transmission errors
Figure (block diagram): Speech, Feature extraction, Source & channel coding, network (Network constraints), Source & channel decoding, ASR decoding, Words.

4 Background and motivation
Existing solutions:
- Source coding to compress speech features, e.g. split vector quantization, discrete cosine transform
- Channel coding and error concealment to protect and recover speech features
Our alternative solutions: in the front-end feature extraction stage, based on the redundancies known to exist in full frame-rate (FFR) features →
- half frame-rate (HFR) front-end
- adaptive multi-frame-rate scheme

5 Full frame-rate front-end
Temporal correlation between speech features is caused by:
- Vocal tract inertia
- Overlapping in the feature extraction procedure
Figure (frames on a time axis in ms): 25 ms frame length, 10 ms frame shift, 15 ms overlap.

6 Half frame-rate front-end
25 ms frame length & 20 ms frame shift → 5 ms overlap
But why is the FFR front-end prevalent in ASR systems?
And why is the HFR front-end promising in DSR?
Figure (frames on a time axis in ms): 25 ms frame length, 20 ms frame shift, 5 ms overlap.
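The frame arithmetic on slides 5 and 6 can be checked with a short sketch. The function names `frame_bounds` and `overlap_ms` and the 1000 ms signal length are illustrative assumptions, not part of the presentation; only the 25 ms frame length and the 10 ms and 20 ms frame shifts come from the slides.

```python
def frame_bounds(signal_ms, frame_len_ms=25, frame_shift_ms=10):
    """Return (start, end) times in ms of every full frame that fits in the signal."""
    bounds = []
    start = 0
    while start + frame_len_ms <= signal_ms:
        bounds.append((start, start + frame_len_ms))
        start += frame_shift_ms
    return bounds


def overlap_ms(frame_len_ms, frame_shift_ms):
    """Overlap between two consecutive frames (0 if the shift exceeds the length)."""
    return max(frame_len_ms - frame_shift_ms, 0)


# Hypothetical 1-second signal, framed at full and half frame rate.
ffr = frame_bounds(1000, frame_len_ms=25, frame_shift_ms=10)
hfr = frame_bounds(1000, frame_len_ms=25, frame_shift_ms=20)

print(len(ffr), overlap_ms(25, 10))  # number of FFR frames, FFR overlap in ms
print(len(hfr), overlap_ms(25, 20))  # number of HFR frames, HFR overlap in ms
```

With the same frame length, doubling the shift halves the number of frames to compute and transmit, which is where the "half" in half frame-rate comes from.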

7 HFR front-end in DSR
Observation: the performance degradation of DSR is marginal when packet loss occurs in short bursts, provided that a proper error concealment technique is applied.
So why not deliberately drop some packets (speech frames)? → HFR + repetition 'error concealment':
Prior to server-side recognition, each HFR feature vector is repeated once to construct its FFR equivalent.
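The repetition step described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function name `repeat_hfr` and the toy two-dimensional feature vectors are assumptions.

```python
def repeat_hfr(hfr_vectors):
    """Repeat each HFR feature vector once, giving a sequence at the full frame rate."""
    ffr_equivalent = []
    for vector in hfr_vectors:
        ffr_equivalent.append(list(vector))  # the received HFR vector
        ffr_equivalent.append(list(vector))  # its copy, standing in for the frame never sent
    return ffr_equivalent


# Toy feature vectors (real front-ends produce longer cepstral vectors).
hfr_features = [[1.0, 2.0], [3.0, 4.0]]
print(repeat_hfr(hfr_features))
```

The output sequence has twice as many vectors as the input, so acoustic models trained on FFR features can be used unchanged on the server.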

8 Experiments
Recognition accuracy (%) across the front-ends for three databases using FFR models
                   Danish digits   City names   Aurora 2 (TI digits)
FFR                99.79           79.29        99.05
HFR-Repetition     99.59           79.29        98.98
HFR-NoRepetition   96.68           61.25        71.12
Repetition of each HFR feature vector is critical!
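To make the gap between the front-ends easier to read, the accuracies on this slide can be converted to error rates. The sketch below assumes that the reported accuracy is word accuracy, so that error rate = 100 - accuracy; the slide does not state this, and the names `accuracy` and `error_rate` are illustrative.

```python
# Recognition accuracy (%) as reported on slide 8.
accuracy = {
    "FFR": {"Danish digits": 99.79, "City names": 79.29, "Aurora 2": 99.05},
    "HFR-Repetition": {"Danish digits": 99.59, "City names": 79.29, "Aurora 2": 98.98},
    "HFR-NoRepetition": {"Danish digits": 96.68, "City names": 61.25, "Aurora 2": 71.12},
}


def error_rate(acc_percent):
    """Error rate in percent, assuming error rate = 100 - accuracy."""
    return round(100.0 - acc_percent, 2)


for front_end, scores in accuracy.items():
    rates = {database: error_rate(acc) for database, acc in scores.items()}
    print(front_end, rates)
```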

9 Derived DSR schemes
- The FFR-based ETSI-DSR standard
- The HFR front-end: half the bit rate
- FFR-based one-frame coding
- FFR-based interleaving24: no delay when there are no transmission errors, as opposed to regular interleaving!
- FFR-based multiple description coding (MDC): odd-numbered & even-numbered feature vectors
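The odd/even split behind the MDC scheme can be sketched as below. This is a simplified illustration under stated assumptions: whole descriptions are either received or lost, a lost description is concealed by repeating the vectors of the surviving one (the HFR repetition idea), and the names `mdc_split` and `mdc_merge` are hypothetical. The slides do not describe the actual packetisation.

```python
def mdc_split(frames):
    """Split a feature sequence into odd- and even-numbered descriptions (frames counted from 1)."""
    return frames[0::2], frames[1::2]


def mdc_merge(odd, even):
    """Rebuild a full-rate sequence from whichever descriptions arrived (None = lost)."""
    if odd is None and even is None:
        raise ValueError("both descriptions lost")
    if odd is None or even is None:
        received = even if odd is None else odd
        # Only one description arrived: repeat each vector once.
        return [vector for vector in received for _ in range(2)]
    merged = []
    for i, vector in enumerate(odd):
        merged.append(vector)
        if i < len(even):  # odd may hold one more vector than even
            merged.append(even[i])
    return merged


frames = ["f1", "f2", "f3", "f4", "f5", "f6"]  # stand-ins for feature vectors
odd, even = mdc_split(frames)
print(mdc_merge(odd, even))   # both descriptions received
print(mdc_merge(odd, None))   # even-numbered description lost
```

Each description on its own is a half frame-rate stream, so losing one of the two degrades the input to the HFR-with-repetition case rather than leaving a gap.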

10 Comparison of DSR schemes
Robustness against transmission errors (Word Error Rate %)
Aurora 2 database corrupted by GSM error pattern 3 (4 dB C/I ratio)
Figure (WER chart) comparing: Error-free, MDC, Interleaving24, Half frame-rate – Repetition, ETSI-DSR Standard, No CRC.
Which is the best?

11 Adaptive multi-frame-rate scheme
Figure (block diagram):
- Client Front-End: Speech, FFR Front-End, HFR Front-End, Network Context, Split VQ Coder, Channel Encoder
- Error-Prone Channel
- Server Back-End: Channel Decoder incl. EC, Split VQ Decoder, Recogniser, Words
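The diagram shows both an FFR and an HFR front-end on the client together with a "Network Context" input, but the transcript does not say what the network context contains or how the choice is made. The sketch below is therefore purely hypothetical: it assumes the context is an available bit rate and that the client falls back to HFR when the channel cannot carry the FFR stream. The names and the example bit rates are illustrative assumptions.

```python
FFR_SHIFT_MS = 10  # full frame-rate shift from slide 5
HFR_SHIFT_MS = 20  # half frame-rate shift from slide 6


def choose_frame_shift_ms(available_bps, ffr_stream_bps):
    """Hypothetical rule: use FFR if the channel can carry it, otherwise HFR."""
    if available_bps >= ffr_stream_bps:
        return FFR_SHIFT_MS
    return HFR_SHIFT_MS


def frames_per_second(frame_shift_ms):
    """Number of feature vectors produced per second for a given frame shift."""
    return 1000 // frame_shift_ms


# Illustrative numbers only: a channel offering half of what the FFR stream needs.
shift = choose_frame_shift_ms(available_bps=2400, ffr_stream_bps=4800)
print(shift, frames_per_second(shift))
```

A real system would presumably also weigh error conditions when choosing between the derived schemes; that logic is not recoverable from the slide text.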

12 Conclusions
Half frame-rate front-end for DSR:
- half frame-rate, half bit-rate, half client-side computation
- comparable performance, but repetition of HFR features is critical
Adaptive multi-frame-rate DSR scheme:
- HFR one-frame coding
- Interleaving: no transmission errors, no delay
- MDC: performance close to that of an error-free channel

