On the Integration of Speech Recognition into Personal Networks

1 On the Integration of Speech Recognition into Personal Networks
- ICSLP 2004, Korea
- Zheng-Hua Tan, Paul Dalsgaard and Børge Lindberg
- {zt, pd, Aalborg University, Denmark

2 Outline
- Motivation
- The concept of personal networks (PN)
- Speech recognition in personal networks
- Robust against network degradations
- Towards QoS driven spoken language systems – adaptation to networks
- Frame-error-rate based OOV detection
- Conclusions

3 Motivation
Wireless communication presents a number of challenges to speech technology:
- limited resources available in the terminals
- bandwidth constraints
- transmission errors and packet losses
What is the future network environment?

4 The concept of personal networks (PN)
[Figure: personal network diagram. Labels: Core PAN; PAN; home network; smart building; corporate network; vehicular area network; Internet, UMTS, WLAN, etc.]

5 Speech recognition in personal networks
- Terminal based vs. network based
- Network based ASR architecture: front-end, source coding, channel coding
- Distributed speech recognition (DSR)

6 The ETSI-DSR standard
- Feature-pair and SVQ: The n’th vector is (equation not preserved in this transcript)
- Frame-pair
[Figure labels: Feature-pair; Subvector1, 6 bits; Subvector2, 8 bits]
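The feature-pair split vector quantization (SVQ) named on this slide can be sketched as follows. This is a minimal illustration, not the ETSI reference implementation: the 14-feature frame layout, the assignment of 6 bits to six pairs and 8 bits to the last pair, and the random codebooks are all assumptions made for the example. Only the 6-bit and 8-bit widths come from the slide labels.

```python
import random

# Assumed bit allocation: seven feature-pairs, six at 6 bits and one at 8 bits.
BITS_PER_PAIR = [6, 6, 6, 6, 6, 6, 8]

def make_codebook(rng, bits):
    """Hypothetical codebook: 2**bits random 2-D codewords (not the ETSI tables)."""
    return [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2 ** bits)]

def svq_encode(frame, codebooks):
    """Quantize each feature-pair independently to its nearest codeword index."""
    indices = []
    for k, codebook in enumerate(codebooks):
        x, y = frame[2 * k], frame[2 * k + 1]
        best = min(
            range(len(codebook)),
            key=lambda i: (codebook[i][0] - x) ** 2 + (codebook[i][1] - y) ** 2,
        )
        indices.append(best)
    return indices

def svq_decode(indices, codebooks):
    """Rebuild the frame by looking up one codeword per feature-pair."""
    frame = []
    for index, codebook in zip(indices, codebooks):
        frame.extend(codebook[index])
    return frame

rng = random.Random(0)
codebooks = [make_codebook(rng, bits) for bits in BITS_PER_PAIR]
frame = [rng.gauss(0, 1) for _ in range(2 * len(BITS_PER_PAIR))]
indices = svq_encode(frame, codebooks)
decoded = svq_decode(indices, codebooks)
bits_per_frame = sum(BITS_PER_PAIR)
```

Because each pair is quantized on its own, a transmission error that corrupts one index damages only that subvector, which is what makes subvector-level error handling possible.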

7 Robust against network degradation
- Analysis of error characteristics: burst-like vs. random
[Figure: distribution functions of erroneous frames by length. Length ‘O’ covers lengths larger than 8. Values shown: 100%, 96%, 56%.]
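One way to obtain such a distribution from a per-frame error trace is to count runs of consecutive erroneous frames. The sketch below is illustrative only: the function name and the input format (a sequence of booleans, one per frame) are assumptions, and only the overflow bucket "O" for runs longer than 8 follows the slide.

```python
from collections import Counter

def burst_statistics(frame_errors, max_len=8):
    """Count error bursts and erroneous frames by burst length.

    Runs longer than max_len are pooled under the key "O".
    Returns (bursts_by_length, frames_by_length).
    """
    bursts = Counter()
    frames = Counter()
    run = 0
    # A trailing False acts as a sentinel so a run at the end is closed too.
    for erroneous in list(frame_errors) + [False]:
        if erroneous:
            run += 1
        elif run:
            key = run if run <= max_len else "O"
            bursts[key] += 1
            frames[key] += run
            run = 0
    return bursts, frames

# Hypothetical trace: isolated error, a burst of 2, then a burst of 10.
trace = [False, True, False, True, True, False, False] + [True] * 10
bursts, frames = burst_statistics(trace)
```

A channel with mostly random errors concentrates the erroneous frames in the short-length buckets, while a burst-like channel moves them towards longer runs and the "O" bucket.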

8 Robust against network degradation – cont.
- Frame-pair vs. one-frame
- Vector based vs. subvector based
- One-frame based CRC
- Subvector based error concealment (EC)

9 Robust against network degradation – cont.
- Consistency matrix and subvector concealment
- C = (matrix not preserved in this transcript), with entries 1 for consistent and 0 for inconsistent
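The idea of a 0/1 consistency matrix driving subvector concealment can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' algorithm: the test used here (the absolute difference between the two frames of a frame-pair must stay within a per-subvector threshold), the threshold values, and the nearest-consistent-frame substitution rule are all hypothetical, and the sketch tests every frame-pair rather than only those flagged by a CRC.

```python
def consistency_matrix(frames, thresholds):
    """C[k][n] is 1 if subvector k of frame n is consistent, else 0."""
    num_frames, num_sub = len(frames), len(thresholds)
    C = [[1] * num_frames for _ in range(num_sub)]
    for n in range(0, num_frames - 1, 2):  # frame-pairs (n, n + 1)
        for k in range(num_sub):
            first, second = frames[n][k], frames[n + 1][k]
            if any(abs(a - b) > thresholds[k] for a, b in zip(first, second)):
                C[k][n] = C[k][n + 1] = 0
    return C

def conceal(frames, C):
    """Replace each inconsistent subvector by the nearest consistent one."""
    out = [list(frame) for frame in frames]
    for k, row in enumerate(C):
        good = [n for n, ok in enumerate(row) if ok]
        if not good:
            continue  # nothing reliable to copy from for this subvector
        for n, ok in enumerate(row):
            if not ok:
                nearest = min(good, key=lambda g: abs(g - n))
                out[n][k] = frames[nearest][k]
    return out

# Four frames, two subvectors each; subvector 1 of frame 2 is corrupted.
frames = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.1, 0.1), (1.1, 1.0)],
    [(0.2, 0.1), (9.0, 1.0)],
    [(0.2, 0.2), (1.2, 1.1)],
]
C = consistency_matrix(frames, thresholds=[1.0, 1.0])
repaired = conceal(frames, C)
```

Only the subvectors flagged with 0 are replaced. The consistent subvectors of the same frames are kept, which is the advantage of working at subvector rather than whole-vector level.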

10 Robust against network degradation – cont.
Experimental results – Aurora 2
- Training: clean speech
- Test: clean data from Test set A

11 FER-based out-of-vocabulary detection
OOV detection
- Reject the H0 hypothesis if (decision rule not preserved in this transcript), where
  - H0: one of the in-vocabulary (IV) words
  - H1: OOV words modelled by one filler model
  - O: speech signal observation
  - T: threshold
- Two kinds of errors:
  - False rejection (FR)
  - False acceptance (FA)
[Figure labels: IV words, OOV words]
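A standard log-likelihood ratio test that is consistent with the symbols on this slide would read as follows. This is an assumption, since the slide's own formula is not available here, and the exact normalization used by the authors may differ.

```latex
\text{Reject } H_0 \text{ if } \quad
\mathrm{LLR}(O) \;=\; \log p(O \mid H_0) \;-\; \log p(O \mid H_1) \;<\; T
```

Under this form, a false rejection is an IV utterance whose ratio falls below T, and a false acceptance is an OOV utterance whose ratio stays at or above T.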

12 FER-based out-of-vocabulary detection – cont.
Transmission errors change the probability density of the log-likelihood ratio in two ways:
- increasing the standard deviation
- shifting the mean

13 FER-based out-of-vocabulary detection – cont.
A FER-dependent threshold for OOV detection
- The threshold is modelled as a function of the frame-error-rate (FER)
Experiments
- IV: Danish digits
- OOV: city names
- Random bit errors
- Aimed at a constant FR
[Figure: false rejection rate vs. AWGN channel BER values]
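The goal of holding the false rejection rate constant while the FER varies can be sketched as below. It is a minimal illustration under assumptions: the scores, the FER values, the quantile-based threshold selection and the piecewise-linear interpolation between measured FER points are all hypothetical, since the transcript does not give the functional form the authors used.

```python
def threshold_for_constant_fr(iv_scores, target_fr):
    """Pick T so that roughly target_fr of the IV scores fall strictly below it."""
    ordered = sorted(iv_scores)
    k = min(int(target_fr * len(ordered)), len(ordered) - 1)
    return ordered[k]

def false_rejection_rate(iv_scores, threshold):
    """Fraction of IV scores rejected, i.e. strictly below the threshold."""
    return sum(score < threshold for score in iv_scores) / len(iv_scores)

def threshold_at(fer, table):
    """Piecewise-linear T(FER) from (fer, threshold) pairs sorted by fer."""
    if fer <= table[0][0]:
        return table[0][1]
    for (f0, t0), (f1, t1) in zip(table, table[1:]):
        if fer <= f1:
            return t0 + (fer - f0) / (f1 - f0) * (t1 - t0)
    return table[-1][1]

# Hypothetical log-likelihood ratio scores of IV words.
clean = [5.0, 4.0, 6.0, 3.0, 7.0, 5.5, 4.5, 6.5, 3.5, 7.5]
# With transmission errors: lower mean, larger spread.
noisy = [1.0, -1.0, 3.0, -3.0, 5.0, 2.0, 0.0, 4.0, -2.0, 6.0]

t_clean = threshold_for_constant_fr(clean, target_fr=0.2)
t_noisy = threshold_for_constant_fr(noisy, target_fr=0.2)
table = [(0.0, t_clean), (0.1, t_noisy)]  # FER values are made up
```

In this toy data, applying the clean-condition threshold to the noisy scores rejects 7 of the 10 IV words, while the condition-specific threshold keeps the false rejection rate at the 0.2 target.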

14 Towards QoS driven spoken language systems - adaptation to networks
From front-end concealment, to back-end adaptation to network degradations, and further to QoS-dependent adaptation of the spoken language processing and dialogue management modules. For example, the user can be requested to use a more restricted vocabulary and grammar, or to switch to other modalities.

15 Conclusions
- Reviews the developments of incorporating speech technology into a network architecture: distributed architecture of ASR in a wide range of devices
- Shows the importance of applying a robust error concealment scheme
- Presents a FER based OOV detection method
- Points out that new research has to be initiated with the aim of introducing QoS-dependent modifications to existing ASR modules

