Histogram-based Quantization for Distributed / Robust Speech Recognition Chia-yu Wan, Lin-shan Lee College of EECS, National Taiwan University, R. O. C.


1 Histogram-based Quantization for Distributed / Robust Speech Recognition Chia-yu Wan, Lin-shan Lee College of EECS, National Taiwan University, R. O. C. 2007/08/16

2 Outline Introduction Histogram-based Quantization (HQ) Joint Uncertainty Decoding (JUD) Three-stage Error Concealment (EC) Conclusion

3 Problems of Distance-based VQ
Conventional distance-based VQ (e.g. SVQ) is widely used in DSR.
Dynamic environmental noise and codebook mismatch jointly degrade the performance of SVQ:
- Noise moves clean speech to another partition cell (X to Y)
- Mismatch between the fixed VQ codebook and the test data increases distortion
- Quantization increases the difference between clean and noisy features
Histogram-based Quantization (HQ) is proposed to solve these problems.

4 Histogram-based Quantization (HQ)
Decision boundaries y_i (i = 1, …, N) are dynamically defined by C(y).
Representative values z_i (i = 1, …, N) are fixed, obtained by transforming a standard Gaussian.

5 The actual decision boundaries (horizontal scale) for x_t are dynamically defined by the inverse transformation of C(y).

6 Histogram-based Quantization (HQ)
With a new histogram C'(y'), the decision boundaries change automatically.
Decision boundaries are adjusted according to local statistics, so there is no codebook mismatch problem.

7 Histogram-based Quantization (HQ)
Based on the CDF and histogram on the vertical scale, HQ is less sensitive to noise on the horizontal scale.
Disturbances are automatically absorbed within an HQ block.
Dynamic nature of HQ: a hidden codebook on the vertical scale is transformed by the dynamic C(y) into boundaries {y_i} that are dynamic on the horizontal scale.
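To make the mechanism concrete, here is a minimal Python sketch of block-based HQ. It is an illustration under assumed details (a rank-based empirical CDF over one block, N equal-probability cells, representative values at the Gaussian cell centers); the names hq_quantize and n_cells are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.stats import norm

def hq_quantize(frames, n_cells=8):
    """Rank-based HQ sketch: the local block's empirical CDF defines the
    decision boundaries, while the representative values are fixed points
    of a standard Gaussian (the 'hidden codebook' on the vertical scale)."""
    frames = np.asarray(frames, dtype=float)
    # Empirical CDF C(y) of the local block, one value per frame.
    order = np.argsort(frames)
    cdf = np.empty_like(frames)
    cdf[order] = (np.arange(len(frames)) + 0.5) / len(frames)
    # Equal-probability cell index on the vertical (probability) scale.
    idx = np.minimum((cdf * n_cells).astype(int), n_cells - 1)
    # Fixed representative values: cell centers mapped through the inverse
    # standard-Gaussian CDF.
    z = norm.ppf((np.arange(n_cells) + 0.5) / n_cells)
    return idx, z[idx]
```

For example, quantizing one cepstral dimension of a block of frames with hq_quantize(block) returns the cell indices to transmit and the corresponding reconstructed values; because the boundaries come from the block itself, no pre-trained codebook is needed.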

8 Histogram-based Vector Quantization (HVQ)

9 Discussions about the robustness of Histogram-based Quantization (HQ)
Distributed speech recognition: SVQ vs. HQ
Robust speech recognition: HEQ vs. HQ

10 Comparison of Distance-based VQ (SVQ) and Histogram-based Quantization (HQ)
HQ solves the major problems of conventional distance-based VQ:
- SVQ: the fixed codebook cannot represent noisy speech well / HQ: dynamically adjusted to local statistics, no codebook mismatch
- SVQ: quantization increases the difference between clean and noisy speech / HQ: inherent robust nature, noise disturbances are automatically absorbed by C(y)

11 HEQ (Histogram Equalization) vs. HQ (Histogram-based Quantization)
HEQ performs a point-to-point transformation: point-based order statistics are more easily disturbed.
HQ performs a block-based transformation: disturbances are automatically absorbed within a block, and with a proper choice of block size the block uncertainty can be compensated by GMM and uncertainty decoding.
(Figure: averaged normalized distance between clean and corrupted speech features, based on the AURORA 2 database.)

12 HEQ (Histogram Equalization) vs. HQ (Histogram-based Quantization)
HEQ performs a point-to-point transformation: point-based order statistics are more easily disturbed.
HQ performs a block-based transformation: disturbances are automatically absorbed within a block, and with a proper choice of block size the block uncertainty can be compensated by GMM and uncertainty decoding.
HQ gives a smaller distance d for all SNR conditions, i.e. it is less influenced by the noise disturbance.

13 HQ as a feature transformation method

14 HQ as a feature quantization method

15

16

17

18

19

20 Further analysis: bit rates vs. SNR (clean-condition training / multi-condition training)

21 HQ-JUD: for robust and/or distributed speech recognition
For robust speech recognition: HQ is used as the front-end feature transformation, and JUD as the enhancement approach at the back-end recognizer.
For Distributed Speech Recognition (DSR): HQ is applied at the client for data compression, and JUD at the server.

22 Joint Uncertainty Decoding (1/4): Uncertainty Observation Decoding
The HMM should be less discriminative on features with higher uncertainty, i.e. the variance is increased for more uncertain features (w: observation, o: uncorrupted feature).
Assume p(w|o) is Gaussian around o with uncertainty variance Σ_b; integrating out o, each HMM Gaussian N(o; μ, Σ) is then evaluated as N(w; μ, Σ + Σ_b).
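As a concrete, hedged illustration of this idea, the sketch below evaluates a diagonal-Gaussian HMM component with the commonly used uncertainty-decoding form, where the uncertainty variance is simply added to the model variance; the variable names are illustrative.

```python
import numpy as np

def uncertain_log_likelihood(w, mu, var, var_u):
    """Uncertainty observation decoding sketch for a diagonal Gaussian.

    If p(w | o) = N(w; o, var_u) and the HMM state emits p(o) = N(o; mu, var),
    integrating out the hidden clean feature o gives p(w) = N(w; mu, var + var_u).
    A larger var_u (more uncertain feature) flattens the likelihood, so the
    HMM discriminates less on that feature.
    """
    v = var + var_u  # inflated variance per dimension
    return -0.5 * np.sum(np.log(2 * np.pi * v) + (w - mu) ** 2 / v)
```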

23 Joint Uncertainty Decoding (2/4): Uncertainty for quantization errors
The codeword is the observation w; the samples in the partition cell are the uncorrupted features o.
p(o) is the pdf of the samples within the partition cell, and the variance of those samples is used as the quantization uncertainty.

24 Joint Uncertainty Decoding (2/4): Uncertainty for quantization errors
The codeword is the observation w; the samples in the partition cell represent the possible distribution of o.
p(o) is the pdf of the samples within the partition cell, and the variance of those samples is used as the uncertainty.
More uncertain regions correspond to loosely quantized cells, so the variances are increased for the loosely quantized cells.
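A minimal sketch of how such a cell-level uncertainty could be computed from one block, assuming the same rank-based HQ as in the earlier sketch; loosely quantized (wide) cells naturally yield larger variances.

```python
import numpy as np

def cell_uncertainty(block, cell_index, n_cells=8):
    """Quantization-error uncertainty for one HQ cell (illustrative).

    The codeword is the observation w; the block samples that fall in the
    same partition cell stand in for the possible uncorrupted features o,
    so their variance is used as the uncertainty of that cell."""
    block = np.asarray(block, dtype=float)
    ranks = np.argsort(np.argsort(block))
    cdf = (ranks + 0.5) / len(block)
    idx = np.minimum((cdf * n_cells).astype(int), n_cells - 1)
    members = block[idx == cell_index]
    return np.var(members) if len(members) > 1 else 0.0
```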

25 Joint Uncertainty Decoding (3/4): Uncertainty for environmental noise
The variances are increased for HQ features with a larger histogram shift.

26 Joint Uncertainty Decoding (4/4)
Jointly consider the uncertainty caused by both the environmental noise and the quantization errors; one of the two dominates (see the sketch below):
- Quantization errors (high SNR): disturbance is absorbed within the HQ block
- Environmental noise (low SNR): noisy features are moved to another partition cell
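A minimal sketch of the combination, under the assumption (not stated on the slides) that the two variance terms are simply added; the slide only says that one of the two dominates depending on the SNR.

```python
def joint_uncertainty(var_quant, var_shift):
    """Combine the two uncertainty sources for one feature dimension.

    var_quant: quantization-error variance (dominates at high SNR, where
               disturbances stay inside the HQ cell).
    var_shift: histogram-shift variance due to environmental noise
               (dominates at low SNR, where features move to other cells).
    A simple sum is assumed purely for illustration; the paper's exact
    combination rule is not reproduced here.
    """
    return var_quant + var_shift
```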

27 HQ-JUD for robust speech recognition

28 HQ-JUD for distributed speech recognition
Different types of noise, averaged over all SNR values (configurations compared: client HEQ-SVQ; client HEQ-SVQ + server UD; client HQ; client HQ + server JUD).

29 HQ-JUD for distributed speech recognition
Different types of noise, averaged over all SNR values (client HEQ-SVQ vs. client HEQ-SVQ + server UD).
HEQSVQ-UD was slightly worse than HEQ for set C.

30 HQ-JUD for distributed speech recognition
Different types of noise, averaged over all SNR values (client HQ vs. client HQ + server JUD).
HEQSVQ-UD was slightly worse than HEQ for set C, while HQ-JUD consistently improved the performance of HQ.

31 HQ-JUD for distributed speech recognition
Different types of noise, averaged over all SNR values (client HEQ-SVQ vs. client HQ).
HQ performed better than HEQ-SVQ for all types of noise.

32 HQ-JUD for distributed speech recognition
Different types of noise, averaged over all SNR values (client HQ + server JUD vs. client HEQ-SVQ + server UD).
HQ performed better than HEQ-SVQ for all types of noise, and HQ-JUD consistently performed better than HEQSVQ-UD.

33 HQ-JUD for distributed speech recognition
Different SNR conditions, averaged over all noise types (client SVQ + server UD; client HEQ-SVQ + server UD; client HQ + server JUD).
HQ-JUD significantly improved the performance of SVQ-UD and consistently performed better than HEQSVQ-UD.

34 Three-stage error concealment (EC)

35 Stage 1: error detection
Frame-level error detection: the received frame-pairs are first checked with CRC.
Subvector-level error detection: the erroneous frame-pairs are then checked by the HQ consistency check.
The quantized codewords for HQ represent the order-statistics information of the original parameters, and the quantization process does not change the order statistics; therefore, when HQ is re-performed on a received subvector, the codeword should fall in the same partition cell (a minimal sketch follows below).
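A minimal sketch of that consistency check, assuming the rank-based HQ of the earlier sketch and that the check is run over the received codewords of one block; in the error-free case, re-quantizing the dequantized values reproduces the codewords exactly, so any mismatch flags a likely transmission error.

```python
import numpy as np

def hq_consistency_errors(received_idx, n_cells=8):
    """Stage-1 consistency check (illustrative).

    HQ codewords carry only order-statistics information, and quantization
    does not change order statistics, so re-performing rank-based HQ on the
    dequantized block must reproduce the received codewords.  Positions
    where it does not are flagged as likely transmission errors."""
    idx = np.asarray(received_idx)
    # Ranking the codewords directly reproduces the rank structure of the
    # dequantized values, since the representative values are monotonic.
    ranks = np.argsort(np.argsort(idx, kind="stable"), kind="stable")
    cdf = (ranks + 0.5) / len(idx)
    re_idx = np.minimum((cdf * n_cells).astype(int), n_cells - 1)
    return re_idx != idx  # True marks inconsistent codewords
```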

36 Stage 1: error detection
Noise seriously affects the SVQ data consistency check: precision degrades from 66% in clean conditions down to 12% at 0 dB.
The HQ-based consistency approach is much more stable at all SNR values; both recall and precision rates are higher.

37 Stage 2: reconstruction
Based on the Maximum A Posteriori (MAP) criterion, considering the probability of all possible codewords S_t(i) at time t given the current and previous received subvector codewords R_t and R_t-1 (a minimal sketch follows below):
- prior speech source statistics: HQ codeword bigram model
- channel transition probability: based on the BER estimated in stage 1
- reliability of the received subvectors: the relative reliability between the prior speech source and the wireless channel
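The MAP combination can be sketched as below; the array layouts (p_channel, p_bigram) and the plain product of the channel term and the bigram prior are assumptions for illustration, since the paper additionally weights the two terms by the estimated reliability.

```python
import numpy as np

def map_reconstruct(p_channel, p_bigram, prev_idx):
    """Stage-2 MAP reconstruction sketch.

    For every candidate codeword S_t(i) the posterior
        P(S_t(i) | R_t, R_t-1)  ∝  P(R_t | S_t(i)) * P(S_t(i) | R_t-1)
    combines the channel transition probability with the codeword-bigram
    prior trained on clean speech.  Assumed layouts:
    p_channel[i] = P(R_t | S_t(i)), p_bigram[j, i] = P(S_t(i) | S_t-1(j))."""
    posterior = p_channel * p_bigram[prev_idx]
    posterior /= posterior.sum()
    return int(np.argmax(posterior)), posterior
```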

38 Stage 2: reconstruction
Channel transition probability P(R_t | S_t(i)) (a sketch follows below):
- significantly differentiated (for different codewords i, with different d) when R_t is more reliable (BER is smaller)
- puts more emphasis on the prior speech source when R_t is less reliable
- the BER is estimated as the number of inconsistent subvectors in the present frame divided by the total number of bits in the frame
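One common way to realize such a channel term, assumed here and not taken from the paper, is a memoryless bit-error model driven by the estimated BER; it reproduces the qualitative behaviour described on the slide.

```python
def channel_transition_prob(r_t, s_i, n_bits, ber):
    """Sketch of P(R_t | S_t(i)) under a memoryless bit-error channel.

    With estimated bit-error rate `ber` and Hamming distance d between the
    received codeword r_t and candidate s_i,
        P(R_t | S_t(i)) = ber**d * (1 - ber)**(n_bits - d).
    When ber is small this term sharply favours nearby codewords; when ber
    is large it flattens out, so the source prior takes over."""
    d = bin(r_t ^ s_i).count("1")  # Hamming distance in bits
    return (ber ** d) * ((1.0 - ber) ** (n_bits - d))
```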

39 Stage 2: reconstruction
Prior source information P(S_t(i) | R_t-1):
- based on the codeword bigram model trained from the clean training data of AURORA 2
- HQ can estimate the lost subvectors more precisely than SVQ
- measured by the conditional entropy (a sketch follows below)
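A minimal sketch of that conditional entropy measure over the codeword bigram; the matrix layout is an assumption, and lower entropy means the previous codeword predicts the current one better, which is what makes estimating lost subvectors easier.

```python
import numpy as np

def conditional_entropy(bigram, unigram):
    """Conditional entropy H(S_t | S_t-1) of the codeword bigram (sketch).

    Assumed layout: bigram[j, i] = P(S_t = i | S_t-1 = j),
    unigram[j] = P(S_t-1 = j)."""
    with np.errstate(divide="ignore", invalid="ignore"):
        logs = np.where(bigram > 0, np.log2(bigram), 0.0)
    return float(-np.sum(unigram[:, None] * bigram * logs))
```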

40 Stage 3: compensation in Viterbi decoding
The distribution P(S_t(i) | R_t, R_t-1) characterizes the uncertainty of the estimated features.
Assuming this distribution is Gaussian, its variance is used in uncertainty decoding, which makes the HMMs less discriminative for estimated subvectors with higher uncertainty.
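A hedged sketch of how the stage-2 posterior could be turned into the mean and variance handed to uncertainty decoding, assuming the representative values of the earlier HQ sketch; the paper's exact parameterization is not reproduced.

```python
import numpy as np
from scipy.stats import norm

def posterior_uncertainty(posterior, n_cells=8):
    """Mean and variance of the estimated subvector value (illustrative).

    The posterior over codewords from stage 2 is treated as approximately
    Gaussian in the feature domain: its mean is the reconstructed feature
    and its variance is passed to uncertainty decoding, so HMM states
    discriminate less on subvectors estimated with high uncertainty."""
    z = norm.ppf((np.arange(n_cells) + 0.5) / n_cells)
    mean = np.sum(posterior * z)
    var = np.sum(posterior * (z - mean) ** 2)
    return mean, var
```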

41 HQ-based DSR system with transmission errors
Features corrupted by noise are more susceptible to transmission errors: for SVQ, accuracy drops from 98% to 87% in clean conditions, and from 60% to 36% at 10 dB SNR.

42 HQ-based DSR system with transmission errors The improvements that HQ offered over HEQ-SVQ when transmission errors were present are consistent and significant at all SNR values HQ is robust against both environmental noise and transmission errors

43 Analysis of the degradation of recognition accuracy caused by transmission errors
Comparison of SVQ, HEQ-SVQ, and HQ in terms of the percentage of words that were correctly recognized without transmission errors but incorrectly recognized after transmission.

44 HQ-Based DSR with Wireless Channels and Error Concealment
The ETSI repetition technique actually degraded the performance of HEQ-SVQ: whole feature vectors, including the correct subvectors, are replaced by inaccurate estimations.
(g: GPRS, r: ETSI repetition, c: three-stage EC)

45 HQ-Based DSR with Wireless Channels and Error Concealment
Three-stage EC improved the performance significantly in all cases; it is robust not only against transmission errors but against environmental noise as well.
(g: GPRS, r: ETSI repetition, c: three-stage EC)

46 HQ-Based DSR with Wireless Channels and Error Concealment

47 Different client traveling speed (1/3)

48 Different client traveling speed (2/3)

49 Different client traveling speed (3/3)

50 Conclusions
Histogram-based Quantization (HQ) is proposed as a novel approach for robust and/or distributed speech recognition (DSR); it is robust against environmental noise (for all types of noise and all SNR conditions) and against transmission errors.
For future personalized and context-aware DSR environments, HQ can be adapted to network and terminal capabilities, with recognition performance optimized based on environmental conditions.

51 Thank you for your attention

