A New Approach to Utterance Verification Based on Neighborhood Information in Model Space. Authors: Hui Jiang, Chin-Hui Lee. Reporter: 陳燦輝.


1 A New Approach to Utterance Verification Based on Neighborhood Information in Model Space. Authors: Hui Jiang, Chin-Hui Lee. Reporter: 陳燦輝

2 Reference [1] H. Jiang and C.-H. Lee, “A new approach to utterance verification based on neighborhood information in model space,” IEEE Trans. Speech Audio Processing, vol. 11, no. 5, pp. 425–434, 2003. [2] H. Jiang, K. Hirose, and Q. Huo, “Robust speech recognition based on Bayesian prediction approach,” IEEE Trans. Speech Audio Processing, vol. 7, pp. 426–440, July 1999. [3] N. Merhav and C.-H. Lee, “A minimax classification approach with application to robust speech recognition,” IEEE Trans. Speech Audio Processing, vol. 1, pp. 90–100, 1993.

3 Outline: Introduction; UV based on neighborhood information; Bayes factors: a Bayesian tool for verification problems; Experiments; Summary and Conclusions

4 Introduction The major difficulty with utterance verification (UV) based on the likelihood ratio test (LRT) is how to model the alternative hypothesis, because doing so requires knowing the properties of the competing source distributions. In this paper, we investigate a novel idea: performing utterance verification based on neighborhood information in model space.

5 UV based on neighborhood information Nested neighborhoods in model space:

6 UV based on neighborhood information (cont) Nested neighborhoods in model space (cont): Fig. 1. Illustration of the structure of nested neighborhoods in HMM model space.

7 UV based on neighborhood information (cont) Nested neighborhoods in model space (cont):

8 UV based on neighborhood information (cont) Nested neighborhoods in model space (cont):

9 UV based on neighborhood information (cont) For a given speech segment X, assume that an ASR system recognizes it as word W, which is represented by an HMM model. Traditionally, UV is formulated as a statistical hypothesis testing problem. Here, we translate that hypothesis test into one defined on neighborhoods in model space.

10 UV based on neighborhood information (cont) Fig. 2. Illustration of hypothesis testing in the scenario of detecting speech recognition errors based on the neighborhood information.

11 Bayes factors The Bayesian approach to hypothesis testing involves the calculation and evaluation of the so-called Bayes factor. Given the observation X along with two hypotheses $H_0$ and $H_1$, the Bayes factor is computed as the ratio of the marginal likelihoods of X under the two hypotheses.
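The slide's own equation did not survive extraction, so the following is the standard textbook form of the Bayes factor rather than a copy of the paper's expression. The two hypotheses are written here as $H_0$ and $H_1$; the symbols $\lambda$ (model parameters) and $p(\lambda \mid H_i)$ (prior over parameters under hypothesis $H_i$) are notation assumed for this sketch.

```latex
\mathrm{BF}
  = \frac{p(X \mid H_0)}{p(X \mid H_1)}
  = \frac{\int p(X \mid \lambda)\, p(\lambda \mid H_0)\, d\lambda}
         {\int p(X \mid \lambda)\, p(\lambda \mid H_1)\, d\lambda}
```

In this form, each hypothesis contributes an average of the likelihood over its own prior, and $H_0$ is favored when the ratio is large. A decision rule would compare the ratio against a threshold; the threshold itself is not given in the transcript.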

12 Bayes factors (cont) In order to use Bayes factors to solve the hypothesis testing problem, two important issues must be addressed: (1) how to properly choose the prior distribution p(·) of the HMM model parameters for each hypothesis; (2) how to quantitatively define the neighborhoods.

13 Bayes factors (cont)

14 Bayes factors (cont)

15 Bayes factors (cont)

16 Bayes factors (cont)

17 Bayes factors (cont)

18 Bayes factors (cont)

19 Bayes factors (cont)

20 Bayes factors (cont)

21 Bayes factors (cont)

22 Bayes factors (cont)

23 Bayes factors (cont)

24 Bayes factors (cont) In this paper, in order to balance the contributions from different models in the neighborhood, we introduce an exponential scale factor into the integral calculation. The exponential scale factor is important for equalizing the contributions from different models in the neighborhood during the computation of the Bayes factor. Depending on the value chosen, either the models with large likelihood values are emphasized or the models with smaller likelihood values are given more weight.
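The effect of an exponential scale factor can be illustrated with a small sketch. It assumes the factor enters as an exponent on each model's likelihood; the slide's equation is missing, so the exact form used in the paper may differ. The name `rho` and the toy likelihood values are hypothetical.

```python
def scaled_weights(likelihoods, rho):
    """Relative contribution of each model after raising its likelihood to rho.

    Assumption: the scale factor acts as an exponent on the likelihood.
    `rho` is a hypothetical name, not taken from the paper.
    """
    scaled = [p ** rho for p in likelihoods]
    total = sum(scaled)
    return [s / total for s in scaled]


# Toy likelihoods for three models in a neighborhood (illustrative only).
likelihoods = [0.5, 0.3, 0.2]

unscaled = scaled_weights(likelihoods, 1.0)  # plain normalization
sharp = scaled_weights(likelihoods, 2.0)     # exponent above 1
flat = scaled_weights(likelihoods, 0.5)      # exponent below 1
```

Under this assumption, an exponent above 1 shifts weight toward the model with the largest likelihood, while an exponent below 1 flattens the weights so that low-likelihood models count for more.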

25 Bayes factors (cont)

26 Bayes factors (cont)

27 Bayes factors (cont)

28 Bayes factors (cont)

29 Experiments We evaluate the proposed methods on the Bell Labs Communicator system. In our recognition system, we use a 38-dimensional feature vector consisting of 12 Mel LPCCEP, 12 delta CEP, 12 delta-delta CEP, and delta and delta-delta log-energy. The acoustic models are state-tied triphone CDHMMs, which consist of roughly 4K distinct HMM states with an average of 13.2 Gaussian mixture components per state.

30 Experiments (cont) A class-based trigram LM covering 2600 words is used. The ASR system achieves a 15.8% word error rate (WER) on our independent evaluation set, which contains 1395 utterances in total. Based on the word and phoneme segmentations generated by the recognizer, we calculate a confidence score for every recognized word.

31 Experiments (cont) Baseline system: likelihood ratio test. New approach with settings in Case I: we choose the neighborhood and a constrained uniform prior distribution. Since we use static, delta, and delta-delta features, we slightly modify the neighborhood definition in (2).

32 Experiments (cont) New approach with settings in Case I (cont): for the state-dependent setting, we first set one parameter to a small value and the other to a large value. According to (26), we manually checked the range. New approach with settings in Case II: we choose the delta priors in (27) and (28) at the HMM state level. First, for each distinct state, we calculate its distance from all other states. The distance between two HMM states is computed as the minimum Euclidean distance between every possible pair of Gaussian components from these states.
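The state-to-state distance described above can be sketched as follows. The slide does not say which quantity of a Gaussian component is compared; this sketch assumes the distance is taken between component mean vectors. The function name `state_distance` and the toy means are hypothetical.

```python
import math


def state_distance(means_a, means_b):
    """Minimum Euclidean distance over all pairs of Gaussian components.

    Assumption: each component is represented by its mean vector only.
    """
    return min(math.dist(ma, mb) for ma in means_a for mb in means_b)


# Two toy HMM states, each with two 2-dimensional Gaussian means.
state_a = [(0.0, 0.0), (1.0, 1.0)]
state_b = [(4.0, 5.0), (1.0, 2.0)]

d = state_distance(state_a, state_b)  # closest pair is (1, 1) and (1, 2)
```

With roughly 4K states and about 13 components per state, an exhaustive pairwise search like this is feasible offline, since the distances only need to be computed once per model set.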

33 Experiments (cont) New approach with settings in Case II (cont): for each state, we sort all other states according to their distances from the underlying state. In the first case, denoted Case II-A, for each underlying HMM state we choose the neighborhood sizes so that each of the two neighborhoods includes an exact, fixed number of other states. In the second case, denoted Case II-B, we start from the top 1500 sorted states and choose the neighborhood sizes by distance: one neighborhood includes all other states whose distance is below a threshold, and the other includes the states whose distance lies between two thresholds.
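The two ways of selecting neighboring states can be sketched as below. The counts and thresholds are lost from the transcript, so the parameter names (`k_inner`, `k_outer`, `eps_inner`, `eps_outer`) and all numeric values are hypothetical. The sketch also assumes that the second neighborhood is the ring of states just outside the first one, which the slide only states explicitly for the threshold-based case.

```python
def sort_neighbors(distances):
    """Return (state_id, distance) pairs sorted by increasing distance."""
    return sorted(distances.items(), key=lambda item: item[1])


def by_count(distances, k_inner, k_outer):
    """Count-based selection: fixed number of states per neighborhood."""
    ranked = [state for state, _ in sort_neighbors(distances)]
    return ranked[:k_inner], ranked[k_inner:k_inner + k_outer]


def by_threshold(distances, eps_inner, eps_outer, max_candidates=1500):
    """Threshold-based selection restricted to the nearest candidates."""
    ranked = sort_neighbors(distances)[:max_candidates]
    inner = [s for s, d in ranked if d < eps_inner]
    outer = [s for s, d in ranked if eps_inner <= d < eps_outer]
    return inner, outer


# Toy distances from one underlying state to five other states.
distances = {"s1": 0.4, "s2": 2.5, "s3": 0.9, "s4": 1.6, "s5": 3.2}

count_inner, count_outer = by_count(distances, 2, 2)
thr_inner, thr_outer = by_threshold(distances, 1.0, 3.0)
```

The count-based variant gives every state neighborhoods of the same size, while the threshold-based variant lets the size vary with how crowded the model space is around each state.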

34 Experiments (cont) Table I. Verification performance comparison (equal error rate in %) of the baseline UV method (LRT + anti-models) with the proposed new approach in several different settings. In each case, the best performance of the new approach and its corresponding parameter setting are given. Here we always fix one parameter at 1.2.

35 Experiments (cont) Fig. 3. Comparison of ROC curves for different methods when verifying mis-recognized words against correctly recognized words in ASR outputs.
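The equal error rate used to compare the methods is the operating point on the ROC curve where the false rejection rate equals the false acceptance rate. A minimal sketch of estimating it from per-word confidence scores follows; the scores are invented for illustration and are not results from the paper, and `equal_error_rate` is a hypothetical helper.

```python
def equal_error_rate(correct_scores, error_scores):
    """Sweep thresholds over observed scores; return (eer, threshold).

    A word is rejected when its confidence falls below the threshold.
    The EER is approximated at the threshold where the false rejection
    and false acceptance rates are closest.
    """
    best = None
    for t in sorted(set(correct_scores) | set(error_scores)):
        false_reject = sum(s < t for s in correct_scores) / len(correct_scores)
        false_accept = sum(s >= t for s in error_scores) / len(error_scores)
        gap = abs(false_reject - false_accept)
        if best is None or gap < best[0]:
            best = (gap, (false_reject + false_accept) / 2, t)
    return best[1], best[2]


# Toy confidence scores: correctly recognized vs. mis-recognized words.
correct = [0.9, 0.8, 0.7, 0.4]
errors = [0.6, 0.3, 0.2, 0.1]

eer, threshold = equal_error_rate(correct, errors)
```

On real data the two error rates rarely cross exactly at an observed score, which is why the sketch averages them at the closest threshold rather than requiring equality.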

36 Summary and Conclusions The basic idea is to assume that all competing models of a given model sit inside one neighborhood of the underlying model. More research is still needed to find a better neighborhood definition in the high-dimensional HMM model space. Another possible direction for future work is to use other methods instead of Bayes factors, such as generalized likelihood ratio testing (GLRT), to implement neighborhood-based UV.

