
1 Dynamic Tuning Of Language Model Score In Speech Recognition Using A Confidence Measure
Sherif Abdou, Michael Scordilis
Department of Electrical and Computer Engineering, University of Miami, Coral Gables, Florida 33124, U.S.A.
DSAP

2 Abstract
Speech recognition errors limit the ability of language models to predict subsequent words correctly. Error analysis on Switchboard data shows that:
87% of words preceded by a correctly decoded word were themselves decoded correctly
47% of words preceded by an incorrectly decoded word were decoded correctly
An effective way to enhance the contribution of the language model is to use confidence measures. Most current efforts to develop confidence measures for speech recognition focus on verifying the final result, but make no attempt to correct recognition errors. In this work, we use confidence measures early, during the search process: a word-based acoustic confidence metric is used to define a dynamic language weight.

3 Using Confidence To Guide The Search
The search score is changed from
Score(W) = log P(A|W) + LW · log P(W)
to the confidence-based score
Score(W) = log P(A|W) + LW(C(W)) · log P(W)
where
A : the acoustic input
W : the hypothesized word sequence
P(A|W) : the acoustic model score
P(W) : the language model score
LW : the language weight
C(W) : the confidence of word sequence W
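A minimal sketch of this rescoring in Python; the function and argument names are illustrative rather than the authors' implementation, and `lw_of` stands in for the language-weight function defined on the next slide:

```python
def hypothesis_score(log_p_acoustic, log_p_lm, c_w, lw_of):
    """Confidence-based score: log P(A|W) + LW(C(W)) * log P(W).

    log_p_acoustic : acoustic model log-score, log P(A|W)
    log_p_lm       : language model log-score, log P(W)
    c_w            : C(W), confidence of the hypothesized word sequence
    lw_of          : callable mapping C(W) to a language weight
    """
    return log_p_acoustic + lw_of(c_w) * log_p_lm

# The static baseline is recovered with a constant weight:
# hypothesis_score(-120.0, -8.5, 0.9, lambda c: 6.5)
```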

4 We used the functional form
LW = LW(C(W)), a smooth function of the sequence confidence parameterized by the static weight LW_0, the operating point C_0 and the smoothing parameter r (plotted on the next slide).
The word sequence confidence is estimated by the average of its words' confidences:
C(W) = (1/N) Σ_{j=1}^{N} C(w_j)
where
N : the number of words in sequence W
C(w_j) : the confidence of word w_j
C_0 : the operating point threshold
LW_0 : the static language weight
r : a smoothing parameter
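The slide's exact expression for LW did not survive extraction, so the sketch below uses an assumed sigmoid with the right qualitative behavior (equal to LW_0 at C_0, larger for low confidence, smaller for high confidence); only the averaging of word confidences is taken directly from the slide:

```python
import numpy as np

LW0, C0 = 6.5, 0.65  # static weight and operating point quoted on the next slide

def sequence_confidence(word_confidences):
    """C(W): the average of the word confidences C(w_j)."""
    return float(np.mean(word_confidences))

def dynamic_language_weight(c_w, lw0=LW0, c0=C0, r=10.0):
    """Assumed sigmoid stand-in for LW(C(W)): equals lw0 at the operating
    point c0, rises toward 2*lw0 when confidence is low, and decays toward
    0 when the acoustics are trusted; r controls the smoothing."""
    return 2.0 * lw0 / (1.0 + np.exp(r * (c_w - c0)))
```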

5 For bigram models we approximate C(W) by the average confidence of the current and previous words:
C(W) ≈ (C(w_{j-1}) + C(w_j)) / 2
[Figure: LW as a function of C(W), with LW_0 = 6.5 and C_0 = 0.65]
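In code, the bigram-level approximation is just a two-word special case of the average above (names illustrative):

```python
def bigram_confidence(c_prev, c_curr):
    """Approximate C(W) for a bigram by averaging the confidence of the
    previous and the current word, so the weight can be set on-line."""
    return 0.5 * (c_prev + c_curr)
```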

6 Constraints On The Measures Used For The Confidence-Based Language Model (CBLM)
Efficiency : has to be computationally inexpensive
Synchronization : can be extracted from on-line information
Source of information : extracted only from acoustic data

7 Word Posterior As a Confidence Measure
P(W|A) = P(A|W) P(W) / P(A)
The normalizing acoustic probability P(A) is ignored in all ASR systems, since it does not change the ranking of hypotheses; estimating it is what turns the decoder score into a posterior-based confidence.
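As a sketch, the posterior-based confidence is the Bayes ratio computed in the log domain; log_p_a is the term decoders drop, supplied here by the catch-all model of the following slides (names are illustrative):

```python
def log_word_posterior(log_p_a_given_w, log_p_w, log_p_a):
    """log P(W|A) = log P(A|W) + log P(W) - log P(A)."""
    return log_p_a_given_w + log_p_w - log_p_a
```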

8 Observation Probability Estimation
Theoretically: P(x) = Σ_q P(x|q) P(q)
Discrete HMM: P(x|q) = b_q(m(x))
Semi-Continuous HMM: P(x|q) = Σ_{i=1}^{C} w_iq g_i(x)
where
q : model states
m(x) : vector quantization of x
C : number of mixtures
w_iq : mixture weights
g_i(x) : mixtures
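A sketch of the semi-continuous case, assuming a shared pool of C Gaussians and state-specific weights; the names and the use of scipy are illustrative:

```python
from scipy.stats import multivariate_normal

def semi_continuous_obs_prob(x, weights, codebook):
    """P(x|q) = sum_i w_iq * g_i(x): the state q supplies the weights w_iq,
    while the C Gaussians g_i (the codebook) are shared by all states.

    weights  : length-C array of mixture weights for state q
    codebook : list of C (mean, cov) pairs for the shared Gaussians
    """
    return sum(w * multivariate_normal.pdf(x, mean=m, cov=c)
               for w, (m, c) in zip(weights, codebook))
```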

9 Continuous HMM: every state has its own set of Gaussian mixtures, so there is no shared codebook; estimating P(x) requires building a catch-all model that pools the mixtures of all states.
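A sketch of one way such a catch-all model can be formed, pooling every state's Gaussians under a uniform state prior; both choices are assumptions here, not details given on the slide:

```python
from scipy.stats import multivariate_normal

def catch_all_prob(x, state_mixtures):
    """Estimate P(x) by pooling the Gaussians of all states into one large
    mixture, assuming a uniform prior P(q) = 1/|states|.

    state_mixtures : one (weights, codebook) pair per state, with codebook
                     a list of (mean, cov) pairs for that state's Gaussians
    """
    total = sum(w * multivariate_normal.pdf(x, mean=m, cov=c)
                for weights, codebook in state_mixtures
                for w, (m, c) in zip(weights, codebook))
    return total / len(state_mixtures)
```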

10 Mixtures Clustering Technique
Similar Gaussian mixtures are clustered to reduce the size of the catch-all model.
B distance : the Bhattacharyya distance
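The slide only names the distance; a sketch of it for diagonal-covariance Gaussians follows (the merge policy built on top of it is not specified here):

```python
import numpy as np

def bhattacharyya_distance(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians:
    (1/8)(mu1-mu2)^T S^-1 (mu1-mu2) + (1/2) ln(|S| / sqrt(|S1||S2|)),
    with S = (S1 + S2) / 2. Mixtures with a small distance are candidates
    for clustering together."""
    var = 0.5 * (var1 + var2)
    mean_term = 0.125 * np.sum((mu1 - mu2) ** 2 / var)
    cov_term = 0.5 * (np.sum(np.log(var))
                      - 0.5 * (np.sum(np.log(var1)) + np.sum(np.log(var2))))
    return mean_term + cov_term
```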

11 Vector Quantization
Computation is reduced using VQ: each observation vector is mapped to its nearest code vector, the code vectors being the Gaussian mixture means.
OV : observation vector
CV_i : code vector (a Gaussian mixture mean)
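A minimal sketch of the quantization step; the nearest-neighbour rule under squared Euclidean distance is an assumption, since the slide does not state the distance used:

```python
import numpy as np

def quantize(ov, codebook):
    """Map an observation vector OV to the index of its nearest code vector
    CV_i. With the mixture means as the codebook, likelihoods can then be
    looked up per code vector instead of evaluated per frame."""
    return int(np.argmin(np.sum((codebook - ov) ** 2, axis=1)))
```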

12 The Catch-all Model Performance
[Figure: Relative ROC performance of reduced catch-all models]

13 Word Level Confidence Measures
Arithmetic mean: CM_am = (1/N) Σ_{i=1}^{N} CM(a_i)
Geometric mean: CM_gm = (Π_{i=1}^{N} CM(a_i))^{1/N}
Weighted mean: CM_wm, a weighted average of the phoneme scores with weights given by a linear model
where
CM(a) : confidence score of phoneme a
a, b : linear model parameters
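Sketches of the three combinations; the arithmetic and geometric means follow directly from their names, while the weighting in CM_wm is an assumed form, since the slide only says the weights come from a linear model with parameters a and b:

```python
import numpy as np

def cm_arithmetic(phone_scores):
    """CM_am: arithmetic mean of the phoneme confidence scores CM(a)."""
    return float(np.mean(phone_scores))

def cm_geometric(phone_scores):
    """CM_gm: geometric mean, computed in the log domain for stability."""
    return float(np.exp(np.mean(np.log(phone_scores))))

def cm_weighted(phone_scores, a, b):
    """CM_wm: weighted mean with assumed per-phoneme weights a*score + b."""
    scores = np.asarray(phone_scores, dtype=float)
    weights = a * scores + b
    return float(np.sum(weights * scores) / np.sum(weights))
```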

14 Word Level Confidence Measures Performance
[Figure: ROC curves indicating the relative performance of CM_am, CM_gm and CM_wm]

15 Performance Evaluation Compared With Other Approaches
[Figure: Comparison of the catch-all model measure, the likelihood ratio (LR) measure and the word-lattice based measure]

16 Experimental Results
WER for different threshold and smoothing parameter (r) values:

r \ threshold    0.5      0.6      0.7      0.8      0.9
0                19.3%    19.3%    19.3%    19.3%    19.3%
1                18.6%    18.43%   18.41%   18.31%   24%
2                18.9%    18.42%   18.41%   18.30%   22%
3                18.9%    18.47%   18.63%   18.43%   25%

(r = 0 disables the dynamic tuning, so its row is the 19.3% static-weight baseline at every threshold.)

[Figure: Recognition accuracy for words following correctly decoded and incorrectly decoded words]

17 CONCLUSION AND FUTURE WORK
We used a confidence metric to improve the integration of the system models and to guide the search towards the most promising paths.
Dynamic tuning of the language model weight parameter proved to be effective for performance improvement.
Word-posterior based confidence measures are efficient and can be extracted from the on-line search side information; they do not require the training of anti-models.
With CBLM the language model score is favored in regions of ambiguous acoustics, but plays second fiddle when the acoustics are well matched.
Future work: we plan to extend this approach to the case where only one of the two words has high confidence; the search should then back off to the unigram language model score rather than reduce the language model score entirely.

