
1 Incorporating In-domain Confidence and Discourse Coherence Measures in Utterance Verification
(Detection of speech recognition errors using in-domain confidence and discourse coherence)
Ian R. Lane, Tatsuya Kawahara
Spoken Language Communications Research Laboratories, ATR
School of Informatics, Kyoto University

2 Introduction
Current ASR technologies are not robust against:
–Acoustic mismatch: noise, channel, speaker variance
–Linguistic mismatch: disfluencies, out-of-vocabulary (OOV) and out-of-domain (OOD) utterances
Assess the confidence of each recognition hypothesis and detect recognition errors:
–Enables effective user feedback
–Recovery strategy can be selected based on the type of error and the specific application

3 Previous Works on Confidence Measures
–Feature-based: [Kemp] word duration, AM/LM back-off
–Explicit model-based: [Rahim] likelihood ratio test against a cohort model
–Posterior probability: [Komatani, Soong, Wessel] estimate the posterior probability given all competing hypotheses in a word graph
These approaches are limited to "low-level" information available during ASR decoding

4 Proposed Approach
Exploit knowledge sources outside the ASR framework to estimate recognition confidence, e.g. knowledge about the application domain and the discourse flow
Incorporate confidence measures (CMs) based on "high-level" knowledge sources:
–In-domain confidence: degree of match between the utterance and the application domain
–Discourse coherence: consistency between consecutive utterances in the dialogue

5 Utterance Verification Framework
CM_in-domain(X_i): in-domain confidence
CM_discourse(X_i | X_i-1): discourse coherence
CM(X_i): joint confidence score, combining the above with the generalized posterior probability CM_gpp(X_i)
[Diagram: the input utterance X_i and the previous utterance X_i-1 each pass through the ASR front-end, topic classification, and in-domain verification (out-of-domain detection), yielding CM_gpp(X_i), CM_in-domain(X_i), and CM_in-domain(X_i-1); the distance dist(X_i, X_i-1) between the two topic-classification outputs yields CM_discourse(X_i | X_i-1), and the three scores are combined into CM(X_i)]

6 In-domain Confidence
Measure of topic consistency with the application domain
–Previously applied in out-of-domain utterance detection
Examples of errors detected via in-domain confidence (REF: correct transcription, ASR: speech recognition hypothesis):
Mismatch of domain
REF: How can I print this WORD file double-sided
ASR: How can I open this word on the pool-side
Hypothesis not topically consistent → in-domain confidence low
Erroneous recognition hypothesis
REF: I want to go to Kyoto, can I go by bus
ASR: I want to go to Kyoto, can I take a bath
Hypothesis not topically consistent → in-domain confidence low

7 In-domain Confidence
[Diagram: input utterance X_i (recognition hypothesis)
→ transformation to vector space: feature vector
→ classification of multiple topics, SVMs (1~m): topic confidence scores (C(t_1 | X_i), ..., C(t_m | X_i))
→ in-domain verification: V_in-domain(X_i)
→ CM_in-domain(X_i): in-domain confidence]

8 In-domain Confidence (example)
e.g. "could I have a non-smoking seat"
Transformation to vector space: features (a, an, …, room, …, seat, …, I+have, …) → vector (1, 0, …, 0, …, 1, …, 1, …)
Classification of multiple topics, SVMs (1~m): topic confidence scores shown for accom., airplane, airport, … are 0.05, 0.36, 0.94
In-domain verification: in-domain confidence 90%
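
To make the pipeline on slides 7-8 concrete, here is a minimal Python sketch of the vector-space transformation and topic-classification steps. The vocabulary, function names, and the callable-per-topic interface are hypothetical illustrations; the slides do not specify the actual ATR feature set or SVM training.

```python
from typing import Callable, Dict, List

# Toy vocabulary including a word-pair feature, mirroring the slide's
# (a, an, ..., seat, ..., I+have, ...) example. Hypothetical.
VOCAB = ["a", "an", "room", "seat", "I+have"]

def to_vector(utterance: str) -> List[int]:
    """Map a recognition hypothesis to a binary bag-of-words vector."""
    words = set(utterance.lower().split())
    vec = []
    for feat in VOCAB:
        if "+" in feat:  # word-pair feature, e.g. "I+have"
            vec.append(int(all(w.lower() in words for w in feat.split("+"))))
        else:
            vec.append(int(feat in words))
    return vec

def topic_confidences(vec: List[int],
                      svms: Dict[str, Callable[[List[int]], float]]) -> Dict[str, float]:
    """One confidence score C(t_j | X_i) per topic; each entry of `svms`
    is any callable returning a score in [0, 1] (e.g. a calibrated SVM)."""
    return {topic: svm(vec) for topic, svm in svms.items()}
```

For the slide's example, to_vector("could I have a non-smoking seat") yields (1, 0, 0, 1, 1), matching the pattern shown.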

9 In-domain Verification Model
A linear discriminant verification model is applied:
V_in-domain(X_i) = Σ_{j=1}^{m} λ_j · C(t_j | X_i)
λ_1, …, λ_m are trained on in-domain data using "deleted interpolation of topics" and GPD [Lane '04]
C(t_j | X_i): topic classification confidence score of topic t_j for input utterance X_i
λ_j: discriminant weight for topic t_j
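
Read concretely, the verification score is a weighted sum of the topic confidence scores; a minimal sketch, assuming the weights λ_j have already been trained as the slide describes:

```python
from typing import Dict

def v_in_domain(topic_scores: Dict[str, float],
                weights: Dict[str, float]) -> float:
    """Linear discriminant verification score:
    V_in-domain(X_i) = sum_j lambda_j * C(t_j | X_i)."""
    return sum(weights[t] * c for t, c in topic_scores.items())
```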

10 Discourse Coherence
Topic consistency with the preceding utterance
Example of errors detected via discourse coherence (REF: correct transcription, ASR: speech recognition hypothesis):
Erroneous recognition hypothesis
Speaker A, previous utterance [X_i-1]:
REF: What type of shirt are you looking for?
ASR: What type of shirt are you looking for?
Speaker B, current utterance [X_i]:
REF: I'm looking for a white T-shirt.
ASR: I'm looking for a white teacher.
Topic not consistent across utterances → discourse coherence low

11 Discourse Coherence
Based on the Euclidean distance between the current (X_i) and previous (X_i-1) utterances in topic confidence space
CM_discourse is large when X_i and X_i-1 are topically related, and low when they differ
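
Since the distance is taken in topic confidence space, it can be written out as below; the mapping from distance to CM_discourse is my assumption (a decreasing transform such as a sigmoid, consistent with the "CM sigmoid transforms" trained on the development set in slide 14).

```latex
\mathrm{dist}(X_i, X_{i-1}) \;=\;
  \sqrt{\sum_{j=1}^{m} \bigl( C(t_j \mid X_i) - C(t_j \mid X_{i-1}) \bigr)^{2}}
```

So CM_discourse(X_i | X_i-1) decreases as the topic confidence vectors of consecutive utterances drift apart.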

12 Joint Confidence Score: Generalized Posterior Probability
Measures the confusability of the recognition hypothesis against competing hypotheses [Lo & Soong]
At the utterance level, the word-level scores are combined into CM_gpp(X)
GWPP(x_j): generalized word posterior probability of x_j
x_j: j-th word in the recognition hypothesis of X
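
The exact utterance-level combination did not survive in this transcript; one plausible reconstruction, assuming the utterance score simply averages the word posteriors, is:

```latex
\mathrm{CM}_{\mathrm{gpp}}(X) \;=\; \frac{1}{N} \sum_{j=1}^{N} \mathrm{GWPP}(x_j)
```

where N is the number of words in the recognition hypothesis of X.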

13 Joint Confidence Score
CM(X_i) = λ_gpp · CM_gpp(X_i) + λ_in-domain · CM_in-domain(X_i) + λ_discourse · CM_discourse(X_i | X_i-1)
For utterance verification, CM(X_i) is compared to a threshold (θ)
Model weights (λ_gpp, λ_in-domain, λ_discourse) and threshold (θ) are trained on a development set
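
Putting the pieces together, a minimal sketch of the verification decision; the weight and threshold values below are hypothetical placeholders, since the paper trains them on the development set:

```python
# Hypothetical parameter values; the paper learns these on the dev set.
WEIGHTS = {"gpp": 0.5, "in_domain": 0.3, "discourse": 0.2}
THETA = 0.6  # verification threshold

def verify(cm_gpp: float, cm_in_domain: float, cm_discourse: float) -> bool:
    """Accept the hypothesis if the joint confidence clears the threshold."""
    cm = (WEIGHTS["gpp"] * cm_gpp
          + WEIGHTS["in_domain"] * cm_in_domain
          + WEIGHTS["discourse"] * cm_discourse)
    return cm >= THETA  # True: accept; False: reject, prompt user to rephrase
```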

14 Experimental Setup
Training set: ATR BTEC (Basic Travel Expressions Corpus)
–~400k sentences (Japanese/English pairs)
–14 topic classes (accommodation, shopping, transit, …)
–Used to train the topic-classification and in-domain verification models
Evaluation data: ATR MAD (Machine Aided Dialogue)
–Natural dialogues between English and Japanese speakers via the ATR speech-to-speech translation system
–Dialogue data collected based on a set of pre-defined scenarios
–Development set: 270 dialogues; Test set: 90 dialogues
Trained on the development set:
–CM sigmoid transforms
–CM weights (λ_gpp, λ_in-domain, λ_discourse)
–Verification threshold (θ)

15 Speech Recognition Performance

                     Development    Test
# dialogues                  270      90
Japanese side
  # utterances              2674    1011
  WER                      10.5%   10.7%
  SER                      41.9%   42.3%
English side
  # utterances              3091    1006
  WER                      17.0%   16.2%
  SER                      63.5%   55.2%

ASR performed with ATRASR; a 2-gram LM is applied during decoding, and the lattice is rescored with a 3-gram LM

16 Evaluation Measure
Utterance-based verification
–No definite "keyword" set exists in speech-to-speech translation
–If a recognition error occurs (one or more word errors), prompt the user to rephrase the entire utterance
CER (confidence error rate)
–FA: false acceptance of an incorrectly recognized utterance
–FR: false rejection of a correctly recognized utterance
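
The CER formula itself is not shown in this transcript; the definition consistent with the terms above, assuming errors are counted over all utterances, would be:

```latex
\mathrm{CER} \;=\; \frac{N_{\mathrm{FA}} + N_{\mathrm{FR}}}{N_{\mathrm{utterances}}}
```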

17 GPP-based Verification Performance
Accept All: assume all utterances are correctly recognized
GPP: generalized posterior probability
Large reduction in verification errors compared with the "Accept All" case
CER: 17.3% (Japanese) and 15.3% (English)
[Chart: CER for the Accept All and GPP cases, Japanese and English]

18 Incorporation of IC and DC Measures (Japanese)
GPP: generalized posterior probability; IC: in-domain confidence; DC: discourse coherence
CER reduced by 5.7% and 4.6% for the "GPP+IC" and "GPP+DC" cases
CER 17.3% → 15.9% (8.0% relative) for the "GPP+IC+DC" case
[Chart: CER for GPP, GPP+IC, GPP+DC, GPP+IC+DC]

19 Incorporation of IC and DC Measures (English)
GPP: generalized posterior probability; IC: in-domain confidence; DC: discourse coherence
Similar performance on the English side
CER 15.3% → 14.4% for the "GPP+IC+DC" case
[Chart: CER for GPP, GPP+IC, GPP+DC, GPP+IC+DC]

20 Conclusions
Proposed a novel utterance verification scheme incorporating "high-level" knowledge:
–In-domain confidence: degree of match between the utterance and the application domain
–Discourse coherence: consistency between consecutive utterances
Both proposed measures are effective: relative reductions in CER of 8.0% (Japanese) and 6.1% (English)

21 Future Work
"High-level" content-based verification
–Ignore ASR errors that do not affect translation quality
–Further improvement in performance
Topic switching
–Determine when users switch tasks (currently a single task per dialogue session is assumed)

