
1 Incorporating In-domain Confidence and Discourse Coherence Measures in Utterance Verification
(Detection of speech recognition errors using in-domain confidence and discourse coherence)
Ian R. Lane, Tatsuya Kawahara
Spoken Language Communications Research Laboratories, ATR
School of Informatics, Kyoto University

2 Introduction
Current ASR technologies are not robust against:
–Acoustic mismatch: noise, channel, speaker variance
–Linguistic mismatch: disfluencies, out-of-vocabulary (OOV) and out-of-domain (OOD) utterances
Assess the confidence of each recognition hypothesis and detect recognition errors:
–Enables effective user feedback
–Recovery strategy can be selected based on the type of error and the specific application

3 Previous Works on Confidence Measures
–Feature-based: [Kemp] word duration, AM/LM back-off
–Explicit model-based: [Rahim] likelihood ratio test against a cohort model
–Posterior probability: [Komatani, Soong, Wessel] estimate the posterior probability given all competing hypotheses in a word graph
These approaches are limited to "low-level" information available during ASR decoding

4 Proposed Approach
Exploit knowledge sources outside the ASR framework to estimate recognition confidence, e.g. knowledge about the application domain and the discourse flow
Incorporate confidence measures (CMs) based on "high-level" knowledge sources:
–In-domain confidence: degree of match between the utterance and the application domain
–Discourse coherence: consistency between consecutive utterances in the dialogue

5 Utterance Verification Framework
CM_in-domain(X_i): in-domain confidence
CM_discourse(X_i | X_i-1): discourse coherence
CM(X_i): joint confidence score, combining the above with the generalized posterior probability CM_gpp(X_i)
[Diagram: the input utterance X_i and the previous utterance X_i-1 each pass through the ASR front-end, topic classification, and in-domain verification (out-of-domain detection), yielding CM_gpp(X_i), CM_in-domain(X_i), and CM_in-domain(X_i-1); the distance dist(X_i, X_i-1) between the two topic-classification outputs yields CM_discourse(X_i | X_i-1), and the three scores are combined into CM(X_i)]

6 In-domain Confidence
Measure of topic consistency with the application domain
–Previously applied in out-of-domain utterance detection
Examples of errors detected via in-domain confidence (REF: correct transcription, ASR: speech recognition hypothesis):
Mismatch of domain
REF: How can I print this WORD file double-sided
ASR: How can I open this word on the pool-side
Hypothesis not topically consistent → in-domain confidence low
Erroneous recognition hypothesis
REF: I want to go to Kyoto, can I go by bus
ASR: I want to go to Kyoto, can I take a bath
Hypothesis not topically consistent → in-domain confidence low

7 In-domain Confidence
[Diagram: input utterance X_i (recognition hypothesis)
→ transformation to vector space: feature vector
→ classification of multiple topics, SVMs (1~m): topic confidence scores (C(t_1 | X_i), ..., C(t_m | X_i))
→ in-domain verification: V_in-domain(X_i)
→ CM_in-domain(X_i): in-domain confidence]

8 In-domain Confidence (example)
e.g. "could I have a non-smoking seat"
Transformation to vector space: features (a, an, …, room, …, seat, …, I+have, …) → vector (1, 0, …, 0, …, 1, …, 1, …)
Classification of multiple topics, SVMs (1~m): topic confidence scores shown for accom., airplane, airport, … are 0.05, 0.36, 0.94
In-domain verification: in-domain confidence 90%
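
To make the pipeline on slides 7-8 concrete, here is a minimal Python sketch of the vector-space transformation and topic-classification steps. The vocabulary, function names, and the callable-per-topic interface are hypothetical illustrations; the slides do not specify the actual ATR feature set or SVM training.

```python
from typing import Callable, Dict, List

# Toy vocabulary including a word-pair feature, mirroring the slide's
# (a, an, ..., seat, ..., I+have, ...) example. Hypothetical.
VOCAB = ["a", "an", "room", "seat", "I+have"]

def to_vector(utterance: str) -> List[int]:
    """Map a recognition hypothesis to a binary bag-of-words vector."""
    words = set(utterance.lower().split())
    vec = []
    for feat in VOCAB:
        if "+" in feat:  # word-pair feature, e.g. "I+have"
            vec.append(int(all(w.lower() in words for w in feat.split("+"))))
        else:
            vec.append(int(feat in words))
    return vec

def topic_confidences(vec: List[int],
                      svms: Dict[str, Callable[[List[int]], float]]) -> Dict[str, float]:
    """One confidence score C(t_j | X_i) per topic; each entry of `svms`
    is any callable returning a score in [0, 1] (e.g. a calibrated SVM)."""
    return {topic: svm(vec) for topic, svm in svms.items()}
```

For the slide's example, to_vector("could I have a non-smoking seat") yields (1, 0, 0, 1, 1), matching the pattern shown.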

9 In-domain Verification Model
A linear discriminant verification model is applied:
V_in-domain(X_i) = Σ_{j=1}^{m} λ_j · C(t_j | X_i)
λ_1, …, λ_m are trained on in-domain data using "deleted interpolation of topics" and GPD [Lane '04]
C(t_j | X_i): topic classification confidence score of topic t_j for input utterance X_i
λ_j: discriminant weight for topic t_j
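
Read concretely, the verification score is a weighted sum of the topic confidence scores; a minimal sketch, assuming the weights λ_j have already been trained as the slide describes:

```python
from typing import Dict

def v_in_domain(topic_scores: Dict[str, float],
                weights: Dict[str, float]) -> float:
    """Linear discriminant verification score:
    V_in-domain(X_i) = sum_j lambda_j * C(t_j | X_i)."""
    return sum(weights[t] * c for t, c in topic_scores.items())
```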

10 Discourse Coherence
Topic consistency with the preceding utterance
Example of errors detected via discourse coherence (REF: correct transcription, ASR: speech recognition hypothesis):
Erroneous recognition hypothesis
Speaker A, previous utterance [X_i-1]:
REF: What type of shirt are you looking for?
ASR: What type of shirt are you looking for?
Speaker B, current utterance [X_i]:
REF: I'm looking for a white T-shirt.
ASR: I'm looking for a white teacher.
Topic not consistent across utterances → discourse coherence low

11 Discourse Coherence
Based on the Euclidean distance between the current (X_i) and previous (X_i-1) utterances in topic confidence space
CM_discourse is large when X_i and X_i-1 are topically related, and low when they differ
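
Since the distance is taken in topic confidence space, it can be written out as below; the mapping from distance to CM_discourse is my assumption (a decreasing transform such as a sigmoid, consistent with the "CM sigmoid transforms" trained on the development set in slide 14).

```latex
\mathrm{dist}(X_i, X_{i-1}) \;=\;
  \sqrt{\sum_{j=1}^{m} \bigl( C(t_j \mid X_i) - C(t_j \mid X_{i-1}) \bigr)^{2}}
```

So CM_discourse(X_i | X_i-1) decreases as the topic confidence vectors of consecutive utterances drift apart.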

12 Joint Confidence Score: Generalized Posterior Probability
Measures the confusability of the recognition hypothesis against competing hypotheses [Lo & Soong]
At the utterance level, the word-level scores are combined into CM_gpp(X)
GWPP(x_j): generalized word posterior probability of x_j
x_j: j-th word in the recognition hypothesis of X
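
The exact utterance-level combination did not survive in this transcript; one plausible reconstruction, assuming the utterance score simply averages the word posteriors, is:

```latex
\mathrm{CM}_{\mathrm{gpp}}(X) \;=\; \frac{1}{N} \sum_{j=1}^{N} \mathrm{GWPP}(x_j)
```

where N is the number of words in the recognition hypothesis of X.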

13 Joint Confidence Score
CM(X_i) = λ_gpp · CM_gpp(X_i) + λ_in-domain · CM_in-domain(X_i) + λ_discourse · CM_discourse(X_i | X_i-1)
For utterance verification, CM(X_i) is compared to a threshold (θ)
Model weights (λ_gpp, λ_in-domain, λ_discourse) and threshold (θ) are trained on a development set
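
Putting the pieces together, a minimal sketch of the verification decision; the weight and threshold values below are hypothetical placeholders, since the paper trains them on the development set:

```python
# Hypothetical parameter values; the paper learns these on the dev set.
WEIGHTS = {"gpp": 0.5, "in_domain": 0.3, "discourse": 0.2}
THETA = 0.6  # verification threshold

def verify(cm_gpp: float, cm_in_domain: float, cm_discourse: float) -> bool:
    """Accept the hypothesis if the joint confidence clears the threshold."""
    cm = (WEIGHTS["gpp"] * cm_gpp
          + WEIGHTS["in_domain"] * cm_in_domain
          + WEIGHTS["discourse"] * cm_discourse)
    return cm >= THETA  # True: accept; False: reject, prompt user to rephrase
```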

14 Experimental Setup
Training set: ATR BTEC (Basic Travel Expressions Corpus)
–~400k sentences (Japanese/English pairs)
–14 topic classes (accommodation, shopping, transit, …)
–Used to train the topic-classification and in-domain verification models
Evaluation data: ATR MAD (Machine Aided Dialogue)
–Natural dialogues between English and Japanese speakers via the ATR speech-to-speech translation system
–Dialogue data collected based on a set of pre-defined scenarios
–Development set: 270 dialogues; Test set: 90 dialogues
Trained on the development set:
–CM sigmoid transforms
–CM weights (λ_gpp, λ_in-domain, λ_discourse)
–Verification threshold (θ)

15 Speech Recognition Performance

                     Development    Test
# dialogues                  270      90
Japanese side
  # utterances              2674    1011
  WER                      10.5%   10.7%
  SER                      41.9%   42.3%
English side
  # utterances              3091    1006
  WER                      17.0%   16.2%
  SER                      63.5%   55.2%

ASR performed with ATRASR; a 2-gram LM is applied during decoding, and the lattice is rescored with a 3-gram LM

16 Evaluation Measure
Utterance-based verification
–No definite "keyword" set exists in speech-to-speech translation
–If a recognition error occurs (one or more word errors), prompt the user to rephrase the entire utterance
CER (confidence error rate)
–FA: false acceptance of an incorrectly recognized utterance
–FR: false rejection of a correctly recognized utterance
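
The CER formula itself is not shown in this transcript; the definition consistent with the terms above, assuming errors are counted over all utterances, would be:

```latex
\mathrm{CER} \;=\; \frac{N_{\mathrm{FA}} + N_{\mathrm{FR}}}{N_{\mathrm{utterances}}}
```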

17 GPP-based Verification Performance
Accept All: assume all utterances are correctly recognized
GPP: generalized posterior probability
Large reduction in verification errors compared with the "Accept All" case
CER: 17.3% (Japanese) and 15.3% (English)
[Chart: CER for the Accept All and GPP cases, Japanese and English]

18 Incorporation of IC and DC Measures (Japanese)
GPP: generalized posterior probability; IC: in-domain confidence; DC: discourse coherence
CER reduced by 5.7% and 4.6% for the "GPP+IC" and "GPP+DC" cases
CER 17.3% → 15.9% (8.0% relative) for the "GPP+IC+DC" case
[Chart: CER for GPP, GPP+IC, GPP+DC, GPP+IC+DC]

19 Incorporation of IC and DC Measures (English)
GPP: generalized posterior probability; IC: in-domain confidence; DC: discourse coherence
Similar performance on the English side
CER 15.3% → 14.4% for the "GPP+IC+DC" case
[Chart: CER for GPP, GPP+IC, GPP+DC, GPP+IC+DC]

20 Conclusions
Proposed a novel utterance verification scheme incorporating "high-level" knowledge:
–In-domain confidence: degree of match between the utterance and the application domain
–Discourse coherence: consistency between consecutive utterances
Both proposed measures are effective: relative reductions in CER of 8.0% (Japanese) and 6.1% (English)

21 Future Work
"High-level" content-based verification
–Ignore ASR errors that do not affect translation quality
–Further improvement in performance
Topic switching
–Determine when users switch tasks (currently a single task per dialogue session is assumed)

