Error Detection in Human-Machine Interaction
Dan Bohus
DoD Group, Oct 2002

Errors in Spoken-Language Interfaces
Speech recognition is problematic:
  - Input signal quality
  - Accents, non-native speakers
  - Spoken language disfluencies: stutters, false starts, /mm/, /um/
Typical word error rates in SDS: 10-30%
Systems today lack the ability to recover gracefully from errors

An example
S: Are you a registered user?
U: No I'm not. No [NO I'M NOT NO]
S: What is your full name?
U: Adam Schumacher [ATHENS IN AKRON]
S: There is an Athens in Georgia and in Greece. Which destination did you want?
U: Neither [NEITHER]
S: I'm sorry, I didn't catch that. There is an Athens in Georgia and in Greece. Which destination did you want?
U: Georgia [GEORGIA]
S: A flight from Athens... Where do you want to go?
U: Start over [START OVER]
S: Do you really want to start over?
U: Yes [YES]
S: What city are you leaving from?
U: Hamilton, Ontario [HILTON ONTARIO]
S: Sorry, I'm not sure I understood what you said. Where are you leaving from?
U: Hamilton [HILTON]
S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Toronto [TORONTO]
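
The recognition errors in the example dialog can be quantified as a word error rate: substitutions, deletions and insertions divided by the number of words actually spoken. The sketch below is a minimal illustration using word-level edit distance; the function name and the lowercasing/whitespace tokenization are assumptions made for the example, not something the slides specify.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate = word-level edit distance / number of reference words."""
    # Assumed normalization: lowercase, strip commas, split on whitespace.
    ref = reference.lower().replace(",", " ").split()
    hyp = hypothesis.lower().replace(",", " ").split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Turns taken from the example dialog: what was said vs. what was recognized.
    print(word_error_rate("Hamilton, Ontario", "HILTON ONTARIO"))
    print(word_error_rate("Toronto", "TORONTO"))
```

Note that a single turn can exceed 100% when the recognizer inserts more words than were spoken, which is why the rate is usually reported over a whole corpus rather than per utterance.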

Pathway to a solution
Make systems aware of unreliability in their inputs
  - Confidence scores
Develop a model which learns to optimally choose between several prevention/repair strategies
  - Identify strategies
  - Express them in a computable manner
  - Develop the model
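
To make "express them in a computable manner" concrete, here is a minimal sketch of a fixed-threshold policy that maps a recognition confidence score to a verification strategy. It is only a hand-written stand-in for the learned model the slide calls for: the strategy names, the function name and all three thresholds are hypothetical values chosen for illustration.

```python
def choose_strategy(
    confidence: float,
    reject_below: float = 0.3,    # hypothetical threshold
    explicit_below: float = 0.6,  # hypothetical threshold
    implicit_below: float = 0.9,  # hypothetical threshold
) -> str:
    """Pick a verification strategy from a confidence score in [0, 1]."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if confidence < reject_below:
        return "reject"             # ask the user to repeat
    if confidence < explicit_below:
        return "explicit_confirm"   # "Did you say Hamilton?"
    if confidence < implicit_below:
        return "implicit_confirm"   # "Leaving from Hamilton ... where to?"
    return "accept"                 # move on without verification


if __name__ == "__main__":
    for score in (0.1, 0.5, 0.8, 0.95):
        print(score, choose_strategy(score))
```

A learned model would replace the fixed thresholds with a decision that also weighs the cost of each strategy, such as extra turns for explicit confirmation against undetected errors for acceptance.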

Papers
Error Detection in Spoken Human-Machine Interaction [E. Krahmer, M. Swerts, M. Theune, M. Weegels]
Problem Spotting in Human-Machine Interaction [E. Krahmer, M. Swerts, M. Theune, M. Weegels]
The Dual of Denial: Disconfirmations in Dialogue and Their Prosodic Correlates [E. Krahmer, M. Swerts, M. Theune, M. Weegels]

Goals
[Let’s look at the dialog on page 2]
(1) Analysis of the positive and negative cues we use in response to implicit and explicit verification questions
(2) Explore the possibilities of spotting errors on-line

Explicit vs. Implicit
Explicit
  - Presumably easier for the system to verify
    But there’s evidence that it’s not as easy …
  - Leads to more turns, less efficiency, frustration
Implicit
  - Efficiency
  - But induces a higher cognitive burden, which can result in more confusion
  - ~ Systems don’t deal very well with it …

Clark & Schaefer
Grounding model
  - Presentation phase
  - Acceptance phase
Various indicators
  - Go ON / YES
  - Go BACK / NO
Can we detect them reliably (when following implicit and explicit verification questions)?

Positive and Negative Cues
Positive               Negative
Short turns            Long turns
Unmarked word order    Marked word order
Confirm                Disconfirm
Answer                 No answer
No corrections         Corrections
No repetitions         Repetitions
New info               No new info

Experimental Setup / Data
120 dialogs: Dutch SDS providing train timetable information
487 utterances
  - 44 (~10%) not used
      Users accepting a wrong result
      Barge-in
      Users starting their own contribution
  - Left 443 resulting adjacent S/U utterances

Results – Number of words
            ~Problems   Problems
Explicit    1.68        3.44
Implicit    3.21        7.12

Results – Empty turns (%)
            ~Problems   Problems
Explicit    0%          2.6%
Implicit    3.4%        10.3%

Results – Marked word order (%)
            ~Problems   Problems
Explicit    3.3%        4.4%
Implicit    1.2%        26.9%

Results – Yes/No
                    ~Problems   Problems
Explicit   Yes      92.8%       6.1%
           No       0%          56.6%
           Other    7.1%        37.1%
Implicit   Yes      0%          (not given)
           No       0%          15.4%
           Other    100% ?      84.6%

Results – Repeated/Corrected/New
                        ~Problems   Problems
Explicit   Repeated     8.5%        23.9%
           Corrected    0%          72.6%
           New          11.4%       12.4%
Implicit   Repeated     2.4%        61.0%
           Corrected    0%          92.3%
           New          53.6%       36.5%

First conclusion
People use more negative cues when there are problems
And even more so for implicit confirmations (vs. explicit ones)

How well can you classify?
Using individual features
  - Look at precision/recall
      Explicit: absence of confirmation
      Implicit: non-zero number of corrections
Multiple features
  - Used memory-based learning
      97% accuracy (majority baseline 68%)
      Confirm + Correct is winning, although individually less good
  - This is overall, right? How about for explicit vs. implicit?
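
The two evaluation styles on this slide can be sketched as follows: precision and recall for a single-cue rule ("no confirmation means problem"), and a nearest-neighbour classifier over several cues as one simple form of memory-based learning. The data below is hand-made for illustration and is not taken from the papers; the plain overlap distance and the choice of k are likewise assumptions, since the slides do not give the learner's settings.

```python
from collections import Counter

# Hypothetical examples: (confirm, corrected, repeated, new_info) -> problem?
DATA = [
    ((1, 0, 0, 0), False),
    ((1, 0, 0, 1), False),
    ((1, 0, 1, 0), False),
    ((0, 0, 0, 1), False),
    ((0, 1, 0, 0), True),
    ((0, 1, 1, 0), True),
    ((0, 0, 1, 0), True),
    ((1, 1, 0, 0), True),
]


def precision_recall(predicted, actual):
    """Precision and recall of the positive ('problem') class."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def overlap_distance(a, b):
    """Number of feature positions on which two examples disagree."""
    return sum(1 for x, y in zip(a, b) if x != y)


def knn_predict(train, query, k=1):
    """Majority label among the k stored examples closest to the query."""
    neighbours = sorted(train, key=lambda ex: overlap_distance(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    actual = [label for _, label in DATA]
    # Single-cue rule: flag a problem whenever the user did not confirm.
    rule = [not features[0] for features, _ in DATA]
    print(precision_recall(rule, actual))
    print(knn_predict(DATA, (0, 1, 0, 0), k=3))
```

Combining cues lets the classifier recover cases a single cue misses, such as a turn that contains both a confirmation and a correction.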

BUT !!! How many of these features are available on-line?
Positive               Negative
Short turns            Long turns
Unmarked word order    Marked word order
Confirm                Disconfirm
Answer                 No answer
No corrections ?       Corrections ?
No repetitions ?       Repetitions ?
New info ?             No new info ?

What else can we throw at it?
Prosody (next paper)
Lexical information
Acoustic confidence scores
  - Maybe also of previous utterances
Repetitions/Corrections/New info on transcript?
…

Papers
Error Detection in Spoken Human-Machine Interaction [E. Krahmer, M. Swerts, M. Theune, M. Weegels]
Problem Spotting in Human-Machine Interaction [E. Krahmer, M. Swerts, M. Theune, M. Weegels]
The Dual of Denial: Disconfirmations in Dialogue and Their Prosodic Correlates [E. Krahmer, M. Swerts, M. Theune, M. Weegels]

Goals
Investigate the prosodic correlates of disconfirmations
  - Is this slightly different than before? (i.e. now looking at any corrections? Answer: No)
  - Looked at prosody on “NO” as a go_on vs. a go_back:
  - Do you want to fly from Pittsburgh?
  - Shall I summarize your trip?

Human-human
Higher pitch range, longer duration
Preceded by a longer delay
High H% boundary tone
Expected to see the same behavior for disconfirmations in human-machine

Prosodic correlates
Features         POSITIVE (‘go on’)   NEGATIVE (‘go back’)
Boundary tone    Low                  High
Duration         Short                Long
Delay            Short                Long
Pause            Short                Long
Pitch range      Low                  High
  - Yes, the correlations are there as expected

Perceptual analysis
Took 40 “No” from No+stuff, 20 go_on and 20 go_back (note that some features are lost this way …)
Forced-choice randomized task, with no feedback; 25 native speakers of Dutch
Results
  - 17 go_on correctly identified above chance
  - 15 go_back correctly identified above chance; but also 1 incorrectly identified above chance
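
"Above chance" in a two-way forced-choice task with 25 listeners can be made concrete with a binomial tail probability. The sketch below assumes a one-sided exact binomial test at a 0.05 level with a chance rate of 0.5; the slides do not say which test or threshold the authors used, so the function names and defaults here are illustrative only.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_votes_above_chance(n: int, alpha: float = 0.05, p: float = 0.5):
    """Smallest number of agreeing listeners whose tail probability is below alpha."""
    for k in range(n + 1):
        if binomial_tail(n, k, p) < alpha:
            return k
    return None  # no count reaches significance for this n


if __name__ == "__main__":
    listeners = 25  # number of listeners reported on the slide
    k = min_votes_above_chance(listeners)
    print(k, binomial_tail(listeners, k))
```

Under these assumptions, a stimulus counts as identified above chance only when at least that many of the 25 listeners give the same label.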

Discussion
Q1: Blurred relationships …
  - Confidence annotation
  - Go_on / Go_back signal
Is that the same as corrections? Is that the most general case for responses to implicit/explicit verifications, or should we have a separate detector?
Q2: What other features could we throw at these problems? What are the “most juicy” ones?

Discussion
Q3: For implicit confirms, are these different in terms of induced response behavior?
  - When do you want to leave Pittsburgh?
  - Travelling from Pittsburgh … when do you want to leave?
  - When do you want to leave from Pittsburgh to Boston?
