Local and Global Adaptation in Hyperarticulation Amanda Stent, Susan Brennan, Marie Huffman.

Outline
- Introduction: adaptation
- User adaptation: hyperarticulation
- Current and future work

Adaptation in Spoken Dialog
There is considerable variation in spoken dialog, and much of this variation is designed to be adaptive.
- Speakers may converge (e.g. Brennan and Clark 96) or complement each other (e.g. Oviatt 95, Brennan 90).
- Adaptation may be partner-specific or generic (Brown and Dell 87).
- Adaptation may be local or global.

Adaptation in Spoken Dialog
Interesting questions include:
- How do humans adapt to each other in spoken dialog, for example in speaking style (dialect, speaking rate), in lexical and syntactic choices, and in initiative?
- Are these adaptations partner-specific or generic, local or global?
- How does adaptation in human-computer dialog differ from adaptation in human-human dialog?
- Can we use adaptation in human-computer dialog to improve dialog outcomes?

Adaptation: The User
Humans adapt to their dialog partners, including computers, at many levels:
- Phonetic (e.g. hyperarticulation)
- Lexical/syntactic (e.g. producing simpler utterances, rephrasing, mirroring the system’s choice of words)
- Dialog and task (e.g. skipping acknowledgments, following system initiative)
Some of these adaptations reflect incorrect models of the conversational partner, and/or are known to be maladaptive (e.g. hyperarticulation, some rephrasing).

Adaptation: The System
Systems can adapt to make the user feel more comfortable or to mimic human adaptations (responsive generation), for example by:
- converging on the user’s choice of referring expression;
- following the user’s topic shifts.
Systems can also construct interactions that guide the user toward useful forms of adaptation (directive generation), for example by:
- using words that can be recognized and parsed;
- suggesting rephrasings after a misrecognition;
- presenting their capabilities accurately.

Experiment
- The problem
- Hypotheses
- Experiment design
- Experiment results
- Discussion

The Problem
When users experience speech recognition errors, they try to adapt in ways that do not lead to performance improvements:
- hyperarticulation (Soltau and Waibel 98, Wade et al. 92);
- rephrasing into out-of-grammar utterances (Fischer 99, Choularton and Dale 04).
Our question: considered as a form of adaptation, how exactly does hyperarticulation function?

Hypotheses
- In repairs of misrecognitions, subjects will exhibit hyperarticulation: slower speaking rate, longer pauses, and more careful speech (Oviatt et al. 98; Levow 98, 99; Hirschberg et al. 99, 00).
- (Local impact) Hyperarticulation will be more likely to appear around the actual misrecognition than elsewhere in the utterance.
- (Global impact) Once users start hyperarticulating, this behaviour will persist even after errors stop occurring.

Experiment Design
Wizard-of-Oz procedure:
- Subjects answered prerecorded questions about a children’s softball team database.
- Subjects were told to answer in complete sentences and to repeat until heard correctly.
- System feedback was provided in text, usually “I heard you say …”.
- For unplanned errors by subjects (e.g. disfluencies, use of pronouns or ellipsis, incomplete utterances), other feedback was provided.
- For selected planned error utterances, system feedback contained misrecognitions.

Example: Unplanned Error
Q. What is Ryan Dade bringing to the food sale?
U. (unplanned error) Ryan Dade is bringing cat collars, and a basket, and pet toys to the foo, to the garage sale, oops
S. Please repeat
U. (repair) Ryan Dade is bringing cat collars, a basket, and pet toys to the garage sale

Example: Planned Error
Q. What is Kate Tolstoy bringing to the food sale?
U. (planned error) Kate Tolstoy is bringing some cookie dough and a picnic table to the food sale
S. You said: Kate Tolstoy is bringing some cooking label in a pickle to the food sale
U. (repair) Kate Tolstoy is bringing some cookie dough and a picnic table to the food sale
S. You said: Kate Tolstoy is bringing some cookie dough and a picnic table to the food sale

Measurements
- Speaking rate (syllables/sec.)
- Average pause length (ms.)
- Phonetic features indicating careful speech (clear vs. relaxed form):
  - mid-word /t/ vs. tapped (flapped) /D/, e.g. Peter, tutor, party, forty, writer
  - word-final /t/ released vs. unreleased, e.g. Kate, scientist, peat, flute, dart
  - /t/ after /n/ released vs. unreleased, e.g. Kanter, scientist, Planters, dentist, Santa
  - tense vs. reduced a in the indefinite article
  - /d/ pronounced vs. dropped in and

Measurements
- Local impact of hyperarticulation: target phonemes were coded for all planned and unplanned errors and all repairs.
- Global impact of hyperarticulation: planned errors were placed so that very few errors occurred in the first third of each dialog, errors occurred every 1-3 utterances in the second third, and a run of 5 errors occurred in the last third.
- Impact of hyperarticulation on speech recognition (SR): all utterances were run through two speech recognizers, one grammar-based and one statistical.
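The planned-error schedule described above can be generated mechanically. The following is a minimal sketch, not the authors' procedure: the function name plan_errors, the choice of exactly one error in the first third, and the use of seeded random gaps are all assumptions made for illustration.

```python
import random


def plan_errors(n_questions: int = 66, seed: int = 0) -> list[int]:
    """Hypothetical schedule of planned-error question indices (0-based).

    Assumes n_questions is large enough (e.g. 66) for three thirds
    and a 5-error run to fit.
    """
    rng = random.Random(seed)
    third = n_questions // 3
    errors = []
    # First third: "very few" errors; assumed here to be exactly one.
    errors.append(rng.randrange(0, third))
    # Second third: one error every 1-3 questions.
    i = third + rng.randint(0, 2)
    while i < 2 * third:
        errors.append(i)
        i += rng.randint(1, 3)
    # Last third: a run of 5 consecutive errors.
    start = rng.randrange(2 * third, n_questions - 4)
    errors.extend(range(start, start + 5))
    return errors
```

For example, plan_errors(66, seed=1) returns a sorted list of question indices with one early error, a dense middle section, and five consecutive errors near the end; the seed only makes the hypothetical schedule reproducible.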

Experiment
- 16 subjects (9 women, 7 men; mean age 22 years) participated in the experiment.
- All were native speakers of English: 10 monolingual, 6 bilingual but English-dominant.
- Each subject answered 66 questions.
- Data from 2 additional subjects were discarded due to equipment failure.
- Some utterances were discarded because of major disfluencies or because they were cut off.
Result: 1202 utterances, including 373 planned errors and repairs.

Data Coding
- Utterance length, number of words, and number of syllables were computed automatically.
- Praat was used to measure the number and length of utterance-internal pauses longer than 10 ms.
- Phonetic annotation of target words in errors and repairs was done by hand.
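Given word-level time alignments, the two automatic measures (speaking rate and average pause length, with the 10 ms pause threshold) can be sketched as below. The Word record and its fields are hypothetical, and computing speaking rate over the whole span from first word onset to last word offset (pauses included) is an assumption; the slides do not say whether pauses were excluded.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float    # onset, seconds (hypothetical alignment field)
    end: float      # offset, seconds
    syllables: int


MIN_PAUSE_S = 0.010  # only gaps longer than 10 ms count as pauses


def speaking_rate(words: list[Word]) -> float:
    """Syllables per second over the whole utterance (pauses included)."""
    duration = words[-1].end - words[0].start
    return sum(w.syllables for w in words) / duration


def internal_pauses_ms(words: list[Word]) -> list[float]:
    """Utterance-internal gaps between consecutive words, in ms."""
    gaps = (b.start - a.end for a, b in zip(words, words[1:]))
    return [g * 1000 for g in gaps if g > MIN_PAUSE_S]


def mean_pause_ms(words: list[Word]) -> float:
    """Average pause length in ms, or 0.0 if the utterance has no pauses."""
    pauses = internal_pauses_ms(words)
    return sum(pauses) / len(pauses) if pauses else 0.0
```

With toy timings such as Kate (0.0-0.3 s), Tolstoy (0.3-0.8 s), is (1.0-1.1 s), bringing (1.105-1.5 s), the 5 ms gap before "bringing" is ignored, the 200 ms gap after "Tolstoy" is kept, and the rate is 6 syllables over 1.5 s.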

Measures of Hyperarticulation: Speaking Rate
- Speaking rate and clear speech are reliably and negatively correlated (r = -.239, p < .001): slower utterances tend to contain more clear forms.
- Speakers spoke more slowly in a repair than in a planned error (3.62 vs. 4.12 syl./sec., p < .001).
- For all paired utterances taken together, repairs were slower than errors (3.67 vs. 4.17 syl./sec., p < .001).
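To make the sign of the correlation concrete, here is a plain Pearson's r over (speaking rate, proportion of clear forms) pairs. This is a generic sketch; the slides do not specify the exact correlation procedure, and the numbers in the usage note are toy values, not experiment data.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

For toy utterances with rates [3.5, 4.0, 4.5, 5.0] syl./sec. and clear-form proportions [0.5, 0.4, 0.3, 0.2], pearson_r returns a value close to -1: faster speech goes with fewer clear forms, the same direction as the reported r = -.239, though far stronger.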

Measures of Hyperarticulation: Careful Speech
On average, speakers produced more clear forms in repairs than in errors (38% vs. 30%, p < .001). Of the 5 phonetic features coded for the paired utterances:
- 3 were more likely to be pronounced in their clear forms in the repair than in the error: mid-word /t/ vs. flap /D/, word-final /t/ release vs. non-release, and /t/ release vs. non-release after /n/;
- 2 were not: tense a in indefinite articles, and /d/ in and.

Measures of Hyperarticulation: Careful Speech
- Content words were produced in clear form 13% more often in a repair than in an error (p = .002).
- Function words were produced in clear form only 4% more often in a repair than in an error (p = .002).
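The clear-form comparisons above amount to computing, for each subset of hand-coded target tokens, the share produced in clear form. In this minimal sketch the token dictionaries and their keys (utt, word_class, clear) are hypothetical, and reporting the repair-minus-error gain in percentage points is an assumption about how "13% more often" was computed.

```python
def clear_rate(tokens: list[dict], **criteria) -> float:
    """Proportion of coded target tokens produced in their clear form.

    Each token is a dict such as
    {"utt": "repair", "word_class": "content", "clear": True}.
    """
    selected = [t for t in tokens
                if all(t[k] == v for k, v in criteria.items())]
    if not selected:
        raise ValueError("no tokens match the given criteria")
    return sum(t["clear"] for t in selected) / len(selected)


def repair_gain(tokens: list[dict], **criteria) -> float:
    """Clear-form rate in repairs minus rate in errors, in percentage points."""
    return 100 * (clear_rate(tokens, utt="repair", **criteria)
                  - clear_rate(tokens, utt="error", **criteria))
```

For instance, repair_gain(tokens, word_class="content") and repair_gain(tokens, word_class="function") give the content-word and function-word gains side by side; significance testing is outside this sketch.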

Local Impact of Hyperarticulation
Do speakers hyperarticulate as a precise form of correction, aimed at repairing the most troublesome part of the utterance? During the repair, the percentage of clear forms increased by 12% for the misunderstood portion, significantly more than for the portions before and after it (only 4.3% and 4.7%, respectively).
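The local analysis splits each utterance into the portion before, during, and after the misrecognized words. The sketch below assumes the error and its repair share the same wording, so one span of word indices (end exclusive) applies to both; the function names and the (word_index, is_clear) token format are hypothetical.

```python
REGIONS = ("before", "during", "after")


def region_of(word_index: int, span: tuple[int, int]) -> str:
    """Locate a word relative to the misrecognized span (start, end)."""
    start, end = span
    if word_index < start:
        return "before"
    return "during" if word_index < end else "after"


def pct_clear_by_region(tokens, span):
    """tokens: (word_index, is_clear) pairs for the coded target words."""
    counts = {r: [0, 0] for r in REGIONS}  # region -> [clear, total]
    for word_index, is_clear in tokens:
        tally = counts[region_of(word_index, span)]
        tally[0] += int(is_clear)
        tally[1] += 1
    return {r: (100 * c / n if n else None) for r, (c, n) in counts.items()}


def local_change(error_tokens, repair_tokens, span):
    """Repair minus error, in percentage points, for regions coded in both."""
    err = pct_clear_by_region(error_tokens, span)
    rep = pct_clear_by_region(repair_tokens, span)
    return {r: rep[r] - err[r] for r in REGIONS
            if err[r] is not None and rep[r] is not None}
```

In the planned-error example, "cookie dough and a picnic table" occupies word indices 5 to 10, so the span is (5, 11); target words such as Kate (index 0), and (7), and a (8) then fall before and during the misrecognized portion.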

Global Impact of Hyperarticulation
Is hyperarticulation a “switch” or a “dial”?
- The closer an utterance was to the most recent previous error, the more carefully it was produced (slower speaking rate, more clear forms; p < .005).
- Speakers gradually return to relaxed speech about 4 to 7 utterances after seeing evidence of misrecognition.
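The "switch or dial" question can be examined by indexing each utterance by its distance from the most recent misrecognition and averaging a measure at each distance: a dial shows up as values drifting back gradually, a switch as an abrupt jump. This is a sketch with hypothetical function names; the statistical test behind the reported p < .005 is not reproduced here.

```python
from collections import defaultdict


def distance_since_error(is_error: list[bool]) -> list:
    """For each utterance, the number of utterances since the most recent
    earlier misrecognition (None before the first one)."""
    distances, last_error = [], None
    for i, err in enumerate(is_error):
        distances.append(None if last_error is None else i - last_error)
        if err:
            last_error = i
    return distances


def mean_by_distance(values: list[float], distances: list) -> dict:
    """Average a per-utterance measure (e.g. syl./sec.) at each distance."""
    groups = defaultdict(list)
    for value, d in zip(values, distances):
        if d is not None:
            groups[d].append(value)
    return {d: sum(vs) / len(vs) for d, vs in sorted(groups.items())}
```

The utterance immediately following a misrecognized one (typically its repair) gets distance 1; utterances before the first error are left out of the averages.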

Individual Differences
Individual speakers displayed substantial variability in average speaking rate (2.43 to 5.27 syl./sec.). However, all speakers slowed their speaking rate during repairs relative to before repairs, by between .04 and 1.33 syl./sec.

Individual Differences
- All but 3 speakers produced more clear speech during repairs than before repairs.
- Speaking rate and careful speech were correlated across speakers: those who spoke rapidly tended to produce more relaxed forms, and those who spoke slowly tended to produce more clear forms.

Individual Differences
- A few speakers adopted a hyperarticulate style of speaking throughout the experiment; those who experienced the most unplanned errors spoke the slowest during non-repairs.
- Both monolingual and bilingual speakers slowed their speaking rate equally during repairs, and there was no difference in average speaking rate between monolinguals and bilinguals.
- However, monolinguals increased their proportion of clear speech marginally more than bilinguals did.

Impact on Speech Recognition
- For the statistical speech recognizer, higher word error rates were associated with slower speech (p < .001) but not with more careful speech.
- For the grammar-based recognizer, higher word error rates were correlated with faster speech (p < .001) and with more careful speech (p = .05).
- For both recognizers, the effect sizes are rather small by the standards of Cohen 88.
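Word error rate is conventionally the minimum number of word substitutions, deletions, and insertions needed to turn the reference into the recognizer output, divided by the number of reference words. The slides do not say how WER was scored for the two recognizers, so the following is only a sketch of that standard definition, using whitespace tokenization as a simplifying assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match or substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

Applied to the planned-error example, the simulated output "Kate Tolstoy is bringing some cooking label in a pickle to the food sale" differs from the 15-word reference by 4 substitutions and 1 deletion, giving a WER of 5/15.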

Impact on Speech Recognition
As Wade et al. 92 found, not all aspects of hyperarticulation cause problems, and any effects depend a great deal on how the acoustic model was trained. After a misrecognition, users’ rephrasing may cause more recognition problems than their switching to hyperarticulate speech.

Discussion
- Hyperarticulation varies both by location within the utterance and over time.
- The type and degree of hyperarticulation depend somewhat on the individual speaker.
- Once hyperarticulation has been detected, the system can try to guide the user away from it by modifying its own behaviours (Hockey et al. 03).
- However, hyperarticulation is not as maladaptive as rephrasing into out-of-grammar utterances.

Models of the System (Weaver et al.)
- The problem
- Experiment design
- Preliminary results

The Problem
Users may develop inaccurate models of dialog systems, leading to maladaptive interactions. Our question: how can we construct system behaviors that reduce user maladaptation?

Experiment Design
Same as experiment 1, except:
- Questions and system feedback were provided using TTS.
- Planned errors appear throughout the dialog: each phonetic category is represented in each quarter of the dialog, and in each location (before, during and after the error).
- Subjects were assigned to one of two conditions:
  - (Graceful) the system is presented as one that understands human language;
  - (Nongraceful) the system is presented as one that recognizes but does not understand speech.

Experiment Design
The system model is presented to subjects in the experiment setup, through the choice of TTS voice, and through the construction of planned errors. For example:
- (True) Hunter Mariano plays #center#
- (Graceful) Hunter Mariano plays #better#: semantically and syntactically meaningful
- (Nongraceful) Hunter Mariano plays #venture#: phonetically similar, syntactically nonsensical

Preliminary Results
Subjects hyperarticulate in repairs regardless of condition; however, there is a trend toward clearer speech before errors in the nongraceful condition.
- Subjects in the graceful condition use less clear speech initially (26%, increasing to 44% on repairs). Their speaking rate slows by an average of .25 syl./sec. on repairs.
- Subjects in the nongraceful condition use more clear speech initially (38%, increasing to 49% on repairs). Their speaking rate slows by an average of .52 syl./sec. on repairs.

System Adaptation (Marge, Gerrig, Stent et al.)
Experiment design: subjects interact with a spoken dialog system to fill out a survey. There are two variables, initiative and lexical choice.
- Initiative:
  - system chooses topics and their order (directive);
  - system chooses topics, user chooses order (mixed);
  - user chooses topics and their order (nondirective).
- Lexical choice:
  - system does not adapt to the user’s choice of topic labels or tense (directive);
  - system does adapt to the user’s choice of topic labels and tense (adaptive).

Directive Generation
Measures:
- Initiative: topic choice and order; requests for help and prompt repetitions; length of user responses; number of hangups; match between the system’s and the user’s estimates of the user’s overall opinion of the course.
- Lexical choice: number of misrecognitions; pause length between prompt and response.

Conclusions
- Variation in human-human dialog is omnipresent, and much of it is purposeful or adaptive.
- We do not know enough about adaptation in human-computer dialog.
- We may be able to use humans’ tendencies to adapt to improve outcomes for spoken dialog systems.
