1 Backchannel-Inviting Cues in Task-Oriented Dialogue. Agustín Gravano (1,2), Julia Hirschberg (1). (1) Columbia University, New York, USA; (2) Universidad de Buenos Aires, Argentina.

2 Agustín Gravano, Interspeech 2009. Introduction: Interactive Voice Response (IVR) Systems. Quickly spreading. Mostly simple functionality. Often perceived as “uncomfortable” or “awkward”. ASR (automatic speech recognition) and TTS (text-to-speech) account for most IVR problems. As ASR and TTS improve, other problems are revealed: coordination of system-user exchanges; backchannels.

3 Introduction: Backchannels. Short expressions uttered by listeners to convey that they are paying attention and to encourage the speaker to continue. Examples: okay, uh-huh, mm-hm, alright. Very frequent in task-oriented dialogue. Thus, modeling human usage of backchannels (BC) should lead to improved system-user coordination.

4 Introduction: Goal. Learn when backchannels are likely to occur. Find “backchannel-inviting” cues: cues displayed by the speaker “inviting” the listener to produce a backchannel response. This could improve the coordination of IVRs. Speech understanding: detect points in the user’s turn where a backchannel would be welcome. Speech generation: display cues inviting the user to produce a backchannel.

5 Talk Outline: Previous work; Material; Method; Results; Conclusions.

6 Backchannel-Inviting Cues: Previous Work. Duncan 1972, 1973, 1974, inter alia: hypothesized six turn-yielding cues in face-to-face dialogue. Several studies continued this line of research, but always excluded backchannels. Ward & Tsukahara 2000: region of low pitch lasting 110 ms or more. Cathcart et al. 2003: language model based on pause duration and part-of-speech tags to predict the location of BC.

7 Material: Columbia Games Corpus. 12 task-oriented spontaneous dialogues. Standard American English. 13 subjects: 6 female, 7 male. Series of collaborative computer games. No eye contact. No speech restrictions. 9 hours of dialogue. Manual orthographic transcription and alignment. Manual prosodic annotations (ToBI).

8 Material: Columbia Games Corpus. Player 1: Describer. Player 2: Follower.

9 Backchannel-Inviting Cues. Cues displayed by the speaker “inviting” the listener to produce a backchannel response.

10 Backchannel-Inviting Cues: Method. 3 trained annotators identified backchannels using a labeling scheme described in [Gravano et al. 2007]. To find BC-inviting cues, we compare IPUs preceding Holds with IPUs preceding Backchannels. IPU (Inter-Pausal Unit): maximal sequence of words from the same speaker surrounded by silence ≥ 50 ms. [Diagram: timelines for Speaker A and Speaker B showing IPU1, IPU2, IPU3 and IPU4, with a Hold and a Backchannel labeled.]
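The IPU definition above can be sketched in a few lines. This is only an illustration, not the authors' tooling: the `(token, start_sec, end_sec)` word format and the function name `split_into_ipus` are assumptions, and the input is assumed to be the time-aligned words of a single speaker.

```python
from typing import List, Tuple

# Hypothetical input format: (token, start_sec, end_sec) for ONE speaker.
Word = Tuple[str, float, float]


def split_into_ipus(words: List[Word], min_silence: float = 0.050) -> List[List[Word]]:
    """Group time-aligned words into inter-pausal units (IPUs).

    A new IPU starts whenever the silence between the end of the previous
    word and the start of the current one is at least `min_silence` seconds
    (50 ms, as in the definition on the slide).
    """
    ipus: List[List[Word]] = []
    for word in sorted(words, key=lambda w: w[1]):
        previous_end = ipus[-1][-1][2] if ipus else None
        if previous_end is not None and word[1] - previous_end < min_silence:
            ipus[-1].append(word)   # gap too short: same IPU
        else:
            ipus.append([word])     # first word, or silence >= threshold
    return ipus
```

Each returned IPU can then be labeled by what follows it (a Hold or a Backchannel) before any cue features are extracted.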

11 Backchannel-Inviting Cues: Individual Cues. 1. Final rising intonation: 81% of IPUs before BC end in H-H% or L-H%. 2. Higher pitch level. 3. Higher intensity level. 4. Lower NHR (voice quality). 5. Longer IPU duration (seconds, #words). 6. Final POS bigram: 72% of IPUs before BC end in DT NN, JJ NN, or NN NN. [Brace annotation on the slide: entire IPU, final 1.0 sec, final 0.5 sec.]
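As a small illustration of cue 6, the check below tests whether an IPU ends in one of the three POS bigrams listed on the slide. It assumes the IPU has already been POS-tagged by some external tagger; the function name `ends_in_bc_inviting_bigram` is hypothetical.

```python
from typing import Sequence

# The three final POS bigrams named on the slide.
BC_INVITING_BIGRAMS = {("DT", "NN"), ("JJ", "NN"), ("NN", "NN")}


def ends_in_bc_inviting_bigram(pos_tags: Sequence[str]) -> bool:
    """Return True if the last two POS tags of an IPU form one of the listed bigrams."""
    return len(pos_tags) >= 2 and tuple(pos_tags[-2:]) in BC_INVITING_BIGRAMS
```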

12 Backchannel-Inviting Cues: Defining Presence of a Cue. 2 representative features for each cue. Final intonation: pitch slope over final 200 ms, 300 ms. Intensity level: mean intensity over final 500 ms, 1000 ms. Pitch level: mean pitch over final 500 ms, 1000 ms. Voice quality: NHR over final 500 ms, 1000 ms. IPU duration: duration in ms, and in number of words. Final POS bigram: {‘DT NN’, ‘JJ NN’, ‘NN NN’} vs. rest (binary). Presence or absence of a cue is defined by whether the feature value is closer to the mean before BC or to the mean before H (Hold).
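The nearest-mean rule described above can be sketched for a single numeric feature as follows. The slide does not say how the two representative features of a cue are combined, so this sketch deliberately covers one feature only; the function names and the choice to treat ties as "absent" are assumptions.

```python
from statistics import mean
from typing import Sequence


def class_means(values_before_bc: Sequence[float],
                values_before_hold: Sequence[float]) -> tuple:
    """Mean of one feature over IPUs preceding backchannels and over IPUs preceding holds."""
    return mean(values_before_bc), mean(values_before_hold)


def cue_present(value: float, mean_before_bc: float, mean_before_hold: float) -> bool:
    """A cue counts as present when the value lies closer to the before-BC mean.

    Ties are treated as 'absent' (an assumption, not stated on the slide).
    """
    return abs(value - mean_before_bc) < abs(value - mean_before_hold)
```

In use, `class_means` would be computed once per feature from the annotated corpus, and `cue_present` applied to each new IPU's feature value.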

13 Top Frequencies of Complex Cues. BC-inviting cues: 1: Final intonation; 2: Intensity level; 3: Pitch level; 4: IPU duration; 5: Voice quality; 6: Final POS bigram. Notation: digit = cue present; dot = cue absent.
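The digit/dot notation can be reproduced with a short helper, sketched here under the assumption that each IPU is represented by the set of cue numbers (1 to 6) judged present. The names `encode_cues` and `complex_cue_frequencies` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def encode_cues(present: Set[int], n_cues: int = 6) -> str:
    """Encode a set of present cue numbers: digit = cue present, dot = cue absent."""
    return "".join(str(i) if i in present else "." for i in range(1, n_cues + 1))


def complex_cue_frequencies(ipus: Iterable[Set[int]]) -> List[Tuple[str, int]]:
    """Count how often each complex cue (combination of cues) occurs, most frequent first."""
    return Counter(encode_cues(p) for p in ipus).most_common()
```

For example, an IPU displaying cues 1, 2, 3 and 5 is encoded as `123.5.`.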

14 Backchannel-Inviting Cues: Combined Cues. [Plot: x-axis, number of cues conjointly displayed; y-axis, percentage of IPUs followed by a BC; r² = 0.993.]

15 Backchannel-Inviting Cues: IVR Systems. After each IPU from the user: if estimated likelihood > threshold, then produce a backchannel. To elicit a backchannel from the user, if desired: include as many cues as possible in the system’s final IPU.
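The decision rule above might look like the following in code. This is a minimal sketch: the slide does not specify how the likelihood is estimated, so the estimator is passed in as a caller-supplied function of the number of cues present, and both that interface and the name `backchannel_points` are assumptions.

```python
from typing import Callable, Iterable, List


def backchannel_points(ipu_cue_counts: Iterable[int],
                       estimate_likelihood: Callable[[int], float],
                       threshold: float) -> List[int]:
    """Return the indices of user IPUs after which the system would backchannel.

    `ipu_cue_counts` holds, for each user IPU in order, the number of
    BC-inviting cues detected in it. `estimate_likelihood` is a hypothetical,
    caller-supplied mapping from that count to an estimated likelihood.
    """
    return [
        index
        for index, n_cues in enumerate(ipu_cue_counts)
        if estimate_likelihood(n_cues) > threshold
    ]
```

Keeping the estimator as a parameter leaves room to plug in whatever model is fitted to the corpus, and the threshold can be tuned to make the system more or less eager to backchannel.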

16 Summary. Study of backchannel-inviting cues: objective, automatically computable; combined cues. Can improve turn-taking decisions of IVR systems. Results drawn from task-oriented dialogues: not necessarily generalizable, but suitable for most IVR domains. SIGdial 2009: study of turn-yielding cues.

17 Special thanks to: My advisor, Julia Hirschberg. Thesis committee members: Maxine Eskenazi, Kathy McKeown, Becky Passonneau, Amanda Stent. Speech Lab at Columbia University: Stefan Benus, Fadi Biadsy, Sasha Caskey, Bob Coyne, Frank Enos, Martin Jansche, Jackson Liscombe, Sameer Maskey, Andrew Rosenberg. Collaborators: Gregory Ward and Elisa Sneed German (Northwestern U); Ani Nenkova (UPenn); Héctor Chávez, David Elson, Michel Galley, Enrique Henestroza, Hanae Koiso, Shira Mitchell, Michael Mulley, Kristen Parton, Ilia Vovsha, Lauren Wilcox.

