Backchannel-Inviting Cues in Task-Oriented Dialogue
Agustín Gravano (1,2), Julia Hirschberg (1)
(1) Columbia University, New York, USA; (2) Universidad de Buenos Aires, Argentina

Introduction: Interactive Voice Response Systems
- Quickly spreading, but mostly simple functionality.
- Often described by users as “uncomfortable” or “awkward”.
- ASR + TTS account for most IVR problems.
- As ASR and TTS improve, other problems are revealed:
  - Coordination of system-user exchanges.
  - Backchannels.
Agustín Gravano, Interspeech 2009

Introduction: Backchannels
- Short expressions uttered by listeners to:
  - Convey that they are paying attention.
  - Encourage the speaker to continue.
- Examples: okay, uh-huh, mm-hm, alright.
- Very frequent in task-oriented dialogue.
- Thus, modeling human usage of backchannels should lead to improved system-user coordination.

Introduction: Goal
- Learn when backchannels are likely to occur.
- Find “backchannel-inviting” cues: cues displayed by the speaker “inviting” the listener to produce a backchannel response.
- This could improve the coordination of IVR systems:
  - Speech understanding: detect points in the user’s turn where a backchannel would be welcome.
  - Speech generation: display cues inviting the user to produce a backchannel.

Outline of this Talk
- Previous work
- Material
- Method
- Results
- Conclusions

Previous Work
- Duncan 1972, 1973, 1974, inter alia: hypothesized six turn-yielding cues in face-to-face dialogue, but did not provide real evidence that the cues are reliably associated with backchannels. Several studies continued this line of research, but always excluded backchannels.
- Ward & Tsukahara 2000: a region of low pitch lasting 110 ms or more cues a backchannel.
- Cathcart et al. 2003: a language model based on pause duration and part-of-speech tags to predict the location of backchannels.

Material: The Columbia Games Corpus
- 12 task-oriented spontaneous dialogues in Standard American English.
- 13 subjects: 6 female, 7 male.
- A series of collaborative computer games; no eye contact; no speech restrictions.
- 9 hours of dialogue.
- Manual orthographic transcription and alignment.
- Manual prosodic annotations (ToBI).

Columbia Games Corpus: Objects Games
- Player 1: Describer. Player 2: Follower.
- In an Objects game, each player saw a board with 5-7 objects. The boards were almost identical, with one object misplaced. One player had to describe the position of the target object to the other player, who had to move it to the correct position.

Backchannel-Inviting Cues
- Cues displayed by the speaker “inviting” the listener to produce a backchannel response.

Method
- IPU (Inter-Pausal Unit): a maximal sequence of words from the same speaker surrounded by silence ≥ 50 ms.
- [Diagram: Speaker A produces IPU1-IPU4; after some IPUs Speaker B remains silent (Hold), after others Speaker B produces a Backchannel.]
- 3 trained annotators identified backchannels using a labeling scheme described in [Gravano et al. 2007].
- To find backchannel-inviting cues, we compare IPUs preceding Holds with IPUs preceding Backchannels.
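The IPU definition above is directly operational. As an illustration (not the authors' code), a minimal sketch that groups hypothetical time-aligned words into IPUs using the 50 ms silence threshold:

```python
# Sketch: grouping time-aligned words into Inter-Pausal Units (IPUs).
# Input is hypothetical: (word, start, end) tuples for one speaker, in seconds.
# An IPU is a maximal word sequence whose internal silences are < 0.05 s.

def segment_ipus(words, min_silence=0.05):
    """Group (word, start, end) tuples into IPUs."""
    ipus = []
    for word, start, end in words:
        if ipus and start - ipus[-1][-1][2] < min_silence:
            ipus[-1].append((word, start, end))  # gap < 50 ms: same IPU
        else:
            ipus.append([(word, start, end)])    # gap >= 50 ms: new IPU
    return ipus

words = [("move", 0.0, 0.3), ("the", 0.31, 0.4), ("lion", 0.41, 0.8),
         ("okay", 1.2, 1.5)]  # a 400 ms pause precedes "okay"
print(len(segment_ipus(words)))  # → 2
```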

Individual Cues
- Final rising intonation: 81% of IPUs before a backchannel end in H-H% or L-H%.
- Higher pitch level.
- Higher intensity level.
- Lower NHR (voice quality).
- Longer IPU duration (in seconds and in number of words).
- Final POS bigram: 72% of IPUs before a backchannel end in DT NN, JJ NN, or NN NN.
- (Acoustic features are measured over the entire IPU, the final 1.0 sec, or the final 0.5 sec, depending on the cue.)
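The final-POS-bigram cue is simple to check mechanically. A sketch, with hypothetical tagged IPUs as input (Penn Treebank tags, as on the slide):

```python
# Sketch of the final-POS-bigram cue: an IPU whose last two part-of-speech
# tags form DT NN, JJ NN, or NN NN counts as displaying the cue.
# The example tag sequences below are hypothetical.

BC_INVITING_BIGRAMS = {("DT", "NN"), ("JJ", "NN"), ("NN", "NN")}

def has_final_pos_cue(pos_tags):
    """True if the IPU's final POS bigram is one of the cue bigrams."""
    return len(pos_tags) >= 2 and tuple(pos_tags[-2:]) in BC_INVITING_BIGRAMS

print(has_final_pos_cue(["IN", "DT", "NN"]))    # → True  (e.g. "...to the lion")
print(has_final_pos_cue(["PRP", "VBP", "RB"]))  # → False (e.g. "I guess so")
```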

Defining Presence of a Cue
- 2 representative features for each cue:
  - Final intonation: pitch slope over the final 200 ms and 300 ms.
  - Intensity level: mean intensity over the final 500 ms and 1000 ms.
  - Pitch level: mean pitch over the final 500 ms and 1000 ms.
  - Voice quality: NHR over the final 500 ms and 1000 ms.
  - IPU duration: duration in ms, and in number of words.
  - Final POS bigram: {‘DT NN’, ‘JJ NN’, ‘NN NN’} vs. the rest (binary).
- Define presence/absence based on whether the feature value is closer to the mean before Backchannels or to the mean before Holds.
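The "closer to which mean?" rule above amounts to a nearest-mean binary decision per feature. A minimal sketch, with illustrative (not corpus) means:

```python
# Sketch of the presence/absence rule: a feature value counts as the cue
# being present if it lies nearer the mean observed before Backchannels
# than the mean observed before Holds. The means below are made up.

def cue_present(value, mean_bc, mean_hold):
    """Binary presence: is the value closer to the backchannel mean?"""
    return abs(value - mean_bc) <= abs(value - mean_hold)

# e.g. mean pitch over the final 500 ms (Hz); illustrative numbers only
mean_pitch_before_bc, mean_pitch_before_hold = 180.0, 150.0
print(cue_present(172.0, mean_pitch_before_bc, mean_pitch_before_hold))  # → True
print(cue_present(148.0, mean_pitch_before_bc, mean_pitch_before_hold))  # → False
```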

Top Frequencies of Complex Cues
[Chart: the most frequent cue combinations; in each combination, a digit means the cue is present and a dot means it is absent. Backchannel-inviting cues: 1: final intonation, 2: intensity level, 3: pitch level, 4: IPU duration, 5: voice quality, 6: final POS bigram.]

Combined Cues
[Chart: the percentage of IPUs followed by a backchannel increases with the number of cues conjointly displayed; the relationship is close to linear, with r² = 0.993.]
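The r² figure above summarizes how well a least-squares line fits the cue-count vs. backchannel-rate points. A self-contained sketch of that computation; the y values below are illustrative placeholders, not the corpus figures (only r² = 0.993 comes from the talk):

```python
# Sketch: r^2 of a least-squares line relating the number of cues
# conjointly displayed (x) to the percentage of IPUs followed by a
# backchannel (y). The y values are hypothetical, roughly linear data.

def r_squared(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    # Predictions from the fitted line: y = my + (sxy / sxx) * (x - mx)
    ss_res = sum((y - (my + sxy / sxx * (x - mx))) ** 2
                 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

cues = [0, 1, 2, 3, 4, 5, 6]
pct_bc = [2, 6, 11, 15, 20, 24, 29]  # hypothetical percentages
print(round(r_squared(cues, pct_bc), 3))  # → a value close to 1
```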

Application to IVR Systems
- After each IPU from the user:
    if estimated likelihood > threshold then produce a backchannel
- To elicit a backchannel from the user, if desired: include as many cues as possible in the system’s final IPU.
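The decision rule above can be sketched as follows. The likelihood estimate here is a toy stand-in (the fraction of the six cues present), loosely mirroring the monotone trend in the combined-cues result; a real system would use a model fit to data:

```python
# Sketch of the slide's rule: after each user IPU, produce a backchannel
# when the estimated likelihood exceeds a threshold. The likelihood below
# is a toy estimate (fraction of the six cues displayed), an assumption
# for illustration only.

def should_backchannel(num_cues_present, threshold=0.3, max_cues=6):
    """Decide whether the system should produce a backchannel now."""
    likelihood = num_cues_present / max_cues
    return likelihood > threshold

print(should_backchannel(1))  # → False (1/6 ≈ 0.17, below threshold)
print(should_backchannel(4))  # → True  (4/6 ≈ 0.67, above threshold)
```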

Summary
- A study of backchannel-inviting cues: objective, automatically computable, combined cues that can improve the turn-taking decisions of IVR systems.
- Results drawn from task-oriented dialogues: not necessarily generalizable, but suitable for most IVR domains.
- See SIGdial 2009 for a companion study of turn-yielding cues.

Special thanks to…
- My advisor, Julia Hirschberg.
- Thesis committee members: Maxine Eskenazi, Kathy McKeown, Becky Passonneau, Amanda Stent.
- The Speech Lab at Columbia University: Stefan Benus, Fadi Biadsy, Sasha Caskey, Bob Coyne, Frank Enos, Martin Jansche, Jackson Liscombe, Sameer Maskey, Andrew Rosenberg.
- Collaborators: Gregory Ward and Elisa Sneed German (Northwestern U); Ani Nenkova (UPenn); Héctor Chávez, David Elson, Michel Galley, Enrique Henestroza, Hanae Koiso, Shira Mitchell, Michael Mulley, Kristen Parton, Ilia Vovsha, Lauren Wilcox.